[ https://issues.apache.org/jira/browse/ARROW-17005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17563896#comment-17563896 ]
Jonathan Swenson commented on ARROW-17005:
------------------------------------------

It's possible to override the Arrow output type today (using setJdbcToArrowTypeConverter), but it sounds like what you are describing is a generalization that would also allow writing a custom JDBCConsumer (or am I misinterpreting "custom conversion for a custom type"?).

I think that would also generalize some of the work that [~toddfarmer] has done to help with the BigDecimal issues I've run into:
https://issues.apache.org/jira/browse/ARROW-16600
https://issues.apache.org/jira/browse/ARROW-16427

The postgres driver has really been a bit of a pain for me 🤦🏼‍♂️

> [Java] Incorrect results from JDBC Adapter from Postgres of non-nullable column through left join
> --------------------------------------------------------------------------------------------------
>
>                 Key: ARROW-17005
>                 URL: https://issues.apache.org/jira/browse/ARROW-17005
>             Project: Apache Arrow
>          Issue Type: Wish
>          Components: Java
>            Reporter: Jonathan Swenson
>            Priority: Major
>
> Unsure whether to consider this a bug or a wish, but the JDBC to Arrow Adapter produces incorrect results when wrapping the postgres driver in certain cases.
>
> If you left join a non-nullable column, the column becomes nullable (whenever the join condition does not match any rows). However, the postgres ResultSetMetaData lies to you and still indicates that the column is non-nullable.
>
> When iterating through the data, the results come back as null (wasNull() returns true). However, because of the way the JDBCConsumer is created, a non-nullable consumer is built that never checks the nullability of these results. Unfortunately, this produces incorrect data or errors depending on the data types returned.
>
> The postgres JDBC team has closed a ticket about this, indicating that it would be impossible for them to return the correct nullability information through the JDBC driver. See: [https://github.com/pgjdbc/pgjdbc/issues/2079]
>
> An example:
> Table:
> ||t1.id||
> |2|
> |3|
> {code:java}
> CREATE TABLE t1 (id integer NOT NULL);
> INSERT INTO t1 VALUES (2), (3);
> {code}
> Query:
> {code:java}
> WITH t2 AS (SELECT 1 AS id UNION SELECT 2)
> SELECT
>   t1.id
> FROM t2
> LEFT JOIN t1 on t1.id = t2.id;
> {code}
> This returns the result set:
> ||id||
> |2|
> |null|
> The ResultSetMetaData indicates that the column is non-nullable (as t1.id is non-nullable), but there is null data in the result.
>
> The Arrow vector that is present after the result set is consumed looks like this:
> ||id||
> |2|
> |0|
> ResultSet.getInt(1) returns 0 when the source data is null, with the expectation that you check wasNull(). The data is incorrect and the failure is silent, potentially leading to clients / consumers getting bad data.
>
> In other cases, such as UUID (mapped to UTF-8 vectors), the value fails to load into Arrow: the consumer does not expect null data and throws an NPE when deserializing / converting to byte arrays.
>
> I was able to work around this problem by wrapping the postgres JDBC ResultSetMetaData and always forcing the nullability to nullable (or nullability unknown); a sketch of that wrapper appears below.
>
> Unfortunately I don't think there is a great way to solve this, but some way to configure / override the JDBCConsumer creation would allow users of this library to override this behavior. However, the silent failure and incorrect data might lead to users not noticing.
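For reference, here is a minimal sketch of the metadata-wrapping workaround described in the quoted issue, using a java.lang.reflect.Proxy to keep the example short. The class name ForceNullableMetaData and the wrap method are illustrative only and are not part of Arrow or pgjdbc; in practice the ResultSet itself would also need to be wrapped so that its getMetaData() returns this proxy before the result set is handed to the JDBC adapter.

{code:java}
import java.lang.reflect.Proxy;
import java.sql.ResultSetMetaData;

// Illustrative helper (not an Arrow or pgjdbc API): wraps the driver's
// ResultSetMetaData so that isNullable(...) always reports columnNullable,
// while every other call is forwarded to the real metadata unchanged.
public final class ForceNullableMetaData {

  public static ResultSetMetaData wrap(ResultSetMetaData delegate) {
    return (ResultSetMetaData) Proxy.newProxyInstance(
        ResultSetMetaData.class.getClassLoader(),
        new Class<?>[] {ResultSetMetaData.class},
        (proxy, method, args) -> {
          if ("isNullable".equals(method.getName())) {
            // Ignore what the driver claims; postgres reports columnNoNulls
            // even for columns that are null-padded by an outer join.
            return ResultSetMetaData.columnNullable;
          }
          return method.invoke(delegate, args);
        });
  }

  private ForceNullableMetaData() {}
}
{code}

With nullability forced to columnNullable (or columnNullableUnknown), the adapter builds nullable consumers that check each value's validity, so rows null-padded by the left join land as nulls in the Arrow vector instead of producing 0 values or NPEs.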