[ 
https://issues.apache.org/jira/browse/ARROW-17005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Li resolved ARROW-17005.
------------------------------
    Fix Version/s: 9.0.0
       Resolution: Fixed

Issue resolved by pull request 13558
[https://github.com/apache/arrow/pull/13558]

> [Java] Incorrect results from JDBC Adapter from Postgres of non-nullable 
> column through left join
> -------------------------------------------------------------------------------------------------
>
>                 Key: ARROW-17005
>                 URL: https://issues.apache.org/jira/browse/ARROW-17005
>             Project: Apache Arrow
>          Issue Type: Wish
>          Components: Java
>            Reporter: Jonathan Swenson
>            Assignee: David Li
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 9.0.0
>
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> Unsure to consider this a bug or wish, but the JDBC to Arrow Adapter produces 
> incorrect results when wrapping the postgres driver in certain cases. 
> If you left join a non-nullable column, the column becomes nullable (if the 
> join condition does not match any columns). However the postgres 
> ResultSetMetaData lies to you and still indicates that the column is still 
> non-nullable. 
> When iterating through the data, results come back as null (isNull will 
> return true). 
> However, because of the way that the JDBCConsumer is created, it creates a 
> non-nullable consumer and will not check the nullability of these results. 
> Unfortunately, this results in incorrect data or errors depending on the data 
> types returned. 
> The postgres JDBC team has closed a ticket about this indicating that it 
> would be impossible for them to return the correct data nullability data to 
> the JDBC driver. see: [https://github.com/pgjdbc/pgjdbc/issues/2079]
> An example: 
> Table: 
> ||t1.id||
> |2|
> |3|
> {code:java}
> CREATE TABLE t1 (id integer NOT NULL);
> INSERT INTO t1 VALUES (2), (3);
> {code}
> Query
> {code:java}
> WITH t2 AS (SELECT 1 AS id UNION SELECT 2)
> SELECT 
>   t1.id 
> FROM t2 
> LEFT JOIN t1 on t1.id = t2.id;{code}
> This returns the result set:
> ||id||
> |2|
> |null|
> The ResultSetMetaData indicates that the column is non-nullable (as t1.id is 
> non-nullable) but there is null data in the result. 
> The Arrow Vector that is present after the result set is consumed, looks like 
> this: 
> ||id||
> |2|
> |0|
> ResultSet.getInt(1) will return 0 when the source data is null, with an 
> expectation that you check isNull. 
> The data is incorrect and silently fails potentially leading to clients / 
> consumers getting bad data. 
>  
> In other cases, such as UUID (mapped to UTF-8 vectors) the value will fail to 
> load into arrow due to expecting null data and throwing a NPE when 
> deserializing / converting to bytearrays. 
>  
> I was able to work around this problem by wrapping the postgres JDBC 
> ResultSetMetadata and always forcing the nullability to nullable (or 
> nullability unknown). 
> Unfortunately I don't think there is a great way to solve this, but perhaps 
> some way to configure / override the JDBCConsumer creation would allow for 
> users of this library to override this behavior, however the silent failure 
> and incorrect data might lead to users not noticing. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to