[ 
https://issues.apache.org/jira/browse/ARROW-16035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17534011#comment-17534011
 ] 

Todd Farmer commented on ARROW-16035:
-------------------------------------

[~jswenson] : Thanks for your review and comments!  Here's my thinking:

I understand the comment about ResultSet.isLast() being optional only for 
TYPE_FORWARD_ONLY ResultSets.  I see the same note about optional support in 
the [documentation for 
isAfterLast()|https://docs.oracle.com/en/java/javase/11/docs/api/java.sql/java/sql/ResultSet.html#isAfterLast()]:
{code:java}
boolean isAfterLast() throws SQLException

Retrieves whether the cursor is after the last row in this ResultSet object.

Note:Support for the isAfterLast method is optional for ResultSets with a 
result set type of TYPE_FORWARD_ONLY {code}
While it's possible that a JDBC driver offers support for one of the above 
methods, but not the other, I considered them effectively equivalent for our 
purposes.  Both are optional in the same context, one is potentially expensive, 
the other ambiguous in the case of empty results.  In the event of a JDBC 
driver electing to not support these methods, there is no mechanism prior to 
the first time the underlying ResultSet is read to assess whether an empty 
ResultSet has been supplied - we literally have to call ResultSet.next() to see 
what gets returned.  That doesn't happen during initialization of the 
ArrowVectorIterator object - it only happens when ArrowVectorIterator.next() is 
called.  That effectively means that there is no way to support 
ArrowVectorIterator.hasNext() without first calling ArrowVectorIterator.next() 
- at least in the context where the JDBC driver vendor has elected to not 
implement the optional methods.

We could elect to have ArrowVectorIterator.hasNext() return true until 
ArrowVectorIterator.next() has been called, and rely entirely on the added 
"readComplete" variable to indicate when the ResultSet has been fully read or 
not.  That would result in the following strange behavior, though:
{code:java}
ArrowVectorIterator iter = JdbcToArrow.sqlToArrowVectorIterator(emptyResultSet, 
config);
boolean willBeTrue = iter.hasNext();  // always true until next() is called
VectorSchemaRoot root = iter.next();  // reads ResultSet for first time, 
discovers it's empty
int willBeZero = root.getRowCount();  // no rows!
boolean willBeFalse = iter.hasNext(); // now is false after reading 
ResultSet{code}
Effectively, this would make ArrowVectorIterator _always_ return at least one 
VectorSchemaRoot, even when the underlying ResultSet is empty.  I'm not 
entirely clear whether this is acceptable.

While not really related to your overall point, I'm having difficulty 
identifying a context in which ResultSet.isLast() would be expensive, while 
ResultSet.isAfterLast() would not be.

> [Java] Arrow to JDBC ArrowVectorIterator with does not terminate with empty 
> result set
> --------------------------------------------------------------------------------------
>
>                 Key: ARROW-16035
>                 URL: https://issues.apache.org/jira/browse/ARROW-16035
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Java
>    Affects Versions: 7.0.0
>            Reporter: Jonathan Swenson
>            Assignee: Todd Farmer
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 8.0.0
>
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> Using an ArrowVectorIterator built from a JDBC Result Set that is empty 
> causes the iterator to never terminate. 
> {code:java}
> ArrowVectorIterator iterator =
>     JdbcToArrow.sqlToArrowVectorIterator(conn.createStatement()
>         .executeQuery("select 1 from table1 where false"), config); {code}
>  
> It appears as though this is due to the implementation of the 
> [hasNext()|https://github.com/apache/arrow/blob/master/java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/ArrowVectorIterator.java#L158]
>  method.
> The expectation is that the `isAfterLast()` method on a JDBC result set 
> return true when the result set is empty. However, according to the [JDBC 
> documentation|https://docs.oracle.com/en/java/javase/11/docs/api/java.sql/java/sql/ResultSet.html#isAfterLast()]
>  it will always return false when the result set is empty. 
> {quote}Returns:{{{}true{}}} if the cursor is after the last row; {{false}} if 
> the cursor is at any other position or the result set contains no rows
> {quote}
>  



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

Reply via email to