[
https://issues.apache.org/jira/browse/SPARK-57274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jiwon Park updated SPARK-57274:
-------------------------------
Description:
Follow-up to SPARK-54108 (execute*) and SPARK-54014 (setMaxRows), filling the
remaining Statement-side gaps that JDBC client tools (e.g. DataGrip) exercise
on every query.
h3. Problem
{{SparkConnectStatement}} still throws {{SQLFeatureNotSupportedException}} for
accessors that JDBC client tools call around query execution:
{{setFetchSize}}/{{getFetchSize}},
{{setFetchDirection}}/{{getFetchDirection}}, {{getResultSetType}},
and {{setQueryTimeout}}. A single such call aborts the query path.
{{getMoreResults}} also throws unconditionally. Per its javadoc, results are
exhausted when {{getMoreResults() == false && getUpdateCount() == -1}}, so the
standard JDBC result-drain loop
while (stmt.getMoreResults() || stmt.getUpdateCount() != -1) \{ ... }
either errors out or, if the tool swallows the exception, spins forever,
because {{getUpdateCount}} never transitions to -1 after the single result is
consumed. DataGrip hangs indefinitely on a result-less command such as
{{USE <db>}}.
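The termination condition can be illustrated with a small, self-contained Java sketch. It does not use the real driver: {{DrainLoopSketch}}, {{stubStatement}}, and {{drain}} are hypothetical names, and the stub is a dynamic proxy over {{java.sql.Statement}} that models only {{getMoreResults}} and {{getUpdateCount}}. The "never flips" case assumes a tool that swallows the exception and treats the call as returning false; the iteration guard exists only so the demonstration stops.

```java
import java.lang.reflect.Proxy;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical sketch; none of these names come from the Spark code base.
class DrainLoopSketch {

    /**
     * Builds a stub Statement whose single result has already been consumed.
     * If flipOnAdvance is true, getMoreResults() moves the update count to -1
     * (the behavior the fix describes); otherwise the count never changes.
     */
    static Statement stubStatement(int initialUpdateCount, boolean flipOnAdvance) {
        int[] updateCount = {initialUpdateCount};
        return (Statement) Proxy.newProxyInstance(
            DrainLoopSketch.class.getClassLoader(),
            new Class<?>[] {Statement.class},
            (proxy, method, args) -> {
                switch (method.getName()) {
                    case "getMoreResults":
                        if (flipOnAdvance) {
                            updateCount[0] = -1;
                        }
                        return false; // there is never a second result
                    case "getUpdateCount":
                        return updateCount[0];
                    default:
                        throw new UnsupportedOperationException(method.getName());
                }
            });
    }

    /** Runs the standard drain loop and returns how many times its body ran. */
    static int drain(Statement stmt, int maxIterations) {
        int iterations = 0;
        try {
            while (stmt.getMoreResults() || stmt.getUpdateCount() != -1) {
                if (++iterations >= maxIterations) {
                    break; // guard: without it the loop would spin forever
                }
            }
        } catch (SQLException e) {
            throw new IllegalStateException(e);
        }
        return iterations;
    }

    public static void main(String[] args) {
        System.out.println("flips to -1: " + drain(stubStatement(0, true), 1000));
        System.out.println("never flips: " + drain(stubStatement(0, false), 1000));
    }
}
```

With the flip in place the loop body never runs; without it the loop only stops because of the guard.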
Finally, {{SparkConnectConnection}} implements only the no-arg
{{createStatement()}}; the {{(type, concurrency)}} and {{(type,
concurrency, holdability)}} overloads throw, and JDBC client tools always call
them.
h3. Changes
* Implement the {{SparkConnectStatement}} accessors. Spark Connect results are
forward-only/read-only and the server paginates, so fetch size and query
timeout are stored as hints and echoed back (matching Spark Thrift / Hive
JDBC); fetch direction accepts only {{FETCH_FORWARD}}; {{getResultSetType}}
returns {{TYPE_FORWARD_ONLY}}. {{getFetchSize}} reports a non-zero
default (1000), matching Spark Thrift / Hive JDBC rather than the spec's 0
("driver decides"), since the value is only an informational hint here.
* Fix {{getMoreResults}} to close the current {{ResultSet}}, report no
further results, and flip {{getUpdateCount}} to -1 so drain loops terminate.
* Implement the {{createStatement}} type/concurrency overloads on
{{SparkConnectConnection}}: accept {{TYPE_FORWARD_ONLY}} and
{{TYPE_SCROLL_INSENSITIVE}} (downgraded to forward-only), reject updatable
concurrency and scroll-sensitive type with a clear message (mirroring the Hive
JDBC driver policy used by the Spark Thrift Server). Holdability is ignored,
since Connect results are effectively {{CLOSE_CURSORS_AT_COMMIT}}.
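The accessor and overload policy listed above could look roughly like the following Java sketch. It is not the actual patch: the class {{StatementHintsSketch}} and the helpers {{resolveResultSetType}} and {{accepts}} are hypothetical, the exception types and messages are assumptions, and rejecting negative fetch sizes and timeouts follows the general JDBC contract rather than anything stated in this issue.

```java
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.SQLFeatureNotSupportedException;

// Hypothetical sketch of the described policy; not the real SparkConnectStatement.
class StatementHintsSketch {
    // Non-zero default, as described in the issue (matching Spark Thrift / Hive JDBC).
    static final int DEFAULT_FETCH_SIZE = 1000;

    private int fetchSize = DEFAULT_FETCH_SIZE;
    private int queryTimeoutSeconds = 0;

    // Fetch size is only a stored hint; the server decides pagination.
    void setFetchSize(int rows) throws SQLException {
        if (rows < 0) {
            throw new SQLException("Fetch size must be >= 0, got " + rows);
        }
        fetchSize = rows;
    }

    int getFetchSize() { return fetchSize; }

    // Query timeout is likewise stored and echoed back.
    void setQueryTimeout(int seconds) throws SQLException {
        if (seconds < 0) {
            throw new SQLException("Query timeout must be >= 0, got " + seconds);
        }
        queryTimeoutSeconds = seconds;
    }

    int getQueryTimeout() { return queryTimeoutSeconds; }

    // Only forward fetching is meaningful for forward-only results.
    void setFetchDirection(int direction) throws SQLException {
        if (direction != ResultSet.FETCH_FORWARD) {
            throw new SQLFeatureNotSupportedException("Only FETCH_FORWARD is supported");
        }
    }

    int getFetchDirection() { return ResultSet.FETCH_FORWARD; }

    int getResultSetType() { return ResultSet.TYPE_FORWARD_ONLY; }

    /** Validates createStatement(type, concurrency) arguments; returns the effective type. */
    static int resolveResultSetType(int type, int concurrency) throws SQLException {
        if (concurrency != ResultSet.CONCUR_READ_ONLY) {
            throw new SQLFeatureNotSupportedException("Only CONCUR_READ_ONLY is supported");
        }
        switch (type) {
            case ResultSet.TYPE_FORWARD_ONLY:
            case ResultSet.TYPE_SCROLL_INSENSITIVE: // downgraded to forward-only
                return ResultSet.TYPE_FORWARD_ONLY;
            default:
                throw new SQLFeatureNotSupportedException("Unsupported result set type: " + type);
        }
    }

    /** Convenience wrapper: true if the combination would be accepted. */
    static boolean accepts(int type, int concurrency) {
        try {
            resolveResultSetType(type, concurrency);
            return true;
        } catch (SQLException e) {
            return false;
        }
    }

    public static void main(String[] args) throws SQLException {
        StatementHintsSketch stmt = new StatementHintsSketch();
        System.out.println("default fetch size: " + stmt.getFetchSize());
        stmt.setFetchSize(50);
        System.out.println("echoed fetch size: " + stmt.getFetchSize());
        System.out.println("scroll-insensitive accepted: "
            + accepts(ResultSet.TYPE_SCROLL_INSENSITIVE, ResultSet.CONCUR_READ_ONLY));
        System.out.println("updatable accepted: "
            + accepts(ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_UPDATABLE));
    }
}
```

Downgrading {{TYPE_SCROLL_INSENSITIVE}} instead of rejecting it keeps tools that request it by default working, while the returned statement still reports forward-only.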
h3. Tests
New cases in {{SparkConnectStatementSuite}}: accessor defaults and
validation, drain-loop termination for both result-bearing and result-less
commands, and the typed {{createStatement}} overloads.
> Support fetch/type accessors and getMoreResults for SparkConnectStatement
> -------------------------------------------------------------------------
>
> Key: SPARK-57274
> URL: https://issues.apache.org/jira/browse/SPARK-57274
> Project: Spark
> Issue Type: Sub-task
> Components: Connect
> Affects Versions: 4.2.0
> Reporter: Jiwon Park
> Priority: Major