[ 
https://issues.apache.org/jira/browse/DRILL-8388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Turton updated DRILL-8388:
--------------------------------
    Description: 
When a JDBC client issues a CTAS statement, Drill returns one record for each 
completed writer fragment, containing the number of records that fragment 
wrote. These records are streamed back as writer fragments complete, so their 
order cannot be known in advance. If the client application closes its 
client-side JDBC resources immediately after its call to 
Statement.executeQuery returns, as follows
{code:java}
Statement ctasStatement = conn.createStatement();
ResultSet ctasResults = ctasStatement.executeQuery(ctasQueryText);
ctasResults.close();
ctasStatement.close();
{code}
then the CTAS statement may still be executing and, depending on timing, may 
be cancelled prematurely.

The cancellation of the CTAS statement is usually benign if it spawned only one 
writer fragment, but if it spawned more than one then it is likely that at 
least one writer will be interrupted before it has finished writing, resulting 
in incomplete or even corrupted output. Even in the benign case, such queries 
conclude in the CANCELLED state rather than the FINISHED state.

To have CTAS queries reliably run to completion, the JDBC client can wait for 
all of the writer fragments to finish by scrolling through the ResultSet 
before it closes its JDBC resources.
{code:java}
while (ctasResults.next());
{code}
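
Putting the two snippets together, a client-side helper could look roughly like the sketch below. The class and method names (CtasRunner, drain, runCtas) are hypothetical and not part of Drill or its JDBC driver; only standard java.sql calls are used. The sketch assumes that reaching the end of the ResultSet means every writer fragment has reported, as described above, and it relies on try-with-resources to close the ResultSet and Statement only after the drain has finished.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical helper, not part of Drill: runs a CTAS statement and only
// releases the JDBC resources once every result record has been fetched.
final class CtasRunner {
    private CtasRunner() {}

    // Scrolls to the end of the result set and returns the number of records
    // seen (per the description above, one per completed writer fragment).
    static int drain(ResultSet results) throws SQLException {
        int records = 0;
        while (results.next()) {
            records++;
        }
        return records;
    }

    // Executes the CTAS text and drains the results before try-with-resources
    // closes the ResultSet and then the Statement.
    static int runCtas(Connection conn, String ctasQueryText) throws SQLException {
        try (Statement ctasStatement = conn.createStatement();
             ResultSet ctasResults = ctasStatement.executeQuery(ctasQueryText)) {
            return drain(ctasResults);
        }
    }
}
```

A caller would invoke CtasRunner.runCtas(conn, ctasQueryText) in place of the four-line snippet shown earlier; the returned count is only informational.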
 

  was:
I'll refine this ticket as I discover more, but at the current time I believe 
this bug can be reproduced as follows.
 # The Drill writer format is set to Parquet.
 # A CTAS statement is issued over JDBC (the bug does not appear to manifest 
for the same query received over REST).
 # The CTAS statement spawns multiple Parquet writer fragments.
 # The query is apparently cancelled (by the Drill/JDBC client?) before all of 
the writer fragments have completed.
 # Some writer fragments have created no output file at all. Others have 
created invalid, zero-byte Parquet files. Others have created valid empty 
Parquet files and others have created valid non-empty Parquet files.
 # A subsequent query against the destination fails because it encounters 
zero-byte Parquet files.


> CTAS sent over JDBC may be cancelled if query results are not fetched
> ---------------------------------------------------------------------
>
>                 Key: DRILL-8388
>                 URL: https://issues.apache.org/jira/browse/DRILL-8388
>             Project: Apache Drill
>          Issue Type: Task
>          Components: Client - JDBC, Storage - Writer
>    Affects Versions: 1.20.3
>            Reporter: James Turton
>            Assignee: James Turton
>            Priority: Major
>             Fix For: Future



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
