[ https://issues.apache.org/jira/browse/DRILL-8388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
James Turton updated DRILL-8388:
--------------------------------
Description:
When a JDBC client issues a CTAS statement, Drill returns one record for each
completed writer fragment, containing the number of records that fragment
wrote. These records are returned in the usual streaming fashion as writer
fragments complete, so their order cannot be known in advance. If the client
application closes its client-side JDBC resources immediately after its call
to Statement.executeQuery has returned, as follows:
{code:java}
Statement ctasStatement = conn.createStatement();
ResultSet ctasResults = ctasStatement.executeQuery(ctasQueryText);
ctasResults.close();
ctasStatement.close();
{code}
then the CTAS statement may still be executing, in which case closing the
ResultSet and Statement cancels it prematurely; whether this happens depends
on timing. The cancellation is usually benign if the statement spawned only
one writer fragment, but if it spawned more than one, at least one writer is
likely to be interrupted before it has finished writing, resulting in
incomplete or even corrupted output. Even in the benign case, such queries end
in the CANCELLED state rather than the FINISHED state.
To make CTAS queries reliably run to completion, the JDBC client can wait for
all of the writer fragments to finish before closing its JDBC resources, by
iterating through the entire ResultSet before closing it.
{code:java}
// Consume every writer-fragment record before closing the ResultSet
while (ctasResults.next()) { }
{code}
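A fuller sketch of the same workaround in plain JDBC might look like the following. The class CtasDrain and its methods are hypothetical, not part of Drill or its JDBC driver. The code also assumes, as in Drill's usual CTAS output, that column 2 of each returned row holds the number of records written by one writer fragment.

{code:java}
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical helper illustrating the "drain before close" workaround.
final class CtasDrain {

    private CtasDrain() { }

    /**
     * Reads every row of a CTAS result so that all writer fragments have
     * reported completion before the caller closes its JDBC resources.
     * Assumption: column 2 holds the per-fragment "records written" count.
     *
     * @return the total number of records written across all fragments
     */
    static long drainCtasResults(ResultSet ctasResults) throws SQLException {
        long totalRecords = 0L;
        while (ctasResults.next()) {
            totalRecords += ctasResults.getLong(2);
        }
        return totalRecords;
    }

    /** Runs a CTAS statement and waits for every writer fragment before closing. */
    static long runCtas(Connection conn, String ctasQueryText) throws SQLException {
        // try-with-resources closes ctasResults, then ctasStatement, but only
        // after drainCtasResults has consumed the whole result stream.
        try (Statement ctasStatement = conn.createStatement();
             ResultSet ctasResults = ctasStatement.executeQuery(ctasQueryText)) {
            return drainCtasResults(ctasResults);
        }
    }
}
{code}

Because the ResultSet and Statement are only closed after drainCtasResults returns, the close calls that can trigger the premature cancellation should happen only once every writer fragment has reported in. Summing the per-fragment counts is optional; the essential part is reading to the end of the ResultSet.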
was:
I'll refine this ticket as I discover more, but at present I believe this bug
can be reproduced as follows.
# The Drill writer format is set to Parquet.
# A CTAS statement is issued over JDBC (the bug does not appear to manifest
for the same query received over REST).
# The CTAS statement spawns multiple Parquet writer fragments.
# The query is apparently cancelled (by the Drill/JDBC client?) before all of
the writer fragments have completed.
# Some writer fragments have created no output file at all, some have created
invalid zero-byte Parquet files, some have created valid empty Parquet files,
and others have created valid non-empty Parquet files.
# A subsequent query against the destination fails because it encounters
zero-byte Parquet files.
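To check for the symptom in the last step, a sketch like the following could list the zero-byte Parquet files left in a CTAS destination. It assumes the destination directory is readable directly through java.nio.file (for example, a workspace on the local filesystem) and that output files end in .parquet; the class ZeroByteParquetCheck is hypothetical.

{code:java}
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical diagnostic: find zero-byte *.parquet files under a CTAS output directory.
final class ZeroByteParquetCheck {

    private ZeroByteParquetCheck() { }

    /** Returns, in sorted order, every *.parquet file under tableDir whose size is zero. */
    static List<Path> findZeroByteParquetFiles(Path tableDir) throws IOException {
        try (Stream<Path> paths = Files.walk(tableDir)) {
            return paths
                .filter(Files::isRegularFile)
                .filter(p -> p.getFileName().toString().endsWith(".parquet"))
                .filter(p -> {
                    try {
                        return Files.size(p) == 0L;
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                })
                .sorted()
                .collect(Collectors.toList());
        }
    }
}
{code}

A zero-byte file can never be a valid Parquet file, since the format both begins and ends with the 4-byte magic number PAR1, so any path this returns is one a later query would be expected to trip over.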
> CTAS sent over JDBC may be cancelled if query results are not fetched
> ---------------------------------------------------------------------
>
> Key: DRILL-8388
> URL: https://issues.apache.org/jira/browse/DRILL-8388
> Project: Apache Drill
> Issue Type: Task
> Components: Client - JDBC, Storage - Writer
> Affects Versions: 1.20.3
> Reporter: James Turton
> Assignee: James Turton
> Priority: Major
> Fix For: Future
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)