[ 
https://issues.apache.org/jira/browse/DRILL-8388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Turton updated DRILL-8388:
--------------------------------
    Description: 
I'll refine this ticket as I discover more, but I currently believe this bug 
can be reproduced as follows.
 # The Drill writer format is set to Parquet.
 # A CTAS statement is issued over JDBC (the bug does not appear to manifest 
for the same query received over REST).
 # The CTAS statement spawns multiple Parquet writer fragments. It may also be 
necessary that these fragments are distributed over more than one Drillbit 
(unconfirmed on a single Drillbit).
 # The query is apparently cancelled (by the Drill/JDBC client?) before all of 
the writer fragments have completed.
 # Some writer fragments have created no output file at all. Others have 
created invalid, zero-byte Parquet files. Others have created valid empty 
Parquet files and others have created valid non-empty Parquet files.
 # A subsequent query against the destination fails because it encounters 
zero-byte Parquet files.
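
To tell the output cases in the last two steps apart, a check along the following lines may help. It is only a sketch, not part of Drill: the class name ParquetOutputCheck and its methods are hypothetical, and it assumes the CTAS destination is a directory on the local filesystem with files ending in ".parquet". It relies on the Parquet format's leading and trailing "PAR1" magic bytes (unencrypted footers only), so it can separate zero-byte and malformed files from plausible ones, but it does not parse the footer and so cannot distinguish valid empty files from valid non-empty ones.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical helper, not a Drill API: flags suspicious Parquet writer output.
class ParquetOutputCheck {
    private static final byte[] MAGIC = "PAR1".getBytes(StandardCharsets.US_ASCII);
    // Smallest conceivable file: leading magic + 4-byte footer length + trailing magic.
    private static final long MIN_LENGTH = 12;

    enum Status { ZERO_BYTE, INVALID, PLAUSIBLE }

    // Classifies one file by its size and magic bytes only; the footer is not parsed.
    static Status classify(Path file) throws IOException {
        long size = Files.size(file);
        if (size == 0) {
            return Status.ZERO_BYTE;
        }
        if (size < MIN_LENGTH) {
            return Status.INVALID;
        }
        byte[] head = new byte[4];
        byte[] tail = new byte[4];
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            raf.readFully(head);
            raf.seek(size - 4);
            raf.readFully(tail);
        }
        boolean magicOk = Arrays.equals(head, MAGIC) && Arrays.equals(tail, MAGIC);
        return magicOk ? Status.PLAUSIBLE : Status.INVALID;
    }

    // Returns every *.parquet file directly under dir that is zero-byte or malformed.
    static List<Path> findBad(Path dir) throws IOException {
        List<Path> bad = new ArrayList<>();
        try (Stream<Path> files = Files.list(dir)) {
            for (Path p : (Iterable<Path>) files::iterator) {
                boolean isParquet = Files.isRegularFile(p)
                        && p.getFileName().toString().endsWith(".parquet");
                if (isParquet && classify(p) != Status.PLAUSIBLE) {
                    bad.add(p);
                }
            }
        }
        return bad;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the first argument is the CTAS destination directory.
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path p : findBad(dir)) {
            System.out.println(classify(p) + " " + p);
        }
    }
}
```

Running this against the destination after a cancelled CTAS should list the files that would trip up a subsequent query. A fragment that created no file at all cannot be detected this way.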

  was:
I'll refine this ticket as I discover more but at the current time I believe 
this bug can be reproduced as follows.
 # The Drill writer format is set to Parquet.
 # A CTAS statement is issued over JDBC (the bug does not appear to manifest 
for the same query received over REST).
 # The CTAS statement spawns multiple Parquet writer fragments. It may also be 
necessary that these fragments are distributed over more than one Drillbit 
(unconfirmed on a single Drillbit).
 # Some of the Parquet writer fragments receive batches containing zero records.
 # The query is apparently cancelled (by the Drill/JDBC client?) before all of 
the writer fragments have completed.
 # Some writer fragments have created no output file at all. Others have 
created invalid, zero-byte Parquet files. Others have created valid empty 
Parquet files and others have created valid non-empty Parquet files.
 # A subsequent query against the destination fails because it encounters 
zero-byte Parquet files.


> Zero-record Parquet writer fragments result in query cancellation and 
> zero-byte Parquet files
> ---------------------------------------------------------------------------------------------
>
>                 Key: DRILL-8388
>                 URL: https://issues.apache.org/jira/browse/DRILL-8388
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Writer
>    Affects Versions: 1.20.3
>            Reporter: James Turton
>            Assignee: James Turton
>            Priority: Major
>             Fix For: 1.21.0



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
