[ https://issues.apache.org/jira/browse/DRILL-8388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
James Turton updated DRILL-8388:
--------------------------------
    Description:
I'll refine this ticket as I discover more, but at the current time I believe this bug can be reproduced as follows.
 # The Drill writer format is set to Parquet.
 # A CTAS statement is issued over JDBC (the bug does not appear to manifest for the same query received over REST).
 # The CTAS statement spawns multiple Parquet writer fragments. It may also be necessary that these fragments are distributed over more than one Drillbit (unconfirmed on a single Drillbit).
 # The query is apparently cancelled (by the Drill/JDBC client?) before all of the writer fragments have completed.
 # Some writer fragments have created no output file at all. Others have created invalid, zero-byte Parquet files. Others have created valid empty Parquet files and others have created valid non-empty Parquet files.
 # A subsequent query against the destination fails because it encounters zero-byte Parquet files.

  was:
I'll refine this ticket as I discover more, but at the current time I believe this bug can be reproduced as follows.
 # The Drill writer format is set to Parquet.
 # A CTAS statement is issued over JDBC (the bug does not appear to manifest for the same query received over REST).
 # The CTAS statement spawns multiple Parquet writer fragments. It may also be necessary that these fragments are distributed over more than one Drillbit (unconfirmed on a single Drillbit).
 # Some of the Parquet writer fragments receive batches containing zero records.
 # The query is apparently cancelled (by the Drill/JDBC client?) before all of the writer fragments have completed.
 # Some writer fragments have created no output file at all. Others have created invalid, zero-byte Parquet files. Others have created valid empty Parquet files and others have created valid non-empty Parquet files.
 # A subsequent query against the destination fails because it encounters zero-byte Parquet files.
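
The mix of output files described in the last two steps can be checked for directly in the CTAS destination directory. The following is a minimal sketch, not part of Drill: it assumes the destination is reachable as a local (or mounted) path and that the files are unencrypted Parquet, which begin and end with the 4-byte magic "PAR1". The function name classify_parquet_files and the report keys are hypothetical. A file listed as "plausible" has only passed the magic-byte check; its footer is not validated.

```python
from pathlib import Path

PARQUET_MAGIC = b"PAR1"
# Smallest well-formed layout: leading magic (4) + footer length (4) + trailing magic (4).
MIN_PARQUET_SIZE = 12


def classify_parquet_files(table_dir):
    """Sort the *.parquet files under table_dir into three buckets.

    zero_byte: empty files, the kind that break later queries
    bad_magic: non-empty files lacking the PAR1 magic at either end
    plausible: files with PAR1 at both ends (footer NOT validated)
    """
    report = {"zero_byte": [], "bad_magic": [], "plausible": []}
    for path in sorted(Path(table_dir).rglob("*.parquet")):
        size = path.stat().st_size
        if size == 0:
            report["zero_byte"].append(path)
            continue
        with path.open("rb") as f:
            head = f.read(4)
            tail = b""
            if size >= MIN_PARQUET_SIZE:
                f.seek(-4, 2)  # 2 = seek relative to end of file
                tail = f.read(4)
        if head == PARQUET_MAGIC and tail == PARQUET_MAGIC:
            report["plausible"].append(path)
        else:
            report["bad_magic"].append(path)
    return report
```

Running this against the destination after a failed CTAS and inspecting report["zero_byte"] should show which writer fragments left the files that a subsequent query trips over.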

> Zero-record Parquet writer fragments result in query cancellation and
> zero-byte Parquet files
> ---------------------------------------------------------------------------------------------
>
>                 Key: DRILL-8388
>                 URL: https://issues.apache.org/jira/browse/DRILL-8388
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Writer
>    Affects Versions: 1.20.3
>            Reporter: James Turton
>            Assignee: James Turton
>            Priority: Major
>             Fix For: 1.21.0
>
>
> I'll refine this ticket as I discover more, but at the current time I believe
> this bug can be reproduced as follows.
>  # The Drill writer format is set to Parquet.
>  # A CTAS statement is issued over JDBC (the bug does not appear to manifest
> for the same query received over REST).
>  # The CTAS statement spawns multiple Parquet writer fragments. It may also
> be necessary that these fragments are distributed over more than one Drillbit
> (unconfirmed on a single Drillbit).
>  # The query is apparently cancelled (by the Drill/JDBC client?) before all
> of the writer fragments have completed.
>  # Some writer fragments have created no output file at all. Others have
> created invalid, zero-byte Parquet files. Others have created valid empty
> Parquet files and others have created valid non-empty Parquet files.
>  # A subsequent query against the destination fails because it encounters
> zero-byte Parquet files.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)