[jira] [Updated] (IMPALA-7612) Parquet file with no rows in it causing WARNING in explain

Zsombor Fedor (JIRA) Mon, 24 Sep 2018 03:25:21 -0700


     [ 
https://issues.apache.org/jira/browse/IMPALA-7612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Zsombor Fedor updated IMPALA-7612:
----------------------------------
    Description: 
An empty Parquet file (one with no rows in it) causes a warning in the explain output:
{code:java}
WARNING: The following tables have potentially corrupt table statistics. Drop 
and re-compute statistics to resolve this problem. {code}
This warning is shown even after running
{code:java}
compute stats tp;{code}
because the plan shows:
{code:java}
partitions=1/1 files=1 size=220B{code}
but numRows = 0.
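
The condition described above can be sketched as follows. This is only an illustration built from the values in this report (size=220B, numRows = 0), not Impala's planner code; the class name {{CorruptStatsHeuristic}} and the method {{looksCorrupt}} are hypothetical.

```java
public class CorruptStatsHeuristic {
    // Hypothetical helper, not Impala's actual code: models the situation
    // in this report, where a table has bytes on disk but a computed row
    // count of zero and is therefore flagged as having suspicious stats.
    public static boolean looksCorrupt(long numRows, long totalFileBytes) {
        return numRows == 0 && totalFileBytes > 0;
    }

    public static void main(String[] args) {
        // Values from the report: files=1 size=220B, numRows = 0.
        System.out.println(looksCorrupt(0, 220));  // true
        // No data files at all: zero rows is expected.
        System.out.println(looksCorrupt(0, 0));    // false
        // Non-empty table with matching stats.
        System.out.println(looksCorrupt(5, 220));  // false
    }
}
```

A valid Parquet file still contains a header and footer, so its size is non-zero even when it holds no rows. A check of this shape cannot tell such a file apart from genuinely stale statistics.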

A simple reproduction:
{code:java}
create table tp (a int);{code}
Create an empty .csv file.

Create a Parquet file from the CSV with a simple MapReduce job:

[https://github.com/tomwhite/hadoop-book/blob/master/ch13-parquet/src/main/java/TextToParquetWithAvro.java]

using the following schema:
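{code:java}
// Avro schema string for table tp; in the linked job this would be the
// argument to new Schema.Parser().parse(...) (assumed wrapper).
new Schema.Parser().parse(
 "{\n" +
 " \"type\": \"record\",\n" +
 " \"name\": \"tp\",\n" +
 " \"doc\": \"Avro schema for table tp\",\n" +
 " \"fields\":\n" +
 " [\n" +
 " {\"name\": \"a\", \"type\": \"int\"}\n" +
 " ]\n" +
 "}\n");{code}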

Put the output Parquet file onto HDFS, then run:

{code:java}
compute stats tp;
explain select * from tp;
{code}



  was:
An empty Parquet file (one with no rows in it) causes a warning in the explain output:
{code:java}
WARNING: The following tables have potentially corrupt table statistics. Drop 
and re-compute statistics to resolve this problem. {code}
This warning is shown even after running
{code:java}
compute stats tp;{code}
because the plan shows:
{code:java}
partitions=1/1 files=1 size=220B{code}
but numRows = 0.

A simple reproduction:
{code:java}
create table tp (a int);{code}
Create an empty .csv file.

Create a Parquet file from the CSV with a simple MapReduce job:

[https://github.com/tomwhite/hadoop-book/blob/master/ch13-parquet/src/main/java/TextToParquetWithAvro.java]

using the following schema:
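{code:java}
// Avro schema string for table tp; in the linked job this would be the
// argument to new Schema.Parser().parse(...) (assumed wrapper).
new Schema.Parser().parse(
 "{\n" +
 " \"type\": \"record\",\n" +
 " \"name\": \"tp\",\n" +
 " \"doc\": \"Avro schema for table tp\",\n" +
 " \"fields\":\n" +
 " [\n" +
 " {\"name\": \"a\", \"type\": \"int\"}\n" +
 " ]\n" +
 "}\n");{code}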


> Parquet file with no rows in it causing WARNING in explain
> ----------------------------------------------------------
>
>                 Key: IMPALA-7612
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7612
>             Project: IMPALA
>          Issue Type: New Feature
>          Components: Frontend
>    Affects Versions: Impala 2.12.0
>            Reporter: Zsombor Fedor
>            Priority: Major



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

