[ https://issues.apache.org/jira/browse/IMPALA-7612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Zsombor Fedor updated IMPALA-7612:
----------------------------------
    Description: 
An empty Parquet file with no rows in it causes a warning in the explain output:
{code:java}
WARNING: The following tables have potentially corrupt table statistics. Drop and re-compute statistics to resolve this problem.
{code}
The warning is shown even after
{code:java}
compute stats tp;{code}
because:
{code:java}
partitions=1/1 files=1 size=220B{code}
but numRows = 0.

A simple reproduction:
{code:java}
create table tp (a int);{code}
Create an empty.csv file.
Create a Parquet file from the CSV with a simple MR job:
[https://github.com/tomwhite/hadoop-book/blob/master/ch13-parquet/src/main/java/TextToParquetWithAvro.java]
using the following schema:
{code:java}
"{\n" +
" \"type\": \"record\",\n" +
" \"name\": \"tp\",\n" +
" \"doc\": \"Avro schema for table tp\",\n" +
" \"fields\":\n" +
" [\n" +
" {\"name\": \"a\", \"type\": \"int\"}\n"+
" ]\n"+
"}\n");{code}
Put the output Parquet file onto HDFS, then run
{code:java}
compute stats tp;
explain select * from tp;
{code}

  was:
An empty Parquet file with no rows in it causes a warning in the explain output:
{code:java}
WARNING: The following tables have potentially corrupt table statistics. Drop and re-compute statistics to resolve this problem.
{code}
The warning is shown even after
{code:java}
compute stats tp;{code}
because:
{code:java}
partitions=1/1 files=1 size=220B{code}
but numRows = 0.

A simple reproduction:
{code:java}
create table tp (a int);{code}
Create an empty.csv file.
Create a Parquet file from the CSV with a simple MR job:
[https://github.com/tomwhite/hadoop-book/blob/master/ch13-parquet/src/main/java/TextToParquetWithAvro.java]
using the following schema:
{code:java}
"{\n" +
" \"type\": \"record\",\n" +
" \"name\": \"tp\",\n" +
" \"doc\": \"Avro schema for table tp\",\n" +
" \"fields\":\n" +
" [\n" +
" {\"name\": \"a\", \"type\": \"int\"}\n"+
" ]\n"+
"}\n");{code}


> Parquet file with no rows in it causing WARNING in explain
> ----------------------------------------------------------
>
>                 Key: IMPALA-7612
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7612
>             Project: IMPALA
>          Issue Type: New Feature
>          Components: Frontend
>    Affects Versions: Impala 2.12.0
>            Reporter: Zsombor Fedor
>            Priority: Major
>
> An empty Parquet file with no rows in it causes a warning in the explain output:
> {code:java}
> WARNING: The following tables have potentially corrupt table statistics. Drop
> and re-compute statistics to resolve this problem.
> {code}
> The warning is shown even after
> {code:java}
> compute stats tp;{code}
> because:
> {code:java}
> partitions=1/1 files=1 size=220B{code}
> but numRows = 0.
>
> A simple reproduction:
> {code:java}
> create table tp (a int);{code}
> Create an empty.csv file.
> Create a Parquet file from the CSV with a simple MR job:
> [https://github.com/tomwhite/hadoop-book/blob/master/ch13-parquet/src/main/java/TextToParquetWithAvro.java]
> using the following schema:
> {code:java}
> "{\n" +
> " \"type\": \"record\",\n" +
> " \"name\": \"tp\",\n" +
> " \"doc\": \"Avro schema for table tp\",\n" +
> " \"fields\":\n" +
> " [\n" +
> " {\"name\": \"a\", \"type\": \"int\"}\n"+
> " ]\n"+
> "}\n");{code}
> Put the output Parquet file onto HDFS, then run
> {code:java}
> compute stats tp;
> explain select * from tp;
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
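
The report attributes the warning to a non-empty file (size=220B) combined with numRows = 0. A check of roughly the following shape would produce that behaviour. This is only a sketch under that assumption: the class name `EmptyParquetStatsCheck` and the method `looksCorrupt` are hypothetical and are not taken from the Impala code base, and the convention that -1 means "no statistics" is likewise assumed here.

```java
// Hypothetical sketch of a "potentially corrupt stats" heuristic.
// Names and the -1 "unknown" convention are assumptions for illustration,
// not Impala's actual API.
public final class EmptyParquetStatsCheck {
    private EmptyParquetStatsCheck() {}

    /**
     * @param numRows    row count recorded by compute stats (-1 assumed to mean unknown)
     * @param totalBytes total size of the files backing the table or partition
     * @return true if the stats would be reported as potentially corrupt
     */
    public static boolean looksCorrupt(long numRows, long totalBytes) {
        // A row count below -1 can never be valid.
        if (numRows < -1) {
            return true;
        }
        // Zero rows but non-zero bytes: the case hit by an empty Parquet file,
        // whose header, footer and schema metadata still occupy space on disk.
        return numRows == 0 && totalBytes > 0;
    }

    public static void main(String[] args) {
        // The situation from the report: one 220-byte file, zero rows.
        System.out.println(looksCorrupt(0L, 220L)); // true
        // No files at all: zero rows is consistent with zero bytes.
        System.out.println(looksCorrupt(0L, 0L));   // false
        // Stats not computed yet (assumed -1): not flagged by this check.
        System.out.println(looksCorrupt(-1L, 220L)); // false
    }
}
```

Under this assumption, a Parquet file that legitimately holds no rows is indistinguishable from stale statistics, since its metadata alone gives it a non-zero size. That matches the report's observation that re-running compute stats does not clear the warning.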