Hyunsik Choi created TAJO-806:
---------------------------------
Summary: CreateTableNode in CTAS has the wrong output schema and table schema.
Key: TAJO-806
URL: https://issues.apache.org/jira/browse/TAJO-806
Project: Tajo
Issue Type: Bug
Components: planner/optimizer, storage
Reporter: Hyunsik Choi
Assignee: Hyunsik Choi
Fix For: 0.9.0, 0.8.1
In the case below, TajoWriteSupport currently takes the schema of the source table
{{orders}}. In other words, each column is qualified as {{default.orders}}
instead of {{default.parquet_test}}. This is a bug, and it causes the following
error when the resulting Parquet files are read:
{noformat}
default> create table parquet_test using parquet as select * from orders;
Progress: 0%, response time: 1.119 sec
Progress: 0%, response time: 2.121 sec
Progress: 0%, response time: 3.123 sec
Progress: 83%, response time: 4.126 sec
Progress: 100%, response time: 4.709 sec
(1500000 rows, 4.709 sec, 109.9 MiB inserted)
default> select * from parquet_test;
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
Exception in thread "main" java.lang.NullPointerException
at parquet.hadoop.InternalParquetRecordReader.close(InternalParquetRecordReader.java:118)
at parquet.hadoop.ParquetReader.close(ParquetReader.java:144)
at org.apache.tajo.storage.parquet.ParquetScanner.close(ParquetScanner.java:87)
at org.apache.tajo.storage.MergeScanner.close(MergeScanner.java:137)
at org.apache.tajo.jdbc.TajoResultSet.close(TajoResultSet.java:153)
at org.apache.tajo.cli.TajoCli.localQueryCompleted(TajoCli.java:387)
at org.apache.tajo.cli.TajoCli.executeQuery(TajoCli.java:365)
at org.apache.tajo.cli.TajoCli.executeParsedResults(TajoCli.java:322)
at org.apache.tajo.cli.TajoCli.runShell(TajoCli.java:311)
at org.apache.tajo.cli.TajoCli.main(TajoCli.java:490)
Apr 30, 2014 11:04:01 AM INFO: parquet.hadoop.ParquetFileReader: reading another 1 footers
{noformat}
This patch fixes the bug so that CreateTableNode takes the schema of the target table instead of the schema of the source table.
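The following is a minimal sketch of the idea behind the fix, not Tajo's actual code. It assumes column names are plain strings qualified as {{database.table.column}}; the class {{RequalifyExample}}, its methods, and the TPC-H-style column names are hypothetical illustrations. It shows how the CTAS output columns can be re-qualified with the target table ({{default.parquet_test}}) instead of keeping the source qualifier ({{default.orders}}).

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical illustration (not Tajo's API): rewrite the qualifier of each
 * column so that the CTAS output schema belongs to the target table instead
 * of the source table.
 */
public class RequalifyExample {

    /** Replaces the "database.table" prefix of a qualified column name. */
    static String requalify(String qualifiedColumn, String targetTable) {
        int lastDot = qualifiedColumn.lastIndexOf('.');
        // The simple column name is everything after the last dot
        // (or the whole string if the name is unqualified).
        String simpleName = qualifiedColumn.substring(lastDot + 1);
        return targetTable + "." + simpleName;
    }

    /** Re-qualifies every column of a source schema for the target table. */
    static List<String> requalifyAll(List<String> sourceColumns, String targetTable) {
        List<String> result = new ArrayList<>();
        for (String column : sourceColumns) {
            result.add(requalify(column, targetTable));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> ordersSchema = List.of(
            "default.orders.o_orderkey",
            "default.orders.o_custkey");
        // Without re-qualification, the output would keep "default.orders".
        System.out.println(requalifyAll(ordersSchema, "default.parquet_test"));
    }
}
```

Running {{main}} prints {{[default.parquet_test.o_orderkey, default.parquet_test.o_custkey]}}.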
In addition, I found a potential problem: Parquet files written by Tajo store the
Tajo schema in their extra metadata. This will cause problems when users rename
the database or the table, because the column qualifiers embedded in the stored
schema would then be stale. So I removed the code that inserts the Tajo schema
into the extra metadata, and I changed Parquet reading so that it no longer uses
the extra metadata.
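To make the rename problem concrete, here is a small hypothetical simulation, not Tajo's code. It assumes the footer's extra metadata is a string map, uses a made-up key {{tajo.schema}}, and assumes a rename updates only the catalog while existing files are not rewritten. Under those assumptions the embedded schema no longer matches the catalog after the rename.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class StaleMetadataExample {
    // Hypothetical metadata key, chosen only for this illustration.
    static final String SCHEMA_KEY = "tajo.schema";

    /** Simulates writing a file footer that embeds the qualified column names. */
    static Map<String, String> writeFooter(List<String> qualifiedColumns) {
        Map<String, String> extraMetaData = new HashMap<>();
        extraMetaData.put(SCHEMA_KEY, String.join(",", qualifiedColumns));
        return extraMetaData;
    }

    /** Returns true if the schema embedded in the footer still matches the catalog. */
    static boolean footerMatchesCatalog(Map<String, String> footer, List<String> catalogColumns) {
        return String.join(",", catalogColumns).equals(footer.get(SCHEMA_KEY));
    }

    public static void main(String[] args) {
        Map<String, String> footer = writeFooter(List.of("default.parquet_test.o_orderkey"));
        // Assumption: after a rename, only the catalog changes; the file footer does not.
        List<String> catalogAfterRename = List.of("default.orders_pq.o_orderkey");
        System.out.println(footerMatchesCatalog(footer, catalogAfterRename));
    }
}
```

Running {{main}} prints {{false}}: the stale qualifier {{default.parquet_test}} in the footer disagrees with the renamed table in the catalog.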
Tajo mainly uses its catalog system to manage schemas, and reading Parquet files
in Tajo depends on the Tajo catalog, so this change should work well. Other
systems can still access the Parquet files by reading Parquet's native schema
directly.
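A minimal sketch of why reading through the catalog is robust to renames, under the assumption that the file's native schema uses simple (unqualified) field names and that the reader matches catalog columns to file fields by simple name. The class {{CatalogProjectionExample}} and its {{resolve}} method are hypothetical; this is not how Tajo's ParquetScanner is implemented.

```java
import java.util.ArrayList;
import java.util.List;

public class CatalogProjectionExample {

    /** Strips the "database.table." qualifier, if any. */
    static String simpleName(String qualified) {
        return qualified.substring(qualified.lastIndexOf('.') + 1);
    }

    /**
     * For each catalog column, returns the index of the matching field in the
     * file's native schema, or -1 if the file has no such field.
     */
    static List<Integer> resolve(List<String> catalogColumns, List<String> fileFields) {
        List<Integer> indexes = new ArrayList<>();
        for (String column : catalogColumns) {
            indexes.add(fileFields.indexOf(simpleName(column)));
        }
        return indexes;
    }

    public static void main(String[] args) {
        // Native field names in the file carry no database or table qualifier.
        List<String> fileFields = List.of("o_orderkey", "o_custkey", "o_orderstatus");
        // The catalog reflects a renamed table; matching by simple name still works.
        List<String> catalog = List.of("default.orders_pq.o_orderstatus", "default.orders_pq.o_orderkey");
        System.out.println(resolve(catalog, fileFields));
    }
}
```

Running {{main}} prints {{[2, 0]}}. Because nothing table-specific is stored in the file, the same file stays readable after a rename, and external readers can use the native field names directly.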