[ https://issues.apache.org/jira/browse/IMPALA-11345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Quanlong Huang updated IMPALA-11345:
------------------------------------
    Priority: Critical  (was: Major)

> Query failed when creating equal conjunction map for Parquet bloom filter
> -------------------------------------------------------------------------
>
>                 Key: IMPALA-11345
>                 URL: https://issues.apache.org/jira/browse/IMPALA-11345
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend, Distributed Exec
>    Affects Versions: Impala 4.1.0
>         Environment: CentOS-7, Impala-4.1
>            Reporter: Yuchen Fan
>            Priority: Critical
>
> When querying a Hive table to which columns were added without using
> 'cascade', Impala fails with an error like "Unable to find SchemaNode for path
> 'db.table.column' in the schema of file
> 'hdfs://xxx/path/to/parquet_file_before_add_column'." I checked the Parquet
> file named in the error log and found that its schema is not compatible with
> the table metadata. The call stack is attached below. Paths and table names
> are masked:
> {code:java}
> I0609 18:04:25.970052 115413 status.cc:129] c94d0ab3fdf8f943:3203006100000002] Unable to find SchemaNode for path 'xxx_db.xxx_table.xxx_column' in the schema of file 'hdfs://xxx_nn/xxx_table_path/000000_0'.
>     @ 0xea543b impala::Status::Status()
>     @ 0x1e3225c impala::HdfsParquetScanner::CreateColIdx2EqConjunctMap()
>     @ 0x1e363ea impala::HdfsParquetScanner::Open()
>     @ 0x19b40d0 impala::HdfsScanNodeBase::CreateAndOpenScannerHelper()
>     @ 0x1b5cbae impala::HdfsScanNode::ProcessSplit()
>     @ 0x1b5e12a impala::HdfsScanNode::ScannerThread()
>     @ 0x1b5e9c6 _ZN5boost6detail8function26void_function_obj_invoker0IZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS3_18ThreadResourcePoolEEUlvE_vE6invokeERNS1_15function_bufferE
>     @ 0x18eafa9 impala::Thread::SuperviseThread()
>     @ 0x18ee11a boost::detail::thread_data<>::run()
>     @ 0x2385510 thread_proxy
>     @ 0x7fb5b0745162 start_thread
>     @ 0x7fb5ad21df6c __clone
> {code}
> The error may be related to
> [IMPALA-10640|https://issues.apache.org/jira/browse/IMPALA-10640].
> The Bloom filter requires the right-hand values of equality conjuncts to
> match the current file schema, so the filter is unavailable when the column
> does not exist in every Parquet file scanned. I think we can disable the
> Parquet Bloom filter for this single query or scan node when such a situation
> is detected.
>
> How to reproduce (using impala-shell):
> # create table parquet_test (id INT) stored as parquet;
> # insert into parquet_test values (1),(2),(3);
> # alter table parquet_test add columns (name STRING);
> # insert into parquet_test values (4, "James");
> # select * from parquet_test where name in ("Lily");
> # The error occurs.
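
The suggestion above (skip the Bloom filter when a file lacks the column, rather than failing the query) can be sketched roughly as follows. This is only an illustration under stated assumptions, not Impala's code: `EqConjunct`, `FileSchema`, `FindColumn`, and `BuildColIdxToEqConjuncts` are hypothetical stand-ins invented for this example, and a file schema is modelled as a flat list of column names.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration only; these are NOT Impala's types.
struct EqConjunct {
  std::string column;   // column referenced by the equality / IN predicate
  std::string literal;  // right-hand literal value
};

// Column names stored in one Parquet file, in file order (assumed flat schema).
typedef std::vector<std::string> FileSchema;

// Returns the index of `column` in `schema`, or -1 if the file was written
// before the column was added to the table.
int FindColumn(const FileSchema& schema, const std::string& column) {
  for (std::size_t i = 0; i < schema.size(); ++i) {
    if (schema[i] == column) return static_cast<int>(i);
  }
  return -1;
}

// Builds a map from file column index to the conjuncts on that column.
// A conjunct whose column is missing from this file is skipped, so no Bloom
// filter is consulted for it, instead of returning an error for the scan.
std::map<int, std::vector<EqConjunct> > BuildColIdxToEqConjuncts(
    const FileSchema& schema, const std::vector<EqConjunct>& conjuncts) {
  std::map<int, std::vector<EqConjunct> > result;
  for (std::size_t i = 0; i < conjuncts.size(); ++i) {
    int idx = FindColumn(schema, conjuncts[i].column);
    if (idx < 0) continue;  // column absent in this file: skip Bloom filtering
    result[idx].push_back(conjuncts[i]);
  }
  return result;
}
```

With the reproduction above, the file written before `alter table` would have the schema `{"id"}`, so the conjunct on `name` yields an empty map for that file, while the newer file with `{"id", "name"}` maps column index 1 to the conjunct. Skipping only affects whether the Bloom filter is consulted; the predicate itself would still have to be evaluated by the normal scan path.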