[ https://issues.apache.org/jira/browse/IMPALA-11414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17564895#comment-17564895 ]
Quanlong Huang commented on IMPALA-11414:
-----------------------------------------

Oops, just realized the new test added in [https://gerrit.cloudera.org/c/18700/2/testdata/workloads/functional-query/queries/QueryTest/parquet-late-materialization-unique-db.test] answers my question. The stack trace is:
{noformat}
F0711 17:17:40.015542 13326 parquet-column-readers.cc:1285] fe42cd2b76c34693:a76d783b00000001] Check failed: num_rows > 0 (0 vs. 0)
*** Check failure stack trace: ***
    @          0x36fa43c  google::LogMessage::Fail()
    @          0x36fbcec  google::LogMessage::SendToLog()
    @          0x36f9d9a  google::LogMessage::Flush()
    @          0x36fd958  google::LogMessageFatal::~LogMessageFatal()
    @          0x1ec98f4  impala::BaseScalarColumnReader::SkipTopLevelRows<>()
    @          0x1ecb298  impala::BaseScalarColumnReader::SkipRowsInternal<>()
    @          0x1ecb6fc  impala::BaseScalarColumnReader::SkipRows()
    @          0x1e4ecf5  impala::HdfsParquetScanner::FillScratchMicroBatches()
    @          0x1e682e1  impala::HdfsParquetScanner::AssembleRows<>()
    @          0x1e64c5f  impala::HdfsParquetScanner::GetNextInternal()
    @          0x1e50119  impala::HdfsParquetScanner::ProcessSplit()
    @          0x1a9bf93  impala::HdfsScanNode::ProcessSplit()
    @          0x1a9d0d0  impala::HdfsScanNode::ScannerThread()
    @          0x1a9d96c  _ZN5boost6detail8function26void_function_obj_invoker0IZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS3_18ThreadResourcePoolEEUlvE_vE6invokeERNS1_15function_bufferE
    @          0x1833d3a  impala::Thread::SuperviseThread()
    @          0x1836eac  boost::detail::thread_data<>::run()
    @          0x23b3070  thread_proxy
    @     0x7f63f58da6b9  start_thread
    @     0x7f63f230c4dc  clone
{noformat}

> Off-by-one error in Parquet late materialization
> ------------------------------------------------
>
>                 Key: IMPALA-11414
>                 URL: https://issues.apache.org/jira/browse/IMPALA-11414
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>            Reporter: Zoltán Borók-Nagy
>            Assignee: Zoltán Borók-Nagy
>            Priority: Major
>
> PARQUET_LATE_MATERIALIZATION sets the minimum number of consecutive
> filtered-out rows required before we skip materializing the corresponding
> rows of the other columns in Parquet.
> E.g. if PARQUET_LATE_MATERIALIZATION is 10, and in a filtered column we find
> at least 10 consecutive rows that don't pass the predicates, we avoid
> materializing the corresponding rows in the other columns.
> But due to an off-by-one error we actually only need
> (PARQUET_LATE_MATERIALIZATION - 1) consecutive rows. This means that if we set
> PARQUET_LATE_MATERIALIZATION to 1, then zero consecutive filtered-out rows
> are enough, which leads to a crash/DCHECK. The bug is in the
> GetMicroBatches() algorithm, which produces the micro batches based on the
> selected rows.
> Setting PARQUET_LATE_MATERIALIZATION to 0 doesn't make sense, so it shouldn't
> be allowed.
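
For illustration only, the off-by-one described in the issue can be modelled in a few lines of Python. This is a simplified sketch, not Impala's C++ GetMicroBatches(): the function names, the list-of-booleans input, and the `buggy` flag are all hypothetical. It assumes that a zero-length skip between two micro batches corresponds to the failed `num_rows > 0` check in the stack trace, and it rejects thresholds below 1 as the issue suggests.

```python
def micro_batches(selected, threshold, buggy=False):
    """Group selected row indexes into inclusive (start, end) ranges.

    A new range starts only when the run of filtered-out rows since the
    last selected row is at least `threshold` long. With buggy=True the
    comparison is off by one, so a run of threshold - 1 rows is enough.
    """
    if threshold < 1:
        raise ValueError("threshold must be at least 1")
    batches = []
    start = last = None
    for i, keep in enumerate(selected):
        if not keep:
            continue
        if start is None:
            start = last = i
            continue
        gap = i - last - 1  # filtered-out rows between two selected rows
        split = gap >= threshold - 1 if buggy else gap >= threshold
        if split:
            batches.append((start, last))
            start = i
        last = i
    if start is not None:
        batches.append((start, last))
    return batches


def skipped_between(batches):
    """Rows skipped between consecutive ranges (0 models the failed check)."""
    return [nxt[0] - cur[1] - 1 for cur, nxt in zip(batches, batches[1:])]


rows = [True, True, False, True]
print(micro_batches(rows, 1, buggy=True))                   # [(0, 0), (1, 1), (3, 3)]
print(skipped_between(micro_batches(rows, 1, buggy=True)))  # [0, 1]
print(micro_batches(rows, 1))                               # [(0, 1), (3, 3)]
```

With a threshold of 1, the off-by-one variant splits two adjacent selected rows into separate batches and asks for a skip of zero rows between them. The corrected comparison only splits where at least one row was actually filtered out.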