[jira] [Commented] (IMPALA-11414) Off-by-one error in Parquet late materialization

Quanlong Huang (Jira) Mon, 11 Jul 2022 02:22:07 -0700


    [ 
https://issues.apache.org/jira/browse/IMPALA-11414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17564895#comment-17564895
 ]


Quanlong Huang commented on IMPALA-11414:
-----------------------------------------

Oops, I just realized that the new test added in
[https://gerrit.cloudera.org/c/18700/2/testdata/workloads/functional-query/queries/QueryTest/parquet-late-materialization-unique-db.test]
answers my question. The stack trace is:
{noformat}
F0711 17:17:40.015542 13326 parquet-column-readers.cc:1285] fe42cd2b76c34693:a76d783b00000001] Check failed: num_rows > 0 (0 vs. 0)
*** Check failure stack trace: *** 
    @          0x36fa43c  google::LogMessage::Fail()
    @          0x36fbcec  google::LogMessage::SendToLog()
    @          0x36f9d9a  google::LogMessage::Flush()
    @          0x36fd958  google::LogMessageFatal::~LogMessageFatal()
    @          0x1ec98f4  impala::BaseScalarColumnReader::SkipTopLevelRows<>()
    @          0x1ecb298  impala::BaseScalarColumnReader::SkipRowsInternal<>()
    @          0x1ecb6fc  impala::BaseScalarColumnReader::SkipRows()
    @          0x1e4ecf5  impala::HdfsParquetScanner::FillScratchMicroBatches()
    @          0x1e682e1  impala::HdfsParquetScanner::AssembleRows<>()
    @          0x1e64c5f  impala::HdfsParquetScanner::GetNextInternal()
    @          0x1e50119  impala::HdfsParquetScanner::ProcessSplit()
    @          0x1a9bf93  impala::HdfsScanNode::ProcessSplit()
    @          0x1a9d0d0  impala::HdfsScanNode::ScannerThread()
    @          0x1a9d96c  _ZN5boost6detail8function26void_function_obj_invoker0IZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS3_18ThreadResourcePoolEEUlvE_vE6invokeERNS1_15function_bufferE
    @          0x1833d3a  impala::Thread::SuperviseThread()
    @          0x1836eac  boost::detail::thread_data<>::run()
    @          0x23b3070  thread_proxy
    @     0x7f63f58da6b9  start_thread
    @     0x7f63f230c4dc  clone
{noformat}

> Off-by-one error in Parquet late materialization
> ------------------------------------------------
>
>                 Key: IMPALA-11414
>                 URL: https://issues.apache.org/jira/browse/IMPALA-11414
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>            Reporter: Zoltán Borók-Nagy
>            Assignee: Zoltán Borók-Nagy
>            Priority: Major
>
> With PARQUET_LATE_MATERIALIZATION we can set the minimum number of
> consecutive filtered-out rows required before we skip materializing the
> corresponding rows in the other Parquet columns.
> E.g. if PARQUET_LATE_MATERIALIZATION is 10 and we find at least 10
> consecutive rows in a filtered column that don't pass the predicates, we
> avoid materializing the corresponding rows in the other columns.
> But due to an off-by-one error we actually only need
> (PARQUET_LATE_MATERIALIZATION - 1) consecutive rows. This means that if we
> set PARQUET_LATE_MATERIALIZATION to 1, zero consecutive filtered-out rows
> are enough, which leads to a crash/DCHECK. The bug is in the
> GetMicroBatches() algorithm, which produces the micro batches based on the
> selected rows.
> Setting PARQUET_LATE_MATERIALIZATION to 0 doesn't make sense, so it
> shouldn't be allowed.
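
To make the off-by-one in the description concrete, here is a minimal sketch of
how selected rows might be grouped into micro batches. It is not Impala's actual
GetMicroBatches() code: MicroBatch, GetMicroBatchesSketch, RowsToSkip and the
gap_bias parameter are hypothetical names introduced only for illustration.
gap_bias = 1 models the bug described above (a run of threshold - 1 filtered-out
rows already splits a batch), and gap_bias = 0 models the intended behavior.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a micro batch: an inclusive range of row indexes.
struct MicroBatch {
  int start;
  int end;
};

// Groups the selected rows into micro batches. A run of filtered-out rows
// between two selected rows starts a new batch only if the run is at least
// `threshold` rows long; shorter runs are absorbed into the current batch.
// `gap_bias` is an illustration knob, not a real option:
//   1 reproduces the off-by-one (threshold - 1 rows are enough to split),
//   0 gives the intended behavior.
std::vector<MicroBatch> GetMicroBatchesSketch(
    const std::vector<bool>& selected, int threshold, int gap_bias) {
  // A threshold of 0 makes no sense, as the description notes.
  assert(threshold >= 1);
  std::vector<MicroBatch> batches;
  int last_selected = -1;
  for (int i = 0; i < static_cast<int>(selected.size()); ++i) {
    if (!selected[i]) continue;
    if (last_selected == -1) {
      batches.push_back({i, i});  // first selected row opens the first batch
    } else {
      int gap = i - last_selected - 1;  // filtered-out rows in between
      if (gap + gap_bias >= threshold) {
        batches.push_back({i, i});  // gap counts as long: start a new batch
      } else {
        batches.back().end = i;  // gap is short: extend the current batch
      }
    }
    last_selected = i;
  }
  return batches;
}

// Number of rows a column reader would have to skip between two batches.
int RowsToSkip(const MicroBatch& prev, const MicroBatch& next) {
  return next.start - prev.end - 1;
}
```

For the selection {1, 1, 0, 1} and a threshold of 1, the buggy variant
(gap_bias = 1) produces three batches with zero rows to skip between the first
two, which is consistent with the failed `num_rows > 0` check in the stack trace
above. The intended variant (gap_bias = 0) produces two batches, rows 0-1 and
row 3, with one row to skip between them.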



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org
