[ https://issues.apache.org/jira/browse/HUDI-3639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alexey Kudinkin updated HUDI-3639:
----------------------------------
    Priority: Critical  (was: Blocker)

> [Incremental] Add Proper Incremental Records FIltering support into Hudi's custom RDD
> -------------------------------------------------------------------------------------
>
>                 Key: HUDI-3639
>                 URL: https://issues.apache.org/jira/browse/HUDI-3639
>             Project: Apache Hudi
>          Issue Type: Bug
>            Reporter: Alexey Kudinkin
>            Priority: Critical
>             Fix For: 0.13.0
>
>
> Currently, Hudi's `MergeOnReadIncrementalRelation` relies solely on
> `ParquetFileReader` to do record-level filtering of the records that don't
> belong to the timeline span being queried.
> As a side effect, Hudi has to disable the use of
> [VectorizedParquetReader|https://jaceklaskowski.gitbooks.io/mastering-spark-sql/content/spark-sql-vectorized-parquet-reader.html]
> (since using one would prevent records from being filtered by the reader).
>
> Instead, we should make sure that proper record-level filtering is performed
> within the returned RDD, rather than relying entirely on the file reader.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
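
For illustration only, the record-level filtering the issue asks for could look roughly like the sketch below. This is not Hudi's implementation. The class `IncrementalFilterSketch`, the `Row` type, and both methods are hypothetical names. The sketch uses plain Java iterators rather than a Spark RDD, and it assumes that commit times are fixed-width timestamp strings compared lexicographically, with the start of the queried span exclusive and the end inclusive.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;

public class IncrementalFilterSketch {

    /** Hypothetical stand-in for a row carrying a commit-time meta field. */
    static final class Row {
        final String commitTime; // assumed fixed-width, e.g. "20220301000000"
        final String key;

        Row(String commitTime, String key) {
            this.commitTime = commitTime;
            this.key = key;
        }
    }

    /**
     * Builds a predicate keeping rows whose commit time lies in
     * (beginExclusive, endInclusive]. The boundary semantics are an
     * assumption of this sketch.
     */
    static Predicate<Row> inTimelineSpan(String beginExclusive, String endInclusive) {
        return row -> row.commitTime.compareTo(beginExclusive) > 0
                && row.commitTime.compareTo(endInclusive) <= 0;
    }

    /**
     * Applies the predicate to an iterator of rows and collects the keys
     * that pass, the way a per-partition filter inside an RDD would,
     * independently of whatever the underlying file reader filtered.
     */
    static List<String> filterKeys(Iterator<Row> rows, Predicate<Row> keep) {
        List<String> kept = new ArrayList<>();
        while (rows.hasNext()) {
            Row row = rows.next();
            if (keep.test(row)) {
                kept.add(row.key);
            }
        }
        return kept;
    }
}
```

Because the filter runs on rows after they have been read, it does not depend on the reader dropping out-of-span records, which is the property the issue description is after.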