[jira] [Updated] (HUDI-3639) [Incremental] Add Proper Incremental Records Filtering support into Hudi's custom RDD
[ https://issues.apache.org/jira/browse/HUDI-3639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yue Zhang updated HUDI-3639:
----------------------------
    Fix Version/s: 0.14.0
                       (was: 0.13.1)

> [Incremental] Add Proper Incremental Records Filtering support into Hudi's custom RDD
> --------------------------------------------------------------------------------------
>
>                 Key: HUDI-3639
>                 URL: https://issues.apache.org/jira/browse/HUDI-3639
>             Project: Apache Hudi
>          Issue Type: Bug
>            Reporter: Alexey Kudinkin
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 0.14.0
>
>
> Currently, Hudi's `MergeOnReadIncrementalRelation` relies solely on
> `ParquetFileReader` to filter out, at the record level, the records that do
> not belong to the timeline span being queried.
> As a side effect, Hudi has to disable the use of
> [VectorizedParquetReader|https://jaceklaskowski.gitbooks.io/mastering-spark-sql/content/spark-sql-vectorized-parquet-reader.html]
> (since using it would prevent records from being filtered by the reader).
>
> Instead, we should make sure that proper record-level filtering is performed
> within the returned RDD, rather than relying solely on the FileReader to do that.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
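
The record-level filtering proposed in the description could look roughly like the sketch below. It is an illustration in Python, not Hudi's implementation. It assumes each record carries Hudi's `_hoodie_commit_time` meta field as a fixed-width timestamp string, so that string comparison matches time order, and that the queried span excludes the begin instant and includes the end instant. The function name `filter_incremental` and the dict-based record shape are hypothetical.

```python
from typing import Dict, Iterable, Iterator

# Hudi meta field holding the instant at which a record was committed.
COMMIT_TIME_FIELD = "_hoodie_commit_time"


def filter_incremental(
    records: Iterable[Dict[str, object]],
    begin_instant: str,
    end_instant: str,
) -> Iterator[Dict[str, object]]:
    """Yield only the records committed within (begin_instant, end_instant].

    Filtering happens on the records themselves, after they have been read,
    so it does not depend on the file reader applying any predicate.
    """
    for record in records:
        commit_time = str(record[COMMIT_TIME_FIELD])
        # Assumption: instants are fixed-width strings, so lexicographic
        # comparison is equivalent to chronological comparison.
        if begin_instant < commit_time <= end_instant:
            yield record


if __name__ == "__main__":
    rows = [
        {"_hoodie_commit_time": "20220101000000", "id": 1},
        {"_hoodie_commit_time": "20220201000000", "id": 2},
        {"_hoodie_commit_time": "20220301000000", "id": 3},
    ]
    for row in filter_incremental(rows, "20220101000000", "20220201000000"):
        print(row["id"])
```

Because the predicate is applied to the reader's output rather than inside the reader, a filter of this kind would in principle work the same whether or not the underlying reader is vectorized.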
[jira] [Updated] (HUDI-3639) [Incremental] Add Proper Incremental Records Filtering support into Hudi's custom RDD
[ https://issues.apache.org/jira/browse/HUDI-3639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HUDI-3639:
---------------------------------
    Labels: pull-request-available  (was: )

> [Incremental] Add Proper Incremental Records Filtering support into Hudi's custom RDD
> --------------------------------------------------------------------------------------
>
>                 Key: HUDI-3639
>                 URL: https://issues.apache.org/jira/browse/HUDI-3639
>             Project: Apache Hudi
>          Issue Type: Bug
>            Reporter: Alexey Kudinkin
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 0.13.1
[jira] [Updated] (HUDI-3639) [Incremental] Add Proper Incremental Records Filtering support into Hudi's custom RDD
[ https://issues.apache.org/jira/browse/HUDI-3639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexey Kudinkin updated HUDI-3639:
----------------------------------
    Fix Version/s: 0.13.1
                       (was: 0.13.0)

> [Incremental] Add Proper Incremental Records Filtering support into Hudi's custom RDD
> --------------------------------------------------------------------------------------
>
>                 Key: HUDI-3639
>                 URL: https://issues.apache.org/jira/browse/HUDI-3639
>             Project: Apache Hudi
>          Issue Type: Bug
>            Reporter: Alexey Kudinkin
>            Priority: Critical
>             Fix For: 0.13.1
[jira] [Updated] (HUDI-3639) [Incremental] Add Proper Incremental Records Filtering support into Hudi's custom RDD
[ https://issues.apache.org/jira/browse/HUDI-3639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexey Kudinkin updated HUDI-3639:
----------------------------------
    Priority: Critical  (was: Blocker)

> [Incremental] Add Proper Incremental Records Filtering support into Hudi's custom RDD
> --------------------------------------------------------------------------------------
>
>                 Key: HUDI-3639
>                 URL: https://issues.apache.org/jira/browse/HUDI-3639
>             Project: Apache Hudi
>          Issue Type: Bug
>            Reporter: Alexey Kudinkin
>            Priority: Critical
>             Fix For: 0.13.0
[jira] [Updated] (HUDI-3639) [Incremental] Add Proper Incremental Records Filtering support into Hudi's custom RDD
[ https://issues.apache.org/jira/browse/HUDI-3639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sagar Sumit updated HUDI-3639:
------------------------------
    Fix Version/s: 0.13.0
                       (was: 0.12.0)

> [Incremental] Add Proper Incremental Records Filtering support into Hudi's custom RDD
> --------------------------------------------------------------------------------------
>
>                 Key: HUDI-3639
>                 URL: https://issues.apache.org/jira/browse/HUDI-3639
>             Project: Apache Hudi
>          Issue Type: Bug
>            Reporter: Alexey Kudinkin
>            Priority: Blocker
>             Fix For: 0.13.0
[jira] [Updated] (HUDI-3639) [Incremental] Add Proper Incremental Records Filtering support into Hudi's custom RDD
[ https://issues.apache.org/jira/browse/HUDI-3639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexey Kudinkin updated HUDI-3639:
----------------------------------
    Fix Version/s: 0.12.0

> [Incremental] Add Proper Incremental Records Filtering support into Hudi's custom RDD
> --------------------------------------------------------------------------------------
>
>                 Key: HUDI-3639
>                 URL: https://issues.apache.org/jira/browse/HUDI-3639
>             Project: Apache Hudi
>          Issue Type: Bug
>            Reporter: Alexey Kudinkin
>            Priority: Blocker
>             Fix For: 0.12.0