[jira] [Updated] (HUDI-3639) [Incremental] Add Proper Incremental Records FIltering support into Hudi's custom RDD

2023-05-22 Thread Yue Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yue Zhang updated HUDI-3639:

Fix Version/s: 0.14.0
   (was: 0.13.1)

> [Incremental] Add Proper Incremental Records FIltering support into Hudi's 
> custom RDD
> -
>
> Key: HUDI-3639
> URL: https://issues.apache.org/jira/browse/HUDI-3639
> Project: Apache Hudi
> Issue Type: Bug
> Reporter: Alexey Kudinkin
> Priority: Critical
> Labels: pull-request-available
> Fix For: 0.14.0
>
>
> Currently, Hudi's `MergeOnReadIncrementalRelation` relies solely on 
> `ParquetFileReader` to do record-level filtering of the records that don't 
> belong to the timeline span being queried.
> As a side effect, Hudi has to disable the use of 
> [VectorizedParquetReader|https://jaceklaskowski.gitbooks.io/mastering-spark-sql/content/spark-sql-vectorized-parquet-reader.html]
>  (since using it would prevent records from being filtered by the reader).
>  
> Instead, we should make sure that proper record-level filtering is performed 
> within the returned RDD, rather than relying entirely on the file reader.
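
For illustration only, the record-level filtering described above can be sketched in plain Python, independent of Spark and of Hudi's actual classes. The sketch assumes each record carries Hudi's `_hoodie_commit_time` meta field, that instants are fixed-width timestamp strings (so string comparison follows time order), and that the queried span is exclusive at the start and inclusive at the end. `InstantRange` and `filter_incremental` are hypothetical names, not Hudi APIs.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator

# Hudi meta column holding the commit instant that wrote each record.
COMMIT_TIME_FIELD = "_hoodie_commit_time"


@dataclass(frozen=True)
class InstantRange:
    """Hypothetical helper: a (start, end] span of commit instants."""
    start_exclusive: str
    end_inclusive: str

    def contains(self, instant: str) -> bool:
        # Assumption: instants are fixed-width timestamp strings, so
        # lexicographic comparison matches chronological order.
        return self.start_exclusive < instant <= self.end_inclusive


def filter_incremental(records: Iterable[dict], span: InstantRange) -> Iterator[dict]:
    """Yield only the records committed within the queried span.

    The filter runs on the records themselves, after they are read,
    so it does not depend on which file reader produced them.
    """
    for record in records:
        if span.contains(record[COMMIT_TIME_FIELD]):
            yield record


rows = [
    {COMMIT_TIME_FIELD: "20220301000000", "id": 1},  # equals start: excluded
    {COMMIT_TIME_FIELD: "20220310120000", "id": 2},  # inside the span
    {COMMIT_TIME_FIELD: "20220315000000", "id": 3},  # equals end: included
    {COMMIT_TIME_FIELD: "20220320000000", "id": 4},  # after end: excluded
]
span = InstantRange("20220301000000", "20220315000000")
kept = [row["id"] for row in filter_incremental(rows, span)]
```

Because the filter is applied to the records after they are read, it would in principle work the same whether the underlying reader is vectorized or not, which is the point of the proposal.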



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-3639) [Incremental] Add Proper Incremental Records FIltering support into Hudi's custom RDD

2023-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-3639:
-
Labels: pull-request-available  (was: )

> [Incremental] Add Proper Incremental Records FIltering support into Hudi's 
> custom RDD
> -
>
> Key: HUDI-3639
> URL: https://issues.apache.org/jira/browse/HUDI-3639
> Project: Apache Hudi
> Issue Type: Bug
> Reporter: Alexey Kudinkin
> Priority: Critical
> Labels: pull-request-available
> Fix For: 0.13.1


[jira] [Updated] (HUDI-3639) [Incremental] Add Proper Incremental Records FIltering support into Hudi's custom RDD

2022-12-20 Thread Alexey Kudinkin (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Kudinkin updated HUDI-3639:
--
Fix Version/s: 0.13.1
   (was: 0.13.0)

> [Incremental] Add Proper Incremental Records FIltering support into Hudi's 
> custom RDD
> -
>
> Key: HUDI-3639
> URL: https://issues.apache.org/jira/browse/HUDI-3639
> Project: Apache Hudi
> Issue Type: Bug
> Reporter: Alexey Kudinkin
> Priority: Critical
> Fix For: 0.13.1


[jira] [Updated] (HUDI-3639) [Incremental] Add Proper Incremental Records FIltering support into Hudi's custom RDD

2022-12-20 Thread Alexey Kudinkin (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Kudinkin updated HUDI-3639:
--
Priority: Critical  (was: Blocker)

> [Incremental] Add Proper Incremental Records FIltering support into Hudi's 
> custom RDD
> -
>
> Key: HUDI-3639
> URL: https://issues.apache.org/jira/browse/HUDI-3639
> Project: Apache Hudi
> Issue Type: Bug
> Reporter: Alexey Kudinkin
> Priority: Critical
> Fix For: 0.13.0


[jira] [Updated] (HUDI-3639) [Incremental] Add Proper Incremental Records FIltering support into Hudi's custom RDD

2022-07-29 Thread Sagar Sumit (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit updated HUDI-3639:
--
Fix Version/s: 0.13.0
   (was: 0.12.0)

> [Incremental] Add Proper Incremental Records FIltering support into Hudi's 
> custom RDD
> -
>
> Key: HUDI-3639
> URL: https://issues.apache.org/jira/browse/HUDI-3639
> Project: Apache Hudi
> Issue Type: Bug
> Reporter: Alexey Kudinkin
> Priority: Blocker
> Fix For: 0.13.0


[jira] [Updated] (HUDI-3639) [Incremental] Add Proper Incremental Records FIltering support into Hudi's custom RDD

2022-03-15 Thread Alexey Kudinkin (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Kudinkin updated HUDI-3639:
--
Fix Version/s: 0.12.0

> [Incremental] Add Proper Incremental Records FIltering support into Hudi's 
> custom RDD
> -
>
> Key: HUDI-3639
> URL: https://issues.apache.org/jira/browse/HUDI-3639
> Project: Apache Hudi
> Issue Type: Bug
> Reporter: Alexey Kudinkin
> Priority: Blocker
> Fix For: 0.12.0



--
This message was sent by Atlassian Jira
(v8.20.1#820001)