[jira] [Updated] (HUDI-4766) Fix HoodieFlinkClusteringJob
[ https://issues.apache.org/jira/browse/HUDI-4766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Danny Chen updated HUDI-4766:
-----------------------------
    Fix Version/s: 0.12.1

> Fix HoodieFlinkClusteringJob
> ----------------------------
>
>                 Key: HUDI-4766
>                 URL: https://issues.apache.org/jira/browse/HUDI-4766
>             Project: Apache Hudi
>          Issue Type: Bug
>            Reporter: voon
>            Assignee: voon
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.12.1
>
> h1. Flink Hudi Clustering Issues
>
> # Integer type used for byte-size configuration parameters instead of long
> ** This caps the maximum configurable size at 2^31 - 1 bytes (~2 GiB); see the sketch after this notification
> # Unable to choose a particular instant to execute
> # Unable to select the filter mode, because the method that controls it is overridden by _FlinkSizeBasedClusteringPlanStrategy#filterPartitionPaths_
> # No cleaning
> ** With reference to offline compaction (HoodieFlinkCompactor), cleaning is only enabled if _clean.async.enabled = false_
> # Schedule configuration is inconsistent with HoodieFlinkCompactor: the flag is defined as false, the opposite of HoodieFlinkCompactor
> # No ability to pass props in using _--props/--hoodie-conf_
> ** Required for passing in configurations such as (see the example props after this notification):
> *** _hoodie.parquet.compression.ratio_
> *** Partition filter configurations, depending on the strategy
> # Each clustering group will spit out files of _hoodie.parquet.max.file.size_ (120 MB by default)
> # Multiple clustering jobs can execute, but there is no fine-grained control over restarting jobs that have failed; the current implementation only filters for REQUESTED clustering instants, so rollbacks are never performed
> # Removed the unused _getNumberOfOutputFileGroups()_ function; output file groups are instead driven by:
> ** _hoodie.clustering.plan.strategy.small.file.limit_
> ** _hoodie.clustering.plan.strategy.max.bytes.per.group_
> ** _hoodie.clustering.plan.strategy.target.file.max.bytes_
> ** Will create N file groups (one task writes to each file group, increasing parallelism)

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
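Item 1 above concerns integer overflow of byte-size settings. Below is a minimal Java sketch of that failure mode; the class and constant names are illustrative only and do not come from the Hudi codebase.

{code:java}
// Illustrative only: why byte-size configuration values need a long, not an int.
public class ByteSizeOverflowSketch {

  public static void main(String[] args) {
    // An int tops out at 2^31 - 1 bytes, i.e. ~2 GiB.
    System.out.println("int max bytes = " + Integer.MAX_VALUE);   // 2147483647

    // A 4 GiB limit is a perfectly reasonable clustering setting,
    // but it cannot be represented as an int.
    long fourGiB = 4L * 1024 * 1024 * 1024;                        // 4294967296
    int truncated = (int) fourGiB;                                 // wraps to 0

    System.out.println("long 4 GiB    = " + fourGiB);
    System.out.println("(int) 4 GiB   = " + truncated);           // prints 0
  }
}
{code}

Any value parsed through an int accessor is silently capped or wrapped this way, which is presumably why the issue calls for the byte-size parameters to be long.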
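Item 6 asks for _--props/--hoodie-conf_ support so that write and clustering configs can be overridden. As a hedged sketch, the snippet below builds the kind of property set one might supply through a props file; the keys are the ones named in this issue, but the values are examples only (apart from the stated 120 MB default for _hoodie.parquet.max.file.size_) and the surrounding class is hypothetical.

{code:java}
import java.util.Properties;

// Hypothetical example of overrides that --props <file> / --hoodie-conf key=value
// should be able to carry into HoodieFlinkClusteringJob. Keys come from the issue
// text; values are illustrative.
public class ClusteringPropsSketch {

  public static Properties exampleProps() {
    Properties props = new Properties();

    // Parquet sizing/compression knobs mentioned in item 6 and item 7.
    props.setProperty("hoodie.parquet.max.file.size",
        String.valueOf(120L * 1024 * 1024));              // 120 MB, the stated default
    props.setProperty("hoodie.parquet.compression.ratio", "0.1");

    // Clustering plan strategy sizing knobs from item 9.
    props.setProperty("hoodie.clustering.plan.strategy.small.file.limit",
        String.valueOf(600L * 1024 * 1024));              // example: 600 MB
    props.setProperty("hoodie.clustering.plan.strategy.max.bytes.per.group",
        String.valueOf(2L * 1024 * 1024 * 1024));         // example: 2 GiB, already > Integer.MAX_VALUE
    props.setProperty("hoodie.clustering.plan.strategy.target.file.max.bytes",
        String.valueOf(1L * 1024 * 1024 * 1024));         // example: 1 GiB

    return props;
  }

  public static void main(String[] args) {
    exampleProps().forEach((k, v) -> System.out.println(k + "=" + v));
  }
}
{code}

Note how the example 2 GiB value for _hoodie.clustering.plan.strategy.max.bytes.per.group_ already exceeds Integer.MAX_VALUE, which ties back to item 1.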
[jira] [Updated] (HUDI-4766) Fix HoodieFlinkClusteringJob
[ https://issues.apache.org/jira/browse/HUDI-4766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

voon updated HUDI-4766:
-----------------------
    Description: item 1 now reads "Integer type used for byte-size configuration parameters instead of long"
    was: "Integer type used for byte size variables instead of long"
    (the remaining items are unchanged; see the full description quoted above)

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Updated] (HUDI-4766) Fix HoodieFlinkClusteringJob
[ https://issues.apache.org/jira/browse/HUDI-4766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

voon updated HUDI-4766:
-----------------------
    Description: item 6 now reads "No ability to allow props to be passed in using --props/--hoodie-conf"
    was: "Allow props to be passed in using --props/--hoodie-conf"
    (the remaining items are unchanged; see the full description quoted above)

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Updated] (HUDI-4766) Fix HoodieFlinkClusteringJob
[ https://issues.apache.org/jira/browse/HUDI-4766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HUDI-4766:
---------------------------------
    Labels: pull-request-available  (was: )

> Fix HoodieFlinkClusteringJob
> ----------------------------
>
>                 Key: HUDI-4766
>                 URL: https://issues.apache.org/jira/browse/HUDI-4766
>             Project: Apache Hudi
>          Issue Type: Bug
>            Reporter: voon
>            Assignee: voon
>            Priority: Major
>              Labels: pull-request-available
>
> (issue description unchanged; see the first notification above)

--
This message was sent by Atlassian Jira
(v8.20.10#820010)