[jira] [Updated] (HUDI-4766) Fix HoodieFlinkClusteringJob
[ https://issues.apache.org/jira/browse/HUDI-4766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Danny Chen updated HUDI-4766:
-----------------------------
    Fix Version/s: 0.12.1

> Fix HoodieFlinkClusteringJob
> ----------------------------
>
>                 Key: HUDI-4766
>                 URL: https://issues.apache.org/jira/browse/HUDI-4766
>             Project: Apache Hudi
>          Issue Type: Bug
>            Reporter: voon
>            Assignee: voon
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.12.1
>
> h1. Flink Hudi Clustering Issues
>
> # Integer type used for byte-size configuration parameters instead of long
> ** This caps the maximum configurable size at 2^31 - 1 bytes (~2 GiB); see the sketch after this notification
> # Unable to choose a particular instant to execute
> # Unable to select the filter mode, because the method that controls it is overridden by _FlinkSizeBasedClusteringPlanStrategy#filterPartitionPaths_
> # No cleaning
> ** With reference to offline compaction (HoodieFlinkCompactor), cleaning is only enabled if _clean.async.enabled = false_
> # Schedule configuration is inconsistent with HoodieFlinkCompactor: the flag is defined as false, the opposite of HoodieFlinkCompactor
> # No ability to pass props in using _--props/--hoodie-conf_
> ** Required for passing in configurations such as (see the example props after this notification):
> *** _hoodie.parquet.compression.ratio_
> *** Partition filter configurations, depending on the strategy
> # Each clustering group will spit out files of _hoodie.parquet.max.file.size_ (120 MB by default)
> # Multiple clustering jobs can execute, but there is no fine-grained control over restarting jobs that have failed; the current implementation only filters for REQUESTED clustering instants, so rollbacks are never performed
> # Removed the unused _getNumberOfOutputFileGroups()_ function; output file groups are instead driven by:
> ** _hoodie.clustering.plan.strategy.small.file.limit_
> ** _hoodie.clustering.plan.strategy.max.bytes.per.group_
> ** _hoodie.clustering.plan.strategy.target.file.max.bytes_
> ** Will create N file groups (one task writes to each file group, increasing parallelism)

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
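Item 1 above concerns integer overflow of byte-size settings. Below is a minimal Java sketch of that failure mode; the class and constant names are illustrative only and do not come from the Hudi codebase.

{code:java}
// Illustrative only: why byte-size configuration values need a long, not an int.
public class ByteSizeOverflowSketch {

  public static void main(String[] args) {
    // An int tops out at 2^31 - 1 bytes, i.e. ~2 GiB.
    System.out.println("int max bytes = " + Integer.MAX_VALUE);   // 2147483647

    // A 4 GiB limit is a perfectly reasonable clustering setting,
    // but it cannot be represented as an int.
    long fourGiB = 4L * 1024 * 1024 * 1024;                        // 4294967296
    int truncated = (int) fourGiB;                                 // wraps to 0

    System.out.println("long 4 GiB    = " + fourGiB);
    System.out.println("(int) 4 GiB   = " + truncated);           // prints 0
  }
}
{code}

Any value parsed through an int accessor is silently capped or wrapped this way, which is presumably why the issue calls for the byte-size parameters to be long.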
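Item 6 asks for _--props/--hoodie-conf_ support so that write and clustering configs can be overridden. As a hedged sketch, the snippet below builds the kind of property set one might supply through a props file; the keys are the ones named in this issue, but the values are examples only (apart from the stated 120 MB default for _hoodie.parquet.max.file.size_) and the surrounding class is hypothetical.

{code:java}
import java.util.Properties;

// Hypothetical example of overrides that --props <file> / --hoodie-conf key=value
// should be able to carry into HoodieFlinkClusteringJob. Keys come from the issue
// text; values are illustrative.
public class ClusteringPropsSketch {

  public static Properties exampleProps() {
    Properties props = new Properties();

    // Parquet sizing/compression knobs mentioned in item 6 and item 7.
    props.setProperty("hoodie.parquet.max.file.size",
        String.valueOf(120L * 1024 * 1024));              // 120 MB, the stated default
    props.setProperty("hoodie.parquet.compression.ratio", "0.1");

    // Clustering plan strategy sizing knobs from item 9.
    props.setProperty("hoodie.clustering.plan.strategy.small.file.limit",
        String.valueOf(600L * 1024 * 1024));              // example: 600 MB
    props.setProperty("hoodie.clustering.plan.strategy.max.bytes.per.group",
        String.valueOf(2L * 1024 * 1024 * 1024));         // example: 2 GiB, already > Integer.MAX_VALUE
    props.setProperty("hoodie.clustering.plan.strategy.target.file.max.bytes",
        String.valueOf(1L * 1024 * 1024 * 1024));         // example: 1 GiB

    return props;
  }

  public static void main(String[] args) {
    exampleProps().forEach((k, v) -> System.out.println(k + "=" + v));
  }
}
{code}

Note how the example 2 GiB value for _hoodie.clustering.plan.strategy.max.bytes.per.group_ already exceeds Integer.MAX_VALUE, which ties back to item 1.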
[jira] [Updated] (HUDI-4766) Fix HoodieFlinkClusteringJob
[ https://issues.apache.org/jira/browse/HUDI-4766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

voon updated HUDI-4766:
-----------------------
    Description: item 1 now reads "Integer type used for byte-size configuration parameters instead of long"
    was: "Integer type used for byte size variables instead of long"
    (the remaining items are unchanged; see the full description quoted above)

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Updated] (HUDI-4766) Fix HoodieFlinkClusteringJob
[ https://issues.apache.org/jira/browse/HUDI-4766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

voon updated HUDI-4766:
-----------------------
    Description: item 6 now reads "No ability to allow props to be passed in using --props/--hoodie-conf"
    was: "Allow props to be passed in using --props/--hoodie-conf"
    (the remaining items are unchanged; see the full description quoted above)

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Updated] (HUDI-4766) Fix HoodieFlinkClusteringJob
[ https://issues.apache.org/jira/browse/HUDI-4766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HUDI-4766:
---------------------------------
    Labels: pull-request-available  (was: )

> Fix HoodieFlinkClusteringJob
> ----------------------------
>
>                 Key: HUDI-4766
>                 URL: https://issues.apache.org/jira/browse/HUDI-4766
>             Project: Apache Hudi
>          Issue Type: Bug
>            Reporter: voon
>            Assignee: voon
>            Priority: Major
>              Labels: pull-request-available
>
> (issue description unchanged; see the first notification above)

--
This message was sent by Atlassian Jira
(v8.20.10#820010)