[jira] [Updated] (HUDI-5716) Fix Partitioners to avoid assuming that parallelism is always present
[ https://issues.apache.org/jira/browse/HUDI-5716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prashant Wason updated HUDI-5716:
--------------------------------
    Fix Version/s: 0.14.1
                       (was: 0.14.0)

> Fix Partitioners to avoid assuming that parallelism is always present
> ---------------------------------------------------------------------
>
>                 Key: HUDI-5716
>                 URL: https://issues.apache.org/jira/browse/HUDI-5716
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: writer-core
>            Reporter: Alexey Kudinkin
>            Assignee: Alexey Kudinkin
>            Priority: Blocker
>              Labels: pull-request-available
>             Fix For: 0.14.1
>
>
> Currently, `Partitioner` impls assume that there's always going to be some
> parallelism level.
> This has not been an issue previously for the following reasons:
> * RDDs always have an inherent "parallelism" level, defined as the # of
> partitions they are operating upon. However, for Dataset (SparkPlan) that's
> not necessarily the case (some SparkPlans might not be reporting the output
> partitioning)
> * Additionally, we have had the default parallelism level set in our configs
> before, which meant that we'd prefer that over the actual incoming dataset.
> However, since we've recently removed the default parallelism value from our
> configs, we now need to fix Partitioners to make sure they are not assuming
> that parallelism is always going to be present.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
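The fallback order described in the issue can be sketched roughly as below. This is a minimal illustration under stated assumptions, not Hudi's actual code: the class `ParallelismResolver`, the method `resolve`, and the constant `FALLBACK_PARALLELISM` are hypothetical names, and the fallback value of 1 is an arbitrary placeholder. The sketch prefers an explicitly configured parallelism, then the partition count reported by the incoming data, and only then a fallback, so that neither input is assumed to be present.

```java
import java.util.OptionalInt;

// Hypothetical helper, not part of Hudi: resolves a parallelism level
// without assuming that any single source of it is always available.
final class ParallelismResolver {

    // Assumed placeholder used only when nothing else is known.
    static final int FALLBACK_PARALLELISM = 1;

    // configured:         parallelism from user configs, if the user set one
    // incomingPartitions: partition count reported by the incoming data,
    //                     which may be absent for some plans
    static int resolve(OptionalInt configured, OptionalInt incomingPartitions) {
        if (configured.isPresent() && configured.getAsInt() > 0) {
            return configured.getAsInt();
        }
        if (incomingPartitions.isPresent() && incomingPartitions.getAsInt() > 0) {
            return incomingPartitions.getAsInt();
        }
        return FALLBACK_PARALLELISM;
    }

    public static void main(String[] args) {
        // Config set explicitly: it takes precedence over the incoming data.
        System.out.println(resolve(OptionalInt.of(200), OptionalInt.of(8)));
        // No config: fall back to the partition count of the incoming data.
        System.out.println(resolve(OptionalInt.empty(), OptionalInt.of(8)));
        // Neither is known: use the placeholder fallback.
        System.out.println(resolve(OptionalInt.empty(), OptionalInt.empty()));
    }
}
```

A partitioner written against such a helper would never read a parallelism value directly; it would always go through the resolution step, which is the kind of assumption the issue asks to remove.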
[jira] [Updated] (HUDI-5716) Fix Partitioners to avoid assuming that parallelism is always present
[ https://issues.apache.org/jira/browse/HUDI-5716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yue Zhang updated HUDI-5716:
---------------------------
    Fix Version/s: 0.14.0
                       (was: 0.13.1)

> Fix Partitioners to avoid assuming that parallelism is always present
> ---------------------------------------------------------------------
>
>                 Key: HUDI-5716
>                 URL: https://issues.apache.org/jira/browse/HUDI-5716
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: writer-core
>            Reporter: Alexey Kudinkin
>            Assignee: Alexey Kudinkin
>            Priority: Blocker
>              Labels: pull-request-available
>             Fix For: 0.14.0
[jira] [Updated] (HUDI-5716) Fix Partitioners to avoid assuming that parallelism is always present
[ https://issues.apache.org/jira/browse/HUDI-5716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HUDI-5716:
--------------------------------
    Labels: pull-request-available  (was: )

> Fix Partitioners to avoid assuming that parallelism is always present
> ---------------------------------------------------------------------
>
>                 Key: HUDI-5716
>                 URL: https://issues.apache.org/jira/browse/HUDI-5716
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: writer-core
>            Reporter: Alexey Kudinkin
>            Assignee: Alexey Kudinkin
>            Priority: Blocker
>              Labels: pull-request-available
>             Fix For: 0.13.1