[ https://issues.apache.org/jira/browse/HUDI-5716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yue Zhang updated HUDI-5716:
----------------------------
    Fix Version/s: 0.14.0
                       (was: 0.13.1)

> Fix Partitioners to avoid assuming that parallelism is always present
> ---------------------------------------------------------------------
>
>                 Key: HUDI-5716
>                 URL: https://issues.apache.org/jira/browse/HUDI-5716
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: writer-core
>            Reporter: Alexey Kudinkin
>            Assignee: Alexey Kudinkin
>            Priority: Blocker
>              Labels: pull-request-available
>             Fix For: 0.14.0
>
>
> Currently, `Partitioner` impls assume that there's always going to be some
> parallelism level.
> This has not been an issue previously for the following reasons:
> * RDDs always have an inherent "parallelism" level, defined as the # of
> partitions they operate upon. However, for a Dataset (SparkPlan) that's not
> necessarily the case (some SparkPlans might not report their output
> partitioning).
> * Additionally, we have had a default parallelism level set in our configs
> before, which meant that we'd prefer that over the parallelism of the actual
> incoming dataset.
> However, since we've recently removed the default parallelism value from our
> configs, we now need to fix Partitioners to make sure they do not assume
> that parallelism is always going to be present.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)