[ https://issues.apache.org/jira/browse/HUDI-5692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hui An updated HUDI-5692:
-------------------------
    Fix Version/s: 0.13.0

> SpillableMapBasePath should be lazily loaded
> --------------------------------------------
>
>                 Key: HUDI-5692
>                 URL: https://issues.apache.org/jira/browse/HUDI-5692
>             Project: Apache Hudi
>          Issue Type: Bug
>            Reporter: Hui An
>            Assignee: Hui An
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.13.0
>
>
> If we use {{withInferFunction}} to set the default value of
> {{SPILLABLE_MAP_BASE_PATH}}, that default is written into
> {{HoodieWriteConfig}}'s {{properties}} and serialized to all executors.
> If the driver and the executors do not share the same temporary location
> (e.g. driver: /mnt/disk1, executor: /mnt/disk2), the executor fails to
> create the spillable map path, because the directory /mnt/disk1 does not
> exist on the executor machine.
> {code:java}
> Caused by: org.apache.hudi.exception.HoodieIOException: Unable to create :/mnt/ssd/0/yarn/nm-local-dir/usercache/test/appcache/application_1673593627114_3970647/hudi-BITCASK-e3741235-6571-4112-8b20-271408148238
>   at org.apache.hudi.common.util.collection.ExternalSpillableMap.getDiskBasedMap(ExternalSpillableMap.java:119)
>   at org.apache.hudi.common.util.collection.ExternalSpillableMap.getDiskBasedMapNumEntries(ExternalSpillableMap.java:138)
>   at org.apache.hudi.io.HoodieMergeHandle.init(HoodieMergeHandle.java:268)
>   at org.apache.hudi.io.HoodieMergeHandle.<init>(HoodieMergeHandle.java:129)
>   at org.apache.hudi.io.HoodieMergeHandle.<init>(HoodieMergeHandle.java:121)
>   at org.apache.hudi.io.HoodieConcatHandle.<init>(HoodieConcatHandle.java:81)
>   at org.apache.hudi.io.HoodieMergeHandleFactory.create(HoodieMergeHandleFactory.java:60)
>   at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.getUpdateHandle(BaseSparkCommitActionExecutor.java:386)
>   at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpdate(BaseSparkCommitActionExecutor.java:363)
>   at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:330)
>   ... 29 more
> Caused by: java.io.IOException: Unable to create :/mnt/ssd/0/yarn/nm-local-dir/usercache/test/appcache/application_1673593627114_3970647/hudi-BITCASK-e3741235-6571-4112-8b20-271408148238
>   at org.apache.hudi.common.util.FileIOUtils.mkdir(FileIOUtils.java:70)
>   at org.apache.hudi.common.util.collection.DiskMap.<init>(DiskMap.java:55)
>   at org.apache.hudi.common.util.collection.BitCaskDiskMap.<init>(BitCaskDiskMap.java:98)
>   at org.apache.hudi.common.util.collection.ExternalSpillableMap.getDiskBasedMap(ExternalSpillableMap.java:116)
>   ... 38 more
> {code}
> A better solution is to compute the temporary location when
> {{getSpillableMapBasePath}} is called.
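
The lazy-loading idea described in the issue could be sketched roughly as follows. This is only an illustration, not the actual Hudi change: the class {{LazySpillConfig}}, the property key and the {{Supplier}} parameter are hypothetical names chosen for this example. The point it tries to show is that the default path is computed inside the getter on the calling machine and is never stored in the properties that get serialized.

```java
import java.util.Properties;
import java.util.function.Supplier;

// Hypothetical stand-in for a write config; this is not Hudi's HoodieWriteConfig.
class LazySpillConfig {
    // Hypothetical property key, used only in this sketch.
    static final String SPILLABLE_MAP_BASE_PATH = "example.spillable.map.path";

    // In a real job these properties would be serialized from driver to executors.
    private final Properties props;

    LazySpillConfig(Properties props) {
        this.props = props;
    }

    // Resolves the path at call time, on whichever machine calls it.
    // The computed default is returned but never written back into props.
    String getSpillableMapBasePath(Supplier<String> localTempDir) {
        String explicit = props.getProperty(SPILLABLE_MAP_BASE_PATH);
        return explicit != null ? explicit : localTempDir.get();
    }

    // Convenience overload: fall back to the JVM's local temp directory.
    String getSpillableMapBasePath() {
        return getSpillableMapBasePath(() -> System.getProperty("java.io.tmpdir"));
    }
}
```

Under these assumptions, a driver whose temp directory is /mnt/disk1 and an executor whose temp directory is /mnt/disk2 would each resolve their own local path from the same properties, while a path set explicitly by the user would still take precedence on both.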