[ 
https://issues.apache.org/jira/browse/HUDI-5692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hui An updated HUDI-5692:
-------------------------
    Fix Version/s: 0.13.0

> SpillableMapBasePath should be lazily loaded
> --------------------------------------------
>
>                 Key: HUDI-5692
>                 URL: https://issues.apache.org/jira/browse/HUDI-5692
>             Project: Apache Hudi
>          Issue Type: Bug
>            Reporter: Hui An
>            Assignee: Hui An
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.13.0
>
>
> If we use {{withInferFunction}} to set the default value of
> {{SPILLABLE_MAP_BASE_PATH}}, the default is resolved on the driver, stored in
> {{HoodieWriteConfig}}'s {{properties}}, and serialized to all executors. If
> the driver and the executors do not share the same temporary location (e.g.
> driver: /mnt/disk1, executor: /mnt/disk2), the executor fails to create the
> spillable map path, because the directory /mnt/disk1 does not exist on the
> executor machine.
> {code:java}
> Caused by: org.apache.hudi.exception.HoodieIOException: Unable to create :/mnt/ssd/0/yarn/nm-local-dir/usercache/test/appcache/application_1673593627114_3970647/hudi-BITCASK-e3741235-6571-4112-8b20-271408148238
>       at org.apache.hudi.common.util.collection.ExternalSpillableMap.getDiskBasedMap(ExternalSpillableMap.java:119)
>       at org.apache.hudi.common.util.collection.ExternalSpillableMap.getDiskBasedMapNumEntries(ExternalSpillableMap.java:138)
>       at org.apache.hudi.io.HoodieMergeHandle.init(HoodieMergeHandle.java:268)
>       at org.apache.hudi.io.HoodieMergeHandle.<init>(HoodieMergeHandle.java:129)
>       at org.apache.hudi.io.HoodieMergeHandle.<init>(HoodieMergeHandle.java:121)
>       at org.apache.hudi.io.HoodieConcatHandle.<init>(HoodieConcatHandle.java:81)
>       at org.apache.hudi.io.HoodieMergeHandleFactory.create(HoodieMergeHandleFactory.java:60)
>       at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.getUpdateHandle(BaseSparkCommitActionExecutor.java:386)
>       at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpdate(BaseSparkCommitActionExecutor.java:363)
>       at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:330)
>       ... 29 more
> Caused by: java.io.IOException: Unable to create :/mnt/ssd/0/yarn/nm-local-dir/usercache/test/appcache/application_1673593627114_3970647/hudi-BITCASK-e3741235-6571-4112-8b20-271408148238
>       at org.apache.hudi.common.util.FileIOUtils.mkdir(FileIOUtils.java:70)
>       at org.apache.hudi.common.util.collection.DiskMap.<init>(DiskMap.java:55)
>       at org.apache.hudi.common.util.collection.BitCaskDiskMap.<init>(BitCaskDiskMap.java:98)
>       at org.apache.hudi.common.util.collection.ExternalSpillableMap.getDiskBasedMap(ExternalSpillableMap.java:116)
>       ... 38 more
> {code}
> A better solution is to compute the temporary location lazily, when
> {{getSpillableMapBasePath}} is called, so that each executor resolves its
> own local directory.
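
To illustrate the difference between the eager and the lazy default, here is a minimal sketch. It is not Hudi's actual code: the class SpillablePathSketch, the key "sketch.spillable.map.path", and the Supplier-based lookup of the local temporary directory are all assumptions made for this example. The Properties object stands in for the part of the config that is serialized from the driver to the executors.

```java
import java.util.Properties;
import java.util.function.Supplier;

// Hypothetical stand-in for a write config; this is NOT Hudi's HoodieWriteConfig.
final class SpillablePathSketch {
    // Hypothetical property key, used only in this sketch.
    static final String BASE_PATH_KEY = "sketch.spillable.map.path";

    // These properties represent what would be serialized to executors.
    private final Properties props;

    private SpillablePathSketch(Properties props) {
        this.props = props;
    }

    // Eager variant: the default is computed once at build time and stored
    // in props, so the builder's local path travels with the config.
    static SpillablePathSketch buildEager(Supplier<String> localTmpDir) {
        Properties p = new Properties();
        p.setProperty(BASE_PATH_KEY, localTmpDir.get());
        return new SpillablePathSketch(p);
    }

    // Lazy variant: nothing is stored unless a path is set explicitly.
    static SpillablePathSketch buildLazy() {
        return new SpillablePathSketch(new Properties());
    }

    // Returns the explicit value if present; otherwise asks the calling
    // machine for its own temporary directory at call time.
    String getSpillableMapBasePath(Supplier<String> localTmpDir) {
        String explicit = props.getProperty(BASE_PATH_KEY);
        return explicit != null ? explicit : localTmpDir.get();
    }

    public static void main(String[] args) {
        Supplier<String> driverTmp = () -> "/mnt/disk1";   // driver's local dir
        Supplier<String> executorTmp = () -> "/mnt/disk2"; // executor's local dir

        SpillablePathSketch eager = buildEager(driverTmp); // built on the driver
        SpillablePathSketch lazy = buildLazy();

        // Both getters are called as if running on the executor.
        System.out.println(eager.getSpillableMapBasePath(executorTmp)); // /mnt/disk1
        System.out.println(lazy.getSpillableMapBasePath(executorTmp));  // /mnt/disk2
    }
}
```

In this sketch the eager config hands the executor the driver's /mnt/disk1, which is the failure described above, while the lazy config resolves /mnt/disk2 on the executor. Outside a sketch, the supplier could be something like () -> System.getProperty("java.io.tmpdir"), though how the real fix locates the directory is not shown here.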



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
