[ https://issues.apache.org/jira/browse/GOBBLIN-2105?focusedWorklogId=925461&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-925461 ]
ASF GitHub Bot logged work on GOBBLIN-2105:
-------------------------------------------
Author: ASF GitHub Bot
Created on: 11/Jul/24 14:50
Start Date: 11/Jul/24 14:50
Worklog Time Spent: 10m
Work Description: arjun4084346 merged PR #3993:
URL: https://github.com/apache/gobblin/pull/3993
Issue Time Tracking
-------------------
Worklog Id: (was: 925461)
Remaining Estimate: 0h
Time Spent: 10m
> Ensure the destination path does not exist before renaming during Gobblin compaction.
> -------------------------------------------------------------------------------------
>
> Key: GOBBLIN-2105
> URL: https://issues.apache.org/jira/browse/GOBBLIN-2105
> Project: Apache Gobblin
> Issue Type: Improvement
> Components: gobblin-compaction
> Reporter: Arpit Varshney
> Assignee: Issac Buenrostro
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h
>
> As part of Gobblin compaction (deduplication), compacted files are moved from
> staging to their final location at the end of the process. This movement is
> handled by the
> org.apache.gobblin.compaction.action.CompactionCompleteFileOperationAction#onCompactionJobComplete
> method, which determines the appropriate destination path and moves the
> compacted files accordingly.
> Current Issue:
> - If the flag compaction.rename.source.dir.enabled is set to false (not in
> append mode) and recompaction.write.to.new.folder is set to true, a new
> directory is determined based on the execution count derived from the state
> file.
> - The state file, however, is generated only after the move to the final
> location. If this move fails, the state file does not reflect it, so the
> execution count read in the next run is stale.
> - In the next execution, the destination path computed from that stale count
> may already exist. Because HDFS rename moves the source into an existing
> destination directory as a child instead of replacing it, the compacted files
> end up in an additional nested child directory.
> Requirement:
> We need to ensure that the determined destination path does not exist before
> the rename operation.
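The failure mode and the requirement described above can be illustrated with a small, self-contained Python sketch. It does not use Gobblin or the Hadoop FileSystem API: `FakeHdfs` is a hypothetical in-memory model of HDFS directory rename semantics (an existing destination directory becomes the parent of the source). The `_<n>` folder naming, the paths, and the `guarded_rename` helper are likewise assumptions for illustration, not the code merged in PR #3993.

```python
import posixpath


class FakeHdfs:
    """Hypothetical in-memory stand-in for an HDFS namespace.

    Files and directories are stored as absolute POSIX path strings. Only the
    rule relevant here is modelled: renaming onto an existing directory moves
    the source *inside* it instead of replacing it.
    """

    def __init__(self, paths):
        self.paths = set(paths)

    def exists(self, path):
        return path in self.paths

    def _subtree(self, root):
        # The path itself plus everything below it.
        return {p for p in self.paths if p == root or p.startswith(root + "/")}

    def delete(self, path):
        # Recursive delete of path and all of its children.
        self.paths -= self._subtree(path)

    def rename(self, src, dst):
        if not self.exists(src):
            return False
        if self.exists(dst):
            # Existing destination directory becomes the parent of src.
            dst = posixpath.join(dst, posixpath.basename(src))
        moved = self._subtree(src)
        self.paths -= moved
        self.paths |= {dst + p[len(src):] for p in moved}
        return True


def next_destination(base, execution_count):
    # Hypothetical naming scheme: one new folder per execution.
    return f"{base}_{execution_count + 1}"


def guarded_rename(fs, src, dst, overwrite_stale=True):
    """Hypothetical helper: ensure dst does not exist before renaming."""
    if fs.exists(dst):
        if not overwrite_stale:
            raise FileExistsError(f"destination already exists: {dst}")
        # Assumed to be leftover output of a run whose state file was never written.
        fs.delete(dst)
    return fs.rename(src, dst)


STAGING = {"/staging/run", "/staging/run/part-0.avro"}


def replay(rename):
    """Two runs that both read execution count 1 from a stale state file."""
    fs = FakeHdfs({"/data/out_1", "/data/out_1/part-0.avro"} | STAGING)
    stale_count = 1
    # Run A: the move succeeds, then the job fails before the state file is written.
    rename(fs, "/staging/run", next_destination("/data/out", stale_count))
    # Run B: new staging output, same stale count, so the same destination.
    fs.paths |= STAGING
    rename(fs, "/staging/run", next_destination("/data/out", stale_count))
    return fs.paths


# Plain rename: Run B's files are nested under /data/out_2/run/.
naive = replay(lambda fs, src, dst: fs.rename(src, dst))
# Guarded rename: /data/out_2 is cleared first, so no nested directory appears.
guarded = replay(guarded_rename)
```

Deleting the stale destination is only one way to satisfy the requirement; failing fast (`overwrite_stale=False`) or advancing to the next unused execution count are alternatives. The existence check followed by delete and rename is also not atomic, so concurrent writers to the same destination would need separate coordination.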
--
This message was sent by Atlassian Jira
(v8.20.10#820010)