Arpit Varshney created GOBBLIN-2105:
---------------------------------------
Summary: Ensure the destination path does not exist before
renaming during Gobblin compaction.
Key: GOBBLIN-2105
URL: https://issues.apache.org/jira/browse/GOBBLIN-2105
Project: Apache Gobblin
Issue Type: Improvement
Components: gobblin-compaction
Reporter: Arpit Varshney
Assignee: Issac Buenrostro
As part of Gobblin compaction (deduplication), compacted files are moved from
staging to their final location at the end of the process. This move is
handled by the
org.apache.gobblin.compaction.action.CompactionCompleteFileOperationAction#onCompactionJobComplete
method, which determines the destination path and moves the compacted files
there.
Current Issue:
- When compaction.rename.source.dir.enabled is false (i.e., not in append
mode) and recompaction.write.to.new.folder is true, the destination is a new
directory chosen based on the execution count read from the state file.
- However, the state file is written only after the files have been moved to
the final location. If the move fails, the state file is not updated
correctly, so the execution count it records can be stale.
- In the next execution, the computed destination path may therefore already
exist. HDFS rename moves the source into the destination when the destination
is an existing directory, so the rename creates an additional child directory
instead of producing the intended path.
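To illustrate this failure mode, here is a toy, in-memory Java model of the
HDFS rename rule described above: when the destination is an existing
directory, the source ends up as a child of it. This is not the Hadoop
FileSystem API. The class name RenameIntoExistingDir and the compaction_<n>
folder naming are hypothetical (the issue does not specify Gobblin's actual
naming scheme), and files and parent-directory checks are omitted for brevity.

```java
import java.util.Set;
import java.util.TreeSet;

/**
 * Toy in-memory model of HDFS rename semantics (NOT the Hadoop API).
 * Only directories are tracked; parent-directory checks are omitted.
 */
public class RenameIntoExistingDir {
    // Directory paths that currently exist in the toy namespace.
    private final Set<String> dirs = new TreeSet<>();

    public void mkdir(String path) {
        dirs.add(path);
    }

    public boolean exists(String path) {
        return dirs.contains(path);
    }

    /**
     * If dst already exists as a directory, src is moved *into* it as
     * dst/<last component of src>; otherwise src becomes dst.
     * Returns the path where src ended up.
     */
    public String rename(String src, String dst) {
        String srcName = src.substring(src.lastIndexOf('/') + 1);
        String target = exists(dst) ? dst + "/" + srcName : dst;

        // Collect src and everything beneath it, then re-root them under target.
        Set<String> moved = new TreeSet<>();
        for (String d : dirs) {
            if (d.equals(src) || d.startsWith(src + "/")) {
                moved.add(d);
            }
        }
        dirs.removeAll(moved);
        for (String d : moved) {
            dirs.add(target + d.substring(src.length()));
        }
        return target;
    }

    /** Hypothetical naming scheme: one output folder per execution count. */
    static String outputDirFor(String base, int executionCount) {
        return base + "/compaction_" + executionCount;
    }

    public static void main(String[] args) {
        RenameIntoExistingDir fs = new RenameIntoExistingDir();
        String base = "/data/output/daily";

        // Run 1 created compaction_1, but failed before the state file was updated.
        fs.mkdir(outputDirFor(base, 1));

        // Run 2 reads the stale state file and derives the same execution count.
        fs.mkdir("/tmp/staging/job_2");
        String landed = fs.rename("/tmp/staging/job_2", outputDirFor(base, 1));
        System.out.println(landed); // prints /data/output/daily/compaction_1/job_2
    }
}
```

The output lands one level deeper than intended (compaction_1/job_2 rather
than compaction_1 itself), which is the extra child directory this issue
describes.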
Requirement:
Before the rename operation, we must ensure that the computed destination path
does not already exist.
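One possible way to meet this requirement, sketched here in Java under stated
assumptions, is to treat the execution count from the state file only as a
starting point and advance it until the candidate directory does not exist.
The existence check is passed in as a Predicate<String>; in real code it could
wrap Hadoop's FileSystem#exists(Path). DestinationGuard, firstFreeOutputDir,
maxAttempts and the compaction_<n> naming are hypothetical and not part of
Gobblin.

```java
import java.util.Set;
import java.util.function.Predicate;

/**
 * Sketch of a pre-rename guard: never hand rename() a destination that already exists.
 * All names here (DestinationGuard, compaction_<n>) are hypothetical, not Gobblin's API.
 */
public class DestinationGuard {

    /** Hypothetical naming scheme: one output folder per execution count. */
    static String outputDirFor(String base, int executionCount) {
        return base + "/compaction_" + executionCount;
    }

    /**
     * Starting from the execution count recorded in the (possibly stale) state file,
     * return the first output directory that does not exist yet.
     * Throws if no free directory is found within maxAttempts candidates.
     */
    public static String firstFreeOutputDir(String base, int countFromState, int maxAttempts,
                                            Predicate<String> exists) {
        for (int i = 0; i < maxAttempts; i++) {
            String candidate = outputDirFor(base, countFromState + i);
            if (!exists.test(candidate)) {
                return candidate;
            }
        }
        throw new IllegalStateException("No free output directory under " + base
                + " after " + maxAttempts + " attempts");
    }

    public static void main(String[] args) {
        // compaction_1 and compaction_2 were written by earlier runs,
        // but the stale state file still reports an execution count of 1.
        Set<String> existing = Set.of("/out/compaction_1", "/out/compaction_2");
        String dst = firstFreeOutputDir("/out", 1, 10, existing::contains);
        System.out.println(dst); // prints /out/compaction_3
    }
}
```

Failing fast (throwing as soon as the computed path exists) is an equally
valid choice if skipping a folder number is undesirable. Either way, a check
followed by a rename is not atomic, so this guard assumes no other writer
creates the same directory between the two calls; checking the boolean
returned by FileSystem#rename is still worthwhile.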
--
This message was sent by Atlassian Jira
(v8.20.10#820010)