[ 
https://issues.apache.org/jira/browse/GOBBLIN-2105?focusedWorklogId=925461&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-925461
 ]

ASF GitHub Bot logged work on GOBBLIN-2105:
-------------------------------------------

                Author: ASF GitHub Bot
            Created on: 11/Jul/24 14:50
            Start Date: 11/Jul/24 14:50
    Worklog Time Spent: 10m 
      Work Description: arjun4084346 merged PR #3993:
URL: https://github.com/apache/gobblin/pull/3993




Issue Time Tracking
-------------------

            Worklog Id:     (was: 925461)
    Remaining Estimate: 0h
            Time Spent: 10m

> Ensure the destination path does not exist before renaming during Gobblin 
> compaction.
> -------------------------------------------------------------------------------------
>
>                 Key: GOBBLIN-2105
>                 URL: https://issues.apache.org/jira/browse/GOBBLIN-2105
>             Project: Apache Gobblin
>          Issue Type: Improvement
>          Components: gobblin-compaction
>            Reporter: Arpit Varshney
>            Assignee: Issac Buenrostro
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> As part of Gobblin compaction (deduplication), compacted files are moved from 
> staging to their final location at the end of the process. This movement is 
> handled by the 
> org.apache.gobblin.compaction.action.CompactionCompleteFileOperationAction#onCompactionJobComplete
>  method, which determines the appropriate destination path and moves the 
> compacted files accordingly.
> Current Issue:
> - If the flag compaction.rename.source.dir.enabled is set to false (not in 
> append mode) and recompaction.write.to.new.folder is set to true, a new 
> directory is determined based on the execution count derived from the state 
> file.
> - The state file, however, is written only after the move to the final 
> location. If the move fails, the state file is left out of date.
> - On the next execution, the determined destination path might already exist. 
> The rename then places the source inside it as an additional child directory, 
> which is how HDFS rename behaves when the destination directory already 
> exists.
> Requirement:
> We need to ensure that the determined destination path does not exist before 
> the rename operation.
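
The requirement above can be sketched as a guard that runs just before the move. This is only an illustration, not the change merged in PR #3993: it uses java.nio.file as a stand-in for the Hadoop FileSystem API (where the corresponding calls would be FileSystem.exists, FileSystem.delete and FileSystem.rename), and the class name, method names and directory names are hypothetical. Deleting a leftover destination is also an assumption; the issue only states that the path must not exist, not how that should be achieved.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class SafeRename {

    /** Recursively deletes the given path; does nothing if it does not exist. */
    public static void deleteRecursively(Path path) throws IOException {
        if (!Files.exists(path)) {
            return;
        }
        List<Path> entries;
        try (Stream<Path> walk = Files.walk(path)) {
            // Reverse order so that children are deleted before their parents.
            entries = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path entry : entries) {
            Files.delete(entry);
        }
    }

    /**
     * Moves the staging directory to the destination, first making sure the
     * destination does not exist (assumed policy: remove a leftover directory
     * from an earlier failed run).
     */
    public static void moveToFreshDestination(Path staging, Path destination) throws IOException {
        deleteRecursively(destination);
        Files.createDirectories(destination.getParent());
        Files.move(staging, destination);
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("compaction-demo");
        Path staging = Files.createDirectories(root.resolve("staging"));
        Files.write(staging.resolve("part-0.avro"), new byte[] {1});

        // Simulate a destination left behind by an earlier failed move.
        Path destination = Files.createDirectories(root.resolve("final").resolve("compacted_2"));
        Files.write(destination.resolve("stale.avro"), new byte[] {2});

        moveToFreshDestination(staging, destination);
        System.out.println("part file present: " + Files.exists(destination.resolve("part-0.avro")));
        System.out.println("nested staging dir: " + Files.exists(destination.resolve("staging")));
    }
}
```

With the guard in place, the compacted files end up directly in the destination directory instead of in a nested child directory. Failing fast or choosing a different directory name would be equally valid ways to meet the same requirement.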



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
