[
https://issues.apache.org/jira/browse/GOBBLIN-1714?focusedWorklogId=811530&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-811530
]
ASF GitHub Bot logged work on GOBBLIN-1714:
-------------------------------------------
Author: ASF GitHub Bot
Created on: 23/Sep/22 08:08
Start Date: 23/Sep/22 08:08
Worklog Time Spent: 10m
Work Description: Will-Lo commented on code in PR #3568:
URL: https://github.com/apache/gobblin/pull/3568#discussion_r978345925
##########
gobblin-data-management/src/test/java/org/apache/gobblin/data/management/copy/RecursiveCopyableDatasetTest.java:
##########
@@ -326,7 +326,7 @@ public TestRecursiveCopyableDataset(Path source, Path target, List<FileStatus> s
     @Override
     protected List<FileStatus> getFilesAtPath(FileSystem fs, Path path, PathFilter fileFilter)
-        throws IOException {
+        throws RuntimeException {
Review Comment:
It's not standard to declare that a method throws RuntimeException. See:
http://www.javapractices.com/topic/TopicAction.do?Id=129
Java does not enforce it, because RuntimeException and its descendants are
unchecked. They are generally considered non-recoverable, so callers do not
need to handle RuntimeException explicitly.
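
As a small illustration of the checked/unchecked distinction described above, here is a minimal sketch. The class and method names (`UncheckedDemo`, `parsePositive`, `mustDeclare`) are hypothetical and are not part of Gobblin.

```java
import java.io.IOException;

class UncheckedDemo {
    // No throws clause is needed here: IllegalStateException and
    // NumberFormatException both extend RuntimeException (unchecked).
    static int parsePositive(String s) {
        int n = Integer.parseInt(s);
        if (n <= 0) {
            throw new IllegalStateException("expected a positive number: " + n);
        }
        return n;
    }

    // IOException is checked, so it must be declared here or caught inside.
    static void mustDeclare(boolean fail) throws IOException {
        if (fail) {
            throw new IOException("simulated I/O failure");
        }
    }

    public static void main(String[] args) {
        // Compiles without try/catch because only unchecked exceptions can escape.
        System.out.println(parsePositive("7"));

        // The compiler forces the caller to handle (or re-declare) IOException.
        try {
            mustDeclare(true);
        } catch (IOException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```

Adding `throws RuntimeException` to `parsePositive` would compile, but it would not change what callers are required to do.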
##########
gobblin-data-management/src/main/java/org/apache/gobblin/data/management/copy/RecursiveCopyableDataset.java:
##########
@@ -195,22 +196,28 @@ public Collection<? extends CopyEntity> getCopyableFiles(FileSystem targetFs, Co
     Map<Path, FileStatus> filesInSource =
         createPathMap(getFilesAtPath(this.fs, this.rootPath, this.pathFilter), this.rootPath);
-    Map<Path, FileStatus> filesInTarget =
-        createPathMap(getFilesAtPath(targetFs, targetPath, this.pathFilter), targetPath);
+
+    // Allow fileNotFoundException for filesInTarget since if it doesn't exist, they will be created.
+    List<FileStatus> filesAtPath = Lists.newArrayList();
+    try {
+      filesAtPath = getFilesAtPath(targetFs, targetPath, this.pathFilter);
+    } catch (FileNotFoundException e) {
+      log.info(String.format("Could not find any files on targetFs %s path %s.", targetFs.getUri(), targetPath));
+    }
+    Map<Path, FileStatus> filesInTarget = createPathMap(filesAtPath, targetPath);
     return getCopyableFilesImpl(configuration, filesInSource, filesInTarget, targetFs,
         nonGlobSearchPath, configuration.getPublishDir(), targetPath);
   }
   @VisibleForTesting
   protected List<FileStatus> getFilesAtPath(FileSystem fs, Path path, PathFilter fileFilter)
-      throws IOException {
+      throws FileNotFoundException {
     try {
       return FileListUtils
           .listFilesToCopyAtPath(fs, path, fileFilter, applyFilterToDirectories, includeEmptyDirectories);
     } catch (IOException e) {
-      log.warn(String.format("Could not find any files on fs %s path %s due to the following exception. Returning an empty list of files.", fs.getUri(), path), e);
-      return Lists.newArrayList();
+      throw new FileNotFoundException(String.format("Could not find any files on fs %s path %s.", fs.getUri(), path));
     }
Review Comment:
Sorry if I was misleading earlier. I thought about it some more and I think
we need to be cautious here. We actually want to do the reverse of what you
have. So we have the function catch (FileNotFoundException) here silently,
which is the old behavior. We actually want this function to return the
empty list `filesAtPath`, since otherwise it would cause all pipelines with one
missing target folder to perform a full copy instead of an incremental copy.
This means there will be a tradeoff: the sourceFS will still fail
silently if the folder is missing on the source.
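
To make the full-copy versus incremental-copy concern concrete, here is a simplified model of how the target listing drives the copy plan. It is only a sketch under stated assumptions: `CopyPlanSketch` and `filesToCopy` are hypothetical names, files are modeled as a map from relative path to length, and the comparison rule (copy when missing or when lengths differ) is an assumption rather than the logic in `getCopyableFilesImpl`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class CopyPlanSketch {
    // Hypothetical rule: copy a source file when the target lacks it
    // or holds a file of a different length.
    static List<String> filesToCopy(Map<String, Long> source, Map<String, Long> target) {
        List<String> toCopy = new ArrayList<>();
        for (Map.Entry<String, Long> entry : source.entrySet()) {
            Long targetLength = target.get(entry.getKey());
            if (targetLength == null || !targetLength.equals(entry.getValue())) {
                toCopy.add(entry.getKey());
            }
        }
        return toCopy;
    }

    public static void main(String[] args) {
        Map<String, Long> source = new TreeMap<>();
        source.put("a.avro", 10L);
        source.put("b.avro", 20L);

        Map<String, Long> target = new TreeMap<>();
        target.put("a.avro", 10L);

        // Target listing available: only the missing file is planned.
        System.out.println(filesToCopy(source, target));

        // Target listing empty: every source file is planned, i.e. a full copy.
        System.out.println(filesToCopy(source, new TreeMap<>()));
    }
}
```

In this model an empty target listing is indistinguishable from a target that has never been written, which is why the choice between returning an empty list and throwing matters.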
##########
gobblin-data-management/src/test/java/org/apache/gobblin/data/management/copy/RecursiveCopyableDatasetTest.java:
##########
@@ -326,7 +326,7 @@ public TestRecursiveCopyableDataset(Path source, Path target, List<FileStatus> s
     @Override
     protected List<FileStatus> getFilesAtPath(FileSystem fs, Path path, PathFilter fileFilter)
-        throws IOException {
+        throws RuntimeException {
Review Comment:
Though since this is a test function, you can probably just have it throw
IOException instead, or have it match the function definition without
actually throwing.
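
The point about matching the definition without throwing relies on a Java rule: an overriding method may declare fewer checked exceptions than the method it overrides, including none at all. A minimal sketch follows, with hypothetical stand-in classes (`Lister`, `StubLister`) that use plain strings instead of Hadoop's `FileStatus`.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class Lister {
    // Base method declares a checked exception.
    protected List<String> getFilesAtPath(String path) throws IOException {
        if (path.isEmpty()) {
            throw new IOException("empty path");
        }
        List<String> files = new ArrayList<>();
        files.add(path + "/part-0");
        return files;
    }
}

class StubLister extends Lister {
    private final List<String> canned;

    StubLister(List<String> canned) {
        this.canned = canned;
    }

    // The override drops the throws clause entirely; this is legal because
    // an override may narrow, but never widen, the checked exceptions.
    @Override
    protected List<String> getFilesAtPath(String path) {
        return canned;
    }
}

class OverrideDemo {
    public static void main(String[] args) {
        StubLister stub = new StubLister(Arrays.asList("f1", "f2"));
        // Through the subclass type, no try/catch is required.
        System.out.println(stub.getFilesAtPath("ignored"));

        // Through the base type, the compiler still requires handling IOException.
        Lister base = stub;
        try {
            System.out.println(base.getFilesAtPath("ignored"));
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Keeping `throws IOException` on the test override is also valid; it simply means the stub never exercises that path.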
##########
gobblin-data-management/src/main/java/org/apache/gobblin/data/management/copy/RecursiveCopyableDataset.java:
##########
@@ -195,22 +196,28 @@ public Collection<? extends CopyEntity> getCopyableFiles(FileSystem targetFs, Co
     Map<Path, FileStatus> filesInSource =
         createPathMap(getFilesAtPath(this.fs, this.rootPath, this.pathFilter), this.rootPath);
-    Map<Path, FileStatus> filesInTarget =
-        createPathMap(getFilesAtPath(targetFs, targetPath, this.pathFilter), targetPath);
+
+    // Allow fileNotFoundException for filesInTarget since if it doesn't exist, they will be created.
+    List<FileStatus> filesAtPath = Lists.newArrayList();
+    try {
+      filesAtPath = getFilesAtPath(targetFs, targetPath, this.pathFilter);
+    } catch (FileNotFoundException e) {
+      log.info(String.format("Could not find any files on targetFs %s path %s.", targetFs.getUri(), targetPath));
+    }
+    Map<Path, FileStatus> filesInTarget = createPathMap(filesAtPath, targetPath);
     return getCopyableFilesImpl(configuration, filesInSource, filesInTarget, targetFs,
         nonGlobSearchPath, configuration.getPublishDir(), targetPath);
   }
   @VisibleForTesting
   protected List<FileStatus> getFilesAtPath(FileSystem fs, Path path, PathFilter fileFilter)
-      throws IOException {
+      throws FileNotFoundException {
     try {
       return FileListUtils
           .listFilesToCopyAtPath(fs, path, fileFilter, applyFilterToDirectories, includeEmptyDirectories);
     } catch (IOException e) {
-      log.warn(String.format("Could not find any files on fs %s path %s due to the following exception. Returning an empty list of files.", fs.getUri(), path), e);
-      return Lists.newArrayList();
+      throw new FileNotFoundException(String.format("Could not find any files on fs %s path %s.", fs.getUri(), path));
     }
Review Comment:
Given that this is the current behavior for the other Gobblin pipelines (if
source does not exist, do not fail but report no work done), let's just go with
this for now. If we want to fail loudly on no workunits collected, we should
handle it holistically at a higher level.
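
For illustration only, a higher-level check of the kind suggested above might look roughly like the following. Everything here is hypothetical: `WorkUnitGuard`, `requireNonEmpty`, and the `failOnEmpty` flag are invented for this sketch and do not correspond to an existing Gobblin API or configuration key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class WorkUnitGuard {
    // Hypothetical guard applied once, after all datasets have been planned,
    // instead of inside each file-listing helper.
    static <T> List<T> requireNonEmpty(List<T> workUnits, boolean failOnEmpty) {
        if (failOnEmpty && workUnits.isEmpty()) {
            throw new IllegalStateException("No work units were collected for this run");
        }
        return workUnits;
    }

    public static void main(String[] args) {
        // Default behavior in this sketch: an empty plan is reported, not fatal.
        System.out.println(requireNonEmpty(new ArrayList<String>(), false).size());

        // Opt-in strict mode: a non-empty plan passes through unchanged.
        System.out.println(requireNonEmpty(Arrays.asList("wu-1"), true));
    }
}
```

Placing the decision in one spot keeps the listing helpers consistent with the other pipelines while still allowing a strict mode later.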
Issue Time Tracking
-------------------
Worklog Id: (was: 811530)
Time Spent: 0.5h (was: 20m)
> Silent failure during data copy
> -------------------------------
>
> Key: GOBBLIN-1714
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1714
> Project: Apache Gobblin
> Issue Type: Bug
> Reporter: Andy Jiang
> Priority: Major
> Time Spent: 0.5h
> Remaining Estimate: 0h
>