phet commented on code in PR #3528:
URL: https://github.com/apache/gobblin/pull/3528#discussion_r926141507
##########
gobblin-data-management/src/main/java/org/apache/gobblin/data/management/copy/TimeAwareRecursiveCopyableDataset.java:
##########
@@ -170,37 +129,41 @@ private boolean lookbackTimeMatchesFormat(PeriodFormatterBuilder formatterBuilde
@Override
protected List<FileStatus> getFilesAtPath(FileSystem fs, Path path, PathFilter fileFilter) throws IOException {
- DateTimeFormatter formatter = DateTimeFormat.forPattern(this.datePattern);
- LocalDateTime endDate = currentTime;
- LocalDateTime startDate = endDate.minus(this.lookbackPeriod);
- List<FileStatus> fileStatuses = Lists.newArrayList();
+ return recursivelyGetFilesAtDatePath(fs, path, "", fileFilter, 1);
+ }
- // Data inside of nested folders representing timestamps need to be fetched differently
- if (datePattern.contains(FileSystems.getDefault().getSeparator())) {
- // Use an iterator that traverses through all times from lookback to current time, based on format
- DateRangeIterator dateRangeIterator = new DateRangeIterator(startDate, endDate, this.patternQualifier);
- while (dateRangeIterator.hasNext()) {
- Path pathWithDateTime = new Path(path, dateRangeIterator.next().toString(formatter));
- if (!fs.exists(pathWithDateTime)) {
- continue;
- }
- fileStatuses.addAll(super.getFilesAtPath(fs, pathWithDateTime, fileFilter));
- }
- } else {
- // Look at the top level directories and compare if those fit into the date format
- Iterator<FileStatus> folderIterator = Arrays.asList(fs.listStatus(path)).iterator();
+ private List<FileStatus> recursivelyGetFilesAtDatePath(FileSystem fs, Path path, String traversedDatePath, PathFilter fileFilter, int level) throws IOException {
+ List<FileStatus> fileStatuses = Lists.newArrayList();
+ Iterator<FileStatus> folderIterator = Arrays.asList(fs.listStatus(path)).iterator();
+
+ // Check if at the lowest level/granularity of the date folder
+ if (this.datePattern.split(FileSystems.getDefault().getSeparator()).length == level) {
Review Comment:
Since `datePattern` doesn't change, the number of levels could be calculated once in `getFilesAtPath` and decremented on each recursive step.
The same goes for `endDate` and `startDate` below: could they be calculated once in the helper function?
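
One possible reading of this suggestion, as a minimal self-contained sketch rather than a patch for this PR: it uses `java.nio.file` and `java.time.LocalDate` in place of the Hadoop `FileSystem` and Joda-Time types, assumes `/` as the pattern separator, and only handles day-granularity patterns. `DatePathWalker`, `collect`, and `levelsRemaining` are hypothetical names. The depth and the formatter are computed once in `getFilesAtPath`, and the remaining depth is counted down on each recursive step.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class DatePathWalker {
  // Hypothetical stand-ins for the fields of the class under review.
  private final String datePattern; // e.g. "yyyy/MM/dd"
  private final LocalDate startDate; // start of the lookback window, computed once by the caller
  private final LocalDate endDate;   // end of the lookback window, computed once by the caller

  public DatePathWalker(String datePattern, LocalDate startDate, LocalDate endDate) {
    this.datePattern = datePattern;
    this.startDate = startDate;
    this.endDate = endDate;
  }

  public List<Path> getFilesAtPath(Path root) throws IOException {
    // Computed once per call instead of once per recursive step.
    int levelsRemaining = datePattern.split("/").length;
    DateTimeFormatter formatter = DateTimeFormatter.ofPattern(datePattern);
    List<Path> files = new ArrayList<>();
    collect(root, "", levelsRemaining, formatter, files);
    return files;
  }

  private void collect(Path dir, String traversedDatePath, int levelsRemaining,
      DateTimeFormatter formatter, List<Path> out) throws IOException {
    List<Path> children;
    try (Stream<Path> listing = Files.list(dir)) {
      children = listing.collect(Collectors.toList());
    }
    for (Path child : children) {
      if (!Files.isDirectory(child)) {
        continue;
      }
      String name = child.getFileName().toString();
      String datePath = traversedDatePath.isEmpty() ? name : traversedDatePath + "/" + name;
      if (levelsRemaining > 1) {
        // Not yet at the lowest date folder: descend with one fewer level to go.
        collect(child, datePath, levelsRemaining - 1, formatter, out);
        continue;
      }
      try {
        // Lowest level: the accumulated path must parse as a date inside the window.
        LocalDate date = LocalDate.parse(datePath, formatter);
        if (!date.isBefore(startDate) && !date.isAfter(endDate)) {
          try (Stream<Path> files = Files.list(child)) {
            files.filter(Files::isRegularFile).forEach(out::add);
          }
        }
      } catch (DateTimeParseException e) {
        // Folder names that do not match the pattern are skipped.
      }
    }
  }
}
```

With this shape the recursive helper never touches `datePattern` or recomputes the window; it only compares against values fixed before the traversal starts.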