[ https://issues.apache.org/jira/browse/GOBBLIN-1669?focusedWorklogId=793476&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-793476 ]
ASF GitHub Bot logged work on GOBBLIN-1669:
-------------------------------------------
Author: ASF GitHub Bot
Created on: 21/Jul/22 00:12
Start Date: 21/Jul/22 00:12
Worklog Time Spent: 10m
Work Description: phet commented on code in PR #3528:
URL: https://github.com/apache/gobblin/pull/3528#discussion_r926141507
##########
gobblin-data-management/src/main/java/org/apache/gobblin/data/management/copy/TimeAwareRecursiveCopyableDataset.java:
##########
@@ -170,37 +129,41 @@ private boolean lookbackTimeMatchesFormat(PeriodFormatterBuilder formatterBuilde
@Override
protected List<FileStatus> getFilesAtPath(FileSystem fs, Path path, PathFilter fileFilter) throws IOException {
- DateTimeFormatter formatter = DateTimeFormat.forPattern(this.datePattern);
- LocalDateTime endDate = currentTime;
- LocalDateTime startDate = endDate.minus(this.lookbackPeriod);
- List<FileStatus> fileStatuses = Lists.newArrayList();
+ return recursivelyGetFilesAtDatePath(fs, path, "", fileFilter, 1);
+ }
- // Data inside of nested folders representing timestamps need to be fetched differently
- if (datePattern.contains(FileSystems.getDefault().getSeparator())) {
- // Use an iterator that traverses through all times from lookback to current time, based on format
- DateRangeIterator dateRangeIterator = new DateRangeIterator(startDate, endDate, this.patternQualifier);
- while (dateRangeIterator.hasNext()) {
- Path pathWithDateTime = new Path(path, dateRangeIterator.next().toString(formatter));
- if (!fs.exists(pathWithDateTime)) {
- continue;
- }
- fileStatuses.addAll(super.getFilesAtPath(fs, pathWithDateTime, fileFilter));
- }
- } else {
- // Look at the top level directories and compare if those fit into the date format
- Iterator<FileStatus> folderIterator = Arrays.asList(fs.listStatus(path)).iterator();
+ private List<FileStatus> recursivelyGetFilesAtDatePath(FileSystem fs, Path path, String traversedDatePath, PathFilter fileFilter, int level) throws IOException {
+ List<FileStatus> fileStatuses = Lists.newArrayList();
+ Iterator<FileStatus> folderIterator = Arrays.asList(fs.listStatus(path)).iterator();
+
+ // Check if at the lowest level/granularity of the date folder
+ if (this.datePattern.split(FileSystems.getDefault().getSeparator()).length == level) {
Review Comment:
since `datePattern` doesn't change, the number of levels could always be calculated once in `getFilesAtPath` and decremented on each recursive step.
same with the `endDate` and `startDate` below: could they be calculated once in the helper function?
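
One possible reading of the suggestion above, as a minimal sketch rather than the PR's actual code: the caller splits the date pattern once and hands the recursive helper a counter that shrinks by one per level, so the pattern is never re-split. The class and method names (`DatePathDepthSketch`, `datePatternDepth`, `getDateFolders`, `collect`) are hypothetical, `java.nio.file` stands in for the Hadoop `FileSystem` API, `/` is assumed as the separator, and the date-range filtering is left out. The `startDate`/`endDate` bounds could be threaded through the helper as parameters in the same way.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of Gobblin.
class DatePathDepthSketch {

  // Number of nested folder levels in the pattern, e.g. "yyyy/MM/dd" -> 3.
  // Assumes "/" as the separator for the purposes of this sketch.
  static int datePatternDepth(String datePattern) {
    return datePattern.split("/").length;
  }

  // Entry point: the depth is computed exactly once here.
  static List<Path> getDateFolders(Path root, String datePattern) throws IOException {
    List<Path> dateFolders = new ArrayList<>();
    collect(root, datePatternDepth(datePattern), dateFolders);
    return dateFolders;
  }

  // remainingLevels is decremented on each recursive step; when it is 1,
  // the children of dir are the lowest-granularity date folders.
  static void collect(Path dir, int remainingLevels, List<Path> out) throws IOException {
    try (DirectoryStream<Path> children = Files.newDirectoryStream(dir)) {
      for (Path child : children) {
        if (!Files.isDirectory(child)) {
          continue; // only folders can be date components
        }
        if (remainingLevels == 1) {
          out.add(child);
        } else {
          collect(child, remainingLevels - 1, out);
        }
      }
    }
  }
}
```

With this shape the termination check is a plain integer comparison, and the helper no longer needs the pattern at all for that check.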
Issue Time Tracking
-------------------
Worklog Id: (was: 793476)
Time Spent: 2h (was: 1h 50m)
> Support seconds with TimeAwareRecursiveCopyableDataset
> ------------------------------------------------------
>
> Key: GOBBLIN-1669
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1669
> Project: Apache Gobblin
> Issue Type: Improvement
> Components: gobblin-service
> Reporter: William Lo
> Assignee: Abhishek Tiwari
> Priority: Major
> Time Spent: 2h
> Remaining Estimate: 0h
>
> # Support seconds with the timeiterator
> # Optimize non-nested timestamp representations e.g. yyyy-mm-dd-hh-mm-ss to
> not use an iterator, and instead list the top level directory to reduce the
> number of FS calls.