[ 
https://issues.apache.org/jira/browse/GOBBLIN-1669?focusedWorklogId=793476&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-793476
 ]

ASF GitHub Bot logged work on GOBBLIN-1669:
-------------------------------------------

                Author: ASF GitHub Bot
            Created on: 21/Jul/22 00:12
            Start Date: 21/Jul/22 00:12
    Worklog Time Spent: 10m 
      Work Description: phet commented on code in PR #3528:
URL: https://github.com/apache/gobblin/pull/3528#discussion_r926141507


##########
gobblin-data-management/src/main/java/org/apache/gobblin/data/management/copy/TimeAwareRecursiveCopyableDataset.java:
##########
@@ -170,37 +129,41 @@ private boolean 
lookbackTimeMatchesFormat(PeriodFormatterBuilder formatterBuilde
 
   @Override
   protected List<FileStatus> getFilesAtPath(FileSystem fs, Path path, 
PathFilter fileFilter) throws IOException {
-    DateTimeFormatter formatter = DateTimeFormat.forPattern(this.datePattern);
-    LocalDateTime endDate = currentTime;
-    LocalDateTime startDate = endDate.minus(this.lookbackPeriod);
-    List<FileStatus> fileStatuses = Lists.newArrayList();
+    return recursivelyGetFilesAtDatePath(fs, path, "", fileFilter, 1);
+  }
 
-    // Data inside of nested folders representing timestamps need to be 
fetched differently
-    if (datePattern.contains(FileSystems.getDefault().getSeparator())) {
-      // Use an iterator that traverses through all times from lookback to 
current time, based on format
-      DateRangeIterator dateRangeIterator = new DateRangeIterator(startDate, 
endDate, this.patternQualifier);
-      while (dateRangeIterator.hasNext()) {
-        Path pathWithDateTime = new Path(path, 
dateRangeIterator.next().toString(formatter));
-        if (!fs.exists(pathWithDateTime)) {
-          continue;
-        }
-        fileStatuses.addAll(super.getFilesAtPath(fs, pathWithDateTime, 
fileFilter));
-      }
-    } else {
-      // Look at the top level directories and compare if those fit into the 
date format
-      Iterator<FileStatus> folderIterator = 
Arrays.asList(fs.listStatus(path)).iterator();
+  private List<FileStatus> recursivelyGetFilesAtDatePath(FileSystem fs, Path 
path, String traversedDatePath, PathFilter fileFilter, int level) throws 
IOException {
+    List<FileStatus> fileStatuses = Lists.newArrayList();
+    Iterator<FileStatus> folderIterator = 
Arrays.asList(fs.listStatus(path)).iterator();
+
+    // Check if at the lowest level/granularity of the date folder
+    if (this.datePattern.split(FileSystems.getDefault().getSeparator()).length 
== level) {

Review Comment:
   since `datePattern` doesn't change, the level count could be calculated once in 
`getFilesAtPath` and decremented on each recursive step
   
   same with the `endDate` and `startDate` below: could they be calculated once 
in the helper function?
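
   A rough sketch of what this suggestion might look like, not the actual Gobblin code. To stay self-contained it makes several assumptions: the class `DatePathSketch`, the `children` map (a stand-in for `fs.listStatus`), and the `collect` helper are all hypothetical, and it uses `java.time.LocalDate` with a date-only pattern such as `yyyy/MM/dd` instead of the Joda-Time types in the PR. The point is only that the level count, the formatter, and the date bounds are computed once in the entry point, and the recursion counts down with `remainingLevels - 1`.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; names and the in-memory "file system" are assumptions.
class DatePathSketch {
  // Stand-in for fs.listStatus: directory path -> names of its child folders.
  private final Map<String, List<String>> children;
  private final String datePattern;

  DatePathSketch(Map<String, List<String>> children, String datePattern) {
    this.children = children;
    this.datePattern = datePattern;
  }

  // Entry point: everything that stays constant during the recursion is computed once here.
  List<String> getFilesAtPath(String root, LocalDate startDate, LocalDate endDate) {
    int levels = datePattern.split("/").length;
    DateTimeFormatter formatter = DateTimeFormatter.ofPattern(datePattern);
    List<String> matches = new ArrayList<>();
    collect(root, "", levels, formatter, startDate, endDate, matches);
    return matches;
  }

  // remainingLevels is decremented on each step; 1 means the lowest date granularity.
  private void collect(String path, String traversedDatePath, int remainingLevels,
      DateTimeFormatter formatter, LocalDate startDate, LocalDate endDate, List<String> matches) {
    for (String child : children.getOrDefault(path, Collections.emptyList())) {
      String datePath = traversedDatePath.isEmpty() ? child : traversedDatePath + "/" + child;
      String childPath = path + "/" + child;
      if (remainingLevels > 1) {
        collect(childPath, datePath, remainingLevels - 1, formatter, startDate, endDate, matches);
        continue;
      }
      try {
        LocalDate date = LocalDate.parse(datePath, formatter);
        // Keep folders whose date falls inside [startDate, endDate].
        if (!date.isBefore(startDate) && !date.isAfter(endDate)) {
          matches.add(childPath);
        }
      } catch (DateTimeParseException e) {
        // Folder name does not fit the date pattern; skip it.
      }
    }
  }
}
```

   With this shape the recursive helper no longer re-splits `datePattern` or recomputes the date range on every call; it only compares `remainingLevels` against 1.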





Issue Time Tracking
-------------------

    Worklog Id:     (was: 793476)
    Time Spent: 2h  (was: 1h 50m)

> Support seconds with TimeAwareRecursiveCopyableDataset
> ------------------------------------------------------
>
>                 Key: GOBBLIN-1669
>                 URL: https://issues.apache.org/jira/browse/GOBBLIN-1669
>             Project: Apache Gobblin
>          Issue Type: Improvement
>          Components: gobblin-service
>            Reporter: William Lo
>            Assignee: Abhishek Tiwari
>            Priority: Major
>          Time Spent: 2h
>  Remaining Estimate: 0h
>
> # Support seconds with the time iterator
> # Optimize non-nested timestamp representations, e.g. yyyy-mm-dd-hh-mm-ss, to 
> not use an iterator, and instead list the top-level directory to reduce the 
> number of FS calls.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
