Will-Lo commented on code in PR #3563: URL: https://github.com/apache/gobblin/pull/3563#discussion_r973484788
##########
gobblin-data-management/src/main/java/org/apache/gobblin/data/management/copy/TimeAwareRecursiveCopyableDataset.java:
##########
@@ -134,9 +134,40 @@ protected List<FileStatus> getFilesAtPath(FileSystem fs, Path path, PathFilter f
     return recursivelyGetFilesAtDatePath(fs, path, "", fileFilter, 1, startDate, endDate, formatter);
   }
+
+  public Boolean checkPathDateTimeValidity(LocalDateTime startDate, LocalDateTime endDate, String traversedDatePath) {
+    int[] startDateSplit = new int[] { startDate.getYear(), startDate.getMonthOfYear(), startDate.getDayOfMonth(),
+        startDate.getHourOfDay(), startDate.getMinuteOfHour(), startDate.getSecondOfMinute(), startDate.getMillisOfSecond() };
+    int[] endDateSplit = new int[] { endDate.getYear(), endDate.getMonthOfYear(), endDate.getDayOfMonth(),
+        endDate.getHourOfDay(), endDate.getMinuteOfHour(), endDate.getSecondOfMinute(), endDate.getMillisOfSecond() };
+
+    String[] traversedDatePathSplit = traversedDatePath.split("/");
+
+    // Only check the number of parameters that the traversedDatePath has traversed through so far
+    for (int index = 0; index < traversedDatePathSplit.length; index++) {
+      // Only attempt to parse the number if the entire string are digits
+      boolean onlyNumbers = traversedDatePathSplit[index].matches("^[0-9]+$");
+      if (onlyNumbers) {
+        if (Integer.parseInt(traversedDatePathSplit[index]) < startDateSplit[index] ||
+            Integer.parseInt(traversedDatePathSplit[index]) > endDateSplit[index]) {
+          return false;
+        }
+      }
+      else {
+        return false;
+      }
+    }
+    return true;

Review Comment:
I believe this would not work for ranges that span multiple years, months, or days, because each path component is compared against the corresponding start and end field independently of the components before it.

Consider traversedDatePathSplit == [2022, 09, 01, ...] with startDate 2022/08/20 and endDate 2022/09/10. The method would return false, since it thinks the date is earlier than the start date (01 < 20), even though 2022/09/01 falls inside the range.

I would follow Arjun's recommendation: keep the strings as dates, round them to the lowest granularity, and then compare them.
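To illustrate the failure mode and one possible fix, here is a minimal sketch, not the PR's implementation. It assumes `java.time.LocalDateTime` instead of the Joda-Time type used in the diff so that it stays self-contained, and the class `DatePathRangeCheck` and method `isWithinRange` are hypothetical names. Rather than parsing the path into a date object, it truncates the start and end dates to the depth traversed so far and compares the fields as a whole, from coarsest to finest, which has the same effect as rounding both sides to a common granularity before comparing.

```java
import java.time.LocalDateTime;

// Hypothetical helper, not part of the PR: all names here are illustrative.
final class DatePathRangeCheck {

  private DatePathRangeCheck() {}

  /** Splits a date-time into fields ordered from coarsest (year) to finest (millisecond). */
  private static int[] toFields(LocalDateTime dt) {
    return new int[] { dt.getYear(), dt.getMonthValue(), dt.getDayOfMonth(),
        dt.getHour(), dt.getMinute(), dt.getSecond(), dt.getNano() / 1_000_000 };
  }

  /** Compares the first {@code length} fields as one value: the first differing field decides. */
  private static int comparePrefix(int[] a, int[] b, int length) {
    for (int i = 0; i < length; i++) {
      if (a[i] != b[i]) {
        return Integer.compare(a[i], b[i]);
      }
    }
    return 0;
  }

  /**
   * Returns true if the partially traversed path (for example "2022/09") could still
   * contain dates between startDate and endDate, both truncated to the path's depth.
   */
  static boolean isWithinRange(LocalDateTime startDate, LocalDateTime endDate, String traversedDatePath) {
    String[] parts = traversedDatePath.split("/");
    int[] startFields = toFields(startDate);
    int[] endFields = toFields(endDate);
    if (parts.length > startFields.length) {
      return false; // deeper than the finest supported field
    }

    int[] pathFields = new int[parts.length];
    for (int i = 0; i < parts.length; i++) {
      // Reject non-numeric components; cap the length so parseInt cannot overflow.
      if (!parts[i].matches("[0-9]{1,9}")) {
        return false;
      }
      pathFields[i] = Integer.parseInt(parts[i]);
    }

    return comparePrefix(pathFields, startFields, parts.length) >= 0
        && comparePrefix(pathFields, endFields, parts.length) <= 0;
  }
}
```

With startDate 2022/08/20 and endDate 2022/09/10, this sketch accepts the path 2022/09/01 from the example above and rejects 2022/08/19 and 2022/09/11. It does not validate calendar fields (a month of 13 is compared like any other number), so real code would likely still want to parse the components with the dataset's formatter.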