[ 
https://issues.apache.org/jira/browse/GOBBLIN-1001?focusedWorklogId=358097&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-358097
 ]

ASF GitHub Bot logged work on GOBBLIN-1001:
-------------------------------------------

                Author: ASF GitHub Bot
            Created on: 11/Dec/19 21:28
            Start Date: 11/Dec/19 21:28
    Worklog Time Spent: 10m 
      Work Description: zxcware commented on issue #2846: [GOBBLIN-1001] 
Implement TimePartitionGlobFinder
URL: 
https://github.com/apache/incubator-gobblin/pull/2846#issuecomment-564740742
 
 
   @autumnust Yeah, `yesterdayPartition` is really specific. I'm thinking about 
generalizing it to `enforcePreviousN` partitions (better name suggestions are 
welcome). Its main responsibility is to create an `EmptyFileSystemDataset` if 
any of the previous N partitions doesn't exist, signaling quiet time. In 
addition, it focuses on time partitions and supports different time formats 
(not limited to `yyyy/MM/dd`), unlike the vanilla 
`DefaultFileSystemGlobFinder`. (I'm adding comments about its usage.)
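
   To make the "previous N" idea concrete, here is a minimal sketch of the 
check, not the code in the PR. The class and method names 
(`PreviousPartitionCheck`, `previousPartitions`, `missingPartitions`) are 
hypothetical, and it assumes daily partitions on a local file system rather 
than Gobblin's actual file system and dataset abstractions.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Gobblin: illustrates the "previous N partitions" check.
final class PreviousPartitionCheck {

  /** Relative partition paths for the N days before {@code today}, oldest first. */
  static List<String> previousPartitions(LocalDate today, int n, String pattern) {
    // The pattern is configurable, so formats other than yyyy/MM/dd work too.
    DateTimeFormatter formatter = DateTimeFormatter.ofPattern(pattern);
    List<String> partitions = new ArrayList<>();
    for (int daysBack = n; daysBack >= 1; daysBack--) {
      partitions.add(today.minusDays(daysBack).format(formatter));
    }
    return partitions;
  }

  /** Partitions among the previous N that do not exist as directories under {@code datasetRoot}. */
  static List<String> missingPartitions(Path datasetRoot, LocalDate today, int n, String pattern) {
    List<String> missing = new ArrayList<>();
    for (String partition : previousPartitions(today, n, pattern)) {
      if (!Files.isDirectory(datasetRoot.resolve(partition))) {
        // Assumption: a missing directory is what signals quiet time for that partition.
        missing.add(partition);
      }
    }
    return missing;
  }
}
```

   Each entry returned by `missingPartitions` marks the point where an 
`EmptyFileSystemDataset` would be created for the absent partition. How that 
dataset is actually constructed is left out here on purpose.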
   
   With `enforcePreviousN`, it is tied even less to company requirements, which 
makes it more justifiable to open-source. In our use case, we capture the quiet 
time signal to publish the compaction watermark. Others can capture it to do 
different operations. 
   
   Another consideration is that we would have to make internal copies of the 
open source compaction constructs (`MRTask`, `Verifier`, `CompactionAction`) if 
`EmptyFileSystemDataset` were made internal. Compared to making 
`EmptyFileSystemDataset` a first-class citizen of the open source compaction 
flow, the implementation and maintenance cost of internalization is high, given 
that most of our pipelines use the open source compaction constructs.
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 358097)
    Time Spent: 50m  (was: 40m)

> Implement TimePartitionGlobFinder
> ---------------------------------
>
>                 Key: GOBBLIN-1001
>                 URL: https://issues.apache.org/jira/browse/GOBBLIN-1001
>             Project: Apache Gobblin
>          Issue Type: Task
>            Reporter: Zhixiong Chen
>            Assignee: Zhixiong Chen
>            Priority: Major
>          Time Spent: 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)
