[ https://issues.apache.org/jira/browse/GOBBLIN-117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Abhishek Tiwari reassigned GOBBLIN-117:
---------------------------------------

       Assignee: Hung Tran
    Component/s: state-management

> Remove current.jst from FsStateStore, add state-store retention to driver
> -------------------------------------------------------------------------
>
>                 Key: GOBBLIN-117
>                 URL: https://issues.apache.org/jira/browse/GOBBLIN-117
>             Project: Apache Gobblin
>          Issue Type: Bug
>          Components: state-management
>            Reporter: Sahil Takiar
>            Assignee: Hung Tran
>              Labels: Core:TaskManagement, enhancement
>
> - The `FsStateStore` creates and updates a `current.jst` to track the most 
> recent version of the job state
> - The problem is that for AWS users, the state-store typically has to be put
> on S3
> - S3 only provides eventual consistency when overwriting existing data
> - This can cause problems, as Gobblin jobs may read a stale `current.jst` and
> see an old version of the state-store
> A simple solution to this problem would be to:
> - Remove the concept of `current.jst` and just let each state-store entry be 
> of the form `job_id.jst`
> - Gobblin then just does an `ls` on the state-store directory, sorts the
> contents by file name, and picks the most recent one
> - Listing and sorting the directory shouldn't take long, but to keep the
> listing small, the state-store retention job should be run as part of the
> core Gobblin job, either in the `ApplicationLauncher` or the `JobLauncher`
>  
> *Github Url* : https://github.com/linkedin/gobblin/issues/882 
> *Github Reporter* : [~stakiar] 
> *Github Created At* : 2016-03-25T03:54:26Z 
> *Github Updated At* : 2017-01-12T04:50:48Z 
> h3. Comments 
> ----
> [~stakiar] wrote on 2016-03-25T03:55:30Z : @zliu41 I believe we discussed 
> this briefly while working on #741. Any comments on the above approach?
>  
>  
> *Github Url* : 
> https://github.com/linkedin/gobblin/issues/882#issuecomment-201126060 
> ----
> *zliu41* wrote on 2016-03-25T15:35:35Z : LGTM, except that if a job has 
> multiple datasets, there will be multiple `current.jst`s, so you'll need to 
> find the most recent one for each dataset URN.
>  
>  
> *Github Url* : 
> https://github.com/linkedin/gobblin/issues/882#issuecomment-201334649 
> ----
> [~jbaranick] wrote on 2016-04-12T02:58:38Z : I've started working on this.
>  
>  
> *Github Url* : 
> https://github.com/linkedin/gobblin/issues/882#issuecomment-208682515 
> ----
> *lakshmanantokbox* wrote on 2016-04-22T01:23:37Z : If consistency is 
> turned on in EMR (“consistent view” for EMRFS, 
> https://blogs.aws.amazon.com/bigdata/post/Tx1WL4KR7SE37YY/Ensuring-Consistency-When-Using-Amazon-S3-and-Amazon-Elastic-MapReduce-for-ETL-W),
> this problem can be avoided.
>  
>  
> *Github Url* : 
> https://github.com/linkedin/gobblin/issues/882#issuecomment-213198627 
> ----
> [~jbaranick] wrote on 2016-04-22T01:48:32Z : Correct, but for those who use 
> Qubole, this is not the case.
>  
>  
> *Github Url* : 
> https://github.com/linkedin/gobblin/issues/882#issuecomment-213207376
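The approach in the description, refined by zliu41's point about one entry per dataset URN, can be sketched as follows. This is a minimal, hypothetical sketch and not Gobblin code: the class `JstListing` and its methods are invented for illustration. It assumes entries are named `<datasetUrn>-<jobId>.jst`, where the job ID contains no `-` and sorts lexicographically in creation order (for example, by ending in a fixed-width epoch-millis timestamp). It works on plain file names; in practice the names would come from listing the state-store directory, e.g. with Hadoop's `FileSystem.listStatus`.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical helper (not part of Gobblin). Operates on a directory listing
 * given as file names, so it runs without any FileSystem.
 *
 * ASSUMED naming scheme: <datasetUrn>-<jobId>.jst, where jobId contains no '-'
 * and sorts lexicographically in creation order.
 */
public final class JstListing {
  private static final String SUFFIX = ".jst";

  private JstListing() {}

  /** Groups .jst file names by dataset URN; each group is sorted oldest first. */
  public static Map<String, List<String>> groupByDataset(List<String> fileNames) {
    Map<String, List<String>> byDataset = new TreeMap<>();
    for (String name : fileNames) {
      if (!name.endsWith(SUFFIX)) {
        continue; // not a state-store entry (e.g. a marker file)
      }
      int sep = name.lastIndexOf('-');
      if (sep <= 0) {
        continue; // does not match the assumed <datasetUrn>-<jobId>.jst scheme
      }
      String urn = name.substring(0, sep);
      byDataset.computeIfAbsent(urn, k -> new ArrayList<>()).add(name);
    }
    for (List<String> names : byDataset.values()) {
      names.sort(Comparator.naturalOrder()); // lexicographic == chronological, by assumption
    }
    return byDataset;
  }

  /** Most recent state file per dataset URN: the last name after sorting. */
  public static Map<String, String> latestPerDataset(List<String> fileNames) {
    Map<String, String> latest = new TreeMap<>();
    groupByDataset(fileNames).forEach(
        (urn, names) -> latest.put(urn, names.get(names.size() - 1)));
    return latest;
  }

  /** Files a retention pass would delete, keeping the newest `keep` per dataset. */
  public static List<String> expired(List<String> fileNames, int keep) {
    if (keep < 1) {
      throw new IllegalArgumentException("keep must be >= 1");
    }
    List<String> toDelete = new ArrayList<>();
    for (List<String> names : groupByDataset(fileNames).values()) {
      if (names.size() > keep) {
        toDelete.addAll(names.subList(0, names.size() - keep));
      }
    }
    return toDelete;
  }

  public static void main(String[] args) {
    List<String> listing = List.of(
        "dsA-job_pull_1458878066000.jst",
        "dsA-job_pull_1458881666000.jst",
        "dsB-job_pull_1458878066000.jst",
        "_SUCCESS");
    System.out.println(latestPerDataset(listing));
    System.out.println(expired(listing, 1));
  }
}
```

`latestPerDataset` is only correct under the naming assumption above, since it relies on a plain string sort. `expired` illustrates what a retention pass run from the `ApplicationLauncher` or `JobLauncher` might delete while keeping the newest `keep` entries per dataset; keeping the directory small is what keeps the listing and sort cheap. For the sample listing, `main` reports the newer `dsA` entry and the only `dsB` entry as current, and the older `dsA` file as expired.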



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
