[
https://issues.apache.org/jira/browse/GOBBLIN-117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Abhishek Tiwari reassigned GOBBLIN-117:
---------------------------------------
Assignee: Hung Tran
Component/s: state-management
> Remove current.jst from FsStateStore, add state-store retention to driver
> -------------------------------------------------------------------------
>
> Key: GOBBLIN-117
> URL: https://issues.apache.org/jira/browse/GOBBLIN-117
> Project: Apache Gobblin
> Issue Type: Bug
> Components: state-management
> Reporter: Sahil Takiar
> Assignee: Hung Tran
> Labels: Core:TaskManagement, enhancement
>
> - The `FsStateStore` creates and updates a `current.jst` to track the most
> recent version of the job state
> - For AWS users, the state-store typically has to be put on S3
> - The problem is that S3 only provides eventual consistency when overwriting
> data
> - As a result, Gobblin jobs may see an old version of the state-store
> A simple solution to this problem would be to:
> - Remove the concept of `current.jst` and just let each state-store entry be
> of the form `job_id.jst`
> - Gobblin then just does an `ls` on the state-store directory, sorts the
> contents by file name and picks the most recent one
> - Listing the files and sorting the listing shouldn't take long, but just in
> case, the state-store retention job should be run as part of the Gobblin core
> job, either in the `ApplicationLauncher` or the `JobLauncher`
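
The list-sort-pick step proposed above could look roughly like the sketch below. It is only an illustration, not Gobblin code: the class `LatestStateFile` and its method are hypothetical, the listing is passed in as plain file names rather than read from a real file system, and it assumes job ids end in a fixed-width timestamp so that sorting by name matches chronological order.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical helper for illustration; not part of Gobblin's FsStateStore.
class LatestStateFile {

    // Returns the greatest "*.jst" file name in lexicographic order.
    // Assumption: names look like "<job_id>.jst" and job ids end in a
    // fixed-width timestamp, so name order equals creation order.
    static Optional<String> latest(List<String> fileNames) {
        return fileNames.stream()
                .filter(name -> name.endsWith(".jst"))
                // Ignore a leftover current.jst from the old layout.
                .filter(name -> !name.equals("current.jst"))
                .max(Comparator.naturalOrder());
    }

    public static void main(String[] args) {
        // Stand-in for the result of an `ls` on the state-store directory.
        List<String> listing = Arrays.asList(
                "job_MyJob_1458878066000.jst",
                "job_MyJob_1458964466000.jst",
                "current.jst");
        System.out.println(latest(listing).orElse("<no state found>"));
    }
}
```

Because every run writes a new file instead of overwriting `current.jst`, a reader never depends on an overwrite having become visible. It still depends on the directory listing itself being up to date.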
>
> *Github Url* : https://github.com/linkedin/gobblin/issues/882
> *Github Reporter* : [~stakiar]
> *Github Created At* : 2016-03-25T03:54:26Z
> *Github Updated At* : 2017-01-12T04:50:48Z
> h3. Comments
> ----
> [~stakiar] wrote on 2016-03-25T03:55:30Z : @zliu41 I believe we discussed
> this briefly while working on #741, any comments on the above approach?
>
>
> *Github Url* :
> https://github.com/linkedin/gobblin/issues/882#issuecomment-201126060
> ----
> *zliu41* wrote on 2016-03-25T15:35:35Z : LGTM except that if a job has
> multiple datasets there will be multiple `current.jst`s so you'll need to
> find the most recent one for each dataset urn.
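
A minimal sketch of the per-dataset variant mentioned in this comment follows. The file-name layout `<datasetUrn>-<job_id>.jst`, the `-job_` separator, and the class `LatestStatePerDataset` are assumptions made for illustration, not confirmed Gobblin conventions.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for illustration; not part of Gobblin.
class LatestStatePerDataset {
    static final String SUFFIX = ".jst";
    // Assumed separator between the dataset urn and the job id.
    static final String JOB_MARKER = "-job_";

    // Maps each dataset urn to its lexicographically greatest state file name.
    // Names without the assumed "<datasetUrn>-job_..." shape are skipped.
    static Map<String, String> latestPerDataset(List<String> fileNames) {
        Map<String, String> latest = new TreeMap<>();
        for (String name : fileNames) {
            int split = name.lastIndexOf(JOB_MARKER);
            if (!name.endsWith(SUFFIX) || split < 0) {
                continue;
            }
            String datasetUrn = name.substring(0, split);
            // Keep whichever name sorts later for this dataset.
            latest.merge(datasetUrn, name, (a, b) -> a.compareTo(b) >= 0 ? a : b);
        }
        return latest;
    }

    public static void main(String[] args) {
        List<String> listing = Arrays.asList(
                "dsA-job_MyJob_1458878066000.jst",
                "dsA-job_MyJob_1458964466000.jst",
                "dsB-job_MyJob_1458878066000.jst",
                "dsA-current.jst");
        latestPerDataset(listing)
                .forEach((urn, file) -> System.out.println(urn + " -> " + file));
    }
}
```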
>
>
> *Github Url* :
> https://github.com/linkedin/gobblin/issues/882#issuecomment-201334649
> ----
> [~jbaranick] wrote on 2016-04-12T02:58:38Z : I've started working on this.
>
>
> *Github Url* :
> https://github.com/linkedin/gobblin/issues/882#issuecomment-208682515
> ----
> *lakshmanantokbox* wrote on 2016-04-22T01:23:37Z : If “consistent view” for
> EMRFS
> (https://blogs.aws.amazon.com/bigdata/post/Tx1WL4KR7SE37YY/Ensuring-Consistency-When-Using-Amazon-S3-and-Amazon-Elastic-MapReduce-for-ETL-W)
> is turned on in EMR, this problem can be avoided.
>
>
> *Github Url* :
> https://github.com/linkedin/gobblin/issues/882#issuecomment-213198627
> ----
> [~jbaranick] wrote on 2016-04-22T01:48:32Z : Correct, but for those who use
> Qubole, this is not the case.
>
>
> *Github Url* :
> https://github.com/linkedin/gobblin/issues/882#issuecomment-213207376
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)