The checkpoint cleanup works for HDFS, right? I assume the JobManager should see that as well.
This is not a trivial problem in general; the assumption we were making was that the JM can actually execute the cleanup logic.

Aljoscha Krettek <aljos...@apache.org> wrote (on Mon, 15 Jun 2015, 15:40):

> @Ufuk The cleanup bug for file:// checkpoints is not easy to fix IMHO.
>
> On Mon, 15 Jun 2015 at 15:39 Aljoscha Krettek <aljos...@apache.org> wrote:
>
> > Oh yes, on that I agree. I'm just saying that the checkpoint setting
> > should maybe be a central setting.
> >
> > On Mon, 15 Jun 2015 at 15:38 Matthias J. Sax <mj...@informatik.hu-berlin.de> wrote:
> >
> >> Hi,
> >>
> >> IMHO, it is very common that workers have their own config files (e.g.,
> >> Storm works the same way), and I think it makes a lot of sense. You
> >> might run Flink on a heterogeneous cluster and want to assign
> >> different memory and slots to different hardware. This would not be
> >> possible using a single config file (specified at the master and
> >> distributed from there).
> >>
> >> -Matthias
> >>
> >> On 06/15/2015 03:30 PM, Aljoscha Krettek wrote:
> >> > Regarding 1), that's why I said "bugs and features". :D But I think of
> >> > it as a bug, since people will normally set it in the flink-conf.yaml on the
> >> > master and assume that it works. That's what I assumed, and it took me a
> >> > while to figure out that the task managers don't respect this setting.
> >> >
> >> > Regarding 3), if you think about it, this could never work. The state
> >> > handle cleanup logic happens purely on the JobManager. So what happens is
> >> > that the TaskManagers create state in some directory, let's say
> >> > /tmp/checkpoints, on the TaskManager. For cleanup, the JobManager gets the
> >> > state handle and calls discard (on the JobManager); this tries to clean up
> >> > the state in /tmp/checkpoints, but of course there is nothing there, since
> >> > we are still on the JobManager.
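The failure mode described above comes down to the checkpoint directory not being visible from both sides: the TaskManagers write the state, but the discard call runs on the JobManager. A minimal flink-conf.yaml sketch that avoids it, assuming the 0.9-era configuration keys `state.backend` and `state.backend.fs.checkpointdir` and a hypothetical HDFS namenode address:

```yaml
# Use the filesystem state backend instead of the default
# in-memory (jobmanager) backend.
state.backend: filesystem

# The checkpoint directory must live on a filesystem reachable from
# the JobManager *and* every TaskManager, e.g. HDFS. A local file://
# path only works when everything runs on one machine.
# (namenode host/port below are placeholders, not from the thread)
state.backend.fs.checkpointdir: hdfs://namenode:9000/flink/checkpoints
```

With a shared path like this, the JobManager's discard call actually sees the files the TaskManagers wrote, so old checkpoints can be cleaned up.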
> >> >
> >> > On Mon, 15 Jun 2015 at 15:23 Márton Balassi <balassi.mar...@gmail.com> wrote:
> >> >
> >> >> @Aljoscha:
> >> >> 1) I think this just means that you can set the state backend on a
> >> >> per-taskmanager basis.
> >> >> 3) This is a serious issue then. Does it work when you set it in the
> >> >> flink-conf.yaml?
> >> >>
> >> >> On Mon, Jun 15, 2015 at 3:17 PM, Aljoscha Krettek <aljos...@apache.org> wrote:
> >> >>
> >> >>> So, during my testing of the state checkpointing on a cluster I discovered
> >> >>> several things (bugs and features):
> >> >>>
> >> >>> - If you have a setup where the configuration is not synced to the workers,
> >> >>> they do not pick up the state back-end configuration. The workers do not
> >> >>> respect the setting in the flink-conf.yaml on the master.
> >> >>> - HDFS checkpointing works fine if you manually set it as the per-job
> >> >>> state backend using setStateHandleProvider().
> >> >>> - If you manually set the stateHandleProvider to a "file://" backend, old
> >> >>> checkpoints will not be cleaned up; they will also not be cleaned up when a
> >> >>> job is finished.
> >> >>>
> >> >>> On Sun, 14 Jun 2015 at 23:22 Maximilian Michels <m...@apache.org> wrote:
> >> >>>
> >> >>>> Hi Henry,
> >> >>>>
> >> >>>> This is just a dry run. The goal is to get everything in shape for a
> >> >>>> proper vote.
> >> >>>>
> >> >>>> Kind regards,
> >> >>>> Max
> >> >>>>
> >> >>>> On Sun, Jun 14, 2015 at 7:58 PM, Henry Saputra <henry.sapu...@gmail.com> wrote:
> >> >>>>
> >> >>>>> Hi Max,
> >> >>>>>
> >> >>>>> Are you doing an official VOTE on the RC for the 0.9 release, or is this
> >> >>>>> just a dry run?
> >> >>>>>
> >> >>>>> - Henry
> >> >>>>>
> >> >>>>> On Sun, Jun 14, 2015 at 9:11 AM, Maximilian Michels <m...@apache.org> wrote:
> >> >>>>>> Dear Flink community,
> >> >>>>>>
> >> >>>>>> Here's the second release candidate for the 0.9.0 release. We haven't
> >> >>>>>> had a formal vote on the previous release candidate, but it received an
> >> >>>>>> implicit -1 because of a couple of issues.
> >> >>>>>>
> >> >>>>>> Thanks to the hard-working Flink devs, these issues should be solved now.
> >> >>>>>> The following commits have been added to the second release candidate:
> >> >>>>>>
> >> >>>>>> f5f0709 [FLINK-2194] [type extractor] Excludes Writable type from
> >> >>>>>> WritableTypeInformation to be treated as an interface
> >> >>>>>> 40e2df5 [FLINK-2072] [ml] Adds quickstart guide
> >> >>>>>> af0fee5 [FLINK-2207] Fix TableAPI conversion documenation and further
> >> >>>>>> renamings for consistency.
> >> >>>>>> e513be7 [FLINK-2206] Fix incorrect counts of finished, canceled, and failed
> >> >>>>>> jobs in webinterface
> >> >>>>>> ecfde6d [docs][release] update stable version to 0.9.0
> >> >>>>>> 4d8ae1c [docs] remove obsolete YARN link and cleanup download links
> >> >>>>>> f27fc81 [FLINK-2195] Configure Configurable Hadoop InputFormats
> >> >>>>>> ce3bc9c [streaming] [api-breaking] Minor DataStream cleanups
> >> >>>>>> 0edc0c8 [build] [streaming] Streaming parents dependencies pushed to
> >> >>>>>> children
> >> >>>>>> 6380b95 [streaming] Logging update for checkpointed streaming topologies
> >> >>>>>> 5993e28 [FLINK-2199] Escape UTF characters in Scala Shell welcome squirrel.
> >> >>>>>> 80dd72d [FLINK-2196] [javaAPI] Moved misplaced SortPartitionOperator class
> >> >>>>>> c8c2e2c [hotfix] Bring KMeansDataGenerator and KMeans quickstart in sync
> >> >>>>>> 77def9f [FLINK-2183][runtime] fix deadlock for concurrent slot release
> >> >>>>>> 87988ae [scripts] remove quickstart scripts
> >> >>>>>> f3a96de [streaming] Fixed streaming example jars packaging and termination
> >> >>>>>> 255c554 [FLINK-2191] Fix inconsistent use of closure cleaner in Scala
> >> >>>>>> Streaming
> >> >>>>>> 1343f26 [streaming] Allow force-enabling checkpoints for iterative jobs
> >> >>>>>> c59d291 Fixed a few trivial issues:
> >> >>>>>> e0e6f59 [streaming] Optional iteration feedback partitioning added
> >> >>>>>> 348ac86 [hotfix] Fix YARNSessionFIFOITCase
> >> >>>>>> 80cf2c5 [ml] Makes StandardScalers state package private and reduce
> >> >>>>>> redundant code. Adjusts flink-ml readme.
> >> >>>>>> c83ee8a [FLINK-1844] [ml] Add MinMaxScaler implementation in the
> >> >>>>>> proprocessing package, test for the for the corresponding functionality and
> >> >>>>>> documentation.
> >> >>>>>> ee7c417 [docs] [streaming] Added states and fold to the streaming docs
> >> >>>>>> fcca75c [docs] Fix some typos and grammar in the Streaming Programming
> >> >>>>>> Guide.
> >> >>>>>>
> >> >>>>>> Again, we need to test the new release candidate. Therefore, I've created a
> >> >>>>>> new document where we keep track of our testing criteria for releases:
> >> >>>>>> https://docs.google.com/document/d/162AZEX8lo0Njal10mmt9wzM5GYVL5WME-VfwGmwpBoA/edit
> >> >>>>>>
> >> >>>>>> Everyone who tested previously could take a different task this time. For
> >> >>>>>> some components we probably don't have to test again, but if in doubt,
> >> >>>>>> testing twice doesn't hurt.
> >> >>>>>>
> >> >>>>>> Happy testing :)
> >> >>>>>>
> >> >>>>>> Cheers,
> >> >>>>>> Max
> >> >>>>>>
> >> >>>>>> Git branch: release-0.9.0-rc2
> >> >>>>>> Release binaries: http://people.apache.org/~mxm/flink-0.9.0-rc2/
> >> >>>>>> Maven artifacts:
> >> >>>>>> https://repository.apache.org/content/repositories/orgapacheflink-1040/
> >> >>>>>> PGP public key for verifying the signatures:
> >> >>>>>> http://pgp.mit.edu/pks/lookup?op=vindex&search=0xDE976D18C2909CBF