The checkpoint cleanup works for HDFS, right? I assume the job manager
should see that as well.

This is not a trivial problem in general, so the assumption we have been
making until now is that the JM can actually execute the cleanup logic.
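
To make the failure mode concrete, here is a minimal, self-contained sketch
of the mechanism (the class name, the discard() method body and the
/tmp/checkpoints path are illustrative stand-ins, not Flink's actual API):
the handle only carries a path, and whoever calls discard() resolves that
path against its own local disk.

import java.io.File;
import java.io.Serializable;

// Illustrative stand-in for a checkpoint state handle. It only stores a
// path; discard() deletes whatever that path points to on the machine that
// happens to invoke it.
class LocalFileStateHandle implements Serializable {

    private final String path;

    LocalFileStateHandle(String path) {
        this.path = path;
    }

    // Called on the JobManager once the checkpoint is no longer needed.
    void discard() {
        // With a "file://" backend this resolves against the caller's
        // local disk. A TaskManager wrote /tmp/checkpoints/... on its own
        // machine, but this code runs on the JobManager, where the file
        // does not exist, so the TaskManager's copy is never removed.
        File f = new File(path);
        if (!f.delete()) {
            System.err.println("Nothing to delete at " + path
                    + " on this machine (likely the JobManager).");
        }
    }
}

With an hdfs:// path the same discard call would go through the distributed
file system client, so the JobManager sees exactly the files the
TaskManagers wrote and can delete them, which is why cleanup works for HDFS.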

On Mon, 15 Jun 2015 at 15:40, Aljoscha Krettek <aljos...@apache.org> wrote:

> @Ufuk The cleanup bug for file:// checkpoints is not easy to fix IMHO.
>
> On Mon, 15 Jun 2015 at 15:39 Aljoscha Krettek <aljos...@apache.org> wrote:
>
> > Oh yes, on that I agree. I'm just saying that the checkpoint setting
> > should maybe be a central setting.
> >
> > On Mon, 15 Jun 2015 at 15:38 Matthias J. Sax <
> > mj...@informatik.hu-berlin.de> wrote:
> >
> >> Hi,
> >>
> >> IMHO, it is very common for workers to have their own config files
> >> (e.g., Storm works the same way), and I think it makes a lot of sense.
> >> You might run Flink on a heterogeneous cluster and want to assign
> >> different memory and slots to different hardware. This would not be
> >> possible with a single config file specified at the master and
> >> distributed from there.
> >>
> >>
> >> -Matthias
> >>
> >> On 06/15/2015 03:30 PM, Aljoscha Krettek wrote:
> >> > Regarding 1), that's why I said "bugs and features". :D But I think
> >> > of it as a bug, since people will normally set it in the
> >> > flink-conf.yaml on the master and assume that it works. That's what I
> >> > assumed and it took me a while to figure out that the task managers
> >> > don't respect this setting.
> >> >
> >> > Regarding 3), if you think about it, this could never work. The
> >> > state handle cleanup logic happens purely on the JobManager. So what
> >> > happens is that the TaskManagers create state in some directory,
> >> > let's say /tmp/checkpoints, on the TaskManager. For cleanup, the
> >> > JobManager gets the state handle and calls discard (on the
> >> > JobManager), which tries to clean up the state in /tmp/checkpoints,
> >> > but of course there is nothing there since we are still on the
> >> > JobManager.
> >> >
> >> > On Mon, 15 Jun 2015 at 15:23 Márton Balassi <balassi.mar...@gmail.com
> >
> >> > wrote:
> >> >
> >> >> @Aljoscha:
> >> >> 1) I think this just means that you can set the state backend on a
> >> >> taskmanager basis.
> >> >> 3) This is a serious issue then. Does it work when you set it in the
> >> >> flink-conf.yaml?
> >> >>
> >> >> On Mon, Jun 15, 2015 at 3:17 PM, Aljoscha Krettek <
> aljos...@apache.org
> >> >
> >> >> wrote:
> >> >>
> >> >>> So, during my testing of the state checkpointing on a cluster I
> >> >>> discovered several things (bugs and features):
> >> >>>
> >> >>> - If you have a setup where the configuration is not synced to the
> >> >>> workers, they do not pick up the state back-end configuration. The
> >> >>> workers do not respect the setting in the flink-conf.yaml on the
> >> >>> master.
> >> >>> - HDFS checkpointing works fine if you manually set it as the
> >> >>> per-job state-backend using setStateHandleProvider().
> >> >>> - If you manually set the stateHandleProvider to a "file://"
> >> >>> backend, old checkpoints will not be cleaned up; they will also not
> >> >>> be cleaned up when a job is finished.
> >> >>>
> >> >>> On Sun, 14 Jun 2015 at 23:22 Maximilian Michels <m...@apache.org>
> >> wrote:
> >> >>>
> >> >>>> Hi Henry,
> >> >>>>
> >> >>>> This is just a dry run. The goal is to get everything in shape for
> a
> >> >>> proper
> >> >>>> vote.
> >> >>>>
> >> >>>> Kind regards,
> >> >>>> Max
> >> >>>>
> >> >>>>
> >> >>>> On Sun, Jun 14, 2015 at 7:58 PM, Henry Saputra <
> >> >> henry.sapu...@gmail.com>
> >> >>>> wrote:
> >> >>>>
> >> >>>>> Hi Max,
> >> >>>>>
> >> >>>>> Are you doing an official VOTE on the RC for the 0.9 release, or
> >> >>>>> is this just a dry run?
> >> >>>>>
> >> >>>>>
> >> >>>>> - Henry
> >> >>>>>
> >> >>>>> On Sun, Jun 14, 2015 at 9:11 AM, Maximilian Michels <
> m...@apache.org
> >> >
> >> >>>>> wrote:
> >> >>>>>> Dear Flink community,
> >> >>>>>>
> >> >>>>>> Here's the second release candidate for the 0.9.0 release. We
> >> >> haven't
> >> >>>>> had a
> >> >>>>>> formal vote on the previous release candidate but it received an
> >> >>>> implicit
> >> >>>>>> -1 because of a couple of issues.
> >> >>>>>>
> >> >>>>>> Thanks to the hard-working Flink devs these issues should be
> solved
> >> >>>> now.
> >> >>>>>> The following commits have been added to the second release
> >> >>> candidate:
> >> >>>>>>
> >> >>>>>> f5f0709 [FLINK-2194] [type extractor] Excludes Writable type from
> >> >>>>>> WritableTypeInformation to be treated as an interface
> >> >>>>>> 40e2df5 [FLINK-2072] [ml] Adds quickstart guide
> >> >>>>>> af0fee5 [FLINK-2207] Fix TableAPI conversion documenation and
> >> >> further
> >> >>>>>> renamings for consistency.
> >> >>>>>> e513be7 [FLINK-2206] Fix incorrect counts of finished, canceled,
> >> >> and
> >> >>>>> failed
> >> >>>>>> jobs in webinterface
> >> >>>>>> ecfde6d [docs][release] update stable version to 0.9.0
> >> >>>>>> 4d8ae1c [docs] remove obsolete YARN link and cleanup download
> links
> >> >>>>>> f27fc81 [FLINK-2195] Configure Configurable Hadoop InputFormats
> >> >>>>>> ce3bc9c [streaming] [api-breaking] Minor DataStream cleanups
> >> >>>>>> 0edc0c8 [build] [streaming] Streaming parents dependencies pushed
> >> >> to
> >> >>>>>> children
> >> >>>>>> 6380b95 [streaming] Logging update for checkpointed streaming
> >> >>>> topologies
> >> >>>>>> 5993e28 [FLINK-2199] Escape UTF characters in Scala Shell welcome
> >> >>>>> squirrel.
> >> >>>>>> 80dd72d [FLINK-2196] [javaAPI] Moved misplaced
> >> >> SortPartitionOperator
> >> >>>>> class
> >> >>>>>> c8c2e2c [hotfix] Bring KMeansDataGenerator and KMeans quickstart
> in
> >> >>>> sync
> >> >>>>>> 77def9f [FLINK-2183][runtime] fix deadlock for concurrent slot
> >> >>> release
> >> >>>>>> 87988ae [scripts] remove quickstart scripts
> >> >>>>>> f3a96de [streaming] Fixed streaming example jars packaging and
> >> >>>>> termination
> >> >>>>>> 255c554 [FLINK-2191] Fix inconsistent use of closure cleaner in
> >> >> Scala
> >> >>>>>> Streaming
> >> >>>>>> 1343f26 [streaming] Allow force-enabling checkpoints for
> iterative
> >> >>> jobs
> >> >>>>>> c59d291 Fixed a few trivial issues:
> >> >>>>>> e0e6f59 [streaming] Optional iteration feedback partitioning
> added
> >> >>>>>> 348ac86 [hotfix] Fix YARNSessionFIFOITCase
> >> >>>>>> 80cf2c5 [ml] Makes StandardScalers state package private and
> reduce
> >> >>>>>> redundant code. Adjusts flink-ml readme.
> >> >>>>>> c83ee8a [FLINK-1844] [ml] Add MinMaxScaler implementation in the
> >> >>>>>> proprocessing package, test for the for the corresponding
> >> >>> functionality
> >> >>>>> and
> >> >>>>>> documentation.
> >> >>>>>> ee7c417 [docs] [streaming] Added states and fold to the streaming
> >> >>> docs
> >> >>>>>> fcca75c [docs] Fix some typos and grammar in the Streaming
> >> >>> Programming
> >> >>>>>> Guide.
> >> >>>>>>
> >> >>>>>>
> >> >>>>>> Again, we need to test the new release candidate. Therefore, I've
> >> >>>>> created a
> >> >>>>>> new document where we keep track of our testing criteria for
> >> >>> releases:
> >> >>>>>>
> >> >>>>>
> >> >>>>
> >> >>>
> >> >>
> >>
> https://docs.google.com/document/d/162AZEX8lo0Njal10mmt9wzM5GYVL5WME-VfwGmwpBoA/edit
> >> >>>>>>
> >> >>>>>> Everyone who tested previously could take a different task this
> >> >>>>>> time. For some components we probably don't have to test again,
> >> >>>>>> but if in doubt, testing twice doesn't hurt.
> >> >>>>>>
> >> >>>>>> Happy testing :)
> >> >>>>>>
> >> >>>>>> Cheers,
> >> >>>>>> Max
> >> >>>>>>
> >> >>>>>> Git branch: release-0.9.0-rc2
> >> >>>>>> Release binaries: http://people.apache.org/~mxm/flink-0.9.0-rc2/
> >> >>>>>> Maven artifacts:
> >> >>>>>>
> >> >>>>
> >> >>
> >> https://repository.apache.org/content/repositories/orgapacheflink-1040/
> >> >>>>>> PGP public key for verifying the signatures:
> >> >>>>>>
> http://pgp.mit.edu/pks/lookup?op=vindex&search=0xDE976D18C2909CBF
> >> >>>>>
> >> >>>>
> >> >>>
> >> >>
> >> >
> >>
> >>
>
