Re: [DISCUSS] FLIP-169: DataStream API for Fine-Grained Resource Requirements

Yangze Guo Tue, 08 Jun 2021 01:09:36 -0700

@Xintong
> introduce a general approach for overwriting such job specifics without 
> re-compiling the job
I think that would be a good direction. Just sharing my two cents on this
topic. I'd divide the job-level specifics into two categories:
- Specifics which affect how Flink executes the job, e.g.
"parallelism.default". Currently, most of these specifics have a
corresponding config option.
- Job-specific arguments, e.g. the "input" of our WordCount example.
Those can currently only be passed as program arguments.
It might be good to have a general approach for overwriting all the
above arguments. One preliminary idea is introducing a separate
"job-conf.yaml".
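
To make the "job-conf.yaml" idea a bit more concrete, here is a minimal
sketch, assuming a flat "key: value" file whose entries override the values
compiled into the job. Everything in it (the class JobConfOverlay, its
methods, the flat file format, and the "input" key) is hypothetical and not
an existing Flink API. Real YAML parsing is intentionally left out.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Flink: overlays job-level overrides
// on top of the values compiled into the job.
final class JobConfOverlay {

    // Parses flat "key: value" lines; blank lines and '#' comments are skipped.
    // The split happens at the first ':' so values such as "hdfs://..." survive.
    // Nested YAML, lists and quoting are deliberately not handled.
    static Map<String, String> parseFlat(List<String> lines) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            int sep = line.indexOf(':');
            if (sep <= 0) {
                throw new IllegalArgumentException("Expected 'key: value' but got: " + raw);
            }
            result.put(line.substring(0, sep).trim(), line.substring(sep + 1).trim());
        }
        return result;
    }

    // Returns a new map: defaults first, then overrides win on key conflicts.
    static Map<String, String> overlay(
            Map<String, String> defaults, Map<String, String> overrides) {
        Map<String, String> merged = new LinkedHashMap<>(defaults);
        merged.putAll(overrides);
        return merged;
    }

    public static void main(String[] args) {
        // Values the job was compiled with (both categories from the list above).
        Map<String, String> compiledIn = new LinkedHashMap<>();
        compiledIn.put("parallelism.default", "1");
        compiledIn.put("input", "/tmp/words.txt"); // hypothetical program argument

        // Content of a hypothetical job-conf.yaml, already read into lines.
        List<String> jobConf = Arrays.asList("# overrides", "parallelism.default: 4");

        System.out.println(overlay(compiledIn, parseFlat(jobConf)));
    }
}
```

The point of the sketch is only that both categories could share one
override mechanism; how the merged values reach the job is left open.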


All in all, I agree that this topic requires more careful design and
deserves a separate discussion thread.

Best,
Yangze Guo

On Tue, Jun 8, 2021 at 1:47 PM Xintong Song <tonysong...@gmail.com> wrote:
>
> I think being able to specify fine-grained resource requirements without
> having to change the code and recompile the job is indeed a good idea. It
> definitely improves the usability.
>
> However, this requires more careful design, which probably deserves a
> separate thread. It'd be good to have that discussion, but maybe not block
> this feature on that.
>
> One idea concerning the configuration approach: As Yangze said, Flink
> configuration options are supposed to take effect at cluster level. For
> updating job level specifics that are not suitable to be introduced as a
> config option, currently the only way is to pass them as program arguments.
> Would it make sense to introduce a general approach for overwriting such
> job specifics without re-compiling the job?
>
> Thank you~
>
> Xintong Song
>
>
>
> On Tue, Jun 8, 2021 at 1:23 PM Yangze Guo <karma...@gmail.com> wrote:
>
> > @Wenlong
> > After another consideration, the config option approach I mentioned
> > above might not be appropriate. The resource requirements for SSG
> > should be a job level configuration and should not be set in the
> > flink-conf.
> >
> > I think we can define a JSON format, which would be the ResourceSpecs
> > mapped by the name of SSGs, for the resource requirements of a
> > specific job. Then, we allow users to configure the file path of that
> > JSON. The JSON will only be parsed at runtime, which allows users to
> > tune it without re-compiling the job.
> >
> > We can add another #setSlotSharingGroupResources for configuring the
> > file path of that JSON:
> > ```
> > /**
> >  * Specify fine-grained resource requirements for slot sharing groups
> >  * with the given resource JSON file. The existing resource
> >  * requirement of the same slot sharing group will be replaced.
> >  */
> > public StreamExecutionEnvironment setSlotSharingGroupResources(
> >         String pathToResourceJson);
> > ```
> >
> > WDYT?
> >
> > Best,
> > Yangze Guo
> >
> > On Tue, Jun 8, 2021 at 12:12 PM Yangze Guo <karma...@gmail.com> wrote:
> > >
> > > Thanks for the feedback, Xintong and Wenlong!
> > >
> > > @Wenlong
> > > I think that is a good idea. Adjusting the resources without re-compiling
> > > the job will facilitate the tuning process.
> > > We can define a pattern "slot-sharing-group.resource.{ssg name}"
> > > (any proposals for the prefix naming are welcome) for the resource spec
> > > config of a slot sharing group. Then, users can set the ResourceSpec of
> > > SSG "ssg1" by adding "slot-sharing-group.resource.ssg1: {cpu: 1.0,
> > > heap: 100m, off-heap: 100m....}". WDYT?
> > >
> > >
> > > Best,
> > > Yangze Guo
> > >
> > > On Tue, Jun 8, 2021 at 10:37 AM wenlong.lwl <wenlong88....@gmail.com> wrote:
> > > >
> > > > Thanks Yangze for the FLIP. It is great for users to be able to
> > > > declare the fine-grained resource requirements for the job.
> > > >
> > > > I have one minor suggestion: can we support setting resource
> > > > requirements by configuration? Currently most of the config options in
> > > > the execution config can be set by configuration, and it is very likely
> > > > that users need to adjust the resources according to the performance of
> > > > their job during debugging. Providing a configuration-based way will make
> > > > it more convenient.
> > > >
> > > > Bests,
> > > > Wenlong Lyu
> > > >
> > > > On Thu, 3 Jun 2021 at 15:59, Xintong Song <tonysong...@gmail.com> wrote:
> > > >
> > > > > Thanks Yangze for preparing the FLIP.
> > > > >
> > > > > The proposed changes look good to me.
> > > > >
> > > > > As you've mentioned in the implementation plan, I believe one of the
> > > > > most important tasks of this FLIP is to have the feature well documented.
> > > > > It would be really nice if we can keep that in mind and start drafting
> > > > > the documentation early.
> > > > >
> > > > > Thank you~
> > > > >
> > > > > Xintong Song
> > > > >
> > > > >
> > > > >
> > > > > On Thu, Jun 3, 2021 at 3:13 PM Yangze Guo <karma...@gmail.com> wrote:
> > > > >
> > > > > > Hi, there,
> > > > > >
> > > > > > We would like to start a discussion thread on "FLIP-169: DataStream
> > > > > > API for Fine-Grained Resource Requirements"[1], where we propose the
> > > > > > DataStream API for specifying fine-grained resource requirements in
> > > > > > StreamExecutionEnvironment.
> > > > > >
> > > > > > Please find more details in the FLIP wiki document [1]. Looking
> > > > > > forward to your feedback.
> > > > > >
> > > > > > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-169+DataStream+API+for+Fine-Grained+Resource+Requirements
> > > > > >
> > > > > >
> > > > > > Best,
> > > > > > Yangze Guo
> > > > > >
> > > > >
> >
