Thanks for sharing the information with us, Piyush and Lasse.

@Piyush


Thanks for offering the help. IMO, there are currently several problems
that make supporting Flink on Mesos challenging for us.


   1. *Lack of Mesos experts.* AFAIK, there are very few people (if any)
   among the active contributors in this community who are familiar with
   Mesos and can help with development on this component.
   2. *Absence of tests.* Mesos does not provide a testing cluster like
   `MiniYARNCluster`, making it hard to test interactions between Flink and
   Mesos. We have only a few very simple e2e tests running on Mesos deployed
   in Docker, covering the most fundamental workflows. We are not sure how
   well those tests work, especially against potential corner cases.
   3. *Divergence from other deployments.* Because of 1 and 2, new
   efforts (features, maintenance, refactors) tend to exclude Mesos if
   possible. When new efforts have to touch the Mesos-related components
   (e.g., changes to the common resource manager interfaces), we have to be
   very careful and make as few changes as possible, to avoid accidentally
   breaking anything that we are not familiar with. As a result, the component
   diverges a lot from other deployment components (K8s/Yarn), which makes it
   harder to maintain.

It would be greatly appreciated if you could help with any of the above
issues.


Additionally, I have a few questions concerning your use cases at Criteo.
IIUC, you are going to stay on Mesos in the foreseeable future, while
keeping the Flink version up-to-date? What Flink version are you currently
using? How often do you upgrade (e.g., every release)? Would you be fine
with keeping the Flink on Mesos component as it is (meaning that deployment
and resource management improvements may not be ported to Mesos), while
keeping other components up-to-date (e.g., improvements to programming
APIs, operators, state backends, etc.)?


Thank you~

Xintong Song



On Sat, Oct 24, 2020 at 2:48 AM Lasse Nedergaard <
lassenedergaardfl...@gmail.com> wrote:

> Hi
>
> At Trackunit we have been using Mesos for a long time but have now moved to
> k8s.
>
> Best regards
> Lasse Nedergaard
>
>
> On Oct 23, 2020, at 17:01, Robert Metzger <rmetz...@apache.org> wrote:
>
> 
> Hey Piyush,
> thanks a lot for raising this concern. I believe we should keep Mesos in
> Flink then in the foreseeable future.
> Your offer to help is much appreciated. We'll let you know once there is
> something.
>
> On Fri, Oct 23, 2020 at 4:28 PM Piyush Narang <p.nar...@criteo.com> wrote:
>
>> Thanks Kostas. If there are items we can help with, I'm sure we'd be able
>> to find folks who would be excited to contribute / help in any way.
>>
>> -- Piyush
>>
>>
>> On 10/23/20, 10:25 AM, "Kostas Kloudas" <kklou...@gmail.com> wrote:
>>
>>     Thanks Piyush for the message.
>>     After this, I revoke my +1. I agree with the previous opinions that we
>>     cannot drop code that is actively used by users, especially if it is
>>     something as deep in the stack as support for a cluster management
>>     framework.
>>
>>     Cheers,
>>     Kostas
>>
>>     On Fri, Oct 23, 2020 at 4:15 PM Piyush Narang <p.nar...@criteo.com>
>> wrote:
>>     >
>>     > Hi folks,
>>     >
>>     >
>>     >
>>     > We at Criteo are active users of the Flink on Mesos resource
>> management component. We are pretty heavy users of Mesos for scheduling
>> workloads on our edge datacenters and we do want to continue to be able to
>> run some of our Flink topologies (to compute machine learning short term
>> features) on those DCs. If possible our vote would be not to drop Mesos
>> support as that will tie us to an old release / have to maintain a fork as
>> we’re not planning to migrate off Mesos anytime soon. Is the burden
>> something that can be helped with by the community? (Or are you referring
>> to having to ensure PRs handle the Mesos piece as well when they touch the
>> resource managers?)
>>     >
>>     >
>>     >
>>     > Thanks,
>>     >
>>     >
>>     >
>>     > -- Piyush
>>     >
>>     >
>>     >
>>     >
>>     >
>>     > From: Till Rohrmann <trohrm...@apache.org>
>>     > Date: Friday, October 23, 2020 at 8:19 AM
>>     > To: Xintong Song <tonysong...@gmail.com>
>>     > Cc: dev <dev@flink.apache.org>, user <u...@flink.apache.org>
>>     > Subject: Re: [SURVEY] Remove Mesos support
>>     >
>>     >
>>     >
>>     > Thanks for starting this survey Robert! I second Konstantin and
>> Xintong in the sense that our Mesos users' opinions should matter most
>> here. If our community is no longer using the Mesos integration, then I
>> would be +1 for removing it in order to decrease the maintenance burden.
>>     >
>>     >
>>     >
>>     > Cheers,
>>     >
>>     > Till
>>     >
>>     >
>>     >
>>     > On Fri, Oct 23, 2020 at 2:03 PM Xintong Song <tonysong...@gmail.com>
>> wrote:
>>     >
>>     > +1 for adding a warning in 1.12 about planning to remove Mesos
>> support.
>>     >
>>     >
>>     >
>>     > With my developer hat on, removing the Mesos support would
>> definitely reduce the maintenance overhead for the deployment and resource
>> management related components. On the other hand, the Flink on Mesos users'
>> voices definitely matter a lot for this community. Either way, it would be
>> good to draw users' attention to this discussion early.
>>     >
>>     >
>>     >
>>     > Thank you~
>>     >
>>     > Xintong Song
>>     >
>>     >
>>     >
>>     >
>>     >
>>     > On Fri, Oct 23, 2020 at 7:53 PM Konstantin Knauf <kna...@apache.org>
>> wrote:
>>     >
>>     > Hi Robert,
>>     >
>>     > +1 to the plan you outlined. If we were to drop support in Flink
>> 1.13+, we
>>     > would still support it in Flink 1.12- with bug fixes for some time
>> so that
>>     > users have time to move on.
>>     >
>>     > It would certainly be very interesting to hear from current Flink
>> on Mesos
>>     > users, on how they see the evolution of this part of the ecosystem.
>>     >
>>     > Best,
>>     >
>>     > Konstantin
>>
>>
>>
