Thanks for the feedback, Yang.

I have some updates to share in this thread.
I have built a PoC version of the Mesos e2e test with the WordCount
workflow.[1] I then ran it in the testing environment. The results
are shown here[2]:
- Pulling the image from DockerHub took 1 minute and 21 seconds.
- Building it locally took 2 minutes and 54 seconds.

I prefer building it locally. Although it is slower, I think the extra
time is trivial compared to the cost of maintaining the image on
DockerHub and to the duration of the whole test process.
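
For reference, here is the difference between the two timings above worked
out as a small Python sketch. It only does the arithmetic on the numbers I
reported; the names `overhead`, `PULL_SECONDS` and `BUILD_SECONDS` are made
up for illustration and are not part of the PoC.

```python
from typing import Tuple

# Durations reported above, converted to seconds.
PULL_SECONDS = 1 * 60 + 21   # pulling the image from DockerHub
BUILD_SECONDS = 2 * 60 + 54  # building the image locally


def overhead(build: int, pull: int) -> Tuple[int, float]:
    """Return the extra seconds and the slowdown factor of building vs. pulling."""
    return build - pull, build / pull


extra, factor = overhead(BUILD_SECONDS, PULL_SECONDS)
print(f"Building locally costs {extra} s more ({factor:.2f}x the pull time).")
```

So the local build adds roughly a minute and a half per run in this single
measurement; other environments may of course behave differently.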

I look forward to hearing from you. ;)

Best,
Yangze Guo

[1]https://github.com/KarmaGYZ/flink/commit/0406d942446a1b17f81d93235b21a829bf88ccf0
[2]https://travis-ci.org/KarmaGYZ/flink/jobs/623207957

On Mon, Dec 9, 2019 at 2:39 PM Yang Wang <danrtsey...@gmail.com> wrote:
>
> Thanks Yangze for starting this discussion.
>
> Just sharing my thoughts.
>
> If the official Mesos Docker image cannot meet our requirements, I suggest
> building the image locally.
> We have done the same for the YARN e2e tests. This approach is more flexible
> and easier to maintain. However, I have no idea how long building the Mesos
> image locally will take. Based on our previous experience with YARN, I think
> it probably will not take too much time.
>
>
>
> Best,
> Yang
>
> On Sat, Dec 7, 2019 at 4:25 PM Yangze Guo <karma...@gmail.com> wrote:
>>
>> Thanks for your feedback!
>>
>> @Till
>> Regarding the time overhead, I think it mainly comes from network
>> transmission. Building the image locally downloads 260MB of files in
>> total, including the base image and packages. When pulling from
>> DockerHub, the compressed size of the image is 347MB. Thus, I agree
>> that it is ok to build the image locally.
>>
>> @Piyush
>> Thank you for offering to help and for sharing your usage scenario. At
>> the current stage, I think it would be really helpful if you could compress
>> the custom image[1] or reduce the time it takes to build it locally.
>> Any ideas for improving test coverage would also be appreciated.
>>
>> [1]https://hub.docker.com/layers/karmagyz/mesos-flink/latest/images/sha256-4e1caefea107818aa11374d6ac8a6e889922c81806f5cd791ead141f18ec7e64
>>
>> Best,
>> Yangze Guo
>>
>> On Sat, Dec 7, 2019 at 3:17 AM Piyush Narang <p.nar...@criteo.com> wrote:
>> >
>> > +1 from our end as well. At Criteo, we are running some Flink jobs on 
>> > Mesos in production to compute short term features for machine learning. 
>> > We’d love to help out and contribute to this initiative.
>> >
>> > Thanks,
>> > -- Piyush
>> >
>> >
>> > From: Till Rohrmann <trohrm...@apache.org>
>> > Date: Friday, December 6, 2019 at 8:10 AM
>> > To: dev <dev@flink.apache.org>
>> > Cc: user <u...@flink.apache.org>
>> > Subject: Re: [DISCUSS] Adding e2e tests for Flink's Mesos integration
>> >
>> > Big +1 for adding a fully working e2e test for Flink's Mesos integration. 
>> > Ideally we would have it ready for the 1.10 release. The lack of such a 
>> > test has bitten us already multiple times.
>> >
>> > In general I would prefer to use the official image if possible since it 
>> > frees us from maintaining our own custom image. Since Java 9 is no longer 
>> > officially supported as we opted for supporting Java 11 (LTS) it might not 
>> > be feasible, though. How much longer would building the custom image take
>> > compared to downloading it from DockerHub? Maybe it is ok to build
>> > the image locally. Then we would not have to maintain the image.
>> >
>> > Cheers,
>> > Till
>> >
>> > On Fri, Dec 6, 2019 at 11:05 AM Yangze Guo <karma...@gmail.com> wrote:
>> > Hi, all,
>> >
>> > Currently, there is no end-to-end test or IT case for the Mesos
>> > deployment, while common deployment-related development inevitably
>> > touches the logic of this component. Thus, some work needs to be done to
>> > guarantee a good experience for both Mesos users and contributors. After
>> > an offline discussion with Till and Xintong, we have some basic ideas and
>> > would like to start a discussion thread on adding end-to-end tests for
>> > Flink's Mesos integration.
>> >
>> > As a first step, we would like to keep the scope of this contribution
>> > relatively small. This may also help us quickly get some basic
>> > test cases that might be helpful for the upcoming 1.10 release.
>> >
>> > As far as we can tell, what needs to be done is to set up a Mesos
>> > framework during testing and determine which tests need to be
>> > included.
>> >
>> >
>> > ** Regarding the Mesos framework, after trying out several approaches,
>> > I find that setting up Mesos in Docker is probably what we want. The
>> > resources needed for building and setting up Mesos from source are
>> > probably not affordable in most scenarios. So, the one open
>> > question that is worth discussing is the choice of Docker image. We have
>> > come up with two options.
>> >
>> > - Using the official Mesos image[1]
>> > The official image was the first alternative that came to mind,
>> > but we ran into a Java version compatibility problem that
>> > leads to failures when launching task executors. Flink has supported
>> > Java 9 since version 1.9.0 [2]. However, the official Docker image of
>> > Mesos is built with a development version of JDK 9, which has probably
>> > caused this problem. Unless we want to make Flink compatible with
>> > the JDK development version used by the official Mesos
>> > image, this option does not work out. Besides, according to the
>> > official roadmap[5], Java 9 is not a long-term support version, which
>> > may bring stability risks in the future.
>> >
>> > - Building a custom image
>> > I've already tried building a custom image[3] and successfully ran most
>> > of the existing end-to-end test cases with it. The image is built
>> > with Ubuntu 16.04, JDK 8 and Mesos 1.7.1. For the Mesos e2e test
>> > framework, we could either build the image from a Dockerfile or pull
>> > the pre-built image from DockerHub (or other hub services) during
>> > testing.
>> > If we decide to publish an image on DockerHub, we probably need an
>> > official Flink repository/account to hold it.
>> >
>> >
>> > ** Regarding the test coverage, we think the following three tests
>> > could be a good starting point that covers an essential set of
>> > behaviors for Mesos deployment.
>> > - WordCount end-to-end test. For verifying the basic process of Mesos
>> > deployment.
>> > - Multiple submissions of the same job. For preventing resource
>> > management problems on Mesos, such as [4]
>> > - State TTL RocksDB backend end-to-end test. For verifying memory
>> > configuration behaviors, since Mesos has its own config options and
>> > logic.
>> >
>> > Unfortunately, none of us who participated in the initial offline
>> > discussion has much experience running Flink on Mesos in
>> > production. It would be good if users and experts who actually use
>> > Flink on Mesos could join the discussion and provide some feedback. Any
>> > feedback, ideas, suggestions, concerns and questions are welcome and
>> > appreciated.
>> >
>> >
>> > BTW, we would like to run a survey on the usage of Flink on Mesos
>> > in the community. From the Flink on Mesos users, we would like to
>> > learn:
>> > - Which version of Mesos do you use, and what setups (such as Marathon)
>> > do you need for Mesos?
>> > - Do you mainly use Flink job clusters or session clusters?
>> > - What is the scale of your Flink / Mesos cluster?
>> >
>> >
>> > [1]https://hub.docker.com/r/mesosphere/mesos
>> > [2]https://issues.apache.org/jira/browse/FLINK-11307
>> > [3]https://hub.docker.com/repository/docker/karmagyz/mesos-flink
>> > [4]https://issues.apache.org/jira/browse/FLINK-14074
>> > [5]https://www.oracle.com/technetwork/java/java-se-support-roadmap.html
>> >
>> >
>> > Best,
>> > Yangze Guo
