This sounds good, Xiyuan. I'd also be in favour of running the ARM builds
regularly as cron jobs, and once we see that they are stable we could run
them for every master commit. Hence, I'd say let's fix the above-mentioned
problems and then set up the nightly cron job.

Cheers,
Till

On Fri, Sep 20, 2019 at 8:57 AM Xiyuan Wang <wangxiyuan1...@gmail.com>
wrote:

> Sure, we can first run the daily ARM job the same way as the Travis CI
> nightly jobs. Once it's stable enough, we can consider running it for
> every PR.
>
> BTW, I tested flink-end-to-end-test on ARM over the last few days. As on
> Travis, all 7 scenarios were tested:
>
> 1. split_checkpoints.sh
> 2. split_sticky.sh
> 3. split_ha.sh
> 4. split_heavy.sh
> 5. split_misc_hadoopfree.sh
> 6. split_misc.sh
> 7. split_container.sh
>
> Scenarios 1-6 work well after some local hacking and bug fixing:
>     1. frocksdb doesn't have an official ARM release, so I built and
> installed it locally for ARM.
>           https://issues.apache.org/jira/browse/FLINK-13598
>     2. Prometheus has an ARM release, but the test always downloads the x86
> version. Downloading the correct version fixes the issue.
>           https://issues.apache.org/jira/browse/FLINK-14086
>     3. Elasticsearch 6.0+ enables the X-Pack machine learning feature by
> default, but this feature doesn't support ARM, so Elasticsearch 6.0+ fails
> to start on ARM. Setting `xpack.ml.enabled: false` fixes this issue.
>           https://issues.apache.org/jira/browse/FLINK-14126
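
A minimal sketch of how the Prometheus download could pick the right release
for the host architecture. The function names `prometheus_arch` and
`prometheus_tarball` are hypothetical (they are not part of the Flink test
scripts), and the mapping assumes the `linux-amd64`, `linux-arm64` and
`linux-armv7` suffixes used in Prometheus release tarball names:

```shell
# Map a machine name (as printed by `uname -m`) to the architecture
# suffix used in Prometheus release tarball names.
# Hypothetical helper, not taken from the Flink e2e scripts.
prometheus_arch() {
  case "$1" in
    x86_64)        echo "amd64" ;;
    aarch64|arm64) echo "arm64" ;;
    armv7l)        echo "armv7" ;;
    *)             echo "unsupported machine type: $1" >&2; return 1 ;;
  esac
}

# Print the tarball name for a given Prometheus version and machine name.
# Usage: prometheus_tarball VERSION MACHINE
prometheus_tarball() {
  _prom_arch="$(prometheus_arch "$2")" || return 1
  echo "prometheus-$1.linux-${_prom_arch}.tar.gz"
}
```

For example, `prometheus_tarball 2.4.3 "$(uname -m)"` would print the tarball
name for the current machine; the version number here is only an example.
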
>
> The 7th scenario (container) failed because:
>     1. docker-compose doesn't have an official ARM package. Installing it
> with `apt install docker-compose` solves the problem.
>     2. minikube doesn't support the ARM architecture. Using kubeadm for the
> K8S installation solves the problem.
>
> Fixing the problems mentioned above is not hard, so I think we can add the
> Flink build, unit tests and e2e tests as nightly jobs now.
>
> Any thoughts?
>
> Thanks.
>
> On Thu, Sep 19, 2019 at 5:44 PM Stephan Ewen <se...@apache.org> wrote:
>
> > My gut feeling is that having a CI that only runs on a specific command
> > will not help too much.
> >
> > What about going with nightly builds then? We could set up the ARM CI the
> > same way as the Travis CI nightly builds (cron builds). They report build
> > failures to "bui...@flink.apache.org".
> > Maybe Chesnay or Jark could help with what needs to be done to post to
> > that mailing list?
> >
> > A requirement would be that the builds are stable, from the ARM
> > perspective, meaning that there are no failures at the moment caused by
> > ARM-specific issues.
> >
> > What do the others think?
> >
> >
> > On Tue, Sep 3, 2019 at 4:40 AM Xiyuan Wang <wangxiyuan1...@gmail.com>
> > wrote:
> >
> > > The ARM CI trigger has been changed to `github comment` only. This
> > > means that a PR won't start the ARM test unless a `check_arm` comment
> > > is added, as I did in the PR[1].
> > >
> > > A POC for the Flink nightly end-to-end test job has been created as
> > > well[2]. I'll keep improving it.
> > >
> > > Any feedback or questions?
> > >
> > >
> > > [1]: https://github.com/apache/flink/pull/9416
> > >      https://github.com/apache/flink/pull/9416#issuecomment-527268203
> > > [2]: https://github.com/theopenlab/openlab-zuul-jobs/pull/631
> > >
> > >
> > > Thanks
> > >
> > > On Mon, Aug 26, 2019 at 7:41 PM Xiyuan Wang <wangxiyuan1...@gmail.com> wrote:
> > >
> > > > Until the ARM CI is ready, I can disable the CI test for each PR and
> > > > let it be triggered only by a PR comment. It's quite easy for
> > > > OpenLab to do this.
> > > >
> > > > OpenLab has many job pipelines[1]. Currently I use the `check`
> > > > pipeline in https://github.com/apache/flink/pull/9416. Its job
> > > > trigger contains github_action and github_comment[2]. I can create a
> > > > new pipeline for Flink whose trigger contains only github_comment,
> > > > like:
> > > >
> > > > trigger:
> > > >   github:
> > > >     - event: pull_request
> > > >       action: comment
> > > >       comment: (?i)^\s*recheck_arm_build\s*$
> > > >
> > > > That way the ARM job will not be run for every PR. It'll only be run
> > > > for PRs that have a `recheck_arm_build` comment.
> > > >
> > > > Then, once the ARM CI is ready, I can add it back.
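
A minimal sketch for checking locally which PR comments the quoted trigger
pattern would accept. The helper name `is_arm_trigger` is hypothetical, and
this only approximates the pattern with standard tools; it is not how Zuul
itself evaluates the comment:

```shell
# Return success if the comment text would match the trigger pattern
#   (?i)^\s*recheck_arm_build\s*$
# Newlines, carriage returns and tabs are first turned into spaces so that
# the whole comment is matched as one string. grep -E has no portable \s,
# so a plain space is used after that step; -i mirrors the (?i) flag.
# Hypothetical helper for local experiments only.
is_arm_trigger() {
  printf '%s' "$1" \
    | tr '\n\r\t' '   ' \
    | grep -Eiq '^ *recheck_arm_build *$'
}
```

For example, `is_arm_trigger "  Recheck_ARM_Build "` succeeds, while
`is_arm_trigger "please recheck_arm_build"` fails because the comment
contains other text.
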
> > > >
> > > >
> > > > Nightly tests can be added as well, of course. OpenLab has a kind of
> > > > job called `periodic job`. We can use it for the Flink nightly
> > > > tests. If any error occurs, the report can be sent to
> > > > bui...@flink.apache.org as well.
> > > >
> > > > [1]: https://github.com/theopenlab/openlab-zuul-jobs/blob/master/zuul.d/pipelines.yaml
> > > > [2]: https://github.com/theopenlab/openlab-zuul-jobs/blob/master/zuul.d/pipelines.yaml#L10-L19
> > > >
> > > > On Mon, Aug 26, 2019 at 6:13 PM Stephan Ewen <se...@apache.org> wrote:
> > > >
> > > > >> Adding CI builds for ARM only makes sense when we actually take
> > > > >> them into account as "blocking a merge", otherwise there is no
> > > > >> point in having them. So we would need to be prepared to do that.
> > > > >>
> > > > >> The cases where something runs in UNIX/x64 but fails on ARM are few
> > > > >> and so far seem to have been related to libraries or some magic
> > > > >> that tries to do system dependent actions outside Java.
> > > > >>
> > > > >> One worthwhile discussion could be whether to run the ARM CI builds
> > > > >> as part of the nightly tests, not on every commit.
> > > > >> There are a lot of nightly tests, for example for different Java /
> > > > >> Scala / Hadoop versions.
> > > >>
> > > > >> On Mon, Aug 26, 2019 at 10:46 AM Xiyuan Wang
> > > > >> <wangxiyuan1...@gmail.com> wrote:
> > > >>
> > > > >> > Sorry, maybe my wording was misleading.
> > > > >> >
> > > > >> > We are just starting to add ARM support, so the CI is non-voting
> > > > >> > at this moment to avoid blocking normal Flink development.
> > > > >> >
> > > > >> > But once the ARM CI works well and is stable enough, we should
> > > > >> > mark it as voting. This means that in the future, if the ARM
> > > > >> > test fails in a PR, the PR cannot be merged. The test log may
> > > > >> > tell developers which error occurred. If a developer needs to
> > > > >> > debug the details on an ARM VM, OpenLab can provide one.
> > > > >> >
> > > > >> > Adding ARM CI makes sure that Flink supports ARM natively.
> > > > >> >
> > > > >> > I left a workflow in the PR, which I'd like to repeat here:
> > > >> >
> > > > >> >    1. Add the basic build script to ensure the CI system and
> > > > >> >    build job work as expected. The job should be marked as
> > > > >> >    non-voting first, which means a CI test failure won't block
> > > > >> >    a Flink PR from being merged.
> > > > >> >    2. Add the test script to run the unit/integration tests. At
> > > > >> >    this step the --fn parameter will be added to mvn test. It
> > > > >> >    will run the full set of test cases in Flink, so that we can
> > > > >> >    find out which tests fail on ARM.
> > > > >> >    3. Fix the test failures one by one.
> > > > >> >    4. Once all the tests pass, remove the --fn parameter and
> > > > >> >    keep watching the CI's status for some days. If bugs come up
> > > > >> >    then, fix them as we usually do for travis-ci.
> > > > >> >    5. Once the CI is stable enough, remove the non-voting tag,
> > > > >> >    so that the ARM CI will be, like travis-ci, one of the gates
> > > > >> >    for Flink PRs.
> > > > >> >    6. Finally, the Flink community can announce and release a
> > > > >> >    Flink ARM version.
> > > >> >
> > > >> >
> > > > >> > On Mon, Aug 26, 2019 at 2:25 PM Chesnay Schepler <ches...@apache.org> wrote:
> > > >> >
> > > > >> >> I'm sorry, but if these issues are only fixed later anyway I see
> > > > >> >> no reason to run these tests on each PR. We're just adding noise
> > > > >> >> to each PR that everyone will just ignore.
> > > > >> >>
> > > > >> >> I'm curious as to the benefit of having this directly in Flink;
> > > > >> >> why aren't the ARM builds run outside of the Flink project, and
> > > > >> >> fixes for it provided?
> > > > >> >>
> > > > >> >> It seems to me like nothing about these arm builds is actually
> > > > >> >> handled by the Flink project.
> > > >> >>
> > > >> >> On 26/08/2019 03:43, Xiyuan Wang wrote:
> > > > >> >> > Thanks to Stephan for bringing up this topic.
> > > > >> >> >
> > > > >> >> > The package build jobs work well now. I have a simple online
> > > > >> >> > demo which is built and run on an ARM VM. Feel free to give it
> > > > >> >> > a try[1].
> > > > >> >> >
> > > > >> >> > As the first step for ARM support, maybe it's good to add them
> > > > >> >> > now.
> > > > >> >> >
> > > > >> >> > As for the next step, the test part is still broken. This
> > > > >> >> > relates to some points we found:
> > > >> >> >
> > > > >> >> > 1. Some unit tests fail[1] because of Java coding issues.
> > > > >> >> > These kinds of failures can be fixed easily.
> > > > >> >> > 2. Some tests fail because they depend on third-party
> > > > >> >> > libraries[2]. These include frocksdb, MapR Client and Netty.
> > > > >> >> > They don't have ARM releases.
> > > > >> >> >      a. Frocksdb: I'm testing it locally now with
> > > > >> >> > `make check_some` and `make jtest`, similar to its travis job.
> > > > >> >> > 3 tests fail with `make check_some`. Please see the ticket for
> > > > >> >> > more details. Once the tests pass, frocksdb can release an
> > > > >> >> > ARM package.
> > > > >> >> >      b. MapR Client: This belongs to the MapR company. At this
> > > > >> >> > moment, maybe we should skip MapR support for Flink on ARM.
> > > > >> >> >      c. Netty: Actually, Netty runs well on our ARM machine.
> > > > >> >> > We will ask the Netty community to release ARM support. If
> > > > >> >> > they do not want to, OpenLab will host a Maven repository for
> > > > >> >> > some common libraries on ARM.
> > > >> >> >
> > > >> >> >
> > > >> >> > For Chesnay's concern:
> > > >> >> >
> > > > >> >> > Firstly, the OpenLab team will keep maintaining and fixing the
> > > > >> >> > ARM CI. This means that once a build or test fails, we'll fix
> > > > >> >> > it at once.
> > > > >> >> > Secondly, OpenLab can provide ARM VMs to everyone for
> > > > >> >> > reproducing and testing. You just need to create a Test
> > > > >> >> > Request issue in openlab[3]. Then we'll create ARM VMs for
> > > > >> >> > you, and you can log in and do what you want.
> > > >> >> >
> > > >> >> > Does it make sense?
> > > >> >> >
> > > >> >> > [1]: http://114.115.168.52:8081/#/overview
> > > >> >> > [1]: https://issues.apache.org/jira/browse/FLINK-13449
> > > >> >> >        https://issues.apache.org/jira/browse/FLINK-13450
> > > >> >> > [2]: https://issues.apache.org/jira/browse/FLINK-13598
> > > >> >> > [3]: https://github.com/theopenlab/openlab/issues/new/choose
> > > >> >> >
> > > >> >> >
> > > >> >> >
> > > >> >> >
> > > > >> >> > On Sat, Aug 24, 2019 at 12:10 AM Chesnay Schepler <ches...@apache.org> wrote:
> > > >> >> >
> > > > >> >> >> I'm wondering what we are supposed to do if the build fails?
> > > > >> >> >> We aren't providing any guides on setting up an arm dev
> > > > >> >> >> environment, so reproducing it locally isn't possible.
> > > >> >> >>
> > > >> >> >> On 23/08/2019 17:55, Stephan Ewen wrote:
> > > >> >> >>> Hi all!
> > > >> >> >>>
> > > > >> >> >>> As part of the Flink on ARM effort, there is a pull request
> > > > >> >> >>> that triggers a build on OpenLabs CI for each push and runs
> > > > >> >> >>> tests on ARM machines.
> > > > >> >> >>>
> > > > >> >> >>> Currently that build is roughly equivalent to what the "core"
> > > > >> >> >>> and "tests" profiles do on Travis.
> > > > >> >> >>> The result will be posted to the PR comments, similar to the
> > > > >> >> >>> Flink Bot's Travis build result.
> > > > >> >> >>> The build currently passes :-) so Flink seems to be okay on
> > > > >> >> >>> ARM.
> > > > >> >> >>>
> > > > >> >> >>> My suggestion would be to try and add this and gather some
> > > > >> >> >>> experience with it.
> > > > >> >> >>> The Travis build results should be our "ground truth" and the
> > > > >> >> >>> ARM CI (openlabs CI) would be "informational only" at the
> > > > >> >> >>> beginning, but helping us understand when we break ARM
> > > > >> >> >>> support.
> > > >> >> >>>
> > > >> >> >>> You can see this in the PR that adds the openlabs CI config:
> > > >> >> >>> https://github.com/apache/flink/pull/9416
> > > >> >> >>>
> > > >> >> >>> Any objections?
> > > >> >> >>>
> > > >> >> >>> Best,
> > > >> >> >>> Stephan
> > > >> >> >>>
> > > >> >> >>
> > > >> >>
> > > >> >>
> > > >>
> > > >
> > >
> >
>
