Here's what the Zeppelin community did: we wrote a Python script to check the
build status of a pull request.
Here's the script:
https://github.com/apache/zeppelin/blob/master/travis_check.py
And this is the script we use in the Jenkins build job:
if [ -f "travis_check.py" ]; then
  git log -n 1
  # pull the two https links (the PR link, then the link after "from") out of the Jenkins build page
  STATUS=$(curl -s $BUILD_URL | grep -e "GitHub pull request.*from.*" | sed 's/.*GitHub pull request <a href=\"\(https[^"]*\).*from[^"]*.\(https[^"]*\).*/\1 \2/g')
  # last path segment of the second link
  AUTHOR=$(echo $STATUS | sed 's/.*[/]\(.*\)$/\1/g')
  # last path segment of the first link (the PR number)
  PR=$(echo $STATUS | awk '{print $1}' | sed 's/.*[/]\(.*\)$/\1/g')

  #COMMIT=$(git log -n 1 | grep "^Merge:" | awk '{print $3}')
  #if [ -z $COMMIT ]; then
  #  COMMIT=$(curl -s https://api.github.com/repos/apache/zeppelin/pulls/$PR | grep -e "\"label\":" -e "\"ref\":" -e "\"sha\":" | tr '\n' ' ' | sed 's/\(.*sha[^,]*,\)\(.*ref.*\)/\1 = \2/g' | tr = '\n' | grep -v "apache:" | sed 's/.*sha.[^"]*["]\([^"]*\).*/\1/g')
  #fi

  # get commit hash from PR
  COMMIT=$(curl -s https://api.github.com/repos/apache/zeppelin/pulls/$PR | grep -e "\"label\":" -e "\"ref\":" -e "\"sha\":" | tr '\n' ' ' | sed 's/\(.*sha[^,]*,\)\(.*ref.*\)/\1 = \2/g' | tr = '\n' | grep -v "apache:" | sed 's/.*sha.[^"]*["]\([^"]*\).*/\1/g')
  sleep 30 # wait a moment for Travis to start the build
  RET_CODE=0
  python ./travis_check.py ${AUTHOR} ${COMMIT} || RET_CODE=$?
  if [ $RET_CODE -eq 2 ]; then # retry with the repository owner's name when travis-ci is not available in the account
    RET_CODE=0
    AUTHOR=$(curl -s https://api.github.com/repos/apache/zeppelin/pulls/$PR | grep '"full_name":' | grep -v "apache/zeppelin" | sed 's/.*[:][^"]*["]\([^/]*\).*/\1/g')
    python ./travis_check.py ${AUTHOR} ${COMMIT} || RET_CODE=$?
  fi

  if [ $RET_CODE -eq 2 ]; then # fail because the build information can't be found in Travis
    set +x
    echo "-----------------------------------------------------"
    echo "Looks like travis-ci is not configured for your fork."
    echo "Please set it up by switching on the 'zeppelin' repository at https://travis-ci.org/profile."
    echo "Then make sure the 'Build branch updates' option is enabled in the settings: https://travis-ci.org/${AUTHOR}/zeppelin/settings"
    echo ""
    echo "To trigger CI after setup, you will need to amend your last commit with"
    echo "git commit --amend"
    echo "git push your-remote HEAD --force"
    echo ""
    echo "See http://zeppelin.apache.org/contribution/contributions.html#continuous-integration"
  fi

  exit $RET_CODE
else
  set +x
  echo "travis_check.py does not exist"
  exit 1
fi
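
The grep/sed pipeline above that pulls the commit hash and the fork owner out
of the GitHub API response is fairly fragile. Below is a minimal sketch of the
same lookup in Python. It assumes the pull request JSON carries `head.sha`,
`head.label` and `head.repo.owner.login`, as the GitHub REST API documents
them. The names `parse_pr` and `fetch_pr` and the sample payload are made up
for illustration; they are not part of travis_check.py.

```python
import json
import urllib.request

# Same endpoint the shell script queries with curl.
API = "https://api.github.com/repos/apache/zeppelin/pulls/{pr}"


def parse_pr(payload):
    """Return (fork_owner, head_sha) from a decoded pull request payload.

    Hypothetical helper: it only reads the "head" object of the response.
    """
    head = payload["head"]
    repo = head.get("repo") or {}  # "repo" can be null, e.g. for a deleted fork
    owner = (repo.get("owner") or {}).get("login")
    if owner is None:
        # Fall back to the label, which normally looks like "owner:branch".
        owner = head["label"].split(":", 1)[0]
    return owner, head["sha"]


def fetch_pr(pr_number):
    """Download the pull request JSON (unauthenticated, so rate limited)."""
    with urllib.request.urlopen(API.format(pr=pr_number)) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Made-up payload so the sketch runs without network access.
    sample = {
        "head": {
            "sha": "abc123",
            "label": "alice:my-branch",
            "repo": {"owner": {"login": "alice"}},
        }
    }
    print(parse_pr(sample))
```

In the Jenkins job, the two values returned by `parse_pr(fetch_pr(PR))` would
take the place of ${AUTHOR} and ${COMMIT} before calling travis_check.py.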
On Sat, Jun 29, 2019 at 3:17 PM, Chesnay Schepler <[email protected]> wrote:
> Does this imply that a Jenkins job is active as long as the Travis build
> runs?
>
> On 26/06/2019 21:28, Bowen Li wrote:
> > Hi,
> >
> > @Dawid, I think the "long test running" issue that I mentioned in the
> > first email, as you also said, belongs to "a big effort which is much
> > harder to accomplish in a short period of time and may deserve its own
> > separate discussion". Thus I didn't include it in what we can do in the
> > foreseeable short term.
> >
> > Besides, I don't think that's the ultimate reason for the lack of build
> > resources. Even if the build is shortened to something like 2h, the
> > problem I described, where no build machine does any work for 6 or more
> > hours during PST daytime, will still happen, because no machine from ASF
> > INFRA's pool is allocated to Flink. As I have paid close attention to the
> > build queue over the past few weekdays, it's a pretty clear pattern now.
> >
> > **The ultimate root cause** for that is that we don't have any
> > **dedicated** build resources that we can stably rely on. I'm actually ok
> > with waiting a long time if there are build requests running; it means at
> > least we are making progress. But I'm not ok with having no build
> > resources. What I think we should aim for in the short term is to always
> > have at least a central pool (can be 3 or 5) of machines dedicated to
> > building Flink at any time, or maybe to use users' resources.
> >
> > @Chesnay @Robert I synced with Jeff offline: the Zeppelin community is
> > using a Jenkins job to automatically build on users' Travis accounts and
> > link the result back to the GitHub PR. I guess the Jenkins job would fetch
> > the latest upstream master and build the PR against it. Jeff has filed
> > tickets to learn about and get access to the Jenkins infra. It'll be better
> > to fully understand it first before judging this approach.
> >
> > I also heard good things about CircleCI, and ASF INFRA seems to have a
> > pool of build capacity there too. It can be an alternative to consider.
> >
> > On Wed, Jun 26, 2019 at 12:44 AM Dawid Wysakowicz <[email protected]> wrote:
> >
> >> Sorry to jump in late, but I think Bowen missed the most important point
> >> from Chesnay's previous message in the summary. The ultimate reason for
> >> all the problems is that the tests take close to 2 hours to run already.
> >> I fully support this claim: "Unless people start caring about test times
> >> before adding them, this issue cannot be solved"
> >>
> >> This is also another reason why using user's Travis account won't help.
> >> Every few weeks we reach the user's time limit for a single profile.
> >> This makes the user's builds simply fail, until we either properly
> >> decrease the time the tests take (which I am not sure we ever did) or
> >> postpone the problem by splitting into more profiles. (Note that the ASF
> >> Travis account has higher time limits)
> >>
> >> Best,
> >>
> >> Dawid
> >>
> >> On 26/06/2019 09:36, Robert Metzger wrote:
> >>> Do we know if using "the best" available hardware would improve the
> >>> build times?
> >>> Imagine we ran the build on machines with plenty of main memory to
> >>> mount everything on a ramdisk, plus the latest CPU architecture.
> >>>
> >>> Throwing hardware at the problem could help reduce the time of an
> >>> individual build, and using our own infrastructure would remove our
> >>> dependency on Apache's Travis account (with the obvious downside of
> >>> having to maintain the infrastructure).
> >>> We could use an open source Travis alternative to have a similar
> >>> experience and make the migration easy.
> >>>
> >>>
> >>> On Wed, Jun 26, 2019 at 9:34 AM Chesnay Schepler <[email protected]> wrote:
> >>>> From what I gathered, there's no special sauce that the Zeppelin
> >>>> project uses which actually integrates a user's Travis account into
> >>>> the PR. They just disabled Travis for PRs. And that's kind of it.
> >>>>
> >>>> Naturally we can do this (duh) and save the ASF a fair amount of
> >>>> resources, but there are downsides:
> >>>>
> >>>> The discoverability of the Travis check takes a nose-dive. Either we
> >>>> require every contributor to always, on every commit, also post a
> >>>> Travis build, or we have the reviewer sift through the contributor's
> >>>> account to find it.
> >>>>
> >>>> This is rather cumbersome. Additionally, it's also not equivalent to
> >>>> having a PR build.
> >>>>
> >>>> A normal branch build takes a branch as is and tests it. A PR build
> >>>> merges the branch into master, and then runs it. (Fun fact: This is
> >>>> why a PR with merge conflicts is not being run on Travis.)
> >>>>
> >>>> And ultimately, everyone can already make use of this approach anyway.
> >>>>
> >>>> On 25/06/2019 08:02, Jark Wu wrote:
> >>>>> Hi Jeff,
> >>>>>
> >>>>> Thanks for sharing the Zeppelin approach. I think it's a good idea to
> >>>>> leverage users' Travis accounts.
> >>>>> In this way, we can have almost unlimited concurrent build jobs, and
> >>>>> developers can restart builds by themselves (currently only committers
> >>>>> can restart a PR's build).
> >>>>>
> >>>>> But I'm still not very clear on how to integrate a user's Travis build
> >>>>> into the Flink pull request's build automatically. Can you explain in
> >>>>> more detail?
> >>>>>
> >>>>> Another question: does Travis only build branches for user accounts?
> >>>>> My concern is that builds for PRs will rebase the user's commits
> >>>>> against the current master branch.
> >>>>> This will help us find problems before merging. Builds for branches
> >>>>> will lose the impact of new commits in master.
> >>>>> How does Zeppelin solve this problem?
> >>>>>
> >>>>> Thanks again for sharing the idea.
> >>>>>
> >>>>> Regards,
> >>>>> Jark
> >>>>>
> >>>>> On Tue, 25 Jun 2019 at 11:01, Jeff Zhang <[email protected]
> >>>>> <mailto:[email protected]>> wrote:
> >>>>>
> >>>>> Hi Folks,
> >>>>>
> >>>>> Zeppelin met this kind of issue before; we solved it by delegating
> >>>>> each contributor's PR build to their own Travis account (everyone
> >>>>> can have 5 free slots for Travis builds).
> >>>>> The Apache account's Travis build is only triggered when a PR is
> >>>>> merged.
> >>>>>
> >>>>>
> >>>>>
> >>>>> On Tue, Jun 25, 2019 at 10:16 AM, Kurt Young <[email protected] <mailto:[email protected]>> wrote:
> >>>>>
> >>>>> > (Forgot to cc George)
> >>>>> >
> >>>>> > Best,
> >>>>> > Kurt
> >>>>> >
> >>>>> >
> >>>>> > On Tue, Jun 25, 2019 at 10:16 AM Kurt Young <[email protected]
> >>>>> <mailto:[email protected]>> wrote:
> >>>>> >
> >>>>> > > Hi Bowen,
> >>>>> > >
> >>>>> > > Thanks for bringing this up. We have actually discussed this, and
> >>>>> > > I think Till and George have already spent some time investigating
> >>>>> > > it. I have cc'ed both of them, and maybe they can share their
> >>>>> > > findings.
> >>>>> > >
> >>>>> > > Best,
> >>>>> > > Kurt
> >>>>> > >
> >>>>> > >
> >>>>> > > On Tue, Jun 25, 2019 at 10:08 AM Jark Wu <[email protected]
> >>>>> <mailto:[email protected]>> wrote:
> >>>>> > >
> >>>>> > >> Hi Bowen,
> >>>>> > >>
> >>>>> > >> Thanks for bringing this up. We also suffered from the long
> >>>>> > >> build time. I agree that we should focus on solving the build
> >>>>> > >> capacity problem in this thread.
> >>>>> > >>
> >>>>> > >> My observation is that only one build is running; all the others
> >>>>> > >> (other PRs, master) are pending.
> >>>>> > >> The pricing plan [1] of Travis shows it can support concurrent
> >>>>> > >> build jobs. But I don't know which plan we are using; it might be
> >>>>> > >> the free plan for open source.
> >>>>> > >>
> >>>>> > >> I cc-ed Chesnay who may have some experience on Travis.
> >>>>> > >>
> >>>>> > >> Regards,
> >>>>> > >> Jark
> >>>>> > >>
> >>>>> > >> [1]: https://travis-ci.com/plans
> >>>>> > >>
> >>>>> > >> On Tue, 25 Jun 2019 at 08:11, Bowen Li <[email protected] <mailto:[email protected]>> wrote:
> >>>>> > >>
> >>>>> > >> > Hi Steven,
> >>>>> > >> >
> >>>>> > >> > I think you may not have read what I wrote. The discussion is
> >>>>> > >> > about "unstable build **capacity**", in other words "unstable /
> >>>>> > >> > lack of build resources", not "unstable build".
> >>>>> > >> >
> >>>>> > >> > On Mon, Jun 24, 2019 at 4:40 PM Steven Wu
> >>>>> <[email protected] <mailto:[email protected]>>
> >>>>> > wrote:
> >>>>> > >> >
> >>>>> > >> > > A long and sometimes unstable build is definitely a pain point.
> >>>>> > >> > >
> >>>>> > >> > > I suspect the build failure here in flink-connector-kafka is
> >>>>> > >> > > not related to my change, but there is no easy way to re-run
> >>>>> > >> > > the build in the Travis UI. A Google search showed a trick:
> >>>>> > >> > > closing and reopening the PR will trigger a rebuild, but that
> >>>>> > >> > > could add noise to the PR activity.
> >>>>> > >> > > https://travis-ci.org/apache/flink/jobs/545555519
> >>>>> > >> > >
> >>>>> > >> > > travis-ci for my personal repo often failed by exceeding the
> >>>>> > >> > > time limit after 4+ hours:
> >>>>> > >> > > The job exceeded the maximum time limit for jobs, and has
> >>>>> > >> > > been terminated.
> >>>>> > >> > >
> >>>>> > >> > > On Mon, Jun 24, 2019 at 4:15 PM Bowen Li
> >>>>> <[email protected] <mailto:[email protected]>>
> >>>>> > wrote:
> >>>>> > >> > >
> >>>>> > >> > > > https://travis-ci.org/apache/flink/builds/549681530
> >>>>> > >> > > > This build request has been sitting at the **HEAD of the
> >>>>> > >> > > > queue** since I first saw it at PST 10:30am (not sure how
> >>>>> > >> > > > long it had been there before 10:30am). It's PST 4:12pm now
> >>>>> > >> > > > and it hasn't started yet.
> >>>>> > >> > > >
> >>>>> > >> > > > On Mon, Jun 24, 2019 at 2:48 PM Bowen Li
> >>>>> <[email protected] <mailto:[email protected]>>
> >>>>> > >> wrote:
> >>>>> > >> > > >
> >>>>> > >> > > > > Hi devs,
> >>>>> > >> > > > >
> >>>>> > >> > > > > I've been experiencing the pain resulting from the lack of
> >>>>> > >> > > > > stable build capacity on Travis for Flink PRs [1].
> >>>>> > >> > > > > Specifically, I have often noticed that no build in the
> >>>>> > >> > > > > queue makes any progress for hours, and then suddenly 5 or
> >>>>> > >> > > > > 6 builds kick off all together after the long pause. I'm in
> >>>>> > >> > > > > the PST (UTC-08) time zone, and I've seen that the pause
> >>>>> > >> > > > > can be as long as 6 hours, from PST 9am to 3pm (let alone
> >>>>> > >> > > > > the time needed to drain the queue afterwards).
> >>>>> > >> > > > >
> >>>>> > >> > > > > I think this has greatly impacted our productivity. I've
> >>>>> > >> > > > > experienced that PRs submitted in the early morning of the
> >>>>> > >> > > > > PST time zone won't finish their build until late at night
> >>>>> > >> > > > > on the same day.
> >>>>> > >> > > > >
> >>>>> > >> > > > > So my questions are:
> >>>>> > >> > > > >
> >>>>> > >> > > > > - Has anyone else experienced the same problem or made a
> >>>>> > >> > > > > similar observation on TravisCI? (I suspect it has to do
> >>>>> > >> > > > > with the time zone.)
> >>>>> > >> > > > >
> >>>>> > >> > > > > - What pricing plan of TravisCI is Flink currently using?
> >>>>> > >> > > > > Is it the free plan for open source projects? What is the
> >>>>> > >> > > > > guaranteed build capacity of the current plan?
> >>>>> > >> > > > >
> >>>>> > >> > > > > - If the current pricing plan (either free or paid) can't
> >>>>> > >> > > > > provide stable build capacity, can we upgrade to a higher
> >>>>> > >> > > > > priced plan with larger and more stable build capacity?
> >>>>> > >> > > > >
> >>>>> > >> > > > > BTW, another factor that contributes to the productivity
> >>>>> > >> > > > > problem is that our build is slow: we run a full build for
> >>>>> > >> > > > > every PR, and a successful full build takes ~5h. We
> >>>>> > >> > > > > definitely have more options to solve it, for instance,
> >>>>> > >> > > > > modularizing the build graphs and reusing artifacts from
> >>>>> > >> > > > > the previous build. But I think that can be a big effort
> >>>>> > >> > > > > which is much harder to accomplish in a short period of
> >>>>> > >> > > > > time and may deserve its own separate discussion.
> >>>>> > >> > > > >
> >>>>> > >> > > > > [1] https://travis-ci.org/apache/flink/pull_requests
> >>>>> > >> > > > >
> >>>>> > >> > > > >
> >>>>> > >> > > >
> >>>>> > >> > >
> >>>>> > >> >
> >>>>> > >>
> >>>>> > >
> >>>>> >
> >>>>>
> >>>>>
> >>>>> --
> >>>>> Best Regards
> >>>>>
> >>>>> Jeff Zhang
> >>>>>
> >>
>
>
--
Best Regards
Jeff Zhang