Re: update on git timeouts for jenkins builds

shane knapp Tue, 28 Jul 2015 13:31:41 -0700

git caches are set up on all workers for the pull request builder, and
builds are building w/the cache...  however in the build logs it
doesn't seem to be actually *hitting* the cache, so i guess i'll be
doing some more poking and prodding to see wtf is going on.
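
one quick way to check a workspace, assuming the cache is wired in as a git reference repository (e.g. via `git clone --reference`; nothing in this thread says how the cache is actually set up): git records borrowed object stores in `.git/objects/info/alternates`, so a missing or empty file there means the clone isn't borrowing objects from anything. `uses_reference_cache` is a made-up helper name, just a sketch.

```shell
# Hypothetical helper (not from the thread). Assumes the worker cache is
# used as a git reference repository, which is an assumption.
# Succeeds if the workspace clone lists at least one alternate object store.
uses_reference_cache() {
    alternates="$1/.git/objects/info/alternates"
    # -s: the file exists and is non-empty
    [ -s "$alternates" ]
}
```

usage would be something like `uses_reference_cache "$WORKSPACE" && cat "$WORKSPACE/.git/objects/info/alternates"` in a build step, where `$WORKSPACE` is the job's checkout directory.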



On Tue, Jul 28, 2015 at 12:49 PM, shane knapp <skn...@berkeley.edu> wrote:
> btw, the directory perm issue was only happening on
> amp-jenkins-worker-04 and -05.  both of the broken dirs were
> clobbered, so we won't be seeing any more of these.
>
> On Tue, Jul 28, 2015 at 12:28 PM, shane knapp <skn...@berkeley.edu> wrote:
>> ++joshrosen
>>
>> ok, i found out some of what's going on.  some builds were failing as such:
>> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/38749/console
>>
>> note that it's unable to remove the target/ directory during the
>> build...  this is caused by 'git clean -fdx' running, and deep in the
>> target directory there were a couple of dirs that had the wrong
>> permission bits set:
>>
>> dr-xr-xr-x.  2 jenkins jenkins 4096 Jul 27 06:54
>> /home/jenkins/workspace/SparkPullRequestBuilder/target/tmp/spark-615f93cc-27ad-464b-b0d4-4352c96c22ee
>>
>> note the missing 'w' on the owner bits.  this is what was causing
>> those failures.  after manually deleting the two entries that i found
>> (located with the command below), we've whacked this mole for now.
>>
>> for x in $(cat jenkins_workers.txt); do
>>   echo $x
>>   ssh $x "find /home/jenkins/workspace/SparkPullRequestBuilder*/target/tmp -maxdepth 3 | xargs ls -ld | egrep ^dr-x"
>>   echo; echo
>> done
>>
>> as for what exactly is messing up the perms, i'm not entirely sure.
>> josh, you have any ideas?
>>
>> shane
>>
>> On Tue, Jul 28, 2015 at 11:51 AM, shane knapp <skn...@berkeley.edu> wrote:
>>> hey all, i'm just back in from my wedding weekend (woot!) and am
>>> working on figuring out what's happening w/the git timeouts for pull
>>> request builds.
>>>
>>> TL;DR:  if your build fails due to a timeout, please retrigger your
>>> builds.  i know this isn't the BEST solution, but until we get some
>>> stuff implemented (traffic shaping, git cache for the workers) it's
>>> the only thing i can recommend.
>>>
>>> here's a snapshot of the state of the union:
>>> $ get_timeouts.sh 5
>>> timeouts by date:
>>> 2015-07-23 -- 3
>>> 2015-07-24 -- 1
>>> 2015-07-26 -- 7
>>> 2015-07-27 -- 18
>>> 2015-07-28 -- 9
>>>
>>> timeouts by project:
>>>      35 SparkPullRequestBuilder
>>>       3 Tachyon-Pull-Request-Builder
>>> total builds (excepting aborted by a user):
>>> 1908
>>>
>>> total percentage of builds timing out:
>>> 01%
>>>
>>> nothing has changed on our end AFAIK, and our traffic graphs look
>>> totally fine, but starting sunday, we started seeing a spike in
>>> timeouts, with yesterday being the worst.  today is not looking good
>>> either.
>>>
>>> github is looking OK, but not "great":
>>> https://status.github.com/
>>>
>>> as a solution, we'll be setting up some traffic shaping on our end, as
>>> well as implementing a git cache on the workers so that we'll
>>> (hopefully) minimize how many hits we make against github.  i was
>>> planning on doing the git cache months ago, but the timeout issue
>>> pretty much went away and i back-burnered that idea until today.
>>>
>>> other than that, i'll be posting updates as we get them.
>>>
>>> shane
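
for the read-only directories under target/tmp described earlier in the thread: deleting entries inside a directory requires write permission on that directory, so one possible guard is to restore the owner-write bit before the clean runs. this is only a sketch; `fix_readonly_dirs` is a made-up name, and it does not address whatever is stripping the bit in the first place.

```shell
# Hypothetical helper (not from the thread): print every directory under
# the given root that lacks the owner-write bit, and restore that bit.
fix_readonly_dirs() {
    root=$1
    # "-perm -200" matches entries with the owner-write bit set;
    # "!" inverts the match, so only dirs like dr-xr-xr-x are selected.
    find "$root" -type d ! -perm -200 -print -exec chmod u+w {} +
}
```

it could run as a pre-build step, e.g. `fix_readonly_dirs /home/jenkins/workspace/SparkPullRequestBuilder/target` ahead of `git clean -fdx`.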

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org
