btw, the directory perm issue was only happening on
amp-jenkins-worker-04 and -05.  both of the broken dirs have been
clobbered, so we won't be seeing any more of these failures.
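
for anyone who wants to check a workspace for this, here's a minimal
sketch.  it is not the command that was actually run on the workers
(that one is quoted below); the function names are made up for
illustration, and the chmod step is an assumption about how one might
repair the dirs instead of deleting them.  it matches directories whose
owner write bit is missing, like the dr-xr-xr-x entry quoted below.

```shell
#!/usr/bin/env bash
# hypothetical helpers, not part of the jenkins setup described in this thread

# print every directory under $1 whose owner write bit is NOT set
find_unwritable_dirs() {
    local root="$1"
    # -type d      : directories only
    # ! -perm -200 : owner write bit (octal 200) is missing
    find "$root" -type d ! -perm -200 -print
}

# assumption: restoring owner write is enough to let the dirs be cleaned up
fix_unwritable_dirs() {
    local root="$1"
    find "$root" -type d ! -perm -200 -exec chmod u+w {} +
}
```

usage would look something like
find_unwritable_dirs /home/jenkins/workspace/SparkPullRequestBuilder/target/tmp
on each worker.  matching on the mode bits directly avoids parsing
'ls -ld' output.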

On Tue, Jul 28, 2015 at 12:28 PM, shane knapp <skn...@berkeley.edu> wrote:
> ++joshrosen
>
> ok, i found out some of what's going on.  some builds were failing like this:
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/38749/console
>
> note that it's unable to remove the target/ directory during the
> build...  this is caused by 'git clean -fdx' running, and deep in the
> target directory there were a couple of dirs that had the wrong
> permission bits set:
>
> dr-xr-xr-x.  2 jenkins jenkins 4096 Jul 27 06:54
> /home/jenkins/workspace/SparkPullRequestBuilder/target/tmp/spark-615f93cc-27ad-464b-b0d4-4352c96c22ee
>
> note the missing 'w' on the owner bits.  this is what was causing
> those failures.  after manually deleting the two entries that i found
> (using the command below), we've whacked this mole for now.
>
> for x in $(cat jenkins_workers.txt); do echo $x; ssh $x "find
> /home/jenkins/workspace/SparkPullRequestBuilder*/target/tmp -maxdepth
> 3| xargs ls -ld | egrep ^dr-x"; echo; echo; done
>
> as for what exactly is messing up the perms, i'm not entirely sure.
> josh, you have any ideas?
>
> shane
>
> On Tue, Jul 28, 2015 at 11:51 AM, shane knapp <skn...@berkeley.edu> wrote:
>> hey all, i'm just back in from my wedding weekend (woot!) and am
>> working on figuring out what's happening w/the git timeouts for pull
>> request builds.
>>
>> TL;DR:  if your build fails due to a timeout, please retrigger your
>> builds.  i know this isn't the BEST solution, but until we get some
>> stuff implemented (traffic shaping, git cache for the workers) it's
>> the only thing i can recommend.
>>
>> here's a snapshot of the state of the union:
>> $ get_timeouts.sh 5
>> timeouts by date:
>> 2015-07-23 -- 3
>> 2015-07-24 -- 1
>> 2015-07-26 -- 7
>> 2015-07-27 -- 18
>> 2015-07-28 -- 9
>>
>> timeouts by project:
>>      35 SparkPullRequestBuilder
>>       3 Tachyon-Pull-Request-Builder
>> total builds (excepting aborted by a user):
>> 1908
>>
>> total percentage of builds timing out:
>> 01%
>>
>> nothing has changed on our end AFAIK, and our traffic graphs look
>> totally fine, but on sunday we started seeing a spike in timeouts, with
>> yesterday being the worst.  today is not looking good either.
>>
>> github is looking OK, but not "great":
>> https://status.github.com/
>>
>> as a solution, we'll be setting up some traffic shaping on our end, as
>> well as implementing a git cache on the workers so that we'll
>> (hopefully) minimize how many hits we make against github.  i was
>> planning on doing the git cache months ago, but the timeout issue
>> pretty much went away and i back-burnered that idea until today.
>>
>> other than that, i'll be posting updates as we get them.
>>
>> shane
