++joshrosen

ok, i found out some of what's going on.  some builds were failing like this:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/38749/console

note that it's unable to remove the target/ directory during the
build...  this is caused by 'git clean -fdx' running, and deep in the
target directory there were a couple of dirs that had the wrong
permission bits set:

dr-xr-xr-x.  2 jenkins jenkins 4096 Jul 27 06:54 /home/jenkins/workspace/SparkPullRequestBuilder/target/tmp/spark-615f93cc-27ad-464b-b0d4-4352c96c22ee

note the missing 'w' on the owner bits.  this is what was causing
those failures.  after manually deleting the two entries that i found
(using the command below), we've whacked this mole for now.

for x in $(cat jenkins_workers.txt); do
  echo $x
  ssh $x "find /home/jenkins/workspace/SparkPullRequestBuilder*/target/tmp -maxdepth 3 | xargs ls -ld | egrep ^dr-x"
  echo; echo
done
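
a minimal sketch of one way to clear dirs like these before 'git clean -fdx' runs, assuming a find that supports octal -perm tests (GNU and POSIX find do).  fix_readonly_dirs is a hypothetical helper name, not something that exists in the jenkins config, and this is not the command that was actually run above:

```shell
# hypothetical helper: restore the owner write bit on any directory
# under $1 that is missing it, printing each directory it touches.
fix_readonly_dirs() {
  # -perm -0200 matches entries with the owner write bit set;
  # negating it selects directories like the dr-xr-xr-x one above.
  find "$1" -type d ! -perm -0200 -print -exec chmod u+w {} +
}
```

usage would be something like 'fix_readonly_dirs /home/jenkins/workspace/SparkPullRequestBuilder/target/tmp' on each worker.  this only treats the symptom; it doesn't explain what is dropping the write bit in the first place.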

as for what exactly is messing up the perms, i'm not entirely sure.
josh, you have any ideas?

shane

On Tue, Jul 28, 2015 at 11:51 AM, shane knapp <skn...@berkeley.edu> wrote:
> hey all, i'm just back in from my wedding weekend (woot!) and am
> working on figuring out what's happening w/the git timeouts for pull
> request builds.
>
> TL;DR:  if your build fails due to a timeout, please retrigger your
> builds.  i know this isn't the BEST solution, but until we get some
> stuff implemented (traffic shaping, git cache for the workers) it's
> the only thing i can recommend.
>
> here's a snapshot of the state of the union:
> $ get_timeouts.sh 5
> timeouts by date:
> 2015-07-23 -- 3
> 2015-07-24 -- 1
> 2015-07-26 -- 7
> 2015-07-27 -- 18
> 2015-07-28 -- 9
>
> timeouts by project:
>      35 SparkPullRequestBuilder
>       3 Tachyon-Pull-Request-Builder
> total builds (excepting aborted by a user):
> 1908
>
> total percentage of builds timing out:
> 01%
>
> nothing has changed on our end AFAIK, our traffic graphs look totally
> fine, but starting sunday, we started seeing a spike in timeouts, with
> yesterday being the worst.  today is not looking good either.
>
> github is looking OK, but not "great":
> https://status.github.com/
>
> as a solution, we'll be setting up some traffic shaping on our end, as
> well as implementing a git cache on the workers so that we'll
> (hopefully) minimize how many hits we make against github.  i was
> planning on doing the git cache months ago, but the timeout issue
> pretty much went away and i back-burnered that idea until today.
>
> other than that, i'll be posting updates as we get them.
>
> shane
