[OMPI devel] Travis: one thing that might help
I noticed the other evening that we are doing two things at Travis:

1. Building pull requests
2. Building pushes

The 2nd one might well be contributing to our backlog (i.e., every time a PR is merged to the ompi repo, we Travis build again).

I also confirmed with Travis that we're supposed to be able to be building 5 jobs concurrently within Travis.

I've just turned off building pushes, so we should *only* be building pull requests. Let's see if this helps with the turnaround time on Travis builds...

--
Jeff Squyres
jsquy...@cisco.com

___
devel mailing list
devel@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/devel
Re: [OMPI devel] Travis: one thing that might help
Jeff,

I also noted that each time a PR is updated, a new Travis build is started. On the other hand, Jenkins is a bit smarter: it does not build "obsolete" revisions of a PR (or cancels those builds). I think most of us cannot manually direct Travis to cancel a given build.

FWIW, building pushes is not useless. We recently hit a case in which a PR was successfully built, then some other changes were made that did not cause any conflict from a git point of view, so the PR was merged. Unfortunately, master could not build any more, because there was indeed a conflict that git had no way to detect.

Cheers,

Gilles
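The failure mode Gilles describes can be reproduced in a few lines of git: two branches that each build fine on their own merge without any textual conflict, yet the merged result cannot compile. A self-contained sketch — the repo, branch, file, and function names here are invented for illustration:

```shell
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q demo && cd demo
git config user.email demo@example.com
git config user.name demo
trunk=$(git symbolic-ref --short HEAD)   # 'master' or 'main', depending on git version

# Base commit: a library with one function.
printf 'int old_name(void) { return 42; }\n' > lib.c
git add lib.c && git commit -qm 'add lib.c'

# Branch 1 renames the function (touches only lib.c).
git checkout -qb rename-func
printf 'int new_name(void) { return 42; }\n' > lib.c
git commit -qam 'rename old_name to new_name'

# Branch 2, from the base commit, adds a caller of the *old* name (touches only main.c).
git checkout -qb add-caller "$trunk"
printf 'int old_name(void);\nint main(void) { return old_name(); }\n' > main.c
git add main.c && git commit -qm 'call old_name'

# Both merges are textually clean -- git sees no conflict at all...
git checkout -q "$trunk"
git merge -q --no-edit rename-func
git merge -q --no-edit add-caller

# ...but the merged tree cannot link: main.c calls a function lib.c no longer defines.
grep new_name lib.c
grep old_name main.c
```

Since each branch touches a different file, the three-way merge has nothing to flag; only an actual build of the merge result catches the breakage, which is exactly what a post-merge push build provides.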
Re: [OMPI devel] Travis: one thing that might help
On Feb 8, 2017, at 9:34 AM, gil...@rist.or.jp wrote:
>
> i also noted that each time a PR is updated, a new Travis build is
> started.
> on the other hand, Jenkins is a bit smarter and does not build or cancel
> "obsolete" PR.

Are you sure? Here's how I thought Jenkins worked:

- create a PR: queue up a Jenkins job
- push a change to a PR: queue up a Jenkins job

So let's say a scenario like this happens:

1. jenkins is fully busy
2. jeff submits PR 1234, queues jenkins job 5678
3. jeff pushes another commit to PR 1234, queues jenkins job 5679
4. jeff pushes another commit to PR 1234, queues jenkins job 5680
5. jenkins becomes unbusy, runs job 5678
   --> this tests the head of the PR branch -- not the PR as it was initially submitted
6. jenkins finishes 5678 and runs job 5679
   --> this *also* tests the head of the PR branch -- i.e., exactly what was tested in 5678
7. jenkins finishes 5679 and runs job 5680
   --> this *also* tests the head of the PR branch -- i.e., exactly what was tested in 5678 and 5679

I.e., my understanding was that Jenkins would run multiple redundant jobs and not be able to tell the difference between them (because of the lack of state kept between individual Jenkins jobs).

I know that that *used* to be the case. Perhaps recent versions of Jenkins (or its plugins?) have made this better, such that 5679 and 5680 would turn into no-ops...? Do you know if this is the case?

> i think most of us cannot manually direct Travis to cancel a given build.
>
> fwiw, building pushes is not useless.
> we recently hit a case in which the PR was successfully built, then some
> other changes were made but they did not cause any conflicts from a
> git point of view, so the PR was merged.
> unfortunately, master could not build any more because there was indeed
> a conflict that git had no way to detect.

I agree -- building pushes is not a bad thing, for exactly the reason you cite. But if Travis has limited resources, I'm wondering if it would be better to utilize them for PRs than for the uncommon case of detecting problems-upon-merge.

--
Jeff Squyres
jsquy...@cisco.com
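The redundancy in the scenario above boils down to a queue that does not deduplicate: once job 5680 is queued, jobs 5678 and 5679 for the same PR are obsolete. A toy sketch of the "smarter" behavior — jobs 5678-5680 for PR 1234 are from the scenario above, while job 5681 for PR 1235 is an invented second PR added for contrast:

```shell
# Queued CI jobs as "job_id pr_number" pairs, oldest first.
queue='5678 1234
5679 1234
5680 1234
5681 1235'

# A queue that cancels obsolete jobs keeps only the newest job per PR:
# awk remembers the last job seen for each PR, then prints the survivors.
echo "$queue" | awk '{ last[$2] = $1 } END { for (pr in last) print "run job", last[pr], "for PR", pr }'
```

Only jobs 5680 and 5681 survive; whether a given Jenkins setup actually does this depends on its plugins, which is the open question in the message above.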
Re: [OMPI devel] Travis: one thing that might help
Jeff,

IIRC, I saw builds being cancelled (I was monitoring the Jenkins console) when new commits were pushed (or force-pushed) to the current PR. I will make a test tomorrow.

It is fair to say that using Travis for new PRs is very likely more useful than validating all builds.

Cheers,

Gilles
Re: [OMPI devel] Travis: one thing that might help
On Feb 8, 2017, at 10:15 AM, gil...@rist.or.jp wrote:
>
> iirc, i saw builds being cancelled (i was monitoring the Jenkins console)
> when new commits were pushed (or force pushed) to the current PR
>
> i will make a test tomorrow

Oh, sweet. That would be good to know; thanks!

> it is fair that using Travis for new PR is very likely more useful than
> for validating all builds

We talked yesterday on the call about possibly upgrading to a paid Travis plan to see if that could ease some of the congestion. Some of us are looking into this, and will report back on next Tuesday's call.

--
Jeff Squyres
jsquy...@cisco.com
Re: [OMPI devel] Segfault on MPI init
What version of Open MPI are you running?

The error is indicating that Open MPI is trying to start a user-level helper daemon on the remote node, and the daemon is seg faulting (which is unusual).

One thing to be aware of: https://www.open-mpi.org/faq/?category=building#install-overwrite

> On Feb 6, 2017, at 8:14 AM, Cyril Bordage wrote:
>
> Hello,
>
> I cannot run a program with MPI when I compile it myself.
> On some nodes I have the following error:
>
> [mimi012:17730] *** Process received signal ***
> [mimi012:17730] Signal: Segmentation fault (11)
> [mimi012:17730] Signal code: Address not mapped (1)
> [mimi012:17730] Failing at address: 0xf8
> [mimi012:17730] [ 0] /lib64/libpthread.so.0(+0xf500)[0x766c0500]
> [mimi012:17730] [ 1] /home/bordage/modules/openmpi/openmpi-debug/lib/libopen-pal.so.0(opal_libevent2022_event_priority_set+0xa9)[0x7781fcb9]
> [mimi012:17730] [ 2] /home/bordage/modules/openmpi/openmpi-debug/lib/openmpi/mca_oob_tcp.so(+0xebcd)[0x7197fbcd]
> [mimi012:17730] [ 3] /home/bordage/modules/openmpi/openmpi-debug/lib/openmpi/mca_oob_tcp.so(mca_oob_tcp_peer_accept+0xa1)[0x71981e34]
> [mimi012:17730] [ 4] /home/bordage/modules/openmpi/openmpi-debug/lib/openmpi/mca_oob_tcp.so(+0xab1d)[0x7197bb1d]
> [mimi012:17730] [ 5] /home/bordage/modules/openmpi/openmpi-debug/lib/libopen-pal.so.0(opal_libevent2022_event_base_loop+0x53c)[0x7782323c]
> [mimi012:17730] [ 6] /home/bordage/modules/openmpi/openmpi-debug/lib/libopen-pal.so.0(+0x3d34c)[0x777c534c]
> [mimi012:17730] [ 7] /lib64/libpthread.so.0(+0x7851)[0x766b8851]
> [mimi012:17730] [ 8] /lib64/libc.so.6(clone+0x6d)[0x7640694d]
> [mimi012:17730] *** End of error message ***
> --
> ORTE has lost communication with its daemon located on node:
>
>   hostname: mimi012
>
> This is usually due to either a failure of the TCP network
> connection to the node, or possibly an internal failure of
> the daemon itself. We cannot recover from this failure, and
> therefore will terminate the job.
> --
>
> The error does not appear with the official MPI installed on the
> platform. I asked the admins about their compilation options, but there
> is nothing particular.
>
> Moreover, it appears only for some node lists. Still, the nodes seem to
> be fine, since it works with the official version of MPI on the platform.
>
> To be sure it is not a network problem, I tried to use "-mca btl
> tcp,sm,self" or "-mca btl openib,sm,self", with no change.
>
> Do you have any idea where this error may come from?
>
> Thank you.
>
>
> Cyril Bordage.

--
Jeff Squyres
jsquy...@cisco.com
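For reference, the install-overwrite problem that FAQ entry describes is installing a new Open MPI build into a prefix that still contains an older install, which can leave stale plugins under `lib/openmpi/` that crash at runtime. A conservative recipe, run from the Open MPI source tree, is to wipe the prefix before reinstalling. This is only a sketch: the prefix is taken from the backtrace above, and the configure flags are illustrative, not a recommendation.

```shell
# Example only: prefix from the backtrace above; flags are illustrative.
PREFIX=$HOME/modules/openmpi/openmpi-debug

rm -rf "$PREFIX"                    # remove the old install completely,
                                    # so no stale mca_*.so plugins survive
./configure --prefix="$PREFIX" --enable-debug
make -j 8 all
make install
```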