[OMPI devel] Travis: one thing that might help
I noticed the other evening that we are doing two things at Travis:

1. Building pull requests
2. Building pushes

The 2nd one might well be contributing to our backlog (i.e., every time a PR is merged to the ompi repo, we Travis build again).

I also confirmed with Travis that we're supposed to be able to be building 5 jobs concurrently within Travis.

I've just turned off building pushes, so we should *only* be building pull requests. Let's see if this helps with the turnaround time on Travis builds...

--
Jeff Squyres
jsquy...@cisco.com

___
devel mailing list
devel@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/devel
Re: [OMPI devel] Travis: one thing that might help
Jeff,

I also noted that each time a PR is updated, a new Travis build is started. On the other hand, Jenkins is a bit smarter: it does not build "obsolete" revisions of a PR (or cancels those builds). I think most of us cannot manually direct Travis to cancel a given build.

FWIW, building pushes is not useless. We recently hit a case in which a PR was successfully built, then some other changes were made that did not cause any conflict from a git point of view, so the PR was merged. Unfortunately, master could not build any more, because there was indeed a conflict that git had no way to detect.

Cheers,

Gilles
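The failure mode Gilles describes can be reproduced in a few lines of git: two branches that each build fine on their own merge without any textual conflict, yet the merged result cannot compile. A self-contained sketch — the repo, branch, file, and function names here are invented for illustration:

```shell
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q demo && cd demo
git config user.email demo@example.com
git config user.name demo
trunk=$(git symbolic-ref --short HEAD)   # 'master' or 'main', depending on git version

# Base commit: a library with one function.
printf 'int old_name(void) { return 42; }\n' > lib.c
git add lib.c && git commit -qm 'add lib.c'

# Branch 1 renames the function (touches only lib.c).
git checkout -qb rename-func
printf 'int new_name(void) { return 42; }\n' > lib.c
git commit -qam 'rename old_name to new_name'

# Branch 2, from the base commit, adds a caller of the *old* name (touches only main.c).
git checkout -qb add-caller "$trunk"
printf 'int old_name(void);\nint main(void) { return old_name(); }\n' > main.c
git add main.c && git commit -qm 'call old_name'

# Both merges are textually clean -- git sees no conflict at all...
git checkout -q "$trunk"
git merge -q --no-edit rename-func
git merge -q --no-edit add-caller

# ...but the merged tree cannot link: main.c calls a function lib.c no longer defines.
grep new_name lib.c
grep old_name main.c
```

Since each branch touches a different file, the three-way merge has nothing to flag; only an actual build of the merge result catches the breakage, which is exactly what a post-merge push build provides.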
Re: [OMPI devel] Travis: one thing that might help
On Feb 8, 2017, at 9:34 AM, gil...@rist.or.jp wrote:
>
> i also noted that each time a PR is updated, a new Travis build is
> started.
> on the other hand, Jenkins is a bit smarter and does not build or cancel
> "obsolete" PR.

Are you sure? Here's how I thought Jenkins worked:

- create a PR: queue up a Jenkins job
- push a change to a PR: queue up a Jenkins job

So let's say a scenario like this happens:

1. jenkins is fully busy
2. jeff submits PR 1234, queues jenkins job 5678
3. jeff pushes another commit to PR 1234, queues jenkins job 5679
4. jeff pushes another commit to PR 1234, queues jenkins job 5680
5. jenkins becomes unbusy, runs job 5678
   --> this tests the head of the PR branch -- not the PR as it was initially submitted
6. jenkins finishes 5678 and runs job 5679
   --> this *also* tests the head of the PR branch -- i.e., exactly what was tested in 5678
7. jenkins finishes 5679 and runs job 5680
   --> this *also* tests the head of the PR branch -- i.e., exactly what was tested in 5678 and 5679

I.e., my understanding was that Jenkins would run multiple redundant jobs and not be able to tell the difference between them (because of the lack of state kept between individual Jenkins jobs).

I know that that *used* to be the case. Perhaps recent versions of Jenkins (or its plugins?) have made this better, such that 5679 and 5680 would turn into no-ops...? Do you know if this is the case?

> i think most of us cannot manually direct Travis to cancel a given build.
>
> fwiw, building pushes is not useless.
> we recently hit a case in which the PR was successfully built, then some
> other changes were made but they did not cause any conflicts from a
> git point of view, so the PR was merged.
> unfortunately, master could not build any more because there was indeed
> a conflict that git had no way to detect.

I agree -- building pushes is not a bad thing, for exactly the reason you cite. But if Travis has limited resources, I'm wondering if it would be better to utilize them for PRs than for the uncommon case of detecting problems-upon-merge.

--
Jeff Squyres
jsquy...@cisco.com
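The redundancy in the scenario above boils down to a queue that does not deduplicate: once job 5680 is queued, jobs 5678 and 5679 for the same PR are obsolete. A toy sketch of the "smarter" behavior — jobs 5678-5680 for PR 1234 are from the scenario above, while job 5681 for PR 1235 is an invented second PR added for contrast:

```shell
# Queued CI jobs as "job_id pr_number" pairs, oldest first.
queue='5678 1234
5679 1234
5680 1234
5681 1235'

# A queue that cancels obsolete jobs keeps only the newest job per PR:
# awk remembers the last job seen for each PR, then prints the survivors.
echo "$queue" | awk '{ last[$2] = $1 } END { for (pr in last) print "run job", last[pr], "for PR", pr }'
```

Only jobs 5680 and 5681 survive; whether a given Jenkins setup actually does this depends on its plugins, which is the open question in the message above.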
Re: [OMPI devel] Travis: one thing that might help
Jeff,

IIRC, I saw builds being cancelled (I was monitoring the Jenkins console) when new commits were pushed (or force-pushed) to the current PR. I will make a test tomorrow.

It is fair to say that using Travis for new PRs is very likely more useful than validating all builds.

Cheers,

Gilles
Re: [OMPI devel] Travis: one thing that might help
On Feb 8, 2017, at 10:15 AM, gil...@rist.or.jp wrote:
>
> iirc, i saw builds being cancelled (i was monitoring the Jenkins console)
> when new commits were pushed (or force pushed) to the current PR
>
> i will make a test tomorrow

Oh, sweet. That would be good to know; thanks!

> it is fair that using Travis for new PR is very likely more useful than
> for validating all builds

We talked yesterday on the call about possibly upgrading to a paid Travis plan to see if that could ease some of the congestion. Some of us are looking into this, and will report back on next Tuesday's call.

--
Jeff Squyres
jsquy...@cisco.com
Re: [OMPI devel] Segfault on MPI init
What version of Open MPI are you running?

The error is indicating that Open MPI is trying to start a user-level helper daemon on the remote node, and the daemon is seg faulting (which is unusual).

One thing to be aware of: https://www.open-mpi.org/faq/?category=building#install-overwrite

> On Feb 6, 2017, at 8:14 AM, Cyril Bordage wrote:
>
> Hello,
>
> I cannot run a program with MPI when I compile it myself.
> On some nodes I have the following error:
>
> [mimi012:17730] *** Process received signal ***
> [mimi012:17730] Signal: Segmentation fault (11)
> [mimi012:17730] Signal code: Address not mapped (1)
> [mimi012:17730] Failing at address: 0xf8
> [mimi012:17730] [ 0] /lib64/libpthread.so.0(+0xf500)[0x766c0500]
> [mimi012:17730] [ 1] /home/bordage/modules/openmpi/openmpi-debug/lib/libopen-pal.so.0(opal_libevent2022_event_priority_set+0xa9)[0x7781fcb9]
> [mimi012:17730] [ 2] /home/bordage/modules/openmpi/openmpi-debug/lib/openmpi/mca_oob_tcp.so(+0xebcd)[0x7197fbcd]
> [mimi012:17730] [ 3] /home/bordage/modules/openmpi/openmpi-debug/lib/openmpi/mca_oob_tcp.so(mca_oob_tcp_peer_accept+0xa1)[0x71981e34]
> [mimi012:17730] [ 4] /home/bordage/modules/openmpi/openmpi-debug/lib/openmpi/mca_oob_tcp.so(+0xab1d)[0x7197bb1d]
> [mimi012:17730] [ 5] /home/bordage/modules/openmpi/openmpi-debug/lib/libopen-pal.so.0(opal_libevent2022_event_base_loop+0x53c)[0x7782323c]
> [mimi012:17730] [ 6] /home/bordage/modules/openmpi/openmpi-debug/lib/libopen-pal.so.0(+0x3d34c)[0x777c534c]
> [mimi012:17730] [ 7] /lib64/libpthread.so.0(+0x7851)[0x766b8851]
> [mimi012:17730] [ 8] /lib64/libc.so.6(clone+0x6d)[0x7640694d]
> [mimi012:17730] *** End of error message ***
> --
> ORTE has lost communication with its daemon located on node:
>
>   hostname: mimi012
>
> This is usually due to either a failure of the TCP network
> connection to the node, or possibly an internal failure of
> the daemon itself. We cannot recover from this failure, and
> therefore will terminate the job.
> --
>
> The error does not appear with the official MPI installed on the
> platform. I asked the admins about their compilation options, but there
> is nothing particular.
>
> Moreover, it appears only for some node lists. Still, the nodes seem to
> be fine, since it works with the official version of MPI on the platform.
>
> To be sure it is not a network problem, I tried to use "-mca btl
> tcp,sm,self" or "-mca btl openib,sm,self", with no change.
>
> Do you have any idea where this error may come from?
>
> Thank you.
>
>
> Cyril Bordage.

--
Jeff Squyres
jsquy...@cisco.com
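For reference, the install-overwrite problem that FAQ entry describes is installing a new Open MPI build into a prefix that still contains an older install, which can leave stale plugins under `lib/openmpi/` that crash at runtime. A conservative recipe, run from the Open MPI source tree, is to wipe the prefix before reinstalling. This is only a sketch: the prefix is taken from the backtrace above, and the configure flags are illustrative, not a recommendation.

```shell
# Example only: prefix from the backtrace above; flags are illustrative.
PREFIX=$HOME/modules/openmpi/openmpi-debug

rm -rf "$PREFIX"                    # remove the old install completely,
                                    # so no stale mca_*.so plugins survive
./configure --prefix="$PREFIX" --enable-debug
make -j 8 all
make install
```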