Re: [OMPI devel] Seldom deadlock in mpirun

2016-06-03 Thread Jeff Squyres (jsquyres)
That's disappointing / puzzling.

Threads 4 and 5 look like they're in the PMIX / ORTE progress threads, 
respectively.

But I don't see any obvious signs of what threads 1, 2, and 3 are for.  Huh.

When is this hang happening -- during init?  Middle of the program?  During 
finalize?


> On Jun 2, 2016, at 6:00 PM, George Bosilca  wrote:
> 
> Sure, but they mostly look similar.
> 
>   George.
> 
> 
> (lldb) thread list
> Process 76811 stopped
>   thread #1: tid = 0x272b40e, 0x7fff93306de6 
> libsystem_kernel.dylib`__psynch_mutexwait + 10, queue = 
> 'com.apple.main-thread', stop reason = signal SIGSTOP
>   thread #2: tid = 0x272b40f, 0x7fff93306de6 
> libsystem_kernel.dylib`__psynch_mutexwait + 10
>   thread #3: tid = 0x272b410, 0x7fff93306de6 
> libsystem_kernel.dylib`__psynch_mutexwait + 10
>   thread #4: tid = 0x272b411, 0x7fff9330707a 
> libsystem_kernel.dylib`__select + 10
> * thread #5: tid = 0x272b412, 0x7fff9330707a 
> libsystem_kernel.dylib`__select + 10
> (lldb)
> 
> 
> (lldb) thread select 1
> * thread #1: tid = 0x272b40e, 0x7fff93306de6 
> libsystem_kernel.dylib`__psynch_mutexwait + 10, queue = 
> 'com.apple.main-thread', stop reason = signal SIGSTOP
> frame #0: 0x7fff93306de6 libsystem_kernel.dylib`__psynch_mutexwait + 
> 10
> libsystem_kernel.dylib`__psynch_mutexwait:
> ->  0x7fff93306de6 <+10>: jae    0x7fff93306df0 ; <+20>
>     0x7fff93306de8 <+12>: movq   %rax, %rdi
>     0x7fff93306deb <+15>: jmp    0x7fff933017cd ; cerror_nocancel
>     0x7fff93306df0 <+20>: retq
> (lldb) bt
> * thread #1: tid = 0x272b40e, 0x7fff93306de6 
> libsystem_kernel.dylib`__psynch_mutexwait + 10, queue = 
> 'com.apple.main-thread', stop reason = signal SIGSTOP
>   * frame #0: 0x7fff93306de6 libsystem_kernel.dylib`__psynch_mutexwait + 
> 10
> frame #1: 0x7fff9a000e4a 
> libsystem_pthread.dylib`_pthread_mutex_lock_wait + 89
> frame #2: 0x7fff99ffe5f5 
> libsystem_pthread.dylib`_pthread_mutex_lock_slow + 300
> frame #3: 0x7fff8c2a00f8 libdyld.dylib`dyldGlobalLockAcquire() + 16
> frame #4: 0x7fff6ca8e177 
> dyld`ImageLoaderMachOCompressed::doBindFastLazySymbol(unsigned int, 
> ImageLoader::LinkContext const&, void (*)(), void (*)()) + 55
> frame #5: 0x7fff6ca78063 dyld`dyld::fastBindLazySymbol(ImageLoader**, 
> unsigned long) + 90
> frame #6: 0x7fff8c2a0262 libdyld.dylib`dyld_stub_binder + 282
> frame #7: 0x00010a5b29b0 libopen-pal.0.dylib`obj_order_type + 3776
> 
> 
> (lldb) thread select 2
> * thread #2: tid = 0x272b40f, 0x7fff93306de6 
> libsystem_kernel.dylib`__psynch_mutexwait + 10
> frame #0: 0x7fff93306de6 libsystem_kernel.dylib`__psynch_mutexwait + 
> 10
> libsystem_kernel.dylib`__psynch_mutexwait:
> ->  0x7fff93306de6 <+10>: jae    0x7fff93306df0 ; <+20>
>     0x7fff93306de8 <+12>: movq   %rax, %rdi
>     0x7fff93306deb <+15>: jmp    0x7fff933017cd ; cerror_nocancel
>     0x7fff93306df0 <+20>: retq
> (lldb) bt
> * thread #2: tid = 0x272b40f, 0x7fff93306de6 
> libsystem_kernel.dylib`__psynch_mutexwait + 10
>   * frame #0: 0x7fff93306de6 libsystem_kernel.dylib`__psynch_mutexwait + 
> 10
> frame #1: 0x7fff9a000e4a 
> libsystem_pthread.dylib`_pthread_mutex_lock_wait + 89
> frame #2: 0x7fff99ffe5f5 
> libsystem_pthread.dylib`_pthread_mutex_lock_slow + 300
> frame #3: 0x7fff8c2a00f8 libdyld.dylib`dyldGlobalLockAcquire() + 16
> frame #4: 0x7fff6ca8e177 
> dyld`ImageLoaderMachOCompressed::doBindFastLazySymbol(unsigned int, 
> ImageLoader::LinkContext const&, void (*)(), void (*)()) + 55
> frame #5: 0x7fff6ca78063 dyld`dyld::fastBindLazySymbol(ImageLoader**, 
> unsigned long) + 90
> frame #6: 0x7fff8c2a0262 libdyld.dylib`dyld_stub_binder + 282
> frame #7: 0x00010a5b29b0 libopen-pal.0.dylib`obj_order_type + 3776
> 
> 
> (lldb) thread select 3
> * thread #3: tid = 0x272b410, 0x7fff93306de6 
> libsystem_kernel.dylib`__psynch_mutexwait + 10
> frame #0: 0x7fff93306de6 libsystem_kernel.dylib`__psynch_mutexwait + 
> 10
> libsystem_kernel.dylib`__psynch_mutexwait:
> ->  0x7fff93306de6 <+10>: jae    0x7fff93306df0 ; <+20>
>     0x7fff93306de8 <+12>: movq   %rax, %rdi
>     0x7fff93306deb <+15>: jmp    0x7fff933017cd ; cerror_nocancel
>     0x7fff93306df0 <+20>: retq
> (lldb) bt
> * thread #3: tid = 0x272b410, 0x7fff93306de6 
> libsystem_kernel.dylib`__psynch_mutexwait + 10
>   * frame #0: 0x7fff93306de6 libsystem_kernel.dylib`__psynch_mutexwait + 
> 10
> frame #1: 0x7fff9a000e4a 
> libsystem_pthread.dylib`_pthread_mutex_lock_wait + 89
> frame #2: 0x7fff99ffe5f5 
> libsystem_pthread.dylib`_pthread_mutex_lock_slow + 300
> frame #3: 0x7fff8c2a00f8 libdyld.dylib`dyldGlobalLockAcquire() + 16
> frame #4: 0x7fff6ca8e177 
> dyld`ImageLoaderMachOCompressed::doBindFastLazySymbol(unsigned int, 
> ImageLoader::LinkContext const&, void (*)(), void (*)()) +

[OMPI devel] Fwd: Reboot deep-thought

2016-06-03 Thread Jeff Squyres (jsquyres)
FYI -- www.open-mpi.org and mtt.open-mpi.org will be down for a scheduled
maintenance window this coming Monday, June 6.  See below for details.


> Begin forwarded message:
> 
> From: "Kim, DongInn"
> Subject: Reboot deep-thought
> Date: June 3, 2016 at 11:19:02 AM PDT
> 
> Dear All,
> 
>> Hi DongInn,
>> 
>> Any chance we can arrange to reboot deep-thought? This is for three
>> important reasons: 1) for the RHEL 6.8 to kick in, 2) update all
>> BIOS/firmware to the latest, greatest version and 3) update the
>> OpenManage to 8.3.
>> 
>> Thanks,
>> Bruce
> 
> As Bruce mentioned above, we need to reboot deep-thought to apply all the
> necessary maintenance.
> Bruce and I have scheduled the restart of deep-thought for 7am (E.T.) on
> June 6th, 2016.
> It will be back up in an hour.
> 
> Date: June 6th, 2016
> Time:
> 04:00 am - 05:00 am Pacific Time
> 05:00 am - 06:00 am Mountain Time
> 06:00 am - 07:00 am Central Time
> 07:00 am - 08:00 am Eastern Time
> 11:00 am - 12:00 pm GMT
> 
> The following services will not be available during the above downtime:
> - Web services:
>  www.open-mpi.org
>  www.mpi-forum.org
>  mtt.open-mpi.org
> 
> - NFS
>  Shared directory
>  User’s $HOME
> 
> - Login to the CREST VM server
> 
> Please let me know if you have any questions or concerns about this reboot.


-- 
Jeff Squyres
jsquy...@cisco.com



[OMPI devel] 1.10.3rc4 ready for test

2016-06-03 Thread Ralph Castain
Hello folks

The release candidate is in the usual place:

https://www.open-mpi.org/software/ompi/v1.10/ 


Please note that the OMPI web site will be down for maintenance 7-8am US 
Eastern time on June 6th.

I would like to get a round of final checks on this RC, and hopefully release 
on June 10th.

Thanks
Ralph