Dear Elliot,

What happened was simply a segmentation fault due to libpthread, and thereafter the nodes were no longer accessible.

regards
Moritz

On Tuesday, 28 October 2014 23:53:03 UTC+2, Elliot Saba wrote:
>
> Moritz, I'm interested in what broke on the compute nodes. Do you have
> any example output from trying to run Julia on the compute nodes?
> -E
>
> On Tue, Oct 28, 2014 at 1:57 PM, Tony Kelman <to...@kelman.net> wrote:
>
>> Elliot and I had some discussions recently where we were thinking it
>> might be a good idea to combine some of these settings under one easy group
>> flag like JULIA_PORTABLE=1 or something, that would then set
>> OPENBLAS_DYNAMIC_ARCH, along with the flags needed for the system image
>> that I can never remember. If we get that working and documented, then we
>> could consider disabling OPENBLAS_DYNAMIC_ARCH by default so we can have
>> faster from-scratch source builds.
>>
>>
>> On Tuesday, October 28, 2014 10:45:16 AM UTC-7, Isaiah wrote:
>>>
>>> The headnode/childnode issue is usually an architecture mismatch. You
>>> can target a more generic architecture to get around this; see the
>>> discussion in this thread:
>>> https://groups.google.com/d/msg/julia-dev/Eqp0GhZWxME/3mGKX1l_L9gJ
>>>
>>> ps: this should go in the FAQ... if someone new on here wants to make a
>>> first Julia pull request: click the "Edit on GitHub" button at the
>>> top-right while viewing the documentation. Add an entry for this, and click
>>> "Submit".
>>>
>>> On Tue, Oct 28, 2014 at 1:12 PM, moritz braun <moritz...@gmail.com>
>>> wrote:
>>>
>>>> Dear All
>>>>
>>>> Due to our provider not being able / willing to provide us with
>>>> updates for the Lustre drivers, we are currently stuck with
>>>> a 2.6.32 kernel from 2011 on our 128-node cluster.
>>>> Unfortunately, our current setup will not change for the next 18 months
>>>> or so, until the upgrade has gone out to tender.
>>>>
>>>> I tried the following:
>>>> 1. Compilation with the gcc toolchain while disabling AVX with
>>>> OPENBLAS_NO_AVX = 1
>>>> This had worked on a single SMP 32-processor server running REL 6.5.
>>>> On REL 6.2 it only worked for the headnode. On the other nodes the
>>>> executable resulted in a binary format error.
>>>> 2. Using one of the generic 64-bit builds.
>>>> Worked on the headnode, but broke the compute nodes.
>>>> 3. Compilation using icc, icpc and ifc as described in
>>>> http://goparallel.sourceforge.net/wp-content/uploads/2014/03/TheParallelUniverse_Issue_17.pdf
>>>> This failed with hidden, difficult-to-understand errors.
>>>> (I will try again soon and post the output of it!)
>>>>
>>>> I am a bit at the end of my knowledge!
>>>>
>>>> Any hints would be appreciated.
>>>>
>>>>
>>>> regards
>>>>
>>>> Moritz Braun
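
For anyone wanting to check the architecture-mismatch explanation given in the thread, here is a minimal sketch (not something posted by anyone above) that compares the CPU feature flags of two nodes. It assumes you have saved the contents of /proc/cpuinfo from the headnode and from one compute node; the function names cpu_flags and missing_on_node are made up for illustration.

```python
def cpu_flags(cpuinfo_text):
    """Return the CPU feature flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    # No 'flags' line found (e.g. non-x86 or truncated input).
    return set()


def missing_on_node(head_text, node_text):
    """Sorted list of flags present on the headnode but absent on the compute node."""
    return sorted(cpu_flags(head_text) - cpu_flags(node_text))
```

If the result contains entries such as avx or sse4_2, a binary built for the headnode's CPU may use instructions that the compute node lacks, which would be consistent with the suggestion above to target a more generic architecture. An empty list would point away from an instruction-set mismatch, though it does not rule out other differences between the nodes.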