Thanks for the help Sebastien,
So I think the problem is that I recently refreshed the ray repository, but
not the RayPlatform. When I pulled from both, my test assembly now goes to
completion without errors.
To answer your other questions, yes the checkpoints wrote out fine, and I
didn't run with any profiling so I don't know how the IO behaved.
So I have another question on how to set the routing parameters to properly
scale this up for our 1.2 Terabase dataset. I didn't see in the
documentation any insight on which routing shape and degree to use for a
given architecture. I'm going to run on Hopper
http://www.nersc.gov/users/computational-systems/hopper/configuration/compute-nodes/,
which has 24 cores per physical node, arranged as 4 NUMA nodes of 6
cores each. The nodes themselves are connected in a 3D torus. So, given that
shape of the hardware, how would you recommend that I set the routing and
degrees for a large job of 1000 - 8000 cores?
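To make the sizing question concrete, here is a minimal Python sketch of the arithmetic involved. It is an illustration only and is not taken from the Ray documentation. It assumes one MPI rank per core and 24 cores per node (as described above), and it assumes a de Bruijn-style routing graph, in which the number of ranks equals degree ** diameter. Whether Ray's routing imposes that constraint is an assumption to verify. The function names are hypothetical, and nothing here models the 3D torus itself.

```python
import math

CORES_PER_NODE = 24  # Hopper compute node, as described in this message


def nodes_needed(ranks):
    """Whole nodes needed to host `ranks` MPI ranks at one rank per core."""
    return (ranks + CORES_PER_NODE - 1) // CORES_PER_NODE


def power_layouts(low, high):
    """Yield (ranks, degree, diameter) with ranks == degree ** diameter
    and low <= ranks <= high, for diameters of at least 2.

    Assumption: a de Bruijn-style graph, whose vertex count is an exact
    power of its degree. This is not confirmed for Ray's routing.
    """
    for degree in range(2, math.isqrt(high) + 1):
        ranks = degree * degree
        diameter = 2
        while ranks <= high:
            if ranks >= low:
                yield ranks, degree, diameter
            ranks *= degree
            diameter += 1


if __name__ == "__main__":
    # List candidate job sizes between 1000 and 8000 ranks.
    for ranks, degree, diameter in sorted(power_layouts(1000, 8000)):
        fills_nodes = ranks % CORES_PER_NODE == 0
        print(ranks, degree, diameter, nodes_needed(ranks), fills_nodes)
```

Each printed row gives a candidate rank count, the degree and diameter that produce it, the number of nodes it would occupy, and whether it fills those nodes exactly. A lower degree means fewer connections per rank but more hops per message, which is the trade-off the question is about.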
Thanks again!
Rob
On Tue, Oct 9, 2012 at 7:08 AM, Sébastien Boisvert <
[email protected]> wrote:
> Hello,
>
> [CC'ed the mailing list.]
>
> On 08/10/12 05:22 PM, Rob Egan wrote:
> > Hello Sebastien,
> >
> > If you remember, we were both at the ICIS workshop in Utah this year.
>
> Yes I remember.
>
> >
> > I've been trying to use Ray on our 1.2 Terabase cow rumen metagenomic
> > dataset but am running into trouble getting it to complete successfully,
> > and I was wondering if you could give me some advice on how to get it
> > working.
>
> Of course.
>
> > The version I used was from git:
> > Ray: d8a8d737481d657ad8979e025eecacb9f400f3b1
>
> This is from Thu Sep 27 17:33:06 2012 -0400
>
> > RayPlatform: 9a197dc4dd750d0807543dfc2cc394e235df97c6
>
> This is from Wed May 2 16:43:10 2012 -0400, before the release of
> Ray v2.0.0.
>
> Are you sure of the git hashes? Ray d8a8d737481d657 should not compile
> properly with RayPlatform 9a197dc4dd750d080.
>
> Also, if you are using gcc 4.7.x, you need to pull as a bug was fixed.
>
> See my other answers below.
>
> >
> > I'm running at NERSC's hopper system and even at just a small fraction
> > of the total data available (70GBytes), Ray is segfaulting shortly after
> > it completes the OptimalMarkers checkpoint (after 3 hours on 480 cores).
> > Attached is the last 10000 lines of the log file.
> >
>
> Are the checkpoints working well for you ?
>
> Are you seeing I/O peaks on your file system ?
>
> > This is the not-too-useful message I get from the logfiles:
> >
> > _pmiu_daemon(SIGCHLD): [NID 00211] [c0-3c1s6n1] [Sat Sep 29 20:25:29 2012] PE RANK 308 exit signal Segmentation fault
> > [NID 00211] 2012-09-29 20:25:53 Apid 11826156: initiated application termination
> > _pmiu_daemon(SIGCHLD): [NID 00270] [c0-3c1s7n2] [Sat Sep 29 20:25:28 2012] PE RANK 180 exit signal Segmentation fault
> > _pmiu_daemon(SIGCHLD): [NID 06352] [c1-3c1s7n0] [Sat Sep 29 20:25:28 2012] PE RANK 462 exit signal Segmentation fault
> > _pmiu_daemon(SIGCHLD): [NID 06355] [c1-3c1s6n1] [Sat Sep 29 20:25:28 2012] PE RANK 337 exit signal Segmentation fault
> > _pmiu_daemon(SIGCHLD): [NID 00207] [c0-3c0s7n1] [Sat Sep 29 20:25:28 2012] PE RANK 100 exit signal Segmentation fault
> >
> >
>
> At least we know that some segments were protected by the memory system !
>
> > Our data set is a mixture of fragment pairs (2x100 350 insert), jumping
> > pairs (2x76 3k & 5k insert) and overlapping/merged long reads (200-250bp).
> > For the 1/20th view I only included highly abundant (>256 depth by kmer=31)
> > pre-screened reads.
>
> Nice libraries ! It is nice to see people using long distances for de novo
> assemblies.
>
> > These are the command line options:
> >
> > "-o ../Ray-151.d2-Ray-2267125.sdb.d -disable-network-test -bloom-filter-bits 0 -write-contig-paths \
> > -read-write-checkpoints . -k 151 -s ../kmernator-FR-1521692.sdb-FR-MinDepth2-PartitionDepth \
> > 256-I200.fastq -i ../kmernator-FR-1521692.sdb-FR-MinDepth2-PartitionDepth256-I300.fastq -i \
> > ../kmernator-FR-1521692.sdb-FR-MinDepth2-PartitionDepth256-I350.fastq -i
> > ../kmernator-FR-1521692.sdb-FR-MinDepth2 -PartitionDepth256-I3k.fastq -i \
> > ../kmernator-FR-1521692.sdb-FR-MinDepth2-PartitionDepth256-I5k.fastq \
> > -s ../med-PD256-I250.fastq"
> >
>
> It's good to see that the interleaved code path is still popular.
>
> I see that you are using long k-mers.
>
> We fixed a segmentation fault at this very place recently:
>
> commit 1e6719eba3a919f32ab7132090e23e67b084df85
> Author: Sébastien Boisvert <[email protected]>
> Date: Thu Aug 16 13:20:23 2012 -0400
>
> A path with 0 k-mers has 0 nucleotides, not 0-k+1.
>
> This fixes segmentation faults when using large k-mers for
> assemblies.
>
> Reported-by: Pier-Luc Plante <[email protected]>
> Signed-off-by: Sébastien Boisvert <[email protected]>
>
> code/plugin_KmerAcademyBuilder/Kmer.h | 4 ++++
> code/plugin_Scaffolder/Scaffolder.cpp | 2 +-
> code/plugin_SeedExtender/SeedExtender.cpp | 4 +++-
> code/plugin_SeedingData/SeedingData.cpp | 13 +++++++++++--
>
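The edge case named in the commit message above can be illustrated with a minimal Python sketch. This is not Ray's actual code (Ray is written in C++), and the function name is hypothetical. It relies only on the general fact that a path of n consecutive k-mers, each overlapping the next by k-1 bases, spans n + k - 1 nucleotides, and that an empty path must be special-cased to 0.

```python
def path_length_in_nucleotides(kmer_count, k):
    """Length in nucleotides of a path of `kmer_count` consecutive k-mers.

    Consecutive k-mers overlap by k - 1 bases, so a non-empty path spans
    kmer_count + k - 1 nucleotides. An empty path has no nucleotides at
    all, so the general formula must not be applied to it.
    """
    if kmer_count < 0 or k < 1:
        raise ValueError("kmer_count must be >= 0 and k must be >= 1")
    if kmer_count == 0:
        return 0  # the special case: an empty path is 0 nucleotides long
    return kmer_count + k - 1
```

With k = 151, as in the command line quoted above, a single k-mer gives 151 nucleotides and an empty path gives 0. In code that uses unsigned integers, mishandling the empty case can produce a huge or otherwise invalid length, which is one plausible way such a bug leads to a crash.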
>
> But the git hash you provided indicates that you already have that fix.
>
> Can you confirm ?
>
> > Thanks in advance!
> > Rob Egan
> >
>
>
>
>
>
_______________________________________________
Denovoassembler-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/denovoassembler-users