On 13/07/11 17:01, David Eccles (gringer) wrote:
> Um... what? So... the first few workers that get added don't actually do
> any work, and the activeWorkers array isn't populated until the first 5
> vertices are loaded. This agrees with what's on lines 138-144:
>
>     int coverage=node->getCoverage(&vertexKey);
>     int minimum=5;
>     if(coverage<minimum){
>         m_completedJobs++;
>     }else{
>         m_aliveWorkers[m_SEEDING_i].constructor(&vertexKey,m_parameters,
>             m_outboxAllocator,m_virtualCommunicator,m_SEEDING_i);
>         m_activeWorkers.insert(m_SEEDING_i);
>     }
If I reduce that minimum from 5 to 2 (i.e. change
that-which-should-not-be-changed) and switch one of the output strings
to base-space (it was previously displaying the root k-mer in
colour-space), I get a successful assembly with both sebhtml/ray
and gringer/ray:
$ wc -l test_phiX.Contigs.fasta testSeb_phiX_5k.Contigs.fasta
90 test_phiX.Contigs.fasta
90 testSeb_phiX_5k.Contigs.fasta
180 total
$ diff test_phiX.Contigs.fasta testSeb_phiX_5k.Contigs.fasta
[no output]
$ fasta_formatter -i ../tests/phix/phix.fasta | fastx_reverse_complement |
    grep $(fasta_formatter -i test_phiX.Scaffolds.fasta | grep -v '^>') \
    > /dev/null && echo "success (match in reverse direction)"
success (match in reverse direction)
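For anyone who would rather run the same orientation check from code than from the shell, here is a minimal C++ sketch of what the pipeline above does: look for the scaffold in the reference, then in the reverse complement of the reference. The function names (complementBase, reverseComplement, matchDirection) are made up for illustration and are not part of Ray or the FASTX toolkit. It also assumes both sequences are plain upper-case strings with no line breaks.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Hypothetical helper: complement of a single upper-case base.
inline char complementBase(char base) {
    switch (base) {
        case 'A': return 'T';
        case 'C': return 'G';
        case 'G': return 'C';
        case 'T': return 'A';
        default:  return 'N';  // anything unexpected becomes N in this sketch
    }
}

// Reverse the sequence, then complement every base.
inline std::string reverseComplement(const std::string& seq) {
    std::string rc(seq.rbegin(), seq.rend());
    std::transform(rc.begin(), rc.end(), rc.begin(), complementBase);
    return rc;
}

// Returns "forward", "reverse" or "none", depending on where the scaffold
// is found as an exact substring of the reference.
inline std::string matchDirection(const std::string& reference,
                                  const std::string& scaffold) {
    if (reference.find(scaffold) != std::string::npos) {
        return "forward";
    }
    if (reverseComplement(reference).find(scaffold) != std::string::npos) {
        return "reverse";
    }
    return "none";
}
```

Like the grep version, this is an exact substring match on a linear reference, so a scaffold that spans the join of a circular genome would be reported as "none".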
My code now produces a correct assembly on both simulated and real
(circularised) phiX data:
$ ../code/Ray -s \
    ~/illuminadata/110517_H134_0062_A816HKABXX/Data/Intensities/BaseCalls/fastq/ignore/head100k_interleaved_noN_phiX_region1101.fasta \
    -o ray_output/test2_phiX_circular
...
Number of contigs: 1
Total length of contigs: 5385
Number of contigs >= 500 nt: 1
Total length of contigs >= 500 nt: 5385
Number of scaffolds: 1
Total length of scaffolds: 5385
Number of scaffolds >= 500 nt: 1
Total length of scaffolds >= 500: 5385
-- BLAST results (for 600bp sequence overlapping joiner) --
>gb|M14428.1|S13CG Bacteriophage S13 circular DNA, complete genome
Length=5386
Score = 1068 bits (1184), Expect = 0.0
Identities = 597/600 (99%), Gaps = 0/600 (0%)
Strand=Plus/Plus
$ ./test_phiX.sh
Checking full Ray run with phiX genome... 5000 Reads simulated...
Running Ray... success (match in forward direction)!
This only does things that Ray can already do (i.e. read base-space
reads, assemble, and output in base-space), but I need to make sure it
can do at least that before trying other things.
It's also slower than sebhtml/ray (total: 25 seconds vs 14 seconds; 57
seconds if I include asserts and debug symbols). I presume this is
because of the extra work in getOutgoingEdges / getLastCode: the last
base has to be re-calculated each time by iterating through the k-mer.
Perhaps the k-mer should store the last base as well as the first.
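To make the trade-off concrete, here is a rough C++ sketch of a colour-space k-mer that caches its last base. This is not the Kmer class in either branch. The struct, its fields and the helper names are all hypothetical, and it assumes the usual 2-bit coding (A=0, C=1, G=2, T=3) in which a colour is the XOR of two adjacent base codes. recomputeLastBase is the O(k) walk, and extendRight shows that the cached value can be kept up to date in constant time when stepping along an outgoing edge.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <string>

// Hypothetical colour-space k-mer: first base, k-1 colours, cached last base.
struct ColourKmer {
    std::uint8_t firstBase = 0;
    std::uint8_t lastBase = 0;         // cached so it need not be re-derived
    std::deque<std::uint8_t> colours;  // colour i = base i XOR base i+1
};

// Assumed coding: A=0, C=1, G=2, T=3 (anything else is treated as T here).
inline std::uint8_t encodeBase(char base) {
    switch (base) {
        case 'A': return 0;
        case 'C': return 1;
        case 'G': return 2;
        default:  return 3;
    }
}

// Build a k-mer from a non-empty base-space string.
inline ColourKmer fromBases(const std::string& bases) {
    ColourKmer kmer;
    kmer.firstBase = encodeBase(bases.front());
    kmer.lastBase = encodeBase(bases.back());
    for (std::size_t i = 1; i < bases.size(); ++i) {
        kmer.colours.push_back(static_cast<std::uint8_t>(
            encodeBase(bases[i - 1]) ^ encodeBase(bases[i])));
    }
    return kmer;
}

// The slow path: walk every colour from the first base, O(k) per call.
inline std::uint8_t recomputeLastBase(const ColourKmer& kmer) {
    std::uint8_t base = kmer.firstBase;
    for (std::uint8_t colour : kmer.colours) {
        base = static_cast<std::uint8_t>(base ^ colour);
    }
    return base;
}

// Follow an outgoing edge (assumes k >= 2): drop the first base, append
// nextBase, and update the cached last base without walking the k-mer.
inline ColourKmer extendRight(ColourKmer kmer, std::uint8_t nextBase) {
    kmer.firstBase = static_cast<std::uint8_t>(kmer.firstBase ^ kmer.colours.front());
    kmer.colours.pop_front();
    kmer.colours.push_back(static_cast<std::uint8_t>(kmer.lastBase ^ nextBase));
    kmer.lastBase = nextBase;
    return kmer;
}
```

The cost is one extra stored base per k-mer (2 bits in a packed representation), and whether that is worth it depends on how often getLastCode is actually called, which I haven't profiled.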
-- David