Sébastien Boisvert wrote:
 > First base should be recorded in structures/Read.
 > Also, symbols (from {0,1,2,3} or from {A,C,G,T}) are encoded in 2
 > bits in Ray so I think your changes will likely break support for
 > color-space (because of the N).

I've had a go at storing the first base plus the colour-space sequence
in the Read structure. In short, you can now read in one type and pull
out the other (with or without double-encoding). For now, the first
base is trimmed off when extracting colour-space sequence, because I
didn't want to disrupt the rest of the code too much [yet].
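
As a rough illustration of "read in one type, pull out the other", here is a minimal sketch of decoding colour-space into base-space from a stored first base, plus double-encoding. It is not the code from the commits below: the helper names (baseCode, decodeColours, doubleEncode) are hypothetical, the A=0, C=1, G=2, T=3 ordering and the 0->A, 1->C, 2->G, 3->T double-encoding are the usual conventions and are assumed here, and misreads (such as '.') are not handled.

```cpp
#include <cassert>
#include <string>

// Assumed 2-bit code for a base: A=0, C=1, G=2, T=3.
static int baseCode(char b) {
    switch (b) {
        case 'A': return 0;
        case 'C': return 1;
        case 'G': return 2;
        case 'T': return 3;
    }
    assert(false && "expected A, C, G or T");
    return 0;
}

// Hypothetical helper: decode colours ('0'..'3') into bases.
// Each colour is the XOR of the 2-bit codes of two adjacent bases,
// so the next base is (previous base XOR colour).
// The first base itself is not included in the output.
std::string decodeColours(char firstBase, const std::string& colours) {
    static const char bases[] = "ACGT";
    std::string out;
    int prev = baseCode(firstBase);
    for (char c : colours) {
        assert(c >= '0' && c <= '3');
        prev ^= (c - '0');
        out += bases[prev];
    }
    return out;
}

// Hypothetical helper: double-encoding writes colours with base
// letters (0->A, 1->C, 2->G, 3->T) so base-space tools accept them.
std::string doubleEncode(const std::string& colours) {
    static const char bases[] = "ACGT";
    std::string out;
    for (char c : colours) {
        assert(c >= '0' && c <= '3');
        out += bases[c - '0'];
    }
    return out;
}
```

With these definitions, decodeColours('T', "0123") gives "TGAT" and doubleEncode("0123") gives "ACGT".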

I need to go a bit deeper and replicate this kind of format for Kmers 
for it to be really useful, but I was able to fit things in this far 
without changing too much other code.

My most interesting commit is probably this one:

https://github.com/gringer/ray/commit/95efe30674b6bea14bc68c90d8e65c261ecbe3ed

although this one is what makes it actually usable (well, almost):

https://github.com/gringer/ray/commit/b41673e9f44b59354ce5352d3e718ef28e9dafa2

 > Can you test on http://solidsoftwaretools.com/gf/project/ecoli50x50/
 > to see if it works or fails ?

I'll do that in the next day or so, but my most recent work has been 
trying to set up unit tests to make sure things are doing what I expect:

$ CODE=../code; g++ $CODE/format/ColorSpaceCodec.cpp \
  $CODE/structures/Read.cpp \
  $CODE/core/common_functions.cpp \
  $CODE/memory/malloc_types.cpp \
  $CODE/memory/allocator.cpp \
  $CODE/memory/MyAllocator.cpp \
  $CODE/memory/ReusableMemoryStore.cpp \
  $CODE/structures/Kmer.cpp \
  $CODE/cryptography/crypto.cpp \
  unit_tests.cpp -I$CODE -I..

$ ./a.out
Checking ColorSpaceCodec:
1: checking colour-space decode (junk characters)... success!
2: checking colour-space decode (fully informative sequence)... success!
3: checking colour-space decode (inverse function actions)... success!
4: checking colour-space decode (reverse decode)... success!
Checking Read:
1: checking colour-space encoding converted to double-encoded base-space... warning: useless double-encoding requested for base-space output... success!
2: checking colour-space encoding converted to colour-space... success!
3: checking colour-space encoding converted to double-encoded colour-space... success!
4: checking colour-space encoding with misreads converted to base-space... success!
5: checking colour-space encoding with misreads converted to colour-space... success!
6: checking base-space encoding converted to colour-space... success!
7: checking base-space encoding with misreads converted to colour-space... success!
7: checking base-space encoding with misreads converted to base-space... success!


I think the quirkiest transformation is from a junk-filled base-space
sequence to colour-space:

   ZZAC6ACGCXAAATAT55ACTCCAGCTCC..RCA..
-> NNACNACGCNAAATATNNACTCCAGCTCCNNNCANN
-> ACNACGCNAAATATNNACTCCAGCTCCNNNCA
-> 1101331000333300122012322010011
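
The three steps above can be reproduced with a short sketch. This is a reconstruction from the example only, not the actual ColorSpaceCodec or Read code: the function names are hypothetical, and the rule that an interior N takes the same 2-bit code as A is an assumption inferred from the output shown (it is consistent with symbols being stored in 2 bits). Only upper-case A, C, G and T are treated as valid here.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Step 1: replace anything that is not A, C, G or T with N.
std::string sanitise(const std::string& raw) {
    std::string out = raw;
    for (char& c : out) {
        if (c != 'A' && c != 'C' && c != 'G' && c != 'T') c = 'N';
    }
    return out;
}

// Step 2: strip leading and trailing N.
std::string trimN(const std::string& s) {
    const std::size_t first = s.find_first_not_of('N');
    if (first == std::string::npos) return "";
    const std::size_t last = s.find_last_not_of('N');
    return s.substr(first, last - first + 1);
}

// Assumed 2-bit code: A=0, C=1, G=2, T=3, with N collapsing to 0.
static int twoBit(char b) {
    switch (b) {
        case 'A': case 'N': return 0;
        case 'C': return 1;
        case 'G': return 2;
        case 'T': return 3;
    }
    assert(false && "expected A, C, G, T or N");
    return 0;
}

// Step 3: each colour is the XOR of two adjacent 2-bit base codes,
// so n bases give n-1 colours.
std::string toColourSpace(const std::string& bases) {
    std::string out;
    for (std::size_t i = 1; i < bases.size(); ++i) {
        out += static_cast<char>('0' + (twoBit(bases[i - 1]) ^ twoBit(bases[i])));
    }
    return out;
}

// All three steps chained together.
std::string junkToColour(const std::string& raw) {
    return toColourSpace(trimN(sanitise(raw)));
}
```

Under those assumptions, junkToColour("ZZAC6ACGCXAAATAT55ACTCCAGCTCC..RCA..") gives the same 31 colours as the last line of the example.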

Will keep you posted about my attempts at getting this working.

Cheers,
David Eccles

_______________________________________________
Denovoassembler-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/denovoassembler-users
