Sébastien Boisvert wrote:
> First base should be recorded in structures/Read.
> Also, symbols (from {0,1,2,3} or from {A,C,G,T}) are encoded in 2
> bits in Ray so I think your changes will likely break support for
> color-space (because of the N).
I've had a go at storing the first base plus the colour-space sequence
in the Read structure. In short, you can now read in one type and pull
out the other (with or without double-encoding). For now, the first base
is trimmed off when extracting a colour-space sequence, because I didn't
want to disrupt code elsewhere too much [yet].
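To illustrate the two representations, here is a small Python sketch of the usual colour-space convention (A=0, C=1, G=2, T=3, each colour being the XOR of two adjacent base codes). It is not the code in Ray, and the function names are made up for this example.

```python
# Sketch only: general colour-space convention, not Ray's implementation.
# All function names here are hypothetical.
BASES = "ACGT"  # 2-bit codes: A=0, C=1, G=2, T=3

def decode_colours(first_base, colours):
    """Turn a first base plus a colour string (digits 0-3) into base-space."""
    code = BASES.index(first_base)
    out = [first_base]
    for colour in colours:
        code ^= int(colour)  # each colour is the XOR of two adjacent base codes
        out.append(BASES[code])
    return "".join(out)

def encode_colours(bases):
    """Turn a base-space string into (first base, colour string)."""
    codes = [BASES.index(b) for b in bases]
    colours = "".join(str(a ^ b) for a, b in zip(codes, codes[1:]))
    return bases[0], colours

def double_encode(colours):
    """Write colours 0-3 as the letters A, C, G, T (double-encoding)."""
    return "".join(BASES[int(c)] for c in colours)
```

Keeping the first base alongside the colours is what makes the conversion reversible: without it, a colour string matches four different base-space sequences.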
I need to go a bit deeper and replicate this kind of format for Kmers
for it to be really useful, but I was able to fit things in this far
without changing too much other code.
My most interesting commit is probably this one:
https://github.com/gringer/ray/commit/95efe30674b6bea14bc68c90d8e65c261ecbe3ed
Although this one is what makes it actually usable (well, almost):
https://github.com/gringer/ray/commit/b41673e9f44b59354ce5352d3e718ef28e9dafa2
> Can you test on http://solidsoftwaretools.com/gf/project/ecoli50x50/
> to see if it works or fails ?
I'll do that in the next day or so, but my most recent work has been
trying to set up unit tests to make sure things are doing what I expect:
$ CODE=../code; g++ $CODE/format/ColorSpaceCodec.cpp \
    $CODE/structures/Read.cpp \
    $CODE/core/common_functions.cpp \
    $CODE/memory/malloc_types.cpp \
    $CODE/memory/allocator.cpp \
    $CODE/memory/MyAllocator.cpp \
    $CODE/memory/ReusableMemoryStore.cpp \
    $CODE/structures/Kmer.cpp \
    $CODE/cryptography/crypto.cpp \
    unit_tests.cpp -I$CODE -I..
$ ./a.out
Checking ColorSpaceCodec:
1: checking colour-space decode (junk characters)... success!
2: checking colour-space decode (fully informative sequence)... success!
3: checking colour-space decode (inverse function actions)... success!
4: checking colour-space decode (reverse decode)... success!
Checking Read:
1: checking colour-space encoding converted to double-encoded
base-space... warning: useless double-encoding requested for base-space
output... success!
2: checking colour-space encoding converted to colour-space... success!
3: checking colour-space encoding converted to double-encoded
colour-space... success!
4: checking colour-space encoding with misreads converted to
base-space... success!
5: checking colour-space encoding with misreads converted to
colour-space... success!
6: checking base-space encoding converted to colour-space... success!
7: checking base-space encoding with misreads converted to
colour-space... success!
7: checking base-space encoding with misreads converted to base-space...
success!
I think the quirkiest transformation is from a junk-filled base-space
sequence to colour-space:
ZZAC6ACGCXAAATAT55ACTCCAGCTCC..RCA..
-> NNACNACGCNAAATATNNACTCCAGCTCCNNNCANN
-> ACNACGCNAAATATNNACTCCAGCTCCNNNCA
-> 1101331000333300122012322010011
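The three steps above can be sketched in Python as follows. This is an illustration under rules inferred from the example, not Ray's code: junk characters become N, leading and trailing Ns are stripped, and an internal N is given the same 2-bit code as A (0). That last rule is an assumption that happens to reproduce the colours shown.

```python
# Sketch only: reproduces the example under inferred rules, which may not
# match what Ray's Read class does internally. Names are hypothetical.
CODES = {"A": 0, "C": 1, "G": 2, "T": 3}

def clean(seq):
    """Replace every character that is not A, C, G or T with N."""
    return "".join(ch if ch in CODES else "N" for ch in seq)

def to_colours(bases):
    """XOR adjacent 2-bit codes; N falls back to code 0 (assumption)."""
    codes = [CODES.get(ch, 0) for ch in bases]
    return "".join(str(a ^ b) for a, b in zip(codes, codes[1:]))

raw = "ZZAC6ACGCXAAATAT55ACTCCAGCTCC..RCA.."
cleaned = clean(raw)            # junk characters -> N
trimmed = cleaned.strip("N")    # drop leading and trailing N
colours = to_colours(trimmed)   # colours only, no leading base

print(cleaned)
print(trimmed)
print(colours)
```

Because only 2 bits are available per symbol, an N inside the read cannot be told apart from an A once encoded, which is the quirk in this transformation.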
Will keep you posted about my attempts at getting this working.
Cheers,
David Eccles
_______________________________________________
Denovoassembler-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/denovoassembler-users