Re: [Denovoassembler-users] 454 GS FLX Titanium reads with Ray

Sébastien Boisvert Fri, 01 Jun 2012 07:55:02 -0700

Hello!

This is a good question, and I have the answer for you.

Today is the collaboration day in my schedule, so I can provide
a detailed answer.



For Illumina paired-end data, you have 2 files (*R1* and *R2*, or *_1* and *_2*).
Both files have the same number of reads. The first file contains the left reads
and the second file contains the right reads.

For Illumina paired-end sequencing, all reads are paired. This is simpler
and less confusing, and it is possible because
the Illumina SBS sequencing technology uses reversible terminators.
The technology is described in reference 1.
You can also have a single file when single-end sequencing was performed.

For 454 data, you will have 3 files for paired-end sequencing (or whatever it is called).
Mate pairs, or paired reads, with the 454 technology imply generating libraries using
circularization and the biotin/streptavidin strategy. This is described in reference 2.

As a consequence, a number of reads won't be paired;
these will just be plain and boring single-end reads.

The proportion of single-end reads in a paired-end 454 run will vary. The attributes
of the operator (mostly dexterity and intelligence, as well as the level)
will make this proportion vary.

You will also obtain paired reads, obviously.

To explain this to you and other readers, I downloaded the data you are using from EBI.

I downloaded three 454 files.

seb@fault:~/2012/454-$ ls -lh
total 29M
-rw-rw-r-- 1 seb seb  11M jun  1 10:24 SRR454996_1.fastq.gz
-rw-rw-r-- 1 seb seb  15M jun  1 10:24 SRR454996_2.fastq.gz
-rw-rw-r-- 1 seb seb 3,4M jun  1 10:24 SRR454996.fastq.gz

The EBI page erroneously reports that each of the three files has 92,871 sequences.

http://www.ebi.ac.uk/ena/data/view/SRP012081

The correct counts are:

seb@fault:~/2012/454-$ zcat SRR454996_1.fastq.gz|grep ^@SRR|wc -l
84448

seb@fault:~/2012/454-$ zcat SRR454996_2.fastq.gz|grep ^@SRR|wc -l
84448

seb@fault:~/2012/454-$ zcat SRR454996.fastq.gz|grep ^@SRR|wc -l
8423


There are 84448 * 2 + 8423 = 177319 sequences in total.
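
If you want to double-check such counts without relying on the read name prefix, here is a minimal sketch. It assumes that each FASTQ record is exactly 4 lines (no wrapped sequence or quality lines), which I have not verified for every ENA file. The function names count_fastq_records and check_pair_counts are only illustrative. The reason for counting lines is that '@' is also a valid quality character, so a grep on '^@' alone can overcount.

```shell
# Count the records in a gzip-compressed FASTQ file.
# Assumption: strict 4-line records (header, sequence, '+', qualities).
count_fastq_records() {
    gzip -dc "$1" | awk 'END { print NR / 4 }'
}

# Succeed only when the left and right files hold the same number of records.
check_pair_counts() {
    left_count=$(count_fastq_records "$1")
    right_count=$(count_fastq_records "$2")
    [ "$left_count" = "$right_count" ]
}

# Example usage:
#   count_fastq_records SRR454996.fastq.gz
#   check_pair_counts SRR454996_1.fastq.gz SRR454996_2.fastq.gz && echo "pairs match"
```

A count that is not a whole number means the file does not follow the 4-line layout, and the sketch does not apply to it.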


So you should run Ray like this, giving the two paired files to -p and the single-end file to -s:


mpiexec -n 64 \
Ray -k 25 -o SRP012081-Ray-Odin9 \
-p SRP012081/SRR454996_1.fastq.gz SRP012081/SRR454996_2.fastq.gz \
-s SRP012081/SRR454996.fastq.gz
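
If you have several runs to feed to Ray, a small helper can build the -p and -s arguments from the file naming seen above (_1, _2, and no suffix). This is only a sketch: ray_input_args is a made-up name, the naming pattern is assumed from this one run, and paths must not contain spaces. As far as I know, -p and -s can each be given more than once, but check that with your Ray version.

```shell
# Print the Ray input arguments for one run accession in a directory.
# Assumption: paired files are <acc>_1.fastq.gz and <acc>_2.fastq.gz,
# and the single-end leftovers are in <acc>.fastq.gz.
ray_input_args() {
    dir=$1
    acc=$2
    left="$dir/${acc}_1.fastq.gz"
    right="$dir/${acc}_2.fastq.gz"
    single="$dir/${acc}.fastq.gz"
    args=""
    # Add the pair only when both mates are present.
    if [ -f "$left" ] && [ -f "$right" ]; then
        args="-p $left $right"
    fi
    # Add the single-end file when it exists.
    if [ -f "$single" ]; then
        args="${args:+$args }-s $single"
    fi
    printf '%s\n' "$args"
}

# Example usage (the command substitution is left unquoted on purpose,
# so that the shell splits it into separate arguments):
#   mpiexec -n 64 Ray -k 25 -o SRP012081-Ray-Odin9 \
#     $(ray_input_args SRP012081 SRR454996)
```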



> Also, how can I tell whether a library is short-jumping or long-jumping?

On the EBI site, you have these columns too:

Library Name, Library Layout, Library Strategy, Library Source, Library Selection



But they are not very informative regarding that.


This information may be available on EBI in the XML files, sometimes.

Your best chance for this information is usually the associated paper.

Ray will detect these for you anyway, or at least Ray will try.
For Neisseria meningitidis, everything should work quite well.

I think there are 3 standard sizes for 454 paired end sequencing:

* 3 kb
* 8 kb
* 20 kb




References

1. The paper describing the Illumina sequencing technology.
Nature. 2008 November 6; 456(7218): 53-59.
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2581791/?tool=pubmed

2. An application note describing 454 paired-end sequencing.
Nature Methods 5 (2008). doi:10.1038/nmeth.f.212
http://www.nature.com/nmeth/journal/v5/n5/full/nmeth.f.212.html



Happy assembly.



nikos ioannidis wrote:

Hello,

I see that many studies at ENA have 454 GS FLX Titanium paired reads,
but most of the time the files are fastq1, fastq2 and fastq3.
How should I combine these files as input for Ray?

Like: -p fastq1 fastq2
      -p fastq2 fastq3

Could something like that work?

For example at this study : http://www.ebi.ac.uk/ena/data/view/SRP012081

they have:

Instrument Model       Library Layout   Run Read Count   Run Base Count   File
Illumina HiSeq 2000    PAIRED           12,794,66        42Gb             Fastq file#1
Illumina HiSeq 2000    PAIRED           12,794,66        42Gb             Fastq file#2

454 GS FLX Titanium    PAIRED           66,691           31Mb             Fastq file#1 <-
454 GS FLX Titanium    PAIRED           66,691           31Mb             Fastq file#2 <-
454 GS FLX Titanium    PAIRED           66,691           31Mb             Fastq file#3 <-

454 GS FLX Titanium    PAIRED           92,871           39Mb             Fastq file#1 <-
454 GS FLX Titanium    PAIRED           92,871           39Mb             Fastq file#2 <-
454 GS FLX Titanium    PAIRED           92,871           39Mb             Fastq file#3 <-



How should I set up Ray for them?

Also, how can I tell whether a library is short-jumping or long-jumping?

------------------------------------------------------------------------------
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and
threat landscape has changed and how IT managers can respond. Discussions
will include endpoint security, mobile security and the latest in malware
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
_______________________________________________
Denovoassembler-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/denovoassembler-users
