On Thu, Oct 20, 2011 at 2:15 PM, Eric Cabot <[email protected]> wrote:
>> I was not aware of this new naming. It seems like a terrible decision from
>> Illumina because now both reads in a pair technically have the same ID (but
>> a different description).
>
> This is not quite the case. Here are two fastq header lines for a pair of
> reads produced by Illumina's CASAVA 1.8:
>
> @XYZZY:123:D0ABCDEFG:7:1101:1445:2057 1:N:0:CTTGTA
> @XYZZY:123:D0ABCDEFG:7:1101:1445:2057 2:N:0:CTTGTA
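
To make the effect on those two header lines concrete, here is a minimal Python sketch of what a generic FASTQ parser does when it splits the title line at the first whitespace. The function names split_fastq_header and read_number are hypothetical, made up for this illustration; they are not taken from Galaxy, Biopython, or any Illumina tool. The read-number logic assumes only the two conventions discussed in this thread (a /1 or /2 suffix, or a leading "1:" or "2:" in the description).

```python
def split_fastq_header(line):
    """Split a FASTQ title line into (identifier, description).

    The identifier is everything after '@' up to the first whitespace;
    the description is whatever follows (possibly empty).
    """
    if not line.startswith("@"):
        raise ValueError("FASTQ title lines start with '@'")
    # Split once, at the first run of whitespace.
    parts = line[1:].rstrip().split(None, 1)
    identifier = parts[0] if parts else ""
    description = parts[1] if len(parts) > 1 else ""
    return identifier, description


def read_number(identifier, description):
    """Return 1 or 2 if the read number can be determined, else None."""
    # Older convention: template ID plus a /1 or /2 suffix.
    if identifier.endswith("/1") or identifier.endswith("/2"):
        return int(identifier[-1])
    # CASAVA 1.8 style: read number is the first colon-separated
    # field of the description.
    first_field = description.split(":", 1)[0]
    if first_field in ("1", "2"):
        return int(first_field)
    return None


headers = [
    "@XYZZY:123:D0ABCDEFG:7:1101:1445:2057 1:N:0:CTTGTA",
    "@XYZZY:123:D0ABCDEFG:7:1101:1445:2057 2:N:0:CTTGTA",
]
pairs = [split_fastq_header(h) for h in headers]
for identifier, description in pairs:
    print(identifier, read_number(identifier, description))
```

With this kind of split, both records come out with the same identifier, and the read number is only recoverable from the description.
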

Yes, Illumina gives both read 1 and read 2 the same template ID of
XYZZY:123:D0ABCDEFG:7:1101:1445:2057 (much like the two reads would
have the same ID in a SAM/BAM file).

> The two key things to note, relevant to this discussion are:
>
> 1. A space character is used to split the fields into two groups.
> This is actually a good thing, because that particular character can NEVER
> appear in either a sequence or a quality line. This makes it easy to detect
> name lines as those beginning with "@" (a valid quality character) and also
> having a space. If you are writing a parser for the new Illumina fastq
> format, please don't break the names on spaces!

Yes, you could use the space as a sanity test for *this* style of
Illumina FASTQ, and have a bespoke parser which treats it all
specially. But a generic FASTQ parser *should* split at the space.
The point is that Illumina have changed the meaning of their FASTQ
identifier: it used to be the template ID plus a /1 or /2 suffix,
but now it is just the common template ID shared by both reads.

> 2. Apart from the read number, encoded as the digit immediately following
> the space, the two lines are identical--as they were with earlier CASAVA
> versions. Why is this worse than two lines differing by "/1" vs. "/2"?

Because it is a change from the existing, well-established convention,
and it will require changes to hundreds of scripts and tools (a guessed
number, including users' bespoke scripts).

> An additional improvement with the new naming convention is that flowcell
> and run ID's, as well as a flag for not passing filters (where N means does
> PF), are now included.

Yes, that is good.

Peter

