Re: [Bioc-sig-seq] readFastq() error

joseph Thu, 24 Mar 2011 11:24:05 -0700

Hi Kasper
You are right, I did not understand it; but now I do. Thank you for explaining 
it to me.
I deleted the last read and now I am able to read in the file with readFastq().





________________________________
From: Kasper Daniel Hansen <kasperdanielhan...@gmail.com>
To: Martin Morgan <mtmor...@fhcrc.org>

Sent: Thu, March 24, 2011 10:49:01 AM
Subject: Re: [Bioc-sig-seq] readFastq() error

Joseph,

Perhaps you do not fully understand the results of the code you showed
a few emails ago.

> which(nchar(rd) != nchar(qual))
[1] 16509910

This number tells us that there is a single read whose qual length
and read length differ, and that this read is read number
16509910.  If, as you tell us, there are 16509910 reads, this means
that the problem read is the last read in the file.

> length(which(nchar(rd) == nchar(qual)))
[1] 16509909

This number tells us how many reads have the same read length and qual
length.  Note that it is 1 smaller than the previous number.  Together
this tells us that there is a single read in the file that has a
problem and that this read is the last read.
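
To make the two numbers concrete, here is a minimal sketch on a made-up
three-record FASTQ. The records and the names `bad` and `n_ok` are invented
for illustration; only the indexing idiom comes from the code quoted in this
thread. The last record is deliberately given a quality string that is two
characters short.

```r
# Toy FASTQ (illustrative data): three 4-line records; the last quality
# line is 2 characters shorter than its read.
x <- c("@r1", "ACGTACGT", "+r1", "BBBBBBBB",
       "@r2", "ACGTACGT", "+r2", "BBBBBBBB",
       "@r3", "ACGTACGT", "+r3", "BBBBBB")

rd   <- x[c(FALSE, TRUE, FALSE, FALSE)]   # 2nd line of each record: the read
qual <- x[c(FALSE, FALSE, FALSE, TRUE)]   # 4th line of each record: the qualities

bad  <- which(nchar(rd) != nchar(qual))   # POSITIONS of mismatched records
n_ok <- sum(nchar(rd) == nchar(qual))     # COUNT of well-formed records

bad                  # an index into rd, not a count
n_ok                 # how many records are fine
bad == length(rd)    # TRUE when the only mismatch is the final record
```

The point is that which() returns positions while length(which(...)) (or
sum()) returns a count, so a single value from which() names one read
rather than saying how many reads are affected.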

Martin proposed a missing newline (because that is something you
sometimes see), but as he said, when you look at the read you printed
out, there is actually something missing.

Now, you can ignore it, or you can track it down.  This _may_ indicate
that, however you got the file, something went wrong during file
creation and you are missing part of the file, which could be bad.  Or
you could just have a single malformed read, which would be irritating
but probably ignorable.
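
If you choose to ignore it, one possible way to drop the offending record
is sketched below. This is only a sketch under assumptions: the helper name
`drop_mismatched` is hypothetical (it is not part of ShortRead), the toy
input is made up, and it assumes strict four-line records. It reuses the
indexing from earlier in the thread and writes to a temporary file rather
than to your real data.

```r
# Hypothetical helper (not a ShortRead function): keep only records whose
# read and quality strings have the same number of characters.
drop_mismatched <- function(lines) {
  # A line count that is not a multiple of 4 suggests a truncated file,
  # which deserves a closer look rather than silent filtering.
  stopifnot(length(lines) %% 4 == 0)
  rd   <- lines[c(FALSE, TRUE, FALSE, FALSE)]
  qual <- lines[c(FALSE, FALSE, FALSE, TRUE)]
  keep <- nchar(rd) == nchar(qual)
  lines[rep(keep, each = 4)]   # expand per-record flags to per-line flags
}

# Illustrative input: the second record has a short quality string.
x <- c("@r1", "ACGT", "+r1", "BBBB",
       "@r2", "ACGT", "+r2", "BB")

clean <- drop_mismatched(x)
out <- tempfile(fileext = ".fq")   # temporary path, so nothing is overwritten
writeLines(clean, out)
```

On a real file you would pass in the result of readLines() and then point
readFastq() at the cleaned copy. Keeping the original file around is
worthwhile in case the short read turns out to be a symptom of truncation.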

Kasper


On Thu, Mar 24, 2011 at 1:33 PM, Martin Morgan <mtmor...@fhcrc.org> wrote:
> On 03/24/2011 10:01 AM, joseph wrote:
>
>> @GAII_0001:6:91:210:160#0/1
>> CTCGCGAAGCTTCTCTGGAGGAGAGTGATGTACGATGNCN
>> +GAII_0001:6:91:210:160#0/1
>> a__a_a__a_ba]abbabXa__a_BBBBBBBBBBBBBB
>> boyce-162-119:mRNA_monocyte jdhahbi$
>
> As you can see, the last read has two fewer quality scores than
> nucleotides (38 versus 40). This was introduced somewhere in your
> upstream processing path.
>
> Martin
>>
>>
>>
>> ------------------------------------------------------------------------
>> *From:* Martin Morgan <mtmor...@fhcrc.org>

>> *Cc:* bioc-sig-sequencing@r-project.org
>> *Sent:* Thu, March 24, 2011 9:39:42 AM
>> *Subject:* Re: [Bioc-sig-seq] readFastq() error
>>
>> On 03/24/2011 09:41 AM, joseph wrote:
>>  > I added a new line character at the end of the file
>>  > echo >> reads.fq
>>  > I got the same numbers when I repeated the analysis
>>
>> you indicated that there were 16509910 reads in the file, and the test
>> indicates it's the last read that causes problems, so what does the last
>> read look like? e.g., tail reads.fq
>>
>> Martin
>>  >
>>  >
>>  >
>> ------------------------------------------------------------------------
>>  > *From:* Martin Morgan <mtmor...@fhcrc.org>

>>  > *Cc:* bioc-sig-sequencing@r-project.org
>>  > *Sent:* Wed, March 23, 2011 7:44:40 PM
>>  > *Subject:* Re: [Bioc-sig-seq] readFastq() error
>>  >
>>  > On 03/23/2011 05:49 PM, joseph wrote:
>>  > > Hi Martin
>>  > > here is what I got:
>>  > > x = readLines('~/myDir/reads.fq')
>>  > > rd = x[c(FALSE, TRUE, FALSE, FALSE)]
>>  > > qual = x[c(FALSE, FALSE, FALSE, TRUE)]
>>  > > > which(nchar(rd) != nchar(qual))
>>  > > [1] 16509910
>>  > > # that is all the reads in the file
>>  > > # When I tried to count the reads with the same number of characters,
>>  > > # I also got all the reads
>>  > > > length(which(nchar(rd) == nchar(qual)))
>>  > > [1] 16509909
>>  >
>>  > I suspect there is a missing end-of-line on the last line of the file.
>>  > >
>>  > > Joseph
>>  > >
>>  > >
>>  > >
>>  > >
>> ------------------------------------------------------------------------
>>  > > *From:* Martin Morgan <mtmor...@fhcrc.org>
>>  > > *Cc:* bioc-sig-sequencing@r-project.org
>>  > > *Sent:* Wed, March 23, 2011 4:21:25 PM
>>  > > *Subject:* Re: [Bioc-sig-seq] readFastq() error
>>  > >
>>  > > On 03/23/2011 04:07 PM, Martin Morgan wrote:
>>  > > > On 03/23/2011 03:58 PM, joseph wrote:
>>  > > >> Hello
>>  > > >> How would you fix a FASTQ file that gives the following error when
>>  > > >> read with
>>  > > >> readFastq()?
>>  > > >> Other lanes from the same flow cell are imported fine with
>>  > readFastq().
>>  > > >>
>>  > > >> rfq = readFastq("~/myDir", pattern="reads.fq")
>>  > > >> Error: Input/Output
>>  > > >> file(s):
>>  > > >> ~/myDir/reads.fq
>>  > > >> message: IncompatibleTypes
>>  > > >> message: invalid class "ShortReadQ" object: some sread and quality
>>  > > widths
>>  > > >> differ
>>  > > >>
>>  > > >
>>  > > > you could read the file in
>>  > > >
>>  > > > x = readLines('~/myDir/reads.fq')
>>  > > >
>>  > > > split it into reads and qualities
>>  > > >
>>  > > > rd = x[c(FALSE, TRUE, FALSE, FALSE)]
>>  > > > qual = x[c(FALSE, FALSE, TRUE, FALSE)]
>>  > >
>>  > > oops, x[c(FALSE, FALSE, FALSE, TRUE)]
>>  > >
>>  > > >
>>  > > > and ask which have different numbers of characters
>>  > > >
>>  > > > which(nchar(rd) != nchar(qual))
>>  > > >
>>  > > > Martin
>>  > > >
>>  > > >> head reads.fq
>>  > > >> @GAII_0001:6:1:0:101#0/1
>>  > > >> NCTCANCATTGTTTGGACGGAACAAAACCGGGGACAATCT
>>  > > >> +GAII_0001:6:1:0:101#0/1
>>  > > >> BX[_\B_VXGQQU]]]YTPMGWTZZTVQ_X[TGYPZG[WZ
>>  > > >> @GAII_0001:6:1:0:123#0/1
>>  > > >> NGTGANTCNGCTCATTGCGAGTTTTAACCTTTTCTCTATC
>>  > > >> +GAII_0001:6:1:0:123#0/1
>>  > > >> BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
>>  > > >> @GAII_0001:6:1:0:168#0/1
>>  > > >> NCCAGNCCCAGCAGCCCTTCCTTTTCCCTGCTTACCCTCA
>>  > > >>
>>  > > >>
>>  > > >>
>>  > > >> [[alternative HTML version deleted]]
>>  > > >>
>>  > > >> _______________________________________________
>>  > > >> Bioc-sig-sequencing mailing list
>>  > > >> Bioc-sig-sequencing@r-project.org
>>  > > >> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>>  > > >
>>  > > >
>>  > >
>>  > >
>>  > > --
>>  > > Computational Biology
>>  > > Fred Hutchinson Cancer Research Center
>>  > > 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
>>  > >
>>  > > Location: M1-B861
>>  > > Telephone: 206 667-2793
>>  > >



      