Re: [Denovoassembler-users] Ray and 454 paired-end data

Sébastien Boisvert Thu, 12 Jan 2012 08:28:45 -0800

Hi !

(the list is in C.C., I redacted proprietary or confidential information)



On 12/01/12 07:56 AM, Stefan Zoller wrote:
> Dear Sebastien
>
> I have recently installed your Ray assembler and was quite impressed
> with the speed and result when I ran an illumina data set.

Illumina, Inc. data have so few sequencing errors.
It is almost beautiful to analyze. ;)

> However, I am
> running into a problem when using 454 paired-end data. I hope you have
> some time to help me with this.
>
>    


It may be related to homopolymers.
I don't like them.

> The problem is as follows:
> I am loading several 454 libraries with different insert sizes that I
> know from using another assembler should give a reasonable assembly with
> about InformationRedacted scaffolds.
> However, in Ray I do not get any scaffolding

Do you get at least Scaffolds.fasta ?

>   and I now wonder if I am
> doing something wrong when preparing the "left" and "right" fastq files
> from the original sff files,

This is tricky, indeed. See below.

>   or if I have to use some additional options
> in Ray.
>    

No, there is no other option to provide.
Ray knows what to do.



> This is my run command:
>      mpiexec -n 15 Ray -k 25 \
>     -p InformationRedacted.1.fastq InformationRedacted.2.fastq InformationRedacted InformationRedacted \
>     -p InformationRedacted.1.fastq InformationRedacted.2.fastq InformationRedacted InformationRedacted \
>     -p InformationRedacted.1.fastq InformationRedacted.2.fastq InformationRedacted InformationRedacted \
>     -p InformationRedacted.1.fastq InformationRedacted.2.fastq 5000 1550 \
>     -o InformationRedacted-k25 > log.ray 2>&1 &
>
>    

This seems correct.

> Could it be, that Ray is expecting a special format for the fastq
> sequence header lines?

No, Ray only reads the sequence line of each entry in fastq format;
the 3 other lines are not even processed.
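
To illustrate what "only the sequence line" means, here is a minimal Python sketch. It is not Ray's actual code (Ray is written in C++); the function name read_fastq_sequences and the example reads are made up for illustration, and it assumes the usual 4-lines-per-entry fastq layout.

```python
import io


def read_fastq_sequences(handle):
    """Yield only the sequence line (2nd of every 4 lines) of a FASTQ stream.

    The header, the '+' separator and the quality line are skipped
    without being inspected.
    """
    for index, line in enumerate(handle):
        if index % 4 == 1:
            yield line.strip()


# Two made-up entries; the header format plays no role here.
example = io.StringIO(
    "@read1/1\nACGT\n+\nIIII\n"
    "@anything at all\nGGCA\n+\nIIII\n"
)
sequences = list(read_fastq_sequences(example))
print(sequences)
```

Under that assumption, the header lines can carry any naming scheme, since they are never parsed.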


However, your pairs must be like that:


------------------> <-------------------------


or like that


<------------------                   --------------------------->


Other orientations are not considered.
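
If the two mates of a pair come out on the same strand (an assumption about how the reads were split, not something stated in this thread), one way to get the first orientation drawn above is to reverse-complement the right read. A rough Python sketch, with the helper names reverse_complement and to_inward_pair invented for illustration:

```python
# Translation table for complementing DNA bases (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(sequence):
    """Return the reverse complement of a DNA sequence."""
    return sequence.translate(_COMPLEMENT)[::-1]


def to_inward_pair(left, right):
    """Assumption: both mates are given on the same strand.

    Flipping the right mate makes the pair face inward:
    ------------------> <-------------------------
    """
    return left, reverse_complement(right)


pair = to_inward_pair("AACC", "GGGT")
print(pair)
```

When writing the result back to fastq, the quality string of a flipped read would have to be reversed as well so that it stays aligned with the bases.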


>   My fastq files look like this:
> @InformationRedacted/1
> InformationRedacted
> +
> InformationRedacted
> InformationRedacted/1
> InformationRedacted>
> +
> <InformationRedacted>
>
>
> Do you maybe have a testfile, that you could share with me, that I could
> run on my installation? Or could you maybe give hints on how to prepare
> the left and right sequence files when coming from the original 454 sff
> files?
>    

I don't have a 454 dataset with large inserts.

However, you can try to use 454-pairs to prepare the fastq files
from the sff files.

Here it is:

https://github.com/sebhtml/454-pairs

I used this on some bird genomes.
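
For readers unfamiliar with the general idea of turning a 454 paired-end read into a left and a right read, here is a hedged Python sketch. It is not the 454-pairs code and does not describe its options; it assumes each read contains a known linker sequence between the two mates, and the linker used below is a made-up placeholder, not a real 454 linker.

```python
def split_at_linker(read, linker):
    """Split a read around the first exact occurrence of the linker.

    Returns (left, right), or None when the linker is absent or one
    side would be empty. Exact matching only; a real tool would likely
    tolerate sequencing errors inside the linker.
    """
    position = read.find(linker)
    if position == -1:
        return None
    left = read[:position]
    right = read[position + len(linker):]
    if not left or not right:
        return None
    return left, right


# Hypothetical placeholder linker, for illustration only.
HYPOTHETICAL_LINKER = "TTTTGGGG"

mates = split_at_linker("ACGTAC" + HYPOTHETICAL_LINKER + "GGCATT", HYPOTHETICAL_LINKER)
print(mates)
```

Each left read would then go to the first fastq file and the matching right read to the second one, in the same order.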


> I am grateful for any help!
>
> Best regards,
>    

May the source be with you.

> Stefan
>
>
>    


_______________________________________________
Denovoassembler-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/denovoassembler-users
