Hi! (The list is in CC; I redacted proprietary or confidential information.)

On 12/01/12 07:56 AM, Stefan Zoller wrote:
> Dear Sebastien
>
> I have recently installed your Ray assembler and was quite impressed
> with the speed and result when I ran an illumina data set.

Illumina, Inc. data have so few sequencing errors. It is almost beautiful to analyze. ;)

> However, I am
> running into a problem when using 454 paired-end data. I hope you have
> some time to help me with this.
>
>

It may be related to homopolymers. I don't like them.

> The problem is as follows:
> I am loading several 454 libraries with different insert sizes that I
> know from using another assembler should give a reasonable assembly with
> about InformationRedacted scaffolds.
> However, in Ray I do not get any scaffolding

Do you at least get a Scaffolds.fasta file?

> and I now wonder if I am
> doing something wrong when preparing the "left" and "right" fastq files
> from the original sff files,

This is tricky, indeed. See below.

> or if I have to use some additional options
> in Ray.
>

No, there is no other option to provide. Ray knows what to do.

> This is my run command:
> mpiexec -n 15 Ray -k 25 \
> -p InformationRedacted.1.fastq InformationRedacted.2.fastq InformationRedacted InformationRedacted \
> -p InformationRedacted.1.fastq InformationRedacted.2.fastq InformationRedacted InformationRedacted \
> -p InformationRedacted.1.fastq InformationRedacted.2.fastq InformationRedacted InformationRedacted \
> -p InformationRedacted.1.fastq InformationRedacted.2.fastq 5000 1550 \
> -o InformationRedacted-k25 > log.ray 2>&1 &
>
>

This seems correct.

> Could it be, that Ray is expecting a special format for the fastq
> sequence header lines?

No. Ray only reads the sequence line of each fastq entry; the 3 other lines are not even processed.

However, your pairs must be oriented like this:

------------------>                <-------------------------

or like this:

<------------------                --------------------------->

Other orientations are not considered.

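If the two mates of a library turn out to point in the same direction, one of the two files has to be reverse-complemented to obtain one of the orientations drawn above. Here is a minimal sketch in Python of how that could be done. It is not part of Ray or of 454-pairs; the function names (reverse_complement, read_fastq, flip_fastq) are made up for this example, and it assumes plain fastq with exactly 4 lines per entry. Whether your files need this at all depends on how they were produced from the sff files.

```python
import io

# Complement table for DNA bases; N and any other symbol is left unchanged.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def read_fastq(handle):
    """Yield (header, sequence, plus, quality) from a 4-line-per-entry fastq stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        yield header, seq, plus, qual


def flip_fastq(src, dst):
    """Copy src to dst with each sequence reverse-complemented.

    The quality string is reversed so that it stays aligned with the bases.
    """
    for header, seq, plus, qual in read_fastq(src):
        dst.write(f"{header}\n{reverse_complement(seq)}\n{plus}\n{qual[::-1]}\n")


if __name__ == "__main__":
    # Tiny in-memory demonstration with a made-up read.
    demo_in = io.StringIO("@read1/2\nAACG\n+\nIIJK\n")
    demo_out = io.StringIO()
    flip_fastq(demo_in, demo_out)
    print(demo_out.getvalue(), end="")
```

To use it on real files, open the input and output with open() and pass the two handles to flip_fastq. Only the sequence line matters to Ray, but flipping the qualities too keeps the file usable with other tools.
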
> My fastq files look like this:
> @InformationRedacted/1
> InformationRedacted
> +
> InformationRedacted
> InformationRedacted/1
> InformationRedacted
> +
> InformationRedacted
>
>
> Do you maybe have a testfile, that you could share with me, that I could
> run on my installation? Or could you maybe give hints on how to prepare
> the left and right sequence files when coming from the original 454 sff
> files?
>

I don't have a 454 dataset with large inserts.

However, you can try to use 454-pairs to prepare the fastq files from the sff files.
Here it is: https://github.com/sebhtml/454-pairs

I used this on some bird genomes.

> I am grateful for any help!
>
> Best regards,
>

May the source be with you.

> Stefan
>
>

_______________________________________________
Denovoassembler-users mailing list
https://lists.sourceforge.net/lists/listinfo/denovoassembler-users
