Hi Peng,

The sequences are the query and target as they exist in the sources. Use the alignment information in the other psl data fields to determine where any mismatches, gaps, etc. are present in the alignment.
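As a rough illustration of reading those fields, here is a minimal Python sketch. It assumes a tab-separated, 23-column pslx line for a plain nucleotide alignment, and it assumes that each block is an ungapped aligned segment, so mismatches show up inside a block and gaps show up between consecutive blocks. The function names (parse_pslx, block_mismatches, block_gaps) and the second "toy" record are made up for this example and are not part of BLAT. It does not handle translated alignments, and for "-" strand hits the reported query positions are in whatever coordinates qStarts uses, with no conversion applied.

```python
# Column names for the 21 standard psl fields; pslx appends two sequence columns.
PSL_FIELDS = [
    "matches", "misMatches", "repMatches", "nCount",
    "qNumInsert", "qBaseInsert", "tNumInsert", "tBaseInsert",
    "strand", "qName", "qSize", "qStart", "qEnd",
    "tName", "tSize", "tStart", "tEnd",
    "blockCount", "blockSizes", "qStarts", "tStarts",
]


def _split_list(text):
    """Split a comma-terminated list such as '24,' into its items."""
    return [item for item in text.split(",") if item]


def parse_pslx(line):
    """Parse one tab-separated pslx line into a dict (hypothetical helper)."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 23:
        raise ValueError("expected 23 pslx columns, got %d" % len(cols))
    rec = dict(zip(PSL_FIELDS, cols))
    for key in ("blockSizes", "qStarts", "tStarts"):
        rec[key] = [int(x) for x in _split_list(rec[key])]
    rec["qSeqs"] = _split_list(cols[21])  # query sequence, one item per block
    rec["tSeqs"] = _split_list(cols[22])  # target sequence, one item per block
    return rec


def block_mismatches(rec):
    """For each block, list (qPos, tPos, qBase, tBase) where the bases differ.

    Comparison is case-insensitive (an assumption made for this sketch).
    """
    result = []
    for q_start, t_start, q_seq, t_seq in zip(
        rec["qStarts"], rec["tStarts"], rec["qSeqs"], rec["tSeqs"]
    ):
        result.append([
            (q_start + i, t_start + i, q_base, t_base)
            for i, (q_base, t_base) in enumerate(zip(q_seq, t_seq))
            if q_base.lower() != t_base.lower()
        ])
    return result


def block_gaps(rec):
    """List (query gap, target gap) in bases between consecutive blocks."""
    sizes, q_starts, t_starts = rec["blockSizes"], rec["qStarts"], rec["tStarts"]
    return [
        (q_starts[i] - (q_starts[i - 1] + sizes[i - 1]),
         t_starts[i] - (t_starts[i - 1] + sizes[i - 1]))
        for i in range(1, len(sizes))
    ]


# The record quoted further down in this thread: one block, identical sequences.
example = "\t".join([
    "24", "0", "0", "0", "0", "0", "0", "0", "+",
    "test_sequence", "25", "1", "25", "chr1", "75", "26", "50",
    "1", "24,", "1,", "26,",
    "ttgcaccggaaagtctgctccaga,", "ttgcaccggaaagtctgctccaga,",
])
rec = parse_pslx(example)

# A made-up two-block record: one mismatch in block 2, a 2-base gap in the target.
toy = parse_pslx("\t".join([
    "5", "1", "0", "0", "0", "0", "1", "2", "+",
    "toy_query", "6", "0", "6", "toy_target", "30", "10", "18",
    "2", "3,3,", "0,3,", "10,15,", "acg,tta,", "acg,tca,",
]))

print(block_mismatches(rec), block_gaps(rec))
print(block_mismatches(toy), block_gaps(toy))
```

For the record from this thread, both lists come back empty, which agrees with its misMatches and gap columns all being 0. The toy record reports one mismatch in its second block and a 2-base gap on the target side.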
You could always output the actual alignments with BLAT if you want to visualize the matches in detail. See the other "-out=type" options and try a few to see which provides the best view for your uses.

Thanks!
Jennifer

---------------------------------
Jennifer Jackson
UCSC Genome Informatics Group
http://genome.ucsc.edu/

On 4/21/10 7:57 PM, Peng Yu wrote:
> On Wed, Apr 21, 2010 at 5:28 PM, Jennifer Jackson<[email protected]> wrote:
>> Hello Peng,
>>
>> pslx is the same as psl format with the sequence (query and target)
>> included. This is noted in the BLAT documentation:
>>
>> http://genome.ucsc.edu/goldenPath/help/blatSpec.html
>>
>> -out=type Controls output file format. Type is one of:
>>     psl - Default. Tab separated format, no sequence
>>     pslx - Tab separated format with sequence
>>     etc ..............
>
> The word 'sequence' lacks clear definition. What sequence? Only the
> perfect matched would be shown? What about mismatch (in the middle and
> at the ends)? What about gaps? (These are what I mean by 'corner'
> cases. There may be other cases that I am not aware of.)
>
> I have tried some test cases. But since I am not able to dig into the
> source code of BLAT, I will not be able to enumerate all the possible
> cases. Could you or somebody who are familiar with BLAT show example
> output for each corner case?
>
>> FAQ for psl format:
>> http://genome.ucsc.edu/FAQ/FAQformat.html#format2
>>
>> 21 columns for psl, 23 for pslx
>>
>> Hopefully this helps,
>> Jennifer
>>
>> ---------------------------------
>> Jennifer Jackson
>> UCSC Genome Informatics Group
>> http://genome.ucsc.edu/
>>
>> On 4/21/10 8:50 AM, Peng Yu wrote:
>>>
>>> I could just guess what a field represents from the field name. But my
>>> guess may not correct for corner cases. Could you let me know where
>>> the description of the format is? Also, it seems that there are
>>> different number of fields above '----' and below it. Why?
>>>
>>> psLayout version 3
>>>
>>> match mis-  rep.  N's Q gap Q gap T gap T gap strand Q             Q    Q     Q   T    T    T     T   block blockSizes qStarts tStarts
>>>       match match     count bases count bases        name          size start end name size start end count
>>> ---------------------------------------------------------------------------------------------------------------------------------------------------------------
>>> 24    0     0     0   0     0     0     0     +      test_sequence 25   1     25  chr1 75   26    50  1     24,        1,      26,     ttgcaccggaaagtctgctccaga, ttgcaccggaaagtctgctccaga,
>>>
>>
>
_______________________________________________
Genome maillist - [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome
