Hi, Bremen!
Could you explain what "backend sequences" are?
It seems likely that BLAT could do the job.
It is good for many kinds of alignment jobs.
However, for very short reads, we recommend
using a short read aligner like MAQ, etc.
I found this on the samtools site's to-do list:
-------------
Converting the PSL format to SAM
* Priority: Low
* Difficulty: Easy
* Background: PSL is widely used by UCSC. Samtools provides a
simple converter, but it only translates coordinates.
* Description: Implement a proper converter for PSL. A perl/python
script would be ideal.
* Note: It would be better for someone to maintain the converters
for other formats. It is hard for one person to keep track of the
development of all the aligners.
-----------
Background on SAM/BAM
http://bioinformatics.oxfordjournals.org/cgi/reprint/btp352v1.pdf
BAM is just a binary, compressed, and optionally indexed form of SAM.
SAM defines these required fields:
 #  Name   Description
 1  QNAME  Query NAME of the read or the read pair
 2  FLAG   bitwise FLAG (pairing, strand, mate strand, etc.)
 3  RNAME  Reference sequence NAME
 4  POS    1-based leftmost POSition of clipped alignment
 5  MAPQ   MAPping Quality (Phred-scaled)
 6  CIGAR  extended CIGAR string (operations: MIDNSHP)
 7  MRNM   Mate Reference NaMe (‘=’ if same as RNAME)
 8  MPOS   1-based leftmost Mate POSition
 9  ISIZE  inferred Insert SIZE
10  SEQ    query SEQuence on the same strand as the reference
11  QUAL   query QUALity (ASCII-33=Phred base quality)
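To make the field list above concrete, here is a minimal Python sketch that splits one tab-separated SAM line into those 11 fields. The names SamRecord, parse_sam_line, is_reverse_strand, and phred_scores are made up for illustration and are not part of samtools. It assumes a plain alignment line (not an @ header line) and ignores any optional tags after field 11.

```python
from collections import namedtuple

# Field names follow the table above; SamRecord is a hypothetical helper type.
SamRecord = namedtuple(
    "SamRecord",
    "qname flag rname pos mapq cigar mrnm mpos isize seq qual",
)

def parse_sam_line(line):
    """Split one SAM alignment line into its 11 required fields."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 11:
        raise ValueError("expected at least 11 tab-separated fields")
    qname, flag, rname, pos, mapq, cigar, mrnm, mpos, isize, seq, qual = cols[:11]
    return SamRecord(qname, int(flag), rname, int(pos), int(mapq),
                     cigar, mrnm, int(mpos), int(isize), seq, qual)

def is_reverse_strand(rec):
    """Bit 0x10 of FLAG marks a query aligned to the reverse strand."""
    return bool(rec.flag & 0x10)

def phred_scores(qual):
    """Decode QUAL (ASCII minus 33); '*' means no qualities are stored."""
    if qual == "*":
        return []
    return [ord(c) - 33 for c in qual]
```

For example, parse_sam_line("read1\t16\tchr1\t101\t255\t10M\t*\t0\t0\tACGTACGTAC\t*") gives a record with pos 101 that is_reverse_strand reports as reverse-strand.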
The to-do note says that samtools already ships a simple converter.
Perhaps it is good enough for your purposes, although it only
translates coordinates.
To convert PSL to SAM, it might be a little easier
if you output pslx since it will include the sequence
itself.
Just off-hand, I'd say the pslx format could supply
info for fields 1, 4, and 10 of the SAM format.
Some of the other fields allow * to stand in for unknown values.
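As a rough illustration of the mapping described above, here is a minimal Python sketch that turns one nucleotide PSL line into a SAM line. It is not the samtools converter, and psl_to_sam is a made-up name. Assumptions: PSL coordinates are 0-based, so POS is tStart + 1; gaps in the query become I and gaps in the target become D; unaligned query ends become hard clips (H); MAPQ is set to 255 (not available); SEQ, QUAL, and the mate fields are left as unknown. Translated alignments (two-character strand) are rejected.

```python
def _int_list(field):
    """Parse a PSL comma-separated list such as '10,5,' into [10, 5]."""
    return [int(x) for x in field.split(",") if x]

def psl_to_sam(psl_line):
    """Convert one nucleotide PSL line to a minimal SAM line (sketch only)."""
    f = psl_line.rstrip("\n").split("\t")
    if len(f) < 21:
        raise ValueError("expected at least 21 PSL fields")
    strand = f[8]
    if strand not in ("+", "-"):
        raise ValueError("translated alignments are not handled here")
    q_name, q_size = f[9], int(f[10])
    t_name, t_start = f[13], int(f[15])
    sizes = _int_list(f[18])
    q_starts = _int_list(f[19])
    t_starts = _int_list(f[20])

    cigar = []
    if q_starts[0] > 0:                      # unaligned start of the query
        cigar.append("%dH" % q_starts[0])
    for i, size in enumerate(sizes):
        if i > 0:
            q_gap = q_starts[i] - (q_starts[i - 1] + sizes[i - 1])
            t_gap = t_starts[i] - (t_starts[i - 1] + sizes[i - 1])
            if q_gap > 0:                    # extra query bases: insertion
                cigar.append("%dI" % q_gap)
            if t_gap > 0:                    # extra target bases: deletion
                cigar.append("%dD" % t_gap)
        cigar.append("%dM" % size)           # ungapped block
    tail = q_size - (q_starts[-1] + sizes[-1])
    if tail > 0:                             # unaligned end of the query
        cigar.append("%dH" % tail)

    flag = 16 if strand == "-" else 0        # 0x10 = query on reverse strand
    sam = [q_name, str(flag), t_name, str(t_start + 1), "255",
           "".join(cigar), "*", "0", "0", "*", "*"]
    return "\t".join(sam)
```

A real converter would also want to fill in SEQ from the original reads, and might prefer N over D for long target gaps such as introns; both are left out here.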
What are you hoping to do with samtools?
The bigBed and bigWig tools from UCSC provide some
overlap in functionality with SAM/BAM.
PSL FORMAT INFO
----------------
http://genome.ucsc.edu/FAQ/FAQformat#format2
PSL lines represent alignments, and are typically taken from files
generated by BLAT or psLayout. See the BLAT documentation for more
details. All of the following fields are required on each data line
within a PSL file:
1. matches - Number of bases that match that aren't repeats
2. misMatches - Number of bases that don't match
3. repMatches - Number of bases that match but are part of repeats
4. nCount - Number of 'N' bases
5. qNumInsert - Number of inserts in query
6. qBaseInsert - Number of bases inserted in query
7. tNumInsert - Number of inserts in target
8. tBaseInsert - Number of bases inserted in target
9. strand - '+' or '-' for query strand. For translated alignments,
the second '+' or '-' is for genomic strand
10. qName - Query sequence name
11. qSize - Query sequence size
12. qStart - Alignment start position in query
13. qEnd - Alignment end position in query
14. tName - Target sequence name
15. tSize - Target sequence size
16. tStart - Alignment start position in target
17. tEnd - Alignment end position in target
18. blockCount - Number of blocks in the alignment (a block contains
no gaps)
19. blockSizes - Comma-separated list of sizes of each block
20. qStarts - Comma-separated list of starting positions of each
block in query
21. tStarts - Comma-separated list of starting positions of each
block in target
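For reference, the 21 fields above can be read with a few lines of Python. This is only a sketch: PslRecord, parse_psl_line, and read_psl are made-up names, the input is assumed to be tab-separated, and any line that does not start with a digit (such as the optional psLayout header) is simply skipped. Extra pslx columns after field 21 are ignored.

```python
from collections import namedtuple

# The 21 PSL field names, in the order listed above.
PSL_FIELDS = (
    "matches misMatches repMatches nCount qNumInsert qBaseInsert "
    "tNumInsert tBaseInsert strand qName qSize qStart qEnd "
    "tName tSize tStart tEnd blockCount blockSizes qStarts tStarts"
).split()
PslRecord = namedtuple("PslRecord", PSL_FIELDS)

_TEXT = {"strand", "qName", "tName"}             # kept as strings
_LISTS = {"blockSizes", "qStarts", "tStarts"}    # comma-separated integers

def parse_psl_line(line):
    """Parse one PSL data line into a PslRecord."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < len(PSL_FIELDS):
        raise ValueError("expected at least 21 tab-separated fields")
    values = []
    for name, raw in zip(PSL_FIELDS, cols):
        if name in _TEXT:
            values.append(raw)
        elif name in _LISTS:
            # Lists usually carry a trailing comma, e.g. "10,5,"
            values.append([int(x) for x in raw.split(",") if x])
        else:
            values.append(int(raw))
    return PslRecord(*values)

def read_psl(lines):
    """Yield records from an iterable of lines, skipping non-data lines."""
    for line in lines:
        if line[:1].isdigit():
            yield parse_psl_line(line)
```

Typical use would be something like: for rec in read_psl(open("out.psl")): print(rec.qName, rec.tName, rec.tStart, rec.tEnd), where out.psl is a placeholder file name. Keep in mind that PSL start positions are 0-based, so tEnd - tStart is the aligned span on the target.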
-Galt
> Hello,
> I am wondering if Blat is recommended for my situation. I am hoping to
> display backend sequences mapped against genomic sequences and to view the
> differences using SAM. What comparison tool would you recommend? I see as of
> yet samtools doesn't convert psl to sam but there is a low priority open
> task to do so. The task difficulty has been described as easy, so surely
> there's someone that has already performed this task?
>
> Thanks,
> Bremen Braun
> _______________________________________________
> Genome maillist - [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome