On 20/01/2011 09:36, Peter Rice wrote:
On 01/20/11 09:06, Stephen Taylor wrote:
Hi Peter,


Is EMBOSS planning to release tools that produce SAM format in the near
future or is it more likely to be on the customary July 15th release?

The latest release, EMBOSS 6.3.1, has SAM as an output format for sequences
and pairwise alignments (-oformat sam and -aformat sam respectively).

... oops, -osformat for sequences of course.


Sadly, fuzznuc doesn't seem to work using aformat or oformat. Is that
due to be supported?

Ah, fuzznuc reports features so we hadn't implemented SAM there.

However, you can use -rformat listfile to get USAs for the features, and
then seqret -osformat sam @listfilename to get the sequences in SAM format.
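
The two-step route above could be scripted roughly as follows. This is only a sketch: the file names (genome.fa, hits.list, hits.sam) and the helper names are made up for illustration, and it assumes fuzznuc and seqret are on the PATH when run_pipeline is called.

```python
import subprocess


def build_commands(genome, pattern, listfile="hits.list", samfile="hits.sam"):
    """Return the fuzznuc and seqret command lines as argument lists.

    All file names are placeholders chosen for this example.
    """
    # Step 1: report each match as a USA in a list file.
    fuzznuc = ["fuzznuc", "-sequence", genome, "-pattern", pattern,
               "-rformat", "listfile", "-outfile", listfile, "-auto"]
    # Step 2: read the USAs back (@listfile) and write them out as SAM.
    seqret = ["seqret", "-sequence", "@" + listfile,
              "-osformat", "sam", "-outseq", samfile, "-auto"]
    return fuzznuc, seqret


def run_pipeline(genome, pattern):
    """Run both steps in order, stopping if either command fails."""
    for cmd in build_commands(genome, pattern):
        subprocess.run(cmd, check=True)
```

Calling run_pipeline("genome.fa", "[CG](5)TG{A}N(1,5)C") would then leave the matched subsequences in hits.sam.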

Possibly scope to do more there. What would you like to see in SAM
output for fuzznuc?

My motivation was to build BAM tracks showing matches of lots of patterns in the genome sequence. I hadn't thought about proteins but I guess you could do something similar.

The SAM file would show the position of each match, one match per line, with the CIGAR string containing the matched pattern and SEQ (col 10) containing the query pattern expanded to show the match. The original pattern could be in the OPT field. I see there is a tag for mismatching positions (MD) which would work for regex-style matches (so good for 'dreg'), but I am not sure it would be strictly legal for a PROSITE-like pattern.

e.g. for [CG](5)TG{A}N(1,5)C

Could you have

MD:Z:[CG](5)TG{A}N(1,5)C

?

It looks like the characters '{', ',' and '}' are not allowed in an MD value. So perhaps you would have to translate the pattern to a regex or generate an alternative optional tag. I am not a SAM expert so apologies if I am proposing to violate the format rules!
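
To make the "translate the pattern to a regex, or use another optional tag" idea concrete, here is a minimal Python sketch. It is not how fuzznuc works internally, and it rests on several assumptions. It handles only the pattern elements in the example above: [..] sets, {..} exclusions, (n) and (m,n) repeats, and N for any base. It finds exact, non-overlapping matches on the forward strand of an upper-case sequence, with no mismatches. The read names (hit<pos>) and the XP tag are invented for illustration; XP was picked because the SAM spec reserves tags beginning with X, Y or Z for local use.

```python
import re


def pattern_to_regex(pattern):
    """Translate a PROSITE-like nucleotide pattern into a Python regex."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "[":                      # set of allowed bases: copy as-is
            j = pattern.index("]", i)
            out.append(pattern[i:j + 1])
            i = j + 1
        elif ch == "{":                    # excluded bases: negated class
            j = pattern.index("}", i)
            out.append("[^" + pattern[i + 1:j] + "]")
            i = j + 1
        elif ch == "(":                    # repeat (n) or range (m,n)
            j = pattern.index(")", i)
            out.append("{" + pattern[i + 1:j] + "}")
            i = j + 1
        elif ch == "N":                    # any base
            out.append("[ACGT]")
            i += 1
        else:                              # literal base
            out.append(re.escape(ch))
            i += 1
    return "".join(out)


def sam_records(ref_name, ref_seq, pattern):
    """Yield one tab-separated SAM alignment line per exact match."""
    regex = re.compile(pattern_to_regex(pattern))
    for m in regex.finditer(ref_seq):
        seq = m.group(0)
        pos = m.start() + 1                # SAM positions are 1-based
        fields = [
            "hit%d" % pos,                 # QNAME (invented name)
            "0",                           # FLAG: forward strand
            ref_name,                      # RNAME
            str(pos),                      # POS
            "255",                         # MAPQ: not available
            "%dM" % len(seq),              # CIGAR: whole match aligned
            "*", "0", "0",                 # RNEXT, PNEXT, TLEN
            seq,                           # SEQ: the matched bases
            "*",                           # QUAL: not stored
            "XP:Z:" + pattern,             # original pattern, user tag
        ]
        yield "\t".join(fields)


for line in sam_records("chr1", "AACCGGCTGTACCAA", "[CG](5)TG{A}N(1,5)C"):
    print(line)
```

Keeping the original pattern in a user-defined Z-type tag sidesteps the MD restriction, since a Z value may hold any printable characters.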

Incidentally, I would use dreg but it doesn't allow mismatches to be easily 
specified.

Steve




_______________________________________________
EMBOSS mailing list
EMBOSS@lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/emboss
