Hi Jonas,

Thanks for emailing me that example with an M in the sequence.
Biopython could parse it fine, and having checked our existing
sample test files, this one has K, R and Y bases:
https://github.com/biopython/biopython/blob/master/Tests/Abi/3730.ab1

BioJava would be welcome to use that (double check with Bow, CC'd,
if you need it explicitly under a different licence).

Regards,

Peter

On Tue, Jul 12, 2016 at 5:41 PM, Peter Cock <[email protected]> wrote:
> Hi Jonas,
>
> Are you happy to share sample file(s) using IUPAC ambiguity
> codes like M = A or C which could be freely used by BioJava
> and other projects as a test case?
>
> (I'm specifically asking for Biopython as I'm not sure if anyone
> has tried this with our ABI parser)
>
> Thanks,
>
> Peter
>
> On Tue, Jul 12, 2016 at 4:26 PM, Jonas Dehairs <[email protected]>
> wrote:
>> The 4.2 API currently does not have methods for importing and
>> handling Sanger sequencing files (ABI, SCF). I'm currently resorting
>> to the legacy classes in 1.9.1 (ChromatogramFactory and Chromatogram).
>>
>> ChromatogramFactory only supports Sanger trace files with standard
>> ATGCN characters. It throws an
>> UnsupportedChromatogramFormatException upon reading Sanger files with
>> IUPAC Ambiguity Codes (for example M = A or C). Even if I would just
>> like to access the traces and ignore the base calls, this is
>> impossible with the current implementation since we can't even open
>> the file if it contains Ambiguity codes.
>>
>> On a side note, I have been getting more and more questions from users
>> why they can't open their Sanger sequencing files (in my program that
>> uses BioJava). I think the popularity of CRISPR and the
>> characterization of CRISPR KO clones (which is likely to result in
>> heterozygous base calls) is increasing the number of people that have
>> these IUPAC Ambiguity Sanger files.
>>
>> For now, I tell people to go back to the Sanger sequencing software
>> that exports the ABI or SCF files and disable IUPAC Ambiguity in the
>> export options. In that case the base calling algorithm just picks the
>> strongest signals in case of ambiguity and sticks to standard ATGCN
>> characters.
>>
>> Anyway, I am requesting the addition of the Chromatogram classes to
>> the new API with support for opening files if they contain IUPAC
>> Ambiguity Codes.
>>
>> Thank you for this useful API,
>> _______________________________________________
>> Biojava-l mailing list - [email protected]
>> http://mailman.open-bio.org/mailman/listinfo/biojava-l
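
For anyone following the thread, here is a minimal Python sketch of the difference between accepting only ATGCN base calls and accepting the full set of IUPAC nucleotide ambiguity codes (M = A or C, and so on). It is not taken from BioJava or Biopython; the names IUPAC_CODES, is_strict_base_call, is_iupac_base_call and ambiguous_positions are hypothetical and exist only for illustration.

```python
# Illustrative sketch only: these names are hypothetical and are not part
# of BioJava or Biopython.

# Standard IUPAC nucleotide codes mapped to the bases each one stands for.
IUPAC_CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def is_strict_base_call(seq):
    """Return True if seq uses only the characters A, T, G, C and N."""
    return set(seq.upper()) <= set("ATGCN")


def is_iupac_base_call(seq):
    """Return True if every character in seq is an IUPAC nucleotide code."""
    return set(seq.upper()) <= set(IUPAC_CODES)


def ambiguous_positions(seq):
    """List (index, code, possible bases) for two- and three-base codes.

    N and unrecognised characters are skipped.
    """
    return [
        (i, code, IUPAC_CODES[code])
        for i, code in enumerate(seq.upper())
        if code in IUPAC_CODES and len(IUPAC_CODES[code]) in (2, 3)
    ]
```

A base-call string such as "ACGMT" fails the strict check but passes the IUPAC one, and ambiguous_positions reports the M as a possible A or C. That is the kind of heterozygous call described above for CRISPR KO clones.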
