Thanks, I filed this as a feature request for the BioJava 4 series on GitHub.
Andreas

On Wed, Jul 13, 2016 at 4:10 AM, Peter Cock <[email protected]> wrote:
> Hi Jonas,
>
> Thanks for emailing me that example with an M in the sequence.
> Biopython could parse it fine, and having checked our existing
> sample test files, this one has K, R and Y bases:
>
> https://github.com/biopython/biopython/blob/master/Tests/Abi/3730.ab1
>
> BioJava would be welcome to use that (double check with
> Bow, CC'd, if you need it explicitly under a different licence).
>
> Regards,
>
> Peter
>
>
> On Tue, Jul 12, 2016 at 5:41 PM, Peter Cock <[email protected]> wrote:
> > Hi Jonas,
> >
> > Are you happy to share sample file(s) using IUPAC ambiguity
> > codes like M = A or C which could be freely used by BioJava
> > and other projects as a test case?
> >
> > (I'm specifically asking for Biopython as I'm not sure if anyone
> > has tried this with our ABI parser)
> >
> > Thanks,
> >
> > Peter
> >
> > On Tue, Jul 12, 2016 at 4:26 PM, Jonas Dehairs <[email protected]> wrote:
> >> The 4.2 API currently does not have methods for importing and
> >> handling Sanger sequencing files (ABI, SCF). I'm currently resorting
> >> to the legacy classes in 1.9.1 (ChromatogramFactory and Chromatogram).
> >>
> >> ChromatogramFactory only supports Sanger trace files with standard
> >> ATGCN characters. It throws an
> >> UnsupportedChromatogramFormatException upon reading Sanger files with
> >> IUPAC ambiguity codes (for example M = A or C). Even if I just want
> >> to access the traces and ignore the base calls, this is
> >> impossible with the current implementation, since we can't even open
> >> the file if it contains ambiguity codes.
> >>
> >> On a side note, I have been getting more and more questions from users
> >> about why they can't open their Sanger sequencing files (in my program
> >> that uses BioJava). I think the popularity of CRISPR and the
> >> characterization of CRISPR KO clones (which is likely to result in
> >> heterozygous base calls) is increasing the number of people who have
> >> Sanger files containing IUPAC ambiguity codes.
> >>
> >> For now, I tell people to go back to the Sanger sequencing software
> >> that exports the ABI or SCF files and disable IUPAC ambiguity in the
> >> export options. In that case the base calling algorithm just picks the
> >> strongest signal in case of ambiguity and sticks to standard ATGCN
> >> characters.
> >>
> >> Anyway, I am requesting the addition of the Chromatogram classes to
> >> the new API, with support for opening files that contain IUPAC
> >> ambiguity codes.
> >>
> >> Thank you for this useful API,
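
For readers unfamiliar with the ambiguity codes discussed in this thread
(M = A or C, plus the K, R and Y bases mentioned for the 3730.ab1 sample),
here is a minimal sketch of the standard IUPAC nucleotide table in plain
Python. It is only an illustration and is not taken from BioJava or
Biopython; the names IUPAC_DNA, expand_base_call and ambiguous_positions
are hypothetical.

```python
# Standard IUPAC nucleotide codes and the plain bases each one stands for.
# The table is the standard one; the function names are made up for this sketch.
IUPAC_DNA = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def expand_base_call(code):
    """Return the set of plain bases that one base-call character can mean."""
    try:
        return set(IUPAC_DNA[code.upper()])
    except KeyError:
        raise ValueError(f"not an IUPAC nucleotide code: {code!r}")


def ambiguous_positions(base_calls):
    """List (index, code) for every call outside the plain ATGCN alphabet."""
    return [(i, c) for i, c in enumerate(base_calls.upper()) if c not in "ACGTN"]
```

A parser that only accepts ATGCN would reject any read for which
ambiguous_positions returns a non-empty list, which is the situation
described in the original request.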
_______________________________________________ Biojava-l mailing list - [email protected] http://mailman.open-bio.org/mailman/listinfo/biojava-l
