I've added the gap symbol to DNA & RNA compound sets. Hopefully this error will go away. If not then we'll have to look into the alignment code & get it to use the gap symbol.
Andy

On 6 Dec 2010, at 20:00, Scooter Willis wrote:

> It would be nice to have a cool indexing system that allowed dynamic indexes
> of the data model but not worth the headache. If we are going to go big we
> should use the same gap symbols that were added for protein sequences.
>
> Scooter
>
> On Mon, Dec 6, 2010 at 2:32 PM, Andy Yates <[email protected]> wrote:
> I would say partially an oversight on my part & partially done on purpose (a
> gap is not a nucleotide after all). However I'm all in favour of being
> pragmatic here so let's add them in. If I get an okay from the relevant
> parties I'll commit the change in.
>
> Andy
>
> On 6 Dec 2010, at 18:41, Chris Friedline wrote:
>
> > OK, so here's a quick fix now that I know where to look. In my local
> > source I added the following line to the constructor of DNACompoundSet
> > and recompiled.
> >
> > addNucleotideCompound("-", "-");
> >
> > Not sure if this is the correct place for it in terms of what the devs
> > want to do globally, but it gets me moving forward again. Gap
> > characters are in AminoAcidCompoundSet so I'm wondering if this was
> > just a tiny oversight on the nucleotide front.
> >
> > Thanks again for the help everyone,
> > Chris
> >
> > On Mon, Dec 6, 2010 at 1:28 PM, Chris Friedline <[email protected]> wrote:
> >> That does help, thanks. However, when calling getAsList() on the
> >> aligned sequences and printing, this is what I see. Something seems
> >> wrong. It does appear as though null is being inserted where there
> >> should be gaps:
> >>
> >> seq = [A, A, C, A, C, T, T, G, A, C, A, T, G, T, T, C, null, G, T, C,
> >> G, C, A, A, C, T, T, T, T, A, A, G, A, G, A, T, T, A, G, A, G, T, T,
> >> T, T, C, G, G, T, T, C, G, G, C, C, G, G, A, C, G, A, A, A, C, A, C,
> >> null, null, null, null, null, null]
> >> seq = [T, A, C, C, C, T, T, A, A, C, A, T, null, null, T, C, A, G, T,
> >> G, A, C, A, A, C, C, T, C, null, null, A, G, A, G, A, T, G, A, G,
> >> null, G, C, T, T, T, C, T, C, T, T, C, G, G, null, null, null, null,
> >> null, null, null, A, G, A, C, A, C, T, G, G, G, A, T]
> >>
> >> Chris
> >>
> >> On Mon, Dec 6, 2010 at 12:22 PM, Andreas Prlic <[email protected]> wrote:
> >>> Hi Andy,
> >>>
> >>> Check out the SimpleAlignedSequence class, for how Gaps are handled...
> >>> Does that help?
> >>>
> >>> Andreas
> >>>
> >>> On Mon, Dec 6, 2010 at 7:13 AM, Andy Yates <[email protected]> wrote:
> >>>> So myself & Chris have discussed this off list & we believe it's because
> >>>> of a NULL compound element in the Sequence given to the SequenceMixin
> >>>> method.
> >>>>
> >>>> Does anyone on list know how the AlignedSequence code encodes gaps & the
> >>>> like?
> >>>>
> >>>> Andy
> >>>>
> >>>> On 6 Dec 2010, at 13:50, Andy Yates wrote:
> >>>>
> >>>>> Hi Chris,
> >>>>>
> >>>>> Well that's going into my toStringBuilder() method & that particular
> >>>>> line is concerned with asking a compound for its String representation.
> >>>>> How often do we get nulls in our Sequences, and how should we deal with
> >>>>> them? After all the Sequence AGTCNULLAGTC is probably more harmful than
> >>>>> helpful.
> >>>>>
> >>>>> Andy
> >>>>>
> >>>>> On 6 Dec 2010, at 12:45, Chris Friedline wrote:
> >>>>>
> >>>>>> Hello,
> >>>>>>
> >>>>>> Found another potential error case, this time in beta2 (fresh pull
> >>>>>> from git last evening). For more info, please see
> >>>>>> http://pastie.org/1351388 for test case and stack trace. The JUnit
> >>>>>> test passes simply because the pair object is not null, but fails when
> >>>>>> trying to extract any information from the pair itself (toString(),
> >>>>>> getIdenticals(), etc). The substitution matrix file is from
> >>>>>> ftp://ftp.ncbi.nih.gov/blast/matrices. I'm doing large numbers of
> >>>>>> pairwise alignments, which do not all fail, but most do with this same
> >>>>>> error.
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Chris
> >>>>>>
> >>>>>> --
> >>>>>> PhD Candidate, Integrative Life Sciences
> >>>>>> Virginia Commonwealth University
> >>>>>> Richmond, VA
> >>>>>>
> >>>>>> _______________________________________________
> >>>>>> Biojava-l mailing list - [email protected]
> >>>>>> http://lists.open-bio.org/mailman/listinfo/biojava-l
> >>>>
> >>>> --
> >>>> Andrew Yates Ensembl Genomes Engineer
> >>>> EMBL-EBI Tel: +44-(0)1223-492538
> >>>> Wellcome Trust Genome Campus Fax: +44-(0)1223-494468
> >>>> Cambridge CB10 1SD, UK http://www.ensemblgenomes.org/
> >>>
> >>> --
> >>> -----------------------------------------------------------------------
> >>> Dr. Andreas Prlic
> >>> Senior Scientist, RCSB PDB Protein Data Bank
> >>> University of California, San Diego
> >>> (+1) 858.246.0526
> >>> -----------------------------------------------------------------------
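
For anyone reading this thread later, the null-for-gap behaviour discussed above can be illustrated with a minimal sketch. This is not the BioJava implementation: the class GapSymbolSketch and its lookup() and asList() methods are hypothetical stand-ins, and only the addNucleotideCompound("-", "-") line is quoted from the thread. The assumption is simply that a compound set returns null for any symbol it has not registered, so an aligned sequence containing "-" yields null entries unless the gap is added to the set.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a nucleotide compound set; NOT the BioJava API.
class GapSymbolSketch {

    // Maps each registered symbol to its complement.
    private final Map<String, String> compounds = new LinkedHashMap<>();

    GapSymbolSketch(boolean includeGap) {
        add("A", "T");
        add("T", "A");
        add("C", "G");
        add("G", "C");
        if (includeGap) {
            // Mirrors the one-line fix quoted in the thread:
            // addNucleotideCompound("-", "-");
            add("-", "-");
        }
    }

    private void add(String symbol, String complement) {
        compounds.put(symbol, complement);
    }

    // Returns the symbol if it is registered, otherwise null.
    String lookup(String symbol) {
        return compounds.containsKey(symbol) ? symbol : null;
    }

    // Converts an aligned string into a list of compounds, one per column.
    List<String> asList(String aligned) {
        List<String> out = new ArrayList<>();
        for (char c : aligned.toCharArray()) {
            out.add(lookup(String.valueOf(c)));
        }
        return out;
    }

    public static void main(String[] args) {
        String aligned = "AC-GT";
        // Without the gap registered, the gap column comes back as null.
        System.out.println(new GapSymbolSketch(false).asList(aligned)); // [A, C, null, G, T]
        // With the gap registered, the column is preserved.
        System.out.println(new GapSymbolSketch(true).asList(aligned));  // [A, C, -, G, T]
    }
}
```

Under that assumption, registering the gap in the compound set is enough to stop nulls appearing in the list; whether the alignment code should instead handle gaps itself is the open question raised at the top of this message.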
