I've added the gap symbol to the DNA & RNA compound sets. Hopefully this error 
will go away. If not, we'll have to look into the alignment code & get it to 
use the gap symbol.

Andy
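
For anyone reading this thread later, here is a minimal sketch of what the 
nulls discussed below most likely come from. It is not BioJava code: the class 
GapSymbolSketch and its methods compoundSet, toCompounds and render are 
hypothetical stand-ins, and the assumption is simply that a symbol which was 
never registered in a compound set looks up as null.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy stand-in for a compound set; names are hypothetical, NOT the BioJava API.
class GapSymbolSketch {

    // Builds a symbol registry for the four DNA bases, optionally with the gap symbol.
    static Map<String, String> compoundSet(boolean withGap) {
        Map<String, String> set = new HashMap<>();
        for (String base : new String[] {"A", "C", "G", "T"}) {
            set.put(base, base);
        }
        if (withGap) {
            set.put("-", "-"); // assumption: registering "-" is what makes gaps resolvable
        }
        return set;
    }

    // Looks up each character of an aligned string; unregistered symbols yield null.
    static List<String> toCompounds(String aligned, Map<String, String> set) {
        List<String> out = new ArrayList<>();
        for (char c : aligned.toCharArray()) {
            out.add(set.get(String.valueOf(c)));
        }
        return out;
    }

    // Builds the sequence string, failing loudly on a null rather than hiding it.
    static String render(List<String> compounds) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < compounds.size(); i++) {
            String c = compounds.get(i);
            if (c == null) {
                throw new IllegalStateException("null compound at index " + i);
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Without the gap symbol the gap position resolves to null.
        System.out.println(toCompounds("AC-GT", compoundSet(false))); // prints [A, C, null, G, T]
        // With the gap symbol registered the sequence renders cleanly.
        System.out.println(render(toCompounds("AC-GT", compoundSet(true)))); // prints AC-GT
    }
}
```

Throwing on a null in render is one design choice among several; the point is 
only that silently emitting "null" into a sequence string hides the real 
problem, which is the missing gap symbol.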

On 6 Dec 2010, at 20:00, Scooter Willis wrote:

> It would be nice to have a cool indexing system that allowed dynamic indexes 
> of the data model but not worth the headache. If we are going to go big we 
> should use the same gap symbols that were added for protein sequences.
> 
> Scooter
> 
> On Mon, Dec 6, 2010 at 2:32 PM, Andy Yates <[email protected]> wrote:
> I would say partially an oversight on my part & partially done on purpose (a 
> gap is not a nucleotide after all). However I'm all in favour of being 
> pragmatic here so let's add them in. If I get an okay from the relevant 
> parties I'll commit the change.
> 
> Andy
> 
> On 6 Dec 2010, at 18:41, Chris Friedline wrote:
> 
> > OK, so here's a quick fix now that I know where to look.  In my local
> > source I added the following line to the constructor of DNACompoundSet
> > and recompiled.
> >
> > addNucleotideCompound("-", "-");
> >
> > Not sure if this is the correct place for it in terms of what the devs
> > want to do globally, but it gets me moving forward again.  Gap
> > characters are in AminoAcidCompoundSet so I'm wondering if this was
> > just a tiny oversight on the nucleotide front.
> >
> > Thanks again for the help everyone,
> > Chris
> >
> > On Mon, Dec 6, 2010 at 1:28 PM, Chris Friedline <[email protected]> wrote:
> >> That does help, thanks.  However, when calling getAsList() on the
> >> aligned sequences and printing, this is what I see.  Something seems
> >> wrong.  It appears as though null is being inserted where there
> >> should be gaps.
> >>
> >> seq = [A, A, C, A, C, T, T, G, A, C, A, T, G, T, T, C, null, G, T, C,
> >> G, C, A, A, C, T, T, T, T, A, A, G, A, G, A, T, T, A, G, A, G, T, T,
> >> T, T, C, G, G, T, T, C, G, G, C, C, G, G, A, C, G, A, A, A, C, A, C,
> >> null, null, null, null, null, null]
> >> seq = [T, A, C, C, C, T, T, A, A, C, A, T, null, null, T, C, A, G, T,
> >> G, A, C, A, A, C, C, T, C, null, null, A, G, A, G, A, T, G, A, G,
> >> null, G, C, T, T, T, C, T, C, T, T, C, G, G, null, null, null, null,
> >> null, null, null, A, G, A, C, A, C, T, G, G, G, A, T]
> >>
> >> Chris
> >>
> >> On Mon, Dec 6, 2010 at 12:22 PM, Andreas Prlic <[email protected]> wrote:
> >>> Hi Andy,
> >>>
> >>> Check out the SimpleAlignedSequence class, for how Gaps are handled...
> >>> Does that help?
> >>>
> >>> Andreas
> >>>
> >>> On Mon, Dec 6, 2010 at 7:13 AM, Andy Yates <[email protected]> wrote:
> >>>> So Chris & I have discussed this off list & we believe it's because 
> >>>> of a NULL compound element in the Sequence given to the SequenceMixin 
> >>>> method.
> >>>>
> >>>> Does anyone on list know how the AlignedSequence code encodes gaps & the 
> >>>> like?
> >>>>
> >>>> Andy
> >>>>
> >>>> On 6 Dec 2010, at 13:50, Andy Yates wrote:
> >>>>
> >>>>> Hi Chris,
> >>>>>
> >>>>> Well that's going into my toStringBuilder() method & that particular 
> >>>>> line is concerned with asking a compound for its String representation. 
> >>>>> How often do we get nulls in our Sequences, and how should we deal with 
> >>>>> them? After all, the Sequence AGTCNULLAGTC is probably more harmful than 
> >>>>> helpful.
> >>>>>
> >>>>> Andy
> >>>>>
> >>>>> On 6 Dec 2010, at 12:45, Chris Friedline wrote:
> >>>>>
> >>>>>> Hello,
> >>>>>>
> >>>>>> Found another potential error case, this time in beta2 (fresh pull
> >>>>>> from git last evening).  For more info, please see
> >>>>>> http://pastie.org/1351388 for test case and stack trace.  The JUnit
> >>>>>> test passes simply because the pair object is not null, but fails when
> >>>>>> trying to extract any information from the pair itself (toString(),
> >>>>>> getIdenticals(), etc). The substitution matrix file is from
> >>>>>> ftp://ftp.ncbi.nih.gov/blast/matrices.  I'm doing large numbers of
> >>>>>> pairwise alignments, which do not all fail, but most do with this same
> >>>>>> error.
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Chris
> >>>>>>
> >>>>>> --
> >>>>>> PhD Candidate, Integrative Life Sciences
> >>>>>> Virginia Commonwealth University
> >>>>>> Richmond, VA
> >>>>>>
> >>>>>> _______________________________________________
> >>>>>> Biojava-l mailing list  -  [email protected]
> >>>>>> http://lists.open-bio.org/mailman/listinfo/biojava-l
> >>>>
> >>>> --
> >>>> Andrew Yates                   Ensembl Genomes Engineer
> >>>> EMBL-EBI                       Tel: +44-(0)1223-492538
> >>>> Wellcome Trust Genome Campus   Fax: +44-(0)1223-494468
> >>>> Cambridge CB10 1SD, UK         http://www.ensemblgenomes.org/
> >>>>
> >>>
> >>>
> >>>
> >>> --
> >>> -----------------------------------------------------------------------
> >>> Dr. Andreas Prlic
> >>> Senior Scientist, RCSB PDB Protein Data Bank
> >>> University of California, San Diego
> >>> (+1) 858.246.0526
> >>> -----------------------------------------------------------------------
> >>>
> >>
> >>
> >
> 

-- 
Andrew Yates                   Ensembl Genomes Engineer
EMBL-EBI                       Tel: +44-(0)1223-492538
Wellcome Trust Genome Campus   Fax: +44-(0)1223-494468
Cambridge CB10 1SD, UK         http://www.ensemblgenomes.org/





