Mark & Richard,

Thanks for the quick responses.
It looks like the combination of KnuthMorrisPrattSearch and edit() will do
just what I need.

FYI, the SymbolListCharSequence won't work for me, as I'm actually porting
the code to .NET, and the .NET Regex engine isn't flexible enough to accept
a non-string. (Please don't hate me; I'm working in a Java-averse
environment, and I want to take advantage of all the BioJava goodness.)

Cheers,
-Doug

On Fri, Sep 19, 2008 at 7:43 AM, Mark Schreiber <[EMAIL PROTECTED]> wrote:

> Hi -
>
> You don't have to go to a String to make a match. There is a class,
> SymbolListCharSequence, that wraps a SymbolList as a CharSequence, which
> lets you run regexes etc. to identify the match. You can also use
> KnuthMorrisPrattSearch to find exact matches.
>
> Finally, to find non-exact matches you can use SmithWaterman or
> NeedlemanWunsch.
>
> - Mark
>
> On Fri, Sep 19, 2008 at 4:42 PM, Richard Holland <[EMAIL PROTECTED]> wrote:
>
>> Hello.
>>
>> To be honest, I think you've already got the only way to quickly
>> locate a subsequence within a sequence. For whatever reason, the
>> Sequence and SymbolList interfaces lack any kind of indexOf() or
>> find() functions, and the SequenceTools class, usually the provider of
>> all things useful, also fails to fill the gap.
>>
>> You're right about there being a SymbolList edit facility. This only
>> works on SymbolLists that have declared themselves editable, which
>> will depend on how your SymbolList objects were created. What you do
>> is create a new Edit object, based on starting position in the
>> original sequence, length of sequence to remove in the original, and
>> the SymbolList you want to use to replace the removed bits. Then you
>> pass this to the edit() method on the SymbolList/Sequence object you
>> want to modify.
>>
>> So, the end result is only a small improvement on your original plan,
>> but here goes:
>>
>> 1. Create your sequence.
>> 2. Create your other sequence.
>> 3. Convert both to strings and use an indexOf in the String object to
>>    locate the subsequence in the original sequence.
>> 4. Use string tools to flip the subsequence, then create a new
>>    SymbolList based on it.
>> 5. If the original sequence is editable, use the Edit method
>>    described above to replace a chunk of it with the new flipped
>>    subsequence. Otherwise, construct a new string using the String
>>    object methods and construct a new original sequence based on
>>    that instead.
>>
>> cheers.
>> Richard
>>
>> 2008/9/19 Doug Swisher <[EMAIL PROTECTED]>:
>> > Hi,
>> >
>> > I'm pretty new to BioJava, and I'm a bit stuck. I'm hoping someone
>> > can help out a bit...even if it's just a hint as to where to look
>> > next.
>> >
>> > I have a long DNA sequence and a shorter sequence that exists within
>> > the larger one. I want to find the location of the smaller sequence
>> > within the larger one, and then create a new sequence with the small
>> > one flipped end-for-end. That's confusing, so let me give an example.
>> >
>> >     Long sequence:  aaaagacttttt
>> >     Short sequence: gact
>> >     Goal sequence:  aaaatcagtttt
>> >
>> > To find the location of the short sequence within the larger one, I
>> > could certainly do some string manipulation:
>> >
>> >     SymbolList bigDNA = DNATools.createDNA("aaaagacttttt");
>> >     SymbolList subDNA = DNATools.createDNA("gact");
>> >     // 0-based index of the match, or -1 if subDNA is absent
>> >     int start = bigDNA.seqString().indexOf(subDNA.seqString());
>> >
>> > While that would work, I'm wondering if there is a more efficient
>> > method that avoids the conversion to strings (in my real code, I
>> > start with Sequences, not strings; I used SymbolLists here for
>> > simplicity).
>> >
>> > To "excise" the short sequence, flip it around, and construct a new
>> > SymbolList, I could also do some string manipulation, as in the
>> > following:
>> >
>> >     StringBuilder middle = new StringBuilder(subDNA.seqString());
>> >     // Everything before the match: 0 up to start (not subDNA.length())
>> >     String leftPart = bigDNA.seqString().substring(0, start);
>> >     // Everything after the match
>> >     String rightPart = bigDNA.seqString().substring(
>> >         start + subDNA.length(), bigDNA.length());
>> >     // Reverse the middle and stitch the three parts back together
>> >     SymbolList goalDNA = DNATools.createDNA(
>> >         leftPart + middle.reverse() + rightPart);
>> >
>> > Looking at the documentation, such as ProjectionUtils or
>> > SymbolList.edit(), it appears there might be some support for
>> > manipulating the sequence directly. Is there a way to do it, without
>> > again dropping "down" to strings?
>> >
>> > Thanks in advance for any assistance.
>> >
>> > Cheers,
>> > -Doug
>> >
>> > P.S. Yeah, the second code snippet is pretty inefficient; I was
>> > trying to be clear rather than efficient.
>> > _______________________________________________
>> > Biojava-l mailing list - Biojava-l@lists.open-bio.org
>> > http://lists.open-bio.org/mailman/listinfo/biojava-l
>>
>> --
>> Richard Holland, BSc MBCS
>> Finance Director, Eagle Genomics Ltd
>> M: +44 7500 438846 | E: [EMAIL PROTECTED]
>> http://www.eaglegenomics.com/
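
For anyone reading this thread later, here is a minimal sketch of the
search-and-flip discussed above, written in plain Java with no BioJava
dependency. It is only an illustration under stated assumptions: the class
FlipSubsequence and its methods failureTable, indexOf and flip are
hypothetical names made up for this example, not BioJava API, and it works
on Strings rather than SymbolLists. The search is a hand-written
Knuth-Morris-Pratt match, standing in for the role KnuthMorrisPrattSearch
plays in the thread; the splice stands in for what an Edit passed to edit()
would do on an editable SymbolList.

```java
// Plain-Java sketch; none of these names are BioJava API.
final class FlipSubsequence {

    // KMP failure table: fail[i] is the length of the longest proper
    // prefix of pattern[0..i] that is also a suffix of pattern[0..i].
    static int[] failureTable(String pattern) {
        int[] fail = new int[pattern.length()];
        int k = 0;
        for (int i = 1; i < pattern.length(); i++) {
            while (k > 0 && pattern.charAt(i) != pattern.charAt(k)) {
                k = fail[k - 1];
            }
            if (pattern.charAt(i) == pattern.charAt(k)) {
                k++;
            }
            fail[i] = k;
        }
        return fail;
    }

    // 0-based index of the first exact match of pattern in text, or -1.
    static int indexOf(String text, String pattern) {
        if (pattern.isEmpty()) {
            return 0;
        }
        int[] fail = failureTable(pattern);
        int k = 0; // number of pattern characters matched so far
        for (int i = 0; i < text.length(); i++) {
            while (k > 0 && text.charAt(i) != pattern.charAt(k)) {
                k = fail[k - 1];
            }
            if (text.charAt(i) == pattern.charAt(k)) {
                k++;
            }
            if (k == pattern.length()) {
                return i - k + 1;
            }
        }
        return -1;
    }

    // Replace the first occurrence of sub in big with sub reversed
    // end-for-end. Returns big unchanged when sub does not occur.
    static String flip(String big, String sub) {
        int start = indexOf(big, sub);
        if (start < 0) {
            return big;
        }
        String reversed = new StringBuilder(sub).reverse().toString();
        return big.substring(0, start)
                + reversed
                + big.substring(start + sub.length());
    }

    public static void main(String[] args) {
        // Example from the thread: prints aaaatcagtttt
        System.out.println(flip("aaaagacttttt", "gact"));
    }
}
```

Note that "flipped end-for-end" here means a plain reversal, as in the
example in the original question, not a reverse complement.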