Thank you! I can foresee some brain stretching in my future.

And yes, just two sequences of text, and what should be very similar text. (I'm 
trying to write tests for a conversion process.)

-- Graydon

On Thu, Feb 12, 2026, at 07:12, David Birnbaum wrote:
> 
> With just two sequences you can use Needleman-Wunsch. It’s a dynamic 
> programming algorithm that provides an optimal alignment (good thing, 
> although there may be more than one optimal alignment), but it doesn’t scale 
> well (not good thing). I describe an XSLT 3.0 implementation in my 2020 
> XMLPrague paper at 
> https://archive.xmlprague.cz/2020/files/xmlprague-2020-proceedings.pdf
> 
> Your question doesn’t clarify whether you’re looking for index numbers in the 
> alignment (where a word in one input might be matched by a gap in the other) 
> or in the inputs (where aligned words share a position in the alignment but 
> may have different positions in the inputs). For either of those 
> interpretations, though, a solution will begin by finding an alignment. 
> 
> David J. Birnbaum
> [email protected]
> 
>> On Feb 11, 2026, at 9:41 PM, Graydon Saunders <[email protected]> 
>> wrote:
>> 
>> Hello!
>> 
>> If I have two (fairly long) sequences of text, ('The', 'words', 'are', 
>> 'sequence', 'members') and I want all the index numbers of matching pairs 
>> despite the sequences only mostly matching (so a word, or several words, can 
>> be missing from sequence A or sequence B), is there an established algorithm 
>> for doing this?
>> 
>> (If I search on "aligning sequences" I get bioinformatics about gene 
>> sequences; if I search on "aligning text" I get typography.)
>> 
>> Thanks!
>> Graydon
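
For anyone who finds this thread later, here is a minimal Python sketch of the Needleman-Wunsch approach suggested above, applied to word sequences. It is not the XSLT 3.0 implementation from the XML Prague paper. The function names (`align`, `matching_pairs`) and the scoring values (match +1, mismatch -1, gap -1) are assumptions chosen for illustration. `align` returns positions in the alignment, with `None` marking a gap, and `matching_pairs` reduces that to 0-based positions in the two inputs, which covers both readings of the question.

```python
def align(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align sequences a and b (Needleman-Wunsch).

    Returns one optimal alignment as a list of (i, j) pairs of 0-based
    input positions; i or j is None where that input has a gap.
    The scoring values are illustrative assumptions, not canonical.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i, j):
        # Cost of pairing a[i-1] with b[j-1]
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # pair the two items
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right corner to recover one alignment.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            pairs.append((i - 1, None))
            i -= 1
        else:
            pairs.append((None, j - 1))
            j -= 1
    pairs.reverse()
    return pairs


def matching_pairs(a, b):
    """Input-position pairs (i, j) where the aligned items are equal."""
    return [
        (i, j)
        for i, j in align(a, b)
        if i is not None and j is not None and a[i] == b[j]
    ]


seq_a = ['The', 'words', 'are', 'sequence', 'members']
seq_b = ['The', 'words', 'sequence', 'members']
print(align(seq_a, seq_b))
print(matching_pairs(seq_a, seq_b))
```

The table has (len(a) + 1) × (len(b) + 1) cells, so time and memory grow with the product of the two lengths, which is the scaling problem mentioned above. When more than one alignment is optimal, the traceback returns only one of them, preferring a pairing over a gap.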
