With just two sequences you can use Needleman-Wunsch. It’s a dynamic 
programming algorithm that finds an optimal alignment (a good thing, although 
there may be more than one optimal alignment), but its time and memory grow 
with the product of the two sequence lengths, so it doesn’t scale well to long 
inputs (not a good thing). I describe an XSLT 3.0 implementation in my 2020 
XML Prague paper, in the proceedings at 
https://archive.xmlprague.cz/2020/files/xmlprague-2020-proceedings.pdf
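
For anyone who wants to try the idea outside XSLT, here is a minimal Python sketch of Needleman-Wunsch over word tokens. It is not the implementation from the paper. The function name `needleman_wunsch` and the scores (match +1, mismatch -1, gap -1) are illustrative assumptions. The score table has (len(a) + 1) * (len(b) + 1) cells, which is where the quadratic time and memory come from, and the traceback returns just one of possibly several optimal alignments.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align word sequences a and b (hypothetical helper).

    Returns one optimal alignment as a list of columns (word_a, word_b),
    where None marks a gap. The default scores are assumptions.
    """
    m, n = len(a), len(b)

    def pair(x, y):
        # Score for putting x and y in the same column.
        return match if x == y else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j].
    score = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        score[i][0] = i * gap  # a[:i] against nothing: all gaps
    for j in range(1, n + 1):
        score[0][j] = j * gap  # b[:j] against nothing: all gaps
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]),  # align both words
                score[i - 1][j] + gap,                           # a[i-1] against a gap
                score[i][j - 1] + gap,                           # b[j-1] against a gap
            )

    # Trace back from the bottom-right corner. When several moves tie, the
    # order of these checks decides which optimal alignment is returned.
    columns = []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]):
            columns.append((a[i - 1], b[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            columns.append((a[i - 1], None))
            i -= 1
        else:
            columns.append((None, b[j - 1]))
            j -= 1
    columns.reverse()
    return columns


a = ['The', 'words', 'are', 'sequence', 'members']
b = ['The', 'words', 'sequence', 'members']
print(needleman_wunsch(a, b))
# -> [('The', 'The'), ('words', 'words'), ('are', None),
#     ('sequence', 'sequence'), ('members', 'members')]
```

With these scores a single mismatch (-1) is cheaper than two gaps (-2), so the alignment can put two different words in one column; if only exact matches matter, keep just the columns whose two words are equal.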

Your question doesn’t make clear whether you’re looking for index numbers in the 
alignment (where a word in one input might be matched by a gap in the other) or 
in the inputs (where aligned words share a position in the alignment but may 
have different positions in the inputs). For either of those interpretations, 
though, a solution will begin by finding an alignment. 
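
Once you have an alignment, both kinds of index numbers fall out of a single pass over it. The sketch below assumes the alignment is a list of (word_a, word_b) columns with None for a gap; that representation and the name `aligned_indices` are hypothetical, not from the question or the paper. It reports each match both as a position in the alignment and as a (position in A, position in B) pair, counting from 0 as Python does (add 1 for XPath-style positions).

```python
def aligned_indices(columns):
    """Return (alignment_positions, input_pairs) for columns whose words match.

    columns: list of (word_a, word_b) tuples, None marking a gap (assumed format).
    """
    alignment_positions = []  # index of each matching column within the alignment
    input_pairs = []          # (index in A, index in B) for the same matches
    i = j = 0                 # next unread position in A and in B
    for k, (wa, wb) in enumerate(columns):
        if wa is not None and wa == wb:
            alignment_positions.append(k)
            input_pairs.append((i, j))
        # A gap does not advance the counter for its own sequence.
        if wa is not None:
            i += 1
        if wb is not None:
            j += 1
    return alignment_positions, input_pairs


# Example alignment: 'are' is missing from B, and 'of' is missing from A.
columns = [
    ('The', 'The'),
    ('words', 'words'),
    ('are', None),
    ('sequence', 'sequence'),
    (None, 'of'),
    ('members', 'members'),
]
positions, pairs = aligned_indices(columns)
print(positions)  # [0, 1, 3, 5]
print(pairs)      # [(0, 0), (1, 1), (3, 2), (4, 4)]
```

The two lists differ exactly where a gap shifts one input relative to the other: 'sequence' is column 3 of the alignment but position 3 in A and position 2 in B.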

David J. Birnbaum

> On Feb 11, 2026, at 9:41 PM, Graydon Saunders wrote:
> 
> 
> Hello!
> 
> If I have two (fairly long) sequences of text, ('The', 'words', 'are', 
> 'sequence', 'members') and I want all the index numbers of matching pairs 
> despite the sequences only mostly matching (so a word, or several words, can 
> be missing from sequence A or sequence B), is there an established algorithm 
> for doing this?
> 
> (If I search on "aligning sequences" I get bioinformatics about gene 
> sequences; if I search on "aligning text" I get typography.)
> 
> Thanks!
> Graydon
