Thank you! I can foresee some brain stretching in my future. And yes, just two sequences of text, and what should be very similar text. (I'm trying to write tests for a conversion process.)
-- Graydon

On Thu, Feb 12, 2026, at 07:12, David Birnbaum wrote:
>
> With just two sequences you can use Needleman-Wunsch. It’s a dynamic
> programming algorithm that provides an optimal alignment (good thing,
> although there may be more than one optimal alignment), but it doesn’t scale
> well (not good thing). I describe an XSLT 3.0 implementation in my 2020
> XMLPrague paper at
> https://archive.xmlprague.cz/2020/files/xmlprague-2020-proceedings.pdf
>
> Your question doesn’t clarify whether you’re looking for index numbers in the
> alignment (where a word in one input might be matched by a gap in the other)
> or in the inputs (where aligned words share a position in the alignment but
> may have different positions in the inputs). For either of those
> interpretations, though, a solution will begin by finding an alignment.
>
> David J. Birnbaum
> [email protected]
>
>> On Feb 11, 2026, at 9:41 PM, Graydon Saunders <[email protected]>
>> wrote:
>>
>> Hello!
>>
>> If I have two (fairly long) sequences of text, ('The', 'words', 'are',
>> 'sequence', 'members') and I want all the index numbers of matching pairs
>> despite the sequences only mostly matching (so a word, or several words, can
>> be missing from sequence A or sequence B), is there an established algorithm
>> for doing this?
>>
>> (If I search on "aligning sequences" I get bioinformatics about gene
>> sequences; if I search on "aligning text" I get typography.)
>>
>> Thanks!
>> Graydon
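
For anyone reading along in the archive, here is a rough sketch of the Needleman-Wunsch idea from the quoted reply, in Python rather than the XSLT 3.0 of the paper. It is not David's implementation. The function names (align_words, matching_indices) and the scoring values (match=1, mismatch=-1, gap=-1) are my own assumptions for illustration, and where several optimal alignments exist it simply returns one of them. It reports positions in the inputs, with None marking a gap.

```python
def align_words(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align two word sequences (Needleman-Wunsch sketch).

    Returns a list of (i, j) index pairs into a and b; None marks a gap.
    The scoring values are illustrative assumptions, not tuned settings.
    """
    n, m = len(a), len(b)

    def pair_score(i, j):
        return match if a[i] == b[j] else mismatch

    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(i - 1, j - 1),  # pair the words
                score[i - 1][j] + gap,                           # gap in b
                score[i][j - 1] + gap,                           # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0
                and score[i][j] == score[i - 1][j - 1] + pair_score(i - 1, j - 1)):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            pairs.append((i - 1, None))
            i -= 1
        else:
            pairs.append((None, j - 1))
            j -= 1
    pairs.reverse()
    return pairs


def matching_indices(a, b):
    """Keep only the aligned pairs whose words are actually equal."""
    return [(i, j) for i, j in align_words(a, b)
            if i is not None and j is not None and a[i] == b[j]]


seq_a = ['The', 'words', 'are', 'sequence', 'members']
seq_b = ['The', 'words', 'sequence', 'members']
print(matching_indices(seq_a, seq_b))
# prints [(0, 0), (1, 1), (3, 2), (4, 3)]
```

The score table has (len(a) + 1) * (len(b) + 1) cells, which is the "doesn't scale well" part: time and memory both grow with the product of the two lengths.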

