On Apr 13, 11:05 pm, Paul Rubin <http://[EMAIL PROTECTED]> wrote:
> "Rhamphoryncus" <[EMAIL PROTECTED]> writes:
> > > i = s.index(e) => s[i] = e
> > > Then this algorithm is no longer guaranteed to work with strings.
> > It never worked correctly on unicode strings anyway (which becomes the
> > canonical string in python 3.0).
> What?! Are you sure? That sounds broken to me.
Nope, it's pretty fundamental to working with text, unicode being only an extreme example: there are many ways to break down a chunk of text, so the odds of "e" being any particular one are fairly low. Python's unicode type makes this only slightly worse, since it doesn't promise that any particular one is available.

For example, if you had an algorithm designed for ASCII that gathered statistics on how common each "character" is, you'd want to redesign it to use either grapheme clusters or scalar values, then improve it to merge duplicate characters. You'd need to roll your own iterator, though: Python doesn't provide a method that iterates specifically over grapheme clusters or scalar values (and if I'm wrong, I'd love to hear it!).

--
Adam Olsen, aka Rhamphoryncus
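A rough sketch of the hand-rolled iterator and character-statistics pass described above might look like the following. It assumes present-day Python 3.3+, where iterating a str yields code points (scalar values, apart from lone surrogates). The function names are hypothetical. The clustering rule ("a base code point plus any following marks") is a deliberate simplification of the Unicode grapheme-cluster rules (UAX #29) and ignores Hangul syllables, ZWJ emoji sequences, regional-indicator pairs and CR LF. "Merging duplicate characters" is approximated here by NFC normalization, which folds canonically equivalent spellings together.

```python
import unicodedata
from collections import Counter
from typing import Iterator


def approx_grapheme_clusters(text: str) -> Iterator[str]:
    """Yield rough grapheme clusters: a base code point plus any marks that follow.

    Hypothetical helper; only an approximation of UAX #29 extended
    grapheme clusters (no Hangul, ZWJ, regional-indicator or CR LF rules).
    """
    cluster = ""
    for ch in text:
        # Code points whose general category is a mark (Mn, Mc, Me) attach
        # to the current cluster; anything else starts a new cluster.
        if cluster and unicodedata.category(ch) in ("Mn", "Mc", "Me"):
            cluster += ch
        else:
            if cluster:
                yield cluster
            cluster = ch
    if cluster:
        yield cluster


def char_stats(text: str) -> Counter:
    """Count how often each (approximate) user-perceived character occurs.

    NFC normalization first merges canonically equivalent spellings,
    e.g. U+00E9 and U+0065 U+0301 both end up counted as U+00E9.
    """
    normalized = unicodedata.normalize("NFC", text)
    return Counter(approx_grapheme_clusters(normalized))
```

For instance, `char_stats("e\u0301te\u0301")` counts U+00E9 twice and "t" once, even though the input spells each accented letter as two code points. A sequence with no precomposed form, such as `"x\u0301"`, stays two code points long but is still counted as a single cluster.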