On Apr 13, 11:05 pm, Paul Rubin <http://[EMAIL PROTECTED]> wrote:
> "Rhamphoryncus" <[EMAIL PROTECTED]> writes:
> > >   i = s.index(e) => s[i] = e
> > > Then this algorithm is no longer guaranteed to work with strings.
> > It never worked correctly on unicode strings anyway (which becomes the
> > canonical string in python 3.0).
>
> What?!   Are you sure?  That sounds broken to me.

Nope, it's pretty fundamental to working with text, unicode only being
an extreme example: there are many ways to break down a chunk of text,
making the odds of "e" being any particular kind of unit fairly low.
Python's unicode type only makes this slightly worse, by not promising
that any particular breakdown is available.

For example, if you had an algorithm designed for ascii that gathered
statistics on how common each "character" is, you'd want to redesign
it to use either grapheme clusters or scalar values, then improve it
to merge duplicate characters.  You'd need to roll your own iterator
though, since Python doesn't provide a method that specifically
iterates over grapheme clusters or scalar values (and if I'm wrong I'd
love to hear it!).
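
As a rough sketch of what such a hand-rolled iterator might look like
(my illustration, not anything shipped with Python): the names
rough_clusters and character_stats are made up, the code assumes
Python 3.3 or later where iterating a str yields code points, and the
grouping rule (base character plus trailing combining marks) is only
an approximation of real grapheme clusters.  NFC normalization stands
in for "merging duplicate characters".

```python
import unicodedata
from collections import Counter


def rough_clusters(text):
    """Yield a base character plus any combining marks that follow it.

    Hypothetical helper: only approximates grapheme clusters, and
    ignores Hangul jamo, ZWJ emoji sequences and similar cases.
    """
    cluster = ""
    for ch in text:  # assumes Python 3.3+, where this steps by code point
        # A non-combining character starts a new cluster.
        if cluster and unicodedata.combining(ch) == 0:
            yield cluster
            cluster = ""
        cluster += ch
    if cluster:
        yield cluster


def character_stats(text):
    """Count rough clusters after NFC merges canonically equivalent forms."""
    normalized = unicodedata.normalize("NFC", text)
    return Counter(rough_clusters(normalized))
```

With this, "e" followed by U+0301 and the precomposed U+00E9 are
counted as the same character, which a plain per-element count would
not do.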

--
Adam Olsen, aka Rhamphoryncus

-- 
http://mail.python.org/mailman/listinfo/python-list
