On Oct 6, 8:39 am, Alexey Moskvin <[EMAIL PROTECTED]> wrote:
> Martin, thanks for fast reply, now anything is ok!
>
> On Oct 6, 1:30 am, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
> > > I have a set of strings (all letters are capitalized) at utf-8,
> >
> > That's the problem. If these are really utf-8 encoded byte strings,
> > then .lower likely won't work. It uses the C library's tolower API,
> > which works on a byte level, i.e. can't work for multi-byte encodings.
> >
> > What you need to do is to operate on Unicode strings. I.e. instead
> > of
> >
> >   s.lower()
> >
> > do
> >
> >   s.decode("utf-8").lower()
> >
> > or (if you need byte strings back)
> >
> >   s.decode("utf-8").lower().encode("utf-8")
> >
> > If you find that you write the latter, I recommend that you redesign
> > your application. Don't use byte strings to represent text, but use
> > Unicode strings all the time, except at the system boundary (where
> > you decode/encode as appropriate).
> >
> > There are some limitations with Unicode .lower also, but I don't
> > think they apply to Russian (specifically, SpecialCasing.txt is
> > not considered).
> >
> > HTH,
> > Martin
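To make the decode/lower/encode round trip quoted above concrete, here is a minimal sketch. The helper name lower_utf8 and the sample word are made up for illustration and are not from the thread. The sample is written with \u escapes so that the source file encoding does not matter.

```python
def lower_utf8(data):
    """Lowercase UTF-8 encoded bytes by going through a Unicode string."""
    # Decode at the boundary: byte string -> Unicode string.
    text = data.decode("utf-8")
    # Case conversion happens on code points, not on individual bytes.
    lowered_text = text.lower()
    # Encode again only because the caller asked for bytes back.
    return lowered_text.encode("utf-8")


# Hypothetical sample: the Russian word "MOSKVA" in capital Cyrillic letters.
raw = u"\u041c\u041e\u0421\u041a\u0412\u0410".encode("utf-8")
lowered = lower_utf8(raw)
```

As Martin notes, if this helper ends up being called all over a program, it is usually a sign that the text should be kept as Unicode internally and encoded only on output.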
Alexey, if your strings are stored in a text file, you can use the
"codecs" module, so that everything you read from the file is already
decoded to Unicode:

> import codecs
> handler = codecs.open('somefile', 'r', 'utf-8')
> # ... do the job
> handler.close()

This is how I prefer to deal with Russian text in utf-8.

Konstantin.

-- 
http://mail.python.org/mailman/listinfo/python-list
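P.S. A slightly fuller sketch of the codecs approach, in case it helps. The function name read_lowercased and the throwaway demo file are assumptions for illustration only. In newer Python versions, io.open (or the built-in open with an encoding argument) can be used the same way.

```python
import codecs
import os
import tempfile


def read_lowercased(path, encoding="utf-8"):
    """Return the lines of a text file, decoded and lowercased."""
    handler = codecs.open(path, "r", encoding)
    try:
        # Each line arrives already decoded as a Unicode string,
        # so .lower() works on characters rather than bytes.
        return [line.rstrip(u"\r\n").lower() for line in handler]
    finally:
        handler.close()


# Demo with a temporary file holding two capitalized Russian words.
fd, demo_path = tempfile.mkstemp()
os.close(fd)
out = codecs.open(demo_path, "w", "utf-8")
out.write(u"\u041c\u041e\u0421\u041a\u0412\u0410\n\u041f\u0418\u0422\u0415\u0420\n")
out.close()

result = read_lowercased(demo_path)
os.remove(demo_path)
```

The try/finally makes sure the file is closed even if the processing in the middle raises an exception.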