Thank you, I tried again and figured it out. The problem was with Beautiful Soup. I used it a year ago, also on Chinese HTML pages, and no errors occurred. I reread the old code and found the difference: convert the page to Unicode before feeding it to Beautiful Soup, and then everything works.
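
For anyone who hits the same thing, here is a minimal sketch of "decode first, then parse". It assumes Python 3 and the bs4 package rather than the older Beautiful Soup I was actually using, and the sample page, the helper name to_unicode, and the candidate encodings (utf-8, then gb18030) are made up for illustration; use whatever encoding your pages really declare.

```python
def to_unicode(raw_bytes, encodings=("utf-8", "gb18030")):
    """Decode raw page bytes, trying each candidate encoding in order.

    Hypothetical helper: the encoding list is an assumption, not
    something taken from the original program.
    """
    for enc in encodings:
        try:
            return raw_bytes.decode(enc)
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings matched")


# Made-up sample page, encoded the way a Chinese site might serve it.
raw_page = "<html><title>中文 - - 标题</title></html>".encode("gb18030")

# Decode to a Unicode string *before* handing it to the parser.
page = to_unicode(raw_page)

try:
    from bs4 import BeautifulSoup  # third-party; optional for this sketch

    soup = BeautifulSoup(page, "html.parser")
    title_text = soup.title.string
except ImportError:
    title_text = None  # bs4 not installed; the decoding step still ran
```

The point is only the order of operations: the parser receives text that is already Unicode, so it never has to guess the encoding.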
On Jan 28, 3:26 am, "Paul McGuire" <[EMAIL PROTECTED]> wrote:
> After looking at the pyparsing results, I think I see the problem with
> your original code. You are selecting only the characters after the
> rightmost "-" character, but you really want to select everything to
> the right of "- -". In some of the titles, the encoded Chinese
> includes a "-" character, so you are chopping off everything before
> that.
>
> Try changing your code to:
>     title=full_title.split("- -")[1]
>
> I think then your original program will work.
>
> -- Paul
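
To make the difference Paul describes concrete, here is a small sketch comparing the two ways of cutting the title. The sample string is invented, and the maxsplit argument and the strip() call are my additions on top of his one-liner, so treat them as assumptions.

```python
# Invented example: site name, the "- -" separator, then a title that
# itself contains a "-" character.
full_title = "Site Name - - abc-def"

# Taking everything after the rightmost "-" loses part of the title.
after_last_dash = full_title.rsplit("-", 1)[1]

# Splitting on the "- -" separator keeps the whole title.
# maxsplit=1 guards against a second "- -" inside the title itself.
after_separator = full_title.split("- -", 1)[1].strip()

print(repr(after_last_dash))
print(repr(after_separator))
```

With this sample the first approach keeps only "def", while the second keeps "abc-def".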