Re: extracting from web pages but got disordered words sometimes

Frank Potter Sat, 27 Jan 2007 19:03:35 -0800

Thank you, I tried again and figured it out.
The problem was with Beautiful Soup. I used it a year ago, also on 
Chinese HTML pages, and no errors occurred. I reread the old code and 
found the difference: convert the page to Unicode before feeding it to 
Beautiful Soup, and then everything is OK.
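
A minimal sketch of that fix in current Python 3, assuming the page is GBK-encoded (the thread does not say which encoding was involved). The helper name `decode_page` and the sample bytes are hypothetical:

```python
def decode_page(raw_bytes, encoding="gbk"):
    """Decode raw page bytes to text before handing them to an HTML parser.

    The default encoding is an assumption; use whatever the page declares.
    """
    # errors="replace" substitutes U+FFFD for undecodable bytes instead of raising
    return raw_bytes.decode(encoding, errors="replace")


# Hypothetical sample standing in for bytes fetched from a web page
raw = "<title>中文</title>".encode("gbk")
text = decode_page(raw)
print(text)
```

The decoded `text`, rather than the raw bytes, is what would then be passed to Beautiful Soup.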


On Jan 28, 3:26 am, "Paul McGuire" <[EMAIL PROTECTED]> wrote:
> After looking at the pyparsing results, I think I see the problem with
> your original code.  You are selecting only the characters after the
> rightmost "-" character, but you really want to select everything to
> the right of "- -".  In some of the titles, the encoded Chinese
> includes a "-" character, so you are chopping off everything before
> that.
>
> Try changing your code to:
>     title=full_title.split("- -")[1]
>
> I think then your original program will work.
>
> -- Paul
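
To illustrate Paul's point, here is a small sketch with a made-up title. The original program is not shown in this message, so both helper functions and the sample string are assumptions, and the first helper is only a guess at what "after the rightmost '-'" looked like:

```python
def title_after_last_dash(full_title):
    """Assumed original approach: keep only the text after the rightmost '-'."""
    return full_title.rsplit("-", 1)[-1].strip()


def title_after_separator(full_title):
    """Suggested approach: keep everything to the right of the first '- -'."""
    return full_title.split("- -", 1)[1].strip()


# Hypothetical title whose right-hand part itself contains a '-'
full_title = "Site name - - Part-2 of the story"

print(title_after_last_dash(full_title))   # loses the text before the inner '-'
print(title_after_separator(full_title))   # keeps the whole right-hand part
```

Note that `split("- -", 1)[1]` raises `IndexError` if the separator is missing, so real code may want to check for it first.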

-- 
http://mail.python.org/mailman/listinfo/python-list
