As I already said, there is no workaround based on BeautifulSoup. The
only way would be to clean the characters manually or to use another
library such as lxml. I discussed this with superfly some time ago, and
we both agreed that porting to lxml would be the best solution.
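
For illustration, "cleaning the characters manually" could look roughly
like the sketch below. It is not OpenLP code: the helper name clean_utf8
and the sample bytes are made up for this example, and it assumes that
simply dropping the invalid bytes is acceptable before the text is handed
to a parser.

```python
def clean_utf8(raw: bytes) -> str:
    """Decode bytes as UTF-8 and drop any byte sequences that are invalid."""
    # errors="ignore" skips undecodable bytes instead of raising an error.
    return raw.decode("utf-8", errors="ignore")


# Hypothetical sample: valid Chinese text followed by a truncated character
# (only the first two bytes of a three-byte UTF-8 sequence).
sample = "神爱世人".encode("utf-8") + b"\xe7\x88"
cleaned = clean_utf8(sample)
```

Because the result is already a str, the parser no longer has to guess
the encoding. Using errors="replace" instead would keep a visible marker
(U+FFFD) where the bad bytes were.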

-- 
You received this bug notification because you are a member of OpenLP
Core, which is subscribed to OpenLP.
https://bugs.launchpad.net/bugs/706211

Title:
  Biblegateway serves defect UTF8 for chinese bibles and breaks the
  charset detection

Status in OpenLP - Worship Presentation Software:
  Confirmed

Bug description:
  This was already mentioned on the forum. As I'm not sure when I will
  find the time to work on it, I am writing this report.

  When downloading references like:
  http://www.biblegateway.com/passage/?search=John%203&version=CUV
  the received HTML is UTF-8 encoded. The last two characters in
  <meta name="description" content="... are invalid UTF-8. This causes
  BeautifulSoup to fall back to cp1252 (which is wrong).
  Forcing BeautifulSoup to use UTF-8 didn't work for me; it still fell
  back to cp1252. The best thing would be to rewrite the relevant code
  for all three servers using lxml, as the use of lxml is even
  recommended by BeautifulSoup and it would make more sense to unify
  the library use in OpenLP.
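
The failure can be reproduced in miniature with the standard library
alone. The sketch below is only an illustration: the byte string is a
made-up stand-in for the real page, and it assumes that the invalid
characters come from a multi-byte character being cut off, which the
report does not confirm.

```python
# Hypothetical stand-in for the page: the final three-byte character is
# cut off after two bytes, leaving an incomplete UTF-8 sequence.
raw = '<meta name="description" content="神爱世人'.encode("utf-8")[:-1]

try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError as err:
    # A strict UTF-8 decode rejects the whole document.
    strict_ok = False
    bad_offset = err.start  # offset of the first undecodable byte

# Decoding the same bytes as cp1252 succeeds, but yields mojibake
# instead of the Chinese text.
mojibake = raw.decode("cp1252")
```

This shows why a single bad sequence is enough to push a detector that
tries strict decodings away from UTF-8, even though the rest of the
document is valid.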



_______________________________________________
Mailing list: https://launchpad.net/~openlp-core
Post to     : openlp-core@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openlp-core
More help   : https://help.launchpad.net/ListHelp
