Re: unicode confusing

2009-05-26 Thread Pet

On May 25, 6:07 pm, Paul Boddie p...@boddie.org.uk wrote:
 On 25 May, 17:39, someone petshm...@googlemail.com wrote:

  Hi,

  I'm reading the content of a web page (encoded in UTF-8) with urllib2,
  but I can't get the parsed data into the DB.

  Exception:

    File "/usr/lib/python2.5/site-packages/pyPgSQL/PgSQL.py", line 3111, in execute
      raise OperationalError, msg
  libpq.OperationalError: ERROR:  invalid UTF-8 byte sequence detected near byte 0xe4

  I've already checked several python unicode tutorials, but I have no
  idea how to solve my problem.

 With pyPgSQL, there are a few tricks that you have to take into
 account:

 1. With PostgreSQL, it would appear advantageous to create databases
 using the -E unicode option.

Hi,

DB is in UTF8



 2. When connecting, use the client_encoding and unicode_results
 arguments for the connect function call:

   connection = PgSQL.connect(client_encoding="utf-8",
                              unicode_results=1)

If I set unicode_results=1, I get exceptions in other places,
e.g. urllib.urlencode(values) can't encode the values.


 3. After connecting, it appears necessary to set the client encoding
 explicitly:

   connection.cursor().execute("set client_encoding to unicode")

I've tried this as well, but I still get exceptions.


 I'd appreciate any suggestions which improve on the above, but what
 this should allow you to do is to present Unicode objects to the
 database and to receive such objects from queries. Whether you can
 relax this and pass UTF-8-encoded strings instead of Unicode objects
 is not something I can guarantee, but it's usually recommended that
 you manipulate Unicode objects in your program where possible, and
 here you should be able to let pyPgSQL deal with the encodings
 preferred by the database.


Thanks for your suggestions! Sadly, I can't solve my problem...

Pet

 Paul
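
On the urllib.urlencode failure mentioned above: in Python 2, urlencode
calls str() on each value, so a Unicode object with non-ASCII characters
cannot be converted. One common workaround is to encode the values to
UTF-8 just before building the query string. The sketch below is written
in Python 3 spelling (urllib.parse.urlencode, where str is the Unicode
type); the helper name encode_values and the sample data are made up for
illustration and are not taken from the original program.

```python
from urllib.parse import urlencode


def encode_values(values, encoding="utf-8"):
    """Return a copy of `values` with every text value encoded to bytes."""
    return {
        key: value.encode(encoding) if isinstance(value, str) else value
        for key, value in values.items()
    }


# Hypothetical form data containing a non-ASCII character (a-umlaut).
values = {"q": "M\u00e4rz", "page": 2}

# Text stays Unicode inside the program; bytes appear only at the boundary.
query = urlencode(encode_values(values))
print(query)
```

Encoding only at this boundary means the rest of the program can keep
working with Unicode objects, as unicode_results=1 assumes.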

-- 
http://mail.python.org/mailman/listinfo/python-list

Re: unicode confusing

2009-05-26 Thread Pet

On May 26, 9:29 am, Pet petshm...@googlemail.com wrote:
 [...]

After some time, I tried converting the result with unicode(result,
'ISO-8859-15'), and that was it :)
I had thought it was already UTF-8 because of the charset declared in
the meta tag of the web page I'm fetching.
Pet
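
A small sketch of why the database complained about byte 0xe4 and why
the ISO-8859-15 conversion helps. It assumes the page really delivers
single-byte Latin text despite its meta tag; the sample bytes are made
up, and the code uses Python 3 spelling, where raw.decode(...) plays the
role of Python 2's unicode(raw, ...).

```python
# Hypothetical page fragment: "Maerz" with a-umlaut, as served in ISO-8859-15.
raw = b"M\xe4rz"

try:
    raw.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    # 0xe4 opens a three-byte UTF-8 sequence, but "r" is not a
    # continuation byte, so the data is rejected as invalid UTF-8.
    valid_utf8 = False

# Decode with the encoding the page actually uses ...
text = raw.decode("iso-8859-15")   # Python 2: unicode(raw, 'ISO-8859-15')

# ... and encode as UTF-8 only when bytes are needed for a UTF-8 database.
for_db = text.encode("utf-8")

print(valid_utf8, repr(text), for_db)
```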

Re: unicode confusing

2009-05-26 Thread Paul Boddie

On 26 May, 10:09, Pet petshm...@googlemail.com wrote:

 After some time, I tried converting the result with unicode(result,
 'ISO-8859-15'), and that was it :)

I haven't really investigated having unicode_results set to false (or
the default) with a database containing UTF-8 (or any non-ASCII
encoded) text, since it's always desirable to manipulate Unicode
internally in one's programs: I don't want plain strings containing
various encoded sequences of bytes when I'm dealing with characters.
That said, if one were consuming XML/HTML and then putting it in raw
form into a database (including the tags), I could understand that
Unicode objects might then seem like a distraction.

 I had thought it was already UTF-8 because of the charset declared in
 the meta tag of the web page I'm fetching.

There are lots of caveats about Web page encodings - which metadata
actually indicates the encoding - but I still think the best approach
is to convert text to Unicode as soon as possible, then present
Unicode objects to the database. This way, you can separate the
decision about which encoding the Web pages are using from the
decision about which encoding the database is using.

Paul
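
One way to sketch the "convert to Unicode as soon as possible" advice is
a small helper that decodes the fetched bytes once, right after
download. The function name to_text, the order of candidates and the
ISO-8859-15 default are assumptions made for illustration, not something
urllib2 or pyPgSQL provides. Python 3 spelling is used.

```python
def to_text(raw, declared=None, fallback="iso-8859-15"):
    """Decode page bytes: try the declared charset, then UTF-8, then a fallback."""
    candidates = [enc for enc in (declared, "utf-8") if enc]
    for enc in candidates:
        try:
            return raw.decode(enc)
        except (UnicodeDecodeError, LookupError):
            continue  # wrong or unknown charset: try the next candidate
    # The default fallback, ISO-8859-15, defines all 256 byte values,
    # so with that default this last step does not raise.
    return raw.decode(fallback)


# A page that claims UTF-8 but is really ISO-8859-15 still decodes.
print(repr(to_text(b"M\xe4rz", declared="utf-8")))
```

Note the limitation: a declared single-byte charset never fails to
decode, so a page that is really UTF-8 but declares a Latin charset
would be decoded wrongly without any error. From this point on the
program handles only Unicode objects and leaves the database encoding
to the driver.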
