Re: [Imdbpy-help] imdbpy2sql 4.7 - invalid byte sequence for encoding "UTF8"

darklow Sun, 24 Apr 2011 13:44:57 -0700

Yes, I can confirm: script version 4.6 works perfectly on the same server
with the same files.
I think this brings us closer to a solution.
Maybe this helps to identify the problem; this is what we did on our server.
(Remember, we are copying files by hand because only stable Debian packages
are allowed on the server, but we need the md5 hashes from version 4.7.)


1. We installed imdbpy 4.6 with all the dependencies
(python-psycopg2, python-dns python-formencode python-pkg-resources
python-sqlobject)
2. I downloaded version 4.7 and overwrote the following directories with
files from the 4.7 source:

cp -r imdbpy4.7/docs/* /usr/share/doc/python-imdb/
cp -r imdbpy4.7/imdb/* /usr/share/pyshared/imdb/


3. Now I run imdbpy2sql.py from the 4.7 source like before, and it fails
with "invalid byte sequence".
4. I copied the 4.6 files back to the mentioned directories, and the import
with version 4.6 works again.

Looking at the install log, I didn't see any other relevant files that I
should overwrite, so the problem might be in the dependencies.
Do you have any idea where the problem could be, and what else we should
overwrite or update so that v4.7 works?
Thank you.
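
After overwriting package files in place like this, one quick check is to
confirm which copy of a package the interpreter actually imports and which
version it reports. The following is only a minimal sketch, written for
Python 3 (the thread itself uses Python 2, so it may need adapting). The
helper name describe_module is made up for illustration and is not part of
IMDbPY; it also assumes the package exposes a __version__ attribute.

```python
import importlib


def describe_module(name):
    """Return (version, file path) for an importable module.

    Hypothetical helper for illustration only. The version is None when
    the module does not define __version__.
    """
    module = importlib.import_module(name)
    version = getattr(module, "__version__", None)
    path = getattr(module, "__file__", None)
    return version, path
```

Calling describe_module("imdb") with the same interpreter that runs
imdbpy2sql.py would show whether the imported files and the reported version
are the ones you expect after the copy.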


On Sun, Apr 24, 2011 at 10:03 PM, darklow <dark...@gmail.com> wrote:

> There have never been any issues with our PostgreSQL database; we have
> always used UTF-8 and are using it this time too.
> I have tried plenty of scripts and workarounds so far, including many
> decode().encode() attempts, but nothing helps; they just produce
> different errors.
> I also tried adding the following lines, to be sure everything is fine
> with the connection to the database:
>
> import psycopg2
> import psycopg2.extensions
> psycopg2.extensions.register_type(psycopg2.extensions.UNICODE)
> psycopg2.extensions.register_type(psycopg2.extensions.UNICODEARRAY)
>
> import sys
> reload(sys)  # Python 2: restores setdefaultencoding, removed at startup
> sys.setdefaultencoding('utf-8')
>
> CURS.execute("SET NAMES 'utf8'")
> CURS.execute("SET CLIENT_ENCODING TO 'utf8'")
>
>
> But still nothing helps.
> I tried reinstalling all the installed dependencies and running from
> clean sources, but no luck.
> I tried running the script with SQLAlchemy instead of SQLObject, but got
> the same error, so the problem is not there.
>
> I would like to ask you one thing.
> Every test takes about 1 hour, because the error occurs in the actors
> cast list.
> Can you please tell me the exact sequence of calls that converts a line
> from the file into SQL?
> Then I could create a new script that takes a small version of
> actors.list with only the problematic lines, runs a few unicode() and
> decode() calls in the correct order, and tries to insert these lines into
> a test table in the database. That way I could test much faster and not
> wait 1 hour for every try...
>
> What I have tried already is to open the actors.list file with PHP, read
> every line, convert the string to UTF-8 using iconv and insert it into
> the PostgreSQL database, and everything worked fine. It makes me think
> that the problem might be somewhere in cutting the line into pieces:
> maybe it cuts some valid Unicode character into pieces, and so the
> invalid byte sequence appears. If I had the correct list of functions for
> Python, I could run more tests.
>
> PS. I am just running a test with version 4.6 to see whether it still
> works; if it does, we could diagnose more easily by looking at the file
> changes between the versions.
> I'll post the results.
>
> Thank you.
>
> On Sat, Apr 23, 2011 at 3:23 PM, Davide Alberani <
> davide.alber...@gmail.com> wrote:
>
>> On Wed, Apr 20, 2011 at 14:08, darklow <dark...@gmail.com> wrote:
>> > Still no luck :/ maybe the problem is in some environment variables or
>> > settings, which are present in the installed version but are missing
>> > or incorrect when running from source?
>>
>> Seems unlikely to me.
>>
>> > What about this, i printed out some variables:
>> > print sys.stdout.encoding -> UTF-8
>> > print sys.stdin.encoding   -> UTF-8
>> > print sys.getdefaultencoding(); -> ascii
>> > Is it ok that  sys.getdefaultencoding(); == ascii ?
>>
>> These are fine.
>>
>> I've reproduced - to the best of my ability - your environment:
>> - no IMDbPY installed in the system.
>> - IMDbPY from source (the latest version in the Mercurial repository),
>>  setting the PYTHONPATH environment variable to point to the
>>  source directory.
>> - the cutils C module was not compiled.
>> - the latest actors.list.gz file.
>> - postgres 8.4; my database was created with these settings:
>>  CREATE DATABASE imdb
>>    WITH OWNER = postgres
>>       ENCODING = 'UTF8'
>>       TABLESPACE = pg_default
>>       LC_COLLATE = 'it_IT.utf8'
>>       LC_CTYPE = 'it_IT.utf8'
>>       CONNECTION LIMIT = -1;
>>
>> I've run it with your portion and other portions of the actors.list.gz
>> file, and everything went fine.
>>
>> Now... if I were you, I'd:
>> - create a virtualenv environment with:
>>    virtualenv --no-site-packages
>> - install IMDbPY in it, using easy_install or pip (the executable in
>>  your virtualenv, I mean) so that you'll have all the correct
>>  dependencies available.
>> - run the imdbpy2sql.py within your virtualenv.
>>
>> If it still fails:
>> - check your postgres settings.
>> - try using SQLite (just for a test) - see notes in README.sqldb
>>
>>
>> HTH,
>> --
>> Davide Alberani <davide.alber...@gmail.com>  [PGP KeyID: 0x465BFD47]
>> http://www.mimante.net/
>>
>
>

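Regarding the quoted request for a faster test loop, and the guess that a
character gets cut into pieces: the idea can be tried without running the
full import. The sketch below is written for Python 3 and is not taken from
imdbpy2sql.py; all function names are made up for illustration. It assumes
the list file is ISO-8859-1 (latin-1), which the thread does not state. It
pulls out only the lines containing non-ASCII bytes, checks whether a byte
string is valid UTF-8, and shows that slicing encoded bytes can split a
multi-byte character while slicing decoded text cannot.

```python
def non_ascii_lines(path, limit=20):
    """Collect up to `limit` raw lines (bytes) containing bytes >= 0x80."""
    found = []
    with open(path, "rb") as handle:
        for raw in handle:
            if any(byte >= 0x80 for byte in raw):
                found.append(raw.rstrip(b"\r\n"))
                if len(found) >= limit:
                    break
    return found


def is_valid_utf8(raw):
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def to_utf8(raw, source_encoding="latin-1"):
    """Decode with the ASSUMED source encoding, then re-encode as UTF-8."""
    return raw.decode(source_encoding).encode("utf-8")


# Slicing encoded bytes can cut a multi-byte character in half...
encoded = "Zo\u00eb".encode("utf-8")   # 4 bytes: the last character takes 2
broken = encoded[:3]                   # ends in the middle of that character

# ...while slicing the decoded text first keeps every character whole.
safe = "Zo\u00eb"[:3].encode("utf-8")
```

Running non_ascii_lines on actors.list and inserting only those lines into a
test table would give a much shorter loop than the full one-hour import. Any
line for which is_valid_utf8 returns False after conversion is a candidate
for the failing row.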

