Finally! By removing actors.list portion by portion, I found the exact line
that causes the error.
This is the offending line from actors.list:

Guillaume, Fran?s (I) 23 d?mbre 2008: le jour orance s'est arr?e (2005) (TV)
*                        "Michel, l'enfant-roi" (1972)  (uncredited)
 [(1972) Le rescap?[8;25H"Une Su?ise ?aris" (1975)  [Le photographe]*
*
*
As you can see, the brackets don't match.
I also found where these invalid bytes come from.
If I remove the following code from imdbpy2sql.py, the script runs without
errors:

Lines (1506 - 1513) version 4.7:

                if role[-1:] == ']':
                    role = role[:-1]
                if role[-1:] == ')':
                    nidx = role.find('(')
                    if nidx != -1:
                        note = role[nidx:]
                        role = role[:nidx].rstrip()
                        if not role: role = None
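
For reference, here is a minimal standalone sketch of what that block appears to do, so it can be tried on single role strings outside the script. The function name split_role_note and the sample strings are hypothetical; only the body mirrors the quoted lines, and it is written so that it also runs on Python 3.

```python
def split_role_note(role):
    """Split a trailing '(note)' off a role string, mirroring the quoted block."""
    note = None
    # Drop the closing bracket of the "[role]" field, if present.
    if role[-1:] == ']':
        role = role[:-1]
    # If the role ends with ')', everything from the first '(' is the note.
    if role[-1:] == ')':
        nidx = role.find('(')
        if nidx != -1:
            note = role[nidx:]
            role = role[:nidx].rstrip()
            if not role:
                role = None
    return role, note


# Hypothetical inputs, not taken from actors.list.
print(split_role_note("Le photographe]"))   # ('Le photographe', None)
print(split_role_note("Himself (voice)]"))  # ('Himself', '(voice)')
```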

So it seems that, because of the bracket matching, this splitting code does
something wrong, and that is how the invalid bytes appear.
It is also hard for me to understand why this is happening only to me.
One idea is that something may be wrong with the unzip step used to
decompress actors.list.gz.
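
As a side note on the bytes themselves: the sequence in the error, 0xc3 followed by 0x20, is a UTF-8 lead byte followed by a plain space. A quick check (Python 3 syntax, independent of the script; the helper name is_valid_utf8 and the choice of "à" as the example character are only assumptions for illustration) shows what a valid sequence starting with 0xc3 looks like and that 0xc3 0x20 cannot be decoded:

```python
def is_valid_utf8(data):
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# "à" (U+00E0) is one character whose UTF-8 form starts with 0xc3.
a_grave = "\u00e0".encode("utf-8")
print(a_grave)                        # b'\xc3\xa0'

# The lead byte followed by its continuation byte is fine ...
print(is_valid_utf8(b"\xc3\xa0 x"))   # True
# ... but the lead byte followed directly by a space (0x20) is not.
print(is_valid_utf8(b"\xc3\x20x"))    # False
```

So one possibility, and it is only a guess, is that somewhere a continuation byte gets dropped or replaced after the 0xc3.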

I attached sample files: the exact character line that triggers the error, and
an assembled version of actors.list with the correct head and foot plus the
offending line.
I created these files using gunzip, sed and cat to decompress, cut out the
exact lines, and combine the head, middle and foot parts.

Any suggestions on how to fix these lines so the error doesn't appear?
I am afraid that removing them will cause some wrong data to be imported.

But finally I have the feeling we are very close to discovering the real problem :)


On Fri, Apr 29, 2011 at 1:55 AM, darklow <dark...@gmail.com> wrote:

> Thanks. This time I was lucky: the sysadmin just installed the python-dev package.
> To get IMDbPY to install without errors, we also needed to install:
> *install libxml2-dev libxslt-dev*
> Afterwards psycopg was missing too, so I ran from the virtualenv:
> *pip install psycopg*
>
> Only now was I able to run the imdbpy2sql script without errors.
> Since I now have a virtualenv, I also installed psyco.
> Now the script is running, but unfortunately I got the same error :((((
>
>
> Traceback (most recent call last):
>   File "./bin/imdbpy2sql.py", line 2950, in <module>
>     run()
>   File "./bin/imdbpy2sql.py", line 2811, in run
>     castLists(_charIDsList=characters_imdbIDs)
>   File "./bin/imdbpy2sql.py", line 1575, in castLists
>     doCast(f, roleid, rolename)
>   File "./bin/imdbpy2sql.py", line 1534, in doCast
>     cid = CACHE_CID.addUnique(role)
>   File "./bin/imdbpy2sql.py", line 957, in addUnique
>     else: return self.add(key, miscData)
>   File "./bin/imdbpy2sql.py", line 950, in add
>     self[key] = c
>   File "./bin/imdbpy2sql.py", line 860, in __setitem__
>     self.flush()
>   File "./bin/imdbpy2sql.py", line 921, in flush
>     raise
>   File "./bin/imdbpy2sql.py", line 883, in flush
>     self._toDB(quiet)
>   File "./bin/imdbpy2sql.py", line 1185, in _toDB
>     CURS.executemany(self.sqlstr, self.converter(l))
> psycopg2.DataError: invalid byte sequence for encoding "UTF8": 0xc320
> HINT:  This error can also happen if the byte sequence does not match the
> encoding expected by the server, which is controlled by "client_encoding".
>
> (myvenv)darklow@moon:~/myvenv$
>
>
> I've just run out of ideas :((((
>
>
> On Fri, Apr 29, 2011 at 12:00 AM, Davide Alberani <
> davide.alber...@gmail.com> wrote:
>
>> On Thu, Apr 28, 2011 at 22:52, darklow <dark...@gmail.com> wrote:
>> >
>> > However, the last command, pip install IMDbPY, didn't succeed so well. It
>> > looks like I got exactly the same error that another user reported some
>> > days ago in the same discussion, and he also has a UTF-8 encoding problem:
>>
>> Sure: you don't have the python-dev package installed
>> in your system. :-/
>> A per-user installation is possible, but a little tricky...
>>
>> > By running python setup.py install I receive the same error. I also tried
>> > the latest version (4.8dev20110425) but got the same error.
>>
>> Using the latest version sources, run (after you've activated your
>> virtualenv!):
>>  python setup.py install --without-cutils
>>
>> > Maybe this explains why the script doesn't handle UTF-8 in the first
>> > place - some strange incompatibilities with cutils.c
>>
>> I've run some tests without the compiled C module, so I think this
>> is not the cause, but at this point... who knows. :-)
>>
>>
>>
>> --
>> Davide Alberani <davide.alber...@gmail.com>  [PGP KeyID: 0x465BFD47]
>> http://www.mimante.net/
>>
>
>

Attachment: actors.list
Description: Binary data

Attachment: actors.list.middle
Description: Binary data

Attachment: actors.list.gz
Description: GNU Zip compressed data

_______________________________________________
Imdbpy-help mailing list
Imdbpy-help@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/imdbpy-help