Re: [Imdbpy-help] imdbpy2sql 4.7 - invalid byte sequence for encoding "UTF8"

darklow Sun, 17 Apr 2011 05:05:04 -0700

I updated to the latest data files this morning; nothing changed, and
unfortunately this fix doesn't work either.
I even tried adding
self.sqlstr = self.sqlstr.replace('\xec\x8c\xa0', '') in the _toDB function
and still get the same error.
Maybe this way of replacing the unicode character is wrong?
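
To make the "can we detect invalid bytes" question concrete, here is a minimal sketch of how a byte string could be scanned for sequences that do not decode as UTF-8, and how they could be dropped. The helper names find_invalid_utf8 and drop_invalid_utf8 are made up for illustration and are not part of imdbpy2sql.py; the sketch also assumes the data is still a byte string (not yet decoded to unicode) at the point where it is checked.

```python
def find_invalid_utf8(data):
    """Return a list of (offset, bad_bytes) for each undecodable run in
    `data`, which is assumed to be a byte string."""
    problems = []
    pos = 0
    while pos < len(data):
        try:
            data[pos:].decode('utf-8')
            break  # the rest decodes cleanly
        except UnicodeDecodeError as exc:
            # exc.start/exc.end are relative to the slice we decoded
            start, end = pos + exc.start, pos + exc.end
            problems.append((start, data[start:end]))
            pos = end  # resume scanning after the offending bytes
    return problems


def drop_invalid_utf8(data):
    """Return `data` with every undecodable byte removed (hypothetical helper)."""
    return data.decode('utf-8', 'ignore').encode('utf-8')
```

Calling find_invalid_utf8 on a suspicious role or name before it is cached would at least show which bytes trip the decoder, which is more general than replacing one hard-coded sequence.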

This error started when we uninstalled imdbpy (leaving all the dependency
libs in place) and started running it without installation. Maybe there is
some hidden unicode dependency? Could you try running it without
installation, just from source?

Also, every time I start the script I receive two warnings:
2011-04-17 11:13:37,398 WARNING [imdbpy.parser.sql.aux]
/data/web/imdb/imdbpy4.7-159671/imdb/parser/sql/__init__.py:125: Unable to
import the cutils.ratcliff function.  Searching names and titles using the
"sql" data access system will be slower.
2011-04-17 11:13:37,399 WARNING [imdbpy.parser.sql.aux]
/data/web/imdb/imdbpy4.7-159671/imdb/parser/sql/__init__.py:332: Unable to
import the cutils.soundex function.  Searches of movie titles and person
names will be a bit slower.
IMPORTING psyco... FAILED (not a big deal, everything is alright...)

Maybe that is somehow related?
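
A quick way to check whether these warnings only mean that the C helper module is not importable from the source tree (which seems plausible when running without building or installing) could look like the sketch below. The dotted module name imdb.parser.sql.cutils is an assumption based on the path in the warning, and can_import is a hypothetical helper, not something IMDbPY provides.

```python
import importlib


def can_import(module_name):
    """Return True if `module_name` can be imported, False otherwise."""
    try:
        importlib.import_module(module_name)
        return True
    except ImportError:
        return False


# Assumed module path, guessed from the warning message above.
print(can_import('imdb.parser.sql.cutils'))
```

If this prints False, the warnings would be explained by the missing compiled extension alone, independent of the encoding error.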

On Sat, Apr 16, 2011 at 6:01 PM, Davide Alberani
<davide.alber...@gmail.com> wrote:

> On Wed, Apr 13, 2011 at 08:46, darklow <dark...@gmail.com> wrote:
> > Maybe someone knows a quick and dirty fix, at least to skip such invalid
> > byte sequence strings until there is an official fix, so I can finish the
> > import?
> > Can we detect invalid byte characters?
>
> Hi again,
> actually my problem is that I'm unable to reproduce this bug. :-)
> Using Postgresql and SQLObject, my run goes on smooth.
>
> I have downloaded the 'actors.list.gz' file today, so it's possible that
> some garbage was removed.
>
> Anyway, the previously proposed solution was obviously flawed, since
> the problem was on _character_ names.
>
> So, let's edit the imdbpy2sql.py file again and change the lines around
> 1540 so that they become:
>
>        movieid = CACHE_MID.addUnique(title)
>        if role is not None:
>            roles = filter(None, [x.strip() for x in role.split('/')])
>            for role in roles:
>                role = role.replace('\xec\x8c\xa0', '')  # TEMPORARY FIX
>                cid = CACHE_CID.addUnique(role)
>                sqldata.add((pid, movieid, cid, note, order))
>
> Maybe this will help... who knows? :-)
>
> --
> Davide Alberani <davide.alber...@gmail.com>  [PGP KeyID: 0x465BFD47]
> http://www.mimante.net/
>


_______________________________________________
Imdbpy-help mailing list
Imdbpy-help@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/imdbpy-help
