On Wed, Apr 13, 2011 at 08:46, darklow <dark...@gmail.com> wrote:
> Maybe someone knows some fast dirty fix at least how to skip such invalid
> byte sequence strings while there is no official fix, so I can finish the
> import?
> Can we detect invalid byte characters?

Hi again,
actually my problem is that I'm unable to reproduce this bug. :-)
Using PostgreSQL and SQLObject, my run goes smoothly. I downloaded
the 'actors.list.gz' file today, so it's possible that some garbage has
since been removed.

Anyway, the previously proposed solution was obviously flawed, since the
problem was in the _character_ names.
So, let's edit the imdbpy2sql.py file again and change the lines around
1540 so that they become:

            movieid = CACHE_MID.addUnique(title)
            if role is not None:
                roles = filter(None, [x.strip() for x in role.split('/')])
                for role in roles:
                    role = role.replace('\xec\x8c\xa0', '') # TEMPORARY FIX
                    cid = CACHE_CID.addUnique(role)
                    sqldata.add((pid, movieid, cid, note, order))

Maybe this will help... who knows? :-)

-- 
Davide Alberani <davide.alber...@gmail.com> [PGP KeyID: 0x465BFD47]
http://www.mimante.net/

_______________________________________________
Imdbpy-help mailing list
Imdbpy-help@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/imdbpy-help
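
On the quoted question of whether invalid byte sequences can be detected at all: a rough sketch follows, assuming Python 3 and assuming the database expects UTF-8. The helper names (is_valid_utf8, drop_invalid_utf8, clean_roles) are made up for illustration and are not part of imdbpy2sql.py. Note that this drops offending bytes silently, so it is a blunt workaround rather than a real fix.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        raw.decode('utf-8')
    except UnicodeDecodeError:
        return False
    return True


def drop_invalid_utf8(raw: bytes) -> bytes:
    """Return the byte string with any non-UTF-8 bytes removed."""
    # errors='ignore' skips undecodable bytes instead of raising.
    return raw.decode('utf-8', errors='ignore').encode('utf-8')


def clean_roles(role: bytes) -> list:
    """Split a '/'-separated role field and sanitize each character name.

    Hypothetical stand-in for the role-splitting step in the snippet above.
    """
    names = [drop_invalid_utf8(x.strip()) for x in role.split(b'/')]
    # Discard names that are empty after stripping and cleaning.
    return [name for name in names if name]
```

Each cleaned name could then be passed on in place of the raw role string, or is_valid_utf8 could be used on its own to log and skip bad entries instead of altering them.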