On Oct 05, Vitaly Pashkov <ad...@fluda.net> wrote:

> Trying to find out where exactly my
> problem occurs, I slightly modified the "readMovieList()" function at line 1427
> of imdbpy2sql.py, so now it looks like:
> mid = CACHE_MID.addUnique(title.decode('utf-8'), yearData)

I'm not sure that with such a change you'll get a consistent
database. :-/
Anyway it's worth a try, at least to spot the problem.

Two things you can try now:
1. upload your movies.list.gz somewhere: with the one I downloaded
   yesterday I'm unable to reproduce the problem, so I'd like to try yours.
2. with your change to line 1427 in place, also modify the title_soundex
   function, adding these two lines at the top:
     print _(title)
     sys.stdout.flush()
   In your output you should then see the "wrong" title right before
   the UnicodeWarning.
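
An alternative to printing every title is to ask the warnings module to
raise the UnicodeWarning as an exception, so the offending input can be
caught directly. What follows is only a minimal sketch, written for a
current Python interpreter: first_warning_input and demo_soundex are
hypothetical names, and demo_soundex merely imitates the warning; it is
not the real title_soundex.

```python
import warnings


def first_warning_input(func, items, category=UnicodeWarning):
    """Return the first item for which func(item) emits `category`, else None."""
    for item in items:
        with warnings.catch_warnings():
            # Inside this block the chosen warning is raised as an exception.
            warnings.simplefilter("error", category)
            try:
                func(item)
            except category:
                return item
    return None


def demo_soundex(title):
    # Hypothetical stand-in for title_soundex: it only imitates the warning
    # when the title contains the Unicode replacement character.
    if "\ufffd" in title:
        warnings.warn("Unicode equal comparison failed", UnicodeWarning)
    return title[:1].upper()
```

Calling first_warning_input(demo_soundex, titles) returns the first title
that triggers the warning, or None if none of them does.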

> and then it continues importing data. This error occurs only once.

It starts to sound a lot like garbage in the data...
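
One way to check for garbage without running the whole import is to scan
the raw file for lines that do not decode. This is just a sketch under
the assumption that the titles are expected to be valid UTF-8, as the
decode('utf-8') call in your change implies; undecodable_lines and
scan_gzip are hypothetical helper names, not part of imdbpy2sql.py.

```python
import gzip


def undecodable_lines(lines, encoding="utf-8"):
    """Return (line_number, raw_bytes) for every line that fails to decode."""
    bad = []
    for number, raw in enumerate(lines, start=1):
        try:
            raw.decode(encoding)
        except UnicodeDecodeError:
            bad.append((number, raw))
    return bad


def scan_gzip(path, encoding="utf-8"):
    # Opening in binary mode yields undecoded bytes, one line at a time.
    with gzip.open(path, "rb") as handle:
        return undecodable_lines(handle, encoding)
```

For example, scan_gzip("movies.list.gz") would list the line numbers and
raw bytes of any entries that are not valid in the assumed encoding.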

> Any ideas how I can intercept this UnicodeWarning so I can see at what line of
> movies.list it happens? I tried adding the following line to readMovieList():
> print "Title: " + title + ", counter: " + str(count)
> but the output is soooo big... tons of lines, over 9000.

Try modifying the title_soundex function as described above, and
then run your command with this appended:
  2>&1 | tee ~/OUTPUT.txt

After that, you can easily search for UnicodeWarning in the
~/OUTPUT.txt file.
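
If the file is too big to browse by hand, a few lines of Python can pull
out the title printed just before each warning. A rough sketch, assuming
the title is on the line immediately preceding the one that mentions
UnicodeWarning; lines_before_marker is a hypothetical helper name.

```python
def lines_before_marker(lines, marker="UnicodeWarning"):
    """Return the line that immediately precedes each line containing marker."""
    hits = []
    previous = None
    for line in lines:
        if marker in line and previous is not None:
            hits.append(previous.rstrip("\n"))
        previous = line
    return hits
```

Passing an open file object for the captured output as `lines` should
return the suspect titles, one per warning found.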

> BTW, after that change (decode('utf-8')) the DB content looks good, but
> as it contains almost 1.5M records I can't be 100% sure.

As said, doing so I'm not too sure that the database will store the
titles with the right encoding (every database seems to have a
different opinion about how to handle input in unicode and/or utf8
or other encodings...)

Thank you very much for your help!

-- 
Davide Alberani <davide.alber...@gmail.com> [GPG KeyID: 0x465BFD47]
http://erlug.linux.it/~da/
