Davide,

You are right. After checking, I realized that I was using InnoDB
tables, and after 15 hours the script was still running :) So I
immediately stopped the current run, as you suggested. I've changed
the configuration to use MyISAM tables and also set

character-set-server   = utf8
default-collation      = utf8_unicode_ci
default-character-set  = utf8

I have also made some performance-related changes to the MySQL configuration.
Once I have re-run the import with all those changes, I will give you
feedback about the situation. All your support so far has been very helpful
to me. Thanks a lot :)

Apart from all this, I would like to ask your opinion about one more issue.

I intend to use IMDbPY for my thesis work, in which I am designing and
implementing ReMovender, an intelligent web-based movie recommendation
system. I need to keep my movie data up to date by running some scripts
from the user interface.

At first I thought of crawling http://www.imdb.com/nowplaying/. But the
movies on this page are already in the IMDb database, so I realized that it
would take too long for this page to help me obtain up-to-date data. (Movies
can still appear in the nowplaying interface three years after the database
was updated, so I could spend three years checking the nowplaying interface
without ever finding a difference from my current database.) So I just need
to know when the database is updated.
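
One possible way to notice that the data has changed is sketched below. It
assumes the plain-text data files are re-downloaded into a local directory
(such as the one passed to imdbpy2sql.py with -d) and compares file checksums
against a saved manifest. The function names, the manifest file, and the
"*.list.gz" pattern are hypothetical choices for illustration; none of this
is part of IMDbPY.

```python
import hashlib
import json
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def snapshot(data_dir, pattern="*.list.gz"):
    """Map each matching file name in data_dir to its digest."""
    return {p.name: file_digest(p) for p in sorted(Path(data_dir).glob(pattern))}


def changed_files(data_dir, manifest_path):
    """Return the names of files that are new or modified since the last
    saved manifest, then overwrite the manifest with the current state."""
    manifest = Path(manifest_path)
    old = json.loads(manifest.read_text()) if manifest.exists() else {}
    new = snapshot(data_dir)
    changed = sorted(name for name, digest in new.items() if old.get(name) != digest)
    manifest.write_text(json.dumps(new, indent=2))
    return changed
```

An administrator-facing script could call changed_files() after each
download and only trigger a re-import when the returned list is not empty.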

Do you think that IMDbPY can help with this issue? Maybe I can provide
a user interface to the administrator of my system so that he/she can update
the movie data properly.
http://imdbpy.sourceforge.net/docs/README.sqldb.txt also mentions the
diffs files of IMDb, but I haven't been able to find any documentation
on how to use the diffs files with IMDbPY. I'd be very grateful
if you shared your opinion on whether this would be a good idea and how I
can make this dream come true :)

I hope I have managed to explain what I intend to do.

Thank you so much for your time and all the help you have given so far.
I am planning to thank you and your team on the first page of
my thesis for all the contributions you have made :)

Best Regards,
Gozde

2009/5/10 Davide Alberani <davide.alber...@gmail.com>

> On May 10, Gozde Ozbal <gozba...@gmail.com> wrote:
>
> > I am using -d C:\IMDB-ftp -u mysql://root:123...@localhost/imdb
>
> Seems fine.  Are you using InnoDB or MyISAM tables?
> There are some options to improve performances, with InnoDB (but
> notice that MyISAM is _always_ faster, for our needs).
>
> > I am sorry that I cannot send you the output before the exception
> > in the error scenario, since I have been running the script for
> > approximately 10 hours and don't wanna halt it at the moment
>
> I fear that you'll end up with a lot of mess in your database
> anyway. :-/
> Given the kind of errors, I think there's something serious, and
> it will prevent the data to be useful, sorry.
>
> > After this little change, I only receive lots of warnings like
> > C:\WorkSpace\tez\IMDbPY-4.1\imdbpy2sql.py:1107: Warning: Incorrect
> > string value: '\xC31536' for column 'phonetic_code' at row 1094
>
> A real mess. :-)
> phonetic_code must be "AsciiChar+4digits", so I fear something is
> gone horribly wrong.
>
> > And please note that these warnings also exist for other columns
> > like note, name_pcode_nf and surname_pcode and I haven't changed
> > the lenghts of those.  Do you have any idea about how these warnings
> > can be prevented?
>
> Sounds a bit like a configuration problem on your side, but I
> can't be too sure.
> Probably something about character-set/collation: in the README.sqldb
> file (in the docs) there are some example to fix it.
>
> My ideas for you:
> - stop the current elaboration.
> - leave the length of phoneticCode to 5.
> - put a single file (I'd start with movies.list.gz, trying with others
>  if no problems occur) in an empty directory and use it as the "-d"
>  argument of imdbpy2sql.py
> - try changing your MySQL configuration until you get no errors/warnings.
>
> In the next days, I'll try with MySQL 5.1 and the latest IMDb data.
>
>
> Thank you very much for your effort debugging the problem.
>
> --
> Davide Alberani <davide.alber...@gmail.com> [GPG KeyID: 0x465BFD47]
> http://erlug.linux.it/~da/ <http://erlug.linux.it/%7Eda/>
>
_______________________________________________
Imdbpy-help mailing list
Imdbpy-help@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/imdbpy-help
