Hi Davide,
I just wanted to let you know that after increasing my RAM capacity, I am
now able to import all movie data in only 190 minutes :)

I still get the "Incorrect string value" warnings, although I have changed
the MySQL configuration file to use utf8 as described in the document you
pointed me to. However, all those characters are displayed correctly in
mysqladmin, so I hope it will not be a big problem for me.

By the way, I want users to be aware of new movies (movies other than the
ones that were imported into our database by means of IMDbPY), so that the
system I am designing can be really useful and up to date. To achieve this,
I intend to provide an interface for the administrator of the system, who
will be able to update the movies with just a click :)

After your explanation, I guess there are only two ways to do that:

1) downloading every plain text data file from the IMDb interfaces page and
running the IMDbPY script at the end of each week (certainly more accurate,
but time-consuming)

2) properly crawling http://www.imdb.com/nowplaying/ or
http://italian.imdb.com/Recent/ (for the reason I stated before, for a long
time this will not give me any movies other than the ones already in our
database)

I hope I have now explained what I intend to do. Please let me know if you
have any ideas other than those two.

Thank you so much for your time and all the support you have given.
Best Regards,
Gozde Ozbal


2009/5/11 Davide Alberani <davide.alber...@gmail.com>

> On May 10, Gozde Ozbal <gozba...@gmail.com> wrote:
>
> > I have also made some changes about performance in MySQL
> > configuration.
>
> Good; another possible speed-up is to use imdbpy2sql.py to dump a set
> of CSV files, later imported into the database.  See README.sqldb
> for the details.
>
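A sketch of how the administrator's "update" button could launch the import. -d (data directory) and -u (database URI) are the usual imdbpy2sql.py options, and --fix-old-style-titles is the argument recommended in this thread; the -c option for the CSV directory is an assumption to check against README.sqldb, and imdbpy2sql_command/run_import are hypothetical names.

```python
import subprocess


def imdbpy2sql_command(data_dir, uri, csv_dir=None, fix_old_style_titles=True):
    """Build the imdbpy2sql.py command line as a list of arguments."""
    cmd = ["imdbpy2sql.py", "-d", data_dir, "-u", uri]
    if csv_dir is not None:
        # Assumption: "-c DIR" makes imdbpy2sql.py dump CSV files into DIR and
        # load them afterwards; verify the exact option in README.sqldb.
        cmd += ["-c", csv_dir]
    if fix_old_style_titles:
        # Convert "Title, The" style titles to the new "The Title" format.
        cmd.append("--fix-old-style-titles")
    return cmd


def run_import(*args, **kwargs):
    """Run the import; raises subprocess.CalledProcessError if it fails."""
    subprocess.run(imdbpy2sql_command(*args, **kwargs), check=True)
```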
> By the way, the plain text data files still contain some files
> (complete-crew, distributors, keywords, miscellaneous-companies
> and special-effects-companies at this moment) with movie titles in
> the old format (IMDb recently switched from "Title, The" to "The Title").
>
> When running imdbpy2sql.py, you should use the --fix-old-style-titles
> argument so that every title is converted to the new format.
>
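To make the format change concrete, here is a hypothetical helper that moves a trailing article to the front, turning "Godfather, The (1972)" into "The Godfather (1972)". It is not imdbpy2sql.py's own code, only an illustration; the article list is a small assumption, and real IMDb titles (TV episodes, quoted series titles, elided articles such as "L'") have more cases.

```python
import re

# Hypothetical, deliberately short list of leading articles.
ARTICLES = {"The", "A", "An", "Il", "La", "Le", "Les", "Der", "Die", "Das", "El"}

# "<body>, <article>" optionally followed by parenthesized parts like " (1972)" or " (V)".
_OLD_STYLE = re.compile(r"^(?P<body>.+), (?P<article>[^\s,]+)(?P<rest>(?: \([^)]*\))*)$")


def new_style_title(title):
    """Turn an old-style title ("Godfather, The (1972)") into "The Godfather (1972)".

    Titles that do not end with a known article are returned unchanged.
    """
    match = _OLD_STYLE.match(title)
    if match is None or match.group("article") not in ARTICLES:
        return title
    return f"{match.group('article')} {match.group('body')}{match.group('rest')}"
```

The greedy body group keeps commas inside the title, so "Good, the Bad and the Ugly, The (1966)" only has its final article moved.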
> > I am intending to use IMDbPY for my thesis work,
>
> Cool!  It seems to be one of the most popular uses of IMDbPY these days. :-)
>
> Maybe you'd be interested in the Hollywood Informatics group,
> where some IMDbPY developers and users are trying to explore
> strange uses of the IMDb data. :-)
> It's a new initiative, so nothing has been done yet, but it can
> be a good place to share some ideas (not about IMDbPY itself:
> there are already these mailing lists for help and development).
>
> > in which I am designing and implementing ReMovender, an intelligent
> > web-based movie recommendation system. I need to keep my movie
> > data up to date by running some scripts from the user interface.
>
> I'm not sure I have completely understood how it will work and
> what kind of data you need to keep up to date.
> Can you provide a short example of what a user will do?
>
> > At first I thought of crawling http://www.imdb.com/nowplaying/. But
> > after realizing that the movies on this page are already in the
> > IMDb database, I found that it would take a long time before
> > this page helped me obtain up-to-date data.
>
> You can parse that page (or http://italian.imdb.com/Recent/ ) to get
> a list of movies that are in theaters; after that, if you need complete
> information about a movie, you can use IMDbPY.
> Beware that the movieIDs internally used by IMDbPY to uniquely identify
> a movie are not the same for the web and the SQL database: for example,
> Star Trek (2009) has movieID=0796366 on the web (and it's the
> same if you access its data using the 'http' and 'mobile' data
> access systems of IMDbPY), but with the 'sql' data access system
> it will have a different integer ID.
> That's because IMDb doesn't distribute a map from titles to
> movieIDs in the plain text data files (obviously you can search
> for the complete title and take the first result: 99.9% of the time
> it will lead to the same movie).
>
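The "search for the complete title" workaround could look like this sketch. sql_movie_id is a hypothetical helper that only relies on search_movie() and the movieID attribute of IMDbPY results; the commented usage assumes a MySQL URI of the form mysql://user:password@host/database and the 'long imdb title' key (e.g. "Star Trek (2009)") of a Movie fetched from the web.

```python
def sql_movie_id(title, sql_ia):
    """Return the movieID of the first 'sql' search result for title, or None.

    title should be the complete title including the year, e.g. "Star Trek (2009)".
    The first result is almost always the right movie, but this is not guaranteed.
    """
    results = sql_ia.search_movie(title)
    return results[0].movieID if results else None


# Example usage (needs IMDbPY, network access and a populated database):
#   from imdb import IMDb
#   web = IMDb()  # default 'http' access system
#   sql = IMDb('sql', uri='mysql://user:password@localhost/imdb')
#   star_trek = web.get_movie('0796366')
#   sql_id = sql_movie_id(star_trek['long imdb title'], sql)
```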
> > So I need to be aware just when the database is updated.
>
> The IMDb's database?
> The web pages are _constantly_ updated, so you can't tell for sure
> if something new was added.
> The plain text data files are updated once a week (but not every file
> every week).
>
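Since not every file changes every week, a weekly job could at least skip the re-import when nothing was updated. This hypothetical helper compares SHA-256 digests of the downloaded *.list.gz files with those recorded in a JSON manifest from the previous run; the directory layout and the manifest are assumptions, not something IMDbPY provides.

```python
import hashlib
import json
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in 1 MiB chunks so big files are never loaded at once."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def changed_files(data_dir, manifest_path):
    """Return the names of *.list.gz files whose content changed since the last call.

    The manifest (a hypothetical JSON file mapping names to digests) is
    rewritten on every call.
    """
    data_dir, manifest_path = Path(data_dir), Path(manifest_path)
    old = json.loads(manifest_path.read_text()) if manifest_path.exists() else {}
    new = {p.name: file_digest(p) for p in sorted(data_dir.glob("*.list.gz"))}
    manifest_path.write_text(json.dumps(new, indent=2))
    return [name for name, digest in new.items() if old.get(name) != digest]
```

Deleted files are not reported, and the first call lists every file because no manifest exists yet.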
> > Do you think that IMDbPY can be helpful for this issue? Maybe,
> > I can provide a user interface to the administrator of my system
> > so that he/she can update the movie data properly.
>
> IMDbPY can be useful to get easy access to information about
> movies, persons, characters and companies; maybe it's what you
> need, maybe not. :-)
>
> > http://imdbpy.sourceforge.net/docs/README.sqldb.txt also mentions
> > the IMDb diffs files, but I haven't been able to find a document
> > about how to use them with IMDbPY.
>
> Right now they can't be used, except to patch your set of plain
> text data files and then run imdbpy2sql.py again on the patched set.
>
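Patching the plain text files before re-running imdbpy2sql.py could be scripted along these lines. The diffs layout is an assumption (each diff named after the list it patches, extracted into one directory, and applied to uncompressed lists); "patch ORIGINAL DIFF" is the standard patch invocation, and patch_commands/apply_diffs are hypothetical names.

```python
import subprocess
from pathlib import Path


def patch_commands(data_dir, diffs_dir):
    """Build one ["patch", ORIGINAL, DIFF] command per diff that has a matching list.

    Assumptions: the diffs archive was extracted into diffs_dir, each diff is
    named after the list it patches (e.g. movies.list), and the uncompressed
    lists live in data_dir.
    """
    data_dir, diffs_dir = Path(data_dir), Path(diffs_dir)
    commands = []
    for diff in sorted(diffs_dir.glob("*.list")):
        target = data_dir / diff.name
        if target.exists():
            commands.append(["patch", str(target), str(diff)])
    return commands


def apply_diffs(data_dir, diffs_dir):
    """Patch every list in place; stops at the first failing patch."""
    for cmd in patch_commands(data_dir, diffs_dir):
        subprocess.run(cmd, check=True)
```

If your copy of imdbpy2sql.py reads the compressed .list.gz files, gzip the patched lists again before re-running it.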
> IMDbPY can't yet update a database using the patches; Timo Schulz
> is working on the same problem, and trust me: it's a very difficult
> task.
>
> > I am planning to state my thanks to you and your team in the first
> > page of my thesis for all the contribution you have made :)
>
> Thank you very much, I'd be very proud of it. :-)
>
>
> HTH,
> --
> Davide Alberani <davide.alber...@gmail.com> [GPG KeyID: 0x465BFD47]
> http://erlug.linux.it/~da/
>
>
> ------------------------------------------------------------------------------
> The NEW KODAK i700 Series Scanners deliver under ANY circumstances! Your
> production scanning environment may not be a perfect world - but thanks to
> Kodak, there's a perfect scanner to get the job done! With the NEW KODAK
> i700
> Series Scanner you'll get full speed at 300 dpi even with all image
> processing features enabled. http://p.sf.net/sfu/kodak-com
> _______________________________________________
> Imdbpy-help mailing list
> Imdbpy-help@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/imdbpy-help
>