Hi,

I'm new to Python and I've been playing with it, AppEngine and
imdbpy for a couple of weeks. For the project I have in mind I really
need to have as many imdbid values mapped as possible. While
researching this, and checking the raw files myself, I found that many
people ask for it, but it's pretty much impossible for imdbpy2sql to
do better than it already does at guessing ids. Since I really want
that info I came up with a solution that I believe fits easily with
the existing imdbpy philosophy and is also tolerant of the problems
people have pointed out (e.g. names changing).

My solution is based on the fact that searching imdb for the raw names
(in the movies.list file) almost always returns an exact match. That
means that, over time, some applications will end up getting the true
id of a movie, but there is currently no way for imdbpy2sql/the
database to recover the original raw title. What I've done is add a
new column to the titles table called rawMD5, and when importing to
the database I calculate & store the MD5 of the raw title.
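
To make the hashing step concrete, here is a minimal sketch of what I
mean. The function name raw_title_md5 is just something I made up for
this mail, and encoding the title as UTF-8 is an assumption: any
encoding works as long as everybody who computes hashes uses the same
one.

```python
import hashlib


def raw_title_md5(raw_title):
    """Return the hex MD5 digest of a raw title line from movies.list.

    Hypothetical helper: the UTF-8 encoding is an assumption, it only
    has to be the same everywhere hashes are computed.
    """
    return hashlib.md5(raw_title.encode("utf-8")).hexdigest()
```

The returned value is a 32-character hex string, which is what would
go into the rawMD5 column.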

I'm not done with the rest, but the idea is simply to create a
"hash table" matching imdbid values with the MD5 of each raw title.
Using batch import scripts, or simply letting applications collect a
sizable number of hashes and then centralizing them in a single file,
would make it possible for the imdbpy2sql import code to get the right
imdbid codes for most of the records.
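
A rough sketch of how such a shared file could be written and read
back. Nothing here exists in imdbpy: the function names, the
tab-separated "md5, imdbid" line format, and the example title and id
are all placeholders I picked for illustration.

```python
import hashlib
import io


def raw_title_md5(raw_title):
    # Hypothetical helper; UTF-8 is an assumed, shared encoding.
    return hashlib.md5(raw_title.encode("utf-8")).hexdigest()


def dump_table(pairs, out):
    """Write one 'md5<TAB>imdbid' line per (raw_title, imdbid) pair."""
    for raw_title, imdbid in pairs:
        out.write("%s\t%s\n" % (raw_title_md5(raw_title), imdbid))


def load_table(lines):
    """Parse 'md5<TAB>imdbid' lines into a dict keyed by the md5."""
    table = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        digest, imdbid = line.split("\t")
        table[digest] = imdbid
    return table


# Round trip with a made-up title and a made-up id.
buf = io.StringIO()
dump_table([("Example Movie (2010)", "0000001")], buf)
table = load_table(buf.getvalue().splitlines())
```

Files collected by different applications could then be merged by
loading each one and updating a single dict.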

When titles change, new titles appear, etc., the lookup would simply
fail gracefully, and over time the new hash-imdbid pairs could be
made available.
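
By "fail gracefully" I mean something like the sketch below: a title
whose hash is not in the table just gets no imdbid, and the import
carries on. lookup_imdbid is a hypothetical name, not part of
imdbpy2sql, and the title and id are made up.

```python
import hashlib


def lookup_imdbid(raw_title, table):
    """Return the imdbid for raw_title, or None if its hash is unknown.

    'table' is assumed to be a dict mapping hex md5 -> imdbid.
    """
    digest = hashlib.md5(raw_title.encode("utf-8")).hexdigest()
    return table.get(digest)


# Made-up entry: one known title, everything else is a miss.
known = {hashlib.md5(b"Example Movie (2010)").hexdigest(): "0000001"}

hit = lookup_imdbid("Example Movie (2010)", known)
miss = lookup_imdbid("Renamed Movie (2010)", known)  # None, no error
```

A miss leaves the record exactly as it would be today, so the extra
table can never make the import worse.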

Let me know what you think. The changes to support an MD5 column are
just 2-3 lines iirc and shouldn't cause problems for anyone, yet they
would allow this type of feature to be implemented even if it lives
outside the imdbpy code base. Having db-compatibility with the
upstream code base would be a very nice bonus, though.

Cheers!

_______________________________________________
Imdbpy-help mailing list
Imdbpy-help@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/imdbpy-help
