Phenomenally helpful.

So in short, if I understand it all correctly, IMDb has in its own
database: the real IMDb ID, whether something is adult, and whether
something is on Amazon/Blockbuster. However, none of this is shared by
them, and the only way to get any of it is to use a script to do a
title search and scrape the pages?

On Fri, Mar 19, 2010 at 6:16 AM, Davide Alberani
<davide.alber...@gmail.com> wrote:
> On Mar 19, Michael Liu <mikel...@gmail.com> wrote:
>
>> I used imdbpy2sql to populate a local database with the IMDB data, but
>> have some questions about the data.
>
> Hi!
> I take this as an opportunity to write some FAQs to put in the
> documentation, since these questions came up a lot, lately. :-)
>
>> In the titles table, the column imdb_id is empty. Am I missing a file
>> I needed to download to fill this in? How can I get imdb_ids?
>
> Q2: why are the movieIDs (and other IDs) used in the 'sql' database not
>    the same as those used on the IMDb.com site?
>
> A2: first, a bit of nomenclature: we'll call "movieID" (or things like
>    "personID", for instances of the Person class) a unique identifier used
>    by IMDbPY to manage a single movie (or other kinds of object).
>    We'll call "imdbID" a unique identifier used, for the same kind
>    of data, by the IMDb.com site (i.e.: the 7-digit number in tt0094226,
>    as seen in the URL for "The Untouchables").
>
>    Using IMDbPY to access the web ('http' and 'mobile' data access
>    systems), movieIDs and imdbIDs are the same thing - beware that
>    in this case a movieID is a string, with the leading zeroes.
>
>    Unfortunately, populating a sql database with data from the plain
>    text data files, we don't have access to imdbIDs - since they are
>    not distributed at all - and so we have to make them ourselves
>    (they are the 'id' column in tables like 'title' or 'name').
>    This means that these values are valid only for your current database:
>    if you update it with a newer set of plain text data files, these IDs
>    will surely change (and, by the way, they are integers).
>    It's also obvious, now, that you can't exchange IDs between the
>    'http' (or 'mobile') data access system and 'sql', and in the same
>    way you can't use imdbIDs with your local database or vice-versa.
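
A minimal sketch of the string-versus-integer point above, assuming the
usual http://www.imdb.com/title/tt.../ URL layout. The helper name
imdb_title_url is hypothetical and is not part of IMDbPY. It is only
meaningful for real imdbIDs, never for the local 'id' values of a sql
database.

```python
def imdb_title_url(imdb_id):
    """Build a title URL from a real imdbID (hypothetical helper).

    Accepts the ID as an int, as a string with leading zeroes, or with
    the 'tt' prefix. Do NOT pass the local 'id' column of a sql
    database: those values are not imdbIDs.
    """
    digits = str(imdb_id).strip()
    if digits.startswith('tt'):
        digits = digits[2:]
    if not digits.isdigit():
        raise ValueError('not a numeric imdbID: %r' % (imdb_id,))
    # The site uses at least 7 digits, so restore any lost leading zeroes.
    return 'http://www.imdb.com/title/tt%s/' % digits.zfill(7)
```

Padding matters because an imdbID stored as an integer loses its
leading zeroes, while the web data access systems keep it as a string.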
>
>
> Q3: using a sql database, what's the imdb_id (or something like that)
>    column in tables like 'title', 'name' and so on?
>
> A3: it's used internally by IMDbPY to remember the imdbID (the one
>    used by the web site - accessing the database you'll use the numeric
>    value of the 'id' column, as movieID) of a movie, once it has stumbled
>    upon it.  This way, if IMDbPY is asked again about the imdbID of
>    a movie (or person, or ...), it doesn't have to contact the web site
>    again.  Notice that you have to access the sql database using
>    a user with write permission, to update it.
>
>    As a bonus, when possible, the values of these imdbIDs are saved
>    between updates of the sql database (using the imdbpy2sql.py script).
>    Beware that it's tricky and not always possible, but the script does
>    its best to succeed.
>
> Q4: but what if I really need the imdbIDs, to use my database?
>
> A4: no, you don't.  Search for a title, get its information.  Be happy!
>
> Q5: I have a great idea: write a script to fetch all the imdbIDs from the
>    web site!  Can't you do it?
>
> A5: yeah, I can.  But I won't. :-)
>    It would be somewhat easy to map every title on the web to its
>    imdbID, but there are still a lot of problems.
>    First of all, every user would end up doing it for their own copy
>    of the plain text data files (and this would make the imdbpy2sql.py
>    script painfully slow and prone to all sorts of problems).
>    Moreover, the imdbIDs are unique and never reused, true, but movie
>    titles _do_ change: to fix typos, to override working titles, to cope
>    with a new movie with the same title released in the same year (not
>    to mention cancelled or postponed movies).
>
>    Besides that, we'd have to do the same for persons, characters and
>    companies.  Believe me: it doesn't make sense.
>    Work on your local database using your movieIDs (or even better:
>    don't worry about movieIDs and think in terms of searches and Movie
>    instances!) and retrieve imdbIDs only in the rare circumstances
>    when you really need them (see the next FAQ).
>    Repeat after me: I DON'T NEED ALL THE imdbIDs. :-)
>
>> Without the imdb_id, is it possible for me to generate a link to a
>> given movie on IMDB?
>
> Q6: using a sql database, how can I convert a movieID (whose value
>    is valid only locally) to an imdbID (the ID used by the imdb.com site)?
>
> A6: various functions can be used to convert a movieID (or personID or
>    other IDs) to the imdbID used by the web site.
>    Example of code:
>
>      from imdb import IMDb
>      ia = IMDb('sql', uri=URI_TO_YOUR_SQL_DATABASE)
>      movie = ia.search_movie('The Untouchables')[0] # a Movie instance.
>      print 'The movieID for The Untouchables:', movie.movieID
>      print 'The imdbID used by the site:', ia.get_imdbMovieID(movie.movieID)
>      print 'Same ID, smarter function:', ia.get_imdbID(movie)
>
>    It goes without saying that get_imdbMovieID has some sibling
>    methods: get_imdbPersonID, get_imdbCompanyID and get_imdbCharacterID.
>    Also notice that the get_imdbID method is smarter, and takes any kind
>    of instance (the other functions need a movieID, personID, ...)
>
>    Another method that will try to retrieve the imdbID is get_imdbURL,
>    which works like get_imdbID but returns a URL.
>
>    In case of problems, these methods will return None.
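
Since these methods return None when the ID cannot be resolved, code
that builds links has to handle that case. A minimal sketch, assuming
'ia' is an IMDb('sql', ...) instance as in the example above; the
helper name movie_label and its 'title' argument are hypothetical.

```python
def movie_label(ia, movie, title):
    """Return 'title <url>' when a URL is available, else just the title.

    'ia' is any object with a get_imdbURL(instance) method, such as the
    object returned by IMDb('sql', uri=...).  Hypothetical helper.
    """
    url = ia.get_imdbURL(movie)
    if url is None:
        # The imdbID could not be retrieved: fall back to plain text.
        return title
    return '%s <%s>' % (title, url)
```

This keeps the lookup to the rare moment a link is really needed,
rather than trying to resolve every imdbID up front.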
>
>> Also, the online IMDB is aware of which titles are adult movies, but I
>> don't see any similar column in my local database. How can I determine
>> whether a movie is adult or not?
>
> Read README.adult and see imdb/parser/sql/__init__.py: when searching
> for a title, it tries to guess whether it's an adult title.
> It can't be perfect and I don't assume any kind of responsibility
> on this matter. ;-)
>
>> Lastly, online IMDB seems to know which movies are and aren't
>> available on Amazon and Blockbuster. Is that in the database
>> somewhere?
>
> No.
> Accessing the web ('http' and 'mobile'), there are parsers for
> the 'amazon reviews' page, but this information is not published
> in the plain text data files.
>
>
> HTH,
> --
> Davide Alberani <davide.alber...@gmail.com> [GPG KeyID: 0x465BFD47]
> http://www.mimante.net/
>

_______________________________________________
Imdbpy-help mailing list
Imdbpy-help@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/imdbpy-help