On Sun, Nov 28, 2010 at 2:50 AM, Derek Ditch <derek.ditch...@gmail.com> wrote:
>
> I'm working on a project that analyzes graph structures using a modified
> version of PageRank for a sample data set, I'm considering IMDB, using
> imdbpy,

Hi!
Your project sounds extremely cool. :-)

> So, it doesn't look like imdbpy has the ability built-in to iterate through
> all movies, or movies of a specific genre, nor of actors. So I suppose I
> will create a new method of the Movie class and implementation within the
> sql parser to return all results.

You're right: by itself IMDbPY doesn't have the ability to iterate over a
large subset of the IMDb database, and I don't think it's a feature we
should integrate too much; keep in mind that as a principle IMDbPY tries
to be transparent regarding its access to the information: such a feature
would be too specific of the SQL database, and impossible - or at least
legally dubious - to implement for the HTTP access.

Obviously a more or less separated package/framework to work on bunches
of items extracted from the SQL database would be more than welcome and,
to tell the truth, a way to express more complex searches on the SQL database
could be a really nice and useful feature. :-)

Basically, it works this way: each items which must be uniquely
identified (Person,
Movie, Character and Company instances) uses the 'id' primary key column of
its database table as ID (as you may have noticed, the ID used for a movie in
the SQL database is _not_ the imdbID used by IMDb on its web site, since the
latter are not included in the plain text data file).

So the best approach would be to access the SQL database in the same way
IMDbPY does: since we're amazingly cool ;-) we didn't settle on one ORM, but
we transparently support both SQLObject (we use its semantic, in our code)
and SQLAlchemy.
Their interface is abstracted in the dbschema.py, alchemyadapter.py and
objectadapter.py (beware: there's a certain amount of black magic involved :-)

To use them, the process is somewhat manual, and could probably be more
automated; to import what you need, see the __init__ method of the
IMDbSqlAccessSystem class in the sql/__init__.py file (very similar code can
be fund in the imdbpy2sql.py script, around line 277.
After that, as said, you can access the database using the created objects
(the ones returned by the getDBTables function) using the SQLObject syntax.
With these object, you can create complex queries on the database; once
you have the list of IDs you're interested in (or a generator of IDs), you can
use a normal imdb.IMDb('sql', ...) instance to access every information you
need using IMDbPY.

It goes without saying that it's possible that the information that you need
are somewhat limited, and such a solution could be too much: maybe you
can give up using the ORM abstraction and even IMDbPY, working directly
on the database.

Let me know how you decide to proceed, and if you need any help - as ideas
or clarification on the internals of IMDbPY: unfortunately right now I don't
have any time to write code :-/

-- 
Davide Alberani <davide.alber...@gmail.com>  [PGP KeyID: 0x465BFD47]
http://www.mimante.net/

------------------------------------------------------------------------------
Increase Visibility of Your 3D Game App & Earn a Chance To Win $500!
Tap into the largest installed PC base & get more eyes on your game by
optimizing for Intel(R) Graphics Technology. Get started today with the
Intel(R) Software Partner Program. Five $500 cash prizes are up for grabs.
http://p.sf.net/sfu/intelisp-dev2dev
_______________________________________________
Imdbpy-help mailing list
Imdbpy-help@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/imdbpy-help

Reply via email to