On Mar 22, Michael Liu <mikel...@gmail.com> wrote:

> In the title table generated by imdbpy2sql, what are the meanings of
> the column titled imdb_index and phonetic_code?

They are internally used, so they're not documented.

phonetic_code (and the various *_pcode columns in other tables) is
used when you search for a given title; its value is calculated
at insert time, and is a SOUNDEX phonetic code (i.e. a representation
of how a given word/phrase sounds).
This way, when handling a search, we can select the subset of titles
in the database that "sound similar" to the one we're searching for;
this subset is then ordered using a Ratcliff-Obershelp similarity metric.
You can find the layout of the database in the imdb.parser.sql.dbschema
module (it's written in an abstracted form: we're pretty naive and
support both SQLObject and SQLAlchemy... ;-) ).
An old message about Soundex/Ratcliff-Obershelp:

http://sourceforge.net/mailarchive/message.php?msg_name=20060407152643.GB4376%40libero.it
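
To make the two-step search described above a bit more concrete, here is
a minimal sketch. It is NOT the code IMDbPY uses: soundex() below is the
classic 4-character American Soundex (IMDbPY's own variant may differ),
and difflib.SequenceMatcher from the standard library is only a
Ratcliff-Obershelp-style matcher standing in for the real metric. The
names soundex() and search() and the sample titles are made up for
illustration.

```python
import difflib

# Classic Soundex consonant groups (an assumption: IMDbPY may use a variant).
_CODES = {}
for _letters, _digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(text, length=4):
    """Return the classic Soundex code of text, padded with zeros."""
    letters = [c for c in text.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    code = letters[0]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        digit = _CODES.get(ch, "")
        # Emit a digit only when it differs from the previous coded letter.
        if digit and digit != prev:
            code += digit
        # H and W do not separate two letters that share a code; vowels do.
        if ch not in "HW":
            prev = digit
    return (code + "0" * length)[:length]


def search(query, titles):
    """Keep the titles that sound like query, most similar first."""
    wanted = soundex(query)
    candidates = [t for t in titles if soundex(t) == wanted]

    def similarity(title):
        # Ratcliff-Obershelp-style ratio in the range 0.0 .. 1.0.
        return difflib.SequenceMatcher(None, query.lower(), title.lower()).ratio()

    return sorted(candidates, key=similarity, reverse=True)


if __name__ == "__main__":
    sample = ["The Matrix Reloaded", "Casablanca", "The Matrix"]
    for title in search("The Matrix", sample):
        print(soundex(title), title)
```

In a real database the phonetic code is stored in a column at insert
time, so the first step becomes a cheap equality test on that column
instead of recomputing the code for every title.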
 

imdb_index is what, a long time ago, I decided to call the "imdbIndex"
(probably not a very good name...): it's used when two movies
produced in the same year share the same title.  It's the Roman numeral
you may see on the imdb.com page after a title, inside the parentheses
containing the production year, separated from the year by a slash.

Example:
  10 Bullets (2007/I)
  10 Bullets (2007/II)

It's also used to disambiguate persons' names.
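
If you ever need to split such a string yourself, the following is a
small sketch of how it could be done. The function split_title() and
its regular expression are hypothetical, written only for this example
(they are not part of IMDbPY), and they assume the simple
"Title (YEAR/INDEX)" form shown above with a four-digit year.

```python
import re

# Hypothetical pattern: "Title (2007)" or "Title (2007/II)".
_TITLE_RE = re.compile(
    r"^(?P<title>.+?)\s+\((?P<year>\d{4})(?:/(?P<index>[IVXLCDM]+))?\)$"
)


def split_title(long_title):
    """Split 'Title (YEAR/INDEX)' into title, year and imdb_index."""
    text = long_title.strip()
    match = _TITLE_RE.match(text)
    if match is None:
        # Not in the expected form: return the text untouched.
        return {"title": text, "year": None, "imdb_index": None}
    return {
        "title": match.group("title"),
        "year": int(match.group("year")),
        # None when the title has no disambiguating index.
        "imdb_index": match.group("index"),
    }


if __name__ == "__main__":
    print(split_title("10 Bullets (2007/I)"))
    print(split_title("10 Bullets (2007/II)"))
```

The two example movies end up with the same title and year, and only
the imdb_index value tells them apart.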


Now... a question: do you really need to understand the internals
of IMDbPY?  It's perfectly legit, but IMDbPY is not only a tool to
put the plain text data files into a SQL database: it's perfectly
able to extract the information from the database, too. :-)
Are you sure that you need to directly access the database, without
using IMDbPY?


Bye!
-- 
Davide Alberani <davide.alber...@gmail.com> [GPG KeyID: 0x465BFD47]
http://www.mimante.net/

_______________________________________________
Imdbpy-help mailing list
Imdbpy-help@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/imdbpy-help
