Hi Davide,

Thanks for the code, but it doesn't really work so far. Below are the
top 10 movies, each shown as movie['title'] followed by the result of
guessEnglishTitle. The AKA matching doesn't seem to be working (except
for The Good, the Bad and the Ugly :-))

The Shawshank Redemption
Втеча з Шоушенка
-
The Godfather
Mario Puzo's The Godfather
-
The Godfather: Part II
Mario Puzo's The Godfather: Part II
-
Il buono, il brutto, il cattivo.
The Good, the Bad and the Ugly
-
Pulp Fiction
Кримiнальне чтиво
-
Schindler's List
Список Шиндлера
-
One Flew Over the Cuckoo's Nest
Zbor deasupra unui cuib de cuci
-
The Dark Knight
Batman: The Dark Knight
-
The Lord of the Rings: The Return of the King
The Return of the





Zsolt





On Sun, Nov 13, 2011 at 7:11 PM, Davide Alberani
<davide.alber...@gmail.com> wrote:
> On Sun, Nov 13, 2011 at 18:19, Zsolt Ero <zsolt....@gmail.com> wrote:
>>
>> Yes, they always differ for all international movies.
>
> It has to be this way, to have consistency with the data from the plain
> text data files.
>
> Your best shot is something like this (modify it as you wish):
>
> import re
> import imdb
>
> ia = imdb.IMDb()
> ibibic = ia.get_movie("0060196")
>
> # List of regexps used to search for
> # a possible English title in the notes.
> # Notice that the order matters: the
> # first match wins.
> # Right now .findall is used, but .match
> # can be used, too (changing the regexps...)
> # Also notice that there may be other variations
> # that can be considered, like '(imdb display title)',
> # '(literal English title)' and so on.
> # Raw strings keep the backslashes intact; \b stops 'UK' from
> # matching 'Ukraine' (and similar) when re.I is in effect.
> _re_english_akas_notes = (
>    re.compile(r'^International \(English', re.I),
>    re.compile(r'International \(English', re.I),
>    re.compile(r'^USA\b', re.I),
>    re.compile(r'\bUSA\b', re.I),
>    re.compile(r'^UK\b', re.I),
>    re.compile(r'\bUK\b', re.I),
>    re.compile(r'^English', re.I),
>    re.compile(r'English', re.I),
> )
>
> def guessEnglishTitle(movie, _releaseInfoToo=True):
>    """Return the guessed English title
>    of the movie, or the default title,
>    if unable to guess."""
>    # Consider both AKAs from the main
>    # and release info pages.
>    akas = movie.get('akas') or []
>    if _releaseInfoToo:
>        # FIXME: ia MUST be a parameter of this function!
>        ia.update(movie, 'release dates')
>        akas += movie.get('akas from release info') or []
>    aka_list = []
>    for aka in akas:
>        aka_split = aka.split('::', 1)
>        if len(aka_split) < 2:
>            continue
>        aka_list.append(aka_split)
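>    best_guess = None
>    # The regexps go in the outer loop, so that their
>    # order really decides which AKA wins.
>    for re_ in _re_english_akas_notes:
>        for title, note in aka_list:
>            if re_.findall(note):
>                best_guess = title
>                break
>        if best_guess:
>            break
>    # Return the guess, not whatever title the loop saw last.
>    return best_guess or movie.get('title')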
>
> print(guessEnglishTitle(ibibic))
>
>
>
>
> --
> Davide Alberani <davide.alber...@gmail.com>  [PGP KeyID: 0x465BFD47]
> http://www.mimante.net/
>
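
For what it's worth, here is a minimal offline sketch of the same
note-matching idea, so the logic can be checked without fetching
anything from IMDb. It assumes AKAs arrive as 'title::note' strings,
as in the quoted code. The function name pick_english_title, the
shortened pattern list and the sample AKAs are all made up for
illustration; none of them come from IMDbPY. The country codes are
wrapped in \b because, with re.I, a bare 'UK' also matches 'Ukraine'.

```python
# -*- coding: utf-8 -*-
import re

# Hypothetical priority list, modelled on the patterns in the quoted code.
# Earlier patterns win over later ones.
ENGLISH_NOTE_PATTERNS = tuple(
    re.compile(pattern, re.I)
    for pattern in (
        r'^International \(English',
        r'\bUSA\b',
        r'\bUK\b',       # \b keeps this from matching 'Ukraine'
        r'\bEnglish\b',
    )
)


def pick_english_title(akas, default=None):
    """Return the title whose note matches the highest-priority pattern.

    akas is a list of 'title::note' strings; entries without a note
    are ignored. If nothing matches, default is returned.
    """
    pairs = [aka.split('::', 1) for aka in akas if '::' in aka]
    # Patterns in the outer loop: their order sets the priority,
    # regardless of the order in which the AKAs are listed.
    for pattern in ENGLISH_NOTE_PATTERNS:
        for title, note in pairs:
            if pattern.search(note):
                return title
    return default


# Made-up sample data, not real IMDb output.
SAMPLE_AKAS = [
    'Film Uno::Italy (original title)',
    'Фільм Один::Ukraine',
    'Movie One::USA',
    'Film One::International (English title)',
    'No note here',
]

print(pick_english_title(SAMPLE_AKAS, default='Film Uno'))  # prints: Film One
```

Here 'Film One' wins over 'Movie One' even though the USA entry comes
first in the list, because the International pattern has the higher
priority; the Ukraine entry is never picked.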

_______________________________________________
Imdbpy-help mailing list
Imdbpy-help@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/imdbpy-help
