Re: Disable HTML in forum messages (was: Movie (MPAA) ratings and Python?)

2013-12-12 Thread Steven D'Aprano
On Wed, 11 Dec 2013 19:23:39 -0800, rusi wrote:

 The problem is that then your other mails (may) become plain text and
 your friends/recipients will wonder whether you've entered a
 time-machine and gone back to 1990!!

Not everything that's changed since 1990 has been an improvement.


 Many people find it simpler to just use Google groups.  It also has its
 problems (as do all methods!) but in sum it's the easiest option to use.

How ironic. After mocking those of us who prefer to send and receive 
plain text, you then recommend that people use a delivery mechanism which 
sends plain text.



-- 
Steven
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Disable HTML in forum messages (was: Movie (MPAA) ratings and Python?)

2013-12-12 Thread Steve Hayes
On 12 Dec 2013 11:05:35 GMT, Steven D'Aprano
steve+comp.lang.pyt...@pearwood.info wrote:

On Wed, 11 Dec 2013 19:23:39 -0800, rusi wrote:

 The problem is that then your other mails (may) become plain text and
 your friends/recipients will wonder whether you've entered a
 time-machine and gone back to 1990!!

Not everything that's changed since 1990 has been an improvement.

And vice versa. 


-- 
Steve Hayes from Tshwane, South Africa
Web:  http://www.khanya.org.za/stevesig.htm
Blog: http://khanya.wordpress.com
E-mail - see web page, or parse: shayes at dunelm full stop org full stop uk


Re: Movie (MPAA) ratings and Python?

2013-12-12 Thread Dave Angel
On Wed, 11 Dec 2013 23:22:14 -0700, Michael Torrie 
torr...@gmail.com wrote:
From what I can see gmail is producing a multipart message that has a
plain text part and an html part.  This is what gmail normally does and
as far as I know it's RFC-compliant and that's what gmail always does.

Always does doesn't mean it's a good idea on a text newsgroup. 

Very often the pretty text in the html part is mangled in the text 
part. Most often this is just indentation,  but for Python that's a 
biggie. It also means that we don't all see the same thing. 

Including both makes the download slower and more expensive. 

Some text newsreaders refuse to show anything if there's an html 
part.  Mine (groundhog on android) apparently shows the text part if 
it follows the html part.


--
DaveA



Re: Movie (MPAA) ratings and Python?

2013-12-11 Thread Petite Abeille

On Dec 11, 2013, at 12:50 AM, Dan Stromberg drsali...@gmail.com wrote:

 Now the question becomes: Why did chardet tell me it was windows-1255?  :)

As it says on the tin: chardet guesses the encoding of text files. The 
operative word is ‘guesses’.
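Because chardet only guesses, one cheap sanity check is to try a handful of candidate codecs in strict mode and see which of them decode the bytes at all. This is a minimal standard-library sketch; chardet itself is not used, and the candidate list, helper name and sample bytes are illustrative assumptions:

```python
def encodings_that_decode(data, candidates=("ascii", "utf-8", "cp1252", "cp1255", "iso-8859-1")):
    """Return the candidate encodings that decode `data` without raising."""
    working = []
    for name in candidates:
        try:
            data.decode(name)  # errors="strict" is the default
        except UnicodeDecodeError:
            continue
        working.append(name)
    return working


# Hypothetical sample: "café" as saved by a Latin-1 editor.
print(encodings_that_decode(b"caf\xe9"))  # ['cp1252', 'cp1255', 'iso-8859-1']
```

Three different codecs accept those four bytes without complaint. That is why a detector has to fall back on statistics and can only report a confidence, not an answer.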



Re: Movie (MPAA) ratings and Python?

2013-12-11 Thread Ned Batchelder

On 12/10/13 6:50 PM, Dan Stromberg wrote:


On Tue, Dec 10, 2013 at 1:07 PM, Petite Abeille
petite.abei...@gmail.com wrote:


On Dec 10, 2013, at 6:25 AM, Dan Stromberg drsali...@gmail.com wrote:

  The IMDB flat text file probably came the closest, but it appears
to have encoding issues; it's apparently nearly windows-1255, but
not quite.

It's ISO-8859-1.

Thanks - that reads well from CPython 3.3.

Now the question becomes: Why did chardet tell me it was windows-1255?  :)


It probably told you it was Windows-1252 (I'm assuming the last 5 is a 
typo).


Windows-1252 is a super-set of ISO-8859-1, so any text that is correct 
ISO-8859-1 is also correct Windows-1252.  In addition, it's not uncommon 
to find text marked as ISO-8859-1 that in fact has characters that make 
it Windows-1252.
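The overlap holds for the printable characters. The two code pages differ only in the 0x80-0x9F range, which ISO-8859-1 reserves for C1 control codes and Windows-1252 mostly fills with punctuation such as curly quotes. A small illustration with made-up sample bytes:

```python
# Made-up sample: curly double quotes as Windows-1252 stores them.
raw = b"\x93quoted\x94"

as_latin1 = raw.decode("iso-8859-1")  # 0x93/0x94 become invisible C1 control characters
as_cp1252 = raw.decode("cp1252")      # 0x93/0x94 become U+201C/U+201D curly quotes

print(repr(as_latin1))  # '\x93quoted\x94'
print(repr(as_cp1252))  # '“quoted”'

# A few bytes in that range (0x81, 0x8D, 0x8F, 0x90, 0x9D) are unassigned in
# Python's cp1252 codec, so "superset" describes the printable characters
# rather than every possible byte.
try:
    b"\x81".decode("cp1252")
except UnicodeDecodeError as exc:
    print("cp1252 rejects 0x81:", exc.reason)
```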



--
Ned Batchelder, http://nedbatchelder.com



Re: Movie (MPAA) ratings and Python?

2013-12-11 Thread Dan Stromberg
On Wed, Dec 11, 2013 at 10:35 AM, Ned Batchelder n...@nedbatchelder.com wrote:

 On 12/10/13 6:50 PM, Dan Stromberg wrote:
 Now the question becomes: Why did chardet tell me it was windows-1255?  :)

 It probably told you it was Windows-1252 (I'm assuming the last 5 is a
 typo).

 Windows-1252 is a super-set of ISO-8859-1, so any text that is correct
 ISO-8859-1 is also correct Windows-1252.  In addition, it's not uncommon to
 find text marked as ISO-8859-1 that in fact has characters that make it
 Windows-1252.


 $ chardet mpaa-ratings-reasons.list
mpaa-ratings-reasons.list: windows-1255 (confidence: 0.97)

I'm aware that chardet is playing guessing games, though one would hope it
would guess well most of the time, and give a reasonable confidence rating.


Re: Movie (MPAA) ratings and Python?

2013-12-11 Thread Steven D'Aprano
On Wed, 11 Dec 2013 15:07:35 -0800, Dan Stromberg wrote:

  $ chardet mpaa-ratings-reasons.list
 mpaa-ratings-reasons.list: windows-1255 (confidence: 0.97)
 
 I'm aware that chardet is playing guessing games, though one would hope
 it would guess well most of the time, and give a reasonable confidence
 rating. 

What reason do you have for thinking that Windows-1255 isn't a reasonable 
guess? If the bulk of the text is Latin-1 except perhaps for one or two 
Hebrew characters (or what chardet thinks are Hebrew characters), it may 
actually be a reasonable guess.
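One way to judge whether such a guess is reasonable is to decode the same bytes both ways and look at what the non-ASCII characters turn into. This sketch uses `unicodedata.name`; the helper name and the sample bytes are illustrative:

```python
import unicodedata


def name_non_ascii(data, encoding):
    """Decode `data` and return the Unicode names of its non-ASCII characters."""
    text = data.decode(encoding, errors="replace")  # undecodable bytes become U+FFFD
    return [unicodedata.name(ch, "UNNAMED") for ch in text if ord(ch) > 127]


sample = b"caf\xe9"
print(name_non_ascii(sample, "iso-8859-1"))  # ['LATIN SMALL LETTER E WITH ACUTE']
print(name_non_ascii(sample, "cp1255"))      # ['HEBREW LETTER YOD']
```

A Hebrew letter sitting inside an otherwise Latin word is a strong hint that the guess is wrong, even when the decode itself succeeds.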

If it is a poor guess, perhaps you ought to report it to the chardet 
maintainers as a good example of a poor guess.


By the way, this forum is a text-only newsgroup and so-called Rich 
Text (actually HTML) posts are frowned upon because most people don't 
appreciate having to read gunk like this:

 <div dir="ltr"><br><div class="gmail_extra"><div
 class="gmail_quote"> ... <br>
 <blockquote class="gmail_quote" style="margin:0px 0px 0px
 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div
 class="im"> ... <br></div></div></div></div>

If you can, would you please turn off rich text posting when you post 
here please?

Thank you.



-- 
Steven


Re: Movie (MPAA) ratings and Python?

2013-12-11 Thread Dan Stromberg
On Wed, Dec 11, 2013 at 3:24 PM, Steven D'Aprano 
steve+comp.lang.pyt...@pearwood.info wrote:

 On Wed, 11 Dec 2013 15:07:35 -0800, Dan Stromberg wrote:

   $ chardet mpaa-ratings-reasons.list
  mpaa-ratings-reasons.list: windows-1255 (confidence: 0.97)
 
  I'm aware that chardet is playing guessing games, though one would hope
  it would guess well most of the time, and give a reasonable confidence
  rating.

 What reason do you have for thinking that Windows-1255 isn't a reasonable
 guess? If the bulk of the text is Latin-1 except perhaps for one or two
 Hebrew characters (or what chardet thinks are Hebrew characters), it may
 actually be a reasonable guess.


I get a traceback if I try to read the file as Windows-1255.  I don't get a
traceback if I read it as ISO-8859-1.


 If it is a poor guess, perhaps you ought to report it to the chardet
 maintainers as a good example of a poor guess.

I was considering that, and may do so.

I've also been wondering if ISO-8859-1 is just an octet-oriented codec, so
it'll read about anything.  There are clearly non-7-bit-ASCII characters in
the file that look like line noise in an mrxvt.
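ISO-8859-1 does in fact map each byte value N straight to code point U+00NN, so every possible byte sequence decodes without error and encodes back unchanged. That makes it a safe way to read arbitrary bytes, but a successful decode says nothing about whether the characters are the intended ones. A quick check:

```python
all_bytes = bytes(range(256))

# Latin-1 assigns a character to every one of the 256 byte values...
text = all_bytes.decode("iso-8859-1")
print(len(text))                                                  # 256
print(all(ord(ch) == byte for ch, byte in zip(text, all_bytes)))  # True

# ...and the mapping is lossless in both directions.
print(text.encode("iso-8859-1") == all_bytes)                     # True
```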

By the way, this forum is a text-only newsgroup and so-called Rich
 Text (actually HTML) posts are frowned upon because most people don't
 appreciate having to read gunk like this:

  <div dir="ltr"><br><div class="gmail_extra"><div
  class="gmail_quote"> ... <br>
  <blockquote class="gmail_quote" style="margin:0px 0px 0px
  0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div
  class="im"> ... <br></div></div></div></div>

 If you can, would you please turn off rich text posting when you post
 here please?

 Thank you.

Apologies.  I didn't realize gmail was doing this.   I had thought it would
only do so if I used the formatting options in the composer, but perhaps it
does so even when just typing text.

I formerly used MH; are you using MH?  There aren't a lot of e-mail programs
that don't do HTML anymore.  Even mutt can do HTML with very slight
configuration; it's actually quite powerful and ISTR it can do MH folders.

I found a remove formatting button in gmail's composer, and used it on
this message.  Does this message look like plain text?

I'm not really prepared to give up gmail's quick searching; I used to index
my e-mails using pyindex and dovecot, but happily I don't need to anymore.


Re: Movie (MPAA) ratings and Python?

2013-12-11 Thread Ned Batchelder

On 12/11/13 6:39 PM, Dan Stromberg wrote:


On Wed, Dec 11, 2013 at 3:24 PM, Steven D'Aprano
steve+comp.lang.pyt...@pearwood.info wrote:

On Wed, 11 Dec 2013 15:07:35 -0800, Dan Stromberg wrote:

   $ chardet mpaa-ratings-reasons.list
  mpaa-ratings-reasons.list: windows-1255 (confidence: 0.97)
 
  I'm aware that chardet is playing guessing games, though one would hope
  it would guess well most of the time, and give a reasonable confidence
  rating.

What reason do you have for thinking that Windows-1255 isn't a reasonable
guess? If the bulk of the text is Latin-1 except perhaps for one or two
Hebrew characters (or what chardet thinks are Hebrew characters), it may
actually be a reasonable guess.


I get a traceback if I try to read the file as Windows-1255.  I don't
get a traceback if I read it as ISO-8859-1.

If it is a poor guess, perhaps you ought to report it to the chardet
maintainers as a good example of a poor guess.

I was considering that, and may do so.

I've also been wondering if ISO-8859-1 is just an octet-oriented codec,
so it'll read about anything.  There are clearly non-7-bit-ASCII
characters in the file that look like line noise in an mrxvt.


Both ISO-8859-1 and Windows-1255 are octet-oriented, I don't see why one 
would raise an exception when the other didn't.  Unless the exception 
isn't on the decode, but instead on your attempt to output the result. 
Can you show the full traceback you're seeing?
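The two failure modes raise different exception classes, so the last line of a traceback tells them apart: a bad input byte gives `UnicodeDecodeError`, while a character the output stream cannot represent gives `UnicodeEncodeError`. A small sketch; the helper name and the sample bytes are hypothetical:

```python
def which_unicode_error(action):
    """Run `action` and report which kind of Unicode error, if any, it raised."""
    try:
        action()
    except UnicodeDecodeError:
        return "decode"  # bytes -> str failed: the input codec is wrong
    except UnicodeEncodeError:
        return "encode"  # str -> bytes failed: the output codec can't hold a character
    return "ok"


data = b"M\xfcnchen"  # 0xFC is 'ü' in ISO-8859-1
print(which_unicode_error(lambda: data.decode("cp1255")))                      # decode
print(which_unicode_error(lambda: data.decode("iso-8859-1").encode("ascii")))  # encode
print(which_unicode_error(lambda: data.decode("iso-8859-1")))                  # ok
```

Printing decoded text to a terminal configured for ASCII is a typical way to hit the second case.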


--
Ned Batchelder, http://nedbatchelder.com



Disable HTML in forum messages (was: Movie (MPAA) ratings and Python?)

2013-12-11 Thread Ben Finney
Dan Stromberg drsali...@gmail.com writes:

 On Wed, Dec 11, 2013 at 3:24 PM, Steven D'Aprano 
 steve+comp.lang.pyt...@pearwood.info wrote:
  By the way, this forum is a text-only newsgroup and so-called Rich
  Text (actually HTML) posts are frowned upon […]
  If you can, would you please turn off rich text posting when you post
  here please?

 Apologies. I didn't realize gmail was doing this. I had thought it
 would only do so if I used the formatting options in the composer, but
 perhaps it does so even when just typing text.

Thanks for taking measures to send messages in plain text.

 I found a remove formatting button in gmail's composer, and used it
 on this message. Does this message look like plain text?

Still sent with an HTML part, so some other change must be needed to
disable that.

 There isn't a lot of e-mail programs that don't do HTML anymore.

Many of the better mail clients allow the user to explicitly stop
rendering HTML (but still have it available, as Steven points out).

Disabling HTML in messages is a good idea: HTML rarely adds anything
useful to a message in a discussion forum, but it can cause the mail
program to take actions the user doesn't want (e.g. fetching images from
elsewhere, running ECMAScript, or triggering HTML rendering bugs).

Plain text doesn't have those problems, which is why it's more courteous
to stop sending HTML messages in most cases.

Because it's inefficient to poll many recipients for whether their
system can work with HTML messages, avoiding sending HTML altogether is
especially advisable with multiple recipients, such as discussion
forums.
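When a program composes its own mail, sending plain text only simply means never adding an HTML alternative. A minimal sketch with the standard `email` package; the helper name, addresses and subject are placeholders, and none of this changes how Gmail's web composer behaves:

```python
from email.message import EmailMessage


def plain_text_message(sender, recipient, subject, body):
    """Build a single-part text/plain message with no HTML alternative."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)  # a str body produces one text/plain part
    return msg


msg = plain_text_message("me@example.com", "python-list@python.org",
                         "Re: Movie (MPAA) ratings and Python?", "Thanks!\n")
print(msg.get_content_type())  # text/plain
print(msg.is_multipart())      # False
```

Actually delivering it would be a matter of handing `msg` to `smtplib.SMTP.send_message`.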

 I'm not really prepared to give up gmail's quick searching; I used to
 index my e-mails using pyindex and dovecot, but happily I don't need
 to anymore.

You will be pleased to know, then, that ‘notmuch’ is a client-side
system providing very quick email indexing and searching
URL:http://notmuchmail.org/.

Notmuch is available directly from several operating systems (e.g.
Debian), or you can install it yourself. It works with numerous existing mail
clients, and brings the significant advantage of organising one's email
by search, not by exclusive folders.

-- 
 \“Intellectual property is to the 21st century what the slave |
  `\  trade was to the 16th.” —David Mertz |
_o__)  |
Ben Finney



Re: Disable HTML in forum messages (was: Movie (MPAA) ratings and Python?)

2013-12-11 Thread Ian Kelly
On Wed, Dec 11, 2013 at 6:12 PM, Ben Finney ben+pyt...@benfinney.id.au wrote:
 I found a remove formatting button in gmail's composer, and used it
 on this message. Does this message look like plain text?

 Still sent with an HTML part, so some other change must be needed to
 disable that.

Check the default formatting in the settings, or perhaps instead of
no signature there is an empty signature selected that is adding
formatting?

 There isn't a lot of e-mail programs that don't do HTML anymore.

 Many of the better mail clients allow the user to explicitly stop
 rendering HTML (but still have it available, as Steven points out).

Unfortunately, Gmail has recently moved away from the explicit toggle
and now only has that Remove formatting command, which will remove
any existing formatting from the draft but won't necessarily prevent
it from accidentally slipping back in.


Re: Movie (MPAA) ratings and Python?

2013-12-11 Thread Ian Kelly
On Wed, Dec 11, 2013 at 6:01 PM, Ned Batchelder n...@nedbatchelder.com wrote:
 I've also been wondering if ISO-8859-1 is just an octet-oriented codec,
 so it'll read about anything.  There are clearly non-7-bit-ASCII
 characters in the file that look like line noise in an mrxvt.


 Both ISO-8859-1 and Windows-1255 are octet-oriented, I don't see why one
 would raise an exception when the other didn't.  Unless the exception isn't
 on the decode, but instead on your attempt to output the result. Can you
 show the full traceback you're seeing?

There are gaps in CP 1255 (see
http://en.wikipedia.org/wiki/Code_page_1255), so I presume the file
contains one or more of those octets that don't map to anything at
all.
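The unassigned positions can be listed directly from Python's own codec by trying each of the 256 byte values in strict mode. A sketch; the helper name is hypothetical, it only makes sense for single-byte codecs, and the exact set of gaps depends on the codec table shipped with your Python version:

```python
def undefined_bytes(encoding):
    """Return the byte values that a single-byte codec cannot decode."""
    gaps = []
    for value in range(256):
        try:
            bytes([value]).decode(encoding)
        except UnicodeDecodeError:
            gaps.append(value)
    return gaps


print(["0x%02X" % value for value in undefined_bytes("cp1255")])
print(undefined_bytes("iso-8859-1"))  # [] (every byte is assigned)
```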


Re: Disable HTML in forum messages (was: Movie (MPAA) ratings and Python?)

2013-12-11 Thread rusi
On Thursday, December 12, 2013 6:42:42 AM UTC+5:30, Ben Finney wrote:
 Dan Stromberg writes:

  I found a remove formatting button in gmail's composer, and used it
  on this message. Does this message look like plain text?

 Still sent with an HTML part, so some other change must be needed to
 disable that.

  There isn't a lot of e-mail programs that don't do HTML anymore.

 Many of the better mail clients allow the user to explicitly stop
 rendering HTML (but still have it available, as Steven points out).

 Disabling HTML in messages is a good idea: HTML rarely adds anything
 useful to a message in a discussion forum, but it can cause the mail
 program to do actions unwanted by the user (e.g. fetch images from
 elsewhere, or run ECMAScript, or invoke HTML rendering bugs).

When you click on send/reply in gmail, there's a small down-triangle
next to the dustbin, inside which you will find a plain text option

The problem is that then your other mails (may) become plain text and
your friends/recipients will wonder whether you've entered a time-machine
and gone back to 1990!!

Many people find it simpler to just use Google groups.  It also has its
problems (as do all methods!) but in sum it's the easiest option to use.


Re: Movie (MPAA) ratings and Python?

2013-12-11 Thread Dan Stromberg
On Wed, Dec 11, 2013 at 5:01 PM, Ned Batchelder n...@nedbatchelder.com wrote:

 On 12/11/13 6:39 PM, Dan Stromberg wrote:


 On Wed, Dec 11, 2013 at 3:24 PM, Steven D'Aprano
 steve+comp.lang.pyt...@pearwood.info wrote:

 On Wed, 11 Dec 2013 15:07:35 -0800, Dan Stromberg wrote:

$ chardet mpaa-ratings-reasons.list
   mpaa-ratings-reasons.list: windows-1255 (confidence: 0.97)
  
   I'm aware that chardet is playing guessing games, though one would hope
   it would guess well most of the time, and give a reasonable confidence
   rating.

 What reason do you have for thinking that Windows-1255 isn't a reasonable
 guess? If the bulk of the text is Latin-1 except perhaps for one or two
 Hebrew characters (or what chardet thinks are Hebrew characters), it may
 actually be a reasonable guess.


 I get a traceback if I try to read the file as Windows-1255.  I don't
 get a traceback if I read it as ISO-8859-1.

 If it is a poor guess, perhaps you ought to report it to the chardet
 maintainers as a good example of a poor guess.

 I was considering that, and may do so.

 I've also been wondering if ISO-8859-1 is just an octet-oriented codec,
 so it'll read about anything.  There are clearly non-7-bit-ASCII
 characters in the file that look like line noise in an mrxvt.


 Both ISO-8859-1 and Windows-1255 are octet-oriented, I don't see why one
 would raise an exception when the other didn't.  Unless the exception isn't
 on the decode, but instead on your attempt to output the result. Can you
 show the full traceback you're seeing?


$ ./movie-ratings
Traceback (most recent call last):
  File "./movie-ratings", line 85, in <module>
    main()
  File "./movie-ratings", line 68, in main
    ratings = get_ratings('/home/dstromberg/src/home-svn/movie-ratings/trunk/mpaa-ratings-reasons.list')
  File "./movie-ratings", line 52, in get_ratings
    for line in ratings_file:
  File "/usr/local/cpython-3.3/lib/python3.3/encodings/cp1255.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0xfc in position 1225: character maps to <undefined>
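`UnicodeDecodeError` carries the offending input and offsets (`object`, `start`, `end`), which makes it easy to show the bad byte in context. When reading through a text file object, the reported position is likely an offset within the chunk handed to the decoder rather than within the whole file, so reproducing the error on the raw bytes is more informative. A sketch with a hypothetical helper and sample:

```python
def show_decode_error(data, encoding, context=10):
    """Return (offset, bad_bytes, surrounding_bytes) for the first decode error, or None."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError as exc:
        bad = exc.object[exc.start:exc.end]
        around = exc.object[max(0, exc.start - context):exc.end + context]
        return exc.start, bad, around
    return None


print(show_decode_error(b"M\xfcnchen", "cp1255"))      # (1, b'\xfc', b'M\xfcnchen')
print(show_decode_error(b"M\xfcnchen", "iso-8859-1"))  # None
```

On the real file, `data` would be the result of `open(path, "rb").read()`.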

BTW, other than satisfying our respective curiosities, I consider this
project finished.  It's probably not getting ratings for my entire movie
collection, but it is getting them for a significant fraction, which is all
I was really looking for.  Now I know which ones are rated PG, so I can
decide whether to let my 8 year old watch them.

This is with cpython-3.3.

Thanks.  ^_^


Re: Disable HTML in forum messages (was: Movie (MPAA) ratings and Python?)

2013-12-11 Thread Chris Angelico
On Thu, Dec 12, 2013 at 2:23 PM, rusi rustompm...@gmail.com wrote:
 When you click on send/reply in gmail, there's a small down-triangle
 next to the dustbin, inside which you will find a plain text option

 The problem is that then your other mails (may) become plain text and
 your friends/recipients will wonder whether you've entered a time-machine
 and gone back to 1990!!

Or maybe they'll wonder if you've just magically changed your font
settings to be exactly what they most want to read, because you're no
longer sending text that's too large / too small for them to
comfortably read.

ChrisA


Re: Movie (MPAA) ratings and Python?

2013-12-11 Thread Michael Torrie
On 12/11/2013 04:39 PM, Dan Stromberg wrote:
 If you can, would you please turn off rich text posting when you post
 here please?

 Apologies.  I didn't realize gmail was doing this.   I had thought it would
 only do so if I used the formatting options in the composer, but perhaps it
 does so even when just typing text.

From what I can see gmail is producing a multipart message that has a
plain text part and an html part.  This is what gmail normally does and
as far as I know it's RFC-compliant and that's what gmail always does.
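That structure is a `multipart/alternative` container whose parts are ordered from plainest to richest (RFC 2046), which is why readers that prefer text look for the `text/plain` part. A sketch with the `email.message.EmailMessage` API (Python 3.6 onward), using made-up content, that builds such a message and then picks out the plain part the way a text-only reader might:

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["Subject"] = "Re: Movie (MPAA) ratings and Python?"
msg.set_content("Plain text body.\n")            # becomes the text/plain part
msg.add_alternative("<div dir=\"ltr\">Plain text body.</div>\n",
                    subtype="html")              # adds a text/html sibling

print(msg.get_content_type())                                  # multipart/alternative
print([part.get_content_type() for part in msg.iter_parts()])  # ['text/plain', 'text/html']

# A text-only reader can ask for the plain part explicitly.
plain = msg.get_body(preferencelist=("plain",))
print(plain.get_content())
```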


Re: Movie (MPAA) ratings and Python?

2013-12-10 Thread Dan Stromberg
On Mon, Dec 9, 2013 at 10:40 PM, Ben Finney ben+pyt...@benfinney.id.au wrote:

 Dan Stromberg drsali...@gmail.com writes:

  Is anyone using a module or database that gives Python 3.x access to MPAA
  ratings (EG G, PG, PG-13, etc.)?

 What information would you want access to? Why would a library (rather
 than, say, a short set of strings) be needed?

Movie ratings.   EG G, PG, PG-13, etc.

A library might query a REST interface or screenscrape, though most
relevant websites have policies against screenscraping.


  I explored a few of the possibilities on Pypi, a couple of web interfaces,
  and the IMDB flat text file with ratings and reasons for those ratings, but
  I've not been really impressed yet.

 You seem to be talking about some MPAA document, where is it so we can
 know what specifically you're referring to?

It's available from many places, EG:
http://www.filewatcher.com/m/mpaa-ratings-reasons.list.gz.203532-0.html



Re: Movie (MPAA) ratings and Python?

2013-12-10 Thread Dan Stromberg
On Mon, Dec 9, 2013 at 10:48 PM, Paul Scott pscott...@gmail.com wrote:

 -----BEGIN PGP SIGNED MESSAGE-----
 Hash: SHA1

 On 10/12/2013 08:40, Ben Finney wrote:
  Dan Stromberg drsali...@gmail.com writes:
 
  Is anyone using a module or database that gives Python 3.x access
  to MPAA ratings (EG G, PG, PG-13, etc.)?

 If you are already using IMDB you should have a look at
 http://imdbpy.sourceforge.net/downloads.html as well. It provides a
 relatively simple Python interface to either a local or hosted IMDB
 dataset and allows you to grab the MPAA rating directly from the
 canonical movie name.

I believe this was the module I got the farthest with.  I was using it
without a local database, instead querying IMDB's website.  However, it
appeared to be 2.x only (no 3.x yet), and it was tracebacking a lot.

The rest of the IMDB-related packages on Pypi appeared to have tiny version
numbers, or to have not been updated in quite a while.


Re: Movie (MPAA) ratings and Python?

2013-12-10 Thread Ben Finney
Dan Stromberg drsali...@gmail.com writes:

 On Mon, Dec 9, 2013 at 10:40 PM, Ben Finney ben+pyt...@benfinney.id.au wrote:
  What information would you want access to? Why would a library
  (rather than, say, a short set of strings) be needed?
 
 Movie ratings.   EG G, PG, PG-13, etc.

That tells me only that you want short strings. Based on what you've
said so far, your requirements can be met with code like this:

movie_ratings = ["G", "PG", "PG-13", …]

which doesn't need a library to access.

So, I ask again: What data do you want access to? Can you describe what
you want your program to receive when it accesses movie ratings?

Is this information held specifically by the MPAA? If so, where is it
online, and how do the MPAA make it available publicly? These are
questions to answer prior to asking about Python libraries.

Before asking “how do I use Python for this job?”, you need to help us
understand what “this job” is.

-- 
 \ “For your convenience we recommend courteous, efficient |
  `\self-service.” —supermarket, Hong Kong |
_o__)  |
Ben Finney



Re: Movie (MPAA) ratings and Python?

2013-12-10 Thread Petite Abeille

On Dec 10, 2013, at 6:25 AM, Dan Stromberg drsali...@gmail.com wrote:

 The IMDB flat text file probably came the closest, but it appears to have 
 encoding issues; it's apparently nearly windows-1255, but not quite.

It's ISO-8859-1.

Both certificates.list.gz and mpaa-ratings-reasons.list.gz are rather 
straightforward to parse.

For the US, you will get something along these lines out of 
certificates.list.gz:

USA:(Banned)
USA:12
USA:AO
USA:Approved
USA:C
USA:E
USA:E10+
USA:G
USA:GP
USA:K-A
USA:M
USA:M/PG
USA:NC-17
USA:Not Rated
USA:Open
USA:PG
USA:PG-13
USA:Passed
USA:R
USA:T
USA:TV-14
USA:TV-G
USA:TV-MA
USA:TV-PG
USA:TV-Y
USA:TV-Y7
USA:Unrated
USA:X

And as mentioned, imdbpy handles all this out-of-the-box if you don’t feel like 
doing it yourself.
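Going by the `Country:Rating` values listed above, pulling out a US rating comes down to a string split. A sketch, assuming each certificate field looks like `USA:PG-13` (the rest of each line's layout in certificates.list is not covered here, and the helper names are hypothetical). It deliberately keeps only the five current MPAA ratings and ignores TV ratings and historical codes such as `M/PG` or `GP`:

```python
CURRENT_MPAA_RATINGS = frozenset({"G", "PG", "PG-13", "R", "NC-17"})


def split_certificate(field):
    """Split a 'Country:Rating' field such as 'USA:PG-13' into (country, rating)."""
    country, sep, rating = field.partition(":")
    if not sep:
        raise ValueError("no colon in certificate field: %r" % (field,))
    return country.strip(), rating.strip()


def mpaa_rating(field):
    """Return the MPAA rating for a US certificate field, or None for anything else."""
    country, rating = split_certificate(field)
    if country == "USA" and rating in CURRENT_MPAA_RATINGS:
        return rating
    return None


for field in ("USA:PG-13", "USA:TV-14", "USA:M/PG", "UK:PG"):
    print(field, "->", mpaa_rating(field))
```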






Re: Movie (MPAA) ratings and Python?

2013-12-10 Thread Michael Torrie
On 12/10/2013 01:26 PM, Ben Finney wrote:
 Movie ratings.   EG G, PG, PG-13, etc.
 
 That tells me only that you want short strings. Based on what you've
 said so far, your requirements can be met with code like this:
 
 movie_ratings = ["G", "PG", "PG-13", …]
 
 which doesn't need a library to access.
 
 So, I ask again: What data do you want access to? Can you describe what
 you want your program to receive when it accesses movie ratings?

I'm not sure whether there's actual confusion here on your part, or
deliberate obtuseness.  From the other comments on this thread, it seems
some people at least understand what he wants to do and I believe he's
been pointed in the right direction.


Re: Movie (MPAA) ratings and Python?

2013-12-10 Thread Ben Finney
Michael Torrie torr...@gmail.com writes:

 I'm not sure whether there's actual confusion here on your part, or
 deliberate obtuseness.

Not confusion, but a desire to avoid guesses based on very vague
requirements.

 From the other comments on this thread, it seems some people at least
 understand what he wants to do and I believe he's been pointed in the
 right direction.

Okay, but it would be good if the OP could clearly state what he wants
so the answers have some context for other readers.

Anyway, I'll bow out of this thread now.

-- 
 \ “I used to think that the brain was the most wonderful organ in |
  `\   my body. Then I realized who was telling me this.” —Emo Philips |
_o__)  |
Ben Finney



Re: Movie (MPAA) ratings and Python?

2013-12-10 Thread Dan Stromberg
On Tue, Dec 10, 2013 at 1:07 PM, Petite Abeille petite.abei...@gmail.com wrote:


 On Dec 10, 2013, at 6:25 AM, Dan Stromberg drsali...@gmail.com wrote:

  The IMDB flat text file probably came the closest, but it appears to
 have encoding issues; it's apparently nearly windows-1255, but not quite.

 It's ISO-8859-1.

Thanks - that reads well from CPython 3.3.

Now the question becomes: Why did chardet tell me it was windows-1255?  :)


 Both certificates.list.gz and mpaa-ratings-reasons.list.gz are rather
 straightforward to parse.

Sure, with an appropriate encoding.


 For the US, you will get something along these lines out of
 certificates.list.gz:

 USA:(Banned)
 USA:12
 USA:AO
 USA:Approved
 USA:C
 USA:E
 USA:E10+
 USA:G
 USA:GP
 USA:K-A
 USA:M
 USA:M/PG
 USA:NC-17
 USA:Not Rated
 USA:Open
 USA:PG
 USA:PG-13
 USA:Passed
 USA:R
 USA:T
 USA:TV-14
 USA:TV-G
 USA:TV-MA
 USA:TV-PG
 USA:TV-Y
 USA:TV-Y7
 USA:Unrated
 USA:X

 And as mentioned, imdbpy handles all this out-of-the-box if you don’t feel
 like doing it yourself.

But I believe imdbpy is 2.7 only.


Re: Movie (MPAA) ratings and Python?

2013-12-10 Thread Dan Stromberg
On Tue, Dec 10, 2013 at 3:34 PM, Ben Finney ben+pyt...@benfinney.id.au wrote:

 Michael Torrie torr...@gmail.com writes:

  I'm not sure whether there's actual confusion here on your part, or
  deliberate obtuseness.

 Not confusion, but a desire to avoid guesses based on very vague
 requirements.


What part of movie ratings (EG G, PG, PG-13) don't you understand?


Re: Movie (MPAA) ratings and Python?

2013-12-10 Thread Ben Finney
Dan Stromberg drsali...@gmail.com writes:

 What part of movie ratings (EG G, PG, PG-13) don't you understand?

As stated, that example requirement is satisfied by a list of strings
‘["G", "PG", "PG-13"]’. If your example of “movie ratings” is a small
collection of short strings, then that's all I've got to go on before
needing to guess from a wide space of possible options.

I understand what MPAA movie ratings are, but that doesn't clarify what
*you* mean in terms of what data you want your program to access beyond
the strings “G”, “PG”, “PG-13” themselves.

Anyway, it appears others have accurately guessed your intent from
information beyond what you presented in your request. But you'll
probably agree that's not a very reliable way of getting effective
answers.

-- 
 \“I was in Las Vegas, at the roulette table, having a furious |
  `\ argument over what I considered to be an odd number.” —Steven |
_o__)   Wright |
Ben Finney



Re: Movie (MPAA) ratings and Python?

2013-12-10 Thread Mark Lawrence

On 10/12/2013 23:50, Dan Stromberg wrote:


But I believe imdbpy is 2.7 only.



I guess it wouldn't be that difficult to run it through 2to3.  Try that 
and see what happens?


--
My fellow Pythonistas, ask not what our language can do for you, ask 
what you can do for our language.


Mark Lawrence



Re: Movie (MPAA) ratings and Python?

2013-12-10 Thread Dan Stromberg
On Tue, Dec 10, 2013 at 4:07 PM, Mark Lawrence breamore...@yahoo.co.uk wrote:

 On 10/12/2013 23:50, Dan Stromberg wrote:


 But I believe imdbpy is 2.7 only.


 I guess it wouldn't be that difficult to run it through 2to3.  Try that
 and see what happens?


2to3 doesn't necessarily produce working code.  I've had better luck
porting to 3.x (while continuing to support 2.x) using a single codebase.
http://stromberg.dnsalias.org/~dstromberg/Intro-to-Python/Python%202%20and%203.pdf
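One small piece of such a single codebase is file reading: `io.open` exists on Python 2.6+ and 3.x and returns decoded text on both, so the encoding that works for the IMDB list only has to be stated once. A sketch; the helper name and the ISO-8859-1 default (taken from earlier in this thread) are assumptions:

```python
import io


def read_text_lines(path, encoding="iso-8859-1"):
    """Yield lines of `path` decoded with `encoding`, without trailing newlines.

    io.open behaves the same on Python 2.6+ and 3.x: it returns unicode text.
    """
    with io.open(path, "r", encoding=encoding) as handle:
        for line in handle:
            yield line.rstrip("\n")
```

Usage would be along the lines of `for line in read_text_lines("mpaa-ratings-reasons.list"): ...`.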

That said, porting imdbpy to 3.x is more of a time commitment than I'm
looking for.  Now that I have an encoding that works with the MPAA text
files, I'll probably use that; that should be quick and painless, assuming
that difflib or similar can do the sort of fuzzy matching I'm hoping for.
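`difflib.get_close_matches` is the standard library's ready-made fuzzy matcher: it ranks candidates by `SequenceMatcher` ratio and drops those below `cutoff` (0.6 by default). A sketch that matches a title from a personal collection against made-up IMDB-style titles; the helper name is hypothetical, and lowercasing both sides first is an assumption about what should count as a match:

```python
import difflib


def best_title_matches(title, known_titles, n=3, cutoff=0.6):
    """Return up to `n` known titles that look like `title`, best first."""
    by_lower = {}
    for known in known_titles:
        by_lower.setdefault(known.lower(), known)  # remember the original spelling
    hits = difflib.get_close_matches(title.lower(), list(by_lower), n=n, cutoff=cutoff)
    return [by_lower[hit] for hit in hits]


imdb_titles = ["Toy Story (1995)", "Toy Story 2 (1999)", "Taxi Driver (1976)"]
print(best_title_matches("toy story", imdb_titles))
# ['Toy Story (1995)', 'Toy Story 2 (1999)']
```

Raising `cutoff` trades missed matches for fewer false ones; the year suffixes in IMDB titles pull every ratio down somewhat, so stripping them before matching may also help.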

BTW, I tried using metals (meta ls, not multiple kinds of metal) for a
while, which I believe is based on imdbpy, but it was tracebacking quite a
bit - on 2.7.  I had to make a one line change to get it to use 2.7 instead
of 2.6; this suggests to me that metals and/or imdbpy aren't being
supported very actively.

I'd prefer to use something with an active community around it, but failing
that, I'd prefer to use something _small_ I write myself.


Movie (MPAA) ratings and Python?

2013-12-09 Thread Dan Stromberg
Is anyone using a module or database that gives Python 3.x access to MPAA
ratings (EG G, PG, PG-13, etc.)?

I explored a few of the possibilities on Pypi, a couple of web interfaces,
and the IMDB flat text file with ratings and reasons for those ratings, but
I've not been really impressed yet.

The IMDB flat text file probably came the closest, but it appears to have
encoding issues; it's apparently nearly windows-1255, but not quite.

I may end up using the IMDB flat text file with binary I/O, but before I
dig into this any further, I thought I should ask here: Has someone already
explored this and found a solution they were happy with?
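If binary I/O is the route taken, a first step might be to find which lines actually contain non-ASCII bytes, so they can be inspected before choosing a codec. A Python 3 sketch (iterating over a `bytes` object yields integers there); the helper name and sample data are hypothetical:

```python
import io


def lines_with_high_bytes(binary_file):
    """Yield (line_number, raw_line) for lines containing any byte >= 0x80."""
    for number, line in enumerate(binary_file, start=1):
        if any(byte > 0x7F for byte in line):
            yield number, line


sample = io.BytesIO(b"plain ascii\ncaf\xe9\nmore ascii\n")
print(list(lines_with_high_bytes(sample)))  # [(2, b'caf\xe9\n')]
```

With a real file, pass `open(path, "rb")` instead of the `BytesIO` sample.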

Thanks!


Re: Movie (MPAA) ratings and Python?

2013-12-09 Thread Ben Finney
Dan Stromberg drsali...@gmail.com writes:

 Is anyone using a module or database that gives Python 3.x access to MPAA
 ratings (EG G, PG, PG-13, etc.)?

What information would you want access to? Why would a library (rather
than, say, a short set of strings) be needed?

 I explored a few of the possibilities on Pypi, a couple of web interfaces,
 and the IMDB flat text file with ratings and reasons for those ratings, but
 I've not been really impressed yet.

You seem to be talking about some MPAA document, where is it so we can
know what specifically you're referring to?

-- 
 \ “A child of five could understand this. Fetch me a child of |
  `\  five.” —Groucho Marx |
_o__)  |
Ben Finney



Re: Movie (MPAA) ratings and Python?

2013-12-09 Thread Paul Scott
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 10/12/2013 08:40, Ben Finney wrote:
 Dan Stromberg drsali...@gmail.com writes:
 
 Is anyone using a module or database that gives Python 3.x access
 to MPAA ratings (EG G, PG, PG-13, etc.)?

If you are already using IMDB you should have a look at
http://imdbpy.sourceforge.net/downloads.html as well. It provides a
relatively simple Python interface to either a local or hosted IMDB
dataset and allows you to grab the MPAA rating directly from the
canonical movie name.

- -- Paul

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.14 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQEcBAEBAgAGBQJSprk1AAoJEP7GEwhwShZxOpgIAIMYG9QRo0XHe5InJejMh0tX
rLAkDL/2oSTQ3/nYNId5EJhDAF4GFu7LYgy4e3HIIWjIPw8UM64FFdFY/3d2t2hQ
jiWSNCoj8E+5m25m8Ob3oBcv+/bQRKsXuD+DvmGhoSvwnDaNqpYmiPBRyHgKp3tm
FoKJCkmgJoMX6KWCauBuVnoRSZGO0os3fZ0t/LpUHXjeZw5xLtvLm5aNqq9vWVin
V0nLZO7DPzN9hBQU6MAkdE6d6C3a/MbIU0s/fgCRJ9bB2SpQc55ewnZxWZLstgAh
WLUPQyY06d6iv5NM7N9Adehs4xxRj3jCIw54Wl8Vhk3h1UeJygxzN1C7HfI2URY=
=2jod
-----END PGP SIGNATURE-----