Re: [Python-Dev] standard library mimetypes module pathologically broken?

2009-08-02 Thread Jacob Rus
Jacob Rus wrote:
 Brett Cannon wrote:
 Jacob Rus wrote:
 At the very least, I
 think some changes can be made to this code without altering its basic
 function, which would clean up the actual mime types it returns,
 comment the exceptions to Apache and explain why they're there, and
 make the code flow understandable to someone reading the code.

 That all sounds reasonable.

 Okay, as a start, I did a simple code cleanup that I think fixes some
 potential bugs (any code using its own instance of the MimeTypes class
 should now be insulated from other same-process users of the module),
 chops out 80 or 90 lines, removes some redundant code paths, clarifies
 some of the micro level behavior of some chunks of code, adds a bit
 more to the docstring at the top of the file, and makes the program
 flow somewhat clearer … *without* changing the semantics of the module
 or its included list of MIME types.

Here is a somewhat more substantively changed version. This one does
away with the 'inited' flag and the 'init' function, which might be
impossible given that they're documented (though I would be extremely
surprised if anyone calls them in third-party code), and makes the
behavior of the code much clearer, I think, by making it very obvious
how the singleton instance is actually working.

Additionally, this version brings the lazy loading of Apache
mime.types files to every MimeTypes instance, and makes the
read_mime_types() function behave as expected (only getting the
mapping from an apache mime.types file rather than including some
extra types as the current code does).

In this version, tests would want to call the _init_singleton()
function to reset to defaults.

http://pastie.textmate.org/568399
http://pastie.textmate.org/568400

To reiterate: this should still behave identically to the current
module in all reasonable conditions. I still haven't made any changes
to the set of MIME types included in the file, or the behavior of the
module. Some such changes should be made as well, but the changes so
far should be relatively uncontroversial, I hope.

Cheers,
Jacob Rus
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] standard library mimetypes module pathologically broken?

2009-08-02 Thread Michael Foord

Jacob Rus wrote:
 Jacob Rus wrote:
 Brett Cannon wrote:
 Jacob Rus wrote:
 At the very least, I
 think some changes can be made to this code without altering its basic
 function, which would clean up the actual mime types it returns,
 comment the exceptions to Apache and explain why they're there, and
 make the code flow understandable to someone reading the code.

 That all sounds reasonable.

 Okay, as a start, I did a simple code cleanup that I think fixes some
 potential bugs (any code using its own instance of the MimeTypes class
 should now be insulated from other same-process users of the module),
 chops out 80 or 90 lines, removes some redundant code paths, clarifies
 some of the micro level behavior of some chunks of code, adds a bit
 more to the docstring at the top of the file, and makes the program
 flow somewhat clearer … *without* changing the semantics of the module
 or its included list of MIME types.

 Here is a somewhat more substantively changed version. This one does
 away with the 'inited' flag and the 'init' function, which might be
 impossible given that they're documented (though I would be extremely
 surprised if anyone calls them in third-party code), and makes the
 behavior of the code much clearer, I think, by making it very obvious
 how the singleton instance is actually working.

 Additionally, this version brings the lazy loading of Apache
 mime.types files to every MimeTypes instance, and makes the
 read_mime_types() function behave as expected (only getting the
 mapping from an apache mime.types file rather than including some
 extra types as the current code does).

 In this version, tests would want to call the _init_singleton()
 function to reset to defaults.

 http://pastie.textmate.org/568399
 http://pastie.textmate.org/568400

 To reiterate: this should still behave identically to the current
 module in all reasonable conditions. I still haven't made any changes
 to the set of MIME types included in the file, or the behavior of the
 module. Some such changes should be made as well, but the changes so
 far should be relatively uncontroversial, I hope.


Please post the patches to the Python bug tracker:

   http://bugs.python.org

Thanks

Michael Foord


Cheers,
Jacob Rus
  



--
http://www.ironpythoninaction.com/
http://www.voidspace.org.uk/blog




Re: [Python-Dev] standard library mimetypes module pathologically broken?

2009-08-02 Thread Paul Moore
2009/8/2 Michael Foord <fuzzy...@voidspace.org.uk>:
[...]
 In this version, tests would want to call the _init_singleton()
 function to reset to defaults.
[...]
 Please post the patches to the Python bug tracker:

   http://bugs.python.org

 Thanks

The patch you post should also patch the test suite to use your
replacement initialisation function where needed (if you didn't
already do that).

Paul.


Re: [Python-Dev] standard library mimetypes module pathologically broken?

2009-08-02 Thread Robert Lehmann
On Sat, 01 Aug 2009 23:37:18 -0700, Jacob Rus wrote:

 Here is a somewhat more substantively changed version. This one does
 away with the 'inited' flag and the 'init' function, which might be
 impossible given that they're documented (though I would be extremely
 surprised if anyone calls them in third-party code)
[snip]

There seem to be quite a bunch of high-profile third-party modules 
relying on this interface, eg. Zope, Plone, TurboGears, and CherryPy. See
http://www.google.com/codesearch?q=mimetypes.init+lang%3Apython for a 
more thorough listing.

Given that most of them aren't ported to Python 3 yet, I guess, changing 
the semantics in 3.x seems not-too-bad to me.
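
For reference, a small sketch of the kind of call such third-party code
makes. It uses only the documented module-level names (init(), the
inited flag and guess_type()); the file name "index.html" is just an
illustrative input, and the result can depend on the mime.types files
found on the machine.

```python
import mimetypes

# Third-party code of this kind calls the documented module-level init()
# at startup; passing a list of extra mime.types files is optional.
mimetypes.init()

# After init(), the documented 'inited' flag is set and lookups go
# through the shared module-level database.
content_type, encoding = mimetypes.guess_type("index.html")
print(mimetypes.inited, content_type, encoding)
```

Removing init() or inited would break any caller written like this,
which is why the interface has to stay.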

HTH,

-- 
Robert Stargaming Lehmann



Re: [Python-Dev] [regex] memory leak

2009-08-02 Thread MRAB

John Machin wrote:

Hi Matthew,

Your post in c.l.py about your re rewrite didn't mention where to report 
bugs etc so I dug this address out of Google Groups ...


Environment: Python 2.6.2, Windows XP SP3, your latest (29 July) regex 
from the Python bugtracker.


Problem is repeated calls of e.g. compiled_pattern.search(some_text) -- 
Task Manager performance panel shows increasing memory usage with regex 
but not with re. It appears to be cumulative i.e. changing to another 
pattern or text doesn't release memory.


Example:

8<-- regex_timer.py
import sys
import time

# time.clock has the finer resolution on Windows; use time.time elsewhere.
if sys.platform == 'win32':
    timer = time.clock
else:
    timer = time.time
module = __import__(sys.argv[1])  # 're' or 'regex'
count = int(sys.argv[2])
pattern = sys.argv[3]
expected = sys.argv[4]
text = 80 * '~' + 'qwerty'
rx = module.compile(pattern)
t0 = timer()
for i in xrange(count):
    assert rx.search(text).group(0) == expected
t1 = timer()
print "%d iterations in %.6f seconds" % (count, t1 - t0)
8<---

Here are the results of running this (plus observed difference between 
peak memory usage and base memory usage):


dos-prompt>\python26\python regex_timer.py regex 100 ~ ~
100 iterations in 3.811500 seconds [60 Mb]

dos-prompt>\python26\python regex_timer.py regex 200 ~ ~
200 iterations in 7.581335 seconds [128 Mb]

dos-prompt>\python26\python regex_timer.py re 200 ~ ~
200 iterations in 2.549738 seconds [3 Mb]

This happens on a variety of patterns: w, wert, [a-z]+, [a-z]+t, 
...



Thanks for that, John. I should've kept an eye on the Task Manager!
:-) Now fixed.

It's surprising how much time and effort is needed just to manage the
memory!
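
A rough sketch of checking for this kind of cumulative growth from
inside Python rather than from Task Manager. It assumes Python 3.4 or
later (for tracemalloc) and uses the stdlib re module as the subject;
traced_growth is a hypothetical helper name. tracemalloc only sees
memory obtained through Python's own allocators, so a leak made with
raw malloc() in a C extension would not show up here.

```python
import re
import tracemalloc


def traced_growth(module, pattern, text, count):
    """Return the bytes of traced memory still held after `count` searches."""
    rx = module.compile(pattern)
    rx.search(text)  # warm up any caches before measuring
    tracemalloc.start()
    before, _peak = tracemalloc.get_traced_memory()
    for _i in range(count):
        rx.search(text)
    after, _peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return after - before


text = 80 * "~" + "qwerty"
growth = traced_growth(re, "[a-z]+", text, 20000)
print("re: %d bytes retained" % growth)
```

A figure that keeps rising as count is increased points to memory that
is never released, which is the pattern John reported.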


Re: [Python-Dev] pylinting the stdlib

2009-08-02 Thread Mark Dickinson
On Sat, Aug 1, 2009 at 11:40 PM, Vincent Legoll <vincent.leg...@gmail.com> wrote:
 Hello,

 I've fed parts of the stdlib to pylint and after some filtering
 there appear to be some things that look strange; I've
 filed a few bugs on the tracker for them.

 <buglist snipped>

 Is this useless and taking reviewers' time for nothing?

 Please advise; if this is deemed useful, I'll continue further.

I think this is valuable work---please do continue!

Just out of interest, how many false positives did you have
to filter out in finding the 5 cases above?

Mark


[Python-Dev] standard library mimetypes module pathologically broken?

2009-08-02 Thread Jim Jewett
[It may be worth creating a patch; I think most of these comments
would be better on the bug-tracker.]

(1)  In a few cases, it looked like you were changing parameter names
between files and filenames.  This might break code that was
calling it with keyword arguments -- as I typically would for this
type of function.

(1a)  If you are going to change the .sig, you might as well do it
right, and make the default be knownfiles rather than the empty
tuple.

(2)  The comment about why inited was set true at the beginning of the
function instead of the end should probably be kept, or at least
reworded.

(3) Default values:

(3a) Why the list of known files going back to Apache 1.2, in that
order?  Is there any risk in using too *new* of a MimeTypes file?

I would assume that the goal is to pick up whatever changes the user
has made locally, but in that case, it still makes sense to have the
newest file be the last one read, in case Apache has made bugfixes.

(3b)  Also, this would improve cross-platform consistency; if I read
that correctly, the Apache files will override the python defaults on
unix or a mac, but not on windows.  That will change the results on
the majority of items in _common_types.  (application vs text, whether
to put an x- in front of the word pict.)

(3c)  rtf is listed in non-standard, but
http://www.iana.org/assignments/media-types/ does define it.  (Though
whether to guess application vs text is not defined, and python
chooses differently from apache.)

(3d)  jpg is listed as non-standard.  It turns out that this is just
for the inverse mapping, where image/jpg is non-standard (for
image/jpeg) but that is worth a comment.  (see #5)

(3e)  In _types_map, the lines marked duplicates are duplicate keys,
not duplicate values; it would be more clear to also comment out the
(first) line itself, instead of just marking it a duplicate.  (Or
better yet, to mention that it is just being added for the inverse
mapping, if that is the case.)


(4)  Why bother to lazyinit?  Is there any sane usecase for a
MimeTypes that hasn't been inited?

I see value in not reading the default files, but none in not reading
at least the files that were asked for.  I could see value in only
partial initialization if there were several long steps, but right
now, initialization is all-or-nothing.

If the thing is useless without an init, then it makes sense to just
get it done immediately and skip the later checks; anyone who could
have actually saved time should just remove the import.
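
To illustrate point (1) above, a small sketch of how renaming a
parameter breaks keyword callers. The two functions are hypothetical
stand-ins, not the actual signatures in the module or the patch.

```python
def read_types_old(files=()):
    """Hypothetical 'before' signature."""
    return list(files)


def read_types_new(filenames=()):
    """Hypothetical 'after' signature: same behavior, renamed parameter."""
    return list(filenames)


def call_with_keyword(func):
    """Call func the way a keyword-argument caller would."""
    try:
        return func(files=["/etc/mime.types"])
    except TypeError as exc:
        return "TypeError: %s" % exc


print(call_with_keyword(read_types_old))  # the keyword call works
print(call_with_keyword(read_types_new))  # the same call now fails
```

Positional callers are unaffected by the rename, which is why this kind
of breakage is easy to miss in a cleanup patch.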

-jJ


Re: [Python-Dev] standard library mimetypes module pathologically broken?

2009-08-02 Thread Jacob Rus
Jim Jewett wrote:
 [It may be worth creating a patch; I think most of these comments
 would be better on the bug-tracker.]

I'm going to do that shortly.

 (1)  In a few cases, it looked like you were changing parameter names
 between files and filenames.  This might break code that was
 calling it with keyword arguments -- as I typically would for this
 type of function.

Sorry, that was a mistake.

 (1a)  If you are going to change the .sig, you might as well do it
 right, and make the default be knownfiles rather than the empty
 tuple.

Seems reasonable.

 (2)  The comment about why inited was set true at the beginning of the
 function instead of the end should probably be kept, or at least
 reworded.

 (3) Default values:

 (3a) Why the list of known files going back to Apache 1.2, in that
 order?  Is there any risk in using too *new* of a MimeTypes file?

 I would assume that the goal is to pick up whatever changes the user
 has made locally, but in that case, it still makes sense to have the
 newest file be the last one read, in case Apache has made bugfixes.

I did not change this in my patch, but I completely agree. Indeed, I
think it makes more sense to grab the newest Apache mime.types and
just include them with the standard library, either as an in-code
python object, or as a mime.types file to be parsed.

 (3b)  Also, this would improve cross-platform consistency; if I read
 that correctly, the Apache files will override the python defaults on
 unix or a mac, but not on windows.  That will change the results on
 the majority of items in _common_types.  (application vs text, whether
 to put an x- in front of the word pict.)

Quite possibly true. It actually seems

 (3c)  rtf is listed in non-standard, but
 http://www.iana.org/assignments/media-types/ does define it.  (Though
 whether to guess application vs text is not defined, and python
 chooses differently from apache.)

 (3d)  jpg is listed as non-standard.  It turns out that this is just
 for the inverse mapping, where image/jpg is non-standard (for
 image/jpeg) but that is worth a comment.  (see #5)

 (3e)  In _types_map, the lines marked duplicates are duplicate keys,
 not duplicate values; it would be more clear to also comment out the
 (first) line itself, instead of just marking it a duplicate.  (Or
 better yet, to mention that it is just being added for the inverse
 mapping, if that is the case.)

I completely agree that this whole section should be considered
carefully. It's just that any changes there might have more impact on
backwards compatibility than the code flow changes I made, so I thought
they could be in a separate patch.

 (4)  Why bother to lazyinit?    Is there any sane usecase for a
 MimeTypes that hasn't been inited?

Only because the original was written that way, back in 1997 or
whatever. I don't think there's necessarily any need for it these
days: even reading the default files should be blazingly fast, unless
the disk is otherwise thrashing: each is about a 37k file, and there
are at most going to be 3 or 4 of them installed on one machine for
different versions of Apache.

Cheers,
Jacob Rus


Re: [Python-Dev] standard library mimetypes module pathologically broken?

2009-08-02 Thread Jacob Rus
Robert Lehmann wrote:
 Jacob Rus wrote:
 Here is a somewhat more substantively changed version. This one does
 away with the 'inited' flag and the 'init' function, which might be
 impossible given that they're documented (though I would be extremely
 surprised if anyone calls them in third-party code)
 [snip]

 There seem to be quite a bunch of high-profile third-party modules
 relying on this interface, eg. Zope, Plone, TurboGears, and CherryPy. See
 http://www.google.com/codesearch?q=mimetypes.init+lang%3Apython for a
 more thorough listing.

 Given that most of them aren't ported to Python 3 yet, I guess, changing
 the semantics in 3.x seems not-too-bad to me.

Ooh, okay.  Well I guess we can’t get rid of those then!

Michael Foord wrote:
 Please post the patches to the Python bug tracker:

I made a new issue on the bug tracker,
http://bugs.python.org/issue6626, and added a new patch which should
hopefully be fairly reasonable.  I still haven't addressed the issue
of which MIME types should be included by default, and how precisely
the logic should work for setting those up. But again, hopefully this
at least makes it clear what the code is trying to do, so that it's
relatively readable for someone trying to use the module. (For
instance, so they'll be warned off of using init() and breaking
each other's code.)

Paul Moore wrote:
 The patch you post should also patch the test suite to use your
 replacement initialisation function where needed (if you didn't
 already do that).

Done. The tests still pass, though to be honest this test suite isn't
really testing any edge cases.

Cheers,
Jacob Rus


Re: [Python-Dev] standard library mimetypes module pathologically broken?

2009-08-02 Thread Glyph Lefkowitz
On Sun, Aug 2, 2009 at 4:17 PM, Jacob Rus <jacobo...@gmail.com> wrote:

 Robert Lehmann wrote:
  Jacob Rus wrote:
  Here is a somewhat more substantively changed version. This one does
  away with the 'inited' flag and the 'init' function, which might be
  impossible given that they're documented (though I would be extremely
  surprised if anyone calls them in third-party code)
  [snip]
 
  There seem to be quite a bunch of high-profile third-party modules
  relying on this interface, eg. Zope, Plone, TurboGears, and CherryPy. See
  http://www.google.com/codesearch?q=mimetypes.init+lang%3Apython for a
  more thorough listing.
 
  Given that most of them aren't ported to Python 3 yet, I guess, changing
  the semantics in 3.x seems not-too-bad to me.


No, it's bad.  If I may quote Guido:
http://www.artima.com/weblogs/viewpost.jsp?thread=227041

So, once more for emphasis: *Don't change your APIs at the same time as
 porting to Py3k!*


Please follow this policy as much as possible in the standard library; the
language transition is going to be hard enough.

Put a different way: please don't change the library unless you're
*also* going to write a 2to3 fixer that somehow updates all calling
code, too.

 Ooh, okay.  Well I guess we can’t get rid of those then!


Indeed not.