Re: [Python-Dev] standard library mimetypes module pathologically broken?
Antoine Pitrou: > After a fair amount of discussion on Rietveld, I think you should post another > patch without the deprecations. > (since the discussion was fairly long, I won't repeat here the reasons I gave > unless someone asks me to ) > Besides, it would be nice to have the additional tests you were talking about. I'd guess that if I make another patch, no one else will actually look at the discussion on the first one (though maybe no one will look at it either way)... I'd rather get another couple of opinions about it before burying that conversation. > This sounds very pie-in-the-sky compared to the original intent of the patch > (that is, fix the mimetypes module's implementation oddities). Okay. At least for me, the goals are twofold, because not only is the implementation odd, but I consider the semantics broken as well. But even just fixing the obvious implementation problems would be a big improvement. Cheers, Jacob Rus ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] standard library mimetypes module pathologically broken?
11 Aug 2009, Benjamin Peterson wrote: > 2009/8/11 Jacob Rus: >> I have some other questions: How does one deprecate part of a standard >> library API? How can we alert users to the deprecation? When can the >> deprecated parts be removed? > > Basically, you add a DeprecationWarning to the API. Then remove it in > the next major version. Okay, I made another patch, http://bugs.python.org/issue6626 That adds some deprecation warnings to many of the functions/methods in the module. (I think the 'strict' parameters should also be deprecated. But I'm considering actually making a new class, MimeTypesRegistry, or something, and then just making its API stay mostly compatible with MimeTypes, but extended to behave the way I think it should, and deprecating the MimeTypes class altogether, making it a subclass in the interim.) Is there any way to explicitly (i.e. in code rather than docs) deprecate string flags or dicts/lists from the module global namespace? I don't think users should be mucking with the module's singleton at all, and should be forced to make a new registry instance to customize the behavior, so they don't break each-other's code. > Then, you might garner some more reviews by putting your patch up on > Rietveld; it makes reviewing much painful. Okay, my last Rietveld link didn't get any eyeballs, but here's another try: http://codereview.appspot.com/107042 Cheers, Jacob Rus ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] standard library mimetypes module pathologically broken?
Benjamin Peterson wrote: > It looks like you need to add some tests for the bugs you fixed to > test_mimetypes. While you're at it, you could improve that test > generally, since it's not exactly extensive. Okay, I'll try to do this sometime in the next few days, if I get the chance. > Then, you might garner some more reviews by putting your patch up on > Rietveld; it makes reviewing much painful. Okay, now that svn.python.org is back up, here's a Rietveld link: http://codereview.appspot.com/104091/show Cheers, Jacob Rus ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] standard library mimetypes module pathologically broken?
Glyph Lefkowitz wrote: > Jacob Rus wrote: > No, [changing the semantics in 3.x] is bad. If I may quote Guido: > http://www.artima.com/weblogs/viewpost.jsp?thread=227041 > >> So, once more for emphasis: Don't change your APIs at the same time as >> porting to Py3k! > > Please follow this policy as much as possible in the standard library; the > language transition is going to be hard enough. > >> Ooh, okay. Well I guess we can’t get rid of those then! > > Indeed not. Well, I've had some patches up at http://bugs.python.org/issue6626 for over a week now, and my updated version should have identical semantics to the current module, just with the module's *actual* behavior clear to anyone reading the code, some serious edge-case bugs fixed, and a general performance improvement. I'd like to make some further changes, particularly in which types and extensions the module knows about, to bring it up to date, and ideally even to remove the dependency on an Apache install, but I'd like some discussion and advice about it. I have some other questions: How does one deprecate part of a standard library API? How can we alert users to the deprecation? When can the deprecated parts be removed? I don't want to just give up on this, because I put more than a day of time into it, and I really do think the previous code was of poorer quality than should be in the standard library: I don't want new Python users reading it and thinking that's just how things are done around here. But if no one looks at my patches, I'm not sure what more I can do. Again: http://bugs.python.org/issue6626 Cheers, Jacob ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] standard library mimetypes module pathologically broken?
Robert Lehmann wrote: > Jacob Rus wrote: >> Here is a somewhat more substantively changed version. This one does >> away with the 'inited' flag and the 'init' function, which might be >> impossible given that their documented (though I would be extremely >> surprised if anyone calls them in third-party code) > [snip] > > There seem to be quite a bunch of high-profile third-party modules > relying on this interface, eg. Zope, Plone, TurboGears, and CherryPy. See > http://www.google.com/codesearch?q=mimetypes.init+lang%3Apython for a > more thorough listing. > > Given that most of them aren't ported to Python 3 yet, I guess, changing > the semantics in 3.x seems not-too-bad to me. Ooh, okay. Well I guess we can’t get rid of those then! Michael Foord wrote: > Please post the patches to the Python bug tracker: I made a new issue on the bug tracker, <http://bugs.python.org/issue6626>, and added a new patch which should hopefully be fairly reasonable. I still haven't addressed the issue of which MIME types should be included by default, and how precisely the logic should work for setting those up. But again, hopefully this at least makes it clear what the code is trying to do, so that it's relatively readable for someone trying to use the module. (For instance, so they'll be warned off of using init() and breaking each-other's code) Paul Moore wrote: > The patch you post should also patch the test suite to use your > replacement initialisation function where needed (if you didn't > already do that). Done. The tests still pass, though to be honest this test suite isn't really testing any edge cases. Cheers, Jacob Rus ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] standard library mimetypes module pathologically broken?
Jim Jewett wrote: > [It may be worth creating a patch; I think most of these comments > would be better on the bug-tracker.] I'm going to do that shortly. > (1) In a few cases, it looked like you were changing parameter names > between "files" and "filenames". This might break code that was > calling it with keyword arguments -- as I typically would for this > type of function. Sorry, that was a mistake. > (1a) If you are going to change the .sig, you might as well do it > right, and make the default be "knownfiles" rather than the empty > tuple. Seems reasonable. > (2) The comment about why inited was set true at the beginning of the > function instead of the end should probably be kept, or at least > reworded. > > (3) Default values: > > (3a) Why the list of known files going back to Apache 1.2, in that > order? Is there any risk in using too *new* of a MimeTypes file? > > I would assume that the goal is to pick up whatever changes the user > has made locally, but in that case, it still makes sense to have the > newest file be the last one read, in case Apache has made bugfixes. I did not change this in my patch, but I completely agree. Indeed, I think it makes more sense to grab the newest Apache mime.types and just include them with the standard library, either as an in-code python object, or as a mime.types file to be parsed. > (3b) Also, this would improve cross-platform consistency; if I read > that correctly, the Apache files will override the python defaults on > unix or a mac, but not on windows. That will change the results on > the majority of items in _common_types. (application vs text, whether > to put an x- in front of the word pict.) Quite possibly true. It actually seems > (3c) rtf is listed in non-standard, but > http://www.iana.org/assignments/media-types/ does define it. (Though > whether to guess application vs text is not defined, and python > chooses differently from apache.) > > (3d) jpg is listed as non-standard. It turns out that this is just > for the inverse mapping, where image/jpg is non-standard (for > image/jpeg) but that is worth a comment. (see #5) > > (3e) In _types_map, the lines marked duplicates are duplicate keys, > not duplicate values; it would be more clear to also comment out the > (first) line itself, instead of just marking it a duplicate. (Or > better yet, to mention that it is just being added for the inverse > mapping, if that is the case.) I completely agree that this whole section should be considered carefully. Just any changes might have more impact on backwards compatibility than the code flow changes I made, so I thought they could be in a separate patch. > (4) Why bother to lazyinit? Is there any sane usecase for a > MimeTypes that hasn't been inited? Only because the original was written that way, back in 1997 or whatever. I don't think there's necessarily any need for it these days: reading the default files even should be blazingly fast, unless the disk is otherwise thrashing: each is about a a 37k file, and there are at most going to be 3 or 4 of them installed on one machine for different versions of Apache. Cheers, Jacob Rus ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] standard library mimetypes module pathologically broken?
Jacob Rus wrote: > Brett Cannon wrote: >> Jacob Rus wrote: >>> At the very least, I >>> think some changes can be made to this code without altering its basic >>> function, which would clean up the actual mime types it returns, >>> comment the exceptions to Apache and explain why they're there, and >>> make the code flow understandable to someone reading the code. >> >> That all sounds reasonable. > > Okay, as a start, I did a simple code cleanup that I think fixes some > potential bugs (any code using its own instance of the MimeTypes class > should now be insulated from other same-process users of the module), > chops out 80 or 90 lines, removes some redundant code paths, clarifies > some of the micro level behavior of some chunks of code, adds a bit > more to the docstring at the top of the file, and makes the program > flow somewhat clearer … *without* changing the semantics of the module > or its included list of MIME types. Here is a somewhat more substantively changed version. This one does away with the 'inited' flag and the 'init' function, which might be impossible given that their documented (though I would be extremely surprised if anyone calls them in third-party code), and makes the behavior of the code much clearer, I think, by making it very obvious how the singleton instance is actually working. Additionally, this version brings the lazy loading of Apache mime.types files to every MimeTypes instance, and makes the read_mime_types() function behave as expected (only getting the mapping from an apache mime.types file rather than including some extra types as the current code does). In this version, tests would want to call the _init_singleton() function to reset to defaults. http://pastie.textmate.org/568399 http://pastie.textmate.org/568400 To reiterate: this should still behave identically to the current module in all reasonable conditions. I still haven't made any changes to the set of MIME types included in the file, or the behavior of the module. Some such changes should be made as well, but the changes so far should be relatively uncontroversial, I hope. Cheers, Jacob Rus ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] standard library mimetypes module pathologically broken?
Jacob Rus wrote: > Here's a diff: > http://pastie.textmate.org/568329 > > And here's the whole file: > http://pastie.textmate.org/568333 Slightly better: http://pastie.textmate.org/568354 http://pastie.textmate.org/568355 ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] standard library mimetypes module pathologically broken?
Brett Cannon wrote: > Jacob Rus wrote: >> At the very least, I >> think some changes can be made to this code without altering its basic >> function, which would clean up the actual mime types it returns, >> comment the exceptions to Apache and explain why they're there, and >> make the code flow understandable to someone reading the code. > > That all sounds reasonable. Okay, as a start, I did a simple code cleanup that I think fixes some potential bugs (any code using its own instance of the MimeTypes class should now be insulated from other same-process users of the module), chops out 80 or 90 lines, removes some redundant code paths, clarifies some of the micro level behavior of some chunks of code, adds a bit more to the docstring at the top of the file, and makes the program flow somewhat clearer … *without* changing the semantics of the module or its included list of MIME types. Here's a diff: http://pastie.textmate.org/568329 And here's the whole file: http://pastie.textmate.org/568333 This change does require any tests that previously called _default_mime_types() to instead call init(). Any thoughts? Jacob Rus ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] standard library mimetypes module pathol ogically broken?
Andrew McNabb wrote: > Jacob Rus wrote: >> * The operation is crazy: It defines a MimeTypes class which >> actually stores the type mappings, but this class is designed to >> be a singleton. The way that such a design is enforced is >> through the use of the module-global 'init' function, which >> makes an instance of the class, and then maps all of the >> functions in the module global namespace to instance methods. >> But confusingly, all such functions are also defined >> independently of the init function, with definitions such as: >> >> def guess_type(url, strict=True): >> if not inited: >> init() >> return guess_type(url, strict) > > I can't speak for any of your other complaints, but I know that this > weird init stuff is fixed in trunk. Actually, this fix changes the semantics of the code quite substantially (not in any way that is incompatible with the extremely vague documentation, but in a way that might break any code that relies on the Python <=2.6 behavior). If such a change is okay, then we can do quite a bit of implementation change under these new semantics. Cheers, Jacob Rus ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] standard library mimetypes module pathologically broken?
Brett Cannon wrote: > Jacob Rus wrote: >> * It defines __all__: I didn’t even realize __all__ could be used >> for single-file modules (w/o submodules), but it definitely >> shouldn’t be here. > > __all__ is used to control what a module exports when used in an import *, > nothing more. Thus it's use in a module compared to a package is completely > legitimate. > >> This specific __all__ oddly does not include >> all of the documented variables and functions in the mimetypes >> class. It’s not clear why someone calling import * here wouldn’t >> want the bits not included. > > If something is documented by not listed in __all__ that is a bug. In this case, everything in the module is documented, including parts that should be private, but only a small number are in __all__. My recommendation would be to make those private parts be _ variables and remove them from the docs (using them has no legitimate use cases I can see), and rip out __all__. >> * It creates a _default_mime_types() function which declares a >> bunch of global variables, and then immediately calls >> _default_mime_types() below the definition. There is literally >> no difference in result between this and just putting those >> variables at the top level of the file, so I have no idea why >> this function exists, except to make the code more confusing. > > It could potentially be used for testing, but that's a guess. Here's an abridged version of this function. I don’t think there’s any reason for this that I can see. def _default_mime_types(): global suffix_map global encodings_map global types_map global common_types suffix_map = { '.tgz': '.tar.gz', #... } encodings_map = { '.gz': 'gzip', #... } types_map = { '.a' : 'application/octet-stream', #... } common_types = { '.jpg' : 'image/jpg', #... } _default_mime_types() > Probably came from someone who is very OO happy. Not everyone comes to > Python ready to embrace its procedural or slightly functional facets. Yes, it seems so to me too. > So the problem of changing fundamentally how the code works, even for a > cleanup, is that it will break someone's code out there because they > depended on the module's crazy way of doing things. Now if they are cheating > and looking at things that are meant to be hidden you might be able to clean > things up, but if the semantics are exposed to the user, then there is not > much we can do w/o breaking someone's code. The problem is that the semantics as documented are really ambiguous, and what I would consider the reasonable interpretation is different from what the code actually does. So anyone using this code naively is going to run into trouble, and anyone relying on how the code actually works is going behind the back of the docs, but they sort of have to in order to use much of the functionality of the module. I agree this puts us in a tricky spot. > Honestly, if the code is as bad as it seems -- including its API --, the > best bet would be to come up with a new module for handling MIME types from > scratch, put it up on the Cheeseshop/PyPI, and get the community behind it. > If the community picks it up as the de-facto replacement for mimetypes and > the code has settled we can then talk about adding it to the standard > library and begin deprecating mimetypes. > And thanks for willing to volunteer to fix this. Okay. Well I'd still like to hear a bit about what people really need before trying to make a new API. I'm not such an experienced API designer, and I haven’t really plumbed the depths of mimetypes use cases (though it seems to me like quite a simple module of not more than 100 lines of code or so would suffice). At the very least, I think some changes can be made to this code without altering its basic function, which would clean up the actual mime types it returns, comment the exceptions to Apache and explain why they're there, and make the code flow understandable to someone reading the code. Cheers, Jacob Rus ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] standard library mimetypes module pathologically broken?
Brett Cannon wrote: >>>> * It creates a _default_mime_types() function which declares a >>>> bunch of global variables, and then immediately calls >>>> _default_mime_types() below the definition. There is literally >>>> no difference in result between this and just putting those >>>> variables at the top level of the file, so I have no idea why >>>> this function exists, except to make the code more confusing. >>> >>> It could potentially be used for testing, but that's a guess. >> >> Here's an abridged version of this function. I don’t think there’s any >> reason for this that I can see. >> >> def _default_mime_types(): >> global suffix_map >> global encodings_map >> global types_map >> global common_types >> >> suffix_map = { >> '.tgz': '.tar.gz', #... >> } >> >> encodings_map = { >> '.gz': 'gzip', #... >> } >> >> types_map = { >> '.a' : 'application/octet-stream', #... >> } >> >> common_types = { >> '.jpg' : 'image/jpg', #... >> } >> >> _default_mime_types() > > As R. David pointed out, it is being used by regrtest to clean up after > running the test suite. Yeah, basically the issue is that the default mime types should be separate objects from the final set after apache's files have been parsed and custom additions have been made. If these ones at the top level are renamed and not modified after creation, if new objects with all the updated stuff is put at these names, and if the test code is changed to instead reset the ones at these names based on the default objects, I think that will maybe fix things. I'll try to write some potential patches in the next day or two and submit them here for advice. >> The problem is that the semantics as documented are really ambiguous, >> and what I would consider the reasonable interpretation is different >> from what the code actually does. So anyone using this code naively is >> going to run into trouble, and anyone relying on how the code actually >> works is going behind the back of the docs, but they sort of have to >> in order to use much of the functionality of the module. I agree this >> puts us in a tricky spot. > > Well, perhaps the docs can be updated to match the code where cleanup would > change the semantics. I think that would make the docs extremely confusing, and I’m not even sure it would be possible. The current semantics are vaguely okay if an API consumer sticks to straight-forward use cases, such as any which don’t break when the current docs are followed (anything complicated is going to break unless the code is read a few times), and assuming such uses it would be possible to swap out most of the implementation for something relatively straight-forward. But if any of the edges are pushed, the semantics quickly turn insane, to the point I’m not sure they’re document-able. Anyone expecting the code to work that way is going to have a buggy program anyway, so I’m not sure it makes sense to bend over backwards leaving the particular set of bugs unchanged. Cheers, Jacob Rus ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
[Python-Dev] standard library mimetypes module pathologically broken?
e original is discarded. Basically the only reason for the .copy() is to make sure that the correct updates are applied to the apache mimetype defaults, but the code will gladly re-read all of the apache files even after its mapped types are already in these dictionaries, essentially making re-initializing a (very expensive) no-op. All we’re doing is a lot of unnecessary extra disk reads and memory allocations and deallocations. The only time this has any effect is when a non-singleton MimeTypes instance is created, as in the read_mime_types function. * And that read_mime_types function is a doozy. It tries to open a filename, spits back None if there’s an IOError (instead of raising the exception as it should), and then creates a new MimeTypes instance (remember, this is identical to the singleton MimeTypes instance because it starts itself from that one’s mappings), adds any new types it finds in the file with that name, and then returns the 'strict' types_map from it. I’m not sure whether any sane user of this API would expect it to return the existing type mappings *plus* the extra ones in the provided filename, but I really can’t imagine this function ever being particularly useful: it requires you are reading mime types in apache format, but not the apache mime type files you already looked at, and then the only way to find out what new mappings were defined is to take the difference of the default mappings with the result of the function. * The code itself, on a line-by-line basis, is unpythonic and unnecessarily verbose, confusing, and slow. The code should be rewritten to use python 2.3–2.6 features: even leaving its functionality identical it could be cut to about half the number of lines, and made clearer. In case the above doesn’t make this clear: this code is extremely confusing. Trying to read it has caused all the people around me to look up as I shout "what the fuck??!" at the screen every few minutes, as each new revelation gives another surprise. I’m not convinced that I completely understand what the code does, because it has been quite effectively obfuscated, but I understand enough to want to throw the whole thing out, and start essentially from scratch. So the question is, what should be done about this? I’d like to hear how people use the mimetypes module, and how they expect it to work, to figure out the sanest possible mostly-backwards-compatible replacement which could be dropped in (ideally this would just allow the use of default mimetypes and rip out the ability to alter the default datastore: or is there some easy way to change this away from a singleton without breaking code which calls these methods?), and then extend that replacement to support a somewhat saner model for anyone who actually wants to extend the set of mappings. My guess is that replacement code could actually fix subtle bugs in existing uses of this module, by people who had a sane expectation of how it was supposed to work. At the very least, the parts about figuring out exactly which exceptions to Apache’s set of default types are useful would be a good idea, and I’d maybe even recommend including an up-to-date copy of Apache’s mime.types file in the Python Standard Library, and then only overriding its definitions for future versions of Apache (and then overriding the combination of both of those with further exceptions deemed useful for python, with comments explaining why each exception), so that we’re not bothering to look up horribly out-of-date types in multiple locations from Apache 1, 1.2, 1.3, etc. I’d also recommend making the API for overriding definitions be the same as the code used to declare the default overrides, because as it is there are three ways do define types: a) in a mime.types formatted file, b) in a python dictionary that gets initialized with a confusing bit of code, and c) through the add_type function. Does anyone else have thoughts about this, or maybe some good (it had better be *really* good) explanations why this code is the way it is? I'd be happy to try to rewrite it, but I think I’d need a bit of help figuring out how to make the rewrite backwards-compatible. Note: someone else has had fun with this module: <http://lucumr.pocoo.org/2009/3/1/the-1000-speedup-or-the-stdlib-sucks> <http://lucumr.pocoo.org/2009/7/24/singletons-and-their-problems-in-python> Cheers, Jacob Rus ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com