[Python-Dev] Automatic encoding detection [was: Re: Python3 complexity - 2 use cases]
"So when it is time to guess [at the character encoding of a file], a source of good guesses is an important battery to include." "The barrier for entry to the standard library is higher than mere usefulness." Agreed. But "most programs will need it, and people will either include (the same) 3rd-party library themselves, or write their own workaround, or have buggy code" *is* sufficient. The points of contention are: (1) How many programs have to deal with documents written outside their control -- and probably originating on another system? I'm not ready to say most programs in general, but I think that barrier is met for both web clients (for which we already supply several batteries) and quick-and-dirty utilities. (2) How serious are the bugs, and how annoying are the workarounds? As someone who mostly sticks to English, and who tends to manually ignore stray bytes when dealing with a semi-binary file format, I don't find the bugs that serious personally. So I may well choose to write buggy programs, and the bug may well never get triggered on my own machine. But having a batch process crash one run in ten (where it didn't crash at all under Python 2) is a bad thing. There are environments where (once I knew about it) I would add chardet (if I could get approval for the 3rd-party component). (3) How clear-cut is the *right* answer? As I said, at one point (several years ago), the W3C and WHATWG started to standardize the right answer. They backed that out, because vendors wanted the option to improve their detection in the future without violating standards. There are certainly situations where local knowledge can do better than a global solution like chardet, but ... the right answer is clear most of the time. Just ignoring the problem is still a 99% answer, because most text in ASCII-mostly environments really is close enough. But that is harder (and the One Obvious Way is less reliable) under Python 3 than it was under Python 2.
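A crude "source of good guesses" can be sketched with only the stdlib. This is not chardet's algorithm (chardet does statistical analysis); it is a hypothetical helper, `guess_decode`, that checks for a byte-order mark, then tries strict UTF-8, then Windows-1252, and finally Latin-1, which accepts every byte:

```python
import codecs

# Order matters: the UTF-32-LE BOM begins with the UTF-16-LE BOM.
_BOMS = (
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
)


def guess_decode(data):
    """Return (text, encoding) using a crude fallback chain (hypothetical helper)."""
    # 1. A byte-order mark is the strongest hint available.
    for bom, encoding in _BOMS:
        if data.startswith(bom):
            return data.decode(encoding), encoding
    # 2. Strict UTF-8 rarely succeeds by accident on legacy 8-bit text.
    # 3. Windows-1252 is a common Western guess, but five byte values are
    #    undefined in Python's cp1252 codec, so it can still fail.
    for encoding in ("utf-8", "cp1252"):
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            pass
    # 4. Latin-1 maps every byte to a code point, so this step cannot fail.
    return data.decode("latin-1"), "latin-1"
```

The trade-off described above shows up immediately: a pure-ASCII file is reported as UTF-8, and a Latin-1 file that happens to be valid UTF-8 would be misread without any error.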
An alias for open that defaulted to surrogate-escape (or returned the new ASCIIstr bytes hybrid) would probably be sufficient to get back (almost) to Python 2 levels of ease and reliability. But it would tend to encourage ASCII/English-only assumptions. You could fix most of the remaining problems by scripting a web browser, except that scripting the browser in a cross-platform manner is slow and problematic, even with webbrowser.py. Whatever a recent Firefox does is (almost by definition) good enough, and is available ... but maybe not in a convenient form, which is one reason that chardet was created as a port thereof. Also note that Firefox assumes you will update more often than Python does. Whatever chardet said at the time the Python release was cut is almost certainly good enough too. The browser makers go to great lengths to match each other even in bizarre corner cases. (Which is one reason there aren't more competing solutions.) But that doesn't mean it is *impossible* to construct a test case where they disagree -- or even one where a recent improvement in the algorithms led to regressions for one particular document. That said, such regressions should be limited to documents that were not properly labeled in the first place, and should be rare even there. Think of the changes as obscure bugfixes, akin to a program starting to handle NaN properly, in a place where it should not ever see one. -jJ -- If there are still threading problems with my replies, please email me with details, so that I can try to resolve them. -jJ ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
[Python-Dev] Python3 complexity - 2 use cases
Steven D'Aprano wrote: "I think that heuristics to guess the encoding have their role to play, if the caller understands the risks." Ben Finney wrote: "In my opinion, content-type guessing heuristics certainly don't belong in the standard library." It would be great if there were never any need to guess. But in the real world, there is -- and often the user won't know any more than python does. So when it is time to guess, a source of good guesses is an important battery to include. The HTML5 specifications go through some fairly extreme contortions to document what browsers actually do, as opposed to what previous standards have mandated. They don't currently specify how to guess (though I think a draft once tried, since the major browsers all do it, and at the time did it similarly), but the specs do explicitly support such a step, and do provide an implementation note encouraging user-agents to do at least minimal auto-detection. http://www.whatwg.org/specs/web-apps/current-work/multipage/parsing.html#determining-the-character-encoding My own opinion is therefore that Python SHOULD provide better support for both of the following use cases: (1) Treat this file like it came from the web -- including autodetection and even overriding explicit charset declarations for certain charsets. We should explicitly treat autodetection like time zone data -- there is no promise that the right answer (or at least the best guess) won't change, even within a release. I offer no opinion on whether chardet in particular is still too volatile, but the docs should warn that the API is driven by possibly changing external data. (2) Treat this file as ASCII+, where anything non-ASCII will (at most) be written back out unchanged; it doesn't even need to be converted to text.
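Use case (2) can already be approximated in two ways, with the surrogateescape error handler or with Latin-1; in both, non-ASCII bytes survive an edit-and-write cycle unchanged. A minimal sketch; `open_ascii_plus`, `edit_surrogateescape`, and `edit_latin1` are hypothetical names, not stdlib functions:

```python
import functools

# Hypothetical "alias for open": text is ASCII, other bytes are smuggled
# through as lone surrogates; newline="" leaves line endings untouched.
open_ascii_plus = functools.partial(
    open, encoding="ascii", errors="surrogateescape", newline=""
)


def edit_surrogateescape(raw):
    # Bytes 0x80-0xFF become lone surrogates U+DC80-U+DCFF.
    text = raw.decode("ascii", errors="surrogateescape")
    text = text.replace("key", "KEY")  # an ASCII-only edit
    # Encoding with the same handler turns the surrogates back into bytes.
    return text.encode("ascii", errors="surrogateescape")


def edit_latin1(raw):
    # Latin-1 maps every byte to U+0000-U+00FF, so decoding never fails,
    # but non-ASCII bytes now masquerade as (probably wrong) letters.
    text = raw.decode("latin-1")
    text = text.replace("key", "KEY")
    return text.encode("latin-1")
```

One design difference: the surrogateescape version raises UnicodeEncodeError if an edit introduces a genuinely new non-ASCII character, while the Latin-1 version silently accepts anything in U+0000-U+00FF.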
At this time, I don't know whether the right answer is making it easy to default to surrogate-escape for all error-handling, adding more bytes methods, encouraging use of python's latin-1 variant, offering a dedicated (new?) codec, or some new suggestion. I do know that this use case is important, and that python 3 currently looks clumsy compared to python 2. -jJ
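For use case (1), "treat this file like it came from the web" includes the WHATWG Encoding Standard's habit of treating several legacy labels (among them "iso-8859-1", "latin1", and "us-ascii") as windows-1252. A minimal sketch, assuming only a hand-copied subset of that label table; `decode_like_browser` is a hypothetical name, and BOM sniffing and real autodetection are left out:

```python
# A small subset of the WHATWG label table; the real table is much longer.
_WEB_LABEL_OVERRIDES = {
    "ascii": "cp1252",
    "us-ascii": "cp1252",
    "iso-8859-1": "cp1252",
    "latin1": "cp1252",
}


def decode_like_browser(data, declared):
    """Decode bytes roughly the way a browser treats a declared charset (sketch)."""
    label = declared.strip().lower()
    encoding = _WEB_LABEL_OVERRIDES.get(label, label)
    # errors="replace" approximates the WHATWG decoders, which substitute
    # U+FFFD instead of refusing to render the page.
    return data.decode(encoding, errors="replace")
```

Note that Python's cp1252 codec is not byte-for-byte identical to WHATWG windows-1252: the five bytes Python leaves undefined are mapped to C1 control characters by the web standard.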
[Python-Dev] Which direction is UnTransform? / Unicode is different
(Fri Nov 15 16:57:00 CET 2013) Stephen J. Turnbull wrote: Serhiy Storchaka wrote: If the transform() method will be added, I prefer to have only one transformation method and specify a direction by the transformation name (bzip2/unbzip2). Me too. Until I consider special cases like compress, or lower, and realize that there are enough special cases to become a major wart if generic transforms ever became popular. People think about these transformations as en- or de-coding, not transforming, most of the time. Even for a transformation that is an involution (eg, rot13), people have a very clear idea of what's encoded and what's not, and they are going to prefer the names encode and decode for these (generic) operations in many cases. I think this is one of the major stumbling blocks with unicode. I originally disagreed strongly with what Stephen wrote -- but then I realized that all my counterexamples involved unicode text. I can tell whether something is tarred or untarred, zipped or unzipped. But an 8-bit (even Latin-1, let alone ASCII) bytestring really doesn't seem encoded, and it doesn't make sense to decode a perfectly readable (ASCII) string into a sequence of code units. Nor does it help that http://www.unicode.org/glossary/#code_unit defines code unit as "The minimal bit combination that can represent a unit of encoded text for processing or interchange. The Unicode Standard uses 8-bit code units in the UTF-8 encoding form, 16-bit code units in the UTF-16 encoding form, and 32-bit code units in the UTF-32 encoding form. (See definition D77 in Section 3.9, Unicode Encoding Forms.)" I have to read that very carefully to avoid mentally translating it into "Code Units are *en*coded, and there are lots of different complicated encodings that I wouldn't use unless I were doing special processing or interchange." If I'm not using the network, or if my interchange format already looks like readable ASCII, then unicode sure sounds like a complication.
I *will* get confused over which direction is encoding and which is decoding. (Removing .decode() from the (unicode) str type in 3 does help a lot, if I have a Python 3 interpreter running to check against.) I'm not sure exactly what implications the above has, but it certainly supports separating the Text Processing from the generic codecs, both in the documentation and in any potential new methods. Instead of relying on introspection of .decodes_to and .encodes_to, it would be useful to have charsetcodecs and transformcodecs as entirely different modules, with their own separate registries. I will even note that the existing help(codecs) seems more appropriate for charsetcodecs than it does for the current conjoined module. -jJ
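In Python 3 the direction can at least be checked from the types involved. A short sketch using only documented stdlib codecs: text encodings go from str to bytes with `.encode()` and back with `.decode()`, while the bytes-to-bytes and str-to-str transforms (here `zlib_codec` and `rot13`) are reached through `codecs.encode()` and `codecs.decode()`:

```python
import codecs

text = "naïve"
data = text.encode("utf-8")              # text encoding: str -> bytes
round_trip = data.decode("utf-8")        # and bytes -> str
str_has_decode = hasattr(text, "decode")  # False: Python 3 str has no .decode()

# A bytes -> bytes transform: "encode" compresses, "decode" decompresses.
packed = codecs.encode(b"spam" * 10, "zlib_codec")
unpacked = codecs.decode(packed, "zlib_codec")

# rot13 is an involution (str -> str), yet one direction is still "encode".
rotated = codecs.encode("Hello", "rot13")
```

"bz2_codec" works the same way as "zlib_codec" when the bz2 module is available, which matches the bzip2/unbzip2 example in the thread.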
[Python-Dev] PEP 454 (tracemalloc) disable == clear?
(Tue Oct 29 12:37:52 CET 2013) Victor Stinner wrote: "For consistency, you cannot keep traces when tracing is disabled. The free() must be enabled to remove allocated memory blocks, or the next malloc() may get the same address, which would raise an assertion error (you cannot have two memory blocks at the same address)." That seems like a quirk of the implementation, particularly since the actual address is not returned to the user. Nor do I see any way of knowing when that allocation is freed. Well, unless I missed it... I don't see how to get anything beyond the return value of get_traces, which is a (time-ordered?) list of allocation sizes with the then-current call stack. It doesn't mention any attribute for indicating that some entries are de-allocations, let alone the actual address of each allocation. "For the reason explained above, it's not possible to disable the whole module temporarily. Internally, tracemalloc uses a thread-local variable (called the reentrant flag) to temporarily disable tracing of allocations in the current thread. It only disables tracing new allocations; deallocations are still processed." Even assuming the restriction is needed, this just seems to mean that disabling (or filtering) should not affect de-allocation events, for fear of corrupting tracemalloc's internal structures. In that case, I would expect disabling (and filtering) to stop capturing new allocation events for me, but I would still expect tracemalloc to do proper internal maintenance. It would at least explain why you need both disable *and* reset; reset would empty those internal structures, so that tracemalloc could shortcut that maintenance. I would NOT assume that I needed to call reset when changing the filters, nor would I assume that changing them threw out existing traces. -jJ
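For comparison, the tracemalloc module that eventually shipped (Python 3.4) uses a different vocabulary from the draft discussed here: start(), stop(), clear_traces(), and snapshots. As I read its documentation, stop() also clears the collected traces, while clear_traces() discards traces without stopping. A small sketch of that released API:

```python
import tracemalloc

tracemalloc.start()
before = tracemalloc.take_snapshot()
blocks = [bytearray(1000) for _ in range(100)]  # roughly 100 KB of buffers
after = tracemalloc.take_snapshot()

# Statistics grouped by source line, largest differences first.
top_diffs = after.compare_to(before, "lineno")[:3]
current, peak = tracemalloc.get_traced_memory()

tracemalloc.clear_traces()       # forget existing traces; tracing continues
after_clear, _ = tracemalloc.get_traced_memory()
still_tracing = tracemalloc.is_tracing()

tracemalloc.stop()               # stop tracing (this also clears the traces)
stopped = not tracemalloc.is_tracing()
```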
[Python-Dev] backported Enum
(On June 19, 2013) Barry Warsaw wrote about porting mailman from flufl.enum to the stdlib.enum: Switching from call syntax to getitem syntax for looking up an enum member by name, e.g. -delivery_mode = DeliveryMode(data['delivery_mode']) +delivery_mode = DeliveryMode[data['delivery_mode']] Switching from getitem syntax to call syntax for looking up an enum member by value, e.g. -return self._enum[value] +return self._enum(value) Interesting that these two were exactly opposite from flufl.enum. Is there a reason why these were reversed? I can sort of convince myself that it makes sense because dicts work better with strings than with ints, but ... it seems like such a minor win that I'm not sure it is worth backwards incompatibility. (Of course, I also don't know how much use stdlib.enum has already gotten with the current syntax.) Switching from int() to .value to get the integer value of an enum member, e.g. -return (member.list_id, member.address.email, int(member.role)) +return (member.list_id, member.address.email, member.role.value) Is this just a style preference? Using a .value attribute certainly makes sense, but I don't see it mentioned in the PEP as even optional, let alone recommended. If you care that the value be specifically an int (as opposed to any object), then an int constructor may be better. [Some additional changes that mean there will be *some* changes, which does reduce the pressure for backwards compatibility.] ... An unexpected difference is that failing name lookups raise a KeyError instead of a ValueError. I could understand either, as well as AttributeError, since the instance that would represent that value isn't a class attribute. Looking at Enum creation, I think ValueError would be better than TypeError for complaints about duplicate names. Was TypeError chosen because it should only happen during setup?
I would also not be shocked if some people expect failed value lookups to raise an IndexError, though I expect they would adapt if they get something else that makes sense. Would it be wrong to create an EnumError that subclasses (ValueError, KeyError, AttributeError) and to raise that subclass from everything but _StealthProperty and _get_mixins? -jJ
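The behaviour of the stdlib enum module on these points can be checked directly. A sketch with illustrative classes (DeliveryMode and Role here are stand-ins, not Mailman's real definitions; the proposed EnumError does not exist, so it is not shown):

```python
from enum import Enum, IntEnum


class DeliveryMode(Enum):      # illustrative members only
    regular = 1
    mime_digests = 2


class Role(IntEnum):           # IntEnum members are also ints
    member = 1
    owner = 2


def error_type(func, *args):
    """Return the exception class raised by func(*args), or None."""
    try:
        func(*args)
    except Exception as exc:
        return type(exc)
    return None


def define_duplicate():
    class Broken(Enum):
        a = 1
        a = 2  # reusing a member name is rejected at class creation


by_name = DeliveryMode["regular"]    # getitem: look up by *name*
by_value = DeliveryMode(2)           # call: look up by *value*

bad_name = error_type(lambda: DeliveryMode["nosuch"])
bad_value = error_type(DeliveryMode, 99)
int_of_enum = error_type(int, DeliveryMode.regular)  # plain Enum is not an int
int_of_intenum = int(Role.owner)
duplicate = error_type(define_duplicate)
```

So today a failed name lookup raises KeyError, a failed value lookup raises ValueError, int() only works on IntEnum-style members (hence .value for plain Enum), and a duplicate name raises TypeError.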
[Python-Dev] Keyword meanings [was: Accept just PEP-0426]
Vinay Sajip reworded the 'Provides-Dist' definition to explicitly say: "The use of multiple names in this field *must not* be used for bundling distributions together. It is intended for use when projects are forked and merged over time ..." (1) Then how *should* the bundle-of-several-components case be represented? (2) How is 'Provides-Dist' different from 'Obsoletes-Dist'? The only difference I can see is that it may be a bit more polite to people who do want to install multiple versions of a (possibly abstract) package. -jJ
[Python-Dev] PEP 362: 4th edition
Summary: *Every* Parameter attribute is optional, even name. (Think of builtins, even if they aren't automatically supported yet.) So go ahead and define some others that are sometimes useful. Instead of defining a BoundArguments class, just return a copy of the Signature, with value attributes added to the Parameters. Use subclasses to distinguish the parameter kind. (Replacing most of the is_ methods from the 3rd version.) [is_]implemented is important information, but the API isn't quite right; even with tweaks, maybe we should wait a version before freezing it on the base class. But I would be happy to have Larry create a Signature for the os.* functions, whether that means a subclass or just an extra instance attribute. I favor passing a class to Signature.format, because so many of the formatting arguments would normally change in parallel. But my tolerance for nested structures may be unusually high. I make some more specific suggestions below. In http://mail.python.org/pipermail/python-dev/2012-June/120305.html Yury Selivanov wrote: A Signature object has the following public attributes and methods: * return_annotation : object The annotation for the return type of the function if specified. If the function has no annotation for its return type, this attribute is not set. This means users must already be prepared to use hasattr with the Signature as well as the Parameters -- in which case, I don't see any harm in a few extra optional properties. I would personally prefer to see the name (and qualname) and docstring, but it would make perfect sense to implement these by keeping a weakref to the original callable, and just delegating there unless/until the properties are explicitly changed. I suspect others will have a use for additional delegated attributes, such as the self of boundmethods. I do agree that __eq__ and __hash__ should depend at most on the parameters (including their order) and the annotation. 
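For comparison, in the inspect module that eventually shipped, a function without a return annotation reports the sentinel inspect.Signature.empty instead of leaving the attribute unset, and signature equality ignores the function's name and docstring. A small sketch (g1, g2, and h are throwaway names):

```python
import inspect


def g1(*, a=4, b=5) -> int:
    return a + b


def g2(*, b=5, a=4) -> int:
    return a + b


def h(x):
    return x


# A missing return annotation is a sentinel, so no hasattr() check is needed.
h_return = inspect.signature(h).return_annotation
uses_sentinel = h_return is inspect.Signature.empty

# Equality compares parameters and the return annotation, not names or
# docstrings; keyword-only parameters compare without regard to order.
same_signature = inspect.signature(g1) == inspect.signature(g2)
positional_order_matters = (
    inspect.signature(lambda a, b: None) != inspect.signature(lambda b, a: None)
)
```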
* parameters : OrderedDict An ordered mapping of parameters' names to the corresponding Parameter objects (keyword-only arguments are in the same order as listed in ``code.co_varnames``). For a specification, that feels a little too tied to the specific implementation. How about: "Parameters will be ordered as they are in the function declaration." or even just: "Positional parameters will be ordered as they are in the function declaration." because: def f(*, a=4, b=5): pass and: def f(*, b=5, a=4): pass should probably have equal signatures. Wild thought: Instead of just *having* an OrderedDict of Parameters, should a Signature *be* that OrderedDict (with other attributes)? That is, should signature(testfn)[foo] get the foo parameter? * bind(\*args, \*\*kwargs) -> BoundArguments Creates a mapping from positional and keyword arguments to parameters. Raises a ``BindError`` (subclass of ``TypeError``) if the passed arguments do not match the signature. * bind_partial(\*args, \*\*kwargs) -> BoundArguments Works the same way as ``bind()``, but allows the omission of some required arguments (mimics ``functools.partial`` behavior.) Are those descriptions actually correct? I would expect the mapping to be from parameters (or parameter names) to values extracted from *args and **kwargs. And I'm not sure the current patch does even that, since it seems to instead return a non-Mapping object (but with a mapping attribute) that could be used to re-create *args, **kwargs in canonical form. (Though that canonicalization is valuable for calls; it might even be worth an as_call method.) I think it should be explicit that this mapping does not include parameters which would be filled by default arguments. In fact, if you stick with this interface, I would like a 3rd method that does fill out everything. But I think it would be simpler to just add an optional attribute to each Parameter instance, and let bind fill that in on the copies, so that the return value is also a Signature.
(No need for the BoundArguments class.) Then the user can decide whether or not to plug in the defaults for missing values. * format(...) -> str Formats the Signature object to a string. Optional arguments allow for custom render functions for parameter names, annotations and default values, along with custom separators. I think it should state explicitly that by default, the return value will be a string that could be used to declare an equivalent function, if Signature were replaced with def funcname. There are enough customization parameters that would often be changed together (e.g., to produce HTML output) that it might make sense to use overridable class defaults -- or even to make format a class itself. I also think it would make sense to delegate formatting the individual parameters to the parameter objects.
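For reference, this is roughly how bind(), bind_partial(), and string formatting behave in the inspect module as released. In that API, BoundArguments.arguments maps parameter names to values, defaults are omitted unless apply_defaults() (added in Python 3.5) is called, and a failed bind() raises plain TypeError. A sketch; `area`, `bind_error`, and `format_as_html` are hypothetical names:

```python
import html
import inspect


def area(width: float, height: float = 1.0, *, units: str = "m") -> float:
    return width * height


sig = inspect.signature(area)

# bind() maps parameter names to values; defaults are not included ...
bound = sig.bind(2.0, units="cm")
explicit = dict(bound.arguments)
call_form = (bound.args, bound.kwargs)   # canonical *args / **kwargs
# ... unless apply_defaults() is called explicitly.
bound.apply_defaults()
filled = dict(bound.arguments)

# bind_partial() tolerates missing required arguments.
partial = sig.bind_partial(units="cm")


def bind_error(*args, **kwargs):
    """Return bind()'s TypeError message, or None if binding succeeds."""
    try:
        sig.bind(*args, **kwargs)
    except TypeError as exc:
        return str(exc)
    return None


# str(sig) is declaration-ready text.
declaration = f"def {area.__name__}{sig}: ..."


def format_as_html(signature):
    """Render each parameter by delegating to str(Parameter).

    The bare '*' separator is not a Parameter, so it does not appear here.
    """
    items = "".join(
        f"<li>{html.escape(str(p))}</li>" for p in signature.parameters.values()
    )
    return f"<ul>{items}</ul>"
```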
[Python-Dev] time.clock_info() field names
In http://mail.python.org/pipermail/python-dev/2012-April/119134.html Benjamin Peterson wrote: "I see PEP 418 gives time.clock_info() two boolean fields named is_monotonic and is_adjusted. I think the is_ is unnecessary and a bit ugly, and they could just be renamed monotonic and adjusted." I agree with monotonic, but I think the second field should be adjustable. To me, adjusted and is_adjusted both imply that an adjustment has already been made; adjustable only implies that it is possible. I do remember concerns (including Stephen J. Turnbull's CAL_0O19nmi0+zB+tV8poZDAffNdTnohxo9y5dbw+E2q=9rx...@mail.gmail.com ) that adjustable should imply at least a list of past adjustments, and preferably a way to make an adjustment. I just think that stating it is adjustable (without saying how, or whether and when it already happened) is less wrong than claiming it is already adjusted just in case it might have been. -jJ
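For what it is worth, the function that shipped with PEP 418 is time.get_clock_info(name), and its boolean fields are named monotonic and adjustable, with no is_ prefix. A quick way to inspect them:

```python
import time

mono = time.get_clock_info("monotonic")
wall = time.get_clock_info("time")

# Each result exposes implementation, monotonic, adjustable and resolution.
report = {
    name: (info.implementation, info.monotonic, info.adjustable, info.resolution)
    for name, info in (("monotonic", mono), ("time", wall))
}
```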
[Python-Dev] Rename time.steady(strict=True) to time.monotonic()?
In http://mail.python.org/pipermail/python-dev/2012-March/118024.html Steven D'Aprano wrote: "What makes this steady, given that it can be adjusted and it can go backwards?" It is best-effort for steady, but putting "best" in the name would be an attractive nuisance. "Is steady() merely a convenience function to avoid the user having to write something like this?"
    try:
        mytimer = time.monotonic
    except AttributeError:
        mytimer = time.time
That would still be worth doing. But I think the main point is that the clock *should* be monotonic, and *should* be as precise as possible. Given that it returns seconds elapsed (since an undefined start), perhaps it should be time.seconds() or even time.counter(). -jJ
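The fallback idiom quoted above, made runnable, plus a small timing helper (`timed` is a hypothetical name). A monotonic clock suits elapsed-time measurement because the difference cannot go negative when the wall clock is adjusted; Python 3.3 also added time.perf_counter() for the highest-resolution variant.

```python
import time

try:
    mytimer = time.monotonic      # always available since Python 3.5
except AttributeError:            # only reachable on older interpreters
    mytimer = time.time


def timed(func, *args, **kwargs):
    """Return (result, elapsed_seconds), measured with mytimer."""
    start = mytimer()
    result = func(*args, **kwargs)
    return result, mytimer() - start
```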
[Python-Dev] Docs of weak stdlib modules should encourage exploration of 3rd-party alternatives
In http://mail.python.org/pipermail/python-dev/2012-March/117570.html Steven D'Aprano posted: "Need is awfully strong. I don't believe it is the responsibility of the standard library to be judge and reviewer of third party packages that it doesn't control." It is, however, user-friendly to indicate when the stdlib selections are particularly likely to be for reasons other than "A bunch of experts believe this is the best way to do this." CPython's documentation is (de facto) the documentation for Python in general, and pointing people towards other resources (particularly PyPI itself) is quite reasonable. Many modules are in the stdlib in part because they are an *acceptable* way of doing something, and the best ways are either changing too quickly or are so complicated that it doesn't make sense to burden the *standard* library for specialist needs. In those cases, I do think the documentation should say so. Specific examples: http://docs.python.org/library/numeric.html quite reasonably has subsections only for what ships with Python. But I think the introductory paragraph could stand to have an extra sentence explaining why and when people should look beyond the standard library, such as: "Applications centered around mathematics may benefit from specialist 3rd party libraries, such as numpy http://pypi.python.org/pypi/numpy/ , gmpy http://pypi.python.org/pypi/gmpy , and scipy http://pypi.python.org/pypi/scipy ." I would add a similar sentence to the web section, or the internet protocols section if web is still not broken out separately. http://docs.python.org/dev/library/internet.html Note that some web conventions are still evolving too quickly for convenient encapsulation in a stable library. Many applications will therefore prefer functional replacements from third parties, such as requests or httplib2, or frameworks such as Django and Zope. www-related products can be found by browsing PyPI for top internet subtopic www/http.
http://pypi.python.org/pypi?:action=browsec=319c=326 [I think that searching by classifier -- which first requires browse, and can't be reached from the list of classifiers -- could be improved.] "Should we recommend wxPython over Pyjamas or PyGUI or PyGtk?" Actually, I think the existing http://docs.python.org/library/othergui.html does a pretty good job; I would not object to adding mentions of other tools as well, but a wiki reference is probably sufficient. -jJ
[Python-Dev] Issue #10278 -- why not just an attribute?
In http://mail.python.org/pipermail/python-dev/2012-March/117762.html Georg Brandl posted:
+ If available, a monotonic clock is used. By default, if *strict* is False,
+ the function falls back to another clock if the monotonic clock failed or is
+ not available. If *strict* is True, raise an :exc:`OSError` on error or
+ :exc:`NotImplementedError` if no monotonic clock is available.
"This is not clear to me. Why wouldn't it raise OSError on error even with strict=False? Please clarify which exception is raised in which case." Passing strict as an argument seems like overkill since it will always be meaningless on some (most?) platforms. Why not just use a function attribute? Those few users who do care can check the value of time.steady.monotonic before calling time.steady(); exceptions raised will always be whatever the clock actually raises. -jJ
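The function-attribute alternative can be sketched as follows. time.steady() never shipped under that name, so `steady` and `_choose_clock` are hypothetical; time.get_clock_info() is the real PEP 418 function used here to fill in the attribute.

```python
import time


def _choose_clock():
    """Prefer a monotonic clock; fall back to the wall clock otherwise."""
    for name in ("monotonic", "time"):
        if hasattr(time, name):
            return name, getattr(time, name)
    raise RuntimeError("no usable clock")


_clock_name, _clock = _choose_clock()


def steady():
    """Hypothetical time.steady() with no `strict` argument.

    Any exception comes straight from the underlying clock.
    """
    return _clock()


# Callers who care check an attribute instead of passing strict=True.
steady.monotonic = time.get_clock_info(_clock_name).monotonic
```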
[Python-Dev] Python install layout and the PATH on win32
In http://mail.python.org/pipermail/python-dev/2012-March/117586.html van.lindberg at gmail.com posted: 1) The layout for the python root directory for all platforms should be as follows:
    stdlib = {base/userbase}/lib/python{py_version_short}
    platstdlib = {base/userbase}/lib/python{py_version_short}
    purelib = {base/userbase}/lib/python{py_version_short}/site-packages
    platlib = {base/userbase}/lib/python{py_version_short}/site-packages
    include = {base/userbase}/include/python{py_version_short}
    scripts = {base/userbase}/bin
    data = {base/userbase}
Why? Pure python vs compiled C doesn't need to be separated at the directory level, except for cleanliness. Some (generally unix) systems prefer to split the libraries into several additional pieces depending on CPU architecture. The structure listed above doesn't have a location for docs. Some packages (such as tcl) may be better off in their own area. What is data? Is this an extra split compared to today, or does it refer to things like LICENSE.txt, README.txt, and NEWS.txt? And even once I figure out where files have moved, and assume that the split is perfect -- what does this buy me over the current situation? I was under the impression that programs like distutils already handled finding the appropriate directories for a program; if you're rewriting that logic, you're just asking for bugs on a strange platform that you don't use. If you're looking for things interactively, then platform conventions are probably more important than consistency across platforms. If you disagree, you are welcome to reorganize your personal linux installation so that it matches windows, and see whether it causes you any problems. "... We *already* have this. The only difference in this proposal is that we go from py_version_nodot to py_version_short, i.e. from c:\python33\lib\python33 to c:\python33\lib\python3.3" I have not seen that redundancy before on windows.
I'm pretty sure that it is a relic of your Linux provider wanting to support multiple python versions using shared filesystems. The Windows standard is to use a local disk, and to bundle it all up into its own directory, similar to the way that java apps sometimes ship with their own JVM. Also note that using the dot in a directory name is incautious. I haven't personally had trouble in several years, but doing so is odd enough that some trouble should be expected. Python already causes some grief by not installing in Program Files, but that is at least justified by the spaces in filenames problem; what is the advantage of 3.3? I'm using windows, and I just followed the defaults at installation. It is possible that the installer continued to do something based on an earlier installation, but I don't think this machine has ever had a customized installation of any python version. C:\python32\* Everything is under here; I assume {base/userbase} would be set to C:\python32. As is customary for windows, the base directory contains the license/readme/news and all executables that the user is expected to use directly (python.exe and pythonw.exe; it also contains w9xpopen.exe, which users do not use, but that too is fairly common). There is no data directory. Subdirectories are: C:\python32\DLLs In addition to regular DLL files, it contains .pyd files and icons. It looks like modules from the stdlib that happen to be written in C. Most users will never bother to look here. C:\python32\Doc A .chm file; full html would be fine too, but removing it would be a bad idea. C:\python32\include These are the header files, though most users will never have any use for them, as there isn't generally a compiler. C:\python32\Lib The standard library -- or at least the portion implemented in python. Note that site-packages is a subdirectory here. It doesn't happen to have an __init__.py, but to an ordinary user it looks just like any other stdlib package, such as xml or multiprocessing.
I personally happen to keep things in subdirectories of site-packages, but I can't say what is standard. Moving site-packages out of the Lib directory might make sense, but probably isn't worth the backward compatibility hit. C:\python32\libs .lib files. I'm not entirely sure what these (as opposed to the DLLs) are for; lib files aren't that common on windows. My machine does not appear to have any that aren't associated with cross-platform tools or unix emulation. C:\python32\tcl Note that this is in addition to associated files under DLLs and libs. I would prefer to see them in one place, but moving it in with non-tcl stuff would not be an improvement. Most users will never look (or care); those that do usually appreciate knowing that, for example, the dde subdirectory is for tcl. C:\python32\Tools This has three subdirectories (i18n,
[Python-Dev] Python install layout and the PATH on win32
In http://mail.python.org/pipermail/python-dev/2012-March/117617.html van.lindberg at gmail.com posted: "As noted earlier in the thread, I also change my proposal to maintain the existing differences between system installs and user installs." [Wanted lower case, which should be irrelevant; sysconfig.get_python_inc already assumes lower case despite the configuration file.] [Wanted bin instead of Scripts, even though they aren't binaries.] If there are to be any changes, I *am* tempted to at least harmonize the two install types, but to use the less redundant system form. If the user is deliberately trying to hide that it is version 33 (or even that it is python), then so be it; defaulting to redundant information is not an improvement. Set the base/userbase at install time, with defaults of:
    base = %SystemDrive%\{py_version_nodot}
    userbase = %USERPROFILE%\Application Data\{py_version_nodot}
    usedbase = base for system installs; userbase for per-user installs.
Then let the rest default to subdirectories; sysconfig.get_config_vars on windows explicitly doesn't provide as many variables as unix, just INCLUDEPY (which should default to {usedbase}/include) and LIBDEST and BINLIBDEST (both of which should default to {usedbase}/lib). And no, I'm not forgetting data or scripts. As best I can tell, sysconfig doesn't actually expose them, and there is no Scripts directory on my machine (except inside Tools). Perhaps some installers create it when they install their own extensions? -jJ
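Rather than hard-coding any of these directories, code can ask the running interpreter. A short sketch with the stdlib sysconfig module; the exact values, and which config variables are populated, differ by platform and version (current versions do include "scripts" and "data" keys in get_paths()):

```python
import sysconfig

# Install locations for the interpreter running this code (default scheme).
paths = sysconfig.get_paths()

# The version strings used in the layout templates discussed above.
short = sysconfig.get_config_var("py_version_short")   # e.g. "3.12"
nodot = sysconfig.get_config_var("py_version_nodot")   # e.g. "312"

# Variables mentioned above; either may be None on an unusual build.
include_py = sysconfig.get_config_var("INCLUDEPY")
libdest = sysconfig.get_config_var("LIBDEST")

# Named schemes cover both Windows ("nt") and POSIX layouts.
schemes = sysconfig.get_scheme_names()
```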
[Python-Dev] problem with recursive yield from delegation
http://mail.python.org/pipermail/python-dev/2012-March/117396.html Stefan Behnel posted: I found a problem in the current yield from implementation ... [paraphrasing]
g1 yields from g2
g2 yields from g1
XXX python follows the existing delegation without checking re-entrancy
g2 (2nd call) checks re-entrancy, and raises an exception
g1 (2nd call) gets to handle the exception, and doesn't
g2 (1st call) gets to handle the exception, and does
How is this a problem? Re-entering a generator is a bug. Python caught it and raised an appropriate exception. It would be nice if python caught the generator cycle as soon as it was created, just as it would be nice if reference cycles were collected as soon as they became garbage. But python doesn't promise to catch cycles immediately, and the checks required to do so would slow down all code, so in practice the checks are delayed. -jJ
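For illustration, a minimal reproduction of the g1/g2 cycle, assuming current CPython behaviour: re-entering the running generator raises ValueError ("generator already executing"), which then propagates back out through both delegating generators. `delegate` and `demonstrate_cycle` are hypothetical helpers.

```python
def delegate(get_target):
    # Delegate to whatever generator get_target() returns on first resumption.
    yield from get_target()


def demonstrate_cycle():
    """Build g1 -> g2 -> g1 and return the exception raised when driving it."""
    g1 = delegate(lambda: g2)
    g2 = delegate(lambda: g1)
    try:
        next(g1)
    except ValueError as exc:  # CPython reports re-entry as ValueError
        return exc
    return None


if __name__ == "__main__":
    print(repr(demonstrate_cycle()))
```

The cycle is only detected when g2 actually tries to resume g1, not when the delegation chain is built.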
[Python-Dev] Adding a builtins parameter to eval(), exec() and __import__().
http://mail.python.org/pipermail/python-dev/2012-March/117395.html Brett Cannon posted: [in reply to Mark Shannon's suggestion of adding a builtins parameter to match locals and globals] It's a mess right now to try to grab the __import__() implementation and this would actually help clarify import semantics by saying that __import__() for any chained imports comes from __import__()'s locals, globals, or builtins arguments (in that order) or from the builtins module itself (i.e. tstate->builtins). How does that differ from today? If you're saying that the locals and (module-level) globals aren't always checked in order, then that is a semantic change. Probably a good change, but still a change -- and it can be made independently of Mark's suggestion. Also note that I would assume this was for sandboxing, and that missing names should *not* fall back to the real globals, although I would understand if bootstrapping required the import statement to get special treatment. (Note that I like Mark's proposed change; I just don't see how it cleans up import.) -jJ
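A sketch of what is already possible without a new parameter: exec() takes builtins from globals["__builtins__"] when that is a dict, and the import statement looks up __import__ there too. `run_restricted` and `error_name` are hypothetical helpers, and this is not a secure sandbox.

```python
def run_restricted(source, allowed_builtins):
    """Execute source with only allowed_builtins visible as builtins."""
    namespace = {"__builtins__": dict(allowed_builtins)}
    exec(source, namespace)
    return namespace


def error_name(source, allowed_builtins):
    """Return the name of the exception run_restricted raises, or None."""
    try:
        run_restricted(source, allowed_builtins)
    except Exception as exc:
        return type(exc).__name__
    return None


if __name__ == "__main__":
    print(run_restricted("n = len('abc')", {"len": len})["n"])  # 3
    print(error_name("f = open", {"len": len}))   # missing builtin name
    print(error_name("import os", {"len": len}))  # no __import__ in builtins
```

The last case shows the import statement fetching __import__ from the supplied builtins rather than from the real builtins module.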
[Python-Dev] [RELEASED] Python 3.3.0 alpha 1
In http://mail.python.org/pipermail/python-dev/2012-March/117348.html Georg Brandl ge...@python.org posted: Python 3.3 includes a range of improvements of the 3.x series, as well as easier porting between 2.x and 3.x. Major new features in the 3.3 release series are: As much as it is nice to just celebrate improvements, I think readers (particularly on the download page http://www.python.org/download/releases/3.3.0/ ) would be better served if there were an additional point about porting and the hash changes. http://docs.python.org/dev/whatsnew/3.3.html#porting-to-python-3-3 also failed to mention this, and even the changelog didn't seem to warn people about failing tests or tell them how to work around it. Perhaps something like: Hash Randomization (issue 13703) is now on by default. Unfortunately, this does break some tests; it can be temporarily turned off by setting the environment variable PYTHONHASHSEED to 0 before launching python. -jJ
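A quick way to confirm the suggested workaround, assuming Python 3.3 or later: run a child interpreter twice with PYTHONHASHSEED=0 and compare what it reports. `PROBE` and `run_with_seed` are hypothetical names.

```python
import os
import subprocess
import sys

# Child prints whether randomization is on, and one string hash.
PROBE = "import sys; print(sys.flags.hash_randomization, hash('spam'))"


def run_with_seed(seed):
    """Run a child interpreter with PYTHONHASHSEED set; return its output."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", PROBE],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # With PYTHONHASHSEED=0 randomization is off, so both runs agree.
    print(run_with_seed(0))
    print(run_with_seed(0))
```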
[Python-Dev] PEP 416: Add a frozendict builtin type
In http://mail.python.org/pipermail/python-dev/2012-February/117113.html Victor Stinner posted: An immutable mapping can be implemented using frozendict::

    class immutabledict(frozendict):
        def __new__(cls, *args, **kw):
            # ensure that all values are immutable
            for key, value in itertools.chain(args, kw.items()):
                if not isinstance(value, (int, float, complex, str, bytes)):
                    hash(value)
            # frozendict ensures that all keys are immutable
            return frozendict.__new__(cls, *args, **kw)

What is the purpose of this? Is it just a hashable frozendict? If it is for security (as some other messages suggest), then I don't think it really helps.

    class Proxy:
        def __eq__(self, other):
            return self.value == other
        def __hash__(self):
            return hash(self.value)

An instance of Proxy is hashable, and the hash is not object.__hash__, but it is still mutable. You're welcome to call that buggy, but a secure sandbox will have to deal with much worse. -jJ
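To make the point concrete: the Proxy class from the post, with an __init__ added so it can be instantiated, passes a hash(value) check like the one in the quoted immutabledict and still mutates. `passes_hash_check` is a hypothetical stand-in for that loop body.

```python
class Proxy:
    """Proxy from the post, plus an __init__ so it can be instantiated."""

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        return self.value == other

    def __hash__(self):
        return hash(self.value)


def passes_hash_check(value):
    """Mimic the quoted check: accept anything that hash() accepts."""
    try:
        hash(value)
    except TypeError:
        return False
    return True


if __name__ == "__main__":
    p = Proxy(1)
    print(passes_hash_check(p))  # True: accepted as "immutable"
    before = hash(p)
    p.value = 2                  # ...yet it mutates freely
    print(before == hash(p))     # False: the hash changed
```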
[Python-Dev] PEP 414 - Unicode Literals for Python 3
In http://mail.python.org/pipermail/python-dev/2012-February/117070.html Vinay Sajip wrote: It's moot, but as I see it: the purpose of PEP 414 is to facilitate a single codebase across 2.x and 3.x. However, it only does this if your 3.x interest is 3.3+
For many people -- particularly those who haven't ported yet -- 3.x will mean 3.3+. There are some who will support 3.2 because it is an LTS release on some distribution, just as there were some who supported Python 1.5 (but not 1.6) long into the 2.x cycle, but I expect them to be the minority. I certainly don't expect 3.2 to remain a primary development target, the way that 2.7 is. IIRC, the only ways to use 3.2 even today are: (a) Make an explicit choice to use something other than the default (b) Download directly and choose 3.x without OS support (c) Use Arch Linux. These are the sort of people who can be expected to upgrade. Now also remember that we're talking specifically about projects that have *not* been ported to 3.x (== no existing users to support), and that won't be ported until 3.2 is already in maintenance mode.
If you also want to or need to support 3.0 - 3.2, it makes your workflow more painful,
Compared to dropping 3.2, yes. Compared to supporting 3.2 today? I don't see how.
because you can't run tests on 2.x or 3.3 and then run them on 3.2 without an intermediate source conversion step - just like the 2to3 step that people find painful when it's part of maintenance workflow, and which in part prompted the PEP in the first place.
So the only differences compared to today are that: (a) Fewer branches are after the auto-conversion. (b) No current branches are after the auto-conversion. (c) The auto-conversion is much more limited in scope. -jJ
[Python-Dev] PEP 414 - Unicode Literals for Python 3
In http://mail.python.org/pipermail/python-dev/2012-February/116953.html Terry J. Reedy wrote: I presume that most 2.6 code has problems other than u'' when attempting to run under 3.x. Why? If you're talking about generic code that has seen minimal changes since 2.0, sure. But I think this request is specifically for projects that are thinking about python 3, but are trying to use a single source base regardless of version. Using an automatic translation step means that python (or at least python 3) would no longer be the actual source code. I've worked with enough generated source code in other languages that it is worth some pain to avoid even a slippery slope. By the time you drop 2.5, the subset language is already pretty good; if I have to write something version-specific, I prefer to treat that as a sign that I am using the wrong approach. -jJ
[Python-Dev] Add a frozendict builtin type
In http://mail.python.org/pipermail/python-dev/2012-February/116955.html Victor Stinner proposed: The blacklist implementation has a major issue: it is still possible to call write methods of the dict class (e.g. dict.set(my_frozendict, key, value)). It is also possible to use ctypes and violate even more invariants. For most purposes, this falls under consenting adults. The whitelist implementation has an issue: frozendict and dict are not compatible, dict is not a subclass of frozendict (and frozendict is not a subclass of dict). And because of Liskov substitutability, they shouldn't be; they should be sibling children of a basedict that doesn't have the mutating methods, but also doesn't *promise* not to mutate. * frozendict values must be immutable, as dict keys Why? That may be useful, but an immutable dict whose values might mutate is also useful; by forcing that choice, it starts to feel too specialized for a builtin. * Add an hash field to the PyDictObject structure That is another indication that it should really be a sibling class; most of the uses I have had for immutable dicts still didn't need hashing. It might be worth adding anyhow, but only to immutable dicts -- not to every instance dict or keywords parameter. * frozendict.__hash__ computes hash(frozenset(self.items())) and caches the result is its private hash attribute Why? hash(frozenset(self.keys())) would still meet the hash contract, but it would be approximately twice as fast, and I can think of only one case where it wouldn't work just as well. (That case is wanting to store a dict of alternative configuration dicts (with no defaulting of values), but ALSO wanting to use the configurations themselves (as opposed to their names) as the dict keys.) -jJ
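A minimal sketch of the sibling-class idea, assuming a pure-Python collections.abc.Mapping subclass rather than the proposed C builtin: no mutating methods, no restriction on values, and a cached hash over the keys only. `FrozenMapping` is a hypothetical name; it is not the PEP 416 frozendict.

```python
from collections.abc import Mapping


class FrozenMapping(Mapping):
    """Hypothetical read-only mapping: a sibling of dict, not a subclass."""

    def __init__(self, *args, **kwargs):
        self._data = dict(*args, **kwargs)
        self._hash = None

    def __getitem__(self, key):
        return self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)

    def __hash__(self):
        # Keys only: equal mappings have equal key sets, so equal hashes.
        if self._hash is None:
            self._hash = hash(frozenset(self._data))
        return self._hash

    def __repr__(self):
        return f"{type(self).__name__}({self._data!r})"
```

Because values never enter the hash, a mapping holding a mutable list is still hashable. The cost is that mappings differing only in their values collide, which is the configuration-dict case mentioned above.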
[Python-Dev] PEP for new dictionary implementation
PEP author Mark Shannon wrote (in http://mail.python.org/pipermail/python-dev/attachments/20120208/05be469a/attachment.txt): ... allows ... (the ``__dict__`` attribute of an object) to share keys with other attribute dictionaries of instances of the same class. Is the same class a deliberate restriction, or just a convenience of implementation? I have often created subclasses (or even families of subclasses) where instances (as opposed to the type) aren't likely to have additional attributes. These would benefit from key-sharing across classes, but I grant that it is a minority use case that isn't worth optimizing if it complicates the implementation. By separating the keys (and hashes) from the values it is possible to share the keys between multiple dictionaries and improve memory use. Have you timed not storing the hash (in the dict) at all, at least for (unicode) str-only dicts? Going to the string for its own cached hash breaks locality a bit more, but saves 1/3 of the memory for combined tables, and may make a big difference for classes that have relatively few instances. Reduction in memory use is directly related to the number of dictionaries with shared keys in existence at any time. These dictionaries are typically half the size of the current dictionary implementation. How do you measure that? The limit for huge N across huge numbers of dicts should be 1/3 (because both hashes and keys are shared); I assume that gets swamped by object overhead in typical small dicts. If a table is split the values in the keys table are ignored, instead the values are held in a separate array. If they're just dead weight, then why not use them to hold indices into the array, so that values arrays only have to be as long as the number of keys, rather than rounding them up to a large-enough power-of-two? (On average, this should save half the slots.) A combined-table dictionary never becomes a split-table dictionary. 
I thought it did (at least temporarily) as part of resizing; are you saying that it will be re-split by the time another thread is allowed to see it, so that it is never observed as combined? Given that this optimization is limited to class instances, I think there should be some explanation of why you didn't just automatically add slots for each variable assigned (by hard-coded name) within a method; the keys would still be stored on the type, and array storage could still be used for the values; the __dict__ slot could initially be a NULL pointer, and instance dicts could be added exactly when they were needed, covering only the oddball keys. I would reword (or at least reformat) the Cons section; at the moment, it looks like there are four separate objections, and it seems to be a bit dismissive towards backwards compatibility. Perhaps something like: While this PEP does not change any documented APIs or invariants, it does break some de facto invariants. C extension modules may be relying on the current physical layout of a dictionary. That said, extensions which rely on internals may already need to be recompiled with each feature release; there are already changes planned for both Unicode (for efficiency) and dicts (for security) that would require authors of these extensions to at least review their code. Because iteration (and repr) order can depend on the order in which keys are inserted, it will be possible to construct instances that iterate in a different order than they would under the current implementation. Note, however, that this will happen very rarely in code which does not deliberately trigger the differences, and that test cases which rely on a particular iteration order will already need to be corrected in order to take advantage of the security enhancements being discussed under hash randomization, or for use with Jython and PyPy.
-jJ
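A pure-Python model, not PEP 412's C code, of a split table that follows the suggestion above: the shared key table maps each key to a dense index, so each instance's values array only grows to the number of keys rather than to a power of two. `SharedKeys`, `SplitDict` and `_MISSING` are hypothetical names.

```python
_MISSING = object()  # marks a slot this instance has not set


class SharedKeys:
    """Key table shared by all instances: attribute name -> value index."""

    def __init__(self):
        self.index = {}

    def index_for(self, key):
        if key not in self.index:
            self.index[key] = len(self.index)  # next dense index
        return self.index[key]


class SplitDict:
    """Per-instance values array, indexed through a shared key table."""

    def __init__(self, shared):
        self.shared = shared
        self.values = []

    def __setitem__(self, key, value):
        i = self.shared.index_for(key)
        if i >= len(self.values):
            # Grow only as far as this index, not to a power of two.
            self.values.extend([_MISSING] * (i + 1 - len(self.values)))
        self.values[i] = value

    def __getitem__(self, key):
        i = self.shared.index.get(key)
        if i is None or i >= len(self.values) or self.values[i] is _MISSING:
            raise KeyError(key)
        return self.values[i]

    def get(self, key, default=None):
        try:
            return self[key]
        except KeyError:
            return default
```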
[Python-Dev] Store timestamps as decimal.Decimal objects
In http://mail.python.org/pipermail/python-dev/2012-February/116073.html Nick Coghlan wrote: Besides, float128 is a bad example - such a type could just be returned directly where we return float64 now. (The only reason we can't do that with Decimal is because we deliberately don't allow implicit conversion of float values to Decimal values in binary operations). If we could really replace float with another type, then there is no reason that type couldn't be a nearly trivial Decimal subclass which simply flips the default value of the (never used by any caller) allow_float parameter to internal function _convert_other. Since Decimal inherits straight from object, this subtype could even be made to inherit from float as well, and to store the lower-precision value there. It could even produce the decimal version lazily, so as to minimize slowdown on cases that do not need the greater precision. Of course, that still doesn't answer questions on whether the higher precision is a good idea ... -jJ
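A sketch of the float-with-lazy-Decimal idea, with one substitution: rather than a Decimal subclass that also inherits from float (which today's C-accelerated decimal module may not permit), this is a plain float subclass that keeps integer seconds and nanoseconds and builds an exact Decimal only when asked. `PreciseTimestamp`, `as_decimal` and `mixing_fails` are hypothetical names.

```python
from decimal import Decimal


class PreciseTimestamp(float):
    """Hypothetical float timestamp that can also produce an exact Decimal."""

    def __new__(cls, seconds, nanoseconds):
        # The float value is the usual lower-precision timestamp.
        self = super().__new__(cls, seconds + nanoseconds / 1_000_000_000)
        self._seconds = seconds
        self._nanoseconds = nanoseconds
        return self

    @property
    def as_decimal(self):
        # Built on demand from integers, so no float rounding is involved.
        return Decimal(self._seconds) + Decimal(self._nanoseconds).scaleb(-9)


def mixing_fails():
    """Decimal deliberately refuses implicit float operands."""
    try:
        Decimal("1") + 1.5
    except TypeError:
        return True
    return False
```

Callers that only need a float never pay for the Decimal; the exact value is still there for those who ask.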
[Python-Dev] plugging the hash attack
In http://mail.python.org/pipermail/python-dev/2012-January/116003.html Benjamin Peterson wrote: 2. It will be off by default in stable releases ... This will prevent code breakage ... 2012/1/27 Steven D'Aprano steve at pearwood.info: ... it will become on by default in some future release? On Fri, Jan 27, 2012, Benjamin Peterson benjamin at python.org wrote: Yes, 3.3. The solution in 3.3 could even be one of the more sophisticated proposals we have today. Brett Cannon (Mon Jan 30) wrote: I think that would be good. And I would even argue we remove support for turning it off to force people to no longer lean on dict ordering as a crutch (in 3.3 obviously). Turning it on by default is fine. Removing the ability to turn it off is bad. If regression tests fail with python 3, the easiest thing to do is just not to migrate to python 3. Some decisions (certainly around unittest, but I think even around hash codes) were settled precisely because tests shouldn't break unless the functionality has really changed. Python 3 isn't yet so dominant as to change that tradeoff. I would go so far as to add an extra step in the porting recommendations; before porting to python 3.x, run your test suite several times with hash randomization turned on; any failures at this point are relying on formally undefined behavior and should be fixed, but can *probably* be fixed just by wrapping the results in sorted. (I would offer a patch to the porting-to-py3 recommendation, except that I couldn't find any not associated specifically with 3.0) -jJ
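The suggested pre-porting step can be sketched as: run the same code under several explicit PYTHONHASHSEED values and see whether its output changes; wrapping the result in sorted() removes the dependence. The two snippets are hypothetical stand-ins for a real test suite, and `distinct_outputs` is a hypothetical helper.

```python
import os
import subprocess
import sys

# Hypothetical stand-ins for a test that prints set contents.
ORDER_DEPENDENT = "print(list({'apple', 'banana', 'cherry', 'durian'}))"
ORDER_INDEPENDENT = "print(sorted({'apple', 'banana', 'cherry', 'durian'}))"


def distinct_outputs(code, seeds):
    """Run code once per hash seed; return the set of distinct outputs."""
    outputs = set()
    for seed in seeds:
        env = dict(os.environ, PYTHONHASHSEED=str(seed))
        result = subprocess.run(
            [sys.executable, "-c", code],
            env=env, capture_output=True, text=True, check=True,
        )
        outputs.add(result.stdout)
    return outputs


if __name__ == "__main__":
    seeds = range(1, 6)
    # More than one distinct output means the code relies on hash order.
    print(len(distinct_outputs(ORDER_DEPENDENT, seeds)))
    print(len(distinct_outputs(ORDER_INDEPENDENT, seeds)))  # always 1
```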
[Python-Dev] Counting collisions for the win
In http://mail.python.org/pipermail/python-dev/2012-January/115715.html Frank Sievertsen wrote: On 20.01.2012 13:08, Victor Stinner wrote: I'm surprised we haven't seen bug reports about it from users of 64-bit Pythons long ago A Python dictionary only uses the lower bits of a hash value. If your dictionary has less than 2**32 items, the dictionary order is exactly the same on 32 and 64 bits system: hash32(str) & mask == hash64(str) & mask for mask = 2**32-1. No, that's not true. Whenever a collision happens, other bits are mixed in very fast. Frank
Bits are mixed in quickly from a denial-of-service standpoint, but Victor is correct from a "Why don't the tests already fail?" standpoint. A dict with 2**12 slots, holding over 2700 entries, will be far larger than most test cases -- particularly those with visible output. In a dict that size, 32-bit and 64-bit machines will still probe the same first, second, third, fourth, fifth, and sixth slots. Even in the rare cases when there are at least 6 collisions, the next slots may well be either the same, or close enough that it doesn't show up in a changed iteration order. -jJ
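The "first six slots" claim can be checked with a small model of the probe sequence. This assumes the 3.2-era recurrence from dictobject.c (i = 5*i + perturb + 1, with perturb starting at the full unsigned hash and shifted right by 5 after each probe), and assumes a 32-bit build's string hash equals the low 32 bits of the 64-bit one. `probe_slots` and `matching_prefix` are hypothetical names.

```python
import random

PERTURB_SHIFT = 5


def probe_slots(h, mask, count, width):
    """First `count` slots probed for hash h in a table of mask + 1 slots."""
    perturb = h & ((1 << width) - 1)  # reinterpret as an unsigned width-bit hash
    i = perturb & mask
    slots = [i]
    while len(slots) < count:
        i = (5 * i + perturb + 1) & mask
        slots.append(i)
        perturb >>= PERTURB_SHIFT
    return slots


def matching_prefix(h64, mask, count=6):
    """Do 32- and 64-bit builds probe the same first `count` slots?"""
    h32 = h64 & 0xFFFFFFFF
    return probe_slots(h64, mask, count, 64) == probe_slots(h32, mask, count, 32)


if __name__ == "__main__":
    rng = random.Random(0)
    hashes = [rng.getrandbits(64) for _ in range(1000)]
    mask = 2**12 - 1
    print(all(matching_prefix(h, mask, 6) for h in hashes))  # True
    print(all(matching_prefix(h, mask, 7) for h in hashes))  # almost surely False
```

With a 2**12-slot table, probe k (counting the initial slot as 0, k >= 1) only reads bits 5(k-1) through 5(k-1)+11 of the hash, so probe 5 is the last one confined to the low 32 bits; probe 6 reads bits 25 to 36.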
[Python-Dev] PEP 410 (Decimal timestamp): the implementation is ready for a review
PEP author Victor asked (in http://mail.python.org/pipermail/python-dev/2012-February/116499.html): Maybe I missed the answer, but how do you handle timestamp with an unspecified starting point like os.times() or time.clock()? Should we leave these function unchanged? If *all* you know is that it is monotonic, then you can't -- but then you don't really have resolution either, as the clock may well speed up or slow down. If you do have resolution, and the only problem is that you don't know what the epoch was, then you can figure that out well enough by (once per type per process) comparing it to something that does have an epoch, like time.gmtime(). -jJ
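A sketch of the once-per-process calibration. It swaps in time.time() for time.gmtime() (same epoch, finer resolution) and uses time.monotonic() as the epoch-less clock. `make_epoch_converter` is a hypothetical helper, and the result is only as good as the assumption that the clock runs at real-time rate.

```python
import time


def make_epoch_converter(clock=time.monotonic):
    """Map readings of an epoch-less clock onto Unix-epoch seconds."""
    # Measure the offset once, by sampling clock() next to time.time().
    offset = time.time() - clock()

    def to_epoch(reading):
        return reading + offset

    return to_epoch


if __name__ == "__main__":
    to_epoch = make_epoch_converter()
    print(to_epoch(time.monotonic()), time.time())
```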
[Python-Dev] PEP 411: Provisional packages in the Python standard library
Eli Bendersky wrote (in http://mail.python.org/pipermail/python-dev/2012-February/116393.html): A package will be marked provisional by including the following paragraph as a note at the top of its documentation page: I really would like some marker available from within Python itself. Use cases: (1) During development, the documentation I normally read first is whatever results from import module; help(module), or possibly dir(module). (2) At BigCorp, there were scheduled times to move as much as possible to the current (or current-1) version. Regardless of policy, full regression test suites don't generally exist. If Python were viewed as part of the infrastructure (rather than as part of a specific application), or if I were responsible for maintaining an internal application built on python, that would be the time to upgrade python -- and I would want an easy way to figure out which applications and libraries I should concentrate on for testing. * Encapsulation of the import state (PEP 368) Wrong PEP number. I'm guessing that you meant 406. -jJ
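A sketch of how an in-Python marker could serve the BigCorp use case: scan the loaded modules for it. The attribute name `__provisional__` is purely hypothetical (PEP 411, as quoted, only specifies a documentation note), as is the helper `provisional_modules`.

```python
import types

# Hypothetical marker name; not defined by PEP 411.
PROVISIONAL_ATTR = "__provisional__"


def provisional_modules(modules):
    """Return the sorted names of modules carrying the hypothetical marker."""
    return sorted(
        getattr(m, "__name__", "?")
        for m in modules
        if getattr(m, PROVISIONAL_ATTR, False)
    )


if __name__ == "__main__":
    import sys

    # List anything already imported that advertises itself as provisional.
    print(provisional_modules(list(sys.modules.values())))
```

The same check would also work interactively, next to help(module) and dir(module).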