[issue46868] Improve performance of math.prod with bignums (and functools.reduce?)
benrg added the comment:

> That memory frugality adds a log2 factor to the runtime.

Your iterative algorithm is exactly the one I had in mind, but it doesn't have the run time that you seem to think. Is that the whole reason for our disagreement? It does only O(1) extra work (not even amortized O(1), really O(1)) for each call of the binary function, and there are exactly n-1 calls. There's a log(n) term (not factor) for expanding the array and skipping NULLs in the final cleanup. The constant factor for it is tiny since the array is so small.

I implemented it in C and benchmarked it against reduce with unvarying arguments (binary | on identical ints), and it's slightly slower around 75% of the time, and slightly faster around 25% of the time, seemingly at random, even in the same test, which I suppose is related to where the allocator decides to put the temporaries. The reordering only needs to have a sliver of a benefit for it to come out on top.

When I said "at the cost of a log factor" in the first message, I meant relative to algorithms like ''.join, not left-reduce.

> I suspect the title of this report referenced "math.prod with bignums" because
> it's the only actual concrete use case you had ;-)

I led with math.prod because its evaluation order isn't documented, so it can be changed (and I guess I should have said explicitly that there is no up-front penalty to changing it beyond tricky cache locality issues). I said "bignums" because I had in mind third-party libraries and the custom classes that I mentioned in my last message. I put ? after reduce because its left-associativity is documented and useful (e.g. with nonassociative functions), so it would have to be extended or a new function added, which is always a hard sell. I also wanted the title to be short. I did the best I could.
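For concreteness, the iterative scheme being discussed can be sketched in Python (the function name and structure here are mine, not from any patch): a small array of pending partial results is maintained like a binary counter, so exactly n-1 calls of the binary function are made in total, each input participates in at most about log2(n) of them, and only O(log n) extra space is used.

```python
from functools import reduce

def reduce_balanced(func, iterable):
    """Reduce with a log-depth tree order, using a binary-counter
    array of pending partial results (O(log n) extra space)."""
    pending = []  # pending[i] holds a partial result of 2**i inputs, or None
    for value in iterable:
        i = 0
        # Carry propagation: merge equal-sized partial results,
        # keeping earlier inputs on the left to preserve sequence order.
        while i < len(pending) and pending[i] is not None:
            value = func(pending[i], value)
            pending[i] = None
            i += 1
        if i == len(pending):
            pending.append(None)
        pending[i] = value
    # Final cleanup: fold the surviving partials, oldest on the left
    # (this is the log(n) term, not a factor).
    survivors = [p for p in pending if p is not None]
    if not survivors:
        raise TypeError("reduce_balanced() of empty iterable")
    return reduce(lambda acc, p: func(p, acc), survivors)
```

Because each combination pairs partial results of equal size, the evaluation tree has logarithmic depth even though the inputs arrive one at a time.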
-- ___ Python tracker <https://bugs.python.org/issue46868> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue46791] Allow os.remove to defer to rmdir
benrg added the comment:

The REMOVE_DIR case reduces to

    return RemoveDirectoryW(path->wide) ? 0 : -1;

so I think there's no reason to combine it with the other two. The REMOVE_BOTH case is

    attrs = GetFileAttributesW(path->wide);
    if (attrs != INVALID_FILE_ATTRIBUTES && (attrs & FILE_ATTRIBUTE_DIRECTORY)) {
        success = RemoveDirectoryW(path->wide);
    }
    else {
        success = DeleteFileW(path->wide);
    }
    return success ? 0 : -1;

For REMOVE_BOTH, I don't see the need for calling GetFileAttributes - couldn't you just try DeleteFile, and if that fails, RemoveDirectory?

-- nosy: +benrg ___ Python tracker <https://bugs.python.org/issue46791> ___
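At the Python level, the "try one, fall back to the other" approach being discussed would look something like the sketch below (the function name is mine, not proposed API). Note the caveat in the comment: the fallback can mask the original error for plain files that fail for other reasons.

```python
import os

def remove_path(path):
    """Remove a file, falling back to rmdir if it turns out to be a directory.

    unlink() on a directory raises IsADirectoryError on Linux,
    PermissionError on Windows and macOS. Caveat: a PermissionError on an
    ordinary file (e.g. read-only on Windows) also triggers the fallback,
    which then fails with a less relevant error.
    """
    try:
        os.unlink(path)
    except (IsADirectoryError, PermissionError):
        os.rmdir(path)
```

This is roughly what callers do by hand today; the issue is about pushing the dispatch down into a single os-level call.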
[issue46868] Improve performance of math.prod with bignums (and functools.reduce?)
benrg added the comment:

Anything that produces output of O(m+n) size in O(m+n) time. Ordered merging operations. Mergesort is a binary ordered merge with log-depth tree reduction, and insertion sort is the same binary operation with linear-depth tree reduction.

Say you're merging sorted lists of intervals, and overlapping intervals need special treatment. It's easier to write a manifestly correct binary merge than an n-way merge, or a filter after heapq.merge that needs to deal with complex interval clusters. I've written that sort of code.

Any situation that resembles a fast path but doesn't qualify for the fast path. For example, there's an optimized factorial function in math, but you need double factorial. Or math.prod is optimized for ints as you suggested, but you have a class that uses ints internally but doesn't pass the CheckExact test. Usually when you miss out on a fast path, you just take a (sometimes large) constant-factor penalty, but here it pushes you into a different complexity class. Or you have a class that uses floats internally and wants to limit accumulated roundoff errors, but the structure of the computation doesn't fit fsum.

> Tree reduction is very popular in the parallel processing world, for obvious
> reasons.

It's the same reason in every case: the log depth limits the accumulation of some bad thing. In parallel computing it's critical-path length, in factorial and mergesort it's size, in fsum it's roundoff error. Log depth helps in a range of situations.

> I've searched in vain for other languages that try to address this "in general"

You've got me there.

> As Guido will tell you, the only original idea in Python is adding an "else"
> clause to loops ;-)

I don't think that's really true, except in the sense that there's nothing new under the sun. No one would use Python if it was just like other languages except slower and with for-else.
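As a toy illustration of the interval-merging case (my own example, not code from any patch): a binary merge of sorted interval lists that coalesces overlaps is easy to get right, and a log-depth pairwise reduction then merges n lists while each interval passes through only about log2(n) merges.

```python
def merge_intervals(a, b):
    """Merge two sorted lists of (start, end) intervals, coalescing overlaps."""
    out = []
    i = j = 0
    while i < len(a) or j < len(b):
        # Take the smaller head interval from either list.
        if j >= len(b) or (i < len(a) and a[i] <= b[j]):
            cur = a[i]; i += 1
        else:
            cur = b[j]; j += 1
        # Coalesce with the last emitted interval if they overlap.
        if out and cur[0] <= out[-1][1]:
            out[-1] = (out[-1][0], max(out[-1][1], cur[1]))
        else:
            out.append(cur)
    return out

def tree_merge(lists):
    """Log-depth pairwise reduction over a list of interval lists."""
    lists = list(lists)
    while len(lists) > 1:
        merged = [merge_intervals(lists[k], lists[k + 1])
                  for k in range(0, len(lists) - 1, 2)]
        if len(lists) % 2:
            merged.append(lists[-1])
        lists = merged
    return lists[0] if lists else []
```

The binary merge carries all the tricky overlap logic; the reduction order is handled separately, which is the division of labor the comment argues for.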
-- ___ Python tracker <https://bugs.python.org/issue46868> ___
[issue46868] Improve performance of math.prod with bignums (and functools.reduce?)
benrg added the comment:

My example used ints, but I was being deliberately vague when I said "bignums". Balanced-tree reduction certainly isn't optimal for ints, and may not be optimal for anything, but it's pretty good for a lot of things. It's the comparison-based sorting of reduction algorithms.

* When the inputs are of similar sizes, it tends to produce intermediate operands of similar sizes, which helps with Karatsuba multiplication (as you said).

* When the inputs are of different sizes, it limits the "damage" any one of them can do, since they only participate in log2(n) operations each.

* It doesn't look at the values, so it works with third-party types that are unknown to the stdlib.

There's always a fallback case, and balanced reduction is good for that. If there's a faster path for ints that looks at their bit lengths, great.

-- ___ Python tracker <https://bugs.python.org/issue46868> ___
[issue46868] Improve performance of math.prod with bignums (and functools.reduce?)
New submission from benrg:

math.prod is slow at multiplying arbitrary-precision numbers. E.g., compare the run time of factorial(50000) to prod(range(2, 50001)). factorial has some special-case optimizations, but the bulk of the difference is due to prod evaluating an expression tree of depth n. If you re-parenthesize the product so that the tree has depth log n, as factorial does, it's much faster. The evaluation order of prod isn't documented, so I think the change would be safe.

factorial uses recursion to build the tree, but it can be done iteratively with no advance knowledge of the total number of nodes. This trick is widely useful for turning a way of combining two things into a way of combining many things, so I wouldn't mind seeing a generic version of it in the standard library, e.g. reduce(..., order='mid'). For many specific cases there are more efficient alternatives (''.join, itertools.chain, set.union, heapq.merge), but it's nice to have a recipe that saves you the trouble of writing special-case algorithms at the cost of a log factor that's often ignorable.

-- components: Library (Lib) messages: 414126 nosy: benrg priority: normal severity: normal status: open title: Improve performance of math.prod with bignums (and functools.reduce?) type: enhancement versions: Python 3.11 ___ Python tracker <https://bugs.python.org/issue46868> ___
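The recursive re-parenthesization described above can be sketched like this (math.prod itself is implemented in C; this is only an illustration of the evaluation order, and the function name is mine):

```python
from math import prod

def prod_tree(xs):
    """Multiply a sequence using a balanced (log-depth) expression tree.

    Same result as math.prod, but intermediate products stay similar in
    size, which is what makes the difference for large ints."""
    n = len(xs)
    if n == 0:
        return 1
    if n == 1:
        return xs[0]
    mid = n // 2
    return prod_tree(xs[:mid]) * prod_tree(xs[mid:])
```

For small inputs the two orders are indistinguishable; the gap appears when the operands grow with each multiplication, since a left-to-right fold keeps multiplying a huge accumulator by a small factor.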
[issue28824] os.environ should preserve the case of the OS keys ?
benrg added the comment:

This issue should be marked dependent on issue 43702 or issue 46862, since fixing it could break third-party code unless they're fixed first.

> Given 'nt.environ' is available without case remapping, I think that's the
> best workaround.

Right now, it's not a good workaround because it contains the environment at the time the interpreter was started, not the current environment. On Posix, _Environ takes a reference to posix.environ and uses it directly, so it does get updated. On Windows, _Environ gets a rewritten dictionary and nt.environ is just a space-wasting attractive nuisance. I think it should be replaced with getenviron() which builds a dict from the environment block each time it's called. But posix.environ is documented (though nt.environ isn't), so maybe not.

> class _CaseInsensitiveString(str):

I think there should be a public class like this. It could be useful to email.message.Message and its clients like urllib. They currently store headers in a list and every operation is O(n).

The semantics are tricky. As written, it violates the requirement that equal objects have equal hashes. To fix that, you'd have to make every CIS compare unequal to every str. At that point, it probably shouldn't be a str subclass, which also has the advantage that it's not limited to strings. It can be a generic compare-by-key wrapper.

-- nosy: +benrg ___ Python tracker <https://bugs.python.org/issue28824> ___
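A minimal sketch of the generic compare-by-key wrapper suggested above (class and attribute names are hypothetical; nothing like this is in the stdlib). Making wrappers never equal to unwrapped values keeps the hash/equality contract intact:

```python
class KeyedView:
    """Wrap a value so equality and hashing go through a key function.

    Two wrappers are equal iff their keys are equal; a wrapper never
    compares equal to an unwrapped value, so equal objects always have
    equal hashes."""
    __slots__ = ("value", "_key")

    def __init__(self, value, key):
        self.value = value
        self._key = key(value)

    def __eq__(self, other):
        if not isinstance(other, KeyedView):
            return NotImplemented
        return self._key == other._key

    def __hash__(self):
        return hash(self._key)
```

Used with str.casefold as the key, this gives O(1) case-insensitive dict lookups for things like header names, which is the email/urllib use case mentioned.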
[issue46862] subprocess makes environment blocks with duplicate keys on Windows
New submission from benrg:

On Windows, if one writes

    env = os.environ.copy()
    env['http_proxy'] = 'whatever'

or either of the documented equivalents ({**os.environ, ...} or (os.environ | {...})), and passes the resulting environment to subprocess.run or subprocess.Popen, the spawned process may get an environment containing both `HTTP_PROXY` and `http_proxy`. Most Win32 software will see only the first one, which contains the unmodified value from os.environ.

Because os.environ forces all keys to upper case, it's possible to work around this by using only upper case keys in the update, but that behavior of os.environ is nonstandard (issue 46861), and subprocess shouldn't depend on it always being true, nor should end users have to.

Since dicts preserve order, the user's (presumable) intent is preserved in the env argument. I think subprocess should do something like

    env = {k.upper(): (k, v) for k, v in env.items()}
    env = dict(env.values())

to discard duplicate keys, keeping only the rightmost one.

-- components: Library (Lib), Windows messages: 414068 nosy: benrg, paul.moore, steve.dower, tim.golden, zach.ware priority: normal severity: normal status: open title: subprocess makes environment blocks with duplicate keys on Windows type: behavior versions: Python 3.10 ___ Python tracker <https://bugs.python.org/issue46862> ___
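The two-step dict trick proposed above can be wrapped up and demonstrated as follows (the helper name is mine). The first comprehension collapses case-colliding keys, keeping the rightmost original spelling and value; the second rebuilds a plain dict:

```python
def dedup_env(env):
    """Collapse keys that differ only in case, keeping the rightmost entry.

    Dicts preserve insertion order, so "rightmost" is well defined: a later
    env['http_proxy'] = ... wins over an inherited 'HTTP_PROXY'."""
    return dict({k.upper(): (k, v) for k, v in env.items()}.values())
```

For each upper-cased key, later items overwrite earlier ones in the intermediate dict, so only the last (key, value) pair per case-insensitive name survives.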
[issue46861] os.environ forces variable names to upper case on Windows
New submission from benrg:

The Windows functions that deal with environment variables are case-insensitive and case-preserving, like most Windows file systems. Many environment variables are conventionally written in all caps, but others aren't, such as `ProgramData`, `PSModulePath`, and `windows_tracing_logfile`.

os.environ forces all environment variable names to upper case when it's constructed. One consequence is that if you pass a modified environment to subprocess.Popen, you end up with variables named `PROGRAMDATA`, etc., even if you didn't modify their values. While this is unlikely to break things since other software normally ignores the case, it's nonstandard behavior, and disconcerting when the affected variable names are shown to human beings.

Here's an example of someone being confused by this: https://stackoverflow.com/questions/19023238/why-python-uppercases-all-environment-variables-in-windows

-- components: Library (Lib), Windows messages: 414064 nosy: benrg, paul.moore, steve.dower, tim.golden, zach.ware priority: normal severity: normal status: open title: os.environ forces variable names to upper case on Windows type: behavior versions: Python 3.10, Python 3.11, Python 3.7, Python 3.8, Python 3.9 ___ Python tracker <https://bugs.python.org/issue46861> ___
[issue46858] mmap constructor resets the file pointer on Windows
New submission from benrg:

On Windows, `mmap.mmap(f.fileno(), ...)` has the undocumented side effect of setting f's file pointer to 0. The responsible code in mmapmodule is this:

    /* Win9x appears to need us seeked to zero */
    lseek(fileno, 0, SEEK_SET);

Win9x is no longer supported, and I'm quite sure that NT doesn't have whatever problem they were trying to fix. I think this code should be deleted, and a regression test added to verify that mmap leaves the file pointer alone on all platforms.

(mmap also maintains its own file pointer, the `pos` field of `mmap_object`, which is initially set to zero. This issue is about the kernel file pointer, not mmap's pointer.)

-- components: IO, Library (Lib), Windows messages: 414039 nosy: benrg, paul.moore, steve.dower, tim.golden, zach.ware priority: normal severity: normal status: open title: mmap constructor resets the file pointer on Windows type: behavior versions: Python 3.10, Python 3.11, Python 3.7, Python 3.8, Python 3.9 ___ Python tracker <https://bugs.python.org/issue46858> ___
[issue46848] Use optimized string search function in mmap.find()
benrg added the comment:

memmem isn't a standard C function, and some libraries don't have it, notably Microsoft's. newlib's memmem seems to be the same as glibc's, but is under a BSD 3-clause license instead of LGPL. An older version of newlib's memmem (prior to 2019-01-01) has the license "Permission to use, copy, modify, and distribute this software is freely granted, provided that this notice is preserved", and is still highly optimized and much better than a naive implementation. Of course, bundling it would no longer be quite so "free".

Old newlib memmem: https://sourceware.org/git/?p=newlib-cygwin.git;a=blob_plain;f=newlib/libc/string/memmem.c;h=25704e467decff5971b34f4189ddfff04ac5fa8e

New newlib memmem: https://sourceware.org/git/?p=newlib-cygwin.git;a=blob_plain;f=newlib/libc/string/memmem.c

Helper file for both: https://sourceware.org/git/?p=newlib-cygwin.git;a=blob_plain;f=newlib/libc/string/str-two-way.h

-- nosy: +benrg ___ Python tracker <https://bugs.python.org/issue46848> ___
[issue46842] py to pyc location mapping with sys.pycache_prefix isn't 1-to-1 on Windows
New submission from benrg:

`importlib._bootstrap_external` contains this comment:

    # We need an absolute path to the py file to avoid the possibility of
    # collisions within sys.pycache_prefix [...]
    # [...] the idea here is that if we get `Foo\Bar`, we first
    # make it absolute (`C:\Somewhere\Foo\Bar`), then make it root-relative
    # (`Somewhere\Foo\Bar`), so we end up placing the bytecode file in an
    # unambiguous `C:\Bytecode\Somewhere\Foo\Bar\`.

The code follows the comment, but doesn't achieve the goal: `C:\Somewhere\Foo\Bar` and `D:\Somewhere\Foo\Bar` collide. There is also no explicit handling of UNC paths, with the result that `\\Somewhere\Foo\Bar` maps to the same location.

I think that on Windows the code should use a mapping like

    C:\Somewhere\Foo\Bar ==> C:\Bytecode\C\Somewhere\Foo\Bar
    D:\Somewhere\Foo\Bar ==> C:\Bytecode\D\Somewhere\Foo\Bar
    \\Somewhere\Foo\Bar  ==> C:\Bytecode\UNC\Somewhere\Foo\Bar

The lack of double-slash prefix handling also matters on Unixy platforms that give it a special meaning. Cygwin is probably affected by this. I don't know whether there are any others.

-- components: Library (Lib), Windows messages: 413878 nosy: benrg, paul.moore, steve.dower, tim.golden, zach.ware priority: normal severity: normal status: open title: py to pyc location mapping with sys.pycache_prefix isn't 1-to-1 on Windows type: behavior versions: Python 3.10, Python 3.11, Python 3.8, Python 3.9 ___ Python tracker <https://bugs.python.org/issue46842> ___
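One way to make the proposed mapping injective can be sketched as below (a toy reconstruction; the function name is mine, and the real logic lives in importlib._bootstrap_external and also handles extensions, optimization tags, etc.). The drive letter, or "UNC" plus the server and share, becomes an explicit path component under the prefix:

```python
import ntpath

def pyc_location(source_path, pycache_prefix):
    """Map an absolute Windows path to a collision-free location under
    pycache_prefix, keeping the drive letter (or 'UNC') as a component."""
    drive, tail = ntpath.splitdrive(source_path)
    if drive.startswith("\\\\"):
        # UNC path: \\server\share\... -> UNC\server\share\...
        parts = ["UNC"] + drive.lstrip("\\").split("\\")
    else:
        # Drive-letter path: C:\... -> C\...
        parts = [drive.rstrip(":")]
    return ntpath.join(pycache_prefix, *parts, tail.lstrip("\\"))
```

Because distinct drives (and UNC roots) map to distinct subtrees, `C:\Somewhere\Foo\Bar` and `D:\Somewhere\Foo\Bar` no longer collide.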
[issue42627] urllib.request.getproxies() misparses Windows registry proxy settings
New submission from benrg:

If `HKCU\Software\Microsoft\Windows\CurrentVersion\Internet Settings\ProxyServer` contains the string `http=host:123;https=host:456;ftp=host:789`, then getproxies_registry() should return

    {'http': 'http://host:123', 'https': 'http://host:456', 'ftp': 'http://host:789'}

for consistency with WinInet and Chromium, but it actually returns

    {'http': 'http://host:123', 'https': 'https://host:456', 'ftp': 'ftp://host:789'}

This bug has existed for a very long time (since Python 2.0.1 if not earlier), but it was exposed recently when urllib3 added support for HTTPS-in-HTTPS proxies in version 1.26. Before that, an `https` prefix on the HTTPS proxy url was silently treated as `http`, accidentally resulting in the correct behavior.

There are additional bugs in the treatment of single-proxy strings (the case when the string contains no `=` character).

The Chromium code for parsing the ProxyServer string can be found here: https://source.chromium.org/chromium/chromium/src/+/refs/tags/89.0.4353.1:net/proxy_resolution/proxy_config.cc;l=86

Below is my attempt at modifying the code from `getproxies_registry` to approximately match Chromium's behavior. I could turn this into a patch, but I'd like feedback on the corner cases first.

    if '=' not in proxyServer and ';' not in proxyServer:
        # Use one setting for all protocols.
        # Chromium treats this as a separate category, and some software
        # uses the ALL_PROXY environment variable for a similar purpose,
        # so arguably this should be 'all={}'.format(proxyServer),
        # but this is more backward compatible.
        proxyServer = 'http={0};https={0};ftp={0}'.format(proxyServer)
    for p in proxyServer.split(';'):
        # Chromium and WinInet are inconsistent in their treatment of
        # invalid strings with the wrong number of = characters. It
        # probably doesn't matter.
        protocol, addresses = p.split('=', 1)
        protocol = protocol.strip()
        # Chromium supports more than one proxy per protocol. I don't
        # know how many clients support the same, but handling it is at
        # least no worse than leaving the commas uninterpreted.
        for address in addresses.split(','):
            if protocol in {'http', 'https', 'ftp', 'socks'}:
                # See if address has a type:// prefix
                if not re.match('(?:[^/:]+)://', address):
                    if protocol == 'socks':
                        # Chromium notes that the correct protocol here
                        # is SOCKS4, but "socks://" is interpreted
                        # as SOCKS5 elsewhere. I don't know whether
                        # prepending socks4:// here would break code.
                        address = 'socks://' + address
                    else:
                        address = 'http://' + address
            # A string like 'http=foo;http=bar' will produce a
            # comma-separated list, while previously 'bar' would
            # override 'foo'. That could potentially break something.
            if protocol not in proxies:
                proxies[protocol] = address
            else:
                proxies[protocol] += ',' + address

-- components: Library (Lib), Windows messages: 382921 nosy: benrg, paul.moore, steve.dower, tim.golden, zach.ware priority: normal severity: normal status: open title: urllib.request.getproxies() misparses Windows registry proxy settings type: behavior versions: Python 3.10, Python 3.6, Python 3.7, Python 3.8, Python 3.9 ___ Python tracker <https://bugs.python.org/issue42627> ___
[issue32612] pathlib.(Pure)WindowsPaths can compare equal but refer to different files
benrg <benrud...@gmail.com> added the comment:

I don't know whether this clarifies it at all, but if x and y are Path objects, and x == y, I would expect also x.exists() == y.exists(), and x.read_bytes() == y.read_bytes(), and so on, unless there is a race condition. I think all programmers expect that if x == y, then they refer to the same file. This is not true currently.

-- ___ Python tracker <rep...@bugs.python.org> <https://bugs.python.org/issue32612> ___
[issue32612] pathlib.(Pure)WindowsPaths can compare equal but refer to different files
benrg <benrud...@gmail.com> added the comment:

This bug is about paths that compare *equal*, but refer to *different* files. I agree that the opposite is not much of a problem (and I said so in the original comment).

The reason I classified this as a security bug is that Python scripts using pathlib on Windows could be vulnerable in certain cases to an attacker that can choose file names. For example, the order in which paths are added to a set or dict could affect which of two files is seen by the script. If different parts of the script add files in different orders - which would normally be safe - the result could be similar to a TOCTTOU race.

I don't disagree that "doing a good enough job of case folding is better than ignoring it." I just think that pathlib should not case-fold strings that Windows filesystems don't.

-- nosy: +pitrou type: enhancement -> security ___ Python tracker <rep...@bugs.python.org> <https://bugs.python.org/issue32612> ___
[issue32612] pathlib.(Pure)WindowsPaths can compare equal but refer to different files
New submission from benrg <benrud...@gmail.com>:

(Pure)WindowsPath uses str.lower to fold paths for comparison and hashing. This doesn't match the case folding of actual Windows file systems. There exist WindowsPath objects that compare and hash equal, but refer to different files. For example, the strings

    '\xdf' (sharp S) and '\u1e9e' (capital sharp S)
    '\u01c7' (LJ) and '\u01c8' (Lj)
    '\u0130' (I with dot) and 'i\u0307' (i followed by combining dot)
    'K' and '\u212a' (Kelvin sign)

are equal under str.lower folding but are distinct file names on NTFS volumes on my Windows 7 machine. There are hundreds of other such pairs.

I think this is very bad. The reverse (paths that compare unequal but refer to the same file) is probably unavoidable and is expected by programmers. But paths that compare equal should never be unequal to the OS.

How to fix this: Unfortunately, there is no correct way to case fold Windows paths. The FAT, NTFS, and exFAT drivers on my machine all have different behavior. (The examples above work on all three, except for 'K' and '\u212a', which are equivalent on FAT volumes.) NTFS stores its case-folding map on each volume in the hidden $UpCase file, so even different NTFS volumes on the same machine can have different behavior. The contents of $UpCase have changed over time as Windows is updated to support new Unicode versions. NTFS and NFS (and possibly WebDAV) also support full case sensitivity when used with Interix/SUA and Cygwin, though this requires disabling system-wide case insensitivity via the registry.

I think that pathlib should either give up on case folding entirely, or should fold very conservatively, treating WCHARs as equivalent only if they're equivalent on all standard file systems on all supported Windows versions. If pathlib folds case at all, there should be a solution for people who need to interoperate with Cygwin or SUA tools on a case-sensitive machine, but I suppose they can just use PosixPath.
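The listed collisions are easy to check directly against str.lower, the folding that the report says (Pure)WindowsPath uses (whether pathlib still folds exactly this way depends on the Python version):

```python
# Each pair is reported as distinct on NTFS, yet the two strings
# collide under str.lower folding.
pairs = [
    ("\xdf", "\u1e9e"),      # sharp S / capital sharp S
    ("\u01c7", "\u01c8"),    # LJ / Lj
    ("\u0130", "i\u0307"),   # I with dot / i + combining dot
    ("K", "\u212a"),         # K / Kelvin sign
]
for a, b in pairs:
    assert a != b and a.lower() == b.lower()
```

Note that '\u0130'.lower() expands to two code points ('i\u0307'), which is also why length-preserving per-character folding can't capture Windows semantics.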
-- components: Library (Lib), Windows messages: 310384 nosy: benrg, paul.moore, steve.dower, tim.golden, zach.ware priority: normal severity: normal status: open title: pathlib.(Pure)WindowsPaths can compare equal but refer to different files type: security versions: Python 3.4, Python 3.5, Python 3.6, Python 3.7 ___ Python tracker <rep...@bugs.python.org> <https://bugs.python.org/issue32612> ___
[issue32525] Empty tuples are not optimized as constant expressions
New submission from benrg <benrud...@gmail.com>:

From 3.3 on, the expression () is compiled to BUILD_TUPLE 0 instead of LOAD_CONST. That's probably fine and I suppose it's slightly more efficient to avoid adding an entry to the constant table. The problem is that BUILD_TUPLE 0 is not treated as a constant for folding purposes, so any otherwise constant expression that contains () ends up compiling into O(n) bytecode instructions instead of 1.

I think this is a bug (rather than an enhancement) because it seems unlikely to be the intended behavior.

In 3.2 and earlier, and in 2.7, the constant-folding behavior is different, and many constant tuples aren't recognized at compile time for reasons unclear to me, but there are definitely cases it will fold that 3.3+ won't. For example, "x in {(), None}" tests a frozenset in 3.2, but builds a set at run time in 3.3+.

-- components: Interpreter Core messages: 309739 nosy: benrg priority: normal severity: normal status: open title: Empty tuples are not optimized as constant expressions type: performance versions: Python 3.4, Python 3.5, Python 3.6, Python 3.7 ___ Python tracker <rep...@bugs.python.org> <https://bugs.python.org/issue32525> ___
[issue16701] Docs missing the behavior of += (in-place add) for lists.
benrg added the comment:

> AFAIK in C x += 1 is equivalent to x++, and both are semantically more about
> incrementing (mutating) the value of x than about creating a new value that
> gets assigned to x. Likewise it seems to me more natural to interpret x += y
> as "add the value of y to the object x" than "add x and y together and save
> the result in x".

Look, it's very simple: in C, ++x and x += 1 and x = x + 1 all mean the same thing. You can argue about how to describe the thing that they do, but there's only one thing to describe. Likewise, in every other language that borrows the op= syntax from C, it is a shorthand for the expanded version with the bare operator. As far as I know, Python is the only exception. If you know of another exception please say so.

>     >>> x = ([],)
>     >>> x[0] += [1]
>     Traceback (most recent call last):
>       File "<stdin>", line 1, in <module>
>     TypeError: 'tuple' object does not support item assignment
>     >>> x
>     ([1],)

I actually knew about this. It's an understandably difficult corner case, since the exception is raised after __iadd__ returns, so there's no chance for it to roll back its changes. At least, I thought it was a difficult corner case back when I thought the in-place update was a mere optimization. But if += really means .extend() on lists, this should not raise an exception at all. In fact there's no sense in having __iadd__ return a value that gets assigned anywhere, since mutable objects always mutate and return themselves and immutable objects don't define __iadd__. It looks like the interface was designed with the standard semantics in mind but the implementation did something different, leaving a vestigial assignment that's always a no-op. What a disaster.

-- ___ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue16701> ___
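The behaviors under dispute in this thread can be pinned down in a few lines: += on a list mutates the shared object while the expanded form rebinds, and in the tuple corner case the list is extended *before* the item assignment raises.

```python
# In-place form: extends the list that x and y both reference.
x = y = [1, 2]
x += [3]
assert y == [1, 2, 3]

# Expanded form: builds a new list and rebinds only x.
x = y = [1, 2]
x = x + [3]
assert y == [1, 2]

# Tuple corner case: list.__iadd__ mutates the list and returns it,
# then the tuple item assignment of that same object raises.
t = ([],)
try:
    t[0] += [1]
except TypeError:
    pass
assert t == ([1],)  # mutation already happened despite the exception
```

This is exactly the "vestigial assignment" point: for a list, the value `__iadd__` returns is the list itself, so the subsequent assignment changes nothing and can only fail.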
[issue16701] Docs missing the behavior of += (in-place add) for lists.
benrg added the comment:

This is bizarre:

    Python 3.3.0 (v3.3.0:bd8afb90ebf2, Sep 29 2012, 10:55:48) [MSC v.1600 32 bit (Intel)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>> x = y = [1, 2]
    >>> x += [3]
    >>> y
    [1, 2, 3]
    >>> x = y = {1, 2}
    >>> x -= {2}
    >>> y
    {1}

Since when has this been standard behavior? The documentation says:

> An augmented assignment expression like x += 1 can be rewritten as x = x + 1
> to achieve a similar, but not exactly equal effect. In the augmented version,
> x is only evaluated once. Also, when possible, the actual operation is
> performed in-place, meaning that rather than creating a new object and
> assigning that to the target, the old object is modified instead.

What is "when possible" supposed to mean here? I always thought it meant "when there are known to be no other references to the object". If op= is always destructive on lists and sets, then "when possible" needs to be changed to "always" and a prominent warning added, like WARNING: X OP= EXPR DOES NOT BEHAVE EVEN REMOTELY LIKE X = X OP EXPR IN PYTHON WHEN X IS A MUTABLE OBJECT, IN STARK CONTRAST TO EVERY OTHER LANGUAGE WITH A SIMILAR SYNTAX.

-- nosy: +benrg ___ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue16701> ___
[issue16701] Docs missing the behavior of += (in-place add) for lists.
benrg added the comment:

> As far as I know Ezio is correct, "when possible" means "when the target is
> mutable". The documentation should probably be clarified on that point.

Yes, it needs to be made very, very clear in the documentation. As I said, I'm not aware of any other language in which var op= expr does not mean the same thing as var = var op expr. I'm actually amazed that neither of you recognize the weirdness of this behavior (and even more amazed that GvR apparently didn't). I'm an experienced professional programmer, and I dutifully read the official documentation cover to cover when I started programming in Python, and I interpreted this paragraph wrongly, because I interpreted it in the only way that made sense given the meaning of these operators in every other language that has them. Python is designed to be unsurprising; constructs generally mean what it looks like they mean. You need to explain this unique feature of Python in terms so clear that it can't possibly be mistaken for the behavior of all of the other languages.

> Remember, Python names refer to pointers to objects, they are not variables
> in the sense that other languages have variables.

That has nothing to do with this. Yes, in Python (and Java and Javascript and many other languages) all objects live on the heap, local variables are not first-class objects, and var = expr is a special form. That doesn't change the fact that in all of those other languages, var += expr means var = var + expr. In C++ local variables are first-class objects and var += expr means var.operator+=(expr) or operator+=(var, expr), and this normally modifies the thing on the left in a way that's visible through references. But in C++, var = var + expr also modifies the thing on the left, in the same way.

In Python and Java and Javascript and ..., var = value never visibly mutates any heap object, and neither does var = var + value (in any library that defines a sane + operator), and therefore neither should var += value (again, in any sanely designed library). And it doesn't. Except in Python.

-- ___ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue16701> ___
[issue11427] ctypes from_buffer no longer accepts bytes
benrg <benrud...@gmail.com> added the comment:

I am still interested in this for the same reason I was interested in this in the first place; nothing has changed. I guess I will reiterate, and try to expand.

The problem is that ctypes tries to enforce const correctness (inconsistently), but it has no way to declare its objects as const, and assumes they are all non-const. This is the worst possible combination. It would make sense to have no notion of const and never try to enforce it (like K&R C), and it would make sense to have constness and try to enforce it (like C++). But the present situation is like a compiler that treats string literals as const char[], and doesn't allow casting away const, but doesn't allow the use of the const keyword in user code, so there's no way to pass a string literal as a function argument because the necessary type can't be expressed. Instead you have to copy the literal to a char array and pass that.

Calling C functions is inherently unsafe. Calling C functions could always corrupt the Python heap or otherwise screw around with state that the Python environment thinks that it owns. All I want is some means to assert that my code is not going to do that, as a way of getting around the limited type system that can't provide those sorts of guarantees.

More broadly, what I want from ctypes is a way to do the sorts of things that I can do in C. If I can call foo(bar + 17) in C, I want to be able to make that call in Python. I wasn't using (c_type*N).from_buffer because I wanted to. I was using it after wasting literally hours trying to find some other way to get ctypes to agree to pass a pointer into the buffer of a Python object to an external function (which was not going to alter it, I promise). This should be easy; instead it's nearly impossible. I don't want to wrestle with random un-overridable attempts to enforce correctness when calling a language where that can never be enforced. I just want to call my C function.

I'm pretty sure that I verified that this code worked in 3.1.3 before opening this bug, but it's been a while. I could try to reproduce it, but I think this functionality should be present regardless. You can call it a feature request instead of a regression if you want.

-- resolution: invalid -> status: closed -> open ___ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue11427> ___
[issue11430] can't change the sizeof a Structure that doesn't own its buffer
New submission from benrg benrud...@gmail.com: A struct that is resized knows its new size; among other things, the new size is returned by sizeof. But it seems to be impossible to increase the size of a struct that doesn't own its buffer. resize fails in this case. This would not be too bad if the size were merely informational, but unfortunately ctypes also uses it for bounds checking. This makes from_buffer and from_address a lot less useful than they would otherwise be. I think that either resize should succeed when the underlying buffer is already large enough (or unconditionally in the case of from_address), or else from_buffer and from_address should take a size argument, or possibly both. -- assignee: theller components: ctypes messages: 130237 nosy: benrg, theller priority: normal severity: normal status: open title: can't change the sizeof a Structure that doesn't own its buffer type: feature request versions: Python 3.2 ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue11430 ___
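A sketch of the behavior being described, as it stands in current ctypes (the Rec layout is made up for illustration):

```python
import ctypes

class Rec(ctypes.Structure):
    _fields_ = [("n", ctypes.c_int), ("data", ctypes.c_char * 1)]

# An instance that owns its memory can grow, and sizeof() on the
# instance reports the new size.
r = Rec()
ctypes.resize(r, ctypes.sizeof(Rec) + 15)
assert ctypes.sizeof(r) == ctypes.sizeof(Rec) + 15

# An instance mapped over someone else's buffer cannot grow, even
# though the underlying bytearray has plenty of room left.
ba = bytearray(64)
view = Rec.from_buffer(ba)
try:
    ctypes.resize(view, 32)
except ValueError:
    pass  # resize refuses because view does not own its memory
```

The second half is the limitation the report is about: the 64-byte bytearray could accommodate the larger view, but resize rejects it anyway.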
[issue11428] with statement looks up __exit__ incorrectly
benrg benrud...@gmail.com added the comment: But when I translate my example according to PEP 343, it works (i.e., doesn't raise an exception) in 3.2, and PEP 343 says "[t]he details of the above translation are intended to prescribe the exact semantics." So I think that at least one of PEP 343, the evaluation of mgr.__exit__, or the evaluation of "with mgr: pass" must be broken, though I'm no longer sure which. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue11428 ___
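To make the discrepancy concrete, here is a side-by-side sketch of the two lookups (a minimal reconstruction of the class from the original report; the except clause also catches TypeError because later CPython releases changed the exception type):

```python
class MakeContextHandler:
    def __init__(self, enter, exit):
        self.__enter__ = enter
        self.__exit__ = exit

mgr = MakeContextHandler(lambda: None, lambda *exc: None)

# The literal PEP 343 translation looks the methods up on the
# instance, and succeeds:
exit = mgr.__exit__        # not calling it yet
value = mgr.__enter__()
exit(None, None, None)

# The compiled `with` statement looks them up on type(mgr) instead,
# and fails (AttributeError in 3.2, TypeError in later releases):
try:
    with mgr:
        pass
    with_worked = True
except (AttributeError, TypeError):
    with_worked = False
```

So the hand translation and the bytecode genuinely disagree, which is the inconsistency the comment is pointing at.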
[issue2405] Drop w9xpopen and all dependencies
benrg benrud...@gmail.com added the comment:

w9xpopen is currently used on NT. The patch to use it on NT was checked in by bquinlan in August of 2001 (http://mail.python.org/pipermail/patches/2001-August/005719.html). He claims that it is necessary on NT, even though (a) the cited knowledge base article explicitly states that it is not necessary on NT, and (b) the knowledge base article has now been deleted from Microsoft's web site, indicating that they consider it no longer relevant (they have deleted all Win9x-specific documentation, but Win2K-specific documentation is still there).

I just don't believe that the problem solved by w9xpopen has ever existed in any version of NT. There is no credible evidence for it. There are any number of other reasons why introducing an intermediate process might have hidden some unrelated bug or otherwise resolved the problem the Win9x-to-Win2K upgraders were having a decade ago. I think that the use of w9xpopen on NT is a bug, not an obsolete feature, and there's no reason it couldn't be gone in 3.2.1.

Also, I suppose it doesn't matter any more, but the logic for deciding when to run w9xpopen should be (target executable is 16-bit), which can be determined by reading the file header. Right now the test is (shell is True and (running on Win9x or the command processor is named command.com)). Every part of this test is deficient. Python programs can spawn 16-bit processes (including the shell itself) without using shell=True. Not every Win9x shell is 16-bit; 32-bit shells like cmd.exe work fine. And there are 16-bit shells not named command.com, such as 4DOS.

-- nosy: +benrg ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue2405 ___
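The file-header check is cheap. A rough sketch of the kind of test meant here (my own heuristic, not existing CPython code): an MZ executable whose "new header" signature is PE is 32- or 64-bit; an NE/LE signature or a bare DOS stub means 16-bit.

```python
import struct

def is_16bit_executable(path):
    """Rough heuristic: PE executables are 32/64-bit; NE/LE
    executables and bare DOS MZ files are treated as 16-bit."""
    with open(path, "rb") as f:
        header = f.read(0x40)
        if len(header) < 0x40 or header[:2] != b"MZ":
            return False  # not an MZ-style executable at all
        # e_lfanew at offset 0x3C points at the "new" header, if any.
        e_lfanew = struct.unpack_from("<I", header, 0x3C)[0]
        f.seek(e_lfanew)
        sig = f.read(2)
    if sig == b"PE":
        return False  # Portable Executable: 32/64-bit
    return True       # NE/LE signature or bare DOS stub: 16-bit
```

A bare DOS MZ file may carry a garbage e_lfanew, so a production version would validate the offset; this sketch errs toward calling such files 16-bit, which is the safe direction for the w9xpopen decision.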
[issue2405] Drop w9xpopen and all dependencies
benrg benrud...@gmail.com added the comment:

It turns out that, on 32-bit Windows 7 with COMSPEC pointing to command.com, platform.popen('dir').read() works with w9xpopen and fails (no output) without it. But the reason has nothing to do with the old Win9x problem. It's because subprocess always quotes the command line after /c, which command.com doesn't understand. But w9xpopen decodes the command line (in the runtime, before main() is called) and then re-encodes it, this time quoting only arguments with spaces in them. command.com then gets /c dir, and is happy. It would be interesting if this was the bug that led to w9xpopen being used on NT for the last ten years.

There are layers upon layers of brokenness here. w9xpopen should not be messing with the command line in the first place; it should call GetCommandLine() and pass the result untouched to CreateProcess (after skipping its own name). It certainly should not be using the argv[] contents, which are parsed with an algorithm that doesn't match the one used by cmd.exe. The decode-reencode round trip munges the command line in hard-to-understand ways. Additionally, subprocess.py doesn't quote the shell name (my usual shell is C:\Program Files\TCCLE12\TCC.EXE), and it converts an argument list to a string using list2cmdline even when shell=True, which makes little sense to me.

I think w9xpopen should be deleted and forgotten. It was written badly and has apparently been largely ignored for 10+ years. There is probably a better solution to the problem even on Win9x, such as a worker thread in the Python process that waits on both the process and pipe handles. But also, all of the shell=True code in subprocess.py needs to be rethought from the ground up. I don't think it should exist at all; far better to provide convenient support in subprocess for setting up pipelines, and require people to explicitly invoke the shell for the few remaining legitimate use cases.
That should probably be discussed elsewhere, though. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue2405 ___
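The quoting behavior described above is easy to reproduce from Python (list2cmdline is an internal subprocess helper, but it is the function that performs this quoting on Windows; this illustrates the mechanism, not the exact 3.2-era code path):

```python
import subprocess

# Arguments without spaces pass through unquoted; an argument
# containing a space gets wrapped in double quotes. cmd.exe strips
# the quotes, command.com does not understand them.
assert subprocess.list2cmdline(["/c", "dir"]) == "/c dir"
assert subprocess.list2cmdline(["/c", "dir /w"]) == '/c "dir /w"'
```

So whenever the shell command after /c contains a space, command.com receives quote characters it cannot parse, which matches the observed failure.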
[issue11427] ctypes from_buffer no longer accepts bytes
New submission from benrg benrud...@gmail.com: In Python 3.1.3, (c_char*5).from_buffer(b'abcde') worked. In 3.2 it fails with TypeError: expected an object with a writable buffer interface. This seems to represent a significant decrease in the functionality of ctypes, since, if I understand correctly, it has no notion of a const array or a const char. I used from_buffer with a bytes argument in 3.1 and it was far from obvious how to port to 3.2 without introducing expensive copying. I understand the motivation behind requiring a writable buffer, but I think it's a bad idea. If you take this to its logical conclusion, it should not be possible to pass bytes or str values directly to C functions, since there's no way to be sure they won't write through the pointer. -- assignee: theller components: ctypes messages: 130229 nosy: benrg, theller priority: normal severity: normal status: open title: ctypes from_buffer no longer accepts bytes type: behavior versions: Python 3.2 ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue11427 ___
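A sketch of the before/after behavior against current ctypes, with the copying fallback that the report calls expensive:

```python
import ctypes

buf = b"abcde"
arr_t = ctypes.c_char * 5

try:
    arr = arr_t.from_buffer(buf)       # accepted in 3.1.3
except TypeError:
    # 3.2+ rejects the read-only bytes buffer, so the data must be
    # copied into ctypes-owned memory instead.
    arr = arr_t.from_buffer_copy(buf)

assert arr.raw == b"abcde"
```

from_buffer_copy accepts read-only sources precisely because it copies, which is the cost the report is objecting to for large buffers.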
[issue11428] with statement looks up __exit__ incorrectly
New submission from benrg benrud...@gmail.com:

class MakeContextHandler:
    def __init__(self, enter, exit):
        self.__enter__ = enter
        self.__exit__ = exit

with MakeContextHandler(lambda: None, lambda *e: None):
    pass

In 3.1.3 this worked; in 3.2 it raises AttributeError('__exit__'), which appears to be a bug.

-- components: Interpreter Core messages: 130231 nosy: benrg priority: normal severity: normal status: open title: with statement looks up __exit__ incorrectly type: behavior versions: Python 3.2 ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue11428 ___
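For anyone hitting the same change: a workaround sketch that keeps the constructor interface but moves the special methods to the class, where 3.2+ looks them up, and forwards to the stored callables:

```python
class MakeContextHandler:
    def __init__(self, enter, exit):
        # Special methods are looked up on the type in 3.2+, so
        # per-instance attributes are never seen by `with`; store the
        # callables and forward from class-level methods instead.
        self._enter = enter
        self._exit = exit

    def __enter__(self):
        return self._enter()

    def __exit__(self, *exc):
        return self._exit(*exc)

with MakeContextHandler(lambda: None, lambda *e: None):
    pass
```

This runs on both the old and new lookup rules, at the cost of one extra call per __enter__/__exit__.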
[issue8847] crash appending list and namedtuple
benrg benrud...@gmail.com added the comment: The bug is still present in 3.2. -- versions: +Python 3.2 ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue8847 ___
[issue11429] ctypes is highly eclectic in its raw-memory support
New submission from benrg benrud...@gmail.com:

ctypes accepts bytes objects as arguments to C functions, but not bytearray objects. It has its own array types but seems to be unaware of array.array. It doesn't even understand memoryview objects. I think that all of these types should be passable to C code.

Additionally, while passing a pointer to a bytes value to a C function is easy, it's remarkably difficult to pass that same pointer with an offset added to it. I first tried byref(buf, offset), but byref wouldn't accept bytes. Then I tried addressof(buf), but that didn't work either, even though ctypes is clearly able to obtain this address when it has to. After banging my head against the wall for longer than I care to think about, I finally came up with something like byref((c_char*length).from_buffer(buf), offset). But that broke in 3.2. After wasting even more time, I came up with addressof(cast(buf, POINTER(c_char)).contents) + offset.

This is nuts. There should be a simple and documented way to do this. My first preference would be for the byref method, since it was the first thing I tried and would have saved me the most time. Ideally both byref and addressof should work for bytes objects as they do for ctypes arrays (and also for bytearray, memoryview, etc.).

-- assignee: theller components: ctypes messages: 130236 nosy: benrg, theller priority: normal severity: normal status: open title: ctypes is highly eclectic in its raw-memory support type: feature request versions: Python 3.2 ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue11429 ___
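For reference, the closest working equivalent of byref(buf, offset) in current ctypes uses a bytearray so that the mapping is legal; a sketch (illustrating the mechanism, not endorsing the verbosity):

```python
import ctypes

ba = bytearray(b"abcdefgh")

# Map a ctypes array over the bytearray's memory (no copy) ...
arr = (ctypes.c_char * len(ba)).from_buffer(ba)

# ... then byref with an offset yields a pointer to ba[3] that can be
# passed directly as a C function argument.
ref = ctypes.byref(arr, 3)

# Taking the same address manually shows it really lands on ba[3]:
addr = ctypes.addressof(arr) + 3
assert ctypes.string_at(addr, 2) == b"de"

# Writes through that address are visible in the original bytearray.
ctypes.memmove(addr, b"XY", 2)
assert ba[3:5] == b"XY"
```

Note that while arr exists, the bytearray cannot be resized (its buffer is exported), which is a reasonable safety property but another detail the caller has to know.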
[issue5391] mmap: read_byte/write_byte and object type
benrg benrud...@gmail.com added the comment: With this patch, read_byte returns an integer in the range -128 to 127 instead of 0 to 255 if char is signed. Python 3.1.2 (r312:79149, Mar 21 2010, 00:41:52) [MSC v.1500 32 bit (Intel)] on win32 is affected by this. I think it is a bug. The test code would fail if the test string contained any bytes outside the ASCII range. (Did this really go unnoticed for a year and a half? I noticed it the moment I first tried to use read_byte (which was just now). I see that read_byte was broken in a different way in 3.0. Does anybody actually use it?) -- nosy: +benrg ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue5391 ___
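A quick check of the intended (unsigned) behavior on a fixed interpreter, using an anonymous mapping so it runs anywhere:

```python
import mmap

m = mmap.mmap(-1, 4)                       # anonymous 4-byte mapping
m.write(bytes([0x00, 0x7F, 0x80, 0xFF]))   # include bytes >= 0x80
m.seek(0)
vals = [m.read_byte() for _ in range(4)]
m.close()

# With the signedness bug, 0x80 and 0xFF would come back negative.
assert vals == [0x00, 0x7F, 0x80, 0xFF]
```

The bytes above 0x7F are exactly the cases the ASCII-only test string could never exercise.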
[issue8847] crash appending list and namedtuple
New submission from benrg benrud...@gmail.com:

c:\>python
Python 3.1.2 (r312:79149, Mar 21 2010, 00:41:52) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from collections import namedtuple
>>> foo = namedtuple('foo', '')
>>> [1] + foo()

At this point the interpreter crashes. Also happens when foo has named arguments, and in batch scripts. foo() + [1] throws a TypeError as expected. [] + foo() returns (). The immediate cause of the crash is the CALL instruction at 1E031D5A in python31.dll jumping into uninitialized memory.

-- components: Interpreter Core, Library (Lib), Windows messages: 106695 nosy: benrg priority: normal severity: normal status: open title: crash appending list and namedtuple type: crash versions: Python 3.1 ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue8847 ___
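For comparison, a sketch of the intended behavior on an interpreter without the crash: list.__add__ rejects any non-list, and tuple supplies no __radd__, so both expressions must raise TypeError rather than crashing (or, in the empty-list case, silently returning ()).

```python
from collections import namedtuple

foo = namedtuple('foo', '')

# On a fixed interpreter, both [1] + foo() and [] + foo() raise
# TypeError; neither crashes nor returns ().
for lhs in ([1], []):
    try:
        lhs + foo()
        raised = False
    except TypeError:
        raised = True
    assert raised
```

That the empty-list case returned () in the affected build is a strong hint the dispatch was jumping through a corrupted slot rather than following the normal __add__/__radd__ protocol.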