[issue46868] Improve performance of math.prod with bignums (and functools.reduce?)

2022-02-28 Thread benrg


benrg  added the comment:

>That memory frugality adds a log2 factor to the runtime.

Your iterative algorithm is exactly the one I had in mind, but it doesn't have 
the run time that you seem to think it does. Is that the whole reason for our 
disagreement?

It does only O(1) extra work (not even amortized O(1), really O(1)) for each 
call of the binary function, and there are exactly n-1 calls. There's a log(n) 
term (not factor) for expanding the array and skipping NULLs in the final 
cleanup. The constant factor for it is tiny since the array is so small.
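
For concreteness, here's a minimal Python sketch of that iterative scheme (the
naming is mine; the version I benchmarked was C). Slot i of a small stack holds
a pending partial result covering 2**i leaves, so absorbing each new item is
the same carry propagation as incrementing a binary counter:

    from functools import reduce

    def reduce_balanced(f, iterable):
        stack = []  # stack[i] is a partial result of 2**i leaves, or None
        for x in iterable:
            i = 0
            # Carry propagation: merge equal-sized partial results.
            while i < len(stack) and stack[i] is not None:
                x = f(stack[i], x)  # stack[i] holds the earlier elements
                stack[i] = None
                i += 1
            if i == len(stack):
                stack.append(None)
            stack[i] = x
        # Final cleanup: fold the O(log n) leftovers, earliest first.
        parts = [p for p in reversed(stack) if p is not None]
        if not parts:
            raise TypeError('reduce_balanced() of empty iterable')
        return reduce(f, parts)

For n items this makes exactly n-1 calls of f, the same as left-reduce, plus
the O(log n) cleanup at the end.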

I implemented it in C and benchmarked it against reduce with unvarying 
arguments (binary | on identical ints), and it's slightly slower around 75% of 
the time, and slightly faster around 25% of the time, seemingly at random, even 
in the same test, which I suppose is related to where the allocator decides to 
put the temporaries. The reordering only needs to have a sliver of a benefit 
for it to come out on top.

When I said "at the cost of a log factor" in the first message, I meant 
relative to algorithms like ''.join, not left-reduce.


>I suspect the title of this report referenced "math.prod with bignums" because 
>it's the only actual concrete use case you had ;-)

I led with math.prod because its evaluation order isn't documented, so it can 
be changed (and I guess I should have said explicitly that there is no up-front 
penalty to changing it beyond tricky cache locality issues). I said "bignums" 
because I had in mind third-party libraries and the custom classes that I 
mentioned in my last message. I put a "?" after reduce because its 
left-associativity is documented and useful (e.g. with nonassociative 
functions), so it would have to be extended or a new function added, which is 
always a hard sell. I also wanted the title to be short. I did the best I could.

--

Python tracker <https://bugs.python.org/issue46868>



[issue46791] Allow os.remove to defer to rmdir

2022-02-28 Thread benrg


benrg  added the comment:

The REMOVE_DIR case reduces to

    return RemoveDirectoryW(path->wide) ? 0 : -1;

so I think there's no reason to combine it with the other two.

The REMOVE_BOTH case is

    attrs = GetFileAttributesW(path->wide);

    if (attrs != INVALID_FILE_ATTRIBUTES && (attrs & FILE_ATTRIBUTE_DIRECTORY)) {
        success = RemoveDirectoryW(path->wide);
    } else {
        success = DeleteFileW(path->wide);
    }

    return success ? 0 : -1;

For REMOVE_BOTH, I don't see the need to call GetFileAttributes - couldn't 
you just try DeleteFile and, if that fails, RemoveDirectory?
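
In os-module terms, the suggestion is roughly this sketch (the exception
mapping is my assumption about how a failed DeleteFileW surfaces):

    import os

    def remove_either(path):
        # Try the file delete first; fall back to the directory delete.
        try:
            os.unlink(path)
        except (IsADirectoryError, PermissionError):
            # On Windows, DeleteFileW on a directory fails with
            # ERROR_ACCESS_DENIED, which surfaces as PermissionError.
            os.rmdir(path)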

--
nosy: +benrg

Python tracker <https://bugs.python.org/issue46791>



[issue46868] Improve performance of math.prod with bignums (and functools.reduce?)

2022-02-28 Thread benrg


benrg  added the comment:

Anything that produces output of O(m+n) size in O(m+n) time. Ordered merging 
operations. Mergesort is a binary ordered merge with log-depth tree reduction, 
and insertion sort is the same binary operation with linear-depth tree 
reduction. Say you're merging sorted lists of intervals, and overlapping 
intervals need special treatment. It's easier to write a manifestly correct 
binary merge than an n-way merge, or a filter after heapq.merge that needs to 
deal with complex interval clusters. I've written that sort of code.
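
To make the interval case concrete, here's a toy binary merge (my own
illustration, not code from any real project); a balanced reduce lifts it to
an n-way merge while keeping each input in only log2(n) merges:

    def merge_intervals(xs, ys):
        # Two-way ordered merge of sorted (start, end) lists, coalescing
        # overlapping intervals; output of size O(m+n) in O(m+n) time.
        out = []
        i = j = 0
        while i < len(xs) or j < len(ys):
            if j == len(ys) or (i < len(xs) and xs[i] <= ys[j]):
                cur = xs[i]; i += 1
            else:
                cur = ys[j]; j += 1
            if out and cur[0] <= out[-1][1]:
                # Overlaps (or touches) the previous interval: coalesce.
                out[-1] = (out[-1][0], max(out[-1][1], cur[1]))
            else:
                out.append(cur)
        return out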

Any situation that resembles a fast path but doesn't qualify for the fast path. 
For example, there's an optimized factorial function in math, but you need 
double factorial. Or math.prod is optimized for ints as you suggested, but you 
have a class that uses ints internally but doesn't pass the CheckExact test. 
Usually when you miss out on a fast path, you just take a (sometimes large) 
constant-factor penalty, but here it pushes you into a different complexity 
class. Or you have a class that uses floats internally and wants to limit 
accumulated roundoff errors, but the structure of the computation doesn't fit 
fsum.

>Tree reduction is very popular in the parallel processing world, for obvious 
>reasons.

It's the same reason in every case: the log depth limits the accumulation of 
some bad thing. In parallel computing it's critical-path length, in factorial 
and mergesort it's size, in fsum it's roundoff error. Log depth helps in a 
range of situations.

>I've searched in vain for other languages that try to address this "in general"

You've got me there.

>As Guido will tell you, the only original idea in Python is adding an "else" 
>clause to loops ;-)

I don't think that's really true, except in the sense that there's nothing new 
under the sun. No one would use Python if it was just like other languages 
except slower and with for-else.

--

Python tracker <https://bugs.python.org/issue46868>



[issue46868] Improve performance of math.prod with bignums (and functools.reduce?)

2022-02-27 Thread benrg


benrg  added the comment:

My example used ints, but I was being deliberately vague when I said "bignums". 
Balanced-tree reduction certainly isn't optimal for ints, and may not be 
optimal for anything, but it's pretty good for a lot of things. It's the 
comparison-based sorting of reduction algorithms.

* When the inputs are of similar sizes, it tends to produce intermediate 
operands of similar sizes, which helps with Karatsuba multiplication (as you 
said).

* When the inputs are of different sizes, it limits the "damage" any one of 
them can do, since they only participate in log2(n) operations each.

* It doesn't look at the values, so it works with third-party types that are 
unknown to the stdlib.

There's always a fallback case, and balanced reduction is good for that. If 
there's a faster path for ints that looks at their bit lengths, great.

--

Python tracker <https://bugs.python.org/issue46868>



[issue46868] Improve performance of math.prod with bignums (and functools.reduce?)

2022-02-26 Thread benrg


New submission from benrg :

math.prod is slow at multiplying arbitrary-precision numbers. E.g., compare the 
run time of factorial(50000) to prod(range(2, 50001)).

factorial has some special-case optimizations, but the bulk of the difference 
is due to prod evaluating an expression tree of depth n. If you re-parenthesize 
the product so that the tree has depth log n, as factorial does, it's much 
faster. The evaluation order of prod isn't documented, so I think the change 
would be safe.

factorial uses recursion to build the tree, but it can be done iteratively with 
no advance knowledge of the total number of nodes.
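
For comparison, the recursive shape is just this sketch (the base-case
threshold is arbitrary; math.factorial's real implementation is in C):

    from math import prod

    def prod_tree(seq, lo=0, hi=None):
        # Multiply seq[lo:hi] with a balanced tree of depth ~log2(n).
        if hi is None:
            hi = len(seq)
        if hi - lo <= 8:
            return prod(seq[lo:hi])
        mid = (lo + hi) // 2
        return prod_tree(seq, lo, mid) * prod_tree(seq, mid, hi)

    # e.g. prod_tree(range(2, 50001)) computes 50000! much faster than a
    # left-to-right reduce over the same range.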

This trick is widely useful for turning a way of combining two things into a 
way of combining many things, so I wouldn't mind seeing a generic version of it 
in the standard library, e.g. reduce(..., order='mid'). For many specific cases 
there are more efficient alternatives (''.join, itertools.chain, set.union, 
heapq.merge), but it's nice to have a recipe that saves you the trouble of 
writing special-case algorithms at the cost of a log factor that's often 
ignorable.

--
components: Library (Lib)
messages: 414126
nosy: benrg
priority: normal
severity: normal
status: open
title: Improve performance of math.prod with bignums (and functools.reduce?)
type: enhancement
versions: Python 3.11

Python tracker <https://bugs.python.org/issue46868>



[issue28824] os.environ should preserve the case of the OS keys ?

2022-02-26 Thread benrg


benrg  added the comment:

This issue should be marked dependent on issue 43702 or issue 46862, since 
fixing it could break third-party code unless they're fixed first.


> Given 'nt.environ' is available without case remapping, I think that's the 
> best workaround.

Right now, it's not a good workaround because it contains the environment at 
the time the interpreter was started, not the current environment. On Posix, 
_Environ takes a reference to posix.environ and uses it directly, so it does 
get updated. On Windows, _Environ gets a rewritten dictionary and nt.environ is 
just a space-wasting attractive nuisance. I think it should be replaced with 
getenviron() which builds a dict from the environment block each time it's 
called. But posix.environ is documented (though nt.environ isn't), so maybe not.


> class _CaseInsensitiveString(str):

I think there should be a public class like this. It could be useful to 
email.message.Message and its clients like urllib. They currently store headers 
in a list and every operation is O(n).

The semantics are tricky. As written, it violates the requirement that equal 
objects have equal hashes. To fix that, you'd have to make every CIS compare 
unequal to every str. At that point, it probably shouldn't be a str subclass, 
which also has the advantage that it's not limited to strings. It can be a 
generic compare-by-key wrapper.
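
A minimal sketch of what I mean (entirely illustrative; nothing like this
exists in the stdlib today):

    class ByKey:
        """Wrap a value so equality and hashing go through key(value)."""
        __slots__ = ('value', '_key')

        def __init__(self, value, key):
            self.value = value
            self._key = key(value)

        def __eq__(self, other):
            return type(other) is ByKey and self._key == other._key

        def __hash__(self):
            return hash(self._key)

    # e.g. headers[ByKey('Content-Type', str.casefold)]; a ByKey is never
    # equal to a plain str, so the equal-objects-equal-hashes rule holds.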

--
nosy: +benrg

Python tracker <https://bugs.python.org/issue28824>



[issue46862] subprocess makes environment blocks with duplicate keys on Windows

2022-02-25 Thread benrg


New submission from benrg :

On Windows, if one writes

env = os.environ.copy()
env['http_proxy'] = 'whatever'

or either of the documented equivalents ({**os.environ, ...} or (os.environ | 
{...})), and passes the resulting environment to subprocess.run or 
subprocess.Popen, the spawned process may get an environment containing both 
`HTTP_PROXY` and `http_proxy`. Most Win32 software will see only the first one, 
which contains the unmodified value from os.environ.

Because os.environ forces all keys to upper case, it's possible to work around 
this by using only upper case keys in the update, but that behavior of 
os.environ is nonstandard (issue 46861), and subprocess shouldn't depend on it 
always being true, nor should end users have to.

Since dicts preserve order, the user's (presumable) intent is preserved in the 
env argument. I think subprocess should do something like

env = {k.upper(): (k, v) for k, v in env.items()}
env = dict(env.values())

to discard duplicate keys, keeping only the rightmost one.
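
For instance (illustrative values):

    >>> env = {'HTTP_PROXY': 'old', 'http_proxy': 'whatever'}
    >>> env = {k.upper(): (k, v) for k, v in env.items()}
    >>> dict(env.values())
    {'http_proxy': 'whatever'}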

--
components: Library (Lib), Windows
messages: 414068
nosy: benrg, paul.moore, steve.dower, tim.golden, zach.ware
priority: normal
severity: normal
status: open
title: subprocess makes environment blocks with duplicate keys on Windows
type: behavior
versions: Python 3.10

Python tracker <https://bugs.python.org/issue46862>



[issue46861] os.environ forces variable names to upper case on Windows

2022-02-25 Thread benrg


New submission from benrg :

The Windows functions that deal with environment variables are case-insensitive 
and case-preserving, like most Windows file systems. Many environment variables 
are conventionally written in all caps, but others aren't, such as 
`ProgramData`, `PSModulePath`, and `windows_tracing_logfile`.

os.environ forces all environment variable names to upper case when it's 
constructed. One consequence is that if you pass a modified environment to 
subprocess.Popen, you end up with variables named `PROGRAMDATA`, etc., even if 
you didn't modify their values.

While this is unlikely to break things since other software normally ignores 
the case, it's nonstandard behavior, and disconcerting when the affected 
variable names are shown to human beings.

Here's an example of someone being confused by this: 
https://stackoverflow.com/questions/19023238/why-python-uppercases-all-environment-variables-in-windows

--
components: Library (Lib), Windows
messages: 414064
nosy: benrg, paul.moore, steve.dower, tim.golden, zach.ware
priority: normal
severity: normal
status: open
title: os.environ forces variable names to upper case on Windows
type: behavior
versions: Python 3.10, Python 3.11, Python 3.7, Python 3.8, Python 3.9

Python tracker <https://bugs.python.org/issue46861>



[issue46858] mmap constructor resets the file pointer on Windows

2022-02-25 Thread benrg


New submission from benrg :

On Windows, `mmap.mmap(f.fileno(), ...)` has the undocumented side effect of 
setting f's file pointer to 0.

The responsible code in mmapmodule is this:

/* Win9x appears to need us seeked to zero */
lseek(fileno, 0, SEEK_SET);

Win9x is no longer supported, and I'm quite sure that NT doesn't have whatever 
problem they were trying to fix. I think this code should be deleted, and a 
regression test added to verify that mmap leaves the file pointer alone on all 
platforms.

(mmap also maintains its own file pointer, the `pos` field of `mmap_object`, 
which is initially set to zero. This issue is about the kernel file pointer, 
not mmap's pointer.)

--
components: IO, Library (Lib), Windows
messages: 414039
nosy: benrg, paul.moore, steve.dower, tim.golden, zach.ware
priority: normal
severity: normal
status: open
title: mmap constructor resets the file pointer on Windows
type: behavior
versions: Python 3.10, Python 3.11, Python 3.7, Python 3.8, Python 3.9

Python tracker <https://bugs.python.org/issue46858>



[issue46848] Use optimized string search function in mmap.find()

2022-02-24 Thread benrg


benrg  added the comment:

memmem isn't a standard C function, and some libraries don't have it, notably 
Microsoft's.

newlib's memmem seems to be the same as glibc's, but is under a BSD 3-clause 
license instead of LGPL. An older version of newlib's memmem (prior to 
2019-01-01) has the license "Permission to use, copy, modify, and distribute 
this software is freely granted, provided that this notice is preserved", and 
is still highly optimized and much better than a naive implementation.

Of course, bundling it would no longer be quite so "free".

Old newlib memmem: 
https://sourceware.org/git/?p=newlib-cygwin.git;a=blob_plain;f=newlib/libc/string/memmem.c;h=25704e467decff5971b34f4189ddfff04ac5fa8e

New newlib memmem: 
https://sourceware.org/git/?p=newlib-cygwin.git;a=blob_plain;f=newlib/libc/string/memmem.c

Helper file for both: 
https://sourceware.org/git/?p=newlib-cygwin.git;a=blob_plain;f=newlib/libc/string/str-two-way.h

--
nosy: +benrg

Python tracker <https://bugs.python.org/issue46848>



[issue46842] py to pyc location mapping with sys.pycache_prefix isn't 1-to-1 on Windows

2022-02-23 Thread benrg


New submission from benrg :

`importlib._bootstrap_external` contains this comment:

# We need an absolute path to the py file to avoid the possibility of
# collisions within sys.pycache_prefix [...]
# [...] the idea here is that if we get `Foo\Bar`, we first
# make it absolute (`C:\Somewhere\Foo\Bar`), then make it root-relative
# (`Somewhere\Foo\Bar`), so we end up placing the bytecode file in an
# unambiguous `C:\Bytecode\Somewhere\Foo\Bar\`.

The code follows the comment, but doesn't achieve the goal: 
`C:\Somewhere\Foo\Bar` and `D:\Somewhere\Foo\Bar` collide. There is also no 
explicit handling of UNC paths, with the result that `\\Somewhere\Foo\Bar` maps 
to the same location.

I think that on Windows the code should use a mapping like

C:\Somewhere\Foo\Bar  ==>  C:\Bytecode\C\Somewhere\Foo\Bar
D:\Somewhere\Foo\Bar  ==>  C:\Bytecode\D\Somewhere\Foo\Bar
\\Somewhere\Foo\Bar   ==>  C:\Bytecode\UNC\Somewhere\Foo\Bar
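
A sketch of that mapping in terms of ntpath (the UNC tag is from the table
above; the function name and structure are mine):

    import ntpath

    def bytecode_location(py_path, pycache_prefix):
        # C:\a\b -> <prefix>\C\a\b, \\server\share\a -> <prefix>\UNC\server\share\a
        drive, tail = ntpath.splitdrive(ntpath.abspath(py_path))
        if drive.startswith('\\\\'):
            tag = 'UNC' + drive[1:]  # '\\server\share' -> 'UNC\server\share'
        else:
            tag = drive.rstrip(':')  # 'C:' -> 'C'
        return ntpath.join(pycache_prefix, tag + tail)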

The lack of double-slash prefix handling also matters on Unixy platforms that 
give it a special meaning. Cygwin is probably affected by this. I don't know 
whether there are any others.

--
components: Library (Lib), Windows
messages: 413878
nosy: benrg, paul.moore, steve.dower, tim.golden, zach.ware
priority: normal
severity: normal
status: open
title: py to pyc location mapping with sys.pycache_prefix isn't 1-to-1 on Windows
type: behavior
versions: Python 3.10, Python 3.11, Python 3.8, Python 3.9

Python tracker <https://bugs.python.org/issue46842>



[issue42627] urllib.request.getproxies() misparses Windows registry proxy settings

2020-12-12 Thread benrg


New submission from benrg :

If `HKCU\Software\Microsoft\Windows\CurrentVersion\Internet 
Settings\ProxyServer` contains the string 
`http=host:123;https=host:456;ftp=host:789`, then getproxies_registry() should 
return

{'http': 'http://host:123', 'https': 'http://host:456', 'ftp': 'http://host:789'}

for consistency with WinInet and Chromium, but it actually returns

{'http': 'http://host:123', 'https': 'https://host:456', 'ftp': 'ftp://host:789'}

This bug has existed for a very long time (since Python 2.0.1 if not earlier), 
but it was exposed recently when urllib3 added support for HTTPS-in-HTTPS 
proxies in version 1.26. Before that, an `https` prefix on the HTTPS proxy url 
was silently treated as `http`, accidentally resulting in the correct behavior.

There are additional bugs in the treatment of single-proxy strings (the case 
when the string contains no `=` character).

The Chromium code for parsing the ProxyServer string can be found here: 
https://source.chromium.org/chromium/chromium/src/+/refs/tags/89.0.4353.1:net/proxy_resolution/proxy_config.cc;l=86

Below is my attempt at modifying the code from `getproxies_registry` to 
approximately match Chromium's behavior. I could turn this into a patch, but 
I'd like feedback on the corner cases first.

if '=' not in proxyServer and ';' not in proxyServer:
# Use one setting for all protocols.
# Chromium treats this as a separate category, and some software
# uses the ALL_PROXY environment variable for a similar purpose,
# so arguably this should be 'all={}'.format(proxyServer),
# but this is more backward compatible.
proxyServer = 'http={0};https={0};ftp={0}'.format(proxyServer)

for p in proxyServer.split(';'):
# Chromium and WinInet are inconsistent in their treatment of
# invalid strings with the wrong number of = characters. It
# probably doesn't matter.
protocol, addresses = p.split('=', 1)
protocol = protocol.strip()

# Chromium supports more than one proxy per protocol. I don't
# know how many clients support the same, but handling it is at
# least no worse than leaving the commas uninterpreted.
for address in addresses.split(','):
if protocol in {'http', 'https', 'ftp', 'socks'}:
# See if address has a type:// prefix
if not re.match('(?:[^/:]+)://', address):
if protocol == 'socks':
# Chromium notes that the correct protocol here
# is SOCKS4, but "socks://" is interpreted
# as SOCKS5 elsewhere. I don't know whether
# prepending socks4:// here would break code.
address = 'socks://' + address
else:
address = 'http://' + address

# A string like 'http=foo;http=bar' will produce a
# comma-separated list, while previously 'bar' would
# override 'foo'. That could potentially break something.
if protocol not in proxies:
proxies[protocol] = address
else:
proxies[protocol] += ',' + address

--
components: Library (Lib), Windows
messages: 382921
nosy: benrg, paul.moore, steve.dower, tim.golden, zach.ware
priority: normal
severity: normal
status: open
title: urllib.request.getproxies() misparses Windows registry proxy settings
type: behavior
versions: Python 3.10, Python 3.6, Python 3.7, Python 3.8, Python 3.9

Python tracker <https://bugs.python.org/issue42627>



[issue32612] pathlib.(Pure)WindowsPaths can compare equal but refer to different files

2018-01-23 Thread benrg

benrg <benrud...@gmail.com> added the comment:

I don't know whether this clarifies it at all, but if x and y are Path objects, 
and x == y, I would expect also x.exists() == y.exists(), and x.read_bytes() == 
y.read_bytes(), and so on, unless there is a race condition. I think all 
programmers expect that if x == y, then they refer to the same file. This is 
not true currently.

--

Python tracker <https://bugs.python.org/issue32612>



[issue32612] pathlib.(Pure)WindowsPaths can compare equal but refer to different files

2018-01-22 Thread benrg

benrg <benrud...@gmail.com> added the comment:

This bug is about paths that compare *equal*, but refer to *different* files. I 
agree that the opposite is not much of a problem (and I said so in the original 
comment).

The reason I classified this as a security bug is that Python scripts using 
pathlib on Windows could be vulnerable in certain cases to an attacker that can 
choose file names. For example, the order in which paths are added to a set or 
dict could affect which of two files is seen by the script. If different parts 
of the script add files in different orders - which would normally be safe - 
the result could be similar to a TOCTTOU race.

I don't disagree that "doing a good enough job of case folding is better than 
ignoring it." I just think that pathlib should not case-fold strings that 
Windows filesystems don't.

--
nosy: +pitrou
type: enhancement -> security

Python tracker <https://bugs.python.org/issue32612>



[issue32612] pathlib.(Pure)WindowsPaths can compare equal but refer to different files

2018-01-21 Thread benrg

New submission from benrg <benrud...@gmail.com>:

(Pure)WindowsPath uses str.lower to fold paths for comparison and hashing. This 
doesn't match the case folding of actual Windows file systems. There exist 
WindowsPath objects that compare and hash equal, but refer to different files. 
For example, the strings

  '\xdf' (sharp S) and '\u1e9e' (capital sharp S)
  '\u01c7' (LJ) and '\u01c8' (Lj)
  '\u0130' (I with dot) and 'i\u0307' (i followed by combining dot)
  'K' and '\u212a' (Kelvin sign)

are equal under str.lower folding but are distinct file names on NTFS volumes 
on my Windows 7 machine. There are hundreds of other such pairs.
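
For example, on the affected versions (which fold with str.lower):

    >>> from pathlib import PureWindowsPath
    >>> PureWindowsPath('K') == PureWindowsPath('\u212a')
    True
    >>> hash(PureWindowsPath('K')) == hash(PureWindowsPath('\u212a'))
    True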

I think this is very bad. The reverse (paths that compare unequal but refer to 
the same file) is probably unavoidable and is expected by programmers. But 
paths that compare equal should never be unequal to the OS.

How to fix this:

Unfortunately, there is no correct way to case fold Windows paths. The FAT, 
NTFS, and exFAT drivers on my machine all have different behavior. (The 
examples above work on all three, except for 'K' and '\u212a', which are 
equivalent on FAT volumes.) NTFS stores its case-folding map on each volume in 
the hidden $UpCase file, so even different NTFS volumes on the same machine can 
have different behavior. The contents of $UpCase have changed over time as 
Windows is updated to support new Unicode versions. NTFS and NFS (and possibly 
WebDAV) also support full case sensitivity when used with Interix/SUA and 
Cygwin, though this requires disabling system-wide case insensitivity via the 
registry.

I think that pathlib should either give up on case folding entirely, or should 
fold very conservatively, treating WCHARs as equivalent only if they're 
equivalent on all standard file systems on all supported Windows versions.

If pathlib folds case at all, there should be a solution for people who need to 
interoperate with Cygwin or SUA tools on a case-sensitive machine, but I 
suppose they can just use PosixPath.

--
components: Library (Lib), Windows
messages: 310384
nosy: benrg, paul.moore, steve.dower, tim.golden, zach.ware
priority: normal
severity: normal
status: open
title: pathlib.(Pure)WindowsPaths can compare equal but refer to different files
type: security
versions: Python 3.4, Python 3.5, Python 3.6, Python 3.7

Python tracker <https://bugs.python.org/issue32612>



[issue32525] Empty tuples are not optimized as constant expressions

2018-01-09 Thread benrg

New submission from benrg <benrud...@gmail.com>:

From 3.3 on, the expression () is compiled to BUILD_TUPLE 0 instead of 
LOAD_CONST. That's probably fine, and I suppose it's slightly more efficient to 
avoid adding an entry to the constant table.

The problem is that BUILD_TUPLE 0 is not treated as a constant for folding 
purposes, so any otherwise constant expression that contain () ends up 
compiling into O(n) bytecode instructions instead of 1. I think this is a bug 
(rather than an enhancement) because it seems unlikely to be the intended 
behavior.

In 3.2 and earlier, and in 2.7, the constant-folding behavior is different, and 
many constant tuples aren't recognized at compile time for reasons unclear to 
me, but there are definitely cases it will fold that 3.3+ won't. For example, 
"x in {(), None}" tests a frozenset in 3.2, but builds a set at run time in 
3.3+.
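
The difference is easy to see with dis (I won't paste the bytecode here, since
it varies by version):

    import dis

    # On 3.3+, the () inside this otherwise-constant expression compiles to
    # BUILD_TUPLE 0, which defeats folding of the enclosing frozenset; on
    # 3.2 the whole {(), None} became a single frozenset constant.
    dis.dis(compile('x in {(), None}', '<test>', 'eval'))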

--
components: Interpreter Core
messages: 309739
nosy: benrg
priority: normal
severity: normal
status: open
title: Empty tuples are not optimized as constant expressions
type: performance
versions: Python 3.4, Python 3.5, Python 3.6, Python 3.7

Python tracker <https://bugs.python.org/issue32525>



[issue16701] Docs missing the behavior of += (in-place add) for lists.

2013-01-24 Thread benrg

benrg added the comment:

> AFAIK in C "x += 1" is equivalent to "x++", and both are semantically
> more about incrementing (mutating) the value of x than about creating a
> new value that gets assigned to x. Likewise it seems to me more natural
> to interpret "x += y" as "add the value of y to the object x" than "add
> x and y together and save the result in x".

Look, it's very simple: in C, "++x" and "x += 1" and "x = x + 1" all mean the 
same thing. You can argue about how to describe the thing that they do, but there's 
only one thing to describe. Likewise, in every other language that borrows the 
op= syntax from C, it is a shorthand for the expanded version with the bare 
operator. As far as I know, Python is the only exception. If you know of 
another exception please say so.


>>> x = ([],)
>>> x[0] += [1]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment
>>> x
([1],)

I actually knew about this. It's an understandably difficult corner case, since 
the exception is raised after __iadd__ returns, so there's no chance for it to 
roll back its changes.

At least, I thought it was a difficult corner case back when I thought the 
in-place update was a mere optimization. But if += really means .extend() on 
lists, this should not raise an exception at all. In fact there's no sense in 
having __iadd__ return a value that gets assigned anywhere, since mutable 
objects always mutate and return themselves and immutable objects don't define 
__iadd__. It looks like the interface was designed with the standard semantics 
in mind but the implementation did something different, leaving a vestigial 
assignment that's always a no-op. What a disaster.

--

Python tracker <http://bugs.python.org/issue16701>



[issue16701] Docs missing the behavior of += (in-place add) for lists.

2013-01-23 Thread benrg

benrg added the comment:

This is bizarre:

Python 3.3.0 (v3.3.0:bd8afb90ebf2, Sep 29 2012, 10:55:48) [MSC v.1600 32 bit 
(Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> x = y = [1, 2]
>>> x += [3]
>>> y
[1, 2, 3]
>>> x = y = {1, 2}
>>> x -= {2}
>>> y
{1}


Since when has this been standard behavior? The documentation says:

"An augmented assignment expression like x += 1 can be rewritten as x = x + 1 
to achieve a similar, but not exactly equal effect. In the augmented version, x 
is only evaluated once. Also, when possible, the actual operation is performed 
in-place, meaning that rather than creating a new object and assigning that to 
the target, the old object is modified instead."

What is "when possible" supposed to mean here? I always thought it meant when 
there are known to be no other references to the object. If op= is always 
destructive on lists and sets, then "when possible" needs to be changed to 
"always" and a prominent warning added, like "WARNING: X OP= EXPR DOES NOT 
BEHAVE EVEN REMOTELY LIKE X = X OP EXPR IN PYTHON WHEN X IS A MUTABLE OBJECT, 
IN STARK CONTRAST TO EVERY OTHER LANGUAGE WITH A SIMILAR SYNTAX."

--
nosy: +benrg

Python tracker <http://bugs.python.org/issue16701>



[issue16701] Docs missing the behavior of += (in-place add) for lists.

2013-01-23 Thread benrg

benrg added the comment:

> As far as I know Ezio is correct, "when possible" means "when the target is 
> mutable". The documentation should probably be clarified on that point.

Yes, it needs to be made very, very clear in the documentation. As I said, I'm 
not aware of any other language in which "var op= expr" does not mean the same 
thing as "var = var op expr". I'm actually amazed that neither of you recognize 
the weirdness of this behavior (and even more amazed that GvR apparently 
didn't). I'm an experienced professional programmer, and I dutifully read the 
official documentation cover to cover when I started programming in Python, and 
I interpreted this paragraph wrongly, because I interpreted it in the only way 
that made sense given the meaning of these operators in every other language 
that has them. Python is designed to be unsurprising; constructs generally mean 
what it looks like they mean. You need to explain this unique feature of Python 
in terms so clear that it can't possibly be mistaken for the behavior of all of 
the other languages.

> Remember, Python names refer to pointers to objects, they are not variables 
> in the sense that other languages have variables.

That has nothing to do with this. Yes, in Python (and Java and Javascript and 
many other languages) all objects live on the heap, local variables are not 
first-class objects, and "var = expr" is a special form. That doesn't change 
the fact that in all of those other languages, "var += expr" means "var = var 
+ expr". In C++ local variables are first-class objects and "var += expr" 
means "var.operator+=(expr)" or "operator+=(var, expr)", and this normally 
modifies the thing on the left in a way that's visible through references. But 
in C++, "var = var + expr" also modifies the thing on the left, in the same 
way.

In Python and Java and Javascript and ..., "var = value" never visibly mutates 
any heap object, and neither does "var = var + value" (in any library that 
defines a sane + operator), and therefore neither should "var += value" 
(again, in any sanely designed library). And it doesn't. Except in Python.

--

Python tracker <http://bugs.python.org/issue16701>



[issue11427] ctypes from_buffer no longer accepts bytes

2011-11-29 Thread benrg

benrg benrud...@gmail.com added the comment:

I am still interested in this for the same reason I was interested in this in 
the first place; nothing has changed. I guess I will reiterate, and try to 
expand.

The problem is that ctypes tries to enforce const correctness (inconsistently), 
but it has no way to declare its objects as const, and assumes they are all 
non-const. This is the worst possible combination. It would make sense to have 
no notion of const and never try to enforce it (like K&R C), and it would make 
sense to have constness and try to enforce it (like C++). But the present 
situation is like a compiler that treats string literals as const char[], and 
doesn't allow casting away const, but doesn't allow the use of the const 
keyword in user code, so there's no way to pass a string literal as a function 
argument because the necessary type can't be expressed. Instead you have to 
copy the literal to a char array and pass that.

Calling C functions is inherently unsafe. Calling C functions could always 
corrupt the Python heap or otherwise screw around with state that the Python 
environment thinks that it owns. All I want is some means to assert that my 
code is not going to do that, as a way of getting around the limited type 
system that can't provide those sorts of guarantees. More broadly, what I want 
from ctypes is a way to do the sorts of things that I can do in C. If I can 
call foo(bar + 17) in C, I want to be able to make that call in Python.

I wasn't using (c_type*N).from_buffer because I wanted to. I was using it after 
wasting literally hours trying to find some other way to get ctypes to agree to 
pass a pointer into the buffer of a Python object to an external function 
(which was not going to alter it, I promise). This should be easy; instead it's 
nearly impossible. I don't want to wrestle with random un-overridable attempts 
to enforce correctness when calling a language where that can never be 
enforced. I just want to call my C function.
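
For the record, the workarounds I know of in 3.2, assuming the function only
reads the buffer; both copy, which is exactly the expense I was trying to
avoid:

    import ctypes

    data = b'abcde'

    # Copy into a writable buffer that from_buffer will accept:
    arr = (ctypes.c_char * len(data)).from_buffer(bytearray(data))

    # Or let from_buffer_copy do the copying; it accepts read-only buffers:
    arr2 = (ctypes.c_char * len(data)).from_buffer_copy(data)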

I'm pretty sure that I verified that this code worked in 3.1.3 before opening 
this bug, but it's been a while. I could try to reproduce it, but I think this 
functionality should be present regardless. You can call it a feature request 
instead of a regression if you want.

--
resolution: invalid -> 
status: closed -> open

Python tracker <http://bugs.python.org/issue11427>



[issue11430] can't change the sizeof a Structure that doesn't own its buffer

2011-03-07 Thread benrg

New submission from benrg benrud...@gmail.com:

A struct that is resized knows its new size; among other things, the new size 
is returned by sizeof.

But it seems to be impossible to increase the size of a struct that doesn't own 
its buffer. resize fails in this case. This would not be too bad if the size 
were merely informational, but unfortunately ctypes also uses it for bounds 
checking. This makes from_buffer and from_address a lot less useful than they 
would otherwise be.

I think that either resize should succeed when the underlying buffer is already 
large enough (or unconditionally in the case of from_address), or else 
from_buffer and from_address should take a size argument, or possibly both.
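
A minimal reproduction (error message paraphrased):

    import ctypes

    class S(ctypes.Structure):
        _fields_ = [('x', ctypes.c_int)]

    buf = bytearray(64)   # far more room than sizeof(S)
    s = S.from_buffer(buf)
    ctypes.resize(s, 32)  # ValueError: the object doesn't own its memory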

--
assignee: theller
components: ctypes
messages: 130237
nosy: benrg, theller
priority: normal
severity: normal
status: open
title: can't change the sizeof a Structure that doesn't own its buffer
type: feature request
versions: Python 3.2

Python tracker <http://bugs.python.org/issue11430>



[issue11428] with statement looks up __exit__ incorrectly

2011-03-07 Thread benrg

benrg benrud...@gmail.com added the comment:

But when I translate my example according to PEP 343, it works (i.e., doesn't 
raise an exception) in 3.2, and PEP 343 says "[t]he details of the above 
translation are intended to prescribe the exact semantics". So I think that at 
least one of PEP 343, the evaluation of mgr.__exit__, or the evaluation of 
"with mgr: pass" must be broken, though I'm no longer sure which.

--

Python tracker <http://bugs.python.org/issue11428>



[issue2405] Drop w9xpopen and all dependencies

2011-03-07 Thread benrg

benrg benrud...@gmail.com added the comment:

w9xpopen is currently used on NT. The patch to use it on NT was checked in by 
bquinlan in August of 2001 
(http://mail.python.org/pipermail/patches/2001-August/005719.html). He claims 
that it is necessary in NT, even though (a) the cited knowledge base article 
explicitly states that it is not necessary on NT, and (b) the knowledge base 
article has now been deleted from Microsoft's web site, indicating that they 
consider it no longer relevant (they have deleted all Win9x-specific 
documentation, but Win2K-specific documentation is still there).

I just don't believe that the problem solved by w9xpopen has ever existed in 
any version of NT. There is no credible evidence for it. There are any number 
of other reasons why introducing an intermediate process might have hidden some 
unrelated bug or otherwise resolved the problem the Win9x-Win2K upgraders were 
having a decade ago. I think that the use of w9xpopen in NT is a bug, not an 
obsolete feature, and there's no reason it couldn't be gone in 3.2.1.

Also, I suppose it doesn't matter any more, but the logic for deciding when to 
run w9xpopen should be (target executable is 16-bit), which can be determined 
by reading the file header. Right now the test is (shell is True and (running 
on win9x or the command processor is named command.com)). Every part of this 
test is deficient. Python programs can spawn 16-bit processes (including the 
shell itself) without using shell=True. Not every win9x shell is 16-bit; 32-bit 
shells like cmd.exe work fine. And there are 16-bit shells not named 
command.com, such as 4DOS.

--
nosy: +benrg

Python tracker <http://bugs.python.org/issue2405>



[issue2405] Drop w9xpopen and all dependencies

2011-03-07 Thread benrg

benrg benrud...@gmail.com added the comment:

It turns out that, on Windows 7 32-bit with COMSPEC pointing to command.com, 
platform.popen('dir').read() works with w9xpopen and fails (no output) without 
it.

But the reason has nothing to do with the old Win9x problem. It's because 
subprocess always quotes the command line after /c, which command.com doesn't 
understand. But w9xpopen decodes the command line (in the runtime, before 
main() is called) and then reencodes it, this time quoting only arguments with 
spaces in them. Command.com then gets "/c dir", and is happy. It would be 
interesting if this was the bug that led to w9xpopen being used in NT for the 
last ten years.

There are layers upon layers of brokenness here. w9xpopen should not be messing 
with the command line in the first place; it should call GetCommandLine() and 
pass the result untouched to CreateProcess (after skipping its own name). It 
certainly should not be using the argv[] contents, which are parsed with an 
algorithm that doesn't match the one used by cmd.exe. The decoding-encoding 
process munges the command line in hard-to-understand ways. Additionally, 
subprocess.py doesn't quote the shell name (my usual shell is C:\Program 
Files\TCCLE12\TCC.EXE), and it converts an argument list to a string using 
list2cmdline even when shell=True, which makes little sense to me.

I think w9xpopen should be deleted and forgotten. It was written badly and has 
apparently been largely ignored for 10+ years. There is probably a better 
solution to the problem even on Win9x, such as a worker thread in the Python 
process that waits on both the process and pipe handles. But also, all of the 
shell=True code in subprocess.py needs to be rethought from the ground up. I 
don't think it should exist at all; far better to provide convenient support in 
subprocess for setting up pipelines, and require people to explicitly invoke 
the shell for the few remaining legitimate use cases. That should probably be 
discussed elsewhere, though.

--

Python tracker <http://bugs.python.org/issue2405>



[issue11427] ctypes from_buffer no longer accepts bytes

2011-03-06 Thread benrg

New submission from benrg benrud...@gmail.com:

In Python 3.1.3, (c_char*5).from_buffer(b'abcde') worked. In 3.2 it fails with 
"TypeError: expected an object with a writable buffer interface".

This seems to represent a significant decrease in the functionality of ctypes, 
since, if I understand correctly, it has no notion of a const array or a const 
char. I used from_buffer with a bytes argument in 3.1 and it was far from 
obvious how to port to 3.2 without introducing expensive copying. I understand 
the motivation behind requiring a writable buffer, but I think it's a bad idea. 
If you take this to its logical conclusion, it should not be possible to pass 
bytes or str values directly to C functions, since there's no way to be sure 
they won't write through the pointer.

--
assignee: theller
components: ctypes
messages: 130229
nosy: benrg, theller
priority: normal
severity: normal
status: open
title: ctypes from_buffer no longer accepts bytes
type: behavior
versions: Python 3.2

Python tracker <http://bugs.python.org/issue11427>



[issue11428] with statement looks up __exit__ incorrectly

2011-03-06 Thread benrg

New submission from benrg benrud...@gmail.com:

class MakeContextHandler:
    def __init__(self, enter, exit):
        self.__enter__ = enter
        self.__exit__ = exit

with MakeContextHandler(lambda: None, lambda *e: None): pass

In 3.1.3 this worked; in 3.2 it raises AttributeError('__exit__'), which 
appears to be a bug.

--
components: Interpreter Core
messages: 130231
nosy: benrg
priority: normal
severity: normal
status: open
title: with statement looks up __exit__ incorrectly
type: behavior
versions: Python 3.2

Python tracker <http://bugs.python.org/issue11428>



[issue8847] crash appending list and namedtuple

2011-03-06 Thread benrg

benrg benrud...@gmail.com added the comment:

The bug is still present in 3.2.

--
versions: +Python 3.2

Python tracker <http://bugs.python.org/issue8847>



[issue11429] ctypes is highly eclectic in its raw-memory support

2011-03-06 Thread benrg

New submission from benrg benrud...@gmail.com:

ctypes accepts bytes objects as arguments to C functions, but not bytearray 
objects. It has its own array types but seems to be unaware of array.array. It 
doesn't even understand memoryview objects. I think that all of these types 
should be passable to C code.

Additionally, while passing a pointer to a bytes value to a C function is easy, 
it's remarkably difficult to pass that same pointer with an offset added to it. 
I first tried byref(buf, offset), but byref wouldn't accept bytes. Then I tried 
addressof(buf), but that didn't work either, even though ctypes is clearly able 
to obtain this address when it has to. After banging my head against the wall 
for longer than I care to think about, I finally came up with something like 
byref((c_char*length).from_buffer(buf), offset). But that broke in 3.2. After 
wasting even more time, I came up with addressof(cast(buf, 
POINTER(c_char)).contents) + offset. This is nuts. There should be a simple and 
documented way to do this. My first preference would be for the byref method, 
since it was the first thing I tried, and would have saved me the most time. 
Ideally both byref and addressof should work for bytes objects as they do for 
ctypes arrays (and also for bytearray, memoryview, etc.)
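
Putting those incantations side by side (buf and offset are illustrative):

    from ctypes import POINTER, addressof, byref, c_char, cast

    buf = b'abcdefgh'
    offset = 3

    # Worked in 3.1, broke in 3.2 (bytes no longer give a writable buffer):
    # ptr = byref((c_char * len(buf)).from_buffer(buf), offset)

    # Works in 3.2, but is hardly discoverable:
    addr = addressof(cast(buf, POINTER(c_char)).contents) + offset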

--
assignee: theller
components: ctypes
messages: 130236
nosy: benrg, theller
priority: normal
severity: normal
status: open
title: ctypes is highly eclectic in its raw-memory support
type: feature request
versions: Python 3.2

Python tracker <http://bugs.python.org/issue11429>



[issue5391] mmap: read_byte/write_byte and object type

2010-10-16 Thread benrg

benrg benrud...@gmail.com added the comment:

With this patch, read_byte returns an integer in the range -128 to 127 instead 
of 0 to 255 if char is signed. Python 3.1.2 (r312:79149, Mar 21 2010, 00:41:52) 
[MSC v.1500 32 bit (Intel)] on win32 is affected by this. I think it is a bug. 
The test code would fail if the test string contained any bytes outside the 
ASCII range.

(Did this really go unnoticed for a year and a half? I noticed it the moment I 
first tried to use read_byte (which was just now). I see that read_byte was 
broken in a different way in 3.0. Does anybody actually use it?)

--
nosy: +benrg

Python tracker <http://bugs.python.org/issue5391>



[issue8847] crash appending list and namedtuple

2010-05-28 Thread benrg

New submission from benrg benrud...@gmail.com:

c:\>python
Python 3.1.2 (r312:79149, Mar 21 2010, 00:41:52) [MSC v.1500 32 bit (Intel)] 
on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from collections import namedtuple
>>> foo = namedtuple('foo', '')
>>> [1] + foo()

At this point the interpreter crashes. Also happens when foo has named 
arguments, and in batch scripts. foo() + [1] throws a TypeError as expected. [] 
+ foo() returns (). The immediate cause of the crash is the CALL instruction at 
1E031D5A in python31.dll jumping into uninitialized memory.

--
components: Interpreter Core, Library (Lib), Windows
messages: 106695
nosy: benrg
priority: normal
severity: normal
status: open
title: crash appending list and namedtuple
type: crash
versions: Python 3.1

Python tracker <http://bugs.python.org/issue8847>