[issue46065] re.findall takes forever and never ends

2021-12-19 Thread Gareth Rees


Gareth Rees  added the comment:

This kind of question is frequently asked (#3128, #29977, #28690, #30973, 
#1737127, etc.), and so maybe it deserves an answer somewhere in the Python 
documentation.

--
resolution:  -> wont fix
stage:  -> resolved
status: open -> closed

Python tracker <https://bugs.python.org/issue46065>



[issue46065] re.findall takes forever and never ends

2021-12-19 Thread Gareth Rees


Gareth Rees  added the comment:

The way to avoid this behaviour is to disallow the attempts at matching that 
you know are going to fail. As Serhiy described above, if the search fails 
starting at the first character of the string, it will move forward and try 
again starting at the second character. But you know that this new attempt must 
fail, so you can force the regular expression engine to discard the attempt 
immediately.

Here's an illustration in a simpler setting, where we are looking for every
run of 'a' characters followed by a 'b':

>>> import re
>>> from timeit import timeit
>>> text = 'a' * 10
>>> timeit(lambda:re.findall(r'a+b', text), number=1)
6.64353118114

We know that any successful match must be preceded by a character other than 
'a' (or the beginning of the string), so we can reject many unsuccessful 
matches like this:

>>> timeit(lambda:re.findall(r'(?:^|[^a])(a+b)', text), number=1)
0.00374348114981

In your case, a successful match must be preceded by [^a-zA-Z0-9_.+-] (or the 
beginning of the string).
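
For reference, here is a sketch of how such a guard could be applied. The exact
pattern from the original report is not quoted in this thread, so the
email-like pattern below is only an illustrative stand-in:

    import re

    # Guarded pattern: the leading (?:^|[^a-zA-Z0-9_.+-]) rejects every start
    # position that falls in the middle of a run of address characters, so on
    # input with no possible match findall() returns quickly instead of
    # re-attempting (and backtracking) at every offset.
    pattern = re.compile(
        r'(?:^|[^a-zA-Z0-9_.+-])'
        r'([a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+)')
    print(pattern.findall('x' * 100000))  # []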

--
nosy: +g...@garethrees.org

Python tracker <https://bugs.python.org/issue46065>



[issue15443] datetime module has no support for nanoseconds

2021-12-18 Thread Gareth Rees


Gareth Rees  added the comment:

I also have a use case that would benefit from nanosecond resolution in 
Python's datetime objects, that is, representing and querying the results of 
clock_gettime() in a program trace.

On modern Linuxes with a vDSO, clock_gettime() does not require a system call 
and completes within a few nanoseconds. So Python's datetime objects do not 
have sufficient resolution to distinguish between adjacent calls to 
clock_gettime().

This means that, like Mark Dickinson above, I have to choose between using 
datetime for queries (which would be convenient) and accepting that nearby 
events in the trace may be indistinguishable, or implementing my own 
datetime-like data structure.
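
A quick way to see the resolution gap (a minimal sketch; clock_getres and
CLOCK_REALTIME are assumed to be available, i.e. a Unix-like platform):

    import time
    from datetime import datetime

    # datetime stores microseconds, while the clock typically reports
    # nanosecond resolution.
    print(datetime.resolution)                     # 0:00:00.000001
    print(time.clock_getres(time.CLOCK_REALTIME))  # e.g. 1e-09 on Linux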

--
nosy: +g...@garethrees.org

Python tracker <https://bugs.python.org/issue15443>



[issue45643] SIGSTKFLT is missing from the signals module on Linux

2021-11-17 Thread Gareth Rees


Gareth Rees  added the comment:

Tagging vstinner as you have touched Modules/signalmodule.c a few times in the 
last year. What do you think?

--
nosy: +vstinner

Python tracker <https://bugs.python.org/issue45643>



[issue45643] SIGSTKFLT is missing from the signals module on Linux

2021-10-28 Thread Gareth Rees


Change by Gareth Rees :


--
keywords: +patch
pull_requests: +27529
stage:  -> patch review
pull_request: https://github.com/python/cpython/pull/29266

Python tracker <https://bugs.python.org/issue45643>



[issue45643] SIGSTKFLT is missing from the signals module on Linux

2021-10-28 Thread Gareth Rees

New submission from Gareth Rees :

BACKGROUND

On Linux, "man 7 signal" includes SIGSTKFLT in its table of "various other 
signals":

Signal     Value    Action  Comment
────────────────────────────────────────────────────────────
SIGSTKFLT  -,16,-   Term    Stack fault on coprocessor (unused)

Here "-,16,-" means that the signal is defined with the value 16 on x86 and ARM 
but not on Alpha, SPARC or MIPS. I believe that the intention was to use 
SIGSTKFLT for stack faults on the x87 math coprocessor, but this was either 
removed or never implemented, so that the signal is defined in 
/usr/include/signal.h but not used by the Linux kernel.


USE CASE

SIGSTKFLT is one of a handful of signals that are not used by the kernel, so 
that user-space programs are free to use it for their own purposes, for example 
for inter-thread or inter-process pre-emptive communication.

Accordingly, it would be nice if the name SIGSTKFLT were available in the 
Python signal module on the platforms where the signal is available, for use 
and reporting in these cases.
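
A sketch of the intended use, looking the name up defensively since it is only
defined on some platforms (and only on Python versions that expose it):

    import os
    import signal

    SIGSTKFLT = getattr(signal, "SIGSTKFLT", None)
    if SIGSTKFLT is not None:
        # Repurpose the otherwise-unused signal for simple notification.
        signal.signal(SIGSTKFLT, lambda signum, frame: print("notified"))
        os.kill(os.getpid(), SIGSTKFLT)  # prints "notified"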

--
components: Library (Lib)
messages: 405174
nosy: g...@garethrees.org
priority: normal
severity: normal
status: open
title: SIGSTKFLT is missing from the signals module on Linux
type: enhancement
versions: Python 3.11

Python tracker <https://bugs.python.org/issue45643>



[issue45476] [C API] Convert "AS" functions, like PyFloat_AS_DOUBLE(), to static inline functions

2021-10-15 Thread Gareth Rees

Gareth Rees  added the comment:

If the problem is accidental use of the result of PyFloat_AS_DOUBLE() as an 
lvalue, why not use the comma operator to ensure that the result is an rvalue?

The C99 standard says "A comma operator does not yield an lvalue" in §6.5.17; I 
imagine there is similar text in other versions of the standard.

The idea would be to define a helper macro like this:

/* As expr, but can only be used as an rvalue. */
#define Py_RVALUE(expr) ((void)0, (expr))

and then use the helper where needed, for example:

#define PyFloat_AS_DOUBLE(op) Py_RVALUE(((PyFloatObject *)(op))->ob_fval)

--
nosy: +g...@garethrees.org

Python tracker <https://bugs.python.org/issue45476>



[issue41092] Report actual size from 'os.path.getsize'

2020-06-26 Thread Gareth Rees


Gareth Rees  added the comment:

The proposed change adds a Boolean flag to os.path.getsize() so that it returns:

os.stat(filename).st_blocks * 512

(where 512 is the unit in which st_blocks is counted on Linux; some work is 
needed to make this portable to other operating systems).

The Boolean argument here would always be constant in practice -- that is, 
you'd always call it like this:

virtual_size = os.path.getsize(filename, apparent=True)
allocated_size = os.path.getsize(filename, apparent=False)

and never like this:

x_size = os.path.getsize(filename, apparent=x)

where x varies at runtime.

The "no constant bool arguments" design principle [1] suggests that this should 
be added as a new function, something like os.path.getallocatedsize().

  [1] https://mail.python.org/pipermail/python-ideas/2016-May/040181.html
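
A minimal sketch of what such a function might look like (the name and the
512-byte unit are assumptions, and st_blocks is not available on every
platform):

    import os

    def getallocatedsize(path):
        # On Linux, st_blocks counts 512-byte units; fall back to the
        # apparent size where the attribute is missing (e.g. on Windows).
        st = os.stat(path)
        blocks = getattr(st, "st_blocks", None)
        return st.st_size if blocks is None else blocks * 512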

--
nosy: +g...@garethrees.org

Python tracker <https://bugs.python.org/issue41092>



[issue40707] Popen.communicate documentation does not say how to get the return code

2020-06-23 Thread Gareth Rees


Gareth Rees  added the comment:

Is there anything I can do to move this forward?

--

Python tracker <https://bugs.python.org/issue40707>



[issue40707] Popen.communicate documentation does not say how to get the return code

2020-05-23 Thread Gareth Rees


Gareth Rees  added the comment:

The following test cases in test_subprocess.py call the communicate() method 
and then immediately assert that returncode attribute has the expected value:

* test_stdout_none
* test_stderr_redirect_with_no_stdout_redirect
* test_stdout_filedes_of_stdout
* test_communicate_stdin
* test_universal_newlines_communicate_stdin
* test_universal_newlines_communicate_input_none
* test_universal_newlines_communicate_stdin_stdout_stderr
* test_nonexisting_with_pipes
* test_wait_when_sigchild_ignored
* test_startupinfo_copy
* test_close_fds_with_stdio
* test_communicate_stdin

You'll see that some of these test for success (returncode == 0) and some for 
failure (returncode == 1). This seems like adequate test coverage to me, but if 
something is missing, let me know.

--

Python tracker <https://bugs.python.org/issue40707>



[issue40707] Popen.communicate documentation does not say how to get the return code

2020-05-21 Thread Gareth Rees


Change by Gareth Rees :


--
keywords: +patch
pull_requests: +19559
stage:  -> patch review
pull_request: https://github.com/python/cpython/pull/20283

Python tracker <https://bugs.python.org/issue40707>



[issue40707] Popen.communicate documentation does not say how to get the return code

2020-05-21 Thread Gareth Rees


New submission from Gareth Rees :

When using subprocess.Popen.communicate(), it is natural to wonder how to get
the exit code of the subprocess. However, the documentation [1] says:

Interact with process: Send data to stdin. Read data from stdout and
stderr, until end-of-file is reached. Wait for process to terminate. The
optional input argument should be data to be sent to the child process, or
None, if no data should be sent to the child. If streams were opened in
text mode, input must be a string. Otherwise, it must be bytes.

communicate() returns a tuple (stdout_data, stderr_data). The data will be
strings if streams were opened in text mode; otherwise, bytes.

If you can guess that communicate() might set returncode, then you can find
what you need in the documentation for that attribute [2]:

The child return code, set by poll() and wait() (and indirectly by
communicate()).

I suggest that the documentation for communicate() be updated to mention that
it sets the returncode attribute. This would be consistent with poll() and
wait(), which already mention this.

[1]: 
https://docs.python.org/3/library/subprocess.html#subprocess.Popen.communicate
[2]: 
https://docs.python.org/3/library/subprocess.html#subprocess.Popen.returncode
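
For illustration, the usage pattern this documentation change is about (a
minimal sketch):

    import subprocess
    import sys

    proc = subprocess.Popen([sys.executable, "-c", "print('hi')"],
                            stdout=subprocess.PIPE, universal_newlines=True)
    out, _ = proc.communicate()          # waits for the process to terminate
    print(out.strip(), proc.returncode)  # communicate() set returncode: hi 0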

--
assignee: docs@python
components: Documentation
messages: 369502
nosy: docs@python, g...@garethrees.org
priority: normal
severity: normal
status: open
title: Popen.communicate documentation does not say how to get the return code
type: enhancement
versions: Python 3.10, Python 3.5, Python 3.6, Python 3.7, Python 3.8, Python 
3.9

Python tracker <https://bugs.python.org/issue40707>



[issue17005] Add a topological sort algorithm

2020-01-09 Thread Gareth Rees


Gareth Rees  added the comment:

I'd like to push back on the idea that graphs with isolated vertices are 
"unusual cases" as suggested by Raymond.

A very common use case (possibly the most common) for topological sorting is 
job scheduling. In this use case you have a collection of jobs, some of which 
have dependencies on other jobs, and you want to output a schedule according to 
which the jobs can be executed so that each job is executed after all its 
dependencies.

In this use case, any job that has no dependencies, and is not itself a 
dependency of any other job, is an isolated vertex in the dependency graph. 
This means that the proposed interface (that is, the interface taking only 
pairs of vertices) will not be suitable for this use case, and any programmer 
who tries to use it will be setting themselves up for failure.

--

Python tracker <https://bugs.python.org/issue17005>



[issue17005] Add a topological sort algorithm

2019-01-18 Thread Gareth Rees


Gareth Rees  added the comment:

Just to elaborate on what I mean by "bug magnet". (I'm sure Pablo understands 
this, but there may be other readers who would like to see it spelled out.)

Suppose that you have a directed graph represented as a mapping from a vertex 
to an iterable of its out-neighbours. Then the "obvious" way to get a total 
order on the vertices in the graph would be to generate the edges and pass them 
to topsort:

    def edges(graph):
        return ((v, w) for v, ww in graph.items() for w in ww)

    order = topsort(edges(graph))

This will appear to work fine if it is never tested with a graph that has 
isolated vertices (which would be an all too easy omission).

To handle isolated vertices you have to remember to write something like this:

    reversed_graph = {v: [] for v in graph}
    for v, ww in graph.items():
        for w in ww:
            reversed_graph[w].append(v)
    order = topsort(edges(graph)) + [
        v for v, ww in graph.items() if not ww and not reversed_graph[v]]

I think it likely that beginner programmers will forget to do this and be 
surprised later on when their total order is missing some of the vertices.

--

Python tracker <https://bugs.python.org/issue17005>



[issue17005] Add a topological sort algorithm

2019-01-18 Thread Gareth Rees


Gareth Rees  added the comment:

I approve in general with the principle of including a topological sort 
algorithm in the standard library. However, I have three problems with the 
approach in PR 11583:

1. The name "topsort" is most naturally parsed as "top sort" which could be 
misinterpreted (as a sort that puts items on top in some way). If the name must 
be abbreviated then "toposort" would be better.

2. "Topological sort" is a terrible name: the analogy with topological graph 
theory is (i) unlikely to be helpful to anyone; and (ii) not quite right. I 
know that the name is widely used in computing, but a name incorporating 
"linearize" or "linear order" or "total order" would be much clearer.

3. The proposed interface is not suitable for all cases! The function topsort 
takes a list of directed edges and returns a linear order on the vertices in 
those edges (if any linear order exists). But this means that if there are any 
isolated vertices (that is, vertices with no edges) in the dependency graph, 
then there is no way of passing those vertices to the function. This means that 
(i) it is inconvenient to use the proposed interface because you have to find 
the isolated vertices in your graph and add them to the linear order after 
calling the function; (ii) it is a bug magnet because many programmers will 
omit this step, meaning that their code will unexpectedly fail when their graph 
has an isolated vertex. The interface needs to be redesigned to take the graph 
in some other representation.

--
nosy: +g...@garethrees.org

Python tracker <https://bugs.python.org/issue17005>



[issue32194] When creating list of dictionaries and updating datetime objects one by one, all values are set to last one of the list.

2017-12-01 Thread Gareth Rees

Gareth Rees <g...@garethrees.org> added the comment:

The behaviour of the * operator (and the associated gotcha) is documented under 
"Common sequence operations" [1]:

Note that items in the sequence s are not copied; they are referenced
multiple times. This often haunts new Python programmers ...

There is also an entry in the FAQ [2]:

replicating a list with * doesn’t create copies, it only creates
references to the existing objects

[1] 
https://docs.python.org/3/library/stdtypes.html#sequence-types-list-tuple-range
[2] https://docs.python.org/3/faq/programming.html#faq-multidimensional-list
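
The gotcha in miniature:

    row = [{}] * 2           # two references to the *same* dict
    row[0]['x'] = 1
    print(row)               # [{'x': 1}, {'x': 1}]
    print(row[0] is row[1])  # True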

--
nosy: +g...@garethrees.org
resolution:  -> not a bug
stage:  -> resolved
status: open -> closed

Python tracker <https://bugs.python.org/issue32194>



[issue31895] Native hijri calendar support

2017-10-30 Thread Gareth Rees

Gareth Rees <g...@garethrees.org> added the comment:

convertdate does not document which version of the Islamic calendar it uses, 
but looking at the source code, it seems that it uses a rule-based calendar 
which has a 30-year cycle with 11 leap years. This won't help Haneef, who wants 
the Umm al-Qura calendar.

--

Python tracker <https://bugs.python.org/issue31895>



[issue31895] Native hijri calendar support

2017-10-30 Thread Gareth Rees

Gareth Rees <g...@garethrees.org> added the comment:

It is a substantial undertaking, requiring a great deal of expertise, to 
implement the Islamic calendar. The difficulty is that there are multiple 
versions of the calendar. In some places the calendar is based on human 
observation of the new moon, and so a database of past observations is needed 
(and future dates can't be represented). In other places the time of 
observability of the new moon is calculated according to an astronomical 
ephemeris (and different ephemerides are used in different places and at 
different times).

--
nosy: +g...@garethrees.org

Python tracker <https://bugs.python.org/issue31895>



[issue28647] python --help: -u is misdocumented as binary mode

2017-10-11 Thread Gareth Rees

Gareth Rees <g...@garethrees.org> added the comment:

You're welcome.

--

Python tracker <https://bugs.python.org/issue28647>



[issue24869] shlex lineno inaccurate with certain inputs

2017-07-21 Thread Gareth Rees

Gareth Rees added the comment:

I've made a pull request. (Not because I expect it to be merged as-is, but to 
provide a starting point for discussion.)

--
nosy: +petri.lehtinen, vinay.sajip

Python tracker <http://bugs.python.org/issue24869>



[issue24869] shlex lineno inaccurate with certain inputs

2017-07-21 Thread Gareth Rees

Changes by Gareth Rees <g...@garethrees.org>:


--
pull_requests: +2849

Python tracker <http://bugs.python.org/issue24869>



[issue30976] multiprocessing.Process.is_alive can show True for dead processes

2017-07-20 Thread Gareth Rees

Gareth Rees added the comment:

This is a race condition: when os.kill returns, the signal has been sent, but 
that does not mean that the subprocess has exited yet. You 
can see this by inserting a sleep after the kill and before the liveness check:

print(proc.is_alive())
os.kill(proc.pid, signal.SIGTERM)
time.sleep(1)
print(proc.is_alive())

This (probably) gives the process time to exit. (Presumably the 
psutil.pid_exists() call has a similar effect.) Of course, waiting for 1 second 
(or any amount of time) might not be enough. The right thing to do is to join 
the process. Then when the join exits you know it died.
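
A minimal sketch (Unix is assumed, since SIGTERM is sent with os.kill):

    import multiprocessing
    import os
    import signal
    import time

    def worker():
        time.sleep(60)

    if __name__ == '__main__':
        proc = multiprocessing.Process(target=worker)
        proc.start()
        os.kill(proc.pid, signal.SIGTERM)
        proc.join()                # wait for the child to actually exit
        print(proc.is_alive())     # False
        print(proc.exitcode)       # -15: terminated by SIGTERM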

--
nosy: +g...@garethrees.org

Python tracker <http://bugs.python.org/issue30976>



[issue30973] Regular expression "hangs" interpreter

2017-07-20 Thread Gareth Rees

Gareth Rees added the comment:

This is the usual exponential backtracking behaviour of Python's regex engine. 
The problem is that the regex

(?:[^*]+|\*[^/])*

can match against a string in exponentially many ways, and Python's regex 
engine tries all of them before giving up.

--
nosy: +g...@garethrees.org

Python tracker <http://bugs.python.org/issue30973>



[issue19896] Exposing "q" and "Q" to multiprocessing.sharedctypes

2017-07-17 Thread Gareth Rees

Changes by Gareth Rees <g...@garethrees.org>:


--
pull_requests: +2801

Python tracker <http://bugs.python.org/issue19896>



[issue19896] Exposing "q" and "Q" to multiprocessing.sharedctypes

2017-07-17 Thread Gareth Rees

Gareth Rees added the comment:

(If he hasn't, I don't think I can make a PR because I read his patch and so 
any implementation I make now is based on his patch and so potentially 
infringes his copyright.)

--

Python tracker <http://bugs.python.org/issue19896>



[issue19896] Exposing "q" and "Q" to multiprocessing.sharedctypes

2017-07-17 Thread Gareth Rees

Gareth Rees added the comment:

Has Antony Lee made a copyright assignment?

--

Python tracker <http://bugs.python.org/issue19896>



[issue19896] Exposing "q" and "Q" to multiprocessing.sharedctypes

2017-07-17 Thread Gareth Rees

Changes by Gareth Rees <g...@garethrees.org>:


--
nosy: +benjamin.peterson

Python tracker <http://bugs.python.org/issue19896>



[issue30943] printf-style Bytes Formatting sometimes do not worked.

2017-07-17 Thread Gareth Rees

Gareth Rees added the comment:

This was already noted in issue29714 and fixed by Xiang Zhang in commit 
b76ad5121e2.

--

Python tracker <http://bugs.python.org/issue30943>



[issue30943] printf-style Bytes Formatting sometimes do not worked.

2017-07-17 Thread Gareth Rees

Gareth Rees added the comment:

Test case minimization:

Python 3.6.1 (default, Apr 24 2017, 06:18:27) 
[GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.42.1)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> b'a\x00%(a)s' % {b'a': b'a'}
b'a\x00%(a)s'

It seems that all formatting operations after a zero byte are ignored. This is 
because the code for parsing the format string (in _PyBytes_FormatEx in 
Objects/bytesobject.c) uses the following approach to find the next % character:

while (--fmtcnt >= 0) {
if (*fmt != '%') {
Py_ssize_t len;
char *pos;
pos = strchr(fmt + 1, '%');

But strchr uses the C notion of strings, which are terminated by a zero byte.

--
nosy: +g...@garethrees.org

Python tracker <http://bugs.python.org/issue30943>



[issue19896] Exposing "q" and "Q" to multiprocessing.sharedctypes

2017-07-14 Thread Gareth Rees

Gareth Rees added the comment:

Patch looks good to me. The test cases are not very systematic (why only int, 
double, and long long?), but that's not the fault of the patch and shouldn't 
prevent its being applied.

--
nosy: +g...@garethrees.org

Python tracker <http://bugs.python.org/issue19896>



[issue30919] Shared Array Memory Allocation Regression

2017-07-14 Thread Gareth Rees

Gareth Rees added the comment:

I propose:

1. Ask Richard Oudkerk why in changeset 3b82e0d83bf9 the temporary file is 
zero-filled and not truncated. Perhaps there's some file system where this is 
necessary? (I tested HFS+ which doesn't support sparse files, and zero-filling 
seems not to be necessary, but maybe there's some other file system where it 
is?)

2. If there's no good reason for zero-filling the temporary file, replace it 
with a call to os.ftruncate(fd, size).

3. Update the documentation to mention the performance issue when porting 
multiprocessing code from 2 to 3. Unfortunately, I don't think there's any 
advice that the documentation can give that will help work around it -- 
monkey-patching works but is not supported.

4. Consider writing a fix, or at least a supported workaround. Here's a 
suggestion: update multiprocessing.sharedctypes and multiprocessing.heap so 
that they use anonymous maps in the 'fork' context. The idea is to update the 
RawArray and RawValue functions so that they take the context, and then pass 
the context down to _new_value, BufferWrapper.__init__ and thence to 
Heap.malloc where it can be used to determine what kind of Arena (file-backed 
or anonymous) should be used to satisfy the allocation request. The Heap class 
would have to segregate its blocks according to what kind of Arena they 
come from.

--

Python tracker <http://bugs.python.org/issue30919>



[issue30919] Shared Array Memory Allocation Regression

2017-07-14 Thread Gareth Rees

Gareth Rees added the comment:

I see now that the default start method is 'fork' (except on Windows), so 
calling set_start_method is unnecessary.

Note that you don't have to edit multiprocessing/heap.py, you can 
"monkey-patch" it in the program that needs the anonymous mapping:

    import mmap
    from multiprocessing.heap import Arena

    def anonymous_arena_init(self, size, fd=-1):
        "Create Arena using an anonymous memory mapping."
        self.size = size
        self.fd = fd  # still kept but is not used !
        self.buffer = mmap.mmap(-1, self.size)

    Arena.__init__ = anonymous_arena_init

As for what it will break — any code that uses the 'spawn' or 'forkserver' 
start methods.

--

Python tracker <http://bugs.python.org/issue30919>



[issue30919] Shared Array Memory Allocation Regression

2017-07-14 Thread Gareth Rees

Gareth Rees added the comment:

Nonetheless this is bound to be a nasty performance regression for many people doing big 
data processing with NumPy/SciPy/Pandas and multiprocessing and moving from 2 
to 3, so even if it can't be fixed, the documentation ought to warn about the 
problem and explain how to work around it.

--

Python tracker <http://bugs.python.org/issue30919>



[issue30919] Shared Array Memory Allocation Regression

2017-07-14 Thread Gareth Rees

Gareth Rees added the comment:

If you need the 2.7 behaviour (anonymous mappings) in 3.5 then you can still do 
it, with some effort. I think the approach that requires the smallest amount of 
work would be to ensure that subprocesses are started using fork(), by calling 
multiprocessing.set_start_method('fork'), and then monkey-patch 
multiprocessing.heap.Arena.__init__ so that it creates anonymous mappings using 
mmap.mmap(-1, size).

(I suggested above that Python could be modified to create anonymous mappings 
in the 'fork' case, but now that I look at the code in detail, I see that it 
would be tricky, because the Arena class has no idea about the Context in which 
it is going to be used -- at the moment you can create one shared object and 
then pass it to subprocesses under different Contexts, so the shared objects 
have to support the lowest common denominator.)

--

Python tracker <http://bugs.python.org/issue30919>



[issue30919] Shared Array Memory Allocation Regression

2017-07-13 Thread Gareth Rees

Gareth Rees added the comment:

Note that some filesystems (e.g. HFS+) don't support sparse files, so creating 
a large Arena will still be slow on these filesystems even if the file is 
created using ftruncate().

(This could be fixed, for the "fork" start method only, by using anonymous maps 
in that case.)

--

Python tracker <http://bugs.python.org/issue30919>



[issue30919] Shared Array Memory Allocation Regression

2017-07-13 Thread Gareth Rees

Gareth Rees added the comment:

In Python 2.7, multiprocessing.heap.Arena uses an anonymous memory mapping on 
Unix. Anonymous memory mappings can be shared between processes but only via 
fork().

But Python 3 supports other ways of starting subprocesses (see issue 8713 [1]) 
and so an anonymous memory mapping no longer works. So instead a temporary file 
is created, filled with zeros to the given size, and mapped into memory (see 
changeset 3b82e0d83bf9 [2]). It is the zero-filling of the temporary file that 
takes the time, because this forces the operating system to allocate space on 
the disk.

But why not use ftruncate() (instead of write()) to quickly create a file with 
holes? POSIX says [3], "If the file size is increased, the extended area shall 
appear as if it were zero-filled" which would seem to satisfy the requirement.

[1] https://bugs.python.org/issue8713
[2] https://hg.python.org/cpython/rev/3b82e0d83bf9
[3] http://pubs.opengroup.org/onlinepubs/9699919799/functions/ftruncate.html
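
A small demonstration of the difference (a sketch; it assumes a filesystem that
supports sparse files, which as noted elsewhere in this thread HFS+ does not):

    import os
    import tempfile

    fd, path = tempfile.mkstemp()
    try:
        os.ftruncate(fd, 1 << 30)   # 1 GiB logical size, effectively instant
        st = os.stat(path)
        print(st.st_size)           # 1073741824
        print(st.st_blocks * 512)   # typically near zero: nothing allocated
    finally:
        os.close(fd)
        os.unlink(path)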

--
nosy: +g...@garethrees.org

Python tracker <http://bugs.python.org/issue30919>



[issue30564] Base64 decoding gives incorrect outputs.

2017-06-04 Thread Gareth Rees

Gareth Rees added the comment:

RFC 4648 section 3.5 says:

   The padding step in base 64 and base 32 encoding can, if improperly
   implemented, lead to non-significant alterations of the encoded data.
   For example, if the input is only one octet for a base 64 encoding,
   then all six bits of the first symbol are used, but only the first
   two bits of the next symbol are used.  These pad bits MUST be set to
   zero by conforming encoders, which is described in the descriptions
   on padding below.  If this property do not hold, there is no
   canonical representation of base-encoded data, and multiple base-
   encoded strings can be decoded to the same binary data.  If this
   property (and others discussed in this document) holds, a canonical
   encoding is guaranteed.

   In some environments, the alteration is critical and therefore
   decoders MAY chose to reject an encoding if the pad bits have not
   been set to zero.

If decoders may choose to reject non-canonical encodings, then they may also
choose to accept them. (That's the meaning of "MAY" in RFC 2119.) So I think
Python's behaviour is conforming to the standard.
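
For example, two encodings that differ only in their pad bits decode to the
same bytes, and CPython (with the default validate=False) accepts both:

    import base64

    print(base64.b64decode("YQ=="))  # b'a' -- canonical: pad bits are zero
    print(base64.b64decode("YR=="))  # b'a' -- non-zero pad bits, accepted anyway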

--
nosy: +g...@garethrees.org

Python tracker <http://bugs.python.org/issue30564>



[issue29977] re.sub stalls forever on an unmatched non-greedy case

2017-04-04 Thread Gareth Rees

Gareth Rees added the comment:

See also issue28690, issue212521, issue753711, issue1515829, etc.

--

Python tracker <http://bugs.python.org/issue29977>



[issue29977] re.sub stalls forever on an unmatched non-greedy case

2017-04-04 Thread Gareth Rees

Gareth Rees added the comment:

The problem here is that both "." and "\s" match a whitespace character, and 
because you have the re.DOTALL flag turned on this includes "\n", and so the 
number of different ways in which (.|\s)* can be matched against a string is 
exponential in the number of whitespace characters in the string.

It is best to design your regular expression so as to limit the number of 
different ways it can match. Here I recommend the expression:

/\*(?:[^*]|\*[^/])*\*/

which can match in only one way.
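
A quick check of the recommended pattern (illustrative only; each character of
the comment body can be matched in just one way, so a failing search gives up
quickly):

    import re

    comment = re.compile(r'/\*(?:[^*]|\*[^/])*\*/')
    print(comment.findall("int x; /* a comment */ int y;"))
    # ['/* a comment */']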

--
nosy: +g...@garethrees.org

Python tracker <http://bugs.python.org/issue29977>



[issue14376] sys.exit documents argument as "integer" but actually requires "subtype of int"

2017-02-02 Thread Gareth Rees

Gareth Rees added the comment:

Thank you, Mark (and everyone else who helped).

--

Python tracker <http://bugs.python.org/issue14376>



[issue14376] sys.exit documents argument as "integer" but actually requires "subtype of int"

2017-02-01 Thread Gareth Rees

Gareth Rees added the comment:

Thanks for the revised patch, Mark. The new tests look good.

--

Python tracker <http://bugs.python.org/issue14376>



[issue24869] shlex lineno inaccurate with certain inputs

2017-02-01 Thread Gareth Rees

Gareth Rees added the comment:

Here's a patch that implements my proposal (1) -- under this patch, tokens read 
from an input stream belong to a subtype of str with startline and endline 
attributes giving the line numbers of the first and last character of the 
token. This allows the accurate reporting of error messages relating to a 
token. I updated the documentation and added a test case.

--
keywords: +patch
Added file: http://bugs.python.org/file46479/issue24869.patch

Python tracker <http://bugs.python.org/issue24869>



[issue14376] sys.exit documents argument as "integer" but actually requires "subtype of int"

2017-02-01 Thread Gareth Rees

Gareth Rees added the comment:

In Windows, under cmd.exe, you can use %errorlevel%

--

Python tracker <http://bugs.python.org/issue14376>



[issue14376] sys.exit documents argument as "integer" but actually requires "subtype of int"

2017-02-01 Thread Gareth Rees

Gareth Rees added the comment:

Is there any chance of making progress on this issue? Is there anything wrong 
with my patch? Did I omit any relevant point in my message of 2016-06-11 16:26? 
It would be nice if this were not left in limbo for another four years.

--

Python tracker <http://bugs.python.org/issue14376>



[issue28743] test_choices_algorithms() in test_random uses lots of memory

2016-11-19 Thread Gareth Rees

Gareth Rees added the comment:

In order for this to work, the __getitem__ method needs to be:

    def __getitem__(self, key):
        if 0 <= key < self.n:
            return self.elem
        else:
            raise IndexError(key)

But unfortunately this is very bad for the performance of the test. The 
original code, with [1]*n:

Ran 1 test in 5.256s

With RepeatedSequence(1, n):

Ran 1 test in 33.620s

So that's no good. However, I notice that although the documentation of choices 
specifies that weights is a sequence, in fact it seems only to require an 
iterable:

cum_weights = list(_itertools.accumulate(weights))

so itertools.repeat works, and is faster than the original code:

Ran 1 test in 4.991s

Patch attached, in case it's acceptable to pass an iterable here.
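
The idea in miniature (a sketch of the approach, not necessarily the patch's
exact code):

    import itertools
    import random

    n = 10 ** 6
    # An iterable of n equal weights, without materializing an n-element list.
    random.choices(range(n), itertools.repeat(1, n), k=1)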

--
keywords: +patch
Added file: http://bugs.python.org/file45546/issue28743.patch

Python tracker <http://bugs.python.org/issue28743>



[issue28743] test_choices_algorithms() in test_random uses lots of memory

2016-11-19 Thread Gareth Rees

Gareth Rees added the comment:

Couldn't the test case use something like this to avoid allocating so much 
memory?

    from collections.abc import Sequence

    class RepeatedSequence(Sequence):
        """Immutable sequence of n repeats of elem."""

        def __init__(self, elem, n):
            self.elem = elem
            self.n = n

        def __getitem__(self, key):
            return self.elem

        def __len__(self):
            return self.n

and then:

self.gen.choices(range(n), RepeatedSequence(1, n), k=1)

--
nosy: +Gareth.Rees

Python tracker <http://bugs.python.org/issue28743>



[issue28690] Loop in re (regular expression) processing

2016-11-14 Thread Gareth Rees

Gareth Rees added the comment:

This is a well-known gotcha with backtracking regexp implementations. The 
problem is that in the alternation "( +|'[^']*'|\"[^\"]*\"|[^>]+)" there are 
some characters (space, apostrophe, double quotes) that match multiple 
alternatives (for example a space matches both " +" and "[^>]+"). This causes 
the regexp engine to have to backtrack for each ambiguous character to try out 
the other alternatives, leading to runtime that's exponential in the number of 
ambiguous characters.

Linear behaviour can be restored if you make the alternation unambiguous, like 
this: ( +|'[^']*'|\"[^\"]*\"|[^>'\"]+)

--
nosy: +Gareth.Rees

Python tracker <http://bugs.python.org/issue28690>



[issue28676] On macOS Sierra, warning: implicit declaration of function 'getentropy'

2016-11-12 Thread Gareth Rees

New submission from Gareth Rees:

On macOS Sierra (OSX 10.12.1):

$ ./configure --with-pydebug && make
[... lots of output omitted ...]
    gcc -c -Wno-unused-result -Wsign-compare -g -O0 -Wall -Wstrict-prototypes
        -std=c99 -Wextra -Wno-unused-result -Wno-unused-parameter
        -Wno-missing-field-initializers -I. -I./Include -DPy_BUILD_CORE
        -o Python/random.o Python/random.c
    Python/random.c:97:19: warning: implicit declaration of function
        'getentropy' is invalid in C99 [-Wimplicit-function-declaration]
        res = getentropy(buffer, len);
              ^
    1 warning generated.

This is because OSX 10.12.1 has getentropy() but does not have
getrandom(). You can see this in pyconfig.h:

/* Define to 1 if you have the `getentropy' function. */
#define HAVE_GETENTROPY 1

/* Define to 1 if the getrandom() function is available */
/* #undef HAVE_GETRANDOM */

and this means that in Python/random.c the header <sys/random.h> is
not included:

    #  ifdef HAVE_GETRANDOM
    #    include <sys/random.h>
    #  elif defined(HAVE_GETRANDOM_SYSCALL)
    #    include <sys/syscall.h>
    #  endif

It's necessary to include <sys/random.h> if either HAVE_GETRANDOM or
HAVE_GETENTROPY is defined.

--
components: Build, macOS
files: getentropy.patch
keywords: patch
messages: 280669
nosy: Gareth.Rees, ned.deily, ronaldoussoren
priority: normal
severity: normal
status: open
title: On macOS Sierra, warning: implicit declaration of function 'getentropy'
type: compile error
versions: Python 3.7
Added file: http://bugs.python.org/file45465/getentropy.patch

Python tracker <http://bugs.python.org/issue28676>



[issue28647] python --help: -u is misdocumented as binary mode

2016-11-12 Thread Gareth Rees

Gareth Rees added the comment:

Here's a patch that copies the text for the -u option from the man page to the 
--help output.

--
keywords: +patch
Added file: http://bugs.python.org/file45463/issue28647.patch

Python tracker <http://bugs.python.org/issue28647>



[issue28647] python --help: -u is misdocumented as binary mode

2016-11-12 Thread Gareth Rees

Gareth Rees added the comment:

The output of "python3.5 --help" says:

-u : unbuffered binary stdout and stderr, stdin always buffered;
 also PYTHONUNBUFFERED=x
 see man page for details on internal buffering relating to '-u'

If you look at the man page as instructed then you'll see a clearer
explanation:

-u   Force  the  binary  I/O  layers  of  stdout  and  stderr  to  be
 unbuffered.  stdin is always buffered.  The text I/O layer  will
 still be line-buffered.

For example, if you try this:

python3.5 -uc 'import sys,time;w=sys.stdout.buffer.write;w(b"a");time.sleep(1);w(b"b");'

then you'll see that the binary output is indeed unbuffered as
documented.

The output of --help is trying to abbreviate this explanation, but I
think it's abbreviated too much. The explanation from the man page
seems clear to me, and is only a little longer, so I suggest changing
the --help output to match the man page.

--
nosy: +Gareth.Rees

Python tracker <http://bugs.python.org/issue28647>



[issue27588] Type objects are hashable and comparable for equality but this is not documented

2016-07-22 Thread Gareth Rees

New submission from Gareth Rees:

The type objects constructed by the metaclasses in the typing module are 
hashable and comparable for equality:

>>> from typing import *
>>> {Mapping[str, int], Mapping[int, str]}
{typing.Mapping[int, str], typing.Mapping[str, int]}
>>> Union[str, int, float] == Union[float, int, str]
True
>>> List[int] == List[float]
False

but this is not clearly documented in the documentation for the typing module 
(there are a handful of examples using equality, but it's not explicit that 
these are runnable).

It would be nice if there were explicit documentation for these properties of 
type objects.

--
assignee: docs@python
components: Documentation
messages: 270981
nosy: Gareth.Rees, docs@python
priority: normal
severity: normal
status: open
title: Type objects are hashable and comparable for equality but this is not 
documented
type: enhancement
versions: Python 3.5, Python 3.6

Python tracker <http://bugs.python.org/issue27588>



[issue24869] shlex lineno inaccurate with certain inputs

2016-06-13 Thread Gareth Rees

Gareth Rees added the comment:

A third alternative:

3. Add a method whose effect is to consume comments and whitespace, but which 
does not yield a token. You could then call this method, and then look at 
shlex.lineno, which will be the line number of the first character of the next 
token (if there is a next token).

--

Python tracker <http://bugs.python.org/issue24869>



[issue24869] shlex lineno inaccurate with certain inputs

2016-06-13 Thread Gareth Rees

Gareth Rees added the comment:

Just to restate the problem:

The use case is that when emitting an error message for a token, we want to 
include the number of the line containing the token (or the number of the line 
where the token started, if the token spans multiple lines, as it might if it's 
a string containing newlines).

But there is no way to satisfy this use case given the features of the shlex 
module. In particular, shlex.lineno (which looks as if it ought to help) is 
actually the line number of the first character that has not yet been consumed 
by the lexer, and in general this is not the same as the line number of the 
previous (or the next) token.

I can think of two alternatives that would satisfy the use case:

1. Instead of returning tokens as str objects, return them as instances of a 
subclass of str that has a property that gives the line number of the first 
character of the token. (Maybe it should also have properties for the column 
number of the first character, and the line and column number of the last 
character too? These properties would support better error messages.)

2. Add new methods that return tuples giving the token and its line number (and 
possibly column number etc. as in alternative 1).

My preference would be for alternative (1), but I suppose there is a very tiny 
risk of breaking some code that relied upon get_token returning an instance of 
str exactly rather than an instance of a subclass of str.
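
A minimal sketch of what alternative (1) could look like (the attribute names
here are illustrative, not a settled API):

    class Token(str):
        """A str subclass that also records where the token started and ended."""
        def __new__(cls, value, startline, endline):
            self = super().__new__(cls, value)
            self.startline = startline
            self.endline = endline
            return self

    t = Token("word", 3, 4)
    print(t == "word", t.startline, t.endline)  # True 3 4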

--
nosy: +Gareth.Rees

Python tracker <http://bugs.python.org/issue24869>



[issue27306] Grammatical Error in Documentation - Tarfile page

2016-06-13 Thread Gareth Rees

Gareth Rees added the comment:

Here's a patch improving the grammar in the tarfile documentation.

--
keywords: +patch
nosy: +Gareth.Rees
Added file: http://bugs.python.org/file43375/issue27306.patch

Python tracker <http://bugs.python.org/issue27306>



[issue20508] IndexError from ipaddress._BaseNetwork.__getitem__ has no message

2016-06-11 Thread Gareth Rees

Gareth Rees added the comment:

Thank you for applying this patch.

--

Python tracker <http://bugs.python.org/issue20508>



[issue14376] sys.exit documents argument as "integer" but actually requires "subtype of int"

2016-06-11 Thread Gareth Rees

Gareth Rees added the comment:

Let's not allow the perfect to be the enemy of the good here.

The issue I reported is a very specific one: in Python 2.7, if you pass a long 
to sys.exit, then the value of the long is not used as the exit code. This is 
bad because functions like os.spawnv that return exit codes (that you might 
reasonably want to pass on to sys.exit) can return them as long.

My patch only proposes to address this one issue. In order to keep the impact 
as small as possible, I do not propose to make any other changes, or address 
any other problems.

But in the comments here people have brought up THREE other issues:

1. Alexander Belopolsky expresses the concern that "(int)PyLong_AsLong(value) 
can silently convert non-zero error code to zero."

This is not a problem introduced by my patch -- the current code is:

exitcode = (int)PyInt_AsLong(value)

which has exactly the same problem (because PyIntObject stores its value as a 
long). So this concern (even if valid) is not a reason to reject my patch.

2. Ethan Furman wrote: "we need to protect against overflow from <long> to <int>"

But again, this is not a problem introduced by my patch. The current code says:

exitcode = (int)PyInt_AsLong(value);

and my patch does not change this line. The possibility of this overflow is not 
a reason to reject my patch.

3. Alexander says, "Passing anything other than one of the os.EX_* constants to 
sys.exit() is a bad idea"

First, this is not a problem introduced by my patch. The existing code in 
Python 2.7 allows you to specify other exit codes. So this problem (if it is a 
problem) is not a reason to reject my patch.

Second, this claim is surely not right -- when a subprocess fails it often 
makes sense to pass on the exit code of the subprocess, whatever that is. This 
is exactly the use case that I mentioned in my original report (that is, 
passing on the exit code from os.spawnv to sys.exit).

--

Python tracker <http://bugs.python.org/issue14376>



[issue20508] IndexError from ipaddress._BaseNetwork.__getitem__ has no message

2016-06-11 Thread Gareth Rees

Gareth Rees added the comment:

I've attached a revised patch that addresses Berker Peksag's concerns:

1. The message associated with the IndexError is now "address out of range" 
with no information about which address failed or why.

2. There's a new test case for an IndexError from an IPv6 address lookup.

--
Added file: http://bugs.python.org/file43341/ipaddress.patch

Python tracker <http://bugs.python.org/issue20508>



[issue24460] urlencode() of dictionary not as expected

2015-06-17 Thread Gareth Rees

Gareth Rees added the comment:

If you read the documentation for urllib.parse.urlencode [1], you'll
see that it says:

The value element in itself can be a sequence and in that case, if
the optional parameter doseq evaluates to True, individual
key=value pairs separated by '&' are generated for each element of
the value sequence for the key.

So you need to write:

>>> urllib.parse.urlencode(thisDict, doseq=True)
'SomeVar3=ghi&SomeVar1=abc&SomeVar2=def'

[1]: https://docs.python.org/3/library/urllib.parse.html#urllib.parse.urlencode

--
nosy: +Gareth.Rees

Python tracker <http://bugs.python.org/issue24460>



[issue24405] Missing code markup in Expressions documentation

2015-06-08 Thread Gareth Rees

New submission from Gareth Rees:

The Expressions documentation contains the text:

 * Sets and frozensets define comparison operators to mean subset and superset
  tests.  Those relations do not define total orderings (the two sets ``{1,2}``
  and {2,3} are not equal, nor subsets of one another, nor supersets of one
  another).

Here {2,3} should be marked up as code (like {1,2}) but is not.

--
assignee: docs@python
components: Documentation
files: markup.patch
keywords: patch
messages: 244996
nosy: Gareth.Rees, docs@python
priority: normal
severity: normal
status: open
title: Missing code markup in Expressions documentation
type: enhancement
Added file: http://bugs.python.org/file39657/markup.patch

Python tracker <http://bugs.python.org/issue24405>



[issue24406] Built-in Types documentation doesn't explain how dictionaries are compared for equality

2015-06-08 Thread Gareth Rees

Changes by Gareth Rees g...@garethrees.org:


--
title: Bulit-in Types documentation doesn't explain how dictionaries are 
compared for equality -> Built-in Types documentation doesn't explain how 
dictionaries are compared for equality

Python tracker <http://bugs.python.org/issue24406>



[issue24067] Weakproxy is an instance of collections.Iterator

2015-04-28 Thread Gareth Rees

Gareth Rees added the comment:

The documentation says that weakref.Proxy objects are not hashable because 
this avoids a number of problems related to their fundamentally mutable 
nature, and prevent their use as dictionary keys.

Hashable objects must be immutable, otherwise the hash might change, 
invalidating the invariants that make dictionaries work, but Proxy objects are 
fundamentally mutable: when there are no more strong references to the proxied 
object, the object gets destroyed and the Proxy object now refers to None. If 
the Proxy object were hashable then its hash would change at this point.

--

Python tracker <http://bugs.python.org/issue24067>



[issue24067] Weakproxy is an instance of collections.Iterator

2015-04-28 Thread Gareth Rees

Gareth Rees added the comment:

> I don't see any reason for proxy objects to be less hashable than ref objects.

The difference is that unlike a ref object, a proxy object is supposed to 
forward its method calls to the proxied object. So consider what happens if you 
forward the __hash__ method to the proxied object: the hash will change when 
the object dies.

A proxy object could, of course, not forward the __hash__ method, instead 
computing its own hash. But I think this would do more harm than good: surely 
most attempts to store weakref.Proxy objects in sets or dictionaries are going 
to be mistakes -- the user should have used a WeakKeyDictionary or a WeakSet 
instead.

--

Python tracker <http://bugs.python.org/issue24067>



[issue24067] Weakproxy is an instance of collections.Iterator

2015-04-28 Thread Gareth Rees

Gareth Rees added the comment:

Hashable is particularly misleading, because weakref.Proxy objects are not 
hashable regardless of the referent.

--

Python tracker <http://bugs.python.org/issue24067>



[issue24067] Weakproxy is an instance of collections.Iterator

2015-04-28 Thread Gareth Rees

Gareth Rees added the comment:

Not just Iterator, but Container, Hashable, Iterable, and Sized too!

>>> import weakref
>>> class C: pass
>>> o = C()
>>> w = weakref.proxy(o)
>>> from collections.abc import *
>>> isinstance(w, Container)
True
>>> isinstance(w, Hashable)
True
>>> isinstance(w, Iterable)
True
>>> isinstance(w, Sized)
True

--
nosy: +Gareth.Rees

Python tracker <http://bugs.python.org/issue24067>



[issue20606] Operator Documentation Example doesn't work

2015-01-04 Thread Gareth Rees

Gareth Rees added the comment:

This is a duplicate of #22180, which was fixed in changeset 9c250f34bfa3 by 
Raymond Hettinger in branch '3.4'. The fix just removes the bad example, as in 
my patch. So I suggest that this issue be closed as a duplicate.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue20606
___



[issue20941] pytime.c:184 and pytime.c:218: runtime error, outside the range of representable values of type 'long'

2014-03-16 Thread Gareth Rees

Gareth Rees added the comment:

> How did you get this warning?

This looks like runtime output from a program built using Clang/LLVM with 
-fsanitize=undefined. See here: 
http://clang.llvm.org/docs/UsersManual.html#controlling-code-generation

Signed integer overflow is undefined behaviour, so by the time *sec = 
(time_t)intpart has been evaluated, the undefined behaviour has already 
happened. It is too late to check for it afterwards.

--
nosy: +Gareth.Rees

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue20941
___



[issue20905] Adapt heapq push/pop/replace to allow passing a comparator.

2014-03-13 Thread Gareth Rees

Gareth Rees added the comment:

It would be better to accept a key function instead of a comparison
function (cf. heapq.nlargest and heapq.nsmallest).

But note that this has been proposed before and rejected: see
issue1904 where Raymond Hettinger provides this rationale:

Use cases aside, there is another design issue in that the
key-function approach doesn't work well with the heap functions on
regular lists. Successive calls to heap functions will of
necessity call the key-function multiple times for any given
element. This contrasts with sort(), where the whole purpose of
the key function was to encapsulate the decorate-sort-undecorate
pattern, which was desirable because the key-function was called
exactly once per element.

However, in the case of the bisect module (where requests for a key
function are also common), Guido was recently persuaded that there was
a valid use case. See issue4356, and this thread on the Python-ideas
mailing list:
https://mail.python.org/pipermail/python-ideas/2012-February/thread.html#13650
where Arnaud Delobelle points out that:

Also, in Python 3 one can't assume that values will be comparable so
the (key, value) tuple trick won't work: comparing the tuples may well
throw a TypeError.

and Guido responds:

Bingo. That clinches it. We need to add key=.
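
For illustration, a minimal sketch of the problem Arnaud describes (the dict
payloads are made up):

    import heapq
    from itertools import count

    heap = []
    heapq.heappush(heap, (2, {"task": "b"}))
    try:
        # equal keys force a comparison of the dict payloads -> TypeError in Python 3
        heapq.heappush(heap, (2, {"task": "a"}))
    except TypeError as e:
        print(e)

    # Common workaround: insert a monotonically increasing tie-breaker so that
    # the payloads are never compared.
    heap = []
    tiebreak = count()
    heapq.heappush(heap, (2, next(tiebreak), {"task": "b"}))
    heapq.heappush(heap, (2, next(tiebreak), {"task": "a"}))
    print(heapq.heappop(heap))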

--
nosy: +Gareth.Rees

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue20905
___



[issue20774] collections.deque should ship with a stdlib json serializer

2014-02-26 Thread Gareth Rees

Gareth Rees added the comment:

The JSON implementation uses these tests to determine how to serialize a Python 
object:

isinstance(o, (list, tuple))
isinstance(o, dict)

So any subclasses of list and tuple are serialized as a list, and any subclass 
of dict is serialized as an object. For example:

>>> json.dumps(collections.defaultdict())
'{}'
>>> json.dumps(collections.OrderedDict())
'{}'
>>> json.dumps(collections.namedtuple('mytuple', ())())
'[]'

When deserialized, you'll get back a plain dictionary or list, so there's no 
round-trip property here.

The tests could perhaps be changed to:

isinstance(o, collections.abc.Sequence)
isinstance(o, collections.abc.Mapping)

I'm not a JSON expert, so I have no informed opinion on whether this is a good 
idea or not, but in any case, this change wouldn't help with deques, as a deque 
is not a Sequence. That's because deques don't have an index method (see 
issue10059 and issue12543).
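
For illustration (MyDict is a made-up subclass):

    import collections
    import json

    class MyDict(dict):
        pass

    blob = json.dumps(MyDict(a=1))    # serialized as a plain JSON object
    print(type(json.loads(blob)))     # <class 'dict'> -- the subclass doesn't round-trip

    try:
        json.dumps(collections.deque([1, 2, 3]))
    except TypeError as e:
        print(e)                      # deque isn't handled at all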

--
nosy: +Gareth.Rees

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue20774
___



[issue20727] Improved roundrobin itertools recipe

2014-02-25 Thread Gareth Rees

Gareth Rees added the comment:

If 100 doesn't work for you, try a larger number.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue20727
___



[issue20727] Improved roundrobin itertools recipe

2014-02-25 Thread Gareth Rees

Gareth Rees added the comment:

I suspect I messed up the timing I did yesterday, because today I find that 100 
isn't large enough, but here's what I found today (in Python 3.3):

>>> from timeit import timeit
>>> test = [tuple(range(300))] + [()] * 100
>>> timeit(lambda:list(roundrobin1(*test)), number=1) # old recipe
8.386148632998811
>>> timeit(lambda:list(roundrobin2(*test)), number=1) # new recipe
16.757110453007044

The new recipe is more than twice as slow as the old in this case, and its 
performance gets relatively worse as you increase the number 300.

I should add that I do recognise that the new recipe is better for nearly all 
cases (it's simpler as well as faster), but I want to point out an important 
feature of the old recipe, namely that it discards iterables as they are 
finished with, giving it worst-case O(n) performance (albeit slow) whereas the 
new recipe has worst case O(n^2). As we found out with hash tables, worst-case 
O(n^2) performance can be a problem when inputs are untrusted, so there are use 
cases where people might legitimately prefer an O(n) solution even if it's a 
bit slower in common cases.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue20727
___



[issue20727] Improved roundrobin itertools recipe

2014-02-25 Thread Gareth Rees

Gareth Rees added the comment:

But now that I look at the code more carefully, the old recipe also has O(n^2) 
behaviour, because cycle(islice(nexts, pending)) costs O(n) and is called O(n) 
times. To have worst-case O(n) behaviour, you'd need something like this:

from collections import deque

def roundrobin3(*iterables):
    "roundrobin('ABC', 'D', 'EF') --> A D E B F C"
    nexts = deque(iter(it).__next__ for it in iterables)
    while nexts:
        try:
            while True:
                yield nexts[0]()
                nexts.rotate(-1)
        except StopIteration:
            nexts.popleft()

>>> from timeit import timeit
>>> test = [tuple(range(1000))] + [()] * 1000
>>> timeit(lambda:list(roundrobin1(*test)), number=100) # old recipe
5.184364624001319
>>> timeit(lambda:list(roundrobin2(*test)), number=100) # new recipe
5.139592286024708
>>> timeit(lambda:list(roundrobin3(*test)), number=100)
0.16217014100402594

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue20727
___



[issue20727] Improved roundrobin itertools recipe

2014-02-24 Thread Gareth Rees

Gareth Rees added the comment:

> benchmarks show it to be more than twice as fast

I'm sure they do, but other benchmarks show it to be more than twice as slow. 
Try something like:

iterables = [range(100)] + [()] * 100

--
nosy: +Gareth.Rees

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue20727
___



[issue12691] tokenize.untokenize is broken

2014-02-18 Thread Gareth Rees

Gareth Rees added the comment:

Thanks for your work on this, Terry. I apologise for the complexity of my 
original report, and will try not to do it again.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12691
___



[issue19363] Python 2.7's future_builtins.map is not compatible with Python 3's map

2014-02-17 Thread Gareth Rees

Gareth Rees added the comment:

Sorry about that; here it is. I had second thoughts about recommending zip() as 
an alternative (that would only work for cases where the None was constant; in 
other cases you might need lambda *args: args, but this seemed too 
complicated), so the note now says only:

Note: In Python 3, map() does not accept None for the
function argument.
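
For illustration, the two replacements mentioned above (both truncate at the
shortest input, unlike Python 2's map(None, ...), which pads with None):

    pairs = list(zip(range(3), 'ABC'))                     # [(0, 'A'), (1, 'B'), (2, 'C')]
    same = list(map(lambda *args: args, range(3), 'ABC'))  # same result for equal lengths
    print(pairs == same)                                   # True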

--
keywords: +patch
Added file: http://bugs.python.org/file34117/issue19363.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue19363
___



[issue20606] Operator Documentation Example doesn't work

2014-02-12 Thread Gareth Rees

Gareth Rees added the comment:

The failing example is:

d = {}
keys = range(256)
vals = map(chr, keys)
map(operator.setitem, [d]*len(keys), keys, vals)   

which works in Python 2 where map returns a list, but not in Python 3 where map 
returns an iterator.

Doc/library/operator.rst follows the example with this note:

.. XXX: find a better, readable, example

Additional problems with the example:

1. It's poorly motivated because a dictionary comprehension would be simpler 
and shorter:

d = {i: chr(i) for i in range(256)}

2. It's also unclear why you'd need this dictionary when you could just call 
the function chr (but I suppose some interface might require a dictionary 
rather than a function).

3. To force the map to be evaluated, you need to write list(map(...)) which 
allocates an unnecessary list object and then throws it away. To avoid the 
unnecessary allocation you could use the consume recipe from the itertools 
documentation and write collections.deque(map(...), maxlen=0) but this is 
surely too obscure to use as an example.
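
For reference, this is roughly what the example would have to become to run
under Python 3 -- which only underlines that it is no improvement on the dict
comprehension above:

    import collections
    import operator

    d = {}
    keys = range(256)
    vals = map(chr, keys)
    # Either materialize the map to force evaluation...
    list(map(operator.setitem, [d] * len(keys), keys, vals))
    # ...or exhaust it with the "consume" recipe, avoiding the throwaway list:
    # collections.deque(map(operator.setitem, [d] * len(keys), keys, vals), maxlen=0)
    print(len(d))    # 256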

I had a look through the Python sources, and made an Ohloh Code search for 
operator.setitem and I didn't find any good examples of its use, so I think 
the best thing to do is just to delete the example.

http://code.ohloh.net/search?s=%22operator.setitem%22&pp=0&fl=Python&mp=1&ml=1&me=1&md=1&ff=1&filterChecked=true

--
nosy: +Gareth.Rees

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue20606
___



[issue20606] Operator Documentation Example doesn't work

2014-02-12 Thread Gareth Rees

Changes by Gareth Rees g...@garethrees.org:


--
keywords: +patch
Added file: http://bugs.python.org/file34059/operator.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue20606
___



[issue20539] math.factorial may throw OverflowError

2014-02-07 Thread Gareth Rees

Gareth Rees added the comment:

It's not a case of internal storage overflowing. The error is from
Modules/mathmodule.c:1426 and it's the input 10**19 that's too large
to convert to a C long. You get the same kind of error in other
places where PyLong_AsLong or PyLong_AsInt is called on a
user-supplied value, for example:

>>> import pickle
>>> pickle.dumps(10**19, 10**19)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OverflowError: Python int too large to convert to C long
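
The factorial call from the title fails the same way (the exact wording of the
message differs between versions):

    import math

    try:
        math.factorial(10**19)    # the argument itself won't fit in a C long
    except OverflowError as e:
        print(e)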

--
nosy: +Gareth.Rees

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue20539
___



[issue12691] tokenize.untokenize is broken

2014-02-06 Thread Gareth Rees

Gareth Rees added the comment:

I did some research on the cause of this issue. The assertion was
added in this change by Jeremy Hylton in August 2006:
https://mail.python.org/pipermail/python-checkins/2006-August/055812.html
(The corresponding Mercurial commit is here:
http://hg.python.org/cpython/rev/cc992d75d5b3#l217.25).

At that point I believe the assertion was reasonable. I think it would
have been triggered by backslash-continued lines, but otherwise it
worked.

But in this change http://hg.python.org/cpython/rev/51e24512e305 in
March 2008 Trent Nelson applied this patch by Michael Foord
http://bugs.python.org/file9741/tokenize_patch.diff to implement PEP
263 and fix issue719888. The patch added ENCODING tokens to the output
of tokenize.tokenize(). The ENCODING token is always generated with
row number 0, while the first actual token is generated with row
number 1. So now every token stream from tokenize.tokenize() sets off
the assertion.

The lack of a test case for tokenize.untokenize() in full mode meant
that it was (and is) all too easy for someone to accidentally break it
like this.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12691
___



[issue12691] tokenize.untokenize is broken

2014-02-05 Thread Gareth Rees

Changes by Gareth Rees g...@garethrees.org:


--
assignee:  -> docs@python
components: +Documentation, Tests
nosy: +docs@python

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12691
___



[issue12691] tokenize.untokenize is broken

2014-02-05 Thread Gareth Rees

Changes by Gareth Rees g...@garethrees.org:


Removed file: http://bugs.python.org/file33919/Issue12691.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12691
___



[issue20507] TypeError from str.join has no message

2014-02-04 Thread Gareth Rees

New submission from Gareth Rees:

If you pass an object of the wrong type to str.join, Python raises a
TypeError with no error message:

Python 3.4.0b3 (default, Jan 27 2014, 02:26:41) 
[GCC 4.2.1 Compatible Apple LLVM 5.0 (clang-500.2.79)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> ''.join(1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError

It is unnecessarily hard to understand from this error what the
problem actually was. Which object had the wrong type? What type
should it have been? Normally a TypeError is associated with a message
explaining which type was wrong, and what it should have been. For
example:

>>> b''.join(1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can only join an iterable

It would be nice if the TypeError from ''.join(1) included a message
like this.

The reason for the lack of message is that PyUnicode_Join starts out
by calling PySequence_Fast(seq, "") which suppresses the error message
from PyObject_GetIter. This commit by Tim Peters is responsible:
http://hg.python.org/cpython/rev/8579859f198c. The commit message
doesn't mention the suppression of the message, so I assume that it
was an oversight.

I suggest replacing the line:

fseq = PySequence_Fast(seq, "");

in PyUnicode_Join in unicodeobject.c with:

fseq = PySequence_Fast(seq, "can only join an iterable");

for consistency with bytes_join in stringlib/join.h. Patch attached.

--
components: Interpreter Core
files: join.patch
keywords: patch
messages: 210200
nosy: Gareth.Rees
priority: normal
severity: normal
status: open
title: TypeError from str.join has no message
type: behavior
versions: Python 3.4
Added file: http://bugs.python.org/file33900/join.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue20507
___



[issue20508] IndexError from ipaddress._BaseNetwork.__getitem__ has no message

2014-02-04 Thread Gareth Rees

New submission from Gareth Rees:

If you try to look up an out-of-range address from an object returned
by ipaddress.ip_network, then ipaddress._BaseNetwork.__getitem__
raises an IndexError with no message:

Python 3.4.0b3 (default, Jan 27 2014, 02:26:41) 
[GCC 4.2.1 Compatible Apple LLVM 5.0 (clang-500.2.79)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import ipaddress
>>> ipaddress.ip_network('2001:db8::8/125')[100]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/ipaddress.py", line 601, in __getitem__
    raise IndexError
IndexError

Normally an IndexError is associated with a message explaining the
cause of the error. For example:

>>> [].pop()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: pop from empty list

It would be nice if the IndexError from
ipaddress._BaseNetwork.__getitem__ included a message like this.

With the attached patch, the error message looks like this in the
positive case:

>>> ipaddress.ip_network('2001:db8::8/125')[100]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/gdr/hg.python.org/cpython/Lib/ipaddress.py", line 602, in __getitem__
    % (self, self.num_addresses))
IndexError: 100 out of range 0..7 for 2001:db8::8/125

and like this in the negative case:

>>> ipaddress.ip_network('2001:db8::8/125')[-100]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/gdr/hg.python.org/cpython/Lib/ipaddress.py", line 608, in __getitem__
    % (n - 1, self.num_addresses, self))
IndexError: -100 out of range -8..-1 for 2001:db8::8/125

(If you have a better suggestion for how the error message should
read, I could submit a revised patch. I suppose it could just say
"address index out of range" for consistency with list.__getitem__ and
str.__getitem__. But I think the extra information is likely to be
helpful for the programmer who is trying to track down the cause of an
error.)

--
components: Library (Lib)
files: ipaddress.patch
keywords: patch
messages: 210224
nosy: Gareth.Rees
priority: normal
severity: normal
status: open
title: IndexError from ipaddress._BaseNetwork.__getitem__ has no message
type: behavior
versions: Python 3.4
Added file: http://bugs.python.org/file33903/ipaddress.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue20508
___



[issue20508] IndexError from ipaddress._BaseNetwork.__getitem__ has no message

2014-02-04 Thread Gareth Rees

Changes by Gareth Rees g...@garethrees.org:


--
type: behavior -> enhancement

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue20508
___



[issue19362] Documentation for len() fails to mention that it works on sets

2014-02-04 Thread Gareth Rees

Gareth Rees added the comment:

Here's a revised patch using Ezio's suggestion ("Return the number of items of 
a sequence or container").

--
Added file: http://bugs.python.org/file33904/len-set.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue19362
___



[issue19362] Documentation for len() fails to mention that it works on sets

2014-02-04 Thread Gareth Rees

Changes by Gareth Rees g...@garethrees.org:


--
title: Documentation for len() fails to mention that it works   on sets -> 
Documentation for len() fails to mention that it works on sets
versions: +Python 3.4

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue19362
___



[issue14376] sys.exit documents argument as integer but actually requires subtype of int

2014-02-04 Thread Gareth Rees

Gareth Rees added the comment:

Patch attached. I added a test case to Lib/test/test_sys.py.

--
Added file: http://bugs.python.org/file33906/exit.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue14376
___



[issue20510] Test cases in test_sys don't match the comments

2014-02-04 Thread Gareth Rees

New submission from Gareth Rees:

Lib/test/test_sys.py contains test cases with incorrect comments -- or
comments with incorrect test cases, if you prefer:

# call without argument
try:
    sys.exit(0)
except SystemExit as exc:
    self.assertEqual(exc.code, 0)
...

# call with tuple argument with one entry
# entry will be unpacked
try:
    sys.exit(42)
except SystemExit as exc:
    self.assertEqual(exc.code, 42)
...

# call with integer argument
try:
    sys.exit((42,))
except SystemExit as exc:
    self.assertEqual(exc.code, 42)
...

(In the quote above I've edited out some inessential detail; see the
file if you really want to know.)

You can see that in the first test case sys.exit is called with an
argument (although the comment claims otherwise); in the second it is
called with an integer (not a tuple), and in the third it is called
with a tuple (not an integer).

These comments have been unchanged since the original commit by Walter
Dörwald http://hg.python.org/cpython/rev/6a1394660270. I've attached
a patch that corrects the first test case and swaps the comments for
the second and third test cases:

# call without argument
rc = subprocess.call([sys.executable, "-c",
                      "import sys; sys.exit()"])
self.assertEqual(rc, 0)

# call with integer argument
try:
    sys.exit(42)
except SystemExit as exc:
    self.assertEqual(exc.code, 42)
...

# call with tuple argument with one entry
# entry will be unpacked
try:
    sys.exit((42,))
except SystemExit as exc:
    self.assertEqual(exc.code, 42)
...

Note that in the first test case (without an argument) sys.exit() with
no argument actually raises SystemExit(None), so it's not sufficient
to catch the SystemExit and check exc.code; I need to check that it
actually gets translated to 0 on exit.

--
components: Tests
files: exittest.patch
keywords: patch
messages: 210246
nosy: Gareth.Rees
priority: normal
severity: normal
status: open
title: Test cases in test_sys don't match the comments
type: enhancement
versions: Python 3.4
Added file: http://bugs.python.org/file33908/exittest.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue20510
___



[issue19363] Python 2.7's future_builtins.map is not compatible with Python 3's map

2014-02-04 Thread Gareth Rees

Gareth Rees added the comment:

What about a documentation change instead? The future_builtins chapter
http://docs.python.org/2/library/future_builtins.html in the
standard library documentation could note the incompatibility.

I've attached a patch which adds the following note to the
documentation for future_builtins.map:

Note: In Python 3, map() does not accept None for the function
argument. (zip() can be used instead.)

--
status: closed -> open

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue19363
___



[issue20510] Test cases in test_sys don't match the comments

2014-02-04 Thread Gareth Rees

Gareth Rees added the comment:

I normally try not to make changes while we're in here for fear of
introducing errors! But I guess the test cases are less critical, so
I've taken your review comments as a license to submit a revised patch
that:

* incorporates your suggestion to use assert_python_ok from
  test.script_helper, instead of subprocess.call;
* replaces the other uses of subprocess.call with
  assert_python_failure and adds a check on stdout;
* cleans up the assertion-testing code using the context manager form
  of unittest.TestCase.assertRaises.

I've signed and submitted a contributor agreement as requested.
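
For illustration only, a rough sketch of the idioms mentioned above (not the
actual patch; in current trees assert_python_ok lives in
test.support.script_helper):

    import sys
    import unittest
    from test.support.script_helper import assert_python_ok

    class ExitTests(unittest.TestCase):

        def test_exit_without_argument(self):
            # run a child interpreter and check the real process exit status
            rc, out, err = assert_python_ok("-c", "import sys; sys.exit()")
            self.assertEqual(rc, 0)

        def test_exit_with_too_many_arguments(self):
            # context-manager form of assertRaises
            with self.assertRaises(TypeError):
                sys.exit(1, 2)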

--
Added file: http://bugs.python.org/file33914/exittest-1.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue20510
___



[issue19362] Documentation for len() fails to mention that it works on sets

2014-02-04 Thread Gareth Rees

Gareth Rees added the comment:

Here's a revised patch for Terry (Return the number of items of a sequence or 
collection.)

--
Added file: http://bugs.python.org/file33916/len-set.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue19362
___



[issue12691] tokenize.untokenize is broken

2014-02-04 Thread Gareth Rees

Gareth Rees added the comment:

Yury, let me see if I can move this issue forward. I clearly haven't
done a good job of explaining these problems, how they are related,
and why it makes sense to solve them together, so let me have a go
now.

1. tokenize.untokenize() raises AssertionError if you pass it a
   sequence of tokens output from tokenize.tokenize(). This was my
   original problem report, and it's still not fixed in Python 3.4:

  Python 3.4.0b3 (default, Jan 27 2014, 02:26:41) 
  [GCC 4.2.1 Compatible Apple LLVM 5.0 (clang-500.2.79)] on darwin
  Type "help", "copyright", "credits" or "license" for more information.
  >>> import tokenize, io
  >>> t = list(tokenize.tokenize(io.BytesIO('1+1'.encode('utf8')).readline))
  >>> tokenize.untokenize(t)
  Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    File "/opt/local/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/tokenize.py", line 317, in untokenize
      out = ut.untokenize(iterable)
    File "/opt/local/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/tokenize.py", line 246, in untokenize
      self.add_whitespace(start)
    File "/opt/local/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/tokenize.py", line 232, in add_whitespace
      assert row <= self.prev_row
  AssertionError

   This defeats any attempt to use the sequence:

  input code -> tokenize -> transform -> untokenize -> output code

   to transform Python code. But this ought to be the main use case
   for the untokenize function! That's how I came across the problem
   in the first place, when I was starting to write Minipy
   https://github.com/gareth-rees/minipy.

2. Fixing problem #1 is easy (just swap <= for >=), but it raises the
   question: why wasn't this mistake caught by test_tokenize? There's
   a test function roundtrip() whose docstring says:

  Test roundtrip for `untokenize`. `f` is an open file or a
  string. The source code in f is tokenized, converted back to
  source code via tokenize.untokenize(), and tokenized again from
  the latter. The test fails if the second tokenization doesn't
  match the first.

   If I don't fix the problem with roundtrip(), then how can I be
   sure I have fixed the problem? Clearly it's necessary to fix the
   test case and establish that it provokes the assertion.

   So why doesn't roundtrip() detect the error? Well, it turns out
   that tokenize.untokenize() has two modes of operation and
   roundtrip() only tests one of them.

   The documentation for tokenize.untokenize() is rather cryptic, and
   all it says is:

  Each element returned by the [input] iterable must be a token
  sequence with at least two elements, a token number and token
  value. If only two tokens are passed, the resulting output is
  poor.

   By reverse-engineering the implementation, it seems that it has two
   modes of operation.

   In the first mode (which I have called compatibility mode after
   the method Untokenizer.compat() that implements it) you pass it
   tokens in the form of 2-element tuples (type, text). These must
   have exactly 2 elements.

   In the second mode (which I have called full mode based on the
   description full input in the docstring) you pass it tokens in
   the form of tuples with 5 elements (type, text, start, end, line).
   These are compatible with the namedtuples returned from
   tokenize.tokenize().

   The full mode has the buggy assertion, but
   test_tokenize.roundtrip() only tests the compatibility mode.

   So I must (i) fix roundtrip() so that it tests both modes; (ii)
   improve the documentation for tokenize.untokenize() so that
   programmers have some chance of figuring this out in future!
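
   For reference, a minimal illustration of the two modes (on an interpreter
   where the assertion above has been fixed, both calls succeed;
   compatibility-mode output may differ from the input in whitespace):

      import io
      import tokenize

      source = b"1 + 2\n"
      tokens = list(tokenize.tokenize(io.BytesIO(source).readline))

      # Full mode: 5-tuples straight from tokenize.tokenize().
      print(tokenize.untokenize(tokens))

      # Compatibility mode: only (type, string) pairs.
      print(tokenize.untokenize((tok.type, tok.string) for tok in tokens))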

3. As soon as I make roundtrip() test both modes it provokes the
   assertion failure. Good, so I can fix the assertion. Problem #1
   solved.

   But now there are test failures in full mode:

  $ ./python.exe -m test test_tokenize
  [1/1] test_tokenize
  **********************************************************************
  File "/Users/gdr/hg.python.org/cpython/Lib/test/test_tokenize.py", line ?, in test.test_tokenize.__test__.doctests
  Failed example:
      for testfile in testfiles:
          if not roundtrip(open(testfile, 'rb')):
              print("Roundtrip failed for file %s" % testfile)
              break
      else: True
  Expected:
      True
  Got:
      Roundtrip failed for file /Users/gdr/hg.python.org/cpython/Lib/test/test_platform.py
  **********************************************************************
  1 items had failures:
     1 of  73 in test.test_tokenize.__test__.doctests
  ***Test Failed*** 1 failures.
  test test_tokenize failed -- 1 of 78 doctests failed
  1 test failed:
      test_tokenize

   Examination of the failed tokenization shows that if the source

[issue12691] tokenize.untokenize is broken

2014-02-04 Thread Gareth Rees

Changes by Gareth Rees g...@garethrees.org:


--
nosy: +benjamin.peterson

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12691
___



[issue19362] Documentation for len() fails to mention that it works on sets

2013-10-23 Thread Gareth Rees

New submission from Gareth Rees:

The help text for the len() built-in function says:

Return the number of items of a sequence or mapping.

This omits to mention that len() works on sets too. I suggest this be changed 
to:

Return the number of items of a sequence, mapping, or set.

Similarly, the documentation for len() says:

The argument may be a sequence (string, tuple or list) or a mapping 
(dictionary).

I suggest this be changed to

The argument may be a sequence (string, tuple or list), a mapping 
(dictionary), or a set.

(Of course, strictly speaking, len() accepts any object with a __len__ method, 
but sequences, mappings and sets are the ones that are built-in to the Python 
core, and so these are the ones it is important to mention in the help and the 
documentation.)
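
For illustration, len() works with any class that defines __len__ (Bag is made
up for the example):

    class Bag:
        def __init__(self, items):
            self._items = list(items)

        def __len__(self):
            return len(self._items)

    print(len(Bag("abc")))    # 3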

--
assignee: docs@python
components: Documentation
files: len-set.patch
keywords: patch
messages: 201019
nosy: Gareth.Rees, docs@python
priority: normal
severity: normal
status: open
title: Documentation for len() fails to mention that it works on sets
type: enhancement
Added file: http://bugs.python.org/file32313/len-set.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue19362
___



[issue19363] Python 2.7's future_builtins.map is not compatible with Python 3's map

2013-10-23 Thread Gareth Rees

New submission from Gareth Rees:

In Python 2.7, future_builtins.map accepts None as its first (function) 
argument:

Python 2.7.5 (default, Aug  1 2013, 01:01:17) 
>>> from future_builtins import map
>>> list(map(None, range(3), 'ABC'))
[(0, 'A'), (1, 'B'), (2, 'C')]

But in Python 3.x, map does not accept None as its first argument:

Python 3.3.2 (default, May 21 2013, 11:50:47) 
>>> list(map(None, range(3), 'ABC'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'NoneType' object is not callable

The documentation says, "if you want to write code compatible with Python 3 
builtins, import them from this module", so this incompatibility may give 
Python 2.7 programmers the false impression that a program which uses map(None, 
...) is portable to Python 3.

I suggest that future_builtins.map in Python 2.7 should behave the same as map 
in Python 3: that is, it should raise a TypeError if None was passed as the 
first argument.

--
components: Library (Lib)
messages: 201020
nosy: Gareth.Rees
priority: normal
severity: normal
status: open
title: Python 2.7's future_builtins.map is not compatible with Python 3's map
type: behavior
versions: Python 2.7

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue19363
___



[issue19362] Documentation for len() fails to mention that it works on sets

2013-10-23 Thread Gareth Rees

Gareth Rees added the comment:

I considered suggesting "container", but the problem is that "container" is 
used elsewhere to mean "object supporting the 'in' operator" (in particular, 
collections.abc.Container has a __contains__ method but no __len__ method).

The abstract base class for objects with a length is collections.abc.Sized, 
but I don't think using the term "sized" would be clear to users.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue19362
___



[issue14376] sys.exit documents argument as integer but actually requires subtype of int

2012-03-21 Thread Gareth Rees

Gareth Rees g...@garethrees.org added the comment:

> Wouldn't you also have to deal with possible errors from the PyInt_AsLong 
> call?

Good point. But I note that Python 3 just does

exitcode = (int)PyLong_AsLong(value);

so maybe it's not important to do error handling here.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue14376
___



[issue14376] sys.exit documents argument as integer but actually requires subtype of int

2012-03-20 Thread Gareth Rees

New submission from Gareth Rees g...@garethrees.org:

The documentation for sys.exit says, "The optional argument arg can be an 
integer giving the exit status (defaulting to zero), or another type of object."

However, the arguments that are treated as exit statuses are actually subtypes 
of int.

So, a bool argument is fine:

$ python2.7 -c "import sys; sys.exit(False)"; echo $?
0

But a long argument is not:

$ python2.7 -c "import sys; sys.exit(long(0))"; echo $?
0
1

The latter behaviour can be surprising since functions like os.spawnv may 
return the exit status of the executed process as a long on some platforms, so 
that if you try to pass on the exit code via

code = os.spawnv(...)
sys.exit(code)

you may get a mysterious surprise: code is 0 but exit code is 1.

It would be simple to change line 1112 of pythonrun.c from

if (PyInt_Check(value))

to

if (PyInt_Check(value) || PyLong_Check(value))

(This issue is not present in Python 3 because there is no longer a distinction 
between int and long.)

--
components: Library (Lib)
messages: 156470
nosy: Gareth.Rees
priority: normal
severity: normal
status: open
title: sys.exit documents argument as integer but actually requires subtype 
of int
type: behavior
versions: Python 2.6, Python 2.7

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue14376
___



[issue12700] test_faulthandler fails on Mac OS X Lion

2011-08-08 Thread Gareth Rees

Gareth Rees g...@garethrees.org added the comment:

After changing NULL to (int *)1, all tests pass.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12700
___



[issue12700] test_faulthandler fails on Mac OS X Lion

2011-08-08 Thread Gareth Rees

Gareth Rees g...@garethrees.org added the comment:

All tests now pass.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12700
___



[issue12691] tokenize.untokenize is broken

2011-08-05 Thread Gareth Rees

Gareth Rees g...@garethrees.org added the comment:

I think I can make these changes independently and issue two patches, one 
fixing the problems with untokenize listed here, and another improving tokenize.

I've just noticed a third bug in untokenize: in full mode, it doesn't handle 
backslash-continued lines correctly.

Python 3.3.0a0 (default:c099ba0a278e, Aug  2 2011, 12:35:03) 
[GCC 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2335.15.00)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from io import BytesIO
>>> from tokenize import tokenize, untokenize
>>> untokenize(tokenize(BytesIO('1 and \\\n not 2'.encode('utf8')).readline))
b'1 andnot 2'

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12691
___


