[issue46065] re.findall takes forever and never ends
Gareth Rees added the comment:

This kind of question is frequently asked (#3128, #29977, #28690, #30973, #1737127, etc.), and so maybe it deserves an answer somewhere in the Python documentation.

--
resolution:  -> wont fix
stage:  -> resolved
status: open -> closed

___ Python tracker <https://bugs.python.org/issue46065> ___
___ Python-bugs-list mailing list
Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue46065] re.findall takes forever and never ends
Gareth Rees added the comment:

The way to avoid this behaviour is to disallow the attempts at matching that you know are going to fail. As Serhiy described above, if the search fails starting at the first character of the string, it will move forward and try again starting at the second character. But you know that this new attempt must fail, so you can force the regular expression engine to discard the attempt immediately.

Here's an illustration in a simpler setting, where we are looking for all strings of 'a' followed by 'b':

    >>> import re
    >>> from timeit import timeit
    >>> text = 'a' * 10
    >>> timeit(lambda: re.findall(r'a+b', text), number=1)
    6.64353118114

We know that any successful match must be preceded by a character other than 'a' (or the beginning of the string), so we can reject many unsuccessful matches like this:

    >>> timeit(lambda: re.findall(r'(?:^|[^a])(a+b)', text), number=1)
    0.00374348114981

In your case, a successful match must be preceded by [^a-zA-Z0-9_.+-] (or the beginning of the string).

--
nosy: +g...@garethrees.org

___ Python tracker <https://bugs.python.org/issue46065> ___
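For readers skimming the archive: the guard changes the performance, not the result set, because the guarded group captures exactly the same substrings. A small sketch (the sample text is illustrative, not from the original report):

```python
import re

text = "aab ab aaa b"

# Naive pattern: on a long run of 'a' with no following 'b', the engine
# retries the match from every position inside the run.
naive = re.findall(r"a+b", text)

# Guarded pattern: an attempt is discarded immediately unless it starts
# at the beginning of the string or just after a non-'a' character.
guarded = re.findall(r"(?:^|[^a])(a+b)", text)

print(naive)    # ['aab', 'ab']
print(guarded)  # ['aab', 'ab']
```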
[issue15443] datetime module has no support for nanoseconds
Gareth Rees added the comment:

I also have a use case that would benefit from nanosecond resolution in Python's datetime objects: representing and querying the results of clock_gettime() in a program trace. On modern Linuxes with a vDSO, clock_gettime() does not require a system call and completes within a few nanoseconds, so Python's datetime objects do not have sufficient resolution to distinguish between adjacent calls to clock_gettime(). This means that, like Mark Dickinson above, I have to choose between using datetime for queries (which would be convenient) while accepting that nearby events in the trace may be indistinguishable, and implementing my own datetime-like data structure.

--
nosy: +g...@garethrees.org

___ Python tracker <https://bugs.python.org/issue15443> ___
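To make the resolution gap concrete: time.clock_gettime_ns() (Python 3.7+) reports integer nanoseconds, while datetime's smallest representable step is one microsecond, so nearby readings collapse to the same datetime. A sketch (the fixed timestamp is an illustrative value):

```python
import time
from datetime import datetime, timedelta, timezone

# Nanosecond-resolution reading of the OS clock.
t_ns = time.clock_gettime_ns(time.CLOCK_REALTIME)

# datetime cannot represent anything finer than one microsecond.
print(datetime.resolution)   # 0:00:00.000001

# Two instants 400 ns apart map to the same datetime.
base_ns = 1_700_000_000_000_000_000
d1 = datetime.fromtimestamp(base_ns / 1e9, tz=timezone.utc)
d2 = datetime.fromtimestamp((base_ns + 400) / 1e9, tz=timezone.utc)
print(d1 == d2)              # True
```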
[issue45643] SIGSTKFLT is missing from the signals module on Linux
Gareth Rees added the comment:

Tagging vstinner as you have touched Modules/signalmodule.c a few times in the last year. What do you think?

--
nosy: +vstinner

___ Python tracker <https://bugs.python.org/issue45643> ___
[issue45643] SIGSTKFLT is missing from the signals module on Linux
Change by Gareth Rees:

--
keywords: +patch
pull_requests: +27529
stage:  -> patch review
pull_request: https://github.com/python/cpython/pull/29266

___ Python tracker <https://bugs.python.org/issue45643> ___
[issue45643] SIGSTKFLT is missing from the signals module on Linux
New submission from Gareth Rees:

BACKGROUND

On Linux, "man 7 signal" includes SIGSTKFLT in its table of "various other signals":

    Signal      Value     Action   Comment
    ──────────────────────────────────────────────────────────────────
    SIGSTKFLT   -,16,-    Term     Stack fault on coprocessor (unused)

Here "-,16,-" means that the signal is defined with the value 16 on x86 and ARM but not on Alpha, SPARC or MIPS. I believe that the intention was to use SIGSTKFLT for stack faults on the x87 math coprocessor, but this was either removed or never implemented, so that the signal is defined in /usr/include/signal.h but not used by the Linux kernel.

USE CASE

SIGSTKFLT is one of a handful of signals that are not used by the kernel, so that user-space programs are free to use it for their own purposes, for example for inter-thread or inter-process pre-emptive communication. Accordingly, it would be nice if the name SIGSTKFLT were available in the Python signal module on the platforms where the signal is available, for use and reporting in these cases.

--
components: Library (Lib)
messages: 405174
nosy: g...@garethrees.org
priority: normal
severity: normal
status: open
title: SIGSTKFLT is missing from the signals module on Linux
type: enhancement
versions: Python 3.11

___ Python tracker <https://bugs.python.org/issue45643> ___
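A sketch of the signalling use case described above; SIGUSR1 stands in for SIGSTKFLT where the latter is unavailable (the fallback is my assumption, not part of the report):

```python
import os
import signal

received = []

def handler(signum, frame):
    # A real program might wake a worker loop here; we just record it.
    received.append(signum)

# Use SIGSTKFLT where the platform (and Python) expose it; otherwise
# fall back to SIGUSR1, which is equally free for application use.
signo = getattr(signal, "SIGSTKFLT", signal.SIGUSR1)
signal.signal(signo, handler)

# Deliver the signal to ourselves; the Python-level handler runs at the
# next bytecode boundary.
os.kill(os.getpid(), signo)
print(received == [signo])
```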
[issue45476] [C API] Convert "AS" functions, like PyFloat_AS_DOUBLE(), to static inline functions
Gareth Rees added the comment:

If the problem is accidental use of the result of PyFloat_AS_DOUBLE() as an lvalue, why not use the comma operator to ensure that the result is an rvalue? The C99 standard says "A comma operator does not yield an lvalue" in §6.5.17; I imagine there is similar text in other versions of the standard.

The idea would be to define a helper macro like this:

    /* As expr, but can only be used as an rvalue. */
    #define Py_RVALUE(expr) ((void)0, (expr))

and then use the helper where needed, for example:

    #define PyFloat_AS_DOUBLE(op) Py_RVALUE(((PyFloatObject *)(op))->ob_fval)

--
nosy: +g...@garethrees.org

___ Python tracker <https://bugs.python.org/issue45476> ___
[issue41092] Report actual size from 'os.path.getsize'
Gareth Rees added the comment:

The proposed change adds a Boolean flag to os.path.getsize() so that it returns:

    os.stat(filename).st_blocks * 512

(where the 512 is the file system block size on Linux; some work is needed to make this portable to other operating systems). The Boolean argument here would always be constant in practice -- that is, you'd always call it like this:

    virtual_size = os.path.getsize(filename, apparent=True)
    allocated_size = os.path.getsize(filename, apparent=False)

and never like this:

    x_size = os.path.getsize(filename, apparent=x)

where x varies at runtime. The "no constant bool arguments" design principle [1] suggests that this should be added as a new function, something like os.path.getallocatedsize().

[1] https://mail.python.org/pipermail/python-ideas/2016-May/040181.html

--
nosy: +g...@garethrees.org

___ Python tracker <https://bugs.python.org/issue41092> ___
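A sketch of the computation being discussed; getallocatedsize is the hypothetical name from the comment (note POSIX documents st_blocks in 512-byte units regardless of the file system's block size, which is part of the portability question):

```python
import os
import tempfile

def getallocatedsize(path):
    # st_blocks counts 512-byte units allocated on disk; a sparse or
    # compressed file can report less than its apparent length.
    return os.stat(path).st_blocks * 512

with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"x" * 4096)
    path = f.name

apparent = os.path.getsize(path)   # the file's length: 4096
allocated = getallocatedsize(path)
print(apparent, allocated)
os.unlink(path)
```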
[issue40707] Popen.communicate documentation does not say how to get the return code
Gareth Rees added the comment:

Is there anything I can do to move this forward?

--

___ Python tracker <https://bugs.python.org/issue40707> ___
[issue40707] Popen.communicate documentation does not say how to get the return code
Gareth Rees added the comment:

The following test cases in test_subprocess.py call the communicate() method and then immediately assert that the returncode attribute has the expected value:

* test_stdout_none
* test_stderr_redirect_with_no_stdout_redirect
* test_stdout_filedes_of_stdout
* test_communicate_stdin
* test_universal_newlines_communicate_stdin
* test_universal_newlines_communicate_input_none
* test_universal_newlines_communicate_stdin_stdout_stderr
* test_nonexisting_with_pipes
* test_wait_when_sigchild_ignored
* test_startupinfo_copy
* test_close_fds_with_stdio

You'll see that some of these test for success (returncode == 0) and some for failure (returncode == 1). This seems like adequate test coverage to me, but if something is missing, let me know.

--

___ Python tracker <https://bugs.python.org/issue40707> ___
[issue40707] Popen.communicate documentation does not say how to get the return code
Change by Gareth Rees:

--
keywords: +patch
pull_requests: +19559
stage:  -> patch review
pull_request: https://github.com/python/cpython/pull/20283

___ Python tracker <https://bugs.python.org/issue40707> ___
[issue40707] Popen.communicate documentation does not say how to get the return code
New submission from Gareth Rees:

When using subprocess.Popen.communicate(), it is natural to wonder how to get the exit code of the subprocess. However, the documentation [1] says:

    Interact with process: Send data to stdin. Read data from stdout and
    stderr, until end-of-file is reached. Wait for process to terminate.
    The optional input argument should be data to be sent to the child
    process, or None, if no data should be sent to the child. If streams
    were opened in text mode, input must be a string. Otherwise, it must
    be bytes.

    communicate() returns a tuple (stdout_data, stderr_data). The data
    will be strings if streams were opened in text mode; otherwise,
    bytes.

If you can guess that communicate() might set returncode, then you can find what you need in the documentation for that attribute [2]:

    The child return code, set by poll() and wait() (and indirectly by
    communicate()).

I suggest that the documentation for communicate() be updated to mention that it sets the returncode attribute. This would be consistent with poll() and wait(), which already mention this.

[1]: https://docs.python.org/3/library/subprocess.html#subprocess.Popen.communicate
[2]: https://docs.python.org/3/library/subprocess.html#subprocess.Popen.returncode

--
assignee: docs@python
components: Documentation
messages: 369502
nosy: docs@python, g...@garethrees.org
priority: normal
severity: normal
status: open
title: Popen.communicate documentation does not say how to get the return code
type: enhancement
versions: Python 3.10, Python 3.5, Python 3.6, Python 3.7, Python 3.8, Python 3.9

___ Python tracker <https://bugs.python.org/issue40707> ___
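For illustration, the pattern the suggested documentation change points at: call communicate(), then read returncode (a sketch, not text from the docs):

```python
import subprocess
import sys

# A child process that prints to stdout and exits with status 3.
proc = subprocess.Popen(
    [sys.executable, "-c", "print('hello'); raise SystemExit(3)"],
    stdout=subprocess.PIPE,
    text=True,
)
stdout_data, stderr_data = proc.communicate()

# communicate() waits for the child to terminate, so the returncode
# attribute is set by the time it returns.
print(repr(stdout_data), proc.returncode)   # 'hello\n' 3
```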
[issue17005] Add a topological sort algorithm
Gareth Rees added the comment:

I'd like to push back on the idea that graphs with isolated vertices are "unusual cases", as suggested by Raymond.

A very common use case (possibly the most common) for topological sorting is job scheduling. In this use case you have a collection of jobs, some of which have dependencies on other jobs, and you want to output a schedule according to which the jobs can be executed so that each job is executed after all its dependencies. In this use case, any job that has no dependencies, and is not itself a dependency of any other job, is an isolated vertex in the dependency graph.

This means that the proposed interface (that is, the interface taking only pairs of vertices) will not be suitable for this use case, and any programmer who tries to use it for this use case will be setting themselves up for failure.

--

___ Python tracker <https://bugs.python.org/issue17005> ___
[issue17005] Add a topological sort algorithm
Gareth Rees added the comment:

Just to elaborate on what I mean by "bug magnet". (I'm sure Pablo understands this, but there may be other readers who would like to see it spelled out.)

Suppose that you have a directed graph represented as a mapping from a vertex to an iterable of its out-neighbours. Then the "obvious" way to get a total order on the vertices in the graph would be to generate the edges and pass them to topsort:

    def edges(graph):
        return ((v, w) for v, ww in graph.items() for w in ww)

    order = topsort(edges(graph))

This will appear to work fine if it is never tested with a graph that has isolated vertices (which would be an all too easy omission). To handle isolated vertices you have to remember to write something like this:

    reversed_graph = {v: [] for v in graph}
    for v, ww in graph.items():
        for w in ww:
            reversed_graph[w].append(v)
    order = topsort(edges(graph)) + [
        v for v, ww in graph.items()
        if not ww and not reversed_graph[v]]

I think it likely that beginner programmers will forget to do this and be surprised later on when their total order is missing some of the vertices.

--

___ Python tracker <https://bugs.python.org/issue17005> ___
[issue17005] Add a topological sort algorithm
Gareth Rees added the comment:

I approve in general of the principle of including a topological sort algorithm in the standard library. However, I have three problems with the approach in PR 11583:

1. The name "topsort" is most naturally parsed as "top sort", which could be misinterpreted (as a sort that puts items on top in some way). If the name must be abbreviated then "toposort" would be better.

2. "Topological sort" is a terrible name: the analogy with topological graph theory is (i) unlikely to be helpful to anyone; and (ii) not quite right. I know that the name is widely used in computing, but a name incorporating "linearize" or "linear order" or "total order" would be much clearer.

3. The proposed interface is not suitable for all cases! The function topsort takes a list of directed edges and returns a linear order on the vertices in those edges (if any linear order exists). But this means that if there are any isolated vertices (that is, vertices with no edges) in the dependency graph, then there is no way of passing those vertices to the function. This means that (i) it is inconvenient to use the proposed interface, because you have to find the isolated vertices in your graph and add them to the linear order after calling the function; and (ii) it is a bug magnet, because many programmers will omit this step, meaning that their code will unexpectedly fail when their graph has an isolated vertex.

The interface needs to be redesigned to take the graph in some other representation.

--
nosy: +g...@garethrees.org

___ Python tracker <https://bugs.python.org/issue17005> ___
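For comparison: the interface that eventually landed in Python 3.9 as graphlib.TopologicalSorter takes a mapping from each node to its predecessors, so isolated vertices can be expressed directly (a sketch, not part of the PR under review here):

```python
from graphlib import TopologicalSorter

# 'b' depends on 'a'; 'c' is an isolated vertex with no edges at all.
graph = {"b": {"a"}, "c": set()}

# static_order() yields a linear order covering every node, including
# the isolated vertex 'c'.
order = list(TopologicalSorter(graph).static_order())
print(order)   # 'a' before 'b', and 'c' is not lost
```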
[issue32194] When creating list of dictionaries and updating datetime objects one by one, all values are set to last one of the list.
Gareth Rees <g...@garethrees.org> added the comment:

The behaviour of the * operator (and the associated gotcha) is documented under "Common sequence operations" [1]:

    Note that items in the sequence s are not copied; they are
    referenced multiple times. This often haunts new Python
    programmers ...

There is also an entry in the FAQ [2]:

    replicating a list with * doesn't create copies, it only creates
    references to the existing objects

[1] https://docs.python.org/3/library/stdtypes.html#sequence-types-list-tuple-range
[2] https://docs.python.org/3/faq/programming.html#faq-multidimensional-list

--
nosy: +g...@garethrees.org
resolution:  -> not a bug
stage:  -> resolved
status: open -> closed

___ Python tracker <https://bugs.python.org/issue32194> ___
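The gotcha in miniature:

```python
# Replication copies references, not objects: all three slots name the
# SAME dictionary.
shared = [{}] * 3
shared[0]["when"] = "2020-01-01"
print(shared)    # the update appears in every slot

# A comprehension builds a distinct dictionary per slot.
distinct = [{} for _ in range(3)]
distinct[0]["when"] = "2020-01-01"
print(distinct)  # only the first slot is updated
```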
[issue31895] Native hijri calendar support
Gareth Rees <g...@garethrees.org> added the comment:

convertdate does not document which version of the Islamic calendar it uses, but looking at the source code, it seems that it uses a rule-based calendar which has a 30-year cycle with 11 leap years. This won't help Haneef, who wants the Umm al-Qura calendar.

--

___ Python tracker <https://bugs.python.org/issue31895> ___
[issue31895] Native hijri calendar support
Gareth Rees <g...@garethrees.org> added the comment:

It is a substantial undertaking, requiring a great deal of expertise, to implement the Islamic calendar. The difficulty is that there are multiple versions of the calendar. In some places the calendar is based on human observation of the new moon, and so a database of past observations is needed (and future dates can't be represented). In other places the time of observability of the new moon is calculated according to an astronomical ephemeris (and different ephemerides are used in different places and at different times).

--
nosy: +g...@garethrees.org

___ Python tracker <https://bugs.python.org/issue31895> ___
[issue28647] python --help: -u is misdocumented as binary mode
Gareth Rees <g...@garethrees.org> added the comment:

You're welcome.

--

___ Python tracker <https://bugs.python.org/issue28647> ___
[issue24869] shlex lineno inaccurate with certain inputs
Gareth Rees added the comment:

I've made a pull request. (Not because I expect it to be merged as-is, but to provide a starting point for discussion.)

--
nosy: +petri.lehtinen, vinay.sajip

___ Python tracker <http://bugs.python.org/issue24869> ___
[issue24869] shlex lineno inaccurate with certain inputs
Changes by Gareth Rees <g...@garethrees.org>:

--
pull_requests: +2849

___ Python tracker <http://bugs.python.org/issue24869> ___
[issue30976] multiprocessing.Process.is_alive can show True for dead processes
Gareth Rees added the comment:

This is a race condition: when os.kill returns, that means that the signal has been delivered, but it does not mean that the subprocess has exited yet. You can see this by inserting a sleep after the kill and before the liveness check:

    print(proc.is_alive())
    os.kill(proc.pid, signal.SIGTERM)
    time.sleep(1)
    print(proc.is_alive())

This (probably) gives the process time to exit. (Presumably the psutil.pid_exists() call has a similar effect.) Of course, waiting for 1 second (or any amount of time) might not be enough. The right thing to do is to join the process: when join() returns, you know the process has exited.

--
nosy: +g...@garethrees.org

___ Python tracker <http://bugs.python.org/issue30976> ___
[issue30973] Regular expression "hangs" interpreter
Gareth Rees added the comment:

This is the usual exponential backtracking behaviour of Python's regex engine. The problem is that the regex

    (?:[^*]+|\*[^/])*

can match against a string in exponentially many ways, and Python's regex engine tries all of them before giving up.

--
nosy: +g...@garethrees.org

___ Python tracker <http://bugs.python.org/issue30973> ___
[issue19896] Exposing "q" and "Q" to multiprocessing.sharedctypes
Changes by Gareth Rees <g...@garethrees.org>:

--
pull_requests: +2801

___ Python tracker <http://bugs.python.org/issue19896> ___
[issue19896] Exposing "q" and "Q" to multiprocessing.sharedctypes
Gareth Rees added the comment:

(If he hasn't, I don't think I can make a PR: I have read his patch, so any implementation I make now is based on it and potentially infringes his copyright.)

--

___ Python tracker <http://bugs.python.org/issue19896> ___
[issue19896] Exposing "q" and "Q" to multiprocessing.sharedctypes
Gareth Rees added the comment:

Has Antony Lee made a copyright assignment?

--

___ Python tracker <http://bugs.python.org/issue19896> ___
[issue19896] Exposing "q" and "Q" to multiprocessing.sharedctypes
Changes by Gareth Rees <g...@garethrees.org>:

--
nosy: +benjamin.peterson

___ Python tracker <http://bugs.python.org/issue19896> ___
[issue30943] printf-style Bytes Formatting sometimes do not worked.
Gareth Rees added the comment:

This was already noted in issue29714 and fixed by Xiang Zhang in commit b76ad5121e2.

--

___ Python tracker <http://bugs.python.org/issue30943> ___
[issue30943] printf-style Bytes Formatting sometimes do not worked.
Gareth Rees added the comment:

Test case minimization:

    Python 3.6.1 (default, Apr 24 2017, 06:18:27)
    [GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.42.1)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> b'a\x00%(a)s' % {b'a': b'a'}
    b'a\x00%(a)s'

It seems that all formatting operations after a zero byte are ignored. This is because the code for parsing the format string (in _PyBytes_FormatEx in Objects/bytesobject.c) uses the following approach to find the next % character:

    while (--fmtcnt >= 0) {
        if (*fmt != '%') {
            Py_ssize_t len;
            char *pos;

            pos = strchr(fmt + 1, '%');

But strchr uses the C notion of strings, which are terminated by a zero byte.

--
nosy: +g...@garethrees.org

___ Python tracker <http://bugs.python.org/issue30943> ___
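On interpreters carrying the fix referenced in the message above, formatting proceeds past the zero byte; this sketch shows the corrected behaviour:

```python
# The %(a)s conversion after the zero byte is now performed rather than
# being copied through verbatim, as it was when the parser relied on
# strchr() to find the next '%'.
result = b"a\x00%(a)s" % {b"a": b"a"}
print(result)   # b'a\x00a'
```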
[issue19896] Exposing "q" and "Q" to multiprocessing.sharedctypes
Gareth Rees added the comment:

Patch looks good to me. The test cases are not very systematic (why only int, double, and long long?), but that's not the fault of the patch and shouldn't prevent its being applied.

--
nosy: +g...@garethrees.org

___ Python tracker <http://bugs.python.org/issue19896> ___
[issue30919] Shared Array Memory Allocation Regression
Gareth Rees added the comment:

I propose:

1. Ask Richard Oudkerk why in changeset 3b82e0d83bf9 the temporary file is zero-filled and not truncated. Perhaps there's some file system where this is necessary? (I tested HFS+, which doesn't support sparse files, and zero-filling seems not to be necessary, but maybe there's some other file system where it is?)

2. If there's no good reason for zero-filling the temporary file, replace it with a call to os.ftruncate(fd, size).

3. Update the documentation to mention the performance issue when porting multiprocessing code from 2 to 3. Unfortunately, I don't think there's any advice that the documentation can give that will help work around it -- monkey-patching works but is not supported.

4. Consider writing a fix, or at least a supported workaround. Here's a suggestion: update multiprocessing.sharedctypes and multiprocessing.heap so that they use anonymous maps in the 'fork' context. The idea is to update the RawArray and RawValue functions so that they take the context, and then pass the context down to _new_value, BufferWrapper.__init__ and thence to Heap.malloc, where it can be used to determine what kind of Arena (file-backed or anonymous) should be used to satisfy the allocation request. The Heap class would have to segregate its blocks according to what kind of Arena they come from.

--

___ Python tracker <http://bugs.python.org/issue30919> ___
[issue30919] Shared Array Memory Allocation Regression
Gareth Rees added the comment:

I see now that the default start method is 'fork' (except on Windows), so calling set_start_method is unnecessary. Note that you don't have to edit multiprocessing/heap.py; you can "monkey-patch" it in the program that needs the anonymous mapping:

    import mmap

    from multiprocessing.heap import Arena

    def anonymous_arena_init(self, size, fd=-1):
        "Create Arena using an anonymous memory mapping."
        self.size = size
        self.fd = fd  # still kept but is not used!
        self.buffer = mmap.mmap(-1, self.size)

    Arena.__init__ = anonymous_arena_init

As for what it will break: any code that uses the 'spawn' or 'forkserver' start methods.

--

___ Python tracker <http://bugs.python.org/issue30919> ___
[issue30919] Shared Array Memory Allocation Regression
Gareth Rees added the comment:

Nonetheless this is bound to be a nasty performance regression for many people doing big data processing with NumPy/SciPy/Pandas and multiprocessing and moving from 2 to 3, so even if it can't be fixed, the documentation ought to warn about the problem and explain how to work around it.

--

___ Python tracker <http://bugs.python.org/issue30919> ___
[issue30919] Shared Array Memory Allocation Regression
Gareth Rees added the comment:

If you need the 2.7 behaviour (anonymous mappings) in 3.5 then you can still do it, with some effort. I think the approach that requires the smallest amount of work would be to ensure that subprocesses are started using fork(), by calling multiprocessing.set_start_method('fork'), and then monkey-patch multiprocessing.heap.Arena.__init__ so that it creates anonymous mappings using mmap.mmap(-1, size).

(I suggested above that Python could be modified to create anonymous mappings in the 'fork' case, but now that I look at the code in detail, I see that it would be tricky, because the Arena class has no idea about the Context in which it is going to be used -- at the moment you can create one shared object and then pass it to subprocesses under different Contexts, so the shared objects have to support the lowest common denominator.)

--

___ Python tracker <http://bugs.python.org/issue30919> ___
[issue30919] Shared Array Memory Allocation Regression
Gareth Rees added the comment:

Note that some filesystems (e.g. HFS+) don't support sparse files, so creating a large Arena will still be slow on these filesystems even if the file is created using ftruncate(). (This could be fixed, for the "fork" start method only, by using anonymous maps in that case.)

--

___ Python tracker <http://bugs.python.org/issue30919> ___
[issue30919] Shared Array Memory Allocation Regression
Gareth Rees added the comment:

In Python 2.7, multiprocessing.heap.Arena uses an anonymous memory mapping on Unix. Anonymous memory mappings can be shared between processes but only via fork(). But Python 3 supports other ways of starting subprocesses (see issue 8713 [1]) and so an anonymous memory mapping no longer works. So instead a temporary file is created, filled with zeros to the given size, and mapped into memory (see changeset 3b82e0d83bf9 [2]).

It is the zero-filling of the temporary file that takes the time, because this forces the operating system to allocate space on the disk. But why not use ftruncate() (instead of write()) to quickly create a file with holes? POSIX says [3], "If the file size is increased, the extended area shall appear as if it were zero-filled", which would seem to satisfy the requirement.

[1] https://bugs.python.org/issue8713
[2] https://hg.python.org/cpython/rev/3b82e0d83bf9
[3] http://pubs.opengroup.org/onlinepubs/9699919799/functions/ftruncate.html

--
nosy: +g...@garethrees.org

___ Python tracker <http://bugs.python.org/issue30919> ___
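A sketch of the ftruncate() approach: the extended region reads as zeros, and on file systems with sparse-file support it allocates no disk blocks, so creation is fast regardless of size:

```python
import mmap
import os
import tempfile

size = 100 * 1024 * 1024      # 100 MiB

fd, path = tempfile.mkstemp()
try:
    # Extend the file with a "hole" instead of writing 100 MiB of
    # zeros; POSIX guarantees the extended area reads as zero-filled.
    os.ftruncate(fd, size)

    # The mapping behaves exactly as if the file had been zero-filled.
    with mmap.mmap(fd, size) as buf:
        buf[0] = 1
        first, last = buf[0], buf[size - 1]
finally:
    os.close(fd)
    os.unlink(path)

print(first, last)   # 1 0
```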
[issue30564] Base64 decoding gives incorrect outputs.
Gareth Rees added the comment:

RFC 4648 section 3.5 says:

    The padding step in base 64 and base 32 encoding can, if improperly
    implemented, lead to non-significant alterations of the encoded
    data. For example, if the input is only one octet for a base 64
    encoding, then all six bits of the first symbol are used, but only
    the first two bits of the next symbol are used. These pad bits MUST
    be set to zero by conforming encoders, which is described in the
    descriptions on padding below. If this property do not hold, there
    is no canonical representation of base-encoded data, and multiple
    base-encoded strings can be decoded to the same binary data. If
    this property (and others discussed in this document) holds, a
    canonical encoding is guaranteed. In some environments, the
    alteration is critical and therefore decoders MAY chose to reject
    an encoding if the pad bits have not been set to zero.

If decoders may choose to reject non-canonical encodings, then they may also choose to accept them. (That's the meaning of "MAY" in RFC 2119.) So I think Python's behaviour is conforming to the standard.

--
nosy: +g...@garethrees.org

___ Python tracker <http://bugs.python.org/issue30564> ___
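An illustration of the non-canonical case the RFC describes, using a one-octet input (Python's decoder ignores the pad bits rather than rejecting them):

```python
import base64

# b'i' is 0x69 = 01101001: the second base64 symbol carries only the
# final two bits of the octet; its remaining four bits are pad bits.
canonical = base64.b64encode(b"i")
print(canonical)                    # b'aQ=='

# 'R' differs from 'Q' only in a pad bit, so both strings decode to
# the same octet: two encodings, one binary value.
print(base64.b64decode(b"aQ=="))    # b'i'
print(base64.b64decode(b"aR=="))    # b'i'
```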
[issue29977] re.sub stalls forever on an unmatched non-greedy case
Gareth Rees added the comment:

See also issue28690, issue212521, issue753711, issue1515829, etc.

--

___ Python tracker <http://bugs.python.org/issue29977> ___
[issue29977] re.sub stalls forever on an unmatched non-greedy case
Gareth Rees added the comment:

The problem here is that both "." and "\s" match a whitespace character, and because you have the re.DOTALL flag turned on this includes "\n", and so the number of different ways in which (.|\s)* can be matched against a string is exponential in the number of whitespace characters in the string.

It is best to design your regular expression so as to limit the number of different ways it can match. Here I recommend the expression:

    /\*(?:[^*]|\*[^/])*\*/

which can match in only one way.

--
nosy: +g...@garethrees.org

___ Python tracker <http://bugs.python.org/issue29977> ___
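The recommended expression in action; because each character can be consumed in only one way, failures are reported quickly instead of after exponential backtracking:

```python
import re

# Matches a C-style comment: between the delimiters, any non-'*'
# character, or a '*' that is not followed by '/', repeated.
comment = re.compile(r"/\*(?:[^*]|\*[^/])*\*/")

text = "int x; /* first\n comment */ int y; /* second */"
print(comment.findall(text))
```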
[issue14376] sys.exit documents argument as "integer" but actually requires "subtype of int"
Gareth Rees added the comment: Thank you, Mark (and everyone else who helped). -- ___ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue14376> ___
[issue14376] sys.exit documents argument as "integer" but actually requires "subtype of int"
Gareth Rees added the comment: Thanks for the revised patch, Mark. The new tests look good. -- ___ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue14376> ___
[issue24869] shlex lineno inaccurate with certain inputs
Gareth Rees added the comment: Here's a patch that implements my proposal (1) -- under this patch, tokens read from an input stream belong to a subtype of str with startline and endline attributes giving the line numbers of the first and last character of the token. This allows the accurate reporting of error messages relating to a token. I updated the documentation and added a test case. -- keywords: +patch Added file: http://bugs.python.org/file46479/issue24869.patch ___ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue24869> ___
[issue14376] sys.exit documents argument as "integer" but actually requires "subtype of int"
Gareth Rees added the comment: On Windows, under cmd.exe, you can use %errorlevel%. -- ___ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue14376> ___
[issue14376] sys.exit documents argument as "integer" but actually requires "subtype of int"
Gareth Rees added the comment: Is there any chance of making progress on this issue? Is there anything wrong with my patch? Did I omit any relevant point in my message of 2016-06-11 16:26? It would be nice if this were not left in limbo for another four years. -- ___ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue14376> ___
[issue28743] test_choices_algorithms() in test_random uses lots of memory
Gareth Rees added the comment: In order for this to work, the __getitem__ method needs to be:

    def __getitem__(self, key):
        if 0 <= key < self.n:
            return self.elem
        else:
            raise IndexError(key)

But unfortunately this is very bad for the performance of the test. The original code, with [1]*n:

    Ran 1 test in 5.256s

With RepeatedSequence(1, n):

    Ran 1 test in 33.620s

So that's no good. However, I notice that although the documentation of choices specifies that weights is a sequence, in fact it seems only to require an iterable:

    cum_weights = list(_itertools.accumulate(weights))

so itertools.repeat works, and is faster than the original code:

    Ran 1 test in 4.991s

Patch attached, in case it's acceptable to pass an iterable here. -- keywords: +patch Added file: http://bugs.python.org/file45546/issue28743.patch ___ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue28743> ___
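A sketch of the iterable-weights approach (the names and numbers here are illustrative, not from the test suite):

```python
import random
from itertools import repeat

rng = random.Random(12345)
n = 1000

# Equal weights supplied lazily: the caller never materializes an
# n-element list (choices() still builds its cumulative-weights list
# internally via accumulate()).
picks = rng.choices(range(n), repeat(1, n), k=5)
assert len(picks) == 5 and all(0 <= p < n for p in picks)
```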
[issue28743] test_choices_algorithms() in test_random uses lots of memory
Gareth Rees added the comment: Couldn't the test case use something like this to avoid allocating so much memory?

    from collections.abc import Sequence

    class RepeatedSequence(Sequence):
        """Immutable sequence of n repeats of elem."""
        def __init__(self, elem, n):
            self.elem = elem
            self.n = n
        def __getitem__(self, key):
            return self.elem
        def __len__(self):
            return self.n

and then:

    self.gen.choices(range(n), RepeatedSequence(1, n), k=1)

-- nosy: +Gareth.Rees ___ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue28743> ___
[issue28690] Loop in re (regular expression) processing
Gareth Rees added the comment: This is a well-known gotcha with backtracking regexp implementations. The problem is that in the alternation "( +|'[^']*'|\"[^\"]*\"|[^>]+)" there are some characters (space, apostrophe, double quote) that match multiple alternatives (for example, a space matches both " +" and "[^>]+"). This forces the regexp engine to backtrack at each ambiguous character to try out the other alternatives, leading to runtime that's exponential in the number of ambiguous characters. Linear behaviour can be restored if you make the alternation unambiguous, like this:

    ( +|'[^']*'|\"[^\"]*\"|[^>'\"]+)

-- nosy: +Gareth.Rees ___ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue28690> ___
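As a sketch (the sample input is mine), the rewritten alternation still matches the kind of tag-attribute text the original pattern was aimed at:

```python
import re

# The rewritten alternation, wrapped in an assumed tag-like context:
# a quote character can now only start a quoted alternative, and the
# catch-all class excludes the quote characters.
tag = re.compile(r"<(?: +|'[^']*'|\"[^\"]*\"|[^>'\"]+)*>")

m = tag.search("<a href='x' title=\"y z\">")
assert m.group() == "<a href='x' title=\"y z\">"
```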
[issue28676] On macOS Sierra, warning: implicit declaration of function 'getentropy'
New submission from Gareth Rees: On macOS Sierra (OSX 10.12.1):

    $ ./configure --with-pydebug && make
    [... lots of output omitted ...]
    gcc -c -Wno-unused-result -Wsign-compare -g -O0 -Wall -Wstrict-prototypes -std=c99 -Wextra -Wno-unused-result -Wno-unused-parameter -Wno-missing-field-initializers -I. -I./Include -DPy_BUILD_CORE -o Python/random.o Python/random.c
    Python/random.c:97:19: warning: implicit declaration of function 'getentropy' is invalid in C99 [-Wimplicit-function-declaration]
            res = getentropy(buffer, len);
                  ^
    1 warning generated.

This is because OSX 10.12.1 has getentropy() but does not have getrandom(). You can see this in pyconfig.h:

    /* Define to 1 if you have the `getentropy' function. */
    #define HAVE_GETENTROPY 1

    /* Define to 1 if the getrandom() function is available */
    /* #undef HAVE_GETRANDOM */

and this means that in Python/random.c the header is not included:

    #ifdef HAVE_GETRANDOM
    #include <sys/random.h>
    #elif defined(HAVE_GETRANDOM_SYSCALL)
    #include <sys/syscall.h>
    #endif

It's necessary to include <sys/random.h> if either HAVE_GETRANDOM or HAVE_GETENTROPY is defined. -- components: Build, macOS files: getentropy.patch keywords: patch messages: 280669 nosy: Gareth.Rees, ned.deily, ronaldoussoren priority: normal severity: normal status: open title: On macOS Sierra, warning: implicit declaration of function 'getentropy' type: compile error versions: Python 3.7 Added file: http://bugs.python.org/file45465/getentropy.patch ___ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue28676> ___
[issue28647] python --help: -u is misdocumented as binary mode
Gareth Rees added the comment: Here's a patch that copies the text for the -u option from the man page to the --help output. -- keywords: +patch Added file: http://bugs.python.org/file45463/issue28647.patch ___ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue28647> ___
[issue28647] python --help: -u is misdocumented as binary mode
Gareth Rees added the comment: The output of "python3.5 --help" says:

    -u     : unbuffered binary stdout and stderr, stdin always buffered;
             also PYTHONUNBUFFERED=x
             see man page for details on internal buffering relating to '-u'

If you look at the man page as instructed then you'll see a clearer explanation:

    -u     Force the binary I/O layers of stdout and stderr to be unbuffered.
           stdin is always buffered. The text I/O layer will still be
           line-buffered.

For example, if you try this:

    python3.5 -uc 'import sys,time;w=sys.stdout.buffer.write;w(b"a");time.sleep(1);w(b"b");'

then you'll see that the binary output is indeed unbuffered as documented. The output of --help is trying to abbreviate this explanation, but I think it's abbreviated too much. The explanation from the man page seems clear to me, and is only a little longer, so I suggest changing the --help output to match the man page. -- nosy: +Gareth.Rees ___ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue28647> ___
[issue27588] Type objects are hashable and comparable for equality but this is not documented
New submission from Gareth Rees: The type objects constructed by the metaclasses in the typing module are hashable and comparable for equality:

    >>> from typing import *
    >>> {Mapping[str, int], Mapping[int, str]}
    {typing.Mapping[int, str], typing.Mapping[str, int]}
    >>> Union[str, int, float] == Union[float, int, str]
    True
    >>> List[int] == List[float]
    False

but this is not clearly documented in the documentation for the typing module (there are a handful of examples using equality, but it's not explicit that these are runnable). It would be nice if there were explicit documentation for these properties of type objects. -- assignee: docs@python components: Documentation messages: 270981 nosy: Gareth.Rees, docs@python priority: normal severity: normal status: open title: Type objects are hashable and comparable for equality but this is not documented type: enhancement versions: Python 3.5, Python 3.6 ___ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue27588> ___
[issue24869] shlex lineno inaccurate with certain inputs
Gareth Rees added the comment: A third alternative: 3. Add a method whose effect is to consume comments and whitespace, but which does not yield a token. You could then call this method, and then look at shlex.lineno, which will be the line number of the first character of the next token (if there is a next token). -- ___ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue24869> ___
[issue24869] shlex lineno inaccurate with certain inputs
Gareth Rees added the comment: Just to restate the problem: The use case is that when emitting an error message for a token, we want to include the number of the line containing the token (or the number of the line where the token started, if the token spans multiple lines, as it might if it's a string containing newlines). But there is no way to satisfy this use case given the features of the shlex module. In particular, shlex.lineno (which looks as if it ought to help) is actually the line number of the first character that has not yet been consumed by the lexer, and in general this is not the same as the line number of the previous (or the next) token. I can think of two alternatives that would satisfy the use case:

1. Instead of returning tokens as str objects, return them as instances of a subclass of str that has a property that gives the line number of the first character of the token. (Maybe it should also have properties for the column number of the first character, and the line and column number of the last character too? These properties would support better error messages.)

2. Add new methods that return tuples giving the token and its line number (and possibly column number etc. as in alternative 1).

My preference would be for alternative (1), but I suppose there is a very tiny risk of breaking some code that relied upon get_token returning an instance of str exactly rather than an instance of a subclass of str. -- nosy: +Gareth.Rees ___ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue24869> ___
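Alternative (1) could be sketched like this (the class and attribute names are mine, not from any patch):

```python
class Token(str):
    """A str subclass carrying the source position of a lexed token."""
    def __new__(cls, text, startline, endline):
        self = super().__new__(cls, text)
        self.startline = startline
        self.endline = endline
        return self

# A token spanning lines 3-4, e.g. a quoted string containing a newline:
t = Token('"two\nlines"', 3, 4)
assert t == '"two\nlines"'              # behaves like an ordinary str
assert (t.startline, t.endline) == (3, 4)
```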
[issue27306] Grammatical Error in Documentation - Tarfile page
Gareth Rees added the comment: Here's a patch improving the grammar in the tarfile documentation. -- keywords: +patch nosy: +Gareth.Rees Added file: http://bugs.python.org/file43375/issue27306.patch ___ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue27306> ___
[issue20508] IndexError from ipaddress._BaseNetwork.__getitem__ has no message
Gareth Rees added the comment: Thank you for applying this patch. -- ___ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue20508> ___
[issue14376] sys.exit documents argument as "integer" but actually requires "subtype of int"
Gareth Rees added the comment: Let's not allow the perfect to be the enemy of the good here. The issue I reported is a very specific one: in Python 2.7, if you pass a long to sys.exit, then the value of the long is not used as the exit code. This is bad because functions like os.spawnv that return exit codes (that you might reasonably want to pass on to sys.exit) can return them as long. My patch only proposes to address this one issue. In order to keep the impact as small as possible, I do not propose to make any other changes, or address any other problems. But in the comments here people have brought up THREE other issues:

1. Alexander Belopolsky expresses the concern that "(int)PyLong_AsLong(value) can silently convert non-zero error code to zero." This is not a problem introduced by my patch -- the current code is:

    exitcode = (int)PyInt_AsLong(value);

which has exactly the same problem (because PyIntObject stores its value as a long). So this concern (even if valid) is not a reason to reject my patch.

2. Ethan Furman wrote: "we need to protect against overflow from <long> to <int>". But again, this is not a problem introduced by my patch. The current code says:

    exitcode = (int)PyInt_AsLong(value);

and my patch does not change this line. The possibility of this overflow is not a reason to reject my patch.

3. Alexander says, "Passing anything other than one of the os.EX_* constants to sys.exit() is a bad idea". First, this is not a problem introduced by my patch. The existing code in Python 2.7 allows you to specify other exit codes. So this problem (if it is a problem) is not a reason to reject my patch. Second, this claim is surely not right -- when a subprocess fails it often makes sense to pass on the exit code of the subprocess, whatever that is. This is exactly the use case that I mentioned in my original report (that is, passing on the exit code from os.spawnv to sys.exit).
-- ___ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue14376> ___
[issue20508] IndexError from ipaddress._BaseNetwork.__getitem__ has no message
Gareth Rees added the comment: I've attached a revised patch that addresses Berker Peksag's concerns: 1. The message associated with the IndexError is now "address out of range" with no information about which address failed or why. 2. There's a new test case for an IndexError from an IPv6 address lookup. -- Added file: http://bugs.python.org/file43341/ipaddress.patch ___ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue20508> ___
[issue24460] urlencode() of dictionary not as expected
Gareth Rees added the comment: If you read the documentation for urllib.parse.urlencode [1], you'll see that it says:

    The value element in itself can be a sequence and in that case, if the optional parameter doseq evaluates to True, individual key=value pairs separated by '&' are generated for each element of the value sequence for the key.

So you need to write:

    >>> urllib.parse.urlencode(thisDict, doseq=True)
    'SomeVar3=ghi&SomeVar1=abc&SomeVar2=def'

[1]: https://docs.python.org/3/library/urllib.parse.html#urllib.parse.urlencode -- nosy: +Gareth.Rees ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue24460 ___
[issue24405] Missing code markup in Expressions documentation
New submission from Gareth Rees: The Expressions documentation contains the text:

    * Sets and frozensets define comparison operators to mean subset and superset tests. Those relations do not define total orderings (the two sets ``{1,2}`` and {2,3} are not equal, nor subsets of one another, nor supersets of one another).

Here {2,3} should be marked up as code (like {1,2}) but is not. -- assignee: docs@python components: Documentation files: markup.patch keywords: patch messages: 244996 nosy: Gareth.Rees, docs@python priority: normal severity: normal status: open title: Missing code markup in Expressions documentation type: enhancement Added file: http://bugs.python.org/file39657/markup.patch ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue24405 ___
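The claim in the quoted documentation text is easy to verify interactively:

```python
# {1,2} and {2,3} are incomparable under the subset/superset ordering:
a, b = {1, 2}, {2, 3}
assert a != b          # not equal
assert not a <= b      # not a subset
assert not a >= b      # not a superset
```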
[issue24406] Built-in Types documentation doesn't explain how dictionaries are compared for equality
Changes by Gareth Rees g...@garethrees.org: -- title: Bulit-in Types documentation doesn't explain how dictionaries are compared for equality -> Built-in Types documentation doesn't explain how dictionaries are compared for equality ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue24406 ___
[issue24067] Weakproxy is an instance of collections.Iterator
Gareth Rees added the comment: The documentation says that weakref.Proxy objects are not hashable because this avoids a number of problems related to their fundamentally mutable nature, and prevent their use as dictionary keys. Hashable objects must be immutable, otherwise the hash might change, invalidating the invariants that make dictionaries work, but Proxy objects are fundamentally mutable: when there are no more strong references to the proxied object, the object gets destroyed and the Proxy object now refers to None. If the Proxy object were hashable then its hash would change at this point. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue24067 ___
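A minimal illustration of the contrast (the example class is mine): a proxy refuses to hash, while a ref hashes fine.

```python
import weakref

class C:
    pass

o = C()
p = weakref.proxy(o)

# Proxy objects deliberately raise TypeError from hash():
unhashable = False
try:
    hash(p)
except TypeError:
    unhashable = True
assert unhashable

# A ref object, by contrast, is hashable (it hashes like its referent):
r_hash = hash(weakref.ref(o))
assert isinstance(r_hash, int)
```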
[issue24067] Weakproxy is an instance of collections.Iterator
Gareth Rees added the comment: "I don't see any reason for proxy objects to be less hashable than ref objects." The difference is that, unlike a ref object, a proxy object is supposed to forward its method calls to the proxied object. So consider what happens if you forward the __hash__ method to the proxied object: the hash will change when the object dies. A proxy object could, of course, not forward the __hash__ method, instead computing its own hash. But I think this would do more harm than good: surely most attempts to store weakref.Proxy objects in sets or dictionaries are going to be mistakes -- the user should have used a WeakKeyDictionary or a WeakSet instead. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue24067 ___
[issue24067] Weakproxy is an instance of collections.Iterator
Gareth Rees added the comment: Hashable is particularly misleading, because weakref.Proxy objects are not hashable regardless of the referent. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue24067 ___
[issue24067] Weakproxy is an instance of collections.Iterator
Gareth Rees added the comment: Not just Iterator, but Container, Hashable, Iterable, and Sized too!

    >>> import weakref
    >>> class C: pass
    ...
    >>> o = C()
    >>> w = weakref.proxy(o)
    >>> from collections.abc import *
    >>> isinstance(w, Container)
    True
    >>> isinstance(w, Hashable)
    True
    >>> isinstance(w, Iterable)
    True
    >>> isinstance(w, Sized)
    True

-- nosy: +Gareth.Rees ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue24067 ___
[issue20606] Operator Documentation Example doesn't work
Gareth Rees added the comment: This is a duplicate of #22180, which was fixed in changeset 9c250f34bfa3 by Raymond Hettinger in branch '3.4'. The fix just removes the bad example, as in my patch. So I suggest that this issue be closed as a duplicate. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue20606 ___
[issue20941] pytime.c:184 and pytime.c:218: runtime error, outside the range of representable values of type 'long'
Gareth Rees added the comment: How did you get this warning? This looks like runtime output from a program built using Clang/LLVM with -fsanitize=undefined. See here: http://clang.llvm.org/docs/UsersManual.html#controlling-code-generation Signed integer overflow is undefined behaviour, so by the time *sec = (time_t)intpart has been evaluated, the undefined behaviour has already happened. It is too late to check for it afterwards. -- nosy: +Gareth.Rees ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue20941 ___
[issue20905] Adapt heapq push/pop/replace to allow passing a comparator.
Gareth Rees added the comment: It would be better to accept a key function instead of a comparison function (cf. heapq.nlargest and heapq.nsmallest). But note that this has been proposed before and rejected: see issue1904 where Raymond Hettinger provides this rationale:

    Use cases aside, there is another design issue in that the key-function approach doesn't work well with the heap functions on regular lists. Successive calls to heap functions will of necessity call the key-function multiple times for any given element. This contrasts with sort() where the whole purpose of the key function was to encapsulate the decorate-sort-undecorate pattern which was desirable because the key-function called exactly once per element.

However, in the case of the bisect module (where requests for a key function are also common), Guido was recently persuaded that there was a valid use case. See issue4356, and this thread on the Python-ideas mailing list:

    https://mail.python.org/pipermail/python-ideas/2012-February/thread.html#13650

where Arnaud Delobelle points out that:

    Also, in Python 3 one can't assume that values will be comparable so the (key, value) tuple trick won't work: comparing the tuples may well throw a TypeError.

and Guido responds:

    Bingo. That clinches it. We need to add key=.

-- nosy: +Gareth.Rees ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue20905 ___
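The "(key, value) tuple trick" mentioned by Arnaud can be sketched like this; the counter tie-breaker (my addition, as in the classic decorate pattern) is what keeps the non-comparable values out of the comparisons:

```python
import heapq
from itertools import count

# Dicts are not orderable in Python 3, so pushing them directly could
# raise TypeError. Decorate with (key, seq, item): the seq counter breaks
# key ties before 'item' itself is ever compared.
heap = []
seq = count()
for item in [{'b': 2}, {'a': 1}, {'c': 3}]:
    heapq.heappush(heap, (sorted(item)[0], next(seq), item))

smallest = heapq.heappop(heap)[2]
assert smallest == {'a': 1}
```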
[issue20774] collections.deque should ship with a stdlib json serializer
Gareth Rees added the comment: The JSON implementation uses these tests to determine how to serialize a Python object:

    isinstance(o, (list, tuple))
    isinstance(o, dict)

So any subclass of list or tuple is serialized as a list, and any subclass of dict is serialized as an object. For example:

    >>> json.dumps(collections.defaultdict())
    '{}'
    >>> json.dumps(collections.OrderedDict())
    '{}'
    >>> json.dumps(collections.namedtuple('mytuple', ())())
    '[]'

When deserialized, you'll get back a plain dictionary or list, so there's no round-trip property here. The tests could perhaps be changed to:

    isinstance(o, collections.abc.Sequence)
    isinstance(o, collections.abc.Mapping)

I'm not a JSON expert, so I have no informed opinion on whether this is a good idea or not, but in any case, this change wouldn't help with deques, as a deque is not a Sequence. That's because deques don't have an index method (see issue10059 and issue12543). -- nosy: +Gareth.Rees ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue20774 ___
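The subclass behaviour, and the lack of a round trip, can be sketched directly (the Point type is my own example):

```python
import collections
import json

Point = collections.namedtuple('Point', 'x y')

# tuple and dict subclasses serialize as plain JSON arrays/objects...
assert json.dumps(Point(1, 2)) == '[1, 2]'
assert json.dumps(collections.OrderedDict(a=1)) == '{"a": 1}'

# ...and deserialize to plain list/dict: the original type is lost.
assert json.loads(json.dumps(Point(1, 2))) == [1, 2]
```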
[issue20727] Improved roundrobin itertools recipe
Gareth Rees added the comment: If 100 doesn't work for you, try a larger number. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue20727 ___
[issue20727] Improved roundrobin itertools recipe
Gareth Rees added the comment: I suspect I messed up the timing I did yesterday, because today I find that 100 isn't large enough, but here's what I found today (in Python 3.3):

    >>> from timeit import timeit
    >>> test = [tuple(range(300))] + [()] * 100
    >>> timeit(lambda:list(roundrobin1(*test)), number=1)  # old recipe
    8.386148632998811
    >>> timeit(lambda:list(roundrobin2(*test)), number=1)  # new recipe
    16.757110453007044

The new recipe is more than twice as slow as the old in this case, and its performance gets relatively worse as you increase the number 300. I should add that I do recognise that the new recipe is better for nearly all cases (it's simpler as well as faster), but I want to point out an important feature of the old recipe, namely that it discards iterables as they are finished with, giving it worst-case O(n) performance (albeit slow) whereas the new recipe has worst case O(n^2). As we found out with hash tables, worst-case O(n^2) performance can be a problem when inputs are untrusted, so there are use cases where people might legitimately prefer an O(n) solution even if it's a bit slower in common cases. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue20727 ___
[issue20727] Improved roundrobin itertools recipe
Gareth Rees added the comment: But now that I look at the code more carefully, the old recipe also has O(n^2) behaviour, because cycle(islice(nexts, pending)) costs O(n) and is called O(n) times. To have worst-case O(n) behaviour, you'd need something like this:

    from collections import deque

    def roundrobin3(*iterables):
        "roundrobin('ABC', 'D', 'EF') --> A D E B F C"
        nexts = deque(iter(it).__next__ for it in iterables)
        while nexts:
            try:
                while True:
                    yield nexts[0]()
                    nexts.rotate(-1)
            except StopIteration:
                nexts.popleft()

    >>> from timeit import timeit
    >>> test = [tuple(range(1000))] + [()] * 1000
    >>> timeit(lambda:list(roundrobin1(*test)), number=100)  # old recipe
    5.184364624001319
    >>> timeit(lambda:list(roundrobin2(*test)), number=100)  # new recipe
    5.139592286024708
    >>> timeit(lambda:list(roundrobin3(*test)), number=100)
    0.16217014100402594

-- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue20727 ___
[issue20727] Improved roundrobin itertools recipe
Gareth Rees added the comment: "benchmarks show it to be more than twice as fast" -- I'm sure they do, but other benchmarks show it to be more than twice as slow. Try something like:

    iterables = [range(100)] + [()] * 100

-- nosy: +Gareth.Rees ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue20727 ___
[issue12691] tokenize.untokenize is broken
Gareth Rees added the comment: Thanks for your work on this, Terry. I apologise for the complexity of my original report, and will try not to do it again. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue12691 ___
[issue19363] Python 2.7's future_builtins.map is not compatible with Python 3's map
Gareth Rees added the comment: Sorry about that; here it is. I had second thoughts about recommending zip() as an alternative (that would only work for cases where the None was constant; in other cases you might need lambda *args: args, but this seemed too complicated), so the note now says only:

    Note: In Python 3, map() does not accept None for the function argument.

-- keywords: +patch Added file: http://bugs.python.org/file34117/issue19363.patch ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue19363 ___
[issue20606] Operator Documentation Example doesn't work
Gareth Rees added the comment: The failing example is:

    d = {}
    keys = range(256)
    vals = map(chr, keys)
    map(operator.setitem, [d]*len(keys), keys, vals)

which works in Python 2 where map returns a list, but not in Python 3 where map returns an iterator. Doc/library/operator.rst follows the example with this note:

    .. XXX: find a better, readable, example

Additional problems with the example:

1. It's poorly motivated because a dictionary comprehension would be simpler and shorter:

    d = {i: chr(i) for i in range(256)}

2. It's also unclear why you'd need this dictionary when you could just call the function chr (but I suppose some interface might require a dictionary rather than a function).

3. To force the map to be evaluated, you need to write list(map(...)), which allocates an unnecessary list object and then throws it away. To avoid the unnecessary allocation you could use the consume recipe from the itertools documentation and write collections.deque(map(...), maxlen=0), but this is surely too obscure to use as an example.

I had a look through the Python sources, and made an Ohloh Code search for operator.setitem, and I didn't find any good examples of its use, so I think the best thing to do is just to delete the example. http://code.ohloh.net/search?s=%22operator.setitem%22&pp=0&fl=Python&mp=1&ml=1&me=1&md=1&ff=1&filterChecked=true -- nosy: +Gareth.Rees ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue20606 ___
[issue20606] Operator Documentation Example doesn't work
Changes by Gareth Rees g...@garethrees.org: -- keywords: +patch Added file: http://bugs.python.org/file34059/operator.patch ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue20606 ___
[issue20539] math.factorial may throw OverflowError
Gareth Rees added the comment: It's not a case of internal storage overflowing. The error is from Modules/mathmodule.c:1426 and it's the input 10**19 that's too large to convert to a C long. You get the same kind of error in other places where PyLong_AsLong or PyLong_AsInt is called on a user-supplied value, for example:

    >>> import pickle
    >>> pickle.dumps(10**19, 10**19)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    OverflowError: Python int too large to convert to C long

-- nosy: +Gareth.Rees ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue20539 ___
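The size comparison behind the error can be checked numerically (this assumes a 64-bit C long, i.e. an LP64 platform):

```python
# On LP64 platforms a C long holds at most 2**63 - 1, about 9.2 * 10**18,
# so 10**19 cannot be converted and PyLong_AsLong reports an overflow.
LONG_MAX_64 = 2**63 - 1
assert 10**19 > LONG_MAX_64
assert 10**18 < LONG_MAX_64   # one order of magnitude down still fits
```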
[issue12691] tokenize.untokenize is broken
Gareth Rees added the comment: I did some research on the cause of this issue. The assertion was added in this change by Jeremy Hylton in August 2006: https://mail.python.org/pipermail/python-checkins/2006-August/055812.html (The corresponding Mercurial commit is here: http://hg.python.org/cpython/rev/cc992d75d5b3#l217.25). At that point I believe the assertion was reasonable. I think it would have been triggered by backslash-continued lines, but otherwise it worked. But in this change http://hg.python.org/cpython/rev/51e24512e305 in March 2008 Trent Nelson applied this patch by Michael Foord http://bugs.python.org/file9741/tokenize_patch.diff to implement PEP 263 and fix issue719888. The patch added ENCODING tokens to the output of tokenize.tokenize(). The ENCODING token is always generated with row number 0, while the first actual token is generated with row number 1. So now every token stream from tokenize.tokenize() sets off the assertion. The lack of a test case for tokenize.untokenize() in full mode meant that it was (and is) all too easy for someone to accidentally break it like this. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue12691 ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
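The row-number mismatch described above is easy to confirm on current Python 3 (a sketch using the TokenInfo attributes of the tokenize module):

```python
import io
import tokenize

toks = list(tokenize.tokenize(io.BytesIO(b"1 + 1\n").readline))

# The synthetic ENCODING token is reported at row 0 ...
assert toks[0].type == tokenize.ENCODING
assert toks[0].start == (0, 0)

# ... while the first real token of the source starts at row 1.
assert toks[1].string == "1"
assert toks[1].start == (1, 0)
```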
[issue12691] tokenize.untokenize is broken
Changes by Gareth Rees g...@garethrees.org: -- assignee: -> docs@python components: +Documentation, Tests nosy: +docs@python ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue12691 ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue12691] tokenize.untokenize is broken
Changes by Gareth Rees g...@garethrees.org: Removed file: http://bugs.python.org/file33919/Issue12691.patch ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue12691 ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue20507] TypeError from str.join has no message
New submission from Gareth Rees: If you pass an object of the wrong type to str.join, Python raises a TypeError with no error message:

    Python 3.4.0b3 (default, Jan 27 2014, 02:26:41)
    [GCC 4.2.1 Compatible Apple LLVM 5.0 (clang-500.2.79)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> ''.join(1)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError

It is unnecessarily hard to understand from this error what the problem actually was. Which object had the wrong type? What type should it have been? Normally a TypeError is associated with a message explaining which type was wrong, and what it should have been. For example:

    >>> b''.join(1)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: can only join an iterable

It would be nice if the TypeError from ''.join(1) included a message like this. The reason for the lack of message is that PyUnicode_Join starts out by calling PySequence_Fast(seq, "") which suppresses the error message from PyObject_GetIter. This commit by Tim Peters is responsible: http://hg.python.org/cpython/rev/8579859f198c. The commit message doesn't mention the suppression of the message, so I can only assume that it was an oversight. I suggest replacing the line:

    fseq = PySequence_Fast(seq, "");

in PyUnicode_Join in unicodeobject.c with:

    fseq = PySequence_Fast(seq, "can only join an iterable");

for consistency with bytes_join in stringlib/join.h. Patch attached.

-- components: Interpreter Core files: join.patch keywords: patch messages: 210200 nosy: Gareth.Rees priority: normal severity: normal status: open title: TypeError from str.join has no message type: behavior versions: Python 3.4 Added file: http://bugs.python.org/file33900/join.patch ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue20507 ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
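On current Python 3, where this report has long since been acted on, both joins raise a TypeError that carries a message. A quick check (only the presence of a message is asserted, since the exact wording may differ between versions):

```python
# Neither str.join nor bytes.join accepts a non-iterable argument.
try:
    ''.join(1)
except TypeError as e:
    str_join_message = str(e)

try:
    b''.join(1)
except TypeError as e:
    bytes_join_message = str(e)

# On current versions both errors explain themselves.
assert str_join_message
assert bytes_join_message
```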
[issue20508] IndexError from ipaddress._BaseNetwork.__getitem__ has no message
New submission from Gareth Rees: If you try to look up an out-of-range address from an object returned by ipaddress.ip_network, then ipaddress._BaseNetwork.__getitem__ raises an IndexError with no message:

    Python 3.4.0b3 (default, Jan 27 2014, 02:26:41)
    [GCC 4.2.1 Compatible Apple LLVM 5.0 (clang-500.2.79)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import ipaddress
    >>> ipaddress.ip_network('2001:db8::8/125')[100]
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/opt/local/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/ipaddress.py", line 601, in __getitem__
        raise IndexError
    IndexError

Normally an IndexError is associated with a message explaining the cause of the error. For example:

    >>> [].pop()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    IndexError: pop from empty list

It would be nice if the IndexError from ipaddress._BaseNetwork.__getitem__ included a message like this. With the attached patch, the error message looks like this in the positive case:

    >>> ipaddress.ip_network('2001:db8::8/125')[100]
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/Users/gdr/hg.python.org/cpython/Lib/ipaddress.py", line 602, in __getitem__
        % (self, self.num_addresses))
    IndexError: 100 out of range 0..7 for 2001:db8::8/125

and like this in the negative case:

    >>> ipaddress.ip_network('2001:db8::8/125')[-100]
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/Users/gdr/hg.python.org/cpython/Lib/ipaddress.py", line 608, in __getitem__
        % (n - 1, self.num_addresses, self))
    IndexError: -100 out of range -8..-1 for 2001:db8::8/125

(If you have a better suggestion for how the error message should read, I could submit a revised patch. I suppose it could just say "address index out of range" for consistency with list.__getitem__ and str.__getitem__. But I think the extra information is likely to be helpful for the programmer who is trying to track down the cause of an error.)
-- components: Library (Lib) files: ipaddress.patch keywords: patch messages: 210224 nosy: Gareth.Rees priority: normal severity: normal status: open title: IndexError from ipaddress._BaseNetwork.__getitem__ has no message type: behavior versions: Python 3.4 Added file: http://bugs.python.org/file33903/ipaddress.patch ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue20508 ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
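The ranges quoted in the proposed messages can be checked against the network in question. A sketch on current Python 3, where `__getitem__` now raises IndexError in both directions:

```python
import ipaddress

net = ipaddress.ip_network('2001:db8::8/125')

# A /125 holds 8 addresses, so valid indices are 0..7 (or -8..-1).
assert net.num_addresses == 8
assert str(net[0]) == '2001:db8::8'
assert str(net[-1]) == '2001:db8::f'

# Out-of-range indices raise IndexError in both directions.
try:
    net[100]
except IndexError as e:
    positive_message = str(e)

try:
    net[-100]
except IndexError as e:
    negative_message = str(e)
```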
[issue20508] IndexError from ipaddress._BaseNetwork.__getitem__ has no message
Changes by Gareth Rees g...@garethrees.org: -- type: behavior -> enhancement ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue20508 ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue19362] Documentation for len() fails to mention that it works on sets
Gareth Rees added the comment: Here's a revised patch using Ezio's suggestion (Return the number of items of a sequence or container). -- Added file: http://bugs.python.org/file33904/len-set.patch ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue19362 ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue19362] Documentation for len() fails to mention that it works on sets
Changes by Gareth Rees g...@garethrees.org: -- title: Documentation for len() fails to mention that it works on sets -> Documentation for len() fails to mention that it works on sets versions: +Python 3.4 ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue19362 ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue14376] sys.exit documents argument as integer but actually requires subtype of int
Gareth Rees added the comment: Patch attached. I added a test case to Lib/test/test_sys.py. -- Added file: http://bugs.python.org/file33906/exit.patch ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue14376 ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue20510] Test cases in test_sys don't match the comments
New submission from Gareth Rees: Lib/test/test_sys.py contains test cases with incorrect comments -- or comments with incorrect test cases, if you prefer:

    # call without argument
    try:
        sys.exit(0)
    except SystemExit as exc:
        self.assertEqual(exc.code, 0)
    ...
    # call with tuple argument with one entry
    # entry will be unpacked
    try:
        sys.exit(42)
    except SystemExit as exc:
        self.assertEqual(exc.code, 42)
    ...
    # call with integer argument
    try:
        sys.exit((42,))
    except SystemExit as exc:
        self.assertEqual(exc.code, 42)
    ...

(In the quote above I've edited out some inessential detail; see the file if you really want to know.) You can see that in the first test case sys.exit is called with an argument (although the comment claims otherwise); in the second it is called with an integer (not a tuple), and in the third it is called with a tuple (not an integer). These comments have been unchanged since the original commit by Walter Dörwald http://hg.python.org/cpython/rev/6a1394660270. I've attached a patch that corrects the first test case and swaps the comments for the second and third test cases:

    # call without argument
    rc = subprocess.call([sys.executable, "-c", "import sys; sys.exit()"])
    self.assertEqual(rc, 0)
    ...
    # call with integer argument
    try:
        sys.exit(42)
    except SystemExit as exc:
        self.assertEqual(exc.code, 42)
    ...
    # call with tuple argument with one entry
    # entry will be unpacked
    try:
        sys.exit((42,))
    except SystemExit as exc:
        self.assertEqual(exc.code, 42)
    ...

Note that in the first test case (without an argument) sys.exit() with no argument actually raises SystemExit(None), so it's not sufficient to catch the SystemExit and check exc.code; I need to check that it actually gets translated to 0 on exit.
-- components: Tests files: exittest.patch keywords: patch messages: 210246 nosy: Gareth.Rees priority: normal severity: normal status: open title: Test cases in test_sys don't match the comments type: enhancement versions: Python 3.4 Added file: http://bugs.python.org/file33908/exittest.patch ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue20510 ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
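The behaviour the corrected comments describe can be demonstrated in-process (a sketch; under a running interpreter sys.exit raises SystemExit rather than exiting, which is exactly what the tests rely on):

```python
import sys

# No argument: the exception carries code None, reported as status 0 on exit.
try:
    sys.exit()
except SystemExit as exc:
    assert exc.code is None

# Integer argument: the code is the integer itself.
try:
    sys.exit(42)
except SystemExit as exc:
    assert exc.code == 42

# One-element tuple: the tuple is unpacked into the exception's args,
# so the code is again 42, exactly as the (swapped) comment promises.
try:
    sys.exit((42,))
except SystemExit as exc:
    assert exc.code == 42
```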
[issue19363] Python 2.7's future_builtins.map is not compatible with Python 3's map
Gareth Rees added the comment: What about a documentation change instead? The future_builtins chapter http://docs.python.org/2/library/future_builtins.html in the standard library documentation could note the incompatibility. I've attached a patch which adds the following note to the documentation for future_builtins.map:

    Note: In Python 3, map() does not accept None for the function argument. (zip() can be used instead.)

-- status: closed -> open ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue19363 ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue20510] Test cases in test_sys don't match the comments
Gareth Rees added the comment: I normally try not to make changes while we're in here for fear of introducing errors! But I guess the test cases are less critical, so I've taken your review comments as a license to submit a revised patch that: * incorporates your suggestion to use assert_python_ok from test.script_helper, instead of subprocess.call; * replaces the other uses of subprocess.call with assert_python_failure and adds a check on stdout; * cleans up the assertion-testing code using the context manager form of unittest.TestCase.assertRaises. I've signed and submitted a contributor agreement as requested. -- Added file: http://bugs.python.org/file33914/exittest-1.patch ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue20510 ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
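The subprocess-based check that the revised patch builds on can be sketched with plain subprocess.call (assert_python_ok from the test package wraps essentially this, with extra capture of stdout/stderr):

```python
import subprocess
import sys

# sys.exit() with no argument maps SystemExit(None) to exit status 0 ...
rc = subprocess.call([sys.executable, "-c", "import sys; sys.exit()"])
assert rc == 0

# ... while a small integer argument becomes the process's exit status.
rc_int = subprocess.call([sys.executable, "-c", "import sys; sys.exit(3)"])
assert rc_int == 3
```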
[issue19362] Documentation for len() fails to mention that it works on sets
Gareth Rees added the comment: Here's a revised patch for Terry (Return the number of items of a sequence or collection.) -- Added file: http://bugs.python.org/file33916/len-set.patch ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue19362 ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue12691] tokenize.untokenize is broken
Gareth Rees added the comment: Yury, let me see if I can move this issue forward. I clearly haven't done a good job of explaining these problems, how they are related, and why it makes sense to solve them together, so let me have a go now.

1. tokenize.untokenize() raises AssertionError if you pass it a sequence of tokens output from tokenize.tokenize(). This was my original problem report, and it's still not fixed in Python 3.4:

    Python 3.4.0b3 (default, Jan 27 2014, 02:26:41)
    [GCC 4.2.1 Compatible Apple LLVM 5.0 (clang-500.2.79)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import tokenize, io
    >>> t = list(tokenize.tokenize(io.BytesIO('1+1'.encode('utf8')).readline))
    >>> tokenize.untokenize(t)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/opt/local/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/tokenize.py", line 317, in untokenize
        out = ut.untokenize(iterable)
      File "/opt/local/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/tokenize.py", line 246, in untokenize
        self.add_whitespace(start)
      File "/opt/local/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/tokenize.py", line 232, in add_whitespace
        assert row <= self.prev_row
    AssertionError

This defeats any attempt to use the sequence:

    input code -> tokenize -> transform -> untokenize -> output code

to transform Python code. But this ought to be the main use case for the untokenize function! That's how I came across the problem in the first place, when I was starting to write Minipy https://github.com/gareth-rees/minipy.

2. Fixing problem #1 is easy (just swap <= for >=), but it raises the question: why wasn't this mistake caught by test_tokenize? There's a test function roundtrip() whose docstring says:

    Test roundtrip for `untokenize`. `f` is an open file or a string. The source code in f is tokenized, converted back to source code via tokenize.untokenize(), and tokenized again from the latter.
    The test fails if the second tokenization doesn't match the first.

If I don't fix the problem with roundtrip(), then how can I be sure I have fixed the problem? Clearly it's necessary to fix the test case and establish that it provokes the assertion. So why doesn't roundtrip() detect the error? Well, it turns out that tokenize.untokenize() has two modes of operation and roundtrip() only tests one of them. The documentation for tokenize.untokenize() is rather cryptic, and all it says is:

    Each element returned by the [input] iterable must be a token sequence with at least two elements, a token number and token value. If only two tokens are passed, the resulting output is poor.

By reverse-engineering the implementation, it seems that it has two modes of operation. In the first mode (which I have called "compatibility mode" after the method Untokenizer.compat() that implements it) you pass it tokens in the form of 2-element tuples (type, text). These must have exactly 2 elements. In the second mode (which I have called "full mode" based on the description "full input" in the docstring) you pass it tokens in the form of tuples with 5 elements (type, text, start, end, line). These are compatible with the namedtuples returned from tokenize.tokenize(). The full mode has the buggy assertion, but test_tokenize.roundtrip() only tests the compatibility mode. So I must (i) fix roundtrip() so that it tests both modes; (ii) improve the documentation for tokenize.untokenize() so that programmers have some chance of figuring this out in future!

3. As soon as I make roundtrip() test both modes it provokes the assertion failure. Good, so I can fix the assertion. Problem #1 solved.
But now there are test failures in full mode:

    $ ./python.exe -m test test_tokenize
    [1/1] test_tokenize
    **********************************************************************
    File "/Users/gdr/hg.python.org/cpython/Lib/test/test_tokenize.py", line ?, in test.test_tokenize.__test__.doctests
    Failed example:
        for testfile in testfiles:
            if not roundtrip(open(testfile, 'rb')):
                print("Roundtrip failed for file %s" % testfile)
                break
        else:
            True
    Expected:
        True
    Got:
        Roundtrip failed for file /Users/gdr/hg.python.org/cpython/Lib/test/test_platform.py
    **********************************************************************
    1 items had failures:
        1 of 73 in test.test_tokenize.__test__.doctests
    ***Test Failed*** 1 failures.
    test test_tokenize failed -- 1 of 78 doctests failed
    1 test failed:
        test_tokenize

Examination of the failed tokenization shows that if the source
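On a current Python 3, where these fixes have long been merged, both modes can be exercised on a small input: full mode round-trips exactly, and compat mode reproduces the same token stream even though its whitespace differs (a sketch under that assumption):

```python
import io
import tokenize

src = "1 + 1\n"
toks = list(tokenize.tokenize(io.BytesIO(src.encode("utf-8")).readline))

# Full mode: 5-tuples with positions; spacing is reconstructed exactly.
full = tokenize.untokenize(toks)
assert full == src.encode("utf-8")

# Compatibility mode: only (type, string) pairs; spacing is approximate,
# but the result tokenizes back to the same (type, string) sequence.
compat = tokenize.untokenize((t.type, t.string) for t in toks)
retoks = list(tokenize.tokenize(io.BytesIO(compat).readline))
assert [(t.type, t.string) for t in retoks] == [(t.type, t.string) for t in toks]
```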
[issue12691] tokenize.untokenize is broken
Changes by Gareth Rees g...@garethrees.org: -- nosy: +benjamin.peterson ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue12691 ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue19362] Documentation for len() fails to mention that it works on sets
New submission from Gareth Rees: The help text for the len() built-in function says: Return the number of items of a sequence or mapping. This omits to mention that len() works on sets too. I suggest this be changed to: Return the number of items of a sequence, mapping, or set. Similarly, the documentation for len() says: The argument may be a sequence (string, tuple or list) or a mapping (dictionary). I suggest this be changed to The argument may be a sequence (string, tuple or list), a mapping (dictionary), or a set. (Of course, strictly speaking, len() accepts any object with a __len__ method, but sequences, mappings and sets are the ones that are built-in to the Python core, and so these are the ones it is important to mention in the help and the documentation.) -- assignee: docs@python components: Documentation files: len-set.patch keywords: patch messages: 201019 nosy: Gareth.Rees, docs@python priority: normal severity: normal status: open title: Documentation for len() fails to mention that it works on sets type: enhancement Added file: http://bugs.python.org/file32313/len-set.patch ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue19362 ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
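The point is easy to verify interactively; len() already works on all the built-in collection kinds the proposed wording mentions:

```python
# Sequences
assert len('abc') == 3
assert len((1, 2, 3)) == 3
assert len([1, 2, 3]) == 3

# Mapping
assert len({'a': 1, 'b': 2}) == 2

# Set (works, despite being absent from the old help text)
assert len({1, 2, 3}) == 3
```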
[issue19363] Python 2.7's future_builtins.map is not compatible with Python 3's map
New submission from Gareth Rees: In Python 2.7, future_builtins.map accepts None as its first (function) argument:

    Python 2.7.5 (default, Aug 1 2013, 01:01:17)
    >>> from future_builtins import map
    >>> list(map(None, range(3), 'ABC'))
    [(0, 'A'), (1, 'B'), (2, 'C')]

But in Python 3.x, map does not accept None as its first argument:

    Python 3.3.2 (default, May 21 2013, 11:50:47)
    >>> list(map(None, range(3), 'ABC'))
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: 'NoneType' object is not callable

The documentation says, "if you want to write code compatible with Python 3 builtins, import them from this module", so this incompatibility may give Python 2.7 programmers the false impression that a program which uses map(None, ...) is portable to Python 3. I suggest that future_builtins.map in Python 2.7 should behave the same as map in Python 3: that is, it should raise a TypeError if None is passed as the first argument.

-- components: Library (Lib) messages: 201020 nosy: Gareth.Rees priority: normal severity: normal status: open title: Python 2.7's future_builtins.map is not compatible with Python 3's map type: behavior versions: Python 2.7 ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue19363 ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
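On Python 3 the portable replacement for map(None, ...) is zip(), and map() itself rejects None, as a quick check confirms:

```python
# zip() produces the pairs that Python 2's map(None, ...) used to build.
pairs = list(zip(range(3), 'ABC'))
assert pairs == [(0, 'A'), (1, 'B'), (2, 'C')]

# Python 3's map() refuses None as the function argument.
try:
    list(map(None, range(3), 'ABC'))
    raise RuntimeError("expected TypeError")
except TypeError:
    pass
```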
[issue19362] Documentation for len() fails to mention that it works on sets
Gareth Rees added the comment: I considered suggesting container, but the problem is that container is used elsewhere to mean object supporting the 'in' operator (in particular, collections.abc.Container has a __contains__ method but no __len__ method). The abstract base class for object with a length is collections.abc.Sized, but I don't think using the term sized would be clear to users. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue19362 ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
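The distinction between the two ABCs can be shown directly: Container keys off __contains__, while Sized keys off __len__ (a sketch using collections.abc as in current Python 3):

```python
from collections.abc import Container, Sized

class SupportsIn:
    """Supports the 'in' operator but has no length."""
    def __contains__(self, item):
        return True

obj = SupportsIn()
assert isinstance(obj, Container)   # a 'container' in the ABC sense
assert not isinstance(obj, Sized)   # but len(obj) would fail

# Built-in sets are both: they support 'in' and have a length.
assert isinstance(set(), Container) and isinstance(set(), Sized)
```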
[issue14376] sys.exit documents argument as integer but actually requires subtype of int
Gareth Rees g...@garethrees.org added the comment:

    Wouldn't you also have to deal with possible errors from the PyInt_AsLong call?

Good point. But I note that Python 3 just does:

    exitcode = (int)PyLong_AsLong(value);

so maybe it's not important to do error handling here.

-- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue14376 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue14376] sys.exit documents argument as integer but actually requires subtype of int
New submission from Gareth Rees g...@garethrees.org: The documentation for sys.exit says, "The optional argument arg can be an integer giving the exit status (defaulting to zero), or another type of object." However, the arguments that are treated as exit statuses are actually subtypes of int. So, a bool argument is fine:

    $ python2.7 -c "import sys; sys.exit(False)"; echo $?
    0

But a long argument is not:

    $ python2.7 -c "import sys; sys.exit(long(0))"; echo $?
    0
    1

The latter behaviour can be surprising since functions like os.spawnv may return the exit status of the executed process as a long on some platforms, so that if you try to pass on the exit code via:

    code = os.spawnv(...)
    sys.exit(code)

you may get a mysterious surprise: code is 0 but the exit code is 1. It would be simple to change line 1112 of pythonrun.c from:

    if (PyInt_Check(value))

to:

    if (PyInt_Check(value) || PyLong_Check(value))

(This issue is not present in Python 3 because there is no longer a distinction between int and long.)

-- components: Library (Lib) messages: 156470 nosy: Gareth.Rees priority: normal severity: normal status: open title: sys.exit documents argument as integer but actually requires subtype of int type: behavior versions: Python 2.6, Python 2.7 ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue14376 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue12700] test_faulthandler fails on Mac OS X Lion
Gareth Rees g...@garethrees.org added the comment: After changing NULL to (int *)1, all tests pass. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue12700 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue12700] test_faulthandler fails on Mac OS X Lion
Gareth Rees g...@garethrees.org added the comment: All tests now pass. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue12700 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue12691] tokenize.untokenize is broken
Gareth Rees g...@garethrees.org added the comment: I think I can make these changes independently and issue two patches, one fixing the problems with untokenize listed here, and another improving tokenize. I've just noticed a third bug in untokenize: in full mode, it doesn't handle backslash-continued lines correctly.

    Python 3.3.0a0 (default:c099ba0a278e, Aug 2 2011, 12:35:03)
    [GCC 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2335.15.00)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> from io import BytesIO
    >>> from tokenize import tokenize, untokenize
    >>> untokenize(tokenize(BytesIO('1 and \\\n not 2'.encode('utf8')).readline))
    b'1 andnot 2'

-- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue12691 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
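On current Pythons, where untokenize() inserts explicit line-continuation backslashes when a token starts on a later row, the round trip preserves at least the meaning of the expression (a sketch; exact whitespace is not guaranteed to survive):

```python
import io
import tokenize

src = '1 and \\\n not 2'
toks = tokenize.tokenize(io.BytesIO(src.encode('utf-8')).readline)
out = tokenize.untokenize(toks)

# The output compiles and evaluates to the same value as the input
# ('1 and not 2' is False), rather than the broken run-together text.
assert eval(out) == eval(src)
```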