Re: [Python-Dev] PEP 471 (scandir): Add a new DirEntry.inode() method?
Antoine Pitrou solip...@pitrou.net: The whole point of scandir is to expose low-level system calls in a cross-platform way. Cross-platform is great and preferable, but low-level system facilities should be made available even when they are unique to a particular OS. Marko ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
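For readers following the thread, here is a minimal sketch of reading inode numbers during a directory scan. It assumes Python 3.6 or later, where os.scandir() can be used as a context manager and DirEntry.inode() is available; the helper name inode_map is made up for illustration.

```python
import os


def inode_map(path):
    """Return {entry name: inode number} for the entries in a directory.

    inode_map is an illustrative helper, not part of the os module.
    """
    with os.scandir(path) as entries:
        # DirEntry.inode() reports the inode number for the entry.
        return {entry.name: entry.inode() for entry in entries}
```

Calling inode_map(".") returns one integer per directory entry.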
Re: [Python-Dev] Python 2.x and 3.x use survey, 2014 edition
Mark Roberts wiz...@gmail.com: it's outright insulting to be told my complaints about writing 2/3 compatible code are invalid on the basis of premature optimization. IMO, you should consider forking your library code for Python2 and Python3. The multidialect code will look unidiomatic for each dialect. When the critical mass favors Python3 (possibly within a couple of years), the transition will be as total and quick as from VHS to DVDs. At that point, a multidialect library would seem quaint, while a separate Python2 fork can simply be left behind (bug fixes only). Marko
Re: [Python-Dev] Python 2.x and 3.x use survey, 2014 edition
Brian Curtin br...@python.org: I'm a few inches shorter than Brett, but having done several sizable ports, dual-source has never even been on the table. I would prefer the "run 2to3 at installation time" option before maintaining two versions (which I do not prefer at all in reality). How about "run 3to2 at installation time"? Marko
Re: [Python-Dev] datetime nanosecond support (ctd?)
Ethan Furman et...@stoneleaf.us: On 12/11/2014 11:14 AM, Guido van Rossum wrote: (I wouldn't be surprised if there wasn't -- while computer clocks have a precision in nanoseconds, that doesn't mean they are that *accurate* at all (even with ntpd running). The real-world use cases deal with getting this information from other devices (network cards, GPS, particle accelerators, etc.), so it's not really a matter of cross-computer accuracy, but relative accuracy (i.e. how long did something take?). It would be nice if it were possible to deal with high-precision epoch times and time deltas without special tricks. I have had to deal with femtosecond-precision IRL (albeit in a realtime C application, not in Python). Quad-precision floats (URL: http://en.wikipedia.org/wiki/Quadruple-precision_floating-point_format) would do it for Python: * just do it in seconds * have enough precision for any needs * have enough range for any needs Marko
Re: [Python-Dev] Critical bash vulnerability CVE-2014-6271 may affect Python on *n*x and OSX
Steven D'Aprano st...@pearwood.info: Perhaps I'm missing something, but aren't there easier ways to attack os.system than the bash env vulnerability? The main concern is the cases where you provide a service accessible through an SSH login and try to sandbox the client with limited functionality. SSH passes some environment variables on to the service which can then be used as an attack vector. For example, if you wrote an SVN server's SSH front end with Python and used subprocess.Popen(shell=True) to execute the SVN operations, you could become a victim. Marko
Re: [Python-Dev] PEP 394 - Clarification of what python command should invoke
Donald Stufft don...@stufft.io: My biggest problem with ``python3``, is what happens after 3.9. I know Guido doesn’t particularly like two digit version numbers and it’s been suggested on this list that instead of 3.10 we’re likely to move directly into 4.0 regardless of if it’s a “big” change or not. python3 should be the name of the programming language specification. If CPython-4.0 supports all of the python3 programming language, /usr/bin/python3 should be a symbolic link to CPython-4.0. Or, if I reimplemented python3 with cyphon0.7, python3 could be a link to that. If CPython-4.x dropped support for some python3 features, you would no longer link python3 to it. Now, what would plain python be? It would make life easier for a lot of people if it were python2 for all eternity. By analogy, look at #!/bin/sh, which used to invoke the Bourne shell, later /bin/bash and on modern Debian, /bin/dash. Marko
Re: [Python-Dev] RFC: PEP 475, Retry system calls failing with EINTR
Victor Stinner victor.stin...@gmail.com: No, it's the opposite. The PEP doesn't change the default behaviour of SIGINT: CTRL+C always interrupts the program. Which raises an interesting question: what happens to the os.read() return value if SIGINT is received? Marko
Re: [Python-Dev] RFC: PEP 475, Retry system calls failing with EINTR
Charles-François Natali cf.nat...@gmail.com: Which raises an interesting question: what happens to the os.read() return value if SIGINT is received? There's no return value, a KeyboardInterrupt exception is raised. The PEP wouldn't change this behavior. Slightly disconcerting... but I'm sure overriding SIGINT would cure that. You don't want to lose data if you want to continue running. As for the general behavior: all programming languages/platforms handle EINTR transparently. C doesn't. EINTR is there for a purpose. I sure hope Python won't bury it under opaque APIs. The two requirements are: * Allow the application to react to signals immediately in the main flow. * Don't lose information. Marko
Re: [Python-Dev] RFC: PEP 475, Retry system calls failing with EINTR
R. David Murray rdmur...@bitdance.com: Windows. Enough said? [...] This should tell you just about everything you need to know about why we want to fix this problem so that things work cross platform. I feel your pain. Well, not really; I just don't want my linux bliss to be taken away. Marko
Re: [Python-Dev] RFC: PEP 475, Retry system calls failing with EINTR
R. David Murray rdmur...@bitdance.com: On Mon, 01 Sep 2014 14:15:52 +0300, Marko Rauhamaa ma...@pacujo.net wrote: * Allow the application to react to signals immediately in the main flow. You don't want to be writing your code in Python then. In Python you *never* get to react immediately to signals. The interpreter sets a flag and calls the python signal handler later. Yes, the call is ASAP, but ASAP is *not* immediately. You don't have to get that philosophical. Immediately means, without delay, without further I/O. Marko
Re: [Python-Dev] RFC: PEP 475, Retry system calls failing with EINTR
Victor Stinner victor.stin...@gmail.com: Proposition === If a system call fails with ``EINTR``, Python must call signal handlers: call ``PyErr_CheckSignals()``. If a signal handler raises an exception, the Python function fails with the exception. Otherwise, the system call is retried. If the system call takes a timeout parameter, the timeout is recomputed. Signals are tricky and easy to get wrong, to be sure, but I think it is dangerous for Python to unconditionally commandeer signal handling. If the proposition is accepted, there should be a way to opt out. Marko
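The retry-and-recompute behaviour described in the proposition can be sketched at application level. This is only an illustration of the idea, not the PEP's C implementation: retry_with_deadline is a hypothetical helper, and func stands for any call that takes a timeout in seconds and may raise InterruptedError (the exception Python 3 raises for EINTR).

```python
import time


def retry_with_deadline(func, timeout):
    """Call func(remaining) until it succeeds.

    After each InterruptedError the remaining timeout is recomputed
    against a fixed deadline, so retries never extend the total wait.
    Hypothetical helper for illustration only.
    """
    deadline = time.monotonic() + timeout
    remaining = timeout
    while True:
        try:
            return func(remaining)
        except InterruptedError:
            # A signal interrupted the call; shrink the timeout and retry.
            remaining = max(0.0, deadline - time.monotonic())
```

An application that wants to opt out simply calls func directly and handles InterruptedError itself.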
Re: [Python-Dev] RFC: PEP 475, Retry system calls failing with EINTR
Victor Stinner victor.stin...@gmail.com: Sorry but I don't understand your remark. What is your problem with retrying syscall on EINTR? The application will often want the EINTR return (exception) instead of having the function resume on its own. Can you please elaborate? What do you mean by "get wrong"? Proper handling of signals is difficult and at times even impossible. For example it is impossible to wake up reliably from the select(2) system call when a signal is generated (which is why linux now has pselect). Marko
Re: [Python-Dev] RFC: PEP 475, Retry system calls failing with EINTR
Victor Stinner victor.stin...@gmail.com: But I don't get your point. How does this PEP make the situation worse? Did I say it would? I just wanted to make sure the system call resumption doesn't become mandatory. Haven't thought through what the exception raising technique would entail. It might be perfectly ok apart from being a change to the signal handler API. I don't know issues of signals with select() (and without a file descriptor used to wake it up). A signal handler often sets a flag, which is inspected when select() returns. The problem is when a signal arrives between testing the flag and calling select(). The pselect() system call allows you to block signals and have the system call unblock them correctly to avoid the race. Marko
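Python does not expose pselect(), so a common workaround for the race described above is the self-pipe trick: the handler writes a byte to a pipe that select() also watches, so a signal arriving just before select() still wakes it. The sketch below assumes a POSIX system; make_wakeup_pipe, make_handler and wait_for_wakeup are illustrative names, and the handler would be installed with signal.signal() from the main thread.

```python
import os
import select


def make_wakeup_pipe():
    """Create a non-blocking pipe used only to wake up select()."""
    read_fd, write_fd = os.pipe()
    os.set_blocking(read_fd, False)
    os.set_blocking(write_fd, False)
    return read_fd, write_fd


def make_handler(write_fd):
    """Build a signal handler that records the signal as a byte in the pipe."""
    def handler(signum, frame):
        try:
            os.write(write_fd, b"\0")
        except BlockingIOError:
            pass  # pipe already full, so a wakeup is pending anyway
    return handler


def wait_for_wakeup(read_fd, timeout):
    """Return True if a signal was recorded before or during the wait."""
    readable, _, _ = select.select([read_fd], [], [], timeout)
    if not readable:
        return False
    try:
        os.read(read_fd, 4096)  # drain the pending wakeup bytes
    except BlockingIOError:
        pass
    return True
```

Because the byte stays in the pipe until it is read, it does not matter whether the signal arrives before or after the application last checked its own flag.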
Re: [Python-Dev] RFC: PEP 475, Retry system calls failing with EINTR
Ethan Furman et...@stoneleaf.us: On 08/31/2014 02:19 PM, Marko Rauhamaa wrote: The application will often want the EINTR return (exception) instead of having the function resume on its own. Examples? As an ignorant person in this area, I do not know why I would ever want to have EINTR raised instead of just getting the results of, say, my read() call. Say you are writing data into a file and it takes a long time (because there is a lot of data or the medium is very slow or there is a hardware problem). You might have designed in a signaling scheme to address just this possibility. Then, the system call had better come out right away without trying to complete the full extent of the call. If a signal is received when read() or write() has completed its task partially (> 0 bytes), no EINTR is returned but the partial count. Obviously, Python should take that possibility into account so that raising an exception in the signal handler (as mandated by the PEP) doesn't cause the partial result to be lost on os.read() or os.write(). Marko
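The partial-count behaviour is why low-level writers loop on os.write(). A minimal sketch, with write_all as an illustrative helper name:

```python
import os


def write_all(fd, data):
    """Write all of data to fd, resuming after partial writes.

    os.write() may return fewer bytes than requested (for example when
    a signal arrives mid-transfer), so the count must be checked.
    """
    view = memoryview(data)
    total = 0
    while total < len(view):
        total += os.write(fd, view[total:])
    return total
```

If the application wants to react to a signal between chunks, the place to do so is inside the loop, where the number of bytes already written is known.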
Re: [Python-Dev] RFC: PEP 475, Retry system calls failing with EINTR
R. David Murray rdmur...@bitdance.com: PS: I recently switched from using selectors to using a timeout on a socket because in that particular application I could, and because reading a socket with a timeout handles EINTR (in recent python versions), whereas reading a non-blocking socket doesn't. Under the hood, a socket with a timeout is a non-blocking socket. Under what circumstances would a nonblocking socket generate an EINTR? I believe the biggest EINTR problem child is file I/O, which is always blocking in linux. Marko
Re: [Python-Dev] PEP 476: Enabling certificate validation by default!
Paul Moore p.f.mo...@gmail.com: Cool, in which case this sounds like a good plan. I have no particular opinion on whether there should be a global Python-level "don't check certificates" option, but I would suggest that the docs include a section explaining how a user can implement a --no-check-certificates flag in their program if they want to (with appropriate warnings as to the risks, of course!). Better to explain how to do it properly than to say "you shouldn't do that" and have developers implement awkward or incorrect hacks in spite of the advice. Will there be a way to specify a particular CA certificate (as in wget --ca-certificate)? Will there be a way to specify a particular CA certificate directory (as in wget --ca-directory)? Marko
Re: [Python-Dev] Bytes path support
Stephen J. Turnbull step...@xemacs.org: Just read as bytes and decode piecewise in one way or another. For Oleg's HTML case, there's a well-understood structure that can be used to determine retry points

HTML and XML are interesting examples since their encoding is initially unknown:

    <?xml version="1.0"?>
                        ^
                        +--- Now I know it is UTF-8

    <?xml version="1.0" encoding="UTF-16"?>
                                          ^
                                          +--- Now I know it was UTF-16 all along!

Then we have:

    HTTP/1.1 200 OK
    Content-Type: text/html; charset=ISO-8859-1

    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
    <html>
    <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-16">

See how deep you have to parse the TCP stream before you realize the content encoding is UTF-16.

Marko
Re: [Python-Dev] Bytes path support
Isaac Morland ijmor...@uwaterloo.ca:

    HTTP/1.1 200 OK
    Content-Type: text/html; charset=ISO-8859-1

    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
    <html>
    <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-16">

For HTML it's not quite so bad. According to the HTML 4 standard: [...] The Content-Type header takes precedence over a meta element. I thought I read once that the reason was to allow proxy servers to transcode documents but I don't have a cite for that. Also, the meta element must only be used when the character encoding is organized such that ASCII-valued bytes stand for ASCII characters so the initial UTF-16 example wouldn't be conformant in HTML.

That's not how I read it:

    The META declaration must only be used when the character encoding is organized such that ASCII characters stand for themselves (at least until the META element is parsed). META declarations should appear as early as possible in the HEAD element.

    URL: http://www.w3.org/TR/1998/REC-html40-19980424/charset.html#doc-char-set

IOW, you must obey the HTTP character encoding until you have parsed a conflicting META content-type declaration.

The author of the standard keeps a straight face and continues:

    For cases where neither the HTTP protocol nor the META element provides information about the character encoding of a document, HTML also provides the charset attribute on several elements. By combining these mechanisms, an author can greatly improve the chances that, when the user retrieves a resource, the user agent will recognize the character encoding.

Marko
Re: [Python-Dev] Bytes path support
R. David Murray rdmur...@bitdance.com: The same problem existed in python2 if your goal was to produce a stream with a consistent encoding, but now python3 treats that as an error. I have a different interpretation of the situation: as a rule, use byte strings in Python3. Text strings are a special corner case for applications that have to deal with human languages. If your application has to talk SMTP, use bytes. If your application has to do IPC, use bytes. If your application has to do file I/O, use bytes. If your application is a word processor or an IM client, you have text strings available. You might find, though, that barely any modern GUI application is satisfied with crude text strings. You will need weights, styles, sizes, emoticons, positions, directions, shadows, alignment etc etc so it may be that Python's text strings are only good enough for storing individual characters or short snippets. In sum, Python's text strings might have one sweet spot: Usenet clients. Marko
Re: [Python-Dev] Bytes path support
Martin v. Löwis mar...@v.loewis.de: I think the people defending the "Unix file names are just bytes" side often miss an important detail: displaying file names to the user, and allowing the user to enter file names. The user interface is a real issue and needs to be addressed. It is separate from the OS interface, though. A script that just needs to traverse a directory tree and look at files by certain criteria can easily do so without worrying about a text interpretation of the file names. A single system often has file names that have been encoded with different schemes. Only today, I have had to deal with the JIS character table (URL: http://i.msdn.microsoft.com/cc305152.932%28en-us,MSDN.10%29.gif) -- you will notice that it doesn't have a backslash character. A coworker uses ISO-8859-1. I use UTF-8. UTF-8, of course, will refuse to deal with some byte sequences. My point is that the poor programmer cannot ignore the possibility of funny character sets. If Python tried to protect the programmer from that possibility, the result might be even more intractable: how to act on a file with a non-UTF-8 filename if you are unable to express it as a text string? Marko
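For context on the closing question, Python 3 addresses it with the surrogateescape error handler, which maps each undecodable byte to a lone surrogate so the name round-trips. A small sketch, assuming UTF-8 as the nominal encoding; to_text and to_bytes are illustrative names (os.fsdecode() and os.fsencode() apply the same idea with the platform's filesystem encoding).

```python
def to_text(raw_name: bytes) -> str:
    """Decode a file name, smuggling undecodable bytes as lone surrogates."""
    return raw_name.decode("utf-8", errors="surrogateescape")


def to_bytes(name: str) -> bytes:
    """Recover the exact original bytes from a name produced by to_text()."""
    return name.encode("utf-8", errors="surrogateescape")


# An ISO-8859-1 encoded name ("café.txt") that is not valid UTF-8.
latin1_name = b"caf\xe9.txt"
as_text = to_text(latin1_name)
```

The resulting string cannot be printed meaningfully, but it can be passed back to open() or os.stat() without losing the original bytes.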
Re: [Python-Dev] Bytes path support
Nick Coghlan ncogh...@gmail.com: Python 3 says it's *our* problem to deal with on behalf of our developers. URL: http://www.imdb.com/title/tt0120623/quotes?item=qt0353406 Flik: I was just trying to help. Mr. Soil: Then help us; *don't* help us. Marko
Re: [Python-Dev] Bytes path support
Tres Seaver tsea...@palladion.com: On 08/19/2014 01:43 PM, Ben Hoyt wrote: Fair enough. I don't quite understand, though -- why is the official policy to kill something that's essential on *nix? ISTM that the policy is based on a fantasy that "it looks like text to me in my use cases, so therefore it must be text for everyone". What I like about Python is that it allows me to write native linux code without having to make portability compromises that plague, say, Java. I have select.epoll(). I have os.fork(). I have socket.TCP_CORK. The textualization of Python3 seems part of a conscious effort to make Python more Java-esque. Marko
Re: [Python-Dev] Bytes path support
Guido van Rossum gu...@python.org: With my serious hat on, I would like to claim that *conceptually* filenames are most definitely text. Due to various historical accidents the UNIX system calls often encoded text as arguments, and we sometimes need to control that encoding. Due to historical accidents, text (in the Python sense) is not a first-class data type in Unix. Text, machine language, XML, Python etc are interpretations of bytes. Bytes are the first-class data type recognized by the kernel. That reality cannot be wished away. Hence the occasional need for bytes arguments. But most of the time you don't have to think about that, and forcing users to worry about it is mostly as counter-productive as forcing them to think about the encoding of every text file. The users of Python programs can often be given higher-level facades. Unix programmers, though, shouldn't be shielded from bytes. Marko
Re: [Python-Dev] Multiline with statement line continuation
Steven D'Aprano st...@pearwood.info: I simply don't care. They will try it, discover that tuples are not context managers, fix their code, and move on. *Could* tuples (and lists and sequences) be context managers? *Should* tuples (and lists and sequences) be context managers? I don't think that some vague similarity between it and tuples is justification for rejecting the proposal. You might be able to have it both ways. You could have: with (open(name) for name in os.listdir(config)) as files: ... Marko
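The generator form above is a suggestion, not existing syntax. In current Python the same effect, a variable number of context managers entered and exited together, is available through contextlib.ExitStack. A minimal sketch, with read_all as an illustrative helper name:

```python
from contextlib import ExitStack


def read_all(paths):
    """Open every file in paths, read them all, then close them together."""
    with ExitStack() as stack:
        # enter_context() registers each file so that it is closed when
        # the with block exits, even if a later open() fails.
        files = [stack.enter_context(open(path)) for path in paths]
        return [f.read() for f in files]
```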
Re: [Python-Dev] == on object tests identity in 3.x - list delegation to members?
Nick Coghlan ncogh...@gmail.com: Right, it's not a mere optimisation - it's the only way to get containers to behave sensibly. Otherwise we'd end up with nonsense like:

    >>> x = float("nan")
    >>> x in [x]
    False

Why is that nonsense? I mean, why is it any more nonsense than

    >>> x == x
    False

Anyway, personally, I'm perfectly happy to live with the choices of past generations, regardless of whether they were good or not. What you absolutely don't want to do is correct the choices of past generations.

Marko
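For reference, the behaviour CPython actually has can be checked directly: containment tests identity before equality, so a NaN object is found in a list that holds that same object, although it does not compare equal to itself. The variable names below are illustrative.

```python
x = float("nan")

equal_to_itself = (x == x)              # IEEE 754: NaN never equals NaN
found_same_object = (x in [x])          # identity is tested before ==
found_other_nan = (float("nan") in [x]) # a different NaN object is not found
```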
Re: [Python-Dev] collections.sortedtree
Guido van Rossum gu...@python.org: Yeah, so the pyftp fix is to keep track of how many timers were cancelled, and if the number exceeds a threshold it just recreates the heap, something like

    heap = [x for x in heap if not x.cancelled]
    heapify(heap)

I measured my target use case with a simple emulation on my linux PC. The simple test case emulates this scenario: Start N connections at frequency F and have each connection start a timer T. Then, rotate over the connections at the same frequency F restarting timer T. Stop after a duration that is much greater than T.

Four different timer implementations were considered:

    HEAPQ:  straight heapq
    HEAPQ*: heapq with the pyftp fix (reheapify whenever 80% of the outstanding timers have been canceled)
    SDICT:  sorteddict (my C implementation)
    PyAVL:  Python AVL tree (my implementation)

Here are the results:

    N = 1000, F = 100 Hz, T = 10 min, duration 1 hr
    =====================================================
             Virt   Res   max len()    usr    sys    CPU
              MB     MB                 s      s      %
    =====================================================
    HEAPQ     22     16     60001     21.5    4.3    0.7
    HEAPQ*    11      7      5000     18.4    4.2    0.6
    SDICT     11      6      1000     18.2    3.9    0.6
    PyAVL     11      6      1000     39.3    3.6    1.2
    =====================================================

    N = 1, F = 1000 Hz, T = 10 min, duration 1 hr
    =====================================================
             Virt   Res   max len()    usr    sys    CPU
              MB     MB                 s      s      %
    =====================================================
    HEAPQ    125    120    600044    223.0   25.8    6.9
    HEAPQ*    21     16         5    186.8   30.0    6.0
    SDICT     15     10         1    196.6   25.7    6.2
    PyAVL     16     11         1    412.5   22.3   12.1
    =====================================================

Conclusions:

    * The CPU load is almost identical in HEAPQ, HEAPQ* and SDICT.
    * HEAPQ* is better than HEAPQ because of the memory burden.
    * PyAVL is not all that bad compared with SDICT.

Marko
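The HEAPQ* variant measured above can be sketched as follows. This is not the pyftp code or the benchmark harness, just one way to implement "rebuild the heap once cancelled timers exceed a threshold"; the Timer and TimerQueue classes and their attribute names are made up for illustration, and the 0.8 default mirrors the 80% figure quoted in the post.

```python
import heapq


class Timer:
    """A timer handle (illustrative)."""

    def __init__(self, expiry):
        self.expiry = expiry
        self.cancelled = False

    def __lt__(self, other):
        # heapq orders entries using "<" only.
        return self.expiry < other.expiry


class TimerQueue:
    def __init__(self, threshold=0.8):
        self.heap = []
        self.cancelled = 0      # cancelled timers still sitting in the heap
        self.threshold = threshold

    def start(self, expiry):
        timer = Timer(expiry)
        heapq.heappush(self.heap, timer)
        return timer

    def cancel(self, timer):
        if timer.cancelled:
            return
        timer.cancelled = True
        self.cancelled += 1
        # Rebuild once dead entries exceed the threshold share of the heap.
        if self.cancelled > self.threshold * len(self.heap):
            self.heap = [t for t in self.heap if not t.cancelled]
            heapq.heapify(self.heap)
            self.cancelled = 0

    def pop_expired(self, now):
        expired = []
        while self.heap and self.heap[0].expiry <= now:
            timer = heapq.heappop(self.heap)
            if timer.cancelled:
                self.cancelled -= 1
            else:
                timer.cancelled = True  # a later cancel() becomes a no-op
                expired.append(timer)
        return expired
```

With this scheme the heap never holds more than about five times the number of live timers, which matches the max len() column for HEAPQ* in the first table.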
Re: [Python-Dev] collections.sortedtree
Raymond Hettinger raymond.hettin...@gmail.com: * An AVL balanced tree isn't the only solution or necessarily the best solution to the problem. Tree nodes tend to take more space than denser structures and they have awful cache locality (these are the same reasons that deques use doubly-linked blocks rather than plain doubly linked lists). Maybe. The key is the API. The implementation underneath should be changeable. For example, Jython would probably use SortedTree to implement it. Performance tests should help decide when an implementation is switched for a more efficient one. In some of my tests, I haven't seen any significant performance differences between RB trees and AVL trees, for example. The blist implementation, which I have taken a quick glance at, buys cache locality at the price of block copying; I have no data to decide if the tradeoff is a good one. The main thing, IMO, is to get one sorted dictionary in. * The name of the tool probably should not be sortedtree. Correct. That was a mistake on the Subject line. In the code, it's sorteddict. * That said, it is a reasonable possibility that the standard library would benefit from some kind of sorted collection (the idea comes up from time to time). Yes. As a user, I have craved an implementation, which is readily available in Java and the linux kernel, for example. Marko
Re: [Python-Dev] Negative timedelta strings
Greg Ewing greg.ew...@canterbury.ac.nz: ISO 8601 doesn't seem to define a representation for negative durations, though, so it wouldn't solve the original problem. XSD uses ISO 8601 durations and allows a sign before the initial P. It would appear PT1M means 60 or 61 seconds. P1D means 23, 24 or 25 hours. P1M means 28..31 days etc. Timedelta would have no option but to stick to seconds: P29389453.2345S but then, why not simply use a number: 29389453.2345 Marko
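The "simply use a number" option already exists in the datetime module as timedelta.total_seconds(). A short illustration of how a negative delta looks as a string versus as a plain number (variable names are illustrative):

```python
from datetime import timedelta

negative = timedelta(seconds=-90.5)

# The string form normalizes to "negative days plus positive time".
as_string = str(timedelta(seconds=-1))   # '-1 day, 23:59:59'

# The numeric form is a signed count of seconds and round-trips.
as_number = negative.total_seconds()
round_trip = timedelta(seconds=as_number)
```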
Re: [Python-Dev] collections.sortedtree
Marko Rauhamaa ma...@pacujo.net: For example, Jython would probably use SortedTree to implement it. That word just keeps coming out of my keyboard. The Java class is of course the TreeMap. Marko
Re: [Python-Dev] collections.sortedtree
Thomas Wouters tho...@python.org: Not to mention discussion about whether it shouldn't just be an existing PyPI package, like http://pypi.python.org/pypi/blist, rather than a new implementation. I'm fine with any implementation as long as it is in the standard library. Marko
[Python-Dev] collections.sortedtree
I have made a full implementation of a balanced tree and would like to know what the process is to have it considered for inclusion in Python 3. To summarize, the implementation closely parallels dict() features and resides in _collectionsmodule.c under the name collections.sortedtree. It uses solely the < operator to compare keys. I have chosen the AVL tree as an implementation technique. The views support a number of optional arguments: sorteddict.keys(reversed=False, start=unspecified, inclusive=True) The primary objective of having a balanced tree in the standard library is to support ordered access in an efficient manner. The typical applications would include timers (networking), aging (cache) and prefix patterns (routing). Marko
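To make the proposed interface concrete, here is a rough pure-Python sketch of a sorted mapping with a keys(reversed, start, inclusive) view. It is not the posted C implementation: it keeps a sorted list maintained with bisect instead of an AVL tree, so insertion and deletion are O(N) rather than O(log N), and the meaning given to start and inclusive (a lower bound when ascending, an upper bound when reversed) is an assumption, since the post does not spell it out. The class name SortedDictSketch is made up.

```python
import bisect

_UNSPECIFIED = object()


class SortedDictSketch:
    """Mapping whose keys() view is ordered; keys are compared with < only."""

    def __init__(self):
        self._keys = []   # sorted list of keys
        self._data = {}   # key -> value

    def __setitem__(self, key, value):
        if key not in self._data:
            bisect.insort(self._keys, key)
        self._data[key] = value

    def __getitem__(self, key):
        return self._data[key]

    def __delitem__(self, key):
        del self._data[key]
        del self._keys[bisect.bisect_left(self._keys, key)]

    def keys(self, reversed=False, start=_UNSPECIFIED, inclusive=True):
        keys = self._keys
        if not reversed:
            # Ascending: start is treated as a lower bound (assumption).
            if start is _UNSPECIFIED:
                lo = 0
            elif inclusive:
                lo = bisect.bisect_left(keys, start)
            else:
                lo = bisect.bisect_right(keys, start)
            return keys[lo:]
        # Descending: start is treated as an upper bound (assumption).
        if start is _UNSPECIFIED:
            hi = len(keys)
        elif inclusive:
            hi = bisect.bisect_right(keys, start)
        else:
            hi = bisect.bisect_left(keys, start)
        return keys[:hi][::-1]
```

For a timer use case, keys() without arguments yields expiries in order, and keys(start=now, inclusive=False) yields only the timers still in the future.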
Re: [Python-Dev] collections.sortedtree
Guido van Rossum gu...@python.org: Actually, the first step is to publish it on PyPI, the second is to get a fair number of happy users there. The bar for getting something included into the stdlib is pretty high -- you need to demonstrate that there is a need *and* that having it as a 3rd party module is a problem. I hear you about the process. About the need part, I'm wondering if you haven't felt it in implementing the timers for asyncio. I have had that need in several network programming projects and have ended up using my AVL tree implementation (C and Python). Well, time will tell if frequent canceled timers end up piling up the heap queue. Marko
Re: [Python-Dev] collections.sortedtree
Antoine Pitrou solip...@pitrou.net: Wouldn't a heapq work as well for those two? In my experience, networking entities typically start a timer at each interaction and cancel the pending one. So you have numerous timers that virtually never expire. You might have 100 interactions per second, each canceling and restarting a 10-minute timer. I don't know first hand if that causes heap queues to cause measurable heap or CPU pressure. Marko
Re: [Python-Dev] collections.sortedtree
Terry Reedy tjre...@udel.edu: Perhaps the collections doc should mention that there are other specialized container types available on PyPI and either list some or point to a wiki page listing some. There must be at least 10 that could be included in such a list. There *is* a relatively high threshold of importing C extensions from an external source. If I build an application making use of them and advise coworkers to use it, they would likely balk at having to compile them. Not all machines have a development toolkit. Furthermore: # which pip /usr/bin/which: no pip in (/usr/local/sbin:/usr/local/bin:/sbin:/bin:\ /usr/sbin:/usr/bin:/root/bin) # yum install pip No package pip available. Marko
Re: [Python-Dev] collections.sortedtree
Antoine Pitrou solip...@pitrou.net: Marko Rauhamaa ma...@pacujo.net wrote: In my experience, networking entities typically start a timer at each interaction and cancel the pending one. So you have numerous timers that virtually never expire. You might have 100 interactions per second, each canceling and restarting a 10-minute timer. Each individual heapq operation (push or pop) will be O(log n). That's not different from a balanced search tree (although of course the constant multiplier may vary). Yes, but say I have 1000 connections with one active timer each. The size of my sorted tree is then 1000 timer objects. There are typically no expiries to react to. If each canceled timer lingers in the heapq till its expiry (in 10 minutes), the size is 100 * 10 * 60 = 60,000. The CPU has to wake up constantly to clear the expired timers. In practice, none of that might matter. Marko
Re: [Python-Dev] collections.sortedtree
Dan Stromberg drsali...@gmail.com: It'd likely make sense to have either a pure python implementation, or pure python and C-extended, so that Pypy and Jython can share the feature with CPython. Jython can build directly on Java's native SortedMap implementation. The API should not tie it to a tree. Optimizations and refactorings should be allowed. Only O(log N) worst-case behavior should be mandated. (And now I notice I named this thread wrong; I named my thingy collections.sorteddict.) Marko