Re: [Python-Dev] collections.sortedtree
vne On Mon, Mar 31, 2014 at 2:01 PM, Daniel Stutzbach stutzb...@google.com wrote: On Fri, Mar 28, 2014 at 6:45 PM, Dan Stromberg drsali...@gmail.com wrote: In my testing blist.sorteddict was dead last for random keys, and wasn't last but was still significantly underperforming for sequential keys (outperforming only binary tree and scapegoat tree, behind all others): http://stromberg.dnsalias.org/~strombrg/python-tree-and-heap-comparison/2014-03-18/ Could you post the source code for your test tools, so that I can reproduce them locally and understand the results better? I think I'm confused about what you're trying to measure. It looks like the tests perform get and set operations, neither of which required a sorted dict. Wouldn't a good comparison of sorted dict types include at least one operation that relies on the sorted property? Possibly I've misunderstood how your tests work. The code is at http://stromberg.dnsalias.org/svn/python-tree-and-heap-comparison/trunk/ You're right, I'm not comparing find_min, find_max, generate_in_order or generate_reverse_order. The first two tend to be the same as find_arbitrary. I've been thinking about trying to import some C extension module trees, and just ignoring them if they fail to import. But I haven't. Trunk only compares pure python implementations - even for treap, which has a cython version. I tossed my changes after checking blist. Trunk does operations/second. ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] collections.sortedtree
Guido van Rossum gu...@python.org: Yeah, so the pyftp fix is to keep track of how many timers were cancelled, and if the number exceeds a threshold it just recreates the heap, something like heap = [x for x in heap if not x.cancelled] heapify(heap) I measured my target use case with a simple emulation on my linux PC. The simple test case emulates this scenario: Start N connections at frequency F and have each connection start a timer T. Then, rotate over the connections at the same frequency F restarting timer T. Stop after a duration that is much greater than T. Four different timer implementations were considered: HEAPQ: straight heapq HEAPQ*: heapq with the pyftp fix (reheapify whenever 80% of the outstanding timers have been canceled) SDICT: sorteddict (my C implementation) PyAVL: Python AVL tree (my implementation) Here are the results: N = 1000, F = 100 Hz, T = 10 min, duration 1 hr = Virt Res max len() urs sys CPU MB MB s s % = HEAPQ22 166000121.5 4.3 0.7 HEAPQ* 117 500018.4 4.2 0.6 SDICT116 100018.2 3.9 0.6 PyAVL116 100039.3 3.6 1.2 = N = 1, F = 1000 Hz, T = 10 min, duration 1 hr = Virt Res max len() urs sys CPU MB MB s s % = HEAPQ 125 120 600044 223.0 25.8 6.9 HEAPQ* 21 165 186.8 30.0 6.0 SDICT15 101 196.6 25.7 6.2 PyAVL16 111 412.5 22.3 12.1 = Conclusions: * The CPU load is almost identical in HEAPQ, HEAPQ* and SDICT. * HEAPQ* is better than HEAPQ because of the memory burden. * PyAVL is not all that bad compared with SDICT. Marko ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] collections.sortedtree
Tres Seaver writes: I'm mostly arguing the FLOSS project You mean a (mostly) volunteer-supported FLOSS project, no? should feel free to ignore Given the above qualification, you can put a period here, as far as I'm concerned. My question is what does *Python* *want* to ignore?, not is it allowed to ignore? high-maintenance-cost What high maintenance cost? A stdlib addition is a marginal increase in cost. Do too many, and it's a serious burden, of course. So there needs to be *some* hurdle that any addition must clear, and that barrier becomes higher in proportion to the code base to developers who actually do maintenance ratio. But I think the question should be how high? not can they pay? commercial concerns until those concerns bring either blook (funded developer time) or treasure (pooled to pay for the same time) to the table to pay for them. I really don't think commercial profit as the motive for a request, or ability to pay, should be an important reason to *ignore* user wants. This smacks of the proposals for ransom software, which (a) sort of turns the principle of *volunteer* FLOSS on its head (if that doesn't bother most of the developers, no problem, but for one it bothers *me*), and (b) doesn't actually work AFAICT (it's not quite the same as crowdfunding, which does work). It's rather the reverse: I believe we should be prepared to deal with the conflict of interest that results when *some* of the developers are offered money to provide *cater* to such concerns[1] where the community doesn't think the benefit is that high. Footnotes: [1] Should not be that hard in *this* community, but that is the issue IMO. ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] collections.sortedtree
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On 03/29/2014 03:46 AM, Stephen J. Turnbull wrote: I really don't think commercial profit as the motive for a request, or ability to pay, should be an important reason to *ignore* user wants. We've already got corrosion on the terminals the leaky batteries-included stdlib (ssl, anyone?). I see nothing wrong with rejecting additional batteries proposed purely FBO organizations who have the kind of policies you describe, but don't contribute blood-or-treasure toward their maintenannce (*especially* because they are in the enterprise space, and could be expected to pay for that kind of support). This kind of Python-in-a-tie discussion is probably moot for any version of Python that python-dev actually cares about, BTW: by the time such organizations get around to using 3.5, it likely won't even be getting security releases from the core developers. Tres. - -- === Tres Seaver +1 540-429-0999 tsea...@palladion.com Palladion Software Excellence by Designhttp://palladion.com -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.11 (GNU/Linux) Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/ iEYEARECAAYFAlM3Q2wACgkQ+gerLs4ltQ7Y5QCdGhrpenkXJnRXoseVabQDwJHX 0zcAn0tAf6iHT276oxjZS/ZhPu49wF8M =6Gci -END PGP SIGNATURE- ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] collections.sortedtree
Chris Angelico writes: Don't forget, of course, that there is a middle ground. Something that's really REALLY awesome on PyPI but isn't in the stdlib might be packaged by various Linux distros. Oh, agreed, and any organization that cares that much will already have the RHEL or Ubuntu LTS (etc, those are the ones my buddies use) contract. CPython is not the only certification that is trusted (to whatever degree) by users of CPython. But forget Debian and Gentoo at least -- all it takes get in to those distros is a license (free) and a maintainer (at application time). And they'll even wink at the license given sufficient user demand. Steve ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] collections.sortedtree
-Original Message- From: Python-Dev [mailto:python-dev- bounces+kristjan=ccpgames@python.org] On Behalf Of Antoine Pitrou Sent: 27. mars 2014 15:53 To: python-dev@python.org Subject: Re: [Python-Dev] collections.sortedtree On Thu, 27 Mar 2014 08:50:01 -0700 Daniel Stutzbach stutzb...@google.com wrote: To provide efficient cancellation and removal, a heap implementation needs some way to efficiently answer What is the current index of this item?. There are a couple of ways to achieve that, but they all require more storage than heapq's list-based approach. You are right. I was assuming the index is already known. Regards Antoine. Yes. But for rare occurrances, it is annoying that heap.remove(item) is more expensive than it needs to be. It is a reasonable operation, just like list.remove() is. I'll be willing to experiment with extending the heapq. methods to take an optional map argument. 'map' would be a dict, mapping objects in the heap to indices. If provided, each of the heapq methouds would take care to update the map of any objects that it touches with the current index of the object. Usage could be something like: heap = [] map = {} def insert(i): heapq.heappush(heap, I, map) def pop(i): return heapq.heappop(heap, map) def remove(i): heapq.heapdel(heap, map[i], map) K ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] collections.sortedtree
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On 03/27/2014 04:11 AM, Stephen J. Turnbull wrote: Maybe. That depends on if you care about the convenience of folks who have to get new modules past Corporate Security, but it's easier to get an upgrade of the whole shebang. I don't think it's ever really been resolved whether they're a typical case that won't go away or a special group whose special needs should be considered. ISTM that such concerns should be addressed via some kind of paid support contract (a la RHEL), and not used to drive decisions for the underlying FLOSS project. The existence of aggregated resources from such a support organization would then be relevant to the include / exclude discussion: presumably the support organization could fund the maintenance of the otherwise-excluded module based on its customers' paid-for needs. Tres. - -- === Tres Seaver +1 540-429-0999 tsea...@palladion.com Palladion Software Excellence by Designhttp://palladion.com -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.11 (GNU/Linux) Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/ iEYEARECAAYFAlM1jtQACgkQ+gerLs4ltQ5OOgCdHeOjBjpfJ1w5mkAWZsajflWK U3wAmgIxnc7BFIaoouQ0kdkCgoF+lMr3 =yhYm -END PGP SIGNATURE- ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] collections.sortedtree
On 28 March 2014 16:22, Tres Seaver tsea...@palladion.com wrote: On 03/28/2014 12:18 PM, Tres Seaver wrote: I'm mostly arguing the FLOSS project should feel free to ignore high-maintenance-cost commercial concerns until those concerns bring either blook (funded developer time) or treasure (pooled to pay for the same time) to the table to pay for them. Dangit, s/blook/blood/ Rats. I thought I'd just learned a new word. Regardless of reality, I think blook should be a real word :-) Paul ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] collections.sortedtree
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On 03/28/2014 11:57 AM, Stephen J. Turnbull wrote: So, let me get this straight: you think that one user should pay Red Hat to vet the package for RHEL, and another user should pay to get it into Ubuntu, and another user to get it into SuSE? And then the distros should all pay lawyers to write contracts to make sure that nobody is paying too much for support in the stdlib when they eventually get it in? (Except the customers, of course, everybody will be happy if *they* pay more than they need to.) No, I'm arguing that *if* the concerns you articulate represent a significant number of corporate* customers (i.e., a market), their interests would be better represented by *some* organization who is paid to reflect them. RHEL and Ubuntu LTS could potentially be contributors to that pool. I'm mostly arguing the FLOSS project should feel free to ignore high-maintenance-cost commercial concerns until those concerns bring either blook (funded developer time) or treasure (pooled to pay for the same time) to the table to pay for them. Tres. - -- === Tres Seaver +1 540-429-0999 tsea...@palladion.com Palladion Software Excellence by Designhttp://palladion.com -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.11 (GNU/Linux) Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/ iEYEARECAAYFAlM1oPAACgkQ+gerLs4ltQ52ZgCg06AobjcZa1lGBDtFzRk6IjEK 6DkAnj33xAkqcDUjLGaT9NP4YtZdkAos =62VL -END PGP SIGNATURE- ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] collections.sortedtree
Tres Seaver writes: On 03/27/2014 04:11 AM, Stephen J. Turnbull wrote: Maybe. That depends on if you care about the convenience of folks who have to get new modules past Corporate Security, but it's easier to get an upgrade of the whole shebang. I don't think it's ever really been resolved whether they're a typical case that won't go away or a special group whose special needs should be considered. ISTM that such concerns should be addressed via some kind of paid support contract (a la RHEL), and not used to drive decisions for the underlying FLOSS project. The existence of aggregated resources from such a support organization would then be relevant to the include / exclude discussion: presumably the support organization could fund the maintenance of the otherwise-excluded module based on its customers' paid-for needs. So, let me get this straight: you think that one user should pay Red Hat to vet the package for RHEL, and another user should pay to get it into Ubuntu, and another user to get it into SuSE? And then the distros should all pay lawyers to write contracts to make sure that nobody is paying too much for support in the stdlib when they eventually get it in? (Except the customers, of course, everybody will be happy if *they* pay more than they need to.) Seems to me that putting it into stdlib in the first place makes more sense, if it's going to end up there at all. Another way to put it is, we need a better way to fund support of routine maintenance (ie, the unfun parts) than negotiating it module by module. ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] collections.sortedtree
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On 03/28/2014 12:18 PM, Tres Seaver wrote: I'm mostly arguing the FLOSS project should feel free to ignore high-maintenance-cost commercial concerns until those concerns bring either blook (funded developer time) or treasure (pooled to pay for the same time) to the table to pay for them. Dangit, s/blook/blood/ Tres. - -- === Tres Seaver +1 540-429-0999 tsea...@palladion.com Palladion Software Excellence by Designhttp://palladion.com -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.11 (GNU/Linux) Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/ iEYEARECAAYFAlM1ocMACgkQ+gerLs4ltQ6k8ACgm+ycaOaqZGsefJU5iu9kL4bS 1e4AmQHq/vb4Q6GV/MNuCZQr4HKG/JER =8fLu -END PGP SIGNATURE- ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] collections.sortedtree
On Fri, Mar 28, 2014 at 09:20:35AM +, Kristján Valur Jónsson wrote: I'll be willing to experiment with extending the heapq. methods to take an optional map argument. 'map' would be a dict, mapping objects in the heap to indices. If provided, each of the heapq methouds would take care to update the map of any objects that it touches with the current index of the object. Usage could be something like: heap = [] map = {} def insert(i): heapq.heappush(heap, I, map) def pop(i): return heapq.heappop(heap, map) def remove(i): heapq.heapdel(heap, map[i], map) If you're going to make heapq more complex, wouldn't it be better to encapsulate the logic into a Heap class? heapq should remain the same, and Heap could (if accepted) move into collections. And should this discussion move to python-ideas? -- Steven ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] collections.sortedtree
On 2014-03-28 16:39, Paul Moore wrote: On 28 March 2014 16:22, Tres Seaver tsea...@palladion.com wrote: On 03/28/2014 12:18 PM, Tres Seaver wrote: I'm mostly arguing the FLOSS project should feel free to ignore high-maintenance-cost commercial concerns until those concerns bring either blook (funded developer time) or treasure (pooled to pay for the same time) to the table to pay for them. Dangit, s/blook/blood/ Rats. I thought I'd just learned a new word. Regardless of reality, I think blook should be a real word :-) Yes, and it should mean funded developer time on or for an open source project. ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] collections.sortedtree
On Mar 26, 2014, at 1:31 PM, Marko Rauhamaa ma...@pacujo.net wrote: I have made a full implementation of a balanced tree and would like to know what the process is to have it considered for inclusion in Python 3. To summarize, the implementation closely parallels dict() features and resides in _collectionsmodule.c under the name collections.sortedtree. It uses solely the operator to compare keys. I have chosen the AVL tree as an implementation technique. FWIW, I think there may be room for a sorted collection somewhere in the standard library. As others have said, the best place to start is by putting a module on PyPi to let it mature and to compete with other approaches to the problem. Here are a few random thoughts on the over-all idea: * An AVL balanced tree isn't the only solution or necessarily the best solution to the problem. Tree nodes tend to take more space than denser structures and they have awful cache locality (these are the same reasons that deques use doubly-linked blocks rather than a plain doubly linked lists). * Another approach I have experimented with uses lazy sorting. That lets insertion be an O(1) step and incurs a one-time sorting cost upon the next lookup (and because Timsort exploits partial orderings, this can be very cheap). A lazy sorted list is dense and sequentially ordered in memory (reducing space overhead, taking full advantage of cache locality and memory controller auto-prefetch, and providing fast iteration speed by not needing to chase pointers). The lazy sort approach works best in applications that spend most of the time doing lookups and have only infrequent deletions and insertions. * The name of the tool probably should not be sortedtree. Ideally, the tool should be named for what it does, not how it does it (for the most part, users don't need to know whether the underlying implementation is a red-black tree, b-tree, judy array, sqlite database, or lazy list). That is why (I think) that Python dicts are called dicts rather than hash tables (the implementation) or an associative array (the computer science term for the abstract datatype). * There are plenty of data structures that have had good utility and popularity outside of the standard library. I that it is a good thing that blists, numpy arrays, databases, and panda dataframes all live outside the standard library. Those tools are easier to maintain externally and it is easier for you to keep control over the design. Remember the saying, the standard library is where code goes to die (and I would add that it should already be mature (or nearly dead) by the time it gets there). * That said, it is a reasonable possibility that the standard library would benefit from some kind sorted collection (the idea comes up from time to time). Raymond Hettinger___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] collections.sortedtree
Raymond Hettinger raymond.hettin...@gmail.com: * An AVL balanced tree isn't the only solution or necessarily the best solution to the problem. Tree nodes tend to take more space than denser structures and they have awful cache locality (these are the same reasons that deques use doubly-linked blocks rather than a plain doubly linked lists). Maybe. The key is the API. The implementation underneath should be changeable. For example, Jython would probably use SortedTree to implement it. Performance tests should help decide when an implementation is switched for a more efficient one. In some of my tests, I haven't seen any significant performance differences between RB trees and AVL trees, for example. The blist implementation, which I have taken a quick glance at, buys cache locality at the price of block copying; I have no data to decide if the tradeoff is a good one. The main thing, IMO, is to get one sorted dictionary in. * The name of the tool probably should not be sortedtree. Correct. That was a mistake on the Subject line. In the code, it's sorteddict. * That said, it is a reasonable possibility that the standard library would benefit from some kind sorted collection (the idea comes up from time to time). Yes. As a user, I have craved for an implementation, which is readily available in Java and the linux kernel, for example. Marko ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] collections.sortedtree
Marko Rauhamaa ma...@pacujo.net: For example, Jython would probably use SortedTree to implement it. That word just keeps coming out of my keyboard. The Java class is of course the TreeMap. Marko ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] collections.sortedtree
On 29 March 2014 01:57, Stephen J. Turnbull step...@xemacs.org wrote: Another way to put it is, we need a better way to fund support of routine maintenance (ie, the unfun parts) than negotiating it module by module. Yes, yes we do, and there *are* people working on that (see https://wiki.python.org/moin/PythonSoftwareFoundation/BoardCandidates2014) :) Cheers, Nick. -- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] collections.sortedtree
On Fri, Mar 28, 2014 at 5:48 PM, Daniel Stutzbach stutzb...@google.com wrote: On Fri, Mar 28, 2014 at 2:54 PM, Marko Rauhamaa ma...@pacujo.net wrote: The blist implementation, which I have taken a quick glance at, buys cache locality at the price of block copying; I have no data to decide if the tradeoff is a good one. There's actually a compile-time parameter so that you can tune the tradeoff by adjusting how wide each of blist's nodes are. Like most tradeoffs, whether it's good or bad depends on what you're trying to do. RedBlack trees are very similar to a B-tree with a node-width of 4: http://en.wikipedia.org/wiki/Red%E2%80%93black_tree#Analogy_to_B-trees_of_order_4 blist's sorteddict is written in Python on top of the blist type (which is written in C). Because of the Python layer, it's slower by a constant factor compared to pure-C implementations in microbenchmarks. It grieves me a bit to say this, and blist and blist's sorteddict are probably a good tool for an appropriate job, but blist's sorteddict type is quite a bit slower than several pure python implementations of dictionaries with ordered keys. In my testing blist.sorteddict was dead last for random keys, and wasn't last but was still significantly underperforming for sequential keys (outperforming only binary tree and scapegoat tree, behind all others): http://stromberg.dnsalias.org/~strombrg/python-tree-and-heap-comparison/2014-03-18/ ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] collections.sortedtree
Nick Coghlan writes: On 27 Mar 2014 07:02, Guido van Rossum gu...@python.org wrote: Actually, the first step is publish it on PyPI, the second is to get a fair number of happy users there. The bar for getting something included into the stdlib is pretty high The why not a third party module? bar also got a fair bit higher with Python 3.4 - by bundling pip, we have deliberately made third party modules easier to consume, thus weakening the convenience argument that applies to stdlib inclusion. Maybe. That depends on if you care about the convenience of folks who have to get new modules past Corporate Security, but it's easier to get an upgrade of the whole shebang. I don't think it's ever really been resolved whether they're a typical case that won't go away or a special group whose special needs should be considered. Steve ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] collections.sortedtree
On Thu, Mar 27, 2014 at 10:11 AM, Stephen J. Turnbull step...@xemacs.org wrote: Nick Coghlan writes: On 27 Mar 2014 07:02, Guido van Rossum gu...@python.org wrote: Actually, the first step is publish it on PyPI, the second is to get a fair number of happy users there. The bar for getting something included into the stdlib is pretty high The why not a third party module? bar also got a fair bit higher with Python 3.4 - by bundling pip, we have deliberately made third party modules easier to consume, thus weakening the convenience argument that applies to stdlib inclusion. Maybe. That depends on if you care about the convenience of folks who have to get new modules past Corporate Security, but it's easier to get an upgrade of the whole shebang. I don't think it's ever really been resolved whether they're a typical case that won't go away or a special group whose special needs should be considered. Steve And random pieces of C included in the standard library can be shuffled under the carpet under the disguise of upgrade or what are you suggesting? ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] collections.sortedtree
On 27 March 2014 08:16, Maciej Fijalkowski fij...@gmail.com wrote: And random pieces of C included in the standard library can be shuffled under the carpet under the disguise of upgrade or what are you suggesting? The sort of thing that happens is that the relevant approvers will accept python-dev as a trusted supplier and then Python upgrades are acceptable subject to review of the changes, etc. For a new module, there is a whole other level of questions around how do we trust the person who developed the code, do we need to do a full code review, etc? It's a bit unfair to describe the process as random pieces of C being shuffled under the carpet. (Although there probably are environments where that is uncomfortably close to the truth :-() Paul ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] collections.sortedtree
On Thu, Mar 27, 2014 at 11:07 AM, Paul Moore p.f.mo...@gmail.com wrote: On 27 March 2014 08:16, Maciej Fijalkowski fij...@gmail.com wrote: And random pieces of C included in the standard library can be shuffled under the carpet under the disguise of upgrade or what are you suggesting? The sort of thing that happens is that the relevant approvers will accept python-dev as a trusted supplier and then Python upgrades are acceptable subject to review of the changes, etc. For a new module, there is a whole other level of questions around how do we trust the person who developed the code, do we need to do a full code review, etc? It's a bit unfair to describe the process as random pieces of C being shuffled under the carpet. (Although there probably are environments where that is uncomfortably close to the truth :-() Paul I just find my company is stupid so let's work around it by putting stuff to python standard library unacceptable argument for python-dev and all the python community. ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] collections.sortedtree
True. I've long since added a heapdel() to our local fork. a heappop(idx=0) extension would do the same I can provide a patch if there is interest. K Ideally, I think you should be able to replace the cancelled item with the last item in the heap and then fix the heap in logarithmic time, but the heapq API doesn't seem to provide a way to do this. Regards Antoine. ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] collections.sortedtree
On 27 March 2014 18:11, Stephen J. Turnbull step...@xemacs.org wrote: Nick Coghlan writes: On 27 Mar 2014 07:02, Guido van Rossum gu...@python.org wrote: Actually, the first step is publish it on PyPI, the second is to get a fair number of happy users there. The bar for getting something included into the stdlib is pretty high The why not a third party module? bar also got a fair bit higher with Python 3.4 - by bundling pip, we have deliberately made third party modules easier to consume, thus weakening the convenience argument that applies to stdlib inclusion. Maybe. That depends on if you care about the convenience of folks who have to get new modules past Corporate Security, but it's easier to get an upgrade of the whole shebang. I don't think it's ever really been resolved whether they're a typical case that won't go away or a special group whose special needs should be considered. I'm all too well aware of this particular problem (although thankfully don't have to deal with it myself any more), and it's a key part of why I said a fair bit higher rather than insurmountable :) Such environments *do* usually have procedures to get additional open source software approved, it's just a pain. That pain then becomes another factor to take into account deciding between using a simpler solution, rolling your own custom solution, or tackling your organisation's module approval process. That said, getting approval is definitely easier when the request is to trust the Python Software Foundation vs J. Random Programmer's account on GitHub, so yes, it still counts in favour of sufficiently compelling stdlib additions. Cheers, Nick. -- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] collections.sortedtree
On 27 March 2014 19:10, Maciej Fijalkowski fij...@gmail.com wrote: On Thu, Mar 27, 2014 at 11:07 AM, Paul Moore p.f.mo...@gmail.com wrote: On 27 March 2014 08:16, Maciej Fijalkowski fij...@gmail.com wrote: And random pieces of C included in the standard library can be shuffled under the carpet under the disguise of upgrade or what are you suggesting? The sort of thing that happens is that the relevant approvers will accept python-dev as a trusted supplier and then Python upgrades are acceptable subject to review of the changes, etc. For a new module, there is a whole other level of questions around how do we trust the person who developed the code, do we need to do a full code review, etc? It's a bit unfair to describe the process as random pieces of C being shuffled under the carpet. (Although there probably are environments where that is uncomfortably close to the truth :-() Paul I just find my company is stupid so let's work around it by putting stuff to python standard library unacceptable argument for python-dev and all the python community. Due diligence and prudent risk management are not stupid - most open source projects and small companies just don't have the luxury of worrying about them, as they're so far down the list of concerns that the additional risk of using arbitrary code downloaded off the internet doesn't even register. As organisations get larger, they have more to lose, so they typically start worrying about that kind of thing. Properly vetting software for licensing and potential security issues is expensive though, so they usually prefer to outsource that task to trusted providers (this is one of the key concepts that gets me paid). Contractual arrangements and brand reputations start to replace blind trust in upstream developers. Is this less efficient than full open collaboration? Yes, it is - the verify part of trust, but verify comes at a real cost in time and money. However, there are plenty of good reasons that phrase is trust, but verify rather than trust unconditionally. You may choose not to care about those more cautious users, and that's fine - but they're still worth taking into account when making design decisions if you want to cross the marketing chasm from early adopters to everyone else. Regards, Nick. -- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] collections.sortedtree
On Thu, Mar 27, 2014 at 8:58 PM, Nick Coghlan ncogh...@gmail.com wrote: On 27 March 2014 19:10, Maciej Fijalkowski fij...@gmail.com wrote: I just find my company is stupid so let's work around it by putting stuff to python standard library unacceptable argument for python-dev and all the python community. Due diligence and prudent risk management are not stupid - most open source projects and small companies just don't have the luxury of worrying about them, as they're so far down the list of concerns that the additional risk of using arbitrary code downloaded off the internet doesn't even register. I don't think anyone's saying it's stupid to be cautious, but more that it's stupid to blindly accept the latest python.org release and *not* accept something from another source. And if that's stupid, well, I'm stupid too - blindly accepting a whole lot of binary package updates because they're on ftp.au.debian.org, for instance. Why do I trust that, and not random sites on the internet? Because I trust that the Debian package maintainers to check what goes through, and I trust that there are people with reputations at stake, who won't want to send something dodgy through. It's not perfect, but it's a whole lot easier than checking every single package that goes through. ChrisA ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] collections.sortedtree
On 27 March 2014 20:38, Chris Angelico ros...@gmail.com wrote: On Thu, Mar 27, 2014 at 8:58 PM, Nick Coghlan ncogh...@gmail.com wrote: On 27 March 2014 19:10, Maciej Fijalkowski fij...@gmail.com wrote: I just find my company is stupid so let's work around it by putting stuff to python standard library unacceptable argument for python-dev and all the python community. Due diligence and prudent risk management are not stupid - most open source projects and small companies just don't have the luxury of worrying about them, as they're so far down the list of concerns that the additional risk of using arbitrary code downloaded off the internet doesn't even register. I don't think anyone's saying it's stupid to be cautious, but more that it's stupid to blindly accept the latest python.org release and *not* accept something from another source. And if that's stupid, well, I'm stupid too - blindly accepting a whole lot of binary package updates because they're on ftp.au.debian.org, for instance. Why do I trust that, and not random sites on the internet? Because I trust that the Debian package maintainers to check what goes through, and I trust that there are people with reputations at stake, who won't want to send something dodgy through. It's not perfect, but it's a whole lot easier than checking every single package that goes through. Right - trusting the PSF and the Debian package review process are reasonable trade-offs at a personal level. However, it highlights why this is still a benefit of bringing things into the standard library: a lot of the overhead in review and audit processes is incurred based on the *number of components to be reviewed*, rather than the amount of code they contain. Other aspects of the overhead are incurred *per organisation trusted*. Inclusion in the standard library means not only bringing something under the PSF's trust umbrella, but also within the core Python component. While we usually use corporate environments as our example, it applies equally to Linux distros - they generally have Python packaged already, and will eventually inherit standard library improvements. For third party packages, it requires someone to do the work to get them installed. Extending our trust to include a new component isn't to be done lightly, but it *does* genuinely save work in the long run for a whole lot of other people whenever we choose to do so, and that is the point Stephen was making. Cheers, Nick. -- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] collections.sortedtree
On Thu, Mar 27, 2014 at 10:10 PM, Nick Coghlan ncogh...@gmail.com wrote: Extending our trust to include a new component isn't to be done lightly, but it *does* genuinely save work in the long run for a whole lot of other people whenever we choose to do so, and that is the point Stephen was making. Exactly so. Blessing a module as part of the stdlib is an important mark of trust, and one that is most definitely not stupid to respect. (In case my previous message was ambiguous, I am fully in favour of people trusting python.org in this way, and fully understanding of people NOT trusting other sources, or at least not automatically.) ChrisA ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] collections.sortedtree
On Thu, 27 Mar 2014 09:25:05 + Kristján Valur Jónsson krist...@ccpgames.com wrote: True. I've long since added a heapdel() to our local fork. a heappop(idx=0) extension would do the same I can provide a patch if there is interest. I think either of them would be cool. I don't know it would be approved by Raymond or whoever else. Regards Antoine. ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] collections.sortedtree
27.03.14 00:16, Guido van Rossum написав(ла): Yeah, so the pyftp fix is to keep track of how many timers were cancelled, and if the number exceeds a threshold it just recreates the heap, something like heap = [x for x in heap if not x.cancelled] heapify(heap) See also http://bugs.python.org/issue13451 which proposes such approach for the sched module. ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] collections.sortedtree
On Wed, Mar 26, 2014 at 9:55 PM, Benjamin Peterson benja...@python.orgwrote: On Wed, Mar 26, 2014, at 13:31, Marko Rauhamaa wrote: I have made a full implementation of a balanced tree and would like to know what the process is to have it considered for inclusion in Python 3. It's not a bad idea. (I believe others have proposed an red-black tree.) Certainly, it requires a PEP and a few months of bikesheding, though. Not to mention discussion about whether it shouldn't just be an existing PyPI package, like http://pypi.python.org/pypi/blist, rather than a new implementation. -- Thomas Wouters tho...@python.org Hi! I'm an email virus! Think twice before sending your email to help me spread! ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] collections.sortedtree
Thomas Wouters tho...@python.org: Not to mention discussion about whether it shouldn't just be an existing PyPI package, like http://pypi.python.org/pypi/blist, rather than a new implementation. I'm fine with any implementation as long as it is in the standard library. Marko ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] collections.sortedtree
Surely you can show empathy and still explain why it's not that easy. On Mar 27, 2014 2:11 AM, Maciej Fijalkowski fij...@gmail.com wrote: On Thu, Mar 27, 2014 at 11:07 AM, Paul Moore p.f.mo...@gmail.com wrote: On 27 March 2014 08:16, Maciej Fijalkowski fij...@gmail.com wrote: And random pieces of C included in the standard library can be shuffled under the carpet under the disguise of upgrade or what are you suggesting? The sort of thing that happens is that the relevant approvers will accept python-dev as a trusted supplier and then Python upgrades are acceptable subject to review of the changes, etc. For a new module, there is a whole other level of questions around how do we trust the person who developed the code, do we need to do a full code review, etc? It's a bit unfair to describe the process as random pieces of C being shuffled under the carpet. (Although there probably are environments where that is uncomfortably close to the truth :-() Paul I just find my company is stupid so let's work around it by putting stuff to python standard library unacceptable argument for python-dev and all the python community. ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] collections.sortedtree
On Thu, 27 Mar 2014 08:50:01 -0700 Daniel Stutzbach stutzb...@google.com wrote: Due to way the heapq is implemented, it can't provide an efficient API for removing an arbitrary item. Swapping with the last element allows you to efficiently remove the item at a particular index, but you first need to find the current index of an item, which requires a O(n) scan. To provide efficient cancellation and removal, a heap implementation needs some way to efficiently answer What is the current index of this item?. There are a couple of ways to achieve that, but they all require more storage than heapq's list-based approach. You are right. I was assuming the index is already known. Regards Antoine. ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] collections.sortedtree
for our stackless socket framework we have the same issue. Windows provides an opaque timer system where a timer can be cancelled by handle. But on linux one has to craft one's own. One thing with this particular use case is that a heapq is overkill for network timer. . For network timers a granularity of one second should typically be sufficient. Thus, one can implement a map of future times (in quantisized time, e.g. whole seconds) to sets of timers. A timer is then keyed by its quantisized due time plus its callback. Cancellation can then be O(1). From: Python-Dev [mailto:python-dev-bounces+kristjan=ccpgames@python.org] On Behalf Of Guido van Rossum Sent: 26. mars 2014 21:42 To: Marko Rauhamaa Cc: Python-Dev Subject: Re: [Python-Dev] collections.sortedtree I haven't felt it, heapq feels natural to me for this use case. :-) I'm aware of the issue of frequent cancelled timers, but chose to wait and see rather than preemptively fix it (only so many hours in a day). IIRC pyftplib has a clever cleanup algorithm that we can easily add if that usage pattern becomes popular. On Wed, Mar 26, 2014 at 2:36 PM, Marko Rauhamaa ma...@pacujo.netmailto:ma...@pacujo.net wrote: Guido van Rossum gu...@python.orgmailto:gu...@python.org: Actually, the first step is publish it on PyPI, the second is to get a fair number of happy users there. The bar for getting something included into the stdlib is pretty high -- you need to demonstrate that there is a need *and* that having it as a 3rd party module is a problem. I hear you about the process. About the need part, I'm wondering if you haven't felt it in implementing the timers for asyncio. I have had that need in several network programming projects and have ended up using my AVL tree implementation (C and Python). Well, time will tell if frequent canceled timers end up piling up the heap queue. ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] collections.sortedtree
Nick Coghlan writes: On 27 March 2014 18:11, Stephen J. Turnbull step...@xemacs.org wrote: I don't think it's ever really been resolved whether they're a typical case that won't go away or a special group whose special needs should be considered. That said, getting approval is definitely easier when the request is to trust the Python Software Foundation vs J. Random Programmer's account on GitHub, so yes, it still counts in favour of sufficiently compelling stdlib additions. I note that this *still* doesn't resolve the alternatives presented above.wink/ I hasten to add, it doesn't need to be done *now*. Yes, I do think it matters. Steve ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] collections.sortedtree
Maciej Fijalkowski writes: On Thu, Mar 27, 2014 at 10:11 AM, Stephen J. Turnbull step...@xemacs.org wrote: Maybe. That depends on if you care about the convenience of folks who have to get new modules past Corporate Security, but it's easier to get an upgrade of the whole shebang. I don't think it's ever really been resolved whether they're a typical case that won't go away or a special group whose special needs should be considered. Steve And random pieces of C included in the standard library can be shuffled under the carpet under the disguise of upgrade or what are you suggesting? CPython doesn't do stuff like that; C code in CPython gets careful review, and can hardly be called random. PyPI packages do do stuff like that. Not all of them by far, but to tell the difference you need to review line-by-line. So I'm simply saying (as Nick did) that it is often easier to get changes past a corporate bureaucracy if they are certified by inclusion in CPython (not to exclude PyPy or Jython, but the corporate consideration is about a specific distribution), than if it's a random package on PyPI. ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] collections.sortedtree
Maciej Fijalkowski writes: I just find my company is stupid I find labeling companies as stupid, merely because they are cautious about what external code they allow their developers to depend on, unacceptable. And I'll go to the trouble of explaining why. so let's work around it by putting stuff to python standard library unacceptable argument for python-dev and all the python community. Then you're completely missing the point. There are two issues in including code in the Python standard library. The first is technical excellence. You need to hurdle a certain bar, or the code won't go in. Furthermore, this bar includes a comprehensive set of regression tests. This implies that (1) human review of the code is on average much easier for CPython stdlib code than for PyPI code, and (2) a claim that no relevant-to-the-company behavior has actually changed is (again, on average) much more verifiable and plausible for CPython than for PyPI code. This is at least some convenience for *all* users, and a near necessity for some strictly controlled environments. In the latter, the gatekeeper is all too likely to say PyPI? Bring us a CTO signoff that 'this module is essential', or forget it. The second is a purely economic tradeoff: is the value of Python aggregated across all our users better enhanced by restricting additions to the stdlib, and thus reducing future maintenance effort and (presumably) increasing the rate at which Python code is improved, or by improving the batteries that are included (with the opposite effects)? That's a judgment call. ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] collections.sortedtree
On Fri, Mar 28, 2014 at 3:26 PM, Stephen J. Turnbull step...@xemacs.org wrote: This is at least some convenience for *all* users, and a near necessity for some strictly controlled environments. In the latter, the gatekeeper is all too likely to say PyPI? Bring us a CTO signoff that 'this module is essential', or forget it. Don't forget, of course, that there is a middle ground. Something that's really REALLY awesome on PyPI but isn't in the stdlib might be packaged by various Linux distros. If you use Debian, typing apt-cache pkgnames python- (or using tab completion on an apt-get install) will show you a veritable ton of packages, many of which are originally off PyPI. Even if you don't actually use that distribution, it might be worth telling the gatekeeper that Debian and/or Red Hat have indicated trust for some particular version of some particular package, which might help get over that hump. But it's still going to be a much bigger hump than it's in CPython 3.4. ChrisA ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
[Python-Dev] collections.sortedtree
I have made a full implementation of a balanced tree and would like to know what the process is to have it considered for inclusion in Python 3. To summarize, the implementation closely parallels dict() features and resides in _collectionsmodule.c under the name collections.sortedtree. It uses solely the operator to compare keys. I have chosen the AVL tree as an implementation technique. The views support a number of optional arguments: sorteddict.keys(reversed=False, start=unspecified, inclusive=True) The primary objective of having a balanced tree in the standard library is to support ordered access in an efficient manner. The typical applications would include timers (networking), aging (cache) and prefix patterns (routing). Marko ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] collections.sortedtree
On 03/26/2014 01:31 PM, Marko Rauhamaa wrote: I have made a full implementation of a balanced tree and would like to know what the process is to have it considered for inclusion in Python 3. Open a ticket on the tracker [1], post your code to that ticket, sign the CLA [2], answer questions, etc., that come up in the code review. I believe Raymond Hettinger is the Guardian of the collections module; add him as nosy. -- ~Ethan~ [1] http://http://bugs.python.org [2] https://www.python.org/psf/contrib/contrib-form ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] collections.sortedtree
On Wed, Mar 26, 2014, at 13:31, Marko Rauhamaa wrote: I have made a full implementation of a balanced tree and would like to know what the process is to have it considered for inclusion in Python 3. It's not a bad idea. (I believe others have proposed an red-black tree.) Certainly, it requires a PEP and a few months of bikesheding, though. ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] collections.sortedtree
Actually, the first step is publish it on PyPI, the second is to get a fair number of happy users there. The bar for getting something included into the stdlib is pretty high -- you need to demonstrate that there is a need *and* that having it as a 3rd party module is a problem. And that once it's in, (a) it will be stable, and (b) someone who cares about it and knows the code thoroughly is available maintain it for years. On Wed, Mar 26, 2014 at 1:53 PM, Ethan Furman et...@stoneleaf.us wrote: On 03/26/2014 01:31 PM, Marko Rauhamaa wrote: I have made a full implementation of a balanced tree and would like to know what the process is to have it considered for inclusion in Python 3. Open a ticket on the tracker [1], post your code to that ticket, sign the CLA [2], answer questions, etc., that come up in the code review. I believe Raymond Hettinger is the Guardian of the collections module; add him as nosy. -- ~Ethan~ [1] http://http://bugs.python.org [2] https://www.python.org/psf/contrib/contrib-form ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/ guido%40python.org -- --Guido van Rossum (python.org/~guido) ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] collections.sortedtree
On Mar 26, 2014, at 01:55 PM, Benjamin Peterson wrote: It's not a bad idea. (I believe others have proposed an red-black tree.) Certainly, it requires a PEP and a few months of bikesheding, though. Generally, PEPs aren't necessary for simple or relatively uncontroversial additions to existing modules or the stdlib. If you're proposing an addition to an existing module, file a tracker issue and nosy the module expert (for collections, it's Raymond Hettinger[1]). Others may see the new issue and decide to nosy themselves in if they want to provide additional feedback and input. If you're proposing an entirely new module, it's best to publish it first on PyPI, get lots of users, and listen to their feedback as you evolve the library. Once you feel like it's had plenty of time to bake as a third party module *and* the API is sufficiently stable that you won't mind the Python stdlib 18 month development cycle, then you can propose that it be included in Python. Cheers, -Barry [1] http://docs.python.org/devguide/experts.html ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] collections.sortedtree
On Wed, 26 Mar 2014 22:31:56 +0200 Marko Rauhamaa ma...@pacujo.net wrote: The primary objective of having a balanced tree in the standard library is to support ordered access in an efficient manner. The typical applications would include timers (networking), aging (cache) Wouldn't a heapq work as well for those two? Regards Antoine. ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] collections.sortedtree
Guido van Rossum gu...@python.org: Actually, the first step is publish it on PyPI, the second is to get a fair number of happy users there. The bar for getting something included into the stdlib is pretty high -- you need to demonstrate that there is a need *and* that having it as a 3rd party module is a problem. I hear you about the process. About the need part, I'm wondering if you haven't felt it in implementing the timers for asyncio. I have had that need in several network programming projects and have ended up using my AVL tree implementation (C and Python). Well, time will tell if frequent canceled timers end up piling up the heap queue. Marko ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] collections.sortedtree
On 3/26/2014 4:59 PM, Guido van Rossum wrote: Actually, the first step is publish it on PyPI, the second is to get a fair number of happy users there. The bar for getting something included into the stdlib is pretty high -- you need to demonstrate that there is a need *and* that having it as a 3rd party module is a problem. And that once it's in, (a) it will be stable, and (b) someone who cares about it and knows the code thoroughly is available maintain it for years. Perhaps the collections doc should mention that there are other specializes container types available on PyPI and either list some or point to a wiki page listing some. There must be at least 10 that could be included in such a list. -- Terry Jan Reedy ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] collections.sortedtree
I haven't felt it, heapq feels natural to me for this use case. :-) I'm aware of the issue of frequent cancelled timers, but chose to wait and see rather than preemptively fix it (only so many hours in a day). IIRC pyftplib has a clever cleanup algorithm that we can easily add if that usage pattern becomes popular. On Wed, Mar 26, 2014 at 2:36 PM, Marko Rauhamaa ma...@pacujo.net wrote: Guido van Rossum gu...@python.org: Actually, the first step is publish it on PyPI, the second is to get a fair number of happy users there. The bar for getting something included into the stdlib is pretty high -- you need to demonstrate that there is a need *and* that having it as a 3rd party module is a problem. I hear you about the process. About the need part, I'm wondering if you haven't felt it in implementing the timers for asyncio. I have had that need in several network programming projects and have ended up using my AVL tree implementation (C and Python). Well, time will tell if frequent canceled timers end up piling up the heap queue. Marko -- --Guido van Rossum (python.org/~guido) ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] collections.sortedtree
On 27 Mar 2014 07:02, Guido van Rossum gu...@python.org wrote: Actually, the first step is publish it on PyPI, the second is to get a fair number of happy users there. The bar for getting something included into the stdlib is pretty high -- you need to demonstrate that there is a need *and* that having it as a 3rd party module is a problem. And that once it's in, (a) it will be stable, and (b) someone who cares about it and knows the code thoroughly is available maintain it for years. The why not a third party module? bar also got a fair bit higher with Python 3.4 - by bundling pip, we have deliberately made third party modules easier to consume, thus weakening the convenience argument that applies to stdlib inclusion. The Python Packaging Authority are also working to reduce the barriers to distribution and consumption of C extensions, which will again weaken the argument for stdlib inclusion of third party C extensions. The metadata 2.0 efforts are also designed to streamline the process of conversion to platform specific formats like RPM. It's not that I don't see a sorted tree as valuable - I do. I just also see it as a fairly specialised type. The main case I could see being made for inclusion is if it made a major difference to the implementation of something else that was already in the stdlib. I believe that's where most other additions to the collections and types modules have come from in recent releases - we wanted to *use* them, and it seemed better to do the work to design an appropriate public API rather than keep them private (enum and generic function support arrived the same way, although we haven't actually redesigned pprint to rely on the latter yet). Cheers, Nick. On Wed, Mar 26, 2014 at 1:53 PM, Ethan Furman et...@stoneleaf.us wrote: On 03/26/2014 01:31 PM, Marko Rauhamaa wrote: I have made a full implementation of a balanced tree and would like to know what the process is to have it considered for inclusion in Python 3. Open a ticket on the tracker [1], post your code to that ticket, sign the CLA [2], answer questions, etc., that come up in the code review. I believe Raymond Hettinger is the Guardian of the collections module; add him as nosy. -- ~Ethan~ [1] http://http://bugs.python.org [2] https://www.python.org/psf/contrib/contrib-form ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/guido%40python.org -- --Guido van Rossum (python.org/~guido) ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/ncoghlan%40gmail.com ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] collections.sortedtree
Antoine Pitrou solip...@pitrou.net: Wouldn't a heapq work as well for those two? In my experience, networking entities typically start a timer at each interaction and cancel the pending one. So you have numerous timers that virtually never expire. You might have 100 interactions per second, each canceling and restarting a 10-minute timer. I don't know first hand if that causes heap queues to cause measurable heap or CPU pressure. Marko ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] collections.sortedtree
On Wed, 26 Mar 2014 23:41:42 +0200 Marko Rauhamaa ma...@pacujo.net wrote: Antoine Pitrou solip...@pitrou.net: Wouldn't a heapq work as well for those two? In my experience, networking entities typically start a timer at each interaction and cancel the pending one. So you have numerous timers that virtually never expire. You might have 100 interactions per second, each canceling and restarting a 10-minute timer. I don't know first hand if that causes heap queues to cause measurable heap or CPU pressure. Each individual heapq operation (push or pop) will be O(log n). That's not different from a balanced search tree (although of course the constant multiplier may vary). Regards Antoine. ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] collections.sortedtree
Terry Reedy tjre...@udel.edu: Perhaps the collections doc should mention that there are other specializes container types available on PyPI and either list some or point to a wiki page listing some. There must be at least 10 that could be included in such a list. There *is* a relatively high threshold of importing C extensions from an external source. If I build an application making use of them and advise coworkers to use it, they would likely balk at having to compile them. Not all machines have a development toolkit. Furthermore: # which pip /usr/bin/which: no pip in (/usr/local/sbin:/usr/local/bin:/sbin:/bin:\ /usr/sbin:/usr/bin:/root/bin) # yum install pip No package pip available. Marko ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] collections.sortedtree
Antoine Pitrou solip...@pitrou.net: Marko Rauhamaa ma...@pacujo.net wrote: In my experience, networking entities typically start a timer at each interaction and cancel the pending one. So you have numerous timers that virtually never expire. You might have 100 interactions per second, each canceling and restarting a 10-minute timer. Each individual heapq operation (push or pop) will be O(log n). That's not different from a balanced search tree (although of course the constant multiplier may vary). Yes, but if I have 1000 connections with one active timer each. The size of my sorted tree is 1000 timer objects. There are typically no expiries to react to. If the canceled timer lingers in the heapq till its expiry (in 10 minutes), the size is 100 * 10 * 60 = 60,000. The CPU has to wake up constantly to clear the expired timers. In practice, none of that might matter. Marko ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] collections.sortedtree
On Wed, 26 Mar 2014 23:57:27 +0200 Marko Rauhamaa ma...@pacujo.net wrote: Antoine Pitrou solip...@pitrou.net: Marko Rauhamaa ma...@pacujo.net wrote: In my experience, networking entities typically start a timer at each interaction and cancel the pending one. So you have numerous timers that virtually never expire. You might have 100 interactions per second, each canceling and restarting a 10-minute timer. Each individual heapq operation (push or pop) will be O(log n). That's not different from a balanced search tree (although of course the constant multiplier may vary). Yes, but if I have 1000 connections with one active timer each. The size of my sorted tree is 1000 timer objects. There are typically no expiries to react to. If the canceled timer lingers in the heapq till its expiry (in 10 minutes), the size is 100 * 10 * 60 = 60,000. The CPU has to wake up constantly to clear the expired timers. Ideally, I think you should be able to replace the cancelled item with the last item in the heap and then fix the heap in logarithmic time, but the heapq API doesn't seem to provide a way to do this. Regards Antoine. ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] collections.sortedtree
On Wed, Mar 26, 2014 at 6:02 PM, Antoine Pitrou solip...@pitrou.net wrote: On Wed, 26 Mar 2014 23:57:27 +0200 Marko Rauhamaa ma...@pacujo.net wrote: Antoine Pitrou solip...@pitrou.net: Marko Rauhamaa ma...@pacujo.net wrote: In my experience, networking entities typically start a timer at each interaction and cancel the pending one. So you have numerous timers that virtually never expire. You might have 100 interactions per second, each canceling and restarting a 10-minute timer. Each individual heapq operation (push or pop) will be O(log n). That's not different from a balanced search tree (although of course the constant multiplier may vary). Yes, but if I have 1000 connections with one active timer each. The size of my sorted tree is 1000 timer objects. There are typically no expiries to react to. If the canceled timer lingers in the heapq till its expiry (in 10 minutes), the size is 100 * 10 * 60 = 60,000. The CPU has to wake up constantly to clear the expired timers. Ideally, I think you should be able to replace the cancelled item with the last item in the heap and then fix the heap in logarithmic time, but the heapq API doesn't seem to provide a way to do this. Heaps provide easy access to the first item, but you can't find the last element without scanning the whole thing. With a heap, I think the best approach to the timer-cancellation problem is to periodically rebuild the heap from the non-cancelled items (Tornado keeps a count of cancellations and rebuilds the heap when the number of cancelled timeouts is half of the total. If cancellations are infrequent then they're just left in the heap until their time comes due). -Ben Regards Antoine. ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/ben%40bendarnell.com ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] collections.sortedtree
Yeah, so the pyftp fix is to keep track of how many timers were cancelled, and if the number exceeds a threshold it just recreates the heap, something like heap = [x for x in heap if not x.cancelled] heapify(heap) On Wed, Mar 26, 2014 at 3:02 PM, Antoine Pitrou solip...@pitrou.net wrote: On Wed, 26 Mar 2014 23:57:27 +0200 Marko Rauhamaa ma...@pacujo.net wrote: Antoine Pitrou solip...@pitrou.net: Marko Rauhamaa ma...@pacujo.net wrote: In my experience, networking entities typically start a timer at each interaction and cancel the pending one. So you have numerous timers that virtually never expire. You might have 100 interactions per second, each canceling and restarting a 10-minute timer. Each individual heapq operation (push or pop) will be O(log n). That's not different from a balanced search tree (although of course the constant multiplier may vary). Yes, but if I have 1000 connections with one active timer each. The size of my sorted tree is 1000 timer objects. There are typically no expiries to react to. If the canceled timer lingers in the heapq till its expiry (in 10 minutes), the size is 100 * 10 * 60 = 60,000. The CPU has to wake up constantly to clear the expired timers. Ideally, I think you should be able to replace the cancelled item with the last item in the heap and then fix the heap in logarithmic time, but the heapq API doesn't seem to provide a way to do this. Regards Antoine. ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/guido%40python.org -- --Guido van Rossum (python.org/~guido) ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] collections.sortedtree
On Wed, 26 Mar 2014 18:14:19 -0400 Ben Darnell b...@bendarnell.com wrote: If the canceled timer lingers in the heapq till its expiry (in 10 minutes), the size is 100 * 10 * 60 = 60,000. The CPU has to wake up constantly to clear the expired timers. Ideally, I think you should be able to replace the cancelled item with the last item in the heap and then fix the heap in logarithmic time, but the heapq API doesn't seem to provide a way to do this. Heaps provide easy access to the first item, but you can't find the last element without scanning the whole thing. I was talking about the last element, not the largest. The point of replacing with the last element is that shrinking is O(1). Regards Antoine. ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] collections.sortedtree
On Wed, Mar 26, 2014 at 1:55 PM, Benjamin Peterson benja...@python.org wrote: On Wed, Mar 26, 2014, at 13:31, Marko Rauhamaa wrote: I have made a full implementation of a balanced tree and would like to know what the process is to have it considered for inclusion in Python 3. It's not a bad idea. (I believe others have proposed an red-black tree.) Certainly, it requires a PEP and a few months of bikesheding, though. I'd recommend a plain binary tree (for random keys), red-black tree and treap (both for ordered or mostly-ordered keys, where red-black tree gives low operation duration standard deviation, and treap gives low average operation duration). https://en.wikipedia.org/wiki/Binary_search_tree#Performance_comparisons http://stromberg.dnsalias.org/~strombrg/python-tree-and-heap-comparison/ It'd likely make sense to have either a pure python implementation, or pure python and C-extended, so that Pypy and Jython can share the feature with CPython. ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] collections.sortedtree
On Wed, Mar 26, 2014 at 3:52 PM, Marko Rauhamaa ma...@pacujo.net wrote: Dan Stromberg drsali...@gmail.com: It'd likely make sense to have either a pure python implementation, or pure python and C-extended, so that Pypy and Jython can share the feature with CPython. Jython can build directly on Java's native SortedMap implementation. The API should not tie it to a tree. Optimizations and refactorings should be allowed. Only O(log N) worst-case behavior should be mandated. Rare worst cases should be fine for a good reason - see python dict's. ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] collections.sortedtree
Dan Stromberg drsali...@gmail.com: It'd likely make sense to have either a pure python implementation, or pure python and C-extended, so that Pypy and Jython can share the feature with CPython. Jython can build directly on Java's native SortedMap implementation. The API should not tie it to a tree. Optimizations and refactorings should be allowed. Only O(log N) worst-case behavior should be mandated. (And now I notice I named this thread wrong; I named my thingy collections.sorteddict.) Marko ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com