Re: [Python-Dev] collections.sortedtree

2014-03-31 Thread Dan Stromberg
vne
On Mon, Mar 31, 2014 at 2:01 PM, Daniel Stutzbach stutzb...@google.com wrote:
 On Fri, Mar 28, 2014 at 6:45 PM, Dan Stromberg drsali...@gmail.com wrote:

 In my testing blist.sorteddict was dead last for random keys, and
 wasn't last but was still significantly underperforming for sequential
 keys (outperforming only binary tree and scapegoat tree, behind all
 others):


 http://stromberg.dnsalias.org/~strombrg/python-tree-and-heap-comparison/2014-03-18/


 Could you post the source code for your test tools, so that I can reproduce
 them locally and understand the results better?

 I think I'm confused about what you're trying to measure.  It looks like the
 tests perform get and set operations, neither of which required a sorted
 dict.  Wouldn't a good comparison of sorted dict types include at least one
 operation that relies on the sorted property?  Possibly I've misunderstood
 how your tests work.

The code is at 
http://stromberg.dnsalias.org/svn/python-tree-and-heap-comparison/trunk/

You're right, I'm not comparing find_min, find_max, generate_in_order
or generate_reverse_order.  The first two tend to be the same as
find_arbitrary.

I've been thinking about trying to import some C extension module
trees, and just ignoring them if they fail to import.  But I haven't.

Trunk only compares pure python implementations - even for treap,
which has a cython version.  I tossed my changes after checking blist.

Trunk does operations/second.
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] collections.sortedtree

2014-03-30 Thread Marko Rauhamaa
Guido van Rossum gu...@python.org:

 Yeah, so the pyftp fix is to keep track of how many timers were
 cancelled, and if the number exceeds a threshold it just recreates the
 heap, something like

 heap = [x for x in heap if not x.cancelled]
 heapify(heap)

I measured my target use case with a simple emulation on my linux PC.

The simple test case emulates this scenario:

Start N connections at frequency F and have each connection start a
timer T. Then, rotate over the connections at the same frequency F
restarting timer T. Stop after a duration that is much greater than
T.

Four different timer implementations were considered:

   HEAPQ: straight heapq

   HEAPQ*: heapq with the pyftp fix (reheapify whenever 80% of the
   outstanding timers have been canceled)

   SDICT: sorteddict (my C implementation)

   PyAVL: Python AVL tree (my implementation)


Here are the results:

N = 1000, F = 100 Hz, T = 10 min, duration 1 hr

=
Virt Res  max len()   urs   sys   CPU
 MB   MB   s s %
=
HEAPQ22   166000121.5   4.3   0.7
HEAPQ*   117 500018.4   4.2   0.6
SDICT116 100018.2   3.9   0.6
PyAVL116 100039.3   3.6   1.2
=


N = 1, F = 1000 Hz, T = 10 min, duration 1 hr

=
Virt Res  max len()   urs   sys   CPU
 MB   MB   s s %
=
HEAPQ   125  120   600044   223.0  25.8   6.9
HEAPQ*   21   165   186.8  30.0   6.0
SDICT15   101   196.6  25.7   6.2
PyAVL16   111   412.5  22.3  12.1
=


Conclusions:

 * The CPU load is almost identical in HEAPQ, HEAPQ* and SDICT.

 * HEAPQ* is better than HEAPQ because of the memory burden.

 * PyAVL is not all that bad compared with SDICT.


Marko
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] collections.sortedtree

2014-03-29 Thread Stephen J. Turnbull
Tres Seaver writes:

  I'm mostly arguing the FLOSS project

You mean a (mostly) volunteer-supported FLOSS project, no?

  should feel free to ignore

Given the above qualification, you can put a period here, as far as
I'm concerned.  My question is what does *Python* *want* to ignore?,
not is it allowed to ignore?

  high-maintenance-cost

What high maintenance cost?  A stdlib addition is a marginal
increase in cost.  Do too many, and it's a serious burden, of course.
So there needs to be *some* hurdle that any addition must clear, and
that barrier becomes higher in proportion to the code base to
developers who actually do maintenance ratio.  But I think the
question should be how high? not can they pay?

  commercial concerns until those concerns bring either blook (funded
  developer time) or treasure (pooled to pay for the same time) to
  the table to pay for them.

I really don't think commercial profit as the motive for a request, or
ability to pay, should be an important reason to *ignore* user wants.
This smacks of the proposals for ransom software, which (a) sort of
turns the principle of *volunteer* FLOSS on its head (if that doesn't
bother most of the developers, no problem, but for one it bothers
*me*), and (b) doesn't actually work AFAICT (it's not quite the same
as crowdfunding, which does work).

It's rather the reverse: I believe we should be prepared to deal with
the conflict of interest that results when *some* of the developers
are offered money to provide *cater* to such concerns[1] where the community
doesn't think the benefit is that high.

Footnotes: 
[1]  Should not be that hard in *this* community, but that is the
issue IMO.


___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] collections.sortedtree

2014-03-29 Thread Tres Seaver
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 03/29/2014 03:46 AM, Stephen J. Turnbull wrote:

 I really don't think commercial profit as the motive for a request,
 or ability to pay, should be an important reason to *ignore* user
 wants.

We've already got corrosion on the terminals the leaky batteries-included
stdlib (ssl, anyone?).  I see nothing wrong with rejecting additional
batteries proposed purely FBO organizations who have the kind of policies
you describe, but don't contribute blood-or-treasure toward their
maintenannce (*especially* because they are in the enterprise space,
and could be expected to pay for that kind of support).

This kind of Python-in-a-tie discussion is probably moot for any
version of Python that python-dev actually cares about, BTW:  by the time
such organizations get around to using 3.5, it likely won't even be
getting security releases from the core developers.


Tres.
- -- 
===
Tres Seaver  +1 540-429-0999  tsea...@palladion.com
Palladion Software   Excellence by Designhttp://palladion.com
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iEYEARECAAYFAlM3Q2wACgkQ+gerLs4ltQ7Y5QCdGhrpenkXJnRXoseVabQDwJHX
0zcAn0tAf6iHT276oxjZS/ZhPu49wF8M
=6Gci
-END PGP SIGNATURE-

___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] collections.sortedtree

2014-03-28 Thread Stephen J. Turnbull
Chris Angelico writes:

  Don't forget, of course, that there is a middle ground. Something
  that's really REALLY awesome on PyPI but isn't in the stdlib might be
  packaged by various Linux distros.

Oh, agreed, and any organization that cares that much will already
have the RHEL or Ubuntu LTS (etc, those are the ones my buddies use)
contract.  CPython is not the only certification that is trusted (to
whatever degree) by users of CPython.

But forget Debian and Gentoo at least -- all it takes get in to those
distros is a license (free) and a maintainer (at application time).
And they'll even wink at the license given sufficient user demand.

Steve



___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] collections.sortedtree

2014-03-28 Thread Kristján Valur Jónsson


 -Original Message-
 From: Python-Dev [mailto:python-dev-
 bounces+kristjan=ccpgames@python.org] On Behalf Of Antoine Pitrou
 Sent: 27. mars 2014 15:53
 To: python-dev@python.org
 Subject: Re: [Python-Dev] collections.sortedtree
 
 On Thu, 27 Mar 2014 08:50:01 -0700
 Daniel Stutzbach stutzb...@google.com wrote:
  To provide efficient cancellation and removal, a heap implementation
  needs some way to efficiently answer What is the current index of this
 item?.
   There are a couple of ways to achieve that, but they all require more
  storage than heapq's list-based approach.
 
 You are right. I was assuming the index is already known.
 
 Regards
 
 Antoine.

Yes.  But for rare occurrances, it is annoying that heap.remove(item) is more 
expensive than it
needs to be.  It is a reasonable operation, just like list.remove() is.

I'll be willing to experiment with extending the heapq. methods to take an 
optional map argument.
'map' would be a dict, mapping objects in the heap to indices.  If provided, 
each of the heapq methouds would
take care to update the map of any objects that it touches with the current 
index of the object.

Usage could be something like:

heap = []
map = {}
def insert(i):
heapq.heappush(heap, I, map)

def pop(i):
   return heapq.heappop(heap, map)

def remove(i):
  heapq.heapdel(heap, map[i], map)

K


___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] collections.sortedtree

2014-03-28 Thread Tres Seaver
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 03/27/2014 04:11 AM, Stephen J. Turnbull wrote:

 Maybe.  That depends on if you care about the convenience of folks
 who have to get new modules past Corporate Security, but it's easier
 to get an upgrade of the whole shebang.  I don't think it's ever
 really been resolved whether they're a typical case that won't go
 away or a special group whose special needs should be considered.

ISTM that such concerns should be addressed via some kind of paid support
contract (a la RHEL), and not used to drive decisions for the underlying
FLOSS project.  The existence of aggregated resources from such a support
organization would then be relevant to the include / exclude
discussion: presumably the support organization could fund the
maintenance of the otherwise-excluded module based on its customers'
paid-for needs.


Tres.
- -- 
===
Tres Seaver  +1 540-429-0999  tsea...@palladion.com
Palladion Software   Excellence by Designhttp://palladion.com
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iEYEARECAAYFAlM1jtQACgkQ+gerLs4ltQ5OOgCdHeOjBjpfJ1w5mkAWZsajflWK
U3wAmgIxnc7BFIaoouQ0kdkCgoF+lMr3
=yhYm
-END PGP SIGNATURE-

___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] collections.sortedtree

2014-03-28 Thread Paul Moore
On 28 March 2014 16:22, Tres Seaver tsea...@palladion.com wrote:
 On 03/28/2014 12:18 PM, Tres Seaver wrote:
 I'm mostly arguing the FLOSS project should feel free to ignore
 high-maintenance-cost commercial concerns until those concerns bring
 either blook (funded developer time) or treasure (pooled to pay for
 the same time) to the table to pay for them.

 Dangit,

  s/blook/blood/

Rats. I thought I'd just learned a new word. Regardless of reality, I
think blook should be a real word :-)

Paul
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] collections.sortedtree

2014-03-28 Thread Tres Seaver
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 03/28/2014 11:57 AM, Stephen J. Turnbull wrote:

 So, let me get this straight: you think that one user should pay Red 
 Hat to vet the package for RHEL, and another user should pay to get
 it into Ubuntu, and another user to get it into SuSE?  And then the 
 distros should all pay lawyers to write contracts to make sure that 
 nobody is paying too much for support in the stdlib when they 
 eventually get it in?  (Except the customers, of course, everybody 
 will be happy if *they* pay more than they need to.)

No, I'm arguing that *if* the concerns you articulate represent a
significant number of corporate* customers (i.e., a market), their
interests would be better represented by *some* organization who is paid
to reflect them.  RHEL and Ubuntu LTS could potentially be contributors
to that pool.

I'm mostly arguing the FLOSS project should feel free to ignore
high-maintenance-cost commercial concerns until those concerns bring
either blook (funded developer time) or treasure (pooled to pay for the
same time) to the table to pay for them.


Tres.
- -- 
===
Tres Seaver  +1 540-429-0999  tsea...@palladion.com
Palladion Software   Excellence by Designhttp://palladion.com
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iEYEARECAAYFAlM1oPAACgkQ+gerLs4ltQ52ZgCg06AobjcZa1lGBDtFzRk6IjEK
6DkAnj33xAkqcDUjLGaT9NP4YtZdkAos
=62VL
-END PGP SIGNATURE-

___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] collections.sortedtree

2014-03-28 Thread Stephen J. Turnbull
Tres Seaver writes:

  On 03/27/2014 04:11 AM, Stephen J. Turnbull wrote:
  
   Maybe.  That depends on if you care about the convenience of folks
   who have to get new modules past Corporate Security, but it's easier
   to get an upgrade of the whole shebang.  I don't think it's ever
   really been resolved whether they're a typical case that won't go
   away or a special group whose special needs should be considered.
  
  ISTM that such concerns should be addressed via some kind of paid support
  contract (a la RHEL), and not used to drive decisions for the underlying
  FLOSS project.  The existence of aggregated resources from such a support
  organization would then be relevant to the include / exclude
  discussion: presumably the support organization could fund the
  maintenance of the otherwise-excluded module based on its customers'
  paid-for needs.

So, let me get this straight: you think that one user should pay Red
Hat to vet the package for RHEL, and another user should pay to get it
into Ubuntu, and another user to get it into SuSE?  And then the
distros should all pay lawyers to write contracts to make sure that
nobody is paying too much for support in the stdlib when they
eventually get it in?  (Except the customers, of course, everybody
will be happy if *they* pay more than they need to.)

Seems to me that putting it into stdlib in the first place makes more
sense, if it's going to end up there at all.

Another way to put it is, we need a better way to fund support of
routine maintenance (ie, the unfun parts) than negotiating it module
by module.


___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] collections.sortedtree

2014-03-28 Thread Tres Seaver
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 03/28/2014 12:18 PM, Tres Seaver wrote:
 I'm mostly arguing the FLOSS project should feel free to ignore 
 high-maintenance-cost commercial concerns until those concerns bring 
 either blook (funded developer time) or treasure (pooled to pay for
 the same time) to the table to pay for them.

Dangit,

 s/blook/blood/



Tres.
- -- 
===
Tres Seaver  +1 540-429-0999  tsea...@palladion.com
Palladion Software   Excellence by Designhttp://palladion.com
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iEYEARECAAYFAlM1ocMACgkQ+gerLs4ltQ6k8ACgm+ycaOaqZGsefJU5iu9kL4bS
1e4AmQHq/vb4Q6GV/MNuCZQr4HKG/JER
=8fLu
-END PGP SIGNATURE-

___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] collections.sortedtree

2014-03-28 Thread Steven D'Aprano
On Fri, Mar 28, 2014 at 09:20:35AM +, Kristján Valur Jónsson wrote:

 I'll be willing to experiment with extending the heapq. methods to take an 
 optional map argument.
 'map' would be a dict, mapping objects in the heap to indices.  If provided, 
 each of the heapq methouds would
 take care to update the map of any objects that it touches with the current 
 index of the object.
 
 Usage could be something like:
 
 heap = []
 map = {}
 def insert(i):
 heapq.heappush(heap, I, map)
 
 def pop(i):
return heapq.heappop(heap, map)
 
 def remove(i):
   heapq.heapdel(heap, map[i], map)

If you're going to make heapq more complex, wouldn't it be better to 
encapsulate the logic into a Heap class? heapq should remain the same, 
and Heap could (if accepted) move into collections. And should this 
discussion move to python-ideas?


-- 
Steven
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] collections.sortedtree

2014-03-28 Thread MRAB

On 2014-03-28 16:39, Paul Moore wrote:

On 28 March 2014 16:22, Tres Seaver tsea...@palladion.com wrote:

On 03/28/2014 12:18 PM, Tres Seaver wrote:

I'm mostly arguing the FLOSS project should feel free to ignore
high-maintenance-cost commercial concerns until those concerns bring
either blook (funded developer time) or treasure (pooled to pay for
the same time) to the table to pay for them.


Dangit,

 s/blook/blood/


Rats. I thought I'd just learned a new word. Regardless of reality, I
think blook should be a real word :-)


Yes, and it should mean funded developer time on or for an open source
project.
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] collections.sortedtree

2014-03-28 Thread Raymond Hettinger

On Mar 26, 2014, at 1:31 PM, Marko Rauhamaa ma...@pacujo.net wrote:

 I have made a full implementation of a balanced tree and would like to
 know what the process is to have it considered for inclusion in Python
 3.
 
 To summarize, the implementation closely parallels dict() features and
 resides in _collectionsmodule.c under the name collections.sortedtree.
 It uses solely the  operator to compare keys. I have chosen the AVL
 tree as an implementation technique.


FWIW, I think there may be room for a sorted collection somewhere in the
standard library.

As others have said, the best place to start is by putting a module on PyPi
to let it mature and to compete with other approaches to the problem.

Here are a few random thoughts on the over-all idea:

* An AVL balanced tree isn't the only solution or necessarily the best solution
to the problem.  Tree nodes tend to take more space than denser
structures and they have awful cache locality (these are the same reasons
that deques use doubly-linked blocks rather than a plain doubly linked lists).

* Another approach I have experimented with uses lazy sorting.  That lets
insertion be an O(1) step and incurs a one-time sorting cost upon the next
lookup (and because Timsort exploits partial orderings, this can be very
cheap).  A lazy sorted list is dense and sequentially ordered in memory
(reducing space overhead, taking full advantage of cache locality and memory
controller auto-prefetch, and providing fast iteration speed by not needing
to chase pointers).  The lazy sort approach works best in applications that
spend most of the time doing lookups and have only infrequent deletions
and insertions.

* The name of the tool probably should not be sortedtree. Ideally, the tool
should be named for what it does, not how it does it (for the most part,
users don't need to know whether the underlying implementation is
a red-black tree, b-tree, judy array, sqlite database, or lazy list).  That
is why (I think) that Python dicts are called dicts rather than hash tables
(the implementation) or an associative array (the computer science term
for the abstract datatype).

* There are plenty of data structures  that have had good utility and
popularity outside of the standard library.  I that it is a good thing that
blists, numpy arrays, databases, and panda dataframes all live outside
the standard library.  Those tools are easier to maintain externally and
it is easier for you to keep control over the design.  Remember the saying,
the standard library is where code goes to die (and I would add that it
should already be mature (or nearly dead) by the time it gets there). 

* That said, it is a reasonable possibility that the standard library would
benefit from some kind sorted collection (the idea comes up from time
to time).

  
 Raymond Hettinger___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] collections.sortedtree

2014-03-28 Thread Marko Rauhamaa
Raymond Hettinger raymond.hettin...@gmail.com:

 * An AVL balanced tree isn't the only solution or necessarily the best
 solution to the problem. Tree nodes tend to take more space than
 denser structures and they have awful cache locality (these are the
 same reasons that deques use doubly-linked blocks rather than a plain
 doubly linked lists).

Maybe. The key is the API. The implementation underneath should be
changeable. For example, Jython would probably use SortedTree to
implement it.

Performance tests should help decide when an implementation is switched
for a more efficient one. In some of my tests, I haven't seen any
significant performance differences between RB trees and AVL trees, for
example. The blist implementation, which I have taken a quick glance at,
buys cache locality at the price of block copying; I have no data to
decide if the tradeoff is a good one.

The main thing, IMO, is to get one sorted dictionary in.

 * The name of the tool probably should not be sortedtree.

Correct. That was a mistake on the Subject line. In the code, it's
sorteddict.

 * That said, it is a reasonable possibility that the standard library
 would benefit from some kind sorted collection (the idea comes up from
 time to time).

Yes. As a user, I have craved for an implementation, which is readily
available in Java and the linux kernel, for example.


Marko
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] collections.sortedtree

2014-03-28 Thread Marko Rauhamaa
Marko Rauhamaa ma...@pacujo.net:

 For example, Jython would probably use SortedTree to implement it.

That word just keeps coming out of my keyboard. The Java class is of
course the TreeMap.


Marko
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] collections.sortedtree

2014-03-28 Thread Nick Coghlan
On 29 March 2014 01:57, Stephen J. Turnbull step...@xemacs.org wrote:
 Another way to put it is, we need a better way to fund support of
 routine maintenance (ie, the unfun parts) than negotiating it module
 by module.

Yes, yes we do, and there *are* people working on that (see
https://wiki.python.org/moin/PythonSoftwareFoundation/BoardCandidates2014)
:)

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] collections.sortedtree

2014-03-28 Thread Dan Stromberg
On Fri, Mar 28, 2014 at 5:48 PM, Daniel Stutzbach stutzb...@google.com wrote:
 On Fri, Mar 28, 2014 at 2:54 PM, Marko Rauhamaa ma...@pacujo.net wrote:

 The blist implementation, which I have taken a quick glance at,

 buys cache locality at the price of block copying; I have no data to
 decide if the tradeoff is a good one.


 There's actually a compile-time parameter so that you can tune the tradeoff
 by adjusting how wide each of blist's nodes are. Like most tradeoffs,
 whether it's good or bad depends on what you're trying to do.  RedBlack
 trees are very similar to a B-tree with a node-width of 4:
 http://en.wikipedia.org/wiki/Red%E2%80%93black_tree#Analogy_to_B-trees_of_order_4

 blist's sorteddict is written in Python on top of the blist type (which is
 written in C).  Because of the Python layer, it's slower by a constant
 factor compared to pure-C implementations in microbenchmarks.

It grieves me a bit to say this, and blist and blist's sorteddict are
probably a good tool for an appropriate job, but blist's sorteddict
type is quite a bit slower than several pure python implementations of
dictionaries with ordered keys.

In my testing blist.sorteddict was dead last for random keys, and
wasn't last but was still significantly underperforming for sequential
keys (outperforming only binary tree and scapegoat tree, behind all
others):

http://stromberg.dnsalias.org/~strombrg/python-tree-and-heap-comparison/2014-03-18/
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] collections.sortedtree

2014-03-27 Thread Stephen J. Turnbull
Nick Coghlan writes:

   On 27 Mar 2014 07:02, Guido van Rossum gu...@python.org wrote:
  Actually, the first step is publish it on PyPI, the second is to
  get a fair number of happy users there. The bar for getting something
  included into the stdlib is pretty high

  The why not a third party module? bar also got a fair bit higher
  with Python 3.4 - by bundling pip, we have deliberately made third
  party modules easier to consume, thus weakening the convenience
  argument that applies to stdlib inclusion.

Maybe.  That depends on if you care about the convenience of folks who
have to get new modules past Corporate Security, but it's easier to
get an upgrade of the whole shebang.  I don't think it's ever really
been resolved whether they're a typical case that won't go away or a
special group whose special needs should be considered.

Steve
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] collections.sortedtree

2014-03-27 Thread Maciej Fijalkowski
On Thu, Mar 27, 2014 at 10:11 AM, Stephen J. Turnbull
step...@xemacs.org wrote:
 Nick Coghlan writes:

On 27 Mar 2014 07:02, Guido van Rossum gu...@python.org wrote:
   Actually, the first step is publish it on PyPI, the second is to
   get a fair number of happy users there. The bar for getting something
   included into the stdlib is pretty high

   The why not a third party module? bar also got a fair bit higher
   with Python 3.4 - by bundling pip, we have deliberately made third
   party modules easier to consume, thus weakening the convenience
   argument that applies to stdlib inclusion.

 Maybe.  That depends on if you care about the convenience of folks who
 have to get new modules past Corporate Security, but it's easier to
 get an upgrade of the whole shebang.  I don't think it's ever really
 been resolved whether they're a typical case that won't go away or a
 special group whose special needs should be considered.

 Steve

And random pieces of C included in the standard library can be
shuffled under the carpet under the disguise of upgrade or what are
you suggesting?
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] collections.sortedtree

2014-03-27 Thread Paul Moore
On 27 March 2014 08:16, Maciej Fijalkowski fij...@gmail.com wrote:
 And random pieces of C included in the standard library can be
 shuffled under the carpet under the disguise of upgrade or what are
 you suggesting?

The sort of thing that happens is that the relevant approvers will
accept python-dev as a trusted supplier and then Python upgrades are
acceptable subject to review of the changes, etc. For a new module,
there is a whole other level of questions around how do we trust the
person who developed the code, do we need to do a full code review,
etc?

It's a bit unfair to describe the process as random pieces of C
being shuffled under the carpet. (Although there probably are
environments where that is uncomfortably close to the truth :-()

Paul
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] collections.sortedtree

2014-03-27 Thread Maciej Fijalkowski
On Thu, Mar 27, 2014 at 11:07 AM, Paul Moore p.f.mo...@gmail.com wrote:
 On 27 March 2014 08:16, Maciej Fijalkowski fij...@gmail.com wrote:
 And random pieces of C included in the standard library can be
 shuffled under the carpet under the disguise of upgrade or what are
 you suggesting?

 The sort of thing that happens is that the relevant approvers will
 accept python-dev as a trusted supplier and then Python upgrades are
 acceptable subject to review of the changes, etc. For a new module,
 there is a whole other level of questions around how do we trust the
 person who developed the code, do we need to do a full code review,
 etc?

 It's a bit unfair to describe the process as random pieces of C
 being shuffled under the carpet. (Although there probably are
 environments where that is uncomfortably close to the truth :-()

 Paul

I just find my company is stupid so let's work around it by putting
stuff to python standard library unacceptable argument for python-dev
and all the python community.
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] collections.sortedtree

2014-03-27 Thread Kristján Valur Jónsson
True.
I've long since added a heapdel() to our local fork.
a heappop(idx=0) extension would do the same
I can provide a patch if there is interest.
K


Ideally, I think you should be able to replace the cancelled item with
the last item in the heap and then fix the heap in logarithmic time,
but the heapq API doesn't seem to provide a way to do this.

Regards

Antoine.
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] collections.sortedtree

2014-03-27 Thread Nick Coghlan
On 27 March 2014 18:11, Stephen J. Turnbull step...@xemacs.org wrote:
 Nick Coghlan writes:

On 27 Mar 2014 07:02, Guido van Rossum gu...@python.org wrote:
   Actually, the first step is publish it on PyPI, the second is to
   get a fair number of happy users there. The bar for getting something
   included into the stdlib is pretty high

   The why not a third party module? bar also got a fair bit higher
   with Python 3.4 - by bundling pip, we have deliberately made third
   party modules easier to consume, thus weakening the convenience
   argument that applies to stdlib inclusion.

 Maybe.  That depends on if you care about the convenience of folks who
 have to get new modules past Corporate Security, but it's easier to
 get an upgrade of the whole shebang.  I don't think it's ever really
 been resolved whether they're a typical case that won't go away or a
 special group whose special needs should be considered.

I'm all too well aware of this particular problem (although thankfully
don't have to deal with it myself any more), and it's a key part of
why I said a fair bit higher rather than insurmountable :)

Such environments *do* usually have procedures to get additional open
source software approved, it's just a pain. That pain then becomes
another factor to take into account deciding between using a simpler
solution, rolling your own custom solution, or tackling your
organisation's module approval process.

That said, getting approval is definitely easier when the request is
to trust the Python Software Foundation vs J. Random Programmer's
account on GitHub, so yes, it still counts in favour of sufficiently
compelling stdlib additions.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] collections.sortedtree

2014-03-27 Thread Nick Coghlan
On 27 March 2014 19:10, Maciej Fijalkowski fij...@gmail.com wrote:
 On Thu, Mar 27, 2014 at 11:07 AM, Paul Moore p.f.mo...@gmail.com wrote:
 On 27 March 2014 08:16, Maciej Fijalkowski fij...@gmail.com wrote:
 And random pieces of C included in the standard library can be
 shuffled under the carpet under the disguise of upgrade or what are
 you suggesting?

 The sort of thing that happens is that the relevant approvers will
 accept python-dev as a trusted supplier and then Python upgrades are
 acceptable subject to review of the changes, etc. For a new module,
 there is a whole other level of questions around how do we trust the
 person who developed the code, do we need to do a full code review,
 etc?

 It's a bit unfair to describe the process as random pieces of C
 being shuffled under the carpet. (Although there probably are
 environments where that is uncomfortably close to the truth :-()

 Paul

 I just find my company is stupid so let's work around it by putting
 stuff to python standard library unacceptable argument for python-dev
 and all the python community.

Due diligence and prudent risk management are not stupid - most open
source projects and small companies just don't have the luxury of
worrying about them, as they're so far down the list of concerns that
the additional risk of using arbitrary code downloaded off the
internet doesn't even register.

As organisations get larger, they have more to lose, so they typically
start worrying about that kind of thing. Properly vetting software for
licensing and potential security issues is expensive though, so they
usually prefer to outsource that task to trusted providers (this is
one of the key concepts that gets me paid). Contractual arrangements
and brand reputations start to replace blind trust in upstream
developers.

Is this less efficient than full open collaboration? Yes, it is - the
verify part of trust, but verify comes at a real cost in time and
money. However, there are plenty of good reasons that phrase is
trust, but verify rather than trust unconditionally.

You may choose not to care about those more cautious users, and that's
fine - but they're still worth taking into account when making design
decisions if you want to cross the marketing chasm from early
adopters to everyone else.

Regards,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] collections.sortedtree

2014-03-27 Thread Chris Angelico
On Thu, Mar 27, 2014 at 8:58 PM, Nick Coghlan ncogh...@gmail.com wrote:
On 27 March 2014 19:10, Maciej Fijalkowski fij...@gmail.com wrote:
 I just find my company is stupid so let's work around it by putting
 stuff to python standard library unacceptable argument for python-dev
 and all the python community.

 Due diligence and prudent risk management are not stupid - most open
 source projects and small companies just don't have the luxury of
 worrying about them, as they're so far down the list of concerns that
 the additional risk of using arbitrary code downloaded off the
 internet doesn't even register.

I don't think anyone's saying it's stupid to be cautious, but more
that it's stupid to blindly accept the latest python.org release and
*not* accept something from another source. And if that's stupid,
well, I'm stupid too - blindly accepting a whole lot of binary package
updates because they're on ftp.au.debian.org, for instance. Why do I
trust that, and not random sites on the internet? Because I trust that
the Debian package maintainers to check what goes through, and I trust
that there are people with reputations at stake, who won't want to
send something dodgy through. It's not perfect, but it's a whole lot
easier than checking every single package that goes through.

ChrisA
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] collections.sortedtree

2014-03-27 Thread Nick Coghlan
On 27 March 2014 20:38, Chris Angelico ros...@gmail.com wrote:
 On Thu, Mar 27, 2014 at 8:58 PM, Nick Coghlan ncogh...@gmail.com wrote:
On 27 March 2014 19:10, Maciej Fijalkowski fij...@gmail.com wrote:
 I just find my company is stupid so let's work around it by putting
 stuff to python standard library unacceptable argument for python-dev
 and all the python community.

 Due diligence and prudent risk management are not stupid - most open
 source projects and small companies just don't have the luxury of
 worrying about them, as they're so far down the list of concerns that
 the additional risk of using arbitrary code downloaded off the
 internet doesn't even register.

 I don't think anyone's saying it's stupid to be cautious, but more
 that it's stupid to blindly accept the latest python.org release and
 *not* accept something from another source. And if that's stupid,
 well, I'm stupid too - blindly accepting a whole lot of binary package
 updates because they're on ftp.au.debian.org, for instance. Why do I
 trust that, and not random sites on the internet? Because I trust that
 the Debian package maintainers to check what goes through, and I trust
 that there are people with reputations at stake, who won't want to
 send something dodgy through. It's not perfect, but it's a whole lot
 easier than checking every single package that goes through.

Right - trusting the PSF and the Debian package review process are
reasonable trade-offs at a personal level. However, it highlights why
this is still a benefit of bringing things into the standard library:
a lot of the overhead in review and audit processes is incurred based
on the *number of components to be reviewed*, rather than the amount
of code they contain. Other aspects of the overhead are incurred *per
organisation trusted*.

Inclusion in the standard library means not only bringing something
under the PSF's trust umbrella, but also within the core Python
component. While we usually use corporate environments as our example,
it applies equally to Linux distros - they generally have Python
packaged already, and will eventually inherit standard library
improvements. For third party packages, it requires someone to do the
work to get them installed.

Extending our trust to include a new component isn't to be done
lightly, but it *does* genuinely save work in the long run for a whole
lot of other people whenever we choose to do so, and that is the point
Stephen was making.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] collections.sortedtree

2014-03-27 Thread Chris Angelico
On Thu, Mar 27, 2014 at 10:10 PM, Nick Coghlan ncogh...@gmail.com wrote:
 Extending our trust to include a new component isn't to be done
 lightly, but it *does* genuinely save work in the long run for a whole
 lot of other people whenever we choose to do so, and that is the point
 Stephen was making.

Exactly so.

Blessing a module as part of the stdlib is an important mark of trust,
and one that is most definitely not stupid to respect. (In case my
previous message was ambiguous, I am fully in favour of people
trusting python.org in this way, and fully understanding of people NOT
trusting other sources, or at least not automatically.)

ChrisA
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] collections.sortedtree

2014-03-27 Thread Antoine Pitrou
On Thu, 27 Mar 2014 09:25:05 +
Kristján Valur Jónsson krist...@ccpgames.com wrote:
 True.
 I've long since added a heapdel() to our local fork.
 a heappop(idx=0) extension would do the same
 I can provide a patch if there is interest.

I think either of them would be cool. I don't know it would be approved
by Raymond or whoever else.

Regards

Antoine.
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] collections.sortedtree

2014-03-27 Thread Serhiy Storchaka

27.03.14 00:16, Guido van Rossum написав(ла):

Yeah, so the pyftp fix is to keep track of how many timers were
cancelled, and if the number exceeds a threshold it just recreates the
heap, something like

heap = [x for x in heap if not x.cancelled]
heapify(heap)


See also http://bugs.python.org/issue13451 which proposes such approach 
for the sched module.



___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] collections.sortedtree

2014-03-27 Thread Thomas Wouters
On Wed, Mar 26, 2014 at 9:55 PM, Benjamin Peterson benja...@python.orgwrote:

 On Wed, Mar 26, 2014, at 13:31, Marko Rauhamaa wrote:
 
  I have made a full implementation of a balanced tree and would like to
  know what the process is to have it considered for inclusion in Python
  3.

 It's not a bad idea. (I believe others have proposed an red-black tree.)
 Certainly, it requires a PEP and a few months of bikesheding, though.


Not to mention discussion about whether it shouldn't just be an existing
PyPI package, like http://pypi.python.org/pypi/blist, rather than a new
implementation.

-- 
Thomas Wouters tho...@python.org

Hi! I'm an email virus! Think twice before sending your email to help me
spread!
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] collections.sortedtree

2014-03-27 Thread Marko Rauhamaa
Thomas Wouters tho...@python.org:

 Not to mention discussion about whether it shouldn't just be an existing
 PyPI package, like http://pypi.python.org/pypi/blist, rather than a new
 implementation.

I'm fine with any implementation as long as it is in the standard
library.


Marko
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] collections.sortedtree

2014-03-27 Thread Guido van Rossum
Surely you can show empathy and still explain why it's not that easy.
On Mar 27, 2014 2:11 AM, Maciej Fijalkowski fij...@gmail.com wrote:

 On Thu, Mar 27, 2014 at 11:07 AM, Paul Moore p.f.mo...@gmail.com wrote:
  On 27 March 2014 08:16, Maciej Fijalkowski fij...@gmail.com wrote:
  And random pieces of C included in the standard library can be
  shuffled under the carpet under the disguise of upgrade or what are
  you suggesting?
 
  The sort of thing that happens is that the relevant approvers will
  accept python-dev as a trusted supplier and then Python upgrades are
  acceptable subject to review of the changes, etc. For a new module,
  there is a whole other level of questions around how do we trust the
  person who developed the code, do we need to do a full code review,
  etc?
 
  It's a bit unfair to describe the process as random pieces of C
  being shuffled under the carpet. (Although there probably are
  environments where that is uncomfortably close to the truth :-()
 
  Paul

 I just find my company is stupid so let's work around it by putting
 stuff to python standard library unacceptable argument for python-dev
 and all the python community.

___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] collections.sortedtree

2014-03-27 Thread Antoine Pitrou
On Thu, 27 Mar 2014 08:50:01 -0700
Daniel Stutzbach stutzb...@google.com wrote:
 
 Due to way the heapq is implemented, it can't provide an efficient API for
 removing an arbitrary item.  Swapping with the last element allows you to
 efficiently remove the item at a particular index, but you first need to
 find the current index of an item, which requires a O(n) scan.
 
 To provide efficient cancellation and removal, a heap implementation needs
 some way to efficiently answer What is the current index of this item?.
  There are a couple of ways to achieve that, but they all require more
 storage than heapq's list-based approach.

You are right. I was assuming the index is already known.

Regards

Antoine.
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] collections.sortedtree

2014-03-27 Thread Kristján Valur Jónsson

for our stackless socket framework we have the same issue.
Windows provides an opaque timer system where a timer can be cancelled by 
handle.  But on linux one has to
craft one's own.

One thing with this particular use case is that a heapq is overkill for network 
timer.  . For network timers a granularity of one
second should typically be sufficient.  Thus, one can implement a map of future 
times (in quantisized time, e.g. whole seconds) to sets of timers.
A timer is then keyed by its quantisized due time plus its callback.  
Cancellation can then be O(1).

From: Python-Dev [mailto:python-dev-bounces+kristjan=ccpgames@python.org] 
On Behalf Of Guido van Rossum
Sent: 26. mars 2014 21:42
To: Marko Rauhamaa
Cc: Python-Dev
Subject: Re: [Python-Dev] collections.sortedtree

I haven't felt it, heapq feels natural to me for this use case. :-)
I'm aware of the issue of frequent cancelled timers, but chose to wait and see 
rather than preemptively fix it (only so many hours in a day). IIRC pyftplib 
has a clever cleanup algorithm that we can easily add if that usage pattern 
becomes popular.

On Wed, Mar 26, 2014 at 2:36 PM, Marko Rauhamaa 
ma...@pacujo.netmailto:ma...@pacujo.net wrote:
Guido van Rossum gu...@python.orgmailto:gu...@python.org:

 Actually, the first step is publish it on PyPI, the second is to get a
 fair number of happy users there. The bar for getting something
 included into the stdlib is pretty high -- you need to demonstrate
 that there is a need *and* that having it as a 3rd party module is a
 problem.
I hear you about the process.

About the need part, I'm wondering if you haven't felt it in
implementing the timers for asyncio. I have had that need in several
network programming projects and have ended up using my AVL tree
implementation (C and Python).

Well, time will tell if frequent canceled timers end up piling up the
heap queue.




___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] collections.sortedtree

2014-03-27 Thread Stephen J. Turnbull
Nick Coghlan writes:
  On 27 March 2014 18:11, Stephen J. Turnbull step...@xemacs.org wrote:

   I don't think it's ever really been resolved whether they're a
   typical case that won't go away or a special group whose
   special needs should be considered.

  That said, getting approval is definitely easier when the request is
  to trust the Python Software Foundation vs J. Random Programmer's
  account on GitHub, so yes, it still counts in favour of sufficiently
  compelling stdlib additions.

I note that this *still* doesn't resolve the alternatives presented
above.wink/

I hasten to add, it doesn't need to be done *now*.  Yes, I do think it
matters.

Steve
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] collections.sortedtree

2014-03-27 Thread Stephen J. Turnbull
Maciej Fijalkowski writes:
  On Thu, Mar 27, 2014 at 10:11 AM, Stephen J. Turnbull
  step...@xemacs.org wrote:

   Maybe.  That depends on if you care about the convenience of folks who
   have to get new modules past Corporate Security, but it's easier to
   get an upgrade of the whole shebang.  I don't think it's ever really
   been resolved whether they're a typical case that won't go away or a
   special group whose special needs should be considered.
  
   Steve
  
  And random pieces of C included in the standard library can be
  shuffled under the carpet under the disguise of upgrade or what are
  you suggesting?

CPython doesn't do stuff like that; C code in CPython gets careful
review, and can hardly be called random.  PyPI packages do do stuff
like that.  Not all of them by far, but to tell the difference you
need to review line-by-line.

So I'm simply saying (as Nick did) that it is often easier to get
changes past a corporate bureaucracy if they are certified by
inclusion in CPython (not to exclude PyPy or Jython, but the corporate
consideration is about a specific distribution), than if it's a random
package on PyPI.
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] collections.sortedtree

2014-03-27 Thread Stephen J. Turnbull
Maciej Fijalkowski writes:

  I just find my company is stupid

I find labeling companies as stupid, merely because they are
cautious about what external code they allow their developers to
depend on, unacceptable.  And I'll go to the trouble of explaining
why.

  so let's work around it by putting stuff to python standard
  library unacceptable argument for python-dev and all the python
  community.

Then you're completely missing the point.  There are two issues in
including code in the Python standard library.  The first is technical
excellence.  You need to hurdle a certain bar, or the code won't go
in.  Furthermore, this bar includes a comprehensive set of regression
tests.  This implies that (1) human review of the code is on average
much easier for CPython stdlib code than for PyPI code, and (2) a
claim that no relevant-to-the-company behavior has actually changed is
(again, on average) much more verifiable and plausible for CPython
than for PyPI code.  This is at least some convenience for *all*
users, and a near necessity for some strictly controlled environments.
In the latter, the gatekeeper is all too likely to say PyPI?  Bring
us a CTO signoff that 'this module is essential', or forget it.

The second is a purely economic tradeoff: is the value of Python
aggregated across all our users better enhanced by restricting
additions to the stdlib, and thus reducing future maintenance effort
and (presumably) increasing the rate at which Python code is improved,
or by improving the batteries that are included (with the opposite
effects)?  That's a judgment call.

___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] collections.sortedtree

2014-03-27 Thread Chris Angelico
On Fri, Mar 28, 2014 at 3:26 PM, Stephen J. Turnbull step...@xemacs.org wrote:
 This is at least some convenience for *all*
 users, and a near necessity for some strictly controlled environments.
 In the latter, the gatekeeper is all too likely to say PyPI?  Bring
 us a CTO signoff that 'this module is essential', or forget it.

Don't forget, of course, that there is a middle ground. Something
that's really REALLY awesome on PyPI but isn't in the stdlib might be
packaged by various Linux distros. If you use Debian, typing
apt-cache pkgnames python- (or using tab completion on an apt-get
install) will show you a veritable ton of packages, many of which are
originally off PyPI. Even if you don't actually use that distribution,
it might be worth telling the gatekeeper that Debian and/or Red Hat
have indicated trust for some particular version of some particular
package, which might help get over that hump. But it's still going to
be a much bigger hump than it's in CPython 3.4.

ChrisA
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


[Python-Dev] collections.sortedtree

2014-03-26 Thread Marko Rauhamaa

I have made a full implementation of a balanced tree and would like to
know what the process is to have it considered for inclusion in Python
3.

To summarize, the implementation closely parallels dict() features and
resides in _collectionsmodule.c under the name collections.sortedtree.
It uses solely the  operator to compare keys. I have chosen the AVL
tree as an implementation technique.

The views support a number of optional arguments:

sorteddict.keys(reversed=False, start=unspecified, inclusive=True)


The primary objective of having a balanced tree in the standard library
is to support ordered access in an efficient manner. The typical
applications would include timers (networking), aging (cache) and prefix
patterns (routing).


Marko
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] collections.sortedtree

2014-03-26 Thread Ethan Furman

On 03/26/2014 01:31 PM, Marko Rauhamaa wrote:


I have made a full implementation of a balanced tree and would like to
know what the process is to have it considered for inclusion in Python
3.


Open a ticket on the tracker [1], post your code to that ticket, sign the CLA [2], answer questions, etc., that come up 
in the code review.


I believe Raymond Hettinger is the Guardian of the collections module; add him 
as nosy.

--
~Ethan~

[1] http://http://bugs.python.org
[2] https://www.python.org/psf/contrib/contrib-form
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] collections.sortedtree

2014-03-26 Thread Benjamin Peterson
On Wed, Mar 26, 2014, at 13:31, Marko Rauhamaa wrote:
 
 I have made a full implementation of a balanced tree and would like to
 know what the process is to have it considered for inclusion in Python
 3.

It's not a bad idea. (I believe others have proposed an red-black tree.)
Certainly, it requires a PEP and a few months of bikesheding, though.
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] collections.sortedtree

2014-03-26 Thread Guido van Rossum
Actually, the first step is publish it on PyPI, the second is to get a fair
number of happy users there. The bar for getting something included into
the stdlib is pretty high -- you need to demonstrate that there is a need
*and* that having it as a 3rd party module is a problem. And that once it's
in, (a) it will be stable, and (b) someone who cares about it and knows the
code thoroughly is available maintain it for years.


On Wed, Mar 26, 2014 at 1:53 PM, Ethan Furman et...@stoneleaf.us wrote:

 On 03/26/2014 01:31 PM, Marko Rauhamaa wrote:


 I have made a full implementation of a balanced tree and would like to
 know what the process is to have it considered for inclusion in Python
 3.


 Open a ticket on the tracker [1], post your code to that ticket, sign the
 CLA [2], answer questions, etc., that come up in the code review.

 I believe Raymond Hettinger is the Guardian of the collections module; add
 him as nosy.

 --
 ~Ethan~

 [1] http://http://bugs.python.org
 [2] https://www.python.org/psf/contrib/contrib-form

 ___
 Python-Dev mailing list
 Python-Dev@python.org
 https://mail.python.org/mailman/listinfo/python-dev
 Unsubscribe: https://mail.python.org/mailman/options/python-dev/
 guido%40python.org




-- 
--Guido van Rossum (python.org/~guido)
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] collections.sortedtree

2014-03-26 Thread Barry Warsaw
On Mar 26, 2014, at 01:55 PM, Benjamin Peterson wrote:

It's not a bad idea. (I believe others have proposed an red-black tree.)
Certainly, it requires a PEP and a few months of bikesheding, though.

Generally, PEPs aren't necessary for simple or relatively uncontroversial
additions to existing modules or the stdlib.

If you're proposing an addition to an existing module, file a tracker issue
and nosy the module expert (for collections, it's Raymond Hettinger[1]).
Others may see the new issue and decide to nosy themselves in if they want to
provide additional feedback and input.

If you're proposing an entirely new module, it's best to publish it first on
PyPI, get lots of users, and listen to their feedback as you evolve the
library.  Once you feel like it's had plenty of time to bake as a third party
module *and* the API is sufficiently stable that you won't mind the Python
stdlib 18 month development cycle, then you can propose that it be included in
Python.

Cheers,
-Barry

[1] http://docs.python.org/devguide/experts.html
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] collections.sortedtree

2014-03-26 Thread Antoine Pitrou
On Wed, 26 Mar 2014 22:31:56 +0200
Marko Rauhamaa ma...@pacujo.net wrote:
 
 The primary objective of having a balanced tree in the standard library
 is to support ordered access in an efficient manner. The typical
 applications would include timers (networking), aging (cache)

Wouldn't a heapq work as well for those two?

Regards

Antoine.


___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] collections.sortedtree

2014-03-26 Thread Marko Rauhamaa
Guido van Rossum gu...@python.org:

 Actually, the first step is publish it on PyPI, the second is to get a
 fair number of happy users there. The bar for getting something
 included into the stdlib is pretty high -- you need to demonstrate
 that there is a need *and* that having it as a 3rd party module is a
 problem.

I hear you about the process.

About the need part, I'm wondering if you haven't felt it in
implementing the timers for asyncio. I have had that need in several
network programming projects and have ended up using my AVL tree
implementation (C and Python).

Well, time will tell if frequent canceled timers end up piling up the
heap queue.


Marko
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] collections.sortedtree

2014-03-26 Thread Terry Reedy

On 3/26/2014 4:59 PM, Guido van Rossum wrote:

Actually, the first step is publish it on PyPI, the second is to get a
fair number of happy users there. The bar for getting something included
into the stdlib is pretty high -- you need to demonstrate that there is
a need *and* that having it as a 3rd party module is a problem. And that
once it's in, (a) it will be stable, and (b) someone who cares about it
and knows the code thoroughly is available maintain it for years.


Perhaps the collections doc should mention that there are other 
specializes container types available on PyPI and either list some or 
point to a wiki page listing some. There must be at least 10 that could 
be included in such a list.


--
Terry Jan Reedy

___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] collections.sortedtree

2014-03-26 Thread Guido van Rossum
I haven't felt it, heapq feels natural to me for this use case. :-)

I'm aware of the issue of frequent cancelled timers, but chose to wait and
see rather than preemptively fix it (only so many hours in a day). IIRC
pyftplib has a clever cleanup algorithm that we can easily add if that
usage pattern becomes popular.


On Wed, Mar 26, 2014 at 2:36 PM, Marko Rauhamaa ma...@pacujo.net wrote:

 Guido van Rossum gu...@python.org:

  Actually, the first step is publish it on PyPI, the second is to get a
  fair number of happy users there. The bar for getting something
  included into the stdlib is pretty high -- you need to demonstrate
  that there is a need *and* that having it as a 3rd party module is a
  problem.

 I hear you about the process.

 About the need part, I'm wondering if you haven't felt it in
 implementing the timers for asyncio. I have had that need in several
 network programming projects and have ended up using my AVL tree
 implementation (C and Python).

 Well, time will tell if frequent canceled timers end up piling up the
 heap queue.


 Marko




-- 
--Guido van Rossum (python.org/~guido)
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] collections.sortedtree

2014-03-26 Thread Nick Coghlan
On 27 Mar 2014 07:02, Guido van Rossum gu...@python.org wrote:

 Actually, the first step is publish it on PyPI, the second is to get a
fair number of happy users there. The bar for getting something included
into the stdlib is pretty high -- you need to demonstrate that there is a
need *and* that having it as a 3rd party module is a problem. And that once
it's in, (a) it will be stable, and (b) someone who cares about it and
knows the code thoroughly is available maintain it for years.

The why not a third party module? bar also got a fair bit higher with
Python 3.4 - by bundling pip, we have deliberately made third party modules
easier to consume, thus weakening the convenience argument that applies to
stdlib inclusion.

The Python Packaging Authority are also working to reduce the barriers to
distribution and consumption of C extensions, which will again weaken the
argument for stdlib inclusion of third party C extensions. The metadata 2.0
efforts are also designed to streamline the process of conversion to
platform specific formats like RPM.

It's not that I don't see a sorted tree as valuable - I do. I just also see
it as a fairly specialised type. The main case I could see being made for
inclusion is if it made a major difference to the implementation of
something else that was already in the stdlib. I believe that's where most
other additions to the collections and types modules have come from in
recent releases - we wanted to *use* them, and it seemed better to do the
work to design an appropriate public API rather than keep them private
(enum and generic function support arrived the same way, although we
haven't actually redesigned pprint to rely on the latter yet).

Cheers,
Nick.



 On Wed, Mar 26, 2014 at 1:53 PM, Ethan Furman et...@stoneleaf.us wrote:

 On 03/26/2014 01:31 PM, Marko Rauhamaa wrote:


 I have made a full implementation of a balanced tree and would like to
 know what the process is to have it considered for inclusion in Python
 3.


 Open a ticket on the tracker [1], post your code to that ticket, sign
the CLA [2], answer questions, etc., that come up in the code review.

 I believe Raymond Hettinger is the Guardian of the collections module;
add him as nosy.

 --
 ~Ethan~

 [1] http://http://bugs.python.org
 [2] https://www.python.org/psf/contrib/contrib-form

 ___
 Python-Dev mailing list
 Python-Dev@python.org
 https://mail.python.org/mailman/listinfo/python-dev
 Unsubscribe:
https://mail.python.org/mailman/options/python-dev/guido%40python.org




 --
 --Guido van Rossum (python.org/~guido)

 ___
 Python-Dev mailing list
 Python-Dev@python.org
 https://mail.python.org/mailman/listinfo/python-dev
 Unsubscribe:
https://mail.python.org/mailman/options/python-dev/ncoghlan%40gmail.com

___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] collections.sortedtree

2014-03-26 Thread Marko Rauhamaa
Antoine Pitrou solip...@pitrou.net:

 Wouldn't a heapq work as well for those two?

In my experience, networking entities typically start a timer at each
interaction and cancel the pending one. So you have numerous timers that
virtually never expire. You might have 100 interactions per second, each
canceling and restarting a 10-minute timer.

I don't know first hand if that causes heap queues to cause measurable
heap or CPU pressure.


Marko
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] collections.sortedtree

2014-03-26 Thread Antoine Pitrou
On Wed, 26 Mar 2014 23:41:42 +0200
Marko Rauhamaa ma...@pacujo.net wrote:
 Antoine Pitrou solip...@pitrou.net:
 
  Wouldn't a heapq work as well for those two?
 
 In my experience, networking entities typically start a timer at each
 interaction and cancel the pending one. So you have numerous timers that
 virtually never expire. You might have 100 interactions per second, each
 canceling and restarting a 10-minute timer.
 
 I don't know first hand if that causes heap queues to cause measurable
 heap or CPU pressure.

Each individual heapq operation (push or pop) will be O(log n). That's
not different from a balanced search tree (although of course the
constant multiplier may vary).

Regards

Antoine.
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] collections.sortedtree

2014-03-26 Thread Marko Rauhamaa
Terry Reedy tjre...@udel.edu:

 Perhaps the collections doc should mention that there are other
 specializes container types available on PyPI and either list some or
 point to a wiki page listing some. There must be at least 10 that
 could be included in such a list.

There *is* a relatively high threshold of importing C extensions from an
external source. If I build an application making use of them and advise
coworkers to use it, they would likely balk at having to compile them.
Not all machines have a development toolkit.

Furthermore:

   # which pip
   /usr/bin/which: no pip in (/usr/local/sbin:/usr/local/bin:/sbin:/bin:\
   /usr/sbin:/usr/bin:/root/bin)
   # yum install pip
   No package pip available.


Marko
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] collections.sortedtree

2014-03-26 Thread Marko Rauhamaa
Antoine Pitrou solip...@pitrou.net:

 Marko Rauhamaa ma...@pacujo.net wrote:
 In my experience, networking entities typically start a timer at each
 interaction and cancel the pending one. So you have numerous timers
 that virtually never expire. You might have 100 interactions per
 second, each canceling and restarting a 10-minute timer.

 Each individual heapq operation (push or pop) will be O(log n). That's
 not different from a balanced search tree (although of course the
 constant multiplier may vary).

Yes, but if I have 1000 connections with one active timer each. The size
of my sorted tree is 1000 timer objects. There are typically no expiries
to react to.

If the canceled timer lingers in the heapq till its expiry (in 10
minutes), the size is 100 * 10 * 60 = 60,000. The CPU has to wake up
constantly to clear the expired timers.

In practice, none of that might matter.


Marko
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] collections.sortedtree

2014-03-26 Thread Antoine Pitrou
On Wed, 26 Mar 2014 23:57:27 +0200
Marko Rauhamaa ma...@pacujo.net wrote:
 Antoine Pitrou solip...@pitrou.net:
 
  Marko Rauhamaa ma...@pacujo.net wrote:
  In my experience, networking entities typically start a timer at each
  interaction and cancel the pending one. So you have numerous timers
  that virtually never expire. You might have 100 interactions per
  second, each canceling and restarting a 10-minute timer.
 
  Each individual heapq operation (push or pop) will be O(log n). That's
  not different from a balanced search tree (although of course the
  constant multiplier may vary).
 
 Yes, but if I have 1000 connections with one active timer each. The size
 of my sorted tree is 1000 timer objects. There are typically no expiries
 to react to.
 
 If the canceled timer lingers in the heapq till its expiry (in 10
 minutes), the size is 100 * 10 * 60 = 60,000. The CPU has to wake up
 constantly to clear the expired timers.

Ideally, I think you should be able to replace the cancelled item with
the last item in the heap and then fix the heap in logarithmic time,
but the heapq API doesn't seem to provide a way to do this.

Regards

Antoine.
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] collections.sortedtree

2014-03-26 Thread Ben Darnell
On Wed, Mar 26, 2014 at 6:02 PM, Antoine Pitrou solip...@pitrou.net wrote:

 On Wed, 26 Mar 2014 23:57:27 +0200
 Marko Rauhamaa ma...@pacujo.net wrote:
  Antoine Pitrou solip...@pitrou.net:
 
   Marko Rauhamaa ma...@pacujo.net wrote:
   In my experience, networking entities typically start a timer at each
   interaction and cancel the pending one. So you have numerous timers
   that virtually never expire. You might have 100 interactions per
   second, each canceling and restarting a 10-minute timer.
  
   Each individual heapq operation (push or pop) will be O(log n). That's
   not different from a balanced search tree (although of course the
   constant multiplier may vary).
 
  Yes, but if I have 1000 connections with one active timer each. The size
  of my sorted tree is 1000 timer objects. There are typically no expiries
  to react to.
 
  If the canceled timer lingers in the heapq till its expiry (in 10
  minutes), the size is 100 * 10 * 60 = 60,000. The CPU has to wake up
  constantly to clear the expired timers.

 Ideally, I think you should be able to replace the cancelled item with
 the last item in the heap and then fix the heap in logarithmic time,
 but the heapq API doesn't seem to provide a way to do this.


Heaps provide easy access to the first item, but you can't find the last
element without scanning the whole thing.  With a heap, I think the best
approach to the timer-cancellation problem is to periodically rebuild the
heap from the non-cancelled items (Tornado keeps a count of cancellations
and rebuilds the heap when the number of cancelled timeouts is half of the
total.  If cancellations are infrequent then they're just left in the heap
until their time comes due).

-Ben



 Regards

 Antoine.
 ___
 Python-Dev mailing list
 Python-Dev@python.org
 https://mail.python.org/mailman/listinfo/python-dev
 Unsubscribe:
 https://mail.python.org/mailman/options/python-dev/ben%40bendarnell.com

___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] collections.sortedtree

2014-03-26 Thread Guido van Rossum
Yeah, so the pyftp fix is to keep track of how many timers were cancelled,
and if the number exceeds a threshold it just recreates the heap, something
like

heap = [x for x in heap if not x.cancelled]
heapify(heap)


On Wed, Mar 26, 2014 at 3:02 PM, Antoine Pitrou solip...@pitrou.net wrote:

 On Wed, 26 Mar 2014 23:57:27 +0200
 Marko Rauhamaa ma...@pacujo.net wrote:
  Antoine Pitrou solip...@pitrou.net:
 
   Marko Rauhamaa ma...@pacujo.net wrote:
   In my experience, networking entities typically start a timer at each
   interaction and cancel the pending one. So you have numerous timers
   that virtually never expire. You might have 100 interactions per
   second, each canceling and restarting a 10-minute timer.
  
   Each individual heapq operation (push or pop) will be O(log n). That's
   not different from a balanced search tree (although of course the
   constant multiplier may vary).
 
  Yes, but if I have 1000 connections with one active timer each. The size
  of my sorted tree is 1000 timer objects. There are typically no expiries
  to react to.
 
  If the canceled timer lingers in the heapq till its expiry (in 10
  minutes), the size is 100 * 10 * 60 = 60,000. The CPU has to wake up
  constantly to clear the expired timers.

 Ideally, I think you should be able to replace the cancelled item with
 the last item in the heap and then fix the heap in logarithmic time,
 but the heapq API doesn't seem to provide a way to do this.

 Regards

 Antoine.
 ___
 Python-Dev mailing list
 Python-Dev@python.org
 https://mail.python.org/mailman/listinfo/python-dev
 Unsubscribe:
 https://mail.python.org/mailman/options/python-dev/guido%40python.org




-- 
--Guido van Rossum (python.org/~guido)
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] collections.sortedtree

2014-03-26 Thread Antoine Pitrou
On Wed, 26 Mar 2014 18:14:19 -0400
Ben Darnell b...@bendarnell.com wrote:
  
   If the canceled timer lingers in the heapq till its expiry (in 10
   minutes), the size is 100 * 10 * 60 = 60,000. The CPU has to wake up
   constantly to clear the expired timers.
 
  Ideally, I think you should be able to replace the cancelled item with
  the last item in the heap and then fix the heap in logarithmic time,
  but the heapq API doesn't seem to provide a way to do this.
 
 Heaps provide easy access to the first item, but you can't find the last
 element without scanning the whole thing.

I was talking about the last element, not the largest. The point of
replacing with the last element is that shrinking is O(1).

Regards

Antoine.
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] collections.sortedtree

2014-03-26 Thread Dan Stromberg
On Wed, Mar 26, 2014 at 1:55 PM, Benjamin Peterson benja...@python.org wrote:
 On Wed, Mar 26, 2014, at 13:31, Marko Rauhamaa wrote:

 I have made a full implementation of a balanced tree and would like to
 know what the process is to have it considered for inclusion in Python
 3.

 It's not a bad idea. (I believe others have proposed an red-black tree.)
 Certainly, it requires a PEP and a few months of bikesheding, though.

I'd recommend a plain binary tree (for random keys), red-black tree
and treap (both for ordered or mostly-ordered keys, where red-black
tree gives low operation duration standard deviation, and treap gives
low average operation duration).

https://en.wikipedia.org/wiki/Binary_search_tree#Performance_comparisons
http://stromberg.dnsalias.org/~strombrg/python-tree-and-heap-comparison/

It'd likely make sense to have either a pure python implementation, or
pure python and C-extended, so that Pypy and Jython can share the
feature with CPython.
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] collections.sortedtree

2014-03-26 Thread Dan Stromberg
On Wed, Mar 26, 2014 at 3:52 PM, Marko Rauhamaa ma...@pacujo.net wrote:
 Dan Stromberg drsali...@gmail.com:

 It'd likely make sense to have either a pure python implementation, or
 pure python and C-extended, so that Pypy and Jython can share the
 feature with CPython.

 Jython can build directly on Java's native SortedMap implementation. The
 API should not tie it to a tree. Optimizations and refactorings should
 be allowed. Only O(log N) worst-case behavior should be mandated.

Rare worst cases should be fine for a good reason - see python dict's.
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] collections.sortedtree

2014-03-26 Thread Marko Rauhamaa
Dan Stromberg drsali...@gmail.com:

 It'd likely make sense to have either a pure python implementation, or
 pure python and C-extended, so that Pypy and Jython can share the
 feature with CPython.

Jython can build directly on Java's native SortedMap implementation. The
API should not tie it to a tree. Optimizations and refactorings should
be allowed. Only O(log N) worst-case behavior should be mandated.

(And now I notice I named this thread wrong; I named my thingy
collections.sorteddict.)


Marko
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com