Re: Good data structure for finding date intervals including a given date

2012-05-21 Thread Daniel Stutzbach
On Wed, May 16, 2012 at 5:38 AM, Jean-Daniel wrote:

> On Sun, May 13, 2012 at 2:29 PM, Alec Taylor 
> wrote:
> > There is an ordered dict type since Python 3.1[1] and Python 2.7.3[2].
>
> Ordered dict are useful, but they only remember the ordered in which
> they were added, you can not order them a on key.
>

For what it's worth, my blist package (mentioned earlier in the thread)
also provides sortedlist and sorteddict classes which efficiently keep
items in sorted order.

I don't think those are enough on their own to create a efficient solution,
but they might be useful parts of an efficient solution.

More information:
http://pypi.python.org/pypi/blist
http://stutzbachenterprises.com/blist/

-- 
Daniel Stutzbach
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: how to sort a hash list without generating a new object?

2011-08-03 Thread Daniel Stutzbach
On Tue, Aug 2, 2011 at 5:53 PM, Chris Rebert  wrote:

> If you /really/ need a sorted mapping datatype, google for
> "sorteddict" (which is quite distinct from OrderedDict).
> Or look for a binary search tree or skip list implementation of some
> sort; but these aren't commonly used in Python, so it may be hard to
> find a good one.
>

The blist package (I'm the author) provides a list-like type that has O(log
n) insertions and deletions.  It provides a sorteddict type that uses the
blist type under-the-hood.

blist's "sorteddict" supports the "key" parameter (which works like
list.sort's key parameter), which the original poster could use to maintain
the keys in reverse order.

http://pypi.python.org/pypi/blist/

There's no overhead to learn how to use the new types.  A blist works
exactly like a list but with different performance characteristics, and a
sorteddict works just like a dict but keeps the keys sorted.

-- 
Daniel Stutzbach
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: blist question

2011-07-07 Thread Daniel Stutzbach
On Thu, Jul 7, 2011 at 5:08 AM, dmitrey  wrote:

> I feel lack of native Python lists operations (e.g. taking N greatest
> elements with the involved key function and O(n) speed) and
> occasionally found blist
> http://pypi.python.org/pypi/blist/
>


> Its entry says it is better than Python list. Did anyone ensure?
>

It has better performance for many operations.  I made a very detailed
performance comparison here (against Python 3.1):
http://stutzbachenterprises.com/performance-blist

Undoubtedly, you can find sequences of operations where Python's built-in
list type has better performance by a constant factor.  Python's built-in
list type is more memory-efficient for small lists (blist's memory usage for
small lists could be improved to be similar, but I haven't gotten around to
making the necessary changes.)


> Will it ever be merged into Python source code?
>

Seems unlikely.

-- 
Daniel Stutzbach
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: ANN: blist 1.2.0

2010-07-23 Thread Daniel Stutzbach
On Wed, Jul 21, 2010 at 9:47 AM, Daniel Stutzbach <
dan...@stutzbachenterprises.com> wrote:

> What's new?
> ---
>
> - blist.sort() is now *substantially* faster than list.sort() when using
> int or float keys (O(n) vs. O(n log n))
> - The sortedset, sortedlist, and sorteddict types have been revamped for
> better compatibility with the standard library's types.
> - Comprehensive reference documentation is now available at
> http://stutzbachenterprises.com/blist/
> - Numerous other speed improvements
>

Quick addendum:

I just uploaded blist 1.2.1 which fixes a compilation error on BSD-derived
systems (notably MacOS X).
--
Daniel Stutzbach, Ph.D.
President, Stutzbach Enterprises, LLC <http://stutzbachenterprises.com>
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: ANN: blist 1.2.0

2010-07-22 Thread Daniel Stutzbach
On Thu, Jul 22, 2010 at 6:52 PM, Terry Reedy  wrote:

> Another sort of issue will be code maintainability. Two algorithms is
> potentially more problems than one. To me, that partly depends on how well
> factored the current code is.
>

Two algorithms is more code than one algorithm.  No way around that. :-)

Radix sort is a relatively simple sorting algorithm, so the extra complexity
is not too bad.  Once I have a patch ready, others can decide if the
additional complexity is worth the performance boost.


> It would be best is rsort were a switch replacement for timsort after all
> preparations (such as decorate) were done.
>

Yes, that's pretty much what I have in mind.  In pseudo-code:

sort:
decorate()
if all_correct_type and n > 40:
radix_sort()
else
timsort()
undecorate()


> And that is about all I can think of.


Thank you for the thoughts.  I appreciate them!
--
Daniel Stutzbach, Ph.D.
President, Stutzbach Enterprises, LLC <http://stutzbachenterprises.com>
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: ANN: blist 1.2.0

2010-07-22 Thread Daniel Stutzbach
On Thu, Jul 22, 2010 at 5:25 PM, Terry Reedy  wrote:

> Looks good so far. I would like to see that repeated all the way down to
> range(10) to make sure people doing millions of small sorts were not getting
> screwed.
>

I only use the radix sort for n > 40. :-)


> Have you run a patched version against test_sort.py? I believe it mostly
> tests lists of small ints, so radix methods would mostly be switched in.
>

For testing purposes, I maintain a private fork of Python where I've
replaced the built-in list with blist.  I then run Python's test suite as a
way of testing blist.  So, yes, all of the tests have been run.  :-)
 However, since I only use radix for n > 40, many of the tests have not
actually tested the radix sort.


> If it were added and the switching were internal, new test cases would be
> needed to test test timsort.


That's a good point.  It's tempting to add an undocumented parameter to
blist.sort that selects the sorting algorithm to use, to make it make it
easier to test multiple algorithms.  There are probably several different
ways to achieve a similar effect.  Do you mind if we table that discussion
until I actually have a patch?
--
Daniel Stutzbach, Ph.D.
President, Stutzbach Enterprises, LLC <http://stutzbachenterprises.com>
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: ANN: blist 1.2.0

2010-07-21 Thread Daniel Stutzbach
On Wed, Jul 21, 2010 at 5:27 PM, Terry Reedy  wrote:

> These tests use random numbers with a constant, relatively high density of
> 25%, which is favorable to radix sort. What if you do the same test with a
> constant range of, say, 10 (1 billion) or even a trillion or
> quadrillion. Or how about sorting a thousand 128bit ip6 addresses? Which
> wins for that?
>

blist switches to radix sort only if the keys contain only floats or only
integers that fit into a C long.  If the integers get too big, it reverts to
timsort.

When using a sort key, the decorate-sort-undecorate pattern requires
touching every object once before the sort.  The check for a consistent type
is done inexpensively during the decorate phase.

Here's an example result with a density of only 1%, where the radix sort is
around 4 times as fast:


otto:~/src/blist$ python3.1 -m timeit -s 'import random' -s 'x =
[random.randrange(1*100) for i in range(1)]' 'y = list(x)'
'y.sort(key=float)'
100 loops, best of 3: 12.4 msec per loop

otto:~/src/blist$ python3.1 -m timeit -s 'from blist import blist' -s
'import random' -s 'x = [random.randrange(1*100) for i in range(1)]'
'y = blist(x)' 'y.sort(key=float)'
100 loops, best of 3: 3.4 msec per loop


And a density of 0.01%:


otto:~/src/blist$ python3.1 -m timeit -s 'import random' -s 'x =
[random.randrange(1*1) for i in range(1)]' 'y = list(x)'
'y.sort(key=float)'
100 loops, best of 3: 12 msec per loop

otto:~/src/blist$ python3.1 -m timeit -s 'from blist import blist' -s
'import random' -s 'x = [random.randrange(1*1) for i in
range(1)]' 'y = blist(x)' 'y.sort(key=float)'
100 loops, best of 3: 3.52 msec per loop


> list.sort is (near) linear for lists that are mostly ordered. I think Tim's
> writeup about this in in the source. For instance, if one extends a sorted
> with 1% random additions and resorts, list.sort will skip the sorted part,
> sort the additions, and merge them back in. Radix sort ignores pre-existing
> order. So either a lot of comparitive profiling would be needed for
> auto-selection of sort method, or it should be user selectable.
>

I've tested exactly that scenario.  For already-ordered lists, radix sort
and timsort tie.


> Does your radix sort meet the stability guarantee of list.sort?
>

Yes.
--
Daniel Stutzbach, Ph.D.
President, Stutzbach Enterprises, LLC <http://stutzbachenterprises.com>
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: ANN: blist 1.2.0

2010-07-21 Thread Daniel Stutzbach
On Wed, Jul 21, 2010 at 10:32 AM, Antoine Pitrou wrote:

> Are you using some kind of radix sort?
> Could it be contributed back into the standard list type?


Yes and yes.  I have a few mostly-complete patches on the tracker that I
need to polish off, as well as several easy-to-fix bugs that I need to take
care of.  After that I plan to work on porting my sort optimizations back to
the standard list type.

Here's a performance comparison of sorting with blist versus 3.1's list:
http://stutzbachenterprises.com/performance-blist/sort-random-list
--
Daniel Stutzbach, Ph.D.
President, Stutzbach Enterprises, LLC <http://stutzbachenterprises.com>
-- 
http://mail.python.org/mailman/listinfo/python-list


ANN: blist 1.2.0

2010-07-21 Thread Daniel Stutzbach
The blist package contains several container types to supplement those
built-in to Python.  These types closely mirror the built-in types' methods,
so the learning curve is close to zero.  The package includes:

- blist: a list type with O(log n) insertions, deletes, and slices and other
performance enhancements
- btuple: a tuple type with O(log) slices
- sorteddict: a dict type that efficiently keeps keys in sorted order
- sortedlist: a list type that efficiently keeps the items sorted order
- sortedset: an indexable set type that efficiently keeps the items sorted
order
- weaksortedlist, weaksortedset: weak reference versions of sortedlist and
sortedset

What's new?
---

- blist.sort() is now *substantially* faster than list.sort() when using int
or float keys (O(n) vs. O(n log n))
- The sortedset, sortedlist, and sorteddict types have been revamped for
better compatibility with the standard library's types.
- Comprehensive reference documentation is now available at
http://stutzbachenterprises.com/blist/
- Numerous other speed improvements

Links
-

- blist vs. list performance comparison:
http://stutzbachenterprises.com/performance-blist
- Documentation: http://stutzbachenterprises.com/blist/
- Download: http://pypi.python.org/pypi/blist/
- Source repository: http://github.com/DanielStutzbach/blist
- Issue tracker: http://github.com/DanielStutzbach/blist/issues
--
Daniel Stutzbach, Ph.D.
President, Stutzbach Enterprises, LLC <http://stutzbachenterprises.com>
-- 
http://mail.python.org/mailman/listinfo/python-list


ANN: winreg_unicode 0.5.0

2010-07-08 Thread Daniel Stutzbach
I'm pleased to announce the release of winreg_unicode 0.5.0, the first
release of winreg_unicode.

The winreg_unicode package aims to be a drop-in replacement for Python
2's _winreg module. However, it returns unicode values where possible,
similar to Python 3's winreg module.

To illustrate the need for the winreg_unicode package, suppose an
application must query the registry to discover a filename and the registry
contains the string "međuresorna.txt". Python 2's _winreg module will return
"meduresorna.txt" instead, which is not the actual name of the file.

The winreg_unicode package does not yet contain all of _winreg's functions.
In particular, functions that write to the registry are not yet included.
Code contributions are welcome.

PyPi: http://pypi.python.org/pypi/winreg_unicode
Bug tracker: http://github.com/DanielStutzbach/winreg_unicode/issues
Source code: http://github.com/DanielStutzbach/winreg_unicode
--
Daniel Stutzbach, Ph.D.
President, Stutzbach Enterprises, LLC <http://stutzbachenterprises.com>
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Fast Efficient way to transfer an object to another list

2010-05-01 Thread Daniel Stutzbach
On Fri, Apr 30, 2010 at 9:16 PM, Jimbo  wrote:

> Hello I have a relatively simple thing to do; move an object from one
> to list into another. But I think my solution maybe inefficient &
> slow.
>

Removing an item from a list is O(n) on average, so it's going to be a bit
slow any way you slice it (unless you only remove from the end of the list
which is O(1)).

Can you tell us more about why you're using a list?  If the order of the
items doesn't matter, you may be able to use a set().

If the order matters, how is the list ordered?  If we know how the list is
ordered, we may be able to help you come up with a clever way to remove an
item cheaply.
--
Daniel Stutzbach, Ph.D.
President, Stutzbach Enterprises, LLC <http://stutzbachenterprises.com>
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: negative "counts" in collections.Counter?

2010-03-08 Thread Daniel Stutzbach
On Mon, Mar 8, 2010 at 1:24 PM, Raymond Hettinger  wrote:

> Instinct says that conflating two models can be worse for usability
> than just picking one of the models and excluding the other.
>

Toward that end, shouldn't Counter do one (and only one) of the follow?
1) Disallow non-positive counts (like a Multiset/Bag)
2) Allow them when subtracting

In other words, to avoid conflating two models the following two examples
should produce the same output:

>>> c = Counter({'red': 1})
>>> c['red'] -= 2
>>> c
Counter({'red': -1})

>>> c = Counter({'red': 1})
>>> c2 = Counter({'red': 2})
>>> c - c2
Counter()

--
Daniel Stutzbach, Ph.D.
President, Stutzbach Enterprises, LLC <http://stutzbachenterprises.com>
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Queue peek?

2010-03-02 Thread Daniel Stutzbach
On Tue, Mar 2, 2010 at 1:58 PM, Martin P. Hellwig <
martin.hell...@dcuktec.org> wrote:

> What actually happens if multiple threads at the same time, write to a
> shared dictionary (Not using the same key)?
>

All of Python's built-in types are thread safe.  Both updates will happen.
--
Daniel Stutzbach, Ph.D.
President, Stutzbach Enterprises, LLC <http://stutzbachenterprises.com>
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: The best way to check if two lists have identical values

2010-02-25 Thread Daniel Stutzbach
On Thu, Feb 25, 2010 at 7:30 AM, mk  wrote:

> There's a number of complications here, depending on definition of 'lists
> with identical values', like whether the same value can be repeated
> different number of times in two lists, or whether the order of values
> matters.
>

Order and repetitions matter:
list1 == list2

Repetition matters, order does not:
sorted(list1) == sorted(list2) # items must be comparable

Neither matters:
set(list1) == set(list2) # items must be hashable

--
Daniel Stutzbach, Ph.D.
President, Stutzbach Enterprises, LLC <http://stutzbachenterprises.com>
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Efficiently building ordered dict

2010-02-22 Thread Daniel Stutzbach
On Mon, Feb 22, 2010 at 10:32 AM, Bryan  wrote:

> unorderedDict = {}
> for thing in unorderedList:
>if thing.id in unorderedDict:
>UpdateExistingValue(unorderedDict[thing.id])
>else:
>CreateNewValue(unorderedDict[thing.id])
>
> orderedDict = OrderedDict()
> for k in sorted(unorderedDict.keys()):
>orderedDict[k]  unorderedDict.pop(k)
>

It's not entirely clear what UpdateExistingValue and CreateNewValue do.
However, it looks like you are trying to create a dictionary where the keys
are sorted.  Is that right?

If so, you could use the sorteddict type from my blist package, which is
similar to an OrderDict except it efficiently keeps the keys in sorted order
instead of insertion order.  For example:

>>> from blist import sorteddict
>>> my_dict = sorteddict({1: 'a', 6: 'b', -5: 'c'})
>>> my_dict.keys()
[-5, 1, 6]
>>> my_dict[2] = 'd'
>>> my_dict.keys()
[-5, 1, 2, 6]

It's available here:
http://pypi.python.org/pypi/blist/
--
Daniel Stutzbach, Ph.D.
President, Stutzbach Enterprises, LLC <http://stutzbachenterprises.com>
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Efficient way to break up a list into two pieces

2010-02-20 Thread Daniel Stutzbach
On Sat, Feb 20, 2010 at 6:55 PM, marwie  wrote:

> Now my question is
> if there's a similar thing for breaking a list into two parts. Let's
> say I want to remove from l1 everything from and including position 10
> and store it in l2. Then I can write
>
>l2 = l1[10:]
>del l1[10:]


With Python's list implementation, there's no faster way to split a list
into two parts.  Because Python lists are essentially variable-length
arrays, Python has to copy O(n) references to create l2 and then decrement
O(n) references to remove them from l1.

I wrote an extension type called the blist to handle this and other
problems.  You use it just like you'd use a list, but it has different
performance characteristics.  It uses a hybrid array-tree structure and
performs copy-on-write behind the scenes, so it takes only O(log n) time to
create l2 and only O(log n) time to trim l1.

You'd use it something like this:

>>> from blist import blist
>>> l1 = blist()
>>> # Here, populate l1 as you normally would
>>> l2 = l1[10:]
>>> del l1[10:]

It's available at: http://pypi.python.org/pypi/blist/
--
Daniel Stutzbach, Ph.D.
President, Stutzbach Enterprises, LLC <http://stutzbachenterprises.com>
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: The future of "frozen" types as the number of CPU cores increases

2010-02-17 Thread Daniel Stutzbach
On Tue, Feb 16, 2010 at 3:15 PM, John Nagle  wrote:

> One possible implementation would be
> to have unfrozen objects managed by reference counting and locking as
> in CPython.  Frozen objects would live in a different memory space and be
> garbage collected by a concurrent garbage collector.
>

A concurrent garbage collector for frozen object would still need to walk
unfrozen objects, to find references to frozen objects.


--
Daniel Stutzbach, Ph.D.
President, Stutzbach Enterprises, LLC <http://stutzbachenterprises.com>
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Is a merge interval function available?

2010-02-10 Thread Daniel Stutzbach
On Wed, Feb 10, 2010 at 5:23 PM, Peng Yu  wrote:

> I'm wondering there is already a function in python library that can
> merge intervals.
>

Not in the standard library, but this package may help:

http://pypi.python.org/pypi/interval

--
Daniel Stutzbach, Ph.D.
President, Stutzbach Enterprises, LLC <http://stutzbachenterprises.com/>
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: "if {negative}" vs. "if {positive}" style (was: New to Python)

2010-02-09 Thread Daniel Stutzbach
On Tue, Feb 9, 2010 at 9:36 PM, Tim Chase wrote:

> removing the "not" from the condition.  I admit I choose one over the other
> based on some gut-feeling aesthetic that I can't really nail down.  I think
> one of my major influencing factors revolves around the negative "not"
> portion having one or two lines and the  positive portion having a large
> block of code.


If one block is much shorter, I tend to put the shorter block first.  That
way, the "else" and the "if" are almost always on the screen together at the
same time.  If they're both long, I factor out one or both of the blocks
into functions.

--
Daniel Stutzbach, Ph.D.
President, Stutzbach Enterprises, LLC <http://stutzbachenterprises.com/>
-- 
http://mail.python.org/mailman/listinfo/python-list


ANN: blist 1.1.1 - now with sortedlist, sortedset, and sorteddict

2010-01-31 Thread Daniel Stutzbach
blist 1.1.1 is now available:

http://pypi.python.org/pypi/blist/

What is blist?
--

The blist is a drop-in replacement for the Python list the provides
better performance when modifying large lists.  Python's built-in list
is a dynamically-sized array; to insert or removal an item from the
beginning or middle of the list, it has to move most of the list in
memory, i.e., O(n) operations.  The blist uses a flexible, hybrid
array/tree structure and only needs to move a small portion of items
in memory, specifically using O(log n) operations.

For small lists, the blist and the built-in list have virtually
identical performance.

What's new?
---

blist 1.1 introduces other data structures based on the blist:

- sortedlist
- sortedset
- weaksortedlist
- weaksorteset
- sorteddict
- btuple

These additional data structures are only available in Python 2.6 or
higher, as they make use of Abstract Base Classes.

The sortedlist is a list that's always sorted.  It's iterable and
indexable like a Python list, but to modify a sortedlist the same
methods you would use on a Python set (add, discard, or remove).

>>> from blist import sortedlist
>>> my_list = sortedlist([3,7,2,1])
>>> my_list
sortedlist([1, 2, 3, 7])
>>> my_list.add(5)
>>> my_list[3]
5
>>>

The sortedlist constructor takes an optional "key" argument, which may
be used to change the sort order just like the sorted() function.

>>> from blist import sortedlist
>>> my_list = sortedlist([3,7,2,1], key=lambda i: -i)
sortedlist([7, 3, 2, 1]
>>>

The sortedset is a set that's always sorted.  It's iterable and
indexable like a Python list, but modified like a set.  Essentially,
it's just like a sortedlist except that duplicates are ignored.

>>> from blist import sortedset
>>> my_set = sortedset([3,7,2,2])
sortedset([2, 3, 7]
>>>

The weaksortedlist and weaksortedset are weakref variations of the
sortedlist and sortedset.

The sorteddict works just like a regular dict, except the keys are
always sorted.  The sorteddict should not be confused with Python
2.7's OrderedDict type, which remembers the insertion order of the
keys.

>>> from blist import sorteddict
>>> my_dict = sorteddict({1: 5, 6: 8, -5: 9})
>>> my_dict.keys()
[-5, 1, 6]
>>>

The btuple is a drop-in replacement for the built-in tuple.  Compared
to the built-in tuple, the btuple offers the following advantages:

- Constructing a btuple from a blist takes O(1) time.
- Taking a slice of a btuple takes O(n) time, where n is the size of
  the original tuple.  The size of the slice does not matter.

>>> from blist import blist, btuple
>>> x = blist([0]) # x is a blist with one element
>>> x *= 2**29 # x is a blist with > 500 million elements
>>> y = btuple(x)  # y is a btuple with > 500 million elements

Feedback


We're eager to hear about your experiences with the blist.  You can
email me at dan...@stutzbachenterprises.com.  Alternately, bug reports
and feature requests may be reported on our bug tracker at:
http://github.com/DanielStutzbach/blist/issues

--
Daniel Stutzbach, Ph.D.
President, Stutzbach Enterprises, LLC <http://stutzbachenterprises.com/>
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: For loop searching takes too long!

2010-01-28 Thread Daniel Stutzbach
On Thu, Jan 28, 2010 at 5:52 PM, elsa  wrote:

>choice = random.choice(range(1,sum([i[0] for i in myList])+1))
>

That's the line causing the problem.  It's generating all of the numbers
between 1 and your sum, then picking one.  Try the following instead, which
will pick a number between 1 and your sum without generating each and every
one.

choice = random.randrange(1, sum(i[0] for i in myList)+1)

--
Daniel Stutzbach, Ph.D.
President, Stutzbach Enterprises, LLC <http://stutzbachenterprises.com>
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: ISO module for binomial coefficients, etc.

2010-01-27 Thread Daniel Stutzbach
On Sat, Jan 23, 2010 at 4:55 PM, kj  wrote:

> Before I go off to re-invent a thoroughly invented wheel, I thought
> I'd ask around for some existing module for computing binomial
> coefficient, hypergeometric coefficients, and other factorial-based
> combinatorial indices.  I'm looking for something that can handle
> fairly large factorials (on the order of 1!), using floating-point
> approximations as needed, and is smart about optimizations,
> memoizations, etc.
>

I don't have code for any of the other stuff, but I have some efficient code
(see below) for computing exact integer factorials.  It's a
divide-and-conquer algorithm.  I'm not sure of the exact time complexity,
but it appears to somewhere between n*log n and n**2.

python2.6 -m timeit -s 'from utils import factorial' 'x = factorial(1)'
10 loops, best of 3: 27.7 msec per loop

If that's not fast enough, I suggest using the gamma function to compute
floating-point approximations, as gamma(n) == (n-1)!.  Gamma isn't yet
included in the Python standard library (http://bugs.python.org/issue3366),
but you can use the ctypes module to get it from the system C library on
many platforms.

For computing binomial coefficients, you can use the lgamma function (log of
gamma), also found in my system C libraries.  Since choose(n, k) =
exp(lgamma(n) - lgamma(k) - lgamma(n-k)).

If you go with floating point, watch out for floating point rounding error.
;-)

def factorial(n):
"http://www.luschny.de/math/factorial/binarysplitfact.html";
p = 1
r = 1
p, r = factorial_loop(n, p, r)
return r << n_minus_num_of_bits(n)

def factorial_loop(n, p, r):
if n <= 2: return p, r
p, r = factorial_loop(n // 2, p, r)
p *= part_product(n // 2 + 1 + ((n // 2) & 1),
  n - 1 + (n & 1))
r *= p
return p, r

def part_product(n, m):
if (m <= (n+1)): return n
if (m == (n+2)): return n*m
k = (n+m) // 2
if ((k & 1) != 1):
k -= 1
return part_product(n, k) * part_product(k+2, m)

def n_minus_num_of_bits(v):
w = v
if w >= 2**32-1:
raise OverflowError
w -= (0x & w) >> 1
w = (w & 0x3333) + ((w >> 2) & 0x)
w = w + (w >> 4) & 0x0f0f0f0f
w += w >> 8
w += w >> 16
return v - (w & 0xff)

--
Daniel Stutzbach, Ph.D.
President, Stutzbach Enterprises, LLC <http://stutzbachenterprises.com>
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: list.pop(0) vs. collections.dequeue

2010-01-25 Thread Daniel Stutzbach
On Mon, Jan 25, 2010 at 12:24 PM, Steve Howell  wrote:

> Hi Daniel, I agree with what Raymond Hettinger says toward the top of
> the PEP.  Blist, while extremely useful, does seem to have to trade
> off performance of common operations, notably "get item", in order to
> get better performance for other operations (notably insert/delete).
>

Actually, the latest version of blist is competitive for "get item" and
similar operations.  See:

http://stutzbachenterprises.com/performance-blist/item
http://stutzbachenterprises.com/performance-blist/set-item
http://stutzbachenterprises.com/performance-blist/lifo
http://stutzbachenterprises.com/performance-blist/shuffle

I added a flat cache of the leaf nodes, yielding O(1) get/set item
operations whenever those operations dominate over insert/delete
operations.  The cache adds around 1.5% memory overhead.


> My algorithm does exactly N pops and roughly N list accesses, so I
> would be going from N*N + N to N + N log N if switched to blist.  That
> would be at least a theoretical gain over the current performance, but
> if pop() were O(1), I could get the whole thing down to N time.
>

If I understand correctly, you feel strongly that a list.pop(0) that runs in
O(n) time is "broken", but you're comfortable with a list.pop(1) that runs
in O(n) time.  Is that correct?

How do you feel about a bisect.insort(list, item) that takes O(n) time?

Different people are bound to have different opinions about which operations
are most important and where lies the best tradeoff between different
operations (as well as code complexity).   I am not sure why you feel so
strongly that particular spot is best.

Obviously, I prefer a slightly different spot, but I also respect the core
developers' choice.

--
Daniel Stutzbach, Ph.D.
President, Stutzbach Enterprises, LLC <http://stutzbachenterprises.com>
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: list.pop(0) vs. collections.dequeue

2010-01-24 Thread Daniel Stutzbach
On Sun, Jan 24, 2010 at 1:53 PM, Steve Howell  wrote:
> I don't think anybody provided an actual link, but please correct me
> if I overlooked it.

I have to wonder if my messages are all ending up in your spam folder
for some reason. :-)

PEP 3128 (which solves your problem, but not using the implementation
you suggest)
http://www.python.org/dev/peps/pep-3128/

Implementation as an extension module:
http://pypi.python.org/pypi/blist/

Related discussion:
http://mail.python.org/pipermail/python-3000/2007-April/006757.html
http://mail.python.org/pipermail/python-3000/2007-May/007491.html

Detailed performance comparison:
http://stutzbachenterprises.com/performance-blist

I maintain a private fork of Python 3 with the blist replacing the
regular list, as a way of rigorously testing the blist implementation.

Although I originally proposed a PEP, I am content to have the blist
exist as a third-party module.

--
Daniel Stutzbach, Ph.D.
President, Stutzbach Enterprises, LLC
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: list.pop(0) vs. collections.dequeue

2010-01-22 Thread Daniel Stutzbach
On Fri, Jan 22, 2010 at 5:27 PM, Steve Howell  wrote:

> I actually do expect Python to solve performance problems for me that
> are more easily solved in core than in Python itself.  So I guess
> that's where we differ.
>

You might be interested in the extension type I wrote (the "blist") that
looks, acts, and quacks like a list, but takes worst-case O(log n) time for
inserting and removing elements anywhere in the list.

It's available for download here:

http://pypi.python.org/pypi/blist/

And there's a detailed performance comparison with the built-in list here:

http://stutzbachenterprises.com/performance-blist

--
Daniel Stutzbach, Ph.D.
President, Stutzbach Enterprises, LLC <http://stutzbachenterprises.com>
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Sorted dictionary

2010-01-21 Thread Daniel Stutzbach
On Thu, Jan 21, 2010 at 12:45 PM, Jan Kaliszewski  wrote:

> Please note that I used funcions from bisect, that use binary search.
>
> Doesn't it take O(log n) time?
>

It takes O(log n) time to find the point to insert, but O(n) time to perform
the actual insertion.

--
Daniel Stutzbach, Ph.D.
President, Stutzbach Enterprises, LLC <http://stutzbachenterprises.com>
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Sorted dictionary

2010-01-21 Thread Daniel Stutzbach
On Thu, Jan 21, 2010 at 2:27 AM, Raymond Hettinger  wrote:

> Using an underlying list to track sorted items
> means that insertion and deletion take O(n) time.
> That could be reduced to O(log n) time by using
> a blist or skiplist as the underlying structure
> for maintaining a sort.
>

Indeed.  In fact, blist 1.1 (to be released within a month or so) will
include sorteddict, sortedset, sortedlist, weaksortedset, and weaksortedlist
types.

--
Daniel Stutzbach, Ph.D.
President, Stutzbach Enterprises, LLC <http://stutzbachenterprises.com>
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: iterators and views of lists

2009-12-16 Thread Daniel Stutzbach
On Wed, Dec 16, 2009 at 5:39 AM, Steven D'Aprano <
st...@remove-this-cybersource.com.au> wrote:

> for item in seq[1:]:
>process(item)
>
> without making an unnecessary copy of almost all of seq.
>

I use the following idiom:
for i in range(1, len(seq)):
process(seq[i])

Alternately, if I'm using the blist extension type that I wrote, then
seq[1:] is O(log n).
http://pypi.python.org/pypi/blist/

--
Daniel Stutzbach, Ph.D.
President, Stutzbach Enterprises, LLC <http://stutzbachenterprises.com>
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: iterators and views of lists

2009-12-16 Thread Daniel Stutzbach
On Wed, Dec 16, 2009 at 7:33 AM, Peter Otten <__pete...@web.de> wrote:

> islice() could be changed to special-case lists and tuples, but that feels
> a
> bit unclean.
>

How about special-casing objects that implement collections.Sequence?

--
Daniel Stutzbach, Ph.D.
President, Stutzbach Enterprises, LLC <http://stutzbachenterprises.com>
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: treaps in python

2009-12-14 Thread Daniel Stutzbach
On Mon, Dec 14, 2009 at 10:49 PM, Dan Stromberg  wrote:

> Also, what's the best way to package something like this for consumption by
> python-folk?  There seem to be so many ways of packaging things anymore.
>  Are dist utils, a .deb and a .rpm the way to go?  Right now, it's just
> using make to stuff things in /usr/local.
>

Put a source .tar.gz on PyPi that includes a setup.py, optionally using
distribute_setup.py to augment distutils.

http://docs.python.org/distutils/

http://packages.python.org/distribute/using.html
http://pypi.python.org/pypi/distribute#distribute-setup-py

--
Daniel Stutzbach, Ph.D.
President, Stutzbach Enterprises, LLC <http://stutzbachenterprises.com>
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: KeyboardInterrupt

2009-12-10 Thread Daniel Stutzbach
On Thu, Dec 10, 2009 at 4:42 PM, mattia  wrote:

> def go():
>threads = [Thread(target=do_work, args=()) for _ in range(2)]
>for t in threads:
>t.start()
>for t in threads:
>t.join()
>

The KeyboardInterrupt goes to the main thread, which is sitting there in
t.join() with no exception handler around it.

--
Daniel Stutzbach, Ph.D.
President, Stutzbach Enterprises, LLC <http://stutzbachenterprises.com>
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: non-copy slices

2009-11-19 Thread Daniel Stutzbach
On Wed, Nov 18, 2009 at 9:00 PM, Rami Chowdhury wrote:

> I'm not sure you're understanding the point others have been making. A
> list item is merely another reference to an existing object -- it
> doesn't copy the object in any way.
>

It still has to copy the reference, though.  That takes O(n) time if you're
taking a big slice.

--
Daniel Stutzbach, Ph.D.
President, Stutzbach Enterprises, LLC <http://stutzbachenterprises.com>
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: non-copy slices

2009-11-18 Thread Daniel Stutzbach
On Wed, Nov 18, 2009 at 2:22 PM, Themis Bourdenas <
t.bourdena...@imperial.ac.uk> wrote:

> It's nothing in the library that completely imitates the slice without the
> copies, right?


You might be interested in my blist extension type (link below).
Syntactically, using a blist is just like using a list, but it has different
performance characteristics.

In particular for your needs, taking a slice is cheap.  The blist implements
copy-on-write behind the scenes, so taking a slice takes O(log n) time,
where n is the size of the initial blist.

http://pypi.python.org/pypi/blist/

Here is a simple example, which creates a blist with over 500 million
elements and takes a slice of over 500 million elements.  In under 22
microseconds. :-)

from blist import blist
small_list = blist([0])
BIG_list = small_list * 2**29
BIG_slice = BIG_list[4:-5]

Cashew:~$ python2.5 -m timeit -s 'from blist import blist'
'small_list=blist([0])' ''BIG_list=small_list*2**29'
'BIG_slice=BIG_list[4:-5]'
1 loops, best of 3: 21.5 usec per loop

--
Daniel Stutzbach, Ph.D.
President, Stutzbach Enterprises, LLC <http://stutzbachenterprises.com>
-- 
http://mail.python.org/mailman/listinfo/python-list


ANN: blist 1.0.2

2009-11-01 Thread Daniel Stutzbach
blist 1.0.2 is now available:

http://pypi.python.org/pypi/blist/

What is blist?
--

The blist is a type that looks, acts, and quacks like a Python list, but has
better asymptotic performance when inserting or deleting elements (O(log
n)). For small lists, blists and the built-in list have very similar
performance.  The blist also features copy-on-write behavior, so copying or
taking large slices from a list is inexpensive.

What's new?
---

blist version 1.0.2 includes two important bug fixes:

- Fixed a crash in the .index method, which was not properly sanitizing the
optional arguments.
- Fixed a possible crash when modifying the blist during iteration

Other changes include:

- Changed int to Py_ssize_t in several places for better 64-bit hygiene
- Removed some over-zealous assertion checks that were causing crashes in
oddball (but legal!) cases in debug builds
- Ported tests to run under Python 2.6 and Python 3.1 (but no longer Python
2.5)
- Got rid of warnings on non-ix86 platforms

--
Daniel Stutzbach, Ph.D.
President, Stutzbach Enterprises, LLC <http://stutzbachenterprises.com>
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: efficient running median

2009-10-13 Thread Daniel Stutzbach
On Tue, Oct 13, 2009 at 10:22 AM, Janto Dreijer  wrote:

> I'm looking for code that will calculate the running median of a
> sequence, efficiently. (I'm trying to subtract the running median from
> a signal to correct for gradual drift).
>

In the past, I've used the following algorithm which approximates the
running median.  I got it from an article in a peer-reviewed journal, but I
can't locate the reference at the moment.

my_median = some_reasonable_default
step_size = some_value_depending_on_your_application

def adjust_median(value):
global my_median
if my_median < value:
my_median += step_size
else:
my_median -= step_size

Let me know if you have any questions about it.

--
Daniel Stutzbach, Ph.D.
President, Stutzbach Enterprises, LLC <http://stutzbachenterprises.com>
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: iterate over list while changing it

2009-09-30 Thread Daniel Stutzbach
On Thu, Sep 24, 2009 at 3:32 PM, Torsten Mohr  wrote:

> a = [1, 2, 3, 4, 5, 6]
>
> for i, x in enumerate(a):
>if x == 3:
>a.pop(i)
>continue
>
>if x == 4:
>a.push(88)
>
>print "i", i, "x", x
>
> I'd like to iterate over a list and change that list while iterating.
> I'd still like to work on all items in that list, which is not happening
> in the example above.
>

I assume that by "a.push" you meant "a.append".

I believe this will do what you want:

a = [1, 2, 3, 4, 5, 6]
i = 0
while i < len(a):
   x = a[i]
   if x == 3:
   a.pop(i)
   i += 1
   continue

   if x == 4:
   a.append(88)

   print "i", i, "x", x
   i += 1

--
Daniel Stutzbach, Ph.D.
President, Stutzbach Enterprises, LLC <http://stutzbachenterprises.com>
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: super() and multiple inheritance failure

2009-09-25 Thread Daniel Stutzbach
On Fri, Sep 25, 2009 at 9:36 PM, Steven D'Aprano <
st...@remove-this-cybersource.com.au> wrote:

> I don't understand why I'm getting the following behaviour when using
> super() with multiple inheritance. The following is a minimal example
> demonstrating the behaviour.
>

super() does not have the behavior that I would have expected, either, and
IMO the documentation is unenlightening.

I found the following website helpful in understanding what super()
*actually* does (though I do not support the author's hyperbolic
criticisms):
http://fuhm.net/super-harmful/

--
Daniel Stutzbach, Ph.D.
President, Stutzbach Enterprises, LLC <http://stutzbachenterprises.com>
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Doubley imported module caused devastating bug

2009-09-24 Thread Daniel Stutzbach
On Thu, Sep 24, 2009 at 2:51 PM, Ethan Furman  wrote:

> I believe that modules are imported only once
>

That's *mostly* true, but try this one:

A.py:
print 'Importing A'
import B

B.py:
print 'Importing B'
import A

Cashew:/tmp$ python2.5 B.py
Importing B
Importing A
Importing B

--
Daniel Stutzbach, Ph.D.
President, Stutzbach Enterprises, LLC <http://stutzbachenterprises.com>
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: delete items from list by indices

2009-09-23 Thread Daniel Stutzbach
On Wed, Sep 23, 2009 at 3:48 AM, Peter Otten <__pete...@web.de> wrote:

> Why's that obsession with speed?
>

Well, most of the solutions posted so far are O(n**2), which may be
noticeably slow if the list is of considerable length.  I wonder if the
original poster ran in to that problem.

>>> items = ["a", "b", "c", "d"]
> >>> delenda = set([0, 3])
> >>> items = [item for index, item in enumerate(items) if index not in
> delenda]
> >>> items
> ['b', 'c']


... except for that one.  That's O(n).  Very nice. :-)

--
Daniel Stutzbach, Ph.D.
President, Stutzbach Enterprises, LLC <http://stutzbachenterprises.com>
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Creating a local variable scope.

2009-09-12 Thread Daniel Stutzbach
On Sat, Sep 12, 2009 at 10:02 AM, Ethan Furman  wrote:

> That's probably why he called it as:
> with do_nothing() as *imaginary*_local_scope:
>...
>del spam, eggs, imaginary_local_scope
>
> and then as the last item deleted all the variables, except the one he
> wanted to keep.
>

You're absolutely right.  My apologies.  I need to learn not to post to
mailing lists first thing in the morning! ;-)

--
Daniel Stutzbach, Ph.D.
President, Stutzbach Enterprises, LLC <http://stutzbachenterprises.com>
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Creating a local variable scope.

2009-09-12 Thread Daniel Stutzbach
On Fri, Sep 11, 2009 at 8:29 PM, Steven D'Aprano <
st...@remove-this-cybersource.com.au> wrote:

> (4) Create a "do nothing" context manager allowing you to visually indent
> the block, but otherwise have no effect:
>

"with" doesn't create a new scope.  Observe:

Python 2.6.2 (r262:71600, Apr 15 2009, 07:20:39)
[GCC 3.4.4 (cygming special, gdc 0.12, using dmd 0.125)] on cygwin
Type "help", "copyright", "credits" or "license" for more information.
>>> with open('/dev/null') as f:
...   x = 5
...
>>> x
5

--
Daniel Stutzbach, Ph.D.
President, Stutzbach Enterprises, LLC <http://stutzbachenterprises.com>
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Creating a local variable scope.

2009-09-11 Thread Daniel Stutzbach
2009/9/11 Johan Grönqvist 

> I find several places in my code where I would like to have a variable
> scope that is smaller than the enclosing function/class/module definition.
>

For what it's worth, there was a relevant proposal on the python-ideas list
a few months back:

http://mail.python.org/pipermail/python-ideas/2009-July/005114.html
<http://mail.python.org/pipermail/python-ideas/2009-July/005114.html%20>

--
Daniel Stutzbach, Ph.D.
President, Stutzbach Enterprises, LLC <http://stutzbachenterprises.com>
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Simple addition to random module - Student's t

2009-09-03 Thread Daniel Stutzbach
On Wed, Sep 2, 2009 at 2:15 PM, Raymond Hettinger  wrote:

> ISTM, there ought to be a statistics module that can calculate
> cumulative distribution functions for a variety of distributions.
> This would be far more helpful than creating more generators.
>

Many of the formulas for cumulative distribution functions require math
functions not currently provided by Python (erf, gamma, etc.).

(http://bugs.python.org/issue3366 includes a patch to provide them)

--
Daniel Stutzbach, Ph.D.
President, Stutzbach Enterprises, LLC <http://stutzbachenterprises.com>
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: List insertion cost

2009-07-21 Thread Daniel Stutzbach
On Tue, Jul 21, 2009 at 2:21 PM, Lucas P Melo  wrote:

> I would like to know how much it costs to insert an element into a list
> using this operation:
> a[2:2] = [ 1 ]
> i. e, what is the complexity of the operation above (given that len(a) =
> n)?
>

O(n)

If you want O(log n), you can use the blist extension type from
http://pypi.python.org/pypi/blist/

--
Daniel Stutzbach, Ph.D.
President, Stutzbach Enterprises, LLC <http://stutzbachenterprises.com>
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: No trees in the stdlib?

2009-06-26 Thread Daniel Stutzbach
On Fri, Jun 26, 2009 at 1:54 AM, Miles Kaufmann  wrote:

> On Jun 26, 2009, at 2:23 AM, Chris Rebert wrote:
>
>> That's pretty much the bisect module in a nutshell. It manipulates a
>> sorted list using binary search.
>>
>
> With O(n) insertions and removals, though.  A decent self-balancing binary
> tree will generally do those in O(log n).


FWIW, you can get O(log**2 n) inserts and deletes by using the bisect module
on my blist extension type (http://pypi.python.org/pypi/blist/).  It's a
drop-in replacement for list(), with different asymptotic performance
characteristics.

Copying a blist is O(1), so the functional-programming types can wrap it in
non-mutating semantics if they so choose. ;)

--
Daniel Stutzbach, Ph.D.
President, Stutzbach Enterprises, LLC <http://stutzbachenterprises.com>
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: python list handling and Lisp list handling

2009-04-27 Thread Daniel Stutzbach
On Fri, Apr 24, 2009 at 10:19 AM, Mark Tarver wrote:

> but also says that their representation is implementation dependent.
> As far as I see this should mean that element access in Python should
> run in constant time.  Now if so this is a boon, because generally
>

When I first learned Python, I too was confused by the fact that Python
lists are actually arrays (or vectors) under-the-hood, and it was some time
before I learned that element access is fast (O(1)) but inserts, deletes,
and taking a sublist are slow (O(n)).

Much later, I went on to write a drop-in replacement type called the "blist"
that has the same methods as a Python list, but has better asymptotic
performance for many operations.  That way I can write use the list in the
most natural way without having to worry about accidentally hitting a O(n)
method.

http://pypi.python.org/pypi/blist/

--
Daniel Stutzbach, Ph.D.
President, Stutzbach Enterprises, LLC <http://stutzbachenterprises.com>
--
http://mail.python.org/mailman/listinfo/python-list


Re: Programming in Python with a view to extending in C at a later date.

2009-04-20 Thread Daniel Stutzbach
On Mon, Apr 20, 2009 at 1:57 PM, dug.armad...@googlemail.com <
dug.armad...@googlemail.com> wrote:

> Say you set out to program in Python knowing that you will be
> converting parts of it into C ( or maybe C++) at a later date, but you
> do not know which parts.
>

I often use Python for rapid prototyping and later rewrite parts of it in C
for speed.  It's always been a pleasant experience, and I've yet to come
across a Python construct that made my life especially difficult later.  The
value of rapid development and debugging algorithmic problems in Python has
always outweighed the cost of later translating some of the code.

Hope that helps,

--
Daniel Stutzbach, Ph.D.
President, Stutzbach Enterprises, LLC <http://stutzbachenterprises.com>
--
http://mail.python.org/mailman/listinfo/python-list