Re: Heap Implementation

2016-02-11 Thread Cem Karan

On Feb 10, 2016, at 1:23 PM, "Sven R. Kunze"  wrote:

> Hi Cem,
> 
> On 08.02.2016 02:37, Cem Karan wrote:
>> My apologies for not writing sooner, but work has been quite busy lately 
>> (and likely will be for some time to come).
> 
> no problem here. :)
> 
>> I read your approach, and it looks pretty good, but there may be one issue 
>> with it; how do you handle the same item being pushed into the heap more 
>> than once?  In my simple simulator, I'll push the same object into my event 
>> queue multiple times in a row.  The priority is the moment in the future 
>> when the object will be called.  As a result, items don't have unique 
>> priorities.  I know that there are methods of handling this from the 
>> client-side (tuples with unique counters come to mind), but if your library 
>> can handle it directly, then that could be useful to others as well.
> 
> I've pondered that in the early design phase. I considered it a 
> slowdown for my use-case without benefit.
> 
> Why? Because I always push a fresh object ALTHOUGH it might be equal 
> when comparing attributes (priority, deadline, etc.).
> 
> 
> That's the reason why I need to ask again: why push the same item onto a 
> heap?
> 
> 
> Are we talking about function objects? If so, then your concern is valid. 
> Would you accept a solution that would involve wrapping the function in 
> another object carrying the priority? Would you prefer a wrapper that's 
> defined by xheap itself so you can just use it?

Yes.  I use priority queues for event loops.  The items I push in are callables 
(sometimes callbacks, sometimes objects with __call__()) and the priority is 
the simulation date at which they should be called.  I push the same item 
multiple times in a row because each call modifies the item (e.g., the location 
of an actor is calculated from its velocity and the date).  There are certain 
calls that I tend to push in all at once because the math for calculating when 
the event should occur is somewhat expensive, and always returns multiple 
dates at once.  
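The scheduling pattern described above can be sketched with heapq and a
sequence counter as a tie-breaker (the names here are illustrative, not from
any particular library):

```python
import heapq
import itertools

class EventQueue:
    """Minimal simulation event queue: the priority is the simulation date.

    A monotonically increasing sequence number breaks ties, so heapq never
    compares the scheduled callables themselves and the same callable can
    be scheduled many times, even for the same date.
    """
    def __init__(self):
        self._heap = []
        self._seq = itertools.count()

    def schedule(self, date, event):
        heapq.heappush(self._heap, (date, next(self._seq), event))

    def pop(self):
        date, _, event = heapq.heappop(self._heap)
        return date, event

def tick():
    pass

q = EventQueue()
q.schedule(5.0, tick)
q.schedule(2.0, tick)  # same callable, earlier date
q.schedule(2.0, tick)  # an exact duplicate is fine too
print(q.pop()[0])      # -> 2.0
```

The counter guarantees that heapq never falls through to comparing the
callables themselves, so duplicate pushes cost nothing extra.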

That is also why deleting or changing events can be useful; I know that at 
least some of those events will be canceled in the future, which makes deleting 
useful.  Note that it is also possible to cancel an event by marking it as 
cancelled, and then simply not executing it when you pop it off the queue, but 
I've found that there are a few cases in my simulations where the number of 
dead events in the queue exceeds the number of live events, which does have an 
impact on memory and operational speed (maintaining the heap invariant).  The 
difference isn't large, but I need FAST code to deal with the size of my 
simulations (thousands to tens of thousands of actors, over hundreds of 
millions of simulations, which is why I finally had to give up on Python and 
switch to pure C).
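The "mark it cancelled and skip it on pop" strategy mentioned above might look
like this minimal sketch (Event and pop_live are made-up names):

```python
import heapq

class Event:
    """Trivial event object with a cancellation flag (a tombstone)."""
    def __init__(self, name):
        self.name = name
        self.cancelled = False

def schedule(heap, date, seq, event):
    # (date, seq, event): seq is a unique tie-breaker, as before
    heapq.heappush(heap, (date, seq, event))

def pop_live(heap):
    """Pop events, silently discarding cancelled ones.

    Dead entries stay in the heap until they surface, which is the
    memory/speed cost described above when they outnumber live events.
    """
    while heap:
        date, _, event = heapq.heappop(heap)
        if not event.cancelled:
            return date, event
    return None

heap = []
a, b = Event("a"), Event("b")
schedule(heap, 1.0, 0, a)
schedule(heap, 2.0, 1, b)
a.cancelled = True              # cancel without touching the heap
print(pop_live(heap)[1].name)   # -> b
```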

Having a wrapper defined by xheap would be ideal; I suspect that I won't be the 
only one who needs to deal with this, so having it centrally located would be 
best.  It may also make it possible for you to optimize xheap's behavior in 
some way.
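A wrapper of the kind being discussed might look roughly like this;
PriorityItem is a hypothetical name, not xheap's actual API:

```python
import heapq
from functools import total_ordering

@total_ordering
class PriorityItem:
    """Hypothetical wrapper pairing a payload with its priority.

    Ordering looks only at the priority, so wrapping the same callable
    with different priorities yields distinct heap entries, and the
    payload itself is never compared.
    """
    __slots__ = ("priority", "item")

    def __init__(self, priority, item):
        self.priority = priority
        self.item = item

    def __eq__(self, other):
        return self.priority == other.priority

    def __lt__(self, other):
        return self.priority < other.priority

h = []
heapq.heappush(h, PriorityItem(3, "later"))
heapq.heappush(h, PriorityItem(1, "sooner"))
print(heapq.heappop(h).item)  # -> sooner
```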

Thanks,
Cem Karan
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Heap Implementation

2016-02-10 Thread Sven R. Kunze

Hi Cem,

On 08.02.2016 02:37, Cem Karan wrote:

My apologies for not writing sooner, but work has been quite busy lately (and 
likely will be for some time to come).


no problem here. :)


I read your approach, and it looks pretty good, but there may be one issue with 
it; how do you handle the same item being pushed into the heap more than once?  
In my simple simulator, I'll push the same object into my event queue multiple 
times in a row.  The priority is the moment in the future when the object will 
be called.  As a result, items don't have unique priorities.  I know that there 
are methods of handling this from the client-side (tuples with unique counters 
come to mind), but if your library can handle it directly, then that could be 
useful to others as well.


I've pondered that in the early design phase. I considered it a 
slowdown for my use-case without benefit.


Why? Because I always push a fresh object ALTHOUGH it might be equal 
when comparing attributes (priority, deadline, etc.).



That's the reason why I need to ask again: why push the same item onto 
a heap?



Are we talking about function objects? If so, then your concern is 
valid. Would you accept a solution that would involve wrapping the 
function in another object carrying the priority? Would you prefer a 
wrapper that's defined by xheap itself so you can just use it?



Best,
Sven
--
https://mail.python.org/mailman/listinfo/python-list


Re: Heap Implementation

2016-02-09 Thread Cem Karan

On Feb 9, 2016, at 8:27 PM, srinivas devaki  wrote:

> 
> 
> On Feb 10, 2016 6:11 AM, "Cem Karan"  wrote:
> >
> > Eh, it's not too bad once you figure out how to do it.  It's easier in C 
> > though; you can use pointer tricks that let you find the element in 
> > constant time, and then removal will involve figuring out how to fix up 
> > your heap after you've removed the element.
> >
> 
> If you can do it with C pointers then you can do it with python's 
> references/mutable objects. :)
> in case of immutable objects, use a light mutable wrapper or better use list 
> for performance.

I should have been clearer; it's easier to UNDERSTAND in C, but you can 
implement it in either language.  C will still be faster, but only because it's 
compiled.  It will also take a lot longer to code and ensure that it's correct, 
but that is the tradeoff.

Thanks,
Cem Karan
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Heap Implementation

2016-02-09 Thread srinivas devaki
On Feb 10, 2016 6:11 AM, "Cem Karan"  wrote:
>
> Eh, it's not too bad once you figure out how to do it.  It's easier in C
though; you can use pointer tricks that let you find the element in
constant time, and then removal will involve figuring out how to fix up
your heap after you've removed the element.
>

If you can do it with C pointers then you can do it with python's
references/mutable objects. :)
in case of immutable objects, use a light mutable wrapper or better use
list for performance.

Regards
Srinivas Devaki
Junior (3rd yr) student at Indian School of Mines,(IIT Dhanbad)
Computer Science and Engineering Department
ph: +91 9491 383 249
telegram_id: @eightnoteight
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Heap Implementation

2016-02-09 Thread Cem Karan

On Feb 9, 2016, at 9:27 AM, Mark Lawrence  wrote:

> On 09/02/2016 11:44, Cem Karan wrote:
>> 
>> On Feb 9, 2016, at 4:40 AM, Mark Lawrence  wrote:
>> 
>>> On 09/02/2016 04:25, Cem Karan wrote:
 
 No problem, that's what I thought happened.  And you're right, I'm looking 
 for a priority queue (not the only reason to use a heap, but a pretty 
 important reason!)
 
>>> 
>>> I'm assuming I've missed the explanation, so what is the problem again with 
>>> https://docs.python.org/3/library/queue.html#queue.PriorityQueue or even 
>>> https://docs.python.org/3/library/asyncio-queue.html#asyncio.PriorityQueue ?
>> 
>> Efficiently changing the priority of items already in the queue/deleting 
>> items in the queue (not the first item).  This comes up a LOT in event-based 
>> simulators where it's easier to tentatively add an event knowing that you 
>> might need to delete it or change it later.
>> 
>> Thanks,
>> Cem Karan
>> 
> 
> Thanks for that, but from the sounds of it sooner you than me :)

Eh, it's not too bad once you figure out how to do it.  It's easier in C though; 
you can use pointer tricks that let you find the element in constant time, and 
then removal will involve figuring out how to fix up your heap after you've 
removed the element.
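The Python analogue of those pointer tricks is an index map kept alongside the
heap: a dict from item to its position, updated on every swap, giving O(1)
lookup and O(log n) removal.  The sketch below is standalone (and not xheap's
actual code); it assumes hashable, unique, comparable items.

```python
class IndexedHeap:
    """Min-heap plus an item -> position map, so an arbitrary item can be
    found in O(1) and removed in O(log n)."""

    def __init__(self):
        self._heap = []
        self._pos = {}

    def push(self, item):
        self._heap.append(item)
        self._pos[item] = len(self._heap) - 1
        self._siftup(len(self._heap) - 1)

    def pop(self):
        return self.remove(self._heap[0])

    def remove(self, item):
        i = self._pos.pop(item)          # O(1) lookup via the index map
        last = self._heap.pop()
        if i < len(self._heap):
            # Move the last element into the hole and restore the invariant.
            self._heap[i] = last
            self._pos[last] = i
            self._siftdown(i)
            self._siftup(i)
        return item

    def _swap(self, i, j):
        h = self._heap
        h[i], h[j] = h[j], h[i]
        self._pos[h[i]] = i
        self._pos[h[j]] = j

    def _siftup(self, i):
        # Move toward the root while smaller than the parent.
        while i > 0:
            parent = (i - 1) // 2
            if self._heap[i] < self._heap[parent]:
                self._swap(i, parent)
                i = parent
            else:
                break

    def _siftdown(self, i):
        # Move toward the leaves while larger than the smaller child.
        n = len(self._heap)
        while True:
            child = 2 * i + 1
            if child >= n:
                break
            if child + 1 < n and self._heap[child + 1] < self._heap[child]:
                child += 1
            if self._heap[child] < self._heap[i]:
                self._swap(i, child)
                i = child
            else:
                break

h = IndexedHeap()
for x in (5, 1, 4, 2):
    h.push(x)
h.remove(4)                        # delete an arbitrary element
print(h.pop(), h.pop(), h.pop())   # -> 1 2 5
```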

Thanks,
Cem Karan
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Heap Implementation

2016-02-09 Thread Mark Lawrence

On 09/02/2016 11:44, Cem Karan wrote:


On Feb 9, 2016, at 4:40 AM, Mark Lawrence  wrote:


On 09/02/2016 04:25, Cem Karan wrote:


No problem, that's what I thought happened.  And you're right, I'm looking for 
a priority queue (not the only reason to use a heap, but a pretty important 
reason!)



I'm assuming I've missed the explanation, so what is the problem again with 
https://docs.python.org/3/library/queue.html#queue.PriorityQueue or even 
https://docs.python.org/3/library/asyncio-queue.html#asyncio.PriorityQueue ?


Efficiently changing the priority of items already in the queue/deleting 
items in the queue (not the first item).  This comes up a LOT in event-based 
simulators where it's easier to tentatively add an event knowing that you might 
need to delete it or change it later.

Thanks,
Cem Karan



Thanks for that, but from the sounds of it sooner you than me :)

--
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence

--
https://mail.python.org/mailman/listinfo/python-list


Re: Heap Implementation

2016-02-09 Thread Cem Karan

On Feb 9, 2016, at 4:40 AM, Mark Lawrence  wrote:

> On 09/02/2016 04:25, Cem Karan wrote:
>> 
>> No problem, that's what I thought happened.  And you're right, I'm looking 
>> for a priority queue (not the only reason to use a heap, but a pretty 
>> important reason!)
>> 
> 
> I'm assuming I've missed the explanation, so what is the problem again with 
> https://docs.python.org/3/library/queue.html#queue.PriorityQueue or even 
> https://docs.python.org/3/library/asyncio-queue.html#asyncio.PriorityQueue ?

Efficiently changing the priority of items already in the queue/deleting 
items in the queue (not the first item).  This comes up a LOT in event-based 
simulators where it's easier to tentatively add an event knowing that you might 
need to delete it or change it later.

Thanks,
Cem Karan
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Heap Implementation

2016-02-09 Thread Mark Lawrence

On 09/02/2016 04:25, Cem Karan wrote:


No problem, that's what I thought happened.  And you're right, I'm looking for 
a priority queue (not the only reason to use a heap, but a pretty important 
reason!)



I'm assuming I've missed the explanation, so what is the problem again 
with https://docs.python.org/3/library/queue.html#queue.PriorityQueue or 
even 
https://docs.python.org/3/library/asyncio-queue.html#asyncio.PriorityQueue ?


--
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence

--
https://mail.python.org/mailman/listinfo/python-list


Re: Heap Implementation

2016-02-08 Thread Cem Karan

On Feb 8, 2016, at 10:12 PM, srinivas devaki  wrote:

> 
> On Feb 8, 2016 5:17 PM, "Cem Karan"  wrote:
> >
> > On Feb 7, 2016, at 10:15 PM, srinivas devaki  
> > wrote:
> > > On Feb 8, 2016 7:07 AM, "Cem Karan"  wrote:
> > > > I know that there are methods of handling this from the client-side 
> > > > (tuples with unique counters come to mind), but if your library can 
> > > > handle it directly, then that could be useful to others as well.
> > >
> > > yeah it is a good idea to do at client side.
> > > but if it should be introduced as a feature into the library, instead of 
> > > tuples, we should just piggyback a single counter onto the self._indexes 
> > > dict, or better make another self._counts dict which will be light and 
> > > fast.
> > > and if you think again, with this method you can easily subclass by just 
> > > using the self._counts dict in your subclass. but still I think it is good 
> > > to introduce it as a feature in the library.
> > >
> > > Regards
> > > Srinivas Devaki
> >
> > I meant that the counter is a trick to separate different instances of 
> > (item, priority) pairs when you're pushing in the same item multiple times, 
> > but with different priorities.
> 
> oh okay, I'm way too off.
> 
> what you are asking for is a Priority Queue like feature.
> 
> but the emphasis is on providing extra features to heap data structure.
> 
> and xheap doesn't support having duplicate items.
> 
> and if you want to insert the same items with distinct priorities, you can 
> provide the priority with the key argument to xheap. what xheap doesn't 
> support is having the same keys/priorities.
> So I got confused and proposed a method to allow the same keys.
> 
> Regards
> Srinivas Devaki

No problem, that's what I thought happened.  And you're right, I'm looking for 
a priority queue (not the only reason to use a heap, but a pretty important 
reason!)

Thanks,
Cem Karan
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Heap Implementation

2016-02-08 Thread srinivas devaki
On Feb 8, 2016 5:17 PM, "Cem Karan"  wrote:
>
> On Feb 7, 2016, at 10:15 PM, srinivas devaki 
wrote:
> > On Feb 8, 2016 7:07 AM, "Cem Karan"  wrote:
> > > I know that there are methods of handling this from the client-side
(tuples with unique counters come to mind), but if your library can handle
it directly, then that could be useful to others as well.
> >
> > yeah it is a good idea to do at client side.
> > but if it should be introduced as a feature into the library, instead of
tuples, we should just piggyback a single counter onto the self._indexes
dict, or better make another self._counts dict which will be light and fast.
> > and if you think again, with this method you can easily subclass by
just using the self._counts dict in your subclass. but still I think it is
good to introduce it as a feature in the library.
> >
> > Regards
> > Srinivas Devaki
>
> I meant that the counter is a trick to separate different instances of
(item, priority) pairs when you're pushing in the same item multiple times,
but with different priorities.

oh okay, I'm way too off.

what you are asking for is a Priority Queue like feature.

but the emphasis is on providing extra features to heap data structure.

and xheap doesn't support having duplicate items.

and if you want to insert the same items with distinct priorities, you can
provide the priority with the key argument to xheap. what xheap doesn't
support is having the same keys/priorities.
So I got confused and proposed a method to allow the same keys.

Regards
Srinivas Devaki
Junior (3rd yr) student at Indian School of Mines,(IIT Dhanbad)
Computer Science and Engineering Department
ph: +91 9491 383 249
telegram_id: @eightnoteight
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Heap Implementation

2016-02-08 Thread Cem Karan
On Feb 7, 2016, at 10:15 PM, srinivas devaki  wrote:
> On Feb 8, 2016 7:07 AM, "Cem Karan"  wrote:
> > I know that there are methods of handling this from the client-side (tuples 
> > with unique counters come to mind), but if your library can handle it 
> > directly, then that could be useful to others as well.
> 
> yeah it is a good idea to do at client side.
> but if it should be introduced as a feature into the library, instead of 
> tuples, we should just piggyback a single counter onto the self._indexes 
> dict, or better make another self._counts dict which will be light and fast.
> and if you think again, with this method you can easily subclass by just 
> using the self._counts dict in your subclass. but still I think it is good to 
> introduce it as a feature in the library.
> 
> Regards
> Srinivas Devaki

Just to be 100% sure, you do mean to use the counters as UUIDs, right?  I don't 
mean that the elements in the heap get counted, I meant that the counter is a 
trick to separate different instances of (item, priority) pairs when you're 
pushing in the same item multiple times, but with different priorities.
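Concretely, the counter matters because when priorities tie, tuple comparison
falls through to the payload, and callables are not orderable under Python 3;
a unique counter decides every comparison before the payload is reached.  A
small demonstration (the function names are placeholders):

```python
import heapq
import itertools

def make_task():
    pass

def other_task():
    pass

# Without a tie-breaker, equal priorities make tuple comparison fall
# through to the payloads, and functions define no ordering (Python 3):
try:
    h = [(1.0, make_task)]
    heapq.heappush(h, (1.0, other_task))
except TypeError:
    print("functions are not orderable")

# With a unique counter between priority and payload, every comparison
# is decided before the payload is reached:
h = []
counter = itertools.count()
heapq.heappush(h, (1.0, next(counter), make_task))
heapq.heappush(h, (1.0, next(counter), other_task))
print(heapq.heappop(h)[2] is make_task)  # -> True
```

As a bonus, the counter makes ties pop in insertion order, which is usually
what an event loop wants anyway.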

Thanks,
Cem Karan
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Heap Implementation

2016-02-07 Thread srinivas devaki
On Feb 8, 2016 7:07 AM, "Cem Karan"  wrote:
>
>
>
> I know that there are methods of handling this from the client-side
(tuples with unique counters come to mind), but if your library can handle
it directly, then that could be useful to others as well.

yeah it is a good idea to do at client side.
but if it should be introduced as a feature into the library, instead of
tuples, we should just piggyback a single counter onto the self._indexes
dict, or better make another self._counts dict which will be light and fast.
and if you think again, with this method you can easily subclass by just
using the self._counts dict in your subclass. but still I think it is good to
introduce it as a feature in the library.

Regards
Srinivas Devaki
Junior (3rd yr) student at Indian School of Mines,(IIT Dhanbad)
Computer Science and Engineering Department
ph: +91 9491 383 249
telegram_id: @eightnoteight
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Heap Implementation

2016-02-07 Thread Cem Karan

On Jan 30, 2016, at 5:47 PM, Sven R. Kunze  wrote:

> Hi again,
> 
> as the topic of the old thread actually was fully discussed, I dare to open a 
> new one.
> 
> I finally managed to finish my heap implementation. You can find it at 
> https://pypi.python.org/pypi/xheap + https://github.com/srkunze/xheap.
> 
> I described my motivations and design decisions at 
> http://srkunze.blogspot.com/2016/01/fast-object-oriented-heap-implementation.html
>  
> 
> @Cem
> You've been worried about a C implementation. I can assure you that I did not 
> intend to rewrite the incredibly fast and well-tested heapq implementation. I 
> just re-used it.
> 
> I would really be grateful for your feedback as you have first-hand 
> experience with heaps.


My apologies for not writing sooner, but work has been quite busy lately (and 
likely will be for some time to come).

I read your approach, and it looks pretty good, but there may be one issue with 
it; how do you handle the same item being pushed into the heap more than once?  
In my simple simulator, I'll push the same object into my event queue multiple 
times in a row.  The priority is the moment in the future when the object will 
be called.  As a result, items don't have unique priorities.  I know that there 
are methods of handling this from the client-side (tuples with unique counters 
come to mind), but if your library can handle it directly, then that could be 
useful to others as well.

Thanks,
Cem Karan
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Heap Implementation

2016-02-02 Thread Sven R. Kunze

On 02.02.2016 01:48, srinivas devaki wrote:


On Feb 1, 2016 10:54 PM, "Sven R. Kunze"  wrote:

>
> Maybe I didn't express myself well. Would you prefer the sweeping 
approach in terms of efficiency over how I implemented xheap currently?

>

complexity-wise, your approach is the best one of all that I have seen 
till now


> Without running some benchmarks, I have absolutely no feeling which 
approach is faster/more memory efficient etc.

>

this is obviously memory efficient, but I don't know whether this 
approach would be faster than the previous approaches; with the previous 
approaches there is no callback into Python code from C code for 
comparisons.
but this should be faster than HeapDict, as HeapDict directly uses 
its own methods for heappush, heappop, etc.




Yes. So, it remains to be seen until I implement the sweeping and 
compare them to each other.



PS: if you have time, could you please review my pull request.



Indeed, I am already thinking about it. :)


Best,
Sven
--
https://mail.python.org/mailman/listinfo/python-list


Re: Heap Implementation

2016-02-02 Thread Oscar Benjamin
On 2 February 2016 at 05:38, Steven D'Aprano wrote:
>
> In effect, each measurement you take is made up of two components:
>
> * the actual time that the code would take if it had exclusive
>   access to the machine with no other programs running, call it t;
>
> * and the noise added by the system, call it Δt.
>
> It is impossible to measure t alone, you always get t+Δt. The aim in timing
> is to reduce the effect of the noise as much as possible.

That's a reasonable model in some cases but not all. In particular the
"actual time" part is itself not a constant but rather depends on all
sorts of things like RAM caching etc. On the CPython side there could
be all kinds of optimisations that would kick in at different times to
change the "actual time". Likewise there are optimisations at OS and
hardware levels that kick in at different times.

> The way to do this with timeit is:
>
> - don't run extraneous code as part of your test (just measure what
>   you care about, with as little scaffolding around it);
>
> - run that code snippet as many times as you can bear in a loop,
>   *but* let timeit manage the loop; don't try doing it by hand;

There are many situations where running something in a loop over and
over triggers all kinds of optimisations. PyPy's jit warmup is an
example but the effect can occur at the hardware level as well.
Performance of code run repeatedly in a tight loop can be different to
the same code called once or called sporadically in a long-running
process.

This is especially true of code that operates on a data structure. In
real usage the data structure might be accessed rarely but a
tight-profiling loop forces the entire structure into the CPU's
caches.

> - only care about "best of N", where N is as large a number as you
>   can stand;
>
> - averages, standard deviation, and other statistics are pretty
>   much useless here, the only statistic that you care about is
>   the minimum.

This argument makes sense if you adopt the t+Δt model described above,
but that model is just a model. The minimum from micro-benchmarking
in a tight-loop is not necessarily a more representative statistic
than mean, median or whatever else. Standard deviation is actually
useful for assessing the variability of time taken which can also be
important and gives additional information about performance: I would
like to minimise both the expected time taken and also the variability
in that simultaneously if possible.

You seem to be describing timeit's way of profiling which is
reasonable but it's not the only way.

--
Oscar
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Heap Implementation

2016-02-01 Thread Steven D'Aprano
On Tuesday 02 February 2016 06:32, Sven R. Kunze wrote:

> On 31.01.2016 02:48, Steven D'Aprano wrote:
>> On Sunday 31 January 2016 09:47, Sven R. Kunze wrote:
>>
>>> @all
>>> What's the best/standardized tool in Python to perform benchmarking?
>> timeit
> Thanks, Steven.
> 
> Maybe I am doing it wrong, but I get some weird results:

You need to understand that in any modern desktop, server or laptop computer 
(embedded devices may be different) there is *constantly* a large amount of 
processing going on in the background. Your OS is constantly responding to 
network events, managing memory, skipping from one process to another, 
scheduling tasks; your anti-virus and other background programs are working; 
your window environment is watching the mouse, etc. So each time you run a 
test, it comes down to pure dumb luck how many other programs are busy at 
the same time. (You can, sometimes, influence that by quitting all other 
applications, unplugging the network cable, and keeping your hands off the 
keyboard and mouse while the test is running. But who does that?)

All that adds noise to the measured times. It's not unusual for timings to 
differ by a factor of ten from one run to the next. The speed will also 
differ depending on processor cache misses, and many other factors that are 
effectively impossible to predict with any certainty. This makes timing 
measurements on modern systems an inherently noisy process.

In effect, each measurement you take is made up of two components:

* the actual time that the code would take if it had exclusive 
  access to the machine with no other programs running, call it t;

* and the noise added by the system, call it Δt.

It is impossible to measure t alone, you always get t+Δt. The aim in timing 
is to reduce the effect of the noise as much as possible.

The way to do this with timeit is:

- don't run extraneous code as part of your test (just measure what 
  you care about, with as little scaffolding around it);

- run that code snippet as many times as you can bear in a loop, 
  *but* let timeit manage the loop; don't try doing it by hand;

- only care about "best of N", where N is as large a number as you 
  can stand;

- averages, standard deviation, and other statistics are pretty 
  much useless here, the only statistic that you care about is
  the minimum.


Out of those four guidelines, you missed three of them:

(1) Your test code includes scaffolding, the "for _ in range..." loop. 
You're timing how expensive it is to build a range object and loop over it.

(2) You picked a tiny number for the number of loops: ten. Timeit defaults 
to one million, which is good for *really* fast and tiny snippets.

I normally reduce that to 10000, but run a larger number of trials. If the 
timing from each trial is too small, I increase the number of loops, and if 
it takes too long (I am impatient) I decrease it, but never below 100.

(3) You then use the repeat() method to calculate the "best of one", which 
is pretty much pointless. There is a timeit() method for "best of one", if 
that's ever useful.

I normally run 5 or 7 trials, and report only the best (smallest).


Here's your code:

>  >>> min(timeit.Timer('for _ in range(10000): heappop(h)', 'from heapq
> import heappop; h=list(range(10000000))').repeat(10, 1)),
> min(timeit.Timer('for _ in range(10000): h.pop()', 'from xheap import
> Heap; h=Heap(range(10000000))').repeat(10, 1))
> (0.01726761805314, 0.01615345600021101)


Your code would be easier to read and understand if it were split over 
multiple lines. This is not Perl and there's no advantage to one-liners.

Pulling your code apart and re-writing it in a way which I feel is more 
understandable:

t1 = Timer(
'for _ in range(10000): heappop(h)',  # code being tested
'from heapq import heappop; h=list(range(10000000))'  # setup
)

t2 = Timer(
'for _ in range(10000): h.pop()',
'from xheap import Heap; h=Heap(range(10000000))'
)

min(t1.repeat(10, 1))
min(t2.repeat(10, 1))


Something like this will probably be less noisy:

setup = """
from heapq import heappop
from xheap import Heap
a = list(range(10000000))
h = Heap(a)
"""

t1 = Timer("heappop(a)", setup)
t2 = Timer("h.pop()", setup)

# print best of 7 trials, where each trial runs the code snippet 10000 times
print(min(t1.repeat(7, 10000)))
print(min(t2.repeat(7, 10000)))


The times printed will be in seconds per 10000 runs; divide it by 10 to get 
the time in *milliseconds* per run.

Note that this will *not* eliminate all noise or variation from one timing 
test to the other, but it will (I hope!) reduce it. If you're still seeing a 
lot of variation, try turning the garbage collector off and quitting as many 
other applications as possible.

If anyone else can suggest any other techniques for reducing noise, I'd love 
to hear about them.

Also note that times generated by timeit with one version of Python may not 
be measuring the same thing as those using a different version.

Re: Heap Implementation

2016-02-01 Thread srinivas devaki
On Feb 1, 2016 10:54 PM, "Sven R. Kunze"  wrote:
>
> Maybe I didn't express myself well. Would you prefer the sweeping
approach in terms of efficiency over how I implemented xheap currently?
>

complexity-wise, your approach is the best one of all that I have seen till
now

> Without running some benchmarks, I have absolutely no feeling which
approach is faster/more memory efficient etc.
>

this is obviously memory efficient, but I don't know whether this approach
would be faster than the previous approaches; with the previous approaches
there is no callback into Python code from C code for comparisons.
but this should be faster than HeapDict, as HeapDict directly uses its
own methods for heappush, heappop, etc.

PS: if you have time, could you please review my pull request.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Heap Implementation

2016-02-01 Thread Sven R. Kunze

On 31.01.2016 02:48, Steven D'Aprano wrote:

On Sunday 31 January 2016 09:47, Sven R. Kunze wrote:


@all
What's the best/standardized tool in Python to perform benchmarking?

timeit

Thanks, Steven.

Maybe I am doing it wrong, but I get some weird results:

>>> min(timeit.Timer('for _ in range(10000): heappop(h)', 'from heapq 
import heappop; h=list(range(10000000))').repeat(10, 1)), 
min(timeit.Timer('for _ in range(10000): h.pop()', 'from xheap import 
Heap; h=Heap(range(10000000))').repeat(10, 1))

(0.01726761805314, 0.01615345600021101)

>>> min(timeit.Timer('for _ in range(100000): heappop(h)', 'from heapq 
import heappop; h=list(range(10000000))').repeat(10, 1)), 
min(timeit.Timer('for _ in range(100000): h.pop()', 'from xheap import 
Heap; h=Heap(range(10000000))').repeat(10, 1))

(0.12321608699949138, 0.1304205129002)

>>> min(timeit.Timer('for _ in range(10000): heappop(h)', 'from heapq 
import heappop; h=list(range(1000000))').repeat(10, 1)), 
min(timeit.Timer('for _ in range(10000): h.pop()', 'from xheap import 
Heap; h=Heap(range(1000000))').repeat(10, 1))

(0.010081621999233903, 0.008791901999757101)

>>> min(timeit.Timer('for _ in range(1000000): heappop(h)', 'from heapq 
import heappop; h=list(range(10000000))').repeat(10, 1)), 
min(timeit.Timer('for _ in range(1000000): h.pop()', 'from xheap import 
Heap; h=Heap(range(10000000))').repeat(10, 1))

(0.6218949679996513, 0.7172151949998806)


How can it be that my wrapper is sometimes faster and sometimes slower 
than heapq? I wouldn't mind slower, but faster*?



Best,
Sven


* that behavior is reproducible also for other combos and other machines.
--
https://mail.python.org/mailman/listinfo/python-list


Re: Heap Implementation

2016-02-01 Thread Sven R. Kunze

On 31.01.2016 05:59, srinivas devaki wrote:

@Sven
actually you are not sweeping at all; as I remember from my last post,
what I meant by sweeping is periodically deleting the elements which
were marked as popped.


Exactly.

Maybe I didn't express myself well. Would you prefer the sweeping 
approach in terms of efficiency over how I implemented xheap currently?


Without running some benchmarks, I have absolutely no feeling which 
approach is faster/more memory efficient etc.



Kudos on that __setitem__ technique; instead of using references to
the items like in HeapDict, it is brilliant of you to simply use
__setitem__.


Thanks. :)


On Sun, Jan 31, 2016 at 4:17 AM, Sven R. Kunze  wrote:

Hi again,

as the topic of the old thread actually was fully discussed, I dare to open
a new one.

I finally managed to finish my heap implementation. You can find it at
https://pypi.python.org/pypi/xheap + https://github.com/srkunze/xheap.

I described my motivations and design decisions at
http://srkunze.blogspot.com/2016/01/fast-object-oriented-heap-implementation.html

@Cem
You've been worried about a C implementation. I can assure you that I did
not intend to rewrite the incredibly fast and well-tested heapq
implementation. I just re-used it.

I would really be grateful for your feedback as you have first-hand
experience with heaps.

@srinivas
You might want to have a look at the removal implementation. Do you think it
would be wiser/faster to switch for the sweeping approach?

I plan to publish some benchmarks to compare heapq and xheap.

@all
What's the best/standardized tool in Python to perform benchmarking? Right
now, I use a self-made combo of unittest.TestCase and time.time + proper
formatting.

Best,
Sven


PS: fixed some weird typos and added a missing part.


--
https://mail.python.org/mailman/listinfo/python-list


Re: Heap Implementation

2016-01-30 Thread srinivas devaki
@Sven
actually you are not sweeping at all; as I remember from my last post,
what I meant by sweeping is periodically deleting the elements which
were marked as popped.

Kudos on that __setitem__ technique; instead of using references to
the items like in HeapDict, it is brilliant of you to simply use
__setitem__.
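For reference, the sweeping idea described in words earlier in the thread
might look like this (SweepingHeap is a made-up name, not xheap's actual
implementation):

```python
import heapq

class SweepingHeap:
    """Min-heap with lazy removal: removed items are only marked dead, and
    the heap is rebuilt ("swept") once dead entries outnumber live ones.

    Sketch only; assumes hashable, unique items.
    """
    def __init__(self, items=()):
        self._heap = list(items)
        heapq.heapify(self._heap)
        self._dead = set()

    def push(self, item):
        heapq.heappush(self._heap, item)

    def remove(self, item):
        self._dead.add(item)
        if len(self._dead) > len(self._heap) - len(self._dead):
            self._sweep()

    def pop(self):
        while True:
            item = heapq.heappop(self._heap)
            if item in self._dead:
                self._dead.discard(item)   # dead entry: skip it
            else:
                return item

    def _sweep(self):
        # Compact: drop dead entries and restore the heap invariant.
        self._heap = [x for x in self._heap if x not in self._dead]
        self._dead.clear()
        heapq.heapify(self._heap)

h = SweepingHeap([3, 1, 2])
h.remove(1)
print(h.pop())  # -> 2
```

The sweep keeps the dead fraction bounded at one half, trading occasional
O(n) rebuilds for O(1) removals in between.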

On Sun, Jan 31, 2016 at 4:17 AM, Sven R. Kunze  wrote:
> Hi again,
>
> as the topic of the old thread actually was fully discussed, I dare to open
> a new one.
>
> I finally managed to finish my heap implementation. You can find it at
> https://pypi.python.org/pypi/xheap + https://github.com/srkunze/xheap.
>
> I described my motivations and design decisions at
> http://srkunze.blogspot.com/2016/01/fast-object-oriented-heap-implementation.html
>
> @Cem
> You've been worried about a C implementation. I can assure you that I did
> not intend to rewrite the incredibly fast and well-tested heapq
> implementation. I just re-used it.
>
> I would really be grateful for your feedback as you have first-hand
> experience with heaps.
>
> @srinivas
> You might want to have a look at the removal implementation. Do you think it
> would be wiser/faster to switch for the sweeping approach?
>
> I plan to publish some benchmarks to compare heapq and xheap.
>
> @all
> What's the best/standardized tool in Python to perform benchmarking? Right
> now, I use a self-made combo of unittest.TestCase and time.time + proper
> formatting.
>
> Best,
> Sven
>
>
> PS: fixed some weird typos and added a missing part.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Heap Implementation

2016-01-30 Thread Steven D'Aprano
On Sunday 31 January 2016 09:47, Sven R. Kunze wrote:

> @all
> What's the best/standardized tool in Python to perform benchmarking?

timeit



-- 
Steve

-- 
https://mail.python.org/mailman/listinfo/python-list


Heap Implementation

2016-01-30 Thread Sven R. Kunze

Hi again,

as the topic of the old thread actually was fully discussed, I dare to 
open a new one.


I finally managed to finish my heap implementation. You can find it at 
https://pypi.python.org/pypi/xheap + https://github.com/srkunze/xheap.


I described my motivations and design decisions at 
http://srkunze.blogspot.com/2016/01/fast-object-oriented-heap-implementation.html 



@Cem
You've been worried about a C implementation. I can assure you that I 
did not intend to rewrite the incredibly fast and well-tested heapq 
implementation. I just re-used it.


I would really be grateful for your feedback as you have first-hand 
experience with heaps.


@srinivas
You might want to have a look at the removal implementation. Do you 
think it would be wiser/faster to switch for the sweeping approach?


I plan to publish some benchmarks to compare heapq and xheap.

@all
What's the best/standardized tool in Python to perform benchmarking? 
Right now, I use a self-made combo of unittest.TestCase and time.time + 
proper formatting.


Best,
Sven


PS: fixed some weird typos and added a missing part.
--
https://mail.python.org/mailman/listinfo/python-list