Efficiency of using long integers to hold bitmaps

2005-07-10 Thread Jeff Melvaine
I note that I can write expressions like "1 << 100" and the result is stored 
as a long integer, which means it is stored as an integer of arbitrary 
length.  I may need to use a large number of these, and am interested to 
know whether the storage efficiency of long integers is in danger of 
breaking my code if I use too many.  Would I do better to write a class that 
defines bitwise operations on arrays of integers, each integer being assumed 
to contain at most 32 bits?  I cannot find information in the Python manuals 
for 2.4.1 that would allow me to resolve this question; perhaps the 
intention is that programmers should not rely on implementation details.
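
For concreteness, the kind of usage I have in mind is sketched below; the
helper names are just mine for illustration, not anything standard.

def set_bit(bitmap, n):
    "Return bitmap with bit n turned on."
    return bitmap | (1 << n)

def clear_bit(bitmap, n):
    "Return bitmap with bit n turned off."
    return bitmap & ~(1 << n)

def test_bit(bitmap, n):
    "True if bit n is on in bitmap."
    return (bitmap >> n) & 1 == 1

bm = 0
bm = set_bit(bm, 100)       # same storage as writing 1 << 100 directly
print(test_bit(bm, 100))    # True
print(test_bit(bm, 31))     # False
print(clear_bit(bm, 100))   # 0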

Thanks in advance,

Jeff 



Re: Efficiency of using long integers to hold bitmaps

2005-07-12 Thread Jeff Melvaine
Raymond,

Thanks for your answers, which even covered the question that I didn't ask 
but should have.

 "A Python list is not an array()\n" * 100  :)

Jeff

"Raymond Hettinger" <[EMAIL PROTECTED]> wrote in message 
news:[EMAIL PROTECTED]
> [Jeff Melvaine]
>> I note that I can write expressions like "1 << 100" and the result is 
>> stored
>> as a long integer, which means it is stored as an integer of arbitrary
>> length.  I may need to use a large number of these, and am interested to
>> know whether the storage efficiency of long integers is in danger of
>> breaking my code if I use too many.  Would I do better to write a class 
>> that
>> defines bitwise operations on arrays of integers, each integer being 
>> assumed
>> to contain at most 32 bits?
>
>
> Both array() objects and long integers are equally space efficient.
> In contrast, a list of integers takes up a lot of space (the list is
> stored as an array of pointers to individual integer objects).
>
> Raymond
> 
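
Just to make sure I had the picture right, I sketched the comparison roughly
as below.  sys.getsizeof is from later Python versions than 2.4.1, so treat
this as an illustration of the idea rather than something I have run here.

import sys
from array import array

N = 1024  # number of 32-bit words (and of list elements) to compare

as_long = (1 << (N * 32)) - 1                   # one long holding N*32 bits
as_array = array('I', [0xFFFFFFFF] * N)         # N unsigned 32-bit slots
as_list = [0xFFFF0000 + i for i in range(N)]    # N distinct integer objects

print(sys.getsizeof(as_long))     # about N*4 bytes plus a small header
print(sys.getsizeof(as_array))    # about N*4 bytes plus a small header
print(sys.getsizeof(as_list)                    # the pointer array ...
      + sum(sys.getsizeof(x) for x in as_list)) # ... plus each int object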



Re: Efficiency of using long integers to hold bitmaps

2005-07-12 Thread Jeff Melvaine
Bengt,

Thanks for your informative reply, further comments interleaved.

"Bengt Richter" <[EMAIL PROTECTED]> wrote in message 
news:[EMAIL PROTECTED]
> On Mon, 11 Jul 2005 02:37:21 +1000, "Jeff Melvaine" 
> <[EMAIL PROTECTED]> wrote:
>
>>I note that I can write expressions like "1 << 100" and the result is 
>>stored
>>as a long integer, which means it is stored as an integer of arbitrary
>>length.  I may need to use a large number of these, and am interested to
>>know whether the storage efficiency of long integers is in danger of
>>breaking my code if I use too many.  Would I do better to write a class 
>>that
>>defines bitwise operations on arrays of integers, each integer being 
>>assumed
>>to contain at most 32 bits?  I cannot find information in the Python 
>>manuals
>>for 2.4.1 that would allow me to resolve this question; perhaps the
>>intention is that programmers should not rely on implementation details.
>>
>>Thanks in advance,
>>
> Sounds like a possible^H^H^H^H^H^H^H^Hprobable premature optimization 
> worry ;-)
>
I'm writing a Sudoku solver of generic order.  The object is not to make it 
half a millisecond faster than the guy next door's solver, but I'd like it 
to be able to do a bit more than just the daily newspaper puzzle, e.g. 
search for uniquely solvable puzzles with minimal numbers of clues. 
NP-completeness will put a lid on things sooner or later, but I'd like to 
get as far as possible before that happens.

Why in Python?  All my recent professional experience is in writing Ada, 
which is not my idea of a rapid prototyping language (or a rapid deliverable 
item handover language either, for that matter :).

Why do I want it to be efficient during debugging, rather than after fine 
tuning?  I take your point, but in a sense the ability to handle large 
problems is part of the proof of concept.

> What is a "large number of these" going to amount to? How many, tops?
> And how many bits in each? How many operations between them? (Since 
> integers
> are immutable, operations mean allocation of space for new ones for 
> results
> and disposing of unused garbage ones (probably back to a special fast pool 
> for
> integers and longs)). Are you interested in a speed/memory tradeoff?
>
The algorithms that interest me most do not involve cyclic data structures, 
so I am trusting in built-in reference counts to avoid memory leaks.  At the 
moment I'm expecting to use bitmaps of constant size (81 for order 3, or 256 
for order 4) for the most numerous data items, so fragmentation should not 
be excessive.
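
To make the constant-size part concrete, the masks I have in mind for order 3
look something like this (the cell numbering and helper names are just for
illustration):

ORDER = 3
SIZE = ORDER * ORDER                 # 9 cells per row, column or box
N_CELLS = SIZE * SIZE                # 81 bits per mask

def cell_bit(row, col):
    return 1 << (row * SIZE + col)

def row_mask(row):
    "All cells in the given row."
    return sum(cell_bit(row, c) for c in range(SIZE))

def col_mask(col):
    "All cells in the given column."
    return sum(cell_bit(r, col) for r in range(SIZE))

def box_mask(box_row, box_col):
    "All cells in the given 3x3 box."
    mask = 0
    for r in range(box_row * ORDER, (box_row + 1) * ORDER):
        for c in range(box_col * ORDER, (box_col + 1) * ORDER):
            mask |= cell_bit(r, c)
    return mask

# The three constraint masks covering cell (0, 0) intersect in that cell:
common = row_mask(0) & col_mask(0) & box_mask(0, 0)
print(common == cell_bit(0, 0))      # True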

Space was my first thought, but I also expect that the parallelism of 
bitwise operations will be reasonably time-efficient.  I would hope to be 
able to do operations more quickly on a bitmap of n bits than on a list or 
array of n integer variables with values constrained to 0 or 1.  However, the 
prospect of writing "a & b" and getting multiword functionality that could 
prove the concepts was rather appealing too; almost executable pseudocode.
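
By way of illustration, the comparison I mean is roughly the following; the
numbers will obviously vary by machine, and the point is only that the packed
form does the whole AND in one operation instead of n small ones.

from timeit import Timer

setup = """
N = 256                                   # bits per vector, e.g. order 4
a = (1 << N) - 1                          # all N bits on, in one integer
b = sum(1 << i for i in range(0, N, 2))   # every second bit on
la = [1] * N                              # the same vectors as 0/1 lists
lb = [1, 0] * (N // 2)
"""

print(Timer("a & b", setup).timeit(100000))
print(Timer("[x & y for x, y in zip(la, lb)]", setup).timeit(100000))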

The effectiveness of the algorithms will determine how much time and space I 
use, but for NP-complete problems the ceiling is always too low, and one is 
constantly learning new ways to duck.

> If your bit vectors are extremely large and sparse (have only a few bits 
> "on"),
> you might consider sets (import sets and help(sets)) of the bit numbers as 
> representations.
>
> BTW, I wonder if anyone has written an ordered bit set class in C yet. I 
> was tempted ;-)
>
For symbolic Boolean algebra on systems of 729 or 4096 variables, sparse is 
the way to go, but I would want ordered sets too.  I've already implemented 
a Boolean algebra system using Python lists (oops, thank you again Raymond) 
of 32-bit bitmaps, and the calculations it does are not dazzlingly fast.  My 
question about using long integers had a different approach in mind.
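
For the sparse case, the set-of-bit-numbers idea would go something like the
following (using the built-in set type that arrived in 2.4 rather than the
sets module; the conversion helpers are just a sketch of my own):

def to_indices(bitmap):
    "The set of positions of the bits that are on in bitmap."
    indices = set()
    pos = 0
    while bitmap:
        if bitmap & 1:
            indices.add(pos)
        bitmap >>= 1
        pos += 1
    return indices

def to_bitmap(indices):
    "The packed integer with exactly the given bits on."
    bitmap = 0
    for i in indices:
        bitmap |= 1 << i
    return bitmap

a = set([3, 700, 4095])          # only three of 4096 variables are on
b = set([3, 4095])

print(a & b)                     # intersection plays the role of bitwise AND
print(to_indices(to_bitmap(a)) == a)   # True: the two forms round-trip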

> How much memory do you have? Buying more can be a pretty cheap way of 
> solving space worries
> if you are getting paid for your time.
>
512 MB.  The only time I've hit the limit so far was when I got distracted 
enough to leave out the escape condition in a small recursive function. 
Death was surprisingly rapid.

> You should be able to subclass int or long as a way of writing your 
> program in terms of
> your own bit vector class. Then you can change your mind and change the 
> underlying representation
> without changing the code that uses the api. Compared to plain longs it 
> will cost you some speed
> to keep converting results to your own type though.
>
I like to encapsulate, but I hadn't thought of that one.  Yes, it's an 
approach to getting the best of both worlds; i
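
To pin down what I picture for that wrapper, a rough sketch follows; the base
class is written as int here, though under 2.4.1 it would be long, and the
re-wrapping of every result is exactly where the conversion cost you mention
comes in.

class BitVector(int):
    "Arbitrary-length bit vector; results of &, |, ^ stay in this class."

    def __and__(self, other):
        return BitVector(int(self) & int(other))

    def __or__(self, other):
        return BitVector(int(self) | int(other))

    def __xor__(self, other):
        return BitVector(int(self) ^ int(other))

    def test(self, n):
        "True if bit n is on."
        return (self >> n) & 1 == 1

a = BitVector(1 << 100)
b = BitVector((1 << 100) | 1)
c = a & b                      # the re-wrapping here is the conversion cost
print(type(c).__name__)        # BitVector, not a plain integer
print(c.test(100))             # True
print(c.test(0))               # False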