Re: Efficiency of using long integers to hold bitmaps

2005-07-12 Thread Jeff Melvaine
Raymond,

Thanks for your answers, which even covered the question that I didn't ask 
but should have.

code A Python list is not an array()\n * 100 /code :)

Jeff

Raymond Hettinger [EMAIL PROTECTED] wrote in message 
news:[EMAIL PROTECTED]
 [Jeff Melvaine]
 I note that I can write expressions like 1  100 and the result is 
 stored
 as a long integer, which means it is stored as an integer of arbitrary
 length.  I may need to use a large number of these, and am interested to
 know whether the storage efficiency of long integers is in danger of
 breaking my code if I use too many.  Would I do better to write a class 
 that
 defines bitwise operations on arrays of integers, each integer being 
 assumed
 to contain at most 32 bits?


 Both array() objects and long integers are equally space efficient.
 In contrast, a list of integers takes up a lot of space (the list is
 stored as an array of pointers to individual integer objects).

 Raymond
 


-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Efficiency of using long integers to hold bitmaps

2005-07-12 Thread Jeff Melvaine
Bengt,

Thanks for your informative reply, further comments interleaved.

Bengt Richter [EMAIL PROTECTED] wrote in message 
news:[EMAIL PROTECTED]
 On Mon, 11 Jul 2005 02:37:21 +1000, Jeff Melvaine 
 [EMAIL PROTECTED] wrote:

I note that I can write expressions like 1  100 and the result is 
stored
as a long integer, which means it is stored as an integer of arbitrary
length.  I may need to use a large number of these, and am interested to
know whether the storage efficiency of long integers is in danger of
breaking my code if I use too many.  Would I do better to write a class 
that
defines bitwise operations on arrays of integers, each integer being 
assumed
to contain at most 32 bits?  I cannot find information in the Python 
manuals
for 2.4.1 that would allow me to resolve this question; perhaps the
intention is that programmers should not rely on implementation details.

Thanks in advance,

 Sounds like a possible^H^H^H^H^H^H^H^Hprobable premature optimization 
 worry ;-)

I'm writing a Sudoku solver of generic order.  The object is not to make it 
half a millisecond faster than the guy next door's solver, but I'd like it 
to be able to do a bit more than just the daily newspaper puzzle, e.g. 
search for uniquely solvable puzzles with minimal numbers of clues. 
NP-completeness will put a lid on things sooner or later, but I'd like to 
get as far as possible before that happens.

Why in Python?  All my recent professional experience is in writing Ada, 
which is not my idea of a rapid prototyping language (or a rapid deliverable 
item handover language either, for that matter :).

Why do I want it to be efficient during debugging, rather than after fine 
tuning?  I take your point, but in a sense the ability to handle large 
problems is part of the proof of concept.

 What is a large number of these going to amount to? How many, tops?
 And how many bits in each? How many operations between them? (Since 
 integers
 are immutable, operations mean allocation of space for new ones for 
 results
 and disposing of unused garbage ones (probably back to a special fast pool 
 for
 integers and longs)). Are you interested in a speed/memory tradeoff?

The algorithms that interest me most do not involve cyclic data structures, 
so I am trusting in built-in reference counts to avoid memory leaks.  At the 
moment I'm expecting to use bitmaps of constant size (81 for order 3, or 256 
for order 4) for the most numerous data items, so fragmentation should not 
be excessive.

Space was my first thought, but I also expect that the parallelism of 
bitwise operations will be reasonably time-efficient.  I would hope to be 
able to do operations more quickly on a bitmap of n bits than on a list or 
array of n integer variables with values constrained to 0 or 1.  However the 
prospect of writing a  b and getting multiword functionality that could 
prove the concepts was rather appealing too; almost executable pseudocode.

The effectiveness of the algorithms will determine how much time and space I 
use, but for NP-complete problems the ceiling is always too low, and one is 
constantly learning new ways to duck.

 If your bit vectors are extremely large and sparse (have only a few bits 
 on),
 you might consider sets (import sets and help(sets)) of the bit numbers as 
 representations.

 BTW, I wonder if anyone has written an ordered bit set class in C yet. I 
 was tempted ;-)

For symbolic Boolean algebra on systems of 729 or 4096 variables, sparse is 
the way to go, but I would want ordered sets too.  I've already implemented 
a Boolean algebra system using Python lists (oops, thank you again Raymond) 
of 32-bit bitmaps, and the calculations it does are not dazzlingly fast.  My 
question about using long integers had a different approach in mind.

 How much memory do you have? Buying more can be a pretty cheap way of 
 solving space worries
 if you are getting paid for your time.

512Mb.  The only time I've hit the limit so far was when I got distracted 
enough to leave out the escape condition in a small recursive function. 
Death was surprisingly rapid.

 You should be able to subclass int or long as a way of writing your 
 program in terms of
 your own bit vector class. Then you can change your mind and change the 
 underlying representation
 without changing the code that uses the api. Compared to plain longs it 
 will cost you some speed
 to keep converting results to your own type though.

I like to encapsulate, but I hadn't thought of that one.  Yes, it's an 
approach to getting the best of both worlds; instant performance or 
flexibility.  The thought of executable pseudocode that I translate into 
something better if necessary is not too bad for now.

 Bottom line, whether your code will break due to storage problems or be 
 fast enough
 will depend on numbers you didn't provide ;-)

Re netiquette (with thanks to other posters who made this thread so 
stimulating), I don't demand that respondents to my posted 

Re: Efficiency of using long integers to hold bitmaps

2005-07-12 Thread Bengt Richter
On Wed, 13 Jul 2005 03:24:48 +1000, Jeff Melvaine [EMAIL PROTECTED] wrote:

Bengt,

Thanks for your informative reply, further comments interleaved.

Can't reply fully now, but just had the thought that maybe some ideas
from 8-queens solvers might be useful or interesting. There is an old thread at


http://groups-beta.google.com/group/comp.lang.python/browse_frm/thread/f88f301b7578705a

that explores various ways of solving it, and uses various representations of 
the board,
including integer bit maps at the end, which turned out fastest IIRC. I'm sure 
it can still
be improved upon, and I'm not sure it will be worth your while to dig into it, 
unless you
think the problem fun, but there it is.

Regards,
Bengt Richter
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Efficiency of using long integers to hold bitmaps

2005-07-12 Thread Jack Diederich
On Wed, Jul 13, 2005 at 03:24:48AM +1000, Jeff Melvaine wrote:
 Bengt,
 
 Thanks for your informative reply, further comments interleaved.
 
 Bengt Richter [EMAIL PROTECTED] wrote in message 
 news:[EMAIL PROTECTED]
  On Mon, 11 Jul 2005 02:37:21 +1000, Jeff Melvaine 
  [EMAIL PROTECTED] wrote:
 
 I note that I can write expressions like 1  100 and the result is 
 stored
 as a long integer, which means it is stored as an integer of arbitrary
 length.  I may need to use a large number of these, and am interested to
 know whether the storage efficiency of long integers is in danger of
 breaking my code if I use too many.  Would I do better to write a class 
 that
 defines bitwise operations on arrays of integers, each integer being 
 assumed
 to contain at most 32 bits?  I cannot find information in the Python 
 manuals
 for 2.4.1 that would allow me to resolve this question; perhaps the
 intention is that programmers should not rely on implementation details.
 
 Thanks in advance,
 
  Sounds like a possible^H^H^H^H^H^H^H^Hprobable premature optimization 
  worry ;-)
 
 I'm writing a Sudoku solver of generic order.  The object is not to make it 
 half a millisecond faster than the guy next door's solver, but I'd like it 
 to be able to do a bit more than just the daily newspaper puzzle, e.g. 
 search for uniquely solvable puzzles with minimal numbers of clues. 
 NP-completeness will put a lid on things sooner or later, but I'd like to 
 get as far as possible before that happens.
 

I would recommend making a hand-written C extension that does the heavy
lifting - but only in the tiny corner you need it to.  I've done this for
the ICFP[1] competition the last few years.  It is a time-limited competition
so the priorities are getting a pure-python program up and running to 
understand the problem and then making the slow parts go really fast so
you can try as many full games as possible to try out as many new strategies
as possible.

Typically this means just the game board representation is done in C.
You'll want the heuristics to be done in python in order to try out variations
easily.  For Sudoku the board implementation will likely have C functions
for copy(), valid() (raise ValueError), and something to return a list
of obviously legal values for a coordinate.  Passing coord tuples in and
list  dictionaries out has worked well for me (easy to use in the python
part of the program).

I keep C modules out of production code unless absolutely necessary, but
I have no qualms about using it in toy/hobby problems, especially because
the C code stays at a manageable few hundred lines for toy problems.
If you aren't much of a C guy check out pyrex[2].  In my darker days I did
C++ for a living so I much prefer writing modules by hand; python makes it
easy to do and it is faster, less buggy, and easier to debug.

-jackdied

[1] http://performancedrivers.com/icfp2002/
http://performancedrivers.com/icfp2004/
(other years I botched it badly enough I didn't make a webpage)
http://performancedrivers.com/icfp2002/icfp_module.c
http://performancedrivers.com/icfp2002/icfpBoard.c

[2] http://nz.cosc.canterbury.ac.nz/~greg/python/Pyrex/
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Cat and Mouse (wes Re: Efficiency of using long integers to hold bitmaps)

2005-07-11 Thread Steven D'Aprano
On Mon, 11 Jul 2005 02:58:29 +, Bengt Richter wrote:

 I think you are right about some game happening, but I'm not sure it's cat 
 and mouse.
 To me, an incomplete question feels like an invitation to play 20 questions 
 regarding
 what the real problem is. So I get a little annoyed, and often just bypass 
 the post.
 If I answer, the residual annoyance sometimes leads me to withhold my best 
 guess, and
 complain instead. Hence probably the cat and mouse impression. Not very big 
 of me, but
 OTOH a think a bit of a nudge towards better questions is not a bad thing. 
 OTO3H, maybe
 I should just silently pass up 20-questions invitations and not pollute this 
 pleasant space
 with perfumes of annoyance (since however diffuse, they are apparently 
 pungent enough
 for some to notice ;-)

If it helps, I find the entertainment value of the gentle nudging is the
only thing that makes the smell of stupid questions bearable.

It isn't true that there is no such thing as a stupid question. There are
intelligent questions that are asked in a rude and stupid way.

Expecting people to read your mind and understand what your question is
about is rude. Expecting people to decipher poorly written, badly spelt,
incomprehensible ramblings is rude. (Allowance should be made for those
whose native language is not English, and those with good excuses for bad
spelling, eg broken keyboard, actual dyslexia.) 

Some allowance for the occasional brain-fart or typo should be made, but
communication requires two parties. You wouldn't expect a mail server to
accept any random malformed packets and try to make sense of it, and you
shouldn't expect others to put in more work understanding your post than
you put in writing it.

As they say, be strict when sending and tolerant when receiving. I'm all
for that. But when people insist on sending broken packets, I see nothing
rude about bouncing those packets back with a error message or a creative
misunderstanding.



-- 
Steven

-- 
http://mail.python.org/mailman/listinfo/python-list


Efficiency of using long integers to hold bitmaps

2005-07-10 Thread Jeff Melvaine
I note that I can write expressions like 1  100 and the result is stored 
as a long integer, which means it is stored as an integer of arbitrary 
length.  I may need to use a large number of these, and am interested to 
know whether the storage efficiency of long integers is in danger of 
breaking my code if I use too many.  Would I do better to write a class that 
defines bitwise operations on arrays of integers, each integer being assumed 
to contain at most 32 bits?  I cannot find information in the Python manuals 
for 2.4.1 that would allow me to resolve this question; perhaps the 
intention is that programmers should not rely on implementation details.

Thanks in advance,

Jeff 


-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Efficiency of using long integers to hold bitmaps

2005-07-10 Thread Raymond Hettinger
[Jeff Melvaine]
 I note that I can write expressions like 1  100 and the result is stored
 as a long integer, which means it is stored as an integer of arbitrary
 length.  I may need to use a large number of these, and am interested to
 know whether the storage efficiency of long integers is in danger of
 breaking my code if I use too many.  Would I do better to write a class that
 defines bitwise operations on arrays of integers, each integer being assumed
 to contain at most 32 bits?


Both array() objects and long integers are equally space efficient.
In contrast, a list of integers takes up a lot of space (the list is
stored as an array of pointers to individual integer objects).

Raymond

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Efficiency of using long integers to hold bitmaps

2005-07-10 Thread Bengt Richter
On Mon, 11 Jul 2005 02:37:21 +1000, Jeff Melvaine [EMAIL PROTECTED] wrote:

I note that I can write expressions like 1  100 and the result is stored 
as a long integer, which means it is stored as an integer of arbitrary 
length.  I may need to use a large number of these, and am interested to 
know whether the storage efficiency of long integers is in danger of 
breaking my code if I use too many.  Would I do better to write a class that 
defines bitwise operations on arrays of integers, each integer being assumed 
to contain at most 32 bits?  I cannot find information in the Python manuals 
for 2.4.1 that would allow me to resolve this question; perhaps the 
intention is that programmers should not rely on implementation details.

Thanks in advance,

Sounds like a possible^H^H^H^H^H^H^H^Hprobable premature optimization worry ;-)

What is a large number of these going to amount to? How many, tops?
And how many bits in each? How many operations between them? (Since integers
are immutable, operations mean allocation of space for new ones for results
and disposing of unused garbage ones (probably back to a special fast pool for
integers and longs)). Are you interested in a speed/memory tradeoff?

If your bit vectors are extremely large and sparse (have only a few bits on),
you might consider sets (import sets and help(sets)) of the bit numbers as 
representations.

BTW, I wonder if anyone has written an ordered bit set class in C yet. I was 
tempted ;-)

How much memory do you have? Buying more can be a pretty cheap way of solving 
space worries
if you are getting paid for your time.

You should be able to subclass int or long as a way of writing your program in 
terms of
your own bit vector class. Then you can change your mind and change the 
underlying representation
without changing the code that uses the api. Compared to plain longs it will 
cost you some speed
to keep converting results to your own type though.

Bottom line, whether your code will break due to storage problems or be fast 
enough
will depend on numbers you didn't provide ;-)

Regards,
Bengt Richter
-- 
http://mail.python.org/mailman/listinfo/python-list


Cat and Mouse (wes Re: Efficiency of using long integers to hold bitmaps)

2005-07-10 Thread Terry Reedy

Martin v. Löwis [EMAIL PROTECTED] wrote in message 
news:[EMAIL PROTECTED]
 Right. OTOH, I notice a frequent game of Katze und Maus (cat and mouse?)

Yes, apparently with the same idiomatic meaning, as you decribe the game 
here perfectly.

TJR



-- 
http://mail.python.org/mailman/listinfo/python-list

Re: Efficiency of using long integers to hold bitmaps

2005-07-10 Thread jepler
You'll find that using Python Longs unsuitable if you *change* the
bitmaps---All numeric types are immutable, so you'll copy the bitmap
each time you perform an operation like set bit.

numarray has a 'bit' typecode, though I'm not sure how such an array is
actually stored---from a quick look, it appears it's stored one bit per
byte:
 numarray.ones((3,3), 'Bool').tostring()
'\x01\x01\x01\x01\x01\x01\x01\x01\x01'

A Python object layer around array.array('B') would not necessarily be
fast, but it would at least avoid the copying problem, and be memory
efficient.
import array

class BitArray2D:
def __init__(self, rows, cols):
self.rows = rows
self.cols = cols
nbytes = (rows * cols + 7) / 8
self._data = array.array('B', [0]) * nbytes

def __getitem__(self, (r,c)):
# TODO: check r, c in bounds
idx = r + c * self.rows
byte = idx / 8
bit = 1  (idx%8)
return bool(self._data[byte]  bit)

def __setitem__(self, (r, c), v):
# TODO: check r, c in bounds
idx = r + c * self.rows
byte = idx / 8
bit = 1  (idx%8)
if v:
self._data[byte] |= bit
else:
self._data[byte] = ~bit

b = BitArray2D(10, 10)
print b._data
for x in range(10):
b[x,x] = b[9-x,x] = 1
print b._data
print
for x in range(10):
for y in range(10):
print  *[b[x,y]],
print

Jeff


pgpswUV2pQdjG.pgp
Description: PGP signature
-- 
http://mail.python.org/mailman/listinfo/python-list

Re: Cat and Mouse (wes Re: Efficiency of using long integers to hold bitmaps)

2005-07-10 Thread Bengt Richter
On Sun, 10 Jul 2005 19:30:58 -0400, Terry Reedy [EMAIL PROTECTED] wrote:


Martin v. Löwis [EMAIL PROTECTED] wrote in message 
news:[EMAIL PROTECTED]
 Right. OTOH, I notice a frequent game of Katze und Maus (cat and mouse?)

Yes, apparently with the same idiomatic meaning, as you decribe the game 
here perfectly.

I think you are right about some game happening, but I'm not sure it's cat and 
mouse.
To me, an incomplete question feels like an invitation to play 20 questions 
regarding
what the real problem is. So I get a little annoyed, and often just bypass the 
post.
If I answer, the residual annoyance sometimes leads me to withhold my best 
guess, and
complain instead. Hence probably the cat and mouse impression. Not very big of 
me, but
OTOH a think a bit of a nudge towards better questions is not a bad thing. 
OTO3H, maybe
I should just silently pass up 20-questions invitations and not pollute this 
pleasant space
with perfumes of annoyance (since however diffuse, they are apparently pungent 
enough
for some to notice ;-)

Regards,
Bengt Richter
-- 
http://mail.python.org/mailman/listinfo/python-list