Re: Efficiency of using long integers to hold bitmaps
Raymond, Thanks for your answers, which even covered the question that I didn't ask but should have. code A Python list is not an array()\n * 100 /code :) Jeff Raymond Hettinger [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED] [Jeff Melvaine] I note that I can write expressions like 1 100 and the result is stored as a long integer, which means it is stored as an integer of arbitrary length. I may need to use a large number of these, and am interested to know whether the storage efficiency of long integers is in danger of breaking my code if I use too many. Would I do better to write a class that defines bitwise operations on arrays of integers, each integer being assumed to contain at most 32 bits? Both array() objects and long integers are equally space efficient. In contrast, a list of integers takes up a lot of space (the list is stored as an array of pointers to individual integer objects). Raymond -- http://mail.python.org/mailman/listinfo/python-list
Re: Efficiency of using long integers to hold bitmaps
Bengt, Thanks for your informative reply, further comments interleaved. Bengt Richter [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED] On Mon, 11 Jul 2005 02:37:21 +1000, Jeff Melvaine [EMAIL PROTECTED] wrote: I note that I can write expressions like 1 100 and the result is stored as a long integer, which means it is stored as an integer of arbitrary length. I may need to use a large number of these, and am interested to know whether the storage efficiency of long integers is in danger of breaking my code if I use too many. Would I do better to write a class that defines bitwise operations on arrays of integers, each integer being assumed to contain at most 32 bits? I cannot find information in the Python manuals for 2.4.1 that would allow me to resolve this question; perhaps the intention is that programmers should not rely on implementation details. Thanks in advance, Sounds like a possible^H^H^H^H^H^H^H^Hprobable premature optimization worry ;-) I'm writing a Sudoku solver of generic order. The object is not to make it half a millisecond faster than the guy next door's solver, but I'd like it to be able to do a bit more than just the daily newspaper puzzle, e.g. search for uniquely solvable puzzles with minimal numbers of clues. NP-completeness will put a lid on things sooner or later, but I'd like to get as far as possible before that happens. Why in Python? All my recent professional experience is in writing Ada, which is not my idea of a rapid prototyping language (or a rapid deliverable item handover language either, for that matter :). Why do I want it to be efficient during debugging, rather than after fine tuning? I take your point, but in a sense the ability to handle large problems is part of the proof of concept. What is a large number of these going to amount to? How many, tops? And how many bits in each? How many operations between them? (Since integers are immutable, operations mean allocation of space for new ones for results and disposing of unused garbage ones (probably back to a special fast pool for integers and longs)). Are you interested in a speed/memory tradeoff? The algorithms that interest me most do not involve cyclic data structures, so I am trusting in built-in reference counts to avoid memory leaks. At the moment I'm expecting to use bitmaps of constant size (81 for order 3, or 256 for order 4) for the most numerous data items, so fragmentation should not be excessive. Space was my first thought, but I also expect that the parallelism of bitwise operations will be reasonably time-efficient. I would hope to be able to do operations more quickly on a bitmap of n bits than on a list or array of n integer variables with values constrained to 0 or 1. However the prospect of writing a b and getting multiword functionality that could prove the concepts was rather appealing too; almost executable pseudocode. The effectiveness of the algorithms will determine how much time and space I use, but for NP-complete problems the ceiling is always too low, and one is constantly learning new ways to duck. If your bit vectors are extremely large and sparse (have only a few bits on), you might consider sets (import sets and help(sets)) of the bit numbers as representations. BTW, I wonder if anyone has written an ordered bit set class in C yet. I was tempted ;-) For symbolic Boolean algebra on systems of 729 or 4096 variables, sparse is the way to go, but I would want ordered sets too. I've already implemented a Boolean algebra system using Python lists (oops, thank you again Raymond) of 32-bit bitmaps, and the calculations it does are not dazzlingly fast. My question about using long integers had a different approach in mind. How much memory do you have? Buying more can be a pretty cheap way of solving space worries if you are getting paid for your time. 512Mb. The only time I've hit the limit so far was when I got distracted enough to leave out the escape condition in a small recursive function. Death was surprisingly rapid. You should be able to subclass int or long as a way of writing your program in terms of your own bit vector class. Then you can change your mind and change the underlying representation without changing the code that uses the api. Compared to plain longs it will cost you some speed to keep converting results to your own type though. I like to encapsulate, but I hadn't thought of that one. Yes, it's an approach to getting the best of both worlds; instant performance or flexibility. The thought of executable pseudocode that I translate into something better if necessary is not too bad for now. Bottom line, whether your code will break due to storage problems or be fast enough will depend on numbers you didn't provide ;-) Re netiquette (with thanks to other posters who made this thread so stimulating), I don't demand that respondents to my posted
Re: Efficiency of using long integers to hold bitmaps
On Wed, 13 Jul 2005 03:24:48 +1000, Jeff Melvaine [EMAIL PROTECTED] wrote: Bengt, Thanks for your informative reply, further comments interleaved. Can't reply fully now, but just had the thought that maybe some ideas from 8-queens solvers might be useful or interesting. There is an old thread at http://groups-beta.google.com/group/comp.lang.python/browse_frm/thread/f88f301b7578705a that explores various ways of solving it, and uses various representations of the board, including integer bit maps at the end, which turned out fastest IIRC. I'm sure it can still be improved upon, and I'm not sure it will be worth your while to dig into it, unless you think the problem fun, but there it is. Regards, Bengt Richter -- http://mail.python.org/mailman/listinfo/python-list
Re: Efficiency of using long integers to hold bitmaps
On Wed, Jul 13, 2005 at 03:24:48AM +1000, Jeff Melvaine wrote: Bengt, Thanks for your informative reply, further comments interleaved. Bengt Richter [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED] On Mon, 11 Jul 2005 02:37:21 +1000, Jeff Melvaine [EMAIL PROTECTED] wrote: I note that I can write expressions like 1 100 and the result is stored as a long integer, which means it is stored as an integer of arbitrary length. I may need to use a large number of these, and am interested to know whether the storage efficiency of long integers is in danger of breaking my code if I use too many. Would I do better to write a class that defines bitwise operations on arrays of integers, each integer being assumed to contain at most 32 bits? I cannot find information in the Python manuals for 2.4.1 that would allow me to resolve this question; perhaps the intention is that programmers should not rely on implementation details. Thanks in advance, Sounds like a possible^H^H^H^H^H^H^H^Hprobable premature optimization worry ;-) I'm writing a Sudoku solver of generic order. The object is not to make it half a millisecond faster than the guy next door's solver, but I'd like it to be able to do a bit more than just the daily newspaper puzzle, e.g. search for uniquely solvable puzzles with minimal numbers of clues. NP-completeness will put a lid on things sooner or later, but I'd like to get as far as possible before that happens. I would recommend making a hand-written C extension that does the heavy lifting - but only in the tiny corner you need it to. I've done this for the ICFP[1] competition the last few years. It is a time-limited competition so the priorities are getting a pure-python program up and running to understand the problem and then making the slow parts go really fast so you can try as many full games as possible to try out as many new strategies as possible. Typically this means just the game board representation is done in C. You'll want the heuristics to be done in python in order to try out variations easily. For Sudoku the board implementation will likely have C functions for copy(), valid() (raise ValueError), and something to return a list of obviously legal values for a coordinate. Passing coord tuples in and list dictionaries out has worked well for me (easy to use in the python part of the program). I keep C modules out of production code unless absolutely necessary, but I have no qualms about using it in toy/hobby problems, especially because the C code stays at a manageable few hundred lines for toy problems. If you aren't much of a C guy check out pyrex[2]. In my darker days I did C++ for a living so I much prefer writing modules by hand; python makes it easy to do and it is faster, less buggy, and easier to debug. -jackdied [1] http://performancedrivers.com/icfp2002/ http://performancedrivers.com/icfp2004/ (other years I botched it badly enough I didn't make a webpage) http://performancedrivers.com/icfp2002/icfp_module.c http://performancedrivers.com/icfp2002/icfpBoard.c [2] http://nz.cosc.canterbury.ac.nz/~greg/python/Pyrex/ -- http://mail.python.org/mailman/listinfo/python-list
Re: Cat and Mouse (wes Re: Efficiency of using long integers to hold bitmaps)
On Mon, 11 Jul 2005 02:58:29 +, Bengt Richter wrote: I think you are right about some game happening, but I'm not sure it's cat and mouse. To me, an incomplete question feels like an invitation to play 20 questions regarding what the real problem is. So I get a little annoyed, and often just bypass the post. If I answer, the residual annoyance sometimes leads me to withhold my best guess, and complain instead. Hence probably the cat and mouse impression. Not very big of me, but OTOH a think a bit of a nudge towards better questions is not a bad thing. OTO3H, maybe I should just silently pass up 20-questions invitations and not pollute this pleasant space with perfumes of annoyance (since however diffuse, they are apparently pungent enough for some to notice ;-) If it helps, I find the entertainment value of the gentle nudging is the only thing that makes the smell of stupid questions bearable. It isn't true that there is no such thing as a stupid question. There are intelligent questions that are asked in a rude and stupid way. Expecting people to read your mind and understand what your question is about is rude. Expecting people to decipher poorly written, badly spelt, incomprehensible ramblings is rude. (Allowance should be made for those whose native language is not English, and those with good excuses for bad spelling, eg broken keyboard, actual dyslexia.) Some allowance for the occasional brain-fart or typo should be made, but communication requires two parties. You wouldn't expect a mail server to accept any random malformed packets and try to make sense of it, and you shouldn't expect others to put in more work understanding your post than you put in writing it. As they say, be strict when sending and tolerant when receiving. I'm all for that. But when people insist on sending broken packets, I see nothing rude about bouncing those packets back with a error message or a creative misunderstanding. -- Steven -- http://mail.python.org/mailman/listinfo/python-list
Efficiency of using long integers to hold bitmaps
I note that I can write expressions like 1 100 and the result is stored as a long integer, which means it is stored as an integer of arbitrary length. I may need to use a large number of these, and am interested to know whether the storage efficiency of long integers is in danger of breaking my code if I use too many. Would I do better to write a class that defines bitwise operations on arrays of integers, each integer being assumed to contain at most 32 bits? I cannot find information in the Python manuals for 2.4.1 that would allow me to resolve this question; perhaps the intention is that programmers should not rely on implementation details. Thanks in advance, Jeff -- http://mail.python.org/mailman/listinfo/python-list
Re: Efficiency of using long integers to hold bitmaps
[Jeff Melvaine] I note that I can write expressions like 1 100 and the result is stored as a long integer, which means it is stored as an integer of arbitrary length. I may need to use a large number of these, and am interested to know whether the storage efficiency of long integers is in danger of breaking my code if I use too many. Would I do better to write a class that defines bitwise operations on arrays of integers, each integer being assumed to contain at most 32 bits? Both array() objects and long integers are equally space efficient. In contrast, a list of integers takes up a lot of space (the list is stored as an array of pointers to individual integer objects). Raymond -- http://mail.python.org/mailman/listinfo/python-list
Re: Efficiency of using long integers to hold bitmaps
On Mon, 11 Jul 2005 02:37:21 +1000, Jeff Melvaine [EMAIL PROTECTED] wrote: I note that I can write expressions like 1 100 and the result is stored as a long integer, which means it is stored as an integer of arbitrary length. I may need to use a large number of these, and am interested to know whether the storage efficiency of long integers is in danger of breaking my code if I use too many. Would I do better to write a class that defines bitwise operations on arrays of integers, each integer being assumed to contain at most 32 bits? I cannot find information in the Python manuals for 2.4.1 that would allow me to resolve this question; perhaps the intention is that programmers should not rely on implementation details. Thanks in advance, Sounds like a possible^H^H^H^H^H^H^H^Hprobable premature optimization worry ;-) What is a large number of these going to amount to? How many, tops? And how many bits in each? How many operations between them? (Since integers are immutable, operations mean allocation of space for new ones for results and disposing of unused garbage ones (probably back to a special fast pool for integers and longs)). Are you interested in a speed/memory tradeoff? If your bit vectors are extremely large and sparse (have only a few bits on), you might consider sets (import sets and help(sets)) of the bit numbers as representations. BTW, I wonder if anyone has written an ordered bit set class in C yet. I was tempted ;-) How much memory do you have? Buying more can be a pretty cheap way of solving space worries if you are getting paid for your time. You should be able to subclass int or long as a way of writing your program in terms of your own bit vector class. Then you can change your mind and change the underlying representation without changing the code that uses the api. Compared to plain longs it will cost you some speed to keep converting results to your own type though. Bottom line, whether your code will break due to storage problems or be fast enough will depend on numbers you didn't provide ;-) Regards, Bengt Richter -- http://mail.python.org/mailman/listinfo/python-list
Cat and Mouse (wes Re: Efficiency of using long integers to hold bitmaps)
Martin v. Löwis [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED] Right. OTOH, I notice a frequent game of Katze und Maus (cat and mouse?) Yes, apparently with the same idiomatic meaning, as you decribe the game here perfectly. TJR -- http://mail.python.org/mailman/listinfo/python-list
Re: Efficiency of using long integers to hold bitmaps
You'll find that using Python Longs unsuitable if you *change* the bitmaps---All numeric types are immutable, so you'll copy the bitmap each time you perform an operation like set bit. numarray has a 'bit' typecode, though I'm not sure how such an array is actually stored---from a quick look, it appears it's stored one bit per byte: numarray.ones((3,3), 'Bool').tostring() '\x01\x01\x01\x01\x01\x01\x01\x01\x01' A Python object layer around array.array('B') would not necessarily be fast, but it would at least avoid the copying problem, and be memory efficient. import array class BitArray2D: def __init__(self, rows, cols): self.rows = rows self.cols = cols nbytes = (rows * cols + 7) / 8 self._data = array.array('B', [0]) * nbytes def __getitem__(self, (r,c)): # TODO: check r, c in bounds idx = r + c * self.rows byte = idx / 8 bit = 1 (idx%8) return bool(self._data[byte] bit) def __setitem__(self, (r, c), v): # TODO: check r, c in bounds idx = r + c * self.rows byte = idx / 8 bit = 1 (idx%8) if v: self._data[byte] |= bit else: self._data[byte] = ~bit b = BitArray2D(10, 10) print b._data for x in range(10): b[x,x] = b[9-x,x] = 1 print b._data print for x in range(10): for y in range(10): print *[b[x,y]], print Jeff pgpswUV2pQdjG.pgp Description: PGP signature -- http://mail.python.org/mailman/listinfo/python-list
Re: Cat and Mouse (wes Re: Efficiency of using long integers to hold bitmaps)
On Sun, 10 Jul 2005 19:30:58 -0400, Terry Reedy [EMAIL PROTECTED] wrote: Martin v. Löwis [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED] Right. OTOH, I notice a frequent game of Katze und Maus (cat and mouse?) Yes, apparently with the same idiomatic meaning, as you decribe the game here perfectly. I think you are right about some game happening, but I'm not sure it's cat and mouse. To me, an incomplete question feels like an invitation to play 20 questions regarding what the real problem is. So I get a little annoyed, and often just bypass the post. If I answer, the residual annoyance sometimes leads me to withhold my best guess, and complain instead. Hence probably the cat and mouse impression. Not very big of me, but OTOH a think a bit of a nudge towards better questions is not a bad thing. OTO3H, maybe I should just silently pass up 20-questions invitations and not pollute this pleasant space with perfumes of annoyance (since however diffuse, they are apparently pungent enough for some to notice ;-) Regards, Bengt Richter -- http://mail.python.org/mailman/listinfo/python-list