Bengt Richter wrote: > On Sat, 07 Jan 2006 14:05:18 +0100, Claudio Grondi <[EMAIL PROTECTED]> wrote: > [...] > >>What I am also looking for is a conversion to base 256 (i.e where the >>full byte is used and the string and the integer have the same actual >>content if on appropriate endian machine), which would make the bit >>extraction comparable easy and effective as the i.__hex__() based method. >>Using digits() instead of the code you have provided speeds the whole >>thing up two times (see attached code for details), but still is >>i.__hex__() the best way to go and could be improved probably only by >>direct conversion to base 256 or even higher base as e.g. 2**16 or 2**32. >> > > (see Paul Rubin's post first). > BTW, I'd bet that '%x'.__mod__(i) will save you time over i.__hex__() by the > time > you finish fiddling with 0x and L, even if you don't use the rest. Yes, it seem to do, but not very significant, at least in case of the code provided (see attachment). Here the output of run of provided code on my Pentium 4, 3.0 GHz : Seconds for bit extraction from integer with 280 hex-digits when: looping over i>>1;i&0x01 = 0.001207 ^-- but in 30 bit chunks = 0.000904 looping over i>>1;i%2 = 0.002409 using i.__hex__() repr. = 0.000186 using '%02X'.__mod__(i) = 0.000145 using __hex__,array,binascii = 0.000117 using __mod__,array,binascii = 0.000114 using gmpy.digits(i,2) = 0.000615 > > The question looming is, "What are you intending to do with your new > representations > of twos complement integers?" As you see in the 'Subject' I am currently not primary after conversion of integers, but after understanding of Python implementation. The code posted is a kind of side effect required in order to be able to demonstrate progress if any. What I have lately learned from this and some other threads in (de.)comp.lang.python is, that to be able to get the best out of Python in terms of speed and understanding what is or is not possible, it is vital and necessary to understand the implementation. Another reason behind this all is, that I was so impressed as it had turned out, that it was possible to get the CPU time required for conversion of the largest known prime to decimal form down from six hours to six seconds (using BigDec), that I am now step by step checking if similar effects can't be achieved also elsewhere.
> > BTW2, an easy way to play with integers chopped into little-endian bit field > sequences is > a generator, e.g., (note that last field is sign bits, either all zeroes or > all ones, so > you can unambiguously reconstruct a signed integer from this variable width > representation). > (not very tested, and BTW as you see I hacked in using int when possible for > return values) > > >>> def bitsof(n, width=8): > ... m = (2**(width-1)-1)*2+1 # allow 2**width-1 == sys.maxint as int > ... if type(m) is long: > ... yield n&m > ... while n>0 or n<-1: > ... n>>=width > ... yield n&m > ... else: > ... yield int(n&m) > ... while n>0 or n<-1: > ... n>>=width > ... yield int(n&m) > ... > >>> list(bitsof(11,1)) > [1, 1, 0, 1, 0] > >>> list(bitsof(5,1)) > [1, 0, 1, 0] > >>> list(bitsof(-5,1)) > [1, 1, 0, 1] > >>> hex(3**100) > '0x5A4653CA673768565B41F775D6947D55CF3813D1L' > >>> ''.join(map('%02X'.__mod__, bitsof(3**100, 8))[::-1]) > '005A4653CA673768565B41F775D6947D55CF3813D1' > >>> ''.join(map('%02X'.__mod__, bitsof(-3**100, 8))[::-1]) > 'FFA5B9AC3598C897A9A4BE088A296B82AA30C7EC2F' > >>> hex(-3**100) > '-0x5A4653CA673768565B41F775D6947D55CF3813D1L' > > <gentle rant> > (I leave it to your judgement as to how useful our current hex() > representation > of negative numebers is ;-) > </gentle rant> > > Regards, > Bengt Richter I haven't yet run into negative integers, so thank's for making me aware of the potential problems with it. Yes, I would not expect the hex() to give me the minus sign - I see I should be very careful here (this is also the reason why I generally try to avoid usage of negative integers as good as I can). Claudio P.S. Attachment: import time # dctLstOfBitsVsCharOfNibble is a dictionary with a key beeing one character # string representing hexadecimal value of a nibble and a value beeing a list # of bits of the nibble where the lowest bit is stored at index 0 of the list. # i.e. dctLstOfBitsVsCharOfNibble = { '0':[0, 0, 0, 0], '1':[0, 0, 0, 1], '2':[0, 0, 1, 0], '3':[0, 0, 1, 1], '4':[0, 1, 0, 0], '5':[0, 1, 0, 1], '6':[0, 1, 1, 0], '7':[0, 1, 1, 1], '8':[1, 0, 0, 0], '9':[1, 0, 0, 1], 'A':[1, 0, 1, 0], 'B':[1, 0, 1, 1], 'C':[1, 1, 0, 0], 'D':[1, 1, 0, 1], 'E':[1, 1, 1, 0], 'F':[1, 1, 1, 1] } # The dctLstOfBitsVsCharOfNibble dictionary can be also generated by following code: # dctLstOfBitsVsCharOfNibble = {} # for intNibbleValue in range(0, 16): # lstBitReprOfCurrNibble=[] # for indxOfBit in range(0,4): # lstBitReprOfCurrNibble.append(intNibbleValue>>indxOfBit&0x01) # #:for # lstBitReprOfCurrNibble.reverse() # dctLstOfBitsVsCharOfNibble['%X'%(intNibbleValue,)]=lstBitReprOfCurrNibble # #:for dctLstOfBitsVsOrdOfChar = {} for ordOfChar in range(256): dctLstOfBitsVsOrdOfChar[ordOfChar] = dctLstOfBitsVsCharOfNibble['%X'%(ordOfChar>>4,)] + dctLstOfBitsVsCharOfNibble['%X'%(ordOfChar&0x0F,)] #:for # what gives: { # 0x00:[0, 0, 0, 0, 0, 0, 0, 0], # 0x01:[0, 0, 0, 0, 0, 0, 0, 1], # 0x02:[0, 0, 0, 0, 0, 0, 1, 0], # ... # 0xFD:[1, 1, 1, 1, 1, 1, 0, 1], # 0xFE:[1, 1, 1, 1, 1, 1, 1, 0], # 0xFF:[1, 1, 1, 1, 1, 1, 1, 1] # } dctLstOfBitsVsHexOfChar = {} for ordOfChar in range(256): dctLstOfBitsVsHexOfChar['%02X'%ordOfChar] = dctLstOfBitsVsOrdOfChar[ordOfChar] #:for # what gives: { # '00':[0, 0, 0, 0, 0, 0, 0, 0], # '01':[0, 0, 0, 0, 0, 0, 0, 1], # '02':[0, 0, 0, 0, 0, 0, 1, 0], # ... # 'FD':[1, 1, 1, 1, 1, 1, 0, 1], # 'FE':[1, 1, 1, 1, 1, 1, 1, 0], # 'FF':[1, 1, 1, 1, 1, 1, 1, 1] # } n = 0 NoOf32bitChunks = 0 lstBitsBitwiseAnd = [] lstBitsModulo = [] lstViaBitChunks = [] lstBitsViaHex = [] lstBitsViaMod = [] lstBitsGMPY = [] lstViaHexAndArray = [] lstViaModAndArray = [] timeBitwiseAnd = -1 timeModulo = -1 timeBitsViaHex = -1 timeBitsViaMod = -1 timeViaBitChunks = -1 timeGMPY = -1 timeViaHexAndArray = -1 timeViaModAndArray = -1 bitChunkSize = 30 # must be <= 32 while timeBitwiseAnd <= timeBitsViaHex + 0.001: n = (n<<32) + 0x12345678 NoOf32bitChunks += 1 i = n lstBitsModulo = [] start = time.clock() while i: lstBitsModulo.append(i%2) i=i>>1 timeModulo = time.clock()-start i = n lstBitsBitwiseAnd = [] start = time.clock() while i: lstBitsBitwiseAnd.append(i&0x01) i=i>>1 timeBitwiseAnd = time.clock()-start i = n lstViaBitChunks = [] bitFilter = 0 for dummy in range(bitChunkSize): bitFilter = (bitFilter<<1)+1 #:for start = time.clock() done = False while i: i1 = int(i & bitFilter) # int() converts here a long-integer to integer i >>= bitChunkSize if i == 0: done = True for k in xrange(bitChunkSize): lstViaBitChunks.append(i1 & 1) i1 >>= 1 if done and i1==0: # if this is the top word, and no bits are left, we are done break #:if #:for #:while timeViaBitChunks = time.clock()-start i = n # lstBitsViaHex = [] start = time.clock() strHexOf_i = i.__hex__()[2:] if strHexOf_i[-1]=='L': strHexOf_i=strHexOf_i[0:-1] #:if intNoOfLeadingZeroBits = 0 lstBitsOfFirstNibble = dctLstOfBitsVsCharOfNibble[strHexOf_i[0]] while not lstBitsOfFirstNibble[intNoOfLeadingZeroBits] and intNoOfLeadingZeroBits < 4: intNoOfLeadingZeroBits += 1 #:while if intNoOfLeadingZeroBits == 4: lstBitsViaHex = [] else: lstBitsViaHex = lstBitsOfFirstNibble[intNoOfLeadingZeroBits:] #:if for indxToStrHexOf_i in range(1,len(strHexOf_i)): lstBitsViaHex += dctLstOfBitsVsCharOfNibble[strHexOf_i[indxToStrHexOf_i]] #:for lstBitsViaHex.reverse() timeBitsViaHex = time.clock()-start i = n # lstBitsViaMod = [] start = time.clock() strHexOf_i = '%X'.__mod__(i) intNoOfLeadingZeroBits = 0 lstBitsOfFirstByte = dctLstOfBitsVsHexOfChar[strHexOf_i[0:2]] while not lstBitsOfFirstByte[intNoOfLeadingZeroBits] and intNoOfLeadingZeroBits < 8: intNoOfLeadingZeroBits += 1 #:while if intNoOfLeadingZeroBits == 8: lstBitsViaMod = [] else: lstBitsViaMod = lstBitsOfFirstByte[intNoOfLeadingZeroBits:] #:if for indxToStrHexOf_i in range(2,len(strHexOf_i),2): lstBitsViaMod += dctLstOfBitsVsHexOfChar[strHexOf_i[indxToStrHexOf_i:indxToStrHexOf_i+2]] #:for lstBitsViaMod.reverse() timeBitsViaMod = time.clock()-start i = n lstViaHexAndArray = [] import array, binascii start = time.clock() strHexOf_i = i.__hex__()[2:] if strHexOf_i[-1]=='L': strHexOf_i=strHexOf_i[0:-1] #:if for item in array.array('B', binascii.unhexlify(strHexOf_i)): lstViaHexAndArray += dctLstOfBitsVsOrdOfChar[item] #:for intNoOfLeadingZeroBits = 0 while not lstViaHexAndArray[intNoOfLeadingZeroBits] and intNoOfLeadingZeroBits < 4: intNoOfLeadingZeroBits += 1 #:while if intNoOfLeadingZeroBits == 4: lstViaHexAndArray = [] else: lstViaHexAndArray = lstViaHexAndArray[intNoOfLeadingZeroBits:] #:if lstViaHexAndArray.reverse() timeViaHexAndArray = time.clock()-start i = n lstViaModAndArray = [] import array, binascii start = time.clock() strHexOf_i = '%X'.__mod__(i) for item in array.array('B', binascii.unhexlify(strHexOf_i)): lstViaModAndArray += dctLstOfBitsVsOrdOfChar[item] #:for intNoOfLeadingZeroBits = 0 while not lstViaModAndArray[intNoOfLeadingZeroBits] and intNoOfLeadingZeroBits < 4: intNoOfLeadingZeroBits += 1 #:while if intNoOfLeadingZeroBits == 4: lstViaModAndArray = [] else: lstViaModAndArray = lstViaModAndArray[intNoOfLeadingZeroBits:] #:if lstViaModAndArray.reverse() timeViaModAndArray = time.clock()-start import gmpy lstBitsGMPY = [] # start = time.clock() # i = gmpy.mpz(n) # f = gmpy.scan1(i,0) # if f==0: # lstBitsGMPY.append(1) # k = 1 # f = gmpy.scan1(i,1) # else: # k = 0 # while f: # while k<f: # lstBitsGMPY.append(0) # k += 1 # lstBitsGMPY.append(1) # k += 1 # f = gmpy.scan1(i,f+1) # timeGMPY = time.clock()-start # gmpy.binary(i) delivers base-256 little endian binary representation i.e. : # gmpy.binary(0x12131415) == '\x15\x14\x13\x12' # so it can't be directly used here in the given context. start = time.clock() i = gmpy.mpz(n) strBitsOf_i = gmpy.digits(i,2) for char in strBitsOf_i: if char=='0': lstBitsGMPY.append(0) else: lstBitsGMPY.append(1) #:for lstBitsGMPY.reverse() timeGMPY = time.clock()-start #:while if( lstBitsBitwiseAnd == lstBitsModulo and lstBitsBitwiseAnd == lstBitsViaHex and lstBitsBitwiseAnd == lstBitsViaMod and lstBitsBitwiseAnd == lstBitsViaHex and lstBitsBitwiseAnd == lstViaBitChunks and lstBitsBitwiseAnd == lstViaHexAndArray and lstBitsBitwiseAnd == lstViaModAndArray and lstBitsBitwiseAnd == lstBitsGMPY ): print print ' Seconds for bit extraction from integer with %i hex-digits when: '%(NoOf32bitChunks*8,) print print ' looping over i>>1;i&0x01 = %f '%(timeBitwiseAnd,) print ' ^-- but in %i bit chunks = %f '%(bitChunkSize, timeViaBitChunks) print ' looping over i>>1;i%%2 = %f '%(timeModulo,) print ' using i.__hex__() repr. = %f '%(timeBitsViaHex,) print ' using \'%%02X\'.__mod__(i) = %f '%(timeBitsViaMod,) print ' using __hex__,array,binascii = %f '%(timeViaHexAndArray,) print ' using __mod__,array,binascii = %f '%(timeViaModAndArray,) print ' using gmpy.digits(i,2) = %f '%(timeGMPY,) print print ' Bits extraction from long integers with number of hexadecimal digits ' print ' > %i '%(NoOf32bitChunks*8,) print ' is at least 1 ms faster with i.__hex__() then with i>>1;i&0x01 method.' print print ' Modulo division (%2) is always slower than bitwise-and (&0x01).' -- http://mail.python.org/mailman/listinfo/python-list