Vincent Davis <vinc...@vincentdavis.net> writes: > My goal is to shorten a long integer into a shorter set of characters. > Below is what I have which gets me about a 45-50% reduction. Any suggestion > on how to improve upon this?
You can't improve much. A decimal digit carries log(10,2)=3.32 bits of information. A reasonable character set for Twitter-style links might have 80 or so characters (upper/lower alphabetic, digits, and a dozen or so punctuation characters), or log(80,2)= > I not limited to ascii but I didn't see how going to utf8 would help. If you could use Unicode characters like Chinese ideographs, that gives you a much larger alphabet to work with, so you'd need fewer chars displayed in the link, but they'd be hard for most people to type. > l = string.ascii_lowercase + string.ascii_uppercase + > '!"#$%&\'()*+,-./:;<=>?@[]^_`{|}~' OK, 83 chars, but it may not be ok to use some of them like #, /, and ?, since they can have special meanings in urls. Your algorithm looks basically ok though I didn't examine it closely. Here is my shortened version: import string # alphabet here is 83 chars alphabet = string.ascii_lowercase + \ string.ascii_uppercase +'!"#$%&\'()*+,-./:;<=>?@[]^_`{|}~' alphabet_size = len(alphabet) decoderdict = dict((b,a) for a,b in enumerate(alphabet)) def encoder(integer): a,b = divmod(integer, alphabet_size) if a == 0: return alphabet[b] return encoder(a) + alphabet[b] def decoder(code): return reduce(lambda n,d: n*alphabet_size + decoderdict[d], code, 0) def test(): n = 92928729379271 short = encoder(n) backagain = decoder(short) nlen = len(str(n)) print (nlen, len(short), float(len(short))/nlen) assert n==backagain, (n,short,b) test() -- https://mail.python.org/mailman/listinfo/python-list