Re: istep() addition to itertool? (Was: Re: Printing n elements per line in a list)
Rhamphoryncus wrote:
> I've run into this problem a few times, and although many solutions have been presented specifically for printing I would like to present a more general alternative.
[snip interesting istep function]
> Would anybody else find this useful? Maybe worth adding it to itertool?

Yeah, but why on earth did you make it so complicated?

    def istep(iterable, step):
        a = []
        for x in iterable:
            if len(a) >= step:
                yield a
                a = []
            a.append(x)
        if a:
            yield a

--
- Justin
--
http://mail.python.org/mailman/listinfo/python-list
Re: trouble using \ as a string
OriginalBrownster wrote:
> i want this because using python I am pulling in filenames from a mac..thus they are / in the pathways..and i want to .split it at the / to obtain the filename at the end...but its proving difficult with this obstacle in the way.

Sounds like you want

    import posixpath
    posixpath.basename(path)

assuming you are on a Windows box; otherwise the normal os.path.basename will do it.

--
- Justin
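A quick sketch of the suggestion above, written for Python 3. The path is a made-up example, not one from the original post.

```python
import posixpath

# Hypothetical Mac-style path that uses forward slashes.
path = "Users/brown/Documents/report.txt"

# posixpath always splits on "/", whatever platform the script runs on,
# so this behaves the same on a Windows box as on a Mac.
filename = posixpath.basename(path)
print(filename)
```

On a machine where "/" is already the native separator, os.path.basename gives the same answer.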
Re: variable creation
Alistair King wrote: Hei all, im trying to create a list of variables for further use: [snip] this works to a certain extent but gets stuck on some loop. Im a beginner and am not sure where im going wrong. You are trying to do too much in one function. Split those loops up into a few little ones and the program will work... or if it doesn't, you'll know exactly where the problem is. def get_element(pt): Return None or a single element from the periodic table while 1: el = raw_input(Which element would you like to include? ) if not el: #blank answer return if el in pt: return pt[el] print This element is not in the periodic table, please try again def get_elements(pt): elements = [] while 1: el = get_element(pt) if not el: break elements.append(el) return elements See how using two separate functions makes it easy to test? In [10]:print get_element(pt) Which element would you like to include? X This element is not in the periodic table, please try again Which element would you like to include? H 1.00794001 In [11]:print get_elements(pt) Which element would you like to include? Z This element is not in the periodic table, please try again Which element would you like to include? Li Which element would you like to include? B Which element would you like to include? H Which element would you like to include? [6.9408, 10.811, 1.00794001] Now, since the information for a single element consists of more than just a single number, you'll probably want to make a class for them. Once you have an object for every element, you can add them to a class for the periodic table. -- - Justin -- http://mail.python.org/mailman/listinfo/python-list
Re: newb question: file searching
[EMAIL PROTECTED] wrote: I've narrowed down the problem. All the problems start when I try to eliminate the hidden files and directories. Is there a better way to do this? Well you almost have it, but your problem is that you are trying to do too many things in one function. (I bet I am starting to sound like a broken record :-)) The four distinct things you are doing are: * getting a list of all files in a tree * combining a files directory with its name to give the full path * ignoring hidden directories * matching files based on their extension If you split up each of those things into their own function you will end up with smaller easier to test pieces, and separate, reusable functions. The core function would be basically what you already have: def get_files(directory, include_hidden=False): Return an expanded list of files for a directory tree optionally not ignoring hidden directories for path, dirs, files in os.walk(directory): for fn in files: full = os.path.join(path, fn) yield full if not include_hidden: remove_hidden(dirs) and remove_hidden is a short, but tricky function since the directory list needs to be edited in place: def remove_hidden(dirlist): For a list containing directory names, remove any that start with a dot dirlist[:] = [d for d in dirlist if not d.startswith('.')] at this point, you can play with get_files on it's own, and test whether or not the include_hidden parameter works as expected. For the final step, I'd use an approach that pulls out the extension itself, and checks to see if it is in a list(or better, a set) of allowed filenames. globbing (*.foo) works as well, but if you are only ever matching on the extension, I believe this will work better. 
def get_files_by_ext(directory, ext_list, include_hidden=False): Return an expanded list of files for a directory tree where the file ends with one of the extensions in ext_list ext_list = set(ext_list) for fn in get_files(directory, include_hidden): _, ext = os.path.splitext(fn) ext=ext[1:] #remove dot if ext.lower() in ext_list: yield fn notice at this point we still haven't said anything about images! The task of finding files by extension is pretty generic, so it shouldn't be concerned about the actual extensions. once that works, you can simply do def get_images(directory, include_hidden=False): image_exts = ('jpg','jpeg','gif','png','bmp') return get_files_by_ext(directory, image_exts, include_hidden) Hope this helps :-) -- - Justin -- http://mail.python.org/mailman/listinfo/python-list
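For anyone who wants to try the pieces together, here is a sketch of the four functions above ported to Python 3. The temporary directory and the file names in the demo at the bottom are invented for illustration only.

```python
import os
import tempfile


def remove_hidden(dirlist):
    """Remove, in place, any directory names that start with a dot."""
    dirlist[:] = [d for d in dirlist if not d.startswith('.')]


def get_files(directory, include_hidden=False):
    """Yield the full path of every file in a directory tree."""
    for path, dirs, files in os.walk(directory):
        for fn in files:
            yield os.path.join(path, fn)
        if not include_hidden:
            # Editing dirs in place stops os.walk from descending into them.
            remove_hidden(dirs)


def get_files_by_ext(directory, ext_list, include_hidden=False):
    """Yield files whose extension (without the dot) is in ext_list."""
    ext_set = set(e.lower() for e in ext_list)
    for fn in get_files(directory, include_hidden):
        _, ext = os.path.splitext(fn)
        if ext[1:].lower() in ext_set:
            yield fn


def get_images(directory, include_hidden=False):
    image_exts = ('jpg', 'jpeg', 'gif', 'png', 'bmp')
    return get_files_by_ext(directory, image_exts, include_hidden)


# Demo on a throwaway tree (hypothetical file names).
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, '.hidden'))
    os.makedirs(os.path.join(root, 'sub'))
    for rel in ('a.jpg', 'b.txt', os.path.join('.hidden', 'c.png'),
                os.path.join('sub', 'd.PNG')):
        open(os.path.join(root, rel), 'w').close()

    found = sorted(os.path.basename(p) for p in get_images(root))
    found_all = sorted(os.path.basename(p)
                       for p in get_images(root, include_hidden=True))

print(found)
print(found_all)
```

Note that only hidden directories are skipped; a hidden file sitting in a visible directory is still returned, as in the functions above.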
Re: newb question: file searching
[EMAIL PROTECTED] wrote: I do appreciate the advice, but I've got a 12 line function that does all of that. And it works! I just wish I understood a particular line of it. You miss the point. The functions I posted, up until get_files_by_ext which is the equivalent of your getFileList, total 17 actual lines. The 5 extra lines give 3 extra features. Maybe in a while when you need to do a similar file search you will realize why my way is better. [snip] The line I don't understand is: reversed(range(len(dirnames))) This is why I wrote and documented a separate remove_hidden function, it can be tricky. If you broke it up into multiple lines, and added print statements it would be clear what it does. l = len(dirnames) # l is the number of elements in dirnames, e.g. 6 r = range(l) # r contains the numbers 0,1,2,3,4,5 rv = reversed(r) # rv contains the numbers 5,4,3,2,1,0 The problem arises from how to remove elements in a list as you are going through it. If you delete element 0, element 1 then becomes element 0, and funny things happen. That particular solution is relatively simple, it just deletes elements from the end instead. That complicated expression arises because python doesn't have normal for loops. The version of remove_hidden I wrote is simpler, but relies on the even more obscure lst[:] construct for re-assigning a list. Both of them accomplish the same thing though, so if you wanted, you should be able to replace those 3 lines with just dirnames[:] = [d for d in dirnames if not d.startswith('.')] -- - Justin -- http://mail.python.org/mailman/listinfo/python-list
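To make the deletion problem concrete, here is a small Python 3 sketch. The reversed loop is only a guess at what the snipped three lines looked like, and the directory names are invented.

```python
def remove_hidden_reversed(dirnames):
    # Walk the indexes from the end, so deleting an element never
    # shifts an element we have not looked at yet.
    for i in reversed(range(len(dirnames))):
        if dirnames[i].startswith('.'):
            del dirnames[i]


def remove_hidden_slice(dirnames):
    # Slice assignment replaces the contents of the same list object.
    dirnames[:] = [d for d in dirnames if not d.startswith('.')]


a = ['.git', '.svn', 'src', '.cache', 'docs']
b = list(a)
remove_hidden_reversed(a)
remove_hidden_slice(b)

# The naive forward loop skips '.svn': after '.git' is removed,
# '.svn' slides into index 0, which the loop has already passed.
naive = ['.git', '.svn', 'src']
for name in naive:
    if name.startswith('.'):
        naive.remove(name)

print(a, b, naive)
```

Both in-place versions leave the caller's list object intact, which is what os.walk needs in order to notice the change.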
Re: technique to enter text using a mobile phone keypad (T9 dictionary-based disambiguation)
Petr Jake wrote: I have a standard 12-key mobile phone keypad connected to my Linux machine as a I2C peripheral. I would like to write a code which allows the text entry to the computer using this keypad (something like T9 on the mobile phones) According to the http://www.yorku.ca/mack/uist01.html dictionary-based disambiguation is coming in the mind. With dictionary-based disambiguation, each key is pressed only once. For example, to enter the, the user enters 8-4-3-0. The 0 key, for SPACE, delimits words and terminates disambiguation of the preceding keys. The key sequence 8-4-3 has 3 × 3 × 3 = 27 possible renderings (see Figure 1). The system compares the possibilities to a dictionary of words to guess the intended word. I would like to ask some guru here to give me the direction which technique (Python functionality) or which strategy to use to solve this riddle. Thanks for your advices and comments Regards Petr Jakes I can think of 2 approaches to this, 1) Map the numbers to parts of a regular expression, and then use this to search through the dictiionary. 2) Pre-compute a copy of the dictionary converted to it's numerical equivalent, then just match the numbers. The basic structure you need for both of these is simple. For the first method you use keys = ['','abc','def','ghi','] then if you have s=123321 ''.join(['[%s]' % keys[int(l)] for l in s]) will give you a string like '[abc][def][ghi][def][abc]', which you can then use to match words... I think the second solution would end up being faster, as long as you have the memory - no regex work, plus, you can sort the wordlist. 
The following quickly written class seems to work nicely:

    import string
    import bisect

    letters = string.lowercase
    # one digit per letter: abc=2 def=3 ghi=4 jkl=5 mno=6 pqrs=7 tuv=8 wxyz=9
    numbers = '22233344455566677778889999'
    letter_mapping = dict(zip(letters, numbers))

    class phone:
        def __init__(self):
            self.read_dictionary()

        def word_as_numbers(self, word):
            nums = ''
            for letter in word:
                if letter in letter_mapping:
                    nums += letter_mapping[letter]
            return nums

        def read_dictionary(self):
            words = []
            for line in file("/usr/share/dict/words"):
                word = line.strip().lower()
                nums = self.word_as_numbers(word)
                words.append((nums, word))

            words.sort()
            self.dict = words

        def get_matching_words(self, number_str):
            tup = (number_str,)
            left = bisect.bisect_left(self.dict, tup)

            for num, word in self.dict[left:]:
                if num.startswith(number_str):
                    yield word
                else:
                    break

It takes a second or two to read the list of words in, but matching is instant thanks to bisect:

    In [14]:%time p=phone.phone()
    CPU times: user 1.65 s, sys: 0.00 s, total: 1.65 s
    Wall time: 1.66

    In [15]:%time list(p.get_matching_words('43556'))
    CPU times: user 0.00 s, sys: 0.00 s, total: 0.00 s
    Wall time: 0.01
    Out[15]:['hello', 'hellman', "hellman's", "hello's", 'hellos']

It seems the ruby version just posted takes a similar approach, but uses an actual tree.. using the bisect module keeps it simple.

--
- Justin
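Here is a self-contained Python 3 sketch of the same bisect idea that can be run without /usr/share/dict/words. The short word list is invented, and the 26-character digit string is my assumption about the intended keypad layout (abc=2 ... wxyz=9).

```python
import bisect
import string

letters = string.ascii_lowercase
# Assumed standard keypad: abc=2 def=3 ghi=4 jkl=5 mno=6 pqrs=7 tuv=8 wxyz=9
numbers = '22233344455566677778889999'
letter_mapping = dict(zip(letters, numbers))


def word_as_numbers(word):
    """Convert a word to its key sequence, ignoring non-letters."""
    return ''.join(letter_mapping[c] for c in word.lower()
                   if c in letter_mapping)


class Phone:
    def __init__(self, words):
        # Sorted (digits, word) pairs keep every prefix match contiguous.
        self.entries = sorted((word_as_numbers(w), w) for w in words)

    def get_matching_words(self, number_str):
        # A 1-tuple sorts just before any 2-tuple with the same first item.
        left = bisect.bisect_left(self.entries, (number_str,))
        for nums, word in self.entries[left:]:
            if not nums.startswith(number_str):
                break
            yield word


# Hypothetical miniature dictionary.
p = Phone(['hello', 'help', 'world', 'the', 'tie', 'good', 'home', 'gone'])
print(list(p.get_matching_words('843')))
print(list(p.get_matching_words('4663')))
print(list(p.get_matching_words('43')))
```

Because matching is by prefix, a partial key sequence already lists the candidate completions, which is roughly what a phone shows while you are still typing.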
Re: Why do I require an elif statement here?
danielx wrote: I'm surprised no one has mentioned neat-er, more pythonic ways of doing this. I'm also surprised no one mentioned regular expressions. Regular expressions are really powerful for searching and manipulating text. [snip] I'm surprised you don't count my post as a neat and pythonic way of doing this. I'm also surprised that you mention regular expressions after neat and pythonic. While regular expressions often serve a purpose, they are rarely neat. Anyway, here's my solution, which does Not use regular expressions: def reindent(line): ## we use slicing, because we don't know how long line is head = line[:OLD_INDENT] tail = line[OLD_INDENT:] ## if line starts with Exactly so many spaces... if head == whitespace*OLD_INDENT and not tail.startswith(' '): return whitespace*NEW_INDENT + tail else: return line# our default [snip] This function is broken. Not only does it still rely on global variables to work, it does not actually reindent lines correctly. Your function only changes lines that start with exactly OLD_INDENT spaces, ignoring any lines that start with a multiple of OLD_INDENT. -- - Justin -- http://mail.python.org/mailman/listinfo/python-list
Re: Why do I require an elif statement here?
Jim wrote:
> Could somebody tell me why I need the elif char == '\n' in the following code? This is required in order to pick up lines with just spaces in them. Why doesn't the else: statement pick this up?

No idea. Look at the profile of your program: for.. if.. for.. if.. else.. if.. This is NOT good. The reason why you are having trouble getting it to work is that you are not writing it in a way that is easy to debug and test. If one block of code ends up being indented halfway across the screen it means you are doing something wrong.

This program should be split up into a handful of small functions that each do one thing. The following is slightly longer, but immensely simpler. Most importantly, it can be imported from the python shell and each function can be tested individually.

    import sys

    def leading_spaces(line):
        """Return the number of leading spaces"""
        num = 0
        for char in line:
            if char != ' ':
                break
            num += 1
        return num

    def change_indent(line, old, new):
        """Change the indent of this line using a ratio of old:new"""
        ws = leading_spaces(line)
        #if there was no leading whitespace,
        #or it wasn't a multiple of the old indent, do nothing
        if ws == 0 or ws % old:
            return line
        #otherwise change the indent
        new_spaces = ws/old*new
        new_indent = ' ' * new_spaces
        return new_indent + line.lstrip(' ')

    def reindent(ifname, ofname, old, new):
        f = open(ifname)
        o = open(ofname, 'w')

        for line in f:
            line = change_indent(line, old, new)
            o.write(line)

        f.close()
        o.close()

    if __name__ == "__main__":
        try:
            ifname, ofname, old, new = sys.argv[1:]
            old = int(old)
            new = int(new)
        except ValueError:
            print "blah"
            sys.exit(1)

        reindent(ifname, ofname, old, new)
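As a sketch of what "each function can be tested individually" looks like in practice, here are the two core functions in Python 3 form (note the integer division `//`), followed by a few checks on invented sample lines.

```python
def leading_spaces(line):
    """Return the number of leading spaces."""
    num = 0
    for char in line:
        if char != ' ':
            break
        num += 1
    return num


def change_indent(line, old, new):
    """Change the indent of this line using a ratio of old:new."""
    ws = leading_spaces(line)
    # No leading whitespace, or not a multiple of the old indent:
    # leave the line alone.
    if ws == 0 or ws % old:
        return line
    # Python 3 needs // here to keep the result an integer.
    return ' ' * (ws // old * new) + line.lstrip(' ')


# Sample lines made up for the demonstration.
print(repr(change_indent('        x = 1\n', 4, 2)))  # 8 spaces become 4
print(repr(change_indent('   odd\n', 4, 2)))         # 3 spaces: unchanged
print(repr(change_indent('top\n', 4, 2)))            # no indent: unchanged
```

A line holding nothing but spaces and a newline goes through the same path as any other line, so no special elif is needed for it.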
Re: help - iter dict
[EMAIL PROTECTED] wrote:
> Im trying to iterate through values in a dictionary so i can find the closest value and then extract the key for that value. What ive done so far:
[snip]
> short time. I was trying to define a function (its my first!) so that i could apply to several 'dictionary's and 'exvalue's.
[snip]

If you plan on searching a single dictionary for many values, it may be much faster to convert the dictionary into a sorted list, and use the bisect module to find the closest value... something like:

    import bisect

    class closer_finder:
        def __init__(self, dataset):
            self.dataset = dataset
            flat = [(v, k) for k, v in dataset.iteritems()]
            flat.sort()
            self.flat = flat

        def __getitem__(self, target):
            flat = self.flat
            index = bisect.bisect_right(flat, (target, ))
            #simple cases, smaller than the smallest,
            #or larger than the largest
            if index == 0:
                v, k = flat[0]
                return k, v
            elif index == len(flat):
                v, k = flat[-1]
                return k, v

            #otherwise see which of the neighbors is closest.
            leftval, leftkey = flat[index-1]
            rightval, rightkey = flat[index]

            leftdiff = abs(leftval - target)
            rightdiff = abs(rightval - target)

            if leftdiff <= rightdiff:
                return leftkey, leftval
            else:
                return rightkey, rightval

    In [158]:sample_data
    Out[158]:{'a': 1, 'c': 6, 'b': 3}

    In [159]:d=closer_finder(sample_data)

    In [160]:d.flat
    Out[160]:[(1, 'a'), (3, 'b'), (6, 'c')]

    In [161]:d[4]
    Out[161]:('b', 3)

--
- Justin
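A Python 3 sketch of the same lookup, for anyone who wants to poke at the edge cases. The class name CloserFinder is just a renamed copy of the class above, and the sample data is the same three-item dictionary.

```python
import bisect


class CloserFinder:
    def __init__(self, dataset):
        # Sorted (value, key) pairs, so bisect can search by value.
        self.flat = sorted((value, key) for key, value in dataset.items())

    def __getitem__(self, target):
        flat = self.flat
        index = bisect.bisect_right(flat, (target,))
        if index == 0:
            # Target is smaller than every stored value.
            value, key = flat[0]
        elif index == len(flat):
            # Target is larger than every stored value.
            value, key = flat[-1]
        else:
            # Pick whichever neighbour is nearer; ties go to the left.
            left, right = flat[index - 1], flat[index]
            if abs(left[0] - target) <= abs(right[0] - target):
                value, key = left
            else:
                value, key = right
        return key, value


d = CloserFinder({'a': 1, 'c': 6, 'b': 3})
print(d[4], d[5], d[0], d[100])
```

Building the sorted list costs one sort up front; each lookup afterwards is a binary search rather than a scan of the whole dictionary.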
Re: Thread Question
Ritesh Raj Sarraf wrote: I'd like to put my understanding over here and would be happy if people can correct me at places. ok :-) So here it goes: Firstly the code initializes the number of threads. Then it moves on to initializing requestQueue() and responseQueue(). Then it moves on to thread_pool, where it realizes that it has to execute the function run(). From NUMTHREADS in the for loop, it knows how many threads it is supposed to execute parallelly. right... So once the thread_pool is populated, it starts the threads. Actually, it doesn't start the threads. Instead, it puts the threads into the queue. Neither.. it puts the threads into a list. It puts the queues into the thread - by passing them as arguments to the Thread constructor. Then the real iteration, about which I was talking in my earlier post, is done. The iteration happens in one go. And requestQueue.put(item) puts all the items from lRawData into the queue of the run(). It doesn't necessarily have put all the items into the queue at once. The previous line starts all the threads, which immediately start running while 1: item = request.get() the default Queue size is infinite, but the program would still work fine if the queue was fixed to say, 6 elements. Now that I think of it, it may even perform better... If you have an iterator that will generate a very large number of items, and the function being called by each thread is slow, the queue may end up growing to hold millions of items and cause the system to run out of memory. But there, the run() already known its limitation on the number of threads. run() doesn't know anything about threads. All it knows is that it can call request.get() to get an item to work on, and response.put() when finished. No, I think the above statement is wrong. The actual pool about the number of threads is stored by thread_pool. 
Once its pool (at a time 3 as per this example) is empty, it again requests for more threads using the requestQueue() The pool is never empty. The program works like a bank with 3 tellers. Each teller knows nothing about any of the other tellers, or how many people are waiting in the line. All they know is that when they say Next! (request.get()) another person steps in front of them. The tellers don't move, the line moves. And in function run(), when the item of lRawData is None, the thread stops. The the cleanup and checks of any remaining threads is done. Yes, since each thread doesn't know anything about the rest of the program, when you send it an empty item it knows to quit. It would be analogous to the bank teller saying Next! and instead of a customer, the bank mananger steps forward to tell them that they can go home for the day. Is this all correct ? Mostly :) When you understand it fully, you should look at the example I showed you before. It is essentially the same thing, just wrapped in a class to be reusable. -- - Justin -- http://mail.python.org/mailman/listinfo/python-list
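The bank-teller picture can be sketched in a few lines of Python 3. This is not the code from the thread being discussed, only a minimal stand-in: run() matches the description above, while process_all, the queue size of 6 and the squaring function are names and values chosen for the example.

```python
import queue
import threading

NUMTHREADS = 3


def run(request, response, func):
    """One teller: keep calling 'Next!' until the manager (None) shows up."""
    while True:
        item = request.get()
        if item is None:
            break
        response.put((item, func(item)))


def process_all(func, items, numthreads=NUMTHREADS):
    # A small bounded queue: put() simply waits while the line is full,
    # so a huge input never piles up in memory.
    request = queue.Queue(maxsize=6)
    response = queue.Queue()

    thread_pool = [threading.Thread(target=run,
                                    args=(request, response, func))
                   for _ in range(numthreads)]
    for t in thread_pool:
        t.start()

    count = 0
    for item in items:
        request.put(item)
        count += 1

    # One "go home" message per teller.
    for _ in thread_pool:
        request.put(None)
    for t in thread_pool:
        t.join()

    return dict(response.get() for _ in range(count))


results = process_all(lambda x: x * x, range(10))
print(results)
```

The threads live in a plain list, the queues are handed to each thread as arguments, and run() never needs to know how many other threads exist.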
Re: Thread Question
Ritesh Raj Sarraf wrote: [snip] for item in list_items: download_from_web(item) This way, one items is downloaded at a time. I'm planning to implement threads in my application so that multiple items can be downloaded concurrently. I want the thread option to be user-defined. [snip] See my post about the iterthreader module I wrote... http://groups.google.com/group/comp.lang.python/browse_frm/thread/2ef29fae28cf44c1/ for url, result in Threader(download_from_web, list_items): print url, result #... -- - Justin -- http://mail.python.org/mailman/listinfo/python-list
Re: splitting words with brackets
faulkner wrote:
> er, ...|\[[^\]]*\]|... ^_^

That's why it is nice to use re.VERBOSE:

    def splitup(s):
        return re.findall('''
            \( [^\)]* \) |
            \[ [^\]]* \] |
            \S+
            ''', s, re.VERBOSE)

Much less error prone this way.

--
- Justin
Re: splitting words with brackets
Paul McGuire wrote: Comparitive timing of pyparsing vs. re comes in at about 2ms for pyparsing, vs. 0.13 for re's, so about 15x faster for re's. If psyco is used (and we skip the first call, which incurs all the compiling overhead), the speed difference drops to about 7-10x. I did try compiling the re, but this didn't appear to make any difference - probably user error. That is because of how the methods in the sre module are implemented... Compiling a regex really just saves you a dictionary lookup. def findall(pattern, string, flags=0): snip return _compile(pattern, flags).findall(string) def compile(pattern, flags=0): snip return _compile(pattern, flags) def _compile(*key): # internal: compile pattern cachekey = (type(key[0]),) + key p = _cache.get(cachekey) if p is not None: return p #snip -- - Justin -- http://mail.python.org/mailman/listinfo/python-list
Re: Nested function scope problem
Josiah Manson wrote: I just did some timings, and found that using a list instead of a string for tok is significantly slower (it takes 1.5x longer). Using a regex is slightly faster for long strings, and slightly slower for short ones. So, regex wins in both berevity and speed! I think the list.append method of building strings may only give you speed improvements when you are adding bigger chunks of strings together instead of 1 character at a time. also: http://docs.python.org/whatsnew/node12.html#SECTION000121 String concatenations in statements of the form s = s + abc and s += abc are now performed more efficiently in certain circumstances. This optimization won't be present in other Python implementations such as Jython, so you shouldn't rely on it; using the join() method of strings is still recommended when you want to efficiently glue a large number of strings together. (Contributed by Armin Rigo.) I tested both, and these are my results for fairly large strings: [EMAIL PROTECTED]:/tmp$ python /usr/lib/python2.4/timeit.py -s'import foo' 'foo.test(foo.breakLine)' 10 loops, best of 3: 914 msec per loop [EMAIL PROTECTED]:/tmp$ python /usr/lib/python2.4/timeit.py -s'import foo' 'foo.test(foo.breakLineRE)' 10 loops, best of 3: 289 msec per loop -- - Justin -- http://mail.python.org/mailman/listinfo/python-list
Re: Nested function scope problem
Bruno Desthuilliers wrote:
> Justin Azoff wrote:
>> if len(tok) > 0:
>> should be written as
>> if(tok):
>
> actually, the parentheses are useless.

Yes, that's what happens when you edit something instead of typing it over from scratch :-)

--
- Justin
Re: Python newbie needs constructive suggestions
[EMAIL PROTECTED] wrote: What is the idiomatically appropriate Python way to pass, as a function-type parameter, code that is most clearly written with a local variable? For example, map takes a function-type parameter: map(lambda x: x+1, [5, 17, 49.5]) What if, instead of just having x+1, I want an expression that is most clearly coded with a variable that is needed _only_ inside the lambda, e.g. if I wanted to use the name one instead of 1: map(lambda x: (one = 1 x+one), [5, 17, 49.5]) I believe most people would just write something like this: def something(): #local helper function to add one to a number def addone(x): one = 1 return x+one return map(addone, [5, 17, 49.5]) -- - Justin -- http://mail.python.org/mailman/listinfo/python-list
Re: Nested function scope problem
Simon Forman wrote:
> That third option seems to work fine.

Well it does, but there are still many things wrong with it.

    if len(tok) > 0:

should be written as

    if(tok):

and

    tok = ''
    tok = tok + c

should be written as

    tok = []
    tok.append(c)

and later

    ''.join(tok)

Anyway, the entire thing should be replaced with something like this:

    import re

    def breakLine(s):
        splitters = '?()|:~,'
        chars = '^ \t\n\r\f\v%s' % splitters
        regex = '''(
            (?:[%s])
            |
            (?:[%s]+))''' % (splitters, chars)
        return re.findall(regex, s, re.VERBOSE)

That should be able to be simplified even more if one were to use the character lists built into the regex standard.

--
- Justin
Re: Dictionary question
Brian Elmegaard wrote:
> for a, e in l[-2].iteritems():
>     # Can this be written better?
>     if a+c in l[-1]:
>         if l[-1][a+c]>x+e:
>             l[-1][a+c]=x+e
>     else:
>         l[-1][a+c]=x+e
>     #

I'd start with something like

    for a, e in l[-2].iteritems():
        keytotal = a+c
        valtotal = x+e
        last = l[-1]
        if keytotal in last:
            if last[keytotal] > valtotal:
                last[keytotal] = valtotal
        else:
            last[keytotal] = valtotal

Could probably simplify that even more by using min(), but I don't know what kind of data you are expecting.

    last[keytotal] = min(last.get(keytotal), valtotal)

comes close to working - it would if you were doing max.

--
- Justin
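One way to make the min() version work is to give get() a default, so a missing key never produces None. This is a sketch in Python 3; update_min and the sample numbers are invented for the example.

```python
def update_min(last, keytotal, valtotal):
    # If the key is missing, fall back to valtotal itself, so min()
    # always compares two real numbers and never sees None.
    last[keytotal] = min(last.get(keytotal, valtotal), valtotal)


last = {5: 10}
update_min(last, 5, 7)    # existing key, smaller value: replaced
update_min(last, 5, 12)   # existing key, larger value: kept at 7
update_min(last, 8, 3)    # new key: inserted
print(last)
```

With the default in place the if/else on membership disappears, and the loop body shrinks to a single assignment.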
Re: Howto Determine mimetype without the file name extension?
Phoe6 wrote: Hi all, I had a filesystem crash and when I retrieved the data back the files had random names without extension. I decided to write a script to determine the file extension and create a newfile with extension. [...] but the problem with using file was it recognized both .xls (MS Excel) and .doc ( MS Doc) as Microsoft Word Document only. I need to separate the .xls and .doc files, I dont know if file will be helpful here. You may want to try the gnome.vfs module: info = gnome.vfs.get_file_info(filename, gnome.vfs.FILE_INFO_GET_MIME_TYPE) info.mime_type #mime type If all of your documents are .xls and .doc, you could also use one of the cli tools that converts .doc to txt like catdoc. These tools will fail on an .xls document, so if you run it and check for output. .doc files would output a lot, .xls files would output an error or nothing. The gnome.vfs module is probably your best bet though :-) Additionally, I would re-organize your program a bit. something like: import os import re import subprocess types = ( ('rtf', 'Rich Text Format data'), ('doc', 'Microsoft Office Document'), ('pdf', 'PDF'), ('txt', 'ASCII English text'), ) def get_magic(filename): pipe=subprocess.Popen(['file',filename],stdout=subprocess.PIPE) output = pipe.stdout.read() pipe.wait() return output def detext(filename): fileoutput = get_magic(filename) for ext, pattern in types: if pattern in fileoutput: return ext def allfiles(path): for root,dirs,files in os.walk(os.getcwd()): for each in files: fname = os.path.join(root,each) yield fname def fixnames(path): for fname in allfiles(path): extension = detext(fname) print fname, extension # def main(): path = os.getcwd() fixnames(path) if __name__ == '__main__': main() Short functions that just do one thing are always best. 
To change that to use gnome.vfs, just change the types list to be a dictionary like types = { 'application/msword': 'doc', 'application/vnd.ms-powerpoint': 'ppt', } and then def get_mime(filename): info = gnome.vfs.get_file_info(filename, gnome.vfs.FILE_INFO_GET_MIME_TYPE) return info.mime_type def detext(filename): mime_type = get_mime(filename) return types.get(mime_type) -- - Justin -- http://mail.python.org/mailman/listinfo/python-list
Re: compiling 2.3.5 on ubuntu
Steve Holden wrote:
> I'm guessing because (s)he wants to test programs on less recent versions of Python. Ubuntu 5.10 was already up to Python 2.4.2, so I can't imagine there's anything older on Ubuntu 6.06.
>
> regards
> Steve

Both are available...

--
- Justin
RFC: my iterthreader module
I have this iterthreader module that I've been working on for a while now. It is similar to itertools.imap, but it calls each function in its own thread and uses Queues for moving the data around. A better name for it would probably be ithreadmap, but anyway... The short explanation of it is if you have a loop like for item in biglist: print The value for %s is %s % (item, slowfunc(item)) or for item,val in ((item, slowfunc(item)) for item in biglist): print The value for %s is %s % (item, val) you can simply rewrite it as for item,val in iterthreader.Threader(slowfunc, biglist): print The value for %s is %s % (item, val) and it will hopefully run faster. The usual GIL issues still apply of course You can also subclass it in various ways, but I almost always just call it in the above manner. So, can anyone find any obvious problems with it? I've been meaning to re-post [1] it to the python cookbook, but I'd like to hear what others think first. I'm not aware of any other module that makes this particular use of threading this simple. 
[1] I _think_ I posted it before, but that may have just been in a comment

    import threading
    import Queue

    class Threader:
        def __init__(self, func=None, data=None, numthreads=2):
            if not numthreads > 0:
                raise AssertionError("numthreads should be greater than 0")

            if func:
                self.handle_input = func
            if data:
                self.get_input = lambda: data

            self._numthreads = numthreads
            self.threads = []
            self.run()

        def __iter__(self):
            return self

        def next(self):
            still_running, input, output = self.DQ.get()
            if not still_running:
                raise StopIteration
            return input, output

        def get_input(self):
            raise NotImplementedError, "You must implement get_input as a function that returns an iterable"

        def handle_input(self, input):
            raise NotImplementedError, "You must implement handle_input as a function that returns anything"

        def _handle_input(self):
            while 1:
                work_todo, input = self.Q.get()
                if not work_todo:
                    break
                self.DQ.put((True, input, self.handle_input(input)))

        def cleanup(self):
            """wait for all threads to stop and tell the main iter to stop"""
            for t in self.threads:
                t.join()
            self.DQ.put((False, None, None))

        def run(self):
            self.Q = Queue.Queue()
            self.DQ = Queue.Queue()
            for x in range(self._numthreads):
                t = threading.Thread(target=self._handle_input)
                t.start()
                self.threads.append(t)

            try:
                for x in self.get_input():
                    self.Q.put((True, x))
            except NotImplementedError, e:
                print e
            for x in range(self._numthreads):
                self.Q.put((False, None))

            threading.Thread(target=self.cleanup).start()

--
- Justin
Re: compiling 2.3.5 on ubuntu
Py PY wrote:
> (Apologies if this appears twice. I posted it yesterday and it was held due to a 'suspicious header')
>
> I'm having a hard time trying to get a couple of tests to pass when compiling Python 2.3.5 on Ubuntu Server Edition 6.06 LTS. I'm sure it's not too far removed from the desktop edition but, clearly, I need to tweak something or install some missing libs.

Why are you compiling a package that is already built for you?

--
- Justin
Re: Deferred imports
Tom Plunket wrote: I'm using this package that I can't import on startup, instead needing to wait until some initialization takes place so I can set other things up so that I can subsequently import the package and have the startup needs of that package met. [...] So as y'all might guess, I have the solution in this sort of thing: import os global myDeferredModule class MyClass: def __init__(self): os.environ['MY_DEFERRED_MODULE_PARAM'] = str(self) global myDeferredModule import myDeferredModule m = MyClass() HOWEVER, my problem now comes from the fact that other modules need to use this import. So, I need to either use the above trick, or I need to just import the module in every function that needs it (which will be almost exclusively the constructors for the objects as they're the ones creating Pygame objects). So my question is, is 'import' heavy enough to want to avoid doing it every time an object is created, or is it fairly light if it's currently active somewhere else in the application? I suppose that may depend on how many objects I'm creating, and how frequently I'm creating them, but if 'import' resolves to essentially if not global_imports.has_key(module): do_stuff_to_import_module Importing a module repeatedly certainly won't help the application run any _faster_, but you may not notice the difference. Especially if this particular block of code does not run in a tight loop. It does work like you described, the specific dictionary is sys.modules: print file('a.py').read() print THIS IS A! 'a' in sys.modules False import a THIS IS A! 'a' in sys.modules True import a #will not execute a.py again ...then I'm not so worried about putting it into every constructor. Otherwise I'll do this trick, starting myDeferredModule = None and only do the import if not None. The extra check for not None probably wouldn't be much faster than the check import does in sys.modules. 
Just calling import in the constructor will also be easier for someone else to understand :-) Thanks! -tom! pypy has a neat lazy loading importer, you could see how they implement it, I think it is just something like class defer: def __getattr__(self, attr): return __import__(attr) defer = defer() then defer.foo will import foo for you. -- - Justin -- http://mail.python.org/mailman/listinfo/python-list
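Both points can be checked quickly in Python 3. This is only a sketch of the idea described above, not pypy's actual importer; the class name Defer is mine, and importlib.import_module stands in for the bare __import__ call.

```python
import importlib
import sys


class Defer:
    """Import a module the first time it is looked up as an attribute."""

    def __getattr__(self, name):
        # import_module consults sys.modules first, so a module that is
        # already loaded is returned without running its code again.
        return importlib.import_module(name)


defer = Defer()

root = defer.math.sqrt(16)           # math is imported on first use
same_object = defer.math is defer.math
cached = 'math' in sys.modules

print(root, same_object, cached)
```

A repeated import of an already-loaded module boils down to a dictionary lookup in sys.modules, which is why putting the import inside a constructor is usually cheap.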
Re: String handling and the percent operator
Tom Plunket wrote:
> boilerplate = \
> [big string]
>
> return boilerplate % ((module,) * 3)
>
> My question is, I don't like hardcoding the number of times that the module name should be repeated in the two return functions. Is there a straightforward (inline-appropriate) way to count the number of '%s'es in the 'boilerplate' strings? ...or maybe a different and more Pythonic way to do this? (Maybe I could somehow use generators?)
>
> thx. -tom!

Of course..

    >>> stuff = {'lang': 'python', 'page': 'typesseq-strings.html'}
    >>> print """I should read the %(lang)s documentation at
    ... http://docs.%(lang)s.org/lib/%(page)s""" % stuff
    I should read the python documentation at
    http://docs.python.org/lib/typesseq-strings.html

--
- Justin
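Applied to the boilerplate case, the same trick might look like the following Python 3 sketch. The template text and the function name render are invented, since the original big string was not posted.

```python
# Hypothetical template: the name is used three times, but the caller
# never has to know or count that.
boilerplate = (
    "class %(module)s:\n"
    "    name = '%(module)s'\n"
    "    # end of %(module)s\n"
)


def render(module):
    # A mapping on the right of % fills every %(module)s placeholder.
    return boilerplate % {'module': module}


text = render('Widget')
print(text)
```

Adding a fourth or fifth use of the name to the template then needs no change to the calling code.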
Re: Regular Expression problem
John Blogger wrote:
> That I want a particular tag value of one of my HTML files. ie: I want only the value after 'href=' in the tag '<link href="mystylesheet.css" rel="stylesheet" type="text/css">' here it would be 'mystylesheet.css'. I used the following regex to get this value (I don't know if it is good).

No matter how good it is you should still use something that understands html:

    >>> from BeautifulSoup import BeautifulSoup
    >>> html = '<link href="mystylesheet.css" rel="stylesheet" type="text/css">'
    >>> page = BeautifulSoup(html)
    >>> page.link.get('href')
    'mystylesheet.css'

-- - Justin
Re: Regular Expression problem
Justin Azoff wrote:
>     >>> from BeautifulSoup import BeautifulSoup
>     >>> html = '<link href="mystylesheet.css" rel="stylesheet" type="text/css">'
>     >>> page = BeautifulSoup(html)
>     >>> page.link.get('href')
>     'mystylesheet.css'

On second thought, you will probably want something like

    >>> [link.get('href') for link in page.fetch('link', {'type': 'text/css'})]
    ['mystylesheet.css']

which will properly handle multiple link tags.

-- - Justin
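If installing BeautifulSoup is not an option, roughly the same result can be had from the standard library. The sketch below assumes Python 3 and swaps in html.parser instead of BeautifulSoup; the StylesheetCollector class and the second link tag are made up for illustration.

```python
from html.parser import HTMLParser


class StylesheetCollector(HTMLParser):
    """Hypothetical helper: collect href values of text/css <link> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs for the tag.
        attributes = dict(attrs)
        if tag == "link" and attributes.get("type") == "text/css":
            href = attributes.get("href")
            if href is not None:
                self.hrefs.append(href)


html = (
    '<link href="mystylesheet.css" rel="stylesheet" type="text/css">'
    '<link href="print.css" rel="stylesheet" type="text/css">'
)
collector = StylesheetCollector()
collector.feed(html)
print(collector.hrefs)
```

Like the list comprehension above, this handles any number of link tags in the page.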
Re: how can I avoid abusing lists?
Thomas Nelson wrote:
> This is exactly what I want to do: every time I encounter this kind of value in my code, increment the appropriate type by one. Then I'd like to go back and find out how many of each type there were. This way I've written seems simple enough and effective, but it's very ugly and I don't think it's the intended use of lists. Does anyone know a cleaner way to have the same functionality? Thanks, THN

Just assign each type a number (type1 -> 1, type2 -> 2) and then count the values as usual

    def count(map, it):
        d = {}
        for x in it:
            x = map[x]  # only difference from normal count function
            #d[x] = d.get(x, 0) + 1
            if x in d:
                d[x] += 1
            else:
                d[x] = 1
        return d

    >>> map = {0:1, 1:1, 2:3, 3:1, 4:2}
    >>> count(map, [1,1,0,4])
    {1: 3, 2: 1}
    >>> for x in count(map, [1,1,0,4]).items():
    ...     print 'type%d: %d' % x
    ...
    type1: 3
    type2: 1
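On Python 3 the same counting can be sketched with collections.Counter, which was not available when this was posted. The names value_to_type and count_types are invented here; the mapping is the one from the example above.

```python
from collections import Counter

# Mapping from raw value to type number, as in the post above.
value_to_type = {0: 1, 1: 1, 2: 3, 3: 1, 4: 2}


def count_types(mapping, values):
    # Translate each value to its type, then let Counter do the tallying.
    return Counter(mapping[v] for v in values)


counts = count_types(value_to_type, [1, 1, 0, 4])
for type_number, total in sorted(counts.items()):
    print("type%d: %d" % (type_number, total))
```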
Re: Built-in Exceptions - How to Find Out Possible Errno's
Gregory Piñero wrote:
> Hi Guys, I'm sure this is documented somewhere, I just can't locate it. Say I have this code:
>
>     try:
>         myfile=file('greg.txt','r')
>     except IOError, error:
>         [...]
>
> So basically I'm looking for the document that tells me what possible errors I can catch and their numbers. I did find this but it doesn't have numbers and I can't tell if it's even what I'm looking for: http://docs.python.org/lib/module-errno.html Much thanks!

that IS the module you are looking for.

    >>> help(errno)
    [...]
    DESCRIPTION
        The value of each symbol is the corresponding integer value, e.g., on most systems, errno.ENOENT equals the integer 2.
    [...]
        ENODATA = 61
        ENODEV = 19
        ENOENT = 2
        ENOEXEC = 8

all those E* constants ARE the numbers. furthermore, the object you get back from except has both the code and the string already:

    >>> e
    <exceptions.IOError instance at 0xb7ddbfec>
    >>> print e
    [Errno 2] No such file or directory: 'foo'
    >>> dir(e)
    ['__doc__', '__getitem__', '__init__', '__module__', '__str__', 'args', 'errno', 'filename', 'strerror']
    >>> e.strerror
    'No such file or directory'

-- - Justin
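A small sketch of how the pieces fit together, assuming Python 3 (where IOError is an alias of OSError). The file name is a placeholder chosen so that it should not exist.

```python
import errno
import os

caught = None
try:
    open("no-such-file-for-this-example.txt")
except OSError as error:  # IOError is an alias of OSError in Python 3
    caught = error
    # Compare against the symbolic constant rather than a literal number.
    print(error.errno == errno.ENOENT)
    # errno.errorcode maps the number back to its symbolic name.
    print(errno.errorcode[error.errno])
    # os.strerror gives the message that e.strerror carries.
    print(os.strerror(error.errno))
```

Comparing with errno.ENOENT instead of the bare number keeps the code readable and portable across systems where the numbers differ.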
Re: smtplib problem for newbie
Noah Gift wrote:
> [snip]
>     a = long(time.time() * 256) # use fractional seconds
> TypeError: 'module' object is not callable

Part of your program includes a file or directory that you called 'long'. You should not re-use names of built-ins in your programs; doing so causes errors like the above. see:

    >>> long('12')
    12L
    >>> open("long.py", 'w')
    <open file 'long.py', mode 'w' at 0x401e3380>
    >>> import long
    >>> long('12')
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    TypeError: 'module' object is not callable

-- - Justin
Re: Python to PHP Login System (HTTP Post)
Jeethu Rao wrote:
> You need to use httplib. http://docs.python.org/lib/httplib-examples.html Jeethu Rao

Not at all. They need to read the documentation for urllib:

http://docs.python.org/lib/module-urllib.html
http://docs.python.org/lib/node483.html ("The following example uses the POST method instead:")

Additionally, they probably need to use cookielib, otherwise the logged-in state will not be persistent.

-- - Justin
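A rough Python 3 sketch of that combination, where urllib.request and http.cookiejar are the renamed descendants of urllib and cookielib. The URL and form field names are placeholders, not taken from the original thread, and the request is only built here, not sent.

```python
import http.cookiejar
import urllib.parse
import urllib.request

# Placeholder URL and field names; substitute whatever the real form uses.
LOGIN_URL = "http://example.com/login.php"
form = {"username": "alice", "password": "secret"}

# An opener with a cookie processor remembers Set-Cookie headers and sends
# them back on later requests made through the same opener.
cookies = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cookies))

# Supplying a data argument is what turns the request into a POST.
body = urllib.parse.urlencode(form).encode("ascii")
request = urllib.request.Request(LOGIN_URL, data=body)
print(request.get_method())

# response = opener.open(request)  # uncomment to actually send the request
```

Every later page fetched through the same opener carries the session cookie, which is what keeps the login state alive.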
Re: newb: comapring two strings
manstey wrote:
> Hi, Is there a clever way to see if two strings of the same length vary by only one character, and what the character is in both strings. E.g. str1=yaqtil str2=yaqtel they differ at str1[4] and the difference is ('i','e')

something like this maybe?

    >>> str1 = 'yaqtil'
    >>> str2 = 'yaqtel'
    >>> set(enumerate(str1)) ^ set(enumerate(str2))
    set([(4, 'e'), (4, 'i')])

-- - Justin
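The set trick loses track of which character came from which string. An alternative sketch (Python 3) that keeps that information uses zip; the function name differences is made up for illustration.

```python
def differences(str1, str2):
    """Return (index, char_in_str1, char_in_str2) for each differing position."""
    if len(str1) != len(str2):
        raise ValueError("strings must have the same length")
    return [(i, a, b) for i, (a, b) in enumerate(zip(str1, str2)) if a != b]


diffs = differences("yaqtil", "yaqtel")
print(diffs)
# The strings vary by only one character when exactly one entry comes back.
print(len(diffs) == 1)
```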
Re: best way to determine sequence ordering?
John Salerno wrote:
> If I want to make a list of four items, e.g. L = ['C', 'A', 'D', 'B'], and then figure out if a certain element precedes another element, what would be the best way to do that? Looking at the built-in list functions, I thought I could do something like:
>
>     if L.index('A') < L.index('D'):
>         # do some stuff

This actually performs pretty well since list.index is implemented in C. The obvious (to me) implementation of:

    def before(lst, a, b):
        for x in lst:
            if x == a:
                return True
            if x == b:
                return False

runs about 10-50 times faster than the double index method if I use psyco. Without psyco, it ends up being faster for the cases where a or b appears early on in the list, and the other appears towards the end.
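For reference, a runnable Python 3 version of the single-pass function next to the double index check. The ValueError at the end is an addition for illustration (the function as posted falls through and returns None when neither element is present), and no timing claims are made for modern interpreters.

```python
def before(lst, a, b):
    """Return True if a occurs before b in lst, scanning the list only once."""
    for x in lst:
        if x == a:
            return True
        if x == b:
            return False
    # Added for this sketch: fail loudly instead of returning None.
    raise ValueError("neither element is in the list")


L = ['C', 'A', 'D', 'B']
print(before(L, 'A', 'D'))
# The double index method from the question gives the same answer.
print(L.index('A') < L.index('D'))
```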
Re: a simple regex question
John Salerno wrote:
> Ok, I'm stuck on another Python challenge question. Apparently what you have to do is search through a huge group of characters and find a single lowercase character that has exactly three uppercase characters on either side of it. Here's what I have so far:
>
>     pattern = '([a-z][A-Z]{3}[a-z][A-Z]{3}[a-z])+'
>     print re.search(pattern, mess).groups()
>
> Not sure if 'groups' is necessary or not. Anyway, this returns one matching string, but when I put this letter in as the solution to the problem, I get a message saying "yes, but there are more", so assuming this means that there is more than one character with three caps on either side, is my RE written correctly to find them all? I didn't have the parentheses or + sign at first, but I added them to find all the possible matches, but still only one comes up. Thanks.

I don't believe you _need_ the parentheses or the + in that usage... Have a look at http://docs.python.org/lib/node115.html It should be obvious which method you need to use to find them all

-- - Justin
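As a hedged illustration of the difference between one match and all matches, the sketch below (Python 3) uses re.findall on a made-up sample string. The value of mess here is invented and far shorter than the real challenge data.

```python
import re

# Made-up sample text; the real challenge data is much longer.
mess = "aBCDeFGHi xYZWqRSTu ABCDeFGHi"

# The parentheses capture only the middle letter. The lowercase letters at
# both ends make sure there are exactly three capitals on each side.
pattern = r"[a-z][A-Z]{3}([a-z])[A-Z]{3}[a-z]"

# re.search stops at the first match; re.findall returns every
# non-overlapping match (here, the captured group of each one).
matches = re.findall(pattern, mess)
print(matches)
```

The last chunk of the sample is skipped because four capitals precede the lowercase letter, so the "exactly three" condition fails.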
Re: Counting number of each item in a list.
Bruno Desthuilliers wrote:
> And of course, I was right. My solution seems to be faster than Paul's one (but slower than bearophile's), be it on small, medium or large lists.

Your version is only fast on lists with a very small number of unique elements. Change mklist to have items = range(64) instead of the 9 item list, re-time it, and you will get better results:

    A100 (1 times): 7.63829684258
    B100 (1 times): 1.34028482437
    C100 (1 times): 0.812223911285
    A1 (100 times): 9.78499102592
    B1 (100 times): 1.26520299911
    C1 (100 times): 0.857560873032
    A100 (10 times): 87.6713900566
    B100 (10 times): 12.7302949429
    C100 (10 times): 8.35931396484

-- - Justin
Re: general coding issues - coding style...
Dylan Moreland wrote:
> I would look into one of the many Vim scripts which automatically fold most large blocks without the ugly {{{.

Who needs a script? set foldmethod=indent works pretty well for most python programs.
Re: append to the end of a dictionary
Magnus Lycka wrote:
>     orderedListOfTuples = [(k,mydict[k]) for k in sorted(mydict.keys())]
>     orderedListOfTuples = sorted(mydict.items())
>
> It's great that many people try to help out on comp.lang.python, the community won't survive otherwise, but I think it's important to test answers before posting them, unless you're sure about your answer. A wrong answer might actually be worse than no answer at all. I fear that newbies will just get scared off if they get a bunch of replies to their questions, and most are wrong.

indeed.

-- - Justin
Re: Multiway Branching
[EMAIL PROTECTED] wrote:
> I need to look at two-byte pairs coming from a machine, and interpret the meaning based on the relative values of the two bytes. In C I'd use a switch statement. Python doesn't have such a branching statement. I have 21 comparisons to make, and that many if/elif/else statements is clunky and inefficient. Since these data are coming from an OMR scanner at 9600 bps (or faster if I can reset it programmatically to 38K over the serial cable), I want a fast algorithm. The data are of the form:
>
>     if byte1 == 32 and byte2 == 32:
>         row_value = 0
>     elif byte1 == 36 and byte2 == 32:
>         row_value = "natural"
>     ...
>     elif byte1 == 32 and byte2 == 1:
>         row_value = 5
>     elif byte1 == 66 and byte2 == 32:
>         row_value = 0.167
>
> There are two rows where the marked response equates to a string and 28 rows where the marked response equates to an integer (1-9) or float of defined values. Suggestions appreciated. Rich
> --
> Richard B. Shepard, Ph.D. | Author of "Quantifying Environmental Impact Assessments Using Fuzzy Logic"
> Applied Ecosystem Services, Inc. (TM)
> http://www.appl-ecosys.com Voice: 503-667-4517 Fax: 503-667-8863

Use a dictionary:

    byte_values = {
        (32, 32): 0,
        (36, 32): 'natural',
        (32, 1): 5,
    }
    row_value = byte_values[byte1, byte2]

-- - Justin
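A slightly fuller sketch of the lookup (Python 3), with a default for pairs that are not in the table. Only the four pairs quoted in the question are listed; BYTE_VALUES and the row_value function are names chosen here for illustration.

```python
# Only the pairs quoted in the question; the remaining rows would be
# added the same way.
BYTE_VALUES = {
    (32, 32): 0,
    (36, 32): 'natural',
    (32, 1): 5,
    (66, 32): 0.167,
}


def row_value(byte1, byte2, default=None):
    # dict.get avoids a KeyError when the scanner sends an unknown pair.
    return BYTE_VALUES.get((byte1, byte2), default)


print(row_value(36, 32))
print(row_value(1, 1))
```

A dictionary lookup takes about the same time whichever pair arrives, whereas the if/elif chain tests the pairs one by one.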
Re: Number set type
You could use IPy... http://svn.23.nu/svn/repos/IPy/trunk/IPy.py is one location for it... I wonder where you get O(n) and O(n^2) from... CIDR blocks are all sequential.. All you need to store is the starting and ending address or length. Then any set operation only has to deal with 4 numbers, and should be literally a few lines of code with no loops.
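To make the "4 numbers" point concrete, here is a small sketch that uses the standard library ipaddress module (Python 3) in place of IPy. The helper names as_range and intersect, and the example networks, are made up for illustration.

```python
import ipaddress

a = ipaddress.ip_network("10.0.0.0/8")
b = ipaddress.ip_network("10.1.0.0/16")


def as_range(network):
    # A CIDR block is one contiguous run of addresses, so two integers
    # (first and last address) describe it completely.
    return int(network.network_address), int(network.broadcast_address)


def intersect(r1, r2):
    # Intersection of two closed intervals; None when they do not overlap.
    start, end = max(r1[0], r2[0]), min(r1[1], r2[1])
    return (start, end) if start <= end else None


# 10.1.0.0/16 lies inside 10.0.0.0/8, so the intersection is b itself.
print(intersect(as_range(a), as_range(b)) == as_range(b))
```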
Re: Number set type
Heiko Wundram wrote:
> Union of two IP4Ranges is simply normalizing a concatenated list of both IP4Range ranges. Normalizing takes O(log n)+O(n) = O(n) steps, where n is the number of ranges in the combined IP4Range.

I see now :-) If the ranges are sorted, I bet you could just iterate through both at the same time, merging intersecting ranges where possible.

> Intersection takes O(n^2) steps in my current implementation (which I know is mathematically correct), where n is max(n_1,n_2) where n_1 is the number of ranges in the first IP4Range and n_2 the number of ranges in the second IP4Range respectively. Intersecting two IP4Ranges can be done with fewer steps, and I think it could be done in O(n) in the case of normalized and sorted ranges, and I have a few ideas of myself, but I'm currently too lazy to try to prove them correct.

Yes.. if they are sorted, something like this should work:

    def intersection(self, other):
        ret = []
        ai = iter(self.ranges)
        bi = iter(other.ranges)
        try:
            a = ai.next()
            b = bi.next()
        except StopIteration:
            return IP4Range([])
        while 1:
            try:
                if a.intersects(b):
                    ret.append(a.intersection(b))
                    a = ai.next()
                    b = bi.next()
                elif a.start < b.start:
                    a = ai.next()
                else:
                    b = bi.next()
            except StopIteration:
                break
        return IP4Range(ret)

-- - Justin
Re: Number set type
Justin Azoff wrote:
> Yes.. if they are sorted, something like this should work:

Oops, that was almost right, but it would skip some ranges. This should always work:

        ...
        while 1:
            try:
                if a.intersects(b):
                    ret.append(a.intersection(b))
                    if a.end < b.end:
                        a = ai.next()
                    else:
                        b = bi.next()
                elif a.start < b.start:
                    a = ai.next()
                else:
                    b = bi.next()
            except StopIteration:
                break
        return RangeList(ret)

-- - Justin
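A self-contained Python 3 sketch of the same two-pointer idea, using plain (start, end) tuples with inclusive ends instead of the IP4Range objects from the thread. The function name and the sample ranges are invented, and it assumes both lists are already sorted and non-overlapping.

```python
def intersect_sorted(a_ranges, b_ranges):
    """Intersect two sorted lists of non-overlapping (start, end) ranges."""
    result = []
    i = j = 0
    while i < len(a_ranges) and j < len(b_ranges):
        a_start, a_end = a_ranges[i]
        b_start, b_end = b_ranges[j]
        start, end = max(a_start, b_start), min(a_end, b_end)
        if start <= end:
            result.append((start, end))
        # Advance whichever range ends first; the other one may still
        # overlap the next range on the opposite side.
        if a_end < b_end:
            i += 1
        else:
            j += 1
    return result


print(intersect_sorted([(0, 10), (20, 30)], [(5, 25), (28, 40)]))
```

Each step advances one of the two indexes, so the loop runs at most len(a_ranges) + len(b_ranges) times. That is the O(n) behaviour discussed above. Unlike the posted loop, this variant always advances by end point, including when the ranges do not intersect.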
Re: python coding contest
Tim Hochberg wrote:
> Note that in principle it's possible to encode the data for how to display a digit in one byte. Thus it's at least theoretically possible to condense all of the information about the string into a string that's 10 bytes long. In practice it turns out to be hard to do that, since a 10 byte string will generally have a representation that is longer than 10 bytes because of the way the escape sequences get printed out. As a result various people seem to be encoding the data in long integers of one sort or another. The data is then extracted using some recipe involving shifts and &s. -tim

I have a 163 character version (on 8 lines, haven't tried to compress it further) that does something like that.. the string ended up being printable enough to be included in the source unescaped. I think for most approaches, any space you save by using a string you lose after quoting it and using ord() to turn a character back into a number. I'm sure this particular method is a dead end, but it is a very interesting and probably unique solution :-)

-- - Justin
Re: python coding contest
Tim Hochberg wrote:
> In the 130's is definitely possible, but I haven't heard of anyone doing better than that.

I have a version that is 127, but only if you strip extra whitespace :-(

-- - Justin
Re: python coding contest
    >>> c = open("seven_seg.py").read()
    >>> len(c)
    251
    >>> len(c.replace(" ", ""))
    152

:-) Knowing me, I'll forget to submit it.
Re: Some simple performace tests (long)
How much RAM does your machine have? The main point is "except when a very large range is used on a memory-starved machine". Run x = range(10 ** 6) and look at the memory usage of python.. what happens when you run this program:

    import time

    def t(func, num):
        s = time.time()
        for x in func(num):
            pass
        return time.time() - s

    def run(func, num):
        times = []
        for x in range(5):
            times.append(t(func, num))
        return min(times), max(times), sum(times)/5

    def main():
        x = 10 ** 6
        while 1:
            print "trying", x
            for s, f in ('xr', xrange), (' r', range):
                print s + " %.3f %.3f %.3f" % run(f, x)
            x *= 1.5
            x = int(x)

    if __name__ == "__main__":
        main()

I get (columns are min/max/average):

    trying 1000000
    xr 0.110 0.115 0.111
     r 0.101 0.186 0.119
    trying 1500000
    xr 0.082 0.087 0.083
     r 0.152 0.158 0.154
    trying 2250000
    xr 0.124 0.138 0.128
     r 0.228 0.235 0.230
    trying 3375000
    xr 0.184 0.189 0.186
     r 0.344 0.352 0.346
    trying 5062500
    xr 0.276 0.284 0.279
     r 0.515 0.528 0.519
    trying 7593750
    xr 0.415 0.421 0.416
     r 0.774 0.795 0.779
    trying 11390625
    xr 0.623 0.634 0.626
     r 1.163 1.246 1.180
    trying 17085937
    xr 0.934 0.941 0.937
    Killed

The "Killed" is from the Linux OOM killer killing the python process.. notice that the xrange for that number worked fine.
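The memory side of this can still be seen on Python 3, where range behaves like the old xrange and list(range(n)) corresponds to what Python 2's range returned. A minimal sketch, with variable names chosen for illustration and no claim about exact sizes:

```python
import sys

n = 10 ** 6

lazy = range(n)          # Python 3 range: computes values on demand
eager = list(range(n))   # what Python 2's range() returned: a full list

# sys.getsizeof reports the container only, not the int objects it refers
# to, so the real gap is even larger than these two numbers suggest.
print(sys.getsizeof(lazy))
print(sys.getsizeof(eager))
```

The lazy object stays the same small size however large n gets, which is why the xrange runs above survived where the range run was killed.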