Re: [Tutor] Faster procedure to filter two lists . Please help
[Gonçalo Rodrigues] It this correct? Python lists are not linked-lists (as in Scheme, for example). They are more like arrays (or vectors in C++/Java) with a little more sofistication built into them to allow, for example, to amortize over time a sequence of append operations. But in a nutshell, len is actually a field in the underlying C object so len() is a constant (O(1)) and as-fast-as-it-can-be operation. Someone correct me, please, If I'm wrong. That's all correct. It remains more idiomatic to do for thing in a_list: ... than for i in range(len(a_list)): thing = a_list[i] ... and the former is in fact a bit faster (or possibly a lot, if a_list is very large, because range(n) actually constructs a list containing n integers), but (ignoring the range() complication) there's no difference in O() behavior between the two. ___ Tutor maillist - Tutor@python.org http://mail.python.org/mailman/listinfo/tutor
Re: [Tutor] Faster procedure to filter two lists . Please help
It this correct? Python lists are not linked-lists (as in Scheme, for example). They are more like arrays (or vectors in C++/Java) with a little more sofistication built into them to allow, for example, to amortize over time a sequence of append operations. But in a nutshell, len is actually a field in the underlying C object so len() is a constant (O(1)) and as-fast-as-it-can-be operation. Someone correct me, please, If I'm wrong. I thought that too, but after a recent thread when I said the same I got an off list mail assuring me that len() was calculated at run time by iterating over the collection! I guess I should go and look at the code! However in either case missing out the two function calls to len() would help improve the speed... Alan G. ___ Tutor maillist - Tutor@python.org http://mail.python.org/mailman/listinfo/tutor
Re: [Tutor] Faster procedure to filter two lists . Please help
GR len is actually a field in the underlying C object so len() is a GR constant (O(1)) and as-fast-as-it-can-be operation. TP ...n integers), but (ignoring the range() complication) there's no TP difference in O() behavior between the two. OK, The timbot's word is good enough for me, I won't bother looking at the code, I'll revert to my previous assumption! :-) Alan G. ___ Tutor maillist - Tutor@python.org http://mail.python.org/mailman/listinfo/tutor
Re: [Tutor] Faster procedure to filter two lists . Please help
[Alan Gauld] OK, The timbot's word is good enough for me, I won't bother looking at the code, I'll revert to my previous assumption! :-) It's educational to look at the code anyway wink. Here it is, from Python's listobject.c: static int list_length(PyListObject *a) { return a-ob_size; } People shouldn't be afraid to look at C code -- it won't bite them, and it's often the easiest way to get a definitive answer about the implementation. OTOH, people should be afraid to _write_ C code: it will bite you every time 0.9 wink. ___ Tutor maillist - Tutor@python.org http://mail.python.org/mailman/listinfo/tutor
Re: [Tutor] Faster procedure to filter two lists . Please help
Hi Danny: Thank you for your suggestion. I tried creating a dictionary of 'what' list and searched keys with has_key method and it is pretty fast. Thanks again. following is the piece of code. K cors = [] intr = [] for i in range(len(what)): ele = split(what[i],'\t') cors.append(ele[0]) intr.append(ele[1]) what_dict = dict(zip(cors,intr)) for i in range(len(my_report)): cols = split(my_report[i],'\t') cor = cols[0] if what_dict.has_key(cor): intr = what_dict[cor] my_final_report.append(cols[0]+'\t'+intr+'\t'+cols[1]+'\t'+cols[2]) --- Danny Yoo [EMAIL PROTECTED] wrote: On Fri, 14 Jan 2005, kumar s wrote: for i in range(len(what)): ele = split(what[i],'\t') cor1 = ele[0] for k in range(len(my_report)): cols = split(my_report[k],'\t') cor = cols[0] if cor1 == cor: print cor+'\t'+ele[1]+'\t'+cols[1]+'\t'+cols[2] Hi Kumar, Ok, this calls for the use of an associative map or dictionary. The main time sink is the loop here: for k in range(len(my_report)): cols = split(my_report[k],'\t') cor = cols[0] if cor1 == cor: print cor+'\t'+ele[1]+'\t'+cols[1]+'\t'+cols[2] Conceptually, my_report can be considered a list of key/value pairs. For each element in 'my_report', the key is the first column (cols[0]), and the value is the rest of the columns (cols[1:]). The loop above can, in a pessimistic world, require a search across the whole of 'my_report'. This can take time that is proportional to the length of 'my_report'. You mentioned earlier that each list might be of length 249502, so we're looking into a process whose overall cost is gigantic. [Notes on calculating runtime cost: when the structure of the code looks like: for element1 in list1: for element2 in list2: some_operation_that_costs_K_time() then the overall cost of running this loop will be K * len(list1) * len(list2) ] We can do much better than this if we use a dictionary data structure. A dictionary can reduce the time it takes to do a lookup search down from a linear-time operation to an atomic-time one. Do you know about dictionaries yet? You can take a look at: http://www.ibiblio.org/obp/thinkCSpy/chap10.htm which will give an overview of a dictionary. It doesn't explain why dictionary lookup is fast, but we can talk about that later if you want. Please feel free to ask any questions about dictionaries and their use. Learning how to use a dictionary data structure is a skill that pays back extraordinarily well. Good luck! __ Do you Yahoo!? Meet the all-new My Yahoo! - Try it today! http://my.yahoo.com ___ Tutor maillist - Tutor@python.org http://mail.python.org/mailman/listinfo/tutor