Re: [Tutor] Faster procedure to filter two lists . Please help

2005-01-15 Thread Tim Peters
[Gonçalo Rodrigues]
 It this correct? Python lists are not linked-lists (as in Scheme, for
 example). They are more like arrays (or vectors in C++/Java) with a
 little more sofistication built into them to allow, for example, to
 amortize over time a sequence of append operations. But in a nutshell,
 len is actually a field in the underlying C object so len() is a
 constant (O(1)) and as-fast-as-it-can-be operation.

 Someone correct me, please, If I'm wrong.

That's all correct.  It remains more idiomatic to do

for thing in a_list:
...

than

for i in range(len(a_list)):
thing = a_list[i]
...

and the former is in fact a bit faster (or possibly a lot, if a_list
is very large, because range(n) actually constructs a list containing
n integers), but (ignoring the range() complication) there's no
difference in O() behavior between the two.
___
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] Faster procedure to filter two lists . Please help

2005-01-15 Thread Alan Gauld
 It this correct? Python lists are not linked-lists (as in Scheme,
for
 example). They are more like arrays (or vectors in C++/Java) with a
 little more sofistication built into them to allow, for example, to
 amortize over time a sequence of append operations. But in a
nutshell,
 len is actually a field in the underlying C object so len() is a
 constant (O(1)) and as-fast-as-it-can-be operation.

 Someone correct me, please, If I'm wrong.

I thought that too, but after a recent thread when I said
the same I got an off list mail assuring me that len() was
calculated at run time by iterating over the collection!

I guess I should go and look at the code!
However in either case missing out the two function calls
to len() would help improve the speed...

Alan G.

___
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] Faster procedure to filter two lists . Please help

2005-01-15 Thread Alan Gauld
 GR len is actually a field in the underlying C object so len() is a
 GR constant (O(1)) and as-fast-as-it-can-be operation.

TP ...n integers), but (ignoring the range() complication) there's no
TP difference in O() behavior between the two.

OK, The timbot's word is good enough for me, I won't bother
looking at the code, I'll revert to my previous assumption! :-)

Alan G.

___
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] Faster procedure to filter two lists . Please help

2005-01-15 Thread Tim Peters
[Alan Gauld]
 OK, The timbot's word is good enough for me, I won't bother
 looking at the code, I'll revert to my previous assumption! :-)

It's educational to look at the code anyway wink.  Here it is, from
Python's listobject.c:

static int
list_length(PyListObject *a)
{
return a-ob_size;
}

People shouldn't be afraid to look at C code -- it won't bite them,
and it's often the easiest way to get a definitive answer about the
implementation.  OTOH, people should be afraid to _write_ C code:  it
will bite you every time 0.9 wink.
___
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] Faster procedure to filter two lists . Please help

2005-01-14 Thread kumar s
Hi Danny:
 Thank you for your suggestion. I tried creating a
dictionary of 'what' list and searched keys with
has_key method and it is pretty fast. 

Thanks again. following is the piece of code.

K



 cors = []
 intr = []
 for i in range(len(what)):
ele = split(what[i],'\t')
cors.append(ele[0])
intr.append(ele[1])


 what_dict = dict(zip(cors,intr))

 for i in range(len(my_report)):
cols = split(my_report[i],'\t')
cor = cols[0]
if what_dict.has_key(cor):
intr = what_dict[cor]

my_final_report.append(cols[0]+'\t'+intr+'\t'+cols[1]+'\t'+cols[2])





--- Danny Yoo [EMAIL PROTECTED] wrote:

 
 
 On Fri, 14 Jan 2005, kumar s wrote:
 
  for i in range(len(what)):
  ele = split(what[i],'\t')
  cor1 = ele[0]
  for k in range(len(my_report)):
  cols = split(my_report[k],'\t')
  cor = cols[0]
  if cor1 == cor:
  print cor+'\t'+ele[1]+'\t'+cols[1]+'\t'+cols[2]
 
 
 
 Hi Kumar,
 
 
 Ok, this calls for the use of an associative map
 or dictionary.
 
 
 The main time sink is the loop here:
 
  for k in range(len(my_report)):
  cols = split(my_report[k],'\t')
  cor = cols[0]
  if cor1 == cor:
  print cor+'\t'+ele[1]+'\t'+cols[1]+'\t'+cols[2]
 
 Conceptually, my_report can be considered a list of
 key/value pairs.  For
 each element in 'my_report', the key is the first
 column (cols[0]), and
 the value is the rest of the columns (cols[1:]).
 
 
 The loop above can, in a pessimistic world, require
 a search across the
 whole of 'my_report'.  This can take time that is
 proportional to the
 length of 'my_report'.  You mentioned earlier that
 each list might be of
 length 249502, so we're looking into a process whose
 overall cost is
 gigantic.
 
 
 [Notes on calculating runtime cost: when the
 structure of the code looks
 like:
 
 for element1 in list1:
 for element2 in list2:
 some_operation_that_costs_K_time()
 
 then the overall cost of running this loop will be
 
 K * len(list1) * len(list2)
 ]
 
 
 We can do much better than this if we use a
 dictionary data structure. A
 dictionary can reduce the time it takes to do a
 lookup search down from
 a linear-time operation to an atomic-time one.  Do
 you know about
 dictionaries yet?  You can take a look at:
 
 http://www.ibiblio.org/obp/thinkCSpy/chap10.htm
 
 which will give an overview of a dictionary.  It
 doesn't explain why
 dictionary lookup is fast, but we can talk about
 that later if you want.
 
 
 Please feel free to ask any questions about
 dictionaries and their use.
 Learning how to use a dictionary data structure is a
 skill that pays back
 extraordinarily well.
 
 
 Good luck!
 
 




__ 
Do you Yahoo!? 
Meet the all-new My Yahoo! - Try it today! 
http://my.yahoo.com 
 

___
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor