Re: [Tutor] Printing regular expression match

2005-12-03 Thread Danny Yoo


On Sat, 3 Dec 2005, Srinivas Iyyer wrote:
  a
 ['apple', 'boy', 'boy', 'apple']

  b
 ['Apple', 'BOY', 'APPLE-231']

  for i in a:
      pat = re.compile(i, re.IGNORECASE)
      for m in b:
          if pat.match(m):
              print m


Hi Srinivas,

We may want to change the problem so that it's less focused on printing
results directly.  We can rephrase the question as a list filtering
operation: we want to keep the elements of b that satisfy a certain
criterion.


Let's give a name to that criterion now:

##
def doesNameMatchSomePrefix(word, prefixes):
    """Returns True if the input word is matched by some prefix in
    the input list of prefixes.  Otherwise, returns False."""
    # ... fill me in

##


Can you write doesNameMatchSomePrefix()?  In fact, you might not even need
regexes to write an initial version of it.



If you can write that function, then what you're asking:

 I do not want python to print both elements from lists a and b.  I just
 want only the elements in the list B.

should not be so difficult: it'll be a straightforward loop across b,
using that helper function.
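
One possible way to fill in the helper without regexes, offered only as a
minimal sketch: it assumes that "matched by some prefix" means the word
starts with that prefix once both are lowercased, which is one reading of
the problem and not something the original question spells out.  The loop
across b is written as a list comprehension.

```python
def doesNameMatchSomePrefix(word, prefixes):
    """Returns True if the input word is matched by some prefix in
    the input list of prefixes.  Otherwise, returns False."""
    # Assumption: matching is case-insensitive and anchored at the start.
    lowered = word.lower()
    for prefix in prefixes:
        if lowered.startswith(prefix.lower()):
            return True
    return False


a = ['apple', 'boy', 'boy', 'apple']
b = ['Apple', 'BOY', 'APPLE-231']

# Keep only the elements of b that satisfy the criterion.
matches = [word for word in b if doesNameMatchSomePrefix(word, a)]
print(matches)
```

Because the loop runs over b and asks a yes/no question about each word,
every element of b shows up at most once, even though a contains
duplicates.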



(Optimization can be done to make doesNameMatchSomePrefix() fast, but you
probably should concentrate on correctness first.  If you're interested in
doing something like this for a large number of prefixes, you might be
interested in:

http://hkn.eecs.berkeley.edu/~dyoo/python/ahocorasick/

which has more details and references to specialized modules that attack
the problem you've shown us so far.)


Good luck!

___
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] Printing regular expression match

2005-12-03 Thread Srinivas Iyyer
Hi Danny, 
thanks for your email. 

In the example I've shown, there are no odd elements
except for character case. 

In the real case I have a list of 100 human gene
names.
Human gene names are conventionally written in upper
case (e.g. DDX3X).  However, in NCBI's gene_info
dataset the gene names are reported in lowercase (e.g.
ddx3x).  I want to extract the rest of the information
for DDX3X from NCBI's file (the dataset is in
tab-delimited format).

My approach was: if I can establish that DDX3X is
identical to ddx3x, then I want to print that line
from the other list (NCBI's gene_info dataset).

I guess I misunderstood your suggestion.  In that
case, why do I have to drop something from list b
(which is over 150 K lines)?  If I can create a sublist
of the matching elements in b (a small list of about
100), that would be easier.  This is my opinion.
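
The lookup described above could be sketched roughly as follows.  This is
only an illustration under stated assumptions: the function name
select_gene_lines, the symbol_column parameter, and the sample rows are all
made up here, and which column of the real file holds the gene name would
have to be checked against the file itself.

```python
def select_gene_lines(lines, wanted_names, symbol_column):
    """Return the tab-delimited lines whose gene-name column matches one
    of wanted_names, ignoring case."""
    # A set of lowercased names makes each membership test cheap, which
    # matters when scanning a large file against a short list.
    wanted = set(name.lower() for name in wanted_names)
    selected = []
    for line in lines:
        fields = line.rstrip('\n').split('\t')
        if len(fields) > symbol_column and fields[symbol_column].lower() in wanted:
            selected.append(line)
    return selected


# Made-up rows standing in for the real dataset (hypothetical layout:
# an id column, the gene name, then a free-text column).
sample_lines = [
    "row1\tddx3x\tfirst description\n",
    "row2\tgenea\tsecond description\n",
    "row3\tgeneb\tthird description\n",
]
gene_names = ['DDX3X', 'GENEB']

hits = select_gene_lines(sample_lines, gene_names, symbol_column=1)
for line in hits:
    print(line.rstrip('\n'))
```

The same function should work on an open file object in place of
sample_lines, since iterating over a file yields one line at a time.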

-srini






