Hi Danny,
thanks for your email.
In the example I've shown, there are no odd elements
except for character case.
In the real case I have a list of 100 gene names for
Humans.
Human gene names are conventionally written in
uppercase (e.g. DDX3X). However, in NCBI's gene_info
dataset the gene names are reported in lowercase (e.g.
ddx3x). I want to extract the rest of the information
for DDX3X from NCBI's file (the dataset is in
tab-delimited format).
My approach was: if I can establish that DDX3X is identical
to ddx3x, then I want to print that line from the other
list (NCBI's gene_info dataset).
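A minimal sketch of that approach, under stated assumptions: the function name select_rows, the name_column parameter, and the sample rows are all hypothetical, and the position of the gene name column in the real file should be checked rather than taken from here. The idea is to lowercase both sides once, keep the 100 names in a set, and make a single pass over the large file.

```python
def select_rows(lines, wanted_names, name_column=2):
    """Return the tab-delimited lines whose gene-name field matches one of
    wanted_names, ignoring character case.

    name_column is an assumption (0-based index of the gene-name field);
    adjust it to the real layout of the file.
    """
    # Lowercase the short list once so each lookup is a cheap set test.
    wanted = set(name.lower() for name in wanted_names)
    selected = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Skip short or malformed lines instead of raising IndexError.
        if len(fields) > name_column and fields[name_column].lower() in wanted:
            selected.append(line)
    return selected


# Made-up placeholder rows, only to show the shape of the data.
sample = [
    "taxid\tid1\tddx3x\tother fields\n",
    "taxid\tid2\ttp53\tother fields\n",
]
rows = select_rows(sample, ["DDX3X"])
```

With a real file, an open file object can be passed as lines, so the 150K lines are read one at a time instead of being loaded into a list first.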
I guess I misunderstood your suggestion. In that
case, why do I have to drop something from list b
(which is over 150K lines)? If I can create a sublist
of the matching elements in b (a small list of 100),
that would be easier. That is my opinion.
-srini
--- Danny Yoo [EMAIL PROTECTED] wrote:
On Sat, 3 Dec 2005, Srinivas Iyyer wrote:
>>> a
['apple', 'boy', 'boy', 'apple']
>>> b
['Apple', 'BOY', 'APPLE-231']
>>> for i in a:
...     pat = re.compile(i, re.IGNORECASE)
...     for m in b:
...         if pat.match(m):
...             print m
Hi Srinivas,
We may want to change the problem so that it's less focused on printing
results directly. We can rephrase the question as a list filtering
operation: we want to keep the elements of b that satisfy a certain
criterion.
Let's give a name to that criterion now:
##
def doesNameMatchSomePrefix(word, prefixes):
    """Returns True if the input word is matched by some prefix in
    the input list of prefixes.  Otherwise, returns False."""
    # ... fill me in
##
Can you write doesNameMatchSomePrefix()? In fact, you might not even need
regexes to write an initial version of it.
If you can write that function, then what you're asking:

    I do not want python to print both elements from lists a and b. I just
    want only the elements in the list B.

should not be so difficult: it'll be a straightforward loop across b,
using that helper function.
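One possible way to fill in the helper, offered as a sketch rather than the intended answer: it uses plain string methods instead of regexes and treats "matched by a prefix" as a case-insensitive startswith test, which is an assumption about what the original re.match loop was meant to do. The name matches is only illustrative.

```python
def doesNameMatchSomePrefix(word, prefixes):
    """Return True if word starts with any of the prefixes, ignoring case.
    Otherwise, return False."""
    lowered = word.lower()
    for prefix in prefixes:
        if lowered.startswith(prefix.lower()):
            return True
    return False


a = ['apple', 'boy', 'boy', 'apple']
b = ['Apple', 'BOY', 'APPLE-231']

# The filtering loop across b: keep only the elements of b that match,
# each one at most once, no matter how often a prefix repeats in a.
matches = [word for word in b if doesNameMatchSomePrefix(word, a)]
```

Because the loop runs over b and asks one yes/no question per element, duplicates in a no longer cause repeated output, which was the side effect of the nested loop in the original message.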
(Optimization can be done to make doesNameMatchSomePrefix() fast, but you
probably should concentrate on correctness first. If you're interested in
doing something like this for a large number of prefixes, you might be
interested in:
http://hkn.eecs.berkeley.edu/~dyoo/python/ahocorasick/
which has more details and references to specialized modules that attack
the problem you've shown us so far.)
Good luck!