Srinivas Iyyer wrote:
Dear group,
I am having trouble finding a method to solve a
problem where I want to walk through a lineage of
terms and group them from right to left.
A snippet of the problem is here. The terms are in a
tab-delimited file.
a b c d car
a b c f truck
a b c d van
a b c d SUV
a b c f 18-wheeler
a b j k boat
a b j a submarine
a b d a B-747
a b j c cargo-ship
a b j p passenger-cruise ship
a b a a bicycle
a b a b motorcycle
Now my question is how to enrich members that have an
identical lineage but a different leaf,
i.e.: a b c d - van, SUV. I have two terms in this
path and I am not happy with two. I wish to have more.
Then: a b c - car, van, truck, SUV and 18-wheeler
(automobiles that travel on road). I am happy with
this grouping, and I enriched more items by walking up the
lineage: (a-b-c)
I'm not sure I understand what you want to do, but I think a tree where
each internal node is a dict and each leaf node is a list will do what
you want. You would end up with something like
tree['a']['b']['c']['d'] = ['car', 'van', 'SUV']
To find the value for a b c you would traverse the tree to that point,
then accumulate all the leaf nodes underneath.
OK I guess I feel like writing code, here is a simple implementation. It
requires that all the leaves be at the same depth. It's not particularly
clever but it works and shows what can be done just by hooking together
basic data structures.
raw_data = '''a b c d car
a b c f truck
a b c d van
a b c d SUV
a b c f 18-wheeler
a b j k boat
a b j a submarine
a b d a B-747
a b j c cargo-ship
a b j p passenger-cruise ship
a b a a bicycle
a b a b motorcycle'''.splitlines()

tree = {}

# This builds the tree of nested dictionaries with lists as the leaves
for line in raw_data:
    # Split at most 4 times so a multi-word leaf such as
    # 'passenger-cruise ship' stays in one piece
    data = line.split(None, 4)
    keys = data[:4]
    value = data[4]

    # This loop makes the dict nodes
    subtree = tree
    for key in keys[:-1]:
        subtree = subtree.setdefault(key, {})

    # The last key points to a list, not a dict
    lastkey = keys[-1]
    subtree.setdefault(lastkey, []).append(value)

def iter_leaves(subtree):
    ''' Recursive generator that yields all the leaf nodes of subtree '''
    if isinstance(subtree, list):
        # A leaf node
        yield subtree
        return

    for item in subtree.values():
        for leaf in iter_leaves(item):
            yield leaf

def get_leaves(*keys):
    ''' Return a list of all the leaves in the subtree pointed to by
    *keys '''
    subitem = tree
    for key in keys:
        subitem = subitem[key]

    leaves = []
    for leaf in iter_leaves(subitem):
        leaves.extend(leaf)

    return leaves

print(get_leaves('a', 'b', 'c', 'd'))
print(get_leaves('a', 'b', 'c'))

## prints
['car', 'van', 'SUV']
['car', 'van', 'SUV', 'truck', '18-wheeler']
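
The same-depth restriction could probably be lifted by letting every dict
node keep its own leaves under a reserved key. What follows is only a rough
sketch, assuming Python 3; the sentinel LEAVES, the helpers add_lineage and
collect, and the 'tractor' entry are all made up for illustration and are
not part of the code above.

```python
LEAVES = object()  # hypothetical sentinel key; cannot collide with a real term

def add_lineage(tree, keys, value):
    """Store value under the path keys; paths may have different lengths."""
    node = tree
    for key in keys:
        node = node.setdefault(key, {})
    node.setdefault(LEAVES, []).append(value)

def collect(tree, keys):
    """Return every leaf stored at or below the node reached via keys."""
    node = tree
    for key in keys:
        node = node[key]
    found = []
    stack = [node]  # explicit stack instead of recursion
    while stack:
        current = stack.pop()
        for key, child in current.items():
            if key is LEAVES:
                found.extend(child)   # leaves attached to this node
            else:
                stack.append(child)   # a deeper dict node
    return found

tree = {}
add_lineage(tree, ['a', 'b', 'c', 'd'], 'car')
add_lineage(tree, ['a', 'b', 'c'], 'tractor')   # made-up, shallower entry
add_lineage(tree, ['a', 'b', 'c', 'f'], 'truck')
print(sorted(collect(tree, ['a', 'b', 'c'])))
```

The traversal order of the stack is not meaningful, which is why the result
is sorted before printing.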
Kent
Thus, I want to try to enrich for all 21 K lines of
lineages.
My question:
Is there a way to automate this?
My idea of doing this:
Since this is a tab-delimited file, I want to read a line
with, say, 5 columns (5 tabs) and search for items with the
same column 4 (because leaf items could be
unique). If I find a hit, I then check whether columns 3 and
2 are identical and, if so, create a list.
This approach seems recursive and time- and
resource-consuming, but I cannot think of an easier
solution.
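
One way to avoid the repeated searching described above might be to index
every line under each prefix of its lineage in a single pass, then shorten
the lineage from right to left until the group is large enough. This is a
minimal sketch, assuming Python 3 and tab-separated lines with the leaf in
the last column; the names build_index, enrich and min_size are hypothetical.

```python
from collections import defaultdict

def build_index(lines):
    """Map every lineage prefix (a tuple) to the leaves found beneath it."""
    index = defaultdict(list)
    for line in lines:
        fields = line.rstrip('\n').split('\t')
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        lineage, leaf = fields[:-1], fields[-1]
        # Register the leaf under ('a',), ('a', 'b'), ('a', 'b', 'c'), ...
        for depth in range(1, len(lineage) + 1):
            index[tuple(lineage[:depth])].append(leaf)
    return index

def enrich(index, lineage, min_size):
    """Drop terms from the right until the group has at least min_size leaves."""
    lineage = tuple(lineage)
    while lineage:
        leaves = index.get(lineage, [])
        if len(leaves) >= min_size:
            return lineage, leaves
        lineage = lineage[:-1]
    return (), []  # no prefix had enough leaves

sample = [
    'a\tb\tc\td\tcar',
    'a\tb\tc\tf\ttruck',
    'a\tb\tc\td\tvan',
    'a\tb\tc\td\tSUV',
    'a\tb\tc\tf\t18-wheeler',
    'a\tb\tj\tk\tboat',
]
index = build_index(sample)
print(enrich(index, ['a', 'b', 'c', 'f'], 3))
```

With the sample lines, a-b-c-f holds only two leaves, so the call falls back
to a-b-c and returns its five leaves. An open file object can be passed to
build_index in place of the list, since it only iterates over lines.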
Would you please suggest a nice and simple method to
solve this problem?
For people who are into bioinformatics (I know Danny
Yoo is a bioinformatician) the question is about GO
terms. I parsed an OBO file and laid out the term
lineages that constitute the OBO-DAG structure. I want
to enrich the terms to do an enrichment analysis for a
set of terms that I am interested in.
Thank you in advance.
cheers
Srini
___
Tutor maillist - Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor