Re: [Tutor] mapping problem

2006-02-01 Thread Hugo González Monteverde
What about choosing a native Python data structure for this?

A list of attributes comes to mind, since the presence of one can easily
be tested with:

if x in attributes:
    do_something

Here's a test:

>>> elements = {}
>>> elements['boat'] = ['a', 'b', 'j', 'k']
>>> 'a' in elements['boat']
True

Then it is just a matter of getting the fields into the structure and
then iterating over the elements, like this:


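elements = {}

# table_fileo is the open file object for the tab-delimited table
for line in table_fileo:
    # Remove only the linefeed and split on tabs, so names containing
    # spaces (e.g. "passenger-cruise ship") stay in one piece.
    fields = line.rstrip('\n').split('\t')

    elements[fields[-1]] = fields[:-1]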

This gives you a dictionary with the last column of each row (the leaf
name) as the key and a list of the other attributes as the value.

Then, if you want to look for an attribute, just do:
for name, attributes in elements.items():
    if 'j' in attributes:
        print(name)
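Putting the pieces together, here is a self-contained version of the above that can be run as-is. It is only a sketch that assumes the table really is tab-delimited with the leaf name in the last column; `load_elements`, `names_with` and the sample text are made up for illustration, and io.StringIO just stands in for the real file.

```python
import io

# Hypothetical sample standing in for the real tab-delimited file;
# the last column is the leaf name, the others are its lineage.
SAMPLE = (
    "a\tb\tc\td\tcar\n"
    "a\tb\tc\tf\ttruck\n"
    "a\tb\tj\tk\tboat\n"
    "a\tb\tj\tp\tpassenger-cruise ship\n"
)


def load_elements(fileobj):
    """Map each leaf name to the list of columns that precede it."""
    elements = {}
    for line in fileobj:
        # Strip only the newline and split on tabs so that leaf names
        # containing spaces stay whole.
        fields = line.rstrip("\n").split("\t")
        if fields == [""]:
            continue  # skip blank lines
        elements[fields[-1]] = fields[:-1]
    return elements


def names_with(elements, attribute):
    """Return the leaf names whose lineage contains attribute."""
    return [name for name, attrs in elements.items() if attribute in attrs]


elements = load_elements(io.StringIO(SAMPLE))
print(names_with(elements, "j"))  # ['boat', 'passenger-cruise ship']
```

Because the leaf name is the key, a name that appears on two lines keeps only the last lineage read; if that can happen in the real data, store a list of lineages per name instead.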

What do you think?

Hugo
___
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] mapping problem

2006-02-01 Thread Danny Yoo


> A snippet of the problem is here. The terms in file as
> tab delim manner.

[data cut]

> Now my question is to enrich members that have identical lineage with
> different leaf. 'i.e': a b c d - van suv.


Hi Srinivas,

You're starting to use the vocabulary of tree data structures here.  Let's
clarify what you mean by:

> I have two terms in this path and I am not happy with two. I wish to
> have more.

I think you mean "I have two descendant leaves at this point ...", in
which case your next statement:


> Then: a b c - car, van, truck, SUV and 18-wheeler (automobiles that
> travel on road). I am happy with this grouping and I enriched more items
> if I walk on lineage : (a-b-c)

sounds like traversing up from your current node to its parent node. In
your example, this expands the set of descendant leaves to include those
other elements.



> Is there a way to automate this problem.

I would strongly recommend not thinking about this as a text file problem,
but as a data structure problem.  Trying to do this by messing with lines
and tabs seems to me like asking to turn a simple problem into a difficult
one.  The fact that the data that you have is in a text file should be
treated as an accident more than anything: nothing binds you to try to
solve your problem by slogging through with the original data
representation.

The problem becomes much clearer if you're dealing with trees:

http://en.wikipedia.org/wiki/Tree_data_structure

in which case you can use the vocabulary of tree traversal to make the
problem very simple.  I might be misunderstanding what you're saying, but
it sounds like you're just asking for parent node traversal, which is a
fundamental operation on trees.
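To make parent-node traversal concrete: if every node keeps a reference to its parent, widening a group from a-b-c-d to a-b-c is one step up the tree followed by collecting the leaves below. This is a minimal sketch, assuming the lineage/leaf rows have already been parsed; `Node`, `build_tree` and `widen` are hypothetical names, not from any library.

```python
class Node:
    """A tree node that knows its parent, its children and its leaf terms."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = {}  # child name -> Node, in insertion order
        self.leaves = []    # leaf terms attached directly to this node

    def child(self, name):
        """Return the named child, creating it first if it is missing."""
        if name not in self.children:
            self.children[name] = Node(name, parent=self)
        return self.children[name]

    def all_leaves(self):
        """Collect the leaves of this node and of all its descendants."""
        found = list(self.leaves)
        for kid in self.children.values():
            found.extend(kid.all_leaves())
        return found


def build_tree(rows):
    """Build a tree from (lineage, leaf) pairs and return its root."""
    root = Node(None)
    for lineage, leaf in rows:
        node = root
        for name in lineage:
            node = node.child(name)
        node.leaves.append(leaf)
    return root


def widen(node, steps=1):
    """Walk up `steps` parent links (stopping at the root), then gather leaves."""
    for _ in range(steps):
        if node.parent is not None:
            node = node.parent
    return node.all_leaves()


rows = [
    (["a", "b", "c", "d"], "car"),
    (["a", "b", "c", "f"], "truck"),
    (["a", "b", "c", "d"], "van"),
    (["a", "b", "c", "d"], "SUV"),
    (["a", "b", "c", "f"], "18-wheeler"),
    (["a", "b", "j", "k"], "boat"),
]
root = build_tree(rows)
d_node = root.children["a"].children["b"].children["c"].children["d"]
print(d_node.all_leaves())  # ['car', 'van', 'SUV']
print(widen(d_node))        # ['car', 'van', 'SUV', 'truck', '18-wheeler']
print(widen(d_node, 2))     # ['car', 'van', 'SUV', 'truck', '18-wheeler', 'boat']
```

A single `parent` attribute only fits a true tree. In a DAG such as the GO graph a term can have several parents, so a list of parents would be needed instead.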



Re: [Tutor] mapping problem

2006-02-01 Thread Kent Johnson
Srinivas Iyyer wrote:
> Dear group,
>   I have a problem in finding a method to solve a
> problem where I want to walk through a lineage of
> terms and find group them from right to left.
>
> A snippet of the problem is here. The terms in file as
> tab delim manner.
>
> a   b   c   d   car
> a   b   c   f   truck
> a   b   c   d   van
> a   b   c   d   SUV
> a   b   c   f   18-wheeler
> a   b   j   k   boat
> a   b   j   a   submarine
> a   b   d   a   B-747
> a   b   j   c   cargo-ship
> a   b   j   p   passenger-cruise ship
> a   b   a   a   bicycle
> a   b   a   b   motorcycle
>
>
> Now my question is to enrich members that have
> identical lineage with different leaf.
> 'i.e': a b c d - van suv . I have two terms in this
> path and I am not happy with two. I wish to have more.
>
> Then: a b c - car, van, truck, SUV and 18-wheeler
> (automobiles that travel on road). I am happy with
> this grouping and I enriched more items if I walk on
> lineage : (a-b-c)

I'm not sure I understand what you want to do, but I think a tree where 
each internal node is a dict and each leaf node is a list will do what 
you want. You would end up with something like
tree['a']['b']['c']['d'] = ['car', 'van', 'SUV']

To find the value for a b c you would traverse the tree to that point, 
then accumulate all the leaf nodes underneath.

OK, I guess I feel like writing code; here is a simple implementation. It
requires that all the leaves be at the same depth. It's not particularly 
clever but it works and shows what can be done just by hooking together 
basic data structures.

raw_data = '''a   b   c   d   car
a   b   c   f   truck
a   b   c   d   van
a   b   c   d   SUV
a   b   c   f   18-wheeler
a   b   j   k   boat
a   b   j   a   submarine
a   b   d   a   B-747
a   b   j   c   cargo-ship
a   b   j   p   passenger-cruise ship
a   b   a   a   bicycle
a   b   a   b   motorcycle'''.splitlines()

tree = {}

# This builds the tree of nested dictionaries with lists as the leaves
for line in raw_data:
    # maxsplit=4 yields at most 5 fields, so a leaf name containing
    # a space ('passenger-cruise ship') stays in one piece
    data = line.split(None, 4)
    keys = data[:4]
    value = data[4]

    # This loop makes the dict nodes
    subtree = tree
    for key in keys[:-1]:
        subtree = subtree.setdefault(key, {})

    # The last key points to a list, not a dict
    lastkey = keys[-1]
    subtree.setdefault(lastkey, []).append(value)


def iter_leaves(subtree):
    ''' Recursive generator that yields all the leaf nodes of subtree '''
    if isinstance(subtree, list):
        # A leaf node
        yield subtree
        return

    for item in subtree.values():
        for leaf in iter_leaves(item):
            yield leaf

def get_leaves(*keys):
    ''' Return a list of all the leaves in the subtree pointed to by *keys '''
    subitem = tree
    for key in keys:
        subitem = subitem[key]

    leaves = []
    for leaf in iter_leaves(subitem):
        leaves.extend(leaf)

    return leaves

print(get_leaves('a', 'b', 'c', 'd'))
print(get_leaves('a', 'b', 'c'))

## prints

['car', 'van', 'SUV']
['car', 'van', 'SUV', 'truck', '18-wheeler']
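The version above needs every leaf at the same depth. If lineages have different lengths (as GO lineages may well do), one possible variant keeps the leaf lists inside the dicts under a reserved key, so any node can hold both children and leaves. This is a sketch under the assumption that no real term is called `__leaves__`; `add_path`, `collect` and `leaves_under` are hypothetical names, and the `bus` row is an invented shorter lineage added only to show mixed depths.

```python
LEAVES = "__leaves__"  # assumption: no real term is named this


def add_path(tree, lineage, leaf):
    """Store leaf under tree[lineage[0]][lineage[1]]...; any depth is allowed."""
    node = tree
    for key in lineage:
        node = node.setdefault(key, {})
    node.setdefault(LEAVES, []).append(leaf)


def collect(node):
    """Return the leaves stored at node and at every dict below it."""
    found = list(node.get(LEAVES, []))
    for key, child in node.items():
        if key != LEAVES:
            found.extend(collect(child))
    return found


def leaves_under(tree, *keys):
    """Follow keys down from the root, then collect everything beneath."""
    node = tree
    for key in keys:
        node = node[key]
    return collect(node)


tree = {}
add_path(tree, ["a", "b", "c", "d"], "car")
add_path(tree, ["a", "b", "c", "f"], "truck")
add_path(tree, ["a", "b", "c"], "bus")  # hypothetical shorter lineage
add_path(tree, ["a", "b", "j", "k"], "boat")

print(leaves_under(tree, "a", "b", "c"))  # ['bus', 'car', 'truck']
print(leaves_under(tree, "a", "b"))       # ['bus', 'car', 'truck', 'boat']
```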

Kent

 
 
> Thus, I want to try to enrich for all 21 K lines of
> lineages.
>
> My question:
>
> Is there a way to automate this problem.
>
> My idea of doing this:
>
> Since this is a tab delim file. I want to read a line
> with say 5 columns (5 tabs).  Search for items with
> same column item 4 (because leaf items could be
> unique). If I find a hit, then check if columns 3 and
> 2 are identical if so create a list.
>
> Although this problem is more recursive and time and
> resource consuming, I cannot think of an easy
> solution.
>
> Would you please suggest a nice and simple method to
> solve this problem.
>
> For people who are into bioinformatics (I know Danny
> Yoo is a bioinformatician) the question is about GO
> terms.  I parsed OBO file and laid out the term
> lineages that constitute the OBO-DAG structure. I want
> to enrich the terms to do an enrichment analysis for a
> set of terms that I am interested in.
>
> Thank you in advance.
>
> cheers
> Srini
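Srini's own idea (compare columns from right to left and group rows whose lineage matches) can also be done in a single pass without recursion: record each leaf under every prefix of its lineage, so that "a b c" and "a b c d" both become plain dictionary lookups. This is a minimal sketch, assuming a tab-delimited file whose last column is the leaf; `parse_rows` and `build_prefix_index` are hypothetical names, and the sample rows are taken from the data above.

```python
from collections import defaultdict


def parse_rows(lines):
    """Yield (lineage, leaf) pairs from tab-delimited lines."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2:  # skip blank or malformed lines
            yield fields[:-1], fields[-1]


def build_prefix_index(rows):
    """Map every lineage prefix, as a tuple, to the leaves found beneath it."""
    index = defaultdict(list)
    for lineage, leaf in rows:
        # e.g. a b c d -> ('a',), ('a', 'b'), ('a', 'b', 'c'), ('a', 'b', 'c', 'd')
        for depth in range(1, len(lineage) + 1):
            index[tuple(lineage[:depth])].append(leaf)
    return index


lines = [
    "a\tb\tc\td\tcar\n",
    "a\tb\tc\tf\ttruck\n",
    "a\tb\tc\td\tvan\n",
    "a\tb\tc\td\tSUV\n",
    "a\tb\tc\tf\t18-wheeler\n",
    "a\tb\tj\tk\tboat\n",
]
index = build_prefix_index(parse_rows(lines))

# Use .get() for lookups so a missing prefix does not add an empty entry.
print(index.get(("a", "b", "c", "d"), []))  # ['car', 'van', 'SUV']
print(index.get(("a", "b", "c"), []))       # ['car', 'truck', 'van', 'SUV', '18-wheeler']
```

The cost is memory: each leaf is stored once per level of its lineage, so the index holds roughly (rows x lineage depth) entries. Since the lineages come from a DAG, the same term may appear under several of them; wrap a result in set() if you need distinct terms.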
 
 
 
 

