Re: [Tutor] Dictionary on data

2015-11-20 Thread Peter Otten
jarod_v6--- via Tutor wrote:

> Dear All!
> I have this element:
> 
> In [445]: pt = line.split("\t")[9]
> 
> In [446]: pt
> Out[446]: 'gene_id "ENSG0223972"; gene_version "5"; transcript_id
> "ENST0456328"; transcript_version "2"; exon_number "1"; gene_name
> "DDX11L1"; gene_source "havana"; gene_biotype
> "transcribed_unprocessed_pseudogene"; transcript_name "DDX11L1-002";
> transcript_source "havana"; transcript_biotype "processed_transcript";
> exon_id "ENSE2234944"; exon_version "1"; tag "basic";
> transcript_support_level "1";\n'
> 
> 
> and I want to create a dictionary like this
> 
> gene_id =  "ENSG0223972"; ...
> 
> 
> I found on Stack Overflow this way to create a dictionary of dictionaries
> (http://stackoverflow.com/questions/8550912/python-dictionary-of-dictionaries)
> # This is our sample data
> data = [("Milter", "Miller", 4), ("Milter", "Miler", 4), ("Milter",
> "Malter", 2)]
> 
> # dictionary we want for the result
> dictionary = {}
> 
> # loop that makes it work
> for realName, falseName, position in data:
>     dictionary.setdefault(realName, {})[falseName] = position
> 
> I want to create a dictionary using setdefault, but I am having difficulty
> transforming pt into a list of tuples.
> 
>  data = pt.split(";")
>  in ()
>       1 for i in data:
>       2     l = i.split()
> ----> 3     print l[0]
>       4
> 
> IndexError: list index out of range
> 
> In [457]: for i in data:
>    .....:     l = i.split()
>    .....:     print l
>    .....:
> ['gene_id', '"ENSG0223972"']
> ['gene_version', '"5"']
> ['transcript_id', '"ENST0456328"']
> ['transcript_version', '"2"']
> ['exon_number', '"1"']
> ['gene_name', '"DDX11L1"']
> ['gene_source', '"havana"']
> ['gene_biotype', '"transcribed_unprocessed_pseudogene"']
> ['transcript_name', '"DDX11L1-002"']
> ['transcript_source', '"havana"']
> ['transcript_biotype', '"processed_transcript"']
> ['exon_id', '"ENSE2234944"']
> ['exon_version', '"1"']
> ['tag', '"basic"']
> ['transcript_support_level', '"1"']
> []
> 
> 
> So how can I do that in a more elegant way?
> Thanks so much!

I don't see why you would need dict.setdefault(); you already have the
necessary pieces:

data = pt.split(";")
pairs = (item.split() for item in data)
mydict = {item[0]: item[1].strip('"') for item in pairs if len(item) == 2}
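
As a quick check, the three lines above can be run on a shortened copy of the
attribute string from the question. This is only a sketch: the value of pt is
abbreviated here (an assumption made to keep the example short), and the other
names are the ones used above.

```python
# Abbreviated copy of the attribute string from the question (assumption:
# only four of the original fields are kept).
pt = ('gene_id "ENSG0223972"; gene_version "5"; '
      'gene_name "DDX11L1"; tag "basic";\n')

data = pt.split(";")                     # the last piece is just "\n"
pairs = (item.split() for item in data)  # "\n".split() gives an empty list
# Keep only well-formed key/value pairs. The len() test skips the empty
# trailing list, which is what raised the IndexError in the question.
mydict = {item[0]: item[1].strip('"') for item in pairs if len(item) == 2}

print(mydict["gene_id"])    # ENSG0223972
print(mydict["gene_name"])  # DDX11L1
```

The values stay strings, so "5" would need int() if a number is wanted.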

You can protect against whitespace in the quoted strings with 
item.split(None, 1) instead of item.split(). If ";" is allowed in the quoted 
strings you have to work a little harder.
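
To illustrate both remarks, here is a hedged sketch. The first function uses
item.split(None, 1) so that a value containing spaces stays in one piece. The
second uses a regular expression, which is one possible way (certainly not the
only one) to cope with ";" inside the quotes. The sample string and the names
parse_simple, parse_quoted and PAIR are made up for this example, and the
pattern assumes that values never contain a double quote.

```python
import re

def parse_simple(pt):
    """Split on ';', then split each piece once at the first whitespace run."""
    result = {}
    for item in pt.split(";"):
        parts = item.split(None, 1)  # at most one split, so the value stays whole
        if len(parts) == 2:
            key, value = parts
            result[key] = value.strip().strip('"')
    return result

# A key, some whitespace, then a double-quoted value that may itself
# contain spaces or ';' (but no double quote).
PAIR = re.compile(r'(\w+)\s+"([^"]*)"')

def parse_quoted(pt):
    """Find key "value" pairs directly instead of splitting on ';'."""
    return dict(PAIR.findall(pt))

# Hypothetical input with a space and a ';' inside quoted values.
sample = 'gene_name "DDX 11L1"; note "a; b"; tag "basic";\n'

print(parse_simple(sample)["gene_name"])  # DDX 11L1
print(parse_simple(sample)["note"])       # a     (cut short at the ';')
print(parse_quoted(sample)["note"])       # a; b
```

The regular expression silently ignores anything that does not look like
key "value", so malformed fields are dropped rather than reported.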



___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor


[Tutor] Dictionary on data

2015-11-20 Thread jarod_v6--- via Tutor
Dear All!
I have this element:

In [445]: pt = line.split("\t")[9]

In [446]: pt
Out[446]: 'gene_id "ENSG0223972"; gene_version "5"; transcript_id 
"ENST0456328"; transcript_version "2"; exon_number "1"; gene_name 
"DDX11L1"; gene_source "havana"; gene_biotype 
"transcribed_unprocessed_pseudogene"; transcript_name "DDX11L1-002"; 
transcript_source "havana"; transcript_biotype "processed_transcript"; exon_id 
"ENSE2234944"; exon_version "1"; tag "basic"; transcript_support_level 
"1";\n'


and I want to create a dictionary like this

gene_id =  "ENSG0223972"; ...


I found on Stack Overflow this way to create a dictionary of dictionaries
(http://stackoverflow.com/questions/8550912/python-dictionary-of-dictionaries)
# This is our sample data
data = [("Milter", "Miller", 4), ("Milter", "Miler", 4), ("Milter", "Malter", 
2)]

# dictionary we want for the result
dictionary = {}

# loop that makes it work
for realName, falseName, position in data:
    dictionary.setdefault(realName, {})[falseName] = position

I want to create a dictionary using setdefault, but I am having difficulty
transforming pt into a list of tuples.

 data = pt.split(";")
 in ()
      1 for i in data:
      2     l = i.split()
----> 3     print l[0]
      4

IndexError: list index out of range

In [457]: for i in data:
   .....:     l = i.split()
   .....:     print l
   .....:
['gene_id', '"ENSG0223972"']
['gene_version', '"5"']
['transcript_id', '"ENST0456328"']
['transcript_version', '"2"']
['exon_number', '"1"']
['gene_name', '"DDX11L1"']
['gene_source', '"havana"']
['gene_biotype', '"transcribed_unprocessed_pseudogene"']
['transcript_name', '"DDX11L1-002"']
['transcript_source', '"havana"']
['transcript_biotype', '"processed_transcript"']
['exon_id', '"ENSE2234944"']
['exon_version', '"1"']
['tag', '"basic"']
['transcript_support_level', '"1"']
[]


So how can I do that in a more elegant way?
Thanks so much!

