[Tutor] Problem on parsing data

2017-03-13 Thread jarod_v6--- via Tutor




I have a CSV file with "," as the separator.

If I split each line on ",", I get rows with different numbers of columns:
some have 30 columns and some have 50, depending on how many "," characters
the line contains.

In [105]: dimension_columns = []

In [106]: with open(nomi) as f:
     ...:     for i in f:
     ...:         lines = i.rstrip("\n").split(",")
     ...:         if "#" not in lines[0]:
     ...:             dimension_columns.append(len(lines))
     ...:


In [108]: set(dimension_columns)
Out[108]: 
{30,
 32,
 33,
 34,
 35,
 36,
 37,
 38,
 39,
 40,
 41,
 42,
 43,
 44,
 45,
 46,
 47,
 48,
 49,
 50,
 53,
 54,
 59}

The last columns are quoted strings, but they contain some "," characters, so
the script above splits them as well.

In [99]: lines
Out[99]: 
['chr10',
 '19896830',
 '19896830',
 'C',
 'A',
 '"intergenic"',
 '"ARL5B(dist=929890)',
 'PLXDC2(dist=208542)"',
 'NA',
 'NA',
 '"Score=458;Name=lod=97"',
 'NA',
 'NA',
 '"0.83"',
 '"rs7909976"',
 'NA',
 'NA',
 'NA',
 'NA',
 'NA',
 'NA',
 'NA',
 'NA',
 'NA',
 'NA',
 'NA',
 'NA',
 'NA',
 'NA',
 'NA',
 '"chr10\t19896830\trs7909976\tC\tA\t.\tREJECT\tDB\tGT:AD:BQ:DP:FA\t0:0',
 '69:.:69:1.00\t0/1:0',
 '37:32:37:1.00"']

How can I parse this file better, so that only the commas outside the quoted
strings are treated as separators?
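One possible approach is the standard library csv module, which treats a comma inside a double-quoted field as part of the field. The sketch below is written for Python 3 and uses a hypothetical two-line sample (the header line and the shortened row are assumptions, not the real file). It skips lines whose first field starts with "#", which is slightly stricter than the `"#" not in lines[0]` test in the original loop.

```python
import csv
import io

# Hypothetical sample standing in for the real file; the quoted last field
# contains a comma that a plain str.split(",") would break apart.
sample = io.StringIO(
    '#chrom,start,end,ref,alt,region,gene\n'
    'chr10,19896830,19896830,C,A,"intergenic",'
    '"ARL5B(dist=929890),PLXDC2(dist=208542)"\n'
)

rows = []
for fields in csv.reader(sample, delimiter=",", quotechar='"'):
    # Skip blank lines and comment/header lines.
    if not fields or fields[0].startswith("#"):
        continue
    rows.append(fields)

# Every data row should now report the same number of columns.
column_counts = {len(fields) for fields in rows}
```

With a real file, `csv.reader(open(nomi, newline=""))` would take the place of the in-memory sample.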

___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor


[Tutor] R: Re: Create a pivot table (Peter Otten)

2016-05-20 Thread jarod_v6--- via Tutor
Thanks so much for the help. I want to obtain a table like this:


>csv.writer(sys.stdout, delimiter="\t").writerows(table)
>        A100    D33     D34     D35     D36     D37     D38     D39
>A       5       0       ...
>B       2       2       ...
>C       0       ...
>
I have tried the pandas way, but unfortunately there are many duplicates. So
now I am creating a new file using a dictionary, with this format:

('A', 'A100') 5
('B', 'A100') 2

I am just wondering if there is a Pythonic way to do this. I don't want to use
other software if I can avoid it.
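As a rough sketch of how a dictionary keyed by (program, sample) could be turned into the tab-separated table, assuming Python 3 and a small hypothetical `counts` dictionary in the format shown above:

```python
import csv
import io

# Hypothetical counts keyed by (program, sample).
counts = {("A", "A100"): 5, ("B", "A100"): 2, ("B", "D33"): 2}

programs = sorted({program for program, _ in counts})
samples = sorted({sample for _, sample in counts})

# The first row is the header; pairs that never occur are filled with 0.
table = [[""] + samples]
for program in programs:
    table.append([program] + [counts.get((program, s), 0) for s in samples])

# Write to an in-memory buffer here; sys.stdout or a file would work the same way.
out = io.StringIO()
csv.writer(out, delimiter="\t", lineterminator="\n").writerows(table)
text = out.getvalue()
```

The rows and columns are sorted only to make the output order predictable; any other ordering of `programs` and `samples` would work.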



[Tutor] Create a pivot table

2016-05-19 Thread jarod_v6--- via Tutor
Dear All!
This is my data from a file. I want to obtain a count table showing how many
times c11 is present for each Project, Sample, and Program.

['Program', 'Sample', 'Featurename', 'Project'],
 ['A', 'A100', 'c11', 'post50'],
 ['A', 'A100', 'c12', 'post50'],
 ['A', 'A100', 'c14', 'post50'],
 ['A', 'A100', 'c67', 'post50'],
 ['A', 'A100', 'c76', 'post50'],
 ['B', 'A100', 'c11', 'post50'],
 ['B', 'A100', 'c99', 'post50'],
 ['B', 'D33', 'c33', 'post50'],
 ['B', 'D33', 'c31', 'post50'],
 ['C', 'D34', 'c32', 'post60'],
 ['C', 'D35', 'c33', 'post60'],
 ['C', 'D36', 'c11', 'post60'],
 ['C', 'D37', 'c45', 'post60'],
 ['C', 'D38', 'c36', 'post60'],
 ['C', 'D39', 'c37', 'post60']
I want to obtain a pivot table with samples as columns and programs as rows,
and the fusion counts as the values. What is the best way to do this? Thanks in
advance!
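One way to get such counts without extra libraries is collections.Counter. The sketch below assumes Python 3 and uses a shortened copy of the rows above; counting all features per (Program, Sample) and counting only c11 are both shown, since the post can be read either way.

```python
from collections import Counter

# Shortened copy of the data; the first row is the header.
rows = [
    ["Program", "Sample", "Featurename", "Project"],
    ["A", "A100", "c11", "post50"],
    ["A", "A100", "c12", "post50"],
    ["B", "A100", "c11", "post50"],
    ["B", "D33", "c33", "post50"],
    ["C", "D36", "c11", "post60"],
]
header, records = rows[0], rows[1:]

# Number of rows for each (program, sample) pair.
counts = Counter((program, sample) for program, sample, _feature, _project in records)

# Same idea, restricted to one feature name.
c11_counts = Counter(
    (program, sample)
    for program, sample, feature, _project in records
    if feature == "c11"
)
```

A Counter returns 0 for pairs it has never seen, which is convenient when filling the empty cells of the pivot table.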



[Tutor] R: Tutor Digest, Vol 146, Issue 23

2016-04-20 Thread jarod_v6--- via Tutor
Thanks so much!!
Now I am trying to understand. Once I have built the presence/absence matrix,
I want to replace the 1 or 0 values in the table with some values extracted
from the dictionary called tutto.







[Tutor] R:reformatting data and traspose dictionary

2016-04-20 Thread jarod_v6--- via Tutor
Dear All,
sorry for my poor presentation of the code.

I read a txt file and prepare a dictionary:

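import os

files = os.listdir(".")
tutto = {}
annotatemerge = {}
for i in files:
    with open(i, "r") as f:
        for it in f:
            lines = it.rstrip("\n").split("\t")
            # skip short lines and the VCF column header
            if len(lines) > 2 and lines[0] != '#CHROM':
                # chromosome, position, reference allele, alternate allele
                conte = [lines[0], lines[1], lines[3], lines[4]]
                # key: file name, coordinate and the full record, joined by "::"
                tutto.setdefault(i + "::" + "-".join(conte) + "::" + str(lines), []).append(1)
                # key: coordinate; value: set of files where it was seen
                annotatemerge.setdefault("-".join(conte), set()).add(i)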



I create two dictionaries. The first one, annotatemerge, uses a coordinate
(e.g. chr3-195710967-C-CG) as the key and maps it to a set containing the
names of the files:
'chr3-195710967-C-CG': {'M8.vcf'},
 'chr17-29550645-T-C': {'M8.vcf'},
 'chr7-140434541-G-A': {'M8.vcf'},
 'chr14-62211578-CGTGT-C': {'M8.vcf', 'R76.vcf'},
 'chr3-197346770-GA-G': {'M8.vcf', 'R76.vcf'},
 'chr17-29683975-C-T': {'M8.vcf'},
 'chr13-48955585-T-A': {'R76.vcf'},

The other dictionary holds more information. Its key is a single string whose
parts are separated by the symbol "::":


  {"M8.vcf::chr17-29665680-A-G::['chr17', '29665680', '.', 'A', 'G', '70.00',
'PASS', 'DP=647;TI=NM_001042492,NM_000267;GI=NF1,NF1;FC=Silent,Silent',
'GT:GQ:AD:VF:NL:SB:GQX', '0/1:70:623,24:0.0371:20:-38.2744:70']": [1], ...}


What I want to obtain is a list with this format:

coordinate\tM8.vcf\tR76.vcf\n
chr3-195710967-C-CG\t1\t0\n
chr17-29550645-T-C\t1\t0\n
chr3-197346770-GA-G\t1\t1\n
chr13-48955585-T-A\t0\t1\n


Once I have that file, I want to transpose the table so that the coordinates
are on the columns and the sample names are on the rows.
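A minimal sketch of both steps, assuming Python 3 and a hypothetical three-entry copy of annotatemerge: each coordinate becomes a row of 1/0 flags, and `zip(*table)` swaps rows and columns.

```python
# Hypothetical subset of annotatemerge: coordinate -> set of file names.
annotatemerge = {
    "chr3-195710967-C-CG": {"M8.vcf"},
    "chr14-62211578-CGTGT-C": {"M8.vcf", "R76.vcf"},
    "chr13-48955585-T-A": {"R76.vcf"},
}

# All file names that appear anywhere, in a fixed order.
samples = sorted(set().union(*annotatemerge.values()))

# Header row, then one presence/absence row per coordinate.
table = [["coordinate"] + samples]
for coordinate in sorted(annotatemerge):
    files = annotatemerge[coordinate]
    table.append([coordinate] + [1 if s in files else 0 for s in samples])

# Transpose: coordinates become columns, sample names become rows.
transposed = [list(column) for column in zip(*table)]

# Tab-separated text lines, ready to be written to a file.
lines = ["\t".join(str(cell) for cell in row) for row in table]
```

Replacing the 1 with a value looked up from another dictionary would only change the expression inside the list comprehension.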





[Tutor] reformatting data and traspose dictionary

2016-04-20 Thread jarod_v6--- via Tutor
Hi!!!
I have this problem.
I have a dictionary like this:

[Name_file::position] = lines

#Samplename::chr10-43606756-C-T::['chr10', '43606756', '.', 'C', 'T', '100.00', 
'PASS', 
'DP=439;TI=NM_020630,NM_020975;GI=RET,RET;FC=Synonymous_V455V,Synonymous_V455V;EXON',
 'GT:GQ:AD:VF:NL:SB:GQX', '0/1:100:387,52:0.1185:20:-100.:100']

And I want to obtain this table:

Name_file on the rows, position on the columns, and one parameter taken from
lines as the value,


i.e.
 chr10-43606756-C-T...
Samplename,Synonymous_V455V



and then I want to transpose this matrix.
What is the simplest way to do this?

Thanks so much!!




Re: [Tutor] Strange error (Peter Otten)

2015-11-20 Thread jarod_v6--- via Tutor
Thanks so much!! It was a very silly error.


[Tutor] Dictionary on data

2015-11-20 Thread jarod_v6--- via Tutor
Dear All!
I have this element:

In [445]: pt = line.split("\t")[9]

In [446]: pt
Out[446]: 'gene_id "ENSG0223972"; gene_version "5"; transcript_id 
"ENST0456328"; transcript_version "2"; exon_number "1"; gene_name 
"DDX11L1"; gene_source "havana"; gene_biotype 
"transcribed_unprocessed_pseudogene"; transcript_name "DDX11L1-002"; 
transcript_source "havana"; transcript_biotype "processed_transcript"; exon_id 
"ENSE2234944"; exon_version "1"; tag "basic"; transcript_support_level 
"1";\n'


and I want to create a dictionary like this:

gene_id = "ENSG0223972"; ...


I found on Stack Overflow this way to create a dictionary of dictionaries
(http://stackoverflow.com/questions/8550912/python-dictionary-of-dictionaries)

# This is our sample data
data = [("Milter", "Miller", 4), ("Milter", "Miler", 4), ("Milter", "Malter", 2)]

# dictionary we want for the result
dictionary = {}

# loop that makes it work
for realName, falseName, position in data:
    dictionary.setdefault(realName, {})[falseName] = position

I want to create a dictionary using setdefault, but I have difficulty
transforming pt into a list of tuples.

data = pt.split(";")

<ipython-input-...> in <module>()
      1 for i in data:
      2     l = i.split()
----> 3     print l[0]
      4

IndexError: list index out of range

In [457]: for i in data:
     ...:     l = i.split()
     ...:     print l
     ...:
['gene_id', '"ENSG0223972"']
['gene_version', '"5"']
['transcript_id', '"ENST0456328"']
['transcript_version', '"2"']
['exon_number', '"1"']
['gene_name', '"DDX11L1"']
['gene_source', '"havana"']
['gene_biotype', '"transcribed_unprocessed_pseudogene"']
['transcript_name', '"DDX11L1-002"']
['transcript_source', '"havana"']
['transcript_biotype', '"processed_transcript"']
['exon_id', '"ENSE2234944"']
['exon_version', '"1"']
['tag', '"basic"']
['transcript_support_level', '"1"']
[]


So how can I do that in a more elegant way?
Thanks so much!!
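The IndexError comes from the last piece produced by `split(";")`, which holds only the trailing newline and so splits into an empty list. A minimal sketch that skips such empty pieces, assuming Python 3 and a shortened, hypothetical copy of `pt`:

```python
# Shortened stand-in for the attribute column shown above.
pt = 'gene_id "ENSG0223972"; gene_version "5"; gene_name "DDX11L1"; tag "basic";\n'

attributes = {}
for item in pt.split(";"):
    item = item.strip()
    if not item:
        # The text after the final ";" is only whitespace: nothing to parse.
        continue
    # Split on the first space only, so the value stays in one piece.
    key, _space, value = item.partition(" ")
    attributes[key] = value.strip('"')
```

If a key such as `tag` appears more than once in a record, this version keeps only the last value; `attributes.setdefault(key, []).append(...)` would keep them all.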




[Tutor] Strange error

2015-11-19 Thread jarod_v6--- via Tutor
HI!!
This is my list:

In [378]: type(Span)
Out[378]: list

In [379]: Span
Out[379]: 
[['M02898:39:0-AH4BK:1:2107:17412:10850',
  'M02898:39:0-AH4BK:1:2117:15242:18766',
  'M02898:39:0-AH4BK:1:1112:21747:21214',
  'M02898:39:0-AH4BK:1:2112:5119:9813',
  'M02898:39:0-AH4BK:1:1102:26568:5630',
  'M02898:39:0-AH4BK:1:2118:19680:11792',
  'M02898:39:0-AH4BK:1:1103:5469:6578',
  'M02898:39:0-AH4BK:1:2101:13087:20965',
  'M02898:39:0-AH4BK:1:1103:28031:13653',
  'M02898:39:0-AH4BK:1:1103:8013:21346',
  'M02898:39:0-AH4BK:1:1107:9189:22557',
  'M02898:39:0-AH4BK:1:2118:21263:23091',
  'M02898:39:0-AH4BK:1:1115:12279:20054',
  'M02898:39:0-AH4BK:1:1102:19433:17489',
  'M02898:39:0-AH4BK:1:1110:14533:11792',
  'M02898:39:0-AH4BK:1:2106:18027:12878',
  'M02898:39:0-AH4BK:1:1104:4408:6824',
  'M02898:39:0-AH4BK:1:2101:5678:7400']]

I get this error:


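In [381]: len(Span)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-...> in <module>()
----> 1 len(Span)

TypeError: 'str' object is not callable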



Why??? It is a list!!!
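The session does not show the cause, but a common way to get this message is an earlier assignment such as `len = "..."`, which hides the built-in function behind a string. The sketch below (Python 3, with a hypothetical short list) reproduces the error inside a function and shows that the real `len` is still available through the `builtins` module.

```python
import builtins

span = [["read1", "read2", "read3"]]


def shadowed_len_demo(values):
    len = "some string"          # accidental assignment hides the built-in
    try:
        len(values)              # this now tries to call a str
    except TypeError as exc:
        error = str(exc)
    # The real function is still reachable through the builtins module.
    return error, builtins.len(values)


error, size = shadowed_len_demo(span)
```

In an interactive IPython session, `del len` removes the shadowing name and makes the built-in visible again.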


Re: [Tutor] How to parse large files

2015-11-01 Thread jarod_v6--- via Tutor
My file has 1960607 rows, but I don't understand why I am not able to create a
dictionary quickly. I also tried gc.disable(), but it did not work.
I need to have a dictionary, but I get this error:

with shelve.open("diz5") as db:
    with open("tmp1.txt") as instream:
        for line in instream:
            assert line.count("\t") == 1
            key, _tab, value = line.rstrip("\n").partition("\t")
            values = db.get(key) or set()
            values.add(value)
            db[key] = values

AttributeError                            Traceback (most recent call last)
<ipython-input-...> in <module>()
----> 1 with shelve.open("diz5") as db:
      2     with open("tmp1.txt") as instream:
      3         for line in instream:
      4             assert line.count("\t") == 1
      5             key, _tab, value = line.rstrip("\n").partition("\t")

AttributeError: DbfilenameShelf instance has no attribute '__exit__'



I need to compute the intersection of the dictionary keys.
Thanks for the help
M.
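The error arises because shelf objects in Python 2.7 are not context managers (that support was added in Python 3.4). One workaround that runs on both versions is `contextlib.closing`, which calls `close()` on exit. A minimal sketch, using a temporary path and three hypothetical lines in place of tmp1.txt:

```python
import contextlib
import os
import shelve
import tempfile

# Stand-ins for the real input file and shelf location.
lines = ["k1\ta\n", "k1\tb\n", "k2\tc\n"]
path = os.path.join(tempfile.mkdtemp(), "diz5")

# closing() supplies the __enter__/__exit__ pair that the shelf lacks in 2.7.
with contextlib.closing(shelve.open(path)) as db:
    for line in lines:
        key, _tab, value = line.rstrip("\n").partition("\t")
        values = db.get(key) or set()
        values.add(value)
        db[key] = values          # reassign so the shelf stores the update
    result = {key: db[key] for key in db}
```

Each `db[key] = values` pickles the whole set again, so a shelf trades speed for lower memory use; for a file of about two million rows an ordinary in-memory dictionary may well be the faster option.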





Re: [Tutor] How to parse large files

2015-11-01 Thread jarod_v6--- via Tutor
Thanks!!
I use Python 2.7. Can I also use it in that version?
I don't understand why you use partition() and not split(). What is the reason
for that?
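One difference that may explain the choice: `str.partition` (available since Python 2.5) always returns exactly three items, so the three-name unpacking never fails, while `split` returns a list whose length depends on how many tabs the line contains. A small sketch with hypothetical lines:

```python
def split_key_value(line):
    """Return (key, value, found_tab) for one tab-separated line."""
    key, tab, value = line.rstrip("\n").partition("\t")
    return key, value, bool(tab)


normal = split_key_value("gene1\tvalue1\n")
no_tab = split_key_value("gene1\n")
two_tabs = split_key_value("gene1\ta\tb\n")

# split() gives 2, 1 and 3 items for the same three lines.
split_lengths = [len(s.split("\t")) for s in ("gene1\tvalue1", "gene1", "gene1\ta\tb")]
```

With `split("\t")`, unpacking into two names would raise ValueError for the second and third line; `split("\t", 1)` handles the third but still not the second.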



[Tutor] How to parse large files

2015-10-27 Thread jarod_v6--- via Tutor
Hi!
I want to read two files and create a simple dictionary from each one. The
input file contains more than 1 rows


diz5 = {}
with open("tmp1.txt") as p:
    for i in p:
        lines = i.rstrip("\n").split("\t")
        diz5.setdefault(lines[0], set()).add(lines[1])

diz3 = {}
with open("tmp2.txt") as p:
    for i in p:
        lines = i.rstrip("\n").split("\t")
        diz3.setdefault(lines[0], set()).add(lines[1])

How can I manage this reading and writing better?
Thanks so much
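Since the two loops are identical, one option is to move them into a helper function. The sketch below assumes Python 3; `read_pairs` is a hypothetical name, and in-memory text stands in for tmp1.txt and tmp2.txt.

```python
from collections import defaultdict
import io


def read_pairs(stream):
    """Map each first-column key to the set of its second-column values."""
    table = defaultdict(set)
    for line in stream:
        key, _tab, value = line.rstrip("\n").partition("\t")
        table[key].add(value)
    return table


# Stand-ins for open("tmp1.txt") and open("tmp2.txt").
tmp1 = io.StringIO("a\t1\nb\t2\na\t3\n")
tmp2 = io.StringIO("b\t9\nc\t7\n")

diz5 = read_pairs(tmp1)
diz3 = read_pairs(tmp2)

# Keys present in both dictionaries.
common = set(diz5) & set(diz3)
```

Reading line by line like this keeps only the dictionaries in memory, not the whole file.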



[Tutor] R: Re: Create complex dictionary :p:

2015-10-27 Thread jarod_v6--- via Tutor
Sorry,
I just realized my question was wrong: I want to know how to build a dictionary
of dictionaries in the most Pythonic way:

diz = {"A": {"e": 2, "s": 10}, "B": {"e": 20, "s": 7}}

So I have some keys (A, B), each associated with a new dictionary that holds
the other keys I want to use. I want to extract the "e" value for each key.
Thanks
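A short sketch of both directions, assuming Python 3: extracting the "e" value for every outer key with a dictionary comprehension, and building the same nested structure step by step with `setdefault`. The `records` list is a hypothetical input format.

```python
diz = {"A": {"e": 2, "s": 10}, "B": {"e": 20, "s": 7}}

# One "e" value per outer key.
e_values = {key: inner["e"] for key, inner in diz.items()}

# Building the same structure from flat (outer, inner, value) records.
records = [("A", "e", 2), ("A", "s", 10), ("B", "e", 20), ("B", "s", 7)]
built = {}
for outer, inner, value in records:
    # Create the inner dictionary the first time an outer key is seen.
    built.setdefault(outer, {})[inner] = value
```

If some inner dictionaries might lack the "e" key, `inner.get("e")` avoids a KeyError.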


[Tutor] Create complex dictionary

2015-10-22 Thread jarod_v6--- via Tutor
Hi!! I would like to prepare a dictionary with a complex structure:

complex = {"name": "value", "surname": "po", "age": poi}

What is the most Pythonic way to build a dictionary of dictionaries? Thanks for
any help!


[Tutor] Refresh library imported

2015-08-11 Thread jarod_v6--- via Tutor
Hi there!!
I am trying to develop some scripts. I use IPython to check whether my script
works.

When I change the script and try to import it again, I do not see the
modification, so every time I have to close IPython, start it again, and
import the script.
How can I refresh the library without closing IPython?
Thanks so much!
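A second `import` of an already loaded module does nothing, so the module has to be reloaded explicitly. In Python 2.7 that is the built-in `reload(module)`; in Python 3 it is `importlib.reload(module)`. The sketch below is for Python 3 and writes a hypothetical module, `myscript_demo`, to a temporary directory only so that the example is self-contained.

```python
import importlib
import os
import sys
import tempfile

# Create a throwaway module on disk (hypothetical name and content).
module_dir = tempfile.mkdtemp()
module_path = os.path.join(module_dir, "myscript_demo.py")
with open(module_path, "w") as handle:
    handle.write("VERSION = 1\n")

sys.path.insert(0, module_dir)
importlib.invalidate_caches()
myscript_demo = importlib.import_module("myscript_demo")
first = myscript_demo.VERSION

# Edit the file, as one would in an editor, then reload it.
with open(module_path, "w") as handle:
    handle.write("VERSION = 22\n")
importlib.reload(myscript_demo)
second = myscript_demo.VERSION
```

IPython also ships an autoreload extension: `%load_ext autoreload` followed by `%autoreload 2` reloads modified modules before each command runs. Names imported with `from module import name` and objects created before the reload keep pointing at the old code.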




[Tutor] Problem on select esecution of object in a class

2015-08-05 Thread jarod_v6--- via Tutor
I have a class with many methods, and I want to select which function to run
using optparse:

id = options.step
ena = Rnaseq(options.configura, options.rst, options.outdir)
now = datetime.datetime.now()
ena.show()
diz = {}
for i, t in enumerate(ena.steps()):
    diz.setdefault(i, []).append(t.__name__)

for i in "ena." + "".join(diz[id]) + "()":
    print i.command

----> 1 for i in "ena." + "".join(diz[id]) + "()":
      2
      3     print i.command
      4

AttributeError: 'str' object has no attribute 'command'


Here you can see what the expression outputs:

In [85]: "ena." + "".join(diz[id]) + "()"
Out[85]: 'ena.trimmomatic()'

Definition: ena.trimmomatic(self)
Source:
    def trimmomatic(self):
        """
        Raw reads quality trimming and removing of Illumina adapters is
        performed using
        [Trimmomatic](http://www.usadellab.org/cms/index.php?page=trimmomatic).

        This step takes as input files:

        1. FASTQ files from the readset file if available
        """
        jobs = []
        for readset in self.readsets:
            trim_file_prefix = os.path.join(self.pt, "trim",
                readset.sample.name, readset.name + ".trim.")
            trim_log = trim_file_prefix + "log"
            trim_stats = trim_file_prefix + "stats.csv"
            :

Any suggestions on how to choose the function to use?
M.
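The loop fails because the concatenation only produces the text 'ena.trimmomatic()'; iterating over it yields single characters, and nothing is ever called. Bound methods can instead be stored and called directly, or looked up by name with `getattr`. The sketch below uses a hypothetical `Pipeline` class as a stand-in for Rnaseq, whose real methods are not shown in full here.

```python
class Pipeline(object):
    """Hypothetical stand-in for the Rnaseq class."""

    def trimmomatic(self):
        return ["trimmomatic job"]

    def star(self):
        return ["star job"]

    def steps(self):
        # Bound methods, in execution order.
        return [self.trimmomatic, self.star]


ena = Pipeline()

# Look a step up by position: the list already holds callables.
jobs_by_index = ena.steps()[0]()

# Look a step up by name through a dictionary of bound methods.
by_name = {step.__name__: step for step in ena.steps()}
jobs_by_name = by_name["star"]()

# Or ask the object for the attribute with that name.
jobs_by_getattr = getattr(ena, "trimmomatic")()
```

Restricting the lookup to the names returned by `steps()` keeps a command-line option from calling arbitrary attributes of the object.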


[Tutor] R: Tutor Digest, Vol 138, Issue 26 Re: Problem on select esecution of object in a class (Alan Gauld)

2015-08-05 Thread jarod_v6--- via Tutor
Thanks so much for the help. What I want to do is obtain a selection of the
functions I want to run.

ena = Rnaseq(options.configura, options.rst, options.outdir)
cmdset = [ena.trimmomatic,
          ena.star,
          ena.merge_trimmomatic_stats]
ena.show()
1 ena.trimmomatic
2 ena.star
3 ena.merge_trimmomatic_stats

The class Rnaseq has multiple functions. I want a way to run steps 1 to 3, or
steps 2 to 3, or only step 2 or step 3.

...
parser.add_option("-s", "--step", action="store",
                  dest="steps", type="string",
                  help="write input file: %prg -o : directory of results")

python myscript -s 1,3 ...

At the moment the only way I have found is this:

for cmd in cmdset:
    step = cmd()
    for i in step:
        print i.command

but it is not elegant, so I want to know the right way to run the functions of
the class by selecting the step we want to start from.
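One possible shape for this is to parse the option value into step numbers and index the list of functions with them. The sketch below assumes Python 3 and a hypothetical option format in which "1,3" lists individual steps and "2-3" is an inclusive range; plain functions stand in for the Rnaseq methods.

```python
def parse_steps(spec, n_steps):
    """Turn '1,3' or '2-3' into a sorted list of 1-based step numbers."""
    selected = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        first, dash, last = part.partition("-")
        start = int(first)
        stop = int(last) if dash else start
        selected.update(range(start, stop + 1))
    bad = sorted(n for n in selected if not 1 <= n <= n_steps)
    if bad:
        raise ValueError("unknown step(s): %s" % bad)
    return sorted(selected)


# Hypothetical stand-ins for ena.trimmomatic, ena.star, ...
def trimmomatic():
    return ["trim"]


def star():
    return ["star"]


def merge_trimmomatic_stats():
    return ["merge"]


cmdset = [trimmomatic, star, merge_trimmomatic_stats]

executed = []
for number in parse_steps("2-3", len(cmdset)):
    # Step numbers are 1-based, list indexes are 0-based.
    executed.extend(cmdset[number - 1]())
```

The value of `options.steps` would take the place of the literal "2-3".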





However, building function names as strings and then calling
them is usually a bad design pattern. Especially for large
numbers of objects. So maybe if you explain what/why you are
doing this we can suggest a better alternative.

-- 
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos






[Tutor] R: Tutor Digest, Vol 136, Issue 19

2015-06-08 Thread jarod_v6--- via Tutor
Thanks for the help!! The data I posted was unfortunately truncated:
ENSG0267199 11.8156750037   1.74423209120.51035586473.4176781572
0.00063157740.0122038731ENSG0267199 NA  NA  NA
ENSG0267206 27.9863824875   -1.7496803666   0.5026610268-3.4808355401   
0.00049985230.0102622293ENSG0267206 LCN6P62502  158062
ENSG0267249 9.3904402364-1.3510262216   0.4923605689-2.743977294
0.00606997360.056855688 ENSG0267249 NA  NA  NA
ENSG0267270 8.36900695071.20362068840.47987522292.5081951119
0.012134964 0.0887668369ENSG0267270 NA
ENSG0267278 5.8613893946-1.7315438788   0.5939508055-2.9152984772   
0.00355348520.0403281717ENSG0267278 NA  NA  NA
ENSG0267328 36.1538389415   -1.7645196111   0.4926869829-3.5814212114   
0.00034173020.007661808 ENSG0267328 NA  NA  NA
ENSG0186575 252.0042869342  -1.6381747801   0.4198974515-3.901368713
9.56503323694447E-005   0.003015761 ENSG0186575 NF2 P35240  4771
ENSG0186716 107.848675839   -0.8308190712   0.3206514872-2.5910345165   
0.00956878940.076412686 ENSG0186716 BCR P11274  613
ENSG0186792 47.1740786192   0.98801583880.36633809422.6970054561
0.00699661240.0623663902ENSG0186792 HYAL3   O43820  8372
ENSG0186868 38.2607453371.85547460450.568540138 3.263577152 
0.0011001523
0.0179441693ENSG0186868 MAPT4137

However, I don't know why, but the code works after restarting IPython.
I obtain only the symbol name.
The code is not written in Italian; the letters are used without any meaning.







[Tutor] Problem on filtering data

2015-06-08 Thread jarod_v6--- via Tutor
Dear All;
I have a very silly problem.




with open("Dati_differenzialigistvsminigist_solodiff.csv") as p:
    for i in p:
        lines = i.strip("\n").split("\t")
        if lines[8] != "NA":
            if lines[8]:
                print lines[8]

Why do I continue to obtain empty lines?

baseMeanlog2FoldChangelfcSEstatpvaluepadj
ensemblhgnc_symboluniprotentrez
ENSG000146049.2127074325806-1.230249313832590.386060601796602 
   -3.186674082015650.001439188499137720.0214436050108864
ENSG0001460STPG1Q5TH7490529
ENSG0001631104.058286326346-0.805557044512410.294010285837035 
   -2.739894089824180.006145898532814860.0574525590840568
ENSG0001631KRIT1O00522889
ENSG00029331777.439389337051.586302553551380.608489070400574  
  2.606953239944140.009135183446056390.0740469624219782
ENSG0002933TMEM176AQ96HP855365
ENSG00031377.120731227135742.212658945608880.501823258624614  
  4.409239523240291.03734243663752e-050.000620867030402752
ENSG0003137CYP26B1Q9NR6356603
ENSG00039898.98972643541858-1.191561953259440.467992084035133 
   -2.546115615857280.0108929103912960.0832138232974883
ENSG0003989SLC7A2P525696542
ENSG0004478352.6800097036971.077810136145450.371617391411002  
  2.900322108319780.003727793670877750.041335008661432
ENSG0004478FKBP4Q027902288
ENSG00047761808.49276145547-2.226919751099480.560563734648272 
   -3.972643275785177.10794561495787e-050.00243688669444854
ENSG0004776HSPB6O14558126393
ENSG0004779110.0665574143771.03716286291060.375665509210244   
 2.760867946304180.005764798038383960.0552052693506261ENSG0

thanks so much
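The cause is not visible from the post, but one possibility is a field that holds only whitespace (a space or a stray "\r"): such a string is not empty, so it passes `if lines[8]` and prints as a blank line. A defensive sketch, assuming Python 3 and hypothetical rows with the symbol in column index 8:

```python
import io

# Hypothetical tab-separated rows; column index 8 holds the gene symbol.
sample = io.StringIO(
    "ENSG1\t1\t2\t3\t4\t5\t6\tENSG1\tSTPG1\n"
    "ENSG2\t1\t2\t3\t4\t5\t6\tENSG2\tNA\n"
    "ENSG3\t1\t2\t3\t4\t5\t6\tENSG3\t \n"
    "ENSG4\t1\t2\n"
    "\n"
)

symbols = []
for line in sample:
    fields = line.rstrip("\n").split("\t")
    if len(fields) <= 8:
        # Blank or truncated line: there is no column 8 to look at.
        continue
    symbol = fields[8].strip()      # drop spaces and stray "\r"
    if symbol and symbol != "NA":
        symbols.append(symbol)
```

Printing `repr(lines[8])` in the original loop would show whether the field really contains hidden whitespace.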