[Tutor] Problem on parsing data
I have a CSV file with "," as the separator. If I split on ",", I get rows with different numbers of columns, some with 30 and some with 50, depending on how many "," they contain:

In [105]: dimension_columns = []

In [106]: with open(nomi) as f:
     ...:     for i in f:
     ...:         lines = i.rstrip("\n").split(",")
     ...:         if "#" not in lines[0]:
     ...:             dimension_columns.append(len(lines))
     ...:

In [108]: set(dimension_columns)
Out[108]: {30, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 53, 54, 59}

The last columns are strings in double quotes, but they contain some ",", so this script splits them as well:

In [99]: lines
Out[99]:
['chr10', '19896830', '19896830', 'C', 'A', '"intergenic"',
 '"ARL5B(dist=929890)', 'PLXDC2(dist=208542)"', 'NA', 'NA',
 '"Score=458;Name=lod=97"', 'NA', 'NA', '"0.83"', '"rs7909976"',
 'NA', 'NA', 'NA', 'NA', 'NA', 'NA', 'NA', 'NA', 'NA', 'NA', 'NA', 'NA', 'NA', 'NA', 'NA',
 '"chr10\t19896830\trs7909976\tC\tA\t.\tREJECT\tDB\tGT:AD:BQ:DP:FA\t0:0',
 '69:.:69:1.00\t0/1:0', '37:32:37:1.00"']

What can I do to parse this file better, so that only the commas outside the quoted strings are used as separators?

___
Tutor maillist - Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor
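One way to keep the commas inside quoted fields is the standard csv module, which understands double-quoted fields by default. The sketch below is written for Python 3 and uses a short made-up sample line (an assumption, much shorter than the real file) in place of the file named nomi.

```python
import csv
import io

# Hypothetical one-line sample; the real file has many more columns.
sample = 'chr10,19896830,"ARL5B(dist=929890),PLXDC2(dist=208542)",NA\n'

def read_rows(stream):
    """Parse CSV rows, skipping empty rows and rows whose first field has '#'."""
    rows = []
    for row in csv.reader(stream):
        # csv.reader keeps commas that sit inside double quotes in the field.
        if row and "#" not in row[0]:
            rows.append(row)
    return rows

rows = read_rows(io.StringIO(sample))
column_counts = {len(row) for row in rows}
```

With a real file, the same function can be given the object returned by open(nomi, newline=""). If the quoting is correct, column_counts should then contain a single value.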
[Tutor] R: Re: Create a pivot table (Peter Otten)
Thanks so much for the help. I want to obtain a table like this:

>csv.writer(sys.stdout, delimiter="\t").writerows(table)
>        A100    D33     D34     D35     D36     D37     D38     D39
>A       5       0       ...
>B       2       2       ...
>C       0       ..
>

I have tried the pandas way, but unfortunately there are many duplicates. So now I am moving on to create a new file using a dictionary, with this format:

('A', 'A100') 5
('B', 'A100') 2

I am just wondering if there is a Pythonic way to do this. I don't want to use another software if I can avoid it.
[Tutor] Create a pivot table
Dear All! This is my data from a file. I want to obtain a count table of how many times c11 is present for each Project, Sample and Program.

['Program', 'Sample', 'Featurename', 'Project'],
['A', 'A100', 'c11', 'post50'],
['A', 'A100', 'c12', 'post50'],
['A', 'A100', 'c14', 'post50'],
['A', 'A100', 'c67', 'post50'],
['A', 'A100', 'c76', 'post50'],
['B', 'A100', 'c11', 'post50'],
['B', 'A100', 'c99', 'post50'],
['B', 'D33', 'c33', 'post50'],
['B', 'D33', 'c31', 'post50'],
['C', 'D34', 'c32', 'post60'],
['C', 'D35', 'c33', 'post60'],
['C', 'D36', 'c11', 'post60'],
['C', 'D37', 'c45', 'post60'],
['C', 'D38', 'c36', 'post60'],
['C', 'D39', 'c37', 'post60']

I want to obtain a pivot table with the samples as columns and the programs as rows and, as the values, the fusions. What is the best way to do this? Thanks in advance!
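A pivot of this kind can be built without pandas by counting (Program, Sample) pairs. The sketch below is Python 3 and uses only a subset of the rows from the post; the function name pivot_counts is made up for the illustration, and it counts every feature rather than only c11.

```python
from collections import Counter

# Subset of the rows from the post (header first).
rows = [
    ["Program", "Sample", "Featurename", "Project"],
    ["A", "A100", "c11", "post50"],
    ["A", "A100", "c12", "post50"],
    ["B", "A100", "c11", "post50"],
    ["B", "D33", "c33", "post50"],
    ["C", "D34", "c32", "post60"],
]

def pivot_counts(rows):
    """Count features per (program, sample); programs as rows, samples as columns."""
    data = rows[1:]  # skip the header row
    counts = Counter((program, sample) for program, sample, _feature, _project in data)
    programs = sorted({row[0] for row in data})
    samples = sorted({row[1] for row in data})
    table = [[""] + samples]
    for program in programs:
        # A Counter returns 0 for pairs that never occurred.
        table.append([program] + [counts[program, sample] for sample in samples])
    return table

table = pivot_counts(rows)
```

The resulting table is a list of lists, so it can be written out with csv.writer(sys.stdout, delimiter="\t").writerows(table). To count a single feature such as c11, the generator inside Counter could be given a condition on the feature field.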
[Tutor] R: Tutor Digest, Vol 146, Issue 23
Thanks so much!! Now I will try to understand it. Once I have built the presence/absence matrix, I want to substitute the 1 or 0 values inside the table with some values extracted from the dictionary called tutto.
[Tutor] R:reformatting data and traspose dictionary
Dear All, sorry for my poor presentation of the code. I read a txt file and prepare a dictionary:

import os

files = os.listdir(".")
tutto = {}
annotatemerge = {}
for i in files:
    with open(i, "r") as f:
        for it in f:
            lines = it.rstrip("\n").split("\t")
            if len(lines) > 2 and lines[0] != '#CHROM':
                conte = [lines[0], lines[1], lines[3], lines[4]]
                tutto.setdefault(i+"::"+"-".join(conte)+"::"+str(lines), []).append(1)
                annotatemerge.setdefault("-".join(conte), set()).add(i)

I create two dictionaries. The first, annotatemerge, uses a coordinate as the key (chr3-195710967-C-CG) and maps it to a set containing the file names:

'chr3-195710967-C-CG': {'M8.vcf'},
'chr17-29550645-T-C': {'M8.vcf'},
'chr7-140434541-G-A': {'M8.vcf'},
'chr14-62211578-CGTGT-C': {'M8.vcf', 'R76.vcf'},
'chr3-197346770-GA-G': {'M8.vcf', 'R76.vcf'},
'chr17-29683975-C-T': {'M8.vcf'},
'chr13-48955585-T-A': {'R76.vcf'},

The other dictionary reports more information; its key is a string of parts separated by the symbol "::":

{"M8.vcf::chr17-29665680-A-G::['chr17', '29665680', '.', 'A', 'G', '70.00', 'PASS', 'DP=647;TI=NM_001042492,NM_000267;GI=NF1,NF1;FC=Silent,Silent', 'GT:GQ:AD:VF:NL:SB:GQX', '0/1:70:623,24:0.0371:20:-38.2744:70']": [1],...}

What I want to obtain is a list with this format:

coordinate\tM8.vcf\tR76.vcf\n
chr3-195710967-C-CG\t1\t0\n
chr17-29550645-T-C\t1\t0\n
chr3-197346770-GA-G\t1\t1\n
chr13-48955585-T-A\t0\t1\n

Once I have that file, I want to transpose the table, so as to have the coordinates on the columns and the sample names on the rows.
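The presence/absence table and its transpose can both be derived from annotatemerge alone. This is a minimal Python 3 sketch using three entries copied from the post; the helper name presence_table is an assumption made for the example.

```python
# Three entries copied from the post: coordinate -> set of file names.
annotatemerge = {
    "chr3-195710967-C-CG": {"M8.vcf"},
    "chr3-197346770-GA-G": {"M8.vcf", "R76.vcf"},
    "chr13-48955585-T-A": {"R76.vcf"},
}

def presence_table(mapping):
    """Return a header row followed by one 0/1 row per coordinate."""
    samples = sorted(set().union(*mapping.values()))
    table = [["coordinate"] + samples]
    for coordinate in sorted(mapping):
        found = mapping[coordinate]
        table.append([coordinate] + [int(sample in found) for sample in samples])
    return table

table = presence_table(annotatemerge)

# zip(*table) pairs up the n-th item of every row, which swaps rows and columns.
transposed = [list(column) for column in zip(*table)]
```

Each row can then be written as "\t".join(map(str, row)) + "\n", or passed to csv.writer with delimiter="\t".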
[Tutor] reformatting data and traspose dictionary
Hi!!! I have this problem. I have a dictionary like this:

[Name_file::position] = lines

#Samplename::chr10-43606756-C-T::['chr10', '43606756', '.', 'C', 'T', '100.00', 'PASS', 'DP=439;TI=NM_020630,NM_020975;GI=RET,RET;FC=Synonymous_V455V,Synonymous_V455V;EXON', 'GT:GQ:AD:VF:NL:SB:GQX', '0/1:100:387,52:0.1185:20:-100.:100']

I want to obtain a table with Name_file on the rows, position on the columns, and one parameter from inside lines as the value, i.e.

chr10-43606756-C-T...
Samplename,Synonymous_V455V

and then I want to transpose this matrix. What is the simplest way to do this? Thanks so much!!
Re: [Tutor] Strange error (Peter Otten)
Thanks so much!! It was a very silly error.
[Tutor] Dictionary on data
Dear All! I have this element:

In [445]: pt = line.split("\t")[9]

In [446]: pt
Out[446]: 'gene_id "ENSG0223972"; gene_version "5"; transcript_id "ENST0456328"; transcript_version "2"; exon_number "1"; gene_name "DDX11L1"; gene_source "havana"; gene_biotype "transcribed_unprocessed_pseudogene"; transcript_name "DDX11L1-002"; transcript_source "havana"; transcript_biotype "processed_transcript"; exon_id "ENSE2234944"; exon_version "1"; tag "basic"; transcript_support_level "1";\n'

and I want to create a dictionary like this:

gene_id = "ENSG0223972"; ...

I found on Stack Overflow this way to create a dictionary of dictionaries (http://stackoverflow.com/questions/8550912/python-dictionary-of-dictionaries):

# This is our sample data
data = [("Milter", "Miller", 4),
        ("Milter", "Miler", 4),
        ("Milter", "Malter", 2)]

# dictionary we want for the result
dictionary = {}

# loop that makes it work
for realName, falseName, position in data:
    dictionary.setdefault(realName, {})[falseName] = position

I want to create a dictionary using setdefault, but I have difficulty transforming pt into a list of tuples.

data = pt.split(";")

      1 for i in data:
      2     l = i.split()
----> 3     print l[0]
      4
IndexError: list index out of range

In [457]: for i in data:
     ...:     l = i.split()
     ...:     print l
     ...:
['gene_id', '"ENSG0223972"']
['gene_version', '"5"']
['transcript_id', '"ENST0456328"']
['transcript_version', '"2"']
['exon_number', '"1"']
['gene_name', '"DDX11L1"']
['gene_source', '"havana"']
['gene_biotype', '"transcribed_unprocessed_pseudogene"']
['transcript_name', '"DDX11L1-002"']
['transcript_source', '"havana"']
['transcript_biotype', '"processed_transcript"']
['exon_id', '"ENSE2234944"']
['exon_version', '"1"']
['tag', '"basic"']
['transcript_support_level', '"1"']
[]

So how can I do this in a more elegant way? Thanks so much!!
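The IndexError comes from the last piece after the final ";", which is empty once the newline is split away (the [] at the end of the printed output). A minimal Python 3 sketch that skips empty pieces and builds a flat dictionary follows; it uses a shortened copy of the string from the post, and the name parse_attributes is made up for the example.

```python
# Shortened copy of the attribute string from the post.
pt = 'gene_id "ENSG0223972"; gene_version "5"; gene_name "DDX11L1"; tag "basic";\n'

def parse_attributes(text):
    """Turn 'key "value"; key "value";' into a dict mapping key to value."""
    attributes = {}
    for field in text.strip().split(";"):
        field = field.strip()
        if not field:
            # The piece after the last ";" is empty, so skip it.
            continue
        key, _space, value = field.partition(" ")
        attributes[key] = value.strip('"')
    return attributes

attributes = parse_attributes(pt)
```

A plain assignment is enough here, so setdefault is not needed. If a key can appear more than once in a line, the later value overwrites the earlier one in this sketch; collecting the values in a list would avoid that.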
[Tutor] Strange error
Hi!! This is my list:

In [378]: type(Span)
Out[378]: list

In [379]: Span
Out[379]:
[['M02898:39:0-AH4BK:1:2107:17412:10850',
  'M02898:39:0-AH4BK:1:2117:15242:18766',
  'M02898:39:0-AH4BK:1:1112:21747:21214',
  'M02898:39:0-AH4BK:1:2112:5119:9813',
  'M02898:39:0-AH4BK:1:1102:26568:5630',
  'M02898:39:0-AH4BK:1:2118:19680:11792',
  'M02898:39:0-AH4BK:1:1103:5469:6578',
  'M02898:39:0-AH4BK:1:2101:13087:20965',
  'M02898:39:0-AH4BK:1:1103:28031:13653',
  'M02898:39:0-AH4BK:1:1103:8013:21346',
  'M02898:39:0-AH4BK:1:1107:9189:22557',
  'M02898:39:0-AH4BK:1:2118:21263:23091',
  'M02898:39:0-AH4BK:1:1115:12279:20054',
  'M02898:39:0-AH4BK:1:1102:19433:17489',
  'M02898:39:0-AH4BK:1:1110:14533:11792',
  'M02898:39:0-AH4BK:1:2106:18027:12878',
  'M02898:39:0-AH4BK:1:1104:4408:6824',
  'M02898:39:0-AH4BK:1:2101:5678:7400']]

I have this error:

In [381]: len(Span)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
----> 1 len(Span)

TypeError: 'str' object is not callable

Why??? It IS a list!!!
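The message says that the name len refers to a string, not that Span is one. A common cause, assumed here because the post does not show the rest of the session, is an earlier assignment such as len = "..." that hides the built-in. The Python 3 sketch below reproduces the effect inside a function so that it does not disturb the real len.

```python
import builtins

def length_with_shadowed_len(items):
    """Show what happens when the name 'len' is rebound to a string."""
    len = "oops"  # this local name now hides the built-in len
    try:
        return len(items)
    except TypeError as error:
        return str(error)

message = length_with_shadowed_len(["a", "b"])

# The built-in itself is untouched and still reachable through builtins.
real_length = builtins.len(["a", "b"])
```

In an interactive session, del len removes such a rebinding and makes the built-in visible again; restarting the interpreter has the same effect.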
Re: [Tutor] How to parse large files
My file has 1960607 rows, but I don't understand why I'm not able to create a dictionary quickly. I also tried to use gc.disable, but it does not work. I need to have a dictionary, but I get this error:

with shelve.open("diz5") as db:
    with open("tmp1.txt") as instream:
        for line in instream:
            assert line.count("\t") == 1
            key, _tab, value = line.rstrip("\n").partition("\t")
            values = db.get(key) or set()
            values.add(value)
            db[key] = values

AttributeError                            Traceback (most recent call last)
----> 1 with shelve.open("diz5") as db:
      2     with open("tmp1.txt") as instream:
      3         for line in instream:
      4             assert line.count("\t") == 1
      5             key, _tab, value = line.rstrip("\n").partition("\t")

AttributeError: DbfilenameShelf instance has no attribute '__exit__'

I need to do the intersection of the dictionary keys. Thanks for the help.

M.
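The AttributeError above is what Python 2.7 raises because shelf objects only gained with-statement support in Python 3.4. Wrapping the shelf in contextlib.closing works on both versions. The sketch below uses a temporary folder and three made-up lines instead of tmp1.txt; the function names are assumptions for the illustration.

```python
import contextlib
import os
import shelve
import tempfile

def build_shelf(lines, path):
    """Store tab-separated key/value lines as key -> set of values."""
    # closing() calls db.close() on exit, which is all "with" needs here.
    with contextlib.closing(shelve.open(path)) as db:
        for line in lines:
            key, _tab, value = line.rstrip("\n").partition("\t")
            values = db.get(key) or set()
            values.add(value)
            db[key] = values  # reassign so the shelf stores the updated set

def read_shelf(path):
    """Load the whole shelf back into an ordinary dictionary."""
    with contextlib.closing(shelve.open(path)) as db:
        return {key: db[key] for key in db}

path = os.path.join(tempfile.mkdtemp(), "diz5")
build_shelf(["a\t1\n", "a\t2\n", "b\t3\n"], path)
result = read_shelf(path)
```

Reading and rewriting the stored set for every line is slow for about two million rows; building an in-memory dictionary first and writing it to the shelf once at the end is likely to be faster, if memory allows.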
Re: [Tutor] How to parse large files
Thanks!! I use Python 2.7. Can I also use it in that version? I don't understand why you use partition() and not split(). What is the reason for that?
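On the partition() versus split() question: str.partition is available in Python 2.7 and always returns exactly three items, whatever the line contains, while split returns a list whose length depends on the number of separators. A small sketch with made-up lines:

```python
line = "key1\tvalue with\ta second tab\n"

# partition splits at the first tab only and always returns three items.
key, _tab, value = line.rstrip("\n").partition("\t")

# split cuts at every tab, so the number of items varies with the input.
parts = line.rstrip("\n").split("\t")

# Without any tab, partition still returns three items (two of them empty).
no_tab = "lonely".partition("\t")
```

Because the tuple always has three items, the unpacking into key, _tab, value cannot fail, and any further tabs stay inside value.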
[Tutor] How to parse large files
Hi! I want to read two files and create a simple dictionary from each. The input files contain more than 1 rows.

diz5 = {}
with open("tmp1.txt") as p:
    for i in p:
        lines = i.rstrip("\n").split("\t")
        diz5.setdefault(lines[0], set()).add(lines[1])

diz3 = {}
with open("tmp2.txt") as p:
    for i in p:
        lines = i.rstrip("\n").split("\t")
        diz3.setdefault(lines[0], set()).add(lines[1])

How can I manage this reading and writing better? Thanks so much.
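Since both loops are identical apart from the file name, one option is to move them into a function. The sketch below is Python 3, uses collections.defaultdict in place of setdefault, and reads from in-memory text (made-up data) so that it runs without tmp1.txt and tmp2.txt; read_pairs is a hypothetical name.

```python
import io
from collections import defaultdict

def read_pairs(stream):
    """Group the second column by the first column of tab-separated lines."""
    groups = defaultdict(set)
    for line in stream:
        key, _tab, value = line.rstrip("\n").partition("\t")
        groups[key].add(value)
    return groups

# Made-up contents standing in for tmp1.txt and tmp2.txt.
diz5 = read_pairs(io.StringIO("a\t1\na\t2\nb\t3\n"))
diz3 = read_pairs(io.StringIO("b\t9\nc\t4\n"))

# Keys present in both dictionaries.
shared = diz5.keys() & diz3.keys()
```

With real files the calls become read_pairs(open("tmp1.txt")) inside a with block. In Python 2.7 the intersection would be written as set(diz5) & set(diz3) or diz5.viewkeys() & diz3.viewkeys().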
[Tutor] R: Re: Create complex dictionary :p:
Sorry, I just realized my question was wrong. I want to know how to build a dictionary of dictionaries in the most Pythonic way:

diz = {"A": {"e": 2, "s": 10}, "B": {"e": 20, "s": 7}}

So I have some keys (A, B), each associated with a new dictionary that holds the other keys I want to use. I want to extract the "e" value for each key. Thanks.
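For the nested dictionary above, a dictionary comprehension is one short way to pull out the "e" value for every outer key. A minimal sketch:

```python
diz = {"A": {"e": 2, "s": 10}, "B": {"e": 20, "s": 7}}

# Map each outer key to the "e" value of its inner dictionary.
e_values = {key: inner["e"] for key, inner in diz.items()}

# New entries can be added one inner key at a time with setdefault.
diz.setdefault("C", {})["e"] = 5
```

If some inner dictionaries might lack "e", inner.get("e") returns None for them, where inner["e"] would raise a KeyError.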
[Tutor] Create complex dictionary
Hi!! I would like to prepare a dictionary with a complex structure:

complex = {"name": "value", "surname": "po", "age": poi}

What is the most Pythonic way to build a dictionary of dictionaries? Thanks for any help!
[Tutor] Refresh library imported
Hi there!! I am trying to develop some scripts, and I use IPython to check whether my script works. When I change the script and import it again, I am not able to see the modification, so every time I need to close IPython, start it again and import the script. How can I refresh the library without closing IPython? Thanks so much!
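A second import of an already imported module does nothing, so the module has to be reloaded explicitly. The Python 3 sketch below writes a small throwaway module (myscript_demo, a made-up name) to a temporary folder, changes it, and reloads it with importlib.reload.

```python
import importlib
import os
import sys
import tempfile

# Write a hypothetical module to a temporary folder just for this demo.
folder = tempfile.mkdtemp()
module_path = os.path.join(folder, "myscript_demo.py")
with open(module_path, "w") as handle:
    handle.write("VALUE = 1\n")

sys.path.insert(0, folder)
import myscript_demo

first = myscript_demo.VALUE

# Edit the file on disk, as one would in an editor.
with open(module_path, "w") as handle:
    handle.write("VALUE = 22\n")

# reload re-executes the module's source in the existing module object.
importlib.reload(myscript_demo)
second = myscript_demo.VALUE
```

In Python 2.7 the equivalent call is the built-in reload(myscript_demo). Names imported with "from module import name" keep pointing at the old objects after a reload. IPython also ships an autoreload extension (%load_ext autoreload, then %autoreload 2) that reloads changed modules before each command.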
[Tutor] Problem on select esecution of object in a class
I have a class with many objects and I want to select some function using optparse:

id = options.step
ena = Rnaseq(options.configura, options.rst, options.outdir)
now = datetime.datetime.now()
ena.show()
diz = {}
for i,t in enumerate(ena.steps()):
    diz.setdefault(i,[]).append(t.__name__)

for i in "ena."+"".join(diz[id])+"()":
    print i.command

      1 for i in "ena."+"".join(diz[id])+"()":
      2
      3     print i.command
      4
AttributeError: 'str' object has no attribute 'command'

Here you see what it outputs:

"ena."+"".join(diz[id])+"()"
Out[85]: 'ena.trimmomatic()'

Definition: ena.trimmomatic(self)
Source:
    def trimmomatic(self):
        """
        Raw reads quality trimming and removing of Illumina adapters is performed using [Trimmomatic](http://www.usadellab.org/cms/index.php?page=trimmomatic).
        This step takes as input files:
        1. FASTQ files from the readset file if available
        """
        jobs = []
        for readset in self.readsets:
            trim_file_prefix = os.path.join(self.pt, "trim", readset.sample.name, readset.name + ".trim.")
            trim_log = trim_file_prefix + "log"
            trim_stats = trim_file_prefix + "stats.csv"
            :

Any suggestion on how to choose the function to use?

M.
[Tutor] R: Tutor Digest, Vol 138, Issue 26 Re: Problem on select esecution of object in a class (Alan Gauld)
Thanks so much for the help. What I want to do is select which functions I want to run.

ena = Rnaseq(options.configura, options.rst, options.outdir)
cmdset = [ ena.trimmomatic, ena.star, ena.merge_trimmomatic_stats ]

ena.show()
1 ena.trimmomatic
2 ena.star
3 ena.merge_trimmomatic_stats

The class Rnaseq has multiple functions. I want a way to run either from step 1 to 3, or from 2 to 3, or only step 2 or 3.

...
parser.add_option("-s", "--step", action="store", dest="steps", type="string", help=" write input file: %prg -o : directory of results ")

python myscript -s 1,3 ...

At the moment the only way I have found is this:

for cmd in cmdset:
    step = cmd()
    for i in step:
        print i.command

but it is not elegant, so I want to know what the right way is to run the functions of the class by selecting the step we want to start from.

> However, building function names as strings and then calling them is usually
> a bad design pattern. Especially for large numbers of objects. So maybe if you
> explain what/why you are doing this we can suggest a better alternative.
> --
> Alan G
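One way to select steps without building names as strings is to keep the bound methods in a list and index that list with the numbers given on the command line. The Python 3 sketch below uses a made-up Pipeline class in place of Rnaseq, and it assumes that "-s 1,3" means "run steps 1 and 3"; both are assumptions, not the original code.

```python
class Pipeline(object):
    """Hypothetical stand-in for the Rnaseq class in the post."""

    def trimmomatic(self):
        return ["trim reads"]

    def star(self):
        return ["align reads"]

    def merge_trimmomatic_stats(self):
        return ["merge stats"]

    def steps(self):
        # Bound methods, in the order in which they should run.
        return [self.trimmomatic, self.star, self.merge_trimmomatic_stats]

def select_steps(pipeline, spec):
    """Pick steps by 1-based number from a string such as '1,3'."""
    available = pipeline.steps()
    chosen = []
    for part in spec.split(","):
        number = int(part)
        if not 1 <= number <= len(available):
            raise ValueError("no such step: %d" % number)
        chosen.append(available[number - 1])
    return chosen

chosen = select_steps(Pipeline(), "1,3")
names = [step.__name__ for step in chosen]
commands = [command for step in chosen for command in step()]
```

To run a range such as "from 2 to 3", the same list can be sliced, for example available[1:3]. Since the methods are stored as objects, no string ever has to be turned back into a call.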
[Tutor] R: Tutor Digest, Vol 136, Issue 19
Thanks for the help!! The data I posted was unfortunately truncated:

ENSG0267199 11.8156750037 1.7442320912 0.5103558647 3.4176781572 0.0006315774 0.0122038731 ENSG0267199 NA NA NA
ENSG0267206 27.9863824875 -1.7496803666 0.5026610268 -3.4808355401 0.0004998523 0.0102622293 ENSG0267206 LCN6 P62502 158062
ENSG0267249 9.3904402364 -1.3510262216 0.4923605689 -2.743977294 0.0060699736 0.056855688 ENSG0267249 NA NA NA
ENSG0267270 8.3690069507 1.2036206884 0.4798752229 2.5081951119 0.012134964 0.0887668369 ENSG0267270 NA
ENSG0267278 5.8613893946 -1.7315438788 0.5939508055 -2.9152984772 0.0035534852 0.0403281717 ENSG0267278 NA NA NA
ENSG0267328 36.1538389415 -1.7645196111 0.4926869829 -3.5814212114 0.0003417302 0.007661808 ENSG0267328 NA NA NA
ENSG0186575 252.0042869342 -1.6381747801 0.4198974515 -3.901368713 9.56503323694447E-005 0.003015761 ENSG0186575 NF2 P35240 4771
ENSG0186716 107.848675839 -0.8308190712 0.3206514872 -2.5910345165 0.0095687894 0.076412686 ENSG0186716 BCR P11274 613
ENSG0186792 47.1740786192 0.9880158388 0.3663380942 2.6970054561 0.0069966124 0.0623663902 ENSG0186792 HYAL3 O43820 8372
ENSG0186868 38.260745337 1.8554746045 0.568540138 3.263577152 0.0011001523 0.0179441693 ENSG0186868 MAPT 4137

However, I don't know why, but the code works after restarting IPython. I obtain only the symbol names. The code is not written in Italian; the letters are used without any meaning.
[Tutor] Problem on filtering data
Dear All, I have a very silly problem.

with open("Dati_differenzialigistvsminigist_solodiff.csv") as p:
    for i in p:
        lines = i.strip("\n").split("\t")
        if lines[8] != "NA":
            if lines[8]:
                print lines[8]

Why do I keep obtaining empty lines?

baseMean log2FoldChange lfcSE stat pvalue padj ensembl hgnc_symbol uniprot entrez
ENSG0001460 49.2127074325806 -1.23024931383259 0.386060601796602 -3.18667408201565 0.00143918849913772 0.0214436050108864 ENSG0001460 STPG1 Q5TH74 90529
ENSG0001631 104.058286326346 -0.80555704451241 0.294010285837035 -2.73989408982418 0.00614589853281486 0.0574525590840568 ENSG0001631 KRIT1 O00522 889
ENSG0002933 1777.43938933705 1.58630255355138 0.608489070400574 2.60695323994414 0.00913518344605639 0.0740469624219782 ENSG0002933 TMEM176A Q96HP8 55365
ENSG0003137 7.12073122713574 2.21265894560888 0.501823258624614 4.40923952324029 1.03734243663752e-05 0.000620867030402752 ENSG0003137 CYP26B1 Q9NR63 56603
ENSG0003989 8.98972643541858 -1.19156195325944 0.467992084035133 -2.54611561585728 0.010892910391296 0.0832138232974883 ENSG0003989 SLC7A2 P52569 6542
ENSG0004478 352.680009703697 1.07781013614545 0.371617391411002 2.90032210831978 0.00372779367087775 0.041335008661432 ENSG0004478 FKBP4 Q02790 2288
ENSG0004776 1808.49276145547 -2.22691975109948 0.560563734648272 -3.97264327578517 7.10794561495787e-05 0.00243688669444854 ENSG0004776 HSPB6 O14558 126393
ENSG0004779 110.066557414377 1.0371628629106 0.375665509210244 2.76086794630418 0.00576479803838396 0.0552052693506261 ENSG0

Thanks so much.
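The post does not show what the blank output lines contain, so the cause is a guess. One possibility is that some symbols consist only of whitespace or end in a carriage return, which a plain truth test lets through. The Python 3 sketch below strips each value before testing it and skips short rows; the sample lines are made up and the function name symbols is hypothetical.

```python
import csv
import io

def symbols(stream, column=8):
    """Collect non-empty, non-NA values from one column of a tab-separated file."""
    found = []
    for row in csv.reader(stream, delimiter="\t"):
        if len(row) <= column:
            continue  # short or truncated row, nothing to read in this column
        value = row[column].strip()  # remove spaces and stray "\r"
        if value and value != "NA":
            found.append(value)
    return found

def make_line(symbol):
    # Made-up line with ten tab-separated fields; the symbol is field 8.
    return "\t".join(["id"] + ["0"] * 7 + [symbol, "x"]) + "\r\n"

sample = io.StringIO(
    make_line("STPG1") + make_line("NA") + make_line(" ") + make_line("KRIT1")
)
result = symbols(sample)
```

Printing repr(lines[8]) in the original loop would show exactly which characters the blank values hold and confirm or rule out this guess.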