Re: [Tutor] table to dictionary and then analysis
Thanks for all of the responses; they have been really helpful.

On Fri, May 18, 2012 at 8:54 PM, Russel Winder rus...@winder.org.uk wrote:
> On Thu, 2012-05-17 at 19:35 +1000, Steven D'Aprano wrote:
> > On Thu, May 17, 2012 at 08:27:07AM +0100, Russel Winder wrote:
> > > Should we be promoting use of the format method in strings rather than the % operator? % is deprecated now.
> >
> > It most certainly is not. There are no plans to deprecate the string % operator any time in the foreseeable future. It may, hypothetically, wither away from lack of use
>
> OK I am clearly wrong with the statement I made. I had assumed the statement (*) that it would be deprecated in 3.1 was carried through, I had not actually checked the reality in the 3.1 and 3.2 differences documents. My apologies for misdirection, thanks for pulling me up on this.
>
> (*) There is no statement of when deprecation would happen in PEP 3101 itself http://www.python.org/dev/peps/pep-3101/ but there is an explicit statement in http://docs.python.org/release/3.0/whatsnew/3.0.html#pep-3101-a-new-approach-to-string-formatting which clearly didn't happen even though it led to a lot of people thinking it would.
>
> --
> Russel.
> Dr Russel Winder      t: +44 20 7585 2200   voip: sip:russel.win...@ekiga.net
> 41 Buckmaster Road    m: +44 7770 465 077   xmpp: rus...@winder.org.uk
> London SW11 1EN, UK   w: www.russel.org.uk  skype: russel_winder

___
Tutor maillist - Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor
Re: [Tutor] table to dictionary and then analysis
On Thu, 2012-05-17 at 19:35 +1000, Steven D'Aprano wrote:
> On Thu, May 17, 2012 at 08:27:07AM +0100, Russel Winder wrote:
> > Should we be promoting use of the format method in strings rather than the % operator? % is deprecated now.
>
> It most certainly is not. There are no plans to deprecate the string % operator any time in the foreseeable future. It may, hypothetically, wither away from lack of use

OK I am clearly wrong with the statement I made. I had assumed the statement (*) that it would be deprecated in 3.1 was carried through, I had not actually checked the reality in the 3.1 and 3.2 differences documents. My apologies for misdirection, thanks for pulling me up on this.

(*) There is no statement of when deprecation would happen in PEP 3101 itself http://www.python.org/dev/peps/pep-3101/ but there is an explicit statement in http://docs.python.org/release/3.0/whatsnew/3.0.html#pep-3101-a-new-approach-to-string-formatting which clearly didn't happen even though it led to a lot of people thinking it would.

--
Russel.
Re: [Tutor] table to dictionary and then analysis
On Wed, 2012-05-16 at 12:57 -0400, Joel Goldstick wrote:
[...]
> I think the OP is just learning and this thread may have gotten off track.

I didn't realize discussion of immediate side issues and alternatives, and allowing people to exchange information was OT in this mailing list. Also of course, OP didn't mention databases, but was asking how to do it with lists and dictionaries. I think there is irony somewhere in here.

> Here is some code to get started. I decided to use sqlite3 since it's easy to use with python -- no finding and learning to load packages.
>
> #!/usr/bin/env python
>
> import sqlite3 as db
>
> # Ideally this shouldn't be global, but in this short code snippet it gets the job done
> # here we create a database and get a cursor
> conn = db.connect('climate.db')
> cursor = conn.cursor()
> print(cursor)

I believe that there are more problems than just global data here. One obvious thing is this code is not safe against exceptions. I appreciate this is a trivially small program, but I think it important that sample code presents good style or explicitly states what is wrong so as to present what not to do. Your comment about global sits well with this, but for me doesn't go far enough.

Python introduced context managers and the with statement exactly for this sort of situation, following the lead of C++ with RAII. I think we should all get into the habit of using the with statement automatically in this situation.

> # this will create a table for our data
> sql_create = """CREATE TABLE if not exists rain (
>     id INTEGER PRIMARY KEY,
>     year INTEGER,
>     month TEXT(3),
>     rainfall FLOAT,
>     fire_area FLOAT
>     )"""
>
> # this will read the data file and put it in our database
> def populate_climate_table(file_name):
>     """
>     reads the file_name and insert data into sqlite table
>     """
>     sql_insert_string = "insert into rain (year, month, rainfall, fire_area) values (%d, '%s', %f, %f)"
>
>     f = open(file_name)

Same comment about context managers and with statement applies here: this code is not exception safe.

>     f.readline()  # get rid of column headers
>     for l in f.readlines():
>         data_list = l.split()
>         print(data_list)
>         sql_insert = sql_insert_string % (int(data_list[0]), data_list[1], float(data_list[2]), float(data_list[3]))

Should we be promoting use of the format method in strings rather than the % operator? % is deprecated now.

Although not an issue here, this sort of SQL string manipulation is at the heart of SQL injection attacks and so should be frowned upon. Hence SQLAlchemy's expression language, which goes some way to avoiding the whole issue, at the expense of having to load an additional package. With package management on Debian/Fedora/Ubuntu/MacPorts or the pip command this is not difficult to add.

>         print(sql_insert)
>         cursor.execute(sql_insert)
>     conn.commit()
>
> if __name__ == '__main__':
>     print(sql_create)
>     cursor.execute(sql_create)
>     populate_climate_table('data.txt')
>
> So, I haven't solved all of the questions with this code. The next thing to do is to read a little about sqlite select statements. For example:
>
> sqlite> select sum(rainfall)/count(*) from rain;
> 3.97352768125
>
> This statement will give the average rainfall over the complete dataset. To get the average rainfall for a given year do this:
>
> sqlite> select sum(rainfall)/count(*) from rain where year = 1983;
>
> Come back with more questions

--
Russel.
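The two points raised in this message (exception safety via the with statement, and keeping data out of the SQL text) can be sketched together as follows. This is only an illustration under stated assumptions: it uses an in-memory database, the rain table layout from the quoted code, and two hypothetical helpers, parse_rows and load_rows, that do not appear anywhere in the thread.

```python
import sqlite3
from contextlib import closing

CREATE_SQL = """CREATE TABLE IF NOT EXISTS rain (
    id INTEGER PRIMARY KEY,
    year INTEGER,
    month TEXT,
    rainfall REAL,
    fire_area REAL
)"""

# The ? placeholders let sqlite3 bind the values itself, so no SQL text is
# ever assembled from the data being loaded.
INSERT_SQL = "INSERT INTO rain (year, month, rainfall, fire_area) VALUES (?, ?, ?, ?)"


def parse_rows(lines):
    """Hypothetical helper: parse 'year month rainfall fire_area' lines."""
    rows = []
    for line in lines:
        fields = line.split()
        if len(fields) != 4:
            continue  # skip the two-word header, blank, or incomplete lines
        rows.append((int(fields[0]), fields[1], float(fields[2]), float(fields[3])))
    return rows


def load_rows(db_path, rows):
    """Hypothetical helper: insert rows, return the number of rows stored."""
    # closing() makes sure the connection is closed even if an exception is
    # raised; "with conn" commits on success and rolls back on error.
    with closing(sqlite3.connect(db_path)) as conn:
        with conn:
            conn.execute(CREATE_SQL)
            conn.executemany(INSERT_SQL, rows)
        return conn.execute("SELECT count(*) FROM rain").fetchone()[0]


sample = ["rainfall firearea", "1972 Jan 12.7083199 0", "1972 Feb 14.17007142 0"]
stored = load_rows(":memory:", parse_rows(sample))
print(stored)
```

For a real file the same pattern applies to the file handle: opening it as "with open('data.txt') as f:" and passing f to parse_rows closes the file even when a line fails to parse.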
Re: [Tutor] table to dictionary and then analysis
On Wed, 2012-05-16 at 16:03 +0100, Alan Gauld wrote:
[...]
> I agree, but in this case SQL seemed like the most likely fit of the ones I knew. however:

Which raises the point that the best design of a given problem in a given context is the one that is most comprehensible to the people directly involved.

> > SQL
> > MongoDB
>
> I know about these

I have it on good authority yesterday that MongoDB is only properly useful in a single store context, i.e. not a replicated cluster. Along with this comes news that Riak is very good and has a Python API.

> > CouchDB
> > Cassandra
> > Neo
>
> These are new to me.

CouchDB is an Erlang implemented system. Ubuntu One uses this for example. Cassandra is an Apache project, a JVM-based system. MongoDB, CouchDB and Cassandra are document stores. Neo4J is a graph repository so of a very different architecture and performance characteristics.

And then there is Redis :-)

> > etc. Python only has SQLite3 as standard but there are alternatives. I have been using PyMongo quite successfully.
>
> Python comes with several storage/access options including shelve, gdbm, ldap, config files, XML, in addition to SQL.

Indeed. The problem I have had with shelve for this sort of thing is that it is critically dependent on the pickling algorithm and so potentially Python version dependent.

[...]
> on flexibility of data format. The OP's requirements suggested intelligent filtering of a fixed record format which is one of the areas where SQL works well. The other side of the coin is that the data is essentially single table so the relationship management aspects of SQL would not be needed. So I agree we don't have enough detail to be 100% sure that another option would not work as well or better.

The signpost here is that the table as is is likely not in third normal form, and that if the problem currently being solved was actually a small part of a bigger problem, this issue would need to be addressed.

> But most other options require learning new (often bespoke) query languages and have limited user tools. All of these factors need to be included too. Mongo et al tend to be better suited, in my experience, to machine access applications rather than end user access.

Agreed. Unfortunately, vendor commercial issues often get in the way of experimenting to find out where the NoSQL systems are genuinely better than SQL ones. We will get there though.

Interesting, or not, the Big Data people are rapidly realizing that data mining and SQL are mutually incompatible. The current trend is towards streaming whole databases through dataflow programs. But Java rather than Python is the default language in that world.

> > There are various articles around the Web comparing and contrasting these various models. Some of the articles are even reasonable :-)
>
> Wikipedia is my friend :-)

:-)

--
Russel.
Re: [Tutor] table to dictionary and then analysis
On 17/05/12 08:39, Russel Winder wrote:
> Interesting, or not, the Big Data people are rapidly realizing that data mining and SQL are mutually incompatible.

After many years working with big data mining teams/apps my considered opinion is use SAS or one of its peers! It costs money but it works. And the savings in development time far outweigh the costs. And if you have to manage 100TB or more then the costs of SAS licenses are likely to be the least of your worries!

And if you are really serious about it run SAS on top of a Teradata $torage $ystem...

Note: I have no commercial connection with either company, just an admirer of their technology solutions. :-)

--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
Re: [Tutor] table to dictionary and then analysis
On Thu, May 17, 2012 at 08:27:07AM +0100, Russel Winder wrote:
> Should we be promoting use of the format method in strings rather than the % operator? % is deprecated now.

It most certainly is not.

There are no plans to deprecate the string % operator any time in the foreseeable future. It may, hypothetically, wither away from lack of use (although I doubt it -- so long as there are C programmers, there will be people who like % formatting). If you think it is deprecated, please show me in the official Python documentation where it says so.

As far as I am concerned, the final word on deprecation belongs to the creator of Python, and BDFL, Guido van Rossum, who hopes that *at best* the format method will gradually replace % formatting over the next decade or so before the (as yet hypothetical) Python 4:

http://mail.python.org/pipermail/python-dev/2009-September/092399.html

Rather than repeat myself, I will just point to what I wrote back in January:

http://mail.python.org/pipermail/python-list/2012-January/1285894.html

--
Steven
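For anyone following the thread who has not seen the two styles side by side, here is a small sketch. The values are one row of the rainfall sample from the original post; the variable names are illustrative only.

```python
year, month, rainfall = 1972, "Jan", 12.7083199

# printf-style formatting with the % operator
percent_style = "%d %s %.2f" % (year, month, rainfall)

# the same layout written with the str.format method
format_style = "{0:d} {1} {2:.2f}".format(year, month, rainfall)

print(percent_style)
print(format_style)
```

Both expressions build the same string, so choosing between them here is a matter of taste rather than correctness.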
Re: [Tutor] table to dictionary and then analysis
On 5/17/2012 3:27 AM, Russel Winder wrote:
> Should we be promoting use of the format method in strings rather than the % operator? % is deprecated now.

I for one do not like seeing % deprecated. Why? It is not broken, and IMHO the easiest to use of all formatting options.

--
Bob Gailer
919-636-4239
Chapel Hill NC
Re: [Tutor] table to dictionary and then analysis
On 17/05/2012 10:35, Steven D'Aprano wrote:
> On Thu, May 17, 2012 at 08:27:07AM +0100, Russel Winder wrote:
> > Should we be promoting use of the format method in strings rather than the % operator? % is deprecated now.
>
> It most certainly is not.
>
> There are no plans to deprecate the string % operator any time in the foreseeable future. It may, hypothetically, wither away from lack of use (although I doubt it -- so long as there are C programmers, there will be people who like % formatting). If you think it is deprecated, please show me in the official Python documentation where it says so.
>
> As far as I am concerned, the final word on deprecation belongs to the creator of Python, and BDFL, Guido van Rossum, who hopes that *at best* the format method will gradually replace % formatting over the next decade or so before the (as yet hypothetical) Python 4:
>
> http://mail.python.org/pipermail/python-dev/2009-September/092399.html
>
> Rather than repeat myself, I will just point to what I wrote back in January:
>
> http://mail.python.org/pipermail/python-list/2012-January/1285894.html

You beat me to it :)

--
Cheers.
Mark Lawrence.
Re: [Tutor] table to dictionary and then analysis
On Tue, 2012-05-15 at 19:14 +0100, Alan Gauld wrote:
> On 15/05/12 10:36, Russel Winder wrote:
> > ...queries passed over it then yes, a database is the right thing -- though I would probably choose a non-SQL database.
>
> As a matter of interest why?

Because there are alternatives that need to be investigated on a per problem basis for the best database.

SQL
MongoDB
CouchDB
Cassandra
Neo

etc. Python only has SQLite3 as standard but there are alternatives. I have been using PyMongo quite successfully.

> And what kind of alternative would you use?

See above ;-)

> It seems to me that SQL is ideally suited(*) to this type of role. I'm curious what the alternatives might be and why they would be preferred?
>
> (*)Because: Flexible query language, wide set of tools including GUI query builders, reporting tools etc. Plus easy integration with programming environments, scalability (less an issue here), security (usually) etc.

It is not clear that the original table works better with the relational model compared to one of the key-value stores or document stores. It might. But I no longer always go to SQL for persistence as I used to a few years ago.

There are various articles around the Web comparing and contrasting these various models. Some of the articles are even reasonable :-)

--
Russel.
Re: [Tutor] table to dictionary and then analysis
On 16/05/12 12:27, Russel Winder wrote:
> > As a matter of interest why?
>
> Because there are alternatives that need to be investigated on a per problem basis for the best database.

I agree, but in this case SQL seemed like the most likely fit of the ones I knew. however:

> SQL
> MongoDB

I know about these

> CouchDB
> Cassandra
> Neo

These are new to me.

> etc. Python only has SQLite3 as standard but there are alternatives. I have been using PyMongo quite successfully.

Python comes with several storage/access options including shelve, gdbm, ldap, config files, XML, in addition to SQL.

> It is not clear that the original table works better with the relational model compared to one of the key-value stores or document stores.

Most key-value stores are optimised for fast queries of a single type and generally not great at grouping or ordering. They also tend to major on flexibility of data format. The OP's requirements suggested intelligent filtering of a fixed record format which is one of the areas where SQL works well. The other side of the coin is that the data is essentially single table so the relationship management aspects of SQL would not be needed. So I agree we don't have enough detail to be 100% sure that another option would not work as well or better.

But most other options require learning new (often bespoke) query languages and have limited user tools. All of these factors need to be included too. Mongo et al tend to be better suited, in my experience, to machine access applications rather than end user access.

> There are various articles around the Web comparing and contrasting these various models. Some of the articles are even reasonable :-)

Wikipedia is my friend :-)

--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
Re: [Tutor] table to dictionary and then analysis
On Wed, May 16, 2012 at 11:03 AM, Alan Gauld alan.ga...@btinternet.com wrote:
> On 16/05/12 12:27, Russel Winder wrote:
> > > As a matter of interest why?
> >
> > Because there are alternatives that need to be investigated on a per problem basis for the best database.
>
> I agree, but in this case SQL seemed like the most likely fit of the ones I knew. however:
>
> > SQL
> > MongoDB
>
> I know about these
>
> > CouchDB
> > Cassandra
> > Neo
>
> These are new to me.
>
> > etc. Python only has SQLite3 as standard but there are alternatives. I have been using PyMongo quite successfully.
>
> Python comes with several storage/access options including shelve, gdbm, ldap, config files, XML, in addition to SQL.
>
> > It is not clear that the original table works better with the relational model compared to one of the key-value stores or document stores.
>
> Most key-value stores are optimised for fast queries of a single type and generally not great at grouping or ordering. They also tend to major on flexibility of data format. The OP's requirements suggested intelligent filtering of a fixed record format which is one of the areas where SQL works well. The other side of the coin is that the data is essentially single table so the relationship management aspects of SQL would not be needed. So I agree we don't have enough detail to be 100% sure that another option would not work as well or better.
>
> But most other options require learning new (often bespoke) query languages and have limited user tools. All of these factors need to be included too. Mongo et al tend to be better suited, in my experience, to machine access applications rather than end user access.
>
> > There are various articles around the Web comparing and contrasting these various models. Some of the articles are even reasonable :-)
>
> Wikipedia is my friend :-)
>
> --
> Alan G
> Author of the Learn to Program web site
> http://www.alan-g.me.uk/

I think the OP is just learning and this thread may have gotten off track.

Here is some code to get started. I decided to use sqlite3 since it's easy to use with python -- no finding and learning to load packages.

#!/usr/bin/env python

import sqlite3 as db

# Ideally this shouldn't be global, but in this short code snippet it gets the job done
# here we create a database and get a cursor
conn = db.connect('climate.db')
cursor = conn.cursor()
print(cursor)

# this will create a table for our data
sql_create = """CREATE TABLE if not exists rain (
    id INTEGER PRIMARY KEY,
    year INTEGER,
    month TEXT(3),
    rainfall FLOAT,
    fire_area FLOAT
    )"""


# this will read the data file and put it in our database
def populate_climate_table(file_name):
    """
    reads the file_name and insert data into sqlite table
    """
    sql_insert_string = "insert into rain (year, month, rainfall, fire_area) values (%d, '%s', %f, %f)"

    f = open(file_name)
    f.readline()  # get rid of column headers
    for l in f.readlines():
        data_list = l.split()
        print(data_list)
        sql_insert = sql_insert_string % (int(data_list[0]), data_list[1], float(data_list[2]), float(data_list[3]))
        print(sql_insert)
        cursor.execute(sql_insert)
    conn.commit()


if __name__ == '__main__':
    print(sql_create)
    cursor.execute(sql_create)
    populate_climate_table('data.txt')

So, I haven't solved all of the questions with this code. The next thing to do is to read a little about sqlite select statements. For example:

sqlite> select sum(rainfall)/count(*) from rain;
3.97352768125

This statement will give the average rainfall over the complete dataset. To get the average rainfall for a given year do this:

sqlite> select sum(rainfall)/count(*) from rain where year = 1983;

Come back with more questions

--
Joel Goldstick
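The select statements above can also be run from Python rather than from the sqlite shell. The following is a minimal sketch, not part of the posted program: it assumes an in-memory database, a simplified rain table without the id column, and four made-up rows chosen only so the arithmetic is easy to check by hand.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rain (year INTEGER, month TEXT, rainfall REAL, fire_area REAL)")

# Made-up illustrative values, NOT taken from the attached data file.
rows = [
    (1972, "Jun", 1.0, 0.0),
    (1972, "Jul", 2.0, 28.0),
    (1973, "Jun", 4.0, 0.0),
    (1973, "Jul", 6.0, 0.4),
]
conn.executemany("INSERT INTO rain VALUES (?, ?, ?, ?)", rows)

# Average rainfall over the whole table; avg() is the built-in aggregate
# equivalent of sum(rainfall)/count(*).
overall = conn.execute("SELECT avg(rainfall) FROM rain").fetchone()[0]

# Average rainfall for one year, with the year passed as a bound parameter.
one_year = conn.execute(
    "SELECT sum(rainfall)/count(*) FROM rain WHERE year = ?", (1973,)
).fetchone()[0]

# One summary row per year instead of one query per year.
per_year = conn.execute(
    "SELECT year, avg(rainfall), max(fire_area) FROM rain GROUP BY year ORDER BY year"
).fetchall()
conn.close()

print(overall, one_year, per_year)
```

GROUP BY is the piece that tends to be useful next: it yields the per-year figures in a single statement, and a WHERE clause on month narrows the same query to a season.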
Re: [Tutor] table to dictionary and then analysis
Thanks Bob, SQL does appear to be very simple, although I cannot get the queries to work. Can you suggest a site that has examples for what I am trying to do? I have done some googling but it has not been successful so far.

On Tue, May 15, 2012 at 1:38 PM, bob gailer bgai...@gmail.com wrote:
> On 5/14/2012 10:16 PM, questions anon wrote:
> > I am completely new to dictionaries and I am not even sure if this is what I need to use. I have a text file that I would like to run summary stats on particular months, years and climate indices (in this case the climate indices are rainfall and fire area, so not actually climate indices at all).
>
> I would set up a SQLite database with a table of 4 numeric columns: year, month, rainfall, firearea
>
> Use SQL to select the desired date range and do the max and avg calculations:
>
> select year, avg(firearea), max(rainfall) from table where year = 1973 and month between 6 and 8
>
> you can use dictionaries but that will be harder. Here is a start (untested). Assumes data are correct.
>
> months = dict(Jan=1, Feb=2, Mar=3, Apr=4, May=5, Jun=6, Jul=7, Aug=8, Sep=9, Oct=10, Nov=11, Dec=12)
> with open('d:/yearmonthrainfire.txt', 'r') as infile:
>     next(infile)  # skip the column-header line
>     for line in infile:
>         line = line.split()
>         if not line:  # skip blank lines
>             continue
>         year = int(line[0])
>         month = months[line[1]]
>         rainfall = float(line[2])
>         firearea = float(line[3])
>         # 'table' is a placeholder for the real table name
>         sql = "insert into table (year, month, rainfall, firearea) values(%i,%i,%f,%f)" % (year, month, rainfall, firearea)
>         # I don't have handy how one runs the sql
>
> > A text file is attached but a small sample of the file:
> >
> >          rainfall firearea
> > 1972 Jan 12.7083199 0
> > 1972 Feb 14.17007142 0
> > 1972 Mar 14.5659302 0
> > 1972 Apr 1.508517302 0
> > 1972 May 2.780009889 0
> > 1972 Jun 1.609619287 0
> > 1972 Jul 0.138150181 28
> > 1972 Aug 0.214346148 32
> > 1972 Sep 1.322102228 34747.8
> > 1972 Oct 0.092663137 3655.9
> > 1972 Nov 1.852276635 85.1
> > 1972 Dec 2.011206002 42959.6
> > 1973 Jan 5.55704346 153.5
> > 1973 Feb 12.60326356 116.2
> > 1973 Mar 11.08849105 223.6
> > 1973 Apr 5.864925449 2.4
> > ..
> >
> > I have used an example from a book (a primer on scientific programming with python) and it seems to be working (see below) but I am not sure if my keys etc. are set up correctly to then begin analysis, and even how to use the dictionaries in my analysis. For example how can I print out the year with the calculated mean 'firearea' of June-July-August for that year along with the maximum 'rainfall' for June-July-August of the same year? Any feedback will be greatly appreciated!
> >
> > infile=open('d:/yearmonthrainfire.txt','r')
> > lines=infile.readlines()
> > infile.close()
> > data={} #data[index][year]=indexvalue
> > first_line=lines[0]
> > climateindexname=first_line.split()
> > for index in climateindexname:
> >     data[index]={}
> > YEAR={}
> > MONTH={}
> > for line in lines[2:]:
> >     words=line.split()
> >     year=words[0] #years
> >     YEAR[year]={}
> >     month=words[1] #months
> >     MONTH[month]={}
> >     values=words[2:] #values of climateindices
> >     for index, v in zip(climateindexname, values):
> >         if v !=' ':
> >             data[index][year]=float(v)
> > print("years=", YEAR)
> > print("months=", MONTH)
> > print("data=", data)
>
> We usually reserve all caps names for constants.
>
> You have way too many dictionaries.
>
> Your program seems very complex for a very simple task.
>
> I will not attempt to figure out what it does.
>
> --
> Bob Gailer
> 919-636-4239
> Chapel Hill NC
Re: [Tutor] table to dictionary and then analysis
On 15/05/12 07:12, questions anon wrote:
> Thanks Bob, sql does appear to be very simple although I cannot get the queries to work. Can you suggest a site that has examples for what I am trying to do. I have done some googling but it has not been successful so far.

You can try my tutorial topic on databases. It covers the basics of using SQLite to query data in a fairly concise format. It's only available in the Version 2 tutor so far...

http://www.alan-g.me.uk/tutor/tutdbms.htm

--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
Re: [Tutor] table to dictionary and then analysis
On Mon, 2012-05-14 at 23:38 -0400, bob gailer wrote:
[...]
> I would set up a SQLite database with a table of 4 numeric columns: year, month, rainfall, firearea
>
> Use SQL to select the desired date range and do the max and avg calculations:
>
> select year, avg(firearea), max(rainfall) from table where year = 1973 and month between 6 and 8
>
> you can use dictionaries but that will be harder. Here is a start (untested). Assumes data are correct.

Clearly if the data is to be stored for a long time and have various (currently unknown) queries passed over it then yes, a database is the right thing -- though I would probably choose a non-SQL database.

If the issue is to just do quick calculations over the data in the file format then nothing wrong with using dictionaries or parallel arrays à la:

with open('yearmonthrainfire.txt') as infile:
    climateindexname = infile.readline().split()
    # skip blank lines so every row has year, month, rainfall, firearea
    data = [line.split() for line in infile if line.strip()]

years = sorted({item[0] for item in data})
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']

dataByYear = {year: [(float(item[2]), float(item[3])) for item in data if item[0] == year] for year in years}
dataByMonth = {month: [(float(item[2]), float(item[3])) for item in data if item[1] == month] for month in months}


def averages(pairs):
    # pairs is a list of (rainfall, firearea) tuples; average each column
    return (sum(rain for rain, fire in pairs) / len(pairs),
            sum(fire for rain, fire in pairs) / len(pairs))


averagesByYear = {year: averages(dataByYear[year]) for year in years}
averagesByMonth = {month: averages(dataByMonth[month]) for month in months}

for year in years:
    print(year, averagesByYear[year][0], averagesByYear[year][1])

for month in months:
    print(month, averagesByMonth[month][0], averagesByMonth[month][1])

The cost of the repetition in the code here is probably minimal compared to the disc access costs. On the other hand this is a small data set so time is probably not a big issue.

--
Russel.
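Along the same lines, the specific question in the original post (mean firearea and maximum rainfall over June-July-August for each year) can be sketched with a single dictionary keyed by year. This is an illustration rather than anyone's posted solution: the rows are copied from the sample in the original post and held in a list instead of being read from the file, and the names SEASON, by_year and summary are invented for the example.

```python
from collections import defaultdict

# A few rows in the thread's "year month rainfall firearea" layout.
lines = [
    "1972 May 2.780009889 0",
    "1972 Jun 1.609619287 0",
    "1972 Jul 0.138150181 28",
    "1972 Aug 0.214346148 32",
    "1972 Sep 1.322102228 34747.8",
    "1973 Jun 1.600553474 0",
    "1973 Jul 0.776217634 0.4",
    "1973 Aug 0.369365192 0",
]

SEASON = {"Jun", "Jul", "Aug"}

# year -> list of (rainfall, firearea) pairs for the selected months only
by_year = defaultdict(list)
for line in lines:
    year, month, rainfall, firearea = line.split()
    if month in SEASON:
        by_year[int(year)].append((float(rainfall), float(firearea)))

# year -> (mean firearea, maximum rainfall) over the season
summary = {}
for year, pairs in sorted(by_year.items()):
    mean_fire = sum(fire for rain, fire in pairs) / len(pairs)
    max_rain = max(rain for rain, fire in pairs)
    summary[year] = (mean_fire, max_rain)
    print(year, mean_fire, max_rain)
```

Keying on the year alone and filtering by month while reading keeps every monthly value, which a dictionary indexed only by year would otherwise overwrite.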
Re: [Tutor] table to dictionary and then analysis
On 15/05/12 10:36, Russel Winder wrote:
> ...queries passed over it then yes, a database is the right thing -- though I would probably choose a non-SQL database.

As a matter of interest why?
And what kind of alternative would you use?

It seems to me that SQL is ideally suited(*) to this type of role. I'm curious what the alternatives might be and why they would be preferred?

(*)Because: Flexible query language, wide set of tools including GUI query builders, reporting tools etc. Plus easy integration with programming environments, scalability (less an issue here), security (usually) etc.

--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
[Tutor] table to dictionary and then analysis
I am completely new to dictionaries and I am not even sure if this is what I need to use. I have a text file that I would like to run summary stats on particular months, years and climate indices (in this case the climate indices are rainfall and fire area, so not actualy climate indices at all). A text file is attached but a small sample of the file: rainfallfirearea 1972 Jan12.70831990 1972 Feb14.170071420 1972 Mar14.56593020 1972 Apr1.5085173020 1972 May2.7800098890 1972 Jun1.6096192870 1972 Jul0.13815018128 1972 Aug0.21434614832 1972 Sep1.32210222834747.8 1972 Oct0.0926631373655.9 1972 Nov1.85227663585.1 1972 Dec2.01120600242959.6 1973 Jan5.55704346153.5 1973 Feb12.60326356116.2 1973 Mar11.08849105223.6 1973 Apr5.8649254492.4 .. I have used an example from a book (a primer on scientific programming with python) and it seems to be working (see below) but I am not sure if I have my keys etc. are set up correctly to then begin anlaysis, and even how to use the dictionaries in my analysis. For example how can I print out the year with calculated the mean 'firearea' of June-July-August for that year along with the maximum 'rainfall' for June-July-august of the same year? Any feedback will be greatly appreaciated! 
    infile=open('d:/yearmonthrainfire.txt','r')
    lines=infile.readlines()
    infile.close()
    data={} #data[index][year]=indexvalue
    first_line=lines[0]
    climateindexname=first_line.split()
    for index in climateindexname:
        data[index]={}
    YEAR={}
    MONTH={}
    for line in lines[2:]:
        words=line.split()
        year=words[0] #years
        YEAR[year]={}
        month=words[1] #months
        MONTH[month]={}
        values=words[2:] #values of climateindices
        for index, v in zip(climateindexname, values):
            if v !=' ':
                data[index][year]=float(v)
    print("years=", YEAR)
    print("months=", MONTH)
    print("data=", data)

meanrain firearea
1972 Jan 12.7083199 0
1972 Feb 14.17007142 0
1972 Mar 14.5659302 0
1972 Apr 1.508517302 0
1972 May 2.780009889 0
1972 Jun 1.609619287 0
1972 Jul 0.138150181 28
1972 Aug 0.214346148 32
1972 Sep 1.322102228 34747.8
1972 Oct 0.092663137 3655.9
1972 Nov 1.852276635 85.1
1972 Dec 2.011206002 42959.6
1973 Jan 5.55704346 153.5
1973 Feb 12.60326356 116.2
1973 Mar 11.08849105 223.6
1973 Apr 5.864925449 2.4
1973 May 2.352622232 0
1973 Jun 1.600553474 0
1973 Jul 0.776217634 0.4
1973 Aug 0.369365192 0
1973 Sep 2.2226749 13523.2
1973 Oct 1.122739926 229.3
1973 Nov 7.904255106 144
1973 Dec 13.31568494 1558.5
1974 Jan 24.85492667 1170.8
1974 Feb 16.21160985 11.1
1974 Mar 15.68630322 64.5
1974 Apr 1.761830238 0
1974 May 3.245113376 0
1974 Jun 0.413113179 0
1974 Jul 0.056925965 0
1974 Aug 1.056679232 0
1974 Sep 0.506806924 0.2
1974 Oct 0.571465459 0
1974 Nov 1.747845479 2939.4
1974 Dec 5.423673558 2212.4
1975 Jan 8.246915224 34838.4
1975 Feb 10.09194262 14467.3
1975 Mar 8.228448671 2673.5
1975 Apr 4.033215013 608.4
1975 May 1.953680674 28.2
1975 Jun 1.343020358 0
1975 Jul 1.038350229 48
1975 Aug 1.217719419 0
1975 Sep 2.873960222 2.4
1975 Oct 3.647092246 2.1
1975 Nov 1.789754789 41135
1975 Dec 12.7257926 1837.2
1976 Jan 8.670511763 12972.1
1976 Feb 12.82303773 556.9
1976 Mar 8.693052739 646.6
1976 Apr 4.824761671 2441.1
1976 May 1.185164667 174.3
1976 Jun 0.807438979 0
1976 Jul 0.772650504 0
1976 Aug 0.339851675 0.8
1976 Sep 0.172502614 18.5
1976 Oct 0.942766622 18
1976 Nov 2.882269122 115558.6
1976 Dec 8.023477936 2566.7
1977 Jan 6.995102092 2771.8
1977 Feb 18.99860777 21637.1
1977 Mar 10.80012409 145.1
1977 Apr 7.522531332 792.3
1977 May 3.779496681 171.6
1977 Jun 0.653845173 0
1977 Jul 0.577692075 0
1977 Aug 0.46951093 903.4
1977 Sep 0.690806944 164
1977 Oct 0.355984233 38512.8
1977 Nov 2.256052975 1946.3
1977 Dec 4.696920357 3620.8
1978 Jan 7.361228919 38343.3
1978 Feb 6.082772277 7332.2
1978 Mar 2.345215476 298.3
1978 Apr 4.67913104 117.7
1978 May 2.831787858 5
1978 Jun 0.371906624 0
1978 Jul 0.815953022 0
1978 Aug 1.214826869 0
1978 Sep 0.822085465
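
For the June-July-August question above, one possible starting point is a single dictionary keyed by year. This is only a sketch, not code from the book or from any poster in this thread: the names SEASON, read_season and summarise are made up for illustration, and it assumes every data line has exactly four whitespace-separated fields (year, month, rainfall, firearea). Header, blank and truncated lines are skipped.

```python
from collections import defaultdict

# Months that make up the season of interest (hypothetical constant name)
SEASON = ("Jun", "Jul", "Aug")


def read_season(lines, months=SEASON):
    """Group (rainfall, firearea) pairs by year, keeping only the given months."""
    by_year = defaultdict(list)
    for line in lines:
        fields = line.split()
        # Skip the header, blank lines, truncated rows and other months
        if len(fields) != 4 or fields[1] not in months:
            continue
        by_year[int(fields[0])].append((float(fields[2]), float(fields[3])))
    return by_year


def summarise(by_year):
    """Return {year: (mean firearea, max rainfall)} for the grouped rows."""
    summary = {}
    for year, pairs in by_year.items():
        rain = [r for r, _ in pairs]
        fire = [f for _, f in pairs]
        summary[year] = (sum(fire) / len(fire), max(rain))
    return summary


# A few rows copied from the attached file, plus its header line
sample = [
    "meanrain firearea",
    "1972 May 2.780009889 0",
    "1972 Jun 1.609619287 0",
    "1972 Jul 0.138150181 28",
    "1972 Aug 0.214346148 32",
    "1972 Sep 1.322102228 34747.8",
    "1973 Jun 1.600553474 0",
    "1973 Jul 0.776217634 0.4",
    "1973 Aug 0.369365192 0",
]

summary = summarise(read_season(sample))
for year, (mean_fire, max_rain) in sorted(summary.items()):
    print(year, mean_fire, max_rain)
```

To run it on the real file, an open file object can be passed in place of the sample list, since read_season only needs something that yields lines.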
Re: [Tutor] table to dictionary and then analysis
On 5/14/2012 10:16 PM, questions anon wrote:
> I am completely new to dictionaries and I am not even sure if this is what I need to use. I have a text file on which I would like to run summary stats for particular months, years and climate indices (in this case the climate indices are rainfall and fire area, so not actually climate indices at all).

I would set up a SQLite database with a table of 4 numeric columns: year, month, rainfall, firearea.

Use SQL to select the desired date range and do the max and avg calculations:

    select year, avg(firearea), max(rainfall) from table where year = 1973 and month between 6 and 8

You can use dictionaries, but that will be harder.

Here is a start (untested). Assumes data are correct.

    months = dict(Jan=1, Feb=2, Mar=3, Apr=4, May=5, Jun=6, Jul=7, Aug=8, Sep=9, Oct=10, Nov=11, Dec=12)
    for line in open('d:/yearmonthrainfire.txt', 'r'):
        line = line.split()
        year = int(line[0])
        month = months[line[1]]
        rainfall = float(line[2])
        firearea = float(line[3])
        sql = "insert into table (year, month, rainfall, firearea) values(%i,%i,%f,%f)" % (year, month, rainfall, firearea)
        # I don't have handy how one runs the sql

> A text file is attached, but here is a small sample of the file:
>
> rainfall firearea
> 1972 Jan 12.7083199 0
> 1972 Feb 14.17007142 0
> 1972 Mar 14.5659302 0
> 1972 Apr 1.508517302 0
> 1972 May 2.780009889 0
> 1972 Jun 1.609619287 0
> 1972 Jul 0.138150181 28
> 1972 Aug 0.214346148 32
> 1972 Sep 1.322102228 34747.8
> 1972 Oct 0.092663137 3655.9
> 1972 Nov 1.852276635 85.1
> 1972 Dec 2.011206002 42959.6
> 1973 Jan 5.55704346 153.5
> 1973 Feb 12.60326356 116.2
> 1973 Mar 11.08849105 223.6
> 1973 Apr 5.864925449 2.4
> ..
>
> I have used an example from a book (A Primer on Scientific Programming with Python) and it seems to be working (see below), but I am not sure whether my keys etc. are set up correctly to begin analysis, or even how to use the dictionaries in my analysis.
>
> For example, how can I print out each year with the mean 'firearea' of June-July-August for that year, along with the maximum 'rainfall' for June-July-August of the same year?
>
> Any feedback will be greatly appreciated!
>
>     infile=open('d:/yearmonthrainfire.txt','r')
>     lines=infile.readlines()
>     infile.close()
>     data={} #data[index][year]=indexvalue
>     first_line=lines[0]
>     climateindexname=first_line.split()
>     for index in climateindexname:
>         data[index]={}
>     YEAR={}
>     MONTH={}
>     for line in lines[2:]:
>         words=line.split()
>         year=words[0] #years
>         YEAR[year]={}
>         month=words[1] #months
>         MONTH[month]={}
>         values=words[2:] #values of climateindices
>         for index, v in zip(climateindexname, values):
>             if v !=' ':
>                 data[index][year]=float(v)
>     print("years=", YEAR)
>     print("months=", MONTH)
>     print("data=", data)

We usually reserve all-caps names for constants.

You have way too many dictionaries. Your program seems very complex for a very simple task. I will not attempt to figure out what it does.

-- 
Bob Gailer
919-636-4239
Chapel Hill NC
___
Tutor maillist - Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor
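
The SQLite suggestion in the reply above stops at "I don't have handy how one runs the sql". One way to run it is Python's standard sqlite3 module; the following is a minimal sketch under stated assumptions, not Bob's code. The table name climate is hypothetical (the reply uses "table" as a stand-in, which is a reserved word in SQLite), the function names are made up for illustration, and rows are inserted with "?" placeholders instead of % string formatting. Lines that do not have exactly four fields, such as the header, are skipped.

```python
import sqlite3

# Month abbreviation -> month number, Jan=1 ... Dec=12
MONTHS = {name: number for number, name in enumerate(
    ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
     "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"], start=1)}


def parse_rows(lines):
    """Yield (year, month, rainfall, firearea) tuples from well-formed lines."""
    for line in lines:
        fields = line.split()
        # Header, blank and truncated lines do not have four usable fields
        if len(fields) != 4 or fields[1] not in MONTHS:
            continue
        yield int(fields[0]), MONTHS[fields[1]], float(fields[2]), float(fields[3])


def load(conn, lines):
    """Create the (hypothetical) climate table and insert every parsed row."""
    conn.execute("CREATE TABLE IF NOT EXISTS climate "
                 "(year INTEGER, month INTEGER, rainfall REAL, firearea REAL)")
    conn.executemany("INSERT INTO climate VALUES (?, ?, ?, ?)", parse_rows(lines))
    conn.commit()


def jja_summary(conn):
    """Return [(year, mean firearea, max rainfall)] for June-August of each year."""
    return conn.execute(
        "SELECT year, AVG(firearea), MAX(rainfall) FROM climate "
        "WHERE month BETWEEN 6 AND 8 GROUP BY year ORDER BY year").fetchall()


# A few rows copied from the sample in the original message
sample = """rainfall firearea
1972 Jun 1.609619287 0
1972 Jul 0.138150181 28
1972 Aug 0.214346148 32
1972 Sep 1.322102228 34747.8
"""

conn = sqlite3.connect(":memory:")  # in-memory database, nothing written to disk
load(conn, sample.splitlines())
for year, mean_fire, max_rain in jja_summary(conn):
    print(year, mean_fire, max_rain)
```

GROUP BY year gives one result row per year, so the "year = 1973" condition from the reply is only needed when a single year is wanted.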