Re: [Tutor] table to dictionary and then analysis

2012-05-22 Thread questions anon
Thanks for all of the responses; they have been really helpful.



___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] table to dictionary and then analysis

2012-05-18 Thread Russel Winder
On Thu, 2012-05-17 at 19:35 +1000, Steven D'Aprano wrote:
 On Thu, May 17, 2012 at 08:27:07AM +0100, Russel Winder wrote:
 
  Should we be promoting use of the format method in strings rather than
  the % operator? % is deprecated now.
 
 It most certainly is not.
 
 There are no plans to deprecate the string % operator any time in the 
 foreseeable future. It may, hypothetically, wither away from lack of use 

OK I am clearly wrong with the statement I made.  I had assumed the
statement (*) that it would be deprecated in 3.1 was carried through, I
had not actually checked the reality in the 3.1 and 3.2 differences
documents. My apologies for the misdirection, and thanks for pulling me up on
this.


(*) There is no statement of when deprecation would happen in PEP 3101
itself http://www.python.org/dev/peps/pep-3101/ but there is an explicit
statement in
http://docs.python.org/release/3.0/whatsnew/3.0.html#pep-3101-a-new-approach-to-string-formatting
which clearly didn't happen, even though it led a lot of people to think it
would.

-- 
Russel.
=
Dr Russel Winder  t: +44 20 7585 2200   voip: sip:russel.win...@ekiga.net
41 Buckmaster Roadm: +44 7770 465 077   xmpp: rus...@winder.org.uk
London SW11 1EN, UK   w: www.russel.org.uk  skype: russel_winder





Re: [Tutor] table to dictionary and then analysis

2012-05-17 Thread Russel Winder
On Wed, 2012-05-16 at 12:57 -0400, Joel Goldstick wrote:
[...]
 I think the OP is just learning and this thread may have gotten off track.

I didn't realize discussion of immediate side issues and alternatives,
and allowing people to exchange information was OT in this mailing list.
Also of course, OP didn't mention databases, but was asking how to do it
with lists and dictionaries. I think there is irony somewhere in here.

 Here is some code to get started.  I decided to use sqlite3 since it's
 easy to use with python -- no finding and learning to load packages.
 
 
 #!/usr/bin/env python
 
 import sqlite3 as db
 
 # Ideally this shouldn't be global, but in this short code snippet it
 # gets the job done
 # here we create a database and get a cursor
 conn = db.connect('climate.db')
 cursor = conn.cursor()
 print(cursor)

I believe that there are more problems than just global data here. One
obvious thing is this code is not safe against exceptions. I appreciate
this is a trivially small program, but I think it important that sample
code presents good style or explicitly states what is wrong so as to
present what not to do. Your comment about global sits well with this,
but for me doesn't go far enough. Python introduced context managers and
the with statement exactly for this sort of situation, following the
lead of C++ with RAII. I think we should all get into the habit of using
the with statement automatically in this situation.
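A minimal sketch of that with-statement habit, using only the standard library. A `sqlite3` connection can itself be used in a with statement, but that manages the transaction (commit on success, rollback on an exception) rather than closing the connection, so `contextlib.closing` is added to guarantee the close. The table layout follows Joel's snippet; the function name `create_rain_table` is hypothetical.

```python
import sqlite3
from contextlib import closing


def create_rain_table(db_path):
    """Create the rain table and return the names of the tables in the database.

    closing() guarantees conn.close() runs however the block exits; the inner
    "with conn" commits on success and rolls back if an exception escapes.
    """
    with closing(sqlite3.connect(db_path)) as conn:
        with conn:
            conn.execute(
                "CREATE TABLE IF NOT EXISTS rain ("
                " id INTEGER PRIMARY KEY,"
                " year INTEGER,"
                " month TEXT,"
                " rainfall REAL,"
                " fire_area REAL)"
            )
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'"
        ).fetchall()
    return [name for (name,) in rows]


if __name__ == "__main__":
    print(create_rain_table(":memory:"))
```

With ":memory:" nothing is written to disk; passing 'climate.db' instead would create the file Joel's script uses.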

 # this will create a table for our data
 sql_create = """CREATE TABLE if not exists rain (
     id INTEGER PRIMARY KEY,
     year INTEGER,
     month TEXT(3),
     rainfall FLOAT,
     fire_area FLOAT
     )"""
 
 # this will read the data file and put it in our database
 def populate_climate_table(file_name):
     """
     reads the file_name and inserts data into sqlite table
     """
     sql_insert_string = """insert into rain (year, month, rainfall,
         fire_area) values (%d, '%s', %f, %f)"""
 
     f = open(file_name)

Same comment about context managers and with statement applies here:
this code is not exception safe.
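The same idea applies to the data file. A sketch, assuming the whitespace-separated layout from the original post (one header line, then year, month, rainfall and fire area on each row); `read_climate_file` is a hypothetical name. The with statement closes the file even if a conversion such as float() raises part-way through.

```python
import os
import tempfile


def read_climate_file(file_name):
    """Return a list of (year, month, rainfall, fire_area) tuples."""
    records = []
    with open(file_name) as f:
        next(f, None)  # skip the column-header line
        for line in f:
            fields = line.split()
            if len(fields) != 4:
                continue  # skip blank or malformed lines
            year, month, rainfall, fire_area = fields
            records.append((int(year), month, float(rainfall), float(fire_area)))
    return records


if __name__ == "__main__":
    # Write two rows from the posted sample to a temporary file and read them back.
    sample = "rainfall firearea\n1972 Jan 12.7083199 0\n1972 Jul 0.138150181 28\n"
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
        tmp.write(sample)
    try:
        print(read_climate_file(tmp.name))
    finally:
        os.remove(tmp.name)
```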

     f.readline()  # get rid of column headers
     for l in f.readlines():
         data_list = l.split()
         print(data_list)
         sql_insert = sql_insert_string % (int(data_list[0]),
             data_list[1], float(data_list[2]), float(data_list[3]))

Should we be promoting use of the format method in strings rather than
the % operator? % is deprecated now.

Although not an issue here, this sort of SQL string manipulation is at
the heart of SQL injection attacks and so should be frowned upon. Hence
SQLAlchemy's expression language, which goes some way towards avoiding the
whole issue, at the expense of having to load an additional package.
With package management on Debian/Fedora/Ubuntu/MacPorts or the pip
command this is not difficult to add.
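SQLAlchemy is not the only defence. The standard DB-API approach is to pass values separately from the SQL text using placeholders, which `sqlite3` supports with `?`. This is a swapped-in technique, not SQLAlchemy's expression language; `insert_rows` is a hypothetical helper and the `rain` table follows Joel's snippet.

```python
import sqlite3


def insert_rows(conn, rows):
    """Insert (year, month, rainfall, fire_area) tuples into the rain table.

    The ? placeholders hand the values to sqlite3 separately from the SQL
    text, so a value is always stored as data and never parsed as SQL.
    """
    with conn:  # commit on success, roll back if any row fails
        conn.executemany(
            "INSERT INTO rain (year, month, rainfall, fire_area) VALUES (?, ?, ?, ?)",
            rows,
        )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE rain (id INTEGER PRIMARY KEY, year INTEGER,"
        " month TEXT, rainfall REAL, fire_area REAL)"
    )
    # A hostile-looking month value is stored verbatim rather than executed.
    insert_rows(conn, [(1972, "Jan", 12.7083199, 0.0),
                       (1972, "Jan'); DROP TABLE rain; --", 0.0, 0.0)])
    print(conn.execute("SELECT month FROM rain ORDER BY id").fetchall())
    conn.close()
```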

         print(sql_insert)
         cursor.execute(sql_insert)
     conn.commit()
 
 
 if __name__ == '__main__':
 
     print(sql_create)
     cursor.execute(sql_create)
     populate_climate_table('data.txt')
 
 
 So, I haven't solved all of the questions with this code.  The next
 thing to do is to read a little about SQLite select statements,
 for example:
 sqlite> select sum(rainfall)/count(*) from rain;
 3.97352768125
 
 This statement will give the average rainfall over the complete dataset.
 To get the average rainfall for a given year do this:
 sqlite> select sum(rainfall)/count(*) from rain where year = 1983;
 
 Come back with more questions

-- 
Russel.


Re: [Tutor] table to dictionary and then analysis

2012-05-17 Thread Russel Winder
On Wed, 2012-05-16 at 16:03 +0100, Alan Gauld wrote:
[...]
 I agree, but in this case SQL seemed like the most likely fit of the 
 ones I knew. However:

Which raises the point that the best design of a given problem in a
given context is the one that is most comprehensible to the people
directly involved.

   SQL
   MongoDB
 
 I know about these

I heard on good authority yesterday that MongoDB is only properly
useful in a single-store context, i.e. not in a replicated cluster. Along
with this comes news that Riak is very good and has a Python API.

 
   CouchDB
   Cassandra
   Neo
 
 These are new to me.

CouchDB is implemented in Erlang; Ubuntu One uses it, for
example. Cassandra is an Apache project, a JVM-based system. MongoDB and
CouchDB are document stores, while Cassandra is a wide-column store. Neo4J
is a graph repository, so it has a very different architecture and
different performance characteristics. And then there is Redis :-)

 
  etc. Python only has SQLite3 as standard but there are alternatives. I
  have been using PyMongo quite successfully.
 
 Python comes with several storage/access options including shelve, gdbm, 
 ldap, config files, XML, in addition to SQL.

Indeed. The problem I have had with shelve for this sort of thing is
that it is critically dependent on the pickling algorithm and so
potentially Python-version dependent.
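One partial mitigation, offered as a sketch rather than a fix: `shelve.open()` accepts a `protocol` argument, so the pickle protocol can at least be pinned. Protocol 2 is readable by Python 2.3 and later as well as Python 3, but the underlying dbm file format can still differ between installations, so this narrows the dependence rather than removing it. `save_and_load` and the key format are hypothetical.

```python
import os
import shelve
import tempfile


def save_and_load(path, records):
    """Store records in a shelf with a pinned pickle protocol, then read them back."""
    with shelve.open(path, protocol=2) as db:
        for key, value in records.items():
            db[key] = value  # shelf keys must be strings
    with shelve.open(path, flag="r") as db:  # reopen read-only
        return dict(db)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "climate")
        print(save_and_load(path, {"1972 Jan": (12.7083199, 0.0)}))
```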

[...]
 on flexibility of data format. The OP's requirements suggested intelligent 
 filtering of a fixed record format which is one of the areas where SQL 
 works well. The other side of the coin is that the data is essentially 
 single table so the relationship management aspects of SQL would not be 
 needed. So I agree we don't have enough detail
 to be 100% sure that another option would not work as well or better.

The signpost here is that the table as it stands is likely not in third
normal form, and that if the problem currently being solved were actually
a small part of a bigger problem, this issue would need to be addressed.

 But most other options require learning new (often bespoke) query 
 languages and have limited user tools. All of these factors need to be 
 included too. Mongo et al tend to be better suited, in my experience, to 
 machine access applications rather than end user access.

Agreed. Unfortunately, vendor commercial issues often get in the way of
experimenting to find out where the NoSQL systems are genuinely better
than SQL ones. We will get there though.

Interestingly, or not, the Big Data people are rapidly realizing that
data mining and SQL are mutually incompatible. The current trend is
towards streaming whole databases through dataflow programs. But Java
rather than Python is the default language in that world.

  There are various articles around the Web comparing and contrasting
  these various models. Some of the articles are even reasonable :-)
 
 Wikipedia is my friend :-)

:-)

-- 
Russel.


Re: [Tutor] table to dictionary and then analysis

2012-05-17 Thread Alan Gauld

On 17/05/12 08:39, Russel Winder wrote:


Interestingly, or not, the Big Data people are rapidly realizing that
data mining and SQL are mutually incompatible.


After many years working with big data mining teams/apps, my considered 
opinion is: use SAS or one of its peers! It costs money but it works.
And the savings in development time far outweigh the costs. And if you 
have to manage 100TB or more then the costs of SAS licenses are likely 
to be the least of your worries! And if you are really serious about it,
run SAS on top of a Teradata $torage $ystem...

Note:
I have no commercial connection with either company, just an admirer of 
their technology solutions.


:-)

--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/



Re: [Tutor] table to dictionary and then analysis

2012-05-17 Thread Steven D'Aprano
On Thu, May 17, 2012 at 08:27:07AM +0100, Russel Winder wrote:

 Should we be promoting use of the format method in strings rather than
 the % operator? % is deprecated now.

It most certainly is not.

There are no plans to deprecate the string % operator any time in the 
foreseeable future. It may, hypothetically, wither away from lack of use 
(although I doubt it -- so long as there are C programmers, there will 
be people who like % formatting).
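For comparison, a small sketch showing that both styles remain available side by side and give the same result for a simple case like this one (the values come from the first row of the OP's data):

```python
year, month, rainfall = 1972, "Jan", 12.7083199

# The same line produced with the % operator and with str.format().
with_percent = "%d %s %.2f" % (year, month, rainfall)
with_format = "{:d} {} {:.2f}".format(year, month, rainfall)

print(with_percent)  # 1972 Jan 12.71
print(with_format)   # 1972 Jan 12.71
```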

If you think it is deprecated, please show me in the official Python 
documentation where it says so.

As far as I am concerned, the final word on deprecation belongs to the 
creator of Python, and BDFL, Guido van Rossum, who hopes that *at best* 
the format method will gradually replace % formatting over the next 
decade or so before the (as yet hypothetical) Python 4:

http://mail.python.org/pipermail/python-dev/2009-September/092399.html

Rather than repeat myself, I will just point to what I wrote back in 
January:

http://mail.python.org/pipermail/python-list/2012-January/1285894.html



-- 
Steven


Re: [Tutor] table to dictionary and then analysis

2012-05-17 Thread bob gailer

On 5/17/2012 3:27 AM, Russel Winder wrote:

Should we be promoting use of the format method in strings rather than
the % operator? % is deprecated now.
I for one do not like seeing % deprecated. Why? It is not broken, and 
IMHO the easiest to use of all formatting options.


--
Bob Gailer
919-636-4239
Chapel Hill NC



Re: [Tutor] table to dictionary and then analysis

2012-05-17 Thread Mark Lawrence

On 17/05/2012 10:35, Steven D'Aprano wrote:

On Thu, May 17, 2012 at 08:27:07AM +0100, Russel Winder wrote:


Should we be promoting use of the format method in strings rather than
the % operator? % is deprecated now.


It most certainly is not.

[...]

You beat me to it :)

--
Cheers.

Mark Lawrence.



Re: [Tutor] table to dictionary and then analysis

2012-05-16 Thread Russel Winder
On Tue, 2012-05-15 at 19:14 +0100, Alan Gauld wrote:
 On 15/05/12 10:36, Russel Winder wrote:
  ...queries passed over it then a database is the
  right thing -- though I would probably choose a non-SQL database.
 
 As a matter of interest why?

Because there are alternatives that need to be investigated on a per
problem basis for the best database.

SQL
MongoDB
CouchDB
Cassandra
Neo

etc. Python only has SQLite3 as standard but there are alternatives. I
have been using PyMongo quite successfully.

 And what kind of alternative would you use?

See above ;-)

 It seems to me that SQL is ideally suited(*) to this type of role. I'm 
 curious what the alternatives might be and why they would be preferred?
 
 (*)Because: Flexible query language, wide set of tools including GUI 
 query builders, reporting tools etc. Plus easy integration with 
 programming environments, scalability (less an issue here), 
 security(usually) etc.

It is not clear that the original table works better with the relational
model compared to one of the key-value stores or document stores. It
might. But I no longer always go to SQL for persistence as I used to a
few years ago.

There are various articles around the Web comparing and contrasting
these various models. Some of the articles are even reasonable :-)


-- 
Russel.


Re: [Tutor] table to dictionary and then analysis

2012-05-16 Thread Alan Gauld

On 16/05/12 12:27, Russel Winder wrote:


As a matter of interest why?


Because there are alternatives that need to be investigated on a per
problem basis for the best database.


I agree, but in this case SQL seemed like the most likely fit of the 
ones I knew. However:



 SQL
 MongoDB


I know about these


 CouchDB
 Cassandra
 Neo


These are new to me.


etc. Python only has SQLite3 as standard but there are alternatives. I
have been using PyMongo quite successfully.


Python comes with several storage/access options including shelve, gdbm, 
ldap, config files, XML, in addition to SQL.



It is not clear that the original table works better with the relational
model compared to one of the key-value stores or document stores.


Most key-value stores are optimised for fast queries of a single type
and generally not great at grouping or ordering. They also tend to major 
on flexibility of data format. The OP's requirements suggested intelligent 
filtering of a fixed record format, which is one of the areas where SQL 
works well. The other side of the coin is that the data is essentially a 
single table, so the relationship management aspects of SQL would not be 
needed. So I agree we don't have enough detail
to be 100% sure that another option would not work as well or better.

But most other options require learning new (often bespoke) query 
languages and have limited user tools. All of these factors need to be 
included too. Mongo et al tend to be better suited, in my experience, to 
machine access applications rather than end user access.



There are various articles around the Web comparing and contrasting
these various models. Some of the articles are even reasonable :-)


Wikipedia is my friend :-)

--
Alan G


Re: [Tutor] table to dictionary and then analysis

2012-05-16 Thread Joel Goldstick
On Wed, May 16, 2012 at 11:03 AM, Alan Gauld alan.ga...@btinternet.com wrote:
[...]

I think the OP is just learning and this thread may have gotten off track.

Here is some code to get started.  I decided to use sqlite3 since it's
easy to use with python -- no finding and learning to load packages.


#!/usr/bin/env python

import sqlite3 as db

# Ideally this shouldn't be global, but in this short code snippet it
# gets the job done
# here we create a database and get a cursor
conn = db.connect('climate.db')
cursor = conn.cursor()
print(cursor)

# this will create a table for our data
sql_create = """CREATE TABLE if not exists rain (
    id INTEGER PRIMARY KEY,
    year INTEGER,
    month TEXT(3),
    rainfall FLOAT,
    fire_area FLOAT
    )"""

# this will read the data file and put it in our database
def populate_climate_table(file_name):
    """
    reads the file_name and inserts data into sqlite table
    """
    sql_insert_string = """insert into rain (year, month, rainfall,
        fire_area) values (%d, '%s', %f, %f)"""

    f = open(file_name)
    f.readline()  # get rid of column headers
    for l in f.readlines():
        data_list = l.split()
        print(data_list)
        sql_insert = sql_insert_string % (int(data_list[0]),
            data_list[1], float(data_list[2]), float(data_list[3]))
        print(sql_insert)
        cursor.execute(sql_insert)
    conn.commit()


if __name__ == '__main__':

    print(sql_create)
    cursor.execute(sql_create)
    populate_climate_table('data.txt')


So, I haven't solved all of the questions with this code.  The next
thing to do is to read a little about SQLite select statements,
for example:
sqlite> select sum(rainfall)/count(*) from rain;
3.97352768125

This statement will give the average rainfall over the complete dataset.
To get the average rainfall for a given year do this:
sqlite> select sum(rainfall)/count(*) from rain where year = 1983;
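The same queries can be run from Python through `sqlite3`. A sketch, assuming the `rain` table above: it uses `avg()`, which gives the same answer as `sum(rainfall)/count(*)` when no rainfall value is NULL, and `GROUP BY` to get every year's average in one query. The helper names are hypothetical, and the demo uses a simplified in-memory table rather than climate.db.

```python
import sqlite3


def average_rainfall(conn, year=None):
    """Mean rainfall over the whole table, or for one year if year is given.

    avg() skips NULLs, so it matches sum(rainfall)/count(*) only when
    no rainfall value is NULL.
    """
    if year is None:
        row = conn.execute("SELECT avg(rainfall) FROM rain").fetchone()
    else:
        # Placeholder rather than string formatting for the year value.
        row = conn.execute(
            "SELECT avg(rainfall) FROM rain WHERE year = ?", (year,)
        ).fetchone()
    return row[0]


def average_rainfall_by_year(conn):
    """List of (year, mean rainfall) pairs, one per year, in year order."""
    return conn.execute(
        "SELECT year, avg(rainfall) FROM rain GROUP BY year ORDER BY year"
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE rain (year INTEGER, month TEXT, rainfall REAL, fire_area REAL)")
    conn.executemany(
        "INSERT INTO rain VALUES (?, ?, ?, ?)",
        [(1972, "Jan", 12.7083199, 0.0), (1972, "Feb", 14.17007142, 0.0),
         (1973, "Jan", 5.55704346, 153.5)],
    )
    print(average_rainfall(conn))
    print(average_rainfall(conn, 1972))
    print(average_rainfall_by_year(conn))
    conn.close()
```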

Come back with more questions
-- 
Joel Goldstick


Re: [Tutor] table to dictionary and then analysis

2012-05-15 Thread questions anon
Thanks Bob,
SQL does appear to be very simple, although I cannot get the queries to
work. Can you suggest a site that has examples of what I am trying to do?
I have done some googling but it has not been successful so far.
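As a starting point, here is a sketch of the June-July-August question in SQLite, driven from Python with an in-memory database. It assumes months are stored as the three-letter names used in the file (as in Joel's `rain` table elsewhere in the thread) rather than as numbers, so it filters with `IN` instead of `BETWEEN`. The sample rows are copied from the attached data; `summer_summary` and `JJA_QUERY` are hypothetical names.

```python
import sqlite3

# Rows copied from the attached data: (year, month, rainfall, fire_area).
SAMPLE = [
    (1972, "Jun", 1.609619287, 0.0),
    (1972, "Jul", 0.138150181, 28.0),
    (1972, "Aug", 0.214346148, 32.0),
    (1973, "Jun", 1.600553474, 0.0),
    (1973, "Jul", 0.776217634, 0.4),
    (1973, "Aug", 0.369365192, 0.0),
]

# One row per year: mean fire area and maximum rainfall over Jun-Aug.
JJA_QUERY = """
    SELECT year, avg(fire_area), max(rainfall)
    FROM rain
    WHERE month IN ('Jun', 'Jul', 'Aug')
    GROUP BY year
    ORDER BY year
"""


def summer_summary(rows):
    """Return [(year, mean JJA fire area, max JJA rainfall), ...]."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE rain (year INTEGER, month TEXT,"
                     " rainfall REAL, fire_area REAL)")
        conn.executemany("INSERT INTO rain VALUES (?, ?, ?, ?)", rows)
        return conn.execute(JJA_QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for year, mean_fire, max_rain in summer_summary(SAMPLE):
        print(year, round(mean_fire, 2), max_rain)
```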



On Tue, May 15, 2012 at 1:38 PM, bob gailer bgai...@gmail.com wrote:

  On 5/14/2012 10:16 PM, questions anon wrote:

 I am completely new to dictionaries and I am not even sure if this is what
 I need to use.
 I have a text file that I would like to run summary stats on particular
 months, years and climate indices (in this case the climate indices are
 rainfall and fire area, so not actually climate indices at all).

 I would set up a SQLite database with a table of 4 numeric columns: year,
 month, rainfall, firearea
 Use SQL to select the desired date range and do the max and avg
 calculations:
 select year, avg(firearea), max(rainfall) from table where year = 1973 and
 month between 6 and 8
 
 you can use dictionaries but that will be harder. Here's a start (untested).
 Assumes data are correct.

 months = dict(Jan=1, Feb=2, Mar=3, Apr=4, May=5, Jun=6,
               Jul=7, Aug=8, Sep=9, Oct=10, Nov=11, Dec=12)
 for line in open('d:/yearmonthrainfire.txt', 'r'):
     line = line.split()
     year = int(line[0])
     month = months[line[1]]
     rainfall = float(line[2])
     firearea = float(line[3])
     sql = ("insert into table (year, month, rainfall, firearea) "
            "values(%i,%i,%f,%f)" % (year, month, rainfall, firearea))
     # I don't have handy how one runs the sql


 A text file is attached but a small sample of the file:
             rainfall      firearea
 1972 Jan    12.7083199    0
 1972 Feb    14.17007142   0
 1972 Mar    14.5659302    0
 1972 Apr    1.508517302   0
 1972 May    2.780009889   0
 1972 Jun    1.609619287   0
 1972 Jul    0.138150181   28
 1972 Aug    0.214346148   32
 1972 Sep    1.322102228   34747.8
 1972 Oct    0.092663137   3655.9
 1972 Nov    1.852276635   85.1
 1972 Dec    2.011206002   42959.6
 1973 Jan    5.55704346    153.5
 1973 Feb    12.60326356   116.2
 1973 Mar    11.08849105   223.6
 1973 Apr    5.864925449   2.4
 ..

 I have used an example from a book (a primer on scientific programming
 with python) and it seems to be working (see below), but I am not sure if
 my keys etc. are set up correctly to then begin analysis, or even how
 to use the dictionaries in my analysis. For example, how can I print out
 the year along with the calculated mean 'firearea' of June-July-August for
 that year and the maximum 'rainfall' for June-July-August of the same year?
 Any feedback will be greatly appreciated!

 infile=open('d:/yearmonthrainfire.txt','r')
 lines=infile.readlines()
 infile.close()
 data={} #data[index][year]=indexvalue
 first_line=lines[0]
 climateindexname=first_line.split()
 for index in climateindexname:
     data[index]={}
 YEAR={}
 MONTH={}
 
 for line in lines[2:]:
     words=line.split()
     year=words[0] #years
     YEAR[year]={}
     month=words[1] #months
     MONTH[month]={}
     values=words[2:] #values of climateindices
     for index, v in zip(climateindexname, values):
         if v !=' ':
             data[index][year]=float(v)
 
 print('years=', YEAR)
 print('months=', MONTH)
 print('data=', data)

 We usually reserve all caps names for constants.
 You have way too many dictionaries.
 Your program seems very complex for a very simple task.
 I will not attempt to figure out what it does.








Re: [Tutor] table to dictionary and then analysis

2012-05-15 Thread Alan Gauld

On 15/05/12 07:12, questions anon wrote:

Thanks Bob,
sql does appear to be very simple although I cannot get the queries to
work. Can you suggest a site that has examples for what I am trying to
do. I have done some googling but it has not been successful so far.


You can try my tutorial topic on databases.

It covers the basics of using SQLite to query data in a fairly
concise format.

It's only available in the Version 2 tutor so far...

http://www.alan-g.me.uk/tutor/tutdbms.htm



--
Alan G


Re: [Tutor] table to dictionary and then analysis

2012-05-15 Thread Russel Winder
On Mon, 2012-05-14 at 23:38 -0400, bob gailer wrote:
[...]
 I would set up a SQLite database with a table of 4 numeric columns: 
 year, month, rainfall, firearea
 Use SQL to select the desired date range and do the max and avg 
 calculations:
 select year, avg(firearea), max(rainfall) from table where year = 1973 
 and month between 6 and 8
 
 you can use dictionaries but that will be harder. Here's a start 
 (untested). Assumes data are correct.

Clearly if the data is to be stored for a long time and have various
(currently unknown) queries passed over it, then a database is the
right thing -- though I would probably choose a non-SQL database.

If the issue is just to do quick calculations over the data in the file
format, then there is nothing wrong with using dictionaries or parallel
arrays à la:

with open ( 'yearmonthrainfire.txt' ) as infile :
    climateindexname = infile.readline ( ).split ( )
    data = [ line.split ( ) for line in infile if line.strip ( ) ]

years = sorted ( { item[0] for item in data } )
months = [ 'Jan' , 'Feb' , 'Mar' , 'Apr' , 'May' , 'Jun' , 'Jul' ,
    'Aug' , 'Sep' , 'Oct' , 'Nov' , 'Dec' ]

dataByYear = { year : [ ( float ( item[2] ) , float ( item[3] ) ) for
    item in data if item[0] == year ] for year in years }
dataByMonth = { month : [ ( float ( item[2] ) , float ( item[3] ) ) for
    item in data if item[1] == month ] for month in months }

# Average each column separately: each pair is ( rainfall , fire area ).
averagesByYear = { year : (
    sum ( rain for rain , fire in dataByYear[year] ) / len ( dataByYear[year] ) ,
    sum ( fire for rain , fire in dataByYear[year] ) / len ( dataByYear[year] ) )
    for year in years }
averagesByMonth = { month : (
    sum ( rain for rain , fire in dataByMonth[month] ) / len ( dataByMonth[month] ) ,
    sum ( fire for rain , fire in dataByMonth[month] ) / len ( dataByMonth[month] ) )
    for month in months }

for year in years :
    print ( year , averagesByYear[year][0] , averagesByYear[year][1] )

for month in months :
    print ( month , averagesByMonth[month][0] ,
        averagesByMonth[month][1] )

The cost of the repetition in the code here is probably minimal compared
to the disc access costs. On the other hand this is a small data set so
time is probably not a big issue.
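In the same dictionary-based spirit, the original June-July-August question can be answered by grouping in a single pass rather than scanning the data once per year. A sketch, assuming the rows have already been parsed into `(year, month, rainfall, fire_area)` tuples; `summer_stats` and `SUMMER` are hypothetical names.

```python
from collections import defaultdict

SUMMER = ("Jun", "Jul", "Aug")


def summer_stats(records):
    """Map year -> (mean Jun-Aug fire area, max Jun-Aug rainfall)."""
    by_year = defaultdict(list)  # year -> [(rainfall, fire_area), ...] for Jun-Aug only
    for year, month, rainfall, fire_area in records:
        if month in SUMMER:
            by_year[year].append((rainfall, fire_area))
    return {
        year: (sum(f for _, f in pairs) / len(pairs), max(r for r, _ in pairs))
        for year, pairs in by_year.items()
    }


if __name__ == "__main__":
    rows = [
        (1972, "May", 2.780009889, 0.0),
        (1972, "Jun", 1.609619287, 0.0),
        (1972, "Jul", 0.138150181, 28.0),
        (1972, "Aug", 0.214346148, 32.0),
    ]
    for year, (mean_fire, max_rain) in sorted(summer_stats(rows).items()):
        print(year, mean_fire, max_rain)
```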

-- 
Russel.


Re: [Tutor] table to dictionary and then analysis

2012-05-15 Thread Alan Gauld

On 15/05/12 10:36, Russel Winder wrote:

...queries passed over it then a database is the
right thing -- though I would probably choose a non-SQL database.


As a matter of interest why?
And what kind of alternative would you use?

It seems to me that SQL is ideally suited(*) to this type of role. I'm 
curious what the alternatives might be and why they would be preferred?


(*)Because: Flexible query language, wide set of tools including GUI 
query builders, reporting tools etc. Plus easy integration with 
programming environments, scalability (less an issue here), 
security(usually) etc.


--
Alan G


[Tutor] table to dictionary and then analysis

2012-05-14 Thread questions anon
I am completely new to dictionaries and I am not even sure if this is what
I need to use.
I have a text file that I would like to run summary stats on particular
months, years and climate indices (in this case the climate indices are
rainfall and fire area, so not actually climate indices at all).

A text file is attached but a small sample of the file:
            rainfall      firearea
1972 Jan    12.7083199    0
1972 Feb    14.17007142   0
1972 Mar    14.5659302    0
1972 Apr    1.508517302   0
1972 May    2.780009889   0
1972 Jun    1.609619287   0
1972 Jul    0.138150181   28
1972 Aug    0.214346148   32
1972 Sep    1.322102228   34747.8
1972 Oct    0.092663137   3655.9
1972 Nov    1.852276635   85.1
1972 Dec    2.011206002   42959.6
1973 Jan    5.55704346    153.5
1973 Feb    12.60326356   116.2
1973 Mar    11.08849105   223.6
1973 Apr    5.864925449   2.4
..

I have used an example from a book (a primer on scientific programming with
python) and it seems to be working (see below), but I am not sure if my keys
etc. are set up correctly to then begin analysis, or even how to use the
dictionaries in my analysis. For example, how can I print out the year along
with the calculated mean 'firearea' of June-July-August for that year and the
maximum 'rainfall' for June-July-August of the same year?
Any feedback will be greatly appreciated!

infile=open('d:/yearmonthrainfire.txt','r')
lines=infile.readlines()
infile.close()
data={} #data[index][year]=indexvalue
first_line=lines[0]
climateindexname=first_line.split()
for index in climateindexname:
    data[index]={}
YEAR={}
MONTH={}

for line in lines[2:]:
    words=line.split()
    year=words[0] #years
    YEAR[year]={}
    month=words[1] #months
    MONTH[month]={}
    values=words[2:] #values of climateindices
    for index, v in zip(climateindexname, values):
        if v !=' ':
            data[index][year]=float(v)

print('years=', YEAR)
print('months=', MONTH)
print('data=', data)

          meanrain     firearea
1972 Jan  12.7083199   0
1972 Feb  14.17007142  0
1972 Mar  14.5659302   0
1972 Apr  1.508517302  0
1972 May  2.780009889  0
1972 Jun  1.609619287  0
1972 Jul  0.138150181  28
1972 Aug  0.214346148  32
1972 Sep  1.322102228  34747.8
1972 Oct  0.092663137  3655.9
1972 Nov  1.852276635  85.1
1972 Dec  2.011206002  42959.6
1973 Jan  5.55704346   153.5
1973 Feb  12.60326356  116.2
1973 Mar  11.08849105  223.6
1973 Apr  5.864925449  2.4
1973 May  2.352622232  0
1973 Jun  1.600553474  0
1973 Jul  0.776217634  0.4
1973 Aug  0.369365192  0
1973 Sep  2.2226749    13523.2
1973 Oct  1.122739926  229.3
1973 Nov  7.904255106  144
1973 Dec  13.31568494  1558.5
1974 Jan  24.85492667  1170.8
1974 Feb  16.21160985  11.1
1974 Mar  15.68630322  64.5
1974 Apr  1.761830238  0
1974 May  3.245113376  0
1974 Jun  0.413113179  0
1974 Jul  0.056925965  0
1974 Aug  1.056679232  0
1974 Sep  0.506806924  0.2
1974 Oct  0.571465459  0
1974 Nov  1.747845479  2939.4
1974 Dec  5.423673558  2212.4
1975 Jan  8.246915224  34838.4
1975 Feb  10.09194262  14467.3
1975 Mar  8.228448671  2673.5
1975 Apr  4.033215013  608.4
1975 May  1.953680674  28.2
1975 Jun  1.343020358  0
1975 Jul  1.038350229  48
1975 Aug  1.217719419  0
1975 Sep  2.873960222  2.4
1975 Oct  3.647092246  2.1
1975 Nov  1.789754789  41135
1975 Dec  12.7257926   1837.2
1976 Jan  8.670511763  12972.1
1976 Feb  12.82303773  556.9
1976 Mar  8.693052739  646.6
1976 Apr  4.824761671  2441.1
1976 May  1.185164667  174.3
1976 Jun  0.807438979  0
1976 Jul  0.772650504  0
1976 Aug  0.339851675  0.8
1976 Sep  0.172502614  18.5
1976 Oct  0.942766622  18
1976 Nov  2.882269122  115558.6
1976 Dec  8.023477936  2566.7
1977 Jan  6.995102092  2771.8
1977 Feb  18.99860777  21637.1
1977 Mar  10.80012409  145.1
1977 Apr  7.522531332  792.3
1977 May  3.779496681  171.6
1977 Jun  0.653845173  0
1977 Jul  0.577692075  0
1977 Aug  0.46951093   903.4
1977 Sep  0.690806944  164
1977 Oct  0.355984233  38512.8
1977 Nov  2.256052975  1946.3
1977 Dec  4.696920357  3620.8
1978 Jan  7.361228919  38343.3
1978 Feb  6.082772277  7332.2
1978 Mar  2.345215476  298.3
1978 Apr  4.67913104   117.7
1978 May  2.831787858  5
1978 Jun  0.371906624  0
1978 Jul  0.815953022  0
1978 Aug  1.214826869  0
1978 Sep  0.822085465

Re: [Tutor] table to dictionary and then analysis

2012-05-14 Thread bob gailer

On 5/14/2012 10:16 PM, questions anon wrote:
I am completely new to dictionaries and I am not even sure if this is 
what I need to use.
I have a text file that I would like to run summary stats on 
particular months, years and climate indices (in this case the climate 
indices are rainfall and fire area, so not actually climate indices at
all).
I would set up a SQLite database with a table of 4 numeric columns:
year, month, rainfall, firearea.
Use SQL to select the desired date range and do the max and avg
calculations:

select year, avg(firearea), max(rainfall) from table
where year = 1973 and month between 6 and 8
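The query above can be tried with Python's standard `sqlite3` module. This is a sketch under stated assumptions, not Bob's code: the table is called `climate` (a hypothetical name, since `table` is an SQL keyword and cannot be used unquoted as a table name), the rows are a few values copied from the attached data, and the `where year = 1973` condition is swapped for `group by year` so that one row comes back per year, which is what the original question asks for.

```python
import sqlite3

# Rows copied from the attached data: (year, month 1-12, rainfall, firearea).
ROWS = [
    (1972, 6, 1.609619287, 0.0),
    (1972, 7, 0.138150181, 28.0),
    (1972, 8, 0.214346148, 32.0),
    (1973, 6, 1.600553474, 0.0),
    (1973, 7, 0.776217634, 0.4),
    (1973, 8, 0.369365192, 0.0),
    (1973, 9, 2.2226749, 13523.2),  # September: excluded by the query
]

# Assumption: the table is named "climate" (hypothetical).
QUERY = """
    SELECT year, AVG(firearea), MAX(rainfall)
    FROM climate
    WHERE month BETWEEN 6 AND 8
    GROUP BY year
    ORDER BY year
"""


def make_db(rows):
    """Create an in-memory SQLite database holding the given rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE climate "
                 "(year INTEGER, month INTEGER, rainfall REAL, firearea REAL)")
    conn.executemany("INSERT INTO climate VALUES (?, ?, ?, ?)", rows)
    return conn


conn = make_db(ROWS)
# One output row per year: mean June-August firearea, max June-August rainfall.
for year, mean_fire, max_rain in conn.execute(QUERY):
    print(year, mean_fire, max_rain)
```

Adding `AND year = 1973` back to the `WHERE` clause restricts the result to a single year, as in the original query.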


You can use dictionaries, but that will be harder. Here is a start
(untested). It assumes the data are correct.


months = dict(Jan=1, Feb=2, Mar=3, Apr=4, May=5, Jun=6, Jul=7, Aug=8,
              Sep=9, Oct=10, Nov=11, Dec=12)

infile = open('d:/yearmonthrainfire.txt', 'r')
next(infile)  # skip the header line
for line in infile:
    line = line.split()
    year = int(line[0])
    month = months[line[1]]
    rainfall = float(line[2])
    firearea = float(line[3])
    sql = ("insert into table (year, month, rainfall, firearea) "
           "values(%i,%i,%f,%f)" % (year, month, rainfall, firearea))

# I don't have handy how one runs the sql
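For the step of actually running the SQL, here is a minimal sketch using the standard library `sqlite3` module. The `load` function, the `climate` table name and the `MONTHS` mapping are illustrative assumptions, not part of the thread. It uses `?` placeholders with `executemany`, letting `sqlite3` handle the values instead of formatting them into the statement with `%`.

```python
import sqlite3

# Month abbreviations mapped to 1-12 (March is 3).
MONTHS = {name: number for number, name in enumerate(
    ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
     "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"], start=1)}


def load(conn, lines):
    """Insert 'year month rainfall firearea' lines into the climate table.

    Lines without exactly four fields (the header, or a row with a missing
    value) are skipped. Returns the number of rows inserted.
    """
    rows = []
    for line in lines:
        words = line.split()
        if len(words) != 4:
            continue
        year, month, rainfall, firearea = words
        rows.append((int(year), MONTHS[month], float(rainfall), float(firearea)))
    # "?" placeholders: sqlite3 binds the values, no string formatting needed.
    conn.executemany(
        "INSERT INTO climate (year, month, rainfall, firearea) "
        "VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


# ":memory:" keeps the example self-contained; a file name (hypothetically
# "climate.db") would keep the database on disk instead.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE climate "
             "(year INTEGER, month INTEGER, rainfall REAL, firearea REAL)")
sample = ["          rainfall     firearea",
          "1972 Jan  12.7083199   0",
          "1972 Feb  14.17007142  0",
          "1978 Sep  0.822085465"]
print(load(conn, sample))
```

Running it prints 2, because the header line and the incomplete September 1978 row are skipped. Iterating over an open file yields its lines, so `load(conn, open('d:/yearmonthrainfire.txt'))` should work the same way on the real data.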



We usually reserve all caps names for constants.
You have way too many dictionaries.
Your program seems very complex for a very simple task.
I will not attempt to figure out what it does.
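A minimal sketch of a simpler dictionary-based approach, assuming every complete data row has exactly four whitespace-separated fields (year, month abbreviation, rainfall, firearea). A single dictionary keyed by `(year, month)` takes the place of the separate `YEAR`, `MONTH` and `data` dictionaries, and the June-August summary from the original question is computed from it. The names `parse` and `summer_summary` are illustrative, not from the thread; the sample rows are copied from the attached data.

```python
from collections import defaultdict

# Sample rows copied from the attached data (June-August of 1972 and 1973).
SAMPLE = """\
          rainfall     firearea
1972 Jun  1.609619287  0
1972 Jul  0.138150181  28
1972 Aug  0.214346148  32
1973 Jun  1.600553474  0
1973 Jul  0.776217634  0.4
1973 Aug  0.369365192  0
"""


def parse(lines):
    """Return {(year, month): {'rainfall': r, 'firearea': f}}.

    Lines without exactly four fields (such as the header) are skipped.
    """
    data = {}
    for line in lines:
        words = line.split()
        if len(words) != 4:
            continue
        year, month, rainfall, firearea = words
        data[(int(year), month)] = {'rainfall': float(rainfall),
                                    'firearea': float(firearea)}
    return data


def summer_summary(data, months=('Jun', 'Jul', 'Aug')):
    """Return {year: (mean firearea, max rainfall)} over the given months."""
    by_year = defaultdict(list)
    for (year, month), values in data.items():
        if month in months:
            by_year[year].append(values)
    summary = {}
    for year in sorted(by_year):
        rows = by_year[year]
        mean_fire = sum(r['firearea'] for r in rows) / len(rows)
        max_rain = max(r['rainfall'] for r in rows)
        summary[year] = (mean_fire, max_rain)
    return summary


data = parse(SAMPLE.splitlines())
for year, (mean_fire, max_rain) in summer_summary(data).items():
    print(year, mean_fire, max_rain)
```

Because the key includes the month, no month overwrites another, and any other season can be summarised by passing a different `months` tuple.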








--
Bob Gailer
919-636-4239
Chapel Hill NC
