[Tutor] Organizing 15500 records, how?

2006-12-13 Thread Peter Jessop

With more than 15000 records you would be better off using a relational
database.
Although it will create more work to start with (you'll have to learn it),
it will save you a lot of work in the medium and long term.

Almost any relational database can be accessed from Python. As it is just for
your own use SQLite might be the most appropriate (it has a very small
footprint) but MySQL is excellent and so are many others.
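As a rough illustration of the SQLite route, here is a minimal sketch that uses the sqlite3 module shipped with current Python versions. The table name, the column names and the sample rows are all invented for the example; the real profiles have about 50 fields, and this only shows the shape of the approach.

```python
import sqlite3

# Hypothetical schema: these column names are assumptions for illustration,
# not the real fields of the forum profiles.
def build_profile_db(rows, path=":memory:"):
    """Create a profiles table at `path` and load the given rows into it."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS profiles ("
        "userid INTEGER PRIMARY KEY, "
        "username TEXT, "
        "email TEXT, "
        "post_count INTEGER)"
    )
    conn.executemany(
        "INSERT INTO profiles (userid, username, email, post_count) "
        "VALUES (?, ?, ?, ?)",
        rows,
    )
    conn.commit()
    return conn

# Made-up sample data standing in for the parsed html profiles.
sample_rows = [
    (1, "alice", "alice@example.com", 120),
    (2, "bob", "bob@example.com", 0),
    (3, "carol", "carol@example.org", 7),
]

conn = build_profile_db(sample_rows)
profile_count = conn.execute("SELECT COUNT(*) FROM profiles").fetchone()[0]
print(profile_count)
```

Passing a file name instead of ":memory:" keeps the data on disk in a single file, which replaces the 10 to 50 dictionary files with one database.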

To use a relational database you might think about learning SQL. It is very
easy (especially if you know any Boolean algebra) and is a language that
has been used almost unchanged for decades and shows every sign of staying
here for a long time. In computing it is one of the most useful things you
can learn. There is a good introductory, interactive tutorial
at http://sqlcourse.com/
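To give a feel for how Boolean algebra shows up in SQL, here is a small sketch of a WHERE clause that combines conditions with AND and OR. The table, its columns and the rule itself (no posts, but a homepage or a signature filled in) are assumptions made up for the example, not a tested spam heuristic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical columns, chosen only to illustrate the query.
conn.execute(
    "CREATE TABLE profiles ("
    "userid INTEGER PRIMARY KEY, homepage TEXT, "
    "post_count INTEGER, signature TEXT)"
)
conn.executemany(
    "INSERT INTO profiles VALUES (?, ?, ?, ?)",
    [
        (1, "", 120, ""),
        (2, "http://pills.example.com", 0, "buy now"),
        (3, "http://blog.example.org", 7, ""),
        (4, "", 0, ""),
    ],
)

# The WHERE clause reads like a Boolean expression:
# never posted AND (has a homepage OR has a signature).
query = """
    SELECT userid FROM profiles
    WHERE post_count = 0
      AND (homepage <> '' OR signature <> '')
    ORDER BY userid
"""
suspects = [row[0] for row in conn.execute(query)]
print(suspects)
```

With this made-up data only user 2 matches: user 4 has never posted either, but has neither a homepage nor a signature.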

If you feel you need another abstraction layer on top of this you could look
at SQLObject http://www.sqlobject.org/.

Personally I would recommend that you start with MySQL http://www.mysql.com.
It is open source, easy to install and use, stable and fast. But with SQL
engines you have lots of good choices.

Peter Jessop


On 12/13/06, Thomas [EMAIL PROTECTED] wrote:

I'm writing a program to analyse the profiles of the 15500 users of my
forum. I have the profiles as html files stored locally and I'm using
ClientForm to extract the various details from the html form in each
file.

My goal is to identify lurking spammers but also to learn how to
better spot spammers by calculating statistical correlations in the
data against known spammers.

I need advice on how to organise my data. There are 50 fields in
each profile, some fields will be much more use than others so I
thought about creating say 10 files to start off with that contained
dictionaries of userid to field value. That way I'm dealing with 10 to
50 files instead of 15500.

Also, I am inexperienced with using classes but eager to learn and
wonder if they would be any help in this case.

Any advice much appreciated and thanks in advance,
Thomas
___
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor



Re: [Tutor] Organizing 15500 records, how?

2006-12-13 Thread Alan Gauld
Thomas [EMAIL PROTECTED] wrote in message 
news:[EMAIL PROTECTED]
 I'm writing a program to analyse the profiles of the 15500 users

 thought about creating say 10 files to start off with that contained
 dictionaries of userid to field value. That way I'm dealing with 10 to
 50 files instead of 15500.

To be honest, with those numbers you would be better off using a database,
both for storage and search speed and for the analysis.
SQL is designed for this kind of job.
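As a hedged sketch of the kind of analysis SQL handles well, the query below compares known spammers with everyone else using GROUP BY. The column names (has_homepage, post_count, known_spammer) and the six sample rows are invented for illustration; real correlations would need the full data and more care.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical columns; known_spammer is 1 for accounts already
# identified as spammers and 0 for everyone else.
conn.execute(
    "CREATE TABLE profiles ("
    "userid INTEGER PRIMARY KEY, has_homepage INTEGER, "
    "post_count INTEGER, known_spammer INTEGER)"
)
conn.executemany(
    "INSERT INTO profiles VALUES (?, ?, ?, ?)",
    [
        (1, 0, 120, 0),
        (2, 1, 0, 1),
        (3, 1, 7, 0),
        (4, 1, 1, 1),
        (5, 0, 33, 0),
        (6, 0, 0, 0),
    ],
)

# One row per group: group size, share of profiles with a homepage,
# and average post count.
summary = conn.execute(
    """
    SELECT known_spammer, COUNT(*), AVG(has_homepage), AVG(post_count)
    FROM profiles
    GROUP BY known_spammer
    ORDER BY known_spammer
    """
).fetchall()

for known_spammer, count, homepage_share, avg_posts in summary:
    print(known_spammer, count, homepage_share, avg_posts)
```

Fields whose group averages differ sharply between the two rows are candidates for a closer look when hunting for lurking spammers.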

You can read a shortish intro to using databases in my tutorial.
It uses SQLite but is easily adapted to any database engine.
Even if your ISP doesn't support installed databases I would
still consider downloading your data into a local database
for the analysis job and writing HTML import/export functions
to deal with administering the web site. But if you can put a
database on the web site then so much the better. It will
almost certainly simplify your code and improve performance
and/or resource usage.

Alan G 




[Tutor] Organizing 15500 records, how?

2006-12-12 Thread Thomas
I'm writing a program to analyse the profiles of the 15500 users of my
forum. I have the profiles as html files stored locally and I'm using
ClientForm to extract the various details from the html form in each
file.

My goal is to identify lurking spammers but also to learn how to
better spot spammers by calculating statistical correlations in the
data against known spammers.

I need advice on how to organise my data. There are 50 fields in
each profile, some fields will be much more use than others so I
thought about creating say 10 files to start off with that contained
dictionaries of userid to field value. That way I'm dealing with 10 to
50 files instead of 15500.

Also, I am inexperienced with using classes but eager to learn and
wonder if they would be any help in this case.

Any advice much appreciated and thanks in advance,
Thomas