Mudcat wrote: > I am trying to build a tool that analyzes stock data. Therefore I am > going to download and store quite a vast amount of it. Just for a > general number - assuming there are about 7000 listed stocks on the two > major markets plus some extras, 255 tradying days a year for 20 years, > that is about 36 million entries. > > Obviously a database is a logical choice for that. However I've never > used one, nor do I know what benefits I would get from using one. I am > worried about speed, memory usage, and disk space.
This is a typical use case for relational database systems. With something like DB2 or Oracle here, you can take advantage of more than 20 years of work by lots of developers trying to solve the kind of problems you will run into. You haven't really stated all the facts to decide what product to choose though. Will this be a multi-user applications? Do you forsee a client/server application? What operating system(s) do you need to support? With relational databases, it's plausible to move some of the hard work in the data analysis into the server. Using this well means that you need to learn a bit about how relational databases work, but I think it's with the trouble. It could mean that much less data ever needs to reach your Python program for processing, and that will mean a lot for your performance. Relational databases are very good at searching, sorting and simple aggregations of data. SQL is a declarative language, and in principle, your SQL code will just declare the correct queries and manipulations that you want to achieve, and tuning will be a separate activity, which doesn't need to involve program changes. In reality, there are certainly cases where changes in SQL code will influence performance, but to a very large extent, you can achieve good performance through building indices and by letting the database gather statistics and analyze the queries your programs contain. As a bonus, you also have advanced systems for security, transactional safety, on- line backup, replication etc. You don't get these advantages with any other data storage systems. I'd get Chris Fehily's "SQL Visual Quickstart Guide", which is as good as his Python book. As database, it depends a bit on your platform you work with. I'd avoid MySQL. Some friends of mine have used it for needs similar to yours, and they are now running into its severe shortcomings. (I did warn them.) For Windows, I think the single user version of SQL Server (MSDE?) is gratis. For both Windows and Linux/Unix, there are (I think) gratis versions of both Oracle 10g, IBM DB2 UDB and Mimer SQL. Mimer SQL is easy to install, Oracle is a pain, and I think DB2 is somewhere in between. PostgreSQL is also a good option. Either way, it certainly seems natural to learn relational databases and SQL if you want to work with financial software. -- http://mail.python.org/mailman/listinfo/python-list