Hi, I am trying to build a tool that analyzes stock data, so I am going to download and store quite a vast amount of it. Just to give a rough number: assuming there are about 7000 listed stocks on the two major markets plus some extras, and 255 trading days a year for 20 years, that is about 36 million entries.
Obviously a database is the logical choice for that. However, I've never used one, nor do I know what benefits I would get from using one. I am worried about speed, memory usage, and disk space.

My initial thought was to put the data in large dictionaries and shelve them (and possibly zip them to save storage space until the data is needed). However, these are huge files; based on ones I have already built, I estimate at least 5 GB of storage this way. My structure for these files is a three-layered dictionary: [Market][Stock][Date] -> data list. That lets me easily access any data for any date or stock in a particular market, so I wasn't really concerned about the organizational aspects of a database, since this structure would serve me fine.

Before I put this all together, though, I wanted to ask around to see if this is a good approach. Will it be faster to use a database than a structured dictionary? And will I get a lot of overhead if I go with a database? I'm hoping people who have dealt with data this large before can give me a little advice.

Thanks ahead of time,
Marc
--
http://mail.python.org/mailman/listinfo/python-list
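To make the dictionary idea concrete, here is a minimal sketch of the three-layer layout I mean, shelved to disk. The market, symbol, date, and field values are made up for illustration; the per-date list here is open/high/low/close/volume, but any field layout would work the same way.

```python
import shelve

# Three-layer structure: data[market][stock][date] -> list of fields
# (open, high, low, close, volume in this example).
data = {
    "NYSE": {
        "IBM": {
            "2007-01-03": [97.17, 98.79, 96.26, 97.27, 9196800],
            "2007-01-04": [97.25, 98.79, 96.88, 98.31, 10524500],
        }
    }
}

# shelve pickles each top-level key as one value, so storing one market
# per key means a lookup loads that entire market's dict into memory.
with shelve.open("stocks_shelf") as db:
    for market, stocks in data.items():
        db[market] = stocks

# Reading back: easy random access by market/stock/date.
with shelve.open("stocks_shelf") as db:
    close = db["NYSE"]["IBM"]["2007-01-03"][3]
```

Note the trade-off this sketch exposes: with shelve, the granularity of disk I/O is the whole top-level value, so accessing one day's close for one stock still unpickles everything under that market key.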
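For comparison, here is the same toy data in SQLite, which ships in Python's standard library as sqlite3, so trying a database costs nothing to install. One row per (market, symbol, date) with a composite primary key; the table and column names are my own invention, not anything standard.

```python
import sqlite3

# ":memory:" keeps this sketch self-contained; use a filename
# (e.g. "stocks.sqlite") for persistent on-disk storage.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE quotes (
        market TEXT, symbol TEXT, date TEXT,
        open REAL, high REAL, low REAL, close REAL, volume INTEGER,
        PRIMARY KEY (market, symbol, date)
    )
""")
rows = [
    ("NYSE", "IBM", "2007-01-03", 97.17, 98.79, 96.26, 97.27, 9196800),
    ("NYSE", "IBM", "2007-01-04", 97.25, 98.79, 96.88, 98.31, 10524500),
]
conn.executemany("INSERT INTO quotes VALUES (?,?,?,?,?,?,?,?)", rows)
conn.commit()

# The primary-key index makes single-row lookups fast even with
# millions of rows, without loading a whole market into memory.
(close,) = conn.execute(
    "SELECT close FROM quotes WHERE market=? AND symbol=? AND date=?",
    ("NYSE", "IBM", "2007-01-03"),
).fetchone()
```

The practical difference from the shelved-dictionary approach is that the database only reads the rows a query asks for, and it can also answer queries the nested dict can't express cheaply (e.g. "all closes for one date across every stock") with a single indexed SELECT.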