I have tens of millions (could be more) of documents stored in files. Each of them has additional properties stored in separate files. I need to check whether they exist, update and merge properties, and so on. And this is not a one-time job. Because of the sheer number of files, I suspect querying and updating a database will take a long time...
Let's say I want to do something like what a search engine has to do, in terms of the amount of data processed on one server. I doubt any serious search engine would use a database for indexing and searching. A hash table is what I need, not powerful queries (a sketch of a disk-backed alternative follows the quoted message below).

"John Nagle" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED]
> Jack wrote:
>> I need to process a large amount of data. The data structure fits well
>> in a dictionary, but the amount is large - close to or more than the size
>> of physical memory. I wonder what will happen if I try to load the data
>> into a dictionary. Will Python use swap memory, or will it fail?
>>
>> Thanks.
>
> What are you trying to do? At one extreme, you're implementing something
> like a search engine that needs gigabytes of bitmaps to do joins fast as
> hundreds of thousands of users hit the server, and need to talk seriously
> about 64-bit address space machines. At the other, you have no idea how
> to either use a database or do sequential processing. Tell us more.
>
> John Nagle
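Something along these lines is what I have in mind - a minimal sketch using the standard library's shelve module, which behaves like a dict but keeps the data on disk rather than in memory. The file name and the merge logic here are just placeholders, not anything from the actual job:

    # Sketch only: shelve gives dict-style, hash-table lookups backed by a file,
    # so the working set is not limited by physical memory.
    import shelve

    with shelve.open("properties.db") as props:
        # Check whether a document's properties exist yet.
        if "doc-00042" not in props:
            props["doc-00042"] = {"title": "unknown", "tags": []}

        # Update and merge properties for an existing document.
        record = props["doc-00042"]          # read the stored dict
        record["tags"].append("indexed")     # modify it in memory
        props["doc-00042"] = record          # write it back to disk

Since shelve pickles the values, each record can be an arbitrary Python object; for plain string-to-string mappings, the underlying dbm module alone would be lighter. Whether this holds up at tens of millions of keys is exactly what I'm trying to figure out.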