Re: Dictionary or Database—Please advise
On Fri, Feb 26, 2010 at 10:58 AM, Jeremy jlcon...@gmail.com wrote: I have lots of data that I currently store in dictionaries. However, the memory requirements are becoming a problem. I am considering using a database of some sorts instead, but I have never used them before. Would a database be more memory efficient than a dictionary? I also need platform independence without having to install a database and Python interface on all the platforms I'll be using. Is there something built-in to Python that will allow me to do this? Thanks, Jeremy Python has SQLite 3 built in, and there are wrappers for MySQL and PostgreSQL on all major platforms. Any one of them will work; databases have the advantage that they're stored on disk, so you don't have to hold all of the data in memory simultaneously. -- http://mail.python.org/mailman/listinfo/python-list
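To make the SQLite suggestion above concrete, here is a minimal sketch of a dict-like key/value table kept in a single file, using only the standard-library sqlite3 module. The table name `kv`, the helper names `open_store`, `put` and `get`, and the temporary file location are assumptions made for illustration; they do not come from the thread.

```python
import os
import sqlite3
import tempfile


def open_store(path):
    """Open (or create) a one-table key/value store in a single SQLite file."""
    con = sqlite3.connect(path)
    con.execute("CREATE TABLE IF NOT EXISTS kv (key TEXT PRIMARY KEY, value TEXT)")
    return con


def put(con, key, value):
    # INSERT OR REPLACE mimics dict assignment: the last write for a key wins.
    con.execute("INSERT OR REPLACE INTO kv (key, value) VALUES (?, ?)", (key, value))
    con.commit()


def get(con, key, default=None):
    # Only the requested row is read; the rest of the data stays on disk.
    row = con.execute("SELECT value FROM kv WHERE key = ?", (key,)).fetchone()
    return row[0] if row else default


# Hypothetical location: a throwaway temp directory so the sketch runs anywhere.
db_path = os.path.join(tempfile.mkdtemp(), "data.sqlite")

con = open_store(db_path)
put(con, "alpha", "1")
put(con, "alpha", "2")  # overwrites the earlier value
print(get(con, "alpha"))
print(get(con, "missing", "n/a"))
con.close()
```

Because the data lives in the file rather than in the process, reopening the same path later gives the same contents back, which is the main difference from a plain dict.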
Re: Dictionary or Database—Please advise
On Fri, Feb 26, 2010 at 7:58 AM, Jeremy jlcon...@gmail.com wrote: I have lots of data that I currently store in dictionaries. However, the memory requirements are becoming a problem. I am considering using a database of some sorts instead, but I have never used them before. Would a database be more memory efficient than a dictionary? I also need platform independence without having to install a database and Python interface on all the platforms I'll be using. Is there something built-in to Python that will allow me to do this? If you won't be using the SQL features of the database, `shelve` might be another option; from what I can grok, it sounds like a dictionary stored mostly on disk rather than entirely in RAM (not 100% sure though): http://docs.python.org/library/shelve.html It's in the std lib and supports several native dbm libraries for its backend; one of them should almost always be present. Cheers, Chris -- http://blog.rebertia.com
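A short sketch of what the `shelve` suggestion above looks like in practice, assuming a current Python 3 where `shelve.open` can be used as a context manager. The file name `records` and the sample values are made up for illustration.

```python
import os
import shelve
import tempfile

# Hypothetical file name inside a throwaway temp directory.
path = os.path.join(tempfile.mkdtemp(), "records")

# Write: the shelf behaves like a dict whose keys must be strings;
# values can be any picklable Python object.
with shelve.open(path) as db:
    db["run-1"] = {"energy": 1.5, "counts": [3, 4, 5]}
    db["run-2"] = {"energy": 2.5, "counts": [6, 7]}

# Read back in a separate session; only the value asked for is unpickled.
with shelve.open(path) as db:
    print(sorted(db.keys()))
    print(db["run-1"]["counts"])
```

One thing to watch: with the default `writeback=False`, mutating a stored value in place (for example appending to `db["run-1"]["counts"]`) does not persist; the value has to be reassigned to its key.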
Re: Dictionary or Database—Please advise
On Feb 26, 3:58 pm, Jeremy jlcon...@gmail.com wrote: I have lots of data that I currently store in dictionaries. However, the memory requirements are becoming a problem. I am considering using a database of some sorts instead, but I have never used them before. Would a database be more memory efficient than a dictionary? I also need platform independence without having to install a database and Python interface on all the platforms I'll be using. Is there something built-in to Python that will allow me to do this? Thanks, Jeremy Maybe shelve would be enough for your needs? http://docs.python.org/library/shelve.html -- http://mail.python.org/mailman/listinfo/python-list
Re: Dictionary or Database—Please advise
In article 891a98fa-c398-455a-981f-bf72af772...@s36g2000prh.googlegroups.com, Jeremy jlcon...@gmail.com wrote: I have lots of data that I currently store in dictionaries. However, the memory requirements are becoming a problem. I am considering using a database of some sorts instead, but I have never used them before. Would a database be more memory efficient than a dictionary? I also need platform independence without having to install a database and Python interface on all the platforms I'll be using. Is there something built-in to Python that will allow me to do this? Thanks, Jeremy This is a very vague question, so it'll get a vague answer :-) If you have so much data that you're running into memory problems, then yes, storing the data externally in a disk-resident database seems like a reasonable idea. Once you get into databases, platform independence will be an issue. There are many databases out there to pick from. If you want something which will work on a lot of platforms, a reasonable place to start looking is MySQL. It's free, runs on lots of platforms, has good Python support, and there are lots of people on the net who know it and are willing to give help and advice. Databases have a bit of a learning curve. If you've never done any database work, don't expect to download MySQL (or any other database) this afternoon and be up and running by tomorrow. Whatever database you pick, you're almost certainly going to end up having to install it wherever you install your application. There's no such thing as a universally available database that you can expect to be available everywhere. Have fun!
Re: Dictionary or Database—Please advise
On Feb 26, 9:29 am, Chris Rebert c...@rebertia.com wrote: On Fri, Feb 26, 2010 at 7:58 AM, Jeremy jlcon...@gmail.com wrote: I have lots of data that I currently store in dictionaries. However, the memory requirements are becoming a problem. I am considering using a database of some sorts instead, but I have never used them before. Would a database be more memory efficient than a dictionary? I also need platform independence without having to install a database and Python interface on all the platforms I'll be using. Is there something built-in to Python that will allow me to do this? If you won't be using the SQL features of the database, `shelve` might be another option; from what I can grok, it sounds like a dictionary stored mostly on disk rather than entirely in RAM (not 100% sure though): http://docs.python.org/library/shelve.html It's in the std lib and supports several native dbm libraries for its backend; one of them should almost always be present. Cheers, Chris -- http://blog.rebertia.com Shelve looks like an interesting option, but what might pose an issue is that I'd be reading the data from disk instead of from memory. I didn't mention this in my original post, but I was hoping that a database would store data in RAM more memory-efficiently, so that I wouldn't have to read from (or swap to/from) disk. Would using the shelve package make reading/writing data from disk faster since it is in a binary format? Jeremy
Re: Dictionary or Database—Please advise
Jeremy wrote: I have lots of data that I currently store in dictionaries. However, the memory requirements are becoming a problem. I am considering using a database of some sorts instead, but I have never used them before. Would a database be more memory efficient than a dictionary? I also need platform independence without having to install a database and Python interface on all the platforms I'll be using. Is there something built-in to Python that will allow me to do this? Since you use dictionaries, I guess that simple store saving key:value will do? If so, bsddb support built into Python will do just nicely. bsddb is multiplatform, although I have not personally tested if a binary db created on one platform will be usable on another. You'd have to check this. Caveat: from what some people say I gather that binary format between bsddb versions tends to change. There's also ultra-cool-and-modern Tokyo Cabinet key:value store with Python bindings: http://pypi.python.org/pypi/pytc/ I didn't test it, though. Regards, mk -- http://mail.python.org/mailman/listinfo/python-list
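For readers on Python 3: the bsddb module mentioned above was removed from the standard library in Python 3.0, so the sketch below uses the generic `dbm` module instead. That is a swapped-in substitute, not what mk recommended. `dbm.open` picks whichever backend is available on the platform, and keys and values are stored as bytes. The file name is hypothetical.

```python
import dbm
import os
import tempfile

# Hypothetical file name inside a throwaway temp directory.
path = os.path.join(tempfile.mkdtemp(), "store")

# "c" opens for reading and writing, creating the file if it does not exist.
with dbm.open(path, "c") as db:
    db[b"spam"] = b"1"
    db[b"eggs"] = b"2"

# Reopen read-only; lookups go to the file, not to an in-memory dict.
with dbm.open(path, "r") as db:
    print(db[b"spam"])
    print(b"ham" in db)
```

As with bsddb, whether a file written on one platform opens on another depends on the backend chosen there, so that still needs checking per deployment.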
Re: Dictionary or Database—Please advise
Jeremy wrote: Shelve looks like an interesting option, but what might pose an issue is that I'm reading the data from a disk instead of memory. I didn't mention this in my original post, but I was hoping that by using a database it would be more memory efficient in storing data in RAM so I wouldn't have to read from (or swap to/from) disk. Would using the shelve package make reading/writing data from disk faster since it is in a binary format? Read the docs: class shelve.BsdDbShelf(dict[, protocol=None[, writeback=False]]) A subclass of Shelf which exposes first(), next(), previous(), last() and set_location() which are available in the bsddb module but not in other database modules. The dict object passed to the constructor must support those methods. This is generally accomplished by calling one of bsddb.hashopen(), bsddb.btopen() or bsddb.rnopen(). The optional protocol and writeback parameters have the same interpretation as for the Shelf class. Apparently shelve gives you the option of using bsddb internally, which is good news: bsddb can store data as a B-tree, which is highly efficient for finding keys. I would recommend bsddb.btopen(), as it creates a B-tree DB (perhaps other implementations, like anydbm or a hash DB, are good as well, but I personally didn't test them). I can't speak for the Berkeley DB implementation, but in general a B-tree lookup has O(log n) complexity, which roughly means that if you need to find a particular key in a DB of 1 million keys, you'll probably need at most ~20 disk accesses (or far fewer, since each B-tree node holds many keys and some of the keys examined during the search will sit in the same disk sectors). So yes, it's highly efficient. Having said that, remember that disk is many orders of magnitude slower than RAM, so it's no free lunch. Nothing will beat a memory-based data structure when it comes to speed (well, new flash or hybrid disks could perhaps improve significantly on current mainstream mechanical-platter disks; there are some hyper-fast storage hardware companies out there, although they tend to charge an arm and a leg for their stuff for now). Caveat: Berkeley DB is dual-licensed, so if you're using it for commercial work, you might need to buy a license for it. I have had no experience with this myself; if someone here has, perhaps they can shed some light on it? Regards, mk
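The lookup-cost estimate above can be checked with a few lines of integer arithmetic. The sketch counts how many levels a balanced search tree needs to cover n keys for a given fan-out. A fan-out of 2 corresponds to the binary-search figure of about 20 steps; the fan-out of 100 keys per node is purely an assumed, illustrative value, not a property of Berkeley DB.

```python
def levels_needed(n_keys, fanout):
    """Smallest number of levels whose fanout**levels covers n_keys."""
    levels, reachable = 0, 1
    while reachable < n_keys:
        reachable *= fanout
        levels += 1
    return levels


n = 1_000_000
print(levels_needed(n, 2))    # binary search: 20 comparisons
print(levels_needed(n, 100))  # assumed 100 keys per node: 3 node reads
```

So the ~20 figure is an upper bound that treats every comparison as a disk access; with wide nodes, the number of pages actually read per lookup is much smaller.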
Re: Dictionary or Database—Please advise
On Feb 26, 10:58 am, Jeremy jlcon...@gmail.com wrote: I have lots of data How much is lots? that I currently store in dictionaries. However, the memory requirements are becoming a problem. I am considering using a database of some sorts instead, but I have never used them before. Would a database be more memory efficient than a dictionary? What do you mean by more efficient? I also need platform independence without having to install a database and Python interface on all the platforms I'll be using. Is there something built-in to Python that will allow me to do this? The SQLite database engine is built into Python 2.5 and up. I have heard on this list that there may be problems with it with Python 2.6 and up on Linux, but I've stayed with 2.5 and it works fine for me on WinXP, Vista, and Linux. You can use it as a disk-stored single database file, or as an in-memory-only database. The SQLite website (http://www.sqlite.org/) claims it is "the most widely deployed SQL database engine in the world", for what that's worth. Have a look at this: http://docs.python.org/library/sqlite3.html Che Thanks, Jeremy
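To illustrate the two modes mentioned above, here is a small sketch showing that the same sqlite3 code runs against a file on disk and against the special `:memory:` name. The table `points`, the helper names and the temp-file path are assumptions for the example.

```python
import os
import sqlite3
import tempfile


def load(con):
    """Create a small table and fill it with sample rows."""
    con.execute("CREATE TABLE points (name TEXT PRIMARY KEY, x REAL)")
    con.executemany("INSERT INTO points VALUES (?, ?)", [("a", 1.0), ("b", 2.5)])
    con.commit()


def total(con):
    """Let the database do the aggregation instead of looping in Python."""
    return con.execute("SELECT SUM(x) FROM points").fetchone()[0]


# Hypothetical path; the database is one ordinary file.
disk_path = os.path.join(tempfile.mkdtemp(), "points.sqlite")

on_disk = sqlite3.connect(disk_path)   # persists after the program exits
in_ram = sqlite3.connect(":memory:")   # vanishes when the connection closes

for con in (on_disk, in_ram):
    load(con)

print(total(on_disk), total(in_ram))
```

Only the connection string differs, so code can be developed against one mode and switched to the other later.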
Re: Dictionary or Database—Please advise
Shelve looks like an interesting option, but what might pose an issue is that I'm reading the data from a disk instead of memory. I didn't mention this in my original post, but I was hoping that by using a database it would be more memory efficient in storing data in RAM so I wouldn't have to read from (or swap to/from) disk. A database usually stores data on disk and not in RAM. However you could use sqlite with :memory:, so that it runs in RAM. Would using the shelve package make reading/writing data from disk faster since it is in a binary format? Faster than what? Shelve uses caching, so it is likely to be faster than a self-made solution. However, accessing disk is much slower than accessing RAM. Jeremy - Patrick -- http://mail.python.org/mailman/listinfo/python-list
Re: Dictionary or Database—Please advise
In article mailman.296.1267217819.4577.python-l...@python.org, Patrick Sabin patrick.just4...@gmail.com wrote: A database usually stores data on disk and not in RAM. However you could use sqlite with :memory:, so that it runs in RAM. The OP wants transparent caching, so :memory: wouldn't work. -- Aahz (a...@pythoncraft.com) * http://www.pythoncraft.com/ Many customs in this life persist because they ease friction and promote productivity as a result of universal agreement, and whether they are precisely the optimal choices is much less important. --Henry Spencer -- http://mail.python.org/mailman/listinfo/python-list
Re: Dictionary or Database—Please advise
On Feb 26, 12:58 pm, Jeremy jlcon...@gmail.com wrote: I have lots of data that I currently store in dictionaries. However, the memory requirements are becoming a problem. I am considering using a database of some sorts instead, but I have never used them before. Would a database be more memory efficient than a dictionary? I also need platform independence without having to install a database and Python interface on all the platforms I'll be using. Is there something built-in to Python that will allow me to do this? Thanks, Jeremy For small quantities of data, dicts are OK, but if your app continually handles (and adds) large quantities of data, you should use a database. You can start with SQLite (which is bundled with Python); it's much easier to use than any other relational database out there. You don't need to install anything, since it's not a database server. You simply save your databases as files. If you don't know SQL (the standard language used to query databases), I recommend this online tutorial: http://www.sqlcourse.com/ Good luck! Luis
Re: Dictionary or Database—Please advise
In article roy-c4f98b.11395126022...@70-1-84-166.pools.spcsdns.net, Roy Smith r...@panix.com wrote: Whatever database you pick, you're almost certainly going to end up having to install it wherever you install your application. There's no such thing as a universally available database that you can expect to be available everywhere. ...unless you use SQLite. -- Aahz (a...@pythoncraft.com) * http://www.pythoncraft.com/ Many customs in this life persist because they ease friction and promote productivity as a result of universal agreement, and whether they are precisely the optimal choices is much less important. --Henry Spencer -- http://mail.python.org/mailman/listinfo/python-list
Re: Dictionary or Database—Please advise
In article 891a98fa-c398-455a-981f-bf72af772...@s36g2000prh.googlegroups.com, Jeremy jlcon...@gmail.com wrote: I have lots of data that I currently store in dictionaries. However, the memory requirements are becoming a problem. I am considering using a database of some sorts instead, but I have never used them before. Would a database be more memory efficient than a dictionary? I also need platform independence without having to install a database and Python interface on all the platforms I'll be using. Is there something built-in to Python that will allow me to do this? If you're serious about needing both a disk-based backing store *and* getting maximum use/performance from your RAM, you probably will need to combine memcached with one of the other solutions offered. But given your other requirement of not installing a DB, your best option will certainly be SQLite. You can use :memory: databases, but that will require shuttling data to disk manually. I suggest that you start with plain SQLite and only worry if you prove (repeat, PROVE) that DB is your bottleneck. -- Aahz (a...@pythoncraft.com) * http://www.pythoncraft.com/ Many customs in this life persist because they ease friction and promote productivity as a result of universal agreement, and whether they are precisely the optimal choices is much less important. --Henry Spencer -- http://mail.python.org/mailman/listinfo/python-list
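The remark above that a `:memory:` database requires shuttling data to disk manually can be sketched with `sqlite3.Connection.backup`, which is available from Python 3.7 (so it postdates this thread). The `checkpoint` helper, the table name and the snapshot path are hypothetical names chosen for illustration.

```python
import os
import sqlite3
import tempfile

# Working copy lives entirely in RAM.
mem = sqlite3.connect(":memory:")
mem.execute("CREATE TABLE kv (key TEXT PRIMARY KEY, value TEXT)")
mem.executemany("INSERT INTO kv VALUES (?, ?)", [("a", "1"), ("b", "2")])
mem.commit()


def checkpoint(source, path):
    """Copy the whole in-memory database into a file on disk."""
    target = sqlite3.connect(path)
    try:
        source.backup(target)
    finally:
        target.close()


# Hypothetical snapshot location in a throwaway temp directory.
snapshot = os.path.join(tempfile.mkdtemp(), "snapshot.sqlite")
checkpoint(mem, snapshot)
```

Note that this copies the entire database each time, and the working set still has to fit in RAM, which is why it does not address the original memory problem on its own.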
Re: Dictionary or Database—Please advise
On Feb 26, 7:58 am, Jeremy jlcon...@gmail.com wrote: I have lots of data that I currently store in dictionaries. However, the memory requirements are becoming a problem. I am considering using a database of some sorts instead, but I have never used them before. Would a database be more memory efficient than a dictionary? I also need platform independence without having to install a database and Python interface on all the platforms I'll be using. Is there something built-in to Python that will allow me to do this? If you had wall-to-wall unit tests, you could swap the database in incrementally (Deprecation Refactor). You would just add one database table, switch one client of one dictionary to use that table, pass all the tests, and integrate. Repeat until nobody uses the dicts, then trivially retire them. If you don't have unit tests, then you have a bigger problem than memory requirements. (You can throw $50 hardware at that!) -- Phlip http://penbird.tumblr.com/ -- http://mail.python.org/mailman/listinfo/python-list
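One way to make the incremental swap described above cheap is to hide the table behind the same mapping interface the clients already use, so that each client can be switched over without being rewritten. The sketch below is an assumption about how that might look, not Phlip's method: `SqliteDict` and `count_long_values` are hypothetical names, and only string keys and values are handled.

```python
import sqlite3
from collections.abc import MutableMapping


class SqliteDict(MutableMapping):
    """Hypothetical dict-like wrapper over one SQLite table (str keys/values)."""

    def __init__(self, path=":memory:"):
        self._con = sqlite3.connect(path)
        self._con.execute(
            "CREATE TABLE IF NOT EXISTS kv (key TEXT PRIMARY KEY, value TEXT)")

    def __getitem__(self, key):
        row = self._con.execute(
            "SELECT value FROM kv WHERE key = ?", (key,)).fetchone()
        if row is None:
            raise KeyError(key)
        return row[0]

    def __setitem__(self, key, value):
        self._con.execute("INSERT OR REPLACE INTO kv VALUES (?, ?)", (key, value))
        self._con.commit()

    def __delitem__(self, key):
        cur = self._con.execute("DELETE FROM kv WHERE key = ?", (key,))
        if cur.rowcount == 0:
            raise KeyError(key)
        self._con.commit()

    def __iter__(self):
        # fetchall() keeps the sketch simple; a large table would want paging.
        for (key,) in self._con.execute("SELECT key FROM kv").fetchall():
            yield key

    def __len__(self):
        return self._con.execute("SELECT COUNT(*) FROM kv").fetchone()[0]


def count_long_values(table, min_len):
    """Example client: works unchanged with a plain dict or a SqliteDict."""
    return sum(1 for value in table.values() if len(value) >= min_len)


data = {"a": "x", "b": "yyyy"}
store = SqliteDict()
store.update(data)  # update() comes for free from MutableMapping
print(count_long_values(data, 3), count_long_values(store, 3))
```

A unit test that runs each client against both the dict and the wrapper is what makes it safe to retire the dict afterwards.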
Re: Dictionary or Database—Please advise
On Fri, 26 Feb 2010 21:56:47 +0100, Patrick Sabin wrote: Shelve looks like an interesting option, but what might pose an issue is that I'm reading the data from a disk instead of memory. I didn't mention this in my original post, but I was hoping that by using a database it would be more memory efficient in storing data in RAM so I wouldn't have to read from (or swap to/from) disk. A database usually stores data on disk and not in RAM. However you could use sqlite with :memory:, so that it runs in RAM. The OP started this thread with: I have lots of data that I currently store in dictionaries. However, the memory requirements are becoming a problem. So I'm amused that he thinks the solution to running out of memory is to run the heavy overhead of a database entirely in memory, instead of lightweight dicts :) My advice is, first try to optimize your use of dicts. Are you holding onto large numbers of big dicts that you don't need? Are you making unnecessary copies? If so, fix your algorithms. If you can't optimize the dicts anymore, then move to a proper database. Don't worry about whether it is on-disk or not, databases tend to be pretty fast regardless, and it is better to run your application a little bit more slowly than to not run it at all because it ran out of memory halfway through processing the data. -- Steven -- http://mail.python.org/mailman/listinfo/python-list
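Before moving to a database, the advice above to optimize the use of dicts can be tested by measuring. The sketch uses the standard-library `tracemalloc` module (Python 3.4+) to compare the peak memory of the same records stored as dicts and as tuples. The record layout and the count of 50,000 are arbitrary assumptions, and exact byte counts vary by Python version.

```python
import tracemalloc


def peak_bytes(build):
    """Peak traced memory while build() runs and its result is still alive."""
    tracemalloc.start()
    result = build()
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    del result
    return peak


N = 50_000

# Same three fields per record, stored two different ways.
as_dicts = peak_bytes(
    lambda: [{"id": i, "x": i * 0.5, "y": i * 2.0} for i in range(N)])
as_tuples = peak_bytes(
    lambda: [(i, i * 0.5, i * 2.0) for i in range(N)])

print(as_dicts, as_tuples)
```

If many small per-record dicts turn out to dominate, switching them to tuples (or similar compact records) may remove the need for a database altogether.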
Re: Dictionary or Database—Please advise
Jeremy jlcon...@gmail.com writes: I have lots of data that I currently store in dictionaries. However, the memory requirements are becoming a problem. I am considering using a database of some sorts instead, but I have never used them before. Would a database be more memory efficient than a dictionary? What are you trying to do? Yes, a database would let you use disk instead of ram, but the slowdown might be intolerable depending on the access pattern. You really have to consider what the whole application is up to, and what the database is doing for you. It might be imposing a high overhead to create benefits (e.g. transaction isolation) that you're not actually using. Somehow I've managed to do a lot of programming on large datasets without ever using databases very much. I'm not sure that's a good thing; but anyway, a lot of times you can do better with externally sorted flat files, near-line search engines, etc. than you can with databases. If the size of your data is fixed or is not growing too fast, and it's larger than your computer's memory by just a moderate amount (e.g. you have a 2GB computer and 3GB of data), the simplest approach may be to just buy a few more ram modules and put them in your computer. -- http://mail.python.org/mailman/listinfo/python-list
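The "externally sorted flat files" alternative mentioned above can be sketched with the standard library alone: sort fixed-size chunks in memory, write each to its own file, then merge the sorted files lazily with `heapq.merge`. The function name `external_sort`, the chunk size and the sample data are assumptions for illustration; every input line is assumed to end with a newline.

```python
import heapq
import os
import tempfile


def external_sort(lines, out_path, chunk_size=1000):
    """Sort newline-terminated lines while holding at most chunk_size in RAM."""
    tmp_dir = tempfile.mkdtemp()
    chunk_paths = []
    chunk = []

    def flush():
        # Sort the current chunk in memory and spill it to its own file.
        if not chunk:
            return
        chunk.sort()
        path = os.path.join(tmp_dir, f"chunk{len(chunk_paths)}.txt")
        with open(path, "w") as f:
            f.writelines(chunk)
        chunk_paths.append(path)
        chunk.clear()

    for line in lines:
        chunk.append(line)
        if len(chunk) >= chunk_size:
            flush()
    flush()

    # heapq.merge reads one line at a time from each already-sorted file.
    files = [open(p) for p in chunk_paths]
    try:
        with open(out_path, "w") as out:
            out.writelines(heapq.merge(*files))
    finally:
        for f in files:
            f.close()


# Sample data: 5000 zero-padded numbers in scrambled order.
out_path = os.path.join(tempfile.mkdtemp(), "sorted.txt")
scrambled = (f"{(i * 7919) % 10007:05d}\n" for i in range(5000))
external_sort(scrambled, out_path, chunk_size=512)
```

Once the file is sorted, lookups and range scans can proceed sequentially or by binary search over the file, with no database involved. A very large input would produce many chunk files, so a real version would merge in several passes to stay under the open-file limit.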