Re: Dictionary or Database—Please advise

2010-02-26 Thread Benjamin Kaplan
On Fri, Feb 26, 2010 at 10:58 AM, Jeremy jlcon...@gmail.com wrote:

 I have lots of data that I currently store in dictionaries.  However,
 the memory requirements are becoming a problem.  I am considering
 using a database of some sorts instead, but I have never used them
 before.  Would a database be more memory efficient than a dictionary?
 I also need platform independence without having to install a database
 and Python interface on all the platforms I'll be using.  Is there
 something built-in to Python that will allow me to do this?

 Thanks,
 Jeremy



Python has SQLite 3 built-in and there are wrappers for MySQL and PostgreSQL
on all major platforms. Any one of them will work; databases have the
advantage that they're stored on disk, so you don't have all of the data in
memory simultaneously.
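
A minimal sketch of the SQLite suggestion, assuming Python 3 and string keys and values. The table name `kv` and the helpers `open_store`, `put`, and `get` are made up for illustration; they are not part of the sqlite3 module.

```python
import os
import sqlite3
import tempfile


def open_store(path):
    """Open (or create) a SQLite file holding one key/value table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS kv (key TEXT PRIMARY KEY, value TEXT)"
    )
    return conn


def put(conn, key, value):
    # INSERT OR REPLACE behaves like d[key] = value
    conn.execute(
        "INSERT OR REPLACE INTO kv (key, value) VALUES (?, ?)", (key, value)
    )
    conn.commit()


def get(conn, key, default=None):
    # Behaves like d.get(key, default); the lookup uses the primary-key index
    row = conn.execute("SELECT value FROM kv WHERE key = ?", (key,)).fetchone()
    return row[0] if row else default


# Usage with a throwaway file
path = os.path.join(tempfile.mkdtemp(), "data.sqlite")
conn = open_store(path)
put(conn, "alpha", "1")
print(get(conn, "alpha"))           # 1
print(get(conn, "missing", "n/a"))  # n/a
conn.close()
```

The data lives in the file at `path`, so it survives the process and does not all have to sit in RAM at once.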


 --
 http://mail.python.org/mailman/listinfo/python-list



Re: Dictionary or Database—Please advise

2010-02-26 Thread Chris Rebert
On Fri, Feb 26, 2010 at 7:58 AM, Jeremy jlcon...@gmail.com wrote:
 I have lots of data that I currently store in dictionaries.  However,
 the memory requirements are becoming a problem.  I am considering
 using a database of some sorts instead, but I have never used them
 before.  Would a database be more memory efficient than a dictionary?
 I also need platform independence without having to install a database
 and Python interface on all the platforms I'll be using.  Is there
 something built-in to Python that will allow me to do this?

If you won't be using the SQL features of the database, `shelve` might
be another option; from what I can grok, it sounds like a dictionary
stored mostly on disk rather than entirely in RAM (not 100% sure
though):
http://docs.python.org/library/shelve.html

It's in the std lib and supports several native dbm libraries for its
backend; one of them should almost always be present.
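
For what it's worth, a short sketch of what using `shelve` looks like, assuming Python 3 (where a shelf can be used in a `with` block). The key `run42` and the record contents are invented for illustration.

```python
import os
import shelve
import tempfile

path = os.path.join(tempfile.mkdtemp(), "data.shelf")

# Write: keys must be strings; values are pickled, so they can be
# ordinary Python objects
with shelve.open(path) as db:
    db["run42"] = {"energy": 1.5, "counts": [3, 1, 4]}

# Read back later, possibly from another process
with shelve.open(path) as db:
    record = db["run42"]
    print(record["counts"])   # [3, 1, 4]
    print("run43" in db)      # False
```

One thing to watch: with the default `writeback=False`, mutating a fetched value in place (say, appending to `record["counts"]`) is not saved unless the value is assigned back to its key.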

Cheers,
Chris
--
http://blog.rebertia.com


Re: Dictionary or Database—Please advise

2010-02-26 Thread lbolla
On Feb 26, 3:58 pm, Jeremy jlcon...@gmail.com wrote:
 I have lots of data that I currently store in dictionaries.  However,
 the memory requirements are becoming a problem.  I am considering
 using a database of some sorts instead, but I have never used them
 before.  Would a database be more memory efficient than a dictionary?
 I also need platform independence without having to install a database
 and Python interface on all the platforms I'll be using.  Is there
 something built-in to Python that will allow me to do this?

 Thanks,
 Jeremy

Maybe shelve would be enough for your needs?
http://docs.python.org/library/shelve.html


Re: Dictionary or Database—Please advise

2010-02-26 Thread Roy Smith
In article 891a98fa-c398-455a-981f-bf72af772...@s36g2000prh.googlegroups.com,
 Jeremy jlcon...@gmail.com wrote:

 I have lots of data that I currently store in dictionaries.  However,
 the memory requirements are becoming a problem.  I am considering
 using a database of some sorts instead, but I have never used them
 before.  Would a database be more memory efficient than a dictionary?
 I also need platform independence without having to install a database
 and Python interface on all the platforms I'll be using.  Is there
 something built-in to Python that will allow me to do this?
 
 Thanks,
 Jeremy

This is a very vague question, so it'll get a vague answer :-)

If you have so much data that you're running into memory problems, then 
yes, storing the data externally in a disk-resident database seems like a 
reasonable idea.

Once you get into databases, platform independence will be an issue.  There 
are many databases out there to pick from.  If you want something which 
will work on a lot of platforms, a reasonable place to start looking is 
MySQL.  It's free, runs on lots of platforms, has good Python support, and 
there are lots of people on the net who know it and are willing to give help 
and advice.

Databases have a bit of a learning curve.  If you've never done any 
database work, don't expect to download MySQL (or any other database) this 
afternoon and be up and running by tomorrow.

Whatever database you pick, you're almost certainly going to end up having 
to install it wherever you install your application.  There's no such thing 
as a universally available database that you can expect to be available 
everywhere.

Have fun!


Re: Dictionary or Database—Please advise

2010-02-26 Thread Jeremy
On Feb 26, 9:29 am, Chris Rebert c...@rebertia.com wrote:
 On Fri, Feb 26, 2010 at 7:58 AM, Jeremy jlcon...@gmail.com wrote:
  I have lots of data that I currently store in dictionaries.  However,
  the memory requirements are becoming a problem.  I am considering
  using a database of some sorts instead, but I have never used them
  before.  Would a database be more memory efficient than a dictionary?
  I also need platform independence without having to install a database
  and Python interface on all the platforms I'll be using.  Is there
  something built-in to Python that will allow me to do this?

 If you won't be using the SQL features of the database, `shelve` might
 be another option; from what I can grok, it sounds like a dictionary
 stored mostly on disk rather than entirely in RAM (not 100% sure
 though): http://docs.python.org/library/shelve.html

 It's in the std lib and supports several native dbm libraries for its
 backend; one of them should almost always be present.

 Cheers,
 Chris
 --http://blog.rebertia.com

Shelve looks like an interesting option, but what might pose an issue
is that I'm reading the data from a disk instead of memory.  I didn't
mention this in my original post, but I was hoping that by using a
database it would be more memory efficient in storing data in RAM so I
wouldn't have to read from (or swap to/from) disk.  Would using the
shelve package make reading/writing data from disk faster since it is
in a binary format?

Jeremy


Re: Dictionary or Database—Please advise

2010-02-26 Thread mk

Jeremy wrote:

I have lots of data that I currently store in dictionaries.  However,
the memory requirements are becoming a problem.  I am considering
using a database of some sorts instead, but I have never used them
before.  Would a database be more memory efficient than a dictionary?
I also need platform independence without having to install a database
and Python interface on all the platforms I'll be using.  Is there
something built-in to Python that will allow me to do this?


Since you use dictionaries, I guess that a simple key:value store will 
do?


If so, the bsddb support built into Python will do nicely.

bsddb is multiplatform, although I have not personally tested whether a 
binary db created on one platform will be usable on another. You'd have 
to check this.


Caveat: from what some people say, I gather that the binary format tends 
to change between bsddb versions.
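
A hedged sketch of the same key:value idea. Note the swap: bsddb was dropped from the standard library in Python 3, so this uses the std lib `dbm` module instead, which picks whichever dbm backend is available. The file name and keys are invented.

```python
import dbm
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "store")

# "c" opens the database for reading and writing, creating it if needed
with dbm.open(path, "c") as db:
    db[b"spam"] = b"eggs"     # keys and values are stored as bytes
    db[b"answer"] = b"42"

# "r" opens the existing database read-only
with dbm.open(path, "r") as db:
    value = db[b"answer"]
    print(value)              # b'42'
    print(b"missing" in db)   # False
```

Unlike shelve, nothing is pickled here: anything richer than bytes has to be encoded by the caller.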


There's also the ultra-cool-and-modern Tokyo Cabinet key:value store, with 
Python bindings:


http://pypi.python.org/pypi/pytc/

I didn't test it, though.


Regards,
mk



Re: Dictionary or Database—Please advise

2010-02-26 Thread mk

Jeremy wrote:


Shelve looks like an interesting option, but what might pose an issue
is that I'm reading the data from a disk instead of memory.  I didn't
mention this in my original post, but I was hoping that by using a
database it would be more memory efficient in storing data in RAM so I
wouldn't have to read from (or swap to/from) disk.  Would using the
shelve package make reading/writing data from disk faster since it is
in a binary format?


Read the docs:

class shelve.BsdDbShelf(dict[, protocol=None[, writeback=False]])
A subclass of Shelf which exposes first(), next(), previous(), 
last() and set_location() which are available in the bsddb module but 
not in other database modules. The dict object passed to the constructor 
must support those methods. This is generally accomplished by calling 
one of bsddb.hashopen(), bsddb.btopen() or bsddb.rnopen(). The optional 
protocol and writeback parameters have the same interpretation as for 
the Shelf class.


Apparently shelve gives you the option of using bsddb internally, 
which is good news: bsddb can store its data as a B-tree, which is highly 
efficient for finding keys. I would recommend bsddb.btopen(), as it 
creates a B-tree DB (perhaps other implementations, like anydbm or a hash 
db, are good as well, but I personally haven't tested them).


I can't speak for the Berkeley DB implementation, but in general a B-tree 
lookup has O(log2 n) complexity, which roughly means that if you need to 
find a particular key in a db of 1 million keys, you'll need at most 
~20 disk accesses, and usually far fewer, because each B-tree node holds 
many keys and the nodes visited during a search may share disk sectors 
or already be cached. So yes, it's highly efficient.
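
The arithmetic behind that estimate can be checked with a few lines. The helper `levels` and the fan-out of 100 keys per node are assumptions picked for illustration, not figures from Berkeley DB.

```python
def levels(n_keys, fanout):
    """Smallest h such that fanout**h >= n_keys, i.e. the depth of a
    full search tree with that many children per node."""
    h, capacity = 0, 1
    while capacity < n_keys:
        capacity *= fanout
        h += 1
    return h


# Binary search tree: one comparison (worst case, one disk access) per level
print(levels(1_000_000, 2))     # 20

# A B-tree node holds many keys, so the tree is much shallower
print(levels(1_000_000, 100))   # 3
```

So ~20 is the figure for a binary tree; with wide nodes, a million keys can be reached in a handful of node reads.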


Having said that, remember that disk is many orders of magnitude slower 
than RAM, so it's no free lunch. Nothing will beat a memory-based data 
structure when it comes to speed (well, new flash or hybrid disks could 
perhaps improve significantly on current mainstream mechanical-platter 
disks; there are some hyper-fast storage hardware companies out there, 
although they tend to charge an arm and a leg for their stuff for now).


Caveat: Berkeley DB is dual-licensed -- if you're using it for 
commercial work, you might need to buy a license for it. I have no real 
experience with this; if someone here does, perhaps they can shed some 
light on it?


Regards,
mk





Re: Dictionary or Database—Please advise

2010-02-26 Thread CM
On Feb 26, 10:58 am, Jeremy jlcon...@gmail.com wrote:
 I have lots of data

How much is lots?

 that I currently store in dictionaries.  However,
 the memory requirements are becoming a problem.  I am considering
 using a database of some sorts instead, but I have never used them
 before.  Would a database be more memory efficient than a dictionary?

What do you mean by more efficient?

 I also need platform independence without having to install a database
 and Python interface on all the platforms I'll be using.  Is there
 something built-in to Python that will allow me to do this?

The SQLite database engine is built into Python 2.5 and up.  I have
heard on this list that there may be problems with it with Python 2.6
and up on Linux, but I've stayed with 2.5 and it works fine for me on
WinXP, Vista, and Linux.

You can use it as a disk-stored single database file, or an
in-memory-only database.  The SQLite website (http://www.sqlite.org/)
claims it is the most widely deployed SQL database engine in the world,
for what that's worth.

Have a look at this:
http://docs.python.org/library/sqlite3.html
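
A small sketch of the two modes, assuming Python 3. The `items` table and the `make_db` helper are invented for illustration; only `sqlite3.connect` and the special name `:memory:` come from the sqlite3 module.

```python
import os
import sqlite3
import tempfile


def make_db(target):
    """Create a tiny example table in the database named by target."""
    conn = sqlite3.connect(target)
    conn.execute("CREATE TABLE IF NOT EXISTS items (name TEXT, qty INTEGER)")
    conn.execute("INSERT INTO items VALUES (?, ?)", ("widget", 3))
    conn.commit()
    return conn


# In-memory: nothing touches the disk, and the data is gone once the
# connection is closed
mem = make_db(":memory:")
print(mem.execute("SELECT qty FROM items").fetchone()[0])   # 3
mem.close()

# On disk: a single file that outlives the connection
path = os.path.join(tempfile.mkdtemp(), "items.sqlite")
make_db(path).close()
disk = sqlite3.connect(path)
print(disk.execute("SELECT COUNT(*) FROM items").fetchone()[0])   # 1
disk.close()
```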

Che




 Thanks,
 Jeremy



Re: Dictionary or Database—Please advise

2010-02-26 Thread Patrick Sabin



Shelve looks like an interesting option, but what might pose an issue
is that I'm reading the data from a disk instead of memory.  I didn't
mention this in my original post, but I was hoping that by using a
database it would be more memory efficient in storing data in RAM so I
wouldn't have to read from (or swap to/from) disk.


A database usually stores data on disk and not in RAM. However you could 
use sqlite with :memory:, so that it runs in RAM.



Would using the
shelve package make reading/writing data from disk faster since it is
in a binary format?

Faster than what? Shelve uses caching, so it is likely to be faster than 
a self-made solution. However, accessing disk is much slower than 
accessing RAM.



Jeremy

- Patrick


Re: Dictionary or Database—Please advise

2010-02-26 Thread Aahz
In article mailman.296.1267217819.4577.python-l...@python.org,
Patrick Sabin  patrick.just4...@gmail.com wrote:

A database usually stores data on disk and not in RAM. However you could 
use sqlite with :memory:, so that it runs in RAM.

The OP wants transparent caching, so :memory: wouldn't work.
-- 
Aahz (a...@pythoncraft.com)   * http://www.pythoncraft.com/

Many customs in this life persist because they ease friction and promote
productivity as a result of universal agreement, and whether they are
precisely the optimal choices is much less important. --Henry Spencer


Re: Dictionary or Database—Please advise

2010-02-26 Thread Luis M . González
On Feb 26, 12:58 pm, Jeremy jlcon...@gmail.com wrote:
 I have lots of data that I currently store in dictionaries.  However,
 the memory requirements are becoming a problem.  I am considering
 using a database of some sorts instead, but I have never used them
 before.  Would a database be more memory efficient than a dictionary?
 I also need platform independence without having to install a database
 and Python interface on all the platforms I'll be using.  Is there
 something built-in to Python that will allow me to do this?

 Thanks,
 Jeremy

For small quantities of data, dicts are ok, but if your app
continually handles (and adds) large quantities of data, you should use
a database.
You can start with SQLite (which is bundled with Python); it's much
easier to use than any other relational database out there.
You don't need to install anything, since it's not a database server.
You simply save your databases as files.
If you don't know SQL (the standard language used to query databases),
I recommend this online tutorial: http://www.sqlcourse.com/

Good luck!
Luis


Re: Dictionary or Database—Please advise

2010-02-26 Thread Aahz
In article roy-c4f98b.11395126022...@70-1-84-166.pools.spcsdns.net,
Roy Smith  r...@panix.com wrote:

Whatever database you pick, you're almost certainly going to end up having 
to install it wherever you install your application.  There's no such thing 
as a universally available database that you can expect to be available 
everywhere.

...unless you use SQLite.
-- 
Aahz (a...@pythoncraft.com)   * http://www.pythoncraft.com/



Re: Dictionary or Database—Please advise

2010-02-26 Thread Aahz
In article 891a98fa-c398-455a-981f-bf72af772...@s36g2000prh.googlegroups.com,
Jeremy  jlcon...@gmail.com wrote:

I have lots of data that I currently store in dictionaries.  However,
the memory requirements are becoming a problem.  I am considering
using a database of some sorts instead, but I have never used them
before.  Would a database be more memory efficient than a dictionary?
I also need platform independence without having to install a database
and Python interface on all the platforms I'll be using.  Is there
something built-in to Python that will allow me to do this?

If you're serious about needing both a disk-based backing store *and*
getting maximum use/performance from your RAM, you probably will need to
combine memcached with one of the other solutions offered.

But given your other requirement of not installing a DB, your best option
will certainly be SQLite.  You can use :memory: databases, but that will
require shuttling data to disk manually.  I suggest that you start with
plain SQLite and only worry if you prove (repeat, PROVE) that DB is your
bottleneck.
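
One way the "shuttling data to disk manually" step might look, assuming Python 3.7 or later, where sqlite3 connections have a `backup()` method. The helper names `save_to_disk` and `load_from_disk` and the table `t` are hypothetical.

```python
import os
import sqlite3
import tempfile


def save_to_disk(mem_conn, path):
    """Copy an in-memory database into a file."""
    disk = sqlite3.connect(path)
    try:
        mem_conn.backup(disk)
    finally:
        disk.close()


def load_from_disk(path):
    """Copy a database file into a fresh in-memory connection."""
    disk = sqlite3.connect(path)
    mem = sqlite3.connect(":memory:")
    try:
        disk.backup(mem)
    finally:
        disk.close()
    return mem


# Work in RAM, snapshot to disk, then restore
mem = sqlite3.connect(":memory:")
mem.execute("CREATE TABLE t (k TEXT, v TEXT)")
mem.execute("INSERT INTO t VALUES ('a', 'b')")
mem.commit()

path = os.path.join(tempfile.mkdtemp(), "snapshot.sqlite")
save_to_disk(mem, path)
mem.close()

restored = load_from_disk(path)
print(restored.execute("SELECT v FROM t WHERE k = 'a'").fetchone()[0])   # b
```

This copies the whole database each time, which is part of why starting with a plain on-disk SQLite file is the simpler route.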
-- 
Aahz (a...@pythoncraft.com)   * http://www.pythoncraft.com/



Re: Dictionary or Database—Please advise

2010-02-26 Thread Phlip
On Feb 26, 7:58 am, Jeremy jlcon...@gmail.com wrote:
 I have lots of data that I currently store in dictionaries.  However,
 the memory requirements are becoming a problem.  I am considering
 using a database of some sorts instead, but I have never used them
 before.  Would a database be more memory efficient than a dictionary?
 I also need platform independence without having to install a database
 and Python interface on all the platforms I'll be using.  Is there
 something built-in to Python that will allow me to do this?

If you had wall-to-wall unit tests, you could swap the database in
incrementally (Deprecation Refactor).

You would just add one database table, switch one client of one
dictionary to use that table, pass all the tests, and integrate.
Repeat until nobody uses the dicts, then trivially retire them.
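
A rough sketch of how one such swap could be made nearly invisible to client code: wrap a table in a class that behaves like a dict, so a client written against a dict keeps passing its tests. `TableDict` and `count_long_values` are invented names, string keys and values are assumed, and this is only one possible way to do the refactor.

```python
import sqlite3
from collections.abc import MutableMapping


class TableDict(MutableMapping):
    """Dict-like view of one SQLite table (hypothetical helper)."""

    def __init__(self, conn, table):
        # table is an identifier, not data: it must come from trusted code
        self._conn = conn
        self._table = table
        conn.execute(
            f"CREATE TABLE IF NOT EXISTS {table} "
            "(key TEXT PRIMARY KEY, value TEXT)"
        )

    def __getitem__(self, key):
        row = self._conn.execute(
            f"SELECT value FROM {self._table} WHERE key = ?", (key,)
        ).fetchone()
        if row is None:
            raise KeyError(key)
        return row[0]

    def __setitem__(self, key, value):
        self._conn.execute(
            f"INSERT OR REPLACE INTO {self._table} (key, value) VALUES (?, ?)",
            (key, value),
        )
        self._conn.commit()

    def __delitem__(self, key):
        cur = self._conn.execute(
            f"DELETE FROM {self._table} WHERE key = ?", (key,)
        )
        self._conn.commit()
        if cur.rowcount == 0:
            raise KeyError(key)

    def __iter__(self):
        rows = self._conn.execute(f"SELECT key FROM {self._table}").fetchall()
        return (row[0] for row in rows)

    def __len__(self):
        return self._conn.execute(
            f"SELECT COUNT(*) FROM {self._table}"
        ).fetchone()[0]


def count_long_values(mapping, min_len=3):
    """Example client: works with a plain dict or a TableDict alike."""
    return sum(1 for v in mapping.values() if len(v) >= min_len)


data = {"a": "xx", "b": "yyyy"}
backed = TableDict(sqlite3.connect(":memory:"), "words")
backed.update(data)
print(count_long_values(data) == count_long_values(backed))   # True
```

The same test suite can then run against both the dict and the table-backed version while clients are moved over one at a time.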

If you don't have unit tests, then you have a bigger problem than
memory requirements. (You can throw $50 hardware at that!)

--
  Phlip
  http://penbird.tumblr.com/


Re: Dictionary or Database—Please advise

2010-02-26 Thread Steven D'Aprano
On Fri, 26 Feb 2010 21:56:47 +0100, Patrick Sabin wrote:

 Shelve looks like an interesting option, but what might pose an issue
 is that I'm reading the data from a disk instead of memory.  I didn't
 mention this in my original post, but I was hoping that by using a
 database it would be more memory efficient in storing data in RAM so I
 wouldn't have to read from (or swap to/from) disk.
 
 A database usually stores data on disk and not in RAM. However you could
 use sqlite with :memory:, so that it runs in RAM.

The OP started this thread with:

I have lots of data that I currently store in dictionaries.  However,
the memory requirements are becoming a problem.

So I'm amused that he thinks the solution to running out of memory is to 
run the heavy overhead of a database entirely in memory, instead of 
lightweight dicts :)

My advice is, first try to optimize your use of dicts. Are you holding 
onto large numbers of big dicts that you don't need? Are you making 
unnecessary copies? If so, fix your algorithms.
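
As one illustration of trimming dict overhead (an assumption about where the memory might be going, not something stated in the thread): if the data is a large collection of small dict records, tuples or `__slots__` classes hold the same fields in less space. The `Point` class and field names are made up.

```python
import sys

# One record held three ways
as_dict = {"x": 1.0, "y": 2.0, "z": 3.0}
as_tuple = (1.0, 2.0, 3.0)


class Point:
    __slots__ = ("x", "y", "z")   # no per-instance __dict__

    def __init__(self, x, y, z):
        self.x, self.y, self.z = x, y, z


as_slots = Point(1.0, 2.0, 3.0)

# getsizeof reports the container only, not the floats it refers to,
# and the exact numbers vary by Python version and platform
for name, obj in [("dict", as_dict), ("tuple", as_tuple), ("slots", as_slots)]:
    print(name, sys.getsizeof(obj))
```

Multiplied over millions of records the difference can be substantial, but it is worth measuring on the real data before restructuring anything.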

If you can't optimize the dicts anymore, then move to a proper database. 
Don't worry about whether it is on-disk or not; databases tend to be 
pretty fast regardless, and it is better to run your application a little 
bit more slowly than to not run it at all because it ran out of memory 
halfway through processing the data.


-- 
Steven


Re: Dictionary or Database—Please advise

2010-02-26 Thread Paul Rubin
Jeremy jlcon...@gmail.com writes:
 I have lots of data that I currently store in dictionaries.  However,
 the memory requirements are becoming a problem.  I am considering
 using a database of some sorts instead, but I have never used them
 before.  Would a database be more memory efficient than a dictionary?

What are you trying to do?  Yes, a database would let you use disk
instead of RAM, but the slowdown might be intolerable depending on the
access pattern.  You really have to consider what the whole application
is up to, and what the database is doing for you.  It might be imposing
a high overhead to create benefits (e.g. transaction isolation) that
you're not actually using.

Somehow I've managed to do a lot of programming on large datasets
without ever using databases very much.  I'm not sure that's a good
thing; but anyway, a lot of times you can do better with externally
sorted flat files, near-line search engines, etc. than you can with
databases.
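
For the flat-file route, a minimal sketch of an external sort, assuming Python 3 and input lines that each end in a newline. The function name `external_sort` and the chunk size are illustrative choices; only one chunk is held in memory at a time.

```python
import heapq
import os
import tempfile


def external_sort(lines, out_path, chunk_size=100_000):
    """Sort an iterable of newline-terminated lines using bounded memory."""
    tmpdir = tempfile.mkdtemp()
    chunk_paths = []
    chunk = []

    def flush():
        # Sort the current chunk in RAM and write it to its own file
        if not chunk:
            return
        chunk.sort()
        path = os.path.join(tmpdir, f"chunk{len(chunk_paths)}.txt")
        with open(path, "w") as f:
            f.writelines(chunk)
        chunk_paths.append(path)
        chunk.clear()

    for line in lines:
        chunk.append(line)
        if len(chunk) >= chunk_size:
            flush()
    flush()

    # heapq.merge pulls lazily from each sorted chunk file, so the merge
    # never loads a whole chunk back into memory
    files = [open(p) for p in chunk_paths]
    try:
        with open(out_path, "w") as out:
            out.writelines(heapq.merge(*files))
    finally:
        for f in files:
            f.close()


# Usage: a tiny input with a tiny chunk size, to force several chunks
out_path = os.path.join(tempfile.mkdtemp(), "sorted.txt")
external_sort(["pear\n", "apple\n", "fig\n", "kiwi\n", "date\n"],
              out_path, chunk_size=2)
with open(out_path) as f:
    print(f.read().split())   # ['apple', 'date', 'fig', 'kiwi', 'pear']
```

Once the file is sorted, lookups and joins against other sorted files can be done with sequential passes instead of holding everything in a dict. The temporary chunk files are left behind in this sketch.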

If the size of your data is fixed or is not growing too fast, and it's
larger than your computer's memory by just a moderate amount (e.g. you
have a 2GB computer and 3GB of data), the simplest approach may be to
just buy a few more RAM modules and put them in your computer.