On 01 Apr 2014, at 4:20, Jeff Janes <jeff.ja...@gmail.com> wrote:

> On Sunday, March 30, 2014, Stefan Keller <sfkel...@gmail.com> wrote:
> There seem to be two main things which make in-memory dbs special: 
> 1. Index: Having all data in memory there is no need for a default index. A 
> full-table scan "suddenly" becomes the default.
> 
> Surely not.  I would say that full table scans are *already* the default, 
> deviated from only if it thinks an index seems to be better.  If you don't 
> make an index, it can't seem to be better.  And I don't know of any 
> higher-level in-memory language which fails to provide a way to do efficient 
> searching into an in-memory structure, usually in the form of hash tables or 
> balanced trees.  If "let's seq scan everything as long as it is already in 
> memory" is a good idea, why would Perl, Java, Python, etc. (not to mention C 
> libraries and the source code of PostgreSQL itself) provide ways to do 
> efficient searches in memory?
> 
> The caveat here is you can't make certain constraints without an index.  In 
> theory you *could* have an unique constraint without an index to support it, 
> but if it were implemented you would probably rapidly learn that you don't 
> actually want to do that.

That’s what I thought initially reading that line as well, but that’s not 
necessarily true for an in-memory database.

The keyword here is “default index”. The reasoning is probably along the lines 
that if all your data is in-memory, then keeping it all sorted on the primary 
column(s) is relatively cheap (compared to doing so on disk). You could even 
split off the primary column(s) from the rest of the records and link back to 
them using pointers, which makes it easier to keep them in a (somewhat) 
balanced btree.

When you do that, the data effectively becomes the primary key index and would 
theoretically also be usable as the primary key constraint. So you _do_ have an 
index, but it’s not a separate one; it’s part of your data. That frees up 
memory that you do not need to preserve for an index, which is probably rather 
a gain for an in-memory database.

Apparently this works for HANA, but for an MVCC database like Postgres that’s a 
little more involved.

Of course, with databases like that you want them replicated on different 
hardware and kernels to reduce data-loss risks. And that brings us closer to 
what ISTR Stonebreaker is doing these days.

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Reply via email to