On May 16, 2011, at 8:47 AM, Merlin Moncure wrote:
> On Sat, May 14, 2011 at 5:10 AM, Stefan Keller <sfkel...@gmail.com> wrote:
>> Hi,
>> 
>> I am conducting a benchmark to compare KVP table vs. hstore and got
>> bad hstore performance results when the no. of records is greater than
>> about 500'000.
>> 
>> CREATE TABLE kvp ( id SERIAL PRIMARY KEY, key text NOT NULL, value text );
>> -- with index on key
>> CREATE TABLE myhstore ( id SERIAL PRIMARY KEY, obj hstore NOT NULL );
>> -- with GIST index on obj
>> 
>> Does anyone have experience with that?
> 
> hstore is not really designed for large-ish sets like that.

And KVP is? ;)

IIRC hstore ends up just storing everything as text, with pointers to know 
where things start and end. There's no real indexing inside hstore, so 
basically the only thing it can do is scan the entire hstore.

That said, I would strongly reconsider using KVP for anything except the most 
trivial of data sets. It is *extremely* inefficient. Do you really have 
absolutely no idea what *any* of your keys will be? Even if you need to support 
a certain amount of non-deterministic stuff, I would put everything you 
possibly can into real fields and only use KVP or hstore for things that you 
really didn't anticipate.

Keep in mind that for every *value*, your overhead is 24 bytes for the heap 
header, 2+ varlena bytes in the heap, plus the length of the key. In the index 
you're looking at 6+ bytes of overhead, 1+ byte for varlena, plus the length of 
the key. The PK will cost you an additional 16-24 bytes, depending on 
alignment. So that's a *minimum* of ~50 bytes per value, and realistically the 
overhead will be closer to 65-70 bytes, *per value*. Unless your values are 
decent-sized strings, the overhead is going to be many times larger than the 
actual data!
--
Jim C. Nasby, Database Architect                   j...@nasby.net
512.569.9461 (cell)                         http://jim.nasby.net



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Reply via email to