Re: [PERFORM] Indexes for hashes

Torsten Zuehlsdorff Wed, 15 Jun 2016 05:46:49 -0700

Hello Ivan,

I have an application which stores large amounts of hex-encoded hash
strings (nearly 100 GB of them), which means:


  * The number of distinct characters (alphabet) is limited to 16
  * Each string is of the same length, 64 characters
  * The strings are essentially random
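As a quick check of the arithmetic behind these properties: with a 16-symbol alphabet each character carries 4 bits, so a 64-character hex string encodes 256 bits, i.e. 32 bytes of raw hash. The sketch below assumes SHA-256 as the hash (the message does not say which hash is used); any 256-bit hash rendered as hex has the same shape.

```python
import hashlib
import math

# Each hex digit is one of 16 symbols, so it carries log2(16) = 4 bits.
BITS_PER_HEX_CHAR = int(math.log2(16))

# Assumption: the stored values are SHA-256 digests rendered as hex text.
digest_hex = hashlib.sha256(b"example row").hexdigest()

hex_len = len(digest_hex)                 # characters in the text form
raw_bytes = bytes.fromhex(digest_hex)     # the same value as raw bytes
total_bits = hex_len * BITS_PER_HEX_CHAR  # information content in bits

print(hex_len, total_bits, len(raw_bytes))  # 64 256 32
```

The 32-byte raw form is exactly half the length of the 64-character text form.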

Creating a B-Tree index on this results in the index size being larger
than the table itself, and there are disk space constraints.

I've found the SP-GIST radix tree index, and thought it could be a good
match for the data because of the above constraints. An attempt to
create it (as in CREATE INDEX ON t USING spgist(field_name)) apparently
takes more than 12 hours (while a similar B-tree index takes a few hours
at most), so I've interrupted it because "it probably is not going to
finish in a reasonable time". Some slides I found on the SP-GiST index
suggest that both its build time and its size make it unsuitable for this
purpose.
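A radix tree saves space by storing shared prefixes once, and the "essentially random" property limits how much there is to share. The sketch below uses SHA-256 digests of the integers 0..4095 as a hypothetical stand-in for the real data and measures how many leading characters each key shares with its closest neighbour. For n random hex keys this is on the order of log16(n) characters, so most of each 64-character string still has to be stored per key. It is a rough illustration of prefix sharing only, not a model of how PostgreSQL's SP-GiST text operator class lays out pages.

```python
import hashlib
import math


def common_prefix_len(a, b):
    """Number of leading characters two strings share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


# Hypothetical sample: 4096 distinct SHA-256 hex strings standing in for the data.
keys = sorted(hashlib.sha256(str(i).encode()).hexdigest() for i in range(4096))

# In lexicographic order, a key's longest shared prefix with any other key is
# reached at one of its two neighbours, so checking neighbours is enough.
shared = []
for i, key in enumerate(keys):
    best = 0
    if i > 0:
        best = max(best, common_prefix_len(key, keys[i - 1]))
    if i + 1 < len(keys):
        best = max(best, common_prefix_len(key, keys[i + 1]))
    shared.append(best)

avg_shared = sum(shared) / len(shared)
estimate = math.log(len(keys), 16)  # log16(4096) = 3

print(f"average shared prefix: {avg_shared:.2f} of 64 chars")
print(f"log16(n) estimate:     {estimate:.2f}")
```

Growing n by a factor of 16 adds only about one more shared character, so the unshared remainder per key stays large.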

My question is: what would be the most size-efficient index for this
situation?


It depends on what you want to query. What about a BRIN index:
https://www.postgresql.org/docs/9.5/static/brin-intro.html

This will result in a very small index, but depending on what you want to query, it may or may not fit your needs.
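To see why the answer depends on the query and the data, here is a toy model of the BRIN idea: the index keeps only a minimum and maximum per range of consecutive rows, and a lookup must read every range whose [min, max] could contain the value. This is a simplified Python simulation, not PostgreSQL code; the 32-rows-per-range setting, the 1,000-row table and the helper names are all hypothetical (real BRIN summarizes ranges of pages). In PostgreSQL itself the index would be created with CREATE INDEX ON t USING brin(field_name).

```python
import hashlib

ROWS_PER_RANGE = 32  # hypothetical; real BRIN summarizes ranges of pages, not rows


def make_hashes(n):
    """Deterministic stand-in for the table: n distinct SHA-256 hex strings."""
    return [hashlib.sha256(str(i).encode()).hexdigest() for i in range(n)]


def build_brin(rows, rows_per_range=ROWS_PER_RANGE):
    """Summarize each run of consecutive rows by its (min, max) pair."""
    summary = []
    for start in range(0, len(rows), rows_per_range):
        chunk = rows[start:start + rows_per_range]
        summary.append((min(chunk), max(chunk)))
    return summary


def ranges_to_scan(summary, value):
    """Count ranges whose [min, max] could contain value (these must be read)."""
    return sum(1 for lo, hi in summary if lo <= value <= hi)


rows = make_hashes(1000)
target = rows[500]

# Insertion order: hash values arrive in effectively random order.
random_index = build_brin(rows)
# Same data, but physically ordered by the hash value.
sorted_index = build_brin(sorted(rows))

random_hits = ranges_to_scan(random_index, target)
sorted_hits = ranges_to_scan(sorted_index, target)

print(f"ranges in index: {len(random_index)}")
print(f"random order: scan {random_hits} ranges")
print(f"sorted order: scan {sorted_hits} range(s)")
```

The index is tiny in both cases (32 min/max pairs for 1,000 rows); what changes is how much of the table an equality lookup must still read. With random physical order most ranges overlap the target, while with rows sorted by hash exactly one range matches.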


Greetings,
Torsten


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
