Hello,

First post on the list, please be indulgent :)

I'm working on a fairly large DB (for me) - 23 GB of data, 42 GB of indexes, 
100 M rows in my biggest table - and trying to find possible optimisations of 
the storage requirements... and hopefully to keep more of the indexes in RAM.

This DB is actually an import of "events" over a sliding window of 30 days. I 
have no control over how the events are generated, but I do have control over 
the import script.

Many of the columns have the following characteristics:
- VARCHAR
- low cardinality (typically < 100 distinct values)
- new values can appear "at any time" when importing data from external 
systems, although I don't expect the cardinality to grow significantly.

The naive storage of those columns is quite demanding when compared to the 
amount of information they carry, and I'm looking at solutions to optimise 
this. Obviously I could:

- use ENUMs to reduce the storage to 4 bytes per value on disk (cf. 
https://www.postgresql.org/docs/current/static/datatype-enum.html), assuming I 
manage the ENUMs by adding new values when needed. This would probably shrink 
the indexes significantly as well. It would also change comparison semantics, 
since ENUM values sort by declaration order rather than alphabetically. See 
the first sketch after this list.

- normalize the DB by adding a lookup table and a FOREIGN KEY. That table 
could be maintained via triggers for instance, at a cost in complexity 
(triggers, changes to applications accessing the DB, ...). See the second 
sketch after this list.
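
For the first option, a minimal sketch of what I have in mind (the type and 
column names are invented for the example):

    -- Create the type from the values seen so far
    CREATE TYPE event_status AS ENUM ('new', 'pending', 'done');

    CREATE TABLE events (
        id     bigserial PRIMARY KEY,
        status event_status NOT NULL  -- 4 bytes on disk
    );

    -- When the import script meets an unknown value
    -- (on older versions this cannot run inside a transaction block):
    ALTER TYPE event_status ADD VALUE 'cancelled';

    -- Caveat: ENUMs compare by declaration order, not alphabetically:
    SELECT 'pending'::event_status < 'done'::event_status;  -- true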
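
For the second option, a sketch of the normalized variant; instead of triggers 
I'm showing a helper function the import script could call, which does the 
same bookkeeping (ON CONFLICT needs PostgreSQL 9.5+, and again all names are 
invented):

    CREATE TABLE status_lookup (
        id    smallserial PRIMARY KEY,
        label varchar NOT NULL UNIQUE
    );

    CREATE TABLE events (
        id        bigserial PRIMARY KEY,
        status_id smallint NOT NULL REFERENCES status_lookup (id)  -- 2 bytes
    );

    -- Return the id for a label, creating it on first sight.
    -- Not bulletproof under concurrent imports, but fine for one script.
    CREATE FUNCTION get_status_id(p_label varchar) RETURNS smallint
    LANGUAGE sql AS $$
        WITH ins AS (
            INSERT INTO status_lookup (label) VALUES (p_label)
            ON CONFLICT (label) DO NOTHING
            RETURNING id
        )
        SELECT id FROM ins
        UNION ALL
        SELECT id FROM status_lookup WHERE label = p_label
        LIMIT 1
    $$;

    INSERT INTO events (status_id) VALUES (get_status_id('pending'));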

I would find it much more elegant to use a datatype where my VARCHARs would be 
internally stored as a SMALLINT (or similar) and indexed as a SMALLINT, while 
still being externally visible as a VARCHAR (comparisons, ORM bindings, ...).

I didn't find any datatype which would work like this :(
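
The closest approximation I can think of is hiding the join of the normalized 
variant behind a view, but that's not a real datatype - applications and ORMs 
would have to target the view, and it would need an INSTEAD OF trigger to be 
writable (reusing the invented names from the sketch above):

    CREATE VIEW events_v AS
    SELECT e.id, s.label AS status  -- externally a varchar again
    FROM events e
    JOIN status_lookup s ON s.id = e.status_id;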

Does anyone know of such a solution?

Thank you,

Olivier
