> Personally, my biggest gripe about the way we do compression is that
> it's easy to detoast the same object lots of times. More generally,
> our in-memory representation of user data values is pretty much a
> mirror of our on-disk representation, even when that leads to excess
> conversions. Beyond what we do for TOAST, there's stuff like numeric
> where we not only detoast but then post-process the results into yet
> another internal form before performing any calculations - and then of
> course we have to convert back before returning from the calculation
> functions. And for things like XML, JSON, and hstore we have to
> repeatedly parse the string, every time someone wants to do anything
> with it. Of course, solving this is a very hard problem, and not
> solving it isn't a reason not to have more compression options - but
> more compression options will not solve the problems that I personally
> have in this area, by and large.
>
At the risk of saying something totally obvious and stupid, as I haven't
looked at the actual representation, this sounds like a memoisation
problem. In OCaml terms:
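
    (* A value is either still in its on-disk form or already converted. *)
    type 'a rep =
      | On_disk_rep of bytes
      | In_memory_rep of 'a

    type 'a t = 'a rep ref

    (* Convert on first access and cache the result in place. *)
    let get_mem_rep t converter =
      match !t with
      | On_disk_rep seq ->
        let res = converter seq in
        t := In_memory_rep res;
        res
      | In_memory_rep x -> x
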
... (if you need the other direction, that's straightforward too) ...
Translating this into C is relatively straightforward if you have the
luxury of a fresh start and don't have to be super efficient:

    typedef enum { ON_DISK_REP, IN_MEMORY_REP } rep_kind_t;

    typedef struct {
        rep_kind_t rep_kind;
        union {
            char *on_disk;
            void *in_memory;
        } rep;
    } t;

    /* Convert on first access, then cache the in-memory form in place. */
    void *get_mem_rep(t *v, void *(*converter)(char *)) {
        void *res;
        switch (v->rep_kind) {
        case ON_DISK_REP:
            res = converter(v->rep.on_disk);
            v->rep.in_memory = res;
            v->rep_kind = IN_MEMORY_REP;
            return res;
        case IN_MEMORY_REP:
            return v->rep.in_memory;
        }
        return NULL; /* not reached for a valid rep_kind */
    }

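To make the memoisation visible, here is a small self-contained sketch of the same idea. The names value_t, parse_long and parse_calls are made up for illustration and have nothing to do with the actual PostgreSQL types; the converter just parses a decimal string and counts how often it runs.

```c
#include <assert.h>
#include <stdlib.h>

typedef enum { ON_DISK_REP, IN_MEMORY_REP } rep_kind_t;

typedef struct {
    rep_kind_t rep_kind;
    union {
        char *on_disk;    /* valid while rep_kind == ON_DISK_REP */
        void *in_memory;  /* valid while rep_kind == IN_MEMORY_REP */
    } rep;
} value_t;

/* Counts converter invocations so the caching is observable. */
int parse_calls = 0;

/* Hypothetical converter: parses a decimal string into a malloc'd long. */
void *parse_long(char *on_disk)
{
    long *res = malloc(sizeof *res);
    assert(res != NULL);
    *res = strtol(on_disk, NULL, 10);
    parse_calls++;
    return res;
}

/* Converts on first access only; later calls return the cached pointer. */
void *get_mem_rep(value_t *v, void *(*converter)(char *))
{
    if (v->rep_kind == ON_DISK_REP) {
        v->rep.in_memory = converter(v->rep.on_disk);
        v->rep_kind = IN_MEMORY_REP;
    }
    return v->rep.in_memory;
}
```

Calling get_mem_rep twice on the same value_t should run parse_long once and hand back the same pointer both times. Note that this sketch says nothing about who frees the result.
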
Now of course, fitting this into the existing types while ensuring that
memory is neither freed too early nor leaked is probably a nightmare,
which is presumably why you said that this is a hard problem.
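
Just to make the ownership question concrete, one possible sketch, under the assumptions that the value owns whichever representation is live, that the converter does not keep a pointer into the on-disk bytes, and that the caller supplies a destructor for the in-memory form. All names here (owned_value_t, owned_value_release, parse_long, ...) are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef enum { ON_DISK_REP, IN_MEMORY_REP } rep_kind_t;

typedef struct {
    rep_kind_t rep_kind;
    union { char *on_disk; void *in_memory; } rep;
    void (*free_mem)(void *);  /* destructor for the in-memory form */
} owned_value_t;

/* Hypothetical converter: parses a decimal string into a malloc'd long. */
void *parse_long(char *on_disk)
{
    long *res = malloc(sizeof *res);
    assert(res != NULL);
    *res = strtol(on_disk, NULL, 10);
    return res;
}

/* Takes a private copy of the on-disk bytes so the value owns them. */
owned_value_t owned_value_from_disk(const char *bytes, void (*free_mem)(void *))
{
    owned_value_t v;
    size_t len = strlen(bytes) + 1;
    v.rep_kind = ON_DISK_REP;
    v.rep.on_disk = malloc(len);
    assert(v.rep.on_disk != NULL);
    memcpy(v.rep.on_disk, bytes, len);
    v.free_mem = free_mem;
    return v;
}

/* Converts once, then drops the on-disk copy that is no longer reachable. */
void *owned_get_mem_rep(owned_value_t *v, void *(*converter)(char *))
{
    if (v->rep_kind == ON_DISK_REP) {
        char *on_disk = v->rep.on_disk;
        v->rep.in_memory = converter(on_disk);
        v->rep_kind = IN_MEMORY_REP;
        free(on_disk);
    }
    return v->rep.in_memory;
}

/* Frees whichever representation is live; the value must not be used after. */
void owned_value_release(owned_value_t *v)
{
    if (v->rep_kind == ON_DISK_REP)
        free(v->rep.on_disk);
    else
        v->free_mem(v->rep.in_memory);
    v->rep_kind = ON_DISK_REP;
    v->rep.on_disk = NULL;  /* a repeated release is then a harmless free(NULL) */
}
```

Even in this toy version, any pointer handed out by owned_get_mem_rep dangles after owned_value_release, which is exactly the too-early-free hazard.
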
Cheers,
Bene