> Personally, my biggest gripe about the way we do compression is that
> it's easy to detoast the same object lots of times.  More generally,
> our in-memory representation of user data values is pretty much a
> mirror of our on-disk representation, even when that leads to excess
> conversions.  Beyond what we do for TOAST, there's stuff like numeric
> where we not only detoast but then post-process the results into yet
> another internal form before performing any calculations - and then of
> course we have to convert back before returning from the calculation
> functions.  And for things like XML, JSON, and hstore we have to
> repeatedly parse the string, every time someone wants to do anything
> with it.  Of course, solving this is a very hard problem, and not
> solving it isn't a reason not to have more compression options - but
> more compression options will not solve the problems that I personally
> have in this area, by and large.

At the risk of saying something totally obvious and stupid, as I haven't
looked at the actual representation: this sounds like a memoisation
problem.  In OCaml terms:

type 'a rep =
  | On_disk_rep   of bytes
  | In_memory_rep of 'a

type 'a t = 'a rep ref

let get_mem_rep t converter =
  match !t with
  | On_disk_rep seq ->
    let res = converter seq in
    t := In_memory_rep res;
    res
  | In_memory_rep x -> x
;;

... (if you need the other direction, then it's straightforward too) ...

Translating this into C is relatively straightforward if you have the
luxury of a fresh start and don't have to be super efficient:

typedef enum { ON_DISK_REP, IN_MEMORY_REP } rep_kind_t;

typedef struct {
  rep_kind_t rep_kind;
  union {
    char *on_disk;
    void *in_memory;
  } rep;
} value_t;

/* Returns the in-memory form, converting (and caching) it on first use. */
void *get_mem_rep(value_t *v, void *(*converter)(char *)) {
  void *res;
  switch (v->rep_kind) {
     case ON_DISK_REP:
        res = converter(v->rep.on_disk);
        v->rep.in_memory = res;
        v->rep_kind = IN_MEMORY_REP;
        return res;
     case IN_MEMORY_REP:
        return v->rep.in_memory;
  }
  return NULL;  /* not reached for a valid rep_kind */
}
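
To make the memoisation visible, here is a small self-contained sketch of
the same idea with a counter on the converter. The names value_t,
parse_long, conversions and demo_lookup_twice are made up for this
illustration, and plain malloc/free stand in for whatever allocator real
code would use:

```c
#include <assert.h>
#include <stdlib.h>

typedef enum { ON_DISK_REP, IN_MEMORY_REP } rep_kind_t;

typedef struct {
  rep_kind_t rep_kind;
  union {
    char *on_disk;
    void *in_memory;
  } rep;
} value_t;

/* Counts how often the (hypothetical) converter actually runs. */
static int conversions = 0;

/* Hypothetical converter: parses a decimal string into a heap-allocated long. */
static void *parse_long(char *on_disk) {
  long *res = malloc(sizeof *res);
  if (res == NULL)
    abort();
  *res = strtol(on_disk, NULL, 10);
  conversions++;
  return res;
}

/* Converts on first access, then serves the cached in-memory form. */
static void *get_mem_rep(value_t *v, void *(*converter)(char *)) {
  assert(v != NULL);
  if (v->rep_kind == ON_DISK_REP) {
    /* Overwriting the union drops the on-disk pointer; the caller still owns it. */
    v->rep.in_memory = converter(v->rep.on_disk);
    v->rep_kind = IN_MEMORY_REP;
  }
  return v->rep.in_memory;
}

/* Looks the value up twice; only the first lookup should convert. */
long demo_lookup_twice(void) {
  char buf[] = "42";
  value_t v = { .rep_kind = ON_DISK_REP, .rep.on_disk = buf };
  long first = *(long *)get_mem_rep(&v, parse_long);
  long second = *(long *)get_mem_rep(&v, parse_long);
  free(v.rep.in_memory);
  return first + second;
}
```

After one call to demo_lookup_twice(), conversions is 1 even though the
value was read twice, which is the whole point of caching the converted
form in place.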

Now of course, fitting this into the existing types while ensuring that
memory is neither freed too early nor leaked, and that no other bugs creep
in, is probably a nightmare, which is presumably why you said that this is
a hard problem.
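
Even in the fresh-start toy version the ownership question shows up
immediately: whoever frees the value has to know which representation is
currently live. A minimal sketch, assuming the value owns both of its
representations and that malloc/free are acceptable stand-ins (all names
here, such as value_from_disk, value_get, value_free and parse_long, are
hypothetical):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef enum { ON_DISK_REP, IN_MEMORY_REP } rep_kind_t;

typedef struct {
  rep_kind_t rep_kind;
  union {
    char *on_disk;
    void *in_memory;
  } rep;
} value_t;

/* Builds a value that owns a private copy of the on-disk bytes. */
value_t *value_from_disk(const char *bytes) {
  size_t n = strlen(bytes) + 1;
  value_t *v = malloc(sizeof *v);
  if (v == NULL)
    return NULL;
  v->rep.on_disk = malloc(n);
  if (v->rep.on_disk == NULL) {
    free(v);
    return NULL;
  }
  memcpy(v->rep.on_disk, bytes, n);
  v->rep_kind = ON_DISK_REP;
  return v;
}

/* Converts once; the on-disk copy is released only after conversion succeeds. */
void *value_get(value_t *v, void *(*converter)(const char *)) {
  assert(v != NULL);
  if (v->rep_kind == ON_DISK_REP) {
    char *old = v->rep.on_disk;
    void *res = converter(old);
    if (res == NULL)
      return NULL;            /* keep the on-disk form on failure */
    free(old);
    v->rep.in_memory = res;
    v->rep_kind = IN_MEMORY_REP;
  }
  return v->rep.in_memory;
}

/* Frees whichever representation is live, then the value itself. */
void value_free(value_t *v, void (*free_mem)(void *)) {
  if (v == NULL)
    return;
  if (v->rep_kind == ON_DISK_REP)
    free(v->rep.on_disk);
  else
    free_mem(v->rep.in_memory);
  free(v);
}

/* Hypothetical converter: decimal string to heap-allocated long. */
void *parse_long(const char *s) {
  long *res = malloc(sizeof *res);
  if (res != NULL)
    *res = strtol(s, NULL, 10);
  return res;
}
```

Note that any pointer handed out by value_get becomes dangling once
value_free runs, so callers still need a rule about how long the value
lives; the sketch only removes the leak and double-free inside the value
itself.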

Cheers,

Bene
