> Personally, my biggest gripe about the way we do compression is that
> it's easy to detoast the same object lots of times. More generally,
> our in-memory representation of user data values is pretty much a
> mirror of our on-disk representation, even when that leads to excess
> conversions. Beyond what we do for TOAST, there's stuff like numeric
> where not only toast but then post-process the results into yet
> another internal form before performing any calculations - and then of
> course we have to convert back before returning from the calculation
> functions. And for things like XML, JSON, and hstore we have to
> repeatedly parse the string, every time someone wants to do anything
> to do. Of course, solving this is a very hard problem, and not
> solving it isn't a reason not to have more compression options - but
> more compression options will not solve the problems that I personally
> have in this area, by and large.
>

At the risk of saying something totally obvious and stupid (I haven't
looked at the actual representation), this sounds like a memoisation
problem. In OCaml terms:

    (* bytes stands in for whatever the raw on-disk byte sequence is *)
    type 'a rep =
      | On_disk_rep of bytes
      | In_memory_rep of 'a

    type 'a t = 'a rep ref

    (* Convert on first access, cache the result in place, reuse it after. *)
    let get_mem_rep t converter =
      match !t with
      | On_disk_rep seq ->
          let res = converter seq in
          t := In_memory_rep res;
          res
      | In_memory_rep x -> x
    ;;

... (if you need the other direction, that's straightforward too)...

Translating this into C is relatively straightforward if you have the
luxury of a fresh start and don't have to be super efficient:

    #include <stddef.h>

    typedef enum { ON_DISK_REP, IN_MEMORY_REP } rep_kind_t;

    typedef struct {
        rep_kind_t rep_kind;
        union {
            char *on_disk;
            void *in_memory;
        } rep;
    } t;

    void *get_mem_rep(t *v, void *(*converter)(char *))
    {
        void *res;

        switch (v->rep_kind) {
        case ON_DISK_REP:
            /* convert once, then overwrite the on-disk pointer with the result */
            res = converter(v->rep.on_disk);
            v->rep.in_memory = res;
            v->rep_kind = IN_MEMORY_REP;
            return res;
        case IN_MEMORY_REP:
            return v->rep.in_memory;
        }
        return NULL; /* not reached for a valid rep_kind */
    }

Now, of course, fitting this into the existing types and ensuring that
there is neither premature freeing of memory nor memory leaks or other
bugs is probably a nightmare, which is why you said that this is a hard
problem.

Cheers,

Bene
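
P.S. To make the memoisation visible, here is a small self-contained sketch in the same spirit as the C version above. It reads one value twice and counts how often the converter actually runs. All names in it (lazy_value, lazy_get, parse_long, demo_conversion_count) are made up for illustration and have nothing to do with the real PostgreSQL datum or TOAST code, and it deliberately ignores who owns the on-disk buffer, which is exactly the hard part.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical names throughout; nothing here is PostgreSQL API. */
typedef enum { ON_DISK_REP, IN_MEMORY_REP } rep_kind_t;

typedef struct {
    rep_kind_t rep_kind;
    union {
        const char *on_disk;   /* serialized form, here plain decimal text */
        void *in_memory;       /* cached converted form */
    } rep;
} lazy_value;

static int conversions = 0;    /* counts converter calls, for demonstration */

/* Example converter: parse decimal text into a heap-allocated long. */
static void *parse_long(const char *text)
{
    long *out = malloc(sizeof *out);
    assert(out != NULL);
    *out = strtol(text, NULL, 10);
    conversions++;
    return out;
}

/* Convert on first access, then serve the cached pointer. */
static void *lazy_get(lazy_value *v, void *(*converter)(const char *))
{
    assert(v != NULL);
    if (v->rep_kind == ON_DISK_REP) {
        void *res = converter(v->rep.on_disk);
        v->rep.in_memory = res;
        v->rep_kind = IN_MEMORY_REP;
    }
    return v->rep.in_memory;
}

/* Read the same value twice; return how many conversions that took. */
int demo_conversion_count(void)
{
    lazy_value v = { .rep_kind = ON_DISK_REP, .rep = { .on_disk = "42" } };
    long first, second;

    conversions = 0;
    first = *(long *) lazy_get(&v, parse_long);
    second = *(long *) lazy_get(&v, parse_long);
    assert(first == 42 && second == 42);

    free(v.rep.in_memory);     /* the caller owns the cached value here */
    return conversions;
}
```

The second lazy_get call finds IN_MEMORY_REP and skips the converter, so the counter stays at one. In this toy the on-disk form is a string literal, so overwriting the pointer loses nothing; with a real allocated buffer you would have to decide when it gets freed.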