Re: Pluggable toaster

Tomas Vondra Mon, 17 Jan 2022 11:24:04 -0800



On 1/14/22 19:41, Teodor Sigaev wrote:

In my understanding, we want to be able to
1. Access data from a toasted object one slice at a time, by using
knowledge of the structure
2. If toasted data is updated, then update a minimum number of
slices(s), without rewriting the existing slices
3. If toasted data is expanded, then allownew slices to be appended to
the object without rewriting the existing slices


There are more options:

1 share common parts between not only versions of row but between allrows in a column. Seems strange but examples:

   - urls often have a common prefix and so storing in a prefix tree (as
     SP-GiST does) allows significantly decrease storage size
   - the same for json - it's often use case with common part of its
     hierarchical structure
   - one more usecase for json. If json use only a few schemes
     (structure) it's possible to store in toast storage only values and
     don't store keys and structure

This sounds interesting, but very much like column compression, whichwas proposed some time ago. If we haven't made much progrees with thatpatch (AFAICS), what's the likelihood we'll succeed here, when it'scombined with yet more complexity?

Maybe doing that kind of compression in TOAST is somehow simpler, but Idon't see it.

2 Current toast storage stores chunks in heap accesses method and toprovide fast access by toast id it makes an index. Ideas:

   - store chunks directly in btree tree, pgsql's btree already has an
     INCLUDE columns, so, chunks and visibility data will be stored only
     in leaf pages. Obviously it reduces number of disk's access for
     "untoasting".
   - use another access method for chunk storage

Maybe, but that probably requires more thought - e.g. btree requires thevalues to be less than 1/3 page, so I wonder how would that play withtoasting of values.

ISTM that we would want the toast algorithm to be associated with the
datatype, not the column?
Can you explain your thinking?
Hm. I'll try to explain my motivation.
1) Datatype could have more than one suitable toasters. For different
    usecases: fast retrieving, compact storage, fast update etc. As I
    told   above, for jsonb there are several optimal strategies for
    toasting:   for values with a few different structures, for close to
    hierarchical structures,  for values with different parts by access
    mode (easy to imagine json with some keys used for search and some
    keys only for   output to user)
2) Toaster could be designed to work with different data type. Suggested
    appendable toaster is designed to work with bytea but could work with
    text
Looking on this point I have doubts where to store connection betweentoaster and datatype. If we add toasteroid to pg_type how to deal withseveral toaster for one datatype? (And we could want to has differenttoaster on one table!) If we add typoid to pg_toaster then how it willwork with several datatypes? An idea to add a new many-to-manyconnection table seems workable but here there are another questions,such as will any toaster work with any table access method?
To resolve this bundle of question we propose validate() method oftoaster, which should be called during DDL operation, i.e. toaster isassigned to column or column's datatype is changed.

Seems you'd need a mapping table, to allow M:N mapping between types andtoasters, linking it to all "compatible" types. It's not clear to me howwould this work with custom data types, domains etc.

Also, what happens to existing values when you change the toaster? Whatif the toasters don't use the same access method to store the chunks(heap vs. btree)? And so on.

More thought:
Now postgres has two options for column: storage and compression and nowwe add toaster. For me it seems too redundantly. Seems, storage shouldbe binary value: inplace (plain as now) and toastable. All othervariation such as toast limit, compression enabling, compression kindshould be an per-column option for toaster (that's why we suggest validtoaster oid for any column with varlena/toastable datatype). It lookslike a good abstraction but we will have a problem with backwardcompatibility and I'm afraid I can't implement it very fast.

So you suggest we move all of this to toaster? I'd say -1 to that,because it makes it much harder to e.g. add custom compression method, etc.



regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Pluggable toaster

Reply via email to