Re: [HACKERS] GSOC - TOAST'ing in slices

George Papadrosou Thu, 16 Mar 2017 10:20:30 -0700

Hello all, 

thank you for your replies.  I agree with Alexander Korotkov that it is 
important to have a quality patch at the end of the summer.


Stephen, you mentioned PostGIS, but the conversation seems to lean towards 
JSONB. What are your thoughts?

Also, if I am to include some ideas/approaches in the proposal, it seems I 
should really focus on understanding how a specific data type is used, queried 
and indexed, which is a lot of exploring for a newcomer in postgres code.

In the meanwhile, I am trying to find how jsonb is indexed and queried. After I 
grasp the current situation I will be to think about new approaches.

Regards,
George 

> On 15 Μαρ 2017, at 15:53, Tom Lane <t...@sss.pgh.pa.us> wrote:
> 
> Robert Haas <robertmh...@gmail.com <mailto:robertmh...@gmail.com>> writes:
>> On Tue, Mar 14, 2017 at 10:03 PM, George Papadrosou
>> <gpapadro...@gmail.com> wrote:
>>> The project’s idea is implement different slicing approaches according to
>>> the value’s datatype. For example a text field could be split upon character
>>> boundaries while a JSON document would be split in a way that allows fast
>>> access to it’s keys or values.
> 
>> Hmm.  So if you had a long text field containing multibyte characters,
>> and you split it after, say, every 1024 characters rather than after
>> every N bytes, then you could do substr() without detoasting the whole
>> field.  On the other hand, my guess is that you'd waste a fair amount
>> of space in the TOAST table, because it's unlikely that the chunks
>> would be exactly the right size to fill every page of the table
>> completely.  On balance it seems like you'd be worse off, because
>> substr() probably isn't all that common an operation.
> 
> Keep in mind also that slicing on "interesting" boundaries rather than
> with the current procrustean-bed approach could save you at most one or
> two chunk fetches per access.  So the upside seems limited.  Moreover,
> how are you going to know whether a given toast item has been stored
> according to your newfangled approach?  I doubt we're going to accept
> forcing a dump/reload for this.
> 
> IMO, the real problem here is to be able to predict which chunk(s) to
> fetch at all, and I'd suggest focusing on that part of the problem rather
> than changes to physical storage.  It's hard to see how to do anything
> very smart for text (except in the single-byte-encoding case, which is
> already solved).  But the JSONB format was designed with some thought
> to this issue, so you might be able to get some traction there.
> 
>                       regards, tom lane

Re: [HACKERS] GSOC - TOAST'ing in slices

Reply via email to