Hello all, thank you for your replies. I agree with Alexander Korotkov that it is important to have a quality patch at the end of the summer.
Stephen, you mentioned PostGIS, but the conversation seems to lean towards JSONB. What are your thoughts? Also, if I am to include some ideas/approaches in the proposal, it seems I should really focus on understanding how a specific data type is used, queried and indexed, which is a lot of exploring for a newcomer in postgres code. In the meanwhile, I am trying to find how jsonb is indexed and queried. After I grasp the current situation I will be to think about new approaches. Regards, George > On 15 Μαρ 2017, at 15:53, Tom Lane <t...@sss.pgh.pa.us> wrote: > > Robert Haas <robertmh...@gmail.com <mailto:robertmh...@gmail.com>> writes: >> On Tue, Mar 14, 2017 at 10:03 PM, George Papadrosou >> <gpapadro...@gmail.com> wrote: >>> The project’s idea is implement different slicing approaches according to >>> the value’s datatype. For example a text field could be split upon character >>> boundaries while a JSON document would be split in a way that allows fast >>> access to it’s keys or values. > >> Hmm. So if you had a long text field containing multibyte characters, >> and you split it after, say, every 1024 characters rather than after >> every N bytes, then you could do substr() without detoasting the whole >> field. On the other hand, my guess is that you'd waste a fair amount >> of space in the TOAST table, because it's unlikely that the chunks >> would be exactly the right size to fill every page of the table >> completely. On balance it seems like you'd be worse off, because >> substr() probably isn't all that common an operation. > > Keep in mind also that slicing on "interesting" boundaries rather than > with the current procrustean-bed approach could save you at most one or > two chunk fetches per access. So the upside seems limited. Moreover, > how are you going to know whether a given toast item has been stored > according to your newfangled approach? I doubt we're going to accept > forcing a dump/reload for this. > > IMO, the real problem here is to be able to predict which chunk(s) to > fetch at all, and I'd suggest focusing on that part of the problem rather > than changes to physical storage. It's hard to see how to do anything > very smart for text (except in the single-byte-encoding case, which is > already solved). But the JSONB format was designed with some thought > to this issue, so you might be able to get some traction there. > > regards, tom lane