Re: [HACKERS] [RFC] indirect toast tuple support

2013-02-20 Thread Robert Haas
On Tue, Feb 19, 2013 at 9:26 AM, Andres Freund and...@2ndquadrant.com wrote:
 On 2013-02-19 09:12:02 -0500, Robert Haas wrote:
 On Tue, Feb 19, 2013 at 9:00 AM, Andres Freund and...@2ndquadrant.com 
 wrote:
  So the other way that we could do this is to use something that's the
  same size as a TOAST pointer but has different content - the
  seemingly-obvious choice being  va_toastrelid == 0.
 
  Unfortunately that would mean you need to copy the varatt_external (or
  whatever it would be called) to aligned storage to check what it
  is. Thats why I went the other way.

 How big a problem is that, though?

 There are quite some places where we test the actual type of a Datum
 inside tuptoaster.c. Copying it to local storage everytime might
 actually be noticeable performancewise. Besides the ugliness of needing
 a local variable, copying the data and only testing afterwards...

Hrm, OK.

   I'd be a little
  reluctant to do it the way you propose because we might, at some
  point, want to try to reduce the size of toast pointers.   If you have
  a tuple with many attributes, the size of the TOAST pointers
  themselves starts to add up.  It would be nice to be able to have 8
  byte or even 4 byte toast pointers to handle those situations.  If we
  steal one or both of those lengths to mean the data is cached in
  memory somewhere then we can't use those lengths in a smaller on-disk
  representation, which would seem a shame.
 
  I agree. As I said above, having the type overlayed into the lenght was
  and is a bad idea, I just haven't found a better one thats compatible
  yet.
  Except inventing typlen=-3 aka toast2 or something. But even that
  wouldn't help getting rid of existing pg_upgraded tables. Besides being
  a maintenance nightmare.
 
  The only reasonable thing I can see us doing is renaming
  varattrib_1b_e.va_len_1be into va_type and redefine VARSIZE_1B_E into a
  switch that maps types into lengths. But I think I would put this off,
  except placing a comment somewhere, until its gets necessary.

 I guess I wonder how hard we think it would be to insert such a thing
 when it becomes necessary.  How much stuff is there out there that
 cares about the fact that that length is a byte?

 You mean whether we could store the length in 6 bytes and use two for
 the type? That should probably work as well. But I don't see much
 advantage in that given that all those sizes ought to be static.
 Redefining VARSIZE_1B_E as indicated above should be fairly easy, there
 aren't many callsites that touch stuff at such low level.

/me blinks.

No, that's not what I meant.  I meant: how hard it would be to
redefine VARSIZE_1B_E along the lines you suggest?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [RFC] indirect toast tuple support

2013-02-20 Thread Andres Freund
On 2013-02-20 10:16:45 -0500, Robert Haas wrote:
 On Tue, Feb 19, 2013 at 9:26 AM, Andres Freund and...@2ndquadrant.com wrote:
  On 2013-02-19 09:12:02 -0500, Robert Haas wrote:
  On Tue, Feb 19, 2013 at 9:00 AM, Andres Freund and...@2ndquadrant.com 
  wrote:
I'd be a little
   reluctant to do it the way you propose because we might, at some
   point, want to try to reduce the size of toast pointers.   If you have
   a tuple with many attributes, the size of the TOAST pointers
   themselves starts to add up.  It would be nice to be able to have 8
   byte or even 4 byte toast pointers to handle those situations.  If we
   steal one or both of those lengths to mean the data is cached in
   memory somewhere then we can't use those lengths in a smaller on-disk
   representation, which would seem a shame.
  
   I agree. As I said above, having the type overlayed into the lenght was
   and is a bad idea, I just haven't found a better one thats compatible
   yet.
   Except inventing typlen=-3 aka toast2 or something. But even that
   wouldn't help getting rid of existing pg_upgraded tables. Besides being
   a maintenance nightmare.
  
   The only reasonable thing I can see us doing is renaming
   varattrib_1b_e.va_len_1be into va_type and redefine VARSIZE_1B_E into a
   switch that maps types into lengths. But I think I would put this off,
   except placing a comment somewhere, until its gets necessary.
 
  I guess I wonder how hard we think it would be to insert such a thing
  when it becomes necessary.  How much stuff is there out there that
  cares about the fact that that length is a byte?
 
  You mean whether we could store the length in 6 bytes and use two for
  the type? That should probably work as well. But I don't see much
  advantage in that given that all those sizes ought to be static.
  Redefining VARSIZE_1B_E as indicated above should be fairly easy, there
  aren't many callsites that touch stuff at such low level.
 
 /me blinks.
 
 No, that's not what I meant.  I meant: how hard it would be to
 redefine VARSIZE_1B_E along the lines you suggest?

Should be pretty easy. Will do so for the next revision.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [RFC] indirect toast tuple support

2013-02-20 Thread Greg Stark
On Tue, Feb 19, 2013 at 2:00 PM, Andres Freund and...@2ndquadrant.com wrote:
 The only reasonable thing I can see us doing is renaming
 varattrib_1b_e.va_len_1be into va_type and redefine VARSIZE_1B_E into a
 switch that maps types into lengths. But I think I would put this off,
 except placing a comment somewhere, until its gets necessary.

Is there any reason to make it a switch before we actually have two
types that happen to have the same length?

It might make the code clearer if there was an enum with the (one)
type listed but as long as all the enum values happen to have the
value of the length of the struct then it makes heap_form_tuple and
heap_deform_tuple marginally faster. (unless gcc can optimize away the
whole switch statement which might be plausible, especially if it's
just a few ?: expressions)

-- 
greg


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [RFC] indirect toast tuple support

2013-02-20 Thread Greg Stark
On Thu, Feb 21, 2013 at 2:32 AM, Greg Stark st...@mit.edu wrote:
 On Tue, Feb 19, 2013 at 2:00 PM, Andres Freund and...@2ndquadrant.com wrote:
 The only reasonable thing I can see us doing is renaming
 varattrib_1b_e.va_len_1be into va_type and redefine VARSIZE_1B_E into a
 switch that maps types into lengths. But I think I would put this off,
 except placing a comment somewhere, until its gets necessary.

 Is there any reason to make it a switch before we actually have two
 types that happen to have the same length?

 It might make the code clearer if there was an enum with the (one)
 type listed but as long as all the enum values happen to have the
 value of the length of the struct then it makes heap_form_tuple and
 heap_deform_tuple marginally faster. (unless gcc can optimize away the
 whole switch statement which might be plausible, especially if it's
 just a few ?: expressions)

For what it's worth much of this was discussed at the time. I
originally wrote it as an enum and Tom changed it to a length byte,
specifically for performance reasons, and said we could always change
it back to an enum where some of the values just happened to be equal
to their length if we needed it:

http://www.postgresql.org/message-id/flat/82tzp7bbbh@mid.bfk.de#82tzp7bbbh@mid.bfk.de


-- 
greg


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [RFC] indirect toast tuple support

2013-02-19 Thread Robert Haas
On Sat, Feb 16, 2013 at 11:42 AM, Andres Freund and...@2ndquadrant.com wrote:
 Given that there have been wishes to support something like b) for quite
 some time, independent from logical decoding, it seems like a good idea
 to add support for it. Its e.g. useful for avoiding repeated detoasting
 or decompression of tuples.

 The problem with b) is that there is no space in varlena's flag bits to
 directly denote that a varlena points into memory instead of either
 directly containing the data or a varattrib_1b_e containing a
 varatt_external pointing to an on-disk toasted tuple.

So the other way that we could do this is to use something that's the
same size as a TOAST pointer but has different content - the
seemingly-obvious choice being  va_toastrelid == 0.  I'd be a little
reluctant to do it the way you propose because we might, at some
point, want to try to reduce the size of toast pointers.   If you have
a tuple with many attributes, the size of the TOAST pointers
themselves starts to add up.  It would be nice to be able to have 8
byte or even 4 byte toast pointers to handle those situations.  If we
steal one or both of those lengths to mean the data is cached in
memory somewhere then we can't use those lengths in a smaller on-disk
representation, which would seem a shame.

But having said that, +1 on the general idea of getting something like
this done.  We really need a better infrastructure to avoid copying
large values around repeatedly in memory - a gigabyte is a lot of data
to be slinging around.

Of course, you will not be surprised to hear that I think this is 9.4 material.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [RFC] indirect toast tuple support

2013-02-19 Thread Andres Freund
On 2013-02-19 08:48:05 -0500, Robert Haas wrote:
 On Sat, Feb 16, 2013 at 11:42 AM, Andres Freund and...@2ndquadrant.com 
 wrote:
  Given that there have been wishes to support something like b) for quite
  some time, independent from logical decoding, it seems like a good idea
  to add support for it. Its e.g. useful for avoiding repeated detoasting
  or decompression of tuples.
 
  The problem with b) is that there is no space in varlena's flag bits to
  directly denote that a varlena points into memory instead of either
  directly containing the data or a varattrib_1b_e containing a
  varatt_external pointing to an on-disk toasted tuple.
 
 So the other way that we could do this is to use something that's the
 same size as a TOAST pointer but has different content - the
 seemingly-obvious choice being  va_toastrelid == 0.

Unfortunately that would mean you need to copy the varatt_external (or
whatever it would be called) to aligned storage to check what it
is. Thats why I went the other way.

Its a bit sad that varatt_1b_e only contains a length and not a type
byte. I would like to change the storage of existing toast types but
thats not going to work for pg_upgrade reasons...


  I'd be a little
 reluctant to do it the way you propose because we might, at some
 point, want to try to reduce the size of toast pointers.   If you have
 a tuple with many attributes, the size of the TOAST pointers
 themselves starts to add up.  It would be nice to be able to have 8
 byte or even 4 byte toast pointers to handle those situations.  If we
 steal one or both of those lengths to mean the data is cached in
 memory somewhere then we can't use those lengths in a smaller on-disk
 representation, which would seem a shame.

I agree. As I said above, having the type overlayed into the lenght was
and is a bad idea, I just haven't found a better one thats compatible
yet.
Except inventing typlen=-3 aka toast2 or something. But even that
wouldn't help getting rid of existing pg_upgraded tables. Besides being
a maintenance nightmare.

The only reasonable thing I can see us doing is renaming
varattrib_1b_e.va_len_1be into va_type and redefine VARSIZE_1B_E into a
switch that maps types into lengths. But I think I would put this off,
except placing a comment somewhere, until its gets necessary.

 But having said that, +1 on the general idea of getting something like
 this done.  We really need a better infrastructure to avoid copying
 large values around repeatedly in memory - a gigabyte is a lot of data
 to be slinging around.
 
 Of course, you will not be surprised to hear that I think this is 9.4 
 material.

Yes, obviously. But I need time to actually propose a working patch (I
already found 2 bugs in what I had submitted), thats why I brought it up
now. No point in wasting time if there's an oviously better idea around.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [RFC] indirect toast tuple support

2013-02-19 Thread Robert Haas
On Tue, Feb 19, 2013 at 9:00 AM, Andres Freund and...@2ndquadrant.com wrote:
 So the other way that we could do this is to use something that's the
 same size as a TOAST pointer but has different content - the
 seemingly-obvious choice being  va_toastrelid == 0.

 Unfortunately that would mean you need to copy the varatt_external (or
 whatever it would be called) to aligned storage to check what it
 is. Thats why I went the other way.

How big a problem is that, though?

  I'd be a little
 reluctant to do it the way you propose because we might, at some
 point, want to try to reduce the size of toast pointers.   If you have
 a tuple with many attributes, the size of the TOAST pointers
 themselves starts to add up.  It would be nice to be able to have 8
 byte or even 4 byte toast pointers to handle those situations.  If we
 steal one or both of those lengths to mean the data is cached in
 memory somewhere then we can't use those lengths in a smaller on-disk
 representation, which would seem a shame.

 I agree. As I said above, having the type overlayed into the lenght was
 and is a bad idea, I just haven't found a better one thats compatible
 yet.
 Except inventing typlen=-3 aka toast2 or something. But even that
 wouldn't help getting rid of existing pg_upgraded tables. Besides being
 a maintenance nightmare.

 The only reasonable thing I can see us doing is renaming
 varattrib_1b_e.va_len_1be into va_type and redefine VARSIZE_1B_E into a
 switch that maps types into lengths. But I think I would put this off,
 except placing a comment somewhere, until its gets necessary.

I guess I wonder how hard we think it would be to insert such a thing
when it becomes necessary.  How much stuff is there out there that
cares about the fact that that length is a byte?

 But having said that, +1 on the general idea of getting something like
 this done.  We really need a better infrastructure to avoid copying
 large values around repeatedly in memory - a gigabyte is a lot of data
 to be slinging around.

 Of course, you will not be surprised to hear that I think this is 9.4 
 material.

 Yes, obviously. But I need time to actually propose a working patch (I
 already found 2 bugs in what I had submitted), thats why I brought it up
 now. No point in wasting time if there's an oviously better idea around.

Cool.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [RFC] indirect toast tuple support

2013-02-19 Thread Andres Freund
On 2013-02-19 09:12:02 -0500, Robert Haas wrote:
 On Tue, Feb 19, 2013 at 9:00 AM, Andres Freund and...@2ndquadrant.com wrote:
  So the other way that we could do this is to use something that's the
  same size as a TOAST pointer but has different content - the
  seemingly-obvious choice being  va_toastrelid == 0.
 
  Unfortunately that would mean you need to copy the varatt_external (or
  whatever it would be called) to aligned storage to check what it
  is. Thats why I went the other way.
 
 How big a problem is that, though?

There are quite some places where we test the actual type of a Datum
inside tuptoaster.c. Copying it to local storage everytime might
actually be noticeable performancewise. Besides the ugliness of needing
a local variable, copying the data and only testing afterwards...

   I'd be a little
  reluctant to do it the way you propose because we might, at some
  point, want to try to reduce the size of toast pointers.   If you have
  a tuple with many attributes, the size of the TOAST pointers
  themselves starts to add up.  It would be nice to be able to have 8
  byte or even 4 byte toast pointers to handle those situations.  If we
  steal one or both of those lengths to mean the data is cached in
  memory somewhere then we can't use those lengths in a smaller on-disk
  representation, which would seem a shame.
 
  I agree. As I said above, having the type overlayed into the lenght was
  and is a bad idea, I just haven't found a better one thats compatible
  yet.
  Except inventing typlen=-3 aka toast2 or something. But even that
  wouldn't help getting rid of existing pg_upgraded tables. Besides being
  a maintenance nightmare.
 
  The only reasonable thing I can see us doing is renaming
  varattrib_1b_e.va_len_1be into va_type and redefine VARSIZE_1B_E into a
  switch that maps types into lengths. But I think I would put this off,
  except placing a comment somewhere, until its gets necessary.
 
 I guess I wonder how hard we think it would be to insert such a thing
 when it becomes necessary.  How much stuff is there out there that
 cares about the fact that that length is a byte?

You mean whether we could store the length in 6 bytes and use two for
the type? That should probably work as well. But I don't see much
advantage in that given that all those sizes ought to be static.
Redefining VARSIZE_1B_E as indicated above should be fairly easy, there
aren't many callsites that touch stuff at such low level.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [RFC] indirect toast tuple support

2013-02-17 Thread Greg Stark
On Sat, Feb 16, 2013 at 4:42 PM, Andres Freund and...@2ndquadrant.com wrote:
 I propose extending the EXTERNAL varlenas to be able to point to memory
 instead just to disk. It seem apt to use EXTERNAL for this as they
 aren't stored in the normal heap tuple but somewhere else.
 Unfortunately there is no backward-compatible flag space in
 varattrib_1b_e either to nicely denote this and we sure don't want to
 break on-disk compatibility for this. Thus I propose to distinguish
 on-disk and in-memory tuples via the varattrib_1b_e.va_len_1be.

The intention was to use va_len_1be to allow extensibility with
different external toast types. We were originally not going to have
it at all and just before committing we became concerned that we
wanted to avoid painting ourselves into a corner where we wouldn't be
able to come up with any new formats for external toast types. So we
added this one byte. i suggest thinking of it more as a type field
that happens to be the length of the toast pointer by convention than
an actual length header.

Just as an example I have at various times proposed a column
compression method that would store a dictionary of common values and
the toast pointer would be a pointer to this dictionary and an index
into it.

I have no problem using it for this case since it's using up only one
particular value for this field. As long as you don't want to use up
all the possible values for a single type of external pointer it seems
in line with the plan.


-- 
greg


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [RFC] indirect toast tuple support

2013-02-16 Thread Andres Freund
On 2013-02-16 17:42:31 +0100, Andres Freund wrote:
 +/* --
 + * toast_datum_differs -
 + *
 + *  Determine whether two toasted datums are the same and don't have to be
 + *  stored again.
 + * --
 + */
 +static bool
 +toast_datum_differs(struct varlena *old_value, struct varlena *new_value)
 +{
 + Assert(VARATT_IS_EXTERNAL(old_value));
 + Assert(VARATT_IS_EXTERNAL(new_value));
 +
 + /* fast path for the common case where we have the toast oid available 
 */
 + if (VARATT_IS_EXTERNAL_TOAST(old_value) 
 + VARATT_IS_EXTERNAL_TOAST(new_value))
 + return memcmp((char *) old_value, (char *) new_value,
 +   VARSIZE_EXTERNAL(old_value)) == 0;
 ...
 + /* compare payload, we're fine with unaligned data */
 + return memcmp(VARDATA_ANY(old_value), VARDATA_ANY(new_value),
 +   VARSIZE_ANY_EXHDR(old_value)) == 0;
 +}

Those == need to be !=. Comes from changing the meaning of a function
last minute without an easy way to test (it just uselessly emits a new
toast tuple when nothing changed).

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers