Re: [HACKERS] Alignment padding bytes in arrays vs the planner

2011-05-24 Thread Robert Haas
On Mon, May 23, 2011 at 1:12 AM, Noah Misch n...@leadboat.com wrote:
 On Tue, Apr 26, 2011 at 11:51:35PM -0400, Noah Misch wrote:
 On Tue, Apr 26, 2011 at 07:23:12PM -0400, Tom Lane wrote:
 [input functions aren't the only problematic source of uninitialized datum 
 bytes]

  We've run into other manifestations of this issue before.  Awhile ago
  I made a push to ensure that datatype input functions didn't leave any
  ill-defined padding bytes in their results, as a result of similar
  misbehavior for simple constants.  But this example shows that we'd
  really have to enforce the rule of no ill-defined bytes for just about
  every user-callable function's results, which is a pretty ugly prospect.

 FWIW, when I was running the test suite under valgrind, these were the 
 functions
 that left uninitialized bytes in datums: array_recv, array_set, 
 array_set_slice,
 array_map, construct_md_array, path_recv.  If the test suite covers this 
 well,
 we're not far off.  (Actually, I only had the check in PageAddItem ... 
 probably
 needed to be in one or two other places to catch as much as possible.)

 Adding a memory definedness check to printtup() turned up one more culprit:
 tsquery_and.

*squints*

OK, I can't see what's broken.  Help?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Alignment padding bytes in arrays vs the planner

2011-05-24 Thread Noah Misch
On Tue, May 24, 2011 at 02:05:33PM -0400, Robert Haas wrote:
 On Mon, May 23, 2011 at 1:12 AM, Noah Misch n...@leadboat.com wrote:
  On Tue, Apr 26, 2011 at 11:51:35PM -0400, Noah Misch wrote:
  On Tue, Apr 26, 2011 at 07:23:12PM -0400, Tom Lane wrote:
  [input functions aren't the only problematic source of uninitialized datum 
  bytes]
 
   We've run into other manifestations of this issue before. ?Awhile ago
   I made a push to ensure that datatype input functions didn't leave any
   ill-defined padding bytes in their results, as a result of similar
   misbehavior for simple constants. ?But this example shows that we'd
   really have to enforce the rule of no ill-defined bytes for just about
   every user-callable function's results, which is a pretty ugly prospect.
 
  FWIW, when I was running the test suite under valgrind, these were the 
  functions
  that left uninitialized bytes in datums: array_recv, array_set, 
  array_set_slice,
  array_map, construct_md_array, path_recv. ?If the test suite covers this 
  well,
  we're not far off. ?(Actually, I only had the check in PageAddItem ... 
  probably
  needed to be in one or two other places to catch as much as possible.)
 
  Adding a memory definedness check to printtup() turned up one more culprit:
  tsquery_and.
 
 *squints*
 
 OK, I can't see what's broken.  Help?

QTN2QT() allocates memory for a TSQuery using palloc().  TSQuery contains an
array of QueryItem, which contains three bytes of padding between its first and
second members.  Those bytes don't get initialized, so we have unpredictable
content in the resulting datum.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Alignment padding bytes in arrays vs the planner

2011-05-24 Thread Robert Haas
On Tue, May 24, 2011 at 2:11 PM, Noah Misch n...@leadboat.com wrote:
 OK, I can't see what's broken.  Help?

 QTN2QT() allocates memory for a TSQuery using palloc().  TSQuery contains an
 array of QueryItem, which contains three bytes of padding between its first 
 and
 second members.  Those bytes don't get initialized, so we have unpredictable
 content in the resulting datum.

OK, so I guess this needs to be applied and back-patched to 8.3, then.
 8.2 doesn't have this code.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Alignment padding bytes in arrays vs the planner

2011-05-24 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Tue, May 24, 2011 at 2:11 PM, Noah Misch n...@leadboat.com wrote:
 QTN2QT() allocates memory for a TSQuery using palloc().  TSQuery contains an
 array of QueryItem, which contains three bytes of padding between its first 
 and
 second members.  Those bytes don't get initialized, so we have unpredictable
 content in the resulting datum.

 OK, so I guess this needs to be applied and back-patched to 8.3, then.

Yeah.  I'm in process of doing that, actually.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Alignment padding bytes in arrays vs the planner

2011-05-24 Thread Robert Haas
On Tue, May 24, 2011 at 2:18 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Tue, May 24, 2011 at 2:11 PM, Noah Misch n...@leadboat.com wrote:
 QTN2QT() allocates memory for a TSQuery using palloc().  TSQuery contains an
 array of QueryItem, which contains three bytes of padding between its first 
 and
 second members.  Those bytes don't get initialized, so we have unpredictable
 content in the resulting datum.

 OK, so I guess this needs to be applied and back-patched to 8.3, then.

 Yeah.  I'm in process of doing that, actually.

Excellent.  Are you going to look at MauMau's patch for bug #6011 also?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Alignment padding bytes in arrays vs the planner

2011-05-24 Thread Tom Lane
Noah Misch n...@leadboat.com writes:
 Adding a memory definedness check to printtup() turned up one more culprit:
 tsquery_and.

Patch applied, thanks.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Alignment padding bytes in arrays vs the planner

2011-05-24 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 Excellent.  Are you going to look at MauMau's patch for bug #6011 also?

No.  I don't do Windows, so I can't test it.

(On general principles, I don't think that hacking write_eventlog the
way he did is appropriate; such a function should write the log, not
editorialize.  But that's up to whoever does commit it.)

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Alignment padding bytes in arrays vs the planner

2011-05-22 Thread Noah Misch
On Tue, Apr 26, 2011 at 11:51:35PM -0400, Noah Misch wrote:
 On Tue, Apr 26, 2011 at 07:23:12PM -0400, Tom Lane wrote:
 [input functions aren't the only problematic source of uninitialized datum 
 bytes]
 
  We've run into other manifestations of this issue before.  Awhile ago
  I made a push to ensure that datatype input functions didn't leave any
  ill-defined padding bytes in their results, as a result of similar
  misbehavior for simple constants.  But this example shows that we'd
  really have to enforce the rule of no ill-defined bytes for just about
  every user-callable function's results, which is a pretty ugly prospect.
 
 FWIW, when I was running the test suite under valgrind, these were the 
 functions
 that left uninitialized bytes in datums: array_recv, array_set, 
 array_set_slice,
 array_map, construct_md_array, path_recv.  If the test suite covers this well,
 we're not far off.  (Actually, I only had the check in PageAddItem ... 
 probably
 needed to be in one or two other places to catch as much as possible.)

Adding a memory definedness check to printtup() turned up one more culprit:
tsquery_and.
*** a/src/backend/utils/adt/tsquery_util.c
--- b/src/backend/utils/adt/tsquery_util.c
***
*** 336,342  QTN2QT(QTNode *in)
cntsize(in, sumlen, nnode);
len = COMPUTESIZE(nnode, sumlen);
  
!   out = (TSQuery) palloc(len);
SET_VARSIZE(out, len);
out-size = nnode;
  
--- 336,342 
cntsize(in, sumlen, nnode);
len = COMPUTESIZE(nnode, sumlen);
  
!   out = (TSQuery) palloc0(len);
SET_VARSIZE(out, len);
out-size = nnode;
  

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Alignment padding bytes in arrays vs the planner

2011-04-27 Thread Tom Lane
Greg Stark gsst...@mit.edu writes:
 On Wed, Apr 27, 2011 at 12:23 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Any ideas about better answers?

 Here's a crazy idea. We could use string equality of the out
 function's representation instead. If an output function doesn't
 consistently output the same data for things that are equal or
 different data for things that aren't equal then there's a bug in it
 already since it means the data type won't survive a pg_dump/reload.

Hmm, cute idea, but existing output functions don't actually cooperate
with this goal very well:

* float4_out, float8_out, and most geometry types don't promise to
produce invertible results unless extra_float_digits is set high enough.

* timestamptz_out and friends react to both DateStyle and TimeZone
settings, which means you lose in the other direction: identical values
could print differently at different times.  This is a killer for the
planner's use since what it's doing is precisely comparing constants
generated during index or rule creation to those appearing in queries.

I think if we were going to go down this road, we'd have little choice
but to invent the specific concept of a type-specific identity function,
foo_identical(foo, foo) returns bool.  This wouldn't present any pain
for most datatype authors since omitting it would just license the
system to assume bitwise equality is the right behavior.  As far as the
other issues I mentioned go:

* We could dodge most of the performance hit if Const carried a flag
indicating whether the datatype has an identity function or not.  This
would allow equal() to fall through without a table lookup in the large
majority of cases.  It wouldn't add any expense to Const construction
since you already have to know or fetch other datatype properties like
pass-by-value.

* We could likely finesse the transactional-safety issue by allowing
equal() to fall back on bitwise comparison if not IsTransactionState.
I'm still not sure whether there actually are any cases where it has
to work outside a transaction, and if there are, the dumb/conservative
behavior is probably adequate.

Still, it'd be a lot of work, and it doesn't offer any chance of a
back-patchable fix.  Looking at Noah's valgrind results, I'm inclined
to just go seal off the known holes instead.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Alignment padding bytes in arrays vs the planner

2011-04-27 Thread Tom Lane
Noah Misch n...@leadboat.com writes:
 On Tue, Apr 26, 2011 at 07:23:12PM -0400, Tom Lane wrote:
 [input functions aren't the only problematic source of uninitialized datum 
 bytes]

 FWIW, when I was running the test suite under valgrind, these were the
 functions that left uninitialized bytes in datums: array_recv,
 array_set, array_set_slice, array_map, construct_md_array, path_recv.
 If the test suite covers this well, we're not far off.  (Actually, I
 only had the check in PageAddItem ... probably needed to be in one or
 two other places to catch as much as possible.)

Hmm.  Eyeballing arrayfuncs.c yesterday, I noted the following functions
using palloc where palloc0 would be safer:

array_recv
array_get_slice
array_set
array_set_slice
array_map
construct_md_array
construct_empty_array

The last may not be an actual hazard since I think there are no pad
bytes in its result, but on the other hand palloc0 is cheap insurance
for it.  I hadn't looked at the geometry functions but padding in paths
isn't surprising at all.

When dealing with very large arrays, there might be a case to be made
for not using palloc0 but trying to zero just what needs zeroed.
However that looks a bit complicated to get right, and it's not
impossible that it could end up being slower, since it would add
per-element processing to fill pad bytes instead of just skipping over
them.  (memset is pretty damn fast on most machines ...)  For the moment
I'm just going to do s/palloc/palloc0/ as a reliable and back-patchable
fix --- possibly in future someone will care to look into whether the
other way is a performance win.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Alignment padding bytes in arrays vs the planner

2011-04-26 Thread Tom Lane
I looked into David Johnston's report of curious planner behavior:
http://archives.postgresql.org/pgsql-general/2011-04/msg00885.php

What is happening is that the planner doesn't reliably see the
expression ti_status = ANY ('{ACTIVE,DISPATCHED,FAILURE}'::text[])
as equal to other copies of itself, so it gets a bit schizoid as
to whether to suppress duplicates.  (There might be something else
funny too, but that's most of it.)  The reason it doesn't see this
is that the text[] constant is formed from evaluation of an ArrayExpr,
and construct_md_array is cavalier about pad bytes in its output,
and equalfuncs.c compares Const datums by simple memcmp() so equal()
doesn't say TRUE for two array constants with different padding bytes.

We've run into other manifestations of this issue before.  Awhile ago
I made a push to ensure that datatype input functions didn't leave any
ill-defined padding bytes in their results, as a result of similar
misbehavior for simple constants.  But this example shows that we'd
really have to enforce the rule of no ill-defined bytes for just about
every user-callable function's results, which is a pretty ugly prospect.

The seemingly-obvious alternative is to teach equal() to use
type-specific comparison functions that will successfully ignore
semantically-insignificant bytes in a value's representation.  However
this answer has got its own serious problems, including performance,
transaction safety (I don't think we can assume that equal() is always
called within live transactions) and the difficulty of identifying
suitable comparison functions.  Not all types have btree comparison
functions, and for some of the ones that do, btree equality does not
imply that the values are indistinguishable for every purpose, which is
what we really need from equal().

We can solve David's immediate case by ensuring that the functions in
arrayfuncs.c don't emit arrays containing ill-defined bytes, but I have
little confidence that there aren't similar bugs hiding under other
rocks.  The only bright spot in the mess is that datatypes containing
pad bytes aren't that common, since usually people feel a need to make
the on-disk representation as small as possible.

Any ideas about better answers?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Alignment padding bytes in arrays vs the planner

2011-04-26 Thread Greg Sabino Mullane

-BEGIN PGP SIGNED MESSAGE-
Hash: RIPEMD160


 Any ideas about better answers?

Seems like you covered it - anything other than memcmp() is going 
to require a lot of brainz and have lots of sharp edges.

 But this example shows that we'd really have to enforce the rule 
 of no ill-defined bytes for just about every user-callable 
 function's results, which is a pretty ugly prospect.

Why is that so ugly? Seems the most logical route. And even if 
we don't get all of them right away (e.g. not 'enforced' right 
away), we're no worse off than we are now, but we don't have to 
dive into retraining equal() or touch any other parts of the code.

- -- 
Greg Sabino Mullane g...@turnstep.com
End Point Corporation http://www.endpoint.com/
PGP Key: 0x14964AC8 201104262139
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
-BEGIN PGP SIGNATURE-

iEYEAREDAAYFAk23dGEACgkQvJuQZxSWSsidwQCgrIc1I85P6a1jF5Xwq1vRbzwF
v/wAoImYBZZo930+IGgL61BEQ+1YCMaN
=9fkS
-END PGP SIGNATURE-



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Alignment padding bytes in arrays vs the planner

2011-04-26 Thread Greg Stark
On Wed, Apr 27, 2011 at 12:23 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Any ideas about better answers?


Here's a crazy idea. We could use string equality of the out
function's representation instead. If an output function doesn't
consistently output the same data for things that are equal or
different data for things that aren't equal then there's a bug in it
already since it means the data type won't survive a pg_dump/reload.

That alone wouldn't help since the output function could also depend
on being in a transaction but whenever we build the Const datum we
must be in a transaction, so we could store a string representation in
the Const datum and then when we need to do equal() just compare those
string representations...

I think this still performs terribly and it wastes lots of memory (and
I would assume space in rule representations?) so I think it's just a
crazy idea, but since you're asking

-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Alignment padding bytes in arrays vs the planner

2011-04-26 Thread Noah Misch
On Tue, Apr 26, 2011 at 07:23:12PM -0400, Tom Lane wrote:
[input functions aren't the only problematic source of uninitialized datum 
bytes]

 We've run into other manifestations of this issue before.  Awhile ago
 I made a push to ensure that datatype input functions didn't leave any
 ill-defined padding bytes in their results, as a result of similar
 misbehavior for simple constants.  But this example shows that we'd
 really have to enforce the rule of no ill-defined bytes for just about
 every user-callable function's results, which is a pretty ugly prospect.

FWIW, when I was running the test suite under valgrind, these were the functions
that left uninitialized bytes in datums: array_recv, array_set, array_set_slice,
array_map, construct_md_array, path_recv.  If the test suite covers this well,
we're not far off.  (Actually, I only had the check in PageAddItem ... probably
needed to be in one or two other places to catch as much as possible.)

 The seemingly-obvious alternative is to teach equal() to use
 type-specific comparison functions that will successfully ignore
 semantically-insignificant bytes in a value's representation.  However
 this answer has got its own serious problems, including performance,
 transaction safety (I don't think we can assume that equal() is always
 called within live transactions) and the difficulty of identifying
 suitable comparison functions.  Not all types have btree comparison
 functions, and for some of the ones that do, btree equality does not
 imply that the values are indistinguishable for every purpose, which is
 what we really need from equal().

Doesn't seem promising, indeed.

nm

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers