Re: [HACKERS] Alignment padding bytes in arrays vs the planner
Robert Haas writes: > Excellent. Are you going to look at MauMau's patch for bug #6011 also? No. I don't do Windows, so I can't test it. (On general principles, I don't think that hacking write_eventlog the way he did is appropriate; such a function should write the log, not editorialize. But that's up to whoever does commit it.) regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Alignment padding bytes in arrays vs the planner
Noah Misch writes: > Adding a memory definedness check to printtup() turned up one more culprit: > tsquery_and. Patch applied, thanks. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Alignment padding bytes in arrays vs the planner
On Tue, May 24, 2011 at 2:18 PM, Tom Lane wrote: > Robert Haas writes: >> On Tue, May 24, 2011 at 2:11 PM, Noah Misch wrote: >>> QTN2QT() allocates memory for a TSQuery using palloc(). TSQuery contains an >>> array of QueryItem, which contains three bytes of padding between its first >>> and >>> second members. Those bytes don't get initialized, so we have unpredictable >>> content in the resulting datum. > >> OK, so I guess this needs to be applied and back-patched to 8.3, then. > > Yeah. I'm in process of doing that, actually. Excellent. Are you going to look at MauMau's patch for bug #6011 also? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Alignment padding bytes in arrays vs the planner
Robert Haas writes: > On Tue, May 24, 2011 at 2:11 PM, Noah Misch wrote: >> QTN2QT() allocates memory for a TSQuery using palloc(). TSQuery contains an >> array of QueryItem, which contains three bytes of padding between its first >> and >> second members. Those bytes don't get initialized, so we have unpredictable >> content in the resulting datum. > OK, so I guess this needs to be applied and back-patched to 8.3, then. Yeah. I'm in process of doing that, actually. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Alignment padding bytes in arrays vs the planner
On Tue, May 24, 2011 at 2:11 PM, Noah Misch wrote: >> OK, I can't see what's broken. Help? > > QTN2QT() allocates memory for a TSQuery using palloc(). TSQuery contains an > array of QueryItem, which contains three bytes of padding between its first > and > second members. Those bytes don't get initialized, so we have unpredictable > content in the resulting datum. OK, so I guess this needs to be applied and back-patched to 8.3, then. 8.2 doesn't have this code. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Alignment padding bytes in arrays vs the planner
On Tue, May 24, 2011 at 02:05:33PM -0400, Robert Haas wrote: > On Mon, May 23, 2011 at 1:12 AM, Noah Misch wrote: > > On Tue, Apr 26, 2011 at 11:51:35PM -0400, Noah Misch wrote: > >> On Tue, Apr 26, 2011 at 07:23:12PM -0400, Tom Lane wrote: > >> [input functions aren't the only problematic source of uninitialized datum > >> bytes] > >> > >> > We've run into other manifestations of this issue before. ?Awhile ago > >> > I made a push to ensure that datatype input functions didn't leave any > >> > ill-defined padding bytes in their results, as a result of similar > >> > misbehavior for simple constants. ?But this example shows that we'd > >> > really have to enforce the rule of "no ill-defined bytes" for just about > >> > every user-callable function's results, which is a pretty ugly prospect. > >> > >> FWIW, when I was running the test suite under valgrind, these were the > >> functions > >> that left uninitialized bytes in datums: array_recv, array_set, > >> array_set_slice, > >> array_map, construct_md_array, path_recv. ?If the test suite covers this > >> well, > >> we're not far off. ?(Actually, I only had the check in PageAddItem ... > >> probably > >> needed to be in one or two other places to catch as much as possible.) > > > > Adding a memory definedness check to printtup() turned up one more culprit: > > tsquery_and. > > *squints* > > OK, I can't see what's broken. Help? QTN2QT() allocates memory for a TSQuery using palloc(). TSQuery contains an array of QueryItem, which contains three bytes of padding between its first and second members. Those bytes don't get initialized, so we have unpredictable content in the resulting datum. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Alignment padding bytes in arrays vs the planner
On Mon, May 23, 2011 at 1:12 AM, Noah Misch wrote: > On Tue, Apr 26, 2011 at 11:51:35PM -0400, Noah Misch wrote: >> On Tue, Apr 26, 2011 at 07:23:12PM -0400, Tom Lane wrote: >> [input functions aren't the only problematic source of uninitialized datum >> bytes] >> >> > We've run into other manifestations of this issue before. Awhile ago >> > I made a push to ensure that datatype input functions didn't leave any >> > ill-defined padding bytes in their results, as a result of similar >> > misbehavior for simple constants. But this example shows that we'd >> > really have to enforce the rule of "no ill-defined bytes" for just about >> > every user-callable function's results, which is a pretty ugly prospect. >> >> FWIW, when I was running the test suite under valgrind, these were the >> functions >> that left uninitialized bytes in datums: array_recv, array_set, >> array_set_slice, >> array_map, construct_md_array, path_recv. If the test suite covers this >> well, >> we're not far off. (Actually, I only had the check in PageAddItem ... >> probably >> needed to be in one or two other places to catch as much as possible.) > > Adding a memory definedness check to printtup() turned up one more culprit: > tsquery_and. *squints* OK, I can't see what's broken. Help? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Alignment padding bytes in arrays vs the planner
On Tue, Apr 26, 2011 at 11:51:35PM -0400, Noah Misch wrote: > On Tue, Apr 26, 2011 at 07:23:12PM -0400, Tom Lane wrote: > [input functions aren't the only problematic source of uninitialized datum > bytes] > > > We've run into other manifestations of this issue before. Awhile ago > > I made a push to ensure that datatype input functions didn't leave any > > ill-defined padding bytes in their results, as a result of similar > > misbehavior for simple constants. But this example shows that we'd > > really have to enforce the rule of "no ill-defined bytes" for just about > > every user-callable function's results, which is a pretty ugly prospect. > > FWIW, when I was running the test suite under valgrind, these were the > functions > that left uninitialized bytes in datums: array_recv, array_set, > array_set_slice, > array_map, construct_md_array, path_recv. If the test suite covers this well, > we're not far off. (Actually, I only had the check in PageAddItem ... > probably > needed to be in one or two other places to catch as much as possible.) Adding a memory definedness check to printtup() turned up one more culprit: tsquery_and. *** a/src/backend/utils/adt/tsquery_util.c --- b/src/backend/utils/adt/tsquery_util.c *** *** 336,342 QTN2QT(QTNode *in) cntsize(in, &sumlen, &nnode); len = COMPUTESIZE(nnode, sumlen); ! out = (TSQuery) palloc(len); SET_VARSIZE(out, len); out->size = nnode; --- 336,342 cntsize(in, &sumlen, &nnode); len = COMPUTESIZE(nnode, sumlen); ! out = (TSQuery) palloc0(len); SET_VARSIZE(out, len); out->size = nnode; -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Alignment padding bytes in arrays vs the planner
Noah Misch writes: > On Tue, Apr 26, 2011 at 07:23:12PM -0400, Tom Lane wrote: > [input functions aren't the only problematic source of uninitialized datum > bytes] > FWIW, when I was running the test suite under valgrind, these were the > functions that left uninitialized bytes in datums: array_recv, > array_set, array_set_slice, array_map, construct_md_array, path_recv. > If the test suite covers this well, we're not far off. (Actually, I > only had the check in PageAddItem ... probably needed to be in one or > two other places to catch as much as possible.) Hmm. Eyeballing arrayfuncs.c yesterday, I noted the following functions using palloc where palloc0 would be safer: array_recv array_get_slice array_set array_set_slice array_map construct_md_array construct_empty_array The last may not be an actual hazard since I think there are no pad bytes in its result, but on the other hand palloc0 is cheap insurance for it. I hadn't looked at the geometry functions but padding in paths isn't surprising at all. When dealing with very large arrays, there might be a case to be made for not using palloc0 but trying to zero just what needs zeroed. However that looks a bit complicated to get right, and it's not impossible that it could end up being slower, since it would add per-element processing to fill pad bytes instead of just skipping over them. (memset is pretty damn fast on most machines ...) For the moment I'm just going to do s/palloc/palloc0/ as a reliable and back-patchable fix --- possibly in future someone will care to look into whether the other way is a performance win. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Alignment padding bytes in arrays vs the planner
Greg Stark writes: > On Wed, Apr 27, 2011 at 12:23 AM, Tom Lane wrote: >> Any ideas about better answers? > Here's a crazy idea. We could use string equality of the out > function's representation instead. If an output function doesn't > consistently output the same data for things that are equal or > different data for things that aren't equal then there's a bug in it > already since it means the data type won't survive a pg_dump/reload. Hmm, cute idea, but existing output functions don't actually cooperate with this goal very well: * float4_out, float8_out, and most geometry types don't promise to produce invertible results unless extra_float_digits is set high enough. * timestamptz_out and friends react to both DateStyle and TimeZone settings, which means you lose in the other direction: identical values could print differently at different times. This is a killer for the planner's use since what it's doing is precisely comparing constants generated during index or rule creation to those appearing in queries. I think if we were going to go down this road, we'd have little choice but to invent the specific concept of a type-specific "identity" function, "foo_identical(foo, foo) returns bool". This wouldn't present any pain for most datatype authors since omitting it would just license the system to assume bitwise equality is the right behavior. As far as the other issues I mentioned go: * We could dodge most of the performance hit if Const carried a flag indicating whether the datatype has an identity function or not. This would allow equal() to fall through without a table lookup in the large majority of cases. It wouldn't add any expense to Const construction since you already have to know or fetch other datatype properties like pass-by-value. * We could likely finesse the transactional-safety issue by allowing equal() to fall back on bitwise comparison if not IsTransactionState. I'm still not sure whether there actually are any cases where it has to work outside a transaction, and if there are, the dumb/conservative behavior is probably adequate. Still, it'd be a lot of work, and it doesn't offer any chance of a back-patchable fix. Looking at Noah's valgrind results, I'm inclined to just go seal off the known holes instead. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Alignment padding bytes in arrays vs the planner
On Tue, Apr 26, 2011 at 07:23:12PM -0400, Tom Lane wrote: [input functions aren't the only problematic source of uninitialized datum bytes] > We've run into other manifestations of this issue before. Awhile ago > I made a push to ensure that datatype input functions didn't leave any > ill-defined padding bytes in their results, as a result of similar > misbehavior for simple constants. But this example shows that we'd > really have to enforce the rule of "no ill-defined bytes" for just about > every user-callable function's results, which is a pretty ugly prospect. FWIW, when I was running the test suite under valgrind, these were the functions that left uninitialized bytes in datums: array_recv, array_set, array_set_slice, array_map, construct_md_array, path_recv. If the test suite covers this well, we're not far off. (Actually, I only had the check in PageAddItem ... probably needed to be in one or two other places to catch as much as possible.) > The seemingly-obvious alternative is to teach equal() to use > type-specific comparison functions that will successfully ignore > semantically-insignificant bytes in a value's representation. However > this answer has got its own serious problems, including performance, > transaction safety (I don't think we can assume that equal() is always > called within live transactions) and the difficulty of identifying > suitable comparison functions. Not all types have btree comparison > functions, and for some of the ones that do, btree "equality" does not > imply that the values are indistinguishable for every purpose, which is > what we really need from equal(). Doesn't seem promising, indeed. nm -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Alignment padding bytes in arrays vs the planner
On Wed, Apr 27, 2011 at 12:23 AM, Tom Lane wrote: > Any ideas about better answers? > Here's a crazy idea. We could use string equality of the out function's representation instead. If an output function doesn't consistently output the same data for things that are equal or different data for things that aren't equal then there's a bug in it already since it means the data type won't survive a pg_dump/reload. That alone wouldn't help since the output function could also depend on being in a transaction but whenever we build the Const datum we must be in a transaction, so we could store a string representation in the Const datum and then when we need to do equal() just compare those string representations... I think this still performs terribly and it wastes lots of memory (and I would assume space in rule representations?) so I think it's just a crazy idea, but since you're asking -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Alignment padding bytes in arrays vs the planner
-BEGIN PGP SIGNED MESSAGE- Hash: RIPEMD160 > Any ideas about better answers? Seems like you covered it - anything other than memcmp() is going to require a lot of brainz and have lots of sharp edges. > But this example shows that we'd really have to enforce the rule > of "no ill-defined bytes" for just about every user-callable > function's results, which is a pretty ugly prospect. Why is that so ugly? Seems the most logical route. And even if we don't get all of them right away (e.g. not 'enforced' right away), we're no worse off than we are now, but we don't have to dive into retraining equal() or touch any other parts of the code. - -- Greg Sabino Mullane g...@turnstep.com End Point Corporation http://www.endpoint.com/ PGP Key: 0x14964AC8 201104262139 http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8 -BEGIN PGP SIGNATURE- iEYEAREDAAYFAk23dGEACgkQvJuQZxSWSsidwQCgrIc1I85P6a1jF5Xwq1vRbzwF v/wAoImYBZZo930+IGgL61BEQ+1YCMaN =9fkS -END PGP SIGNATURE- -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Alignment padding bytes in arrays vs the planner
I looked into David Johnston's report of curious planner behavior: http://archives.postgresql.org/pgsql-general/2011-04/msg00885.php What is happening is that the planner doesn't reliably see the expression ti_status = ANY ('{ACTIVE,DISPATCHED,FAILURE}'::text[]) as equal to other copies of itself, so it gets a bit schizoid as to whether to suppress duplicates. (There might be something else funny too, but that's most of it.) The reason it doesn't see this is that the text[] constant is formed from evaluation of an ArrayExpr, and construct_md_array is cavalier about pad bytes in its output, and equalfuncs.c compares Const datums by simple memcmp() so equal() doesn't say TRUE for two array constants with different padding bytes. We've run into other manifestations of this issue before. Awhile ago I made a push to ensure that datatype input functions didn't leave any ill-defined padding bytes in their results, as a result of similar misbehavior for simple constants. But this example shows that we'd really have to enforce the rule of "no ill-defined bytes" for just about every user-callable function's results, which is a pretty ugly prospect. The seemingly-obvious alternative is to teach equal() to use type-specific comparison functions that will successfully ignore semantically-insignificant bytes in a value's representation. However this answer has got its own serious problems, including performance, transaction safety (I don't think we can assume that equal() is always called within live transactions) and the difficulty of identifying suitable comparison functions. Not all types have btree comparison functions, and for some of the ones that do, btree "equality" does not imply that the values are indistinguishable for every purpose, which is what we really need from equal(). We can solve David's immediate case by ensuring that the functions in arrayfuncs.c don't emit arrays containing ill-defined bytes, but I have little confidence that there aren't similar bugs hiding under other rocks. The only bright spot in the mess is that datatypes containing pad bytes aren't that common, since usually people feel a need to make the on-disk representation as small as possible. Any ideas about better answers? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers