Re: [HACKERS] record identical operator - Review
On Thu, Oct 3, 2013 at 10:46:05AM -0700, Kevin Grittner wrote: opcname - varchar_ops kd_point_ops cidr_ops text_pattern_ops varchar_pattern_ops bpchar_pattern_ops (6 rows) Do these all have operators defined too? Every operator class is associated with operators. For example, while text_pattern_ops uses the same = and operators as the default text opclass (because that already uses memcmp), it adds ~~, ~=~, ~~, and ~=~ operators which also use memcmp (ignoring character encoding and collation). OK, my questions have been answered and I am no longer concerned about this patch causing equality confusion for our users. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] record identical operator - Review
On 09/30/2013 09:08 AM, Kevin Grittner wrote: Steve Singer st...@ssinger.info wrote: How about To support matching of rows which include elements without a default B-tree operator class, the following operators are defined for composite type comparison: literal*=/, literal*lt;gt;/, literal*lt;/, literal*lt;=/, literal*gt;/, and literal*gt;=/. These operators compare the internal binary representation of the two rows. Two rows might have a different binary representation even though comparisons of the two rows with the equality operator is true. The ordering of rows under these comparision operators is deterministic but not otherwise meaningful. These operators are used internally for materialized views and might be useful for other specialized purposes such as replication but are not intended to be generally useful for writing queries. I agree that's an improvement. Thanks! Are there any outstanding issues on this patch preventing it from being committed? I think we have discussed this patch enough such that we now have consensus on proceeding with adding a record identical operator to SQL. No one has objected to the latest names of the operators. You haven't adjusted the patch to reduce the duplication between the equality and comparison functions, if you disagree with me and feel that doing so would increase the code complexity and be inconsistent with how we do things elsewhere that is fine. Steve -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] record identical operator - Review
On Thu, Oct 3, 2013 at 09:59:03AM -0400, Steve Singer wrote: Are there any outstanding issues on this patch preventing it from being committed? I have not received answers to my email of October 1: http://www.postgresql.org/message-id/20131001024620.gb13...@momjian.us -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] record identical operator - Review
Steve, Thanks for following-up on this; I had meant to reply much sooner but other things got in the way. Thanks again, Stephen * Steve Singer (st...@ssinger.info) wrote: Are there any outstanding issues on this patch preventing it from being committed? I think we have discussed this patch enough such that we now have consensus on proceeding with adding a record identical operator to SQL. No one has objected to the latest names of the operators. You haven't adjusted the patch to reduce the duplication between the equality and comparison functions, if you disagree with me and feel that doing so would increase the code complexity and be inconsistent with how we do things elsewhere that is fine. signature.asc Description: Digital signature
Re: [HACKERS] record identical operator - Review
Bruce Momjian br...@momjian.us wrote: On Fri, Sep 27, 2013 at 03:34:20PM -0700, Kevin Grittner wrote: We first need to document the existing record comparison operators. If they read the docs for comparing row_constructors and expect that to be the behavior they get when they compare records, they will be surprised. Well, if they appear in \do, I am thinking they should be documented. This patch now covers the ones which are record comparison operators, old and new. Feel free to document the others if you feel strongly on that point; but I don't feel that becomes the business of this patch. Because comparing primary keys doesn't tell us whether the old and new values in the row all match. OK, but my question was about why we need a full set of operators rather than just equal, and maybe not equal. I thought you said we needed others, e.g. , so we could do merge joins, but I thought we would just be doing comparisons after primary keys are joined, and if that is true, we could just use a function. http://www.postgresql.org/docs/current/interactive/xoper-optimization.html#AEN54334 Actually, I am now realizing you have to use the non-binary-level equals comparison on keys, then the binary-level equals on rows for this to work --- that's pretty confusing. Is that true? It's a matter of replacing the = operator for record comparisons in these two places with the new *= operator. http://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/commands/matview.c;h=238ccc72f5205ae00a15e6e17f384addfa445552;hb=master#l553 http://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/commands/matview.c;h=238ccc72f5205ae00a15e6e17f384addfa445552;hb=master#l684 (With or without the new operator, the second of these needs to be schema-qualified to avoid trouble if the user adds a different implementation of = ahead of the pg_catalog implementation on search_path, as users can do.) The difference between = and *= would not generally be visible to end users -- just to those working on matview.c. A quick query (lacking schema information and schema qualification) shows what is there by default: OK, the unique list is: opcname - varchar_ops kd_point_ops cidr_ops text_pattern_ops varchar_pattern_ops bpchar_pattern_ops (6 rows) Do these all have operators defined too? Every operator class is associated with operators. For example, while text_pattern_ops uses the same = and operators as the default text opclass (because that already uses memcmp), it adds ~~, ~=~, ~~, and ~=~ operators which also use memcmp (ignoring character encoding and collation). -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] record identical operator - Review
Steve Singer st...@ssinger.info wrote: You haven't adjusted the patch to reduce the duplication between the equality and comparison functions, if you disagree with me and feel that doing so would increase the code complexity and be inconsistent with how we do things elsewhere that is fine. I think the main reason to keep them separate is that it makes it easier to compare record_cmp to record_image_cmp and record_eq to record_image_eq to see what the differences and similarities are. Other reasons are that I think all those conditionals inside a combined function would get messy and make the logic harder to understand. The number of places that would need conditionals, plus new wrappers that would be needed, would mean that the net reduction in lines of code would be minimal. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] record identical operator - Review
Steve Singer st...@ssinger.info wrote: How about To support matching of rows which include elements without a default B-tree operator class, the following operators are defined for composite type comparison: literal*=/, literal*lt;gt;/, literal*lt;/, literal*lt;=/, literal*gt;/, and literal*gt;=/. These operators compare the internal binary representation of the two rows. Two rows might have a different binary representation even though comparisons of the two rows with the equality operator is true. The ordering of rows under these comparision operators is deterministic but not otherwise meaningful. These operators are used internally for materialized views and might be useful for other specialized purposes such as replication but are not intended to be generally useful for writing queries. I agree that's an improvement. Thanks! -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] record identical operator - Review
On Fri, Sep 27, 2013 at 03:34:20PM -0700, Kevin Grittner wrote: The arguments for this patch are * We want the materialized view to return the same value as would be returned if the query were executed directly. This not only means that the values should be the same according to a datatypes = operator but that they should appear the same 'to the eyeball'. And to functions the user can run against the values. The example with the null bitmap for arrays being included when not necessary is something that isn't directly apparent to the eye, but queries which use pg_column_size would not get equal results. pg_column_size() is by definition internal details, so I am not worried about that not matching. * Supporting the materialized view refresh check via SQL makes a lot of sense but doing that requires exposing something via SQL * If we adequately document what we mean by record_image_identical and the operator we pick for this then users shouldn't be surprised at what they get if they use this We first need to document the existing record comparison operators. If they read the docs for comparing row_constructors and expect that to be the behavior they get when they compare records, they will be surprised. Well, if they appear in \do, I am thinking they should be documented. This is a good summary. I think there are a few things that make this issue difficult to decide: 1. We have to use an operator to give the RMVC (REFRESH MATERIALIZED VIEW CONCURRENTLY) SPI query the ability to optimize this query. If we could do this with an SQL or C function, there would be less concern about the confusion caused by adding this capability. (a) We can't. (b) Why would that be less confusing? Because function names, especially long ones, are more clearly self-documenting than operators. Question: If we are comparing based on some primary key, why do we need this to optimize? Because comparing primary keys doesn't tell us whether the old and new values in the row all match. OK, but my question was about why we need a full set of operators rather than just equal, and maybe not equal. I thought you said we needed others, e.g. , so we could do merge joins, but I thought we would just be doing comparisons after primary keys are joined, and if that is true, we could just use a function. Actually, I am now realizing you have to use the non-binary-level equals comparison on keys, then the binary-level equals on rows for this to work --- that's pretty confusing. Is that true? 3. Our type casting and operators are already complex, and adding another set of operators only compounds that. It cannot have any effect on any of the existing operators, so I'm not sure whether you are referring to the extra operators and functions, or something else. It does not, for example, introduce any risk of ambiguous operators. It makes our system more complex for the user to understand. One interesting approach would be to only allow the operator to be called from SPI queries. Why would that be a good idea? Because then it would be more of an internal operator. It would also be good to know about similar non-default entries in pg_opclass so we can understand the expected impact. A quick query (lacking schema information and schema qualification) shows what is there by default: OK, the unique list is: opcname - varchar_ops kd_point_ops cidr_ops text_pattern_ops varchar_pattern_ops bpchar_pattern_ops (6 rows) Do these all have operators defined too? -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] record identical operator - Review
On 09/28/2013 03:03 PM, Kevin Grittner wrote: para +To support matching of rows which include elements without a default +B-tree operator class, the following operators are defined for composite +type comparison: +literal*=/, +literal*lt;gt;/, +literal*lt;/, +literal*lt;=/, +literal*gt;/, and +literal*gt;=/. +These operators are also used internally to maintain materialized views, +and may be useful to replication software. To ensure that all user +visible changes are detected, even when the equality operator for the +type treats old and new values as equal, the byte images of the stored +data are compared. While ordering is deterministic, it is not generally +useful except to facilitate merge joins. Ordering may differ between +system architectures and major releases of +productnamePostgreSQL/productname. + /para How about To support matching of rows which include elements without a default B-tree operator class, the following operators are defined for composite type comparison: literal*=/, literal*lt;gt;/, literal*lt;/, literal*lt;=/, literal*gt;/, and literal*gt;=/. These operators compare the internal binary representation of the two rows. Two rows might have a different binary representation even though comparisons of the two rows with the equality operator is true. The ordering of rows under these comparision operators is deterministic but not otherwise meaningful. These operators are used internally for materialized views and might be useful for other specialized purposes such as replication but are not intended to be generally useful for writing queries. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] record identical operator - Review
Kevin Grittner kgri...@ymail.com wrote: Bruce Momjian br...@momjian.us wrote: I think we need to see a patch from Kevin that shows all the row comparisons documented and we can see the impact of the user-visibile part. First draft attached. I'm inclined to first submit a proposed documentation patch for the existing record operators, and see if we can make everyone happy with that, and *then* see about adding documentation for the new ones. Trying to deal with both at once is likely to increase confusion and wheel-spinning. When I took a stab at it, the new operators only seemed to merit a single paragraph, so that is included after all. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company*** a/doc/src/sgml/func.sgml --- b/doc/src/sgml/func.sgml *** *** 12739,12745 WHERE EXISTS (SELECT 1 FROM tab2 WHERE col2 = tab1.col2); para See xref linkend=row-wise-comparison for details about the meaning !of a row-wise comparison. /para /sect2 --- 12739,12745 para See xref linkend=row-wise-comparison for details about the meaning !of a row constructor comparison. /para /sect2 *** *** 12795,12806 WHERE EXISTS (SELECT 1 FROM tab2 WHERE col2 = tab1.col2); para See xref linkend=row-wise-comparison for details about the meaning !of a row-wise comparison. /para /sect2 sect2 !titleRow-wise Comparison/title indexterm zone=functions-subquery primarycomparison/primary --- 12795,12806 para See xref linkend=row-wise-comparison for details about the meaning !of a row constructor comparison. /para /sect2 sect2 !titleSingle-row Comparison/title indexterm zone=functions-subquery primarycomparison/primary *** *** 12823,12829 WHERE EXISTS (SELECT 1 FROM tab2 WHERE col2 = tab1.col2); para See xref linkend=row-wise-comparison for details about the meaning !of a row-wise comparison. /para /sect2 /sect1 --- 12823,12829 para See xref linkend=row-wise-comparison for details about the meaning !of a row constructor comparison. /para /sect2 /sect1 *** *** 12853,12864 WHERE EXISTS (SELECT 1 FROM tab2 WHERE col2 = tab1.col2); /indexterm indexterm primaryrow-wise comparison/primary /indexterm indexterm primarycomparison/primary !secondaryrow-wise/secondary /indexterm indexterm --- 12853,12874 /indexterm indexterm +primarycomposite type/primary +secondarycomparison/secondary + /indexterm + + indexterm primaryrow-wise comparison/primary /indexterm indexterm primarycomparison/primary !secondarycomposite type/secondary ! /indexterm ! ! indexterm !primarycomparison/primary !secondaryrow constructor/secondary /indexterm indexterm *** *** 13023,13029 AND /sect2 sect2 id=row-wise-comparison !titleRow-wise Comparison/title synopsis replaceablerow_constructor/replaceable replaceableoperator/replaceable replaceablerow_constructor/replaceable --- 13033,13039 /sect2 sect2 id=row-wise-comparison !titleRow Constructor Comparison/title synopsis replaceablerow_constructor/replaceable replaceableoperator/replaceable replaceablerow_constructor/replaceable *** *** 13033,13052 AND Each side is a row constructor, as described in xref linkend=sql-syntax-row-constructors. The two row values must have the same number of fields. !Each side is evaluated and they are compared row-wise. Row comparisons !are allowed when the replaceableoperator/replaceable is literal=/, literallt;gt;/, literallt;/, literallt;=/, literalgt;/ or !literalgt;=/, !or has semantics similar to one of these. (To be specific, an operator !can be a row comparison operator if it is a member of a B-tree operator !class, or is the negator of the literal=/ member of a B-tree operator !class.) /para para The literal=/ and literallt;gt;/ cases work slightly differently from the others. Two rows are considered --- 13043,13067 Each side is a row constructor, as described in xref linkend=sql-syntax-row-constructors. The two row values must have the same number of fields. !Each side is evaluated and they are compared row-wise. Row constructor !comparisons are allowed when the replaceableoperator/replaceable is literal=/, literallt;gt;/, literallt;/, literallt;=/, literalgt;/ or !literalgt;=/. !Every row element must be of a type which has a default B-tree operator !class or the attempted comparison may generate an error. /para + note +para + Errors related to the number or types of elements might not
Re: [HACKERS] record identical operator - Review
Bruce Momjian br...@momjian.us wrote: On Thu, Sep 19, 2013 at 06:31:38PM -0400, Steve Singer wrote: On 09/12/2013 06:27 PM, Kevin Grittner wrote: Attached is a patch for a bit of infrastructure I believe to be necessary for correct behavior of REFRESH MATERIALIZED VIEW CONCURRENTLY as well as incremental maintenance of matviews. Here is attempt at a review of the v1 patch. There has been a heated discussion on list about if we even want this type of operator. I will try to summarize it here The arguments against it are * that a record_image_identical call on two records that returns false can still return true under the equality operator, and the records might (or might not) appear to be the same to users * Once we expose this as an operator via SQL someone will find it and use it and then complain when we change things such as the internal representation of a datatype which might break there queries I don't see where that is possible unless they count on the order of records sorted with the new operators, which would not be something I recommend. Changing the internal representation cannot break simple use of the alternative equality definition as long as all you cared about was whether the binary storage format of the values was identical. * The differences between = and record_image_identical might not be understood by everywhere who finds the operator leading to confusion Either they find it and read the code, or we document it and they read the docs. * Using a criteria other than equality (the = operator provided by the datatype) might cause problems if we later provide 'on change' triggers for materialized views I would fight user-defined triggers for matviews tooth and nail. We need to get incremental maintenance working instead. The arguments for this patch are * We want the materialized view to return the same value as would be returned if the query were executed directly. This not only means that the values should be the same according to a datatypes = operator but that they should appear the same 'to the eyeball'. And to functions the user can run against the values. The example with the null bitmap for arrays being included when not necessary is something that isn't directly apparent to the eye, but queries which use pg_column_size would not get equal results. * Supporting the materialized view refresh check via SQL makes a lot of sense but doing that requires exposing something via SQL * If we adequately document what we mean by record_image_identical and the operator we pick for this then users shouldn't be surprised at what they get if they use this We first need to document the existing record comparison operators. If they read the docs for comparing row_constructors and expect that to be the behavior they get when they compare records, they will be surprised. This is a good summary. I think there are a few things that make this issue difficult to decide: 1. We have to use an operator to give the RMVC (REFRESH MATERIALIZED VIEW CONCURRENTLY) SPI query the ability to optimize this query. If we could do this with an SQL or C function, there would be less concern about the confusion caused by adding this capability. (a) We can't. (b) Why would that be less confusing? Question: If we are comparing based on some primary key, why do we need this to optimize? Because comparing primary keys doesn't tell us whether the old and new values in the row all match. Certainly the primary key index will be used, no? It currently uses all columns which are part of unique indexes on the matview which are not partial (i.e., they do not use a WHERE clause), they index only on columns (not expressions). It is not possible to define a primary key on a matview. There are two reasons for using these columns: (a) It provides a correctness guarantee against having duplicated rows which have no nulls. This guarantee is what allows us to use the differential technique which is faster than retail DELETE and re-INSERT of everything. (b) Since we don't support hash joins of records, and it would tend to be pretty slow if we did, it allows faster hash joins on one or more columns which stand a good chance of doing this efficiently. 2. Agregates are already non-deterministic, Only some of them, and only with certain source data. so some feel that adding this feature doesn't improve much. It allows a materialized view which contains a column of a type without a default btree opclass to *use* concurrent refresh, it makes queries with deterministic results always generate matview results with exactly match the results of a VIEW using the query if it were run at the same moment, and for nondeterministic queries, it provides results consistent with *some* result set which could have been returned by a view at the same time. Unless we do something, none of these things are true. That seems like much to me. The counter-argument is that
Re: [HACKERS] record identical operator - Review
On Thu, Sep 19, 2013 at 06:31:38PM -0400, Steve Singer wrote: On 09/12/2013 06:27 PM, Kevin Grittner wrote: Attached is a patch for a bit of infrastructure I believe to be necessary for correct behavior of REFRESH MATERIALIZED VIEW CONCURRENTLY as well as incremental maintenance of matviews. Here is attempt at a review of the v1 patch. There has been a heated discussion on list about if we even want this type of operator. I will try to summarize it here The arguments against it are * that a record_image_identical call on two records that returns false can still return true under the equality operator, and the records might (or might not) appear to be the same to users * Once we expose this as an operator via SQL someone will find it and use it and then complain when we change things such as the internal representation of a datatype which might break there queries * The differences between = and record_image_identical might not be understood by everywhere who finds the operator leading to confusion * Using a criteria other than equality (the = operator provided by the datatype) might cause problems if we later provide 'on change' triggers for materialized views The arguments for this patch are * We want the materialized view to return the same value as would be returned if the query were executed directly. This not only means that the values should be the same according to a datatypes = operator but that they should appear the same 'to the eyeball'. * Supporting the materialized view refresh check via SQL makes a lot of sense but doing that requires exposing something via SQL * If we adequately document what we mean by record_image_identical and the operator we pick for this then users shouldn't be surprised at what they get if they use this This is a good summary. I think there are a few things that make this issue difficult to decide: 1. We have to use an operator to give the RMVC (REFRESH MATERIALIZED VIEW CONCURRENTLY) SPI query the ability to optimize this query. If we could do this with an SQL or C function, there would be less concern about the confusion caused by adding this capability. Question: If we are comparing based on some primary key, why do we need this to optimize? Certainly the primary key index will be used, no? 2. Agregates are already non-deterministic, so some feel that adding this feature doesn't improve much. The counter-argument is that without the binary row comparison ability, rows could be returned that could _never_ have been produced by the base data, which is more of a user surprise than non-deterministic rows. 3. Our type casting and operators are already complex, and adding another set of operators only compounds that. 4. There are legitimate cases where tool makers might want the ability to compare rows binarily, e.g. for replication, and adding these operators would help with that. I think we need to see a patch from Kevin that shows all the row comparisons documented and we can see the impact of the user-visibile part. One interesting approach would be to only allow the operator to be called from SPI queries. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] record identical operator - Review
On Thu, Sep 26, 2013 at 05:50:08PM -0400, Bruce Momjian wrote: I think we need to see a patch from Kevin that shows all the row comparisons documented and we can see the impact of the user-visibile part. One interesting approach would be to only allow the operator to be called from SPI queries. It would also be good to know about similar non-default entries in pg_opclass so we can understand the expected impact. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] record identical operator - Review
On Thu, Sep 19, 2013 at 06:31:38PM -0400, Steve Singer wrote: I think there is agreement that better (as in more obscure) operators than === and !== need to be picked and we need to find a place in the user-docs to warn users of the behaviour of this operators. Hannu has proposed *==* binary equal, surely very equal by any other definition as wall !==? maybe not equal -- binary inequal, may still be equal by other comparison methods It's a pity operators must be non-alpha and can't be named. Something like: SELECT foo OPERATOR(byte_equivalent) bar; is simultaneously obscure, yet clear. Have a nice day, -- Martijn van Oosterhout klep...@svana.org http://svana.org/kleptog/ He who writes carelessly confesses thereby at the very outset that he does not attach much importance to his own thoughts. -- Arthur Schopenhauer signature.asc Description: Digital signature
Re: [HACKERS] record identical operator - Review
Steve Singer st...@ssinger.info wrote: On 09/12/2013 06:27 PM, Kevin Grittner wrote: Attached is a patch for a bit of infrastructure I believe to be necessary for correct behavior of REFRESH MATERIALIZED VIEW CONCURRENTLY as well as incremental maintenance of matviews. Here is attempt at a review of the v1 patch. Thanks for looking at it. There has been a heated discussion on list about if we even want this type of operator. It's not a question of whether we want to allow operators which define comparisons between objects in a non-standard way. We already have 11 such cases; this would add one more to make 12. In all cases there are different operators for making a comparison that return potentially different results from the default operators, to support uses that are specific to that type. I think there is agreement that better (as in more obscure) operators than === and !== need to be picked and we need to find a place in the user-docs to warn users of the behaviour of this operators. Hannu has proposed *==* binary equal, surely very equal by any other definition as wall !==? maybe not equal -- binary inequal, may still be equal by other comparison methods and no one has yet objected to this. I do. The issue is that there are a great many places that there are multiple definitions of equality. We generally try to use something which tries to convey something about the nature of the operation, since the fact that it can have different results is a given. There is nothing in those operators that says binary image comparison. I thought about prepending a hash character in front of normal operators, because that somehow suggests binary operations to my eye (although I have no idea whether it does so for anyone else), but those operators are already used for an alternative set of comparisons for intervals, with a different meaning. I'm still trying to come up with something I really like. My vote would be to update the patch to use those operator names and then figure out where we can document them. It it means we have to section describing the equal operator for records as well then maybe we should do that so we can get on with things. Code Review The record_image_eq and record_image_cmp functions are VERY similar. It seems to me that you could have the meet of these functions into a common function (that isn't exposed via SQL) that then behaves differently in 2 or 3 spots based on a parameter that controls if it detoasts the tuple for the compare or returns false on the equals. Did you look at the record_eq and record_cmp functions which exist before this patch? Is there a reason we should do it one way for the default operators and another way for this alternative? Do you think we should change record_eq and record_cmp to do things the way you suggest? Beyond that I don't see any issues with the code. You call out a question about two records being compared have a different number of columns should it always be an error, or only an error when they match on the columns they do have in common. The current behaviour is select * FROM r1,r2 where r1==r2; a | b | a | b | c ---+---+---+---+--- 1 | 1 | 1 | 2 | 1 (1 row) update r2 set b=1; UPDATE 1 test=# select * FROM r1,r2 where r2==r1; ERROR: cannot compare record types with different numbers of columns This seems like a bad idea to me. I don't like that I get a type comparison error only sometimes based on the values of the data in the column. If I'm a user testing code that compares two records with this operator and I test my query with 1 value pair then and it works then I'd naively expect to get a true or false on all other value sets of the same record type. Again, this is a result of following the precedent set by the existing record comparison operators. test=# create table r1 (c1 int, c2 int); CREATE TABLE test=# create table r2 (c1 int, c2 int, c3 int); CREATE TABLE test=# insert into r1 values (1, 2); INSERT 0 1 test=# insert into r2 values (3, 4, 5); INSERT 0 1 test=# select * from r1 join r2 on r1 r2; c1 | c2 | c1 | c2 | c3 ++++ 1 | 2 | 3 | 4 | 5 (1 row) test=# update r1 set c1 = 3, c2 = 4; UPDATE 1 test=# select * from r1 join r2 on r1 r2; ERROR: cannot compare record types with different numbers of columns Would be in favor of forcing a change to the record comparison operators which have been in use for years? If not, is there a good reason to have them behave differently in this regard? -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] record identical operator - Review
Steve, Thanks for providing a summary. * Steve Singer (st...@ssinger.info) wrote: The arguments for this patch are * We want the materialized view to return the same value as would be returned if the query were executed directly. This not only means that the values should be the same according to a datatypes = operator but that they should appear the same 'to the eyeball'. With the cases where the equality operator doesn't match the 'to the eyeball' appearance, this really seems to be a pipe dream to me, unless we *also* define an ordering or similar which would then change the existing semantics for end users (which might be reasonable), but I'm not sure how we could do that without input from the type- iow, I don't think we can say well, we'll order by byte representation. It's also going to possibly be a performance hit since we're going to have to do both an equality op call during a HashAgg/HashJoin/whatever and have to do a which one is bigger of these two equal things, and then I wonder if we'd need to allow users to somehow specify I don't want the bigger of the equal things, I want the smaller, in this query for this equality check. I'd really like to see how we're going to provide for that when the user is doing a GROUP BY without breaking something else or causing problems with later SQL spec revisions. * Supporting the materialized view refresh check via SQL makes a lot of sense but doing that requires exposing something via SQL ... which we don't currently expose for the queries that we already support users running today. Users seem to generally be accepting of that too, even though what they end up with in the output isn't necessairly consistent from query to query. The issue here is that we're trying to make the mat view look like what the query would do when run at the same time, which is a bit of a losing battle, imv. * If we adequately document what we mean by record_image_identical and the operator we pick for this then users shouldn't be surprised at what they get if they use this That's a bit over-simplistic, really. We do try to keep to the POLA (principle of least astonishment) and that's not going to be easy here. I think there is agreement that better (as in more obscure) operators than === and !== need to be picked and we need to find a place in the user-docs to warn users of the behaviour of this operators. Hannu has proposed I'm a bit on the fence about this, after having discussed my concerns with Robert at PostgresOpen. If we're going to expose and support these at the SQL level, and we can figure out some semantics and consistency for using them that isn't totally baffling, then perhaps having them as simple and clear operator names would be reasonable. One concern here is if the SQL spec might decide some day that '===' is a useful operator for something else (perhaps even they look the same when cast to text). *==* binary equal, surely very equal by any other definition as wall !==? maybe not equal -- binary inequal, may still be equal by other comparison methods Those look alright to me also though, but we'd need to work out the other operation names, right? And then figure out if we want to use those other operators (less-than, greater-than, etc) when doing equality operations to figure out which equal value to return to the user. Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] record identical operator - Review
Stephen Frost sfr...@snowman.net wrote: The issue here is that we're trying to make the mat view look like what the query would do when run at the same time, which is a bit of a losing battle, imv. If we're not doing that, I don't see the point of matviews. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] record identical operator - Review
On Friday, September 20, 2013, Kevin Grittner wrote: Stephen Frost sfr...@snowman.net javascript:; wrote: The issue here is that we're trying to make the mat view look like what the query would do when run at the same time, which is a bit of a losing battle, imv. If we're not doing that, I don't see the point of matviews. When taken out of context, I can see how that might not come across correctly. I was merely pointing out that it's a losing battle in the context of types which have equality operators which claim equalness but cast to text with results that don't match there. Thanks, Stephen
Re: [HACKERS] record identical operator - Review
Stephen Frost sfr...@snowman.net wrote: On Friday, September 20, 2013, Kevin Grittner wrote: Stephen Frost sfr...@snowman.net wrote: The issue here is that we're trying to make the mat view look like what the query would do when run at the same time, which is a bit of a losing battle, imv. If we're not doing that, I don't see the point of matviews. When taken out of context, I can see how that might not come across correctly. I was merely pointing out that it's a losing battle in the context of types which have equality operators which claim equalness but cast to text with results that don't match there. I think the patch I've submitted wins that battle. The only oddity is that if someone uses a query for a matview which can provide results with rows which are equal to previous result rows under the default record opclass but different under this patch's opclass, RMVC could update to the latest query results when someone thinks that might not be necessary. Workarounds would include using a query with deterministic results (like using the upper() or lower() function on a grouped citext column in the result set) or not using the CONCURRENTLY option. Neither seems too onerous. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] record identical operator - Review
On 09/20/2013 08:38 AM, Kevin Grittner wrote: Did you look at the record_eq and record_cmp functions which exist before this patch? Is there a reason we should do it one way for the default operators and another way for this alternative? Do you think we should change record_eq and record_cmp to do things the way you suggest? I think the record_eq and record_cmp functions would be better if they shared code as well, but I don't think changing that should be part of this patch, you aren't otherwise touching those functions. I know some people dislike code that is switch based and prefer duplication, my preference is to avoid duplication. This seems like a bad idea to me. I don't like that I get a type comparison error only sometimes based on the values of the data in the column. If I'm a user testing code that compares two records with this operator and I test my query with 1 value pair then and it works then I'd naively expect to get a true or false on all other value sets of the same record type. Again, this is a result of following the precedent set by the existing record comparison operators. test=# create table r1 (c1 int, c2 int); CREATE TABLE test=# create table r2 (c1 int, c2 int, c3 int); CREATE TABLE test=# insert into r1 values (1, 2); INSERT 0 1 test=# insert into r2 values (3, 4, 5); INSERT 0 1 test=# select * from r1 join r2 on r1 r2; c1 | c2 | c1 | c2 | c3 ++++ 1 | 2 | 3 | 4 | 5 (1 row) test=# update r1 set c1 = 3, c2 = 4; UPDATE 1 test=# select * from r1 join r2 on r1 r2; ERROR: cannot compare record types with different numbers of columns Would be in favor of forcing a change to the record comparison operators which have been in use for years? If not, is there a good reason to have them behave differently in this regard? -- I hadn't picked up on that you copied that part of the behaviour from the existing comparison operators. No we would need a really good reason for changing the behaviour of the comparison operators and I don't think we have that. I agree that the binary identical operators should behave similarly to the existing comparison operators and error out on the column number mismatch after comparing the columns that are present in both. Steve Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] record identical operator - Review
On 09/12/2013 06:27 PM, Kevin Grittner wrote: Attached is a patch for a bit of infrastructure I believe to be necessary for correct behavior of REFRESH MATERIALIZED VIEW CONCURRENTLY as well as incremental maintenance of matviews. Here is attempt at a review of the v1 patch. There has been a heated discussion on list about if we even want this type of operator. I will try to summarize it here The arguments against it are * that a record_image_identical call on two records that returns false can still return true under the equality operator, and the records might (or might not) appear to be the same to users * Once we expose this as an operator via SQL someone will find it and use it and then complain when we change things such as the internal representation of a datatype which might break there queries * The differences between = and record_image_identical might not be understood by everywhere who finds the operator leading to confusion * Using a criteria other than equality (the = operator provided by the datatype) might cause problems if we later provide 'on change' triggers for materialized views The arguments for this patch are * We want the materialized view to return the same value as would be returned if the query were executed directly. This not only means that the values should be the same according to a datatypes = operator but that they should appear the same 'to the eyeball'. * Supporting the materialized view refresh check via SQL makes a lot of sense but doing that requires exposing something via SQL * If we adequately document what we mean by record_image_identical and the operator we pick for this then users shouldn't be surprised at what they get if they use this I think there is agreement that better (as in more obscure) operators than === and !== need to be picked and we need to find a place in the user-docs to warn users of the behaviour of this operators. Hannu has proposed *==* binary equal, surely very equal by any other definition as wall !==? maybe not equal -- binary inequal, may still be equal by other comparison methods and no one has yet objected to this. My vote would be to update the patch to use those operator names and then figure out where we can document them. It it means we have to section describing the equal operator for records as well then maybe we should do that so we can get on with things. Code Review The record_image_eq and record_image_cmp functions are VERY similar. It seems to me that you could have the meet of these functions into a common function (that isn't exposed via SQL) that then behaves differently in 2 or 3 spots based on a parameter that controls if it detoasts the tuple for the compare or returns false on the equals. Beyond that I don't see any issues with the code. You call out a question about two records being compared have a different number of columns should it always be an error, or only an error when they match on the columns they do have in common. The current behaviour is select * FROM r1,r2 where r1==r2; a | b | a | b | c ---+---+---+---+--- 1 | 1 | 1 | 2 | 1 (1 row) update r2 set b=1; UPDATE 1 test=# select * FROM r1,r2 where r2==r1; ERROR: cannot compare record types with different numbers of columns This seems like a bad idea to me. I don't like that I get a type comparison error only sometimes based on the values of the data in the column. If I'm a user testing code that compares two records with this operator and I test my query with 1 value pair then and it works then I'd naively expect to get a true or false on all other value sets of the same record type. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers