Re: [HACKERS] record identical operator - Review

2013-10-06 Thread Bruce Momjian
On Thu, Oct  3, 2013 at 10:46:05AM -0700, Kevin Grittner wrote:
    opcname
  -
  varchar_ops
  kd_point_ops
  cidr_ops
  text_pattern_ops
  varchar_pattern_ops
  bpchar_pattern_ops
  (6 rows)
 
  Do these all have operators defined too?
 
 Every operator class is associated with operators.  For example,
 while text_pattern_ops uses the same = and  operators as the
 default text opclass (because that already uses memcmp), it adds
 ~~, ~=~, ~~, and ~=~ operators which also use memcmp (ignoring
 character encoding and collation).

OK, my questions have been answered and I am no longer concerned about
this patch causing equality confusion for our users.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] record identical operator - Review

2013-10-03 Thread Steve Singer

On 09/30/2013 09:08 AM, Kevin Grittner wrote:

Steve Singer st...@ssinger.info wrote:


How about

To support matching of rows which include elements without a default
  B-tree operator class, the following operators are defined for composite
  type comparison:
  literal*=/,
  literal*lt;gt;/,
  literal*lt;/,
  literal*lt;=/,
  literal*gt;/, and
  literal*gt;=/.

These operators compare the internal binary representation of the two
rows.  Two rows might have a different binary representation even
though comparisons of the two rows with the equality operator is true.
The ordering of rows under these comparision operators is deterministic
but not otherwise meaningful.  These operators are used internally for
materialized views and might be useful for other specialized purposes
such as replication but are not intended to be generally useful for
writing queries.

I agree that's an improvement.  Thanks!



Are there any outstanding issues on this patch preventing it from being 
committed?
I think we have discussed this patch enough such that we now have 
consensus on proceeding with adding a record identical operator to SQL.

No one has objected to the latest names of the operators.

You haven't adjusted the patch to reduce the duplication between the 
equality and comparison functions, if you disagree with me and feel that 
doing so would increase the code complexity and be inconsistent with how 
we do things elsewhere that is fine.


Steve




--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company






--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] record identical operator - Review

2013-10-03 Thread Bruce Momjian
On Thu, Oct  3, 2013 at 09:59:03AM -0400, Steve Singer wrote:
 Are there any outstanding issues on this patch preventing it from
 being committed?

I have not received answers to my email of October 1:

http://www.postgresql.org/message-id/20131001024620.gb13...@momjian.us

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] record identical operator - Review

2013-10-03 Thread Stephen Frost
Steve,

Thanks for following-up on this; I had meant to reply much sooner but
other things got in the way.

Thanks again,

Stephen

* Steve Singer (st...@ssinger.info) wrote:
 Are there any outstanding issues on this patch preventing it from
 being committed?
 I think we have discussed this patch enough such that we now have
 consensus on proceeding with adding a record identical operator to
 SQL.
 No one has objected to the latest names of the operators.
 
 You haven't adjusted the patch to reduce the duplication between the
 equality and comparison functions, if you disagree with me and feel
 that doing so would increase the code complexity and be inconsistent
 with how we do things elsewhere that is fine.


signature.asc
Description: Digital signature


Re: [HACKERS] record identical operator - Review

2013-10-03 Thread Kevin Grittner
Bruce Momjian br...@momjian.us wrote:
 On Fri, Sep 27, 2013 at 03:34:20PM -0700, Kevin Grittner wrote:

 We first need to document the existing record comparison operators.
 If they read the docs for comparing row_constructors and expect
 that to be the behavior they get when they compare records, they
 will be surprised.

 Well, if they appear in \do, I am thinking they should be documented.

This patch now covers the ones which are record comparison
operators, old and new.  Feel free to document the others if you
feel strongly on that point; but I don't feel that becomes the
business of this patch.

 Because comparing primary keys doesn't tell us whether the old and
 new values in the row all match.

 OK, but my question was about why we need a full set of operators rather
 than just equal, and maybe not equal.  I thought you said we needed
 others, e.g. , so we could do merge joins, but I thought we would just
 be doing comparisons after primary keys are joined, and if that is true,
 we could just use a function.

http://www.postgresql.org/docs/current/interactive/xoper-optimization.html#AEN54334

 Actually, I am now realizing you have to use the non-binary-level equals
 comparison on keys, then the binary-level equals on rows for this to
 work --- that's pretty confusing.  Is that true?

It's a matter of replacing the = operator for record comparisons in
these two places with the new *= operator.

http://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/commands/matview.c;h=238ccc72f5205ae00a15e6e17f384addfa445552;hb=master#l553
http://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/commands/matview.c;h=238ccc72f5205ae00a15e6e17f384addfa445552;hb=master#l684

(With or without the new operator, the second of these needs to be
schema-qualified to avoid trouble if the user adds a different
implementation of = ahead of the pg_catalog implementation on
search_path, as users can do.)  The difference between = and *=
would not generally be visible to end users -- just to those
working on matview.c.

 A quick query (lacking schema information and schema qualification)
 shows what is there by default:

 OK, the unique list is:

   opcname
 -
 varchar_ops
 kd_point_ops
 cidr_ops
 text_pattern_ops
 varchar_pattern_ops
 bpchar_pattern_ops
 (6 rows)

 Do these all have operators defined too?

Every operator class is associated with operators.  For example,
while text_pattern_ops uses the same = and  operators as the
default text opclass (because that already uses memcmp), it adds
~~, ~=~, ~~, and ~=~ operators which also use memcmp (ignoring
character encoding and collation).

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] record identical operator - Review

2013-10-03 Thread Kevin Grittner
Steve Singer st...@ssinger.info wrote:

 You haven't adjusted the patch to reduce the duplication between the
 equality and comparison functions, if you disagree with me and feel that
 doing so would increase the code complexity and be inconsistent with how
 we do things elsewhere that is fine.

I think the main reason to keep them separate is that it makes it
easier to compare record_cmp to record_image_cmp and record_eq to
record_image_eq to see what the differences and similarities are. 
Other reasons are that I think all those conditionals inside a
combined function would get messy and make the logic harder to
understand.  The number of places that would need conditionals,
plus new wrappers that would be needed, would mean that the net
reduction in lines of code would be minimal.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] record identical operator - Review

2013-09-30 Thread Kevin Grittner
Steve Singer st...@ssinger.info wrote:

 How about

   To support matching of rows which include elements without a default
 B-tree operator class, the following operators are defined for composite
 type comparison:
 literal*=/,
 literal*lt;gt;/,
 literal*lt;/,
 literal*lt;=/,
 literal*gt;/, and
 literal*gt;=/.

 These operators compare the internal binary representation of the two
 rows.  Two rows might have a different binary representation even
 though comparisons of the two rows with the equality operator is true. 
 The ordering of rows under these comparision operators is deterministic
 but not otherwise meaningful.  These operators are used internally for
 materialized views and might be useful for other specialized purposes
 such as replication but are not intended to be generally useful for
 writing queries.

I agree that's an improvement.  Thanks!

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] record identical operator - Review

2013-09-30 Thread Bruce Momjian
On Fri, Sep 27, 2013 at 03:34:20PM -0700, Kevin Grittner wrote:
  The arguments for this patch are
  * We want the materialized view to return the same value as
  would be returned if the query were executed directly.  This not
  only means that the values should be the same according to a
  datatypes = operator but that they should appear the same 'to
  the eyeball'.
 
 And to functions the user can run against the values.  The example
 with the null bitmap for arrays being included when not necessary
 is something that isn't directly apparent to the eye, but queries
 which use pg_column_size would not get equal results.

pg_column_size() is by definition internal details, so I am not worried
about that not matching.

  * Supporting the materialized view refresh check via SQL makes a
  lot of sense but doing that requires exposing something via SQL
  * If we adequately document what we mean by
  record_image_identical and the operator we pick for this then
  users shouldn't be surprised at what they get if they use this
 
 We first need to document the existing record comparison operators.
 If they read the docs for comparing row_constructors and expect
 that to be the behavior they get when they compare records, they
 will be surprised.

Well, if they appear in \do, I am thinking they should be documented.

  This is a good summary.  I think there are a few things that make
  this issue difficult to decide:
 
  1.  We have to use an operator to give the RMVC (REFRESH
  MATERIALIZED VIEW CONCURRENTLY) SPI query the ability to optimize
  this query.  If we could do this with an SQL or C function, there
  would be less concern about the confusion caused by adding this
  capability.
 
  (a)  We can't.
 
  (b)  Why would that be less confusing?

Because function names, especially long ones, are more clearly
self-documenting than operators.

  Question:  If we are comparing based on some primary key, why do
  we need this to optimize?
 
 Because comparing primary keys doesn't tell us whether the old and
 new values in the row all match.

OK, but my question was about why we need a full set of operators rather
than just equal, and maybe not equal.  I thought you said we needed
others, e.g. , so we could do merge joins, but I thought we would just
be doing comparisons after primary keys are joined, and if that is true,
we could just use a function.

Actually, I am now realizing you have to use the non-binary-level equals
comparison on keys, then the binary-level equals on rows for this to
work --- that's pretty confusing.  Is that true?

  3.  Our type casting and operators are already complex, and
  adding another set of operators only compounds that.
 
 It cannot have any effect on any of the existing operators, so I'm
 not sure whether you are referring to the extra operators and
 functions, or something else.  It does not, for example, introduce
 any risk of ambiguous operators.

It makes our system more complex for the user to understand.

  One interesting approach would be to only allow the operator to
  be called from SPI queries.
 
 Why would that be a good idea?

Because then it would be more of an internal operator.

  It would also be good to know about similar non-default entries
  in pg_opclass so we can understand the expected impact.
 
 A quick query (lacking schema information and schema qualification)
 shows what is there by default:

OK, the unique list is:

   opcname
-
 varchar_ops
 kd_point_ops
 cidr_ops
 text_pattern_ops
 varchar_pattern_ops
 bpchar_pattern_ops
(6 rows)

Do these all have operators defined too?

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] record identical operator - Review

2013-09-29 Thread Steve Singer

On 09/28/2013 03:03 PM, Kevin Grittner wrote:

para
+To support matching of rows which include elements without a default
+B-tree operator class, the following operators are defined for composite
+type comparison:
+literal*=/,
+literal*lt;gt;/,
+literal*lt;/,
+literal*lt;=/,
+literal*gt;/, and
+literal*gt;=/.
+These operators are also used internally to maintain materialized views,
+and may be useful to replication software.  To ensure that all user
+visible changes are detected, even when the equality operator for the
+type treats old and new values as equal, the byte images of the stored
+data are compared.  While ordering is deterministic, it is not generally
+useful except to facilitate merge joins.  Ordering may differ between
+system architectures and major releases of
+productnamePostgreSQL/productname.
+   /para


How about

 To support matching of rows which include elements without a default
B-tree operator class, the following operators are defined for composite
type comparison:
literal*=/,
literal*lt;gt;/,
literal*lt;/,
literal*lt;=/,
literal*gt;/, and
   literal*gt;=/.

These operators compare the internal binary representation of the two 
rows.   Two rows might have a different binary representation even 
though comparisons of the two rows with the equality operator is true.   
The ordering of rows under these comparision operators is deterministic 
but not otherwise meaningful.  These operators are used internally for 
materialized views and might be useful for other specialized purposes 
such as replication but are not intended to be generally useful for 
writing queries.





--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] record identical operator - Review

2013-09-28 Thread Kevin Grittner
Kevin Grittner kgri...@ymail.com wrote:
 Bruce Momjian br...@momjian.us wrote:

 I think we need to see a patch from Kevin that shows all the row
 comparisons documented and we can see the impact of the
 user-visibile part.

First draft attached.

 I'm inclined to first submit a proposed documentation patch for the
 existing record operators, and see if we can make everyone happy
 with that, and *then* see about adding documentation for the new
 ones.  Trying to deal with both at once is likely to increase
 confusion and wheel-spinning.

When I took a stab at it, the new operators only seemed to merit a
single paragraph, so that is included after all.
 
-- 
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company*** a/doc/src/sgml/func.sgml
--- b/doc/src/sgml/func.sgml
***
*** 12739,12745  WHERE EXISTS (SELECT 1 FROM tab2 WHERE col2 = tab1.col2);
  
para
 See xref linkend=row-wise-comparison for details about the meaning
!of a row-wise comparison.
/para
/sect2
  
--- 12739,12745 
  
para
 See xref linkend=row-wise-comparison for details about the meaning
!of a row constructor comparison.
/para
/sect2
  
***
*** 12795,12806  WHERE EXISTS (SELECT 1 FROM tab2 WHERE col2 = tab1.col2);
  
para
 See xref linkend=row-wise-comparison for details about the meaning
!of a row-wise comparison.
/para
/sect2
  
sect2
!titleRow-wise Comparison/title
  
 indexterm zone=functions-subquery
  primarycomparison/primary
--- 12795,12806 
  
para
 See xref linkend=row-wise-comparison for details about the meaning
!of a row constructor comparison.
/para
/sect2
  
sect2
!titleSingle-row Comparison/title
  
 indexterm zone=functions-subquery
  primarycomparison/primary
***
*** 12823,12829  WHERE EXISTS (SELECT 1 FROM tab2 WHERE col2 = tab1.col2);
  
para
 See xref linkend=row-wise-comparison for details about the meaning
!of a row-wise comparison.
/para
/sect2
   /sect1
--- 12823,12829 
  
para
 See xref linkend=row-wise-comparison for details about the meaning
!of a row constructor comparison.
/para
/sect2
   /sect1
***
*** 12853,12864  WHERE EXISTS (SELECT 1 FROM tab2 WHERE col2 = tab1.col2);
/indexterm
  
indexterm
 primaryrow-wise comparison/primary
/indexterm
  
indexterm
 primarycomparison/primary
!secondaryrow-wise/secondary
/indexterm
  
indexterm
--- 12853,12874 
/indexterm
  
indexterm
+primarycomposite type/primary
+secondarycomparison/secondary
+   /indexterm
+ 
+   indexterm
 primaryrow-wise comparison/primary
/indexterm
  
indexterm
 primarycomparison/primary
!secondarycomposite type/secondary
!   /indexterm
! 
!   indexterm
!primarycomparison/primary
!secondaryrow constructor/secondary
/indexterm
  
indexterm
***
*** 13023,13029  AND
/sect2
  
sect2 id=row-wise-comparison
!titleRow-wise Comparison/title
  
  synopsis
  replaceablerow_constructor/replaceable replaceableoperator/replaceable replaceablerow_constructor/replaceable
--- 13033,13039 
/sect2
  
sect2 id=row-wise-comparison
!titleRow Constructor Comparison/title
  
  synopsis
  replaceablerow_constructor/replaceable replaceableoperator/replaceable replaceablerow_constructor/replaceable
***
*** 13033,13052  AND
 Each side is a row constructor,
 as described in xref linkend=sql-syntax-row-constructors.
 The two row values must have the same number of fields.
!Each side is evaluated and they are compared row-wise.  Row comparisons
!are allowed when the replaceableoperator/replaceable is
 literal=/,
 literallt;gt;/,
 literallt;/,
 literallt;=/,
 literalgt;/ or
!literalgt;=/,
!or has semantics similar to one of these.  (To be specific, an operator
!can be a row comparison operator if it is a member of a B-tree operator
!class, or is the negator of the literal=/ member of a B-tree operator
!class.)
/para
  
para
 The literal=/ and literallt;gt;/ cases work slightly differently
 from the others.  Two rows are considered
--- 13043,13067 
 Each side is a row constructor,
 as described in xref linkend=sql-syntax-row-constructors.
 The two row values must have the same number of fields.
!Each side is evaluated and they are compared row-wise.  Row constructor
!comparisons are allowed when the replaceableoperator/replaceable is
 literal=/,
 literallt;gt;/,
 literallt;/,
 literallt;=/,
 literalgt;/ or
!literalgt;=/.
!Every row element must be of a type which has a default B-tree operator
!class or the attempted comparison may generate an error.
/para
  
+   note
+para
+ Errors related to the number or types of elements might not 

Re: [HACKERS] record identical operator - Review

2013-09-27 Thread Kevin Grittner
Bruce Momjian br...@momjian.us wrote:
 On Thu, Sep 19, 2013 at 06:31:38PM -0400, Steve Singer wrote:
 On 09/12/2013 06:27 PM, Kevin Grittner wrote:
 Attached is a patch for a bit of infrastructure I believe to be
 necessary for correct behavior of REFRESH MATERIALIZED VIEW
 CONCURRENTLY as well as incremental maintenance of matviews.

 Here is attempt at a review of the v1 patch.

 There has been a heated discussion on list about if we even want
 this type of operator. I will try to summarize it here

 The arguments against it are
 * that a record_image_identical call on two records that returns
 false can still return true under the equality operator, and the
 records might (or might not) appear to be the same to users
 * Once we expose this as an operator via SQL someone will find
 it and use it and then complain when we change things such as
 the internal representation of a datatype which might break
 there queries

I don't see where that is possible unless they count on the order
of records sorted with the new operators, which would not be
something I recommend.  Changing the internal representation cannot
break simple use of the alternative equality definition as long as
all you cared about was whether the binary storage format of the
values was identical.

 * The differences between = and record_image_identical might not
 be understood by everywhere who finds the operator leading to
 confusion

Either they find it and read the code, or we document it and they
read the docs.

 * Using a criteria other than equality (the = operator provided
 by the datatype) might cause problems if we later provide 'on
 change' triggers for materialized views

I would fight user-defined triggers for matviews tooth and nail.
We need to get incremental maintenance working instead.

 The arguments for this patch are
 * We want the materialized view to return the same value as
 would be returned if the query were executed directly.  This not
 only means that the values should be the same according to a
 datatypes = operator but that they should appear the same 'to
 the eyeball'.

And to functions the user can run against the values.  The example
with the null bitmap for arrays being included when not necessary
is something that isn't directly apparent to the eye, but queries
which use pg_column_size would not get equal results.

 * Supporting the materialized view refresh check via SQL makes a
 lot of sense but doing that requires exposing something via SQL
 * If we adequately document what we mean by
 record_image_identical and the operator we pick for this then
 users shouldn't be surprised at what they get if they use this

We first need to document the existing record comparison operators.
If they read the docs for comparing row_constructors and expect
that to be the behavior they get when they compare records, they
will be surprised.

 This is a good summary.  I think there are a few things that make
 this issue difficult to decide:

 1.  We have to use an operator to give the RMVC (REFRESH
 MATERIALIZED VIEW CONCURRENTLY) SPI query the ability to optimize
 this query.  If we could do this with an SQL or C function, there
 would be less concern about the confusion caused by adding this
 capability.

 (a)  We can't.

 (b)  Why would that be less confusing?

 Question:  If we are comparing based on some primary key, why do
 we need this to optimize?

Because comparing primary keys doesn't tell us whether the old and
new values in the row all match.

 Certainly the primary key index will be used, no?

It currently uses all columns which are part of unique indexes on
the matview which are not partial (i.e., they do not use a WHERE
clause), they index only on columns (not expressions).  It is not
possible to define a primary key on a matview.  There are two
reasons for using these columns:

(a)  It provides a correctness guarantee against having duplicated
rows which have no nulls.  This guarantee is what allows us to use
the differential technique which is faster than retail DELETE and
re-INSERT of everything.

(b)  Since we don't support hash joins of records, and it would
tend to be pretty slow if we did, it allows faster hash joins on
one or more columns which stand a good chance of doing this
efficiently.

 2.  Agregates are already non-deterministic,

Only some of them, and only with certain source data.

 so some feel that adding this feature doesn't improve much.

It allows a materialized view which contains a column of a type
without a default btree opclass to *use* concurrent refresh, it
makes queries with deterministic results always generate matview
results with exactly match the results of a VIEW using the query if
it were run at the same moment, and for nondeterministic queries,
it provides results consistent with *some* result set which could
have been returned by a view at the same time.  Unless we do
something, none of these things are true.  That seems like much to
me.

 The counter-argument is that 

Re: [HACKERS] record identical operator - Review

2013-09-26 Thread Bruce Momjian
On Thu, Sep 19, 2013 at 06:31:38PM -0400, Steve Singer wrote:
 On 09/12/2013 06:27 PM, Kevin Grittner wrote:
 Attached is a patch for a bit of infrastructure I believe to be
 necessary for correct behavior of REFRESH MATERIALIZED VIEW
 CONCURRENTLY as well as incremental maintenance of matviews.
 
 Here is attempt at a review of the v1 patch.
 
 There has been a heated discussion on list about if we even want
 this type of operator. I will try to summarize it here
 
 The arguments against it are
 * that a record_image_identical call on two records that returns
 false can still return true under the equality operator, and the
 records might (or might not) appear to be the same to users
 * Once we expose this as an operator via SQL someone will find it
 and use it and then complain when we change things such as the
 internal representation of a datatype which might
break there queries
 * The differences between = and record_image_identical might not be
 understood by everywhere who finds the operator leading to confusion
 * Using a criteria other than equality (the = operator provided by
 the datatype) might cause problems if we later provide 'on change'
 triggers for materialized views
 
 The arguments for this patch are
 * We want the materialized view to return the same value as would be
 returned if the query were executed directly.  This not only means
 that the values should be the same according to a datatypes =
 operator but that they should appear the same 'to the eyeball'.
 * Supporting the materialized view refresh check via SQL makes a lot
 of sense but doing that requires exposing something via SQL
 * If we adequately document what we mean by record_image_identical
 and the operator we pick for this then users shouldn't be surprised
 at what they get if they use this

This is a good summary.  I think there are a few things that make this
issue difficult to decide:

1.  We have to use an operator to give the RMVC (REFRESH MATERIALIZED
VIEW CONCURRENTLY) SPI query the ability to optimize this query.   If we
could do this with an SQL or C function, there would be less concern
about the confusion caused by adding this capability.

Question:  If we are comparing based on some primary key, why do we need
this to optimize?  Certainly the primary key index will be used, no?

2.  Agregates are already non-deterministic, so some feel that adding
this feature doesn't improve much.  The counter-argument is that without
the binary row comparison ability, rows could be returned that could
_never_ have been produced by the base data, which is more of a user
surprise than non-deterministic rows.

3.  Our type casting and operators are already complex, and adding
another set of operators only compounds that.

4.  There are legitimate cases where tool makers might want the ability
to compare rows binarily, e.g. for replication, and adding these
operators would help with that.

I think we need to see a patch from Kevin that shows all the row
comparisons documented and we can see the impact of the user-visibile
part.

One interesting approach would be to only allow the operator to be
called from SPI queries.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] record identical operator - Review

2013-09-26 Thread Bruce Momjian
On Thu, Sep 26, 2013 at 05:50:08PM -0400, Bruce Momjian wrote:
 I think we need to see a patch from Kevin that shows all the row
 comparisons documented and we can see the impact of the user-visibile
 part.
 
 One interesting approach would be to only allow the operator to be
 called from SPI queries.

It would also be good to know about similar non-default entries in
pg_opclass so we can understand the expected impact.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] record identical operator - Review

2013-09-20 Thread Martijn van Oosterhout
On Thu, Sep 19, 2013 at 06:31:38PM -0400, Steve Singer wrote:
 I think there is agreement that better (as in more obscure)
 operators than === and !== need to be picked  and we need to find a
 place in the user-docs to warn users of the behaviour of this
 operators.   Hannu has proposed
 
 *==*   binary equal, surely very equal by any other definition as wall
 !==?  maybe not equal -- binary inequal, may still be equal by
 other comparison methods

It's a pity operators must be non-alpha and can't be named. Something
like:

SELECT foo OPERATOR(byte_equivalent) bar;

is simultaneously obscure, yet clear.

Have a nice day,
-- 
Martijn van Oosterhout   klep...@svana.org   http://svana.org/kleptog/
 He who writes carelessly confesses thereby at the very outset that he does
 not attach much importance to his own thoughts.
   -- Arthur Schopenhauer


signature.asc
Description: Digital signature


Re: [HACKERS] record identical operator - Review

2013-09-20 Thread Kevin Grittner
Steve Singer st...@ssinger.info wrote:
 On 09/12/2013 06:27 PM, Kevin Grittner wrote:
 Attached is a patch for a bit of infrastructure I believe to be
 necessary for correct behavior of REFRESH MATERIALIZED VIEW
 CONCURRENTLY as well as incremental maintenance of matviews.

 Here is attempt at a review of the v1 patch.

Thanks for looking at it.

 There has been a heated discussion on list about if we even want this
 type of operator.

It's not a question of whether we want to allow operators which
define comparisons between objects in a non-standard way.  We
already have 11 such cases; this would add one more to make 12.  In
all cases there are different operators for making a comparison
that return potentially different results from the default
operators, to support uses that are specific to that type.

 I think there is agreement that better (as in more obscure) operators
 than === and !== need to be picked  and we need to find a place in the
 user-docs to warn users of the behaviour of this operators.  Hannu has
 proposed

 *==*  binary equal, surely very equal by any other definition as
 wall
 !==?  maybe not equal -- binary inequal, may still be
 equal by
 other comparison methods

 and no one has yet objected to this.

I do.  The issue is that there are a great many places that there
are multiple definitions of equality.  We generally try to use
something which tries to convey something about the nature of the
operation, since the fact that it can have different results is a
given.  There is nothing in those operators that says binary image
comparison.  I thought about prepending a hash character in front
of normal operators, because that somehow suggests binary
operations to my eye (although I have no idea whether it does so
for anyone else), but those operators are already used for an
alternative set of comparisons for intervals, with a different
meaning.  I'm still trying to come up with something I really like.

 My vote would be to update the patch to
 use those operator names and then figure out where we can document them.  It 
 it
 means we have to section describing the equal operator for records as well 
 then
 maybe we should do that so we can get on with things.

 Code Review
 

 The record_image_eq and record_image_cmp functions are VERY similar.  It seems
 to me that you could have the meet of these functions into a common function
 (that isn't exposed via SQL) that then behaves differently  in 2 or 3 spots
 based on a parameter that controls if it detoasts the tuple for the compare or
 returns false on the equals.

Did you look at the record_eq and record_cmp functions which exist
before this patch?  Is there a reason we should do it one way for
the default operators and another way for this alternative?  Do you
think we should change record_eq and record_cmp to do things the
way you suggest?

 Beyond that I don't see any issues with the code.

 You call out a question about two records being compared have a different 
 number
 of columns should it always be an error, or only an error when they match on 
 the
 columns they do have in common.

 The current behaviour is

   select * FROM r1,r2 where r1==r2;
   a | b | a | b | c
 ---+---+---+---+---
   1 | 1 | 1 | 2 | 1
 (1 row)

 update r2 set b=1;
 UPDATE 1
 test=# select * FROM r1,r2 where r2==r1;
 ERROR:  cannot compare record types with different numbers of columns

 This seems like a bad idea to me.  I don't like that I get a type comparison
 error only sometimes based on the values of the data in the column.  If I'm
 a user testing code that compares two records with this operator and I test my
 query with 1 value pair then and it works then I'd naively expect to get a
 true or false on all other value sets of the same record type.

Again, this is a result of following the precedent set by the
existing record comparison operators.

test=# create table r1 (c1 int, c2 int);
CREATE TABLE
test=# create table r2 (c1 int, c2 int, c3 int);
CREATE TABLE
test=# insert into r1 values (1, 2);
INSERT 0 1
test=# insert into r2 values (3, 4, 5);
INSERT 0 1
test=# select * from r1 join r2 on r1  r2;
 c1 | c2 | c1 | c2 | c3
++++
  1 |  2 |  3 |  4 |  5
(1 row)

test=# update r1 set c1 = 3, c2 = 4;
UPDATE 1
test=# select * from r1 join r2 on r1  r2;
ERROR:  cannot compare record types with different numbers of columns

Would be in favor of forcing a change to the record comparison
operators which have been in use for years?  If not, is there a
good reason to have them behave differently in this regard?

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] record identical operator - Review

2013-09-20 Thread Stephen Frost
Steve,

  Thanks for providing a summary.

* Steve Singer (st...@ssinger.info) wrote:
 The arguments for this patch are
 * We want the materialized view to return the same value as would be
 returned if the query were executed directly.  This not only means
 that the values should be the same according to a datatypes =
 operator but that they should appear the same 'to the eyeball'.

With the cases where the equality operator doesn't match the 'to the
eyeball' appearance, this really seems to be a pipe dream to me, unless
we *also* define an ordering or similar which would then change the
existing semantics for end users (which might be reasonable), but I'm
not sure how we could do that without input from the type- iow, I don't
think we can say well, we'll order by byte representation.  It's also
going to possibly be a performance hit since we're going to have to do
both an equality op call during a HashAgg/HashJoin/whatever and have to
do a which one is bigger of these two equal things, and then I wonder
if we'd need to allow users to somehow specify I don't want the bigger
of the equal things, I want the smaller, in this query for this equality
check.  I'd really like to see how we're going to provide for that when
the user is doing a GROUP BY without breaking something else or causing
problems with later SQL spec revisions.

 * Supporting the materialized view refresh check via SQL makes a lot
 of sense but doing that requires exposing something via SQL

... which we don't currently expose for the queries that we already
support users running today.  Users seem to generally be accepting of
that too, even though what they end up with in the output isn't
necessairly consistent from query to query.  The issue here is that
we're trying to make the mat view look like what the query would do when
run at the same time, which is a bit of a losing battle, imv.

 * If we adequately document what we mean by record_image_identical
 and the operator we pick for this then users shouldn't be surprised
 at what they get if they use this

That's a bit over-simplistic, really.  We do try to keep to the POLA
(principle of least astonishment) and that's not going to be easy here.

 I think there is agreement that better (as in more obscure)
 operators than === and !== need to be picked  and we need to find a
 place in the user-docs to warn users of the behaviour of this
 operators.   Hannu has proposed

I'm a bit on the fence about this, after having discussed my concerns
with Robert at PostgresOpen.  If we're going to expose and support
these at the SQL level, and we can figure out some semantics and
consistency for using them that isn't totally baffling, then perhaps
having them as simple and clear operator names would be reasonable.  One
concern here is if the SQL spec might decide some day that '===' is a
useful operator for something else (perhaps even they look the same
when cast to text).

 *==*   binary equal, surely very equal by any other definition as wall
 !==?  maybe not equal -- binary inequal, may still be equal by
 other comparison methods

Those look alright to me also though, but we'd need to work out the
other operation names, right?  And then figure out if we want to use
those other operators (less-than, greater-than, etc) when doing equality
operations to figure out which equal value to return to the user.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] record identical operator - Review

2013-09-20 Thread Kevin Grittner
Stephen Frost sfr...@snowman.net wrote:

 The issue here is that we're trying to make the mat view look
 like what the query would do when run at the same time, which is
 a bit of a losing battle, imv.

If we're not doing that, I don't see the point of matviews.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] record identical operator - Review

2013-09-20 Thread Stephen Frost
On Friday, September 20, 2013, Kevin Grittner wrote:

 Stephen Frost sfr...@snowman.net javascript:; wrote:

  The issue here is that we're trying to make the mat view look
  like what the query would do when run at the same time, which is
  a bit of a losing battle, imv.

 If we're not doing that, I don't see the point of matviews.


When taken out of context, I can see how that might not come across
correctly. I was merely pointing out that it's a losing battle in the
context of types which have equality operators which claim equalness but
cast to text with results that don't match there.

Thanks,

Stephen


Re: [HACKERS] record identical operator - Review

2013-09-20 Thread Kevin Grittner
Stephen Frost sfr...@snowman.net wrote:
 On Friday, September 20, 2013, Kevin Grittner  wrote:
 Stephen Frost sfr...@snowman.net wrote:

 The issue here is that we're trying to make the mat view look
 like what the query would do when run at the same time, which
 is a bit of a losing battle, imv.

 If we're not doing that, I don't see the point of matviews.

 When taken out of context, I can see how that might not come
 across correctly. I was merely pointing out that it's a losing
 battle in the context of types which have equality operators
 which claim equalness but cast to text with results that don't
 match there.

I think the patch I've submitted wins that battle.  The only oddity
is that if someone uses a query for a matview which can provide
results with rows which are equal to previous result rows under the
default record opclass but different under this patch's opclass,
RMVC could update to the latest query results when someone thinks
that might not be necessary.  Workarounds would include using a
query with deterministic results (like using the upper() or lower()
function on a grouped citext column in the result set) or not using
the CONCURRENTLY option.  Neither seems too onerous.

-- 
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] record identical operator - Review

2013-09-20 Thread Steve Singer

On 09/20/2013 08:38 AM, Kevin Grittner wrote:
Did you look at the record_eq and record_cmp functions which exist 
before this patch?  Is there a reason we should do it one way for the 
default operators and another way for this alternative? 
Do you think we should change record_eq and record_cmp to do things 
the way you suggest? 


I think the record_eq and record_cmp functions would be better if they 
shared code as well, but I don't think changing that should be part of 
this patch, you aren't otherwise touching those functions. I know some 
people dislike code that is switch based and prefer duplication, my 
preference is to avoid duplication.


This seems like a bad idea to me.  I don't like that I get a type comparison
error only sometimes based on the values of the data in the column.  If I'm
a user testing code that compares two records with this operator and I test my
query with 1 value pair then and it works then I'd naively expect to get a
true or false on all other value sets of the same record type.


Again, this is a result of following the precedent set by the
existing record comparison operators.

test=# create table r1 (c1 int, c2 int);
CREATE TABLE
test=# create table r2 (c1 int, c2 int, c3 int);
CREATE TABLE
test=# insert into r1 values (1, 2);
INSERT 0 1
test=# insert into r2 values (3, 4, 5);
INSERT 0 1
test=# select * from r1 join r2 on r1  r2;
  c1 | c2 | c1 | c2 | c3
++++
   1 |  2 |  3 |  4 |  5
(1 row)

test=# update r1 set c1 = 3, c2 = 4;
UPDATE 1
test=# select * from r1 join r2 on r1  r2;
ERROR:  cannot compare record types with different numbers of columns

Would be in favor of forcing a change to the record comparison
operators which have been in use for years?  If not, is there a
good reason to have them behave differently in this regard?

--
I hadn't picked up on that you copied that part of the behaviour from 
the existing comparison operators.
No we would need a really good reason for changing the behaviour of the 
comparison operators and I don't think we have that.  I agree that the 
binary identical operators should behave similarly to the existing 
comparison operators and error out on the column number mismatch after 
comparing the columns that are present in both.


Steve


Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company






--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] record identical operator - Review

2013-09-19 Thread Steve Singer

On 09/12/2013 06:27 PM, Kevin Grittner wrote:

Attached is a patch for a bit of infrastructure I believe to be
necessary for correct behavior of REFRESH MATERIALIZED VIEW
CONCURRENTLY as well as incremental maintenance of matviews.


Here is attempt at a review of the v1 patch.

There has been a heated discussion on list about if we even want this 
type of operator. I will try to summarize it here


The arguments against it are
* that a record_image_identical call on two records that returns false 
can still return true under the equality operator, and the records might 
(or might not) appear to be the same to users
* Once we expose this as an operator via SQL someone will find it and 
use it and then complain when we change things such as the internal 
representation of a datatype which might

   break there queries
* The differences between = and record_image_identical might not be 
understood by everywhere who finds the operator leading to confusion
* Using a criteria other than equality (the = operator provided by the 
datatype) might cause problems if we later provide 'on change' triggers 
for materialized views


The arguments for this patch are
* We want the materialized view to return the same value as would be 
returned if the query were executed directly.  This not only means that 
the values should be the same according to a datatypes = operator but 
that they should appear the same 'to the eyeball'.
* Supporting the materialized view refresh check via SQL makes a lot of 
sense but doing that requires exposing something via SQL
* If we adequately document what we mean by record_image_identical and 
the operator we pick for this then users shouldn't be surprised at what 
they get if they use this


I think there is agreement that better (as in more obscure) operators 
than === and !== need to be picked  and we need to find a place in the 
user-docs to warn users of the behaviour of this operators.   Hannu has 
proposed


*==*   binary equal, surely very equal by any other definition as wall
!==?  maybe not equal -- binary inequal, may still be equal by
other comparison methods

and no one has yet objected to this.  My vote would be to update the patch to 
use those operator names and then figure out where we can document them.  It it 
means we have to section describing the equal operator for records as well then 
maybe we should do that so we can get on with things.


Code Review


The record_image_eq and record_image_cmp functions are VERY similar.  It seems 
to me that you could have the meet of these functions into a common function 
(that isn't exposed via SQL) that then behaves differently  in 2 or 3 spots 
based on a parameter that controls if it detoasts the tuple for the compare or 
returns false on the equals.

Beyond that I don't see any issues with the code.

You call out a question about two records being compared have a different 
number of columns should it always be an error, or only an error when they 
match on the columns they do have in common.

The current behaviour is

 select * FROM r1,r2 where r1==r2;
 a | b | a | b | c
---+---+---+---+---
 1 | 1 | 1 | 2 | 1
(1 row)

update r2 set b=1;
UPDATE 1
test=# select * FROM r1,r2 where r2==r1;
ERROR:  cannot compare record types with different numbers of columns

This seems like a bad idea to me.  I don't like that I get a type comparison 
error only sometimes based on the values of the data in the column.  If I'm a 
user testing code that compares two records with this operator and I test my 
query with 1 value pair then and it works then I'd naively expect to get a true 
or false on all other value sets of the same record type.





--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers