Re: [HACKERS] Typmod associated with multi-row VALUES constructs

2016-12-13 Thread Kyotaro HORIGUCHI
Hello, it seems to me to work as expected.

At Thu, 08 Dec 2016 15:58:10 -0500, Tom Lane  wrote in 
<7083.1481230...@sss.pgh.pa.us>
> I've pushed the previous patch to HEAD.  Attached is a proposed patch
> (against 9.6) that we could use for the back branches; it takes the
> brute force approach of just computing the correct value on-demand
> in the two functions at issue.  The key question of course is whether
> this is acceptable from a performance standpoint.  I did a simple test
> 
> using a 1000-entry VALUES list:
> 
> select count(a) from (
> values
>   ('0'::varchar(3), '0'::varchar(4)),
>   ('1'::varchar(3), '1'::varchar(4)),
>   ('2'::varchar(3), '2'::varchar(4)),
>   ('3'::varchar(3), '3'::varchar(4)),
>   ('4'::varchar(3), '4'::varchar(4)),
>   ...
>   ('996'::varchar(3), '996'::varchar(4)),
>   ('997'::varchar(3), '997'::varchar(4)),
>   ('998'::varchar(3), '998'::varchar(4)),
>   ('999'::varchar(3), '999'::varchar(4))
> ) v(a,b);
> 
> Since all the rows do have the same typmod, this represents the worst
> case where we have to scan all the way to the end to confirm the typmod,
> and it has about as little overhead otherwise as I could think of doing.
> I ran it like this:
> 
> pgbench -U postgres -n -c 1 -T 1000 -f bigvalues.sql regression
> 
> and could not see any above-the-noise-level difference --- in fact,
> it seemed like it was faster *with* the patch, which is obviously
> impossible; I blame that on chance realignments of loops vs. cache
> line boundaries.
> 
> So I think this is an okay candidate for back-patching.  If anyone
> wants to do their own performance tests, please do.

For all the additional computation, I had the same result on
9.6. The patch accelerates the processing around the noise rate..

9.6 without the patch 103.2 tps
9.6 withthe patch 108.3 tps

For the master branch,

master102.9 tps
0b78106c^ 103.4 tps

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Typmod associated with multi-row VALUES constructs

2016-12-08 Thread Tom Lane
I've pushed the previous patch to HEAD.  Attached is a proposed patch
(against 9.6) that we could use for the back branches; it takes the
brute force approach of just computing the correct value on-demand
in the two functions at issue.  The key question of course is whether
this is acceptable from a performance standpoint.  I did a simple test
using a 1000-entry VALUES list:

select count(a) from (
values
  ('0'::varchar(3), '0'::varchar(4)),
  ('1'::varchar(3), '1'::varchar(4)),
  ('2'::varchar(3), '2'::varchar(4)),
  ('3'::varchar(3), '3'::varchar(4)),
  ('4'::varchar(3), '4'::varchar(4)),
  ...
  ('996'::varchar(3), '996'::varchar(4)),
  ('997'::varchar(3), '997'::varchar(4)),
  ('998'::varchar(3), '998'::varchar(4)),
  ('999'::varchar(3), '999'::varchar(4))
) v(a,b);

Since all the rows do have the same typmod, this represents the worst
case where we have to scan all the way to the end to confirm the typmod,
and it has about as little overhead otherwise as I could think of doing.
I ran it like this:

pgbench -U postgres -n -c 1 -T 1000 -f bigvalues.sql regression

and could not see any above-the-noise-level difference --- in fact,
it seemed like it was faster *with* the patch, which is obviously
impossible; I blame that on chance realignments of loops vs. cache
line boundaries.

So I think this is an okay candidate for back-patching.  If anyone
wants to do their own performance tests, please do.

regards, tom lane

diff --git a/src/backend/parser/parse_relation.c b/src/backend/parser/parse_relation.c
index 1e3ecbc..c51fd81 100644
*** a/src/backend/parser/parse_relation.c
--- b/src/backend/parser/parse_relation.c
*** static void expandTupleDesc(TupleDesc tu
*** 52,57 
--- 52,58 
  int rtindex, int sublevels_up,
  int location, bool include_dropped,
  List **colnames, List **colvars);
+ static int32 *getValuesTypmods(RangeTblEntry *rte);
  static int	specialAttNum(const char *attname);
  static bool isQueryUsingTempRelation_walker(Node *node, void *context);
  
*** expandRTE(RangeTblEntry *rte, int rtinde
*** 2157,2165 
--- 2158,2179 
  			{
  /* Values RTE */
  ListCell   *aliasp_item = list_head(rte->eref->colnames);
+ int32	   *coltypmods;
  ListCell   *lcv;
  ListCell   *lcc;
  
+ /*
+  * It's okay to extract column types from the expressions in
+  * the first row, since all rows will have been coerced to the
+  * same types.  Their typmods might not be the same though, so
+  * we potentially need to examine all rows to compute those.
+  * Column collations are pre-computed in values_collations.
+  */
+ if (colvars)
+ 	coltypmods = getValuesTypmods(rte);
+ else
+ 	coltypmods = NULL;
+ 
  varattno = 0;
  forboth(lcv, (List *) linitial(rte->values_lists),
  		lcc, rte->values_collations)
*** expandRTE(RangeTblEntry *rte, int rtinde
*** 2184,2196 
  
  		varnode = makeVar(rtindex, varattno,
  		  exprType(col),
! 		  exprTypmod(col),
  		  colcollation,
  		  sublevels_up);
  		varnode->location = location;
  		*colvars = lappend(*colvars, varnode);
  	}
  }
  			}
  			break;
  		case RTE_JOIN:
--- 2198,2212 
  
  		varnode = makeVar(rtindex, varattno,
  		  exprType(col),
! 		  coltypmods[varattno - 1],
  		  colcollation,
  		  sublevels_up);
  		varnode->location = location;
  		*colvars = lappend(*colvars, varnode);
  	}
  }
+ if (coltypmods)
+ 	pfree(coltypmods);
  			}
  			break;
  		case RTE_JOIN:
*** expandRTE(RangeTblEntry *rte, int rtinde
*** 2296,2301 
--- 2312,2319 
  		varnode = makeVar(rtindex, varattno,
  		  coltype, coltypmod, colcoll,
  		  sublevels_up);
+ 		varnode->location = location;
+ 
  		*colvars = lappend(*colvars, varnode);
  	}
  }
*** expandTupleDesc(TupleDesc tupdesc, Alias
*** 2413,2418 
--- 2431,2504 
  }
  
  /*
+  * getValuesTypmods -- expandRTE subroutine
+  *
+  * Identify per-column typmods for the given VALUES RTE.  Returns a
+  * palloc'd array.
+  */
+ static int32 *
+ getValuesTypmods(RangeTblEntry *rte)
+ {
+ 	int32	   *coltypmods;
+ 	List	   *firstrow;
+ 	int			ncolumns,
+ nvalid,
+ i;
+ 	ListCell   *lc;
+ 
+ 	Assert(rte->values_lists != NIL);
+ 	firstrow = (List *) linitial(rte->values_lists);
+ 	ncolumns = list_length(firstrow);
+ 	coltypmods = (int32 *) palloc(ncolumns * sizeof(int32));
+ 	nvalid = 0;
+ 
+ 	/* Collect the typmods from the first VALUES row */
+ 	i = 0;
+ 	foreach(lc, firstrow)
+ 	{
+ 		Node	   *col = (Node *) lfirst(lc);
+ 
+ 		coltypmods[i] = exprTypmod(col);
+ 		if (coltypmods[i] >= 0)
+ 			nvalid++;
+ 		i++;
+ 	}
+ 
+ 	/*
+ 	 * Scan remaining rows; as soon as we have a non-matching typmod for a
+ 	 * column, reset that typmod to -1.  We can bail out early if all typmods
+ 	 

Re: [HACKERS] Typmod associated with multi-row VALUES constructs

2016-12-08 Thread Pavel Stehule
2016-12-08 14:03 GMT+01:00 Alvaro Herrera :

> Tom Lane wrote:
> > Alvaro Herrera  writes:
> > > Tom Lane wrote:
> > >> In HEAD, we could change the RTE data structure so that
> > >> transformValuesClause could save the typmod information in the RTE,
> > >> keeping the lookups cheap.
> >
> > > Hmm, I think this would be useful for the XMLTABLE patch too.  I talked
> > > a bit about it at
> > > https://www.postgresql.org/message-id/20161122204730.
> dgipy6gxi25j4e6a@alvherre.pgsql
> >
> > I dunno.  If your example there is correct that XMLTABLE can be called as
> > a plain function in a SELECT list, then I doubt that we want to tie
> > anything about it to the RTE data structure.  If anything, the case where
> > it appears in FROM seems to need to be treated as a generic RTE_FUNCTION
> > case.
>
> Well, XMLTABLE is specified by the standard to be part of ,
> which it turn is part of .  I can't immediately tell
> whether it allows XMLTABLE to be called like a regular function.  The
> current patch allows it, but maybe that's not right, and it's probably
> not that useful anyway.
>

It looks like function, and we support on both sides, so I implemented
both.

Probably, there is only 10 rows more related to this feature. Using this
function in target list is not critical feature - now with LATERAL JOIN we
can live without it. It is just some few steps forward to our user.

Again - implementation of this feature is probably few lines only.


>
> > I've been trying to avoid getting involved in the XMLTABLE patch, mainly
> > because I know zip about XML, but maybe I need to take a look.
>
> I think it'd be productive that you did so.  The XML part of it is
> reasonably well isolated, so you could give your opinion on the core
> parser / executor parts without looking at the XML part.
>

The critical part has zero relation to XML. All is some game with tupledesc.

Regards

Pavel


>
> --
> Álvaro Herrerahttps://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>


Re: [HACKERS] Typmod associated with multi-row VALUES constructs

2016-12-08 Thread Alvaro Herrera
Tom Lane wrote:
> Alvaro Herrera  writes:
> > Tom Lane wrote:
> >> In HEAD, we could change the RTE data structure so that
> >> transformValuesClause could save the typmod information in the RTE,
> >> keeping the lookups cheap.
> 
> > Hmm, I think this would be useful for the XMLTABLE patch too.  I talked
> > a bit about it at
> > https://www.postgresql.org/message-id/20161122204730.dgipy6gxi25j4e6a@alvherre.pgsql
> 
> I dunno.  If your example there is correct that XMLTABLE can be called as
> a plain function in a SELECT list, then I doubt that we want to tie
> anything about it to the RTE data structure.  If anything, the case where
> it appears in FROM seems to need to be treated as a generic RTE_FUNCTION
> case.

Well, XMLTABLE is specified by the standard to be part of ,
which it turn is part of .  I can't immediately tell
whether it allows XMLTABLE to be called like a regular function.  The
current patch allows it, but maybe that's not right, and it's probably
not that useful anyway.

> I've been trying to avoid getting involved in the XMLTABLE patch, mainly
> because I know zip about XML, but maybe I need to take a look.

I think it'd be productive that you did so.  The XML part of it is
reasonably well isolated, so you could give your opinion on the core
parser / executor parts without looking at the XML part.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Typmod associated with multi-row VALUES constructs

2016-12-07 Thread Kyotaro HORIGUCHI
Mmm. I did the same thing in select_common_type and resulted in a
messier (a bit).

At Wed, 07 Dec 2016 23:44:19 -0500, Tom Lane  wrote in 
<15128.1481172...@sss.pgh.pa.us>
> Attached is a patch that fixes this by storing typmod info in the RTE.
> This turned out to be straightforward, and I think it's definitely
> what we should do in HEAD.  I have mixed emotions about whether it's
> worth doing anything about it in the back branches.

With it, VALUES works as the same as UNION, as documentation
says.

https://www.postgresql.org/docs/8.2/static/queries-values.html

Past versions have the same documentation so back-patching the
*behavior* seems to me worth doing. Instead of modifying RTE,
re-coercing the first row's value would enough (I'm not sure how
to do that now) for back-patching.

> I chose to redefine the existing coltypes/coltypmods/colcollations
> lists for CTE RTEs as also applying to VALUES RTEs.  That saves a
> little space in the RangeTblEntry nodes and allows sharing code
> in a couple of places.  It's tempting to consider making that apply
> to all RTE types, which would permit collapsing expandRTE() and
> get_rte_attribute_type() into a single case.  But AFAICS there would
> be no benefit elsewhere, so I'm not sure the extra code churn is
> justified.

Agreed.

> BTW, I noticed that the CTE case of expandRTE() fails to assign the
> specified location to the generated Vars, which is clearly a bug
> though a very minor one; it would result in failing to display a
> parse error location in some cases where we would do so for Vars from
> other RTE types.  That part might be worth back-patching, not sure.

If we do back-patching VALUES patch, involving it would
worth, I think.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Typmod associated with multi-row VALUES constructs

2016-12-07 Thread Pavel Stehule
2016-12-07 22:17 GMT+01:00 Alvaro Herrera :

> Tom Lane wrote:
>
> > In HEAD, we could change the RTE data structure so that
> > transformValuesClause could save the typmod information in the RTE,
> > keeping the lookups cheap.
>
> Hmm, I think this would be useful for the XMLTABLE patch too.  I talked
> a bit about it at
> https://www.postgresql.org/message-id/20161122204730.
> dgipy6gxi25j4e6a@alvherre.pgsql
>
> The patch has evolved quite a bit since then, but the tupdesc continues
> to be a problem.  Latest patch is at
> https://www.postgresql.org/message-id/CAFj8pRBsrhwR636-_
> 3TPbqu%3DFo3_DDer6_yp_afzR7qzhW1T6Q%40mail.gmail.com


What do you dislike on tupdesc usage there?

regards

Pavel

>
>
> --
> Álvaro Herrerahttps://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>


Re: [HACKERS] Typmod associated with multi-row VALUES constructs

2016-12-07 Thread Tom Lane
Attached is a patch that fixes this by storing typmod info in the RTE.
This turned out to be straightforward, and I think it's definitely
what we should do in HEAD.  I have mixed emotions about whether it's
worth doing anything about it in the back branches.

I chose to redefine the existing coltypes/coltypmods/colcollations
lists for CTE RTEs as also applying to VALUES RTEs.  That saves a
little space in the RangeTblEntry nodes and allows sharing code
in a couple of places.  It's tempting to consider making that apply
to all RTE types, which would permit collapsing expandRTE() and
get_rte_attribute_type() into a single case.  But AFAICS there would
be no benefit elsewhere, so I'm not sure the extra code churn is
justified.

BTW, I noticed that the CTE case of expandRTE() fails to assign the
specified location to the generated Vars, which is clearly a bug
though a very minor one; it would result in failing to display a
parse error location in some cases where we would do so for Vars from
other RTE types.  That part might be worth back-patching, not sure.

regards, tom lane

diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index e30c57e..d973225 100644
*** a/src/backend/nodes/copyfuncs.c
--- b/src/backend/nodes/copyfuncs.c
*** _copyRangeTblEntry(const RangeTblEntry *
*** 2149,2161 
  	COPY_NODE_FIELD(functions);
  	COPY_SCALAR_FIELD(funcordinality);
  	COPY_NODE_FIELD(values_lists);
- 	COPY_NODE_FIELD(values_collations);
  	COPY_STRING_FIELD(ctename);
  	COPY_SCALAR_FIELD(ctelevelsup);
  	COPY_SCALAR_FIELD(self_reference);
! 	COPY_NODE_FIELD(ctecoltypes);
! 	COPY_NODE_FIELD(ctecoltypmods);
! 	COPY_NODE_FIELD(ctecolcollations);
  	COPY_NODE_FIELD(alias);
  	COPY_NODE_FIELD(eref);
  	COPY_SCALAR_FIELD(lateral);
--- 2149,2160 
  	COPY_NODE_FIELD(functions);
  	COPY_SCALAR_FIELD(funcordinality);
  	COPY_NODE_FIELD(values_lists);
  	COPY_STRING_FIELD(ctename);
  	COPY_SCALAR_FIELD(ctelevelsup);
  	COPY_SCALAR_FIELD(self_reference);
! 	COPY_NODE_FIELD(coltypes);
! 	COPY_NODE_FIELD(coltypmods);
! 	COPY_NODE_FIELD(colcollations);
  	COPY_NODE_FIELD(alias);
  	COPY_NODE_FIELD(eref);
  	COPY_SCALAR_FIELD(lateral);
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index b7a109c..edc1797 100644
*** a/src/backend/nodes/equalfuncs.c
--- b/src/backend/nodes/equalfuncs.c
*** _equalRangeTblEntry(const RangeTblEntry 
*** 2460,2472 
  	COMPARE_NODE_FIELD(functions);
  	COMPARE_SCALAR_FIELD(funcordinality);
  	COMPARE_NODE_FIELD(values_lists);
- 	COMPARE_NODE_FIELD(values_collations);
  	COMPARE_STRING_FIELD(ctename);
  	COMPARE_SCALAR_FIELD(ctelevelsup);
  	COMPARE_SCALAR_FIELD(self_reference);
! 	COMPARE_NODE_FIELD(ctecoltypes);
! 	COMPARE_NODE_FIELD(ctecoltypmods);
! 	COMPARE_NODE_FIELD(ctecolcollations);
  	COMPARE_NODE_FIELD(alias);
  	COMPARE_NODE_FIELD(eref);
  	COMPARE_SCALAR_FIELD(lateral);
--- 2460,2471 
  	COMPARE_NODE_FIELD(functions);
  	COMPARE_SCALAR_FIELD(funcordinality);
  	COMPARE_NODE_FIELD(values_lists);
  	COMPARE_STRING_FIELD(ctename);
  	COMPARE_SCALAR_FIELD(ctelevelsup);
  	COMPARE_SCALAR_FIELD(self_reference);
! 	COMPARE_NODE_FIELD(coltypes);
! 	COMPARE_NODE_FIELD(coltypmods);
! 	COMPARE_NODE_FIELD(colcollations);
  	COMPARE_NODE_FIELD(alias);
  	COMPARE_NODE_FIELD(eref);
  	COMPARE_SCALAR_FIELD(lateral);
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 0d858f5..7258c03 100644
*** a/src/backend/nodes/outfuncs.c
--- b/src/backend/nodes/outfuncs.c
*** _outRangeTblEntry(StringInfo str, const 
*** 2841,2855 
  			break;
  		case RTE_VALUES:
  			WRITE_NODE_FIELD(values_lists);
! 			WRITE_NODE_FIELD(values_collations);
  			break;
  		case RTE_CTE:
  			WRITE_STRING_FIELD(ctename);
  			WRITE_UINT_FIELD(ctelevelsup);
  			WRITE_BOOL_FIELD(self_reference);
! 			WRITE_NODE_FIELD(ctecoltypes);
! 			WRITE_NODE_FIELD(ctecoltypmods);
! 			WRITE_NODE_FIELD(ctecolcollations);
  			break;
  		default:
  			elog(ERROR, "unrecognized RTE kind: %d", (int) node->rtekind);
--- 2841,2857 
  			break;
  		case RTE_VALUES:
  			WRITE_NODE_FIELD(values_lists);
! 			WRITE_NODE_FIELD(coltypes);
! 			WRITE_NODE_FIELD(coltypmods);
! 			WRITE_NODE_FIELD(colcollations);
  			break;
  		case RTE_CTE:
  			WRITE_STRING_FIELD(ctename);
  			WRITE_UINT_FIELD(ctelevelsup);
  			WRITE_BOOL_FIELD(self_reference);
! 			WRITE_NODE_FIELD(coltypes);
! 			WRITE_NODE_FIELD(coltypmods);
! 			WRITE_NODE_FIELD(colcollations);
  			break;
  		default:
  			elog(ERROR, "unrecognized RTE kind: %d", (int) node->rtekind);
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index c587d4e..d608530 100644
*** a/src/backend/nodes/readfuncs.c
--- b/src/backend/nodes/readfuncs.c
*** _readRangeTblEntry(void)
*** 1314,1328 
  			break;
  		case RTE_VALUES:
  			READ_NODE_FIELD(values_lists);
! 			READ_NODE_FIELD(values_collations);
  			break;
  		case 

Re: [HACKERS] Typmod associated with multi-row VALUES constructs

2016-12-07 Thread Tom Lane
Alvaro Herrera  writes:
> Tom Lane wrote:
>> In HEAD, we could change the RTE data structure so that
>> transformValuesClause could save the typmod information in the RTE,
>> keeping the lookups cheap.

> Hmm, I think this would be useful for the XMLTABLE patch too.  I talked
> a bit about it at
> https://www.postgresql.org/message-id/20161122204730.dgipy6gxi25j4e6a@alvherre.pgsql

I dunno.  If your example there is correct that XMLTABLE can be called as
a plain function in a SELECT list, then I doubt that we want to tie
anything about it to the RTE data structure.  If anything, the case where
it appears in FROM seems to need to be treated as a generic RTE_FUNCTION
case.

I've been trying to avoid getting involved in the XMLTABLE patch, mainly
because I know zip about XML, but maybe I need to take a look.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Typmod associated with multi-row VALUES constructs

2016-12-07 Thread Alvaro Herrera
Tom Lane wrote:

> In HEAD, we could change the RTE data structure so that
> transformValuesClause could save the typmod information in the RTE,
> keeping the lookups cheap.

Hmm, I think this would be useful for the XMLTABLE patch too.  I talked
a bit about it at
https://www.postgresql.org/message-id/20161122204730.dgipy6gxi25j4e6a@alvherre.pgsql

The patch has evolved quite a bit since then, but the tupdesc continues
to be a problem.  Latest patch is at
https://www.postgresql.org/message-id/CAFj8pRBsrhwR636-_3TPbqu%3DFo3_DDer6_yp_afzR7qzhW1T6Q%40mail.gmail.com

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Typmod associated with multi-row VALUES constructs

2016-12-07 Thread David G. Johnston
On Wed, Dec 7, 2016 at 1:03 PM, Tom Lane  wrote:

> Still, things have been like this since 8.2 when we implemented multi-row
> VALUES, and nobody's noticed up to now.  Maybe the right answer is to
> change the data structure in HEAD and decree that we won't fix it in back
> branches.  I don't really like that answer though ...
>

​The merit of "won't back-patch"​ is that at least you don't punish those
who are being lazy (in a good sense) but generating values in subsequent
lines that conform to the type specification of the first record.  We
already implicitly accept such behavior elsewhere - though probably with
better validation - so keeping it here is defense-able.  We dislike
changing query plans in back branches and this really isn't that different.

The concern, especially since this can propagate to a CREATE TABLE AS, is
whether there is some kind of fundamental storage risk being introduced
that we do not want to have happen no matter how rare.  /me feels deja-vu...

David J.


Re: [HACKERS] Typmod associated with multi-row VALUES constructs

2016-12-07 Thread Tom Lane
Kyotaro HORIGUCHI  writes:
> At Mon, 5 Dec 2016 21:54:08 -0700, "David G. Johnston" 
>  wrote in 
> 

Re: [HACKERS] Typmod associated with multi-row VALUES constructs

2016-12-07 Thread Kyotaro HORIGUCHI
Hello,

At Mon, 5 Dec 2016 21:54:08 -0700, "David G. Johnston" 
 wrote in 

Re: [HACKERS] Typmod associated with multi-row VALUES constructs

2016-12-05 Thread David G. Johnston
On Mon, Dec 5, 2016 at 9:54 PM, David G. Johnston <
david.g.johns...@gmail.com> wrote:

>  The concern is that "scan every row" could be very expensive - though in
> writing this I'm thinking that you'd quickly find a non-match even in a
> large dataset - and so a less perfect but still valid solution is to simply
> discard the typemod if there is more than one row.
>

​My folly here - and the actual question to ask - is if you are faced with
large dataset that does have consistent typmods - is the benefit of knowing
what it is and carrying it to the next layer worth the cost of scanning
every single row to prove it is consistent?​

My vote would be no - and the only question to ask is whether n = 1 and n >
1 behaviors should differ - to which I'd say no as well - at least in
master.  In the back branches the current behavior would be retained if the
n = 1 behavior is kept different than the n > 1 behavior which is a worthy
trade-off.

David J.


Re: [HACKERS] Typmod associated with multi-row VALUES constructs

2016-12-05 Thread David G. Johnston
feel free to s/typemod/typmod/ ... my fingers don't want to drop the "e"

On Mon, Dec 5, 2016 at 9:17 PM, Kyotaro HORIGUCHI <
horiguchi.kyot...@lab.ntt.co.jp> wrote:

> Hello,
>
> At Mon, 5 Dec 2016 18:59:42 -0700, "David G. Johnston" <
> david.g.johns...@gmail.com> wrote in 

Re: [HACKERS] Typmod associated with multi-row VALUES constructs

2016-12-05 Thread Kyotaro HORIGUCHI
Hello,

At Mon, 5 Dec 2016 18:59:42 -0700, "David G. Johnston" 
 wrote in 

Re: [HACKERS] Typmod associated with multi-row VALUES constructs

2016-12-05 Thread David G. Johnston
On Mon, Dec 5, 2016 at 6:36 PM, Kyotaro HORIGUCHI <
horiguchi.kyot...@lab.ntt.co.jp> wrote:

> Hello,
>
> At Mon, 5 Dec 2016 14:42:39 -0700, "David G. Johnston" <
> david.g.johns...@gmail.com> wrote in  v8nr0FsCFrQ=oo1dkp...@mail.gmail.com>
> > On Mon, Dec 5, 2016 at 2:22 PM, Tom Lane  wrote:
> >
> > > "David G. Johnston"  writes:
> > > > On Mon, Dec 5, 2016 at 1:08 PM, Tom Lane  wrote:
> > > >> In order to fix this, we first have to decide what the semantics
> ought
> > > >> to be.  I think there are two plausible definitions:
> > > >> 1. If all the expressions in the VALUES column share the same
> typmod,
> > > >> use that typmod, else use -1.
> > > >> 2. Use -1 whenever there is more than one VALUES row.
> > >
> > > > ​Can we be precise enough to perform #2 if the top-level (or
> immediate
> > > > parent) command is an INSERT - the existing table is going to
> enforce its
> > > > own typemod anyway, otherwise go with #1?
> > >
> > > I dunno if that's "precise" or just "randomly inconsistent" ;-)
> > >
> >
> > :)
> >
> > How does "targeted optimization" sound?
>
> (sorry I don't understand what the "targetted optimization" is..)
>

I guess its a bit misleading depending on the implementation chosen.  The
general thought is that we simply ignore typemod information in VALUES if
we have been told otherwise what that typemod will be (in this case an
insert into column will use the typemod of the existing column regardless
of the input data's typemod).


> FWIW, different from the UNION case, I don't see a reason that
> every row in a VALUES clause shares anything common with any
> other rows. Of course typmod and even type are not to be
> shared. (Type is shared, though.)
>

​You have a typo here somewhere..."type not to be shared. (Type is shared,
though)" doesn't work.​

>
> On the other hand, if we make all values to be strictly typed (I
> mean that every value brings its own type information along
> with), values also can consider strict type. But currently the
> following command is ignoring the type of the first value.
>
> =# select 'bar'::varchar(4) || '';
>  ?column?
> --
>  bar
>
>
​Its not ignored - is discarded during coercion to make the || operator
work on the same type (text).  ​

​SELECT '12345'::varchar(3) || '678'​


> Even though I'm not sure about SQL standard here but my
> feeling is something like the following.
>
> | FROM (
> |   VALUES (row 1), .. (row n))
> | AS foo (colname *type*, ..)
>
> for this case,
>
> | create temporary table product_codes  as select *
> | from (
> |   values
> |   ('abcdefg'),
> |   ('012345678901234567ABCDEFGHIJKLMN')
> | ) csv_data (product_code character varying(20));
>
> Myself have gotten errors for this false syntax several times:(
>

​Only the function in from form of this accepts a column definition in lieu
of a simple alias.  Regardless of the merits of this idea it would not be
backpatch-able and wouldn't resolve the current valid syntax problem.​

I suppose my option #4 has the same problem...


> > ​Unnecessary maybe, but wouldn't it be immaterial given we are only able
> to
> > be efficient when inserting exactly one row.
> >
> > There is also a #4 here to consider - if the first (or any) row is not
> type
> > unknown, and the remaining rows are all unknown, use the type and typemod
> > of the known row AND attempt coerce all of the unknowns to that same
> type.
> > I'd suggest this is probably the most user-friendly option (do as I mean,
> > not as I say).  The OP query would then fail since the second literal is
> > too long to fit in a varchar(20) - I would not want the value truncated
> so
> > an actual cast wouldn't work.
>
> 1 has a type of int,


​true​

1.0 has a type of float


​false - its numeric​
​select pg_typeof(1.0) -> numeric
​

and '1' has a typ
> ​e ​
> of text.


​false - its unknown​ (but implicitly cast-able)
select pgtype_of('1') -> error, pg_typeof(unknown) does not exist

So I don't see a situation where only the first row is
> detectably typed. Or is it means that only the first row is
> explicitly typed?


​I do indeed mean explicitly typed here.​  Using 1 or 1.0 in a values would
be a form of explicit typing in this sense.

David J.


Re: [HACKERS] Typmod associated with multi-row VALUES constructs

2016-12-05 Thread Craig Ringer
On 6 December 2016 at 04:52, David G. Johnston
 wrote:

> Can we be precise enough to perform #2 if the top-level (or immediate
> parent) command is an INSERT - the existing table is going to enforce its
> own typemod anyway, otherwise go with #1?

We already routinely throw away typmod information in other places,
and can only truly rely on it when working directly with data in
tables. So it sounds sensible to me.

I see semi-regular complaints about this from JDBC users, who want to
be able to know things like "number of digits in this numeric" and
"can this column be null" even through various projections and
transformations of results. But I don't think your suggested change
will make the existing situation any worse.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Typmod associated with multi-row VALUES constructs

2016-12-05 Thread Kyotaro HORIGUCHI
Hello,

At Mon, 5 Dec 2016 14:42:39 -0700, "David G. Johnston" 
 wrote in 

Re: [HACKERS] Typmod associated with multi-row VALUES constructs

2016-12-05 Thread David G. Johnston
On Mon, Dec 5, 2016 at 2:22 PM, Tom Lane  wrote:

> "David G. Johnston"  writes:
> > On Mon, Dec 5, 2016 at 1:08 PM, Tom Lane  wrote:
> >> In order to fix this, we first have to decide what the semantics ought
> >> to be.  I think there are two plausible definitions:
> >> 1. If all the expressions in the VALUES column share the same typmod,
> >> use that typmod, else use -1.
> >> 2. Use -1 whenever there is more than one VALUES row.
>
> > ​Can we be precise enough to perform #2 if the top-level (or immediate
> > parent) command is an INSERT - the existing table is going to enforce its
> > own typemod anyway, otherwise go with #1?
>
> I dunno if that's "precise" or just "randomly inconsistent" ;-)
>

:)

How does "targeted optimization" sound?


> > ​Lacking that possibility I'd say that documenting that our treatment of
> > typemod in VALUES is similar to our treatment of typemod in function
> > arguments would be acceptable. This suggests a #3 - simply use "-1"
> > regardless of the number of rows in the VALUES expression.
>
> I'm a bit concerned about whether that would introduce overhead that we
> avoid today, in particular for something like
>
> insert into foo (varchar20col) values ('bar'::varchar(20));
>
> I think if we throw away the knowledge that the VALUES row produces the
> right typmod already, we'd end up adding an unnecessary runtime coercion
> step.
>

​Unnecessary maybe, but wouldn't it be immaterial given we are only able to
be efficient when inserting exactly one row.

There is also a #4 here to consider - if the first (or any) row is not type
unknown, and the remaining rows are all unknown, use the type and typemod
of the known row AND attempt coerce all of the unknowns to that same type.
I'd suggest this is probably the most user-friendly option (do as I mean,
not as I say).  The OP query would then fail since the second literal is
too long to fit in a varchar(20) - I would not want the value truncated so
an actual cast wouldn't work.

David J.


David J.


Re: [HACKERS] Typmod associated with multi-row VALUES constructs

2016-12-05 Thread Tom Lane
"David G. Johnston"  writes:
> On Mon, Dec 5, 2016 at 1:08 PM, Tom Lane  wrote:
>> In order to fix this, we first have to decide what the semantics ought
>> to be.  I think there are two plausible definitions:
>> 1. If all the expressions in the VALUES column share the same typmod,
>> use that typmod, else use -1.
>> 2. Use -1 whenever there is more than one VALUES row.

> ​Can we be precise enough to perform #2 if the top-level (or immediate
> parent) command is an INSERT - the existing table is going to enforce its
> own typemod anyway, otherwise go with #1?

I dunno if that's "precise" or just "randomly inconsistent" ;-)

> ​Lacking that possibility I'd say that documenting that our treatment of
> typemod in VALUES is similar to our treatment of typemod in function
> arguments would be acceptable. This suggests a #3 - simply use "-1"
> regardless of the number of rows in the VALUES expression.

I'm a bit concerned about whether that would introduce overhead that we
avoid today, in particular for something like

insert into foo (varchar20col) values ('bar'::varchar(20));

I think if we throw away the knowledge that the VALUES row produces the
right typmod already, we'd end up adding an unnecessary runtime coercion
step.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Typmod associated with multi-row VALUES constructs

2016-12-05 Thread David G. Johnston
On Mon, Dec 5, 2016 at 1:08 PM, Tom Lane  wrote:

> I looked into the issue reported in bug #14448,
> https://www.postgresql.org/message-id/20161205143037.
> 4377.60754%40wrigleys.postgresql.org
>
> The core of it seems to be that expandRTE() will report the type and
> typmod of a column of a VALUES construct as being exprType() and
> exprTypmod() of the corresponding expression in the first row of
> the VALUES.  It's okay to handle data type that way, because we've
> coerced all the expressions for the column to the same type; but
> we have *not* coerced them to the same typmod.  So some of the values
> from later rows may fail to meet the claimed typmod.  This is not good.
>
> In order to fix this, we first have to decide what the semantics ought
> to be.  I think there are two plausible definitions:
>
> 1. If all the expressions in the VALUES column share the same typmod,
> use that typmod, else use -1.
>
> 2. Use -1 whenever there is more than one VALUES row.
>
> #1 is what we do for some comparable cases such as UNION and CASE.
> However, it's potentially quite expensive for large VALUES constructs.
> #2 would be a lot cheaper, and given that this is the first complaint
> we've gotten in all the years we've had multi-row-VALUES support, it's
> not clear that deriving a precise typmod is really all that useful
> for VALUES.
>
> I have no strong preference between these two, but I think whatever
> we do needs to be back-patched.  The behavior described in the bug
> report is definitely broken.
>
> Thoughts?
>

​Can we be precise enough to perform #2 if the top-level (or immediate
parent) command is an INSERT - the existing table is going to enforce its
own typemod anyway, otherwise go with #1?

​Lacking that possibility I'd say that documenting that our treatment of
typemod in VALUES is similar to our treatment of typemod in function
arguments would be acceptable. This suggests a #3 - simply use "-1"
regardless of the number of rows in the VALUES expression.

David J.


[HACKERS] Typmod associated with multi-row VALUES constructs

2016-12-05 Thread Tom Lane
I looked into the issue reported in bug #14448,
https://www.postgresql.org/message-id/20161205143037.4377.60754%40wrigleys.postgresql.org

The core of it seems to be that expandRTE() will report the type and
typmod of a column of a VALUES construct as being exprType() and
exprTypmod() of the corresponding expression in the first row of
the VALUES.  It's okay to handle data type that way, because we've
coerced all the expressions for the column to the same type; but
we have *not* coerced them to the same typmod.  So some of the values
from later rows may fail to meet the claimed typmod.  This is not good.

In order to fix this, we first have to decide what the semantics ought
to be.  I think there are two plausible definitions:

1. If all the expressions in the VALUES column share the same typmod,
use that typmod, else use -1.

2. Use -1 whenever there is more than one VALUES row.

#1 is what we do for some comparable cases such as UNION and CASE.
However, it's potentially quite expensive for large VALUES constructs.
#2 would be a lot cheaper, and given that this is the first complaint
we've gotten in all the years we've had multi-row-VALUES support, it's
not clear that deriving a precise typmod is really all that useful
for VALUES.

I have no strong preference between these two, but I think whatever
we do needs to be back-patched.  The behavior described in the bug
report is definitely broken.

Thoughts?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers