Updated patch attached.

2014-10-22 20:51 GMT+07:00 Pavel Stehule <pavel.steh...@gmail.com>:
>>>>
>>>>> I agree with your proposal. I have a few comments to design:
>>>>>
>>>>
>>>>> 1. patch doesn't hold documentation and regress tests, please append
>>>>> it.
>>>>>
>>>>
i've added some regression tests in arrays.sql and aggregate.sql.

I've only found the documentation of array_agg. Where is the doc for
array(subselect) defined?


> 2. this functionality (multidimensional aggregation) can be interesting
>>>>> more times, so maybe some interface like array builder should be 
>>>>> preferred.
>>>>>
>>>> We already have accumArrayResult and makeArrayResult/makeMdArrayResult,
>>>> haven't we?
>>>>
>>>> Actually array_agg(anyarray) can be implemented by using
>>>> accumArrayResult and makeMdArrayResult, but in the process we will
>>>> deconstruct the array data and null bit-flags into ArrayBuildState->dvalues
>>>> and dnulls. And we will reconstruct it through makeMdArrayResult. I think
>>>> this way it's not efficient, so i decided to reimplement it and memcpy the
>>>> array data and null flags as-is.
>>>>
>>>
>>> aha, so isn't better to fix a performance for accumArrayResult ?
>>>
>>
>> Ok, i'll go this route. While reading the code of array(subselect) as you
>> pointed below, i think the easiest way to implement support for both
>> array_agg(anyarray) and array(subselect) is to change accumArrayResult so
>> it operates both with scalar datum(s) and array datums, with performance
>> optimization for the latter.
>>
>
implemented it by modifying ArrayBuildState to ArrayBuildStateArray and
ArrayBuildStateScalar (do you have any idea for better struct naming?)


> In other places, i think it's clearer if we just use accumArrayResult and
>>>> makeArrayResult/makeMdArrayResult as API for building (multidimensional)
>>>> arrays.
>>>>
>>>>
>>>>> 3. array_agg was consistent with array(subselect), so it should be
>>>>> fixed too
>>>>>
>>>>> postgres=# select array_agg(a) from test;
>>>>>        array_agg
>>>>> -----------------------
>>>>>  {{1,2,3,4},{1,2,3,4}}
>>>>> (1 row)
>>>>>
>>>>> postgres=# select array(select a from test);
>>>>> ERROR:  could not find array type for data type integer[]
>>>>>
>>>>
>>>> I'm pretty new in postgresql development. Can you point out where is
>>>> array(subselect) implemented?
>>>>
>>>
>>> where you can start?
>>>
>>> postgres=# explain select array(select a from test);
>>> ERROR:  42704: could not find array type for data type integer[]
>>> LOCATION:  exprType, nodeFuncs.c:116
>>>
>>> look to code ... your magic keyword is a ARRAY_SUBLINK .. search in
>>> postgresql sources this keyword
>>>
>>
> attention: probably we don't would to allow arrays everywhere.
>

I've changed the places where i think it's appropriate.


> 4. why you use a magic constant (64) there?
>>>>>
>>>>> +         astate->abytes = 64 * (ndatabytes == 0 ? 1 : ndatabytes);
>>>>> +         astate->aitems = 64 * nitems;
>>>>>
>>>>> +             astate->nullbitmap = (bits8 *)
>>>>> +                 repalloc(astate->nullbitmap, (astate->aitems + 7) /
>>>>> 8);
>>>>>
>>>>
>>>> just follow the arbitrary size choosen in accumArrayResult
>>>> (arrayfuncs.c):
>>>> astate->alen = 64; /* arbitrary starting array size */
>>>>
>>>> it can be any number not too small and too big. Too small, and we will
>>>> realloc shortly. Too big, we will end up wasting memory.
>>>>
>>>
>>> you can try to alloc 1KB instead as start -- it is used more times in
>>> Postgres. Then a overhead is max 1KB per agg call - what is acceptable.
>>>
>>> You take this value from accumArrayResult, but it is targeted for
>>> shorted scalars - you should to expect so any array will be much larger.
>>>
>>
this patch allocates 1KB (1024 bytes) if the ndatabytes is < 512bytes. If
it is larger, it allocates 4 * size. For nullbitmap, it allocates 4 *
number of items in array.

Regards,
--
Ali Akbar
*** a/doc/src/sgml/func.sgml
--- b/doc/src/sgml/func.sgml
***************
*** 12046,12051 **** NULL baz</literallayout>(3 rows)</entry>
--- 12046,12067 ----
       <row>
        <entry>
         <indexterm>
+         <primary>array_agg</primary>
+        </indexterm>
+        <function>array_agg(<replaceable class="parameter">anyarray</replaceable>)</function>
+       </entry>
+       <entry>
+        any
+       </entry>
+       <entry>
+        the same array type as input type
+       </entry>
+       <entry>input arrays, aggregated into higher-order multidimesional array. Rejects NULL and empty array as input.</entry>
+      </row>
+ 
+      <row>
+       <entry>
+        <indexterm>
          <primary>average</primary>
         </indexterm>
         <indexterm>
*** a/src/backend/nodes/nodeFuncs.c
--- b/src/backend/nodes/nodeFuncs.c
***************
*** 108,119 **** exprType(const Node *expr)
  					type = exprType((Node *) tent->expr);
  					if (sublink->subLinkType == ARRAY_SUBLINK)
  					{
! 						type = get_array_type(type);
! 						if (!OidIsValid(type))
! 							ereport(ERROR,
! 									(errcode(ERRCODE_UNDEFINED_OBJECT),
! 									 errmsg("could not find array type for data type %s",
! 							format_type_be(exprType((Node *) tent->expr)))));
  					}
  				}
  				else if (sublink->subLinkType == MULTIEXPR_SUBLINK)
--- 108,123 ----
  					type = exprType((Node *) tent->expr);
  					if (sublink->subLinkType == ARRAY_SUBLINK)
  					{
! 						if (!OidIsValid(get_element_type(type)))
! 						{
! 							/* not array, so check for its array type */
! 							type = get_array_type(type);
! 							if (!OidIsValid(type))
! 								ereport(ERROR,
! 										(errcode(ERRCODE_UNDEFINED_OBJECT),
! 										 errmsg("could not find array type for data type %s",
! 								format_type_be(exprType((Node *) tent->expr)))));
! 						}
  					}
  				}
  				else if (sublink->subLinkType == MULTIEXPR_SUBLINK)
***************
*** 139,150 **** exprType(const Node *expr)
  					type = subplan->firstColType;
  					if (subplan->subLinkType == ARRAY_SUBLINK)
  					{
! 						type = get_array_type(type);
! 						if (!OidIsValid(type))
! 							ereport(ERROR,
! 									(errcode(ERRCODE_UNDEFINED_OBJECT),
! 									 errmsg("could not find array type for data type %s",
! 									format_type_be(subplan->firstColType))));
  					}
  				}
  				else if (subplan->subLinkType == MULTIEXPR_SUBLINK)
--- 143,158 ----
  					type = subplan->firstColType;
  					if (subplan->subLinkType == ARRAY_SUBLINK)
  					{
! 						if (!OidIsValid(get_element_type(type)))
! 						{
! 							/* not array, so check for its array type */
! 							type = get_array_type(type);
! 							if (!OidIsValid(type))
! 								ereport(ERROR,
! 										(errcode(ERRCODE_UNDEFINED_OBJECT),
! 										 errmsg("could not find array type for data type %s",
! 										format_type_be(subplan->firstColType))));
! 						}
  					}
  				}
  				else if (subplan->subLinkType == MULTIEXPR_SUBLINK)
*** a/src/backend/optimizer/plan/subselect.c
--- b/src/backend/optimizer/plan/subselect.c
***************
*** 668,677 **** build_subplan(PlannerInfo *root, Plan *plan, PlannerInfo *subroot,
  
  		Assert(!te->resjunk);
  		Assert(testexpr == NULL);
! 		arraytype = get_array_type(exprType((Node *) te->expr));
! 		if (!OidIsValid(arraytype))
! 			elog(ERROR, "could not find array type for datatype %s",
! 				 format_type_be(exprType((Node *) te->expr)));
  		prm = generate_new_param(root,
  								 arraytype,
  								 exprTypmod((Node *) te->expr),
--- 668,683 ----
  
  		Assert(!te->resjunk);
  		Assert(testexpr == NULL);
! 
! 		arraytype = exprType((Node *) te->expr);
! 		if (!OidIsValid(get_element_type(arraytype)))
! 		{
! 			/* not array, so get the array type */
! 			arraytype = get_array_type(exprType((Node *) te->expr));
! 			if (!OidIsValid(arraytype))
! 				elog(ERROR, "could not find array type for datatype %s",
! 					 format_type_be(exprType((Node *) te->expr)));
! 		}
  		prm = generate_new_param(root,
  								 arraytype,
  								 exprTypmod((Node *) te->expr),
*** a/src/backend/utils/adt/array_userfuncs.c
--- b/src/backend/utils/adt/array_userfuncs.c
***************
*** 16,22 ****
  #include "utils/builtins.h"
  #include "utils/lsyscache.h"
  
- 
  /*-----------------------------------------------------------------------------
   * array_push :
   *		push an element onto either end of a one-dimensional array
--- 16,21 ----
*** a/src/backend/utils/adt/arrayfuncs.c
--- b/src/backend/utils/adt/arrayfuncs.c
***************
*** 145,151 **** static int width_bucket_array_variable(Datum operand,
  							Oid collation,
  							TypeCacheEntry *typentry);
  
- 
  /*
   * array_in :
   *		  converts an array from the external format in "string" to
--- 145,150 ----
*** a/src/include/catalog/pg_aggregate.h
--- b/src/include/catalog/pg_aggregate.h
***************
*** 275,280 **** DATA(insert ( 2901	n 0 xmlconcat2	-					-				-				-				f f 0	142		0	0		0	_null_
--- 275,281 ----
  
  /* array */
  DATA(insert ( 2335	n 0 array_agg_transfn	array_agg_finalfn	-				-				-				t f 0	2281	0	0		0	_null_ _null_ ));
+ DATA(insert ( 6005	n 0 array_agg_anyarray_transfn	array_agg_anyarray_finalfn	-				-				-				t f 0	2281	0	0		0	_null_ _null_ ));
  
  /* text */
  DATA(insert ( 3538	n 0 string_agg_transfn	string_agg_finalfn	-				-				-				f f 0	2281	0	0		0	_null_ _null_ ));
*** a/src/include/catalog/pg_proc.h
--- b/src/include/catalog/pg_proc.h
***************
*** 879,889 **** DATA(insert OID = 3167 (  array_remove	   PGNSP PGUID 12 1 0 0 0 f f f f f f i 2
  DESCR("remove any occurrences of an element from an array");
  DATA(insert OID = 3168 (  array_replace    PGNSP PGUID 12 1 0 0 0 f f f f f f i 3 0 2277 "2277 2283 2283" _null_ _null_ _null_ _null_ array_replace _null_ _null_ _null_ ));
  DESCR("replace any occurrences of an element in an array");
! DATA(insert OID = 2333 (  array_agg_transfn   PGNSP PGUID 12 1 0 0 0 f f f f f f i 2 0 2281 "2281 2283" _null_ _null_ _null_ _null_ array_agg_transfn _null_ _null_ _null_ ));
  DESCR("aggregate transition function");
! DATA(insert OID = 2334 (  array_agg_finalfn   PGNSP PGUID 12 1 0 0 0 f f f f f f i 2 0 2277 "2281 2283" _null_ _null_ _null_ _null_ array_agg_finalfn _null_ _null_ _null_ ));
  DESCR("aggregate final function");
! DATA(insert OID = 2335 (  array_agg		   PGNSP PGUID 12 1 0 0 0 t f f f f f i 1 0 2277 "2283" _null_ _null_ _null_ _null_ aggregate_dummy _null_ _null_ _null_ ));
  DESCR("concatenate aggregate input into an array");
  DATA(insert OID = 3218 ( width_bucket	   PGNSP PGUID 12 1 0 0 0 f f f f t f i 2 0 23 "2283 2277" _null_ _null_ _null_ _null_ width_bucket_array _null_ _null_ _null_ ));
  DESCR("bucket number of operand given a sorted array of bucket lower bounds");
--- 879,895 ----
  DESCR("remove any occurrences of an element from an array");
  DATA(insert OID = 3168 (  array_replace    PGNSP PGUID 12 1 0 0 0 f f f f f f i 3 0 2277 "2277 2283 2283" _null_ _null_ _null_ _null_ array_replace _null_ _null_ _null_ ));
  DESCR("replace any occurrences of an element in an array");
! DATA(insert OID = 2333 (  array_agg_transfn   PGNSP PGUID 12 1 0 0 0 f f f f f f i 2 0 2281 "2281 2776" _null_ _null_ _null_ _null_ array_agg_transfn _null_ _null_ _null_ ));
  DESCR("aggregate transition function");
! DATA(insert OID = 2334 (  array_agg_finalfn   PGNSP PGUID 12 1 0 0 0 f f f f f f i 2 0 2277 "2281 2776" _null_ _null_ _null_ _null_ array_agg_finalfn _null_ _null_ _null_ ));
  DESCR("aggregate final function");
! DATA(insert OID = 2335 (  array_agg		   PGNSP PGUID 12 1 0 0 0 t f f f f f i 1 0 2277 "2776" _null_ _null_ _null_ _null_ aggregate_dummy _null_ _null_ _null_ ));
! DESCR("concatenate aggregate input into an array");
! DATA(insert OID = 6003 (  array_agg_anyarray_transfn   PGNSP PGUID 12 1 0 0 0 f f f f f f i 2 0 2281 "2281 2277" _null_ _null_ _null_ _null_ array_agg_anyarray_transfn _null_ _null_ _null_ ));
! DESCR("aggregate transition function");
! DATA(insert OID = 6004 (  array_agg_anyarray_finalfn   PGNSP PGUID 12 1 0 0 0 f f f f f f i 2 0 2277 "2281 2277" _null_ _null_ _null_ _null_ array_agg_anyarray_finalfn _null_ _null_ _null_ ));
! DESCR("aggregate final function");
! DATA(insert OID = 6005 (  array_agg		   PGNSP PGUID 12 1 0 0 0 t f f f f f i 1 0 2277 "2277" _null_ _null_ _null_ _null_ aggregate_dummy _null_ _null_ _null_ ));
  DESCR("concatenate aggregate input into an array");
  DATA(insert OID = 3218 ( width_bucket	   PGNSP PGUID 12 1 0 0 0 f f f f t f i 2 0 23 "2283 2277" _null_ _null_ _null_ _null_ width_bucket_array _null_ _null_ _null_ ));
  DESCR("bucket number of operand given a sorted array of bucket lower bounds");
*** a/src/include/utils/array.h
--- b/src/include/utils/array.h
***************
*** 76,94 **** typedef struct
  
  /*
   * working state for accumArrayResult() and friends
   */
  typedef struct ArrayBuildState
  {
  	MemoryContext mcontext;		/* where all the temp stuff is kept */
  	Datum	   *dvalues;		/* array of accumulated Datums */
  	bool	   *dnulls;			/* array of is-null flags for Datums */
  	int			alen;			/* allocated length of above arrays */
  	int			nelems;			/* number of valid entries in above arrays */
! 	Oid			element_type;	/* data type of the Datums */
! 	int16		typlen;			/* needed info about datatype */
! 	bool		typbyval;
! 	char		typalign;
! } ArrayBuildState;
  
  /*
   * structure to cache type metadata needed for array manipulation
--- 76,132 ----
  
  /*
   * working state for accumArrayResult() and friends
+  *
+  * is_array_accum: whether accumulating array values.
+  * (if true must be casted to ArrayBuildStateArray, else
+  *  cast to ArrayBuildStateScalar)
   */
  typedef struct ArrayBuildState
  {
+ 	bool		is_array_accum;
  	MemoryContext mcontext;		/* where all the temp stuff is kept */
+ 	Oid			element_type;	/* data type of the Datums */
+ 	int16		typlen;			/* needed info about datatype */
+ 	bool		typbyval;
+ 	char		typalign;
+ } ArrayBuildState;
+ 
+ /*
+  * array build state for array accumulation of scalar datums
+  */
+ typedef struct ArrayBuildStateScalar
+ {
+ 	ArrayBuildState astate;
+ 
  	Datum	   *dvalues;		/* array of accumulated Datums */
  	bool	   *dnulls;			/* array of is-null flags for Datums */
  	int			alen;			/* allocated length of above arrays */
  	int			nelems;			/* number of valid entries in above arrays */
! } ArrayBuildStateScalar;
! 
! 
! /*
!  * array build state for array accumulation of array datums
!  */
! typedef struct
! {
! 	ArrayBuildState astate;
! 
! 	char	   *data;			/* array of accumulated data */
! 	bits8	   *nullbitmap;		/* bitmap of is-null flags for data */
! 
! 	int			abytes;			/* allocated length of above arrays */
! 	int			aitems;			/* allocated length of above arrays */
! 	int			nbytes;			/* number of used bytes in above arrays */
! 	int			nitems;			/* number of elements in above arrays */
! 	int			narray;			/* number of array accumulated */
! 
! 	int			ndims;			/* element dimensions */
! 	int		   *dims;
! 	int		   *lbs;
! 
! 	bool		hasnull;		/* any element has null */
! } ArrayBuildStateArray;
  
  /*
   * structure to cache type metadata needed for array manipulation
***************
*** 260,266 **** extern Datum makeArrayResult(ArrayBuildState *astate,
  				MemoryContext rcontext);
  extern Datum makeMdArrayResult(ArrayBuildState *astate, int ndims,
  				  int *dims, int *lbs, MemoryContext rcontext, bool release);
! 
  extern ArrayIterator array_create_iterator(ArrayType *arr, int slice_ndim);
  extern bool array_iterate(ArrayIterator iterator, Datum *value, bool *isnull);
  extern void array_free_iterator(ArrayIterator iterator);
--- 298,305 ----
  				MemoryContext rcontext);
  extern Datum makeMdArrayResult(ArrayBuildState *astate, int ndims,
  				  int *dims, int *lbs, MemoryContext rcontext, bool release);
! extern Datum makeArrayResultArray(ArrayBuildStateArray *astate,
! 				  MemoryContext rcontext, bool release);
  extern ArrayIterator array_create_iterator(ArrayType *arr, int slice_ndim);
  extern bool array_iterate(ArrayIterator iterator, Datum *value, bool *isnull);
  extern void array_free_iterator(ArrayIterator iterator);
***************
*** 293,298 **** extern ArrayType *create_singleton_array(FunctionCallInfo fcinfo,
--- 332,340 ----
  extern Datum array_agg_transfn(PG_FUNCTION_ARGS);
  extern Datum array_agg_finalfn(PG_FUNCTION_ARGS);
  
+ extern Datum array_agg_anyarray_transfn(PG_FUNCTION_ARGS);
+ extern Datum array_agg_anyarray_finalfn(PG_FUNCTION_ARGS);
+ 
  /*
   * prototypes for functions defined in array_typanalyze.c
   */
*** a/src/test/regress/expected/aggregates.out
--- b/src/test/regress/expected/aggregates.out
***************
*** 914,919 **** select array_agg(distinct a order by a desc nulls last)
--- 914,946 ----
   {3,2,1,NULL}
  (1 row)
  
+ -- array_agg(anyarray)
+ select array_agg(ar)
+   from (values ('{1,2}'::int[]), ('{3,4}'::int[])) v(ar);
+    array_agg   
+ ---------------
+  {{1,2},{3,4}}
+ (1 row)
+ 
+ select array_agg(distinct ar order by ar desc)
+   from (select array[i / 2] from generate_series(1,10) a(i)) b(ar);
+          array_agg         
+ ---------------------------
+  {{5},{4},{3},{2},{1},{0}}
+ (1 row)
+ 
+ select array_agg(ar)
+   from (select array_agg(array[i, i+1, i-1]) 
+           from generate_series(1,2) a(i)) b(ar);
+       array_agg      
+ ---------------------
+  {{{1,2,0},{2,3,1}}}
+ (1 row)
+ 
+ select array_agg('{}'::int[]) from generate_series(1,2);
+ ERROR:  cannot accumulate empty arrays
+ select array_agg(null::int[]) from generate_series(1,2);
+ ERROR:  cannot accumulate null arrays
  -- multi-arg aggs, strict/nonstrict, distinct/order by
  select aggfstr(a,b,c)
    from (values (1,3,'foo'),(0,null,null),(2,2,'bar'),(3,1,'baz')) v(a,b,c);
*** a/src/test/regress/expected/arrays.out
--- b/src/test/regress/expected/arrays.out
***************
*** 1521,1526 **** select array_agg(unique1) from tenk1 where unique1 < -15;
--- 1521,1543 ----
   
  (1 row)
  
+ select array(select unique1 from tenk1 where unique1 < 15 order by unique1);
+                 array                 
+ --------------------------------------
+  {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14}
+ (1 row)
+ 
+ select array(select array[i,i/2] from generate_series(1,5) a(i));
+               array              
+ ---------------------------------
+  {{1,0},{2,1},{3,1},{4,2},{5,2}}
+ (1 row)
+ 
+ -- cannot accumulate null arrays and empty arrays
+ select array(select null::int[]);
+ ERROR:  cannot accumulate null arrays
+ select array(select '{}'::int[]);
+ ERROR:  cannot accumulate empty arrays
  select unnest(array[1,2,3]);
   unnest 
  --------
*** a/src/test/regress/sql/aggregates.sql
--- b/src/test/regress/sql/aggregates.sql
***************
*** 322,327 **** select array_agg(distinct a order by a desc)
--- 322,339 ----
  select array_agg(distinct a order by a desc nulls last)
    from (values (1),(2),(1),(3),(null),(2)) v(a);
  
+ -- array_agg(anyarray)
+ select array_agg(ar)
+   from (values ('{1,2}'::int[]), ('{3,4}'::int[])) v(ar);
+ select array_agg(distinct ar order by ar desc)
+   from (select array[i / 2] from generate_series(1,10) a(i)) b(ar);
+ select array_agg(ar)
+   from (select array_agg(array[i, i+1, i-1])
+           from generate_series(1,2) a(i)) b(ar);
+ 
+ select array_agg('{}'::int[]) from generate_series(1,2);
+ select array_agg(null::int[]) from generate_series(1,2);
+ 
  -- multi-arg aggs, strict/nonstrict, distinct/order by
  
  select aggfstr(a,b,c)
*** a/src/test/regress/sql/arrays.sql
--- b/src/test/regress/sql/arrays.sql
***************
*** 432,437 **** select array_agg(ten) from (select ten from tenk1 where unique1 < 15 order by un
--- 432,443 ----
  select array_agg(nullif(ten, 4)) from (select ten from tenk1 where unique1 < 15 order by unique1) ss;
  select array_agg(unique1) from tenk1 where unique1 < -15;
  
+ select array(select unique1 from tenk1 where unique1 < 15 order by unique1);
+ select array(select array[i,i/2] from generate_series(1,5) a(i));
+ -- cannot accumulate null arrays and empty arrays
+ select array(select null::int[]);
+ select array(select '{}'::int[]);
+ 
  select unnest(array[1,2,3]);
  select * from unnest(array[1,2,3]);
  select unnest(array[1,2,3,4.5]::float8[]);
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to