2014-10-22 22:48 GMT+07:00 Pavel Stehule <pavel.steh...@gmail.com>: > 2014-10-22 16:58 GMT+02:00 Ali Akbar <the.ap...@gmail.com>: > >> Thanks for the review >> >> 2014-10-22 20:51 GMT+07:00 Pavel Stehule <pavel.steh...@gmail.com>: >> >>> I agree with your proposal. I have a few comments to design: >>> >> >>> 1. patch doesn't hold documentation and regress tests, please append it. >>> >> OK, i'll add the documentation and regression test >> >> >>> 2. this functionality (multidimensional aggregation) can be interesting >>> more times, so maybe some interface like array builder should be preferred. >>> >> We already have accumArrayResult and makeArrayResult/makeMdArrayResult, >> haven't we? >> >> Actually array_agg(anyarray) can be implemented by using accumArrayResult >> and makeMdArrayResult, but in the process we will deconstruct the array >> data and null bit-flags into ArrayBuildState->dvalues and dnulls. And we >> will reconstruct it through makeMdArrayResult. I think this way it's not >> efficient, so i decided to reimplement it and memcpy the array data and >> null flags as-is. >> > > aha, so isn't better to fix a performance for accumArrayResult ? >
Ok, i'll go this route. While reading the code of array(subselect) as you pointed below, i think the easiest way to implement support for both array_agg(anyarray) and array(subselect) is to change accumArrayResult so it operates both with scalar datum(s) and array datums, with performance optimization for the latter. In other places, i think it's clearer if we just use accumArrayResult and >> makeArrayResult/makeMdArrayResult as API for building (multidimensional) >> arrays. >> >> >>> 3. array_agg was consistent with array(subselect), so it should be fixed >>> too >>> >>> postgres=# select array_agg(a) from test; >>> array_agg >>> ----------------------- >>> {{1,2,3,4},{1,2,3,4}} >>> (1 row) >>> >>> postgres=# select array(select a from test); >>> ERROR: could not find array type for data type integer[] >>> >> >> I'm pretty new in postgresql development. Can you point out where is >> array(subselect) implemented? >> > > where you can start? > > postgres=# explain select array(select a from test); > ERROR: 42704: could not find array type for data type integer[] > LOCATION: exprType, nodeFuncs.c:116 > > look to code ... your magic keyword is a ARRAY_SUBLINK .. search in > postgresql sources this keyword > Found it. Thanks. On a side note in postgres array type for integer[] is still integer[], but in pg_type, integer[] has no array type. I propose we consider to change it so array type of anyarray is itself (not in this patch, of course, because it is a big change). Consider the following code in nodeFuncs.c:109 if (sublink->subLinkType == ARRAY_SUBLINK) { type = get_array_type(type); if (!OidIsValid(type)) ereport(ERROR, (errcode(ERRCODE_UNDEFINED_OBJECT), errmsg("could not find array type for data type %s", format_type_be(exprType((Node *) tent->expr))))); } to implement array(subselect) you pointed above, we must specially checks for array types. Quick search on get_array_type returns 10 places. I'll think more about this later. For this patch, i'll go without changes in pg_type.h. 4. why you use a magic constant (64) there? >>> >>> + astate->abytes = 64 * (ndatabytes == 0 ? 1 : ndatabytes); >>> + astate->aitems = 64 * nitems; >>> >>> + astate->nullbitmap = (bits8 *) >>> + repalloc(astate->nullbitmap, (astate->aitems + 7) / 8); >>> >> >> just follow the arbitrary size choosen in accumArrayResult (arrayfuncs.c): >> astate->alen = 64; /* arbitrary starting array size */ >> >> it can be any number not too small and too big. Too small, and we will >> realloc shortly. Too big, we will end up wasting memory. >> > > you can try to alloc 1KB instead as start -- it is used more times in > Postgres. Then a overhead is max 1KB per agg call - what is acceptable. > > You take this value from accumArrayResult, but it is targeted for shorted > scalars - you should to expect so any array will be much larger. > Ok. Regards, -- Ali Akbar