2014-10-23 4:00 GMT+02:00 Ali Akbar <the.ap...@gmail.com>: > 2014-10-22 22:48 GMT+07:00 Pavel Stehule <pavel.steh...@gmail.com>: > >> 2014-10-22 16:58 GMT+02:00 Ali Akbar <the.ap...@gmail.com>: >> >>> Thanks for the review >>> >>> 2014-10-22 20:51 GMT+07:00 Pavel Stehule <pavel.steh...@gmail.com>: >>> >>>> I agree with your proposal. I have a few comments to design: >>>> >>> >>>> 1. patch doesn't hold documentation and regress tests, please append it. >>>> >>> OK, i'll add the documentation and regression test >>> >>> >>>> 2. this functionality (multidimensional aggregation) can be interesting >>>> more times, so maybe some interface like array builder should be preferred. >>>> >>> We already have accumArrayResult and makeArrayResult/makeMdArrayResult, >>> haven't we? >>> >>> Actually array_agg(anyarray) can be implemented by using >>> accumArrayResult and makeMdArrayResult, but in the process we will >>> deconstruct the array data and null bit-flags into ArrayBuildState->dvalues >>> and dnulls. And we will reconstruct it through makeMdArrayResult. I think >>> this way it's not efficient, so i decided to reimplement it and memcpy the >>> array data and null flags as-is. >>> >> >> aha, so isn't better to fix a performance for accumArrayResult ? >> > > Ok, i'll go this route. While reading the code of array(subselect) as you > pointed below, i think the easiest way to implement support for both > array_agg(anyarray) and array(subselect) is to change accumArrayResult so > it operates both with scalar datum(s) and array datums, with performance > optimization for the latter. > > In other places, i think it's clearer if we just use accumArrayResult and >>> makeArrayResult/makeMdArrayResult as API for building (multidimensional) >>> arrays. >>> >>> >>>> 3. array_agg was consistent with array(subselect), so it should be >>>> fixed too >>>> >>>> postgres=# select array_agg(a) from test; >>>> array_agg >>>> ----------------------- >>>> {{1,2,3,4},{1,2,3,4}} >>>> (1 row) >>>> >>>> postgres=# select array(select a from test); >>>> ERROR: could not find array type for data type integer[] >>>> >>> >>> I'm pretty new in postgresql development. Can you point out where is >>> array(subselect) implemented? >>> >> >> where you can start? >> >> postgres=# explain select array(select a from test); >> ERROR: 42704: could not find array type for data type integer[] >> LOCATION: exprType, nodeFuncs.c:116 >> >> look to code ... your magic keyword is a ARRAY_SUBLINK .. search in >> postgresql sources this keyword >> > > Found it. Thanks. > > On a side note in postgres array type for integer[] is still integer[], > but in pg_type, integer[] has no array type. I propose we consider to > change it so array type of anyarray is itself (not in this patch, of > course, because it is a big change). Consider the following code in > nodeFuncs.c:109 >
yes, it is true - this is really big change and maybe needs separate discuss - ***if we allow cycle there. I am not sure about possible side effects***. Maybe this change is not necessary, you can fix a check only ... "if type is not array or if get_array_type is null raise a error" > > if (sublink->subLinkType == ARRAY_SUBLINK) > { > type = get_array_type(type); > if (!OidIsValid(type)) > ereport(ERROR, > (errcode(ERRCODE_UNDEFINED_OBJECT), > errmsg("could not find array type for data type %s", > format_type_be(exprType((Node *) tent->expr))))); > } > > to implement array(subselect) you pointed above, we must specially checks > for array types. Quick search on get_array_type returns 10 places. > attention: probably we don't would to allow arrays everywhere. > > I'll think more about this later. For this patch, i'll go without changes > in pg_type.h. > > 4. why you use a magic constant (64) there? >>>> >>>> + astate->abytes = 64 * (ndatabytes == 0 ? 1 : ndatabytes); >>>> + astate->aitems = 64 * nitems; >>>> >>>> + astate->nullbitmap = (bits8 *) >>>> + repalloc(astate->nullbitmap, (astate->aitems + 7) / >>>> 8); >>>> >>> >>> just follow the arbitrary size choosen in accumArrayResult >>> (arrayfuncs.c): >>> astate->alen = 64; /* arbitrary starting array size */ >>> >>> it can be any number not too small and too big. Too small, and we will >>> realloc shortly. Too big, we will end up wasting memory. >>> >> >> you can try to alloc 1KB instead as start -- it is used more times in >> Postgres. Then a overhead is max 1KB per agg call - what is acceptable. >> >> You take this value from accumArrayResult, but it is targeted for shorted >> scalars - you should to expect so any array will be much larger. >> > > Ok. > > > Regards, > -- > Ali Akbar >