Re: range_agg with multirange inputs
This patch has been committed. I split it into a few pieces. On 12.03.22 04:18, Paul Jungwirth wrote: On 3/10/22 14:07, Chapman Flack wrote: When I apply this patch, I get a func.sgml with two entries for range_intersect_agg(anymultirange). Arg, fixed. In range_agg_transfn, you've changed the message in the "must be called with a range or multirange"; that seems like another good candidate to be an elog. Agreed. Updated here. I kept those messages as "range" or "multirange" separately, instead of "range or multirange". This way, we don't have to update all the messages of this kind when a new function is added. Since these are only internal messages anyway, I opted for higher maintainability.
Re: range_agg with multirange inputs
Fwiw the cfbot is failing due to a duplicate OID. Traditionally we didn't treat duplicate OIDs as reason to reject a patch because they're inevitable as other patches get committed and the committer can just renumber them. I think the cfbot kind of changes this calculus since it's a pain lose the visibility into whether the rest of the tests are passing that the cfbot normally gives us. If it's not to much of a hassle could you renumber resubmit the patch with an updated OID? [10:54:57.606] su postgres -c "make -s -j${BUILD_JOBS} world-bin" [10:54:57.927] Duplicate OIDs detected: [10:54:57.927] 8000
Re: range_agg with multirange inputs
On 03/11/22 22:18, Paul Jungwirth wrote: > Arg, fixed. > >> In range_agg_transfn, you've changed the message in the "must be called >> with a range or multirange"; that seems like another good candidate to >> be an elog. > > Agreed. Updated here. This looks good to me and passes installcheck-world, so I'll push the RfC button. > Sharing a prosrc seems even less remarkable than sharing an aggfinalfn. > You're right there are no cases for other finalfns yet, but I don't think > there is anything special about finalfns that would make this a weirder > thing to do there than with ordinary functions. You sent me back to look at how many of those there are. I get 42 cases of shared prosrc (43 now). The chief subgroup of those looks to involve sharing between parameter signatures where the types have identical layouts and the semantic differences are unimportant to the function in question (comparisons between bit or between varbit, overlaps taking timestamp or timestamptz, etc.). The other prominent group is range and multirange constructors, where the C function has an obviously generic name like range_constructor2 and gets shared by a bunch of SQL declarations. I think here we've added the first instance where the C function is shared by SQL-declared functions accepting two different polymorphic pseudotypes. But it's clearly simple and works, nothing objectionable about it. I had experimented with renaming multirange_agg_finalfn to just range_agg_finalfn so it would just look like two overloads of one function sharing a prosrc, and ultimately gave up because genbki.pl couldn't resolve the OID where the name is used in pg_aggregate.dat. That's why it surprised me to see three instances where other functions (overlaps, isfinite, name) do use the same SQL name for different overloads, but the explanation seems to be that nothing else at genbki time refers to those, so genbki's unique-name limitation doesn't affect them. Neither here nor there for this patch, but an interesting new thing I learned while reviewing it. Regards, -Chap
Re: range_agg with multirange inputs
On 3/10/22 14:07, Chapman Flack wrote: When I apply this patch, I get a func.sgml with two entries for range_intersect_agg(anymultirange). Arg, fixed. In range_agg_transfn, you've changed the message in the "must be called with a range or multirange"; that seems like another good candidate to be an elog. Agreed. Updated here. I think your query finds aggregate declarations that share the same SQL function declaration as their finalizer functions. That seems to be more common. The query I used looks for cases where different SQL-declared functions appear as finalizers of aggregates, but the different SQL declared functions share the same internal C implementation. Okay, I see. I believe that is quite common for ordinary SQL functions. Sharing a prosrc seems even less remarkable than sharing an aggfinalfn. You're right there are no cases for other finalfns yet, but I don't think there is anything special about finalfns that would make this a weirder thing to do there than with ordinary functions. Still, noting it with a comment does seem helpful. I've updated the remark to match what you suggested. Thank you again for the review, and sorry for so many iterations! :-) Yours, -- Paul ~{:-) p...@illuminatedcomputing.comFrom b409d7333132e34a928ec8cc0ecca0a3421b7268 Mon Sep 17 00:00:00 2001 From: "Paul A. Jungwirth" Date: Fri, 10 Dec 2021 16:04:57 -0800 Subject: [PATCH v4] Add range_agg with multirange inputs --- doc/src/sgml/func.sgml| 30 ++ src/backend/utils/adt/multirangetypes.c | 69 ++-- src/include/catalog/pg_aggregate.dat | 3 + src/include/catalog/pg_proc.dat | 11 ++ src/test/regress/expected/multirangetypes.out | 100 ++ src/test/regress/expected/opr_sanity.out | 1 + src/test/regress/sql/multirangetypes.sql | 21 src/test/regress/sql/opr_sanity.sql | 1 + 8 files changed, 230 insertions(+), 6 deletions(-) diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index df3cd5987b..ab6c4f0093 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -20012,8 +20012,23 @@ SELECT NULLIF(value, '(none)') ... No + + + + range_agg + +range_agg ( value + anymultirange ) +anymultirange + + +Computes the union of the non-null input values. + + No + + range_intersect_agg @@ -20027,8 +20042,23 @@ SELECT NULLIF(value, '(none)') ... No + + + + range_intersect_agg + +range_intersect_agg ( value + anymultirange ) +anymultirange + + +Computes the intersection of the non-null input values. + + No + + string_agg diff --git a/src/backend/utils/adt/multirangetypes.c b/src/backend/utils/adt/multirangetypes.c index c474b24431..a6d376b083 100644 --- a/src/backend/utils/adt/multirangetypes.c +++ b/src/backend/utils/adt/multirangetypes.c @@ -1344,11 +1344,9 @@ range_agg_transfn(PG_FUNCTION_ARGS) elog(ERROR, "range_agg_transfn called in non-aggregate context"); rngtypoid = get_fn_expr_argtype(fcinfo->flinfo, 1); if (!type_is_range(rngtypoid)) - ereport(ERROR, -(errcode(ERRCODE_DATATYPE_MISMATCH), - errmsg("range_agg must be called with a range"))); + elog(ERROR, "range_agg must be called with a range or multirange"); if (PG_ARGISNULL(0)) state = initArrayResult(rngtypoid, aggContext, false); else @@ -1362,8 +1360,11 @@ range_agg_transfn(PG_FUNCTION_ARGS) } /* * range_agg_finalfn: use our internal array to merge touching ranges. + * + * Shared by range_agg_finalfn(anyrange) and + * multirange_agg_finalfn(anymultirange). */ Datum range_agg_finalfn(PG_FUNCTION_ARGS) { @@ -1397,8 +1398,66 @@ range_agg_finalfn(PG_FUNCTION_ARGS) PG_RETURN_MULTIRANGE_P(make_multirange(mltrngtypoid, typcache->rngtype, range_count, ranges)); } +/* + * multirange_agg_transfn: combine adjacent/overlapping multiranges. + * + * All we do here is gather the input multiranges' ranges into an array + * so that the finalfn can sort and combine them. + */ +Datum +multirange_agg_transfn(PG_FUNCTION_ARGS) +{ + MemoryContext aggContext; + Oid mltrngtypoid; + TypeCacheEntry *typcache; + TypeCacheEntry *rngtypcache; + ArrayBuildState *state; + MultirangeType *current; + int32 range_count; + RangeType **ranges; + int32 i; + + if (!AggCheckCallContext(fcinfo, &aggContext)) + elog(ERROR, "multirange_agg_transfn called in non-aggregate context"); + + mltrngtypoid = get_fn_expr_argtype(fcinfo->flinfo, 1); + if (!type_is_multirange(mltrngtypoid)) + elog(ERROR, "range_agg must be called with a range or multirange"); + + typcache = multirange_get_typcache(fcinfo, mltrngtypoid); + rngtypcache = typcache->rngtype; + + if (PG_ARGISNULL
Re: range_agg with multirange inputs
On 03/05/22 15:53, Paul Jungwirth wrote: > On 3/1/22 13:33, Chapman Flack wrote: >> I think the 4 lines should suffice, but it looks like this patch was >> generated from a rebase of the old one (with three lines) that ended up >> putting the new 'range_agg' entry ahead of 'max' in func.sgml, which >> position is now baked into the 4 lines of context. :) > > You're right, my last rebase messed up the docs. Here it is fixed. Sorry > about that! When I apply this patch, I get a func.sgml with two entries for range_intersect_agg(anymultirange). > I like the elog solution. I've changed them in both places. It looks like you've now got elog in three places: the "must be called with a range or multirange" in multirange_agg_transfn and multirange_intersect_agg_transfn, and the "called in non-aggregate context" in multirange_agg_transfn. I think that last is also ok, given that its state type is internal, so it shouldn't be reachable in a user call. In range_agg_transfn, you've changed the message in the "must be called with a range or multirange"; that seems like another good candidate to be an elog. > I see 13 other shared finalfns (using select array_agg(aggfnoid::regproc) as > procs, array_agg(aggtransfn) as transfns, aggfinalfn from pg_aggregate where > aggfinalfn is distinct from 0 group by aggfinalfn having count(*) > 1;) but > a comment can't hurt! Added. I think your query finds aggregate declarations that share the same SQL function declaration as their finalizer functions. That seems to be more common. The query I used looks for cases where different SQL-declared functions appear as finalizers of aggregates, but the different SQL declared functions share the same internal C implementation. That's the query where this seems to be the unique result. WITH finals(regp) AS ( SELECT DISTINCT CAST(aggfinalfn AS regprocedure) FROM pg_aggregate WHERE aggfinalfn <> 0 -- InvalidOid ) SELECT prosrc, array_agg(regp) FROM pg_proc, finals WHERE oid = regp AND prolang = 12 -- INTERNALlanguageId GROUP BY prosrc HAVING count(*) > 1; In other words, I think the interesting thing to say in the C comment is not "shared by range_agg(anyrange) and range_agg(anymultirange)", but "shared by range_agg_finalfn(internal,anyrange) and multirange_agg_finalfn(internal,anymultirange)". It seems a little extra surprising to have one C function declared in SQL with two different names and parameter signatures. It ends up working out because it relies on get_fn_expr_rettype, which can do its job for either polymorphic type it might find in the parameter declaration. But that's a bit subtle. :) Regards, -Chap
Re: range_agg with multirange inputs
On 3/1/22 13:33, Chapman Flack wrote: I think the 4 lines should suffice, but it looks like this patch was generated from a rebase of the old one (with three lines) that ended up putting the new 'range_agg' entry ahead of 'max' in func.sgml, which position is now baked into the 4 lines of context. :) You're right, my last rebase messed up the docs. Here it is fixed. Sorry about that! I would not change them to actual Assert, which would blow up the whole process on failure. If it's a genuine "not expected to happen" case, maybe changing it to elog (or ereport with errmsg_internal) would save a little workload for translators. I like the elog solution. I've changed them in both places. I did a small double-take seeing the C range_agg_finalfn being shared by the SQL range_agg_finalfn and multirange_agg_finalfn. I infer that the reason it works is get_fn_expr_rettype works equally well with either parameter type. Do you think it would be worth adding a comment at the C function explaining that? In a quick query I just did, I found no other aggregate final functions sharing a C function that way, so this could be the first. I see 13 other shared finalfns (using select array_agg(aggfnoid::regproc) as procs, array_agg(aggtransfn) as transfns, aggfinalfn from pg_aggregate where aggfinalfn is distinct from 0 group by aggfinalfn having count(*) > 1;) but a comment can't hurt! Added. Thanks, -- Paul ~{:-) p...@illuminatedcomputing.comFrom 1e7a96668b0ad341d29281055927ad693c656843 Mon Sep 17 00:00:00 2001 From: "Paul A. Jungwirth" Date: Fri, 10 Dec 2021 16:04:57 -0800 Subject: [PATCH v3] Add range_agg with multirange inputs --- doc/src/sgml/func.sgml| 45 src/backend/utils/adt/multirangetypes.c | 66 +++- src/include/catalog/pg_aggregate.dat | 3 + src/include/catalog/pg_proc.dat | 11 ++ src/test/regress/expected/multirangetypes.out | 100 ++ src/test/regress/expected/opr_sanity.out | 1 + src/test/regress/sql/multirangetypes.sql | 21 src/test/regress/sql/opr_sanity.sql | 1 + 8 files changed, 244 insertions(+), 4 deletions(-) diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index df3cd5987b..fb1963a687 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -20012,8 +20012,23 @@ SELECT NULLIF(value, '(none)') ... No + + + + range_agg + +range_agg ( value + anymultirange ) +anymultirange + + +Computes the union of the non-null input values. + + No + + range_intersect_agg @@ -20027,8 +20042,38 @@ SELECT NULLIF(value, '(none)') ... No + + + + range_intersect_agg + +range_intersect_agg ( value + anymultirange ) +anymultirange + + +Computes the intersection of the non-null input values. + + No + + + + + + range_intersect_agg + +range_intersect_agg ( value + anymultirange ) +anymultirange + + +Computes the intersection of the non-null input values. + + No + + string_agg diff --git a/src/backend/utils/adt/multirangetypes.c b/src/backend/utils/adt/multirangetypes.c index c474b24431..5a4c00457a 100644 --- a/src/backend/utils/adt/multirangetypes.c +++ b/src/backend/utils/adt/multirangetypes.c @@ -1346,9 +1346,9 @@ range_agg_transfn(PG_FUNCTION_ARGS) rngtypoid = get_fn_expr_argtype(fcinfo->flinfo, 1); if (!type_is_range(rngtypoid)) ereport(ERROR, (errcode(ERRCODE_DATATYPE_MISMATCH), - errmsg("range_agg must be called with a range"))); + errmsg("range_agg must be called with a range or multirange"))); if (PG_ARGISNULL(0)) state = initArrayResult(rngtypoid, aggContext, false); else @@ -1362,8 +1362,10 @@ range_agg_transfn(PG_FUNCTION_ARGS) } /* * range_agg_finalfn: use our internal array to merge touching ranges. + * + * Shared by range_agg(anyrange) and range_agg(anymultirange). */ Datum range_agg_finalfn(PG_FUNCTION_ARGS) { @@ -1397,8 +1399,66 @@ range_agg_finalfn(PG_FUNCTION_ARGS) PG_RETURN_MULTIRANGE_P(make_multirange(mltrngtypoid, typcache->rngtype, range_count, ranges)); } +/* + * multirange_agg_transfn: combine adjacent/overlapping multiranges. + * + * All we do here is gather the input multiranges' ranges into an array + * so that the finalfn can sort and combine them. + */ +Datum +multirange_agg_transfn(PG_FUNCTION_ARGS) +{ + MemoryContext aggContext; + Oid mltrngtypoid; + TypeCacheEntry *typcache; + TypeCacheEntry *rngtypcache; + ArrayBuildState *state; + MultirangeType *current; + int32 range_count; + RangeType **ranges; + int32 i; + + if (!AggCheckCallContext(f
Re: range_agg with multirange inputs
On 02/28/22 23:31, Paul Jungwirth wrote: > On 2/26/22 17:13, Chapman Flack wrote: >> (I think generating >> the patch with 4 lines of context would be enough to keep that from being >> a recurring issue.) > > Thank you for the review and the tip re 4 lines of context! Rebase attached. I think the 4 lines should suffice, but it looks like this patch was generated from a rebase of the old one (with three lines) that ended up putting the new 'range_agg' entry ahead of 'max' in func.sgml, which position is now baked into the 4 lines of context. :) So I think it needs a bit of manual attention to get the additions back in the right places, and then a 4-context-lines patch generated from that. > I changed the message to "range_agg must be called > with a range or multirange". How does that seem? That works for me. >> I kind of wonder whether either message is really reachable, at least >> through the aggregate machinery in the expected way. Won't that machinery >> ensure that it is calling the right transfn with the right type of >> argument? If that's the case, maybe the messages could only be seen >> by someone calling the transfn directly ... which also seems ruled out >> for these transfns because the state type is internal. Is this whole test >> more of the nature of an assertion? > > I don't think they are reachable, so perhaps they are more like asserts. Do > you think I should change it? It seems like a worthwhile check in any case. I would not change them to actual Assert, which would blow up the whole process on failure. If it's a genuine "not expected to happen" case, maybe changing it to elog (or ereport with errmsg_internal) would save a little workload for translators. But as you were copying an existing ereport with a translatable message, there's also an argument for sticking to that style, and maybe mentioning the question to an eventual committer who might have a stronger opinion. I did a small double-take seeing the C range_agg_finalfn being shared by the SQL range_agg_finalfn and multirange_agg_finalfn. I infer that the reason it works is get_fn_expr_rettype works equally well with either parameter type. Do you think it would be worth adding a comment at the C function explaining that? In a quick query I just did, I found no other aggregate final functions sharing a C function that way, so this could be the first. Regards, -Chap
Re: range_agg with multirange inputs
On 2/26/22 17:13, Chapman Flack wrote: This applies (with some fuzz) and passes installcheck-world, but a rebase is needed, because 3 lines of context aren't enough to get the doc changes in the right place in the aggregate function table. (I think generating the patch with 4 lines of context would be enough to keep that from being a recurring issue.) Thank you for the review and the tip re 4 lines of context! Rebase attached. One thing that seems a bit funny is this message in the new multirange_agg_transfn: + if (!type_is_multirange(mltrngtypoid)) + ereport(ERROR, + (errcode(ERRCODE_DATATYPE_MISMATCH), +errmsg("range_agg must be called with a multirange"))); I agree it would be more helpful to users to let them know we can take either kind of argument. I changed the message to "range_agg must be called with a range or multirange". How does that seem? I kind of wonder whether either message is really reachable, at least through the aggregate machinery in the expected way. Won't that machinery ensure that it is calling the right transfn with the right type of argument? If that's the case, maybe the messages could only be seen by someone calling the transfn directly ... which also seems ruled out for these transfns because the state type is internal. Is this whole test more of the nature of an assertion? I don't think they are reachable, so perhaps they are more like asserts. Do you think I should change it? It seems like a worthwhile check in any case. Yours, -- Paul ~{:-) p...@illuminatedcomputing.comFrom a6689485aab9b1aaa6e866f2a577368c7a0e324e Mon Sep 17 00:00:00 2001 From: "Paul A. Jungwirth" Date: Fri, 10 Dec 2021 16:04:57 -0800 Subject: [PATCH v2] Add range_agg with multirange inputs --- doc/src/sgml/func.sgml| 30 ++ src/backend/utils/adt/multirangetypes.c | 62 ++- src/include/catalog/pg_aggregate.dat | 3 + src/include/catalog/pg_proc.dat | 11 ++ src/test/regress/expected/multirangetypes.out | 100 ++ src/test/regress/expected/opr_sanity.out | 1 + src/test/regress/sql/multirangetypes.sql | 21 src/test/regress/sql/opr_sanity.sql | 1 + 8 files changed, 228 insertions(+), 1 deletion(-) diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index df3cd5987b..6a72785327 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -19959,8 +19959,23 @@ SELECT NULLIF(value, '(none)') ... No + + + + range_agg + +range_agg ( value + anymultirange ) +anymultirange + + +Computes the union of the non-null input values. + + No + + max @@ -20012,8 +20027,23 @@ SELECT NULLIF(value, '(none)') ... No + + + + range_intersect_agg + +range_intersect_agg ( value + anymultirange ) +anymultirange + + +Computes the intersection of the non-null input values. + + No + + range_intersect_agg diff --git a/src/backend/utils/adt/multirangetypes.c b/src/backend/utils/adt/multirangetypes.c index c474b24431..0efef8cf35 100644 --- a/src/backend/utils/adt/multirangetypes.c +++ b/src/backend/utils/adt/multirangetypes.c @@ -1346,9 +1346,9 @@ range_agg_transfn(PG_FUNCTION_ARGS) rngtypoid = get_fn_expr_argtype(fcinfo->flinfo, 1); if (!type_is_range(rngtypoid)) ereport(ERROR, (errcode(ERRCODE_DATATYPE_MISMATCH), - errmsg("range_agg must be called with a range"))); + errmsg("range_agg must be called with a range or multirange"))); if (PG_ARGISNULL(0)) state = initArrayResult(rngtypoid, aggContext, false); else @@ -1397,8 +1397,68 @@ range_agg_finalfn(PG_FUNCTION_ARGS) PG_RETURN_MULTIRANGE_P(make_multirange(mltrngtypoid, typcache->rngtype, range_count, ranges)); } +/* + * multirange_agg_transfn: combine adjacent/overlapping multiranges. + * + * All we do here is gather the input multiranges' ranges into an array + * so that the finalfn can sort and combine them. + */ +Datum +multirange_agg_transfn(PG_FUNCTION_ARGS) +{ + MemoryContext aggContext; + Oid mltrngtypoid; + TypeCacheEntry *typcache; + TypeCacheEntry *rngtypcache; + ArrayBuildState *state; + MultirangeType *current; + int32 range_count; + RangeType **ranges; + int32 i; + + if (!AggCheckCallContext(fcinfo, &aggContext)) + elog(ERROR, "multirange_agg_transfn called in non-aggregate context"); + + mltrngtypoid = get_fn_expr_argtype(fcinfo->flinfo, 1); + if (!type_is_multirange(mltrngtypoid)) + ereport(ERROR, +(errcode(ERRCODE_DATATYPE_MISMATCH), + errmsg("range_agg must be called with a range or multirange"))); + + typcache = multirange_get_typcache(fcinfo, mltrngtypoid); + rngtypcache = typcache->rngtyp
Re: range_agg with multirange inputs
The following review has been posted through the commitfest application: make installcheck-world: tested, passed Implements feature: tested, passed Spec compliant: not tested Documentation:not tested This applies (with some fuzz) and passes installcheck-world, but a rebase is needed, because 3 lines of context aren't enough to get the doc changes in the right place in the aggregate function table. (I think generating the patch with 4 lines of context would be enough to keep that from being a recurring issue.) One thing that seems a bit funny is this message in the new multirange_agg_transfn: + if (!type_is_multirange(mltrngtypoid)) + ereport(ERROR, + (errcode(ERRCODE_DATATYPE_MISMATCH), +errmsg("range_agg must be called with a multirange"))); It's clearly copied from the corresponding test and message in range_agg_transfn. They both say "range_agg must be called ...", which makes perfect sense, as from the user's perspective both messages come from (different overloads of) a function named range_agg. Still, it could be odd to have (again from the user's perspective) a function named range_agg that sometimes says "range_agg must be called with a range" and other times says "range_agg must be called with a multirange". I'm not sure how to tweak the wording (of either message or both) to make that less weird, but there's probably a way. I kind of wonder whether either message is really reachable, at least through the aggregate machinery in the expected way. Won't that machinery ensure that it is calling the right transfn with the right type of argument? If that's the case, maybe the messages could only be seen by someone calling the transfn directly ... which also seems ruled out for these transfns because the state type is internal. Is this whole test more of the nature of an assertion? Regards, -Chap The new status of this patch is: Waiting on Author
Re: range_agg
On Sun, Dec 27, 2020 at 9:07 PM David Fetter wrote: > On Sun, Dec 27, 2020 at 09:53:13AM -0800, Zhihong Yu wrote: > > This is not an ideal way to index multirages, but something we can > > easily have. > > What sort of indexing improvements do you have in mind? Approximation of multirange as a range can cause false positives. It's good if gaps are small, but what if they aren't. Ideally, we should split multirange to the ranges and index them separately. So, we would need a GIN-like index. The problem is that the GIN entry tree is a B-tree, which is not very useful for searching for ranges. If we could replace the GIN entry tree with GiST or SP-GiST, that should be good. We could index multirage parts separately and big gaps wouldn't be a problem. Similar work was already prototyped (it was prototyped under the name "vodka", but I'm not a big fan of this name). FWIW, such a new access method would need a lot of work to bring it to commit. I don't think it would be reasonable, before multiranges get popular. Regarding the GiST opclass, it seems the best we can do in GiST. -- Regards, Alexander Korotkov
Re: range_agg
On Sun, Dec 27, 2020 at 8:52 PM Zhihong Yu wrote: > This is not an ideal way to index multirages, but something we can easily > have. > > typo: multiranges Thanks for catching. I will revise the commit message before committing. -- Regards, Alexander Korotkov
Re: range_agg
On Sun, Dec 27, 2020 at 09:53:13AM -0800, Zhihong Yu wrote: > Hi, > > This is not an ideal way to index multirages, but something we can > easily have. What sort of indexing improvements do you have in mind? Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: range_agg
Hi, This is not an ideal way to index multirages, but something we can easily have. typo: multiranges Cheers On Sun, Dec 27, 2020 at 1:50 AM Alexander Korotkov wrote: > On Thu, Dec 17, 2020 at 10:10 PM Alexander Korotkov > wrote: > > > > I think this patch is very close to committable. I'm going to spend > > some more time further polishing it and commit (if I don't find a > > major issue or face objections). > > The main patch is committed. I've prepared a set of improvements. > 0001 Fixes bug in bsearch comparison functions > 0002 Implements missing @> (range,multirange) operator and its commutator > 0003 Does refactors signatures of *_internal() multirange functions > 0004 Adds cross-type (range, multirange) operators handling to > existing range GiST opclass > 0005 Adds support for GiST multirange indexing by approximation of > multirange as the union range with no gaps > > The patchset is quite trivial. I'm going to push it if there are no > objections. > > The SP-GiST handling is more tricky and requires substantial work. > > -- > Regards, > Alexander Korotkov >
Re: range_agg
On Thu, Dec 17, 2020 at 10:10 PM Alexander Korotkov wrote: > > I think this patch is very close to committable. I'm going to spend > some more time further polishing it and commit (if I don't find a > major issue or face objections). The main patch is committed. I've prepared a set of improvements. 0001 Fixes bug in bsearch comparison functions 0002 Implements missing @> (range,multirange) operator and its commutator 0003 Does refactors signatures of *_internal() multirange functions 0004 Adds cross-type (range, multirange) operators handling to existing range GiST opclass 0005 Adds support for GiST multirange indexing by approximation of multirange as the union range with no gaps The patchset is quite trivial. I'm going to push it if there are no objections. The SP-GiST handling is more tricky and requires substantial work. -- Regards, Alexander Korotkov 0001-Fix-bugs-in-comparison-functions-for-multirange_bsea.patch Description: Binary data 0002-Implement-operators-for-checking-if-the-range-contai.patch Description: Binary data 0003-Improve-the-signature-of-internal-multirange-functio.patch Description: Binary data 0005-Add-GiST-indexes-for-multiranges.patch Description: Binary data 0004-Add-support-of-multirange-matching-to-the-existing-r.patch Description: Binary data
Re: range_agg
On Thu, Dec 17, 2020 at 2:37 AM Zhihong Yu wrote: > Letting user manually name the multirange (after a few automatic attempts) > seems reasonable. Accepted. Thank you for your feedback. -- Regards, Alexander Korotkov
Re: range_agg
Letting user manually name the multirange (after a few automatic attempts) seems reasonable. Cheers On Wed, Dec 16, 2020 at 3:34 PM Alexander Korotkov wrote: > On Thu, Dec 17, 2020 at 1:03 AM Alexander Korotkov > wrote: > > On Thu, Dec 17, 2020 at 12:54 AM Zhihong Yu wrote: > > > +* The idea is to prepend underscores as needed until we make a > name that > > > +* doesn't collide with anything ... > > > > > > I wonder if other characters (e.g. [a-z0-9]) can be used so that name > without collision can be found without calling truncate_identifier(). > > > > Probably. But multiranges just shares naming logic already existing > > in arrays. If we're going to change this, I think we should change > > this for arrays too. And this change shouldn't be part of multirange > > patch. > > I gave this another thought. Now we have facility to name multirange > types manually. I think we should give up with underscore naming > completely. If both replacing "range" with "mutlirange" in the > typename and appending "_multirange" to the type name failed (very > unlikely), then let user manually name the multirange. Any thoughts? > > -- > Regards, > Alexander Korotkov >
Re: range_agg
On Thu, Dec 17, 2020 at 1:03 AM Alexander Korotkov wrote: > On Thu, Dec 17, 2020 at 12:54 AM Zhihong Yu wrote: > > +* The idea is to prepend underscores as needed until we make a name > > that > > +* doesn't collide with anything ... > > > > I wonder if other characters (e.g. [a-z0-9]) can be used so that name > > without collision can be found without calling truncate_identifier(). > > Probably. But multiranges just shares naming logic already existing > in arrays. If we're going to change this, I think we should change > this for arrays too. And this change shouldn't be part of multirange > patch. I gave this another thought. Now we have facility to name multirange types manually. I think we should give up with underscore naming completely. If both replacing "range" with "mutlirange" in the typename and appending "_multirange" to the type name failed (very unlikely), then let user manually name the multirange. Any thoughts? -- Regards, Alexander Korotkov
Re: range_agg
On Thu, Dec 17, 2020 at 12:54 AM Zhihong Yu wrote: > +* The idea is to prepend underscores as needed until we make a name that > +* doesn't collide with anything ... > > I wonder if other characters (e.g. [a-z0-9]) can be used so that name without > collision can be found without calling truncate_identifier(). Probably. But multiranges just shares naming logic already existing in arrays. If we're going to change this, I think we should change this for arrays too. And this change shouldn't be part of multirange patch. > + else if (strcmp(defel->defname, "multirange_type_name") == 0) > + { > + if (multirangeTypeName != NULL) > + ereport(ERROR, > + (errcode(ERRCODE_SYNTAX_ERROR), > +errmsg("conflicting or redundant options"))); > > Maybe make the error message a bit different from occurrences of similar > error message (such as including multirangeTypeName) ? This is again isn't an invention of multirange. We use this message many times in DefineRange() and other places. From the first glance, I've nothing against changing this to a more informative message, but that should be done globally. And this change isn't directly related to multirage. Feel free to propose a patch improving this. -- Regards, Alexander Korotkov
Re: range_agg
On 2020-Dec-08, Alexander Korotkov wrote: > I also found a problem in multirange types naming logic. Consider the > following example. > > create type a_multirange AS (x float, y float); > create type a as range(subtype=text, collation="C"); > create table tbl (x __a_multirange); > drop type a_multirange; > > If you dump this database, the dump couldn't be restored. The > multirange type is named __a_multirange, because the type named > a_multirange already exists. However, it might appear that > a_multirange type is already deleted. When the dump is restored, a > multirange type is named a_multirange, and the corresponding table > fails to be created. The same thing doesn't happen with arrays, > because arrays are not referenced in dumps by their internal names. > > I think we probably should add an option to specify multirange type > names while creating a range type. Then dump can contain exact type > names used in the database, and restore wouldn't have a names > collision. Hmm, good point. I agree that a dump must preserve the name, since once created it is user-visible. I had not noticed this problem, but it's obvious in retrospect. > In general, I wonder if we can make the binary format of multiranges > more efficient. It seems that every function involving multiranges > from multirange_deserialize(). I think we can make functions like > multirange_contains_elem() much more efficient. Multirange is > basically an array of ranges. So we can pack it as follows. > 1. Typeid and rangecount > 2. Tightly packed array of flags (1-byte for each range) > 3. Array of indexes of boundaries (4-byte for each range). Or even > better we can combine offsets and lengths to be compression-friendly > like jsonb JEntry's do. > 4. Boundary values > Using this format, we can implement multirange_contains_elem(), > multirange_contains_range() without deserialization and using binary > search. That would be much more efficient. What do you think? I also agree. I spent some time staring at the I/O code a couple of months back but was unable to focus on it for long enough. I don't know JEntry's format, but I do remember that the storage format for JSONB was widely discussed back then; it seems wise to apply similar logic or at least similar reasoning.
Re: range_agg
On Mon, Nov 30, 2020 at 11:39 PM Alexander Korotkov wrote: > On Mon, Nov 30, 2020 at 10:35 PM Alexander Korotkov > wrote: > > On Sun, Nov 29, 2020 at 11:53 PM Paul A Jungwirth > > wrote: > > > > > > On Sun, Nov 29, 2020 at 11:43 AM Alexander Korotkov > > > wrote: > > > > Thank you. Could you please, update doc/src/sgml/catalogs.sgml, > > > > because pg_type and pg_range catalogs are updated. > > > > > > Attached! :-) > > > > You're quick, thank you. Please, also take a look at cfbot failure > > https://travis-ci.org/github/postgresql-cfbot/postgresql/builds/746623942 > > I've tried to reproduce it, but didn't manage yet. > > Got it. type_sanity test fails on any platform, you just need to > repeat "make check" till it fails. > > The failed query checked consistency of range types, but it didn't > take into account ranges of domains and ranges of records, which are > exercised by multirangetypes test running in parallel. We could teach > this query about such kinds of ranges, but I think that would be > overkill, because we're not going to introduce such builtin ranges > yet. So, I'm going to just move multirangetypes test into another > group of parallel tests. I also found a problem in multirange types naming logic. Consider the following example. create type a_multirange AS (x float, y float); create type a as range(subtype=text, collation="C"); create table tbl (x __a_multirange); drop type a_multirange; If you dump this database, the dump couldn't be restored. The multirange type is named __a_multirange, because the type named a_multirange already exists. However, it might appear that a_multirange type is already deleted. When the dump is restored, a multirange type is named a_multirange, and the corresponding table fails to be created. The same thing doesn't happen with arrays, because arrays are not referenced in dumps by their internal names. I think we probably should add an option to specify multirange type names while creating a range type. Then dump can contain exact type names used in the database, and restore wouldn't have a names collision. Another thing that worries me is the multirange serialization format. typedef struct { int32 vl_len_;/* varlena header */ charflags; /* range flags */ char_padding; /* Bounds must be aligned */ /* Following the header are zero to two bound values. */ } ShortRangeType; Comment says this structure doesn't contain a varlena header, while structure obviously has it. In general, I wonder if we can make the binary format of multiranges more efficient. It seems that every function involving multiranges from multirange_deserialize(). I think we can make functions like multirange_contains_elem() much more efficient. Multirange is basically an array of ranges. So we can pack it as follows. 1. Typeid and rangecount 2. Tightly packed array of flags (1-byte for each range) 3. Array of indexes of boundaries (4-byte for each range). Or even better we can combine offsets and lengths to be compression-friendly like jsonb JEntry's do. 4. Boundary values Using this format, we can implement multirange_contains_elem(), multirange_contains_range() without deserialization and using binary search. That would be much more efficient. What do you think? -- Regards, Alexander Korotkov
Re: range_agg
On Mon, Nov 30, 2020 at 10:35 PM Alexander Korotkov wrote: > On Sun, Nov 29, 2020 at 11:53 PM Paul A Jungwirth > wrote: > > > > On Sun, Nov 29, 2020 at 11:43 AM Alexander Korotkov > > wrote: > > > Thank you. Could you please, update doc/src/sgml/catalogs.sgml, > > > because pg_type and pg_range catalogs are updated. > > > > Attached! :-) > > You're quick, thank you. Please, also take a look at cfbot failure > https://travis-ci.org/github/postgresql-cfbot/postgresql/builds/746623942 > I've tried to reproduce it, but didn't manage yet. Got it. type_sanity test fails on any platform, you just need to repeat "make check" till it fails. The failed query checked consistency of range types, but it didn't take into account ranges of domains and ranges of records, which are exercised by multirangetypes test running in parallel. We could teach this query about such kinds of ranges, but I think that would be overkill, because we're not going to introduce such builtin ranges yet. So, I'm going to just move multirangetypes test into another group of parallel tests. -- Regards, Alexander Korotkov
Re: range_agg
On Sun, Nov 29, 2020 at 11:53 PM Paul A Jungwirth wrote: > > On Sun, Nov 29, 2020 at 11:43 AM Alexander Korotkov > wrote: > > Thank you. Could you please, update doc/src/sgml/catalogs.sgml, > > because pg_type and pg_range catalogs are updated. > > Attached! :-) You're quick, thank you. Please, also take a look at cfbot failure https://travis-ci.org/github/postgresql-cfbot/postgresql/builds/746623942 I've tried to reproduce it, but didn't manage yet. -- Regards, Alexander Korotkov
Re: range_agg
On Sun, Nov 29, 2020 at 8:11 PM Paul A Jungwirth wrote: > On Fri, Nov 27, 2020 at 12:35 AM Alexander Korotkov > wrote: > > I'd like to review this patch. Could you please rebase it once again? > > Thanks. > > Thanks! Here is a rebased version. It also includes one more cleanup > commit from Alvaro since the last one. Thank you. Could you please, update doc/src/sgml/catalogs.sgml, because pg_type and pg_range catalogs are updated. -- Regards, Alexander Korotkov
Re: range_agg
Hi! On Thu, Sep 24, 2020 at 3:05 AM Paul A Jungwirth wrote: > On Sun, Aug 16, 2020 at 12:55 PM Paul A Jungwirth > wrote: > > This is rebased on the current master, including some changes to doc > > tables and pg_upgrade handling of type oids. > > Here is a rebased version of this patch, including a bunch of cleanup > from Alvaro. (Thanks Alvaro!) I'd like to review this patch. Could you please rebase it once again? Thanks. -- Regards, Alexander Korotkov
Re: range_agg
čt 24. 9. 2020 v 2:05 odesílatel Paul A Jungwirth < p...@illuminatedcomputing.com> napsal: > On Sun, Aug 16, 2020 at 12:55 PM Paul A Jungwirth > wrote: > > This is rebased on the current master, including some changes to doc > > tables and pg_upgrade handling of type oids. > > Here is a rebased version of this patch, including a bunch of cleanup > from Alvaro. (Thanks Alvaro!) > I tested this patch and It looks well, I have not any objections 1. there are not new warnings 2. make check-world passed 3. build doc without problems 4. doc is enough, regress tests too 5. there was not objection against this feature in discussion, and I think it is interesting and useful feature - good additional to arrays Regards Pavel > Paul >
Re: range_agg
On Sun, Jul 5, 2020 at 12:11 PM Paul A Jungwirth wrote: > > Just knowing that arrays are > something we do this for is enough to hunt for clues, but if anyone > can point me more directly to code that will help me do it for > multiranges, I'd be appreciative. It looks like expandeddatum.h is where I should be looking. . . . Paul
Re: range_agg
On Sat, Apr 11, 2020 at 09:36:37AM -0700, Paul A Jungwirth wrote: > On Fri, Apr 10, 2020 at 8:44 PM Paul A Jungwirth > wrote: > > Thanks, and thanks for your v17 also. Here is a patch building on that > > and adding support for anycompatiblemultirange. > > Here is a v19 that just moved the multirange tests to a new parallel > group to avoid a max-20-tests error. Sorry about that! This needs to be rebased ; set cfbot to "waiting". -- Justin
Re: range_agg
On 2020-Apr-11, Paul A Jungwirth wrote: > On Sat, Apr 11, 2020 at 9:36 AM Paul A Jungwirth > wrote: > > Btw I'm working on typanalyze + selectivity, and it seems like the > > test suite doesn't run those things? > > Nevermind, I just had to add `analyze numrange_test` to > src/test/regress/sql/rangetypes.sql. :-) Do you want a separate patch > for that? Or maybe it should go in sql/vacuum.sql? Dunno, it seems fine in rangetypes.sql. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: range_agg
On Sat, Apr 11, 2020 at 9:36 AM Paul A Jungwirth wrote: > Btw I'm working on typanalyze + selectivity, and it seems like the > test suite doesn't run those things? Nevermind, I just had to add `analyze numrange_test` to src/test/regress/sql/rangetypes.sql. :-) Do you want a separate patch for that? Or maybe it should go in sql/vacuum.sql? Regards, Paul
Re: range_agg
v17 is a rebase fixing a minor parse_coerce.c edit; v16 lasted little :-( I chose to change the wording of the conflicting comment in enforce_generic_type_consistency(): * 3) Similarly, if return type is ANYRANGE or ANYMULTIRANGE, and any *argument is ANYRANGE or ANYMULTIRANGE, use that argument's *actual type, range type or multirange type as the function's return *type. This wording is less precise, in that it doesn't say exactly which of the three types is the actual result for each of the possible four cases (r->r, r->m, m->m, m->r) but I think it should be straightforward. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: range_agg
Thanks Alvaro! On Mon, Mar 23, 2020 at 4:33 PM Alvaro Herrera wrote: > > Thinking about the on-disk representation, can we do better than putting > the contained ranges in long-varlena format, including padding; also we > include the type OID with each element. Sounds wasteful. A more > compact representation might be to allow short varlenas and doing away > with the alignment padding, put the the type OID just once. This is > important because we cannot change it later. Can you give me some guidance on this? I don't know how to make the on-disk format different from the in-memory format. (And for the in-memory format, I think it's important to have actual RangeTypes inside the multirange.) Is there something in the documentation, or a README in the repo, or even another type I can follow? > I'm also wondering if multirange_in() is the right strategy. Would it> be sensible to give each range to range_parse or range_parse_bounde, so > that it determines where each range starts and ends? Then that function > doesn't have to worry about each quote and escape, duplicating range > parsing code. (This will probably require changing signature of the > rangetypes.c function, and exporting it; for example have > range_parse_bound allow bound_str to be NULL and in that case don't mess > with the StringInfo and just return the end position of the parsed > bound.) Yeah, I really wanted to do it that way originally too. As you say it would require passing back more information from the range-parsing code. I can take a stab at making the necessary changes. I'm a bit more confident now than I was then in changing the range code we have already. Regards, -- Paul ~{:-) p...@illuminatedcomputing.com
Re: range_agg
On 2020-Mar-19, Paul A Jungwirth wrote: > On Thu, Mar 19, 2020 at 1:43 PM Paul A Jungwirth > wrote: > > On Thu, Mar 19, 2020 at 1:42 PM Alvaro Herrera > > wrote: > > > There's been another flurry of commits in the polymorphic types area. > > > Can you please rebase again? > > > > I noticed that too. :-) I'm about halfway through a rebase right now. > > I can probably finish it up tonight. > > Here is that patch. I should probably add an anycompatiblemultirange > type now too? I'll get started on that tomorrow. Thanks for the new version. Here's a few minor adjustments while I continue to read through it. Thinking about the on-disk representation, can we do better than putting the contained ranges in long-varlena format, including padding; also we include the type OID with each element. Sounds wasteful. A more compact representation might be to allow short varlenas and doing away with the alignment padding, put the the type OID just once. This is important because we cannot change it later. I'm also wondering if multirange_in() is the right strategy. Would it be sensible to give each range to range_parse or range_parse_bounde, so that it determines where each range starts and ends? Then that function doesn't have to worry about each quote and escape, duplicating range parsing code. (This will probably require changing signature of the rangetypes.c function, and exporting it; for example have range_parse_bound allow bound_str to be NULL and in that case don't mess with the StringInfo and just return the end position of the parsed bound.) -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services >From bd8b814426bef6f5a620351557cdc953ba0ae22e Mon Sep 17 00:00:00 2001 From: Alvaro Herrera Date: Mon, 23 Mar 2020 18:50:02 -0300 Subject: [PATCH 1/5] Fix typo --- src/backend/utils/adt/multirangetypes.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/backend/utils/adt/multirangetypes.c b/src/backend/utils/adt/multirangetypes.c index bd3cf3dc93..6b3cde8eca 100644 --- a/src/backend/utils/adt/multirangetypes.c +++ b/src/backend/utils/adt/multirangetypes.c @@ -79,7 +79,7 @@ static int32 multirange_canonicalize(TypeCacheEntry *rangetyp, int32 input_range * to range_in, but we have to detect quoting and backslash-escaping * which can happen for range bounds. * Backslashes can escape something inside or outside a quoted string, - * and a quoted string can escape quote marks either either backslashes + * and a quoted string can escape quote marks with either backslashes * or double double-quotes. */ Datum -- 2.20.1 >From afd017faab3d6aff3ca8618d95a6f9c918ab8cca Mon Sep 17 00:00:00 2001 From: Alvaro Herrera Date: Mon, 23 Mar 2020 18:50:19 -0300 Subject: [PATCH 2/5] Remove trailing useless ; --- src/backend/utils/adt/multirangetypes.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/backend/utils/adt/multirangetypes.c b/src/backend/utils/adt/multirangetypes.c index 6b3cde8eca..6e9cf77651 100644 --- a/src/backend/utils/adt/multirangetypes.c +++ b/src/backend/utils/adt/multirangetypes.c @@ -131,7 +131,7 @@ multirange_in(PG_FUNCTION_ARGS) input_str), errdetail("Unexpected end of input."))); - /* skip whitespace */ ; + /* skip whitespace */ if (isspace((unsigned char) ch)) continue; -- 2.20.1 >From d7c1ea171de018d025ab5de57f572a9353b02fdf Mon Sep 17 00:00:00 2001 From: Alvaro Herrera Date: Mon, 23 Mar 2020 18:50:40 -0300 Subject: [PATCH 3/5] reduce palloc+strlcpy to pnstrdup --- src/backend/utils/adt/multirangetypes.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/src/backend/utils/adt/multirangetypes.c b/src/backend/utils/adt/multirangetypes.c index 6e9cf77651..5b57416cfd 100644 --- a/src/backend/utils/adt/multirangetypes.c +++ b/src/backend/utils/adt/multirangetypes.c @@ -167,9 +167,8 @@ multirange_in(PG_FUNCTION_ARGS) parse_state = MULTIRANGE_IN_RANGE_ESCAPED; else if (ch == ']' || ch == ')') { - range_str_len = ptr - range_str + 2; - range_str_copy = palloc0(range_str_len); - strlcpy(range_str_copy, range_str, range_str_len); + range_str_len = ptr - range_str + 1; + range_str_copy = pnstrdup(range_str, range_str_len); if (range_capacity == range_count) { range_capacity *= 2; -- 2.20.1 >From 7698f57d1f5ab870c89a84da7bea24bef241a458 Mon Sep 17 00:00:00 2001 From: Alvaro Herrera Date: Mon, 23 Mar 2020 18:50:59 -0300 Subject: [PATCH 4/5] silence compiler warning --- src/backend/utils/fmgr/funcapi.c | 13 - 1 file changed, 8 insertions(+), 5 deletions(-) diff --git a/src/backend/utils/fmgr/funcapi.c b/src/backend/utils/fmgr/funcapi.c index ab24359981..3bfefcf48a 100644 --- a/src/backend/utils/fmgr/funcapi.c +++ b/src/backend/utils/fmgr/funcapi.c @@ -508,10 +508,13 @@ resolve_anyelement_from_others(polymorphic_actuals *actuals) else if (OidIsVal
Re: range_agg
On 2020-Mar-14, Paul A Jungwirth wrote: > On Fri, Mar 13, 2020 at 2:39 PM Alvaro Herrera > wrote: > > Here's the rebased version. > > > > I just realized I didn't include the API change I proposed in > > https://postgr.es/m/20200306200343.GA625@alvherre.pgsql ... > > Thanks for your help with this Alvaro! > > I was just adding your changes to my own branch and I noticed your > v12-0001 has different parameter names here: > > static MultirangeIOData * > -get_multirange_io_data(FunctionCallInfo fcinfo, Oid mltrngtypid, > IOFuncSelector func) > +get_multirange_io_data(FunctionCallInfo fcinfo, Oid rngtypid, > IOFuncSelector func) > I'm pretty sure mltrngtypid is the correct name here. Right? Let me > know if I'm missing something. :-) Heh. The intention here was to abbreviate to "typid", but if you want to keep the longer name, it's OK too. I don't think that name is particularly critical, since it should be obvious that it must be a multirange type. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: range_agg
On Thu, Mar 19, 2020 at 1:42 PM Alvaro Herrera wrote: > > On 2020-Mar-16, Paul A Jungwirth wrote: > > > On Sat, Mar 14, 2020 at 11:13 AM Paul A Jungwirth > > wrote: > > > I think that should fix the cfbot failure. > > > > I saw this patch was failing to apply again. There was some > > refactoring to how polymorphic types are determined. I added my > > changes for anymultirange to that new approach, and things should be > > passing again. > > There's been another flurry of commits in the polymorphic types area. > Can you please rebase again? I noticed that too. :-) I'm about halfway through a rebase right now. I can probably finish it up tonight. Paul
Re: range_agg
On 2020-Mar-16, Paul A Jungwirth wrote: > On Sat, Mar 14, 2020 at 11:13 AM Paul A Jungwirth > wrote: > > I think that should fix the cfbot failure. > > I saw this patch was failing to apply again. There was some > refactoring to how polymorphic types are determined. I added my > changes for anymultirange to that new approach, and things should be > passing again. There's been another flurry of commits in the polymorphic types area. Can you please rebase again? -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: range_agg
On Fri, Mar 13, 2020 at 2:39 PM Alvaro Herrera wrote: > Here's the rebased version. > > I just realized I didn't include the API change I proposed in > https://postgr.es/m/20200306200343.GA625@alvherre.pgsql ... Thanks for your help with this Alvaro! I was just adding your changes to my own branch and I noticed your v12-0001 has different parameter names here: diff --git a/src/backend/utils/adt/multirangetypes.c b/src/backend/utils/adt/multirangetypes.c index f9dd0378cc..0c9afd5448 100644 --- a/src/backend/utils/adt/multirangetypes.c +++ b/src/backend/utils/adt/multirangetypes.c @@ -376,11 +375,11 @@ multirange_typanalyze(PG_FUNCTION_ARGS) * pointer to a type cache entry. */ static MultirangeIOData * -get_multirange_io_data(FunctionCallInfo fcinfo, Oid mltrngtypid, IOFuncSelector func) +get_multirange_io_data(FunctionCallInfo fcinfo, Oid rngtypid, IOFuncSelector func) { MultirangeIOData *cache = (MultirangeIOData *) fcinfo->flinfo->fn_extra; -if (cache == NULL || cache->typcache->type_id != mltrngtypid) +if (cache == NULL || cache->typcache->type_id != rngtypid) { int16 typlen; booltypbyval; @@ -389,9 +388,9 @@ get_multirange_io_data(FunctionCallInfo fcinfo, Oid mltrngtypid, IOFuncSelector cache = (MultirangeIOData *) MemoryContextAlloc(fcinfo->flinfo->fn_mcxt, sizeof(MultirangeIOData)); -cache->typcache = lookup_type_cache(mltrngtypid, TYPECACHE_MULTIRANGE_INFO); +cache->typcache = lookup_type_cache(rngtypid, TYPECACHE_MULTIRANGE_INFO); if (cache->typcache->rngtype == NULL) -elog(ERROR, "type %u is not a multirange type", mltrngtypid); +elog(ERROR, "type %u is not a multirange type", rngtypid); /* get_type_io_data does more than we need, but is convenient */ get_type_io_data(cache->typcache->rngtype->type_id, I'm pretty sure mltrngtypid is the correct name here. Right? Let me know if I'm missing something. :-) Yours, Paul
Re: range_agg
Paul A Jungwirth writes: > On Wed, Mar 11, 2020 at 4:39 PM Paul A Jungwirth > wrote: >> Oh, my last email left out the most important part. :-) Is this >> failure online somewhere so I can take a look at it and fix it? Look for your patch(es) at http://commitfest.cputube.org Right now it's not even applying, presumably because Alvaro already pushed some pieces, so you need to rebase. But when it was applying, one or both of the test builds was failing. regards, tom lane
Re: range_agg
On Wed, Mar 11, 2020 at 4:39 PM Paul A Jungwirth wrote: > > On Sat, Mar 7, 2020 at 12:20 PM Tom Lane wrote: > > Alvaro Herrera writes: > > > [ v11 patches ] > > The cfbot isn't too happy with this; it's getting differently-ordered > > results than you apparently did for the list of owned objects in > > dependency.out's DROP OWNED BY test. Not sure why that should be --- > > it seems like af6550d34 should have ensured that there's only one > > possible ordering. > > Oh, my last email left out the most important part. :-) Is this > failure online somewhere so I can take a look at it and fix it? Looks like I sent this just to Tom before. This is something I need to fix, right? Regards, Paul
Re: range_agg
On Thu, Mar 12, 2020 at 5:38 AM Alvaro Herrera wrote: > ... thinking about gist+spgist, I think they could be written > identically to those for ranges, using the lowest (first) lower bound > and the higher (last) upper bound. > > ... thinking about selectivity, I think the way to write that is to > first compute the selectivity for the range across the first lower bound > and the last upper bound, and then subtract that for the "negative" > space between the contained ranges. > > I have no immediate thoughts about typanalyze. I suppose it should be > somehow based on the implementation for ranges ... maybe a first-cut is > to construct fake ranges covering the whole multirange (as above) and > just use the ranges implementation (compute_range_stats). Thanks, this is pretty much what I was thinking too, but I'm really glad to have someone who knows better confirm it. I can get started on these right away, and I'll let folks know if I need any help. When I looked at this last fall there was a lot I didn't understand. More or less using the existing ranges implementation should be a big help though. Paul
Re: range_agg
Hello Paul, thanks for the thorough response to all these points. Regarding the merge of multiranges with ranges, I had also thought of that at some point and was leaning towards doing that, but after the latest responses I think the arguments against it are sensible; and now there's a clear majority for keeping them separate. I'll be posting an updated version of the patch later today. I was a bit scared bit this part: On 2020-Mar-11, Paul A Jungwirth wrote: > Finally, I think I mentioned this a long time ago, but I'm still not > sure if this patch needs work around these things: > > - gist opclass > - spgist opclass > - typanalyze > - selectivity > > I'd love for a real Postgres expert to tell me "No, we can add that > later" or "Yes, you have to add that now." While I think that the gist and spgist opclass are in the "very nice to have but still optional" category, the other two items seem mandatory (but I'm not 100% certain about that, TBH). I'm not sure we have time to get those ready during this commitfest. ... thinking about gist+spgist, I think they could be written identically to those for ranges, using the lowest (first) lower bound and the higher (last) upper bound. ... thinking about selectivity, I think the way to write that is to first compute the selectivity for the range across the first lower bound and the last upper bound, and then subtract that for the "negative" space between the contained ranges. I have no immediate thoughts about typanalyze. I suppose it should be somehow based on the implementation for ranges ... maybe a first-cut is to construct fake ranges covering the whole multirange (as above) and just use the ranges implementation (compute_range_stats). -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: range_agg
Thanks everyone for offering some thoughts on this! Tom Lane wrote: > have you given any thought to just deciding that ranges and > multiranges are the same type? I can see how it might be nice to have just one type to think about. Still I think keeping them separate makes sense. Other folks have brought up several reasons already. Just to chime in: Tom Lane wrote: > Isaac Morland writes: > > Definitely agreed that range and multirange (or whatever it's called) > > should be different. In the work I do I have a number of uses for ranges, > > but not (yet) for multiranges. I want to be able to declare a column as > > range and be sure that it is just a single range, and then call lower() and > > upper() on it and be sure to get just one value in each case; and if I > > accidentally try to take the union of ranges where the union isn’t another > > range, I want to get an error rather than calculate some weird (in my > > context) multirange. > > I do not find that argument convincing at all. Surely you could put > that constraint on your column using "CHECK (numranges(VALUE) <= 1)" > or some such notation. A check constraint works for columns, but there are other contexts where you'd like to restrict things to just a contiguous range, e.g. user-defined functions and intermediate results in queries. Basic ranges seem a lot simpler to think about, so I can appreciate how letting any range be a multirange adds a heavy cognitive burden. I think a lot of people will share Isaac's opinion here. Tom Lane wrote: > Also, this would allow us to remove at least one ugly misfeature: > > regression=# select '[1,2]'::int4range + '[3,10)'::int4range; > ?column? > -- > [1,10) > (1 row) > > regression=# select '[1,2]'::int4range + '[4,10)'::int4range; > ERROR: result of range union would not be contiguous Because of backwards compatibility we can't really change +/-/* not to raise (right?), so if we joined ranges and multiranges we'd need to add operators with a different name. I was calling those @+/@-/@* before, but that was considered too unintuitive and undiscoverable. Having two types lets us use the nicer operator names. Tom Lane wrote: > it seems like we could consider the traditional > range functions like lower() and upper() to report on the first or last > range bound in a multirange I tried to keep functions/operators similar, so already lower(mr) = lower(r) and upper(mr) = upper(r). I think *conceptually* it's good to make ranges & multiranges as interchangable as possible, but that doesn't mean they have to be the same type. Adding multiranges-as-ranges also raises questions about their string format. If a multirange is {[1,2), [4,5)} would you only print the curly braces when there is more than one element? I don't *think* allowing non-contiguous ranges would break how we use them in GiST indexes or exclusion constraints, but maybe someone can think of some problem I can't. It's one place to be wary anyway. At the very least it would make those things slower I expect. On a few other issues people have raised recently: Alvaro Herrera writes: > I wonder what's the point of multirange arrays. Is there a reason we > create those? We have arrays of everything else, so why not have them for multiranges? We don't have to identify specific use cases here, although I can see how you'd want to call array_agg/UNNEST on some multiranges, e.g. (Actually I really want to add an UNNEST that *takes* a multirange, but that could be a follow-on commit.) If nothing else I think omitting arrays of multiranges would be a strange irregularity in the type system. David G. Johnston wrote: > In the tests there is: > > +select '{[a,a],[b,b]}'::textmultirange; > + textmultirange > + > + {[a,a],[b,b]} > +(1 row) > + > +-- without canonicalization, we can't join these: > +select '{[a,a], [b,b]}'::textmultirange; > + textmultirange > + > + {[a,a],[b,b]} > +(1 row) > + > > Aside from the comment they are identical so I'm confused as to why both > tests exist - though I suspect it has to do with the fact that the expected > result would be {[a,b]} since text is discrete. Those tests are for basic string parsing (multirange_in), so one is testing {A,B} and the other {A, B} (with a space after the comma). (There are some tests right above those that also have blank spaces, but they only output a single element in the multirange result.) David G. Johnston wrote: > Also, the current patch set seems a bit undecided on whether it wants to be > truly a multi-range or a range that can report non-contiguous components. > Specifically, > > +select '{[a,d), [b,f]}'::textmultirange; > + textmultirange > + > + {[a,f]} > +(1 row) Without a canonicalization function, we can't know that [a,a] touches [b,b], but we *can* know that [a,d) touches [b,f). Or even: regression=# select '{[a,b), [b,b]}'::textmultirange; textmultirange {[a,b]} (1 r
Re: range_agg
On Mon, Mar 09, 2020 at 06:34:04PM -0700, Jeff Davis wrote: > On Sat, 2020-03-07 at 16:06 -0500, Tom Lane wrote: > > Actually ... have you given any thought to just deciding that ranges > > and > > multiranges are the same type? > > It has come up in a number of conversations, but I'm not sure if it was > discussed on this list. > > > I think on the whole the advantages win, > > and I feel like that might also be the case here. > > Some things to think about: > > 1. Ranges are common -- at least implicitly -- in a lot of > applications/systems. It's pretty easy to represent extrernal data as > ranges in postgres, and also to represent postgres ranges in external > systems. But I can see multiranges causing friction around a lot of > common tasks, like displaying in a UI. If you only expect ranges, you > can add a CHECK constraint, so this is annoying but not necessarily a > deal-breaker. It could become well and truly burdensome in a UI or an API. The difference between one, as ranges are now, and many, as multi-ranges would be if we shoehorn them into the range type, are pretty annoying to deal with. > 2. There are existing client libraries[1] that support range types and > transform them to types within the host language. Obviously, those > would need to be updated to expect multiple ranges. The type systems that would support such types might get unhappy with us if we started messing with some of the properties like contiguousness. > 3. It seems like we would want some kind of base "range" type. When you > try to break a multirange down into constituent ranges, what type would > those pieces be? (Aside: how do you get the constituent ranges?) > > I'm thinking more about casting to see if there's a possible compromise > there. I think the right compromise is to recognize that the closure of a set (ranges) over an operation (set union) may well be a different set (multi-ranges). Other operations have already been proposed, complete with concrete use cases that could really make PostgreSQL stand out. That we don't have an obvious choice of "most correct" operation over which to close ranges makes it even bigger a potential foot-gun when we choose one arbitrarily and declare it to be the canonical one. Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: range_agg
On Sat, 2020-03-07 at 16:06 -0500, Tom Lane wrote: > Actually ... have you given any thought to just deciding that ranges > and > multiranges are the same type? It has come up in a number of conversations, but I'm not sure if it was discussed on this list. > I think on the whole the advantages win, > and I feel like that might also be the case here. Some things to think about: 1. Ranges are common -- at least implicitly -- in a lot of applications/systems. It's pretty easy to represent extrernal data as ranges in postgres, and also to represent postgres ranges in external systems. But I can see multiranges causing friction around a lot of common tasks, like displaying in a UI. If you only expect ranges, you can add a CHECK constraint, so this is annoying but not necessarily a deal-breaker. 2. There are existing client libraries[1] that support range types and transform them to types within the host language. Obviously, those would need to be updated to expect multiple ranges. 3. It seems like we would want some kind of base "range" type. When you try to break a multirange down into constituent ranges, what type would those pieces be? (Aside: how do you get the constituent ranges?) I'm thinking more about casting to see if there's a possible compromise there. Regards, Jeff Davis [1] https://sfackler.github.io/rust-postgres-range/doc/v0.8.2/postgres_range/
Re: range_agg
Alvaro Herrera writes: > I wonder what's the point of multirange arrays. Is there a reason we > create those? That's what we thought about arrays of composites to start with, too. regards, tom lane
Re: range_agg
I wonder what's the point of multirange arrays. Is there a reason we create those? -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: range_agg
On Sat, Mar 7, 2020 at 4:06 PM Tom Lane wrote: > It's possible that this is a bad idea. It bears a lot of similarity, > I guess, to the way that Postgres doesn't consider arrays of different > dimensionality to be distinct types. That has some advantages but it > surely also has downsides. I think on the whole the advantages win, > and I feel like that might also be the case here. Personally, I'm pretty unhappy with the fact that the array system conflates arrays with different numbers of dimensions. Like, you end up having to write array_upper(X, 1) instead of just array_upper(X), and then you're still left wondering whether whatever you wrote is going to blow up if somebody sneaks a multidimensional array in there, or for that matter, an array with a non-standard lower bound. There's lots of little things like that, where the decision to decorate the array type with these extra frammishes makes it harder to use for everybody even though most people don't use (or even want) those features. So count me as +1 for keeping range and multirange separate. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: range_agg
On Fri, Dec 20, 2019 at 10:43 AM Alvaro Herrera wrote: > I took the liberty of rebasing this series on top of recent branch > master. > In the tests there is: +select '{[a,a],[b,b]}'::textmultirange; + textmultirange + + {[a,a],[b,b]} +(1 row) + +-- without canonicalization, we can't join these: +select '{[a,a], [b,b]}'::textmultirange; + textmultirange + + {[a,a],[b,b]} +(1 row) + Aside from the comment they are identical so I'm confused as to why both tests exist - though I suspect it has to do with the fact that the expected result would be {[a,b]} since text is discrete. Also, the current patch set seems a bit undecided on whether it wants to be truly a multi-range or a range that can report non-contiguous components. Specifically, +select '{[a,d), [b,f]}'::textmultirange; + textmultirange + + {[a,f]} +(1 row) There is a an argument that a multi-range should output {[a,d),[b,f]}. IMO its arguable that a multi-range container should not try and reduce the number of contained ranges at all. If that is indeed a desire, which seems like it is, that feature alone goes a long way to support wanting to just merge the desired functionality into the existing range type, where the final output has the minimum number of contiguous ranges possible, rather than having a separate multirange type. David J.
Re: range_agg
Isaac Morland writes: >> so 7. 3. 2020 v 22:20 odesílatel Tom Lane napsal: >>> Actually ... have you given any thought to just deciding that ranges and >>> multiranges are the same type? That is, any range can now potentially >>> contain multiple segments? > Definitely agreed that range and multirange (or whatever it's called) > should be different. In the work I do I have a number of uses for ranges, > but not (yet) for multiranges. I want to be able to declare a column as > range and be sure that it is just a single range, and then call lower() and > upper() on it and be sure to get just one value in each case; and if I > accidentally try to take the union of ranges where the union isn’t another > range, I want to get an error rather than calculate some weird (in my > context) multirange. I do not find that argument convincing at all. Surely you could put that constraint on your column using "CHECK (numranges(VALUE) <= 1)" or some such notation. Also, you're attacking a straw man with respect to lower() and upper(); I did not suggest changing them to return arrays, but rather interpreting them as returning the lowest or highest endpoint, which I think would be transparent in most cases. (There would obviously need to be some other functions that could dissect a multirange more completely.) The real problem with the proposal as it stands, I think, is exactly that range union has failure conditions and you have to use some other operator if you want to get a successful result always. That's an enormously ugly kluge, and if we'd done it right the first time nobody would have objected. Bottom line is that I don't think that we should add a pile of new moving parts to the type system just because people are afraid of change; arguably, that's *more* change (and more risk of bugs), not less. Unifying the types would, for example, get rid of the pesky question of what promoting a range to multirange should look like exactly, because it'd be a no-op. regards, tom lane
Re: range_agg
On Sat, 7 Mar 2020 at 16:27, Pavel Stehule wrote: > > so 7. 3. 2020 v 22:20 odesílatel Tom Lane napsal: > >> I wrote: >> > Actually ... have you given any thought to just deciding that ranges and >> > multiranges are the same type? That is, any range can now potentially >> > contain multiple segments? That would eliminate a whole lot of the >> > tedious infrastructure hacking involved in this patch, and let you focus >> > on the actually-useful functionality. >> >> > I think this behave is correct. Sometimes you should to get only one range > - and this check is a protection against not continuous range. > > if you expect multirange, then do > > select '[1,2]'::int4range::multirange + '[4,10)'::int4range; > Definitely agreed that range and multirange (or whatever it's called) should be different. In the work I do I have a number of uses for ranges, but not (yet) for multiranges. I want to be able to declare a column as range and be sure that it is just a single range, and then call lower() and upper() on it and be sure to get just one value in each case; and if I accidentally try to take the union of ranges where the union isn’t another range, I want to get an error rather than calculate some weird (in my context) multirange. On a related note, I was thinking about this and I don’t think I like range_agg as a name at all. I know we have array_agg and string_agg but surely shouldn’t this be called union_agg, and shouldn’t there also be an intersect_agg? I mean, taking the union isn’t the only possible aggregate on ranges or multiranges.
Re: range_agg
On Sat, Mar 07, 2020 at 06:45:44PM -0500, Tom Lane wrote: > David Fetter writes: > > There's another use case not yet covered here that could make this > > even more complex, we should probably plan for it: multi-ranges > > with weights. > > I'm inclined to reject that as completely out of scope. The core > argument for unifying multiranges with ranges, if you ask me, is to > make the data type closed under union. Weights are from some other > universe. I don't think they are. SQL databases are super useful because they do bags in addition to sets, so set union isn't the only, or maybe even the most important, operation over which ranges ought to be closed. Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: range_agg
David Fetter writes: > There's another use case not yet covered here that could make this > even more complex, we should probably plan for it: multi-ranges with > weights. I'm inclined to reject that as completely out of scope. The core argument for unifying multiranges with ranges, if you ask me, is to make the data type closed under union. Weights are from some other universe. regards, tom lane
Re: range_agg
On Sat, Mar 07, 2020 at 04:06:32PM -0500, Tom Lane wrote: > I wrote: > > However, what I'm on about right at the moment is that I don't think > > there should be any delta in that test at all. As far as I can see, > > the design idea here is that multiranges will be automatically created > > over range types, and the user doesn't need to do that. To my mind, > > that means that they're an implementation detail and should not show up as > > separately-owned objects, any more than an autogenerated array type does. > > Actually ... have you given any thought to just deciding that ranges and > multiranges are the same type? That is, any range can now potentially > contain multiple segments? That would eliminate a whole lot of the > tedious infrastructure hacking involved in this patch, and let you focus > on the actually-useful functionality. If we're changing range types rather than constructing a new multi-range layer atop them, I think it would be helpful to have some way to figure out quickly whether this new range type was contiguous. One way to do that would be to include a "range cardinality" in the data structure which be the number of left ends in it. One of the things I'd pictured doing with multiranges was along the lines of a "full coverage" constraint like "During a shift, there can be no interval that's not covered," which would correspond to a "range cardinality" of 1. I confess I'm getting a little twitchy about the idea of eliding the cases of "one" and "many", though. > Assuming that that's ok, it seems like we could consider the traditional > range functions like lower() and upper() to report on the first or last > range bound in a multirange --- essentially, they ignore any "holes" > that exist inside the range. And the new functions for multiranges > act much like array slicing, in that they give you back pieces of a range > that aren't actually of a distinct type. So new functions along the lines of lowers(), uppers(), opennesses(), etc.? I guess this could be extended as needs emerge. There's another use case not yet covered here that could make this even more complex, we should probably plan for it: multi-ranges with weights. For example, SELECT weighted_range_union(r) FROM (VALUES('[0,1)'::float8range), ('[0,3)'), '('[2,5)')) AS t(r) would yield something along the lines of: (([0,1),1), ([1,3),2), ([3,5),1)) and wedging that into the range type seems messy. Each range would then have a cardinality, and each range within would have a weight, all of which would be an increasingly heavy burden on the common case where there's just a single range. Enhancing a separate multirange type to have weights seems like a cleaner path forward. Given that, I'm -1 on mushing multi-ranges into a special case of ranges, or /vice versa/. Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: range_agg
so 7. 3. 2020 v 22:20 odesílatel Tom Lane napsal: > I wrote: > > Actually ... have you given any thought to just deciding that ranges and > > multiranges are the same type? That is, any range can now potentially > > contain multiple segments? That would eliminate a whole lot of the > > tedious infrastructure hacking involved in this patch, and let you focus > > on the actually-useful functionality. > > Also, this would allow us to remove at least one ugly misfeature: > > regression=# select '[1,2]'::int4range + '[3,10)'::int4range; > ?column? > -- > [1,10) > (1 row) > > regression=# select '[1,2]'::int4range + '[4,10)'::int4range; > ERROR: result of range union would not be contiguous > > If the result of range_union can be a multirange as easily as not, > we would no longer have to throw an error here. > I think this behave is correct. Sometimes you should to get only one range - and this check is a protection against not continuous range. if you expect multirange, then do select '[1,2]'::int4range::multirange + '[4,10)'::int4range; Regards Pavel > > regards, tom lane >
Re: range_agg
I wrote: > Actually ... have you given any thought to just deciding that ranges and > multiranges are the same type? That is, any range can now potentially > contain multiple segments? That would eliminate a whole lot of the > tedious infrastructure hacking involved in this patch, and let you focus > on the actually-useful functionality. Also, this would allow us to remove at least one ugly misfeature: regression=# select '[1,2]'::int4range + '[3,10)'::int4range; ?column? -- [1,10) (1 row) regression=# select '[1,2]'::int4range + '[4,10)'::int4range; ERROR: result of range union would not be contiguous If the result of range_union can be a multirange as easily as not, we would no longer have to throw an error here. regards, tom lane
Re: range_agg
I wrote: > However, what I'm on about right at the moment is that I don't think > there should be any delta in that test at all. As far as I can see, > the design idea here is that multiranges will be automatically created > over range types, and the user doesn't need to do that. To my mind, > that means that they're an implementation detail and should not show up as > separately-owned objects, any more than an autogenerated array type does. Actually ... have you given any thought to just deciding that ranges and multiranges are the same type? That is, any range can now potentially contain multiple segments? That would eliminate a whole lot of the tedious infrastructure hacking involved in this patch, and let you focus on the actually-useful functionality. It's possible that this is a bad idea. It bears a lot of similarity, I guess, to the way that Postgres doesn't consider arrays of different dimensionality to be distinct types. That has some advantages but it surely also has downsides. I think on the whole the advantages win, and I feel like that might also be the case here. The gating requirement for this would be to make sure that a plain range and a multirange can be told apart by contents. The first idea that comes to mind is to repurpose the allegedly-unused RANGE_xB_NULL bits in the flag byte at the end of the datum. If one of them is set, then it's a multirange, and we use a different interpretation of the bytes between the type OID and the flag byte. Assuming that that's ok, it seems like we could consider the traditional range functions like lower() and upper() to report on the first or last range bound in a multirange --- essentially, they ignore any "holes" that exist inside the range. And the new functions for multiranges act much like array slicing, in that they give you back pieces of a range that aren't actually of a distinct type. regards, tom lane
Re: range_agg
Alvaro Herrera writes: > [ v11 patches ] The cfbot isn't too happy with this; it's getting differently-ordered results than you apparently did for the list of owned objects in dependency.out's DROP OWNED BY test. Not sure why that should be --- it seems like af6550d34 should have ensured that there's only one possible ordering. However, what I'm on about right at the moment is that I don't think there should be any delta in that test at all. As far as I can see, the design idea here is that multiranges will be automatically created over range types, and the user doesn't need to do that. To my mind, that means that they're an implementation detail and should not show up as separately-owned objects, any more than an autogenerated array type does. So somewhere there's a missing bit of code, or more than one missing bit, to make multiranges act as derived types, the way arrays are. regards, tom lane
Re: range_agg
Thanks for looking at this again! On 3/4/20 1:33 PM, Alvaro Herrera wrote: I came across an interesting thing, namely multirange_canonicalize()'s use of qsort_arg with a callback of range_compare(). range_compare() calls range_deserialize() (non-trivial parsing) for each input range; multirange_canonicalize() later does a few extra deserialize calls of its own. Call me a premature optimization guy if you will, but I think it makes sense to have a different struct (let's call it "InMemoryRange") which stores the parsed representation of each range; then we can deserialize all ranges up front, and use that as many times as needed, without having to deserialize each range every time. I don't know, this sounds like a drastic change. I agree that multirange_deserialize and range_deserialize do a lot of copying (not really any parsing though, and they both assume their inputs are already de-TOASTED). But they are used very extensively, so if you wanted to remove them you'd have to rewrite a lot. I interpreted the intention of range_deserialize to be a way to keep the range struct fairly "private" and give a standard interface to extracting its attributes. Its motive seems akin to deconstruct_array. So I wrote multirange_deserialize to follow that principle. Both functions also handle memory alignment issues for you. With multirange_deserialize, there isn't actually much structure (just the list of ranges), so perhaps you could more easily omit it and give callers direct access into the multirange contents. That still seems risky though, and less well encapsulated. My preference would be to see if these functions are really a performance problem first, and only redo the in-memory structures if they are. Also that seems like something you could do as a separate project. (I wouldn't mind working on it myself, although I'd prefer to do actual temporal database features first.) There are no backwards-compatibility concerns to changing the in-memory structure, right? (Even if there are, it's too late to avoid them for ranges.) While I'm at this, why not name the new file simply multiranges.c instead of multirangetypes.c? As someone who doesn't do a lot of Postgres hacking, I tried to follow the approach in rangetypes.c as closely as I could, especially for naming things. So I named the file multirangetypes.c because there was already rangetypes.c. But also I can see how the "types" emphasizes that ranges and multiranges are not concrete types themselves, but more like abstract data types or generics (like arrays). Yours, -- Paul ~{:-) p...@illuminatedcomputing.com
Re: range_agg
I came across an interesting thing, namely multirange_canonicalize()'s use of qsort_arg with a callback of range_compare(). range_compare() calls range_deserialize() (non-trivial parsing) for each input range; multirange_canonicalize() later does a few extra deserialize calls of its own. Call me a premature optimization guy if you will, but I think it makes sense to have a different struct (let's call it "InMemoryRange") which stores the parsed representation of each range; then we can deserialize all ranges up front, and use that as many times as needed, without having to deserialize each range every time. While I'm at this, why not name the new file simply multiranges.c instead of multirangetypes.c? -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: range_agg
Hi st 22. 1. 2020 v 0:55 odesílatel Paul A Jungwirth < p...@illuminatedcomputing.com> napsal: > On Sun, Jan 19, 2020 at 9:57 PM Paul A Jungwirth > wrote: > > On Sun, Jan 19, 2020 at 4:38 PM Tom Lane wrote: > > > True for casts involving concrete types, mainly because we'd like > > > the identity "value::typename == typename(value)" to hold without > > > too much worry about whether the latter is a plain function call > > > or a special case. Not sure whether it makes as much sense for > > > polymorphics, since casting to a polymorphic type is pretty silly: > > > we do seem to allow you to do that, but it's a no-op. > > > > > > ... > > > > > > Alternatively, consider this: a cast from some concrete multirange type > > > to anymultirange is a no-op, while any other sort of cast probably > ought > > > to be casting to some particular concrete multirange type. That would > > > line up with the existing operations for plain ranges. > > > > I agree you wouldn't actually cast by saying x::anymultirange, and the > > casts we define are already concrete, so instead you'd say > > x::int4multirange. But I think having a polymorphic function to > > convert from an anyrange to an anymultirange is useful so you can > > write generic functions. I can see how calling it "anymultirange" may > > be preferring the implementor perspective over the user perspective > > though, and how simply "multirange" would be more empathetic. I don't > > mind taking that approach. > > Here is a patch with anymultirange(anyrange) renamed to > multirange(anyrange). I also rebased on the latest master, added > documentation about the multirange(anyrange) function, and slightly > adjusted the formatting of the range functions table. > I think so this patch is ready for commiter. All tests passed, the doc is good enough (the chapter name "Range functions and Operators" should be renamed to "Range/multirange functions and Operators" The code formatting and comments looks well Thank you for your work Regards Pavel > Thanks, > Paul >
Re: range_agg
On Sun, Jan 19, 2020 at 4:38 PM Tom Lane wrote: > True for casts involving concrete types, mainly because we'd like > the identity "value::typename == typename(value)" to hold without > too much worry about whether the latter is a plain function call > or a special case. Not sure whether it makes as much sense for > polymorphics, since casting to a polymorphic type is pretty silly: > we do seem to allow you to do that, but it's a no-op. > > ... > > Alternatively, consider this: a cast from some concrete multirange type > to anymultirange is a no-op, while any other sort of cast probably ought > to be casting to some particular concrete multirange type. That would > line up with the existing operations for plain ranges. I agree you wouldn't actually cast by saying x::anymultirange, and the casts we define are already concrete, so instead you'd say x::int4multirange. But I think having a polymorphic function to convert from an anyrange to an anymultirange is useful so you can write generic functions. I can see how calling it "anymultirange" may be preferring the implementor perspective over the user perspective though, and how simply "multirange" would be more empathetic. I don't mind taking that approach. Yours, Paul
Re: range_agg
po 20. 1. 2020 v 1:38 odesílatel Tom Lane napsal: > Paul A Jungwirth writes: > > On Sun, Jan 19, 2020 at 12:10 AM Pavel Stehule > wrote: > >> Now, I think so name "anymultirange" is not good. Maybe better name is > just "multirange" > > > Are you sure? This function exists to be a cast to an anymultirange, > > and I thought the convention was to name cast functions after their > > destination type. > > True for casts involving concrete types, mainly because we'd like > the identity "value::typename == typename(value)" to hold without > too much worry about whether the latter is a plain function call > or a special case. Not sure whether it makes as much sense for > polymorphics, since casting to a polymorphic type is pretty silly: > we do seem to allow you to do that, but it's a no-op. > > I'm a little troubled by the notion that what you're talking about > here is not a no-op (if it were, you wouldn't need a function). > That seems like there's something fundamentally not quite right > either with the design or with how you're thinking about it. > I thinking about completeness of operations I can to write CREATE OR REPLACE FUNCTION fx(anyarray, anyelement) RETURNS anyarray AS $$ SELECT $1 || ARRAY[$2] $$ LANGUAGE sql; I need to some functionality for moving a value to different category (it is more generic than casting to specific type (that can hold category) CREATE OR REPLACE FUNCTION fx(anymultirange, anyrange) RETURNS anyrage AS $$ SELECT $1 + multirange($1) $$ LANGUAGE sql; is just a analogy. Regards Pavel > As a comparison point, we sometimes describe subscripting as > being a polymorphic operation like > > subscript(anyarray, integer) returns anyelement > > It would be completely unhelpful to call that anyelement(). > I feel like you might be making a similar mistake here. > > Alternatively, consider this: a cast from some concrete multirange type > to anymultirange is a no-op, while any other sort of cast probably ought > to be casting to some particular concrete multirange type. That would > line up with the existing operations for plain ranges. > > regards, tom lane >
Re: range_agg
Paul A Jungwirth writes: > On Sun, Jan 19, 2020 at 12:10 AM Pavel Stehule > wrote: >> Now, I think so name "anymultirange" is not good. Maybe better name is just >> "multirange" > Are you sure? This function exists to be a cast to an anymultirange, > and I thought the convention was to name cast functions after their > destination type. True for casts involving concrete types, mainly because we'd like the identity "value::typename == typename(value)" to hold without too much worry about whether the latter is a plain function call or a special case. Not sure whether it makes as much sense for polymorphics, since casting to a polymorphic type is pretty silly: we do seem to allow you to do that, but it's a no-op. I'm a little troubled by the notion that what you're talking about here is not a no-op (if it were, you wouldn't need a function). That seems like there's something fundamentally not quite right either with the design or with how you're thinking about it. As a comparison point, we sometimes describe subscripting as being a polymorphic operation like subscript(anyarray, integer) returns anyelement It would be completely unhelpful to call that anyelement(). I feel like you might be making a similar mistake here. Alternatively, consider this: a cast from some concrete multirange type to anymultirange is a no-op, while any other sort of cast probably ought to be casting to some particular concrete multirange type. That would line up with the existing operations for plain ranges. regards, tom lane
Re: range_agg
On Sun, Jan 19, 2020 at 12:10 AM Pavel Stehule wrote: > Now, I think so name "anymultirange" is not good. Maybe better name is just > "multirange" Are you sure? This function exists to be a cast to an anymultirange, and I thought the convention was to name cast functions after their destination type. I can change it, but in my opinion anymultirange follows the Postgres conventions better. But just let me know and I'll do "multirange" instead! Yours, Paul
Re: range_agg
so 18. 1. 2020 v 17:35 odesílatel Pavel Stehule napsal: > > > so 18. 1. 2020 v 17:07 odesílatel Paul A Jungwirth < > p...@illuminatedcomputing.com> napsal: > >> On Sat, Jan 18, 2020 at 7:20 AM Pavel Stehule >> wrote: >> > Can be nice to have a polymorphic function >> > >> > multirange(anymultirange, anyrange) returns anymultirange. This >> functions should to do multirange from $2 to type $1 >> > >> > It can enhance to using polymorphic types and simplify casting. >> >> Thanks for taking another look! I actually have that already but it is >> named anymultirange: >> >> regression=# select anymultirange(int4range(1,2)); >> anymultirange >> --- >> {[1,2)} >> (1 row) >> >> Will that work for you? >> > > It's better than I though > > >> I think I only wrote that to satisfy some requirement of having an >> anymultirange type, but I agree it could be useful. (I even used it in >> the regress tests.) Maybe it's worth documenting too? >> > Now, I think so name "anymultirange" is not good. Maybe better name is just "multirange" > yes > > >> > when I tried to write this function in plpgsql I got >> > >> > create or replace function multirange(anymultirange, anyrange) returns >> anymultirange as $$ >> > begin >> > execute format('select $2::%I', pg_typeof($1)) into $1; >> > return $1; >> > end; >> > $$ language plpgsql immutable strict; >> > >> > ERROR: unrecognized typtype: 109 >> > CONTEXT: compilation of PL/pgSQL function "multirange" near line 1 >> >> Hmm, I'll add a test for that and see if I can find the problem. >> > > ok > > >> Thanks! >> Paul >> >
Re: range_agg
so 18. 1. 2020 v 17:07 odesílatel Paul A Jungwirth < p...@illuminatedcomputing.com> napsal: > On Sat, Jan 18, 2020 at 7:20 AM Pavel Stehule > wrote: > > Can be nice to have a polymorphic function > > > > multirange(anymultirange, anyrange) returns anymultirange. This > functions should to do multirange from $2 to type $1 > > > > It can enhance to using polymorphic types and simplify casting. > > Thanks for taking another look! I actually have that already but it is > named anymultirange: > > regression=# select anymultirange(int4range(1,2)); > anymultirange > --- > {[1,2)} > (1 row) > > Will that work for you? > It's better than I though > I think I only wrote that to satisfy some requirement of having an > anymultirange type, but I agree it could be useful. (I even used it in > the regress tests.) Maybe it's worth documenting too? > yes > > when I tried to write this function in plpgsql I got > > > > create or replace function multirange(anymultirange, anyrange) returns > anymultirange as $$ > > begin > > execute format('select $2::%I', pg_typeof($1)) into $1; > > return $1; > > end; > > $$ language plpgsql immutable strict; > > > > ERROR: unrecognized typtype: 109 > > CONTEXT: compilation of PL/pgSQL function "multirange" near line 1 > > Hmm, I'll add a test for that and see if I can find the problem. > ok > Thanks! > Paul >
Re: range_agg
On Sat, Jan 18, 2020 at 7:20 AM Pavel Stehule wrote: > Can be nice to have a polymorphic function > > multirange(anymultirange, anyrange) returns anymultirange. This functions > should to do multirange from $2 to type $1 > > It can enhance to using polymorphic types and simplify casting. Thanks for taking another look! I actually have that already but it is named anymultirange: regression=# select anymultirange(int4range(1,2)); anymultirange --- {[1,2)} (1 row) Will that work for you? I think I only wrote that to satisfy some requirement of having an anymultirange type, but I agree it could be useful. (I even used it in the regress tests.) Maybe it's worth documenting too? > when I tried to write this function in plpgsql I got > > create or replace function multirange(anymultirange, anyrange) returns > anymultirange as $$ > begin > execute format('select $2::%I', pg_typeof($1)) into $1; > return $1; > end; > $$ language plpgsql immutable strict; > > ERROR: unrecognized typtype: 109 > CONTEXT: compilation of PL/pgSQL function "multirange" near line 1 Hmm, I'll add a test for that and see if I can find the problem. Thanks! Paul
Re: range_agg
pá 17. 1. 2020 v 21:08 odesílatel Paul A Jungwirth < p...@illuminatedcomputing.com> napsal: > On Fri, Jan 10, 2020 at 1:38 AM Pavel Stehule > wrote: > >> This still leaves the question of how best to format the docs for > >> these operators. I like being able to combine all the <@ variations > >> (e.g.) into one table row, but if that is too ugly I could give them > >> separate rows instead. Giving them all their own row consumes a lot of > >> vertical space though, and to me that makes the docs more tedious to > >> read & browse, so it's harder to grasp all the available range-related > >> operations at a glance. > > > > > > I have similar opinion - maybe is better do documentation for range and > multirange separately. Sometimes there are still removed operators @+ > > I like keeping the range/multirange operators together since they are > so similar for both types, but if others disagree I'd be grateful for > more feedback. > ok > > You're right that I left in a few references to the old @+ style > operators in the examples; I've fixed those. > > > If you can share TYPTYPE_RANGE in code for multiranges, then it should > be 'r'. If not, then it needs own value. > > Okay. I think a new 'm' value is warranted because they are not > interchangeable. > > >> I experimented with setting pg_type.typelem to the multirange's range > >> type, but it seemed to break a lot of things, and reading the code I > >> saw some places that treat a non-zero typelem as synonymous with being > >> an array. So I'm reluctant to make this change also, especially when > >> it is just as easy to query pg_range to get a multirange's range type. > > > > > > ok, it is unhappy, but it is true. This note should be somewhere in > code, please > > I've added a comment about this. I put it at the top of DefineRange > but let me know if that's the wrong place. > > The attached file is also rebased on currrent master. > Can be nice to have a polymorphic function multirange(anymultirange, anyrange) returns anymultirange. This functions should to do multirange from $2 to type $1 It can enhance to using polymorphic types and simplify casting. Usage CREATE OR REPLACE FUNCTION diff(anymultirange, anyrange) RETURNS anymultirange AS $$ SELECT $1 - multirange($1, $2) $$ LANGUAGE sql; when I tried to write this function in plpgsql I got create or replace function multirange(anymultirange, anyrange) returns anymultirange as $$ begin execute format('select $2::%I', pg_typeof($1)) into $1; return $1; end; $$ language plpgsql immutable strict; ERROR: unrecognized typtype: 109 CONTEXT: compilation of PL/pgSQL function "multirange" near line 1 So probably some support in PL is missing But all others looks very well Regards Pavel > > Thanks! > Paul >
Re: range_agg
Hi so 4. 1. 2020 v 6:29 odesílatel Paul A Jungwirth < p...@illuminatedcomputing.com> napsal: > On Fri, Dec 20, 2019 at 11:29 AM Alvaro Herrera > wrote: > > > Should I just give up on implicit casts and require you to specify > > > one? That makes it a little more annoying to mix range & multirange > > > types, but personally I'm okay with that. This is my preferred > > > approach. > > > > +1 > > Here is a patch adding the casts, rebasing on the latest master, and > incorporating Alvaro's changes. Per his earlier suggestion I combined > it all into one patch file, which also makes it easier to apply > rebases & updates. > This patch was applied cleanly and all tests passed > > My work on adding casts also removes the @+ / @- / @* operators and > adds + / - / * operators where both parameters are multiranges. I > retained other operators with mixed range/multirange parameters, both > because there are already range operators with mixed range/scalar > parameters (e.g. <@), and because it seemed like the objection to @+ / > @- / @* was not mixed parameters per se, but rather their > unguessability. Since the other operators are the same as the existing > range operators, they don't share that problem. > looks well > > This still leaves the question of how best to format the docs for > these operators. I like being able to combine all the <@ variations > (e.g.) into one table row, but if that is too ugly I could give them > separate rows instead. Giving them all their own row consumes a lot of > vertical space though, and to me that makes the docs more tedious to > read & browse, so it's harder to grasp all the available range-related > operations at a glance. > I have similar opinion - maybe is better do documentation for range and multirange separately. Sometimes there are still removed operators @+ > I'm skeptical of changing pg_type.typtype from 'm' to 'r'. A > multirange isn't a range, so why should we give it the same type? Also > won't this break any queries that are using that column to find range > types? What is the motivation to use the same typtype for both ranges > and multiranges? (There is plenty I don't understand here, e.g. why we > have both typtype and typcategory, so maybe there is a good reason I'm > missing.) > If you can share TYPTYPE_RANGE in code for multiranges, then it should be 'r'. If not, then it needs own value. > I experimented with setting pg_type.typelem to the multirange's range > type, but it seemed to break a lot of things, and reading the code I > saw some places that treat a non-zero typelem as synonymous with being > an array. So I'm reluctant to make this change also, especially when > it is just as easy to query pg_range to get a multirange's range type. > ok, it is unhappy, but it is true. This note should be somewhere in code, please > Also range types themselves don't set typelem to their base type, and > it seems like we'd want to treat ranges and multiranges the same way > here. > > Alvaro also suggested renaming pg_range.mltrngtypid to > pg_range.rngmultitypid, so it shares the same "rng" prefix as the > other columns in this table. Having a different prefix does stand out. > I've included that change in this patch too. > Personally I have not any comments to implemented functionality. > > Yours, > Paul >
Re: range_agg
Alvaro Herrera writes: > On 2019-Dec-20, Paul A Jungwirth wrote: >> Is it permitted to add casts with polymorphic inputs & outputs? Is >> that something that we would actually want to do? I'd probably need >> both the polymorphic and concrete casts so that you could still say >> `int4range(1,2)::int4multirange`. > I'm embarrased to admit that I don't grok the type system well enough > (yet) to answer this question. I would say no; if you want behavior like that you'd have to add code for it into the coercion machinery, much like the casts around, say, types record and record[] versus named composites and arrays of same. Expecting the generic cast machinery to do the right thing would be foolhardy. In any case, even if it did do the right thing, you'd still need some additional polymorphic type to express the behavior you wanted, no? So it's not clear there'd be any net savings of effort. regards, tom lane
Re: range_agg
On 2019-Dec-20, Paul A Jungwirth wrote: > Is it permitted to add casts with polymorphic inputs & outputs? Is > that something that we would actually want to do? I'd probably need > both the polymorphic and concrete casts so that you could still say > `int4range(1,2)::int4multirange`. I'm embarrased to admit that I don't grok the type system well enough (yet) to answer this question. > Should I change the coerce code to look for casts among concrete types > when the function has polymorphic types? I'm pretty scared to do > something like that though, both because of the complexity and lest I > cause unintended effects. Yeah, I suggest to stay away from that. I think this multirange thing is groundbreaking enough that we don't need to cause additional potential breakage. > Should I just give up on implicit casts and require you to specify > one? That makes it a little more annoying to mix range & multirange > types, but personally I'm okay with that. This is my preferred > approach. +1 > I have some time over the holidays to work on the other changes Alvaro > has suggested. I hope not to have made things worse by posting a rebase. Anyway, that's the reason I posted my other changes separately. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: range_agg
On Fri, Dec 20, 2019 at 10:19 AM Pavel Stehule wrote: > I had a talk with Paul about possible simplification of designed operators. > Last message from Paul was - he is working on new version. Thanks Alvaro & Pavel for helping move this forward. I've added the casts but they aren't used automatically for things like `range_union(r, mr)` or `mr + r`, even though they are implicit. That's because the casts are for concrete types (e.g. int4range -> int4multirange) but the functions & operators are for polymorphic types (anymultirange + anymultirange). So I'd like to get some feedback about the best way to proceed. Is it permitted to add casts with polymorphic inputs & outputs? Is that something that we would actually want to do? I'd probably need both the polymorphic and concrete casts so that you could still say `int4range(1,2)::int4multirange`. Should I change the coerce code to look for casts among concrete types when the function has polymorphic types? I'm pretty scared to do something like that though, both because of the complexity and lest I cause unintended effects. Should I just give up on implicit casts and require you to specify one? That makes it a little more annoying to mix range & multirange types, but personally I'm okay with that. This is my preferred approach. I have some time over the holidays to work on the other changes Alvaro has suggested. Thanks, Paul
Re: range_agg
On 2019-Dec-20, Alvaro Herrera wrote: > I am not convinced that adding TYPTYPE_MULTIRANGE is really necessary. > Why can't we just treat those types as TYPTYPE_RANGE and distinguish > them using TYPCATEGORY_MULTIRANGE? That's what we do for arrays. I'll > try to do that next. I think this can be simplified if we make the the multirange's pg_type.typelem carry the base range's OID (the link in the other direction already appears as pg_range.mltrngtypid, though I'd recommend renaming that to pg_range.rngmultitypid to maintain the "rng" prefix convention). Then we can distinguish a multirange from a plain range easily, both of which have typtype as TYPTYPE_RANGE, because typelem != 0 in a multi. That knowledge can be encapsulated easily in type_is_multirange and pg_dump's getTypes. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: range_agg
pá 20. 12. 2019 v 18:43 odesílatel Alvaro Herrera napsal: > I took the liberty of rebasing this series on top of recent branch > master. The first four are mostly Paul's originals, except for conflict > fixes; the rest are changes I'm proposing as I go along figuring out the > whole thing. (I would post just my proposed changes, if it weren't for > the rebasing; apologies for the messiness.) > > I am not convinced that adding TYPTYPE_MULTIRANGE is really necessary. > Why can't we just treat those types as TYPTYPE_RANGE and distinguish > them using TYPCATEGORY_MULTIRANGE? That's what we do for arrays. I'll > try to do that next. > > I think the algorithm for coming up with the multirange name is > suboptimal. It works fine with the name is short enough that we can add > a few extra letters, but otherwise the result look pretty silly. I > think we can still improve on that. I propose to make > makeUniqueTypeName accept a suffix, and truncate the letters that appear > *before* the suffix rather than truncating after it's been appended. > > There's a number of ereport() calls that should become elog(); and a > bunch of others that should probably acquire errcode() and be > reformatted per our style. > > > Regarding Pavel's documentation markup issue, > > > I am not sure how much is correct to use class="monospaced"> > > in doc. It is used for ranges, and multiranges, but no in other places > > I looked at the generated PDF and the table looks pretty bad; the words > in those entries overlap the words in the cell to their right. But that > also happens with entries that do not use ! > See [1] for an example of the existing docs being badly formatted. The > docbook documentation [2] seems to suggest that what Paul used is the > appropriate way to do this. > > Maybe a way is to make each entry have more than one row -- so the > example would appear below the other three fields in its own row, and > would be able to use the whole width of the table. > I had a talk with Paul about possible simplification of designed operators. Last message from Paul was - he is working on new version. Regards Pavel > [1] https://twitter.com/alvherre/status/1205563468595781633 > [2] https://tdg.docbook.org/tdg/5.1/literallayout.html > > -- > Álvaro Herrerahttps://www.2ndQuadrant.com/ > PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services >
Re: range_agg
Hi Paul I'm starting to look at this again. Here's a few proposed changes to the current code as I read along. I noticed that 0001 does not compile on its own. It works as soon as I add 0002. What this is telling me is that the current patch split is not serving any goals; I think it's okay to merge them all into a single commit. If you want to split it in smaller pieces, it needs to be stuff that can be committed separately. One possible candidate for that is the new makeUniqueTypeName function you propose. I added this comment to explain what it does: /* - * makeUniqueTypeName: Prepend underscores as needed until we make a name that - * doesn't collide with anything. Tries the original typeName if requested. + * makeUniqueTypeName + * Generate a unique name for a prospective new type + * + * Given a typeName of length namelen, produce a new name into dest (an output + * buffer allocated by caller, which must of length NAMEDATALEN) by prepending + * underscores, until a non-conflicting name results. + * + * If tryOriginalName, first try with zero underscores. * * Returns the number of underscores added. */ This seems a little too strange; why not have the function allocate its output buffer instead, and return it? In order to support the case of it failing to find an appropriate name, have it return NULL, for caller to throw the "could not form ... name" error. The attached 0001 simplifies makeMultirangeConstructors; I think it was too baroque just because of it trying to imitate makeRangeConstructors; it seems easier to just repeat the ProcedureCreate call than trying to be clever. (makeRangeConstructors's comment is lying about the number of constructors it creates after commit df73584431e7, BTW. But note that the cruft in it to support doing it twice is not as much as in the new code). The other patches should be self-explanatory (I already submitted 0002 previously.) I'll keep going over the rest of it. Thanks! -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services >From 6a5b230f741fb4272770d09750b57fdad225027e Mon Sep 17 00:00:00 2001 From: Alvaro Herrera Date: Thu, 28 Nov 2019 19:08:43 -0300 Subject: [PATCH 1/6] Simplify makeMultirangeConstructors --- src/backend/commands/typecmds.c | 134 +++- 1 file changed, 80 insertions(+), 54 deletions(-) diff --git a/src/backend/commands/typecmds.c b/src/backend/commands/typecmds.c index 38948a049b..1b012c9cad 100644 --- a/src/backend/commands/typecmds.c +++ b/src/backend/commands/typecmds.c @@ -1784,70 +1784,96 @@ static void makeMultirangeConstructors(const char *name, Oid namespace, Oid multirangeOid, Oid rangeArrayOid) { - static const char *const prosrc[2] = {"multirange_constructor0", - "multirange_constructor1"}; - static const int pronargs[2] = {0, 1}; - - Oid constructorArgTypes = rangeArrayOid; ObjectAddress myself, referenced; - int i; - - Datum allParamTypes[1] = {ObjectIdGetDatum(rangeArrayOid)}; - ArrayType *allParameterTypes = construct_array(allParamTypes, 1, OIDOID, - sizeof(Oid), true, 'i'); - Datum constructorAllParamTypes[2] = {PointerGetDatum(NULL), PointerGetDatum(allParameterTypes)}; - - Datum paramModes[1] = {CharGetDatum(FUNC_PARAM_VARIADIC)}; - ArrayType *parameterModes = construct_array(paramModes, 1, CHAROID, - 1, true, 'c'); - Datum constructorParamModes[2] = {PointerGetDatum(NULL), PointerGetDatum(parameterModes)}; + oidvector *argtypes; + Datum allParamTypes; + ArrayType *allParameterTypes; + Datum paramModes; + ArrayType *parameterModes; referenced.classId = TypeRelationId; referenced.objectId = multirangeOid; referenced.objectSubId = 0; - for (i = 0; i < lengthof(prosrc); i++) - { - oidvector *constructorArgTypesVector; + argtypes = buildoidvector(NULL, 0); + myself = ProcedureCreate(name, /* name: same as multirange type */ + namespace, + false, /* replace */ + false, /* returns set */ + multirangeOid, /* return type */ + BOOTSTRAP_SUPERUSERID, /* proowner */ + INTERNALlanguageId, /* language */ + F_FMGR_INTERNAL_VALIDATOR, + "multirange_constructor0", /* prosrc */ + NULL, /* probin */ + PROKIND_FUNCTION, + false, /* security_definer */ + false, /* leakproof */ + false, /* isStrict */ + PROVOLATILE_IMMUTABLE, /* volatility */ + PROPARALLEL_SAFE, /* parallel safety */ + argtypes, /* parameterTypes */ + PointerGetDatum(NULL), /* allParameterTypes */ + PointerGetDatum(NULL), /* parameterModes */ + PointerGetDatum(NULL), /* parameterNames */ + NIL, /* parameterDefaults */ + PointerGetDatum(NULL), /* trftypes */ + PointerGetDatum(NULL), /* proconfig */ + InvalidOid, /* prosupport */ + 1.0, /* procost */ + 0.0); /* prorows */ - constructorArgTypesVector
Re: range_agg
pá 22. 11. 2019 v 17:25 odesílatel Paul A Jungwirth < p...@illuminatedcomputing.com> napsal: > On Thu, Nov 21, 2019 at 9:21 PM Pavel Stehule > wrote: > > I though about it, and I think so cast from multirange to range is > useless, minimally it should be explicit. > > I agree: definitely not implicit. If I think of a good reason for it > I'll add it, but otherwise I'll leave it out. > > > On second hand - from range to multirange should be implicit. > > Okay. > > > The original patch did > > > > 1. MR @x MR = MR > > 2. R @x R = MR > > 3. MR @x R = MR > > > > I think so @1 & @3 has sense, but without introduction of special > operator. @2 is bad and can be solved by cast one or second operand. > > Yes. I like how #2 follows the int/numeric analogy: if you want a > numeric result from `int / int` you can say `int::numeric / int`. > > So my understanding is that conventionally cast functions are named > after the destination type, e.g. int8multirange(int8range) would be > the function to cast an int8range to an int8multirange. And > int8range(int8multirange) would go the other way (if we do that). We > already use these names for the "constructor" functions, but I think > that is actually okay. For the multirange->range cast, the parameter > type & number are different, so there is no real conflict. For the > range->multirange cast, the parameter type is the same, and the > constructor function is variadic---but I think that's fine, because > the semantics are the same: build a multirange whose only element is > the given range: > > regression=# select int8multirange(int8range(1,2)); > int8multirange > > {[1,2)} > (1 row) > > Even the NULL handling is already what we want: > > regression=# select int8multirange(null); > int8multirange > > NULL > (1 row) > > So I think it's fine, but I'm curious whether you see any problems > there? (I guess if there is a problem it's no big deal to name the > function something else) > It looks well now. I am not sure about benefit of cast from MR to R if MR has more than one values. But it can be there for completeness. I think in this moment is not important to implement all functionality - for start is good to implement basic functionality that can be good. It can be enhanced step by step in next versions. Pavel > > Thanks, > Paul >
Re: range_agg
On Thu, Nov 21, 2019 at 9:21 PM Pavel Stehule wrote: > I though about it, and I think so cast from multirange to range is useless, > minimally it should be explicit. I agree: definitely not implicit. If I think of a good reason for it I'll add it, but otherwise I'll leave it out. > On second hand - from range to multirange should be implicit. Okay. > The original patch did > > 1. MR @x MR = MR > 2. R @x R = MR > 3. MR @x R = MR > > I think so @1 & @3 has sense, but without introduction of special operator. > @2 is bad and can be solved by cast one or second operand. Yes. I like how #2 follows the int/numeric analogy: if you want a numeric result from `int / int` you can say `int::numeric / int`. So my understanding is that conventionally cast functions are named after the destination type, e.g. int8multirange(int8range) would be the function to cast an int8range to an int8multirange. And int8range(int8multirange) would go the other way (if we do that). We already use these names for the "constructor" functions, but I think that is actually okay. For the multirange->range cast, the parameter type & number are different, so there is no real conflict. For the range->multirange cast, the parameter type is the same, and the constructor function is variadic---but I think that's fine, because the semantics are the same: build a multirange whose only element is the given range: regression=# select int8multirange(int8range(1,2)); int8multirange {[1,2)} (1 row) Even the NULL handling is already what we want: regression=# select int8multirange(null); int8multirange NULL (1 row) So I think it's fine, but I'm curious whether you see any problems there? (I guess if there is a problem it's no big deal to name the function something else) Thanks, Paul
Re: range_agg
čt 21. 11. 2019 v 21:15 odesílatel Paul Jungwirth < p...@illuminatedcomputing.com> napsal: > On 11/21/19 1:06 AM, Pavel Stehule wrote: > > 2. I don't like introduction "safe" operators - now the basic operators > > are doubled, and nobody without documentation will use @* operators. > > > > It is not intuitive. I think is better to map this functionality to > > basic operators +- * and implement it just for pairs (Multirange, > > Multirange) and (Multirange, Range) if it is possible > > > > It's same relation line Numeric X integer. There should not be > > introduced new operators. If somebody need it for ranges, then he can > > use cast to multirange, and can continue. > > [snip] > > 3. There are not prepared casts - > > > > postgres=# select int8range(10,15)::int8multirange; > > ERROR: cannot cast type int8range to int8multirange > > LINE 1: select int8range(10,15)::int8multirange; > > ^ > > There should be some a) fully generic solution, or b) possibility to > > build implicit cast when any multirange type is created. > > Okay, I like the idea of just having `range + range` and `multirange + > multirange`, then letting you cast between ranges and multiranges. The > analogy to int/numeric seems strong. I guess if you cast a multirange > with more than one element to a range it will raise an error. That will > let me clean up the docs a lot too. > I though about it, and I think so cast from multirange to range is useless, minimally it should be explicit. On second hand - from range to multirange should be implicit. The original patch did 1. MR @x MR = MR 2. R @x R = MR 3. MR @x R = MR I think so @1 & @3 has sense, but without introduction of special operator. @2 is bad and can be solved by cast one or second operand. Pavel > Thanks! > > -- > Paul ~{:-) > p...@illuminatedcomputing.com >
Re: range_agg
On 11/21/19 1:06 AM, Pavel Stehule wrote: 2. I don't like introduction "safe" operators - now the basic operators are doubled, and nobody without documentation will use @* operators. It is not intuitive. I think is better to map this functionality to basic operators +- * and implement it just for pairs (Multirange, Multirange) and (Multirange, Range) if it is possible It's same relation line Numeric X integer. There should not be introduced new operators. If somebody need it for ranges, then he can use cast to multirange, and can continue. > [snip] 3. There are not prepared casts - postgres=# select int8range(10,15)::int8multirange; ERROR: cannot cast type int8range to int8multirange LINE 1: select int8range(10,15)::int8multirange; ^ There should be some a) fully generic solution, or b) possibility to build implicit cast when any multirange type is created. Okay, I like the idea of just having `range + range` and `multirange + multirange`, then letting you cast between ranges and multiranges. The analogy to int/numeric seems strong. I guess if you cast a multirange with more than one element to a range it will raise an error. That will let me clean up the docs a lot too. Thanks! -- Paul ~{:-) p...@illuminatedcomputing.com
Re: range_agg
st 20. 11. 2019 v 20:32 odesílatel Paul A Jungwirth < p...@illuminatedcomputing.com> napsal: > On Tue, Nov 19, 2019 at 9:49 PM Paul A Jungwirth > wrote: > > > > On Tue, Nov 19, 2019 at 1:17 AM Pavel Stehule > wrote: > > > Hi > > > I tested last patches. I found some issues > > > > Thank you for the review! > > Here is an updated patch series fixing the problems from that last > review. I would still like some direction about the doc formatting: > > yes, these bugs are fixed there are not compilation's issues tests passed doc is ok I have notes 1. the chapter should be renamed to "Range Functions and Operators" to "Range and Multirange Functions and Operators" But now the doc is not well readable - there is not clean, what functions are for range type, what for multirange and what for both 2. I don't like introduction "safe" operators - now the basic operators are doubled, and nobody without documentation will use @* operators. It is not intuitive. I think is better to map this functionality to basic operators +- * and implement it just for pairs (Multirange, Multirange) and (Multirange, Range) if it is possible It's same relation line Numeric X integer. There should not be introduced new operators. If somebody need it for ranges, then he can use cast to multirange, and can continue. The "safe" operators can be implement on user space - but should not be default solution. 3. There are not prepared casts - postgres=# select int8range(10,15)::int8multirange; ERROR: cannot cast type int8range to int8multirange LINE 1: select int8range(10,15)::int8multirange; ^ There should be some a) fully generic solution, or b) possibility to build implicit cast when any multirange type is created. Regards Pavel > > > I am not sure how much is correct to use class="monospaced"> in doc. It is used for ranges, and multiranges, but no > in other places > > > > I could use some advice here. Many operators seem best presented in > > groups of four, where only their parameter types change, for example: > > > > int8range < int8range > > int8range < int8multirange > > int8multirange < int8range > > int8multirange < int8multirange > > > > All I really want is to show those separated by line breaks. I > > couldn't find any other examples of that happening inside a table cell > > though (i.e. inside ). What is the best way > > to do that? > Personally I think it should be cleaned. Mainly if there is not visible differences. But range related doc it uses, so it is consistent with it. And then this is not big issue. > Thanks, > Paul >
Re: range_agg
On Tue, Nov 19, 2019 at 1:17 AM Pavel Stehule wrote: > Hi > I tested last patches. I found some issues Thank you for the review! > 1. you should not to try patch catversion. I've seen discussion on pgsql-hackers going both ways, but I'll leave it out of future patches. :-) > 2. there is warning > > parse_coerce.c: In function ‘enforce_generic_type_consistency’: > parse_coerce.c:1975:11: warning: ‘range_typelem’ may be used uninitialized in > this function [-Wmaybe-uninitialized] > 1975 | else if (range_typelem != elem_typeid) Fixed locally, will include in my next patch. > 3. there are problems with pg_upgrade. Regress tests fails > . . . > pg_restore: while PROCESSING TOC: > pg_restore: from TOC entry 1653; 1247 17044 TYPE arrayrange pavel > pg_restore: error: could not execute query: ERROR: pg_type array OID value > not set when in binary upgrade mode I see what's going on here. (Sorry if this verbose explanation is obvious; it's as much for me as for anyone.) With pg_upgrade the values of pg_type.oid and pg_type.typarray must be the same before & after. For built-in types there's no problem, because those are fixed by pg_type.dat. But for user-defined types we have to take extra steps to make sure they don't change. CREATE TYPE always uses two oids: one for the type and one for the type's array type. But now when you create a range type we use *four*: the range type, the array of that range, the multirange type, and the array of that multirange. Currently when you run pg_dump in "binary mode" (i.e. as part of pg_upgrade) it includes calls to special functions to set the next oid to use for pg_type.oid and pg_type.typarray. Then CREATE TYPE also has special "binary mode" code to check those variables and use those oids (e.g. AssignTypeArrayOid). After using them once it sets them back to InvalidOid so it doesn't keep using them. So I guess I need to add code to pg_dump so that it also outputs calls to two new special functions that similarly set the oid to use for the next multirange and multirange[]. For v12->v13 it will chose high-enough oids like we do already for arrays of domains. (For other upgrades it will use the existing value.) And then I can change the CREATE TYPE code to check those pre-set values when obtaining the next oid. Does that sound like the right approach here? > 4. there is a problem with doc > > extend.sgml:281: parser error : Opening and ending tag mismatch: para line > 270 and type > type of the ranges in an anymultirange. Hmm, yikes, I'll fix that! > I am not sure how much is correct to use > in doc. It is used for ranges, and multiranges, but no in other places I could use some advice here. Many operators seem best presented in groups of four, where only their parameter types change, for example: int8range < int8range int8range < int8multirange int8multirange < int8range int8multirange < int8multirange All I really want is to show those separated by line breaks. I couldn't find any other examples of that happening inside a table cell though (i.e. inside ). What is the best way to do that? Thanks, Paul
Re: range_agg
Hi čt 7. 11. 2019 v 3:36 odesílatel Paul A Jungwirth < p...@illuminatedcomputing.com> napsal: > On Wed, Nov 6, 2019 at 3:02 PM Paul A Jungwirth > wrote: > > On Thu, Sep 26, 2019 at 2:13 PM Alvaro Herrera > wrote: > > > Hello Paul, I've started to review this patch. Here's a few minor > > > things I ran across -- mostly compiler warnings (is my compiler too > > > ancient?). > > I just opened this thread to post a rebased set patches (especially > > because of the `const` additions to range functions). Maybe it's not > > that helpful since they don't include your changes yet but here they > > are anyway. I'll post some more with your changes shortly. > > Here is another batch of patches incorporating your improvements. It > seems like almost all the warnings were about moving variable > declarations above any other statements. For some reason I don't get > warnings about that on my end (compiling on OS X): > > platter:postgres paul$ gcc --version > Configured with: > --prefix=/Applications/Xcode.app/Contents/Developer/usr > --with-gxx-include-dir=/usr/include/c++/4.2.1 > Apple clang version 11.0.0 (clang-1100.0.33.12) > Target: x86_64-apple-darwin18.6.0 > Thread model: posix > InstalledDir: > /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin > > For configure I'm saying this: > > ./configure CFLAGS=-ggdb 5-Og -g3 -fno-omit-frame-pointer > --enable-tap-tests --enable-cassert --enable-debug > --prefix=/Users/paul/local > > Any suggestions to get better warnings? On my other patch I got > feedback about the very same kind. I could just compile on Linux but > it's nice to work on this away from my desk on the laptop. Maybe > installing a real gcc is the way to go. > I tested last patches. I found some issues 1. you should not to try patch catversion. 2. there is warning parse_coerce.c: In function ‘enforce_generic_type_consistency’: parse_coerce.c:1975:11: warning: ‘range_typelem’ may be used uninitialized in this function [-Wmaybe-uninitialized] 1975 | else if (range_typelem != elem_typeid) 3. there are problems with pg_upgrade. Regress tests fails command: "/home/pavel/src/postgresql.master/tmp_install/usr/local/pgsql/bin/pg_restore" --host /home/pavel/src/postgresql.master/src/b pg_restore: connecting to database for restore pg_restore: creating DATABASE "regression" pg_restore: connecting to new database "regression" pg_restore: connecting to database "regression" as user "pavel" pg_restore: creating DATABASE PROPERTIES "regression" pg_restore: connecting to new database "regression" pg_restore: connecting to database "regression" as user "pavel" pg_restore: creating pg_largeobject "pg_largeobject" pg_restore: creating SCHEMA "fkpart3" pg_restore: creating SCHEMA "fkpart4" pg_restore: creating SCHEMA "fkpart5" pg_restore: creating SCHEMA "fkpart6" pg_restore: creating SCHEMA "mvtest_mvschema" pg_restore: creating SCHEMA "regress_indexing" pg_restore: creating SCHEMA "regress_rls_schema" pg_restore: creating SCHEMA "regress_schema_2" pg_restore: creating SCHEMA "testxmlschema" pg_restore: creating TRANSFORM "TRANSFORM FOR integer LANGUAGE "sql"" pg_restore: creating TYPE "public.aggtype" pg_restore: creating TYPE "public.arrayrange" pg_restore: while PROCESSING TOC: pg_restore: from TOC entry 1653; 1247 17044 TYPE arrayrange pavel pg_restore: error: could not execute query: ERROR: pg_type array OID value not set when in binary upgrade mode Command was:. -- For binary upgrade, must preserve pg_type oid SELECT pg_catalog.binary_upgrade_set_next_pg_type_oid('17044'::pg_catalog.oid); -- For binary upgrade, must preserve pg_type array oid SELECT pg_catalog.binary_upgrade_set_next_array_pg_type_oid('17045'::pg_catalog.oid); CREATE TYPE "public"."arrayrange" AS RANGE ( subtype = integer[] ); 4. there is a problem with doc echo ""; \ echo ""; \ } > version.sgml '/usr/bin/perl' ./mk_feature_tables.pl YES ../../../src/backend/catalog/sql_feature_packages.txt ../../../src/backend/catalog/sql_features.txt > features-supported.sgml '/usr/bin/perl' ./mk_feature_tables.pl NO ../../../src/backend/catalog/sql_feature_packages.txt ../../../src/backend/catalog/sql_features.txt > features-unsupported.sgml '/usr/bin/perl' ./generate-errcodes-table.pl ../../../src/backend/utils/errcodes.txt > errcodes-table.sgml '/usr/bin/perl' ./generate-keywords-table.pl . > keywords-table.sgml /usr/bin/xmllint --path . --noout --valid postgres.sgml extend.sgml:281: parser error : Opening and ending tag mismatch: para line 270 and type type of the ranges in an anymultirange. ^ extend.sgml:281: parser error : Opening and ending tag mismatch: sect2 line 270 and type type of the ranges in an anymultirange. ^ extend.sgml:282: parser error : Opening and ending tag mismatch: sect1 line 270 and para ^ extend.sgml:324: parser error : Opening and ending tag mismatch: chapt
Re: range_agg
Hello Paul, I've started to review this patch. Here's a few minor things I ran across -- mostly compiler warnings (is my compiler too ancient?). You don't have to agree with every fix -- feel free to use different fixes if you have them. Also, feel free to squash them onto whatever commit you like (I think they all belong onto 0001 except the last which seems to be for 0002). Did you not push your latest version to your github repo? I pulled from there and branch 'multirange' does not seem to match what you posted. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services >From 6d849beafea1e4129f044f8dd038933d6ea72b54 Mon Sep 17 00:00:00 2001 From: Alvaro Herrera Date: Thu, 26 Sep 2019 16:56:26 -0300 Subject: [PATCH 1/9] Remove unused variable --- src/backend/catalog/pg_type.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/src/backend/catalog/pg_type.c b/src/backend/catalog/pg_type.c index 08552850c8..ea5b161c4d 100644 --- a/src/backend/catalog/pg_type.c +++ b/src/backend/catalog/pg_type.c @@ -876,7 +876,6 @@ makeMultirangeTypeName(const char *rangeTypeName, Oid typeNamespace) char mltrng[NAMEDATALEN]; char *mltrngunique = (char *) palloc(NAMEDATALEN); int namelen = strlen(rangeTypeName); - int rangelen; char *rangestr; int rangeoffset; int underscores; @@ -892,7 +891,6 @@ makeMultirangeTypeName(const char *rangeTypeName, Oid typeNamespace) if (rangestr) { rangeoffset = rangestr - rangeTypeName; - rangelen = strlen(rangestr); strlcpy(mltrng + rangeoffset, "multi", NAMEDATALEN - rangeoffset); strlcpy(mltrng + rangeoffset + 5, rangestr, NAMEDATALEN - rangeoffset - 5); namelen += 5; -- 2.17.1 >From 63bd9723a45c33cccd11704817fd954af5abaf31 Mon Sep 17 00:00:00 2001 From: Alvaro Herrera Date: Thu, 26 Sep 2019 17:39:54 -0300 Subject: [PATCH 2/9] Silence compiler warning --- src/backend/parser/parse_coerce.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/src/backend/parser/parse_coerce.c b/src/backend/parser/parse_coerce.c index f0b1f6f831..aab2348952 100644 --- a/src/backend/parser/parse_coerce.c +++ b/src/backend/parser/parse_coerce.c @@ -1919,6 +1919,8 @@ enforce_generic_type_consistency(const Oid *actual_arg_types, format_type_be(elem_typeid; } } + else + range_typelem = InvalidOid; /* Get the range type based on the multirange type, if we have one */ if (OidIsValid(multirange_typeid)) -- 2.17.1 >From c4e4b1a97136f7a2047f8be6608cac0782dd6d12 Mon Sep 17 00:00:00 2001 From: Alvaro Herrera Date: Thu, 26 Sep 2019 17:42:22 -0300 Subject: [PATCH 3/9] Silence 'mixed declarations and code' compiler warning --- src/backend/commands/typecmds.c | 8 ++-- 1 file changed, 2 insertions(+), 6 deletions(-) diff --git a/src/backend/commands/typecmds.c b/src/backend/commands/typecmds.c index 14a6857062..26ed3e4c76 100644 --- a/src/backend/commands/typecmds.c +++ b/src/backend/commands/typecmds.c @@ -1782,14 +1782,10 @@ makeMultirangeConstructors(const char *name, Oid namespace, static const char *const prosrc[2] = {"multirange_constructor0", "multirange_constructor1"}; static const int pronargs[2] = {0, 1}; - - Oid constructorArgTypes[0]; + Oid constructorArgTypes = rangeArrayOid; ObjectAddress myself, referenced; int i; - - constructorArgTypes[0] = rangeArrayOid; - Datum allParamTypes[1] = {ObjectIdGetDatum(rangeArrayOid)}; ArrayType *allParameterTypes = construct_array(allParamTypes, 1, OIDOID, sizeof(Oid), true, 'i'); @@ -1808,7 +1804,7 @@ makeMultirangeConstructors(const char *name, Oid namespace, { oidvector *constructorArgTypesVector; - constructorArgTypesVector = buildoidvector(constructorArgTypes, + constructorArgTypesVector = buildoidvector(&constructorArgTypes, pronargs[i]); myself = ProcedureCreate(name, /* name: same as multirange type */ -- 2.17.1 >From af414286417d4fddd078c7ffac087482b3ba959d Mon Sep 17 00:00:00 2001 From: Alvaro Herrera Date: Thu, 26 Sep 2019 17:49:19 -0300 Subject: [PATCH 4/9] Silence 'mixed declarations and code' compiler warnings --- src/backend/utils/fmgr/funcapi.c | 98 +++- 1 file changed, 59 insertions(+), 39 deletions(-) diff --git a/src/backend/utils/fmgr/funcapi.c b/src/backend/utils/fmgr/funcapi.c index e6e82fda63..701664bd3b 100644 --- a/src/backend/utils/fmgr/funcapi.c +++ b/src/backend/utils/fmgr/funcapi.c @@ -541,18 +541,21 @@ resolve_polymorphic_tupdesc(TupleDesc tupdesc, oidvector *declared_args, if (OidIsValid(anymultirange_type)) { - Oid rngtype = resolve_generic_type(ANYRANGEOID, - anymultirange_type, - ANYMULTIRANGEOID); + Oid rngtype; + Oid subtype; + + rngtype = resolve_generic_type(ANYRANGEOID, + anymultirange_type, + ANYMULTIRANGEOID); /* check for inconsistent range and multirange results */ if (OidIsVa
Re: range_agg
On Thu, 2019-09-05 at 10:45 -0700, Paul A Jungwirth wrote: > Right now I'm planning to do all that before sending a patch. I'm > happy to send something something in-progress too, but I don't want > to > waste any reviewers' time. If folks want an early peak though let me > know. (You can also find my messy progress at > https://github.com/pjungwir/postgresql/tree/multirange) Sounds good. The rule I use is: "will the feedback I get be helpful, or just tell me about obvious problems I already know about". Regards, Jeff Davis
Re: range_agg
On Thu, Sep 5, 2019 at 10:15 AM Jeff Davis wrote: > > On Sun, 2019-09-01 at 06:26 -0700, Paul A Jungwirth wrote: > > @+ and @- and @* (I dunno why but I kind of like it. We already have > > @> and <@.) > > I think I like this proposal best; it reminds me of perl. Though some > might say that's an argument against it. Thanks Jeff, it's my favorite too. :-) Strangely it feels the hardest to justify. Right now I have + and - and * implemented but I'll change them to @+ and @- and @* so that I can support `range R range = multirange` too. Btw is there any reason to send a "preview" patch with my current progress, since we're starting a new commit fest? Here is what I have left to do: - Change those three operators. - Write range_intersect_agg. (range_agg is done but needs some tests before I commit it.) - Write documentation. - Add multiranges to resolve_generic_type, and figure out how to test that (see the other thread about two latent range-related bugs there). - Rebase on current master. (It should be just a few weeks behind right now.) - Run pgindent to make sure I'm conforming to whitespace/style guidelines. - Split it up into a few separate patch files. Right now I'm planning to do all that before sending a patch. I'm happy to send something something in-progress too, but I don't want to waste any reviewers' time. If folks want an early peak though let me know. (You can also find my messy progress at https://github.com/pjungwir/postgresql/tree/multirange) Also here are some other items that won't be in my next patch, but should probably be done (maybe by someone else but I'm happy to figure it out too) before this is really committed: - typanalyze - selectivity - gist support - spgist support If anyone would like to help with those, let me know. :-) Yours, Paul
Re: range_agg
On Sun, 2019-09-01 at 06:26 -0700, Paul A Jungwirth wrote: > @+ and @- and @* (I dunno why but I kind of like it. We already have > @> and <@.) I think I like this proposal best; it reminds me of perl. Though some might say that's an argument against it. Regards, Jeff Davis
Re: range_agg
> > Btw I have working multirange_{send,recv,in,out} now. . . . Just about all the other operators are done too, but I wonder what symbols people like for union and minus? Range uses + for union. I have working code and tests that adds this: r + mr = mr mr + r = mr mr + mr = mr But I would like to use a different symbol instead, like ++, so I can have all four: r ++ r = mr r ++ mr = mr mr ++ r = mr mr ++ mr = mr (The existing r + r operator throws an error if the inputs have a gap.) The trouble is that ++ isn't allowed. (Neither is --.) From https://www.postgresql.org/docs/11/sql-createoperator.html : > A multicharacter operator name cannot end in + or -, unless the name also > contains at least one of these characters: > ~ ! @ # % ^ & | ` ? So are there any other suggestions? I'm open to arguments that I should just use +, but I think having a way to add two simple ranges and get a multirange would be nice too, so my preference is to find a new operator. It should work with minus and intersection (* for ranges) too. Some proposals: +* and -* and ** (* as in regex "zero or many" reminds me of how a multirange holds zero or many ranges. ** is confusing though because it's like exponentiation.) @+ and @- and @* (I dunno why but I kind of like it. We already have @> and <@.) <+> and <-> and <*> (I was hoping for (+) etc for the math connection to circled operators, but this is close. Maybe this would be stronger if multirange_{in,out} used <> delims instead of {}, although I also like how {} is consistent with arrays.) Anyone else have any thoughts? Thanks, Paul
Re: range_agg
On Wed, 2019-08-21 at 21:54 -0700, Paul A Jungwirth wrote: > Sometimes I think about having a maybe type instead of null/not > null. SQL nulls are already very "monadic" I think but nullability > doesn't travel. Yeah, that would be a great direction, but there is some additional complexity that we'd need to deal with that a "normal" compiler does not: * handling both existing global types (like int) as well as on-the- fly types like Maybe> * types need to do more things, like have serialized representations, interface with indexing strategies, and present the optimizer with choices that may influence which indexes can be used or not * at some point needs to work with normal SQL types and NULL * there are a lot of times we care not just whether a type is sortable, but we actually care about the way it's sorted (e.g. localization). typeclasses+newtype would probably be unacceptable for trying to match SQL behavior here. I'm all in favor of pursuing this, but it's not going to bear fruit very soon. > Btw I have working multirange_{send,recv,in,out} now, and I > automatically create a multirange type and its array type when > someone > creates a new range type. I have a decent start on passing tests and > no compiler warnings. I also have a start on anymultirange and > anyrangearray. (I think I need the latter to support a range-like > constructor function, so you can say `int4multirange(int4range(1,4), > int4range(8,10))`.) I want to get the any* types done and improve the > test coverage, and then I'll probably be ready to share a patch. Awesome! > Here are a couple other questions: > > - Does anyone have advice for the typanalyze function? I feel pretty > out of my depth there (although I haven't looked into typanalyze > stuff > very deeply yet). I can probably get some inspiration from > range_typanalyze and array_typanalyze, but those are both quite > detailed (their statistics functions that is). I think Alexander Korotkov did a lot of the heavy lifting here, perhaps he has a comment? I'd keep it simple for now if you can, and we can try to improve it later. > - What should a multirange do if you give it an empty range? I'm > thinking it should just ignore it, but then `'{}'::int4multirange = > '{empty}'::int4multirange`. Does that seem okay? (It does to me > actually, if we think of `empty` as the additive identity. Similarly > mr + empty = mr. I agree. Multiranges are more than just an array of ranges, so they coalesce into some canonical form. > - What should a multirange do if you give it a null, like > `int4multirange(int4range(1,4), null)`. I'm thinking it should be > null, just like mr + null = null. Right? Yes. NULL is for the overall multirange datum (that is, a multirange column can be NULL), but I don't think individual parts of a datatype make much sense as NULL. So, I agree that mr + null = null. (Note that arrays and records can have NULL parts, but I don't see a reason we should follow those examples for multiranges.) Regards, Jeff Davis
Re: range_agg
On Tue, Aug 20, 2019 at 10:33 PM Jeff Davis wrote: > > Is there any historical discussion around > > typemods on range types? > > I did find a few references: Thanks for looking those up! It's very interesting to see some of the original discussion around range types. Btw this is true of so much, isn't it?: > It's more a property of the > column than the type. Sometimes I think about having a maybe type instead of null/not null. SQL nulls are already very "monadic" I think but nullability doesn't travel. I feel like someone ought to write a paper about that, but I don't know of any. This is tantalizingly close (and a fun read) but really about something else: https://www.researchgate.net/publication/266657590_Incomplete_data_what_went_wrong_and_how_to_fix_it Since you're getting into Rust maybe you can update the wiki page mentioned in those threads about refactoring the type system. :-) Anyway sorry for the digression. . . . > So, I wouldn't spend a lot of time on typmod for multiranges. Okay, thanks! There is plenty else to do. I think I'm already supporting it as much as range types do. Btw I have working multirange_{send,recv,in,out} now, and I automatically create a multirange type and its array type when someone creates a new range type. I have a decent start on passing tests and no compiler warnings. I also have a start on anymultirange and anyrangearray. (I think I need the latter to support a range-like constructor function, so you can say `int4multirange(int4range(1,4), int4range(8,10))`.) I want to get the any* types done and improve the test coverage, and then I'll probably be ready to share a patch. Here are a couple other questions: - Does anyone have advice for the typanalyze function? I feel pretty out of my depth there (although I haven't looked into typanalyze stuff very deeply yet). I can probably get some inspiration from range_typanalyze and array_typanalyze, but those are both quite detailed (their statistics functions that is). - What should a multirange do if you give it an empty range? I'm thinking it should just ignore it, but then `'{}'::int4multirange = '{empty}'::int4multirange`. Does that seem okay? (It does to me actually, if we think of `empty` as the additive identity. Similarly mr + empty = mr. - What should a multirange do if you give it a null, like `int4multirange(int4range(1,4), null)`. I'm thinking it should be null, just like mr + null = null. Right? Thanks! Paul
Re: range_agg
On Sat, 2019-08-17 at 10:47 -0700, Paul A Jungwirth wrote: > So I'm wondering how seriously I should take this for multiranges? I > guess if a range type did support typmods, it would just delegate to > the underlying element type for their meaning, and so a multirange > should delegate it too? Is there any historical discussion around > typemods on range types? I did find a few references: https://www.postgresql.org/message-id/1288029716.8645.4.camel%40jdavis-ux.asterdata.local https://www.postgresql.org/message-id/2011091334.GB11603%40fetter.org https://www.postgresql.org/message-id/1296974485.27157.136.camel@jdavis I'd be interested in ways that we can use a typmod-like concept to improve the type system. Unfortunately, typmod is just not sophisticated enough to do very much because it's lost through function calls. Improving that would be a separate and challenging project. So, I wouldn't spend a lot of time on typmod for multiranges. Regards, Jeff Davis
Re: range_agg
On Mon, Jul 8, 2019 at 9:46 AM Paul A Jungwirth wrote: > - A multirange type is an extra thing you get when you define a range > (just like how you get a tstzrange[]). Therefore I've been able to make a little more progress on multiranges the last few days, but it reminded me of an open question I've had for awhile: typmods! I see places in the range code that gesture toward supporting typmods, but none of the existing range types permit them. For example: postgres=# select '5'::numeric(4,2); numeric - 5.00 (1 row) postgres=# select '[1,4)'::numrange(4,2); ERROR: type modifier is not allowed for type "numrange" LINE 1: select '[1,4)'::numrange(4,2); So I'm wondering how seriously I should take this for multiranges? I guess if a range type did support typmods, it would just delegate to the underlying element type for their meaning, and so a multirange should delegate it too? Is there any historical discussion around typemods on range types? Thanks! Paul
Re: range_agg
On Wed, Jul 24, 2019 at 5:13 PM Paul A Jungwirth wrote: > On Tue, Jul 23, 2019 at 3:32 PM Alvaro Herrera > wrote: > > Just checking if you've had a chance to make progress on this. > > Not a lot. :-) But I should have more time for it the next few weeks > than I did the last few. ... Hi Paul, I didn't follow this thread, but as the CF is coming to a close, I'm interpreting the above to mean that this is being worked on and there is a good chance of a new patch in time for September. Therefore I have moved this entry to that 'fest. Thanks, -- Thomas Munro https://enterprisedb.com
Re: range_agg
On Tue, Jul 23, 2019 at 3:32 PM Alvaro Herrera wrote: > Just checking if you've had a chance to make progress on this. Not a lot. :-) But I should have more time for it the next few weeks than I did the last few. I do have some code for creating concrete multirange types (used when you create a concrete range type) and filling in a TypeCacheEntry based on the range type oid---which I know is all very modest progress. I've been working on a multirange_in function and mostly just learning about Postgres varlena and TOASTed objects by reading the code for range_in & array_in. Here is something from my multirangetypes.h: /* * Multiranges are varlena objects, so must meet the varlena convention that * the first int32 of the object contains the total object size in bytes. * Be sure to use VARSIZE() and SET_VARSIZE() to access it, though! */ typedef struct { int32 vl_len_;/* varlena header (do not touch directly!) */ Oid multirangetypid;/* multirange type's own OID */ /* * Following the OID are the range objects themselves. * Note that ranges are varlena too, * depending on whether they have lower/upper bounds * and because even their base types can be varlena. * So we can't really index into this list. */ } MultirangeType; I'm working on parsing a multirange much like we parse an array, although it's a lot simpler because it's a single dimension and there are no nulls. I know that's not much to go on, but let me know if any of it worries you. :-) Paul
Re: range_agg
Hi Paul, Just checking if you've had a chance to make progress on this. Thanks, -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: range_agg
On 7/9/19 11:24 PM, David Fetter wrote: I seem to recall that the usual convention (at least in math) is to use intervals that are generally represented as open on the infinity side, but that might not fit how we do things. I think it does, unless I'm misunderstanding? Oh, I was just wondering about the square bracket on the left side of [null, 1). It's not super important. Ah, I understand now. Just a typo on my part. Thanks for catching it, and sorry for the confusion! !mr , perhaps? I like that suggestion. Honestly I'm not sure we even want an inverse, but it's so important theoretically we should at least consider whether it is appropriate here. Or maybe "inverse" is the wrong word for this, or there is a different meaning it should have. Jeff's suggestion of ~ for complement is better. Okay, thanks. I like it better too. Yours, -- Paul ~{:-) p...@illuminatedcomputing.com
Re: range_agg
On Tue, Jul 09, 2019 at 09:40:59AM -0700, Paul A Jungwirth wrote: > On Tue, Jul 9, 2019 at 8:51 AM David Fetter wrote: > > > - A multirange type is an extra thing you get when you define a range > > > (just like how you get a tstzrange[]). Therefore > > > - I don't need separate commands to add/drop multirange types. You get > > > one when you define a range type, and if you drop a range type it gets > > > dropped automatically. > > > > Yay for fewer manual steps! > > Thanks for taking a look and sharing your thoughts! > > > > - You can have a multirange[]. > > > > I can see how that would fall out of this, but I'm a little puzzled as > > to what people might use it for. Aggregates, maybe? > > I don't know either, but I thought it was standard to define a T[] for > every T. Anyway it doesn't seem difficult. > > > > - You can cast from a multirange to an array. The individual ranges > > > are always sorted in the result array. > > > > Is this so people can pick individual ranges out of the multirange, > > or...? > > Yes. I want this for foreign keys actually, where I construct a > multirange and ask for just its first range. I'm sure I'll understand this better once I get my head around temporal foreign keys. > > Speaking of casts, it's possible that a multirange is also a > > range. Would it make sense to have a cast from multirange to range? > > Hmm, that seems strange to me. You don't cast from an array to one of > its elements. If we have subscripting, why use casting to get the > first element? Excellent point. > > > - Interesting functions: > > > - multirange_length > > > > Is that the sum of the lengths of the ranges? Are we guaranteeing a > > measure in addition to ordering on ranges now? > > Just the number of disjoint ranges in the multirange. Thanks for clarifying. > > > - You can subscript a multirange like you do an array (? This could be > > > a function instead.) > > > > How would this play with the generic subscripting patch in flight? > > I'm not aware of that patch but I guess I better check it out. :-) Looks like I'm the second to mention it. Worth a review? > > > - inverse operator?: > > > - the inverse of {"[1,2)"} would be {"[null, 1)", "[2, null)"}. > > > > Is that the same as ["(∞, ∞)"] - {"[1,2)"}? > > Yes. > > > I seem to recall that the usual convention (at least in math) is > > to use intervals that are generally represented as open on the > > infinity side, but that might not fit how we do things. > > I think it does, unless I'm misunderstanding? Oh, I was just wondering about the square bracket on the left side of [null, 1). It's not super important. > > > - not sure we want this or what the symbol should be. I don't like > > > -mr as an inverse because then mr - mr != mr ++ -mr. > > > > !mr , perhaps? > > I like that suggestion. Honestly I'm not sure we even want an inverse, > but it's so important theoretically we should at least consider > whether it is appropriate here. Or maybe "inverse" is the wrong word > for this, or there is a different meaning it should have. Jeff's suggestion of ~ for complement is better. Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: range_agg
st 10. 7. 2019 v 6:26 odesílatel Paul A Jungwirth < p...@illuminatedcomputing.com> napsal: > On Tue, Jul 9, 2019 at 12:24 PM Pavel Stehule > wrote: > > út 9. 7. 2019 v 21:10 odesílatel Pavel Stehule > napsal: > >> I afraid so with generic multiragetype there lot of array > infrastructure will be duplicated > > > > on second hand - it is true so classic array concat is not optimal for > set of ranges, so some functionality should be redefined every time. > > > > I don't know what is possible, but for me - multiranges is special kind > (subset) of arrays and can be implement as subset of arrays. I remember > other possible kind of arrays - "sets" without duplicates. It is similar > case, I think. > > > > Maybe introduction of multirages as new generic type is bad direction, > and can be better and more enhanceable in future to introduce some like > special kinds of arrays. So for example - unnest can be used directly for > arrays and multiranges too - because there will be common base. > > Well I'm afraid of that too a bit, although I also agree it may be an > opportunity to share some common behavior and implementation. For > example in the discussion about string syntax, I think keeping it the > same as arrays is nicer for people and lets us share more between the > two types. > > That said I don't think a multirange is a subtype of arrays (speaking > as a traditional object-oriented subtype), just something that shares > a lot of the same behavior. I'm inclined to maximize the overlap where > feasible though, e.g. string syntax, UNNEST, indexing, function naming > (`range_length`), etc. Something like Rust traits (or Java interfaces) > seems a closer mental model, not that we have to formalize that > somehow, particularly up front. > A introduction of new generic type can have some other impacts - there can be necessary special support for PL languages. I understand so it is hard to decide - because we miss some more generic base "sets". Probably we cannot to think more about it now, and we have to wait to some patches. Later we can see how much code is duplicated and if it is a problem or not. Regards Pavel > Yours, > Paul >
Re: range_agg
On Tue, Jul 9, 2019 at 12:24 PM Pavel Stehule wrote: > út 9. 7. 2019 v 21:10 odesílatel Pavel Stehule > napsal: >> I afraid so with generic multiragetype there lot of array infrastructure >> will be duplicated > > on second hand - it is true so classic array concat is not optimal for set of > ranges, so some functionality should be redefined every time. > > I don't know what is possible, but for me - multiranges is special kind > (subset) of arrays and can be implement as subset of arrays. I remember other > possible kind of arrays - "sets" without duplicates. It is similar case, I > think. > > Maybe introduction of multirages as new generic type is bad direction, and > can be better and more enhanceable in future to introduce some like special > kinds of arrays. So for example - unnest can be used directly for arrays and > multiranges too - because there will be common base. Well I'm afraid of that too a bit, although I also agree it may be an opportunity to share some common behavior and implementation. For example in the discussion about string syntax, I think keeping it the same as arrays is nicer for people and lets us share more between the two types. That said I don't think a multirange is a subtype of arrays (speaking as a traditional object-oriented subtype), just something that shares a lot of the same behavior. I'm inclined to maximize the overlap where feasible though, e.g. string syntax, UNNEST, indexing, function naming (`range_length`), etc. Something like Rust traits (or Java interfaces) seems a closer mental model, not that we have to formalize that somehow, particularly up front. Yours, Paul
Re: range_agg
On Tue, Jul 9, 2019 at 12:02 PM Jeff Davis wrote: > > - Multirange in/out work just like arrays, e.g. '{"[1,3)", "[5,6)"}' > > It would be cool to have a better text representation. We could go > simple like: > >'[1,3) [5,6)' Will that work with all ranges, even user-defined ones? With a tstzrange[] there is a lot of quoting: =# select array[tstzrange(now(), now() + interval '1 hour')]; array --- {"[\"2019-07-09 12:40:20.794054-07\",\"2019-07-09 13:40:20.794054-07\")"} I'm more inclined to follow the array syntax both because it will be familiar & consistent to users (no need to remember any differences) and because it's already built so we can use it and know it won't have gotchas. > I think "complement" might be a better name than "inverse". > > m1 - m2 = m1 * complement(m2) > > What about "~"? That seems like the right term and a good symbol. Paul
Re: range_agg
út 9. 7. 2019 v 21:10 odesílatel Pavel Stehule napsal: > > > út 9. 7. 2019 v 20:25 odesílatel Jeff Davis napsal: > >> On Tue, 2019-07-09 at 07:08 +0200, Pavel Stehule wrote: >> > >> > I am not against a multirange type, but I miss a explanation why you >> > introduce new kind of types and don't use just array of ranges. >> > >> > Introduction of new kind of types is not like introduction new type. >> >> The biggest benefit, in my opinion, is that it means you can define >> functions/operators that take an "anyrange" and return an >> "anymultirange". That way you don't have to define different functions >> for int4 ranges, date ranges, etc. >> >> > I am not sure how strong is this argument. > > I think so introduction of anyrangearray polymorphic type and enhancing > some type deduction can do same work. > > It starts to get even more complex when you want to add opclasses, etc. >> >> Ranges and arrays are effectively generic types that need a type >> parameter to become a concrete type. Ideally, we'd have first-class >> support for generic types, but I think that's a different topic ;-) > > > I afraid so with generic multiragetype there lot of array infrastructure > will be duplicated > on second hand - it is true so classic array concat is not optimal for set of ranges, so some functionality should be redefined every time. I don't know what is possible, but for me - multiranges is special kind (subset) of arrays and can be implement as subset of arrays. I remember other possible kind of arrays - "sets" without duplicates. It is similar case, I think. Maybe introduction of multirages as new generic type is bad direction, and can be better and more enhanceable in future to introduce some like special kinds of arrays. So for example - unnest can be used directly for arrays and multiranges too - because there will be common base. Regards Pavel > Regards > > Pavel > > >> Regards, >> Jeff Davis >> >> >>
Re: range_agg
út 9. 7. 2019 v 20:25 odesílatel Jeff Davis napsal: > On Tue, 2019-07-09 at 07:08 +0200, Pavel Stehule wrote: > > > > I am not against a multirange type, but I miss a explanation why you > > introduce new kind of types and don't use just array of ranges. > > > > Introduction of new kind of types is not like introduction new type. > > The biggest benefit, in my opinion, is that it means you can define > functions/operators that take an "anyrange" and return an > "anymultirange". That way you don't have to define different functions > for int4 ranges, date ranges, etc. > > I am not sure how strong is this argument. I think so introduction of anyrangearray polymorphic type and enhancing some type deduction can do same work. It starts to get even more complex when you want to add opclasses, etc. > > Ranges and arrays are effectively generic types that need a type > parameter to become a concrete type. Ideally, we'd have first-class > support for generic types, but I think that's a different topic ;-) I afraid so with generic multiragetype there lot of array infrastructure will be duplicated Regards Pavel > Regards, > Jeff Davis > > >