Hi I am looking to your last patch and I have a few questions, notes
1. I am thinking so reduction to only numeric types is not necessary - although we can live without it - but there are lot of non numeric categories: chars, date, ... 2. Still I strongly afraid about used searching method - there is not possible to check a validity of input. Did you check how much linear searching is slower - you spoke, so you have a real data and real use case? Used methods can returns wrong result without any warning, what is acceptable for extensions, but I am not sure, for core feature. 3. I am thinking about name - I don't think so varwidth_bucket is correct -- in relation to name "width_bucket" ... what about "range_bucket" or "scope_bucket" ? last patch is very simple, there are no new compilation or regress tests issues. Regards Pavel 2014-08-25 16:15 GMT+02:00 Petr Jelinek <p...@2ndquadrant.com>: > Hi, > > I finally had some time to get back to this. > > I attached version3 of the patch which "fixes" Tom's complaint about int8 > version by removing the int8 version as it does not seem necessary (the > float8 can handle integers just fine). > > This patch now basically has just one optimized function for float8 and > one for numeric datatypes, just like width_bucket. > > On 08/07/14 02:14, Tom Lane wrote: >> Also, I'm not convinced by this business of throwing an error for a >> NULL array element. Per spec, null arguments to width_bucket() >> produce a null result, not an error --- shouldn't this flavor act >> similarly? In any case I think the test needs to use >> array_contains_nulls() not just ARR_HASNULL. >> > > I really think there should be difference between NULL array and NULL > inside array and that NULL inside the array is wrong input. I changed the > check to array_contains_nulls() though. > > On 08/07/14 02:14, Tom Lane wrote: >> Lastly, the spec defines behaviors for width_bucket that allow either >> ascending or descending buckets. We could possibly do something >> similar >> > > I am not sure it's worth it here as we require input to be sorted anyway. > It might be worthwhile if we decided to do this as an aggregate (since > there input would not be required to be presorted) instead of function but > I am not sure if that's desirable either as that would limit usability to > only the single main use-case (group by and count()). > > > > On 20/07/14 11:01, Simon Riggs wrote: > >> On 16 July 2014 20:35, Pavel Stehule <pavel.steh...@gmail.com> wrote: >> >>> >>> On 08/07/14 02:14, Tom Lane wrote: >>>> >>>>> >>>>> I didn't see any discussion of the naming question in this thread. >>>>> I'd like to propose that it should be just "width_bucket()"; we can >>>>> easily determine which function is meant, considering that the >>>>> SQL-spec variants don't take arrays and don't even have the same >>>>> number of actual arguments. >>>>> >>>> >>>> I did mention in submission that the names are up for discussion, I am >>>> all >>>> for naming it just width_bucket. >>>> >>> >>> I had this idea too - but I am not sure if it is good idea. A distance >>> between ANSI SQL with_bucket and our enhancing is larger than in our >>> implementation of "median" for example. >>> >>> I can live with both names, but current name I prefer. >>> >> >> >> So I suggest that we use the more generic function name bin(), with a >> second choice of discretize() (though that seems annoyingly easy to >> spell incorrectly) >> >> > I really don't think bin() is good choice here, bucket has same meaning in > this context and bin is often used as shorthand for binary which would be > confusing here. > > Anyway I currently left the name as it is, I would not be against using > width_bucket() or even just bucket(), not sure about the discretize() > option. > > > -- > Petr Jelinek http://www.2ndQuadrant.com/ > > PostgreSQL Development, 24x7 Support, Training & Services >