Re: [HACKERS] Built-in binning functions

Pavel Stehule Fri, 29 Aug 2014 11:23:17 -0700

Hi

I am looking to your last patch and I have a few questions, notes


1. I am thinking so reduction to only numeric types is not necessary -
although we can live without it - but there are lot of non numeric
categories: chars, date, ...

2. Still I strongly afraid about used searching method - there is not
possible to check a  validity of input. Did you check how much linear
searching is slower - you spoke, so you have a real data and real use case?
Used methods can returns wrong result without any warning, what is
acceptable for extensions, but I am not sure, for core feature.

3. I am thinking about name - I don't think so varwidth_bucket is correct
-- in relation to name "width_bucket" ... what about "range_bucket" or
"scope_bucket" ?

last patch is very simple, there are no new compilation or regress tests
issues.

Regards

Pavel





2014-08-25 16:15 GMT+02:00 Petr Jelinek <[email protected]>:

> Hi,
>
> I finally had some time to get back to this.
>
> I attached version3 of the patch which "fixes" Tom's complaint about int8
> version by removing the int8 version as it does not seem necessary (the
> float8 can handle integers just fine).
>
> This patch now basically has just one optimized function for float8 and
> one for numeric datatypes, just like width_bucket.
>
>  On 08/07/14 02:14, Tom Lane wrote:
>> Also, I'm not convinced by this business of throwing an error for a
>> NULL array element.  Per spec, null arguments to width_bucket()
>> produce a null result, not an error --- shouldn't this flavor act
>> similarly?  In any case I think the test needs to use
>> array_contains_nulls() not just ARR_HASNULL.
>>
>
> I really think there should be difference between NULL array and NULL
> inside array and that NULL inside the array is wrong input. I changed the
> check to array_contains_nulls() though.
>
>  On 08/07/14 02:14, Tom Lane wrote:
>> Lastly, the spec defines behaviors for width_bucket that allow either
>> ascending or descending buckets.  We could possibly do something
>> similar
>>
>
> I am not sure it's worth it here as we require input to be sorted anyway.
> It might be worthwhile if we decided to do this as an aggregate (since
> there input would not be required to be presorted) instead of function but
> I am not sure if that's desirable either as that would limit usability to
> only the single main use-case (group by and count()).
>
>
>
> On 20/07/14 11:01, Simon Riggs wrote:
>
>> On 16 July 2014 20:35, Pavel Stehule <[email protected]> wrote:
>>
>>>
>>>  On 08/07/14 02:14, Tom Lane wrote:
>>>>
>>>>>
>>>>> I didn't see any discussion of the naming question in this thread.
>>>>> I'd like to propose that it should be just "width_bucket()"; we can
>>>>> easily determine which function is meant, considering that the
>>>>> SQL-spec variants don't take arrays and don't even have the same
>>>>> number of actual arguments.
>>>>>
>>>>
>>>> I did mention in submission that the names are up for discussion, I am
>>>> all
>>>> for naming it just width_bucket.
>>>>
>>>
>>> I had this idea too - but I am not sure if it is good idea. A distance
>>> between ANSI SQL with_bucket and our enhancing is larger than in our
>>> implementation of "median" for example.
>>>
>>> I can live with both names, but current name I prefer.
>>>
>>
>>
>> So I suggest that we use the more generic function name bin(), with a
>> second choice of discretize()  (though that seems annoyingly easy to
>> spell incorrectly)
>>
>>
> I really don't think bin() is good choice here, bucket has same meaning in
> this context and bin is often used as shorthand for binary which would be
> confusing here.
>
> Anyway I currently left the name as it is, I would not be against using
> width_bucket() or even just bucket(), not sure about the discretize()
> option.
>
>
> --
>  Petr Jelinek                  http://www.2ndQuadrant.com/
>
>  PostgreSQL Development, 24x7 Support, Training & Services
>

Re: [HACKERS] Built-in binning functions

Reply via email to