You’re right, I overlooked that the return type can’t be obtained from the query context.

I’m not familiar with Calcite UDFs. Do you mean a new SQL syntax like “count 
(distinct_precise seller_id)”? That’s not transparent to the user, so it doesn’t 
seem like the best way.

Another way is to keep mapping count distinct queries to a single aggregation 
function, and make that function handle a variety of value types. For example, 
we could abstract a count distinct measure type called ‘CountDistinctMeasureType’ 
as the parent of HLLCMeasureType and BitmapMeasureType, and map all count 
distinct queries to ‘CountDistinctAggFunc’, with the abstract class 
‘CountDistinctCounter’ as the parameter type of add() and merge(). When this 
aggregation function is called, the processing depends on the concrete value 
type, such as HLLCounter or BitmapCounter.
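
To make the idea concrete, here is a rough sketch in plain Java. It is not the actual Kylin code: the method signatures, the int-valued add(), and the SetBackedCounter class are all assumptions for illustration. SetBackedCounter is only an exact stand-in marking where an HLL-based counter would plug in, and java.util.BitSet stands in for a real bitmap implementation.

```java
import java.util.BitSet;
import java.util.HashSet;
import java.util.Set;

// Hypothetical parent type shared by every count distinct counter.
abstract class CountDistinctCounter {
    abstract void add(int value);
    abstract void merge(CountDistinctCounter other);
    abstract long getCount();
}

// Precise counter: one bit per non-negative integer id.
class BitmapCounter extends CountDistinctCounter {
    private final BitSet bits = new BitSet();

    @Override
    void add(int value) {
        bits.set(value); // BitSet rejects negative ids with IndexOutOfBoundsException
    }

    @Override
    void merge(CountDistinctCounter other) {
        if (!(other instanceof BitmapCounter)) {
            throw new IllegalArgumentException("cannot merge different counter types");
        }
        bits.or(((BitmapCounter) other).bits);
    }

    @Override
    long getCount() {
        return bits.cardinality();
    }
}

// Stand-in for the slot an HLL-based counter would fill (exact, NOT hyperloglog).
class SetBackedCounter extends CountDistinctCounter {
    private final Set<Integer> seen = new HashSet<>();

    @Override
    void add(int value) {
        seen.add(value);
    }

    @Override
    void merge(CountDistinctCounter other) {
        if (!(other instanceof SetBackedCounter)) {
            throw new IllegalArgumentException("cannot merge different counter types");
        }
        seen.addAll(((SetBackedCounter) other).seen);
    }

    @Override
    long getCount() {
        return seen.size();
    }
}

// Single aggregation function: it only sees the abstract type, so the
// concrete counter decides how merging and counting are done.
class CountDistinctAggFunc {
    static CountDistinctCounter merge(CountDistinctCounter acc, CountDistinctCounter next) {
        if (acc == null) {
            return next; // first value becomes the accumulator
        }
        acc.merge(next);
        return acc;
    }

    static long result(CountDistinctCounter acc) {
        return acc == null ? 0 : acc.getCount();
    }
}

class CountDistinctDemo {
    public static void main(String[] args) {
        BitmapCounter a = new BitmapCounter();
        a.add(1);
        a.add(2);
        BitmapCounter b = new BitmapCounter();
        b.add(2);
        b.add(3);
        CountDistinctCounter acc = CountDistinctAggFunc.merge(null, a);
        acc = CountDistinctAggFunc.merge(acc, b);
        System.out.println(CountDistinctAggFunc.result(acc)); // prints 3
    }
}
```

The point of the sketch is that the query layer never needs to know the return type: it always calls the same function, and the counter instance stored in the cube carries the type.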

I’m not sure whether I’ve described it clearly. Actually, I have already 
implemented bitmap count distinct in 1.x-staging this way, while keeping hll 
count distinct working. Maybe I could implement it in 2.x-staging on top of your 
refactoring, and we could review the code later?

> On Dec 10, 2015, at 18:23, Li Yang <[email protected]> wrote:
> 
> I've considered exactly the same point. It does not work when mapping a
> query to the aggregation functions. A query will simply say "count
> (distinct seller_id)", and won't mention any return type.
> 
> The way out is adding a new aggregation for your count distinct using
> Calcite UDF, then it can be correctly mapped. I don't have an example yet,
> so we need to do some exploration here. Actually I hope to use your case as
> an example.  :-)
> 
> 
> 
> On Thu, Dec 10, 2015 at 4:25 PM, Yerui Sun <[email protected]> wrote:
> 
>> It’s a really great job, Yang!
>> 
>> I have a question about the MeasureTypeFactory. In the current 2.x-staging
>> code, two built-in measure types (hll and topn) are registered, and the
>> factory creates the corresponding MeasureType only by funcName
>> (‘COUNT_DISTINCT’ for hll and ‘TOP_N’ for topn).
>> However, if I want to create a new measure type with the same funcName,
>> that’s impossible. For example, I want to create a bitmap measure with
>> funcName ‘COUNT_DISTINCT’, the same as the hll measure's funcName.
>> 
>> One possible way is for the factory to create the measure type based not
>> only on funcName but also on returnType, so that one funcName can map to
>> multiple measure types.
>> In other words, we could define the measure type in the factory using
>> funcName and returnType, instead of only funcName as it is now.
>> 
>> Do you think this makes sense? Looking forward to your comments.
>> 
>>> On Dec 10, 2015, at 14:57, Li Yang <[email protected]> wrote:
>>> 
>>>> Would it be possible to create a How to guide on ability to add custom
>>>> aggregates into Kylin
>>> 
>>> Definitely! I should spend some time on documentation in the following
>>> days. Many features have been added to 2.x. Aiming to release a 2.0 beta
>>> soon, it's time to work on documentation. :-)
>>> 
>>>> Where are the custom aggregates computed on the Kylin Service or on
>>>> Hbase CoProcessors?
>>> 
>>> The aggregation takes place in MR during cube build, then in CoProcessor
>>> and query service during query. Originally I hoped user can add new
>>> aggregation by just dropping a jar ball and some configuration. However
>>> it turns out to be more than that due to CoProcessor... Anyway, it's a lot
>>> more friendly to developers now.
>>> 
>>> On Thu, Dec 10, 2015 at 2:14 PM, hongbin ma <[email protected]> wrote:
>>> 
>>>> hi seshu
>>>> 
>>>> yang's work is more of a framework. it reduces developers' efforts if
>>>> he/she wants to add a new custom aggregation. Since some of the
>>>> aggregations happen in coprocessors, we cannot completely get rid of
>>>> re-compiling & re-deploying. If someone from the community is
>>>> interested in crafting a new aggregation, he/she can take a look at how
>>>> HLL/TOPN aggregation is implemented.
>>>> 
>>>> On Wed, Dec 9, 2015 at 9:43 PM, Adunuthula, Seshu <[email protected]>
>>>> wrote:
>>>> 
>>>>> Yang,
>>>>> 
>>>>> Would it be possible to create a How to guide on ability to add custom
>>>>> aggregates into Kylin. Javadocs are good, but to encourage community
>>>>> participation we should make it more easily consumable.
>>>>> 
>>>>> Where are the custom aggregates computed on the Kylin Service or on
>>>>> Hbase CoProcessors?
>>>>> 
>>>>> Regards
>>>>> Seshu Adunuthula.
>>>>> 
>>>>> On 12/8/15, 6:18 AM, "Adunuthula, Seshu" <[email protected]> wrote:
>>>>> 
>>>>>> This is awesome!
>>>>>> 
>>>>>> On 12/8/15, 6:05 AM, "Shi, Shaofeng" <[email protected]> wrote:
>>>>>> 
>>>>>>> This is another important refactor since making the build/query
>>>>>>> engines pluggable. Thanks Yang!
>>>>>>> 
>>>>>>> On 12/8/15, 5:47 PM, "Li Yang" <[email protected]> wrote:
>>>>>>> 
>>>>>>>> This is a bump of KYLIN-976 in case you are not yet aware...
>>>>>>>> 
>>>>>>>> KYLIN-976 is a refactoring of how Kylin works with aggregation and
>>>>>>>> aims to allow adding custom aggregation types easily.
>>>>>>>> 
>>>>>>>> Kylin started with basic support of SUM, COUNT, MAX, MIN, AVG (from
>>>>>>>> sum and count), and COUNT_DISTINCT (based on hyperloglog). Later TopN
>>>>>>>> is added in 2.x branch. And the list is growing for sure. Xiaoyu is
>>>>>>>> working on storing raw records as a special type of measure
>>>>>>>> (KYLIN-1122), also Yerui is working on precise count distinct using
>>>>>>>> bitmap (KYLIN-1186).
>>>>>>>> 
>>>>>>>> The possibility is unlimited. Implementing a domain specific
>>>>>>>> aggregation is now quite easy. E.g. aggregate user events to detect
>>>>>>>> time series or access patterns. Or draw a sketch of certain user
>>>>>>>> groups. Or pre-calculate clusters of data points. Or histogram... Use
>>>>>>>> your imagination.
>>>>>>>> 
>>>>>>>> Whoever interested can peek at MeasureTypeFactory and MeasureType on
>>>>>>>> 2.x branch. The API may still change, but at the same time is stable
>>>>>>>> enough for pilots. The javadoc should get you started. HLLCMeasureType
>>>>>>>> and TopNMeasureType are two good examples.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Cheers
>>>>>>>> Yang
>>>>>>> 
>>>>>> 
>>>>> 
>>>>> 
>>>> 
>>>> 
>>>> --
>>>> Regards,
>>>> 
>>>> *Bin Mahone | 马洪宾*
>>>> Apache Kylin: http://kylin.io
>>>> Github: https://github.com/binmahone
>>>> 
>> 
>> 
