Re: Bump KYLIN-976

Li Yang Thu, 10 Dec 2015 02:23:59 -0800

I've considered exactly the same point. It does not work when mapping a
query to the aggregation functions: a query simply says
"count(distinct seller_id)" and says nothing about the return type.
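A minimal Java sketch of the (funcName, returnType) registry proposed in the quoted message makes the problem concrete. `MeasureKey`, `MeasureRegistry` and the return-type strings are hypothetical illustrations, not Kylin's actual `MeasureTypeFactory` API. Lookup is exact when a cube is defined, but a query supplies only the function name, so both measures registered under `COUNT_DISTINCT` come back as candidates.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical key for the proposed registry: (funcName, returnType).
// Illustration only; not Kylin's MeasureTypeFactory API.
final class MeasureKey {
    final String funcName;
    final String returnType;

    MeasureKey(String funcName, String returnType) {
        this.funcName = funcName;
        this.returnType = returnType;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof MeasureKey)) return false;
        MeasureKey k = (MeasureKey) o;
        return funcName.equals(k.funcName) && returnType.equals(k.returnType);
    }

    @Override
    public int hashCode() {
        return Objects.hash(funcName, returnType);
    }
}

final class MeasureRegistry {
    // Maps (funcName, returnType) to a measure type name; insertion order is kept.
    private final Map<MeasureKey, String> types = new LinkedHashMap<>();

    void register(String funcName, String returnType, String measureType) {
        types.put(new MeasureKey(funcName, returnType), measureType);
    }

    // Cube definition time: funcName and returnType are both known, so the lookup is exact.
    String forCubeDefinition(String funcName, String returnType) {
        return types.get(new MeasureKey(funcName, returnType));
    }

    // Query time: SQL such as "count(distinct seller_id)" carries only the
    // function name, so every measure registered under it is a candidate.
    List<String> forQuery(String funcName) {
        List<String> candidates = new ArrayList<>();
        for (Map.Entry<MeasureKey, String> e : types.entrySet()) {
            if (e.getKey().funcName.equals(funcName)) {
                candidates.add(e.getValue());
            }
        }
        return candidates;
    }
}
```

Registering `("COUNT_DISTINCT", "hllc(12)")` and `("COUNT_DISTINCT", "bitmap")` and then calling `forQuery("COUNT_DISTINCT")` yields both `hll` and `bitmap`; nothing in the SQL text says which one to use.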


The way out is to add a new aggregation function for your count distinct
as a Calcite UDF; then the query can be mapped correctly. I don't have an
example yet, so we need to do some exploration here. Actually, I hope to
use your case as the example. :-)
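One way to picture the suggestion: give the precise count distinct its own SQL function name and implement it as a user-defined aggregate. To our understanding, Calcite can register a user-defined aggregate from a class exposing `init`, `add`, `merge` and `result` methods; the sketch below follows that shape but does not show the registration itself. The class name is hypothetical, and `java.util.BitSet` stands in for a real bitmap library, assuming the column values are non-negative int IDs (for example, dictionary-encoded).

```java
import java.util.BitSet;

// Hypothetical precise count-distinct aggregate in the init/add/merge/result
// shape (assumed to match Calcite's user-defined aggregate convention).
// BitSet stands in for a real bitmap; values are assumed to be non-negative int IDs.
final class PreciseCountDistinctAgg {
    // Returns an empty accumulator.
    public static BitSet init() {
        return new BitSet();
    }

    // Marks one value as seen and returns the same accumulator.
    public static BitSet add(BitSet acc, int id) {
        acc.set(id);
        return acc;
    }

    // Combines two partial accumulators into a new one (inputs are not modified).
    public static BitSet merge(BitSet a, BitSet b) {
        BitSet merged = (BitSet) a.clone();
        merged.or(b);
        return merged;
    }

    // Final answer: the number of distinct values seen.
    public static long result(BitSet acc) {
        return acc.cardinality();
    }
}
```

Because the query would name this function explicitly, e.g. `precise_count_distinct(seller_id)` (a hypothetical name), it maps to exactly one measure type without needing a return type.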



On Thu, Dec 10, 2015 at 4:25 PM, Yerui Sun <[email protected]> wrote:

> It’s really a great job, Yang!
>
> I have a question about the MeasureTypeFactory. In the current 2.x-staging
> code, two built-in measure types (hll and topn) are registered, and the
> factory creates the corresponding MeasureType by funcName only
> (‘COUNT_DISTINCT’ for hll and ‘TOP_N’ for topn).
> However, if I want to create a new measure type with the same funcName,
> that’s impossible. For example, I want to create a bitmap measure with the
> funcName ‘COUNT_DISTINCT’, the same as the hll measure's funcName.
>
> One possible way is for the factory to create the measure type based not
> only on funcName but also on returnType, making it possible to map one
> funcName to multiple measure types.
> In other words, we could define the measure type in the factory using
> funcName and returnType, instead of only funcName as now.
>
> Does this make sense to you? Looking forward to your comments.
>
> > On Dec 10, 2015, at 14:57, Li Yang <[email protected]> wrote:
> >
> >> Would it be possible to create a how-to guide on adding custom
> >> aggregates to Kylin?
> >
> > Definitely! I should spend some time on documentation in the following
> > days. Many features have been added to 2.x. We are aiming to release a
> > 2.0 beta soon, so it's time to work on documentation. :-)
> >
> >> Where are the custom aggregates computed: on the Kylin service or in
> >> HBase coprocessors?
> >
> > The aggregation takes place in MR during cube build, then in the
> > coprocessor and the query service during query. Originally I hoped users
> > could add a new aggregation just by dropping in a jar file and some
> > configuration. However, it turns out to be more than that because of the
> > coprocessor... Anyway, it's a lot more friendly to developers now.
> >
> > On Thu, Dec 10, 2015 at 2:14 PM, hongbin ma <[email protected]> wrote:
> >
> >> Hi Seshu,
> >>
> >> Yang's work is more of a framework. It reduces developers' effort if
> >> he/she wants to add new custom aggregations. Since some of the
> >> aggregations happen in coprocessors, we cannot completely get rid of
> >> recompiling and redeploying. If someone from the community is interested
> >> in crafting a new aggregation, he/she can take a look at how the HLL/TOPN
> >> aggregations are implemented.
> >>
> >> On Wed, Dec 9, 2015 at 9:43 PM, Adunuthula, Seshu <[email protected]> wrote:
> >>
> >>> Yang,
> >>>
> >>> Would it be possible to create a how-to guide on adding custom
> >>> aggregates to Kylin? Javadocs are good, but to encourage community
> >>> participation we should make it more easily consumable.
> >>>
> >>> Where are the custom aggregates computed: on the Kylin service or in
> >>> HBase coprocessors?
> >>>
> >>> Regards
> >>> Seshu Adunuthula.
> >>>
> >>> On 12/8/15, 6:18 AM, "Adunuthula, Seshu" <[email protected]> wrote:
> >>>
> >>>> This is awesome!
> >>>>
> >>>> On 12/8/15, 6:05 AM, "Shi, Shaofeng" <[email protected]> wrote:
> >>>>
> >>>>> This is another important refactoring since the one that made the
> >>>>> build/query engines pluggable. Thanks Yang!
> >>>>>
> >>>>> On 12/8/15, 5:47 PM, "Li Yang" <[email protected]> wrote:
> >>>>>
> >>>>>> This is a bump of KYLIN-976 in case you are not yet aware...
> >>>>>>
> >>>>>> KYLIN-976 is a refactoring of how Kylin works with aggregation and
> >>>>>> aims to allow adding custom aggregation types easily.
> >>>>>>
> >>>>>> Kylin started with basic support for SUM, COUNT, MAX, MIN, AVG (from
> >>>>>> sum and count), and COUNT_DISTINCT (based on HyperLogLog). Later,
> >>>>>> TopN was added in the 2.x branch, and the list is certainly growing.
> >>>>>> Xiaoyu is working on storing raw records as a special type of measure
> >>>>>> (KYLIN-1122), and Yerui is working on precise count distinct using
> >>>>>> bitmap (KYLIN-1186).
> >>>>>>
> >>>>>> The possibilities are unlimited. Implementing a domain-specific
> >>>>>> aggregation is now quite easy. E.g. aggregate user events to detect
> >>>>>> time series or access patterns. Or draw a sketch of certain user
> >>>>>> groups. Or pre-calculate clusters of data points. Or a histogram...
> >>>>>> Use your imagination.
> >>>>>>
> >>>>>> Anyone interested can take a peek at MeasureTypeFactory and
> >>>>>> MeasureType on the 2.x branch. The API may still change, but it is
> >>>>>> already stable enough for pilots. The javadoc should get you started.
> >>>>>> HLLCMeasureType and TopNMeasureType are two good examples.
> >>>>>>
> >>>>>>
> >>>>>> Cheers
> >>>>>> Yang
> >>>>>
> >>>>
> >>>
> >>>
> >>
> >>
> >> --
> >> Regards,
> >>
> >> *Bin Mahone | 马洪宾*
> >> Apache Kylin: http://kylin.io
> >> Github: https://github.com/binmahone
> >>
>
>
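In the thread, Yang explains that a measure is aggregated in MR during the cube build and then again in the coprocessor and the query service at query time. A custom measure's state therefore has to merge correctly no matter where partial results are combined. The Java sketch below uses a fixed-bucket histogram, one of the examples in the original KYLIN-976 announcement, to illustrate that requirement; `HistogramState` and its methods are hypothetical and not part of Kylin's `MeasureType` API.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical fixed-bucket histogram measure state (illustration only; not
// Kylin's MeasureType API). Partial states can be merged at any stage.
final class HistogramState {
    private final double min;
    private final double width;
    private final long[] counts;

    HistogramState(double min, double max, int buckets) {
        if (buckets <= 0 || !(max > min)) {
            throw new IllegalArgumentException("need buckets > 0 and max > min");
        }
        this.min = min;
        this.width = (max - min) / buckets;
        this.counts = new long[buckets];
    }

    // Empty state with the same bucket layout as `layout`.
    private HistogramState(HistogramState layout) {
        this.min = layout.min;
        this.width = layout.width;
        this.counts = new long[layout.counts.length];
    }

    // Records one raw value; values outside [min, max) fall into the edge buckets.
    void add(double value) {
        int i = (int) Math.floor((value - min) / width);
        i = Math.max(0, Math.min(counts.length - 1, i));
        counts[i]++;
    }

    // Adds another partial state's counts into this one (same layout required).
    void merge(HistogramState other) {
        if (other.min != min || other.width != width || other.counts.length != counts.length) {
            throw new IllegalArgumentException("bucket layouts differ");
        }
        for (int i = 0; i < counts.length; i++) {
            counts[i] += other.counts[i];
        }
    }

    // Folds partial states into a new state; the same step could run at build
    // time, in a coprocessor, or in the query server.
    static HistogramState mergeAll(List<HistogramState> parts) {
        if (parts.isEmpty()) {
            throw new IllegalArgumentException("nothing to merge");
        }
        HistogramState total = new HistogramState(parts.get(0));
        for (HistogramState p : parts) {
            total.merge(p);
        }
        return total;
    }

    long[] counts() {
        return Arrays.copyOf(counts, counts.length);
    }
}
```

Merging per-mapper states into per-region states, then merging the regions in the query server, gives the same bucket counts as adding every raw value to a single histogram, because element-wise addition is associative and commutative.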
