Re: Bump KYLIN-976

Li Yang Thu, 10 Dec 2015 19:36:46 -0800

I can see the need from the user's perspective. Let me look again at the query
parsing logic and see if any tweak is possible.


On Fri, Dec 11, 2015 at 7:59 AM, Luke Han <[email protected]> wrote:

> It should be transparent to users; they should always use "count(distinct
> seller_id)".
>
> How about one setting value when the user picks "DistinctCount"? We already
> have error range, so it should be easy to add one more option, say "Precise"
> (but yes, we also have to display a warning message about the disadvantage
> of this). Then at the code level, it could be easy to handle, as Yerui
> mentioned.
>
> Thanks.
>
>
>
>
> Best Regards!
> ---------------------
>
> Luke Han
>
> On Thu, Dec 10, 2015 at 7:33 PM, Yerui Sun <[email protected]> wrote:
>
> > You’re right, I overlooked that we can’t get the return type from the
> > query context.
> >
> > I’m not familiar with Calcite UDF. Do you mean a new SQL syntax like
> > “count (distinct_precise seller_id)”? That’s not transparent to the user,
> > so it doesn’t seem like the best way.
> >
> > Another way is to still map count distinct queries to one aggr func, and
> > make the func able to handle a variety of ValueType. For example, abstract
> > a count distinct measure type called ‘CountDistinctMeasureType’ as the
> > parent of HLLCMeasureType and BitmapMeasureType, and map all count distinct
> > queries to ‘CountDistinctAggFunc’, with the abstract class
> > ‘CountDistinctCounter’ as the add() and merge() parameter type. When this
> > aggr func is called, the processing depends on the value type, like
> > HLLCounter or BitmapCounter.
> >
> > I’m not sure whether I’ve described it clearly. Actually I have implemented
> > bitmap count distinct in 1.x-staging this way, keeping hll count distinct
> > still working. Maybe I could implement it in 2.x-staging with your
> > refactoring, and we could review the code later?
> >
> > > On Dec 10, 2015, at 18:23, Li Yang <[email protected]> wrote:
> > >
> > > I've considered exactly the same point. It does not work when mapping a
> > > query to the aggregation functions. A query will simply say "count
> > > (distinct seller_id)", and won't mention any return type.
> > >
> > > The way out is adding a new aggregation for your count distinct using
> > > Calcite UDF, then it can be correctly mapped. I don't have an example
> > > yet, so we need to do some exploration here. Actually I hope to use your
> > > case as an example.  :-)
> > >
> > >
> > >
> > > On Thu, Dec 10, 2015 at 4:25 PM, Yerui Sun <[email protected]> wrote:
> > >
> > >> It’s really a great job, Yang!
> > >>
> > >> I have a question about the MeasureTypeFactory. In the current
> > >> 2.x-staging code, two built-in measure types (hll and topn) are
> > >> registered, and the factory creates the corresponding MeasureType only
> > >> by funcName (‘COUNT_DISTINCT’ for hll and ‘TOP_N’ for topn).
> > >> However, if I want to create a new measure type with the same funcName,
> > >> that’s impossible. For example, I want to create a bitmap measure with
> > >> funcName ‘COUNT_DISTINCT’, the same as the hll measure's funcName.
> > >>
> > >> One possible way is for the factory to create the measure type based
> > >> not only on funcName but also on returnType, so that mapping one
> > >> funcName to multiple measure types is possible.
> > >> In other words, we could define the measure type in the factory using
> > >> funcName and returnType, instead of only funcName as it is now.
> > >>
> > >> Does this make sense to you? Looking forward to your comments.
> > >>
> > >>> On Dec 10, 2015, at 14:57, Li Yang <[email protected]> wrote:
> > >>>
> > >>>> Would it be possible to create a How to guide on ability to add
> > >>>> custom aggregates into Kylin
> > >>>
> > >>> Definitely! I should spend some time on documentation in the
> > >>> following days. Many features have been added to 2.x. Aiming to
> > >>> release a 2.0 beta soon, it's time to work on documentation. :-)
> > >>>
> > >>>> Where are the custom aggregates computed on the Kylin Service or on
> > >>>> Hbase CoProcessors?
> > >>>
> > >>> The aggregation takes place in MR during cube build, then in
> > >>> CoProcessor and query service during query. Originally I hoped users
> > >>> could add a new aggregation by just dropping in a jar and some
> > >>> configuration. However it turns out to be more than that due to
> > >>> CoProcessor... Anyway, it's a lot more friendly to developers now.
> > >>>
> > >>> On Thu, Dec 10, 2015 at 2:14 PM, hongbin ma <[email protected]> wrote:
> > >>>
> > >>>> hi seshu
> > >>>>
> > >>>> yang's work is more of a framework. it reduces a developer's effort
> > >>>> if he/she wants to add a new custom aggregation. Since some of the
> > >>>> aggregation happens in coprocessors, we cannot completely get rid of
> > >>>> re-compiling & re-deploying. If someone from the community is
> > >>>> interested in crafting a new aggregation, he/she can take a look at
> > >>>> how HLL/TOPN aggregation is implemented.
> > >>>>
> > >>>> On Wed, Dec 9, 2015 at 9:43 PM, Adunuthula, Seshu <[email protected]>
> > >>>> wrote:
> > >>>>
> > >>>>> Yang,
> > >>>>>
> > >>>>> Would it be possible to create a How to guide on ability to add
> > >>>>> custom aggregates into Kylin. Javadocs are good, but to encourage
> > >>>>> community participation we should make it more easily consumable.
> > >>>>>
> > >>>>> Where are the custom aggregates computed on the Kylin Service or on
> > >>>>> Hbase CoProcessors?
> > >>>>>
> > >>>>> Regards
> > >>>>> Seshu Adunuthula.
> > >>>>>
> > >>>>> On 12/8/15, 6:18 AM, "Adunuthula, Seshu" <[email protected]> wrote:
> > >>>>>
> > >>>>>> This is awesome!
> > >>>>>>
> > >>>>>> On 12/8/15, 6:05 AM, "Shi, Shaofeng" <[email protected]> wrote:
> > >>>>>>
> > >>>>>>> This is another important refactor since making the build/query
> > >>>>>>> engines pluggable. Thanks Yang!
> > >>>>>>>
> > >>>>>>> On 12/8/15, 5:47 PM, "Li Yang" <[email protected]> wrote:
> > >>>>>>>
> > >>>>>>>> This is a bump of KYLIN-976 in case you are not yet aware...
> > >>>>>>>>
> > >>>>>>>> KYLIN-976 is a refactoring of how Kylin works with aggregation
> > >>>>>>>> and aims to allow adding custom aggregation types easily.
> > >>>>>>>>
> > >>>>>>>> Kylin started with basic support of SUM, COUNT, MAX, MIN, AVG
> > >>>>>>>> (from sum and count), and COUNT_DISTINCT (based on hyperloglog).
> > >>>>>>>> Later TopN was added in the 2.x branch. And the list is growing
> > >>>>>>>> for sure. Xiaoyu is working on storing raw records as a special
> > >>>>>>>> type of measure (KYLIN-1122), and Yerui is working on precise
> > >>>>>>>> count distinct using bitmap (KYLIN-1186).
> > >>>>>>>>
> > >>>>>>>> The possibilities are unlimited. Implementing a domain specific
> > >>>>>>>> aggregation is now quite easy. E.g. aggregate user events to
> > >>>>>>>> detect time series or access patterns. Or draw a sketch of
> > >>>>>>>> certain user groups. Or pre-calculate clusters of data points. Or
> > >>>>>>>> histogram... Use your imagination.
> > >>>>>>>>
> > >>>>>>>> Whoever is interested can peek at MeasureTypeFactory and
> > >>>>>>>> MeasureType on the 2.x branch. The API may still change, but at
> > >>>>>>>> the same time is stable enough for pilots. The javadoc should get
> > >>>>>>>> you started. HLLCMeasureType and TopNMeasureType are two good
> > >>>>>>>> examples.
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>> Cheers
> > >>>>>>>> Yang
> > >>>>>>>
> > >>>>>>
> > >>>>>
> > >>>>>
> > >>>>
> > >>>>
> > >>>> --
> > >>>> Regards,
> > >>>>
> > >>>> *Bin Mahone | 马洪宾*
> > >>>> Apache Kylin: http://kylin.io
> > >>>> Github: https://github.com/binmahone
> > >>>>
> > >>
> > >>
> >
> >
>
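
The factory idea raised in the thread (selecting a measure type by funcName plus returnType rather than funcName alone) might look roughly like the sketch below. Everything here is hypothetical: SketchMeasureType, MeasureKey and SketchMeasureRegistry are illustrative stand-ins, not Kylin's MeasureTypeFactory or MeasureType API, and the return type strings "hllc" and "bitmap" are assumed for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in for a measure type; NOT Kylin's real MeasureType.
interface SketchMeasureType {
    String describe();
}

// Composite lookup key: function name plus declared return type.
final class MeasureKey {
    final String funcName;
    final String returnType;

    MeasureKey(String funcName, String returnType) {
        this.funcName = funcName;
        this.returnType = returnType;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof MeasureKey)) {
            return false;
        }
        MeasureKey other = (MeasureKey) o;
        return funcName.equals(other.funcName) && returnType.equals(other.returnType);
    }

    @Override
    public int hashCode() {
        return Objects.hash(funcName, returnType);
    }
}

// Hypothetical registry: one funcName can map to several measure types,
// as long as each is registered under a different returnType.
final class SketchMeasureRegistry {
    private final Map<MeasureKey, SketchMeasureType> types = new HashMap<>();

    void register(String funcName, String returnType, SketchMeasureType type) {
        types.put(new MeasureKey(funcName, returnType), type);
    }

    SketchMeasureType lookup(String funcName, String returnType) {
        SketchMeasureType type = types.get(new MeasureKey(funcName, returnType));
        if (type == null) {
            throw new IllegalArgumentException(
                    "No measure type for " + funcName + " / " + returnType);
        }
        return type;
    }
}
```

With such a registry, COUNT_DISTINCT could be registered twice, once per return type. As pointed out in the thread, the SQL text itself only says "count(distinct seller_id)", so at query time the return type would have to come from somewhere other than the query.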

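The alternative described in the thread, a single count distinct aggregation function working against an abstract counter type, could be sketched as follows. This is only an illustration under assumptions: the class names are hypothetical and are not the CountDistinctCounter, HLLCounter or BitmapCounter mentioned above, the bitmap is a plain java.util.BitSet over non-negative int ids, and an HLL-backed subclass is omitted for brevity.

```java
import java.util.BitSet;

// Hypothetical common parent for all count distinct counters.
abstract class SketchDistinctCounter {
    abstract void add(int value);

    abstract void merge(SketchDistinctCounter other);

    abstract long count();
}

// Precise counter backed by a bitmap; assumes values are non-negative ids.
final class SketchBitmapCounter extends SketchDistinctCounter {
    private final BitSet bits = new BitSet();

    @Override
    void add(int value) {
        bits.set(value); // setting the same bit twice has no further effect
    }

    @Override
    void merge(SketchDistinctCounter other) {
        if (!(other instanceof SketchBitmapCounter)) {
            throw new IllegalArgumentException("Cannot merge different counter types");
        }
        bits.or(((SketchBitmapCounter) other).bits);
    }

    @Override
    long count() {
        return bits.cardinality();
    }
}

// Hypothetical aggregation function: it only sees the abstract type, so the
// actual processing depends on which counter implementation it is given.
final class SketchCountDistinctAgg {
    private final SketchDistinctCounter sum;

    SketchCountDistinctAgg(SketchDistinctCounter empty) {
        this.sum = empty;
    }

    void aggregate(SketchDistinctCounter value) {
        sum.merge(value);
    }

    long result() {
        return sum.count();
    }
}
```

The aggregator never names a concrete counter, which is what would keep "count(distinct ...)" transparent to the user while the value type decides between approximate and precise counting.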