I've considered exactly the same approach. It breaks down when mapping a query to the aggregation functions: a query simply says "count(distinct seller_id)" and never mentions a return type.
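The following is a minimal sketch of the funcName + returnType lookup proposed in the message quoted below, and of why it helps only at build time. Everything in it is hypothetical. MeasureTypeRegistry, its methods, the "hllc(12)" return-type format and the BitmapMeasureType label are illustrative stand-ins, not Kylin's MeasureTypeFactory API, and the measure types are just strings. The lookup is unambiguous when the cube's measure definition supplies a return type. At query time, when the SQL only says count(distinct seller_id), the same lookup yields several candidates.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical registry that resolves a measure type from (funcName, returnType).
 * Illustrative only; this is not Kylin's MeasureTypeFactory API.
 */
public class MeasureTypeRegistry {

    // Key is "FUNC_NAME:returntypebase"; TreeMap keeps candidate order deterministic.
    private final Map<String, String> types = new TreeMap<>();

    private static String key(String funcName, String returnType) {
        // Drop a parameter suffix such as "(12)" so "hllc(10)" and "hllc(12)" share one entry.
        int paren = returnType.indexOf('(');
        String base = paren < 0 ? returnType : returnType.substring(0, paren);
        return funcName.toUpperCase(Locale.ROOT) + ":" + base.toLowerCase(Locale.ROOT);
    }

    public void register(String funcName, String returnType, String measureType) {
        types.put(key(funcName, returnType), measureType);
    }

    // Cube build: the measure definition carries both the function name and a return type.
    public String resolveForBuild(String funcName, String returnType) {
        String type = types.get(key(funcName, returnType));
        if (type == null) {
            throw new IllegalArgumentException(
                    "No measure type for " + funcName + " returning " + returnType);
        }
        return type;
    }

    // Query: SQL such as count(distinct seller_id) supplies only the function name,
    // so every measure type registered under that name is a candidate.
    public List<String> candidatesForQuery(String funcName) {
        String prefix = funcName.toUpperCase(Locale.ROOT) + ":";
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, String> entry : types.entrySet()) {
            if (entry.getKey().startsWith(prefix)) {
                result.add(entry.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        MeasureTypeRegistry registry = new MeasureTypeRegistry();
        registry.register("COUNT_DISTINCT", "hllc", "HLLCMeasureType");
        registry.register("COUNT_DISTINCT", "bitmap", "BitmapMeasureType");
        registry.register("TOP_N", "topn", "TopNMeasureType");

        // Build time: unambiguous, prints HLLCMeasureType
        System.out.println(registry.resolveForBuild("COUNT_DISTINCT", "hllc(12)"));
        // Query time: ambiguous, prints [BitmapMeasureType, HLLCMeasureType]
        System.out.println(registry.candidatesForQuery("COUNT_DISTINCT"));
    }
}
```

Keying on the return type resolves the build side. The query side still needs something in the SQL itself, such as a distinct function name, to pick one candidate.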
The way out is to add a new aggregation function for your count distinct as a Calcite UDF, so that the query can be mapped correctly. I don't have an example yet, so we need to do some exploration here. Actually, I hope to use your case as the example. :-)

On Thu, Dec 10, 2015 at 4:25 PM, Yerui Sun wrote:
> It's really a great job, Yang!
>
> I have a question about MeasureTypeFactory. In the current 2.x-staging code, two built-in measure types (hll and topn) are registered, and the factory creates the corresponding MeasureType by funcName only ('COUNT_DISTINCT' for hll and 'TOP_N' for topn). However, if I want to create a new measure type with the same funcName, that's impossible. For example, I want to create a bitmap measure with funcName 'COUNT_DISTINCT', the same funcName as the hll measure.
>
> One possible way is for the factory to create the measure type based not only on funcName but also on returnType, which would let one funcName map to multiple measure types. In other words, we could define measure types in the factory by funcName and returnType, instead of by funcName only as now.
>
> Does this make sense? Looking forward to your comments.
>
> On Dec 10, 2015, at 14:57, Li Yang wrote:
>>
>>> Would it be possible to create a How-to guide on the ability to add custom aggregates into Kylin?
>>
>> Definitely! I should spend some time on documentation in the following days. Many features have been added to 2.x. We aim to release a 2.0 beta soon, so it's time to work on documentation. :-)
>>
>>> Where are the custom aggregates computed, on the Kylin service or in HBase coprocessors?
>>
>> The aggregation takes place in MR during cube build, then in the coprocessor and the query service during query. Originally I hoped users could add a new aggregation just by dropping in a jar and some configuration. However, it turns out to be more than that because of the coprocessor... Anyway, it's a lot more friendly to developers now.
>>
>> On Thu, Dec 10, 2015 at 2:14 PM, hongbin ma wrote:
>>
>>> Hi Seshu,
>>>
>>> Yang's work is more of a framework. It reduces developers' effort if they want to add new custom aggregations. Since some of the aggregation happens in coprocessors, we cannot completely get rid of re-compiling and re-deploying. If someone from the community is interested in crafting a new aggregation, they can take a look at how the HLL/TOPN aggregations are implemented.
>>>
>>> On Wed, Dec 9, 2015 at 9:43 PM, Adunuthula, Seshu wrote:
>>>
>>>> Yang,
>>>>
>>>> Would it be possible to create a How-to guide on the ability to add custom aggregates into Kylin? Javadocs are good, but to encourage community participation we should make it more easily consumable.
>>>>
>>>> Where are the custom aggregates computed, on the Kylin service or in HBase coprocessors?
>>>>
>>>> Regards
>>>> Seshu Adunuthula.
>>>>
>>>> On 12/8/15, 6:18 AM, "Adunuthula, Seshu" wrote:
>>>>
>>>>> This is awesome!
>>>>>
>>>>> On 12/8/15, 6:05 AM, "Shi, Shaofeng" wrote:
>>>>>
>>>>>> This is another important refactoring since making the build/query engines pluggable. Thanks Yang!
>>>>>>
>>>>>> On 12/8/15, 5:47 PM, "Li Yang" wrote:
>>>>>>
>>>>>>> This is a bump of KYLIN-976 in case you are not yet aware...
>>>>>>>
>>>>>>> KYLIN-976 is a refactoring of how Kylin works with aggregation and aims to allow adding custom aggregation types easily.
>>>>>>>
>>>>>>> Kylin started with basic support of SUM, COUNT, MAX, MIN, AVG (from sum and count), and COUNT_DISTINCT (based on HyperLogLog). Later TopN was added in the 2.x branch. And the list is growing for sure. Xiaoyu is working on storing raw records as a special type of measure (KYLIN-1122), and Yerui is working on precise count distinct using bitmap (KYLIN-1186).
>>>>>>>
>>>>>>> The possibilities are unlimited. Implementing a domain-specific aggregation is now quite easy: e.g. aggregate user events to detect time series or access patterns, draw a sketch of certain user groups, pre-calculate clusters of data points, build a histogram... Use your imagination.
>>>>>>>
>>>>>>> Whoever is interested can peek at MeasureTypeFactory and MeasureType on the 2.x branch. The API may still change, but at the same time it is stable enough for pilots. The javadoc should get you started. HLLCMeasureType and TopNMeasureType are two good examples.
>>>>>>>
>>>>>>> Cheers
>>>>>>> Yang
>>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>> Regards,
>>>
>>> *Bin Mahone | 马洪宾*
>>> Apache Kylin: http://kylin.io
>>> Github: https://github.com/binmahone
>>
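The thread suggests two things: defining the count distinct as a Calcite UDF, and building a precise count distinct on a bitmap (KYLIN-1186). Below is a minimal sketch of such an aggregate. It is written in the init/add/merge/result method shape that Calcite's reflective user-defined aggregate support (AggregateFunctionImpl) looks for. The class name PreciseCountDistinctAggFunc is hypothetical. java.util.BitSet stands in for a compressed bitmap, so the sketch only accepts non-negative int ids. How such a class would be registered with Kylin's query engine is not shown and is left as an assumption.

```java
import java.util.BitSet;

/**
 * Hypothetical precise COUNT DISTINCT aggregate in the init/add/merge/result
 * shape used by Calcite's reflective UDAF support. BitSet stands in for a
 * compressed bitmap and only handles non-negative int ids.
 */
public class PreciseCountDistinctAggFunc {

    // Creates an empty accumulator for one group.
    public static BitSet init() {
        return new BitSet();
    }

    // Called once per input row: sets the bit for this id.
    public static BitSet add(BitSet acc, int id) {
        if (id < 0) {
            throw new IllegalArgumentException("ids must be non-negative: " + id);
        }
        acc.set(id);
        return acc;
    }

    // Combines two partial accumulators without mutating either input.
    public static BitSet merge(BitSet a, BitSet b) {
        BitSet merged = (BitSet) a.clone();
        merged.or(b);
        return merged;
    }

    // Exact number of distinct ids seen.
    public static long result(BitSet acc) {
        return acc.cardinality();
    }

    public static void main(String[] args) {
        BitSet left = init();
        for (int id : new int[] {3, 7, 7, 42}) {
            left = add(left, id);
        }
        BitSet right = init();
        for (int id : new int[] {7, 100}) {
            right = add(right, id);
        }
        System.out.println(result(left));               // 3
        System.out.println(result(merge(left, right))); // 4
    }
}
```

Because merge is a bitwise OR, it is associative and commutative. Partial bitmaps can therefore be combined in any order and at any stage, which is the kind of property the multi-stage aggregation described in the thread (cube build, coprocessor, query service) depends on. Unlike HyperLogLog, the result is exact, at the cost of storage that grows with the id range.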
