I can see the need from the user's perspective. Let me look at the query parsing logic again and see whether any tweak is possible.
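For illustration, Yerui's first suggestion in the quoted thread is to let MeasureTypeFactory choose a measure type by funcName plus returnType. That can be sketched as a lookup keyed on both values. This is a minimal Java sketch under stated assumptions, not Kylin's actual factory. The class name MeasureTypeRegistrySketch, its methods, and the return-type strings ("hllc12", "bitmap", "topn100") are hypothetical. The sketch also shows the problem Li Yang raises: a query such as count(distinct seller_id) supplies only the function name, so the query side cannot choose between two COUNT_DISTINCT entries.

```java
import java.util.HashMap;
import java.util.Map;

public class MeasureTypeRegistrySketch {

    // Hypothetical composite key: aggregate function name plus declared return type.
    record Key(String funcName, String returnType) {}

    private final Map<Key, String> registry = new HashMap<>();

    // Registers a measure type, identified here only by its class name as a string.
    void register(String funcName, String returnType, String measureTypeName) {
        registry.put(new Key(funcName, returnType), measureTypeName);
    }

    // Cube-build side: a measure definition carries both funcName and returnType,
    // so the lookup is unambiguous.
    String lookup(String funcName, String returnType) {
        String type = registry.get(new Key(funcName, returnType));
        if (type == null) {
            throw new IllegalArgumentException("no measure type for " + funcName + " / " + returnType);
        }
        return type;
    }

    // Query side: SQL such as count(distinct seller_id) yields only funcName,
    // so any result above 1 means the query alone cannot pick a measure type.
    long candidatesForQuery(String funcName) {
        return registry.keySet().stream()
                .filter(k -> k.funcName().equals(funcName))
                .count();
    }

    public static void main(String[] args) {
        MeasureTypeRegistrySketch r = new MeasureTypeRegistrySketch();
        r.register("COUNT_DISTINCT", "hllc12", "HLLCMeasureType");
        r.register("COUNT_DISTINCT", "bitmap", "BitmapMeasureType");
        r.register("TOP_N", "topn100", "TopNMeasureType");
        System.out.println(r.lookup("COUNT_DISTINCT", "bitmap"));   // BitmapMeasureType
        System.out.println(r.candidatesForQuery("COUNT_DISTINCT")); // 2
    }
}
```

The build side resolves BitmapMeasureType unambiguously. The query side, however, finds two candidates for COUNT_DISTINCT, which is why the thread moves toward either a separate UDF or a single aggregate function.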
On Fri, Dec 11, 2015 at 7:59 AM, Luke Han <[email protected]> wrote:
> It should be transparent to users; they should always use "count(distinct
> seller_id)".
>
> How about one setting value when the user picks "DistinctCount"? We already
> have the error range, so it should be easy to have one more option, say
> "Precise" (but yes, we also have to display a warning message about its
> disadvantage). Then at the code level, it could be easy to handle as Yerui
> mentioned.
>
> Thanks.
>
> Best Regards!
> ---------------------
>
> Luke Han
>
> On Thu, Dec 10, 2015 at 7:33 PM, Yerui Sun <[email protected]> wrote:
>
> > You’re right, I overlooked that we can’t get the return type from the query
> > context.
> >
> > I’m not familiar with Calcite UDF. Do you mean a new SQL syntax like
> > “count (distinct_precise seller_id)”? That’s not transparent to users, so it
> > doesn't seem the best way.
> >
> > Another way is to keep mapping count distinct queries to one aggr func, and
> > make that func handle a variety of ValueTypes. For example, abstract a count
> > distinct measure type called ‘CountDistinctMeasureType’ as the parent of
> > HLLCMeasureType and BitmapMeasureType, and map all count distinct queries
> > to ‘CountDistinctAggFunc’, with the abstract class ‘CountDistinctCounter’ as
> > the add() and merge() parameter type. When this aggr func is called, the
> > processing depends on the value type, e.g. HLLCounter or BitmapCounter.
> >
> > I’m not sure whether I’ve described it clearly. Actually, I have implemented
> > bitmap count distinct in 1.x-staging this way, with hll count distinct still
> > working. Maybe I could implement it in 2.x-staging with your refactoring, and
> > we could review the code later?
> >
> > > On Dec 10, 2015, at 18:23, Li Yang <[email protected]> wrote:
> > >
> > > I've considered exactly the same point. It does not work when mapping a
> > > query to the aggregation functions. A query will simply say "count
> > > (distinct seller_id)", and won't mention any return type.
> > >
> > > The way out is to add a new aggregation for your count distinct using a
> > > Calcite UDF; then it can be correctly mapped. I don't have an example yet,
> > > so we need to do some exploration here. Actually, I hope to use your case
> > > as an example. :-)
> > >
> > > On Thu, Dec 10, 2015 at 4:25 PM, Yerui Sun <[email protected]> wrote:
> > >
> > >> Really great job, Yang!
> > >>
> > >> I have a question about the MeasureTypeFactory. In the current 2.x-staging
> > >> code, two built-in measure types (hll and topn) are registered, and the
> > >> factory creates the corresponding MeasureType only by funcName
> > >> (‘COUNT_DISTINCT’ for hll and ‘TOP_N’ for topn).
> > >> However, if I want to create a new measure type with the same funcName,
> > >> that’s impossible. For example, I want to create a bitmap measure with
> > >> funcName ‘COUNT_DISTINCT’, the same as the hll measure's funcName.
> > >>
> > >> One possible way is for the factory to create the measure type based not
> > >> only on funcName but also on returnType, making it possible to map one
> > >> funcName to multiple measures.
> > >> In other words, we could define measure types in the factory using
> > >> funcName and returnType, instead of only funcName as now.
> > >>
> > >> Does this make sense to you? Looking forward to your comments.
> > >>
> > >>> On Dec 10, 2015, at 14:57, Li Yang <[email protected]> wrote:
> > >>>
> > >>>> Would it be possible to create a How-to guide on the ability to add
> > >>>> custom aggregates into Kylin?
> > >>>
> > >>> Definitely! I should spend some time on documentation in the coming
> > >>> days. Many features have been added to 2.x. Since we're aiming to release
> > >>> a 2.0 beta soon, it's time to work on documentation. :-)
> > >>>
> > >>>> Where are the custom aggregates computed: on the Kylin Service or on
> > >>>> HBase CoProcessors?
> > >>>
> > >>> The aggregation takes place in MR during cube build, then in the
> > >>> CoProcessor and the query service during query.
> > >>> Originally I hoped users could add a new aggregation just by dropping in
> > >>> a jar file and some configuration. However, it turns out to take more
> > >>> than that because of the CoProcessor... Anyway, it's a lot more
> > >>> developer-friendly now.
> > >>>
> > >>> On Thu, Dec 10, 2015 at 2:14 PM, hongbin ma <[email protected]> wrote:
> > >>>
> > >>>> hi seshu
> > >>>>
> > >>>> yang's work is more of a framework. it reduces the effort for developers
> > >>>> who want to add new custom aggregations. Since some of the aggregation
> > >>>> happens in coprocessors, we cannot completely get rid of re-compiling &
> > >>>> re-deploying. If someone from the community is interested in crafting a
> > >>>> new aggregation, he/she can take a look at how the HLL/TOPN aggregations
> > >>>> are implemented.
> > >>>>
> > >>>> On Wed, Dec 9, 2015 at 9:43 PM, Adunuthula, Seshu <[email protected]>
> > >>>> wrote:
> > >>>>
> > >>>>> Yang,
> > >>>>>
> > >>>>> Would it be possible to create a How-to guide on the ability to add
> > >>>>> custom aggregates into Kylin? Javadocs are good, but to encourage
> > >>>>> community participation we should make it more easily consumable.
> > >>>>>
> > >>>>> Where are the custom aggregates computed: on the Kylin Service or on
> > >>>>> HBase CoProcessors?
> > >>>>>
> > >>>>> Regards
> > >>>>> Seshu Adunuthula.
> > >>>>>
> > >>>>> On 12/8/15, 6:18 AM, "Adunuthula, Seshu" <[email protected]> wrote:
> > >>>>>
> > >>>>>> This is awesome!
> > >>>>>>
> > >>>>>> On 12/8/15, 6:05 AM, "Shi, Shaofeng" <[email protected]> wrote:
> > >>>>>>
> > >>>>>>> This is another important refactoring since making the build/query
> > >>>>>>> engines pluggable. Thanks Yang!
> > >>>>>>>
> > >>>>>>> On 12/8/15, 5:47 PM, "Li Yang" <[email protected]> wrote:
> > >>>>>>>
> > >>>>>>>> This is a bump of KYLIN-976 in case you are not yet aware...
> > >>>>>>>>
> > >>>>>>>> KYLIN-976 is a refactoring of how Kylin works with aggregation and
> > >>>>>>>> aims to allow adding custom aggregation types easily.
> > >>>>>>>>
> > >>>>>>>> Kylin started with basic support of SUM, COUNT, MAX, MIN, AVG (from
> > >>>>>>>> sum and count), and COUNT_DISTINCT (based on HyperLogLog). Later TopN
> > >>>>>>>> was added in the 2.x branch, and the list is sure to keep growing.
> > >>>>>>>> Xiaoyu is working on storing raw records as a special type of measure
> > >>>>>>>> (KYLIN-1122), and Yerui is working on precise count distinct using
> > >>>>>>>> bitmap (KYLIN-1186).
> > >>>>>>>>
> > >>>>>>>> The possibilities are unlimited. Implementing a domain-specific
> > >>>>>>>> aggregation is now quite easy, e.g. aggregating user events to detect
> > >>>>>>>> time series or access patterns, drawing a sketch of certain user
> > >>>>>>>> groups, pre-calculating clusters of data points, or building
> > >>>>>>>> histograms... Use your imagination.
> > >>>>>>>>
> > >>>>>>>> Anyone interested can take a peek at MeasureTypeFactory and
> > >>>>>>>> MeasureType on the 2.x branch. The API may still change, but it is
> > >>>>>>>> stable enough for pilots. The javadoc should get you started.
> > >>>>>>>> HLLCMeasureType and TopNMeasureType are two good examples.
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>> Cheers
> > >>>>>>>> Yang
> > >>>>
> > >>>>
> > >>>> --
> > >>>> Regards,
> > >>>>
> > >>>> *Bin Mahone | 马洪宾*
> > >>>> Apache Kylin: http://kylin.io
> > >>>> Github: https://github.com/binmahone
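Yerui's alternative in the thread keeps a single COUNT_DISTINCT aggregate function and lets the counter type decide how values are combined. The following is a minimal Java sketch of that idea, under stated assumptions. CountDistinctCounter, BitmapCounter, HLLCounter and CountDistinctAggFunc follow the names used in the thread, but their bodies are hypothetical and are not Kylin's classes. The precise counter uses java.util.BitSet and assumes non-negative int ids. The approximate counter is a bare-bones HyperLogLog (2^p registers, a MurmurHash3-style 64-bit finalizer, and linear counting for small cardinalities) standing in for Kylin's HLLCounter.

```java
import java.util.BitSet;

public class CountDistinctSketch {

    // Hypothetical common parent: the aggregate function only sees this type.
    abstract static class CountDistinctCounter {
        abstract void add(int value);
        abstract void merge(CountDistinctCounter other);
        abstract long getCount();
    }

    // Precise counter backed by a bitmap; assumes non-negative int ids.
    static class BitmapCounter extends CountDistinctCounter {
        private final BitSet bits = new BitSet();
        @Override void add(int value) { bits.set(value); }
        @Override void merge(CountDistinctCounter other) { bits.or(((BitmapCounter) other).bits); }
        @Override long getCount() { return bits.cardinality(); }
    }

    // Approximate counter: a bare-bones HyperLogLog with 2^p one-byte registers.
    static class HLLCounter extends CountDistinctCounter {
        private final int p;
        private final byte[] registers;

        HLLCounter(int p) {
            this.p = p;
            this.registers = new byte[1 << p];
        }

        // MurmurHash3 fmix64 finalizer, used here as a cheap 64-bit hash.
        private static long hash(long z) {
            z ^= z >>> 33;
            z *= 0xff51afd7ed558ccdL;
            z ^= z >>> 33;
            z *= 0xc4ceb9fe1a85ec53L;
            return z ^ (z >>> 33);
        }

        @Override void add(int value) {
            long h = hash(value);
            int idx = (int) (h >>> (64 - p));  // top p bits pick a register
            // Rank = position of the first 1-bit in the remaining 64 - p bits.
            int rank = Math.min(Long.numberOfLeadingZeros(h << p) + 1, 64 - p + 1);
            if (rank > registers[idx]) registers[idx] = (byte) rank;
        }

        @Override void merge(CountDistinctCounter other) {
            byte[] o = ((HLLCounter) other).registers;  // register-wise max = set union
            for (int i = 0; i < registers.length; i++) {
                registers[i] = (byte) Math.max(registers[i], o[i]);
            }
        }

        @Override long getCount() {
            int m = registers.length;
            double sum = 0;
            int zeros = 0;
            for (byte r : registers) {
                sum += Math.pow(2, -r);
                if (r == 0) zeros++;
            }
            double estimate = 0.7213 / (1 + 1.079 / m) * m * m / sum;
            if (estimate <= 2.5 * m && zeros > 0) {
                estimate = m * Math.log((double) m / zeros);  // linear counting for small sets
            }
            return Math.round(estimate);
        }
    }

    // Hypothetical aggregate function written only against CountDistinctCounter.
    static class CountDistinctAggFunc {
        // Folds one stored counter into the accumulator, mutating acc in place.
        static CountDistinctCounter add(CountDistinctCounter acc, CountDistinctCounter value) {
            if (acc == null) return value;
            acc.merge(value);
            return acc;
        }

        static CountDistinctCounter merge(CountDistinctCounter a, CountDistinctCounter b) {
            return add(a, b);
        }

        static long result(CountDistinctCounter acc) {
            return acc == null ? 0 : acc.getCount();
        }
    }

    public static void main(String[] args) {
        BitmapCounter b1 = new BitmapCounter();
        BitmapCounter b2 = new BitmapCounter();
        for (int i = 0; i < 100; i++) b1.add(i);   // ids 0..99
        for (int i = 50; i < 150; i++) b2.add(i);  // ids 50..149
        CountDistinctCounter acc = CountDistinctAggFunc.add(null, b1);
        acc = CountDistinctAggFunc.add(acc, b2);
        System.out.println(CountDistinctAggFunc.result(acc));  // 150, exact

        HLLCounter h1 = new HLLCounter(12);
        HLLCounter h2 = new HLLCounter(12);
        for (int i = 0; i < 6000; i++) h1.add(i);
        for (int i = 4000; i < 10000; i++) h2.add(i);
        // Approximate estimate of the 10000 distinct ids in the union.
        System.out.println(CountDistinctAggFunc.result(CountDistinctAggFunc.merge(h1, h2)));
    }
}
```

CountDistinctAggFunc never inspects the concrete counter, so the same query-side mapping works whichever counter the cube stores. The sketch assumes both merge operands have the same concrete type, as they would within one measure. Mixing them throws ClassCastException.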
