Good job, Jiatao! I appreciate your support for the community!

JiaTao Tao <taojia...@gmail.com> wrote on Wed, Nov 7, 2018 at 9:17 AM:
> Very glad that my reply was helpful. I have already opened a JIRA to add logs
> for "*GTStreamAggregateScanner*", so next time this will be much easier to
> track down :).
>
> cheney <531014...@qq.com> wrote on Tue, Nov 6, 2018 at 11:57 PM:
>
>> Hi JiaTao, thank you very much! The statistics are correct when I set
>> "kylin.query.stream-aggregate-enabled=false".
>> You are right: records are pre-aggregated by GTStreamAggregateScanner.
>>
>> ------------------ Original Message ------------------
>> *From:* "JiaTao Tao" <taojia...@gmail.com>
>> *Sent:* Tue, Nov 6, 2018 10:50 PM
>> *To:* "user" <u...@kylin.apache.org>
>> *Subject:* Re: doubt about measure of processedRowCount
>>
>> One possible place I can find in the code is the use of
>> *GTStreamAggregateScanner* (in "*SegmentCubeTupleIterator.java#111*").
>> It does aggregate in
>> "*GTStreamAggregateScanner.AbstractStreamMergeIterator#next*", so it
>> reduces the number of input rows. But this class prints no logs, as you
>> can see, so it is pretty hard to confirm. Try
>> "kylin.query.stream-aggregate-enabled=false" and run the scenario again to
>> see whether anything changes.
>>
>> cheney <531014...@qq.com> wrote on Mon, Nov 5, 2018 at 6:55 PM:
>>
>>> Yes, the log is as follows.
>>>
>>> 2018-11-02 22:25:34,980 DEBUG [Query 03ea4f21-29ed-4b74-8faa-c57ecd44f412-198914] gtrecord.StorageResponseGTScatter:88 : Using SortMergedPartitionResultIterator to merge 103 partition results
>>> 2018-11-02 22:25:34,982 INFO [Query 03ea4f21-29ed-4b74-8faa-c57ecd44f412-198914] gtrecord.SequentialCubeTupleIterator:73 : Using Iterators.concat *to merge segment results*
>>> 2018-11-02 22:25:34,982 DEBUG [Query 03ea4f21-29ed-4b74-8faa-c57ecd44f412-198914] enumerator.OLAPEnumerator:122 : return TupleIterator...
>>> 2018-11-02 22:25:34,991 INFO [Query 03ea4f21-29ed-4b74-8faa-c57ecd44f412-198914] service.QueryService:897 : *Processed rows for each storageContext*: 366
>>> 2018-11-02 22:25:34,991 INFO [Query 03ea4f21-29ed-4b74-8faa-c57ecd44f412-198914] service.QueryService:422 : Stats of SQL response: isException: false, duration: 20, *total scan count 1552*
>>>
>>> According to the log, *valueA* = 366 and *valueB* = (total scan count) 1552 -
>>> (total aggregated/filtered in HBase) 270 = 1282.
>>> *valueB* is much larger than *valueA*.
>>>
>>> ------------------ Original Message ------------------
>>> *From:* "JiaTao Tao" <taojia...@gmail.com>
>>> *Sent:* Mon, Nov 5, 2018 2:41 PM
>>> *To:* "user" <u...@kylin.apache.org>
>>> *Subject:* Re: doubt about measure of processedRowCount
>>>
>>> Can you grep the logs for lines like "to merge segment results" in that
>>> scenario?
>>>
>>> cheney <531014...@qq.com> wrote on Sat, Nov 3, 2018 at 4:15 PM:
>>>
>>>> Thank you for replying, but I am sure there is only one OlapContext in
>>>> the query in my scenario.
>>>> ---Original---
>>>> *From:* "JiaTao Tao" <taojia...@gmail.com>
>>>> *Date:* Sat, Nov 3, 2018 10:42 AM
>>>> *To:* "user" <u...@kylin.apache.org>
>>>> *Subject:* Re: doubt about measure of processedRowCount
>>>>
>>>> Maybe summing all the *valueA* values would be more appropriate, because
>>>> there may be more than one OlapContext in the query (one OlapContext
>>>> corresponds to one storageContext).
>>>>
>>>> There are two good blog posts about Kylin's query engine that you may
>>>> want to look at :).
>>>>
>>>> https://blog.csdn.net/yu616568/article/details/50838504
>>>>
>>>> https://zhuanlan.zhihu.com/p/30613434
>>>>
>>>> cheney <531014...@qq.com> wrote on Fri, Nov 2, 2018 at 11:10 PM:
>>>>
>>>>> Hi, guys
>>>>>
>>>>> When I execute a SQL query in Kylin, the Kylin server logs some
>>>>> query statistics. For example:
>>>>>
>>>>> "Processed rows for each storageContext: *valueA*". *valueA* is
>>>>> processedRowCount.
>>>>>
>>>>> My understanding is that processedRowCount is the number of rows
>>>>> returned by HBase.
>>>>>
>>>>> The HBase coprocessor logs region stats, including "*Total
>>>>> scanned row*" and "Total filtered/aggred row".
>>>>>
>>>>> For one region, final records returned by HBase = *Total scanned
>>>>> row* - Total filtered/aggred row.
>>>>> Suppose this query needs to scan 10 regions in HBase. We can get the
>>>>> stats for every region, so we can get the total number of records
>>>>> *valueB* returned by HBase by summing the final records of the 10
>>>>> regions.
>>>>>
>>>>> In general, *valueA* is equal to *valueB*, but sometimes *valueB* is
>>>>> much larger than *valueA*. Why?
>>>>>
>>>>
>>>> --
>>>> Regards!
>>>>
>>>> Aron Tao
>>>>
>>>
>>> --
>>> Regards!
>>>
>>> Aron Tao
>>>
>>
>> --
>> Regards!
>>
>> Aron Tao
>>
>
> --
> Regards!
>
> Aron Tao
>

--
Best regards,

Shaofeng Shi 史少锋
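
For readers following the thread, the gap between *valueA* and *valueB* can be illustrated with a small Python sketch. It is not Kylin code: the per-region numbers, the partition contents, and the function names `rows_returned_by_storage` and `stream_aggregate` are all made up for illustration. It only mimics the idea that a streaming merge combines rows sharing the same key, so fewer rows come out than storage handed over.

```python
from heapq import merge
from itertools import groupby

# Hypothetical per-region stats: (total scanned rows, total filtered/aggregated rows).
# The split across regions is invented; only the totals (1552 and 270) echo the thread.
region_stats = [(600, 100), (500, 90), (452, 80)]


def rows_returned_by_storage(stats):
    """valueB in the thread: sum over regions of scanned minus filtered/aggregated."""
    return sum(scanned - dropped for scanned, dropped in stats)


def stream_aggregate(sorted_partitions):
    """Merge key-sorted partitions and sum the values of rows sharing a key.

    This imitates only the general idea of a streaming merge-aggregate;
    it is not the GTStreamAggregateScanner implementation.
    """
    merged = merge(*sorted_partitions, key=lambda row: row[0])
    for key, group in groupby(merged, key=lambda row: row[0]):
        yield key, sum(value for _, value in group)


# Three hypothetical partition results, each sorted by key, with overlapping keys.
partitions = [
    [("a", 1), ("b", 2), ("c", 3)],
    [("a", 10), ("c", 30)],
    [("b", 200), ("d", 400)],
]

rows_in = sum(len(p) for p in partitions)      # rows handed over by storage
rows_out = list(stream_aggregate(partitions))  # rows left after the merge

print(rows_returned_by_storage(region_stats), rows_in, len(rows_out))
```

In this toy input, seven rows go into the merge and four come out, because keys "a", "b" and "c" each appear in two partitions. If the processed-row counter is taken after such a merge, as the thread concludes for the stream-aggregate path, it will be smaller than the per-region sum whenever partitions share keys.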