Glad my reply was helpful. I have already opened a JIRA to add logging to "*GTStreamAggregateScanner*", so next time this should be much easier to track down :).
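For anyone reading this thread later, here is a minimal sketch of why a streaming merge-aggregate step lowers the row count. It is not Kylin's code: the function name `stream_aggregate` and the sample rows are hypothetical, and it only assumes that rows arrive sorted by their dimension key, so rows with equal keys are adjacent and can be merged on the fly.

```python
from itertools import groupby
from operator import itemgetter


def stream_aggregate(sorted_rows):
    """Merge adjacent rows that share a key, summing their measures.

    Illustrative only: assumes rows are (key, measure) pairs that are
    already sorted by key. This is not Kylin's actual implementation.
    """
    for key, group in groupby(sorted_rows, key=itemgetter(0)):
        yield key, sum(measure for _, measure in group)


# Hypothetical rows, as they might look after a sorted merge of partitions.
rows_from_storage = [("a", 1), ("a", 4), ("b", 2), ("b", 3), ("b", 5), ("c", 7)]
aggregated = list(stream_aggregate(rows_from_storage))

# Fewer rows come out than went in, because equal keys were merged.
print(len(rows_from_storage), "rows in,", len(aggregated), "rows out")
```

A counter placed after such a step sees only the merged rows, which would explain a processed-row figure lower than the number of rows the storage layer returned.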
cheney <531014...@qq.com> wrote on Tue, Nov 6, 2018 at 11:57 PM:

> Hi JiaTao, thank you very much! The stats are correct when I set
> "kylin.query.stream-aggregate-enabled=false".
> You are right: the records are pre-aggregated by GTStreamAggregateScanner.
>
>
> ------------------ Original Message ------------------
> *From:* "JiaTao Tao"<taojia...@gmail.com>;
> *Sent:* Tue, Nov 6, 2018, 10:50 PM
> *To:* "user"<u...@kylin.apache.org>;
> *Subject:* Re: doubt about measure of processedRowCount
>
> One possible place I can find in the code is the use of
> *GTStreamAggregateScanner* (in "*SegmentCubeTupleIterator.java#111*").
> You can see that it does aggregate in
> "*GTStreamAggregateScanner.AbstractStreamMergeIterator#next*", so it
> reduces the number of input records. But as you can see, this class
> prints no logs, so it's pretty hard to confirm. Try
> "kylin.query.stream-aggregate-enabled=false" and run the scenario again
> to see whether anything changes.
>
> cheney <531014...@qq.com> wrote on Mon, Nov 5, 2018 at 6:55 PM:
>
>> Yes. The log is as follows.
>>
>> 2018-11-02 22:25:34,980 DEBUG [Query 03ea4f21-29ed-4b74-8faa-c57ecd44f412-198914] gtrecord.StorageResponseGTScatter:88 : Using SortMergedPartitionResultIterator to merge 103 partition results
>> 2018-11-02 22:25:34,982 INFO [Query 03ea4f21-29ed-4b74-8faa-c57ecd44f412-198914] gtrecord.SequentialCubeTupleIterator:73 : Using Iterators.concat *to merge segment results*
>> 2018-11-02 22:25:34,982 DEBUG [Query 03ea4f21-29ed-4b74-8faa-c57ecd44f412-198914] enumerator.OLAPEnumerator:122 : return TupleIterator...
>> 2018-11-02 22:25:34,991 INFO [Query 03ea4f21-29ed-4b74-8faa-c57ecd44f412-198914] service.QueryService:897 : *Processed rows for each storageContext*: 366
>> 2018-11-02 22:25:34,991 INFO [Query 03ea4f21-29ed-4b74-8faa-c57ecd44f412-198914] service.QueryService:422 : Stats of SQL response: isException: false, duration: 20, *total scan count 1552*
>>
>> According to the log, *valueA* = 366.
>> *valueB* = (total scan count) 1552 - (total aggregated/filtered in HBase) 270 = 1282.
>> *valueB* is much larger than *valueA*.
>>
>>
>> ------------------ Original Message ------------------
>> *From:* "JiaTao Tao"<taojia...@gmail.com>;
>> *Sent:* Mon, Nov 5, 2018, 2:41 PM
>> *To:* "user"<u...@kylin.apache.org>;
>> *Subject:* Re: doubt about measure of processedRowCount
>>
>> Can you grep the logs for something like "to merge segment results" in
>> that scenario?
>>
>> cheney <531014...@qq.com> wrote on Sat, Nov 3, 2018 at 4:15 PM:
>>
>>> Thanks for your reply, but I am sure there is only one OlapContext in
>>> the query in my scenario.
>>>
>>> ---Original---
>>> *From:* "JiaTao Tao"<taojia...@gmail.com>
>>> *Date:* Sat, Nov 3, 2018 10:42 AM
>>> *To:* "user"<u...@kylin.apache.org>;
>>> *Subject:* Re: doubt about measure of processedRowCount
>>>
>>> Counting all the *valueA* values may be more appropriate, because
>>> there may be more than one OlapContext in the query (one OlapContext
>>> corresponds to one storageContext).
>>>
>>> There are two good blog posts about Kylin's query engine that you may
>>> want to take a look at :).
>>>
>>> https://blog.csdn.net/yu616568/article/details/50838504
>>>
>>> https://zhuanlan.zhihu.com/p/30613434
>>>
>>> cheney <531014...@qq.com> wrote on Fri, Nov 2, 2018 at 11:10 PM:
>>>
>>>> Hi, guys
>>>>
>>>> When I execute a SQL query in Kylin, the Kylin server logs some
>>>> query statistics. For example:
>>>>
>>>> "Processed rows for each storageContext: *valueA*". *valueA* is
>>>> processedRowCount.
>>>>
>>>> My understanding is that processedRowCount is the number of rows
>>>> returned by HBase.
>>>>
>>>> The HBase coprocessor logs region stats, including "*Total scanned
>>>> row*" and "Total filtered/aggred row".
>>>>
>>>> For one region, the final number of records returned by HBase =
>>>> *Total scanned row* - Total filtered/aggred row.
>>>> Suppose this query needs to scan 10 regions in HBase. We can get the
>>>> stats of every region, so we can get the total number of records
>>>> returned by HBase, *valueB*, by summing the final record counts of
>>>> the 10 regions.
>>>>
>>>> In general *valueA* is equal to *valueB*, but sometimes *valueB* is
>>>> much larger than *valueA*. Why?

--
Regards!

Aron Tao
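
The arithmetic from the original question can be sketched as follows. Only the figures 1552, 270 and 366 come from the log quoted in this thread; the helper name `rows_returned` is hypothetical, and the totals are treated as one combined region because the per-region numbers were not posted.

```python
def rows_returned(region_stats):
    """Sum (scanned - filtered/aggregated) over all regions.

    region_stats is a list of (total_scanned, total_filtered_or_aggregated)
    pairs, one pair per HBase region.
    """
    return sum(scanned - filtered for scanned, filtered in region_stats)


# Totals quoted in the thread, treated here as a single combined region.
value_b = rows_returned([(1552, 270)])

# "Processed rows for each storageContext" from the same log.
value_a = 366

# The gap is the number of rows that disappeared after leaving HBase.
print(value_b, value_a, value_b - value_a)
```

By the conclusion reached in this thread, the gap is attributed to pre-aggregation in GTStreamAggregateScanner on the query server, which is why the two values agreed once "kylin.query.stream-aggregate-enabled=false" was set.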