Re: doubt about measure of processedRowCount

JiaTao Tao Tue, 06 Nov 2018 17:18:01 -0800

Glad my reply was helpful. I have already opened a JIRA to add logging to
"*GTStreamAggregateScanner*", so next time this should be much easier to
track down :).
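
For anyone reading this thread later, here is a toy Python sketch of why the
two numbers can diverge. It is not Kylin's code: `rows_returned`,
`stream_aggregate`, and the sample partitions are made up for illustration,
and only the 1552 and 270 figures come from the log quoted below. The
assumption is that rows sharing the same dimension key across the sorted
partition results are merged into one row before they are counted, which is
what the stream aggregation discussed below appears to do.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def rows_returned(region_stats):
    """Sum (scanned - filtered/aggregated) over regions: the thread's valueB."""
    return sum(scanned - dropped for scanned, dropped in region_stats)


def stream_aggregate(partitions):
    """Merge key-sorted partitions and sum the measures of rows sharing a key."""
    # heapq.merge keeps the combined stream sorted by key, so equal keys
    # from different partitions end up next to each other.
    merged = heapq.merge(*partitions, key=itemgetter(0))
    for key, group in groupby(merged, key=itemgetter(0)):
        yield key, sum(value for _, value in group)


# Totals taken from the log in this thread, treated as a single region.
value_b = rows_returned([(1552, 270)])

# Hypothetical data: two partition results, each sorted by dimension key.
partitions = [
    [("a", 1), ("b", 2), ("c", 3)],
    [("a", 10), ("c", 30), ("d", 40)],
]
rows_in = sum(len(p) for p in partitions)      # rows coming from storage
rows_out = list(stream_aggregate(partitions))  # rows left after merging

print(value_b, rows_in, len(rows_out))
```

In the toy data, fewer rows come out of the merge than went in, which is the
same kind of gap as between *valueB* and *valueA*.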


cheney <531014...@qq.com> wrote on Tue, Nov 6, 2018 at 11:57 PM:

> Hi JiaTao, thank you very much! The statistics are correct when I set
> "kylin.query.stream-aggregate-enabled=false".
> You are right: the records are pre-aggregated by GTStreamAggregateScanner.
>
>
> ------------------ Original Message ------------------
> *From:* "JiaTao Tao"<taojia...@gmail.com>;
> *Sent:* Tue, Nov 6, 2018, 10:50 PM
> *To:* "user"<u...@kylin.apache.org>;
> *Subject:* Re: doubt about measure of processedRowCount
>
> One possible place I found in the code is the use of
> *GTStreamAggregateScanner* (in "*SegmentCubeTupleIterator.java#111*").
> It does aggregate, in
> "*GTStreamAggregateScanner.AbstractStreamMergeIterator#next*", so it
> reduces the number of input rows. But as you can see, this class does no
> logging, so that is hard to confirm. Try setting
> "kylin.query.stream-aggregate-enabled=false" and run the scenario again to
> see whether anything changes.
>
> cheney <531014...@qq.com> wrote on Mon, Nov 5, 2018 at 6:55 PM:
>
>> Yes, the log is as follows.
>>
>> 2018-11-02 22:25:34,980 DEBUG [Query
>> 03ea4f21-29ed-4b74-8faa-c57ecd44f412-198914]
>> gtrecord.StorageResponseGTScatter:88 : Using
>> SortMergedPartitionResultIterator to merge 103 partition results
>> 2018-11-02 22:25:34,982 INFO  [Query
>> 03ea4f21-29ed-4b74-8faa-c57ecd44f412-198914]
>> gtrecord.SequentialCubeTupleIterator:73 : Using Iterators.concat *to
>> merge segment results*
>> 2018-11-02 22:25:34,982 DEBUG [Query
>> 03ea4f21-29ed-4b74-8faa-c57ecd44f412-198914] enumerator.OLAPEnumerator:122
>> : return TupleIterator...
>> 2018-11-02 22:25:34,991 INFO  [Query
>> 03ea4f21-29ed-4b74-8faa-c57ecd44f412-198914] service.QueryService:897 : 
>> *Processed
>> rows for each storageContext*: 366
>> 2018-11-02 22:25:34,991 INFO  [Query
>> 03ea4f21-29ed-4b74-8faa-c57ecd44f412-198914] service.QueryService:422 :
>> Stats of SQL response: isException: false, duration: 20, *total scan
>> count 1552*
>>
>> According to the log, *valueA* = 366 and *valueB* = 1552 (total scan
>> count) - 270 (total aggregated/filtered in HBase) = 1282.
>> *valueB* is much larger than *valueA*.
>>
>>
>>
>> ------------------ Original Message ------------------
>> *From:* "JiaTao Tao"<taojia...@gmail.com>;
>> *Sent:* Mon, Nov 5, 2018, 2:41 PM
>> *To:* "user"<u...@kylin.apache.org>;
>> *Subject:* Re: doubt about measure of processedRowCount
>>
>> Can you grep logs like "to merge segment results" in that scenario?
>>
>> cheney <531014...@qq.com> wrote on Sat, Nov 3, 2018 at 4:15 PM:
>>
>>> Thank you for replying, but I am sure there is only one OlapContext in
>>> the query in my scenario.
>>> ---Original---
>>> *From:* "JiaTao Tao"<taojia...@gmail.com>
>>> *Date:* Sat, Nov 3, 2018 10:42 AM
>>> *To:* "user"<u...@kylin.apache.org>;
>>> *Subject:* Re: doubt about measure of processedRowCount
>>>
>>> Maybe summing all the *valueA* values would be more appropriate, because
>>> there may be more than one OlapContext in the query (one OlapContext
>>> corresponds to one storageContext).
>>>
>>> There are two good blog posts about Kylin's query engine; you may want
>>> to take a look :).
>>>
>>> https://blog.csdn.net/yu616568/article/details/50838504
>>>
>>> https://zhuanlan.zhihu.com/p/30613434
>>>
>>> cheney <531014...@qq.com> wrote on Fri, Nov 2, 2018 at 11:10 PM:
>>>
>>>> Hi, guys
>>>>
>>>> When I execute a SQL query in Kylin, the Kylin server logs some query
>>>> statistics. For example:
>>>>
>>>> "Processed rows for each storageContext: *valueA*". *valueA* is
>>>> processedRowCount.
>>>>
>>>> My understanding is that processedRowCount is the number of rows
>>>> returned by HBase.
>>>>
>>>> The HBase coprocessor logs region stats, including "*Total scanned
>>>> row*" and "Total filtered/aggred row".
>>>>
>>>> For one region, final records returned by HBase = *Total scanned row*
>>>> - Total filtered/aggred row.
>>>> Suppose this query needs to scan 10 regions in HBase. We can get the
>>>> stats of every region, so we can get the total number of records,
>>>> *valueB*, returned by HBase by summing the final records of all 10
>>>> regions.
>>>>
>>>> In general, *valueA* is equal to *valueB*, but sometimes *valueB* is
>>>> much larger than *valueA*. Why?
>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>>
>>>
>>> Regards!
>>>
>>> Aron Tao
>>>
>>
>>
>> --
>>
>>
>> Regards!
>>
>> Aron Tao
>>
>
>
> --
>
>
> Regards!
>
> Aron Tao
>


-- 


Regards!

Aron Tao
