[ 
https://issues.apache.org/jira/browse/KYLIN-2438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15858054#comment-15858054
 ] 

hongbin ma commented on KYLIN-2438:
-----------------------------------

+1 
happy to remove dependency on row size estimation

> replace scan threshold with max scan bytes
> ------------------------------------------
>
>                 Key: KYLIN-2438
>                 URL: https://issues.apache.org/jira/browse/KYLIN-2438
>             Project: Kylin
>          Issue Type: Improvement
>          Components: Query Engine, Storage - HBase
>    Affects Versions: v1.6.0
>            Reporter: Dayue Gao
>            Assignee: Dayue Gao
>
> In order to guard against bad queries that can consume lots of memory and 
> potentially crash kylin / hbase server, kylin limits the maximum number of 
> rows a query can scan. The maximum value is chosen based on two configs:
> # *kylin.query.scan.threshold* is used if the query doesn't contain 
> memory-hungry metrics
> # *kylin.query.mem.budget* / estimated_row_size is used otherwise as the 
> per-region maximum.
> This approach, however, has several deficiencies:
> * It doesn't work with complex, varlen metrics very well. The estimated 
> threshold could be either too small or too large. If it's too small, good 
> queries are killed. If it's too large, bad queries are not banned.
> * Row count doesn't correspond to memory consumption, so it's difficult to 
> determine how large the scan threshold should be.
> * kylin.query.scan.threshold can't be overridden at cube level.
> In this JIRA, I propose to replace the current row-count-based threshold with 
> a more intuitive size-based threshold:
> * KYLIN-2437 will collect the number of bytes scanned at both region and 
> query level
> * A new configuration *kylin.query.max-scan-bytes* will be added to limit 
> the maximum number of bytes a query can scan
> * *kylin.query.mem.budget* will be renamed to 
> *kylin.storage.hbase.coprocessor-max-scan-bytes*, which applies the limit at 
> region level. There is no need to rely on row size estimates any more.
> * The above two configs can be overridden at cube level
> * The old *kylin.query.scan.threshold* will be deprecated
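
For illustration only, the size-based limit proposed above can be sketched as a running byte counter that is checked after every row. This is not Kylin's implementation: the names ScanByteBudget, ScanLimitExceeded and scan are hypothetical, and rows are modelled as plain byte strings. The point is that the limit is enforced on bytes actually read, so no row size estimate is needed.

```python
class ScanLimitExceeded(Exception):
    """Raised when a scan reads more bytes than its budget allows."""


class ScanByteBudget:
    """Tracks bytes scanned against a fixed maximum (hypothetical helper)."""

    def __init__(self, max_scan_bytes):
        self.max_scan_bytes = max_scan_bytes
        self.scanned_bytes = 0

    def add(self, row_bytes):
        # Count the actual size of each row instead of estimating it up front.
        self.scanned_bytes += row_bytes
        if self.scanned_bytes > self.max_scan_bytes:
            raise ScanLimitExceeded(
                f"scanned {self.scanned_bytes} bytes, "
                f"limit is {self.max_scan_bytes}"
            )


def scan(rows, budget):
    """Return the rows read; raises once the byte budget is exceeded."""
    result = []
    for row in rows:
        budget.add(len(row))
        result.append(row)
    return result
```

Under the same assumptions, one budget per region and one per query would correspond to the two limits described in the issue.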



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
