[ https://issues.apache.org/jira/browse/KYLIN-2438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15858054#comment-15858054 ]
hongbin ma commented on KYLIN-2438:
-----------------------------------

+1 happy to remove dependency on row size estimation

> replace scan threshold with max scan bytes
> ------------------------------------------
>
>                 Key: KYLIN-2438
>                 URL: https://issues.apache.org/jira/browse/KYLIN-2438
>             Project: Kylin
>          Issue Type: Improvement
>          Components: Query Engine, Storage - HBase
>    Affects Versions: v1.6.0
>            Reporter: Dayue Gao
>            Assignee: Dayue Gao
>
> In order to guard against bad queries that can consume lots of memory and
> potentially crash the Kylin / HBase server, Kylin limits the maximum number
> of rows a query can scan. The maximum value is chosen based on two configs:
> # *kylin.query.scan.threshold* is used if the query doesn't contain
> memory-hungry metrics
> # *kylin.query.mem.budget* / estimated_row_size is used otherwise as the
> per-region maximum.
> This approach, however, has several deficiencies:
> * It doesn't work very well with complex, varlen metrics. The estimated
> threshold could be either too small or too large. If it's too small, good
> queries are killed. If it's too large, bad queries are not banned.
> * Row count doesn't correspond to memory consumption, so it's difficult to
> determine how large the scan threshold should be.
> * kylin.query.scan.threshold can't be overridden at cube level.
> In this JIRA, I propose to replace the current row-count-based threshold
> with a more intuitive size-based threshold:
> * KYLIN-2437 will collect the number of bytes scanned at both region and
> query level
> * A new configuration *kylin.query.max-scan-bytes* will be added to limit
> the maximum number of bytes a query can scan
> * *kylin.query.mem.budget* will be renamed to
> *kylin.storage.hbase.coprocessor-max-scan-bytes*, which applies the limit at
> region level. No need to rely on estimations about row size any more.
> * The above two configs can be overridden at cube level
> * The old *kylin.query.scan.threshold* will be deprecated

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
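
To illustrate the difference between the two approaches described in the issue, here is a minimal sketch in Python. It is not Kylin's code: the function `legacy_row_threshold`, the class `ByteBudget`, the exception `ScanLimitExceeded`, and all numeric values are hypothetical names and figures chosen for the example. Only the selection rule (scan threshold versus memory budget divided by estimated row size) and the idea of counting scanned bytes come from the issue text.

```python
from dataclasses import dataclass


class ScanLimitExceeded(Exception):
    """Hypothetical error raised when a scan goes over its byte budget."""


def legacy_row_threshold(scan_threshold, mem_budget, estimated_row_size, memory_hungry):
    """Row-count limit chosen the way the issue describes the old behaviour.

    Queries without memory-hungry metrics use the fixed scan threshold;
    otherwise the limit is the memory budget divided by an *estimated* row size.
    """
    if not memory_hungry:
        return scan_threshold
    return mem_budget // estimated_row_size


@dataclass
class ByteBudget:
    """Size-based guard: counts the bytes actually scanned, no estimation."""

    max_scan_bytes: int
    scanned_bytes: int = 0

    def add(self, row_bytes):
        # Accumulate the real size of each scanned row and stop once over budget.
        self.scanned_bytes += row_bytes
        if self.scanned_bytes > self.max_scan_bytes:
            raise ScanLimitExceeded(
                f"scanned {self.scanned_bytes} bytes, limit is {self.max_scan_bytes}"
            )


if __name__ == "__main__":
    # Assumed figures: a 1000-byte budget and a 100-byte row-size estimate.
    row_limit = legacy_row_threshold(10_000_000, 1000, 100, memory_hungry=True)
    # Suppose the varlen rows are really 1000 bytes each: the row limit would
    # admit row_limit * 1000 bytes, far beyond the intended budget.
    print("row limit:", row_limit, "-> bytes admitted:", row_limit * 1000)

    budget = ByteBudget(max_scan_bytes=1000)
    try:
        for _ in range(row_limit):
            budget.add(1000)
    except ScanLimitExceeded as exc:
        print("byte budget stopped the scan:", exc)
```

With an underestimated row size, the row-count limit lets the scan through while the byte counter stops it as soon as the real budget is exceeded, which is the first deficiency listed in the issue.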