[ https://issues.apache.org/jira/browse/KYLIN-2438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
hongbin ma updated KYLIN-2438:
------------------------------
    Description: 
In order to guard against bad queries that can consume lots of memory and potentially crash the Kylin / HBase server, Kylin limits the maximum number of rows a query can scan. The maximum value is chosen based on two configs:
# *kylin.query.scan.threshold* is used if the query doesn't contain memory-hungry metrics
# *kylin.query.mem.budget* / estimated_row_size is used otherwise as the per-region maximum.
This approach, however, has several deficiencies:
* It doesn't work very well with complex, varlen metrics. The estimated threshold could be either too small or too large. If it's too small, good queries are killed. If it's too large, bad queries are not banned.
* Row count doesn't correspond to memory consumption, so it's difficult to determine how large the scan threshold should be.
* kylin.query.scan.threshold can't be overridden at cube level.
In this JIRA, I propose to replace the current row-count-based threshold with a more intuitive size-based threshold:
* KYLIN-2437 will collect the number of bytes scanned at both region and query level
* A new configuration *kylin.query.max-scan-bytes* will be added to limit the maximum number of bytes a query can scan
* *kylin.query.mem.budget* will be renamed to -*kylin.storage.hbase.coprocessor-max-scan-bytes*- +*kylin.storage.partition.max-scan-bytes*+, which limits at region level. No need to rely on estimations about row size any more.
* The above two configs can be overridden at cube level
* the old *kylin.query.scan.threshold* will be deprecated

  was:
In order to guard against bad queries that can consume lots of memory and potentially crash the Kylin / HBase server, Kylin limits the maximum number of rows a query can scan. The maximum value is chosen based on two configs:
# *kylin.query.scan.threshold* is used if the query doesn't contain memory-hungry metrics
# *kylin.query.mem.budget* / estimated_row_size is used otherwise as the per-region maximum.
This approach, however, has several deficiencies:
* It doesn't work very well with complex, varlen metrics. The estimated threshold could be either too small or too large. If it's too small, good queries are killed. If it's too large, bad queries are not banned.
* Row count doesn't correspond to memory consumption, so it's difficult to determine how large the scan threshold should be.
* kylin.query.scan.threshold can't be overridden at cube level.
In this JIRA, I propose to replace the current row-count-based threshold with a more intuitive size-based threshold:
* KYLIN-2437 will collect the number of bytes scanned at both region and query level
* A new configuration *kylin.query.max-scan-bytes* will be added to limit the maximum number of bytes a query can scan
* *kylin.query.mem.budget* will be renamed to *kylin.storage.hbase.coprocessor-max-scan-bytes*, which limits at region level. No need to rely on estimations about row size any more.
* The above two configs can be overridden at cube level
* the old *kylin.query.scan.threshold* will be deprecated


> replace scan threshold with max scan bytes
> ------------------------------------------
>
>                 Key: KYLIN-2438
>                 URL: https://issues.apache.org/jira/browse/KYLIN-2438
>             Project: Kylin
>          Issue Type: Improvement
>          Components: Query Engine, Storage - HBase
>    Affects Versions: v1.6.0
>            Reporter: Dayue Gao
>            Assignee: Dayue Gao
>             Fix For: v2.0.0
>
>
> In order to guard against bad queries that can consume lots of memory and
> potentially crash the Kylin / HBase server, Kylin limits the maximum number
> of rows a query can scan. The maximum value is chosen based on two configs:
> # *kylin.query.scan.threshold* is used if the query doesn't contain
> memory-hungry metrics
> # *kylin.query.mem.budget* / estimated_row_size is used otherwise as the
> per-region maximum.
> This approach, however, has several deficiencies:
> * It doesn't work very well with complex, varlen metrics. The estimated
> threshold could be either too small or too large. If it's too small, good
> queries are killed. If it's too large, bad queries are not banned.
> * Row count doesn't correspond to memory consumption, so it's difficult to
> determine how large the scan threshold should be.
> * kylin.query.scan.threshold can't be overridden at cube level.
> In this JIRA, I propose to replace the current row-count-based threshold
> with a more intuitive size-based threshold:
> * KYLIN-2437 will collect the number of bytes scanned at both region and
> query level
> * A new configuration *kylin.query.max-scan-bytes* will be added to limit
> the maximum number of bytes a query can scan
> * *kylin.query.mem.budget* will be renamed to
> -*kylin.storage.hbase.coprocessor-max-scan-bytes*-
> +*kylin.storage.partition.max-scan-bytes*+, which limits at region level. No
> need to rely on estimations about row size any more.
> * The above two configs can be overridden at cube level
> * the old *kylin.query.scan.threshold* will be deprecated
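
For illustration only, the size-based check proposed in the description could be sketched roughly as follows. This is not Kylin's implementation: `effective_limits`, `ScanTracker`, and `ScanLimitExceeded` are hypothetical names, and the sketch assumes that cube-level settings simply take precedence over global ones and that the byte size of each scanned row is known. Only the two config keys come from the issue text.

```python
# Hypothetical sketch of a size-based scan limit; not Kylin's actual code.
QUERY_KEY = "kylin.query.max-scan-bytes"
PARTITION_KEY = "kylin.storage.partition.max-scan-bytes"


class ScanLimitExceeded(Exception):
    """Raised when a scan goes over its byte budget."""


def effective_limits(global_conf, cube_overrides):
    """Return (query_limit, partition_limit); cube-level values win (assumed)."""
    merged = dict(global_conf)
    merged.update(cube_overrides)
    return merged[QUERY_KEY], merged[PARTITION_KEY]


class ScanTracker:
    """Counts scanned bytes per partition (region) and per query."""

    def __init__(self, query_limit, partition_limit):
        self.query_limit = query_limit
        self.partition_limit = partition_limit
        self.query_bytes = 0
        self.partition_bytes = {}

    def record(self, partition, row_bytes):
        """Account for one scanned row, then enforce both limits."""
        scanned = self.partition_bytes.get(partition, 0) + row_bytes
        self.partition_bytes[partition] = scanned
        self.query_bytes += row_bytes
        if scanned > self.partition_limit:
            raise ScanLimitExceeded(
                f"partition {partition} scanned {scanned} bytes")
        if self.query_bytes > self.query_limit:
            raise ScanLimitExceeded(
                f"query scanned {self.query_bytes} bytes")
```

Because the tracker counts actual bytes, no estimated row size is needed, which is the point of the proposal: a query with large varlen metrics hits the limit after fewer rows than a query with small fixed-width ones.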