Dayue Gao created KYLIN-2438:
--------------------------------

             Summary: replace scan threshold with max scan bytes
                 Key: KYLIN-2438
                 URL: https://issues.apache.org/jira/browse/KYLIN-2438
             Project: Kylin
          Issue Type: Improvement
          Components: Query Engine, Storage - HBase
    Affects Versions: v1.6.0
            Reporter: Dayue Gao
            Assignee: Dayue Gao

To guard against bad queries that consume too much memory and crash the Kylin or HBase server, Kylin limits the maximum number of rows a query can scan. The limit is determined by two configs:

# *kylin.query.scan.threshold* is used if the query doesn't contain memory-hungry metrics
# otherwise, *kylin.query.mem.budget* / estimated_row_size is used as the per-region maximum

This approach, however, has several deficiencies:

* It doesn't work well with complex, variable-length metrics. The estimated threshold can be either too small or too large: if it's too small, good queries are killed; if it's too large, bad queries are not blocked.
* Row count doesn't correspond to memory consumption, so it's difficult to determine what the scan threshold should be set to.
* kylin.query.scan.threshold can't be overridden at the cube level.

In this JIRA, I propose replacing the current row-count-based threshold with a more intuitive size-based threshold:

* KYLIN-2437 will collect the number of bytes scanned at both the region and query levels
* A new configuration, *kylin.query.max-scan-bytes*, will be added to limit the maximum number of bytes a query can scan in total
* *kylin.query.mem.budget* will be renamed to *kylin.storage.hbase.coprocessor-max-scan-bytes*, which limits scanned bytes at the region level
* the old *kylin.query.scan.threshold* will be deprecated
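
To make the size-based idea concrete, here is a minimal Java sketch of a byte budget that aborts a scan once a configured limit is exceeded. It is only an illustration, not Kylin's actual implementation: the class name ScanByteBudget, its methods, and the convention that a non-positive limit means "unlimited" are all assumptions introduced for this example. The same kind of counter could back either the query-level limit (kylin.query.max-scan-bytes) or the region-level limit (kylin.storage.hbase.coprocessor-max-scan-bytes).

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical helper (not part of Kylin): accumulates the number of bytes
 * scanned and fails fast once a configured byte budget is exceeded.
 */
class ScanByteBudget {
    // Assumed to come from a setting such as kylin.query.max-scan-bytes.
    // Assumption for this sketch: a value <= 0 disables the check.
    private final long maxScanBytes;

    // Thread-safe counter, since several scanners may report concurrently.
    private final AtomicLong scannedBytes = new AtomicLong();

    ScanByteBudget(long maxScanBytes) {
        this.maxScanBytes = maxScanBytes;
    }

    /**
     * Records newly scanned bytes and returns the running total.
     * Throws IllegalStateException if the total exceeds the budget.
     */
    long add(long bytes) {
        long total = scannedBytes.addAndGet(bytes);
        if (maxScanBytes > 0 && total > maxScanBytes) {
            throw new IllegalStateException(
                "Scan aborted: " + total + " bytes scanned, limit is " + maxScanBytes);
        }
        return total;
    }

    /** Returns the number of bytes recorded so far. */
    long scannedBytes() {
        return scannedBytes.get();
    }
}
```

In this sketch, a caller would create one budget per query (or per region scan), call add() with the size of each row or batch as it is read, and treat the exception as the signal to kill the query. Unlike a row-count threshold, the check does not depend on an estimated row size, so variable-length metrics are accounted for by their actual size.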