[ https://issues.apache.org/jira/browse/KYLIN-2079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15619770#comment-15619770 ]
hongbin ma commented on KYLIN-2079:
-----------------------------------

However, I don't think the patch will eliminate the retrying. Even though ExpectedSizeIterator will respond to the first retry's timeout very quickly, the underlying HBase RPC retrying will not stop. Any ideas on how to eliminate the retrying? Something like throwing a "DoNotRetryException"?

> add explicit configuration knob for coprocessor timeout
> -------------------------------------------------------
>
>                 Key: KYLIN-2079
>                 URL: https://issues.apache.org/jira/browse/KYLIN-2079
>             Project: Kylin
>          Issue Type: Sub-task
>          Components: Storage - HBase
>    Affects Versions: v1.5.4.1
>            Reporter: Dayue Gao
>            Assignee: Dayue Gao
>             Fix For: v1.6.0
>
>         Attachments: KYLIN-2079.patch
>
>
> The current self-termination timeout for CubeVisitService is calculated as
> the product of three parameters:
> * hbase.rpc.timeout
> * hbase.client.retries.number (hardcoded to 5)
> * kylin.query.cube.visit.timeout.times
> It has a few problems:
> # Because this timeout is longer than hbase.rpc.timeout, the user sees "Error
> in coprocessor" instead of the more descriptive GTScanSelfTerminatedException.
> Moreover, the request (probably a bad query) will be retried 5 times,
> increasing pressure on the regionserver.
> # It's not intuitive to set the coprocessor timeout by adjusting
> kylin.query.cube.visit.timeout.times.
> I propose the following changes:
> # Add a new Kylin configuration "kylin.query.coprocessor.timeout.seconds" to
> explicitly set the coprocessor timeout. It defaults to 0, which means no
> value; hbase.rpc.timeout x 0.9 is used instead. When the user sets it to a
> positive number, Kylin will use min(hbase.rpc.timeout x 0.9,
> kylin.query.coprocessor.timeout.seconds) as the coprocessor timeout.
> # Remove "kylin.query.cube.visit.timeout.times". The cube visit timeout
> (ExpectedSizeIterator) is really a last resort, in case the coprocessor didn't
> terminate itself. I don't see much need for users to control it; setting it
> to coprocessor timeout x 10 should be large enough.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
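
For readers following the proposal quoted above, the timeout rule can be sketched roughly as below. This is only an illustration of the arithmetic described in the issue, not the code in KYLIN-2079.patch. The class and method names are hypothetical, and it assumes hbase.rpc.timeout is given in milliseconds while the new setting is given in seconds.

```java
public class CoprocessorTimeoutSketch {

    // Hypothetical helper, not Kylin's actual implementation.
    // hbaseRpcTimeoutMillis: value of hbase.rpc.timeout (assumed to be in milliseconds).
    // configuredSeconds: value of kylin.query.coprocessor.timeout.seconds (0 means "not set").
    static long coprocessorTimeoutMillis(long hbaseRpcTimeoutMillis, int configuredSeconds) {
        // 90% of the RPC timeout, in integer arithmetic to avoid floating-point rounding.
        long upperBoundMillis = hbaseRpcTimeoutMillis * 9 / 10;
        if (configuredSeconds <= 0) {
            // No explicit value: fall back to hbase.rpc.timeout x 0.9.
            return upperBoundMillis;
        }
        // Explicit value: never exceed 90% of the RPC timeout.
        return Math.min(upperBoundMillis, configuredSeconds * 1000L);
    }

    // Last-resort timeout for the cube visit (ExpectedSizeIterator),
    // proposed as coprocessor timeout x 10.
    static long cubeVisitTimeoutMillis(long coprocessorTimeoutMillis) {
        return coprocessorTimeoutMillis * 10;
    }

    public static void main(String[] args) {
        long rpcTimeoutMillis = 60_000L; // example value for hbase.rpc.timeout

        System.out.println(coprocessorTimeoutMillis(rpcTimeoutMillis, 0));   // setting unset
        System.out.println(coprocessorTimeoutMillis(rpcTimeoutMillis, 30));  // below the cap
        System.out.println(coprocessorTimeoutMillis(rpcTimeoutMillis, 120)); // capped
        System.out.println(cubeVisitTimeoutMillis(coprocessorTimeoutMillis(rpcTimeoutMillis, 0)));
    }
}
```

Because the result is always below hbase.rpc.timeout, the coprocessor would terminate itself before the RPC layer times out, which is what lets the user see GTScanSelfTerminatedException. It does not by itself address the client-side retry question raised in the comment.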