It seems your query reads a large number of records out of HBase, which is not the normal case by design. Normally the data should be highly aggregated, so that only a few thousand records are read. I would guess the query response time is slow too. Your cube could use some optimization to match the use case.
For the properties:

kylin.query.scan.threshold=10000000
#default is 3M
- This controls the maximum number of records read from HBase. It is a safety valve to keep Kylin from being overloaded by bad queries.

kylin.query.mem.budget=64424509440
#default is 3G
- This controls the memory cap that each query can use in the query server. The memory is used for the final aggregation before the correct result is returned.

kylin.query.cube.visit.timeout.times=3
#default is 1
- A timeout for waiting for the HBase scan to return.

Cheers
Yang

On Tue, Jun 7, 2016 at 10:10 AM, 吴钰彬 <[email protected]> wrote:

> Hi, Liyang
>
> Many thanks for replying to this mail. By looking into the source code, I have added
> the 3 parameters below to the kylin.profile, which somehow solved the problem.
>
> Btw, can you explain more about the purpose of the parameters below?
>
> kylin.query.scan.threshold=10000000
> #default is 3M
> kylin.query.mem.budget=64424509440
> #default is 3G
> kylin.query.cube.visit.timeout.times=3
> #default is 1
>
> B.R
> Austin.Woo
>
> -----Original Message-----
> From: Li Yang [mailto:[email protected]]
> Sent: June 7, 2016 8:55
> To: [email protected]
> Cc: 李欣 <[email protected]>; 亓庆国 <[email protected]>
> Subject: Re: Re: how to extend the threshold for kylin query?(from baixing.com)
>
> Hi Austin,
>
> Note the image didn't get through the mailing list, so it was not displayed.
>
> So we didn't quite get your issue yet. Could you try describing it again? You
> can use a file hosting service to share attachments.
>
> Also, it's always better to adopt the latest version. If you are early in
> the pilot stage, the shift should be easy.
>
> Cheers
> Yang
>
> On Tue, May 31, 2016 at 3:43 PM, 吴钰彬 <[email protected]> wrote:
>
> > Hi, Kylin developers
> >
> > We further investigated our query. When I change my SQL query and
> > remove the count(distinct), it works fine, and the query only
> > returns
> > 231 records.
> >
> > Adding the count(distinct) measure should not increase the number of
> > returned records, but I don't know how kylin works internally to calculate
> > count(distinct). It seems it needs to scan many more records to return the result?
> >
> > How to solve this problem?
> >
> > Ps: right now we are using kylin version 1.3
> >
> > B.R
> >
> > Austin.Woo
> >
> > *From:* 吴钰彬
> > *Sent:* May 31, 2016 10:35
> > *To:* '[email protected]' <[email protected]>
> > *Cc:* '[email protected]'
> > <[email protected]>;
> > 李欣 <[email protected]>; 亓庆国 <[email protected]>
> > *Subject:* how to extend the threshold for kylin query?(from baixing.com)
> >
> > Hi, Kylin developers.
> >
> > This is Austin, DW team lead from Baixing.com. Thanks for reading this
> > mail.
> >
> > Right now we are researching whether to adopt kylin as our big data
> > query engine to consume our website click and event data.
> >
> > When we built some cubes and tried to query them, we faced the issue
> > below.
> >
> >
> >
> > But when we check the configuration in /conf/kylin.properties, it is set as below:
> >
> > - Kylin.query.scan.threshold=40000000
> >
> > Can you help advise whether there is anything we are missing here?
> >
> > Looking forward to your reply, and many thanks for your time.
> >
> > B.R
> >
> > Austin.Woo
> >
>
