回复： Exceed scan threshold at 10000001

张磊 Fri, 28 Oct 2016 01:03:20 -0700

kylin do put pre-calculate results in hbase, if cude desc is below:
Dimensions:letter number
Measures:count
in hbase the result is 
count letter number
1         A        1
1         A        2
1         B        1
1         B        2
1         B        3
1         B        4
count  letter
2          A
4          B
If i query select count(1) from table group by letter,number limit 2,it should 
scan the first two rows(letter number Agg groups)?
when i query select count(1) from table group by letter limit 2,it should scan 
the two rows(letter Agg group) 
Do i say right?



------------------ 原始邮件 ------------------
发件人: "a.ramonportoles";<a.ramonporto...@gmail.com>;
发送时间: 2016年10月28日(星期五) 下午3:43
收件人: "dev"<dev@kylin.apache.org>; 

主题: Re: Exceed scan threshold at 10000001



hummmm
but are you using "group by LO_CUSTKEY,LO_PARTKEY"

And limit apply to final result no to scan rows

Example:
table with two columns Letter / Number
A:1
A:2
B:1
B:2
B:3
B:4

select count (1), Letter from TB group by Letter limit 1
   Result: 2:A
   Scans 2 rows

select count (1), Letter from TB group by Letter limit 2
   Result: 2:A
               4:B
   Scans 2 +4 rows


Alb



2016-10-28 8:33 GMT+02:00 张磊 <121762...@qq.com>:

> Query1:select count(1),sum(LO_REVENUE) from lineorder group by
> LO_CUSTKEY,LO_PARTKEY
> LIMIT 10000
>
>
> I find it scan 10000 rows from HBase
>
>
> Query2: select count(1),sum(LO_REVENUE) from lineorder group by
> LO_CUSTKEY,LO_PARTKEY
> LIMIT 10001
>
>
> I find it scan 10000001 rows from Hbase
>
>
> I do not know why?  Should not  scan 10001 row?
>
>
> The two query i scan the same HTable KYLIN_78ROC49NQY
> Kylin log:Endpoint RPC returned from HTable KYLIN_78ROC49NQY
>
>
>
>
> ------------------ 原始邮件 ------------------
> 发件人: "ShaoFeng Shi";<shaofeng...@apache.org>;
> 发送时间: 2016年10月28日(星期五) 中午11:20
> 收件人: "dev"<dev@kylin.apache.org>;
>
> 主题: Re: Exceed scan threshold at 10000001
>
>
>
> Alberto, thanks for your explaination, you got the points and is already an
> Kylin expert I believe.
>
> In order to protect HBase and Kylin from crashing by bad queries (which
> scan too many rows), Kylin add this mechnisam to interrupt when reach some
> threshold. Usually in an OLAP scenario, the result wouldn't be too large.
> This is also a reminder for user to rethink the design; If you really want
> to get the threshold be enlarged, you can allocate more memory to Kylin and
> set "kylin.query.mem.budget" to bigger value.
>
> 2016-10-27 18:39 GMT+08:00 Alberto Ramón <a.ramonporto...@gmail.com>:
>
> > NOTE: I'm not a expert on Kylin  ;)
> >
> > Where is mandatory? No
> > Where is recommended? yes
> > Where bypass the threshold? No, I think this limit is hardcoded ¿?
> >
> > The real question must be: why this limit exists ?: (opinion)
> > - The target of Kylin is Real / Near RT, limit rows --> limit response
> time
> > - If Your are using JDBC, this is not a good option by performance
> > - Protect the HBase Coprocesor
> > - Perhaps you need a new Dim, to precalculate This Aggregate or filter by
> > this new Dim
> >
> > For Extra-Large queries, you can also check:
> >  -kylin.query.mem.budget= 3GB
> >  -hbase.server.scanner.max.result.size = 100MB  (limit from HBase, you
> can
> > disable with -1)
> >
> > Good Luck, Alb
> >
> > 2016-10-27 11:56 GMT+02:00 张磊 <121762...@qq.com>:
> >
> > > Do you mean when i query, i should add where clause,
> > > but in some case, the number of records > threshold, how can i do?
> > > For example, order by all groups, the number of the  all groups >
> > > threshold
> > >
> > >
> > >
> > >
> > > ------------------ 原始邮件 ------------------
> > > 发件人: "Alberto Ramón";<a.ramonporto...@gmail.com>;
> > > 发送时间: 2016年10月27日(星期四) 下午5:47
> > > 收件人: "dev"<dev@kylin.apache.org>;
> > >
> > > 主题: Re: Exceed scan threshold at 10000001
> > >
> > >
> > >
> > >  ERROR: Scan row count exceeded threshold
> > >
> > > MailList
> > > <http://mail-archives.apache.org/mod_mbox/kylin-user/
> > > 201608.mbox/%3CCALjEW7M_YYi7Xs55OqPdxS6pzNvD0%
> 2BamN2AX3hetnF0%3D9uFnow%
> > > 40mail.gmail.com%3E>
> > > Kilin
> > > 1787 <https://issues.apache.org/jira/browse/KYLIN-1787>v1.5.3
> > >
> > > *Scan row count exceeded threshold: 1000000, please add filter
> condition
> > to
> > > narrow down backend scan range, like where clause*
> > >
> > >
> > > BR, Alb
> > >
> > > 2016-10-27 11:40 GMT+02:00 张磊 <121762...@qq.com>:
> > >
> > > > Hi
> > > >
> > > >
> > > > When i query a sql, I do not know why should scan hbase? How can i
> do?
> > > > Thanks!
> > > >
> > > >
> > > > Table: lineorder  12,000,000 row records
> > > > Dimensions: LO_CUSTKEY,LO_PARTKEY
> > > > Measures: count(1), sum(LO_REVENUE)
> > > >
> > > >
> > > > Query SQL: select count(1),sum(LO_REVENUE) from lineorder group by
> > > > LO_CUSTKEY,LO_PARTKEY order by LO_CUSTKEY,LO_PARTKEY limit 50
> > > >
> > > >
> > > > I build a cude with two Dimensions and two Measures(count and sum),
> the
> > > > size of the Htable is 98 MB, when i execute a query in insight, it
> > shows
> > > > Error in coprocessor; and i check the hbase log, i find blow messages
> > > >
> > > >
> > > > 2016-10-27 02:06:13,470 INFO  [B.defaultRpcServer.handler=4,
> > > queue=1,port=16020]
> > > > gridtable.GTScanRequest: pre aggregation is not beneficial, skip it
> > > > 2016-10-27 02:06:13,470 INFO  [B.defaultRpcServer.handler=4,
> > > queue=1,port=16020]
> > > > endpoint.CubeVisitService: Scanned 1 rows from HBase.
> > > >
> > > >
> > > > 2016-10-27 02:24:20,884 INFO  [B.defaultRpcServer.handler=6,
> > > queue=0,port=16020]
> > > > endpoint.CubeVisitService: Scanned 9999001 rows from HBase.
> > > > 2016-10-27 02:24:20,889 INFO  [B.defaultRpcServer.handler=6,
> > > queue=0,port=16020]
> > > > endpoint.CubeVisitService: The cube visit did not finish normally
> > because
> > > > scan num exceeds threshold
> > > > org.apache.kylin.gridtable.GTScanExceedThresholdException: Exceed
> scan
> > > > threshold at 10000001
> > > >         at org.apache.kylin.storage.hbase.cube.v2.coprocessor.
> > > > endpoint.CubeVisitService$2.hasNext(CubeVisitService.java:267)
> > > >         at org.apache.kylin.storage.hbase.cube.v2.
> > > HBaseReadonlyStore$1$1.
> > > > hasNext(HBaseReadonlyStore.java:111)
> > > >         at org.apache.kylin.storage.hbase.cube.v2.coprocessor.
> > > > endpoint.CubeVisitService.visitCube(CubeVisitService.java:299)
> > > >         at org.apache.kylin.storage.hbase.cube.v2.coprocessor.
> > > > endpoint.generated.CubeVisitProtos$CubeVisitService.callMethod(
> > > > CubeVisitProtos.java:3952)
> > > >         at org.apache.hadoop.hbase.regionserver.HRegion.
> > > > execService(HRegion.java:7815)
> > > >         at org.apache.hadoop.hbase.regionserver.RSRpcServices.
> > > > execServiceOnRegion(RSRpcServices.java:1986)
> > > >         at org.apache.hadoop.hbase.regionserver.RSRpcServices.
> > > > execService(RSRpcServices.java:1968)
> > > >         at org.apache.hadoop.hbase.protobuf.generated.
> > > > ClientProtos$ClientService$2.callBlockingMethod(
> > ClientProtos.java:33652)
> > > >         at org.apache.hadoop.hbase.ipc.
> RpcServer.call(RpcServer.java:
> > > 2178)
> > > >         at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.
> > > java:112)
> > > >         at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(
> > > > RpcExecutor.java:133)
> > > >         at org.apache.hadoop.hbase.ipc.
> RpcExecutor$1.run(RpcExecutor.
> > > > java:108)
> > > >         at java.lang.Thread.run(Thread.java:745)
> > >
> >
>
>
>
> --
> Best regards,
>
> Shaofeng Shi 史少锋
>

回复： Exceed scan threshold at 10000001

Reply via email to