[ 
https://issues.apache.org/jira/browse/KYLIN-4711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17230547#comment-17230547
 ] 

ASF subversion and git services commented on KYLIN-4711:
--------------------------------------------------------

Commit 76369197ba64868c4c89f56b5b05c23aa082ab7d in kylin's branch 
refs/heads/master from Guangxu Cheng
[ https://gitbox.apache.org/repos/asf?p=kylin.git;h=7636919 ]

KYLIN-4711 Change default value to 3 for 
kylin.metadata.hbase-client-retries-number


> Change default value to 3 for kylin.metadata.hbase-client-retries-number
> ------------------------------------------------------------------------
>
>                 Key: KYLIN-4711
>                 URL: https://issues.apache.org/jira/browse/KYLIN-4711
>             Project: Kylin
>          Issue Type: Improvement
>    Affects Versions: v3.1.0
>            Reporter: Guangxu Cheng
>            Assignee: Guangxu Cheng
>            Priority: Major
>             Fix For: v3.1.2
>
>
> ```shell
>  java.lang.RuntimeException: org.apache.kylin.job.exception.PersistentException: org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=1, exceptions:
>  Thu Aug 20 21:06:01 GMT+08:00 2020, RpcRetryingCaller{globalStartTime=1597928761253, pause=1000, retries=1}, org.apache.hadoop.hbase.NotServingRegionException: org.apache.hadoop.hbase.NotServingRegionException: Region kylin_production_metadata,/execute_output/3adc92f2-edcd-2705-5a9c-ad0afe4a0808-01,1594348337103.48b9e5e9c3c7891750236fcec84b38d5. is not online on xxx.xxx.xxx.xxx,16031,1558009276096
>  at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:3033)
>  at org.apache.hadoop.hbase.regionserver.RSRpcServices.getRegion(RSRpcServices.java:1110)
>  at org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:2064)
>  at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:33857)
>  at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2189)
>  at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:112)
>  at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:133)
>  at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:108)
>  at java.lang.Thread.run(Thread.java:745)
>  on xxx.xxx.xxx.xxx,16031,1558009276096
>  at org.apache.kylin.job.execution.ExecutableManager.getOutput(ExecutableManager.java:174)
>  at org.apache.kylin.job.execution.AbstractExecutable.getOutput(AbstractExecutable.java:450)
>  at org.apache.kylin.job.execution.AbstractExecutable.isDiscarded(AbstractExecutable.java:561)
>  at org.apache.kylin.engine.mr.common.MapReduceExecutable.doWork(MapReduceExecutable.java:165)
>  at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:191)
>  at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:71)
>  at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:191)
>  at org.apache.kylin.job.impl.threadpool.DistributedScheduler$JobRunner.run(DistributedScheduler.java:110)
>  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  at java.lang.Thread.run(Thread.java:745)
>  ```
>  Recently, our build jobs have been failing occasionally. Analysis showed 
> that the failures were caused by errors when accessing the MetaStore. 
> We use HBase as the MetaStore. 
>  When accessing HBase, the client caches the table's region locations. 
> When a region is moved, the client does not proactively update its cache, 
> so the next request goes to the old region server and fails with a 
> NotServingRegionException. The client refreshes the cached location only 
> when it retries. But the number of retries in Kylin is 1, which means that 
> the client makes a single attempt and never tries again.
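
The failure mode described above can be illustrated with a small, self-contained Java sketch. This is not HBase or Kylin code: ToyCluster, ToyClient, NotServingRegion and succeeds() are hypothetical stand-ins that only model a client with a stale cached region location, which it refreshes after a failed attempt. Real HBase clients also apply pauses and backoff between attempts, which this sketch omits.

```java
public class StaleRegionCacheDemo {

    /** Simplified stand-in for HBase's NotServingRegionException (hypothetical). */
    static class NotServingRegion extends RuntimeException {
        NotServingRegion(String message) {
            super(message);
        }
    }

    /** Toy cluster in which the region has already moved to "rs-new". */
    static class ToyCluster {
        private final String actualLocation = "rs-new";

        /** Serves the row only if the request reaches the server hosting the region. */
        String get(String server, String rowKey) {
            if (!server.equals(actualLocation)) {
                throw new NotServingRegion("Region is not online on " + server);
            }
            return "value-of-" + rowKey;
        }

        /** Looks up the current location of the region. */
        String locate() {
            return actualLocation;
        }
    }

    /** Toy client that caches the region location and refreshes it only after a failure. */
    static class ToyClient {
        private final ToyCluster cluster;
        private final int maxAttempts;
        private String cachedLocation = "rs-old"; // stale: the region was moved

        ToyClient(ToyCluster cluster, int maxAttempts) {
            this.cluster = cluster;
            this.maxAttempts = maxAttempts;
        }

        String get(String rowKey) {
            NotServingRegion last = null;
            for (int attempt = 1; attempt <= maxAttempts; attempt++) {
                try {
                    return cluster.get(cachedLocation, rowKey);
                } catch (NotServingRegion e) {
                    last = e;
                    // The cache is corrected here, but that only helps
                    // if another attempt is allowed.
                    cachedLocation = cluster.locate();
                }
            }
            throw new IllegalStateException("Failed after attempts=" + maxAttempts, last);
        }
    }

    /** Returns true if a read succeeds with the given number of attempts. */
    static boolean succeeds(int maxAttempts) {
        ToyClient client = new ToyClient(new ToyCluster(), maxAttempts);
        try {
            client.get("/execute_output/some-job-id");
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("attempts=1 succeeds: " + succeeds(1));
        System.out.println("attempts=3 succeeds: " + succeeds(3));
    }
}
```

With a single attempt, the client learns the new location but has no chance to use it, which matches the "Failed after attempts=1" message in the trace. Raising kylin.metadata.hbase-client-retries-number to 3, as this issue proposes, leaves room for at least one attempt against the refreshed location.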



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
