[ https://issues.apache.org/jira/browse/KYLIN-3282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16521957#comment-16521957 ]

Xingxing Di edited comment on KYLIN-3282 at 6/25/18 7:58 AM:
-------------------------------------------------------------

Hi Shaofeng, I can no longer find the error log that caused the job to disappear,
but I saw a few "Overwriting conflict" error logs like the ones in the original
post by readme_kylin.

 
{code:java}
2018-06-10 19:34:51,044 ERROR [Scheduler 372961027 Job 39a3698c-c961-456d-a655-3a9c5f8dc188-2194] common.MapReduceExecutable:195 : error execute MapReduceExecutable{id=39a3698c-c961-456d-a655-3a9c5f8dc188-15, name=Convert Cuboid Data to HFile, state=RUNNING}
java.lang.IllegalStateException: Overwriting conflict /execute_output/39a3698c-c961-456d-a655-3a9c5f8dc188-15, expect old TS 1528630470288, but it is 1528630481128
	at org.apache.kylin.storage.hbase.HBaseResourceStore.checkAndPutResourceImpl(HBaseResourceStore.java:316)
	at org.apache.kylin.common.persistence.ResourceStore.checkAndPutResourceCheckpoint(ResourceStore.java:294)
	at org.apache.kylin.common.persistence.ResourceStore.putResource(ResourceStore.java:280)
	at org.apache.kylin.common.persistence.ResourceStore.putResource(ResourceStore.java:260)
	at org.apache.kylin.job.dao.ExecutableDao.writeJobOutputResource(ExecutableDao.java:104)
	at org.apache.kylin.job.dao.ExecutableDao.updateJobOutput(ExecutableDao.java:218)
	at org.apache.kylin.job.execution.ExecutableManager.addJobInfo(ExecutableManager.java:470)
	at org.apache.kylin.engine.mr.common.MapReduceExecutable.doWork(MapReduceExecutable.java:160)
	at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:125)
	at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:64)
	at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:125)
	at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:144)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
{code}
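A minimal sketch of the optimistic "check-and-put" pattern that the stack trace points to (ResourceStore.checkAndPutResourceCheckpoint calling HBaseResourceStore.checkAndPutResourceImpl): each write carries the timestamp it expects the stored resource to have and is rejected if the stored timestamp differs. The names below (OverwriteConflictDemo, VersionedStore, checkAndPut, the path /execute_output/job-13) are hypothetical and use an in-memory map, not Kylin's or HBase's real API. It assumes the scenario suggested by the issue title: the first write is applied on the server, the client times out before seeing the result, and the retry reuses the stale expected timestamp. In the quoted report, the "but it is" timestamp (1520571112216) equals the globalStartTime of the timed-out RpcRetryingCaller, which is consistent with that scenario.

```java
import java.util.HashMap;
import java.util.Map;

public class OverwriteConflictDemo {
    public static void main(String[] args) {
        VersionedStore store = new VersionedStore();
        String path = "/execute_output/job-13"; // hypothetical resource path

        // Initial write: the resource does not exist yet (timestamp 0).
        long oldTs = store.checkAndPut(path, "RUNNING", 0L, 1520571099067L);

        // Attempt 1 (assumed): the server applies this write, but the client hits its
        // operation timeout and never learns the new timestamp (return value is lost).
        store.checkAndPut(path, "RUNNING + extra info", oldTs, 1520571112216L);

        // Attempt 2: the retry still expects the old timestamp, so it is rejected.
        try {
            store.checkAndPut(path, "RUNNING + extra info", oldTs, 1520571130000L);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}

/** Hypothetical in-memory store with compare-and-set on a per-path timestamp. */
class VersionedStore {
    private final Map<String, Long> timestamps = new HashMap<>();
    private final Map<String, String> contents = new HashMap<>();

    /** Current timestamp of a path, or 0 if the path has never been written. */
    synchronized long timestampOf(String path) {
        return timestamps.getOrDefault(path, 0L);
    }

    /** Writes only if the stored timestamp equals expectedOldTs; returns newTs on success. */
    synchronized long checkAndPut(String path, String content, long expectedOldTs, long newTs) {
        long actual = timestampOf(path);
        if (actual != expectedOldTs) {
            throw new IllegalStateException("Overwriting conflict " + path
                    + ", expect old TS " + expectedOldTs + ", but it is " + actual);
        }
        contents.put(path, content);
        timestamps.put(path, newTs);
        return newTs;
    }
}
```

Running main prints "Overwriting conflict /execute_output/job-13, expect old TS 1520571099067, but it is 1520571112216", the same shape as the errors above: the retry fails not because another writer interfered, but because its own earlier attempt already succeeded.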



> hbase timeout cause the endless status.
> ---------------------------------------
>
>                 Key: KYLIN-3282
>                 URL: https://issues.apache.org/jira/browse/KYLIN-3282
>             Project: Kylin
>          Issue Type: Bug
>          Components: Job Engine
>    Affects Versions: v2.3.0
>            Reporter: readme_kylin
>            Priority: Major
>
> Fri Mar 09 12:52:07 GMT+08:00 2018, RpcRetryingCaller{globalStartTime=1520571112216, pause=100, retries=1}, java.io.IOException: Call to QZ140/10.0.0.140:16020 failed on local exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=8030361, waitTime=15002, operationTimeout=15000 expired.
>  at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:157)
>  at org.apache.hadoop.hbase.client.HTable.checkAndPut(HTable.java:1233)
>  at org.apache.kylin.storage.hbase.HBaseResourceStore.checkAndPutResourceImpl(HBaseResourceStore.java:311)
>  at org.apache.kylin.common.persistence.ResourceStore.checkAndPutResourceCheckpoint(ResourceStore.java:305)
>  at org.apache.kylin.common.persistence.ResourceStore.putResource(ResourceStore.java:291)
>  at org.apache.kylin.common.persistence.ResourceStore.putResource(ResourceStore.java:271)
>  at org.apache.kylin.job.dao.ExecutableDao.writeJobOutputResource(ExecutableDao.java:88)
>  at org.apache.kylin.job.dao.ExecutableDao.updateJobOutput(ExecutableDao.java:216)
>  at org.apache.kylin.job.execution.ExecutableManager.addJobInfo(ExecutableManager.java:480)
>  at org.apache.kylin.engine.mr.common.MapReduceExecutable.doWork(MapReduceExecutable.java:161)
>  at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:162)
>  at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:67)
>  at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:162)
>  at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:300)
>  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  at java.lang.Thread.run(Thread.java:745)
>  
>  
> 2018-03-09 12:52:10,191 ERROR [Scheduler 9772827 Job 499477a7-4c1a-4c5a-8d4a-0b3218a58dca-516] execution.AbstractExecutable:100 : 1th retries for onExecuteFinished fails due to {}
> java.lang.IllegalStateException: Overwriting conflict /execute_output/499477a7-4c1a-4c5a-8d4a-0b3218a58dca-13, expect old TS 1520571099067, but it is 1520571112216
>  at org.apache.kylin.storage.hbase.HBaseResourceStore.checkAndPutResourceImpl(HBaseResourceStore.java:316)
>  at org.apache.kylin.common.persistence.ResourceStore.checkAndPutResourceCheckpoint(ResourceStore.java:305)
>  at org.apache.kylin.common.persistence.ResourceStore.putResource(ResourceStore.java:291)
>  at org.apache.kylin.common.persistence.ResourceStore.putResource(ResourceStore.java:271)
>  at org.apache.kylin.job.dao.ExecutableDao.writeJobOutputResource(ExecutableDao.java:88)
>  at org.apache.kylin.job.dao.ExecutableDao.updateJobOutput(ExecutableDao.java:216)
>  at org.apache.kylin.job.execution.ExecutableManager.addJobInfo(ExecutableManager.java:480)
>  at org.apache.kylin.job.execution.ExecutableManager.addJobInfo(ExecutableManager.java:490)
>  at org.apache.kylin.job.execution.AbstractExecutable.addExtraInfo(AbstractExecutable.java:403)
>  at org.apache.kylin.job.execution.AbstractExecutable.setEndTime(AbstractExecutable.java:415)
>  at org.apache.kylin.job.execution.AbstractExecutable.onExecuteFinished(AbstractExecutable.java:121)
>  at org.apache.kylin.job.execution.AbstractExecutable.onExecuteFinishedWithRetry(AbstractExecutable.java:98)
>  at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:175)
>  at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:67)
>  at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:162)
>  at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:300)
>  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  at java.lang.Thread.run(Thread.java:745)
> 2018-03-09 12:52:10,193 ERROR [Scheduler 9772827 Job 499477a7-4c1a-4c5a-8d4a-0b3218a58dca-516] execution.AbstractExecutable:164 : error running Executable: CubingJob{id=499477a7-4c1a-4c5a-8d4a-0b3218a58dca, name=BUILD CUBE - android_download_model_1_2_cube_1_3 - 20180309000000_20180310000000 - GMT+08:00 2018-03-09 12:28:58, state=RUNNING}
> 2018-03-09 12:52:10,193 INFO [Scheduler 9772827 Job 499477a7-4c1a-4c5a-8d4a-0b3218a58dca-516] execution.AbstractExecutable:157 : Retry 1
> 2018-03-09 12:52:10,313 ERROR [Scheduler 9772827 Job 499477a7-4c1a-4c5a-8d4a-0b3218a58dca-516] execution.AbstractExecutable:108 : There shouldn't be a running subtask[jobId: 499477a7-4c1a-4c5a-8d4a-0b3218a58dca-13, jobName: Build N-Dimension Cuboid : level 7],
> it might cause endless state, will retry to fetch subtask's state.
> 2018-03-09 12:52:10,414 ERROR [Scheduler 9772827 Job 499477a7-4c1a-4c5a-8d4a-0b3218a58dca-516] execution.AbstractExecutable:186 : With 1 times retry, it's state is still RUNNING
> 2018-03-09 12:52:10,525 ERROR [Scheduler 9772827 Job 499477a7-4c1a-4c5a-8d4a-0b3218a58dca-516] execution.AbstractExecutable:186 : With 2 times retry, it's state is still RUNNING
> 2018-03-09 12:52:10,626 ERROR [Scheduler 9772827 Job 499477a7-4c1a-4c5a-8d4a-0b3218a58dca-516] execution.AbstractExecutable:186 : With 3 times retry, it's state is still RUNNING
> 2018-03-09 12:52:10,737 ERROR [Scheduler 9772827 Job 499477a7-4c1a-4c5a-8d4a-0b3218a58dca-516] execution.AbstractExecutable:186 : With 4 times retry, it's state is still RUNNING
> 2018-03-09 12:52:10,839 ERROR [Scheduler 9772827 Job 499477a7-4c1a-4c5a-8d4a-0b3218a58dca-516] execution.AbstractExecutable:186 : With 5 times retry, it's state is still RUNNING
> 2018-03-09 12:52:10,945 ERROR [Scheduler 9772827 Job 499477a7-4c1a-4c5a-8d4a-0b3218a58dca-516] execution.AbstractExecutable:186 : With 6 times retry, it's state is still RUNNING
> 2018-03-09 12:52:11,047 ERROR [Scheduler 9772827 Job 499477a7-4c1a-4c5a-8d4a-0b3218a58dca-516] execution.AbstractExecutable:186 : With 7 times retry, it's state is still RUNNING
> 2018-03-09 12:52:11,157 ERROR [Scheduler 9772827 Job 499477a7-4c1a-4c5a-8d4a-0b3218a58dca-516] execution.AbstractExecutable:186 : With 8 times retry, it's state is still RUNNING
> 2018-03-09 12:52:11,260 ERROR [Scheduler 9772827 Job 499477a7-4c1a-4c5a-8d4a-0b3218a58dca-516] execution.AbstractExecutable:186 : With 9 times retry, it's state is still RUNNING
> 2018-03-09 12:52:11,362 ERROR [Scheduler 9772827 Job 499477a7-4c1a-4c5a-8d4a-0b3218a58dca-516] execution.AbstractExecutable:186 : With 10 times retry, it's state is still RUNNING
> 2018-03-09 12:52:11,363 ERROR [Scheduler 9772827 Job 499477a7-4c1a-4c5a-8d4a-0b3218a58dca-516] execution.AbstractExecutable:195 : Parent task: BUILD CUBE - android_download_model_1_2_cube_1_3 - 20180309000000_20180310000000 - GMT+08:00 2018-03-09 12:28:58 is finished, but it's subtask: Build N-Dimension Cuboid : level 7's state is still RUNNING
> , mark parent task failed.
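The tail of the quoted log shows the parent job re-reading the subtask's state roughly every 100 ms, giving up after 10 attempts, and then marking the parent failed. Since the subtask's output update was the write that conflicted, the stored state presumably stays RUNNING, so the re-check cannot succeed. Below is a minimal sketch of such a bounded re-check loop, assuming the 10-retry / ~100 ms values read off the log timestamps; the names (SubtaskStateCheckDemo, waitForSubtaskToSettle, the State enum) are hypothetical and are not Kylin's AbstractExecutable code.

```java
import java.util.function.Supplier;

public class SubtaskStateCheckDemo {
    // Hypothetical, simplified job states (not Kylin's ExecutableState).
    enum State { READY, RUNNING, SUCCEED, ERROR }

    /**
     * Re-reads a subtask's state up to maxRetries times, sleeping intervalMs
     * before each read. Returns true as soon as the subtask is no longer
     * RUNNING; returns false if it is still RUNNING after every retry (or the
     * thread is interrupted), in which case the caller would fail the parent.
     */
    static boolean waitForSubtaskToSettle(Supplier<State> readState, int maxRetries, long intervalMs) {
        for (int attempt = 1; attempt <= maxRetries; attempt++) {
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return false;
            }
            if (readState.get() != State.RUNNING) {
                return true;
            }
            System.err.println("retry " + attempt + ": subtask state is still RUNNING");
        }
        return false;
    }

    public static void main(String[] args) {
        // The stored output never left RUNNING (its update conflicted), so every
        // read returns RUNNING, like the ten "still RUNNING" lines in the log.
        boolean settled = waitForSubtaskToSettle(() -> State.RUNNING, 10, 100);
        System.out.println(settled ? "subtask settled" : "mark parent task failed");
    }
}
```

With a supplier that always returns RUNNING, main writes ten retry lines to stderr and then prints "mark parent task failed". Bounding the loop keeps the scheduler thread from waiting forever, but it only turns the stale RUNNING record into a failed parent job; it does not repair the subtask's stored state.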



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)