[jira] [Commented] (KYLIN-1805) It's easily got stuck when deleting HTables during running the StorageCleanupJob
[ https://issues.apache.org/jira/browse/KYLIN-1805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15336552#comment-15336552 ]
Zhong Yanghong commented on KYLIN-1805:
The patch has been tested at eBay and it works well.
> It's easily got stuck when deleting HTables during running the StorageCleanupJob
>
> Key: KYLIN-1805
> URL: https://issues.apache.org/jira/browse/KYLIN-1805
> Project: Kylin
> Issue Type: Improvement
> Components: Tools, Build and Test
> Reporter: Zhong Yanghong
> Assignee: Zhong Yanghong
> Attachments: fix_got_stuck_when_deleting_htables_during_storagecleanupjob_1_4_rc.patch
>
> In the unlucky case that some unused HTables cannot be deleted successfully, Kylin currently hangs at that point. It would be better to skip the problematic HTables and continue deleting the rest.
--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KYLIN-1806) Loose the condition of merge job failure during the step of deleting htables
[ https://issues.apache.org/jira/browse/KYLIN-1806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15336534#comment-15336534 ]
Zhong Yanghong commented on KYLIN-1806:
However, if an HTable used by a merge job is deleted unexpectedly because of an HBase issue, that merge job can never complete. I still think this kind of HBase issue should not prevent a Kylin job from completing. We have tools that regularly check for and delete unused HTables, so I still think an asynchronous approach would be better.
> Loose the condition of merge job failure during the step of deleting htables
>
> Key: KYLIN-1806
> URL: https://issues.apache.org/jira/browse/KYLIN-1806
> Project: Kylin
> Issue Type: Improvement
> Components: Job Engine
> Reporter: Zhong Yanghong
> Assignee: Zhong Yanghong
>
> For the merge job, after the major jobs finish successfully, the Garbage Collection step deletes unused HTables. Sometimes exceptions occur at this step because of an HBase issue; for example, HBase cannot find the related HTables. It would be better to skip these exceptions and let the merge job continue. If the unused HTables still exist in HBase afterwards, they can be deleted by a regular job, StorageCleanupJob.
[jira] [Commented] (KYLIN-1806) Loose the condition of merge job failure during the step of deleting htables
[ https://issues.apache.org/jira/browse/KYLIN-1806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15336516#comment-15336516 ]
Zhong Yanghong commented on KYLIN-1806:
You are right and I made a mistake. The constraint is on the cube segment level.
[jira] [Commented] (KYLIN-1806) Loose the condition of merge job failure during the step of deleting htables
[ https://issues.apache.org/jira/browse/KYLIN-1806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15336233#comment-15336233 ]
Shaofeng SHI commented on KYLIN-1806:
As I remember, the constraint is at the cube segment level, not the job level. That means even if a merge job fails at the GC step, the segment has already been updated to READY, so it will not block the next merge job. I might be wrong; I just want to double-check.
[jira] [Commented] (KYLIN-1805) It's easily got stuck when deleting HTables during running the StorageCleanupJob
[ https://issues.apache.org/jira/browse/KYLIN-1805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15335853#comment-15335853 ]
Zhong Yanghong commented on KYLIN-1805:
Hi [~Shaofengshi], could you help review this patch? Thanks very much.
[jira] [Commented] (KYLIN-1806) Loose the condition of merge job failure during the step of deleting htables
[ https://issues.apache.org/jira/browse/KYLIN-1806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15335850#comment-15335850 ]
Zhong Yanghong commented on KYLIN-1806:
We once had an HTable that, because of some HBase issue, no longer existed in HBase before that step, which caused a lot of trouble.
[jira] [Commented] (KYLIN-1806) Loose the condition of merge job failure during the step of deleting htables
[ https://issues.apache.org/jira/browse/KYLIN-1806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15335841#comment-15335841 ]
Zhong Yanghong commented on KYLIN-1806:
In the streaming case, if the merge job gets stuck at that step, later merge jobs will not be triggered, which causes another issue. Since this issue is related to HBase rather than Kylin, I think it should not block Kylin's job process.
[jira] [Commented] (KYLIN-1806) Loose the condition of merge job failure during the step of deleting htables
[ https://issues.apache.org/jira/browse/KYLIN-1806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15335832#comment-15335832 ]
Shaofeng SHI commented on KYLIN-1806:
I don't agree with this proposal. The root cause is an issue in the HBase environment, which should be brought to the user's attention and fixed rather than tolerated and ignored. Besides, GC is the last step, so the metadata change has already been committed and a failure at this step does not block cube building. The user can resume the job once HBase becomes stable.
[jira] [Created] (KYLIN-1806) Loose the condition of merge job failure during the step of deleting htables
Zhong Yanghong created KYLIN-1806:
Summary: Loose the condition of merge job failure during the step of deleting htables
Key: KYLIN-1806
URL: https://issues.apache.org/jira/browse/KYLIN-1806
Project: Kylin
Issue Type: Improvement
Components: Job Engine
Reporter: Zhong Yanghong
Assignee: Zhong Yanghong

For the merge job, after the major jobs finish successfully, the Garbage Collection step deletes unused HTables. Sometimes exceptions occur at this step because of an HBase issue; for example, HBase cannot find the related HTables. It would be better to skip these exceptions and let the merge job continue. If the unused HTables still exist in HBase afterwards, they can be deleted by a regular job, StorageCleanupJob.
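The proposal in KYLIN-1806 amounts to letting one step of a chained job fail without failing the whole job. The following is a minimal sketch of that idea only, not Kylin's job engine: `Step`, `ChainedJob`, the `bestEffort` flag and the `warnings` list are all hypothetical names made up for illustration. Failures of a best-effort step are recorded rather than dropped, so they can still be brought to the user's attention.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical job step; this is not Kylin's executable API. */
final class Step {
    interface Body {
        void run() throws Exception;
    }

    final String name;
    final boolean bestEffort; // true: a failure is recorded but does not fail the job
    final Body body;

    Step(String name, boolean bestEffort, Body body) {
        this.name = name;
        this.bestEffort = bestEffort;
        this.body = body;
    }
}

final class ChainedJob {
    enum State { SUCCEED, ERROR }

    /** Failures of best-effort steps, kept so they stay visible to the user. */
    final List<String> warnings = new ArrayList<>();

    /** Runs the steps in order and stops at the first failing required step. */
    State execute(List<Step> steps) {
        for (Step step : steps) {
            try {
                step.body.run();
            } catch (Exception e) {
                if (!step.bestEffort) {
                    return State.ERROR; // a required step failed: stop the job
                }
                warnings.add(step.name + ": " + e.getMessage());
            }
        }
        return State.SUCCEED;
    }
}
```

With the garbage collection step marked best-effort, a "table not found" exception there leaves the job in SUCCEED and adds one warning; the same failure in a required step still ends the job in ERROR.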
[jira] [Commented] (KYLIN-1805) It's easily got stuck when deleting HTables during running the StorageCleanupJob
[ https://issues.apache.org/jira/browse/KYLIN-1805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15335754#comment-15335754 ]
Zhong Yanghong commented on KYLIN-1805:
Will first test on the 1.4-rc branch? If everything works well, will create one patch for the master branch?
[jira] [Updated] (KYLIN-1805) It's easily got stuck when deleting HTables during running the StorageCleanupJob
[ https://issues.apache.org/jira/browse/KYLIN-1805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Zhong Yanghong updated KYLIN-1805:
Attachment: fix_got_stuck_when_deleting_htables_during_storagecleanupjob_1_4_rc.patch
[jira] [Created] (KYLIN-1805) It's easily got stuck when deleting HTables during running the StorageCleanupJob
Zhong Yanghong created KYLIN-1805:
Summary: It's easily got stuck when deleting HTables during running the StorageCleanupJob
Key: KYLIN-1805
URL: https://issues.apache.org/jira/browse/KYLIN-1805
Project: Kylin
Issue Type: Improvement
Components: Tools, Build and Test
Reporter: Zhong Yanghong
Assignee: Zhong Yanghong

In the unlucky case that some unused HTables cannot be deleted successfully, Kylin currently hangs at that point. It would be better to skip the problematic HTables and continue deleting the rest.
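One way to "skip and continue" is to run each deletion under a time limit and move on when it fails or does not return. The sketch below is an illustration under assumptions, not the attached patch: `TableDropper` is a hypothetical stand-in for whatever disables and deletes one HTable, and `SkippingCleanup`, `dropAll` and the timeout parameter are invented names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical stand-in for whatever disables and deletes one HTable. */
interface TableDropper {
    void drop(String tableName) throws Exception;
}

final class SkippingCleanup {
    /**
     * Tries to drop each table in turn, waiting at most timeoutMillis per table.
     * Returns the tables that failed or timed out, so a later run can retry them.
     */
    static List<String> dropAll(List<String> tables, TableDropper dropper, long timeoutMillis) {
        List<String> skipped = new ArrayList<>();
        // Daemon threads, so a drop that never returns cannot keep the JVM alive.
        ExecutorService pool = Executors.newCachedThreadPool(r -> {
            Thread t = new Thread(r, "htable-drop");
            t.setDaemon(true);
            return t;
        });
        try {
            for (String table : tables) {
                Future<?> result = pool.submit(() -> {
                    dropper.drop(table);
                    return null;
                });
                try {
                    result.get(timeoutMillis, TimeUnit.MILLISECONDS);
                } catch (TimeoutException | ExecutionException e) {
                    result.cancel(true); // interrupt the drop if it is still running
                    skipped.add(table);
                } catch (InterruptedException e) {
                    result.cancel(true);
                    Thread.currentThread().interrupt(); // preserve the caller's interrupt
                    skipped.add(table);
                    break; // stop early; remaining tables are left for the next run
                }
            }
        } finally {
            pool.shutdownNow();
        }
        return skipped;
    }
}
```

A cached pool is used instead of a single worker thread so that a deletion which ignores the interrupt does not hold up the tables queued behind it.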
[jira] [Commented] (KYLIN-1590) 2 Kylin Steaming merge jobs of same time range triggered and failed
[ https://issues.apache.org/jira/browse/KYLIN-1590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15335569#comment-15335569 ]
liyang commented on KYLIN-1590:
Agree to use a lock around the update method. Maybe use a bigger lock, a lock on the manager. The cube instance could change between retries; see the lines below. Ken, do you mind making a patch?
{code}
cube = reloadCubeLocal(cube.getName());
update.setCubeInstance(cube);
retry++;
cube = updateCubeWithRetry(update, retry);
{code}
> 2 Kylin Steaming merge jobs of same time range triggered and failed
>
> Key: KYLIN-1590
> URL: https://issues.apache.org/jira/browse/KYLIN-1590
> Project: Kylin
> Issue Type: Bug
> Components: streaming
> Affects Versions: v1.4.0
> Reporter: qianqiaoneng
> Assignee: Zhong Yanghong
> Priority: Critical
>
> 2 issues:
> 1. Kylin allows 2 merge jobs with the same time range to run.
> 2. When 2 merge jobs with the same time range run at the same time, they mix up metadata and always get the HTable not found error.
>
> Build Result of Job site_gmb - 20160415212000_20160415215000 - MERGE - PDT 2016-04-15 14:58:38
> Build Result: ERROR
> Job Engine: ***
> Cube Name: site_gmb
> Source Records Count: 0
> Start Time: Fri Apr 15 14:58:44 PDT 2016
> Duration: 2mins
> MR Waiting: 0mins
> Last Update Time: Fri Apr 15 15:01:42 PDT 2016
> Submitter: SYSTEM
> Error Log: org.apache.hadoop.hbase.TableNotFoundException: KYLIN_NB2J0SRADJ
> at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegionInMeta(ConnectionManager.java:1299)
> at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1128)
> at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:)
> at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1070)
> at org.apache.hadoop.hbase.client.HTable.finishSetup(HTable.java:347)
> at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:201)
> at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:159)
> at org.apache.kylin.storage.hbase.steps.CubeHFileJob.run(CubeHFileJob.java:87)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
> at org.apache.kylin.engine.mr.common.MapReduceExecutable.doWork(MapReduceExecutable.java:118)
> at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:105)
> at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:50)
> at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:105)
> at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:124)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> result code:2
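The "lock on the manager" suggested in KYLIN-1590 can be pictured as making the check for an existing segment and the write of the new one a single critical section. The sketch below is a rough in-memory illustration, not Kylin's CubeManager: `LockedCubeManager`, `tryAddMergeSegment` and `countWinners` are hypothetical names, and the segment list is a plain map rather than real cube metadata. A `synchronized` block only coordinates threads inside one JVM.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical in-memory manager; this is not Kylin's CubeManager. */
final class LockedCubeManager {
    private final Object managerLock = new Object();
    private final Map<String, List<String>> segmentsByCube = new HashMap<>();

    /**
     * Registers a merge segment unless the cube already has one for the same
     * time range. The check and the write share one lock, so two callers
     * cannot both see "no such segment" and both proceed.
     */
    boolean tryAddMergeSegment(String cubeName, String timeRange) {
        synchronized (managerLock) {
            List<String> segments = segmentsByCube.computeIfAbsent(cubeName, k -> new ArrayList<>());
            if (segments.contains(timeRange)) {
                return false;
            }
            segments.add(timeRange);
            return true;
        }
    }

    /** Starts several threads that request the same merge and counts how many succeed. */
    static int countWinners(LockedCubeManager manager, String cube, String range, int callers) {
        AtomicInteger winners = new AtomicInteger();
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < callers; i++) {
            Thread t = new Thread(() -> {
                if (manager.tryAddMergeSegment(cube, range)) {
                    winners.incrementAndGet();
                }
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return winners.get();
    }
}
```

However many threads request a merge of the same time range, only one call to `tryAddMergeSegment` returns true. Holding the same lock across a reload-and-retry sequence would likewise keep the cube instance from changing between retries within that process.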