[jira] [Commented] (KYLIN-1805) It's easily got stuck when deleting HTables during running the StorageCleanupJob

2016-06-17 Thread Zhong Yanghong (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15336552#comment-15336552
 ] 

Zhong Yanghong commented on KYLIN-1805:
---

The patch has been tested at eBay and works well.

> It's easily got stuck when deleting HTables during running the 
> StorageCleanupJob
> 
>
> Key: KYLIN-1805
> URL: https://issues.apache.org/jira/browse/KYLIN-1805
> Project: Kylin
>  Issue Type: Improvement
>  Components: Tools, Build and Test
>Reporter: Zhong Yanghong
>Assignee: Zhong Yanghong
> Attachments: 
> fix_got_stuck_when_deleting_htables_during_storagecleanupjob_1_4_rc.patch
>
>
> In the unlucky case that some unused HTables cannot be deleted, Kylin 
> currently hangs at that point. It would be better to skip the problematic 
> HTables and continue deleting the rest.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KYLIN-1806) Loose the condition of merge job failure during the step of deleting htables

2016-06-17 Thread Zhong Yanghong (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15336534#comment-15336534
 ] 

Zhong Yanghong commented on KYLIN-1806:
---

However, if an HTable used by a merge job is deleted unexpectedly because of an 
HBase issue, the related merge job can never complete. I still think this kind 
of HBase issue should not prevent a Kylin job from completing. We have tools 
that regularly check for and delete unused HTables, so I still think an 
asynchronous approach would be better.

> Loose the condition of merge job failure during the step of deleting htables
> 
>
> Key: KYLIN-1806
> URL: https://issues.apache.org/jira/browse/KYLIN-1806
> Project: Kylin
>  Issue Type: Improvement
>  Components: Job Engine
>Reporter: Zhong Yanghong
>Assignee: Zhong Yanghong
>
> After the main steps of a merge job finish successfully, the job reaches the 
> Garbage Collection step, which deletes unused HTables. Exceptions sometimes 
> occur there because of HBase issues; for example, HBase cannot find the 
> related HTables. It would be better to skip these exceptions and let the merge 
> job continue. If the unused HTables still exist in HBase afterwards, they can 
> be deleted later by a regular job, StorageCleanupJob.





[jira] [Commented] (KYLIN-1806) Loose the condition of merge job failure during the step of deleting htables

2016-06-17 Thread Zhong Yanghong (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15336516#comment-15336516
 ] 

Zhong Yanghong commented on KYLIN-1806:
---

You are right and I made a mistake. The constraint is on the cube segment 
level. 






[jira] [Commented] (KYLIN-1806) Loose the condition of merge job failure during the step of deleting htables

2016-06-17 Thread Shaofeng SHI (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15336233#comment-15336233
 ] 

Shaofeng SHI commented on KYLIN-1806:
-

As I remember, the constraint is at the cube segment level, not the job level. 
That means that even if a merge job fails at the GC step, the segment has 
already been updated to READY, so it will not block the next merge job. I might 
be wrong; I just want to double-check.






[jira] [Commented] (KYLIN-1805) It's easily got stuck when deleting HTables during running the StorageCleanupJob

2016-06-17 Thread Zhong Yanghong (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15335853#comment-15335853
 ] 

Zhong Yanghong commented on KYLIN-1805:
---

Hi [~Shaofengshi], could you help review this patch? Thanks very much.






[jira] [Commented] (KYLIN-1806) Loose the condition of merge job failure during the step of deleting htables

2016-06-17 Thread Zhong Yanghong (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15335850#comment-15335850
 ] 

Zhong Yanghong commented on KYLIN-1806:
---

We once had an HTable that, because of an HBase issue, no longer existed in 
HBase by the time that step ran, which caused a lot of trouble.






[jira] [Commented] (KYLIN-1806) Loose the condition of merge job failure during the step of deleting htables

2016-06-17 Thread Zhong Yanghong (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15335841#comment-15335841
 ] 

Zhong Yanghong commented on KYLIN-1806:
---

In the streaming case, if a merge job gets stuck at that step, later merge jobs 
will not be triggered, which causes another issue. Since the problem lies in 
HBase rather than Kylin, I think it should not block Kylin's job process.






[jira] [Commented] (KYLIN-1806) Loose the condition of merge job failure during the step of deleting htables

2016-06-17 Thread Shaofeng SHI (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15335832#comment-15335832
 ] 

Shaofeng SHI commented on KYLIN-1806:
-

I don't agree with this proposal. The root cause is an issue in the HBase 
environment, and it should get the user's attention so that the problem is 
fixed instead of being tolerated and ignored.
Besides, GC is the last step and the metadata change has already been 
committed, so a failure at this step does not block cube building. The user can 
resume the job once HBase is stable again.






[jira] [Created] (KYLIN-1806) Loose the condition of merge job failure during the step of deleting htables

2016-06-17 Thread Zhong Yanghong (JIRA)
Zhong Yanghong created KYLIN-1806:
-

 Summary: Loose the condition of merge job failure during the step 
of deleting htables
 Key: KYLIN-1806
 URL: https://issues.apache.org/jira/browse/KYLIN-1806
 Project: Kylin
  Issue Type: Improvement
  Components: Job Engine
Reporter: Zhong Yanghong
Assignee: Zhong Yanghong


After the main steps of a merge job finish successfully, the job reaches the 
Garbage Collection step, which deletes unused HTables. Exceptions sometimes 
occur there because of HBase issues; for example, HBase cannot find the related 
HTables. It would be better to skip these exceptions and let the merge job 
continue. If the unused HTables still exist in HBase afterwards, they can be 
deleted later by a regular job, StorageCleanupJob.
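
The proposal above could be sketched roughly as follows. This is only an 
illustration and not the Kylin implementation: `TableAdmin`, `Result` and 
`collect` are hypothetical names, and `TableAdmin` stands in for whatever HBase 
admin calls the real Garbage Collection step makes. A table that HBase can no 
longer find is treated as already deleted, and any other failure is recorded 
for a later cleanup run instead of failing the step.

```java
import java.util.ArrayList;
import java.util.List;

final class TolerantGarbageCollection {

    /** Hypothetical stand-in for the HBase admin calls the real step would make. */
    interface TableAdmin {
        boolean exists(String tableName) throws Exception;

        void drop(String tableName) throws Exception;
    }

    /** Outcome of the step: it always completes, but reports what it left behind. */
    static final class Result {
        final List<String> dropped = new ArrayList<>();
        final List<String> alreadyGone = new ArrayList<>();
        final List<String> leftForCleanup = new ArrayList<>();
    }

    static Result collect(List<String> unusedTables, TableAdmin admin) {
        Result result = new Result();
        for (String table : unusedTables) {
            try {
                if (!admin.exists(table)) {
                    // The table vanished before this step ran: nothing left to delete.
                    result.alreadyGone.add(table);
                    continue;
                }
                admin.drop(table);
                result.dropped.add(table);
            } catch (Exception e) {
                // Do not fail the merge job; leave the table for a later cleanup run.
                result.leftForCleanup.add(table);
            }
        }
        return result;
    }
}
```

The caller can log `leftForCleanup` so that the underlying HBase problem still 
gets attention even though the job itself continues.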





[jira] [Commented] (KYLIN-1805) It's easily got stuck when deleting HTables during running the StorageCleanupJob

2016-06-17 Thread Zhong Yanghong (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15335754#comment-15335754
 ] 

Zhong Yanghong commented on KYLIN-1805:
---

Shall I first test on the 1.4-rc branch and, if everything works well, then 
create a patch for the master branch?




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KYLIN-1805) It's easily got stuck when deleting HTables during running the StorageCleanupJob

2016-06-17 Thread Zhong Yanghong (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-1805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhong Yanghong updated KYLIN-1805:
--
Attachment: 
fix_got_stuck_when_deleting_htables_during_storagecleanupjob_1_4_rc.patch






[jira] [Created] (KYLIN-1805) It's easily got stuck when deleting HTables during running the StorageCleanupJob

2016-06-17 Thread Zhong Yanghong (JIRA)
Zhong Yanghong created KYLIN-1805:
-

 Summary: It's easily got stuck when deleting HTables during 
running the StorageCleanupJob
 Key: KYLIN-1805
 URL: https://issues.apache.org/jira/browse/KYLIN-1805
 Project: Kylin
  Issue Type: Improvement
  Components: Tools, Build and Test
Reporter: Zhong Yanghong
Assignee: Zhong Yanghong


In the unlucky case that some unused HTables cannot be deleted, Kylin currently 
hangs at that point. It would be better to skip the problematic HTables and 
continue deleting the rest.
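
A minimal sketch of the skip-and-continue behaviour described above, under 
stated assumptions: `TableDropper` is a hypothetical stand-in for the HBase 
disable-and-delete calls, and the per-table timeout is an assumption made for 
illustration, not something the attached patch is known to do. Each table gets 
a bounded amount of time; a table that fails or times out is recorded and the 
loop moves on to the next one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

final class SkipOnFailureCleanup {

    /** Hypothetical stand-in for the HBase disable-and-delete calls. */
    interface TableDropper {
        void drop(String tableName) throws Exception;
    }

    /**
     * Tries to drop every table, waiting at most timeoutSeconds for each one.
     * Returns the names of the tables that were skipped.
     */
    static List<String> dropAll(List<String> tables, TableDropper dropper, long timeoutSeconds) {
        List<String> skipped = new ArrayList<>();
        // Daemon threads, so a drop that never returns cannot keep the JVM alive.
        ExecutorService executor = Executors.newCachedThreadPool(runnable -> {
            Thread thread = new Thread(runnable, "htable-drop");
            thread.setDaemon(true);
            return thread;
        });
        try {
            for (String table : tables) {
                Future<Object> future = executor.submit(() -> {
                    dropper.drop(table);
                    return null;
                });
                try {
                    future.get(timeoutSeconds, TimeUnit.SECONDS);
                } catch (Exception e) {
                    // Failure, timeout, or interruption: give up on this table only.
                    if (e instanceof InterruptedException) {
                        Thread.currentThread().interrupt();
                    }
                    future.cancel(true);
                    skipped.add(table);
                }
            }
        } finally {
            executor.shutdownNow();
        }
        return skipped;
    }
}
```

The returned list can be logged so that the skipped tables are retried on the 
next cleanup run.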





[jira] [Commented] (KYLIN-1590) 2 Kylin Steaming merge jobs of same time range triggered and failed

2016-06-17 Thread liyang (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15335569#comment-15335569
 ] 

liyang commented on KYLIN-1590:
---

I agree with putting a lock around the update method. Maybe use a bigger lock, 
one on the manager. The cube instance could change between retries; see the 
lines below. Ken, would you mind making a patch?

{code}
cube = reloadCubeLocal(cube.getName());
update.setCubeInstance(cube);
retry++;
cube = updateCubeWithRetry(update, retry);
{code}
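
One way to read the manager-level lock suggestion, as a rough sketch only: 
`CubeManagerSketch`, `managerLock` and `appendMergeSegment` are hypothetical 
names and this is not Kylin's actual CubeManager. Reading the current state and 
updating it happen under one lock, so the state cannot change in between, and a 
second merge request for the same time range is rejected instead of running 
alongside the first.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class CubeManagerSketch {

    // One lock for the whole manager rather than one per cube instance.
    private final Object managerLock = new Object();

    // Hypothetical in-memory state: cube name -> time ranges of its merge segments.
    private final Map<String, Set<String>> segmentsByCube = new HashMap<>();

    /** Registers a merge segment; returns false if that time range is already taken. */
    boolean appendMergeSegment(String cubeName, String timeRange) {
        synchronized (managerLock) {
            // Look up the latest state and update it while holding the same lock.
            Set<String> segments = segmentsByCube.computeIfAbsent(cubeName, name -> new HashSet<>());
            return segments.add(timeRange);
        }
    }
}
```

With this shape, if two callers race on the same range of the same cube, 
exactly one of them gets `true`.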

> 2 Kylin Steaming merge jobs of same time range triggered and failed 
> 
>
> Key: KYLIN-1590
> URL: https://issues.apache.org/jira/browse/KYLIN-1590
> Project: Kylin
>  Issue Type: Bug
>  Components: streaming
>Affects Versions: v1.4.0
>Reporter: qianqiaoneng
>Assignee: Zhong Yanghong
>Priority: Critical
>
> 2 issues:
> 1. Kylin allows 2 merge jobs with the same time range to run.
> 2. When 2 merge jobs with the same time range run at the same time, they 
> mix up the metadata and always fail with an HTable-not-found error.
> Build Result of Job site_gmb - 20160415212000_20160415215000 - MERGE - PDT 
> 2016-04-15 14:58:38
> Build Result: ERROR
> Job Engine: ***
> Cube Name: site_gmb
> Source Records Count: 0
> Start Time: Fri Apr 15 14:58:44 PDT 2016
> Duration: 2mins
> MR Waiting: 0mins
> Last Update Time: Fri Apr 15 15:01:42 PDT 2016
> Submitter: SYSTEM
> Error Log: org.apache.hadoop.hbase.TableNotFoundException: KYLIN_NB2J0SRADJ
>   at 
> org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegionInMeta(ConnectionManager.java:1299)
>   at 
> org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1128)
>   at 
> org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:)
>   at 
> org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1070)
>   at org.apache.hadoop.hbase.client.HTable.finishSetup(HTable.java:347)
>   at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:201)
>   at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:159)
>   at 
> org.apache.kylin.storage.hbase.steps.CubeHFileJob.run(CubeHFileJob.java:87)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
>   at 
> org.apache.kylin.engine.mr.common.MapReduceExecutable.doWork(MapReduceExecutable.java:118)
>   at 
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:105)
>   at 
> org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:50)
>   at 
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:105)
>   at 
> org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:124)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> result code:2


