[jira] [Commented] (KYLIN-4060) "Garbage Collection on HDFS" step failed because of hdfs path not exists

2019-07-30 Thread WangSheng (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-4060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16895966#comment-16895966
 ] 

WangSheng commented on KYLIN-4060:
--

Hi [~wangrupeng], thanks for your advice. I've seen the code you pasted from 
kylin-2.6.x, and it's clear that this problem has already been solved there, so 
I will close this Jira. Thanks again for your help.

> "Garbage Collection on HDFS" step failed because of hdfs path not exists
> 
>
> Key: KYLIN-4060
> URL: https://issues.apache.org/jira/browse/KYLIN-4060
> Project: Kylin
> Issue Type: Bug
> Components: Job Engine
> Affects Versions: v2.4.1
> Reporter: WangSheng
> Priority: Minor
>
> We recently found a bug when using a streaming cube: the last job step, 
> "Garbage Collection on HDFS", failed. The problem is as below:
>  
> {code:java}
> Drop HDFS path on FileSystem: "hdfs://kylin-cluster" 
> HDFS path /user/kylin/kylin_home/kylin_metadata/kylin-03c04b31-5d40-441a-a0df-289f5977b733/cube_test/fact_distinct_columns not exists.
> File /user/kylin/kylin_home/kylin_metadata/kylin-03c04b31-5d40-441a-a0df-289f5977b733/cube_test does not exist.
> {code}
> When I checked the code and the log, I found that the main cause is:
>  
>  # A build job was submitted first, and at step "Update Cube Info" the 
> segment became "READY";
>  # Then Kylin automatically submitted a merge job that included the segment 
> from step 1. The merge job finished quickly and deleted the input segments' 
> HDFS paths;
>  # After the merge job finished, the build job continued with "Hive Cleanup" 
> and "Garbage Collection on HBase", then failed at the last step because the 
> HDFS path had already been deleted in step 2.
> Our version is 2.4.x. I'm not sure whether this bug is fixed in the latest 
> 2.6.x version. If not, please assign this Jira to me, thanks!



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (KYLIN-4060) "Garbage Collection on HDFS" step failed because of hdfs path not exists

2019-07-29 Thread wangrupeng (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-4060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16895768#comment-16895768
 ] 

wangrupeng commented on KYLIN-4060:
---

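// Strip the scheme and authority (e.g. "hdfs://kylin-cluster") from the path.
Path oldPath = Path.getPathWithoutSchemeAndAuthority(new Path(path));
if (fileSystem.exists(oldPath)) {
    // The path is still there: delete it recursively.
    fileSystem.delete(oldPath, true);
    logger.debug("HDFS path " + oldPath + " is dropped.");
    output.append("HDFS path " + oldPath + " is dropped.\n");
} else {
    // The path is already gone: only log it, no exception is thrown.
    logger.debug("HDFS path " + oldPath + " not exists.");
    output.append("HDFS path " + oldPath + " not exists.\n");
}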



[jira] [Commented] (KYLIN-4060) "Garbage Collection on HDFS" step failed because of hdfs path not exists

2019-07-29 Thread wangrupeng (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-4060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16895740#comment-16895740
 ] 

wangrupeng commented on KYLIN-4060:
---

Hi WangSheng, I submitted a streaming cube build job but didn't get any error. 
I tried several times; every time a merge job was submitted, it finished later 
than the last job submitted.

I checked the code of the last step (pasted in a separate comment) and found 
that if the file or path does not exist, it won't throw an exception and change 
the job state to 'error'; it just writes a normal log message.

Otherwise, maybe you can upgrade your Kylin to see whether the problem still 
exists.
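
To illustrate the check-then-delete behaviour described above, here is a 
minimal sketch. It is not Kylin's code: it uses java.nio.file on a local 
temporary file as a stand-in for the Hadoop FileSystem, and the names 
TolerantCleanup and dropPath are hypothetical. Calling dropPath twice on the 
same path mimics a build job reaching its cleanup step after a merge job has 
already removed the path.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical stand-in for a cleanup step; local files replace HDFS here.
class TolerantCleanup {

    /**
     * Deletes the path if it is present and reports what happened.
     * A missing path yields a message instead of an exception.
     */
    public static String dropPath(Path path) throws IOException {
        if (Files.deleteIfExists(path)) {
            return "path " + path + " is dropped.";
        }
        return "path " + path + " does not exist.";
    }

    public static void main(String[] args) throws IOException {
        Path segment = Files.createTempFile("segment", ".tmp");
        // First caller (think: merge job) removes the path.
        System.out.println(dropPath(segment));
        // Second caller (think: build job cleanup) finds it already gone.
        System.out.println(dropPath(segment));
    }
}
```

The second call returns a message rather than raising an error, so a caller 
written this way has no reason to mark the job as failed when the path is 
already gone.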
