[ https://issues.apache.org/jira/browse/KYLIN-978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14877152#comment-14877152 ]

Shaofeng SHI commented on KYLIN-978:
------------------------------------

[~sunyerui] I see the problem now, thanks.

Actually, I prefer not to clean up the files on the HBase cluster, because the
rowkey_states file is very small (only a couple of KB), and the HFiles will be
removed by HBase automatically after the bulk load. Apart from these, there are
no other files there.

But we can drop the whole job working dir after a merge, from both the Hadoop
cluster and the HBase cluster, as it will no longer be used.
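A minimal sketch of dropping the job working dir on both clusters after a merge. It assumes each cluster's working root is reachable as a local java.nio Path (a stand-in for HDFS; on a real cluster Hadoop's FileSystem#delete(path, true) would do the recursive delete) and that the per-job dir is named kylin-<jobId>, following the path quoted in the issue description. JobWorkingDirCleanup, jobWorkingDir and cleanupAfterMerge are hypothetical names, not Kylin's actual code.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Sketch only: local java.nio paths stand in for the Hadoop and HBase cluster
// filesystems; on a real cluster, Hadoop's FileSystem#delete(path, true) would
// play the role of deleteRecursively below.
class JobWorkingDirCleanup {

    // Hypothetical layout: the per-job working dir is "<root>/kylin-<jobId>".
    static Path jobWorkingDir(Path root, String jobId) {
        return root.resolve("kylin-" + jobId);
    }

    // Deletes a directory tree bottom-up; returns false if it did not exist.
    static boolean deleteRecursively(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return false;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse order puts every child before its parent directory.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
        return true;
    }

    // Drops the job working dir under every cluster root once the merge is done;
    // returns how many roots actually had a directory to remove.
    static int cleanupAfterMerge(List<Path> clusterRoots, String jobId) throws IOException {
        int deleted = 0;
        for (Path root : clusterRoots) {
            if (deleteRecursively(jobWorkingDir(root, jobId))) {
                deleted++;
            }
        }
        return deleted;
    }

    public static void main(String[] args) throws IOException {
        Path hadoopRoot = Files.createTempDirectory("hadoop-working");
        Path hbaseRoot = Files.createTempDirectory("hbase-working");
        Files.createDirectories(jobWorkingDir(hadoopRoot, "demo").resolve("kylin_intermediate_demo"));
        Files.createDirectories(jobWorkingDir(hbaseRoot, "demo"));
        int n = cleanupAfterMerge(Arrays.asList(hadoopRoot, hbaseRoot), "demo");
        System.out.println("deleted job dirs: " + n);
    }
}
```

Because a missing dir is simply skipped, re-running the cleanup for the same job is harmless, which matters if the step is retried.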

That would keep the code clean and easy to read. As your v2 patch has already
been applied to the code base, I will fix this directly on top of it. Please let
me know if you see any problem. Thanks for your contribution!

> GarbageCollectionStep dropped Hive Intermediate Table but didn't drop 
> external hdfs path
> ----------------------------------------------------------------------------------------
>
>                 Key: KYLIN-978
>                 URL: https://issues.apache.org/jira/browse/KYLIN-978
>             Project: Kylin
>          Issue Type: Bug
>          Components: Job Engine
>    Affects Versions: v1.0, v0.7.2
>            Reporter: Yerui Sun
>            Assignee: Shaofeng SHI
>             Fix For: v1.1
>
>         Attachments: KYLIN-978-1.x-staging-v2.patch, 
> KYLIN-978-1.x-staging-v3.patch, KYLIN-978-2.x-staging.patch
>
>
> In GarbageCollectionStep, the Hive intermediate table created in step 1 was 
> dropped. 
> As the table is an external table, its data was stored in an external HDFS path, 
> like '.../kylin-${jobId}/kylin_intermediate_...', which was not deleted when 
> the Hive table was dropped.
> Considering the purpose of GarbageCollectionStep, the external data path 
> should also be deleted.
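Dropping an EXTERNAL table in Hive removes only the metastore entry and leaves the data files in place, which is the gap described above. A minimal sketch of the fix, assuming a toy in-memory map in place of the real Hive metastore and a local directory in place of the HDFS path; GarbageCollectionSketch, ToyMetastore and dropIntermediateTable are hypothetical names, not Kylin's actual classes.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class GarbageCollectionSketch {

    // Toy stand-in for the Hive metastore: table name -> external data location.
    // Like DROP TABLE on a Hive EXTERNAL table, dropTable removes only metadata.
    static class ToyMetastore {
        final Map<String, Path> externalTables = new HashMap<>();

        void dropTable(String name) {
            externalTables.remove(name);
        }
    }

    // Hypothetical fixed GC step: remember the location, drop the table
    // (all the original step did), then delete the external data path too.
    static void dropIntermediateTable(ToyMetastore metastore, String table) throws IOException {
        Path location = metastore.externalTables.get(table);
        metastore.dropTable(table);
        if (location == null || !Files.exists(location)) {
            return; // nothing left on disk, so the step is safe to re-run
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(location)) {
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p); // children first, then the directory itself
        }
    }

    public static void main(String[] args) throws IOException {
        Path location = Files.createTempDirectory("kylin_intermediate_");
        Files.write(location.resolve("000000_0"), new byte[] {1, 2, 3});
        ToyMetastore metastore = new ToyMetastore();
        metastore.externalTables.put("kylin_intermediate_demo", location);

        dropIntermediateTable(metastore, "kylin_intermediate_demo");
        System.out.println("table dropped: " + !metastore.externalTables.containsKey("kylin_intermediate_demo"));
        System.out.println("data deleted: " + !Files.exists(location));
    }
}
```

The location is looked up before the drop on purpose: once the table is gone, the metastore can no longer tell the step where the external data lives.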



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
