[
https://issues.apache.org/jira/browse/KYLIN-998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14733742#comment-14733742
]
Shaofeng SHI commented on KYLIN-998:
------------------------------------
Hi ChunEn, I merged your latest patch and made a small update on top of it.
Since you already parse the job uuid from the hive table name, checking whether
allJobs contains that uuid tells us whether the table belongs to the current
deployment, so there is no need to check whether the corresponding HDFS path
exists. You can see the change in commit
277e1524f5be92ba03447ee33010f10b8de5ca75.
Regarding KYLIN-998-UUIDS.patch: as you know, since 1.0 Kylin has had the GC
step to drop tables automatically, and for the exceptional cases the offline
batch cleanup is good enough, so dropping tables by job UUID is less valuable.
I will therefore hold this patch, and I suggest you upgrade your 0.6
deployments to 0.7.2 or above to get the bug fixes and enhancements.
BTW, your patch is also applied in 0.8 (just renamed to 2.x-staging). Again,
thanks for your contribution!
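
For readers following the thread, the uuid check described above can be
sketched roughly as follows. This is a minimal illustration, not the code in
the commit: the class and method names are hypothetical, and it assumes the
intermediate table name ends with the 36-character job uuid with "-" written
as "_". Only the "kylin_intermediate_" prefix and the allJobs lookup come from
this issue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical helper for illustration; not Kylin's StorageCleanupJob.
final class IntermediateTableFilter {
    static final String PREFIX = "kylin_intermediate_";
    // A canonical UUID string is 36 characters (32 hex digits plus 4 separators).
    static final int UUID_LENGTH = 36;

    // ASSUMPTION: the table name ends with the job uuid, with '-' replaced by '_'.
    static Optional<String> extractJobUuid(String tableName) {
        if (!tableName.startsWith(PREFIX)
                || tableName.length() < PREFIX.length() + UUID_LENGTH) {
            return Optional.empty();
        }
        String tail = tableName.substring(tableName.length() - UUID_LENGTH);
        return Optional.of(tail.replace('_', '-'));
    }

    // Keeps only the tables whose parsed uuid is a known job of this deployment.
    // No HDFS lookup is involved; membership in allJobs is the only test.
    static List<String> tablesOfThisDeployment(List<String> hiveTables, Set<String> allJobs) {
        List<String> result = new ArrayList<>();
        for (String table : hiveTables) {
            Optional<String> uuid = extractJobUuid(table);
            if (uuid.isPresent() && allJobs.contains(uuid.get())) {
                result.add(table);
            }
        }
        return result;
    }
}
```

Tables that fail the prefix check, or whose uuid is not in allJobs, are simply
skipped here; whether a matching table should actually be dropped still depends
on the state of its job, which this sketch does not model.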
> Finish the hive intermediate table clean up job in
> org.apache.kylin.job.hadoop.cube.StorageCleanupJob
> -----------------------------------------------------------------------------------------------------
>
> Key: KYLIN-998
> URL: https://issues.apache.org/jira/browse/KYLIN-998
> Project: Kylin
> Issue Type: Improvement
> Components: Storage - HBase
> Affects Versions: v0.7.2, v0.7.1
> Reporter: nichunen
> Assignee: Shaofeng SHI
> Fix For: v1.1
>
> Attachments: KYLIN-998-0.7-staging-v3.patch,
> KYLIN-998-0.7-staging.patch, KYLIN-998-0.8-v3.patch, KYLIN-998-0.8.patch,
> KYLIN-998-UUIDS.patch
>
>
> Kylin currently ends each cube build with a job step named “Garbage Collection”
> that removes the intermediate data in hdfs/hbase/hive. But if the job stops
> unexpectedly (for example a problem in the hadoop cluster, a bad cube design,
> or the job being discarded by the user), the data is left undeleted.
> In such cases, we can run "hbase org.apache.hadoop.util.RunJar
> $KYLIN_HOME/lib/kylin-job-0.8.1-incubating-SNAPSHOT.jar
> org.apache.kylin.job.hadoop.cube.StorageCleanupJob --delete true" to remove
> the data. But the method "cleanUnusedIntermediateHiveTable" is unfinished.
> My first patch finishes the method; it removes unused hive tables
> whose names begin with "kylin_intermediate_".
> My second patch adds some methods to enable deleting unused data by uuids
> given on the command line or stored in a file.
> I don't know whether the second patch is useful to you; we use it on our
> kylin server to remove data after a cube is deleted.