Daniel Dai updated HIVE-13429:
------------------------------
    Attachment: HIVE-13429.1.patch

> Tool to remove dangling scratch dir
> -----------------------------------
>
>                 Key: HIVE-13429
>                 URL: https://issues.apache.org/jira/browse/HIVE-13429
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Daniel Dai
>            Assignee: Daniel Dai
>         Attachments: HIVE-13429.1.patch
>
> We have seen cases where a user leaves the scratch directory behind, which
> eventually exhausts HDFS storage. This can happen when the VM restarts and
> Hive never gets the chance to run its shutdown hook. It applies to both
> HiveCli and HiveServer2. Here we provide an external tool to clear dead
> scratch directories as needed.
> We need a way to identify which scratch directories are in use, and we rely
> on the HDFS write lock for that. Here is how the HDFS write lock works:
> 1. An HDFS client opens an HDFS file for write and closes it only at
> shutdown time.
> 2. A cleanup process can try to open the same HDFS file for write. If the
> client holding the file is still running, the open fails with an exception;
> otherwise, we know the client is dead.
> 3. If the HDFS client dies without closing the file, the NameNode reclaims
> the lease after 10 minutes, i.e., the file held by the dead client becomes
> writable again after 10 minutes.
> So here is how we remove a dangling scratch directory in Hive (illustrative
> sketches of both sides follow below):
> 1. HiveCli/HiveServer2 opens a well-named lock file in its scratch directory
> and closes it only when it is about to drop the scratch directory.
> 2. A command line tool, cleardanglingscratchdir, checks every scratch
> directory and tries to open its lock file for write. If that does not throw
> an exception, the owner is dead and we can safely remove the scratch
> directory.
> 3. The 10-minute lease window means a HiveCli/HiveServer2 instance may be
> dead while we still cannot reclaim its scratch directory for up to another
> 10 minutes, but this should be tolerable.
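>
> As an illustration of step 1 (the session side), here is a minimal Java
> sketch, not the patch itself: open a lock file in the scratch directory at
> startup and hold the stream open until the directory is dropped. The class
> name ScratchDirLock and the lock file name inuse.lck are assumptions made
> for this sketch.
>
> import java.io.IOException;
> import org.apache.hadoop.fs.FSDataOutputStream;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.Path;
>
> // Hypothetical helper; names are illustrative, not from the patch.
> public class ScratchDirLock {
>   private static final String LOCK_FILE = "inuse.lck"; // assumed name
>   private FSDataOutputStream lockStream;
>
>   // Open the lock file for write and keep the stream (and its HDFS write
>   // lease) for the whole lifetime of the HiveCli/HiveServer2 session.
>   public void acquire(FileSystem fs, Path scratchDir) throws IOException {
>     lockStream = fs.create(new Path(scratchDir, LOCK_FILE), true);
>     lockStream.writeBytes("in use\n");
>     lockStream.hsync(); // make the file and lease visible on the NameNode
>   }
>
>   // Close the stream only when the session is about to drop its scratch dir.
>   public void release() throws IOException {
>     if (lockStream != null) {
>       lockStream.close();
>       lockStream = null;
>     }
>   }
> }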
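>
> And a sketch of step 2 (the cleanup side), assuming an append-based probe:
> re-opening the lock file for write fails while the original client still
> holds the lease, and succeeds once the client is dead and the NameNode has
> reclaimed the lease. The method name ownerIsDead is made up for this sketch.
>
> import java.io.IOException;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.Path;
> import org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException;
> import org.apache.hadoop.ipc.RemoteException;
>
> public class ScratchDirChecker {
>   // Returns true if the lock file's writer is gone, i.e. the scratch
>   // directory is dangling and safe to remove.
>   static boolean ownerIsDead(FileSystem fs, Path lockFile) throws IOException {
>     try {
>       // append() re-opens the file for write; HDFS allows a single writer,
>       // so this throws while another client holds the write lease.
>       fs.append(lockFile).close();
>       return true; // lease was free: the owner has died
>     } catch (RemoteException e) {
>       if (AlreadyBeingCreatedException.class.getName()
>           .equals(e.getClassName())) {
>         return false; // lease still held: the owner is alive
>       }
>       throw e; // anything else is a real error
>     }
>   }
> }
>
> Note the false-"alive" direction is the safe one: during the 10-minute lease
> window of item 3 the probe still reports the owner as alive, so cleanup is
> only delayed, never premature.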