Re: How to delete files older than X days in HDFS/Hadoop
If you're able to put your data in directories named by date (e.g. MMdd), you can take advantage of the fact that the HDFS client returns directory listings sorted by name, so the most recent directories come last. You can then cron a bash script that deletes all but the last N directories returned, where N is how many days you want to keep.

On Sat, Nov 26, 2011 at 8:26 PM, Ronnie Dove ron...@oceansync.com wrote:
> Hello Raimon, I like the idea of being able to search through files on HDFS so that we can find keywords or timestamp criteria [...]
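The keep-the-last-N-directories idea above can be sketched roughly as follows. The `prune_old` helper and the /data/daily layout are hypothetical, and `head -n -N` assumes GNU coreutils:

```shell
# prune_old N: read directory names on stdin (date-named, so lexicographic
# order matches chronological order) and print all but the newest N --
# i.e. the ones that should be deleted.
prune_old() {
  sort | head -n -"$1"
}

# Example with hypothetical date-named directories, keeping the newest 2:
printf '%s\n' 20111120 20111126 20111121 20111125 | prune_old 2
# prints:
# 20111120
# 20111121

# In a cron job this would feed the HDFS delete, e.g.:
#   hadoop fs -ls /data/daily | awk '{print $NF}' | grep '^/data/daily/' \
#     | prune_old 7 | xargs -r -n1 hadoop fs -rmr
```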
How to delete files older than X days in HDFS/Hadoop
Hi, I'm wondering how to delete files older than X days with HDFS/Hadoop. On Linux we can do it with the following command:

find ~/datafolder/* -mtime +7 -exec rm {} \;

Any ideas?
RE: How to delete files older than X days in HDFS/Hadoop
AFAIK, there is no facility like this in HDFS through the command line. One option is to write a small client program that collects the files from the root based on your condition and invokes delete on them.

Regards,
Uma

From: Raimon Bosch [raimon.bo...@gmail.com]
Sent: Saturday, November 26, 2011 8:31 PM
To: common-user@hadoop.apache.org
Subject: How to delete files older than X days in HDFS/Hadoop
> Hi, I'm wondering how to delete files older than X days with HDFS/Hadoop [...]
Re: How to delete files older than X days in HDFS/Hadoop
Won't be that easy, but it's possible to write. I did something like this:

$HADOOP_HOME/bin/hadoop fs -rmr `$HADOOP_HOME/bin/hadoop fs -ls | grep '.*2011.11.1[1-8].*' | cut -f 19 -d \ `

Notice the space after -d \ (the cut delimiter is a space character).

-P

On Sat, Nov 26, 2011 at 8:46 PM, Uma Maheswara Rao G mahesw...@huawei.com wrote:
> AFAIK, there is no facility like this in HDFS through command line. One option is, write small client program [...]
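A variant of the same idea that computes the cutoff date instead of hard-coding the 2011.11.1[1-8] range. This is a sketch: it assumes GNU date, date-named paths like those matched above, and a hypothetical /data/logs base directory:

```shell
# is_older NAME CUTOFF: succeeds when the zero-padded date string NAME sorts
# before CUTOFF (string order equals date order for yyyy.MM.dd names).
is_older() { [[ "$1" < "$2" ]]; }

# Cutoff as a yyyy.MM.dd string, 7 days back (GNU date syntax).
CUTOFF=$(date -d "7 days ago" +%Y.%m.%d)

# Hypothetical cleanup pass over date-named subdirectories of /data/logs:
#   hadoop fs -ls /data/logs | awk '{print $NF}' | while read -r path; do
#     is_older "$(basename "$path")" "$CUTOFF" && hadoop fs -rmr "$path"
#   done
```

String comparison is enough here only because the date format is fixed-width and zero-padded; a layout like 2011.11.5 would break it.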
Re: How to delete files older than X days in HDFS/Hadoop
Hello Raimon,

I like the idea of being able to search through files on HDFS so that we can find keywords or timestamp criteria, something that OceanSync will be doing in the future as a tool option. The others have given you some great ideas, but I wanted to help you out from a Java API perspective. If you are a Java programmer, you would use FileSystem.listStatus(), which returns the directory listing as a FileStatus[] array. You would then walk through the array, checking whether each FileStatus is a file or a directory. If it is a file, you check its timestamp using FileStatus.getModificationTime(); if it is a directory, you process it again recursively to check its contents. This sounds tough, but in testing it is fairly fast and accurate. Below are the two APIs needed for this method:

http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/fs/FileStatus.html
http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/fs/FileSystem.html

Ronnie Dove
OceanSync Management Developer
http://www.oceansync.com
RDove on irc.freenode.net #Hadoop

- Original Message -
From: Raimon Bosch raimon.bo...@gmail.com
To: common-user@hadoop.apache.org
Sent: Saturday, November 26, 2011 10:01 AM
Subject: How to delete files older than X days in HDFS/Hadoop
> Hi, I'm wondering how to delete files older than X days with HDFS/Hadoop [...]
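A minimal sketch of the recursive walk described above, assuming hadoop-common on the classpath; the /datafolder path and the seven-day cutoff are illustrative, not part of the original message:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class DeleteOldFiles {
    // Recursively delete files under `dir` whose modification time
    // (milliseconds since the epoch) is older than `cutoffMillis`.
    public static void deleteOlderThan(FileSystem fs, Path dir, long cutoffMillis)
            throws java.io.IOException {
        for (FileStatus status : fs.listStatus(dir)) {
            if (status.isDir()) {
                // directory: descend and check its contents
                deleteOlderThan(fs, status.getPath(), cutoffMillis);
            } else if (status.getModificationTime() < cutoffMillis) {
                fs.delete(status.getPath(), false); // false: non-recursive
            }
        }
    }

    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        long cutoff = System.currentTimeMillis() - 7L * 24 * 60 * 60 * 1000;
        deleteOlderThan(fs, new Path("/datafolder"), cutoff);
    }
}
```

This deletes only files and leaves emptied directories in place; removing empty directories as well would take an extra check after the loop.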