Re: How to delete files older than X days in HDFS/Hadoop

2011-11-27 Thread Bill Graham
If you're able to put your data in directories named by date (e.g.
yyyyMMdd), you can take advantage of the fact that the HDFS client returns
directories sorted by name, so the most recent dirs come back last. You can
then cron a bash script that deletes all but the last N directories
returned, where N is how many days you want to keep.
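A minimal sketch of that cron job, in Python rather than bash (the /data
base path, the keep count, and the old-style -rmr command are all
illustrative assumptions, not anything Bill specified):

```python
import subprocess

def dirs_to_delete(sorted_dirs, keep_last):
    # Names arrive oldest-first (name order == date order for
    # yyyyMMdd-style names), so everything except the last
    # keep_last entries is up for deletion.
    if keep_last <= 0:
        return list(sorted_dirs)
    return list(sorted_dirs)[:-keep_last]

def purge_old_date_dirs(base="/data", keep_last=7):
    # 'hadoop fs -ls' prints one entry per line; directory lines
    # start with 'd' and the last whitespace-separated field is
    # the path.
    out = subprocess.check_output(["hadoop", "fs", "-ls", base], text=True)
    dirs = sorted(line.split()[-1]
                  for line in out.splitlines()
                  if line.startswith("d"))
    for path in dirs_to_delete(dirs, keep_last):
        subprocess.check_call(["hadoop", "fs", "-rmr", path])
```

The deletion logic is kept in its own function so it can be sanity-checked
without touching a cluster.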



On Sat, Nov 26, 2011 at 8:26 PM, Ronnie Dove ron...@oceansync.com wrote:

 Hello Raimon,

 I like the idea of being able to search through files on HDFS so that we
 can find keywords or timestamp criteria, something that OceanSync will be
 doing in the future as a tool option.  The others have given you some
 great ideas, but I wanted to help you out from a Java API perspective.  If
 you are a Java programmer, you can use FileSystem.listStatus(), which
 returns the directory listing as a FileStatus[] array.  You then walk
 through the FileStatus array checking whether each entry is a file or a
 directory.  If it is a file, check its timestamp with
 FileStatus.getModificationTime(); if it is a directory, process it again
 in a while loop to check its contents.  This sounds involved, but in
 testing it is fairly fast and accurate.  Below are the two APIs needed for
 this method:


 http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/fs/FileStatus.html

 http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/fs/FileSystem.html

 
 Ronnie Dove
 OceanSync Management Developer
 http://www.oceansync.com
 RDove on irc.freenode.net #Hadoop


 - Original Message -
 From: Raimon Bosch raimon.bo...@gmail.com
 To: common-user@hadoop.apache.org
 Cc:
 Sent: Saturday, November 26, 2011 10:01 AM
 Subject: How to delete files older than X days in HDFS/Hadoop

 Hi,

 I'm wondering how to delete files older than X days with HDFS/Hadoop. On
 Linux we can do it with the following command:

 find ~/datafolder/* -mtime +7 -exec rm {} \;

 Any ideas?




How to delete files older than X days in HDFS/Hadoop

2011-11-26 Thread Raimon Bosch
Hi,

I'm wondering how to delete files older than X days with HDFS/Hadoop. On
Linux we can do it with the following command:

find ~/datafolder/* -mtime +7 -exec rm {} \;

Any ideas?


RE: How to delete files older than X days in HDFS/Hadoop

2011-11-26 Thread Uma Maheswara Rao G
AFAIK, there is no built-in facility for this in the HDFS command line.
One option is to write a small client program that collects the files under
the root matching your condition and invokes delete on them.
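Such a client doesn't even have to use the Java API; a sketch in Python
that parses the recursive listing output instead (the field positions are
assumed from the 0.20-era `hadoop fs -ls` format: perms, replication,
owner, group, size, date, time, path):

```python
import subprocess
from datetime import datetime, timedelta

def older_than(ls_output, cutoff):
    # Collect paths of plain files (lines starting with '-') whose
    # modification timestamp is before the cutoff.
    old = []
    for line in ls_output.splitlines():
        parts = line.split()
        if len(parts) < 8 or not line.startswith("-"):
            continue  # skips the 'Found N items' header and directories
        mtime = datetime.strptime(parts[5] + " " + parts[6],
                                  "%Y-%m-%d %H:%M")
        if mtime < cutoff:
            old.append(parts[7])
    return old

def delete_older_than(root, days):
    # 'hadoop fs -lsr' recurses for us, so no manual tree walk needed.
    listing = subprocess.check_output(["hadoop", "fs", "-lsr", root],
                                      text=True)
    cutoff = datetime.now() - timedelta(days=days)
    for path in older_than(listing, cutoff):
        subprocess.check_call(["hadoop", "fs", "-rm", path])
```

Parsing the listing text is brittle if the output format ever changes,
which is the argument for the Java API route discussed elsewhere in this
thread.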

Regards,
Uma

From: Raimon Bosch [raimon.bo...@gmail.com]
Sent: Saturday, November 26, 2011 8:31 PM
To: common-user@hadoop.apache.org
Subject: How to delete files older than X days in HDFS/Hadoop

Hi,

I'm wondering how to delete files older than X days with HDFS/Hadoop. On
Linux we can do it with the following command:

find ~/datafolder/* -mtime +7 -exec rm {} \;

Any ideas?


Re: How to delete files older than X days in HDFS/Hadoop

2011-11-26 Thread Prashant Sharma
It won't be that easy, but it's possible to script. I did something like
this:

$HADOOP_HOME/bin/hadoop fs -rmr `$HADOOP_HOME/bin/hadoop fs -ls | grep '.*2011.11.1[1-8].*' | cut -f 19 -d ' '`

Note the single space passed to cut's -d option as the delimiter; with the
repeated spaces in the listing output, the path lands in field 19 here.

-P

On Sat, Nov 26, 2011 at 8:46 PM, Uma Maheswara Rao G
mahesw...@huawei.com wrote:

 AFAIK, there is no built-in facility for this in the HDFS command line.
 One option is to write a small client program that collects the files
 under the root matching your condition and invokes delete on them.

 Regards,
 Uma
 
 From: Raimon Bosch [raimon.bo...@gmail.com]
 Sent: Saturday, November 26, 2011 8:31 PM
 To: common-user@hadoop.apache.org
 Subject: How to delete files older than X days in HDFS/Hadoop

 Hi,

 I'm wondering how to delete files older than X days with HDFS/Hadoop. On
 Linux we can do it with the following command:

 find ~/datafolder/* -mtime +7 -exec rm {} \;

 Any ideas?



Re: How to delete files older than X days in HDFS/Hadoop

2011-11-26 Thread Ronnie Dove
Hello Raimon, 

I like the idea of being able to search through files on HDFS so that we
can find keywords or timestamp criteria, something that OceanSync will be
doing in the future as a tool option.  The others have given you some great
ideas, but I wanted to help you out from a Java API perspective.  If you
are a Java programmer, you can use FileSystem.listStatus(), which returns
the directory listing as a FileStatus[] array.  You then walk through the
FileStatus array checking whether each entry is a file or a directory.  If
it is a file, check its timestamp with FileStatus.getModificationTime(); if
it is a directory, process it again in a while loop to check its contents.
This sounds involved, but in testing it is fairly fast and accurate.  Below
are the two APIs needed for this method:

http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/fs/FileStatus.html
http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/fs/FileSystem.html
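The crawl described above boils down to a work-list loop. A
language-neutral sketch in Python, with the directory lister injected as a
function so the walk itself needs no cluster (in the Java API, list_dir
would wrap FileSystem.listStatus() and mtime would come from
FileStatus.getModificationTime()):

```python
def walk_old_files(list_dir, path, cutoff):
    # Work-list version of the FileStatus crawl: list_dir(path)
    # yields (child_path, is_dir, mtime) tuples. Files older than
    # cutoff are collected; directories are pushed back onto the
    # pending list to be listed in turn.
    pending = [path]
    old_files = []
    while pending:
        for child, is_dir, mtime in list_dir(pending.pop()):
            if is_dir:
                pending.append(child)
            elif mtime < cutoff:
                old_files.append(child)
    return old_files
```

Because the lister is a parameter, the same loop works against HDFS, a
local filesystem, or an in-memory tree used for testing.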


Ronnie Dove
OceanSync Management Developer
http://www.oceansync.com
RDove on irc.freenode.net #Hadoop


- Original Message -
From: Raimon Bosch raimon.bo...@gmail.com
To: common-user@hadoop.apache.org
Cc: 
Sent: Saturday, November 26, 2011 10:01 AM
Subject: How to delete files older than X days in HDFS/Hadoop

Hi,

I'm wondering how to delete files older than X days with HDFS/Hadoop. On
Linux we can do it with the following command:

find ~/datafolder/* -mtime +7 -exec rm {} \;

Any ideas?