Re: HDFS File Appending URGENT

2011-06-16 Thread jagaran das
Is this the Hadoop 0.20.203.0 API?

Does that mean files in HDFS 0.20.203.0 are still immutable,
and that there is no way to append to an existing file in HDFS?

We need to know this urgently, as we have to set up the pipeline accordingly in
production.

Regards,
Jagaran 




From: Xiaobo Gu 
To: common-user@hadoop.apache.org
Sent: Thu, 16 June, 2011 6:26:45 PM
Subject: Re: HDFS File Appending

please refer to FileUtil.CopyMerge

On Fri, Jun 17, 2011 at 8:33 AM, jagaran das  wrote:
> Hi,
>
> We have a requirement where:
>
> There will be a huge number of small files to be pushed to HDFS, and we then
> use Pig to do the analysis.
> To get around the classic "small file issue" we merge the files and push a
> bigger file into HDFS.
> But we are losing time in this merging step of our pipeline.
>
> If we could directly append to an existing file in HDFS, we could save this
> "merging files" time.
>
> Can you please suggest whether there is a newer stable version of Hadoop where
> we can go for appending?
>
> Thanks and Regards,
> Jagaran


Re: HDFS File Appending URGENT

2011-06-16 Thread Xiaobo Gu
You can merge multiple files into a new one; there is no way to
append to an existing file.
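The pre-merge step that FileUtil.copyMerge performs (concatenate every file
under a source directory, in name order, into one destination file) can be
sketched for a local staging directory before a single push to HDFS. The
helper name and paths below are illustrative, not part of Hadoop; it assumes
GNU find/sort:

```shell
# Local analogue of FileUtil.copyMerge: concatenate every regular file
# directly under src_dir, sorted by name, into a single dest_file that
# can then be pushed to HDFS in one put instead of many small ones.
merge_dir() {
    src_dir="$1"
    dest_file="$2"
    # -print0 / sort -z / xargs -0 keep filenames with spaces intact
    find "$src_dir" -maxdepth 1 -type f -print0 | sort -z | xargs -0 cat > "$dest_file"
}
```

After merging, one `hadoop fs -put merged.dat /data/` (path illustrative)
replaces the many small-file puts.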



Re: HDFS File Appending URGENT

2011-06-16 Thread jagaran das
Thanks a lot, Xiaobo.

I have tried the below code on HDFS 0.20.203.0 and it worked.
Is it not stable yet?

import java.io.BufferedWriter;
import java.io.OutputStreamWriter;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HadoopFileWriter {
    public static void main(String[] args) throws Exception {
        URI uri = new URI("hdfs://localhost:9000/Users/jagarandas/Work-Assignment/Analytics/analytics-poc/hadoop-0.20.203.0/data/test.dat");
        Path pt = new Path(uri);
        // Resolve the FileSystem from the URI so the hdfs:// authority is
        // honoured even if it differs from fs.default.name in core-site.xml.
        FileSystem fs = FileSystem.get(uri, new Configuration());
        BufferedWriter br;
        if (fs.isFile(pt)) {
            // File exists: open it for append and start a new line.
            br = new BufferedWriter(new OutputStreamWriter(fs.append(pt)));
            br.newLine();
        } else {
            // File does not exist yet: create it.
            br = new BufferedWriter(new OutputStreamWriter(fs.create(pt, true)));
        }
        String line = args[0];
        System.out.println(line);
        br.write(line);
        br.close();
    }
}

Thanks a lot for your help.

Regards,
Jagaran 







Fw: HDFS File Appending URGENT

2011-06-17 Thread jagaran das
Please help me with this.
I need it very urgently.

Regards,
Jagaran 


- Forwarded Message 
From: jagaran das 
To: common-user@hadoop.apache.org
Sent: Thu, 16 June, 2011 9:51:51 PM
Subject: Re: HDFS File Appending URGENT



Re: HDFS File Appending URGENT

2011-06-17 Thread Tsz Wo (Nicholas), Sze
Hi Jagaran,

Short answer: the append feature is not in any release.  In this sense, it is
not stable.  Below are more details on the append feature's status.

- 0.20.x (includes release 0.20.2)
There are known bugs in append.  The bugs may cause data loss.

- 0.20-append
There has been an effort to fix the known append bugs, but there are no
releases.  I heard Facebook was using it (with additional patches?) in
production, but I do not have the details.

- 0.21
It has a new append design (HDFS-265).  However, the 0.21.0 release is only a
minor release.  It has not undergone testing at scale and should not be
considered stable or suitable for production.  Also, 0.21 development has been
discontinued; newly discovered bugs may not be fixed.

- 0.22, 0.23
Not yet released.


Regards,
Tsz-Wo






Re: HDFS File Appending URGENT

2011-06-17 Thread jagaran das
Thanks a lot, guys.

Another query for production:

Is there any way we can purge the HDFS job and history logs on a time
basis? For example, we want to keep only the last 30 days of logs, and their
size is increasing a lot in production.

Thanks again

Regards,
Jagaran 
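One simple approach, independent of any Hadoop retention settings, is a
scheduled cleanup over the log directories. A sketch (the helper name is
hypothetical, and the directories to pass in are whatever your cluster
actually uses for job/history logs):

```shell
# purge_logs DIR DAYS: delete regular files under DIR whose modification
# time is older than DAYS days. Run periodically (e.g. from cron) against
# the JobTracker history and log directories to keep only recent logs.
purge_logs() {
    find "$1" -type f -mtime "+$2" -print -delete
}
```

For a 30-day window, a daily crontab entry along the lines of
`0 3 * * * purge_logs /var/log/hadoop/history 30` would do it (the path is
illustrative).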






Re: HDFS File Appending URGENT

2011-06-20 Thread 潘飞
It seems that CDH3 (based on Hadoop 0.20.2) supports the append feature.
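For anyone trying that route: on builds derived from the 0.20-append line
(which CDH3 includes), the append API is typically disabled by default and
gated behind a flag in hdfs-site.xml. A sketch; verify the property and the
safety caveats against your distribution's documentation before enabling it
in production, since append on 0.20.x had known data-loss bugs:

```xml
<!-- hdfs-site.xml: enable the append API on 0.20-append-based builds.
     Check your distribution's docs before turning this on in production. -->
<property>
  <name>dfs.support.append</name>
  <value>true</value>
</property>
```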




-- 
Stay Hungry. Stay Foolish.