Re: Cluster Migration

2016-05-10 Thread Ajay Chander
Never mind! I figured it out by saving the RDD with saveAsHadoopFile and passing the compression codec to it. Thank you!
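For reference, a minimal sketch of the approach described above, assuming hypothetical host names and paths, and assuming "saving it as hadoopfile" refers to the saveAsHadoopFile variant that takes a compression codec class:

```scala
import org.apache.hadoop.io.compress.GzipCodec
import org.apache.hadoop.mapred.TextOutputFormat
import org.apache.spark.{SparkConf, SparkContext}

object CompressToGzip {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("CompressToGzip"))
    // wholeTextFiles gives an RDD[(String, String)] of (path, file content)
    // pairs for every file under temp1.
    val files = sc.wholeTextFiles("hdfs://source-nn:8020/temp1")
    // Write each output partition gzip-compressed into temp2 on the
    // target cluster, using the old mapred TextOutputFormat.
    files.saveAsHadoopFile[TextOutputFormat[String, String]](
      "hdfs://target-nn:8020/temp2", classOf[GzipCodec])
    sc.stop()
  }
}
```

Caveat: this produces gzip-compressed part-* files containing "key TAB value" text, not a single archive holding the original files byte for byte, and reading binary data (such as Avro container files) through a text API can corrupt it.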

Re: Cluster Migration

2016-05-10 Thread Ajay Chander
Hi, I have a folder temp1 in HDFS which has files of multiple formats, test1.txt and test2.avsc (an Avro file), in it. Now I want to compress these files together and store them under a temp2 folder in HDFS, expecting that the temp2 folder will have one file test_compress.gz which has test1.txt and test2.avsc under it.

Re: Cluster Migration

2016-05-10 Thread Ajay Chander
Hi Deepak, thanks for your response. If I understand correctly, you suggest reading all of those files into an RDD on the cluster using wholeTextFiles, applying a compression codec to it, and saving the RDD to the other Hadoop cluster? Thank you, Ajay

Re: Cluster Migration

2016-05-10 Thread Ajay Chander
I will try that out. Thank you!

Re: Cluster Migration

2016-05-10 Thread Deepak Sharma
Yes, that's what I intended to say. Thanks, Deepak

Re: Cluster Migration

2016-05-10 Thread Deepak Sharma
Hi Ajay, you can look at the wholeTextFiles method, which gives you an RDD[(String, String)], and then save that RDD with saveAsTextFile. This will serve the purpose. I don't think anything built in like distcp exists in Spark. Thanks, Deepak
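A minimal sketch of this suggestion, with made-up host names and paths (note that wholeTextFiles loads each file entirely into memory, and saving the pair RDD as text writes tuple strings rather than byte-for-byte copies of the originals):

```scala
import org.apache.hadoop.io.compress.GzipCodec
import org.apache.spark.{SparkConf, SparkContext}

object MigrateWithCompression {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("MigrateWithCompression"))
    // RDD[(String, String)]: full file path -> whole file content.
    val files = sc.wholeTextFiles("hdfs://source-nn:8020/temp1")
    // saveAsTextFile has an overload that takes a compression codec
    // class, so each part file lands on the target cluster gzipped.
    files.saveAsTextFile("hdfs://target-nn:8020/temp2", classOf[GzipCodec])
    sc.stop()
  }
}
```

The target path can point at the second cluster's namenode directly, so the copy and the compression happen in one job.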

Cluster Migration

2016-05-10 Thread Ajay Chander
Hi Everyone, we are planning to migrate data between 2 clusters, and I see that distcp doesn't support data compression. Is there any efficient way to compress the data during the migration? Could I implement a Spark job to do this? Thanks.