Yes, that's what I intended to say.

Thanks,
Deepak

On 10 May 2016 11:47 pm, "Ajay Chander" <itsche...@gmail.com> wrote:
> Hi Deepak,
>
> Thanks for your response. If I understand correctly, you suggest reading all
> of those files into an RDD on the cluster using wholeTextFiles, then applying
> a compression codec and saving the RDD to the other Hadoop cluster?
>
> Thank you,
> Ajay
>
> On Tuesday, May 10, 2016, Deepak Sharma <deepakmc...@gmail.com> wrote:
>
>> Hi Ajay,
>> You can look at the wholeTextFiles method, which gives an RDD[(String, String)],
>> and then map each record and saveAsTextFile with a compression codec.
>> That will serve the purpose.
>> I don't think anything like distcp exists in Spark by default.
>>
>> Thanks,
>> Deepak
>>
>> On 10 May 2016 11:27 pm, "Ajay Chander" <itsche...@gmail.com> wrote:
>>
>>> Hi Everyone,
>>>
>>> We are planning to migrate data between 2 clusters, and I see that distcp
>>> doesn't support data compression. Is there an efficient way to compress
>>> the data during the migration? Can I implement a Spark job to do this?
>>> Thanks.