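Hi Ajay,

You can look at the wholeTextFiles method on SparkContext, which returns an RDD[(String, String)] of (file path, file content) pairs, and then write the contents back out with saveAsTextFile, which has an overload that takes a compression codec. That should serve the purpose. I don't think anything like distcp exists out of the box in Spark.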
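A minimal sketch of that idea, assuming the data is plain text and that gzip is an acceptable codec. The object name CompressedCopy and the two command-line arguments (source and destination URIs) are made up for illustration, and this has not been run against your clusters:

```scala
import org.apache.hadoop.io.compress.GzipCodec
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical job name; not an existing Spark utility.
object CompressedCopy {
  def main(args: Array[String]): Unit = {
    // Assumed arguments: source and destination URIs,
    // e.g. hdfs://<source-namenode>/path and hdfs://<target-namenode>/path
    val Array(src, dst) = args

    val sc = new SparkContext(new SparkConf().setAppName("CompressedCopy"))
    try {
      // One (path, content) pair per input file; each file is read whole.
      val files = sc.wholeTextFiles(src)

      // Drop the paths and write the contents gzip-compressed to the target.
      files.values.saveAsTextFile(dst, classOf[GzipCodec])
    } finally {
      sc.stop()
    }
  }
}
```

Note that this does not keep the original file names or directory layout: the output is a set of part-NNNNN.gz files under the destination directory. wholeTextFiles also holds each file in memory as a single string, so it suits many small files; for large files sc.textFile(src) followed by the same saveAsTextFile call is probably the safer choice.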
Thanks
Deepak

On 10 May 2016 11:27 pm, "Ajay Chander" <itsche...@gmail.com> wrote:

> Hi Everyone,
>
> we are planning to migrate the data between 2 clusters and I see distcp
> doesn't support data compression. Is there any efficient way to compress
> the data during the migration ? Can I implement any spark job to do this ?
> Thanks.