Hi, I have a folder temp1 in HDFS that contains files in multiple formats, e.g. test1.txt and test2.avsc (an Avro schema file). I want to compress these files together and store the result under a temp2 folder in HDFS, so that temp2 holds a single file, test_compress.gz, with test1.txt and test2.avsc inside it. Is there an efficient way to achieve this?
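(A note on the goal: gzip by itself compresses a single byte stream and does not preserve file boundaries, so the usual way to get one compressed file holding several others is to tar them first and then gzip the tar. Below is a minimal sketch, not a tested implementation, using the Hadoop FileSystem API together with Apache Commons Compress; it assumes commons-compress is on the classpath, the /temp1 and /temp2 paths come from the question, and the test_compress.tar.gz output name is my own choice.)

    import org.apache.commons.compress.archivers.tar.{TarArchiveEntry, TarArchiveOutputStream}
    import org.apache.commons.compress.compressors.gzip.GzipCompressorOutputStream
    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}
    import org.apache.hadoop.io.IOUtils

    object HdfsTarGz {
      def main(args: Array[String]): Unit = {
        val conf = new Configuration()
        val fs = FileSystem.get(conf)
        // One gzipped tar stream, written directly into HDFS
        val out = new TarArchiveOutputStream(
          new GzipCompressorOutputStream(fs.create(new Path("/temp2/test_compress.tar.gz"))))
        // Add every file under /temp1 as its own tar entry, so file
        // boundaries survive inside the single compressed archive
        for (status <- fs.listStatus(new Path("/temp1")) if status.isFile) {
          val entry = new TarArchiveEntry(status.getPath.getName)
          entry.setSize(status.getLen)
          out.putArchiveEntry(entry)
          val in = fs.open(status.getPath)
          IOUtils.copyBytes(in, out, conf, false) // false: keep the tar stream open
          in.close()
          out.closeArchiveEntry()
        }
        out.close()
      }
    }

This is a plain JVM program, so it runs on a single node; that is fine for a handful of files but it will not parallelize across a cluster.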
Thanks,
Aj

On Tuesday, May 10, 2016, Ajay Chander <itsche...@gmail.com> wrote:
> I will try that out. Thank you!
>
> On Tuesday, May 10, 2016, Deepak Sharma <deepakmc...@gmail.com> wrote:
>> Yes, that's what I intended to say.
>>
>> Thanks,
>> Deepak
>>
>> On 10 May 2016 11:47 pm, "Ajay Chander" <itsche...@gmail.com> wrote:
>>> Hi Deepak,
>>> Thanks for your response. If I understand correctly, you suggest
>>> reading all of those files into an RDD on the cluster using
>>> wholeTextFiles, applying a compression codec to it, and then saving
>>> the RDD to the other Hadoop cluster?
>>>
>>> Thank you,
>>> Ajay
>>>
>>> On Tuesday, May 10, 2016, Deepak Sharma <deepakmc...@gmail.com> wrote:
>>>> Hi Ajay,
>>>> You can look at the wholeTextFiles method, which gives you an
>>>> RDD[(String, String)], and then save it with saveAsTextFile.
>>>> That will serve the purpose.
>>>> I don't think anything like distcp exists in Spark by default.
>>>>
>>>> Thanks,
>>>> Deepak
>>>>
>>>> On 10 May 2016 11:27 pm, "Ajay Chander" <itsche...@gmail.com> wrote:
>>>>> Hi Everyone,
>>>>>
>>>>> We are planning to migrate data between 2 clusters, and I see that
>>>>> distcp doesn't support data compression. Is there any efficient way
>>>>> to compress the data during the migration? Can I implement a Spark
>>>>> job to do this? Thanks.
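For reference, a minimal sketch of the approach Deepak describes (the namenode hostnames source-nn and target-nn are placeholders, and GzipCodec is just one choice of codec; treat this as an illustration under those assumptions, not the list's agreed solution):

    import org.apache.hadoop.io.compress.GzipCodec
    import org.apache.spark.{SparkConf, SparkContext}

    object CompressedMigration {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("CompressedMigration"))
        // Each element is a (path, fileContent) pair; every file is read whole
        val files = sc.wholeTextFiles("hdfs://source-nn:8020/temp1")
        // Write the file contents gzip-compressed onto the target cluster
        files.values.saveAsTextFile("hdfs://target-nn:8020/temp2", classOf[GzipCodec])
        sc.stop()
      }
    }

One caveat worth noting: wholeTextFiles decodes each file as text and holds it in memory as a single record, so this works for text data but will corrupt binary formats (e.g. Avro data files) and can struggle with very large individual files.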