Never mind! I figured it out by saving it with saveAsHadoopFile and passing the codec to it. Thank you!
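For anyone who lands on this thread later, a minimal sketch of the approach that worked, assuming Spark's Scala RDD API; the paths, app name, and object name are placeholders, and this has not been run against a real cluster:

```scala
import org.apache.hadoop.io.Text
import org.apache.hadoop.io.compress.GzipCodec
import org.apache.hadoop.mapred.TextOutputFormat
import org.apache.spark.{SparkConf, SparkContext}

object CompressToTemp2 {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("CompressToTemp2"))

    // wholeTextFiles yields an RDD[(String, String)] of (file path, file
    // contents) for every file under the source directory.
    val files = sc.wholeTextFiles("hdfs:///user/example/temp1")

    // Write the pairs back out gzip-compressed via saveAsHadoopFile.
    // Caveat: this produces one .gz part-file per partition; plain gzip does
    // not create a multi-file archive with test1.txt and test2.avsc as
    // separate entries inside it (that would need something like tar + gzip).
    files.saveAsHadoopFile(
      "hdfs:///user/example/temp2",
      classOf[Text],
      classOf[Text],
      classOf[TextOutputFormat[Text, Text]],
      classOf[GzipCodec])

    sc.stop()
  }
}
```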
On Tuesday, May 10, 2016, Ajay Chander <itsche...@gmail.com> wrote:
> Hi, I have a folder temp1 in HDFS which has files of multiple formats,
> e.g. test1.txt and test2.avsc (an Avro schema file), in it. Now I want to
> compress these files together and store the result under a temp2 folder in
> HDFS, expecting that temp2 will have one file, test_compress.gz, which has
> test1.txt and test2.avsc under it. Is there any possible/efficient way to
> achieve this?
>
> Thanks,
> Aj
>
> On Tuesday, May 10, 2016, Ajay Chander <itsche...@gmail.com> wrote:
>
>> I will try that out. Thank you!
>>
>> On Tuesday, May 10, 2016, Deepak Sharma <deepakmc...@gmail.com> wrote:
>>
>>> Yes, that's what I intended to say.
>>>
>>> Thanks
>>> Deepak
>>> On 10 May 2016 11:47 pm, "Ajay Chander" <itsche...@gmail.com> wrote:
>>>
>>>> Hi Deepak,
>>>> Thanks for your response. If I understand correctly, you suggest
>>>> reading all of those files into an RDD on the cluster using
>>>> wholeTextFiles, applying a compression codec to it, and then saving the
>>>> RDD to the other Hadoop cluster?
>>>>
>>>> Thank you,
>>>> Ajay
>>>>
>>>> On Tuesday, May 10, 2016, Deepak Sharma <deepakmc...@gmail.com> wrote:
>>>>
>>>>> Hi Ajay
>>>>> You can look at the wholeTextFiles method, which returns an
>>>>> RDD[(String, String)], and then save it with saveAsTextFile.
>>>>> This will serve the purpose.
>>>>> I don't think anything like distcp exists in Spark by default.
>>>>>
>>>>> Thanks
>>>>> Deepak
>>>>> On 10 May 2016 11:27 pm, "Ajay Chander" <itsche...@gmail.com> wrote:
>>>>>
>>>>>> Hi Everyone,
>>>>>>
>>>>>> We are planning to migrate data between 2 clusters, and I see that
>>>>>> distcp doesn't support data compression. Is there any efficient way
>>>>>> to compress the data during the migration? Can I implement a Spark
>>>>>> job to do this? Thanks.
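The wholeTextFiles + saveAsTextFile suggestion above could be sketched as follows, assuming the Scala RDD API; the cluster URIs and object name are placeholders, and keeping the source path in each record is one illustrative way to tell the files apart on the destination, not something prescribed in the thread:

```scala
import org.apache.hadoop.io.compress.GzipCodec
import org.apache.spark.{SparkConf, SparkContext}

object MigrateCompressed {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("MigrateCompressed"))

    // RDD[(String, String)] of (file path, file contents) for the source dir
    val files = sc.wholeTextFiles("hdfs://source-cluster/data/in")

    // Carry the original path alongside the contents so records from
    // different files remain distinguishable, then write the output
    // gzip-compressed on the destination cluster.
    files
      .map { case (path, contents) => s"$path\t$contents" }
      .saveAsTextFile("hdfs://dest-cluster/data/out", classOf[GzipCodec])

    sc.stop()
  }
}
```

The `saveAsTextFile(path, codec)` overload compresses each output partition with the given Hadoop codec, which avoids a separate compression pass after the copy.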