Never mind! I figured it out by saving it as a Hadoop file and passing the
compression codec to it. Thank you!
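For context, the Spark calls discussed in this thread are real: sc.wholeTextFiles reads a folder into an RDD of (path, content) pairs, and saveAsTextFile / saveAsHadoopFile accept a compression codec class such as org.apache.hadoop.io.compress.GzipCodec. Those need a cluster to run, so below is only a minimal local sketch of the stated goal (test1.txt and test2.avsc from temp1 packed into one compressed file under temp2), using Python's standard library. Note gzip by itself holds a single stream, so tar supplies the file boundaries; the folder and file names mirror the question, everything else is hypothetical.

```python
# Local sketch (no Hadoop/Spark): bundle test1.txt and test2.avsc from
# temp1/ into a single compressed archive under temp2/.
import os
import tarfile
import tempfile

def compress_folder(src_dir, dst_path, names):
    """Pack the named files from src_dir into one .tar.gz at dst_path."""
    with tarfile.open(dst_path, "w:gz") as tar:
        for name in names:
            tar.add(os.path.join(src_dir, name), arcname=name)

# Tiny demo with throwaway files standing in for the HDFS folders.
with tempfile.TemporaryDirectory() as root:
    temp1 = os.path.join(root, "temp1")
    temp2 = os.path.join(root, "temp2")
    os.makedirs(temp1)
    os.makedirs(temp2)
    for name, text in [("test1.txt", "plain text"), ("test2.avsc", "{}")]:
        with open(os.path.join(temp1, name), "w") as f:
            f.write(text)
    archive = os.path.join(temp2, "test_compress.tar.gz")
    compress_folder(temp1, archive, ["test1.txt", "test2.avsc"])
    with tarfile.open(archive, "r:gz") as tar:
        print(sorted(tar.getnames()))  # ['test1.txt', 'test2.avsc']
```

A .tar.gz (rather than a bare .gz) is what lets both files be recovered individually later, which matches the "one file containing both" requirement in the question.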

On Tuesday, May 10, 2016, Ajay Chander <itsche...@gmail.com> wrote:

> Hi, I have a folder temp1 in HDFS which has files of multiple formats,
> test1.txt and test2.avsc (an Avro file), in it. Now I want to compress these
> files together and store them under a temp2 folder in HDFS, expecting that
> the temp2 folder will have one file, test_compress.gz, which has test1.txt
> and test2.avsc under it. Is there any possible/efficient way to achieve this?
>
> Thanks,
> Aj
>
> On Tuesday, May 10, 2016, Ajay Chander <itsche...@gmail.com> wrote:
>
>> I will try that out. Thank you!
>>
>> On Tuesday, May 10, 2016, Deepak Sharma <deepakmc...@gmail.com> wrote:
>>
>>> Yes that's what I intended to say.
>>>
>>> Thanks
>>> Deepak
>>> On 10 May 2016 11:47 pm, "Ajay Chander" <itsche...@gmail.com> wrote:
>>>
>>>> Hi Deepak,
>>>>        Thanks for your response. If I understand correctly, you are
>>>> suggesting reading all of those files into an RDD on the cluster using
>>>> wholeTextFiles, applying a compression codec to it, and then saving the
>>>> RDD to the other Hadoop cluster?
>>>>
>>>> Thank you,
>>>> Ajay
>>>>
>>>> On Tuesday, May 10, 2016, Deepak Sharma <deepakmc...@gmail.com> wrote:
>>>>
>>>>> Hi Ajay
>>>>> You can look at the wholeTextFiles method, which gives an
>>>>> RDD[(String, String)], and then save the RDD with saveAsTextFile.
>>>>> This will serve the purpose.
>>>>> I don't think anything built in like distcp exists in Spark.
>>>>>
>>>>> Thanks
>>>>> Deepak
>>>>> On 10 May 2016 11:27 pm, "Ajay Chander" <itsche...@gmail.com> wrote:
>>>>>
>>>>>> Hi Everyone,
>>>>>>
>>>>>> We are planning to migrate the data between 2 clusters, and I see that
>>>>>> distcp doesn't support data compression. Is there any efficient way to
>>>>>> compress the data during the migration? Can I implement a Spark job to
>>>>>> do this? Thanks.
>>>>>>
>>>>>
