Hi, I have a folder temp1 in HDFS which contains files of multiple formats,
e.g. test1.txt and test2.avsc (an Avro schema file). Now I want to compress
these files together and store the result under a temp2 folder in HDFS, so
that temp2 contains a single file, test_compress.gz, with test1.txt and
test2.avsc inside it. Is there an efficient way to achieve this?
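
To make the goal concrete, here is a rough, untested sketch of the kind of
thing I have in mind, using the HDFS client API. Since a plain .gz stream has
no notion of member files, the sketch writes one zip entry per file instead
(paths are from the example above; producing a true .gz of several files
would need a tar step first):

    import java.util.zip.{ZipEntry, ZipOutputStream}
    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}
    import org.apache.hadoop.io.IOUtils

    // Stream every file under /temp1 into a single archive under /temp2.
    val fs = FileSystem.get(new Configuration())
    val out = new ZipOutputStream(fs.create(new Path("/temp2/test_compress.zip")))
    fs.listStatus(new Path("/temp1")).filter(_.isFile).foreach { status =>
      out.putNextEntry(new ZipEntry(status.getPath.getName))
      val in = fs.open(status.getPath)
      IOUtils.copyBytes(in, out, 4096, false)  // false: keep the zip stream open
      in.close()
      out.closeEntry()
    }
    out.close()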

Thanks,
Aj

On Tuesday, May 10, 2016, Ajay Chander <itsche...@gmail.com> wrote:

> I will try that out. Thank you!
>
> On Tuesday, May 10, 2016, Deepak Sharma <deepakmc...@gmail.com> wrote:
>
>> Yes, that's what I intended to say.
>>
>> Thanks
>> Deepak
>> On 10 May 2016 11:47 pm, "Ajay Chander" <itsche...@gmail.com> wrote:
>>
>>> Hi Deepak,
>>>        Thanks for your response. If I understand correctly, you suggest
>>> reading all of those files into an RDD on the cluster using wholeTextFiles,
>>> applying a compression codec to it, and then saving the RDD to the other
>>> Hadoop cluster?
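>>>
>>> Just to be concrete, I was picturing something along these lines (a rough
>>> sketch; the namenode addresses are placeholders for our two clusters):
>>>
>>>     import org.apache.spark.{SparkConf, SparkContext}
>>>     import org.apache.hadoop.io.compress.GzipCodec
>>>
>>>     val sc = new SparkContext(new SparkConf().setAppName("CompressedMigration"))
>>>     // (file path, file content) pairs read from the source cluster
>>>     val files = sc.wholeTextFiles("hdfs://source-nn:8020/temp1")
>>>     // write the contents back out gzip-compressed on the target cluster
>>>     files.values.saveAsTextFile("hdfs://target-nn:8020/temp2", classOf[GzipCodec])
>>>
>>> One caveat: wholeTextFiles loads each file fully into memory on an
>>> executor, so this may not suit very large files.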
>>>
>>> Thank you,
>>> Ajay
>>>
>>> On Tuesday, May 10, 2016, Deepak Sharma <deepakmc...@gmail.com> wrote:
>>>
>>>> Hi Ajay,
>>>> You can look at the wholeTextFiles method, which returns an RDD of
>>>> (String, String) pairs, and then save each RDD with saveAsTextFile.
>>>> This will serve the purpose.
>>>> I don't think anything like distcp exists in Spark by default.
>>>>
>>>> Thanks
>>>> Deepak
>>>> On 10 May 2016 11:27 pm, "Ajay Chander" <itsche...@gmail.com> wrote:
>>>>
>>>>> Hi Everyone,
>>>>>
>>>>> We are planning to migrate data between two clusters, and I see that
>>>>> distcp doesn't support data compression. Is there an efficient way to
>>>>> compress the data during the migration? Can I implement a Spark job to
>>>>> do this? Thanks.
>>>>>
>>>>
