Re: How does mapreduce job determine the compress codec

2013-12-16 Thread Jiayu Ji
Thanks Azuryy. That was exactly what I wanted to know.


-- 
Jiayu (James) Ji,

Cell: (312)823-7393


Re: How does mapreduce job determine the compress codec

2013-12-15 Thread Azuryy Yu
Hi Jiayu,
For a SequenceFile input, the CompressionCodec class name is serialized in
the file header, so SequenceFile.Reader knows which compression algorithm
to use.
Thanks.
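
To make the header point concrete, here is a rough Python sketch (not
Hadoop code) of pulling the codec class name out of a SequenceFile header.
It assumes the commonly documented layout for header versions 5 and 6:
"SEQ", a version byte, the key and value class names, two compression
flags, then the codec class name when compression is on. The function
names are made up for illustration, and it only handles strings shorter
than 128 bytes, where the length prefix is a single byte.

```python
def read_short_string(stream):
    """Read a length-prefixed UTF-8 string.

    Assumption: the length prefix is a single byte (0..127). Longer
    strings use a multi-byte length that this sketch does not decode.
    """
    prefix = stream.read(1)
    if len(prefix) != 1:
        raise ValueError("unexpected end of header")
    length = prefix[0]
    if length > 127:
        raise ValueError("multi-byte length prefixes are not handled here")
    data = stream.read(length)
    if len(data) != length:
        raise ValueError("unexpected end of header")
    return data.decode("utf-8")


def read_flag(stream):
    """Read a one-byte boolean flag."""
    flag = stream.read(1)
    if len(flag) != 1:
        raise ValueError("unexpected end of header")
    return flag != b"\x00"


def read_sequence_file_header(stream):
    """Return the key class, value class, and codec class name (or None)."""
    magic = stream.read(4)
    if len(magic) != 4 or magic[:3] != b"SEQ":
        raise ValueError("not a SequenceFile")
    version = magic[3]
    if version < 5:
        # Older headers are laid out differently; out of scope for this sketch.
        raise ValueError("header version %d is not handled here" % version)
    key_class = read_short_string(stream)
    value_class = read_short_string(stream)
    compressed = read_flag(stream)
    block_compressed = read_flag(stream)
    # The codec class name is only present when the file is compressed.
    codec = read_short_string(stream) if compressed else None
    return {
        "version": version,
        "key_class": key_class,
        "value_class": value_class,
        "compressed": compressed,
        "block_compressed": block_compressed,
        "codec": codec,
    }
```

Calling read_sequence_file_header(open(path, "rb")) on a local copy of the
file should show which codec class the reader would instantiate, with no
dependence on the file name.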






Re: How does mapreduce job determine the compress codec

2013-12-15 Thread Jiayu Ji
Thanks Tao. I know I can tell it is an LZO file from its magic number.
What I am curious about is which class in Hadoop the MapReduce job uses to
determine the file's compression algorithm. Ultimately, I am trying to
figure out whether all the inputs of a MapReduce job have to be compressed
with the same algorithm.






Re: How does mapreduce job determine the compress codec

2013-12-13 Thread Tao Xiao
I suggest you download the LZO-compressed file, whether or not its file
name has an .lzo extension, open it in a hex editor such as UltraEdit, and
look at its header bytes.
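
If a hex editor is not handy, the same check can be scripted. Below is a
minimal Python sketch that compares the first bytes of a local copy of the
file against a few well-known magic numbers. The table is illustrative,
not exhaustive, and sniff_format / sniff_file are names invented for this
example.

```python
# Well-known leading bytes for a few formats (illustrative subset).
MAGIC_NUMBERS = [
    (b"\x89LZO\x00\r\n\x1a\n", "lzop"),
    (b"\x1f\x8b", "gzip"),
    (b"BZh", "bzip2"),
    (b"SEQ", "Hadoop SequenceFile"),
]


def sniff_format(header):
    """Return a format name if the bytes start with a known magic number."""
    for magic, name in MAGIC_NUMBERS:
        if header.startswith(magic):
            return name
    return None


def sniff_file(path, length=16):
    """Read the first few bytes of a local file and identify its format."""
    with open(path, "rb") as handle:
        return sniff_format(handle.read(length))
```

For example, sniff_file("part-00000") on a downloaded copy would return
"lzop" for a file written by lzop, regardless of its name.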




How does mapreduce job determine the compress codec

2013-12-13 Thread Jiayu Ji
Hi

I have a question about how a MapReduce job determines the compression
codec for files on HDFS. According to Hadoop: The Definitive Guide (page
86), "the CompressionCodecFactory provides a way of mapping a filename
extension to a CompressionCodec using its getCodec() method". I ran a test
with an LZO-compressed file that did not have an .lzo extension, yet the
MapReduce job was still able to pick the right codec. Does anyone know
why? Thanks in advance.

Jiayu
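
For reference, the extension mapping described in the quoted passage can
be sketched roughly as below. This is a Python illustration of the idea,
not Hadoop's implementation. The suffix table is an assumption: a real
cluster builds it from the codecs listed in its configuration, and the
.lzo entry assumes the separate hadoop-lzo add-on is installed. The names
get_codec and codecs_for_inputs are hypothetical.

```python
# Assumed suffix-to-codec table, for illustration only.
DEFAULT_SUFFIXES = {
    ".gz": "org.apache.hadoop.io.compress.GzipCodec",
    ".bz2": "org.apache.hadoop.io.compress.BZip2Codec",
    ".deflate": "org.apache.hadoop.io.compress.DefaultCodec",
    ".lzo": "com.hadoop.compression.lzo.LzopCodec",
}


def get_codec(path, suffixes=DEFAULT_SUFFIXES):
    """Return the codec class name for the longest matching suffix, or None."""
    matches = [suffix for suffix in suffixes if path.endswith(suffix)]
    if not matches:
        return None
    return suffixes[max(matches, key=len)]


def codecs_for_inputs(paths, suffixes=DEFAULT_SUFFIXES):
    """Resolve every input path on its own; the results may differ per file."""
    return {path: get_codec(path, suffixes) for path in paths}
```

In this sketch a path without a known suffix yields None, so a lookup
based purely on the file name would not recognise an LZO file that lacks
the .lzo extension.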