Hey Bing,

What platform are you submitting jobs from? Linux has native libs
pre-built and available already, and you can use them out of the box.
Native libraries also help performance beyond just compression, so
they're a must-have at the server end today.
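
For a quick sanity check that the native library actually gets picked
up, a minimal sketch like this should do (assuming the Hadoop jars are
on your classpath and -Djava.library.path points at the native libs):

    import org.apache.hadoop.util.NativeCodeLoader;

    public class NativeCheck
    {
        public static void main(String[] args)
        {
            // Prints true only if libhadoop was found and loaded.
            System.out.println("native-hadoop loaded: "
                    + NativeCodeLoader.isNativeCodeLoaded());
        }
    }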

Adam's second point was more about choosing the right compression
codec, which is your next step after you have the native libs set up --
and yes, you'd want the codec to be splittable. LZO is splittable, but
requires a manual indexing pass first; bzip2 is natively splittable and
gives a great compression ratio, but is too slow; the rest (gzip,
Snappy) so far aren't splittable on their own.
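
To illustrate, enabling compressed job output with LZO would look
roughly like this -- a sketch using the old mapred property names; the
LzopCodec class comes from the separate hadoop-lzo package, so treat
that part as an assumption about your setup:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.io.compress.CompressionCodec;

    ...

    Configuration conf = new Configuration();
    // Compress the job's final output files.
    conf.setBoolean("mapred.output.compress", true);
    // LzopCodec ships with hadoop-lzo, not with Apache Hadoop itself.
    conf.setClass("mapred.output.compression.codec",
            com.hadoop.compression.lzo.LzopCodec.class,
            CompressionCodec.class);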

My suggestion is always to use the SequenceFile/Avro-DataFile formats
over plaintext files: they are very well optimized for Hadoop, are
splittable without any manual work, and each supports a variety of
compression codecs.
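
For example, writing a block-compressed SequenceFile takes only a few
lines -- a sketch with a made-up output path, using the built-in
DefaultCodec:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.io.compress.DefaultCodec;

    ...

    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    // BLOCK compression batches records together before compressing,
    // which usually gives a better ratio than per-record compression.
    SequenceFile.Writer writer = SequenceFile.createWriter(fs, conf,
            new Path("/tmp/user/libing/example.seq"),
            Text.class, Text.class,
            SequenceFile.CompressionType.BLOCK, new DefaultCodec());
    writer.append(new Text("key"), new Text("value"));
    writer.close();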

On Wed, Feb 8, 2012 at 11:12 AM, Bing Li <lbl...@gmail.com> wrote:
> Dear Adam,
>
> Since I am not familiar with the native configuration, I have not
> resolved the warning yet.
>
> You mean I can pick a proper compression solution and do that in my own
> code instead of using the ones in Hadoop? What I need to do is just
> guarantee that the compressed data is splittable. Right?
>
> Thanks so much for your help!
>
> Best regards,
> Bing
>
> On Wed, Feb 8, 2012 at 8:53 AM, Adam Brown <a...@hortonworks.com> wrote:
>
>> Hi Bing,
>>
>> If your data is highly compressible (e.g. text), you should expect to
>> see higher throughput consuming the compressed stream. Of course, I'd
>> recommend doing a benchmark run against some sample data.
>>
>> Keep in mind that not all compressed streams can be arbitrarily split,
>> so you must either use compression that is splittable (like LZO) or
>> write your own input/output handlers.
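>>
>> A hypothetical sketch of the "own handlers" route: subclass an input
>> format so each compressed file goes to a single mapper, unsplit (old
>> mapred API; the class name here is made up):
>>
>>     import org.apache.hadoop.fs.FileSystem;
>>     import org.apache.hadoop.fs.Path;
>>     import org.apache.hadoop.mapred.TextInputFormat;
>>
>>     public class NonSplitTextInputFormat extends TextInputFormat
>>     {
>>         @Override
>>         protected boolean isSplitable(FileSystem fs, Path file)
>>         {
>>             // One mapper per file: safe for gzip and other
>>             // non-splittable codecs, at the cost of parallelism.
>>             return false;
>>         }
>>     }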
>>
>> cheers,
>>
>> Adam
>>
>> On Tue, Feb 7, 2012 at 6:05 AM, Bing Li <lbl...@gmail.com> wrote:
>>
>>> Dear Uma,
>>>
>>> Thanks so much for your reply!
>>>
>>> Are compression techniques critical to Hadoop? I am not familiar with
>>> the native libraries.
>>>
>>> Best regards,
>>> Bing
>>>
>>> On Tue, Feb 7, 2012 at 6:04 PM, Uma Maheswara Rao G
>>> <mahesw...@huawei.com> wrote:
>>>
>>> > It looks like you are not using any compression in your code. Hadoop
>>> > has some native libraries to load, mainly for the compression codecs.
>>> > When you want to use those compression techniques, you need to compile
>>> > with the compile.native option enabled, and also set the Java library
>>> > path. If you are not using any such stuff, then you need not worry
>>> > about that warning.
>>> > Please look at the below link for more information.
>>> > http://hadoop.apache.org/common/docs/current/native_libraries.html
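>>> >
>>> > Since you run the program with ant, one way to point the JVM at the
>>> > native libs is a jvmarg in your <java> task. A sketch -- the class
>>> > name and path are illustrative, adjust them to your install:
>>> >
>>> >     <java classname="YourMainClass" fork="true">
>>> >         <!-- fork is required for jvmarg to take effect -->
>>> >         <jvmarg value="-Djava.library.path=${hadoop.home}/lib/native/Linux-amd64-64"/>
>>> >     </java>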
>>> >
>>> > Regards,
>>> > Uma
>>> > ________________________________________
>>> > From: Bing Li [lbl...@gmail.com]
>>> > Sent: Tuesday, February 07, 2012 3:08 PM
>>> > To: common-user@hadoop.apache.org
>>> > Subject: Unable to Load Native-Hadoop Library for Your Platform
>>> >
>>> > Dear all,
>>> >
>>> > I got a warning when running a simple Java program on Hadoop. The
>>> > program just merges some local files into one and puts the result on
>>> > Hadoop.
>>> >
>>> > The code is as follows.
>>> >
>>> >         ......
>>> >
>>> >         Configuration conf = new Configuration();
>>> >         try
>>> >         {
>>> >             FileSystem hdfs = FileSystem.get(conf);
>>> >             FileSystem local = FileSystem.getLocal(conf);
>>> >             Path inputDir = new Path("/home/libing/Temp/");
>>> >             Path hdfsFile = new Path("/tmp/user/libing/example.txt");
>>> >
>>> >             try
>>> >             {
>>> >                 FileStatus[] inputFiles = local.listStatus(inputDir);
>>> >                 FSDataOutputStream out = hdfs.create(hdfsFile);
>>> >                 for (int i = 0; i < inputFiles.length; i++)
>>> >                 {
>>> >                     System.out.println(inputFiles[i].getPath().getName());
>>> >                     FSDataInputStream in = local.open(inputFiles[i].getPath());
>>> >                     byte[] buffer = new byte[256];
>>> >                     int bytesRead = 0;
>>> >                     while ((bytesRead = in.read(buffer)) > 0)
>>> >                     {
>>> >                         out.write(buffer, 0, bytesRead);
>>> >                     }
>>> >                     in.close();
>>> >                 }
>>> >                 out.close();
>>> >             }
>>> >             catch (IOException e)
>>> >             {
>>> >                 e.printStackTrace();
>>> >             }
>>> >         }
>>> >         catch (IOException e)
>>> >         {
>>> >             e.printStackTrace();
>>> >         }
>>> >
>>> >         ......
>>> >
>>> > I ran it with ant and got the following warning. BTW, all the relevant
>>> > jar packages from Hadoop are specified in the build.xml.
>>> >
>>> >     [java] 2012-2-7 17:16:18 org.apache.hadoop.util.NativeCodeLoader <clinit>
>>> >     [java] Warning: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
>>> >
>>> > The program produced the correct result, but I cannot figure out what
>>> > the warning above means.
>>> >
>>> > Thanks so much!
>>> > Bing
>>> >
>>>
>>
>>
>>
>> --
>> Adam Brown
>> Enablement Engineer
>> Hortonworks
>> <http://www.hadoopsummit.org/>
>>
>>



-- 
Harsh J
Customer Ops. Engineer
Cloudera | http://tiny.cloudera.com/about
