Hey Bing,

What platform are you submitting jobs from? On Linux the native libs come pre-built and available already, so you can use them out of the box. Native libraries also help performance beyond just compression, so they're a must-have at the server end today.
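Getting the native libs onto the library path, as discussed further down this thread, typically amounts to something like the following on a Hadoop 1.x-era install. The build target and paths below are illustrative and vary by release, so treat this as a sketch rather than exact commands:

```shell
# Build the native libraries from a Hadoop 1.x source tree
# (target/option names vary by release; see the native_libraries docs)
ant -Dcompile.native=true compile-native

# Or use the libs already shipped under lib/native, and make sure the
# JVM can see them (path below is illustrative for a 64-bit Linux build)
java -Djava.library.path=$HADOOP_HOME/lib/native/Linux-amd64-64 -cp <your-classpath> YourJob
```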
Adam's second point was more about choosing the right compression codec, which is your next step after getting the native libs set up -- and yes, you'd want the codec to be splittable. LZO is splittable, but requires a round of manual indexing first; bzip2 is natively splittable but too slow, though it gives a great compression ratio; the rest (gzip, Snappy) aren't splittable on their own. My suggestion is always to use the SequenceFile/Avro-DataFile formats over plain-text files -- they are very well optimized for Hadoop, are splittable without any manual work, and each supports a variety of compression codecs.

On Wed, Feb 8, 2012 at 11:12 AM, Bing Li <lbl...@gmail.com> wrote:
> Dear Adam,
>
> Since I am not familiar with the native configuration, I have not solved
> the warning.
>
> You mean I can find a proper compression solution and do that in my own
> code instead of using the ones in Hadoop? What I need to do is just to
> guarantee that the compressed data is splittable. Right?
>
> Thanks so much for your help!
>
> Best regards,
> Bing
>
> On Wed, Feb 8, 2012 at 8:53 AM, Adam Brown <a...@hortonworks.com> wrote:
>
>> Hi Bing,
>>
>> If your data is highly compressible (e.g. text), you should expect to see
>> higher throughput consuming the compressed stream. Of course, I recommend
>> doing a benchmark run against some sample data.
>>
>> Keep in mind that not all compressed streams can be arbitrarily split, so
>> you must either use compression that is splittable (like LZO) or write your
>> own input/output handlers.
>>
>> cheers,
>>
>> Adam
>>
>> On Tue, Feb 7, 2012 at 6:05 AM, Bing Li <lbl...@gmail.com> wrote:
>>
>>> Dear Uma,
>>>
>>> Thanks so much for your reply!
>>>
>>> Is the compression technique critical to Hadoop? I am not familiar with
>>> native libraries.
>>>
>>> Best regards,
>>> Bing
>>>
>>> On Tue, Feb 7, 2012 at 6:04 PM, Uma Maheswara Rao G <mahesw...@huawei.com> wrote:
>>>
>>> > Looks like you are not using any compression in your code.
>>> > Hadoop has some native libraries to load, mainly for the compression codecs.
>>> > When you want to use those compression techniques, you need to compile
>>> > with the compile.native option enabled, and also set the java library path.
>>> > If you are not using any such stuff, then you need not worry about that
>>> > warning.
>>> > Please look at the below link for more information.
>>> > http://hadoop.apache.org/common/docs/current/native_libraries.html
>>> >
>>> > Regards,
>>> > Uma
>>> > ________________________________________
>>> > From: Bing Li [lbl...@gmail.com]
>>> > Sent: Tuesday, February 07, 2012 3:08 PM
>>> > To: common-user@hadoop.apache.org
>>> > Subject: Unable to Load Native-Hadoop Library for Your Platform
>>> >
>>> > Dear all,
>>> >
>>> > I got a warning when running a simple Java program on Hadoop. The program
>>> > is just to merge some local files into one and put it on Hadoop.
>>> >
>>> > The code is as follows.
>>> >
>>> > ......
>>> >
>>> > Configuration conf = new Configuration();
>>> > try
>>> > {
>>> >     FileSystem hdfs = FileSystem.get(conf);
>>> >     FileSystem local = FileSystem.getLocal(conf);
>>> >     Path inputDir = new Path("/home/libing/Temp/");
>>> >     Path hdfsFile = new Path("/tmp/user/libing/example.txt");
>>> >
>>> >     try
>>> >     {
>>> >         FileStatus[] inputFiles = local.listStatus(inputDir);
>>> >         FSDataOutputStream out = hdfs.create(hdfsFile);
>>> >         for (int i = 0; i < inputFiles.length; i++)
>>> >         {
>>> >             System.out.println(inputFiles[i].getPath().getName());
>>> >             FSDataInputStream in = local.open(inputFiles[i].getPath());
>>> >             byte[] buffer = new byte[256];
>>> >             int bytesRead = 0;
>>> >             while ((bytesRead = in.read(buffer)) > 0)
>>> >             {
>>> >                 out.write(buffer, 0, bytesRead);
>>> >             }
>>> >             in.close();
>>> >         }
>>> >         out.close();
>>> >     }
>>> >     catch (IOException e)
>>> >     {
>>> >         e.printStackTrace();
>>> >     }
>>> > }
>>> > catch (IOException e)
>>> > {
>>> >     e.printStackTrace();
>>> > }
>>> >
>>> > ......
>>> >
>>> > I run it with ant and got the following warning. BTW, all the relevant jar
>>> > packages from Hadoop are specified in the build.xml.
>>> >
>>> > [java] 2012-2-7 17:16:18 org.apache.hadoop.util.NativeCodeLoader <clinit>
>>> > [java] Warning: Unable to load native-hadoop library for your
>>> > platform... using builtin-java classes where applicable
>>> >
>>> > The program got a correct result, but I cannot figure out what the above
>>> > warning means.
>>> >
>>> > Thanks so much!
>>> > Bing
>>> >
>>
>> --
>> Adam Brown
>> Enablement Engineer
>> Hortonworks
>> <http://www.hadoopsummit.org/>

--
Harsh J
Customer Ops. Engineer
Cloudera | http://tiny.cloudera.com/about
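As a footnote to the splittability discussion in this thread: the point can be demonstrated with nothing but the JDK, no Hadoop required. A gzip stream has a single header at byte 0 and no sync markers, so a reader dropped at an arbitrary byte offset -- which is exactly what a MapReduce split boundary is -- cannot decode anything. A minimal sketch (class and method names are illustrative):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipSplitDemo {

    // Compress a string with gzip.
    static byte[] gzip(String text) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    // Try to decode the stream starting at a given byte offset,
    // as a mapper handed a mid-file split would have to.
    static boolean readableFrom(byte[] compressed, int offset) {
        try (GZIPInputStream gz = new GZIPInputStream(
                new ByteArrayInputStream(compressed, offset, compressed.length - offset))) {
            byte[] buf = new byte[256];
            while (gz.read(buf) > 0) { /* drain the stream */ }
            return true;
        } catch (IOException e) {
            return false; // no gzip header / no valid deflate data at this offset
        }
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 1000; i++) sb.append("line ").append(i).append('\n');
        byte[] compressed = gzip(sb.toString());

        System.out.println("from offset 0:  " + readableFrom(compressed, 0));   // true
        System.out.println("from offset 64: " + readableFrom(compressed, 64));  // false
    }
}
```

This is why a whole .gz file must go to a single mapper, while block-oriented choices (bzip2, indexed LZO, or a container format like SequenceFile with per-block compression) let readers resynchronize at split boundaries.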