This is from our production environment. Unfortunately, I cannot test this on any newer version until it is upgraded to cdh4 (0.23).
But since this is the cdh3u1 release, it presumably already contains a lot of bug fixes backported from 0.21. The SIGSEGV is just one of the issues. EOFException is raised gracefully for the other cases.

On 5/22/12, Edward Capriolo <edlinuxg...@gmail.com> wrote:
> If you are getting a SIGSEGV it never hurts to try a more recent JVM.
> 21 has many bug fixes at this point.
>
> On Tue, May 22, 2012 at 11:45 AM, Jason B <urg...@gmail.com> wrote:
>> JIRA entry created:
>>
>> https://issues.apache.org/jira/browse/HADOOP-8423
>>
>> On 5/21/12, Jason B <urg...@gmail.com> wrote:
>>> Sorry about using an attachment. The code is below for reference.
>>> (I will also file a JIRA as you suggested.)
>>>
>>> package codectest;
>>>
>>> import com.hadoop.compression.lzo.LzoCodec;
>>> import java.io.IOException;
>>> import java.util.Formatter;
>>> import org.apache.hadoop.conf.Configuration;
>>> import org.apache.hadoop.fs.FileSystem;
>>> import org.apache.hadoop.io.MapFile;
>>> import org.apache.hadoop.io.SequenceFile.CompressionType;
>>> import org.apache.hadoop.io.Text;
>>> import org.apache.hadoop.io.compress.CompressionCodec;
>>> import org.apache.hadoop.io.compress.DefaultCodec;
>>> import org.apache.hadoop.io.compress.SnappyCodec;
>>> import org.apache.hadoop.util.Tool;
>>> import org.apache.hadoop.util.ToolRunner;
>>>
>>> public class MapFileCodecTest implements Tool {
>>>     private Configuration conf = new Configuration();
>>>
>>>     private void createMapFile(Configuration conf, FileSystem fs, String path,
>>>             CompressionCodec codec, CompressionType type, int records)
>>>             throws IOException {
>>>         MapFile.Writer writer = new MapFile.Writer(conf, fs, path,
>>>                 Text.class, Text.class, type, codec, null);
>>>         Text key = new Text();
>>>         for (int j = 0; j < records; j++) {
>>>             StringBuilder sb = new StringBuilder();
>>>             Formatter formatter = new Formatter(sb);
>>>             formatter.format("%03d", j);
>>>             key.set(sb.toString());
>>>             writer.append(key, key);
>>>         }
>>>         writer.close();
>>>     }
>>>
>>>     private void testCodec(Configuration conf,
>>>             Class<? extends CompressionCodec> clazz,
>>>             CompressionType type, int records) throws IOException {
>>>         FileSystem fs = FileSystem.getLocal(conf);
>>>         try {
>>>             System.out.println("Creating MapFiles with " + records +
>>>                     " records using codec " + clazz.getSimpleName());
>>>             String path = clazz.getSimpleName() + records;
>>>             createMapFile(conf, fs, path, clazz.newInstance(), type, records);
>>>             MapFile.Reader reader = new MapFile.Reader(fs, path, conf);
>>>             Text key1 = new Text("002");
>>>             if (reader.get(key1, new Text()) != null) {
>>>                 System.out.println("1st key found");
>>>             }
>>>             Text key2 = new Text("004");
>>>             if (reader.get(key2, new Text()) != null) {
>>>                 System.out.println("2nd key found");
>>>             }
>>>         } catch (Throwable ex) {
>>>             ex.printStackTrace();
>>>         }
>>>     }
>>>
>>>     @Override
>>>     public int run(String[] strings) throws Exception {
>>>         System.out.println("Using native library " +
>>>                 System.getProperty("java.library.path"));
>>>
>>>         testCodec(conf, DefaultCodec.class, CompressionType.RECORD, 100);
>>>         testCodec(conf, SnappyCodec.class, CompressionType.RECORD, 100);
>>>         testCodec(conf, LzoCodec.class, CompressionType.RECORD, 100);
>>>
>>>         testCodec(conf, DefaultCodec.class, CompressionType.RECORD, 10);
>>>         testCodec(conf, SnappyCodec.class, CompressionType.RECORD, 10);
>>>         testCodec(conf, LzoCodec.class, CompressionType.RECORD, 10);
>>>
>>>         testCodec(conf, DefaultCodec.class, CompressionType.BLOCK, 100);
>>>         testCodec(conf, SnappyCodec.class, CompressionType.BLOCK, 100);
>>>         testCodec(conf, LzoCodec.class, CompressionType.BLOCK, 100);
>>>
>>>         testCodec(conf, DefaultCodec.class, CompressionType.BLOCK, 10);
>>>         testCodec(conf, SnappyCodec.class, CompressionType.BLOCK, 10);
>>>         testCodec(conf, LzoCodec.class, CompressionType.BLOCK, 10);
>>>         return 0;
>>>     }
>>>
>>>     @Override
>>>     public void setConf(Configuration c) {
>>>         this.conf = c;
>>>     }
>>>
>>>     @Override
>>>     public Configuration getConf() {
>>>         return conf;
>>>     }
>>>
>>>     public static void main(String[] args) throws Exception {
>>>         ToolRunner.run(new MapFileCodecTest(), args);
>>>     }
>>> }
>>>
>>> On 5/21/12, Todd Lipcon <t...@cloudera.com> wrote:
>>>> Hi Jason,
>>>>
>>>> Sounds like a bug. Unfortunately, the mailing list strips attachments.
>>>>
>>>> Can you file a jira in the HADOOP project, and attach your test case
>>>> there?
>>>>
>>>> Thanks
>>>> Todd
>>>>
>>>> On Mon, May 21, 2012 at 3:57 PM, Jason B <urg...@gmail.com> wrote:
>>>>> I am using the Cloudera distribution cdh3u1.
>>>>>
>>>>> When trying out native codecs such as Snappy or LZO for better
>>>>> decompression performance, I ran into issues with random access
>>>>> using the MapFile.Reader.get(key, value) method.
>>>>> The first call of MapFile.Reader.get() works, but a second call fails.
>>>>>
>>>>> I am also getting different exceptions depending on the number of
>>>>> entries in a map file.
>>>>> With LzoCodec and a 10-record file, the JVM gets aborted.
>>>>>
>>>>> At the same time, the DefaultCodec works fine for all cases, as does
>>>>> record compression for the native codecs.
>>>>>
>>>>> I created a simple test program (attached) that creates map files
>>>>> locally with sizes of 10 and 100 records for three codecs: Default,
>>>>> Snappy, and LZO.
>>>>> (The test requires the corresponding native libraries to be available.)
>>>>>
>>>>> A summary of the problems is given below:
>>>>>
>>>>> Map Size: 100
>>>>> Compression: RECORD
>>>>> ==================
>>>>> DefaultCodec: OK
>>>>> SnappyCodec: OK
>>>>> LzoCodec: OK
>>>>>
>>>>> Map Size: 10
>>>>> Compression: RECORD
>>>>> ==================
>>>>> DefaultCodec: OK
>>>>> SnappyCodec: OK
>>>>> LzoCodec: OK
>>>>>
>>>>> Map Size: 100
>>>>> Compression: BLOCK
>>>>> ==================
>>>>> DefaultCodec: OK
>>>>>
>>>>> SnappyCodec: java.io.EOFException at
>>>>> org.apache.hadoop.io.compress.BlockDecompressorStream.getCompressedData(BlockDecompressorStream.java:114)
>>>>>
>>>>> LzoCodec: java.io.EOFException at
>>>>> org.apache.hadoop.io.compress.BlockDecompressorStream.getCompressedData(BlockDecompressorStream.java:114)
>>>>>
>>>>> Map Size: 10
>>>>> Compression: BLOCK
>>>>> ==================
>>>>> DefaultCodec: OK
>>>>>
>>>>> SnappyCodec: java.lang.NoClassDefFoundError: Ljava/lang/InternalError
>>>>> at org.apache.hadoop.io.compress.snappy.SnappyDecompressor.decompressBytesDirect(Native Method)
>>>>>
>>>>> LzoCodec:
>>>>> #
>>>>> # A fatal error has been detected by the Java Runtime Environment:
>>>>> #
>>>>> # SIGSEGV (0xb) at pc=0x00002b068ffcbc00, pid=6385, tid=47304763508496
>>>>> #
>>>>> # JRE version: 6.0_21-b07
>>>>> # Java VM: Java HotSpot(TM) 64-Bit Server VM (17.0-b17 mixed mode linux-amd64)
>>>>> # Problematic frame:
>>>>> # C  [liblzo2.so.2+0x13c00]  lzo1x_decompress+0x1a0
>>>>> #
>>>>> # An error report file with more information is saved as:
>>>>> # /hadoop/user/yurgis/testapp/hs_err_pid6385.log
>>>>> #
>>>>> # If you would like to submit a bug report, please visit:
>>>>> # http://java.sun.com/webapps/bugreport/crash.jsp
>>>>> # The crash happened outside the Java Virtual Machine in native code.
>>>>> # See problematic frame for where to report the bug.
>>>>> #
>>>>
>>>> --
>>>> Todd Lipcon
>>>> Software Engineer, Cloudera
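For anyone trying to reproduce this, a run of the test program above might be launched along these lines. This is only a sketch: the jar name (codectest.jar), the hadoop-lzo jar path, and the native library directory are assumptions that vary by installation; they are not taken from the thread.

```shell
# Hypothetical invocation -- codectest.jar, the hadoop-lzo jar, and the
# native library directory below are assumptions; adjust for your install.
# java.library.path must point at the directory holding the Snappy/LZO
# native libraries, or the native codecs will fail to load.
java -Djava.library.path=/usr/lib/hadoop/lib/native/Linux-amd64-64 \
     -cp "codectest.jar:/usr/lib/hadoop/lib/hadoop-lzo.jar:$(hadoop classpath)" \
     codectest.MapFileCodecTest
```

The program prints the java.library.path it sees as its first line, which makes it easy to confirm the native libraries were actually on the path for a given run.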