[ 
https://issues.apache.org/jira/browse/HADOOP-11804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15648921#comment-15648921
 ] 

Andrew Wang edited comment on HADOOP-11804 at 11/8/16 10:00 PM:
----------------------------------------------------------------

Thanks for the rev, Sean. I tried it with Avro and got a NoClassDefFoundError for Log4J:

{noformat}
testSort(org.apache.avro.mapred.TestAvroTextSort)  Time elapsed: 0.051 sec  <<< ERROR!
java.lang.NoClassDefFoundError: org/apache/log4j/Level
        at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        at org.apache.hadoop.mapred.JobConf.<clinit>(JobConf.java:356)
        at org.apache.avro.mapred.TestAvroTextSort.testSort(TestAvroTextSort.java:37)
{noformat}

I think this is expected based on the contents of the hadoop-client-runtime 
pom.xml, which marks log4j as optional. I manually added this dependency, and 
then hit this:

{noformat}
testReadAvro(org.apache.avro.hadoop.io.TestAvroSequenceFile)  Time elapsed: 0.016 sec  <<< ERROR!
java.lang.NullPointerException: null
        at org.apache.hadoop.io.serializer.SerializationFactory.<init>(SerializationFactory.java:58)
        at org.apache.hadoop.io.SequenceFile$Writer.init(SequenceFile.java:1248)
        at org.apache.hadoop.io.SequenceFile$Writer.<init>(SequenceFile.java:1207)
        at org.apache.avro.hadoop.io.AvroSequenceFile$Writer.<init>(AvroSequenceFile.java:532)
        at org.apache.avro.hadoop.io.TestAvroSequenceFile.writeSequenceFile(TestAvroSequenceFile.java:200)
        at org.apache.avro.hadoop.io.TestAvroSequenceFile.testReadAvro(TestAvroSequenceFile.java:53)
{noformat}

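I decompiled the SerializationFactory class and noticed that shading had 
rewritten the config key: the string constant now carries the 
org.apache.hadoop.shaded prefix, so the lookup presumably returns null and the 
equals() call throws the NPE. I think we need to add some kind of exclusion for 
CommonConfigurationKeysPublic.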

{code}
    // before
    if (conf.get(CommonConfigurationKeys.IO_SERIALIZATIONS_KEY).equals("")) {
    // decompiled
    if (conf.get("org.apache.hadoop.shaded.io.serializations").equals("")) {
{code}
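
To make the failure mode concrete, here is a minimal, self-contained sketch of what I believe is happening. It does not use Hadoop or the shade plugin: the Map stands in for a Configuration, and relocate() is a hypothetical stand-in for the relocation rule that rewrote the constant. The class name, helper names, and the stored value are assumptions for illustration only.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: simulates a relocation rule rewriting a config key that
// happens to look like a package name. Not the actual shade plugin logic.
class ShadedKeyDemo {

    // Hypothetical stand-in for the relocation: any string constant that
    // starts with "io." gets the shaded prefix prepended.
    static String relocate(String constant) {
        if (constant.startsWith("io.")) {
            return "org.apache.hadoop.shaded." + constant;
        }
        return constant;
    }

    // Mirrors the shape of the check in the snippet above: get the value
    // for the key and compare it with "". Throws NPE if the key is absent.
    static boolean isEmptyValue(Map<String, String> conf, String key) {
        return conf.get(key).equals("");
    }

    public static void main(String[] args) {
        // Stand-in for a Configuration that has the real key set.
        // The value is illustrative, not the actual default.
        Map<String, String> conf = new HashMap<>();
        conf.put("io.serializations",
                 "org.apache.hadoop.io.serializer.WritableSerialization");

        String originalKey = "io.serializations";
        String shadedKey = relocate(originalKey);

        // The original key resolves, so the check works.
        System.out.println("original key empty? " + isEmptyValue(conf, originalKey));

        // The rewritten key is not in the config, so get() returns null
        // and equals() throws, matching the NPE in the stack trace above.
        try {
            isEmptyValue(conf, shadedKey);
        } catch (NullPointerException e) {
            System.out.println("NPE when looking up " + shadedKey);
        }
    }
}
```

An exclusion would amount to making relocate() leave known config keys untouched, which is why excluding the constants in CommonConfigurationKeysPublic seems like the right direction.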

Here's my Avro diff for master (without the log4j addition) if you want to try 
this yourself:

https://gist.github.com/anonymous/c064c283348a2d1bbec00845678339f9


> POC Hadoop Client w/o transitive dependencies
> ---------------------------------------------
>
>                 Key: HADOOP-11804
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11804
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: build
>            Reporter: Sean Busbey
>            Assignee: Sean Busbey
>         Attachments: HADOOP-11804.1.patch, HADOOP-11804.2.patch, 
> HADOOP-11804.3.patch, HADOOP-11804.4.patch, HADOOP-11804.5.patch, 
> HADOOP-11804.6.patch, HADOOP-11804.7.patch
>
>
> Make a hadoop-client-api and hadoop-client-runtime that e.g. HBase can use to 
> talk with a Hadoop cluster without seeing any of the implementation 
> dependencies.
> See proposal on parent for details.


