Hi Sriram,

I suspect the following in Gora to somehow be causing this issue:

IOUtils source:
http://svn.apache.org/viewvc/gora/trunk/gora-core/src/main/java/org/apache/gora/util/IOUtils.java?view=markup
QueryBase source:
http://svn.apache.org/viewvc/gora/trunk/gora-core/src/main/java/org/apache/gora/query/impl/QueryBase.java?view=markup

Notice that IOUtils.deserialize(…) calls expect a proper Configuration
object. If not passed (i.e., if null), they call the following.

private static Configuration getOrCreateConf(Configuration conf) {
  if(conf == null) {
    if(IOUtils.conf == null) {
      IOUtils.conf = new Configuration();
    }
  }
  return conf != null ? conf : IOUtils.conf;
}

Now, QueryBase's readFields method makes some IOUtils.deserialize(…)
calls that seem to pass null for the configuration object. The
IOUtils.deserialize(…) method therefore calls the method above and,
because the passed conf object is null, falls back to a whole new
Configuration object.

If it does that, it would not load the contents of "job.xml", which
is the job's config file (that's something only the map task's own
config loads; it is not a file that's loaded by default). Hence,
custom serializers will disappear the moment it begins using this new
Configuration object.
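
To make the fallback concrete, here is a minimal, self-contained
sketch. It does not use Hadoop or Gora: FakeConf is a hypothetical
stand-in for org.apache.hadoop.conf.Configuration, and the
serializer names are placeholder values, not real class names. It
only shows that a freshly created config carries defaults and none of
the job's settings.

```java
import java.util.HashMap;
import java.util.Map;

public class ConfFallbackDemo {
    // Hypothetical stand-in for Hadoop's Configuration: a new instance
    // knows only built-in defaults, never the job-specific settings.
    static class FakeConf {
        private final Map<String, String> props = new HashMap<>();

        FakeConf() {
            // Placeholder default, assumed for illustration only.
            props.put("io.serializations", "WritableSerialization");
        }

        void set(String key, String value) { props.put(key, value); }

        String get(String key) { return props.get(key); }
    }

    // Mirrors the shape of the static IOUtils.conf field.
    private static FakeConf sharedConf;

    // Same logic as the getOrCreateConf method quoted above.
    static FakeConf getOrCreateConf(FakeConf conf) {
        if (conf == null) {
            if (sharedConf == null) {
                sharedConf = new FakeConf();
            }
        }
        return conf != null ? conf : sharedConf;
    }

    public static void main(String[] args) {
        // The "job" config registers an extra (custom) serializer.
        FakeConf jobConf = new FakeConf();
        jobConf.set("io.serializations",
                "WritableSerialization,CustomSerialization");

        // Passing the job config keeps the custom serializer.
        System.out.println(getOrCreateConf(jobConf).get("io.serializations"));
        // prints "WritableSerialization,CustomSerialization"

        // Passing null silently falls back to defaults only.
        System.out.println(getOrCreateConf(null).get("io.serializations"));
        // prints "WritableSerialization"
    }
}
```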

This is what you'll want to investigate and fix, or notify the Gora
devs about (why QueryBase#readFields passes a null conf, and whether
it can reuse an already-set conf object). As a cheap hack fix, maybe
doing the following will make it work in an MR environment?

IOUtils.conf = new Configuration();
IOUtils.conf.addResource("job.xml");

I haven't tried the above, but let us know how we can be of further
assistance. An ideal fix would be to only use the MapTask's provided
Configuration object everywhere, somehow, and never re-create one.

P.S. If you want a link to this thread to share with the Gora devs,
here it is: http://search-hadoop.com/m/BXZA4dTUFC

On Fri, Jul 27, 2012 at 1:24 PM, Sriram Ramachandrasekaran
<sri.ram...@gmail.com> wrote:
> Hello,
> I have an MR job that talks to HBase. I use Gora to talk to HBase. Gora also
> provides couple of classes which can be extended to write Mappers and
> Reducers, if the mappers need input from an HBase store and Reducers need to
> write it out to an HBase store. This is the reason why I use Gora.
>
> Now, when I run my MR job, I get an exception as below.
> (https://issues.apache.org/jira/browse/HADOOP-3093)
> java.lang.RuntimeException: java.io.IOException:
> java.lang.NullPointerException
> at
> org.apache.gora.mapreduce.GoraInputFormat.setConf(GoraInputFormat.java:115)
> at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62)
> at
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:723)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
> at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
> at org.apache.hadoop.mapred.Child.main(Child.java:249)
> Caused by: java.io.IOException: java.lang.NullPointerException
> at org.apache.gora.util.IOUtils.loadFromConf(IOUtils.java:483)
> at
> org.apache.gora.mapreduce.GoraInputFormat.getQuery(GoraInputFormat.java:125)
> at
> org.apache.gora.mapreduce.GoraInputFormat.setConf(GoraInputFormat.java:112)
> ... 9 more
> Caused by: java.lang.NullPointerException
> at
> org.apache.hadoop.io.serializer.SerializationFactory.getDeserializer(SerializationFactory.java:77)
> at org.apache.gora.util.IOUtils.deserialize(IOUtils.java:205)
> at org.apache.gora.query.impl.QueryBase.readFields(QueryBase.java:234)
> at
> org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
> at
> org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
> at
> org.apache.hadoop.io.DefaultStringifier.fromString(DefaultStringifier.java:75)
> at org.apache.hadoop.io.DefaultStringifier.load(DefaultStringifier.java:133)
> at org.apache.gora.util.IOUtils.loadFromConf(IOUtils.java:480)
> ... 11 more
>
> I tried the following things to work through this issue.
> 0. The stack trace indicates that, when setting up a new Mapper, it is
> unable to deserialize something. (I could not get to understand where it
> fails).
> 1. I looked around the forums and realized that serialization options are
> not getting passed, so, I tried setting up, io.serializations config on the
> job.
>    1.1. I am not setting up the "io.serializations" myself, I use
> GoraMapReduceUtils.setIOSerializations() to do it. I verified that, the
> confs are getting proper serializers.
> 2. I verified in the job xml to see if these confs have got through, they
> were. But, it failed again.
> 3. I tried starting the hadoop job runner with debug options turned on and
> in suspend mode, -XDebug suspend=y and I also set the VM options for mapred
> child tasks, via the mapred.child.java.opts to see if I can debug the VM
> that gets spawned newly. Although I get a message on my stdout saying,
> opening port X and waiting, when I try to attach a remote debugger on that
> port, it does not work.
>
> I understand that, when SerializationFactory tries to deSerialize
> 'something', it does not find an appropriate unmarshaller and so it fails.
> But, I would like to know a way to find that 'something' and I would like to
> get some idea on how (pseudo) distributed MR jobs should be generally
> debugged. I tried searching, did not find anything useful.
>
> Any help/pointers would be greatly useful.
>
> Thanks!
>
> --
> It's just about how deep your longing is!
>



-- 
Harsh J
