[ https://issues.apache.org/jira/browse/NUTCH-650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12860493#action_12860493 ]
Soila Pertet commented on NUTCH-650:
------------------------------------
I encountered the following NullPointerException while running nutchbase:
2010-04-24 01:58:47,012 WARN org.apache.hadoop.mapred.TaskTracker: Error running child
java.lang.NullPointerException
    at org.apache.hadoop.hbase.io.ImmutableBytesWritable.<init>(ImmutableBytesWritable.java:59)
    at org.apache.nutch.fetcher.Fetcher$FetcherMapper.map(Fetcher.java:81)
    at org.apache.nutch.fetcher.Fetcher$FetcherMapper.map(Fetcher.java:77)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)
I checked out nutchbase with
svn co http://svn.apache.org/repos/asf/lucene/nutch/branches/nutchbase
and applied Xiao's patch. I am running hadoop-0.20.3, hbase-0.20.3 and zookeeper-3.2.2.
In my application, the error occurs after the first iteration of the
fetch/generate cycle and is limited to the base URL, whose row has only the
mtdt:_csh_ column, e.g.:
keyvalues={host:http:8080/wikipedia/de/de/index.html/mtdt:_csh_/1272088691273/Put/vlen=4}
It works fine, however, for rows that also have the mtdt:__genmrk__ generator mark, e.g.:
keyvalues={host:http:8080/wikipedia/de/de/images/wikimedia-button.png/mtdt:__genmrk__/1272088714395/Put/vlen=4,
host:http:8080/wikipedia/de/de/images/wikimedia-button.png/mtdt:_csh_/1272088691109/Put/vlen=4}
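The two dumps differ in one visible way: the failing base-URL row has no mtdt:__genmrk__ column, while the working row has both mtdt:__genmrk__ and mtdt:_csh_. A minimal sketch of telling the two cases apart, assuming a row is modelled as a map from "family:qualifier" to value (GenerateMarkCheck and hasGenerateMark are hypothetical names, not Nutch code; the column names are copied from the dumps above):

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Nutch: a row is modelled as a map from
// "family:qualifier" to value, mirroring the keyvalues dumps in the report.
class GenerateMarkCheck {
    // Column name copied from the working row's dump.
    static final String GENERATE_MARK = "mtdt:__genmrk__";

    /** Returns true if the row carries the generator mark column. */
    static boolean hasGenerateMark(Map<String, byte[]> row) {
        return row.containsKey(GENERATE_MARK);
    }

    public static void main(String[] args) {
        // Like the failing base-URL row: only the _csh_ column.
        Map<String, byte[]> baseUrlRow = new TreeMap<>();
        baseUrlRow.put("mtdt:_csh_", new byte[4]);

        // Like the working image row: generator mark plus _csh_.
        Map<String, byte[]> imageRow = new TreeMap<>();
        imageRow.put("mtdt:__genmrk__", new byte[4]);
        imageRow.put("mtdt:_csh_", new byte[4]);

        System.out.println(hasGenerateMark(baseUrlRow)); // false
        System.out.println(hasGenerateMark(imageRow));   // true
    }
}
```

The sketch only distinguishes the two row shapes; whether rows without the generator mark should reach the fetcher at all is a question for the Generator and Fetcher code.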
I modified org.apache.nutch.fetcher.Fetcher$FetcherMapper.map to check
outKeyRaw for null. This masks the error, but I am not sure it is the right
fix. Please let me know.
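A minimal sketch of that kind of guard, under stated assumptions: BytesKey is a hypothetical stand-in for ImmutableBytesWritable that, as the trace suggests for its constructor, reads bytes.length and so throws on a null array; mapRow is a hypothetical simplification of FetcherMapper.map, not the actual Nutch code. Counting the skipped rows keeps the workaround from hiding the problem completely:

```java
import java.util.ArrayList;
import java.util.List;

class NullKeyGuard {
    /**
     * Hypothetical stand-in for HBase's ImmutableBytesWritable: it reads
     * bytes.length, so it throws NullPointerException on a null array.
     */
    static final class BytesKey {
        final byte[] bytes;
        final int length;

        BytesKey(byte[] bytes) {
            this.bytes = bytes;
            this.length = bytes.length; // NPE here when bytes == null
        }
    }

    /** Number of rows skipped because their raw key was null. */
    static int skippedRows = 0;

    /**
     * Hypothetical map step: wraps the raw key, or skips and counts the row
     * instead of failing the task. Returns null when the row is skipped.
     */
    static BytesKey mapRow(byte[] outKeyRaw) {
        if (outKeyRaw == null) {
            skippedRows++; // record the skip so it is not silent
            return null;
        }
        return new BytesKey(outKeyRaw);
    }

    public static void main(String[] args) {
        List<BytesKey> emitted = new ArrayList<>();
        byte[][] keys = { "a".getBytes(), null, "b".getBytes() };
        for (byte[] k : keys) {
            BytesKey out = mapRow(k);
            if (out != null) {
                emitted.add(out);
            }
        }
        System.out.println(emitted.size() + " emitted, " + skippedRows + " skipped"); // 2 emitted, 1 skipped
    }
}
```

In a real Hadoop 0.20 mapper, the count would more naturally go to a job counter (for example context.getCounter(group, name).increment(1)) so that the number of skipped rows shows up in the job status. The guard still leaves open why outKeyRaw is null for the base URL in the first place.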
Thanks.
> Hbase Integration
> -----------------
>
> Key: NUTCH-650
> URL: https://issues.apache.org/jira/browse/NUTCH-650
> Project: Nutch
> Issue Type: New Feature
> Reporter: Doğacan Güney
> Assignee: Doğacan Güney
> Fix For: 2.0
>
> Attachments: hbase-integration_v1.patch, hbase_v2.patch,
> malformedurl.patch, meta.patch, meta2.patch, nofollow-hbase.patch,
> NUTCH-650.patch, nutch-habase.patch, searching.diff, slash.patch
>
>
> This issue will track nutch/hbase integration
--
This message is automatically generated by JIRA.