Hi David,

I haven't used the HBase backend with Gora for quite some time, but from what
I can remember you'll need the following things:

* conf/hbase-site.xml => this should correspond to your local configuration
* conf/gora-hbase-mapping.xml => see below
* conf/gora.properties => I don't think there is anything you need to specify
for HBase

* in nutch-site.xml

<property>
  <name>storage.data.store.class</name>
  <value>org.gora.hbase.store.HBaseStore</value>
  <description>Default class for storing data</description>
</property>

and of course all the necessary HBase jars in the /lib dir. It is probably
easier to modify ivy/ivy.xml so that it includes HBase.
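
For the first item in the list above, here is a rough sketch of what a
conf/hbase-site.xml for a single local HBase instance could look like. The
property names are standard HBase ones, but the values are only assumptions
(the data directory in particular is a made-up path), so copy whatever your
own HBase installation actually uses:

<configuration>
  <property>
    <name>hbase.rootdir</name>
    <!-- hypothetical local path; use the rootdir of your own HBase -->
    <value>file:///tmp/hbase-data</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <!-- assumes ZooKeeper runs on the same machine -->
    <value>localhost</value>
  </property>
</configuration>

The point is simply that Nutch has to see the same settings as the HBase
instance you want it to talk to.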

gora-hbase-mapping.xml: not sure this is the latest version, though

<?xml version="1.0" encoding="UTF-8"?>

<gora-orm>

<table name="webtable">
  <!-- This can also have params like compression, bloom filters -->
  <family name="p"/>
  <family name="f"/>
  <family name="s"/>
  <family name="il"/>
  <family name="ol"/>
  <family name="h"/>
  <family name="mtdt"/>
  <family name="mk"/>
</table>

<class table="webtable" keyClass="java.lang.String"
name="org.apache.nutch.storage.WebPage">
  <!-- fetch fields                                       -->
  <field name="baseUrl" family="f" qualifier="bas"/>
  <field name="status" family="f" qualifier="st"/>
  <field name="prevFetchTime" family="f" qualifier="pts"/>
  <field name="fetchTime" family="f" qualifier="ts"/>
  <field name="fetchInterval" family="f" qualifier="fi"/>
  <field name="retriesSinceFetch" family="f" qualifier="rsf"/>
  <field name="reprUrl" family="f" qualifier="rpr"/>
  <field name="content" family="f" qualifier="cnt"/>
  <field name="contentType" family="f" qualifier="typ"/>
  <field name="protocolStatus" family="f" qualifier="prot"/>
  <field name="modifiedTime" family="f" qualifier="mod"/>

  <!-- parse fields                                       -->
  <field name="title" family="p" qualifier="t"/>
  <field name="text" family="p" qualifier="c"/>
  <field name="parseStatus" family="p" qualifier="st"/>
  <field name="signature" family="p" qualifier="sig"/>
  <field name="prevSignature" family="p" qualifier="psig"/>

  <!-- score fields                                       -->
  <field name="score" family="s" qualifier="s"/>

  <field name="headers" family="h"/>

  <field name="inlinks" family="il"/>

  <field name="outlinks" family="ol"/>

  <field name="metadata" family="mtdt"/>

  <field name="markers" family="mk"/>

</class>

</gora-orm>
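
Regarding the comment on the "p" family: as far as I remember, the extra
parameters go on the family element as attributes. A sketch, assuming the
attribute names compression and maxVersions are supported by your Gora version
(please check, I have not verified this against the current code), with values
that are only examples:

<table name="webtable">
  <!-- assumed attribute names; the values are illustrative only -->
  <family name="p" compression="GZ" maxVersions="1"/>
  <!-- the remaining families stay as in the mapping above -->
</table>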


HTH

Good luck!

Julien

-- 
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com

On 2 September 2010 12:58, David Stuart <
david.stu...@progressivealliance.co.uk> wrote:

> Hey All,
>
> I have set up the latest version of Nutch from trunk and am running into a
> few issues with HBase and injecting URLs. When I run the command
>
> runtime/local/bin/nutch inject runtime/local/seed/
>
> I get
> InjectorJob: java.lang.RuntimeException: Could not create datastore
>        at org.apache.nutch.storage.StorageUtils.initMapperJob(StorageUtils.java:70)
>        at org.apache.nutch.storage.StorageUtils.initMapperJob(StorageUtils.java:50)
>        at org.apache.nutch.crawl.InjectorJob.inject(InjectorJob.java:233)
>        at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:246)
>        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>        at org.apache.nutch.crawl.InjectorJob.main(InjectorJob.java:256)
>
> Under the gora properties it should be pointing at localhost/nutchtest, and
> I created that store manually in HBase. Is that right?
>
> I have found a few tutorials around nutchbase but the api seems to have
> changed since the merge with Nutch trunk
>
> Any help would be appreciated, and I will try to do a how-to write-up.
>
> Regards,
>
> Dave
