Hi David,

I haven't used the HBase backend with Gora for quite some time, but from what I can remember you'll need the following:
* conf/hbase-site.xml => this should correspond to your local configuration
* conf/gora-hbase-mapping.xml => see below
* conf/gora.properties => I don't think there is anything you need to specify for HBase
* in nutch-site.xml:

<property>
  <name>storage.data.store.class</name>
  <value>org.gora.hbase.store.HbaseStore</value>
  <description>Default class for storing data</description>
</property>

and of course all the necessary HBase jars in the /lib dir - it is probably easier to modify ivy/ivy.xml so that it includes HBase.

gora-hbase-mapping.xml (not sure this is the latest version though):

<?xml version="1.0" encoding="UTF-8"?>
<gora-orm>
  <table name="webtable">
    <family name="p"/> <!-- This can also have params like compression, bloom filters -->
    <family name="f"/>
    <family name="s"/>
    <family name="il"/>
    <family name="ol"/>
    <family name="h"/>
    <family name="mtdt"/>
    <family name="mk"/>
  </table>
  <class table="webtable" keyClass="java.lang.String" name="org.apache.nutch.storage.WebPage">
    <!-- fetch fields -->
    <field name="baseUrl" family="f" qualifier="bas"/>
    <field name="status" family="f" qualifier="st"/>
    <field name="prevFetchTime" family="f" qualifier="pts"/>
    <field name="fetchTime" family="f" qualifier="ts"/>
    <field name="fetchInterval" family="f" qualifier="fi"/>
    <field name="retriesSinceFetch" family="f" qualifier="rsf"/>
    <field name="reprUrl" family="f" qualifier="rpr"/>
    <field name="content" family="f" qualifier="cnt"/>
    <field name="contentType" family="f" qualifier="typ"/>
    <field name="protocolStatus" family="f" qualifier="prot"/>
    <field name="modifiedTime" family="f" qualifier="mod"/>
    <!-- parse fields -->
    <field name="title" family="p" qualifier="t"/>
    <field name="text" family="p" qualifier="c"/>
    <field name="parseStatus" family="p" qualifier="st"/>
    <field name="signature" family="p" qualifier="sig"/>
    <field name="prevSignature" family="p" qualifier="psig"/>
    <!-- score fields -->
    <field name="score" family="s" qualifier="s"/>
    <field name="headers" family="h"/>
    <field name="inlinks" family="il"/>
    <field name="outlinks" family="ol"/>
    <field name="metadata" family="mtdt"/>
    <field name="markers" family="mk"/>
  </class>
</gora-orm>

HTH

Good luck!

Julien

--
Open Source Solutions for Text Engineering
http://digitalpebble.blogspot.com/
http://www.digitalpebble.com

On 2 September 2010 12:58, David Stuart <david.stu...@progressivealliance.co.uk> wrote:

> Hey All,
>
> I have setup the latest version nutch from trunk and am running into a few
> issues with hbase and injecting urls. when I run the command
>
> runtime/local/bin/nutch inject runtime/local/seed/
>
> I get
> InjectorJob: java.lang.RuntimeException: Could not create datastore
>        at org.apache.nutch.storage.StorageUtils.initMapperJob(StorageUtils.java:70)
>        at org.apache.nutch.storage.StorageUtils.initMapperJob(StorageUtils.java:50)
>        at org.apache.nutch.crawl.InjectorJob.inject(InjectorJob.java:233)
>        at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:246)
>        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>        at org.apache.nutch.crawl.InjectorJob.main(InjectorJob.java:256)
>
> Under the gora properties it should be pointing at localhost/nutchtest and
> I created that store manually in hbase is that right?
> I have found a few tutorials around nutchbase but the api seems to have
> changed since the merge with Nutch trunk
>
> Any help would be appreciated and I try to do a how to writeup
>
> Regards,
>
> Dave
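
PS: since the mapping file is edited by hand, a quick consistency check before running inject may save some time. The following is only a minimal sketch, not part of Nutch or Gora: the function name undeclared_families is made up for illustration, and it checks just one thing, namely that every <field> refers to a <family> declared on its <table>. It does not talk to HBase, and I don't know whether a mismatch of this kind is what causes the "Could not create datastore" error above.

```python
import sys
import xml.etree.ElementTree as ET


def undeclared_families(root):
    """Return {field name: family} for fields whose family is not declared
    on the table that their <class> element points at.

    `root` is the parsed <gora-orm> element. The function name and the
    returned shape are illustrative choices, not a Gora API.
    """
    # Collect the families declared for each table.
    declared = {
        table.get("name"): {fam.get("name") for fam in table.findall("family")}
        for table in root.findall("table")
    }
    problems = {}
    for cls in root.findall("class"):
        families = declared.get(cls.get("table"), set())
        for field in cls.findall("field"):
            if field.get("family") not in families:
                problems[field.get("name")] = field.get("family")
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (path is whatever your checkout uses):
    #   python check_mapping.py conf/gora-hbase-mapping.xml
    mapping_root = ET.parse(sys.argv[1]).getroot()
    for name, family in sorted(undeclared_families(mapping_root).items()):
        print(f"field {name!r} uses undeclared family {family!r}")
```

An empty result only means the file is internally consistent; the table and families still have to exist in (or be creatable by) your HBase instance.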