Hi,

I think we need to commit all the necessary files to Nutch so that it works out of the box with SQL, HBase and Cassandra. We can even put commented-out entries in gora.properties, nutch-site.xml, etc., so that using Nutch with a different backend becomes just a configuration change. I will open an issue to track this.
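
As a rough sketch of what the commented-out entries could look like in nutch-site.xml: only the HBase class name below is taken from the config quoted in this thread; the SQL and Cassandra class names are assumptions by analogy and would need to be checked against the actual Gora modules.

```xml
<property>
  <name>storage.data.store.class</name>
  <!-- Active backend: HBase (class name as quoted further down in this thread) -->
  <value>org.gora.hbase.store.HBaseStore</value>
  <!-- Assumed alternatives, not verified; swap one in to change the backend:
  <value>org.gora.sql.store.SqlStore</value>
  <value>org.gora.cassandra.store.CassandraStore</value>
  -->
  <description>Default class for storing data</description>
</property>
```

Switching backends would then mean moving one <value> line out of the comment and supplying the matching mapping file (for HBase, gora-hbase-mapping.xml).
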
Cheers,
Enis

On Wed, Sep 8, 2010 at 1:53 PM, Julien Nioche <lists.digitalpeb...@gmail.com> wrote:

> Hi guys,
>
> I've summarized the steps to follow for having GORA+Hbase with Nutch 2.0 on
> http://wiki.apache.org/nutch/GORA_HBase
>
> Feel free to amend and improve as you see fit.
>
> Please bear in mind that Nutch 2.0 is at a very early stage and is far from
> being bug-proof, see in particular [1].
>
> HTH
>
> Julien
>
> [1] https://issues.apache.org/jira/browse/NUTCH-893
>
> --
> Open Source Solutions for Text Engineering
>
> http://digitalpebble.blogspot.com/
> http://www.digitalpebble.com
>
>
> On 6 September 2010 13:35, Andrzej Bialecki <a...@getopt.org> wrote:
>
> > On 2010-09-05 14:56, David Stuart wrote:
> >
> >> Hi All,
> >>
> >> I have done as per below and can create a table from within the hbase
> >> shell. I found the appropriate create table method
> >> bin/nutch org.apache.nutch.storage.WebTableCreator webtable but it only
> >> returns null
> >>
> >> Any help would be great
> >>
> >
> > You don't have to create a table manually - this should happen
> > automatically when you first run any Nutch tool. Just make sure you have
> > hbase-site.xml on your classpath in Nutch - best if you put it in your conf/
> > and rebuild, so that it's packed into a job jar.
> >
> > Here's for example my config files that work with HBase (I don't use any
> > non-standard settings for HBase, so my hbase-site.xml has no properties, but
> > still it needs to be included in Nutch job jar):
> >
> > gora-hbase-mapping.xml:
> > -------------------------------------------------------------------------
> >
> > <gora-orm>
> >
> >   <table name="webtable">
> >     <family name="p"/> <!-- This can also have params like compression, bloom filters -->
> >     <family name="f"/>
> >     <family name="s"/>
> >     <family name="il"/>
> >     <family name="ol"/>
> >     <family name="h"/>
> >     <family name="mtdt"/>
> >     <family name="mk"/>
> >   </table>
> >
> >   <class table="webtable" keyClass="java.lang.String"
> >          name="org.apache.nutch.storage.WebPage">
> >     <!-- fetch fields -->
> >     <field name="baseUrl" family="f" qualifier="bas"/>
> >     <field name="status" family="f" qualifier="st"/>
> >     <field name="prevFetchTime" family="f" qualifier="pts"/>
> >     <field name="fetchTime" family="f" qualifier="ts"/>
> >     <field name="fetchInterval" family="f" qualifier="fi"/>
> >     <field name="retriesSinceFetch" family="f" qualifier="rsf"/>
> >     <field name="reprUrl" family="f" qualifier="rpr"/>
> >     <field name="content" family="f" qualifier="cnt"/>
> >     <field name="contentType" family="f" qualifier="typ"/>
> >     <field name="protocolStatus" family="f" qualifier="prot"/>
> >     <field name="modifiedTime" family="f" qualifier="mod"/>
> >
> >     <!-- parse fields -->
> >     <field name="title" family="p" qualifier="t"/>
> >     <field name="text" family="p" qualifier="c"/>
> >     <field name="parseStatus" family="p" qualifier="st"/>
> >     <field name="signature" family="p" qualifier="sig"/>
> >     <field name="prevSignature" family="p" qualifier="psig"/>
> >
> >     <!-- score fields -->
> >     <field name="score" family="s" qualifier="s"/>
> >
> >     <field name="headers" family="h"/>
> >
> >     <field name="inlinks" family="il"/>
> >
> >     <field name="outlinks" family="ol"/>
> >
> >     <field name="metadata" family="mtdt"/>
> >
> >     <field name="markers" family="mk"/>
> >
> >   </class>
> >
> > </gora-orm>
> > -------------------------------------------------------------------------
> >
> > nutch-site.xml:
> > -------------------------------------------------------------------------
> > ... blah blah, a lot of unrelated stuff...
> >
> > <property>
> >   <name>storage.data.store.class</name>
> >   <value>org.gora.hbase.store.HBaseStore</value>
> >
> >   <description>Default class for storing data</description>
> > </property>
> > -------------------------------------------------------------------------
> >
> > Of course you need also to use the same hadoop files (hdfs-site and
> > mapred-site) as the ones that HBase uses.
> >
> >
> > --
> > Best regards,
> > Andrzej Bialecki <><
> > ___. ___ ___ ___ _ _ __________________________________
> > [__ || __|__/|__||\/| Information Retrieval, Semantic Web
> > ___|||__|| \| || | Embedded Unix, System Integration
> > http://www.sigram.com Contact: info at sigram dot com
> >
>