Up to a point, yes. Heritrix writes its output in ARC format. You can then use o.a.n.tools.arc.ArcSegmentCreator to convert the ARC files into Nutch segments, and from there run the other tools on the segments as normal. What you won't get is Heritrix having access to the crawldb.
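
A rough sketch of that conversion step, for illustration only. It assumes the class lives at org.apache.nutch.tools.arc.ArcSegmentCreator in your Nutch tree, that bin/nutch will run a fully qualified class name given as its command, and that the tool takes the ARC input directory followed by the segments output directory. The helper names and the example paths are hypothetical; check the tool's own usage message before relying on the argument order.

```python
import subprocess

# Assumed fully qualified class name of the ARC-to-segments tool in Nutch.
ARC_TOOL = "org.apache.nutch.tools.arc.ArcSegmentCreator"


def build_arc_to_segments_cmd(nutch_home, arc_dir, segments_dir):
    """Build the argv list for the conversion (hypothetical helper).

    Assumed argument order: <arcFiles> <segmentsOutDir>.
    """
    launcher = f"{nutch_home}/bin/nutch"
    return [launcher, ARC_TOOL, str(arc_dir), str(segments_dir)]


def convert(nutch_home, arc_dir, segments_dir, dry_run=True):
    """Return the command; only execute it when dry_run is False."""
    cmd = build_arc_to_segments_cmd(nutch_home, arc_dir, segments_dir)
    if not dry_run:
        # Raises CalledProcessError if the Nutch job exits non-zero.
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    # Example paths are placeholders, not taken from this thread.
    print(" ".join(convert("/opt/nutch", "heritrix/arcs", "crawl/segments")))
```

With dry_run left at its default the script only prints the command, so you can check it by eye before letting it start a job.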

Dennis

Ryan Smith wrote:
Is it possible to use heritrix as nutch's crawler?


On Sat, Mar 28, 2009 at 3:53 PM, Sami Siren <ssi...@gmail.com> wrote:

I am pleased to announce the availability of Apache Nutch 1.0.

Apache Nutch, a subproject of Apache Lucene, is open source web-search
software. It builds on Lucene Java, adding web-specifics, such as a crawler,
a link-graph database, and parsers for HTML and other document formats.

Apache Nutch 1.0 contains a number of bug fixes and improvements, such as
Solr integration, a new indexing framework, and a new scoring framework, just
to mention a few. Details can be found in the changes file:

http://svn.apache.org/repos/asf/lucene/nutch/tags/release-1.0/CHANGES.txt

Apache Nutch is available for download from the following download page:
http://www.apache.org/dyn/closer.cgi/lucene/nutch/nutch-1.0.tar.gz

When downloading from a mirror site, please remember to verify the
downloads using signatures found on the Apache site:
http://www.apache.org/dist/lucene/nutch/KEYS

For more information on Apache Nutch, visit the project home page:
http://lucene.apache.org/nutch

-- Sami Siren (on behalf of the Apache Nutch community)

