(...apologies for the cross posting...)

The Apache Nutch project is pleased to announce the release of Apache Nutch 1.1. The release contents have been pushed to the main Apache release site, so the release should be available as soon as the mirrors sync.

Apache Nutch, one of the six new Apache TLPs established as a result of the April 2010 Board Meeting, is an extensible framework for building out large-scale web-based search. Layered on top of fellow Apache projects Hadoop, Lucene/Solr, and Tika, Nutch provides an out-of-the-box platform for fetching web pages, PDF files, Word documents, and more. Nutch parses the content and its relevant information, indexes its metadata, and makes it available for efficient query and retrieval over modern Internet protocols.

Apache Nutch 1.1 contains a number of improvements and bug fixes. Details can be found in the changes file:

http://www.apache.org/dist/nutch/CHANGES-1.1.txt

Apache Nutch is available in source and binary form from the following download page:

http://www.apache.org/dyn/closer.cgi/nutch/

During the first 48 hours, the release may not be available on all mirrors. When downloading from a mirror site, please remember to verify the downloads using signatures found on the Apache site:

http://www.apache.org/dist/nutch/KEYS-1.1.txt

For more information on Apache Nutch, visit the project home page:

http://nutch.apache.org

-- Chris Mattmann (on behalf of the Apache Nutch community)