Hi,

We developed our own crawler. It's a lightweight crawler that conforms to the Google Connector Manager architecture. Some neat features of the crawler:

- Near real-time indexing. New pages are indexed seconds after they are crawled.
- On-demand pages. These pages are crawled with higher priority.
- Depth control between recrawls (prevents loops).
- Based on HtmlUnit, which supports JavaScript.

Regards.

On Thu, Jan 6, 2011 at 3:19 AM, Otis Gospodnetic <ogjunk-nu...@yahoo.com> wrote:

> I think this is a good question and I'd be curious what the answer is, too.
> Rida, could you please shed some light on this crawler side of Constellio?
>
> This is also interesting because LWE chose Aperture's crawler instead of
> Nutch, even though Andrzej works for Lucid. How come? Is Nutch simply too
> big and complex, while Aperture's stuff is more suitable for typical
> non-Web-scale crawling needs of a typical enterprise/LWE customer?
>
> Thanks,
> Otis
> ----
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/
>
>
> *From:* Davide Cavalaglio <davide.cavalag...@desktopsrl.com>
> *To:* dev@nutch.apache.org
> *Sent:* Tue, December 28, 2010 7:08:27 AM
> *Subject:* Re: The Constellio team is proud to release its version 1.1
>
> Hi,
> but the crawler used by Constellio is Nutch?
>
> 2010/12/20 Rida Benjelloun <rida.benjell...@doculibre.com>
>
>> The Constellio team is proud to release its version 1.1
>>
>> Constellio Open Source Enterprise Search is based on Apache Solr and using
>> Google Search Appliances connectors architecture, it allows, with a single
>> click, to find all relevant content in your organization (Web, email, ECM,
>> CRM etc.).
>>
>> Please be advised that the GPL v.3.0 Constellio licence has been changed
>> for the version LGPL v.3.0.
>>
>> The new licence LGPL v.3.0 gives more flexibility to developers interested
>> in plug-in/module development or the integration of Constellio to other
>> solutions.
>> The SVN (svn.constellio.com) and the issue tracker (issues.constellio.com)
>> are now also open.
>>
>> Many important changes have been done in this new version.
>>
>> Here are some of the new features developed in the 1.1 version:
>>
>> - Constellio multi-platform installer
>> - Federated search
>> - Document security
>> - Autocomplete for simple search based on most popular queries
>> - Configurable advanced search interface and autocomplete based on
>> field content
>> - Solr connector (upload your schema.xml and content - xml and binary -
>> files)
>> - Activation of Solr HTTP Web services and make Constellio spell
>> checker available through these services
>> - Implementation of multiselect faceting
>> - Configuration of display fields
>> - Documents consultation used in the relevance calculation of search
>> results
>> - Add field boost, document boost, and Solr dismax (relevance)
>> - Add Carrot2 for faceting
>> - Web crawler improvements
>> - Add new theme
>> - and more ...
>> Your comments/suggestions are also welcome!
>>
>>
>>
>> --
>> ---------------------------------------------------------
>> Rida Benjelloun
>> Constellio - Doculibre
>> ridabenjell...@apache.org
>> rida.benjell...@doculibre.com
>> ---------------------------------------------------------
>>
>
>

--
---------------------------------------------------------
Rida Benjelloun
Constellio - Doculibre
ridabenjell...@apache.org
rida.benjell...@doculibre.com
---------------------------------------------------------
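
P.S. For readers wondering how the on-demand priority and depth control mentioned in the reply at the top of this thread might fit together, here is a minimal sketch. It is not Constellio's code: the `Frontier` class, the `crawl` function, the `max_depth` parameter, and the two priority levels are all hypothetical names chosen for illustration, and treating "depth control" as a cap on link depth from the seed is only one possible reading of that feature. Page fetching is passed in as a plain function so the sketch stays independent of HtmlUnit.

```python
import heapq
import itertools

# Hypothetical priority levels: a lower value is crawled first.
ON_DEMAND = 0
REGULAR = 1


class Frontier:
    """Queue of URLs waiting to be crawled (illustrative only)."""

    def __init__(self, max_depth):
        self.max_depth = max_depth
        self._heap = []
        self._seen = set()
        # Tie-breaker so URLs with equal priority come out in insertion order.
        self._counter = itertools.count()

    def add(self, url, depth=0, on_demand=False):
        """Queue a URL unless it is too deep or was already queued."""
        if depth > self.max_depth or url in self._seen:
            return False
        self._seen.add(url)
        priority = ON_DEMAND if on_demand else REGULAR
        heapq.heappush(self._heap, (priority, next(self._counter), url, depth))
        return True

    def pop(self):
        """Return the next (url, depth) pair, on-demand URLs first."""
        _, _, url, depth = heapq.heappop(self._heap)
        return url, depth

    def __len__(self):
        return len(self._heap)


def crawl(frontier, fetch_links, index):
    """Drain the frontier, indexing each page right after it is fetched."""
    while len(frontier):
        url, depth = frontier.pop()
        links = fetch_links(url)   # caller-supplied: returns outgoing links
        index(url)                 # index immediately, not in a later batch
        for link in links:
            frontier.add(link, depth + 1)
```

The seen-set together with the depth cap is what keeps a cycle such as a -> b -> a from being followed forever. One simplification to note: a URL that is already queued at regular priority is not promoted if it is later requested on demand.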