Hi Karl,

Maybe a good start would be to identify which parts of your crawler
could be shared and made generic without too much effort. I haven't
looked at the crawler's code in great detail, but do you think the
robots parser would be a good candidate?
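To make that concrete, here is a very rough sketch of what a shared
robots.txt API could look like. All of the names here (RobotRules,
RobotRulesSketch, parse) are made up purely for illustration and don't
exist in Crawler Commons or anywhere else. It also only honours
User-agent and Disallow lines, and it applies every matching group
rather than just the most specific one, as a real implementation
should:

    import java.util.ArrayList;
    import java.util.List;

    // Hypothetical sketch of a crawler-agnostic robots.txt API; the
    // names are illustrative, not an existing Crawler Commons class.
    public class RobotRulesSketch {

        /** Parsed rules for one user-agent: "may I fetch this path?" */
        public interface RobotRules {
            boolean isAllowed(String path);
        }

        /** Minimal parser: honours User-agent and Disallow lines only. */
        public static RobotRules parse(String robotsTxt, String userAgent) {
            final List<String> disallowed = new ArrayList<String>();
            boolean applies = false;
            for (String line : robotsTxt.split("\n")) {
                // Strip comments and surrounding whitespace.
                int hash = line.indexOf('#');
                if (hash >= 0) line = line.substring(0, hash);
                line = line.trim();
                if (line.isEmpty()) continue;
                int colon = line.indexOf(':');
                if (colon < 0) continue;
                String field = line.substring(0, colon).trim().toLowerCase();
                String value = line.substring(colon + 1).trim();
                if (field.equals("user-agent")) {
                    // A group applies if it names us or uses the wildcard.
                    applies = value.equals("*")
                            || userAgent.toLowerCase().contains(value.toLowerCase());
                } else if (applies && field.equals("disallow") && value.length() > 0) {
                    disallowed.add(value);
                }
            }
            return new RobotRules() {
                public boolean isAllowed(String path) {
                    // Disallow rules are plain path prefixes in this sketch.
                    for (String prefix : disallowed) {
                        if (path.startsWith(prefix)) return false;
                    }
                    return true;
                }
            };
        }

        public static void main(String[] args) {
            RobotRules rules = parse(
                    "User-agent: *\nDisallow: /private/\n", "my-crawler");
            System.out.println(rules.isAllowed("/index.html")); // true
            System.out.println(rules.isAllowed("/private/x"));  // false
        }
    }

A real version would obviously need Allow, Crawl-delay, wildcard
matching and so on, but an interface along these lines would let each
crawler keep its own fetching and caching while sharing the parsing
logic.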
Julien

On 2 June 2011 16:23, Karl Wright <daddy...@gmail.com> wrote:
> Absolutely!
> We're a bit thin on active committers at the moment, which will
> probably limit our ability to take any highly active roles in your
> development process. But we do have a pile of code which you might be
> able to leverage, and once there is common functionality available I
> think we'd all prefer to use that rather than home-grown code.
>
> How would you prefer that we proceed?
>
> Karl
>
>
> On Thu, Jun 2, 2011 at 11:11 AM, Julien Nioche
> <lists.digitalpeb...@gmail.com> wrote:
> > Hi guys,
> >
> > I'd just like to mention Crawler Commons, which is an effort between the
> > committers of various crawl-related projects (Nutch, Bixo, Heritrix) to
> > share some basic functionality. We currently have mostly a top-level
> > domain finder and a sitemap parser, but are definitely planning to add
> > other things as well, e.g. a robots.txt parser, protocol handlers, etc.
> >
> > Would you like to get involved? There are quite a few things that the
> > crawler in Manifold could reuse or contribute to.
> >
> > Best,
> >
> > Julien
> >
> > --
> > Open Source Solutions for Text Engineering
> >
> > http://digitalpebble.blogspot.com/
> > http://www.digitalpebble.com

--
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com