Just to clarify: when I wrote about integrating the droids-crawler, I wasn't thinking about merging it with droids-core. Right now, the droids-crawler does not use droids-core at all; an old version of the droids-core code is copied into the project. So by integrating it, I meant making it depend on droids-core.
For the rest I totally agree that the core should be kept as simple as possible.

br

On Thu, Nov 18, 2010 at 10:36 PM, Paul Rogalinski <[email protected]> wrote:

> HelloWorld,
>
> I'm currently building a crawler based on the droid-core implementation,
> trying not to change anything in the core API / interfaces yet. Due to the
> lack of documentation I was not so eager to dive directly into a lot of
> crawler-code with unclear quality. Perhaps this was a mistake, but on the
> other hand it does currently suit me quite well.
>
> My goal is to have a crawler with a very small footprint to be embedded
> into a Hadoop map/reduce job. So I am not using Spring (IMHO too much
> overhead to initialize when running inside map/reduce), recrawling or even
> multi-threaded crawling. I do plan to spawn a lot of droids, each taking
> care of one domain. Each droid has no need to jump domains or hosts.
> Extracted data will be written into an HBase cluster for further processing.
>
> This is not some hobby side project for myself but a project with real
> world deployment and it needs to be pretty much bullet proof. I am not going
> crazy about beautiful architecture but focus rather on stable, clean and
> hopefully bugfree code. Along with that I am finding smaller bugs in the
> droids-core implementation and thinking about additions and minor changes to
> the API.
>
> I am not sure *all* of this has its place in the droids-core module - in
> the end my requirements are not very generic. But if somebody is interested
> I am open to discussion how my work can help improving droids-core.
>
> Greetings,
> Paul.
>
> P.S.
> just parked my butt over at #droids/freenode. My timezone is CET and I'll
> be checking activity on that channel in the evenings. To wake me up a ping
> on any IM mentioned in the signature will help.
>
>
> Chapuis Bertil wrote:
>
>> IMHO one of the primary requirements is to clean the trunk: for exemple, the
>> work which has been done in the droids-crawler project has to be integrated
>> with the droids-core project. Then making some refactoring and implementing
>> some new features will be much easier.
>>
> --
> paul rogalinski · mailto: [email protected] · msn: [email protected] · aim: pu1s4r · icq: 1177279 · skype: pulsar
