Just to clarify: when I wrote about integrating the droids-crawler, I wasn't
thinking about merging it with droids-core. Right now, the droids-crawler
does not use droids-core at all; an old version of the droids-core code
is copied into the project. So by integrating it, I meant making it depend
on droids-core.

For the rest, I totally agree that the core should be kept as simple as
possible.

br

On Thu, Nov 18, 2010 at 10:36 PM, Paul Rogalinski <[email protected]> wrote:

> HelloWorld,
>
> I'm currently building a crawler based on the droids-core implementation,
> trying not to change anything in the core API / interfaces yet. Due to the
> lack of documentation I was not so eager to dive directly into a lot of
> crawler-code with unclear quality. Perhaps this was a mistake, but on the
> other hand it does currently suit me quite well.
>
> My goal is to have a crawler with a very small footprint to be embedded
> into a Hadoop map/reduce job. So I am not using Spring (IMHO too much
> overhead to initialize when running inside map/reduce), recrawling or even
> multi-threaded crawling. I do plan to spawn a lot of droids, each taking
> care of one domain. Each droid has no need to jump domains or hosts.
> Extracted data will be written into an HBase cluster for further processing.
>
> This is not some hobby side project for myself but a project with real-world
> deployment, and it needs to be pretty much bulletproof. I am not going
> crazy about beautiful architecture but rather focusing on stable, clean and
> hopefully bug-free code. Along with that, I am finding smaller bugs in the
> droids-core implementation and thinking about additions and minor changes to
> the API.
>
> I am not sure *all* of this has its place in the droids-core module - in
> the end my requirements are not very generic. But if somebody is interested,
> I am open to discussing how my work can help improve droids-core.
>
> Greetings,
> Paul.
>
> P.S.
> Just parked my butt over at #droids/freenode. My timezone is CET and I'll
> be checking activity on that channel in the evenings. To wake me up, a ping
> on any IM mentioned in the signature will help.
>
>
> Chapuis Bertil wrote:
>
>> IMHO one of the primary requirements is to clean the trunk: for example,
>> the work which has been done in the droids-crawler project has to be
>> integrated with the droids-core project. Then doing some refactoring and
>> implementing some new features will be much easier.
>>
> --
> paul rogalinski · mailto: [email protected] · msn: [email protected] · aim:
> pu1s4r · icq: 1177279 · skype: pulsar
