On Mon, 2008-09-15 at 22:35 -0400, Ryan McKinley wrote:
> I'm tearing into things, figuring out what I think the API direction
> is/should be... I'm having trouble writing a coherent message, so
> I'll just send this and see how you feel. In general, I think we need
> to make a bigger separation between what is 'core' and what are the
> building blocks for specific use cases.
My answer ATM will be short since I need to finish some work.
First of all many thanks for this concept/architecture overview.
>
> CORE
> ================================
> Fundamentally, Droids is a framework to keep a bunch of Workers
> processing Tasks.
>
And Droids (the robots, be they crawlers or racers).
> "Core" components relate to keeping Workers on Task.
>
> From the existing API, I think the following are "core"
> Queue
> Task
> Worker
> perhaps DelayTimer/Worker
>
> Core should deal with all the threading issues related to managing the
> Tasks. All the ThreadPoolExecutor stuff.
Agree.
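A minimal sketch of what such a core could look like, assuming a plain fixed thread pool. CoreSketch, processAll and the methods on Task and Worker are made up for illustration and are not the current API:

```java
import java.util.Queue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical names throughout; this is not the existing Droids API.
final class CoreSketch {

    /** A unit of work, identified here only by a string id. */
    interface Task {
        String getId();
    }

    /** Processes one task; implementations hold the use-case logic. */
    interface Worker<T extends Task> {
        void execute(T task) throws Exception;
    }

    /**
     * Drains the queue with a fixed thread pool and returns the number of
     * tasks handed to the worker. All threading stays inside this method.
     */
    static <T extends Task> int processAll(Queue<T> queue, Worker<T> worker, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        int submitted = 0;
        T task;
        while ((task = queue.poll()) != null) {
            final T current = task;
            pool.execute(() -> {
                try {
                    worker.execute(current);
                } catch (Exception e) {
                    // A real core would report the failure; the sketch only logs it.
                    System.err.println("Task " + current.getId() + " failed: " + e);
                }
            });
            submitted++;
        }
        pool.shutdown();
        try {
            // Wait for the submitted tasks to finish (30 s is an arbitrary limit).
            pool.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return submitted;
    }
}
```

The sketch only drains tasks that are already queued. A crawler whose workers add new tasks while running would need a loop that keeps polling until the queue is empty and all workers are idle.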
>
> Unless I'm missing something, I don't even see why Droid is an
> interface -- it appears to be the parent container for management
> logic. AbstractDroid introduces some shared logic. Is it just that it
> makes the manager Runnable?
The interface exists so that we can do
Droid droid = getDroid(name);
droid.run();
Every robot needs to implement the interface so that it can be invoked
generically.
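To make the generic invocation concrete, here is a rough sketch. Only Droid, getDroid(name), run() and start(name) come from this thread; the registry class, its register method and the map-based lookup are assumptions for illustration:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; the registry and its wiring are assumptions.
final class DroidRegistrySketch {

    /** The contract every robot implements so callers can start it generically. */
    interface Droid {
        void run();
    }

    private final Map<String, Droid> droids = new LinkedHashMap<>();

    /** Registers a droid under a name (how droids really get wired in is not shown). */
    void register(String name, Droid droid) {
        droids.put(name, droid);
    }

    /** Looks up a droid by name and fails loudly for unknown names. */
    Droid getDroid(String name) {
        Droid droid = droids.get(name);
        if (droid == null) {
            throw new IllegalArgumentException("No droid registered as '" + name + "'");
        }
        return droid;
    }

    /** Starts a droid without knowing its concrete type. */
    void start(String name) {
        getDroid(name).run();
    }
}
```

The caller never needs to know whether the name resolves to a crawler or a racer, which is the whole reason for the interface.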
>
> The javadoc for Droid run() says: "Invoke an instance of the worker
> used in the droid" but the behavior in HelloCrawler is that run()
> initializes everything and starts the workers. Is there a reason this
> needs to happen in its own Runnable instance?
No, if you can move it I would be delighted. Some code still has legacy
stuff in it which allowed us to hammer out a first working prototype but
would never win a clean-coding award. ;)
> It seems the 'core' would focus on things like ThreadPoolExecutor.
>
+1
In addition, one part of core should be dedicated to communication
between worker, droid and core, so that we can create a web interface
that lets you control different droids and see their current workload
and success.
> I don't see any need for the existing Core.java class -- is it just
> there to make Spring configuration easier? This seems like poor
> design since it gives access to everything. In my view, each
> component should only have access to what it needs.
Agreed, some methods need to change their visibility and the core needs
a rewrite. It should be dedicated to the points mentioned above,
perhaps also in light of LABS-144.
>
> Is the existing Core.java just part of the Cli helper app? In
> public void start(String name){
> Droid droid = getDroid(name);
> droid.run();
> }
>
That is one important method of the CLI.
>
>
> COMPONENTS / Blocks? other name?
> =================================
>
> Each Droid implementation would include the 'Core' plus a set of
> components wired together. From the existing API, the things that
> strike me as components are:
>
> Protocol
> URL >> InputStream
> Parsing
> InputStream >> Metadata
Parsing will produce SAX events and metadata.
> Handler? Action?
> Metadata >> something
> (save to solr)
> (write to disk)
LABS-149
Extractor: consumes events and extracts Tasks.
The first example is link extraction, since it is a crucial part of
crawling.
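A minimal sketch of such an extractor, assuming the input is well-formed XHTML and using the JDK's own SAX parser. LinkExtractorSketch and its methods are hypothetical names and this is not the LABS-149 implementation:

```java
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: consumes SAX events and collects outgoing links.
final class LinkExtractorSketch extends DefaultHandler {

    private final List<String> links = new ArrayList<>();

    /** Called by the parser for every opening tag; keeps the href of each <a>. */
    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        if ("a".equalsIgnoreCase(qName)) {
            String href = attributes.getValue("href");
            if (href != null && !href.isEmpty()) {
                links.add(href);
            }
        }
    }

    /** Links seen so far, in document order. */
    List<String> getLinks() {
        return links;
    }

    /** Parses well-formed XHTML and returns the links it contains. */
    static List<String> extract(InputStream xhtml) {
        try {
            // The default factory is not namespace aware, so qName is the plain tag name.
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            LinkExtractorSketch handler = new LinkExtractorSketch();
            parser.parse(xhtml, handler);
            return handler.getLinks();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse input", e);
        }
    }
}
```

In a crawler each returned link would be wrapped in a new Task and put back on the queue. Real-world HTML is rarely well-formed, so a tolerant parser that emits SAX events would have to sit in front of this.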
>
>
> DROIDS
> =========
> We should deliver a few standard use cases where all the plumbing is
> hooked together:
> 1. simple web crawler
- HelloCrawler
> 2. simple filesystem walker
- FileRenameRacer
> 3. IMAP walker
TBN
Cheers Ryan.
salu2
--
Thorsten Scherler thorsten.at.apache.org
Open Source Java consulting, training and solutions