--
Thorsten Scherler <thorsten.at.apache.org>
codeBusters S.L. - web based systems
<consulting, training and solutions>
http://www.codebusters.es/
--- Begin Message ---
On Tue, 2011-02-08 at 11:23 +0100, florent andré wrote:
> Hi Droids guys!
>
> +1 for a release as well.
>
> ** What I have in my bag:
>
> - Interactive droid: no list of links is needed to start a crawl; links
> can be ordered one by one (this implies refactoring a droids class whose
> name I don't recall right now)
>
> - SAX output: a SAX consumer can be passed in for output (not a clean
> integration though; the pros and cons versus StAX still have to be weighed)
>
> - XML format to parameterize a droid and pass it a to-do list [1]: this
> implementation is tied to Lenya for now, but can easily be extracted.
> I think it could be a nice feature for a droids server and for
> communication between droids entities.
>
> ++
>
Actually nearly all of the above can be done with, or is already supported
by, droids.
I am not sure I get the interactive point, but the SAX output
is done via a cocoon3 integration. You can find it in the archives.
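
As a rough illustration of what "passing a SAX consumer for output" could
look like (this is not the cocoon3 integration from the archives, only a
minimal sketch): the SaxLinkReporter class and the links/link element names
below are made up for the example; only the org.xml.sax calls are real.

```java
import java.util.List;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

/** Hypothetical helper, not part of Droids: streams links to a SAX consumer. */
final class SaxLinkReporter {
    private SaxLinkReporter() {}

    /** Emits <links><link>url</link>...</links> as SAX events (element names are assumptions). */
    static void report(List<String> urls, ContentHandler consumer) throws SAXException {
        AttributesImpl noAttrs = new AttributesImpl();
        consumer.startDocument();
        consumer.startElement("", "links", "links", noAttrs);
        for (String url : urls) {
            consumer.startElement("", "link", "link", noAttrs);
            char[] text = url.toCharArray();
            consumer.characters(text, 0, text.length);
            consumer.endElement("", "link", "link");
        }
        consumer.endElement("", "links", "links");
        consumer.endDocument();
    }
}
```

Any ContentHandler works as the consumer here, for example a serializer or
the first step of a transformation pipeline.
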
Regarding the XML format, you can actually use Spring to do such things,
and the spring module has examples of how to wire components.
However, knowing where you are coming from (Florent and I work together on
Lenya), we can always do an XSL transform from your semantics to Spring
beans. ;)
Although I do not see it as priority no. 1, I second the move to reduce the
complexity of droids by making it configurable via a simple file such as the
one Florent describes.
salu2
>
> [1]: here is an example file. This namespace can be embedded in another
> one, so that the result of the crawl is included in your original file.
> <?xml version="1.0" encoding="UTF-8"?>
> <robot xmlns="http://droids.apache.org/droids/0.2">
>   <!-- parametrize the droids -->
>   <params>
>     <!-- TODO : test this configuration, -->
>     <delay>10</delay>
>     <!-- TODO : use the source resolver in code to get the file, and be
>          able to use fallback -->
>     <filters>
>       <resource>fallback://lenya/modules/droidsTransformer/samples/regex-urlfilter-null.txt</resource>
>     </filters>
>   </params>
>
>   <!-- indicate locations -->
>   <locations>
>     <location>http://www.zegoal.com/foot/france-ligue1/</location>
>     <location>http://localhost:8080/lenya/index.html</location>
>   </locations>
>
> </robot>
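
To give an idea of how little code a file like this needs on the reading
side, here is a minimal sketch using only the JDK DOM parser. RobotConfig is
a hypothetical holder class, not something that exists in droids. The unit of
<delay> is not stated above, so the value is kept as a plain number, and the
default of 0 for a missing <delay> is an assumption.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

/** Hypothetical holder for the settings of a <robot> file; not a Droids class. */
final class RobotConfig {
    static final String NS = "http://droids.apache.org/droids/0.2";

    final long delay;             // value of <delay>; unit not specified in the thread
    final List<String> filters;   // text of every <resource> element
    final List<String> locations; // text of every <location> element

    private RobotConfig(long delay, List<String> filters, List<String> locations) {
        this.delay = delay;
        this.filters = filters;
        this.locations = locations;
    }

    /** Parses the XML text of a robot file into a RobotConfig. */
    static RobotConfig parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // needed to match elements by namespace
        Document doc = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        List<String> delays = texts(doc, "delay");
        // Assumption: a missing <delay> means no delay at all.
        long delay = delays.isEmpty() ? 0L : Long.parseLong(delays.get(0));
        return new RobotConfig(delay, texts(doc, "resource"), texts(doc, "location"));
    }

    /** Collects the trimmed text of all elements with the given local name in NS. */
    private static List<String> texts(Document doc, String localName) {
        NodeList nodes = doc.getElementsByTagNameNS(NS, localName);
        List<String> out = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            out.add(nodes.item(i).getTextContent().trim());
        }
        return out;
    }
}
```

Because the lookup is by namespace, the same code would still find the
elements when the <robot> block is embedded in another document.
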
>
>
> On 02/08/2011 10:52 AM, Chapuis Bertil wrote:
> > I agree. We have to release.
> >
> > The changes I'd like to contribute back are the following.
> >
> > - TaskQueue replaced by java.util.Queue
> > - Handling process reviewed.
> > - Factory pattern only used for Worker
> > - Extractors inherit from Handler => no need to parse the document twice
> > - Entity renamed to Identifier
> > - ContentEntity renamed to Resource
> > - Crawler moved to droids-crawler
> > - Parser moved to droids-parser
> > - Walker moved to droids-walker
> > - The walker also uses an Extractor
> > - Some minor changes
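
For readers who have not seen the local copy, the java.util.Queue point can
be pictured roughly as follows. This is only a sketch under my own
assumptions, not Bertil's code: QueueCrawl and the extractLinks function are
hypothetical stand-ins for the worker and the extractor.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Queue;
import java.util.Set;
import java.util.function.Function;

/** Hypothetical crawl loop driven by a plain java.util.Queue; not Droids code. */
final class QueueCrawl {
    private QueueCrawl() {}

    /**
     * Polls URLs from the queue until it is empty, offering newly found links
     * back to the same queue. Returns the URLs in the order they were handled.
     */
    static List<String> crawl(Queue<String> queue, Function<String, List<String>> extractLinks) {
        Set<String> seen = new HashSet<>();
        List<String> visited = new ArrayList<>();
        String url;
        while ((url = queue.poll()) != null) {
            if (!seen.add(url)) {
                continue; // already handled, skip duplicates
            }
            visited.add(url);
            for (String link : extractLinks.apply(url)) {
                if (!seen.contains(link)) {
                    queue.offer(link);
                }
            }
        }
        return visited;
    }
}
```

Since the loop only depends on the Queue interface, the caller can pick the
implementation, and anything else holding a reference to the queue can add
links one by one while the crawl is running (given a thread-safe queue).
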
> >
> >
> >
> > On 8 February 2011 10:32, Thorsten Scherler wrote:
> >
> >> On Tue, 2011-02-08 at 09:50 +0100, Chapuis Bertil wrote:
> >>> In previous emails and JIRA comments I saw several people mentioning
> >>> that they have a local copy of droids which has evolved too much to be
> >>> merged back into the trunk. This is my case, and I think Paul Rogalinski
> >>> is in the same situation.
> >>>
> >>> Since patches have only been applied periodically to the trunk during
> >>> the last months, I'd love to know if someone else is in the same
> >>> situation and what kind of changes they made locally.
> >>
> >> I am not sure, but I see it the way you describe.
> >>
> >> IMO we should release what we have right now and then plan how to merge
> >> all these different versions back into a new droids version.
> >>
> >> IMO the next droids version should focus on ease of reuse and a droid
> >> server which starts and monitors the different droids.
> >>
> >> To start with:
> >> * who has a version of droids which (s)he is interested in merging back
> >> * what are the main differences between the fork and the "original"
> >> * ...
> >>
> >> salu2
> >
--- End Message ---