more droids questions...

Ryan McKinley Wed, 03 Sep 2008 05:57:00 -0700

I'm finally able to spend a little time messing with Droids -- I'm comparing it to Aperture (http://aperture.sourceforge.net/) to figure out what the best path is.

The part that Aperture does nicely is that you define a crawler for anything: web, file system, ical, imap etc, then index based on that. The problem is that RDF is deeply baked into the system and I don't see any *good* ways to extend it / scale it.

Droids looks promising, but like Nutch, it seems to assume web/text crawling.


With Droids, how would you make a file system crawler?
Extend Protocol with file://?

Currently Parser->Parse->ParseData->Outlink[] defines the next items to crawl. For non-web crawling, what is the proposed model?
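
To make the file system question concrete, here is a rough sketch of one possible model: treat a directory's children as its "outlinks", so the same queue-driven loop used for web pages also walks a file tree. Every name here (FileSystemCrawlSketch, outlinks, crawl) is hypothetical and none of it is Droids API; it uses only java.nio.file.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Stream;

// Hypothetical sketch: these names are illustrative, not part of Droids.
final class FileSystemCrawlSketch {

    // For a file:// URI, a directory's "outlinks" are its children;
    // anything that is not a directory has none.
    static List<URI> outlinks(URI uri) throws IOException {
        Path path = Paths.get(uri);
        // NOFOLLOW_LINKS: symlinks are not descended into, which avoids link cycles.
        if (!Files.isDirectory(path, LinkOption.NOFOLLOW_LINKS)) {
            return Collections.emptyList();
        }
        List<URI> links = new ArrayList<>();
        try (Stream<Path> children = Files.list(path)) {
            children.forEach(child -> links.add(child.toUri()));
        }
        return links;
    }

    // Breadth-first crawl from a root URI; returns the regular files visited.
    static List<URI> crawl(URI root) throws IOException {
        Deque<URI> queue = new ArrayDeque<>();
        Set<URI> seen = new HashSet<>();
        List<URI> files = new ArrayList<>();
        queue.add(root);
        seen.add(root);
        while (!queue.isEmpty()) {
            URI current = queue.poll();
            if (Files.isRegularFile(Paths.get(current), LinkOption.NOFOLLOW_LINKS)) {
                files.add(current); // a real crawler would parse/index here
            }
            for (URI next : outlinks(current)) {
                if (seen.add(next)) {
                    queue.add(next);
                }
            }
        }
        return files;
    }
}
```

Calling FileSystemCrawlSketch.crawl(Paths.get("/some/dir").toUri()) would return the URIs of the regular files under that directory. The point is only that "next items to crawl" does not have to mean links extracted from text.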

Also, it seems that Parse.java assumes you are only working with text. How would you crawl a directory of images and index the EXIF tags? Even considering parsing a word document (and extracting links) -- it seems a shame that the Parse interface has to reduce everything to setText( txt ).
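
As an illustration of what a less text-centric result might look like, here is a minimal sketch in which text is optional and arbitrary named fields (EXIF tags, for example) travel alongside the outlinks. ParsedItem, metadataOnly, and the field names are all made up for this example; this is not the Droids Parse interface, and no real EXIF library is used.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: a parse result that is not reduced to a single text blob.
final class ParsedItem {
    private final String text;                 // may be null, e.g. for an image
    private final Map<String, String> fields;  // e.g. EXIF tags, document properties
    private final List<URI> outlinks;          // next items to crawl, possibly none

    ParsedItem(String text, Map<String, String> fields, List<URI> outlinks) {
        this.text = text;
        // Defensive copies so the result is immutable once built.
        this.fields = Collections.unmodifiableMap(new LinkedHashMap<>(fields));
        this.outlinks = Collections.unmodifiableList(new ArrayList<>(outlinks));
    }

    // Convenience factory for items with metadata but no text and no links.
    static ParsedItem metadataOnly(Map<String, String> fields) {
        return new ParsedItem(null, fields, Collections.<URI>emptyList());
    }

    Optional<String> text() {
        return Optional.ofNullable(text);
    }

    Map<String, String> fields() {
        return fields;
    }

    List<URI> outlinks() {
        return outlinks;
    }
}
```

With something of this shape, an image parser could return ParsedItem.metadataOnly(tags), while a word document parser could fill in text, fields, and outlinks together.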

Within the DefaultWorker it looks like each URI is opened twice: first in getParse() then again in handle( Parse ). Something about that feels wrong.
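
One way to avoid the double open, sketched under the assumption that the content fits in memory: read the source once into a byte buffer, then hand each step its own stream over that buffer. FetchOnceSketch, Source, Step, and process are hypothetical names for illustration and do not reflect how DefaultWorker is actually written.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch: open the source once and replay the bytes for each step.
final class FetchOnceSketch {

    // Something that can be opened, e.g. a URL or a file (assumed abstraction).
    interface Source {
        InputStream open() throws IOException;
    }

    // A processing step such as "parse" or "handle" (assumed abstraction).
    interface Step {
        void accept(InputStream in) throws IOException;
    }

    // Drains the stream into a byte array.
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }

    // Opens the source exactly once, then gives each step a fresh stream
    // over the buffered content.
    static void process(Source source, Step parser, Step handler) throws IOException {
        byte[] content;
        try (InputStream in = source.open()) {
            content = readAll(in);
        }
        parser.accept(new ByteArrayInputStream(content));
        handler.accept(new ByteArrayInputStream(content));
    }
}
```

Buffering everything in memory is only reasonable for small items; for large files a temporary file, or a single pass that parses and handles at the same time, would probably be a better fit.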


thanks
ryan
