About your use case: if you treat Droids as a framework and are ready to
get your hands into the code, I think you can solve nearly all the problems
you mentioned.

I personally used Droids more as an ETL tool than as a crawler. To give you
an idea of my main use case, I used it to build a dataset containing all the
interventions [1] of the Swiss parliament as well as the parliamentarians'
votes, which are stored in PDF files [2]. We then used social network
analysis metrics to analyse the collected data. Droids was a very flexible
and easy-to-customize foundation.

[1] -
http://www.parlament.ch/ab/frameset/f/n/4817/345655/f_n_4817_345655_345933.htm
[2] - http://www.parlament.ch/poly/Abstimmung/48/out/vote_48_5042.pdf

On 2 March 2011 19:21, Jeremy Arnold <[email protected]> wrote:

> Hello droids developers,
>
> I wanted to talk a little bit about my use case and hopefully get an
> idea of how far away the code base is from it, as well as where I
> could put in personal time to help get it there.
>
> I'm interested in having a standalone searching/crawling service so I
> can have 1 application hosting the searching and indexing used by many
> different custom apps. I would like to have the option to use Solr or
> elasticsearch for indexing/searching. I've been working with
> elasticsearch for the past few days and am growing very fond of the
> flexibility it provides. I'm also a sucker for JSON over HTTP
> services. I need to be able to start crawls for both a filesystem and
> webpages from within my custom webapp, or have the crawls run at
> scheduled times. Those crawls then need to be indexed. I also need to
> have the ability to integrate my own content handlers so I can specify
> how certain pieces of content are indexed (e.g., custom PDF metadata).
> I also need to be able to easily add or remove items in an index from
> within my custom app, as well as the obvious updating items in an
> index.
>
> How far is the codebase away from being able to be used in the
> scenario described above?
>
> I've spent a lot of time over the past 3 days looking at the droids
> code base. It looks really promising but I'm not sure where it really
> stands overall. I know the elasticsearch piece doesn't exist, and I
> would love to put together that contribution if it seems like an
> acceptable counterpart to the existing droids-solr module. I would
> also like to take on the task of bringing a lot more consistency to
> the code base (e.g., commenting, code consistency). I'm just a bit
> concerned about taking on such a large task and submitting it as a
> patch. I also see a few places where testing would be beneficial and
> do not mind attacking that as well, I just don't want to waste time
> testing things that may be going away in the future.
>
> I'd like to start a discussion about my particular use case and where
> it would be most beneficial for me to get involved, or if there is a
> project out there that is better suited for my use case. I've
> evaluated Nutch but I can't say it's been the best experience and it
> doesn't quite fit into what I am trying to do. It does look like Nutch
> 2 will fit well into my use case but the timeline does not.
>
>
> Thanks,
> Jeremy
>



-- 
Bertil Chapuis
Agimem Sàrl
http://www.agimem.com
