Hello Droids developers, I'd like to describe my use case, get an idea of how far the code base is from supporting it, and find out where I could put in personal time to help get it there.
I'm interested in a standalone searching/crawling service, so that one application hosts the searching and indexing used by many different custom apps. I would like the option to use either Solr or Elasticsearch for indexing/searching. I've been working with Elasticsearch for the past few days and am growing very fond of the flexibility it provides. I'm also a sucker for JSON over HTTP services.
I need to be able to start crawls of both a filesystem and web pages from within my custom webapp, or have the crawls run at scheduled times. Those crawls then need to be indexed. I also need to integrate my own content handlers so I can specify how certain pieces of content are indexed (e.g., custom PDF metadata). Finally, I need to be able to easily add, remove, and update items in an index from within my custom app.
How far is the code base from being usable in the scenario described above? I've spent a lot of time over the past 3 days looking at the Droids code base. It looks really promising, but I'm not sure where it stands overall. I know the Elasticsearch piece doesn't exist, and I would love to put together that contribution if it seems like an acceptable counterpart to the existing droids-solr module.
I would also like to take on the task of bringing a lot more consistency to the code base (e.g., commenting, code style). I'm just a bit concerned about taking on such a large task and submitting it as a single patch. I also see a few places where tests would be beneficial and do not mind tackling those as well; I just don't want to waste time testing things that may be going away in the future.
I'd like to start a discussion about my particular use case and where it would be most beneficial for me to get involved, or whether there is a project out there that is better suited to it.
I've evaluated Nutch, but it hasn't been the best experience and it doesn't quite fit what I am trying to do. Nutch 2 looks like it will fit my use case well, but its timeline does not work for me.
Thanks,
Jeremy
