Hey list,

On Fri, Jul 17, 2009 at 16:55, Andrzej Bialecki <a...@getopt.org> wrote:
> Hi all,
>
> I think we should be creating a sandbox area, where we can collaborate
> on various subprojects, such as HBase, OSGI, Tika parsers, etc. Dogacan will
> be importing his HBase work as 'nutchbase'. Tika work is the least
> disruptive, so it could occur even on trunk. OSGI plugins work (which I'd
> like to tackle) means significant refactoring so I'd rather put this on a
> branch too.
>
Thanks for starting the discussion, Andrzej.

Can you detail your OSGI plugin framework design? Maybe I missed the
discussion, but updating the plugin system is something I have wanted to
do for a long time :) so I am very much interested in your design.

> Dogacan, you mentioned that you would like to work on Katta integration.
> Could you shed some light on how this fits with the abstract indexing &
> searching layer that we now have, and how distributed Solr fits into this
> picture?
>

I haven't yet given much thought to Katta integration. Basically, I am
thinking of indexing newly-crawled documents as Lucene shards and
uploading them to Katta for searching. This should be very possible with
the new indexing system. But so far I have neither studied Katta closely
nor thought much about the integration, so I may be missing obvious
stuff.

About distributed Solr: I would very much like to do this and, again, I
think it should be possible to do within Nutch. However, distributed Solr
is ultimately uninteresting to me because (AFAIK) it doesn't have the
reliability and high availability that Hadoop and HBase have, i.e. if a
machine dies you lose that part of the index.

Are there any projects going on that are live indexing systems like Solr,
yet are backed by Hadoop HDFS like Katta?

> --
> Best regards,
> Andrzej Bialecki <><
> ___. ___ ___ ___ _ _ __________________________________
> [__ || __|__/|__||\/| Information Retrieval, Semantic Web
> ___|||__|| \| || | Embedded Unix, System Integration
> http://www.sigram.com Contact: info at sigram dot com
>
>

--
Doğacan Güney