Hi everybody, I really need some good advice! I need to index something like 1.4 billion documents in Lucene. I have experience with Lucene, but I've never worked with such a large number of documents. Also, this is just the number of docs at "start-up": the collection is going to grow, and fast.
Needless to say, I need the system to be fast and to support real-time updates to the documents. The first solution that came to my mind was to use ParallelMultiSearcher, splitting the index into many sub-indexes (how many docs per index? 100,000?), but I don't have experience with it and I don't know how well it will scale as the number of documents grows.

A more solid solution seems to be building some kind of integration with Hadoop, but I didn't find much about Lucene and Hadoop integration.

Any ideas? Which direction should I go (pure Lucene or Hadoop)?

Thanks,
Luca