Does anyone have stats on how multiple readers on an optimized Lucene index in HDFS compare with a ParallelMultiReader (or whatever it's called) over RPC on a local filesystem?
I'm missing why you would ever want the Lucene index in HDFS for reading.

Ian

Ning Li <ning.li...@gmail.com> writes:

> I should have pointed out that Nutch index build and contrib/index
> targets different applications. The latter is for applications who
> simply want to build Lucene index from a set of documents - e.g. no
> link analysis.
>
> As to writing Lucene indexes, both work the same way - write the final
> results to local file system and then copy to HDFS. In contrib/index,
> the intermediate results are in memory and not written to HDFS.
>
> Hope it clarifies things.
>
> Cheers,
> Ning
>
>
> On Mon, Mar 16, 2009 at 2:57 PM, Ian Soboroff <ian.sobor...@nist.gov> wrote:
>>
>> I understand why you would index in the reduce phase, because the anchor
>> text gets shuffled to be next to the document. However, when you index
>> in the map phase, don't you just have to reindex later?
>>
>> The main point to the OP is that HDFS is a bad FS for writing Lucene
>> indexes because of how Lucene works. The simple approach is to write
>> your index outside of HDFS in the reduce phase, and then merge the
>> indexes from each reducer manually.
>>
>> Ian
>>
>> Ning Li <ning.li...@gmail.com> writes:
>>
>>> Or you can check out the index contrib. The difference of the two is that:
>>> - In Nutch's indexing map/reduce job, indexes are built in the
>>> reduce phase. Afterwards, they are merged into smaller number of
>>> shards if necessary. The last time I checked, the merge process does
>>> not use map/reduce.
>>> - In contrib/index, small indexes are built in the map phase. They
>>> are merged into the desired number of shards in the reduce phase. In
>>> addition, they can be merged into existing shards.
>>>
>>> Cheers,
>>> Ning
>>>
>>>
>>> On Fri, Mar 13, 2009 at 1:34 AM, 王红宝 <imcap...@126.com> wrote:
>>>> you can see the nutch code.
>>>>
>>>> 2009/3/13 Mark Kerzner <markkerz...@gmail.com>
>>>>
>>>>> Hi,
>>>>>
>>>>> How do I allow multiple nodes to write to the same index file in HDFS?
>>>>>
>>>>> Thank you,
>>>>> Mark
>>>>>
>>>>
>>
>>