Does anyone have stats on how multiple readers on an optimized Lucene
index in HDFS compare with a ParallelMultiReader (or whatever it's
called) over RPC on a local filesystem?

I'm missing why you would ever want the Lucene index in HDFS for
reading.

Ian

Ning Li <ning.li...@gmail.com> writes:

> I should have pointed out that the Nutch index build and contrib/index
> target different applications. The latter is for applications that
> simply want to build a Lucene index from a set of documents - e.g. no
> link analysis.
>
> As to writing Lucene indexes, both work the same way - write the final
> results to the local file system and then copy them to HDFS. In
> contrib/index, the intermediate results are kept in memory and not
> written to HDFS.
>
> Hope this clarifies things.
>
> Cheers,
> Ning
>
>
> On Mon, Mar 16, 2009 at 2:57 PM, Ian Soboroff <ian.sobor...@nist.gov> wrote:
>>
>> I understand why you would index in the reduce phase, because the anchor
>> text gets shuffled to be next to the document.  However, when you index
>> in the map phase, don't you just have to reindex later?
>>
>> The main point for the OP is that HDFS is a bad FS for writing Lucene
>> indexes because of how Lucene works.  The simple approach is to write
>> your index outside of HDFS in the reduce phase, and then merge the
>> indexes from each reducer manually.
>>
>> Ian
>>
>> Ning Li <ning.li...@gmail.com> writes:
>>
>>> Or you can check out the index contrib. The difference between the two is:
>>>   - In Nutch's indexing map/reduce job, indexes are built in the
>>> reduce phase. Afterwards, they are merged into a smaller number of
>>> shards if necessary. The last time I checked, the merge process does
>>> not use map/reduce.
>>>   - In contrib/index, small indexes are built in the map phase. They
>>> are merged into the desired number of shards in the reduce phase. In
>>> addition, they can be merged into existing shards.
>>>
>>> Cheers,
>>> Ning
>>>
>>>
>>> On Fri, Mar 13, 2009 at 1:34 AM, 王红宝 <imcap...@126.com> wrote:
>>>> You can look at the Nutch code.
>>>>
>>>> 2009/3/13 Mark Kerzner <markkerz...@gmail.com>
>>>>
>>>>> Hi,
>>>>>
>>>>> How do I allow multiple nodes to write to the same index file in HDFS?
>>>>>
>>>>> Thank you,
>>>>> Mark
>>>>>
>>>>
>>
>>
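
The pattern the thread converges on (each task builds its own index on
local disk, then copies the finished result to shared storage, so no two
nodes ever write the same index file) can be sketched roughly as below.
This is only an illustration under loud assumptions: a plain directory
stands in for HDFS, a toy JSON inverted index stands in for a Lucene
index, and every function name and the shard-NNNNN directory layout are
hypothetical, not taken from Nutch or contrib/index.

```python
import json
import shutil
import tempfile
from collections import defaultdict
from pathlib import Path


def build_local_shard(docs, local_dir):
    """Build a toy inverted index (term -> sorted doc ids) on local disk.

    Stand-in for writing a Lucene index to the local file system.
    """
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            postings[term].add(doc_id)
    local_dir.mkdir(parents=True, exist_ok=True)
    index_file = local_dir / "index.json"
    index_file.write_text(
        json.dumps({term: sorted(ids) for term, ids in sorted(postings.items())})
    )
    return index_file


def publish_shard(local_dir, shared_root, shard_id):
    """Copy a finished shard into its own directory on shared storage.

    shared_root is an ordinary directory standing in for HDFS. copytree
    raises if the target already exists, so two tasks cannot silently
    overwrite each other's shard.
    """
    target = shared_root / f"shard-{shard_id:05d}"
    shutil.copytree(local_dir, target)
    return target


def index_partition(docs, shard_id, shared_root):
    """What one task would do: index locally, then publish the result."""
    with tempfile.TemporaryDirectory() as scratch:
        local_dir = Path(scratch) / "index"
        build_local_shard(docs, local_dir)
        return publish_shard(local_dir, shared_root, shard_id)


def load_shard(shard_dir):
    """Read a published toy shard back for searching or merging."""
    return json.loads((shard_dir / "index.json").read_text())
```

Each call to index_partition with a distinct shard_id yields an
independent shard directory; merging shards into fewer, larger ones
would be a separate step, as described in the quoted messages.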
