OK, I don't per se need distributed search.
I was trying to avoid a copy to the local file system, to save
resources by working off HDFS.

What is the minimum to copy over? The index and segments, but not the
crawldb? All of the data in the segments?
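
If a local copy really is unavoidable, I imagine pulling down whatever
the minimal set turns out to be would look something like this (the
HDFS paths match my crawl below; the local target /local/search is
just a placeholder on my side):

 hadoop fs -copyToLocal hdfs://localhost:9000/user/nutch/crawl/index /local/search/index
 hadoop fs -copyToLocal hdfs://localhost:9000/user/nutch/crawl/segments /local/search/segments

and then I would start the server against the parent directory:

 bin/nutch server 8100 /local/search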

2009/12/13, Dennis Kubes <[email protected]>:
> The assumption is wrong.  Distributed search is done from indexes on
> local file systems, not HDFS.
>
> It doesn't return because Lucene is trying to search across the indexes
> in HDFS in real time, which doesn't work because of the network overhead.
> Depending on the size of the indexes it may actually return after some
> time, but I have seen it time out even for small indexes.
>
> The short of it is: move the indexes and segments to a local file
> system, then point the distributed search server at their parent
> directory.
> Something like this:
>
> bin/nutch server 8100 /full/path/to/parent/of/local/indexes
>
> It technically doesn't have to be a full path.  Then point the
> searcher.dir property to a directory containing a search-servers.txt
> file, as you have done.  The entries in search-servers.txt point to the
> search servers, exactly as you have them.
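>
> For illustration, in nutch-site.xml (the value is just an example path):
>
>   <property>
>     <name>searcher.dir</name>
>     <value>/path/to/nutch/conf</value>
>   </property>
>
> and in that directory, a search-servers.txt with one "host port" pair
> per line:
>
>   localhost 8100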
>
> Dennis
>
> MilleBii wrote:
>> I'm trying to search directly from the index in HDFS, i.e. in
>> distributed mode.
>>
>> What do I have wrong?
>>
>> created  nutch/conf/search-servers.txt with
>>  localhost 8100
>>
>> pointed searcher.dir in nutch-site.xml to nutch/conf
>>
>> tried to start the search server with either:
>>  + nutch server 8100 crawl
>>  + nutch server 8100 hdfs://localhost:9000/user/nutch/crawl
>>
>> The nutch server command doesn't return to the prompt.
>> Is this normal?  Should I wait?
>>
>> And of course, if I try a search, it doesn't work.
>>
>


-- 
-MilleBii-
