OK, thanks. Can I also remove the segments from HDFS? I don't think they
are used for further crawls, or even during the merge of indexed segments.
That way I could save a lot of space by keeping only one copy of the
segment data.
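
For reference, the copy step I have in mind looks roughly like the sketch
below. The paths, the "indexes" and "segments" directory names, and the
DRY_RUN switch are my own placeholders, not anything Nutch prescribes. By
default it only prints the commands, and it does not delete anything from
HDFS.

```shell
#!/bin/sh
# Sketch only: copy the indexes and segments out of HDFS, then serve them
# from the local file system. All paths below are hypothetical placeholders.
HDFS_CRAWL="${HDFS_CRAWL:-hdfs://localhost:9000/user/nutch/crawl}"
LOCAL_CRAWL="${LOCAL_CRAWL:-/data/nutch/crawl}"
PORT="${PORT:-8100}"

# Print the command when DRY_RUN is 1 (the default), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

copy_and_serve() {
    run mkdir -p "$LOCAL_CRAWL"
    # Only the indexes and the segments are copied; the crawldb stays in HDFS.
    run bin/hadoop fs -copyToLocal "$HDFS_CRAWL/indexes" "$LOCAL_CRAWL/indexes"
    run bin/hadoop fs -copyToLocal "$HDFS_CRAWL/segments" "$LOCAL_CRAWL/segments"
    # Point the search server at the parent directory of the local indexes.
    run bin/nutch server "$PORT" "$LOCAL_CRAWL"
}

copy_and_serve
```

Running it with DRY_RUN=0 from the Nutch home directory would execute the
commands instead of printing them.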


2009/12/14 Dennis Kubes <[email protected]>

> Index and segments are the minimum, yes.  You only need the segments for
> the indexes that you are serving on the local box.
>
> Dennis
>
>
> MilleBii wrote:
>
>> OK, I don't per se need distributed search.
>> I was trying to avoid a copy to the local file system, to save
>> resources by working directly off HDFS.
>>
>> What is the minimum to copy over: index and segments? Not crawldb?
>> All data in segments?
>>
>> 2009/12/13, Dennis Kubes <[email protected]>:
>>
>>> The assumption is wrong.  Distributed search is done from indexes on
>>> local file systems not HDFS.
>>>
>>> It doesn't return because Lucene is trying to search across the indexes
>>> in HDFS in real time, which doesn't work because of network overhead.
>>> Depending on the size of the indexes it may actually return after some
>>> time, but I have seen it time out even for small indexes.
>>>
>>> The short of it is: move the indexes and segments to a local file
>>> system, then point the distributed search server at their parent
>>> directory. Something like this:
>>>
>>> bin/nutch server 8100 /full/path/to/parent/of/local/indexes
>>>
>>> It technically doesn't have to be a full path.  Then point
>>> searcher.dir to a directory containing search-servers.txt, as you have
>>> done. The entries in search-servers.txt can stay as you have them.
>>>
>>> Dennis
>>>
>>> MilleBii wrote:
>>>
>>>> I'm trying to search directly from the index in HDFS, so in
>>>> distributed mode.
>>>>
>>>> What do I have wrong?
>>>>
>>>> created  nutch/conf/search-servers.txt with
>>>>  localhost 8100
>>>>
>>>> pointed  searcher.dir in nutch-site.xml to nutch/conf
>>>>
>>>> tried to start search server with either :
>>>>  + nutch server 8100  crawl
>>>>  + nutch server 8100 hdfs://localhost:9000/user/nutch/crawl
>>>>
>>>> The nutch server command doesn't return to the prompt.
>>>> Is this normal? Should I wait?
>>>>
>>>> And of course, if I try a search, it doesn't work.
>>>>
>>>>
>>
>>


-- 
-MilleBii-
