OK thx. Can I also remove the segments from HDFS? I don't think they are used for further crawls, or even during the merge of indexed segments. That way I could save a lot of space by keeping only one copy of the segments data.
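
For reference, the copy-to-local step discussed in the thread below might look roughly like this. It is only a sketch: the HDFS directory `crawl` with `indexes` and `segments` under it, and the local path `/tmp/nutch-local/crawl`, are assumptions and not anything confirmed in the thread. Only `bin/nutch server 8100 <dir>` comes from Dennis's mail. It is meant to be run from the Nutch install directory, and by default it just prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch: copy the index and segments from HDFS to the local file system,
# then start the search server on the local parent directory.
# All paths are hypothetical; adjust them to your own layout.

HDFS_CRAWL="${HDFS_CRAWL:-crawl}"                     # assumed crawl dir on HDFS
LOCAL_CRAWL="${LOCAL_CRAWL:-/tmp/nutch-local/crawl}"  # assumed local target dir
DRY_RUN="${DRY_RUN:-1}"                               # 1 = only print the commands

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$@"
    else
        "$@"
    fi
}

copy_and_serve() {
    run mkdir -p "$LOCAL_CRAWL"
    # Per the thread, index and segments are the minimum; crawldb is not copied.
    run bin/hadoop fs -copyToLocal "$HDFS_CRAWL/indexes" "$LOCAL_CRAWL/indexes"
    run bin/hadoop fs -copyToLocal "$HDFS_CRAWL/segments" "$LOCAL_CRAWL/segments"
    # Point the search server at the parent directory of the local index.
    run bin/nutch server 8100 "$LOCAL_CRAWL"
}

copy_and_serve
```

Setting `DRY_RUN=0` executes the commands for real. Nothing here deletes the HDFS copy of the segments, since whether that is safe is exactly the open question above.
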

2009/12/14 Dennis Kubes <[email protected]>

> Index and segments is the minimum yes. You only need the segments for the
> indexes that you are serving on the local box.
>
> Dennis
>
>
> MilleBii wrote:
>
>> Ok I don't per say need distributed search.
>> I was trying to avoid a copy to local file system to optimize on
>> ressources working off HDFS
>>
>> What is the minimum to copy over index and segments ? Not crawldb ?
>> All data in segments ?
>>
>> 2009/12/13, Dennis Kubes <[email protected]>:
>>
>>> The assumption is wrong. Distributed search is done from indexes on
>>> local file systems not HDFS.
>>>
>>> It doesn't return because lucene is trying to search across the indexes
>>> in HDFS in real time which doesn't work because of network overhead.
>>> Depending on the size of the indexes it may actually return after some
>>> time but I have seen it timeout even for small indexes.
>>>
>>> Short of it is, move the indexes and segments to a local file system,
>>> then point the distributed search server at their parent directory.
>>> Something like this:
>>>
>>> bin/nutch server 8100 /full/path/to/parent/of/local/indexes
>>>
>>> It technically doesn't have to be a full path. Then point the
>>> searcher.dir to a directory with search-servers.txt as you have done.
>>> The search-servers.txt points like you have it.
>>>
>>> Dennis
>>>
>>> MilleBii wrote:
>>>
>>>> I'm trying to search directly from the index in hdfs so in distributed
>>>> mode
>>>>
>>>> What do I have wrong ?
>>>>
>>>> created nutch/conf/search-servers.txt with
>>>> localhost 8100
>>>>
>>>> pointed search.dir in nutch-site.xml to nutch/conf
>>>>
>>>> tried to start search server with either :
>>>> + nutch server 8100 crawl
>>>> + nutch server 8100 hdfs://localhost:9000/user/nutch/crawl
>>>>
>>>> The nutch server command doesn't return to prompt ???
>>>> Is this normal should I wait ?
>>>>
>>>> And of course if I try a search it doesn't work
>>>>
>>>>
>>
>>

--
-MilleBii-
