The merged index directory should not be in the segments directory. The layout should be something like:

  ./index
  ./segments
  ./segments/20040429050904
  ./segments/20040430103038

I think moving your index up one level will fix things. We should make this problem easier to diagnose!
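
For anyone hitting the same thing, the move can be sketched in shell roughly as below. This is only an illustration: `move_merged_index` is a made-up helper name, not a Nutch command, and the demo runs against a throwaway temporary directory that mimics the layout in this thread rather than a real crawl.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nutch): if a merged index was created
# inside segments/, move it up one level so it sits next to segments/.
move_merged_index() {
    root=$1
    # Only move when the misplaced index exists and nothing is in the way.
    if [ -d "$root/segments/index" ] && [ ! -e "$root/index" ]; then
        mv "$root/segments/index" "$root/index"
    fi
}

# Demonstration on a scratch directory shaped like the listing in this thread.
demo=$(mktemp -d)
mkdir -p "$demo/segments/index" \
         "$demo/segments/20040429050904" \
         "$demo/segments/20040430103038"

move_merged_index "$demo"

# List the top level and the segments directory after the move.
ls "$demo"
ls "$demo/segments"
```

On a real installation the argument would be the directory the server is started from (/home/nutch in the report below), and the server would need restarting afterwards.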

Doug

Byron Miller wrote:
I basically did 4 crawls of equal URLs, put segments 1 & 2 on server 1
and segments 3 & 4 on server 2, each with a merged index.

This is basically what I have on each server:

[EMAIL PROTECTED] nutch]$ pwd
/home/nutch
[EMAIL PROTECTED] nutch]$ cd segments/
[EMAIL PROTECTED] segments]$ ls
20040429050904 20040430103038 index
[EMAIL PROTECTED] segments]$ ls 20040429050904/
fetcher fetcher_content fetcher.done fetcher_text fetchlist index index.done
[EMAIL PROTECTED] segments]$



I then just start the server:

bin/nutch server 6969 segments/

And it says:

[EMAIL PROTECTED] nutch]$ bin/nutch server 6969 segments/
040504 190945 10 opening merged index in /home/nutch/segments/index
040504 190945 11 Server listener on port 6969: starting
040504 190945 12 Server handler on 6969: starting
040504 190945 13 Server handler on 6969: starting
040504 190945 14 Server handler on 6969: starting
040504 190945 15 Server handler on 6969: starting
040504 190945 17 Server handler on 6969: starting
040504 190945 16 Server handler on 6969: starting
040504 190945 18 Server handler on 6969: starting
040504 190945 19 Server handler on 6969: starting
040504 190945 21 Server handler on 6969: starting
040504 190945 20 Server handler on 6969: starting

I may just kill the merged index and try and see if it finds the
segments' individual indexes.

-byron


--- Doug Cutting <[EMAIL PROTECTED]> wrote:


Have you perhaps (re)moved a segment directory after it was indexed, or
somehow not kept the segments with the index? From the backtrace below,
it looks like a hit's segment directory does not exist on a server. The
segment name is indexed with each document, so that, after indexes are
merged, each still knows the name of the directory that contains its
summary and cache data. (I don't think you included all of the server
log data. There should be a line before the "starting" message
indicating where it is finding the index.)


Each search server should be started in a directory with a subdirectory
named 'segments' containing all segments that the server is to search,
complete with 'fetcher', 'fetcher_content' and 'fetcher_text'
directories, and either:

  1. a subdirectory named 'index' containing the merged index; or
  2. an 'index' directory in each segment.

If both exist, the merged index is used.

(In fact, you don't really need to keep things quite
so coordinated. All that's really required is that some server has a
segment directory for every indexed document.)
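
The layout described above could be checked before starting a server with something like the following. This is a rough sketch, not an existing Nutch tool: `check_layout` is a hypothetical helper, and it only tests that the directories named above exist, not that their contents are valid. The two scratch directories it builds are stand-ins for real crawl data.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nutch): report deviations from the
# layout described above and return non-zero if any are found.
check_layout() {
    root=$1
    status=0
    merged=no
    if [ -d "$root/index" ]; then merged=yes; fi

    for seg in "$root"/segments/*; do
        if [ ! -d "$seg" ]; then continue; fi
        # A directory named 'index' inside segments/ is a misplaced merged index.
        if [ "$(basename "$seg")" = index ]; then
            echo "misplaced: $seg (a merged index belongs in $root/index)"
            status=1
            continue
        fi
        # Each segment needs its fetcher output for summaries and cached pages.
        for sub in fetcher fetcher_content fetcher_text; do
            if [ ! -d "$seg/$sub" ]; then
                echo "missing: $seg/$sub"
                status=1
            fi
        done
        # Without a merged index, every segment must carry its own index.
        if [ "$merged" = no ] && [ ! -d "$seg/index" ]; then
            echo "missing: $seg/index (and no merged index found)"
            status=1
        fi
    done
    return "$status"
}

# Scratch layout that follows the description: merged index beside segments/.
good=$(mktemp -d)
for s in 20040429050904 20040430103038; do
    mkdir -p "$good/segments/$s/fetcher" \
             "$good/segments/$s/fetcher_content" \
             "$good/segments/$s/fetcher_text"
done
mkdir "$good/index"
check_layout "$good" && echo "layout ok"

# Scratch layout with the merged index inside segments/, as reported in this thread.
bad=$(mktemp -d)
mkdir -p "$bad/segments/index" \
         "$bad/segments/20040429050904/fetcher" \
         "$bad/segments/20040429050904/fetcher_content" \
         "$bad/segments/20040429050904/fetcher_text" \
         "$bad/segments/20040429050904/index"
check_layout "$bad" || echo "layout needs fixing"
```

The check is deliberately stricter than necessary: as noted above, all that is really required is that some server has a segment directory for every indexed document.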


Doug


Byron Miller wrote:


java.lang.NullPointerException
        at java.util.Hashtable.get(Hashtable.java:333)
        at net.nutch.ipc.Client.getConnection(Client.java:273)
        at net.nutch.ipc.Client.call(Client.java:248)
        at net.nutch.searcher.DistributedSearch$Client.getSummary(DistributedSearch.java:389)
        at net.nutch.searcher.NutchBean.getSummary(NutchBean.java:119)



-------------------------------------------------------
This SF.Net email is sponsored by: Oracle 10g
Get certified on the hottest thing ever to hit the market... Oracle 10g. Take an Oracle 10g class now, and we'll give you the exam FREE.
http://ads.osdn.com/?ad_id=3149&alloc_id=8166&op=click
_______________________________________________
Nutch-developers mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/nutch-developers



