Another quick update:

I ran Luke on the index: part-00000 opens fine, whereas part-00001 comes
up as corrupt or missing. From the file listings of both directories
(quoted below), we know that there is nothing in part-00001. So why does it
get generated? And if it does, why does dedup not handle it gracefully?

I also ran a merge on the two indexes, and it worked fine. 

So that puts to rest the idea that both indexes are corrupted. My guess now
is that, since I only had two pages indexed and the index was small,
part-00001 came out empty. Is it that dedup simply does not handle an empty
part?
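
One way to check this guess on other crawls would be to scan the indexes
directory for part directories that hold nothing but bookkeeping files. Here
is a minimal sketch in Python, assuming a local copy of the indexes directory
(not DFS). The function name find_empty_parts and the rule "only segments,
deletable and index.done means empty" are my own assumptions based on the
listings quoted below, not anything Nutch or Lucene defines.

```python
import os

# Files that, judging by the listings in this thread, do not hold indexed
# documents themselves. This set is an assumption, not a Lucene definition.
BOOKKEEPING_FILES = {"segments", "deletable", "index.done"}


def find_empty_parts(indexes_dir):
    """Return the sorted names of part-* directories with no segment data files."""
    empty = []
    for name in sorted(os.listdir(indexes_dir)):
        path = os.path.join(indexes_dir, name)
        # Only look at part-NNNNN subdirectories; skip anything else.
        if not (name.startswith("part-") and os.path.isdir(path)):
            continue
        # Anything besides the bookkeeping files counts as segment data.
        data_files = [f for f in os.listdir(path) if f not in BOOKKEEPING_FILES]
        if not data_files:
            empty.append(name)
    return empty
```

On a local directory laid out like the listing quoted below, this should
report part-00001 and leave part-00000 alone. It only looks at file names, so
it cannot tell whether a non-empty part is actually readable.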

Any thoughts?



-- Hetal Shah wrote: --

That's what I had read in another post as well, but I can't understand how
it can be corrupted! It's not even a massive index, just a couple of URLs.
Every step I followed was per the tutorials on the wiki page.

Here's the list under /indexes:

drwxr-xr-x  2 root root 4096 Jan 31 16:21 part-00000
drwxr-xr-x  2 root root 4096 Jan 31 16:21 part-00001

This is what's under part-00000:

-rw-r--r--  1 root root    2 Jan 31 16:21 _2.f0
-rw-r--r--  1 root root    2 Jan 31 16:21 _2.f1
-rw-r--r--  1 root root    2 Jan 31 16:21 _2.f2
-rw-r--r--  1 root root    2 Jan 31 16:21 _2.f3
-rw-r--r--  1 root root    2 Jan 31 16:21 _2.f4
-rw-r--r--  1 root root    2 Jan 31 16:21 _2.f5
-rw-r--r--  1 root root  399 Jan 31 16:21 _2.fdt
-rw-r--r--  1 root root   16 Jan 31 16:21 _2.fdx
-rw-r--r--  1 root root   74 Jan 31 16:21 _2.fnm
-rw-r--r--  1 root root  945 Jan 31 16:21 _2.frq
-rw-r--r--  1 root root 1790 Jan 31 16:21 _2.prx
-rw-r--r--  1 root root  105 Jan 31 16:21 _2.tii
-rw-r--r--  1 root root 6850 Jan 31 16:21 _2.tis
-rw-r--r--  1 root root    4 Jan 31 16:21 deletable
-rw-r--r--  1 root root    0 Jan 31 16:21 index.done
-rw-r--r--  1 root root   27 Jan 31 16:21 segments

This is what's under part-00001:

-rw-r--r--  1 root root  0 Jan 31 16:21 index.done
-rw-r--r--  1 root root 20 Jan 31 16:21 segments
 
By the way, I should also mention that I am running dedup on DFS. I haven't
tried running it on the local filesystem yet, but does that matter?

Thanks for your help.




_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general