That's what I had read in another post as well, but I can't understand how the index could be corrupted! It's not even a massive index, just a couple of URLs. Every step that I followed was per the tutorials on the wiki page.
Here's the list under /indexes:

drwxr-xr-x 2 root root 4096 Jan 31 16:21 part-00000
drwxr-xr-x 2 root root 4096 Jan 31 16:21 part-00001

This is what's under part-00000:

-rw-r--r-- 1 root root    2 Jan 31 16:21 _2.f0
-rw-r--r-- 1 root root    2 Jan 31 16:21 _2.f1
-rw-r--r-- 1 root root    2 Jan 31 16:21 _2.f2
-rw-r--r-- 1 root root    2 Jan 31 16:21 _2.f3
-rw-r--r-- 1 root root    2 Jan 31 16:21 _2.f4
-rw-r--r-- 1 root root    2 Jan 31 16:21 _2.f5
-rw-r--r-- 1 root root  399 Jan 31 16:21 _2.fdt
-rw-r--r-- 1 root root   16 Jan 31 16:21 _2.fdx
-rw-r--r-- 1 root root   74 Jan 31 16:21 _2.fnm
-rw-r--r-- 1 root root  945 Jan 31 16:21 _2.frq
-rw-r--r-- 1 root root 1790 Jan 31 16:21 _2.prx
-rw-r--r-- 1 root root  105 Jan 31 16:21 _2.tii
-rw-r--r-- 1 root root 6850 Jan 31 16:21 _2.tis
-rw-r--r-- 1 root root    4 Jan 31 16:21 deletable
-rw-r--r-- 1 root root    0 Jan 31 16:21 index.done
-rw-r--r-- 1 root root   27 Jan 31 16:21 segments

This is what's under part-00001:

-rw-r--r-- 1 root root    0 Jan 31 16:21 index.done
-rw-r--r-- 1 root root   20 Jan 31 16:21 segments

By the way, I should also mention that I am running dedup on the DFS. I haven't tried running it on the local file system yet, but does that matter?

Thanks for your help.

Hetal Shah wrote:
> Hey guys
>
> Been breaking my head over this error for a while now, but don't seem
> to be getting anywhere! I have tried creating / recreating the index
> several times, and also made sure that all settings were as "per the
> book". I read somewhere on one of the other posts that this error
> could be due to a corrupted index, but somehow, I don't think that's
> the case. I only have a few urls in the index with depth 1, so it's not even a large crawl!
>
> There are two directories in my crawled/indexes directory, viz.
> part-00000 and part-00001.
>

Could you do an 'ls -l' to show the content and sizes of these parts?
>
> Task TASKID="tip_0009_m_000001" TASK_TYPE="MAP" TASK_STATUS="FAILED"
> FINISH_TIME="1170237489795" ERROR="java.lang.ArrayIndexOutOfBoundsException:
> -1
> at
> org.apache.lucene.index.MultiReader.isDeleted(MultiReader.java:109)
>

This usually indicates that one or more indexes under crawled/indexes is invalid - nonexistent, incomplete or corrupt.

--
Best regards,
Andrzej Bialecki
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com
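
A quick way to act on the "nonexistent, incomplete or corrupt" hint is to scan each part directory and flag the ones that look suspicious. The sketch below is only a heuristic and rests on assumptions: it expects the indexes directory to be available on the local file system (not on the DFS), it treats `segments`, `deletable` and `index.done` as bookkeeping files based purely on the listing in this thread, and the function names `classify_part` and `scan_indexes` are made up for illustration. It does not open the index with Lucene, so it cannot prove that a part is valid, and a part it reports as "empty" is not confirmed to be the cause of the exception.

```python
import os

# Bookkeeping file names, taken from the directory listing in this thread (an assumption,
# not an exhaustive list of Lucene's non-data files).
BOOKKEEPING = {"segments", "deletable", "index.done"}


def classify_part(part_dir):
    """Heuristically classify one part directory.

    Returns one of:
      "missing"    - the directory does not exist
      "incomplete" - there is no 'segments' file at all
      "empty"      - 'segments' exists but no segment data files sit next to it
      "ok"         - 'segments' plus at least one other data file is present
    """
    if not os.path.isdir(part_dir):
        return "missing"
    names = set(os.listdir(part_dir))
    if "segments" not in names:
        return "incomplete"
    if not (names - BOOKKEEPING):
        return "empty"
    return "ok"


def scan_indexes(indexes_dir):
    """Map every part-* entry under indexes_dir to its classification."""
    return {
        name: classify_part(os.path.join(indexes_dir, name))
        for name in sorted(os.listdir(indexes_dir))
        if name.startswith("part-")
    }
```

Calling `scan_indexes("crawled/indexes")` on a local copy laid out like the listing in this thread would report part-00000 as "ok" and part-00001 as "empty", since part-00001 holds only `segments` and `index.done`. Whether such an empty part is what trips dedup here would still need to be checked against the actual index.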
