Alright, I'll try that the next time I'm at work (that will be next Friday, because I'm just a student worker). Thanks for your great help ;)
Regards,
--
Erik H.

Gal Nitzan wrote:
> Erik,
>
> I'm not sure, because I worked with your version a long time ago (I work
> with 0.9 now), so I'm not sure I'm right about the "crawl_generate" and
> "crawl_parse" folders in the segment structure.
>
> However, two days ago I had that same exception when one of my segments was
> missing the parse folder in the segment.
>
> So maybe you need to parse the segments again (bin/nutch parse
> segments/segmentname)
>
> HTH,
>
> Gal.
>
> -----Original Message-----
> From: Erik Höschler [mailto:[EMAIL PROTECTED]
> Sent: Friday, January 26, 2007 6:21 PM
> To: [email protected]
> Subject: Re: Problems Searching an Index with Nutch
>
> Ok,
>
> I could not find any crawl_generate or crawl_parse folder. I also couldn't
> find Catalina.out anywhere on my system?!?!
>
> One thing I don't understand is that Nutch is supposed to create my
> folder structure. If there is a fault in it, like the missing folders or
> the 'db' folder which should normally be 'linkdb', how can I fix it? I
> didn't change anything in the structure myself, so it must have been
> created by Nutch directly... Any idea how this could happen?
>
> Thanks for your time ;)
>
> --Erik
>
> Gal Nitzan wrote:
>
>> Well, I guess that db is linkdb for ver 0.7.
>>
>> Anyway, there is not much info; maybe you can find more info in
>> Catalina.out ...
>>
>> One more thing to look for, just maybe it is the reason (long shot)...
>> check each of your segment folders and verify that it contains all 5
>> folders, i.e. content, crawl_generate, crawl_parse, parse_data, parse_text
>>
>> HTH
>>
>> Gal.
>>
>> -----Original Message-----
>> From: Erik Höschler [mailto:[EMAIL PROTECTED]
>> Sent: Friday, January 26, 2007 5:58 PM
>> To: [email protected]
>> Subject: Re: Problems Searching an Index with Nutch
>>
>> Hi,
>>
>> I checked my folder structure and everything seems to be correct...
>>
>> :/opt/nutch/crawl.db# l
>> insgesamt 8
>> drwxr-xr-x  3 root root   53 2007-01-19 14:11 db
>> drwxr-xr-x  2 root root 4096 2007-01-19 14:18 index
>> drwxr-xr-x 12 root root 4096 2007-01-26 15:06 segments
>>
>> I'm not sure if I've ever had a linkdb folder, or did you mean the db
>> folder listed above?
>>
>> Greetings,
>> Erik
>>
>> Gal Nitzan wrote:
>>
>>> Hi,
>>>
>>> I'm not sure, but it seems to me you are missing the linkdb and segments
>>> folders. They should be located on the same level as the index folder.
>>>
>>> HTH
>>>
>>> Gal
>>>
>>> -----Original Message-----
>>> From: Erik Höschler [mailto:[EMAIL PROTECTED]
>>> Sent: Friday, January 26, 2007 5:04 PM
>>> To: [email protected]
>>> Cc: Erik
>>> Subject: Problems Searching an Index with Nutch
>>>
>>> Hi,
>>>
>>> I'm running Nutch-0.7.2. I created an index for my local LAN, which
>>> consists of 45,000 pages. I can inspect this index with Luke and
>>> everything looks fine. When I try to run a search query with Nutch, I
>>> see the following exception in my JBoss log file (at the end of the log).
>>>
>>> //Here I'm redeploying the Nutch.war archive....
>>> 2007-01-26 15:55:06,611 INFO [org.jboss.web.tomcat.tc5.TomcatDeployer] deploy, ctxPath=/nutch, warUrl=file:/srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/
>>> 2007-01-26 15:55:06,831 DEBUG [tomcat.localhost./nutch.Context] Starting tomcat.localhost./nutch.Context
>>> 2007-01-26 15:55:06,832 DEBUG [tomcat.localhost./nutch.Context] Configuring default Resources
>>> 2007-01-26 15:55:06,836 DEBUG [tomcat.localhost./nutch.Context] Processing standard container startup
>>> 2007-01-26 15:55:06,844 DEBUG [tomcat.localhost./nutch.Context] Setting deployment descriptor public ID to '-//Sun Microsystems, Inc.//DTD Web Application 2.3//EN'
>>> 2007-01-26 15:55:06,862 DEBUG [tomcat.localhost./nutch.Context] Setting deployment descriptor public ID to '-//Sun Microsystems, Inc.//DTD Web Application 2.3//EN'
>>> 2007-01-26 15:55:06,866 DEBUG [tomcat.localhost./nutch.Context] Posting standard context attributes
>>> 2007-01-26 15:55:06,866 DEBUG [tomcat.localhost./nutch.Context] Configuring application event listeners
>>> 2007-01-26 15:55:06,866 DEBUG [tomcat.localhost./nutch.Context] Sending application start events
>>> 2007-01-26 15:55:06,866 DEBUG [tomcat.localhost./nutch.Context] Starting filters
>>> 2007-01-26 15:55:06,866 DEBUG [tomcat.localhost./nutch.Context] Starting filter 'CommonHeadersFilter'
>>> 2007-01-26 15:55:06,867 DEBUG [tomcat.localhost./nutch.Context] Starting completed //Archive successfully loaded...?!?!
>>> 2007-01-26 15:55:06,867 DEBUG [tomcat.localhost./nutch.Context] Checking for jboss.web:j2eeType=WebModule,name=//localhost/nutch,J2EEApplication=none,J2EEServer=none
>>>
>>> //Here I started a query in my web browser...
>>> 2007-01-26 15:55:53,585 INFO [STDOUT] 070126 155553 parsing file:/srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/classes/nutch-default.xml
>>> 2007-01-26 15:55:53,591 INFO [STDOUT] 070126 155553 parsing file:/srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/classes/nutch-site.xml
>>> 2007-01-26 15:55:53,599 INFO [STDOUT] 070126 155553 Plugins: looking in: /srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/classes/plugins
>>> 2007-01-26 15:55:53,600 INFO [STDOUT] 070126 155553 not including: /srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/classes/plugins/clustering-carrot2
>>> 2007-01-26 15:55:53,600 INFO [STDOUT] 070126 155553 not including: /srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/classes/plugins/creativecommons
>>> 2007-01-26 15:55:53,600 INFO [STDOUT] 070126 155553 parsing: /srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/classes/plugins/index-basic/plugin.xml
>>> 2007-01-26 15:55:53,607 INFO [STDOUT] 070126 155553 impl: point=org.apache.nutch.indexer.IndexingFilter class=org.apache.nutch.indexer.basic.BasicIndexingFilter
>>> 2007-01-26 15:55:53,609 INFO [STDOUT] 070126 155553 not including: /srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/classes/plugins/index-more
>>> 2007-01-26 15:55:53,609 INFO [STDOUT] 070126 155553 not including: /srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/classes/plugins/language-identifier
>>> 2007-01-26 15:55:53,609 INFO [STDOUT] 070126 155553 parsing: /srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/classes/plugins/nutch-extensionpoints/plugin.xml
>>> 2007-01-26 15:55:53,612 INFO [STDOUT] 070126 155553 not including: /srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/classes/plugins/ontology
>>> 2007-01-26 15:55:53,612 INFO [STDOUT] 070126 155553 not including: /srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/classes/plugins/parse-ext
>>> 2007-01-26 15:55:53,613 INFO [STDOUT] 070126 155553 parsing: /srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/classes/plugins/parse-html/plugin.xml
>>> 2007-01-26 15:55:53,614 INFO [STDOUT] 070126 155553 impl: point=org.apache.nutch.parse.Parser class=org.apache.nutch.parse.html.HtmlParser
>>> 2007-01-26 15:55:53,615 INFO [STDOUT] 070126 155553 not including: /srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/classes/plugins/parse-js
>>> 2007-01-26 15:55:53,615 INFO [STDOUT] 070126 155553 not including: /srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/classes/plugins/parse-msword
>>> 2007-01-26 15:55:53,615 INFO [STDOUT] 070126 155553 not including: /srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/classes/plugins/parse-pdf
>>> 2007-01-26 15:55:53,615 INFO [STDOUT] 070126 155553 not including: /srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/classes/plugins/parse-rss
>>> 2007-01-26 15:55:53,615 INFO [STDOUT] 070126 155553 parsing: /srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/classes/plugins/parse-text/plugin.xml
>>> 2007-01-26 15:55:53,617 INFO [STDOUT] 070126 155553 impl: point=org.apache.nutch.parse.Parser class=org.apache.nutch.parse.text.TextParser
>>> 2007-01-26 15:55:53,617 INFO [STDOUT] 070126 155553 not including: /srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/classes/plugins/protocol-file
>>> 2007-01-26 15:55:53,618 INFO [STDOUT] 070126 155553 not including: /srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/classes/plugins/protocol-ftp
>>> 2007-01-26 15:55:53,618 INFO [STDOUT] 070126 155553 parsing: /srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/classes/plugins/protocol-http/plugin.xml
>>> 2007-01-26 15:55:53,619 INFO [STDOUT] 070126 155553 impl: point=org.apache.nutch.protocol.Protocol class=org.apache.nutch.protocol.http.Http
>>> 2007-01-26 15:55:53,620 INFO [STDOUT] 070126 155553 not including: /srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/classes/plugins/protocol-httpclient
>>> 2007-01-26 15:55:53,620 INFO [STDOUT] 070126 155553 parsing: /srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/classes/plugins/query-basic/plugin.xml
>>> 2007-01-26 15:55:53,622 INFO [STDOUT] 070126 155553 impl: point=org.apache.nutch.searcher.QueryFilter class=org.apache.nutch.searcher.basic.BasicQueryFilter
>>> 2007-01-26 15:55:53,622 INFO [STDOUT] 070126 155553 not including: /srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/classes/plugins/query-more
>>> 2007-01-26 15:55:53,622 INFO [STDOUT] 070126 155553 parsing: /srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/classes/plugins/query-site/plugin.xml
>>> 2007-01-26 15:55:53,624 INFO [STDOUT] 070126 155553 impl: point=org.apache.nutch.searcher.QueryFilter class=org.apache.nutch.searcher.site.SiteQueryFilter
>>> 2007-01-26 15:55:53,624 INFO [STDOUT] 070126 155553 parsing: /srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/classes/plugins/query-url/plugin.xml
>>> 2007-01-26 15:55:53,626 INFO [STDOUT] 070126 155553 impl: point=org.apache.nutch.searcher.QueryFilter class=org.apache.nutch.searcher.url.URLQueryFilter
>>> 2007-01-26 15:55:53,626 INFO [STDOUT] 070126 155553 not including: /srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/classes/plugins/urlfilter-prefix
>>> 2007-01-26 15:55:53,626 INFO [STDOUT] 070126 155553 parsing: /srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/classes/plugins/urlfilter-regex/plugin.xml
>>> 2007-01-26 15:55:53,628 INFO [STDOUT] 070126 155553 impl: point=org.apache.nutch.net.URLFilter class=org.apache.nutch.net.RegexURLFilter
>>> 2007-01-26 15:55:53,639 INFO [STDOUT] 070126 155553 10 creating new bean
>>> 2007-01-26 15:55:53,640 INFO [STDOUT] 070126 155553 10 opening segment indexes in /srv/opt/nutch-0.7.2/crawl.db/segments
>>> 2007-01-26 15:55:53,652 ERROR [org.jboss.web.localhost.Engine] StandardWrapperValve[jsp]: Servlet.service() for servlet jsp threw exception
>>> java.lang.ArrayIndexOutOfBoundsException
>>>
>>> In my browser I got the following error ...
>>>
>>> HTTP Status 500 -
>>>
>>> ------------------------------------------------------------------------
>>>
>>> *type* Exception report
>>>
>>> *message*
>>>
>>> *description* _The server encountered an internal error () that
>>> prevented it from fulfilling this request._
>>>
>>> *exception*
>>>
>>> org.apache.jasper.JasperException
>>>     org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:372)
>>>     org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:292)
>>>     org.apache.jasper.servlet.JspServlet.service(JspServlet.java:236)
>>>     javax.servlet.http.HttpServlet.service(HttpServlet.java:810)
>>>     org.jboss.web.tomcat.filters.ReplyHeaderFilter.doFilter(ReplyHeaderFilter.java:75)
>>>
>>> *root cause*
>>>
>>> java.lang.ArrayIndexOutOfBoundsException
>>>
>>> *note* _The full stack trace of the root cause is available in the
>>> Apache Tomcat/5.0.28 logs._
>>>
>>> ------------------------------------------------------------------------
>>>
>>> Apache Tomcat/5.0.28
>>>
>>> I also tested this search on a newly created index (a small one) but
>>> got the same error. I also tried running Nutch-0.8.1, but the result was
>>> the same. I couldn't find any information about this error either, and
>>> now I don't know what to do. Maybe you have an idea...
>>>
>>> Thanks in advance...
>>>
>>> Yours sincerely,
>>> Erik H.

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
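
Gal's suggestion in this thread, checking that every segment folder contains all five subfolders, could be automated with a small script. What follows is only a sketch in Python, not something from Nutch itself. The five folder names are the ones Gal lists (he notes that he works with a newer version, so they may not apply to a 0.7.x layout), and the function name find_incomplete_segments is invented for this illustration.

```python
from pathlib import Path

# Subfolder names as listed by Gal in the thread. Assumption: this is the
# layout of a newer Nutch version; a 0.7.x segment may look different.
EXPECTED_SUBDIRS = (
    "content",
    "crawl_generate",
    "crawl_parse",
    "parse_data",
    "parse_text",
)


def find_incomplete_segments(segments_dir, expected=EXPECTED_SUBDIRS):
    """Map each incomplete segment name to the list of subfolders it lacks.

    Hypothetical helper for illustration only; it is not part of Nutch.
    Segments that contain every expected subfolder are left out of the result.
    """
    report = {}
    for segment in sorted(Path(segments_dir).iterdir()):
        if not segment.is_dir():
            continue  # ignore stray files next to the segment folders
        missing = [name for name in expected if not (segment / name).is_dir()]
        if missing:
            report[segment.name] = missing
    return report
```

Calling find_incomplete_segments("/srv/opt/nutch-0.7.2/crawl.db/segments") (the path taken from the log above) would return an empty dict if every segment is complete. Any segment that shows up with missing parse folders would be a candidate for the re-parse Gal suggests.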
