[jira] Created: (NUTCH-793) search.jsp compile errors
search.jsp compile errors
-------------------------

                 Key: NUTCH-793
                 URL: https://issues.apache.org/jira/browse/NUTCH-793
             Project: Nutch
          Issue Type: Bug
          Components: web gui
            Reporter: Sami Siren
            Assignee: Sami Siren
             Fix For: 1.1

Related to the searcher interface changes recently committed, I broke search.jsp, which does not currently compile.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
Re: exception in search.jsp
Hi Jesse,

thanks for spotting this. I fixed the problem in trunk, see https://issues.apache.org/jira/browse/NUTCH-793

--
 Sami Siren

Jesse Hires wrote:

I am seeing the following and am unable to find any notes anywhere on it.

org.apache.jasper.JasperException: Unable to compile class for JSP:

An error occurred at line: 207 in the jsp file: /search.jsp
query.getParams cannot be resolved or is not a field
204:    // position this is good, bad?... ugly?
205:    Hits hits;
206:    try{
207:      query.getParams.initFrom(start + hitsToRetrieve, hitsPerSite, site, sort, reverse);
208:      hits = bean.search(query);
209:    } catch (IOException e){
210:      hits = new Hits(0,new Hit[0]);

It looks like this change came in recently to SVN:

--- lucene/nutch/trunk/src/web/jsp/search.jsp 2009/10/09 17:02:32 823614
+++ lucene/nutch/trunk/src/web/jsp/search.jsp 2010/02/01 20:47:34 905410
@@ -204,8 +204,8 @@
    // position this is good, bad?... ugly?
    Hits hits;
    try{
-     hits = bean.search(query, start + hitsToRetrieve, hitsPerSite, site,
-                        sort, reverse);
+     query.getParams.initFrom(start + hitsToRetrieve, hitsPerSite, site, sort, reverse);
+     hits = bean.search(query);
    } catch (IOException e){
      hits = new Hits(0,new Hit[0]);
    }

Has anyone else run into this, or did I miss something when updating to the latest version?

Jesse

int GetRandomNumber()
{
   return 4; // Chosen by fair roll of dice
             // Guaranteed to be random
} // xkcd.com http://xkcd.com
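The compile error quoted above comes down to a method being referenced like a field: `query.getParams` lacks its parentheses. A minimal sketch of the difference follows. The `Query` and `QueryParams` classes here are stand-ins written only for illustration (assumptions, not the Nutch sources); only the names `getParams` and `initFrom` and the argument order are taken from the quoted diff, and the field names are hypothetical.

```java
// Stand-in for the parameter holder; field names are hypothetical.
class QueryParams {
    int numHits;
    int hitsPerSite;
    String site;
    String sort;
    boolean reverse;

    // Same argument order as the call in the quoted search.jsp diff.
    void initFrom(int numHits, int hitsPerSite, String site, String sort, boolean reverse) {
        this.numHits = numHits;
        this.hitsPerSite = hitsPerSite;
        this.site = site;
        this.sort = sort;
        this.reverse = reverse;
    }
}

// Stand-in for the query object; it exposes its parameters through a method.
class Query {
    private final QueryParams params = new QueryParams();

    QueryParams getParams() {
        return params;
    }
}

class SearchJspSketch {
    static Query buildQuery(int start, int hitsToRetrieve, int hitsPerSite,
                            String site, String sort, boolean reverse) {
        Query query = new Query();
        // "query.getParams.initFrom(...)" does not compile because getParams
        // is a method, not a field. Adding the parentheses resolves it:
        query.getParams().initFrom(start + hitsToRetrieve, hitsPerSite, site, sort, reverse);
        return query;
    }

    public static void main(String[] args) {
        Query query = buildQuery(0, 10, 2, "site", null, false);
        System.out.println("hits requested: " + query.getParams().numHits);
    }
}
```

With the parentheses in place the parameters travel inside the query object, which matches the shape of the new `bean.search(query)` call in the diff.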
[jira] Resolved: (NUTCH-793) search.jsp compile errors
[ https://issues.apache.org/jira/browse/NUTCH-793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sami Siren resolved NUTCH-793.
------------------------------

    Resolution: Fixed

Committed a fix.
[jira] Resolved: (NUTCH-788) search.jsp typo causing searches to fail
[ https://issues.apache.org/jira/browse/NUTCH-788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sami Siren resolved NUTCH-788.
------------------------------

       Resolution: Fixed
    Fix Version/s: 1.1
         Assignee: Sami Siren

Thanks Sammy for the fix, I did not realize you had spotted this too. It's now fixed in trunk.

search.jsp typo causing searches to fail
----------------------------------------

                 Key: NUTCH-788
                 URL: https://issues.apache.org/jira/browse/NUTCH-788
             Project: Nutch
          Issue Type: Bug
          Components: web gui
    Affects Versions: 1.1
         Environment: On trunk
            Reporter: Sammy Yu
            Assignee: Sami Siren
             Fix For: 1.1
         Attachments: 0001-Fix-up-servlet.patch

Call to initialize the servlet parameter is missing parentheses.
[jira] Commented: (NUTCH-789) Improvements to Tika parser
[ https://issues.apache.org/jira/browse/NUTCH-789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12833714#action_12833714 ]

Sami Siren commented on NUTCH-789:
----------------------------------

It would be really useful to include the improvements in the functionality since that way almost all (-flash ?) parsers would be covered.

Improvements to Tika parser
---------------------------

                 Key: NUTCH-789
                 URL: https://issues.apache.org/jira/browse/NUTCH-789
             Project: Nutch
          Issue Type: Improvement
          Components: fetcher
         Environment: reported by Sami, in NUTCH-766
            Reporter: Chris A. Mattmann
            Assignee: Chris A. Mattmann
            Priority: Minor
             Fix For: 1.1
         Attachments: NutchTikaConfig.java, TikaParser.java

As reported by Sami in NUTCH-766, Sami has a few improvements he made to the Tika parser. We'll track that progress here.
[jira] Closed: (NUTCH-766) Tika parser
[ https://issues.apache.org/jira/browse/NUTCH-766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Julien Nioche closed NUTCH-766.
-------------------------------

Have added a small improvement in revision 910187 (Prioritise default Tika parser when discovering plugins matching mime-type). Thanks to Chris for testing and committing it + Andrzej and Sami for their comments and suggestions.

Tika parser
-----------

                 Key: NUTCH-766
                 URL: https://issues.apache.org/jira/browse/NUTCH-766
             Project: Nutch
          Issue Type: New Feature
            Reporter: Julien Nioche
            Assignee: Chris A. Mattmann
             Fix For: 1.1
         Attachments: NUTCH-766-v3.patch, NUTCH-766.v2, NutchTikaConfig.java, sample.tar.gz, TikaParser.java

Tika handles a lot of different formats under the bonnet and exposes them nicely via SAX events. What is described here is a tika-parser plugin which delegates the parsing mechanism to Tika but can still coexist with the existing parsing plugins, which is useful for formats partially handled by Tika (or not at all). Some of the elements below have already been discussed on the mailing lists. Note that this is work in progress, your feedback is welcome.

Tika is already used by Nutch for its MimeType implementations. Tika comes as different jar files (core and parsers); in the work described here we decided to put the libs in 2 different places:

NUTCH_HOME/lib : tika-core.jar
NUTCH_HOME/tika-plugin/lib : tika-parsers.jar

Since Tika is used by the core only for its MimeType functionality, we only need to put tika-core at the main lib level, whereas the tika plugin obviously needs tika-parsers.jar plus all the jars used internally by Tika.

Due to limitations in the way Tika loads its classes, we had to duplicate the TikaConfig class in the tika-plugin. This might be fixed in the future in Tika itself or avoided by refactoring the mimetype part of Nutch using extension points.

Unlike most other parsers, Tika handles more than one mime-type, which is why we are using * as its mimetype value in the plugin descriptor and have modified ParserFactory.java so that it considers the tika parser as potentially suitable for all mime-types. In practice this means that the associations between a mime type and a parser plugin as defined in parse-plugins.xml are useful only for the cases where we want to handle a mime type with a different parser than Tika.

The general approach I chose was to convert the SAX events returned by the Tika parsers into DOM objects. This lets us reuse the utilities that come with the current HTML parser (link detection, metatag handling) and also means that we can use the HTMLParseFilters in exactly the same way. The main difference though is that HTMLParseFilters are not limited to HTML documents anymore, as the XHTML tags returned by Tika can correspond to a different format for the original document. There is a duplication of code with the html-plugin which will be resolved by either a) getting rid of the html-plugin altogether or b) exporting its jar and making the tika parser depend on it.

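The SAX-to-DOM step described above can be sketched with the JDK's own XML APIs. This is only an illustration under stated assumptions: it uses an identity `TransformerHandler` with a `DOMResult` rather than the DOM builder the plugin itself uses, and `emitSampleEvents` is a hypothetical stand-in for a Tika parser reporting XHTML as SAX events. The class and method names are invented for this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

class SaxToDomSketch {

    // Hypothetical stand-in for a parser that reports a document as XHTML SAX events.
    static void emitSampleEvents(ContentHandler handler) throws SAXException {
        handler.startDocument();
        handler.startElement("", "html", "html", new AttributesImpl());
        handler.startElement("", "body", "body", new AttributesImpl());
        AttributesImpl attrs = new AttributesImpl();
        attrs.addAttribute("", "href", "href", "CDATA", "http://example.org/");
        handler.startElement("", "a", "a", attrs);
        char[] text = "example".toCharArray();
        handler.characters(text, 0, text.length);
        handler.endElement("", "a", "a");
        handler.endElement("", "body", "body");
        handler.endElement("", "html", "html");
        handler.endDocument();
    }

    // Collects the SAX events into a DOM tree through an identity transformer.
    static Document toDom() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            SAXTransformerFactory factory = (SAXTransformerFactory) TransformerFactory.newInstance();
            TransformerHandler handler = factory.newTransformerHandler();
            handler.setResult(new DOMResult(doc));
            emitSampleEvents(handler);
            return doc;
        } catch (Exception e) {
            throw new IllegalStateException("could not build DOM from SAX events", e);
        }
    }

    // A DOM-based utility in the spirit of the link detection mentioned above.
    static List<String> extractLinks(Document doc) {
        List<String> links = new ArrayList<String>();
        NodeList anchors = doc.getElementsByTagName("a");
        for (int i = 0; i < anchors.getLength(); i++) {
            links.add(((Element) anchors.item(i)).getAttribute("href"));
        }
        return links;
    }

    public static void main(String[] args) {
        System.out.println(extractLinks(toDom()));
    }
}
```

Once the events are in a DOM tree, any utility written against `org.w3c.dom` can run on it regardless of the original document format, which is the property the description relies on for reusing the HTML parser's helpers.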
The following libraries are required in the lib/ directory of the tika-parser:

<library name="asm-3.1.jar"/>
<library name="bcmail-jdk15-144.jar"/>
<library name="commons-compress-1.0.jar"/>
<library name="commons-logging-1.1.1.jar"/>
<library name="dom4j-1.6.1.jar"/>
<library name="fontbox-0.8.0-incubator.jar"/>
<library name="geronimo-stax-api_1.0_spec-1.0.1.jar"/>
<library name="hamcrest-core-1.1.jar"/>
<library name="jce-jdk13-144.jar"/>
<library name="jempbox-0.8.0-incubator.jar"/>
<library name="metadata-extractor-2.4.0-beta-1.jar"/>
<library name="mockito-core-1.7.jar"/>
<library name="objenesis-1.0.jar"/>
<library name="ooxml-schemas-1.0.jar"/>
<library name="pdfbox-0.8.0-incubating.jar"/>
<library name="poi-3.5-FINAL.jar"/>
<library name="poi-ooxml-3.5-FINAL.jar"/>
<library name="poi-scratchpad-3.5-FINAL.jar"/>
<library name="tagsoup-1.2.jar"/>
<library name="tika-parsers-0.5-SNAPSHOT.jar"/>
<library name="xml-apis-1.0.b2.jar"/>
<library name="xmlbeans-2.3.0.jar"/>

There is a small test suite which needs to be improved. We will need to have a look at each individual format and check that it is covered by Tika and, if so, to the same extent; the Wiki is probably the right place for this. The language identifier (which is an HTMLParseFilter) seemed to work fine.

Again, your comments are welcome. Please bear in mind that this is just a first step.

Julien
http://www.digitalpebble.com
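The parser selection described in this issue (an explicit mime-type mapping wins, and the catch-all Tika parser handles everything else) can be sketched as a simple lookup with a fallback. This is a simplified illustration, not the logic in ParserFactory.java: the plugin ids, the map standing in for parse-plugins.xml, and the class name are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;

class ParserSelectionSketch {

    // Hypothetical id for the plugin registered with "*" as its mime-type.
    static final String TIKA_PLUGIN_ID = "tika-parser";

    // Returns the explicitly mapped plugin if there is one, otherwise the Tika plugin.
    static String selectParser(Map<String, String> explicitMappings, String mimeType) {
        String mapped = explicitMappings.get(mimeType);
        return mapped != null ? mapped : TIKA_PLUGIN_ID;
    }

    public static void main(String[] args) {
        // Stand-in for the mime-type associations kept in parse-plugins.xml.
        Map<String, String> mappings = new HashMap<String, String>();
        mappings.put("text/html", "parse-html"); // illustrative override

        System.out.println(selectParser(mappings, "text/html"));
        System.out.println(selectParser(mappings, "application/pdf"));
    }
}
```

Under this reading, an entry in the mapping is only needed for a mime type that should bypass Tika, which is the practical consequence the description points out.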
Trying to Add an new NutchDoc from plugin
Hi there,
I'm new to the forum and to Nutch as well...
I wrote a Nutch plugin that implements IndexingFilter...
Now I want to add a new Document to the index from the plugin (split the current doc).
I tried testing it with something like this:

    NutchIndexWriter[] Writers = NutchIndexWriterFactory.getNutchIndexWriters(getConf());
    Writers[0].write(doc);

The doc is the doc I get in the method, not something new I created (just for testing).
And I get the error "it doesn't make sense to have a field that is neither indexed nor stored".

Any suggestions?
--
View this message in context: http://old.nabble.com/Trying-to-Add-an-new-NutchDoc-from-plugin-tp27598076p27598076.html
Sent from the Nutch - Dev mailing list archive at Nabble.com.
Re: Trying to Add an new NutchDoc from plugin
Maybe I can try... debugging an indexing plugin is kinda tricky. Can you attach the required files and folders and tell me exactly what procedure to follow? Also any settings to be modified.

On Tue, Feb 16, 2010 at 12:10 AM, UDd dekelu...@gmail.com wrote:

Hi there,
I'm new to the forum and to Nutch as well...
I wrote a Nutch plugin that implements IndexingFilter...
Now I want to add a new Document to the index from the plugin (split the current doc).
I tried testing it with something like this:

    NutchIndexWriter[] Writers = NutchIndexWriterFactory.getNutchIndexWriters(getConf());
    Writers[0].write(doc);

The doc is the doc I get in the method, not something new I created (just for testing).
And I get the error "it doesn't make sense to have a field that is neither indexed nor stored".

Any suggestions?
Re: Trying to Add an new NutchDoc from plugin
Thx for the quick response.
Well, I wrote a very simple plugin that tries to write the same doc twice and, if there is an error, puts it in a custom field of the original doc:

    public NutchDocument filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks)
        throws IndexingException {
      // filter out if url contains archive, label or feeds
      LOGGER.debug("Found Url: " + new String(url.getBytes()));
      NutchIndexWriter[] Writers = NutchIndexWriterFactory.getNutchIndexWriters(getConf());
      //doc.add("js", String.valueOf(Writers.length));
      try {
        Writers[0].write(doc);
      } catch (Exception e) {
        // TODO Auto-generated catch block
        LOGGER.debug("Error adding Doc " + e.getMessage());
        doc.add("js", e.getMessage());
      }
      doc.add("js", "AfterTest");
      //return doc;
      return doc;
    }

After the Nutch run I just look at the index with lukeall-1.0.0.
I attached the compiled plugin jar if you can try to debug it... or if you can tell me how to debug it, that would be great (I have Nutch working from Eclipse).
http://old.nabble.com/file/p27598879/myplugins.rar myplugins.rar
--
View this message in context: http://old.nabble.com/Trying-to-Add-an-new-NutchDoc-from-plugin-tp27598076p27598879.html
Sent from the Nutch - Dev mailing list archive at Nabble.com.
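For context on the error text in this thread: the wording matches the message of the argument check that Lucene's `Field` constructor performs when a field is created with both storing and indexing switched off. The sketch below reproduces that rule with stand-in enums (it does not use Lucene's classes, and the class and method names are invented here). It illustrates where such a message can originate; why a field would reach the writer in that state in the plugin above is not established in this thread.

```java
class FieldOptionsSketch {

    // Stand-ins for the store/index options of a field; not Lucene's own types.
    enum Store { YES, NO }
    enum Index { ANALYZED, NOT_ANALYZED, NO }

    // Mirrors the rule: a field must be stored, indexed, or both.
    static void checkOptions(Store store, Index index) {
        if (store == Store.NO && index == Index.NO) {
            throw new IllegalArgumentException(
                "it doesn't make sense to have a field that is neither indexed nor stored");
        }
    }

    // Returns "ok" for a valid combination, otherwise the rejection message.
    static String describe(Store store, Index index) {
        try {
            checkOptions(store, index);
            return "ok";
        } catch (IllegalArgumentException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(Store.YES, Index.NO));
        System.out.println(describe(Store.NO, Index.NO));
    }
}
```

A reasonable first debugging step under this assumption would be to log, for each field of the document, which store and index options the writer ends up applying to it.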
Build failed in Hudson: Nutch-trunk #1070
See http://hudson.zones.apache.org/hudson/job/Nutch-trunk/1070/changes

Changes:

[jnioche] NUTCH-766: small improvement to Tika parser : prioritise default Tika parser when discovering plugins matching mime-type

[siren] NUTCH-793 search.jsp compile errors

------------------------------------------
[...truncated 6516 lines...]
jar:
deps-test:
init:
init-plugin:
deps-jar:
compile:
     [echo] Compiling plugin: lib-regex-filter
jar:
deps-test:
deploy:
copy-generated-lib:
deploy:
copy-generated-lib:
test:
     [echo] Testing plugin: urlfilter-automaton
    [junit] Running org.apache.nutch.urlfilter.automaton.TestAutomatonURLFilter
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 12.469 sec
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 13.249 sec
    [junit] Running org.apache.nutch.tika.TestRTFParser
init:
init-plugin:
deps-jar:
compile:
     [echo] Compiling plugin: urlfilter-domain
compile-test:
    [javac] Compiling 1 source file to http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlfilter-domain/test
jar:
deps-test:
deploy:
copy-generated-lib:
test:
     [echo] Testing plugin: urlfilter-domain
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 3.35 sec
init:
init-plugin:
deps-jar:
init:
init-plugin:
deps-jar:
compile:
     [echo] Compiling plugin: lib-regex-filter
jar:
init:
init-plugin:
deps-jar:
compile:
     [echo] Compiling plugin: lib-regex-filter
compile-test:
compile:
     [echo] Compiling plugin: urlfilter-regex
compile-test:
    [javac] Compiling 1 source file to http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlfilter-regex/test
    [junit] Running org.apache.nutch.urlfilter.domain.TestDomainURLFilter
jar:
deps-test:
init:
init-plugin:
deps-jar:
compile:
     [echo] Compiling plugin: lib-regex-filter
jar:
deps-test:
deploy:
copy-generated-lib:
deploy:
copy-generated-lib:
test:
     [echo] Testing plugin: urlfilter-regex
    [junit] Running org.apache.nutch.urlfilter.regex.TestRegexURLFilter
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 4.231 sec
init:
init-plugin:
deps-jar:
compile:
     [echo] Compiling plugin: urlfilter-suffix
compile-test:
    [javac] Compiling 1 source file to http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlfilter-suffix/test
jar:
deps-test:
deploy:
copy-generated-lib:
test:
     [echo] Testing plugin: urlfilter-suffix
    [junit] Running org.apache.nutch.urlfilter.suffix.TestSuffixURLFilter
    [junit] Tests run: 6, Failures: 0, Errors: 0, Time elapsed: 0.229 sec
init:
init-plugin:
deps-jar:
compile:
     [echo] Compiling plugin: urlnormalizer-basic
compile-test:
    [javac] Compiling 1 source file to http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlnormalizer-basic/test
jar:
deps-test:
deploy:
copy-generated-lib:
test:
     [echo] Testing plugin: urlnormalizer-basic
    [junit] Running org.apache.nutch.net.urlnormalizer.basic.TestBasicURLNormalizer
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.028 sec
init:
init-plugin:
deps-jar:
compile:
     [echo] Compiling plugin: urlnormalizer-pass
compile-test:
    [javac] Compiling 1 source file to http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlnormalizer-pass/test
jar:
deps-test:
deploy:
copy-generated-lib:
test:
     [echo] Testing plugin: urlnormalizer-pass
    [junit] Running org.apache.nutch.net.urlnormalizer.pass.TestPassURLNormalizer
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.182 sec
init:
init-plugin:
deps-jar:
compile:
     [echo] Compiling plugin: urlnormalizer-regex
compile-test:
    [javac] Compiling 1 source file to http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlnormalizer-regex/test
    [javac] Note: http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/src/plugin/urlnormalizer-regex/src/test/org/apache/nutch/net/urlnormalizer/regex/TestRegexURLNormalizer.java uses unchecked or unsafe operations.
    [javac] Note: Recompile with -Xlint:unchecked for details.
jar:
deps-test:
init:
init-plugin:
compile:
jar:
      [jar] Warning: skipping jar archive http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/nutch-extensionpoints/nutch-extensionpoints.jar because no files were included.
deps-test:
deploy:
copy-generated-lib:
deploy:
copy-generated-lib:
test:
     [echo] Testing plugin: urlnormalizer-regex
    [junit] Running org.apache.nutch.net.urlnormalizer.regex.TestRegexURLNormalizer
    [junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.269 sec
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 12.816 sec

BUILD FAILED
http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build.xml:314: The following error occurred while executing this line: