[jira] Created: (NUTCH-793) search.jsp compile errors

2010-02-15 Thread Sami Siren (JIRA)
search.jsp compile errors
-

 Key: NUTCH-793
 URL: https://issues.apache.org/jira/browse/NUTCH-793
 Project: Nutch
  Issue Type: Bug
  Components: web gui
Reporter: Sami Siren
Assignee: Sami Siren
 Fix For: 1.1


I broke search.jsp with the recently committed searcher interface changes; 
it currently does not compile.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: exception in search.jsp

2010-02-15 Thread Sami Siren

Hi Jesse,

thanks for spotting this. I fixed the problem in trunk, see 
https://issues.apache.org/jira/browse/NUTCH-793


--
 Sami Siren

Jesse Hires wrote:

I am seeing the following and am unable to find any notes anywhere on it.

org.apache.jasper.JasperException: Unable to compile class for JSP: 


An error occurred at line: 207 in the jsp file: /search.jsp

query.getParams cannot be resolved or is not a field
204: // position this is good, bad?... ugly?
205: Hits hits;
206: try{
207:   query.getParams.initFrom(start + hitsToRetrieve, hitsPerSite, site, sort, reverse);
208:   hits = bean.search(query);
209: } catch (IOException e){
210:   hits = new Hits(0,new Hit[0]);



It looks like this change came in recently to SVN

--- lucene/nutch/trunk/src/web/jsp/search.jsp   2009/10/09 17:02:32 823614
+++ lucene/nutch/trunk/src/web/jsp/search.jsp   2010/02/01 20:47:34 905410
@@ -204,8 +204,8 @@
     // position this is good, bad?... ugly?
     Hits hits;
     try{
-      hits = bean.search(query, start + hitsToRetrieve, hitsPerSite, site,
-                         sort, reverse);
+      query.getParams.initFrom(start + hitsToRetrieve, hitsPerSite, site, sort, reverse);
+      hits = bean.search(query);
     } catch (IOException e){
       hits = new Hits(0,new Hit[0]);
     }


Has anyone else run into this, or did I miss something when updating to 
the latest version?


Jesse

int GetRandomNumber()
{
   return 4; // Chosen by fair roll of dice
// Guaranteed to be random
} // xkcd.com http://xkcd.com





[jira] Resolved: (NUTCH-793) search.jsp compile errors

2010-02-15 Thread Sami Siren (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sami Siren resolved NUTCH-793.
--

Resolution: Fixed

committed a fix





[jira] Resolved: (NUTCH-788) search.jsp typo causing searches to fail

2010-02-15 Thread Sami Siren (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sami Siren resolved NUTCH-788.
--

   Resolution: Fixed
Fix Version/s: 1.1
 Assignee: Sami Siren

Thanks Sammy for the fix, I did not realize you had spotted this too. It's now 
fixed in trunk.

 search.jsp typo causing searches to fail
 

 Key: NUTCH-788
 URL: https://issues.apache.org/jira/browse/NUTCH-788
 Project: Nutch
  Issue Type: Bug
  Components: web gui
Affects Versions: 1.1
 Environment: On trunk
Reporter: Sammy Yu
Assignee: Sami Siren
 Fix For: 1.1

 Attachments: 0001-Fix-up-servlet.patch


 Call to initialize the servlet parameter is missing parentheses.
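
For illustration, here is a minimal sketch of the kind of slip this issue describes. The Query and QueryParams classes below are simplified, hypothetical stand-ins and not the real Nutch classes; only the names getParams and initFrom come from the compile error quoted earlier in this thread, and the argument list is shortened. Without parentheses, Java looks for a field called getParams, which is presumably why the compiler reported that it "cannot be resolved or is not a field".

```java
// Hypothetical stand-ins for illustration only; not the real Nutch classes.
class QueryParams {
    int numHits;
    int maxHitsPerDup;

    // Simplified: the call in search.jsp takes more arguments.
    void initFrom(int numHits, int maxHitsPerDup) {
        this.numHits = numHits;
        this.maxHitsPerDup = maxHitsPerDup;
    }
}

class Query {
    private final QueryParams params = new QueryParams();

    QueryParams getParams() {
        return params;
    }
}

class MissingParenthesesDemo {
    static Query buildQuery(int start, int hitsToRetrieve, int hitsPerSite) {
        Query query = new Query();
        // Broken form: query.getParams.initFrom(...) is read as a field access
        // and does not compile, because Query has no field named getParams.
        // Corrected form: call the accessor, then initialise the parameters.
        query.getParams().initFrom(start + hitsToRetrieve, hitsPerSite);
        return query;
    }

    public static void main(String[] args) {
        Query query = buildQuery(0, 10, 2);
        System.out.println(query.getParams().numHits);
    }
}
```

The only difference between the broken and the corrected call is the pair of parentheses after getParams.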




[jira] Commented: (NUTCH-789) Improvements to Tika parser

2010-02-15 Thread Sami Siren (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12833714#action_12833714
 ] 

Sami Siren commented on NUTCH-789:
--

It would be really useful to include these improvements, since that way 
almost all parsers (except Flash?) would be covered.

 Improvements to Tika parser
 ---

 Key: NUTCH-789
 URL: https://issues.apache.org/jira/browse/NUTCH-789
 Project: Nutch
  Issue Type: Improvement
  Components: fetcher
 Environment: reported by Sami, in NUTCH-766
Reporter: Chris A. Mattmann
Assignee: Chris A. Mattmann
Priority: Minor
 Fix For: 1.1

 Attachments: NutchTikaConfig.java, TikaParser.java


 As reported in NUTCH-766, Sami has made a few improvements to the 
 Tika parser. We'll track that progress here.




[jira] Closed: (NUTCH-766) Tika parser

2010-02-15 Thread Julien Nioche (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Nioche closed NUTCH-766.
---


I have added a small improvement in revision 910187 (Prioritise default Tika 
parser when discovering plugins matching mime-type).
Thanks to Chris for testing and committing it, and to Andrzej and Sami for 
their comments and suggestions.

 Tika parser
 ---

 Key: NUTCH-766
 URL: https://issues.apache.org/jira/browse/NUTCH-766
 Project: Nutch
  Issue Type: New Feature
Reporter: Julien Nioche
Assignee: Chris A. Mattmann
 Fix For: 1.1

 Attachments: NUTCH-766-v3.patch, NUTCH-766.v2, NutchTikaConfig.java, 
 sample.tar.gz, TikaParser.java


 Tika handles a lot of different formats under the bonnet and exposes them 
 nicely via SAX events. What is described here is a tika-parser plugin which 
 delegates the parsing mechanism to Tika but can still coexist with the 
 existing parsing plugins, which is useful for formats partially handled by 
 Tika (or not at all). Some of the elements below have already been discussed 
 on the mailing lists. Note that this is work in progress; your feedback is 
 welcome.
 Tika is already used by Nutch for its MimeType implementations. Tika comes as 
 different jar files (core and parsers); in the work described here we decided 
 to put the libs in 2 different places:
 NUTCH_HOME/lib : tika-core.jar
 NUTCH_HOME/tika-plugin/lib : tika-parsers.jar
 Since Tika is used by the core only for its MimeType functionality, we only 
 need to put tika-core at the main lib level, whereas the tika plugin obviously 
 needs tika-parsers.jar plus all the jars used internally by Tika.
 Due to limitations in the way Tika loads its classes, we had to duplicate the 
 TikaConfig class in the tika-plugin. This might be fixed in the future in 
 Tika itself or avoided by refactoring the mimetype part of Nutch using 
 extension points.
 Unlike most other parsers, Tika handles more than one Mime-type which is why 
 we are using * as its mimetype value in the plugin descriptor and have 
 modified ParserFactory.java so that it considers the tika parser as 
 potentially suitable for all mime-types. In practice this means that the 
 associations between a mime type and a parser plugin as defined in 
 parse-plugins.xml are useful only for the cases where we want to handle a 
 mime type with a different parser than Tika. 
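
 To make the selection rule above concrete, here is a small sketch of how a lookup could prefer an explicit mime-type mapping and fall back to a parser registered for *. The class ParserSelector, its method names, and the plugin id strings are hypothetical; this is not the code of ParserFactory.java.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; this is not the actual ParserFactory implementation.
class ParserSelector {
    static final String WILDCARD = "*";

    // mime type -> parser plugin id, in the spirit of parse-plugins.xml
    private final Map<String, String> explicitMappings = new HashMap<>();
    // parser that declared "*" as its mime type, if any
    private String wildcardParser;

    void map(String mimeType, String parserId) {
        if (WILDCARD.equals(mimeType)) {
            wildcardParser = parserId;
        } else {
            explicitMappings.put(mimeType, parserId);
        }
    }

    // An explicit mapping wins; otherwise the wildcard parser is used.
    // Returns null when neither exists.
    String select(String mimeType) {
        String explicit = explicitMappings.get(mimeType);
        return explicit != null ? explicit : wildcardParser;
    }

    public static void main(String[] args) {
        ParserSelector selector = new ParserSelector();
        selector.map("*", "parse-tika");
        selector.map("text/html", "parse-html");
        System.out.println(selector.select("text/html"));
        System.out.println(selector.select("application/pdf"));
    }
}
```

 With this shape, an entry for a mime type is only needed when something other than the wildcard parser should handle it.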
 The general approach I chose was to convert the SAX events returned by the 
 Tika parsers into DOM objects and reuse the utilities that come with the 
 current HTML parser, i.e. link detection and metatag handling. It also means 
 that we can use the HTMLParseFilters in exactly the same way. The main 
 difference though is that HTMLParseFilters are not limited to HTML documents 
 anymore, as the XHTML tags returned by Tika can correspond to a different 
 format for the original document. There is a duplication of code with the 
 html-plugin which will be resolved by either a) getting rid of the 
 html-plugin altogether or b) exporting its jar and making the tika parser 
 depend on it.
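
 As a rough illustration of turning SAX events into a DOM tree, the sketch below uses only the JDK's JAXP classes: an identity TransformerHandler receives the events and writes them into a DOMResult. The SAX events are fired by hand as a stand-in for what a Tika parser would emit; the class name SaxToDomSketch is hypothetical, and this is not necessarily how the plugin itself performs the conversion.

```java
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import org.w3c.dom.Document;
import org.xml.sax.helpers.AttributesImpl;

class SaxToDomSketch {
    // Feeds a few hand-written SAX events into an identity TransformerHandler
    // and collects the result as a DOM Document.
    static Document buildDom(String title) throws Exception {
        SAXTransformerFactory factory =
            (SAXTransformerFactory) TransformerFactory.newInstance();
        TransformerHandler handler = factory.newTransformerHandler();
        DOMResult result = new DOMResult();
        handler.setResult(result);

        AttributesImpl noAttrs = new AttributesImpl();
        handler.startDocument();
        handler.startElement("", "html", "html", noAttrs);
        handler.startElement("", "title", "title", noAttrs);
        char[] text = title.toCharArray();
        handler.characters(text, 0, text.length);
        handler.endElement("", "title", "title");
        handler.endElement("", "html", "html");
        handler.endDocument();

        // After endDocument() the DOMResult holds the finished Document.
        return (Document) result.getNode();
    }

    public static void main(String[] args) throws Exception {
        Document doc = buildDom("Hello");
        System.out.println(doc.getDocumentElement().getTagName());
    }
}
```

 Once the events are available as a Document, DOM-based utilities can walk the tree regardless of the original document format.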
 The following libraries are required in the lib/ directory of the tika-parser:
   <library name="asm-3.1.jar"/>
   <library name="bcmail-jdk15-144.jar"/>
   <library name="commons-compress-1.0.jar"/>
   <library name="commons-logging-1.1.1.jar"/>
   <library name="dom4j-1.6.1.jar"/>
   <library name="fontbox-0.8.0-incubator.jar"/>
   <library name="geronimo-stax-api_1.0_spec-1.0.1.jar"/>
   <library name="hamcrest-core-1.1.jar"/>
   <library name="jce-jdk13-144.jar"/>
   <library name="jempbox-0.8.0-incubator.jar"/>
   <library name="metadata-extractor-2.4.0-beta-1.jar"/>
   <library name="mockito-core-1.7.jar"/>
   <library name="objenesis-1.0.jar"/>
   <library name="ooxml-schemas-1.0.jar"/>
   <library name="pdfbox-0.8.0-incubating.jar"/>
   <library name="poi-3.5-FINAL.jar"/>
   <library name="poi-ooxml-3.5-FINAL.jar"/>
   <library name="poi-scratchpad-3.5-FINAL.jar"/>
   <library name="tagsoup-1.2.jar"/>
   <library name="tika-parsers-0.5-SNAPSHOT.jar"/>
   <library name="xml-apis-1.0.b2.jar"/>
   <library name="xmlbeans-2.3.0.jar"/>
 There is a small test suite which needs to be improved. We will need to have 
 a look at each individual format and check that it is covered by Tika and if 
 so to the same extent; the Wiki is probably the right place for this. The 
 language identifier (which is a HTMLParseFilter) seemed to work fine.
  
 Again, your comments are welcome. Please bear in mind that this is just a 
 first step. 
 Julien
 http://www.digitalpebble.com




Trying to Add an new NutchDoc from plugin

2010-02-15 Thread UDd

Hi there,
I'm new to the forum and to Nutch as well.
I wrote a Nutch plugin that implements IndexingFilter.
Now I want to add a new document to the index from the plugin (splitting the
current doc).
I tried testing it with something like this:

NutchIndexWriter[] Writers =
NutchIndexWriterFactory.getNutchIndexWriters(getConf());
Writers[0].write(doc);

The doc is the one I receive in the method, not something new I created (just
for testing).

And I get the error "it doesn't make sense to have a field that is neither
indexed nor stored".

Any suggestions?
-- 
View this message in context: 
http://old.nabble.com/Trying-to-Add-an-new-NutchDoc-from-plugin-tp27598076p27598076.html
Sent from the Nutch - Dev mailing list archive at Nabble.com.
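
The error text quoted in the message above matches an IllegalArgumentException that Lucene's Field constructor raised, in Lucene releases of that period, when a field was requested as neither stored nor indexed. The plain-Java sketch below mirrors that check with hypothetical stand-in types (FieldSketch, Store, Index); it does not use Lucene, and it says nothing about why Nutch produced such a field in this case.

```java
// Hypothetical stand-ins; these are not Lucene's classes.
enum Store { YES, NO }
enum Index { ANALYZED, NOT_ANALYZED, NO }

class FieldSketch {
    final String name;
    final String value;

    FieldSketch(String name, String value, Store store, Index index) {
        // A field that is neither stored nor indexed would leave no trace
        // in the index, so it is rejected up front.
        if (store == Store.NO && index == Index.NO) {
            throw new IllegalArgumentException(
                "it doesn't make sense to have a field that is neither indexed nor stored");
        }
        this.name = name;
        this.value = value;
    }

    public static void main(String[] args) {
        // Accepted: the field is both stored and indexed.
        new FieldSketch("title", "hello", Store.YES, Index.ANALYZED);
        try {
            new FieldSketch("js", "AfterTest", Store.NO, Index.NO);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

If this is the cause, a reasonable thing to check is which store and index options end up being applied to each field of the document before it is written.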



Re: Trying to Add an new NutchDoc from plugin

2010-02-15 Thread Sahil Shah
Maybe I can try... debugging an indexing plugin is kinda tricky.
Can you attach the required files and folders and tell me exactly what
procedure to follow?
Also, are there any settings that need to be modified?







Re: Trying to Add an new NutchDoc from plugin

2010-02-15 Thread UDd

Thanks for the quick response.
Well, I wrote a very simple plugin that tries to write the same doc twice
and, if there is an error, puts the message in a custom field of the
original doc:

  public NutchDocument filter(NutchDocument doc, Parse parse, Text url,
      CrawlDatum datum, Inlinks inlinks) throws IndexingException {

    // Text.getBytes() returns the backing array, which can be longer than
    // the valid data, so use toString() for logging
    LOGGER.debug("Found Url: " + url.toString());

    NutchIndexWriter[] Writers =
        NutchIndexWriterFactory.getNutchIndexWriters(getConf());
    //doc.add("js", String.valueOf(Writers.length));
    try {
      // Test only: write the incoming doc a second time
      Writers[0].write(doc);
    } catch (Exception e) {
      LOGGER.debug("Error adding Doc " + e.getMessage());
      // Record the failure in a custom field of the original doc
      doc.add("js", e.getMessage());
    }
    doc.add("js", "AfterTest");
    return doc;
  }

After the Nutch run I just look at the index with lukeall-1.0.0.
I have attached the compiled plugin jar in case you can try to debug it, or
if you can tell me how to debug it, that would be great (I have Nutch working
from Eclipse).




http://old.nabble.com/file/p27598879/myplugins.rar myplugins.rar 
-- 
View this message in context: 
http://old.nabble.com/Trying-to-Add-an-new-NutchDoc-from-plugin-tp27598076p27598879.html
Sent from the Nutch - Dev mailing list archive at Nabble.com.



Build failed in Hudson: Nutch-trunk #1070

2010-02-15 Thread Apache Hudson Server
See http://hudson.zones.apache.org/hudson/job/Nutch-trunk/1070/changes

Changes:

[jnioche] NUTCH-766: small improvement to Tika parser : prioritise default Tika 
parser when discovering plugins matching mime-type

[siren] NUTCH-793 search.jsp compile errors

--
[...truncated 6516 lines...]

jar:

deps-test:

init:

init-plugin:

deps-jar:

compile:
 [echo] Compiling plugin: lib-regex-filter

jar:

deps-test:

deploy:

copy-generated-lib:

deploy:

copy-generated-lib:

test:
 [echo] Testing plugin: urlfilter-automaton
[junit] Running org.apache.nutch.urlfilter.automaton.TestAutomatonURLFilter
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 12.469 sec
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 13.249 sec
[junit] Running org.apache.nutch.tika.TestRTFParser

init:

init-plugin:

deps-jar:

compile:
 [echo] Compiling plugin: urlfilter-domain

compile-test:
[javac] Compiling 1 source file to 
http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlfilter-domain/test

jar:

deps-test:

deploy:

copy-generated-lib:

test:
 [echo] Testing plugin: urlfilter-domain
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 3.35 sec

init:

init-plugin:

deps-jar:

init:

init-plugin:

deps-jar:

compile:
 [echo] Compiling plugin: lib-regex-filter

jar:

init:

init-plugin:

deps-jar:

compile:
 [echo] Compiling plugin: lib-regex-filter

compile-test:

compile:
 [echo] Compiling plugin: urlfilter-regex

compile-test:
[javac] Compiling 1 source file to 
http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlfilter-regex/test
[junit] Running org.apache.nutch.urlfilter.domain.TestDomainURLFilter

jar:

deps-test:

init:

init-plugin:

deps-jar:

compile:
 [echo] Compiling plugin: lib-regex-filter

jar:

deps-test:

deploy:

copy-generated-lib:

deploy:

copy-generated-lib:

test:
 [echo] Testing plugin: urlfilter-regex
[junit] Running org.apache.nutch.urlfilter.regex.TestRegexURLFilter
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 4.231 sec

init:

init-plugin:

deps-jar:

compile:
 [echo] Compiling plugin: urlfilter-suffix

compile-test:
[javac] Compiling 1 source file to 
http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlfilter-suffix/test

jar:

deps-test:

deploy:

copy-generated-lib:

test:
 [echo] Testing plugin: urlfilter-suffix
[junit] Running org.apache.nutch.urlfilter.suffix.TestSuffixURLFilter
[junit] Tests run: 6, Failures: 0, Errors: 0, Time elapsed: 0.229 sec

init:

init-plugin:

deps-jar:

compile:
 [echo] Compiling plugin: urlnormalizer-basic

compile-test:
[javac] Compiling 1 source file to 
http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlnormalizer-basic/test

jar:

deps-test:

deploy:

copy-generated-lib:

test:
 [echo] Testing plugin: urlnormalizer-basic
[junit] Running 
org.apache.nutch.net.urlnormalizer.basic.TestBasicURLNormalizer
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.028 sec

init:

init-plugin:

deps-jar:

compile:
 [echo] Compiling plugin: urlnormalizer-pass

compile-test:
[javac] Compiling 1 source file to 
http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlnormalizer-pass/test

jar:

deps-test:

deploy:

copy-generated-lib:

test:
 [echo] Testing plugin: urlnormalizer-pass
[junit] Running 
org.apache.nutch.net.urlnormalizer.pass.TestPassURLNormalizer
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.182 sec

init:

init-plugin:

deps-jar:

compile:
 [echo] Compiling plugin: urlnormalizer-regex

compile-test:
[javac] Compiling 1 source file to 
http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlnormalizer-regex/test
[javac] Note: 
http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/src/plugin/urlnormalizer-regex/src/test/org/apache/nutch/net/urlnormalizer/regex/TestRegexURLNormalizer.java
 uses unchecked or unsafe operations.
[javac] Note: Recompile with -Xlint:unchecked for details.

jar:

deps-test:

init:

init-plugin:

compile:

jar:
  [jar] Warning: skipping jar archive 
http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/nutch-extensionpoints/nutch-extensionpoints.jar
 because no files were included.

deps-test:

deploy:

copy-generated-lib:

deploy:

copy-generated-lib:

test:
 [echo] Testing plugin: urlnormalizer-regex
[junit] Running 
org.apache.nutch.net.urlnormalizer.regex.TestRegexURLNormalizer
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.269 sec
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 12.816 sec

BUILD FAILED
http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build.xml:314: 
The following error occurred while executing this line: