Re: Mega-cleanup in trunk/
Andrzej Bialecki wrote:
> Hi,
> I just committed a large patch to clean up the trunk/ of obsolete and broken classes remaining from the 0.7.x development line. Please test that things still work as they should ...

Hi,
I am not sure what is wrong, but a lot of the JUnit tests simply do not compile. I did an svn checkout to a new directory to be sure I do not have anything left over from my experiments. I am looking at it right now, but I would suggest a quick temporary cleanup to make trunk testable:

1) Remove permanently, as the classes under test have been removed from trunk:
   src/test/org/apache/nutch/pagedb/TestFetchListEntry.java
   src/test/org/apache/nutch/pagedb/TestPage.java
   src/test/org/apache/nutch/db/TestWebDB.java
   src/test/org/apache/nutch/db/DBTester.java
   src/test/org/apache/nutch/tools/TestSegmentMergeTool.java

2) Remove temporarily and create a JIRA issue to fix them:
   src/test/org/apache/nutch/fetcher/TestFetcher.java
   src/test/org/apache/nutch/fetcher/TestFetcherOutput.java

3) Remove unused import in:
   src/test/org/apache/nutch/parse/TestParseText.java

4) Fix (it looks simple to fix; I will look at it in the meantime):
   src/plugin/parse-msword/src/test/org/apache/nutch/parse/msword/TestMSWordParser.java
   src/plugin/parse-zip/src/test/org/apache/nutch/parse/zip/TestZipParser.java
   src/plugin/parse-rss/src/test/org/apache/nutch/parse/rss/TestRSSParser.java
   src/plugin/parse-pdf/src/test/org/apache/nutch/parse/pdf/TestPdfParser.java
   src/plugin/parse-ext/src/test/org/apache/nutch/parse/ext/TestExtParser.java
   src/plugin/parse-mspowerpoint/src/test/org/apache/nutch/parse/mspowerpoint/TestMSPowerPointParser.java
   src/plugin/parse-mspowerpoint/src/test/org/apache/nutch/parse/mspowerpoint/AllTests.java

After removal of all these non-compiling classes, the trunk tests complete successfully on my machine (JDK 1.4.2).
If there are no objections, especially from Andrzej, I can do the cleanup tomorrow.

P.
Re: how to add additional factor at search time to ranking score
AJ Chen wrote:
> It would be great if I could add some new functions to the nutch code to accomplish this. But if it requires customizing lucene code, that's fine. I have tried to use the most recent release (1.4.3) of the lucene source code, but it did not work. Are the lucene jar files included in the nutch release (0.7.1) very different from lucene 1.4.3? If yes, is it possible to get the source code for the lucene used in nutch?

Nutch uses lucene 1.9 (not yet an official release), built from lucene trunk. Simply grab the sources from lucene trunk and nutch should work fine with them.

P.
[jira] Commented: (NUTCH-142) NutchConf should use the thread context classloader
[ http://issues.apache.org/jira/browse/NUTCH-142?page=comments#action_12361492 ]

Piotr Kosiorowski commented on NUTCH-142:
-----------------------------------------

Thanks. Fixed in the 0.7 branch. Left open so it can be fixed in trunk after the trunk JUnit test problems are cleaned up (in the next few days).

> NutchConf should use the thread context classloader
> ---------------------------------------------------
>
>          Key: NUTCH-142
>          URL: http://issues.apache.org/jira/browse/NUTCH-142
>      Project: Nutch
>         Type: Improvement
>     Versions: 0.7
>     Reporter: Mike Cannon-Brookes
>
> Right now NutchConf uses its own static classloader, which is _evil_ in a J2EE scenario.
> This is simply fixed. Line 52:
>     private ClassLoader classLoader = NutchConf.class.getClassLoader();
> Should be:
>     private ClassLoader classLoader = Thread.currentThread().getContextClassLoader();
> This means that no matter where the Nutch classes are loaded from, it will use the correct J2EE classloader to try to find configuration files (i.e. from WEB-INF/classes).
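
The change proposed in NUTCH-142 can be sketched roughly as below. This is only an illustration, not the actual NutchConf code: the class name ConfResourceLoader and its two methods are hypothetical, and the fallback to the class's own loader when no context classloader is set is an extra assumption that the issue itself does not mention.

```java
import java.net.URL;

/** Hypothetical helper (not part of Nutch) showing context-classloader lookup. */
final class ConfResourceLoader {

    private ConfResourceLoader() {
    }

    /**
     * Prefer the thread context classloader, as NUTCH-142 proposes.
     * Assumption: if none is set, fall back to the loader of this class.
     */
    public static ClassLoader resolveClassLoader() {
        ClassLoader cl = Thread.currentThread().getContextClassLoader();
        if (cl == null) {
            cl = ConfResourceLoader.class.getClassLoader();
        }
        return cl;
    }

    /** Look up a configuration file by name; returns null if it is not found. */
    public static URL findResource(String name) {
        ClassLoader cl = resolveClassLoader();
        // getClassLoader() may itself return null for bootstrap classes,
        // so fall back to the system classloader in that case.
        return (cl != null) ? cl.getResource(name)
                            : ClassLoader.getSystemResource(name);
    }
}
```

In a web application, a call such as ConfResourceLoader.findResource("nutch-default.xml") made from a request thread would then normally search the webapp's own classloader (and so WEB-INF/classes), rather than only the loader that happened to load the Nutch jar.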