Re: Mega-cleanup in trunk/

2006-01-01 Thread Piotr Kosiorowski

Andrzej Bialecki wrote:

Hi,

I just committed a large patch to clean up trunk/ by removing obsolete and 
broken classes remaining from the 0.7.x development line. Please test 
that things still work as they should ...



Hi,
I am not sure what is wrong, but a lot of JUnit tests simply do not 
compile. I did an svn checkout to a new directory to be sure I do not 
have anything left over from my experiments.


I am looking at it right now, but I would suggest a temporary 
quick cleanup to make trunk testable:


1) Remove permanently, as the classes under test have been removed from trunk:
src/test/org/apache/nutch/pagedb/TestFetchListEntry.java
src/test/org/apache/nutch/pagedb/TestPage.java
src/test/org/apache/nutch/db/TestWebDB.java
src/test/org/apache/nutch/db/DBTester.java
src/test/org/apache/nutch/tools/TestSegmentMergeTool.java
2) Remove temporarily and create a JIRA issue to fix them:
src/test/org/apache/nutch/fetcher/TestFetcher.java
src/test/org/apache/nutch/fetcher/TestFetcherOutput.java

3) Remove unused import in:
src/test/org/apache/nutch/parse/TestParseText.java
4) Fix (as it looks simple to fix; I will look at it in the meantime):

src/plugin/parse-msword/src/test/org/apache/nutch/parse/msword/TestMSWordParser.java
src/plugin/parse-zip/src/test/org/apache/nutch/parse/zip/TestZipParser.java
src/plugin/parse-rss/src/test/org/apache/nutch/parse/rss/TestRSSParser.java
src/plugin/parse-pdf/src/test/org/apache/nutch/parse/pdf/TestPdfParser.java
src/plugin/parse-ext/src/test/org/apache/nutch/parse/ext/TestExtParser.java
src/plugin/parse-mspowerpoint/src/test/org/apache/nutch/parse/mspowerpoint/TestMSPowerPointParser.java
src/plugin/parse-mspowerpoint/src/test/org/apache/nutch/parse/mspowerpoint/AllTests.java

After removing all these non-compiling classes, the trunk tests complete 
successfully on my machine (JDK 1.4.2).


If there are no objections, especially from Andrzej, I can do the 
cleanup tomorrow.

P.




Re: how to add additional factor at search time to ranking score

2006-01-01 Thread Piotr Kosiorowski

AJ Chen wrote:

It would be great if I could add some new functions to the Nutch code to 
accomplish this. But if it requires customizing Lucene code, that's 
fine. I have tried to use the most recent release (1.4.3) of the Lucene 
source code, but it did not work.  Are the Lucene jar files included in 
the Nutch release (0.7.1) very different from Lucene 1.4.3?  If so, is 
it possible to get the source code for the Lucene version used in Nutch?


Nutch uses Lucene 1.9 (not yet released), built from Lucene 
trunk. Simply grab the sources from Lucene trunk and Nutch should work fine 
with them.

P.



[jira] Commented: (NUTCH-142) NutchConf should use the thread context classloader

2006-01-01 Thread Piotr Kosiorowski (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-142?page=comments#action_12361492 ]

Piotr Kosiorowski commented on NUTCH-142:
-

Thanks. Fixed in the 0.7 branch. Left open so it can be fixed in trunk once the 
trunk JUnit test problems are cleaned up (in the next few days).

 NutchConf should use the thread context classloader
 ---

  Key: NUTCH-142
  URL: http://issues.apache.org/jira/browse/NUTCH-142
  Project: Nutch
 Type: Improvement
 Versions: 0.7
 Reporter: Mike Cannon-Brookes


 Right now NutchConf uses its own static classloader, which is _evil_ in a 
 J2EE scenario.
 This is simply fixed. Line 52:
private ClassLoader classLoader = NutchConf.class.getClassLoader();
 Should be:
private ClassLoader classLoader = Thread.currentThread().getContextClassLoader();
 This means no matter where Nutch classes are loaded from, it will use the 
 correct J2EE classloader to try to find configuration files (i.e. from 
 WEB-INF/classes).
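
 The lookup order proposed in this report can be sketched in isolation as 
 follows. This is a minimal illustration, not the NutchConf code: the class 
 name ContextClassLoaderSketch, both method names and the resource name 
 example-conf.xml are hypothetical. The fallback to the class's own loader is 
 an added assumption, since Thread.getContextClassLoader() may return null.

```java
import java.net.URL;

// Hypothetical helper, not part of Nutch: it only illustrates the lookup order.
public class ContextClassLoaderSketch {

    // Prefer the thread context class loader, which a J2EE container
    // normally sets to the web application's loader. Fall back to the
    // loader of this class when no context loader is set (assumption).
    static ClassLoader resolveClassLoader() {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ContextClassLoaderSketch.class.getClassLoader();
        }
        return loader;
    }

    // Look up a configuration resource by name; returns null if it is not found.
    static URL findResource(String name) {
        ClassLoader loader = resolveClassLoader();
        if (loader == null) {
            // Only reached if this class was loaded by the bootstrap loader.
            return ClassLoader.getSystemResource(name);
        }
        return loader.getResource(name);
    }

    public static void main(String[] args) {
        System.out.println("loader: " + resolveClassLoader());
        // "example-conf.xml" is a placeholder resource name.
        System.out.println("resource: " + findResource("example-conf.xml"));
    }
}
```

 In a web application the context loader usually sees WEB-INF/classes, so a 
 lookup of this shape would find configuration files placed there even when 
 the Nutch jar itself is loaded by a parent classloader.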

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira