[jira] Commented: (NUTCH-138) non-Latin-1 characters cannot be submitted for search

2006-01-02 Thread Piotr Kosiorowski (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-138?page=comments#action_12361520 ] Piotr Kosiorowski commented on NUTCH-138: I am not sure, but I suspect the problem is a bad Tomcat configuration. To handle special characters in query URLs

Re: Mega-cleanup in trunk/

2006-01-02 Thread Andrzej Bialecki
Piotr Kosiorowski wrote: Andrzej Bialecki wrote: Hi, I just committed a large patch to clean up the trunk/ of obsolete and broken classes remaining from the 0.7.x development line. Please test that things still work as they should ... Hi, I am not sure what is wrong but a lot of JUnit

Re: Trunk is broken

2006-01-02 Thread Thomas Jaeger
Hi Andrzej, Gal Nitzan wrote: It seems that Trunk is now broken... DmozParser seems to be broken, too. Its package declaration is still org.apache.nutch.crawl instead of org.apache.nutch.tools. TJ

[jira] Commented: (NUTCH-159) Specify temp/working directory for crawl

2006-01-02 Thread Doug Cutting (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-159?page=comments#action_12361541 ] Doug Cutting commented on NUTCH-159: mapred.local.dir is the thing to set. If that fails, then there is a bug. What did you have this set to? Specify temp/working

Re: Bug in DeleteDuplicates.java ?

2006-01-02 Thread Doug Cutting
Andrzej Bialecki wrote: Gal Nitzan wrote: This function throws IOException. Why? public long getPos() throws IOException { return (doc*INDEX_LENGTH)/maxDoc; } It should be throwing ArithmeticException. The IOException is required by the API of RecordReader.
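
The exchange above can be illustrated with a minimal sketch. It assumes a hypothetical interface named PositionReader standing in for the reader API (it is not the real Nutch RecordReader), and the value of INDEX_LENGTH and the long field types are invented for the example. It shows that the checked IOException comes from the declared interface signature, while a zero maxDoc raises the unchecked ArithmeticException, which needs no throws clause.

```java
import java.io.IOException;

// Hypothetical stand-in for the reader interface discussed in the thread;
// this is NOT the real Nutch RecordReader.
interface PositionReader {
    long getPos() throws IOException;
}

final class DocPositionReader implements PositionReader {
    // Assumed value, chosen only for illustration.
    private static final long INDEX_LENGTH = 1000L;

    private final long doc;
    private final long maxDoc;

    DocPositionReader(long doc, long maxDoc) {
        this.doc = doc;
        this.maxDoc = maxDoc;
    }

    @Override
    public long getPos() throws IOException {
        // Integer division: an unchecked ArithmeticException is thrown
        // when maxDoc == 0, even though only IOException is declared.
        return (doc * INDEX_LENGTH) / maxDoc;
    }
}

final class GetPosDemo {
    // Calls getPos() through the interface and reports what happened.
    static String describe(long doc, long maxDoc) {
        PositionReader reader = new DocPositionReader(doc, maxDoc);
        try {
            return "pos=" + reader.getPos();
        } catch (IOException e) {
            // Callers using the interface type must handle the checked exception.
            return "io-error";
        } catch (ArithmeticException e) {
            return "division-by-zero";
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(5, 10));
        System.out.println(describe(5, 0));
    }
}
```

Callers that go through the interface type have to handle IOException whether or not a given implementation can actually throw it.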

Re: [bug?] PRC called emthod require parameter

2006-01-02 Thread Doug Cutting
Stefan Groschupf wrote: I also note this line in client.java: public Writable[] call(Writable[] params, InetSocketAddress[] addresses) throws IOException { if (params.length == 0) return new Writable[0]; Do I understand it correctly that in case the remote method does not need any
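
A small sketch of the guard quoted above, under stated assumptions: Writable here is a hypothetical empty interface, not the Nutch class, and the body after the guard is a placeholder because the excerpt does not show the real dispatch code. It only demonstrates the effect of the early return: with an empty params array, nothing after the guard runs.

```java
import java.net.InetSocketAddress;

// Hypothetical stand-in; NOT the real Nutch Writable.
interface Writable {}

final class EmptyParamsGuardDemo {
    // Counts how many addresses the placeholder dispatch would have contacted.
    static int remoteCalls = 0;

    static Writable[] call(Writable[] params, InetSocketAddress[] addresses) {
        // Guard as quoted in the thread: empty params short-circuit the call.
        if (params.length == 0) return new Writable[0];

        // Placeholder for the real dispatch, which the excerpt does not show.
        remoteCalls += addresses.length;
        return new Writable[addresses.length];
    }

    public static void main(String[] args) {
        InetSocketAddress[] addresses = {
            InetSocketAddress.createUnresolved("localhost", 9000)
        };
        Writable[] result = call(new Writable[0], addresses);
        System.out.println("results=" + result.length + " remoteCalls=" + remoteCalls);
    }
}
```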

Re: IndexSorter optimizer

2006-01-02 Thread Doug Cutting
Andrzej Bialecki wrote: I'm happy to report that further tests performed on a larger index seem to show that the overall impact of the IndexSorter is definitely positive: performance improvements are significant, and the overall quality of results seems at least comparable, if not actually

[jira] Commented: (NUTCH-138) non-Latin-1 characters cannot be submitted for search

2006-01-02 Thread KuroSaka TeruHiko (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-138?page=comments#action_12361546 ] KuroSaka TeruHiko commented on NUTCH-138: You are right. With this Tomcat config, UTF-8 characters can be passed. Setting the following also works: useBodyEncodingForURI=true

[jira] Closed: (NUTCH-138) non-Latin-1 characters cannot be submitted for search

2006-01-02 Thread Piotr Kosiorowski (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-138?page=all ] Piotr Kosiorowski closed NUTCH-138: Resolution: Invalid. Setting URIEncoding in the Tomcat config file fixes the problem. non-Latin-1 characters cannot be submitted for search
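
Why a URI decoding setting matters can be shown with a standalone sketch. It does not use Tomcat or Nutch code; the class and method names are invented for the example, and it assumes the query is percent-encoded as UTF-8 by the client and decoded by the server with either UTF-8 or ISO-8859-1 (Latin-1).

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

final class QueryEncodingDemo {
    // Percent-encodes the query as UTF-8, then decodes it with the given charset,
    // mimicking a client and a server that may disagree about the encoding.
    static String roundTrip(String query, String decodeCharset) {
        try {
            String encoded = URLEncoder.encode(query, "UTF-8");
            return URLDecoder.decode(encoded, decodeCharset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("unknown charset: " + decodeCharset, e);
        }
    }

    public static void main(String[] args) {
        // Three non-Latin-1 characters, written as escapes to keep the source ASCII.
        String query = "\u65e5\u672c\u8a9e";

        String utf8 = roundTrip(query, "UTF-8");
        String latin1 = roundTrip(query, "ISO-8859-1");

        System.out.println("UTF-8 decode matches:      " + utf8.equals(query));
        System.out.println("ISO-8859-1 decode matches: " + latin1.equals(query));
        // Each 3-byte UTF-8 sequence becomes three separate Latin-1 characters.
        System.out.println("ISO-8859-1 decoded length: " + latin1.length());
    }
}
```

Decoding with the wrong charset silently turns each multi-byte character into several unrelated characters, so the search receives a different query string than the user typed.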

[jira] Commented: (NUTCH-138) non-Latin-1 characters cannot be submitted for search

2006-01-02 Thread Piotr Kosiorowski (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-138?page=comments#action_12361549 ] Piotr Kosiorowski commented on NUTCH-138: BTW, just create a user for yourself in the Nutch wiki and you should be able to add a new page with information without

Re: svn commit: r359822 - in /lucene/nutch/trunk: bin/ conf/ src/java/org/apache/nutch/crawl/ src/java/org/apache/nutch/fetcher/ src/java/org/apache/nutch/indexer/ src/java/org/apache/nutch/parse/ src

2006-01-02 Thread Andrzej Bialecki
Doug Cutting wrote: [EMAIL PROTECTED] wrote: Now users can select their own page signature implementation, possibly with better properties than the old one. Two implementations are provided: * MD5Signature: backward-compatible with the old schema. * TextProfileSignature: an example

Re: IndexSorter optimizer

2006-01-02 Thread Andrzej Bialecki
Doug Cutting wrote: Andrzej Bialecki wrote: Using the original index, it was possible for pages with high tf/idf of a term, but with a low boost value (the OPIC score), to outrank pages with high boost but lower tf/idf of a term. This phenomenon leads quite often to results that are

[jira] Created: (NUTCH-161) Plain text parser should use parser.character.encoding.default property for fall back encoding

2006-01-02 Thread KuroSaka TeruHiko (JIRA)
Plain text parser should use parser.character.encoding.default property for fall back encoding -- Key: NUTCH-161 URL: http://issues.apache.org/jira/browse/NUTCH-161 Project: Nutch

NullPointerException (new as of Dec 31st)

2006-01-02 Thread Rod Taylor
During a fetch I have recently started getting these (pretty consistently). task_r_5m9ybr 0.15 reduce copy java.lang.NullPointerException at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:991) at java.lang.Float.parseFloat(Float.java:394) at

Re: IndexSorter optimizer

2006-01-02 Thread Andrzej Bialecki
Doug Cutting wrote: I have committed this, along with the LuceneQueryOptimizer changes. I could only find one place where I was using numDocs() instead of maxDoc(). Right, I confused two bugs from different files - the other bug still exists in the committed version of the