[ http://issues.apache.org/jira/browse/NUTCH-138?page=comments#action_12361520 ]
Piotr Kosiorowski commented on NUTCH-138:
-
I am not sure, but I suspect the problem is a bad Tomcat configuration.
To handle special characters in query urls
Piotr Kosiorowski wrote:
Andrzej Bialecki wrote:
Hi,
I just committed a large patch to clean up the trunk/ of obsolete and
broken classes remaining from the 0.7.x development line. Please test
that things still work as they should ...
Hi,
I am not sure what is wrong but a lot of JUnit
Hi Andrzej,
Gal Nitzan wrote:
It seems that Trunk is now broken...
DmozParser seems to be broken, too. Its package declaration is still
org.apache.nutch.crawl instead of org.apache.nutch.tools.
TJ
[ http://issues.apache.org/jira/browse/NUTCH-159?page=comments#action_12361541 ]
Doug Cutting commented on NUTCH-159:
mapred.local.dir is the thing to set. If that fails, then there is a bug.
What did you have this set to?
Specify temp/working
Andrzej Bialecki wrote:
Gal Nitzan wrote:
This function throws IOException. Why?
public long getPos() throws IOException {
    return (doc*INDEX_LENGTH)/maxDoc;
}
It should be throwing ArithmeticException
The IOException is required by the API of RecordReader.
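The point about the interface can be illustrated with a minimal sketch. Everything named here is hypothetical: PositionReader stands in for the real RecordReader API (which has more methods), SortedIndexReader is an invented implementation, and the INDEX_LENGTH value and the long field types are assumptions made only for the example.

```java
import java.io.IOException;

// Hypothetical stand-in for the reader API; not the real RecordReader.
interface PositionReader {
    long getPos() throws IOException;
}

// Invented implementation modeled on the quoted getPos() snippet.
class SortedIndexReader implements PositionReader {
    // Assumed value, chosen only to keep the arithmetic easy to follow.
    static final long INDEX_LENGTH = 100L;

    private final long doc;
    private final long maxDoc;

    SortedIndexReader(long doc, long maxDoc) {
        this.doc = doc;
        this.maxDoc = maxDoc;
    }

    // The throws clause mirrors the interface. The body never raises
    // IOException itself; integer division by zero would raise the
    // unchecked ArithmeticException, which needs no declaration.
    @Override
    public long getPos() throws IOException {
        return (doc * INDEX_LENGTH) / maxDoc;
    }
}

class GetPosDemo {
    public static void main(String[] args) throws IOException {
        PositionReader reader = new SortedIndexReader(5, 10);
        System.out.println(reader.getPos());

        try {
            new SortedIndexReader(1, 0).getPos();
        } catch (ArithmeticException e) {
            System.out.println("unchecked: " + e.getMessage());
        }
    }
}
```

Callers that hold the interface type must handle IOException regardless of what a given implementation does, which is why the declaration follows the API rather than the method body.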
Stefan Groschupf wrote:
I also note this line in client.java
public Writable[] call(Writable[] params, InetSocketAddress[] addresses)
throws IOException {
if (params.length == 0) return new Writable[0];
Do I understand correctly that in case the remote method does not need
any
Andrzej Bialecki wrote:
I'm happy to report that further tests performed on a larger index seem
to show that the overall impact of the IndexSorter is definitely
positive: performance improvements are significant, and the overall
quality of results seems at least comparable, if not actually
[ http://issues.apache.org/jira/browse/NUTCH-138?page=comments#action_12361546 ]
KuroSaka TeruHiko commented on NUTCH-138:
-
You are right. With this Tomcat config, UTF-8 characters can be passed.
Setting useBodyEncodingForURI=true also works.
[ http://issues.apache.org/jira/browse/NUTCH-138?page=all ]
Piotr Kosiorowski closed NUTCH-138:
---
Resolution: Invalid
Setting URIEncoding in the Tomcat config file fixes the problem.
non-Latin-1 characters cannot be submitted for search
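Why the decoding charset matters can be shown with a small sketch that uses only java.net.URLEncoder and java.net.URLDecoder, not Tomcat itself. It assumes the browser percent-encodes the query as UTF-8 and that a container without URIEncoding set decodes it as ISO-8859-1, as older Tomcat releases did. The class and method names are invented for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

class QueryDecodingDemo {
    // Encode the query as UTF-8 (assumed browser behavior), then decode
    // it with the charset the container is assumed to use.
    static String roundTrip(String query, String decodeCharset) {
        try {
            String encoded = URLEncoder.encode(query, "UTF-8");
            return URLDecoder.decode(encoded, decodeCharset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Three Japanese characters, written as escapes so the source
        // file encoding does not affect the example.
        String query = "\u65e5\u672c\u8a9e";

        // Matching charsets recover the original query.
        System.out.println(query.equals(roundTrip(query, "UTF-8")));

        // A mismatched charset turns each UTF-8 byte into its own
        // character, so the search term no longer matches.
        System.out.println(query.equals(roundTrip(query, "ISO-8859-1")));
    }
}
```

With the mismatched charset, the three-character query comes back as nine unrelated characters, which is consistent with the symptom described in the issue title.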
[ http://issues.apache.org/jira/browse/NUTCH-138?page=comments#action_12361549 ]
Piotr Kosiorowski commented on NUTCH-138:
-
BTW - just create a user for yourself in the Nutch Wiki and you should be able to add
a new page with information without
Doug Cutting wrote:
[EMAIL PROTECTED] wrote:
Now users can select their own page signature implementation, possibly
with better properties than the old one.
Two implementations are provided:
* MD5Signature: backward-compatible with the old schema.
* TextProfileSignature: an example
Doug Cutting wrote:
Andrzej Bialecki wrote:
Using the original index, it was possible for pages with high tf/idf
of a term, but with a low boost value (the OPIC score), to outrank
pages with high boost but lower tf/idf of a term. This phenomenon
leads quite often to results that are
Plain text parser should use parser.character.encoding.default property for
fall back encoding
--
Key: NUTCH-161
URL: http://issues.apache.org/jira/browse/NUTCH-161
Project: Nutch
During a fetch I have recently started getting these (pretty
consistently).
task_r_5m9ybr 0.15 reduce copy java.lang.NullPointerException at
sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:991)
at
java.lang.Float.parseFloat(Float.java:394) at
Doug Cutting wrote:
I have committed this, along with the LuceneQueryOptimizer changes.
I could only find one place where I was using numDocs() instead of
maxDoc().
Right, I confused two bugs from different files - the other bug still
exists in the committed version of the