Hi Michael,
Not directly answering this question, but keep in mind that as mentioned in the
issue Sebastian referenced, there are many more places in Nutch that have the
same problem, so setting LC_ALL is probably a good idea in general (until that
issue is fixed...).
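To illustrate the kind of problem meant here: Java code that decodes bytes without naming a charset falls back to the JVM default, which on many JVMs depends on the locale (hence LC_ALL). Below is a minimal sketch of decoding with UTF-8 stated explicitly. It is not Nutch code; the class and method names are hypothetical and only standard JDK calls are used.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only (not part of Nutch).
public class ExplicitCharsetSketch {

    // Decode the whole stream as UTF-8, independent of the platform default.
    static String readAllUtf8(InputStream in) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            char[] buf = new char[1024];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Unicode escapes keep this source file ASCII-only.
        String original = "Z\u00fcrich caf\u00e9";
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        String decoded = readAllUtf8(new ByteArrayInputStream(utf8));
        // The round trip is lossless because both sides name UTF-8.
        System.out.println(decoded.equals(original));
    }
}
```

Naming the charset at each read site removes the dependency on machine settings, but every such site has to be changed, which is why setting LC_ALL is the quicker workaround.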
If you're worried about ot
Not sure it's practical to go around to all the Hadoop machines and change
their default encoding settings. Not sure it wouldn't break something else!
I'm wondering if there's a simple fix I could make to the source code to make
nutch.segment.SegmentReader use UTF-8 as a default when reading the
Dear all,
I plan to improve the hostdb functionality to have a DB_FETCHED delta for the
generate stage.
Let's say that for each website we keep generating while the number of fetched
pages is < 150.
The problem is that for some websites that condition will (almost) never be
met, because of their structure.
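One possible reading of this idea, sketched under loud assumptions: stop generating for a host once it reaches the limit, or once its fetched count has stopped growing between rounds. The class, method, and the minDelta parameter are hypothetical and are not part of Nutch's hostdb; only the limit of 150 comes from the message.

```java
// Hypothetical sketch, not Nutch code: a per-host generate condition
// that also looks at the change (delta) in the fetched count.
public class GenerateDeltaSketch {

    // Limit taken from the example in the message.
    static final int FETCHED_LIMIT = 150;

    // fetchedBefore: fetched count at the previous round (assumed available)
    // fetchedNow:    fetched count at the current round
    // minDelta:      assumed minimum growth required to keep generating
    static boolean shouldGenerate(int fetchedBefore, int fetchedNow, int minDelta) {
        if (fetchedNow >= FETCHED_LIMIT) {
            return false; // original condition: limit reached
        }
        int delta = fetchedNow - fetchedBefore;
        return delta >= minDelta; // stalled hosts are skipped
    }

    public static void main(String[] args) {
        System.out.println(shouldGenerate(40, 90, 1));   // still growing
        System.out.println(shouldGenerate(90, 90, 1));   // stalled below the limit
        System.out.println(shouldGenerate(140, 150, 1)); // limit reached
    }
}
```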
Fo
The third question can be:
1) Now we have hostdb that stores all statistics per host. You can read/write
to the database. Does it make sense to have both for the reporting?
Sent: Monday, December 04, 2017 at 7:47 PM
From: "Yossi Tamari"
To: user@nutch.apache.org
Subject: crawlcomplete
Hi,
I