make datanode starting port configurable
Key: NUTCH-97
URL: http://issues.apache.org/jira/browse/NUTCH-97
Project: Nutch
Type: Improvement
Versions: 0.8-dev
Reporter: Stefan Groschupf
Priority: Minor
Fix
Hi,
Why are the taskReportPort and mapOutputPort randomly generated?
I cannot see any reason for that and am wondering why we do not just make
them configurable as well.
I can understand that in some situations it is necessary to
reinitialize the task tracker, but in any case it can use the same
Hello,
Is there an API of some sort for injecting content into Nutch *without*
using Nutch's crawler? Or does anyone have ideas as to how to approach
this problem? I.e. given a URL, a page of content, metadata about the
page, links, etc., how can I inject this into Nutch without Nutch
Dave, you don't want to inject anything per se, at least according
to Nutch terminology. Instead, you'll want to create your own synthetic
crawler. Nutch's crawler outputs one segment file (directory of
files, actually) per crawler pass. It is this segment that is
processed by the Nutch index
Stefan Groschupf wrote:
Besides that, a behavior like the datanode's, which iterates until it finds a
free port, would be better than just random.
That would be fine.
Would a patch have a chance of being applied? I can create one, but I
would not like to waste time in case people do not want to
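
To make the "iterate until it finds a free port" idea from this thread concrete, here is a minimal sketch in Java. It is not the datanode's actual code: the class name PortScanner, the method bindFirstFree, and the startPort and maxTries parameters are all hypothetical, chosen only for illustration. The sketch assumes the starting port would come from configuration.

```java
import java.io.IOException;
import java.net.ServerSocket;

/** Hypothetical helper, not part of Nutch: probes ports upward from a configured start. */
class PortScanner {

    /**
     * Tries startPort, startPort + 1, ... and returns a ServerSocket bound to
     * the first port that can be bound. Gives up after maxTries attempts.
     */
    static ServerSocket bindFirstFree(int startPort, int maxTries) throws IOException {
        IOException last = null;
        for (int port = startPort; port < startPort + maxTries && port <= 65535; port++) {
            try {
                // Actually binding is the only race-free way to claim a port.
                return new ServerSocket(port);
            } catch (IOException e) {
                last = e; // port in use or not permitted; try the next one
            }
        }
        throw new IOException("no free port found starting at " + startPort, last);
    }
}
```

Because the socket is bound inside the loop, two processes scanning the same range cannot both end up with the same port; the loser of the race simply moves on to the next number.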
Paul Baclace wrote:
Doug Cutting expressed a concern to me about using util.Random to generate
random 64 bit block numbers for NDFS. The following is my analysis.
Nice stuff, Paul. Thanks.
It just occurred to me that perhaps we could simply use sequential block
numbering. All block ids
Stefan Groschupf wrote:
Besides that, a behavior like the datanode's, which iterates until it finds
a free port, would be better than just random.
There is a possibility that a test run could start many processes
on one machine and a sequential available port search could be
contentious.
If you
Hi Paul,
my call stack says that actually no other classes are using the tasktracker.
Besides that, tasktracker could implement NutchConfigurable; then all
problems would be solved, since this is the IoC pattern.
Or am I overlooking something?
Stefan
On 27.09.2005 at 01:24, Paul Baclace wrote:
Stefan
Goldschmidt, Dave wrote:
Hello,
Is there an API of some sort for injecting content into Nutch *without*
using Nutch's crawler? Or does anyone have ideas as to how to approach
this problem? I.e. given a URL, a page of content, metadata about the
page, links, etc., how can I inject this
Stefan Groschupf wrote:
Hi Paul,
my call stack says that actually no other classes are using the tasktracker.
Besides that, tasktracker could implement NutchConfigurable; then all
problems would be solved, since this is the IoC pattern.
Or am I overlooking something?
I am thinking about the mapred branch
Hi there,
I just noticed after checking out the latest SVN of Nutch that I am
currently failing the TestSegmentMergeTool JUnit test when I type ant test
for Nutch. Is anyone experiencing the same problem? Here is the relevant
information which I captured out of the
Doug Cutting wrote:
It just occurred to me that perhaps we could simply use sequential block
numbering. All block ids are generated centrally on the namenode.
I'm not sure what the advantage of sequential block numbers would be
since long-period PRNG block numbering does not even need to
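
The two block-numbering options discussed in this thread can be contrasted with a small sketch. This is illustrative only and not NDFS code: the class BlockIds and its methods are hypothetical names. It assumes, as stated above, that all ids are handed out by a single central allocator on the namenode.

```java
import java.util.Random;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical allocator, not part of NDFS: shows both numbering schemes side by side. */
class BlockIds {
    private final AtomicLong next;

    BlockIds(long firstId) {
        this.next = new AtomicLong(firstId);
    }

    /** Sequential scheme: unique by construction as long as the counter state is persisted. */
    long nextSequential() {
        return next.getAndIncrement();
    }

    /**
     * Random scheme: draws a 64-bit value from the given generator.
     * Note that java.util.Random has a 48-bit seed, so its nextLong()
     * cannot produce every possible long value.
     */
    static long nextRandom(Random rng) {
        return rng.nextLong();
    }
}
```

The sequential variant needs its counter to survive namenode restarts, while the random variant needs no stored state but relies on the generator's period and range to avoid collisions.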