[jira] Created: (NUTCH-97) make datanode starting port configurable

2005-09-26 Thread Stefan Groschupf (JIRA)
make datanode starting port configurable Key: NUTCH-97 URL: http://issues.apache.org/jira/browse/NUTCH-97 Project: Nutch Type: Improvement Versions: 0.8-dev Reporter: Stefan Groschupf Priority: Minor Fix

why task tracker ports random?

2005-09-26 Thread Stefan Groschupf
Hi, why are the taskReportPort and mapOutputPort randomly generated? I cannot see any reason for that and am wondering why we do not just make them configurable as well. I can understand that in some situations it is necessary to reinitialize the task tracker, but it can in any case use the same

API for injecting content into Nutch?

2005-09-26 Thread Goldschmidt, Dave
Hello, Is there an API of some sort for injecting content into Nutch *without* using Nutch's crawler? Or does anyone have ideas as to how to approach this problem? I.e. given a URL, a page of content, metadata about the page, links, etc., how can I inject this into Nutch without Nutch

Re: API for injecting content into Nutch?

2005-09-26 Thread Matt Kangas
Dave, you don't want to inject anything per se, at least according to Nutch terminology. Instead, you'll want to create your own synthetic crawler. Nutch's crawler outputs one segment file (directory of files, actually) per crawler pass. It is this segment that is processed by the nutch index

Re: why task tracker ports random?

2005-09-26 Thread Doug Cutting
Stefan Groschupf wrote: Besides that, a behavior like the datanode's, which iterates until it finds a free port, would be better than just random. That would be fine. Would a patch have a chance to be applied? I can create one, but I would not like to waste time in case people do not want to
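
A minimal sketch of the "iterate until a free port is found" behavior discussed in this thread. The class name PortFinder, the starting port 50000, and the attempt limit are assumptions made for illustration; this is not the datanode's or the task tracker's actual code.

```java
import java.io.IOException;
import java.net.ServerSocket;

/** Hypothetical helper for illustration only; not a Nutch class. */
class PortFinder {

    /**
     * Probes ports startPort, startPort + 1, ... and returns the first one
     * that can be bound, trying at most maxAttempts candidates.
     */
    static int findFreePort(int startPort, int maxAttempts) {
        for (int i = 0; i < maxAttempts; i++) {
            int candidate = startPort + i;
            if (candidate > 65535) {
                break; // ran past the valid TCP port range
            }
            // The probe socket is closed again by try-with-resources.
            try (ServerSocket probe = new ServerSocket(candidate)) {
                return candidate;
            } catch (IOException e) {
                // Port is in use or cannot be bound; try the next one.
            }
        }
        throw new IllegalStateException(
            "no free port found starting at " + startPort);
    }

    public static void main(String[] args) {
        // 50000 stands in for a configurable starting port (assumed value).
        System.out.println("free port: " + findFreePort(50000, 100));
    }
}
```

Note that the probe socket is closed before the caller binds the port for real, so another process can take the port in between. A real server would keep the successfully bound socket instead of probing and rebinding.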

Re: Random number generators for NDFS block numbers

2005-09-26 Thread Doug Cutting
Paul Baclace wrote: Doug Cutting expressed a concern to me about using util.Random to generate random 64 bit block numbers for NDFS. The following is my analysis. Nice stuff, Paul. Thanks. It just occurred to me that perhaps we could simply use sequential block numbering. All block ids

Re: why task tracker ports random?

2005-09-26 Thread Paul Baclace
Stefan Groschupf wrote: Besides that, a behavior like the datanode's, which iterates until it finds a free port, would be better than just random. There is a possibility that a test run could start many processes on one machine, and a sequential search for an available port could be contentious. If you

Re: why task tracker ports random?

2005-09-26 Thread Stefan Groschupf
Hi Paul, my call stack says that actually no other classes are using the tasktracker. Besides that, the tasktracker could implement NutchConfigurable; then all problems would be solved, since this is the IoC pattern. Or am I overlooking something? Stefan On 27.09.2005 at 01:24, Paul Baclace wrote: Stefan

Re: API for injecting content into Nutch?

2005-09-26 Thread Jon Shoberg
Goldschmidt, Dave wrote: Hello, Is there an API of some sort for injecting content into Nutch *without* using Nutch's crawler? Or does anyone have ideas as to how to approach this problem? I.e. given a URL, a page of content, metadata about the page, links, etc., how can I inject this

Re: why task tracker ports random?

2005-09-26 Thread Paul Baclace
Stefan Groschupf wrote: Hi Paul, my call stack says that actually no other classes are using the tasktracker. Besides that, the tasktracker could implement NutchConfigurable; then all problems would be solved, since this is the IoC pattern. Or am I overlooking something? I am thinking about the mapred branch

failing of org.apache.nutch.tools.TestSegmentMergeTool?

2005-09-26 Thread Chris Mattmann
Hi there, I just noticed after checking out the latest SVN of Nutch that the TestSegmentMergeTool JUnit test currently fails for me when I run ant test for Nutch. Is anyone experiencing the same problem? Here is the relevant information which I captured out of the

Re: Random number generators for NDFS block numbers

2005-09-26 Thread Paul Baclace
Doug Cutting wrote: It just occurred to me that perhaps we could simply use sequential block numbering. All block ids are generated centrally on the namenode. I'm not sure what the advantage of sequential block numbers would be, since long-period PRNG block numbering does not even need to
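
To make the two options in this thread concrete, here is a rough sketch of a sequential allocator next to a random one, both assumed to run in a single central place such as the namenode. All class names are hypothetical, and the sketch does not reflect how NDFS actually allocates block ids.

```java
import java.util.HashSet;
import java.util.Random;
import java.util.Set;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical illustration only; not NDFS code. */
class BlockIdSketch {

    /** Sequential ids: unique by construction while the counter is kept. */
    static final class SequentialIds {
        private final AtomicLong next;

        SequentialIds(long firstId) {
            next = new AtomicLong(firstId);
        }

        long allocate() {
            return next.getAndIncrement();
        }
    }

    /** Random ids: draw a 64-bit value and retry if it is already in use. */
    static final class RandomIds {
        // java.util.Random has a 48-bit seed, so nextLong() cannot
        // produce every possible 64-bit value.
        private final Random rng;
        private final Set<Long> inUse = new HashSet<>();

        RandomIds(long seed) {
            rng = new Random(seed);
        }

        long allocate() {
            long id;
            do {
                id = rng.nextLong();
            } while (!inUse.add(id)); // add() returns false on a collision
            return id;
        }
    }

    public static void main(String[] args) {
        SequentialIds seq = new SequentialIds(0L);
        RandomIds rnd = new RandomIds(System.nanoTime());
        System.out.println("sequential: " + seq.allocate() + ", " + seq.allocate());
        System.out.println("random:     " + rnd.allocate() + ", " + rnd.allocate());
    }
}
```

The sequential variant only has to remember one counter, while the random variant in this sketch has to remember every id in use in order to detect collisions.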