[jira] Updated: (HADOOP-2185) Server ports: to roll or not to roll.

2007-12-05 Thread dhruba borthakur (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dhruba borthakur updated HADOOP-2185:
-

Resolution: Fixed
Status: Resolved  (was: Patch Available)

I just committed this. Thanks Konstantin!

> Server ports: to roll or not to roll.
> -
>
> Key: HADOOP-2185
> URL: https://issues.apache.org/jira/browse/HADOOP-2185
> Project: Hadoop
> Issue Type: Improvement
> Components: conf, dfs, mapred
> Affects Versions: 0.15.0
> Reporter: Konstantin Shvachko
> Assignee: Konstantin Shvachko
> Fix For: 0.16.0
>
> Attachments: FixedPorts3.patch, FixedPorts4.patch, port.stack
>
>
> Looked at the issues related to port rolling. My impression is that port
> rolling is required only for the unit tests to run.
> Even the name-node port should roll there, which we don't have now, in order
> to be able to start two clusters for testing, say, distcp.
> For real clusters, on the contrary, port rolling is not desired and is
> sometimes even prohibited.
> So we should have a way to ban port rolling. My proposal is to
> # use ephemeral port 0 if port rolling is desired
> # if a specific port is specified, then port rolling should not happen at all,
> meaning that a server is either able or not able to start on that particular port.
> The desired port is specified via configuration parameters.
> - Name-node: fs.default.name = host:port
> - Data-node: dfs.datanode.port
> - Job-tracker: mapred.job.tracker = host:port
> - Task-tracker: mapred.task.tracker.report.bindAddress = host
>   The task-tracker currently has no option to specify a port; it always
>   uses the ephemeral port 0, so I propose to add one.
> - Secondary node does not need a port to listen on.
> For info servers we have two sets of config variables, *.info.bindAddress and
> *.info.port, except for the task tracker, which calls them *.http.bindAddress
> and *.http.port instead of "info".
> With respect to the info servers I propose to completely eliminate the port
> parameters and use the form
> *.info.bindAddress = host:port
> Info servers should do the same thing, namely start or fail on the specified
> port if it is not 0, and start on any free port if it is 0 (ephemeral).
> For the task-tracker I would rename tasktracker.http.bindAddress to
> mapred.task.tracker.info.bindAddress
> For the data-node, dfs.datanode.info.bindAddress should be included
> in the default config.
> Is there a reason why it is not there?
> This is the summary of proposed changes:
> || Server || current name = value || proposed name = value ||
> | NameNode | fs.default.name = host:port | same |
> | | dfs.info.bindAddress = host | dfs.http.bindAddress = host:port |
> | DataNode | dfs.datanode.bindAddress = host | dfs.datanode.bindAddress = host:port |
> | | dfs.datanode.port = port | eliminate |
> | | dfs.datanode.info.bindAddress = host | dfs.datanode.http.bindAddress = host:port |
> | | dfs.datanode.info.port = port | eliminate |
> | JobTracker | mapred.job.tracker = host:port | same |
> | | mapred.job.tracker.info.bindAddress = host | mapred.job.tracker.http.bindAddress = host:port |
> | | mapred.job.tracker.info.port = port | eliminate |
> | TaskTracker | mapred.task.tracker.report.bindAddress = host | mapred.task.tracker.report.bindAddress = host:port |
> | | tasktracker.http.bindAddress = host | mapred.task.tracker.http.bindAddress = host:port |
> | | tasktracker.http.port = port | eliminate |
> | SecondaryNameNode | dfs.secondary.info.bindAddress = host | dfs.secondary.http.bindAddress = host:port |
> | | dfs.secondary.info.port = port | eliminate |
> Do we also want to set some uniform naming convention for the configuration
> variables?
> Like having hdfs instead of dfs, or info instead of http, or systematically
> using either datanode or data.node, would make that look better in my opinion.
> So these are all +*api*+ changes. I would +*really*+ like some feedback on
> this, especially from people who deal with configuration issues in practice.
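
The start-or-fail rule in the quoted description can be sketched in Java roughly as follows. This is only an illustration and not code from the FixedPorts patches: the class FixedPortSketch and the helpers parseHostPort and startOrFail are hypothetical names, and a plain java.net.ServerSocket stands in for the real Hadoop servers.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

/** Illustrative sketch only; all names here are hypothetical. */
final class FixedPortSketch {

    /** Splits a "host:port" configuration value into a socket address. */
    static InetSocketAddress parseHostPort(String value) {
        int colon = value.lastIndexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("Expected host:port but got: " + value);
        }
        String host = value.substring(0, colon);
        int port = Integer.parseInt(value.substring(colon + 1));
        return new InetSocketAddress(host, port);
    }

    /**
     * Binds exactly once. Port 0 lets the OS pick any free port; any other
     * port either binds or throws, with no retry on port + 1.
     */
    static ServerSocket startOrFail(String hostPort) throws IOException {
        InetSocketAddress addr = parseHostPort(hostPort);
        ServerSocket socket = new ServerSocket();
        try {
            socket.bind(addr);
        } catch (IOException e) {
            socket.close();
            throw e;
        }
        return socket;
    }

    public static void main(String[] args) throws IOException {
        // Test-style setup: ask for port 0 and read back what the OS assigned.
        try (ServerSocket first = startOrFail("127.0.0.1:0")) {
            int chosen = first.getLocalPort();
            System.out.println("ephemeral port chosen: " + chosen);
            // Cluster-style setup: a specific port that is busy must fail.
            try (ServerSocket second = startOrFail("127.0.0.1:" + chosen)) {
                System.out.println("unexpected: second bind succeeded");
            } catch (IOException expected) {
                System.out.println("fixed port busy, server refuses to start");
            }
        }
    }
}
```

With a single host:port value, unit tests can request port 0 and read the assigned port back, while a real cluster names a fixed port and gets an immediate failure if it is taken.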

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HADOOP-2185) Server ports: to roll or not to roll.

2007-12-04 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HADOOP-2185:


Attachment: (was: FixedPorts2.patch)



[jira] Updated: (HADOOP-2185) Server ports: to roll or not to roll.

2007-12-04 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HADOOP-2185:


Status: Patch Available  (was: Open)



[jira] Updated: (HADOOP-2185) Server ports: to roll or not to roll.

2007-12-04 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HADOOP-2185:


Attachment: FixedPorts4.patch

This is a newer version.



[jira] Updated: (HADOOP-2185) Server ports: to roll or not to roll.

2007-12-04 Thread dhruba borthakur (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dhruba borthakur updated HADOOP-2185:
-

Status: Open  (was: Patch Available)

Hi Konstantin, I am finding that this patch does not merge cleanly with trunk.
Can you please upload a new patch? Thanks.



[jira] Updated: (HADOOP-2185) Server ports: to roll or not to roll.

2007-12-03 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HADOOP-2185:


Attachment: FixedPorts3.patch

Dhruba, thanks for the feedback. I finally realized why the new tests were 
sometimes failing.
The problem is with the clients.

Example 1: The name-node instantiates Trash, which creates a DFSClient (even if
trash is disabled).
When the name-node stops, this DFSClient remains up and the secondary name-node
will not start because it cannot create a client. Namely, the secondary name-node
just hangs trying to connect to the main name-node (RPC.waitForProxy()).

Example 2: A similar thing happens with the JobTracker, which also creates a
DFSClient in order to remove a file but never closes it. So the next start of
the JobTracker hangs the same way as in the previous example.

In both cases, if you wait long enough the clients eventually die, which is why
the failure is intermittent.

I am closing the clients inside my tests now. Closing clients within Trash or
JobTracker breaks other unit tests, because the clients are static objects, and
closing a client once destroys that object for everybody else who opened the
client inside the same JVM.
Fixing that is beyond the scope of this patch; I'll open another issue for
the problem.
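
The shared-client hazard described above can be illustrated with a small Java sketch. It is a simplified model, not Hadoop code: the nested Client class is a hypothetical stand-in for DFSClient, and the static map is an assumed way of modelling "one static client object per JVM".

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative sketch only; Client is a hypothetical stand-in, not DFSClient. */
final class SharedClientSketch {

    static final class Client {
        private boolean open = true;

        void close() {
            open = false;
        }

        boolean isOpen() {
            return open;
        }
    }

    /** One cached instance per key, shared by every caller in the JVM. */
    private static final Map<String, Client> CACHE = new HashMap<>();

    static synchronized Client get(String key) {
        return CACHE.computeIfAbsent(key, k -> new Client());
    }

    public static void main(String[] args) {
        Client usedByServer = get("cluster-A"); // e.g. obtained inside a server
        Client usedByTest = get("cluster-A");   // e.g. obtained inside a unit test
        usedByServer.close();                   // one owner closes "its" client
        System.out.println("same object: " + (usedByServer == usedByTest));
        System.out.println("test's client still open: " + usedByTest.isOpen());
    }
}
```

Because both callers hold the same object, a close issued by one of them is visible to the other, which matches the reason given above for closing clients only inside the tests.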

All tests pass now.
As I mentioned before, the findBugs warning about assigning to static fields 
will remain unfixed.


[jira] Updated: (HADOOP-2185) Server ports: to roll or not to roll.

2007-12-03 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HADOOP-2185:


Status: Patch Available  (was: Open)



[jira] Updated: (HADOOP-2185) Server ports: to roll or not to roll.

2007-12-03 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HADOOP-2185:


Attachment: (was: FixedPorts.patch)

> Server ports: to roll or not to roll.
> -
>
> Key: HADOOP-2185
> URL: https://issues.apache.org/jira/browse/HADOOP-2185
> Project: Hadoop
>  Issue Type: Improvement
>  Components: conf, dfs, mapred
>Affects Versions: 0.15.0
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
> Fix For: 0.16.0
>
> Attachments: FixedPorts2.patch, FixedPorts3.patch, port.stack
>
>
> Looked at the issues related to port rolling. My impression is that port 
> rolling is required only for the unit tests to run.
> Even the name-node port should roll there, which we don't have now, in order 
> to be able to start 2 cluster for testing say dist cp.
> For real clusters on the contrary port rolling is not desired and some times 
> even prohibited.
> So we should have a way of to ban port rolling. My proposition is to
> # use ephemeral port 0 if port rolling is desired
> # if a specific port is specified then port rolling should not happen at all, 
> meaning that a 
> server is either able or not able to start on that particular port.
> The desired port is specified via configuration parameters.
> - Name-node: fs.default.name = host:port
> - Data-node: dfs.datanode.port
> - Job-tracker: mapred.job.tracker = host:port
> - Task-tracker: mapred.task.tracker.report.bindAddress = host
>   The task-tracker currently has no option to specify a port; it always 
>   uses the ephemeral port 0, so I propose to add one.
> - Secondary node does not need a port to listen on.
> For info servers we have two sets of config variables, *.info.bindAddress and 
> *.info.port, except for the task tracker, which calls them *.http.bindAddress 
> and *.http.port instead of "info".
> With respect to the info servers I propose to completely eliminate the port 
> parameters and use the form 
> *.info.bindAddress = host:port
> Info servers should do the same thing, namely start or fail on the specified 
> port if it is not 0, and start on any free port if it is the ephemeral port 0.
> For the task-tracker I would rename tasktracker.http.bindAddress to 
> mapred.task.tracker.info.bindAddress
> For the data-node, dfs.datanode.info.bindAddress should be included 
> in the default config.
> Is there a reason why it is not there?
> This is the summary of proposed changes:
> || Server || current name = value || proposed name = value ||
> | NameNode | fs.default.name = host:port | same |
> | | dfs.info.bindAddress = host | dfs.http.bindAddress = host:port |
> | DataNode | dfs.datanode.bindAddress = host | dfs.datanode.bindAddress = host:port |
> | | dfs.datanode.port = port | eliminate |
> | | dfs.datanode.info.bindAddress = host | dfs.datanode.http.bindAddress = host:port |
> | | dfs.datanode.info.port = port | eliminate |
> | JobTracker | mapred.job.tracker = host:port | same |
> | | mapred.job.tracker.info.bindAddress = host | mapred.job.tracker.http.bindAddress = host:port |
> | | mapred.job.tracker.info.port = port | eliminate |
> | TaskTracker | mapred.task.tracker.report.bindAddress = host | mapred.task.tracker.report.bindAddress = host:port |
> | | tasktracker.http.bindAddress = host | mapred.task.tracker.http.bindAddress = host:port |
> | | tasktracker.http.port = port | eliminate |
> | SecondaryNameNode | dfs.secondary.info.bindAddress = host | dfs.secondary.http.bindAddress = host:port |
> | | dfs.secondary.info.port = port | eliminate |
> Do we also want to set some uniform naming convention for the configuration 
> variables?
> Things like using hdfs instead of dfs, or info instead of http, or 
> systematically using either datanode or data.node, would make that look 
> better in my opinion.
> So these are all +*api*+ changes. I would +*really*+ like some feedback on 
> this, especially from people who deal with configuration issues in practice.
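
The start-or-fail rule in the description can be sketched in Java roughly as
follows. This is only an illustration under stated assumptions, not code from
any of the FixedPorts patches: the class PortBindingSketch and the helpers
parseHostPort and bindOrFail are hypothetical names, and the sketch uses plain
java.net sockets rather than Hadoop's own server classes. It shows the two
cases: port 0 lets the OS pick any free port, and a specific port either binds
or fails, with no rolling to the next port.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

/** Hypothetical helper for illustration; not part of Hadoop. */
final class PortBindingSketch {

    /** Splits a "host:port" value into a socket address; port 0 means "any free port". */
    static InetSocketAddress parseHostPort(String hostPort) {
        int colon = hostPort.lastIndexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("expected host:port, got " + hostPort);
        }
        String host = hostPort.substring(0, colon);
        int port = Integer.parseInt(hostPort.substring(colon + 1));
        return new InetSocketAddress(host, port);
    }

    /**
     * Binds exactly the requested address. There is no retry on port + 1:
     * a busy fixed port surfaces as an IOException to the caller.
     */
    static ServerSocket bindOrFail(InetSocketAddress address) throws IOException {
        ServerSocket socket = new ServerSocket();
        try {
            socket.bind(address);
            return socket;
        } catch (IOException e) {
            socket.close();
            throw e;
        }
    }

    public static void main(String[] args) throws IOException {
        // Port 0: the OS chooses a free port, which is what unit tests need.
        try (ServerSocket any = bindOrFail(parseHostPort("127.0.0.1:0"))) {
            int chosen = any.getLocalPort();
            System.out.println("ephemeral request bound to port " + chosen);

            // Fixed port that is already taken: startup fails instead of rolling.
            try (ServerSocket second = bindOrFail(parseHostPort("127.0.0.1:" + chosen))) {
                System.out.println("unexpected: second bind succeeded");
            } catch (IOException expected) {
                System.out.println("fixed port busy, startup fails: " + expected.getMessage());
            }
        }
    }
}
```

A single host:port value carries both pieces of information, which is why the
separate *.port parameters in the table become redundant under this scheme.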




[jira] Updated: (HADOOP-2185) Server ports: to roll or not to roll.

2007-12-01 Thread dhruba borthakur (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dhruba borthakur updated HADOOP-2185:
-

Attachment: port.stack

Stack trace of TestHDFSServerPorts when it hung.

> Server ports: to roll or not to roll.
> -
>
> Key: HADOOP-2185
> URL: https://issues.apache.org/jira/browse/HADOOP-2185
> Project: Hadoop
>  Issue Type: Improvement
>  Components: conf, dfs, mapred
>Affects Versions: 0.15.0
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
> Fix For: 0.16.0
>
> Attachments: FixedPorts.patch, FixedPorts2.patch, port.stack



[jira] Updated: (HADOOP-2185) Server ports: to roll or not to roll.

2007-12-01 Thread dhruba borthakur (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dhruba borthakur updated HADOOP-2185:
-

Status: Open  (was: Patch Available)

While running unit tests on trunk with this patch, I got a timeout for TestHDFSServerPorts:

[junit] Tests run: 1, Failures: 0, Errors: 1, Time elapsed: 0 sec
[junit] Test org.apache.hadoop.dfs.TestHDFSServerPorts FAILED (timeout)

I will attach the stack trace to this JIRA.

> Server ports: to roll or not to roll.
> -
>
> Key: HADOOP-2185
> URL: https://issues.apache.org/jira/browse/HADOOP-2185
> Project: Hadoop
>  Issue Type: Improvement
>  Components: conf, dfs, mapred
>Affects Versions: 0.15.0
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
> Fix For: 0.16.0
>
> Attachments: FixedPorts.patch, FixedPorts2.patch



[jira] Updated: (HADOOP-2185) Server ports: to roll or not to roll.

2007-11-26 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HADOOP-2185:


Status: Patch Available  (was: Open)

> Server ports: to roll or not to roll.
> -
>
> Key: HADOOP-2185
> URL: https://issues.apache.org/jira/browse/HADOOP-2185
> Project: Hadoop
>  Issue Type: Improvement
>  Components: conf, dfs, mapred
>Affects Versions: 0.15.0
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
> Fix For: 0.16.0
>
> Attachments: FixedPorts.patch, FixedPorts2.patch



[jira] Updated: (HADOOP-2185) Server ports: to roll or not to roll.

2007-11-26 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HADOOP-2185:


Status: Open  (was: Patch Available)

> Server ports: to roll or not to roll.
> -
>
> Key: HADOOP-2185
> URL: https://issues.apache.org/jira/browse/HADOOP-2185
> Project: Hadoop
>  Issue Type: Improvement
>  Components: conf, dfs, mapred
>Affects Versions: 0.15.0
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
> Fix For: 0.16.0
>
> Attachments: FixedPorts.patch, FixedPorts2.patch



[jira] Updated: (HADOOP-2185) Server ports: to roll or not to roll.

2007-11-26 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HADOOP-2185:


Attachment: (was: FixedPorts1.patch)

> Server ports: to roll or not to roll.
> -
>
> Key: HADOOP-2185
> URL: https://issues.apache.org/jira/browse/HADOOP-2185
> Project: Hadoop
>  Issue Type: Improvement
>  Components: conf, dfs, mapred
>Affects Versions: 0.15.0
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
> Fix For: 0.16.0
>
> Attachments: FixedPorts.patch, FixedPorts2.patch



[jira] Updated: (HADOOP-2185) Server ports: to roll or not to roll.

2007-11-26 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HADOOP-2185:


Attachment: FixedPorts2.patch

Adding 2 unit tests: TestHDFSServerPorts and TestMRServerPorts.

> Server ports: to roll or not to roll.
> -
>
> Key: HADOOP-2185
> URL: https://issues.apache.org/jira/browse/HADOOP-2185
> Project: Hadoop
>  Issue Type: Improvement
>  Components: conf, dfs, mapred
>Affects Versions: 0.15.0
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
> Fix For: 0.16.0
>
> Attachments: FixedPorts.patch, FixedPorts2.patch



[jira] Updated: (HADOOP-2185) Server ports: to roll or not to roll.

2007-11-26 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HADOOP-2185:


Attachment: FixedPorts1.patch

All three FindBugs warnings reported during the last run are old bugs, not 
introduced by the patch.
I fixed all 3 FindBugs warnings in NamenodeFsck.
But the two in FSNamesystem related to "Write to static field 
FSNamesystem.fsNamesystemObject" cannot be fixed.
This is done intentionally, and the warning should be ignored.
The patch is updated to current trunk.
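
For readers unfamiliar with this class of warning, the sketch below shows the
general pattern that FindBugs reports as a write to a static field from an
instance method. It is a hypothetical stand-in, not the actual FSNamesystem
code: the class NamesystemSketch and its members are invented for
illustration, and it is only an assumption that the real code follows a
similar shape.

```java
/** Hypothetical stand-in for a server-wide singleton; not the real FSNamesystem. */
final class NamesystemSketch {

    // Static handle so other components can reach the one live instance.
    private static NamesystemSketch current;

    private final String name;

    NamesystemSketch(String name) {
        this.name = name;
        register();
    }

    /**
     * Instance method that writes a static field. FindBugs flags this kind of
     * write because a second instance silently replaces the first; here that
     * is the intended behavior.
     */
    private void register() {
        current = this;
    }

    static NamesystemSketch getCurrent() {
        return current;
    }

    String getName() {
        return name;
    }

    public static void main(String[] args) {
        new NamesystemSketch("first");
        new NamesystemSketch("second");
        // The most recently constructed instance wins.
        System.out.println(getCurrent().getName());
    }
}
```

When exactly one such object exists per process, the write is harmless, which
is the usual argument for ignoring the warning rather than restructuring the
code.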

> Server ports: to roll or not to roll.
> -
>
> Key: HADOOP-2185
> URL: https://issues.apache.org/jira/browse/HADOOP-2185
> Project: Hadoop
>  Issue Type: Improvement
>  Components: conf, dfs, mapred
>Affects Versions: 0.15.0
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
> Fix For: 0.16.0
>
> Attachments: FixedPorts.patch, FixedPorts1.patch
>
>
> Looked at the issues related to port rolling. My impression is that port 
> rolling is required only for the unit tests to run.
> Even the name-node port should roll there, which we don't have now, in order 
> to be able to start 2 cluster for testing say dist cp.
> For real clusters on the contrary port rolling is not desired and some times 
> even prohibited.
> So we should have a way of to ban port rolling. My proposition is to
> # use ephemeral port 0 if port rolling is desired
> # if a specific port is specified then port rolling should not happen at all, 
> meaning that a 
> server is either able or not able to start on that particular port.
> The desired port is specified via configuration parameters.
> - Name-node: fs.default.name = host:port
> - Data-node: dfs.datanode.port
> - Job-tracker: mapred.job.tracker = host:port
> - Task-tracker: mapred.task.tracker.report.bindAddress = host
>   Task-tracker currently does not have an option to specify port, it always 
> uses the ephemeral port 0, 
>   and therefore I propose to add one.
> - Secondary node does not need a port to listen on.
> For info servers we have two sets of config variables *.info.bindAddress and 
> *.info.port
> except for the task tracker, which calls them *.http.bindAddress and 
> *.http.port instead of "info".
> With respect to the info servers I propose to completely eliminate the port 
> parameters, and form 
> *.info.bindAddress = host:port
> Info servers should do the same thing, namely start or fail on the specified 
> port if it is not 0,
> and start on any free port if it is ephemeral.
> For the task-tracker I would rename tasktracker.http.bindAddress to 
> mapred.task.tracker.info.bindAddress
> For the data-node the info dfs.datanode.info.bindAddress should be included 
> into the default config.
> Is there a reason why it is not there?
> This is the summary of proposed changes:
> || Server || current name = value || proposed name = value ||
> | NameNode | fs.default.name = host:port | same |
> | | dfs.info.bindAddress = host | dfs.http.bindAddress = host:port |
> | DataNode | dfs.datanode.bindAddress = host | dfs.datanode.bindAddress = host:port |
> | | dfs.datanode.port = port | eliminate |
> | | dfs.datanode.info.bindAddress = host | dfs.datanode.http.bindAddress = host:port |
> | | dfs.datanode.info.port = port | eliminate |
> | JobTracker | mapred.job.tracker = host:port | same |
> | | mapred.job.tracker.info.bindAddress = host | mapred.job.tracker.http.bindAddress = host:port |
> | | mapred.job.tracker.info.port = port | eliminate |
> | TaskTracker | mapred.task.tracker.report.bindAddress = host | mapred.task.tracker.report.bindAddress = host:port |
> | | tasktracker.http.bindAddress = host | mapred.task.tracker.http.bindAddress = host:port |
> | | tasktracker.http.port = port | eliminate |
> | SecondaryNameNode | dfs.secondary.info.bindAddress = host | dfs.secondary.http.bindAddress = host:port |
> | | dfs.secondary.info.port = port | eliminate |
> Do we also want to set some uniform naming convention for the configuration
> variables?
> Using hdfs instead of dfs, or info instead of http, or systematically using
> either datanode or data.node, would make that look better in my opinion.
> So these are all +*api*+ changes. I would +*really*+ like some feedback on
> this, especially from people who deal with configuration issues in practice.
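
For readers following the table above, here is a minimal sketch of what folding a separate host key and port key into one host:port value could look like. It is only an illustration, not code from the patch: java.util.Properties stands in for the Hadoop configuration object, the class and method names (BindAddressSketch, mergeHostPort, parse) are hypothetical, and the defaults "0.0.0.0" and "0" as well as the port 50075 are assumed example values.

```java
import java.net.InetSocketAddress;
import java.util.Properties;

public class BindAddressSketch {
    /** Joins a legacy host key and port key into a single "host:port" value. */
    public static String mergeHostPort(Properties conf, String hostKey, String portKey) {
        // Assumed defaults: listen on all interfaces, ephemeral port 0.
        String host = conf.getProperty(hostKey, "0.0.0.0");
        String port = conf.getProperty(portKey, "0");
        return host + ":" + port;
    }

    /** Parses "host:port" into a socket address without resolving the host name. */
    public static InetSocketAddress parse(String target) {
        int colon = target.lastIndexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("Not a host:port pair: " + target);
        }
        String host = target.substring(0, colon);
        int port = Integer.parseInt(target.substring(colon + 1));
        return InetSocketAddress.createUnresolved(host, port);
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        // Old style: two separate parameters (values are illustrative).
        conf.setProperty("dfs.datanode.info.bindAddress", "0.0.0.0");
        conf.setProperty("dfs.datanode.info.port", "50075");

        // New style: one parameter holding host:port.
        String merged = mergeHostPort(conf,
            "dfs.datanode.info.bindAddress", "dfs.datanode.info.port");
        conf.setProperty("dfs.datanode.http.bindAddress", merged);

        InetSocketAddress addr = parse(merged);
        System.out.println(addr.getHostName() + " / " + addr.getPort());
    }
}
```

With a single host:port value, port 0 keeps its meaning of "pick any free port", so no extra parameter is needed to switch rolling on or off.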

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HADOOP-2185) Server ports: to roll or not to roll.

2007-11-26 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HADOOP-2185:


Status: Patch Available  (was: Open)

FindBugs problem HADOOP-2272
Resubmitting the patch.

> Server ports: to roll or not to roll.
> -
>
> Key: HADOOP-2185
> URL: https://issues.apache.org/jira/browse/HADOOP-2185
> Project: Hadoop
>  Issue Type: Improvement
>  Components: conf, dfs, mapred
>Affects Versions: 0.15.0
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
> Fix For: 0.16.0
>
> Attachments: FixedPorts.patch
>
>




[jira] Updated: (HADOOP-2185) Server ports: to roll or not to roll.

2007-11-26 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HADOOP-2185:


Status: Open  (was: Patch Available)

> Server ports: to roll or not to roll.
> -
>
> Key: HADOOP-2185
> URL: https://issues.apache.org/jira/browse/HADOOP-2185
> Project: Hadoop
>  Issue Type: Improvement
>  Components: conf, dfs, mapred
>Affects Versions: 0.15.0
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
> Fix For: 0.16.0
>
> Attachments: FixedPorts.patch
>
>




[jira] Updated: (HADOOP-2185) Server ports: to roll or not to roll.

2007-11-25 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HADOOP-2185:


Status: Patch Available  (was: Open)

> Server ports: to roll or not to roll.
> -
>
> Key: HADOOP-2185
> URL: https://issues.apache.org/jira/browse/HADOOP-2185
> Project: Hadoop
>  Issue Type: Improvement
>  Components: conf, dfs, mapred
>Affects Versions: 0.15.0
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
> Fix For: 0.16.0
>
> Attachments: FixedPorts.patch
>
>




[jira] Updated: (HADOOP-2185) Server ports: to roll or not to roll.

2007-11-21 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HADOOP-2185:


Attachment: FixedPorts.patch

This patch
# Changes the behavior of the following Hadoop servers with respect to port
rolling: NameNode, DataNode, SecondaryNameNode, JobTracker, TaskTracker.
The new behavior is:
- when a specific port is provided, the server must either start on that port
  or fail by throwing java.net.BindException.
- if the port is 0 (ephemeral), the server should choose a free port and
  start on it.
# Introduces 2 new unit tests, TestHDFSServerPorts and TestMRServerPorts, which
verify the new behavior.
# Incorporates all port parameters in the Hadoop configuration into the
respective addresses; see the table of changes in the issue description.
# Renames *.info.bindAddress to *.http.bindAddress as requested.
# Modifies StatusHttpServer so that it throws BindException instead of a
generic IOException when the port is busy.
# Introduces FSNamesystem.initialize() so that the FSNamesystem can be
destroyed if an exception is thrown during construction.
# Moves DataNode.createSocketAddr() into NetUtils, as requested.
# Fixes NullPointerExceptions in JobTracker and NameNode, which were thrown
during shutdown when running the new tests because some members were null.
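
The start-or-fail rule described in this comment can be illustrated with plain java.net sockets. This is a sketch of the behavior only, not the code in FixedPorts.patch; the class name PortBindingSketch and its bind() helper are hypothetical.

```java
import java.io.IOException;
import java.net.BindException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

public class PortBindingSketch {
    /**
     * Binds to exactly the requested port and never rolls to another one.
     * Port 0 asks the operating system for any free (ephemeral) port.
     */
    public static ServerSocket bind(String host, int port) throws IOException {
        ServerSocket socket = new ServerSocket();
        try {
            socket.bind(new InetSocketAddress(host, port));
        } catch (IOException e) {
            socket.close(); // do not leak the unbound socket
            throw e;        // a busy port surfaces as java.net.BindException
        }
        return socket;
    }

    public static void main(String[] args) throws IOException {
        // Port 0: the OS picks a free port, which is what the unit tests need.
        try (ServerSocket first = bind("127.0.0.1", 0)) {
            int chosen = first.getLocalPort();
            System.out.println("started on ephemeral port " + chosen);

            // Specific port that is already taken: fail instead of rolling.
            try (ServerSocket second = bind("127.0.0.1", chosen)) {
                System.out.println("unexpected: second bind succeeded");
            } catch (BindException e) {
                System.out.println("port " + chosen + " is busy: " + e.getMessage());
            }
        }
    }
}
```

Two test clusters can then coexist by configuring port 0 everywhere, while a production server configured with a fixed port reports the conflict immediately.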


> Server ports: to roll or not to roll.
> -
>
> Key: HADOOP-2185
> URL: https://issues.apache.org/jira/browse/HADOOP-2185
> Project: Hadoop
>  Issue Type: Improvement
>  Components: conf, dfs, mapred
>Affects Versions: 0.15.0
>Reporter: Konstantin Shvachko
> Fix For: 0.16.0
>
> Attachments: FixedPorts.patch
>
>

Re: [jira] Updated: (HADOOP-2185) Server ports: to roll or not to roll.

2007-11-20 Thread Doug Cutting
FYI, I find it hard to follow an issue when folks edit descriptions and 
comments.


I think the best practice is to submit issues whose description briefly 
describes the problem.  Then comments can be used to elaborate on the 
problem and develop a solution.  If one changes one's mind, one should 
add a new comment noting that, rather than editing a prior comment.  If 
the initial description is no longer accurate, file a new issue, close 
the initial issue as "won't fix" and link it to the new issue.


I've noted this before in the HowToContribute page:

http://wiki.apache.org/lucene-hadoop/HowToContribute#head-374ea7eb0d41f1e7ea5d4c14102d993c494ac90c

Similarly, there is no need to remove stale attachments.  These provide 
history and are useful.


Thanks,

Doug


[jira] Updated: (HADOOP-2185) Server ports: to roll or not to roll.

2007-11-20 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HADOOP-2185:


Description: 
I looked at the issues related to port rolling. My impression is that port
rolling is required only for the unit tests to run.
Even the name-node port should roll there, which we don't have now, in order to
be able to start 2 clusters for testing, say, distcp.

For real clusters, on the contrary, port rolling is not desired and sometimes
even prohibited.
So we should have a way to ban port rolling. My proposition is to
# use ephemeral port 0 if port rolling is desired
# if a specific port is specified, then port rolling should not happen at all,
meaning that a server either starts on that particular port or fails to start.

The desired port is specified via configuration parameters.
- Name-node: fs.default.name = host:port
- Data-node: dfs.datanode.port
- Job-tracker: mapred.job.tracker = host:port
- Task-tracker: mapred.task.tracker.report.bindAddress = host
  The task-tracker currently has no option to specify a port; it always uses
  the ephemeral port 0, and therefore I propose to add one.
- Secondary node does not need a port to listen on.

For info servers we have two sets of config variables, *.info.bindAddress and
*.info.port, except for the task tracker, which calls them *.http.bindAddress
and *.http.port instead of "info".
With respect to the info servers I propose to completely eliminate the port
parameters and use the form
*.info.bindAddress = host:port
Info servers should do the same thing, namely start or fail on the specified
port if it is not 0, and start on any free port if it is 0 (ephemeral).

For the task-tracker I would rename tasktracker.http.bindAddress to
mapred.task.tracker.info.bindAddress
For the data-node, dfs.datanode.info.bindAddress should be included in the
default config.
Is there a reason why it is not there?

This is the summary of proposed changes:
|| Server || current name = value || proposed name = value ||
| NameNode | fs.default.name = host:port | same |
| | dfs.info.bindAddress = host | dfs.http.bindAddress = host:port |
| DataNode | dfs.datanode.bindAddress = host | dfs.datanode.bindAddress = host:port |
| | dfs.datanode.port = port | eliminate |
| | dfs.datanode.info.bindAddress = host | dfs.datanode.http.bindAddress = host:port |
| | dfs.datanode.info.port = port | eliminate |
| JobTracker | mapred.job.tracker = host:port | same |
| | mapred.job.tracker.info.bindAddress = host | mapred.job.tracker.http.bindAddress = host:port |
| | mapred.job.tracker.info.port = port | eliminate |
| TaskTracker | mapred.task.tracker.report.bindAddress = host | mapred.task.tracker.report.bindAddress = host:port |
| | tasktracker.http.bindAddress = host | mapred.task.tracker.http.bindAddress = host:port |
| | tasktracker.http.port = port | eliminate |
| SecondaryNameNode | dfs.secondary.info.bindAddress = host | dfs.secondary.http.bindAddress = host:port |
| | dfs.secondary.info.port = port | eliminate |

Do we also want to set some uniform naming convention for the configuration
variables?
Using hdfs instead of dfs, or info instead of http, or systematically using
either datanode or data.node, would make that look better in my opinion.

So these are all +*api*+ changes. I would +*really*+ like some feedback on
this, especially from people who deal with configuration issues in practice.

  was:
I looked at the issues related to port rolling. My impression is that port
rolling is required only for the unit tests to run.
Even the name-node port should roll there, which we don't have now, in order to
be able to start 2 clusters for testing, say, distcp.

For real clusters, on the contrary, port rolling is not desired and sometimes
even prohibited.
So we should have a way to ban port rolling. My proposition is to
# use ephemeral port 0 if port rolling is desired
# if a specific port is specified, then port rolling should not happen at all,
meaning that a server either starts on that particular port or fails to start.

The desired port is specified via configuration parameters.
- Name-node: fs.default.name = host:port
- Data-node: dfs.datanode.port
- Job-tracker: mapred.job.tracker = host:port
- Task-tracker: mapred.task.tracker.report.bindAddress = host
  The task-tracker currently has no option to specify a port; it always uses
  the ephemeral port 0, and therefore I propose to add one.
- Secondary node does not need a port to listen on.

For info servers we have two sets of config variables, *.info.bindAddress and
*.info.port, except for the task tracker, which calls them *.http.bindAddress
and *.http.port instead of "info".
With respect to the info servers I propose to completely eliminate the port
parameters and use the form
*.info.bindAddress = host:port
Info servers should do the same thing, namely start or fail on t

[jira] Updated: (HADOOP-2185) Server ports: to roll or not to roll.

2007-11-09 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HADOOP-2185:


Description: 
I looked at the issues related to port rolling. My impression is that port
rolling is required only for the unit tests to run.
Even the name-node port should roll there, which we don't have now, in order to
be able to start 2 clusters for testing, say, distcp.

For real clusters, on the contrary, port rolling is not desired and sometimes
even prohibited.
So we should have a way to ban port rolling. My proposition is to
# use ephemeral port 0 if port rolling is desired
# if a specific port is specified, then port rolling should not happen at all,
meaning that a server either starts on that particular port or fails to start.

The desired port is specified via configuration parameters.
- Name-node: fs.default.name = host:port
- Data-node: dfs.datanode.port
- Job-tracker: mapred.job.tracker = host:port
- Task-tracker: mapred.task.tracker.report.bindAddress = host
  The task-tracker currently has no option to specify a port; it always uses
  the ephemeral port 0, and therefore I propose to add one.
- Secondary node does not need a port to listen on.

For info servers we have two sets of config variables, *.info.bindAddress and
*.info.port, except for the task tracker, which calls them *.http.bindAddress
and *.http.port instead of "info".
With respect to the info servers I propose to completely eliminate the port
parameters and use the form
*.info.bindAddress = host:port
Info servers should do the same thing, namely start or fail on the specified
port if it is not 0, and start on any free port if it is 0 (ephemeral).

For the task-tracker I would rename tasktracker.http.bindAddress to
mapred.task.tracker.info.bindAddress
For the data-node, dfs.datanode.info.bindAddress should be included in the
default config.
Is there a reason why it is not there?

This is the summary of proposed changes:
|| Server || current name = value || proposed name = value ||
| NameNode | fs.default.name = host:port | same |
| | dfs.info.bindAddress = host | dfs.info.bindAddress = host:port |
| DataNode | dfs.datanode.port = port | same |
| | dfs.datanode.info.bindAddress = host | dfs.datanode.info.bindAddress = host:port |
| | dfs.datanode.info.port = port | eliminate |
| JobTracker | mapred.job.tracker = host:port | same |
| | mapred.task.tracker.info.bindAddress = host | mapred.task.tracker.info.bindAddress = host:port |
| | mapred.task.tracker.info.port = port | eliminate |
| TaskTracker | mapred.task.tracker.report.bindAddress = host | mapred.task.tracker.report.bindAddress = host:port |
| | tasktracker.http.bindAddress = host | mapred.task.tracker.info.bindAddress = host:port |
| | tasktracker.http.port = port | eliminate |
| SecondaryNameNode | dfs.secondary.info.bindAddress = host | dfs.secondary.info.bindAddress = host:port |
| | dfs.secondary.info.port = port | eliminate |

Do we also want to set some uniform naming convention for the configuration
variables?
Using hdfs instead of dfs, or info instead of http, or systematically using
either datanode or data.node, would make that look better in my opinion.

So these are all +*api*+ changes. I would +*really*+ like some feedback on
this, especially from people who deal with configuration issues in practice.

  was:
I looked at the issues related to port rolling. My impression is that port
rolling is required only for the unit tests to run.
Even the name-node port should roll there, which we don't have now, in order to
be able to start 2 clusters for testing, say, distcp.

For real clusters, on the contrary, port rolling is not desired and sometimes
even prohibited.
So we should have a way to ban port rolling. My proposition is to
# use ephemeral port 0 if port rolling is desired
# if a specific port is specified, then port rolling should not happen at all,
meaning that a server either starts on that particular port or fails to start.

The desired port is specified via configuration parameters.
- Name-node: fs.default.name = host:port
- Data-node: dfs.datanode.port
- Job-tracker: mapred.job.tracker = host:port
- Task-tracker: mapred.task.tracker.report.bindAddress = host
  The task-tracker currently has no option to specify a port; it always uses
  the ephemeral port 0, and therefore I propose to add one.
- Secondary node does not need a port to listen on.

For info servers we have two sets of config variables, *.info.bindAddress and
*.info.port, except for the task tracker, which calls them *.http.bindAddress
and *.http.port instead of "info".
With respect to the info servers I propose to completely eliminate the port
parameters and use the form
*.info.bindAddress = host:port
Info servers should do the same thing, namely start or fail on the specified
port if it is not 0, and start on any free port if it is 0 (ephemeral).