[jira] Updated: (HADOOP-2185) Server ports: to roll or not to roll.

Konstantin Shvachko (JIRA) Tue, 20 Nov 2007 00:43:04 -0800

     [ 
https://issues.apache.org/jira/browse/HADOOP-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Konstantin Shvachko updated HADOOP-2185:
----------------------------------------

    Description: 
I looked at the issues related to port rolling. My impression is that port
rolling is required only for the unit tests to run.
Even the name-node port should roll there, which it currently does not, so
that we can start two clusters for testing, say, distcp.

For real clusters, on the contrary, port rolling is not desired and is
sometimes even prohibited.
So we should have a way to ban port rolling. My proposal is to
# use ephemeral port 0 if port rolling is desired
# if a specific port is specified, port rolling should not happen at all,
meaning that a server either starts on that particular port or fails to start.
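
The two rules above can be sketched as follows. This is only an illustration
in Python, not Hadoop code; bind_server is a hypothetical helper name, and the
loopback address is just an example value.

```python
import socket


def bind_server(host: str, port: int) -> socket.socket:
    """Bind a listening socket without any port rolling (hypothetical helper).

    port == 0: the OS picks any free (ephemeral) port.
    port != 0: bind exactly that port or raise OSError; port + 1 is never tried.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        sock.listen(1)
    except OSError:
        sock.close()
        raise
    return sock


# Ephemeral port: always starts, the OS chooses the port number.
first = bind_server("127.0.0.1", 0)
taken_port = first.getsockname()[1]

# Specific port that is already in use: the second server must fail to start.
try:
    bind_server("127.0.0.1", taken_port)
    started = True
except OSError:
    started = False

# A second "rolling" server simply asks for port 0 again.
second = bind_server("127.0.0.1", 0)
```

With this shape, unit tests that need two clusters would configure port 0
everywhere, while a real cluster would name its ports and get a hard failure
on conflict.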

The desired port is specified via configuration parameters.
- Name-node: fs.default.name = host:port
- Data-node: dfs.datanode.port
- Job-tracker: mapred.job.tracker = host:port
- Task-tracker: mapred.task.tracker.report.bindAddress = host
  The task-tracker currently has no option to specify a port; it always uses
  the ephemeral port 0, so I propose to add one.
- Secondary node does not need a port to listen on.

For info servers we have two sets of config variables, *.info.bindAddress and
*.info.port, except for the task tracker, which calls them *.http.bindAddress
and *.http.port instead of "info".
With respect to the info servers, I propose to eliminate the port parameters
completely and use the form
*.info.bindAddress = host:port
Info servers should follow the same rule, namely start or fail on the
specified port if it is not 0, and start on any free port if it is 0.
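
To illustrate the combined host:port form, a value could be split as in the
sketch below. This is Python rather than Hadoop code, parse_bind_address is a
hypothetical name, and the addresses in the usage lines are example values
only.

```python
from typing import Tuple


def parse_bind_address(value: str) -> Tuple[str, int]:
    """Split a "host:port" setting into its parts (hypothetical helper).

    Port 0 is accepted and means "any free port"; a missing host or port,
    or a port outside 0..65535, raises ValueError.
    """
    host, sep, port_text = value.rpartition(":")
    if not sep or not host:
        raise ValueError(f"expected host:port, got {value!r}")
    port = int(port_text)  # raises ValueError for an empty or non-numeric port
    if not 0 <= port <= 65535:
        raise ValueError(f"port out of range in {value!r}")
    return host, port


# Example values, not defaults taken from any Hadoop release.
fixed = parse_bind_address("0.0.0.0:50070")
ephemeral = parse_bind_address("localhost:0")
```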

For the task-tracker I would rename tasktracker.http.bindAddress to
mapred.task.tracker.info.bindAddress.
For the data-node, dfs.datanode.info.bindAddress should be included in the
default config.
Is there a reason why it is not there?

This is the summary of proposed changes:
|| Server || current name = value || proposed name = value ||
| NameNode | fs.default.name = host:port | same |
| | dfs.info.bindAddress = host | dfs.http.bindAddress = host:port |
| DataNode | dfs.datanode.bindAddress = host | dfs.datanode.bindAddress = host:port |
| | dfs.datanode.port = port | eliminate |
| | dfs.datanode.info.bindAddress = host | dfs.datanode.http.bindAddress = host:port |
| | dfs.datanode.info.port = port | eliminate |
| JobTracker | mapred.job.tracker = host:port | same |
| | mapred.job.tracker.info.bindAddress = host | mapred.job.tracker.http.bindAddress = host:port |
| | mapred.job.tracker.info.port = port | eliminate |
| TaskTracker | mapred.task.tracker.report.bindAddress = host | mapred.task.tracker.report.bindAddress = host:port |
| | tasktracker.http.bindAddress = host | mapred.task.tracker.http.bindAddress = host:port |
| | tasktracker.http.port = port | eliminate |
| SecondaryNameNode | dfs.secondary.info.bindAddress = host | dfs.secondary.http.bindAddress = host:port |
| | dfs.secondary.info.port = port | eliminate |
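
As a rough illustration of what the table means for an existing
configuration, the sketch below folds each host/port pair into the proposed
host:port key. It is Python, not Hadoop code. The names MERGED_KEYS and
migrate are hypothetical, defaulting a missing port to 0 is my assumption, and
only rows for which the table names both a host key and a port key to
eliminate are covered.

```python
# proposed key -> (current host key, current port key), taken from the table
MERGED_KEYS = {
    "dfs.datanode.bindAddress":
        ("dfs.datanode.bindAddress", "dfs.datanode.port"),
    "dfs.datanode.http.bindAddress":
        ("dfs.datanode.info.bindAddress", "dfs.datanode.info.port"),
    "mapred.job.tracker.http.bindAddress":
        ("mapred.job.tracker.info.bindAddress", "mapred.job.tracker.info.port"),
    "mapred.task.tracker.http.bindAddress":
        ("tasktracker.http.bindAddress", "tasktracker.http.port"),
    "dfs.secondary.http.bindAddress":
        ("dfs.secondary.info.bindAddress", "dfs.secondary.info.port"),
}


def migrate(old: dict) -> dict:
    """Return a copy of old with host/port pairs merged into host:port keys."""
    new = dict(old)
    for new_key, (host_key, port_key) in MERGED_KEYS.items():
        if host_key not in old:
            continue  # nothing configured for this server, leave it alone
        host = new.pop(host_key)
        port = new.pop(port_key, "0")  # assumption: no port means ephemeral
        new[new_key] = f"{host}:{port}"
    return new


# Example values only, not defaults from any release.
old_conf = {
    "fs.default.name": "namenode:9000",
    "dfs.datanode.bindAddress": "0.0.0.0",
    "dfs.datanode.port": "50010",
    "tasktracker.http.bindAddress": "0.0.0.0",
    "tasktracker.http.port": "50060",
}
new_conf = migrate(old_conf)
```

Keys marked "same" in the table, such as fs.default.name, pass through
unchanged.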

Do we also want to set a uniform naming convention for the configuration
variables?
Having hdfs instead of dfs, or info instead of http, or systematically using
either datanode or data.node would make them look better, in my opinion.

So these are all +*api*+ changes. I would +*really*+ like some feedback on
this, especially from people who deal with configuration issues in practice.

  was:
I looked at the issues related to port rolling. My impression is that port
rolling is required only for the unit tests to run.
Even the name-node port should roll there, which it currently does not, so
that we can start two clusters for testing, say, distcp.

For real clusters, on the contrary, port rolling is not desired and is
sometimes even prohibited.
So we should have a way to ban port rolling. My proposal is to
# use ephemeral port 0 if port rolling is desired
# if a specific port is specified, port rolling should not happen at all,
meaning that a server either starts on that particular port or fails to start.

The desired port is specified via configuration parameters.
- Name-node: fs.default.name = host:port
- Data-node: dfs.datanode.port
- Job-tracker: mapred.job.tracker = host:port
- Task-tracker: mapred.task.tracker.report.bindAddress = host
  The task-tracker currently has no option to specify a port; it always uses
  the ephemeral port 0, so I propose to add one.
- Secondary node does not need a port to listen on.

For info servers we have two sets of config variables, *.info.bindAddress and
*.info.port, except for the task tracker, which calls them *.http.bindAddress
and *.http.port instead of "info".
With respect to the info servers, I propose to eliminate the port parameters
completely and use the form
*.info.bindAddress = host:port
Info servers should follow the same rule, namely start or fail on the
specified port if it is not 0, and start on any free port if it is 0.

For the task-tracker I would rename tasktracker.http.bindAddress to
mapred.task.tracker.info.bindAddress.
For the data-node, dfs.datanode.info.bindAddress should be included in the
default config.
Is there a reason why it is not there?

This is the summary of proposed changes:
|| Server || current name = value || proposed name = value ||
| NameNode | fs.default.name = host:port | same |
| | dfs.info.bindAddress = host | dfs.info.bindAddress = host:port |
| DataNode | dfs.datanode.port = port | same |
| | dfs.datanode.info.bindAddress = host | dfs.datanode.info.bindAddress = host:port |
| | dfs.datanode.info.port = port | eliminate |
| JobTracker | mapred.job.tracker = host:port | same |
| | mapred.task.tracker.info.bindAddress = host | mapred.task.tracker.info.bindAddress = host:port |
| | mapred.task.tracker.info.port = port | eliminate |
| TaskTracker | mapred.task.tracker.report.bindAddress = host | mapred.task.tracker.report.bindAddress = host:port |
| | tasktracker.http.bindAddress = host | mapred.task.tracker.info.bindAddress = host:port |
| | tasktracker.http.port = port | eliminate |
| SecondaryNameNode | dfs.secondary.info.bindAddress = host | dfs.secondary.info.bindAddress = host:port |
| | dfs.secondary.info.port = port | eliminate |

Do we also want to set a uniform naming convention for the configuration
variables?
Having hdfs instead of dfs, or info instead of http, or systematically using
either datanode or data.node would make them look better, in my opinion.

So these are all +*api*+ changes. I would +*really*+ like some feedback on
this, especially from people who deal with configuration issues in practice.


> Server ports: to roll or not to roll.
> -------------------------------------
>
>                 Key: HADOOP-2185
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2185
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: conf, dfs, mapred
>    Affects Versions: 0.15.0
>            Reporter: Konstantin Shvachko
>             Fix For: 0.16.0

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.