[jira] [Created] (HDFS-3060) Bump TestDistributedUpgrade#testDistributedUpgrade timeout

2012-03-07 Thread Eli Collins (Created) (JIRA)
Bump TestDistributedUpgrade#testDistributedUpgrade timeout
--

 Key: HDFS-3060
 URL: https://issues.apache.org/jira/browse/HDFS-3060
 Project: Hadoop HDFS
  Issue Type: Test
  Components: test
Affects Versions: 0.23.0
Reporter: Eli Collins
Assignee: Eli Collins
Priority: Minor


TestDistributedUpgrade#testDistributedUpgrade occasionally times out. Let's 
bump its timeout to 5 min to match some of the other long-running tests.
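
For reference, a minimal sketch of the bump, assuming the test uses a JUnit 4 
timeout annotation (the current test body and timeout value are not shown here):

{code}
import org.junit.Test;

public class TestDistributedUpgrade {
  // Hypothetical sketch: bump the JUnit 4 timeout to 5 minutes (300,000 ms)
  // to match the other long-running upgrade tests.
  @Test(timeout = 300000)
  public void testDistributedUpgrade() throws Exception {
    // ... existing test body unchanged ...
  }
}
{code}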





[jira] [Created] (HDFS-3078) 2NN https port setting is broken in Hadoop 1.0

2012-03-12 Thread Eli Collins (Created) (JIRA)
2NN https port setting is broken in Hadoop 1.0
--

 Key: HDFS-3078
 URL: https://issues.apache.org/jira/browse/HDFS-3078
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 1.0.0
Reporter: Eli Collins
Assignee: Eli Collins
 Fix For: 1.1.0


The code in SecondaryNameNode.java that sets the https port is broken: if the 
port is set, it sets the bind addr to "addr:addr:port", which is bogus. Even if 
it did work, it uses port 0 instead of port 50490 (the default listed in 
./src/packages/templates/conf/hdfs-site.xml).
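
As a rough sketch of the intended behavior (illustrative only, not the actual 
SecondaryNameNode code; it assumes the port is read from a key like 
dfs.secondary.https.port with the documented default of 50490):

{code}
import org.apache.hadoop.conf.Configuration;

public class SecondaryHttpsAddrSketch {
  // Sketch: build "host:port" for the 2NN https server, falling back to the
  // documented default port 50490 rather than 0.
  static String httpsBindAddress(Configuration conf, String host) {
    int port = conf.getInt("dfs.secondary.https.port", 50490);
    return host + ":" + port;   // not "host:host:port"
  }

  public static void main(String[] args) {
    Configuration conf = new Configuration();
    System.out.println(httpsBindAddress(conf, "0.0.0.0"));  // 0.0.0.0:50490
  }
}
{code}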







[jira] [Created] (HDFS-3090) Broaden NN#monitorHealth checks

2012-03-13 Thread Eli Collins (Created) (JIRA)
Broaden NN#monitorHealth checks 


 Key: HDFS-3090
 URL: https://issues.apache.org/jira/browse/HDFS-3090
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 0.23.3
Reporter: Eli Collins


Currently the NN implementation of HAServiceProtocol#monitorHealth just calls 
FSNamesystem#checkAvailableResources. We should extend this method to cover a 
broader range of resources (eg HDFS-2704), but we should also extend 
NN#monitorHealth to make other unrelated health checks (eg whether all its 
important service threads are running, memory usage, etc).
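
A hedged sketch of what a broader check might look like (the helper methods 
below are hypothetical placeholders, not existing NN/FSNamesystem APIs):

{code}
import org.apache.hadoop.ha.HealthCheckFailedException;

public class BroaderHealthCheckSketch {
  // Hypothetical checks standing in for the real ones; each returns true
  // when healthy.
  boolean hasAvailableResources() { return true; }  // eg space in storage dirs
  boolean serviceThreadsAlive()   { return true; }  // eg heartbeat/replication monitors
  boolean heapHeadroomOk()        { return true; }  // eg % of max heap still free

  // Sketch of an extended monitorHealth(): fail if any check fails.
  public void monitorHealth() throws HealthCheckFailedException {
    if (!hasAvailableResources()) {
      throw new HealthCheckFailedException("NN is low on available resources");
    }
    if (!serviceThreadsAlive()) {
      throw new HealthCheckFailedException("A critical NN service thread died");
    }
    if (!heapHeadroomOk()) {
      throw new HealthCheckFailedException("NN heap usage is above threshold");
    }
  }

  public static void main(String[] args) throws Exception {
    new BroaderHealthCheckSketch().monitorHealth();  // healthy: returns normally
  }
}
{code}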





[jira] [Created] (HDFS-3120) Provide ability to enable sync without append

2012-03-20 Thread Eli Collins (Created) (JIRA)
Provide ability to enable sync without append
-

 Key: HDFS-3120
 URL: https://issues.apache.org/jira/browse/HDFS-3120
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 1.0.1
Reporter: Eli Collins
Assignee: Eli Collins


The work on branch-20-append was to support *sync*, for durable HBase WALs, not 
*append*. The branch-20-append implementation of append is known to be buggy. 
There's been confusion about this; we often answer queries on the list [like 
this|http://search-hadoop.com/m/wfed01VOIJ5]. Unfortunately, the way to enable 
correct sync on branch-1 for HBase is to set dfs.support.append to true in your 
config, which has the side effect of enabling append (which we don't want to 
do).

Let's add a new *dfs.support.hsync* option that enables working sync (which is 
basically the current dfs.support.append flag modulo one place where it's not 
referring to sync). For compatibility, if dfs.support.append is set, 
dfs.support.hsync will be set as well. This way someone can enable sync for 
HBase and still keep the current behavior: if dfs.support.append is not set 
then an append operation results in an IOE indicating append is not supported. 
We should do this on trunk as well, as there's no reason to conflate hsync and 
append in a single config even if append works.
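
A minimal sketch of the proposed compatibility behavior when reading the flags 
(key names are per this description; the surrounding class is illustrative):

{code}
import org.apache.hadoop.conf.Configuration;

public class SyncFlagsSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();

    // hsync is enabled either by the new flag or, for compatibility, by the
    // existing dfs.support.append flag; append still requires its own flag.
    boolean supportAppend = conf.getBoolean("dfs.support.append", false);
    boolean supportHsync =
        conf.getBoolean("dfs.support.hsync", false) || supportAppend;

    System.out.println("append enabled: " + supportAppend);
    System.out.println("hsync enabled:  " + supportHsync);
  }
}
{code}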





[jira] [Created] (HDFS-3128) TestResolveHdfsSymlink#testFcResolveAfs shouldn't use /tmp

2012-03-22 Thread Eli Collins (Created) (JIRA)
TestResolveHdfsSymlink#testFcResolveAfs shouldn't use /tmp
--

 Key: HDFS-3128
 URL: https://issues.apache.org/jira/browse/HDFS-3128
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Reporter: Eli Collins
Priority: Minor


Saw this on jenkins: TestResolveHdfsSymlink#testFcResolveAfs creates /tmp/alpha, 
which interferes with other executors on the same machine.
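
A sketch of the usual fix, assuming the test can derive its scratch directory 
from the standard test.build.data property instead of hard-coding /tmp:

{code}
import java.io.File;

public class TestDirSketch {
  // Sketch: a per-build scratch directory rather than /tmp/alpha, so
  // concurrent executors on the same machine don't collide.
  static File testDir(String name) {
    String base = System.getProperty("test.build.data", "build/test/data");
    return new File(base, name);
  }

  public static void main(String[] args) {
    System.out.println(testDir("alpha"));   // eg build/test/data/alpha
  }
}
{code}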





[jira] [Created] (HDFS-3137) Bump LAST_UPGRADABLE_LAYOUT_VERSION

2012-03-23 Thread Eli Collins (Created) (JIRA)
Bump LAST_UPGRADABLE_LAYOUT_VERSION
---

 Key: HDFS-3137
 URL: https://issues.apache.org/jira/browse/HDFS-3137
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Eli Collins
Assignee: Eli Collins


LAST_UPGRADABLE_LAYOUT_VERSION is currently -7, which corresponds to Hadoop 
0.14. How about we bump it to -16, which corresponds to Hadoop 0.18?

I don't think many people are using releases older than v0.18, and those who 
are probably want to upgrade to the latest stable release (v1.0). They can 
always upgrade to v1.0 and then eg 0.23 from there if they want.





[jira] [Created] (HDFS-3138) Move DatanodeInfo#ipcPort and hostName to DatanodeID

2012-03-23 Thread Eli Collins (Created) (JIRA)
Move DatanodeInfo#ipcPort and hostName to DatanodeID


 Key: HDFS-3138
 URL: https://issues.apache.org/jira/browse/HDFS-3138
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Eli Collins
Assignee: Eli Collins


We can fix the following TODO once HDFS-3137 is committed. The hostName field 
should be moved as well (it's not ephemeral, it just gets set on registration).

{code}
//TODO: move it to DatanodeID once DatanodeID is not stored in FSImage
out.writeShort(ipcPort);
{code}





[jira] [Created] (HDFS-3139) Minor Datanode logging improvement

2012-03-23 Thread Eli Collins (Created) (JIRA)
Minor Datanode logging improvement
--

 Key: HDFS-3139
 URL: https://issues.apache.org/jira/browse/HDFS-3139
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node
Reporter: Eli Collins
Assignee: Eli Collins
Priority: Minor


- DatanodeInfo#getDatanodeReport should log its hostname, in addition to the 
DNS lookup it does on its IP
- Datanode should log the ipc/info/streaming servers it's listening on at 
startup, at INFO level
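
A rough sketch of the second item, assuming commons-logging as used elsewhere 
in the Datanode (the method and addresses below are illustrative):

{code}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

public class DatanodeStartupLogSketch {
  private static final Log LOG =
      LogFactory.getLog(DatanodeStartupLogSketch.class);

  // Sketch: log each listening address once at startup, at INFO level.
  static void logServerAddresses(String ipc, String info, String streaming) {
    LOG.info("IPC server listening on " + ipc);
    LOG.info("Info (HTTP) server listening on " + info);
    LOG.info("Streaming (data transfer) server listening on " + streaming);
  }

  public static void main(String[] args) {
    logServerAddresses("0.0.0.0:50020", "0.0.0.0:50075", "0.0.0.0:50010");
  }
}
{code}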





[jira] [Created] (HDFS-3140) Support multiple network interfaces

2012-03-23 Thread Eli Collins (Created) (JIRA)
Support multiple network interfaces
---

 Key: HDFS-3140
 URL: https://issues.apache.org/jira/browse/HDFS-3140
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: Eli Collins
Assignee: Eli Collins


Umbrella jira to track the HDFS side of HADOOP-8198.





[jira] [Created] (HDFS-3141) The NN should log "missing" blocks

2012-03-24 Thread Eli Collins (Created) (JIRA)
The NN should log "missing" blocks
--

 Key: HDFS-3141
 URL: https://issues.apache.org/jira/browse/HDFS-3141
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Eli Collins
Assignee: Eli Collins


It would help debugging if the NN logged "missing" blocks at INFO level. In v1 
missing means there are no live / decommissioned replicas (ie they're all excess 
or corrupt); in trunk it means all replicas of the block are corrupt.





[jira] [Created] (HDFS-3142) TestHDFSCLI.testAll is failing

2012-03-24 Thread Eli Collins (Created) (JIRA)
TestHDFSCLI.testAll is failing
--

 Key: HDFS-3142
 URL: https://issues.apache.org/jira/browse/HDFS-3142
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Eli Collins
Priority: Blocker


TestHDFSCLI.testAll is failing in the latest trunk/23 builds.





[jira] [Created] (HDFS-3143) TestGetBlocks.testGetBlocks is failing

2012-03-24 Thread Eli Collins (Created) (JIRA)
TestGetBlocks.testGetBlocks is failing
--

 Key: HDFS-3143
 URL: https://issues.apache.org/jira/browse/HDFS-3143
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Eli Collins


TestGetBlocks.testGetBlocks is failing in the latest trunk/23 builds. Last good 
build was Mar 23rd.





[jira] [Created] (HDFS-3144) Refactor DatanodeID#getName by use

2012-03-24 Thread Eli Collins (Created) (JIRA)
Refactor DatanodeID#getName by use
--

 Key: HDFS-3144
 URL: https://issues.apache.org/jira/browse/HDFS-3144
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node
Reporter: Eli Collins
Assignee: Eli Collins


DatanodeID#getName, which returns a string containing the IP:xferPort of a 
Datanode, is used in a variety of contexts:

# Putting the ID in a log message
# Connecting to the DN for data transfer
# Getting a string to use as a key (eg for comparison)
# Using as a hostname, eg for excludes/includes, topology files

The same goes for DatanodeID#getHost, which returns just the IP part; sometimes 
we use it as a key, sometimes we tack on the IPC port, etc.

Let's have a method for each use, eg toString can be used for #1, a new method 
(eg getDataXferAddr) for #2, a new method (eg getKey) for #3, and a new method 
(eg getHostID) for #4, etc. Aside from the code being clearer, we can change the 
value for particular uses, eg we can change the format used in a log message 
without changing the address clients use to connect to the DN, or modify the 
address used for data transfer without changing the other uses.
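
A sketch of what the per-use accessors might look like (method names follow the 
suggestions above; the class shape and fields are simplified, not the real 
DatanodeID):

{code}
public class DatanodeIdSketch {
  private final String ip;        // eg "10.0.0.1"
  private final String hostName;  // eg "dn1.example.com"
  private final int xferPort;     // data transfer port

  DatanodeIdSketch(String ip, String hostName, int xferPort) {
    this.ip = ip;
    this.hostName = hostName;
    this.xferPort = xferPort;
  }

  // #1 logging: human-readable form, free to change independently
  @Override
  public String toString() {
    return hostName + " (" + ip + ":" + xferPort + ")";
  }

  // #2 data transfer: the address clients connect to
  public String getDataXferAddr() { return ip + ":" + xferPort; }

  // #3 map/comparison key: one possible stable identifier
  public String getKey() { return ip + ":" + xferPort; }

  // #4 includes/excludes and topology files: host identifier
  public String getHostID() { return hostName; }

  public static void main(String[] args) {
    DatanodeIdSketch id = new DatanodeIdSketch("10.0.0.1", "dn1.example.com", 50010);
    System.out.println(id + " xfer=" + id.getDataXferAddr()
        + " key=" + id.getKey() + " host=" + id.getHostID());
  }
}
{code}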





[jira] [Created] (HDFS-3145) Disallow self failover

2012-03-25 Thread Eli Collins (Created) (JIRA)
Disallow self failover
--

 Key: HDFS-3145
 URL: https://issues.apache.org/jira/browse/HDFS-3145
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: ha
Reporter: Eli Collins
Assignee: Eli Collins


It is currently possible for users to make a standby NameNode failover to 
itself and become active. We shouldn't allow this to happen in case operators 
mistype and miss the fact that there are now two active NNs.

{noformat}
bash-4.1$ hdfs haadmin -ns ha-nn-uri -failover nn2 nn2
Failover from nn2 to nn2 successful
{noformat}

After the failover above, nn2 will be active.
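
A minimal sketch of the guard the fix might add before attempting the failover 
(the method and argument names are illustrative, not the actual HAAdmin or 
FailoverController code):

{code}
public class FailoverGuardSketch {
  // Sketch: refuse a failover where source and target are the same NN.
  static void checkNotSelfFailover(String fromNnId, String toNnId) {
    if (fromNnId.equals(toNnId)) {
      throw new IllegalArgumentException(
          "Can't failover a service to itself: " + toNnId);
    }
  }

  public static void main(String[] args) {
    checkNotSelfFailover("nn1", "nn2");  // ok
    checkNotSelfFailover("nn2", "nn2");  // throws IllegalArgumentException
  }
}
{code}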





[jira] [Created] (HDFS-3146) Datanode should be able to register multiple network interfaces

2012-03-25 Thread Eli Collins (Created) (JIRA)
Datanode should be able to register multiple network interfaces
---

 Key: HDFS-3146
 URL: https://issues.apache.org/jira/browse/HDFS-3146
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: data-node
Reporter: Eli Collins
Assignee: Eli Collins


The Datanode should register multiple interfaces with the Namenode (which then 
forwards them to clients). We can do this by extending the DatanodeID, which 
currently just contains a single interface, to contain a list of interfaces. 
For compatibility, the DatanodeID method to get the DN address for data 
transfer should remain unchanged (multiple interfaces are only used where the 
client explicitly takes advantage of them).

By default, if the Datanode binds on all interfaces (via the wildcard in the 
dfs*address configuration) all interfaces are exposed, modulo ones like the 
loopback that should never be exposed. Alternatively, a new configuration 
parameter ({{dfs.datanode.available.interfaces}}) allows the set of interfaces 
to be specified explicitly in case the user only wants to expose a subset. If 
the new default behavior is too disruptive we could default 
dfs.datanode.available.interfaces to the IP of the IPC interface, which is 
the only interface exposed today (per HADOOP-6867, only the port from 
dfs.datanode.address is used today).

The interfaces can be specified by name (eg "eth0"), subinterface name (eg 
"eth0:0"), or IP address. The IP address can be specified by range using CIDR 
notation so the configuration values are portable.
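
A rough sketch of how the DN might enumerate and filter local interfaces under 
the proposed dfs.datanode.available.interfaces option (interface-name matching 
only; CIDR matching is omitted here):

{code}
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Enumeration;
import java.util.List;

public class DnInterfacesSketch {
  // Sketch: return the non-loopback addresses whose interface name is in the
  // allowed list; an empty list means "expose everything".
  static List<InetAddress> availableAddresses(String... allowedIfaces)
      throws Exception {
    List<String> allowed = Arrays.asList(allowedIfaces);
    List<InetAddress> result = new ArrayList<InetAddress>();
    Enumeration<NetworkInterface> ifaces = NetworkInterface.getNetworkInterfaces();
    while (ifaces.hasMoreElements()) {
      NetworkInterface iface = ifaces.nextElement();
      if (!allowed.isEmpty() && !allowed.contains(iface.getName())) {
        continue;  // not in dfs.datanode.available.interfaces
      }
      Enumeration<InetAddress> addrs = iface.getInetAddresses();
      while (addrs.hasMoreElements()) {
        InetAddress addr = addrs.nextElement();
        if (!addr.isLoopbackAddress()) {   // never expose the loopback
          result.add(addr);
        }
      }
    }
    return result;
  }

  public static void main(String[] args) throws Exception {
    System.out.println(availableAddresses());        // all non-loopback addrs
    System.out.println(availableAddresses("eth0"));  // just eth0's addrs
  }
}
{code}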





[jira] [Created] (HDFS-3147) The client should be able to specify which network interfaces to use

2012-03-25 Thread Eli Collins (Created) (JIRA)
The client should be able to specify which network interfaces to use


 Key: HDFS-3147
 URL: https://issues.apache.org/jira/browse/HDFS-3147
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: data-node
Reporter: Eli Collins
Assignee: Eli Collins


HDFS-3146 exposes multiple interfaces to the client. However, not all 
interfaces exposed to clients should be used, eg because not all addresses 
given to clients may be routable by the client, or a user may want to restrict 
off-cluster clients from using cluster-private interfaces. Therefore the user 
should be able to configure clients to use a subset of the addresses they are 
given. This can be accomplished by a new configuration option 
({{dfs.client.available.interfaces}}) that takes a list of interfaces to use; 
interfaces that don't match the configuration are ignored. Acceptable 
configuration values are the same as for the 
{{dfs.datanode.available.interfaces}} parameter. In addition, we could also add 
an option where clients automatically check whether they can connect to each 
interface they're given, and filter out unreachable ones by default.
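
A hedged sketch of the optional reachability filter mentioned above (the 
timeout is arbitrary, and whether the real client would do this at all is an 
open question in this proposal):

{code}
import java.io.IOException;
import java.net.InetAddress;
import java.util.ArrayList;
import java.util.List;

public class ClientInterfaceFilterSketch {
  // Sketch: keep only the DN addresses this client can actually reach.
  // InetAddress#isReachable is a best-effort check (ICMP or TCP echo).
  static List<InetAddress> reachable(List<InetAddress> addrs, int timeoutMs) {
    List<InetAddress> ok = new ArrayList<InetAddress>();
    for (InetAddress addr : addrs) {
      try {
        if (addr.isReachable(timeoutMs)) {
          ok.add(addr);
        }
      } catch (IOException ignored) {
        // treat as unreachable and filter it out
      }
    }
    return ok;
  }

  public static void main(String[] args) throws Exception {
    List<InetAddress> addrs = new ArrayList<InetAddress>();
    addrs.add(InetAddress.getByName("127.0.0.1"));
    System.out.println(reachable(addrs, 1000));
  }
}
{code}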





[jira] [Created] (HDFS-3148) The client should be able to use multiple local interfaces for data transfer

2012-03-25 Thread Eli Collins (Created) (JIRA)
The client should be able to use multiple local interfaces for data transfer


 Key: HDFS-3148
 URL: https://issues.apache.org/jira/browse/HDFS-3148
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: hdfs client
Reporter: Eli Collins
Assignee: Eli Collins


HDFS-3147 covers using multiple interfaces on the server (Datanode) side. 
Clients should also be able to utilize multiple *local* interfaces for outbound 
connections instead of always using the interface for the local hostname. This 
can be accomplished with a new configuration parameter 
({{dfs.client.local.interfaces}}) that accepts a list of interfaces the client 
should use. Acceptable configuration values are the same as for the 
{{dfs.datanode.available.interfaces}} parameter. The client binds its socket to 
a specific interface, which enables outbound traffic to use that interface. 
Note that binding the client socket to a specific address is not by itself 
sufficient to ensure egress traffic uses that interface; eg if multiple 
interfaces are on the same subnet the host requires IP rules that use the 
source address (which bind sets) to select the outgoing interface. The 
SO_BINDTODEVICE socket option could be used to select a specific interface for 
the connection instead, however it requires JNI (it is not in Java's 
SocketOptions) and root access, which we don't want to require of clients.
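
A minimal sketch of the bind-before-connect behavior described above (host and 
port values are placeholders):

{code}
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;

public class LocalInterfaceBindSketch {
  // Sketch: bind the client socket to a chosen local interface address
  // (ephemeral port), then connect to the DN; outbound traffic then uses that
  // source address, subject to the host's routing/IP rules.
  static Socket connectFrom(InetAddress localAddr, String dnHost, int dnPort)
      throws IOException {
    Socket s = new Socket();
    s.bind(new InetSocketAddress(localAddr, 0));   // 0 = any local port
    s.connect(new InetSocketAddress(dnHost, dnPort), 60 * 1000);
    return s;
  }
}
{code}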

Like HDFS-3147, the client can use multiple local interfaces for data transfer. 
Since clients already cache their connections to DNs, choosing a local 
interface at random seems like a good policy. Users can also pin a specific 
client to a specific interface by specifying just that interface in 
dfs.client.local.interfaces.

This change was discussed in HADOOP-6210 a while back, and is actually useful 
independently of the other HDFS-3140 changes.





[jira] [Created] (HDFS-3149) The client should blacklist failed local/remote network interface pairs

2012-03-25 Thread Eli Collins (Created) (JIRA)
The client should blacklist failed local/remote network interface pairs
---

 Key: HDFS-3149
 URL: https://issues.apache.org/jira/browse/HDFS-3149
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: hdfs client
Reporter: Eli Collins
Assignee: Eli Collins


If a client or worker cannot connect to a given remote address, eg due to a 
network or interface failure, then it should blacklist the local/remote 
interface pair. Only the pair is blacklisted in case the remote interface is 
routable via another local interface. The pair is blacklisted for a 
configurable period of time and another local/remote interface pair is tried. 
For full fault tolerance, the host interfaces need to be connected to different 
switches.
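
A rough sketch of the blacklist bookkeeping (class and field names are 
illustrative; the expiry and retry policy are simplified):

{code}
import java.util.HashMap;
import java.util.Map;

public class InterfacePairBlacklistSketch {
  // Maps "localAddr->remoteAddr" to the time the entry expires.
  private final Map<String, Long> blacklist = new HashMap<String, Long>();
  private final long expiryMs;

  InterfacePairBlacklistSketch(long expiryMs) { this.expiryMs = expiryMs; }

  private static String key(String local, String remote) {
    return local + "->" + remote;
  }

  // Record a failed connection attempt for this local/remote pair only.
  synchronized void markFailed(String local, String remote) {
    blacklist.put(key(local, remote), System.currentTimeMillis() + expiryMs);
  }

  // A pair is usable if it was never blacklisted or its entry has expired.
  synchronized boolean isUsable(String local, String remote) {
    Long expiry = blacklist.get(key(local, remote));
    if (expiry == null) return true;
    if (System.currentTimeMillis() >= expiry) {
      blacklist.remove(key(local, remote));
      return true;
    }
    return false;
  }

  public static void main(String[] args) {
    InterfacePairBlacklistSketch bl = new InterfacePairBlacklistSketch(60000);
    bl.markFailed("10.0.0.1", "10.1.0.1");
    System.out.println(bl.isUsable("10.0.0.1", "10.1.0.1"));  // false
    System.out.println(bl.isUsable("10.0.0.2", "10.1.0.1"));  // true: try another local interface
  }
}
{code}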





[jira] [Created] (HDFS-3150) Add option for clients to contact DNs via hostname in branch-1

2012-03-25 Thread Eli Collins (Created) (JIRA)
Add option for clients to contact DNs via hostname in branch-1
--

 Key: HDFS-3150
 URL: https://issues.apache.org/jira/browse/HDFS-3150
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: data-node, hdfs client
Reporter: Eli Collins
Assignee: Eli Collins


Per the document attached to HADOOP-8198, this is just for branch-1, and 
unbreaks DN multihoming. The datanode can be configured to listen on a bond, or 
on all interfaces by specifying the wildcard in the dfs.datanode.*.address 
configuration options; however, per HADOOP-6867 only the source address of the 
registration is exposed to clients. HADOOP-985 made clients access datanodes by 
IP primarily to avoid the latency of a DNS lookup, which had the side effect of 
breaking DN multihoming. In order to fix it let's add back the option for 
Datanodes to be accessed by hostname. This can be done by:
# Modifying the primary field of the Datanode descriptor to be the hostname, or 
# Modifying Client/Datanode <-> Datanode access to use the hostname field 
instead of the IP

I'd like to go with approach #2 as it does not require making an incompatible 
change to the client protocol, and is much less invasive. It minimizes the 
scope of modification to just places where clients and Datanodes connect, vs 
changing all uses of Datanode identifiers.

New client and Datanode configuration options are introduced:
- {{dfs.client.use.datanode.hostname}} indicates that all client-to-datanode 
connections should use the datanode hostname (as clients outside the cluster 
may not be able to route the IP)
- {{dfs.datanode.use.datanode.hostname}} indicates whether Datanodes should use 
hostnames when connecting to other Datanodes for data transfer

If the configuration options are not used, there is no change in the current 
behavior.
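
A minimal sketch of how the client-side flag might be consulted when choosing 
the address to connect to (the key name is per this proposal; the accessors for 
the DN's IP and hostname are placeholders for the branch-1 fields):

{code}
import org.apache.hadoop.conf.Configuration;

public class DnAddressChoiceSketch {
  // Sketch: pick the hostname when dfs.client.use.datanode.hostname is set,
  // otherwise keep the current IP-based behavior.
  static String dnAddressFor(Configuration conf, String dnIp,
                             String dnHostName, int xferPort) {
    boolean useHostname =
        conf.getBoolean("dfs.client.use.datanode.hostname", false);
    String host = useHostname ? dnHostName : dnIp;
    return host + ":" + xferPort;
  }

  public static void main(String[] args) {
    Configuration conf = new Configuration();
    System.out.println(dnAddressFor(conf, "10.0.0.1", "dn1.example.com", 50010));
  }
}
{code}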

I'm doing something similar to #1 btw in trunk in HDFS-3144 - refactoring the 
use of DatanodeID to use the right field (IP, IP:xferPort, hostname, etc) based 
on the context the ID is being used in, vs always using the IP:xferPort as the 
Datanode's name, and using the name everywhere.





[jira] [Created] (HDFS-3164) Move DatanodeInfo#hostName to DatanodeID

2012-03-29 Thread Eli Collins (Created) (JIRA)
Move DatanodeInfo#hostName to DatanodeID


 Key: HDFS-3164
 URL: https://issues.apache.org/jira/browse/HDFS-3164
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node
Reporter: Eli Collins
Assignee: Eli Collins


Like HDFS-3138 (the ipcPort), the hostName field in DatanodeInfo is not 
ephemeral and should be in DatanodeID. This also allows us to fix up the issue 
where the DatanodeID#name field is overloaded (the DN sets it to a hostname, 
then the NN clobbers it with an IP, and then the DN clobbers its hostName 
field with this IP). If the DN can specify both a "name" and "hostName" in the 
DatanodeID then this code becomes simpler.





[jira] [Created] (HDFS-3171) The DatanodeID "name" field is overloaded

2012-03-31 Thread Eli Collins (Created) (JIRA)
The DatanodeID "name" field is overloaded 
--

 Key: HDFS-3171
 URL: https://issues.apache.org/jira/browse/HDFS-3171
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node
Reporter: Eli Collins
Assignee: Eli Collins


The DatanodeID "name" field is currently overloaded, when the DN creates a 
DatanodeID to register with the NN it sets "name" to be the datanode hostname, 
which is the DN's "hostName" member. This isnot necesarily a FQDN, it is either 
set explicitly or determined by the DNS class, which could return the machine's 
hostname or the result of a DNS lookup, if configured to do so. The NN then 
clobbers the "name" field of the DatanodeID with the IP part of the new 
DatanodeID "name" field it creates (and sets the DatanodeID "hostName" field to 
the reported "name"). The DN gets the DatanodeID back from the NN and clobbers 
its "hostName" member with the "name" field of the returned DatanodeID. This 
makes the code hard to reason about eg DN#getMachine name sometimes returns a 
hostname and sometimes not, depending on when it's called in sequence with the 
registration. Ditto for uses of the "name" field. I think these contortions 
were originally performed because the DatanodeID didn't have a hostName field 
(it was part of DatanodeInfo) and so there was no way to communicate both at 
the same time. Now that the hostName field is in DatanodeID (as of HDFS-3164) 
we can establish the invariant that the "name" field always and only has an IP 
address and the "hostName" field always and only has a hostname.

In HDFS-3144 I'm going to rename the "name" field so its clear that it contains 
an IP address. The above is enough scope for one change.





[jira] [Created] (HDFS-3172) dfs.upgrade.permission is dead code

2012-03-31 Thread Eli Collins (Created) (JIRA)
dfs.upgrade.permission is dead code
---

 Key: HDFS-3172
 URL: https://issues.apache.org/jira/browse/HDFS-3172
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Reporter: Eli Collins
Assignee: Eli Collins
Priority: Trivial


As of HDFS-3137 dfs.upgrade.permission is dead code (was only used for 
upgrading from old, no longer supported releases).





[jira] [Created] (HDFS-3174) Fix assert in TestPendingDataNodeMessages

2012-04-01 Thread Eli Collins (Created) (JIRA)
Fix assert in TestPendingDataNodeMessages
-

 Key: HDFS-3174
 URL: https://issues.apache.org/jira/browse/HDFS-3174
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: Eli Collins
Assignee: Eli Collins


The asserted text in TestPendingDataNodeMessages is missing the DatanodeID 
port number.





[jira] [Created] (HDFS-3199) TestValidateConfigurationSettings is failing

2012-04-04 Thread Eli Collins (Created) (JIRA)
TestValidateConfigurationSettings is failing


 Key: HDFS-3199
 URL: https://issues.apache.org/jira/browse/HDFS-3199
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Eli Collins
Assignee: Todd Lipcon


TestValidateConfigurationSettings is failing on every run.





[jira] [Created] (HDFS-3208) Bogus entries in hosts files are incorrectly displayed in the report

2012-04-05 Thread Eli Collins (Created) (JIRA)
Bogus entries in hosts files are incorrectly displayed in the report 
-

 Key: HDFS-3208
 URL: https://issues.apache.org/jira/browse/HDFS-3208
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 2.0.0
Reporter: Eli Collins
Assignee: Eli Collins


DM#getDatanodeListForReport incorrectly creates the DatanodeID for the "dead" 
report for bogus entries in the hosts files (eg entries with an invalid 
hostname).





[jira] [Created] (HDFS-3209) dfs.namenode.hosts* configuration options are unused

2012-04-05 Thread Eli Collins (Created) (JIRA)
dfs.namenode.hosts* configuration options are unused


 Key: HDFS-3209
 URL: https://issues.apache.org/jira/browse/HDFS-3209
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Eli Collins
Priority: Minor


HDFS-631 introduced dfs.namenode.hosts and dfs.namenode.hosts.exclude but never 
actually used them, so they're dead code (dfs.hosts and dfs.hosts.excludes are 
used instead). IMO the current names are better (even though they're 
inconsistent) so I'd actually prefer we just remove the dead defines.





[jira] [Created] (HDFS-3210) JsonUtil#toJsonMap for a DatanodeInfo should use "ipAddr" instead of "name"

2012-04-05 Thread Eli Collins (Created) (JIRA)
JsonUtil#toJsonMap for a DatanodeInfo should use "ipAddr" instead of "name"
---

 Key: HDFS-3210
 URL: https://issues.apache.org/jira/browse/HDFS-3210
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: Eli Collins
Assignee: Eli Collins
 Attachments: hdfs-3210.txt

In HDFS-3144 I missed a spot when renaming the "name" field. Let's fix that and 
add a test.





[jira] [Created] (HDFS-3216) DatanodeID should support multiple IP addresses

2012-04-06 Thread Eli Collins (Created) (JIRA)
DatanodeID should support multiple IP addresses
---

 Key: HDFS-3216
 URL: https://issues.apache.org/jira/browse/HDFS-3216
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Eli Collins
Assignee: Eli Collins


The DatanodeID has a single field for the IP address, for HDFS-3146 we need to 
extend it to support multiple addresses.





[jira] [Created] (HDFS-3218) The client should be able to use multiple remote DN interfaces for block transfer

2012-04-06 Thread Eli Collins (Created) (JIRA)
The client should be able to use multiple remote DN interfaces for block 
transfer
-

 Key: HDFS-3218
 URL: https://issues.apache.org/jira/browse/HDFS-3218
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: hdfs client
Reporter: Eli Collins
Assignee: Eli Collins


HDFS-3146 and HDFS-3216 expose multiple DN interfaces to the client. In order 
for clients, in aggregate, to use multiple DN interfaces clients should pick 
different interfaces when transferring blocks. Given that we cache client <-> 
DN connections the policy of picking a remote interface at random for each new 
connection seems best (vs round robin for example). In the future we could make 
the client congestion aware. We could also establish multiple connections 
between the client and DN and therefore use multiple interfaces for a single 
block transfer. Both of those are out of scope for this jira.





[jira] [Created] (HDFS-3219) Disambiguate "visible length" in the code and docs

2012-04-06 Thread Eli Collins (Created) (JIRA)
Disambiguate "visible length" in the code and docs
--

 Key: HDFS-3219
 URL: https://issues.apache.org/jira/browse/HDFS-3219
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Eli Collins
Priority: Minor


Per HDFS-2288, there are two definitions of visible length; or rather, we're 
using the same name for two things:

# The HDFS-265 design doc, which defines it as a property of the replica:

{quote}
visible length is the "number of bytes that have been acknowledged by the 
downstream DataNodes". It is replica (not block) specific, meaning it can be 
different for different replicas at a given time. In the document it is called 
BA (bytes acknowledged), compared to BR (bytes received).
{quote}

# The definition in HDFS-814 and DFSClient#getVisibleLength which defines it as 
a property of a file:

{quote}
The visible length is the length that *all* datanodes in the pipeline contain 
at least such amount of data. Therefore, these data are visible to the readers.

According to this definition the visible length of a file is the floor of all 
visible lengths of all the replicas of the last block. It's a static property 
set on open, eg is not updated when a writer calls hflush. Also 
DFSInputStream#readBlockLength returns the 1st visible length of a replica it 
finds, so it seems possible (though unlikely) in a failure scenario it could 
return a length that was longer than what all replicas had.
{quote}

This has caused confusion in a number of other jiras. We should update the 
design doc and javadoc, and perhaps rename DFSClient#getVisibleLength etc to 
disambiguate this.





[jira] [Created] (HDFS-3220) Improve some block recovery log messages

2012-04-06 Thread Eli Collins (Created) (JIRA)
Improve some block recovery log messages


 Key: HDFS-3220
 URL: https://issues.apache.org/jira/browse/HDFS-3220
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Eli Collins


FsDatasetImpl has three cases that throw exceptions with the message "THIS IS 
NOT SUPPOSED TO HAPPEN". These could happen in real life (eg with a corrupt 
block file). Let's improve these messages to indicate what case we've actually 
hit instead of this message, which isn't very useful. 





[jira] [Created] (HDFS-3221) Update docs for HDFS-3140 (multiple interfaces)

2012-04-06 Thread Eli Collins (Created) (JIRA)
Update docs for HDFS-3140 (multiple interfaces)
---

 Key: HDFS-3221
 URL: https://issues.apache.org/jira/browse/HDFS-3221
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: documentation
Reporter: Eli Collins
Assignee: Eli Collins


Need to update the docs to cover:
- How to configure multihoming (binding to the wildcard, the default)
- The new client and server configuration options





[jira] [Created] (HDFS-3224) Bug in check for DN re-registration with different storage ID

2012-04-06 Thread Eli Collins (Created) (JIRA)
Bug in check for DN re-registration with different storage ID
-

 Key: HDFS-3224
 URL: https://issues.apache.org/jira/browse/HDFS-3224
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Eli Collins
Priority: Minor


DatanodeManager#registerDatanode checks the host-to-node map using an IP:port 
key; however, the map is keyed on IP, so this check will always fail. It's 
performing the check to determine if a DN with the same IP and storage ID has 
already registered, and if so to remove this DN from the map and indicate that, 
eg, it's no longer hosting those blocks. This bug has been here forever.
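
A small sketch of the mismatch (the map usage is simplified; the real 
Host2NodesMap and DatanodeDescriptor types are not shown):

{code}
import java.util.HashMap;
import java.util.Map;

public class HostMapKeySketch {
  public static void main(String[] args) {
    // The map is keyed on the bare IP...
    Map<String, String> host2node = new HashMap<String, String>();
    host2node.put("10.0.0.1", "datanodeDescriptor");

    // ...but the re-registration check looks it up with IP:port, so it never
    // finds the previously registered node.
    System.out.println(host2node.get("10.0.0.1:50010"));  // null (the bug)
    System.out.println(host2node.get("10.0.0.1"));        // found (the fix)
  }
}
{code}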





[jira] [Created] (HDFS-3230) Cleanup DatanodeID creation in the tests

2012-04-09 Thread Eli Collins (Created) (JIRA)
Cleanup DatanodeID creation in the tests


 Key: HDFS-3230
 URL: https://issues.apache.org/jira/browse/HDFS-3230
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: test
Reporter: Eli Collins
Assignee: Eli Collins
Priority: Minor


A lot of tests create dummy DatanodeIDs for testing, often using bogus values 
when creating the objects (eg a hostname in the IP field), which they can get 
away with because the IDs aren't actually used. Let's add a test utility method 
for creating a DatanodeID for testing and use it throughout.
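
A hedged sketch of what such a helper might look like (the DatanodeID 
constructor signature has changed across versions, so a stand-in type is used 
here; the point is one shared helper with sane dummy values):

{code}
public class DataNodeTestUtilsSketch {
  // Hypothetical stand-in for the real DatanodeID; the actual helper would
  // return org.apache.hadoop.hdfs.protocol.DatanodeID.
  static class FakeDatanodeID {
    final String ip, hostName, storageID;
    final int xferPort, infoPort, ipcPort;
    FakeDatanodeID(String ip, String hostName, String storageID,
                   int xferPort, int infoPort, int ipcPort) {
      this.ip = ip; this.hostName = hostName; this.storageID = storageID;
      this.xferPort = xferPort; this.infoPort = infoPort; this.ipcPort = ipcPort;
    }
  }

  // One shared helper: an IP in the IP field, a hostname in the hostname
  // field, and the default ports, instead of ad-hoc bogus values per test.
  static FakeDatanodeID getLocalDatanodeID() {
    return new FakeDatanodeID("127.0.0.1", "localhost", "storage-0000",
        50010, 50075, 50020);
  }

  public static void main(String[] args) {
    FakeDatanodeID id = getLocalDatanodeID();
    System.out.println(id.ip + ":" + id.xferPort + " (" + id.hostName + ")");
  }
}
{code}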





[jira] [Created] (HDFS-3231) NN Host2NodesMap should use hostnames

2012-04-09 Thread Eli Collins (Created) (JIRA)
NN Host2NodesMap should use hostnames
-

 Key: HDFS-3231
 URL: https://issues.apache.org/jira/browse/HDFS-3231
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Eli Collins
Assignee: Eli Collins


The NN's Host2NodesMap maps "host names" to datanode descriptors. It actually 
uses IP addresses and should use hostnames instead, as hostnames are a better 
key (eg a Datanode has one hostname but may have multiple IPs). Per HDFS-3216 
there's actually a bug in that it's sometimes accessed with IP:port instead of 
IP, so that jira should be fixed before this one.





[jira] [Created] (HDFS-3232) Cleanup DatanodeInfo vs DatanodeID handling in DN servlets

2012-04-09 Thread Eli Collins (Created) (JIRA)
Cleanup DatanodeInfo vs DatanodeID handling in DN servlets
--

 Key: HDFS-3232
 URL: https://issues.apache.org/jira/browse/HDFS-3232
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Eli Collins
Assignee: Eli Collins
Priority: Minor


The DN servlets currently have code like the following:
{code}
  final String hostname = host instanceof DatanodeInfo 
  ? ((DatanodeInfo)host).getHostName() : host.getIpAddr();
{code}

I believe this is outdated and that we now always get one or the other (at 
least when not running the tests). Need to verify that. We should clean this 
code up as well, eg always use the IP (which we'll look up the FQDN for) since 
the hostname isn't necessarily valid to put in a URL (the DN hostname isn't 
necessarily an FQDN).





[jira] [Created] (HDFS-3233) Move IP to FQDN conversion from DatanodeJSPHelper to DatanodeID

2012-04-09 Thread Eli Collins (Created) (JIRA)
Move IP to FQDN conversion from DatanodeJSPHelper to DatanodeID
---

 Key: HDFS-3233
 URL: https://issues.apache.org/jira/browse/HDFS-3233
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Eli Collins
Assignee: Eli Collins
Priority: Minor


In a handful of places DatanodeJSPHelper looks up the IP for a DN and then 
determines an FQDN for the IP. We should move this code to a single place, eg a 
new DatanodeID method that returns the FQDN for a DatanodeID.
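
A minimal sketch of the conversion itself, as it might live behind a single 
DatanodeID method (the method placement is this proposal; the lookup below is 
just standard java.net):

{code}
import java.net.InetAddress;
import java.net.UnknownHostException;

public class FqdnLookupSketch {
  // Sketch: resolve the DN's IP to an FQDN in one place, falling back to the
  // raw IP if the reverse lookup fails.
  static String fqdnForIp(String ip) {
    try {
      return InetAddress.getByName(ip).getCanonicalHostName();
    } catch (UnknownHostException e) {
      return ip;
    }
  }

  public static void main(String[] args) {
    System.out.println(fqdnForIp("127.0.0.1"));
  }
}
{code}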





[jira] [Created] (HDFS-3237) DatanodeInfo should have a DatanodeID rather than extend it

2012-04-09 Thread Eli Collins (Created) (JIRA)
DatanodeInfo should have a DatanodeID rather than extend it
---

 Key: HDFS-3237
 URL: https://issues.apache.org/jira/browse/HDFS-3237
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Eli Collins
Assignee: Eli Collins
Priority: Minor


DatanodeInfo currently extends DatanodeID; the code would be clearer if it had 
a DatanodeID member instead, as DatanodeInfo is private to the server side and 
DatanodeID is passed to clients.





[jira] [Created] (HDFS-3238) ServerCommand and friends don't need to be writables

2012-04-09 Thread Eli Collins (Created) (JIRA)
ServerCommand and friends don't need to be writables


 Key: HDFS-3238
 URL: https://issues.apache.org/jira/browse/HDFS-3238
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.0.0
Reporter: Eli Collins
Assignee: Eli Collins
 Attachments: hdfs-3238.txt

We can remove the writable infrastructure from the ServerCommand classes as 
they're not used across clients and we're using PB within the server side.





[jira] [Created] (HDFS-3244) Remove dead writable code from hdfs/protocol

2012-04-10 Thread Eli Collins (Created) (JIRA)
Remove dead writable code from hdfs/protocol


 Key: HDFS-3244
 URL: https://issues.apache.org/jira/browse/HDFS-3244
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Eli Collins
Assignee: Eli Collins


While doing HDFS-3238 I noticed that there's more dead writable code in 
hdfs/protocol. Let's remove it.





[jira] [Created] (HDFS-3250) Get the fuse-dfs test running

2012-04-10 Thread Eli Collins (Created) (JIRA)
Get the fuse-dfs test running
-

 Key: HDFS-3250
 URL: https://issues.apache.org/jira/browse/HDFS-3250
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: contrib/fuse-dfs, test
Reporter: Eli Collins


Now that fuse-dfs is building again (HDFS-2696) let's get the test running.





[jira] [Created] (HDFS-3251) Mavenize the fuse-dfs build

2012-04-10 Thread Eli Collins (Created) (JIRA)
Mavenize the fuse-dfs build 


 Key: HDFS-3251
 URL: https://issues.apache.org/jira/browse/HDFS-3251
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: build, contrib/fuse-dfs
Reporter: Eli Collins


The fuse-dfs build still uses the old ant-based build; let's integrate it as 
part of the maven build. Looks like we need to introduce sub-directories under 
src/main/native, as libhdfs is there (without its own subdirectory).





[jira] [Created] (HDFS-3252) Include fuse-dfs in the tarball

2012-04-10 Thread Eli Collins (Created) (JIRA)
Include fuse-dfs in the tarball
---

 Key: HDFS-3252
 URL: https://issues.apache.org/jira/browse/HDFS-3252
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: build, contrib/fuse-dfs
Reporter: Eli Collins


The fuse-dfs binary needs to be included in the binary tarball.





[jira] [Created] (HDFS-3258) Test for HADOOP-8144 (pseudoSortByDistance in NetworkTopology for first rack local node)

2012-04-11 Thread Eli Collins (Created) (JIRA)
Test for HADOOP-8144 (pseudoSortByDistance in NetworkTopology for first rack 
local node)


 Key: HDFS-3258
 URL: https://issues.apache.org/jira/browse/HDFS-3258
 Project: Hadoop HDFS
  Issue Type: Test
  Components: test
Reporter: Eli Collins
Assignee: Junping Du


For updating TestNetworkTopology to cover HADOOP-8144.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HDFS-2502) hdfs-default.xml should include dfs.name.dir.restore

2011-10-25 Thread Eli Collins (Created) (JIRA)
hdfs-default.xml should include dfs.name.dir.restore


 Key: HDFS-2502
 URL: https://issues.apache.org/jira/browse/HDFS-2502
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: documentation
Affects Versions: 0.23.0
Reporter: Eli Collins
Priority: Minor








[jira] [Created] (HDFS-2514) Link resolution bug for intermediate symlinks with relative targets

2011-10-29 Thread Eli Collins (Created) (JIRA)
Link resolution bug for intermediate symlinks with relative targets
---

 Key: HDFS-2514
 URL: https://issues.apache.org/jira/browse/HDFS-2514
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 0.21.0, 0.22.0, 0.23.0
Reporter: Eli Collins
Assignee: Eli Collins


There's a bug in the way the Namenode resolves intermediate symlinks (ie the 
symlink is not the final path component) in paths when the symlink's target is 
a relative path. Will post the full description in the first comment.





[jira] [Created] (HDFS-2534) Remove RemoteBlockReader and rename RemoteBlockReader2

2011-11-02 Thread Eli Collins (Created) (JIRA)
Remove RemoteBlockReader and rename RemoteBlockReader2
--

 Key: HDFS-2534
 URL: https://issues.apache.org/jira/browse/HDFS-2534
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node
Affects Versions: 0.24.0
Reporter: Eli Collins


HDFS-2129 introduced a new BlockReader implementation and preserved the old 
one, which can be selected via a config option as a fallback in 23. For 24 let's 
remove RemoteBlockReader, rename RemoteBlockReader2, and remove the config 
option.





[jira] [Created] (HDFS-2556) HDFS tests fail on systems with umask 0002

2011-11-16 Thread Eli Collins (Created) (JIRA)
HDFS tests fail on systems with umask 0002
--

 Key: HDFS-2556
 URL: https://issues.apache.org/jira/browse/HDFS-2556
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node, test
Affects Versions: 0.20.206.0
Reporter: Eli Collins
Priority: Minor


On systems with umask 0002 tests will fail due to all data dir directories 
being invalid:

{noformat}
2011-11-16 14:19:53,879 WARN  datanode.DataNode (DataNode.java:makeInstance(1569)) - Invalid directory in dfs.data.dir: Incorrect permission for /data/2/eli/src/hadoop2/build/test/data/dfs/data/data1, expected: rwxr-xr-x, while actual: rwxrwxr-x
2011-11-16 14:19:53,893 WARN  datanode.DataNode (DataNode.java:makeInstance(1569)) - Invalid directory in dfs.data.dir: Incorrect permission for /data/2/eli/src/hadoop2/build/test/data/dfs/data/data2, expected: rwxr-xr-x, while actual: rwxrwxr-x
2011-11-16 14:19:53,894 ERROR datanode.DataNode (DataNode.java:makeInstance(1575)) - All directories in dfs.data.dir are invalid.
{noformat}

Aside from changing the umask, backporting HDFS-1560 fixed this issue.





[jira] [Created] (HDFS-2570) Add descriptions for dfs.*.https.address in hdfs-default.xml

2011-11-19 Thread Eli Collins (Created) (JIRA)
Add descriptions for dfs.*.https.address in hdfs-default.xml


 Key: HDFS-2570
 URL: https://issues.apache.org/jira/browse/HDFS-2570
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: documentation
Affects Versions: 0.23.0
Reporter: Eli Collins
Assignee: Eli Collins
Priority: Trivial
 Attachments: hdfs-2570-1.patch

Let's add descriptions for dfs.*.https.address in hdfs-default.xml.





[jira] [Created] (HDFS-2596) TestDirectoryScanner doesn't test parallel scans

2011-11-27 Thread Eli Collins (Created) (JIRA)
TestDirectoryScanner doesn't test parallel scans


 Key: HDFS-2596
 URL: https://issues.apache.org/jira/browse/HDFS-2596
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node, test
Affects Versions: 0.23.0, 0.22.0
Reporter: Eli Collins
Assignee: Eli Collins
 Attachments: hdfs-2596-1.patch

The code from HDFS-854 below doesn't run the test with parallel scanning. They 
probably intended "parallelism < 3".

{code}
+  public void testDirectoryScanner() throws Exception {
+// Run the test with and without parallel scanning
+for (int parallelism = 1; parallelism < 2; parallelism++) {
+  runTest(parallelism);
+}
+  }
{code}





[jira] [Created] (HDFS-2607) Use named daemon threads for the directory scanner

2011-11-29 Thread Eli Collins (Created) (JIRA)
Use named daemon threads for the directory scanner
--

 Key: HDFS-2607
 URL: https://issues.apache.org/jira/browse/HDFS-2607
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 0.23.0, 0.21.0, 0.22.0
Reporter: Eli Collins
 Fix For: 0.23.1


HDFS-854 added a thread pool for block scanners. It would be better to use a 
factory that names the threads and daemonizes them so they don't block shutdown.
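
A sketch of the kind of factory that could be passed to the scanner's thread 
pool (a plain java.util.concurrent example; the real change would plug this 
into the existing executor):

{code}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class NamedDaemonThreadsSketch {
  // Factory that names each thread and marks it as a daemon so the pool
  // doesn't block JVM shutdown.
  static ThreadFactory namedDaemonFactory(final String prefix) {
    return new ThreadFactory() {
      private final AtomicInteger count = new AtomicInteger(0);
      public Thread newThread(Runnable r) {
        Thread t = new Thread(r, prefix + "-" + count.incrementAndGet());
        t.setDaemon(true);
        return t;
      }
    };
  }

  public static void main(String[] args) throws InterruptedException {
    ExecutorService pool =
        Executors.newFixedThreadPool(4, namedDaemonFactory("DirectoryScanner"));
    pool.submit(new Runnable() {
      public void run() {
        System.out.println(Thread.currentThread().getName());
      }
    });
    pool.shutdown();
    pool.awaitTermination(5, TimeUnit.SECONDS);
  }
}
{code}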





[jira] [Created] (HDFS-2610) Make dashes vs dots consistent in config key names

2011-11-29 Thread Eli Collins (Created) (JIRA)
Make dashes vs dots consistent in config key names
--

 Key: HDFS-2610
 URL: https://issues.apache.org/jira/browse/HDFS-2610
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Eli Collins
Priority: Minor


The use of dashes vs dots in the config keys is inconsistent (eg https.address 
vs http-address). Let's make them all consistent (no dashes seems most 
consistent) and add the necessary deprecations in HdfsConfiguration.java. We 
should do the same in common and MR so we're not inconsistent there.





[jira] [Created] (HDFS-2611) Build and publish indexed source code

2011-11-29 Thread Eli Collins (Created) (JIRA)
Build and publish indexed source code
-

 Key: HDFS-2611
 URL: https://issues.apache.org/jira/browse/HDFS-2611
 Project: Hadoop HDFS
  Issue Type: Task
  Components: documentation
Reporter: Eli Collins


The HBase folks publish xref which produces pages like 
http://hbase.apache.org/xref/org/apache/hadoop/hbase/client/Delete.html. It's 
quite nice: it makes their code indexable by Google, and, since it understands 
Java, it's easy to move around between classes. Let's do this as well. Here's 
the maven plugin: http://maven.apache.org/plugins/maven-jxr-plugin/jxr-mojo.html






[jira] [Created] (HDFS-2631) Rewrite fuse-dfs to use the webhdfs protocol

2011-12-05 Thread Eli Collins (Created) (JIRA)
Rewrite fuse-dfs to use the webhdfs protocol


 Key: HDFS-2631
 URL: https://issues.apache.org/jira/browse/HDFS-2631
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: contrib/fuse-dfs
Reporter: Eli Collins


We should port the implementation of fuse-dfs to use the webhdfs protocol. This 
has a number of benefits:
* Compatibility - allows a single fuse client to work across server versions
* Works with both WebHDFS and Hoop since they are protocol compatible
* Removes the overhead related to libhdfs (forking a jvm)
* Makes it easier to support features like security





[jira] [Created] (HDFS-2633) BPOfferService#isAlive is poorly named

2011-12-05 Thread Eli Collins (Created) (JIRA)
BPOfferService#isAlive is poorly named
--

 Key: HDFS-2633
 URL: https://issues.apache.org/jira/browse/HDFS-2633
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node
Affects Versions: 0.23.0
Reporter: Eli Collins
Priority: Minor


Per HDFS-2627 the current implementation returns true even if one of the actor 
threads is dead. "The only non-test use case for isAlive seems to be from 
BlockPoolSliceScanner and DataBlockScanner, where they're really trying to 
figure out whether they should stop scanning the block pool. If the BPOS is 
connected to any NN at all (regardless of active/standby) it needs to report 
true so that the scanners don't stop running. It would be nice to clean up 
these calls and specify in their function name that they're only meant for use 
in tests" and annotate @VisibleForTesting.



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HDFS-2637) The rpc timeout for block recovery is too low

2011-12-06 Thread Eli Collins (Created) (JIRA)
The rpc timeout for block recovery is too low 
--

 Key: HDFS-2637
 URL: https://issues.apache.org/jira/browse/HDFS-2637
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs client
Affects Versions: 1.0.0
Reporter: Eli Collins
Assignee: Eli Collins


The RPC timeout for block recovery does not account for the fact that recoverBlock 
itself issues multiple RPCs. This can cause recovery to fail if the network is 
congested or the DNs are busy.
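
A rough sketch of the idea, with hypothetical names and values (not the committed 
fix): size the client-side timeout to cover the RPCs that recoverBlock issues on 
its own.

{code}
// Hypothetical sketch: the per-RPC timeout and fan-out count are assumptions.
public class RecoveryTimeoutSketch {
  static int recoveryTimeoutMs(int perRpcTimeoutMs, int numReplicas) {
    // the client's recoverBlock call waits on one RPC per replica plus its own
    return perRpcTimeoutMs * (numReplicas + 1);
  }

  public static void main(String[] args) {
    System.out.println(recoveryTimeoutMs(60000, 3)); // 240000 ms
  }
}
{code}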

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HDFS-2638) Improve a block recovery log

2011-12-06 Thread Eli Collins (Created) (JIRA)
Improve a block recovery log
---

 Key: HDFS-2638
 URL: https://issues.apache.org/jira/browse/HDFS-2638
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Affects Versions: 1.0.0
Reporter: Eli Collins
Assignee: Eli Collins
Priority: Minor


It would be useful to know whether an attempt to recover a block is failing 
because the block was already recovered (has a new GS) or the block is missing.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HDFS-2639) A client may fail during block recovery even if its request to recover a block succeeds

2011-12-06 Thread Eli Collins (Created) (JIRA)
A client may fail during block recovery even if its request to recover a block 
succeeds
---

 Key: HDFS-2639
 URL: https://issues.apache.org/jira/browse/HDFS-2639
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs client
Affects Versions: 1.0.0
Reporter: Eli Collins


The client gets stuck in the following loop if an RPC it issued to recover a 
block timed out:

{noformat}
DataStreamer#run
1.  processDatanodeError
2. DN#recoverBlock
3.DN#syncBlock
4.   NN#nextGenerationStamp
5.  sleep 1s
6.  goto 1
{noformat}

Once we've timed out once at step 2 and looped, step 2 throws an IOE because the 
block is already being recovered, and step 4 throws an IOE because the block GS 
is now out of date (the previous, timed-out request got a new GS and updated 
the block). Eventually the client reaches max retries, considers all DNs bad, 
and close throws an IOE.

The client should be able to succeed if one of its requests to recover the 
block succeeded. It should still fail if another client (eg HBase via 
recoverLease or the NN via releaseLease) successfully recovered the block. One 
way to handle this would be not to time out the request to recover the block. 
Another would be to make a subsequent call to recoverBlock succeed, eg by 
updating the block's sequence number to the latest value set by the same client 
in the previous request (ie it can recover over itself but not over another 
client).
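
A rough sketch of the second option, with hypothetical names (an illustration of 
the idea, not a patch): allow a retry from the same client to succeed even though 
its earlier, timed-out attempt already bumped the generation stamp.

{code}
// Hypothetical helper: decide whether a recoverBlock retry may proceed.
public class RecoveryRetrySketch {
  static boolean mayRecover(String requestingClient, String lastRecoveringClient,
                            long requestGenStamp, long currentGenStamp) {
    if (requestGenStamp == currentGenStamp) {
      return true; // normal case: the client has the current generation stamp
    }
    // the stamp moved, but only because this same client's previous (timed-out)
    // attempt updated it, so let it recover "over itself" but not over others
    return requestingClient.equals(lastRecoveringClient);
  }
}
{code}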

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HDFS-2653) DFSClient should cache whether addrs are non-local when short-circuiting is enabled

2011-12-09 Thread Eli Collins (Created) (JIRA)
DFSClient should cache whether addrs are non-local when short-circuiting is 
enabled
---

 Key: HDFS-2653
 URL: https://issues.apache.org/jira/browse/HDFS-2653
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node
Affects Versions: 0.23.1, 1.0.0
Reporter: Eli Collins
Assignee: Eli Collins


Something Todd mentioned to me offline: currently DFSClient doesn't cache the 
fact that non-local addresses are non-local, so if short-circuiting is enabled, 
every time we create a block reader we'll go through the isLocalAddress code 
path. We should cache the fact that an addr is non-local as well.
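
A minimal sketch of the idea (not the DFSClient code): cache both outcomes of the 
local-address check so non-local datanode addresses are resolved only once.

{code}
// Sketch only: cache local *and* non-local results of the address check.
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class LocalAddressCache {
  private static final Map<String, Boolean> cache =
      new ConcurrentHashMap<String, Boolean>();

  static boolean isLocalAddress(InetAddress addr) {
    Boolean cached = cache.get(addr.getHostAddress());
    if (cached != null) {
      return cached;                         // hit: local or non-local
    }
    boolean local;
    try {
      local = addr.isAnyLocalAddress() || addr.isLoopbackAddress()
          || NetworkInterface.getByInetAddress(addr) != null;
    } catch (SocketException e) {
      local = false;
    }
    cache.put(addr.getHostAddress(), local); // cache negative results too
    return local;
  }
}
{code}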

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HDFS-2654) Make BlockReaderLocal not extend RemoteBlockReader2

2011-12-09 Thread Eli Collins (Created) (JIRA)
Make BlockReaderLocal not extend RemoteBlockReader2
---

 Key: HDFS-2654
 URL: https://issues.apache.org/jira/browse/HDFS-2654
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node
Affects Versions: 0.23.1, 1.0.0
Reporter: Eli Collins
Assignee: Eli Collins


The BlockReaderLocal code paths are easier to understand (especially true on 
branch-1 where BlockReaderLocal inherits code from BlockReader and 
FSInputChecker) if the local and remote block reader implementations are 
independent, and they're not really sharing much code anyway. If for some 
reason they start to share significant code we can make the BlockReader 
interface an abstract class.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HDFS-2655) BlockReaderLocal#skip performs unnecessary IO

2011-12-09 Thread Eli Collins (Created) (JIRA)
BlockReaderLocal#skip performs unnecessary IO
-

 Key: HDFS-2655
 URL: https://issues.apache.org/jira/browse/HDFS-2655
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node
Affects Versions: 0.23.1
Reporter: Eli Collins


Per HDFS-2654, BlockReaderLocal#skip performs the skip by reading the data so we 
stay in sync with checksums. This could be implemented more efficiently in the 
future by skipping to the beginning of the appropriate checksum chunk and then 
reading only up to the target offset within that chunk.
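
A rough sketch of the proposed optimization, with hypothetical names: compute the 
start of the checksum chunk containing the target offset, seek there, and 
read/verify only the remaining bytes up to the target.

{code}
// Sketch only: chunk-aligned seek for a checksummed skip.
public class SkipSketch {
  static long chunkAlignedSeek(long targetOffset, int bytesPerChecksum) {
    // position of the first byte of the chunk that contains targetOffset
    return (targetOffset / bytesPerChecksum) * bytesPerChecksum;
  }

  public static void main(String[] args) {
    long seekTo = chunkAlignedSeek(70000, 512);
    long toRead = 70000 - seekTo;  // only this much needs to be read/verified
    System.out.println(seekTo + " " + toRead);
  }
}
{code}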

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HDFS-2657) TestHttpFSServer and TestServerWebApp are failing on trunk

2011-12-10 Thread Eli Collins (Created) (JIRA)
TestHttpFSServer and TestServerWebApp are failing on trunk
--

 Key: HDFS-2657
 URL: https://issues.apache.org/jira/browse/HDFS-2657
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Eli Collins



>>> org.apache.hadoop.fs.http.server.TestHttpFSServer.instrumentation
>>> org.apache.hadoop.lib.servlet.TestServerWebApp.lifecycle

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HDFS-2658) HttpFS introduced 70 javadoc warnings

2011-12-10 Thread Eli Collins (Created) (JIRA)
HttpFS introduced 70 javadoc warnings
-

 Key: HDFS-2658
 URL: https://issues.apache.org/jira/browse/HDFS-2658
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.23.1
Reporter: Eli Collins
Assignee: Alejandro Abdelnur


{noformat}
hadoop1 (trunk)$ grep warning javadoc.txt |grep -c httpfs
70
{noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HDFS-2659) 20 "Cannot find annotation method 'value()'" of LimitedPrivate javadoc warnings

2011-12-10 Thread Eli Collins (Created) (JIRA)
20 "Cannot find annotation method 'value()'" of LimitedPrivate javadoc warnings
---

 Key: HDFS-2659
 URL: https://issues.apache.org/jira/browse/HDFS-2659
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Eli Collins


There are 20 of the following warnings on trunk:

Cannot find annotation method 'value()' in type 
'org.apache.hadoop.classification.InterfaceAudience.LimitedPrivate'



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HDFS-2677) HA: Web UI should indicate the NN state

2011-12-13 Thread Eli Collins (Created) (JIRA)
HA: Web UI should indicate the NN state
---

 Key: HDFS-2677
 URL: https://issues.apache.org/jira/browse/HDFS-2677
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ha
Affects Versions: HA branch (HDFS-1623)
Reporter: Eli Collins
Assignee: Eli Collins


The DFS web UI should indicate whether it's an active or standby.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HDFS-2679) Add interface to query current state to HAServiceProtocol

2011-12-13 Thread Eli Collins (Created) (JIRA)
Add interface to query current state to HAServiceProtocol 
--

 Key: HDFS-2679
 URL: https://issues.apache.org/jira/browse/HDFS-2679
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: HA branch (HDFS-1623)
Reporter: Eli Collins
Assignee: Eli Collins


Let's add an interface to HAServiceProtocol to query the current state of a 
NameNode for use by the CLI (HAAdmin) and Web UI (HDFS-2677). This 
essentially makes the names "active" and "standby" from ACTIVE_STATE and 
STANDBY_STATE public interfaces, which IMO seems reasonable. Unlike the other 
APIs, we should be able to use this interface even when HA is not enabled (since 
by default a non-HA NN is active).
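
A sketch of what such an addition might look like; the method name, return type, 
and enum here are assumptions, not the committed HAServiceProtocol API.

{code}
// Illustrative only: an accessor that exposes the NN's HA state to tools.
import java.io.IOException;

public interface HAStateQuery {
  /** Public names for ACTIVE_STATE and STANDBY_STATE. */
  enum HAServiceState { ACTIVE, STANDBY }

  /** Returns the current state; a non-HA NN would report ACTIVE. */
  HAServiceState getServiceState() throws IOException;
}
{code}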

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HDFS-2701) Cleanup FS* processIOError methods

2011-12-17 Thread Eli Collins (Created) (JIRA)
Cleanup FS* processIOError methods
--

 Key: HDFS-2701
 URL: https://issues.apache.org/jira/browse/HDFS-2701
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Affects Versions: 0.20.205.0
Reporter: Eli Collins
Assignee: Eli Collins
 Fix For: 1.1.0


Let's rename the various "processIOError" methods to be more descriptive. The 
current code makes it difficult to identify and reason about bug fixes. While 
we're at it let's remove "Fatal" from the "Unable to sync the edit log" log 
since it's not actually a fatal error (this is confusing to users). And 2NN 
"Checkpoint done" should be info, not a warning (also confusing to users).

Thanks to HDFS-1073 these issues don't exist on trunk or 23.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HDFS-2702) A single failed name dir can cause the NN to exit

2011-12-17 Thread Eli Collins (Created) (JIRA)
A single failed name dir can cause the NN to exit 
--

 Key: HDFS-2702
 URL: https://issues.apache.org/jira/browse/HDFS-2702
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.20.205.0
Reporter: Eli Collins
Assignee: Eli Collins
Priority: Critical
 Fix For: 1.1.0


There's a bug in FSEditLog#rollEditLog which results in the NN process exiting 
if a single name dir has failed. Here's the relevant code:

{code}
close()  // So editStreams.size() is 0 
foreach edits dir {
  ..
  eStream = new ...  // Might get an IOE here
  editStreams.add(eStream);
} catch (IOException ioe) {
  removeEditsForStorageDir(sd);  // exits if editStreams.size() <= 1  
}
{code}

If we get an IOException before we've added two edits streams to the list we'll 
exit, eg if there's an error processing the 1st name dir we'll exit even if 
there are 4 valid name dirs. The fix is to move the checking out of 
removeEditsForStorageDir (nee processIOError) or modify it so it can be 
disabled in some cases, eg here where we don't yet know how many streams are 
valid.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HDFS-2703) removedStorageDirs is not updated everywhere we remove a storage dir

2011-12-17 Thread Eli Collins (Created) (JIRA)
removedStorageDirs is not updated everywhere we remove a storage dir


 Key: HDFS-2703
 URL: https://issues.apache.org/jira/browse/HDFS-2703
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 1.0.0
Reporter: Eli Collins
Assignee: Eli Collins


There are a number of places (FSEditLog#open, purgeEditLog, and rollEditLog) 
where we remove a storage directory but don't add it to the removedStorageDirs 
list. This means a storage dir may have been removed but we don't see it in the 
log or Web UI. This doesn't affect trunk/23 since the code there is totally 
different.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HDFS-2704) NameNodeResourceChecker#checkAvailableResources should check for inodes

2011-12-17 Thread Eli Collins (Created) (JIRA)
NameNodeResourceChecker#checkAvailableResources should check for inodes
--

 Key: HDFS-2704
 URL: https://issues.apache.org/jira/browse/HDFS-2704
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Affects Versions: 0.24.0
Reporter: Eli Collins


NameNodeResourceChecker#checkAvailableResources currently just checks for free 
space. However, inodes are also a file system resource that needs to be 
available (you can run out of inodes but still have free space).
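
One possible approach, sketched with hypothetical names and a df-based check (not 
the NameNodeResourceChecker implementation): shell out to df -i and compare the 
free inode count to a threshold.

{code}
// Sketch only: parse `df -i <dir>`; output format varies by platform.
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

public class InodeCheck {
  static long freeInodes(String dir) throws IOException, InterruptedException {
    Process p = new ProcessBuilder("df", "-i", dir).start();
    BufferedReader r = new BufferedReader(new InputStreamReader(p.getInputStream()));
    r.readLine();                          // skip the header line
    String[] fields = r.readLine().trim().split("\\s+");
    p.waitFor();
    return Long.parseLong(fields[3]);      // IFree column on a typical Linux df -i
  }

  public static void main(String[] args) throws Exception {
    System.out.println(freeInodes("/") > 10000 ? "ok" : "low on inodes");
  }
}
{code}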

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HDFS-2708) Stats for the total # blocks and blocks per DN

2011-12-20 Thread Eli Collins (Created) (JIRA)
Stats for the total # blocks and blocks per DN
--

 Key: HDFS-2708
 URL: https://issues.apache.org/jira/browse/HDFS-2708
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node, name-node
Reporter: Eli Collins
Priority: Minor


It would be useful for tools to be able to retrieve the total # of blocks in the 
file system (and also display it, eg via the dfsadmin report; this is currently only 
available via FSNamesystemMetrics, so perhaps add it to ClientProtocol#getStats?) 
and the total number of blocks on each datanode (via DatanodeInfo).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HDFS-2715) start-dfs.sh falsely warns about processes already running

2011-12-21 Thread Eli Collins (Created) (JIRA)
start-dfs.sh falsely warns about processes already running
--

 Key: HDFS-2715
 URL: https://issues.apache.org/jira/browse/HDFS-2715
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: scripts
Affects Versions: 0.24.0
Reporter: Eli Collins


The sbin script pid detection is broken. Running start-dfs.sh prints the 
following even if no processes are running and the pid dir is empty 
before starting.

{noformat}
hadoop-0.24.0-SNAPSHOT $ ./sbin/start-dfs.sh 
Starting namenodes on [localhost localhost]
localhost: starting namenode, logging to 
/home/eli/hadoop/dirs1/logs/eli/hadoop-eli-namenode-eli-thinkpad.out
localhost: namenode running as process 25256. Stop it first.
{noformat}

This may be in 23 as well.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HDFS-2731) Autopopulate standby name dirs if they're empty

2011-12-29 Thread Eli Collins (Created) (JIRA)
Autopopulate standby name dirs if they're empty
---

 Key: HDFS-2731
 URL: https://issues.apache.org/jira/browse/HDFS-2731
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ha
Affects Versions: HA branch (HDFS-1623)
Reporter: Eli Collins
Assignee: Eli Collins


To set up a SBN we currently format the primary then manually copy the name dirs 
to the SBN. The SBN should do this automatically. Specifically, on NN startup, 
if HA with a shared edits dir is configured and populated, and the SBN has empty 
name dirs, it should download the image and log from the primary (as an 
optimization it could copy the logs from the shared dir). If the other NN is 
still in standby then it should fail to start as it does currently.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HDFS-2732) Add support for the standby in the bin scripts

2011-12-29 Thread Eli Collins (Created) (JIRA)
Add support for the standby in the bin scripts
--

 Key: HDFS-2732
 URL: https://issues.apache.org/jira/browse/HDFS-2732
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ha
Affects Versions: HA branch (HDFS-1623)
Reporter: Eli Collins
Assignee: Eli Collins


We need to update the bin scripts to support SBNs. Two ideas:

Modify start-dfs.sh to start another copy of the NN if HA is configured. We 
could introduce a file similar to masters (2NN hosts) called standbys which 
lists the SBN hosts, and start-dfs.sh would automatically make the NN it starts 
active (and leave the NNs listed in standby as is).

Or simpler, we could just provide a start-namenode.sh script that a user can 
run to start the SBN on another host themselves. The user would manually tell 
the other NN to be active via HAAdmin (or start-dfs.sh could do that 
automatically, ie assume the NN it starts should be the primary).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HDFS-2733) Document HA configuration and CLI

2011-12-29 Thread Eli Collins (Created) (JIRA)
Document HA configuration and CLI
-

 Key: HDFS-2733
 URL: https://issues.apache.org/jira/browse/HDFS-2733
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: documentation, ha
Affects Versions: HA branch (HDFS-1623)
Reporter: Eli Collins


We need to document the configuration changes in HDFS-2231 and the new CLI 
introduced by HADOOP-7774.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HDFS-2735) HA: add tests for multiple shared edits dirs

2011-12-30 Thread Eli Collins (Created) (JIRA)
HA: add tests for multiple shared edits dirs


 Key: HDFS-2735
 URL: https://issues.apache.org/jira/browse/HDFS-2735
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ha, test
Affects Versions: HA branch (HDFS-1623)
Reporter: Eli Collins


You can configure and run with multiple shared edits dirs but we don't have any 
test coverage for them. In particular, we should cover the behavior of the edit 
log tailer with multiple dirs, and failure scenarios (eg can we tolerate a 
single shared dir failure if we have two shared dirs).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HDFS-2736) HA: support separate SBN and 2NN?

2011-12-30 Thread Eli Collins (Created) (JIRA)
HA: support separate SBN and 2NN?
-

 Key: HDFS-2736
 URL: https://issues.apache.org/jira/browse/HDFS-2736
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ha
Affects Versions: HA branch (HDFS-1623)
Reporter: Eli Collins


HDFS-2291 adds support for making the SBN capable of checkpointing; it seems like 
we may also need to support 2NN checkpointing as well. Eg if we fail over 
to the SBN, does it continue to checkpoint? If not, the log grows unbounded until 
the old primary comes back; if so, does that create performance problems since 
the primary wasn't previously checkpointing?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HDFS-2747) HA: entering SM after starting SBN can NPE

2012-01-03 Thread Eli Collins (Created) (JIRA)
HA: entering SM after starting SBN can NPE
--

 Key: HDFS-2747
 URL: https://issues.apache.org/jira/browse/HDFS-2747
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ha
Affects Versions: HA branch (HDFS-1623)
Reporter: Eli Collins


Entering SM on the primary while it's already in SM, after the SBN has been 
started, results in an NPE:

{noformat}
hadoop-0.24.0-SNAPSHOT $ ./bin/hdfs dfsadmin -safemode get
Safe mode is ON
hadoop-0.24.0-SNAPSHOT $ ./bin/hdfs dfsadmin -safemode enter
safemode: java.lang.NullPointerException
{noformat}


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HDFS-2752) HA: exit if multiple shared dirs are configured

2012-01-05 Thread Eli Collins (Created) (JIRA)
HA: exit if multiple shared dirs are configured
---

 Key: HDFS-2752
 URL: https://issues.apache.org/jira/browse/HDFS-2752
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ha
Affects Versions: HA branch (HDFS-1623)
Reporter: Eli Collins
Assignee: Eli Collins


We don't support multiple shared edits dirs, we should fail to start with an 
error in this case.
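
A minimal sketch of the intended fail-fast behavior (the method and exception 
choice are illustrative, not the committed change):

{code}
// Sketch only: abort startup when more than one shared edits dir is configured.
import java.net.URI;
import java.util.List;

public class SharedEditsCheck {
  static void checkSharedEditsDirs(List<URI> sharedEditsDirs) {
    if (sharedEditsDirs.size() > 1) {
      throw new IllegalArgumentException(
          "Multiple shared edits directories are not supported: " + sharedEditsDirs);
    }
  }
}
{code}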

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HDFS-2754) HA: enable dfs.namenode.name.dir.restore if HA is enabled

2012-01-05 Thread Eli Collins (Created) (JIRA)
HA: enable dfs.namenode.name.dir.restore if HA is enabled
-

 Key: HDFS-2754
 URL: https://issues.apache.org/jira/browse/HDFS-2754
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ha, name-node
Affects Versions: HA branch (HDFS-1623)
Reporter: Eli Collins


If HA is enabled it seems like we should always try to restore failed name 
dirs. Let's auto-enable name dir restoration if HA is enabled, at least for 
shared edits dirs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HDFS-2755) HA: add tests for flaky and failed shared edits directories

2012-01-05 Thread Eli Collins (Created) (JIRA)
HA: add tests for flaky and failed shared edits directories
---

 Key: HDFS-2755
 URL: https://issues.apache.org/jira/browse/HDFS-2755
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ha, test
Affects Versions: HA branch (HDFS-1623)
Reporter: Eli Collins


We should test the behavior with both flaky and failed shared edits dirs. The 
tests should cover when name dir restore is enabled and disabled. There should 
be a warning, and an API we can use to check, when not all shared directories 
are online.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HDFS-2758) HA: multi-process MiniDFS cluster for testing ungraceful shutdown

2012-01-05 Thread Eli Collins (Created) (JIRA)
HA: multi-process MiniDFS cluster for testing ungraceful shutdown
-

 Key: HDFS-2758
 URL: https://issues.apache.org/jira/browse/HDFS-2758
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ha, test
Affects Versions: HA branch (HDFS-1623)
Reporter: Eli Collins
Assignee: Eli Collins


We should test ungraceful termination of NN processes. This is generally useful 
for HDFS testing, but particularly needed for HA since we may do this via 
fencing (send a NN a SIGKILL via ssh kill -9, flip the PDU, etc). We can't 
currently do this with the MiniDFSCluster since everything is in one process 
and killing the native thread hosting the Java thread terminates the whole 
process.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HDFS-2781) Add client protocol and DFSadmin for command to restore failed storage

2012-01-11 Thread Eli Collins (Created) (JIRA)
Add client protocol and DFSadmin for command to restore failed storage
--

 Key: HDFS-2781
 URL: https://issues.apache.org/jira/browse/HDFS-2781
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ha
Affects Versions: HA branch (HDFS-1623)
Reporter: Eli Collins
Assignee: Eli Collins


Per HDFS-2769, it's important that an admin be able to ask the NN to try to 
restore failed storage, since we may drop into SM until the shared edits dir is 
restored (w/o having to wait for the next checkpoint). There's currently an API 
(and usage in DFSAdmin) to flip the flag indicating whether the NN should try 
to restore failed storage, but not one to make it actually attempt to do so. 
This jira is to add one. This is useful outside HA but we're doing it as an 
HDFS-1623 sub-task since it's motivated by HA.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HDFS-2782) HA: Support multiple shared edits dirs

2012-01-11 Thread Eli Collins (Created) (JIRA)
HA: Support multiple shared edits dirs
--

 Key: HDFS-2782
 URL: https://issues.apache.org/jira/browse/HDFS-2782
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ha
Affects Versions: HA branch (HDFS-1623)
Reporter: Eli Collins
Assignee: Eli Collins


Supporting multiple shared dirs will improve availability (eg see HDFS-2769). 
You may want to use multiple shared dirs on a single filer (eg for better fault 
isolation) or because you want to use multiple filers/mounts. Per HDFS-2752 
(and HDFS-2735) we need to do things like use the JournalSet in EditLogTailer 
and add tests.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HDFS-2788) HdfsServerConstants#DN_KEEPALIVE_TIMEOUT is dead code

2012-01-13 Thread Eli Collins (Created) (JIRA)
HdfsServerConstants#DN_KEEPALIVE_TIMEOUT is dead code
-

 Key: HDFS-2788
 URL: https://issues.apache.org/jira/browse/HDFS-2788
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node
Affects Versions: 0.23.0, 0.22.0
Reporter: Eli Collins
Assignee: Eli Collins
 Attachments: hdfs-2788.txt

HDFS-941 introduced HdfsServerConstants#DN_KEEPALIVE_TIMEOUT but it's never 
used. Perhaps it was renamed to 
DFSConfigKeys#DFS_DATANODE_SOCKET_REUSE_KEEPALIVE_DEFAULT while the patch was 
being written and the old constant wasn't deleted.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HDFS-2789) TestHAAdmin.testFailover is failing

2012-01-13 Thread Eli Collins (Created) (JIRA)
TestHAAdmin.testFailover is failing
---

 Key: HDFS-2789
 URL: https://issues.apache.org/jira/browse/HDFS-2789
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ha
Affects Versions: HA branch (HDFS-1623)
Reporter: Eli Collins
Assignee: Eli Collins
 Attachments: hdfs-2789.txt

A recent change broke it. We need to mock getServiceState to prevent the NPE.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HDFS-2799) Trim fs.checkpoint.dir values

2012-01-16 Thread Eli Collins (Created) (JIRA)
Trim fs.checkpoint.dir values
-

 Key: HDFS-2799
 URL: https://issues.apache.org/jira/browse/HDFS-2799
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 0.23.0
Reporter: Eli Collins


fs.checkpoint.dir values need to be trimmed like dfs.name.dir and dfs.data.dir 
values so that, eg, the following works. It currently results in the directory 
"HADOOP_HOME/?/home/eli/hadoop/dirs3/dfs/chkpoint1" being created.

{noformat}
  
<property>
  <name>fs.checkpoint.dir</name>
  <value>
    /home/eli/hadoop/dirs3/dfs/chkpoint1,
    /home/eli/hadoop/dirs3/dfs/chkpoint2
  </value>
</property>
  
{noformat}
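
A minimal sketch of the fix, assuming Configuration#getTrimmedStrings is the 
appropriate helper here: read the comma-separated list with the surrounding 
whitespace (including the newlines above) stripped from each entry.

{code}
// Sketch only: trim each configured checkpoint dir before using it.
import org.apache.hadoop.conf.Configuration;

public class CheckpointDirs {
  static String[] checkpointDirs(Configuration conf) {
    // getTrimmedStrings splits on commas and trims whitespace around entries
    return conf.getTrimmedStrings("fs.checkpoint.dir");
  }
}
{code}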


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HDFS-2800) TestStandbyCheckpoints.testCheckpointCancellation is racy

2012-01-16 Thread Eli Collins (Created) (JIRA)
TestStandbyCheckpoints.testCheckpointCancellation is racy
-

 Key: HDFS-2800
 URL: https://issues.apache.org/jira/browse/HDFS-2800
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ha, test
Affects Versions: HA branch (HDFS-1623)
Reporter: Eli Collins


TestStandbyCheckpoints.testCheckpointCancellation is racy; I've seen the 
following assert on line 212 fail:

{code}
assertTrue(StandbyCheckpointer.getCanceledCount() > 0);
{code}


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HDFS-2808) HA: allow hdfs-specific names to be used in haadmin

2012-01-18 Thread Eli Collins (Created) (JIRA)
HA: allow hdfs-specific names to be used in haadmin
---

 Key: HDFS-2808
 URL: https://issues.apache.org/jira/browse/HDFS-2808
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ha
Affects Versions: HA branch (HDFS-1623)
Reporter: Eli Collins
Assignee: Eli Collins


Currently the HAAdmin CLI tools refer to services using host:port. It would be 
more user friendly to allow people to use hdfs-specific logical names, eg the 
NNs configured in dfs.ha.namenodes, and let the tool do the mapping to host:port. 
We could do this by wrapping HAAdmin with an hdfs-specific class and a dfshadmin 
command.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HDFS-2860) HA: TestDFSRollback#testRollback is failing

2012-01-30 Thread Eli Collins (Created) (JIRA)
HA: TestDFSRollback#testRollback is failing
---

 Key: HDFS-2860
 URL: https://issues.apache.org/jira/browse/HDFS-2860
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: ha
Affects Versions: HA branch (HDFS-1623)
Reporter: Eli Collins
Assignee: Aaron T. Myers


TestDFSRollback#testRollback is failing post HDFS-2824. It looks like the test 
is asserting now-incorrect behavior.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HDFS-2876) The unit tests (src/test/unit) are not being compiled and are not runnable

2012-02-01 Thread Eli Collins (Created) (JIRA)
The unit tests (src/test/unit) are not being compiled and are not runnable
--

 Key: HDFS-2876
 URL: https://issues.apache.org/jira/browse/HDFS-2876
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Affects Versions: 0.23.0
Reporter: Eli Collins


The unit tests (src/test/unit, not src/test/java) are not being compiled and are 
not runnable. {{mvn -Dtest=TestBlockRecovery test}} executed from 
hadoop-hdfs-project does not compile or execute the test: 
TestBlockRecovery does not compile, yet the test target completes w/o error.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HDFS-2884) TestDecommission.testDecommissionFederation fails intermittently

2012-02-02 Thread Eli Collins (Created) (JIRA)
TestDecommission.testDecommissionFederation fails intermittently


 Key: HDFS-2884
 URL: https://issues.apache.org/jira/browse/HDFS-2884
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node
Affects Versions: 0.23.1
Reporter: Eli Collins


I saw the following assert fail on a jenkins job for branch HDFS-1623 but I 
don't think it's HA related.
 
{noformat}
java.lang.AssertionError: Number of Datanodes  expected:<2> but was:<1>
at org.junit.Assert.fail(Assert.java:91)
at org.junit.Assert.failNotEquals(Assert.java:645)
at org.junit.Assert.assertEquals(Assert.java:126)
at org.junit.Assert.assertEquals(Assert.java:470)
at 
org.apache.hadoop.hdfs.TestDecommission.validateCluster(TestDecommission.java:275)
at 
org.apache.hadoop.hdfs.TestDecommission.startCluster(TestDecommission.java:288)
at 
org.apache.hadoop.hdfs.TestDecommission.testDecommission(TestDecommission.java:384)
at 
org.apache.hadoop.hdfs.TestDecommission.testDecommissionFederation(TestDecommission.java:344)
{noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HDFS-2885) Remove "federation" from the nameservice config options

2012-02-02 Thread Eli Collins (Created) (JIRA)
Remove "federation" from the nameservice config options
---

 Key: HDFS-2885
 URL: https://issues.apache.org/jira/browse/HDFS-2885
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 0.23.1
Reporter: Eli Collins


HDFS-1623, and potentially other HDFS features, will use the nameservice 
abstraction even if federation is not enabled (eg you need to configure 
{{dfs.federation.nameservices}} in HA even if you're not using federation, just 
to declare your nameservice). This is confusing to users. We should consider 
deprecating and removing "federation" from the {{dfs.federation.nameservices}} 
and {{dfs.federation.nameservice.id}} config options, as {{dfs.nameservices}} 
and {{dfs.nameservice.id}} are more intuitive.


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HDFS-2893) The 2NN won't start if dfs.namenode.secondary.http-address is default or specified with a wildcard IP and port

2012-02-04 Thread Eli Collins (Created) (JIRA)
The 2NN won't start if dfs.namenode.secondary.http-address is default or 
specified with a wildcard IP and port
--

 Key: HDFS-2893
 URL: https://issues.apache.org/jira/browse/HDFS-2893
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.23.1
Reporter: Eli Collins
Priority: Critical


Looks like DFSUtil address matching doesn't find a match if the http-address is 
specified using a wildcard IP and a port. It should return 0.0.0.0:50090 in 
this case which would allow the 2NN to start.

Also, unless http-address is explicitly configured in hdfs-site.xml the 2NN 
will not start, since DFSUtil#getSecondaryNameNodeAddresses does not use the 
default value as a fallback. That may be confusing to people who expect the 
default value to be used.

{noformat}
hadoop-0.23.1-SNAPSHOT $ cat /home/eli/hadoop/conf3/hdfs-site.xml
...
  
<property>
  <name>dfs.namenode.secondary.http-address</name>
  <value>0.0.0.0:50090</value>
</property>
  


hadoop-0.23.1-SNAPSHOT $ ./bin/hdfs --config ~/hadoop/conf3 getconf 
-secondarynamenodes
0.0.0.0
hadoop-0.23.1-SNAPSHOT $ ./sbin/start-dfs.sh 
Starting namenodes on [localhost]
localhost: starting namenode, logging to 
/home/eli/hadoop/dirs3/logs/eli/hadoop-eli-namenode-eli-thinkpad.out
localhost: starting datanode, logging to 
/home/eli/hadoop/dirs3/logs/eli/hadoop-eli-datanode-eli-thinkpad.out
Secondary namenodes are not configured.  Cannot start secondary namenodes.
{noformat}

This works if eg localhost:50090 is used.

We should also update the hdfs user guide to remove the reference to the 
masters file since it's no longer used to configure which hosts the 2NN runs on.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HDFS-2894) HA: disable 2NN when HA is enabled

2012-02-04 Thread Eli Collins (Created) (JIRA)
HA: disable 2NN when HA is enabled
--

 Key: HDFS-2894
 URL: https://issues.apache.org/jira/browse/HDFS-2894
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ha
Affects Versions: HA branch (HDFS-1623)
Reporter: Eli Collins
Assignee: Eli Collins


The SecondaryNameNode should log a message and refuse to start if HA is enabled, 
since the StandbyNode checkpoints by default and IIRC we have not yet enabled 
the ability to have multiple checkpointers in the NN.

On the HA branch the 2NN does not currently start from start-dfs.sh because 
getconf -secondarynamenodes claims the http-address is not configured even 
though it is. This seems like a bug; in branch-23 getconf will correctly 
return localhost:50090.

{noformat}
 
<property>
  <name>dfs.namenode.secondary.http-address</name>
  <value>localhost:50090</value>
</property>
 


hadoop-0.24.0-SNAPSHOT $ ./bin/hdfs getconf -secondarynamenodes
Incorrect configuration: secondary namenode address 
dfs.namenode.secondary.http-address is not configured.
{noformat}
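
A minimal sketch of the proposed guard (the config key lookup and method are 
assumptions, not the actual SecondaryNameNode change): refuse to start the 2NN 
when namenode IDs are configured for the nameservice, ie HA is enabled.

{code}
// Sketch only: treat configured NN IDs for the nameservice as "HA enabled".
import org.apache.hadoop.conf.Configuration;

public class SecondaryStartupGuard {
  static void checkNotHaEnabled(Configuration conf, String nameserviceId) {
    String nnIds = conf.get("dfs.ha.namenodes." + nameserviceId);
    if (nnIds != null && !nnIds.isEmpty()) {
      throw new UnsupportedOperationException(
          "Cannot start a SecondaryNameNode when HA is enabled for "
              + nameserviceId + "; the standby NameNode checkpoints instead");
    }
  }
}
{code}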

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HDFS-2896) The 2NN incorrectly daemonizes

2012-02-05 Thread Eli Collins (Created) (JIRA)
The 2NN incorrectly daemonizes
--

 Key: HDFS-2896
 URL: https://issues.apache.org/jira/browse/HDFS-2896
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 0.23.0, 0.24.0
Reporter: Eli Collins
Assignee: Eli Collins


The SecondaryNameNode (and Checkpointer) confuse o.a.h.u.Daemon with a Unix 
daemon. Per below it intends to create a thread that never ends, but 
o.a.h.u.Daemon just marks a thread with Java's Thread#setDaemon which means 
Java will terminate the thread when there are no more non-daemon user threads 
running.

{code}
// Create a never ending deamon
Daemon checkpointThread = new Daemon(secondary);
{code}

Perhaps they thought they were using commons Daemon. We of course don't want 
the 2NN to exit unless it exits itself or is stopped explicitly. Currently it 
won't do this because the main thread is not marked as a daemon thread. In any 
case, let's make the 2NN consistent with the NN in this regard (exit when the 
RPC thread exits).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HDFS-2897) Enable a single 2nn to checkpoint multiple nameservices

2012-02-05 Thread Eli Collins (Created) (JIRA)
Enable a single 2nn to checkpoint multiple nameservices
---

 Key: HDFS-2897
 URL: https://issues.apache.org/jira/browse/HDFS-2897
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: name-node
Affects Versions: 0.23.0
Reporter: Eli Collins


The dfs.namenode.secondary.http-address needs to be suffixed with a particular 
nameservice. It would be useful to be able to configure a single 2NN 
to checkpoint all the nameservices for a NN rather than having to run a 
separate 2NN per nameservice. It could potentially checkpoint all namenode IDs 
for a nameservice as well, but given that the standby is capable of 
checkpointing and is required, I think we can ignore this case.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HDFS-2911) Gracefully handle OutOfMemoryErrors

2012-02-07 Thread Eli Collins (Created) (JIRA)
Gracefully handle OutOfMemoryErrors
---

 Key: HDFS-2911
 URL: https://issues.apache.org/jira/browse/HDFS-2911
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node, name-node
Affects Versions: 1.0.0, 0.23.0
Reporter: Eli Collins


We should gracefully handle j.l.OutOfMemoryError exceptions in the NN or DN. We 
should catch them in a high-level handler, cleanly fail the RPC (vs sending 
back the OOM stack trace) or background thread, and shut down the NN or DN. 
Currently the process is left in a not well-tested state (it continuously 
fails RPCs and internal threads, may or may not recover, and doesn't shut down 
gracefully).
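
One possible approach, sketched here as an assumption rather than the NN/DN 
design: a default uncaught-exception handler that recognizes OutOfMemoryError 
and terminates the process instead of letting it limp along.

{code}
// Sketch only: exit promptly on OOM rather than continuing in an unknown state.
public class OomExitHandler {
  static void install() {
    Thread.setDefaultUncaughtExceptionHandler(new Thread.UncaughtExceptionHandler() {
      public void uncaughtException(Thread t, Throwable e) {
        if (e instanceof OutOfMemoryError) {
          // halt() avoids running shutdown hooks that may themselves need memory
          System.err.println("OutOfMemoryError in thread " + t.getName() + ", exiting");
          Runtime.getRuntime().halt(1);
        }
      }
    });
  }
}
{code}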

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HDFS-2916) HA: allow dfsadmin to refer to a particular namenode

2012-02-08 Thread Eli Collins (Created) (JIRA)
HA: allow dfsadmin to refer to a particular namenode


 Key: HDFS-2916
 URL: https://issues.apache.org/jira/browse/HDFS-2916
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ha
Affects Versions: HA branch (HDFS-1623)
Reporter: Eli Collins
Assignee: Eli Collins


dfsadmin currently fails over like other clients, so if you want to put a 
particular NN in safemode you need to use the "fs" option and specify a 
host:ipcport target. Like HDFS-2808, it would be useful to be able to specify a 
logical namenode ID instead of an RPC addr. Since fs is part of the generic options 
this could potentially apply to all tools; however, most tools want to refer to 
the default logical namenode URI and fail over like other clients.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HDFS-2918) HA: dfsadmin should failover like other clients

2012-02-08 Thread Eli Collins (Created) (JIRA)
HA: dfsadmin should failover like other clients
---

 Key: HDFS-2918
 URL: https://issues.apache.org/jira/browse/HDFS-2918
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ha
Affects Versions: HA branch (HDFS-1623)
Reporter: Eli Collins


dfsadmin currently always uses the first namenode rather than failing over. It 
should fail over like other clients, unless fs specifies a particular namenode.

{noformat}
hadoop-0.24.0-SNAPSHOT $ ./bin/hdfs haadmin -failover nn1 nn2
Failover from nn1 to nn2 successful
# nn2 is 8022
hadoop-0.24.0-SNAPSHOT $ ./bin/hdfs dfsadmin -fs localhost:8022 -safemode enter
Safe mode is ON
hadoop-0.24.0-SNAPSHOT $ ./bin/hdfs dfsadmin -safemode get 
Safe mode is OFF
hadoop-0.24.0-SNAPSHOT $ ./bin/hdfs dfsadmin -fs localhost:8022 -safemode get
Safe mode is ON
{noformat}


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



