[jira] [Created] (HDFS-3060) Bump TestDistributedUpgrade#testDistributedUpgrade timeout
Bump TestDistributedUpgrade#testDistributedUpgrade timeout -- Key: HDFS-3060 URL: https://issues.apache.org/jira/browse/HDFS-3060 Project: Hadoop HDFS Issue Type: Test Components: test Affects Versions: 0.23.0 Reporter: Eli Collins Assignee: Eli Collins Priority: Minor TestDistributedUpgrade#testDistributedUpgrade occasionally times out. Let's bump its timeout to 5 min to match some of the other long-running tests. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-3078) 2NN https port setting is broken in Hadoop 1.0
2NN https port setting is broken in Hadoop 1.0 -- Key: HDFS-3078 URL: https://issues.apache.org/jira/browse/HDFS-3078 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 1.0.0 Reporter: Eli Collins Assignee: Eli Collins Fix For: 1.1.0 The code in SecondaryNameNode.java to set the https port is broken: if the port is set, it sets the bind addr to "addr:addr:port", which is bogus. Even if it did work, it uses port 0 instead of port 50490 (the default listed in ./src/packages/templates/conf/hdfs-site.xml).
[jira] [Created] (HDFS-3090) Broaden NN#monitorHealth checks
Broaden NN#monitorHealth checks Key: HDFS-3090 URL: https://issues.apache.org/jira/browse/HDFS-3090 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 0.23.3 Reporter: Eli Collins Currently the NN implementation of HAServiceProtocol#monitorHealth just calls FSNamesystem#checkAvailableResources. We should extend this method to cover a broader range of resources (eg HDFS-2704), but we should also extend NN#monitorHealth to make other unrelated health checks (eg whether all its important service threads are running, memory usage, etc).
[jira] [Created] (HDFS-3120) Provide ability to enable sync without append
Provide ability to enable sync without append - Key: HDFS-3120 URL: https://issues.apache.org/jira/browse/HDFS-3120 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 1.0.1 Reporter: Eli Collins Assignee: Eli Collins The work on branch-20-append was to support *sync*, for durable HBase WALs, not *append*. The branch-20-append implementation is known to be buggy. There's been confusion about this; we often answer queries on the list [like this|http://search-hadoop.com/m/wfed01VOIJ5]. Unfortunately, the way to enable correct sync on branch-1 for HBase is to set dfs.support.append to true in your config, which has the side effect of enabling append (which we don't want to do). Let's add a new *dfs.support.hsync* option that enables working sync (which is basically the current dfs.support.append flag modulo one place where it's not referring to sync). For compatibility, if dfs.support.append is set, dfs.support.hsync will be set as well. This way someone can enable sync for HBase and still keep the current behavior that if dfs.support.append is not set then an append operation will result in an IOE indicating append is not supported. We should do this on trunk as well, as there's no reason to conflate hsync and append with a single config even if append works.
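A minimal sketch of the proposed flag semantics, assuming the compatibility rule described above; the class and method names are illustrative, not actual HDFS code:

```java
// Hypothetical sketch of the proposed flag semantics; SyncFlags and its
// methods are illustrative names, not part of HDFS.
public class SyncFlags {

  // For compatibility, dfs.support.append implies dfs.support.hsync.
  public static boolean hsyncEnabled(boolean supportAppend, boolean supportHsync) {
    return supportHsync || supportAppend;
  }

  // Append remains gated on dfs.support.append alone, so a cluster that
  // only enables hsync still gets an IOE on an append operation.
  public static boolean appendEnabled(boolean supportAppend) {
    return supportAppend;
  }

  public static void main(String[] args) {
    // HBase-style deployment: hsync on, append still off.
    System.out.println(SyncFlags.hsyncEnabled(false, true)
        + " " + SyncFlags.appendEnabled(false));
  }
}
```

The key point is the asymmetry: append implies hsync, but hsync does not imply append.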
[jira] [Created] (HDFS-3128) TestResolveHdfsSymlink#testFcResolveAfs shouldn't use /tmp
TestResolveHdfsSymlink#testFcResolveAfs shouldn't use /tmp -- Key: HDFS-3128 URL: https://issues.apache.org/jira/browse/HDFS-3128 Project: Hadoop HDFS Issue Type: Bug Components: test Reporter: Eli Collins Priority: Minor Saw this on jenkins: TestResolveHdfsSymlink#testFcResolveAfs creates /tmp/alpha, which interferes with other executors on the same machine.
[jira] [Created] (HDFS-3137) Bump LAST_UPGRADABLE_LAYOUT_VERSION
Bump LAST_UPGRADABLE_LAYOUT_VERSION --- Key: HDFS-3137 URL: https://issues.apache.org/jira/browse/HDFS-3137 Project: Hadoop HDFS Issue Type: Improvement Reporter: Eli Collins Assignee: Eli Collins LAST_UPGRADABLE_LAYOUT_VERSION is currently -7, which corresponds to Hadoop 0.14. How about we bump it to -16, which corresponds to Hadoop 0.18? I don't think many people are using releases older than v0.18, and those who are probably want to upgrade to the latest stable release (v1.0). They can always upgrade to v1.0 and then eg 0.23 from there if they want.
[jira] [Created] (HDFS-3138) Move DatanodeInfo#ipcPort and hostName to DatanodeID
Move DatanodeInfo#ipcPort and hostName to DatanodeID Key: HDFS-3138 URL: https://issues.apache.org/jira/browse/HDFS-3138 Project: Hadoop HDFS Issue Type: Improvement Reporter: Eli Collins Assignee: Eli Collins We can fix the following TODO once HDFS-3137 is committed. The hostName field should be moved as well (it's not ephemeral, it just gets set on registration). {code} //TODO: move it to DatanodeID once DatanodeID is not stored in FSImage out.writeShort(ipcPort); {code}
[jira] [Created] (HDFS-3139) Minor Datanode logging improvement
Minor Datanode logging improvement -- Key: HDFS-3139 URL: https://issues.apache.org/jira/browse/HDFS-3139 Project: Hadoop HDFS Issue Type: Improvement Components: data-node Reporter: Eli Collins Assignee: Eli Collins Priority: Minor - DatanodeInfo#getDatanodeReport should log its hostname, in addition to the DNS lookup it does on its IP - The Datanode should log the ipc/info/streaming servers it's listening on at startup at INFO level
[jira] [Created] (HDFS-3140) Support multiple network interfaces
Support multiple network interfaces --- Key: HDFS-3140 URL: https://issues.apache.org/jira/browse/HDFS-3140 Project: Hadoop HDFS Issue Type: New Feature Reporter: Eli Collins Assignee: Eli Collins Umbrella jira to track the HDFS side of HADOOP-8198.
[jira] [Created] (HDFS-3141) The NN should log "missing" blocks
The NN should log "missing" blocks -- Key: HDFS-3141 URL: https://issues.apache.org/jira/browse/HDFS-3141 Project: Hadoop HDFS Issue Type: Improvement Reporter: Eli Collins Assignee: Eli Collins It would help debugging if the NN logged "missing" blocks at INFO level. In v1 missing means there are no live / decommissioned replicas (ie they're all excess or corrupt); in trunk it means all replicas of the block are corrupt.
[jira] [Created] (HDFS-3142) TestHDFSCLI.testAll is failing
TestHDFSCLI.testAll is failing -- Key: HDFS-3142 URL: https://issues.apache.org/jira/browse/HDFS-3142 Project: Hadoop HDFS Issue Type: Bug Reporter: Eli Collins Priority: Blocker TestHDFSCLI.testAll is failing in the latest trunk/23 builds.
[jira] [Created] (HDFS-3143) TestGetBlocks.testGetBlocks is failing
TestGetBlocks.testGetBlocks is failing -- Key: HDFS-3143 URL: https://issues.apache.org/jira/browse/HDFS-3143 Project: Hadoop HDFS Issue Type: Bug Reporter: Eli Collins TestGetBlocks.testGetBlocks is failing in the latest trunk/23 builds. Last good build was Mar 23rd.
[jira] [Created] (HDFS-3144) Refactor DatanodeID#getName by use
Refactor DatanodeID#getName by use -- Key: HDFS-3144 URL: https://issues.apache.org/jira/browse/HDFS-3144 Project: Hadoop HDFS Issue Type: Improvement Components: data-node Reporter: Eli Collins Assignee: Eli Collins DatanodeID#getName, which returns a string containing the IP:xferPort of a Datanode, is used in a variety of contexts: # Putting the ID in a log message # Connecting to the DN for data transfer # Getting a string to use as a key (eg for comparison) # Using as a hostname, eg for excludes/includes, topology files Same for DatanodeID#getHost, which returns just the IP part; sometimes we use it as a key, sometimes we tack on the IPC port, etc. Let's have a method for each use, eg toString can be used for #1, a new method (eg getDataXferAddr) for #2, a new method (eg getKey) for #3, a new method (eg getHostID) for #4, etc. Aside from the code being more clear, we can change the value for particular uses, eg we can change the format in a log message without changing the address clients use to connect to the DN, or modify the address used for data transfer without changing the other uses.
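The per-use accessors could look roughly like the following sketch; the field and method names are assumptions taken from the proposal above, not the committed HDFS API:

```java
// Illustrative sketch of per-use accessors on a DatanodeID-like class;
// names are assumptions from the JIRA proposal, not actual HDFS code.
public class DatanodeIdSketch {
  private final String ip;        // eg "10.0.0.1"
  private final String hostName;  // eg "dn1.example.com"
  private final int xferPort;

  public DatanodeIdSketch(String ip, String hostName, int xferPort) {
    this.ip = ip;
    this.hostName = hostName;
    this.xferPort = xferPort;
  }

  // Use #2: address clients connect to for data transfer.
  public String getDataXferAddr() { return ip + ":" + xferPort; }

  // Use #3: stable key for maps and comparison.
  public String getKey() { return ip + ":" + xferPort; }

  // Use #4: identifier for excludes/includes and topology files.
  public String getHostID() { return hostName; }

  // Use #1: log-friendly form; free to change without affecting the
  // address clients use to reach the DN.
  @Override
  public String toString() { return hostName + " (" + getDataXferAddr() + ")"; }
}
```

Because each use has its own accessor, changing the log format (toString) no longer changes the connect address, and vice versa.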
[jira] [Created] (HDFS-3145) Disallow self failover
Disallow self failover -- Key: HDFS-3145 URL: https://issues.apache.org/jira/browse/HDFS-3145 Project: Hadoop HDFS Issue Type: Bug Components: ha Reporter: Eli Collins Assignee: Eli Collins It is currently possible for users to make a standby NameNode failover to itself and become active. We shouldn't allow this to happen in case operators mistype and miss the fact that there are now two active NNs. {noformat} bash-4.1$ hdfs haadmin -ns ha-nn-uri -failover nn2 nn2 Failover from nn2 to nn2 successful {noformat} After the failover above, nn2 will be active.
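The fix amounts to a guard that rejects a failover whose source and target are the same NN; a minimal sketch, with hypothetical names:

```java
// Hypothetical guard for the haadmin -failover path; FailoverGuard and
// checkNotSelf are illustrative names, not actual HDFS code.
public class FailoverGuard {

  // Reject a failover from a service to itself before doing anything.
  public static void checkNotSelf(String fromNnId, String toNnId) {
    if (fromNnId.equals(toNnId)) {
      throw new IllegalArgumentException(
          "Can't failover a service to itself: " + toNnId);
    }
  }

  public static void main(String[] args) {
    checkNotSelf("nn1", "nn2"); // distinct NNs: allowed
    try {
      checkNotSelf("nn2", "nn2"); // the case from the report above
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```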
[jira] [Created] (HDFS-3146) Datanode should be able to register multiple network interfaces
Datanode should be able to register multiple network interfaces --- Key: HDFS-3146 URL: https://issues.apache.org/jira/browse/HDFS-3146 Project: Hadoop HDFS Issue Type: Sub-task Components: data-node Reporter: Eli Collins Assignee: Eli Collins The Datanode should register multiple interfaces with the Namenode (who then forwards them to clients). We can do this by extending the DatanodeID, which currently just contains a single interface, to contain a list of interfaces. For compatibility, the DatanodeID method to get the DN address for data transfer should remain unchanged (multiple interfaces are only used where the client explicitly takes advantage of them). By default, if the Datanode binds on all interfaces (by using the wildcard in the dfs*address configuration) all interfaces are exposed, modulo ones like the loopback that should never be exposed. Alternatively, a new configuration parameter ({{dfs.datanode.available.interfaces}}) allows the set of interfaces to be specified explicitly in case the user only wants to expose a subset. If the new default behavior is too disruptive we could default dfs.datanode.available.interfaces to the IP of the IPC interface, which is the only interface exposed today (per HADOOP-6867, only the port from dfs.datanode.address is used today). The interfaces can be specified by name (eg "eth0"), subinterface name (eg "eth0:0"), or IP address. The IP address can be specified by range using CIDR notation so the configuration values are portable.
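Matching an interface address against a CIDR range, as a dfs.datanode.available.interfaces value might specify, can be sketched as follows; this helper is an assumption for illustration (IPv4 only), not actual HDFS code:

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Rough sketch of CIDR matching for config values like "10.0.0.0/16";
// CidrMatch is a hypothetical helper, not part of HDFS. IPv4 only.
public class CidrMatch {

  // Returns true if ip falls within the "a.b.c.d/prefix" range.
  public static boolean inRange(String ip, String cidr) {
    try {
      String[] parts = cidr.split("/");
      int prefix = Integer.parseInt(parts[1]);
      int ipBits = toInt(InetAddress.getByName(ip).getAddress());
      int netBits = toInt(InetAddress.getByName(parts[0]).getAddress());
      // A /0 prefix matches everything; otherwise keep the top bits.
      int mask = prefix == 0 ? 0 : -1 << (32 - prefix);
      return (ipBits & mask) == (netBits & mask);
    } catch (UnknownHostException e) {
      throw new IllegalArgumentException("bad address", e);
    }
  }

  // Pack a 4-byte IPv4 address into an int.
  private static int toInt(byte[] addr) {
    int v = 0;
    for (byte b : addr) v = (v << 8) | (b & 0xff);
    return v;
  }
}
```

Note that `InetAddress.getByName` does not perform a DNS lookup for literal IPv4 addresses, so matching stays cheap.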
[jira] [Created] (HDFS-3147) The client should be able to specify which network interfaces to use
The client should be able to specify which network interfaces to use Key: HDFS-3147 URL: https://issues.apache.org/jira/browse/HDFS-3147 Project: Hadoop HDFS Issue Type: Sub-task Components: data-node Reporter: Eli Collins Assignee: Eli Collins HDFS-3146 exposes multiple interfaces to the client. However, not all interfaces exposed to clients should be used, eg because not all addresses given to clients may be routable by the client, or a user may want to restrict off-cluster clients from using cluster-private interfaces. Therefore the user should be able to configure clients to use a subset of the addresses they are given. This can be accomplished by a new configuration option ({{dfs.client.available.interfaces}}) that takes a list of interfaces to use; interfaces that don't match the configuration are ignored. Acceptable configuration values are the same as the {{dfs.datanode.available.interfaces}} parameter. In addition, we could also add an option where clients automatically check if they can connect to each interface they're given, and filter out unreachable ones by default.
[jira] [Created] (HDFS-3148) The client should be able to use multiple local interfaces for data transfer
The client should be able to use multiple local interfaces for data transfer Key: HDFS-3148 URL: https://issues.apache.org/jira/browse/HDFS-3148 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs client Reporter: Eli Collins Assignee: Eli Collins HDFS-3147 covers using multiple interfaces on the server (Datanode) side. Clients should also be able to utilize multiple *local* interfaces for outbound connections instead of always using the interface for the local hostname. This can be accomplished with a new configuration parameter ({{dfs.client.local.interfaces}}) that accepts a list of interfaces the client should use. Acceptable configuration values are the same as the {{dfs.datanode.available.interfaces}} parameter. The client binds its socket to a specific interface, which enables outbound traffic to use that interface. Binding the client socket to a specific address is not sufficient to ensure egress traffic uses that interface. Eg if multiple interfaces are on the same subnet the host requires IP rules that use the source address (which bind sets) to select the destination interface. The SO_BINDTODEVICE socket option could be used to select a specific interface for the connection instead; however it requires JNI (it is not in Java's SocketOptions) and root access, which we don't want to require of clients. Like HDFS-3147, the client can use multiple local interfaces for data transfer. Since clients already cache their connections to DNs, choosing a local interface at random seems like a good policy. Users can also pin a specific client to a specific interface by specifying just that interface in dfs.client.local.interfaces. This change was discussed in HADOOP-6210 a while back, and is actually useful/independent of the other HDFS-3140 changes.
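The bind-before-connect idea from HDFS-3148 can be sketched as below: pick one configured local address at random, bind the socket to it, then connect out. The class and the address-picking policy are illustrative assumptions, not actual HDFS code:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.List;
import java.util.Random;

// Sketch of the client side of dfs.client.local.interfaces; names and
// addresses are illustrative, not actual HDFS code.
public class LocalIfaceClient {

  // Randomly choose one local address from the configured list; random
  // choice spreads connections across interfaces since clients cache
  // their DN connections.
  public static String pickLocalAddr(List<String> localAddrs, Random rand) {
    return localAddrs.get(rand.nextInt(localAddrs.size()));
  }

  // Bind to the chosen local address (port 0 = any free port), which
  // makes egress traffic use that interface, then connect to the DN.
  public static Socket connectFrom(String localAddr, String dnHost, int dnPort)
      throws IOException {
    Socket s = new Socket();
    s.bind(new InetSocketAddress(localAddr, 0));
    s.connect(new InetSocketAddress(dnHost, dnPort), 10000);
    return s;
  }
}
```

As the report notes, `bind` sets the source address but the host's routing (eg source-based IP rules) ultimately decides the egress interface.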
[jira] [Created] (HDFS-3149) The client should blacklist failed local/remote network interface pairs
The client should blacklist failed local/remote network interface pairs --- Key: HDFS-3149 URL: https://issues.apache.org/jira/browse/HDFS-3149 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs client Reporter: Eli Collins Assignee: Eli Collins If a client or worker cannot connect to a given remote address, eg due to network or interface failure, then it should blacklist the local/remote interface pair. Only the pair is blacklisted in case the remote interface is routable via another local interface. The pair is blacklisted for a configurable period of time and another local/remote interface pair is tried. For full fault tolerance, the host interfaces need to be connected to different switches.
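A time-bounded blacklist keyed on the local/remote pair could be sketched as follows; the class is hypothetical, and the clock is passed in explicitly so the expiry logic is easy to test:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of a blacklist keyed on local/remote interface pairs with a
// configurable expiry; IfacePairBlacklist is a hypothetical class.
public class IfacePairBlacklist {
  private final long expiryMs;
  private final Map<String, Long> blacklistedAt = new HashMap<>();

  public IfacePairBlacklist(long expiryMs) {
    this.expiryMs = expiryMs;
  }

  // Only the pair is keyed, since the remote interface may still be
  // routable via a different local interface.
  private static String key(String localIf, String remoteIf) {
    return localIf + "->" + remoteIf;
  }

  // Record a connect failure for this pair at the given time.
  public void markFailed(String localIf, String remoteIf, long nowMs) {
    blacklistedAt.put(key(localIf, remoteIf), nowMs);
  }

  // The pair becomes usable again once the blacklist period elapses.
  public boolean isBlacklisted(String localIf, String remoteIf, long nowMs) {
    Long t = blacklistedAt.get(key(localIf, remoteIf));
    return t != null && nowMs - t < expiryMs;
  }
}
```

On a blacklisted pair the client would simply try the next local/remote combination.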
[jira] [Created] (HDFS-3150) Add option for clients to contact DNs via hostname in branch-1
Add option for clients to contact DNs via hostname in branch-1 -- Key: HDFS-3150 URL: https://issues.apache.org/jira/browse/HDFS-3150 Project: Hadoop HDFS Issue Type: Sub-task Components: data-node, hdfs client Reporter: Eli Collins Assignee: Eli Collins Per the document attached to HADOOP-8198, this is just for branch-1, and unbreaks DN multihoming. The datanode can be configured to listen on a bond, or all interfaces by specifying the wildcard in the dfs.datanode.*.address configuration options; however per HADOOP-6867 only the source address of the registration is exposed to clients. HADOOP-985 made clients access datanodes by IP primarily to avoid the latency of a DNS lookup; this had the side effect of breaking DN multihoming. In order to fix it let's add back the option for Datanodes to be accessed by hostname. This can be done by: # Modifying the primary field of the Datanode descriptor to be the hostname, or # Modifying Client/Datanode <-> Datanode access to use the hostname field instead of the IP I'd like to go with approach #2 as it does not require making an incompatible change to the client protocol, and is much less invasive. It minimizes the scope of modification to just places where clients and Datanodes connect, vs changing all uses of Datanode identifiers. New client and Datanode configuration options are introduced: - {{dfs.client.use.datanode.hostname}} indicates all client to datanode connections should use the datanode hostname (as clients outside the cluster may not be able to route the IP) - {{dfs.datanode.use.datanode.hostname}} indicates whether Datanodes should use hostnames when connecting to other Datanodes for data transfer If the configuration options are not used, there is no change in the current behavior.
I'm doing something similar to #1 btw in trunk in HDFS-3144 - refactoring the use of DatanodeID to use the right field (IP, IP:xferPort, hostname, etc) based on the context the ID is being used in, vs always using the IP:xferPort as the Datanode's name, and using the name everywhere.
[jira] [Created] (HDFS-3164) Move DatanodeInfo#hostName to DatanodeID
Move DatanodeInfo#hostName to DatanodeID Key: HDFS-3164 URL: https://issues.apache.org/jira/browse/HDFS-3164 Project: Hadoop HDFS Issue Type: Improvement Components: data-node Reporter: Eli Collins Assignee: Eli Collins Like HDFS-3138 (the ipcPort), the hostName field in DatanodeInfo is not ephemeral and should be in DatanodeID. This also allows us to fix up the issue where the DatanodeID#name field is overloaded (the DN sets it to a hostname, then the NN clobbers it with an IP, and then the DN clobbers its hostName field with this IP). If the DN can specify both a "name" and "hostName" in the DatanodeID then this code becomes simpler.
[jira] [Created] (HDFS-3171) The DatanodeID "name" field is overloaded
The DatanodeID "name" field is overloaded -- Key: HDFS-3171 URL: https://issues.apache.org/jira/browse/HDFS-3171 Project: Hadoop HDFS Issue Type: Improvement Components: data-node Reporter: Eli Collins Assignee: Eli Collins The DatanodeID "name" field is currently overloaded: when the DN creates a DatanodeID to register with the NN it sets "name" to be the datanode hostname, which is the DN's "hostName" member. This is not necessarily an FQDN; it is either set explicitly or determined by the DNS class, which could return the machine's hostname or the result of a DNS lookup, if configured to do so. The NN then clobbers the "name" field of the DatanodeID with the IP part of the new DatanodeID "name" field it creates (and sets the DatanodeID "hostName" field to the reported "name"). The DN gets the DatanodeID back from the NN and clobbers its "hostName" member with the "name" field of the returned DatanodeID. This makes the code hard to reason about, eg DN#getMachineName sometimes returns a hostname and sometimes not, depending on when it's called in sequence with the registration. Ditto for uses of the "name" field. I think these contortions were originally performed because the DatanodeID didn't have a hostName field (it was part of DatanodeInfo) and so there was no way to communicate both at the same time. Now that the hostName field is in DatanodeID (as of HDFS-3164) we can establish the invariant that the "name" field always and only has an IP address and the "hostName" field always and only has a hostname. In HDFS-3144 I'm going to rename the "name" field so it's clear that it contains an IP address. The above is enough scope for one change.
[jira] [Created] (HDFS-3172) dfs.upgrade.permission is dead code
dfs.upgrade.permission is dead code --- Key: HDFS-3172 URL: https://issues.apache.org/jira/browse/HDFS-3172 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Reporter: Eli Collins Assignee: Eli Collins Priority: Trivial As of HDFS-3137 dfs.upgrade.permission is dead code (it was only used for upgrading from old, no longer supported releases).
[jira] [Created] (HDFS-3174) Fix assert in TestPendingDataNodeMessages
Fix assert in TestPendingDataNodeMessages - Key: HDFS-3174 URL: https://issues.apache.org/jira/browse/HDFS-3174 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.0.0 Reporter: Eli Collins Assignee: Eli Collins The assert text in TestPendingDataNodeMessages is missing the DatanodeID port number.
[jira] [Created] (HDFS-3199) TestValidateConfigurationSettings is failing
TestValidateConfigurationSettings is failing Key: HDFS-3199 URL: https://issues.apache.org/jira/browse/HDFS-3199 Project: Hadoop HDFS Issue Type: Bug Reporter: Eli Collins Assignee: Todd Lipcon TestValidateConfigurationSettings is failing on every run.
[jira] [Created] (HDFS-3208) Bogus entries in hosts files are incorrectly displayed in the report
Bogus entries in hosts files are incorrectly displayed in the report - Key: HDFS-3208 URL: https://issues.apache.org/jira/browse/HDFS-3208 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 2.0.0 Reporter: Eli Collins Assignee: Eli Collins DM#getDatanodeListForReport incorrectly creates the DatanodeID for the "dead" report for bogus entries in the hosts files (eg an invalid hostname).
[jira] [Created] (HDFS-3209) dfs.namenode.hosts* configuration options are unused
dfs.namenode.hosts* configuration options are unused Key: HDFS-3209 URL: https://issues.apache.org/jira/browse/HDFS-3209 Project: Hadoop HDFS Issue Type: Improvement Reporter: Eli Collins Priority: Minor HDFS-631 introduced dfs.namenode.hosts and dfs.namenode.hosts.exclude but never actually used them, so they're dead code (dfs.hosts and dfs.hosts.exclude are used instead). IMO the current names are better (even though they're inconsistent) so I'd actually prefer we just remove the dead defines.
[jira] [Created] (HDFS-3210) JsonUtil#toJsonMap for a DatanodeInfo should use "ipAddr" instead of "name"
JsonUtil#toJsonMap for a DatanodeInfo should use "ipAddr" instead of "name" --- Key: HDFS-3210 URL: https://issues.apache.org/jira/browse/HDFS-3210 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.0.0 Reporter: Eli Collins Assignee: Eli Collins Attachments: hdfs-3210.txt In HDFS-3144 I missed a spot when renaming the "name" field. Let's fix that and add a test.
[jira] [Created] (HDFS-3216) DatanodeID should support multiple IP addresses
DatanodeID should support multiple IP addresses --- Key: HDFS-3216 URL: https://issues.apache.org/jira/browse/HDFS-3216 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Eli Collins Assignee: Eli Collins The DatanodeID has a single field for the IP address; for HDFS-3146 we need to extend it to support multiple addresses.
[jira] [Created] (HDFS-3218) The client should be able to use multiple remote DN interfaces for block transfer
The client should be able to use multiple remote DN interfaces for block transfer - Key: HDFS-3218 URL: https://issues.apache.org/jira/browse/HDFS-3218 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs client Reporter: Eli Collins Assignee: Eli Collins HDFS-3146 and HDFS-3216 expose multiple DN interfaces to the client. In order for clients, in aggregate, to use multiple DN interfaces, clients should pick different interfaces when transferring blocks. Given that we cache client <-> DN connections, the policy of picking a remote interface at random for each new connection seems best (vs round robin for example). In the future we could make the client congestion aware. We could also establish multiple connections between the client and DN and therefore use multiple interfaces for a single block transfer. Both of those are out of scope for this jira.
[jira] [Created] (HDFS-3219) Disambiguate "visible length" in the code and docs
Disambiguate "visible length" in the code and docs -- Key: HDFS-3219 URL: https://issues.apache.org/jira/browse/HDFS-3219 Project: Hadoop HDFS Issue Type: Improvement Reporter: Eli Collins Priority: Minor Per HDFS-2288, there are two definitions of visible length, or rather we're using the same name for two things: # The HDFS-265 design doc, which defines it as a property of the replica: {quote} visible length is the "number of bytes that have been acknowledged by the downstream DataNodes". It is replica (not block) specific, meaning it can be different for different replicas at a given time. In the document it is called BA (bytes acknowledged), compared to BR (bytes received). {quote} # The definition in HDFS-814 and DFSClient#getVisibleLength, which defines it as a property of a file: {quote} The visible length is the length such that *all* datanodes in the pipeline contain at least that amount of data. Therefore, these data are visible to the readers. According to this definition the visible length of a file is the floor of all visible lengths of all the replicas of the last block. It's a static property set on open, eg it is not updated when a writer calls hflush. Also DFSInputStream#readBlockLength returns the 1st visible length of a replica it finds, so it seems possible (though unlikely) that in a failure scenario it could return a length longer than what all replicas have. {quote} This has caused confusion in a number of other jiras. We should update the design doc and the javadoc, and perhaps rename DFSClient#getVisibleLength etc to disambiguate this.
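Definition #2 (the file property) can be stated as a small computation: the file's visible length is the length of the completed blocks plus the floor (minimum) of the bytes-acknowledged (BA) values across the last block's replicas. A sketch, with hypothetical names:

```java
// Sketch of definition #2 above; VisibleLength is an illustrative
// helper, not HDFS code. bytesAcked holds BA for each replica of the
// last (under-construction) block.
public class VisibleLength {

  public static long fileVisibleLength(long completedBlocksLength, long[] bytesAcked) {
    if (bytesAcked.length == 0) {
      return completedBlocksLength; // no last-block replicas reported
    }
    // The floor over replicas: every DN in the pipeline has at least
    // this many bytes, so readers can safely see them.
    long min = Long.MAX_VALUE;
    for (long ba : bytesAcked) {
      min = Math.min(min, ba);
    }
    return completedBlocksLength + min;
  }
}
```

Definition #1, by contrast, is just one replica's BA value, which is why the two usages of the term diverge.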
[jira] [Created] (HDFS-3220) Improve some block recovery log messages
Improve some block recovery log messages Key: HDFS-3220 URL: https://issues.apache.org/jira/browse/HDFS-3220 Project: Hadoop HDFS Issue Type: Improvement Reporter: Eli Collins FsDatasetImpl has three cases that throw exceptions with the message "THIS IS NOT SUPPOSED TO HAPPEN". These could happen in real life (eg with a corrupt block file). Let's improve these messages to indicate which case we've actually hit, since the generic message isn't very useful. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-3221) Update docs for HDFS-3140 (multiple interfaces)
Update docs for HDFS-3140 (multiple interfaces) --- Key: HDFS-3221 URL: https://issues.apache.org/jira/browse/HDFS-3221 Project: Hadoop HDFS Issue Type: Sub-task Components: documentation Reporter: Eli Collins Assignee: Eli Collins Need to update the docs to cover: - How to configure multihoming (binding to the wildcard, the default) - The new client and server configuration options -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-3224) Bug in check for DN re-registration with different storage ID
Bug in check for DN re-registration with different storage ID - Key: HDFS-3224 URL: https://issues.apache.org/jira/browse/HDFS-3224 Project: Hadoop HDFS Issue Type: Bug Reporter: Eli Collins Priority: Minor DatanodeManager#registerDatanode checks the host to node map using an IP:port key, however the map is keyed on IP, so this check will always fail. It's performing the check to determine if a DN with the same IP and storage ID has already registered, and if so to remove this DN from the map and indicate that eg it's no longer hosting these blocks. This bug has been here forever. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
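The mismatch can be illustrated with a toy map (not the actual Host2NodesMap API): an entry keyed on IP alone can never be found by an "IP:port" probe, so the re-registration check always misses:

```java
import java.util.HashMap;
import java.util.Map;

public class KeyMismatchDemo {
    private final Map<String, String> hostToNode = new HashMap<>();

    public void register(String ip, String node) {
        hostToNode.put(ip, node); // the map is keyed on IP only
    }

    public String lookup(String key) {
        // Returns null when the key doesn't match the map's key format,
        // which is what happens when the caller builds an "IP:port" key.
        return hostToNode.get(key);
    }
}
```

With `"10.0.0.1"` registered, looking up `"10.0.0.1:50010"` returns null, so the cleanup path (removing the old DN and its blocks) never fires.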
[jira] [Created] (HDFS-3230) Cleanup DatanodeID creation in the tests
Cleanup DatanodeID creation in the tests Key: HDFS-3230 URL: https://issues.apache.org/jira/browse/HDFS-3230 Project: Hadoop HDFS Issue Type: Improvement Components: test Reporter: Eli Collins Assignee: Eli Collins Priority: Minor A lot of tests create dummy DatanodeIDs, often using bogus values when creating the objects (eg a hostname in the IP field), which they can get away with because the IDs aren't actually used. Let's add a test utility method for creating a DatanodeID and use it throughout. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-3231) NN Host2NodesMap should use hostnames
NN Host2NodesMap should use hostnames - Key: HDFS-3231 URL: https://issues.apache.org/jira/browse/HDFS-3231 Project: Hadoop HDFS Issue Type: Improvement Reporter: Eli Collins Assignee: Eli Collins The NN's Host2NodesMap maps "host names" to datanode descriptors. It actually uses IP addresses and should use hostnames instead, as hostnames are a better key (eg a Datanode has one hostname but may have multiple IPs). Per HDFS-3216 there's actually a bug in that it's sometimes accessed with IP:port instead of IP, so that jira should be fixed before this one. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-3232) Cleanup DatanodeInfo vs DatanodeID handling in DN servlets
Cleanup DatanodeInfo vs DatanodeID handling in DN servlets -- Key: HDFS-3232 URL: https://issues.apache.org/jira/browse/HDFS-3232 Project: Hadoop HDFS Issue Type: Improvement Reporter: Eli Collins Assignee: Eli Collins Priority: Minor The DN servlets currently have code like the following: {code} final String hostname = host instanceof DatanodeInfo ? ((DatanodeInfo)host).getHostName() : host.getIpAddr(); {code} I believe this is outdated and that we now always get one or the other (at least when not running the tests); we need to verify that. We should clean this code up as well, eg always use the IP (which we'll look up the FQDN for) since the hostname isn't necessarily valid to put in a URL (the DN hostname isn't necessarily an FQDN). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-3233) Move IP to FQDN conversion from DatanodeJSPHelper to DatanodeID
Move IP to FQDN conversion from DatanodeJSPHelper to DatanodeID --- Key: HDFS-3233 URL: https://issues.apache.org/jira/browse/HDFS-3233 Project: Hadoop HDFS Issue Type: Improvement Reporter: Eli Collins Assignee: Eli Collins Priority: Minor In a handful of places DatanodeJSPHelper looks up the IP for a DN and then determines an FQDN for the IP. We should move this code to a single place: a new DatanodeID method that returns the FQDN for a DatanodeID. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-3237) DatanodeInfo should have a DatanodeID rather than extend it
DatanodeInfo should have a DatanodeID rather than extend it --- Key: HDFS-3237 URL: https://issues.apache.org/jira/browse/HDFS-3237 Project: Hadoop HDFS Issue Type: Improvement Reporter: Eli Collins Assignee: Eli Collins Priority: Minor DatanodeInfo currently extends DatanodeID; the code would be clearer if it had a DatanodeID member instead, as DatanodeInfo is private within the server side while DatanodeID is passed to clients. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-3238) ServerCommand and friends don't need to be writables
ServerCommand and friends don't need to be writables Key: HDFS-3238 URL: https://issues.apache.org/jira/browse/HDFS-3238 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.0.0 Reporter: Eli Collins Assignee: Eli Collins Attachments: hdfs-3238.txt We can remove the writable infrastructure from the ServerCommand classes as they're not used by clients and we use protobufs within the server side. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-3244) Remove dead writable code from hdfs/protocol
Remove dead writable code from hdfs/protocol Key: HDFS-3244 URL: https://issues.apache.org/jira/browse/HDFS-3244 Project: Hadoop HDFS Issue Type: Improvement Reporter: Eli Collins Assignee: Eli Collins While doing HDFS-3238 I noticed that there's more dead writable code in hdfs/protocol. Let's remove it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-3250) Get the fuse-dfs test running
Get the fuse-dfs test running - Key: HDFS-3250 URL: https://issues.apache.org/jira/browse/HDFS-3250 Project: Hadoop HDFS Issue Type: Improvement Components: contrib/fuse-dfs, test Reporter: Eli Collins Now that fuse-dfs is building again (HDFS-2696) let's get the test running. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-3251) Mavenize the fuse-dfs build
Mavenize the fuse-dfs build Key: HDFS-3251 URL: https://issues.apache.org/jira/browse/HDFS-3251 Project: Hadoop HDFS Issue Type: Improvement Components: build, contrib/fuse-dfs Reporter: Eli Collins The fuse-dfs build still uses the old ant-based build; let's integrate it as part of the maven build. Looks like we need to introduce sub-directories under src/main/native as libhdfs is there (w/o its own subdirectory). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-3252) Include fuse-dfs in the tarball
Include fuse-dfs in the tarball --- Key: HDFS-3252 URL: https://issues.apache.org/jira/browse/HDFS-3252 Project: Hadoop HDFS Issue Type: Improvement Components: build, contrib/fuse-dfs Reporter: Eli Collins The fuse-dfs binary needs to be included in the binary tarball. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-3258) Test for HADOOP-8144 (pseudoSortByDistance in NetworkTopology for first rack local node)
Test for HADOOP-8144 (pseudoSortByDistance in NetworkTopology for first rack local node) Key: HDFS-3258 URL: https://issues.apache.org/jira/browse/HDFS-3258 Project: Hadoop HDFS Issue Type: Test Components: test Reporter: Eli Collins Assignee: Junping Du For updating TestNetworkTopology to cover HADOOP-8144. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-2502) hdfs-default.xml should include dfs.name.dir.restore
hdfs-default.xml should include dfs.name.dir.restore Key: HDFS-2502 URL: https://issues.apache.org/jira/browse/HDFS-2502 Project: Hadoop HDFS Issue Type: Improvement Components: documentation Affects Versions: 0.23.0 Reporter: Eli Collins Priority: Minor -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-2514) Link resolution bug for intermediate symlinks with relative targets
Link resolution bug for intermediate symlinks with relative targets --- Key: HDFS-2514 URL: https://issues.apache.org/jira/browse/HDFS-2514 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.21.0, 0.22.0, 0.23.0 Reporter: Eli Collins Assignee: Eli Collins There's a bug in the way the Namenode resolves intermediate symlinks (ie the symlink is not the final path component) in paths when the symlink's target is a relative path. Will post the full description in the first comment. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-2534) Remove RemoteBlockReader and rename RemoteBlockReader2
Remove RemoteBlockReader and rename RemoteBlockReader2 -- Key: HDFS-2534 URL: https://issues.apache.org/jira/browse/HDFS-2534 Project: Hadoop HDFS Issue Type: Improvement Components: data-node Affects Versions: 0.24.0 Reporter: Eli Collins HDFS-2129 introduced a new BlockReader implementation and preserved the old one, which can be selected via a config option as a fallback in 23. For 24 let's remove RemoteBlockReader, rename RemoteBlockReader2, and remove the config option. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-2556) HDFS tests fail on systems with umask 0002
HDFS tests fail on systems with umask 0002 -- Key: HDFS-2556 URL: https://issues.apache.org/jira/browse/HDFS-2556 Project: Hadoop HDFS Issue Type: Bug Components: data-node, test Affects Versions: 0.20.206.0 Reporter: Eli Collins Priority: Minor On systems with umask 0002 tests will fail due to all data dir directories being invalid: 2011-11-16 14:19:53,879 WARN datanode.DataNode (DataNode.java:makeInstance(1569)) - Invalid directory in dfs.data.dir: Incorrect permission for /data/2/eli/src/hadoop2/build/test/data/dfs/data/data1, expected: rwxr-xr-x, while actual: rwxrwxr-x 2011-11-16 14:19:53,893 WARN datanode.DataNode (DataNode.java:makeInstance(1569)) - Invalid directory in dfs.data.dir: Incorrect permission for /data/2/eli/src/hadoop2/build/test/data/dfs/data/data2, expected: rwxr-xr-x, while actual: rwxrwxr-x 2011-11-16 14:19:53,894 ERROR datanode.DataNode (DataNode.java:makeInstance(1575)) - All directories in dfs.data.dir are invalid. Aside from changing the umask, backporting HDFS-1560 fixed this issue. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
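The permission mismatch in the log above is mode arithmetic: directories are created with mode 0777 & ~umask, so umask 0002 yields 0775 (rwxrwxr-x) where the DataNode expects 0755 (rwxr-xr-x). A tiny demonstration:

```java
public class UmaskDemo {
    // Effective mode of a newly created directory under a given umask,
    // per the standard POSIX mode-creation rule.
    public static int dirMode(int umask) {
        return 0777 & ~umask;
    }
}
```

With the default umask 0022 the result is exactly the 0755 the DataNode's dfs.data.dir check expects, which is why the failure only shows up on umask-0002 systems.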
[jira] [Created] (HDFS-2570) Add descriptions for dfs.*.https.address in hdfs-default.xml
Add descriptions for dfs.*.https.address in hdfs-default.xml Key: HDFS-2570 URL: https://issues.apache.org/jira/browse/HDFS-2570 Project: Hadoop HDFS Issue Type: Improvement Components: documentation Affects Versions: 0.23.0 Reporter: Eli Collins Assignee: Eli Collins Priority: Trivial Attachments: hdfs-2570-1.patch Let's add descriptions for dfs.*.https.address in hdfs-default.xml. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-2596) TestDirectoryScanner doesn't test parallel scans
TestDirectoryScanner doesn't test parallel scans Key: HDFS-2596 URL: https://issues.apache.org/jira/browse/HDFS-2596 Project: Hadoop HDFS Issue Type: Bug Components: data-node, test Affects Versions: 0.23.0, 0.22.0 Reporter: Eli Collins Assignee: Eli Collins Attachments: hdfs-2596-1.patch The code from HDFS-854 below doesn't run the test with parallel scanning. They probably intended "parallelism < 3". {code} + public void testDirectoryScanner() throws Exception { +// Run the test with and without parallel scanning +for (int parallelism = 1; parallelism < 2; parallelism++) { + runTest(parallelism); +} + } {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
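The loop as presumably intended, sketched here (not the actual patch): fixing the bound to "parallelism < 3" runs the test for parallelism 1 and 2, ie both without and with parallel scanning:

```java
import java.util.ArrayList;
import java.util.List;

public class ScannerLoopFix {
    public static List<Integer> parallelismValuesTested() {
        List<Integer> tested = new ArrayList<>();
        // "parallelism < 3" covers 1 (serial) and 2 (parallel) scans;
        // the original "parallelism < 2" only ever ran the serial case.
        for (int parallelism = 1; parallelism < 3; parallelism++) {
            tested.add(parallelism); // stands in for runTest(parallelism)
        }
        return tested;
    }
}
```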
[jira] [Created] (HDFS-2607) Use named daemon threads for the directory scanner
Use named daemon threads for the directory scanner -- Key: HDFS-2607 URL: https://issues.apache.org/jira/browse/HDFS-2607 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 0.23.0, 0.21.0, 0.22.0 Reporter: Eli Collins Fix For: 0.23.1 HDFS-854 added a thread pool for block scanners. It would be better to use a factory that names the threads and daemonizes them so they don't block shutdown. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
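A minimal sketch of such a factory (names are illustrative, not the actual HDFS code): each thread gets a descriptive name and is daemonized so the pool doesn't block JVM shutdown:

```java
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

public class NamedDaemonThreadFactory implements ThreadFactory {
    private final AtomicInteger count = new AtomicInteger(1);
    private final String prefix;

    public NamedDaemonThreadFactory(String prefix) {
        this.prefix = prefix;
    }

    @Override
    public Thread newThread(Runnable r) {
        Thread t = new Thread(r, prefix + "-" + count.getAndIncrement());
        t.setDaemon(true); // daemon threads don't block JVM shutdown
        return t;
    }
}
```

Passing an instance to Executors.newFixedThreadPool(n, factory) would give threads named eg "DirectoryScanner-1" in stack dumps instead of the default "pool-1-thread-1".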
[jira] [Created] (HDFS-2610) Make dashes vs dots consistent in config key names
Make dashes vs dots consistent in config key names -- Key: HDFS-2610 URL: https://issues.apache.org/jira/browse/HDFS-2610 Project: Hadoop HDFS Issue Type: Improvement Reporter: Eli Collins Priority: Minor The use of dashes vs dots in the config keys is inconsistent (eg https.address vs http-address). Let's make them all consistent (no dashes seems most consistent) and add the necessary deprecations in HdfsConfiguration.java. We should do the same in common and MR so we're consistent there as well. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-2611) Build and publish indexed source code
Build and publish indexed source code - Key: HDFS-2611 URL: https://issues.apache.org/jira/browse/HDFS-2611 Project: Hadoop HDFS Issue Type: Task Components: documentation Reporter: Eli Collins The HBase folks publish xref which produces pages like http://hbase.apache.org/xref/org/apache/hadoop/hbase/client/Delete.html. It's quite nice: it makes their code indexable by Google, and, since it understands Java, it's easy to move around between classes. Let's do this as well. Here's the maven plugin: http://maven.apache.org/plugins/maven-jxr-plugin/jxr-mojo.html -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-2631) Rewrite fuse-dfs to use the webhdfs protocol
Rewrite fuse-dfs to use the webhdfs protocol Key: HDFS-2631 URL: https://issues.apache.org/jira/browse/HDFS-2631 Project: Hadoop HDFS Issue Type: Improvement Components: contrib/fuse-dfs Reporter: Eli Collins We should port the implementation of fuse-dfs to use the webhdfs protocol. This has a number of benefits: * Compatibility - allows a single fuse client to work across server versions * Works with both WebHDFS and Hoop since they are protocol compatible * Removes the overhead related to libhdfs (forking a jvm) * Makes it easier to support features like security -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-2633) BPOfferService#isAlive is poorly named
BPOfferService#isAlive is poorly named -- Key: HDFS-2633 URL: https://issues.apache.org/jira/browse/HDFS-2633 Project: Hadoop HDFS Issue Type: Improvement Components: data-node Affects Versions: 0.23.0 Reporter: Eli Collins Priority: Minor Per HDFS-2627 the current implementation returns true even if one of the actor threads is dead. "The only non-test use case for isAlive seems to be from BlockPoolSliceScanner and DataBlockScanner, where they're really trying to figure out whether they should stop scanning the block pool. If the BPOS is connected to any NN at all (regardless of active/standby) it needs to report true so that the scanners don't stop running. It would be nice to clean up these calls and specify in their function name that they're only meant for use in tests" and annotate @VisibleForTesting. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-2637) The rpc timeout for block recovery is too low
The rpc timeout for block recovery is too low -- Key: HDFS-2637 URL: https://issues.apache.org/jira/browse/HDFS-2637 Project: Hadoop HDFS Issue Type: Bug Components: hdfs client Affects Versions: 1.0.0 Reporter: Eli Collins Assignee: Eli Collins The RPC timeout for block recovery does not take into account that it issues multiple RPCs itself. This can cause recovery to fail if the network is congested or DNs are busy. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
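A hedged sketch of the budgeting this implies (hypothetical helper, not the actual fix): since recoverBlock itself issues an RPC per pipeline datanode plus a NameNode call, its client-side timeout should scale with the pipeline size rather than being a single-RPC timeout:

```java
public class RecoveryTimeout {
    // Budget one nested RPC per pipeline datanode plus the NameNode call,
    // so the outer recoverBlock RPC doesn't time out while its own nested
    // RPCs are still legitimately in flight on a congested network.
    public static long recoverBlockTimeoutMs(long singleRpcTimeoutMs, int pipelineSize) {
        return singleRpcTimeoutMs * (pipelineSize + 1);
    }
}
```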
[jira] [Created] (HDFS-2638) Improve a block recovery log
Improve a block recovery log --- Key: HDFS-2638 URL: https://issues.apache.org/jira/browse/HDFS-2638 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Affects Versions: 1.0.0 Reporter: Eli Collins Assignee: Eli Collins Priority: Minor It would be useful to know whether an attempt to recover a block is failing because the block was already recovered (has a new GS) or the block is missing. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-2639) A client may fail during block recovery even if its request to recover a block succeeds
A client may fail during block recovery even if its request to recover a block succeeds --- Key: HDFS-2639 URL: https://issues.apache.org/jira/browse/HDFS-2639 Project: Hadoop HDFS Issue Type: Bug Components: hdfs client Affects Versions: 1.0.0 Reporter: Eli Collins The client gets stuck in the following loop if an RPC it issued to recover a block timed out: {noformat} DataStreamer#run 1. processDatanodeError 2. DN#recoverBlock 3. DN#syncBlock 4. NN#nextGenerationStamp 5. sleep 1s 6. goto 1 {noformat} Once we've timed out once at step 2 and loop, step 2 throws an IOE because the block is already being recovered and step 4 throws an IOE because the block GS is now out of date (the previous, timed-out, request got a new GS and updated the block). Eventually the client reaches max retries, considers all DNs bad, and close throws an IOE. The client should be able to succeed if one of its requests to recover the block succeeded. It should still fail if another client (eg HBase via recoverLease or the NN via releaseLease) successfully recovered the block. One way to handle this would be to not time out the request to recover the block. Another would be to make a subsequent call to recoverBlock succeed, eg by updating the block's sequence number to be the latest value that was updated by the same client in the previous request (ie it can recover over itself but not another client). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-2653) DFSClient should cache whether addrs are non-local when short-circuiting is enabled
DFSClient should cache whether addrs are non-local when short-circuiting is enabled --- Key: HDFS-2653 URL: https://issues.apache.org/jira/browse/HDFS-2653 Project: Hadoop HDFS Issue Type: Improvement Components: data-node Affects Versions: 0.23.1, 1.0.0 Reporter: Eli Collins Assignee: Eli Collins Something Todd mentioned to me offline: currently DFSClient doesn't cache the fact that non-local reads are non-local, so if short-circuiting is enabled, every time we create a block reader we'll go through the isLocalAddress code path. We should cache the fact that an addr is non-local as well. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-2654) Make BlockReaderLocal not extend RemoteBlockReader2
Make BlockReaderLocal not extend RemoteBlockReader2 --- Key: HDFS-2654 URL: https://issues.apache.org/jira/browse/HDFS-2654 Project: Hadoop HDFS Issue Type: Improvement Components: data-node Affects Versions: 0.23.1, 1.0.0 Reporter: Eli Collins Assignee: Eli Collins The BlockReaderLocal code paths are easier to understand (especially true on branch-1, where BlockReaderLocal inherits code from BlockReader and FSInputChecker) if the local and remote block reader implementations are independent, and they're not really sharing much code anyway. If for some reason they start to share significant code we can make the BlockReader interface an abstract class. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-2655) BlockReaderLocal#skip performs unnecessary IO
BlockReaderLocal#skip performs unnecessary IO - Key: HDFS-2655 URL: https://issues.apache.org/jira/browse/HDFS-2655 Project: Hadoop HDFS Issue Type: Improvement Components: data-node Affects Versions: 0.23.1 Reporter: Eli Collins Per HDFS-2654 BlockReaderLocal#skip performs the skip by reading the data so we stay in sync with checksums. This could be implemented more efficiently in the future to skip to the beginning of the appropriate checksum chunk and then only read to the middle of that chunk. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
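The chunk arithmetic behind the proposed optimization can be sketched as follows (hypothetical helper names, not the BlockReaderLocal API): seek to the start of the checksum chunk containing the target offset, then read and verify only the partial chunk up to the target:

```java
public class ChunkSkip {
    // Offset of the start of the checksum chunk containing targetOffset.
    public static long chunkStart(long targetOffset, int bytesPerChecksum) {
        return (targetOffset / bytesPerChecksum) * bytesPerChecksum;
    }

    // Bytes that must actually be read (and checksummed) after seeking
    // to the chunk boundary, instead of reading every skipped byte.
    public static long bytesToRead(long targetOffset, int bytesPerChecksum) {
        return targetOffset - chunkStart(targetOffset, bytesPerChecksum);
    }
}
```

For example, skipping to offset 1000 with 512-byte checksum chunks would mean seeking to 512 and reading 488 bytes, rather than reading all 1000.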
[jira] [Created] (HDFS-2657) TestHttpFSServer and TestServerWebApp are failing on trunk
TestHttpFSServer and TestServerWebApp are failing on trunk -- Key: HDFS-2657 URL: https://issues.apache.org/jira/browse/HDFS-2657 Project: Hadoop HDFS Issue Type: Bug Reporter: Eli Collins >>> org.apache.hadoop.fs.http.server.TestHttpFSServer.instrumentation >>> org.apache.hadoop.lib.servlet.TestServerWebApp.lifecycle -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-2658) HttpFS introduced 70 javadoc warnings
HttpFS introduced 70 javadoc warnings - Key: HDFS-2658 URL: https://issues.apache.org/jira/browse/HDFS-2658 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.23.1 Reporter: Eli Collins Assignee: Alejandro Abdelnur {noformat} hadoop1 (trunk)$ grep warning javadoc.txt |grep -c httpfs 70 {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-2659) 20 "Cannot find annotation method 'value()'" of LimitedPrivate javadoc warnings
20 "Cannot find annotation method 'value()'" of LimitedPrivate javadoc warnings --- Key: HDFS-2659 URL: https://issues.apache.org/jira/browse/HDFS-2659 Project: Hadoop HDFS Issue Type: Bug Reporter: Eli Collins There are 20 of the following warnings on trunk: Cannot find annotation method 'value()' in type 'org.apache.hadoop.classification.InterfaceAudience.LimitedPrivate' -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-2677) HA: Web UI should indicate the NN state
HA: Web UI should indicate the NN state --- Key: HDFS-2677 URL: https://issues.apache.org/jira/browse/HDFS-2677 Project: Hadoop HDFS Issue Type: Sub-task Components: ha Affects Versions: HA branch (HDFS-1623) Reporter: Eli Collins Assignee: Eli Collins The DFS web UI should indicate whether it's an active or standby. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-2679) Add interface to query current state to HAServiceProtocol
Add interface to query current state to HAServiceProtocol -- Key: HDFS-2679 URL: https://issues.apache.org/jira/browse/HDFS-2679 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: HA branch (HDFS-1623) Reporter: Eli Collins Assignee: Eli Collins Let's add an interface to HAServiceProtocol to query the current state of a NameNode for use by the CLI (HAAdmin) and Web UI (HDFS-2677). This essentially makes the names "active" and "standby" from ACTIVE_STATE and STANDBY_STATE public interfaces, which IMO seems reasonable. Unlike the other APIs we should be able to use the interface even when HA is not enabled (as by default a non-HA NN is active). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-2701) Cleanup FS* processIOError methods
Cleanup FS* processIOError methods -- Key: HDFS-2701 URL: https://issues.apache.org/jira/browse/HDFS-2701 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Affects Versions: 0.20.205.0 Reporter: Eli Collins Assignee: Eli Collins Fix For: 1.1.0 Let's rename the various "processIOError" methods to be more descriptive. The current code makes it difficult to identify and reason about bug fixes. While we're at it let's remove "Fatal" from the "Unable to sync the edit log" log since it's not actually a fatal error (this is confusing to users). And 2NN "Checkpoint done" should be info, not a warning (also confusing to users). Thanks to HDFS-1073 these issues don't exist on trunk or 23. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-2702) A single failed name dir can cause the NN to exit
A single failed name dir can cause the NN to exit -- Key: HDFS-2702 URL: https://issues.apache.org/jira/browse/HDFS-2702 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.20.205.0 Reporter: Eli Collins Assignee: Eli Collins Priority: Critical Fix For: 1.1.0 There's a bug in FSEditLog#rollEditLog which results in the NN process exiting if a single name dir has failed. Here's the relevant code:
{code}
close() // So editStreams.size() is 0
foreach edits dir {
  ...
  eStream = new ... // Might get an IOE here
  editStreams.add(eStream);
} catch (IOException ioe) {
  removeEditsForStorageDir(sd); // exits if editStreams.size() <= 1
}
{code}
If we get an IOException before we've added two edits streams to the list we'll exit; eg if there's an error processing the 1st name dir we'll exit even if there are 4 valid name dirs. The fix is to move the check out of removeEditsForStorageDir (née processIOError), or modify it so the check can be disabled in some cases, eg here where we don't yet know how many streams are valid.
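To make the proposed fix concrete, here is a minimal, self-contained Java sketch (hypothetical names, not the actual FSEditLog code) of the pattern described above: attempt to open a stream for every edits dir first, and only decide whether enough streams survived after the whole loop, rather than exiting inside the per-dir error handler.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the proposed rollEditLog fix.
public class RollEditLogSketch {
    static List<String> openStreams(List<String> dirs) {
        List<String> editStreams = new ArrayList<>();
        for (String dir : dirs) {
            try {
                editStreams.add(openStream(dir)); // may throw
            } catch (IOException ioe) {
                // Record the failed dir but do NOT exit here: we don't yet
                // know how many of the remaining dirs are valid.
                System.err.println("Removing failed edits dir " + dir);
            }
        }
        if (editStreams.isEmpty()) {
            // Only now is it safe to conclude no storage is usable.
            throw new IllegalStateException("No usable edits directories");
        }
        return editStreams;
    }

    // Stand-in for creating an EditLogOutputStream; dirs named "bad*" fail.
    static String openStream(String dir) throws IOException {
        if (dir.startsWith("bad")) {
            throw new IOException("cannot open " + dir);
        }
        return "stream:" + dir;
    }

    public static void main(String[] args) {
        // First dir fails, but we keep running on the remaining dirs.
        List<String> streams = openStreams(List.of("bad1", "good1", "good2"));
        System.out.println(streams.size()); // 2
    }
}
```

With this shape, an error on the 1st name dir no longer masks the 4 valid ones.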
[jira] [Created] (HDFS-2703) removedStorageDirs is not updated everywhere we remove a storage dir
removedStorageDirs is not updated everywhere we remove a storage dir Key: HDFS-2703 URL: https://issues.apache.org/jira/browse/HDFS-2703 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 1.0.0 Reporter: Eli Collins Assignee: Eli Collins There are a number of places (FSEditLog#open, purgeEditLog, and rollEditLog) where we remove a storage directory but don't add it to the removedStorageDirs list. This means a storage dir may have been removed but we don't see it in the log or Web UI. This doesn't affect trunk/23 since the code there is totally different.
[jira] [Created] (HDFS-2704) NameNodeResourceChecker#checkAvailableResources should check for inodes
NameNodeResourceChecker#checkAvailableResources should check for inodes -- Key: HDFS-2704 URL: https://issues.apache.org/jira/browse/HDFS-2704 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Affects Versions: 0.24.0 Reporter: Eli Collins NameNodeResourceChecker#checkAvailableResources currently just checks for free space. However, inodes are also a file system resource that needs to be available (you can run out of inodes while still having free space).
[jira] [Created] (HDFS-2708) Stats for the total # blocks and blocks per DN
Stats for the total # blocks and blocks per DN -- Key: HDFS-2708 URL: https://issues.apache.org/jira/browse/HDFS-2708 Project: Hadoop HDFS Issue Type: Improvement Components: data-node, name-node Reporter: Eli Collins Priority: Minor It would be useful for tools to be able to retrieve the total # of blocks in the file system (and to display it, eg via the dfsadmin report; this is currently only available via FSNamesystemMetrics, so we would add it to ClientProtocol#getStats?) and the total number of blocks on each datanode (via DatanodeInfo).
[jira] [Created] (HDFS-2715) start-dfs.sh falsely warns about processes already running
start-dfs.sh falsely warns about processes already running -- Key: HDFS-2715 URL: https://issues.apache.org/jira/browse/HDFS-2715 Project: Hadoop HDFS Issue Type: Bug Components: scripts Affects Versions: 0.24.0 Reporter: Eli Collins The sbin script pid detection is broken. Running start-dfs.sh prints the following even if no processes are running and the pid dir is empty before starting.
{noformat}
hadoop-0.24.0-SNAPSHOT $ ./sbin/start-dfs.sh
Starting namenodes on [localhost localhost]
localhost: starting namenode, logging to /home/eli/hadoop/dirs1/logs/eli/hadoop-eli-namenode-eli-thinkpad.out
localhost: namenode running as process 25256. Stop it first.
{noformat}
This may be in 23 as well.
[jira] [Created] (HDFS-2731) Autopopulate standby name dirs if they're empty
Autopopulate standby name dirs if they're empty --- Key: HDFS-2731 URL: https://issues.apache.org/jira/browse/HDFS-2731 Project: Hadoop HDFS Issue Type: Sub-task Components: ha Affects Versions: HA branch (HDFS-1623) Reporter: Eli Collins Assignee: Eli Collins To set up an SBN we currently format the primary, then manually copy the name dirs to the SBN. The SBN should do this automatically. Specifically, on NN startup, if HA with a shared edits dir is configured and populated, and the SBN has empty name dirs, it should download the image and log from the primary (as an optimization it could copy the logs from the shared dir). If the other NN is still in standby then it should fail to start, as it does currently.
[jira] [Created] (HDFS-2732) Add support for the standby in the bin scripts
Add support for the standby in the bin scripts -- Key: HDFS-2732 URL: https://issues.apache.org/jira/browse/HDFS-2732 Project: Hadoop HDFS Issue Type: Sub-task Components: ha Affects Versions: HA branch (HDFS-1623) Reporter: Eli Collins Assignee: Eli Collins We need to update the bin scripts to support SBNs. Two ideas: modify start-dfs.sh to start another copy of the NN if HA is configured. We could introduce a file similar to masters (2NN hosts), called standbys, which lists the SBN hosts, and start-dfs.sh would automatically make the NN it starts active (and leave the NNs listed in standbys as is). Or, simpler, we could just provide a start-namenode.sh script that a user can run to start the SBN on another host themselves. The user would manually tell the other NN to be active via HAAdmin (or start-dfs.sh could do that automatically, ie assume the NN it starts should be the primary).
[jira] [Created] (HDFS-2733) Document HA configuration and CLI
Document HA configuration and CLI - Key: HDFS-2733 URL: https://issues.apache.org/jira/browse/HDFS-2733 Project: Hadoop HDFS Issue Type: Sub-task Components: documentation, ha Affects Versions: HA branch (HDFS-1623) Reporter: Eli Collins We need to document the configuration changes in HDFS-2231 and the new CLI introduced by HADOOP-7774.
[jira] [Created] (HDFS-2735) HA: add tests for multiple shared edits dirs
HA: add tests for multiple shared edits dirs Key: HDFS-2735 URL: https://issues.apache.org/jira/browse/HDFS-2735 Project: Hadoop HDFS Issue Type: Sub-task Components: ha, test Affects Versions: HA branch (HDFS-1623) Reporter: Eli Collins You can configure and run with multiple shared edits dirs, but we don't have any test coverage for them. In particular, we should cover the behavior of the edit log tailer with multiple dirs, and failure scenarios (eg can we tolerate a single shared dir failure if we have two shared dirs?).
[jira] [Created] (HDFS-2736) HA: support separate SBN and 2NN?
HA: support separate SBN and 2NN? - Key: HDFS-2736 URL: https://issues.apache.org/jira/browse/HDFS-2736 Project: Hadoop HDFS Issue Type: Sub-task Components: ha Affects Versions: HA branch (HDFS-1623) Reporter: Eli Collins HDFS-2291 adds support for making the SBN capable of checkpointing; it seems like we may need to support 2NN checkpointing as well. Eg if we fail over to the SBN, does it continue to checkpoint? If not, the log grows unbounded until the old primary comes back; if so, does that create performance problems, since the primary wasn't previously checkpointing?
[jira] [Created] (HDFS-2747) HA: entering SM after starting SBN can NPE
HA: entering SM after starting SBN can NPE -- Key: HDFS-2747 URL: https://issues.apache.org/jira/browse/HDFS-2747 Project: Hadoop HDFS Issue Type: Sub-task Components: ha Affects Versions: HA branch (HDFS-1623) Reporter: Eli Collins Entering safemode on the primary while it's already in safemode, after the SBN has been started, results in an NPE:
{noformat}
hadoop-0.24.0-SNAPSHOT $ ./bin/hdfs dfsadmin -safemode get
Safe mode is ON
hadoop-0.24.0-SNAPSHOT $ ./bin/hdfs dfsadmin -safemode enter
safemode: java.lang.NullPointerException
{noformat}
[jira] [Created] (HDFS-2752) HA: exit if multiple shared dirs are configured
HA: exit if multiple shared dirs are configured --- Key: HDFS-2752 URL: https://issues.apache.org/jira/browse/HDFS-2752 Project: Hadoop HDFS Issue Type: Sub-task Components: ha Affects Versions: HA branch (HDFS-1623) Reporter: Eli Collins Assignee: Eli Collins We don't support multiple shared edits dirs; we should fail to start with an error in this case.
[jira] [Created] (HDFS-2754) HA: enable dfs.namenode.name.dir.restore if HA is enabled
HA: enable dfs.namenode.name.dir.restore if HA is enabled - Key: HDFS-2754 URL: https://issues.apache.org/jira/browse/HDFS-2754 Project: Hadoop HDFS Issue Type: Sub-task Components: ha, name-node Affects Versions: HA branch (HDFS-1623) Reporter: Eli Collins If HA is enabled it seems like we should always try to restore failed name dirs. Let's auto-enable name dir restoration if HA is enabled, at least for shared edits dirs.
[jira] [Created] (HDFS-2755) HA: add tests for flaky and failed shared edits directories
HA: add tests for flaky and failed shared edits directories --- Key: HDFS-2755 URL: https://issues.apache.org/jira/browse/HDFS-2755 Project: Hadoop HDFS Issue Type: Sub-task Components: ha, test Affects Versions: HA branch (HDFS-1623) Reporter: Eli Collins We should test the behavior with both flaky and failed shared edits dirs. The tests should cover when name dir restore is enabled and disabled. There should be a warning, and an API we can use to check whether any shared directories are offline.
[jira] [Created] (HDFS-2758) HA: multi-process MiniDFS cluster for testing ungraceful shutdown
HA: multi-process MiniDFS cluster for testing ungraceful shutdown - Key: HDFS-2758 URL: https://issues.apache.org/jira/browse/HDFS-2758 Project: Hadoop HDFS Issue Type: Sub-task Components: ha, test Affects Versions: HA branch (HDFS-1623) Reporter: Eli Collins Assignee: Eli Collins We should test ungraceful termination of NN processes; this is generally useful for HDFS testing, but particularly needed for HA, since we may do this via fencing (send a NN a SIGKILL via ssh (kill -9), flip the PDU, etc). We can't currently do this with the MiniDFSCluster since everything is in one process, and killing the native thread hosting the Java thread terminates the whole process.
[jira] [Created] (HDFS-2781) Add client protocol and DFSadmin for command to restore failed storage
Add client protocol and DFSadmin for command to restore failed storage -- Key: HDFS-2781 URL: https://issues.apache.org/jira/browse/HDFS-2781 Project: Hadoop HDFS Issue Type: Sub-task Components: ha Affects Versions: HA branch (HDFS-1623) Reporter: Eli Collins Assignee: Eli Collins Per HDFS-2769, it's important that an admin be able to ask the NN to try to restore failed storage, since we may drop into safemode until the shared edits dir is restored (w/o having to wait for the next checkpoint). There's currently an API (and usage in DFSAdmin) to flip the flag indicating whether the NN should try to restore failed storage, but no command to tell it to actually attempt a restore; this jira is to add one. This is useful outside HA, but it's filed as an HDFS-1623 sub-task since it's motivated by HA.
[jira] [Created] (HDFS-2782) HA: Support multiple shared edits dirs
HA: Support multiple shared edits dirs -- Key: HDFS-2782 URL: https://issues.apache.org/jira/browse/HDFS-2782 Project: Hadoop HDFS Issue Type: Sub-task Components: ha Affects Versions: HA branch (HDFS-1623) Reporter: Eli Collins Assignee: Eli Collins Supporting multiple shared dirs will improve availability (eg see HDFS-2769). You may want to use multiple shared dirs on a single filer (eg for better fault isolation) or because you want to use multiple filers/mounts. Per HDFS-2752 (and HDFS-2735) we need to do things like use the JournalSet in EditLogTailer and add tests.
[jira] [Created] (HDFS-2788) HdfsServerConstants#DN_KEEPALIVE_TIMEOUT is dead code
HdfsServerConstants#DN_KEEPALIVE_TIMEOUT is dead code - Key: HDFS-2788 URL: https://issues.apache.org/jira/browse/HDFS-2788 Project: Hadoop HDFS Issue Type: Improvement Components: data-node Affects Versions: 0.23.0, 0.22.0 Reporter: Eli Collins Assignee: Eli Collins Attachments: hdfs-2788.txt HDFS-941 introduced HdfsServerConstants#DN_KEEPALIVE_TIMEOUT but it's never used. Perhaps it was renamed to DFSConfigKeys#DFS_DATANODE_SOCKET_REUSE_KEEPALIVE_DEFAULT while the patch was being written and the old constant wasn't deleted.
[jira] [Created] (HDFS-2789) TestHAAdmin.testFailover is failing
TestHAAdmin.testFailover is failing --- Key: HDFS-2789 URL: https://issues.apache.org/jira/browse/HDFS-2789 Project: Hadoop HDFS Issue Type: Sub-task Components: ha Affects Versions: HA branch (HDFS-1623) Reporter: Eli Collins Assignee: Eli Collins Attachments: hdfs-2789.txt A recent change broke it. We need to mock getServiceState to prevent the NPE.
[jira] [Created] (HDFS-2799) Trim fs.checkpoint.dir values
Trim fs.checkpoint.dir values - Key: HDFS-2799 URL: https://issues.apache.org/jira/browse/HDFS-2799 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.23.0 Reporter: Eli Collins fs.checkpoint.dir values need to be trimmed like dfs.name.dir and dfs.data.dir values, so that eg the following works. This currently results in the directory "HADOOP_HOME/?/home/eli/hadoop/dirs3/dfs/chkpoint1" being created.
{noformat}
<property>
  <name>fs.checkpoint.dir</name>
  <value>/home/eli/hadoop/dirs3/dfs/chkpoint1,
  /home/eli/hadoop/dirs3/dfs/chkpoint2</value>
</property>
{noformat}
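A minimal sketch of the trimming being requested, using a hypothetical helper (the real fix would live in the NN's configuration parsing): split the comma-separated value and trim each entry, as is already done for dfs.name.dir values.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: trim each entry of a comma-separated dir list so
// whitespace/newlines in the config value don't leak into the path.
public class TrimDirsSketch {
    static List<String> getTrimmedDirs(String confValue) {
        List<String> dirs = new ArrayList<>();
        for (String dir : confValue.split(",")) {
            String trimmed = dir.trim();
            if (!trimmed.isEmpty()) {
                dirs.add(trimmed);
            }
        }
        return dirs;
    }

    public static void main(String[] args) {
        // The newline and indentation before the second entry are dropped.
        System.out.println(getTrimmedDirs(
            "/dfs/chkpoint1,\n  /dfs/chkpoint2")); // → [/dfs/chkpoint1, /dfs/chkpoint2]
    }
}
```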
[jira] [Created] (HDFS-2800) TestStandbyCheckpoints.testCheckpointCancellation is racy
TestStandbyCheckpoints.testCheckpointCancellation is racy - Key: HDFS-2800 URL: https://issues.apache.org/jira/browse/HDFS-2800 Project: Hadoop HDFS Issue Type: Sub-task Components: ha, test Affects Versions: HA branch (HDFS-1623) Reporter: Eli Collins TestStandbyCheckpoints.testCheckpointCancellation is racy; we have seen the following assert on line 212 fail:
{code}
assertTrue(StandbyCheckpointer.getCanceledCount() > 0);
{code}
[jira] [Created] (HDFS-2808) HA: allow hdfs-specific names to be used in haadmin
HA: allow hdfs-specific names to be used in haadmin --- Key: HDFS-2808 URL: https://issues.apache.org/jira/browse/HDFS-2808 Project: Hadoop HDFS Issue Type: Sub-task Components: ha Affects Versions: HA branch (HDFS-1623) Reporter: Eli Collins Assignee: Eli Collins Currently the HAAdmin CLI tools refer to services using host:port; it would be more user friendly to allow people to use HDFS-specific logical names, eg the NNs configured in dfs.ha.namenodes, and let the tool do the mapping to host:port. We could do this by wrapping HAAdmin with an HDFS-specific class and a dfshadmin command.
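A hypothetical sketch of the wrapping idea: resolve a logical NN ID to its RPC address before delegating to the generic HAAdmin logic. The map and method names here are illustrative, not the actual configuration API.

```java
import java.util.Map;

// Hypothetical sketch of an HDFS-specific HAAdmin wrapper: map logical
// namenode IDs (as listed in dfs.ha.namenodes) to host:port targets.
public class DfsHaAdminSketch {
    static String resolveTarget(Map<String, String> nnIdToAddr, String arg) {
        // If the argument is a known logical ID, map it; otherwise assume
        // the user already passed a host:port target.
        return nnIdToAddr.getOrDefault(arg, arg);
    }

    public static void main(String[] args) {
        Map<String, String> conf = Map.of("nn1", "host1:8020", "nn2", "host2:8020");
        System.out.println(resolveTarget(conf, "nn1"));        // logical name
        System.out.println(resolveTarget(conf, "host3:8022")); // raw addr passes through
    }
}
```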
[jira] [Created] (HDFS-2860) HA: TestDFSRollback#testRollback is failing
HA: TestDFSRollback#testRollback is failing --- Key: HDFS-2860 URL: https://issues.apache.org/jira/browse/HDFS-2860 Project: Hadoop HDFS Issue Type: Bug Components: ha Affects Versions: HA branch (HDFS-1623) Reporter: Eli Collins Assignee: Aaron T. Myers TestDFSRollback#testRollback is failing post HDFS-2824. It looks like the test is asserting now-incorrect behavior.
[jira] [Created] (HDFS-2876) The unit tests (src/test/unit) are not being compiled and are not runnable
The unit tests (src/test/unit) are not being compiled and are not runnable -- Key: HDFS-2876 URL: https://issues.apache.org/jira/browse/HDFS-2876 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 0.23.0 Reporter: Eli Collins The unit tests (src/test/unit, not src/test/java) are not being compiled and are not runnable. {{mvn -Dtest=TestBlockRecovery test}} executed from hadoop-hdfs-project does not compile or execute the test. TestBlockRecovery does not compile, yet this test target completes w/o error.
[jira] [Created] (HDFS-2884) TestDecommission.testDecommissionFederation fails intermittently
TestDecommission.testDecommissionFederation fails intermittently Key: HDFS-2884 URL: https://issues.apache.org/jira/browse/HDFS-2884 Project: Hadoop HDFS Issue Type: Bug Components: data-node Affects Versions: 0.23.1 Reporter: Eli Collins I saw the following assert fail on a jenkins job for branch HDFS-1623 but I don't think it's HA related.
{noformat}
java.lang.AssertionError: Number of Datanodes expected:<2> but was:<1>
	at org.junit.Assert.fail(Assert.java:91)
	at org.junit.Assert.failNotEquals(Assert.java:645)
	at org.junit.Assert.assertEquals(Assert.java:126)
	at org.junit.Assert.assertEquals(Assert.java:470)
	at org.apache.hadoop.hdfs.TestDecommission.validateCluster(TestDecommission.java:275)
	at org.apache.hadoop.hdfs.TestDecommission.startCluster(TestDecommission.java:288)
	at org.apache.hadoop.hdfs.TestDecommission.testDecommission(TestDecommission.java:384)
	at org.apache.hadoop.hdfs.TestDecommission.testDecommissionFederation(TestDecommission.java:344)
{noformat}
[jira] [Created] (HDFS-2885) Remove "federation" from the nameservice config options
Remove "federation" from the nameservice config options --- Key: HDFS-2885 URL: https://issues.apache.org/jira/browse/HDFS-2885 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 0.23.1 Reporter: Eli Collins HDFS-1623, and potentially other HDFS features, will use the nameservice abstraction even if federation is not enabled (eg you need to configure {{dfs.federation.nameservices}} in HA, even if you're not using federation, just to declare your nameservice). This is confusing to users. We should consider deprecating and removing "federation" from the {{dfs.federation.nameservices}} and {{dfs.federation.nameservice.id}} config options, as {{dfs.nameservices}} and {{dfs.nameservice.id}} are more intuitive.
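Under the proposal, a non-federated HA configuration might declare its nameservice with the more intuitive keys. This is a sketch of the proposed names only; the deprecation mechanics are not specified in this issue.

```xml
<!-- Sketch of the proposed option names; the dfs.federation.* forms
     would be deprecated aliases. -->
<property>
  <name>dfs.nameservices</name>
  <value>ns1</value>
</property>
<property>
  <name>dfs.nameservice.id</name>
  <value>ns1</value>
</property>
```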
[jira] [Created] (HDFS-2893) The 2NN won't start if dfs.namenode.secondary.http-address is default or specified with a wildcard IP and port
The 2NN won't start if dfs.namenode.secondary.http-address is default or specified with a wildcard IP and port -- Key: HDFS-2893 URL: https://issues.apache.org/jira/browse/HDFS-2893 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.23.1 Reporter: Eli Collins Priority: Critical Looks like the DFSUtil address matching doesn't find a match if the http-address is specified using a wildcard IP and a port. It should return 0.0.0.0:50090 in this case, which would allow the 2NN to start. Also, unless http-address is explicitly configured in hdfs-site.xml, the 2NN will not start, since DFSUtil#getSecondaryNameNodeAddresses does not use the default value as a fallback. That may be confusing to people who expect the default value to be used.
{noformat}
hadoop-0.23.1-SNAPSHOT $ cat /home/eli/hadoop/conf3/hdfs-site.xml
...
<property>
  <name>dfs.namenode.secondary.http-address</name>
  <value>0.0.0.0:50090</value>
</property>

hadoop-0.23.1-SNAPSHOT $ ./bin/hdfs --config ~/hadoop/conf3 getconf -secondarynamenodes
0.0.0.0

hadoop-0.23.1-SNAPSHOT $ ./sbin/start-dfs.sh
Starting namenodes on [localhost]
localhost: starting namenode, logging to /home/eli/hadoop/dirs3/logs/eli/hadoop-eli-namenode-eli-thinkpad.out
localhost: starting datanode, logging to /home/eli/hadoop/dirs3/logs/eli/hadoop-eli-datanode-eli-thinkpad.out
Secondary namenodes are not configured. Cannot start secondary namenodes.
{noformat}
This works if eg localhost:50090 is used. We should also update the hdfs user guide to remove the reference to the masters file, since it's no longer used to configure which hosts the 2NN runs on.
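The wildcard case can be detected with the standard java.net API. A hypothetical sketch of the missing match (names are illustrative, not the actual DFSUtil code): if the configured http-address uses a wildcard IP, treat it as matching any local address and keep the configured port instead of concluding no 2NN is configured.

```java
import java.net.InetSocketAddress;

// Hypothetical sketch: recognize a wildcard (0.0.0.0 / any-local) IP in
// the configured dfs.namenode.secondary.http-address.
public class WildcardMatchSketch {
    static boolean isWildcard(InetSocketAddress addr) {
        return addr.getAddress() != null && addr.getAddress().isAnyLocalAddress();
    }

    public static void main(String[] args) {
        InetSocketAddress conf = new InetSocketAddress("0.0.0.0", 50090);
        if (isWildcard(conf)) {
            // Matches any local host; keep the configured port.
            System.out.println("wildcard match, port " + conf.getPort());
        }
    }
}
```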
[jira] [Created] (HDFS-2894) HA: disable 2NN when HA is enabled
HA: disable 2NN when HA is enabled -- Key: HDFS-2894 URL: https://issues.apache.org/jira/browse/HDFS-2894 Project: Hadoop HDFS Issue Type: Sub-task Components: ha Affects Versions: HA branch (HDFS-1623) Reporter: Eli Collins Assignee: Eli Collins The SecondaryNameNode should log a message and refuse to start if HA is enabled, since the StandbyNode checkpoints by default and IIRC we have not yet enabled the ability to have multiple checkpointers in the NN. On the HA branch the 2NN does not currently start from start-dfs.sh, because getconf -secondarynamenodes claims the http-address is not configured even though it is (this seems like a bug; in branch 23, getconf will correctly return localhost:50090).
{noformat}
<property>
  <name>dfs.namenode.secondary.http-address</name>
  <value>localhost:50090</value>
</property>

hadoop-0.24.0-SNAPSHOT $ ./bin/hdfs getconf -secondarynamenodes
Incorrect configuration: secondary namenode address dfs.namenode.secondary.http-address is not configured.
{noformat}
[jira] [Created] (HDFS-2896) The 2NN incorrectly daemonizes
The 2NN incorrectly daemonizes -- Key: HDFS-2896 URL: https://issues.apache.org/jira/browse/HDFS-2896 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.23.0, 0.24.0 Reporter: Eli Collins Assignee: Eli Collins The SecondaryNameNode (and Checkpointer) confuse o.a.h.u.Daemon with a Unix daemon. Per the code below, the intent is to create a thread that never ends, but o.a.h.u.Daemon just marks a thread with Java's Thread#setDaemon, which means Java will terminate the thread when there are no more non-daemon user threads running.
{code}
// Create a never ending deamon
Daemon checkpointThread = new Daemon(secondary);
{code}
Perhaps they thought they were using commons Daemon. We of course don't want the 2NN to exit unless it exits itself or is stopped explicitly; currently it avoids that fate only because the main thread is not marked as a daemon thread. In any case, let's make the 2NN consistent with the NN in this regard (exit when the RPC thread exits).
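A minimal demonstration of the confusion described above: a thread marked with Thread#setDaemon(true) does not outlive the last non-daemon thread, so it is the opposite of a never-ending Unix-style daemon.

```java
// Demonstrates that a Java "daemon" thread dies with the JVM, which exits
// once the last non-daemon thread (here, main) finishes.
public class DaemonSketch {
    public static void main(String[] args) {
        Thread t = new Thread(() -> {
            while (true) {
                // Pretend to checkpoint forever.
                try { Thread.sleep(100); } catch (InterruptedException e) { return; }
            }
        });
        t.setDaemon(true); // what o.a.h.u.Daemon does under the hood
        t.start();
        // When main (the only non-daemon thread) returns, the JVM exits and
        // the "never ending" loop above is silently terminated.
        System.out.println("main exiting; daemon thread will be killed");
    }
}
```

This is why the 2NN currently survives at all: its main thread happens to be a non-daemon thread.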
[jira] [Created] (HDFS-2897) Enable a single 2nn to checkpoint multiple nameservices
Enable a single 2nn to checkpoint multiple nameservices --- Key: HDFS-2897 URL: https://issues.apache.org/jira/browse/HDFS-2897 Project: Hadoop HDFS Issue Type: Sub-task Components: name-node Affects Versions: 0.23.0 Reporter: Eli Collins The dfs.namenode.secondary.http-address needs to be suffixed with a particular nameservice. It would be useful to be able to configure a single 2NN to checkpoint all the nameservices for a NN, rather than having to run a separate 2NN per nameservice. It could potentially checkpoint all namenode IDs for a nameservice as well, but given that the standby is capable of checkpointing and is required, I think we can ignore this case.
[jira] [Created] (HDFS-2911) Gracefully handle OutOfMemoryErrors
Gracefully handle OutOfMemoryErrors --- Key: HDFS-2911 URL: https://issues.apache.org/jira/browse/HDFS-2911 Project: Hadoop HDFS Issue Type: Improvement Components: data-node, name-node Affects Versions: 1.0.0, 0.23.0 Reporter: Eli Collins We should gracefully handle j.l.OutOfMemoryError exceptions in the NN or DN. We should catch them in a high-level handler, cleanly fail the RPC (vs sending back the OOM stack trace) or background thread, and shut down the NN or DN. Currently the process is left in a not-well-tested state (it continuously fails RPCs and internal threads, may or may not recover, and doesn't shut down gracefully).
[jira] [Created] (HDFS-2916) HA: allow dfsadmin to refer to a particular namenode
HA: allow dfsadmin to refer to a particular namenode Key: HDFS-2916 URL: https://issues.apache.org/jira/browse/HDFS-2916 Project: Hadoop HDFS Issue Type: Sub-task Components: ha Affects Versions: HA branch (HDFS-1623) Reporter: Eli Collins Assignee: Eli Collins dfsadmin currently fails over like other clients, so if you want to put a particular NN in safemode you need to use the "fs" option and specify a host:ipcport target. Like HDFS-2808, it would be useful to be able to specify a logical namenode ID instead of an RPC addr. Since fs is part of the generic options, this could potentially apply to all tools; however, most tools want to refer to the default logical namenode URI and fail over like other clients.
[jira] [Created] (HDFS-2918) HA: dfsadmin should failover like other clients
HA: dfsadmin should failover like other clients --- Key: HDFS-2918 URL: https://issues.apache.org/jira/browse/HDFS-2918 Project: Hadoop HDFS Issue Type: Sub-task Components: ha Affects Versions: HA branch (HDFS-1623) Reporter: Eli Collins dfsadmin currently always uses the first namenode rather than failing over. It should fail over like other clients, unless fs specifies a particular namenode.
{noformat}
hadoop-0.24.0-SNAPSHOT $ ./bin/hdfs haadmin -failover nn1 nn2
Failover from nn1 to nn2 successful
# nn2 is 8022
hadoop-0.24.0-SNAPSHOT $ ./bin/hdfs dfsadmin -fs localhost:8022 -safemode enter
Safe mode is ON
hadoop-0.24.0-SNAPSHOT $ ./bin/hdfs dfsadmin -safemode get
Safe mode is OFF
hadoop-0.24.0-SNAPSHOT $ ./bin/hdfs dfsadmin -fs localhost:8022 -safemode get
Safe mode is ON
{noformat}