[jira] [Updated] (HDFS-16283) RBF: improve renewLease() to call only a specific NameNode rather than make fan-out calls
[ https://issues.apache.org/jira/browse/HDFS-16283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aihua Xu updated HDFS-16283:
----------------------------
    Attachment: RBF_ improve renewLease() to call only a specific NameNode rather than make fan-out calls.pdf

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-16283) RBF: improve renewLease() to call only a specific NameNode rather than make fan-out calls
[ https://issues.apache.org/jira/browse/HDFS-16283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17435067#comment-17435067 ]

Aihua Xu commented on HDFS-16283:
---------------------------------

[~jingzhao] and [~inigoiri] Can you help review the change?
[jira] [Created] (HDFS-16283) RBF: improve renewLease() to call only a specific NameNode rather than make fan-out calls
Aihua Xu created HDFS-16283:
-------------------------------

             Summary: RBF: improve renewLease() to call only a specific NameNode rather than make fan-out calls
                 Key: HDFS-16283
                 URL: https://issues.apache.org/jira/browse/HDFS-16283
             Project: Hadoop HDFS
          Issue Type: Sub-task
          Components: rbf
            Reporter: Aihua Xu
            Assignee: Aihua Xu

Currently renewLease() against a router fans out to all the NameNodes. Since renewLease() is called so frequently, if one of the NameNodes is slow, the router queues eventually become blocked by renewLease() calls and the router degrades.

We will make a change on the client side to keep track of the NameNode ID in addition to the current fileId, so routers understand which NameNodes the client is renewing the lease against.
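The client-side idea above can be sketched with a toy model. This is an illustrative sketch only, not the actual HDFS-16283 patch: class and method names (LeaseTracker, nameNodesToRenew, the string NameNode IDs) are made up to show how tracking a per-file NameNode ID shrinks the renew fan-out.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: the client remembers which NameNode issued each open
// file's lease, so a renewLease() needs to reach only the NameNodes that
// actually hold leases, not every NameNode behind the router.
public class LeaseTracker {
  // fileId -> ID of the NameNode that holds the lease for that file
  private final Map<Long, String> fileToNameNode = new HashMap<>();

  public void openFile(long fileId, String nameNodeId) {
    fileToNameNode.put(fileId, nameNodeId);
  }

  public void closeFile(long fileId) {
    fileToNameNode.remove(fileId);
  }

  // The set of NameNodes a renew must touch: one entry per distinct
  // NameNode with at least one open file.
  public Set<String> nameNodesToRenew() {
    return new HashSet<>(fileToNameNode.values());
  }
}
```

With such bookkeeping, a router receiving the renew request can dispatch it to the named NameNodes only, so one slow NameNode no longer blocks renewals destined for the others.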
[jira] [Commented] (HDFS-16200) Improve NameNode failover
[ https://issues.apache.org/jira/browse/HDFS-16200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17415781#comment-17415781 ]

Aihua Xu commented on HDFS-16200:
---------------------------------

[~hexiaoqiao] Thanks for checking. Regarding improving topology resolution performance, there is TableMapping with precomputed topology info, but you need to know the list of hosts and precompute the topology. We could convert the script into a built-in implementation, but I believe we would still hit some slowness there.

For our particular case, we don't colocate storage with computing, and failover improved from over 10 minutes to just seconds by disabling topology resolution. These days more deployments separate storage and computing. Should we have a global configuration to optimize for those cases?
[jira] [Created] (HDFS-16200) Improve NameNode failover
Aihua Xu created HDFS-16200:
-------------------------------

             Summary: Improve NameNode failover
                 Key: HDFS-16200
                 URL: https://issues.apache.org/jira/browse/HDFS-16200
             Project: Hadoop HDFS
          Issue Type: Task
          Components: namenode
    Affects Versions: 2.8.2
            Reporter: Aihua Xu
            Assignee: Aihua Xu

In a busy cluster, we are noticing that NameNode failover takes a long time (over 10 minutes), which causes cluster downtime during that period.

One bottleneck lies in resolving client host topology when the cluster is not colocated with the computing hosts. The NameNode resolves a client host's topology and uses it to sort the hosts where the blocks are located. The topology is cached so subsequent accesses are efficient, but if the standby NameNode is newly restarted, all client hosts (e.g., YARN hosts) need to be resolved again.

Possible solutions: 1) expose an API in DFSAdmin to preload the topology cache, or 2) add a new configuration to skip topology resolution for non-colocated HDFS clusters. Since client hosts and HDFS hosts are not colocated, it's unnecessary to sort the DataNodes for the clients.
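For reference, the TableMapping alternative mentioned in the comment above is wired up roughly like this (the table file path is illustrative; the property keys are Hadoop's standard topology keys):

```
<!-- core-site.xml: replace script-based topology resolution with the
     precomputed TableMapping lookup -->
<property>
  <name>net.topology.node.switch.mapping.impl</name>
  <value>org.apache.hadoop.net.TableMapping</value>
</property>
<property>
  <name>net.topology.table.file.name</name>
  <!-- two-column file: hostname/IP, then rack path (path is illustrative) -->
  <value>/etc/hadoop/conf/topology.table</value>
</property>
```

This avoids forking a script per unresolved host, but as noted above it requires knowing the host list up front and precomputing the table.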
[jira] [Created] (HDFS-16159) Support resolvePath() in DistributedFileSystem for federated HDFS
Aihua Xu created HDFS-16159:
-------------------------------

             Summary: Support resolvePath() in DistributedFileSystem for federated HDFS
                 Key: HDFS-16159
                 URL: https://issues.apache.org/jira/browse/HDFS-16159
             Project: Hadoop HDFS
          Issue Type: Task
          Components: federation
    Affects Versions: 3.3.1
            Reporter: Aihua Xu
            Assignee: Aihua Xu

DistributedFileSystem needs to support resolvePath(), similar to ViewFileSystem, since DistributedFileSystem can be used to talk to a Router FileSystem. Clients like Hive need this functionality to determine the physical clusters, so they can choose between copying and moving the data when src and dest are on different physical clusters, even though they are on the same router file system. See HIVE-24742.
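Conceptually, resolving a router path to its physical cluster is a longest-prefix match against the mount table, which is what a resolvePath() would expose to callers like Hive. A minimal self-contained sketch (MountResolver and the mount entries are made up for illustration; this is not the DistributedFileSystem API):

```java
import java.util.HashMap;
import java.util.Map;

// Toy mount-table resolver: the longest mount-point prefix that matches
// the path wins, mirroring how a federated namespace maps paths to
// physical clusters.
public class MountResolver {
  // mount point -> physical cluster URI (entries are illustrative)
  private final Map<String, String> mounts = new HashMap<>();

  public void addMount(String mountPoint, String target) {
    mounts.put(mountPoint, target);
  }

  // Returns the target cluster for the longest matching mount point,
  // or null if no mount point covers the path.
  public String resolve(String path) {
    String bestMount = null;
    for (String mount : mounts.keySet()) {
      if (path.startsWith(mount)
          && (bestMount == null || mount.length() > bestMount.length())) {
        bestMount = mount;
      }
    }
    return bestMount == null ? null : mounts.get(bestMount);
  }
}
```

A client such as Hive would compare the resolved clusters of src and dest: same cluster means a cheap rename/move, different clusters mean a copy is required.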
[jira] [Commented] (HDFS-15905) Improve Router performance with router redirection
[ https://issues.apache.org/jira/browse/HDFS-15905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17304294#comment-17304294 ]

Aihua Xu commented on HDFS-15905:
---------------------------------

[~elgoiri], [~jingzhao], [~fengnanli] Can you provide any feedback/suggestions? Thanks a lot.
[jira] [Created] (HDFS-15905) Improve Router performance with router redirection
Aihua Xu created HDFS-15905:
-------------------------------

             Summary: Improve Router performance with router redirection
                 Key: HDFS-15905
                 URL: https://issues.apache.org/jira/browse/HDFS-15905
             Project: Hadoop HDFS
          Issue Type: New Feature
          Components: rbf
    Affects Versions: 3.1.0
            Reporter: Aihua Xu
            Assignee: Aihua Xu

The Router implementation currently takes a proxy approach to handle client requests: the routers receive requests from the clients and send them to the target clusters on behalf of the clients.

This approach works well, but after moving more clusters on top of routers, we are seeing that the routers become the bottleneck: without RBF, the clients manage their own connections, while with RBF, a limited number of routers manage many more connections on behalf of the clients; we also keep idle connections to boost connection performance. We have done some work to tune connection management, but it doesn't help much.

We are proposing to reduce the functionality on the router side and use routers as actual routers instead of proxies: the clients talk to routers to resolve target cluster info for a given path and obtain a router delegation token, then send requests directly to the target cluster.

A big challenge here is token authentication against the target cluster with only a router token. One approach: the router returns a target cluster token along with the router token, so the clients can authenticate against the target cluster. A second approach: similar to the block token mechanism, the router exchanges secret keys with the target clusters through heartbeats, so the clients can authenticate with the target cluster using the router token.

I would like to know your feedback.
[jira] [Updated] (HDFS-15800) DataNode to handle NameNode IP changes
[ https://issues.apache.org/jira/browse/HDFS-15800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aihua Xu updated HDFS-15800:
----------------------------
    Assignee: Aihua Xu
      Status: Patch Available  (was: Open)

Simple change to update remoteId.address directly instead of the local variable, so connection re-creation will get the new IP address from remoteId.address.
[jira] [Updated] (HDFS-15800) DataNode to handle NameNode IP changes
[ https://issues.apache.org/jira/browse/HDFS-15800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aihua Xu updated HDFS-15800:
----------------------------
    Attachment: HDFS-15800.patch
[jira] [Created] (HDFS-15800) DataNode to handle NameNode IP changes
Aihua Xu created HDFS-15800:
-------------------------------

             Summary: DataNode to handle NameNode IP changes
                 Key: HDFS-15800
                 URL: https://issues.apache.org/jira/browse/HDFS-15800
             Project: Hadoop HDFS
          Issue Type: Improvement
          Components: datanode
    Affects Versions: 2.8.0
            Reporter: Aihua Xu

HADOOP-17068 handles the case of NameNode IP address changes, in which the HDFS client updates the IP address after a connection failure. DataNodes use the same logic to refresh the IP address for the connection. Such a connection is reused with a default idle time of 10 seconds (set by ipc.client.connection.maxidletime). If the connection is closed, the DataNode will use the old NameNode IP address to connect, and only refresh to the new IP address after the first failure.

The problem with the refresh logic in org.apache.hadoop.ipc.Client is that the refreshed server value is not reflected in remoteId.address, while the next connection creation uses remoteId.address:

{code:java}
if (!server.equals(currentAddr)) {
  LOG.warn("Address change detected. Old: " + server.toString() +
      " New: " + currentAddr.toString());
  server = currentAddr;
}
{code}

Such a retry in a big cluster causes random "BLOCK* blk_16987635027_18010098516 is COMMITTED but not COMPLETE(numNodes= 0 < minimum = 1) in file" errors if all three replicas take one retry to read/write the block.
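The stale-address pattern described above can be reproduced in a self-contained toy model. The names (ConnectionId, reconnectBuggy, reconnectFixed) are illustrative, not the actual org.apache.hadoop.ipc.Client fields: the point is that refreshing only a local copy of the server address does not help when reconnection reads the address from the shared connection id.

```java
import java.util.Objects;

// Toy reproduction of the bug: reconnection uses the address stored on the
// shared connection id, so updating a local variable is not enough.
public class AddressRefresh {
  static class ConnectionId {
    String address;          // what connection re-creation actually reads
    ConnectionId(String a) { this.address = a; }
  }

  // Buggy variant: only the local variable is updated; the assignment is
  // lost as soon as this method returns.
  static String reconnectBuggy(ConnectionId remoteId, String resolvedAddr) {
    String server = remoteId.address;
    if (!Objects.equals(server, resolvedAddr)) {
      server = resolvedAddr;   // never propagated to remoteId
    }
    return remoteId.address;   // still the old IP
  }

  // Fixed variant: the shared connection id is updated directly, so the
  // next connection creation sees the new IP.
  static String reconnectFixed(ConnectionId remoteId, String resolvedAddr) {
    if (!Objects.equals(remoteId.address, resolvedAddr)) {
      remoteId.address = resolvedAddr;
    }
    return remoteId.address;
  }
}
```

This mirrors the one-line nature of the attached patch: update the field on the connection id rather than a local copy.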
[jira] [Updated] (HDFS-15727) RpcQueueTimeAvgTime of the NameNode increases after it becomes StandBy
[ https://issues.apache.org/jira/browse/HDFS-15727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aihua Xu updated HDFS-15727:
----------------------------
    Description:
RpcQueueTimeAvgTime of the NameNode on port 8020 (the client RPC calls) increases after it becomes standby. It gets resolved after the NameNode is restarted. Something seems incorrect about this metric.

See the following graph: the NameNode becomes standby at 10:13, yet RpcQueueTimeAvgTime increases instead.

!image-2020-12-10-13-30-44-288.png!

  was:
RpcQueueTimeAvgTime of the NameNode increases after it becomes standby. It gets resolved after the NameNode is restarted. Something seems incorrect about this metric.

See the following graph: the NameNode becomes standby at 10:13, yet RpcQueueTimeAvgTime increases instead.

!image-2020-12-10-13-30-44-288.png!
[jira] [Commented] (HDFS-15727) RpcQueueTimeAvgTime of the NameNode increases after it becomes StandBy
[ https://issues.apache.org/jira/browse/HDFS-15727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17248080#comment-17248080 ]

Aihua Xu commented on HDFS-15727:
---------------------------------

[~kihwal] It doesn't seem to cause functional issues, just wrong metrics. BTW: this metric is actually for port 8020 (the client RPC calls), not for port 8022 (RPC calls from internal communication).
[jira] [Created] (HDFS-15727) RpcQueueTimeAvgTime of the NameNode increases after it becomes StandBy
Aihua Xu created HDFS-15727:
-------------------------------

             Summary: RpcQueueTimeAvgTime of the NameNode increases after it becomes StandBy
                 Key: HDFS-15727
                 URL: https://issues.apache.org/jira/browse/HDFS-15727
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: hdfs
    Affects Versions: 2.8.2
            Reporter: Aihua Xu
         Attachments: image-2020-12-10-13-30-44-288.png

RpcQueueTimeAvgTime of the NameNode increases after it becomes standby. It gets resolved after the NameNode is restarted. Something seems incorrect about this metric.

See the following graph: the NameNode becomes standby at 10:13, yet RpcQueueTimeAvgTime increases instead.

!image-2020-12-10-13-30-44-288.png!
[jira] [Commented] (HDFS-15562) StandbyCheckpointer will do checkpoint repeatedly while connecting observer/active namenode failed
[ https://issues.apache.org/jira/browse/HDFS-15562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17233178#comment-17233178 ]

Aihua Xu commented on HDFS-15562:
---------------------------------

Thanks [~shv] for your comment. When I get time, I will focus on not recreating the image if there is a recent one.

> StandbyCheckpointer will do checkpoint repeatedly while connecting
> observer/active namenode failed
> --------------------------------------------------------------------------------
>
>                 Key: HDFS-15562
>                 URL: https://issues.apache.org/jira/browse/HDFS-15562
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: SunHao
>            Assignee: Aihua Xu
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: HDFS-15562.patch
>
> We find that the standby namenode will checkpoint over and over while it
> fails to connect to the observer/active namenode.
> StandbyCheckpointer won't update "lastCheckpointTime" when uploading the new
> fsimage to the other namenode fails, so the standby namenode will keep doing
> checkpoints repeatedly.
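The retry loop described above can be sketched with a toy model. Names here (CheckpointLoop, runCycle, imagesBuilt) are illustrative, not StandbyCheckpointer's actual code: because lastCheckpointTime only advances on a successful upload, a transient upload failure re-triggers an expensive full image save every cycle, which is why the fix direction is to avoid rebuilding the image when a recent one already exists.

```java
// Toy model of the repeated-checkpoint behavior: the timestamp is advanced
// only when the upload succeeds, so each failed upload causes the next
// cycle to rebuild the fsimage from scratch.
public class CheckpointLoop {
  long lastCheckpointTime = 0;
  int imagesBuilt = 0;

  void runCycle(long now, long periodMs, boolean uploadSucceeds) {
    if (now - lastCheckpointTime >= periodMs) {
      imagesBuilt++;               // expensive: full fsimage save
      if (uploadSucceeds) {
        lastCheckpointTime = now;  // only advanced on success -> retry loop
      }
    }
  }
}
```

In this model, two consecutive failed uploads produce two full image builds back to back; once an upload succeeds, no new image is built until the checkpoint period elapses again.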
[jira] [Commented] (HDFS-15467) ObserverReadProxyProvider should skip logging first failover from each proxy
[ https://issues.apache.org/jira/browse/HDFS-15467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17229783#comment-17229783 ]

Aihua Xu commented on HDFS-15467:
---------------------------------

[~csun] It doesn't fail, but it prints the info-level failover message above. So it was designed to use the upper-level retry logic for msync()? I feel FailoverProxyProvider should have its own retry logic to find the active NameNode, while ObserverReadProxyProvider has its own retry logic to find the right observer NameNode.

> ObserverReadProxyProvider should skip logging first failover from each proxy
> --------------------------------------------------------------------------------
>
>                 Key: HDFS-15467
>                 URL: https://issues.apache.org/jira/browse/HDFS-15467
>             Project: Hadoop HDFS
>          Issue Type: Task
>            Reporter: Hanisha Koneru
>            Assignee: Aihua Xu
>            Priority: Major
>
> After HADOOP-17116, {{RetryInvocationHandler}} skips logging the first
> failover INFO message from each proxy. But {{ObserverReadProxyProvider}} uses
> a {{combinedProxy}} object which combines all proxies into one and assigns
> {{combinedInfo}} as the ProxyInfo.
> {noformat}
> ObserverReadProxyProvider# Lines 197-207:
> for (int i = 0; i < nameNodeProxies.size(); i++) {
>   if (i > 0) {
>     combinedInfo.append(",");
>   }
>   combinedInfo.append(nameNodeProxies.get(i).proxyInfo);
> }
> combinedInfo.append(']');
> T wrappedProxy = (T) Proxy.newProxyInstance(
>     ObserverReadInvocationHandler.class.getClassLoader(),
>     new Class[] {xface}, new ObserverReadInvocationHandler());
> combinedProxy = new ProxyInfo<>(wrappedProxy, combinedInfo.toString());
> {noformat}
> {{RetryInvocationHandler}} depends on the {{ProxyInfo}} to differentiate
> between proxies when checking whether a failover from that proxy happened
> before. Since the combined proxy has only one proxy entry, HADOOP-17116
> doesn't work on {{ObserverReadProxyProvider}}. It would need to be handled
> separately.
[jira] [Comment Edited] (HDFS-15467) ObserverReadProxyProvider should skip logging first failover from each proxy
[ https://issues.apache.org/jira/browse/HDFS-15467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17228899#comment-17228899 ]

Aihua Xu edited comment on HDFS-15467 at 11/10/20, 1:33 AM:
------------------------------------------------------------

[~csun] In ObserverReadProxyProvider, the FailoverProxyProvider (failoverProxy) used for active/standby NameNode failover doesn't seem to have retry logic. When msync() is called against failoverProxy, it can fail when it reaches out to a standby NameNode, and the exception is thrown up to the retry logic of ObserverReadProxyProvider to handle (see the stack trace below). Is this by design? Logically it seems FailoverProxyProvider should also have retry around it as well, like:

{code:java}
DfsClientConf config = new DfsClientConf(conf);
ClientProtocol proxy = (ClientProtocol) RetryProxy.create(xface,
    failoverProxyProvider,
    RetryPolicies.failoverOnNetworkException(
        RetryPolicies.TRY_ONCE_THEN_FAIL, config.getMaxFailoverAttempts(),
        config.getMaxRetryAttempts(), config.getFailoverSleepBaseMillis(),
        config.getFailoverSleepMaxMillis()));
{code}

{quote}
20/10/29 04:22:33 INFO retry.RetryInvocationHandler: Exception while invoking $Proxy5.getFileInfo over [hadoopetanamenode01-dca1.prod.uber.internal/10.22.3.137:8020,hadoopetanamenode02-dca1.prod.uber.internal/10.18.6.167:8020,hadoopetaobserver01-dca1.prod.uber.internal/10.14.137.154:8020] after 1 failover attempts. Trying to failover after sleeping for 693ms.
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category WRITE is not supported in state standby. Visit [http://t.uber.com/hdfs_faq]
    at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:108)
    at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:1942)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1387)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.msync(NameNodeRpcServer.java:1318)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.msync(ClientNamenodeProtocolServerSideTranslatorPB.java:1617)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:508)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1034)
    at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:930)
    at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:865)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2726)
    at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1524)
    at org.apache.hadoop.ipc.Client.call(Client.java:1470)
    at org.apache.hadoop.ipc.Client.call(Client.java:1369)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:231)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:117)
    at com.sun.proxy.$Proxy15.msync(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.msync(ClientNamenodeProtocolTranslatorPB.java:1634)
    at org.apache.hadoop.hdfs.server.namenode.ha.ObserverReadProxyProvider.initializeMsync(ObserverReadProxyProvider.java:350)
    at org.apache.hadoop.hdfs.server.namenode.ha.ObserverReadProxyProvider.access$600(ObserverReadProxyProvider.java:69)
    at org.apache.hadoop.hdfs.server.namenode.ha.ObserverReadProxyProvider$ObserverReadInvocationHandler.invoke(ObserverReadProxyProvider.java:427)
    at com.sun.proxy.$Proxy5.getFileInfo(Unknown Source)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:409)
    at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:163)
    at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:155)
    at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:346)
    at com.sun.proxy.$Proxy5.getFileInfo(Unknown Source)
    at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1700)
    at org.apache.hadoop.hdfs.DistributedFileSystem$27.doCall(DistributedFileSystem.java:1439)
    at
{quote}
[jira] [Comment Edited] (HDFS-15467) ObserverReadProxyProvider should skip logging first failover from each proxy
[ https://issues.apache.org/jira/browse/HDFS-15467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17228899#comment-17228899 ] Aihua Xu edited comment on HDFS-15467 at 11/10/20, 1:33 AM: [~csun] In ObserverReadProxyProvider, the FailoverProxyProvider (failoverProxy) for active/standby namenode failover doesn't seem to have retry logic. When msync() is called against failoverProxy, it could fail when it's reaching out a standby namenode. The exception is thrown to the retry logic of ObserverReadProxyProvider to handle (see the stack trace below). Is this by design? Logically seems FailoverProxyProvider should also have retry around it as well like: DfsClientConf config = new DfsClientConf(conf); {{ClientProtocol proxy = (ClientProtocol) RetryProxy.create(xface,}} {{failoverProxyProvider,}} {{RetryPolicies.failoverOnNetworkException(}} {{RetryPolicies.TRY_ONCE_THEN_FAIL, config.getMaxFailoverAttempts(),}} {{config.getMaxRetryAttempts(), config.getFailoverSleepBaseMillis(),}} {{config.getFailoverSleepMaxMillis()));}} {quote}20/10/29 04:22:33 INFO retry.RetryInvocationHandler: Exception while invoking $Proxy5.getFileInfo over [hadoopetanamenode01-dca1.prod.uber.internal/10.22.3.137:8020,hadoopetanamenode02-dca1.prod.uber.internal/10.18.6.167:8020,hadoopetaobserver01-dca1.prod.uber.internal/10.14.137.154:8020] after 1 failover attempts. Trying to failover after sleeping for 693ms. org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category WRITE is not supported in state standby. 
Visit [http://t.uber.com/hdfs_faq] at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:108) at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:1942) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1387) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.msync(NameNodeRpcServer.java:1318) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.msync(ClientNamenodeProtocolServerSideTranslatorPB.java:1617) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:508) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1034) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:930) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:865) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2726) at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1524) at org.apache.hadoop.ipc.Client.call(Client.java:1470) at org.apache.hadoop.ipc.Client.call(Client.java:1369) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:231) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:117) at com.sun.proxy.$Proxy15.msync(Unknown Source) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.msync(ClientNamenodeProtocolTranslatorPB.java:1634) at org.apache.hadoop.hdfs.server.namenode.ha.ObserverReadProxyProvider.initializeMsync(ObserverReadProxyProvider.java:350) at 
org.apache.hadoop.hdfs.server.namenode.ha.ObserverReadProxyProvider.access$600(ObserverReadProxyProvider.java:69) at org.apache.hadoop.hdfs.server.namenode.ha.ObserverReadProxyProvider$ObserverReadInvocationHandler.invoke(ObserverReadProxyProvider.java:427) at com.sun.proxy.$Proxy5.getFileInfo(Unknown Source) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:409) at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:163) at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:155) at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:346) at com.sun.proxy.$Proxy5.getFileInfo(Unknown Source) at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1700) at org.apache.hadoop.hdfs.DistributedFileSystem$27.doCall(DistributedFileSystem.java:1439) at
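The retry wrapping suggested in the comment above can be illustrated with plain JDK dynamic proxies. This is a simplified, self-contained sketch of the idea behind RetryProxy.create(), not Hadoop's actual implementation; the FlakyService interface and its fail-once behavior are made up for illustration:

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

public class RetryProxySketch {
    interface FlakyService {
        String msync();
    }

    // A service that fails once (like a call landing on a standby NameNode)
    // and then succeeds.
    static class FailOnceService implements FlakyService {
        private int calls = 0;
        public String msync() {
            if (calls++ == 0) {
                throw new IllegalStateException(
                    "Operation category WRITE is not supported in state standby");
            }
            return "ok";
        }
    }

    // Wrap the target so each call is retried up to maxAttempts times,
    // mimicking what a RetryProxy with a failover policy does.
    static FlakyService withRetry(FlakyService target, int maxAttempts) {
        InvocationHandler handler = (proxyObj, method, methodArgs) -> {
            Throwable last = null;
            for (int i = 0; i < maxAttempts; i++) {
                try {
                    return method.invoke(target, methodArgs);
                } catch (Exception e) {
                    // Unwrap InvocationTargetException to the real cause.
                    last = (e.getCause() != null) ? e.getCause() : e;
                }
            }
            throw last;
        };
        return (FlakyService) Proxy.newProxyInstance(
            FlakyService.class.getClassLoader(),
            new Class<?>[] {FlakyService.class}, handler);
    }

    public static void main(String[] args) {
        FlakyService proxy = withRetry(new FailOnceService(), 2);
        System.out.println(proxy.msync()); // prints ok
    }
}
```

Here the first msync() call fails the way a standby NameNode would reject it, and the wrapper retries transparently instead of surfacing the exception to the outer ObserverReadProxyProvider retry logic.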
[jira] [Commented] (HDFS-15597) ContentSummary.getSpaceConsumed does not consider replication
[ https://issues.apache.org/jira/browse/HDFS-15597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17227882#comment-17227882 ] Aihua Xu commented on HDFS-15597: - Thanks for reviewing [~weichiu] and [~ayushtkn]. Let me take a look to address the comments. > ContentSummary.getSpaceConsumed does not consider replication > - > > Key: HDFS-15597 > URL: https://issues.apache.org/jira/browse/HDFS-15597 > Project: Hadoop HDFS > Issue Type: Bug > Components: dfs >Affects Versions: 2.6.0 >Reporter: Ajmal Ahammed >Assignee: Aihua Xu >Priority: Minor > Attachments: HDFS-15597.patch > > > I am trying to get the disk space consumed by an HDFS directory using the > {{ContentSummary.getSpaceConsumed}} method. I can't get the space consumption > correctly considering the replication factor: the replication factor is 2, > and I was expecting twice the actual file size from the above method. > {code} > ubuntu@ubuntu:~/ht$ sudo -u hdfs hdfs dfs -ls /var/lib/ubuntu > Found 2 items > -rw-r--r-- 2 ubuntu ubuntu 3145728 2020-09-08 09:55 > /var/lib/ubuntu/size-test > drwxrwxr-x - ubuntu ubuntu 0 2020-09-07 06:37 /var/lib/ubuntu/test > {code} > But when I run the following code, > {code} > String path = "/etc/hadoop/conf/"; > conf.addResource(new Path(path + "core-site.xml")); > conf.addResource(new Path(path + "hdfs-site.xml")); > long size = > FileContext.getFileContext(conf).util().getContentSummary(fileStatus).getSpaceConsumed(); > System.out.println("Replication : " + fileStatus.getReplication()); > System.out.println("File size : " + size); > {code} > The output is > {code} > Replication : 0 > File size : 3145728 > {code} > Both the file size and the replication factor seem to be incorrect. 
> /etc/hadoop/conf/hdfs-site.xml contains the following config: > {code} > <property> >   <name>dfs.replication</name> >   <value>2</value> > </property> > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
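For reference, the accounting the reporter expects from getSpaceConsumed() is simply the file length times its replication factor. A minimal sketch of that expectation (the spaceConsumed helper below is hypothetical, not the actual ContentSummary code):

```java
public class SpaceConsumedSketch {
    // Raw disk usage of a file is its logical length times its replication
    // factor, which is what getSpaceConsumed() is expected to report.
    static long spaceConsumed(long fileLength, short replication) {
        return fileLength * replication;
    }

    public static void main(String[] args) {
        // The 3145728-byte file from the listing above, with replication 2,
        // should consume 6291456 bytes of raw disk across the cluster.
        System.out.println(spaceConsumed(3145728L, (short) 2)); // prints 6291456
    }
}
```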
[jira] [Commented] (HDFS-15467) ObserverReadProxyProvider should skip logging first failover from each proxy
[ https://issues.apache.org/jira/browse/HDFS-15467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17225120#comment-17225120 ] Aihua Xu commented on HDFS-15467: - I will take a look. > ObserverReadProxyProvider should skip logging first failover from each proxy > > > Key: HDFS-15467 > URL: https://issues.apache.org/jira/browse/HDFS-15467 > Project: Hadoop HDFS > Issue Type: Task >Reporter: Hanisha Koneru >Assignee: Aihua Xu >Priority: Major > > After HADOOP-17116, {{RetryInvocationHandler}} skips logging the first > failover INFO message from each proxy. But {{ObserverReadProxyProvider}} uses > a {{combinedProxy}} object which combines all proxies into one and assigns > {{combinedInfo}} as the ProxyInfo. > {noformat} > ObserverReadProxyProvider# Lines 197-207: > for (int i = 0; i < nameNodeProxies.size(); i++) { > if (i > 0) { > combinedInfo.append(","); > } > combinedInfo.append(nameNodeProxies.get(i).proxyInfo); > } > combinedInfo.append(']'); > T wrappedProxy = (T) Proxy.newProxyInstance( > ObserverReadInvocationHandler.class.getClassLoader(), > new Class[] {xface}, new ObserverReadInvocationHandler()); > combinedProxy = new ProxyInfo<>(wrappedProxy, > combinedInfo.toString()){noformat} > {{RetryInvocationHandler}} depends on the {{ProxyInfo}} to differentiate > between proxies while checking if failover from that proxy happened before. > And since the combined proxy has only one proxy, HADOOP-17116 doesn't work on > {{ObserverReadProxyProvider}}. It would need to be handled separately. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDFS-15467) ObserverReadProxyProvider should skip logging first failover from each proxy
[ https://issues.apache.org/jira/browse/HDFS-15467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu reassigned HDFS-15467: --- Assignee: Aihua Xu > ObserverReadProxyProvider should skip logging first failover from each proxy > > > Key: HDFS-15467 > URL: https://issues.apache.org/jira/browse/HDFS-15467 > Project: Hadoop HDFS > Issue Type: Task >Reporter: Hanisha Koneru >Assignee: Aihua Xu >Priority: Major > > After HADOOP-17116, {{RetryInvocationHandler}} skips logging the first > failover INFO message from each proxy. But {{ObserverReadProxyProvider}} uses > a {{combinedProxy}} object which combines all proxies into one and assigns > {{combinedInfo}} as the ProxyInfo. > {noformat} > ObserverReadProxyProvider# Lines 197-207: > for (int i = 0; i < nameNodeProxies.size(); i++) { > if (i > 0) { > combinedInfo.append(","); > } > combinedInfo.append(nameNodeProxies.get(i).proxyInfo); > } > combinedInfo.append(']'); > T wrappedProxy = (T) Proxy.newProxyInstance( > ObserverReadInvocationHandler.class.getClassLoader(), > new Class[] {xface}, new ObserverReadInvocationHandler()); > combinedProxy = new ProxyInfo<>(wrappedProxy, > combinedInfo.toString()){noformat} > {{RetryInvocationHandler}} depends on the {{ProxyInfo}} to differentiate > between proxies while checking if failover from that proxy happened before. > And since the combined proxy has only one proxy, HADOOP-17116 doesn't work on > {{ObserverReadProxyProvider}}. It would need to be handled separately. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Resolved] (HDFS-15664) Prevent Observer NameNode from becoming StandBy NameNode
[ https://issues.apache.org/jira/browse/HDFS-15664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu resolved HDFS-15664. - Resolution: Duplicate > Prevent Observer NameNode from becoming StandBy NameNode > > > Key: HDFS-15664 > URL: https://issues.apache.org/jira/browse/HDFS-15664 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: auto-failover >Affects Versions: 2.10.0 >Reporter: Aihua Xu >Assignee: Aihua Xu >Priority: Major > > When the cluster performs a failover from NN1 to NN2, NN2 asks all the > other NNs, including the Observer NameNodes, to cede active state and transition to > Standby. > It seems we should block an Observer from becoming Standby and participating in > failover. Of course, since we can transition a Standby NameNode to Observer, we > can separately support promoting an Observer NameNode to Standby NameNode. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15664) Prevent Observer NameNode from becoming StandBy NameNode
[ https://issues.apache.org/jira/browse/HDFS-15664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17224985#comment-17224985 ] Aihua Xu commented on HDFS-15664: - Didn't know it was fixed in another place. Thanks [~csun] > Prevent Observer NameNode from becoming StandBy NameNode > > > Key: HDFS-15664 > URL: https://issues.apache.org/jira/browse/HDFS-15664 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: auto-failover >Affects Versions: 2.10.0 >Reporter: Aihua Xu >Assignee: Aihua Xu >Priority: Major > > When the cluster performs a failover from NN1 to NN2, NN2 asks all the > other NNs, including the Observer NameNodes, to cede active state and transition to > Standby. > It seems we should block an Observer from becoming Standby and participating in > failover. Of course, since we can transition a Standby NameNode to Observer, we > can separately support promoting an Observer NameNode to Standby NameNode. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15664) Prevent Observer NameNode from becoming StandBy NameNode
[ https://issues.apache.org/jira/browse/HDFS-15664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17224828#comment-17224828 ] Aihua Xu commented on HDFS-15664: - + [~sunchao] Any comments on this? > Prevent Observer NameNode from becoming StandBy NameNode > > > Key: HDFS-15664 > URL: https://issues.apache.org/jira/browse/HDFS-15664 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: auto-failover >Affects Versions: 2.10.0 >Reporter: Aihua Xu >Assignee: Aihua Xu >Priority: Major > > When the cluster performs a failover from NN1 to NN2, NN2 asks all the > other NNs, including the Observer NameNodes, to cede active state and transition to > Standby. > It seems we should block an Observer from becoming Standby and participating in > failover. Of course, since we can transition a Standby NameNode to Observer, we > can separately support promoting an Observer NameNode to Standby NameNode. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-15664) Prevent Observer NameNode from becoming StandBy NameNode
Aihua Xu created HDFS-15664: --- Summary: Prevent Observer NameNode from becoming StandBy NameNode Key: HDFS-15664 URL: https://issues.apache.org/jira/browse/HDFS-15664 Project: Hadoop HDFS Issue Type: Sub-task Components: auto-failover Affects Versions: 2.10.0 Reporter: Aihua Xu Assignee: Aihua Xu When the cluster performs a failover from NN1 to NN2, NN2 asks all the other NNs, including the Observer NameNodes, to cede active state and transition to Standby. It seems we should block an Observer from becoming Standby and participating in failover. Of course, since we can transition a Standby NameNode to Observer, we can separately support promoting an Observer NameNode to Standby NameNode. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15562) StandbyCheckpointer will do checkpoint repeatedly while connecting observer/active namenode failed
[ https://issues.apache.org/jira/browse/HDFS-15562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17223287#comment-17223287 ] Aihua Xu commented on HDFS-15562: - [~weichiu], [~csun] Can you help review the change? Thanks. > StandbyCheckpointer will do checkpoint repeatedly while connecting > observer/active namenode failed > -- > > Key: HDFS-15562 > URL: https://issues.apache.org/jira/browse/HDFS-15562 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: SunHao >Assignee: Aihua Xu >Priority: Major > Attachments: HDFS-15562.patch > > > We find that the standby NameNode does checkpoints over and over when > connecting to the observer/active NameNode fails. > StandbyCheckpointer won't update “lastCheckpointTime” when uploading a new fsimage > to the other NameNode fails, so the standby NameNode keeps doing > checkpoints repeatedly. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15562) StandbyCheckpointer will do checkpoint repeatedly while connecting observer/active namenode failed
[ https://issues.apache.org/jira/browse/HDFS-15562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated HDFS-15562: Status: Patch Available (was: Open) patch-1: The standby NameNode does the checkpoint and uploads the image to the active and Observer NameNodes. Currently, if any remote NameNode is down and the upload fails, the standby NameNode will immediately do the checkpoint again and retry the upload. With multiple Observer NameNodes, it's not required that all the Observers are running. The patch throws an exception for failures of the checkpoint itself but not for upload failures. > StandbyCheckpointer will do checkpoint repeatedly while connecting > observer/active namenode failed > -- > > Key: HDFS-15562 > URL: https://issues.apache.org/jira/browse/HDFS-15562 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: SunHao >Assignee: Aihua Xu >Priority: Major > Attachments: HDFS-15562.patch > > > We find that the standby NameNode does checkpoints over and over when > connecting to the observer/active NameNode fails. > StandbyCheckpointer won't update “lastCheckpointTime” when uploading a new fsimage > to the other NameNode fails, so the standby NameNode keeps doing > checkpoints repeatedly. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
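The behavior described in the patch note can be sketched as a simplified simulation (the class and method names below are hypothetical, not the actual StandbyCheckpointer code):

```java
public class CheckpointLoopSketch {
    long lastCheckpointTime = 0;

    // In this sketch the local checkpoint always succeeds.
    void doCheckpoint() { /* write fsimage locally */ }

    // Upload can fail when a remote (active/observer) NameNode is down.
    boolean uploadImage(boolean remoteUp) { return remoteUp; }

    // One iteration of the checkpointer: lastCheckpointTime advances once
    // the checkpoint itself succeeds, even if the upload fails, so a downed
    // remote NameNode does not retrigger an immediate re-checkpoint.
    void runOnce(long now, boolean remoteUp) {
        doCheckpoint();
        boolean uploaded = uploadImage(remoteUp);
        lastCheckpointTime = now; // updated regardless of upload outcome
        if (!uploaded) {
            System.out.println("upload failed; will retry at next interval");
        }
    }

    public static void main(String[] args) {
        CheckpointLoopSketch c = new CheckpointLoopSketch();
        c.runOnce(1000L, false); // remote NameNode is down
        System.out.println(c.lastCheckpointTime); // prints 1000
    }
}
```

With the pre-patch behavior, lastCheckpointTime would stay unchanged after a failed upload, so the next loop iteration would immediately checkpoint again.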
[jira] [Updated] (HDFS-15562) StandbyCheckpointer will do checkpoint repeatedly while connecting observer/active namenode failed
[ https://issues.apache.org/jira/browse/HDFS-15562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated HDFS-15562: Attachment: HDFS-15562.patch > StandbyCheckpointer will do checkpoint repeatedly while connecting > observer/active namenode failed > -- > > Key: HDFS-15562 > URL: https://issues.apache.org/jira/browse/HDFS-15562 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: SunHao >Assignee: Aihua Xu >Priority: Major > Attachments: HDFS-15562.patch > > > We find the standby namenode will do checkpoint over and over while > connecting observer/active namenode failed. > StandbyCheckpointer won't update “lastCheckpointTime” when upload new fsimage > to the other namenode failed, so that the standby namenode will keep doing > checkpoint repeatedly. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14327) Using FQDN instead of IP to access servers with DNS resolving
[ https://issues.apache.org/jira/browse/HDFS-14327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated HDFS-14327: Description: *strong text*With [HDFS-14118|https://issues.apache.org/jira/browse/HDFS-14118], clients can get the IP of the servers (NN/Routers) and use the IP addresses to access the machine. This will fail in secure environment as Kerberos is using the domain name (FQDN) in the principal so it won't recognize the IP addresses. This task is mainly adding a reverse look up on the current basis and get the domain name after the IP is fetched. After that clients will still use the domain name to access the servers. was: With [HDFS-14118|https://issues.apache.org/jira/browse/HDFS-14118], clients can get the IP of the servers (NN/Routers) and use the IP addresses to access the machine. This will fail in secure environment as Kerberos is using the domain name (FQDN) in the principal so it won't recognize the IP addresses. This task is mainly adding a reverse look up on the current basis and get the domain name after the IP is fetched. After that clients will still use the domain name to access the servers. > Using FQDN instead of IP to access servers with DNS resolving > - > > Key: HDFS-14327 > URL: https://issues.apache.org/jira/browse/HDFS-14327 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Fengnan Li >Assignee: Fengnan Li >Priority: Major > Fix For: 3.3.0 > > Attachments: HDFS-14327.001.patch, HDFS-14327.002.patch > > > *strong text*With > [HDFS-14118|https://issues.apache.org/jira/browse/HDFS-14118], clients can > get the IP of the servers (NN/Routers) and use the IP addresses to access the > machine. This will fail in secure environment as Kerberos is using the domain > name (FQDN) in the principal so it won't recognize the IP addresses. > This task is mainly adding a reverse look up on the current basis and get the > domain name after the IP is fetched. 
After that clients will still use the > domain name to access the servers. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14327) Using FQDN instead of IP to access servers with DNS resolving
[ https://issues.apache.org/jira/browse/HDFS-14327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated HDFS-14327: Description: With [HDFS-14118|https://issues.apache.org/jira/browse/HDFS-14118], clients can get the IP of the servers (NN/Routers) and use the IP addresses to access the machine. This will fail in secure environment as Kerberos is using the domain name (FQDN) in the principal so it won't recognize the IP addresses. This task is mainly adding a reverse look up on the current basis and get the domain name after the IP is fetched. After that clients will still use the domain name to access the servers. was: *strong text*With [HDFS-14118|https://issues.apache.org/jira/browse/HDFS-14118], clients can get the IP of the servers (NN/Routers) and use the IP addresses to access the machine. This will fail in secure environment as Kerberos is using the domain name (FQDN) in the principal so it won't recognize the IP addresses. This task is mainly adding a reverse look up on the current basis and get the domain name after the IP is fetched. After that clients will still use the domain name to access the servers. > Using FQDN instead of IP to access servers with DNS resolving > - > > Key: HDFS-14327 > URL: https://issues.apache.org/jira/browse/HDFS-14327 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Fengnan Li >Assignee: Fengnan Li >Priority: Major > Fix For: 3.3.0 > > Attachments: HDFS-14327.001.patch, HDFS-14327.002.patch > > > With [HDFS-14118|https://issues.apache.org/jira/browse/HDFS-14118], clients > can get the IP of the servers (NN/Routers) and use the IP addresses to access > the machine. This will fail in secure environment as Kerberos is using the > domain name (FQDN) in the principal so it won't recognize the IP addresses. > This task is mainly adding a reverse look up on the current basis and get the > domain name after the IP is fetched. 
After that clients will still use the > domain name to access the servers. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
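The reverse lookup this task adds can be sketched with the standard java.net API (a minimal illustration under the assumption that a plain InetAddress reverse lookup suffices; the real HDFS resolver code may differ):

```java
import java.net.InetAddress;

public class ReverseLookupSketch {
    // Given an IP string (as returned by DNS-based NN/Router discovery),
    // resolve it back to a host name so Kerberos principals (which use the
    // FQDN) can still match. Falls back to the IP if the lookup fails.
    static String toHostName(String ip) {
        try {
            return InetAddress.getByName(ip).getCanonicalHostName();
        } catch (Exception e) {
            return ip;
        }
    }

    public static void main(String[] args) {
        // The loopback address typically resolves to "localhost", though the
        // result depends on the local resolver configuration.
        System.out.println(toHostName("127.0.0.1"));
    }
}
```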
[jira] [Commented] (HDFS-15601) Batch listing: gracefully fallback to use non-batched listing when NameNode doesn't support the feature
[ https://issues.apache.org/jira/browse/HDFS-15601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17211161#comment-17211161 ] Aihua Xu commented on HDFS-15601: - It seems I won't have time to work on this. Assigning it back. > Batch listing: gracefully fallback to use non-batched listing when NameNode > doesn't support the feature > --- > > Key: HDFS-15601 > URL: https://issues.apache.org/jira/browse/HDFS-15601 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs >Reporter: Chao Sun >Priority: Major > > HDFS-13616 requires both server and client side changes. However, it is common > for users to use a newer client to talk to an older HDFS (say 2.10). Currently the > client will simply fail in this scenario. A better approach, perhaps, is to > have the client fall back to using non-batched listing on the input directories. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDFS-15601) Batch listing: gracefully fallback to use non-batched listing when NameNode doesn't support the feature
[ https://issues.apache.org/jira/browse/HDFS-15601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu reassigned HDFS-15601: --- Assignee: (was: Aihua Xu) > Batch listing: gracefully fallback to use non-batched listing when NameNode > doesn't support the feature > --- > > Key: HDFS-15601 > URL: https://issues.apache.org/jira/browse/HDFS-15601 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs >Reporter: Chao Sun >Priority: Major > > HDFS-13616 requires both server and client side change. However, it is common > that users use a newer client to talk to older HDFS (say 2.10). Currently the > client will simply fail in this scenario. A better approach, perhaps, is to > have client fallback to use non-batched listing on the input directories. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDFS-15601) Batch listing: gracefully fallback to use non-batched listing when NameNode doesn't support the feature
[ https://issues.apache.org/jira/browse/HDFS-15601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu reassigned HDFS-15601: --- Assignee: Aihua Xu > Batch listing: gracefully fallback to use non-batched listing when NameNode > doesn't support the feature > --- > > Key: HDFS-15601 > URL: https://issues.apache.org/jira/browse/HDFS-15601 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs >Reporter: Chao Sun >Assignee: Aihua Xu >Priority: Major > > HDFS-13616 requires both server and client side change. However, it is common > that users use a newer client to talk to older HDFS (say 2.10). Currently the > client will simply fail in this scenario. A better approach, perhaps, is to > have client fallback to use non-batched listing on the input directories. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15601) Batch listing: gracefully fallback to use non-batched listing when NameNode doesn't support the feature
[ https://issues.apache.org/jira/browse/HDFS-15601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17207801#comment-17207801 ] Aihua Xu commented on HDFS-15601: - I will take a look. > Batch listing: gracefully fallback to use non-batched listing when NameNode > doesn't support the feature > --- > > Key: HDFS-15601 > URL: https://issues.apache.org/jira/browse/HDFS-15601 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs >Reporter: Chao Sun >Assignee: Aihua Xu >Priority: Major > > HDFS-13616 requires both server and client side change. However, it is common > that users use a newer client to talk to older HDFS (say 2.10). Currently the > client will simply fail in this scenario. A better approach, perhaps, is to > have client fallback to use non-batched listing on the input directories. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15562) StandbyCheckpointer will do checkpoint repeatedly while connecting observer/active namenode failed
[ https://issues.apache.org/jira/browse/HDFS-15562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17206838#comment-17206838 ] Aihua Xu commented on HDFS-15562: - [~aswqazxsd] I will take a look. If you can provide more details like the version, stack trace, etc. that will be helpful. > StandbyCheckpointer will do checkpoint repeatedly while connecting > observer/active namenode failed > -- > > Key: HDFS-15562 > URL: https://issues.apache.org/jira/browse/HDFS-15562 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: SunHao >Assignee: Aihua Xu >Priority: Major > > We find the standby namenode will do checkpoint over and over while > connecting observer/active namenode failed. > StandbyCheckpointer won't update “lastCheckpointTime” when upload new fsimage > to the other namenode failed, so that the standby namenode will keep doing > checkpoint repeatedly. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDFS-15562) StandbyCheckpointer will do checkpoint repeatedly while connecting observer/active namenode failed
[ https://issues.apache.org/jira/browse/HDFS-15562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu reassigned HDFS-15562: --- Assignee: Aihua Xu > StandbyCheckpointer will do checkpoint repeatedly while connecting > observer/active namenode failed > -- > > Key: HDFS-15562 > URL: https://issues.apache.org/jira/browse/HDFS-15562 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: SunHao >Assignee: Aihua Xu >Priority: Major > > We find the standby namenode will do checkpoint over and over while > connecting observer/active namenode failed. > StandbyCheckpointer won't update “lastCheckpointTime” when upload new fsimage > to the other namenode failed, so that the standby namenode will keep doing > checkpoint repeatedly. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
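The loop described in the report can be modeled in a few lines: if the checkpoint timestamp only advances on a successful upload, a standby that cannot reach the other namenode sees the checkpoint period as always elapsed and re-checkpoints on every iteration. The class below is a simplified, hypothetical model, not the actual `StandbyCheckpointer` internals, and the "fixed" variant is just one possible remedy (advance the attempt time unconditionally so retries respect the configured period).

```java
// Simplified model of the repeated-checkpoint loop; field and method
// names are illustrative, not the real StandbyCheckpointer code.
public class CheckpointLoop {
    static final long CHECKPOINT_PERIOD_MS = 60_000;
    long lastCheckpointTimeMs = 0;

    boolean shouldCheckpoint(long nowMs) {
        return nowMs - lastCheckpointTimeMs >= CHECKPOINT_PERIOD_MS;
    }

    // Buggy behavior: the timestamp only advances if the upload succeeds,
    // so while uploads fail, shouldCheckpoint() stays true and the standby
    // checkpoints over and over with no delay.
    void doCheckpointBuggy(long nowMs, boolean uploadOk) {
        if (uploadOk) {
            lastCheckpointTimeMs = nowMs;
        }
    }

    // One possible fix: record the attempt time unconditionally, so a
    // failed upload is retried only after the checkpoint period elapses.
    void doCheckpointFixed(long nowMs, boolean uploadOk) {
        lastCheckpointTimeMs = nowMs;
    }
}
```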
[jira] [Commented] (HDFS-15597) ContentSummary.getSpaceConsumed does not consider replication
[ https://issues.apache.org/jira/browse/HDFS-15597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17206583#comment-17206583 ] Aihua Xu commented on HDFS-15597: - [~weichiu] Can you review the simple fix? Thanks. > ContentSummary.getSpaceConsumed does not consider replication > - > > Key: HDFS-15597 > URL: https://issues.apache.org/jira/browse/HDFS-15597 > Project: Hadoop HDFS > Issue Type: Bug > Components: dfs >Affects Versions: 2.6.0 >Reporter: Ajmal Ahammed >Assignee: Aihua Xu >Priority: Minor > Attachments: HDFS-15597.patch > > > I am trying to get the disk space consumed by an HDFS directory using the > {{ContentSummary.getSpaceConsumed}} method. I can't get the space consumption > correctly considering the replication factor. The replication factor is 2, > and I was expecting twice the size of the actual file size from the above > method. > I can't get the space consumption correctly considering the replication > factor. The replication factor is 2, and I was expecting twice the size of > the actual file size from the above method. > {code} > ubuntu@ubuntu:~/ht$ sudo -u hdfs hdfs dfs -ls /var/lib/ubuntu > Found 2 items > -rw-r--r-- 2 ubuntu ubuntu3145728 2020-09-08 09:55 > /var/lib/ubuntu/size-test > drwxrwxr-x - ubuntu ubuntu 0 2020-09-07 06:37 /var/lib/ubuntu/test > {code} > But when I run the following code, > {code} > String path = "/etc/hadoop/conf/"; > conf.addResource(new Path(path + "core-site.xml")); > conf.addResource(new Path(path + "hdfs-site.xml")); > long size = > FileContext.getFileContext(conf).util().getContentSummary(fileStatus).getSpaceConsumed(); > System.out.println("Replication : " + fileStatus.getReplication()); > System.out.println("File size : " + size); > {code} > The output is > {code} > Replication : 0 > File size : 3145728 > {code} > Both the file size and the replication factor seems to be incorrect. 
> /etc/hadoop/conf/hdfs-site.xml contains the following config: > {code} > <property> > <name>dfs.replication</name> > <value>2</value> > </property> > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
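The expectation in the report is the standard relationship: space consumed on disk equals the logical file length times the replication factor. The arithmetic below restates that for the 3 MB `size-test` file in the listing, assuming `dfs.replication` is 2 as configured; it is a check of the expected value, not the `ContentSummary` implementation.

```java
// Expected relationship from the report above: space consumed should be
// the logical file length multiplied by the replication factor.
public class SpaceConsumed {
    public static long expectedSpaceConsumed(long fileLength, short replication) {
        return fileLength * replication;
    }

    public static void main(String[] args) {
        long length = 3_145_728L;   // size-test file from the listing above
        short replication = 2;      // dfs.replication from hdfs-site.xml
        // The report observed 3145728, i.e. replication was ignored; with
        // replication applied the expected value is length * 2.
        System.out.println(expectedSpaceConsumed(length, replication));
    }
}
```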
[jira] [Updated] (HDFS-15597) ContentSummary.getSpaceConsumed does not consider replication
[ https://issues.apache.org/jira/browse/HDFS-15597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated HDFS-15597: Status: Patch Available (was: Open) Patch-1: update getContentSummary function to consider replication for spaceConsumed field. > ContentSummary.getSpaceConsumed does not consider replication > - > > Key: HDFS-15597 > URL: https://issues.apache.org/jira/browse/HDFS-15597 > Project: Hadoop HDFS > Issue Type: Bug > Components: dfs >Affects Versions: 2.6.0 >Reporter: Ajmal Ahammed >Assignee: Aihua Xu >Priority: Minor > Attachments: HDFS-15597.patch > > > I am trying to get the disk space consumed by an HDFS directory using the > {{ContentSummary.getSpaceConsumed}} method. I can't get the space consumption > correctly considering the replication factor. The replication factor is 2, > and I was expecting twice the size of the actual file size from the above > method. > I can't get the space consumption correctly considering the replication > factor. The replication factor is 2, and I was expecting twice the size of > the actual file size from the above method. > {code} > ubuntu@ubuntu:~/ht$ sudo -u hdfs hdfs dfs -ls /var/lib/ubuntu > Found 2 items > -rw-r--r-- 2 ubuntu ubuntu3145728 2020-09-08 09:55 > /var/lib/ubuntu/size-test > drwxrwxr-x - ubuntu ubuntu 0 2020-09-07 06:37 /var/lib/ubuntu/test > {code} > But when I run the following code, > {code} > String path = "/etc/hadoop/conf/"; > conf.addResource(new Path(path + "core-site.xml")); > conf.addResource(new Path(path + "hdfs-site.xml")); > long size = > FileContext.getFileContext(conf).util().getContentSummary(fileStatus).getSpaceConsumed(); > System.out.println("Replication : " + fileStatus.getReplication()); > System.out.println("File size : " + size); > {code} > The output is > {code} > Replication : 0 > File size : 3145728 > {code} > Both the file size and the replication factor seems to be incorrect. 
> /etc/hadoop/conf/hdfs-site.xml contains the following config: > {code} > <property> > <name>dfs.replication</name> > <value>2</value> > </property> > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15597) ContentSummary.getSpaceConsumed does not consider replication
[ https://issues.apache.org/jira/browse/HDFS-15597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated HDFS-15597: Attachment: HDFS-15597.patch > ContentSummary.getSpaceConsumed does not consider replication > - > > Key: HDFS-15597 > URL: https://issues.apache.org/jira/browse/HDFS-15597 > Project: Hadoop HDFS > Issue Type: Bug > Components: dfs >Affects Versions: 2.6.0 >Reporter: Ajmal Ahammed >Assignee: Aihua Xu >Priority: Minor > Attachments: HDFS-15597.patch > > > I am trying to get the disk space consumed by an HDFS directory using the > {{ContentSummary.getSpaceConsumed}} method. I can't get the space consumption > correctly considering the replication factor. The replication factor is 2, > and I was expecting twice the size of the actual file size from the above > method. > I can't get the space consumption correctly considering the replication > factor. The replication factor is 2, and I was expecting twice the size of > the actual file size from the above method. > {code} > ubuntu@ubuntu:~/ht$ sudo -u hdfs hdfs dfs -ls /var/lib/ubuntu > Found 2 items > -rw-r--r-- 2 ubuntu ubuntu3145728 2020-09-08 09:55 > /var/lib/ubuntu/size-test > drwxrwxr-x - ubuntu ubuntu 0 2020-09-07 06:37 /var/lib/ubuntu/test > {code} > But when I run the following code, > {code} > String path = "/etc/hadoop/conf/"; > conf.addResource(new Path(path + "core-site.xml")); > conf.addResource(new Path(path + "hdfs-site.xml")); > long size = > FileContext.getFileContext(conf).util().getContentSummary(fileStatus).getSpaceConsumed(); > System.out.println("Replication : " + fileStatus.getReplication()); > System.out.println("File size : " + size); > {code} > The output is > {code} > Replication : 0 > File size : 3145728 > {code} > Both the file size and the replication factor seems to be incorrect. 
> /etc/hadoop/conf/hdfs-site.xml contains the following config: > {code} > <property> > <name>dfs.replication</name> > <value>2</value> > </property> > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15597) ContentSummary.getSpaceConsumed does not consider replication
[ https://issues.apache.org/jira/browse/HDFS-15597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17206492#comment-17206492 ] Aihua Xu commented on HDFS-15597: - Let me take a look. > ContentSummary.getSpaceConsumed does not consider replication > - > > Key: HDFS-15597 > URL: https://issues.apache.org/jira/browse/HDFS-15597 > Project: Hadoop HDFS > Issue Type: Bug > Components: dfs >Affects Versions: 2.6.0 >Reporter: Ajmal Ahammed >Assignee: Aihua Xu >Priority: Minor > > I am trying to get the disk space consumed by an HDFS directory using the > {{ContentSummary.getSpaceConsumed}} method. I can't get the space consumption > correctly considering the replication factor. The replication factor is 2, > and I was expecting twice the size of the actual file size from the above > method. > I can't get the space consumption correctly considering the replication > factor. The replication factor is 2, and I was expecting twice the size of > the actual file size from the above method. > {code} > ubuntu@ubuntu:~/ht$ sudo -u hdfs hdfs dfs -ls /var/lib/ubuntu > Found 2 items > -rw-r--r-- 2 ubuntu ubuntu3145728 2020-09-08 09:55 > /var/lib/ubuntu/size-test > drwxrwxr-x - ubuntu ubuntu 0 2020-09-07 06:37 /var/lib/ubuntu/test > {code} > But when I run the following code, > {code} > String path = "/etc/hadoop/conf/"; > conf.addResource(new Path(path + "core-site.xml")); > conf.addResource(new Path(path + "hdfs-site.xml")); > long size = > FileContext.getFileContext(conf).util().getContentSummary(fileStatus).getSpaceConsumed(); > System.out.println("Replication : " + fileStatus.getReplication()); > System.out.println("File size : " + size); > {code} > The output is > {code} > Replication : 0 > File size : 3145728 > {code} > Both the file size and the replication factor seems to be incorrect. 
> /etc/hadoop/conf/hdfs-site.xml contains the following config: > {code} > <property> > <name>dfs.replication</name> > <value>2</value> > </property> > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDFS-15597) ContentSummary.getSpaceConsumed does not consider replication
[ https://issues.apache.org/jira/browse/HDFS-15597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu reassigned HDFS-15597: --- Assignee: Aihua Xu > ContentSummary.getSpaceConsumed does not consider replication > - > > Key: HDFS-15597 > URL: https://issues.apache.org/jira/browse/HDFS-15597 > Project: Hadoop HDFS > Issue Type: Bug > Components: dfs >Affects Versions: 2.6.0 >Reporter: Ajmal Ahammed >Assignee: Aihua Xu >Priority: Minor > > I am trying to get the disk space consumed by an HDFS directory using the > {{ContentSummary.getSpaceConsumed}} method. I can't get the space consumption > correctly considering the replication factor. The replication factor is 2, > and I was expecting twice the size of the actual file size from the above > method. > I can't get the space consumption correctly considering the replication > factor. The replication factor is 2, and I was expecting twice the size of > the actual file size from the above method. > {code} > ubuntu@ubuntu:~/ht$ sudo -u hdfs hdfs dfs -ls /var/lib/ubuntu > Found 2 items > -rw-r--r-- 2 ubuntu ubuntu3145728 2020-09-08 09:55 > /var/lib/ubuntu/size-test > drwxrwxr-x - ubuntu ubuntu 0 2020-09-07 06:37 /var/lib/ubuntu/test > {code} > But when I run the following code, > {code} > String path = "/etc/hadoop/conf/"; > conf.addResource(new Path(path + "core-site.xml")); > conf.addResource(new Path(path + "hdfs-site.xml")); > long size = > FileContext.getFileContext(conf).util().getContentSummary(fileStatus).getSpaceConsumed(); > System.out.println("Replication : " + fileStatus.getReplication()); > System.out.println("File size : " + size); > {code} > The output is > {code} > Replication : 0 > File size : 3145728 > {code} > Both the file size and the replication factor seems to be incorrect. 
> /etc/hadoop/conf/hdfs-site.xml contains the following config: > {code} > <property> > <name>dfs.replication</name> > <value>2</value> > </property> > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDFS-5389) A Namenode that keeps only a part of the namespace in memory
[ https://issues.apache.org/jira/browse/HDFS-5389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu reassigned HDFS-5389: -- Assignee: Aihua Xu (was: Haohui Mai) > A Namenode that keeps only a part of the namespace in memory > > > Key: HDFS-5389 > URL: https://issues.apache.org/jira/browse/HDFS-5389 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: 0.23.1 >Reporter: Lin Xiao >Assignee: Aihua Xu >Priority: Minor > > *Background:* > Currently, the NN Keeps all its namespace in memory. This has had the benefit > that the NN code is very simple and, more importantly, helps the NN scale to > over 4.5K machines with 60K to 100K concurrently tasks. HDFS namespace can > be scaled currently using more Ram on the NN and/or using Federation which > scales both namespace and performance. The current federation implementation > does not allow renames across volumes without data copying but there are > proposals to remove that limitation. > *Motivation:* > Hadoop lets customers store huge amounts of data at very economical prices > and hence allows customers to store their data for several years. While most > customers perform analytics on recent data (last hour, day, week, months, > quarter, year), the ability to have five year old data online for analytics > is very attractive for many businesses. Although one can use larger RAM in a > NN and/or use Federation, it not really necessary to store the entire > namespace in memory since only the recent data is typically heavily accessed. > *Proposed Solution:* > Store a portion of the NN's namespace in memory- the "working set" of the > applications that are currently operating. LSM data structures are quite > appropriate for maintaining the full namespace in memory. One choice is > Google's LevelDB open-source implementation. > *Benefits:* > * Store larger namespaces without resorting to Federated namespace volumes. 
> * Complementary to NN Federated namespace volumes, indeed will allow a > single NN to easily store multiple larger volumes. > * Faster cold startup - the NN does not have read its full namespace before > responding to clients. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
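The "working set in memory, full namespace on disk" idea in the proposal can be sketched as a bounded LRU cache in front of a durable key-value store. The toy below uses a `HashMap` as a stand-in for an LSM store such as LevelDB and maps paths to an opaque inode string; it is purely illustrative of the caching shape, not the proposed NameNode design.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Toy sketch of a partial-in-memory namespace: a bounded LRU working set
// in front of a map standing in for an on-disk LSM store (e.g. LevelDB).
public class WorkingSetNamespace {
    private final Map<String, String> diskStore = new HashMap<>(); // LevelDB stand-in
    private final LinkedHashMap<String, String> hot;

    public WorkingSetNamespace(int capacity) {
        // access-order LinkedHashMap drops the least-recently-used entry
        // from memory when the working set overflows; the durable copy in
        // diskStore is unaffected.
        this.hot = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > capacity;
            }
        };
    }

    public void put(String path, String inode) {
        diskStore.put(path, inode);  // durable copy is always written
        hot.put(path, inode);
    }

    public String get(String path) {
        String inode = hot.get(path);
        if (inode == null) {
            inode = diskStore.get(path);       // cold read: fault in from "disk"
            if (inode != null) hot.put(path, inode);
        }
        return inode;
    }

    public boolean inMemory(String path) { return hot.containsKey(path); }
}
```

A real design would also need recovery, locking, and edit-log integration; the sketch only shows why cold data can leave memory without leaving the namespace.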
[jira] [Commented] (HDFS-5389) A Namenode that keeps only a part of the namespace in memory
[ https://issues.apache.org/jira/browse/HDFS-5389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17206394#comment-17206394 ] Aihua Xu commented on HDFS-5389: Seems no activity on this but sounds a great area to help scale NN and reduce NN memory pressure. I will take a look. [~weichiu] and [~yzhangal] > A Namenode that keeps only a part of the namespace in memory > > > Key: HDFS-5389 > URL: https://issues.apache.org/jira/browse/HDFS-5389 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: 0.23.1 >Reporter: Lin Xiao >Assignee: Haohui Mai >Priority: Minor > > *Background:* > Currently, the NN Keeps all its namespace in memory. This has had the benefit > that the NN code is very simple and, more importantly, helps the NN scale to > over 4.5K machines with 60K to 100K concurrently tasks. HDFS namespace can > be scaled currently using more Ram on the NN and/or using Federation which > scales both namespace and performance. The current federation implementation > does not allow renames across volumes without data copying but there are > proposals to remove that limitation. > *Motivation:* > Hadoop lets customers store huge amounts of data at very economical prices > and hence allows customers to store their data for several years. While most > customers perform analytics on recent data (last hour, day, week, months, > quarter, year), the ability to have five year old data online for analytics > is very attractive for many businesses. Although one can use larger RAM in a > NN and/or use Federation, it not really necessary to store the entire > namespace in memory since only the recent data is typically heavily accessed. > *Proposed Solution:* > Store a portion of the NN's namespace in memory- the "working set" of the > applications that are currently operating. LSM data structures are quite > appropriate for maintaining the full namespace in memory. One choice is > Google's LevelDB open-source implementation. 
> *Benefits:* > * Store larger namespaces without resorting to Federated namespace volumes. > * Complementary to NN Federated namespace volumes, indeed will allow a > single NN to easily store multiple larger volumes. > * Faster cold startup - the NN does not have read its full namespace before > responding to clients. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14450) Erasure Coding: decommissioning datanodes cause replicate a large number of duplicate EC internal blocks
[ https://issues.apache.org/jira/browse/HDFS-14450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16824561#comment-16824561 ] Aihua Xu commented on HDFS-14450: - [~wuweiwei] Is this issue always reproducible? > Erasure Coding: decommissioning datanodes cause replicate a large number of > duplicate EC internal blocks > > > Key: HDFS-14450 > URL: https://issues.apache.org/jira/browse/HDFS-14450 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Wu Weiwei >Assignee: Wu Weiwei >Priority: Major > > {code:java} > // [WARN] [RedundancyMonitor] : Failed to place enough replicas, still in > need of 2 to reach 167 (unavailableStorages=[DISK, ARCHIVE], > storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], > creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=false) All > required storage types are unavailable: unavailableStorages=[DISK, ARCHIVE], > storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], > creationFallbacks=[], replicationFallbacks=[ARCHIVE]} > {code} > In a large-scale cluster, decommissioning large-scale datanodes cause EC > block groups to replicate a large number of duplicate internal blocks. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-14364) DataNode crashes without any workload
Aihua Xu created HDFS-14364: --- Summary: DataNode crashes without any workload Key: HDFS-14364 URL: https://issues.apache.org/jira/browse/HDFS-14364 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Affects Versions: 3.1.1 Reporter: Aihua Xu Attachments: hs_err_pid106000.log All the datanodes crash on the cluster with EC enabled on part of HDFS. At the crash time, there is no active workload. Please see attached for JVM crash report. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-14363) Namenode crashes without any workload
Aihua Xu created HDFS-14363: --- Summary: Namenode crashes without any workload Key: HDFS-14363 URL: https://issues.apache.org/jira/browse/HDFS-14363 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: 3.1.1 Reporter: Aihua Xu Attachments: hs_err_pid32124.log, hs_err_pid33479.log, hs_err_pid44708.log The namenode and QJM both crash without active workloads. Please see attach log for JVM crash report. Erasing code is enabled on the cluster for part of the HDFS. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14362) HDFS cluster with EC enabled crashes without any workload
[ https://issues.apache.org/jira/browse/HDFS-14362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated HDFS-14362: Summary: HDFS cluster with EC enabled crashes without any workload (was: HDFS cluster crashes without any workload) > HDFS cluster with EC enabled crashes without any workload > - > > Key: HDFS-14362 > URL: https://issues.apache.org/jira/browse/HDFS-14362 > Project: Hadoop HDFS > Issue Type: Bug > Components: erasure-coding >Affects Versions: 3.1.1 >Reporter: Aihua Xu >Priority: Major > > We have a small test cluster on 3.1.1 with erasure coding enabled. We loaded > some data but doesn't have active data access. Then we are seeing the > namenode, datanode crash. Use this parent jira to track the issues. Right now > we are not sure if it's related to EC and if it has been fixed in the later > version. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-14362) HDFS cluster crashes without any workload
Aihua Xu created HDFS-14362: --- Summary: HDFS cluster crashes without any workload Key: HDFS-14362 URL: https://issues.apache.org/jira/browse/HDFS-14362 Project: Hadoop HDFS Issue Type: Bug Components: erasure-coding Affects Versions: 3.1.1 Reporter: Aihua Xu We have a small test cluster on 3.1.1 with erasure coding enabled. We loaded some data but doesn't have active data access. Then we are seeing the namenode, datanode crash. Use this parent jira to track the issues. Right now we are not sure if it's related to EC and if it has been fixed in the later version. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDFS-10586) Erasure Code misfunctions when 3 DataNode down
[ https://issues.apache.org/jira/browse/HDFS-10586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu reassigned HDFS-10586: --- Assignee: Aihua Xu > Erasure Code misfunctions when 3 DataNode down > -- > > Key: HDFS-10586 > URL: https://issues.apache.org/jira/browse/HDFS-10586 > Project: Hadoop HDFS > Issue Type: Bug > Components: erasure-coding >Affects Versions: 3.0.0-alpha1 > Environment: 9 DataNode and 1 NameNode,Erasured code policy is > set as "6--3", When 3 DataNode down, erasured code fails and an exception > is thrown >Reporter: gao shan >Assignee: Aihua Xu >Priority: Major > > The following is the steps to reproduce: > 1) hadoop fs -mkdir /ec > 2) set erasured code policy as "6-3" > 3) "write" data by : > time hadoop jar > /opt/hadoop/hadoop-3.0.0-SNAPSHOT/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-3.0.0-SNAPSHOT.jar > TestDFSIO -D test.build.data=/ec -write -nrFiles 30 -fileSize 12288 > -bufferSize 1073741824 > 4) Manually down 3 nodes. Kill the threads of "datanode" and "nodemanager" > in 3 DataNode. 
> 5) By using erasured code to "read" data by: > time hadoop jar > /opt/hadoop/hadoop-3.0.0-SNAPSHOT/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-3.0.0-SNAPSHOT.jar > TestDFSIO -D test.build.data=/ec -read -nrFiles 30 -fileSize 12288 > -bufferSize 1073741824 > then the failure occurs and the exception is thrown as: > INFO mapreduce.Job: Task Id : attempt_1465445965249_0008_m_34_2, Status : > FAILED > Error: java.io.IOException: 4 missing blocks, the stripe is: Offset=0, > length=8388608, fetchedChunksNum=0, missingChunksNum=4 > at > org.apache.hadoop.hdfs.DFSStripedInputStream$StripeReader.checkMissingBlocks(DFSStripedInputStream.java:614) > at > org.apache.hadoop.hdfs.DFSStripedInputStream$StripeReader.readParityChunks(DFSStripedInputStream.java:647) > at > org.apache.hadoop.hdfs.DFSStripedInputStream$StripeReader.readStripe(DFSStripedInputStream.java:762) > at > org.apache.hadoop.hdfs.DFSStripedInputStream.readOneStripe(DFSStripedInputStream.java:316) > at > org.apache.hadoop.hdfs.DFSStripedInputStream.readWithStrategy(DFSStripedInputStream.java:450) > at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:941) > at java.io.DataInputStream.read(DataInputStream.java:149) > at org.apache.hadoop.fs.TestDFSIO$ReadMapper.doIO(TestDFSIO.java:531) > at org.apache.hadoop.fs.TestDFSIO$ReadMapper.doIO(TestDFSIO.java:508) > at org.apache.hadoop.fs.IOMapperBase.map(IOMapperBase.java:134) > at org.apache.hadoop.fs.IOMapperBase.map(IOMapperBase.java:37) > at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54) > at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453) > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) > at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1669) > at 
org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168) -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
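For context on the failure above: with a 6-3 (Reed-Solomon, 6 data + 3 parity) policy, each stripe has 9 internal blocks and any 6 of them suffice to reconstruct the data, so losing 3 datanodes should normally leave every stripe readable. The "4 missing blocks" error means a stripe fell below that threshold, which with only 3 nodes down suggests one node held more than one internal block of the same group. The arithmetic is small enough to state directly; this is the general RS readability condition, not a claim about the specific bug.

```java
// Readability check for a Reed-Solomon (k data + m parity) stripe: a read
// succeeds as long as at least k of the k+m internal blocks are available.
public class EcReadability {
    public static boolean readable(int k, int m, int missing) {
        return (k + m) - missing >= k;   // equivalently: missing <= m
    }

    public static void main(String[] args) {
        int k = 6, m = 3;  // the 6-3 policy from the repro steps above
        System.out.println(readable(k, m, 3));  // 3 missing: still readable
        System.out.println(readable(k, m, 4));  // 4 missing: read fails, as in
                                                // the "4 missing blocks" error
    }
}
```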
[jira] [Commented] (HDFS-10586) Erasure Code misfunctions when 3 DataNode down
[ https://issues.apache.org/jira/browse/HDFS-10586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16739908#comment-16739908 ] Aihua Xu commented on HDFS-10586: - I tested out this briefly and it doesn't seem to be an issue any more. I will do more investigation and confirm. > Erasure Code misfunctions when 3 DataNode down > -- > > Key: HDFS-10586 > URL: https://issues.apache.org/jira/browse/HDFS-10586 > Project: Hadoop HDFS > Issue Type: Bug > Components: erasure-coding >Affects Versions: 3.0.0-alpha1 > Environment: 9 DataNode and 1 NameNode,Erasured code policy is > set as "6--3", When 3 DataNode down, erasured code fails and an exception > is thrown >Reporter: gao shan >Assignee: Aihua Xu >Priority: Major > > The following is the steps to reproduce: > 1) hadoop fs -mkdir /ec > 2) set erasured code policy as "6-3" > 3) "write" data by : > time hadoop jar > /opt/hadoop/hadoop-3.0.0-SNAPSHOT/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-3.0.0-SNAPSHOT.jar > TestDFSIO -D test.build.data=/ec -write -nrFiles 30 -fileSize 12288 > -bufferSize 1073741824 > 4) Manually down 3 nodes. Kill the threads of "datanode" and "nodemanager" > in 3 DataNode. 
> 5) By using erasured code to "read" data by: > time hadoop jar > /opt/hadoop/hadoop-3.0.0-SNAPSHOT/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-3.0.0-SNAPSHOT.jar > TestDFSIO -D test.build.data=/ec -read -nrFiles 30 -fileSize 12288 > -bufferSize 1073741824 > then the failure occurs and the exception is thrown as: > INFO mapreduce.Job: Task Id : attempt_1465445965249_0008_m_34_2, Status : > FAILED > Error: java.io.IOException: 4 missing blocks, the stripe is: Offset=0, > length=8388608, fetchedChunksNum=0, missingChunksNum=4 > at > org.apache.hadoop.hdfs.DFSStripedInputStream$StripeReader.checkMissingBlocks(DFSStripedInputStream.java:614) > at > org.apache.hadoop.hdfs.DFSStripedInputStream$StripeReader.readParityChunks(DFSStripedInputStream.java:647) > at > org.apache.hadoop.hdfs.DFSStripedInputStream$StripeReader.readStripe(DFSStripedInputStream.java:762) > at > org.apache.hadoop.hdfs.DFSStripedInputStream.readOneStripe(DFSStripedInputStream.java:316) > at > org.apache.hadoop.hdfs.DFSStripedInputStream.readWithStrategy(DFSStripedInputStream.java:450) > at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:941) > at java.io.DataInputStream.read(DataInputStream.java:149) > at org.apache.hadoop.fs.TestDFSIO$ReadMapper.doIO(TestDFSIO.java:531) > at org.apache.hadoop.fs.TestDFSIO$ReadMapper.doIO(TestDFSIO.java:508) > at org.apache.hadoop.fs.IOMapperBase.map(IOMapperBase.java:134) > at org.apache.hadoop.fs.IOMapperBase.map(IOMapperBase.java:37) > at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54) > at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453) > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) > at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1669) > at 
org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168) -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
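A quick sanity check on the numbers in the stack trace above (a standalone sketch in plain Python, not Hadoop code): an RS(6,3) policy is an MDS code, so a stripe stays decodable as long as no more than 3 of its 9 units are missing. The trace reports missingChunksNum=4, one more than the parity can cover, so the read has to fail even though only 3 DataNodes were killed; the fourth missing chunk may indicate a unit lost beyond the 3 downed nodes.

```python
# Sketch (not Hadoop code): why "missingChunksNum=4" is fatal for RS(6,3).
def can_reconstruct(num_data, num_parity, missing):
    # An MDS code such as Reed-Solomon recovers a stripe iff the number of
    # missing units does not exceed the number of parity units.
    return missing <= num_parity

assert can_reconstruct(6, 3, 3)      # 3 DataNodes down: still decodable
assert not can_reconstruct(6, 3, 4)  # matches missingChunksNum=4 in the trace
```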
[jira] [Commented] (HDFS-14190) Copying folders containing = - characters between hdfs (using webhdfs) does not work in distcp
[ https://issues.apache.org/jira/browse/HDFS-14190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16737572#comment-16737572 ] Aihua Xu commented on HDFS-14190: - I will take a look. > Copying folders containing = - characters between hdfs (using webhdfs) does > not work in distcp > -- > > Key: HDFS-14190 > URL: https://issues.apache.org/jira/browse/HDFS-14190 > Project: Hadoop HDFS > Issue Type: Bug > Components: distcp >Affects Versions: 3.1.1 >Reporter: yinsong >Assignee: Aihua Xu >Priority: Major > > Copying folders containing = - characters between hdfs (using webhdfs) does > not work in distcp > for example: > src:hadoop2.7 target:hadoop3.1.1 > (1) > hadoop distcp \ > -pugp \ > -i \ > webhdfs://1.1.1.1:50070/sudiyi_datawarehouse > webhdfs://2.2.2.2:50070/sudiyi_datawarehouse > ERROR tools.SimpleCopyListing: FileNotFoundException exception in listStatus: > File /sudiyi_datawarehouse/st_device_standard_ds/date_time%3D2018-10-10 does > not exist > > (2) > hadoop distcp \ > -Dmapreduce.framework.name=yarn \ > -pugp \ > -i \ > webhdfs://1.1.1.1:50070/druid webhdfs://2.2.2.2:50070/druid > Error: java.io.IOException: File copy failed: > webhdfs://10.26.93.65:50070/druid/indexing-logs/kill_task-myapp_V1-2018-04-26T16_20_55+0800 > --> > webhdfs://10.27.234.198:50070/druid/indexing-logs/kill_task-myapp_V1-2018-04-26T16_20_55+0800 > at > org.apache.hadoop.tools.mapred.CopyMapper.copyFileWithRetry(CopyMapper.java:259) > at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:217) > at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:48) > at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146) > at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:799) > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:347) > at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) > at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168) > Caused by: java.io.IOException: Couldn't run retriable-command: Copying > webhdfs://10.26.93.65:50070/druid/indexing-logs/kill_task-myapp_V1-2018-04-26T16_20_55+0800 > to > webhdfs://10.27.234.198:50070/druid/indexing-logs/kill_task-myapp_V1-2018-04-26T16_20_55+0800 > at > org.apache.hadoop.tools.util.RetriableCommand.execute(RetriableCommand.java:101) > at > org.apache.hadoop.tools.mapred.CopyMapper.copyFileWithRetry(CopyMapper.java:256) > ... 10 more > Caused by: java.io.IOException: Failed to promote > tmp-file:webhdfs://10.27.234.198:50070/druid/.distcp.tmp.attempt_1545990837043_0016_m_15_2 > to: > webhdfs://10.27.234.198:50070/druid/indexing-logs/kill_task-myapp_V1-2018-04-26T16_20_55+0800 > at > org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.promoteTmpToTarget(RetriableFileCopyCommand.java:250) > at > org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.doCopy(RetriableFileCopyCommand.java:140) > at > org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.doExecute(RetriableFileCopyCommand.java:99) > at > org.apache.hadoop.tools.util.RetriableCommand.execute(RetriableCommand.java:87) -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
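The %3D in the FileNotFoundException above is the percent-encoded form of '='. A minimal standalone sketch (plain Python, not the distcp/WebHDFS code) of how a path containing '=' changes shape once URL-encoded, and how encoding an already-encoded name produces a string that no longer matches anything on the filesystem, which is the kind of mismatch that can surface as a FileNotFoundException in listStatus:

```python
from urllib.parse import quote, unquote

path = "/sudiyi_datawarehouse/st_device_standard_ds/date_time=2018-10-10"
encoded = quote(path, safe="/")  # '=' becomes '%3D'
assert encoded.endswith("date_time%3D2018-10-10")

# Decoding restores the on-disk name...
assert unquote(encoded) == path
# ...but encoding an already-encoded name mangles it ('%' -> '%25').
assert quote(encoded, safe="/").endswith("date_time%253D2018-10-10")
```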
[jira] [Assigned] (HDFS-14190) Copying folders containing = - characters between hdfs (using webhdfs) does not work in distcp
[ https://issues.apache.org/jira/browse/HDFS-14190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu reassigned HDFS-14190: --- Assignee: Aihua Xu
[jira] [Commented] (HDFS-10815) The state of the EC file is erroneously recognized when you restart the NameNode.
[ https://issues.apache.org/jira/browse/HDFS-10815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16736229#comment-16736229 ] Aihua Xu commented on HDFS-10815: - Along with HDFS-10775, I will try out the scenario and close this out if it's not an issue.
[jira] [Assigned] (HDFS-10815) The state of the EC file is erroneously recognized when you restart the NameNode.
[ https://issues.apache.org/jira/browse/HDFS-10815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu reassigned HDFS-10815: --- Assignee: Aihua Xu > The state of the EC file is erroneously recognized when you restart the > NameNode. > - > > Key: HDFS-10815 > URL: https://issues.apache.org/jira/browse/HDFS-10815 > Project: Hadoop HDFS > Issue Type: Bug > Components: erasure-coding >Affects Versions: 3.0.0-alpha1 > Environment: 2 NameNodes, 5 DataNodes, Erasured code policy is set as > "RS-DEFAULT-3-2-64k" >Reporter: Eisuke Umeda >Assignee: Aihua Xu >Priority: Major > > After carrying out an examination in the following procedures, an EC files > came to be recognized as corrupt files. > These files were able to get in "hdfs dfs -get". > NameNode might be causing the false recognition. > DataNodes: datanode[1-5] > Rack awareness: not set > Copy target files: /tmp/tpcds-generate/25/store_sales/* > {code} > $ hdfs dfs -ls /tmp/tpcds-generate/25/store_sales > Found 25 items > -rw-r--r-- 0 root supergroup 399430918 2016-08-16 15:11 > /tmp/tpcds-generate/25/store_sales/data-m-0 > -rw-r--r-- 0 root supergroup 399054598 2016-08-16 15:11 > /tmp/tpcds-generate/25/store_sales/data-m-1 > -rw-r--r-- 0 root supergroup 399329373 2016-08-16 15:11 > /tmp/tpcds-generate/25/store_sales/data-m-2 > -rw-r--r-- 0 root supergroup 399528459 2016-08-16 15:11 > /tmp/tpcds-generate/25/store_sales/data-m-3 > -rw-r--r-- 0 root supergroup 399329624 2016-08-16 15:11 > /tmp/tpcds-generate/25/store_sales/data-m-4 > -rw-r--r-- 0 root supergroup 399085924 2016-08-16 15:11 > /tmp/tpcds-generate/25/store_sales/data-m-5 > -rw-r--r-- 0 root supergroup 399337384 2016-08-16 15:12 > /tmp/tpcds-generate/25/store_sales/data-m-6 > -rw-r--r-- 0 root supergroup 399199458 2016-08-16 15:12 > /tmp/tpcds-generate/25/store_sales/data-m-7 > -rw-r--r-- 0 root supergroup 399679096 2016-08-16 15:12 > /tmp/tpcds-generate/25/store_sales/data-m-8 > -rw-r--r-- 0 root supergroup 399440431 2016-08-16 
15:12 > /tmp/tpcds-generate/25/store_sales/data-m-9 > -rw-r--r-- 0 root supergroup 399403931 2016-08-16 15:12 > /tmp/tpcds-generate/25/store_sales/data-m-00010 > -rw-r--r-- 0 root supergroup 399472465 2016-08-16 15:12 > /tmp/tpcds-generate/25/store_sales/data-m-00011 > -rw-r--r-- 0 root supergroup 399451784 2016-08-16 15:12 > /tmp/tpcds-generate/25/store_sales/data-m-00012 > -rw-r--r-- 0 root supergroup 399240168 2016-08-16 15:12 > /tmp/tpcds-generate/25/store_sales/data-m-00013 > -rw-r--r-- 0 root supergroup 399370507 2016-08-16 15:12 > /tmp/tpcds-generate/25/store_sales/data-m-00014 > -rw-r--r-- 0 root supergroup 399633351 2016-08-16 15:12 > /tmp/tpcds-generate/25/store_sales/data-m-00015 > -rw-r--r-- 0 root supergroup 396532952 2016-08-16 15:13 > /tmp/tpcds-generate/25/store_sales/data-m-00016 > -rw-r--r-- 0 root supergroup 396258715 2016-08-16 15:13 > /tmp/tpcds-generate/25/store_sales/data-m-00017 > -rw-r--r-- 0 root supergroup 396382486 2016-08-16 15:13 > /tmp/tpcds-generate/25/store_sales/data-m-00018 > -rw-r--r-- 0 root supergroup 399016456 2016-08-16 15:13 > /tmp/tpcds-generate/25/store_sales/data-m-00019 > -rw-r--r-- 0 root supergroup 399465745 2016-08-16 15:13 > /tmp/tpcds-generate/25/store_sales/data-m-00020 > -rw-r--r-- 0 root supergroup 399208235 2016-08-16 15:13 > /tmp/tpcds-generate/25/store_sales/data-m-00021 > -rw-r--r-- 0 root supergroup 399198296 2016-08-16 15:13 > /tmp/tpcds-generate/25/store_sales/data-m-00022 > -rw-r--r-- 0 root supergroup 399599711 2016-08-16 15:13 > /tmp/tpcds-generate/25/store_sales/data-m-00023 > -rw-r--r-- 0 root supergroup 395150855 2016-08-16 15:13 > /tmp/tpcds-generate/25/store_sales/data-m-00024 > {code} > NameNodes: > namenode1(active) > namenode2(standby) > The directory which there is "Under-erasure-coded block groups": > /tmp/tpcds-generate/test > {code} > $ sudo -u hdfs hdfs erasurecode -getPolicy /tmp/tpcds-generate/test > ErasureCodingPolicy=[Name=RS-DEFAULT-3-2-64k, > Schema=[ECSchema=[Codec=rs-default, 
numDataUnits=3, numParityUnits=2]], > CellSize=65536 ] > {code} > The following is the steps to reproduce: > 1) hdfs dfs -cp /tmp/tpcds-generate/25/store_sales/* /tmp/tpcds-generate/test > 2) datanode1: (in the middle of the copy) sudo pkill -9 -f datanode > 3) start a process of datanode1 two minutes later > 4) carry out hdfs fsck and confirm that Under-Replicated Blocks occurred > 5) wait until Under-Replicated Blocks becomes 0 > 6) (namenode1) /etc/init.d/hadoop-hdfs-namenode restart > 7) (namenode2) /etc/init.d/hadoop-hdfs-namenode restart
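For orientation, the layout arithmetic behind the RS-DEFAULT-3-2-64k policy quoted above (a back-of-the-envelope sketch; the constants mirror the getPolicy output, and the file size is taken from the first entry of the directory listing):

```python
CELL_SIZE = 64 * 1024        # CellSize=65536 from the getPolicy output
NUM_DATA, NUM_PARITY = 3, 2  # numDataUnits=3, numParityUnits=2

# Each full stripe carries NUM_DATA cells of file data (parity is overhead).
data_per_stripe = CELL_SIZE * NUM_DATA  # 196608 bytes per stripe
file_len = 399430918                    # first data-m file in the listing
full_stripes, tail = divmod(file_len, data_per_stripe)
assert (full_stripes, tail) == (2031, 120070)

# Raw storage overhead of the policy: 5 units stored per 3 units of data.
assert (NUM_DATA + NUM_PARITY) / NUM_DATA == 5 / 3
```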
[jira] [Commented] (HDFS-10775) Under-Replicated Blocks can not be recovered
[ https://issues.apache.org/jira/browse/HDFS-10775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16736221#comment-16736221 ] Aihua Xu commented on HDFS-10775: - Not sure if it's still an issue since it's a very old one. I will take a look. > Under-Replicated Blocks can not be recovered > > > Key: HDFS-10775 > URL: https://issues.apache.org/jira/browse/HDFS-10775 > Project: Hadoop HDFS > Issue Type: Bug > Components: erasure-coding >Affects Versions: 3.0.0-alpha1 > Environment: 2 NameNodes, 5 DataNodes, Erasured code policy is set as > "RS-DEFAULT-3-2-64k" >Reporter: Eisuke Umeda >Assignee: Aihua Xu >Priority: Major > > I killed DataNode in the middle of the writing of the EC file. > Under-Replicated Blocks has occurred, but did not recover. > DataNodes: datanode[1-5] > Rack awareness: not set > Copy target files: /tmp/tpcds-generate/25/store_sales/* > {code} > $ hdfs dfs -ls /tmp/tpcds-generate/25/store_sales > Found 25 items > -rw-r--r-- 0 root supergroup 399430918 2016-08-16 15:11 > /tmp/tpcds-generate/25/store_sales/data-m-0 > -rw-r--r-- 0 root supergroup 399054598 2016-08-16 15:11 > /tmp/tpcds-generate/25/store_sales/data-m-1 > -rw-r--r-- 0 root supergroup 399329373 2016-08-16 15:11 > /tmp/tpcds-generate/25/store_sales/data-m-2 > -rw-r--r-- 0 root supergroup 399528459 2016-08-16 15:11 > /tmp/tpcds-generate/25/store_sales/data-m-3 > -rw-r--r-- 0 root supergroup 399329624 2016-08-16 15:11 > /tmp/tpcds-generate/25/store_sales/data-m-4 > -rw-r--r-- 0 root supergroup 399085924 2016-08-16 15:11 > /tmp/tpcds-generate/25/store_sales/data-m-5 > -rw-r--r-- 0 root supergroup 399337384 2016-08-16 15:12 > /tmp/tpcds-generate/25/store_sales/data-m-6 > -rw-r--r-- 0 root supergroup 399199458 2016-08-16 15:12 > /tmp/tpcds-generate/25/store_sales/data-m-7 > -rw-r--r-- 0 root supergroup 399679096 2016-08-16 15:12 > /tmp/tpcds-generate/25/store_sales/data-m-8 > -rw-r--r-- 0 root supergroup 399440431 2016-08-16 15:12 > 
/tmp/tpcds-generate/25/store_sales/data-m-9 > -rw-r--r-- 0 root supergroup 399403931 2016-08-16 15:12 > /tmp/tpcds-generate/25/store_sales/data-m-00010 > -rw-r--r-- 0 root supergroup 399472465 2016-08-16 15:12 > /tmp/tpcds-generate/25/store_sales/data-m-00011 > -rw-r--r-- 0 root supergroup 399451784 2016-08-16 15:12 > /tmp/tpcds-generate/25/store_sales/data-m-00012 > -rw-r--r-- 0 root supergroup 399240168 2016-08-16 15:12 > /tmp/tpcds-generate/25/store_sales/data-m-00013 > -rw-r--r-- 0 root supergroup 399370507 2016-08-16 15:12 > /tmp/tpcds-generate/25/store_sales/data-m-00014 > -rw-r--r-- 0 root supergroup 399633351 2016-08-16 15:12 > /tmp/tpcds-generate/25/store_sales/data-m-00015 > -rw-r--r-- 0 root supergroup 396532952 2016-08-16 15:13 > /tmp/tpcds-generate/25/store_sales/data-m-00016 > -rw-r--r-- 0 root supergroup 396258715 2016-08-16 15:13 > /tmp/tpcds-generate/25/store_sales/data-m-00017 > -rw-r--r-- 0 root supergroup 396382486 2016-08-16 15:13 > /tmp/tpcds-generate/25/store_sales/data-m-00018 > -rw-r--r-- 0 root supergroup 399016456 2016-08-16 15:13 > /tmp/tpcds-generate/25/store_sales/data-m-00019 > -rw-r--r-- 0 root supergroup 399465745 2016-08-16 15:13 > /tmp/tpcds-generate/25/store_sales/data-m-00020 > -rw-r--r-- 0 root supergroup 399208235 2016-08-16 15:13 > /tmp/tpcds-generate/25/store_sales/data-m-00021 > -rw-r--r-- 0 root supergroup 399198296 2016-08-16 15:13 > /tmp/tpcds-generate/25/store_sales/data-m-00022 > -rw-r--r-- 0 root supergroup 399599711 2016-08-16 15:13 > /tmp/tpcds-generate/25/store_sales/data-m-00023 > -rw-r--r-- 0 root supergroup 395150855 2016-08-16 15:13 > /tmp/tpcds-generate/25/store_sales/data-m-00024 > {code} > Destination directory: /tmp/tpcds-generate/test > {code} > $ sudo -u hdfs hdfs erasurecode -getPolicy /tmp/tpcds-generate/test > ErasureCodingPolicy=[Name=RS-DEFAULT-3-2-64k, > Schema=[ECSchema=[Codec=rs-default, numDataUnits=3, numParityUnits=2]], > CellSize=65536 ] > {code} > The following is the steps to reproduce: > 1) 
hdfs dfs -cp /tmp/tpcds-generate/25/store_sales/* /tmp/tpcds-generate/test > 2) datanode1: (in the middle of the copy) sudo pkill -9 -f datanode > 3) start a process of datanode1 two minutes later > 4) wait for a while
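For context on why killing a single DataNode mid-copy should be recoverable here (an illustrative sketch, not the actual block-recovery code): RS-DEFAULT-3-2-64k stores 5 units per stripe and, being an MDS code, can rebuild from any 3 survivors, so every failure pattern of up to 2 lost units is repairable.

```python
from itertools import combinations

NUM_DATA, NUM_PARITY = 3, 2
total_units = NUM_DATA + NUM_PARITY  # 5 storage units per stripe

# Any pattern losing at most NUM_PARITY units leaves >= NUM_DATA survivors,
# which is enough for Reed-Solomon to rebuild the stripe.
for lost_count in range(NUM_PARITY + 1):
    for lost in combinations(range(total_units), lost_count):
        assert total_units - len(lost) >= NUM_DATA

# One more loss than the parity count is unrecoverable.
assert total_units - (NUM_PARITY + 1) < NUM_DATA
```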
[jira] [Assigned] (HDFS-10775) Under-Replicated Blocks can not be recovered
[ https://issues.apache.org/jira/browse/HDFS-10775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu reassigned HDFS-10775: --- Assignee: Aihua Xu
[jira] [Commented] (HDFS-13293) RBF: The RouterRPCServer should transfer CallerContext and client ip to NamenodeRpcServer
[ https://issues.apache.org/jira/browse/HDFS-13293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16733707#comment-16733707 ] Aihua Xu commented on HDFS-13293: - [~ferhui] What is the progress on this jira? We are also interested in getting callerContext to work with RBF. Thanks. > RBF: The RouterRPCServer should transfer CallerContext and client ip to > NamenodeRpcServer > - > > Key: HDFS-13293 > URL: https://issues.apache.org/jira/browse/HDFS-13293 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: maobaolong >Assignee: Fei Hui >Priority: Major > Attachments: HDFS-13293.001.patch > > > Otherwise, the namenode doesn't know the client's callerContext
[jira] [Commented] (HDFS-12953) XORRawDecoder.doDecode throws NullPointerException
[ https://issues.apache.org/jira/browse/HDFS-12953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16733614#comment-16733614 ] Aihua Xu commented on HDFS-12953: - [~xiaochen] Are you working on this? I can investigate the issue further if you are not. Does the test case you attached fail? > XORRawDecoder.doDecode throws NullPointerException > -- > > Key: HDFS-12953 > URL: https://issues.apache.org/jira/browse/HDFS-12953 > Project: Hadoop HDFS > Issue Type: Bug > Components: erasure-coding >Affects Versions: 3.0.0 >Reporter: Lei (Eddy) Xu >Assignee: Xiao Chen >Priority: Major > Attachments: HDFS-12953.test.patch > > > Thanks [~danielpol] report on HDFS-12860. > {noformat} > 17/11/30 04:19:55 INFO mapreduce.Job: map 0% reduce 0% > 17/11/30 04:20:01 INFO mapreduce.Job: Task Id : > attempt_1512036058655_0003_m_02_0, Status : FAILED > Error: java.lang.NullPointerException > at > org.apache.hadoop.io.erasurecode.rawcoder.XORRawDecoder.doDecode(XORRawDecoder.java:83) > at > org.apache.hadoop.io.erasurecode.rawcoder.RawErasureDecoder.decode(RawErasureDecoder.java:106) > at > org.apache.hadoop.io.erasurecode.rawcoder.RawErasureDecoder.decode(RawErasureDecoder.java:170) > at > org.apache.hadoop.hdfs.StripeReader.decodeAndFillBuffer(StripeReader.java:423) > at > org.apache.hadoop.hdfs.StatefulStripeReader.decode(StatefulStripeReader.java:94) > at org.apache.hadoop.hdfs.StripeReader.readStripe(StripeReader.java:382) > at > org.apache.hadoop.hdfs.DFSStripedInputStream.readOneStripe(DFSStripedInputStream.java:318) > at > org.apache.hadoop.hdfs.DFSStripedInputStream.readWithStrategy(DFSStripedInputStream.java:391) > at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:813) > at java.io.DataInputStream.read(DataInputStream.java:149) > at > org.apache.hadoop.examples.terasort.TeraInputFormat$TeraRecordReader.nextKeyValue(TeraInputFormat.java:257) > at > org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:563) > at > 
org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80) > at > org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91) > at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145) > at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:794) > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342) > at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1962) > at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168) > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14098) use fsck tools for EC blockId will throw NullPointerException
[ https://issues.apache.org/jira/browse/HDFS-14098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16733607#comment-16733607 ] Aihua Xu commented on HDFS-14098: - [~luoge123] Are you working on the fix? We are also seeing the same issue and I can take over if you are not working on it. Thanks. > use fsck tools for EC blockId will throw NullPointerException > - > > Key: HDFS-14098 > URL: https://issues.apache.org/jira/browse/HDFS-14098 > Project: Hadoop HDFS > Issue Type: Bug > Components: erasure-coding >Affects Versions: 3.0.0 >Reporter: luoge123 >Assignee: luoge123 >Priority: Major > > > When an EC file has some blocks missing, using the fsck tool with the EC blockId will throw a > NullPointerException > {code:java} > hdfs fsck -blockId blk_-9223372036800049376 > fsck information is: > Block Id: blk_-9223372036800049376 > Block belongs to: /logdata/test.lzo > No. of Expected Replica: 9 > No. of live Replica: 5 > No. of excess Replica: 0 > No. of stale Replica: 0 > No. of decommissioned Replica: 0 > No. of decommissioning Replica: 0 > No. of corrupted Replica: 0 > null > {code} > The namenode will throw a NullPointerException: > {code:java} > 2018-11-26 15:59:35,107 WARN org.apache.hadoop.hdfs.server.namenode.NameNode: > Fsck on blockId 'blk_-9223372036800049376 > java.lang.NullPointerException > at > org.apache.hadoop.hdfs.server.namenode.NamenodeFsck.blockIdCK(NamenodeFsck.java:270) > at > org.apache.hadoop.hdfs.server.namenode.NamenodeFsck.fsck(NamenodeFsck.java:313) > at > org.apache.hadoop.hdfs.server.namenode.FsckServlet$1.run(FsckServlet.java:67) > {code}
[jira] [Commented] (HDFS-10586) Erasure Code misfunctions when 3 DataNode down
[ https://issues.apache.org/jira/browse/HDFS-10586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16662600#comment-16662600 ] Aihua Xu commented on HDFS-10586: - [~gaoshbj] This is a very old jira. I'm wondering if you have any further update on the issue. > Erasure Code misfunctions when 3 DataNode down > -- > > Key: HDFS-10586 > URL: https://issues.apache.org/jira/browse/HDFS-10586 > Project: Hadoop HDFS > Issue Type: Bug > Components: erasure-coding >Affects Versions: 3.0.0-alpha1 > Environment: 9 DataNode and 1 NameNode,Erasured code policy is > set as "6--3", When 3 DataNode down, erasured code fails and an exception > is thrown >Reporter: gao shan >Priority: Major > > The following is the steps to reproduce: > 1) hadoop fs -mkdir /ec > 2) set erasured code policy as "6-3" > 3) "write" data by : > time hadoop jar > /opt/hadoop/hadoop-3.0.0-SNAPSHOT/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-3.0.0-SNAPSHOT.jar > TestDFSIO -D test.build.data=/ec -write -nrFiles 30 -fileSize 12288 > -bufferSize 1073741824 > 4) Manually down 3 nodes. Kill the threads of "datanode" and "nodemanager" > in 3 DataNode. 
> 5) "read" the data back by: > time hadoop jar > /opt/hadoop/hadoop-3.0.0-SNAPSHOT/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-3.0.0-SNAPSHOT.jar > TestDFSIO -D test.build.data=/ec -read -nrFiles 30 -fileSize 12288 > -bufferSize 1073741824 > then the failure occurs and the exception is thrown as: > INFO mapreduce.Job: Task Id : attempt_1465445965249_0008_m_34_2, Status : > FAILED > Error: java.io.IOException: 4 missing blocks, the stripe is: Offset=0, > length=8388608, fetchedChunksNum=0, missingChunksNum=4 > at > org.apache.hadoop.hdfs.DFSStripedInputStream$StripeReader.checkMissingBlocks(DFSStripedInputStream.java:614) > at > org.apache.hadoop.hdfs.DFSStripedInputStream$StripeReader.readParityChunks(DFSStripedInputStream.java:647) > at > org.apache.hadoop.hdfs.DFSStripedInputStream$StripeReader.readStripe(DFSStripedInputStream.java:762) > at > org.apache.hadoop.hdfs.DFSStripedInputStream.readOneStripe(DFSStripedInputStream.java:316) > at > org.apache.hadoop.hdfs.DFSStripedInputStream.readWithStrategy(DFSStripedInputStream.java:450) > at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:941) > at java.io.DataInputStream.read(DataInputStream.java:149) > at org.apache.hadoop.fs.TestDFSIO$ReadMapper.doIO(TestDFSIO.java:531) > at org.apache.hadoop.fs.TestDFSIO$ReadMapper.doIO(TestDFSIO.java:508) > at org.apache.hadoop.fs.IOMapperBase.map(IOMapperBase.java:134) > at org.apache.hadoop.fs.IOMapperBase.map(IOMapperBase.java:37) > at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54) > at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453) > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) > at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1669) > at 
org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)
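For context on why the read gives up: a "6-3" policy stores 6 data plus 3 parity chunks per stripe, so a stripe stays readable only while at most 3 chunks are missing, yet the exception reports missingChunksNum=4 — one more than the parity count can cover even though only 3 DataNodes went down, which is the bug. A tiny illustrative check (not Hadoop code):

```java
// Sketch of RS(6,3) stripe recoverability (illustrative helper, not Hadoop's
// DFSStripedInputStream logic).
public class EcRecoverability {
    public static final int DATA_UNITS = 6;
    public static final int PARITY_UNITS = 3;

    // A stripe is readable as long as any 6 of its 9 internal chunks survive,
    // i.e. at most 3 chunks (data or parity) are missing.
    public static boolean isRecoverable(int missingChunks) {
        return missingChunks <= PARITY_UNITS;
    }

    public static void main(String[] args) {
        System.out.println(isRecoverable(3)); // 3 nodes down: should still read
        System.out.println(isRecoverable(4)); // the reported stripe: unreadable
    }
}
```

So with 9 DataNodes and 3 down, each stripe should lose at most 3 chunks and remain readable; a stripe seeing 4 missing chunks points at a block-placement or reader-side accounting problem rather than genuine data loss.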
[jira] [Created] (HDFS-10385) LocalFileSystem rename() function should return false when destination file exists
Aihua Xu created HDFS-10385: --- Summary: LocalFileSystem rename() function should return false when destination file exists Key: HDFS-10385 URL: https://issues.apache.org/jira/browse/HDFS-10385 Project: Hadoop HDFS Issue Type: Bug Components: fs Affects Versions: 2.6.0 Reporter: Aihua Xu Currently rename() of LocalFileSystem returns true and renames successfully when the destination file exists. That differs from the behavior of DFSFileSystem. If they behaved the same, a single rename() call would suffice, rather than first checking whether the destination exists and then calling rename().
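The no-clobber contract the issue asks for can be sketched with plain java.nio, which already refuses to replace an existing destination unless REPLACE_EXISTING is passed. This is illustrative only — LocalFileSystem's real rename path is different — but it shows the semantics a fix would expose:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch of the desired rename semantics: return false instead of silently
// overwriting when the destination already exists, matching what the issue
// reports the DFS side already does.
public class RenameSemantics {
    public static boolean renameNoClobber(Path src, Path dst) {
        try {
            Files.move(src, dst); // no REPLACE_EXISTING: refuses if dst exists
            return true;
        } catch (FileAlreadyExistsException e) {
            return false; // destination exists: report failure, don't overwrite
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Demo: renaming onto an existing file returns false under this contract.
    public static boolean demoDstExists() {
        try {
            Path dir = Files.createTempDirectory("rename-demo");
            Path src = Files.createFile(dir.resolve("src"));
            Path dst = Files.createFile(dir.resolve("dst"));
            return renameNoClobber(src, dst);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demoDstExists()); // prints false
    }
}
```

With this shape, callers get the HDFS-style contract from a single call instead of an exists() check followed by rename().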