[jira] [Created] (HDFS-14498) LeaseManager can loop forever on the file for which create has failed
Sergey Shelukhin created HDFS-14498:
---
Summary: LeaseManager can loop forever on the file for which create has failed
Key: HDFS-14498
URL: https://issues.apache.org/jira/browse/HDFS-14498
Project: Hadoop HDFS
Issue Type: Bug
Affects Versions: 2.9.0
Reporter: Sergey Shelukhin

The logs from the file creation are long gone due to the infinite lease logging; however, the create presumably failed, and the client that was trying to write this file is definitely long dead. The version includes HDFS-4882.

We get this log pattern repeating indefinitely:

{noformat}
2019-05-16 14:00:16,893 INFO [org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor@b27557f] org.apache.hadoop.hdfs.server.namenode.LeaseManager: [Lease. Holder: DFSClient_NONMAPREDUCE_-20898906_61, pending creates: 1] has expired hard limit
2019-05-16 14:00:16,893 INFO [org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor@b27557f] org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Recovering [Lease. Holder: DFSClient_NONMAPREDUCE_-20898906_61, pending creates: 1], src=
2019-05-16 14:00:16,893 WARN [org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor@b27557f] org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.internalReleaseLease: Failed to release lease for file . Committed blocks are waiting to be minimally replicated. Try again later.
2019-05-16 14:00:16,893 WARN [org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor@b27557f] org.apache.hadoop.hdfs.server.namenode.LeaseManager: Cannot release the path in the lease [Lease. Holder: DFSClient_NONMAPREDUCE_-20898906_61, pending creates: 1]. It will be retried.
org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: DIR* NameSystem.internalReleaseLease: Failed to release lease for file . Committed blocks are waiting to be minimally replicated. Try again later.
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.internalReleaseLease(FSNamesystem.java:3357)
        at org.apache.hadoop.hdfs.server.namenode.LeaseManager.checkLeases(LeaseManager.java:573)
        at org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor.run(LeaseManager.java:509)
        at java.lang.Thread.run(Thread.java:745)

$ grep -c "Recovering.*DFSClient_NONMAPREDUCE_-20898906_61, pending creates: 1" hdfs_nn*
hdfs_nn.log:1068035
hdfs_nn.log.2019-05-16-14:1516179
hdfs_nn.log.2019-05-16-15:1538350
{noformat}

Aside from an actual bug fix, it might make sense to make LeaseManager not log so much, in case there are more bugs like this...

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
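The throttling idea at the end of the report could be sketched as a per-key rate limiter consulted before each repeated lease-recovery message. This is a minimal illustration of the suggestion, not the actual LeaseManager API; the class and method names are hypothetical:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical per-key log rate limiter: a lease monitor could check
// shouldLog(holderName, now) before emitting another "It will be retried"
// message, so a stuck lease logs at most once per interval.
class LogRateLimiter {
    private final long minIntervalMs;
    private final Map<String, Long> lastLogged = new ConcurrentHashMap<>();

    LogRateLimiter(long minIntervalMs) {
        this.minIntervalMs = minIntervalMs;
    }

    /** Returns true if a message for this key should be emitted now. */
    boolean shouldLog(String key, long nowMs) {
        Long prev = lastLogged.get(key);
        if (prev != null && nowMs - prev < minIntervalMs) {
            return false; // suppressed: this key logged too recently
        }
        lastLogged.put(key, nowMs);
        return true;
    }
}
```

With a one-minute interval, the millions of identical lines seen in the grep counts above would collapse to a handful per lease holder.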
[jira] [Created] (HDFS-14387) create a client-side override for dfs.namenode.block-placement-policy.default.prefer-local-node
Sergey Shelukhin created HDFS-14387:
---
Summary: create a client-side override for dfs.namenode.block-placement-policy.default.prefer-local-node
Key: HDFS-14387
URL: https://issues.apache.org/jira/browse/HDFS-14387
Project: Hadoop HDFS
Issue Type: Bug
Reporter: Sergey Shelukhin

It should be possible for a service to decide whether it wants to use the local-node preference. As it stands, if dfs.namenode.block-placement-policy.default.prefer-local-node is enabled, services that run far fewer instances than there are DNs in the cluster unnecessarily concentrate their write load; the only way around it seems to be to disable prefer-local-node globally.
RE: DFSClient/DistributedFileSystem fault injection?
Yeah, trying to do the injection client-side to avoid disruption to other users (and having to deploy/reconfigure HDFS). I was hoping someone had already created that :) We will probably create it at some point and may try to submit a patch later.

-Original Message-
From: Stephen Loughran
Sent: Tuesday, February 12, 2019 3:33 PM
To: Sergey Shelukhin
Cc: hdfs-dev@hadoop.apache.org
Subject: Re: DFSClient/DistributedFileSystem fault injection?

Sergey - are you trying to simulate failures client side, or do you have an NN which actually injects failures all the way up the IPC stack? If it's just the client, registering a fault-injecting client as fs.hdfs.impl could do that.

FWIW, in the s3a connector we have the "inconsistent" s3 client, which mimics some symptoms of delayed consistency; it has a path, a probability of happening, and a delay before things become visible. This is in the main hadoop-aws JAR and is turned on by a configuration switch (yes, it prints a big warning). With a single switch to turn it on, it's trivial to enable in tests.

On Mon, Feb 11, 2019 at 11:42 PM Sergey Shelukhin wrote:
> Hi.
> I've been looking for a client-side solution for fault injection in HDFS.
> We had a naturally unstable HDFS cluster that helped uncover a lot of
> issues in HBase; now that it has been stabilized, we miss it already :)
>
> To keep testing without actually disrupting others' use of HDFS or
> having to deploy a new version, I was thinking about having a
> client-side scheme (e.g. fhdfs) map to a wrapper over the standard DFS
> that would inject failures and delays according to some configs,
> similar to
> https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-hdfs/FaultInjectFramework.html
>
> However, I wonder if something like this exists already?
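The wrapper idea discussed in this thread, a delegating client registered under its own scheme that fails calls with a configured probability, can be sketched with a toy interface standing in for DistributedFileSystem. Everything here is hypothetical illustration: a real version would extend Hadoop's FilterFileSystem and be registered via an fs.<scheme>.impl configuration key, as Stephen suggests.

```java
import java.io.IOException;
import java.util.Random;

// Toy stand-in for the filesystem surface being wrapped (hypothetical).
interface MiniClient {
    byte[] read(String path) throws IOException;
}

// Delegating wrapper that injects IOExceptions with a configured
// probability before forwarding to the real client.
class FaultInjectingClient implements MiniClient {
    private final MiniClient delegate;
    private final double failProbability;
    private final Random random;

    FaultInjectingClient(MiniClient delegate, double failProbability, long seed) {
        this.delegate = delegate;
        this.failProbability = failProbability;
        this.random = new Random(seed); // seeded so test runs are reproducible
    }

    @Override
    public byte[] read(String path) throws IOException {
        if (random.nextDouble() < failProbability) {
            throw new IOException("injected fault for " + path);
        }
        return delegate.read(path);
    }
}
```

A delay injector would follow the same shape, sleeping before delegating instead of throwing; the probability, delay, and affected-path pattern would come from client-side configuration, similar to the s3a "inconsistent" client described above.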
RE: DFSClient/DistributedFileSystem fault injection?
Adding the user list :)

-Original Message-
From: Sergey Shelukhin
Sent: Monday, February 11, 2019 3:42 PM
To: hdfs-dev@hadoop.apache.org
Subject: DFSClient/DistributedFileSystem fault injection?

Hi.
I've been looking for a client-side solution for fault injection in HDFS. We had a naturally unstable HDFS cluster that helped uncover a lot of issues in HBase; now that it has been stabilized, we miss it already :)

To keep testing without actually disrupting others' use of HDFS or having to deploy a new version, I was thinking about having a client-side scheme (e.g. fhdfs) map to a wrapper over the standard DFS that would inject failures and delays according to some configs, similar to
https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-hdfs/FaultInjectFramework.html

However, I wonder if something like this exists already?
DFSClient/DistributedFileSystem fault injection?
Hi.
I've been looking for a client-side solution for fault injection in HDFS. We had a naturally unstable HDFS cluster that helped uncover a lot of issues in HBase; now that it has been stabilized, we miss it already :)

To keep testing without actually disrupting others' use of HDFS or having to deploy a new version, I was thinking about having a client-side scheme (e.g. fhdfs) map to a wrapper over the standard DFS that would inject failures and delays according to some configs, similar to
https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-hdfs/FaultInjectFramework.html

However, I wonder if something like this exists already?
[jira] [Created] (HDFS-10757) KMSClientProvider combined with KeyProviderCache results in wrong UGI being used
Sergey Shelukhin created HDFS-10757:
---
Summary: KMSClientProvider combined with KeyProviderCache results in wrong UGI being used
Key: HDFS-10757
URL: https://issues.apache.org/jira/browse/HDFS-10757
Project: Hadoop HDFS
Issue Type: Bug
Reporter: Sergey Shelukhin
Priority: Critical

ClientContext::get gets the context from a cache via a name based on a config setting; then the KeyProviderCache stored in ClientContext gets the key provider cached by a URI that is also stored in the configuration. KMSClientProvider caches the UGI (actualUgi) in its constructor; that means in particular that all users of DFS with KMSClientProvider in a process will get the KMS token (along with other credentials) of the first user... Either KMSClientProvider shouldn't store the UGI, or one of the caches should be UGI-aware, like the FS object cache.
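The second fix direction suggested above, making a cache UGI-aware the way the FileSystem cache is, amounts to keying entries on (URI, user) instead of URI alone. A minimal sketch, with a plain string standing in for the UGI and a generic value type standing in for the real KMS provider classes (all names are illustrative):

```java
import java.net.URI;
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// UGI-aware provider cache sketch: the same KMS URI yields a distinct
// provider per user, so one user's cached credentials are never handed
// to another user in the same process.
class ProviderCache<P> {
    static final class Key {
        final URI uri;
        final String user;
        Key(URI uri, String user) { this.uri = uri; this.user = user; }
        @Override public boolean equals(Object o) {
            if (!(o instanceof Key)) return false;
            Key k = (Key) o;
            return uri.equals(k.uri) && user.equals(k.user);
        }
        @Override public int hashCode() { return Objects.hash(uri, user); }
    }

    private final Map<Key, P> cache = new ConcurrentHashMap<>();

    /** Returns the cached provider for (uri, user), creating it on first use. */
    P get(URI uri, String user, Function<Key, P> factory) {
        return cache.computeIfAbsent(new Key(uri, user), factory);
    }
}
```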
[jira] [Created] (HDFS-10414) allow disabling trash on per-directory basis
Sergey Shelukhin created HDFS-10414:
---
Summary: allow disabling trash on per-directory basis
Key: HDFS-10414
URL: https://issues.apache.org/jira/browse/HDFS-10414
Project: Hadoop HDFS
Issue Type: Improvement
Reporter: Sergey Shelukhin

For ETL, it might be useful to disable trash for certain directories only, to avoid the overhead, while keeping it enabled for the rest of the cluster.
[jira] [Resolved] (HDFS-9567) LlapServiceDriver can fail if only the packaged logger config is present
[ https://issues.apache.org/jira/browse/HDFS-9567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergey Shelukhin resolved HDFS-9567.
---
Resolution: Invalid

Wrong project

> LlapServiceDriver can fail if only the packaged logger config is present
>
> Key: HDFS-9567
> URL: https://issues.apache.org/jira/browse/HDFS-9567
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Sergey Shelukhin
>
> I was incrementally updating my setup on some VM and didn't have the logger
> config file, so the packaged one was picked up apparently, which caused this:
> {noformat}
> java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative
> path in absolute URI:
> jar:file:/home/vagrant/llap/apache-hive-2.0.0-SNAPSHOT-bin/lib/hive-llap-server-2.0.0-SNAPSHOT.jar!/llap-daemon-log4j2.properties
> at org.apache.hadoop.fs.Path.initialize(Path.java:205)
> at org.apache.hadoop.fs.Path.<init>(Path.java:171)
> at
> org.apache.hadoop.hive.llap.cli.LlapServiceDriver.run(LlapServiceDriver.java:234)
> at
> org.apache.hadoop.hive.llap.cli.LlapServiceDriver.main(LlapServiceDriver.java:58)
> Caused by: java.net.URISyntaxException: Relative path in absolute URI:
> jar:file:/home/vagrant/llap/apache-hive-2.0.0-SNAPSHOT-bin/lib/hive-llap-server-2.0.0-SNAPSHOT.jar!/llap-daemon-log4j2.properties
> at java.net.URI.checkPath(URI.java:1823)
> at java.net.URI.<init>(URI.java:745)
> at org.apache.hadoop.fs.Path.initialize(Path.java:202)
> ... 3 more
> {noformat}
[jira] [Created] (HDFS-9567) LlapServiceDriver can fail if only the packaged logger config is present
Sergey Shelukhin created HDFS-9567:
--
Summary: LlapServiceDriver can fail if only the packaged logger config is present
Key: HDFS-9567
URL: https://issues.apache.org/jira/browse/HDFS-9567
Project: Hadoop HDFS
Issue Type: Bug
Reporter: Sergey Shelukhin

I was incrementally updating my setup on some VM and didn't have the logger config file, so the packaged one was picked up apparently, which caused this:

{noformat}
java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: jar:file:/home/vagrant/llap/apache-hive-2.0.0-SNAPSHOT-bin/lib/hive-llap-server-2.0.0-SNAPSHOT.jar!/llap-daemon-log4j2.properties
        at org.apache.hadoop.fs.Path.initialize(Path.java:205)
        at org.apache.hadoop.fs.Path.<init>(Path.java:171)
        at org.apache.hadoop.hive.llap.cli.LlapServiceDriver.run(LlapServiceDriver.java:234)
        at org.apache.hadoop.hive.llap.cli.LlapServiceDriver.main(LlapServiceDriver.java:58)
Caused by: java.net.URISyntaxException: Relative path in absolute URI: jar:file:/home/vagrant/llap/apache-hive-2.0.0-SNAPSHOT-bin/lib/hive-llap-server-2.0.0-SNAPSHOT.jar!/llap-daemon-log4j2.properties
        at java.net.URI.checkPath(URI.java:1823)
        at java.net.URI.<init>(URI.java:745)
        at org.apache.hadoop.fs.Path.initialize(Path.java:202)
        ... 3 more
{noformat}
[jira] [Created] (HDFS-7895) open and getFileInfo APIs treat paths inconsistently
Sergey Shelukhin created HDFS-7895:
--
Summary: open and getFileInfo APIs treat paths inconsistently
Key: HDFS-7895
URL: https://issues.apache.org/jira/browse/HDFS-7895
Project: Hadoop HDFS
Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Jing Zhao
Priority: Minor

When open() is called with a regular HDFS path, hdfs://blah/blah/blah, it appears to work. However, getFileInfo doesn't:

{noformat}
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.fs.InvalidPathException): Invalid path name Invalid file name: hdfs://localhost:9000/apps/hive/warehouse/tpch_2.db/lineitem_orc/01_0
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getFileInfo(FSNamesystem.java:4128)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getFileInfo(NameNodeRpcServer.java:838)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:821)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)
        at org.apache.hadoop.ipc.Client.call(Client.java:1468)
        at org.apache.hadoop.ipc.Client.call(Client.java:1399)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
        at com.sun.proxy.$Proxy16.getFileInfo(Unknown Source)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:752)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
        at com.sun.proxy.$Proxy17.getFileInfo(Unknown Source)
        at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1988)
{noformat}

1) This seems inconsistent.
2) It's not clear why the validation should reject what looks like a valid HDFS path. At the least, client code should clean this up on the way.

[~prasanth_j] has the details; I just filed a bug so I could mention how buggy HDFS is to [~jingzhao] :)
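The client-side cleanup suggested in point 2 would amount to stripping the scheme and authority from a fully qualified URI before the path reaches the NameNode. A minimal sketch of that normalization (this is an illustration of the idea, not the actual DFSClient logic):

```java
import java.net.URI;

// Hypothetical helper: turn "hdfs://localhost:9000/apps/x" into "/apps/x"
// so the server-side path validation never sees the scheme/authority.
class PathCleaner {
    /** Fully qualified URIs lose scheme+authority; bare paths pass through. */
    static String toNameNodePath(String raw) {
        URI uri = URI.create(raw);
        if (uri.getScheme() == null) {
            return raw; // already a plain path
        }
        String path = uri.getPath();
        return path.isEmpty() ? "/" : path;
    }
}
```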
[jira] [Created] (HDFS-7878) API - expose a unique file identifier
Sergey Shelukhin created HDFS-7878:
--
Summary: API - expose a unique file identifier
Key: HDFS-7878
URL: https://issues.apache.org/jira/browse/HDFS-7878
Project: Hadoop HDFS
Issue Type: Improvement
Reporter: Sergey Shelukhin

See HDFS-487. Even though that one is resolved as a duplicate, the ID is actually not exposed by the JIRA it supposedly duplicates. The INode ID for the file should be easy to expose; alternatively, the ID could be derived from block IDs, to account for appends... This is useful, e.g., as a per-file cache key, to make sure the cache stays correct when a file is overwritten.
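The caching use case described above could be sketched as a key that combines the path with the proposed identifier, so an overwrite (same path, new ID) misses the cache instead of returning stale data. The long `fileId` stands in for the INode ID the issue asks HDFS to expose; the class is hypothetical:

```java
import java.util.Objects;

// Hypothetical cache key: equal only when both the path and the unique
// file ID match, so entries for an overwritten file are naturally stale.
class FileCacheKey {
    private final String path;
    private final long fileId;

    FileCacheKey(String path, long fileId) {
        this.path = path;
        this.fileId = fileId;
    }

    @Override public boolean equals(Object o) {
        if (!(o instanceof FileCacheKey)) return false;
        FileCacheKey k = (FileCacheKey) o;
        return fileId == k.fileId && path.equals(k.path);
    }

    @Override public int hashCode() { return Objects.hash(path, fileId); }
}
```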
[jira] [Created] (HDFS-7825) read(ByteBuffer) method doesn't conform to its API
Sergey Shelukhin created HDFS-7825:
--
Summary: read(ByteBuffer) method doesn't conform to its API
Key: HDFS-7825
URL: https://issues.apache.org/jira/browse/HDFS-7825
Project: Hadoop HDFS
Issue Type: Bug
Reporter: Sergey Shelukhin

The ByteBufferReadable::read(ByteBuffer) javadoc says:

{noformat}
After a successful call, buf.position() and buf.limit() should be unchanged, and therefore any data can be immediately read from buf. buf.mark() may be cleared or updated.
{noformat}

I have the following code:

{noformat}
ByteBuffer directBuf = ByteBuffer.allocateDirect(len);
int pos = directBuf.position();
int count = file.read(directBuf);
if (count < 0) throw new EOFException();
if (directBuf.position() != pos) {
  RecordReaderImpl.LOG.info("Warning - position mismatch from " + file.getClass()
      + ": after reading " + count + ", expected " + pos + " but got " + directBuf.position());
}
{noformat}

and I get:

{noformat}
15/02/23 15:30:56 [pool-4-thread-1] INFO orc.RecordReaderImpl : Warning - position mismatch from class org.apache.hadoop.hdfs.client.HdfsDataInputStream: after reading 6, expected 0 but got 6
{noformat}

So the position is changed, unlike what the API doc indicates. Also, while I haven't verified it yet, it may be that a 0-length read is not handled properly.
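Assuming the quoted contract is the desired behavior, it could be enforced on top of any stream with a small wrapper that records the position before reading and restores it afterward, so callers can immediately read the data back out. This is a sketch built on a plain NIO channel, not a fix inside HdfsDataInputStream:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ReadableByteChannel;

// Sketch: read into buf, then rewind buf.position() to its pre-read value,
// matching the javadoc quoted in the report ("buf.position() ... should be
// unchanged, and therefore any data can be immediately read from buf").
class PositionPreservingReader {
    /** Reads into buf and restores buf.position() to its pre-read value. */
    static int read(ReadableByteChannel ch, ByteBuffer buf) throws IOException {
        int pos = buf.position();
        int count = ch.read(buf); // a plain channel read advances position
        if (count > 0) {
            buf.position(pos); // restore so the data starts at the old position
        }
        return count;
    }
}
```

A 0-length or end-of-stream read (count <= 0) leaves the buffer untouched, which also exercises the edge case the report suspects is mishandled.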
[jira] [Created] (HDFS-5916) provide API to bulk delete directories/files
Sergey Shelukhin created HDFS-5916:
--
Summary: provide API to bulk delete directories/files
Key: HDFS-5916
URL: https://issues.apache.org/jira/browse/HDFS-5916
Project: Hadoop HDFS
Issue Type: Improvement
Reporter: Sergey Shelukhin

It would be nice to have an API to delete directories and files in bulk - for example, when deleting Hive partitions or HBase regions in large numbers, the code could avoid many trips to the NN.
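The savings the issue describes come from amortizing the per-RPC cost: instead of one NameNode call per path, a bulk API would take many paths per call. The grouping itself can be sketched client-side; the batch size and names below are illustrative, since no such bulk-delete RPC exists in the API being discussed:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: group paths into fixed-size batches so that a hypothetical
// bulk-delete RPC could remove many entries per NameNode round trip.
class DeleteBatcher {
    /** Splits paths into consecutive batches of at most batchSize. */
    static List<List<String>> batches(List<String> paths, int batchSize) {
        List<List<String>> out = new ArrayList<>();
        for (int i = 0; i < paths.size(); i += batchSize) {
            out.add(new ArrayList<>(
                paths.subList(i, Math.min(i + batchSize, paths.size()))));
        }
        return out;
    }
}
```

Deleting 10,000 Hive partitions with a batch size of 1,000 would then cost 10 round trips rather than 10,000.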