[jira] [Created] (HDFS-14498) LeaseManager can loop forever on the file for which create has failed

2019-05-16 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created HDFS-14498:
---

 Summary: LeaseManager can loop forever on the file for which 
create has failed 
 Key: HDFS-14498
 URL: https://issues.apache.org/jira/browse/HDFS-14498
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.9.0
Reporter: Sergey Shelukhin


The logs from the file creation are long gone due to the infinite lease logging; however, the create presumably failed... the client that was trying to write this file is 
definitely long dead.
The version includes HDFS-4882.
We get this log pattern repeating infinitely:
{noformat}
2019-05-16 14:00:16,893 INFO [org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor@b27557f] org.apache.hadoop.hdfs.server.namenode.LeaseManager: [Lease.  Holder: DFSClient_NONMAPREDUCE_-20898906_61, pending creates: 1] has expired hard limit
2019-05-16 14:00:16,893 INFO [org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor@b27557f] org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Recovering [Lease.  Holder: DFSClient_NONMAPREDUCE_-20898906_61, pending creates: 1], src=
2019-05-16 14:00:16,893 WARN [org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor@b27557f] org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.internalReleaseLease: Failed to release lease for file . Committed blocks are waiting to be minimally replicated. Try again later.
2019-05-16 14:00:16,893 WARN [org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor@b27557f] org.apache.hadoop.hdfs.server.namenode.LeaseManager: Cannot release the path  in the lease [Lease.  Holder: DFSClient_NONMAPREDUCE_-20898906_61, pending creates: 1]. It will be retried.
org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: DIR* NameSystem.internalReleaseLease: Failed to release lease for file . Committed blocks are waiting to be minimally replicated. Try again later.
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.internalReleaseLease(FSNamesystem.java:3357)
at org.apache.hadoop.hdfs.server.namenode.LeaseManager.checkLeases(LeaseManager.java:573)
at org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor.run(LeaseManager.java:509)
at java.lang.Thread.run(Thread.java:745)



$ grep -c "Recovering.*DFSClient_NONMAPREDUCE_-20898906_61, pending creates: 1" hdfs_nn*
hdfs_nn.log:1068035
hdfs_nn.log.2019-05-16-14:1516179
hdfs_nn.log.2019-05-16-15:1538350
{noformat}

Aside from an actual bug fix, it might make sense to make LeaseManager log less, in case there are more bugs like this...
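Such throttling could look roughly like this (a hypothetical sketch in plain Java, not the actual LeaseManager code; all names are illustrative):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: emit a given log line at most once per interval,
// so a stuck lease that is retried forever cannot flood the NN log.
class LogThrottler {
  private final long intervalMillis;
  private final ConcurrentHashMap<String, AtomicLong> lastLogged =
      new ConcurrentHashMap<>();

  LogThrottler(long intervalMillis) {
    this.intervalMillis = intervalMillis;
  }

  // Returns true if the caller should emit the log line keyed by 'key' now.
  boolean shouldLog(String key, long nowMillis) {
    AtomicLong last = lastLogged.computeIfAbsent(key, k -> new AtomicLong(Long.MIN_VALUE));
    long prev = last.get();
    if ((prev == Long.MIN_VALUE || nowMillis - prev >= intervalMillis)
        && last.compareAndSet(prev, nowMillis)) {
      return true;
    }
    return false;
  }
}
```

The lease-recovery WARN above, keyed by lease holder, would then be printed once per interval with the repeats silently dropped.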



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-14387) create a client-side override for dfs.namenode.block-placement-policy.default.prefer-local-node

2019-03-22 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created HDFS-14387:
---

 Summary: create a client-side override for 
dfs.namenode.block-placement-policy.default.prefer-local-node 
 Key: HDFS-14387
 URL: https://issues.apache.org/jira/browse/HDFS-14387
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Sergey Shelukhin


It should be possible for a service to decide whether it wants to use the local-node 
preference. As it stands, if 
dfs.namenode.block-placement-policy.default.prefer-local-node is enabled, 
services that run far fewer instances than there are DNs in the cluster 
unnecessarily concentrate their write load; the only way around this seems to be 
to disable prefer-local-node globally.
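A client-side override would presumably look something like the following in the writing service's configuration. Note that the key name below is hypothetical; this is precisely the setting being requested, not one that exists:

```xml
<!-- Hypothetical client-side key; does not exist yet. The existing
     dfs.namenode.block-placement-policy.default.prefer-local-node setting
     is NameNode-wide, which is the problem described above. -->
<property>
  <name>dfs.client.block-placement.prefer-local-node</name>
  <value>false</value>
</property>
```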






RE: DFSClient/DistributedFileSystem fault injection?

2019-02-13 Thread Sergey Shelukhin
Yeah, trying to do the injection client-side to avoid disruption to other users 
(and having to deploy/reconfigure HDFS).
I was hoping someone had already created that :) We will probably create it at 
some point and may try to submit a patch later.

-Original Message-
From: Stephen Loughran  
Sent: Tuesday, February 12, 2019 3:33 PM
To: Sergey Shelukhin 
Cc: hdfs-dev@hadoop.apache.org
Subject: Re: DFSClient/DistributedFileSystem fault injection?

Sergey - are you trying to simulate failures client-side, or do you have an NN which 
actually injects failures all the way up the IPC stack?

If it's just the client, couldn't registering a fault-injecting client as 
fs.hdfs.impl do that?
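That wrapper idea could be modeled, very roughly, like this (a minimal stdlib sketch; a real implementation would extend org.apache.hadoop.fs.FilterFileSystem and be registered under a custom scheme, and all names here are illustrative):

```java
import java.io.IOException;
import java.util.Random;

// Rough, illustrative model of the delegating-wrapper idea using a
// minimal interface standing in for FileSystem. Everything here,
// including the names, is hypothetical.
interface MiniFs {
  byte[] open(String path) throws IOException;
}

class FaultInjectingFs implements MiniFs {
  private final MiniFs delegate;
  private final double probability; // chance of throwing before each call
  private final Random random;

  FaultInjectingFs(MiniFs delegate, double probability, long seed) {
    this.delegate = delegate;
    this.probability = probability;
    this.random = new Random(seed);
  }

  @Override
  public byte[] open(String path) throws IOException {
    if (random.nextDouble() < probability) {
      throw new IOException("Injected fault opening " + path);
    }
    return delegate.open(path); // no fault this time: pass through
  }
}
```

Delays could be injected the same way (a sleep instead of a throw), with the probability and delay read from configuration.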

FWIW, in the s3a connector we have the "inconsistent" s3 client which mimics 
some symptoms of delayed consistency; it has a path, a probability of happening, 
and a delay before things become visible. This is in the main hadoop-aws JAR, 
and is turned on by a configuration switch (yes, it prints a big warning). With 
a single switch to turn it on, it's trivial to enable in tests.

On Mon, Feb 11, 2019 at 11:42 PM Sergey Shelukhin wrote:

> Hi.
> I've been looking for a client-side solution for fault injection in HDFS.
> We had a naturally unstable HDFS cluster that helped uncover a lot of 
> issues in HBase; now that it has been stabilized, we miss it already 
> :)
>
> To keep testing without actually disrupting others' use of HDFS or 
> having to deploy a new version, I was thinking about having a 
> client-side schema (e.g. fhdfs) map to a wrapper over the standard DFS 
> that would inject failures and delays according to some configs, 
> similar to
> https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-hdfs/FaultInjectFramework.html
>
> However I wonder if something like this exists already?
>


RE: DFSClient/DistributedFileSystem fault injection?

2019-02-12 Thread Sergey Shelukhin
Adding the user list :)

-Original Message-
From: Sergey Shelukhin 
Sent: Monday, February 11, 2019 3:42 PM
To: hdfs-dev@hadoop.apache.org
Subject: DFSClient/DistributedFileSystem fault injection?

Hi.
I've been looking for a client-side solution for fault injection in HDFS.
We had a naturally unstable HDFS cluster that helped uncover a lot of issues in 
HBase; now that it has been stabilized, we miss it already :)

To keep testing without actually disrupting others' use of HDFS or having to 
deploy a new version, I was thinking about having a client-side schema (e.g. 
fhdfs) map to a wrapper over the standard DFS that would inject failures and 
delays according to some configs, similar to 
https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-hdfs/FaultInjectFramework.html
 

However I wonder if something like this exists already?




DFSClient/DistributedFileSystem fault injection?

2019-02-11 Thread Sergey Shelukhin
Hi.
I've been looking for a client-side solution for fault injection in HDFS.
We had a naturally unstable HDFS cluster that helped uncover a lot of issues in 
HBase; now that it has been stabilized, we miss it already :)

To keep testing without actually disrupting others' use of HDFS or having to 
deploy a new version, I was thinking about having a client-side schema (e.g. 
fhdfs) map to a wrapper over the standard DFS that would inject failures and 
delays according to some configs, similar to 
https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-hdfs/FaultInjectFramework.html
 

However I wonder if something like this exists already?




[jira] [Created] (HDFS-10757) KMSClientProvider combined with KeyProviderCache results in wrong UGI being used

2016-08-11 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created HDFS-10757:
---

 Summary: KMSClientProvider combined with KeyProviderCache results 
in wrong UGI being used
 Key: HDFS-10757
 URL: https://issues.apache.org/jira/browse/HDFS-10757
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Sergey Shelukhin
Priority: Critical


ClientContext::get retrieves the context from a cache via a name derived from a config setting; the KeyProviderCache stored in the ClientContext then retrieves the key provider cached by a URI that is also stored in the configuration.
KMSClientProvider caches the UGI (actualUgi) in its constructor; that means, in particular, 
that all users of DFS with KMSClientProvider in a process will get the KMS 
token (along with other credentials) of the first user...

Either KMSClientProvider shouldn't store the UGI, or one of the caches should 
be UGI-aware, like the FS object cache.
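As an illustration of what "UGI-aware" could mean for one of the caches, the cache key could include the current user as well as the URI (a hypothetical sketch, not the actual Hadoop classes):

```java
import java.util.Objects;

// Illustrative sketch only: cache key providers by (URI, user) instead of
// URI alone, so one user's provider (and its credentials/KMS token) is
// never handed to a different user in the same process.
final class ProviderKey {
  final String uri;
  final String user; // e.g. UserGroupInformation.getCurrentUser().getUserName()

  ProviderKey(String uri, String user) {
    this.uri = uri;
    this.user = user;
  }

  @Override
  public boolean equals(Object o) {
    if (!(o instanceof ProviderKey)) return false;
    ProviderKey k = (ProviderKey) o;
    return uri.equals(k.uri) && user.equals(k.user);
  }

  @Override
  public int hashCode() {
    return Objects.hash(uri, user);
  }
}
```

This mirrors how the FS object cache keys on the UGI, as suggested above.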






[jira] [Created] (HDFS-10414) allow disabling trash on per-directory basis

2016-05-16 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created HDFS-10414:
---

 Summary: allow disabling trash on per-directory basis
 Key: HDFS-10414
 URL: https://issues.apache.org/jira/browse/HDFS-10414
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Sergey Shelukhin


For ETL, it might be useful to disable trash for certain directories only, to 
avoid the overhead, while keeping it enabled for the rest of the cluster.






[jira] [Resolved] (HDFS-9567) LlapServiceDriver can fail if only the packaged logger config is present

2015-12-16 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin resolved HDFS-9567.

Resolution: Invalid

Wrong project

> LlapServiceDriver can fail if only the packaged logger config is present
> 
>
> Key: HDFS-9567
> URL: https://issues.apache.org/jira/browse/HDFS-9567
> Project: Hadoop HDFS
>  Issue Type: Bug
>    Reporter: Sergey Shelukhin
>
> I was incrementally updating my setup on some VM and didn't have the logger 
> config file, so the packaged one was apparently picked up, which caused this:
> {noformat}
> java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative 
> path in absolute URI: 
> jar:file:/home/vagrant/llap/apache-hive-2.0.0-SNAPSHOT-bin/lib/hive-llap-server-2.0.0-SNAPSHOT.jar!/llap-daemon-log4j2.properties
>   at org.apache.hadoop.fs.Path.initialize(Path.java:205)
>   at org.apache.hadoop.fs.Path.<init>(Path.java:171)
>   at 
> org.apache.hadoop.hive.llap.cli.LlapServiceDriver.run(LlapServiceDriver.java:234)
>   at 
> org.apache.hadoop.hive.llap.cli.LlapServiceDriver.main(LlapServiceDriver.java:58)
> Caused by: java.net.URISyntaxException: Relative path in absolute URI: 
> jar:file:/home/vagrant/llap/apache-hive-2.0.0-SNAPSHOT-bin/lib/hive-llap-server-2.0.0-SNAPSHOT.jar!/llap-daemon-log4j2.properties
>   at java.net.URI.checkPath(URI.java:1823)
>   at java.net.URI.<init>(URI.java:745)
>   at org.apache.hadoop.fs.Path.initialize(Path.java:202)
>   ... 3 more
> {noformat}





[jira] [Created] (HDFS-9567) LlapServiceDriver can fail if only the packaged logger config is present

2015-12-16 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created HDFS-9567:
--

 Summary: LlapServiceDriver can fail if only the packaged logger 
config is present
 Key: HDFS-9567
 URL: https://issues.apache.org/jira/browse/HDFS-9567
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Sergey Shelukhin


I was incrementally updating my setup on some VM and didn't have the logger 
config file, so the packaged one was apparently picked up, which caused this:
{noformat}
java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: jar:file:/home/vagrant/llap/apache-hive-2.0.0-SNAPSHOT-bin/lib/hive-llap-server-2.0.0-SNAPSHOT.jar!/llap-daemon-log4j2.properties
at org.apache.hadoop.fs.Path.initialize(Path.java:205)
at org.apache.hadoop.fs.Path.<init>(Path.java:171)
at org.apache.hadoop.hive.llap.cli.LlapServiceDriver.run(LlapServiceDriver.java:234)
at org.apache.hadoop.hive.llap.cli.LlapServiceDriver.main(LlapServiceDriver.java:58)
Caused by: java.net.URISyntaxException: Relative path in absolute URI: jar:file:/home/vagrant/llap/apache-hive-2.0.0-SNAPSHOT-bin/lib/hive-llap-server-2.0.0-SNAPSHOT.jar!/llap-daemon-log4j2.properties
at java.net.URI.checkPath(URI.java:1823)
at java.net.URI.<init>(URI.java:745)
at org.apache.hadoop.fs.Path.initialize(Path.java:202)
... 3 more
{noformat}





[jira] [Created] (HDFS-7895) open and getFileInfo APIs treat paths inconsistently

2015-03-05 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created HDFS-7895:
--

 Summary: open and getFileInfo APIs treat paths inconsistently
 Key: HDFS-7895
 URL: https://issues.apache.org/jira/browse/HDFS-7895
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Jing Zhao
Priority: Minor


When open() is called with a regular HDFS path, hdfs://blah/blah/blah, it appears 
to work.
However, getFileInfo() doesn't:
{noformat}
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.fs.InvalidPathException): Invalid path name Invalid file name: hdfs://localhost:9000/apps/hive/warehouse/tpch_2.db/lineitem_orc/01_0
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getFileInfo(FSNamesystem.java:4128)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getFileInfo(NameNodeRpcServer.java:838)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:821)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)

at org.apache.hadoop.ipc.Client.call(Client.java:1468)
at org.apache.hadoop.ipc.Client.call(Client.java:1399)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
at com.sun.proxy.$Proxy16.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:752)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy17.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1988)
{noformat}

1) This seems inconsistent.
2) It's not clear why the validation should reject what looks like a valid HDFS path. 
At the least, the client code should clean the path up on the way.

[~prasanth_j] has the details, I just filed a bug so I could mention how buggy 
HDFS is to [~jingzhao] :)






[jira] [Created] (HDFS-7878) API - expose a unique file identifier

2015-03-03 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created HDFS-7878:
--

 Summary: API - expose a unique file identifier
 Key: HDFS-7878
 URL: https://issues.apache.org/jira/browse/HDFS-7878
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Sergey Shelukhin


See HDFS-487.
Even though that issue is resolved as a duplicate, the ID is actually not exposed by 
the JIRA it supposedly duplicates.
The INode ID for the file should be easy to expose; alternatively, the ID could be 
derived from block IDs, to account for appends...

This is useful, e.g., as a per-file cache key, to make sure the cache stays correct 
when a file is overwritten.
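The cache-key use case could be sketched like this (illustrative names only; fileId stands in for whatever unique identifier would get exposed):

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: cache entries keyed by (path, fileId), so that an
// overwritten file (same path, but a new inode/file ID) misses the stale
// entry instead of serving data cached for the old file.
class FileKeyedCache<V> {
  private final Map<String, V> cache = new HashMap<>();

  private static String key(String path, long fileId) {
    return fileId + ":" + path;
  }

  V get(String path, long fileId) {
    return cache.get(key(path, fileId));
  }

  void put(String path, long fileId, V value) {
    cache.put(key(path, fileId), value);
  }
}
```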






[jira] [Created] (HDFS-7825) read(ByteBuffer) method doesn't conform to its API

2015-02-23 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created HDFS-7825:
--

 Summary: read(ByteBuffer) method doesn't conform to its API
 Key: HDFS-7825
 URL: https://issues.apache.org/jira/browse/HDFS-7825
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Sergey Shelukhin


ByteBufferReadable::read(ByteBuffer) javadoc says:
{noformat}
After a successful call, buf.position() and buf.limit() should be unchanged, 
and therefore any data can be immediately read from buf. buf.mark() may be 
cleared or updated. 
{noformat}

I have the following code: 
{noformat}
ByteBuffer directBuf = ByteBuffer.allocateDirect(len);
int pos = directBuf.position();

int count = file.read(directBuf);
if (count < 0) throw new EOFException();
if (directBuf.position() != pos) {
  RecordReaderImpl.LOG.info("Warning - position mismatch from " + file.getClass()
      + ": after reading " + count + ", expected " + pos
      + " but got " + directBuf.position());
}
{noformat}

and I get:

{noformat}
15/02/23 15:30:56 [pool-4-thread-1] INFO orc.RecordReaderImpl : Warning - position mismatch from class org.apache.hadoop.hdfs.client.HdfsDataInputStream: after reading 6, expected 0 but got 6
{noformat}
So the position is changed, contrary to what the API doc indicates.

Also, while I haven't verified it yet, it may be that a 0-length read is not 
handled properly.
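Until the contract is honored, a caller-side guard can save and restore the position (a sketch; the "read" below is simulated through a small interface, since the real HdfsDataInputStream is what exhibits the bug):

```java
import java.nio.ByteBuffer;

// Sketch of a caller-side workaround: restore buf.position() after a read,
// for streams that advance the position in violation of the
// ByteBufferReadable javadoc quoted above.
class PositionGuard {
  interface BufferReader {
    int read(ByteBuffer buf); // stand-in for ByteBufferReadable.read
  }

  static int readPreservingPosition(BufferReader reader, ByteBuffer buf) {
    int pos = buf.position();
    int count = reader.read(buf);
    if (count >= 0 && buf.position() != pos) {
      buf.position(pos); // restore, so data can be read from the original position
    }
    return count;
  }
}
```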





[jira] [Created] (HDFS-5916) provide API to bulk delete directories/files

2014-02-09 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created HDFS-5916:
--

 Summary: provide API to bulk delete directories/files
 Key: HDFS-5916
 URL: https://issues.apache.org/jira/browse/HDFS-5916
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Sergey Shelukhin


It would be nice to have an API to delete directories and files in bulk - for 
example, when deleting Hive partitions or HBase regions in large numbers, the 
code could avoid many trips to the NN.
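The requested API might take a shape like the following (purely hypothetical; no such method exists in HDFS, which is the point of this request):

```java
import java.io.IOException;
import java.util.List;

// Hypothetical shape of a bulk-delete call: one RPC carrying many paths
// instead of one round trip per path. Illustrative only.
interface BulkDeletingFs {
  // Returns per-path success flags, so partial failures remain visible
  // to the caller (e.g. Hive dropping many partitions at once).
  boolean[] deleteAll(List<String> paths, boolean recursive) throws IOException;
}
```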


