[jira] [Commented] (HADOOP-16453) Update how exceptions are handled in NetUtils.java

2019-08-04 Thread He Xiaoqiao (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16899570#comment-16899570
 ] 

He Xiaoqiao commented on HADOOP-16453:
--

+1 for [^HADOOP-16453.002.patch] from my side. Thanks [~leosun08].

> Update how exceptions are handled in NetUtils.java
> --
>
> Key: HADOOP-16453
> URL: https://issues.apache.org/jira/browse/HADOOP-16453
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Minor
> Attachments: HADOOP-16453.001.patch, HADOOP-16453.002.patch
>
>
> When there is no String constructor for the exception, we log a trace 
> message. Given that log-and-throw is not a very good approach, I think the 
> right thing would be to just not log it at all, as in HADOOP-16431.
> {code:java}
> private static <T extends IOException> T wrapWithMessage(
> T exception, String msg) throws T {
>   Class<? extends Throwable> clazz = exception.getClass();
>   try {
> Constructor<? extends Throwable> ctor =
> clazz.getConstructor(String.class);
> Throwable t = ctor.newInstance(msg);
> return (T)(t.initCause(exception));
>   } catch (Throwable e) {
> LOG.trace("Unable to wrap exception of type {}: it has no (String) "
> + "constructor", clazz, e);
> throw exception;
>   }
> }
> {code}
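> For reference, a minimal sketch of the proposed behavior (an assumption based 
> on the text above: keep the fallback but drop the trace log, as HADOOP-16431 
> did for a similar case):
> {code:java}
> private static <T extends IOException> T wrapWithMessage(
>     T exception, String msg) throws T {
>   Class<? extends Throwable> clazz = exception.getClass();
>   try {
>     Constructor<? extends Throwable> ctor = clazz.getConstructor(String.class);
>     Throwable t = ctor.newInstance(msg);
>     return (T) (t.initCause(exception));
>   } catch (Throwable e) {
>     // No (String) constructor: fall back to the original exception silently.
>     throw exception;
>   }
> }
> {code}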
>  *exception stack:*
> {code:java}
> 19/07/12 11:23:45 INFO mapreduce.JobSubmitter: Executing with tokens: [Kind: 
> HDFS_DELEGATION_TOKEN, Service: ha-hdfs:azorprc-xiaomi, Ident: (token for 
> sql_prc: HDFS_DELEGATION_TOKEN owner=sql_prc/hadoop@XIAOMI.HADOOP, 
> renewer=yarn_prc, realUser=, issueDate=1562901814007, maxDate=1594437814007, 
> sequenceNumber=3349939, masterKeyId=1400)]
> 19/07/12 11:23:46 TRACE net.NetUtils: Unable to wrap exception of type class 
> java.nio.channels.ClosedByInterruptException: it has no (String) constructor
> java.lang.NoSuchMethodException: 
> java.nio.channels.ClosedByInterruptException.<init>(java.lang.String)
> at java.lang.Class.getConstructor0(Class.java:3082)
> at java.lang.Class.getConstructor(Class.java:1825)
> at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:830)
> at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:806)
> at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1559)
> at org.apache.hadoop.ipc.Client.call(Client.java:1501)
> at org.apache.hadoop.ipc.Client.call(Client.java:1411)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
> at com.sun.proxy.$Proxy9.getFileInfo(Unknown Source)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:949)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.hadoop.hdfs.server.namenode.ha.RequestHedgingProxyProvider$RequestHedgingInvocationHandler$1.call(RequestHedgingProxyProvider.java:143)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> 19/07/12 11:23:46 INFO Configuration.deprecation: No unit for 
> dfs.client.datanode-restart.timeout(30) assuming SECONDS
> 19/07/12 11:23:46 INFO Configuration.deprecation: No unit for 
> dfs.client.datanode-restart.timeout(30) assuming SECONDS
> 19/07/12 11:23:46 INFO Configuration.deprecation: No unit for 
> dfs.client.datanode-restart.timeout(30) assuming SECONDS
> 19/07/12 11:23:46 INFO Configuration.deprecation: No unit for 
> dfs.client.datanode-restart.timeout(30) assuming SECONDS
> 19/07/12 11:23:46 INFO Configuration.deprecation: No unit for 
> dfs.client.datanode-restart.timeout(30) assuming SECONDS
> 19/07/12 11:23:46 INFO Configuration.deprecation: No unit for 
> dfs.client.datanode-restart.timeout(30) assuming SECONDS
> 19/07/12 11:23:46 WARN ipc.Client: Exception encountered while connecting to 
> the server : java.io.InterruptedIOException: Interrupted while waiting for IO 
> on channel java.nio.channels.SocketChannel[connected 
> local=/10.118.30.48:34324 remote=/10.69.11.137:11200]. 6 millis timeout 
> left.
> 19/07/12 11:23:48 INFO conf.Configuration: resource-types.xml not found
> 19/07/12 11:23:48 INFO resource.ResourceUtils: Unable to find 
> 'resource-types.xml'.
> 

[jira] [Commented] (HADOOP-15440) Support kerberos principal name pattern for KerberosAuthenticationHandler

2019-08-02 Thread He Xiaoqiao (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16898846#comment-16898846
 ] 

He Xiaoqiao commented on HADOOP-15440:
--

[~eyang], I tried to recall the changes in this patch, and it seems it is the 
same as {{SecurityUtil#getServerPrincipal}}, which is not imported by the 
`hadoop-common` submodule.
For the case `test/test/test`, it is split into [test, test, test], but 
`components[1]` does not equal `_HOST`, so it is not replaced.
For the case `test/_HOST/test`, it is replaced with `test/$hostname/test`.
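A hedged sketch of that substitution rule (illustrative code; the real logic is 
{{SecurityUtil#getServerPrincipal}}):
{code:java}
import java.util.Locale;

public class PrincipalPattern {
  // Split on '/' and '@'; only substitute when the middle component is _HOST.
  static String replaceHostPattern(String principal, String fqdn) {
    String[] components = principal.split("[/@]");
    if (components.length != 3 || !components[1].equals("_HOST")) {
      return principal;               // e.g. "test/test/test" stays unchanged
    }
    // e.g. "test/_HOST@REALM" becomes "test/<fqdn>@REALM"
    return components[0] + "/" + fqdn.toLowerCase(Locale.US)
        + "@" + components[2];
  }
}
{code}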
{quote}While this works fine for server with single network interface.  It can 
create problems for multi-homed network that getCanonicalHostName doesn't 
return the desired hostname.{quote}
That is true. It seems {{DNS.getHosts}} offers one option; any suggestions? 
Thanks again.

> Support kerberos principal name pattern for KerberosAuthenticationHandler
> -
>
> Key: HADOOP-15440
> URL: https://issues.apache.org/jira/browse/HADOOP-15440
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: security
>Reporter: He Xiaoqiao
>Assignee: He Xiaoqiao
>Priority: Major
> Attachments: HADOOP-15440-trunk.001.patch, HADOOP-15440.002.patch
>
>
> When setting up an HttpFS or KMS server in secure mode, we have to configure a 
> Kerberos principal for the service. It does not support converting a Kerberos 
> principal name pattern into valid Kerberos principal names, whereas the 
> NameNode/DataNode and many other services can do that, which is confusing for 
> users. So I propose to replace the hostname pattern with the hostname, which 
> should be the fully-qualified domain name.






[jira] [Commented] (HADOOP-15440) Support kerberos principal name pattern for KerberosAuthenticationHandler

2019-08-01 Thread He Xiaoqiao (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16898551#comment-16898551
 ] 

He Xiaoqiao commented on HADOOP-15440:
--

Thanks [~eyang] for your quick and kind response. Those are very valuable 
suggestions; I will check them as soon as possible.

> Support kerberos principal name pattern for KerberosAuthenticationHandler
> -
>
> Key: HADOOP-15440
> URL: https://issues.apache.org/jira/browse/HADOOP-15440
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: security
>Reporter: He Xiaoqiao
>Assignee: He Xiaoqiao
>Priority: Major
> Attachments: HADOOP-15440-trunk.001.patch, HADOOP-15440.002.patch
>
>
> When setting up an HttpFS or KMS server in secure mode, we have to configure a 
> Kerberos principal for the service. It does not support converting a Kerberos 
> principal name pattern into valid Kerberos principal names, whereas the 
> NameNode/DataNode and many other services can do that, which is confusing for 
> users. So I propose to replace the hostname pattern with the hostname, which 
> should be the fully-qualified domain name.






[jira] [Commented] (HADOOP-15440) Support kerberos principal name pattern for KerberosAuthenticationHandler

2019-08-01 Thread He Xiaoqiao (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16898548#comment-16898548
 ] 

He Xiaoqiao commented on HADOOP-15440:
--

Thanks [~jojochuang] for bringing this issue back.  [^HADOOP-15440.002.patch] 
tries to fix the checkstyle issues and is pending Jenkins.
{quote}please make some examples in the summary so this is easier to 
understand.{quote}
When setting up an HttpFS or KMS server in secure mode, we need to configure 
`httpfs.authentication.kerberos.principal` for the HttpFS principal. Since it 
does not support converting a Kerberos principal name pattern into valid 
Kerberos principal names, we have to set the principal value to the real 
hostname rather than the hostname pattern `_HOST`, as the following shows; thus 
we have to prepare different configs for different HttpFS or KMS instances.
{code:xml}
<property>
  <name>httpfs.authentication.kerberos.principal</name>
  <value>HTTP/`hostname`@REALM</value>
</property>
{code}
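With pattern support, the same item could presumably be written once with 
`_HOST` and shared across instances (hypothetical target config; REALM is a 
placeholder):
{code:xml}
<property>
  <name>httpfs.authentication.kerberos.principal</name>
  <value>HTTP/_HOST@REALM</value>
</property>
{code}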
cc [~jojochuang], [~eyang], [~stev...@iseran.com]: please take a review if you 
have time. Thanks again.

> Support kerberos principal name pattern for KerberosAuthenticationHandler
> -
>
> Key: HADOOP-15440
> URL: https://issues.apache.org/jira/browse/HADOOP-15440
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: security
>Reporter: He Xiaoqiao
>Assignee: He Xiaoqiao
>Priority: Major
> Attachments: HADOOP-15440-trunk.001.patch, HADOOP-15440.002.patch
>
>
> When setting up an HttpFS or KMS server in secure mode, we have to configure a 
> Kerberos principal for the service. It does not support converting a Kerberos 
> principal name pattern into valid Kerberos principal names, whereas the 
> NameNode/DataNode and many other services can do that, which is confusing for 
> users. So I propose to replace the hostname pattern with the hostname, which 
> should be the fully-qualified domain name.






[jira] [Updated] (HADOOP-15440) Support kerberos principal name pattern for KerberosAuthenticationHandler

2019-08-01 Thread He Xiaoqiao (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Xiaoqiao updated HADOOP-15440:
-
Attachment: HADOOP-15440.002.patch

> Support kerberos principal name pattern for KerberosAuthenticationHandler
> -
>
> Key: HADOOP-15440
> URL: https://issues.apache.org/jira/browse/HADOOP-15440
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: security
>Reporter: He Xiaoqiao
>Assignee: He Xiaoqiao
>Priority: Major
> Attachments: HADOOP-15440-trunk.001.patch, HADOOP-15440.002.patch
>
>
> When setting up an HttpFS or KMS server in secure mode, we have to configure a 
> Kerberos principal for the service. It does not support converting a Kerberos 
> principal name pattern into valid Kerberos principal names, whereas the 
> NameNode/DataNode and many other services can do that, which is confusing for 
> users. So I propose to replace the hostname pattern with the hostname, which 
> should be the fully-qualified domain name.






[jira] [Commented] (HADOOP-16403) Start a new statistical rpc queue and make the Reader's pendingConnection queue runtime-replaceable

2019-07-06 Thread He Xiaoqiao (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16879790#comment-16879790
 ] 

He Xiaoqiao commented on HADOOP-16403:
--

Thanks [~LiJinglun] for your response. Did you backport HDFS-6763? I ran into 
this issue once, and it was resolved by applying HDFS-6763. Please try to 
apply that patch; more discussion is welcome.

> Start a new statistical rpc queue and make the Reader's pendingConnection 
> queue runtime-replaceable
> ---
>
> Key: HADOOP-16403
> URL: https://issues.apache.org/jira/browse/HADOOP-16403
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Jinglun
>Priority: Major
> Attachments: HADOOP-16403.001.patch, MetricLinkedBlockingQueueTest.pdf
>
>
> I have an HA cluster with 2 NameNodes. The NameNode's meta is quite big, so 
> after the active dies, it takes the standby more than 40s to become active. 
> Many requests (tcp connect requests and rpc requests) from Datanodes, clients 
> and zkfc time out and start retrying. The sudden request flood lasts for 
> the next 2 minutes, until finally all requests are either handled or run out 
> of retries. 
>  Adjusting the rpc-related settings might strengthen the NameNode and solve 
> this problem, and the key point is finding the bottleneck. The rpc server can 
> be described as below:
> {noformat}
> Listener -> Readers' queues -> Readers -> callQueue -> Handlers{noformat}
> By sampling some failed clients, I found many of them got 
> ConnectTimeoutException, caused by a tcp connect request that went 
> unanswered for 20s. I think the reader queue may be full, blocking the 
> listener from handling new connections. Both slow handlers and slow readers 
> can block the whole processing pipeline, and I need to know which it is. I 
> think *a queue that computes the qps, writes a log when the queue is full, 
> and can be replaced easily* will help. 
>  I found the nice work in HADOOP-10302 implementing a runtime-swapped queue. 
> Using it as the Reader's queue makes the reader queue runtime-swappable 
> automatically. The qps computing job could be done by implementing a subclass 
> of LinkedBlockingQueue that does the computing while put/take/... happens. 
> The qps data will show up in jmx.
>  
>  
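> A hedged sketch of that metric queue (class name taken from the attached 
> MetricLinkedBlockingQueueTest.pdf; counters and logging are illustrative):
> {code:java}
> import java.util.concurrent.LinkedBlockingQueue;
> import java.util.concurrent.atomic.LongAdder;
> 
> // Counts put/take traffic so qps can be derived and published via JMX.
> public class MetricLinkedBlockingQueue<E> extends LinkedBlockingQueue<E> {
>   private final LongAdder puts = new LongAdder();
>   private final LongAdder takes = new LongAdder();
> 
>   public MetricLinkedBlockingQueue(int capacity) {
>     super(capacity);
>   }
> 
>   @Override
>   public void put(E e) throws InterruptedException {
>     puts.increment();
>     if (remainingCapacity() == 0) {
>       System.err.println("reader queue is full"); // log on saturation
>     }
>     super.put(e);
>   }
> 
>   @Override
>   public E take() throws InterruptedException {
>     takes.increment();
>     return super.take();
>   }
> 
>   // Exposed through a JMX MBean in a real implementation.
>   public long getPutCount() { return puts.sum(); }
>   public long getTakeCount() { return takes.sum(); }
> }
> {code}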






[jira] [Commented] (HADOOP-16385) Namenode crashes with "RedundancyMonitor thread received Runtime exception"

2019-07-03 Thread He Xiaoqiao (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16877603#comment-16877603
 ] 

He Xiaoqiao commented on HADOOP-16385:
--

Thanks [~elgoiri] and [~ayushtkn] for your work.
+1 for removing the `Preconditions.checkArgument` and avoiding the namenode crash.
I am confused about why this situation happened. After applying HADOOP-16028, 
{{totalInScopeNodes}} should always be greater than or equal to 
{{availableNodes}} for the scope. Please correct me if I missed something.
{code:java}
Preconditions.checkArgument(
totalInScopeNodes >= availableNodes && availableNodes > 0, String
.format("%d should >= %d, and both should be positive.",
totalInScopeNodes, availableNodes));
{code}
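For illustration, a hedged sketch of that direction (an assumption, not 
necessarily the attached patch): fail soft instead of crashing the 
RedundancyMonitor thread when the counts are inconsistent.
{code:java}
// Hypothetical replacement for the checkArgument call in
// NetworkTopology#chooseRandom: warn and return null instead of throwing.
if (availableNodes <= 0 || totalInScopeNodes < availableNodes) {
  LOG.warn("Inconsistent node counts (totalInScopeNodes={}, availableNodes={});"
      + " skipping this choose attempt.", totalInScopeNodes, availableNodes);
  return null; // assumption: callers treat null as "no node chosen"
}
{code}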


> Namenode crashes with "RedundancyMonitor thread received Runtime exception"
> ---
>
> Key: HADOOP-16385
> URL: https://issues.apache.org/jira/browse/HADOOP-16385
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: krishna reddy
>Assignee: Ayush Saxena
>Priority: Major
> Fix For: 3.3.0, 3.2.1, 3.1.3
>
> Attachments: HADOOP-16385-01.patch, HADOOP-16385-02.patch, 
> HADOOP-16385-03.patch, HADOOP-16385-HDFS_UT.patch, 
> HADOOP-16385.branch-3.1.001.patch
>
>
> *Description:* While removing dead nodes, the Namenode went down with the 
> error "RedundancyMonitor thread received Runtime exception".
> *Environment:*
> Server OS: UBUNTU
>  No. of Cluster Nodes: 1 NN / 225 DNs / 3 ZK / 2 RM / 4850 NMs
> total 240 machines; on each machine 21 docker containers (1 DN & 20 NMs)
> *Steps:*
> 1. Total number of containers in running state: ~53000
> 2. Because of the load, the machine ran out of memory; the machine was 
> restarted and all the docker containers, including NMs and DNs, were started.
> 3. At some point the namenode throws the below error while removing a node, 
> and the NN goes down.
> {noformat}
> 2019-06-19 05:54:07,262 INFO org.apache.hadoop.net.NetworkTopology: Removing 
> a node: /rack-1550/255.255.117.195:23735
> 2019-06-19 05:54:07,263 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
> removeDeadDatanode: lost heartbeat from 255.255.117.151:23735, 
> removeBlocksFromBlockMap true
> 2019-06-19 05:54:07,281 INFO org.apache.hadoop.net.NetworkTopology: Removing 
> a node: /rack-4097/255.255.117.151:23735
> 2019-06-19 05:54:07,282 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
> removeDeadDatanode: lost heartbeat from 255.255.116.213:23735, 
> removeBlocksFromBlockMap true
> 2019-06-19 05:54:07,290 ERROR 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: RedundancyMonitor 
> thread received Runtime exception.
> java.lang.IllegalArgumentException: 247 should >= 248, and both should be 
> positive.
> at 
> com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
> at 
> org.apache.hadoop.net.NetworkTopology.chooseRandom(NetworkTopology.java:575)
> at 
> org.apache.hadoop.net.NetworkTopology.chooseRandom(NetworkTopology.java:552)
> at 
> org.apache.hadoop.hdfs.net.DFSNetworkTopology.chooseRandomWithStorageTypeTwoTrial(DFSNetworkTopology.java:122)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseDataNode(BlockPlacementPolicyDefault.java:873)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseRandom(BlockPlacementPolicyDefault.java:770)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseRemoteRack(BlockPlacementPolicyDefault.java:712)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTargetInOrder(BlockPlacementPolicyDefault.java:507)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:425)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTargets(BlockPlacementPolicyDefault.java:311)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:290)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:143)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy.chooseTarget(BlockPlacementPolicy.java:103)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.ReplicationWork.chooseTargets(ReplicationWork.java:51)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReconstructionWorkForBlocks(BlockManager.java:1902)
> at 
> 

[jira] [Commented] (HADOOP-16403) Start a new statistical rpc queue and make the Reader's pendingConnection queue runtime-replaceable

2019-07-01 Thread He Xiaoqiao (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16876094#comment-16876094
 ] 

He Xiaoqiao commented on HADOOP-16403:
--

Thanks [~LiJinglun],
{code:java}
it takes the standby more than 40s to become active
{code}
Would you mind sharing which version you deploy and how large the meta is? 40s 
seems too long for the transition to active.

> Start a new statistical rpc queue and make the Reader's pendingConnection 
> queue runtime-replaceable
> ---
>
> Key: HADOOP-16403
> URL: https://issues.apache.org/jira/browse/HADOOP-16403
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Jinglun
>Priority: Major
> Attachments: HADOOP-16403.001.patch
>
>
> I have an HA cluster with 2 NameNodes. The NameNode's meta is quite big, so 
> after the active dies, it takes the standby more than 40s to become active. 
> Many requests (tcp connect requests and rpc requests) from Datanodes, clients 
> and zkfc time out and start retrying. The sudden request flood lasts for the 
> next 2 minutes, until finally all requests are either handled or run out of 
> retries. 
> Adjusting the rpc-related settings might strengthen the NameNode and solve 
> this problem, and the key point is finding the bottleneck. The rpc server can 
> be described as below:
> {noformat}
> Listener -> Readers' queues -> Readers -> callQueue -> Handlers{noformat}
> By sampling some failed clients, I found many of them got ConnectException, 
> caused by a tcp connect request that went unanswered for 20s. I think the 
> reader queue may be full, blocking the listener from handling new connections. 
> Both slow handlers and slow readers can block the whole processing pipeline, 
> and I need to know which it is. I think *a queue that computes the qps, writes 
> a log when the queue is full, and can be replaced easily* will help. 
> I found the nice work in HADOOP-10302 implementing a runtime-swapped queue. 
> Using it as the Reader's queue makes the reader queue runtime-swappable 
> automatically. The qps computing job could be done by implementing a subclass 
> of LinkedBlockingQueue that does the computing while put/take/... happens. The 
> qps data will show up in jmx.
>  
>  






[jira] [Comment Edited] (HADOOP-15918) Namenode gets stuck when deleting large dir in trash

2019-06-23 Thread He Xiaoqiao (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16870559#comment-16870559
 ] 

He Xiaoqiao edited comment on HADOOP-15918 at 6/23/19 2:27 PM:
---

Thanks [~Tao Jie] for your contributions. I think this is a very common issue; 
are you still working on it? Thanks again.


was (Author: hexiaoqiao):
Thanks [~Tao Jie] for your contributions. I think this is very common issue, 
and if you are still working in progress? Thanks again.

> Namenode gets stuck when deleting large dir in trash
> 
>
> Key: HADOOP-15918
> URL: https://issues.apache.org/jira/browse/HADOOP-15918
> Project: Hadoop Common
>  Issue Type: Improvement
>Affects Versions: 2.8.2, 3.1.0
>Reporter: Tao Jie
>Assignee: Tao Jie
>Priority: Major
> Attachments: HADOOP-15918.001.patch, HADOOP-15918.002.patch, 
> HDFS-13769.001.patch, HDFS-13769.002.patch, HDFS-13769.003.patch, 
> HDFS-13769.004.patch
>
>
> Similar to the situation discussed in HDFS-13671, the Namenode gets stuck for 
> a long time when deleting a trash dir with a large amount of data. We found 
> this log in the namenode:
> {quote}
> 2018-06-08 20:00:59,042 INFO namenode.FSNamesystem 
> (FSNamesystemLock.java:writeUnlock(252)) - FSNamesystem write lock held for 
> 23018 ms via
> java.lang.Thread.getStackTrace(Thread.java:1552)
> org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1033)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:254)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.writeUnlock(FSNamesystem.java:1567)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:2820)
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:1047)
> {quote}
> One simple solution is to avoid deleting a large amount of data in one delete 
> RPC call. We implemented a TrashPolicy that divides the delete operation into 
> several delete RPCs, so each single deletion does not delete too many files; 
> see the sketch below. Any thoughts? [~linyiqun]
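> As illustration, a hedged sketch of splitting the deletion (names are 
> illustrative, not the attached TrashPolicy implementation):
> {code:java}
> import java.io.IOException;
> import org.apache.hadoop.fs.FileStatus;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.Path;
> 
> public class PiecewiseDelete {
>   // Issue one delete RPC per immediate child instead of one huge recursive
>   // delete, so no single RPC holds the namesystem write lock for too long.
>   // (A real policy would recurse further when a child is still too large.)
>   static void deleteTrashDirPiecewise(FileSystem fs, Path trashDir)
>       throws IOException {
>     for (FileStatus child : fs.listStatus(trashDir)) {
>       fs.delete(child.getPath(), true); // bounded subtree per RPC
>     }
>     fs.delete(trashDir, false); // finally remove the now-empty directory
>   }
> }
> {code}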






[jira] [Commented] (HADOOP-15918) Namenode gets stuck when deleting large dir in trash

2019-06-23 Thread He Xiaoqiao (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16870559#comment-16870559
 ] 

He Xiaoqiao commented on HADOOP-15918:
--

Thanks [~Tao Jie] for your contributions. I think this is a very common issue; 
are you still working on it? Thanks again.

> Namenode gets stuck when deleting large dir in trash
> 
>
> Key: HADOOP-15918
> URL: https://issues.apache.org/jira/browse/HADOOP-15918
> Project: Hadoop Common
>  Issue Type: Improvement
>Affects Versions: 2.8.2, 3.1.0
>Reporter: Tao Jie
>Assignee: Tao Jie
>Priority: Major
> Attachments: HADOOP-15918.001.patch, HADOOP-15918.002.patch, 
> HDFS-13769.001.patch, HDFS-13769.002.patch, HDFS-13769.003.patch, 
> HDFS-13769.004.patch
>
>
> Similar to the situation discussed in HDFS-13671, the Namenode gets stuck for 
> a long time when deleting a trash dir with a large amount of data. We found 
> this log in the namenode:
> {quote}
> 2018-06-08 20:00:59,042 INFO namenode.FSNamesystem 
> (FSNamesystemLock.java:writeUnlock(252)) - FSNamesystem write lock held for 
> 23018 ms via
> java.lang.Thread.getStackTrace(Thread.java:1552)
> org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1033)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:254)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.writeUnlock(FSNamesystem.java:1567)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:2820)
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:1047)
> {quote}
> One simple solution is to avoid deleting a large amount of data in one delete 
> RPC call. We implemented a TrashPolicy that divides the delete operation into 
> several delete RPCs, so each single deletion does not delete too many files. 
> Any thoughts? [~linyiqun]






[jira] [Comment Edited] (HADOOP-16385) Namenode crashes with "RedundancyMonitor thread received Runtime exception"

2019-06-22 Thread He Xiaoqiao (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16870173#comment-16870173
 ] 

He Xiaoqiao edited comment on HADOOP-16385 at 6/22/19 9:41 AM:
---

Thanks [~xuzq_zander] for digging deep. I think HADOOP-16028 may resolve your 
doubt. Anyway, more discussion is welcome. Thanks [~xuzq_zander] again.


was (Author: hexiaoqiao):
Thanks [~xuzq_zander] for your deep dig. I think HADOOP-16028 may could solve 
your doubt. Anyway welcome some more discussion. Thans [~xuzq_zander] again.

> Namenode crashes with "RedundancyMonitor thread received Runtime exception"
> ---
>
> Key: HADOOP-16385
> URL: https://issues.apache.org/jira/browse/HADOOP-16385
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: krishna reddy
>Assignee: Ayush Saxena
>Priority: Major
> Attachments: HADOOP-16385.branch-3.1.001.patch
>
>
> *Description:* While removing dead nodes, the Namenode went down with the 
> error "RedundancyMonitor thread received Runtime exception".
> *Environment:*
> Server OS: UBUNTU
>  No. of Cluster Nodes: 1 NN / 225 DNs / 3 ZK / 2 RM / 4850 NMs
> total 240 machines; on each machine 21 docker containers (1 DN & 20 NMs)
> *Steps:*
> 1. Total number of containers in running state: ~53000
> 2. Because of the load, the machine ran out of memory; the machine was 
> restarted and all the docker containers, including NMs and DNs, were started.
> 3. At some point the namenode throws the below error while removing a node, 
> and the NN goes down.
> {noformat}
> 2019-06-19 05:54:07,262 INFO org.apache.hadoop.net.NetworkTopology: Removing 
> a node: /rack-1550/255.255.117.195:23735
> 2019-06-19 05:54:07,263 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
> removeDeadDatanode: lost heartbeat from 255.255.117.151:23735, 
> removeBlocksFromBlockMap true
> 2019-06-19 05:54:07,281 INFO org.apache.hadoop.net.NetworkTopology: Removing 
> a node: /rack-4097/255.255.117.151:23735
> 2019-06-19 05:54:07,282 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
> removeDeadDatanode: lost heartbeat from 255.255.116.213:23735, 
> removeBlocksFromBlockMap true
> 2019-06-19 05:54:07,290 ERROR 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: RedundancyMonitor 
> thread received Runtime exception.
> java.lang.IllegalArgumentException: 247 should >= 248, and both should be 
> positive.
> at 
> com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
> at 
> org.apache.hadoop.net.NetworkTopology.chooseRandom(NetworkTopology.java:575)
> at 
> org.apache.hadoop.net.NetworkTopology.chooseRandom(NetworkTopology.java:552)
> at 
> org.apache.hadoop.hdfs.net.DFSNetworkTopology.chooseRandomWithStorageTypeTwoTrial(DFSNetworkTopology.java:122)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseDataNode(BlockPlacementPolicyDefault.java:873)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseRandom(BlockPlacementPolicyDefault.java:770)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseRemoteRack(BlockPlacementPolicyDefault.java:712)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTargetInOrder(BlockPlacementPolicyDefault.java:507)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:425)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTargets(BlockPlacementPolicyDefault.java:311)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:290)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:143)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy.chooseTarget(BlockPlacementPolicy.java:103)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.ReplicationWork.chooseTargets(ReplicationWork.java:51)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReconstructionWorkForBlocks(BlockManager.java:1902)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeBlockReconstructionWork(BlockManager.java:1854)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:4842)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$RedundancyMonitor.run(BlockManager.java:4709)
> at java.lang.Thread.run(Thread.java:748)
> 2019-06-19 05:54:07,296 INFO 

[jira] [Commented] (HADOOP-16385) Namenode crashes with "RedundancyMonitor thread received Runtime exception"

2019-06-22 Thread He Xiaoqiao (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16870173#comment-16870173
 ] 

He Xiaoqiao commented on HADOOP-16385:
--

Thanks [~xuzq_zander] for digging deep. I think HADOOP-16028 may resolve your 
doubt. Anyway, more discussion is welcome. Thanks [~xuzq_zander] again.

> Namenode crashes with "RedundancyMonitor thread received Runtime exception"
> ---
>
> Key: HADOOP-16385
> URL: https://issues.apache.org/jira/browse/HADOOP-16385
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: krishna reddy
>Assignee: Ayush Saxena
>Priority: Major
> Attachments: HADOOP-16385.branch-3.1.001.patch
>
>
> *Description:* While removing dead nodes, the Namenode went down with the 
> error "RedundancyMonitor thread received Runtime exception".
> *Environment:*
> Server OS: UBUNTU
>  No. of Cluster Nodes: 1 NN / 225 DNs / 3 ZK / 2 RM / 4850 NMs
> total 240 machines; on each machine 21 docker containers (1 DN & 20 NMs)
> *Steps:*
> 1. Total number of containers in running state: ~53000
> 2. Because of the load, the machine ran out of memory; the machine was 
> restarted and all the docker containers, including NMs and DNs, were started.
> 3. At some point the namenode throws the below error while removing a node, 
> and the NN goes down.
> {noformat}
> 2019-06-19 05:54:07,262 INFO org.apache.hadoop.net.NetworkTopology: Removing 
> a node: /rack-1550/255.255.117.195:23735
> 2019-06-19 05:54:07,263 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
> removeDeadDatanode: lost heartbeat from 255.255.117.151:23735, 
> removeBlocksFromBlockMap true
> 2019-06-19 05:54:07,281 INFO org.apache.hadoop.net.NetworkTopology: Removing 
> a node: /rack-4097/255.255.117.151:23735
> 2019-06-19 05:54:07,282 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
> removeDeadDatanode: lost heartbeat from 255.255.116.213:23735, 
> removeBlocksFromBlockMap true
> 2019-06-19 05:54:07,290 ERROR 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: RedundancyMonitor 
> thread received Runtime exception.
> java.lang.IllegalArgumentException: 247 should >= 248, and both should be 
> positive.
> at 
> com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
> at 
> org.apache.hadoop.net.NetworkTopology.chooseRandom(NetworkTopology.java:575)
> at 
> org.apache.hadoop.net.NetworkTopology.chooseRandom(NetworkTopology.java:552)
> at 
> org.apache.hadoop.hdfs.net.DFSNetworkTopology.chooseRandomWithStorageTypeTwoTrial(DFSNetworkTopology.java:122)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseDataNode(BlockPlacementPolicyDefault.java:873)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseRandom(BlockPlacementPolicyDefault.java:770)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseRemoteRack(BlockPlacementPolicyDefault.java:712)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTargetInOrder(BlockPlacementPolicyDefault.java:507)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:425)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTargets(BlockPlacementPolicyDefault.java:311)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:290)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:143)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy.chooseTarget(BlockPlacementPolicy.java:103)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.ReplicationWork.chooseTargets(ReplicationWork.java:51)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReconstructionWorkForBlocks(BlockManager.java:1902)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeBlockReconstructionWork(BlockManager.java:1854)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:4842)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$RedundancyMonitor.run(BlockManager.java:4709)
> at java.lang.Thread.run(Thread.java:748)
> 2019-06-19 05:54:07,296 INFO org.apache.hadoop.util.ExitUtil: Exiting with 
> status 1: java.lang.IllegalArgumentException: 247 should >= 248, and both 
> should be positive.
> 2019-06-19 05:54:07,298 INFO 
> 

[jira] [Commented] (HADOOP-16385) Namenode crashes with "RedundancyMonitor thread received Runtime exception"

2019-06-20 Thread He Xiaoqiao (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16868828#comment-16868828
 ] 

He Xiaoqiao commented on HADOOP-16385:
--

{quote}HADOOP-16028 can be directly cherry-picked to 3.1, Doesn't require a 
separate patch. I will ping up there.{quote}
All right, that makes sense. Thanks.

> Namenode crashes with "RedundancyMonitor thread received Runtime exception"
> ---
>
> Key: HADOOP-16385
> URL: https://issues.apache.org/jira/browse/HADOOP-16385
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: krishna reddy
>Assignee: Ayush Saxena
>Priority: Major
> Attachments: HADOOP-16385.branch-3.1.001.patch
>
>
> *Description:* While removing dead nodes, the Namenode went down with the 
> error "RedundancyMonitor thread received Runtime exception".
> *Environment:*
> Server OS: UBUNTU
>  No. of Cluster Nodes: 1 NN / 225 DNs / 3 ZK / 2 RM / 4850 NMs
> total 240 machines; on each machine 21 docker containers (1 DN & 20 NMs)
> *Steps:*
> 1. Total number of containers in running state: ~53000
> 2. Because of the load, the machine ran out of memory; the machine was 
> restarted and all the docker containers, including NMs and DNs, were started.
> 3. At some point the namenode throws the below error while removing a node, 
> and the NN goes down.
> {noformat}
> 2019-06-19 05:54:07,262 INFO org.apache.hadoop.net.NetworkTopology: Removing 
> a node: /rack-1550/255.255.117.195:23735
> 2019-06-19 05:54:07,263 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
> removeDeadDatanode: lost heartbeat from 255.255.117.151:23735, 
> removeBlocksFromBlockMap true
> 2019-06-19 05:54:07,281 INFO org.apache.hadoop.net.NetworkTopology: Removing 
> a node: /rack-4097/255.255.117.151:23735
> 2019-06-19 05:54:07,282 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
> removeDeadDatanode: lost heartbeat from 255.255.116.213:23735, 
> removeBlocksFromBlockMap true
> 2019-06-19 05:54:07,290 ERROR 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: RedundancyMonitor 
> thread received Runtime exception.
> java.lang.IllegalArgumentException: 247 should >= 248, and both should be 
> positive.
> at 
> com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
> at 
> org.apache.hadoop.net.NetworkTopology.chooseRandom(NetworkTopology.java:575)
> at 
> org.apache.hadoop.net.NetworkTopology.chooseRandom(NetworkTopology.java:552)
> at 
> org.apache.hadoop.hdfs.net.DFSNetworkTopology.chooseRandomWithStorageTypeTwoTrial(DFSNetworkTopology.java:122)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseDataNode(BlockPlacementPolicyDefault.java:873)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseRandom(BlockPlacementPolicyDefault.java:770)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseRemoteRack(BlockPlacementPolicyDefault.java:712)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTargetInOrder(BlockPlacementPolicyDefault.java:507)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:425)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTargets(BlockPlacementPolicyDefault.java:311)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:290)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:143)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy.chooseTarget(BlockPlacementPolicy.java:103)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.ReplicationWork.chooseTargets(ReplicationWork.java:51)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReconstructionWorkForBlocks(BlockManager.java:1902)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeBlockReconstructionWork(BlockManager.java:1854)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:4842)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$RedundancyMonitor.run(BlockManager.java:4709)
> at java.lang.Thread.run(Thread.java:748)
> 2019-06-19 05:54:07,296 INFO org.apache.hadoop.util.ExitUtil: Exiting with 
> status 1: java.lang.IllegalArgumentException: 247 should >= 248, and both 
> should be positive.
> 2019-06-19 05:54:07,298 INFO 
> 

[jira] [Updated] (HADOOP-16385) Namenode crashes with "RedundancyMonitor thread received Runtime exception"

2019-06-20 Thread He Xiaoqiao (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Xiaoqiao updated HADOOP-16385:
-
Attachment: HADOOP-16385.branch-3.1.001.patch
Status: Patch Available  (was: Open)

Just backported HADOOP-16028 to branch-3.1; pending Jenkins. cc [~ayushtkn], 
could you help take another review?

> Namenode crashes with "RedundancyMonitor thread received Runtime exception"
> ---
>
> Key: HADOOP-16385
> URL: https://issues.apache.org/jira/browse/HADOOP-16385
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: krishna reddy
>Assignee: Ayush Saxena
>Priority: Major
> Attachments: HADOOP-16385.branch-3.1.001.patch
>
>
> *Description:* While removing dead nodes, the Namenode went down with the 
> error "RedundancyMonitor thread received Runtime exception".
> *Environment:*
> Server OS: UBUNTU
>  No. of Cluster Nodes: 1 NN / 225 DNs / 3 ZK / 2 RM / 4850 NMs
> total 240 machines; on each machine 21 docker containers (1 DN & 20 NMs)
> *Steps:*
> 1. Total number of containers in running state: ~53000
> 2. Because of the load, the machine ran out of memory; the machine was 
> restarted and all the docker containers, including NMs and DNs, were started.
> 3. At some point the namenode throws the below error while removing a node, 
> and the NN goes down.
> {noformat}
> 2019-06-19 05:54:07,262 INFO org.apache.hadoop.net.NetworkTopology: Removing 
> a node: /rack-1550/255.255.117.195:23735
> 2019-06-19 05:54:07,263 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
> removeDeadDatanode: lost heartbeat from 255.255.117.151:23735, 
> removeBlocksFromBlockMap true
> 2019-06-19 05:54:07,281 INFO org.apache.hadoop.net.NetworkTopology: Removing 
> a node: /rack-4097/255.255.117.151:23735
> 2019-06-19 05:54:07,282 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
> removeDeadDatanode: lost heartbeat from 255.255.116.213:23735, 
> removeBlocksFromBlockMap true
> 2019-06-19 05:54:07,290 ERROR 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: RedundancyMonitor 
> thread received Runtime exception.
> java.lang.IllegalArgumentException: 247 should >= 248, and both should be 
> positive.
> at 
> com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
> at 
> org.apache.hadoop.net.NetworkTopology.chooseRandom(NetworkTopology.java:575)
> at 
> org.apache.hadoop.net.NetworkTopology.chooseRandom(NetworkTopology.java:552)
> at 
> org.apache.hadoop.hdfs.net.DFSNetworkTopology.chooseRandomWithStorageTypeTwoTrial(DFSNetworkTopology.java:122)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseDataNode(BlockPlacementPolicyDefault.java:873)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseRandom(BlockPlacementPolicyDefault.java:770)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseRemoteRack(BlockPlacementPolicyDefault.java:712)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTargetInOrder(BlockPlacementPolicyDefault.java:507)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:425)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTargets(BlockPlacementPolicyDefault.java:311)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:290)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:143)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy.chooseTarget(BlockPlacementPolicy.java:103)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.ReplicationWork.chooseTargets(ReplicationWork.java:51)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReconstructionWorkForBlocks(BlockManager.java:1902)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeBlockReconstructionWork(BlockManager.java:1854)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:4842)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$RedundancyMonitor.run(BlockManager.java:4709)
> at java.lang.Thread.run(Thread.java:748)
> 2019-06-19 05:54:07,296 INFO org.apache.hadoop.util.ExitUtil: Exiting with 
> status 1: java.lang.IllegalArgumentException: 247 should >= 248, and both 
> should be positive.
> 2019-06-19 05:54:07,298 INFO 
> 

[jira] [Moved] (HADOOP-16385) Namenode crashes with "RedundancyMonitor thread received Runtime exception"

2019-06-20 Thread He Xiaoqiao (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Xiaoqiao moved HDFS-14584 to HADOOP-16385:
-

Affects Version/s: (was: 3.1.1)
   3.1.1
  Key: HADOOP-16385  (was: HDFS-14584)
  Project: Hadoop Common  (was: Hadoop HDFS)

> Namenode crashes with "RedundancyMonitor thread received Runtime exception"
> ---
>
> Key: HADOOP-16385
> URL: https://issues.apache.org/jira/browse/HADOOP-16385
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: krishna reddy
>Assignee: Ayush Saxena
>Priority: Major
>
> *Description:* While removing dead nodes, the Namenode went down with the 
> error "RedundancyMonitor thread received Runtime exception".
> *Environment:*
> Server OS: UBUNTU
>  No. of Cluster Nodes: 1 NN / 225 DNs / 3 ZK / 2 RM / 4850 NMs
> total 240 machines; on each machine 21 docker containers (1 DN & 20 NMs)
> *Steps:*
> 1. Total number of containers in running state: ~53000
> 2. Because of the load, the machine ran out of memory; the machine was 
> restarted and all the docker containers, including NMs and DNs, were started.
> 3. At some point the namenode throws the below error while removing a node, 
> and the NN goes down.
> {noformat}
> 2019-06-19 05:54:07,262 INFO org.apache.hadoop.net.NetworkTopology: Removing 
> a node: /rack-1550/255.255.117.195:23735
> 2019-06-19 05:54:07,263 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
> removeDeadDatanode: lost heartbeat from 255.255.117.151:23735, 
> removeBlocksFromBlockMap true
> 2019-06-19 05:54:07,281 INFO org.apache.hadoop.net.NetworkTopology: Removing 
> a node: /rack-4097/255.255.117.151:23735
> 2019-06-19 05:54:07,282 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
> removeDeadDatanode: lost heartbeat from 255.255.116.213:23735, 
> removeBlocksFromBlockMap true
> 2019-06-19 05:54:07,290 ERROR 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: RedundancyMonitor 
> thread received Runtime exception.
> java.lang.IllegalArgumentException: 247 should >= 248, and both should be 
> positive.
> at 
> com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
> at 
> org.apache.hadoop.net.NetworkTopology.chooseRandom(NetworkTopology.java:575)
> at 
> org.apache.hadoop.net.NetworkTopology.chooseRandom(NetworkTopology.java:552)
> at 
> org.apache.hadoop.hdfs.net.DFSNetworkTopology.chooseRandomWithStorageTypeTwoTrial(DFSNetworkTopology.java:122)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseDataNode(BlockPlacementPolicyDefault.java:873)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseRandom(BlockPlacementPolicyDefault.java:770)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseRemoteRack(BlockPlacementPolicyDefault.java:712)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTargetInOrder(BlockPlacementPolicyDefault.java:507)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:425)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTargets(BlockPlacementPolicyDefault.java:311)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:290)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:143)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy.chooseTarget(BlockPlacementPolicy.java:103)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.ReplicationWork.chooseTargets(ReplicationWork.java:51)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReconstructionWorkForBlocks(BlockManager.java:1902)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeBlockReconstructionWork(BlockManager.java:1854)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:4842)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$RedundancyMonitor.run(BlockManager.java:4709)
> at java.lang.Thread.run(Thread.java:748)
> 2019-06-19 05:54:07,296 INFO org.apache.hadoop.util.ExitUtil: Exiting with 
> status 1: java.lang.IllegalArgumentException: 247 should >= 248, and both 
> should be positive.
> 2019-06-19 05:54:07,298 INFO 
> org.apache.hadoop.hdfs.server.common.HadoopAuditLogger.audit: 
> process=Namenode operation=shutdown 

[jira] [Updated] (HADOOP-15414) Job submit not work well on HDFS Federation with Transparent Encryption feature

2019-06-16 Thread He Xiaoqiao (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Xiaoqiao updated HADOOP-15414:
-
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Thanks [~xiaochen]; fixed by HADOOP-14445, closing this issue.

> Job submit not work well on HDFS Federation with Transparent Encryption 
> feature
> ---
>
> Key: HADOOP-15414
> URL: https://issues.apache.org/jira/browse/HADOOP-15414
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs
>Reporter: He Xiaoqiao
>Priority: Major
> Attachments: HADOOP-15414-trunk.001.patch, 
> HADOOP-15414-trunk.002.patch
>
>
> When submitting the sample MapReduce job WordCount, which reads/writes paths 
> under an encryption zone on HDFS Federation in secure mode, to YARN, the task 
> throws the exception below:
> {code:java}
> 18/04/26 16:07:26 INFO mapreduce.Job: Task Id : attempt_JOBID_m_TASKID_0, 
> Status : FAILED
> Error: java.io.IOException: 
> org.apache.hadoop.security.authentication.client.AuthenticationException: 
> GSSException: No valid credentials provided (Mechanism level: Failed to find 
> any Kerberos tgt)
> at 
> org.apache.hadoop.crypto.key.kms.KMSClientProvider.createConnection(KMSClientProvider.java:489)
> at 
> org.apache.hadoop.crypto.key.kms.KMSClientProvider.decryptEncryptedKey(KMSClientProvider.java:776)
> at 
> org.apache.hadoop.crypto.key.KeyProviderCryptoExtension.decryptEncryptedKey(KeyProviderCryptoExtension.java:388)
> at 
> org.apache.hadoop.hdfs.DFSClient.decryptEncryptedDataEncryptionKey(DFSClient.java:1468)
> at 
> org.apache.hadoop.hdfs.DFSClient.createWrappedInputStream(DFSClient.java:1538)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:306)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:300)
> at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:300)
> at org.apache.hadoop.fs.FilterFileSystem.open(FilterFileSystem.java:161)
> at 
> org.apache.hadoop.fs.viewfs.ChRootedFileSystem.open(ChRootedFileSystem.java:258)
> at 
> org.apache.hadoop.fs.viewfs.ViewFileSystem.open(ViewFileSystem.java:424)
> at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:793)
> at 
> org.apache.hadoop.mapreduce.lib.input.LineRecordReader.initialize(LineRecordReader.java:85)
> at 
> org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:552)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:823)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1690)
> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)
> Caused by: 
> org.apache.hadoop.security.authentication.client.AuthenticationException: 
> GSSException: No valid credentials provided (Mechanism level: Failed to find 
> any Kerberos tgt)
> at 
> org.apache.hadoop.security.authentication.client.KerberosAuthenticator.doSpnegoSequence(KerberosAuthenticator.java:332)
> at 
> org.apache.hadoop.security.authentication.client.KerberosAuthenticator.authenticate(KerberosAuthenticator.java:205)
> at 
> org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.authenticate(DelegationTokenAuthenticator.java:128)
> at 
> org.apache.hadoop.security.authentication.client.AuthenticatedURL.openConnection(AuthenticatedURL.java:215)
> at 
> org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticatedURL.openConnection(DelegationTokenAuthenticatedURL.java:322)
> at 
> org.apache.hadoop.crypto.key.kms.KMSClientProvider$1.run(KMSClientProvider.java:483)
> at 
> org.apache.hadoop.crypto.key.kms.KMSClientProvider$1.run(KMSClientProvider.java:478)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1690)
> at 
> org.apache.hadoop.crypto.key.kms.KMSClientProvider.createConnection(KMSClientProvider.java:478)
> ... 21 more
> Caused by: GSSException: No valid credentials provided (Mechanism level: 
> Failed to find any Kerberos tgt)
> at 
> sun.security.jgss.krb5.Krb5InitCredential.getInstance(Krb5InitCredential.java:147)
> at 
> 

[jira] [Commented] (HADOOP-16112) Delete the baseTrashPath's subDir leads to don't modify baseTrashPath

2019-06-13 Thread He Xiaoqiao (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16863689#comment-16863689
 ] 

He Xiaoqiao commented on HADOOP-16112:
--

[~jzhuge], Thanks for your comments.
{quote}the new unit test passes without any fix, is it valid? I understand race 
condition is hard to reproduce.{quote}
Right, the new unit test actually does not verify anything, so in my opinion it 
is not a valid unit test.
I would like to state that this case could be expected behavior, maybe not an 
issue. IMO it makes sense either way, whether the parent or the child path has 
the timestamp appended on mkdir; in any case, I do not think we can guarantee 
consistency at the client side through retry. Please correct me if something is 
wrong.

> Delete the baseTrashPath's subDir leads to don't modify baseTrashPath
> -
>
> Key: HADOOP-16112
> URL: https://issues.apache.org/jira/browse/HADOOP-16112
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: common
>Affects Versions: 3.2.0
>Reporter: Lisheng Sun
>Priority: Major
> Attachments: HADOOP-16112.001.patch, HADOOP-16112.002.patch
>
>
> There is a race condition in TrashPolicyDefault#moveToTrash:
> {code:java}
> try {
>   if (!fs.mkdirs(baseTrashPath, PERMISSION)) { // create current
>     LOG.warn("Can't create(mkdir) trash directory: " + baseTrashPath);
>     return false;
>   }
> } catch (FileAlreadyExistsException e) {
>   // find the path which is not a directory, and modify baseTrashPath
>   // & trashPath, then mkdirs
>   Path existsFilePath = baseTrashPath;
>   while (!fs.exists(existsFilePath)) {
>     existsFilePath = existsFilePath.getParent();
>   }
>   // case: another thread deletes existsFilePath here; the result doesn't
>   // meet expectations. For example, given
>   // /user/u_sunlisheng/.Trash/Current/user/u_sunlisheng/b, when deleting
>   // /user/u_sunlisheng/b/a, if existsFilePath is deleted the result is
>   // /user/u_sunlisheng/.Trash/Current/user/u_sunlisheng+timestamp/b/a.
>   // So when existsFilePath is deleted, don't modify baseTrashPath.
>   baseTrashPath = new Path(baseTrashPath.toString().replace(
>       existsFilePath.toString(), existsFilePath.toString() + Time.now()));
>   trashPath = new Path(baseTrashPath, trashPath.getName());
>   // retry, ignore current failure
>   --i;
>   continue;
> } catch (IOException e) {
>   LOG.warn("Can't create trash directory: " + baseTrashPath, e);
>   cause = e;
>   break;
> }
> {code}
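For context on the race being debated here, a minimal sketch of the guarded
rewrite follows. It is illustrative only, not the attached patch: the helper
name and shape are assumptions, and, as noted in the comment above, even a
re-check cannot fully close the window between fs.exists() and the rewrite,
which is why client-side retries cannot guarantee consistency.
{code:java}
import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.util.Time;

public class TrashConflictSketch {
  /**
   * Hypothetical helper: walk up to the existing conflicting entry and
   * rename the trash base path around it. If a concurrent delete removed
   * the entry in the meantime, keep baseTrashPath unchanged so the retry
   * simply re-runs mkdirs.
   */
  static Path resolveConflict(FileSystem fs, Path baseTrashPath)
      throws IOException {
    Path existsFilePath = baseTrashPath;
    while (!fs.exists(existsFilePath)) {
      existsFilePath = existsFilePath.getParent();
    }
    if (!fs.exists(existsFilePath)) {
      // Another thread deleted it between the loop and here: don't modify.
      return baseTrashPath;
    }
    return new Path(baseTrashPath.toString().replace(
        existsFilePath.toString(), existsFilePath.toString() + Time.now()));
  }
}
{code}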



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-16112) Delete the baseTrashPath's subDir leads to don't modify baseTrashPath

2019-06-13 Thread He Xiaoqiao (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16862802#comment-16862802
 ] 

He Xiaoqiao commented on HADOOP-16112:
--

[~leosun08], thanks for your report, and sorry for the late comment. After a 
quick review of the #moveToTrash logic, I think the case you mentioned may be 
explainable behavior rather than a bug. [~ferhui] and [~jzhuge] would be better 
placed to give further reviews.

> Delete the baseTrashPath's subDir leads to don't modify baseTrashPath
> -
>
> Key: HADOOP-16112
> URL: https://issues.apache.org/jira/browse/HADOOP-16112
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: common
>Affects Versions: 3.2.0
>Reporter: Lisheng Sun
>Priority: Major
> Attachments: HADOOP-16112.001.patch, HADOOP-16112.002.patch
>
>
> There is a race condition in TrashPolicyDefault#moveToTrash:
> {code:java}
> try {
>   if (!fs.mkdirs(baseTrashPath, PERMISSION)) { // create current
>     LOG.warn("Can't create(mkdir) trash directory: " + baseTrashPath);
>     return false;
>   }
> } catch (FileAlreadyExistsException e) {
>   // find the path which is not a directory, and modify baseTrashPath
>   // & trashPath, then mkdirs
>   Path existsFilePath = baseTrashPath;
>   while (!fs.exists(existsFilePath)) {
>     existsFilePath = existsFilePath.getParent();
>   }
>   // case: another thread deletes existsFilePath here; the result doesn't
>   // meet expectations. For example, given
>   // /user/u_sunlisheng/.Trash/Current/user/u_sunlisheng/b, when deleting
>   // /user/u_sunlisheng/b/a, if existsFilePath is deleted the result is
>   // /user/u_sunlisheng/.Trash/Current/user/u_sunlisheng+timestamp/b/a.
>   // So when existsFilePath is deleted, don't modify baseTrashPath.
>   baseTrashPath = new Path(baseTrashPath.toString().replace(
>       existsFilePath.toString(), existsFilePath.toString() + Time.now()));
>   trashPath = new Path(baseTrashPath, trashPath.getName());
>   // retry, ignore current failure
>   --i;
>   continue;
> } catch (IOException e) {
>   LOG.warn("Can't create trash directory: " + baseTrashPath, e);
>   cause = e;
>   break;
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-16112) Delete the baseTrashPath's subDir leads to don't modify baseTrashPath

2019-06-10 Thread He Xiaoqiao (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16860040#comment-16860040
 ] 

He Xiaoqiao commented on HADOOP-16112:
--

[~leosun08], do you mean that the trash directory should be 
{/user/test/.Trash/Current/user/test/a+timestamp/b} rather than 
{/user/test/.Trash/Current/user/test+timestamp/a/b}, which is the actual result 
in some corner cases?

> Delete the baseTrashPath's subDir leads to don't modify baseTrashPath
> -
>
> Key: HADOOP-16112
> URL: https://issues.apache.org/jira/browse/HADOOP-16112
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: common
>Affects Versions: 3.2.0
>Reporter: Lisheng Sun
>Priority: Major
> Attachments: HADOOP-16112.001.patch, HADOOP-16112.002.patch
>
>
> There is a race condition in TrashPolicyDefault#moveToTrash:
> {code:java}
> try {
>   if (!fs.mkdirs(baseTrashPath, PERMISSION)) { // create current
>     LOG.warn("Can't create(mkdir) trash directory: " + baseTrashPath);
>     return false;
>   }
> } catch (FileAlreadyExistsException e) {
>   // find the path which is not a directory, and modify baseTrashPath
>   // & trashPath, then mkdirs
>   Path existsFilePath = baseTrashPath;
>   while (!fs.exists(existsFilePath)) {
>     existsFilePath = existsFilePath.getParent();
>   }
>   // case: another thread deletes existsFilePath here; the result doesn't
>   // meet expectations. For example, given
>   // /user/u_sunlisheng/.Trash/Current/user/u_sunlisheng/b, when deleting
>   // /user/u_sunlisheng/b/a, if existsFilePath is deleted the result is
>   // /user/u_sunlisheng/.Trash/Current/user/u_sunlisheng+timestamp/b/a.
>   // So when existsFilePath is deleted, don't modify baseTrashPath.
>   baseTrashPath = new Path(baseTrashPath.toString().replace(
>       existsFilePath.toString(), existsFilePath.toString() + Time.now()));
>   trashPath = new Path(baseTrashPath, trashPath.getName());
>   // retry, ignore current failure
>   --i;
>   continue;
> } catch (IOException e) {
>   LOG.warn("Can't create trash directory: " + baseTrashPath, e);
>   cause = e;
>   break;
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HADOOP-16112) Delete the baseTrashPath's subDir leads to don't modify baseTrashPath

2019-06-10 Thread He Xiaoqiao (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16859920#comment-16859920
 ] 

He Xiaoqiao edited comment on HADOOP-16112 at 6/10/19 10:55 AM:


Sorry, I don't catch the point. I just verified the unit test you attached in 
[^HADOOP-16112.002.patch], and it runs correctly against branch trunk locally, 
so I am confused about what this issue means.
Would you mind describing how to reproduce this issue? For instance (just an 
example, no more info):
{code:java}
fs -mkdir /user/test
fs -rm -r /user/test
fs -mkdir /user/test/a
fs -rm -r /user/test/b (another thread)
{code}


was (Author: hexiaoqiao):
Sorry, I don't catch the point. I just verified the unit test you attached in 
[^HADOOP-16112.002.patch], and it runs correctly, so I am confused about what 
this issue means.
Would you mind describing how to reproduce this issue? For instance (just an 
example, no more info):
{code:java}
fs -mkdir /user/test
fs -rm -r /user/test
fs -mkdir /user/test/a
fs -rm -r /user/test/b (another thread)
{code}

> Delete the baseTrashPath's subDir leads to don't modify baseTrashPath
> -
>
> Key: HADOOP-16112
> URL: https://issues.apache.org/jira/browse/HADOOP-16112
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: common
>Affects Versions: 3.2.0
>Reporter: Lisheng Sun
>Priority: Major
> Attachments: HADOOP-16112.001.patch, HADOOP-16112.002.patch
>
>
> There is a race condition in TrashPolicyDefault#moveToTrash:
> {code:java}
> try {
>   if (!fs.mkdirs(baseTrashPath, PERMISSION)) { // create current
>     LOG.warn("Can't create(mkdir) trash directory: " + baseTrashPath);
>     return false;
>   }
> } catch (FileAlreadyExistsException e) {
>   // find the path which is not a directory, and modify baseTrashPath
>   // & trashPath, then mkdirs
>   Path existsFilePath = baseTrashPath;
>   while (!fs.exists(existsFilePath)) {
>     existsFilePath = existsFilePath.getParent();
>   }
>   // case: another thread deletes existsFilePath here; the result doesn't
>   // meet expectations. For example, given
>   // /user/u_sunlisheng/.Trash/Current/user/u_sunlisheng/b, when deleting
>   // /user/u_sunlisheng/b/a, if existsFilePath is deleted the result is
>   // /user/u_sunlisheng/.Trash/Current/user/u_sunlisheng+timestamp/b/a.
>   // So when existsFilePath is deleted, don't modify baseTrashPath.
>   baseTrashPath = new Path(baseTrashPath.toString().replace(
>       existsFilePath.toString(), existsFilePath.toString() + Time.now()));
>   trashPath = new Path(baseTrashPath, trashPath.getName());
>   // retry, ignore current failure
>   --i;
>   continue;
> } catch (IOException e) {
>   LOG.warn("Can't create trash directory: " + baseTrashPath, e);
>   cause = e;
>   break;
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-16112) Delete the baseTrashPath's subDir leads to don't modify baseTrashPath

2019-06-10 Thread He Xiaoqiao (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16859920#comment-16859920
 ] 

He Xiaoqiao commented on HADOOP-16112:
--

Sorry, I don't catch the point. I just verified the unit test you attached in 
[^HADOOP-16112.002.patch], and it runs correctly, so I am confused about what 
this issue means.
Would you mind describing how to reproduce this issue? For instance (just an 
example, no more info):
{code:java}
fs -mkdir /user/test
fs -rm -r /user/test
fs -mkdir /user/test/a
fs -rm -r /user/test/b (another thread)
{code}

> Delete the baseTrashPath's subDir leads to don't modify baseTrashPath
> -
>
> Key: HADOOP-16112
> URL: https://issues.apache.org/jira/browse/HADOOP-16112
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: common
>Affects Versions: 3.2.0
>Reporter: Lisheng Sun
>Priority: Major
> Attachments: HADOOP-16112.001.patch, HADOOP-16112.002.patch
>
>
> There is a race condition in TrashPolicyDefault#moveToTrash:
> {code:java}
> try {
>   if (!fs.mkdirs(baseTrashPath, PERMISSION)) { // create current
>     LOG.warn("Can't create(mkdir) trash directory: " + baseTrashPath);
>     return false;
>   }
> } catch (FileAlreadyExistsException e) {
>   // find the path which is not a directory, and modify baseTrashPath
>   // & trashPath, then mkdirs
>   Path existsFilePath = baseTrashPath;
>   while (!fs.exists(existsFilePath)) {
>     existsFilePath = existsFilePath.getParent();
>   }
>   // case: another thread deletes existsFilePath here; the result doesn't
>   // meet expectations. For example, given
>   // /user/u_sunlisheng/.Trash/Current/user/u_sunlisheng/b, when deleting
>   // /user/u_sunlisheng/b/a, if existsFilePath is deleted the result is
>   // /user/u_sunlisheng/.Trash/Current/user/u_sunlisheng+timestamp/b/a.
>   // So when existsFilePath is deleted, don't modify baseTrashPath.
>   baseTrashPath = new Path(baseTrashPath.toString().replace(
>       existsFilePath.toString(), existsFilePath.toString() + Time.now()));
>   trashPath = new Path(baseTrashPath, trashPath.getName());
>   // retry, ignore current failure
>   --i;
>   continue;
> } catch (IOException e) {
>   LOG.warn("Can't create trash directory: " + baseTrashPath, e);
>   cause = e;
>   break;
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-16112) Delete the baseTrashPath's subDir leads to don't modify baseTrashPath

2019-06-10 Thread He Xiaoqiao (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16859749#comment-16859749
 ] 

He Xiaoqiao commented on HADOOP-16112:
--

[~leosun08], thanks for your report. Would you mind describing this issue in 
detail, along with how to reproduce it?

> Delete the baseTrashPath's subDir leads to don't modify baseTrashPath
> -
>
> Key: HADOOP-16112
> URL: https://issues.apache.org/jira/browse/HADOOP-16112
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: common
>Affects Versions: 3.2.0
>Reporter: Lisheng Sun
>Priority: Major
> Attachments: HADOOP-16112.001.patch, HADOOP-16112.002.patch
>
>
> There is a race condition in TrashPolicyDefault#moveToTrash:
> {code:java}
> try {
>   if (!fs.mkdirs(baseTrashPath, PERMISSION)) { // create current
>     LOG.warn("Can't create(mkdir) trash directory: " + baseTrashPath);
>     return false;
>   }
> } catch (FileAlreadyExistsException e) {
>   // find the path which is not a directory, and modify baseTrashPath
>   // & trashPath, then mkdirs
>   Path existsFilePath = baseTrashPath;
>   while (!fs.exists(existsFilePath)) {
>     existsFilePath = existsFilePath.getParent();
>   }
>   // case: another thread deletes existsFilePath here; the result doesn't
>   // meet expectations. For example, given
>   // /user/u_sunlisheng/.Trash/Current/user/u_sunlisheng/b, when deleting
>   // /user/u_sunlisheng/b/a, if existsFilePath is deleted the result is
>   // /user/u_sunlisheng/.Trash/Current/user/u_sunlisheng+timestamp/b/a.
>   // So when existsFilePath is deleted, don't modify baseTrashPath.
>   baseTrashPath = new Path(baseTrashPath.toString().replace(
>       existsFilePath.toString(), existsFilePath.toString() + Time.now()));
>   trashPath = new Path(baseTrashPath, trashPath.getName());
>   // retry, ignore current failure
>   --i;
>   continue;
> } catch (IOException e) {
>   LOG.warn("Can't create trash directory: " + baseTrashPath, e);
>   cause = e;
>   break;
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-16161) NetworkTopology#getWeightUsingNetworkLocation return unexpected result

2019-05-15 Thread He Xiaoqiao (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16840523#comment-16840523
 ] 

He Xiaoqiao commented on HADOOP-16161:
--

Thanks [~elgoiri] for the review and commit.

> NetworkTopology#getWeightUsingNetworkLocation return unexpected result
> --
>
> Key: HADOOP-16161
> URL: https://issues.apache.org/jira/browse/HADOOP-16161
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: net
>Reporter: He Xiaoqiao
>Assignee: He Xiaoqiao
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HADOOP-16161.001.patch, HADOOP-16161.002.patch, 
> HADOOP-16161.003.patch, HADOOP-16161.004.patch, HADOOP-16161.005.patch, 
> HADOOP-16161.006.patch, HADOOP-16161.007.patch, HADOOP-16161.008.patch, 
> HADOOP-16161.009.patch
>
>
> Consider the following scenario:
> 1. There are 4 slaves, with a topology like:
> Rack: /IDC/RACK1
>    hostname1
>    hostname2
> Rack: /IDC/RACK2
>    hostname3
>    hostname4
> 2. A reader on hostname1 calculates the weight between itself and
> [hostname1, hostname3, hostname4] via #getWeight; the corresponding values
> are [0,4,4].
> 3. A reader on a client that is not in the topology, in the same IDC but in
> no rack of the topology, calculates the weight between itself and
> [hostname1, hostname3, hostname4] via #getWeightUsingNetworkLocation; the
> corresponding values are [2,2,2].
> 4. Other readers get similar results.
> The weight result for case #3 is obviously not the expected value; the truth
> is [4,4,4]. This issue may cause a reader to not really follow the order:
> local -> local rack -> remote rack.
> After digging into the implementation, the root cause is that
> #getWeightUsingNetworkLocation only calculates the distance between racks
> rather than between hosts.
> I think we should add the constant 2 to correct the weight returned by
> #getWeightUsingNetworkLocation.
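As a reading aid for the fix discussed above, here is a minimal sketch of the
weight convention in this issue. It is an assumption-laden illustration, not the
committed patch: NetworkTopology computes real distances from the topology tree,
while this only encodes the 0/2/4 convention and the proposed "+2" correction.
{code:java}
public class WeightSketch {
  // NetworkTopology#getWeight convention: 0 = same node, 2 = same rack,
  // 4 = different rack.
  static int getWeight(boolean sameNode, boolean sameRack) {
    if (sameNode) {
      return 0;
    }
    return sameRack ? 2 : 4;
  }

  // A rack-only distance misses the final rack-to-host hop, so an off-rack
  // reader known only by network location sees 2 where getWeight reports 4.
  // Adding the constant 2 restores the expected [4,4,4] for case #3 above.
  static int getWeightUsingNetworkLocationFixed(int rackLevelDistance) {
    return rackLevelDistance + 2;
  }
}
{code}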



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-16161) NetworkTopology#getWeightUsingNetworkLocation return unexpected result

2019-05-07 Thread He Xiaoqiao (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16834419#comment-16834419
 ] 

He Xiaoqiao commented on HADOOP-16161:
--

[~elgoiri], any further comments or suggestions about this fix?

> NetworkTopology#getWeightUsingNetworkLocation return unexpected result
> --
>
> Key: HADOOP-16161
> URL: https://issues.apache.org/jira/browse/HADOOP-16161
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: net
>Reporter: He Xiaoqiao
>Assignee: He Xiaoqiao
>Priority: Major
> Attachments: HADOOP-16161.001.patch, HADOOP-16161.002.patch, 
> HADOOP-16161.003.patch, HADOOP-16161.004.patch, HADOOP-16161.005.patch, 
> HADOOP-16161.006.patch, HADOOP-16161.007.patch, HADOOP-16161.008.patch, 
> HADOOP-16161.009.patch
>
>
> Consider the following scenario:
> 1. There are 4 slaves, with a topology like:
> Rack: /IDC/RACK1
>    hostname1
>    hostname2
> Rack: /IDC/RACK2
>    hostname3
>    hostname4
> 2. A reader on hostname1 calculates the weight between itself and
> [hostname1, hostname3, hostname4] via #getWeight; the corresponding values
> are [0,4,4].
> 3. A reader on a client that is not in the topology, in the same IDC but in
> no rack of the topology, calculates the weight between itself and
> [hostname1, hostname3, hostname4] via #getWeightUsingNetworkLocation; the
> corresponding values are [2,2,2].
> 4. Other readers get similar results.
> The weight result for case #3 is obviously not the expected value; the truth
> is [4,4,4]. This issue may cause a reader to not really follow the order:
> local -> local rack -> remote rack.
> After digging into the implementation, the root cause is that
> #getWeightUsingNetworkLocation only calculates the distance between racks
> rather than between hosts.
> I think we should add the constant 2 to correct the weight returned by
> #getWeightUsingNetworkLocation.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-16254) Add proxy address in IPC connection

2019-05-07 Thread He Xiaoqiao (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16834418#comment-16834418
 ] 

He Xiaoqiao commented on HADOOP-16254:
--

[^HADOOP-16254.004.patch]:
1. Sets the proxy address on the IPC connection, avoiding passing it on every 
call and reducing RPC load.
2. Exposes the interface via the static method {{Client#setProxyAddress}} in 
this version, so anyone can set the field; suggestions for ways to safeguard it 
are welcome. A minimal usage sketch follows below.
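A minimal usage sketch, assuming the static setter keeps the name from the
comment above and takes the address as a plain String; the exact signature in
the patch may differ.
{code:java}
import org.apache.hadoop.ipc.Client;

public class RouterClientSetup {
  public static void main(String[] args) {
    // Set once per process (e.g. in the Router): the IPC layer then carries
    // the originating client's address on the connection context instead of
    // attaching it to every call, which is how the RPC load is reduced.
    Client.setProxyAddress("client-host.example.com:45678");
  }
}
{code}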

> Add proxy address in IPC connection
> ---
>
> Key: HADOOP-16254
> URL: https://issues.apache.org/jira/browse/HADOOP-16254
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: ipc
>Reporter: He Xiaoqiao
>Assignee: He Xiaoqiao
>Priority: Major
> Attachments: HADOOP-16254.001.patch, HADOOP-16254.002.patch, 
> HADOOP-16254.004.patch
>
>
> In order to support data locality in RBF, we need to add a new field for the
> client hostname in the RPC headers of Router protocol calls.
>  clientHostname represents the hostname of the client and is forwarded by the
> Router to the Namenode to support data locality. See the [RBF Data Locality
> Design|https://issues.apache.org/jira/secure/attachment/12965092/RBF%20Data%20Locality%20Design.pdf]
>  in HDFS-13248 and the [maillist
> vote|http://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201904.mbox/%3CCAF3Ajax7hGxvowg4K_HVTZeDqC5H=3bfb7mv5sz5mgvadhv...@mail.gmail.com%3E].



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-16254) Add proxy address in IPC connection

2019-05-07 Thread He Xiaoqiao (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Xiaoqiao updated HADOOP-16254:
-
Attachment: HADOOP-16254.004.patch

> Add proxy address in IPC connection
> ---
>
> Key: HADOOP-16254
> URL: https://issues.apache.org/jira/browse/HADOOP-16254
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: ipc
>Reporter: He Xiaoqiao
>Assignee: He Xiaoqiao
>Priority: Major
> Attachments: HADOOP-16254.001.patch, HADOOP-16254.002.patch, 
> HADOOP-16254.004.patch
>
>
> In order to support data locality in RBF, we need to add a new field for the
> client hostname in the RPC headers of Router protocol calls.
>  clientHostname represents the hostname of the client and is forwarded by the
> Router to the Namenode to support data locality. See the [RBF Data Locality
> Design|https://issues.apache.org/jira/secure/attachment/12965092/RBF%20Data%20Locality%20Design.pdf]
>  in HDFS-13248 and the [maillist
> vote|http://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201904.mbox/%3CCAF3Ajax7hGxvowg4K_HVTZeDqC5H=3bfb7mv5sz5mgvadhv...@mail.gmail.com%3E].



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-16254) Add proxy address in IPC connection

2019-05-07 Thread He Xiaoqiao (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Xiaoqiao updated HADOOP-16254:
-
Summary: Add proxy address in IPC connection  (was: Add clientHostname to 
RPC header)

> Add proxy address in IPC connection
> ---
>
> Key: HADOOP-16254
> URL: https://issues.apache.org/jira/browse/HADOOP-16254
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: ipc
>Reporter: He Xiaoqiao
>Assignee: He Xiaoqiao
>Priority: Major
> Attachments: HADOOP-16254.001.patch, HADOOP-16254.002.patch, 
> HADOOP-16254.004.patch
>
>
> In order to support data locality in RBF, we need to add a new field for the
> client hostname in the RPC headers of Router protocol calls.
>  clientHostname represents the hostname of the client and is forwarded by the
> Router to the Namenode to support data locality. See the [RBF Data Locality
> Design|https://issues.apache.org/jira/secure/attachment/12965092/RBF%20Data%20Locality%20Design.pdf]
>  in HDFS-13248 and the [maillist
> vote|http://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201904.mbox/%3CCAF3Ajax7hGxvowg4K_HVTZeDqC5H=3bfb7mv5sz5mgvadhv...@mail.gmail.com%3E].



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-16254) Add clientHostname to RPC header

2019-04-17 Thread He Xiaoqiao (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16819802#comment-16819802
 ] 

He Xiaoqiao commented on HADOOP-16254:
--

Thanks [~daryn] and [~vinayrpet] for your further valuable advice.
{quote}Include complete client's socket address instead of just hostname(i.e. 
Hostname/IP:port ). This will help in identifying details about particular 
client if required.
Instead of changing the RPC Request header, add the same field in 
"IpcConnectionContextProto" as suggested by Daryn Sharp in the previous Jira.
Definitely don't want the peer address info passed on every call.
{quote}
That makes sense to me. I fully agree to carry the client's complete socket 
address and move it into "IpcConnectionContextProto"; that way #getRemoteAddress 
does not need to do a DNS resolution, and it also reduces RPC load.
{quote}I was having deja vu seeing this jira. {quote}
Yes, this ticket originates from HDFS-13248; based on further discussion and 
maillist suggestions, I opened this issue.
{quote}Does it allow anyone to spoof addresses?  If I didn't miss a safeguard, 
-1 on this massive security hole.{quote}
Regarding the security vulnerability, I think the exposure is limited. Take RBF 
as an example:
1. The Router server will never use this field even if a client sets it.
2. We can reinforce the check at the RPC layer (treat the parameter as legal 
only if the current user/UGI is a superuser) for the case where a client sets 
proxyHostname and sends the RPC request to the Namenode directly; a sketch of 
such a check follows below.
The current patch is just a draft version, and further suggestions remain 
welcome. Thanks all again.
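A sketch of the safeguard described in point 2, under the assumption that the
check lives server-side where both the connection's UGI and the socket peer
address are available; the method and parameter names are illustrative, not
taken from the patch.
{code:java}
import org.apache.hadoop.security.UserGroupInformation;

public class ProxyAddressGuard {
  /**
   * Honour a client-supplied proxy address only when the caller is the
   * configured superuser; otherwise fall back to the socket peer address.
   */
  static String effectiveClientAddress(String claimedProxyAddress,
                                       UserGroupInformation ugi,
                                       String socketPeerAddress,
                                       String superUser) {
    if (claimedProxyAddress != null
        && ugi.getShortUserName().equals(superUser)) {
      return claimedProxyAddress;
    }
    return socketPeerAddress;
  }
}
{code}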

> Add clientHostname to RPC header
> 
>
> Key: HADOOP-16254
> URL: https://issues.apache.org/jira/browse/HADOOP-16254
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: ipc
>Reporter: He Xiaoqiao
>Assignee: He Xiaoqiao
>Priority: Major
> Attachments: HADOOP-16254.001.patch, HADOOP-16254.002.patch
>
>
> In order to support data locality in RBF, we need to add a new field for the
> client hostname in the RPC headers of Router protocol calls.
>  clientHostname represents the hostname of the client and is forwarded by the
> Router to the Namenode to support data locality. See the [RBF Data Locality
> Design|https://issues.apache.org/jira/secure/attachment/12965092/RBF%20Data%20Locality%20Design.pdf]
>  in HDFS-13248 and the [maillist
> vote|http://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201904.mbox/%3CCAF3Ajax7hGxvowg4K_HVTZeDqC5H=3bfb7mv5sz5mgvadhv...@mail.gmail.com%3E].



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-16254) Add clientHostname to RPC header

2019-04-16 Thread He Xiaoqiao (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16819018#comment-16819018
 ] 

He Xiaoqiao commented on HADOOP-16254:
--

Thanks [~elgoiri]. [^HADOOP-16254.002.patch] renames the new field to the more 
generic 'proxyHostname' following the suggestions, and adds a new test to verify 
the protocol.

> Add clientHostname to RPC header
> 
>
> Key: HADOOP-16254
> URL: https://issues.apache.org/jira/browse/HADOOP-16254
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: ipc
>Reporter: He Xiaoqiao
>Assignee: He Xiaoqiao
>Priority: Major
> Attachments: HADOOP-16254.001.patch, HADOOP-16254.002.patch
>
>
> In order to support data locality in RBF, we need to add a new field for the
> client hostname in the RPC headers of Router protocol calls.
>  clientHostname represents the hostname of the client and is forwarded by the
> Router to the Namenode to support data locality. See the [RBF Data Locality
> Design|https://issues.apache.org/jira/secure/attachment/12965092/RBF%20Data%20Locality%20Design.pdf]
>  in HDFS-13248 and the [maillist
> vote|http://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201904.mbox/%3CCAF3Ajax7hGxvowg4K_HVTZeDqC5H=3bfb7mv5sz5mgvadhv...@mail.gmail.com%3E].



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-16254) Add clientHostname to RPC header

2019-04-16 Thread He Xiaoqiao (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Xiaoqiao updated HADOOP-16254:
-
Attachment: HADOOP-16254.002.patch

> Add clientHostname to RPC header
> 
>
> Key: HADOOP-16254
> URL: https://issues.apache.org/jira/browse/HADOOP-16254
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: ipc
>Reporter: He Xiaoqiao
>Assignee: He Xiaoqiao
>Priority: Major
> Attachments: HADOOP-16254.001.patch, HADOOP-16254.002.patch
>
>
> In order to support data locality in RBF, we need to add a new field for the
> client hostname in the RPC headers of Router protocol calls.
>  clientHostname represents the hostname of the client and is forwarded by the
> Router to the Namenode to support data locality. See the [RBF Data Locality
> Design|https://issues.apache.org/jira/secure/attachment/12965092/RBF%20Data%20Locality%20Design.pdf]
>  in HDFS-13248 and the [maillist
> vote|http://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201904.mbox/%3CCAF3Ajax7hGxvowg4K_HVTZeDqC5H=3bfb7mv5sz5mgvadhv...@mail.gmail.com%3E].



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-16254) Add clientHostname to RPC header

2019-04-15 Thread He Xiaoqiao (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Xiaoqiao updated HADOOP-16254:
-
Attachment: HADOOP-16254.001.patch
Status: Patch Available  (was: Open)

Submitted a draft version of the patch; unit tests will follow later.
cc [~elgoiri], [~ajisakaa], [~ayushtkn], [~vinayrpet], [~giovanni.fumarola], 
please have a look if convenient. Thanks.

> Add clientHostname to RPC header
> 
>
> Key: HADOOP-16254
> URL: https://issues.apache.org/jira/browse/HADOOP-16254
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: ipc
>Reporter: He Xiaoqiao
>Assignee: He Xiaoqiao
>Priority: Major
> Attachments: HADOOP-16254.001.patch
>
>
> In order to support data locality in RBF, we need to add a new field for the
> client hostname in the RPC headers of Router protocol calls.
>  clientHostname represents the hostname of the client and is forwarded by the
> Router to the Namenode to support data locality. See the [RBF Data Locality
> Design|https://issues.apache.org/jira/secure/attachment/12965092/RBF%20Data%20Locality%20Design.pdf]
>  in HDFS-13248 and the [maillist
> vote|http://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201904.mbox/%3CCAF3Ajax7hGxvowg4K_HVTZeDqC5H=3bfb7mv5sz5mgvadhv...@mail.gmail.com%3E].



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-16254) Add clientHostname to RPC header

2019-04-15 Thread He Xiaoqiao (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Xiaoqiao updated HADOOP-16254:
-
Description: 
In order to support data locality in RBF, we need to add a new field for the 
client hostname in the RPC headers of Router protocol calls.
 clientHostname represents the hostname of the client and is forwarded by the 
Router to the Namenode to support data locality. See the [RBF Data Locality 
Design|https://issues.apache.org/jira/secure/attachment/12965092/RBF%20Data%20Locality%20Design.pdf]
 in HDFS-13248 and the [maillist 
vote|http://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201904.mbox/%3CCAF3Ajax7hGxvowg4K_HVTZeDqC5H=3bfb7mv5sz5mgvadhv...@mail.gmail.com%3E].

  was:
In order to support data locality in RBF, we need to add a new field for the 
client hostname in the RPC headers of Router protocol calls.
 clientHostname represents the hostname of the client and is forwarded by the 
Router to the Namenode to support data locality. See [^RBF Data Locality 
Design.pdf] in HDFS-13248 and the [maillist 
vote|http://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201904.mbox/%3CCAF3Ajax7hGxvowg4K_HVTZeDqC5H=3bfb7mv5sz5mgvadhv...@mail.gmail.com%3E].


> Add clientHostname to RPC header
> 
>
> Key: HADOOP-16254
> URL: https://issues.apache.org/jira/browse/HADOOP-16254
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: ipc
>Reporter: He Xiaoqiao
>Assignee: He Xiaoqiao
>Priority: Major
>
> In order to support data locality in RBF, we need to add a new field for the
> client hostname in the RPC headers of Router protocol calls.
>  clientHostname represents the hostname of the client and is forwarded by the
> Router to the Namenode to support data locality. See the [RBF Data Locality
> Design|https://issues.apache.org/jira/secure/attachment/12965092/RBF%20Data%20Locality%20Design.pdf]
>  in HDFS-13248 and the [maillist
> vote|http://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201904.mbox/%3CCAF3Ajax7hGxvowg4K_HVTZeDqC5H=3bfb7mv5sz5mgvadhv...@mail.gmail.com%3E].



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Created] (HADOOP-16254) Add clientHostname to RPC header

2019-04-15 Thread He Xiaoqiao (JIRA)
He Xiaoqiao created HADOOP-16254:


 Summary: Add clientHostname to RPC header
 Key: HADOOP-16254
 URL: https://issues.apache.org/jira/browse/HADOOP-16254
 Project: Hadoop Common
  Issue Type: New Feature
  Components: ipc
Reporter: He Xiaoqiao
Assignee: He Xiaoqiao


In order to support data locality in RBF, we need to add a new field for the 
client hostname in the RPC headers of Router protocol calls.
 clientHostname represents the hostname of the client and is forwarded by the 
Router to the Namenode to support data locality. See [^RBF Data Locality 
Design.pdf] in HDFS-13248 and the [maillist 
vote|http://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201904.mbox/%3CCAF3Ajax7hGxvowg4K_HVTZeDqC5H=3bfb7mv5sz5mgvadhv...@mail.gmail.com%3E].



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-16161) NetworkTopology#getWeightUsingNetworkLocation return unexpected result

2019-04-03 Thread He Xiaoqiao (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16809484#comment-16809484
 ] 

He Xiaoqiao commented on HADOOP-16161:
--

Thanks [~elgoiri] for correcting the {{assertEquals}} usage again; 
[^HADOOP-16161.009.patch] updates that.
Maybe I need to highlight this rule. :)
{quote}assertEquals() should have the expected value as the first parameter and 
the second should be the one being checked{quote}
Thanks again.
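For reference, the convention in JUnit form; this is a generic self-contained
example, not a line from the patch.
{code:java}
import static org.junit.Assert.assertEquals;

public class AssertOrderExample {
  static int weight() {
    return 4;
  }

  @org.junit.Test
  public void testWeight() {
    // Expected value first, actual value second, so a failure message
    // reads "expected:<4> but was:<...>" the right way around.
    assertEquals(4, weight());
  }
}
{code}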

> NetworkTopology#getWeightUsingNetworkLocation return unexpected result
> --
>
> Key: HADOOP-16161
> URL: https://issues.apache.org/jira/browse/HADOOP-16161
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: net
>Reporter: He Xiaoqiao
>Assignee: He Xiaoqiao
>Priority: Major
> Attachments: HADOOP-16161.001.patch, HADOOP-16161.002.patch, 
> HADOOP-16161.003.patch, HADOOP-16161.004.patch, HADOOP-16161.005.patch, 
> HADOOP-16161.006.patch, HADOOP-16161.007.patch, HADOOP-16161.008.patch, 
> HADOOP-16161.009.patch
>
>
> Consider the following scenario:
> 1. There are 4 slaves, with a topology like:
> Rack: /IDC/RACK1
>    hostname1
>    hostname2
> Rack: /IDC/RACK2
>    hostname3
>    hostname4
> 2. A reader on hostname1 calculates the weight between itself and
> [hostname1, hostname3, hostname4] via #getWeight; the corresponding values
> are [0,4,4].
> 3. A reader on a client that is not in the topology, in the same IDC but in
> no rack of the topology, calculates the weight between itself and
> [hostname1, hostname3, hostname4] via #getWeightUsingNetworkLocation; the
> corresponding values are [2,2,2].
> 4. Other readers get similar results.
> The weight result for case #3 is obviously not the expected value; the truth
> is [4,4,4]. This issue may cause a reader to not really follow the order:
> local -> local rack -> remote rack.
> After digging into the implementation, the root cause is that
> #getWeightUsingNetworkLocation only calculates the distance between racks
> rather than between hosts.
> I think we should add the constant 2 to correct the weight returned by
> #getWeightUsingNetworkLocation.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-16161) NetworkTopology#getWeightUsingNetworkLocation return unexpected result

2019-04-03 Thread He Xiaoqiao (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Xiaoqiao updated HADOOP-16161:
-
Attachment: HADOOP-16161.009.patch

> NetworkTopology#getWeightUsingNetworkLocation return unexpected result
> --
>
> Key: HADOOP-16161
> URL: https://issues.apache.org/jira/browse/HADOOP-16161
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: net
>Reporter: He Xiaoqiao
>Assignee: He Xiaoqiao
>Priority: Major
> Attachments: HADOOP-16161.001.patch, HADOOP-16161.002.patch, 
> HADOOP-16161.003.patch, HADOOP-16161.004.patch, HADOOP-16161.005.patch, 
> HADOOP-16161.006.patch, HADOOP-16161.007.patch, HADOOP-16161.008.patch, 
> HADOOP-16161.009.patch
>
>
> Consider the following scenario:
> 1. There are 4 slaves, with a topology like:
> Rack: /IDC/RACK1
>    hostname1
>    hostname2
> Rack: /IDC/RACK2
>    hostname3
>    hostname4
> 2. A reader on hostname1 calculates the weight between itself and
> [hostname1, hostname3, hostname4] via #getWeight; the corresponding values
> are [0,4,4].
> 3. A reader on a client that is not in the topology, in the same IDC but in
> no rack of the topology, calculates the weight between itself and
> [hostname1, hostname3, hostname4] via #getWeightUsingNetworkLocation; the
> corresponding values are [2,2,2].
> 4. Other readers get similar results.
> The weight result for case #3 is obviously not the expected value; the truth
> is [4,4,4]. This issue may cause a reader to not really follow the order:
> local -> local rack -> remote rack.
> After digging into the implementation, the root cause is that
> #getWeightUsingNetworkLocation only calculates the distance between racks
> rather than between hosts.
> I think we should add the constant 2 to correct the weight returned by
> #getWeightUsingNetworkLocation.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-16161) NetworkTopology#getWeightUsingNetworkLocation return unexpected result

2019-04-03 Thread He Xiaoqiao (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16808867#comment-16808867
 ] 

He Xiaoqiao commented on HADOOP-16161:
--

Thanks [~elgoiri]. [^HADOOP-16161.008.patch] updates the unit test and replaces 
all #assertThat usages with assertEquals. Pending Jenkins.

> NetworkTopology#getWeightUsingNetworkLocation return unexpected result
> --
>
> Key: HADOOP-16161
> URL: https://issues.apache.org/jira/browse/HADOOP-16161
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: net
>Reporter: He Xiaoqiao
>Assignee: He Xiaoqiao
>Priority: Major
> Attachments: HADOOP-16161.001.patch, HADOOP-16161.002.patch, 
> HADOOP-16161.003.patch, HADOOP-16161.004.patch, HADOOP-16161.005.patch, 
> HADOOP-16161.006.patch, HADOOP-16161.007.patch, HADOOP-16161.008.patch
>
>
> Consider the following scenario:
> 1. There are 4 slaves, with a topology like:
> Rack: /IDC/RACK1
>    hostname1
>    hostname2
> Rack: /IDC/RACK2
>    hostname3
>    hostname4
> 2. A reader on hostname1 calculates the weight between itself and
> [hostname1, hostname3, hostname4] via #getWeight; the corresponding values
> are [0,4,4].
> 3. A reader on a client that is not in the topology, in the same IDC but in
> no rack of the topology, calculates the weight between itself and
> [hostname1, hostname3, hostname4] via #getWeightUsingNetworkLocation; the
> corresponding values are [2,2,2].
> 4. Other readers get similar results.
> The weight result for case #3 is obviously not the expected value; the truth
> is [4,4,4]. This issue may cause a reader to not really follow the order:
> local -> local rack -> remote rack.
> After digging into the implementation, the root cause is that
> #getWeightUsingNetworkLocation only calculates the distance between racks
> rather than between hosts.
> I think we should add the constant 2 to correct the weight returned by
> #getWeightUsingNetworkLocation.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-16161) NetworkTopology#getWeightUsingNetworkLocation return unexpected result

2019-04-03 Thread He Xiaoqiao (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Xiaoqiao updated HADOOP-16161:
-
Attachment: HADOOP-16161.008.patch

> NetworkTopology#getWeightUsingNetworkLocation return unexpected result
> --
>
> Key: HADOOP-16161
> URL: https://issues.apache.org/jira/browse/HADOOP-16161
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: net
>Reporter: He Xiaoqiao
>Assignee: He Xiaoqiao
>Priority: Major
> Attachments: HADOOP-16161.001.patch, HADOOP-16161.002.patch, 
> HADOOP-16161.003.patch, HADOOP-16161.004.patch, HADOOP-16161.005.patch, 
> HADOOP-16161.006.patch, HADOOP-16161.007.patch, HADOOP-16161.008.patch
>
>
> Consider the following scenario:
> 1. There are 4 slaves, with a topology like:
> Rack: /IDC/RACK1
>    hostname1
>    hostname2
> Rack: /IDC/RACK2
>    hostname3
>    hostname4
> 2. A reader on hostname1 calculates the weight between itself and
> [hostname1, hostname3, hostname4] via #getWeight; the corresponding values
> are [0,4,4].
> 3. A reader on a client that is not in the topology, in the same IDC but in
> no rack of the topology, calculates the weight between itself and
> [hostname1, hostname3, hostname4] via #getWeightUsingNetworkLocation; the
> corresponding values are [2,2,2].
> 4. Other readers get similar results.
> The weight result for case #3 is obviously not the expected value; the truth
> is [4,4,4]. This issue may cause a reader to not really follow the order:
> local -> local rack -> remote rack.
> After digging into the implementation, the root cause is that
> #getWeightUsingNetworkLocation only calculates the distance between racks
> rather than between hosts.
> I think we should add the constant 2 to correct the weight returned by
> #getWeightUsingNetworkLocation.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-16161) NetworkTopology#getWeightUsingNetworkLocation return unexpected result

2019-04-01 Thread He Xiaoqiao (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16806728#comment-16806728
 ] 

He Xiaoqiao commented on HADOOP-16161:
--

Thanks [~elgoiri] for your reviews. [^HADOOP-16161.007.patch] adds a new unit 
test to cover #sortLocatedBlocks.

> NetworkTopology#getWeightUsingNetworkLocation return unexpected result
> --
>
> Key: HADOOP-16161
> URL: https://issues.apache.org/jira/browse/HADOOP-16161
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: net
>Reporter: He Xiaoqiao
>Assignee: He Xiaoqiao
>Priority: Major
> Attachments: HADOOP-16161.001.patch, HADOOP-16161.002.patch, 
> HADOOP-16161.003.patch, HADOOP-16161.004.patch, HADOOP-16161.005.patch, 
> HADOOP-16161.006.patch, HADOOP-16161.007.patch
>
>
> Consider the following scenario:
> 1. There are 4 slaves, with a topology like:
> Rack: /IDC/RACK1
>    hostname1
>    hostname2
> Rack: /IDC/RACK2
>    hostname3
>    hostname4
> 2. A reader on hostname1 calculates the weight between itself and
> [hostname1, hostname3, hostname4] via #getWeight; the corresponding values
> are [0,4,4].
> 3. A reader on a client that is not in the topology, in the same IDC but in
> no rack of the topology, calculates the weight between itself and
> [hostname1, hostname3, hostname4] via #getWeightUsingNetworkLocation; the
> corresponding values are [2,2,2].
> 4. Other readers get similar results.
> The weight result for case #3 is obviously not the expected value; the truth
> is [4,4,4]. This issue may cause a reader to not really follow the order:
> local -> local rack -> remote rack.
> After digging into the implementation, the root cause is that
> #getWeightUsingNetworkLocation only calculates the distance between racks
> rather than between hosts.
> I think we should add the constant 2 to correct the weight returned by
> #getWeightUsingNetworkLocation.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-16161) NetworkTopology#getWeightUsingNetworkLocation return unexpected result

2019-04-01 Thread He Xiaoqiao (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Xiaoqiao updated HADOOP-16161:
-
Attachment: HADOOP-16161.007.patch

> NetworkTopology#getWeightUsingNetworkLocation return unexpected result
> --
>
> Key: HADOOP-16161
> URL: https://issues.apache.org/jira/browse/HADOOP-16161
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: net
>Reporter: He Xiaoqiao
>Assignee: He Xiaoqiao
>Priority: Major
> Attachments: HADOOP-16161.001.patch, HADOOP-16161.002.patch, 
> HADOOP-16161.003.patch, HADOOP-16161.004.patch, HADOOP-16161.005.patch, 
> HADOOP-16161.006.patch, HADOOP-16161.007.patch
>
>
> Consider the following scenario:
> 1. There are 4 slaves, with a topology like:
> Rack: /IDC/RACK1
>    hostname1
>    hostname2
> Rack: /IDC/RACK2
>    hostname3
>    hostname4
> 2. A reader on hostname1 calculates the weight between itself and
> [hostname1, hostname3, hostname4] via #getWeight; the corresponding values
> are [0,4,4].
> 3. A reader on a client that is not in the topology, in the same IDC but in
> no rack of the topology, calculates the weight between itself and
> [hostname1, hostname3, hostname4] via #getWeightUsingNetworkLocation; the
> corresponding values are [2,2,2].
> 4. Other readers get similar results.
> The weight result for case #3 is obviously not the expected value; the truth
> is [4,4,4]. This issue may cause a reader to not really follow the order:
> local -> local rack -> remote rack.
> After digging into the implementation, the root cause is that
> #getWeightUsingNetworkLocation only calculates the distance between racks
> rather than between hosts.
> I think we should add the constant 2 to correct the weight returned by
> #getWeightUsingNetworkLocation.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-16161) NetworkTopology#getWeightUsingNetworkLocation return unexpected result

2019-03-12 Thread He Xiaoqiao (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16791292#comment-16791292
 ] 

He Xiaoqiao commented on HADOOP-16161:
--

[~elgoiri], thanks for digging deep to find the truth.
{quote}The only issue would be if some code had a dependency on particular 
values of the weight.
However, it looks like the test are passing with no issues.{quote}
Actually, #getWeightUsingNetworkLocation is only invoked by getBlockLocations, 
which determines the order of block locations; in a word, 
#getWeightUsingNetworkLocation is about read locality. However, I did not find 
any unit test that verifies the result of #sortLocatedBlocks directly in 
#blockmanagement, so it is understandable that the tests pass under 
#blockmanagement.

> NetworkTopology#getWeightUsingNetworkLocation return unexpected result
> --
>
> Key: HADOOP-16161
> URL: https://issues.apache.org/jira/browse/HADOOP-16161
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: net
>Reporter: He Xiaoqiao
>Assignee: He Xiaoqiao
>Priority: Major
> Attachments: HADOOP-16161.001.patch, HADOOP-16161.002.patch, 
> HADOOP-16161.003.patch, HADOOP-16161.004.patch, HADOOP-16161.005.patch, 
> HADOOP-16161.006.patch
>
>
> Consider the following scenario:
> 1. There are 4 slaves, with a topology like:
> Rack: /IDC/RACK1
>    hostname1
>    hostname2
> Rack: /IDC/RACK2
>    hostname3
>    hostname4
> 2. A reader on hostname1 calculates the weight between itself and
> [hostname1, hostname3, hostname4] via #getWeight; the corresponding values
> are [0,4,4].
> 3. A reader on a client that is not in the topology, in the same IDC but in
> no rack of the topology, calculates the weight between itself and
> [hostname1, hostname3, hostname4] via #getWeightUsingNetworkLocation; the
> corresponding values are [2,2,2].
> 4. Other readers get similar results.
> The weight result for case #3 is obviously not the expected value; the truth
> is [4,4,4]. This issue may cause a reader to not really follow the order:
> local -> local rack -> remote rack.
> After digging into the implementation, the root cause is that
> #getWeightUsingNetworkLocation only calculates the distance between racks
> rather than between hosts.
> I think we should add the constant 2 to correct the weight returned by
> #getWeightUsingNetworkLocation.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-16161) NetworkTopology#getWeightUsingNetworkLocation return unexpected result

2019-03-12 Thread He Xiaoqiao (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16790333#comment-16790333
 ] 

He Xiaoqiao commented on HADOOP-16161:
--

Thanks [~elgoiri], do we need another review?

> NetworkTopology#getWeightUsingNetworkLocation return unexpected result
> --
>
> Key: HADOOP-16161
> URL: https://issues.apache.org/jira/browse/HADOOP-16161
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: net
>Reporter: He Xiaoqiao
>Assignee: He Xiaoqiao
>Priority: Major
> Attachments: HADOOP-16161.001.patch, HADOOP-16161.002.patch, 
> HADOOP-16161.003.patch, HADOOP-16161.004.patch, HADOOP-16161.005.patch, 
> HADOOP-16161.006.patch
>
>
> Consider the following scenario:
> 1. There are 4 slaves, with a topology like:
> Rack: /IDC/RACK1
>    hostname1
>    hostname2
> Rack: /IDC/RACK2
>    hostname3
>    hostname4
> 2. A reader on hostname1 calculates the weight between itself and
> [hostname1, hostname3, hostname4] via #getWeight; the corresponding values
> are [0,4,4].
> 3. A reader on a client that is not in the topology, in the same IDC but in
> no rack of the topology, calculates the weight between itself and
> [hostname1, hostname3, hostname4] via #getWeightUsingNetworkLocation; the
> corresponding values are [2,2,2].
> 4. Other readers get similar results.
> The weight result for case #3 is obviously not the expected value; the truth
> is [4,4,4]. This issue may cause a reader to not really follow the order:
> local -> local rack -> remote rack.
> After digging into the implementation, the root cause is that
> #getWeightUsingNetworkLocation only calculates the distance between racks
> rather than between hosts.
> I think we should add the constant 2 to correct the weight returned by
> #getWeightUsingNetworkLocation.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-16161) NetworkTopology#getWeightUsingNetworkLocation return unexpected result

2019-03-09 Thread He Xiaoqiao (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16788842#comment-16788842
 ] 

He Xiaoqiao commented on HADOOP-16161:
--

Thanks [~elgoiri]. [^HADOOP-16161.006.patch] corrects the format of the unit 
test code. Please help review at your convenience.

> NetworkTopology#getWeightUsingNetworkLocation return unexpected result
> --
>
> Key: HADOOP-16161
> URL: https://issues.apache.org/jira/browse/HADOOP-16161
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: net
>Reporter: He Xiaoqiao
>Assignee: He Xiaoqiao
>Priority: Major
> Attachments: HADOOP-16161.001.patch, HADOOP-16161.002.patch, 
> HADOOP-16161.003.patch, HADOOP-16161.004.patch, HADOOP-16161.005.patch, 
> HADOOP-16161.006.patch
>
>
> Consider the following scenario:
> 1. There are 4 slaves, with a topology like:
> Rack: /IDC/RACK1
>    hostname1
>    hostname2
> Rack: /IDC/RACK2
>    hostname3
>    hostname4
> 2. A reader on hostname1 calculates the weight between itself and
> [hostname1, hostname3, hostname4] via #getWeight; the corresponding values
> are [0,4,4].
> 3. A reader on a client that is not in the topology, in the same IDC but in
> no rack of the topology, calculates the weight between itself and
> [hostname1, hostname3, hostname4] via #getWeightUsingNetworkLocation; the
> corresponding values are [2,2,2].
> 4. Other readers get similar results.
> The weight result for case #3 is obviously not the expected value; the truth
> is [4,4,4]. This issue may cause a reader to not really follow the order:
> local -> local rack -> remote rack.
> After digging into the implementation, the root cause is that
> #getWeightUsingNetworkLocation only calculates the distance between racks
> rather than between hosts.
> I think we should add the constant 2 to correct the weight returned by
> #getWeightUsingNetworkLocation.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-16161) NetworkTopology#getWeightUsingNetworkLocation return unexpected result

2019-03-09 Thread He Xiaoqiao (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Xiaoqiao updated HADOOP-16161:
-
Attachment: HADOOP-16161.006.patch

> NetworkTopology#getWeightUsingNetworkLocation return unexpected result
> --
>
> Key: HADOOP-16161
> URL: https://issues.apache.org/jira/browse/HADOOP-16161
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: net
>Reporter: He Xiaoqiao
>Assignee: He Xiaoqiao
>Priority: Major
> Attachments: HADOOP-16161.001.patch, HADOOP-16161.002.patch, 
> HADOOP-16161.003.patch, HADOOP-16161.004.patch, HADOOP-16161.005.patch, 
> HADOOP-16161.006.patch
>
>
> Consider the following scenario:
> 1. There are 4 slaves, with a topology like:
> Rack: /IDC/RACK1
>    hostname1
>    hostname2
> Rack: /IDC/RACK2
>    hostname3
>    hostname4
> 2. A reader on hostname1 calculates the weight between itself and
> [hostname1, hostname3, hostname4] via #getWeight; the corresponding values
> are [0,4,4].
> 3. A reader on a client that is not in the topology, in the same IDC but in
> no rack of the topology, calculates the weight between itself and
> [hostname1, hostname3, hostname4] via #getWeightUsingNetworkLocation; the
> corresponding values are [2,2,2].
> 4. Other readers get similar results.
> The weight result for case #3 is obviously not the expected value; the truth
> is [4,4,4]. This issue may cause a reader to not really follow the order:
> local -> local rack -> remote rack.
> After digging into the implementation, the root cause is that
> #getWeightUsingNetworkLocation only calculates the distance between racks
> rather than between hosts.
> I think we should add the constant 2 to correct the weight returned by
> #getWeightUsingNetworkLocation.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-16161) NetworkTopology#getWeightUsingNetworkLocation return unexpected result

2019-03-09 Thread He Xiaoqiao (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Xiaoqiao updated HADOOP-16161:
-
Attachment: HADOOP-16161.005.patch

> NetworkTopology#getWeightUsingNetworkLocation return unexpected result
> --
>
> Key: HADOOP-16161
> URL: https://issues.apache.org/jira/browse/HADOOP-16161
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: net
>Reporter: He Xiaoqiao
>Assignee: He Xiaoqiao
>Priority: Major
> Attachments: HADOOP-16161.001.patch, HADOOP-16161.002.patch, 
> HADOOP-16161.003.patch, HADOOP-16161.004.patch, HADOOP-16161.005.patch
>
>
> Consider the following scenario:
> 1. There are 4 slaves and a topology like:
> Rack: /IDC/RACK1
>hostname1
>hostname2
> Rack: /IDC/RACK2
>hostname3
>hostname4
> 2. A reader on hostname1 calculates the weight between itself and [hostname1, 
> hostname3, hostname4] via #getWeight; the corresponding values are [0,4,4].
> 3. A reader on a client that is not in the topology, in the same IDC but in 
> no rack of the topology, calculates the weight between itself and [hostname1, 
> hostname3, hostname4] via #getWeightUsingNetworkLocation; the corresponding 
> values are [2,2,2].
> 4. Other readers get similar results.
> The weight result for case #3 is clearly not the expected value; the correct 
> values are [4,4,4]. This issue may keep a reader from following the intended 
> order: local -> local rack -> remote rack.
> After digging into the implementation, the root cause is that 
> #getWeightUsingNetworkLocation only calculates the distance between racks 
> rather than between hosts.
> I think we should add the constant 2 to correct the weight returned by 
> #getWeightUsingNetworkLocation.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-16161) NetworkTopology#getWeightUsingNetworkLocation return unexpected result

2019-03-09 Thread He Xiaoqiao (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16788685#comment-16788685
 ] 

He Xiaoqiao commented on HADOOP-16161:
--

[~elgoiri], Thanks.
{quote}One minor thing: the assertEquals() in testGetWeight() are still 
reversed.
{quote}
Right, I updated the patch and swapped the assertEquals parameters in 
[^HADOOP-16161.005.patch].
{quote}We could also initialize nodeNotInMap right before using it.
{quote}
Sorry, I don't get the point: the variable #nodeNotInMap is initialized in 
both #testGetWeight and #testGetWeightForDepth before it is used. If you mean 
initializing it in #setup, that may be infeasible since the depth of the 
topology is not the same between the different test cases. Please correct me 
if I am wrong. Thanks again.
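For reference, JUnit's contract is assertEquals(expected, actual). A minimal 
sketch of the corrected argument order (cluster/node names are illustrative 
only, not necessarily those in the patch):
{code:java}
// assertEquals(expected, actual): the expected weight comes first.
assertEquals(0, cluster.getWeight(node1, node1)); // same node
assertEquals(2, cluster.getWeight(node1, node2)); // same rack
assertEquals(4, cluster.getWeight(node1, node3)); // remote rack
{code}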

> NetworkTopology#getWeightUsingNetworkLocation return unexpected result
> --
>
> Key: HADOOP-16161
> URL: https://issues.apache.org/jira/browse/HADOOP-16161
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: net
>Reporter: He Xiaoqiao
>Assignee: He Xiaoqiao
>Priority: Major
> Attachments: HADOOP-16161.001.patch, HADOOP-16161.002.patch, 
> HADOOP-16161.003.patch, HADOOP-16161.004.patch, HADOOP-16161.005.patch
>
>
> Consider the following scenario:
> 1. There are 4 slaves and a topology like:
> Rack: /IDC/RACK1
>hostname1
>hostname2
> Rack: /IDC/RACK2
>hostname3
>hostname4
> 2. A reader on hostname1 calculates the weight between itself and [hostname1, 
> hostname3, hostname4] via #getWeight; the corresponding values are [0,4,4].
> 3. A reader on a client that is not in the topology, in the same IDC but in 
> no rack of the topology, calculates the weight between itself and [hostname1, 
> hostname3, hostname4] via #getWeightUsingNetworkLocation; the corresponding 
> values are [2,2,2].
> 4. Other readers get similar results.
> The weight result for case #3 is clearly not the expected value; the correct 
> values are [4,4,4]. This issue may keep a reader from following the intended 
> order: local -> local rack -> remote rack.
> After digging into the implementation, the root cause is that 
> #getWeightUsingNetworkLocation only calculates the distance between racks 
> rather than between hosts.
> I think we should add the constant 2 to correct the weight returned by 
> #getWeightUsingNetworkLocation.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-16161) NetworkTopology#getWeightUsingNetworkLocation return unexpected result

2019-03-07 Thread He Xiaoqiao (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Xiaoqiao updated HADOOP-16161:
-
Attachment: HADOOP-16161.004.patch

> NetworkTopology#getWeightUsingNetworkLocation return unexpected result
> --
>
> Key: HADOOP-16161
> URL: https://issues.apache.org/jira/browse/HADOOP-16161
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: net
>Reporter: He Xiaoqiao
>Assignee: He Xiaoqiao
>Priority: Major
> Attachments: HADOOP-16161.001.patch, HADOOP-16161.002.patch, 
> HADOOP-16161.003.patch, HADOOP-16161.004.patch
>
>
> Consider the following scenario:
> 1. There are 4 slaves and a topology like:
> Rack: /IDC/RACK1
>hostname1
>hostname2
> Rack: /IDC/RACK2
>hostname3
>hostname4
> 2. A reader on hostname1 calculates the weight between itself and [hostname1, 
> hostname3, hostname4] via #getWeight; the corresponding values are [0,4,4].
> 3. A reader on a client that is not in the topology, in the same IDC but in 
> no rack of the topology, calculates the weight between itself and [hostname1, 
> hostname3, hostname4] via #getWeightUsingNetworkLocation; the corresponding 
> values are [2,2,2].
> 4. Other readers get similar results.
> The weight result for case #3 is clearly not the expected value; the correct 
> values are [4,4,4]. This issue may keep a reader from following the intended 
> order: local -> local rack -> remote rack.
> After digging into the implementation, the root cause is that 
> #getWeightUsingNetworkLocation only calculates the distance between racks 
> rather than between hosts.
> I think we should add the constant 2 to correct the weight returned by 
> #getWeightUsingNetworkLocation.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-16161) NetworkTopology#getWeightUsingNetworkLocation return unexpected result

2019-03-07 Thread He Xiaoqiao (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16787499#comment-16787499
 ] 

He Xiaoqiao commented on HADOOP-16161:
--

Thanks [~elgoiri] for the review. I updated the patch following the comments 
and uploaded [^HADOOP-16161.004.patch].

> NetworkTopology#getWeightUsingNetworkLocation return unexpected result
> --
>
> Key: HADOOP-16161
> URL: https://issues.apache.org/jira/browse/HADOOP-16161
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: net
>Reporter: He Xiaoqiao
>Assignee: He Xiaoqiao
>Priority: Major
> Attachments: HADOOP-16161.001.patch, HADOOP-16161.002.patch, 
> HADOOP-16161.003.patch, HADOOP-16161.004.patch
>
>
> Consider the following scenario:
> 1. There are 4 slaves and a topology like:
> Rack: /IDC/RACK1
>hostname1
>hostname2
> Rack: /IDC/RACK2
>hostname3
>hostname4
> 2. A reader on hostname1 calculates the weight between itself and [hostname1, 
> hostname3, hostname4] via #getWeight; the corresponding values are [0,4,4].
> 3. A reader on a client that is not in the topology, in the same IDC but in 
> no rack of the topology, calculates the weight between itself and [hostname1, 
> hostname3, hostname4] via #getWeightUsingNetworkLocation; the corresponding 
> values are [2,2,2].
> 4. Other readers get similar results.
> The weight result for case #3 is clearly not the expected value; the correct 
> values are [4,4,4]. This issue may keep a reader from following the intended 
> order: local -> local rack -> remote rack.
> After digging into the implementation, the root cause is that 
> #getWeightUsingNetworkLocation only calculates the distance between racks 
> rather than between hosts.
> I think we should add the constant 2 to correct the weight returned by 
> #getWeightUsingNetworkLocation.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-16119) KMS on Hadoop RPC Engine

2019-03-06 Thread He Xiaoqiao (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16786438#comment-16786438
 ] 

He Xiaoqiao commented on HADOOP-16119:
--

[~jojochuang] Thanks for your quick response, and sorry for the fuzzy wording.
{quote}Regarding delegation tokens – delegation tokens are stored in zookeeper, 
and after HADOOP-14445, delegation tokens are shared among KMS instances.{quote}
My branch is based on branch-2.7 without the HADOOP-14445 patch, so that makes 
sense to me. If we only consider the community version (including trunk), it 
seems to offer local storage via the Java KeyStore only, with no other choice; 
please correct me if I am wrong. Looking forward to CKTS being open sourced.
About the "HA" part, I mean that adding or removing a KMS instance, or an 
instance fault, is not transparent to the client. The title "HA" may mislead; 
I think this is also a scalability issue, sorry for that. :)

> KMS on Hadoop RPC Engine
> 
>
> Key: HADOOP-16119
> URL: https://issues.apache.org/jira/browse/HADOOP-16119
> Project: Hadoop Common
>  Issue Type: New Feature
>Reporter: Jonathan Eagles
>Assignee: Wei-Chiu Chuang
>Priority: Major
> Attachments: Design doc_ KMS v2.pdf
>
>
> Per discussion on common-dev and text copied here for ease of reference.
> https://lists.apache.org/thread.html/0e2eeaf07b013f17fad6d362393f53d52041828feec53dcddff04808@%3Ccommon-dev.hadoop.apache.org%3E
> {noformat}
> Thanks all for the inputs,
> To offer additional information (while Daryn is working on his stuff),
> optimizing RPC encryption opens up another possibility: migrating KMS
> service to use Hadoop RPC.
> Today's KMS uses HTTPS + REST API, much like webhdfs. It has very
> undesirable performance (a few thousand ops per second) compared to
> NameNode. Unfortunately for each NameNode namespace operation you also need
> to access KMS too.
> Migrating KMS to Hadoop RPC greatly improves its performance (if
> implemented correctly), and RPC encryption would be a prerequisite. So
> please keep that in mind when discussing the Hadoop RPC encryption
> improvements. Cloudera is very interested to help with the Hadoop RPC
> encryption project because a lot of our customers are using at-rest
> encryption, and some of them are starting to hit KMS performance limit.
> This whole "migrating KMS to Hadoop RPC" was Daryn's idea. I heard this
> idea in the meetup and I am very thrilled to see this happening because it
> is a real issue bothering some of our customers, and I suspect it is the
> right solution to address this tech debt.
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-16161) NetworkTopology#getWeightUsingNetworkLocation return unexpected result

2019-03-06 Thread He Xiaoqiao (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16786402#comment-16786402
 ] 

He Xiaoqiao commented on HADOOP-16161:
--

Thanks [~elgoiri]. Resubmitted [^HADOOP-16161.003.patch] following the review 
comments; pending Jenkins.

> NetworkTopology#getWeightUsingNetworkLocation return unexpected result
> --
>
> Key: HADOOP-16161
> URL: https://issues.apache.org/jira/browse/HADOOP-16161
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: net
>Reporter: He Xiaoqiao
>Assignee: He Xiaoqiao
>Priority: Major
> Attachments: HADOOP-16161.001.patch, HADOOP-16161.002.patch, 
> HADOOP-16161.003.patch
>
>
> Consider the following scenario:
> 1. There are 4 slaves and a topology like:
> Rack: /IDC/RACK1
>hostname1
>hostname2
> Rack: /IDC/RACK2
>hostname3
>hostname4
> 2. A reader on hostname1 calculates the weight between itself and [hostname1, 
> hostname3, hostname4] via #getWeight; the corresponding values are [0,4,4].
> 3. A reader on a client that is not in the topology, in the same IDC but in 
> no rack of the topology, calculates the weight between itself and [hostname1, 
> hostname3, hostname4] via #getWeightUsingNetworkLocation; the corresponding 
> values are [2,2,2].
> 4. Other readers get similar results.
> The weight result for case #3 is clearly not the expected value; the correct 
> values are [4,4,4]. This issue may keep a reader from following the intended 
> order: local -> local rack -> remote rack.
> After digging into the implementation, the root cause is that 
> #getWeightUsingNetworkLocation only calculates the distance between racks 
> rather than between hosts.
> I think we should add the constant 2 to correct the weight returned by 
> #getWeightUsingNetworkLocation.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-16161) NetworkTopology#getWeightUsingNetworkLocation return unexpected result

2019-03-06 Thread He Xiaoqiao (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Xiaoqiao updated HADOOP-16161:
-
Attachment: HADOOP-16161.003.patch

> NetworkTopology#getWeightUsingNetworkLocation return unexpected result
> --
>
> Key: HADOOP-16161
> URL: https://issues.apache.org/jira/browse/HADOOP-16161
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: net
>Reporter: He Xiaoqiao
>Assignee: He Xiaoqiao
>Priority: Major
> Attachments: HADOOP-16161.001.patch, HADOOP-16161.002.patch, 
> HADOOP-16161.003.patch
>
>
> Consider the following scenario:
> 1. There are 4 slaves and a topology like:
> Rack: /IDC/RACK1
>hostname1
>hostname2
> Rack: /IDC/RACK2
>hostname3
>hostname4
> 2. A reader on hostname1 calculates the weight between itself and [hostname1, 
> hostname3, hostname4] via #getWeight; the corresponding values are [0,4,4].
> 3. A reader on a client that is not in the topology, in the same IDC but in 
> no rack of the topology, calculates the weight between itself and [hostname1, 
> hostname3, hostname4] via #getWeightUsingNetworkLocation; the corresponding 
> values are [2,2,2].
> 4. Other readers get similar results.
> The weight result for case #3 is clearly not the expected value; the correct 
> values are [4,4,4]. This issue may keep a reader from following the intended 
> order: local -> local rack -> remote rack.
> After digging into the implementation, the root cause is that 
> #getWeightUsingNetworkLocation only calculates the distance between racks 
> rather than between hosts.
> I think we should add the constant 2 to correct the weight returned by 
> #getWeightUsingNetworkLocation.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HADOOP-16119) KMS on Hadoop RPC Engine

2019-03-06 Thread He Xiaoqiao (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16785610#comment-16785610
 ] 

He Xiaoqiao edited comment on HADOOP-16119 at 3/6/19 1:12 PM:
--

[~jojochuang] I would like to share some issues with the current version of 
KMS that I have met in practice.
 # Scalability: it is currently difficult to scale KMS instances gracefully, 
since delegation tokens and all key data are completely isolated between 
different KMS instances.
 # Transparency: the KMS client has to update its configuration even when just 
one KMS instance is added.
 # HA: KMS instances form a peer-to-peer architecture, but the client has to 
try them one by one until one succeeds if some instance faults; the cost is 
very high.
 # Data Consistency: each KMS instance manages keys in its own isolated Java 
KeyStore, and the KMS client sends create-key requests to all KMS instances 
serially; if one of them fails for some reason, the create request throws an 
exception and the keys in the KeyStores of the different instances will not be 
completely the same. There is also no background consistency check as far as I 
know.
Some of these are also mentioned as exit criteria in [^Design doc_ KMS 
v2.pdf] by [~jojochuang].

In one word, I think the core issue is that there is no shared storage between 
the different instances.
I propose to create a pluggable ShareStore (file/DBMS/ZooKeeper) behind the 
KMS instances and make KMS stateless. It seems to work well using shared 
storage, with RBF as a reference. On the other hand, we can also retain the 
cache mechanism to improve performance.
 [~jojochuang] please do correct me if I am wrong.


was (Author: hexiaoqiao):
[~jojochuang] I would like to share some issues with the current version of 
KMS that I have met in practice.
 # Scalability: it is currently difficult to scale KMS instances gracefully, 
since delegation tokens and all key data are completely isolated between 
different KMS instances.
 # Transparency: the KMS client has to update its configuration even when just 
one KMS instance is added.
 # HA: KMS instances form a peer-to-peer architecture, but the client has to 
try them one by one until one succeeds if some instance faults; the cost is 
very high.
 # Data Consistency: each KMS instance manages keys in its own isolated Java 
KeyStore, and the KMS client sends create-key requests to all KMS instances 
serially; if one of them fails for some reason, the create request throws an 
exception and the keys in the KeyStores of the different instances will not be 
completely the same. There is also no background consistency check as far as I 
know.

In one word, I think the core issue is that there is no shared storage between 
the different instances.
 I propose to create a pluggable ShareStore (file/DBMS/ZooKeeper) behind the 
KMS instances and make KMS stateless. It seems to work well using shared 
storage, with RBF as a reference. On the other hand, we can also retain the 
cache mechanism to improve performance.
 [~jojochuang] please do correct me if I am wrong.

> KMS on Hadoop RPC Engine
> 
>
> Key: HADOOP-16119
> URL: https://issues.apache.org/jira/browse/HADOOP-16119
> Project: Hadoop Common
>  Issue Type: New Feature
>Reporter: Jonathan Eagles
>Assignee: Wei-Chiu Chuang
>Priority: Major
> Attachments: Design doc_ KMS v2.pdf
>
>
> Per discussion on common-dev and text copied here for ease of reference.
> https://lists.apache.org/thread.html/0e2eeaf07b013f17fad6d362393f53d52041828feec53dcddff04808@%3Ccommon-dev.hadoop.apache.org%3E
> {noformat}
> Thanks all for the inputs,
> To offer additional information (while Daryn is working on his stuff),
> optimizing RPC encryption opens up another possibility: migrating KMS
> service to use Hadoop RPC.
> Today's KMS uses HTTPS + REST API, much like webhdfs. It has very
> undesirable performance (a few thousand ops per second) compared to
> NameNode. Unfortunately for each NameNode namespace operation you also need
> to access KMS too.
> Migrating KMS to Hadoop RPC greatly improves its performance (if
> implemented correctly), and RPC encryption would be a prerequisite. So
> please keep that in mind when discussing the Hadoop RPC encryption
> improvements. Cloudera is very interested to help with the Hadoop RPC
> encryption project because a lot of our customers are using at-rest
> encryption, and some of them are starting to hit KMS performance limit.
> This whole "migrating KMS to Hadoop RPC" was Daryn's idea. I heard this
> idea in the meetup and I am very thrilled to see this happening because it
> is a real issue bothering some of our customers, and I suspect it is the
> right solution to address this tech debt.
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-16119) KMS on Hadoop RPC Engine

2019-03-06 Thread He Xiaoqiao (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16785610#comment-16785610
 ] 

He Xiaoqiao commented on HADOOP-16119:
--

[~jojochuang] I would like to share some issues with the current version of 
KMS that I have met in practice.
 # Scalability: it is currently difficult to scale KMS instances gracefully, 
since delegation tokens and all key data are completely isolated between 
different KMS instances.
 # Transparency: the KMS client has to update its configuration even when just 
one KMS instance is added.
 # HA: KMS instances form a peer-to-peer architecture, but the client has to 
try them one by one until one succeeds if some instance faults; the cost is 
very high.
 # Data Consistency: each KMS instance manages keys in its own isolated Java 
KeyStore, and the KMS client sends create-key requests to all KMS instances 
serially; if one of them fails for some reason, the create request throws an 
exception and the keys in the KeyStores of the different instances will not be 
completely the same. There is also no background consistency check as far as I 
know.

In one word, I think the core issue is that there is no shared storage between 
the different instances.
 I propose to create a pluggable ShareStore (file/DBMS/ZooKeeper) behind the 
KMS instances and make KMS stateless. It seems to work well using shared 
storage, with RBF as a reference. On the other hand, we can also retain the 
cache mechanism to improve performance.
 [~jojochuang] please do correct me if I am wrong.
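A minimal sketch of the pluggable shared-store idea (the interface and all 
names below are hypothetical, not an existing Hadoop API):
{code:java}
import java.io.IOException;

/**
 * Hypothetical plugin point: KMS instances stay stateless and delegate key
 * material and delegation-token state to a shared backend (file, DBMS, or
 * ZooKeeper), similar in spirit to the RBF state store.
 */
public interface ShareStore {
  /** Load key material for the given key name and version. */
  byte[] getKeyMaterial(String keyName, int version) throws IOException;

  /** Persist key material so it is visible to all KMS instances. */
  void putKeyMaterial(String keyName, int version, byte[] material)
      throws IOException;

  /** Persist a delegation token so any instance can renew or cancel it. */
  void putDelegationToken(String tokenId, byte[] token) throws IOException;

  /** Look up a delegation token issued by any instance. */
  byte[] getDelegationToken(String tokenId) throws IOException;
}
{code}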

> KMS on Hadoop RPC Engine
> 
>
> Key: HADOOP-16119
> URL: https://issues.apache.org/jira/browse/HADOOP-16119
> Project: Hadoop Common
>  Issue Type: New Feature
>Reporter: Jonathan Eagles
>Assignee: Wei-Chiu Chuang
>Priority: Major
> Attachments: Design doc_ KMS v2.pdf
>
>
> Per discussion on common-dev and text copied here for ease of reference.
> https://lists.apache.org/thread.html/0e2eeaf07b013f17fad6d362393f53d52041828feec53dcddff04808@%3Ccommon-dev.hadoop.apache.org%3E
> {noformat}
> Thanks all for the inputs,
> To offer additional information (while Daryn is working on his stuff),
> optimizing RPC encryption opens up another possibility: migrating KMS
> service to use Hadoop RPC.
> Today's KMS uses HTTPS + REST API, much like webhdfs. It has very
> undesirable performance (a few thousand ops per second) compared to
> NameNode. Unfortunately for each NameNode namespace operation you also need
> to access KMS too.
> Migrating KMS to Hadoop RPC greatly improves its performance (if
> implemented correctly), and RPC encryption would be a prerequisite. So
> please keep that in mind when discussing the Hadoop RPC encryption
> improvements. Cloudera is very interested to help with the Hadoop RPC
> encryption project because a lot of our customers are using at-rest
> encryption, and some of them are starting to hit KMS performance limit.
> This whole "migrating KMS to Hadoop RPC" was Daryn's idea. I heard this
> idea in the meetup and I am very thrilled to see this happening because it
> is a real issue bothering some of our customers, and I suspect it is the
> right solution to address this tech debt.
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-16161) NetworkTopology#getWeightUsingNetworkLocation return unexpected result

2019-03-05 Thread He Xiaoqiao (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16784754#comment-16784754
 ] 

He Xiaoqiao commented on HADOOP-16161:
--

Uploaded [^HADOOP-16161.002.patch], which adds a more complex topology to 
test deeper-level nodes.
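A rough sketch of such a deeper topology (setup simplified; the actual test 
code in the patch may differ):
{code:java}
// Four-level locations exercise getWeightUsingNetworkLocation beyond the
// simple /IDC/RACK case.
NetworkTopology cluster = NetworkTopology.getInstance(new Configuration());
Node host1 = new NodeBase("host1", "/dc1/region1/rack1");
Node host2 = new NodeBase("host2", "/dc1/region1/rack2");
Node host3 = new NodeBase("host3", "/dc1/region2/rack3");
cluster.add(host1);
cluster.add(host2);
cluster.add(host3);
// A reader known only by its network location, i.e. not in the topology map:
Node reader = new NodeBase("client", "/dc1/region1/rack1");
// Expected weights after the fix: same rack -> 2, same region but different
// rack -> 4, different region -> 6.
{code}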

> NetworkTopology#getWeightUsingNetworkLocation return unexpected result
> --
>
> Key: HADOOP-16161
> URL: https://issues.apache.org/jira/browse/HADOOP-16161
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: net
>Reporter: He Xiaoqiao
>Assignee: He Xiaoqiao
>Priority: Major
> Attachments: HADOOP-16161.001.patch, HADOOP-16161.002.patch
>
>
> Consider the following scenario:
> 1. There are 4 slaves and a topology like:
> Rack: /IDC/RACK1
>hostname1
>hostname2
> Rack: /IDC/RACK2
>hostname3
>hostname4
> 2. A reader on hostname1 calculates the weight between itself and [hostname1, 
> hostname3, hostname4] via #getWeight; the corresponding values are [0,4,4].
> 3. A reader on a client that is not in the topology, in the same IDC but in 
> no rack of the topology, calculates the weight between itself and [hostname1, 
> hostname3, hostname4] via #getWeightUsingNetworkLocation; the corresponding 
> values are [2,2,2].
> 4. Other readers get similar results.
> The weight result for case #3 is clearly not the expected value; the correct 
> values are [4,4,4]. This issue may keep a reader from following the intended 
> order: local -> local rack -> remote rack.
> After digging into the implementation, the root cause is that 
> #getWeightUsingNetworkLocation only calculates the distance between racks 
> rather than between hosts.
> I think we should add the constant 2 to correct the weight returned by 
> #getWeightUsingNetworkLocation.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-16161) NetworkTopology#getWeightUsingNetworkLocation return unexpected result

2019-03-05 Thread He Xiaoqiao (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Xiaoqiao updated HADOOP-16161:
-
Attachment: HADOOP-16161.002.patch

> NetworkTopology#getWeightUsingNetworkLocation return unexpected result
> --
>
> Key: HADOOP-16161
> URL: https://issues.apache.org/jira/browse/HADOOP-16161
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: net
>Reporter: He Xiaoqiao
>Assignee: He Xiaoqiao
>Priority: Major
> Attachments: HADOOP-16161.001.patch, HADOOP-16161.002.patch
>
>
> Consider the following scenario:
> 1. There are 4 slaves and a topology like:
> Rack: /IDC/RACK1
>hostname1
>hostname2
> Rack: /IDC/RACK2
>hostname3
>hostname4
> 2. A reader on hostname1 calculates the weight between itself and [hostname1, 
> hostname3, hostname4] via #getWeight; the corresponding values are [0,4,4].
> 3. A reader on a client that is not in the topology, in the same IDC but in 
> no rack of the topology, calculates the weight between itself and [hostname1, 
> hostname3, hostname4] via #getWeightUsingNetworkLocation; the corresponding 
> values are [2,2,2].
> 4. Other readers get similar results.
> The weight result for case #3 is clearly not the expected value; the correct 
> values are [4,4,4]. This issue may keep a reader from following the intended 
> order: local -> local rack -> remote rack.
> After digging into the implementation, the root cause is that 
> #getWeightUsingNetworkLocation only calculates the distance between racks 
> rather than between hosts.
> I think we should add the constant 2 to correct the weight returned by 
> #getWeightUsingNetworkLocation.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-16161) NetworkTopology#getWeightUsingNetworkLocation return unexpected result

2019-03-05 Thread He Xiaoqiao (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16784734#comment-16784734
 ] 

He Xiaoqiao commented on HADOOP-16161:
--

[~elgoiri], thanks for your continued attention. In my testing it covers any 
depth of topology. I would like to add more unit tests later to cover your 
concerns.

> NetworkTopology#getWeightUsingNetworkLocation return unexpected result
> --
>
> Key: HADOOP-16161
> URL: https://issues.apache.org/jira/browse/HADOOP-16161
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: net
>Reporter: He Xiaoqiao
>Assignee: He Xiaoqiao
>Priority: Major
> Attachments: HADOOP-16161.001.patch
>
>
> Consider the following scenario:
> 1. There are 4 slaves and a topology like:
> Rack: /IDC/RACK1
>hostname1
>hostname2
> Rack: /IDC/RACK2
>hostname3
>hostname4
> 2. A reader on hostname1 calculates the weight between itself and [hostname1, 
> hostname3, hostname4] via #getWeight; the corresponding values are [0,4,4].
> 3. A reader on a client that is not in the topology, in the same IDC but in 
> no rack of the topology, calculates the weight between itself and [hostname1, 
> hostname3, hostname4] via #getWeightUsingNetworkLocation; the corresponding 
> values are [2,2,2].
> 4. Other readers get similar results.
> The weight result for case #3 is clearly not the expected value; the correct 
> values are [4,4,4]. This issue may keep a reader from following the intended 
> order: local -> local rack -> remote rack.
> After digging into the implementation, the root cause is that 
> #getWeightUsingNetworkLocation only calculates the distance between racks 
> rather than between hosts.
> I think we should add the constant 2 to correct the weight returned by 
> #getWeightUsingNetworkLocation.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-16161) NetworkTopology#getWeightUsingNetworkLocation return unexpected result

2019-03-05 Thread He Xiaoqiao (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16784467#comment-16784467
 ] 

He Xiaoqiao commented on HADOOP-16161:
--

I would like to offer some more comments about this issue. The following is 
the complete #getWeightUsingNetworkLocation code (used only for non-datanode 
clients), based on branch trunk.
a. Normalize the network locations of both the reader and the datanode. The 
results are the *rack locations*, marked readerPath and nodePath, which are 
the parents of the reader and the datanode; both are calculated by the rack 
awareness script if one is configured.
b. Split both network locations by slash, then take the smaller of the two 
level counts.
c. Find the deepest node that is the common ancestor/parent of the network 
locations from step a.
d. Based on step c, calculate the topology distance between readerPath and 
nodePath.

All of the above steps are correct, but the result is the distance between the 
parent of the reader and the parent of the node, rather than from the reader 
to the node. So I think adding a +2 can fix this issue. Discussion is welcome, 
and please help correct me if something is wrong.

{code:java}
  private static int getWeightUsingNetworkLocation(Node reader, Node node) {
//Start off by initializing to Integer.MAX_VALUE
int weight = Integer.MAX_VALUE;
if(reader != null && node != null) {
  String readerPath = normalizeNetworkLocationPath(
  reader.getNetworkLocation());
  String nodePath = normalizeNetworkLocationPath(
  node.getNetworkLocation());

  //same rack
  if(readerPath.equals(nodePath)) {
if(reader.getName().equals(node.getName())) {
  weight = 0;
} else {
  weight = 2;
}
  } else {
String[] readerPathToken = readerPath.split(PATH_SEPARATOR_STR);
String[] nodePathToken = nodePath.split(PATH_SEPARATOR_STR);
int maxLevelToCompare = readerPathToken.length > nodePathToken.length ?
nodePathToken.length : readerPathToken.length;
int currentLevel = 1;
//traverse through the path and calculate the distance
while(currentLevel < maxLevelToCompare) {
  if(!readerPathToken[currentLevel]
  .equals(nodePathToken[currentLevel])){
break;
  }
  currentLevel++;
}
weight = (readerPathToken.length - currentLevel) +
(nodePathToken.length - currentLevel);
  }
}
return weight;
  }
{code}
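A minimal sketch of the proposed correction (the idea only, not necessarily 
the final patch): since readerPath and nodePath above are the rack locations, 
i.e. the parents of the reader and the node, the host-to-host weight is the 
rack-to-rack distance plus one hop on each side:
{code:java}
// Sketch of the +2 idea: the loop above computes the distance between the
// two rack locations, so add one hop from the reader down to its rack and
// one hop from the node down to its rack.
weight = (readerPathToken.length - currentLevel) +
    (nodePathToken.length - currentLevel) + 2;
{code}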

> NetworkTopology#getWeightUsingNetworkLocation return unexpected result
> --
>
> Key: HADOOP-16161
> URL: https://issues.apache.org/jira/browse/HADOOP-16161
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: net
>Reporter: He Xiaoqiao
>Assignee: He Xiaoqiao
>Priority: Major
> Attachments: HADOOP-16161.001.patch
>
>
> Consider the following scenario:
> 1. There are 4 slaves and a topology like:
> Rack: /IDC/RACK1
>hostname1
>hostname2
> Rack: /IDC/RACK2
>hostname3
>hostname4
> 2. A reader on hostname1 calculates the weight between itself and [hostname1, 
> hostname3, hostname4] via #getWeight; the corresponding values are [0,4,4].
> 3. A reader on a client that is not in the topology, in the same IDC but in 
> no rack of the topology, calculates the weight between itself and [hostname1, 
> hostname3, hostname4] via #getWeightUsingNetworkLocation; the corresponding 
> values are [2,2,2].
> 4. Other readers get similar results.
> The weight result for case #3 is clearly not the expected value; the correct 
> values are [4,4,4]. This issue may keep a reader from following the intended 
> order: local -> local rack -> remote rack.
> After digging into the implementation, the root cause is that 
> #getWeightUsingNetworkLocation only calculates the distance between racks 
> rather than between hosts.
> I think we should add the constant 2 to correct the weight returned by 
> #getWeightUsingNetworkLocation.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-16161) NetworkTopology#getWeightUsingNetworkLocation return unexpected result

2019-03-04 Thread He Xiaoqiao (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16783600#comment-16783600
 ] 

He Xiaoqiao commented on HADOOP-16161:
--

[~elgoiri], I think it is not related to the depth of the topology, since 
#getWeightUsingNetworkLocation does not calculate using only the leaves of 
the topology. FYI.

> NetworkTopology#getWeightUsingNetworkLocation return unexpected result
> --
>
> Key: HADOOP-16161
> URL: https://issues.apache.org/jira/browse/HADOOP-16161
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: net
>Reporter: He Xiaoqiao
>Assignee: He Xiaoqiao
>Priority: Major
> Attachments: HADOOP-16161.001.patch
>
>
> Consider the following scenario:
> 1. There are 4 slaves and a topology like:
> Rack: /IDC/RACK1
>hostname1
>hostname2
> Rack: /IDC/RACK2
>hostname3
>hostname4
> 2. A reader on hostname1 calculates the weight between itself and [hostname1, 
> hostname3, hostname4] via #getWeight; the corresponding values are [0,4,4].
> 3. A reader on a client that is not in the topology, in the same IDC but in 
> no rack of the topology, calculates the weight between itself and [hostname1, 
> hostname3, hostname4] via #getWeightUsingNetworkLocation; the corresponding 
> values are [2,2,2].
> 4. Other readers get similar results.
> The weight result for case #3 is clearly not the expected value; the correct 
> values are [4,4,4]. This issue may keep a reader from following the intended 
> order: local -> local rack -> remote rack.
> After digging into the implementation, the root cause is that 
> #getWeightUsingNetworkLocation only calculates the distance between racks 
> rather than between hosts.
> I think we should add the constant 2 to correct the weight returned by 
> #getWeightUsingNetworkLocation.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Moved] (HADOOP-16161) NetworkTopology#getWeightUsingNetworkLocation return unexpected result

2019-03-04 Thread He Xiaoqiao (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Xiaoqiao moved HDFS-14332 to HADOOP-16161:
-

Component/s: (was: namenode)
 net
Key: HADOOP-16161  (was: HDFS-14332)
Project: Hadoop Common  (was: Hadoop HDFS)

> NetworkTopology#getWeightUsingNetworkLocation return unexpected result
> --
>
> Key: HADOOP-16161
> URL: https://issues.apache.org/jira/browse/HADOOP-16161
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: net
>Reporter: He Xiaoqiao
>Assignee: He Xiaoqiao
>Priority: Major
> Attachments: HADOOP-16161.001.patch, HDFS-14332.001.patch
>
>
> Consider the following scenario:
> 1. There are 4 slaves and a topology like:
> Rack: /IDC/RACK1
>hostname1
>hostname2
> Rack: /IDC/RACK2
>hostname3
>hostname4
> 2. A reader on hostname1 calculates the weight between itself and [hostname1, 
> hostname3, hostname4] via #getWeight; the corresponding values are [0,4,4].
> 3. A reader on a client that is not in the topology, in the same IDC but in 
> no rack of the topology, calculates the weight between itself and [hostname1, 
> hostname3, hostname4] via #getWeightUsingNetworkLocation; the corresponding 
> values are [2,2,2].
> 4. Other readers get similar results.
> The weight result for case #3 is clearly not the expected value; the correct 
> values are [4,4,4]. This issue may keep a reader from following the intended 
> order: local -> local rack -> remote rack.
> After digging into the implementation, the root cause is that 
> #getWeightUsingNetworkLocation only calculates the distance between racks 
> rather than between hosts.
> I think we should add the constant 2 to correct the weight returned by 
> #getWeightUsingNetworkLocation.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-16161) NetworkTopology#getWeightUsingNetworkLocation return unexpected result

2019-03-04 Thread He Xiaoqiao (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Xiaoqiao updated HADOOP-16161:
-
Attachment: (was: HDFS-14332.001.patch)

> NetworkTopology#getWeightUsingNetworkLocation return unexpected result
> --
>
> Key: HADOOP-16161
> URL: https://issues.apache.org/jira/browse/HADOOP-16161
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: net
>Reporter: He Xiaoqiao
>Assignee: He Xiaoqiao
>Priority: Major
> Attachments: HADOOP-16161.001.patch
>
>
> Consider the following scenario:
> 1. There are 4 slaves and a topology like:
> Rack: /IDC/RACK1
>hostname1
>hostname2
> Rack: /IDC/RACK2
>hostname3
>hostname4
> 2. A reader on hostname1 calculates the weight between itself and [hostname1, 
> hostname3, hostname4] via #getWeight; the corresponding values are [0,4,4].
> 3. A reader on a client that is not in the topology, in the same IDC but in 
> no rack of the topology, calculates the weight between itself and [hostname1, 
> hostname3, hostname4] via #getWeightUsingNetworkLocation; the corresponding 
> values are [2,2,2].
> 4. Other readers get similar results.
> The weight result for case #3 is clearly not the expected value; the correct 
> values are [4,4,4]. This issue may keep a reader from following the intended 
> order: local -> local rack -> remote rack.
> After digging into the implementation, the root cause is that 
> #getWeightUsingNetworkLocation only calculates the distance between racks 
> rather than between hosts.
> I think we should add the constant 2 to correct the weight returned by 
> #getWeightUsingNetworkLocation.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-16161) NetworkTopology#getWeightUsingNetworkLocation return unexpected result

2019-03-04 Thread He Xiaoqiao (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16783380#comment-16783380
 ] 

He Xiaoqiao commented on HADOOP-16161:
--

Moved from project HDFS to COMMON and renamed the patch.

> NetworkTopology#getWeightUsingNetworkLocation return unexpected result
> --
>
> Key: HADOOP-16161
> URL: https://issues.apache.org/jira/browse/HADOOP-16161
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: net
>Reporter: He Xiaoqiao
>Assignee: He Xiaoqiao
>Priority: Major
> Attachments: HADOOP-16161.001.patch
>
>
> Consider the following scenario:
> 1. There are 4 slaves and a topology like:
> Rack: /IDC/RACK1
>hostname1
>hostname2
> Rack: /IDC/RACK2
>hostname3
>hostname4
> 2. A reader on hostname1 calculates the weight between itself and [hostname1, 
> hostname3, hostname4] via #getWeight; the corresponding values are [0,4,4].
> 3. A reader on a client that is not in the topology, in the same IDC but in 
> no rack of the topology, calculates the weight between itself and [hostname1, 
> hostname3, hostname4] via #getWeightUsingNetworkLocation; the corresponding 
> values are [2,2,2].
> 4. Other readers get similar results.
> The weight result for case #3 is clearly not the expected value; the correct 
> values are [4,4,4]. This issue may keep a reader from following the intended 
> order: local -> local rack -> remote rack.
> After digging into the implementation, the root cause is that 
> #getWeightUsingNetworkLocation only calculates the distance between racks 
> rather than between hosts.
> I think we should add the constant 2 to correct the weight returned by 
> #getWeightUsingNetworkLocation.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-16161) NetworkTopology#getWeightUsingNetworkLocation return unexpected result

2019-03-04 Thread He Xiaoqiao (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Xiaoqiao updated HADOOP-16161:
-
Attachment: HADOOP-16161.001.patch

> NetworkTopology#getWeightUsingNetworkLocation return unexpected result
> --
>
> Key: HADOOP-16161
> URL: https://issues.apache.org/jira/browse/HADOOP-16161
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: net
>Reporter: He Xiaoqiao
>Assignee: He Xiaoqiao
>Priority: Major
> Attachments: HADOOP-16161.001.patch, HDFS-14332.001.patch
>
>
> Consider the following scenario:
> 1. There are 4 slaves and a topology like:
> Rack: /IDC/RACK1
>hostname1
>hostname2
> Rack: /IDC/RACK2
>hostname3
>hostname4
> 2. A reader on hostname1 calculates the weight between itself and [hostname1, 
> hostname3, hostname4] via #getWeight; the corresponding values are [0,4,4].
> 3. A reader on a client that is not in the topology, in the same IDC but in 
> no rack of the topology, calculates the weight between itself and [hostname1, 
> hostname3, hostname4] via #getWeightUsingNetworkLocation; the corresponding 
> values are [2,2,2].
> 4. Other readers get similar results.
> The weight result for case #3 is clearly not the expected value; the correct 
> values are [4,4,4]. This issue may keep a reader from following the intended 
> order: local -> local rack -> remote rack.
> After digging into the implementation, the root cause is that 
> #getWeightUsingNetworkLocation only calculates the distance between racks 
> rather than between hosts.
> I think we should add the constant 2 to correct the weight returned by 
> #getWeightUsingNetworkLocation.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15864) Job submitter / executor fail when SBN domain name can not resolved

2019-03-02 Thread He Xiaoqiao (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Xiaoqiao updated HADOOP-15864:
-
Attachment: HADOOP-15864.005.patch
Status: Patch Available  (was: Reopened)

Re-uploaded [^HADOOP-15864.005.patch] (same as the last one), just to 
trigger Jenkins.

> Job submitter / executor fail when SBN domain name can not resolved
> ---
>
> Key: HADOOP-15864
> URL: https://issues.apache.org/jira/browse/HADOOP-15864
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: He Xiaoqiao
>Assignee: He Xiaoqiao
>Priority: Critical
> Fix For: 3.0.4, 3.3.0, 3.2.1, 3.1.2
>
> Attachments: HADOOP-15864-branch.2.7.001.patch, 
> HADOOP-15864-branch.2.7.002.patch, HADOOP-15864.003.patch, 
> HADOOP-15864.004.patch, HADOOP-15864.005.patch, 
> HADOOP-15864.branch.2.7.004.patch
>
>
> Job submission and task execution fail if the Standby NameNode domain name 
> cannot be resolved on HDFS HA with the DelegationToken feature.
> This issue is triggered when creating a {{ConfiguredFailoverProxyProvider}} 
> instance, which invokes {{HAUtil.cloneDelegationTokenForLogicalUri}} in HA 
> mode with security. Since in HDFS HA mode the UGI needs to include a separate 
> token for each NameNode in order to handle the Active-Standby switch, the two 
> tokens' content is of course the same.
> However, #setTokenService in {{HAUtil.cloneDelegationTokenForLogicalUri}} 
> checks whether the address of the NameNode has been resolved; if not, it 
> throws an #IllegalArgumentException and the job submitter / task executor 
> fails.
> HDFS-8068 and HADOOP-12125 tried to fix this, but I don't think those two 
> tickets resolve it completely.
> Another question many people ask is: why can the NameNode domain name not be 
> resolved? I think there are many scenarios, for instance replacing a node 
> after a fault, or refreshing DNS. In any case, a Standby NameNode failure 
> should not impact Hadoop cluster stability, in my opinion.
> a. code ref: org.apache.hadoop.security.SecurityUtil line373-386
> {code:java}
>   public static Text buildTokenService(InetSocketAddress addr) {
> String host = null;
> if (useIpForTokenService) {
>   if (addr.isUnresolved()) { // host has no ip address
> throw new IllegalArgumentException(
> new UnknownHostException(addr.getHostName())
> );
>   }
>   host = addr.getAddress().getHostAddress();
> } else {
>   host = StringUtils.toLowerCase(addr.getHostName());
> }
> return new Text(host + ":" + addr.getPort());
>   }
> {code}
> b.exception log ref:
> {code:xml}
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: Couldn't create proxy provider class 
> org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider
> at 
> org.apache.hadoop.hdfs.NameNodeProxies.createFailoverProxyProvider(NameNodeProxies.java:515)
> at 
> org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:170)
> at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:761)
> at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:691)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:150)
> at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2713)
> at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:93)
> at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2747)
> at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2729)
> at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:385)
> at 
> org.apache.hadoop.fs.viewfs.ChRootedFileSystem.<init>(ChRootedFileSystem.java:106)
> at 
> org.apache.hadoop.fs.viewfs.ViewFileSystem$1.getTargetFileSystem(ViewFileSystem.java:178)
> at 
> org.apache.hadoop.fs.viewfs.ViewFileSystem$1.getTargetFileSystem(ViewFileSystem.java:172)
> at org.apache.hadoop.fs.viewfs.InodeTree.createLink(InodeTree.java:303)
> at org.apache.hadoop.fs.viewfs.InodeTree.<init>(InodeTree.java:377)
> at 
> org.apache.hadoop.fs.viewfs.ViewFileSystem$1.<init>(ViewFileSystem.java:172)
> at 
> org.apache.hadoop.fs.viewfs.ViewFileSystem.initialize(ViewFileSystem.java:172)
> at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2713)
> at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:93)
> at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2747)
> at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2729)
> at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:385)
> at 

[jira] [Updated] (HADOOP-15883) Fix WebHdfsFileSystemContract test

2019-03-01 Thread He Xiaoqiao (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Xiaoqiao updated HADOOP-15883:
-
Resolution: Won't Fix
Status: Resolved  (was: Patch Available)

> Fix WebHdfsFileSystemContract test
> --
>
> Key: HADOOP-15883
> URL: https://issues.apache.org/jira/browse/HADOOP-15883
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 3.0.4, 3.3.0, 3.1.2, 3.2.1
>Reporter: He Xiaoqiao
>Assignee: He Xiaoqiao
>Priority: Major
> Attachments: HADOOP-15883.001.patch
>
>
> HADOOP-15864 fixed a bug where jobs/tasks fail to execute when a server 
> (NameNode, KMS, Timeline) domain name cannot be resolved. Meanwhile, it 
> changed the semantics of the HTTP status codes in WebHdfsFileSystem; this 
> ticket tracks fixing TestWebHdfsFileSystemContract#testResponseCode.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15864) Job submitter / executor fail when SBN domain name can not resolved

2019-03-01 Thread He Xiaoqiao (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16781888#comment-16781888
 ] 

He Xiaoqiao commented on HADOOP-15864:
--

Uploaded a new patch, [^HADOOP-15864.004.patch]. I try to fix this issue 
behind a new configuration switch so that the fix does not affect other unit 
tests.
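A sketch of the idea (the property name below is hypothetical, not the final 
key in the patch):
{code:xml}
<property>
  <name>hadoop.security.token.service.tolerate.unresolved</name>
  <value>true</value>
  <description>
    Hypothetical switch: when true, build the token service from the
    hostname even if it cannot be resolved, instead of throwing, so an
    unresolvable Standby NameNode does not fail job submission. The default
    of false keeps the current strict semantics for existing unit tests.
  </description>
</property>
{code}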

> Job submitter / executor fail when SBN domain name can not resolved
> ---
>
> Key: HADOOP-15864
> URL: https://issues.apache.org/jira/browse/HADOOP-15864
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: He Xiaoqiao
>Assignee: He Xiaoqiao
>Priority: Critical
> Fix For: 3.0.4, 3.3.0, 3.1.2, 3.2.1
>
> Attachments: HADOOP-15864-branch.2.7.001.patch, 
> HADOOP-15864-branch.2.7.002.patch, HADOOP-15864.003.patch, 
> HADOOP-15864.004.patch, HADOOP-15864.branch.2.7.004.patch
>
>
> Job submission and task execution fail if the Standby NameNode domain name 
> cannot be resolved on HDFS HA with the DelegationToken feature.
> This issue is triggered when creating a {{ConfiguredFailoverProxyProvider}} 
> instance, which invokes {{HAUtil.cloneDelegationTokenForLogicalUri}} in HA 
> mode with security. Since in HDFS HA mode the UGI needs to include a separate 
> token for each NameNode in order to handle the Active-Standby switch, the two 
> tokens' content is of course the same.
> However, #setTokenService in {{HAUtil.cloneDelegationTokenForLogicalUri}} 
> checks whether the address of the NameNode has been resolved; if not, it 
> throws an #IllegalArgumentException and the job submitter / task executor 
> fails.
> HDFS-8068 and HADOOP-12125 tried to fix this, but I don't think those two 
> tickets resolve it completely.
> Another question many people ask is: why can the NameNode domain name not be 
> resolved? I think there are many scenarios, for instance replacing a node 
> after a fault, or refreshing DNS. In any case, a Standby NameNode failure 
> should not impact Hadoop cluster stability, in my opinion.
> a. code ref: org.apache.hadoop.security.SecurityUtil line373-386
> {code:java}
>   public static Text buildTokenService(InetSocketAddress addr) {
> String host = null;
> if (useIpForTokenService) {
>   if (addr.isUnresolved()) { // host has no ip address
> throw new IllegalArgumentException(
> new UnknownHostException(addr.getHostName())
> );
>   }
>   host = addr.getAddress().getHostAddress();
> } else {
>   host = StringUtils.toLowerCase(addr.getHostName());
> }
> return new Text(host + ":" + addr.getPort());
>   }
> {code}
> b.exception log ref:
> {code:xml}
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: Couldn't create proxy provider class 
> org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider
> at 
> org.apache.hadoop.hdfs.NameNodeProxies.createFailoverProxyProvider(NameNodeProxies.java:515)
> at 
> org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:170)
> at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:761)
> at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:691)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:150)
> at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2713)
> at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:93)
> at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2747)
> at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2729)
> at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:385)
> at 
> org.apache.hadoop.fs.viewfs.ChRootedFileSystem.<init>(ChRootedFileSystem.java:106)
> at 
> org.apache.hadoop.fs.viewfs.ViewFileSystem$1.getTargetFileSystem(ViewFileSystem.java:178)
> at 
> org.apache.hadoop.fs.viewfs.ViewFileSystem$1.getTargetFileSystem(ViewFileSystem.java:172)
> at org.apache.hadoop.fs.viewfs.InodeTree.createLink(InodeTree.java:303)
> at org.apache.hadoop.fs.viewfs.InodeTree.<init>(InodeTree.java:377)
> at 
> org.apache.hadoop.fs.viewfs.ViewFileSystem$1.<init>(ViewFileSystem.java:172)
> at 
> org.apache.hadoop.fs.viewfs.ViewFileSystem.initialize(ViewFileSystem.java:172)
> at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2713)
> at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:93)
> at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2747)
> at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2729)
> at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:385)
> at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:176)
> at 

[jira] [Updated] (HADOOP-15864) Job submitter / executor fail when SBN domain name can not resolved

2019-03-01 Thread He Xiaoqiao (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Xiaoqiao updated HADOOP-15864:
-
Attachment: HADOOP-15864.004.patch

> Job submitter / executor fail when SBN domain name can not resolved
> ---
>
> Key: HADOOP-15864
> URL: https://issues.apache.org/jira/browse/HADOOP-15864
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: He Xiaoqiao
>Assignee: He Xiaoqiao
>Priority: Critical
> Fix For: 3.0.4, 3.3.0, 3.1.2, 3.2.1
>
> Attachments: HADOOP-15864-branch.2.7.001.patch, 
> HADOOP-15864-branch.2.7.002.patch, HADOOP-15864.003.patch, 
> HADOOP-15864.004.patch, HADOOP-15864.branch.2.7.004.patch
>
>
> Job submission and task execution fail if the Standby NameNode domain name 
> cannot be resolved on HDFS HA with the DelegationToken feature.
> This issue is triggered when creating a {{ConfiguredFailoverProxyProvider}} 
> instance, which invokes {{HAUtil.cloneDelegationTokenForLogicalUri}} in HA 
> mode with security. Since in HDFS HA mode the UGI needs to include a separate 
> token for each NameNode in order to handle the Active-Standby switch, the two 
> tokens' content is of course the same.
> However, #setTokenService in {{HAUtil.cloneDelegationTokenForLogicalUri}} 
> checks whether the address of the NameNode has been resolved; if not, it 
> throws an #IllegalArgumentException and the job submitter / task executor 
> fails.
> HDFS-8068 and HADOOP-12125 tried to fix this, but I don't think those two 
> tickets resolve it completely.
> Another question many people ask is: why can the NameNode domain name not be 
> resolved? I think there are many scenarios, for instance replacing a node 
> after a fault, or refreshing DNS. In any case, a Standby NameNode failure 
> should not impact Hadoop cluster stability, in my opinion.
> a. code ref: org.apache.hadoop.security.SecurityUtil line373-386
> {code:java}
>   public static Text buildTokenService(InetSocketAddress addr) {
> String host = null;
> if (useIpForTokenService) {
>   if (addr.isUnresolved()) { // host has no ip address
> throw new IllegalArgumentException(
> new UnknownHostException(addr.getHostName())
> );
>   }
>   host = addr.getAddress().getHostAddress();
> } else {
>   host = StringUtils.toLowerCase(addr.getHostName());
> }
> return new Text(host + ":" + addr.getPort());
>   }
> {code}
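
For illustration, a minimal JDK-only sketch of the condition that trips the 
throwing branch above (the host name is hypothetical):

{code:java}
import java.net.InetSocketAddress;

public class UnresolvedSbnDemo {
  public static void main(String[] args) {
    // Simulates a Standby NameNode whose DNS entry is gone
    // ("standby-nn.example.com" is a made-up host).
    InetSocketAddress addr =
        InetSocketAddress.createUnresolved("standby-nn.example.com", 8020);
    // buildTokenService() above takes the throwing branch exactly when
    // useIpForTokenService is set and this prints true.
    System.out.println("isUnresolved = " + addr.isUnresolved()); // true
  }
}
{code}
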
> b. exception log ref:
> {code:xml}
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: Couldn't create proxy provider class 
> org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider
> at 
> org.apache.hadoop.hdfs.NameNodeProxies.createFailoverProxyProvider(NameNodeProxies.java:515)
> at 
> org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:170)
> at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:761)
> at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:691)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:150)
> at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2713)
> at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:93)
> at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2747)
> at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2729)
> at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:385)
> at 
> org.apache.hadoop.fs.viewfs.ChRootedFileSystem.<init>(ChRootedFileSystem.java:106)
> at 
> org.apache.hadoop.fs.viewfs.ViewFileSystem$1.getTargetFileSystem(ViewFileSystem.java:178)
> at 
> org.apache.hadoop.fs.viewfs.ViewFileSystem$1.getTargetFileSystem(ViewFileSystem.java:172)
> at org.apache.hadoop.fs.viewfs.InodeTree.createLink(InodeTree.java:303)
> at org.apache.hadoop.fs.viewfs.InodeTree.<init>(InodeTree.java:377)
> at 
> org.apache.hadoop.fs.viewfs.ViewFileSystem$1.<init>(ViewFileSystem.java:172)
> at 
> org.apache.hadoop.fs.viewfs.ViewFileSystem.initialize(ViewFileSystem.java:172)
> at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2713)
> at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:93)
> at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2747)
> at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2729)
> at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:385)
> at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:176)
> at org.apache.hadoop.mapred.JobConf.getWorkingDirectory(JobConf.java:665)
> ... 35 more
> Caused by: java.lang.reflect.InvocationTargetException
> at 

[jira] [Commented] (HADOOP-16119) KMS on Hadoop RPC Engine

2019-02-28 Thread He Xiaoqiao (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16781284#comment-16781284
 ] 

He Xiaoqiao commented on HADOOP-16119:
--

Thanks [~jojochuang], this is interesting work. I have been running KMS to 
support large-scale column encryption for a long time, so a KMS performance 
improvement really appeals to me; I would like to join and contribute to this 
work.

> KMS on Hadoop RPC Engine
> 
>
> Key: HADOOP-16119
> URL: https://issues.apache.org/jira/browse/HADOOP-16119
> Project: Hadoop Common
>  Issue Type: New Feature
>Reporter: Jonathan Eagles
>Assignee: Wei-Chiu Chuang
>Priority: Major
> Attachments: Design doc_ KMS v2.pdf
>
>
> Per discussion on common-dev and text copied here for ease of reference.
> https://lists.apache.org/thread.html/0e2eeaf07b013f17fad6d362393f53d52041828feec53dcddff04808@%3Ccommon-dev.hadoop.apache.org%3E
> {noformat}
> Thanks all for the inputs,
> To offer additional information (while Daryn is working on his stuff),
> optimizing RPC encryption opens up another possibility: migrating KMS
> service to use Hadoop RPC.
> Today's KMS uses HTTPS + REST API, much like webhdfs. It has very
> undesirable performance (a few thousand ops per second) compared to
> NameNode. Unfortunately for each NameNode namespace operation you also need
> to access KMS too.
> Migrating KMS to Hadoop RPC greatly improves its performance (if
> implemented correctly), and RPC encryption would be a prerequisite. So
> please keep that in mind when discussing the Hadoop RPC encryption
> improvements. Cloudera is very interested to help with the Hadoop RPC
> encryption project because a lot of our customers are using at-rest
> encryption, and some of them are starting to hit KMS performance limit.
> This whole "migrating KMS to Hadoop RPC" was Daryn's idea. I heard this
> idea in the meetup and I am very thrilled to see this happening because it
> is a real issue bothering some of our customers, and I suspect it is the
> right solution to address this tech debt.
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15922) DelegationTokenAuthenticationFilter get wrong doAsUser since it does not decode URL

2019-01-09 Thread He Xiaoqiao (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16739049#comment-16739049
 ] 

He Xiaoqiao commented on HADOOP-15922:
--

[~daryn], [~eyang] Thanks for pushing this issue forward, and sorry for the 
late response. Please let me know if there is something I missed.

> DelegationTokenAuthenticationFilter get wrong doAsUser since it does not 
> decode URL
> ---
>
> Key: HADOOP-15922
> URL: https://issues.apache.org/jira/browse/HADOOP-15922
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: common, kms
>Reporter: He Xiaoqiao
>Assignee: He Xiaoqiao
>Priority: Major
> Fix For: 3.3.0, 3.1.2, 3.2.1
>
> Attachments: HADOOP-15922.001.patch, HADOOP-15922.002.patch, 
> HADOOP-15922.003.patch, HADOOP-15922.004.patch, HADOOP-15922.005.patch, 
> HADOOP-15922.006.patch, HADOOP-15922.007.patch
>
>
> DelegationTokenAuthenticationFilter gets the wrong doAsUser when the proxy 
> user supplied by the client is a complete Kerberos name (e.g., 
> user/hostn...@realm.com, which is actually acceptable), because 
> DelegationTokenAuthenticationFilter does not decode the DOAS parameter in 
> the URL, which the client encoded with {{URLEncoder}}.
> Take KMS as an example:
> a. KMSClientProvider creates a connection to the KMS server using 
> DelegationTokenAuthenticatedURL#openConnection.
> b. If KMSClientProvider acts for a doAs user, it puts {{doas}} with the 
> URL-encoded user as one parameter of the HTTP request:
> {code:java}
> // proxyuser
> if (doAs != null) {
>   extraParams.put(DO_AS, URLEncoder.encode(doAs, "UTF-8"));
> }
> {code}
> c. When the KMS server receives the request, it does not decode the proxy 
> user.
> As a result, the KMS server gets the wrong proxy user whenever the proxy 
> user is a complete Kerberos name or includes special characters, and 
> authentication and authorization exceptions follow.
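
A JDK-only sketch of the encode/decode round trip described above (the 
principal is a made-up example; this illustrates the idea, not the patch 
itself): the server must decode the parameter exactly once to recover the 
original name.

{code:java}
import java.net.URLDecoder;
import java.net.URLEncoder;

public class DoAsRoundTrip {
  public static void main(String[] args) throws Exception {
    String doAs = "user/hostname@REALM.COM";           // full Kerberos name
    String onWire = URLEncoder.encode(doAs, "UTF-8");  // what the client sends
    System.out.println(onWire);             // user%2Fhostname%40REALM.COM
    // Without a server-side decode, the filter sees the encoded form and
    // the proxy-user lookup fails; one decode restores the original name.
    String decoded = URLDecoder.decode(onWire, "UTF-8");
    System.out.println(decoded.equals(doAs));          // true
  }
}
{code}
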



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15922) DelegationTokenAuthenticationFilter get wrong doAsUser since it does not decode URL

2019-01-05 Thread He Xiaoqiao (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16734894#comment-16734894
 ] 

He Xiaoqiao commented on HADOOP-15922:
--

I checked the failed junit test (hadoop.security.ssl.TestSSLFactory) and 
found it has failed several times before; it is probably not related to the 
patch. [~eyang] Please help to double-check.

> DelegationTokenAuthenticationFilter get wrong doAsUser since it does not 
> decode URL
> ---
>
> Key: HADOOP-15922
> URL: https://issues.apache.org/jira/browse/HADOOP-15922
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: common, kms
>Reporter: He Xiaoqiao
>Assignee: He Xiaoqiao
>Priority: Major
> Fix For: 3.3.0, 3.1.2, 3.2.1
>
> Attachments: HADOOP-15922.001.patch, HADOOP-15922.002.patch, 
> HADOOP-15922.003.patch, HADOOP-15922.004.patch, HADOOP-15922.005.patch, 
> HADOOP-15922.006.patch, HADOOP-15922.007.patch
>
>
> DelegationTokenAuthenticationFilter gets the wrong doAsUser when the proxy 
> user supplied by the client is a complete Kerberos name (e.g., 
> user/hostn...@realm.com, which is actually acceptable), because 
> DelegationTokenAuthenticationFilter does not decode the DOAS parameter in 
> the URL, which the client encoded with {{URLEncoder}}.
> Take KMS as an example:
> a. KMSClientProvider creates a connection to the KMS server using 
> DelegationTokenAuthenticatedURL#openConnection.
> b. If KMSClientProvider acts for a doAs user, it puts {{doas}} with the 
> URL-encoded user as one parameter of the HTTP request:
> {code:java}
> // proxyuser
> if (doAs != null) {
>   extraParams.put(DO_AS, URLEncoder.encode(doAs, "UTF-8"));
> }
> {code}
> c. When the KMS server receives the request, it does not decode the proxy 
> user.
> As a result, the KMS server gets the wrong proxy user whenever the proxy 
> user is a complete Kerberos name or includes special characters, and 
> authentication and authorization exceptions follow.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15922) DelegationTokenAuthenticationFilter get wrong doAsUser since it does not decode URL

2019-01-05 Thread He Xiaoqiao (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16734852#comment-16734852
 ] 

He Xiaoqiao commented on HADOOP-15922:
--

[~eyang] Thank you for pushing HADOOP-15996 and this issue forward.
I updated and uploaded v007, adapting the MIT auth_to_local mechanism based on 
HADOOP-15996.

> DelegationTokenAuthenticationFilter get wrong doAsUser since it does not 
> decode URL
> ---
>
> Key: HADOOP-15922
> URL: https://issues.apache.org/jira/browse/HADOOP-15922
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: common, kms
>Reporter: He Xiaoqiao
>Assignee: He Xiaoqiao
>Priority: Major
> Fix For: 3.3.0, 3.1.2, 3.2.1
>
> Attachments: HADOOP-15922.001.patch, HADOOP-15922.002.patch, 
> HADOOP-15922.003.patch, HADOOP-15922.004.patch, HADOOP-15922.005.patch, 
> HADOOP-15922.006.patch, HADOOP-15922.007.patch
>
>
> DelegationTokenAuthenticationFilter gets the wrong doAsUser when the proxy 
> user supplied by the client is a complete Kerberos name (e.g., 
> user/hostn...@realm.com, which is actually acceptable), because 
> DelegationTokenAuthenticationFilter does not decode the DOAS parameter in 
> the URL, which the client encoded with {{URLEncoder}}.
> Take KMS as an example:
> a. KMSClientProvider creates a connection to the KMS server using 
> DelegationTokenAuthenticatedURL#openConnection.
> b. If KMSClientProvider acts for a doAs user, it puts {{doas}} with the 
> URL-encoded user as one parameter of the HTTP request:
> {code:java}
> // proxyuser
> if (doAs != null) {
>   extraParams.put(DO_AS, URLEncoder.encode(doAs, "UTF-8"));
> }
> {code}
> c. When the KMS server receives the request, it does not decode the proxy 
> user.
> As a result, the KMS server gets the wrong proxy user whenever the proxy 
> user is a complete Kerberos name or includes special characters, and 
> authentication and authorization exceptions follow.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15922) DelegationTokenAuthenticationFilter get wrong doAsUser since it does not decode URL

2019-01-05 Thread He Xiaoqiao (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Xiaoqiao updated HADOOP-15922:
-
Attachment: HADOOP-15922.007.patch

> DelegationTokenAuthenticationFilter get wrong doAsUser since it does not 
> decode URL
> ---
>
> Key: HADOOP-15922
> URL: https://issues.apache.org/jira/browse/HADOOP-15922
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: common, kms
>Reporter: He Xiaoqiao
>Assignee: He Xiaoqiao
>Priority: Major
> Fix For: 3.3.0, 3.1.2, 3.2.1
>
> Attachments: HADOOP-15922.001.patch, HADOOP-15922.002.patch, 
> HADOOP-15922.003.patch, HADOOP-15922.004.patch, HADOOP-15922.005.patch, 
> HADOOP-15922.006.patch, HADOOP-15922.007.patch
>
>
> DelegationTokenAuthenticationFilter gets the wrong doAsUser when the proxy 
> user supplied by the client is a complete Kerberos name (e.g., 
> user/hostn...@realm.com, which is actually acceptable), because 
> DelegationTokenAuthenticationFilter does not decode the DOAS parameter in 
> the URL, which the client encoded with {{URLEncoder}}.
> Take KMS as an example:
> a. KMSClientProvider creates a connection to the KMS server using 
> DelegationTokenAuthenticatedURL#openConnection.
> b. If KMSClientProvider acts for a doAs user, it puts {{doas}} with the 
> URL-encoded user as one parameter of the HTTP request:
> {code:java}
> // proxyuser
> if (doAs != null) {
>   extraParams.put(DO_AS, URLEncoder.encode(doAs, "UTF-8"));
> }
> {code}
> c. When the KMS server receives the request, it does not decode the proxy 
> user.
> As a result, the KMS server gets the wrong proxy user whenever the proxy 
> user is a complete Kerberos name or includes special characters, and 
> authentication and authorization exceptions follow.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15922) DelegationTokenAuthenticationFilter get wrong doAsUser since it does not decode URL

2019-01-05 Thread He Xiaoqiao (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Xiaoqiao updated HADOOP-15922:
-
Attachment: (was: HADOOP-15922.007.patch)

> DelegationTokenAuthenticationFilter get wrong doAsUser since it does not 
> decode URL
> ---
>
> Key: HADOOP-15922
> URL: https://issues.apache.org/jira/browse/HADOOP-15922
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: common, kms
>Reporter: He Xiaoqiao
>Assignee: He Xiaoqiao
>Priority: Major
> Fix For: 3.3.0, 3.1.2, 3.2.1
>
> Attachments: HADOOP-15922.001.patch, HADOOP-15922.002.patch, 
> HADOOP-15922.003.patch, HADOOP-15922.004.patch, HADOOP-15922.005.patch, 
> HADOOP-15922.006.patch
>
>
> DelegationTokenAuthenticationFilter gets the wrong doAsUser when the proxy 
> user supplied by the client is a complete Kerberos name (e.g., 
> user/hostn...@realm.com, which is actually acceptable), because 
> DelegationTokenAuthenticationFilter does not decode the DOAS parameter in 
> the URL, which the client encoded with {{URLEncoder}}.
> Take KMS as an example:
> a. KMSClientProvider creates a connection to the KMS server using 
> DelegationTokenAuthenticatedURL#openConnection.
> b. If KMSClientProvider acts for a doAs user, it puts {{doas}} with the 
> URL-encoded user as one parameter of the HTTP request:
> {code:java}
> // proxyuser
> if (doAs != null) {
>   extraParams.put(DO_AS, URLEncoder.encode(doAs, "UTF-8"));
> }
> {code}
> c. When the KMS server receives the request, it does not decode the proxy 
> user.
> As a result, the KMS server gets the wrong proxy user whenever the proxy 
> user is a complete Kerberos name or includes special characters, and 
> authentication and authorization exceptions follow.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15922) DelegationTokenAuthenticationFilter get wrong doAsUser since it does not decode URL

2019-01-05 Thread He Xiaoqiao (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Xiaoqiao updated HADOOP-15922:
-
Attachment: HADOOP-15922.007.patch

> DelegationTokenAuthenticationFilter get wrong doAsUser since it does not 
> decode URL
> ---
>
> Key: HADOOP-15922
> URL: https://issues.apache.org/jira/browse/HADOOP-15922
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: common, kms
>Reporter: He Xiaoqiao
>Assignee: He Xiaoqiao
>Priority: Major
> Fix For: 3.3.0, 3.1.2, 3.2.1
>
> Attachments: HADOOP-15922.001.patch, HADOOP-15922.002.patch, 
> HADOOP-15922.003.patch, HADOOP-15922.004.patch, HADOOP-15922.005.patch, 
> HADOOP-15922.006.patch, HADOOP-15922.007.patch
>
>
> DelegationTokenAuthenticationFilter gets the wrong doAsUser when the proxy 
> user supplied by the client is a complete Kerberos name (e.g., 
> user/hostn...@realm.com, which is actually acceptable), because 
> DelegationTokenAuthenticationFilter does not decode the DOAS parameter in 
> the URL, which the client encoded with {{URLEncoder}}.
> Take KMS as an example:
> a. KMSClientProvider creates a connection to the KMS server using 
> DelegationTokenAuthenticatedURL#openConnection.
> b. If KMSClientProvider acts for a doAs user, it puts {{doas}} with the 
> URL-encoded user as one parameter of the HTTP request:
> {code:java}
> // proxyuser
> if (doAs != null) {
>   extraParams.put(DO_AS, URLEncoder.encode(doAs, "UTF-8"));
> }
> {code}
> c. When the KMS server receives the request, it does not decode the proxy 
> user.
> As a result, the KMS server gets the wrong proxy user whenever the proxy 
> user is a complete Kerberos name or includes special characters, and 
> authentication and authorization exceptions follow.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15959) revert HADOOP-12751

2018-12-02 Thread He Xiaoqiao (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16706764#comment-16706764
 ] 

He Xiaoqiao commented on HADOOP-15959:
--

hi [~ste...@apache.org], [~ajisakaa], IIUC this is a common issue: after the 
revert of HADOOP-12751, any auth_to_local rule whose result includes '/' or 
'@' will always fail the check in KerberosName#apply, which throws a 
NoMatchingRule exception.
{quote}if (result != null && nonSimplePattern.matcher(result).find()) {
  throw new NoMatchingRule("Non-simple name " + result + " after auth_to_local 
rule " + this);
}{quote}
Another case is referenced in HADOOP-15922.
Please take a look. +1.

> revert HADOOP-12751
> ---
>
> Key: HADOOP-15959
> URL: https://issues.apache.org/jira/browse/HADOOP-15959
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: security
>Affects Versions: 3.2.0, 3.1.1, 2.9.2, 3.0.3, 2.7.7, 2.8.5
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
> Fix For: 3.2.0, 2.7.8, 3.0.4, 3.1.2, 2.8.6, 2.9.3
>
> Attachments: HADOOP-15959-001.patch, HADOOP-15959-branch-2-002.patch, 
> HADOOP-15959-branch-2.7-003.patch
>
>
> HADOOP-12751 doesn't quite work right. Revert.
> (this patch is so jenkins can do the test runs)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15922) DelegationTokenAuthenticationFilter get wrong doAsUser since it does not decode URL

2018-12-02 Thread He Xiaoqiao (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16706737#comment-16706737
 ] 

He Xiaoqiao commented on HADOOP-15922:
--

[~eyang] Thanks for your feedback.
Actually, if we use 'foo/localhost' as the impersonated user in the unit 
test, KerberosName cannot pass the check after the revert of HADOOP-12751, 
since KerberosName#apply checks whether the resulting user name includes '/' 
or '@' when applying a rule like 'RULE:[2:$1/$2]', and throws because of the 
revert.
{quote}  if (result != null && nonSimplePattern.matcher(result).find()) {
throw new NoMatchingRule("Non-simple name " + result +
 " after auth_to_local rule " + this);
  }{quote}
With HADOOP-12751 in place this check only logged at INFO and did not throw.
IIUC this is a common issue: with auth_to_local, any rule whose result 
includes '/' or '@' always throws (see the sketch below). FYI.
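
A standalone sketch of the check quoted above (assuming, as described, that 
nonSimplePattern matches '/' or '@'; the mapped name is illustrative):

{code:java}
import java.util.regex.Pattern;

public class NonSimpleNameDemo {
  // Mirrors the nonSimplePattern check in KerberosName#apply quoted above.
  private static final Pattern NON_SIMPLE = Pattern.compile("[/@]");

  public static void main(String[] args) {
    // A rule like RULE:[2:$1/$2] maps client/host@REALM to "client/host".
    String result = "client/host";
    if (NON_SIMPLE.matcher(result).find()) {
      // After the revert of HADOOP-12751 this is where NoMatchingRule is
      // thrown; with HADOOP-12751 in place it was only logged.
      System.out.println("Non-simple name " + result
          + " after auth_to_local rule");
    }
  }
}
{code}
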

> DelegationTokenAuthenticationFilter get wrong doAsUser since it does not 
> decode URL
> ---
>
> Key: HADOOP-15922
> URL: https://issues.apache.org/jira/browse/HADOOP-15922
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: common, kms
>Reporter: He Xiaoqiao
>Assignee: He Xiaoqiao
>Priority: Major
> Fix For: 3.3.0, 3.1.2, 3.2.1
>
> Attachments: HADOOP-15922.001.patch, HADOOP-15922.002.patch, 
> HADOOP-15922.003.patch, HADOOP-15922.004.patch, HADOOP-15922.005.patch, 
> HADOOP-15922.006.patch
>
>
> DelegationTokenAuthenticationFilter gets the wrong doAsUser when the proxy 
> user supplied by the client is a complete Kerberos name (e.g., 
> user/hostn...@realm.com, which is actually acceptable), because 
> DelegationTokenAuthenticationFilter does not decode the DOAS parameter in 
> the URL, which the client encoded with {{URLEncoder}}.
> Take KMS as an example:
> a. KMSClientProvider creates a connection to the KMS server using 
> DelegationTokenAuthenticatedURL#openConnection.
> b. If KMSClientProvider acts for a doAs user, it puts {{doas}} with the 
> URL-encoded user as one parameter of the HTTP request:
> {code:java}
> // proxyuser
> if (doAs != null) {
>   extraParams.put(DO_AS, URLEncoder.encode(doAs, "UTF-8"));
> }
> {code}
> c. When the KMS server receives the request, it does not decode the proxy 
> user.
> As a result, the KMS server gets the wrong proxy user whenever the proxy 
> user is a complete Kerberos name or includes special characters, and 
> authentication and authorization exceptions follow.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15922) DelegationTokenAuthenticationFilter get wrong doAsUser since it does not decode URL

2018-12-02 Thread He Xiaoqiao (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16706207#comment-16706207
 ] 

He Xiaoqiao commented on HADOOP-15922:
--

[~eyang] I updated and re-uploaded v006, which uses the special character '%' 
in place of '/' in TestKMS#testGetDelegationTokenByProxyUser compared with 
v005. After the revert of HADOOP-12751 we would have to configure complex 
auth_to_local rules to pass authentication with '/'. Since the goal is only 
to check that the client does not double-encode the doAs user name, I chose 
another special character, '%', and avoided introducing auth_to_local rules 
(see the sketch below). FYI.
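
For reference, a JDK-only illustration of why '%' makes a good probe 
character: if the client accidentally encodes twice, a single server-side 
decode no longer yields the original name (a sketch, not the test itself):

{code:java}
import java.net.URLDecoder;
import java.net.URLEncoder;

public class DoubleEncodeProbe {
  public static void main(String[] args) throws Exception {
    String doAs = "foo%bar";                          // contains the probe '%'
    String once = URLEncoder.encode(doAs, "UTF-8");   // foo%25bar
    String twice = URLEncoder.encode(once, "UTF-8");  // foo%2525bar
    // A single server-side decode recovers the name only if the client
    // encoded exactly once.
    System.out.println(URLDecoder.decode(once, "UTF-8").equals(doAs));  // true
    System.out.println(URLDecoder.decode(twice, "UTF-8").equals(doAs)); // false
  }
}
{code}
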

> DelegationTokenAuthenticationFilter get wrong doAsUser since it does not 
> decode URL
> ---
>
> Key: HADOOP-15922
> URL: https://issues.apache.org/jira/browse/HADOOP-15922
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: common, kms
>Reporter: He Xiaoqiao
>Assignee: He Xiaoqiao
>Priority: Major
> Fix For: 3.3.0, 3.1.2, 3.2.1
>
> Attachments: HADOOP-15922.001.patch, HADOOP-15922.002.patch, 
> HADOOP-15922.003.patch, HADOOP-15922.004.patch, HADOOP-15922.005.patch, 
> HADOOP-15922.006.patch
>
>
> DelegationTokenAuthenticationFilter gets the wrong doAsUser when the proxy 
> user supplied by the client is a complete Kerberos name (e.g., 
> user/hostn...@realm.com, which is actually acceptable), because 
> DelegationTokenAuthenticationFilter does not decode the DOAS parameter in 
> the URL, which the client encoded with {{URLEncoder}}.
> Take KMS as an example:
> a. KMSClientProvider creates a connection to the KMS server using 
> DelegationTokenAuthenticatedURL#openConnection.
> b. If KMSClientProvider acts for a doAs user, it puts {{doas}} with the 
> URL-encoded user as one parameter of the HTTP request:
> {code:java}
> // proxyuser
> if (doAs != null) {
>   extraParams.put(DO_AS, URLEncoder.encode(doAs, "UTF-8"));
> }
> {code}
> c. When the KMS server receives the request, it does not decode the proxy 
> user.
> As a result, the KMS server gets the wrong proxy user whenever the proxy 
> user is a complete Kerberos name or includes special characters, and 
> authentication and authorization exceptions follow.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15922) DelegationTokenAuthenticationFilter get wrong doAsUser since it does not decode URL

2018-12-02 Thread He Xiaoqiao (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Xiaoqiao updated HADOOP-15922:
-
Attachment: HADOOP-15922.006.patch

> DelegationTokenAuthenticationFilter get wrong doAsUser since it does not 
> decode URL
> ---
>
> Key: HADOOP-15922
> URL: https://issues.apache.org/jira/browse/HADOOP-15922
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: common, kms
>Reporter: He Xiaoqiao
>Assignee: He Xiaoqiao
>Priority: Major
> Fix For: 3.3.0, 3.1.2, 3.2.1
>
> Attachments: HADOOP-15922.001.patch, HADOOP-15922.002.patch, 
> HADOOP-15922.003.patch, HADOOP-15922.004.patch, HADOOP-15922.005.patch, 
> HADOOP-15922.006.patch
>
>
> DelegationTokenAuthenticationFilter gets the wrong doAsUser when the proxy 
> user supplied by the client is a complete Kerberos name (e.g., 
> user/hostn...@realm.com, which is actually acceptable), because 
> DelegationTokenAuthenticationFilter does not decode the DOAS parameter in 
> the URL, which the client encoded with {{URLEncoder}}.
> Take KMS as an example:
> a. KMSClientProvider creates a connection to the KMS server using 
> DelegationTokenAuthenticatedURL#openConnection.
> b. If KMSClientProvider acts for a doAs user, it puts {{doas}} with the 
> URL-encoded user as one parameter of the HTTP request:
> {code:java}
> // proxyuser
> if (doAs != null) {
>   extraParams.put(DO_AS, URLEncoder.encode(doAs, "UTF-8"));
> }
> {code}
> c. When the KMS server receives the request, it does not decode the proxy 
> user.
> As a result, the KMS server gets the wrong proxy user whenever the proxy 
> user is a complete Kerberos name or includes special characters, and 
> authentication and authorization exceptions follow.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15922) DelegationTokenAuthenticationFilter get wrong doAsUser since it does not decode URL

2018-11-29 Thread He Xiaoqiao (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16703412#comment-16703412
 ] 

He Xiaoqiao commented on HADOOP-15922:
--

[~daryn] Thank you for the correction. I uploaded another patch, v005, 
following your suggestions. Could you help to revert the commit and review 
the new one? Thanks again.

> DelegationTokenAuthenticationFilter get wrong doAsUser since it does not 
> decode URL
> ---
>
> Key: HADOOP-15922
> URL: https://issues.apache.org/jira/browse/HADOOP-15922
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: common, kms
>Reporter: He Xiaoqiao
>Assignee: He Xiaoqiao
>Priority: Major
> Fix For: 3.3.0, 3.1.2, 3.2.1
>
> Attachments: HADOOP-15922.001.patch, HADOOP-15922.002.patch, 
> HADOOP-15922.003.patch, HADOOP-15922.004.patch, HADOOP-15922.005.patch
>
>
> DelegationTokenAuthenticationFilter gets the wrong doAsUser when the proxy 
> user supplied by the client is a complete Kerberos name (e.g., 
> user/hostn...@realm.com, which is actually acceptable), because 
> DelegationTokenAuthenticationFilter does not decode the DOAS parameter in 
> the URL, which the client encoded with {{URLEncoder}}.
> Take KMS as an example:
> a. KMSClientProvider creates a connection to the KMS server using 
> DelegationTokenAuthenticatedURL#openConnection.
> b. If KMSClientProvider acts for a doAs user, it puts {{doas}} with the 
> URL-encoded user as one parameter of the HTTP request:
> {code:java}
> // proxyuser
> if (doAs != null) {
>   extraParams.put(DO_AS, URLEncoder.encode(doAs, "UTF-8"));
> }
> {code}
> c. When the KMS server receives the request, it does not decode the proxy 
> user.
> As a result, the KMS server gets the wrong proxy user whenever the proxy 
> user is a complete Kerberos name or includes special characters, and 
> authentication and authorization exceptions follow.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15922) DelegationTokenAuthenticationFilter get wrong doAsUser since it does not decode URL

2018-11-29 Thread He Xiaoqiao (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Xiaoqiao updated HADOOP-15922:
-
Attachment: HADOOP-15922.005.patch

> DelegationTokenAuthenticationFilter get wrong doAsUser since it does not 
> decode URL
> ---
>
> Key: HADOOP-15922
> URL: https://issues.apache.org/jira/browse/HADOOP-15922
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: common, kms
>Reporter: He Xiaoqiao
>Assignee: He Xiaoqiao
>Priority: Major
> Fix For: 3.3.0, 3.1.2, 3.2.1
>
> Attachments: HADOOP-15922.001.patch, HADOOP-15922.002.patch, 
> HADOOP-15922.003.patch, HADOOP-15922.004.patch, HADOOP-15922.005.patch
>
>
> DelegationTokenAuthenticationFilter gets the wrong doAsUser when the proxy 
> user supplied by the client is a complete Kerberos name (e.g., 
> user/hostn...@realm.com, which is actually acceptable), because 
> DelegationTokenAuthenticationFilter does not decode the DOAS parameter in 
> the URL, which the client encoded with {{URLEncoder}}.
> Take KMS as an example:
> a. KMSClientProvider creates a connection to the KMS server using 
> DelegationTokenAuthenticatedURL#openConnection.
> b. If KMSClientProvider acts for a doAs user, it puts {{doas}} with the 
> URL-encoded user as one parameter of the HTTP request:
> {code:java}
> // proxyuser
> if (doAs != null) {
>   extraParams.put(DO_AS, URLEncoder.encode(doAs, "UTF-8"));
> }
> {code}
> c. When the KMS server receives the request, it does not decode the proxy 
> user.
> As a result, the KMS server gets the wrong proxy user whenever the proxy 
> user is a complete Kerberos name or includes special characters, and 
> authentication and authorization exceptions follow.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15922) DelegationTokenAuthenticationFilter get wrong doAsUser since it does not decode URL

2018-11-17 Thread He Xiaoqiao (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16690802#comment-16690802
 ] 

He Xiaoqiao commented on HADOOP-15922:
--

[~eyang] Thank you for pushing this issue forward.

> DelegationTokenAuthenticationFilter get wrong doAsUser since it does not 
> decode URL
> ---
>
> Key: HADOOP-15922
> URL: https://issues.apache.org/jira/browse/HADOOP-15922
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: common, kms
>Reporter: He Xiaoqiao
>Assignee: He Xiaoqiao
>Priority: Major
> Attachments: HADOOP-15922.001.patch, HADOOP-15922.002.patch, 
> HADOOP-15922.003.patch, HADOOP-15922.004.patch
>
>
> DelegationTokenAuthenticationFilter gets the wrong doAsUser when the proxy 
> user supplied by the client is a complete Kerberos name (e.g., 
> user/hostn...@realm.com, which is actually acceptable), because 
> DelegationTokenAuthenticationFilter does not decode the DOAS parameter in 
> the URL, which the client encoded with {{URLEncoder}}.
> Take KMS as an example:
> a. KMSClientProvider creates a connection to the KMS server using 
> DelegationTokenAuthenticatedURL#openConnection.
> b. If KMSClientProvider acts for a doAs user, it puts {{doas}} with the 
> URL-encoded user as one parameter of the HTTP request:
> {code:java}
> // proxyuser
> if (doAs != null) {
>   extraParams.put(DO_AS, URLEncoder.encode(doAs, "UTF-8"));
> }
> {code}
> c. When the KMS server receives the request, it does not decode the proxy 
> user.
> As a result, the KMS server gets the wrong proxy user whenever the proxy 
> user is a complete Kerberos name or includes special characters, and 
> authentication and authorization exceptions follow.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15922) DelegationTokenAuthenticationFilter get wrong doAsUser since it does not decode URL

2018-11-16 Thread He Xiaoqiao (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16690402#comment-16690402
 ] 

He Xiaoqiao commented on HADOOP-15922:
--

Hi [~eyang] 
{quote}If the client is changed to proxy from client/host, then 
hadoop.kms.proxyuser.client.hosts should include host: 
conf.set("hadoop.kms.proxyuser.client.hosts", "localhost,host");
{quote}
In the unit test we try to use user 'client/host' to impersonate 
'foo/localhost'. The configuration keys 
'hadoop.kms.proxyuser.client.users'/'hadoop.kms.proxyuser.client.hosts' 
specify which users/groups/hosts user 'client' may impersonate, and they are 
indeed valid here. Impersonating user 'foo/localhost' can pass auth because:
{code:java}
conf.set("hadoop.kms.proxyuser.client.users", "foo/localhost");
conf.set("hadoop.kms.proxyuser.client.hosts", "localhost");
{code}
It is not necessary to check groups if the user check passes; ref. 
org.apache.hadoop.security.authorize.AccessControlList#isUserInList (a 
configuration sketch follows at the end of this comment):
{code:java}
  public final boolean isUserInList(UserGroupInformation ugi) {
if (allAllowed || users.contains(ugi.getShortUserName())) {
  return true;
} else if (!groups.isEmpty()) {
  for (String group : ugi.getGroups()) {
if (groups.contains(group)) {
  return true;
}
  }
}
return false;
  }
{code}

{quote}I am not sure why KMS doesn't use standard 
hadoop.proxyuser.client.groups and hadoop.proxyuser.client.hosts.{quote}
The 'hadoop.kms' configuration prefix for KMS originates from HADOOP-10433; 
however, I could not find why KMS uses a non-standard configuration.
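
A rough sketch of how this proxy-user configuration can be exercised 
programmatically, assuming ProxyUsers accepts the 'hadoop.kms.proxyuser' 
prefix the way KMS wires it (this is not the actual unit test):

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.security.authorize.ProxyUsers;

public class KmsProxyUserSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration(false);
    conf.set("hadoop.kms.proxyuser.client.users", "foo/localhost");
    conf.set("hadoop.kms.proxyuser.client.hosts", "localhost");
    // Load the proxy-user rules under the KMS-specific prefix.
    ProxyUsers.refreshSuperUserGroupsConfiguration(conf,
        "hadoop.kms.proxyuser");

    UserGroupInformation realUser =
        UserGroupInformation.createRemoteUser("client");
    UserGroupInformation proxyUgi =
        UserGroupInformation.createProxyUser("foo/localhost", realUser);
    // Throws AuthorizationException if 'client' may not impersonate
    // 'foo/localhost' from this address (subject to the auth_to_local
    // caveats discussed in this thread).
    ProxyUsers.authorize(proxyUgi, "127.0.0.1");
    System.out.println("impersonation allowed");
  }
}
{code}
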

> DelegationTokenAuthenticationFilter get wrong doAsUser since it does not 
> decode URL
> ---
>
> Key: HADOOP-15922
> URL: https://issues.apache.org/jira/browse/HADOOP-15922
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: common, kms
>Reporter: He Xiaoqiao
>Assignee: He Xiaoqiao
>Priority: Major
> Attachments: HADOOP-15922.001.patch, HADOOP-15922.002.patch, 
> HADOOP-15922.003.patch, HADOOP-15922.004.patch
>
>
> DelegationTokenAuthenticationFilter gets the wrong doAsUser when the proxy 
> user supplied by the client is a complete Kerberos name (e.g., 
> user/hostn...@realm.com, which is actually acceptable), because 
> DelegationTokenAuthenticationFilter does not decode the DOAS parameter in 
> the URL, which the client encoded with {{URLEncoder}}.
> Take KMS as an example:
> a. KMSClientProvider creates a connection to the KMS server using 
> DelegationTokenAuthenticatedURL#openConnection.
> b. If KMSClientProvider acts for a doAs user, it puts {{doas}} with the 
> URL-encoded user as one parameter of the HTTP request:
> {code:java}
> // proxyuser
> if (doAs != null) {
>   extraParams.put(DO_AS, URLEncoder.encode(doAs, "UTF-8"));
> }
> {code}
> c. When the KMS server receives the request, it does not decode the proxy 
> user.
> As a result, the KMS server gets the wrong proxy user whenever the proxy 
> user is a complete Kerberos name or includes special characters, and 
> authentication and authorization exceptions follow.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15922) DelegationTokenAuthenticationFilter get wrong doAsUser since it does not decode URL

2018-11-15 Thread He Xiaoqiao (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16689060#comment-16689060
 ] 

He Xiaoqiao commented on HADOOP-15922:
--

[~eyang] Thank you for the quick response.
That makes sense to me. [^HADOOP-15922.004.patch] fixes it. FYI.

> DelegationTokenAuthenticationFilter get wrong doAsUser since it does not 
> decode URL
> ---
>
> Key: HADOOP-15922
> URL: https://issues.apache.org/jira/browse/HADOOP-15922
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: common, kms
>Reporter: He Xiaoqiao
>Assignee: He Xiaoqiao
>Priority: Major
> Attachments: HADOOP-15922.001.patch, HADOOP-15922.002.patch, 
> HADOOP-15922.003.patch, HADOOP-15922.004.patch
>
>
> DelegationTokenAuthenticationFilter gets the wrong doAsUser when the proxy 
> user supplied by the client is a complete Kerberos name (e.g., 
> user/hostn...@realm.com, which is actually acceptable), because 
> DelegationTokenAuthenticationFilter does not decode the DOAS parameter in 
> the URL, which the client encoded with {{URLEncoder}}.
> Take KMS as an example:
> a. KMSClientProvider creates a connection to the KMS server using 
> DelegationTokenAuthenticatedURL#openConnection.
> b. If KMSClientProvider acts for a doAs user, it puts {{doas}} with the 
> URL-encoded user as one parameter of the HTTP request:
> {code:java}
> // proxyuser
> if (doAs != null) {
>   extraParams.put(DO_AS, URLEncoder.encode(doAs, "UTF-8"));
> }
> {code}
> c. When the KMS server receives the request, it does not decode the proxy 
> user.
> As a result, the KMS server gets the wrong proxy user whenever the proxy 
> user is a complete Kerberos name or includes special characters, and 
> authentication and authorization exceptions follow.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15922) DelegationTokenAuthenticationFilter get wrong doAsUser since it does not decode URL

2018-11-15 Thread He Xiaoqiao (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Xiaoqiao updated HADOOP-15922:
-
Attachment: HADOOP-15922.004.patch

> DelegationTokenAuthenticationFilter get wrong doAsUser since it does not 
> decode URL
> ---
>
> Key: HADOOP-15922
> URL: https://issues.apache.org/jira/browse/HADOOP-15922
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: common, kms
>Reporter: He Xiaoqiao
>Assignee: He Xiaoqiao
>Priority: Major
> Attachments: HADOOP-15922.001.patch, HADOOP-15922.002.patch, 
> HADOOP-15922.003.patch, HADOOP-15922.004.patch
>
>
> DelegationTokenAuthenticationFilter gets the wrong doAsUser when the proxy 
> user supplied by the client is a complete Kerberos name (e.g., 
> user/hostn...@realm.com, which is actually acceptable), because 
> DelegationTokenAuthenticationFilter does not decode the DOAS parameter in 
> the URL, which the client encoded with {{URLEncoder}}.
> Take KMS as an example:
> a. KMSClientProvider creates a connection to the KMS server using 
> DelegationTokenAuthenticatedURL#openConnection.
> b. If KMSClientProvider acts for a doAs user, it puts {{doas}} with the 
> URL-encoded user as one parameter of the HTTP request:
> {code:java}
> // proxyuser
> if (doAs != null) {
>   extraParams.put(DO_AS, URLEncoder.encode(doAs, "UTF-8"));
> }
> {code}
> c. When the KMS server receives the request, it does not decode the proxy 
> user.
> As a result, the KMS server gets the wrong proxy user whenever the proxy 
> user is a complete Kerberos name or includes special characters, and 
> authentication and authorization exceptions follow.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15922) DelegationTokenAuthenticationFilter get wrong doAsUser since it does not decode URL

2018-11-15 Thread He Xiaoqiao (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16689045#comment-16689045
 ] 

He Xiaoqiao commented on HADOOP-15922:
--

[~eyang], thanks for your reviews and comments.
[^HADOOP-15922.003.patch] updates the unit test based on Eric's comments:
1. Use a proxy user named foo/localhost instead of foo/localh...@realm.com 
and just check the special character '/'.
2. Limit the proxyuser scope to the 'client' user.
3. I think whether the client principal carries a hostname is not the key 
point for this issue, since the principal is defined at setup; if I missed 
something, please correct me.
4. Rename the test method from doGetDelegationTokenByProxyUser to 
testGetDelegationTokenByProxyUser.
Thanks [~eyang] again.

> DelegationTokenAuthenticationFilter get wrong doAsUser since it does not 
> decode URL
> ---
>
> Key: HADOOP-15922
> URL: https://issues.apache.org/jira/browse/HADOOP-15922
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: common, kms
>Reporter: He Xiaoqiao
>Assignee: He Xiaoqiao
>Priority: Major
> Attachments: HADOOP-15922.001.patch, HADOOP-15922.002.patch, 
> HADOOP-15922.003.patch
>
>
> DelegationTokenAuthenticationFilter gets the wrong doAsUser when the proxy 
> user supplied by the client is a complete Kerberos name (e.g., 
> user/hostn...@realm.com, which is actually acceptable), because 
> DelegationTokenAuthenticationFilter does not decode the DOAS parameter in 
> the URL, which the client encoded with {{URLEncoder}}.
> Take KMS as an example:
> a. KMSClientProvider creates a connection to the KMS server using 
> DelegationTokenAuthenticatedURL#openConnection.
> b. If KMSClientProvider acts for a doAs user, it puts {{doas}} with the 
> URL-encoded user as one parameter of the HTTP request:
> {code:java}
> // proxyuser
> if (doAs != null) {
>   extraParams.put(DO_AS, URLEncoder.encode(doAs, "UTF-8"));
> }
> {code}
> c. When the KMS server receives the request, it does not decode the proxy 
> user.
> As a result, the KMS server gets the wrong proxy user whenever the proxy 
> user is a complete Kerberos name or includes special characters, and 
> authentication and authorization exceptions follow.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15922) DelegationTokenAuthenticationFilter get wrong doAsUser since it does not decode URL

2018-11-15 Thread He Xiaoqiao (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Xiaoqiao updated HADOOP-15922:
-
Attachment: HADOOP-15922.003.patch

> DelegationTokenAuthenticationFilter get wrong doAsUser since it does not 
> decode URL
> ---
>
> Key: HADOOP-15922
> URL: https://issues.apache.org/jira/browse/HADOOP-15922
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: common, kms
>Reporter: He Xiaoqiao
>Assignee: He Xiaoqiao
>Priority: Major
> Attachments: HADOOP-15922.001.patch, HADOOP-15922.002.patch, 
> HADOOP-15922.003.patch
>
>
> DelegationTokenAuthenticationFilter gets the wrong doAsUser when the proxy 
> user supplied by the client is a complete Kerberos name (e.g., 
> user/hostn...@realm.com, which is actually acceptable), because 
> DelegationTokenAuthenticationFilter does not decode the DOAS parameter in 
> the URL, which the client encoded with {{URLEncoder}}.
> Take KMS as an example:
> a. KMSClientProvider creates a connection to the KMS server using 
> DelegationTokenAuthenticatedURL#openConnection.
> b. If KMSClientProvider acts for a doAs user, it puts {{doas}} with the 
> URL-encoded user as one parameter of the HTTP request:
> {code:java}
> // proxyuser
> if (doAs != null) {
>   extraParams.put(DO_AS, URLEncoder.encode(doAs, "UTF-8"));
> }
> {code}
> c. When the KMS server receives the request, it does not decode the proxy 
> user.
> As a result, the KMS server gets the wrong proxy user whenever the proxy 
> user is a complete Kerberos name or includes special characters, and 
> authentication and authorization exceptions follow.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15922) DelegationTokenAuthenticationFilter get wrong doAsUser since it does not decode URL

2018-11-14 Thread He Xiaoqiao (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16687481#comment-16687481
 ] 

He Xiaoqiao commented on HADOOP-15922:
--

Submitted the v002 patch with a unit test and triggered Jenkins again.
Hi [~ste...@apache.org], would you help me review this patch?

> DelegationTokenAuthenticationFilter get wrong doAsUser since it does not 
> decode URL
> ---
>
> Key: HADOOP-15922
> URL: https://issues.apache.org/jira/browse/HADOOP-15922
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: common, kms
>Reporter: He Xiaoqiao
>Assignee: He Xiaoqiao
>Priority: Major
> Attachments: HADOOP-15922.001.patch, HADOOP-15922.002.patch
>
>
> DelegationTokenAuthenticationFilter gets the wrong doAsUser when the proxy 
> user supplied by the client is a complete Kerberos name (e.g., 
> user/hostn...@realm.com, which is actually acceptable), because 
> DelegationTokenAuthenticationFilter does not decode the DOAS parameter in 
> the URL, which the client encoded with {{URLEncoder}}.
> Take KMS as an example:
> a. KMSClientProvider creates a connection to the KMS server using 
> DelegationTokenAuthenticatedURL#openConnection.
> b. If KMSClientProvider acts for a doAs user, it puts {{doas}} with the 
> URL-encoded user as one parameter of the HTTP request:
> {code:java}
> // proxyuser
> if (doAs != null) {
>   extraParams.put(DO_AS, URLEncoder.encode(doAs, "UTF-8"));
> }
> {code}
> c. When the KMS server receives the request, it does not decode the proxy 
> user.
> As a result, the KMS server gets the wrong proxy user whenever the proxy 
> user is a complete Kerberos name or includes special characters, and 
> authentication and authorization exceptions follow.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15922) DelegationTokenAuthenticationFilter get wrong doAsUser since it does not decode URL

2018-11-14 Thread He Xiaoqiao (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Xiaoqiao updated HADOOP-15922:
-
Attachment: HADOOP-15922.002.patch

> DelegationTokenAuthenticationFilter get wrong doAsUser since it does not 
> decode URL
> ---
>
> Key: HADOOP-15922
> URL: https://issues.apache.org/jira/browse/HADOOP-15922
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: common, kms
>Reporter: He Xiaoqiao
>Assignee: He Xiaoqiao
>Priority: Major
> Attachments: HADOOP-15922.001.patch, HADOOP-15922.002.patch
>
>
> DelegationTokenAuthenticationFilter gets the wrong doAsUser when the proxy 
> user supplied by the client is a complete Kerberos name (e.g., 
> user/hostn...@realm.com, which is actually acceptable), because 
> DelegationTokenAuthenticationFilter does not decode the DOAS parameter in 
> the URL, which the client encoded with {{URLEncoder}}.
> Take KMS as an example:
> a. KMSClientProvider creates a connection to the KMS server using 
> DelegationTokenAuthenticatedURL#openConnection.
> b. If KMSClientProvider acts for a doAs user, it puts {{doas}} with the 
> URL-encoded user as one parameter of the HTTP request:
> {code:java}
> // proxyuser
> if (doAs != null) {
>   extraParams.put(DO_AS, URLEncoder.encode(doAs, "UTF-8"));
> }
> {code}
> c. When the KMS server receives the request, it does not decode the proxy 
> user.
> As a result, the KMS server gets the wrong proxy user whenever the proxy 
> user is a complete Kerberos name or includes special characters, and 
> authentication and authorization exceptions follow.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15922) DelegationTokenAuthenticationFilter get wrong doAsUser since it does not decode URL

2018-11-12 Thread He Xiaoqiao (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Xiaoqiao updated HADOOP-15922:
-
Status: Patch Available  (was: Open)

Submitted the v001 patch and triggered Jenkins. A unit test will follow later.

> DelegationTokenAuthenticationFilter get wrong doAsUser since it does not 
> decode URL
> ---
>
> Key: HADOOP-15922
> URL: https://issues.apache.org/jira/browse/HADOOP-15922
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: common, kms
>Reporter: He Xiaoqiao
>Assignee: He Xiaoqiao
>Priority: Major
> Attachments: HADOOP-15922.001.patch
>
>
> DelegationTokenAuthenticationFilter gets the wrong doAsUser when the proxy 
> user supplied by the client is a complete Kerberos name (e.g., 
> user/hostn...@realm.com, which is actually acceptable), because 
> DelegationTokenAuthenticationFilter does not decode the DOAS parameter in 
> the URL, which the client encoded with {{URLEncoder}}.
> Take KMS as an example:
> a. KMSClientProvider creates a connection to the KMS server using 
> DelegationTokenAuthenticatedURL#openConnection.
> b. If KMSClientProvider acts for a doAs user, it puts {{doas}} with the 
> URL-encoded user as one parameter of the HTTP request:
> {code:java}
> // proxyuser
> if (doAs != null) {
>   extraParams.put(DO_AS, URLEncoder.encode(doAs, "UTF-8"));
> }
> {code}
> c. When the KMS server receives the request, it does not decode the proxy 
> user.
> As a result, the KMS server gets the wrong proxy user whenever the proxy 
> user is a complete Kerberos name or includes special characters, and 
> authentication and authorization exceptions follow.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15922) DelegationTokenAuthenticationFilter get wrong doAsUser since it does not decode URL

2018-11-12 Thread He Xiaoqiao (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Xiaoqiao updated HADOOP-15922:
-
Attachment: HADOOP-15922.001.patch

> DelegationTokenAuthenticationFilter get wrong doAsUser since it does not 
> decode URL
> ---
>
> Key: HADOOP-15922
> URL: https://issues.apache.org/jira/browse/HADOOP-15922
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: common, kms
>Reporter: He Xiaoqiao
>Assignee: He Xiaoqiao
>Priority: Major
> Attachments: HADOOP-15922.001.patch
>
>
> DelegationTokenAuthenticationFilter gets the wrong doAsUser when the proxy 
> user supplied by the client is a complete Kerberos name (e.g., 
> user/hostn...@realm.com, which is actually acceptable), because 
> DelegationTokenAuthenticationFilter does not decode the DOAS parameter in 
> the URL, which the client encoded with {{URLEncoder}}.
> Take KMS as an example:
> a. KMSClientProvider creates a connection to the KMS server using 
> DelegationTokenAuthenticatedURL#openConnection.
> b. If KMSClientProvider acts for a doAs user, it puts {{doas}} with the 
> URL-encoded user as one parameter of the HTTP request:
> {code:java}
> // proxyuser
> if (doAs != null) {
>   extraParams.put(DO_AS, URLEncoder.encode(doAs, "UTF-8"));
> }
> {code}
> c. When the KMS server receives the request, it does not decode the proxy 
> user.
> As a result, the KMS server gets the wrong proxy user whenever the proxy 
> user is a complete Kerberos name or includes special characters, and 
> authentication and authorization exceptions follow.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Created] (HADOOP-15922) DelegationTokenAuthenticationFilter get wrong doAsUser since it does not decode URL

2018-11-12 Thread He Xiaoqiao (JIRA)
He Xiaoqiao created HADOOP-15922:


 Summary: DelegationTokenAuthenticationFilter get wrong doAsUser 
since it does not decode URL
 Key: HADOOP-15922
 URL: https://issues.apache.org/jira/browse/HADOOP-15922
 Project: Hadoop Common
  Issue Type: Bug
  Components: common, kms
Reporter: He Xiaoqiao
Assignee: He Xiaoqiao


DelegationTokenAuthenticationFilter gets the wrong doAsUser when the proxy 
user supplied by the client is a complete Kerberos name (e.g., 
user/hostn...@realm.com, which is actually acceptable), because 
DelegationTokenAuthenticationFilter does not decode the DOAS parameter in the 
URL, which the client encoded with {{URLEncoder}}.
Take KMS as an example:
a. KMSClientProvider creates a connection to the KMS server using 
DelegationTokenAuthenticatedURL#openConnection.
b. If KMSClientProvider acts for a doAs user, it puts {{doas}} with the 
URL-encoded user as one parameter of the HTTP request:
{code:java}
// proxyuser
if (doAs != null) {
  extraParams.put(DO_AS, URLEncoder.encode(doAs, "UTF-8"));
}
{code}
c. When the KMS server receives the request, it does not decode the proxy user.

As a result, the KMS server gets the wrong proxy user whenever the proxy user 
is a complete Kerberos name or includes special characters, and 
authentication and authorization exceptions follow.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15864) Job submitter / executor fail when SBN domain name can not resolved

2018-10-29 Thread He Xiaoqiao (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16667065#comment-16667065
 ] 

He Xiaoqiao commented on HADOOP-15864:
--

{quote}a number of other callers of SecurityUtil.buildTokenService in YARN and 
MAPREDUCE and none seem to handle a null response value{quote}
OMG, I will try to fix this issue in the next few days while keeping 
compatibility with YARN and the other modules (see the sketch below).
Thanks [~jojochuang], [~wilfreds] again.
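
As a sketch of that compatibility concern (hypothetical: buildTokenService 
does not currently return null, it throws; the helper below only illustrates 
what callers would need if it did):

{code:java}
import java.net.InetSocketAddress;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.security.SecurityUtil;
import org.apache.hadoop.security.token.Token;

public class NullTolerantCaller {
  // Hypothetical guard: if buildTokenService() returned null for an
  // unresolved host instead of throwing, callers in YARN/MapReduce would
  // each need a check like this before using the service name.
  static void setService(Token<?> token, InetSocketAddress addr) {
    Text service = SecurityUtil.buildTokenService(addr);
    if (service != null) {
      token.setService(service);
    }
  }
}
{code}
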

> Job submitter / executor fail when SBN domain name can not resolved
> ---
>
> Key: HADOOP-15864
> URL: https://issues.apache.org/jira/browse/HADOOP-15864
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: He Xiaoqiao
>Assignee: He Xiaoqiao
>Priority: Critical
> Fix For: 3.0.4, 3.3.0, 3.1.2, 3.2.1
>
> Attachments: HADOOP-15864-branch.2.7.001.patch, 
> HADOOP-15864-branch.2.7.002.patch, HADOOP-15864.003.patch, 
> HADOOP-15864.branch.2.7.004.patch
>
>
> Job submission and task execution fail if the Standby NameNode domain name 
> cannot be resolved on HDFS HA with the DelegationToken feature.
> This issue is triggered when creating a {{ConfiguredFailoverProxyProvider}} 
> instance, which invokes {{HAUtil.cloneDelegationTokenForLogicalUri}} in HA 
> mode with security. In HDFS HA mode the UGI needs to include a separate 
> token for each NameNode in order to handle an Active-Standby switch; the 
> two tokens' content is of course the same.
> However, #setTokenService in {{HAUtil.cloneDelegationTokenForLogicalUri}} 
> checks whether the NameNode address has been resolved; if not, it throws an 
> #IllegalArgumentException and the job submitter / task executor fails.
> HDFS-8068 and HADOOP-12125 tried to fix this, but I don't think those two 
> tickets resolve it completely.
> Another question many people ask is why a NameNode domain name would fail 
> to resolve. There are many scenarios, for instance replacing a node after a 
> fault, or a DNS refresh. In any case, a Standby NameNode failure should not 
> impact Hadoop cluster stability in my opinion.
> a. code ref: org.apache.hadoop.security.SecurityUtil lines 373-386
> {code:java}
>   public static Text buildTokenService(InetSocketAddress addr) {
> String host = null;
> if (useIpForTokenService) {
>   if (addr.isUnresolved()) { // host has no ip address
> throw new IllegalArgumentException(
> new UnknownHostException(addr.getHostName())
> );
>   }
>   host = addr.getAddress().getHostAddress();
> } else {
>   host = StringUtils.toLowerCase(addr.getHostName());
> }
> return new Text(host + ":" + addr.getPort());
>   }
> {code}
> b.exception log ref:
> {code:xml}
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: Couldn't create proxy provider class 
> org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider
> at 
> org.apache.hadoop.hdfs.NameNodeProxies.createFailoverProxyProvider(NameNodeProxies.java:515)
> at 
> org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:170)
> at org.apache.hadoop.hdfs.DFSClient.(DFSClient.java:761)
> at org.apache.hadoop.hdfs.DFSClient.(DFSClient.java:691)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:150)
> at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2713)
> at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:93)
> at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2747)
> at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2729)
> at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:385)
> at 
> org.apache.hadoop.fs.viewfs.ChRootedFileSystem.(ChRootedFileSystem.java:106)
> at 
> org.apache.hadoop.fs.viewfs.ViewFileSystem$1.getTargetFileSystem(ViewFileSystem.java:178)
> at 
> org.apache.hadoop.fs.viewfs.ViewFileSystem$1.getTargetFileSystem(ViewFileSystem.java:172)
> at org.apache.hadoop.fs.viewfs.InodeTree.createLink(InodeTree.java:303)
> at org.apache.hadoop.fs.viewfs.InodeTree.(InodeTree.java:377)
> at 
> org.apache.hadoop.fs.viewfs.ViewFileSystem$1.(ViewFileSystem.java:172)
> at 
> org.apache.hadoop.fs.viewfs.ViewFileSystem.initialize(ViewFileSystem.java:172)
> at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2713)
> at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:93)
> at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2747)
> at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2729)
> at 

[jira] [Updated] (HADOOP-15883) Fix WebHdfsFileSystemContract test

2018-10-28 Thread He Xiaoqiao (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Xiaoqiao updated HADOOP-15883:
-
Attachment: HADOOP-15883.001.patch
Status: Patch Available  (was: Open)

Submitted the initial patch against trunk to trigger Jenkins.

> Fix WebHdfsFileSystemContract test
> --
>
> Key: HADOOP-15883
> URL: https://issues.apache.org/jira/browse/HADOOP-15883
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 3.0.4, 3.3.0, 3.1.2, 3.2.1
>Reporter: He Xiaoqiao
>Assignee: He Xiaoqiao
>Priority: Major
> Attachments: HADOOP-15883.001.patch
>
>
> HADOOP-15864 fixed a bug where Job/Task execution fails when a server (NameNode, 
> KMS, Timeline) domain name cannot be resolved. Meanwhile, it changed the 
> semantics of the HTTP status code returned by WebHdfsFileSystem; this ticket 
> tracks fixing TestWebHdfsFileSystemContract#testResponseCode.






[jira] [Commented] (HADOOP-15883) Fix WebHdfsFileSystemContract test

2018-10-28 Thread He Xiaoqiao (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1407#comment-1407
 ] 

He Xiaoqiao commented on HADOOP-15883:
--

[~jojochuang], [~ayushtkn] As mentioned in HADOOP-15864, after the patch the 
DataNode no longer hits an IllegalArgumentException when creating the DFSClient 
instance; however, when the DataNode creates the wrapped output stream, it hits 
an auth exception since no token exists, so the client receives HTTP status 
code 403.
In short, this patch changes the HTTP semantics when the #namenoderpcaddress 
parameter is absent.
Based on the above, I support also changing the expected HTTP status code to 403 
in TestWebHdfsFileSystemContract#testResponseCode. FYI.
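
For reference, a standalone sketch of the changed expectation (the endpoint, path, 
and port are assumptions for illustration; this is not the actual JUnit change):
{code:java}
// Illustration only: a WebHDFS CREATE request that omits the namenoderpcaddress
// parameter used to fail with 400 (IllegalArgumentException) and, after
// HADOOP-15864, fails with 403 (auth exception while building the wrapped stream).
import java.net.HttpURLConnection;
import java.net.URL;

public class ResponseCodeSketch {
  public static void main(String[] args) throws Exception {
    // Assumed local DataNode WebHDFS endpoint; adjust host/port for a real cluster.
    URL url = new URL("http://localhost:50075/webhdfs/v1/tmp/file?op=CREATE");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestMethod("PUT");
    conn.setDoOutput(true);
    System.out.println("HTTP status: " + conn.getResponseCode()); // expect 403
    conn.disconnect();
  }
}
{code}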

> Fix WebHdfsFileSystemContract test
> --
>
> Key: HADOOP-15883
> URL: https://issues.apache.org/jira/browse/HADOOP-15883
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 3.0.4, 3.3.0, 3.1.2, 3.2.1
>Reporter: He Xiaoqiao
>Assignee: He Xiaoqiao
>Priority: Major
>
> HADOOP-15864 fixed a bug where Job/Task execution fails when a server (NameNode, 
> KMS, Timeline) domain name cannot be resolved. Meanwhile, it changed the 
> semantics of the HTTP status code returned by WebHdfsFileSystem; this ticket 
> tracks fixing TestWebHdfsFileSystemContract#testResponseCode.






[jira] [Commented] (HADOOP-15864) Job submitter / executor fail when SBN domain name can not resolved

2018-10-28 Thread He Xiaoqiao (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1405#comment-1405
 ] 

He Xiaoqiao commented on HADOOP-15864:
--

Thanks [~ayushtkn] for the feedback. I rechecked the failing UT 
(#TestWebHdfsFileSystemContract) and retested on my local machine; it is related 
to this issue.
The main reason:
Before the patch, {{WebHdfsHandler}} on the DataNode hit an 
{{IllegalArgumentException}} in {{SecurityUtil#buildTokenService}} when creating 
the DFSClient instance via {{newDfsClient(nnId, confForCreate);}} while handling 
the {{onCreate}} event, because the client did not pass the #namenoderpcaddress 
parameter, so the client received HTTP status code 400.
After the patch, the DataNode does *NOT* hit an {{IllegalArgumentException}} when 
creating the DFSClient instance; however, when the DataNode creates the wrapped 
output stream, it hits an auth exception since no token exists, so the client 
receives HTTP status code 403.
In short, this patch changes the HTTP semantics when the #namenoderpcaddress 
parameter is absent.
I have created another ticket, HADOOP-15883, to track this issue.
Thanks [~ayushtkn] again, and sorry I did not spot it in time when checking the 
failing UT in my local environment.
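
For context, a self-contained sketch of the pre-patch failure mode, mirroring the 
quoted {{buildTokenService}} logic for an unresolved address (illustration only, 
assuming useIpForTokenService is true):
{code:java}
// Illustration only: with an unresolved address, buildTokenService-style code
// throws IllegalArgumentException wrapping UnknownHostException, which is what
// failed WebHdfsHandler's DFSClient creation before the patch.
import java.net.InetSocketAddress;
import java.net.UnknownHostException;

public class UnresolvedAddressSketch {
  public static void main(String[] args) {
    InetSocketAddress addr =
        InetSocketAddress.createUnresolved("standby-nn.example.com", 8020);
    if (addr.isUnresolved()) {  // host has no IP address
      throw new IllegalArgumentException(
          new UnknownHostException(addr.getHostName()));
    }
    System.out.println(addr.getAddress().getHostAddress() + ":" + addr.getPort());
  }
}
{code}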

> Job submitter / executor fail when SBN domain name can not resolved
> ---
>
> Key: HADOOP-15864
> URL: https://issues.apache.org/jira/browse/HADOOP-15864
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: He Xiaoqiao
>Assignee: He Xiaoqiao
>Priority: Critical
> Fix For: 3.0.4, 3.3.0, 3.1.2, 3.2.1
>
> Attachments: HADOOP-15864-branch.2.7.001.patch, 
> HADOOP-15864-branch.2.7.002.patch, HADOOP-15864.003.patch, 
> HADOOP-15864.branch.2.7.004.patch
>
>

[jira] [Created] (HADOOP-15883) Fix WebHdfsFileSystemContract test

2018-10-28 Thread He Xiaoqiao (JIRA)
He Xiaoqiao created HADOOP-15883:


 Summary: Fix WebHdfsFileSystemContract test
 Key: HADOOP-15883
 URL: https://issues.apache.org/jira/browse/HADOOP-15883
 Project: Hadoop Common
  Issue Type: Bug
Affects Versions: 3.0.4, 3.3.0, 3.1.2, 3.2.1
Reporter: He Xiaoqiao
Assignee: He Xiaoqiao


HADOOP-15864 fixed a bug where Job/Task execution fails when a server (NameNode, 
KMS, Timeline) domain name cannot be resolved. Meanwhile, it changed the semantics 
of the HTTP status code returned by WebHdfsFileSystem; this ticket will track 
fixing TestWebHdfsFileSystemContract#testResponseCode.






[jira] [Commented] (HADOOP-15864) Job submitter / executor fail when SBN domain name can not resolved

2018-10-27 Thread He Xiaoqiao (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1117#comment-1117
 ] 

He Xiaoqiao commented on HADOOP-15864:
--

Thanks [~jojochuang] for your review.
{quote} He Xiaoqiao Wei-Chiu Chuang mind giving a check again to 
hadoop.hdfs.web.TestWebHdfsFileSystemContract{quote}
Thanks for your feedback, I will recheck this UT in the next two days.

> Job submitter / executor fail when SBN domain name can not resolved
> ---
>
> Key: HADOOP-15864
> URL: https://issues.apache.org/jira/browse/HADOOP-15864
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: He Xiaoqiao
>Assignee: He Xiaoqiao
>Priority: Critical
> Fix For: 3.0.4, 3.3.0, 3.1.2, 3.2.1
>
> Attachments: HADOOP-15864-branch.2.7.001.patch, 
> HADOOP-15864-branch.2.7.002.patch, HADOOP-15864.003.patch, 
> HADOOP-15864.branch.2.7.004.patch
>
>

[jira] [Updated] (HADOOP-15864) Job submitter / executor fail when SBN domain name can not resolved

2018-10-24 Thread He Xiaoqiao (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Xiaoqiao updated HADOOP-15864:
-
Fix Version/s: 3.3.0
   2.7.8

> Job submitter / executor fail when SBN domain name can not resolved
> ---
>
> Key: HADOOP-15864
> URL: https://issues.apache.org/jira/browse/HADOOP-15864
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: He Xiaoqiao
>Assignee: He Xiaoqiao
>Priority: Critical
> Fix For: 2.7.8, 3.3.0
>
> Attachments: HADOOP-15864-branch.2.7.001.patch, 
> HADOOP-15864-branch.2.7.002.patch, HADOOP-15864.003.patch, 
> HADOOP-15864.branch.2.7.004.patch
>
>

[jira] [Commented] (HADOOP-15864) Job submitter / executor fail when SBN domain name can not resolved

2018-10-24 Thread He Xiaoqiao (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16662429#comment-16662429
 ] 

He Xiaoqiao commented on HADOOP-15864:
--

Thanks [~jojochuang] for your suggestion. [^HADOOP-15864.003.patch] is ready for 
trunk, and I found the unit tests pass. I also renamed v002 to follow the correct 
format and resubmitted it. FYI.

> Job submitter / executor fail when SBN domain name can not resolved
> ---
>
> Key: HADOOP-15864
> URL: https://issues.apache.org/jira/browse/HADOOP-15864
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: He Xiaoqiao
>Assignee: He Xiaoqiao
>Priority: Critical
> Attachments: HADOOP-15864-branch.2.7.001.patch, 
> HADOOP-15864-branch.2.7.002.patch, HADOOP-15864.003.patch, 
> HADOOP-15864.branch.2.7.004.patch
>
>

[jira] [Updated] (HADOOP-15864) Job submitter / executor fail when SBN domain name can not resolved

2018-10-24 Thread He Xiaoqiao (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Xiaoqiao updated HADOOP-15864:
-
Attachment: HADOOP-15864.branch.2.7.004.patch

> Job submitter / executor fail when SBN domain name can not resolved
> ---
>
> Key: HADOOP-15864
> URL: https://issues.apache.org/jira/browse/HADOOP-15864
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: He Xiaoqiao
>Assignee: He Xiaoqiao
>Priority: Critical
> Attachments: HADOOP-15864-branch.2.7.001.patch, 
> HADOOP-15864-branch.2.7.002.patch, HADOOP-15864.003.patch, 
> HADOOP-15864.branch.2.7.004.patch
>
>

[jira] [Commented] (HADOOP-15864) Job submitter / executor fail when SBN domain name can not resolved

2018-10-24 Thread He Xiaoqiao (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661790#comment-16661790
 ] 

He Xiaoqiao commented on HADOOP-15864:
--

[~jojochuang] I checked the failing unit test and it passed on my local machine; 
I think it is not related to this patch. Another question: 
[^HADOOP-15864-branch.2.7.002.patch] is for branch-2.7, however Jenkins applied 
it to branch-3.3.0. Could you give some suggestions?

> Job submitter / executor fail when SBN domain name can not resolved
> ---
>
> Key: HADOOP-15864
> URL: https://issues.apache.org/jira/browse/HADOOP-15864
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: He Xiaoqiao
>Assignee: He Xiaoqiao
>Priority: Critical
> Attachments: HADOOP-15864-branch.2.7.001.patch, 
> HADOOP-15864-branch.2.7.002.patch, HADOOP-15864.003.patch
>
>

[jira] [Commented] (HADOOP-15864) Job submitter / executor fail when SBN domain name can not resolved

2018-10-23 Thread He Xiaoqiao (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16660609#comment-16660609
 ] 

He Xiaoqiao commented on HADOOP-15864:
--

Submitted [^HADOOP-15864.003.patch] for trunk.

> Job submitter / executor fail when SBN domain name can not resolved
> ---
>
> Key: HADOOP-15864
> URL: https://issues.apache.org/jira/browse/HADOOP-15864
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: He Xiaoqiao
>Assignee: He Xiaoqiao
>Priority: Critical
> Attachments: HADOOP-15864-branch.2.7.001.patch, 
> HADOOP-15864-branch.2.7.002.patch, HADOOP-15864.003.patch
>
>