[jira] [Commented] (HDFS-5776) Support 'hedged' reads in DFSClient

2014-01-23 Thread Liang Xie (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13879680#comment-13879680
 ] 

Liang Xie commented on HDFS-5776:
-

[~arpitagarwal], thanks for your nice review!
bq.  The concern is too many thread pools created by multiple clients on the 
same node
No worries: the default configuration is pool=0, which means no extra threads 
are created by default. If an end user/application enables hedged reads, they 
should know about this.
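
For illustration, a minimal sketch of opting in from client code, using the 
config keys named in this issue's description (the exact key names/defaults in 
the final patch may differ):
{code}
import org.apache.hadoop.conf.Configuration;

Configuration conf = new Configuration();
// 0 (the default) keeps hedged reads off; a positive size opts in
conf.setInt("dfs.dfsclient.quorum.read.threadpool.size", 10);
// hedge a read if the first datanode hasn't replied within 500ms
conf.setLong("dfs.dfsclient.quorum.read.threshold.millis", 500);
{code}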

bq. what do you think of not exposing the 
DFS_DFSCLIENT_HEDGED_READ_THREADPOOL_SIZE setting at all
IMHO, I personally prefer the current style; it's less risky. We have a bounded 
queue, and once the queue limit is reached, we force execution in the current 
thread. As for an internal upper bound, how large should it be? 5000? 50? Or 
something else? I think if this feature is enabled explicitly, the end 
user/application should know at least a little background, right? Just like 
lots of Hadoop timeout config parameters, I've never seen an internal upper 
bound implemented for them... but if you strongly insist on it, I can add one.
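
Roughly, the bounded-pool behavior described above, as a sketch (the sizes here 
are illustrative, not the patch's defaults):
{code}
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

int poolSize = 10;  // from dfs.dfsclient.quorum.read.threadpool.size
ThreadPoolExecutor pool = new ThreadPoolExecutor(
    poolSize, poolSize, 60L, TimeUnit.SECONDS,
    new ArrayBlockingQueue<Runnable>(100),     // the bounded queue
    new ThreadPoolExecutor.CallerRunsPolicy()  // queue full: run in the caller's thread
);
{code}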

bq. DFSClient#allowHedgedReads seems unnecessary
Let's keep it there; it's easier for developers and end users to understand.

bq. For DEFAULT_DFSCLIENT_HEDGED_READ_THRESHOLD_MILLIS - can we add an inbuilt 
minimum delay to defeat applications that set it too low or even zero
My opinion is the same as above. Since we don't have any knowledge of the end 
user's storage configuration, just imagine they have fast flash (with HDFS-2832 
enabled), say Fusion-io; one real disk read would probably cost only tens of 
microseconds, so how would we decide a good minimum? So I'd rather not add it, 
though I totally get your kind concern :)

bq. DFSInputStream#chooseDataNode - can the call to getBestNodeErrorString go 
inside the if (failures =... clause?
Another log statement also uses it; see DFSClient.LOG.info("Could not obtain " 
+ block.getBlock..., so that's not possible here.

bq. #fetchBlockByteRange - can we rename retVal to something like addressPair?
Good, let me rename it.

bq. Do we still need the while loop still there in actualGetFromOneDataNode?
Yes, but the loop is very light: extra iterations happen only when exceptions 
like 
AccessControlException/InvalidEncryptionKeyException/InvalidBlockTokenException 
occur, and all of those have a fast-quit mechanism 
(refetchToken/refetchEncryptionKey or disableLegacyBlockReaderLocal), so the 
loop executes only a very few times :)
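
For illustration, the rough shape of that loop (helper names are simplified 
stand-ins, not the patch verbatim):
{code}
int refetchToken = 1;          // allow at most one token refetch
int refetchEncryptionKey = 1;  // allow at most one encryption-key refetch
while (true) {
  try {
    return readFromDatanode(chosenNode);  // hypothetical read helper
  } catch (InvalidEncryptionKeyException e) {
    if (refetchEncryptionKey-- <= 0) throw e;  // fast quit
    clearDataEncryptionKey();
  } catch (InvalidBlockTokenException e) {
    if (refetchToken-- <= 0) throw e;          // fast quit
    fetchBlockAt(blockStartOffset);
  }
}
{code}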

bq. There is already a while loop in fetchBlockByteRange enclosing the call to 
actualGetFromOneDataNode. Now we have a nested loop.
In the loop inside fetchBlockByteRange, the responsibility is picking another 
DN if an IOException is thrown from actualGetFromOneDataNode, so it's not a 
fearful nested loop at all, no worries :)

bq. Maybe I misunderstood the code flow but it looks like the way the while 
loops are nested it defeats the usage of refetchToken and refetchEncryptionKey. 
It looks like the intention was to limit the refetch to 1 across all retries, 
now we can refetch multiple times.
Yes, there is a misunderstanding here; that's why I catch IOException fbae 
around fetchBlockAt. If we didn't catch it there, the outer loop would always 
refetch and we'd end up with a spin loop.

bq. Related to the previous, #actualGetFromOneDataNode, line 1026, - sorry I 
did not understand why the try-catch was added around the call to fetchBlockAt.
I hope the answer above makes it clear? And I hope my poor English doesn't make 
everything worse, haha :)

bq. #actualGetFromOneDataNode, line 1033 - the call to DFSClient.LOG.warn is 
deleted. Assume that was unintentional?
Good catch!

bq. Nitpick - some lines have whitespace-only changes.
I found several unnecessary whitespaces and removed them to make the code 
cleaner.

Really, thanks all for the review!

 Support 'hedged' reads in DFSClient
 ---

 Key: HDFS-5776
 URL: https://issues.apache.org/jira/browse/HDFS-5776
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs-client
Affects Versions: 3.0.0
Reporter: Liang Xie
Assignee: Liang Xie
 Attachments: HDFS-5776-v2.txt, HDFS-5776-v3.txt, HDFS-5776-v4.txt, 
 HDFS-5776-v5.txt, HDFS-5776.txt


 This is a placeholder for backporting the HDFS-related pieces of 
 https://issues.apache.org/jira/browse/HBASE-7509
 The quorum read ability should be especially helpful for optimizing read outliers.
 We can use dfs.dfsclient.quorum.read.threshold.millis & 
 dfs.dfsclient.quorum.read.threadpool.size to enable/disable the hedged read 
 ability from the client side (e.g. HBase), and by using DFSQuorumReadMetrics, we 
 can export the metric values of interest into the client system (e.g. HBase's 
 regionserver metrics).
 The core logic is in the pread code path: we decide to go to the original 
 fetchBlockByteRange or the newly introduced fetchBlockByteRangeSpeculative per 
 the above config items.

[jira] [Updated] (HDFS-5776) Support 'hedged' reads in DFSClient

2014-01-23 Thread Liang Xie (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liang Xie updated HDFS-5776:


Attachment: HDFS-5776-v6.txt

 Support 'hedged' reads in DFSClient
 ---

 Key: HDFS-5776
 URL: https://issues.apache.org/jira/browse/HDFS-5776
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs-client
Affects Versions: 3.0.0
Reporter: Liang Xie
Assignee: Liang Xie
 Attachments: HDFS-5776-v2.txt, HDFS-5776-v3.txt, HDFS-5776-v4.txt, 
 HDFS-5776-v5.txt, HDFS-5776-v6.txt, HDFS-5776.txt


 This is a placeholder for backporting the HDFS-related pieces of 
 https://issues.apache.org/jira/browse/HBASE-7509
 The quorum read ability should be especially helpful for optimizing read outliers.
 We can use dfs.dfsclient.quorum.read.threshold.millis & 
 dfs.dfsclient.quorum.read.threadpool.size to enable/disable the hedged read 
 ability from the client side (e.g. HBase), and by using DFSQuorumReadMetrics, we 
 can export the metric values of interest into the client system (e.g. HBase's 
 regionserver metrics).
 The core logic is in the pread code path: we decide to go to the original 
 fetchBlockByteRange or the newly introduced fetchBlockByteRangeSpeculative per 
 the above config items.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5776) Support 'hedged' reads in DFSClient

2014-01-23 Thread Liang Xie (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liang Xie updated HDFS-5776:


Attachment: (was: HDFS-5776-v6.txt)

 Support 'hedged' reads in DFSClient
 ---

 Key: HDFS-5776
 URL: https://issues.apache.org/jira/browse/HDFS-5776
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs-client
Affects Versions: 3.0.0
Reporter: Liang Xie
Assignee: Liang Xie
 Attachments: HDFS-5776-v2.txt, HDFS-5776-v3.txt, HDFS-5776-v4.txt, 
 HDFS-5776-v5.txt, HDFS-5776-v6.txt, HDFS-5776.txt


 This is a placeholder for backporting the HDFS-related pieces of 
 https://issues.apache.org/jira/browse/HBASE-7509
 The quorum read ability should be especially helpful for optimizing read outliers.
 We can use dfs.dfsclient.quorum.read.threshold.millis & 
 dfs.dfsclient.quorum.read.threadpool.size to enable/disable the hedged read 
 ability from the client side (e.g. HBase), and by using DFSQuorumReadMetrics, we 
 can export the metric values of interest into the client system (e.g. HBase's 
 regionserver metrics).
 The core logic is in the pread code path: we decide to go to the original 
 fetchBlockByteRange or the newly introduced fetchBlockByteRangeSpeculative per 
 the above config items.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5776) Support 'hedged' reads in DFSClient

2014-01-23 Thread Liang Xie (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liang Xie updated HDFS-5776:


Attachment: HDFS-5776-v6.txt

 Support 'hedged' reads in DFSClient
 ---

 Key: HDFS-5776
 URL: https://issues.apache.org/jira/browse/HDFS-5776
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs-client
Affects Versions: 3.0.0
Reporter: Liang Xie
Assignee: Liang Xie
 Attachments: HDFS-5776-v2.txt, HDFS-5776-v3.txt, HDFS-5776-v4.txt, 
 HDFS-5776-v5.txt, HDFS-5776-v6.txt, HDFS-5776.txt


 This is a placeholder for backporting the HDFS-related pieces of 
 https://issues.apache.org/jira/browse/HBASE-7509
 The quorum read ability should be especially helpful for optimizing read outliers.
 We can use dfs.dfsclient.quorum.read.threshold.millis & 
 dfs.dfsclient.quorum.read.threadpool.size to enable/disable the hedged read 
 ability from the client side (e.g. HBase), and by using DFSQuorumReadMetrics, we 
 can export the metric values of interest into the client system (e.g. HBase's 
 regionserver metrics).
 The core logic is in the pread code path: we decide to go to the original 
 fetchBlockByteRange or the newly introduced fetchBlockByteRangeSpeculative per 
 the above config items.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HDFS-5821) TestHDFSCLI fails for user names with the dash character

2014-01-23 Thread Gera Shegalov (JIRA)
Gera Shegalov created HDFS-5821:
---

 Summary: TestHDFSCLI fails for user names with the dash character
 Key: HDFS-5821
 URL: https://issues.apache.org/jira/browse/HDFS-5821
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Affects Versions: 2.2.0
Reporter: Gera Shegalov


testHDFSConf.xml uses regexes inconsistently to match the username, ranging from 
{{[a-zA-z0-9]*}} to {{[a-z]*}}. This by far does not cover the space of possible 
OS user names. For us, it fails for a user name containing a {{-}}. Instead of 
continually updating the regex, we propose to use the macro USERNAME.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5821) TestHDFSCLI fails for user names with the dash character

2014-01-23 Thread Gera Shegalov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gera Shegalov updated HDFS-5821:


Attachment: HDFS-5821-trunk.v01.patch

Patch with the proposed fix

 TestHDFSCLI fails for user names with the dash character
 

 Key: HDFS-5821
 URL: https://issues.apache.org/jira/browse/HDFS-5821
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Affects Versions: 2.2.0
Reporter: Gera Shegalov
 Attachments: HDFS-5821-trunk.v01.patch


 testHDFSConf.xml uses regexes inconsistently to match the username, ranging 
 from {{[a-zA-z0-9]*}} to {{[a-z]*}}. This by far does not cover the space of 
 possible OS user names. For us, it fails for a user name containing a {{-}}. 
 Instead of continually updating the regex, we propose to use the macro USERNAME.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5821) TestHDFSCLI fails for user names with the dash character

2014-01-23 Thread Gera Shegalov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gera Shegalov updated HDFS-5821:


Status: Patch Available  (was: Open)

 TestHDFSCLI fails for user names with the dash character
 

 Key: HDFS-5821
 URL: https://issues.apache.org/jira/browse/HDFS-5821
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Affects Versions: 2.2.0
Reporter: Gera Shegalov
 Attachments: HDFS-5821-trunk.v01.patch


 testHDFSConf.xml uses regexes inconsistently to match the username, ranging 
 from {{[a-zA-z0-9]*}} to {{[a-z]*}}. This by far does not cover the space of 
 possible OS user names. For us, it fails for a user name containing a {{-}}. 
 Instead of continually updating the regex, we propose to use the macro USERNAME.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5821) TestHDFSCLI fails for user names with the dash character

2014-01-23 Thread Gera Shegalov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gera Shegalov updated HDFS-5821:


Description: testHDFSConf.xml uses regexes inconsistently to match the 
username, ranging from {code}[a-zA-z0-9]*{code} to {code}[a-z]*{code}. This by 
far does not cover the space of possible OS user names. For us, it fails for a 
user name containing {{'-'}}. Instead of continually updating the regex, we 
propose to use the macro USERNAME.  (was: testHDFSConf.xml uses regexes 
inconsistently to match the username from {{[a-zA-z0-9]*}} to {{[a-z]*}}. This 
by far does not cover the space of possible OS user names.  For us, it fails 
for a user name containing a {{-}}. Instead of keeping updating regex, we 
propose to use the macro USERNAME.)

 TestHDFSCLI fails for user names with the dash character
 

 Key: HDFS-5821
 URL: https://issues.apache.org/jira/browse/HDFS-5821
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Affects Versions: 2.2.0
Reporter: Gera Shegalov
 Attachments: HDFS-5821-trunk.v01.patch


 testHDFSConf.xml uses regexes inconsistently to match the username, ranging 
 from {code}[a-zA-z0-9]*{code} to {code}[a-z]*{code}. This by far does not 
 cover the space of possible OS user names. For us, it fails for a user name 
 containing {{'-'}}. Instead of continually updating the regex, we propose to 
 use the macro USERNAME.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HDFS-5822) InterruptedException to thread sleep ignored

2014-01-23 Thread Ding Yuan (JIRA)
Ding Yuan created HDFS-5822:
---

 Summary: InterruptedException to thread sleep ignored
 Key: HDFS-5822
 URL: https://issues.apache.org/jira/browse/HDFS-5822
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 2.2.0
Reporter: Ding Yuan


In org/apache/hadoop/hdfs/server/datanode/DataXceiverServer.java, there is the 
following code snippet in the run() method:

156:  } catch (OutOfMemoryError ie) {
157:    IOUtils.cleanup(null, peer);
158:    // DataNode can run out of memory if there is too many transfers.
159:    // Log the event, Sleep for 30 seconds, other transfers may complete by
160:    // then.
161:    LOG.warn("DataNode is out of memory. Will retry in 30 seconds.", ie);
162:    try {
163:      Thread.sleep(30 * 1000);
164:    } catch (InterruptedException e) {
165:      // ignore
166:    }
167:  }

Note that InterruptedException is completely ignored. This might not be safe, 
since any events that would have led to the InterruptedException are lost.

More info on why InterruptedException shouldn't be ignored: 
http://stackoverflow.com/questions/1087475/when-does-javas-thread-sleep-throw-interruptedexception
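
A common remedy (a sketch of the usual idiom, not a patch proposed here) is to 
restore the thread's interrupt status so the enclosing loop can observe it:
{code}
try {
  Thread.sleep(30 * 1000);
} catch (InterruptedException e) {
  // re-assert the interrupt instead of swallowing it, so run()
  // can notice the request and shut down cleanly
  Thread.currentThread().interrupt();
}
{code}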

Thanks,
Ding



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5822) InterruptedException to thread sleep ignored

2014-01-23 Thread Ding Yuan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ding Yuan updated HDFS-5822:


Description: 
In org/apache/hadoop/hdfs/server/datanode/DataXceiverServer.java, there is the 
following code snippet in the run() method:

{noformat}
156:  } catch (OutOfMemoryError ie) {
157:    IOUtils.cleanup(null, peer);
158:    // DataNode can run out of memory if there is too many transfers.
159:    // Log the event, Sleep for 30 seconds, other transfers may complete by
160:    // then.
161:    LOG.warn("DataNode is out of memory. Will retry in 30 seconds.", ie);
162:    try {
163:      Thread.sleep(30 * 1000);
164:    } catch (InterruptedException e) {
165:      // ignore
166:    }
167:  }
{noformat}

Note that InterruptedException is completely ignored. This might not be safe, 
since any events that would have led to the InterruptedException are lost.

More info on why InterruptedException shouldn't be ignored: 
http://stackoverflow.com/questions/1087475/when-does-javas-thread-sleep-throw-interruptedexception

Thanks,
Ding

  was:
In org/apache/hadoop/hdfs/server/datanode/DataXceiverServer.java, there is the 
following code snippet in the run() method:

156:  } catch (OutOfMemoryError ie) {
157:    IOUtils.cleanup(null, peer);
158:    // DataNode can run out of memory if there is too many transfers.
159:    // Log the event, Sleep for 30 seconds, other transfers may complete by
160:    // then.
161:    LOG.warn("DataNode is out of memory. Will retry in 30 seconds.", ie);
162:    try {
163:      Thread.sleep(30 * 1000);
164:    } catch (InterruptedException e) {
165:      // ignore
166:    }
167:  }

Note that InterruptedException is completely ignored. This might not be safe, 
since any events that would have led to the InterruptedException are lost.

More info on why InterruptedException shouldn't be ignored: 
http://stackoverflow.com/questions/1087475/when-does-javas-thread-sleep-throw-interruptedexception

Thanks,
Ding


 InterruptedException to thread sleep ignored
 

 Key: HDFS-5822
 URL: https://issues.apache.org/jira/browse/HDFS-5822
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 2.2.0
Reporter: Ding Yuan

 In org/apache/hadoop/hdfs/server/datanode/DataXceiverServer.java, there is 
 the following code snippet in the run() method:
 {noformat}
 156:  } catch (OutOfMemoryError ie) {
 157:    IOUtils.cleanup(null, peer);
 158:    // DataNode can run out of memory if there is too many transfers.
 159:    // Log the event, Sleep for 30 seconds, other transfers may complete by
 160:    // then.
 161:    LOG.warn("DataNode is out of memory. Will retry in 30 seconds.", ie);
 162:    try {
 163:      Thread.sleep(30 * 1000);
 164:    } catch (InterruptedException e) {
 165:      // ignore
 166:    }
 167:  }
 {noformat}
 Note that InterruptedException is completely ignored. This might not be safe, 
 since any events that would have led to the InterruptedException are lost.
 More info on why InterruptedException shouldn't be ignored: 
 http://stackoverflow.com/questions/1087475/when-does-javas-thread-sleep-throw-interruptedexception
 Thanks,
 Ding



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5804) HDFS NFS Gateway fails to mount and proxy when using Kerberos

2014-01-23 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13879932#comment-13879932
 ] 

Daryn Sharp commented on HDFS-5804:
---

I'm unfamiliar with the nfs code, so take these comments as level-setting.  My 
initial feeling is that the conditional logic is less than desirable.

Relative to the provided patch, I think there's a clean way to avoid the 
explicit root check.  The check seems suspect, in that there shouldn't be a 
pre-condition that the fuse daemon runs as root.  My basic understanding is 
that fuse runs as root to access user ticket caches.  However, there's no 
reason I couldn't map a different username to uid 0, allow a non-privileged 
user to access the ticket caches based on group perms, use SELinux capabilities 
to grant an fsuid of root to the fuse daemon, etc.

Anyway, back to the patch.  A better way may be to check the given username 
against the current user: create a proxy user if they are different, else 
return the current user.  No isSecurityEnabled or root comparison needed.  Or 
better yet, just always create a proxy user.  A proxy will work with or 
without security, and a proxy of the same user should also work.
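
For reference, a minimal sketch of the always-create-a-proxy approach with 
Hadoop's UserGroupInformation (the variable names are illustrative):
{code}
import org.apache.hadoop.security.UserGroupInformation;

// remoteUserName: the user invoking the operation on the hdfs mount
UserGroupInformation proxyUgi = UserGroupInformation.createProxyUser(
    remoteUserName, UserGroupInformation.getCurrentUser());
{code}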

I'm unclear how this patch solves the issue of "root cannot stat /".  A proxy is 
only being created if the user isn't root, so how does this fix the issue?

 HDFS NFS Gateway fails to mount and proxy when using Kerberos
 -

 Key: HDFS-5804
 URL: https://issues.apache.org/jira/browse/HDFS-5804
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: nfs
Affects Versions: 3.0.0, 2.2.0
Reporter: Abin Shahab
 Attachments: HDFS-5804.patch, javadoc-after-patch.log, 
 javadoc-before-patch.log


 When using HDFS nfs gateway with secure hadoop 
 (hadoop.security.authentication: kerberos), mounting hdfs fails. 
 Additionally, there is no mechanism to support proxy user(nfs needs to proxy 
 as the user invoking commands on the hdfs mount).
 Steps to reproduce:
 1) start a hadoop cluster with kerberos enabled.
 2) sudo su -l nfsserver and start an nfs server. This 'nfsserver' account has 
 an account in Kerberos.
 3) Get the keytab for nfsserver, and issue the following mount command: mount 
 -t nfs -o vers=3,proto=tcp,nolock $server:/  $mount_point
 4) You'll see in the nfsserver logs that Kerberos is complaining about not 
 having a TGT for root.
 This is the stacktrace: 
 java.io.IOException: Failed on local exception: java.io.IOException: 
 org.apache.hadoop.security.AccessControlException: Client cannot authenticate 
 via:[TOKEN, KERBEROS]; Host Details : local host is: 
 my-nfs-server-host.com/10.252.4.197; destination host is: 
 my-namenode-host.com:8020; 
   at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:764)
   at org.apache.hadoop.ipc.Client.call(Client.java:1351)
   at org.apache.hadoop.ipc.Client.call(Client.java:1300)
   at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
   at com.sun.proxy.$Proxy9.getFileLinkInfo(Unknown Source)
   at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:606)
   at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
   at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
   at com.sun.proxy.$Proxy9.getFileLinkInfo(Unknown Source)
   at 
 org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileLinkInfo(ClientNamenodeProtocolTranslatorPB.java:664)
   at org.apache.hadoop.hdfs.DFSClient.getFileLinkInfo(DFSClient.java:1713)
   at 
 org.apache.hadoop.hdfs.nfs.nfs3.Nfs3Utils.getFileStatus(Nfs3Utils.java:58)
   at 
 org.apache.hadoop.hdfs.nfs.nfs3.Nfs3Utils.getFileAttr(Nfs3Utils.java:79)
   at 
 org.apache.hadoop.hdfs.nfs.nfs3.RpcProgramNfs3.fsinfo(RpcProgramNfs3.java:1643)
   at 
 org.apache.hadoop.hdfs.nfs.nfs3.RpcProgramNfs3.handleInternal(RpcProgramNfs3.java:1891)
   at 
 org.apache.hadoop.oncrpc.RpcProgram.messageReceived(RpcProgram.java:143)
   at 
 org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
   at 
 org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
   at 
 org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787)
   at 
 org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:281)
   at 
 org.apache.hadoop.oncrpc.RpcUtil$RpcMessageParserStage.messageReceived(RpcUtil.java:132)
   at 
 

[jira] [Updated] (HDFS-5789) Some of snapshot APIs missing checkOperation double check in fsn

2014-01-23 Thread Uma Maheswara Rao G (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G updated HDFS-5789:
--

Summary: Some of snapshot APIs missing checkOperation double check in fsn  
(was: Some of snapshot, Cache APIs missing checkOperation double check in fsn)

 Some of snapshot APIs missing checkOperation double check in fsn
 

 Key: HDFS-5789
 URL: https://issues.apache.org/jira/browse/HDFS-5789
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 3.0.0
Reporter: Uma Maheswara Rao G
Assignee: Uma Maheswara Rao G

 HDFS-4591 introduced a double check of the HA state while taking the fsn lock:
 checkOperation is called before actually taking the lock, and again after the 
 lock is acquired.
 This pattern is missing in some of the snapshot APIs and cache-management 
 related APIs.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5789) Some of snapshot APIs missing checkOperation double check in fsn

2014-01-23 Thread Uma Maheswara Rao G (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G updated HDFS-5789:
--

Attachment: HDFS-5789.patch

Attached a simple patch for review.
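
For context, the double-check pattern from HDFS-4591 that this patch extends to 
the remaining APIs looks schematically like this (a schematic, not the patch 
itself):
{code}
checkOperation(OperationCategory.WRITE);   // cheap HA-state check, no lock held
writeLock();
try {
  checkOperation(OperationCategory.WRITE); // re-check: HA state may have changed
  // ... the actual snapshot or cache operation ...
} finally {
  writeUnlock();
}
{code}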

 Some of snapshot APIs missing checkOperation double check in fsn
 

 Key: HDFS-5789
 URL: https://issues.apache.org/jira/browse/HDFS-5789
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 3.0.0
Reporter: Uma Maheswara Rao G
Assignee: Uma Maheswara Rao G
 Attachments: HDFS-5789.patch


 HDFS-4591 introduced a double check of the HA state while taking the fsn lock:
 checkOperation is called before actually taking the lock, and again after the 
 lock is acquired.
 This pattern is missing in some of the snapshot APIs and cache-management 
 related APIs.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5318) Support read-only and read-write paths to shared replicas

2014-01-23 Thread Eric Sirianni (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Sirianni updated HDFS-5318:


Attachment: HDFS-5318c-branch-2.patch

 Support read-only and read-write paths to shared replicas
 -

 Key: HDFS-5318
 URL: https://issues.apache.org/jira/browse/HDFS-5318
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Affects Versions: 2.4.0
Reporter: Eric Sirianni
 Attachments: HDFS-5318.patch, HDFS-5318a-branch-2.patch, 
 HDFS-5318b-branch-2.patch, HDFS-5318c-branch-2.patch, hdfs-5318.pdf


 There are several use cases for using shared-storage for datanode block 
 storage in an HDFS environment (storing cold blocks on a NAS device, Amazon 
 S3, etc.).
 With shared-storage, there is a distinction between:
 # a distinct physical copy of a block
 # an access-path to that block via a datanode.  
 A single 'replication count' metric cannot accurately capture both aspects.  
 However, for most of the current uses of 'replication count' in the Namenode, 
 the number of physical copies aspect seems to be the appropriate semantic.
 I propose altering the replication counting algorithm in the Namenode to 
 accurately infer distinct physical copies in a shared storage environment.  
 With HDFS-5115, a {{StorageID}} is a UUID.  I propose associating some minor 
 additional semantics to the {{StorageID}} - namely that multiple datanodes 
 attaching to the same physical shared storage pool should report the same 
 {{StorageID}} for that pool.  A minor modification would be required in the 
 DataNode to enable the generation of {{StorageID}}s to be pluggable behind 
 the {{FsDatasetSpi}} interface.  
 With those semantics in place, the number of physical copies of a block in a 
 shared storage environment can be calculated as the number of _distinct_ 
 {{StorageID}}s associated with that block.
 Consider the following combinations for two {{(DataNode ID, Storage ID)}} 
 pairs {{(DN_A, S_A) (DN_B, S_B)}} for a given block B:
 * {{DN_A != DN_B && S_A != S_B}} - *different* access paths to *different* 
 physical replicas (i.e. the traditional HDFS case with local disks)
 ** → Block B has {{ReplicationCount == 2}}
 * {{DN_A != DN_B && S_A == S_B}} - *different* access paths to the *same* 
 physical replica (e.g. HDFS datanodes mounting the same NAS share)
 ** → Block B has {{ReplicationCount == 1}}
 For example, if block B has the following location tuples:
 * {{DN_1, STORAGE_A}}
 * {{DN_2, STORAGE_A}}
 * {{DN_3, STORAGE_B}}
 * {{DN_4, STORAGE_B}},
 the effect of this proposed change would be to calculate the replication 
 factor in the namenode as *2* instead of *4*.
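
In other words, an illustrative sketch of the proposed counting (the type and 
accessor names here are made up, not the patch's API):
{code}
import java.util.HashSet;
import java.util.Set;

// count physical copies as the number of distinct storage IDs
Set<String> storageIds = new HashSet<String>();
for (LocationTuple loc : locationsOf(blockB)) {  // hypothetical accessor
  storageIds.add(loc.getStorageID());            // hypothetical getter
}
int replicationCount = storageIds.size();        // 2 in the example above, not 4
{code}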



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5789) Some of snapshot APIs missing checkOperation double check in fsn

2014-01-23 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE updated HDFS-5789:
-

Hadoop Flags: Reviewed

+1 patch looks good.

 Some of snapshot APIs missing checkOperation double check in fsn
 

 Key: HDFS-5789
 URL: https://issues.apache.org/jira/browse/HDFS-5789
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 3.0.0
Reporter: Uma Maheswara Rao G
Assignee: Uma Maheswara Rao G
 Attachments: HDFS-5789.patch


 HDFS-4591 introduced a double check of the HA state while taking the fsn lock:
 checkOperation is called before actually taking the lock, and again after the 
 lock is acquired.
 This pattern is missing in some of the snapshot APIs and cache-management 
 related APIs.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5789) Some of snapshot APIs missing checkOperation double check in fsn

2014-01-23 Thread Uma Maheswara Rao G (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G updated HDFS-5789:
--

Status: Patch Available  (was: Open)

 Some of snapshot APIs missing checkOperation double check in fsn
 

 Key: HDFS-5789
 URL: https://issues.apache.org/jira/browse/HDFS-5789
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 3.0.0
Reporter: Uma Maheswara Rao G
Assignee: Uma Maheswara Rao G
 Attachments: HDFS-5789.patch


 HDFS-4591 introduced a double check of the HA state while taking the fsn lock:
 checkOperation is called before actually taking the lock, and again after the 
 lock is acquired.
 This pattern is missing in some of the snapshot APIs and cache-management 
 related APIs.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5789) Some of snapshot APIs missing checkOperation double check in fsn

2014-01-23 Thread Uma Maheswara Rao G (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G updated HDFS-5789:
--

   Resolution: Fixed
Fix Version/s: 2.3.0
   3.0.0
   Status: Resolved  (was: Patch Available)

Thanks a lot, Nicholas for the review!
I have just committed this to trunk and branch-2.

 Some of snapshot APIs missing checkOperation double check in fsn
 

 Key: HDFS-5789
 URL: https://issues.apache.org/jira/browse/HDFS-5789
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 3.0.0
Reporter: Uma Maheswara Rao G
Assignee: Uma Maheswara Rao G
 Fix For: 3.0.0, 2.3.0

 Attachments: HDFS-5789.patch


 HDFS-4591 introduced a double check of the HA state while taking the fsn lock:
 checkOperation is called before actually taking the lock, and again after the 
 lock is acquired.
 This pattern is missing in some of the snapshot APIs and cache-management 
 related APIs.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5788) listLocatedStatus response can be very large

2014-01-23 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13880062#comment-13880062
 ] 

Daryn Sharp commented on HDFS-5788:
---

For a bit more context, we had ~6-7k tasks (erroneously) issuing 
listLocatedStatus.  Each limited response was over 1MB.  The handler attempts a 
non-blocking write for the response.  If the entire response cannot be written, 
the call is added to the background responder thread.  The kernel accepts well 
below 1M for a non-blocking write so all the responses were added to the 
responder thread.

The call response byte buffers track the position of the last write, thus the 
entire response buffer is retained until the full response is sent.  
Re-allocating a buffer with the unsent response will likely introduce 
additional memory pressure, so the most logical/simplistic change is limiting 
the response size of the located status.

The end result in our case was the heap bloating by over 8G.  Full GC kicked 
in.  The NN was unresponsive for up to 5m at a time.  Each time it woke up it 
marked DNs as dead, causing a flurry of replications which further aggravated 
the memory issue.  Due to other exposed bugs, the NN required a restart.

Although more RPCs are required to satisfy the large requests, I believe the 
tradeoff is reasonable.  It's also not likely to be a common occurrence.
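
Schematically, the proposed limiting (an illustration only; {{candidates}} 
stands in for the remaining directory entries, and the budget value is made up):
{code}
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.fs.LocatedFileStatus;

int locationBudget = 1000;  // illustrative cap on returned block locations
List<LocatedFileStatus> batch = new ArrayList<LocatedFileStatus>();
for (LocatedFileStatus stat : candidates) {
  batch.add(stat);
  locationBudget -= stat.getBlockLocations().length;
  if (locationBudget <= 0) {
    break;  // the client re-issues the listing from the last returned name
  }
}
{code}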

 listLocatedStatus response can be very large
 

 Key: HDFS-5788
 URL: https://issues.apache.org/jira/browse/HDFS-5788
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Affects Versions: 3.0.0, 0.23.10, 2.2.0
Reporter: Nathan Roberts
Assignee: Nathan Roberts
 Attachments: HDFS-5788.patch


 Currently we limit the size of listStatus requests to a default of 1000 
 entries. This works fine except in the case of listLocatedStatus where the 
 location information can be quite large. As an example, a directory with 7000 
 entries, 4 blocks each, 3 way replication - a listLocatedStatus response is 
 over 1MB. This can chew up very large amounts of memory in the NN if lots of 
 clients try to do this simultaneously.
 Seems like it would be better if we also considered the amount of location 
 information being returned when deciding how many files to return.
 Patch will follow shortly.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5788) listLocatedStatus response can be very large

2014-01-23 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13880068#comment-13880068
 ] 

Kihwal Lee commented on HDFS-5788:
--

The location counting can be off if blocks are under-replicated or 
over-replicated, but spending more cycles to make it perfect will be a waste. 
So I am okay with this approach.

+1

 listLocatedStatus response can be very large
 

 Key: HDFS-5788
 URL: https://issues.apache.org/jira/browse/HDFS-5788
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Affects Versions: 3.0.0, 0.23.10, 2.2.0
Reporter: Nathan Roberts
Assignee: Nathan Roberts
 Attachments: HDFS-5788.patch


 Currently we limit the size of listStatus requests to a default of 1000 
 entries. This works fine except in the case of listLocatedStatus where the 
 location information can be quite large. As an example, a directory with 7000 
 entries, 4 blocks each, 3 way replication - a listLocatedStatus response is 
 over 1MB. This can chew up very large amounts of memory in the NN if lots of 
 clients try to do this simultaneously.
 Seems like it would be better if we also considered the amount of location 
 information being returned when deciding how many files to return.
 Patch will follow shortly.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5343) When cat command is issued on snapshot files getting unexpected result

2014-01-23 Thread Uma Maheswara Rao G (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13880069#comment-13880069
 ] 

Uma Maheswara Rao G commented on HDFS-5343:
---

Thanks for updating the patch, Sathish.

Here you need to reset stdout back to the original System.out.
So take a backup of System.out first, then run your code, and in the finally 
block reset it back to System.out.
{code}
ByteArrayOutputStream bao = new ByteArrayOutputStream();
+System.setOut(new PrintStream(bao));
+System.setErr(new PrintStream(bao));
..
{code}

The above code should be something like this:

{code}
PrintStream psOutBackup = System.out;
PrintStream psErrBackup = System.err;
ByteArrayOutputStream bao = new ByteArrayOutputStream();
System.setOut(new PrintStream(bao));
System.setErr(new PrintStream(bao));

try {
  ..
} finally {
  // restore stderr too, or later tests keep writing into bao
  System.setOut(psOutBackup);
  System.setErr(psErrBackup);
}
{code}
Otherwise, all System.out output from the next test onwards may go into the 
stream set above.

Also please keep one empty line gap between testcases.
{code}
fis.close();
   }
+  /**
+   * Adding as part of jira HDFS-5343
{code}


 When cat command is issued on snapshot files getting unexpected result
 --

 Key: HDFS-5343
 URL: https://issues.apache.org/jira/browse/HDFS-5343
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs-client
Reporter: sathish
Assignee: sathish
 Attachments: HDFS-5343-0003.patch, HDFS-5343-002.patch


 First, if we create a file with some length and take a snapshot of it, and 
 then append some data to that file through the append method: if we run the 
 cat command on the snapshot of that file, it should display only the data 
 written by the create operation, but it displays the total data, i.e. the 
 created + appended data.
 But if we do the same operation and read the contents of the snapshot file 
 through an input stream, it displays just the data created in the snapshotted 
 file.
 So the behaviour of the cat command and reading through an input stream 
 differ.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5754) Split LayoutVerion into NamenodeLayoutVersion and DatanodeLayoutVersion

2014-01-23 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13880077#comment-13880077
 ] 

Tsz Wo (Nicholas), SZE commented on HDFS-5754:
--

- The patch adds the new system properties (HDFS_SERVICE_LAYOUT etc.) for 
initializing LayoutVersion.map.  It seems that it won't work for unit tests 
since a test may run both NN and DN but we only have one map.  I think we need 
two maps.  Do you agree?

- If HdfsConstants.LAYOUT_VERSION is initialized with NameNode.Feature.values() 
as below, the code in DN using HdfsConstants.LAYOUT_VERSION is incorrect.  I 
guess it needs two LAYOUT_VERSIONs.
{code}
//HdfsConstants
   public static final int LAYOUT_VERSION = LayoutVersion
-  .getCurrentLayoutVersion();
+  .getCurrentLayoutVersion(NameNode.Feature.values());
{code}

- LayoutFeatureComparator.compare(..) could be simplified as below
{code}
public int compare(LayoutFeature arg0, LayoutFeature arg1) {
  return arg0.getLayoutVersion() - arg1.getLayoutVersion();
}
{code}

- In the patch, the Feature enums in LayoutVersion, NameNode and DataNode are 
very similar.  We could add a new class FeatureInfo to reduce the repeated code, 
as shown in FeatureInfo.patch.

 Split LayoutVerion into NamenodeLayoutVersion and DatanodeLayoutVersion 
 

 Key: HDFS-5754
 URL: https://issues.apache.org/jira/browse/HDFS-5754
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode, namenode
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Brandon Li
 Attachments: HDFS-5754.001.patch, HDFS-5754.002.patch, 
 HDFS-5754.003.patch, HDFS-5754.004.patch, HDFS-5754.006.patch, 
 HDFS-5754.007.patch, HDFS-5754.008.patch


 Currently, LayoutVersion defines the on-disk data format and supported 
 features of the entire cluster including NN and DNs.  LayoutVersion is 
 persisted in both NN and DNs.  When a NN/DN starts up, it checks its 
 supported LayoutVersion against the on-disk LayoutVersion.  Also, a DN with a 
 different LayoutVersion than NN cannot register with the NN.
 We propose to split LayoutVersion into two independent values that are local 
 to the nodes:
 - NamenodeLayoutVersion - defines the on-disk data format in NN, including 
 the format of FSImage, editlog and the directory structure.
 - DatanodeLayoutVersion - defines the on-disk data format in DN, including 
 the format of block data file, metadata file, block pool layout, and the 
 directory structure.  
 The LayoutVersion check will be removed in DN registration.  If 
 NamenodeLayoutVersion or DatanodeLayoutVersion is changed in a rolling 
 upgrade, then only rollback is supported and downgrade is not.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5754) Split LayoutVerion into NamenodeLayoutVersion and DatanodeLayoutVersion

2014-01-23 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE updated HDFS-5754:
-

Attachment: FeatureInfo.patch

 Split LayoutVerion into NamenodeLayoutVersion and DatanodeLayoutVersion 
 

 Key: HDFS-5754
 URL: https://issues.apache.org/jira/browse/HDFS-5754
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode, namenode
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Brandon Li
 Attachments: FeatureInfo.patch, HDFS-5754.001.patch, 
 HDFS-5754.002.patch, HDFS-5754.003.patch, HDFS-5754.004.patch, 
 HDFS-5754.006.patch, HDFS-5754.007.patch, HDFS-5754.008.patch


 Currently, LayoutVersion defines the on-disk data format and supported 
 features of the entire cluster including NN and DNs.  LayoutVersion is 
 persisted in both NN and DNs.  When a NN/DN starts up, it checks its 
 supported LayoutVersion against the on-disk LayoutVersion.  Also, a DN with a 
 different LayoutVersion than NN cannot register with the NN.
 We propose to split LayoutVersion into two independent values that are local 
 to the nodes:
 - NamenodeLayoutVersion - defines the on-disk data format in NN, including 
 the format of FSImage, editlog and the directory structure.
 - DatanodeLayoutVersion - defines the on-disk data format in DN, including 
 the format of block data file, metadata file, block pool layout, and the 
 directory structure.  
 The LayoutVersion check will be removed in DN registration.  If 
 NamenodeLayoutVersion or DatanodeLayoutVersion is changed in a rolling 
 upgrade, then only rollback is supported and downgrade is not.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5788) listLocatedStatus response can be very large

2014-01-23 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-5788:
-

   Resolution: Fixed
Fix Version/s: 2.4.0
   3.0.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

Thanks for working on the issue, Nathan. I've committed it to trunk and 
branch-2.

 listLocatedStatus response can be very large
 

 Key: HDFS-5788
 URL: https://issues.apache.org/jira/browse/HDFS-5788
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Affects Versions: 3.0.0, 0.23.10, 2.2.0
Reporter: Nathan Roberts
Assignee: Nathan Roberts
 Fix For: 3.0.0, 2.4.0

 Attachments: HDFS-5788.patch


 Currently we limit the size of listStatus requests to a default of 1000 
 entries. This works fine except in the case of listLocatedStatus where the 
 location information can be quite large. As an example, a directory with 7000 
 entries, 4 blocks each, 3 way replication - a listLocatedStatus response is 
 over 1MB. This can chew up very large amounts of memory in the NN if lots of 
 clients try to do this simultaneously.
 Seems like it would be better if we also considered the amount of location 
 information being returned when deciding how many files to return.
 Patch will follow shortly.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HDFS-5823) Document async audit logging

2014-01-23 Thread Daryn Sharp (JIRA)
Daryn Sharp created HDFS-5823:
-

 Summary: Document async audit logging
 Key: HDFS-5823
 URL: https://issues.apache.org/jira/browse/HDFS-5823
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp


HDFS-5241 added an option for async log4j audit logging.  The option is 
considered semi-experimental and should be documented in hdfs-defaults.xml 
after its stability under stress is proven.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5241) Provide alternate queuing audit logger to reduce logging contention

2014-01-23 Thread Daryn Sharp (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daryn Sharp updated HDFS-5241:
--

  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Committed to trunk and branch-2.

 Provide alternate queuing audit logger to reduce logging contention
 ---

 Key: HDFS-5241
 URL: https://issues.apache.org/jira/browse/HDFS-5241
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp
 Attachments: HDFS-5241.patch, HDFS-5241.patch


 The default audit logger has extremely poor performance.  The internal 
 synchronization of log4j causes massive contention between the call handlers 
 (100 by default) which drastically limits the throughput of the NN.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Resolved] (HDFS-5799) Make audit logging consistent across ACL APIs.

2014-01-23 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth resolved HDFS-5799.
-

   Resolution: Fixed
Fix Version/s: HDFS ACLs (HDFS-4685)
 Hadoop Flags: Reviewed

Thanks for the review, Arpit.  I committed this to the feature branch.

 Make audit logging consistent across ACL APIs.
 --

 Key: HDFS-5799
 URL: https://issues.apache.org/jira/browse/HDFS-5799
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: HDFS ACLs (HDFS-4685)
Reporter: Chris Nauroth
Assignee: Chris Nauroth
 Fix For: HDFS ACLs (HDFS-4685)

 Attachments: HDFS-5799.1.patch


 Currently, the various ACL APIs are not writing to the audit log 
 consistently.  This patch will ensure that all ACL APIs write to the audit 
 log and finalize the information that they write.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5781) Use a map to record the mapping between FSEditLogOpCode and the corresponding byte value

2014-01-23 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-5781:


Attachment: HDFS-5781.002.patch

Resubmit the patch to trigger Jenkins.

 Use a map to record the mapping between FSEditLogOpCode and the corresponding 
 byte value
 

 Key: HDFS-5781
 URL: https://issues.apache.org/jira/browse/HDFS-5781
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Affects Versions: 2.4.0
Reporter: Jing Zhao
Assignee: Jing Zhao
Priority: Minor
 Attachments: HDFS-5781.000.patch, HDFS-5781.001.patch, 
 HDFS-5781.002.patch, HDFS-5781.002.patch


 HDFS-5674 uses Enum.values and enum.ordinal to identify an editlog op for a 
 given byte value. While improving efficiency, it may cause issues. E.g., 
 when several new editlog ops are added to trunk around the same time (for 
 several different new features), it is hard to backport the editlog ops with 
 larger byte values to branch-2 before those with smaller values, since there 
 will be gaps in the byte values of the enum. 
 This jira plans to still use a map to record the mapping between editlog ops 
 and their byte values. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HDFS-5824) Add a Type field in Snapshot DiffEntry's protobuf definition

2014-01-23 Thread Jing Zhao (JIRA)
Jing Zhao created HDFS-5824:
---

 Summary: Add a Type field in Snapshot DiffEntry's protobuf 
definition
 Key: HDFS-5824
 URL: https://issues.apache.org/jira/browse/HDFS-5824
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Jing Zhao
Assignee: Jing Zhao
Priority: Minor


We need to add a Type field to differentiate FileDiff and DirectoryDiff in our  
protobuf to enable the offline image viewer to parse the fsimage.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5824) Add a Type field in Snapshot DiffEntry's protobuf definition

2014-01-23 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-5824:


Attachment: HDFS-5824.000.patch

 Add a Type field in Snapshot DiffEntry's protobuf definition
 

 Key: HDFS-5824
 URL: https://issues.apache.org/jira/browse/HDFS-5824
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Jing Zhao
Assignee: Jing Zhao
Priority: Minor
 Attachments: HDFS-5824.000.patch


 We need to add a Type field to differentiate FileDiff and DirectoryDiff in 
 our  protobuf to enable the offline image viewer to parse the fsimage.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5824) Add a Type field in Snapshot DiffEntry's protobuf definition

2014-01-23 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13880227#comment-13880227
 ] 

Haohui Mai commented on HDFS-5824:
--

+1

 Add a Type field in Snapshot DiffEntry's protobuf definition
 

 Key: HDFS-5824
 URL: https://issues.apache.org/jira/browse/HDFS-5824
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Jing Zhao
Assignee: Jing Zhao
Priority: Minor
 Attachments: HDFS-5824.000.patch


 We need to add a Type field to differentiate FileDiff and DirectoryDiff in 
 our  protobuf to enable the offline image viewer to parse the fsimage.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Resolved] (HDFS-5824) Add a Type field in Snapshot DiffEntry's protobuf definition

2014-01-23 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao resolved HDFS-5824.
-

   Resolution: Fixed
Fix Version/s: HDFS-5698 (FSImage in protobuf)
 Hadoop Flags: Reviewed

Thanks for the review, Haohui! I've committed this.

 Add a Type field in Snapshot DiffEntry's protobuf definition
 

 Key: HDFS-5824
 URL: https://issues.apache.org/jira/browse/HDFS-5824
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Jing Zhao
Assignee: Jing Zhao
Priority: Minor
 Fix For: HDFS-5698 (FSImage in protobuf)

 Attachments: HDFS-5824.000.patch


 We need to add a Type field to differentiate FileDiff and DirectoryDiff in 
 our  protobuf to enable the offline image viewer to parse the fsimage.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5808) Implement cancellation when saving FSImage

2014-01-23 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai updated HDFS-5808:
-

Attachment: HDFS-5808.001.patch

Rebased

 Implement cancellation when saving FSImage
 --

 Key: HDFS-5808
 URL: https://issues.apache.org/jira/browse/HDFS-5808
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Haohui Mai
Assignee: Haohui Mai
 Fix For: HDFS-5698 (FSImage in protobuf)

 Attachments: HDFS-5808.000.patch, HDFS-5808.001.patch


 This jira proposes to implement checking whether the user has cancelled the 
 operation when saving the fsimage.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Resolved] (HDFS-5808) Implement cancellation when saving FSImage

2014-01-23 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao resolved HDFS-5808.
-

  Resolution: Fixed
Hadoop Flags: Reviewed

+1. I've committed this.

 Implement cancellation when saving FSImage
 --

 Key: HDFS-5808
 URL: https://issues.apache.org/jira/browse/HDFS-5808
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Haohui Mai
Assignee: Haohui Mai
 Fix For: HDFS-5698 (FSImage in protobuf)

 Attachments: HDFS-5808.000.patch, HDFS-5808.001.patch


 This jira proposes to implement checking whether the user has cancelled the 
 operation when saving the fsimage.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5776) Support 'hedged' reads in DFSClient

2014-01-23 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13880279#comment-13880279
 ] 

Jing Zhao commented on HDFS-5776:
-

# In DFSClient, I agree with Arpit that we should remove the allowHedgedReads 
field and the enable/disable methods. In the current code, whether hedged read 
is enabled is determined by the initial setting of the hedgedReadThreadPool. If 
we provide these extra enable/disable methods, what happens if a user of 
DFSClient sets the thread pool size to 0 and later calls enableHedgedReads? 
Unless we have a clear use case for the enable/disable methods, I guess we do 
not need to provide this flexibility here.
An alternative is to have an Allow-Hedged-Reads configuration: if it is set to 
true, we load the thread pool size and the threshold time. We would provide an 
isHedgedReadsEnabled method but no enable/disable methods. I guess this may be 
easier for users to understand (see the sketch after this list).
# Can this scenario happen? In hedgedFetchBlockByteRange, if we hit the 
timeout for the first DN, we add that DN to the ignore list and call 
chooseDataNode again. If the first DN is the only DN we can read from, we will 
get an IOException from bestNode. Then we run into a loop where we keep trying 
to get another DN multiple times (some NN RPC calls will even be fired), and 
during this process the first DN may even return the data. In this scenario we 
may get worse performance? So I guess we should not trigger a hedged read if 
we find that we cannot (easily) find a second DN to read from.
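
Schematically, the alternative in point 1 (all key and method names here are 
illustrative, not from the patch):
{code}
boolean hedgedEnabled = conf.getBoolean("dfs.client.allow.hedged.reads", false);
if (hedgedEnabled) {
  int poolSize = conf.getInt("dfs.dfsclient.quorum.read.threadpool.size", 10);
  long thresholdMillis =
      conf.getLong("dfs.dfsclient.quorum.read.threshold.millis", 500);
  initHedgedReadPool(poolSize, thresholdMillis);  // hypothetical initializer
}
{code}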

 Support 'hedged' reads in DFSClient
 ---

 Key: HDFS-5776
 URL: https://issues.apache.org/jira/browse/HDFS-5776
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs-client
Affects Versions: 3.0.0
Reporter: Liang Xie
Assignee: Liang Xie
 Attachments: HDFS-5776-v2.txt, HDFS-5776-v3.txt, HDFS-5776-v4.txt, 
 HDFS-5776-v5.txt, HDFS-5776-v6.txt, HDFS-5776.txt


 This is a placeholder for backporting the HDFS-related pieces of 
 https://issues.apache.org/jira/browse/HBASE-7509
 The quorum read ability should be especially helpful for optimizing read outliers.
 We can use dfs.dfsclient.quorum.read.threshold.millis & 
 dfs.dfsclient.quorum.read.threadpool.size to enable/disable the hedged read 
 ability from the client side (e.g. HBase), and by using DFSQuorumReadMetrics, we 
 can export the metric values of interest into the client system (e.g. HBase's 
 regionserver metrics).
 The core logic is in the pread code path: we decide to go to the original 
 fetchBlockByteRange or the newly introduced fetchBlockByteRangeSpeculative per 
 the above config items.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5776) Support 'hedged' reads in DFSClient

2014-01-23 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13880280#comment-13880280
 ] 

Colin Patrick McCabe commented on HDFS-5776:


bq. One note on 1.. Is static ever a good idea for sharing resources? But your 
point of being able to share amongst DFSClient instances is for sure something 
we should pursue (in another JIRA?)

Unfortunately, the {{FileContext}} API creates a new {{DFSClient}} instance for 
each operation that it does.  (The older {{FileSystem}} API doesn't have this 
problem, since the {{DistributedFileSystem}} object hangs on to the 
{{DFSClient}} for a while.)  This means that we do need to put this in a 
static, for now, or else {{FileContext}} users will be constantly destroying 
and creating thread-pools.

I have another change pending which creates the concept of a cache context, 
where different threads can use different contexts if they like.  For now, 
let's use a static variable, maybe with a TODO.

bq. Related to the previous - what do you think of not exposing the 
DFS_DFSCLIENT_HEDGED_READ_THREADPOOL_SIZE setting at all? Maybe we can just 
expose a boolean setting to enable it. The reason I prefer not to surface such 
settings is because it invites abuse (the concern is not with trusted apps like 
HBase). If we do expose this setting we should at least have an internal upper 
bound.

I don't see why we wouldn't expose this setting.  It doesn't give the client 
the ability to do anything bad it couldn't already do.  You can already try to 
open a zillion files at once in order to attack the {{NameNode}} / 
{{DataNodes}}.  Preventing denial-of-service attacks is not currently something 
we try to do.  And in the future, if we ever do try to prevent 
denial-of-service attacks, I don't think having hedged reads makes that any 
more or less difficult than it would otherwise be.



[jira] [Commented] (HDFS-5776) Support 'hedged' reads in DFSClient

2014-01-23 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13880284#comment-13880284
 ] 

Colin Patrick McCabe commented on HDFS-5776:


By the way, my previous comment was assuming that the alternative proposed to 
making the thread-pool static was putting it in DFSClient (not a good option).  
Another option would be making the thread-pool local to the DFSInputStream.  
However, this seems like it will tend to create an enormous number of threads, 
especially for applications like HBase that open many files.  So again I would 
argue it should be static.



[jira] [Updated] (HDFS-5804) HDFS NFS Gateway fails to mount and proxy when using Kerberos

2014-01-23 Thread Abin Shahab (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abin Shahab updated HDFS-5804:
--

Attachment: HDFS-5804.patch

Updated patch does not have special case on root.
Tested with nfs-gateway running as a non-root kerberized user.

 HDFS NFS Gateway fails to mount and proxy when using Kerberos
 -

 Key: HDFS-5804
 URL: https://issues.apache.org/jira/browse/HDFS-5804
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: nfs
Affects Versions: 3.0.0, 2.2.0
Reporter: Abin Shahab
 Attachments: HDFS-5804.patch, HDFS-5804.patch, 
 javadoc-after-patch.log, javadoc-before-patch.log


 When using the HDFS nfs gateway with secure hadoop 
 (hadoop.security.authentication: kerberos), mounting hdfs fails. 
 Additionally, there is no mechanism to support a proxy user (nfs needs to 
 proxy as the user invoking commands on the hdfs mount).
 Steps to reproduce:
 1) Start a hadoop cluster with kerberos enabled.
 2) sudo su -l nfsserver and start an nfs server. This 'nfsserver' account 
 has an account in kerberos.
 3) Get the keytab for nfsserver, and issue the following mount command: mount 
 -t nfs -o vers=3,proto=tcp,nolock $server:/  $mount_point
 4) You'll see in the nfsserver logs that Kerberos is complaining about not 
 having a TGT for root.
 This is the stacktrace: 
 java.io.IOException: Failed on local exception: java.io.IOException: 
 org.apache.hadoop.security.AccessControlException: Client cannot authenticate 
 via:[TOKEN, KERBEROS]; Host Details : local host is: 
 my-nfs-server-host.com/10.252.4.197; destination host is: 
 my-namenode-host.com:8020; 
   at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:764)
   at org.apache.hadoop.ipc.Client.call(Client.java:1351)
   at org.apache.hadoop.ipc.Client.call(Client.java:1300)
   at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
   at com.sun.proxy.$Proxy9.getFileLinkInfo(Unknown Source)
   at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:606)
   at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
   at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
   at com.sun.proxy.$Proxy9.getFileLinkInfo(Unknown Source)
   at 
 org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileLinkInfo(ClientNamenodeProtocolTranslatorPB.java:664)
   at org.apache.hadoop.hdfs.DFSClient.getFileLinkInfo(DFSClient.java:1713)
   at 
 org.apache.hadoop.hdfs.nfs.nfs3.Nfs3Utils.getFileStatus(Nfs3Utils.java:58)
   at 
 org.apache.hadoop.hdfs.nfs.nfs3.Nfs3Utils.getFileAttr(Nfs3Utils.java:79)
   at 
 org.apache.hadoop.hdfs.nfs.nfs3.RpcProgramNfs3.fsinfo(RpcProgramNfs3.java:1643)
   at 
 org.apache.hadoop.hdfs.nfs.nfs3.RpcProgramNfs3.handleInternal(RpcProgramNfs3.java:1891)
   at 
 org.apache.hadoop.oncrpc.RpcProgram.messageReceived(RpcProgram.java:143)
   at 
 org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
   at 
 org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
   at 
 org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787)
   at 
 org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:281)
   at 
 org.apache.hadoop.oncrpc.RpcUtil$RpcMessageParserStage.messageReceived(RpcUtil.java:132)
   at 
 org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
   at 
 org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
   at 
 org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787)
   at 
 org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
   at 
 org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:462)
   at 
 org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:443)
   at 
 org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303)
   at 
 org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
   at 
 org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
   at 
 org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:555)
   at 
 

[jira] [Commented] (HDFS-5808) Implement cancellation when saving FSImage

2014-01-23 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13880296#comment-13880296
 ] 

Suresh Srinivas commented on HDFS-5808:
---

Can you please add a description of why this is needed (the use case) and what 
the patch does?

 Implement cancellation when saving FSImage
 --

 Key: HDFS-5808
 URL: https://issues.apache.org/jira/browse/HDFS-5808
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Haohui Mai
Assignee: Haohui Mai
 Fix For: HDFS-5698 (FSImage in protobuf)

 Attachments: HDFS-5808.000.patch, HDFS-5808.001.patch


 This jira proposes to implement checking whether the user has cancelled the 
 operation when saving the fsimage.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5808) Implement cancellation when saving FSImage

2014-01-23 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai updated HDFS-5808:
-

Description: 
The code should be able to cancel an in-progress fsimage save / checkpoint. 
When a failover happens, the code needs to cancel the checkpoint operations in 
a timely manner so that the failover sequence can proceed.

The same functionality exists in the old code. This jira proposes to implement 
the same functionality in the new fsimage code.

  was:This jira proposes to implement checking whether the user has cancelled 
the operation when saving the fsimage.
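
A minimal sketch of such a cancellation hook (purely illustrative; the class
and method names are not the actual HDFS ones):

{code}
import java.io.IOException;

// Illustrative canceler: the failover path calls cancel(), and the image
// saver calls checkCancelled() between the sections it writes out.
class CheckpointCanceler {
  private volatile String cancelReason;

  void cancel(String reason) {
    this.cancelReason = reason;
  }

  void checkCancelled() throws IOException {
    if (cancelReason != null) {
      throw new IOException("FSImage save cancelled: " + cancelReason);
    }
  }
}
{code}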




[jira] [Updated] (HDFS-5804) HDFS NFS Gateway fails to mount and proxy when using Kerberos

2014-01-23 Thread Abin Shahab (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abin Shahab updated HDFS-5804:
--

Attachment: exception-as-root.log

This is the exception I get now. 
Root is doing the mount of nfs. As part of the mount, it issues an FSINFO 
call, which fails, and that failure fails the mount.

I propose we catch and log the access control exception for this failure, but 
not necessarily fail the mount.
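
A sketch of that proposal, assuming the gateway can catch the exception around
its FSINFO handler (the names below are illustrative, not the gateway's API):

{code}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.security.AccessControlException;

// Illustrative FSINFO step: log the access control failure for the
// root-issued call instead of failing the whole mount.
class FsInfoStep {
  private static final Log LOG = LogFactory.getLog(FsInfoStep.class);

  String fsinfo() {
    try {
      return getFileAttr();              // may be denied for the root caller
    } catch (AccessControlException e) {
      LOG.warn("FSINFO denied; logging instead of failing the mount", e);
      return defaultFsInfo();            // keep the mount alive
    }
  }

  // Stand-ins for the real calls into HDFS.
  private String getFileAttr() throws AccessControlException { return "attrs"; }
  private String defaultFsInfo() { return "defaults"; }
}
{code}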


[jira] [Commented] (HDFS-5804) HDFS NFS Gateway fails to mount and proxy when using Kerberos

2014-01-23 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13880379#comment-13880379
 ] 

Jing Zhao commented on HDFS-5804:
-

So I guess the idea here is that the nfs gateway acts as a service and 
authenticates itself to Hadoop/HDFS through Kerberos. Then for the clients of 
nfs, if a client can authenticate itself to the NFS gateway (currently we only 
support AUTH_UNIX, and we plan to support GSS in HDFS-5539), the nfs gateway 
will create a proxy user for the client and use the proxy user to communicate 
with HDFS.

Back to the exception: I have not tested it myself, but have you added the 
proxy user settings to your HDFS configuration? I ask because the exception 
msg is User: nfsserver/krb-nfs-desktop.my.company@krb.altiscale.com is not 
allowed to impersonate root.
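
For reference, the proxy user settings in question would look something like
this in core-site.xml (illustrative values, assuming the gateway runs as the
nfsserver principal):

{code}
<!-- Allow the nfsserver user to impersonate clients of the gateway.
     Tighten the group and host lists as appropriate. -->
<property>
  <name>hadoop.proxyuser.nfsserver.groups</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.nfsserver.hosts</name>
  <value>my-nfs-server-host.com</value>
</property>
{code}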


[jira] [Commented] (HDFS-5804) HDFS NFS Gateway fails to mount and proxy when using Kerberos

2014-01-23 Thread Abin Shahab (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13880395#comment-13880395
 ] 

Abin Shahab commented on HDFS-5804:
---

Jing, thanks a lot for looking at the issue. I think you've captured what I'm 
trying to do very well!

Yes. We specifically do not want nfsserver (the user running the nfs-gateway) 
to be able to impersonate root. We need root for one thing, and only one 
thing: to mount the filesystem. After that, root is irrelevant and should not 
have access to do anything. Regrettably, it does an FSINFO as part of the 
mount.



[jira] [Moved] (HDFS-5825) Use FileUtils.copyFile() to implement DFSTestUtils.copyFile()

2014-01-23 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai moved HADOOP-10271 to HDFS-5825:
---

Key: HDFS-5825  (was: HADOOP-10271)
Project: Hadoop HDFS  (was: Hadoop Common)

 Use FileUtils.copyFile() to implement DFSTestUtils.copyFile()
 -

 Key: HDFS-5825
 URL: https://issues.apache.org/jira/browse/HDFS-5825
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Haohui Mai
Assignee: Haohui Mai
Priority: Minor
 Attachments: HDFS-5825.000.patch


 {{DFSTestUtils.copyFile()}} is implemented by copying data through 
 FileInputStream / FileOutputStream. Apache Commons IO provides 
 {{FileUtils.copyFile()}}, which uses FileChannel and is more efficient.
 This jira proposes to implement {{DFSTestUtils.copyFile()}} using 
 {{FileUtils.copyFile()}}.
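
A minimal sketch of the proposed delegation, assuming Commons IO is on the
test classpath:

{code}
import java.io.File;
import java.io.IOException;
import org.apache.commons.io.FileUtils;

class CopyFileSketch {
  // Delegate to Commons IO, which copies via FileChannel internally.
  static void copyFile(File src, File dest) throws IOException {
    FileUtils.copyFile(src, dest);
  }
}
{code}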



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5825) Use FileUtils.copyFile() to implement DFSTestUtils.copyFile()

2014-01-23 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai updated HDFS-5825:
-

Attachment: HDFS-5825.000.patch



[jira] [Updated] (HDFS-5825) Use FileUtils.copyFile() to implement DFSTestUtils.copyFile()

2014-01-23 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai updated HDFS-5825:
-

Status: Patch Available  (was: Open)



[jira] [Commented] (HDFS-4685) Implementation of ACLs in HDFS

2014-01-23 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13880440#comment-13880440
 ] 

Sanjay Radia commented on HDFS-4685:


A comment on the two alternatives for the default ACL proposals in the doc. 
Reproducing the text for convenience.

* *Umask-Default-ACL*: The default ACL of the parent is cloned to the ACL of 
the child at the time of child creation.  For new child directories, the 
default ACL itself is also cloned, so that the same policy is applied to 
sub-directories of sub-directories.  Subsequent changes to the parent's 
default ACL will set a different ACL for new children, but will not alter 
existing children.  This matches POSIX behavior.  If the administrator wants 
to change policy on the sub-tree later, this is performed by inserting a new, 
more restrictive ACL entry at the appropriate sub-tree root (see UC6), and may 
also need a recursive ACL modification (analogous to chmod -R), since existing 
children are not affected by the new ACL.

* *Inherited-Default-ACL*: A child that does not have an ACL of its own 
inherits its ACL from the nearest ancestor that has defined a default ACL.  A 
child node that requires a different ACL can override the default (like the 
Umask-Default-ACL).  Subsequent changes to the ancestor's default ACL will 
cause all children that do not have an ACL to inherit the new ACL, regardless 
of child creation time (unlike Umask-Default-ACL).  This model, like the ABAC 
ACLs (use case UC8), encourages the user to create fewer ACLs (typically on 
the roots of specific subtrees), while the POSIX-compliant Umask-Default-ACL 
is expected to result in a larger number of ACLs in the system.  It would also 
make a memory-efficient implementation trivial.  Note that this model is a 
deviation from POSIX behavior. 



Consider the following three sub use cases:
4a) Open up a child for wider access than the default.
4b) Restrict a child to narrower access than the default.
4c) Change the defaultAcl because you made a mistake originally.

Both models support use cases 4a and 4b with equal ease. However, with the 
Inherited-Default-ACL, it is easy to identify children that have overridden 
the default ACL: the existence of an ACL means that the user intended to 
override the default. Use case 4c is also a natural fit for 
Inherited-Default-ACL. With the UMask-Default-ACL, every child has an ACL, and 
hence you have to walk down the subtree and compare each ACL with the default 
to see whether the user had intended to override it.


I think the Inherited-Default-ACL is the much better design, but POSIX 
compliance may win out, and hence I am willing to go with UMask-Default-ACL.
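
To make the difference concrete, a toy sketch of Inherited-Default-ACL
resolution (stand-in types, not the HDFS implementation): a child without its
own ACL walks up to the nearest ancestor default ACL, so an explicit ACL is an
unambiguous override marker.

{code}
import java.util.List;

// Toy model: acl == null means "no ACL of its own".
class AclResolver {
  static class Node {
    Node parent;
    List<String> acl;        // explicit ACL, if any
    List<String> defaultAcl; // only meaningful on directories
  }

  static List<String> effectiveAcl(Node inode) {
    if (inode.acl != null) {
      return inode.acl;               // explicit override wins
    }
    for (Node p = inode.parent; p != null; p = p.parent) {
      if (p.defaultAcl != null) {
        return p.defaultAcl;          // nearest ancestor default applies,
      }                               // regardless of child creation time
    }
    return null;                      // fall back to plain permission bits
  }
}
{code}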

 Implementation of ACLs in HDFS
 --

 Key: HDFS-4685
 URL: https://issues.apache.org/jira/browse/HDFS-4685
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: hdfs-client, namenode, security
Affects Versions: 1.1.2
Reporter: Sachin Jose
Assignee: Chris Nauroth
 Attachments: HDFS-ACLs-Design-1.pdf, HDFS-ACLs-Design-2.pdf


 Currently hdfs doesn't support extended file ACLs. In unix, extended ACLs 
 can be managed using the getfacl and setfacl utilities. Is anybody working 
 on this feature?



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5804) HDFS NFS Gateway fails to mount and proxy when using Kerberos

2014-01-23 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13880459#comment-13880459
 ] 

Jing Zhao commented on HDFS-5804:
-

Abin, I see your issue now. From the nfs-gateway's point of view, I think it 
should simply impersonate any user who has passed its authentication, and thus 
should not special-case root. In HDFS, why do you want to disable the proxy 
setting for root? HDFS does not treat root as a special user.


[jira] [Commented] (HDFS-5804) HDFS NFS Gateway fails to mount and proxy when using Kerberos

2014-01-23 Thread Abin Shahab (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13880491#comment-13880491
 ] 

Abin Shahab commented on HDFS-5804:
---

Ah! I see your point. I think I can allow nfsserver to proxy root, and that 
would allow this patch to work properly (I've removed the root check 
condition).

BTW, this still allows any user in the proxied group to authenticate WITHOUT 
having a kerberos ticket. Do you have any advice on implementing kerberos 
authentication in the nfs-gateway? We are kerberizing our clusters, and it 
seems like nfs allows users to circumvent kerberos authentication.


[jira] [Commented] (HDFS-5804) HDFS NFS Gateway fails to mount and proxy when using Kerberos

2014-01-23 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13880508#comment-13880508
 ] 

Jing Zhao commented on HDFS-5804:
-

bq. this still allows any user in the proxied group to authenticate WITHOUT 
having a kerberos ticket.

Yeah, currently the nfs-gateway can only do simple AUTH_UNIX authentication, 
so we need to finish HDFS-5086 so that the nfs-gateway can authenticate 
clients based on kerberos. I have an in-progress patch from a long time ago; 
I will see if I can finish it soon. Also, feel free to assign that jira to 
yourself if you want to work on it.


[jira] [Commented] (HDFS-5723) Append failed FINALIZED replica should not be accepted as valid when that block is underconstruction

2014-01-23 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13880525#comment-13880525
 ] 

Jing Zhao commented on HDFS-5723:
-

Hi Vinay, one question about the patch: can this inconsistent generation 
stamp also be caused by a delayed block-received report? I.e., after the 
first close(), the DN's report gets delayed and is received by the NN after 
the append starts. In that case, will we have any issue from wrongly putting 
the (block, DN) pair into the corruptBlockMap?
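
For reference, the kind of check under discussion might look like this (a
purely illustrative sketch with stand-in names, not the actual patch):

{code}
// A FINALIZED replica reported with a stale generation stamp while the
// block is under construction would be treated as corrupt; the question
// above is whether a delayed report can hit this path spuriously.
class StaleReplicaCheck {
  static boolean looksCorrupt(long reportedGenStamp, long currentGenStamp,
                              boolean blockUnderConstruction,
                              boolean replicaFinalized) {
    return replicaFinalized && blockUnderConstruction
        && reportedGenStamp < currentGenStamp;
  }
}
{code}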

 Append failed FINALIZED replica should not be accepted as valid when that 
 block is underconstruction
 

 Key: HDFS-5723
 URL: https://issues.apache.org/jira/browse/HDFS-5723
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.2.0
Reporter: Vinay
Assignee: Vinay
 Attachments: HDFS-5723.patch


 Scenario:
 1. 3-node cluster with 
 dfs.client.block.write.replace-datanode-on-failure.enable set to false.
 2. One file is written with 3 replicas: blk_id_gs1.
 3. One of the datanodes, DN1, is down.
 4. The file is opened for append, and some more data is added and synced (to 
 only the 2 live nodes, DN2 and DN3): blk_id_gs2.
 5. Now DN1 is restarted.
 6. In its block report, DN1 reports the FINALIZED block blk_id_gs1; this 
 should be marked corrupt. But since the NN has the appended block's state as 
 UnderConstruction, it does not detect this block as corrupt and adds it to 
 the valid block locations.
 As long as the namenode is alive, this datanode will also be considered a 
 valid replica, and read/append will fail on that datanode.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Assigned] (HDFS-5797) Implement offline image viewer.

2014-01-23 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai reassigned HDFS-5797:


Assignee: Haohui Mai

 Implement offline image viewer.
 ---

 Key: HDFS-5797
 URL: https://issues.apache.org/jira/browse/HDFS-5797
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Haohui Mai
Assignee: Haohui Mai
 Fix For: HDFS-5698 (FSImage in protobuf)

 Attachments: HDFS-5797.000.patch


 The format of FSImage has changed dramatically; therefore a new 
 implementation of the OfflineImageViewer is required.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5797) Implement offline image viewer.

2014-01-23 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai updated HDFS-5797:
-

Attachment: HDFS-5797.000.patch



[jira] [Commented] (HDFS-5804) HDFS NFS Gateway fails to mount and proxy when using Kerberos

2014-01-23 Thread Abin Shahab (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13880550#comment-13880550
 ] 

Abin Shahab commented on HDFS-5804:
---

May I take a look at your patch? I was planning to mimic how 
org.apache.hadoop.ipc.Client does the authentication.
Also, I don't have access to assign issues to myself; I would definitely like 
this one assigned to me.


[jira] [Commented] (HDFS-5608) WebHDFS: implement GETACLSTATUS and SETACL.

2014-01-23 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13880565#comment-13880565
 ] 

Chris Nauroth commented on HDFS-5608:
-

Comments on the latest version of the patch:

# {{JsonUtil}}: It looks like we never would call the new {{toJsonString}} or 
{{toAclStatus}} methods with {{includesType}} set to false.  Can we remove the 
{{includesType}} parameter and the code paths that would have handled the 
{{false}} case?  That way, we won't have dormant, untested code.
# {{JsonUtil#toAclStatus}}: Inside the for loop, we're still repeating parsing 
logic that also exists in {{AclEntry#parseAclSpec}}.  I think we need one more 
refactoring in the common code to give us a static {{AclEntry#parseAclEntry}} 
method that parses a single ACL entry (not a list like 
{{AclEntry#parseAclSpec}}).  I'll let you know when that's available.
# {{AclPermissionParam}}: The constructor makes 3 separate calls to 
{{parseAclSpec}}.  That method cannot return null, so you can cut down to 2 
calls by removing the null check.
# {{AclPermissionParam#parseAclSpec}}: There is a typo in the variable name 
{{aclspce}}.  Inside the for loop, you can use {{AclEntry#toString}} instead of 
building the string yourself.  This code has some bugs, like not prepending 
default for a default ACL, and a {{NullPointerException}} when getting the 
permission symbol if no permission is defined (as {{removeAclEntries}} does).  
You'll get them all fixed for free if you switch to {{AclEntry#toString}}.  
Actually, it seems like this whole method is equivalent to this one-liner: 
{{StringUtils.join(",", entries)}}.  Try that out.
# {{TestJsonUtil}}: Thanks for adding tests!  I recently added a class named 
{{AclTestHelpers}} that contains some static methods that can shorten test code 
that needs to make ACL entries.  You can see an example of how this is used in 
places like {{TestNameNodeAcl}}.
# In addition to the tests already in the patch, we'll also need to add 
functional tests that exercise each of the new APIs end-to-end using a 
{{MiniDFSCluster}}.  Let's add these tests in 
{{org.apache.hadoop.hdfs.web.TestWebHDFS}}.

I think we'll be ready to commit after the above is addressed, so we're getting 
close.  :-)  Thank you for incorporating the feedback.
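
For illustration, the suggested simplification could reduce to something like
this (a sketch, assuming commons-lang's {{StringUtils}} and an
{{AclEntry#toString}} that renders the full entry text):

{code}
import java.util.List;
import org.apache.commons.lang.StringUtils;

class AclSpecSketch {
  // join() supplies the separators; each entry's toString() supplies the
  // "scope:type:name:perm" text, including the "default:" prefix.
  static String toAclSpec(List<?> entries) {
    return StringUtils.join(entries, ",");
  }
}
{code}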

 WebHDFS: implement GETACLSTATUS and SETACL.
 ---

 Key: HDFS-5608
 URL: https://issues.apache.org/jira/browse/HDFS-5608
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: webhdfs
Affects Versions: HDFS ACLs (HDFS-4685)
Reporter: Chris Nauroth
Assignee: Sachin Jose
 Attachments: HDFS-5608.0.patch, HDFS-5608.1.patch, HDFS-5608.2.patch, 
 HDFS-5608.3.patch


 Implement and test {{GETACLS}} and {{SETACL}} in WebHDFS.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HDFS-5826) Update the stored edit logs to be consistent with the changes in HDFS-5698 branch

2014-01-23 Thread Haohui Mai (JIRA)
Haohui Mai created HDFS-5826:


 Summary: Update the stored edit logs to be consistent with the 
changes in HDFS-5698 branch
 Key: HDFS-5826
 URL: https://issues.apache.org/jira/browse/HDFS-5826
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: HDFS-5698 (FSImage in protobuf)
Reporter: Haohui Mai
Assignee: Haohui Mai


HDFS-5698 bumps the LayoutVersion to indicate whether the file is in the new 
format. The stored edit logs have to be updated in order to pass 
{{testOfflineEditsViewer}}.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5804) HDFS NFS Gateway fails to mount and proxy when using Kerberos

2014-01-23 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13880572#comment-13880572
 ] 

Jing Zhao commented on HDFS-5804:
-

Sure, I will post what I have to HDFS-5086. In general, I was just trying to 
merge the GSS authentication part from [~brocknoland]'s NFS4 implementation 
(https://github.com/cloudera/hdfs-nfs-proxy) into the current NFS3-based 
implementation. You can also check [~brocknoland]'s implementation directly.

 HDFS NFS Gateway fails to mount and proxy when using Kerberos
 -

 Key: HDFS-5804
 URL: https://issues.apache.org/jira/browse/HDFS-5804
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: nfs
Affects Versions: 3.0.0, 2.2.0
Reporter: Abin Shahab
 Attachments: HDFS-5804.patch, HDFS-5804.patch, exception-as-root.log, 
 javadoc-after-patch.log, javadoc-before-patch.log


 When using HDFS nfs gateway with secure hadoop 
 (hadoop.security.authentication: kerberos), mounting hdfs fails. 
 Additionally, there is no mechanism to support a proxy user (nfs needs to proxy 
 as the user invoking commands on the hdfs mount).
 Steps to reproduce:
 1) start a hadoop cluster with kerberos enabled.
 2) sudo su -l nfsserver and start an nfs server. This 'nfsserver' account has 
 an account in kerberos.
 3) Get the keytab for nfsserver, and issue the following mount command: mount 
 -t nfs -o vers=3,proto=tcp,nolock $server:/  $mount_point
 4) You'll see in the nfsserver logs that Kerberos is complaining about not 
 having a TGT for root.
 This is the stacktrace: 
 java.io.IOException: Failed on local exception: java.io.IOException: 
 org.apache.hadoop.security.AccessControlException: Client cannot authenticate 
 via:[TOKEN, KERBEROS]; Host Details : local host is: 
 my-nfs-server-host.com/10.252.4.197; destination host is: 
 my-namenode-host.com:8020; 
   at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:764)
   at org.apache.hadoop.ipc.Client.call(Client.java:1351)
   at org.apache.hadoop.ipc.Client.call(Client.java:1300)
   at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
   at com.sun.proxy.$Proxy9.getFileLinkInfo(Unknown Source)
   at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:606)
   at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
   at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
   at com.sun.proxy.$Proxy9.getFileLinkInfo(Unknown Source)
   at 
 org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileLinkInfo(ClientNamenodeProtocolTranslatorPB.java:664)
   at org.apache.hadoop.hdfs.DFSClient.getFileLinkInfo(DFSClient.java:1713)
   at 
 org.apache.hadoop.hdfs.nfs.nfs3.Nfs3Utils.getFileStatus(Nfs3Utils.java:58)
   at 
 org.apache.hadoop.hdfs.nfs.nfs3.Nfs3Utils.getFileAttr(Nfs3Utils.java:79)
   at 
 org.apache.hadoop.hdfs.nfs.nfs3.RpcProgramNfs3.fsinfo(RpcProgramNfs3.java:1643)
   at 
 org.apache.hadoop.hdfs.nfs.nfs3.RpcProgramNfs3.handleInternal(RpcProgramNfs3.java:1891)
   at 
 org.apache.hadoop.oncrpc.RpcProgram.messageReceived(RpcProgram.java:143)
   at 
 org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
   at 
 org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
   at 
 org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787)
   at 
 org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:281)
   at 
 org.apache.hadoop.oncrpc.RpcUtil$RpcMessageParserStage.messageReceived(RpcUtil.java:132)
   at 
 org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
   at 
 org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
   at 
 org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787)
   at 
 org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
   at 
 org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:462)
   at 
 org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:443)
   at 
 org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303)
   at 
 org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
 

[jira] [Updated] (HDFS-5826) Update the stored edit logs to be consistent with the changes in HDFS-5698 branch

2014-01-23 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai updated HDFS-5826:
-

Attachment: HDFS-5826.000.patch

 Update the stored edit logs to be consistent with the changes in HDFS-5698 
 branch
 -

 Key: HDFS-5826
 URL: https://issues.apache.org/jira/browse/HDFS-5826
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: HDFS-5698 (FSImage in protobuf)
Reporter: Haohui Mai
Assignee: Haohui Mai
 Attachments: HDFS-5826.000.patch


 HDFS-5698 bumps the LayoutVersion to indicate whether the file is in the new 
 format. The stored edit logs have to be updated in order to pass 
 {{testOfflineEditsViewer}}.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5804) HDFS NFS Gateway fails to mount and proxy when using Kerberos

2014-01-23 Thread Abin Shahab (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13880577#comment-13880577
 ] 

Abin Shahab commented on HDFS-5804:
---

BTW, I have a patch that gets rid of even checking whether we are in secure 
mode, but I'm not sure whether it's the right thing to submit. That patch 
would require the nfs-gateway user (nfsserver in our case) to be allowed to 
proxy as root, even in non-secure mode. That's a big change.

 HDFS NFS Gateway fails to mount and proxy when using Kerberos
 -

 Key: HDFS-5804
 URL: https://issues.apache.org/jira/browse/HDFS-5804
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: nfs
Affects Versions: 3.0.0, 2.2.0
Reporter: Abin Shahab
 Attachments: HDFS-5804.patch, HDFS-5804.patch, exception-as-root.log, 
 javadoc-after-patch.log, javadoc-before-patch.log


 When using HDFS nfs gateway with secure hadoop 
 (hadoop.security.authentication: kerberos), mounting hdfs fails. 
 Additionally, there is no mechanism to support a proxy user (nfs needs to proxy 
 as the user invoking commands on the hdfs mount).
 Steps to reproduce:
 1) start a hadoop cluster with kerberos enabled.
 2) sudo su -l nfsserver and start an nfs server. This 'nfsserver' account has 
 an account in kerberos.
 3) Get the keytab for nfsserver, and issue the following mount command: mount 
 -t nfs -o vers=3,proto=tcp,nolock $server:/  $mount_point
 4) You'll see in the nfsserver logs that Kerberos is complaining about not 
 having a TGT for root.
 This is the stacktrace: 
 java.io.IOException: Failed on local exception: java.io.IOException: 
 org.apache.hadoop.security.AccessControlException: Client cannot authenticate 
 via:[TOKEN, KERBEROS]; Host Details : local host is: 
 my-nfs-server-host.com/10.252.4.197; destination host is: 
 my-namenode-host.com:8020; 
   at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:764)
   at org.apache.hadoop.ipc.Client.call(Client.java:1351)
   at org.apache.hadoop.ipc.Client.call(Client.java:1300)
   at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
   at com.sun.proxy.$Proxy9.getFileLinkInfo(Unknown Source)
   at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:606)
   at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
   at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
   at com.sun.proxy.$Proxy9.getFileLinkInfo(Unknown Source)
   at 
 org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileLinkInfo(ClientNamenodeProtocolTranslatorPB.java:664)
   at org.apache.hadoop.hdfs.DFSClient.getFileLinkInfo(DFSClient.java:1713)
   at 
 org.apache.hadoop.hdfs.nfs.nfs3.Nfs3Utils.getFileStatus(Nfs3Utils.java:58)
   at 
 org.apache.hadoop.hdfs.nfs.nfs3.Nfs3Utils.getFileAttr(Nfs3Utils.java:79)
   at 
 org.apache.hadoop.hdfs.nfs.nfs3.RpcProgramNfs3.fsinfo(RpcProgramNfs3.java:1643)
   at 
 org.apache.hadoop.hdfs.nfs.nfs3.RpcProgramNfs3.handleInternal(RpcProgramNfs3.java:1891)
   at 
 org.apache.hadoop.oncrpc.RpcProgram.messageReceived(RpcProgram.java:143)
   at 
 org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
   at 
 org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
   at 
 org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787)
   at 
 org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:281)
   at 
 org.apache.hadoop.oncrpc.RpcUtil$RpcMessageParserStage.messageReceived(RpcUtil.java:132)
   at 
 org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
   at 
 org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
   at 
 org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787)
   at 
 org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
   at 
 org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:462)
   at 
 org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:443)
   at 
 org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303)
   at 
 org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
   at 
 

[jira] [Updated] (HDFS-5086) Support RPCSEC_GSS authentication in NFSv3 gateway

2014-01-23 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-5086:


Attachment: HDFS-5086.000.patch

An in-progress patch.

 Support RPCSEC_GSS authentication in NFSv3 gateway
 --

 Key: HDFS-5086
 URL: https://issues.apache.org/jira/browse/HDFS-5086
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: nfs
Affects Versions: 3.0.0
Reporter: Brandon Li
Assignee: Jing Zhao
 Attachments: HDFS-5086.000.patch






--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5825) Use FileUtils.copyFile() to implement DFSTestUtils.copyFile()

2014-01-23 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13880584#comment-13880584
 ] 

Arpit Agarwal commented on HDFS-5825:
-

+1 (pending Jenkins).

 Use FileUtils.copyFile() to implement DFSTestUtils.copyFile()
 -

 Key: HDFS-5825
 URL: https://issues.apache.org/jira/browse/HDFS-5825
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Haohui Mai
Assignee: Haohui Mai
Priority: Minor
 Attachments: HDFS-5825.000.patch


 {{DFSTestUtils.copyFile()}} is implemented by copying data through 
 FileInputStream / FileOutputStream. Apache Commons IO provides 
 {{FileUtils.copyFile()}}, which uses FileChannel and is more efficient.
 This jira proposes to implement {{DFSTestUtils.copyFile()}} using 
 {{FileUtils.copyFile()}}.
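
As a rough illustration of the proposed change (a sketch assuming Commons IO is 
on the classpath, not the actual patch):
{code}
import java.io.File;
import java.io.IOException;

import org.apache.commons.io.FileUtils;

public class CopyFileSketch {
  // Commons IO's FileUtils.copyFile copies via FileChannel and also
  // preserves the source file's last-modified date.
  public static void copyFile(File src, File dest) throws IOException {
    FileUtils.copyFile(src, dest);
  }
}
{code}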



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5797) Implement offline image viewer.

2014-01-23 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai updated HDFS-5797:
-

Attachment: HDFS-5797.001.patch

Fix an incorrect format string in LsrPBImage

 Implement offline image viewer.
 ---

 Key: HDFS-5797
 URL: https://issues.apache.org/jira/browse/HDFS-5797
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Haohui Mai
Assignee: Haohui Mai
 Fix For: HDFS-5698 (FSImage in protobuf)

 Attachments: HDFS-5797.000.patch, HDFS-5797.001.patch


 The format of FSImage has changed dramatically therefore a new implementation 
 of OfflineImageViewer is required.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5318) Support read-only and read-write paths to shared replicas

2014-01-23 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13880680#comment-13880680
 ] 

Arpit Agarwal commented on HDFS-5318:
-

Hi Eric, I'm going to think about this some more.

My first thought is that your plugin implementation is doing the right thing 
and there is no good reason for reporting {{READ_ONLY_SHARED}} replicas to NN 
unless finalized. However, if not done right, this could result in reported 
blocks being 'lost', and as you mentioned, skipping ROS storages in 
{{BlockInfoUnderConstruction.addReplicaIfNotPresent}} is a bad idea.

I'm sure you are following HDFS-5194. FsDatasetSpi implementers are required to 
have too much 'out-of-band' knowledge about the NN-DN protocol. Ideally the 
interface could be redone to avoid these problems. For now it may be an 
acceptable solution to just document this.

 Support read-only and read-write paths to shared replicas
 -

 Key: HDFS-5318
 URL: https://issues.apache.org/jira/browse/HDFS-5318
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Affects Versions: 2.4.0
Reporter: Eric Sirianni
 Attachments: HDFS-5318.patch, HDFS-5318a-branch-2.patch, 
 HDFS-5318b-branch-2.patch, HDFS-5318c-branch-2.patch, hdfs-5318.pdf


 There are several use cases for using shared-storage for datanode block 
 storage in an HDFS environment (storing cold blocks on a NAS device, Amazon 
 S3, etc.).
 With shared-storage, there is a distinction between:
 # a distinct physical copy of a block
 # an access-path to that block via a datanode.  
 A single 'replication count' metric cannot accurately capture both aspects.  
 However, for most of the current uses of 'replication count' in the Namenode, 
 the number of physical copies aspect seems to be the appropriate semantic.
 I propose altering the replication counting algorithm in the Namenode to 
 accurately infer distinct physical copies in a shared storage environment.  
 With HDFS-5115, a {{StorageID}} is a UUID.  I propose associating some minor 
 additional semantics to the {{StorageID}} - namely that multiple datanodes 
 attaching to the same physical shared storage pool should report the same 
 {{StorageID}} for that pool.  A minor modification would be required in the 
 DataNode to enable the generation of {{StorageID}} s to be pluggable behind 
 the {{FsDatasetSpi}} interface.  
 With those semantics in place, the number of physical copies of a block in a 
 shared storage environment can be calculated as the number of _distinct_ 
 {{StorageID}} s associated with that block.
 Consider the following combinations for two {{(DataNode ID, Storage ID)}} 
 pairs {{(DN_A, S_A) (DN_B, S_B)}} for a given block B:
 * {{DN_A != DN_B && S_A != S_B}} - *different* access paths to *different* 
 physical replicas (i.e. the traditional HDFS case with local disks)
 ** → Block B has {{ReplicationCount == 2}}
 * {{DN_A != DN_B && S_A == S_B}} - *different* access paths to the *same* 
 physical replica (e.g. HDFS datanodes mounting the same NAS share)
 ** → Block B has {{ReplicationCount == 1}}
 For example, if block B has the following location tuples:
 * {{DN_1, STORAGE_A}}
 * {{DN_2, STORAGE_A}}
 * {{DN_3, STORAGE_B}}
 * {{DN_4, STORAGE_B}},
 the effect of this proposed change would be to calculate the replication 
 factor in the namenode as *2* instead of *4*.
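
A minimal sketch of the proposed counting rule, with hypothetical types 
standing in for the real NameNode structures:
{code}
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class ReplicaLocation {
  final String datanodeId;
  final String storageId;  // a UUID, per HDFS-5115
  ReplicaLocation(String datanodeId, String storageId) {
    this.datanodeId = datanodeId;
    this.storageId = storageId;
  }
}

class ReplicationCounter {
  // Physical copies = number of distinct StorageIDs, not DataNode IDs.
  static int physicalReplicaCount(List<ReplicaLocation> locations) {
    Set<String> storageIds = new HashSet<String>();
    for (ReplicaLocation loc : locations) {
      storageIds.add(loc.storageId);
    }
    return storageIds.size();
  }
}
{code}
With the four example tuples above (STORAGE_A twice, STORAGE_B twice), this 
returns 2 rather than 4.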



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5809) BlockPoolSliceScanner make datanode to drop into infinite loop

2014-01-23 Thread ikweesung (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13880700#comment-13880700
 ] 

ikweesung commented on HDFS-5809:
-

Please excuse my poor English. : )
I found that in BlockPoolSliceScanner, blockInfoSet can contain two blocks 
with the same block id, because BlockScanInfo compares by lastScanTime. 
Then in the method updateScanStatus, the BlockScanInfo cannot be updated, so 
((now - getEarliestScanTime()) >= scanPeriod) will always be true. 
This causes the datanode to drop into an infinite loop.
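
A distilled illustration of the reported behavior (class names simplified, not 
the real BlockScanInfo): a TreeSet ordered only by lastScanTime can hold two 
entries for the same block id, so a time-keyed lookup never finds the stale one.
{code}
import java.util.TreeSet;

public class ScanSetSketch {
  static class Info implements Comparable<Info> {
    final long blockId;
    final long lastScanTime;
    Info(long blockId, long lastScanTime) {
      this.blockId = blockId;
      this.lastScanTime = lastScanTime;
    }
    @Override
    public int compareTo(Info other) {
      // Ordering ignores blockId entirely, as described above.
      return lastScanTime < other.lastScanTime ? -1
           : lastScanTime > other.lastScanTime ? 1 : 0;
    }
  }

  public static void main(String[] args) {
    TreeSet<Info> set = new TreeSet<Info>();
    set.add(new Info(42L, 100L));
    set.add(new Info(42L, 200L));   // same block id, different scan time
    System.out.println(set.size()); // prints 2: duplicate block id kept
  }
}
{code}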

 BlockPoolSliceScanner make datanode to drop into infinite loop
 --

 Key: HDFS-5809
 URL: https://issues.apache.org/jira/browse/HDFS-5809
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 2.0.0-alpha
 Environment: jdk1.6, centos6.4, 2.0.0-cdh4.5.0
Reporter: ikweesung
Priority: Critical
  Labels: blockpoolslicescanner, datanode, infinite-loop

 Hello, everyone.
 When the hadoop cluster starts, BlockPoolSliceScanner starts scanning the 
 blocks in my cluster.
 Then, randomly, one datanode drops into an infinite loop as the log shows, and 
 finally all datanodes drop into the infinite loop.
 Every datanode just fails verification on one block. 
 When I check the failing block like this: hadoop fsck / -files -blocks | grep 
 blk_1223474551535936089_4702249, no hdfs file contains the block.
 It seems that the while loop in BlockPoolSliceScanner's scan method drops into 
 an infinite loop.
 BlockPoolSliceScanner: 650
 while (datanode.shouldRun
   && !datanode.blockScanner.blockScannerThread.isInterrupted()
   && datanode.isBPServiceAlive(blockPoolId)) { 
 The log is finally printed in method verifyBlock (BlockPoolSliceScanner:453).
 Please excuse my poor English.
 -
 LOG: 
 2014-01-21 18:36:50,582 INFO 
 org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: Verification 
 failed for 
 BP-1040548460-58.229.158.13-1385606058039:blk_6833233229840997944_4702634 - 
 may be due to race with write
 2014-01-21 18:36:50,582 INFO 
 org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: Verification 
 failed for 
 BP-1040548460-58.229.158.13-1385606058039:blk_6833233229840997944_4702634 - 
 may be due to race with write
 2014-01-21 18:36:50,582 INFO 
 org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: Verification 
 failed for 
 BP-1040548460-58.229.158.13-1385606058039:blk_6833233229840997944_4702634 - 
 may be due to race with write



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5809) BlockPoolSliceScanner make datanode to drop into infinite loop

2014-01-23 Thread ikweesung (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13880702#comment-13880702
 ] 

ikweesung commented on HDFS-5809:
-

My ugly patch looks like this:
{code}
if (info != null) {
  delBlockInfo(info);
} else if (blockInfoSet.contains(block)) {
  // A stale entry with the same block id (but an older lastScanTime) is
  // still in the set; drop it before creating a fresh record, instead of
  // calling delBlockInfo(info) on a null info.
  blockInfoSet.remove(block);
  info = new BlockScanInfo(block);
} else {
  // It might already be removed. That's ok, it will be caught next time.
  info = new BlockScanInfo(block);
}
{code}

 BlockPoolSliceScanner make datanode to drop into infinite loop
 --

 Key: HDFS-5809
 URL: https://issues.apache.org/jira/browse/HDFS-5809
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 2.0.0-alpha
 Environment: jdk1.6, centos6.4, 2.0.0-cdh4.5.0
Reporter: ikweesung
Priority: Critical
  Labels: blockpoolslicescanner, datanode, infinite-loop

 Hello, everyone.
 When the hadoop cluster starts, BlockPoolSliceScanner starts scanning the 
 blocks in my cluster.
 Then, randomly, one datanode drops into an infinite loop as the log shows, and 
 finally all datanodes drop into the infinite loop.
 Every datanode just fails verification on one block. 
 When I check the failing block like this: hadoop fsck / -files -blocks | grep 
 blk_1223474551535936089_4702249, no hdfs file contains the block.
 It seems that the while loop in BlockPoolSliceScanner's scan method drops into 
 an infinite loop.
 BlockPoolSliceScanner: 650
 while (datanode.shouldRun
   && !datanode.blockScanner.blockScannerThread.isInterrupted()
   && datanode.isBPServiceAlive(blockPoolId)) { 
 The log is finally printed in method verifyBlock (BlockPoolSliceScanner:453).
 Please excuse my poor English.
 -
 LOG: 
 2014-01-21 18:36:50,582 INFO 
 org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: Verification 
 failed for 
 BP-1040548460-58.229.158.13-1385606058039:blk_6833233229840997944_4702634 - 
 may be due to race with write
 2014-01-21 18:36:50,582 INFO 
 org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: Verification 
 failed for 
 BP-1040548460-58.229.158.13-1385606058039:blk_6833233229840997944_4702634 - 
 may be due to race with write
 2014-01-21 18:36:50,582 INFO 
 org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: Verification 
 failed for 
 BP-1040548460-58.229.158.13-1385606058039:blk_6833233229840997944_4702634 - 
 may be due to race with write



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5723) Append failed FINALIZED replica should not be accepted as valid when that block is underconstruction

2014-01-23 Thread Vinay (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13880721#comment-13880721
 ] 

Vinay commented on HDFS-5723:
-

bq. Hi Vinay, one question about the patch: so this inconsistent generation 
stamp can also be caused by a delayed block-received report? I.e., after the 
first close(), the DN's report gets delayed and is received by NN when the 
append starts. In that case, will we have any issue by wrongly putting the 
(block, DN) into the corruptBlockMap?
It can happen. The earlier block with the previous genstamp will be marked 
corrupt if the append pipeline setup (with its genstamp update) happens before 
the previous report arrives.
But append pipeline creation will update the genstamp on that datanode as well, 
so one more block-received report is expected with the correct genstamp. That 
report will remove the old block from the corrupt replica map.

 Append failed FINALIZED replica should not be accepted as valid when that 
 block is underconstruction
 

 Key: HDFS-5723
 URL: https://issues.apache.org/jira/browse/HDFS-5723
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.2.0
Reporter: Vinay
Assignee: Vinay
 Attachments: HDFS-5723.patch


 Scenario:
 1. 3 node cluster with 
 dfs.client.block.write.replace-datanode-on-failure.enable set to false.
 2. One file is written with 3 replicas, blk_id_gs1
 3. One of the datanodes, DN1, is down.
 4. The file was opened with append and some more data was added to the file 
 and synced (to only the 2 live nodes DN2 and DN3) -- blk_id_gs2
 5. Now DN1 is restarted.
 6. In its block report, DN1 reported the FINALIZED block blk_id_gs1, which 
 should be marked corrupted.
 But since the NN has the appended block's state as UnderConstruction, at this 
 time it does not detect this block as corrupt and adds it to the valid block 
 locations.
 As long as the namenode is alive, this datanode will also be considered a 
 valid replica, and read/append will fail on that datanode.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5776) Support 'hedged' reads in DFSClient

2014-01-23 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13880746#comment-13880746
 ] 

stack commented on HDFS-5776:
-

bq. An alternative way to do this is to have an Allow-Hedged-Reads 
configuration, and if it is set to true, we load the number of thread pool and 
the threshold time. We will provide an isHedgedReadsEnabled method but we will 
not provide enable/disable methods.

The reviews are great.  On the above, while I can see putting the on/off 
switch in a DN config, we should at least allow setting the config for when to 
start the hedged read per DFSClient instance.

bq.  This means that we do need to put this in a static, for now, or else 
FileContext users will be constantly destroying and creating thread-pools.

Thanks Colin.  Makes sense.







 Support 'hedged' reads in DFSClient
 ---

 Key: HDFS-5776
 URL: https://issues.apache.org/jira/browse/HDFS-5776
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs-client
Affects Versions: 3.0.0
Reporter: Liang Xie
Assignee: Liang Xie
 Attachments: HDFS-5776-v2.txt, HDFS-5776-v3.txt, HDFS-5776-v4.txt, 
 HDFS-5776-v5.txt, HDFS-5776-v6.txt, HDFS-5776.txt


 This is a placeholder for the backport of the hdfs-related pieces of 
 https://issues.apache.org/jira/browse/HBASE-7509
 The quorum read ability should be helpful especially to optimize read outliers.
 We can utilize dfs.dfsclient.quorum.read.threshold.millis and 
 dfs.dfsclient.quorum.read.threadpool.size to enable/disable the hedged read 
 ability from the client side (e.g. HBase), and by using DFSQuorumReadMetrics, 
 we could export the metric values of interest into the client system (e.g. 
 HBase's regionserver metric).
 The core logic is in the pread code path: we decide to go to the original 
 fetchBlockByteRange or the newly introduced fetchBlockByteRangeSpeculative per 
 the above config items.
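
A minimal enablement sketch using the config keys quoted in this description 
(the key names in the committed patch may differ):
{code}
import org.apache.hadoop.conf.Configuration;

public class HedgedReadConfigSketch {
  public static Configuration hedgedReadConf() {
    Configuration conf = new Configuration();
    // A pool size > 0 turns hedged reads on; 0 leaves them off.
    conf.setInt("dfs.dfsclient.quorum.read.threadpool.size", 16);
    // How long a pread waits on the first datanode before hedging.
    conf.setLong("dfs.dfsclient.quorum.read.threshold.millis", 500L);
    return conf;
  }
}
{code}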



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5776) Support 'hedged' reads in DFSClient

2014-01-23 Thread Liang Xie (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liang Xie updated HDFS-5776:


Attachment: HDFS-5776-v7.txt

Attached v7 makes the pool static now, please review

 Support 'hedged' reads in DFSClient
 ---

 Key: HDFS-5776
 URL: https://issues.apache.org/jira/browse/HDFS-5776
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs-client
Affects Versions: 3.0.0
Reporter: Liang Xie
Assignee: Liang Xie
 Attachments: HDFS-5776-v2.txt, HDFS-5776-v3.txt, HDFS-5776-v4.txt, 
 HDFS-5776-v5.txt, HDFS-5776-v6.txt, HDFS-5776-v7.txt, HDFS-5776.txt


 This is a placeholder for the backport of the hdfs-related pieces of 
 https://issues.apache.org/jira/browse/HBASE-7509
 The quorum read ability should be helpful especially to optimize read outliers.
 We can utilize dfs.dfsclient.quorum.read.threshold.millis and 
 dfs.dfsclient.quorum.read.threadpool.size to enable/disable the hedged read 
 ability from the client side (e.g. HBase), and by using DFSQuorumReadMetrics, 
 we could export the metric values of interest into the client system (e.g. 
 HBase's regionserver metric).
 The core logic is in the pread code path: we decide to go to the original 
 fetchBlockByteRange or the newly introduced fetchBlockByteRangeSpeculative per 
 the above config items.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5776) Support 'hedged' reads in DFSClient

2014-01-23 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13880757#comment-13880757
 ] 

stack commented on HDFS-5776:
-

[~xieliang007] what do you think of the new comments above by the lads?

Now that the executor is static, does the number-of-threads config need to be 
NumberOfHBaseOpenFiles x 2, or else the feature will not work for all files?  
Thanks.

 Support 'hedged' reads in DFSClient
 ---

 Key: HDFS-5776
 URL: https://issues.apache.org/jira/browse/HDFS-5776
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs-client
Affects Versions: 3.0.0
Reporter: Liang Xie
Assignee: Liang Xie
 Attachments: HDFS-5776-v2.txt, HDFS-5776-v3.txt, HDFS-5776-v4.txt, 
 HDFS-5776-v5.txt, HDFS-5776-v6.txt, HDFS-5776-v7.txt, HDFS-5776.txt


 This is a placeholder for the backport of the hdfs-related pieces of 
 https://issues.apache.org/jira/browse/HBASE-7509
 The quorum read ability should be helpful especially to optimize read outliers.
 We can utilize dfs.dfsclient.quorum.read.threshold.millis and 
 dfs.dfsclient.quorum.read.threadpool.size to enable/disable the hedged read 
 ability from the client side (e.g. HBase), and by using DFSQuorumReadMetrics, 
 we could export the metric values of interest into the client system (e.g. 
 HBase's regionserver metric).
 The core logic is in the pread code path: we decide to go to the original 
 fetchBlockByteRange or the newly introduced fetchBlockByteRangeSpeculative per 
 the above config items.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5776) Support 'hedged' reads in DFSClient

2014-01-23 Thread Liang Xie (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13880762#comment-13880762
 ] 

Liang Xie commented on HDFS-5776:
-

bq. Can this scenario be possible? In hedgedFetchBlockByteRange, if we hit the 
timeout for the first DN, we will add the DN to the ignore list, and call 
chooseDataNode again. If the first DN is the only DN we can read, we will get 
IOException from bestNode. Then we will run into a loop where we keep trying to 
get another DN multiple times (some NN rpc call will even be fired). And during 
this process the first DN can even return the data. In this scenario I guess we 
may get a worse performance? Thus I guess we should not trigger hedged read if 
we find that we cannot (easily) find the second DN for read?
Yes, the case you describe can happen, nice catch! A very easy way to handle it 
is to introduce a double-check function, say 
enoughNodesForHedgedRead(LocatedBlock block), into the pread checking code 
branch.
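
A sketch of what that double-check could look like (the method name comes from 
this thread; the body is an assumption, not the actual patch):
{code}
import org.apache.hadoop.hdfs.protocol.LocatedBlock;

public class HedgedReadCheckSketch {
  // Hedging only pays off when there is a second replica to race against.
  static boolean enoughNodesForHedgedRead(LocatedBlock block) {
    return block.getLocations() != null && block.getLocations().length > 1;
  }
}
{code}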

 Support 'hedged' reads in DFSClient
 ---

 Key: HDFS-5776
 URL: https://issues.apache.org/jira/browse/HDFS-5776
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs-client
Affects Versions: 3.0.0
Reporter: Liang Xie
Assignee: Liang Xie
 Attachments: HDFS-5776-v2.txt, HDFS-5776-v3.txt, HDFS-5776-v4.txt, 
 HDFS-5776-v5.txt, HDFS-5776-v6.txt, HDFS-5776-v7.txt, HDFS-5776.txt


 This is a placeholder for the backport of the hdfs-related pieces of 
 https://issues.apache.org/jira/browse/HBASE-7509
 The quorum read ability should be helpful especially to optimize read outliers.
 We can utilize dfs.dfsclient.quorum.read.threshold.millis and 
 dfs.dfsclient.quorum.read.threadpool.size to enable/disable the hedged read 
 ability from the client side (e.g. HBase), and by using DFSQuorumReadMetrics, 
 we could export the metric values of interest into the client system (e.g. 
 HBase's regionserver metric).
 The core logic is in the pread code path: we decide to go to the original 
 fetchBlockByteRange or the newly introduced fetchBlockByteRangeSpeculative per 
 the above config items.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5776) Support 'hedged' reads in DFSClient

2014-01-23 Thread Liang Xie (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liang Xie updated HDFS-5776:


Attachment: HDFS-5776-v8.txt

 Support 'hedged' reads in DFSClient
 ---

 Key: HDFS-5776
 URL: https://issues.apache.org/jira/browse/HDFS-5776
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs-client
Affects Versions: 3.0.0
Reporter: Liang Xie
Assignee: Liang Xie
 Attachments: HDFS-5776-v2.txt, HDFS-5776-v3.txt, HDFS-5776-v4.txt, 
 HDFS-5776-v5.txt, HDFS-5776-v6.txt, HDFS-5776-v7.txt, HDFS-5776-v8.txt, 
 HDFS-5776.txt


 This is a placeholder for the backport of the hdfs-related pieces of 
 https://issues.apache.org/jira/browse/HBASE-7509
 The quorum read ability should be helpful especially to optimize read outliers.
 We can utilize dfs.dfsclient.quorum.read.threshold.millis and 
 dfs.dfsclient.quorum.read.threadpool.size to enable/disable the hedged read 
 ability from the client side (e.g. HBase), and by using DFSQuorumReadMetrics, 
 we could export the metric values of interest into the client system (e.g. 
 HBase's regionserver metric).
 The core logic is in the pread code path: we decide to go to the original 
 fetchBlockByteRange or the newly introduced fetchBlockByteRangeSpeculative per 
 the above config items.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5776) Support 'hedged' reads in DFSClient

2014-01-23 Thread Liang Xie (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13880783#comment-13880783
 ] 

Liang Xie commented on HDFS-5776:
-

v8 adds the enoughNodesForHedgedRead() function as a sanity check. 
[~saint@gmail.com]'s comments are great; we definitely need a switch per 
DFSClient instance.
bq. does the number-of-threads config need to be NumberOfHBaseOpenFiles x 2, or 
else the feature will not work for all files
It still works, but probably lots of requests will execute in the current 
thread, which means no latency benefit from the hedged read feature. This is a 
good argument for a per-client-instance switch, so that we can let only some 
instances use this feature and control it on demand, right? :)
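
For reference, a sketch of the kind of static, bounded pool being discussed, 
where overflow work runs in the calling thread (so the read still completes, 
just without the hedging benefit); the names and parameters here are 
hypothetical, not the patch's:
{code}
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class HedgedReadPoolSketch {
  // Static: shared by all DFSClient instances in the process.
  private static ThreadPoolExecutor hedgedReadPool;

  static synchronized void initPool(int numThreads) {
    if (hedgedReadPool != null) {
      return;  // initialize only once per process
    }
    hedgedReadPool = new ThreadPoolExecutor(1, numThreads, 60L,
        TimeUnit.SECONDS, new SynchronousQueue<Runnable>(),
        // When all threads are busy, run the task in the caller's thread
        // instead of rejecting it.
        new ThreadPoolExecutor.CallerRunsPolicy());
  }
}
{code}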

 Support 'hedged' reads in DFSClient
 ---

 Key: HDFS-5776
 URL: https://issues.apache.org/jira/browse/HDFS-5776
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs-client
Affects Versions: 3.0.0
Reporter: Liang Xie
Assignee: Liang Xie
 Attachments: HDFS-5776-v2.txt, HDFS-5776-v3.txt, HDFS-5776-v4.txt, 
 HDFS-5776-v5.txt, HDFS-5776-v6.txt, HDFS-5776-v7.txt, HDFS-5776-v8.txt, 
 HDFS-5776.txt


 This is a placeholder for the backport of the hdfs-related pieces of 
 https://issues.apache.org/jira/browse/HBASE-7509
 The quorum read ability should be helpful especially to optimize read outliers.
 We can utilize dfs.dfsclient.quorum.read.threshold.millis and 
 dfs.dfsclient.quorum.read.threadpool.size to enable/disable the hedged read 
 ability from the client side (e.g. HBase), and by using DFSQuorumReadMetrics, 
 we could export the metric values of interest into the client system (e.g. 
 HBase's regionserver metric).
 The core logic is in the pread code path: we decide to go to the original 
 fetchBlockByteRange or the newly introduced fetchBlockByteRangeSpeculative per 
 the above config items.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)