[jira] [Created] (HADOOP-10633) use Time#monotonicNow to avoid system clock reset

2014-05-28 Thread Liang Xie (JIRA)
Liang Xie created HADOOP-10633:
--

 Summary: use Time#monotonicNow to avoid system clock reset
 Key: HADOOP-10633
 URL: https://issues.apache.org/jira/browse/HADOOP-10633
 Project: Hadoop Common
  Issue Type: Improvement
  Components: io, security
Affects Versions: 3.0.0
Reporter: Liang Xie
Assignee: Liang Xie


let's replace System#currentTimeMillis with Time#monotonicNow
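For context, a minimal sketch of the measurement pattern this swap enables; {{Time#monotonicNow}} is backed by {{System#nanoTime}}, so a wall-clock reset (e.g. NTP stepping the clock) cannot skew an elapsed-time computation:

{code}
import org.apache.hadoop.util.Time;

public class ElapsedTimeExample {
  public static void main(String[] args) throws InterruptedException {
    long start = Time.monotonicNow();     // monotonic, reset-proof
    Thread.sleep(100);                    // work being timed
    long elapsedMs = Time.monotonicNow() - start;
    // stays ~100 even if the system clock is set backwards meanwhile;
    // a System#currentTimeMillis delta could even go negative here
    System.out.println("elapsed ms: " + elapsedMs);
  }
}
{code}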



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HADOOP-10633) use Time#monotonicNow to avoid system clock reset

2014-05-28 Thread Liang Xie (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liang Xie updated HADOOP-10633:
---

Status: Patch Available  (was: Open)

 use Time#monotonicNow to avoid system clock reset
 -

 Key: HADOOP-10633
 URL: https://issues.apache.org/jira/browse/HADOOP-10633
 Project: Hadoop Common
  Issue Type: Improvement
  Components: io, security
Affects Versions: 3.0.0
Reporter: Liang Xie
Assignee: Liang Xie
 Attachments: HADOOP-10633.txt


 let's replace System#currentTimeMillis with Time#monotonicNow



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HADOOP-10633) use Time#monotonicNow to avoid system clock reset

2014-05-28 Thread Liang Xie (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liang Xie updated HADOOP-10633:
---

Attachment: HADOOP-10633.txt

 use Time#monotonicNow to avoid system clock reset
 -

 Key: HADOOP-10633
 URL: https://issues.apache.org/jira/browse/HADOOP-10633
 Project: Hadoop Common
  Issue Type: Improvement
  Components: io, security
Affects Versions: 3.0.0
Reporter: Liang Xie
Assignee: Liang Xie
 Attachments: HADOOP-10633.txt


 let's replace System#currentTimeMillis with Time#monotonicNow



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-10633) use Time#monotonicNow to avoid system clock reset

2014-05-28 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010866#comment-14010866
 ] 

Hadoop QA commented on HADOOP-10633:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12647069/HADOOP-10633.txt
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-common-project/hadoop-common.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/3978//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/3978//console

This message is automatically generated.

 use Time#monotonicNow to avoid system clock reset
 -

 Key: HADOOP-10633
 URL: https://issues.apache.org/jira/browse/HADOOP-10633
 Project: Hadoop Common
  Issue Type: Improvement
  Components: io, security
Affects Versions: 3.0.0
Reporter: Liang Xie
Assignee: Liang Xie
 Attachments: HADOOP-10633.txt


 let's replace System#currentTimeMillis with Time#monotonicNow



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HADOOP-10632) Minor improvements to Crypto input and output streams

2014-05-28 Thread Yi Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Liu updated HADOOP-10632:


Attachment: HADOOP-10632.patch

[~tucu00], thanks for your nice comments. The new patch includes updates for 
all your comments except the last item, which I want to discuss with you.

{quote}
CryptoInputStream#decrypt(long position, ...) method, given that this method 
does not change the current position of the stream, wouldn’t be simpler to 
create a new decryptor and use a different set of input/output buffers without 
touching the stream ones? We could also use instance vars for them and init 
them the first time this method is called (if it is).
{quote}

I think each approach has its advantages. The approach here doesn't touch the 
stream's buffers, but it needs to create a new decryptor, a different set of 
input/output buffers, the key, the IV, and some other instance vars.
The logic of the original one is not complicated either; it needs to restore 
the {{outBuffer}} and {{decryptor}}, but requires much less code.
Alejandro, I'd personally like to keep the original one. Do you see a 
potential issue?


 Minor improvements to Crypto input and output streams
 -

 Key: HADOOP-10632
 URL: https://issues.apache.org/jira/browse/HADOOP-10632
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: security
Affects Versions: fs-encryption (HADOOP-10150 and HDFS-6134)
Reporter: Alejandro Abdelnur
Assignee: Yi Liu
 Fix For: 3.0.0

 Attachments: HADOOP-10632.patch


 Minor follow up feedback on the crypto streams



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HADOOP-10632) Minor improvements to Crypto input and output streams

2014-05-28 Thread Yi Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Liu updated HADOOP-10632:


Attachment: HADOOP-10632.1.patch

Hi Alejandro, in the new patch, for the {{CryptoInputStream#decrypt(long 
position, ...)}} method, I use a different output buffer to avoid restoring 
{{outBuffer}}, which could cause a small performance hit from byte copying. 
The decryptor still uses the existing one.

 Minor improvements to Crypto input and output streams
 -

 Key: HADOOP-10632
 URL: https://issues.apache.org/jira/browse/HADOOP-10632
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: security
Affects Versions: fs-encryption (HADOOP-10150 and HDFS-6134)
Reporter: Alejandro Abdelnur
Assignee: Yi Liu
 Fix For: 3.0.0

 Attachments: HADOOP-10632.1.patch, HADOOP-10632.patch


 Minor follow up feedback on the crypto streams



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-9704) Write metrics sink plugin for Hadoop/Graphite

2014-05-28 Thread Babak Behzad (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14011294#comment-14011294
 ] 

Babak Behzad commented on HADOOP-9704:
--

Thanks Ravi. So, in order to enable this, one has to put the following three 
lines in etc/hadoop-metrics2.properties for each of the contexts (namenode, 
datanode, resourcemanager, etc.):
namenode.sink.graphite.server_host=localhost
namenode.sink.graphite.server_port=2003
namenode.sink.graphite.metrics_prefix=test.namenode

Looking at the logs and source code, the socket connection is only created 
once, in the init() function, for each of the contexts you enable, so at most 
5 new connections are created. This is exactly the same as other sinks such as 
FileSink and GangliaSink, and I don't think it is an issue.

Regarding a slow Graphite server, that's a good point. We will be able to test 
this soon, but looking at the current Hadoop code, Ganglia's sink uses 
DatagramSocket instead of Socket. I am not sure whether we will need to do 
that later if we see problems with the current code.
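For readers following the thread, here is a minimal sketch of what such a sink can look like; this is illustrative, not the attached patch, and it deliberately omits the robustness concerns discussed here. It opens one socket in init() and writes Graphite's plaintext protocol, one line per metric:

{code}
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.net.Socket;

import org.apache.commons.configuration.SubsetConfiguration;
import org.apache.hadoop.metrics2.AbstractMetric;
import org.apache.hadoop.metrics2.MetricsException;
import org.apache.hadoop.metrics2.MetricsRecord;
import org.apache.hadoop.metrics2.MetricsSink;

public class SimpleGraphiteSink implements MetricsSink {
  private Writer writer;
  private String prefix;

  @Override
  public void init(SubsetConfiguration conf) {
    // One socket per sink instance, opened once in init(), as noted above.
    prefix = conf.getString("metrics_prefix", "hadoop");
    try {
      Socket socket = new Socket(conf.getString("server_host", "localhost"),
          conf.getInt("server_port", 2003));
      writer = new OutputStreamWriter(socket.getOutputStream());
    } catch (IOException e) {
      throw new MetricsException("Error connecting to Graphite", e);
    }
  }

  @Override
  public void putMetrics(MetricsRecord record) {
    long timestamp = record.timestamp() / 1000;   // Graphite expects seconds
    try {
      for (AbstractMetric metric : record.metrics()) {
        // <prefix>.<record>.<metric> <value> <epoch-seconds>
        writer.write(prefix + "." + record.name() + "." + metric.name() + " "
            + metric.value() + " " + timestamp + "\n");
      }
    } catch (IOException e) {
      throw new MetricsException("Error writing to Graphite", e);
    }
  }

  @Override
  public void flush() {
    try {
      writer.flush();
    } catch (IOException e) {
      throw new MetricsException("Error flushing metrics", e);
    }
  }
}
{code}

A class like this would be wired in through the usual metrics2 pattern, e.g. {{namenode.sink.graphite.class=<fully.qualified.SinkClass>}} alongside the three properties above.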

 Write metrics sink plugin for Hadoop/Graphite
 -

 Key: HADOOP-9704
 URL: https://issues.apache.org/jira/browse/HADOOP-9704
 Project: Hadoop Common
  Issue Type: New Feature
Affects Versions: 2.0.3-alpha
Reporter: Chu Tong
 Attachments: 
 0001-HADOOP-9704.-Write-metrics-sink-plugin-for-Hadoop-Gr.patch, 
 HADOOP-9704.patch, HADOOP-9704.patch, Hadoop-9704.patch


 Write a metrics sink plugin for Hadoop to send metrics directly to Graphite 
 in addition to the current Ganglia and file ones.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-10625) Configuration: names should be trimmed when putting/getting to properties

2014-05-28 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14011326#comment-14011326
 ] 

Xuan Gong commented on HADOOP-10625:


Committed to trunk, branch-2. Thanks Wanda!

 Configuration: names should be trimmed when putting/getting to properties
 -

 Key: HADOOP-10625
 URL: https://issues.apache.org/jira/browse/HADOOP-10625
 Project: Hadoop Common
  Issue Type: Bug
  Components: conf
Affects Versions: 2.4.0
Reporter: Wangda Tan
Assignee: Wangda Tan
 Fix For: 2.5.0

 Attachments: HADOOP-10625.patch, HADOOP-10625.patch, 
 HADOOP-10625.patch


 Currently, Hadoop does not trim the name when putting a k/v pair into 
 properties, but names are trimmed when loading configuration from a file 
 (in Configuration.java):
 {code}
   if ("name".equals(field.getTagName()) && field.hasChildNodes())
     attr = StringInterner.weakIntern(
         ((Text)field.getFirstChild()).getData().trim());
   if ("value".equals(field.getTagName()) && field.hasChildNodes())
     value = StringInterner.weakIntern(
         ((Text)field.getFirstChild()).getData());
 {code}
 With this behavior, the following sequence is problematic:
 1. A user incorrectly sets " hadoop.key=value" (with a space before 
 hadoop.key).
 2. The user tries to get "hadoop.key" and cannot get the value.
 3. The configuration is serialized/deserialized (as MR does).
 4. The user tries to get "hadoop.key" again and now gets the value, causing 
 an inconsistency (illustrated below).
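The pre-patch inconsistency above can be shown in a few lines (a sketch of the behavior this issue describes, not of the fix):

{code}
Configuration conf = new Configuration();
conf.set(" hadoop.key", "value");   // set() keeps the untrimmed name
conf.get("hadoop.key");             // null: the trimmed lookup misses
conf.get(" hadoop.key");            // "value"
// After the configuration is serialized to XML and reloaded, names are
// trimmed on load, so conf.get("hadoop.key") now returns "value" as well:
// the same key behaves differently before and after a round trip.
{code}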



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HADOOP-10625) Configuration: names should be trimmed when putting/getting to properties

2014-05-28 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated HADOOP-10625:
---

   Resolution: Fixed
Fix Version/s: 2.5.0
   Status: Resolved  (was: Patch Available)

 Configuration: names should be trimmed when putting/getting to properties
 -

 Key: HADOOP-10625
 URL: https://issues.apache.org/jira/browse/HADOOP-10625
 Project: Hadoop Common
  Issue Type: Bug
  Components: conf
Affects Versions: 2.4.0
Reporter: Wangda Tan
Assignee: Wangda Tan
 Fix For: 2.5.0

 Attachments: HADOOP-10625.patch, HADOOP-10625.patch, 
 HADOOP-10625.patch


 Currently, Hadoop does not trim the name when putting a k/v pair into 
 properties, but names are trimmed when loading configuration from a file 
 (in Configuration.java):
 {code}
   if ("name".equals(field.getTagName()) && field.hasChildNodes())
     attr = StringInterner.weakIntern(
         ((Text)field.getFirstChild()).getData().trim());
   if ("value".equals(field.getTagName()) && field.hasChildNodes())
     value = StringInterner.weakIntern(
         ((Text)field.getFirstChild()).getData());
 {code}
 With this behavior, the following sequence is problematic:
 1. A user incorrectly sets " hadoop.key=value" (with a space before 
 hadoop.key).
 2. The user tries to get "hadoop.key" and cannot get the value.
 3. The configuration is serialized/deserialized (as MR does).
 4. The user tries to get "hadoop.key" again and now gets the value, causing 
 an inconsistency.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-10632) Minor improvements to Crypto input and output streams

2014-05-28 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14011344#comment-14011344
 ] 

Alejandro Abdelnur commented on HADOOP-10632:
-

Thanks Yi, looks good. A few follow-ups for your consideration:

CryptoInputStream#freeBuffers(): I'm not sure we should tap into Sun internal 
APIs here. Do we gain something by cleaning up the buffers? And if we do, we 
should first check that the DB is a sun.nio.ch.DB instance. (Sorry, I missed 
this one in my first pass.)

CryptoInputStream#decrypt(): you changed the signature to take the outBuffer 
as a param; wouldn't it make sense to take both the input and output buffers 
as parameters? Then the usage from the positioned read would be simpler. What 
I'm thinking is that the positioned read/readFully should leave alone all 
instance vars related to normal stream reading and use a completely different 
set of vars that are instantiated on their first use and reset before every 
use.

CryptoInputStream, positioned read() and readFully(): wouldn't we have to 
consider the padding based on the requested position there? If so, decrypt(), 
following the previous comment, would have to receive the padding as a param 
as well, no?

On failing the Crypto stream constructors if the buffer size is not a multiple 
of the block size: I thought about that initially, but it seemed too strong a 
requirement to me; that is why I suggested flooring the requested buffer size 
to the closest previous block-size multiple.
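To make the separate-state suggestion concrete, a hedged sketch; all names here (the pread fields, {{Decryptor}}, {{codec.createDecryptor()}}) are illustrative assumptions, not the patch's code:

{code}
// Positioned reads keep their own lazily created decryptor and buffers, so
// the stream's normal read state is never touched.
private Decryptor preadDecryptor;             // created on first pread
private ByteBuffer preadInBuffer, preadOutBuffer;

private void ensurePreadState() throws IOException {
  if (preadDecryptor == null) {               // instantiated on first use
    preadDecryptor = codec.createDecryptor();
    preadInBuffer = ByteBuffer.allocateDirect(bufferSize);
    preadOutBuffer = ByteBuffer.allocateDirect(bufferSize);
  }
  preadInBuffer.clear();                      // reset before every use
  preadOutBuffer.clear();
}

// Flooring (rather than rejecting) a buffer size that is not a multiple of
// the algorithm block size, as suggested for the constructors:
private static int floorToBlockSize(int requested, int blockSize) {
  return Math.max(blockSize, (requested / blockSize) * blockSize);
}
{code}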


 Minor improvements to Crypto input and output streams
 -

 Key: HADOOP-10632
 URL: https://issues.apache.org/jira/browse/HADOOP-10632
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: security
Affects Versions: fs-encryption (HADOOP-10150 and HDFS-6134)
Reporter: Alejandro Abdelnur
Assignee: Yi Liu
 Fix For: 3.0.0

 Attachments: HADOOP-10632.1.patch, HADOOP-10632.patch


 Minor follow up feedback on the crypto streams



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-10630) Possible race condition in RetryInvocationHandler

2014-05-28 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14011352#comment-14011352
 ] 

Kihwal Lee commented on HADOOP-10630:
-

The patch looks reasonable. Did you have a chance to verify that it fixes the 
issue?

 Possible race condition in RetryInvocationHandler
 -

 Key: HADOOP-10630
 URL: https://issues.apache.org/jira/browse/HADOOP-10630
 Project: Hadoop Common
  Issue Type: Bug
Reporter: Jing Zhao
Assignee: Jing Zhao
 Attachments: HADOOP-10630.000.patch


 In one of our system tests with a NameNode HA setup, we ran 300 threads in 
 LoadGenerator. While one of the NameNodes was already in the active state and 
 had started to serve, we still saw one of the client threads fail all of its 
 retries within a 20-second window. Meanwhile, we saw many of the following 
 warning messages in the log:
 {noformat}
 WARN retry.RetryInvocationHandler: A failover has occurred since the start of 
 this method invocation attempt.
 {noformat}
 After checking the code, we see the following in RetryInvocationHandler:
 {code}
   while (true) {
     // The number of times this invocation handler has ever been failed over,
     // before this method invocation attempt. Used to prevent concurrent
     // failed method invocations from triggering multiple failover attempts.
     long invocationAttemptFailoverCount;
     synchronized (proxyProvider) {
       invocationAttemptFailoverCount = proxyProviderFailoverCount;
     }
     ..
     if (action.action == RetryAction.RetryDecision.FAILOVER_AND_RETRY) {
       // Make sure that concurrent failed method invocations only cause a
       // single actual fail over.
       synchronized (proxyProvider) {
         if (invocationAttemptFailoverCount == proxyProviderFailoverCount) {
           proxyProvider.performFailover(currentProxy.proxy);
           proxyProviderFailoverCount++;
           currentProxy = proxyProvider.getProxy();
         } else {
           LOG.warn("A failover has occurred since the start of this method"
               + " invocation attempt.");
         }
       }
       invocationFailoverCount++;
     }
     ..
 {code}
 currentProxy is refreshed only by the thread that performs the failover 
 (while holding the proxyProvider monitor). Because currentProxy is not 
 volatile, a thread that does not perform the failover (and thus logs the 
 warning) may fail to see the new value of currentProxy.
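For reference, the two standard remedies for this kind of unsafe publication, sketched with illustrative names (the attached patch may differ):

{code}
// (a) volatile publication: the failover thread's write to currentProxy
//     becomes visible to threads that take the warning branch instead.
private volatile ProxyInfo currentProxy;

// (b) alternatively, read currentProxy under the same monitor that guards
//     the write, pairing with the synchronized failover block above:
Object proxy;
synchronized (proxyProvider) {
  proxy = currentProxy.proxy;
}
{code}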



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-10631) Native Hadoop Client: Add missing output in GenerateProtobufs.cmake

2014-05-28 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14011365#comment-14011365
 ] 

Colin Patrick McCabe commented on HADOOP-10631:
---

+1.  Thanks, Binglin.

 Native Hadoop Client: Add missing output in GenerateProtobufs.cmake
 ---

 Key: HADOOP-10631
 URL: https://issues.apache.org/jira/browse/HADOOP-10631
 Project: Hadoop Common
  Issue Type: Sub-task
Affects Versions: HADOOP-10388
Reporter: Binglin Chang
Assignee: Binglin Chang
Priority: Trivial
 Attachments: HADOOP-10631.v1.patch


 In GenerateProtobufs.cmake, the pb-c.h.s files are not added to OUTPUT, so 
 they are not removed when "make clean" is called.
 {code}
  add_custom_command(
     OUTPUT ${PB_C_FILE} ${PB_H_FILE} ${CALL_C_FILE} ${CALL_H_FILE}
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HADOOP-10631) Native Hadoop Client: make clean should remove pb-c.h.s files

2014-05-28 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HADOOP-10631:
--

Summary: Native Hadoop Client: make clean should remove pb-c.h.s files  
(was: Native Hadoop Client: Add missing output in GenerateProtobufs.cmake)

 Native Hadoop Client: make clean should remove pb-c.h.s files
 -

 Key: HADOOP-10631
 URL: https://issues.apache.org/jira/browse/HADOOP-10631
 Project: Hadoop Common
  Issue Type: Sub-task
Affects Versions: HADOOP-10388
Reporter: Binglin Chang
Assignee: Binglin Chang
Priority: Trivial
 Attachments: HADOOP-10631.v1.patch


 In GenerateProtobufs.cmake, the pb-c.h.s files are not added to OUTPUT, so 
 they are not removed when "make clean" is called.
 {code}
  add_custom_command(
     OUTPUT ${PB_C_FILE} ${PB_H_FILE} ${CALL_C_FILE} ${CALL_H_FILE}
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-10630) Possible race condition in RetryInvocationHandler

2014-05-28 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14011364#comment-14011364
 ] 

Jing Zhao commented on HADOOP-10630:


Not yet. The issue cannot easily be reproduced since, by default, the client 
will retry/failover 10 times. I will decrease the retry count and rerun the 
test with/without the patch in the next few days.

 Possible race condition in RetryInvocationHandler
 -

 Key: HADOOP-10630
 URL: https://issues.apache.org/jira/browse/HADOOP-10630
 Project: Hadoop Common
  Issue Type: Bug
Reporter: Jing Zhao
Assignee: Jing Zhao
 Attachments: HADOOP-10630.000.patch


 In one of our system tests with a NameNode HA setup, we ran 300 threads in 
 LoadGenerator. While one of the NameNodes was already in the active state and 
 had started to serve, we still saw one of the client threads fail all of its 
 retries within a 20-second window. Meanwhile, we saw many of the following 
 warning messages in the log:
 {noformat}
 WARN retry.RetryInvocationHandler: A failover has occurred since the start of 
 this method invocation attempt.
 {noformat}
 After checking the code, we see the following in RetryInvocationHandler:
 {code}
   while (true) {
     // The number of times this invocation handler has ever been failed over,
     // before this method invocation attempt. Used to prevent concurrent
     // failed method invocations from triggering multiple failover attempts.
     long invocationAttemptFailoverCount;
     synchronized (proxyProvider) {
       invocationAttemptFailoverCount = proxyProviderFailoverCount;
     }
     ..
     if (action.action == RetryAction.RetryDecision.FAILOVER_AND_RETRY) {
       // Make sure that concurrent failed method invocations only cause a
       // single actual fail over.
       synchronized (proxyProvider) {
         if (invocationAttemptFailoverCount == proxyProviderFailoverCount) {
           proxyProvider.performFailover(currentProxy.proxy);
           proxyProviderFailoverCount++;
           currentProxy = proxyProvider.getProxy();
         } else {
           LOG.warn("A failover has occurred since the start of this method"
               + " invocation attempt.");
         }
       }
       invocationFailoverCount++;
     }
     ..
 {code}
 currentProxy is refreshed only by the thread that performs the failover 
 (while holding the proxyProvider monitor). Because currentProxy is not 
 volatile, a thread that does not perform the failover (and thus logs the 
 warning) may fail to see the new value of currentProxy.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HADOOP-10631) Native Hadoop Client: make clean should remove pb-c.h.s files

2014-05-28 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HADOOP-10631:
--

   Resolution: Fixed
Fix Version/s: HADOOP-10388
   Status: Resolved  (was: Patch Available)

 Native Hadoop Client: make clean should remove pb-c.h.s files
 -

 Key: HADOOP-10631
 URL: https://issues.apache.org/jira/browse/HADOOP-10631
 Project: Hadoop Common
  Issue Type: Sub-task
Affects Versions: HADOOP-10388
Reporter: Binglin Chang
Assignee: Binglin Chang
Priority: Trivial
 Fix For: HADOOP-10388

 Attachments: HADOOP-10631.v1.patch


 In GenerateProtobufs.cmake, the pb-c.h.s files are not added to OUTPUT, so 
 they are not removed when "make clean" is called.
 {code}
  add_custom_command(
     OUTPUT ${PB_C_FILE} ${PB_H_FILE} ${CALL_C_FILE} ${CALL_H_FILE}
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HADOOP-10634) Add recursive list apis to FileSystem to give implementations an opportunity for optimization

2014-05-28 Thread Sumit Kumar (JIRA)
Sumit Kumar created HADOOP-10634:


 Summary: Add recursive list apis to FileSystem to give 
implementations an opportunity for optimization
 Key: HADOOP-10634
 URL: https://issues.apache.org/jira/browse/HADOOP-10634
 Project: Hadoop Common
  Issue Type: Improvement
  Components: fs/s3
Reporter: Sumit Kumar
 Fix For: 2.4.0


Currently, different code flows in Hadoop use recursive listing to discover 
files/folders under a given path. For example, FileInputFormat (both the 
mapreduce and mapred implementations) does this while calculating splits. 
However, they list level by level: to discover files under /foo/bar, they 
first list /foo/bar to get the immediate children, then make the same call on 
each immediate child, and so on. This doesn't scale well for fs 
implementations like s3, because every listStatus call ends up being a 
webservice call to s3. When a large number of files are considered for input, 
this makes the getSplits() call slow.

This patch adds a new set of recursive list APIs that give the s3 fs 
implementation an opportunity to optimize. The behavior remains the same for 
other implementations (a default implementation is provided, so other file 
systems don't have to implement anything new), but for s3 a simple change (as 
shown in the patch) improves listing performance.
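To illustrate the difference, a hedged sketch of the two strategies; {{FileSystem#listFiles(Path, boolean)}} is the existing recursive iterator in trunk, and the new APIs proposed here may differ in shape:

{code}
import java.io.IOException;
import java.util.List;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;

public class ListingStrategies {
  // Level-by-level walk: one listStatus() round trip per directory, which
  // is what makes getSplits() slow against S3-backed file systems.
  static void walk(FileSystem fs, Path dir, List<FileStatus> out)
      throws IOException {
    for (FileStatus st : fs.listStatus(dir)) {
      if (st.isDirectory()) {
        walk(fs, st.getPath(), out);
      } else {
        out.add(st);
      }
    }
  }

  // A single recursive call lets the implementation flatten the traversal
  // (for s3, into far fewer webservice calls).
  static void listRecursive(FileSystem fs, Path root) throws IOException {
    RemoteIterator<LocatedFileStatus> it = fs.listFiles(root, true);
    while (it.hasNext()) {
      System.out.println(it.next().getPath());
    }
  }
}
{code}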



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-10624) Fix some minors typo and add more test cases for hadoop_err

2014-05-28 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14011381#comment-14011381
 ] 

Colin Patrick McCabe commented on HADOOP-10624:
---

Looks good overall.  Can you get rid of the switch statements and just pass in 
the values that you want to test?

For example you could have:
{code}
hadoop_lerr_alloc_test(RUNTIME_EXCEPTION_ERROR_CODE,
    "org.apache.hadoop.native.HadoopCore.RuntimeException: ");
{code}

and then have {{hadoop_lerr_alloc_test}} call {{hadoop_lerr_alloc}} and check 
the result of {{hadoop_err_msg}} against the string that was passed in.  It 
makes more sense to me to pass in the string than to have a case statement like 
that.

 Fix some minors typo and add more test cases for hadoop_err
 ---

 Key: HADOOP-10624
 URL: https://issues.apache.org/jira/browse/HADOOP-10624
 Project: Hadoop Common
  Issue Type: Sub-task
Affects Versions: HADOOP-10388
Reporter: Wenwu Peng
Assignee: Wenwu Peng
 Attachments: HADOOP-10624-pnative.001.patch


 Changes:
 1. Add more test cases to cover the methods hadoop_lerr_alloc and 
 hadoop_uverr_alloc.
 2. Fix typos as follows:
 1) Change "hadoop_uverr_alloc(int cod" to "hadoop_uverr_alloc(int code" in 
 hadoop_err.h
 2) Change "OutOfMemory" to "OutOfMemoryException" to be consistent with the 
 other exceptions in hadoop_err.c
 3) Change "DBUG" to "DEBUG" in messenger.c
 4) Change "DBUG" to "DEBUG" in reactor.c



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-10630) Possible race condition in RetryInvocationHandler

2014-05-28 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14011387#comment-14011387
 ] 

Suresh Srinivas commented on HADOOP-10630:
--

+1 for the patch, once the failover tests pass.

 Possible race condition in RetryInvocationHandler
 -

 Key: HADOOP-10630
 URL: https://issues.apache.org/jira/browse/HADOOP-10630
 Project: Hadoop Common
  Issue Type: Bug
Reporter: Jing Zhao
Assignee: Jing Zhao
 Attachments: HADOOP-10630.000.patch


 In one of our system tests with a NameNode HA setup, we ran 300 threads in 
 LoadGenerator. While one of the NameNodes was already in the active state and 
 had started to serve, we still saw one of the client threads fail all of its 
 retries within a 20-second window. Meanwhile, we saw many of the following 
 warning messages in the log:
 {noformat}
 WARN retry.RetryInvocationHandler: A failover has occurred since the start of 
 this method invocation attempt.
 {noformat}
 After checking the code, we see the following in RetryInvocationHandler:
 {code}
   while (true) {
     // The number of times this invocation handler has ever been failed over,
     // before this method invocation attempt. Used to prevent concurrent
     // failed method invocations from triggering multiple failover attempts.
     long invocationAttemptFailoverCount;
     synchronized (proxyProvider) {
       invocationAttemptFailoverCount = proxyProviderFailoverCount;
     }
     ..
     if (action.action == RetryAction.RetryDecision.FAILOVER_AND_RETRY) {
       // Make sure that concurrent failed method invocations only cause a
       // single actual fail over.
       synchronized (proxyProvider) {
         if (invocationAttemptFailoverCount == proxyProviderFailoverCount) {
           proxyProvider.performFailover(currentProxy.proxy);
           proxyProviderFailoverCount++;
           currentProxy = proxyProvider.getProxy();
         } else {
           LOG.warn("A failover has occurred since the start of this method"
               + " invocation attempt.");
         }
       }
       invocationFailoverCount++;
     }
     ..
 {code}
 currentProxy is refreshed only by the thread that performs the failover 
 (while holding the proxyProvider monitor). Because currentProxy is not 
 volatile, a thread that does not perform the failover (and thus logs the 
 warning) may fail to see the new value of currentProxy.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-10625) Configuration: names should be trimmed when putting/getting to properties

2014-05-28 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14011432#comment-14011432
 ] 

Hudson commented on HADOOP-10625:
-

SUCCESS: Integrated in Hadoop-trunk-Commit #5616 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5616/])
HADOOP-10625. Trim configuration names when putting/getting them to properties 
(xgong: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1598072)
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/conf/Configuration.java
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/conf/TestConfiguration.java


 Configuration: names should be trimmed when putting/getting to properties
 -

 Key: HADOOP-10625
 URL: https://issues.apache.org/jira/browse/HADOOP-10625
 Project: Hadoop Common
  Issue Type: Bug
  Components: conf
Affects Versions: 2.4.0
Reporter: Wangda Tan
Assignee: Wangda Tan
 Fix For: 2.5.0

 Attachments: HADOOP-10625.patch, HADOOP-10625.patch, 
 HADOOP-10625.patch


 Currently, Hadoop does not trim the name when putting a k/v pair into 
 properties, but names are trimmed when loading configuration from a file 
 (in Configuration.java):
 {code}
   if ("name".equals(field.getTagName()) && field.hasChildNodes())
     attr = StringInterner.weakIntern(
         ((Text)field.getFirstChild()).getData().trim());
   if ("value".equals(field.getTagName()) && field.hasChildNodes())
     value = StringInterner.weakIntern(
         ((Text)field.getFirstChild()).getData());
 {code}
 With this behavior, the following sequence is problematic:
 1. A user incorrectly sets " hadoop.key=value" (with a space before 
 hadoop.key).
 2. The user tries to get "hadoop.key" and cannot get the value.
 3. The configuration is serialized/deserialized (as MR does).
 4. The user tries to get "hadoop.key" again and now gets the value, causing 
 an inconsistency.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-10628) Javadoc and few code style improvement for Crypto input and output streams

2014-05-28 Thread Charles Lamb (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14011440#comment-14011440
 ] 

Charles Lamb commented on HADOOP-10628:
---

+1. Thanks Yi.

 Javadoc and few code style improvement for Crypto input and output streams
 --

 Key: HADOOP-10628
 URL: https://issues.apache.org/jira/browse/HADOOP-10628
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: security
Affects Versions: fs-encryption (HADOOP-10150 and HDFS-6134)
Reporter: Yi Liu
Assignee: Yi Liu
 Fix For: fs-encryption (HADOOP-10150 and HDFS-6134)

 Attachments: HADOOP-10628.patch


 There are some additional comments from [~clamb] related to javadoc and a 
 few code style issues on HADOOP-10603; let's fix them in this follow-on JIRA.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HADOOP-10634) Add recursive list apis to FileSystem to give implementations an opportunity for optimization

2014-05-28 Thread Sumit Kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sumit Kumar updated HADOOP-10634:
-

Attachment: HADOOP-10634.patch

 Add recursive list apis to FileSystem to give implementations an opportunity 
 for optimization
 -

 Key: HADOOP-10634
 URL: https://issues.apache.org/jira/browse/HADOOP-10634
 Project: Hadoop Common
  Issue Type: Improvement
  Components: fs/s3
Reporter: Sumit Kumar
 Fix For: 2.4.0

 Attachments: HADOOP-10634.patch


 Currently, different code flows in Hadoop use recursive listing to discover 
 files/folders under a given path. For example, FileInputFormat (both the 
 mapreduce and mapred implementations) does this while calculating splits. 
 However, they list level by level: to discover files under /foo/bar, they 
 first list /foo/bar to get the immediate children, then make the same call 
 on each immediate child, and so on. This doesn't scale well for fs 
 implementations like s3, because every listStatus call ends up being a 
 webservice call to s3. When a large number of files are considered for 
 input, this makes the getSplits() call slow. 
 This patch adds a new set of recursive list APIs that give the s3 fs 
 implementation an opportunity to optimize. The behavior remains the same for 
 other implementations (a default implementation is provided, so other file 
 systems don't have to implement anything new), but for s3 a simple change 
 (as shown in the patch) improves listing performance.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HADOOP-10634) Add recursive list apis to FileSystem to give implementations an opportunity for optimization

2014-05-28 Thread Sumit Kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sumit Kumar updated HADOOP-10634:
-

Fix Version/s: (was: 2.4.0)
Affects Version/s: 2.4.0
   Status: Patch Available  (was: Open)

Attached a patch that passes all the tests on top of the hadoop 2.4.0 branch.

 Add recursive list apis to FileSystem to give implementations an opportunity 
 for optimization
 -

 Key: HADOOP-10634
 URL: https://issues.apache.org/jira/browse/HADOOP-10634
 Project: Hadoop Common
  Issue Type: Improvement
  Components: fs/s3
Affects Versions: 2.4.0
Reporter: Sumit Kumar
 Attachments: HADOOP-10634.patch


 Currently, different code flows in Hadoop use recursive listing to discover 
 files/folders under a given path. For example, FileInputFormat (both the 
 mapreduce and mapred implementations) does this while calculating splits. 
 However, they list level by level: to discover files under /foo/bar, they 
 first list /foo/bar to get the immediate children, then make the same call 
 on each immediate child, and so on. This doesn't scale well for fs 
 implementations like s3, because every listStatus call ends up being a 
 webservice call to s3. When a large number of files are considered for 
 input, this makes the getSplits() call slow. 
 This patch adds a new set of recursive list APIs that give the s3 fs 
 implementation an opportunity to optimize. The behavior remains the same for 
 other implementations (a default implementation is provided, so other file 
 systems don't have to implement anything new), but for s3 a simple change 
 (as shown in the patch) improves listing performance.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-10632) Minor improvements to Crypto input and output streams

2014-05-28 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14011509#comment-14011509
 ] 

Alejandro Abdelnur commented on HADOOP-10632:
-

Also, the {{CryptoCodec}} has some javadoc links that are incorrect; when 
referring to classes, don't prefix the class name with #.


 Minor improvements to Crypto input and output streams
 -

 Key: HADOOP-10632
 URL: https://issues.apache.org/jira/browse/HADOOP-10632
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: security
Affects Versions: fs-encryption (HADOOP-10150 and HDFS-6134)
Reporter: Alejandro Abdelnur
Assignee: Yi Liu
 Fix For: 3.0.0

 Attachments: HADOOP-10632.1.patch, HADOOP-10632.patch


 Minor follow up feedback on the crypto streams



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-10634) Add recursive list apis to FileSystem to give implementations an opportunity for optimization

2014-05-28 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14011531#comment-14011531
 ] 

Hadoop QA commented on HADOOP-10634:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12647191/HADOOP-10634.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-common-project/hadoop-common.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/3979//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/3979//console

This message is automatically generated.

 Add recursive list apis to FileSystem to give implementations an opportunity 
 for optimization
 -

 Key: HADOOP-10634
 URL: https://issues.apache.org/jira/browse/HADOOP-10634
 Project: Hadoop Common
  Issue Type: Improvement
  Components: fs/s3
Affects Versions: 2.4.0
Reporter: Sumit Kumar
 Attachments: HADOOP-10634.patch


 Currently, different code flows in Hadoop use recursive listing to discover 
 files/folders under a given path. For example, FileInputFormat (both the 
 mapreduce and mapred implementations) does this while calculating splits. 
 However, they list level by level: to discover files under /foo/bar, they 
 first list /foo/bar to get the immediate children, then make the same call 
 on each immediate child, and so on. This doesn't scale well for fs 
 implementations like s3, because every listStatus call ends up being a 
 webservice call to s3. When a large number of files are considered for 
 input, this makes the getSplits() call slow. 
 This patch adds a new set of recursive list APIs that give the s3 fs 
 implementation an opportunity to optimize. The behavior remains the same for 
 other implementations (a default implementation is provided, so other file 
 systems don't have to implement anything new), but for s3 a simple change 
 (as shown in the patch) improves listing performance.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-10607) Create an API to Separate Credentials/Password Storage from Applications

2014-05-28 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14011563#comment-14011563
 ] 

Owen O'Malley commented on HADOOP-10607:


I think it would be good to add a getPassword(String key) method to 
Configuration.

That method will do the credential provider lookup and translate it.

Perhaps we should have the identity credential provider log a warning when it 
is invoked, so that admins are aware when they have plaintext passwords in 
their config files.

I think the right final state is one where you only have unadorned aliases 
where there are currently secrets.

 Create an API to Separate Credentials/Password Storage from Applications
 

 Key: HADOOP-10607
 URL: https://issues.apache.org/jira/browse/HADOOP-10607
 Project: Hadoop Common
  Issue Type: New Feature
  Components: security
Reporter: Larry McCay
Assignee: Larry McCay
 Fix For: 3.0.0

 Attachments: 10607-2.patch, 10607-3.patch, 10607-4.patch, 
 10607-5.patch, 10607.patch


 As with the filesystem API, we need to provide a generic mechanism to support 
 multiple credential storage mechanisms that are potentially from third 
 parties. 
 We need the ability to eliminate the storage of passwords and secrets in 
 clear text within configuration files or within code.
 Toward that end, I propose an API that is configured using a list of URLs of 
 CredentialProviders. The implementation will look for implementations using 
 the ServiceLoader interface and thus support third party libraries.
 Two providers will be included in this patch. One using the credentials cache 
 in MapReduce jobs and the other using Java KeyStores from either HDFS or 
 local file system. 
 A CredShell CLI will also be included in this patch which provides the 
 ability to manage the credentials within the stores.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HADOOP-10566) Refactor proxyservers out of ProxyUsers

2014-05-28 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HADOOP-10566:
---

  Resolution: Fixed
   Fix Version/s: 2.5.0
  3.0.0
Target Version/s: 2.5.0
  Status: Resolved  (was: Patch Available)

I committed this to branch-2.

 Refactor proxyservers out of ProxyUsers
 ---

 Key: HADOOP-10566
 URL: https://issues.apache.org/jira/browse/HADOOP-10566
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: security
Affects Versions: 2.4.0
Reporter: Benoy Antony
Assignee: Benoy Antony
 Fix For: 3.0.0, 2.5.0

 Attachments: HADOOP-10566-branch-2.patch, HADOOP-10566.patch, 
 HADOOP-10566.patch, HADOOP-10566.patch, HADOOP-10566.patch, HADOOP-10566.patch


 HADOOP-10498 added the proxyservers feature to ProxyUsers. It is beneficial 
 to treat this as a separate feature since:
 1) ProxyUsers is per proxyuser whereas proxyservers is per cluster; the 
 cardinality is different.
 2) ProxyUsers.authorize() and ProxyUsers.isProxyUser() are synchronized and 
 hence share the same lock, which impacts performance.
 Since these are two separate features, keeping them separate is an 
 improvement. It also enables one to fine-tune each feature independently.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-10607) Create an API to Separate Credentials/Password Storage from Applications

2014-05-28 Thread Larry McCay (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14011597#comment-14011597
 ] 

Larry McCay commented on HADOOP-10607:
--

[~owen.omalley] - I can buy the getPassword method - that makes sense.

What I am wondering now is whether we need alias names beyond the config 
property names at all.
If, when we call getPassword, the implementation first checks for an alias of 
that name and finds it, then it doesn't matter what the value is in the config 
file. We could suggest that it be ALIASED or something that shows it is 
intentionally not a clear-text password.

I think that will get us what we want without the ugly alias token syntax.
What do you think?

 Create an API to Separate Credentials/Password Storage from Applications
 

 Key: HADOOP-10607
 URL: https://issues.apache.org/jira/browse/HADOOP-10607
 Project: Hadoop Common
  Issue Type: New Feature
  Components: security
Reporter: Larry McCay
Assignee: Larry McCay
 Fix For: 3.0.0

 Attachments: 10607-2.patch, 10607-3.patch, 10607-4.patch, 
 10607-5.patch, 10607.patch


 As with the filesystem API, we need to provide a generic mechanism to support 
 multiple credential storage mechanisms that are potentially from third 
 parties. 
 We need the ability to eliminate the storage of passwords and secrets in 
 clear text within configuration files or within code.
 Toward that end, I propose an API that is configured using a list of URLs of 
 CredentialProviders. The implementation will look for implementations using 
 the ServiceLoader interface and thus support third party libraries.
 Two providers will be included in this patch. One using the credentials cache 
 in MapReduce jobs and the other using Java KeyStores from either HDFS or 
 local file system. 
 A CredShell CLI will also be included in this patch which provides the 
 ability to manage the credentials within the stores.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-10622) Shell.runCommand can deadlock

2014-05-28 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14011626#comment-14011626
 ] 

Jason Lowe commented on HADOOP-10622:
-

Thanks for the review, Gera!  My patch was just a quick-n-dirty thing that 
doesn't cover all the cases, and I agree your proposed approach is much better. 
 Would you mind taking this and posting an official patch of your proposal?

 Shell.runCommand can deadlock
 -

 Key: HADOOP-10622
 URL: https://issues.apache.org/jira/browse/HADOOP-10622
 Project: Hadoop Common
  Issue Type: Bug
Affects Versions: 2.3.0
Reporter: Jason Lowe
Assignee: Jason Lowe
Priority: Critical
 Attachments: HADOOP-10622.patch


 Ran into a deadlock in Shell.runCommand.  Stacktrace details to follow.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-10607) Create an API to Separate Credentials/Password Storage from Applications

2014-05-28 Thread Larry McCay (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14011679#comment-14011679
 ] 

Larry McCay commented on HADOOP-10607:
--

Okay, let's summarize an approach here...

If we have a ConfigurationCredentialProvider that simply looks for the 
credential in configuration, then:
* it can be the default provider, which allows for passwords in clear text 
and works out of the box
* we can place a real credential provider in front of it in the provider path, 
allowing password aliases to be resolved with a fallback to Configuration

If we add a new method to Configuration - getPassword(String name) - then:
* we essentially extend the configuration file to include the credentials 
available through the provider API
* we leverage the CredentialProvider API to get the password, whether it is in 
a store or in the configuration file, without the consuming code or even the 
Configuration code knowing where it comes from

If we leverage the existing configuration property names as the aliases into 
the credential store, then:
* we can simply remove the password config elements from files when not in 
clear text, or
* add a value of ALIASED or something that indicates that the value is 
elsewhere (in case the property is mandatory for some elements)

Is this accurate?
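A minimal sketch of that lookup order, assuming the API shape from the attached patches ({{CredentialProviderFactory#getProviders}}, {{CredentialEntry}} and the exact method names are assumptions here):

{code}
// Try each configured provider first; fall back to the clear-text config
// value so things keep working out of the box.
static char[] getPassword(Configuration conf, String name) throws IOException {
  for (CredentialProvider provider :
      CredentialProviderFactory.getProviders(conf)) {
    CredentialProvider.CredentialEntry entry =
        provider.getCredentialEntry(name);   // alias == config property name
    if (entry != null) {
      return entry.getCredential();
    }
  }
  String clearText = conf.get(name);         // clear-text fallback
  return clearText == null ? null : clearText.toCharArray();
}
{code}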

 Create an API to Separate Credentials/Password Storage from Applications
 

 Key: HADOOP-10607
 URL: https://issues.apache.org/jira/browse/HADOOP-10607
 Project: Hadoop Common
  Issue Type: New Feature
  Components: security
Reporter: Larry McCay
Assignee: Larry McCay
 Fix For: 3.0.0

 Attachments: 10607-2.patch, 10607-3.patch, 10607-4.patch, 
 10607-5.patch, 10607.patch


 As with the filesystem API, we need to provide a generic mechanism to support 
 multiple credential storage mechanisms that are potentially from third 
 parties. 
 We need the ability to eliminate the storage of passwords and secrets in 
 clear text within configuration files or within code.
 Toward that end, I propose an API that is configured using a list of URLs of 
 CredentialProviders. The implementation will look for implementations using 
 the ServiceLoader interface and thus support third party libraries.
 Two providers will be included in this patch. One using the credentials cache 
 in MapReduce jobs and the other using Java KeyStores from either HDFS or 
 local file system. 
 A CredShell CLI will also be included in this patch which provides the 
 ability to manage the credentials within the stores.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-10607) Create an API to Separate Credentials/Password Storage from Applications

2014-05-28 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14011716#comment-14011716
 ] 

Owen O'Malley commented on HADOOP-10607:


Looks good except that I'd avoid the special value of ALIASED. We don't have 
any mandatory properties in our configs.

 Create an API to Separate Credentials/Password Storage from Applications
 

 Key: HADOOP-10607
 URL: https://issues.apache.org/jira/browse/HADOOP-10607
 Project: Hadoop Common
  Issue Type: New Feature
  Components: security
Reporter: Larry McCay
Assignee: Larry McCay
 Fix For: 3.0.0

 Attachments: 10607-2.patch, 10607-3.patch, 10607-4.patch, 
 10607-5.patch, 10607.patch


 As with the filesystem API, we need to provide a generic mechanism to support 
 multiple credential storage mechanisms that are potentially from third 
 parties. 
 We need the ability to eliminate the storage of passwords and secrets in 
 clear text within configuration files or within code.
 Toward that end, I propose an API that is configured using a list of URLs of 
 CredentialProviders. The implementation will look for implementations using 
 the ServiceLoader interface and thus support third party libraries.
 Two providers will be included in this patch. One using the credentials cache 
 in MapReduce jobs and the other using Java KeyStores from either HDFS or 
 local file system. 
 A CredShell CLI will also be included in this patch which provides the 
 ability to manage the credentials within the stores.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-9704) Write metrics sink plugin for Hadoop/Graphite

2014-05-28 Thread Luke Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14011733#comment-14011733
 ] 

Luke Lu commented on HADOOP-9704:
-

[~raviprak]: Exceptions in sink impls won't bring down the daemon. The metrics 
system is designed to be resilient to transient back-end errors. It'll do 
retries according to config as well. 

 Write metrics sink plugin for Hadoop/Graphite
 -

 Key: HADOOP-9704
 URL: https://issues.apache.org/jira/browse/HADOOP-9704
 Project: Hadoop Common
  Issue Type: New Feature
Affects Versions: 2.0.3-alpha
Reporter: Chu Tong
 Attachments: 
 0001-HADOOP-9704.-Write-metrics-sink-plugin-for-Hadoop-Gr.patch, 
 HADOOP-9704.patch, HADOOP-9704.patch, Hadoop-9704.patch


 Write a metrics sink plugin for Hadoop to send metrics directly to Graphite 
 in addition to the current Ganglia and file ones.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-9704) Write metrics sink plugin for Hadoop/Graphite

2014-05-28 Thread Luke Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14011743#comment-14011743
 ] 

Luke Lu commented on HADOOP-9704:
-

The patch looks good overall. Thanks [~babakbehzad]! Please remove the tabs in 
the source and format according to 

https://wiki.apache.org/hadoop/CodeReviewChecklist

 Write metrics sink plugin for Hadoop/Graphite
 -

 Key: HADOOP-9704
 URL: https://issues.apache.org/jira/browse/HADOOP-9704
 Project: Hadoop Common
  Issue Type: New Feature
Affects Versions: 2.0.3-alpha
Reporter: Chu Tong
 Attachments: 
 0001-HADOOP-9704.-Write-metrics-sink-plugin-for-Hadoop-Gr.patch, 
 HADOOP-9704.patch, HADOOP-9704.patch, Hadoop-9704.patch


 Write a metrics sink plugin for Hadoop to send metrics directly to Graphite 
 in addition to the current Ganglia and file ones.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-10607) Create an API to Separate Credentials/Password Storage from Applications

2014-05-28 Thread Larry McCay (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14011746#comment-14011746
 ] 

Larry McCay commented on HADOOP-10607:
--

Very good!

I will hopefully have a new patch by end of day tomorrow.

 Create an API to Separate Credentials/Password Storage from Applications
 

 Key: HADOOP-10607
 URL: https://issues.apache.org/jira/browse/HADOOP-10607
 Project: Hadoop Common
  Issue Type: New Feature
  Components: security
Reporter: Larry McCay
Assignee: Larry McCay
 Fix For: 3.0.0

 Attachments: 10607-2.patch, 10607-3.patch, 10607-4.patch, 
 10607-5.patch, 10607.patch


 As with the filesystem API, we need to provide a generic mechanism to support 
 multiple credential storage mechanisms that are potentially from third 
 parties. 
 We need the ability to eliminate the storage of passwords and secrets in 
 clear text within configuration files or within code.
 Toward that end, I propose an API that is configured using a list of URLs of 
 CredentialProviders. The implementation will look for implementations using 
 the ServiceLoader interface and thus support third party libraries.
 Two providers will be included in this patch. One using the credentials cache 
 in MapReduce jobs and the other using Java KeyStores from either HDFS or 
 local file system. 
 A CredShell CLI will also be included in this patch which provides the 
 ability to manage the credentials within the stores.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-10625) Configuration: names should be trimmed when putting/getting to properties

2014-05-28 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14011754#comment-14011754
 ] 

Wangda Tan commented on HADOOP-10625:
-

Thanks [~xgong] for reviewing this!

 Configuration: names should be trimmed when putting/getting to properties
 -

 Key: HADOOP-10625
 URL: https://issues.apache.org/jira/browse/HADOOP-10625
 Project: Hadoop Common
  Issue Type: Bug
  Components: conf
Affects Versions: 2.4.0
Reporter: Wangda Tan
Assignee: Wangda Tan
 Fix For: 2.5.0

 Attachments: HADOOP-10625.patch, HADOOP-10625.patch, 
 HADOOP-10625.patch


 Currently, Hadoop does not trim the name when putting a k/v pair into 
 properties, but names are trimmed when loading configuration from a file 
 (in Configuration.java):
 {code}
   if ("name".equals(field.getTagName()) && field.hasChildNodes())
     attr = StringInterner.weakIntern(
         ((Text)field.getFirstChild()).getData().trim());
   if ("value".equals(field.getTagName()) && field.hasChildNodes())
     value = StringInterner.weakIntern(
         ((Text)field.getFirstChild()).getData());
 {code}
 With this behavior, the following sequence is problematic:
 1. A user incorrectly sets " hadoop.key=value" (with a space before 
 hadoop.key).
 2. The user tries to get "hadoop.key" and cannot get the value.
 3. The configuration is serialized/deserialized (as MR does).
 4. The user tries to get "hadoop.key" again and now gets the value, causing 
 an inconsistency.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-10448) Support pluggable mechanism to specify proxy user settings

2014-05-28 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14011783#comment-14011783
 ] 

Arpit Agarwal commented on HADOOP-10448:


Hi [~benoyantony],

During {{ImpersonationProvider}} initialization:
{code}
  public static void authorize(UserGroupInformation user, 
  String remoteAddress) throws AuthorizationException {
if (sip==null) {
  refreshSuperUserGroupsConfiguration(); 
}
{code}

and in {{refreshSuperUserGroupsConfiguration}}
{code}
  public static void refreshSuperUserGroupsConfiguration(Configuration conf) {
    sip = getInstance(conf);
    ...
{code}

So the first few calls could be serviced by different {{ImpersonationProvider}} 
objects.

Is this acceptable behavior? If so, it should be documented.
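For reference, a minimal sketch of one way to close that window, assuming a 
volatile {{sip}} field and using the enclosing class as the lock (both are 
assumptions for illustration, not the current code):
{code}
// Hypothetical sketch: double-checked locking so that at most one
// ImpersonationProvider instance is ever published.
private static volatile ImpersonationProvider sip;

public static void authorize(UserGroupInformation user,
    String remoteAddress) throws AuthorizationException {
  if (sip == null) {
    synchronized (ProxyUsers.class) {   // assumes ProxyUsers hosts the field
      if (sip == null) {
        refreshSuperUserGroupsConfiguration();
      }
    }
  }
  sip.authorize(user, remoteAddress);
}
{code}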

 Support pluggable mechanism to specify proxy user settings
 --

 Key: HADOOP-10448
 URL: https://issues.apache.org/jira/browse/HADOOP-10448
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: security
Affects Versions: 2.3.0
Reporter: Benoy Antony
Assignee: Benoy Antony
 Attachments: HADOOP-10448.patch, HADOOP-10448.patch, 
 HADOOP-10448.patch, HADOOP-10448.patch, HADOOP-10448.patch, 
 HADOOP-10448.patch, HADOOP-10448.patch, HADOOP-10448.patch, 
 HADOOP-10448.patch, HADOOP-10448.patch


 We have a requirement to support a large number of superusers (users who 
 impersonate another user); see 
 http://hadoop.apache.org/docs/r1.2.1/Secure_Impersonation.html.
 Currently each superuser needs to be defined in core-site.xml via proxyuser 
 settings, which becomes cumbersome when there are 1000 entries.
 It seems useful to have a pluggable mechanism to specify proxy user settings, 
 with the current approach as the default. 
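 A minimal sketch of what the pluggable hook might look like (the interface 
 name and methods are assumptions for illustration):
 {code}
// Hypothetical sketch: the implementation is chosen via configuration; the
// default keeps reading proxyuser settings from core-site.xml.
public interface ImpersonationProvider extends Configurable {

  /**
   * Authorize user (a UGI carrying the real/proxy user pair) connecting
   * from remoteAddress; throw AuthorizationException to deny.
   */
  void authorize(UserGroupInformation user, String remoteAddress)
      throws AuthorizationException;
}
 {code}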



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HADOOP-10376) Refactor refresh*Protocols into a single generic refreshConfigProtocol

2014-05-28 Thread Chris Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Li updated HADOOP-10376:
--

Attachment: HADOOP-10376.patch

Updated patch with support for many handlers mapping to a single identifier.

 Refactor refresh*Protocols into a single generic refreshConfigProtocol
 --

 Key: HADOOP-10376
 URL: https://issues.apache.org/jira/browse/HADOOP-10376
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Chris Li
Assignee: Chris Li
Priority: Minor
 Attachments: HADOOP-10376.patch, HADOOP-10376.patch, 
 RefreshFrameworkProposal.pdf


 See https://issues.apache.org/jira/browse/HADOOP-10285
 There are starting to be too many refresh*Protocols. We can refactor them to 
 use a single protocol with a variable payload that selects what to do.
 The call can then return an indication of success or failure.
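 A minimal sketch of the shape this could take (type and method names are 
 illustrative assumptions):
 {code}
// Hypothetical sketch: one RPC whose string identifier routes to a
// registered handler, replacing the per-feature refresh*Protocols.
public interface RefreshHandler {
  RefreshResponse handleRefresh(String identifier, String[] args);
}

public class RefreshResponse {
  private final int returnCode;   // 0 on success, non-zero on failure
  private final String message;   // human-readable detail for the caller

  public RefreshResponse(int returnCode, String message) {
    this.returnCode = returnCode;
    this.message = message;
  }
  public int getReturnCode() { return returnCode; }
  public String getMessage() { return message; }
}
 {code}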



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HADOOP-10624) Fix some minor typos and add more test cases for hadoop_err

2014-05-28 Thread Wenwu Peng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenwu Peng updated HADOOP-10624:


Attachment: HADOOP-10624-pnative.002.patch

Thanks, Colin, for the great comments.
HADOOP-10624-pnative.002.patch addresses them.

 Fix some minor typos and add more test cases for hadoop_err
 ---

 Key: HADOOP-10624
 URL: https://issues.apache.org/jira/browse/HADOOP-10624
 Project: Hadoop Common
  Issue Type: Sub-task
Affects Versions: HADOOP-10388
Reporter: Wenwu Peng
Assignee: Wenwu Peng
 Attachments: HADOOP-10624-pnative.001.patch, 
 HADOOP-10624-pnative.002.patch


 Changes:
 1. Add more test cases to cover the methods hadoop_lerr_alloc and 
 hadoop_uverr_alloc
 2. Fix the following typos:
 1) Change hadoop_uverr_alloc(int cod to hadoop_uverr_alloc(int code in 
 hadoop_err.h
 2) Change OutOfMemory to OutOfMemoryException to be consistent with the other 
 exceptions in hadoop_err.c
 3) Change DBUG to DEBUG in messenger.c
 4) Change DBUG to DEBUG in reactor.c



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-10376) Refactor refresh*Protocols into a single generic refreshConfigProtocol

2014-05-28 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14012057#comment-14012057
 ] 

Hadoop QA commented on HADOOP-10376:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12647271/HADOOP-10376.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/3980//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/3980//console

This message is automatically generated.

 Refactor refresh*Protocols into a single generic refreshConfigProtocol
 --

 Key: HADOOP-10376
 URL: https://issues.apache.org/jira/browse/HADOOP-10376
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Chris Li
Assignee: Chris Li
Priority: Minor
 Attachments: HADOOP-10376.patch, HADOOP-10376.patch, 
 RefreshFrameworkProposal.pdf


 See https://issues.apache.org/jira/browse/HADOOP-10285
 There are starting to be too many refresh*Protocols. We can refactor them to 
 use a single protocol with a variable payload that selects what to do.
 The call can then return an indication of success or failure.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HADOOP-10635) Add a method to CryptoCodec to generate SRNs for IV

2014-05-28 Thread Alejandro Abdelnur (JIRA)
Alejandro Abdelnur created HADOOP-10635:
---

 Summary: Add a method to CryptoCodec to generate SRNs for IV
 Key: HADOOP-10635
 URL: https://issues.apache.org/jira/browse/HADOOP-10635
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: security
Affects Versions: fs-encryption (HADOOP-10150 and HDFS-6134)
Reporter: Alejandro Abdelnur
Assignee: Yi Liu


SRN (secure random number) generators are provided by crypto libraries. The 
CryptoCodec gives access to a crypto library, so it makes sense to expose the 
SRN generator on the CryptoCodec API.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-10635) Add a method to CryptoCodec to generate SRNs for IV

2014-05-28 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14012075#comment-14012075
 ] 

Alejandro Abdelnur commented on HADOOP-10635:
-

Adding a method {{byte[] generateSecureRandom(int bytes)}} would do the trick; 
the implementation could then get a SecureRandom instance from the same 
provider used to obtain the cipher.
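A minimal sketch of that idea (the {{getSecureRandom()}} hook is an assumption 
for illustration, not an existing method):
{code}
import java.security.SecureRandom;

// Hypothetical sketch: IV material comes from the same JCE provider that
// supplies the cipher, rather than from a global default SecureRandom.
public abstract class CryptoCodec {

  /** Return the requested number of secure random bytes, e.g. for an IV. */
  public byte[] generateSecureRandom(int bytes) {
    byte[] data = new byte[bytes];
    getSecureRandom().nextBytes(data);   // fill from the provider's RNG
    return data;
  }

  /** Assumed hook: a SecureRandom obtained from the codec's crypto provider. */
  protected abstract SecureRandom getSecureRandom();
}
{code}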

 Add a method to CryptoCodec to generate SRNs for IV
 ---

 Key: HADOOP-10635
 URL: https://issues.apache.org/jira/browse/HADOOP-10635
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: security
Affects Versions: fs-encryption (HADOOP-10150 and HDFS-6134)
Reporter: Alejandro Abdelnur
Assignee: Yi Liu
 Fix For: 3.0.0


 SRN (secure random number) generators are provided by crypto libraries. The 
 CryptoCodec gives access to a crypto library, so it makes sense to expose the 
 SRN generator on the CryptoCodec API.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-10448) Support pluggable mechanism to specify proxy user settings

2014-05-28 Thread Benoy Antony (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14012077#comment-14012077
 ] 

Benoy Antony commented on HADOOP-10448:
---

Thanks for pointing it out, Arpit. Daryn also mentioned this:
{quote}
During the first access or a refresh, a surge of connections may cause multiple 
instances to be created (all but the last disposed after the check), but I 
suppose that's a fringe event and the benefit outweighs it.
{quote}

I'll document this in the source code as well as in the security documentation. 
I'll post a patch soon.

 Support pluggable mechanism to specify proxy user settings
 --

 Key: HADOOP-10448
 URL: https://issues.apache.org/jira/browse/HADOOP-10448
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: security
Affects Versions: 2.3.0
Reporter: Benoy Antony
Assignee: Benoy Antony
 Attachments: HADOOP-10448.patch, HADOOP-10448.patch, 
 HADOOP-10448.patch, HADOOP-10448.patch, HADOOP-10448.patch, 
 HADOOP-10448.patch, HADOOP-10448.patch, HADOOP-10448.patch, 
 HADOOP-10448.patch, HADOOP-10448.patch


 We have a requirement to support a large number of superusers (users who 
 impersonate another user); see 
 http://hadoop.apache.org/docs/r1.2.1/Secure_Impersonation.html.
 Currently each superuser needs to be defined in core-site.xml via proxyuser 
 settings, which becomes cumbersome when there are 1000 entries.
 It seems useful to have a pluggable mechanism to specify proxy user settings, 
 with the current approach as the default. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HADOOP-10636) Native Hadoop Client: add unit test case for call

2014-05-28 Thread Wenwu Peng (JIRA)
Wenwu Peng created HADOOP-10636:
---

 Summary: Native Hadoop Client: add unit test case for call
 Key: HADOOP-10636
 URL: https://issues.apache.org/jira/browse/HADOOP-10636
 Project: Hadoop Common
  Issue Type: Sub-task
Affects Versions: HADOOP-10388
Reporter: Wenwu Peng
Assignee: Wenwu Peng






--
This message was sent by Atlassian JIRA
(v6.2#6252)