[jira] [Created] (HADOOP-10633) use Time#monotonicNow to avoid system clock reset
Liang Xie created HADOOP-10633: -- Summary: use Time#monotonicNow to avoid system clock reset Key: HADOOP-10633 URL: https://issues.apache.org/jira/browse/HADOOP-10633 Project: Hadoop Common Issue Type: Improvement Components: io, security Affects Versions: 3.0.0 Reporter: Liang Xie Assignee: Liang Xie let's replace System#currentTimeMillis with Time#monotonicNow -- This message was sent by Atlassian JIRA (v6.2#6252)
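A minimal sketch of the motivation behind this change, assuming only that org.apache.hadoop.util.Time#monotonicNow wraps System#nanoTime (the Thread.sleep body is purely illustrative):

{code}
import org.apache.hadoop.util.Time;

public class ElapsedTimeExample {
  public static void main(String[] args) throws InterruptedException {
    // Fragile: a system clock reset (e.g. an NTP step) between these two
    // reads can make the computed duration negative or absurdly large.
    long wallStart = System.currentTimeMillis();
    Thread.sleep(100);
    long wallElapsed = System.currentTimeMillis() - wallStart;

    // Robust: monotonicNow() is based on System.nanoTime(), which only
    // moves forward, so elapsed-time arithmetic survives clock resets.
    long monoStart = Time.monotonicNow();
    Thread.sleep(100);
    long monoElapsed = Time.monotonicNow() - monoStart;

    System.out.println(wallElapsed + " ms vs " + monoElapsed + " ms");
  }
}
{code}

Note that monotonicNow() is only meaningful for measuring durations; it is not a wall-clock timestamp.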
[jira] [Updated] (HADOOP-10633) use Time#monotonicNow to avoid system clock reset
[ https://issues.apache.org/jira/browse/HADOOP-10633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang Xie updated HADOOP-10633: --- Status: Patch Available (was: Open) use Time#monotonicNow to avoid system clock reset - Key: HADOOP-10633 URL: https://issues.apache.org/jira/browse/HADOOP-10633 Project: Hadoop Common Issue Type: Improvement Components: io, security Affects Versions: 3.0.0 Reporter: Liang Xie Assignee: Liang Xie Attachments: HADOOP-10633.txt let's replace System#currentTimeMillis with Time#monotonicNow -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HADOOP-10633) use Time#monotonicNow to avoid system clock reset
[ https://issues.apache.org/jira/browse/HADOOP-10633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang Xie updated HADOOP-10633: --- Attachment: HADOOP-10633.txt use Time#monotonicNow to avoid system clock reset - Key: HADOOP-10633 URL: https://issues.apache.org/jira/browse/HADOOP-10633 Project: Hadoop Common Issue Type: Improvement Components: io, security Affects Versions: 3.0.0 Reporter: Liang Xie Assignee: Liang Xie Attachments: HADOOP-10633.txt let's replace System#currentTimeMillis with Time#monotonicNow -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HADOOP-10633) use Time#monotonicNow to avoid system clock reset
[ https://issues.apache.org/jira/browse/HADOOP-10633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010866#comment-14010866 ] Hadoop QA commented on HADOOP-10633: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12647069/HADOOP-10633.txt against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/3978//testReport/ Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/3978//console This message is automatically generated. use Time#monotonicNow to avoid system clock reset - Key: HADOOP-10633 URL: https://issues.apache.org/jira/browse/HADOOP-10633 Project: Hadoop Common Issue Type: Improvement Components: io, security Affects Versions: 3.0.0 Reporter: Liang Xie Assignee: Liang Xie Attachments: HADOOP-10633.txt let's replace System#currentTimeMillis with Time#monotonicNow -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HADOOP-10632) Minor improvements to Crypto input and output streams
[ https://issues.apache.org/jira/browse/HADOOP-10632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Liu updated HADOOP-10632: Attachment: HADOOP-10632.patch [~tucu00], thanks for your nice comments. The new patch includes updates for all your comments except the last item, which I want to discuss with you. {quote} CryptoInputStream#decrypt(long position, ...) method: given that this method does not change the current position of the stream, wouldn’t it be simpler to create a new decryptor and use a different set of input/output buffers without touching the stream ones? We could also use instance vars for them and init them the first time this method is called (if it is). {quote} I think each approach has its advantages. The approach here doesn’t touch the stream buffers, but it needs to create a new decryptor, a different set of input/output buffers, a key, an IV, and some other instance vars. The logic of the original one is not complicated either: it needs to restore the {{outBuffer}} and {{decryptor}}, but it is much less code. Alejandro, I’d personally like to keep the original one. Do you see a potential issue? Minor improvements to Crypto input and output streams - Key: HADOOP-10632 URL: https://issues.apache.org/jira/browse/HADOOP-10632 Project: Hadoop Common Issue Type: Sub-task Components: security Affects Versions: fs-encryption (HADOOP-10150 and HDFS-6134) Reporter: Alejandro Abdelnur Assignee: Yi Liu Fix For: 3.0.0 Attachments: HADOOP-10632.patch Minor follow up feedback on the crypto streams -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HADOOP-10632) Minor improvements to Crypto input and output streams
[ https://issues.apache.org/jira/browse/HADOOP-10632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Liu updated HADOOP-10632: Attachment: HADOOP-10632.1.patch Hi Alejandro, in the new patch, for the {{CryptoInputStream#decrypt(long position, ...)}} method, I use a different output buffer to avoid restoring {{outBuffer}}, which could cause a small performance hit from byte copying. For the decryptor, it still uses the existing one. Minor improvements to Crypto input and output streams - Key: HADOOP-10632 URL: https://issues.apache.org/jira/browse/HADOOP-10632 Project: Hadoop Common Issue Type: Sub-task Components: security Affects Versions: fs-encryption (HADOOP-10150 and HDFS-6134) Reporter: Alejandro Abdelnur Assignee: Yi Liu Fix For: 3.0.0 Attachments: HADOOP-10632.1.patch, HADOOP-10632.patch Minor follow up feedback on the crypto streams -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HADOOP-9704) Write metrics sink plugin for Hadoop/Graphite
[ https://issues.apache.org/jira/browse/HADOOP-9704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14011294#comment-14011294 ] Babak Behzad commented on HADOOP-9704: -- Thanks Ravi. So, in order to enable this, one has to put the following three lines in etc/hadoop-metrics2.properties for each of the contexts (namenode, datanode, resource manager, etc.): namenode.sink.graphite.server_host=localhost namenode.sink.graphite.server_port=2003 namenode.sink.graphite.metrics_prefix=test.namenode Looking at the logs and source code, the socket connection is only created once, in the init() function, for each of the contexts you enable, so at most 5 new connections are created. This is exactly the same as other sinks such as FileSink and GangliaSink, and I don't think it is an issue. Regarding a slow Graphite server, that's a good point. We will be able to test this soon, but again, looking at the current Hadoop code, Ganglia's sink uses DatagramSocket instead of Socket; I am not sure whether we will need to do that later if we see problems with the current code. Write metrics sink plugin for Hadoop/Graphite - Key: HADOOP-9704 URL: https://issues.apache.org/jira/browse/HADOOP-9704 Project: Hadoop Common Issue Type: New Feature Affects Versions: 2.0.3-alpha Reporter: Chu Tong Attachments: 0001-HADOOP-9704.-Write-metrics-sink-plugin-for-Hadoop-Gr.patch, HADOOP-9704.patch, HADOOP-9704.patch, Hadoop-9704.patch Write a metrics sink plugin for Hadoop to send metrics directly to Graphite in addition to the current ganglia and file ones. -- This message was sent by Atlassian JIRA (v6.2#6252)
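For readers unfamiliar with the wire format behind the config above, here is a hedged sketch of the Graphite plaintext protocol that a sink on server_port 2003 would speak (the class and metric names are illustrative, not the patch's actual API, and it assumes a Graphite server listening on localhost:2003):

{code}
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.net.Socket;

public class GraphiteLineExample {
  public static void main(String[] args) throws IOException {
    Socket socket = new Socket("localhost", 2003);
    Writer out = new OutputStreamWriter(socket.getOutputStream(), "UTF-8");
    try {
      // Graphite plaintext protocol: "<prefix>.<metric> <value> <epoch-seconds>\n"
      long now = System.currentTimeMillis() / 1000;
      out.write("test.namenode.jvm.memHeapUsedM 42.5 " + now + "\n");
      out.flush();
    } finally {
      out.close();
      socket.close();
    }
  }
}
{code}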
[jira] [Commented] (HADOOP-10625) Configuration: names should be trimmed when putting/getting to properties
[ https://issues.apache.org/jira/browse/HADOOP-10625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14011326#comment-14011326 ] Xuan Gong commented on HADOOP-10625: Committed to trunk, branch-2. Thanks Wangda! Configuration: names should be trimmed when putting/getting to properties - Key: HADOOP-10625 URL: https://issues.apache.org/jira/browse/HADOOP-10625 Project: Hadoop Common Issue Type: Bug Components: conf Affects Versions: 2.4.0 Reporter: Wangda Tan Assignee: Wangda Tan Fix For: 2.5.0 Attachments: HADOOP-10625.patch, HADOOP-10625.patch, HADOOP-10625.patch Currently, Hadoop does not trim the name when putting a k/v pair into properties, but when loading configuration from a file, names are trimmed (in Configuration.java): {code} if ("name".equals(field.getTagName()) && field.hasChildNodes()) attr = StringInterner.weakIntern( ((Text)field.getFirstChild()).getData().trim()); if ("value".equals(field.getTagName()) && field.hasChildNodes()) value = StringInterner.weakIntern( ((Text)field.getFirstChild()).getData()); {code} With this behavior, the following steps are problematic: 1. A user incorrectly sets hadoop.key=value (with a space before hadoop.key) 2. The user tries to get hadoop.key and cannot get the value 3. The configuration is serialized/deserialized (as is done in MR) 4. The user tries to get hadoop.key and now gets the value, which creates an inconsistency. -- This message was sent by Atlassian JIRA (v6.2#6252)
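A hedged illustration of the inconsistency described above (a hypothetical driver, not part of the patch): before this fix, a name stored via set() with a stray leading space is invisible to a trimmed get(), while the same name loaded from XML would have been trimmed.

{code}
import org.apache.hadoop.conf.Configuration;

public class TrimDemo {
  public static void main(String[] args) {
    Configuration conf = new Configuration(false);
    conf.set(" hadoop.key", "value");            // note the leading space
    System.out.println(conf.get("hadoop.key"));  // null before the fix
    System.out.println(conf.get(" hadoop.key")); // "value" before the fix
  }
}
{code}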
[jira] [Updated] (HADOOP-10625) Configuration: names should be trimmed when putting/getting to properties
[ https://issues.apache.org/jira/browse/HADOOP-10625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated HADOOP-10625: --- Resolution: Fixed Fix Version/s: 2.5.0 Status: Resolved (was: Patch Available) Configuration: names should be trimmed when putting/getting to properties - Key: HADOOP-10625 URL: https://issues.apache.org/jira/browse/HADOOP-10625 Project: Hadoop Common Issue Type: Bug Components: conf Affects Versions: 2.4.0 Reporter: Wangda Tan Assignee: Wangda Tan Fix For: 2.5.0 Attachments: HADOOP-10625.patch, HADOOP-10625.patch, HADOOP-10625.patch Currently, Hadoop does not trim the name when putting a k/v pair into properties, but when loading configuration from a file, names are trimmed (in Configuration.java): {code} if ("name".equals(field.getTagName()) && field.hasChildNodes()) attr = StringInterner.weakIntern( ((Text)field.getFirstChild()).getData().trim()); if ("value".equals(field.getTagName()) && field.hasChildNodes()) value = StringInterner.weakIntern( ((Text)field.getFirstChild()).getData()); {code} With this behavior, the following steps are problematic: 1. A user incorrectly sets hadoop.key=value (with a space before hadoop.key) 2. The user tries to get hadoop.key and cannot get the value 3. The configuration is serialized/deserialized (as is done in MR) 4. The user tries to get hadoop.key and now gets the value, which creates an inconsistency. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HADOOP-10632) Minor improvements to Crypto input and output streams
[ https://issues.apache.org/jira/browse/HADOOP-10632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14011344#comment-14011344 ] Alejandro Abdelnur commented on HADOOP-10632: - Thanks Yi, looks good; a few follow-ups for your consideration: CryptoInputStream#freeBuffers(): not sure we should tap into SUN internal APIs here. Do we gain something by cleaning up the buffers? And if we do, we should first check that the DB is a sun.nio.ch.DB instance. (Sorry, I missed this one in my first pass.) CryptoInputStream#decrypt(): you changed the signature to take the outBuffer as a param; wouldn’t it make sense to take both input and output buffers as parameters then? That way, the usage from the read-pos would be simpler. What I’m thinking is that the read-pos/readFully should leave alone all instance vars related to normal stream reading and use a completely different set of vars that are instantiated on their first use and reset before every use. CryptoInputStream, read-pos() and readFully(): wouldn’t we have to consider the padding based on the requested pos there? If so, the decrypt(), following the previous comment, would have to receive the padding as a param as well, no? On failing the Crypto streams constructors if the buffer size is not a multiple of the block size: I thought about that initially, but it seemed too strong a requirement to me; that is why I suggested flooring the requested buffer size to the closest previous block-size multiple. Minor improvements to Crypto input and output streams - Key: HADOOP-10632 URL: https://issues.apache.org/jira/browse/HADOOP-10632 Project: Hadoop Common Issue Type: Sub-task Components: security Affects Versions: fs-encryption (HADOOP-10150 and HDFS-6134) Reporter: Alejandro Abdelnur Assignee: Yi Liu Fix For: 3.0.0 Attachments: HADOOP-10632.1.patch, HADOOP-10632.patch Minor follow up feedback on the crypto streams -- This message was sent by Atlassian JIRA (v6.2#6252)
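On the padding question, a small worked sketch of the arithmetic involved (hypothetical helper names; this is the general AES-CTR offset math assumed by the fs-encryption work, not the patch itself): a positioned read must advance the counter to the block containing the requested pos and then discard the leading bytes of that block.

{code}
public final class CtrOffsets {
  private static final int BLOCK_SIZE = 16; // AES block size in bytes

  /** Counter value for the block containing {@code position}. */
  static long counterFor(long position) {
    return position / BLOCK_SIZE;
  }

  /** Bytes of the first decrypted block to discard (the "padding"). */
  static int paddingFor(long position) {
    return (int) (position % BLOCK_SIZE);
  }

  public static void main(String[] args) {
    long pos = 100;
    System.out.println(counterFor(pos)); // 6: skip six whole blocks
    System.out.println(paddingFor(pos)); // 4: discard 4 bytes of block 6
  }
}
{code}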
[jira] [Commented] (HADOOP-10630) Possible race condition in RetryInvocationHandler
[ https://issues.apache.org/jira/browse/HADOOP-10630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14011352#comment-14011352 ] Kihwal Lee commented on HADOOP-10630: - The patch looks reasonable. Did you have a chance to verify that it fixes the issue? Possible race condition in RetryInvocationHandler - Key: HADOOP-10630 URL: https://issues.apache.org/jira/browse/HADOOP-10630 Project: Hadoop Common Issue Type: Bug Reporter: Jing Zhao Assignee: Jing Zhao Attachments: HADOOP-10630.000.patch In one of our system tests with a NameNode HA setup, we ran 300 threads in LoadGenerator. While one of the NameNodes was already in the active state and had started to serve, we still saw one of the client threads fail all its retries in a 20-second window. In the meantime, we saw a lot of the following warning message in the log: {noformat} WARN retry.RetryInvocationHandler: A failover has occurred since the start of this method invocation attempt. {noformat} After checking the code, we see the following code in RetryInvocationHandler: {code} while (true) { // The number of times this invocation handler has ever been failed over, // before this method invocation attempt. Used to prevent concurrent // failed method invocations from triggering multiple failover attempts. long invocationAttemptFailoverCount; synchronized (proxyProvider) { invocationAttemptFailoverCount = proxyProviderFailoverCount; } .. if (action.action == RetryAction.RetryDecision.FAILOVER_AND_RETRY) { // Make sure that concurrent failed method invocations only cause a // single actual fail over. synchronized (proxyProvider) { if (invocationAttemptFailoverCount == proxyProviderFailoverCount) { proxyProvider.performFailover(currentProxy.proxy); proxyProviderFailoverCount++; currentProxy = proxyProvider.getProxy(); } else { LOG.warn("A failover has occurred since the start of this method " + "invocation attempt."); } } invocationFailoverCount++; } .. {code} We refresh the value of currentProxy only when the thread performs the failover (while holding the monitor of the proxyProvider). Because currentProxy is not volatile, a thread that does not perform the failover (in which case it logs the warning message) may fail to see the new value of currentProxy. -- This message was sent by Atlassian JIRA (v6.2#6252)
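A minimal sketch of the visibility hazard and the obvious remedies (class and field names are hypothetical, mirroring the snippet above): either declare the shared field volatile, or re-read it under the same lock that guards the failover.

{code}
class ProxyHolder<T> {
  // Without volatile, a thread that loses the failover race (and therefore
  // only logs the warning) may retry using a stale cached copy of
  // currentProxy. Declaring the field volatile, or reading it inside the
  // same synchronized (proxyProvider) block that writes it, guarantees the
  // retrying thread sees the proxy installed by the failover thread.
  private volatile T currentProxy;

  T get() { return currentProxy; }

  void set(T proxy) { currentProxy = proxy; }
}
{code}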
[jira] [Commented] (HADOOP-10631) Native Hadoop Client: Add missing output in GenerateProtobufs.cmake
[ https://issues.apache.org/jira/browse/HADOOP-10631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14011365#comment-14011365 ] Colin Patrick McCabe commented on HADOOP-10631: --- +1. Thanks, Binglin. Native Hadoop Client: Add missing output in GenerateProtobufs.cmake --- Key: HADOOP-10631 URL: https://issues.apache.org/jira/browse/HADOOP-10631 Project: Hadoop Common Issue Type: Sub-task Affects Versions: HADOOP-10388 Reporter: Binglin Chang Assignee: Binglin Chang Priority: Trivial Attachments: HADOOP-10631.v1.patch In GenerateProtobufs.cmake, pb-c.h.s files are not added to output, so when make clean is called, those files are not cleaned. {code} add_custom_command( OUTPUT ${PB_C_FILE} ${PB_H_FILE} ${CALL_C_FILE} ${CALL_H_FILE} {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HADOOP-10631) Native Hadoop Client: make clean should remove pb-c.h.s files
[ https://issues.apache.org/jira/browse/HADOOP-10631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HADOOP-10631: -- Summary: Native Hadoop Client: make clean should remove pb-c.h.s files (was: Native Hadoop Client: Add missing output in GenerateProtobufs.cmake) Native Hadoop Client: make clean should remove pb-c.h.s files - Key: HADOOP-10631 URL: https://issues.apache.org/jira/browse/HADOOP-10631 Project: Hadoop Common Issue Type: Sub-task Affects Versions: HADOOP-10388 Reporter: Binglin Chang Assignee: Binglin Chang Priority: Trivial Attachments: HADOOP-10631.v1.patch In GenerateProtobufs.cmake, pb-c.h.s files are not added to output, so when make clean is called, those files are not cleaned. {code} add_custom_command( OUTPUT ${PB_C_FILE} ${PB_H_FILE} ${CALL_C_FILE} ${CALL_H_FILE} {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HADOOP-10630) Possible race condition in RetryInvocationHandler
[ https://issues.apache.org/jira/browse/HADOOP-10630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14011364#comment-14011364 ] Jing Zhao commented on HADOOP-10630: Not yet. Actually, the issue cannot easily be reproduced, since by default the client will retry/failover 10 times. I will decrease the retry number and rerun the test with/without the patch in the next few days. Possible race condition in RetryInvocationHandler - Key: HADOOP-10630 URL: https://issues.apache.org/jira/browse/HADOOP-10630 Project: Hadoop Common Issue Type: Bug Reporter: Jing Zhao Assignee: Jing Zhao Attachments: HADOOP-10630.000.patch In one of our system tests with a NameNode HA setup, we ran 300 threads in LoadGenerator. While one of the NameNodes was already in the active state and had started to serve, we still saw one of the client threads fail all its retries in a 20-second window. In the meantime, we saw a lot of the following warning message in the log: {noformat} WARN retry.RetryInvocationHandler: A failover has occurred since the start of this method invocation attempt. {noformat} After checking the code, we see the following code in RetryInvocationHandler: {code} while (true) { // The number of times this invocation handler has ever been failed over, // before this method invocation attempt. Used to prevent concurrent // failed method invocations from triggering multiple failover attempts. long invocationAttemptFailoverCount; synchronized (proxyProvider) { invocationAttemptFailoverCount = proxyProviderFailoverCount; } .. if (action.action == RetryAction.RetryDecision.FAILOVER_AND_RETRY) { // Make sure that concurrent failed method invocations only cause a // single actual fail over. synchronized (proxyProvider) { if (invocationAttemptFailoverCount == proxyProviderFailoverCount) { proxyProvider.performFailover(currentProxy.proxy); proxyProviderFailoverCount++; currentProxy = proxyProvider.getProxy(); } else { LOG.warn("A failover has occurred since the start of this method " + "invocation attempt."); } } invocationFailoverCount++; } .. {code} We refresh the value of currentProxy only when the thread performs the failover (while holding the monitor of the proxyProvider). Because currentProxy is not volatile, a thread that does not perform the failover (in which case it logs the warning message) may fail to see the new value of currentProxy. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HADOOP-10631) Native Hadoop Client: make clean should remove pb-c.h.s files
[ https://issues.apache.org/jira/browse/HADOOP-10631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HADOOP-10631: -- Resolution: Fixed Fix Version/s: HADOOP-10388 Status: Resolved (was: Patch Available) Native Hadoop Client: make clean should remove pb-c.h.s files - Key: HADOOP-10631 URL: https://issues.apache.org/jira/browse/HADOOP-10631 Project: Hadoop Common Issue Type: Sub-task Affects Versions: HADOOP-10388 Reporter: Binglin Chang Assignee: Binglin Chang Priority: Trivial Fix For: HADOOP-10388 Attachments: HADOOP-10631.v1.patch In GenerateProtobufs.cmake, pb-c.h.s files are not added to output, so when make clean is called, those files are not cleaned. {code} add_custom_command( OUTPUT ${PB_C_FILE} ${PB_H_FILE} ${CALL_C_FILE} ${CALL_H_FILE} {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HADOOP-10634) Add recursive list apis to FileSystem to give implementations an opportunity for optimization
Sumit Kumar created HADOOP-10634: Summary: Add recursive list apis to FileSystem to give implementations an opportunity for optimization Key: HADOOP-10634 URL: https://issues.apache.org/jira/browse/HADOOP-10634 Project: Hadoop Common Issue Type: Improvement Components: fs/s3 Reporter: Sumit Kumar Fix For: 2.4.0 Currently, different code flows in hadoop use recursive listing to discover files/folders in a given path. For example, in FileInputFormat (both the mapreduce and mapred implementations) this is done while calculating splits. They do this by listing level by level: to discover files in /foo/bar, they first list /foo/bar to get the immediate children, then make the same call on each immediate child of /foo/bar to discover its immediate children, and so on. This doesn't scale well for fs implementations like s3, because every listStatus call ends up being a webservice call to s3. In cases where a large number of files are considered for input, this makes the getSplits() call slow. This patch adds a new set of recursive list apis that give the s3 fs implementation an opportunity to optimize. The behavior remains the same for other implementations (a default implementation is provided, so other fs implementations don't have to implement anything new). For s3, however, it provides a simple change (as shown in the patch) to improve listing performance. -- This message was sent by Atlassian JIRA (v6.2#6252)
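For context, a hedged sketch of what the level-by-level default could look like when hoisted into a single API (listStatusRecursively is an illustrative name, not necessarily the one in the patch); a flat store like s3 can override it with one prefix query instead of one call per directory:

{code}
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public final class RecursiveListing {
  /** Default implementation: iterative level-by-level traversal. */
  public static List<FileStatus> listStatusRecursively(FileSystem fs, Path root)
      throws IOException {
    List<FileStatus> results = new ArrayList<FileStatus>();
    Deque<Path> dirs = new ArrayDeque<Path>();
    dirs.push(root);
    while (!dirs.isEmpty()) {
      for (FileStatus stat : fs.listStatus(dirs.pop())) {
        results.add(stat);
        if (stat.isDirectory()) {      // descend into subdirectories
          dirs.push(stat.getPath());
        }
      }
    }
    return results;
  }
}
{code}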
[jira] [Commented] (HADOOP-10624) Fix some minor typos and add more test cases for hadoop_err
[ https://issues.apache.org/jira/browse/HADOOP-10624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14011381#comment-14011381 ] Colin Patrick McCabe commented on HADOOP-10624: --- Looks good overall. Can you get rid of the switch statements and just pass in the values that you want to test? For example, you could have: {code} hadoop_lerr_alloc_test(RUNTIME_EXCEPTION_ERROR_CODE, "org.apache.hadoop.native.HadoopCore.RuntimeException: "); {code} and then have {{hadoop_lerr_alloc_test}} call {{hadoop_lerr_alloc}} and check the result of {{hadoop_err_msg}} against the string that was passed in. It makes more sense to me to pass in the string than to have a case statement like that. Fix some minor typos and add more test cases for hadoop_err --- Key: HADOOP-10624 URL: https://issues.apache.org/jira/browse/HADOOP-10624 Project: Hadoop Common Issue Type: Sub-task Affects Versions: HADOOP-10388 Reporter: Wenwu Peng Assignee: Wenwu Peng Attachments: HADOOP-10624-pnative.001.patch Changes: 1. Add more test cases to cover the methods hadoop_lerr_alloc and hadoop_uverr_alloc 2. Fix typos as follows: 1) Change hadoop_uverr_alloc(int cod to hadoop_uverr_alloc(int code in hadoop_err.h 2) Change OutOfMemory to OutOfMemoryException to be consistent with other Exceptions in hadoop_err.c 3) Change DBUG to DEBUG in messenger.c 4) Change DBUG to DEBUG in reactor.c -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HADOOP-10630) Possible race condition in RetryInvocationHandler
[ https://issues.apache.org/jira/browse/HADOOP-10630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14011387#comment-14011387 ] Suresh Srinivas commented on HADOOP-10630: -- +1 for the patch, once the failover tests pass. Possible race condition in RetryInvocationHandler - Key: HADOOP-10630 URL: https://issues.apache.org/jira/browse/HADOOP-10630 Project: Hadoop Common Issue Type: Bug Reporter: Jing Zhao Assignee: Jing Zhao Attachments: HADOOP-10630.000.patch In one of our system tests with a NameNode HA setup, we ran 300 threads in LoadGenerator. While one of the NameNodes was already in the active state and had started to serve, we still saw one of the client threads fail all its retries in a 20-second window. In the meantime, we saw a lot of the following warning message in the log: {noformat} WARN retry.RetryInvocationHandler: A failover has occurred since the start of this method invocation attempt. {noformat} After checking the code, we see the following code in RetryInvocationHandler: {code} while (true) { // The number of times this invocation handler has ever been failed over, // before this method invocation attempt. Used to prevent concurrent // failed method invocations from triggering multiple failover attempts. long invocationAttemptFailoverCount; synchronized (proxyProvider) { invocationAttemptFailoverCount = proxyProviderFailoverCount; } .. if (action.action == RetryAction.RetryDecision.FAILOVER_AND_RETRY) { // Make sure that concurrent failed method invocations only cause a // single actual fail over. synchronized (proxyProvider) { if (invocationAttemptFailoverCount == proxyProviderFailoverCount) { proxyProvider.performFailover(currentProxy.proxy); proxyProviderFailoverCount++; currentProxy = proxyProvider.getProxy(); } else { LOG.warn("A failover has occurred since the start of this method " + "invocation attempt."); } } invocationFailoverCount++; } .. {code} We refresh the value of currentProxy only when the thread performs the failover (while holding the monitor of the proxyProvider). Because currentProxy is not volatile, a thread that does not perform the failover (in which case it logs the warning message) may fail to see the new value of currentProxy. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HADOOP-10625) Configuration: names should be trimmed when putting/getting to properties
[ https://issues.apache.org/jira/browse/HADOOP-10625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14011432#comment-14011432 ] Hudson commented on HADOOP-10625: - SUCCESS: Integrated in Hadoop-trunk-Commit #5616 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5616/]) HADOOP-10625. Trim configuration names when putting/getting them to properties (xgong: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1598072) * /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/conf/Configuration.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/conf/TestConfiguration.java Configuration: names should be trimmed when putting/getting to properties - Key: HADOOP-10625 URL: https://issues.apache.org/jira/browse/HADOOP-10625 Project: Hadoop Common Issue Type: Bug Components: conf Affects Versions: 2.4.0 Reporter: Wangda Tan Assignee: Wangda Tan Fix For: 2.5.0 Attachments: HADOOP-10625.patch, HADOOP-10625.patch, HADOOP-10625.patch Currently, Hadoop does not trim the name when putting a k/v pair into properties, but when loading configuration from a file, names are trimmed (in Configuration.java): {code} if ("name".equals(field.getTagName()) && field.hasChildNodes()) attr = StringInterner.weakIntern( ((Text)field.getFirstChild()).getData().trim()); if ("value".equals(field.getTagName()) && field.hasChildNodes()) value = StringInterner.weakIntern( ((Text)field.getFirstChild()).getData()); {code} With this behavior, the following steps are problematic: 1. A user incorrectly sets hadoop.key=value (with a space before hadoop.key) 2. The user tries to get hadoop.key and cannot get the value 3. The configuration is serialized/deserialized (as is done in MR) 4. The user tries to get hadoop.key and now gets the value, which creates an inconsistency. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HADOOP-10628) Javadoc and a few code style improvements for Crypto input and output streams
[ https://issues.apache.org/jira/browse/HADOOP-10628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14011440#comment-14011440 ] Charles Lamb commented on HADOOP-10628: --- +1. Thanks Yi. Javadoc and a few code style improvements for Crypto input and output streams -- Key: HADOOP-10628 URL: https://issues.apache.org/jira/browse/HADOOP-10628 Project: Hadoop Common Issue Type: Sub-task Components: security Affects Versions: fs-encryption (HADOOP-10150 and HDFS-6134) Reporter: Yi Liu Assignee: Yi Liu Fix For: fs-encryption (HADOOP-10150 and HDFS-6134) Attachments: HADOOP-10628.patch There are some additional comments from [~clamb] related to javadoc and a few code style items on HADOOP-10603; let's fix them in this follow-on JIRA. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HADOOP-10634) Add recursive list apis to FileSystem to give implementations an opportunity for optimization
[ https://issues.apache.org/jira/browse/HADOOP-10634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sumit Kumar updated HADOOP-10634: - Attachment: HADOOP-10634.patch Add recursive list apis to FileSystem to give implementations an opportunity for optimization - Key: HADOOP-10634 URL: https://issues.apache.org/jira/browse/HADOOP-10634 Project: Hadoop Common Issue Type: Improvement Components: fs/s3 Reporter: Sumit Kumar Fix For: 2.4.0 Attachments: HADOOP-10634.patch Currently, different code flows in hadoop use recursive listing to discover files/folders in a given path. For example, in FileInputFormat (both the mapreduce and mapred implementations) this is done while calculating splits. They do this by listing level by level: to discover files in /foo/bar, they first list /foo/bar to get the immediate children, then make the same call on each immediate child of /foo/bar to discover its immediate children, and so on. This doesn't scale well for fs implementations like s3, because every listStatus call ends up being a webservice call to s3. In cases where a large number of files are considered for input, this makes the getSplits() call slow. This patch adds a new set of recursive list apis that give the s3 fs implementation an opportunity to optimize. The behavior remains the same for other implementations (a default implementation is provided, so other fs implementations don't have to implement anything new). For s3, however, it provides a simple change (as shown in the patch) to improve listing performance. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HADOOP-10634) Add recursive list apis to FileSystem to give implementations an opportunity for optimization
[ https://issues.apache.org/jira/browse/HADOOP-10634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sumit Kumar updated HADOOP-10634: - Fix Version/s: (was: 2.4.0) Affects Version/s: 2.4.0 Status: Patch Available (was: Open) Attached a patch that passes all the tests on top of the hadoop 2.4.0 branch. Add recursive list apis to FileSystem to give implementations an opportunity for optimization - Key: HADOOP-10634 URL: https://issues.apache.org/jira/browse/HADOOP-10634 Project: Hadoop Common Issue Type: Improvement Components: fs/s3 Affects Versions: 2.4.0 Reporter: Sumit Kumar Attachments: HADOOP-10634.patch Currently, different code flows in hadoop use recursive listing to discover files/folders in a given path. For example, in FileInputFormat (both the mapreduce and mapred implementations) this is done while calculating splits. They do this by listing level by level: to discover files in /foo/bar, they first list /foo/bar to get the immediate children, then make the same call on each immediate child of /foo/bar to discover its immediate children, and so on. This doesn't scale well for fs implementations like s3, because every listStatus call ends up being a webservice call to s3. In cases where a large number of files are considered for input, this makes the getSplits() call slow. This patch adds a new set of recursive list apis that give the s3 fs implementation an opportunity to optimize. The behavior remains the same for other implementations (a default implementation is provided, so other fs implementations don't have to implement anything new). For s3, however, it provides a simple change (as shown in the patch) to improve listing performance. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HADOOP-10632) Minor improvements to Crypto input and output streams
[ https://issues.apache.org/jira/browse/HADOOP-10632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14011509#comment-14011509 ] Alejandro Abdelnur commented on HADOOP-10632: - Also, the {{CryptoCodec}} has some javadoc links that are incorrect: when referring to classes, don't prefix the class name with #. Minor improvements to Crypto input and output streams - Key: HADOOP-10632 URL: https://issues.apache.org/jira/browse/HADOOP-10632 Project: Hadoop Common Issue Type: Sub-task Components: security Affects Versions: fs-encryption (HADOOP-10150 and HDFS-6134) Reporter: Alejandro Abdelnur Assignee: Yi Liu Fix For: 3.0.0 Attachments: HADOOP-10632.1.patch, HADOOP-10632.patch Minor follow up feedback on the crypto streams -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HADOOP-10634) Add recursive list apis to FileSystem to give implementations an opportunity for optimization
[ https://issues.apache.org/jira/browse/HADOOP-10634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14011531#comment-14011531 ] Hadoop QA commented on HADOOP-10634: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12647191/HADOOP-10634.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/3979//testReport/ Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/3979//console This message is automatically generated. Add recursive list apis to FileSystem to give implementations an opportunity for optimization - Key: HADOOP-10634 URL: https://issues.apache.org/jira/browse/HADOOP-10634 Project: Hadoop Common Issue Type: Improvement Components: fs/s3 Affects Versions: 2.4.0 Reporter: Sumit Kumar Attachments: HADOOP-10634.patch Currently, different code flows in hadoop use recursive listing to discover files/folders in a given path. For example, in FileInputFormat (both the mapreduce and mapred implementations) this is done while calculating splits. They do this by listing level by level: to discover files in /foo/bar, they first list /foo/bar to get the immediate children, then make the same call on each immediate child of /foo/bar to discover its immediate children, and so on. This doesn't scale well for fs implementations like s3, because every listStatus call ends up being a webservice call to s3. In cases where a large number of files are considered for input, this makes the getSplits() call slow. This patch adds a new set of recursive list apis that give the s3 fs implementation an opportunity to optimize. The behavior remains the same for other implementations (a default implementation is provided, so other fs implementations don't have to implement anything new). For s3, however, it provides a simple change (as shown in the patch) to improve listing performance. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HADOOP-10607) Create an API to Separate Credentials/Password Storage from Applications
[ https://issues.apache.org/jira/browse/HADOOP-10607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14011563#comment-14011563 ] Owen O'Malley commented on HADOOP-10607: I think that it would be good to add a method in Configuration that is getPassword(String key). That method will do the credential provider lookup and translate it. Perhaps we should have the identity credential provider log a warning when it is invoked so that admins are aware when they have plaintext passwords in their config files. I think that the right final state is where you only have unadorned aliases where there are currently secrets. Create an API to Separate Credentials/Password Storage from Applications Key: HADOOP-10607 URL: https://issues.apache.org/jira/browse/HADOOP-10607 Project: Hadoop Common Issue Type: New Feature Components: security Reporter: Larry McCay Assignee: Larry McCay Fix For: 3.0.0 Attachments: 10607-2.patch, 10607-3.patch, 10607-4.patch, 10607-5.patch, 10607.patch As with the filesystem API, we need to provide a generic mechanism to support multiple credential storage mechanisms that are potentially from third parties. We need the ability to eliminate the storage of passwords and secrets in clear text within configuration files or within code. Toward that end, I propose an API that is configured using a list of URLs of CredentialProviders. The implementation will look for implementations using the ServiceLoader interface and thus support third party libraries. Two providers will be included in this patch. One using the credentials cache in MapReduce jobs and the other using Java KeyStores from either HDFS or local file system. A CredShell CLI will also be included in this patch which provides the ability to manage the credentials within the stores. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HADOOP-10566) Refactor proxyservers out of ProxyUsers
[ https://issues.apache.org/jira/browse/HADOOP-10566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HADOOP-10566: --- Resolution: Fixed Fix Version/s: 2.5.0 3.0.0 Target Version/s: 2.5.0 Status: Resolved (was: Patch Available) I committed this to branch-2. Refactor proxyservers out of ProxyUsers --- Key: HADOOP-10566 URL: https://issues.apache.org/jira/browse/HADOOP-10566 Project: Hadoop Common Issue Type: Sub-task Components: security Affects Versions: 2.4.0 Reporter: Benoy Antony Assignee: Benoy Antony Fix For: 3.0.0, 2.5.0 Attachments: HADOOP-10566-branch-2.patch, HADOOP-10566.patch, HADOOP-10566.patch, HADOOP-10566.patch, HADOOP-10566.patch, HADOOP-10566.patch HADOOP-10498 added the proxyservers feature in ProxyUsers. It is beneficial to treat this as a separate feature since: 1. ProxyUsers is per proxy user, whereas proxyservers is per cluster; the cardinality is different. 2. ProxyUsers.authorize() and ProxyUsers.isProxyUser() are synchronized and hence share the same lock, which impacts performance. Since these are two separate features, it will be an improvement to keep them separate. It also enables one to fine-tune each feature independently. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HADOOP-10607) Create an API to Separate Credentials/Password Storage from Applications
[ https://issues.apache.org/jira/browse/HADOOP-10607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14011597#comment-14011597 ] Larry McCay commented on HADOOP-10607: -- [~owen.omalley] - I can buy the getPassword method - that makes sense. What I am wondering now is whether we need alias names beyond the config property names at all. If, when we call getPassword, the implementation first checks for an alias of that name and finds it, then it doesn't matter what the value is in the config file. We could suggest that it be ALIASED or something that shows that it is intentionally not a clear-text password. I think that will get us what we want without the ugly alias token syntax. What do you think? Create an API to Separate Credentials/Password Storage from Applications Key: HADOOP-10607 URL: https://issues.apache.org/jira/browse/HADOOP-10607 Project: Hadoop Common Issue Type: New Feature Components: security Reporter: Larry McCay Assignee: Larry McCay Fix For: 3.0.0 Attachments: 10607-2.patch, 10607-3.patch, 10607-4.patch, 10607-5.patch, 10607.patch As with the filesystem API, we need to provide a generic mechanism to support multiple credential storage mechanisms that are potentially from third parties. We need the ability to eliminate the storage of passwords and secrets in clear text within configuration files or within code. Toward that end, I propose an API that is configured using a list of URLs of CredentialProviders. The implementation will look for implementations using the ServiceLoader interface and thus support third party libraries. Two providers will be included in this patch. One using the credentials cache in MapReduce jobs and the other using Java KeyStores from either HDFS or local file system. A CredShell CLI will also be included in this patch which provides the ability to manage the credentials within the stores. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HADOOP-10622) Shell.runCommand can deadlock
[ https://issues.apache.org/jira/browse/HADOOP-10622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14011626#comment-14011626 ] Jason Lowe commented on HADOOP-10622: - Thanks for the review, Gera! My patch was just a quick-n-dirty thing that doesn't cover all the cases, and I agree your proposed approach is much better. Would you mind taking this and posting an official patch of your proposal? Shell.runCommand can deadlock - Key: HADOOP-10622 URL: https://issues.apache.org/jira/browse/HADOOP-10622 Project: Hadoop Common Issue Type: Bug Affects Versions: 2.3.0 Reporter: Jason Lowe Assignee: Jason Lowe Priority: Critical Attachments: HADOOP-10622.patch Ran into a deadlock in Shell.runCommand. Stacktrace details to follow. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HADOOP-10607) Create an API to Separate Credentials/Password Storage from Applications
[ https://issues.apache.org/jira/browse/HADOOP-10607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14011679#comment-14011679 ] Larry McCay commented on HADOOP-10607: -- Okay, let's summarize an approach here... If we have a ConfigurationCredentialProvider that simply looks for the credential in configuration, then: * this can be the default provider, which will allow for passwords in clear text and work out of the box * we can place a real credential provider in front of it in the provider path, allow password aliases to be resolved, and then fall back to Configuration If we add a new method to Configuration - getPassword(String name) - then: * we essentially extend the configuration file to include the credentials available through the provider API * we will leverage the CredentialProvider API to get the password, whether it is in a store or in the configuration file, without the consuming code or even the Configuration code knowing where it comes from If we leverage the existing configuration property names as the aliases into the credential store, then: * we can simply remove the password config elements from files when not in clear text, or * add a value of ALIASED or something that indicates that the value is elsewhere (in case the property is mandatory for some elements) Is this accurate? Create an API to Separate Credentials/Password Storage from Applications Key: HADOOP-10607 URL: https://issues.apache.org/jira/browse/HADOOP-10607 Project: Hadoop Common Issue Type: New Feature Components: security Reporter: Larry McCay Assignee: Larry McCay Fix For: 3.0.0 Attachments: 10607-2.patch, 10607-3.patch, 10607-4.patch, 10607-5.patch, 10607.patch As with the filesystem API, we need to provide a generic mechanism to support multiple credential storage mechanisms that are potentially from third parties. We need the ability to eliminate the storage of passwords and secrets in clear text within configuration files or within code. Toward that end, I propose an API that is configured using a list of URLs of CredentialProviders. The implementation will look for implementations using the ServiceLoader interface and thus support third party libraries. Two providers will be included in this patch. One using the credentials cache in MapReduce jobs and the other using Java KeyStores from either HDFS or local file system. A CredShell CLI will also be included in this patch which provides the ability to manage the credentials within the stores. -- This message was sent by Atlassian JIRA (v6.2#6252)
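A hedged sketch of the getPassword(String name) flow summarized above (the shape is an assumption based on this thread, not committed code; CredentialProviderFactory and CredentialEntry are the names used in the attached patches): check the configured providers for an alias matching the property name, then fall back to the clear-text config value.

{code}
// Inside Configuration (sketch):
public char[] getPassword(String name) throws IOException {
  // 1. Ask each configured credential provider for an alias of this name.
  for (CredentialProvider provider :
      CredentialProviderFactory.getProviders(this)) {
    CredentialProvider.CredentialEntry entry =
        provider.getCredentialEntry(name);
    if (entry != null) {
      return entry.getCredential();   // resolved from a credential store
    }
  }
  // 2. Fall back to the (possibly clear-text) value in the config file.
  String raw = get(name);
  return raw == null ? null : raw.toCharArray();
}
{code}

With this resolution order, the config property name doubles as the alias, so no special marker value is strictly required.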
[jira] [Commented] (HADOOP-10607) Create an API to Separate Credentials/Password Storage from Applications
[ https://issues.apache.org/jira/browse/HADOOP-10607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14011716#comment-14011716 ] Owen O'Malley commented on HADOOP-10607: Looks good except that I'd avoid the special value of ALIASED. We don't have any mandatory properties in our configs. Create an API to Separate Credentials/Password Storage from Applications Key: HADOOP-10607 URL: https://issues.apache.org/jira/browse/HADOOP-10607 Project: Hadoop Common Issue Type: New Feature Components: security Reporter: Larry McCay Assignee: Larry McCay Fix For: 3.0.0 Attachments: 10607-2.patch, 10607-3.patch, 10607-4.patch, 10607-5.patch, 10607.patch As with the filesystem API, we need to provide a generic mechanism to support multiple credential storage mechanisms that are potentially from third parties. We need the ability to eliminate the storage of passwords and secrets in clear text within configuration files or within code. Toward that end, I propose an API that is configured using a list of URLs of CredentialProviders. The implementation will look for implementations using the ServiceLoader interface and thus support third party libraries. Two providers will be included in this patch. One using the credentials cache in MapReduce jobs and the other using Java KeyStores from either HDFS or local file system. A CredShell CLI will also be included in this patch which provides the ability to manage the credentials within the stores. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HADOOP-9704) Write metrics sink plugin for Hadoop/Graphite
[ https://issues.apache.org/jira/browse/HADOOP-9704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14011733#comment-14011733 ] Luke Lu commented on HADOOP-9704: - [~raviprak]: Exceptions in sink impls won't bring down the daemon. The metrics system is designed to be resilient to transient back-end errors. It'll do retries according to config as well. Write metrics sink plugin for Hadoop/Graphite - Key: HADOOP-9704 URL: https://issues.apache.org/jira/browse/HADOOP-9704 Project: Hadoop Common Issue Type: New Feature Affects Versions: 2.0.3-alpha Reporter: Chu Tong Attachments: 0001-HADOOP-9704.-Write-metrics-sink-plugin-for-Hadoop-Gr.patch, HADOOP-9704.patch, HADOOP-9704.patch, Hadoop-9704.patch Write a metrics sink plugin for Hadoop to send metrics directly to Graphite in addition to the current ganglia and file ones. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HADOOP-9704) Write metrics sink plugin for Hadoop/Graphite
[ https://issues.apache.org/jira/browse/HADOOP-9704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14011743#comment-14011743 ] Luke Lu commented on HADOOP-9704: - The patch looks good overall. Thanks [~babakbehzad]! Please remove the tabs in the source and format according to https://wiki.apache.org/hadoop/CodeReviewChecklist Write metrics sink plugin for Hadoop/Graphite - Key: HADOOP-9704 URL: https://issues.apache.org/jira/browse/HADOOP-9704 Project: Hadoop Common Issue Type: New Feature Affects Versions: 2.0.3-alpha Reporter: Chu Tong Attachments: 0001-HADOOP-9704.-Write-metrics-sink-plugin-for-Hadoop-Gr.patch, HADOOP-9704.patch, HADOOP-9704.patch, Hadoop-9704.patch Write a metrics sink plugin for Hadoop to send metrics directly to Graphite in addition to the current ganglia and file ones. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HADOOP-10607) Create an API to Separate Credentials/Password Storage from Applications
[ https://issues.apache.org/jira/browse/HADOOP-10607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14011746#comment-14011746 ] Larry McCay commented on HADOOP-10607: -- Very good! I will hopefully have a new patch by end of day tomorrow. Create an API to Separate Credentials/Password Storage from Applications Key: HADOOP-10607 URL: https://issues.apache.org/jira/browse/HADOOP-10607 Project: Hadoop Common Issue Type: New Feature Components: security Reporter: Larry McCay Assignee: Larry McCay Fix For: 3.0.0 Attachments: 10607-2.patch, 10607-3.patch, 10607-4.patch, 10607-5.patch, 10607.patch As with the filesystem API, we need to provide a generic mechanism to support multiple credential storage mechanisms that are potentially from third parties. We need the ability to eliminate the storage of passwords and secrets in clear text within configuration files or within code. Toward that end, I propose an API that is configured using a list of URLs of CredentialProviders. The implementation will look for implementations using the ServiceLoader interface and thus support third party libraries. Two providers will be included in this patch. One using the credentials cache in MapReduce jobs and the other using Java KeyStores from either HDFS or local file system. A CredShell CLI will also be included in this patch which provides the ability to manage the credentials within the stores. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HADOOP-10625) Configuration: names should be trimmed when putting/getting to properties
[ https://issues.apache.org/jira/browse/HADOOP-10625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14011754#comment-14011754 ] Wangda Tan commented on HADOOP-10625: - Thanks [~xgong] for reviewing this! Configuration: names should be trimmed when putting/getting to properties - Key: HADOOP-10625 URL: https://issues.apache.org/jira/browse/HADOOP-10625 Project: Hadoop Common Issue Type: Bug Components: conf Affects Versions: 2.4.0 Reporter: Wangda Tan Assignee: Wangda Tan Fix For: 2.5.0 Attachments: HADOOP-10625.patch, HADOOP-10625.patch, HADOOP-10625.patch Currently, Hadoop does not trim the name when putting a k/v pair into properties, but when loading configuration from a file, names are trimmed (in Configuration.java): {code} if ("name".equals(field.getTagName()) && field.hasChildNodes()) attr = StringInterner.weakIntern( ((Text)field.getFirstChild()).getData().trim()); if ("value".equals(field.getTagName()) && field.hasChildNodes()) value = StringInterner.weakIntern( ((Text)field.getFirstChild()).getData()); {code} With this behavior, the following steps are problematic: 1. A user incorrectly sets hadoop.key=value (with a space before hadoop.key) 2. The user tries to get hadoop.key and cannot get the value 3. The configuration is serialized/deserialized (as is done in MR) 4. The user tries to get hadoop.key and now gets the value, which creates an inconsistency. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HADOOP-10448) Support pluggable mechanism to specify proxy user settings
[ https://issues.apache.org/jira/browse/HADOOP-10448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14011783#comment-14011783 ] Arpit Agarwal commented on HADOOP-10448: Hi [~benoyantony], During {{ImpersonationProvider}} initialization: {code} public static void authorize(UserGroupInformation user, String remoteAddress) throws AuthorizationException { if (sip==null) { refreshSuperUserGroupsConfiguration(); } {code} and in {{refreshSuperUserGroupsConfiguration}} {code} public static void refreshSuperUserGroupsConfiguration(Configuration conf) { sip = getInstance(conf); ... {code} So the first few calls could be serviced by different {{ImpersonationProvider}} objects. Is this acceptable behavior? It should be documented if so. Support pluggable mechanism to specify proxy user settings -- Key: HADOOP-10448 URL: https://issues.apache.org/jira/browse/HADOOP-10448 Project: Hadoop Common Issue Type: Sub-task Components: security Affects Versions: 2.3.0 Reporter: Benoy Antony Assignee: Benoy Antony Attachments: HADOOP-10448.patch, HADOOP-10448.patch, HADOOP-10448.patch, HADOOP-10448.patch, HADOOP-10448.patch, HADOOP-10448.patch, HADOOP-10448.patch, HADOOP-10448.patch, HADOOP-10448.patch, HADOOP-10448.patch We have a requirement to support a large number of superusers (users who impersonate another user; see http://hadoop.apache.org/docs/r1.2.1/Secure_Impersonation.html). Currently each superuser needs to be defined in core-site.xml via proxyuser settings. This will be cumbersome when there are 1000 entries. It seems useful to have a pluggable mechanism to specify proxy user settings, with the current approach as the default. -- This message was sent by Atlassian JIRA (v6.2#6252)
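A minimal sketch of the race and the standard remedy (class names are illustrative, not the patch's): two threads can both observe a null provider and each build one, so the first few calls may be served by different instances; double-checked locking on a volatile field closes that window if the behavior is deemed unacceptable.

{code}
final class LazyProvider<T> {
  interface Factory<T> { T create(); }

  private final Factory<T> factory;
  private volatile T instance;        // volatile is essential for DCL

  LazyProvider(Factory<T> factory) { this.factory = factory; }

  T get() {
    T local = instance;
    if (local == null) {
      synchronized (this) {
        local = instance;
        if (local == null) {          // second check, under the lock
          instance = local = factory.create();
        }
      }
    }
    return local;
  }
}
{code}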
[jira] [Updated] (HADOOP-10376) Refactor refresh*Protocols into a single generic refreshConfigProtocol
[ https://issues.apache.org/jira/browse/HADOOP-10376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Li updated HADOOP-10376: -- Attachment: HADOOP-10376.patch Updated patch with support for many handlers mapping to a single identifier. Refactor refresh*Protocols into a single generic refreshConfigProtocol -- Key: HADOOP-10376 URL: https://issues.apache.org/jira/browse/HADOOP-10376 Project: Hadoop Common Issue Type: Improvement Reporter: Chris Li Assignee: Chris Li Priority: Minor Attachments: HADOOP-10376.patch, HADOOP-10376.patch, RefreshFrameworkProposal.pdf See https://issues.apache.org/jira/browse/HADOOP-10285 There are starting to be too many refresh*Protocols. We can refactor them to use a single protocol with a variable payload to choose what to do. Thereafter, we can return an indication of success or failure. -- This message was sent by Atlassian JIRA (v6.2#6252)
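A hedged sketch of what many-handlers-per-identifier dispatch could look like (names are illustrative and assume a simple synchronized registry; the patch's actual classes may differ):

{code}
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RefreshRegistrySketch {
  public interface RefreshHandler {
    String handleRefresh(String identifier, String[] args);
  }

  private final Map<String, List<RefreshHandler>> handlers =
      new HashMap<String, List<RefreshHandler>>();

  public synchronized void register(String identifier, RefreshHandler h) {
    List<RefreshHandler> list = handlers.get(identifier);
    if (list == null) {               // first handler for this identifier
      list = new ArrayList<RefreshHandler>();
      handlers.put(identifier, list);
    }
    list.add(h);
  }

  /** Invoke every handler registered for the identifier. */
  public synchronized List<String> dispatch(String identifier, String[] args) {
    List<RefreshHandler> list = handlers.get(identifier);
    if (list == null) {
      return Collections.emptyList();
    }
    List<String> results = new ArrayList<String>();
    for (RefreshHandler h : list) {
      results.add(h.handleRefresh(identifier, args));
    }
    return results;
  }
}
{code}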
[jira] [Updated] (HADOOP-10624) Fix some minor typos and add more test cases for hadoop_err
[ https://issues.apache.org/jira/browse/HADOOP-10624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenwu Peng updated HADOOP-10624: Attachment: HADOOP-10624-pnative.002.patch Thanks, Colin, for the great comments. HADOOP-10624-pnative.002.patch addresses Colin's comments. Fix some minor typos and add more test cases for hadoop_err --- Key: HADOOP-10624 URL: https://issues.apache.org/jira/browse/HADOOP-10624 Project: Hadoop Common Issue Type: Sub-task Affects Versions: HADOOP-10388 Reporter: Wenwu Peng Assignee: Wenwu Peng Attachments: HADOOP-10624-pnative.001.patch, HADOOP-10624-pnative.002.patch Changes: 1. Add more test cases to cover the methods hadoop_lerr_alloc and hadoop_uverr_alloc 2. Fix typos as follows: 1) Change hadoop_uverr_alloc(int cod to hadoop_uverr_alloc(int code in hadoop_err.h 2) Change OutOfMemory to OutOfMemoryException to be consistent with other Exceptions in hadoop_err.c 3) Change DBUG to DEBUG in messenger.c 4) Change DBUG to DEBUG in reactor.c -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HADOOP-10376) Refactor refresh*Protocols into a single generic refreshConfigProtocol
[ https://issues.apache.org/jira/browse/HADOOP-10376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14012057#comment-14012057 ] Hadoop QA commented on HADOOP-10376: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12647271/HADOOP-10376.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/3980//testReport/ Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/3980//console This message is automatically generated. Refactor refresh*Protocols into a single generic refreshConfigProtocol -- Key: HADOOP-10376 URL: https://issues.apache.org/jira/browse/HADOOP-10376 Project: Hadoop Common Issue Type: Improvement Reporter: Chris Li Assignee: Chris Li Priority: Minor Attachments: HADOOP-10376.patch, HADOOP-10376.patch, RefreshFrameworkProposal.pdf See https://issues.apache.org/jira/browse/HADOOP-10285 There are starting to be too many refresh*Protocols. We can refactor them to use a single protocol with a variable payload to choose what to do. Thereafter, we can return an indication of success or failure. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HADOOP-10635) Add a method to CryptoCodec to generate SRNs for IV
Alejandro Abdelnur created HADOOP-10635: --- Summary: Add a method to CryptoCodec to generate SRNs for IV Key: HADOOP-10635 URL: https://issues.apache.org/jira/browse/HADOOP-10635 Project: Hadoop Common Issue Type: Sub-task Components: security Affects Versions: fs-encryption (HADOOP-10150 and HDFS-6134) Reporter: Alejandro Abdelnur Assignee: Yi Liu SRN generators are provided by crypto libraries. The CryptoCodec gives access to a crypto library, so it makes sense to expose the SRN generator on the CryptoCodec API. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HADOOP-10635) Add a method to CryptoCodec to generate SRNs for IV
[ https://issues.apache.org/jira/browse/HADOOP-10635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14012075#comment-14012075 ] Alejandro Abdelnur commented on HADOOP-10635: - Adding a method {{byte[] generateSecureRandom(int bytes)}} would do the trick; the impl could then get a SecureRandom instance from the same provider used to get the cipher. Add a method to CryptoCodec to generate SRNs for IV --- Key: HADOOP-10635 URL: https://issues.apache.org/jira/browse/HADOOP-10635 Project: Hadoop Common Issue Type: Sub-task Components: security Affects Versions: fs-encryption (HADOOP-10150 and HDFS-6134) Reporter: Alejandro Abdelnur Assignee: Yi Liu Fix For: 3.0.0 SRN generators are provided by crypto libraries. The CryptoCodec gives access to a crypto library, so it makes sense to expose the SRN generator on the CryptoCodec API. -- This message was sent by Atlassian JIRA (v6.2#6252)
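A hedged sketch of the proposed method (the signature comes from the comment above; everything else is an assumption, and a real codec could instead obtain its SecureRandom from the same JCE provider that supplied the cipher):

{code}
import java.security.SecureRandom;

public abstract class CryptoCodecSketch {
  // In a real codec this could come from the cipher's JCE provider.
  private final SecureRandom random = new SecureRandom();

  /** Proposed API: return {@code bytes} cryptographically strong
   *  random bytes, e.g. for use as an IV. */
  public byte[] generateSecureRandom(int bytes) {
    byte[] data = new byte[bytes];
    random.nextBytes(data);
    return data;
  }
}
{code}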
[jira] [Commented] (HADOOP-10448) Support pluggable mechanism to specify proxy user settings
[ https://issues.apache.org/jira/browse/HADOOP-10448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14012077#comment-14012077 ] Benoy Antony commented on HADOOP-10448: --- Thanks for pointing it out, Arpit. Daryn also mentioned this. {quote} During the first access or a refresh, a surge of connections may cause multiple instances to be created (all but the last disposed after the check), but I suppose that's a fringe event and the benefit outweighs it. {quote} I'll document this in the source code as well as in the security documentation. I'll post a patch soon. Support pluggable mechanism to specify proxy user settings -- Key: HADOOP-10448 URL: https://issues.apache.org/jira/browse/HADOOP-10448 Project: Hadoop Common Issue Type: Sub-task Components: security Affects Versions: 2.3.0 Reporter: Benoy Antony Assignee: Benoy Antony Attachments: HADOOP-10448.patch, HADOOP-10448.patch, HADOOP-10448.patch, HADOOP-10448.patch, HADOOP-10448.patch, HADOOP-10448.patch, HADOOP-10448.patch, HADOOP-10448.patch, HADOOP-10448.patch, HADOOP-10448.patch We have a requirement to support a large number of superusers (users who impersonate another user; see http://hadoop.apache.org/docs/r1.2.1/Secure_Impersonation.html). Currently each superuser needs to be defined in core-site.xml via proxyuser settings. This will be cumbersome when there are 1000 entries. It seems useful to have a pluggable mechanism to specify proxy user settings, with the current approach as the default. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HADOOP-10636) Native Hadoop Client: add unit test case for call
Wenwu Peng created HADOOP-10636: --- Summary: Native Hadoop Client: add unit test case for call Key: HADOOP-10636 URL: https://issues.apache.org/jira/browse/HADOOP-10636 Project: Hadoop Common Issue Type: Sub-task Affects Versions: HADOOP-10388 Reporter: Wenwu Peng Assignee: Wenwu Peng -- This message was sent by Atlassian JIRA (v6.2#6252)