Re: [VOTE] Merge fs-encryption branch to trunk
While I was originally skeptical of transparent encryption, I like its value proposition. HDFS has several layers, protocols and tools. While the HDFS core part seems to be well done in the Jira, inserting the matching transparency into the other tools and protocols still needs to be worked through. I have the following areas of concern:

- Common protocols like webhdfs should continue to work (the design doc marks this as a goal). This issue is being discussed in the Jira, but it appears that webhdfs does not currently work with encrypted files: Andrew says that webhdfs is not a recommended deployment and that he will modify the documentation to match; Alejandro says both httpfs and webhdfs will work just fine, but then in the same paragraph says this could fail some security audits. We need to resolve this quickly - webhdfs is heavily used by many Hadoop users.

- Common tools like cp, distcp and HAR should continue to work with non-encrypted and encrypted files in an automatic fashion. This issue has been heavily discussed in the Jira and at the meeting. The /.reserved/raw mechanism appears to be a step in the right direction for distcp and cp; however, this work has not reached its conclusion in my opinion. Charles and I are going through the use cases, and I think we are close to a clean solution for distcp and cp (see the sketch at the end of this mail). HAR still needs a concrete proposal.

- KMS scalability in medium to large clusters. This can perhaps be addressed by fetching the keys ahead of time when a job is submitted. Without this, the KMS will need to be as highly available and scalable as the NN. I think this is future implementation work, but we need to at least determine whether it is indeed possible, in case we need to modify some of the APIs right now to support it.

There are some other minor things under discussion, and I still need to go through the new APIs. Unfortunately, at this stage I cannot give a +1 for this merge; I hope to change this in the next day or two. I am working with the Jira's team (Alejandro, Charles, Andrew, Atm, ...) to resolve the above as quickly as possible.

Sanjay (binding)

On Aug 8, 2014, at 11:45 AM, Andrew Wang andrew.w...@cloudera.com wrote:

Hi all,

I'd like to call a vote to merge the fs-encryption branch to trunk. Development of this feature has been ongoing since March on HDFS-6134 and HADOOP-10150, totaling approximately 50 commits.

Thanks,
Andrew
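For the distcp/cp point above, a minimal sketch of what a raw-namespace copy could look like - the flags and paths here are illustrative assumptions, not the settled design:

    # copy the raw (still-encrypted) bytes between clusters;
    # -px would preserve the xattrs that carry the encryption metadata,
    # so the files remain decryptable at the destination
    hadoop distcp -px \
      hdfs://src-cluster/.reserved/raw/project/data \
      hdfs://dst-cluster/.reserved/raw/project/data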
[jira] [Resolved] (HADOOP-10969) RawLocalFileSystem.setPermission throws Exception on windows
[ https://issues.apache.org/jira/browse/HADOOP-10969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Loughran resolved HADOOP-10969.
-------------------------------------
    Resolution: Invalid

RawLocalFileSystem.setPermission throws Exception on windows

                Key: HADOOP-10969
                URL: https://issues.apache.org/jira/browse/HADOOP-10969
            Project: Hadoop Common
         Issue Type: Bug
        Environment: hadoop 2.3.0, Windows environment, development using Eclipse, Lenovo laptop
           Reporter: Venkatesh
           Priority: Blocker

I'm an application developer. We recently moved from CDH4.7 to CDH5.1; the Hadoop version went from 1.x to 2.x. In order to perform development in Eclipse (on Windows), the following class was created:

    public class WindowsLocalFileSystem extends LocalFileSystem {

        public WindowsLocalFileSystem() {
            super();
        }

        @Override
        public boolean mkdirs(Path f, FsPermission permission) throws IOException {
            final boolean result = super.mkdirs(f);
            this.setPermission(f, permission);
            return result;
        }

        @Override
        public void setPermission(Path p, FsPermission permission) throws IOException {
            try {
                super.setPermission(p, permission);
            } catch (final IOException e) {
                System.err.println("Can't help it, hence ignoring IOException setting permission for path \"" + p + "\": " + e.getMessage());
            }
        }
    }

This class was used in the MapReduce job as:

    if (RUN_LOCAL) {
        conf.set("fs.default.name", "file:///");
        conf.set("mapred.job.tracker", "local");
        conf.set("fs.file.impl", "org.scif.bdp.mrjobs.WindowsLocalFileSystem");
        conf.set("io.serializations",
                "org.apache.hadoop.io.serializer.JavaSerialization,"
                + "org.apache.hadoop.io.serializer.WritableSerialization");
    }

It worked fine on CDH4.7. The same code compiles on CDH5.1, but when I try to execute it, it throws the following stack trace:

    Exception in thread "main" java.lang.NullPointerException
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:1010)
        at org.apache.hadoop.util.Shell.runCommand(Shell.java:451)
        at org.apache.hadoop.util.Shell.run(Shell.java:424)
        at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:656)
        at org.apache.hadoop.util.Shell.execCommand(Shell.java:745)
        at org.apache.hadoop.util.Shell.execCommand(Shell.java:728)
        at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:633)
        at org.apache.hadoop.fs.FilterFileSystem.setPermission(FilterFileSystem.java:467)
        at com.scif.bdp.common.WindowsLocalFileSystem.setPermission(WindowsLocalFileSystem.java:26)
        at com.scif.bdp.common.WindowsLocalFileSystem.mkdirs(WindowsLocalFileSystem.java:17)
        at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:125)
        at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:348)
        at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1295)
        at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1292)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
        at org.apache.hadoop.mapreduce.Job.submit(Job.java:1292)
        at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1313)
        at com.scif.bdp.mrjobs.DeDup.run(DeDup.java:55)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
        at com.scif.bdp.mrjobs.DeDup.main(DeDup.java:59)

(Note: DeDup is my MR class to remove duplicates.) Upon investigation, the only change I saw was in the setPermission() method: it now invokes Native.POSIX.chmod instead of Native.chmod.
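The NullPointerException at ProcessBuilder.start is the classic symptom of a Hadoop 2.x client on Windows that cannot locate winutils.exe: as the stack trace shows, RawLocalFileSystem.setPermission now shells out through org.apache.hadoop.util.Shell. A hedged workaround sketch, not part of the original report - the install path is an assumption for illustration:

    // Point hadoop.home.dir at a directory whose bin\ subfolder contains
    // winutils.exe before the first FileSystem/Job call. Without it, Shell
    // builds a command array with a null executable path, and
    // ProcessBuilder.start throws exactly this NPE.
    public class WinUtilsBootstrap {
        public static void main(String[] args) {
            if (System.getProperty("hadoop.home.dir") == null) {
                System.setProperty("hadoop.home.dir", "C:\\hadoop"); // assumed install location
            }
            // ... then construct the Configuration and submit the job as before
        }
    }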
[jira] [Resolved] (HADOOP-10963) Move compile-time dependency to JDK7
[ https://issues.apache.org/jira/browse/HADOOP-10963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Loughran resolved HADOOP-10963.
-------------------------------------
    Resolution: Duplicate

Move compile-time dependency to JDK7

                Key: HADOOP-10963
                URL: https://issues.apache.org/jira/browse/HADOOP-10963
            Project: Hadoop Common
         Issue Type: Bug
           Reporter: Arun C Murthy
            Fix For: 2.7.0

As discussed on the *-d...@hadoop.apache.org mailing list, this jira tracks moving to JDK7 and dropping support for JDK6.
Build failed in Jenkins: Hadoop-Common-0.23-Build #1041
See https://builds.apache.org/job/Hadoop-Common-0.23-Build/1041/

[...truncated 8263 lines...]

        at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
        at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
        at org.junit.runners.BlockJUnit4ClassRunner.runNotIgnored(BlockJUnit4ClassRunner.java:79)
        at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:71)
        at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:49)
        at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
        at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52)
        at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191)
        at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42)
        at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184)
        at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
        at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:252)
        at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:141)
        at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:112)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:189)
        at org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:165)
        at org.apache.maven.surefire.booter.ProviderFactory.invokeProvider(ProviderFactory.java:85)
        at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:115)
        at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:75)

Running org.apache.hadoop.io.TestVersionedWritable
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.078 sec
Running org.apache.hadoop.io.TestMapFile
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.746 sec
Running org.apache.hadoop.io.TestText
Tests run: 10, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.89 sec
Running org.apache.hadoop.io.TestBloomMapFile
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.071 sec
Running org.apache.hadoop.io.serializer.TestSerializationFactory
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.293 sec
Running org.apache.hadoop.io.serializer.avro.TestAvroSerialization
Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.624 sec
Running org.apache.hadoop.io.serializer.TestWritableSerialization
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.404 sec
Running org.apache.hadoop.io.TestDataByteBuffers
Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.485 sec
Running org.apache.hadoop.io.TestArrayFile
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 7.256 sec
Running org.apache.hadoop.io.TestWritableName
Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.153 sec
Running org.apache.hadoop.io.TestIOUtils
Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.328 sec
Running org.apache.hadoop.io.TestSetFile
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 4.705 sec
Running org.apache.hadoop.io.TestSequenceFile
Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 20.257 sec
Running org.apache.hadoop.io.TestObjectWritableProtos
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.367 sec
Running org.apache.hadoop.io.TestMD5Hash
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.195 sec
Running org.apache.hadoop.io.TestArrayWritable
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.078 sec
Running org.apache.hadoop.io.retry.TestFailoverProxy
Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.213 sec
Running org.apache.hadoop.io.retry.TestRetryProxy
Tests run: 9, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.227 sec
Running org.apache.hadoop.io.TestEnumSetWritable
Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.458 sec
Running org.apache.hadoop.io.TestSecureIOUtils
Tests run: 4, Failures: 0, Errors: 0, Skipped: 1, Time elapsed: 0.584 sec
Running org.apache.hadoop.io.TestBytesWritable
Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.111 sec
Running org.apache.hadoop.io.file.tfile.TestTFileNoneCodecsByteArrays
Tests run: 25, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.761 sec
Running org.apache.hadoop.io.file.tfile.TestTFileStreams
Tests run: 19, Failures: 0, Errors: 0, Skipped: 0,
[jira] [Resolved] (HADOOP-10952) Trash.getCurrentTrashDir() should be public
[ https://issues.apache.org/jira/browse/HADOOP-10952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allen Wittenauer resolved HADOOP-10952.
---------------------------------------
    Resolution: Won't Fix

Talking with others, it was decided that this was private to allow for unicorns to carry your data away. ;)

Trash.getCurrentTrashDir() should be public

                Key: HADOOP-10952
                URL: https://issues.apache.org/jira/browse/HADOOP-10952
            Project: Hadoop Common
         Issue Type: Improvement
           Reporter: Allen Wittenauer

At some point in the future, I'd expect the trash location to be configurable. It makes sense to allow devs to figure out where that location might be.
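While the method stays non-public, a hedged sketch of the workaround devs tend to use - building the conventional per-user trash path by hand. The ".Trash/Current" layout is today's convention, not a stable API, which is exactly why the issue asks for a public accessor:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class TrashDirGuess {
        // Mirrors the current convention: <home-dir>/.Trash/Current
        public static Path currentTrashDir(FileSystem fs) {
            return new Path(fs.getHomeDirectory(), ".Trash/Current");
        }

        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            System.out.println(currentTrashDir(fs));
        }
    }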
[jira] [Resolved] (HADOOP-10877) native client: implement hdfsMove and hdfsCopy
[ https://issues.apache.org/jira/browse/HADOOP-10877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Colin Patrick McCabe resolved HADOOP-10877.
-------------------------------------------
       Resolution: Fixed
    Fix Version/s: HADOOP-10388

Committed, thanks for the review.

native client: implement hdfsMove and hdfsCopy

                Key: HADOOP-10877
                URL: https://issues.apache.org/jira/browse/HADOOP-10877
            Project: Hadoop Common
         Issue Type: Sub-task
         Components: native
   Affects Versions: HADOOP-10388
           Reporter: Colin Patrick McCabe
           Assignee: Colin Patrick McCabe
            Fix For: HADOOP-10388
        Attachments: HADOOP-10877-pnative.001.patch, HADOOP-10877-pnative.002.patch

In the pure native client, we need to implement {{hdfsMove}} and {{hdfsCopy}}. These are basically recursive copy functions (in the Java code, move is copy with a delete at the end).
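The copy-with-delete shape referred to above is visible in the Java API via FileUtil.copy, whose deleteSource flag turns a recursive copy into a move. A minimal sketch - the paths are illustrative, not from the patch:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.FileUtil;
    import org.apache.hadoop.fs.Path;

    public class MoveAsCopy {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            Path src = new Path("/tmp/src"); // illustrative paths
            Path dst = new Path("/tmp/dst");
            // deleteSource=true makes the recursive copy behave as a move,
            // the same semantics the native hdfsMove has to reproduce.
            FileUtil.copy(fs, src, fs, dst, true /* deleteSource */, conf);
        }
    }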
Re: [VOTE] Migration from subversion to git for version control
+1

sanjay

On Fri, Aug 8, 2014 at 7:57 PM, Karthik Kambatla ka...@cloudera.com wrote:

I have put together this proposal based on recent discussion on this topic. Please vote on the proposal. The vote runs for 7 days.

1. Migrate from subversion to git for version control.
2. Force-push to be disabled on trunk and branch-* branches. Applying changes from any of trunk/branch-* to any of branch-* should be through "git cherry-pick -x".
3. Force-push on feature-branches is allowed. Before pulling in a feature, the feature-branch should be rebased on latest trunk and the changes applied to trunk through "git rebase --onto" or "git cherry-pick <commit-range>".
4. Every time a feature branch is rebased on trunk, a tag that identifies the state before the rebase needs to be created (e.g. tag_feature_JIRA-2454_2014-08-07_rebase). These tags can be deleted once the feature is pulled into trunk and the tags are no longer useful.
5. The relevance/use of tags stays the same after the migration.

Thanks
Karthik

PS: Per Andrew Wang, this should be an "Adoption of New Codebase" kind of vote and will be Lazy 2/3 majority of PMC members.
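For steps 3-4 of the proposal above, a minimal sketch of the feature-branch dance in plain git commands - the branch and tag names are illustrative assumptions:

    # step 4: tag the pre-rebase state of the feature branch
    git tag tag_feature_JIRA-2454_2014-08-07_rebase feature-JIRA-2454

    # step 3: rebase the feature branch on latest trunk
    git checkout feature-JIRA-2454
    git rebase trunk

    # then apply the rebased commits to trunk
    git checkout trunk
    git cherry-pick trunk..feature-JIRA-2454

    # step 2: backports to branch-* record provenance with -x
    git cherry-pick -x <commit>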
[jira] [Created] (HADOOP-10970) Cleanup KMS configuration keys
Andrew Wang created HADOOP-10970:
------------------------------------

             Summary: Cleanup KMS configuration keys
                 Key: HADOOP-10970
                 URL: https://issues.apache.org/jira/browse/HADOOP-10970
             Project: Hadoop Common
          Issue Type: Improvement
    Affects Versions: 3.0.0
            Reporter: Andrew Wang
            Assignee: Andrew Wang

It'd be nice to add descriptions to the config keys in kms-site.xml. Also, it'd be good to rename key.provider.path to key.provider.uri for clarity, or just drop .path.
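A hedged sketch of what a documented kms-site.xml entry could look like after the proposed rename - the exact key name is the open question in this issue, and the value is an illustrative default:

    <property>
      <name>hadoop.kms.key.provider.uri</name>
      <value>jceks://file@/${user.home}/kms.keystore</value>
      <description>
        URI of the backing KeyProvider for the KMS.
      </description>
    </property>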
Hortonworks scripting ...
Hi,

In core Hadoop you can, on your (desktop) client, have multiple clusters available simply by having multiple directories with settings files (i.e. core-site.xml etc.) and selecting the one you want by changing the environment settings (i.e. HADOOP_CONF_DIR and such).

This doesn't work when I run under the Hortonworks 2.1.2 distribution. There I find that all of the scripts placed in /usr/bin/ muck about with the environment settings: things from /etc/default are sourced and they override my settings. Now I can control part of it by pointing BIGTOP_DEFAULTS_DIR at a blank directory, but /usr/bin/pig has the sourcing of /etc/default/hadoop hardcoded into the script.

Why is this done this way?

P.S. Where is the git(?) repo located where this (apparently HW-specific) scripting is maintained?

--
Best regards / Met vriendelijke groeten,

Niels Basjes
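For reference, the stock-Hadoop pattern being described, with illustrative directory names:

    # one settings directory per cluster (core-site.xml, hdfs-site.xml, ...)
    export HADOOP_CONF_DIR=/etc/hadoop/conf.cluster-a
    hdfs dfs -ls /

    # switch clusters by switching the conf dir
    export HADOOP_CONF_DIR=/etc/hadoop/conf.cluster-b
    hdfs dfs -ls /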
Re: [VOTE] Migration from subversion to git for version control
+1

— Hitesh

On Aug 8, 2014, at 7:57 PM, Karthik Kambatla ka...@cloudera.com wrote:

I have put together this proposal based on recent discussion on this topic. Please vote on the proposal. The vote runs for 7 days. [...]
Re: [VOTE] Migration from subversion to git for version control
+1

On Aug 14, 2014 5:56 PM, Hitesh Shah hit...@apache.org wrote:

+1

— Hitesh

On Aug 8, 2014, at 7:57 PM, Karthik Kambatla ka...@cloudera.com wrote:

I have put together this proposal based on recent discussion on this topic. Please vote on the proposal. The vote runs for 7 days. [...]
Re: [DISCUSS] Switch to log4j 2
Hi,

Steve has started a discussion titled "use SLF4J APIs in new modules?" as a related topic:
http://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201404.mbox/%3cca+4kjvv_9cmmtdqzcgzy-chslyb1wkgdunxs7wrheslwbuh...@mail.gmail.com%3E

It sounds good to me to use asynchronous logging when we log at INFO. One concern is that asynchronous logging makes debugging difficult - I don't know log4j 2 well, but I suspect that the ordering of log lines can change even if WARN or FATAL are logged with a synchronous logger.

Thanks,
- Tsuyoshi

On Thu, Aug 14, 2014 at 6:44 AM, Arpit Agarwal aagar...@hortonworks.com wrote:

I don't recall whether this was discussed before. I often find our INFO logging to be too sparse for useful diagnosis. A high performance logging framework will encourage us to log more. Specifically, Asynchronous Loggers look interesting:
https://logging.apache.org/log4j/2.x/manual/async.html#Performance

What does the community think of switching to log4j 2 in a Hadoop 2.x release?
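On the ordering concern: log4j 2 can mix synchronous and asynchronous loggers in a single configuration, so chatty INFO loggers can go async while everything else stays synchronous. A minimal sketch - the logger name is an illustrative assumption, and async loggers require the LMAX Disruptor jar on the classpath:

    <?xml version="1.0" encoding="UTF-8"?>
    <Configuration status="WARN">
      <Appenders>
        <Console name="Stdout" target="SYSTEM_OUT">
          <PatternLayout pattern="%d %p %c{1.} [%t] %m%n"/>
        </Console>
      </Appenders>
      <Loggers>
        <!-- chatty INFO logging goes through the async path -->
        <AsyncLogger name="org.apache.hadoop.hdfs" level="info">
          <AppenderRef ref="Stdout"/>
        </AsyncLogger>
        <!-- everything else stays synchronous, preserving ordering -->
        <Root level="warn">
          <AppenderRef ref="Stdout"/>
        </Root>
      </Loggers>
    </Configuration>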