Re: [VOTE] Merge fs-encryption branch to trunk

2014-08-14 Thread sanjay Radia
While I was originally skeptical of transparent encryption, I like its value 
proposition. HDFS has several layers, protocols and tools. While the HDFS core 
part seems to be well done in the Jira, inserting the matching transparency in 
the other tools and protocols still needs to be worked through.

I have the following areas of concern:
- Common protocols like webhdfs should continue to work (the design doc marks 
this as a goal). This issue is being discussed in the Jira, but it appears that 
webhdfs does not currently work with encrypted files: Andrew says that, 
regarding webhdfs, it is not a recommended deployment and that he will modify 
the documentation to match. Alejandro says both httpfs and webhdfs will 
work just fine, but then in the same paragraph says this could fail some 
security audits. We need to resolve this quickly; webhdfs is heavily used by 
many Hadoop users.


- Common tools like cp, distcp and HAR should continue to work with 
non-encrypted and encrypted files in an automatic fashion. This issue has been 
heavily discussed in the Jira and at the meeting. The /.reserved/raw 
mechanism appears to be a step in the right direction for distcp and cp; 
however, this work has not reached its conclusion in my opinion. Charles and I 
are going through the use cases, and I think we are close to a clean solution 
for distcp and cp.  HAR still needs a concrete proposal.
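
For illustration only (this is not the proposed distcp change), here is a
minimal sketch of reading a file's raw, still-encrypted bytes through the
/.reserved/raw prefix, assuming the prefix resolves like an ordinary HDFS path
for a suitably privileged user; the file path below is hypothetical.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;

    public class RawReadSketch {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // Hypothetical file inside an encryption zone; reading it through the
        // /.reserved/raw prefix should yield the raw (encrypted) bytes, which
        // is what a distcp/cp of encrypted data needs to preserve.
        Path raw = new Path("/.reserved/raw/user/alice/data/part-00000");
        try (FSDataInputStream in = fs.open(raw)) {
          IOUtils.copyBytes(in, System.out, conf, false);
        }
      }
    }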

- KMS scalability in medium to large clusters. This can perhaps be addressed 
by fetching the keys ahead of time when a job is submitted. Without this, the 
KMS will need to be as highly available and scalable as the NN. I think this 
is future implementation work, but we need to at least determine whether it is 
indeed possible, in case we need to modify some of the APIs right now to 
support it.
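
To make the idea concrete, here is a rough sketch (an assumption about one
possible approach, not a committed design) of a job client warming the keys it
needs through the KeyProvider API at submission time; the key name is
hypothetical.

    import java.util.List;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.crypto.key.KeyProvider;
    import org.apache.hadoop.crypto.key.KeyProviderFactory;

    public class KeyPrefetchSketch {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Resolve the configured providers (e.g. the KMS client) and look up
        // the keys a job will need before any task runs, so the KMS is not hit
        // by every task at runtime.
        List<KeyProvider> providers = KeyProviderFactory.getProviders(conf);
        for (KeyProvider provider : providers) {
          // "projectX-key" is a hypothetical key name.
          KeyProvider.KeyVersion kv = provider.getCurrentKey("projectX-key");
          if (kv != null) {
            System.out.println("prefetched key version: " + kv.getVersionName());
          }
        }
      }
    }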

There are some other minor things under discussion, and I still need to go 
through the new APIs.

 Unfortunately, at this stage I cannot give a +1 for this merge; I hope to 
change this in the next day or two. I am working with the Jira team 
(Alejandro, Charles, Andrew, Atm, ...) to resolve the above as quickly as 
possible.

Sanjay (binding)



On Aug 8, 2014, at 11:45 AM, Andrew Wang andrew.w...@cloudera.com wrote:

 Hi all,
 
 I'd like to call a vote to merge the fs-encryption branch to trunk.
 Development of this feature has been ongoing since March on HDFS-6134 and
 HADOOP-10150, totaling approximately 50 commits.
 
 .
 Thanks,
 Andrew




[jira] [Resolved] (HADOOP-10969) RawLocalFileSystem.setPermission throws Exception on windows

2014-08-14 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-10969.
-

Resolution: Invalid

 RawLocalFileSystem.setPermission throws Exception on windows
 

 Key: HADOOP-10969
 URL: https://issues.apache.org/jira/browse/HADOOP-10969
 Project: Hadoop Common
  Issue Type: Bug
 Environment: hadoop 2.3.0, Windows Environment, Development using 
 Eclipse, Lenovo Laptop
Reporter: Venkatesh
Priority: Blocker

 I'm an application developer. We recently moved from CDH4.7 to CDH5.1. The 
 hadoop version has moved from 1.x to 2.x. In order to perform development on 
 Eclipse (on WINDOWS), the following class was created:

 import java.io.IOException;
 import org.apache.hadoop.fs.LocalFileSystem;
 import org.apache.hadoop.fs.Path;
 import org.apache.hadoop.fs.permission.FsPermission;

 public class WindowsLocalFileSystem extends LocalFileSystem {

     public WindowsLocalFileSystem() {
         super();
     }

     @Override
     public boolean mkdirs(Path f, FsPermission permission) throws IOException {
         final boolean result = super.mkdirs(f);
         this.setPermission(f, permission);
         return result;
     }

     @Override
     public void setPermission(Path p, FsPermission permission)
             throws IOException {
         try {
             super.setPermission(p, permission);
         } catch (final IOException e) {
             // Swallow the failure on Windows and log it instead.
             System.err.println("Can't help it, hence ignoring IOException "
                 + "setting permission for path \"" + p + "\": " + e.getMessage());
         }
     }
 }
 This class was used in the MapReduce job as:

     if (RUN_LOCAL) {
         conf.set("fs.default.name", "file:///");
         conf.set("mapred.job.tracker", "local");
         conf.set("fs.file.impl",
             "org.scif.bdp.mrjobs.WindowsLocalFileSystem");
         conf.set("io.serializations",
             "org.apache.hadoop.io.serializer.JavaSerialization,"
             + "org.apache.hadoop.io.serializer.WritableSerialization");
     }
 It worked fine on CDH4.7. The same code compiles on CDH5.1, but when I try to 
 execute it, it throws the following stacktrace:
 Exception in thread "main" java.lang.NullPointerException
   at java.lang.ProcessBuilder.start(ProcessBuilder.java:1010)
   at org.apache.hadoop.util.Shell.runCommand(Shell.java:451)
   at org.apache.hadoop.util.Shell.run(Shell.java:424)
   at 
 org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:656)
   at org.apache.hadoop.util.Shell.execCommand(Shell.java:745)
   at org.apache.hadoop.util.Shell.execCommand(Shell.java:728)
   at 
 org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:633)
   at 
 org.apache.hadoop.fs.FilterFileSystem.setPermission(FilterFileSystem.java:467)
   at 
 com.scif.bdp.common.WindowsLocalFileSystem.setPermission(WindowsLocalFileSystem.java:26)
   at 
 com.scif.bdp.common.WindowsLocalFileSystem.mkdirs(WindowsLocalFileSystem.java:17)
   at 
 org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:125)
   at 
 org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:348)
   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1295)
   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1292)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
   at org.apache.hadoop.mapreduce.Job.submit(Job.java:1292)
   at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1313)
   at com.scif.bdp.mrjobs.DeDup.run(DeDup.java:55)
   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
   at com.scif.bdp.mrjobs.DeDup.main(DeDup.java:59)
 (Note: DeDup is my MR class to remove duplicates.)
 Upon investigation, the only change I saw was in the .setPermission() method: 
 it now invokes Native.POSIX.chmod instead of Native.chmod.





[jira] [Resolved] (HADOOP-10963) Move compile-time dependency to JDK7

2014-08-14 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-10963.
-

Resolution: Duplicate

 Move compile-time dependency to JDK7
 

 Key: HADOOP-10963
 URL: https://issues.apache.org/jira/browse/HADOOP-10963
 Project: Hadoop Common
  Issue Type: Bug
Reporter: Arun C Murthy
 Fix For: 2.7.0


 As discussed on the *-d...@hadoop.apache.org mailing list, this jira tracks 
 moving to JDK7 and dropping support for JDK6.





Build failed in Jenkins: Hadoop-Common-0.23-Build #1041

2014-08-14 Thread Apache Jenkins Server
See https://builds.apache.org/job/Hadoop-Common-0.23-Build/1041/

--
[...truncated 8263 lines...]
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
at 
org.junit.runners.BlockJUnit4ClassRunner.runNotIgnored(BlockJUnit4ClassRunner.java:79)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:71)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:49)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184)
at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:252)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:141)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:112)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at 
org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:189)
at 
org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:165)
at 
org.apache.maven.surefire.booter.ProviderFactory.invokeProvider(ProviderFactory.java:85)
at 
org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:115)
at 
org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:75)

Running org.apache.hadoop.io.TestVersionedWritable
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.078 sec
Running org.apache.hadoop.io.TestMapFile
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.746 sec
Running org.apache.hadoop.io.TestText
Tests run: 10, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.89 sec
Running org.apache.hadoop.io.TestBloomMapFile
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.071 sec
Running org.apache.hadoop.io.serializer.TestSerializationFactory
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.293 sec
Running org.apache.hadoop.io.serializer.avro.TestAvroSerialization
Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.624 sec
Running org.apache.hadoop.io.serializer.TestWritableSerialization
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.404 sec
Running org.apache.hadoop.io.TestDataByteBuffers
Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.485 sec
Running org.apache.hadoop.io.TestArrayFile
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 7.256 sec
Running org.apache.hadoop.io.TestWritableName
Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.153 sec
Running org.apache.hadoop.io.TestIOUtils
Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.328 sec
Running org.apache.hadoop.io.TestSetFile
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 4.705 sec
Running org.apache.hadoop.io.TestSequenceFile
Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 20.257 sec
Running org.apache.hadoop.io.TestObjectWritableProtos
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.367 sec
Running org.apache.hadoop.io.TestMD5Hash
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.195 sec
Running org.apache.hadoop.io.TestArrayWritable
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.078 sec
Running org.apache.hadoop.io.retry.TestFailoverProxy
Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.213 sec
Running org.apache.hadoop.io.retry.TestRetryProxy
Tests run: 9, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.227 sec
Running org.apache.hadoop.io.TestEnumSetWritable
Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.458 sec
Running org.apache.hadoop.io.TestSecureIOUtils
Tests run: 4, Failures: 0, Errors: 0, Skipped: 1, Time elapsed: 0.584 sec
Running org.apache.hadoop.io.TestBytesWritable
Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.111 sec
Running org.apache.hadoop.io.file.tfile.TestTFileNoneCodecsByteArrays
Tests run: 25, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.761 sec
Running org.apache.hadoop.io.file.tfile.TestTFileStreams
Tests run: 19, Failures: 0, Errors: 0, Skipped: 0, 

[jira] [Resolved] (HADOOP-10952) Trash.getCurrentTrashDir() should be public

2014-08-14 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer resolved HADOOP-10952.
---

Resolution: Won't Fix

Talking with others, it was decided that this was private to allow for unicorns 
to carry your data away. ;)

 Trash.getCurrentTrashDir() should be public
 ---

 Key: HADOOP-10952
 URL: https://issues.apache.org/jira/browse/HADOOP-10952
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Allen Wittenauer

 At some point in the future, I'd expect the trash location to be 
 configurable.  It makes sense to allow devs to figure out where that location 
 might be.





[jira] [Resolved] (HADOOP-10877) native client: implement hdfsMove and hdfsCopy

2014-08-14 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe resolved HADOOP-10877.
---

   Resolution: Fixed
Fix Version/s: HADOOP-10388

committed, thanks for the review

 native client: implement hdfsMove and hdfsCopy
 --

 Key: HADOOP-10877
 URL: https://issues.apache.org/jira/browse/HADOOP-10877
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: native
Affects Versions: HADOOP-10388
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
 Fix For: HADOOP-10388

 Attachments: HADOOP-10877-pnative.001.patch, 
 HADOOP-10877-pnative.002.patch


 In the pure native client, we need to implement {{hdfsMove}} and 
 {{hdfsCopy}}.  These are basically recursive copy functions (in the Java 
 code, move is copy with a delete at the end).
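
 For context, a minimal sketch of the Java-side behaviour being mirrored (move 
 implemented as a recursive copy followed by a delete of the source), using the 
 existing FileUtil.copy helper; the paths are hypothetical and this is not the 
 native-client code itself.

     import org.apache.hadoop.conf.Configuration;
     import org.apache.hadoop.fs.FileSystem;
     import org.apache.hadoop.fs.FileUtil;
     import org.apache.hadoop.fs.Path;

     public class MoveAsCopySketch {
       public static void main(String[] args) throws Exception {
         Configuration conf = new Configuration();
         FileSystem fs = FileSystem.get(conf);
         Path src = new Path("/tmp/src-dir");  // hypothetical
         Path dst = new Path("/tmp/dst-dir");  // hypothetical
         // hdfsCopy-style: recursive copy, the source is kept.
         FileUtil.copy(fs, src, fs, dst, false /* deleteSource */, conf);
         // hdfsMove-style: recursive copy, then the source is deleted.
         FileUtil.copy(fs, src, fs, dst, true /* deleteSource */, conf);
       }
     }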





Re: [VOTE] Migration from subversion to git for version control

2014-08-14 Thread sanjay Radia
+1 
sanjay
 
 On Fri, Aug 8, 2014 at 7:57 PM, Karthik Kambatla ka...@cloudera.com wrote:
 I have put together this proposal based on recent discussion on this topic.
 
 Please vote on the proposal. The vote runs for 7 days.
 
   1. Migrate from subversion to git for version control.
   2. Force-push to be disabled on trunk and branch-* branches. Applying
   changes from any of trunk/branch-* to any of branch-* should be through
   git cherry-pick -x.
   3. Force-push on feature-branches is allowed. Before pulling in a
   feature, the feature-branch should be rebased on latest trunk and the
   changes applied to trunk through git rebase --onto or git cherry-pick
   commit-range.
   4. Every time a feature branch is rebased on trunk, a tag that
   identifies the state before the rebase needs to be created (e.g.
   tag_feature_JIRA-2454_2014-08-07_rebase). These tags can be deleted once
   the feature is pulled into trunk and the tags are no longer useful.
    5. The relevance/use of tags stays the same after the migration.
 
 Thanks
 Karthik
 
  PS: Per Andrew Wang, this should be an "Adoption of New Codebase" kind of
  vote and will be a "Lazy 2/3 majority" of PMC members.




[jira] [Created] (HADOOP-10970) Cleanup KMS configuration keys

2014-08-14 Thread Andrew Wang (JIRA)
Andrew Wang created HADOOP-10970:


 Summary: Cleanup KMS configuration keys
 Key: HADOOP-10970
 URL: https://issues.apache.org/jira/browse/HADOOP-10970
 Project: Hadoop Common
  Issue Type: Improvement
Affects Versions: 3.0.0
Reporter: Andrew Wang
Assignee: Andrew Wang


It'd be nice to add descriptions to the config keys in kms-site.xml.

Also, it'd be good to rename key.provider.path to key.provider.uri for clarity, 
or just drop .path.





Hortonworks scripting ...

2014-08-14 Thread Niels Basjes
Hi,

In core Hadoop you can, on your (desktop) client, have multiple clusters
available simply by keeping multiple directories with settings files (i.e.
core-site.xml etc.) and selecting the one you want by changing the environment
settings (HADOOP_CONF_DIR and such).

This doesn't work when I run under the Hortonworks 2.1.2 distribution.

There I find that all of the scripts placed in /usr/bin/ muck about with the
environment settings: things from /etc/default are sourced and they override my
settings. I can control part of this by pointing BIGTOP_DEFAULTS_DIR at a
blank directory, but in /usr/bin/pig the sourcing of /etc/default/hadoop is
hardcoded into the script.

Why is this done this way?

P.S. Where is the git(?) repo located where this (apparently HW-specific)
scripting is maintained?

-- 
Best regards / Met vriendelijke groeten,

Niels Basjes


Re: [VOTE] Migration from subversion to git for version control

2014-08-14 Thread Hitesh Shah
+1 

— Hitesh 

On Aug 8, 2014, at 7:57 PM, Karthik Kambatla ka...@cloudera.com wrote:

 I have put together this proposal based on recent discussion on this topic.
 
 Please vote on the proposal. The vote runs for 7 days.
 
   1. Migrate from subversion to git for version control.
   2. Force-push to be disabled on trunk and branch-* branches. Applying
   changes from any of trunk/branch-* to any of branch-* should be through
   git cherry-pick -x.
   3. Force-push on feature-branches is allowed. Before pulling in a
   feature, the feature-branch should be rebased on latest trunk and the
   changes applied to trunk through git rebase --onto or git cherry-pick
   commit-range.
   4. Every time a feature branch is rebased on trunk, a tag that
   identifies the state before the rebase needs to be created (e.g.
   tag_feature_JIRA-2454_2014-08-07_rebase). These tags can be deleted once
   the feature is pulled into trunk and the tags are no longer useful.
    5. The relevance/use of tags stays the same after the migration.
 
 Thanks
 Karthik
 
  PS: Per Andrew Wang, this should be an "Adoption of New Codebase" kind of
  vote and will be a "Lazy 2/3 majority" of PMC members.



Re: [VOTE] Migration from subversion to git for version control

2014-08-14 Thread Jonathan Eagles
+1
On Aug 14, 2014 5:56 PM, Hitesh Shah hit...@apache.org wrote:

 +1

 — Hitesh

 On Aug 8, 2014, at 7:57 PM, Karthik Kambatla ka...@cloudera.com wrote:

  I have put together this proposal based on recent discussion on this
 topic.
 
  Please vote on the proposal. The vote runs for 7 days.
 
1. Migrate from subversion to git for version control.
2. Force-push to be disabled on trunk and branch-* branches. Applying
changes from any of trunk/branch-* to any of branch-* should be through
git cherry-pick -x.
3. Force-push on feature-branches is allowed. Before pulling in a
feature, the feature-branch should be rebased on latest trunk and the
changes applied to trunk through git rebase --onto or git
 cherry-pick
commit-range.
4. Every time a feature branch is rebased on trunk, a tag that
identifies the state before the rebase needs to be created (e.g.
tag_feature_JIRA-2454_2014-08-07_rebase). These tags can be deleted
 once
the feature is pulled into trunk and the tags are no longer useful.
    5. The relevance/use of tags stays the same after the migration.
 
  Thanks
  Karthik
 
   PS: Per Andrew Wang, this should be an "Adoption of New Codebase" kind of
   vote and will be a "Lazy 2/3 majority" of PMC members.




Re: [DISCUSS] Switch to log4j 2

2014-08-14 Thread Tsuyoshi OZAWA
Hi,

Steve has started a discussion titled "use SLF4J APIs in new modules?"
as a related topic:
http://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201404.mbox/%3cca+4kjvv_9cmmtdqzcgzy-chslyb1wkgdunxs7wrheslwbuh...@mail.gmail.com%3E

It sounds good to me to use asynchronous logging when we log at INFO level. One
concern is that asynchronous logging makes debugging more difficult - I don't
know log4j 2 well, but I suspect that the ordering of log messages can change
even if WARN or FATAL are logged with a synchronous logger.
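
For reference, a minimal sketch of how all-async logging is typically switched
on in Log4j 2 (via the AsyncLoggerContextSelector system property), assuming
log4j-core and the LMAX disruptor are on the classpath; the class and messages
below are just placeholders.

    import org.apache.logging.log4j.LogManager;
    import org.apache.logging.log4j.Logger;

    public class AsyncLoggingSketch {
      public static void main(String[] args) {
        // Route all loggers through the async context selector. In practice
        // this is usually passed as a JVM flag (-DLog4jContextSelector=...)
        // and must be set before Log4j initializes.
        System.setProperty("Log4jContextSelector",
            "org.apache.logging.log4j.core.async.AsyncLoggerContextSelector");
        Logger log = LogManager.getLogger(AsyncLoggingSketch.class);
        log.info("INFO events are handed off to a background thread");
        log.warn("ordering relative to INFO depends on the async queue");
      }
    }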

Thanks,
- Tsuyoshi

On Thu, Aug 14, 2014 at 6:44 AM, Arpit Agarwal aagar...@hortonworks.com wrote:
 I don't recall whether this was discussed before.

 I often find our INFO logging to be too sparse for useful diagnosis. A high
 performance logging framework will encourage us to log more. Specifically,
 Asynchronous Loggers look interesting.
 https://logging.apache.org/log4j/2.x/manual/async.html#Performance

 What does the community think of switching to log4j 2 in a Hadoop 2.x
 release?
