[jira] [Created] (HADOOP-10421) Enable Kerberos profiled UTs to run with IBM JAVA

2014-03-24 Thread Jinghui Wang (JIRA)
Jinghui Wang created HADOOP-10421:
-

 Summary: Enable Kerberos profiled UTs to run with IBM JAVA
 Key: HADOOP-10421
 URL: https://issues.apache.org/jira/browse/HADOOP-10421
 Project: Hadoop Common
  Issue Type: Test
  Components: security, test
Affects Versions: 2.2.0
Reporter: Jinghui Wang
 Fix For: 2.3.0


KerberosTestUtils in both hadoop-auth and hadoop-httpfs does not support IBM 
JAVA, whose Kerberos login module requires different configuration options.
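
A sketch of the kind of vendor check the test utilities would need. The IBM and Sun module class names are standard, but the exact IBM option names follow IBM's JGSS documentation and should be treated as assumptions, not as the contents of this patch (assumes {{java.util.Map}}/{{HashMap}} imports):

{code:java}
// Hypothetical helper for KerberosTestUtils: pick the JAAS Krb5 login
// module and its options based on the JVM vendor.
private static final boolean IBM_JAVA =
    System.getProperty("java.vendor", "").contains("IBM");

private static String getKrb5LoginModuleName() {
  return IBM_JAVA
      ? "com.ibm.security.auth.module.Krb5LoginModule"
      : "com.sun.security.auth.module.Krb5LoginModule";
}

private static Map<String, String> getKeytabOptions(String keytab,
    String principal) {
  Map<String, String> options = new HashMap<String, String>();
  if (IBM_JAVA) {
    // IBM's module expects a URL for the keytab and its own option names.
    options.put("useKeytab",
        keytab.startsWith("file://") ? keytab : "file://" + keytab);
    options.put("credsType", "both");
    options.put("principal", principal);
  } else {
    options.put("useKeyTab", "true");
    options.put("keyTab", keytab);
    options.put("principal", principal);
    options.put("storeKey", "true");
  }
  return options;
}
{code}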





[jira] [Created] (HADOOP-10422) Remove redundant logging of retry attempts.

2014-03-24 Thread Chris Nauroth (JIRA)
Chris Nauroth created HADOOP-10422:
--

 Summary: Remove redundant logging of retry attempts.
 Key: HADOOP-10422
 URL: https://issues.apache.org/jira/browse/HADOOP-10422
 Project: Hadoop Common
  Issue Type: Bug
  Components: ipc
Affects Versions: 2.3.0
Reporter: Chris Nauroth
Assignee: Chris Nauroth


{{RetryUtils}} logs each retry attempt at both info level and debug level.
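
The duplication looks roughly like the following (illustrative, not the exact {{RetryUtils}} source); the fix is to report each attempt once, at a single level:

{code:java}
// Before: the same retry attempt is reported twice.
LOG.info("Retrying connect to server: " + addr
    + ". Already tried " + retries + " time(s).");
if (LOG.isDebugEnabled()) {
  LOG.debug("RETRY " + retries + ": policy=" + policy, e);
}

// After: a single log statement per attempt.
LOG.debug("Retrying connect to server: " + addr
    + ". Already tried " + retries + " time(s); policy=" + policy, e);
{code}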





[jira] [Created] (HADOOP-10423) Clarify compatibility policy document for combination of new client and old server.

2014-03-24 Thread Chris Nauroth (JIRA)
Chris Nauroth created HADOOP-10423:
--

 Summary: Clarify compatibility policy document for combination of 
new client and old server.
 Key: HADOOP-10423
 URL: https://issues.apache.org/jira/browse/HADOOP-10423
 Project: Hadoop Common
  Issue Type: Improvement
  Components: documentation
Affects Versions: 2.3.0
Reporter: Chris Nauroth
Assignee: Chris Nauroth


As discussed on the dev mailing lists and MAPREDUCE-4052, we need to update the 
text of the compatibility policy to discuss a new client combined with an old 
server.





Re: [DISCUSS] Clarification on Compatibility Policy: Upgraded Client + Old Server

2014-03-24 Thread Chris Nauroth
Thank you, everyone, for the discussion.  There is general agreement, so I
have filed HADOOP-10423 with a patch to update the compatibility
documentation.

Chris Nauroth
Hortonworks
http://hortonworks.com/



On Thu, Mar 20, 2014 at 11:24 AM, Colin McCabe cmcc...@alumni.cmu.edu wrote:

 +1 for making this guarantee explicit.

 It also definitely seems like a good idea to test mixed versions in bigtop.

 HDFS is not immune to new client, old server scenarios because the HDFS
 client gets bundled into a lot of places.

 Colin
 On Mar 20, 2014 10:55 AM, Chris Nauroth cnaur...@hortonworks.com
 wrote:

  Our use of protobuf helps mitigate a lot of compatibility concerns, but
  there still can be situations that require careful coding on our part.
   When adding a new field to a protobuf message, the client might need to
 do
  a null check, even if the server-side implementation in the new version
  always populates the field.  When adding a whole new RPC endpoint, the
  client might need to consider the possibility that the RPC endpoint isn't
  there on an old server, and degrade gracefully after the RPC fails.  The
  original issue in MAPREDUCE-4052 concerned the script commands passed in
 a
  YARN container submission, where protobuf doesn't provide any validation
  beyond the fact that they're strings.
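  
   A minimal sketch of both defensive patterns (a proto2-style presence
   check, plus graceful fallback when the endpoint is missing); the class
   and method names are illustrative, not from the Hadoop source:
  
      // New optional field: an old server never sets it, so check presence
      // before use (proto2 generates hasX() for optional fields).
      GetInfoResponseProto resp = proxy.getInfo(req);
      if (resp.hasNewField()) {
        useNewField(resp.getNewField());
      } else {
        fallBackToOldBehavior();
      }

      // Whole new RPC endpoint: an old server rejects the call, so catch
      // the remote error and degrade gracefully. Exactly which exception
      // to unwrap is an assumption here.
      try {
        proxy.newEndpoint(newReq);
      } catch (RemoteException re) {
        LOG.info("Server does not support newEndpoint; falling back", re);
        fallBackToOldBehavior();
      }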
 
  Forward compatibility is harder than backward compatibility, and testing
 is
  a big challenge.  Our test suites in the Hadoop repo don't cover this.
   Does anyone know if anything in Bigtop tries to run with mixed versions?
 
  I agree that we need to make it clear in the language that upgrading
 client
  alone is insufficient to get access to new server-side features,
 including
  new YARN APIs.  Thanks for the suggestions, Steve.
 
  Chris Nauroth
  Hortonworks
  http://hortonworks.com/
 
 
 
  On Thu, Mar 20, 2014 at 5:53 AM, Steve Loughran ste...@hortonworks.com
  wrote:
 
   I'm clearly supportive of this, though of course the testing costs
 needed
   to back up the assertion make it more expensive than just a statement.
  
    Two issues:
   
    - we'd need to make clear that new cluster features that a client can
    invoke won't be available. You can't expect snapshot or symlink support
    running against a 2.2.0 cluster, even if the client supports it.
  
    - in YARN, there are no guarantees that an app compiled against later
    YARN APIs will work in old clusters, because YARN apps upload themselves
    to the server and run with their hadoop, hdfs & yarn libraries. We have
    to do a bit of introspection in our code already to support this
    situation (see the sketch below). The compatibility doc would need to be
    clear on that too: YARN apps that use new APIs (including new fields in
    data structures) can expect link exceptions
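   
    A sketch of the kind of introspection Steve mentions (the helper is
    hypothetical):
   
       // Probe the YARN library on the classpath for a newer API before
       // calling it, and fall back when the method is absent.
       static boolean hasMethod(Class<?> clazz, String name,
           Class<?>... paramTypes) {
         try {
           clazz.getMethod(name, paramTypes);
           return true;   // newer API available
         } catch (NoSuchMethodException e) {
           return false;  // older YARN library; use the fallback path
         }
       }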
  
  
  
  
  
   On 20 March 2014 04:25, Vinayakumar B vinayakuma...@huawei.com
 wrote:
  
+1, I agree with your point Chris. It depends on the client
 application how they are using the hdfs jars in their classpath.
   
As the implementation already supports this compatibility (through
 protobuf), no extra code changes are required to support new client + old server.
   
I feel it would be good to explicitly mention the compatibility of
 existing APIs in both versions.
   
Anyway, this is not applicable for new APIs in the latest client, and
 this is understood. We can make it explicit in the document though.
   
   
Regards,
Vinayakumar B
   
-Original Message-
From: Chris Nauroth [mailto:cnaur...@hortonworks.com]
Sent: 20 March 2014 05:36
To: common-dev@hadoop.apache.org
Cc: mapreduce-...@hadoop.apache.org; hdfs-...@hadoop.apache.org;
yarn-...@hadoop.apache.org
Subject: Re: [DISCUSS] Clarification on Compatibility Policy:
 Upgraded
Client + Old Server
   
I think this kind of compatibility issue still could surface for
 HDFS,
particularly for custom applications (i.e. something not executed via
hadoop jar on a cluster node, where the client classes ought to be
injected into the classpath automatically).  Running DistCP between 2
clusters of different versions could result in a 2.4.0 client
 calling a
2.3.0 NameNode.  Someone could potentially pick up the 2.4.0 WebHDFS
client as a dependency and try to use it to make HTTP calls to a
 2.3.0
   HDFS
cluster.
   
Chris Nauroth
Hortonworks
http://hortonworks.com/
   
   
   
On Wed, Mar 19, 2014 at 4:28 PM, Vinod Kumar Vavilapalli 
vino...@apache.org
 wrote:
   
 It makes sense only for YARN today where we separated out the
  clients.
 HDFS is still a monolithic jar so this compatibility issue is kind
 of
 invalid there.

 +vinod

 On Mar 19, 2014, at 1:59 PM, Chris Nauroth 
 cnaur...@hortonworks.com
  
 wrote:

  I'd like to discuss clarification of part of our compatibility
   policy.
  Here is a link to the compatibility documentation for release
  2.3.0:
 
   

Re: [DISCUSS] Clarification on Compatibility Policy: Upgraded Client + Old Server

2014-03-24 Thread Chris Nauroth
Adding back all *-dev lists to make sure everyone is covered.

Chris Nauroth
Hortonworks
http://hortonworks.com/




[jira] [Created] (HADOOP-10424) TestStreamingTaskLog is failing

2014-03-24 Thread Mit Desai (JIRA)
Mit Desai created HADOOP-10424:
--

 Summary: TestStreamingTaskLog is failing
 Key: HADOOP-10424
 URL: https://issues.apache.org/jira/browse/HADOOP-10424
 Project: Hadoop Common
  Issue Type: Bug
Affects Versions: 3.0.0
Reporter: Mit Desai


testStreamingTaskLogWithHadoopCmd(org.apache.hadoop.streaming.TestStreamingTaskLog)
  Time elapsed: 44.069 sec <<< FAILURE!
java.lang.AssertionError: environment set for child is wrong
at org.junit.Assert.fail(Assert.java:93)
at org.junit.Assert.assertTrue(Assert.java:43)
at 
org.apache.hadoop.streaming.TestStreamingTaskLog.runStreamJobAndValidateEnv(TestStreamingTaskLog.java:157)
at 
org.apache.hadoop.streaming.TestStreamingTaskLog.testStreamingTaskLogWithHadoopCmd(TestStreamingTaskLog.java:107)


Results :

Failed tests: 
  TestStreamingTaskLog.testStreamingTaskLogWithHadoopCmd:107->runStreamJobAndValidateEnv:157 environment set for child is wrong





[jira] [Created] (HADOOP-10425) Incompatible behavior of LocalFileSystem:getContentSummary

2014-03-24 Thread Brandon Li (JIRA)
Brandon Li created HADOOP-10425:
---

 Summary: Incompatible behavior of LocalFileSystem:getContentSummary
 Key: HADOOP-10425
 URL: https://issues.apache.org/jira/browse/HADOOP-10425
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs
Affects Versions: 2.3.0
Reporter: Brandon Li
Assignee: Tsz Wo Nicholas Sze


Unlike in Hadoop 1, FilterFileSystem overrides getContentSummary, which causes 
the content summary to be computed on rawLocalFileSystem in local mode.

This impacts stats computation in Hive, which gets back file sizes that 
include the size of the crc files.
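
A condensed view of why the sizes diverge (a sketch of the relevant pieces, not the full source):

{code:java}
// LocalFileSystem extends ChecksumFileSystem extends FilterFileSystem.
// FileSystem's default getContentSummary() walks listStatus(), and
// ChecksumFileSystem filters the hidden .crc files out of listStatus().
// FilterFileSystem's override bypasses that filtering by delegating
// straight to the wrapped filesystem:
public ContentSummary getContentSummary(Path f) throws IOException {
  return fs.getContentSummary(f);  // fs is RawLocalFileSystem here,
                                   // so the .crc file sizes are counted
}
{code}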






[jira] [Created] (HADOOP-10426) CreateOpts.getOpt(..) should declare with generic type argument

2014-03-24 Thread Tsz Wo Nicholas Sze (JIRA)
Tsz Wo Nicholas Sze created HADOOP-10426:


 Summary: CreateOpts.getOpt(..) should declare with generic type 
argument
 Key: HADOOP-10426
 URL: https://issues.apache.org/jira/browse/HADOOP-10426
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze
Priority: Minor


Similar to CreateOpts.setOpt(..), CreateOpts.getOpt(..) should also be declared 
with a generic type parameter <T extends CreateOpts>.  Then, all the casting 
from CreateOpts to its subclasses can be avoided.
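
A sketch of the proposed shape (simplified; not necessarily the exact signature the patch uses):

{code:java}
// Before: callers must downcast the result.
//   BlockSize bs = (BlockSize) CreateOpts.getOpt(BlockSize.class, opts);
// After: the generic parameter carries the subclass type.
public static <T extends CreateOpts> T getOpt(Class<T> clazz,
    CreateOpts... opts) {
  for (CreateOpts o : opts) {
    if (clazz.isInstance(o)) {
      return clazz.cast(o);  // the only cast, checked and internal
    }
  }
  return null;
}
{code}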





[jira] [Created] (HADOOP-10427) KeyProvider implementations should be thread safe

2014-03-24 Thread Alejandro Abdelnur (JIRA)
Alejandro Abdelnur created HADOOP-10427:
---

 Summary: KeyProvider implementations should be thread safe
 Key: HADOOP-10427
 URL: https://issues.apache.org/jira/browse/HADOOP-10427
 Project: Hadoop Common
  Issue Type: Improvement
  Components: security
Affects Versions: 3.0.0
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur


The {{KeyProvider}} API should be thread-safe so it can be used safely in 
server apps.
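
One possible approach, sketched here as a delegating wrapper (the wrapper is hypothetical, not the actual patch; the alternative is locking inside each implementation):

{code:java}
import java.io.IOException;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

import org.apache.hadoop.crypto.key.KeyProvider;

// Hypothetical wrapper: reads share a lock, mutations are exclusive.
public class ThreadSafeKeyProvider {
  private final KeyProvider delegate;
  private final ReadWriteLock lock = new ReentrantReadWriteLock();

  public ThreadSafeKeyProvider(KeyProvider delegate) {
    this.delegate = delegate;
  }

  public KeyProvider.KeyVersion getKeyVersion(String name)
      throws IOException {
    lock.readLock().lock();
    try {
      return delegate.getKeyVersion(name);
    } finally {
      lock.readLock().unlock();
    }
  }

  public void flush() throws IOException {
    lock.writeLock().lock();
    try {
      delegate.flush();
    } finally {
      lock.writeLock().unlock();
    }
  }
}
{code}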





[jira] [Created] (HADOOP-10428) JavaKeyStoreProvider should accept keystore password via configuration falling back to ENV VAR

2014-03-24 Thread Alejandro Abdelnur (JIRA)
Alejandro Abdelnur created HADOOP-10428:
---

 Summary:   JavaKeyStoreProvider should accept keystore password 
via configuration falling back to ENV VAR
 Key: HADOOP-10428
 URL: https://issues.apache.org/jira/browse/HADOOP-10428
 Project: Hadoop Common
  Issue Type: Improvement
  Components: security
Affects Versions: 3.0.0
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur


Currently the password for the {{JavaKeyStoreProvider}} must be set in an ENV 
VAR.

Allowing the password to be set via configuration enables applications to 
interactively ask for the password before initializing the 
{{JavaKeyStoreProvider}}.
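
A sketch of the proposed lookup order (the property and variable names here are assumptions, not necessarily what the patch uses):

{code:java}
// Hypothetical lookup: configuration first, ENV VAR as the fallback.
private char[] getKeystorePassword(Configuration conf) {
  String pass = conf.get("hadoop.security.keystore.password");  // assumed name
  if (pass == null) {
    pass = System.getenv("HADOOP_KEYSTORE_PASSWORD");           // assumed name
  }
  return pass != null ? pass.toCharArray() : null;
}
{code}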





[jira] [Created] (HADOOP-10429) KeyStores should have methods to generate the materials themselves, KeyShell should use them

2014-03-24 Thread Alejandro Abdelnur (JIRA)
Alejandro Abdelnur created HADOOP-10429:
---

 Summary: KeyStores should have methods to generate the materials 
themselves, KeyShell should use them
 Key: HADOOP-10429
 URL: https://issues.apache.org/jira/browse/HADOOP-10429
 Project: Hadoop Common
  Issue Type: Improvement
  Components: security
Affects Versions: 3.0.0
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur


Currently, the {{KeyProvider}} API expects the caller to provide the key 
material, and the {{KeyShell}} generates that material itself.

For security reasons, {{KeyProvider}} implementations may want to generate and 
hide (from the user generating the key) the key materials.
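
A sketch of provider-side generation using the JCE (the method name and its placement are assumptions):

{code:java}
import java.security.NoSuchAlgorithmException;
import javax.crypto.KeyGenerator;

// Hypothetical: the provider creates the material itself, so it never
// has to pass through (or be visible to) the user creating the key.
public byte[] generateMaterial(String cipher, int bitLength)
    throws NoSuchAlgorithmException {
  KeyGenerator keyGenerator = KeyGenerator.getInstance(cipher);  // e.g. "AES"
  keyGenerator.init(bitLength);                                  // e.g. 128
  return keyGenerator.generateKey().getEncoded();
}
{code}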





[jira] [Created] (HADOOP-10430) KeyProvider Metadata should have an optional label, there should be a method to retrieve the metadata from all keys

2014-03-24 Thread Alejandro Abdelnur (JIRA)
Alejandro Abdelnur created HADOOP-10430:
---

 Summary: KeyProvider Metadata should have an optional label, there 
should be a method to retrieve the metadata from all keys
 Key: HADOOP-10430
 URL: https://issues.apache.org/jira/browse/HADOOP-10430
 Project: Hadoop Common
  Issue Type: Improvement
  Components: security
Affects Versions: 3.0.0
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur


Being able to attach an optional label (and show it when displaying metadata) 
would make it possible to give some context for each key.
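
A sketch of the shape such an addition could take (the names are guesses at the eventual API):

{code:java}
// Hypothetical additions to the KeyProvider API; names illustrative.
public static class Metadata {
  private String description;  // the optional label
  public String getDescription() { return description; }
}

// Bulk retrieval, so a listing UI can show context for every key
// with a single provider round trip.
public abstract Metadata[] getKeysMetadata(String... keyNames)
    throws IOException;
{code}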





[jira] [Created] (HADOOP-10431) Change visibility of KeyStore KeyVersion/Metadata/Options constructor and methods to public

2014-03-24 Thread Alejandro Abdelnur (JIRA)
Alejandro Abdelnur created HADOOP-10431:
---

 Summary: Change visibility of KeyStore KeyVersion/Metadata/Options 
constructor and methods to public
 Key: HADOOP-10431
 URL: https://issues.apache.org/jira/browse/HADOOP-10431
 Project: Hadoop Common
  Issue Type: Improvement
  Components: security
Affects Versions: 3.0.0
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur


Making the KeyVersion/Metadata/Options constructors and methods public will 
make it easier for {{KeyProvider}} implementations to use those classes.


