[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13417908#comment-13417908
 ] 

Todd Lipcon commented on MAPREDUCE-4417:
----------------------------------------

- the reformatting of ssl-client.xml.example and ssl-server.xml.example makes 
it a little hard to read the diff. Is it necessary to reindent, etc?

- Style:
{code}
+ * if the trust certificates keystore file changes, the trustmanager
+ * is refreshed with the new trust certificate entries (using a
+ * {@link ReloadingX509TrustManager} trustmanager).
{code}

Formatting can be improved here - eg TrustManager is a java class, so should 
probably {@link} it to make it clear you're talking about a specific class and 
not an abstract concept. (as someone who doesn't know the SSL APIs well, it 
would make it easier to read)

----

{code}
+    SSLFactory.Mode mode =
+      SSLFactory.Mode.valueOf(conf.get(SSLFactory.SSL_FACTORY_MODE));
{code}
Why are we passing the factory mode through the configuration, instead of just 
making it a parameter for init()? It seems a little fragile/unnecessary, and a 
bit confusing since it's not a parameter that the user sets.

----

{code}
+    String keystoreType =
+      conf.get(resolvePropertyName(mode, SSL_KEYSTORE_TYPE_TPL), "jks");
{code}

What's jks? You also use the term "jks" in the conf files, but I don't know 
what it refers to (again, as an SSL n00b). Improvements:
-- in the config file where you say "default value is 'jks'", add "which 
enables the blah blah type key store" and some reference to what it means?
-- in the code, add a constant SSL_KEYSTORE_TYPE_DEFAULT, and javadoc with a 
pointer to what jks is.


----
{code}
+      String keystoreLocation = conf.get(
+        resolvePropertyName(mode, SSL_KEYSTORE_LOCATION_TPL), "");
+      keystorePassword = conf.get(
+        resolvePropertyName(mode, SSL_KEYSTORE_PASSWORD_TPL), 
"").toCharArray();
+
+      LOG.debug(mode.toString() + " KeyStore: " + keystoreLocation);
+
+      InputStream is = new FileInputStream(keystoreLocation);
{code}

If this property isn't set, you'll end up passing an empty string to the 
FileInputStream constructor, which will end up giving a hard-to-diagnose 
message. Check whether {{keystoreLocation.isEmpty()}}, and if it is, throw an 
appropriate exception including the config name.

Same goes for {{trustStoreLocation}}

----

Style nit: you have several javadocs for getters which are redundant, eg:
{code}
+  /**
+   * Returns the trustmanagers for trusted certificates.
+   *
+   * @return the trustmanagers for trusted certificates.
+   */
{code}
No need to repeat yourself twice - just have the @return line in the javadoc 
and not the line above it, IMO.

----
{code}
+   * @param type type of truststore file, typically 'JKS'.
{code}
Elsewhere in the code you have "jks" (lower case). Is it case sensitive?

----
{code}
+    } catch (Exception ex) {
+      trustManagerRef.set(null);
+      LOG.warn("Could not load truststore, using empty one : " + ex.toString(),
+               ex);
+    }
{code}
Why should you use an empty one? If the user configures a path to a trust 
store, and then starts up but the store can't be found, I don't think we should 
ignore their config. Better to bail out on startup. Then all of the null checks 
later on in this file could be removed.

----
{code}
+      FileInputStream in = new FileInputStream(file);
+      try {
+        ks.load(in, password.toCharArray());
+        lastLoaded = file.lastModified();
{code}
I think you need to set {{lastLoaded}} _before_ opening the file. Otherwise 
there's a race where you can miss a change to the file.

----
{code}
+    } catch (Exception ex) {
+      throw new RuntimeException(ex);
+    }
{code}
Maybe use {{Throwables.propagateIfPossible}} here to propagate IOException and 
GeneralSecurityException first? Seems strange to throw RTE for an IOE when you 
declare that the method throws IOE.

----
{code}
+  @SuppressWarnings({"InfiniteLoopStatement"})
+  public void run() {
{code}
There's really no way we can get a cleanup hook here to stop the thread at 
shutdown?

----
{code}
+        } catch (Exception ex) {
+          trustManagerRef.set(null);
+          LOG.warn("Could not load truststore, using empty one : " +
+                   ex.toString(),  ex);
+        }
{code}

If it fails to reload, why not stick to the previous version of the reference 
instead of falling back to empty?

----
{code}
+ * This SSLFactory uses a {@link ReloadingX509TrustManager} intance,
+ * which reloads public keys if the truststore file changes.
+ * <p/>
+ * This factory is used to configure HTTPS in Hadoop HTTP based endpoints, both
+ * client & server.
{code}
Typo: 'intance'
Style: don't abbreviate the word "and" as '&' -- it's invalid javadoc and also 
just harder to read.

----
{code}
+  public enum Mode { CLIENT, SERVER }
{code}

This can be {{static}} right? Also, since it's an inner class, you need an 
{{@InterfaceAudience.Private}} on it, too, or else it shows up in the public 
javadoc. (unfortunately the annotation doesn't get inherited from its outer 
class)

----
{code}
+  public static final String SSL_ENABLED =
+    "hadoop.ssl.enabled";
...
+  public static final String KEYSTORES_FACTORY_CLASS =
+    "hadoop.ssl.keystores.factory.class";
{code}
Style: rename all these constants to end in {{_KEY}} so it's clear it's the 
conf keys and not the values themselves.

----
{code}
+    Configuration sslConf = new Configuration(false);
+    sslConf.setBoolean(SSL_REQUIRE_CLIENT_CERT, requireClientCert);
+    String sslConfResource;
+    if (mode == Mode.CLIENT) {
+        sslConfResource = conf.get(SSL_CLIENT_CONF, "ssl-client.xml");
+    } else {
+      sslConfResource = conf.get(SSL_SERVER_CONF, "ssl-server.xml");
+    }
+    sslConf.addResource(sslConfResource);
{code}
Move this into a private method {{readSslConfiguration(mode)}}? Also, 
indentation is off in one line here.

----
- Extract the creation of SSLHostnameVerifier into a new method, as well.

----
{code}
+<property>
+  <name>hadoop.ssl.enabled</name>
+  <value>false</value>
+  <description>Whether encrypted shuffle is enabled</description>
+</property>
{code}
If this is specific to encrypted shuffle, the name should reflect that, and it 
should be in mapred-default.xml, not core-default.xml

I wonder: is there a use case for having this setting per-job in some clusters? 
Either way, it should definitely be an MR config and not a core config.

----
{code}
+    The keystores factory to use for retriving certificates.
{code}
Typo: retriving

----
{code}
+
+public class KeyStoreUtil {
{code}
Rename to KeyStoreTestUtil, since this is a test-only class.

{code}
+    FileOutputStream out = new FileOutputStream(filename);
+    ks.store(out, password.toCharArray());
+    out.close();
{code}
Need try..finally. A few other places later in this same file that need this 
fix.

----

{code}
+    // Wait so that the file modification time is different
+    Thread.sleep((tm.getReloadInterval() + 2) * 1000);
{code}
You have this in a bunch of places in the test - but if you set the last 
modified time of the file, as you do elsewhere in the test, then you shouldn't 
have to sleep, except for waiting for it to _notice_ the reload. If you change 
the reload interval to be specified in millis instead of seconds, then you 
could set it to 10ms or so for the tests and these tests would run a lot faster.

----
- In TestSSLFactory, you use Assert.fail() in a lot of places after catching an 
Exception. Instead, just let the exception fall through which will fail the 
test, with the advantage that we'll actually have the stack trace of the 
exception instead of an unexplained failure message. In the cases where you 
expect an exception, use {{GenericTestUtils.assertExceptionContains}} to check 
the text.

----
{code}
+        writeFuture = ch.write(new ChunkedFile(spill, info.startOffset,
+                                               info.partLength, 8192));
{code}
What's 8192 here? Need a constant or config. If it's a buffer size, I'd think 
64K or 128K would probably perform better, based on my general experience with 
java IO.

----

- In the docs, under the ssh-client configuration, it references ssl-server.xml 
in one spot.
- Typo: "trutsstore" in one place.
- Typo: "will incurs in a significant"

----

General comment: what's the point of client certificates here? They're not a 
secret, since all users share them. I would think they'd need to be shipped 
with the job in the distributed-cache, if the use case is for cross-cluster 
authentication in tools like distcp, since different users may want to distcp 
from different clusters, and also have different access controls.
                
> add support for encrypted shuffle
> ---------------------------------
>
>                 Key: MAPREDUCE-4417
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4417
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>          Components: mrv2, security
>    Affects Versions: 2.0.0-alpha
>            Reporter: Alejandro Abdelnur
>            Assignee: Alejandro Abdelnur
>             Fix For: 2.1.0-alpha
>
>         Attachments: MAPREDUCE-4417.patch, MAPREDUCE-4417.patch, 
> MAPREDUCE-4417.patch, MAPREDUCE-4417.patch, MAPREDUCE-4417.patch, 
> MAPREDUCE-4417.patch, MAPREDUCE-4417.patch, MAPREDUCE-4417.patch, 
> MAPREDUCE-4417.patch, MAPREDUCE-4417.patch, MAPREDUCE-4417.patch
>
>
> Currently Shuffle fetches go on the clear. While Kerberos provides 
> comprehensive authentication for the cluster, it does not provide 
> confidentiality. 
> When processing sensitive data confidentiality may be desired (at the expense 
> of job performance and resources utilization for doing encryption).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Reply via email to