[ 
https://issues.apache.org/jira/browse/HADOOP-6311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13473757#comment-13473757
 ] 

Andy Isaacson commented on HADOOP-6311:
---------------------------------------

High level comments:
* We need a design doc explaining how all of these bits work together. This 
Jira has gone on long enough that it does not serve as documentation.
* I didn't review the red-black and splay tree implementations at all. I'm not 
sure why we expect this to be big/contended enough to deserve anything beyond a 
trivial hash table, which takes about 20 lines of C.  (ah, I see the code comes 
from \*BSD, so that's good at least.  We should document where and what version 
it came from for future maintainers' sanity.)

{code}
+++ 
hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/net/fd_server.h
...
+/**
+ * This file includes some common utilities 
+ * for all native code used in hadoop.
+ */
{code}
I don't think this comment is accurate.
{code}
+#include <stdint.h>
{code}
Please move {{#include}}s to the relevant {{.c}} unless they're needed in the 
{{.h}} directly. Doesn't look like it's needed here.
{code}
+    memset(&addr, 0, sizeof(struct sockaddr_un));
{code}
I prefer to say {{memset(&x, 0, sizeof\(x\))}} so that the code is clearly 
using the correct size. I don't feel too strongly about this though.
{code}
+    addr.sun_family = AF_UNIX;
+    if (bind(ud->sock, (struct sockaddr*)&addr, sizeof(sa_family_t)) < 0) {
{code}
This seems to be using the Linux-proprietary "abstract namespace".  If we do 
this it should be a Linux-specific name, not "unixDomainSock" which implies 
that the code is portable to other UNIXes such as Darwin/Mac OS or Solaris or 
FreeBSD.

The abstract socket API is documented at 
http://www.kernel.org/doc/man-pages/online/pages/man7/unix.7.html

(If I'm wrong and the abstract sockets are supported by other OSes then great! 
but I'm pretty sure they're not.)

Talking to Colin offline we confirmed that abstract sockets are Linux-specific, 
but he pointed out that {{unixDomainSockCreateAndBind}} handles both abstract 
sockets and named sockets (in the {{if(jpath)}} branch).  So the name is OK but 
we need a comment calling out the abstract socket use case.  The Linux-specific 
code will compile OK on other OSes, but it might be useful if the exception 
message says "your OS requires an explicit path" on non-Linux (using an 
{{#ifndef __linux__}} perhaps).

The control flow is a little confusing but not too bad, it could use a comment 
perhaps something like {{/* Client requested abstract socket (see unix(7) for 
details) by setting path = null. */}} in the abstract path case.

{code}
+  if (!jpath) {
... 20 lines of code
+  } else {
... 10 lines of code
+  }
{code}

I'd reorder them to {code}if (jpath) { ... } else { /* !jpath */ ... } {code} 
as it's one less bit-flip to think about.

Could you explain the benefits of using abstract sockets?  Why is it better 
than a named socket?  Ideally in a code comment near the implementation, or in 
this Jira.

{code}
+      jthr = newNativeIOException(env, errno, "error autobinding "
+                                  "PF_UNIX socket: ");
{code}
I don't recognize the phrase "autobinding".  Is that something specific to 
abstract sockets? if so it can be described in the documentation.
{code}
+      jthr = newNativeIOException(env, EIO, "getsockname():  "
+               "autobound abstract socket identifier started with / ");
{code}
Most of your newnativeIOException texts end with {{: "}} but this one ends with 
{{/ "}}. Best to be consistent.
{code}
+jthrowable unixDomainSetupSockaddr(JNIEnv *env, const char *id,
{code}
I think this function can be static, right?
{code}
+#define RETRY_ON_EINTR(ret, expr) do { \
+  ret = expr; \
+} while ((ret == -1) && (errno == EINTR));
{code}
This probably wants a maximum retry count (hardcoding 100 or thereabouts should 
be fine).

{code}
+static ssize_t safe_write(int fd, const void *b, size_t c)
+{
+  int res;
+
+  while (c > 0) {
+    res = write(fd, b, c);
+    if (res < 0) {
+      if (errno != EINTR)
+        return -errno;
+      continue;
+    }
+    c -= res;
+    b = (char *)b + res;
{code}
* I'd use a local {{char *p = b}} rather than having the cast in the loop.
* if write returns a too large value (which "cannot happen" but bugs happen) 
and c underflows, since it's unsigned the loop will spin forever (2^63 is 
forever).  I'd explicitly check {{if (res > c) return;}} before decrementing c.
* {{write(2)}} returns the number written.  Seems like we should do that here 
too.  If the user calls safe_write(100), we write 50 and then get ENOSPC, we 
should return 50 I think.  But I'm not sure, maybe that's not the right 
interface contract here.

{code}
+  if (!cmsg->cmsg_type == SCM_RIGHTS) {
{code}
Should be {{if (cmsg_type != SCM\_RIGHTS)}}.
{code}
+  if (setsockopt(sock, SOL_SOCKET, SO_RCVTIMEO, (char *)&timeout,
{code}
no need to cast the pointer, the fourth argument to {{setsockopt}} is {{void 
*}}.
{code}
+  /** The raw FileDescriptor */
+  int rawFd;
{code}
I think "raw" means "Posix filedescriptor" here?
{code}
+  pFd->rawFd = dup(rawFd);
{code}
Is there any chance we'll see races with fseek on these filedescriptors?  
Unfortunately two {{dup()}}ed fds share a single file offset pointer.  If this 
is something clients need to worry about we'll have to document it very 
prominently.

{code}
+  // Unpublish any remaining file descriptors.
+       RB_FOREACH_SAFE(pFd, publishedFdsByCookie,
+                  &data->publishedByCookie, pFdTmp) {
+    RB_REMOVE(publishedFdsByCookie, &data->publishedByCookie, pFd);
+    publishedFdFree(env, pFd);
+       }
{code}
there's some indentation craziness going on here.

{code}
+static jlong generateRandomCookie(void)
+{
+  uint64_t lo = ((uint64_t)mrand48()) & 0xffffffffULL;
+  uint64_t hi = ((uint64_t)mrand48()) << 32;
+  return (jlong)(lo | hi); 
{code}
Given that this is potentially a security-critical cookie, we should use 
/dev/urandom.
(not /dev/random due to the entropy pool blocking issues.)
{code}
+    // See man cmsg3) for more details about how 'ancillary data' works.
{code}
missing a ( I think.  I would write {{See cmsg(3) for}}.
{code}

Overall the code is pretty solid.  I wish we were not open-coding so much JNI 
argument string stuff, it's just asking for trouble, but I don't have a better 
option.  I'm not entirely convinced that this isn't overdesigned, but I'll wait 
for the design doc before coming to conclusions.
                
> Add support for unix domain sockets to JNI libs
> -----------------------------------------------
>
>                 Key: HADOOP-6311
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6311
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: native
>    Affects Versions: 0.20.0
>            Reporter: Todd Lipcon
>            Assignee: Colin Patrick McCabe
>         Attachments: 6311-trunk-inprogress.txt, HADOOP-6311.014.patch, 
> HADOOP-6311.016.patch, HADOOP-6311.018.patch, HADOOP-6311.020b.patch, 
> HADOOP-6311.020.patch, HADOOP-6311.021.patch, HADOOP-6311-0.patch, 
> HADOOP-6311-1.patch, hadoop-6311.txt
>
>
> For HDFS-347 we need to use unix domain sockets. This JIRA is to include a 
> library in common which adds a o.a.h.net.unix package based on the code from 
> Android (apache 2 license)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to