[ https://issues.apache.org/jira/browse/HADOOP-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13004258#comment-13004258 ]
Greg Roelofs commented on HADOOP-7156:
--------------------------------------

Doh! FF crashed while I was replying, sigh. Switching to e-mail:

bq. In my experience, we do a really bad job of keeping the wiki up to date. Greg, what do you think?

I agree--we're much better at keeping the code up to date (frequently in parallel across multiple branches ;-) ) than at keeping the wiki current. I think the XML config text is fine; you could optionally prefix it with "As of March 2011, systems known to ..." as a hint to users or future versions of us to recheck it if significant time has passed.

The comment in NativeIO.c probably should be modified; perhaps "monitor used for working around a bug in the sssd security daemon, which was observed in getpwuid_r() on RHEL 6.0," or words to that effect. (Need not be that verbose, of course.)

I also agree with Eli that we can leave the workaround disabled for tests. It might be worthwhile to add a log message at the start that "this test may fail (crash) with an invalid free() on some systems; see HADOOP-7156 for details." Again, feel free to word it however you wish.

Trivial grammo: "workaround" is a noun; the verb form is "work around" (similar to layout, backup, setup, cleanup, checkin, cutoff, etc.). The various variable names would be more proper if they reflected this (e.g., WORK_AROUND_NON_THREADSAFE_CALLS_KEY, workAroundNonThreadSafePasswdCalls [or workAroundNonThreadsafePasswdCalls, since you're using "threadsafe" as a single word elsewhere]), but I won't fuss if you leave them as is.
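
For readers who have not opened the patch: the "monitor" idea discussed above amounts to serializing calls into getpwuid_r() behind one process-wide lock. The following is only a rough C sketch of that idea, not the code from hadoop-7156.txt. The names passwd_lock, lookup_user_name, and the work_around flag are made up for illustration, and the fixed 16 KB scratch buffer is an assumption.

```c
#include <errno.h>
#include <pthread.h>
#include <pwd.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical lock: used for working around a bug in the sssd security
 * daemon, which was observed in getpwuid_r() on RHEL 6.0. */
static pthread_mutex_t passwd_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Copy the user name for `uid` into `out` (at most `outlen` bytes).
 * Returns 0 on success, ENOENT if no passwd entry exists, ERANGE if a
 * buffer is too small, or another errno value reported by getpwuid_r().
 * When `work_around` is nonzero, the lookup runs under passwd_lock so
 * that only one thread at a time is inside the NSS backend.
 */
int lookup_user_name(uid_t uid, char *out, size_t outlen, int work_around)
{
    struct passwd pwd;
    struct passwd *result = NULL;
    char buf[16384];            /* assumed large enough for one entry */
    int rc;

    if (work_around)
        pthread_mutex_lock(&passwd_lock);

    rc = getpwuid_r(uid, &pwd, buf, sizeof(buf), &result);
    if (rc == 0 && result == NULL)
        rc = ENOENT;            /* call succeeded but found no entry */
    if (rc == 0 && snprintf(out, outlen, "%s", result->pw_name) >= (int)outlen)
        rc = ERANGE;            /* name did not fit in the caller's buffer */

    if (work_around)
        pthread_mutex_unlock(&passwd_lock);

    return rc;
}
```

A caller would pass the value of the configuration switch as work_around, e.g. lookup_user_name(uid, name, sizeof(name), 1). With the flag off, the function behaves like a plain getpwuid_r() wrapper, which is what the tests would exercise if the workaround stays disabled there.
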
> getpwuid_r is not thread-safe on RHEL6
> --------------------------------------
>
> Key: HADOOP-7156
> URL: https://issues.apache.org/jira/browse/HADOOP-7156
> Project: Hadoop Common
> Issue Type: Bug
> Affects Versions: 0.22.0
> Environment: RHEL 6.0 "Santiago"
> Reporter: Todd Lipcon
> Assignee: Todd Lipcon
> Priority: Critical
> Fix For: 0.22.0
>
> Attachments: hadoop-7156.txt, hadoop-7156.txt, hadoop-7156.txt
>
>
> Due to the following bug in SSSD, functions like getpwuid_r are not
> thread-safe in RHEL 6.0 if sssd is specified in /etc/nsswitch.conf (as it is
> by default):
> https://fedorahosted.org/sssd/ticket/640
> This causes many fetch failures in the case that the native libraries are
> available, since the SecureIO functions call getpwuid_r as part of fstat. By
> enabling -Xcheck:jni I get the following trace on JVM crash:
> *** glibc detected *** /mnt/toolchain/JDK6u20-64bit/bin/java: free(): invalid
> pointer: 0x0000003575741d23 ***
> ======= Backtrace: =========
> /lib64/libc.so.6[0x3575675676]
> /lib64/libnss_sss.so.2(_nss_sss_getpwuid_r+0x11b)[0x7fe716cb42cb]
> /lib64/libc.so.6(getpwuid_r+0xdd)[0x35756a5dfd]

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira