[
https://issues.apache.org/jira/browse/HDFS-10183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Pavel Avgustinov updated HDFS-10183:
Attachment: HDFS-10183.2.patch
Adding v2 of the patch, which resolves the new checkstyle warnings by renaming
the newly {{final}} fields to uppercase. I'm a bit mystified by the test
failures, and for me locally the test suite doesn't pass with or without the
patch. I'll try to diagnose, but if the answer is obvious to anyone I'd
appreciate a heads-up.
[~sjlee0], good point about
[JLS12.4.2|https://docs.oracle.com/javase/specs/jls/se8/html/jls-12.html#jls-12.4.2].
It indeed suggests that something slightly more subtle is going on, as the
situation I described in the original issue would not lead to a data race --
the acquisition of the class loading lock would order the write to the field in
whichever thread wins the initialization race and the read in the other
thread(s).
I suspect the answer is actually in the previous section
([JLS12.4.1|https://docs.oracle.com/javase/specs/jls/se8/html/jls-12.html#jls-12.4.1]),
which describes when class loading happens (i.e. when the procedure in 12.4.2
applies). In particular, it says "A class or interface type T will be
initialized immediately before the first occurrence of \[an instance creation
or static member access\]". In other words, if the JVM can tell that a class
has already been initialized, it does not need to perform the procedure of
12.4.2.
With this in mind, a potential explanation for the problem is this: One thread
initializes the class with the non-{{final}} {{ThreadLocal}} field, and another
tries to use it shortly after, but without triggering its own initialization.
Because it does not acquire the class loading lock, there is no happens-before
relationship between its read of the static field and the write that happened
during class initialization, and if we're unlucky the write has not in fact
propagated.
I tried to come up with a small example demonstrating the issue, but couldn't
trigger it. I speculate that it needs a certain critical mass of complexity in
order to make the HotSpot compiler kick in and apply the necessary
optimizations. The stack trace shown by [~busbey] in HADOOP-11969 is a clear
example of it being a problem in practice, though -- in the state of the tree
before that fix, the only write to the {{DECODER_FACTORY}} field is the field
initializer (which sets it to non-{{null}}), and so the exception observed
would be impossible without a data race along the lines described above.
> Prevent race condition during class initialization
> --
>
> Key: HDFS-10183
> URL: https://issues.apache.org/jira/browse/HDFS-10183
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: fs
>Affects Versions: 2.9.0
>Reporter: Pavel Avgustinov
>Assignee: Pavel Avgustinov
>Priority: Minor
> Fix For: 2.9.0
>
> Attachments: HADOOP-12944.1.patch, HDFS-10183.2.patch
>
>
> In HADOOP-11969, [~busbey] tracked down a non-deterministic
> {{NullPointerException}} to an oddity in the Java memory model: When multiple
> threads trigger the loading of a class at the same time, one of them wins and
> creates the {{java.lang.Class}} instance; the others block during this
> initialization, but once it is complete they may obtain a reference to the
> {{Class}} which has non-{{final}} fields still containing their default (i.e.
> {{null}}) values. This leads to runtime failures that are hard to debug or
> diagnose.
> HADOOP-11969 observed that {{ThreadLocal}} fields, by their very nature, are
> very likely to be accessed from multiple threads, and thus the problem is
> particularly severe there. Consequently, the patch removed all occurrences of
> the issue in the code base.
> Unfortunately, since then HDFS-7964 has [reverted one of the fixes during a
> refactoring|https://github.com/apache/hadoop/commit/2151716832ad14932dd65b1a4e47e64d8d6cd767#diff-0c2e9f7f9e685f38d1a11373b627cfa6R151],
> and introduced a [new instance of the
> problem|https://github.com/apache/hadoop/commit/2151716832ad14932dd65b1a4e47e64d8d6cd767#diff-6334d0df7d9aefbccd12b21bb7603169R43].
> The attached patch addresses the issue by adding the missing {{final}}
> modifier in these two cases.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)