Hi Eirik,

The implementation of ThreadLocal is based on a hash map (its internal ThreadLocalMap):
https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/ThreadLocal.java#L76

Currently it is "impossible" for the JIT compiler to reliably know that the value stored by set() in the hash map is the same one that a later get() reads.

Also, because the code of the ThreadLocal accessors is complex, some calls may not be inlined. The JIT does not know what side effects a non-inlined call may have, so it assumes the call can modify the value.
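
To illustrate the second point, here is a minimal sketch of why the compiler has to be conservative. The class name, the CONTEXT field, and opaqueCall() are hypothetical and only stand in for "some call the JIT could not inline"; this is not code from the JDK.

```java
public class ThreadLocalAliasing {

    // Hypothetical thread-local used only for this illustration.
    static final ThreadLocal<String> CONTEXT = new ThreadLocal<>();

    // Stands in for any call the JIT failed to inline. From the caller's
    // point of view it is a black box that may write to the thread-local.
    static void opaqueCall() {
        CONTEXT.set("changed");
    }

    static String readAfterCall() {
        CONTEXT.set("original");
        opaqueCall();
        // Replacing this get() with "original" would be wrong here,
        // because opaqueCall() overwrote the value in between.
        return CONTEXT.get();
    }

    public static void main(String[] args) {
        System.out.println(readAfterCall()); // prints "changed"
    }
}
```

Unless the compiler can see through every call between set() and get(), it cannot rule out a case like this one.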

Thanks,
Vladimir K

On 2/22/21 3:26 AM, Eirik Bjørsnøs wrote:
Hello,

ThreadLocals are commonly used to transfer state across method boundaries
in situations where passing them as method parameters is impractical.
Examples include crossing APIs you don't own, or instrumenting code you
don't own.

Consider the following pseudo-instrumented code (where the original code
calls a getter inside a loop):

public class Student {

     private int age;

     public int maxAge(Student[] students) {

         // Instrumented code:
         ExpensiveObject expensive = new ExpensiveObject();
         expensive.recordSomething();
         threadLocal.set(expensive);

         // Original code:
         int max = 0;
         for (Student student : students) {
             max = Math.max(max, student.getAge());
         }
         return max;
     }

     public int getAge() {
         // Instrumented code
         ExpensiveObject exp = threadLocal.get();
         exp.recordSomething();

         // Original code:
         return age;
     }

     // Instrumented field:
     private static ThreadLocal<ExpensiveObject> threadLocal = new ThreadLocal<>();
}

The ThreadLocal is used here to avoid constructing ExpensiveObject
instances in each invocation of getAge.

However, once a compiler worth its salt sees this code, it immediately
wants to inline the getAge method:

// Instrumented code:
ExpensiveObject expensive = new ExpensiveObject();
expensive.recordSomething();
threadLocal.set(expensive);

for (Student student : students) {
     // Instrumented code
     ExpensiveObject exp = threadLocal.get();
     exp.recordSomething();
     // Original code
     max = Math.max(max, student.age);
}

At this point, we see that the last write to threadLocal is 'expensive', so
any following 'threadLocal.get()' could be replaced with 'expensive'.
So we could do the following instead:

for (Student student : students) {
     // Instrumented code
     expensive.recordSomething();
     // Original code
     max = Math.max(max, student.age);
}

More generally, a compiler could record the first lookup of a ThreadLocal
in a scope and substitute any following lookup with the first read (until
the next write).
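
For comparison, this is roughly what the same substitution looks like when written by hand. It is only a sketch: the HoistedLookup class, the Counter type (a stand-in for ExpensiveObject) and the int[] input are hypothetical and chosen to keep the example self-contained.

```java
public class HoistedLookup {

    // Hypothetical stand-in for ExpensiveObject from the example above.
    static final class Counter {
        int calls;

        void recordSomething() {
            calls++;
        }
    }

    static final ThreadLocal<Counter> COUNTER = new ThreadLocal<>();

    static int maxAge(int[] ages) {
        // Instrumented code: publish the counter for this thread.
        COUNTER.set(new Counter());

        // First lookup in this scope. It is reused for every iteration,
        // which is only valid because nothing in the loop writes to COUNTER.
        Counter cached = COUNTER.get();

        int max = 0;
        for (int age : ages) {
            cached.recordSomething(); // no per-iteration ThreadLocal lookup
            max = Math.max(max, age);
        }
        return max;
    }

    public static void main(String[] args) {
        System.out.println(maxAge(new int[] {18, 25, 21})); // prints 25
        System.out.println(COUNTER.get().calls);            // prints 3
    }
}
```

The hand-written version relies on me knowing that the loop body never writes to the ThreadLocal; the question is whether a compiler could establish the same thing on its own.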

I'm pretty sure this would be immensely useful for my current use case
(which instruments methods to count invocations), but perhaps it is also a
useful optimization in a more general sense? Examples that come to mind are
enterprise apps where transaction and security contexts are passed around
using ThreadLocals.

Has this type of optimization been discussed before? Is it even possible to
implement, or did I miss some dragons hiding in the details? What would the
estimated work for an implementation look like? Are we looking at a
bachelor's thesis? Master's thesis? PhD?

Would love to hear some thoughts on this idea.

Cheers,
Eirik.
