On Wed, 21 Aug 2024 14:21:39 GMT, Markus Grönlund <mgron...@openjdk.org> wrote:

>> Thread.currentThread() has an intrinsic, and isVirtual is just a type check. 
>> ContinuationSupport.isSupported reads a static final so will disappear once 
>> compiled. The pattern we are using in other areas is for the pin to return a 
>> boolean (like David suggested).
>
> I looked into this in more detail. The current suggestion:
> 
> mov    r10,QWORD PTR [r15+0x388]   ; _vthread OopHandle
> mov    r10,QWORD PTR [r10]         ; dereference OopHandle <-- Thread.currentThread() intrinsic gives 2 instructions
> mov    r11d,DWORD PTR [r10+0x8]    ; InstanceKlass to r11 <-- isVirtual()
> mov    r10d,r11d                   ; InstanceKlass to r10
> mov    r8,QWORD PTR [r10+0x40]     ; load slot in InstanceKlass primary supers array to r8
> movabs r10,0x2d0481a8              ; InstanceKlass for {metadata('java/lang/BaseVirtualThread')} to r10
> cmp    r8,r10                      ; compare if superklass is java/lang/BaseVirtualThread
> jne    0x0000018571e0baf9          ; 6 instructions for isVirtual() type check, 8 instructions in total
> 
> This gives a prologue of eight instructions.
> 
> For JFR, we already have much of this information resolved when loading up 
> the EventWriter instance using the existing intrinsic getEventWriter(). 
> Hence, we could extend that to mark the event writer with a field to say if 
> pinning should be performed. This results in only a two instruction prologue:
> 
> test   r8d,r8d                         ; pinVirtualThread? 
> je     0x0000012580a0f6c9    ; 2 instructions for test
> 
> This is about a 4x reduction in prologue instructions (eight down to two), 
> although the net gain is slightly less because loading the event writer 
> now needs an additional store instruction.
> 
> Further, I looked into the Continuation.pin() and Continuation.unpin() 
> methods. They are currently not intrinsics, but lend themselves well to 
> intrinsification. I have created such intrinsics, and the results are quite 
> good.
> 
> Continuation.pin() or Continuation.unpin() without intrinsics = 112 instructions each
> Continuation.pin() or Continuation.unpin() with intrinsics = 8 instructions each
> 
> This is a 14x reduction in instruction count (112 down to 8) for virtual threads.

I plan to fix the event writer under this PR (to be updated) and file a 
separate tracking enhancement for the intrinsification of Continuation.pin() 
and Continuation.unpin().
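
To make the two prologue shapes discussed above concrete, here is a rough Java sketch. It is not the JFR or Continuation code: PinSketch, pinIfNeeded, Writer and the pinVirtualThread field are hypothetical names, and a ThreadLocal counter stands in for the VM's pin state. The isVirtual and continuationsSupported parameters stand in for Thread.currentThread().isVirtual() and ContinuationSupport.isSupported().

```java
// Hypothetical stand-ins only; none of this is the JDK's internal code.
final class PinSketch {
    // Stand-in for the per-thread pin count kept by the VM.
    private static final ThreadLocal<int[]> PIN_COUNT =
            ThreadLocal.withInitial(() -> new int[1]);

    // Shape 1: pin() returns a boolean, so the caller only unpins
    // when a pin actually happened. Every call repeats both checks.
    static boolean pinIfNeeded(boolean isVirtual, boolean continuationsSupported) {
        if (isVirtual && continuationsSupported) {
            PIN_COUNT.get()[0]++;
            return true;
        }
        return false;
    }

    static void unpin() {
        int[] count = PIN_COUNT.get();
        if (count[0] == 0) {
            throw new IllegalStateException("not pinned");
        }
        count[0]--;
    }

    static int pinCount() {
        return PIN_COUNT.get()[0];
    }

    // Shape 2: the decision is made once, when the writer is obtained,
    // and stored in a field. The hot path then tests a single boolean.
    static final class Writer {
        final boolean pinVirtualThread;

        Writer(boolean pinVirtualThread) {
            this.pinVirtualThread = pinVirtualThread;
        }

        // Runs body, pinned if the cached flag says so; returns whether it pinned.
        boolean commit(Runnable body) {
            boolean pinned = pinVirtualThread;
            if (pinned) {
                PIN_COUNT.get()[0]++;
            }
            try {
                body.run();
            } finally {
                if (pinned) {
                    unpin();
                }
            }
            return pinned;
        }
    }
}
```

In the second shape the flag would be set by whatever code hands out the writer, which is why an extra store shows up there while the per-event check shrinks to a test and a branch.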

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20588#discussion_r1725150866
