Page faults in mapped file i/o and counted loops are certainly two common 
causes of long TTSP. But there are many other paths that *could* cause it 
as well in HotSpot. Without catching it and looking at the stack trace, 
it's hard to know which ones to blame. Once you knock out one cause, you'll 
see if there is another.

In the specific stack trace you showed [assuming that trace was taken 
during a long TTSP], mapped file i/o is the most likely culprit. Your trace 
seems to be around making the page write-able for the first time and 
updating the file time (which takes a lock), but even without needing the 
lock, the fault itself could end up waiting for the i/o to complete (read 
page from disk), and that (when Murh=phy pays you a visit) can end up 
waiting behind 100s other i/o operations (e.g. when your i/o happens at the 
same time the kernel decided to flush some dirty pages in the cache), 
leading to TTSPs in the 100s of msec.

As I'm sure you already know, one simple way to get around mapped file 
related TTSP is to not used mapped files. Explicit random i/o calls are 
always done while at a safepoint, so they can't cause high TTSPs.

On Tuesday, December 5, 2017 at 10:30:57 AM UTC+1, Mark Price wrote:
>
> Hi Aleksey,
> thanks for the response. The I/O is definitely one problem, but I was 
> trying to figure out whether it was contributing to the long TTSP times, or 
> whether I might have some code that was misbehaving (e.g. NonCountedLoops).
>
> Your response aligns with my guesswork, so hopefully I just have the one 
> problem to solve ;)
>
>
>
> Cheers,
>
> Mark
>
> On Tuesday, 5 December 2017 09:24:33 UTC, Aleksey Shipilev wrote:
>>
>> On 12/05/2017 09:26 AM, Mark Price wrote: 
>> > I'm investigating some long time-to-safepoint pauses in oracle/openjdk. 
>> The application in question 
>> > is also suffering from some fairly nasty I/O problems where 
>> latency-sensitive threads are being 
>> > descheduled in uninterruptible sleep state due to needing a file-system 
>> lock. 
>> > 
>> > My question: can the JVM detect that a thread is in 
>> signal/interrupt-handler code and thus treat it 
>> > as though it is at a safepoint (as I believe happens when a thread is 
>> in native code via a JNI call)? 
>> > 
>> > For instance, given the stack trace below, will the JVM need to wait 
>> for the thread to be scheduled 
>> > back on to CPU in order to come to a safepoint, or will it be treated 
>> as "in-native"? 
>> > 
>> >         7fff81714cd9 __schedule ([kernel.kallsyms]) 
>> >         7fff817151e5 schedule ([kernel.kallsyms]) 
>> >         7fff81717a4b rwsem_down_write_failed ([kernel.kallsyms]) 
>> >         7fff813556e7 call_rwsem_down_write_failed ([kernel.kallsyms]) 
>> >         7fff817172ad down_write ([kernel.kallsyms]) 
>> >         7fffa0403dcf xfs_ilock ([kernel.kallsyms]) 
>> >         7fffa04018fe xfs_vn_update_time ([kernel.kallsyms]) 
>> >         7fff8122cc5d file_update_time ([kernel.kallsyms]) 
>> >         7fffa03f7183 xfs_filemap_page_mkwrite ([kernel.kallsyms]) 
>> >         7fff811ba935 do_page_mkwrite ([kernel.kallsyms]) 
>> >         7fff811bda74 handle_pte_fault ([kernel.kallsyms]) 
>> >         7fff811c041b handle_mm_fault ([kernel.kallsyms]) 
>> >         7fff8106adbe __do_page_fault ([kernel.kallsyms]) 
>> >         7fff8106b0c0 do_page_fault ([kernel.kallsyms]) 
>> >         7fff8171af48 page_fault ([kernel.kallsyms]) 
>> >         ---- java stack trace ends here ---- 
>>
>> I am pretty sure out-of-band page fault in Java thread does not yield a 
>> safepoint. At least because 
>> safepoint polls happen at given location in the generated code, because 
>> we need the pointer map as 
>> the part of the machine state, and that is generated by Hotspot (only) 
>> around the safepoint polls. 
>> Page faulting on random read/write insns does not have that luxury. Even 
>> if JVM had intercepted that 
>> fault, there is not enough metadata to work on. 
>>
>> The stacktrace above seems to say you have page faulted and this incurred 
>> disk I/O? This is 
>> swapping, I think, and all performance bets are off at that point. 
>>
>> Thanks, 
>> -Aleksey 
>>
>>

-- 
You received this message because you are subscribed to the Google Groups 
"mechanical-sympathy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to mechanical-sympathy+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Reply via email to