On Dec 5, 2017, at 1:26 PM, Mark Price <m...@aitusoftware.com> wrote:

>> That (each process having its own copy) is surprising to me. Unless the mapping is such that private copies are required, I'd expect the processes to share the page cache entries.
>
> I can't recreate this effect locally using FileChannel.map(); the library in use in the application uses a slightly more exotic route to get to mmap, so it could be a bug there; will investigate. I could also have been imagining it.
>
>> Is your pre-toucher thread a Java thread doing its pre-touching using mapped i/o in the same process? If so, then the pre-toucher thread itself will be a high TTSP causer. The trick is to do the pre-touch in a thread that is already at a safepoint (e.g. do your pre-touch using mapped i/o from within a JNI call, use another process, or do the retouch with non-mapped i/o).
>
> Yes, just a java thread in the same process; I hadn't considered that it would also cause long TTSP, but of course it's just as likely (or more likely) to be scheduled off due to a page fault. I could try using pwrite via FileChannel.write() to do the pre-touching, but I think it needs to perform a CAS (i.e. don't overwrite data that is already present), so a JNI method would be the only way to go. Unless just doing a FileChannel.position(writeLimit).read(buffer) would do the job? Presumably that is enough to load the page into the cache, and performing a write is unnecessary.

This (non-mapped reading at the write limit) will work to eliminate the actual page I/O impact on TTSP, but the time-update path with the lock that you show in your initial stack trace will probably still hit you. I'd go either with a JNI CAS, or a forked-off mapped Java pre-toucher as a separate process (tell it what you want touched via its stdin). Not sure which one is uglier.
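Mark's idea of touching pages with an explicit, non-mapped read can be sketched roughly as follows. This is a minimal illustration, not the code from the application under discussion: the class and method names are made up, and a 4 KiB page size is assumed. The point is that `FileChannel.read()` is a native call, so the thread is already "in native" (effectively at a safepoint) when the page fault happens.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public final class PreToucher {
    private static final int PAGE_SIZE = 4096; // assumed page size

    /**
     * Touch each page in [offset, offset + length) with an explicit positional
     * read. Any page fault taken here happens while the thread is in native
     * code, so it cannot hold up a global safepoint the way a fault taken on a
     * mapped-memory access in JIT-compiled code can.
     */
    public static void preTouch(FileChannel channel, long offset, long length)
            throws IOException {
        ByteBuffer oneByte = ByteBuffer.allocate(1);
        for (long pos = offset; pos < offset + length; pos += PAGE_SIZE) {
            oneByte.clear();
            channel.read(oneByte, pos); // positional read; channel position is untouched
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("pretouch", ".dat");
        try (FileChannel ch = FileChannel.open(
                file, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            ch.write(ByteBuffer.allocate(4 * PAGE_SIZE)); // 4 pages of zeroes
            preTouch(ch, 0, 4 * PAGE_SIZE);
            System.out.println("touched");
        } finally {
            Files.delete(file);
        }
    }
}
```

Note that, per Gil's caveat, this only pulls the pages into the page cache; the first actual write through the mapping can still hit the page-mkwrite/file-time-update path shown in the stack trace.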
> The pure java is more portable (for Unix/Linux variants at least).
>
> Cheers,
> Mark

On Tuesday, 5 December 2017 10:53:17 UTC, Gil Tene wrote:

Page faults in mapped file i/o and counted loops are certainly two common causes of long TTSP, but there are many other paths that *could* cause it as well in HotSpot. Without catching it and looking at the stack trace, it's hard to know which ones to blame. Once you knock out one cause, you'll see if there is another.

In the specific stack trace you showed (assuming that trace was taken during a long TTSP), mapped file i/o is the most likely culprit. Your trace seems to be around making the page writable for the first time and updating the file time (which takes a lock), but even without needing the lock, the fault itself could end up waiting for the i/o to complete (read page from disk), and that (when Murphy pays you a visit) can end up waiting behind 100s of other i/o operations (e.g. when your i/o happens at the same time the kernel decides to flush some dirty pages in the cache), leading to TTSPs in the 100s of msec.

As I'm sure you already know, one simple way to get around mapped-file-related TTSP is to not use mapped files. Explicit random i/o calls are always done while at a safepoint, so they can't cause high TTSPs.

On Tuesday, December 5, 2017 at 10:30:57 AM UTC+1, Mark Price wrote:

Hi Aleksey, thanks for the response. The I/O is definitely one problem, but I was trying to figure out whether it was contributing to the long TTSP times, or whether I might have some code that was misbehaving (e.g. non-counted loops). Your response aligns with my guesswork, so hopefully I just have the one problem to solve ;)

Cheers,
Mark

On Tuesday, 5 December 2017 09:24:33 UTC, Aleksey Shipilev wrote:

On 12/05/2017 09:26 AM, Mark Price wrote:
> I'm investigating some long time-to-safepoint pauses in oracle/openjdk.
> The application in question is also suffering from some fairly nasty I/O problems where latency-sensitive threads are being descheduled in uninterruptible sleep state due to needing a file-system lock.
>
> My question: can the JVM detect that a thread is in signal/interrupt-handler code and thus treat it as though it is at a safepoint (as I believe happens when a thread is in native code via a JNI call)?
>
> For instance, given the stack trace below, will the JVM need to wait for the thread to be scheduled back on to CPU in order to come to a safepoint, or will it be treated as "in-native"?
>
> 7fff81714cd9 __schedule ([kernel.kallsyms])
> 7fff817151e5 schedule ([kernel.kallsyms])
> 7fff81717a4b rwsem_down_write_failed ([kernel.kallsyms])
> 7fff813556e7 call_rwsem_down_write_failed ([kernel.kallsyms])
> 7fff817172ad down_write ([kernel.kallsyms])
> 7fffa0403dcf xfs_ilock ([kernel.kallsyms])
> 7fffa04018fe xfs_vn_update_time ([kernel.kallsyms])
> 7fff8122cc5d file_update_time ([kernel.kallsyms])
> 7fffa03f7183 xfs_filemap_page_mkwrite ([kernel.kallsyms])
> 7fff811ba935 do_page_mkwrite ([kernel.kallsyms])
> 7fff811bda74 handle_pte_fault ([kernel.kallsyms])
> 7fff811c041b handle_mm_fault ([kernel.kallsyms])
> 7fff8106adbe __do_page_fault ([kernel.kallsyms])
> 7fff8106b0c0 do_page_fault ([kernel.kallsyms])
> 7fff8171af48 page_fault ([kernel.kallsyms])
> ---- java stack trace ends here ----

I am pretty sure an out-of-band page fault in a Java thread does not yield a safepoint, at least because safepoint polls happen at given locations in the generated code: we need the pointer map as part of the machine state, and that is generated by HotSpot (only) around the safepoint polls. Page faulting on random read/write insns does not have that luxury. Even if the JVM had intercepted that fault, there is not enough metadata to work on.

The stacktrace above seems to say you have page faulted and this incurred disk I/O?
This is swapping, I think, and all performance bets are off at that point.

Thanks,
-Aleksey
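As an aside to Gil's point that you need to catch a long TTSP in the act before blaming a cause: HotSpot can itself report safepoint timings and name the threads that were slow to reach one. A rough sketch of the relevant flags follows; availability varies by JDK version (PrintSafepointStatistics was removed in newer JDKs in favour of unified logging), and `app.jar` is a placeholder for the application under test.

```shell
# JDK 8: log every safepoint, including time-to-safepoint ("spin"/"block" phases)
java -XX:+PrintGCApplicationStoppedTime \
     -XX:+PrintSafepointStatistics -XX:PrintSafepointStatisticsCount=1 \
     -jar app.jar

# JDK 9+: unified-logging equivalent
java -Xlog:safepoint -jar app.jar

# Name the threads that failed to reach a safepoint within 2000 ms
java -XX:+SafepointTimeout -XX:SafepointTimeoutDelay=2000 -jar app.jar
```

With the timeout variant, a thread stuck in a kernel page fault like the XFS trace above should show up as the offender that kept the safepoint waiting.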