On Dec 5, 2017, at 1:26 PM, Mark Price <m...@aitusoftware.com> wrote:

>> That (each process having its own copy) is surprising to me. Unless the mapping is such that private copies are required, I'd expect the processes to share the page cache entries.
>
> I can't recreate this effect locally using FileChannel.map(); the library in use in the application uses a slightly more exotic route to get to mmap, so it could be a bug there; will investigate. I could also have been imagining it.
>
>> Is your pre-toucher thread a Java thread doing its pre-touching using mapped i/o in the same process? If so, then the pre-toucher thread itself will be a high TTSP causer. The trick is to do the pre-touch in a thread that is already at a safepoint (e.g. do your pre-touch using mapped i/o from within a JNI call, use another process, or do the retouch with non-mapped i/o).
>
> Yes, just a java thread in the same process; I hadn't considered that it would also cause long TTSP, but of course it's just as likely (or more likely) to be scheduled off due to a page fault. I could try using pwrite via FileChannel.write() to do the pre-touching, but I think it needs to perform a CAS (i.e. don't overwrite data that is already present), so a JNI method would be the only way to go. Unless just doing a FileChannel.position(writeLimit).read(buffer) would do the job? Presumably that is enough to load the page into the cache, and performing a write is unnecessary.

This (non-mapped reading at the write limit) will work to eliminate the actual page I/O impact on TTSP, but the time-update path with the lock that you show in your initial stack trace will probably still hit you. I'd go either with a JNI CAS, or a forked-off mapped Java pre-toucher as a separate process (tell it what you want touched via its stdin). Not sure which one is uglier.
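Mark's idea of touching pages with an explicit, non-mapped read can be sketched roughly as follows. This is a minimal illustration, not the code from the application under discussion: the class and method names are made up, and a 4 KiB page size is assumed. The point is that `FileChannel.read()` is a native call, so the thread is already "in native" (effectively at a safepoint) when the page fault happens.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public final class PreToucher {
    private static final int PAGE_SIZE = 4096; // assumed page size

    /**
     * Touch each page in [offset, offset + length) with an explicit positional
     * read. Any page fault taken here happens while the thread is in native
     * code, so it cannot hold up a global safepoint the way a fault taken on a
     * mapped-memory access in JIT-compiled code can.
     */
    public static void preTouch(FileChannel channel, long offset, long length)
            throws IOException {
        ByteBuffer oneByte = ByteBuffer.allocate(1);
        for (long pos = offset; pos < offset + length; pos += PAGE_SIZE) {
            oneByte.clear();
            channel.read(oneByte, pos); // positional read; channel position is untouched
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("pretouch", ".dat");
        try (FileChannel ch = FileChannel.open(
                file, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            ch.write(ByteBuffer.allocate(4 * PAGE_SIZE)); // 4 pages of zeroes
            preTouch(ch, 0, 4 * PAGE_SIZE);
            System.out.println("touched");
        } finally {
            Files.delete(file);
        }
    }
}
```

Note that, per Gil's caveat, this only pulls the pages into the page cache; the first actual write through the mapping can still hit the page-mkwrite/file-time-update path shown in the stack trace.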
> The pure java is more portable (for Unix/Linux variants at least).
>
> Cheers,
> Mark

On Tuesday, 5 December 2017 10:53:17 UTC, Gil Tene wrote:

Page faults in mapped file i/o and counted loops are certainly two common causes of long TTSP, but there are many other paths that *could* cause it as well in HotSpot. Without catching it and looking at the stack trace, it's hard to know which ones to blame. Once you knock out one cause, you'll see if there is another.

In the specific stack trace you showed (assuming that trace was taken during a long TTSP), mapped file i/o is the most likely culprit. Your trace seems to be around making the page writable for the first time and updating the file time (which takes a lock), but even without needing the lock, the fault itself could end up waiting for the i/o to complete (read page from disk), and that (when Murphy pays you a visit) can end up waiting behind 100s of other i/o operations (e.g. when your i/o happens at the same time the kernel decides to flush some dirty pages in the cache), leading to TTSPs in the 100s of msec.

As I'm sure you already know, one simple way to get around mapped-file-related TTSP is to not use mapped files. Explicit random i/o calls are always done while at a safepoint, so they can't cause high TTSPs.

On Tuesday, December 5, 2017 at 10:30:57 AM UTC+1, Mark Price wrote:

Hi Aleksey, thanks for the response. The I/O is definitely one problem, but I was trying to figure out whether it was contributing to the long TTSP times, or whether I might have some code that was misbehaving (e.g. non-counted loops). Your response aligns with my guesswork, so hopefully I just have the one problem to solve ;)

Cheers,
Mark

On Tuesday, 5 December 2017 09:24:33 UTC, Aleksey Shipilev wrote:

On 12/05/2017 09:26 AM, Mark Price wrote:
> I'm investigating some long time-to-safepoint pauses in oracle/openjdk.
> The application in question is also suffering from some fairly nasty I/O problems where latency-sensitive threads are being descheduled in uninterruptible sleep state due to needing a file-system lock.
>
> My question: can the JVM detect that a thread is in signal/interrupt-handler code and thus treat it as though it is at a safepoint (as I believe happens when a thread is in native code via a JNI call)?
>
> For instance, given the stack trace below, will the JVM need to wait for the thread to be scheduled back on to CPU in order to come to a safepoint, or will it be treated as "in-native"?
>
> 7fff81714cd9 __schedule ([kernel.kallsyms])
> 7fff817151e5 schedule ([kernel.kallsyms])
> 7fff81717a4b rwsem_down_write_failed ([kernel.kallsyms])
> 7fff813556e7 call_rwsem_down_write_failed ([kernel.kallsyms])
> 7fff817172ad down_write ([kernel.kallsyms])
> 7fffa0403dcf xfs_ilock ([kernel.kallsyms])
> 7fffa04018fe xfs_vn_update_time ([kernel.kallsyms])
> 7fff8122cc5d file_update_time ([kernel.kallsyms])
> 7fffa03f7183 xfs_filemap_page_mkwrite ([kernel.kallsyms])
> 7fff811ba935 do_page_mkwrite ([kernel.kallsyms])
> 7fff811bda74 handle_pte_fault ([kernel.kallsyms])
> 7fff811c041b handle_mm_fault ([kernel.kallsyms])
> 7fff8106adbe __do_page_fault ([kernel.kallsyms])
> 7fff8106b0c0 do_page_fault ([kernel.kallsyms])
> 7fff8171af48 page_fault ([kernel.kallsyms])
> ---- java stack trace ends here ----

I am pretty sure an out-of-band page fault in a Java thread does not yield a safepoint, at least because safepoint polls happen at given locations in the generated code: we need the pointer map as part of the machine state, and that is generated by HotSpot (only) around the safepoint polls. Page faulting on random read/write insns does not have that luxury. Even if the JVM had intercepted that fault, there is not enough metadata to work on.

The stacktrace above seems to say you have page faulted and this incurred disk I/O?
This is swapping, I think, and all performance bets are off at that point.

Thanks,
-Aleksey
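As an aside to Gil's point that you need to catch a long TTSP in the act before blaming a cause: HotSpot can itself report safepoint timings and name the threads that were slow to reach one. A rough sketch of the relevant flags follows; availability varies by JDK version (PrintSafepointStatistics was removed in newer JDKs in favour of unified logging), and `app.jar` is a placeholder for the application under test.

```shell
# JDK 8: log every safepoint, including time-to-safepoint ("spin"/"block" phases)
java -XX:+PrintGCApplicationStoppedTime \
     -XX:+PrintSafepointStatistics -XX:PrintSafepointStatisticsCount=1 \
     -jar app.jar

# JDK 9+: unified-logging equivalent
java -Xlog:safepoint -jar app.jar

# Name the threads that failed to reach a safepoint within 2000 ms
java -XX:+SafepointTimeout -XX:SafepointTimeoutDelay=2000 -jar app.jar
```

With the timeout variant, a thread stuck in a kernel page fault like the XFS trace above should show up as the offender that kept the safepoint waiting.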