Re: JVM detection of thread at safepoint

2017-12-05 Thread Gil Tene
Alex, capturing the stack trace as the thread actually reaches a taken 
safepoint (via SafepointSynchronize::block) is definitely a cool trick. As 
a "where was I when I finally reached a safepoint after a long TTSP" 
tracer, it may well help in investigating the problematic code paths if you 
are able to deduce what they were by walking backwards from the captured 
trace point. But often this is not sufficient to figure out what causes the 
long TTSP; it helps raise suspicions, but does not catch the code "red-handed". 

E.g. when your long TTSP is caused by a page fault incurred during a 
mapped file access (which is a common case), the tracepoint 
for SafepointSynchronize::block will not show the faulting operation on the 
stack. Instead, the SafepointSynchronize::block trace will likely end up 
showing the stack trace for the back edge of a loop that contained 
(potentially deeply inlined) ByteBuffer.get() operations. E.g. when parsing 
the contents of a mapped file and taking a page fault, the trace would 
likely be in the body of a Base64.Decoder.decode(ByteBuffer buffer) call, 
with no direct hint that the cause of the TTSP is a mapped file, or a page 
fault. If you examined the stack trace you may be able to guess (from the 
overall context) that the decode call was made on a ByteBuffer that was 
actually a MappedByteBuffer, and then suspect page faults and i/o as the 
culprit, but it won't be staring you in the face with a "this is what's 
going on" smile. And if the code where this occurred is not yours, but 
instead buried in some useful combination of 3rd-party Maven Central 
things you may be using (like a JSON parser lib being given a mapped byte 
buffer by a library that for some reason uses mapped files to read its 
config), the hunt may take a while... 
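
As a concrete illustration of that scenario, here is a minimal sketch 
(illustrative only, not from the original post; the file name is invented and 
the file is assumed to fit in a single < 2 GB mapping):

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.file.Paths;
    import java.nio.file.StandardOpenOption;
    import java.util.Base64;

    public class MappedDecodeSketch {
        public static void main(String[] args) throws IOException {
            try (FileChannel ch = FileChannel.open(Paths.get("config.b64"),
                                                   StandardOpenOption.READ)) {
                MappedByteBuffer mapped = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
                // Any get() inside decode() can page-fault. A slow fault here delays this
                // thread's safepoint arrival, but a stack trace captured at the next
                // safepoint only shows the decode loop, not the fault or the mapped file.
                ByteBuffer decoded = Base64.getDecoder().decode(mapped);
                System.out.println("decoded " + decoded.remaining() + " bytes");
            }
        }
    }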

IMO the key to identifying the long-TTSP-causing code paths in a way that 
is legible enough for people to identify or usefully speculate about the 
TTSP cause is to capture the stack trace during the offending operation 
(rather than at the safepoint polling opportunity that immediately followed 
it). An example technique for doing that would be to implement some 
TTSP-timeout notification flag that would capture stack traces when a 
safepoint has been requested but not yet arrived at within a timeout 
period. E.g. if HotSpot sent a signal to offending 
(not-yet-reached-safepoint-after-too-long-a-time) threads, and the signal 
handler that deals with that signal called a specific function (e.g. 
"report_long_ttsp"), tracing THAT function would give you the stack trace 
you want, and catch the TTSP cause red-handed. And the signal handler 
itself could obviously log the traces in the JVM.

BTW, my BogusPauseReportingDemo and BogusPauseLoopReportingDemo were 
originally written to demonstrate TTSP behavior and the discrepancy in 
reporting it as pause time in GC logs. The examples use array copy 
operations and long-running counted loops (respectively), and show that 
application stopped times reported in the 
-XX:+PrintGCApplicationStoppedTime output lines are not reflected in the 
GC pause output lines [e.g. see output below]. In the 
BogusPauseReportingDemo array copy as-is demo case, we're "only" talking 
about a 20x discrepancy, and in the BogusPauseLoopReportingDemo counted 
loop as-is demo case we're talking about a 1,500x+ discrepancy. Both can be 
made arbitrarily larger or smaller by playing with copy length and trip 
counts. You need to run them for a while (I usually use 100 second runs, so 
-t 10) to see the real effects (the loop demo, for example, will only 
start showing the big discrepancies after 10-30 seconds of runtime with a 
1-2GB heap on my laptop):

% java -Xmx1g -XX:+UseG1GC -cp target/classes -verbose:gc 
-XX:+PrintGCTimeStamps -XX:+PrintGCApplicationStoppedTime 
BogusPauseReportingDemo -t 10

...

3.075: [GC cleanup 526M->526M(1024M), 0.0007277 secs]

3.075: Total time for which application threads were stopped: *0.0429391* 
seconds, Stopping threads took: 0.0421697 seconds

3.138: [GC pause (G1 Humongous Allocation) (young) 850M->514M(1024M), 0.0019372 
secs]

3.140: Total time for which application threads were stopped: *0.0142784* 
seconds, Stopping threads took: 0.0122150 seconds

3.177: [GC pause (G1 Humongous Allocation) (young) (initial-mark) 
516M->514M(1024M), 0.0023110 secs]

3.179: Total time for which application threads were stopped: *0.0386066* 
seconds, Stopping threads took: 0.0361923 seconds
...

% java -Xmx1g -XX:+UseG1GC -cp target/classes -verbose:gc 
-XX:+PrintGCTimeStamps -XX:+PrintGCApplicationStoppedTime 
BogusLoopPauseReportingDemo -t 10

...

34.782: [GC cleanup 524M->524M(1024M), 0.0005976 secs]

34.783: Total time for which application threads were stopp
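
For reference, a minimal sketch (not the actual demo code; sizes are arbitrary) 
of the two operation shapes the demos exercise: a single large arraycopy, which 
has no safepoint poll inside the copy, and a long-running int-counted loop, 
whose back-edge poll C2 may elide:

    public class TtspShapesSketch {
        static final int LENGTH = 64 * 1024 * 1024;      // two ~512 MB long[] arrays; needs a big heap
        static final long[] src = new long[LENGTH];
        static final long[] dst = new long[LENGTH];

        static void bigCopy() {
            // One arraycopy call copying hundreds of MB: a safepoint requested while it
            // runs has to wait for the whole copy to finish.
            System.arraycopy(src, 0, dst, 0, LENGTH);
        }

        static long countedLoop() {
            long sum = 0;
            for (int i = 0; i < LENGTH; i++) {           // int-counted loop: back-edge poll may be elided
                sum += src[i];
            }
            return sum;
        }
    }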

Re: JVM detection of thread at safepoint

2017-12-05 Thread Alex Bagehot
On Tue, Dec 5, 2017 at 11:22 AM, Mark Price  wrote:

> Hi Kirk,
> thanks for the response.
>
> I've run the same workload with -XX:+KeepSafepointsInCountedLoops, and
> the problem persists, so I'm fairly certain it's not that (assuming the JVM
> is respecting the flags...). The application is GC-free, so the safepoints
> are only going to be coming from (non-exhaustive): guaranteed safepoints,
> JIT operation?, lock biasing, others?
>
> The environment is indeed toxic: multiple JVMs mmap files on the order of
> 1/4 system memory, performing either read or write operations and the
> page-cache flusher threads are struggling to cope (not surprisingly). I
> don't really want to go down the route of using madvise/mlock.
>
> Happily, this is a test environment, so I can be brave with profiling -
> what sort of instrumentation do you refer to - kernel jprobes or JVM
> hacking?
>

Hi Mark, Kirk,

This seems to be a common question which could do with a generic solution.
It can be done with kernel uprobe tracing, without changing the JVM.
There will be less guesswork left with a kernel profiler that can report
both the kernel and user stacks.
To find out:
1) what provoked the safepoint:
   VMThread::execute : _ZN8VMThread7executeEP12VM_Operation
   We could trace the scheduler wake event at this point, but scheduler
   events can be very high frequency, so we avoid them if possible.
2) what is delaying the safepoint sync:
   SafepointSynchronize::block : _ZN20SafepointSynchronize5blockEP10JavaThread


The proof-of-concept OpenJDK safepoint profiler example below uses Linux perf.
async-profiler may be able to do a better job at reporting the interpreted
frames.


libjvm=
perf probe -d probe_libjvm:* && for p in
_ZN8VMThread7executeEP12VM_Operation
_ZN20SafepointSynchronize16do_cleanup_tasksEv
_ZN20SafepointSynchronize5blockEP10JavaThread
_ZN20SafepointSynchronize5beginEv _ZN2os28make_polling_page_unreadableEv
_ZN20SafepointSynchronize3endEv _ZN2os26make_polling_page_readableEv ;do
perf probe -x $libjvm $p;done

java_cmd=
perf record -e cpu-clock -e probe_libjvm:* -g -- taskset -c 1,2 $java_cmd
-cp . -XX:+PrintGCApplicationStoppedTime -XX:+PrintSafepointStatistics
-XX:PrintSafepointStatisticsCount=1 -XX:+PreserveFramePointer
BogusPauseReportingDemo -t 100
BogusPauseReportingDemo started, will run for 100 msec
         vmop  [threads: total initially_running wait_to_block]  [time: spin block sync cleanup vmop]  page_trap_count
12.614: ParallelGCFailedAllocation  [ 10  0  2 ]  [ 0  217  218  0  34 ]  0
Total time for which application threads were stopped: 0.2530938 seconds,
Stopping threads took: 0.2180568 seconds
BogusPauseReportingDemo done...
         vmop  [threads: total initially_running wait_to_block]  [time: spin block sync cleanup vmop]  page_trap_count
13.178: no vm operation  [ 6  0  0 ]  [ 0  0  0  0  0 ]  0

Polling page always armed
ParallelGCFailedAllocation  2
0 VM operations coalesced during safepoint
Maximum sync time  241 ms
Maximum vm operation time (except for Exit VM operation)  34 ms
[ perf record: Woken up 46 times to write data ]
[ perf record: Captured and wrote 2.574 MB perf.data (14328 samples) ]




perf script|grep probe_libjvm

java 17997 [001] 2446326.288185:
probe_libjvm:_ZN8VMThread7executeEP12VM_Operation: (7f78a2367ee0)
java 17859 [001] 2446326.288251:
 probe_libjvm:_ZN20SafepointSynchronize5beginEv: (7f78a21d2280)
java 17859 [001] 2446326.288261:
 probe_libjvm:_ZN2os23serialize_thread_statesEv: (7f78a21272a0)
java 17859 [001] 2446326.288352:
probe_libjvm:_ZN2os28make_polling_page_unreadableEv: (7f78a2131260)
java 17997 [001] 2446326.288403:
probe_libjvm:_ZN20SafepointSynchronize5blockEP10JavaThread: (7f78a21d3a70)
java 17856 [002] 2446326.366545:
probe_libjvm:_ZN20SafepointSynchronize5blockEP10JavaThread: (7f78a21d3a70)
java 17998 [002] 2446326.506283:
probe_libjvm:_ZN20SafepointSynchronize5blockEP10JavaThread: (7f78a21d3a70)
java 17859 [002] 2446326.506313:
probe_libjvm:_ZN20SafepointSynchronize16do_cleanup_tasksEv: (7f78a21d20d0)
java 17859 [002] 2446326.541251:
 probe_libjvm:_ZN20SafepointSynchronize3endEv: (7f78a21d3670)
java 17859 [002] 2446326.541320:
probe_libjvm:_ZN2os26make_polling_page_readableEv: (7f78a21312c0)
...


java 17997 [001] 2446326.288185:
probe_libjvm:_ZN8VMThread7executeEP12VM_Operation: (7f78a2367ee0)
  b2fee0 VMThread::execute
(/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.151-1.b12.fc25.x86_64/jre/lib/amd64/server/libjvm.so)
  ae96f3 TypeArrayKlass::allocate_common
(/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.151-1.b12.fc25.x86_64/jre/lib/amd64/server/libjvm.so)
  681577 InterpreterRuntime::newarray
(/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.151-1.b12.fc25.x86_64/jre/lib/amd64/server/libjvm.so)
7f788d0244c7 Interpreter (/tmp/perf-17855.map)
7f788d0004e7 call_stub (/tmp/perf-17855.

Re: JVM detection of thread at safepoint

2017-12-05 Thread Mark Price
Great, plenty of things to try :)

Thanks for your input everyone.




On Tuesday, 5 December 2017 12:39:22 UTC, Gil Tene wrote:
>
>
>
> Sent from my iPad
>
> On Dec 5, 2017, at 1:26 PM, Mark Price wrote:
>
>
>> That (each process having its own copy) is surprising to me. Unless the 
>> mapping is such that private copies are required, I'd expect the processes 
>> to share the page cache entries.
>>
>
> I can't recreate this effect locally using FileChannel.map(); the library 
> in use in the application uses a slightly more exotic route to get to mmap, 
> so it could be a bug there; will investigate. I could also have been 
> imagining it.
>  
>
>>  
>>
>> Is your pre-toucher thread a Java thread doing its pre-touching using 
>> mapped i/o in the same process? If so, then the pre-toucher thread itself 
>> will be a high TTSP causer. The trick is to do the pre-touch in a thread 
>> that is already at a safepoint (e.g. do your pre-touch using mapped i/o 
>> from within a JNI call, use another process, or do the retouch with 
>> non-mapped i/o).
>>
>
> Yes, just a java thread in the same process; I hadn't considered that it 
> would also cause long TTSP, but of course it's just as likely (or more 
> likely) to be scheduled off due to a page fault. I could try using pwrite 
> via FileChannel.write() to do the pre-touching, but I think it needs to 
> perform a CAS (i.e. don't overwrite data that is already present), so a JNI 
> method would be the only way to go. Unless just doing a 
> FileChannel.position(writeLimit).read(buffer) would do the job? Presumably 
> that is enough to load the page into the cache and performing a write is 
> unnecessary.
>
>
> This (non mapped reading at the write limit) will work to eliminate the 
> actual page I/O impact on TTSP, but the time update path with the lock that 
> you show in your initial stack trace will probably still hit you. I’d go 
> either with a JNI CAS, or a forked-off mapped Java pretoucher as a separate 
> process (tell it what you want touched via its stdin). Not sure which one is 
> uglier. The pure Java is more portable (for Unix/Linux variants at least) 
>
>  
>
>>  
>>
>>>
>>>
>>> Cheers,
>>>
>>> Mark
>>>
>>> On Tuesday, 5 December 2017 10:53:17 UTC, Gil Tene wrote: 

 Page faults in mapped file i/o and counted loops are certainly two 
 common causes of long TTSP. But there are many other paths that *could* 
 cause it as well in HotSpot. Without catching it and looking at the stack 
 trace, it's hard to know which ones to blame. Once you knock out one 
 cause, 
 you'll see if there is another. 

 In the specific stack trace you showed [assuming that trace was taken 
 during a long TTSP], mapped file i/o is the most likely culprit. Your 
 trace 
 seems to be around making the page write-able for the first time and 
 updating the file time (which takes a lock), but even without needing the 
 lock, the fault itself could end up waiting for the i/o to complete (read 
 page from disk), and that (when Murphy pays you a visit) can end up 
 waiting 
 behind 100s of other i/o operations (e.g. when your i/o happens at the same 
 time the kernel decided to flush some dirty pages in the cache), leading 
 to 
 TTSPs in the 100s of msec.

 As I'm sure you already know, one simple way to get around mapped file 
 related TTSP is to not use mapped files. Explicit random i/o calls are 
 always done while at a safepoint, so they can't cause high TTSPs.

 On Tuesday, December 5, 2017 at 10:30:57 AM UTC+1, Mark Price wrote: 
>
> Hi Aleksey,
> thanks for the response. The I/O is definitely one problem, but I was 
> trying to figure out whether it was contributing to the long TTSP times, 
> or 
> whether I might have some code that was misbehaving (e.g. 
> NonCountedLoops).
>
> Your response aligns with my guesswork, so hopefully I just have the 
> one problem to solve ;)
>
>
>
> Cheers,
>
> Mark
>
> On Tuesday, 5 December 2017 09:24:33 UTC, Aleksey Shipilev wrote: 
>>
>> On 12/05/2017 09:26 AM, Mark Price wrote: 
>> > I'm investigating some long time-to-safepoint pauses in 
>> oracle/openjdk. The application in question 
>> > is also suffering from some fairly nasty I/O problems where 
>> latency-sensitive threads are being 
>> > descheduled in uninterruptible sleep state due to needing a 
>> file-system lock. 
>> > 
>> > My question: can the JVM detect that a thread is in 
>> signal/interrupt-handler code and thus treat it 
>> > as though it is at a safepoint (as I believe happens when a thread 
>> is in native code via a JNI call)? 
>> > 
>> > For instance, given the stack trace below, will the JVM need to 
>> wait for the thread to be scheduled 
>> > back on to CPU in order to come to a safepoint, or will it be 
>> treated as 

Re: JVM detection of thread at safepoint

2017-12-05 Thread Gil Tene


Sent from my iPad

On Dec 5, 2017, at 1:26 PM, Mark Price <m...@aitusoftware.com> wrote:


That (each process having its own copy) is surprising to me. Unless the 
mapping is such that private copies are required, I'd expect the processes to 
share the page cache entries.

I can't recreate this effect locally using FileChannel.map(); the library in 
use in the application uses a slightly more exotic route to get to mmap, so it 
could be a bug there; will investigate. I could also have been imagining it.



Is your pre-toucher thread a Java thread doing its pre-touching using mapped 
i/o in the same process? If so, then the pre-toucher thread itself will be a 
high TTSP causer. The trick is to do the pre-touch in a thread that is already 
at a safepoint (e.g. do your pre-touch using mapped i/o from within a JNI call, 
use another process, or do the retouch with non-mapped i/o).

Yes, just a java thread in the same process; I hadn't considered that it would 
also cause long TTSP, but of course it's just as likely (or more likely) to be 
scheduled off due to a page fault. I could try using pwrite via 
FileChannel.write() to do the pre-touching, but I think it needs to perform a 
CAS (i.e. don't overwrite data that is already present), so a JNI method would 
be the only way to go. Unless just doing a 
FileChannel.position(writeLimit).read(buffer) would do the job? Presumably that 
is enough to load the page into the cache and performing a write is unnecessary.

This (non mapped reading at the write limit) will work to eliminate the actual 
page I/O impact on TTSP, but the time update path with the lock that you show 
in your initial stack trace will probably still hit you. I’d go either with a 
JNI CAS, or a forked-off mapped Java pretoucher as a separate process (tell it 
what you want touched via its stdin). Not sure which one is uglier. The pure 
Java is more portable (for Unix/Linux variants at least)
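
A rough sketch of the forked-off pretoucher option (illustrative only; the 
class name, the one-request-per-stdin-line format and the page size are 
invented here, not taken from the thread):

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.file.Paths;
    import java.nio.file.StandardOpenOption;

    // Runs as its own JVM, so the page faults it takes while touching mapped pages
    // delay its own safepoints, not those of the latency-sensitive process.
    public class ForkedPretoucher {
        private static final int PAGE = 4096;

        public static void main(String[] args) throws Exception {
            // Invented protocol: one "filePath offset length" request per stdin line.
            try (BufferedReader in = new BufferedReader(new InputStreamReader(System.in))) {
                String line;
                while ((line = in.readLine()) != null) {
                    String[] parts = line.trim().split("\\s+");
                    long offset = Long.parseLong(parts[1]);
                    int length = Integer.parseInt(parts[2]);
                    try (FileChannel ch = FileChannel.open(Paths.get(parts[0]),
                            StandardOpenOption.READ, StandardOpenOption.WRITE)) {
                        MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_WRITE, offset, length);
                        for (int pos = 0; pos < length; pos += PAGE) {
                            buf.get(pos);                 // read-touch: faults the page into the page cache
                        }
                    }
                }
            }
        }
    }

The main process would launch this as a separate java process and write 
"path offset length" lines to its stdin as it extends the files.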





Cheers,

Mark

On Tuesday, 5 December 2017 10:53:17 UTC, Gil Tene wrote:
Page faults in mapped file i/o and counted loops are certainly two common 
causes of long TTSP. But there are many other paths that *could* cause it as 
well in HotSpot. Without catching it and looking at the stack trace, it's hard 
to know which ones to blame. Once you knock out one cause, you'll see if there 
is another.

In the specific stack trace you showed [assuming that trace was taken during a 
long TTSP], mapped file i/o is the most likely culprit. Your trace seems to be 
around making the page write-able for the first time and updating the file time 
(which takes a lock), but even without needing the lock, the fault itself could 
end up waiting for the i/o to complete (read page from disk), and that (when 
Murphy pays you a visit) can end up waiting behind 100s of other i/o operations 
(e.g. when your i/o happens at the same time the kernel decided to flush some 
dirty pages in the cache), leading to TTSPs in the 100s of msec.

As I'm sure you already know, one simple way to get around mapped file related 
TTSP is to not use mapped files. Explicit random i/o calls are always done 
while at a safepoint, so they can't cause high TTSPs.

On Tuesday, December 5, 2017 at 10:30:57 AM UTC+1, Mark Price wrote:
Hi Aleksey,
thanks for the response. The I/O is definitely one problem, but I was trying to 
figure out whether it was contributing to the long TTSP times, or whether I 
might have some code that was misbehaving (e.g. NonCountedLoops).

Your response aligns with my guesswork, so hopefully I just have the one 
problem to solve ;)



Cheers,

Mark

On Tuesday, 5 December 2017 09:24:33 UTC, Aleksey Shipilev wrote:
On 12/05/2017 09:26 AM, Mark Price wrote:
> I'm investigating some long time-to-safepoint pauses in oracle/openjdk. The 
> application in question
> is also suffering from some fairly nasty I/O problems where latency-sensitive 
> threads are being
> descheduled in uninterruptible sleep state due to needing a file-system lock.
>
> My question: can the JVM detect that a thread is in signal/interrupt-handler 
> code and thus treat it
> as though it is at a safepoint (as I believe happens when a thread is in 
> native code via a JNI call)?
>
> For instance, given the stack trace below, will the JVM need to wait for the 
> thread to be scheduled
> back on to CPU in order to come to a safepoint, or will it be treated as 
> "in-native"?
>
> 7fff81714cd9 __schedule ([kernel.kallsyms])
> 7fff817151e5 schedule ([kernel.kallsyms])
> 7fff81717a4b rwsem_down_write_failed ([kernel.kallsyms])
> 7fff813556e7 call_rwsem_down_write_failed ([kernel.kallsyms])
> 7fff817172ad down_write ([kernel.kallsyms])
> 7fffa0403dcf xfs_ilock ([kernel.kallsyms])
> 7fffa04018fe xfs_vn_update_time ([kernel.kallsyms])
> 7fff8122cc5d file_update_time ([kernel.kallsyms])
> 7fffa03f7183 xfs_filemap_page_mkwrite ([kernel.kallsyms])
> 7fff811ba935 do_page_mkwrite ([kerne

Re: JVM detection of thread at safepoint

2017-12-05 Thread Gil Tene
On Tuesday, December 5, 2017 at 1:26:23 PM UTC+1, Mark Price wrote:
> That (each process having its own copy) is surprising to me. Unless the 
> mapping is such that private copies are required, I'd expect the processes to 
> share the page cache entries.
> 
> 
> I can't recreate this effect locally using FileChannel.map(); the library in 
> use in the application uses a slightly more exotic route to get to mmap, so 
> it could be a bug there; will investigate. I could also have been imagining 
> it.
> 
>  
> 
>  
> 
> Is your pre-toucher thread a Java thread doing its pre-touching using mapped 
> i/o in the same process? If so, then the pre-toucher thread itself will be a 
> high TTSP causer. The trick is to do the pre-touch in a thread that is 
> already at a safepoint (e.g. do your pre-touch using mapped i/o from within a 
> JNI call, use another process, or do the retouch with non-mapped i/o).
> 
> 
> Yes, just a java thread in the same process; I hadn't considered that it 
> would also cause long TTSP, but of course it's just as likely (or more 
> likely) to be scheduled off due to a page fault. I could try using pwrite via 
> FileChannel.write() to do the pre-touching, but I think it needs to perform a 
> CAS (i.e. don't overwrite data that is already present), so a JNI method 
> would be the only way to go. Unless just doing a 
> FileChannel.position(writeLimit).read(buffer) would do the job? Presumably 
> that is enough to load the page into the cache and performing a write is 
> unnecessary.

This (non mapped reading at the write limit) will work to eliminate the actual 
page I/O impact on TTSP, but the time update path with the lock that you show 
in your initial stack trace will probably still hit you. I’d go either with a 
JNI CAS, or a forked-off mapped Java pretoucher as a separate process (tell it 
what you want touched via its stdin). Not sure which one is uglier. The pure 
Java is more portable (for Unix/Linux variants at least)

> 
>  
> 
>  
> 
> 
> Cheers,
> 
> Mark
> 
> On Tuesday, 5 December 2017 10:53:17 UTC, Gil Tene  wrote:
> Page faults in mapped file i/o and counted loops are certainly two common 
> causes of long TTSP. But there are many other paths that *could* cause it as 
> well in HotSpot. Without catching it and looking at the stack trace, it's 
> hard to know which ones to blame. Once you knock out one cause, you'll see if 
> there is another.
> 
> 
> In the specific stack trace you showed [assuming that trace was taken during 
> a long TTSP], mapped file i/o is the most likely culprit. Your trace seems to 
> be around making the page write-able for the first time and updating the file 
> time (which takes a lock), but even without needing the lock, the fault 
> itself could end up waiting for the i/o to complete (read page from disk), 
> and that (when Murphy pays you a visit) can end up waiting behind 100s of other 
> i/o operations (e.g. when your i/o happens at the same time the kernel 
> decided to flush some dirty pages in the cache), leading to TTSPs in the 100s 
> of msec.
> 
> 
> As I'm sure you already know, one simple way to get around mapped file 
> related TTSP is to not use mapped files. Explicit random i/o calls are 
> always done while at a safepoint, so they can't cause high TTSPs.
> 
> On Tuesday, December 5, 2017 at 10:30:57 AM UTC+1, Mark Price wrote:
> Hi Aleksey,
> thanks for the response. The I/O is definitely one problem, but I was trying 
> to figure out whether it was contributing to the long TTSP times, or whether 
> I might have some code that was misbehaving (e.g. NonCountedLoops).
> 
> Your response aligns with my guesswork, so hopefully I just have the one 
> problem to solve ;)
> 
> 
> 
> Cheers,
> 
> Mark
> 
> On Tuesday, 5 December 2017 09:24:33 UTC, Aleksey Shipilev wrote:
> On 12/05/2017 09:26 AM, Mark Price wrote:
> 
> > I'm investigating some long time-to-safepoint pauses in oracle/openjdk. The 
> > application in question
> 
> > is also suffering from some fairly nasty I/O problems where 
> > latency-sensitive threads are being
> 
> > descheduled in uninterruptible sleep state due to needing a file-system 
> > lock.
> 
> > 
> 
> > My question: can the JVM detect that a thread is in 
> > signal/interrupt-handler code and thus treat it
> 
> > as though it is at a safepoint (as I believe happens when a thread is in 
> > native code via a JNI call)?
> 
> > 
> 
> > For instance, given the stack trace below, will the JVM need to wait for 
> > the thread to be scheduled
> 
> > back on to CPU in order to come to a safepoint, or will it be treated as 
> > "in-native"?
> 
> > 
> 
> >         7fff81714cd9 __schedule ([kernel.kallsyms])
> 
> >         7fff817151e5 schedule ([kernel.kallsyms])
> 
> >         7fff81717a4b rwsem_down_write_failed ([kernel.kallsyms])
> 
> >         7fff813556e7 call_rwsem_down_write_failed ([kernel.kallsyms])
> 
> >         7fff817172ad down_write ([kernel.kallsyms])
> 
> >         7fffa0403dcf xfs_ilock ([kernel.kal

Re: JVM detection of thread at safepoint

2017-12-05 Thread Mark Price

>
>
> That (each process having its own copy) is surprising to me. Unless the 
> mapping is such that private copies are required, I'd expect the processes 
> to share the page cache entries.
>

I can't recreate this effect locally using FileChannel.map(); the library 
in use in the application uses a slightly more exotic route to get to mmap, 
so it could be a bug there; will investigate. I could also have been 
imagining it.
 

>  
>
> Is your pre-toucher thread a Java thread doing its pre-touching using 
> mapped i/o in the same process? If so, then the pre-toucher thread itself 
> will be a high TTSP causer. The trick is to do the pre-touch in a thread 
> that is already at a safepoint (e.g. do your pre-touch using mapped i/o 
> from within a JNI call, use another process, or do the retouch with 
> non-mapped i/o).
>

Yes, just a java thread in the same process; I hadn't considered that it 
would also cause long TTSP, but of course it's just as likely (or more 
likely) to be scheduled off due to a page fault. I could try using pwrite 
via FileChannel.write() to do the pre-touching, but I think it needs to 
perform a CAS (i.e. don't overwrite data that is already present), so a JNI 
method would be the only way to go. Unless just doing a 
FileChannel.position(writeLimit).read(buffer) would do the job? Presumably 
that is enough to load the page into the cache and performing a write is 
unnecessary.
 

>  
>
>>
>>
>> Cheers,
>>
>> Mark
>>
>> On Tuesday, 5 December 2017 10:53:17 UTC, Gil Tene wrote:
>>>
>>> Page faults in mapped file i/o and counted loops are certainly two 
>>> common causes of long TTSP. But there are many other paths that *could* 
>>> cause it as well in HotSpot. Without catching it and looking at the stack 
>>> trace, it's hard to know which ones to blame. Once you knock out one cause, 
>>> you'll see if there is another.
>>>
>>> In the specific stack trace you showed [assuming that trace was taken 
>>> during a long TTSP], mapped file i/o is the most likely culprit. Your trace 
>>> seems to be around making the page write-able for the first time and 
>>> updating the file time (which takes a lock), but even without needing the 
>>> lock, the fault itself could end up waiting for the i/o to complete (read 
>>> page from disk), and that (when Murphy pays you a visit) can end up waiting 
>>> behind 100s of other i/o operations (e.g. when your i/o happens at the same 
>>> time the kernel decided to flush some dirty pages in the cache), leading to 
>>> TTSPs in the 100s of msec.
>>>
>>> As I'm sure you already know, one simple way to get around mapped file 
>>> related TTSP is to not use mapped files. Explicit random i/o calls are 
>>> always done while at a safepoint, so they can't cause high TTSPs.
>>>
>>> On Tuesday, December 5, 2017 at 10:30:57 AM UTC+1, Mark Price wrote:

 Hi Aleksey,
 thanks for the response. The I/O is definitely one problem, but I was 
 trying to figure out whether it was contributing to the long TTSP times, 
 or 
 whether I might have some code that was misbehaving (e.g. NonCountedLoops).

 Your response aligns with my guesswork, so hopefully I just have the 
 one problem to solve ;)



 Cheers,

 Mark

 On Tuesday, 5 December 2017 09:24:33 UTC, Aleksey Shipilev wrote:
>
> On 12/05/2017 09:26 AM, Mark Price wrote: 
> > I'm investigating some long time-to-safepoint pauses in 
> oracle/openjdk. The application in question 
> > is also suffering from some fairly nasty I/O problems where 
> latency-sensitive threads are being 
> > descheduled in uninterruptible sleep state due to needing a 
> file-system lock. 
> > 
> > My question: can the JVM detect that a thread is in 
> signal/interrupt-handler code and thus treat it 
> > as though it is at a safepoint (as I believe happens when a thread 
> is in native code via a JNI call)? 
> > 
> > For instance, given the stack trace below, will the JVM need to wait 
> for the thread to be scheduled 
> > back on to CPU in order to come to a safepoint, or will it be 
> treated as "in-native"? 
> > 
> > 7fff81714cd9 __schedule ([kernel.kallsyms]) 
> > 7fff817151e5 schedule ([kernel.kallsyms]) 
> > 7fff81717a4b rwsem_down_write_failed ([kernel.kallsyms]) 
> > 7fff813556e7 call_rwsem_down_write_failed 
> ([kernel.kallsyms]) 
> > 7fff817172ad down_write ([kernel.kallsyms]) 
> > 7fffa0403dcf xfs_ilock ([kernel.kallsyms]) 
> > 7fffa04018fe xfs_vn_update_time ([kernel.kallsyms]) 
> > 7fff8122cc5d file_update_time ([kernel.kallsyms]) 
> > 7fffa03f7183 xfs_filemap_page_mkwrite ([kernel.kallsyms]) 
> > 7fff811ba935 do_page_mkwrite ([kernel.kallsyms]) 
> > 7fff811bda74 handle_pte_fault ([kernel.kallsyms]) 
> > 7fff811c041b handle_mm_fault ([kernel.kall

Re: JVM detection of thread at safepoint

2017-12-05 Thread Gil Tene


On Tuesday, December 5, 2017 at 12:31:52 PM UTC+1, Mark Price wrote:
>
> Hi Gil,
> thanks for the response.
>
> I'm fairly sure that interaction with the page-cache is one of the 
> problems. When this is occurring, the free-mem is already hovering around 
> vm.min_free_kbytes, and the mapped files are a significant fraction of 
> system memory. From what I can see, each process that maps a file will get 
> its own copy in the page-cache (kernel shared pages doesn't seem to apply 
> to the page-cache, and is otherwise disabled on the test machine), so we 
> probably have approaching total system memory in use by cached pages.
>

That (each process having its own copy) is surprising to me. Unless the 
mapping is such that private copies are required, I'd expect the processes 
to share the page cache entries.
 

>
> I had thought that pages that were last written-to before the 
> vm.dirty_expire_centisecs threshold would be written out to disk by the 
> flusher threads, but I read on lkml that the access times are maintained on 
> a per-inode basis, rather than per-page. If this is the case, then the 
> system in question is making it very difficult for the page-cache to work 
> efficiently.
>
> The system makes use of a "pre-toucher" thread to try to handle 
> page-faults ahead of the thread that is trying to write application data to 
> the mapped pages. However, it seems that it is not always successful, so I 
> need to spend a bit of time trying to figure out why that is not working. 
> It's possible that there is just too much memory pressure, and the OS is 
> swapping out pages that have been loaded by the pre-toucher before the 
> application gets to them.
>

Is your pre-toucher thread a Java thread doing its pre-touching using 
mapped i/o in the same process? If so, then the pre-toucher thread itself 
will be a high TTSP causer. The trick is to do the pre-touch in a thread 
that is already at a safepoint (e.g. do your pre-touch using mapped i/o 
from within a JNI call, use another process, or do the retouch with 
non-mapped i/o).
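
For the "retouch with non-mapped i/o" option, a minimal sketch (illustrative 
only; the method and the page-size constant are assumptions, not code from the 
thread):

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;

    public final class NonMappedPretoucher {
        private static final int PAGE = 4096;            // assumed page size

        // Pull the pages in [offset, offset + length) into the page cache using plain
        // positional reads. The read blocks inside a native call, so the thread is
        // safepoint-safe while it waits for the i/o, unlike a fault on a mapped page.
        static void pretouch(FileChannel channel, long offset, long length) throws IOException {
            ByteBuffer one = ByteBuffer.allocate(1);
            for (long pos = offset; pos < offset + length; pos += PAGE) {
                one.clear();
                channel.read(one, pos);                  // pread-style read at an absolute position
            }
        }
    }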
 

>
>
> Cheers,
>
> Mark
>
> On Tuesday, 5 December 2017 10:53:17 UTC, Gil Tene wrote:
>>
>> Page faults in mapped file i/o and counted loops are certainly two common 
>> causes of long TTSP. But there are many other paths that *could* cause it 
>> as well in HotSpot. Without catching it and looking at the stack trace, 
>> it's hard to know which ones to blame. Once you knock out one cause, you'll 
>> see if there is another.
>>
>> In the specific stack trace you showed [assuming that trace was taken 
>> during a long TTSP], mapped file i/o is the most likely culprit. Your trace 
>> seems to be around making the page write-able for the first time and 
>> updating the file time (which takes a lock), but even without needing the 
>> lock, the fault itself could end up waiting for the i/o to complete (read 
>> page from disk), and that (when Murphy pays you a visit) can end up waiting 
>> behind 100s of other i/o operations (e.g. when your i/o happens at the same 
>> time the kernel decided to flush some dirty pages in the cache), leading to 
>> TTSPs in the 100s of msec.
>>
>> As I'm sure you already know, one simple way to get around mapped file 
>> related TTSP is to not use mapped files. Explicit random i/o calls are 
>> always done while at a safepoint, so they can't cause high TTSPs.
>>
>> On Tuesday, December 5, 2017 at 10:30:57 AM UTC+1, Mark Price wrote:
>>>
>>> Hi Aleksey,
>>> thanks for the response. The I/O is definitely one problem, but I was 
>>> trying to figure out whether it was contributing to the long TTSP times, or 
>>> whether I might have some code that was misbehaving (e.g. NonCountedLoops).
>>>
>>> Your response aligns with my guesswork, so hopefully I just have the one 
>>> problem to solve ;)
>>>
>>>
>>>
>>> Cheers,
>>>
>>> Mark
>>>
>>> On Tuesday, 5 December 2017 09:24:33 UTC, Aleksey Shipilev wrote:

 On 12/05/2017 09:26 AM, Mark Price wrote: 
 > I'm investigating some long time-to-safepoint pauses in 
 oracle/openjdk. The application in question 
 > is also suffering from some fairly nasty I/O problems where 
 latency-sensitive threads are being 
 > descheduled in uninterruptible sleep state due to needing a 
 file-system lock. 
 > 
 > My question: can the JVM detect that a thread is in 
 signal/interrupt-handler code and thus treat it 
 > as though it is at a safepoint (as I believe happens when a thread is 
 in native code via a JNI call)? 
 > 
 > For instance, given the stack trace below, will the JVM need to wait 
 for the thread to be scheduled 
 > back on to CPU in order to come to a safepoint, or will it be treated 
 as "in-native"? 
 > 
 > 7fff81714cd9 __schedule ([kernel.kallsyms]) 
 > 7fff817151e5 schedule ([kernel.kallsyms]) 
 > 7fff81717a4b rwsem_down_write_failed ([kernel.kallsyms]) 
 > 7fff813556e7 call_rwsem_dow

Re: JVM detection of thread at safepoint

2017-12-05 Thread Gil Tene
You can use it to track down your problem. You can download a Zing trial 
and play with it. It won't tell you anything about the various HotSpot 
specific TTSP paths (since most of those don't exist in Zing), but since 
you suspect mapped i/o based TTSPs, Zing should run into those just as 
much, and you'll get good TTSP coverage and clear-blame stack traces to 
play with. 

On Tuesday, December 5, 2017 at 12:33:54 PM UTC+1, Mark Price wrote:
>
>
>> In Zing, we have a built-in TTSP profiler for exactly this reason. 
>>
>
> I remember it fondly :)
>
>
>  
>
>



Re: JVM detection of thread at safepoint

2017-12-05 Thread Mark Price

>
>
> In Zing, we have a built-in TTSP profiler for exactly this reason. 
>

I remember it fondly :)


 



Re: JVM detection of thread at safepoint

2017-12-05 Thread Mark Price
Hi Gil,
thanks for the response.

I'm fairly sure that interaction with the page-cache is one of the 
problems. When this is occurring, the free-mem is already hovering around 
vm.min_free_kbytes, and the mapped files are a significant fraction of 
system memory. From what I can see, each process that maps a file will get 
its own copy in the page-cache (kernel shared pages doesn't seem to apply 
to the page-cache, and is otherwise disabled on the test machine), so we 
probably have approaching total system memory in use by cached pages.

I had thought that pages that were last written-to before the 
vm.dirty_expire_centisecs threshold would be written out to disk by the 
flusher threads, but I read on lkml that the access times are maintained on 
a per-inode basis, rather than per-page. If this is the case, then the 
system in question is making it very difficult for the page-cache to work 
efficiently.

The system makes use of a "pre-toucher" thread to try to handle page-faults 
ahead of the thread that is trying to write application data to the mapped 
pages. However, it seems that it is not always successful, so I need to 
spend a bit of time trying to figure out why that is not working. It's 
possible that there is just too much memory pressure, and the OS is 
swapping out pages that have been loaded by the pre-toucher before the 
application gets to them.


Cheers,

Mark

On Tuesday, 5 December 2017 10:53:17 UTC, Gil Tene wrote:
>
> Page faults in mapped file i/o and counted loops are certainly two common 
> causes of long TTSP. But there are many other paths that *could* cause it 
> as well in HotSpot. Without catching it and looking at the stack trace, 
> it's hard to know which ones to blame. Once you knock out one cause, you'll 
> see if there is another.
>
> In the specific stack trace you showed [assuming that trace was taken 
> during a long TTSP], mapped file i/o is the most likely culprit. Your trace 
> seems to be around making the page write-able for the first time and 
> updating the file time (which takes a lock), but even without needing the 
> lock, the fault itself could end up waiting for the i/o to complete (read 
> page from disk), and that (when Murphy pays you a visit) can end up waiting 
> behind 100s of other i/o operations (e.g. when your i/o happens at the same 
> time the kernel decided to flush some dirty pages in the cache), leading to 
> TTSPs in the 100s of msec.
>
> As I'm sure you already know, one simple way to get around mapped file 
> related TTSP is to not use mapped files. Explicit random i/o calls are 
> always done while at a safepoint, so they can't cause high TTSPs.
>
> On Tuesday, December 5, 2017 at 10:30:57 AM UTC+1, Mark Price wrote:
>>
>> Hi Aleksey,
>> thanks for the response. The I/O is definitely one problem, but I was 
>> trying to figure out whether it was contributing to the long TTSP times, or 
>> whether I might have some code that was misbehaving (e.g. NonCountedLoops).
>>
>> Your response aligns with my guesswork, so hopefully I just have the one 
>> problem to solve ;)
>>
>>
>>
>> Cheers,
>>
>> Mark
>>
>> On Tuesday, 5 December 2017 09:24:33 UTC, Aleksey Shipilev wrote:
>>>
>>> On 12/05/2017 09:26 AM, Mark Price wrote: 
>>> > I'm investigating some long time-to-safepoint pauses in 
>>> oracle/openjdk. The application in question 
>>> > is also suffering from some fairly nasty I/O problems where 
>>> latency-sensitive threads are being 
>>> > descheduled in uninterruptible sleep state due to needing a 
>>> file-system lock. 
>>> > 
>>> > My question: can the JVM detect that a thread is in 
>>> signal/interrupt-handler code and thus treat it 
>>> > as though it is at a safepoint (as I believe happens when a thread is 
>>> in native code via a JNI call)? 
>>> > 
>>> > For instance, given the stack trace below, will the JVM need to wait 
>>> for the thread to be scheduled 
>>> > back on to CPU in order to come to a safepoint, or will it be treated 
>>> as "in-native"? 
>>> > 
>>> > 7fff81714cd9 __schedule ([kernel.kallsyms]) 
>>> > 7fff817151e5 schedule ([kernel.kallsyms]) 
>>> > 7fff81717a4b rwsem_down_write_failed ([kernel.kallsyms]) 
>>> > 7fff813556e7 call_rwsem_down_write_failed ([kernel.kallsyms]) 
>>> > 7fff817172ad down_write ([kernel.kallsyms]) 
>>> > 7fffa0403dcf xfs_ilock ([kernel.kallsyms]) 
>>> > 7fffa04018fe xfs_vn_update_time ([kernel.kallsyms]) 
>>> > 7fff8122cc5d file_update_time ([kernel.kallsyms]) 
>>> > 7fffa03f7183 xfs_filemap_page_mkwrite ([kernel.kallsyms]) 
>>> > 7fff811ba935 do_page_mkwrite ([kernel.kallsyms]) 
>>> > 7fff811bda74 handle_pte_fault ([kernel.kallsyms]) 
>>> > 7fff811c041b handle_mm_fault ([kernel.kallsyms]) 
>>> > 7fff8106adbe __do_page_fault ([kernel.kallsyms]) 
>>> > 7fff8106b0c0 do_page_fault ([kernel.kallsyms]) 
>>> > 7fff8171af48 page_fault ([kernel.kallsyms]) 
>>> >  java stack 

Re: JVM detection of thread at safepoint

2017-12-05 Thread Gil Tene
One of the challenges in using GC logs to find out if you have TTSP issues 
is that GC events are extremely infrequent, so not having seen a high TTSP 
in the log doesn't mean you won't hit one soon... It's literally like 
deducing the likelihood of hitting something bad from 1000 data points 
collected with a 1-in-a-million random sampler. Coverage is very low.

In Zing, we have a built-in TTSP profiler for exactly this reason. 
Coverage. We initially used it internally, to detect and clean up all the 
JVM code paths that were problematic, and now we use it with customers, 
mostly to prove and point fingers at the system-related causes of TTSP when 
they occur. The way the TTSP profiler works is basically by using a tick-based 
profiler that records the tick time and arms a thread-local "go to a 
no-op safepoint" whose only purpose is to report the amount of time elapsed 
from the original tick to that thread getting to a safepoint. The thread 
leaves the safepoint immediately, so no pauses happen, and we keep track of 
the call stacks for e.g. the top 50 TTSP times seen since the last profile 
reset. With this technique, we can reliably and completely cover and report on 
any experienced code path with a TTSP larger than e.g. 5 msec, and we get 
fairly good coverage even for the lower levels (just not 100% coverage, 
unless you are willing to take e.g. a TTSP-profiling tick 2000 times per 
second on each running thread).
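
Conceptually (an illustration of the bookkeeping only, not Zing's 
implementation; all names are invented), the recording side amounts to keeping 
a bounded set of the worst observed tick-to-safepoint times:

    import java.util.NavigableMap;
    import java.util.TreeMap;

    // Illustration only: keep the N largest (TTSP -> stack trace) observations, in the
    // spirit of "track the call stacks for the top 50 TTSP times since the last reset".
    final class TopTtspRecorder {
        private final int capacity;                      // e.g. 50
        private final NavigableMap<Long, StackTraceElement[]> worst = new TreeMap<>();

        TopTtspRecorder(int capacity) { this.capacity = capacity; }

        // Called when the thread finally reaches the armed no-op safepoint.
        synchronized void onSafepointReached(long tickNanos, StackTraceElement[] stackAtTick) {
            long ttspNanos = System.nanoTime() - tickNanos;
            worst.put(ttspNanos, stackAtTick);           // equal durations overwrite; fine for a sketch
            if (worst.size() > capacity) {
                worst.pollFirstEntry();                  // drop the smallest observation
            }
        }

        synchronized void reset() { worst.clear(); }
    }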

I expect that with HotSpot finally looking at adding thread-specific 
safe-pointing and checkpoints, similar tooling will be needed there.

On Tuesday, December 5, 2017 at 12:11:44 PM UTC+1, Kirk Pepperdine wrote:
>
> Hi,
>
>
> On Dec 5, 2017, at 11:53 AM, Gil Tene wrote:
>
> Page faults in mapped file i/o and counted loops are certainly two common 
> causes of long TTSP. But there are many other paths that *could* cause it 
> as well in HotSpot. Without catching it and looking at the stack trace, 
> it's hard to know which ones to blame. Once you knock out one cause, you'll 
> see if there is another.
>
> In the specific stack trace you showed [assuming that trace was taken 
> during a long TTSP], mapped file i/o is the most likely culprit. Your trace 
> seems to be around making the page write-able for the first time and 
> updating the file time (which takes a lock), but even without needing the 
> lock, the fault itself could end up waiting for the i/o to complete (read 
> page from disk), and that (when Murphy pays you a visit) can end up 
> waiting behind 100s of other i/o operations (e.g. when your i/o happens at the 
> same time the kernel decided to flush some dirty pages in the cache), 
> leading to TTSPs in the 100s of msec.
>
> As I'm sure you already know, one simple way to get around mapped file 
> related TTSP is to not use mapped files. Explicit random i/o calls are 
> always done while at a safepoint, so they can't cause high TTSPs.
>
>
> I guess another way to avoid the long TTSP is to not safe point.. ;-)
>
> Most of the long TTSP’s IME are either due to outside scheduling 
> interference or loops that are compiled to the point where they turn into 
> counted loops. Finding this using traditional tooling is impossible because 
> you need the safe point which has most likely been compiled out of the code 
> to get a sample to see why you’re waiting for a safe point. Using an async 
> profiler (like honest-profiler) leaves you wanting for some correlation between 
> calls for safe point and the samples you’re looking at. So, I don’t really 
> have a good answer as to how to find these things. What I have done in the 
> past is to use the GC log to get an idea of the toxicity level of the 
> environment you’re running in and then use that information to decide if I 
> need to look for internal or external causes. If the causes are internal 
> then it’s most likely a counted loop. Using JITWatch in this case you can 
> get a candidate list of counted loops to consider. After that I’ve binary 
> searched through that list. If the environment is toxic.. then you need to 
> clean up the environment. If you are brave, you add some instrumentation to 
> the JVM to sample for the last thread to reach the safe point and use that 
> in your prod environment.
>
> Kind regards,
> Kirk
>
>
> On Tuesday, December 5, 2017 at 10:30:57 AM UTC+1, Mark Price wrote:
>>
>> Hi Aleksey,
>> thanks for the response. The I/O is definitely one problem, but I was 
>> trying to figure out whether it was contributing to the long TTSP times, or 
>> whether I might have some code that was misbehaving (e.g. NonCountedLoops).
>>
>> Your response aligns with my guesswork, so hopefully I just have the one 
>> problem to solve ;)
>>
>>
>>
>> Cheers,
>>
>> Mark
>>
>> On Tuesday, 5 December 2017 09:24:33 UTC, Aleksey Shipilev wrote:
>>>
>>> On 12/05/2017 09:26 AM, Mark Price wrote: 
>>> > I'm investigating some long time-to-safepoint pauses in 
>>> oracle/openjdk. The application in question 
>>> > is also suffering from some

Re: JVM detection of thread at safepoint

2017-12-05 Thread Mark Price
Hi Kirk,
thanks for the response.

I've run the same workload with -XX:+KeepSafepointsInCountedLoops, and the
problem persists, so I'm fairly certain it's not that (assuming the JVM is
respecting the flags...). The application is GC-free, so the safepoints are
only going to be coming from (non-exhaustive): guaranteed safepoints, JIT
operation?, lock biasing, others?
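
For reference, an illustrative sketch (not from the thread) of the loop shape 
that flag is about: C2 treats a simple int-indexed loop as a counted loop and 
may omit the safepoint poll on its back edge, while a long induction variable 
keeps the loop uncounted and the poll in place:

    public final class LoopShapes {
        // Counted loop: int induction variable, simple bound. C2 may elide the
        // back-edge safepoint poll, so a long pass over a big array can delay TTSP.
        static long sumCounted(int[] data) {
            long sum = 0;
            for (int i = 0; i < data.length; i++) {
                sum += data[i];
            }
            return sum;
        }

        // Not a counted loop: long induction variable, so the back-edge poll is kept
        // (at some throughput cost) and the loop cannot stall a safepoint for long.
        static long sumUncounted(int[] data) {
            long sum = 0;
            for (long i = 0; i < data.length; i++) {
                sum += data[(int) i];
            }
            return sum;
        }
    }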

The environment is indeed toxic: multiple JVMs mmap files on the order of
1/4 system memory, performing either read or write operations and the
page-cache flusher threads are struggling to cope (not surprisingly). I
don't really want to go down the route of using madvise/mlock.

Happily, this is a test environment, so I can be brave with profiling -
what sort of instrumentation do you refer to - kernel jprobes or JVM
hacking?


Cheers,

Mark

On Tue, Dec 5, 2017 at 11:11 AM Kirk Pepperdine  wrote:

> Hi,
>
>
> On Dec 5, 2017, at 11:53 AM, Gil Tene  wrote:
>
> Page faults in mapped file i/o and counted loops are certainly two common
> causes of long TTSP. But there are many other paths that *could* cause it
> as well in HotSpot. Without catching it and looking at the stack trace,
> it's hard to know which ones to blame. Once you knock out one cause, you'll
> see if there is another.
>
> In the specific stack trace you showed [assuming that trace was taken
> during a long TTSP], mapped file i/o is the most likely culprit. Your trace
> seems to be around making the page write-able for the first time and
> updating the file time (which takes a lock), but even without needing the
> lock, the fault itself could end up waiting for the i/o to complete (read
> page from disk), and that (when Murphy pays you a visit) can end up
> waiting behind 100s of other i/o operations (e.g. when your i/o happens at the
> same time the kernel decided to flush some dirty pages in the cache),
> leading to TTSPs in the 100s of msec.
>
> As I'm sure you already know, one simple way to get around mapped file
> related TTSP is to not use mapped files. Explicit random i/o calls are
> always done while at a safepoint, so they can't cause high TTSPs.
>
>
> I guess another way to avoid the long TTSP is to not safe point.. ;-)
>
> Most of the long TTSP’s IME are either due to outside scheduling
> interference or loops that are compiled to the point where they turn into
> counted loops. Finding this using traditional tooling is impossible because
> you need the safe point which has most likely been compiled out of the code
> to get a sample to see why you’re waiting for a safe point. Using an async
> profiler (like honest-profiler) leaves you wanting for some correlation between
> calls for safe point and the samples you’re looking at. So, I don’t really
> have a good answer as to how to find these things. What I have done in the
> past is to use the GC log to get an idea of the toxicity level of the
> environment you’re running in and then use that information to decide if I
> need to look for internal or external causes. If the causes are internal
> then it’s most likely a counted loop. Using JITWatch in this case you can
> get a candidate list of counted loops to consider. After that I’ve binary
> searched through that list. If the environment is toxic.. then you need to
> clean up the environment. If you are brave, you add some instrumentation to
> the JVM to sample for the last thread to reach the safe point and use that
> in your prod environment.
>
> Kind regards,
> Kirk
>
>
> On Tuesday, December 5, 2017 at 10:30:57 AM UTC+1, Mark Price wrote:
>>
>> Hi Aleksey,
>> thanks for the response. The I/O is definitely one problem, but I was
>> trying to figure out whether it was contributing to the long TTSP times, or
>> whether I might have some code that was misbehaving (e.g. NonCountedLoops).
>>
>> Your response aligns with my guesswork, so hopefully I just have the one
>> problem to solve ;)
>>
>>
>>
>> Cheers,
>>
>> Mark
>>
>> On Tuesday, 5 December 2017 09:24:33 UTC, Aleksey Shipilev wrote:
>>>
>>> On 12/05/2017 09:26 AM, Mark Price wrote:
>>> > I'm investigating some long time-to-safepoint pauses in
>>> oracle/openjdk. The application in question
>>> > is also suffering from some fairly nasty I/O problems where
>>> latency-sensitive threads are being
>>> > descheduled in uninterruptible sleep state due to needing a
>>> file-system lock.
>>> >
>>> > My question: can the JVM detect that a thread is in
>>> signal/interrupt-handler code and thus treat it
>>> > as though it is at a safepoint (as I believe happens when a thread is
>>> in native code via a JNI call)?
>>> >
>>> > For instance, given the stack trace below, will the JVM need to wait
>>> for the thread to be scheduled
>>> > back on to CPU in order to come to a safepoint, or will it be treated
>>> as "in-native"?
>>> >
>>> > 7fff81714cd9 __schedule ([kernel.kallsyms])
>>> > 7fff817151e5 schedule ([kernel.kallsyms])
>>> > 7fff81717a4b rwsem_down_write_failed ([kernel.kallsyms])
>>> > 7fff813556e

Re: JVM detection of thread at safepoint

2017-12-05 Thread Kirk Pepperdine
Hi,


> On Dec 5, 2017, at 11:53 AM, Gil Tene  wrote:
> 
> Page faults in mapped file i/o and counted loops are certainly two common 
> causes of long TTSP. But there are many other paths that *could* cause it as 
> well in HotSpot. Without catching it and looking at the stack trace, it's 
> hard to know which ones to blame. Once you knock out one cause, you'll see if 
> there is another.
> 
> In the specific stack trace you showed [assuming that trace was taken during 
> a long TTSP], mapped file i/o is the most likely culprit. Your trace seems to 
> be around making the page write-able for the first time and updating the file 
> time (which takes a lock), but even without needing the lock, the fault 
> itself could end up waiting for the i/o to complete (read page from disk), 
> and that (when Murphy pays you a visit) can end up waiting behind 100s of 
> other i/o operations (e.g. when your i/o happens at the same time the kernel 
> decided to flush some dirty pages in the cache), leading to TTSPs in the 100s 
> of msec.
> 
> As I'm sure you already know, one simple way to get around mapped file 
> related TTSP is to not use mapped files. Explicit random i/o calls are 
> always done while at a safepoint, so they can't cause high TTSPs.

I guess another way to avoid the long TTSP is to not safe point.. ;-)

Most of the long TTSP’s IME are either due to outside scheduling interference 
or loops that are compiled to the point where they turn into counted loops. 
Finding this using traditional tooling is impossible because you need the safe 
point which has most likely been compiled out of the code to get a sample to 
see why you’re waiting for a safe point. Using an async profiler (like honest-profiler) 
leaves you wanting for some correlation between calls for safe point and the 
samples you’re looking at. So, I don’t really have a good answer as to how to 
find these things. What I have done in the past is to use the GC log to get an 
idea of the toxicity level of the environment you’re running in and then use 
that information to decide if I need to look for internal or external causes. 
If the causes are internal then it’s most likely a counted loop. Using JITWatch 
in this case you can get a candidate list of counted loops to consider. After 
that I’ve binary searched through that list. If the environment is toxic.. then 
you need to clean up the environment. If you are brave, you add some 
instrumentation to the JVM to sample for the last thread to reach the safe 
point and use that in your prod environment.

Kind regards,
Kirk

> 
> On Tuesday, December 5, 2017 at 10:30:57 AM UTC+1, Mark Price wrote:
> Hi Aleksey,
> thanks for the response. The I/O is definitely one problem, but I was trying 
> to figure out whether it was contributing to the long TTSP times, or whether 
> I might have some code that was misbehaving (e.g. NonCountedLoops).
> 
> Your response aligns with my guesswork, so hopefully I just have the one 
> problem to solve ;)
> 
> 
> 
> Cheers,
> 
> Mark
> 
> On Tuesday, 5 December 2017 09:24:33 UTC, Aleksey Shipilev wrote:
> On 12/05/2017 09:26 AM, Mark Price wrote: 
> > I'm investigating some long time-to-safepoint pauses in oracle/openjdk. The 
> > application in question 
> > is also suffering from some fairly nasty I/O problems where 
> > latency-sensitive threads are being 
> > descheduled in uninterruptible sleep state due to needing a file-system 
> > lock. 
> > 
> > My question: can the JVM detect that a thread is in 
> > signal/interrupt-handler code and thus treat it 
> > as though it is at a safepoint (as I believe happens when a thread is in 
> > native code via a JNI call)? 
> > 
> > For instance, given the stack trace below, will the JVM need to wait for 
> > the thread to be scheduled 
> > back on to CPU in order to come to a safepoint, or will it be treated as 
> > "in-native"? 
> > 
> > 7fff81714cd9 __schedule ([kernel.kallsyms]) 
> > 7fff817151e5 schedule ([kernel.kallsyms]) 
> > 7fff81717a4b rwsem_down_write_failed ([kernel.kallsyms]) 
> > 7fff813556e7 call_rwsem_down_write_failed ([kernel.kallsyms]) 
> > 7fff817172ad down_write ([kernel.kallsyms]) 
> > 7fffa0403dcf xfs_ilock ([kernel.kallsyms]) 
> > 7fffa04018fe xfs_vn_update_time ([kernel.kallsyms]) 
> > 7fff8122cc5d file_update_time ([kernel.kallsyms]) 
> > 7fffa03f7183 xfs_filemap_page_mkwrite ([kernel.kallsyms]) 
> > 7fff811ba935 do_page_mkwrite ([kernel.kallsyms]) 
> > 7fff811bda74 handle_pte_fault ([kernel.kallsyms]) 
> > 7fff811c041b handle_mm_fault ([kernel.kallsyms]) 
> > 7fff8106adbe __do_page_fault ([kernel.kallsyms]) 
> > 7fff8106b0c0 do_page_fault ([kernel.kallsyms]) 
> > 7fff8171af48 page_fault ([kernel.kallsyms]) 
> >  java stack trace ends here  
> 
> I am pretty sure out-of-band page fault in Java thread does not yield a 
> safepoint. At least because 
> safepoint polls happen

Re: JVM detection of thread at safepoint

2017-12-05 Thread Gil Tene
Page faults in mapped file i/o and counted loops are certainly two common 
causes of long TTSP. But there are many other paths that *could* cause it 
as well in HotSpot. Without catching it and looking at the stack trace, 
it's hard to know which ones to blame. Once you knock out one cause, you'll 
see if there is another.

In the specific stack trace you showed [assuming that trace was taken 
during a long TTSP], mapped file i/o is the most likely culprit. Your trace 
seems to be around making the page write-able for the first time and 
updating the file time (which takes a lock), but even without needing the 
lock, the fault itself could end up waiting for the i/o to complete (read 
page from disk), and that (when Murphy pays you a visit) can end up 
waiting behind 100s of other i/o operations (e.g. when your i/o happens at the 
same time the kernel decided to flush some dirty pages in the cache), 
leading to TTSPs in the 100s of msec.

As I'm sure you already know, one simple way to get around mapped file 
related TTSP is to not use mapped files. Explicit random i/o calls are 
always done while at a safepoint, so they can't cause high TTSPs.
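
To illustrate the difference (a sketch only, not code from the thread):

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;

    final class MappedVsExplicitRead {
        // Mapped access: a page miss here is an ordinary memory access that faults in
        // compiled Java code, so the thread cannot reach a safepoint until the kernel
        // has satisfied the fault (possibly after real disk i/o).
        static long readMapped(MappedByteBuffer mapped, int offset) {
            return mapped.getLong(offset);
        }

        // Explicit access: the read happens inside a native call, during which the
        // thread counts as being at a safepoint, so a slow read does not add to
        // time-to-safepoint.
        static long readExplicit(FileChannel channel, long offset) throws IOException {
            ByteBuffer buf = ByteBuffer.allocate(Long.BYTES);
            channel.read(buf, offset);
            buf.flip();
            return buf.getLong();
        }
    }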

On Tuesday, December 5, 2017 at 10:30:57 AM UTC+1, Mark Price wrote:
>
> Hi Aleksey,
> thanks for the response. The I/O is definitely one problem, but I was 
> trying to figure out whether it was contributing to the long TTSP times, or 
> whether I might have some code that was misbehaving (e.g. NonCountedLoops).
>
> Your response aligns with my guesswork, so hopefully I just have the one 
> problem to solve ;)
>
>
>
> Cheers,
>
> Mark
>
> On Tuesday, 5 December 2017 09:24:33 UTC, Aleksey Shipilev wrote:
>>
>> On 12/05/2017 09:26 AM, Mark Price wrote: 
>> > I'm investigating some long time-to-safepoint pauses in oracle/openjdk. 
>> The application in question 
>> > is also suffering from some fairly nasty I/O problems where 
>> latency-sensitive threads are being 
>> > descheduled in uninterruptible sleep state due to needing a file-system 
>> lock. 
>> > 
>> > My question: can the JVM detect that a thread is in 
>> signal/interrupt-handler code and thus treat it 
>> > as though it is at a safepoint (as I believe happens when a thread is 
>> in native code via a JNI call)? 
>> > 
>> > For instance, given the stack trace below, will the JVM need to wait 
>> for the thread to be scheduled 
>> > back on to CPU in order to come to a safepoint, or will it be treated 
>> as "in-native"? 
>> > 
>> > 7fff81714cd9 __schedule ([kernel.kallsyms]) 
>> > 7fff817151e5 schedule ([kernel.kallsyms]) 
>> > 7fff81717a4b rwsem_down_write_failed ([kernel.kallsyms]) 
>> > 7fff813556e7 call_rwsem_down_write_failed ([kernel.kallsyms]) 
>> > 7fff817172ad down_write ([kernel.kallsyms]) 
>> > 7fffa0403dcf xfs_ilock ([kernel.kallsyms]) 
>> > 7fffa04018fe xfs_vn_update_time ([kernel.kallsyms]) 
>> > 7fff8122cc5d file_update_time ([kernel.kallsyms]) 
>> > 7fffa03f7183 xfs_filemap_page_mkwrite ([kernel.kallsyms]) 
>> > 7fff811ba935 do_page_mkwrite ([kernel.kallsyms]) 
>> > 7fff811bda74 handle_pte_fault ([kernel.kallsyms]) 
>> > 7fff811c041b handle_mm_fault ([kernel.kallsyms]) 
>> > 7fff8106adbe __do_page_fault ([kernel.kallsyms]) 
>> > 7fff8106b0c0 do_page_fault ([kernel.kallsyms]) 
>> > 7fff8171af48 page_fault ([kernel.kallsyms]) 
>> >  java stack trace ends here  
>>
>> I am pretty sure out-of-band page fault in Java thread does not yield a 
>> safepoint. At least because 
>> safepoint polls happen at given location in the generated code, because 
>> we need the pointer map as 
>> the part of the machine state, and that is generated by Hotspot (only) 
>> around the safepoint polls. 
>> Page faulting on random read/write insns does not have that luxury. Even 
>> if JVM had intercepted that 
>> fault, there is not enough metadata to work on. 
>>
>> The stacktrace above seems to say you have page faulted and this incurred 
>> disk I/O? This is 
>> swapping, I think, and all performance bets are off at that point. 
>>
>> Thanks, 
>> -Aleksey 
>>
>>



Re: JVM detection of thread at safepoint

2017-12-05 Thread Mark Price
Hi Aleksey,
thanks for the response. The I/O is definitely one problem, but I was 
trying to figure out whether it was contributing to the long TTSP times, or 
whether I might have some code that was misbehaving (e.g. NonCountedLoops).

Your response aligns with my guesswork, so hopefully I just have the one 
problem to solve ;)



Cheers,

Mark

On Tuesday, 5 December 2017 09:24:33 UTC, Aleksey Shipilev wrote:
>
> On 12/05/2017 09:26 AM, Mark Price wrote: 
> > I'm investigating some long time-to-safepoint pauses in oracle/openjdk. 
> The application in question 
> > is also suffering from some fairly nasty I/O problems where 
> latency-sensitive threads are being 
> > descheduled in uninterruptible sleep state due to needing a file-system 
> lock. 
> > 
> > My question: can the JVM detect that a thread is in 
> signal/interrupt-handler code and thus treat it 
> > as though it is at a safepoint (as I believe happens when a thread is in 
> native code via a JNI call)? 
> > 
> > For instance, given the stack trace below, will the JVM need to wait for 
> the thread to be scheduled 
> > back on to CPU in order to come to a safepoint, or will it be treated as 
> "in-native"? 
> > 
> > 7fff81714cd9 __schedule ([kernel.kallsyms]) 
> > 7fff817151e5 schedule ([kernel.kallsyms]) 
> > 7fff81717a4b rwsem_down_write_failed ([kernel.kallsyms]) 
> > 7fff813556e7 call_rwsem_down_write_failed ([kernel.kallsyms]) 
> > 7fff817172ad down_write ([kernel.kallsyms]) 
> > 7fffa0403dcf xfs_ilock ([kernel.kallsyms]) 
> > 7fffa04018fe xfs_vn_update_time ([kernel.kallsyms]) 
> > 7fff8122cc5d file_update_time ([kernel.kallsyms]) 
> > 7fffa03f7183 xfs_filemap_page_mkwrite ([kernel.kallsyms]) 
> > 7fff811ba935 do_page_mkwrite ([kernel.kallsyms]) 
> > 7fff811bda74 handle_pte_fault ([kernel.kallsyms]) 
> > 7fff811c041b handle_mm_fault ([kernel.kallsyms]) 
> > 7fff8106adbe __do_page_fault ([kernel.kallsyms]) 
> > 7fff8106b0c0 do_page_fault ([kernel.kallsyms]) 
> > 7fff8171af48 page_fault ([kernel.kallsyms]) 
> >  java stack trace ends here  
>
> I am pretty sure out-of-band page fault in Java thread does not yield a 
> safepoint. At least because 
> safepoint polls happen at given location in the generated code, because we 
> need the pointer map as 
> the part of the machine state, and that is generated by Hotspot (only) 
> around the safepoint polls. 
> Page faulting on random read/write insns does not have that luxury. Even 
> if JVM had intercepted that 
> fault, there is not enough metadata to work on. 
>
> The stacktrace above seems to say you have page faulted and this incurred 
> disk I/O? This is 
> swapping, I think, and all performance bets are off at that point. 
>
> Thanks, 
> -Aleksey 
>
>



Re: JVM detection of thread at safepoint

2017-12-05 Thread Aleksey Shipilev
On 12/05/2017 09:26 AM, Mark Price wrote:
> I'm investigating some long time-to-safepoint pauses in oracle/openjdk. The 
> application in question
> is also suffering from some fairly nasty I/O problems where latency-sensitive 
> threads are being
> descheduled in uninterruptible sleep state due to needing a file-system lock.
> 
> My question: can the JVM detect that a thread is in signal/interrupt-handler 
> code and thus treat it
> as though it is at a safepoint (as I believe happens when a thread is in 
> native code via a JNI call)?
> 
> For instance, given the stack trace below, will the JVM need to wait for the 
> thread to be scheduled
> back on to CPU in order to come to a safepoint, or will it be treated as 
> "in-native"?
> 
>         7fff81714cd9 __schedule ([kernel.kallsyms])
>         7fff817151e5 schedule ([kernel.kallsyms])
>         7fff81717a4b rwsem_down_write_failed ([kernel.kallsyms])
>         7fff813556e7 call_rwsem_down_write_failed ([kernel.kallsyms])
>         7fff817172ad down_write ([kernel.kallsyms])
>         7fffa0403dcf xfs_ilock ([kernel.kallsyms])
>         7fffa04018fe xfs_vn_update_time ([kernel.kallsyms])
>         7fff8122cc5d file_update_time ([kernel.kallsyms])
>         7fffa03f7183 xfs_filemap_page_mkwrite ([kernel.kallsyms])
>         7fff811ba935 do_page_mkwrite ([kernel.kallsyms])
>         7fff811bda74 handle_pte_fault ([kernel.kallsyms])
>         7fff811c041b handle_mm_fault ([kernel.kallsyms])
>         7fff8106adbe __do_page_fault ([kernel.kallsyms])
>         7fff8106b0c0 do_page_fault ([kernel.kallsyms])
>         7fff8171af48 page_fault ([kernel.kallsyms])
>      java stack trace ends here 

I am pretty sure an out-of-band page fault in a Java thread does not yield a 
safepoint, not least because safepoint polls happen at designated locations 
in the generated code: we need the pointer (oop) map as part of the machine 
state, and HotSpot generates that only around the safepoint polls. Page 
faulting on a random read/write instruction does not have that luxury. Even 
if the JVM had intercepted that fault, there is not enough metadata to work 
with.

The stack trace above seems to say you have page faulted and this incurred 
disk I/O? That is effectively swapping, I think, and all performance bets are 
off at that point.
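Coming back to the poll-placement point: it is also why long-running counted 
loops show up in TTSP reports. C2 may compile an int-counted loop without a 
safepoint poll on the back edge, so the thread has no opportunity to stop 
until the loop exits. A minimal sketch (class and method names are mine, for 
illustration only):

public class CountedLoopTtsp {

    // int-counted loop: C2 may emit no safepoint poll on the back edge,
    // so a pending safepoint waits for the whole loop to finish.
    static long sumCounted(int[] data) {
        long sum = 0;
        for (int i = 0; i < data.length; i++) {
            sum += data[i];
        }
        return sum;
    }

    // long-indexed loop is not a counted loop for C2, so the back edge
    // keeps its poll and the thread can stop between iterations.
    static long sumUncounted(int[] data) {
        long sum = 0;
        for (long i = 0; i < data.length; i++) {
            sum += data[(int) i];
        }
        return sum;
    }
}

-XX:+UseCountedLoopSafepoints keeps the poll in counted loops at some 
throughput cost, if I recall correctly, and later JDKs get a nicer trade-off 
out of loop strip mining.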

Thanks,
-Aleksey



JVM detection of thread at safepoint

2017-12-05 Thread Mark Price
Hi Folks,
I'm investigating some long time-to-safepoint pauses in oracle/openjdk. The 
application in question is also suffering from some fairly nasty I/O 
problems where latency-sensitive threads are being descheduled in 
uninterruptible sleep state due to needing a file-system lock.

My question: can the JVM detect that a thread is in 
signal/interrupt-handler code and thus treat it as though it is at a 
safepoint (as I believe happens when a thread is in native code via a JNI 
call)?

For instance, given the stack trace below, will the JVM need to wait for 
the thread to be scheduled back on to CPU in order to come to a safepoint, 
or will it be treated as "in-native"?

7fff81714cd9 __schedule ([kernel.kallsyms])
7fff817151e5 schedule ([kernel.kallsyms])
7fff81717a4b rwsem_down_write_failed ([kernel.kallsyms])
7fff813556e7 call_rwsem_down_write_failed ([kernel.kallsyms])
7fff817172ad down_write ([kernel.kallsyms])
7fffa0403dcf xfs_ilock ([kernel.kallsyms])
7fffa04018fe xfs_vn_update_time ([kernel.kallsyms])
7fff8122cc5d file_update_time ([kernel.kallsyms])
7fffa03f7183 xfs_filemap_page_mkwrite ([kernel.kallsyms])
7fff811ba935 do_page_mkwrite ([kernel.kallsyms])
7fff811bda74 handle_pte_fault ([kernel.kallsyms])
7fff811c041b handle_mm_fault ([kernel.kallsyms])
7fff8106adbe __do_page_fault ([kernel.kallsyms])
7fff8106b0c0 do_page_fault ([kernel.kallsyms])
7fff8171af48 page_fault ([kernel.kallsyms])
 java stack trace ends here 



Cheers,


Mark
