Re: Continuous performance monitoring with Java FlightRecorder (JFR)?

2016-12-02 Thread Gil Tene
I agree with the need/wish for a common way to get such information from 
JVMs. A common & standard way for JVMs and OpenJDK to provide event tracing 
as well as low-runtime-cost JVM instrumentation details (for information not 
currently covered by JVMTI and not cheaply [enough] gleaned via BCI) would 
be very useful but is not (yet?) part of the platform. JFR is very capable 
but is custom to Oracle. Zing has similar capabilities (and in some cases 
even more detailed information, such as viewing 
down-to-the-generated-machine-instruction hotness and stack traces) but 
those capabilities are also custom (to Zing). And IBM's J9 has its own 
very detailed instrumentation capabilities. 

When looking at overlaps with APM and profiling tools, BCI, JVMTI, and 
other standard and semi-standard instrumentation levels do provide some 
overlapping capabilities for most things, and often at very affordable and 
practical runtime costs [that's why these production-time API tools are so 
popular]. But there are certainly some JVM-based instrumentation 
capabilities that current JVMs (HotSpot, Zing, J9) can fundamentally do 
"better" than the spec'ed standard things that profilers and APMs have 
access to, and/or in much cheaper ways, leaving their information in the 
custom jvm tooling arena for now. Specific examples of this include things 
like (A) tracking and examining wait times on monitor and j.u.c lock 
instances (for which the JVM has first hand knowledge that is better and 
more useful than the information that can be gleaned by external tools in 
a clean/cheap-enough way); (B) the ability to use tick-based stack tracing 
outside of safepoints to profile code behavior. This tick-based stack 
tracing [outside of safepoints] is important not only because it is "cheap 
enough" to provide practical profiles with near-zero runtime overhead, but 
because it is accurate enough to make that information useful (as opposed 
to tick-based at-safepoint or at-BCI-instrumented-point instrumentations, 
which will often skew profiles dramatically). And (C) the ability to track 
and report on very useful heap content stats (e.g. by-type heap occupancy 
and by-type occupancy velocities) that the JVM can fundamentally measure 
cheaply [as a nearly-free part of GC scanning] but is not available to 
common tools via defined interfaces or log formats [forcing tools to 
re-instrument the heap to extract this data if they want it, often at a 
prohibitively high runtime cost]. Knowledge of generated code behavior, 
including information about compilations and deoptimizations, as well as 
the ability to express stack traces in terms of generated code locations 
(which makes stack traces both much cheaper and much more accurate in the 
non-Heisenberg-ing sense) is also an area for which JVMTI capabilities 
could be greatly extended.
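
To make point (B) concrete, here is a minimal sketch (not taken from any of 
the tools named above) of the kind of tick-based sampler that standard APIs 
allow. The class and method names are made up for illustration. It relies 
on Thread.getAllStackTraces(), which on HotSpot collects stacks at a 
safepoint, so the profile it produces is subject to the kind of skew 
described above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical class name; a sketch of a sampler built only on standard APIs.
final class SafepointBiasedSampler {

    /** Takes one sample and counts the top frame of every RUNNABLE thread. */
    static void sampleOnce(Map<String, Integer> hotness) {
        // Standard API. On HotSpot the returned stacks are captured at a
        // safepoint, which is where the bias comes from.
        for (Map.Entry<Thread, StackTraceElement[]> e
                : Thread.getAllStackTraces().entrySet()) {
            StackTraceElement[] stack = e.getValue();
            if (e.getKey().getState() != Thread.State.RUNNABLE || stack.length == 0) {
                continue;
            }
            String top = stack[0].getClassName() + "." + stack[0].getMethodName();
            hotness.merge(top, 1, Integer::sum);
        }
    }

    /** Samples 'ticks' times, sleeping 'intervalMillis' between samples. */
    static Map<String, Integer> sample(int ticks, long intervalMillis) {
        Map<String, Integer> hotness = new HashMap<>();
        for (int i = 0; i < ticks; i++) {
            sampleOnce(hotness);
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt(); // preserve the interrupt and stop
                break;
            }
        }
        return hotness;
    }

    public static void main(String[] args) {
        sample(100, 10).forEach((frame, count) ->
                System.out.println(count + "\t" + frame));
    }
}
```

Running main prints a sample count per top frame. Because the samples land 
only where threads can reach a safepoint, code between safepoint polls may 
be under-represented in the output.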

But even without those we-could-do-better capabilities, monitoring JVM 
behaviors in production seems to be doing pretty well. It can always be 
better, of course, but the state of these production-monitoring tools for 
Java is generally well ahead of what is available in almost all other 
languages and/or runtimes.

On Friday, December 2, 2016 at 2:08:37 AM UTC-8, zeo...@gmail.com wrote:
>
> Hi Gil,
>
> Thanks for the heads up and price references! I was certainly wary of the 
> license aspect even though the project I am planning is for open source 
> development.
>
> Would there be anything of similar capability in openjdk? Looking at the 
> openjdk src repo, it seems that there has been some more JEP 167 (
> http://openjdk.java.net/jeps/167) oriented changes introduced recently 
> into jdk9. The event tracing logic in the jdk8 tracing code seems to 
> already cover the core feature set of native (as opposed to BCI) JFR: 
> stacktrace samples, monitor waits, alloc/gc events, compiler 
> events. Judging by the small volume of changes between 2013 and 2015, I am 
> guessing the tracing feature is not used much in openjdk7/8 and might be of 
> uncertain reliability however (e.g. see this: 
> https://bugs.openjdk.java.net/browse/JDK-8145788). Maybe I should look 
> more into using those openjdk tracing capabilities instead of JFR for jdk9. 
> The runtime configurability and resource management (like JFR's 
> buffer/chunk/checkpoint) isn't quite there yet and I might need to write 
> some hacks to enable output destination that is not stdout/stderr.
>
> As for commercial APM out there, no doubt they will have lots of custom 
> BCI to cover app server use cases. I wonder how well (low overhead and high 
> accuracy) they do on the jvm native instrumentation side (stack sample, 
> alloc events, monitor wait). Same goes for profilers like Yourkit/JProfiler.
>
> Zee
>
> On Thursday, December 1, 2016 at 4:06:59 PM UTC-8, Gil Tene wrote:
>>
>> Virtually all the benefits of monitoring come in production environments 
>> (by definition, I think), and that's probably why you don't see this 
>> scenario (as) 

Re: Continuous performance monitoring with Java FlightRecorder (JFR)?

2016-12-02 Thread zeocio
Hi Gil,

Thanks for the heads up and price references! I was certainly wary of the 
license aspect even though the project I am planning is for open source 
development.

Would there be anything of similar capability in openjdk? Looking at the 
openjdk src repo, it seems that there has been some more JEP 167 
(http://openjdk.java.net/jeps/167) oriented changes introduced recently 
into jdk9. The event tracing logic in the jdk8 tracing code seems to 
already cover the core feature set of native (as opposed to BCI) JFR: 
stacktrace samples, monitor waits, alloc/gc events, compiler 
events. Judging by the small volume of changes between 2013 and 2015, I am 
guessing the tracing feature is not used much in openjdk7/8 and might be of 
uncertain reliability however (e.g. see this: 
https://bugs.openjdk.java.net/browse/JDK-8145788). Maybe I should look more 
into using those openjdk tracing capabilities instead of JFR for jdk9. The 
runtime configurability and resource management (like JFR's 
buffer/chunk/checkpoint) isn't quite there yet and I might need to write 
some hacks to enable output destination that is not stdout/stderr.

As for commercial APM out there, no doubt they will have lots of custom 
BCI to cover app server use cases. I wonder how well (low overhead and high 
accuracy) they do on the jvm native instrumentation side (stack sample, 
alloc events, monitor wait). Same goes for profilers like Yourkit/JProfiler.
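
For the monitor-wait part of that question, one standard (non-JFR) data 
source is the contention monitoring in java.lang.management. Below is a 
rough sketch, with a made-up class name, of reading per-thread blocked 
counts and times through ThreadMXBean. It assumes the JVM supports 
contention monitoring, and it says nothing about how any particular APM 
product or profiler actually collects this data.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

// Hypothetical class name; only the java.lang.management calls are real APIs.
final class ContentionProbe {

    /**
     * Forces one worker thread to block on a monitor for about holdMillis,
     * then returns {blockedCount, blockedTimeMillis} as reported by the JVM.
     * Returns {-1, -1} if contention monitoring is not supported.
     */
    static long[] measure(long holdMillis) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        if (!mx.isThreadContentionMonitoringSupported()) {
            return new long[] {-1, -1};
        }
        mx.setThreadContentionMonitoringEnabled(true);

        final Object lock = new Object();
        final CountDownLatch release = new CountDownLatch(1);
        Thread worker = new Thread(() -> {
            synchronized (lock) { /* the contended monitor entry happens here */ }
            try {
                release.await(); // stay alive so ThreadInfo is still available
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.setDaemon(true);

        synchronized (lock) {
            worker.start();
            // Wait until the worker is really blocked on our monitor.
            while (worker.getState() != Thread.State.BLOCKED) {
                Thread.yield();
            }
            try {
                Thread.sleep(holdMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        // The worker now acquires the monitor and parks on the latch.
        while (worker.getState() != Thread.State.WAITING) {
            Thread.yield();
        }
        ThreadInfo info = mx.getThreadInfo(worker.getId());
        release.countDown();
        return new long[] {info.getBlockedCount(), info.getBlockedTime()};
    }

    public static void main(String[] args) {
        long[] r = measure(50);
        System.out.println("blocked " + r[0] + " time(s), " + r[1] + " ms in total");
    }
}
```

This only covers synchronized monitors as seen per thread; it does not give 
per-lock-instance wait times, which is the kind of detail the JVM-native 
tooling discussed in this thread is better placed to provide.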

Zee

On Thursday, December 1, 2016 at 4:06:59 PM UTC-8, Gil Tene wrote:
>
> Virtually all the benefits of monitoring come in production environments 
> (by definition, I think), and that's probably why you don't see this 
> scenario (as) commonly used with JFR.
>
> Basically, using JFR for production [currently at least] requires a 
> commercial Java SE Advanced license. How/if this is enforced technically is 
> irrelevant; the click-through license that allows you to use it for free 
> is specifically restricted to non-production use. This is spelled out in 
> the Oracle Binary Code License Agreement for the Java SE Platform Products 
> and JavaFX (
> http://www.oracle.com/technetwork/java/javase/terms/license/index.html), 
> under SUPPLEMENTAL LICENSE TERMS... A. COMMERCIAL FEATURES. and B. SOFTWARE 
> INTERNAL USE FOR DEVELOPMENT LICENSE GRANT. And since JFR is clearly marked 
> as a "Commercial Feature" (you literally have to pass the 
> -XX:+UnlockCommercialFeatures -XX:+FlightRecorder flags to use it) it's 
> impossible to claim ignorance of this fact. See e.g. 
> https://www.infoq.com/news/2013/10/misson-control-flight-recorder, 
> http://www.adam-bien.com/roller/abien/entry/java_mission_control_development_pricing, 
> and 
> https://docs.oracle.com/javacomponents/jmc-5-4/jfr-runtime-guide/run.htm#JFRUH164 
> for some discussion and mentions around it. 
>
> So while JFR can and may do some cool (and even semi-unique) things for 
> production monitoring, you'd have to clear the commercial pricing terms 
> first, and those seem pretty steep, as in a list price of $5000 per 2 x86 
> cores according to the Oracle price list (
> http://www.oracle.com/us/corporate/pricing/technology-price-list-070617.pdf), 
> which would equate to e.g. $40K per instance for EC2 m3.2xlarge instances, 
> and $80K-160K per server for modern 2 socket servers (those with "only" 
> 16-32 cores). While I'm sure the actual production pricing could end up 
> much lower once purchasing departments finish hand-wrestling with Oracle's 
> sales folks, it would probably still be way more than other commercial 
> monitoring and JVM-knowledgeable APM solutions that are much more feature 
> rich and focused would cost (e.g. Dynatrace, AppDynamics, NewRelic, etc.), 
> all of which list at a tiny fraction of the Oracle Java SE Advanced list 
> price levels (and are massively used in production systems).
>
> On Thursday, December 1, 2016 at 12:18:18 PM UTC-8, zeo...@gmail.com 
> wrote:
>>
>> Hi all,
>>
>> Does anyone know a good way to do continuous performance monitoring using 
>> JFR (JDK8)? I am interested in using this on some apache data pipeline 
>> projects (Spark, Flink etc). I have used JFR for perf profiling with fixed 
>> duration before. Continuous monitoring would be quite different.
>>
>> The ideal scenario is to set up JFR to write to UDP  
>> destinations with configurable update frequencies. Obviously that is not 
>> supported by JFR as it stands today. So I tried setting up continuous JFR 
>> with maxage=30s and running JFR.dump every 30s. To my surprise, the time 
>> range covered by the dumped jfr files does NOT correspond to the maxage 
>> parameter I gave. Instead the time ranges 
>> (FlightRecordingLoader.loadFile(new 
>> File("xyz.jfr")).timeRange) from successive JFR.dump can be overlapping 
>> and much bigger than maxage.
>>
>> So a couple of questions for those experienced users of JFR:
>>
>> -- What exactly are the semantics of maxage?
>> I imagined that maxage has 2 effects: discarding events older