Axel, I’ve filed this bug and looked at various options for fixing it:
https://bugs.openjdk.java.net/browse/JDK-8160435

The simplest solution seems to be to use java.net.URI instead of java.net.URL. It provides an isOpaque() method which will properly recognize your URIs as non-hierarchical. It also provides a resolve() method to get the base URI, and it is not tied to I/O handlers.

I’ll be posting a request for review soon.

Hannes

> On 28.06.2016 at 09:02, Hannes Wallnöfer <hannes.wallnoe...@oracle.com> wrote:
>
> Hi Axel,
>
> Thanks for the explanation and code to reproduce the problem.
>
> I’m looking at it right now.
>
> Hannes
>
>
>> On 27.06.2016 at 23:53, Axel Faust <axel.faus...@googlemail.com> wrote:
>>
>> Hello,
>>
>> TL;DR: I use custom URL protocol schemes and stream handlers that are not
>> globally registered. This causes excessive handler resolution overhead in
>> URL.getURLStreamHandler(), called implicitly in Source.baseURL(). I can't
>> find a way to avoid this overhead (in JDK 1.8.0_71) without resorting to
>> one of two impossible choices: complete refactoring or registering a JVM
>> global URLStreamHandlerFactory.
>> A test case for sampling the overhead is provided in
>> https://gist.github.com/AFaust/04ec0c65a560e306b6b547dcaf38fd21
>>
>>
>>
>> This is a follow-up to my tweet from yesterday:
>> https://twitter.com/ReluctantBird83/status/747145726703075328
>> In this tweet I was commenting on an observation I made from CPU sampling
>> the current state of my Nashorn-based script engine for the open source ECM
>> platform Alfresco (https://github.com/AFaust/alfresco-nashorn-script-engine).
>>
>> What prompted the comment were the following hot spot methods from my
>> jvisualvm sampling session, when I was testing a trivial ReST endpoint
>> backed by a Nashorn-executed script:
>>
>> "Hot Spots - Method","Self Time [%]","Self Time","Self Time (CPU)","Total Time","Total Time (CPU)","Samples"
>> "java.lang.invoke.LambdaForm$MH.771977685.linkToCallSite()","15.152575","793.365 ms","793.365 ms","1126.483 ms","1126.483 ms","63"
>> "java.net.URL.<init>()","11.350913","594.316 ms","594.316 ms","594.316 ms","594.316 ms","33"
>> "java.lang.Throwable.<init>()","7.248728","379.532 ms","379.532 ms","379.532 ms","379.532 ms","21"
>> [...]
>> "jdk.nashorn.internal.runtime.Source.baseURL()","0.0","0.0 ms","0.0 ms","594.316 ms","594.316 ms","33"
>> [...]
>>
>> The 1st and 3rd hot spots are directly related to frequently called code in
>> my scripts / my utilities and somewhat expected, but I was not expecting
>> the URL constructor to be up there.
>> The backtraces view of the snapshot showed Source.baseURL() as the
>> immediate and only caller of the URL constructor, even though I have other
>> calls in my code which apparently don't trigger the sampling threshold.
>> The total time per execution of the script is around 50-60ms with few
>> outliers up to 90-100ms (sampling started only after a reasonably stable
>> state was reached). Sampling was limited specifically to the jdk.nashorn.*,
>> jdk.internal.* and de.* packages.
>>
>> A bit of background on my Alfresco Nashorn engine:
>> - embedded into a web application that may potentially run in Tomcat or JEE
>> servers (JBoss, WebSphere...)
>> - JavaScript in Alfresco is extensively used for embedded rules, policies
>> (event handling), ReST API endpoints and server-side UI pre-composition
>> - use of an AMD-like module system allowing flexible extension of script
>> API by 3rd party developers of Alfresco "addons"
>> - one file per module, lazily loaded when required by another module or
>> executed script
>> - frequently used "core" modules will be pre-loaded and cached on startup
>> - scripts are referenced via "logical" URLs using custom protocol schemes
>> to denote different script resolution and load scopes/mechanisms (example:
>> "webscript:///my/module/id" for a module in the lookup scope for ReST
>> endpoint scripts; some scripts may be user-managed within the content
>> repository / database itself)
>> - custom protocol schemes are handled by custom URL stream handlers *NOT*
>> globally registered (to avoid interfering with other web applications or
>> other URL-related functionality in the same JVM)
>>
>>
>> It turns out that the last two points are essential. I created a
>> generalised test case in a GitHub gist:
>> https://gist.github.com/AFaust/04ec0c65a560e306b6b547dcaf38fd21
>> Essentially it is URL.getURLStreamHandler() which is responsible for the
>> overhead. Source.baseURL() creates a "base" name from the source URL
>> and if the protocol is not "file://" then a new URL will be created. Since
>> I use custom URL stream handlers and have not registered a global stream
>> handler factory (and won't ever do so), the new URL will try to resolve the
>> handler via URL.getURLStreamHandler(), jump through all the hoops and always
>> fail in the end. A failed resolution is never cached, so every time
>> Source.baseURL() is called this whole process / overhead is repeated.
>>
>>
>> I am currently trying to reduce all global overheads of my script engine
>> setup, but can't find a way to avoid this overhead without either
>> registering a global URL stream handler factory, which is out of the
>> question for various reasons (web application; 3rd party loaders;
>> engine-specific semantics), or completely refactoring the engine so all
>> scripts are copied to simple "file://" locations before execution
>> (requiring constant sync-checking with the original script in the source
>> storage location).
>>
>> Ideally, I would like to see options to provide both a base URL myself as
>> pre-resolved information via URLReader/Global.load() and register a custom
>> stream handler factory with my Nashorn engine instance. This would allow
>> "simple" loaders to use simple URL-Strings instead of real URL instances to
>> load script files via Global.load(), as well as "complex" loaders to
>> continue using stateful custom URL stream handlers where necessary. And it
>> would allow Nashorn to resolve a potential custom URL stream handler before
>> relegating to default JVM global handling if no handler is found.
>>
>> I am sure I am not aware of all the implications - and certainly I am aware
>> that such a change in a core class might be impossible - but
>> URL.getURLStreamHandler() should really cache failed stream handler
>> resolutions and avoid repeating the entire lookup routine...
>>
>>
>> Kind regards, and sorry for this overly long "summary"
>> Axel Faust
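
The behaviour discussed in this thread can be illustrated with a minimal Java sketch. It is not the Nashorn code and not the eventual fix. The class name BaseUriSketch, its methods, and the stub handler are hypothetical; only the java.net calls are real JDK API. The sketch assumes no handler for the "webscript" scheme is registered anywhere in the JVM. It shows that rebuilding a URL from its components without passing a handler forces a handler lookup that fails, while java.net.URI derives a base location purely syntactically. Returning null for opaque URIs is an arbitrary choice made for this sketch. Note that the URL constructors used here are deprecated in recent JDK releases.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLStreamHandler;

// Sketch only: class, method, and handler names are illustrative assumptions.
class BaseUriSketch {

    // Hypothetical application-local handler that is never registered globally.
    static final URLStreamHandler LOCAL_HANDLER = new URLStreamHandler() {
        @Override
        protected URLConnection openConnection(URL u) {
            throw new UnsupportedOperationException("not needed for this sketch");
        }
    };

    // Creates a URL with an explicit handler, so no handler lookup is needed.
    static URL localUrl(String spec) {
        try {
            return new URL(null, spec, LOCAL_HANDLER);
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Rebuilds a URL from its parts without a handler. java.net.URL then has
    // to look up a handler for the protocol itself, and the constructor throws
    // if none is found.
    static boolean rebuildWithoutHandlerFails(URL source, String basePath) {
        try {
            new URL(source.getProtocol(), source.getHost(), source.getPort(), basePath);
            return false;
        } catch (MalformedURLException e) {
            return true; // no handler could be resolved for the protocol
        }
    }

    // java.net.URI involves no handlers: resolving "." against a hierarchical
    // URI yields its parent location. Opaque URIs (scheme-specific part not
    // starting with '/') have no path, so this sketch returns null for them.
    static URI baseOf(URI source) {
        return source.isOpaque() ? null : source.resolve(".");
    }

    public static void main(String[] args) {
        String spec = "webscript:///my/module/id";
        URL source = localUrl(spec);
        System.out.println(rebuildWithoutHandlerFails(source, "/my/module/"));
        System.out.println(baseOf(URI.create(spec)));
        System.out.println(URI.create("webscript:my/module/id").isOpaque());
    }
}
```

The handler-free URL constructor repeats the failing lookup on every call, whereas the URI-based variant never touches the handler machinery at all.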