Hans Brende created ANY23-336:
---------------------------------
Summary: Parsing JSON-LD content takes a prohibitively long time
Key: ANY23-336
URL: https://issues.apache.org/jira/browse/ANY23-336
Project: Apache Any23
Issue Type: Bug
Reporter: Hans Brende
Using the page [https://www.guthriegreen.com|https://www.guthriegreen.com/] as
a benchmark, I found that fetching the page took about 100 ms, while simply *parsing* the JSON-LD
content on that page took a *staggering 27,400 ms*. For reference, I'm using
Java 8 (build 162) on a MacBook Pro (early 2015).
The bad news is that this is not our fault.
By profiling, I've traced this behavior to the
{{com.github.jsonldjava.utils.JsonUtils.fromURL(URL, CloseableHttpClient)}}
method, where 94% of the parsing time is spent. This method is called whenever
the parser needs to load a remote JSON-LD context.
To avoid loading remote contexts repeatedly, this method tries to
*cache* them using a {{CachingHttpClient}} from the httpclient-osgi library.
Unfortunately, that strategy is *not* working: I have recorded exactly
*zero* cache hits, meaning that *every* retrieval is a cache miss and each remote
context is re-fetched over HTTP every single time it's accessed.
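For comparison, the behavior one would expect from a working cache (each remote context fetched once per URL, then served from memory) can be sketched with a plain JDK map. This is not the caching mechanism used by jsonld-java or httpclient; `ContextCache` and its `fetcher` function are hypothetical, and a production version would likely also want eviction and respect for HTTP cache headers.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical in-memory cache for remote JSON-LD contexts, keyed by URL.
public class ContextCache {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Function<String, String> fetcher; // performs the actual (slow) remote fetch
    private final AtomicInteger requests = new AtomicInteger();
    private final AtomicInteger misses = new AtomicInteger();

    public ContextCache(Function<String, String> fetcher) {
        this.fetcher = fetcher;
    }

    // Returns the context document for url, calling the fetcher only if it is not cached yet.
    public String get(String url) {
        requests.incrementAndGet();
        return cache.computeIfAbsent(url, u -> {
            misses.incrementAndGet(); // runs only when the URL is absent from the cache
            return fetcher.apply(u);
        });
    }

    public int hits() {
        return requests.get() - misses.get();
    }

    public int misses() {
        return misses.get();
    }

    public static void main(String[] args) {
        AtomicInteger remoteCalls = new AtomicInteger();
        ContextCache contexts = new ContextCache(url -> {
            remoteCalls.incrementAndGet(); // counts simulated network round trips
            return "{\"@context\": {}}";
        });
        for (int i = 0; i < 3; i++) {
            contexts.get("https://schema.org/");
        }
        // prints "hits=2 misses=1 remoteCalls=1"
        System.out.println("hits=" + contexts.hits() + " misses=" + contexts.misses()
                + " remoteCalls=" + remoteCalls.get());
    }
}
```

With a cache like this, repeated references to the same context cost one network round trip in total, which is the property the recorded zero-hit behavior lacks.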
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)