Hans Brende created ANY23-336:
---------------------------------

             Summary: Parsing json-ld content takes prohibitively long time
                 Key: ANY23-336
                 URL: https://issues.apache.org/jira/browse/ANY23-336
             Project: Apache Any23
          Issue Type: Bug
            Reporter: Hans Brende


Using the page [https://www.guthriegreen.com|https://www.guthriegreen.com/] as 
a benchmark, a page fetch took about 100 ms, while simply *parsing* the json-ld 
content on that page took a *staggering 27400 ms*. For reference, I'm using 
Java 8, build 162, on a MacBook Pro (early 2015).

The bad news is that this is not our fault.

I've profiled this behavior down to the 
{{com.github.jsonldjava.utils.JsonUtils.fromURL(URL, CloseableHttpClient)}} 
function. 94% of the parsing time is spent there. This function is called when 
trying to load remote json-ld contexts. 

In order to avoid loading remote contexts repeatedly, this function tries to 
*cache* them by using a {{CachingHttpClient}} from the httpclient-osgi library.

Unfortunately, that strategy is *not* working: I have recorded exactly 
*zero* cache hits, meaning that *every* retrieval is a cache miss and the remote 
context is re-fetched over HTTP every single time it's accessed.
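
To illustrate the behavior one would expect from a working cache, here is a 
minimal sketch of memoizing remote contexts by URL while counting hits and 
misses. It is not jsonld-java's implementation: {{ContextCache}} and the 
{{fetcher}} function are hypothetical names, and the fetcher is a stand-in for 
whatever performs the real HTTP request.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical in-memory cache for remote JSON-LD context documents, keyed by URL. */
class ContextCache {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Function<String, String> fetcher; // stand-in for the real HTTP fetch
    private final AtomicInteger hits = new AtomicInteger();
    private final AtomicInteger misses = new AtomicInteger();

    ContextCache(Function<String, String> fetcher) {
        this.fetcher = fetcher;
    }

    /** Returns the cached document for url, fetching it only on the first request. */
    String get(String url) {
        String cached = cache.get(url);
        if (cached != null) {
            hits.incrementAndGet();
            return cached;
        }
        // Two threads racing on the same URL may both fetch once; that is
        // acceptable for a sketch, since later calls are served from the map.
        misses.incrementAndGet();
        String body = fetcher.apply(url);
        cache.put(url, body);
        return body;
    }

    int hits() { return hits.get(); }

    int misses() { return misses.get(); }

    public static void main(String[] args) {
        // Fake fetcher: returns a canned document instead of going to the network.
        ContextCache contexts = new ContextCache(url -> "{\"@context\": {}}");
        for (int i = 0; i < 3; i++) {
            contexts.get("http://schema.org/");
        }
        System.out.println("hits=" + contexts.hits() + ", misses=" + contexts.misses());
    }
}
```

With a cache like this, a page that references the same remote context many 
times should produce one miss followed by hits only. The profile described 
above shows the opposite: misses only.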

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
