[ https://issues.apache.org/jira/browse/ANY23-336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16416758#comment-16416758 ]

Peter Ansell commented on ANY23-336:
------------------------------------

JarCacheStorage is implemented as a primary cache, not a backup cache. If the
user has specified in jarcache.json that a context is available on the
classpath, that copy will always be used, rather than only when the context
can't be fetched live.
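The difference between the two lookup orders can be sketched in Java as below. This is a hypothetical illustration, not the JarCacheStorage source: the class `ContextLoader`, its `Mode` enum, the map standing in for the jarcache.json entries and the `Function` standing in for an HTTP GET are all assumptions made for the example. In PRIMARY mode a bundled copy wins even when the network is up; in BACKUP mode the bundled copy is consulted only after the live fetch fails.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Collections;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of primary-cache vs backup-cache lookup; NOT the JarCacheStorage code.
final class ContextLoader {
    enum Mode { PRIMARY, BACKUP }

    private final Map<String, String> bundled;        // assumption: stand-in for contexts listed in jarcache.json
    private final Function<String, String> liveFetch; // assumption: stand-in for an HTTP GET, fails with UncheckedIOException
    private final Mode mode;

    ContextLoader(Map<String, String> bundled, Function<String, String> liveFetch, Mode mode) {
        this.bundled = bundled;
        this.liveFetch = liveFetch;
        this.mode = mode;
    }

    String load(String url) {
        String local = bundled.get(url);
        if (mode == Mode.PRIMARY && local != null) {
            return local; // primary cache: the classpath copy wins, no network access at all
        }
        try {
            return liveFetch.apply(url);
        } catch (UncheckedIOException e) {
            if (local != null) {
                return local; // backup cache: bundled copy used only because the live fetch failed
            }
            throw e;
        }
    }

    public static void main(String[] args) {
        Map<String, String> bundled =
                Collections.singletonMap("http://schema.org/", "{\"@context\":\"bundled\"}");
        Function<String, String> online = url -> "{\"@context\":\"live\"}";
        Function<String, String> offline = url -> {
            throw new UncheckedIOException(new IOException("network unavailable: " + url));
        };

        System.out.println(new ContextLoader(bundled, online, Mode.PRIMARY).load("http://schema.org/")); // bundled copy
        System.out.println(new ContextLoader(bundled, online, Mode.BACKUP).load("http://schema.org/"));  // live copy
        System.out.println(new ContextLoader(bundled, offline, Mode.BACKUP).load("http://schema.org/")); // bundled copy
    }
}
```

The practical consequence of the primary-cache order is that a stale bundled context is never refreshed from the network, which is also why it avoids the remote fetch entirely.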

While you are in that code, could you look through and comment on another 
performance-related PR for me that the original requestor hasn't commented on: 

https://github.com/jsonld-java/jsonld-java/pull/224

It will change the default behaviour (which most people will never override), 
so I want some more eyes looking at it before thinking about it again.

Regarding the entry keys in the cache, my understanding of HTTP semantics is
that they have to be that way, because of content negotiation on both the
content type and the content encoding. I haven't done anything special in that
code, but there are tests that attempt to verify that caching works (based on
HTTP response codes), so I would default to thinking that it does work, even
though it looks crazy. If you aren't modifying the Accept header or the
Accept-Encoding header, they shouldn't have any effect on the cacheability of
the resource.
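For background on why the keys look that way: when a response carries a Vary header, HTTP caching only reuses it for a later request whose values for the named headers match (RFC 7234, section 4.1, "Calculating Secondary Keys with Vary"). The Java sketch below is a simplified model of that rule. `VaryCacheKey.keyFor` is a hypothetical helper and its string format is an assumption; it is not the key format httpclient-cache actually uses. It shows the point made above: as long as Accept and Accept-Encoding stay the same, the key is stable and a stored context can be reused.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Simplified model of HTTP secondary keys (Vary); NOT httpclient-cache's real key format.
final class VaryCacheKey {

    /** Combines the URI with the request's values for every header named in the response's Vary. */
    static String keyFor(String uri, List<String> varyHeaderNames, Map<String, String> requestHeaders) {
        // Header field names are case-insensitive, so normalise them before lookup.
        Map<String, String> byLowerName = new TreeMap<>();
        for (Map.Entry<String, String> header : requestHeaders.entrySet()) {
            byLowerName.put(header.getKey().toLowerCase(Locale.ROOT), header.getValue().trim());
        }
        // A TreeMap keeps the selected headers sorted, so the order of names in Vary does not matter.
        Map<String, String> selected = new TreeMap<>();
        for (String name : varyHeaderNames) {
            String lower = name.trim().toLowerCase(Locale.ROOT);
            // Simplification: an absent header and an empty one produce the same key.
            selected.put(lower, byLowerName.getOrDefault(lower, ""));
        }
        return uri + " " + selected;
    }

    public static void main(String[] args) {
        List<String> vary = Arrays.asList("Accept", "Accept-Encoding");
        Map<String, String> first = new TreeMap<>();
        first.put("Accept", "application/ld+json");
        first.put("Accept-Encoding", "gzip");

        Map<String, String> repeat = new TreeMap<>(first);
        repeat.put("User-Agent", "any23"); // not listed in Vary, so it cannot change the key

        Map<String, String> negotiated = new TreeMap<>(first);
        negotiated.put("Accept", "application/json"); // different content type requested

        String uri = "http://schema.org/";
        System.out.println(keyFor(uri, vary, first).equals(keyFor(uri, vary, repeat)));     // true: same entry
        System.out.println(keyFor(uri, vary, first).equals(keyFor(uri, vary, negotiated))); // false: separate entry
    }
}
```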

> Parsing json-ld content takes prohibitively long time
> -----------------------------------------------------
>
>                 Key: ANY23-336
>                 URL: https://issues.apache.org/jira/browse/ANY23-336
>             Project: Apache Any23
>          Issue Type: Bug
>          Components: core, extractors
>    Affects Versions: 2.2
>            Reporter: Hans Brende
>            Assignee: Peter Ansell
>            Priority: Critical
>             Fix For: 2.3
>
>         Attachments: Screen Shot 2018-03-27 at 2.52.15 PM.png, Screen Shot 
> 2018-03-27 at 2.54.43 PM.png
>
>
> Using the page [https://www.guthriegreen.com|https://www.guthriegreen.com/] 
> as a benchmark, a page fetch took about 100 ms, while simply *parsing* the 
> json-ld content on that page took a *staggering 27400 ms*. For reference, I'm 
> using Java 8, build 162, on a MacBook Pro (early 2015).
> The bad news is that this is not our fault.
> I've profiled this behavior down to the 
> {{com.github.jsonldjava.utils.JsonUtils.fromURL(URL, CloseableHttpClient)}} 
> function. 94% of the parsing time is spent there. This function is called 
> when trying to load remote json-ld contexts. 
> In order to avoid loading remote contexts repeatedly, this function tries to 
> *cache* them by using a {{CachingHttpClient}} from the httpclient-osgi 
> library.
> Unfortunately, that strategy is *not* working, as I have recorded exactly 
> *zero* cache hits, meaning that *every* retrieval is a cache miss and a 
> remote context is re-fetched via http every single time it's accessed.
>  
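One way to cross-check a zero-hit observation independently of the HTTP client is to wrap the remote context fetch in an instrumented in-memory cache keyed by URL. The Java sketch below is hypothetical: `CountingContextCache` is not part of jsonld-java or Any23, and the `Function` is only a stand-in for the remote fetch. With a working cache, only the first lookup of each URL should be a miss.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical instrumented cache, not part of jsonld-java or Any23. Not thread-safe.
final class CountingContextCache {
    private final Map<String, String> documents = new HashMap<>();
    private final Function<String, String> fetcher; // assumption: stand-in for the remote context fetch
    private long hits;
    private long misses;

    CountingContextCache(Function<String, String> fetcher) {
        this.fetcher = fetcher;
    }

    String get(String url) {
        String doc = documents.get(url);
        if (doc != null) {
            hits++; // served from memory, no network round trip
            return doc;
        }
        misses++; // URL not cached yet (or the previous fetch returned null): go to the network
        doc = fetcher.apply(url);
        documents.put(url, doc);
        return doc;
    }

    long hits() { return hits; }

    long misses() { return misses; }

    public static void main(String[] args) {
        int[] networkCalls = {0};
        CountingContextCache cache = new CountingContextCache(url -> {
            networkCalls[0]++;
            return "{\"@context\":{}}";
        });
        for (int i = 0; i < 3; i++) {
            cache.get("http://schema.org/");
        }
        // prints "hits=2 misses=1 networkCalls=1"
        System.out.println("hits=" + cache.hits() + " misses=" + cache.misses() + " networkCalls=" + networkCalls[0]);
    }
}
```

A plain `HashMap` keeps the sketch short; a cache shared across extractor threads would need a concurrent map and some bound on its size.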



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
