[ 
https://issues.apache.org/jira/browse/ANY23-336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16417871#comment-16417871
 ] 

Hans Brende edited comment on ANY23-336 at 3/30/18 7:08 AM:
------------------------------------------------------------

[~p_ansell] Regarding the PR you mentioned, I think that is a wonderful idea... 
the best part being: for applications that don't use the classpath at all 
(e.g., Any23), the list of resources will be empty, resulting in zero amortized 
time wasted searching the classpath! 
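
A minimal sketch of why the cost amortizes to zero, assuming the PR amounts to "scan the classpath once, on first use, and reuse the result". The class name OnceScannedResources and the resource name are hypothetical and not taken from the PR:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;

// Hypothetical sketch: scan the classpath for one resource name the first time it
// is needed, then keep returning that same list. When nothing is found the cached
// list is simply empty, so every later lookup is just a field read.
final class OnceScannedResources {
    private final String resourceName;    // hypothetical name, not taken from the PR
    private volatile List<URL> resources; // null until the first scan

    OnceScannedResources(String resourceName) {
        this.resourceName = resourceName;
    }

    List<URL> get() {
        List<URL> result = resources;
        if (result == null) {              // first call: do the expensive scan
            synchronized (this) {
                result = resources;
                if (result == null) {      // re-check under the lock (double-checked locking)
                    result = scan();
                    resources = result;
                }
            }
        }
        return result;                     // later calls never touch the classpath
    }

    private List<URL> scan() {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClassLoader.getSystemClassLoader();
        }
        List<URL> found = new ArrayList<>();
        try {
            Enumeration<URL> urls = loader.getResources(resourceName);
            while (urls.hasMoreElements()) {
                found.add(urls.nextElement());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return Collections.unmodifiableList(found);
    }
}
```

For an application like Any23 that ships no such resource, the first get() returns an empty list and every later call returns that same empty list without searching again.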

I added a suggestion: in the event that you would like to retain the previous 
Thread.getContextClassLoader() switching ability, it's a pretty easy fix: just 
keep track of the previous classloader you used to load the resources, and if 
it's changed, reload. That would also render the modifications you made to the 
test files unnecessary.
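
A rough sketch of that suggestion, assuming a single cached result: remember which class loader produced the cached list and rescan only when Thread.getContextClassLoader() returns a different one. LoaderAwareResources and its scans() counter are hypothetical, added only for illustration:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;

// Hypothetical sketch: cache the resource list together with the class loader that
// produced it, and reload only when the thread's context class loader has changed.
final class LoaderAwareResources {
    private final String resourceName; // hypothetical resource name
    private ClassLoader cachedLoader;  // loader used for the last scan
    private List<URL> cached;          // result of the last scan
    private int scans;                 // how many real classpath scans happened

    LoaderAwareResources(String resourceName) {
        this.resourceName = resourceName;
    }

    synchronized List<URL> get() {
        ClassLoader current = Thread.currentThread().getContextClassLoader();
        if (current == null) {
            current = ClassLoader.getSystemClassLoader();
        }
        // Rescan on first use, or when the context class loader was switched.
        if (cached == null || current != cachedLoader) {
            List<URL> found = new ArrayList<>();
            try {
                Enumeration<URL> urls = current.getResources(resourceName);
                while (urls.hasMoreElements()) {
                    found.add(urls.nextElement());
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            cached = Collections.unmodifiableList(found);
            cachedLoader = current;
            scans++;
        }
        return cached;
    }

    synchronized int scans() {
        return scans;
    }
}
```

Because only one (loader, list) pair is kept, switching back to an earlier loader also triggers a rescan; that keeps the sketch small, and a map keyed by class loader would avoid it at the cost of holding references to every loader seen.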

Other than that, the PR looks good to me!

(Also: with this PR implemented, it wouldn't really matter which of my above 
branches you use, since the speed increase you'd get by using the first branch 
would be dwarfed by the speed increase of *not* checking the classpath every 
time.)


> Parsing json-ld content takes prohibitively long time
> -----------------------------------------------------
>
>                 Key: ANY23-336
>                 URL: https://issues.apache.org/jira/browse/ANY23-336
>             Project: Apache Any23
>          Issue Type: Bug
>          Components: core, extractors
>    Affects Versions: 2.2
>            Reporter: Hans Brende
>            Assignee: Peter Ansell
>            Priority: Critical
>             Fix For: 2.3
>
>         Attachments: Screen Shot 2018-03-27 at 2.52.15 PM.png, Screen Shot 
> 2018-03-27 at 2.54.43 PM.png
>
>
> Using the page [https://www.guthriegreen.com|https://www.guthriegreen.com/] 
> as a benchmark, a page fetch took about 100 ms, while simply *parsing* the 
> json-ld content on that page took a *staggering 27400 ms*. For reference, I'm 
> using Java 8, build 162, on a Macbook Pro (early 2015).
> The bad news is that this is not our fault.
> I've profiled this behavior down to the 
> {{com.github.jsonldjava.utils.JsonUtils.fromURL(URL, CloseableHttpClient)}} 
> function. 94% of the parsing time is spent there. This function is called 
> when trying to load remote json-ld contexts. 
> In order to avoid loading remote contexts repeatedly, this function tries to 
> *cache* them by using a {{CachingHttpClient}} from the httpclient-osgi 
> library.
> Unfortunately, that strategy is *not* working, as I have recorded exactly 
> *zero* cache hits, meaning that *every* retrieval is a cache miss and a 
> remote context is re-fetched via http every single time it's accessed.
>  
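
For illustration only, the sketch below shows the behaviour the caching was meant to provide: each remote context URL is fetched at most once per process and served from memory afterwards. It is a hypothetical in-process map, not jsonld-java's API, not a CachingHttpClient configuration, and not the fix adopted for 2.3; the fetcher function stands in for the real HTTP request and is an assumption.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical sketch (not jsonld-java's API): an in-memory cache of remote JSON-LD
// context documents keyed by URL, so repeated accesses are cache hits instead of
// new HTTP requests.
final class RemoteContextCache {
    private final Map<String, String> byUrl = new ConcurrentHashMap<>();
    private final Function<String, String> fetcher; // hypothetical: performs the real HTTP GET
    private final AtomicInteger misses = new AtomicInteger();

    RemoteContextCache(Function<String, String> fetcher) {
        this.fetcher = fetcher;
    }

    String get(String url) {
        // The mapping function only runs when the URL is not cached yet (a miss).
        return byUrl.computeIfAbsent(url, u -> {
            misses.incrementAndGet();
            return fetcher.apply(u);
        });
    }

    int misses() {
        return misses.get();
    }
}
```

With this shape, parsing a page that references the same context many times costs one fetch per distinct URL. Note that ConcurrentHashMap.computeIfAbsent may briefly block other updates to the map while a fetch is in progress, which is tolerable for a handful of context URLs.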



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
