For me this is really bad practice. It also looks like they ran the
benchmark more than a year ago; otherwise, with JENA-1195 fixed, this
error wouldn't occur anymore. And the submission deadline was August 6th,
2017. Their experiments contain 8 queries; rerunning those shouldn't take
ages...

I'm currently trying to reproduce the results of the paper, but the
whole experimental setup remains unclear. I'm wondering if they used
just the Jena CLI or TDB. The same holds for RDF4J. I'm puzzled because
the runtimes in the eval section are quite small, but even loading the
data of their benchmark takes much more time. So maybe they used the
RDF4J server.

The worst thing is that they didn't contact any of the developers. Or
did they talk to somebody here and then Andy created the ticket
JENA-1195? Also for the other queries that failed, I would expect to see
tickets on Apache JIRA or at least a hint on the Jena mailing list...

@Andy I'm also wondering whether JENA-1317 addresses the problem with
the empty result of the benchmark query containing an inverse property
path.
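
To make the expected behaviour of the * operator concrete, here is a
minimal Python sketch of how I understand zero-or-more path evaluation
under SPARQL 1.1: the start node always matches via the zero-length path,
and each reachable node is reported once. This is not Jena's
implementation; the function name `zero_or_more` and the toy triples are
made up for illustration.

```python
from collections import deque


def zero_or_more(triples, start, predicate):
    """Return the nodes reachable from `start` via zero or more `predicate` edges.

    Hypothetical helper for illustration only; nodes are returned in
    breadth-first discovery order, each at most once.
    """
    # Index outgoing edges for the single predicate of interest.
    successors = {}
    for s, p, o in triples:
        if p == predicate:
            successors.setdefault(s, []).append(o)

    visited = {start}      # zero-length path: the start node matches itself
    order = [start]
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in successors.get(node, []):
            if nxt not in visited:  # report each node once, so cycles terminate
                visited.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


# Toy data (assumed, not from the benchmark): a B-cycle A -> x -> y -> A.
data = [("A", "B", "x"), ("x", "B", "y"), ("y", "B", "A"), ("A", "C", "z")]

print(zero_or_more(data, "A", "B"))
print(zero_or_more(data, "A", "no-such-predicate"))
```

With a predicate that does not occur in the data, the sketch returns the
start node exactly once, which is what I would expect from a correct
implementation, rather than 100 copies of A as reported in the paper.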


On 18.10.2017 17:03, aj...@apache.org wrote:
> As you know, Andy, I'm going to ISWC this year-- shall I buttonhole
> them and give them our POV? :grin:
>
> In all seriousness, from what I can tell the results amount to "Using
> older versions of our comparands and without contacting the projects
> in question we couldn't find a store that implements every property
> path feature correctly and some fail entirely."
>
> I'm not really sure how useful that information is...? But I am ready
> to do a benchmarking paper for next year. Seems like it's a lot easier
> than I thought!
>
>
> ajs6f
>
>
> Andy Seaborne wrote on 10/17/17 9:28 AM:
>> Hi Lorenz,
>>
>> Looks like JENA-1195 which is fixed.  Does that look like it?
>>
>> I think it is a shame when papers focus on bugs rather than discussing
>> and even fixing them.  Bugs aren't research.
>>
>> Path evaluation could be improved to stream in more cases (that's why
>> LIMIT didn't help), but 1195 explains the slowness and memory use.
>>
>>     Andy
>>
>> On 17/10/17 07:58, Lorenz B. wrote:
>>> Hi,
>>>
>>> I just walked through the papers for the upcoming ISWC conference and
>>> found a paper about benchmarking of SPARQL property paths [1].
>>>
>>> Not sure if this is relevant, but it looks like Jena has some issues
>>> with different types of queries using the property path. For example,
>>>
>>> SELECT ?o WHERE {A B* ?o.} LIMIT 100
>>>
>>> leads to an OOM error on non-cyclic data. Here is the relevant part of
>>> the paper:
>>>
>>>> While benchmarking Virtuoso, RDF4J and Allegrograph no errors or
>>>> exceptions have occurred. During the benchmark process of Jena an
>>>> OutOfMemoryError has been thrown whenever a query with the * operator
>>>> was used. In order to identify the cause of the error, the amount of
>>>> results the query should return has been limited to 100. The results
>>>> that have been returned by a query of the form SELECT ?o WHERE {A B*
>>>> ?o.} LIMIT 100 where A and B are valid IRIs, consisted of 100 times A.
>>>> Due to this fact it is presumable that the query containing the *
>>>> operator returns A recursively until the main memory was full. To
>>>> ensure that this behaviour is not caused by cycles in the dataset a
>>>> query of the same form but with a predicate IRI that did not exist in
>>>> the dataset was executed. This query still returned 100 times A. This
>>>> indicates, that the * operator is not implemented correctly.
>>> In addition, the experiments showed that:
>>>> Due to the problems with the * operator the queries 4, 7 and 8 could
>>>> not be processed. Additionally query 3, 5, and 6 returned no results
>>>> after 1 hour and thus, were aborted. Query 1 returned an empty and
>>>> thus, incomplete result set. Only for query 2 a valid result was
>>>> returned. Due to the lack of comparable results, Jena has been omitted
>>>> in the comparison of triple stores.
>>>
>>> In the discussion section, they summarize the overall performance of
>>> Jena by
>>>
>>>> Jena could not return results for any query in under 1 hour besides
>>>> query 2. Furthermore, the * operator could not be evaluated at all and
>>>> the inverse operator returned empty result sets.
>>>
>>> It looks like they used version 3.0.1, so maybe this doesn't hold
>>> anymore for all of the queries. If not, it could be interesting to
>>> improve performance and/or completeness.
>>>
>>> I hope I didn't miss some open JIRA ticket, but in general I just
>>> wanted to highlight the presence of a published benchmark for this
>>> kind of query.
>>>
>>>
>>> Cheers,
>>>
>>> Lorenz
>>>
>>> [1] http://ceur-ws.org/Vol-1932/paper-04.pdf
>>>
