Github user osma commented on the issue:
https://github.com/apache/jena/pull/227
I tested the ES backend with some non-toy SKOS data, namely
[YSO](http://finto.fi/en/yso/). I configured the entity definition to index the
predicates `skos:prefLabel`, `skos:altLabel` and `skos:hiddenLabel`. The
dataset has 520k triples and 29k entities. There are in total 150k triples with
these label properties.
I'm using a rather old laptop (i3-2330M with SSD) for the test. Ubuntu
16.04, ES 5.2.1.
Using the ES backend, indexing this dataset took about 25 minutes:
```
16:42:45 INFO [1] PUT http://localhost:3030/ds/data?default
17:08:06 INFO [1] 204 No Content (1 521,465 s)
```
Looking at process stats, most of the time was spent in ES, which used about
38 minutes of CPU time.
I also indexed the same dataset using the Lucene backend. It took less than
30 seconds:
```
17:11:26 INFO [1] PUT http://localhost:3030/ds/data?default
17:11:55 INFO [1] 204 No Content (28,237 s)
```
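For a rough sense of scale, the two timings above can be compared with a few lines of Python. This is only a sketch: the helper `parse_duration` is a hypothetical name, the figure of 150k label triples is the approximate count mentioned earlier, and the PUT time covers loading the whole dataset, not just text indexing, so the per-triple rates are indicative at best.

```python
# Rough comparison of the two PUT timings reported in the Fuseki logs.
# Assumption: durations are printed in a locale that uses a space as the
# thousands separator and a comma as the decimal mark.

def parse_duration(text: str) -> float:
    """Convert a duration such as '1 521,465 s' to seconds as a float."""
    cleaned = (
        text.replace("s", "")
        .replace("\u00a0", "")  # non-breaking space used as thousands separator
        .replace(" ", "")
        .replace(",", ".")
    )
    return float(cleaned)

# Approximate number of triples with the indexed label properties (from the comment).
LABEL_TRIPLES = 150_000

es_seconds = parse_duration("1 521,465 s")
lucene_seconds = parse_duration("28,237 s")

# How many times longer the ES-backed load took than the Lucene-backed load.
slowdown = es_seconds / lucene_seconds

print(f"ES:     {es_seconds:.1f} s, ~{LABEL_TRIPLES / es_seconds:.0f} label triples/s")
print(f"Lucene: {lucene_seconds:.1f} s, ~{LABEL_TRIPLES / lucene_seconds:.0f} label triples/s")
print(f"ES load took roughly {slowdown:.0f}x as long")
```

On these numbers the ES-backed load is a bit over 50 times slower than the Lucene-backed one.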
Query performance seems to be pretty much the same. In fact, the ES backend
seems slightly faster than the Lucene backend, but there was a lot of variance,
so I can't tell for sure.
I have my doubts about whether the indexing performance is acceptable for
real-world use cases like what @anujgandharv is targeting, but I don't think
this should stop us from merging this contribution. Since there have been no
objections, I will proceed with the merge.