GitHub user osma commented on the issue:

https://github.com/apache/jena/pull/227

I tested the ES backend with some non-toy SKOS data, namely [YSO](http://finto.fi/en/yso/). I configured the entity definition to index the predicates `skos:prefLabel`, `skos:altLabel` and `skos:hiddenLabel`. The dataset has 520k triples and 29k entities, and 150k of those triples use one of these label properties. I ran the test on a rather old laptop (i3-2330M with SSD) with Ubuntu 16.04 and ES 5.2.1.

Using the ES backend, indexing this dataset took about 25 minutes:

```
16:42:45 INFO [1] PUT http://localhost:3030/ds/data?default
17:08:06 INFO [1] 204 No Content (1 521,465 s)
```

Looking at process stats, most of the time was spent in the ES process, which used about 38 minutes of CPU time.

I also indexed the same dataset using the Lucene backend. It took less than 30 seconds:

```
17:11:26 INFO [1] PUT http://localhost:3030/ds/data?default
17:11:55 INFO [1] 204 No Content (28,237 s)
```

Query performance seems to be pretty much the same. In fact, the ES backend seems slightly faster than the Lucene backend, but there was a lot of variance, so I can't tell for sure.

I have my doubts about whether the indexing performance is acceptable for real-world use cases like the ones @anujgandharv is targeting, but I don't think this should stop us from merging this contribution. Since there have been no objections, I will proceed with the merge.
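The durations in the two Fuseki log excerpts use a locale with a space as the thousands separator and a comma as the decimal mark (`1 521,465 s`, `28,237 s`). As a rough sanity check of the "about 25 minutes vs. under 30 seconds" comparison, the Python sketch below parses those strings and computes the ratio. The helper name `parse_seconds` is hypothetical, and the number format is assumed from these two log lines only, not from any documented Fuseki log format.

```python
import re

# Duration strings copied from the two "204 No Content" log lines above.
# Assumed format: space (or non-breaking space) as thousands separator,
# comma as decimal mark, trailing " s".
ES_DURATION = "1 521,465 s"
LUCENE_DURATION = "28,237 s"


def parse_seconds(text: str) -> float:
    """Hypothetical helper: convert a duration like '1 521,465 s' to seconds."""
    match = re.fullmatch(r"\s*([\d\s\u00a0]+(?:,\d+)?)\s*s\s*", text)
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    # Drop the thousands separators, then turn the decimal comma into a dot.
    number = re.sub(r"[\s\u00a0]", "", match.group(1)).replace(",", ".")
    return float(number)


es_s = parse_seconds(ES_DURATION)
lucene_s = parse_seconds(LUCENE_DURATION)

print(f"ES backend:     {es_s:.3f} s ({es_s / 60:.1f} min)")
print(f"Lucene backend: {lucene_s:.3f} s")
print(f"Lucene indexing was roughly {es_s / lucene_s:.0f}x faster")
```

With these inputs the ES run comes to about 25.4 minutes, roughly 54 times as long as the Lucene run. This only compares wall-clock time for the whole PUT request; it says nothing about where inside ES the time went.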