Re: No Improvement In Performance with indexing in Jena Fuseki

2021-01-04 Thread Andy Seaborne

Hi - attachments of configuration and data didn't make it (to me at least).

On 04/01/2021 04:56, Deepali Singhavi wrote:

Hi,

I am trying to set up text indexing for Fuseki with
Lucene/ElasticSearch through an assembler configuration file (attaching
file for reference), but there is no improvement in performance
(performance without the index is better than with the index).


I am using sample data from *films.ttl* file.

*Sample Query *
PREFIX text: >
PREFIX rdfs: >

select ?subject ?object
WHERE {
# Without Index
#?subject rdfs:label ?object .
#FILTER contains(?object,"City")
#With Index
?subject text:query (rdfs:label "city").
?subject rdfs:label ?object .
}

*Performance:*

No of Triples | No of Runs | Without Index | Lucene Index | ElasticSearch Index
6918          | 1          | 16ms          | 18ms         | 19ms
6918          | 2          | 29ms          | 32ms         | 32ms
6918          | 3          | 22ms          | 23ms         | 21ms
6918          | 4          | 22ms          | 14ms         | 53ms
6918          | 5          | 15ms          | 19ms         | 18ms
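
One way to read the five runs reported above is to compare the minimum, median and maximum per configuration. The Python sketch below does only that, using the figures from the message. The names `timings_ms` and `summarize` are illustrative and do not come from the thread.

```python
from statistics import median

# Timings in milliseconds, copied from the message above (6918 triples, runs 1-5).
timings_ms = {
    "Without Index": [16, 29, 22, 22, 15],
    "Lucene Index": [18, 32, 23, 14, 19],
    "ElasticSearch Index": [19, 32, 21, 53, 18],
}


def summarize(samples):
    """Return (min, median, max) for a list of timings."""
    return min(samples), median(samples), max(samples)


summary = {name: summarize(samples) for name, samples in timings_ms.items()}

for name, (fastest, typical, slowest) in summary.items():
    print(f"{name:<20} min={fastest}ms median={typical}ms max={slowest}ms")
```

The medians come out at 22ms, 19ms and 21ms, while single runs range from 14ms to 53ms, so at this data size the run-to-run spread is larger than the gap between the three configurations.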


Please let me know if any other information is required from my side 
and please suggest how I can improve performance.


Regards,
Deepali



Re: No Improvement In Performance with indexing in Jena Fuseki

2021-01-04 Thread Deepali Singhavi
Hi,

By sample size, do you mean the number of triples?

I have tried with 6000,4,5 and even with 1,00,000 triples. Please
find the performance report attached to this email.

Regards,
Deepali

On Mon, Jan 4, 2021 at 1:03 PM Lorenz Buehmann <
buehm...@informatik.uni-leipzig.de> wrote:

> What is the sample size here? For a small number of literals, a String
> containment check in Java isn't that slow. The difference will most
> likely only show up with a large scan over literals with a containment
> check, whereas a Lucene index - which is basically an inverted index -
> can look up the terms for the documents much more efficiently.
>
> On 04.01.21 05:56, Deepali Singhavi wrote:
> > Hi,
> >
> > I am trying to implement indexing for Fuseki using
> > Lucene/ElasticSearch using an assembler configuration file (attaching
> > file for reference) but there is no improvement in performance
> > (performance without index is better than with index).
> >
> > I am using sample data from *films.ttl* file.
> >
> > *Sample Query *
> > PREFIX text: 
> > PREFIX rdfs: 
> > select ?subject ?object
> > WHERE {
> > # Without Index
> > #?subject rdfs:label ?object .
> > #FILTER contains(?object,"City")
> > #With Index
> > ?subject text:query (rdfs:label "city").
> > ?subject rdfs:label ?object .
> > }
> >
> > *Performance:*
> >
> > No of Triples | No of Runs | Without Index | Lucene Index | ElasticSearch Index
> > 6918          | 1          | 16ms          | 18ms         | 19ms
> > 6918          | 2          | 29ms          | 32ms         | 32ms
> > 6918          | 3          | 22ms          | 23ms         | 21ms
> > 6918          | 4          | 22ms          | 14ms         | 53ms
> > 6918          | 5          | 15ms          | 19ms         | 18ms
> >
> >
> > Please let me know if any other information is required from my side
> > and please suggest how I can improve performance.
> >
> > Regards,
> > Deepali
> >
>
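
The contrast drawn in the quoted reply, a scan with a containment check versus a term lookup in an inverted index, can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how Jena or Lucene are implemented: the labels are made up (they are not taken from films.ttl), tokenisation is a plain lower-cased whitespace split, and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical labels standing in for rdfs:label literals (not from films.ttl).
labels = {
    "s1": "Sin City",
    "s2": "City of God",
    "s3": "The Matrix",
    "s4": "Dark City",
}


def scan_contains(labels, needle):
    """Linear scan: test every literal for a case-sensitive substring match."""
    return sorted(s for s, text in labels.items() if needle in text)


def build_inverted_index(labels):
    """Map each lower-cased token to the set of subjects whose label has it."""
    index = defaultdict(set)
    for subject, text in labels.items():
        for token in text.lower().split():
            index[token].add(subject)
    return index


def index_lookup(index, term):
    """Single dictionary lookup; does not touch labels lacking the term."""
    return sorted(index.get(term.lower(), set()))


index = build_inverted_index(labels)
print(scan_contains(labels, "City"))
print(index_lookup(index, "city"))
```

The scan costs time proportional to the number of literals on every query, while the lookup cost depends mainly on the number of matches, so the gap should only become visible as the data grows. The two are also not equivalent: a substring check would match a label such as "Cityscape", whereas a whole-token lookup for "city" would not.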


Index Performance.xlsx
Description: MS-Excel 2007 spreadsheet