[ https://issues.apache.org/jira/browse/JENA-1388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16240123#comment-16240123 ]

Osma Suominen edited comment on JENA-1388 at 11/6/17 10:39 AM:
---------------------------------------------------------------

What [~andy.seaborne] says is correct. The Lucene backend of jena-text creates 
a separate document for each indexed field value, so one triple (or quad, when 
graph-specific indexing is enabled) corresponds to one document in the Lucene 
index. The upside is that this makes it rather simple to synchronize updates 
between the triple store and the Lucene index: for new triples, add documents 
to Lucene; for deleted triples, delete the corresponding documents. The 
downside is that AND queries across multiple fields cannot be supported, 
because no single Lucene document ever contains more than one field. This is a 
pretty fundamental design choice in jena-text, so it cannot simply be fixed 
like a normal bug; it would require reengineering significant parts of the 
jena-text subsystem.
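
To make the consequence concrete, here is a minimal sketch. The Lucene field 
names "label" and "comment" are hypothetical; the real ones come from the 
entity map in the attached config-fields.ttl. Because each indexed triple is 
its own Lucene document, an *OR* across two fields matches two single-field 
documents and returns the same subject twice, while an *AND* across two fields 
can never match:

{code}
PREFIX text: <http://jena.apache.org/text#>

# OR across two fields: both single-field documents match,
# so the same subject is returned twice.
SELECT ?s WHERE {
  ?s text:query "label:alpha OR comment:beta" .
}
{code}

{code}
PREFIX text: <http://jena.apache.org/text#>

# AND across two fields: no single-field document can satisfy both terms,
# so the query yields no rows.
SELECT ?s WHERE {
  ?s text:query "label:alpha AND comment:beta" .
}
{code}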

Note that the recently added Elasticsearch backend for jena-text works 
differently: it consolidates triples with the same subject into a single 
document in the text index. But it has to do a lot of bookkeeping to keep the 
information synchronized. One consequence of this is that updates to the index 
are very slow compared with the Lucene backend (though a major factor in this 
is also that operations are performed via a REST API to the Elasticsearch 
server, whereas the Lucene backend lives in the same JVM). The Elasticsearch 
backend does support AND queries, so you may want to try it instead of using 
the Lucene backend.
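
With the Elasticsearch backend, all indexed fields of a subject end up in one 
document, so the same kind of field-qualified query can match. This is a 
sketch only, with the same hypothetical field names, and it assumes a dataset 
assembled with the Elasticsearch text index (see the jena-text-es 
documentation for the assembler configuration):

{code}
PREFIX text: <http://jena.apache.org/text#>

# Against the Elasticsearch backend, the fields of one subject share a
# single document, so the AND query can return the expected single row.
SELECT ?s WHERE {
  ?s text:query "label:alpha AND comment:beta" .
}
{code}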


> Lucene text search across multiple fields ("AND") yields no results
> -------------------------------------------------------------------
>
>                 Key: JENA-1388
>                 URL: https://issues.apache.org/jira/browse/JENA-1388
>             Project: Apache Jena
>          Issue Type: Bug
>          Components: Text
>    Affects Versions: Jena 3.4.0
>         Environment: CentOS 7.3, OpenJDK 64-Bit, v1.8.0_141-b16
>            Reporter: Vilnis Termanis (Iotic Labs)
>            Assignee: Osma Suominen
>              Labels: index, lucene, search
>         Attachments: config-fields.ttl, multi_field.ttl, multi_index.sparql
>
>
> Searching across two Lucene-indexed text fields produces potentially 
> unexpected results. (The following assumes that the string supplied for each 
> field does match and that both matches are tied to the same uid/subject.)
> # A query across two fields with *OR* produces two equal rows
> # The same query but with *AND* produces no rows



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
