Yang Yuanzhe created JENA-953:
---------------------------------

             Summary: Text search does not work in Fuseki with In-memory datasets
                 Key: JENA-953
                 URL: https://issues.apache.org/jira/browse/JENA-953
             Project: Apache Jena
          Issue Type: Bug
          Components: Fuseki, Text
    Affects Versions: Fuseki 2.0.0, Fuseki 1.1.2, Fuseki 2.0.1
         Environment: Ubuntu 14.04 in VM
            Reporter: Yang Yuanzhe


First of all, I apologize for a possible duplicate post. I sent this to the 
mailing list, but it disappeared from the drafts folder and never showed up in 
the sent folder. So I am posting it here before I lose it from my clipboard. :D

Here is the copy of the mail:

Hi Andy,

I am sorry for such a late response. We were busy with another project during 
this period. I will now explain step by step how I reproduce the error.

The problem is that text search indexing does not work for in-memory datasets.

Here is the configuration file I used. It should be basic enough: a server 
description, a service description, and a text index attached to the dataset 
that indexes "rdfs:label".

@prefix :        <#> .
@prefix fuseki:  <http://jena.apache.org/fuseki#> .
@prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs:   <http://www.w3.org/2000/01/rdf-schema#> .
@prefix tdb:     <http://jena.hpl.hp.com/2008/tdb#> .
@prefix ja:      <http://jena.hpl.hp.com/2005/11/Assembler#> .
@prefix text:    <http://jena.apache.org/text#> .
@prefix spatial:    <http://jena.apache.org/spatial#> .

[] a fuseki:Server ;
   fuseki:services (
     <#memory>
   ) .

<#memory> a fuseki:Service ;
    fuseki:name                     "memory" ; 
    fuseki:serviceQuery             "sparql" ;
    fuseki:serviceQuery             "query" ;
    fuseki:serviceUpdate            "update" ;   # SPARQL update service -- /memory/update
    fuseki:serviceUpload            "upload" ;   # Non-SPARQL upload service
    fuseki:serviceReadWriteGraphStore      "data" ;     
    fuseki:serviceReadGraphStore       "get" ;   # Graph store protocol (read only) -- /memory/get
    fuseki:dataset           :text_dataset ;
    .

<#dataset> rdf:type ja:RDFDataset ;
    ja:defaultGraph
          [ 
            a ja:MemoryModel ;
          ] .

# Text
[] ja:loadClass "org.apache.jena.query.text.TextQuery" .
text:TextDataset      rdfs:subClassOf   ja:RDFDataset .
text:TextIndexLucene  rdfs:subClassOf   text:TextIndex .

:text_dataset a text:TextDataset ;
    text:dataset   <#dataset> ;
    text:index     <#textIndexLucene> ;
    .

# Text index description
<#textIndexLucene> a text:TextIndexLucene ;
    text:directory <file:Lucene> ;
    ##text:directory "mem" ;
    text:entityMap <#entMap> ;
    .

<#entMap> a text:EntityMap ;
    text:entityField      "uri" ;
    text:defaultField     "text" ;
    text:map (
         [ text:field "text" ; text:predicate rdfs:label ]
         ) .


The server is started with
"./fuseki-server --config=config-memory-text.ttl"
and the console shows that it starts properly:
[2015-06-03 12:13:09] Server     INFO  Fuseki 2.0.1-SNAPSHOT 2015-05-05T12:48:09+0000
[2015-06-03 12:13:09] Config     INFO  FUSEKI_HOME=/home/yyz/Downloads/apache-jena-fuseki-2.0.1-SNAPSHOT
[2015-06-03 12:13:09] Config     INFO  FUSEKI_BASE=/home/yyz/Downloads/apache-jena-fuseki-2.0.1-SNAPSHOT/run
[2015-06-03 12:13:09] Servlet    INFO  Initializing Shiro environment
[2015-06-03 12:13:09] Config     INFO  Shiro file: file:///home/yyz/Downloads/apache-jena-fuseki-2.0.1-SNAPSHOT/run/shiro.ini
[2015-06-03 12:13:09] Config     INFO  Configuration file: config-memory-text.ttl
[2015-06-03 12:13:10] Builder    INFO  Service: :memory
[2015-06-03 12:13:11] Config     INFO  Register: /memory
[2015-06-03 12:13:11] Server     INFO  Started 2015/06/03 12:13:11 CEST on port 3030

I tested two versions: the official release 2.0.0 and the latest snapshot, 
2.0.1-SNAPSHOT 2015-05-05T12:48:09+0000. The observed behaviour is as follows:

In 2.0.0:
If I load triples that do not contain "rdfs:label", everything works properly, 
but in that case the text index is never exercised. As soon as I add one 
"rdfs:label" triple to the file I am loading into Fuseki, an error occurs:
[2015-06-03 12:10:47] Fuseki     INFO  [7] Filename: licenties.ttl, Content-Type=application/octet-stream, Charset=null => Turtle : Count=40 Triples=40 Quads=0
[2015-06-03 12:10:47] HttpAction WARN  Exception during abort (operation attempts to continue): Can't abort a write lock-transaction
[2015-06-03 12:10:47] Fuseki     INFO  [7] 500 Server Error (523 ms)
I remember that a few months ago, when 2.0.0 was first released, I discovered 
this issue and reported it to you. At that time I did not realize that the 
root cause was the indexing. You fixed it in a later snapshot, but my test was 
not thorough, so I thought the problem was solved and gave you incorrect 
feedback. My sincere apologies.

In 2.0.1 SNAPSHOT:
The latest snapshot contains the patch I mentioned above, so the triples can be 
loaded successfully. However, they are not indexed at all: queries with keyword 
search do not return any results.
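
For illustration, a keyword query of the kind described above can be sketched 
in Python using only the standard library. This is a minimal sketch, not the 
exact query from the original test: the endpoint URL is assumed from the 
"memory" service and "query" endpoint names in the configuration, and the 
helper names (build_text_query, build_request, run_query) are hypothetical. 
The text:query form (property, query string, limit) follows the jena-text 
property function syntax.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: Fuseki runs locally on port 3030 and exposes the "memory"
# service with a "query" endpoint, as in the configuration above.
ENDPOINT = "http://localhost:3030/memory/query"


def build_text_query(keyword: str, limit: int = 10) -> str:
    """Build a SPARQL query that searches the rdfs:label text index."""
    # Escape backslashes and single quotes so the keyword stays inside
    # the single-quoted SPARQL string literal.
    escaped = keyword.replace("\\", "\\\\").replace("'", "\\'")
    return (
        "PREFIX text: <http://jena.apache.org/text#>\n"
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT ?s ?label WHERE {\n"
        f"  ?s text:query (rdfs:label '{escaped}' {limit}) ;\n"
        "     rdfs:label ?label .\n"
        "}"
    )


def build_request(keyword: str) -> Request:
    """Wrap the query in a SPARQL Protocol POST (form-encoded body)."""
    body = urlencode({"query": build_text_query(keyword)}).encode("utf-8")
    return Request(
        ENDPOINT,
        data=body,
        headers={
            "Accept": "application/sparql-results+json",
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )


def run_query(keyword: str) -> list:
    """Send the query and return the result bindings (needs a running server)."""
    with urlopen(build_request(keyword)) as resp:
        return json.load(resp)["results"]["bindings"]
```

With a working index, run_query("licence") (a hypothetical keyword) would be 
expected to return one binding per matching label; the symptom reported here 
corresponds to an empty list even though matching rdfs:label triples were 
loaded.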

Following your advice, I tested loading and querying from both the Web UI and 
the s-post/s-query tools. Unfortunately (or fortunately?), the results are the 
same.
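
The load step can likewise be approximated as a plain Graph Store Protocol 
POST, which helps rule out the client tools as the cause. This is a minimal 
sketch under assumptions: the URL is derived from the "memory" service and its 
read-write "data" endpoint in the configuration, the sample triple and its 
example.org URI are hypothetical test data, and the helper names 
(build_load_request, load) are made up for illustration.

```python
from urllib.request import Request, urlopen

# Assumption: local Fuseki on port 3030; "?default" targets the default
# graph of the "memory" service via its read-write "data" endpoint.
GRAPH_STORE = "http://localhost:3030/memory/data?default"

# Hypothetical test data: a single resource carrying an rdfs:label,
# which is the predicate the text index is configured for.
SAMPLE_TURTLE = """\
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
<http://example.org/licence/1> rdfs:label "Example licence" .
"""


def build_load_request(turtle: str = SAMPLE_TURTLE) -> Request:
    """Build a POST that adds the Turtle payload to the default graph."""
    return Request(
        GRAPH_STORE,
        data=turtle.encode("utf-8"),
        headers={"Content-Type": "text/turtle; charset=utf-8"},
        method="POST",
    )


def load(turtle: str = SAMPLE_TURTLE) -> int:
    """Send the payload and return the HTTP status (needs a running server)."""
    with urlopen(build_load_request(turtle)) as resp:
        return resp.status
```

Calling load() against a running server sends the sample triple. Note that 
urlopen raises urllib.error.HTTPError for error responses, so a 500 like the 
one in the 2.0.0 log above would surface as an exception rather than as a 
returned status.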

TDB:
Meanwhile, I performed a similar experiment on Fuseki with TDB in 2.0.0 and 
2.0.1-SNAPSHOT, and both work properly: loading succeeds and queries return 
search results. The only difference is that in the configuration file the 
in-memory dataset is replaced with TDB:
@prefix :        <#> .
@prefix fuseki:  <http://jena.apache.org/fuseki#> .
@prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
@prefix tdb:     <http://jena.hpl.hp.com/2008/tdb#> .
@prefix ja:      <http://jena.hpl.hp.com/2005/11/Assembler#> .
@prefix text:    <http://jena.apache.org/text#> .

[] rdf:type fuseki:Server ;
   fuseki:services (
     <#service_text_tdb>
   ) .

# TDB
[] ja:loadClass "com.hp.hpl.jena.tdb.TDB" .
tdb:DatasetTDB  rdfs:subClassOf  ja:RDFDataset .
tdb:GraphTDB    rdfs:subClassOf  ja:Model .

# Text
[] ja:loadClass "org.apache.jena.query.text.TextQuery" .
text:TextDataset      rdfs:subClassOf   ja:RDFDataset .
text:TextIndexLucene  rdfs:subClassOf   text:TextIndex .

<#service_text_tdb> a fuseki:Service ;
    rdfs:label                      "TDB/text service" ;
    fuseki:name                     "tdb" ;
    fuseki:serviceQuery             "query" ;
    fuseki:serviceQuery             "sparql" ;
    fuseki:serviceUpdate            "update" ;
    fuseki:serviceUpload            "upload" ;
    fuseki:serviceReadGraphStore    "get" ;
    fuseki:serviceReadWriteGraphStore    "data" ;
    fuseki:dataset                  <#text_dataset> ;
    .

<#text_dataset> a text:TextDataset ;
    text:dataset   <#dataset> ;
    text:index     <#indexLucene> ;
    .

<#dataset> a tdb:DatasetTDB ;
    tdb:location "DB" ;
    ##tdb:unionDefaultGraph true ;
    .

<#indexLucene> a text:TextIndexLucene ;
    text:directory <file:Lucene> ;
    ##text:directory "mem" ;
    text:entityMap <#entMap> ;
    .

<#entMap> a text:EntityMap ;
    text:entityField      "uri" ;
    text:defaultField     "text" ;      
    text:map (          
         [ text:field "text" ; text:predicate rdfs:label ]
         ) .

Do you have any advice? Thank you very much in advance for your efforts.

Regards,
Yang

PS: I noticed that there is a SNAPSHOT for 2.3.0. I planned to test it as 
well, but I was not able to run it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
