vincent.ventres...@ens-lyon.fr

Index: trunk/content/documentation/query/text-query.mdtext
===================================================================
--- trunk/content/documentation/query/text-query.mdtext (revision 1851871)
+++ trunk/content/documentation/query/text-query.mdtext (working copy)
@@ -609,21 +609,47 @@
 index field. More complex setups, with multiple properties per entity (URI)
 are possible.
 
+The assembler file can be either the default configuration file (.../run/config.ttl)
+or a custom file in the ...run/configuration folder. Note that you can use several files
+simultaneously.
+
+You have to edit the file (see comments in the assembler code below):
+
+1. provide values for paths and a fixed URI for tdb:DatasetTDB
+2. modify the entity map: add the fields you want to index and desired options (filters, tokenizers...)
+
+If your assembler file is run/config.ttl, you can index the dataset with this command:
+
+java -cp ./fuseki-server.jar jena.textindexer --desc=run/config.ttl
+
 Once configured, any data added to the text dataset is automatically
-indexed as well.
+indexed as well: https://jena.apache.org/documentation/query/text-query.html#building-a-text-index
 
+When you change the jena-text configuration in significant ways, such as changing what analyzer
+is used for a given property and so on, then you’ll need to rebuild the Lucene index
+by reloading the dataset or using the textIndexer.
+
 ### Text Dataset Assembler
 
 The following is an example of a TDB dataset with a text index.
 
+    ######## Example of a TDB dataset and text index#########################
+    # The main doc sources are:
+    #  - https://jena.apache.org/documentation/fuseki2/fuseki-configuration.html
+    #  - https://jena.apache.org/documentation/assembler/assembler-howto.html
+    #  - https://jena.apache.org/documentation/assembler/assembler.ttl
+    # See https://jena.apache.org/documentation/fuseki2/fuseki-layout.html for the destination of this file.
+    #########################################################################
+
     @prefix :        <http://localhost/jena_example/#> .
     @prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
     @prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
     @prefix tdb:     <http://jena.hpl.hp.com/2008/tdb#> .
     @prefix ja:      <http://jena.hpl.hp.com/2005/11/Assembler#> .
     @prefix text:    <http://jena.apache.org/text#> .
+    @prefix skos:    <http://www.w3.org/2004/02/skos/core#> .
+    @prefix fuseki:  <http://jena.apache.org/fuseki#> .
 
-    ## Example of a TDB dataset and text index
     ## Initialize TDB
     [] ja:loadClass "org.apache.jena.tdb.TDB" .
     tdb:DatasetTDB rdfs:subClassOf ja:RDFDataset .
@@ -631,39 +657,64 @@
 
     ## Initialize text query
     [] ja:loadClass "org.apache.jena.query.text.TextQuery" .
+
     # A TextDataset is a regular dataset with a text index.
     text:TextDataset rdfs:subClassOf ja:RDFDataset .
+
     # Lucene index
     text:TextIndexLucene rdfs:subClassOf text:TextIndex .
-    # Elasticsearch index
-    text:TextIndexES rdfs:subClassOf text:TextIndex .
 
+
     ## ---------------------------------------------------------------
-    ## This URI must be fixed - it's used to assemble the text dataset.
 
     :text_dataset rdf:type text:TextDataset ;
-        text:dataset <#dataset> ;
+        text:dataset :my_dataset ; # <-- replace `:my_dataset` with the desired URI
         text:index <#indexLucene> ;
-        .
+    .
 
     # A TDB dataset used for RDF storage
-    <#dataset> rdf:type tdb:DatasetTDB ;
-        tdb:location "DB" ;
-        tdb:unionDefaultGraph true ; # Optional
-        .
 
-    # Text index description
+    :my_dataset rdf:type tdb:DatasetTDB ; # <-- replace `:my_dataset` with the desired URI
+        tdb:location "/tmp/tdb-dataset/" ; # <-- replace `/tmp/tdb-dataset/` with your path (`.../fuseki/run/databases/MY_DATASET`)
+        # tdb:unionDefaultGraph true ; # Optional
+    .
+
+    # Text index description (see documentation for other options)
+
     <#indexLucene> a text:TextIndexLucene ;
-        text:directory <file:/some/path/lucene-index> ;
+        text:directory <file:/tmp/tdb-lucene-index> ; # <-- replace `<file:/tmp/tdb-lucene-index>` with your path (`<file:/.../fuseki/run/databases/MY_INDEX>`)
         text:entityMap <#entMap> ;
-        text:storeValues true ;
+        text:storeValues true ;
         text:analyzer [ a text:StandardAnalyzer ] ;
         text:queryAnalyzer [ a text:KeywordAnalyzer ] ;
         text:queryParser text:AnalyzingQueryParser ;
-        text:defineAnalyzers [ . . . ] ;
         text:multilingualSupport true ;
-        .
+    .
 
+    # Entity map (see documentation for other options)
+
+    <#entMap> a text:EntityMap ;
+        text:defaultField "label" ; # <-- modify this value if needed
+        text:entityField "uri" ;
+        text:uidField "uid" ;
+        text:langField "lang" ;
+        text:graphField "graph" ;
+        text:map (
+            [ text:field "label" ; # <-- modify this value if needed
+              text:predicate skos:prefLabel ] # <-- provide the predicates you want to index
+        ) .
+
+    # Fuseki service (see documentation for other options)
+
+
+    <#service> rdf:type fuseki:Service ;
+        fuseki:name "/ds" ; # e.g.: `s-query --service=http://localhost:3030/ds "select * where {?s ?p ?o} limit 5"`
+        fuseki:serviceQuery "query" ; # SPARQL query service
+        fuseki:serviceReadGraphStore "data" ; # SPARQL Graph Store Protocol (WARNING: read-only dataset)
+        fuseki:dataset :text_dataset ;
+    .
+
+
 The `text:TextDataset` has two properties:
 
 - a `text:dataset`, e.g., a `tdb:DatasetTDB`, to contain