Code Ferret created JENA-1459:
---------------------------------
Summary: add highlighting support to jena-text
Key: JENA-1459
URL: https://issues.apache.org/jira/browse/JENA-1459
Project: Apache Jena
Issue Type: Improvement
Components: Jena, Text
Affects Versions: Jena 3.6.0
Reporter: Code Ferret
Assignee: Code Ferret
This issue proposes an improvement to jena-text to include optional
highlighting of results via:
{{org.apache.lucene.search.highlight.Highlighter}}
and
{{org.apache.lucene.search.highlight.SimpleHTMLFormatter}}
The improvement will add an optional input argument to {{TextQueryPF}} that
signals that highlighting should be performed on the Lucene search results.
The same argument can optionally specify the _start_ and _end_ character
sequences of a highlighted term, the maximum number of fragments to highlight,
and a fragment separator.
The highlighted results are bound to the {{?literal}} output argument of
{{TextQueryPF}}.
The change adds only a simple extraction of the _highlight_ option string and a
single test for its presence, so the impact is minimal when highlighting is not
used. The _highlight_ option string is passed directly to
{{TextIndex.query(...)}} and so can be used from code other than
{{TextQueryPF}}.
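As a rough illustration of the "extract and test for presence" step, the sketch below uses only the JDK. The class {{HighlightArgSketch}} and the method {{extractHighlight}} are hypothetical names invented for this example and are not the jena-text code; the sketch assumes only that the option arrives as a string argument starting with {{highlight:}}, as in the queries in this issue.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical helper for illustration only; not part of jena-text.
final class HighlightArgSketch {
    static final String PREFIX = "highlight:";

    // Returns the text following "highlight:" from the first argument that
    // carries the prefix, or Optional.empty() when highlighting was not requested.
    static Optional<String> extractHighlight(List<String> stringArgs) {
        for (String arg : stringArgs) {
            if (arg.startsWith(PREFIX)) {
                return Optional.of(arg.substring(PREFIX.length()).trim());
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // No highlight argument: a single presence test is all that is paid.
        System.out.println(extractHighlight(List.of("lang:en")).isPresent());
        // Bare "highlight:" requests highlighting with default settings.
        System.out.println(extractHighlight(List.of("lang:en", "highlight:")).isPresent());
    }
}
```

When the result is empty the caller can take the existing non-highlighting path unchanged, which is the minimal-impact property described above.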
The simplest use of highlighting is:
{code}
select ?s ?lit
where {
(?s ?sc ?lit) text:query (skos:prefLabel "one" 100 "lang:en" "highlight:") .
}
{code}
which will produce results such as:
{code}
"another ↦one↤ abc"@en
{code}
The right-arrow (\u21a6) and left-arrow (\u21a4) are the default _start_ and
_end_ highlighting character sequences. They were chosen because they are very
unlikely to occur in literals. They can be changed via {{"s:"}} and {{"e:"}} in
the highlight options, for example:
{code}
select ?s ?lit
where {
(?s ?sc ?lit) text:query (skos:prefLabel "one" 100 "lang:en" "highlight: s:<em class='hilite'> | e:</em>") .
}
{code}
which will produce results such as:
{code}
"another <em class='hilite'>one</em> abc"@en
{code}
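To make the option syntax concrete, here is a minimal JDK-only sketch of how a string such as {{s:<em class='hilite'> | e:</em>}} could be split into start and end sequences, falling back to the arrow defaults. The class {{HighlightOptionsSketch}} and its methods are hypothetical and are not the jena-text implementation. It handles only the {{s:}} and {{e:}} keys shown in this issue; the keys for the fragment limit and fragment separator are not given here, so they are left out, and trimming of values is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical parser for illustration only; not part of jena-text.
final class HighlightOptionsSketch {
    static final String DEFAULT_START = "\u21a6"; // right-arrow default
    static final String DEFAULT_END = "\u21a4";   // left-arrow default

    // Parses options separated by '|' into a map with keys "s" and "e".
    // Unknown keys and malformed entries are ignored; defaults are kept.
    static Map<String, String> parse(String options) {
        Map<String, String> result = new LinkedHashMap<>();
        result.put("s", DEFAULT_START);
        result.put("e", DEFAULT_END);
        for (String part : options.split("\\|")) {
            String opt = part.trim(); // assumption: surrounding spaces are not significant
            int colon = opt.indexOf(':');
            if (colon <= 0) {
                continue; // empty entry or no key before the colon
            }
            String key = opt.substring(0, colon).trim();
            if (result.containsKey(key)) {
                result.put(key, opt.substring(colon + 1));
            }
        }
        return result;
    }

    // Wraps a matched term in the configured start and end sequences.
    static String wrap(String term, Map<String, String> opts) {
        return opts.get("s") + term + opts.get("e");
    }

    public static void main(String[] args) {
        Map<String, String> custom = parse("s:<em class='hilite'> | e:</em>");
        System.out.println("another " + wrap("one", custom) + " abc");
        // prints: another <em class='hilite'>one</em> abc
    }
}
```

Calling {{parse("")}} leaves both defaults in place, which corresponds to the bare {{"highlight:"}} form in the first query.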
Coding of this improvement is complete and a PR can be issued if there is
agreement that this improvement should be included in jena-text.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)