Use copyField with wildcard in source; how then to work out where a value came from?

2019-10-31 Thread Richard Walker
I've got a collection for which the schema has
a number of copyFields that have a wildcard in the source:

  

The idea is that each document has fields
containing language-specific values, and the names
of these fields end in a language tag,
e.g., "skos_prefLabel-en", "skos_prefLabel-de",
"skos_prefLabel-fr", etc.
Let's say for this example that we have a Solr document
with:
{ ...,
  "skos_prefLabel-en": "One",
  "skos_prefLabel-de": "Eins",
  "skos_prefLabel-fr": "Un",
  ...
}

[ Let's leave aside the issue of what the field
type for "skos_prefLabel_all" should be; let's assume I'm
happy for it to be (say) "text_en_splitting" and
(for now) I'll live with the fact that this is wrong. ]

The idea is to be able to do searching and highlighting
on one or more specific languages, and _also_ to
be able to do a language-independent search, or,
if you like, to search for values in all languages
in one go. I want to display details of matches
and highlighting _with their language information_.

The problem: suppose I get a match and some
highlighting against the field skos_prefLabel_all.
How do I know which field(s) the data _came_ from?

My guess: when using a copyField in this way
(i.e., with a wildcard in the source),
it's not (in general) possible to work backwards from the
destination field to find out which source field
the content came from.

If that is so, one way to get what I want would
seem to be _not_ to use a copyField, but to
construct the Solr documents such that they
already contain a value for skos_prefLabel_all,
let's say, ["One", "Eins", "Un"],
and a value for another field, (say) skos_prefLabel_all_languages,
which in this case would be ["en", "de", "fr"],
i.e., such that there's a one-to-one correspondence
between the values of skos_prefLabel_all and the
corresponding values of skos_prefLabel_all_languages.
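A minimal Python sketch of that indexing-side alternative, assuming documents are built as plain dicts before being sent to Solr. The function names add_all_language_fields and languages_of are hypothetical; only the field names come from the example above. It also assumes Solr returns the values of a stored multiValued field in the order they were indexed, which is what keeps the two lists aligned.

```python
PREFIX = "skos_prefLabel-"  # language-specific fields are PREFIX + language tag


def add_all_language_fields(doc):
    """Return a copy of doc with parallel skos_prefLabel_all and
    skos_prefLabel_all_languages lists (hypothetical helper)."""
    values, langs = [], []
    for name, value in doc.items():
        if not name.startswith(PREFIX):
            continue
        lang = name[len(PREFIX):]
        # A language-specific field might itself be multiValued: flatten it,
        # repeating the tag so the i-th language always describes the i-th label.
        for v in value if isinstance(value, list) else [value]:
            values.append(v)
            langs.append(lang)
    out = dict(doc)
    out["skos_prefLabel_all"] = values
    out["skos_prefLabel_all_languages"] = langs
    return out


def languages_of(doc, matched_value):
    """Language tag(s) of the skos_prefLabel_all entries equal to matched_value."""
    pairs = zip(doc["skos_prefLabel_all"], doc["skos_prefLabel_all_languages"])
    return [lang for v, lang in pairs if v == matched_value]


doc = {"id": "1", "skos_prefLabel-en": "One",
       "skos_prefLabel-de": "Eins", "skos_prefLabel-fr": "Un"}
enriched = add_all_language_fields(doc)
print(enriched["skos_prefLabel_all"])            # ['One', 'Eins', 'Un']
print(enriched["skos_prefLabel_all_languages"])  # ['en', 'de', 'fr']
print(languages_of(enriched, "Eins"))            # ['de']
```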

Now I can display results with corresponding
language tags. Dealing with highlighting data
would still currently seem to be problematic,
but would be possible with something like
David Smiley's work at
https://issues.apache.org/jira/browse/SOLR-1954 .

Surely I'm missing something here.
Is there another/better way?

Richard.



Unified highlighter on result of query with two required terms which matched separate fields

2019-07-25 Thread Richard Walker
Hi, I'm trying to understand what's going on with
the combination of:

* Solr 8.1.1
* edismax parser
* qf with multiple fields specified (each of which has type
  text_en_splitting, some of which are multiValued)
* unified highlight method
* query with two terms
* results where the two terms match against _separate_ fields

when I make both of the two query terms _required_.

(Sample values for the query parameters to start with:
"q":"scope national"
"fl":"id,last_updated,slug,status,title,acronym,publisher,description,widgetable,sissvoc_endpoint,owner,[explain]"
"defType":"edismax"
"qf":"title_search^1 subject_search^0.5 description^0.01 concept_search^0.5 publisher_search^0.5"
"hl":"on"
"hl.fl":"*"
"hl.method":"unified"
"hl.snippets":"10"
)

So far, so good: results are correct, and highlighting is correct.
In particular, for a result in which there is a match
for "scope" in one field (concept_search) and for "national"
in another (publisher_search), I get a highlighting result for
"scope" in concept_search and for "national" in publisher_search.
(I also get a highlight for another field concept_phrase which
has the same content as concept_search but with string type.)

All good so far.

But now if I change the query from

"q":"scope national"

to

"q":"+scope +national"

my results still (correctly) include the result in which there
was a match for "scope" in one field (concept_search) and for "national"
in another (publisher_search), but now there are no _highlights_
for that result!

What is even more counterintuitive is that if I now also set
"hl.requireFieldMatch":"true"
the highlights for the concept_search and publisher_search fields
(but not the concept_phrase field) come back!
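To make the three cases easy to reproduce side by side, here is a Python sketch that builds the corresponding GET requests. The host and collection name are hypothetical, and fl is left out. One thing worth ruling out when testing over HTTP: in a query string a literal + decodes to a space, so the required-terms variant has to be sent with + encoded as %2B (urllib.parse.urlencode does this); otherwise Solr actually receives the first, optional-terms query.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: adjust host and collection name.
SELECT_URL = "http://localhost:8983/solr/my_collection/select"

# Parameters from the message ("fl" omitted for brevity).
BASE_PARAMS = {
    "defType": "edismax",
    "qf": "title_search^1 subject_search^0.5 description^0.01 "
          "concept_search^0.5 publisher_search^0.5",
    "hl": "on",
    "hl.fl": "*",
    "hl.method": "unified",
    "hl.snippets": "10",
}


def build_url(q, extra=None):
    """Return a GET URL for query q plus any extra request parameters."""
    params = dict(BASE_PARAMS, q=q)
    params.update(extra or {})
    # urlencode percent-encodes "+" as %2B, so "+scope" reaches Solr intact;
    # a raw "+" in a query string would be decoded as a space.
    return SELECT_URL + "?" + urlencode(params)


optional_terms = build_url("scope national")
required_terms = build_url("+scope +national")
required_with_field_match = build_url(
    "+scope +national", {"hl.requireFieldMatch": "true"})

for url in (optional_terms, required_terms, required_with_field_match):
    print(url)
```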

Richard.



Re: Unified highlighter with storeOffsetsWithPositions and termVectors giving an exception

2019-07-21 Thread Richard Walker
On 22 Jul 2019, at 11:32 am, Richard Walker  wrote:
> I'm trying out the advice in the user guide
> ( https://lucene.apache.org/solr/guide/8_1/highlighting.html#schema-options-and-performance-considerations )
> for using the unified highlighter.
> 
> ...
> * "set storeOffsetsWithPositions to true"
> * "set termVectors to true but no other term vector
>  related options on the field being highlighted"
...

I completely forgot to mention that I also tried _just_:

> * "set storeOffsetsWithPositions to true"

i.e., without _also_ setting termVectors, and this _doesn't_
give the exception.

So it seems to be the _combination_ of:
* unified highlighter
* storeOffsetsWithPositions
* termVectors

that gives the exception.



Unified highlighter with storeOffsetsWithPositions and termVectors giving an exception

2019-07-21 Thread Richard Walker
I'm trying out the advice in the user guide
( https://lucene.apache.org/solr/guide/8_1/highlighting.html#schema-options-and-performance-considerations )
for using the unified highlighter.

I saw the note:
"This is definitely the fastest option for highlighting
wildcard queries on large text fields."

and decided to try this, namely:

* "set storeOffsetsWithPositions to true"
* "set termVectors to true but no other term vector
  related options on the field being highlighted"
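For reference, if the schema is managed, the advice above can be expressed as a Schema API replace-field command (POSTed as JSON to /solr/<collection>/schema). The Python sketch below only builds that JSON body; the field name my_text_field, its type text_en_splitting and its other properties are hypothetical. Note that replace-field replaces the whole field definition, so any existing properties must be repeated, and documents have to be reindexed for either option to take effect.

```python
import json


def replace_field_body(name, field_type, **props):
    """Build a Schema API "replace-field" body enabling the two
    unified-highlighter options recommended in the user guide."""
    definition = {"name": name, "type": field_type, **props}
    definition["storeOffsetsWithPositions"] = True
    # termVectors only: no termPositions, termOffsets or termPayloads.
    definition["termVectors"] = True
    return json.dumps({"replace-field": definition}, indent=2)


# Hypothetical field; repeat whatever other properties it already has.
body = replace_field_body("my_text_field", "text_en_splitting",
                          indexed=True, stored=True, multiValued=True)
print(body)
```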

I've set these options on two fields, but I now get an
exception during highlighting of the results of a phrase query.
(I'm not even testing with wildcards yet.)

Here's an extract of the schema before making the change:

  
  
  
  
  
  

And here are the only two lines I changed:

  
  

Here's a sample minimal query that worked perfectly before making the change:

defType=edismax
q="space administration"
fl=id,title
qf=fulltext concept_search
hl=true
hl.method=unified
hl.fl=*

After making the change to the schema, I now get this exception in the Solr log:

o.a.s.s.HttpSolrCall null:java.lang.IllegalStateException: field "fulltext" was indexed without position data; cannot run PhraseQuery (phrase=fulltext:"space administr")
at org.apache.lucene.search.PhraseQuery$1.getPhraseMatcher(PhraseQuery.java:446)
at org.apache.lucene.search.PhraseWeight.lambda$matches$0(PhraseWeight.java:89)
at org.apache.lucene.search.MatchesUtils.forField(MatchesUtils.java:101)
at org.apache.lucene.search.PhraseWeight.matches(PhraseWeight.java:88)
at org.apache.lucene.search.DisjunctionMaxQuery$DisjunctionMaxWeight.matches(DisjunctionMaxQuery.java:125)
at org.apache.lucene.search.uhighlight.FieldOffsetStrategy.createOffsetsEnumsWeightMatcher(FieldOffsetStrategy.java:138)
at org.apache.lucene.search.uhighlight.FieldOffsetStrategy.createOffsetsEnumFromReader(FieldOffsetStrategy.java:74)
at org.apache.lucene.search.uhighlight.TermVectorOffsetStrategy.getOffsetsEnum(TermVectorOffsetStrategy.java:49)
at org.apache.lucene.search.uhighlight.FieldHighlighter.highlightFieldForDoc(FieldHighlighter.java:76)
at org.apache.lucene.search.uhighlight.UnifiedHighlighter.highlightFieldsAsObjects(UnifiedHighlighter.java:639)
at org.apache.lucene.search.uhighlight.UnifiedHighlighter.highlightFields(UnifiedHighlighter.java:508)
at org.apache.solr.highlight.UnifiedSolrHighlighter.doHighlighting(UnifiedSolrHighlighter.java:149)
at org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:171)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:298)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:199)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:2566)
etc.

The response includes search results, but no highlighting information.

Of interest is that the exception is against the field "fulltext",
whose definition I _didn't_ change.

If I remove the "fulltext" field from qf, so that the query is now this:

defType=edismax
q="space administration"
fl=id,title
qf=concept_search
hl=true
hl.method=unified
hl.fl=*

the log now has this exception:

o.a.s.s.HttpSolrCall null:java.lang.IllegalStateException: field "concept_search" was indexed without position data; cannot run PhraseQuery (phrase=concept_search:"space administr")
at org.apache.lucene.search.PhraseQuery$1.getPhraseMatcher(PhraseQuery.java:446)
at org.apache.lucene.search.PhraseWeight.lambda$matches$0(PhraseWeight.java:89)
at org.apache.lucene.search.MatchesUtils.forField(MatchesUtils.java:101)
at org.apache.lucene.search.PhraseWeight.matches(PhraseWeight.java:88)
at org.apache.lucene.search.uhighlight.FieldOffsetStrategy.createOffsetsEnumsWeightMatcher(FieldOffsetStrategy.java:138)
at org.apache.lucene.search.uhighlight.FieldOffsetStrategy.createOffsetsEnumFromReader(FieldOffsetStrategy.java:74)
at org.apache.lucene.search.uhighlight.TermVectorOffsetStrategy.getOffsetsEnum(TermVectorOffsetStrategy.java:49)
at org.apache.lucene.search.uhighlight.FieldHighlighter.highlightFieldForDoc(FieldHighlighter.java:76)
at org.apache.lucene.search.uhighlight.UnifiedHighlighter.highlightFieldsAsObjects(UnifiedHighlighter.java:639)
at org.apache.lucene.search.uhighlight.UnifiedHighlighter.highlightFields(UnifiedHighlighter.java:508)
at org.apache.solr.highlight.UnifiedSolrHighlighter.doHighlighting(UnifiedSolrHighlighter.java:149)
at org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:171)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:298)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:199)
...

Re: Upload/use a plugin JAR in ZooKeeper

2019-07-18 Thread Richard Walker
On 19 Jul 2019, at 12:02 pm, Chee Yee Lim  wrote:
> Not sure if this is the recommended way, but I managed to use plugin JARs
> with Solr Cloud.
> 
> Either include the absolute path to JAR in solrconfig.xml, or put the JAR
> in a "lib" folder relative to your instanceDir. See the following text from
> solrconfig.xml.

As I already noted in my original message of 16 July:

> I've been able to get this to work the "simple" way,
> by putting the JAR in the file system, and specifying
> basic
> 
>  
>  
> 
> values in solrconfig.xml. No problem doing it this way.

... and that this is precisely what I do _not_ want to do,
unless I have to.

I want to use a JAR file uploaded to the collection's znode,
as the user guide strongly suggests is possible.
(And also again, no, I don't want to configure/use the Blob Store.)



Re: Upload/use a plugin JAR in ZooKeeper

2019-07-18 Thread Richard Walker
On 16 Jul 2019, at 4:14 pm, Richard Walker  wrote:
> ...
> 
> To be specific, I'm trying to use this idea:
> 
> "Resources and plugins may be stored:
> • in ZooKeeper under a collection’s configset node (SolrCloud only);"
> 
> ...
> 
> So far, so good. But now how do I refer to the JAR in solrconfig.xml?
> The user guide doesn't really say.
> 
> ...
> 
> No success at all; I only get a ClassNotFoundException
> for the plugin class.
> 
> ...

I've now found this earlier thread:

http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201701.mbox/%3ccakhkodqv-y59+7m86ogvf1feqj6ieiogp8trhl1mg5fuajl...@mail.gmail.com%3e

in which the second message (from Shawn Heisey) says:

> I actually do not know what the path for lib directives is relative to
> when running SolrCloud.  Most things in a core config are relative to
> the location of the config file itself, but in this case, the config
> file is not on the filesystem at all, it's in zookeeper, and I don't
> think Solr can use jars in zookeeper.  

So is this the definitive answer? As I suggested in my
earlier message, the documentation in the user guide at
https://lucene.apache.org/solr/guide/8_1/resource-and-plugin-loading.html
strongly suggests that you _can_ use plugin JARs uploaded
to a collection's znode.

Richard.



Upload/use a plugin JAR in ZooKeeper

2019-07-16 Thread Richard Walker
Hi, I'm trying to use a plugin JAR containing
a custom query parser.

I've been able to get this to work the "simple" way,
by putting the JAR in the file system, and specifying
basic

  
  

values in solrconfig.xml. No problem doing it this way.

But I'm running in SolrCloud mode and I'd like to take
advantage of an option that the user guide seems to offer
at this page:

https://lucene.apache.org/solr/guide/8_1/resource-and-plugin-loading.html

But, so far, I don't see how to make it work.

To be specific, I'm trying to use this idea:

"Resources and plugins may be stored:
• in ZooKeeper under a collection’s configset node (SolrCloud only);"

Note: I'm _not_ trying to use the _third_ option listed, i.e.,
"• in Solr’s Blob Store (SolrCloud only)", which uses
the ".system" collection.

The user guide seems to suggest that I can upload the JAR
to the collection's config using zk cp:
"To upload a plugin or resource to a configset
already stored on ZooKeeper, you can use bin/solr zk cp."

So, I've used zk cp to upload the JAR to
zk:/configs/my_collection/my_plugin.jar
(I also tried various other subdirectories such as
zk:/configs/my_collection/lib/my_plugin.jar)

So far, so good. But now how do I refer to the JAR in solrconfig.xml?
The user guide doesn't really say.

I've tried specifying the location of the JAR
with various values of  element.

No success at all; I only get a ClassNotFoundException
for the plugin class.

Could someone please tell me what I'm missing, i.e., what
I need to do to use a plugin JAR stored
"in ZooKeeper under a collection’s configset node"?

Richard.