I've got a collection for which the schema has
a number of copyFields that have a wildcard in the source:

  <copyField source="skos_prefLabel-*" dest="skos_prefLabel_all"/>

The idea is that each document holds language-specific
values in fields whose names end in a language tag,
e.g., "skos_prefLabel-en", "skos_prefLabel-de",
"skos_prefLabel-fr", etc.
Let's say for this example that we have a Solr document
with:
{ ...,
  "skos_prefLabel-en": "One",
  "skos_prefLabel-de": "Eins",
  "skos_prefLabel-fr": "Un",
  ...
}

[ Let's leave aside the issue of what the field
type for "skos_prefLabel_all" should be; let's assume I'm
happy for it to be (say) "text_en_splitting" and
(for now) I'll live with the fact that this is wrong. ]

The idea is to be able to do searching and highlighting
on one or more specific languages, and _also_ to
be able to do a language-independent search, or,
if you like, to search for values in all languages
in one go. I want to display details of matches
and highlighting _with their language information_.

The problem: suppose I get a match and some
highlighting against the field skos_prefLabel_all.
How do I know which field(s) the data _came_ from?

My guess: when using a copyField in this way
(i.e., with a wildcard in the source),
it's not (in general) possible to work backwards from the
destination field to work out which source field
the content came from.

If that is so, one way to get what I want would
seem to be to _not_ use a copyField, but to
construct the Solr documents such that they
already contain a value for skos_prefLabel_all,
let's say, ["One", "Eins", "Un"],
and (let's say) for another field skos_prefLabel_all_languages,
that would then in this case have the value ["en", "de", "fr"],
i.e., such that there's a one-to-one match
between the values of skos_prefLabel_all and the
corresponding values of skos_prefLabel_all_languages.
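
To make that concrete, here is a minimal sketch (in Python) of
building the two parallel fields on the client side before the
document is sent to Solr. The helper name add_parallel_label_fields
and the "id" value are made up for illustration. It also assumes that
Solr returns the values of a stored multi-valued field in the order
they were indexed, which the one-to-one pairing depends on.

```python
# Hypothetical client-side helper; not part of Solr or any Solr client.
PREFIX = "skos_prefLabel-"


def add_parallel_label_fields(doc):
    """Return a copy of doc with skos_prefLabel_all and
    skos_prefLabel_all_languages added as index-aligned lists."""
    values, languages = [], []
    for name, value in doc.items():
        if not name.startswith(PREFIX):
            continue
        lang = name[len(PREFIX):]  # e.g. "en" from "skos_prefLabel-en"
        # A source field may itself be multi-valued.
        for v in (value if isinstance(value, list) else [value]):
            values.append(v)
            languages.append(lang)
    out = dict(doc)
    out["skos_prefLabel_all"] = values
    out["skos_prefLabel_all_languages"] = languages
    return out


doc = add_parallel_label_fields({
    "id": "example-1",  # hypothetical id
    "skos_prefLabel-en": "One",
    "skos_prefLabel-de": "Eins",
    "skos_prefLabel-fr": "Un",
})

# Pair each value with its language tag for display.
labels_with_language = list(
    zip(doc["skos_prefLabel_all_languages"], doc["skos_prefLabel_all"])
)
```

With this in place the copyField for skos_prefLabel_all would be
dropped from the schema, since the client now fills in that field itself.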

Now I can display results with corresponding
language tags. Dealing with highlighting data
would still currently seem to be problematic,
but would be possible with something like
David Smiley's work at
https://issues.apache.org/jira/browse/SOLR-1954 .

Surely I'm missing something here.
Is there another/better way?

Richard.
