On 13.08.2021 at 00:12, Tamara Marnell wrote:
Short question: Is it possible to write an XQuery FLWOR statement that
can return a set of unique values present across multiple databases?
Long question: Our new website in development displays EAD finding
aids stored across 45 databases in BaseX. I've built "facet" databases
that index terms in the EADs from controlled vocabularies like
subjects, places, personal names, etc. The indexes follow this
structure, where each EAD node contains a unique identifier:
<terms type="subject">
  <term text="Literature" db="1">
    <ead>12345</ead>
    <ead>67890</ead>
  </term>
  <term text="Poetry" db="1">
    <ead>abcde</ead>
  </term>
  {etc.}
</terms>
In the search interface, users can select multiple facets to apply to
one search. For example, they could browse database 12 for EADs with
the subject "Literature" /and/ the place "Oregon," etc.
I currently use the REST server to run an XQuery file that loops
through each selected facet and prints /all/ EAD IDs for each
submitted term and database. Then after results are returned, I use
PHP to count occurrences of each EAD and print them only if the total
count matches the count of facets used.
declare variable $d as xs:string external;
declare variable $f as xs:string external;
let $db_ids := tokenize($d, '\|')
return <facets>{
  for $facet in tokenize($f, '\|')
  let $split := tokenize($facet, ':')
  let $facet_type := $split[1]
  let $facet_term := $split[2]
  let $facet_db := 'facet-' || $facet_type
  return <facet type="{$facet_type}" term="{$facet_term}">{
    for $ead in db:open($facet_db)/terms/term[@text=$facet_term and @db=$db_ids]/ead
    return $ead
  }</facet>
}</facets>
So in the hypothetical example above, I'd pass "12" as d (or multiple
selected databases separated by bars) and
"subject:Literature|geogname:Oregon" as f, and I'd get back a document
like:
<facets>
  <facet type="subject" term="Literature">
    <ead>12345</ead>
    <ead>67890</ead>
  </facet>
  <facet type="geogname" term="Oregon">
    <ead>12345</ead>
  </facet>
</facets>
The count of "12345" will equal the count of the user's selected
facets, so that result will be printed, but 67890 will not.
Is there a more efficient way to do this? I'd prefer the XQuery to
return only the EADs that meet all criteria, so only 12345 would be
returned because it's in facet-subject under Literature /and/ in
facet-geogname under "Oregon," and then I don't have to do any
post-processing.
I think you can use fold-left to reduce the found EADs while selecting them:

let $db_ids := tokenize($d, '\|')
return
  <facets>{
    let $facet-maps :=
      fold-left(
        (: one map per selected facet, keyed by EAD ID :)
        for $facet in tokenize($f, '\|')
        let $split := tokenize($facet, ':')
        let $facet_type := $split[1]
        let $facet_term := $split[2]
        let $facet_db := 'facet-' || $facet_type
        return
          map:merge(
            for $ead in
              db:open($facet_db)/terms/term[@text=$facet_term and @db=$db_ids]/ead
            return map:entry(string($ead), map { 'node' : $ead,
              'type' : $facet_type, 'term' : $facet_term }),
            map { 'duplicates' : 'combine' }
          ),
        (),
        function($ams, $m) {
          (: restrict the new map to the IDs that survived all earlier facets,
             otherwise the last facet would be returned unfiltered :)
          let $cur :=
            if (empty($ams)) then $m
            else map:remove($m, map:keys($m)[not(. = map:keys($ams[last()]))])
          return (
            (: restrict the earlier maps to the IDs also present in the new map :)
            for $m1 in $ams
            return map:remove($m1, map:keys($m1)[not(. = map:keys($cur))]),
            $cur
          )
        }
      )
    return
      for $m in $facet-maps[exists(map:keys(.))]
      let $ead1 := $m?*[1]
      return
        <facet type="{$ead1?type}" term="{$ead1?term}">
        {
          $m?*?node
        }
        </facet>
  }</facets>
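
If only the EAD IDs that match every selected facet are needed, and not the per-facet grouping, the intersection could also be computed on the ID values alone. The following is a minimal sketch under that assumption; it has not been run against the databases described in the question. The wrapper element <eads> and the variable names $id-sets and $common are hypothetical, chosen here for illustration. Everything else ($d, $f, the 'facet-' database naming, the terms/term/ead structure) is taken from the original query.

```xquery
declare variable $d as xs:string external;
declare variable $f as xs:string external;

let $db_ids := tokenize($d, '\|')

(: One array of distinct EAD IDs per selected facet. Wrapping each result
   in an array keeps the facets apart (and keeps empty results visible)
   when the FLWOR expression returns them as one sequence. :)
let $id-sets :=
  for $facet in tokenize($f, '\|')
  let $split := tokenize($facet, ':')
  let $facet_db := 'facet-' || $split[1]
  return array {
    distinct-values(
      db:open($facet_db)/terms/term[@text = $split[2] and @db = $db_ids]/ead
    )
  }

(: Start with the IDs of the first facet and keep, for each further
   facet, only the IDs that also occur in that facet. :)
let $common :=
  fold-left(
    tail($id-sets),
    head($id-sets)?*,
    function($acc, $next) { $acc[. = $next?*] }
  )

(: <eads> is a hypothetical wrapper element. :)
return <eads>{
  for $id in $common
  return <ead>{ $id }</ead>
}</eads>
```

With the sample data from the question (Literature: 12345 and 67890, Oregon: 12345), the fold starts with both Literature IDs and keeps only those also listed under Oregon, so only 12345 should remain and no post-processing in PHP would be needed.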