Hi Christian,

Thank you very much for your explanation and variant example.

In my use case, the local:search function is itself being called (as a
named function reference) from within another function that is the endpoint
of a RESTXQ API. This containing function handles a number of things, e.g.
pagination, deduplication, and transformation of XML into HTML.

Even when I rewrite local:search to match your variant example, incorrect
results are still returned. But when I then add %rest:GET annotations to
turn local:search into its own endpoint, the correct results are returned,
though only when I call that endpoint directly.

So I assume the containing function again makes things too complicated
for the full-text metadata to be propagated.

Does that sound plausible to you? And can you suggest any simple ways
around it? I'm afraid applying %basex:inline hasn't helped.
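
For reference, here is a minimal sketch of the kind of setup described
above. The module namespace, the path, the parameter names and the
pagination step are all assumptions made for illustration, not the actual
code, and it is not confirmed that this reduced form reproduces the problem.

```xquery
(: Hypothetical sketch; namespace, path and parameter names are assumed. :)
module namespace page = 'http://example.org/page';

(: Search function, following the variant from the quoted reply below. :)
declare function page:search(
  $database  as xs:string,
  $query     as xs:string
) as node()* {
  let $country := ft:search($database, $query)/ancestor::country
  return ft:mark($country[.//name[text() contains text { $query }]])
};

(: Endpoint that calls the search through a named function reference. :)
declare
  %rest:GET
  %rest:path('/search')
  %rest:query-param('q', '{$q}', 'German')
function page:endpoint(
  $q  as xs:string
) as element(results) {
  let $search := page:search#2
  let $hits := $search('factbook', $q)
  (: Assumed pagination step: return the first ten hits only. :)
  return <results>{ subsequence($hits, 1, 10) }</results>
};
```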

Very best,

Jack

On Fri, 1 Mar 2024, 8:12 pm Christian Grün, <christian.gr...@gmail.com>
wrote:

> Hi Jack,
>
> > When you say you can't reproduce it, do you mean you get 14 results from
> > running this script?
>
> Yes, that’s what I meant.
>
> The upcoming information will be very technical and specific. You are
> welcome to focus on the examples.
>
> Your updated example was helpful, and I noticed that several issues
> combine to produce the unexpected results. The core challenge is that
> ft:mark and ft:extract only yield expected results if the internally
> collected full-text metadata is not lost at some stage during the internal
> processing, which can happen in many places hidden from the writer of the
> query.
>
> In your specific example, the full-text information gets lost because the
> local:search function is too complex to be inlined by the compiler (which
> enables further optimizations that eventually allow metadata propagation).
> You can tackle this by forcing the compiler to inline your function:
>
>   declare %basex:inline function local:search(...)
>
> Using '(ethnicgroups, languages)' instead of 'name() = (...)' is another
> practical tip; it helps the optimizer to detect at compile time that
> metadata will be available at runtime. Another solution is to use
> 'local-name()' instead of 'name()' (local-name does not rely on namespaces
> that may occur in a database, which also affects the way full-text
> queries are evaluated).
>
> Here’s a variant that should work:
>
> declare function local:search(
>   $database  as xs:string,
>   $query     as xs:string
> ) {
>   let $country := ft:search($database, $query)/ancestor::country
>   let $search := function($node) { $node/text() contains text { $query } }
>   return (
>     ft:mark($country[.//name[$search(.)]]),
>     ft:mark($country[.//city[$search(.)]]),
>     ft:mark($country[.//(ethnicgroups, languages)[$search(.)]])
>   )
> };
> local:search('factbook', 'German')
>
> …or…
>
>   let $search := function($nodes) { $nodes[text() contains text { $query }] }
>   return (ft:mark($country[$search(.//name)]), ...
>
> From today’s perspective, we would certainly design ft:mark and ft:extract
> in such a way that the results are always correct. The consequence,
> however, would be a much more restricted syntax.
>
> Hope this helps,
> Christian
>
>
> On Thu, Feb 29, 2024 at 12:13 AM Jack Steyn <steynj...@gmail.com> wrote:
>
>> Hi Christian,
>>
>> When I run your script, I do get 14 elements.
>>
>> When I run the following script I just get 12.
>>
>> <commands>
>>   <set option='ftindex'>true</set>
>>   <create-db name='factbook'>https://files.basex.org/xml/factbook.xml</create-db>
>>   <xquery><![CDATA[
>> declare function local:search(
>>     $database as xs:string,
>>     $query as xs:string
>> ) {
>>     let $country-search := ft:search($database, $query)/ancestor::country
>>     let $city-search := ft:search($database, $query)
>>         /ancestor::city/ancestor::country
>>     let $other-search := ft:search($database, $query)
>>         /parent::*[name() = ('ethnicgroups', 'languages')]/ancestor::country
>>     let $country-mark := $country-search
>>         [.//name[text() contains text { $query }]] => ft:mark()
>>     let $city-mark := $city-search
>>         [.//city[text() contains text { $query }]] => ft:mark()
>>     let $other-mark := $other-search
>>         [.//*[name() = ('ethnicgroups', 'languages')]
>>             [text() contains text { $query }]] => ft:mark()
>>     return (
>>         $country-mark,
>>         $city-mark,
>>         $other-mark
>>     )
>> };
>>
>> local:search('factbook', 'German')//mark
>>   ]]></xquery>
>> </commands>
>>
>> When you say you can't reproduce it, do you mean you get 14 results from
>> running this script?
>>
>> Cheers,
>>
>> Jack
>>
>> On Thu, 29 Feb 2024, 1:02 am Christian Grün, <christian.gr...@gmail.com>
>> wrote:
>>
>>> Hi Jack,
>>>
>>> Thanks for your observation.
>>>
>>>
>>>> The first result of this query is the entry for Austria. I would expect
>>>> both of the instances of the word 'German' in that entry to be surrounded
>>>> by <mark> tags. However only the first instance is.
>>>>
>>>
>>> I couldn’t reproduce this yet. Here’s a command script that returns 14
>>> <mark>German</mark> elements:
>>>
>>> <commands>
>>>   <set option='ftindex'>true</set>
>>>   <create-db name='factbook'>https://files.basex.org/xml/factbook.xml</create-db>
>>>   <xquery><![CDATA[
>>> let $groups := ('ethnicgroups', 'languages')
>>> let $database := 'factbook'
>>> let $query := 'German'
>>>
>>> let $search := ft:search($database, $query)/parent::*
>>>   [name() = $groups]/ancestor::country
>>> let $marked := ft:mark(
>>>   $search[.//*[name() = $groups][text() contains text { $query }]]
>>> )
>>> return $marked//*[text() = 'German']
>>>   ]]></xquery>
>>> </commands>
>>>
>>> Could you check if you get the same result?
>>>
>>> Thanks in advance
>>> Christian
>>>
>>>
