Forwarding to the mailing list in order to share knowledge.

On Fri, Nov 12, 2021 at 1:41 PM BaseX Support <supp...@basex.org> wrote:
> Hi France,
>
> I’d need to get my hands on your code to tell you exactly where it’s
> best used, but I can give you some more details on the XQuery
> specification:
>
> When creating new nodes in XQuery via node constructors [1], copies of
> all enclosed nodes will be created, and the copied nodes get new node
> identities. As a result, the following query yields false:
>
>   let $a := <a/>
>   let $b := <b>{ $a }</b>
>   return $b/a is $a
>
> This step can be very expensive and memory consuming. If the COPYNODE
> option is disabled, child nodes will only be linked to the new parent
> nodes, and the query above returns true.
>
> As the option changes the semantics of XQuery, it should preferably be
> used in pragmas.
>
> Best,
> Christian
>
> PS: Mails to our mailing list are preferred; this way, other users
> might benefit from the replies as well.
>
> [1] https://www.w3.org/TR/xquery-31/#id-constructors
>
>
> On Fri, Nov 12, 2021 at 2:13 PM France Baril
> <france.ba...@architextus.com> wrote:
> >
> > Can you give me more information about how copynode changes the behavior
> > of the XQuery and where it is best used?
> >
> > I see in the example that the pragma is on db:open. My process is:
> >
> > 1. Read a document A from a DB called lang that has references to other
> >    documents in the same DB lang (where lang is a 4 letter code for a
> >    locale).
> > 2. Merge all the references into document A to create an aggregate.
> > 3. Send the aggregate through multiple functions (that use
> >    copy-modify-return) that each resolve a type of reference (most
> >    references grab referenced content from a DB called global, but others
> >    grab it from the lang DB). These references do not grab entire
> >    documents, but smaller snippets within XML documents.
> > 4. Save the result in a DB called staging-lang (where lang is a 4 letter
> >    code for a locale).
> >
> > So should the pragma apply when reading the 1st document (1), when
> > reading the documents we aggregate into the 1st document (2), when grabbing
> > the snippets (3) and/or when saving the end result in the staging DB (4)?
> > Or maybe for all db:open() and db:attribute()/.. functions in this process?
> >
> >
> > On Fri, Nov 12, 2021 at 12:16 PM BaseX Support <supp...@basex.org>
> > wrote:
> >>
> >> One more suggestion:
> >>
> >> If node construction turns out to consume too much memory, it sometimes
> >> helps to disable the COPYNODE option:
> >>
> >> https://docs.basex.org/wiki/XQuery_Extensions#Database_Pragmas
> >>
> >>
> >> France Baril <france.ba...@architextus.com> wrote on Fri, Nov 12,
> >> 2021, 13:09:
> >>>
> >>> Hi,
> >>>
> >>> Thanks for your answer.
> >>>
> >>> I tried rebuilding the document instead of using copies. I have
> >>> implemented 3/4 of the functions that resolve references and I'm
> >>> already at double the time I had before. So I will set that one aside
> >>> as an unsuccessful alternative. If memory serves me correctly, we might
> >>> have moved from a transform that rebuilds the document to a
> >>> copy-modify-return approach to improve performance over a year ago.
> >>>
> >>> I will try grouping the references of the same names in the example
> >>> above to limit the number of queries to the DB. If that still doesn't
> >>> help, I will see if I can send you a good example without having to
> >>> send too many of our.
> >>>
> >>> We have a short-term solution where we removed some references in
> >>> references, which substantially reduces the number of items to resolve
> >>> (80% improvement), but it does impact the user experience, so we are
> >>> still looking into code-based solutions as opposed to (or to use in
> >>> conjunction with) content-based solutions.
> >>>
> >>> On Fri, Nov 5, 2021 at 5:22 PM BaseX Support <supp...@basex.org>
> >>> wrote:
> >>> >
> >>> > Hi France,
> >>> >
> >>> > Do you have some sample data that allows us to test your code?
> >>> >
> >>> > If documents are pretty large, it’s sometimes faster to rebuild a
> >>> > document with node constructors instead of performing updates on it.
> >>> >
> >>> > Best,
> >>> > Christian
> >>> > ____________________________________
> >>> >
> >>> > > We have a query that looks like this:
> >>> > >
> >>> > > declare function content-refs:resolve-prompt-refs-new($node as node(),
> >>> > >     $lang as xs:string) as node()* {
> >>> > >   let $result :=
> >>> > >     copy $copy := $node
> >>> > >     modify(
> >>> > >       let $entries :=
> >>> > >         $copy/descendant-or-self::*[@name-ref][name()='prompt-ref' or
> >>> > >           name()='gui-ctrl-ref'
> >>> > >           or name()='feature-ref' or name()='app-ref' (: or
> >>> > >           name()='screen-ref':)]
> >>> > >
> >>> > >       let $entries-hd :=
> >>> > >         $copy/descendant-or-self::*[@id='T1700243243']/descendant-or-self::*[@name-ref][name()='prompt-ref'
> >>> > >           or name()='gui-ctrl-ref'
> >>> > >           or name()='feature-ref' or name()='app-ref' (: or
> >>> > >           name()='screen-ref':)]
> >>> > >
> >>> > >       let $trace := trace('Prompts count: ' || count($entries))
> >>> > >       let $trace := trace('Prompts in Hardware diagram: ' ||
> >>> > >         count($entries-hd))
> >>> > >
> >>> > >       for $entry in $entries
> >>> > >       (:let $trace := trace('start processing entry'):)
> >>> > >       let $name := $entry/data(@name-ref)
> >>> > >       let $trace :=
> >>> > >         if (exists($entry/ancestor::*[@id = 'T1700243243']))
> >>> > >         then trace( $name , ' Promptref ')
> >>> > >         else ()
> >>> > >       let $prompts-from-index := db:attribute('index-prompt-' ||
> >>> > >         $lang, $name, 'name')/.. (:=> prof:time('index prompt attr: '):)
> >>> > >       (:let $prompts-from-index := db:open('index-prompt-' ||
> >>> > >         $lang)//*[@name = $name] => prof:time('index prompt open: '):)
> >>> > >       let $prompts :=
> >>> > >         for $prompt in $prompts-from-index
> >>> > >         let $original-elem-name := $entry/self::*/name()
> >>> > >         let $new-elem-name :=
> >>> > >           switch ($original-elem-name)
> >>> > >           case 'prompt-ref' return $original-elem-name
> >>> > >           default return substring-before($original-elem-name, '-ref')
> >>> > >         return
> >>> > >           copy $prompt-renamed := $prompt
> >>> > >           modify(
> >>> > >             rename node $prompt-renamed as $new-elem-name
> >>> > >           )
> >>> > >           return $prompt-renamed (:=> prof:time('index prompt new
> >>> > >             elem-name: '):)
> >>> > >       let $new-node :=
> >>> > >         if (count($prompts) = 0)
> >>> > >         then
> >>> > >           <filter-group error="{concat("No target found in for: ",
> >>> > >             $entry/name(), '/@name-ref=', $entry/@name-ref)}"/>
> >>> > >         else <filter-group-inline>{
> >>> > >           $prompts
> >>> > >         }</filter-group-inline>
> >>> > >       let $trace := ('Ready to replace old entry with new-node')
> >>> > >       return replace node $entry with $new-node (:=>
> >>> > >         prof:time('index prompt new node: '):)
> >>> > >
> >>> > >     )
> >>> > >     return $copy (:=> prof:time('index prompt return copy: '):)
> >>> > >   return $result
> >>> > >
> >>> > > };
> >>> > >
> >>> > > As you can see, we are using prof:time to see how quickly items are
> >>> > > resolved. Querying the DB for each item goes fairly quickly (2
> >>> > > seconds). However, that last 'return $copy' line, after all the
> >>> > > replacements are processed, takes between 11 and 25 minutes depending
> >>> > > on the system. Memory usage is low, but the CPU usage goes through
> >>> > > the roof.
> >>> > >
> >>> > > We are updating a little over 110 000 items in this operation, so it
> >>> > > is a big operation on a file of about 89000 indented lines. We are
> >>> > > wondering if there is a way we could improve the performance. Before
> >>> > > this operation occurs, we are processing the file multiple times to
> >>> > > replace other items with very similar functions (copy-modify-return).
> >>> > > They all go fairly quickly, so it does seem that the culprit is the
> >>> > > number of items being replaced.
> >>> > >
> >>> > >
> >>> > > --
> >>> > > France Baril
> >>> > > Architecte documentaire / Documentation architect
> >>> > > france.ba...@architextus.com
> >>>
> >>>
> >>>
> >>> --
> >>> France Baril
> >>> Architecte documentaire / Documentation architect
> >>> france.ba...@architextus.com
> >
> >
> >
> > --
> > France Baril
> > Architecte documentaire / Documentation architect
> > france.ba...@architextus.com
>

--
France Baril
Architecte documentaire / Documentation architect
france.ba...@architextus.com
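
For readers of the archive, here is a minimal sketch of the node-identity point made in Christian's reply. It assumes BaseX and the pragma syntax described on the Database Pragmas page linked in the thread; the pragma is wrapped around the node constructor from his example only for illustration. Going by his explanation, the comparison should yield true when copying is switched off, and false without the pragma. Which expressions in the four-step process would actually benefit is not settled in the thread.

```xquery
(: Sketch only: assumes BaseX, where COPYNODE can be set locally via a pragma. :)
let $a := <a/>
(: With copying disabled inside the pragma, $a should be linked to <b>, not copied. :)
let $b := (# db:copynode false #) { <b>{ $a }</b> }
(: Node identity comparison between the child of $b and the original $a. :)
return $b/a is $a
```

Because the option changes XQuery semantics, keeping the pragma scoped to a single expression, as here, limits where the non-standard behavior applies.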