Cool!

That phrase "main memory updates" also implies there's something I
should learn.  I am simply doing updates using a repository.  Should I be
doing something to trigger main memory updates?

FWIW, here is a small sample of my data (one sentence) and the query I use
to expand a morphology code in each <w> element into meaningful, readable
attributes.  So a <w> element in the output looks like this:

<w pos="preposition" n="010010010011" morph="R" lang="H" lemma="b">בְּ</w>
<w pos="noun" type="common" gender="feminine" number="singular"
state="absolute" n="010010010012" morph="Ncfsa" lang="H" lemma="7225"
after=" ">רֵאשִׁית</w>
<w pos="verb" stem="qal" type="qatal" person="third" gender="masculine"
number="singular" n="010010010021" lang="H" after=" " lemma="1254 a"
morph="Vqp3ms" id="01Nvk">בָּרָ֣א</w>
<w pos="noun" type="common" gender="masculine" number="plural"
state="absolute" n="010010010031" lang="H" after=" " lemma="430"
morph="Ncmpa" id="01TyA">אֱלֹהִ֑ים</w>
<w pos="particle" type="direct object marker" n="010010010041" lang="H"
after=" " lemma="853" morph="To" id="01vuQ">אֵ֥ת</w>
<w pos="particle" type="definite article" n="010010010051" morph="Td"
lang="H" lemma="d">הַ</w>
<w pos="noun" type="common" gender="masculine" number="plural"
state="absolute" n="010010010052" morph="Ncmpa" lang="H" lemma="8064"
after=" ">שּׁמַ֖יִם</w>
<w pos="conjunction" n="010010010061" morph="C" lang="H" lemma="c">וְ</w>
<w pos="particle" type="direct object marker" n="010010010062" morph="To"
lang="H" lemma="853" after=" ">אֵ֥ת</w>
<w pos="particle" type="definite article" n="010010010071" morph="Td"
lang="H" lemma="d">הָ</w>
<w pos="noun" type="common" gender="both" number="singular"
state="absolute" n="010010010072" morph="Ncbsa" lang="H" lemma="776"
after=":">אָֽרֶץ</w>

These are leaf nodes in a syntax tree.  For simplicity, I am not showing
the syntax tree here; see the input file for that.
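
For anyone reading without the attachment, here is a minimal sketch of the
kind of expansion I mean.  It is not the attached query: the lookup tables
and the local:expand name are made up for illustration, only the codes
visible in the sample above are covered, and only nouns are decoded beyond
the part of speech.

```xquery
(: Hypothetical lookup tables covering only codes seen in the sample above;
   the real query in the attachment may be organized quite differently. :)
declare variable $pos    := map { 'R': 'preposition', 'N': 'noun', 'V': 'verb',
                                 'C': 'conjunction', 'T': 'particle' };
declare variable $type   := map { 'c': 'common' };
declare variable $gender := map { 'm': 'masculine', 'f': 'feminine', 'b': 'both' };
declare variable $number := map { 's': 'singular', 'p': 'plural' };
declare variable $state  := map { 'a': 'absolute' };

(: Returns a copy of $w with readable attributes derived from @morph. :)
declare function local:expand($w as element(w)) as element(w) {
  let $m := string($w/@morph)
  (: $c(i) is the i-th character of the morphology code, or '' if absent. :)
  let $c := function($i as xs:integer) as xs:string { substring($m, $i, 1) }
  return element w {
    attribute pos { $pos($c(1)) },
    (: Only nouns (e.g. "Ncfsa") are decoded further in this sketch. :)
    if ($c(1) = 'N') then (
      attribute type   { $type($c(2)) },
      attribute gender { $gender($c(3)) },
      attribute number { $number($c(4)) },
      attribute state  { $state($c(5)) }
    ) else (),
    $w/@*,
    $w/node()
  }
};

local:expand(
  <w n="010010010012" morph="Ncfsa" lang="H" lemma="7225" after=" ">רֵאשִׁית</w>
)
```

This builds new elements instead of using XQuery Update, so it can be run
over a subset without modifying the database.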

Jonathan

On Tue, Feb 22, 2022 at 8:57 AM Christian Grün <christian.gr...@gmail.com>
wrote:

> A little announcement: With BaseX 10 [1], main memory updates will get
> much faster:
>
> <x>{
>   (1 to 1000000) ! <y/>
> }</x> update {
>   y ! (insert node <z/> into .)
> }
>
> BaseX 9: ages (6-7 minutes)
> BaseX 10: 3 seconds
>
> The reason: The disk-based block storage layout is now also used for the
> main memory representation of XML nodes.
>
> [1] https://files.basex.org/releases/latest-10/
>
>
> On Tue, Feb 22, 2022 at 9:49 AM ETANCHAUD Fabrice <
> fabrice.etanch...@maif.fr> wrote:
>
>> Hi Jonathan!
>>
>> Apologies for my late contribution...
>>
>> Do you really have to use XQuery Update? Do you have to stick to a
>> specific format?
>> If not, maybe you could use a schema-on-read approach?
>> I mean, you could add new data as new documents,
>> and recombine these documents into the attribute-based format when
>> requesting the data.
>>
>> Would that be a viable solution for you?
>>
>> I once had success with this solution, as BaseX is very quick at adding
>> documents.
>>
>> Best regards,
>> Fabrice
>>
>>
>> ------------------------------
>> *From:* BaseX-Talk <basex-talk-boun...@mailman.uni-konstanz.de> on
>> behalf of Eliot Kimber <eliot.kim...@servicenow.com>
>> *Sent:* Monday, February 21, 2022 18:06
>> *To:* BaseX <basex-talk@mailman.uni-konstanz.de>
>> *Subject:* Re: [basex-talk] Faster in the cloud?
>>
>>
>> You can use prof:track() to time your insertion operation over enough
>> iterations to get a reliable per-operation time, and then multiply that by
>> 2.5 million to get an approximate time to completion.
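>>
>> As a rough sketch of that approach (the sample size, the <w/> elements,
>> and the inserted attribute are placeholders, and this times an in-memory
>> copy, not your database, so treat the estimate as a ballpark only):
>>
>> ```xquery
>> (: Time a small sample of attribute insertions, then extrapolate. :)
>> let $sample := 1000       (: placeholder sample size :)
>> let $total  := 2500000    (: number of insertions in the full run :)
>> let $result := prof:track(
>>   <x>{ (1 to $sample) ! <w/> }</x> update {
>>     w ! (insert node attribute pos { 'noun' } into .)
>>   }
>> )
>> (: prof:track reports the elapsed time in milliseconds under 'time'. :)
>> let $ms-per-op := $result?time div $sample
>> return map {
>>   'ms-per-operation': $ms-per-op,
>>   'estimated-hours': $ms-per-op * $total div 1000 div 3600
>> }
>> ```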
>>
>>
>>
>> On my machine I’m finding times around 0.05 seconds for my operations,
>> which are more than just attribute insertions, where I need to do 40K
>> iterations. I would expect attribute insertion to be faster, especially if
>> you can batch up the insertions into a small number of transactions.
>>
>>
>>
>> But five hours to do the update doesn’t seem entirely out of spec if your
>> machine is significantly slower. Doing the math, I get 7ms per insertion:
>>
>>
>>
>> Hours | Seconds/hour | Seconds | # operations | Time/operation (s)
>> ------+--------------+---------+--------------+-------------------
>> 5     | 3600         | 18000   | 2500000      | 0.0072
>>
>>
>>
>> That seems pretty fast from a per-operation standpoint.
>>
>>
>>
>> If you can break your content into multiple databases you could
>> parallelize the updates across multiple BaseX instances and then combine
>> the result back at the end.
>>
>>
>>
>> So spin up one server for each core, have a master server that provides a
>> REST API to kick off the processing, and then use the REST method to farm
>> jobs out to each of the servers (REST makes it easy to target each
>> server via its port; you could also do it from a shell script through the
>> basexclient command line).
>>
>>
>>
>> With that, you should be able to reduce the processing time to what it
>> takes one server to process its share, which is the total number of
>> objects divided by the number of cores.
>>
>>
>>
>> Cheers,
>>
>>
>>
>> E.
>>
>>
>>
>> _____________________________________________
>>
>> *Eliot Kimber*
>>
>>
>>
>> *From: *BaseX-Talk <basex-talk-boun...@mailman.uni-konstanz.de> on
>> behalf of Jonathan Robie <jonathan.ro...@gmail.com>
>> *Date: *Monday, February 21, 2022 at 8:44 AM
>> *To: *Liam R. E. Quin <l...@fromoldbooks.org>
>> *Cc: *BaseX <basex-talk@mailman.uni-konstanz.de>
>> *Subject: *Re: [basex-talk] Faster in the cloud?
>>
>>
>>
>> I have a 2013 MacBook Pro with 16 GB of RAM and a 1 TB SSD.  So not
>> entirely wimpy, but nowhere near as fast as the current MacBooks; I have no
>> idea how that compares to a typical laptop these days.  Most things run
>> fairly quickly, but inserting 2.5 million attributes into a document takes
>> perhaps 5 hours (I didn't time it).  I can run that overnight, and do test
>> runs on smaller subsets, but I want to think through my options.
>>
>>
>>
>> Jonathan
>>
>>
>>
>> On Sat, Feb 19, 2022 at 6:11 PM Liam R. E. Quin <l...@fromoldbooks.org>
>> wrote:
>>
>> On Sat, 2022-02-19 at 16:05 -0500, Jonathan Robie wrote:
>> > If I am running my queries and updates on a typical laptop, would
>> > they run much faster if I ran them on a suitably configured instance
>> > in the cloud?
>>
>> "suitably configured" is very subjective.  Potentially your queries
>> could run a lot faster.
>>
>> A lot depends on the speed of the disk (or SSD) in the laptop, and the
>> amount of memory it has, as well as the CPU - a recent Macbook Pro will
>> be faster than a ten-year-old chromebook.  However, server blades (the
>> machines used in data centres) typically have much higher bandwidth
>> between memory and devices including both the CPU and the long-term
>> storage, and likely have more physical RAM than your laptop.
>>
>> On the other hand, connecting over the network to the cloud can be
>> slow....
>>
>> Liam
>>
>> --
>> Liam Quin, https://www.delightfulcomputing.com/
>> Transformers team, Paligo.net
>>
>> Pictures from old books - www.fromoldbooks.org
>>
>>
<Sentence ID="gn1:1">
  <Trees>
    <Tree>
      <Node Cat="S" Head="0" nodeId="010010010010281" Length="28" Start="0" End="10">
        <Node Cat="CL" Start="0" End="10" Rule="PP-V-S-O" Head="1" nodeId="010010010010280" Length="28">
          <Node Cat="PP" Start="0" End="1" Rule="Pp2PP" Head="0" nodeId="010010010010061" Length="6">
            <Node Cat="pp" Start="0" End="1" Rule="PrepNp" Head="1" nodeId="010010010010060" Length="6">
              <Node Cat="pp" Start="0" End="0" Rule="P2PP" Head="0" nodeId="010010010010011" Length="1">
                <Node Cat="prep" Start="0" End="0" Length="1" morphId="010010010011" Unicode="בְּ" nodeId="010010010010010">
                  <w n="010010010011" morph="R" lang="H" lemma="b">בְּ</w>
                </Node>
              </Node>
              <Node Cat="np" Start="1" End="1" Rule="N2NP" Head="0" nodeId="010010010020051" Length="5">
                <Node Cat="noun" Start="1" End="1" Length="5" morphId="010010010012" Unicode="רֵאשִׁ֖ית" nodeId="010010010020050">
                  <w n="010010010012" morph="Ncfsa" lang="H" lemma="7225" after=" ">רֵאשִׁ֖ית</w>
                </Node>
              </Node>
            </Node>
          </Node>
          <Node Cat="V" Start="2" End="2" Rule="Vp2V" Head="0" nodeId="010010010070032" Length="3">
            <Node Cat="vp" Start="2" End="2" Rule="V2VP" Head="0" nodeId="010010010070031" Length="3">
              <Node Cat="verb" Start="2" End="2" Length="3" morphId="010010010021" Unicode="בָּרָ֣א" nodeId="010010010070030">
                <w n="010010010021" lang="H" after=" " lemma="1254 a" morph="Vqp3ms" id="01Nvk">בָּרָ֣א</w>
              </Node>
            </Node>
          </Node>
          <Node Cat="S" Start="3" End="3" Rule="Np2S" Head="0" nodeId="010010010100052" Length="5">
            <Node Cat="np" Start="3" End="3" Rule="N2NP" Head="0" nodeId="010010010100051" Length="5">
              <Node Cat="noun" Start="3" End="3" Length="5" morphId="010010010031" Unicode="אֱלֹהִ֑ים" nodeId="010010010100050">
                <w n="010010010031" lang="H" after=" " lemma="430" morph="Ncmpa" id="01TyA">אֱלֹהִ֑ים</w>
              </Node>
            </Node>
          </Node>
          <Node Cat="O" Start="4" End="10" Rule="Np2O" Head="0" nodeId="010010010150141" Length="14">
            <Node Cat="np" Start="4" End="10" Rule="NpaNp" Head="0" nodeId="010010010150140" Length="14">
              <Node Cat="np" Start="4" End="6" Rule="OmpNP" Head="1" nodeId="010010010150070" Length="7">
                <Node Cat="omp" Start="4" End="4" Rule="ObjMarker" Head="0" nodeId="010010010150021" Length="2">
                  <Node Cat="om" Start="4" End="4" Length="2" morphId="010010010041" Unicode="אֵ֥ת" nodeId="010010010150020">
                    <w n="010010010041" lang="H" after=" " lemma="853" morph="To" id="01vuQ">אֵ֥ת</w>
                  </Node>
                </Node>
                <Node Cat="np" Start="5" End="6" Rule="DetNP" Head="1" nodeId="010010010170050" Length="5">
                  <Node Cat="art" Start="5" End="5" Length="1" morphId="010010010051" Unicode="הַ" nodeId="010010010170010">
                    <w n="010010010051" morph="Td" lang="H" lemma="d">הַ</w>
                  </Node>
                  <Node Cat="np" Start="6" End="6" Rule="N2NP" Head="0" nodeId="010010010180041" Length="4">
                    <Node Cat="noun" Start="6" End="6" Length="4" morphId="010010010052" Unicode="שָּׁמַ֖יִם" nodeId="010010010180040">
                      <w n="010010010052" morph="Ncmpa" lang="H" lemma="8064" after=" ">שָּׁמַ֖יִם</w>
                    </Node>
                  </Node>
                </Node>
              </Node>
              <Node Cat="cjp" Start="7" End="7" Rule="Cj2Cjp" Head="0" nodeId="010010010220011" Length="1">
                <Node Cat="cj" Start="7" End="7" Length="1" morphId="010010010061" Unicode="וְ" nodeId="010010010220010">
                  <w n="010010010061" morph="C" lang="H" lemma="c">וְ</w>
                </Node>
              </Node>
              <Node Cat="np" Start="8" End="10" Rule="OmpNP" Head="1" nodeId="010010010230060" Length="6">
                <Node Cat="omp" Start="8" End="8" Rule="ObjMarker" Head="0" nodeId="010010010230021" Length="2">
                  <Node Cat="om" Start="8" End="8" Length="2" morphId="010010010062" Unicode="אֵ֥ת" nodeId="010010010230020">
                    <w n="010010010062" morph="To" lang="H" lemma="853" after=" ">אֵ֥ת</w>
                  </Node>
                </Node>
                <Node Cat="np" Start="9" End="10" Rule="DetNP" Head="1" nodeId="010010010250040" Length="4">
                  <Node Cat="art" Start="9" End="9" Length="1" morphId="010010010071" Unicode="הָ" nodeId="010010010250010">
                    <w n="010010010071" morph="Td" lang="H" lemma="d">הָ</w>
                  </Node>
                  <Node Cat="np" Start="10" End="10" Rule="N2NP" Head="0" nodeId="010010010260031" Length="3">
                    <Node Cat="noun" Start="10" End="10" Length="3" morphId="010010010072" Unicode="׃אָֽרֶץ" nodeId="010010010260030">
                      <w n="010010010072" morph="Ncbsa" lang="H" lemma="776" after="׃">אָֽרֶץ</w>
                    </Node>
                  </Node>
                </Node>
              </Node>
            </Node>
          </Node>
        </Node>
      </Node>
    </Tree>
  </Trees>
</Sentence>

Attachment: expand-oshb-attributes.xq
Description: Binary data
