Cool! That phrase "main memory updates" also implies there's something I should learn. I am simply doing updates using a repository. Should I be doing something to trigger main memory updates?
FWIW, here is a small sample of my data (one sentence) and the query I use to expand the morphology code in each <w> element into meaningful, readable attributes. The <w> elements in the output look like this:

<w pos="preposition" n="010010010011" morph="R" lang="H" lemma="b">בְּ</w>
<w pos="noun" type="common" gender="feminine" number="singular" state="absolute" n="010010010012" morph="Ncfsa" lang="H" lemma="7225" after=" ">רֵאשִׁית</w>
<w pos="verb" stem="qal" type="qatal" person="third" gender="masculine" number="singular" n="010010010021" lang="H" after=" " lemma="1254 a" morph="Vqp3ms" id="01Nvk">בָּרָ֣א</w>
<w pos="noun" type="common" gender="masculine" number="plural" state="absolute" n="010010010031" lang="H" after=" " lemma="430" morph="Ncmpa" id="01TyA">אֱלֹהִ֑ים</w>
<w pos="particle" type="direct object marker" n="010010010041" lang="H" after=" " lemma="853" morph="To" id="01vuQ">אֵ֥ת</w>
<w pos="particle" type="definite article" n="010010010051" morph="Td" lang="H" lemma="d">הַ</w>
<w pos="noun" type="common" gender="masculine" number="plural" state="absolute" n="010010010052" morph="Ncmpa" lang="H" lemma="8064" after=" ">שּׁמַ֖יִם</w>
<w pos="conjunction" n="010010010061" morph="C" lang="H" lemma="c">וְ</w>
<w pos="particle" type="direct object marker" n="010010010062" morph="To" lang="H" lemma="853" after=" ">אֵ֥ת</w>
<w pos="particle" type="definite article" n="010010010071" morph="Td" lang="H" lemma="d">הָ</w>
<w pos="noun" type="common" gender="both" number="singular" state="absolute" n="010010010072" morph="Ncbsa" lang="H" lemma="776" after=":">אָֽרֶץ</w>

These are leaf nodes in a syntax tree. For simplicity, I am not showing the syntax tree here; see the input file for that.

Jonathan

On Tue, Feb 22, 2022 at 8:57 AM Christian Grün <christian.gr...@gmail.com> wrote:

> A little announcement: With BaseX 10 [1], main memory updates will get
> much faster:
>
> <x>{
>   (1 to 1000000) ! <y/>
> }</x> update {
>   y ! (insert node <z/> into .)
> }
>
> BaseX 9: ages (6-7 minutes)
> BaseX 10: 3 seconds
>
> The reason: The disk-based block storage layout is now also used for the
> main memory representation of XML nodes.
>
> [1] https://files.basex.org/releases/latest-10/
>
>
> On Tue, Feb 22, 2022 at 9:49 AM ETANCHAUD Fabrice <
> fabrice.etanch...@maif.fr> wrote:
>
>> Hi Jonathan!
>>
>> Apologies for my late contribution...
>>
>> Do you really have to use XQuery Update? Do you have to stick to a
>> specific format?
>> If not, maybe you could use a schema-on-read approach?
>> I mean, you could add new data as new documents,
>> and recombine these documents into the attribute-based format when
>> requesting the data.
>>
>> Would that be a viable solution for you?
>>
>> I once had success with this solution, as BaseX is very quick at adding
>> documents.
>>
>> Best regards,
>> Fabrice
>>
>>
>> ------------------------------
>> From: BaseX-Talk <basex-talk-boun...@mailman.uni-konstanz.de> on
>> behalf of Eliot Kimber <eliot.kim...@servicenow.com>
>> Sent: Monday, February 21, 2022 18:06
>> To: BaseX <basex-talk@mailman.uni-konstanz.de>
>> Subject: Re: [basex-talk] Faster in the cloud?
>>
>> You can use prof:track() to time your insertion operation for enough
>> iterations to get a reasonable time and then multiply by 2.5 million to get
>> an approximate time to completion.
>>
>> On my machine I’m finding times around 0.05 seconds for my operations,
>> which are more than just attribute insertions, where I need to do 40K
>> iterations. I would expect attribute insertion to be faster, especially if
>> you can batch up the insertions into a small number of transactions.
>>
>> But five hours to do the update doesn’t seem entirely out of spec if your
>> machine is significantly slower. Doing the math, I get 7ms per insertion:
>>
>> Hours   Seconds/Hour   Seconds   # operations   Time/operation
>> 5       3600           18000     2500000        0.0072
>>
>> That seems pretty fast on a per-operation standpoint.
>>
>> If you can break your content into multiple databases you could
>> parallelize the updates across multiple BaseX instances and then combine
>> the result back at the end.
>>
>> So spin up one server for each core, have a master server that provides a
>> REST API to kick off the processing and then use the REST method to farm
>> jobs out to each of the servers (using REST to make it easy to target each
>> of the servers via a port. Could also do it from a shell script through the
>> baseclient command-line.).
>>
>> With that you should be able to reduce the processing to the time it takes
>> one server to process its share, which will be total objects/number of
>> cores (its share, that is).
>>
>> Cheers,
>>
>> E.
>>
>> _____________________________________________
>> Eliot Kimber
>> Sr Staff Content Engineer
>> servicenow.com <https://www.servicenow.com>
>>
>>
>> From: BaseX-Talk <basex-talk-boun...@mailman.uni-konstanz.de> on
>> behalf of Jonathan Robie <jonathan.ro...@gmail.com>
>> Date: Monday, February 21, 2022 at 8:44 AM
>> To: Liam R. E. Quin <l...@fromoldbooks.org>
>> Cc: BaseX <basex-talk@mailman.uni-konstanz.de>
>> Subject: Re: [basex-talk] Faster in the cloud?
>>
>> I have a 2013 Macbook Pro with 16 Gig RAM and a 1 Terabyte SSD. So not
>> entirely wimpy, but nowhere near as fast as the current Macbooks. I have no
>> idea how that compares to a typical laptop these days. Most things run
>> fairly quickly, but inserting 2.5 million attributes into a document takes
>> perhaps 5 hours; I didn't time it. I can run that overnight, and do test
>> runs on smaller subsets, but I want to think through my options.
>>
>> Jonathan
>>
>> On Sat, Feb 19, 2022 at 6:11 PM Liam R. E. Quin <l...@fromoldbooks.org>
>> wrote:
>>
>> On Sat, 2022-02-19 at 16:05 -0500, Jonathan Robie wrote:
>> > If I am running my queries and updates on a typical laptop, would
>> > they run much faster if I ran them on a suitably configured instance
>> > in the cloud?
>>
>> "suitably configured" is very subjective. Potentially your queries
>> could run a lot faster.
>>
>> A lot depends on the speed of the disk (or SSD) in the laptop, and the
>> amount of memory it has, as well as the CPU - a recent Macbook Pro will
>> be faster than a ten-year-old chromebook. However, server blades (the
>> machines used in data centres) typically have much higher bandwidth
>> between memory and devices including both the CPU and the long-term
>> storage, and likely have more physical RAM than your laptop.
>>
>> On the other hand, connecting over the network to the cloud can be
>> slow....
>>
>> Liam
>>
>> --
>> Liam Quin, https://www.delightfulcomputing.com/
>> Transformers team, Paligo.net
>>
>> Pictures from old books - www.fromoldbooks.org
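
For reference, the per-operation estimate in the quoted thread (5 hours for 2.5 million insertions) and the one-server-per-core suggestion can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the function names are made up for illustration, the worker count of 4 is a hypothetical example, and the parallel figure assumes the work splits perfectly evenly with no coordination overhead.

```python
def seconds_per_operation(hours: float, operations: int) -> float:
    """Average wall-clock seconds per operation for a run of the given length."""
    return hours * 3600 / operations


def parallel_hours(hours: float, workers: int) -> float:
    """Idealized run time when the work is split evenly across workers."""
    return hours / workers


per_op = seconds_per_operation(5, 2_500_000)
print(f"{per_op * 1000:.1f} ms per insertion")            # 7.2 ms per insertion
print(f"{parallel_hours(5, 4):.2f} hours on 4 workers")   # 1.25 hours on 4 workers
```

Real runs will be slower than the idealized parallel figure, since splitting the content into several databases and recombining the results takes time too.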
<Sentence ID="gn1:1">
  <Trees>
    <Tree>
      <Node Cat="S" Head="0" nodeId="010010010010281" Length="28" Start="0" End="10">
        <Node Cat="CL" Start="0" End="10" Rule="PP-V-S-O" Head="1" nodeId="010010010010280" Length="28">
          <Node Cat="PP" Start="0" End="1" Rule="Pp2PP" Head="0" nodeId="010010010010061" Length="6">
            <Node Cat="pp" Start="0" End="1" Rule="PrepNp" Head="1" nodeId="010010010010060" Length="6">
              <Node Cat="pp" Start="0" End="0" Rule="P2PP" Head="0" nodeId="010010010010011" Length="1">
                <Node Cat="prep" Start="0" End="0" Length="1" morphId="010010010011" Unicode="בְּ" nodeId="010010010010010">
                  <w n="010010010011" morph="R" lang="H" lemma="b">בְּ</w>
                </Node>
              </Node>
              <Node Cat="np" Start="1" End="1" Rule="N2NP" Head="0" nodeId="010010010020051" Length="5">
                <Node Cat="noun" Start="1" End="1" Length="5" morphId="010010010012" Unicode="רֵאשִׁ֖ית" nodeId="010010010020050">
                  <w n="010010010012" morph="Ncfsa" lang="H" lemma="7225" after=" ">רֵאשִׁ֖ית</w>
                </Node>
              </Node>
            </Node>
          </Node>
          <Node Cat="V" Start="2" End="2" Rule="Vp2V" Head="0" nodeId="010010010070032" Length="3">
            <Node Cat="vp" Start="2" End="2" Rule="V2VP" Head="0" nodeId="010010010070031" Length="3">
              <Node Cat="verb" Start="2" End="2" Length="3" morphId="010010010021" Unicode="בָּרָ֣א" nodeId="010010010070030">
                <w n="010010010021" lang="H" after=" " lemma="1254 a" morph="Vqp3ms" id="01Nvk">בָּרָ֣א</w>
              </Node>
            </Node>
          </Node>
          <Node Cat="S" Start="3" End="3" Rule="Np2S" Head="0" nodeId="010010010100052" Length="5">
            <Node Cat="np" Start="3" End="3" Rule="N2NP" Head="0" nodeId="010010010100051" Length="5">
              <Node Cat="noun" Start="3" End="3" Length="5" morphId="010010010031" Unicode="אֱלֹהִ֑ים" nodeId="010010010100050">
                <w n="010010010031" lang="H" after=" " lemma="430" morph="Ncmpa" id="01TyA">אֱלֹהִ֑ים</w>
              </Node>
            </Node>
          </Node>
          <Node Cat="O" Start="4" End="10" Rule="Np2O" Head="0" nodeId="010010010150141" Length="14">
            <Node Cat="np" Start="4" End="10" Rule="NpaNp" Head="0" nodeId="010010010150140" Length="14">
              <Node Cat="np" Start="4" End="6" Rule="OmpNP" Head="1" nodeId="010010010150070" Length="7">
                <Node Cat="omp" Start="4" End="4" Rule="ObjMarker" Head="0" nodeId="010010010150021" Length="2">
                  <Node Cat="om" Start="4" End="4" Length="2" morphId="010010010041" Unicode="אֵ֥ת" nodeId="010010010150020">
                    <w n="010010010041" lang="H" after=" " lemma="853" morph="To" id="01vuQ">אֵ֥ת</w>
                  </Node>
                </Node>
                <Node Cat="np" Start="5" End="6" Rule="DetNP" Head="1" nodeId="010010010170050" Length="5">
                  <Node Cat="art" Start="5" End="5" Length="1" morphId="010010010051" Unicode="הַ" nodeId="010010010170010">
                    <w n="010010010051" morph="Td" lang="H" lemma="d">הַ</w>
                  </Node>
                  <Node Cat="np" Start="6" End="6" Rule="N2NP" Head="0" nodeId="010010010180041" Length="4">
                    <Node Cat="noun" Start="6" End="6" Length="4" morphId="010010010052" Unicode="שָּׁמַ֖יִם" nodeId="010010010180040">
                      <w n="010010010052" morph="Ncmpa" lang="H" lemma="8064" after=" ">שָּׁמַ֖יִם</w>
                    </Node>
                  </Node>
                </Node>
              </Node>
              <Node Cat="cjp" Start="7" End="7" Rule="Cj2Cjp" Head="0" nodeId="010010010220011" Length="1">
                <Node Cat="cj" Start="7" End="7" Length="1" morphId="010010010061" Unicode="וְ" nodeId="010010010220010">
                  <w n="010010010061" morph="C" lang="H" lemma="c">וְ</w>
                </Node>
              </Node>
              <Node Cat="np" Start="8" End="10" Rule="OmpNP" Head="1" nodeId="010010010230060" Length="6">
                <Node Cat="omp" Start="8" End="8" Rule="ObjMarker" Head="0" nodeId="010010010230021" Length="2">
                  <Node Cat="om" Start="8" End="8" Length="2" morphId="010010010062" Unicode="אֵ֥ת" nodeId="010010010230020">
                    <w n="010010010062" morph="To" lang="H" lemma="853" after=" ">אֵ֥ת</w>
                  </Node>
                </Node>
                <Node Cat="np" Start="9" End="10" Rule="DetNP" Head="1" nodeId="010010010250040" Length="4">
                  <Node Cat="art" Start="9" End="9" Length="1" morphId="010010010071" Unicode="הָ" nodeId="010010010250010">
                    <w n="010010010071" morph="Td" lang="H" lemma="d">הָ</w>
                  </Node>
                  <Node Cat="np" Start="10" End="10" Rule="N2NP" Head="0" nodeId="010010010260031" Length="3">
                    <Node Cat="noun" Start="10" End="10" Length="3" morphId="010010010072" Unicode="׃אָֽרֶץ" nodeId="010010010260030">
                      <w n="010010010072" morph="Ncbsa" lang="H" lemma="776" after="׃">אָֽרֶץ</w>
                    </Node>
                  </Node>
                </Node>
              </Node>
            </Node>
          </Node>
        </Node>
      </Node>
    </Tree>
  </Trees>
</Sentence>
expand-oshb-attributes.xq
Description: Binary data
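
Since the attached query is not shown inline, here is a rough sketch of the kind of expansion described in the message, written in Python rather than XQuery so it can be run on its own. It is not the attached query. The function name `expand_morph` and the table names are hypothetical, and the lookup tables are deliberately partial: they cover only the codes that occur in the sample sentence (for example "Ncfsa" and "Vqp3ms"), with the readable values taken from the sample output.

```python
# Partial lookup tables: only the codes that occur in the sample sentence.
POS = {"R": "preposition", "N": "noun", "V": "verb",
       "T": "particle", "C": "conjunction"}
NOUN_TYPE = {"c": "common"}
PARTICLE_TYPE = {"o": "direct object marker", "d": "definite article"}
VERB_STEM = {"q": "qal"}
VERB_TYPE = {"p": "qatal"}
PERSON = {"3": "third"}
GENDER = {"m": "masculine", "f": "feminine", "b": "both"}
NUMBER = {"s": "singular", "p": "plural"}
STATE = {"a": "absolute"}

# For each part of speech, the attribute each following character stands for.
FIELDS = {
    "N": [("type", NOUN_TYPE), ("gender", GENDER),
          ("number", NUMBER), ("state", STATE)],
    "V": [("stem", VERB_STEM), ("type", VERB_TYPE), ("person", PERSON),
          ("gender", GENDER), ("number", NUMBER)],
    "T": [("type", PARTICLE_TYPE)],
}


def expand_morph(code: str) -> dict:
    """Expand a morphology code such as 'Ncfsa' into readable attributes."""
    attrs = {"pos": POS[code[0]]}
    # Pair each remaining character with its attribute name and lookup table.
    for letter, (name, table) in zip(code[1:], FIELDS.get(code[0], [])):
        attrs[name] = table[letter]
    return attrs


print(expand_morph("Ncfsa"))
print(expand_morph("Vqp3ms"))
```

A code outside these partial tables raises a KeyError, which at least makes gaps in the mapping visible instead of silently dropping attributes.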