Re: [basex-talk] Faster in the cloud?

Zimmel, Daniel Tue, 22 Feb 2022 06:20:04 -0800

Wait, my tea kettle can't even boil in 3 seconds! When am I supposed to take my
breaks?

Can’t wait to try this in my updating pipelines.
It is always intriguing to learn about performance optimizations from this
list, whether in the implementation itself or in example designs. Thanks!

Daniel

From: BaseX-Talk <basex-talk-boun...@mailman.uni-konstanz.de> On Behalf Of
Christian Grün
Sent: Tuesday, 22 February 2022 14:57
To: ETANCHAUD Fabrice <fabrice.etanch...@maif.fr>
Cc: BaseX <basex-talk@mailman.uni-konstanz.de>
Subject: Re: [basex-talk] Faster in the cloud?

A little announcement: With BaseX 10 [1], main memory updates will get much 
faster:

<x>{
  (1 to 1000000) ! <y/>
}</x> update {
  y ! (insert node <z/> into .)
}

BaseX 9: ages (6-7 minutes)
BaseX 10: 3 seconds

The reason: The disk-based block storage layout is now also used for the main 
memory representation of XML nodes.

[1] https://files.basex.org/releases/latest-10/

On Tue, Feb 22, 2022 at 9:49 AM ETANCHAUD Fabrice
<fabrice.etanch...@maif.fr> wrote:
Hi Jonathan!

Apologies for my late contribution...

Do you really have to use XQuery Update? Do you have to stick to a specific
format?
If not, maybe you could use a schema-on-read approach?
I mean, you could add new data as new documents,
and recombine these documents into the attribute-based format when requesting
the data.

Would that be a viable solution for you?

I once had success with this solution, as BaseX is very quick at adding 
documents.
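A minimal sketch of the schema-on-read idea in BaseX XQuery, assuming a database named `corpus` whose document `main.xml` holds `<entry id="e1">`-style elements, with new values stored as small `<annotation ref="..." name="..." value="..."/>` documents under the path `annotations/`. All of these names are hypothetical, and the function names are those of BaseX 9 (some database functions were renamed in BaseX 10). The first query only adds a document; the second, read-only query merges the stored values back in as attributes when the data is requested.

```xquery
(: Query 1 (updating): store new values as a separate small document
   instead of inserting attributes into main.xml.
   Database, path and element names are hypothetical. :)
db:add(
  'corpus',
  <annotations>
    <annotation ref="e1" name="lemma" value="logos"/>
    <annotation ref="e2" name="lemma" value="theos"/>
  </annotations>,
  'annotations/batch-001.xml'
)

(: Query 2 (read-only, run separately): recombine on read.
   First index all annotations by the id of the entry they target... :)
let $by-ref := map:merge(
  for $a in db:open('corpus', 'annotations')//annotation
  group by $ref := string($a/@ref)
  return map:entry($ref, $a)
)
(: ...then return copies of the entries with the values added as attributes;
   the database itself is never modified by this query. :)
for $entry in db:open('corpus', 'main.xml')//entry
return $entry update {
  for $a in $by-ref(string(@id))
  return insert node attribute { $a/@name } { $a/@value } into .
}
```

The trade-off is that the merging work moves from write time to read time, so this is likely to pay off mainly when new data arrives often and requests only touch part of the entries.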

Best regards,
Fabrice

________________________________
From: BaseX-Talk <basex-talk-boun...@mailman.uni-konstanz.de>
on behalf of Eliot Kimber <eliot.kim...@servicenow.com>
Sent: Monday, 21 February 2022 18:06
To: BaseX <basex-talk@mailman.uni-konstanz.de>
Subject: Re: [basex-talk] Faster in the cloud?

You can use prof:track() to time your insertion operation over enough iterations
to get a reliable measurement, divide by the number of iterations, and then
multiply by 2.5 million to get an approximate time to completion.
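A minimal sketch of that measurement, assuming a database `corpus` with `<entry>` elements and a new attribute `checked` (all hypothetical names). Since an updating expression generally cannot be passed to a non-updating function such as prof:track, this sketch times BaseX's non-updating `update` (copy-and-modify) on a sample instead. It therefore measures main-memory work on copies and should be read as a rough indication, not as the exact cost of the real database update.

```xquery
(: Time a sample of insertions and extrapolate to 2.5 million.
   Database 'corpus', element 'entry' and attribute 'checked' are hypothetical. :)
let $sample := (db:open('corpus')//entry)[position() <= 10000]
let $run := prof:track(
  (: copy-and-modify: each entry is copied and the attribute is added to the copy :)
  for $e in $sample
  return $e update { insert node attribute checked { 'yes' } into . }
)
(: prof:track reports the elapsed time in milliseconds under the 'time' key :)
let $ms-per-op := $run?time div count($sample)
return map {
  'sample-size'    : count($sample),
  'ms-per-op'      : $ms-per-op,
  'estimated-hours': $ms-per-op * 2500000 div 1000 div 3600
}
```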

On my machine I’m seeing times of around 0.05 seconds for my operations, which
do more than just insert attributes, and I need to do 40K iterations. I
would expect attribute insertion to be faster, especially if you can batch up
the insertions into a small number of transactions.
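A sketch of such batching, again with the hypothetical names `corpus`, `entry` and `checked`: a single updating query runs as one transaction in BaseX, with all insertions collected in one pending update list and applied together when the query finishes, instead of one round trip and commit per attribute. The external variables `$from` and `$size` are also hypothetical; they only allow the work to be split into a few large chunks if a single pending update list for all 2.5 million insertions turns out to need too much memory.

```xquery
(: Hypothetical chunk parameters; bind them from the client to run the
   update in a handful of large transactions instead of millions of small ones. :)
declare variable $from as xs:integer external := 1;
declare variable $size as xs:integer external := 2500000;

(: All insert expressions below end up in one pending update list,
   which is applied in a single transaction when the query finishes. :)
for $entry in subsequence(db:open('corpus')//entry, $from, $size)
return insert node attribute checked { 'yes' } into $entry
```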

But five hours for the update doesn’t seem entirely out of spec if your
machine is significantly slower. Doing the math, I get about 7 ms per insertion:

Hours  Seconds/hour  Seconds  # operations  Time/operation
5      3600          18000    2500000       0.0072 s

That seems pretty fast from a per-operation standpoint.

If you can break your content into multiple databases, you could parallelize the
updates across multiple BaseX instances and then combine the results at the
end.

So spin up one server for each core, have a master server that provides a REST
API to kick off the processing, and then use REST calls to farm jobs out to
each of the servers (REST makes it easy to target each server via its port;
you could also do it from a shell script through the basexclient command line).
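A sketch of the fan-out step in XQuery, assuming four BaseX HTTP servers on ports 8984 to 8987, each holding one slice of the data in a database named `part1` to `part4`, with `admin`/`admin` credentials (all hypothetical). It posts the same updating query to each server's REST endpoint with the EXPath HTTP Client function http:send-request, and uses BaseX's xquery:fork-join so the requests are issued in parallel rather than one after another. Both functions should be checked against the documentation of the BaseX version in use.

```xquery
declare namespace http = 'http://expath.org/ns/http-client';

(: Hypothetical layout: one server per core, each with its own slice of the data. :)
let $endpoints :=
  for $i in 1 to 4
  return 'http://localhost:' || (8983 + $i) || '/rest/part' || $i

(: The updating query each server runs against its own database (hypothetical names). :)
let $update := 'for $e in //entry return insert node attribute checked { "yes" } into $e'

(: One zero-argument function per server, so fork-join can run them concurrently. :)
let $jobs :=
  for $url in $endpoints
  return function() {
    let $response := http:send-request(
      <http:request method='post' username='admin' password='admin'
                    auth-method='Basic' send-authorization='true'>
        <http:body media-type='application/xml'>
          <query xmlns='http://basex.org/rest'><text>{ $update }</text></query>
        </http:body>
      </http:request>,
      $url
    )
    (: the first item is the http:response element; report URL and HTTP status :)
    return $url || ': ' || $response[1]/@status
  }

return xquery:fork-join($jobs)
```

Combining the slices back into one result at the end is a separate step that this sketch does not cover.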

With that, you should be able to reduce the processing time to the time it
takes one server to process its share, which is total objects / number of
cores.
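As a worked example of that estimate, assume (hypothetically) eight cores, perfect scaling, and the 0.0072 s per operation from the table above, with N the total number of operations, c the number of cores and t_op the time per operation:

$$
T_{\text{parallel}} \approx \frac{N}{c}\, t_{\text{op}}
= \frac{2\,500\,000}{8} \times 0.0072\ \text{s}
= 312\,500 \times 0.0072\ \text{s}
= 2250\ \text{s} = 37.5\ \text{min}
$$

In practice the servers share the machine's disk and memory bandwidth, and combining the results adds time, so the real figure would likely be higher.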

Cheers,

E.

_____________________________________________

Eliot Kimber

Sr Staff Content Engineer


From: BaseX-Talk <basex-talk-boun...@mailman.uni-konstanz.de>
on behalf of Jonathan Robie <jonathan.ro...@gmail.com>
Date: Monday, February 21, 2022 at 8:44 AM
To: Liam R. E. Quin <l...@fromoldbooks.org>
Cc: BaseX <basex-talk@mailman.uni-konstanz.de>
Subject: Re: [basex-talk] Faster in the cloud?


I have a 2013 MacBook Pro with 16 GB of RAM and a 1 TB SSD. So not
entirely wimpy, but nowhere near as fast as current MacBooks, and I have no
idea how it compares to a typical laptop these days. Most things run fairly
quickly, but inserting 2.5 million attributes into a document takes perhaps 5
hours (I didn't time it). I can run that overnight, and do test runs on smaller
subsets, but I want to think through my options.

Jonathan

On Sat, Feb 19, 2022 at 6:11 PM Liam R. E. Quin
<l...@fromoldbooks.org> wrote:

On Sat, 2022-02-19 at 16:05 -0500, Jonathan Robie wrote:
> If I am running my queries and updates on a typical laptop, would
> they run much faster if I ran them on a suitably configured instance
> in the cloud?

"suitably configured" is very subjective.  Potentially your queries
could run a lot faster.

A lot depends on the speed of the disk (or SSD) in the laptop, and the
amount of memory it has, as well as the CPU: a recent MacBook Pro will
be faster than a ten-year-old chromebook.  However, server blades (the
machines used in data centres) typically have much higher bandwidth
between memory and devices including both the CPU and the long-term
storage, and likely have more physical RAM than your laptop.

On the other hand, connecting over the network to the cloud can be
slow....

Liam

--
Liam Quin, 
https://www.delightfulcomputing.com/
Transformers team, Paligo.net

Pictures from old books - www.fromoldbooks.org
