Re: [basex-talk] Performance issue with BaseX CLI

2024-04-22 Thread ANDRADE Antonio
Hi Christian,



You're right: historically, the XQuery code was executed with the Saxon 
engine. This is no longer possible without paying for a license. In addition 
to the cost incurred, this limits the replicability of the processing. This 
is why we are evaluating the BaseX solution.



I don't see how to profile XQuery code. I will carry out tests with a 
database. I will also improve the syntax of some queries. I will keep you 
informed of the results.



Thanks a lot,

Antonio



From: Christian Grün
Sent: Monday, April 22, 2024 13:45
To: ANDRADE Antonio
Cc: basex-talk@mailman.uni-konstanz.de
Subject: Re: [basex-talk] Performance issue with BaseX CLI



Hi again,



I had a quick look into the monitoring code, and I noticed two things:



1. It looks to me (correct me if I’m wrong) as if the code of the project 
was initially written for Saxon and then ported to BaseX. If you are 
interested in using BaseX, you could focus on the slow functions, try 
alternative formulations and (if you want to run the code with both 
processors in the future) ensure that Saxon still delivers good performance.



2. Some functions can be noticeably sped up (for both BaseX and Saxon) if 
you use XQuery 3.1 features such as maps or group by. For example, the 
runtime of #131014 could possibly be reduced with something similar to…



  for $ms in $Monitoring/*:MonitoringSite
  let $emsc := $ms/*:euMonitoringSiteCode
  for $ceqm in $ms/*:ChemicalEcologicalQuantitativeMonitoring
  let $V_rech := $ceqm/*:parameterCode || '/' || $ceqm/*:parameterOther
    || '/' || $ceqm/*:chemicalMatrix
  group by $group := $emsc || ': ' || $V_rech
  where count($ceqm) > 1
  return $V_rech



If BaseX turns out to be the way to go, it’s definitely worth taking 
advantage of the database aspect. In BaseX, databases are fairly 
light-weight, which means you can simply create them before running the 
queries (e.g., with a single 'CREATE DB poc 
/path/to/poc_rapportage_controle-main/xml' command) and use db:get('poc', 
'your-doc.xml') in the queries to access a document (or even stick with 
doc('your-doc.xml') if you enable DEFAULTDB [1]).



Hope this helps,

Christian



[1] https://docs.basex.org/wiki/Options#DEFAULTDB






Re: [basex-talk] Performance issue with BaseX CLI

2024-04-22 Thread Christian Grün
Hi again,

I had a quick look into the monitoring code, and I noticed two things:

1. It looks to me (correct me if I’m wrong) as if the code of the project
was initially written for Saxon and then ported to BaseX. If you are
interested in using BaseX, you could focus on the slow functions, try
alternative formulations and (if you want to run the code with both
processors in the future) ensure that Saxon still delivers good performance.

2. Some functions can be noticeably sped up (for both BaseX and Saxon) if
you use XQuery 3.1 features such as maps or group by. For example, the
runtime of #131014 could possibly be reduced with something similar to…

  for $ms in $Monitoring/*:MonitoringSite
  let $emsc := $ms/*:euMonitoringSiteCode
  for $ceqm in $ms/*:ChemicalEcologicalQuantitativeMonitoring
  let $V_rech := $ceqm/*:parameterCode || '/' || $ceqm/*:parameterOther
    || '/' || $ceqm/*:chemicalMatrix
  group by $group := $emsc || ': ' || $V_rech
  where count($ceqm) > 1
  return $V_rech
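
Since maps are mentioned alongside group by, here is a minimal sketch of the
same duplicate check written with a map. It assumes $Monitoring is bound as in
the query above and that each of the child elements occurs at most once per
entry. The names $keys and $counts are made up for illustration, and the sketch
returns the combined key rather than $V_rech.

```xquery
declare namespace map = "http://www.w3.org/2005/xpath-functions/map";
(: Assumption: $Monitoring is supplied by the caller, as in the query above. :)
declare variable $Monitoring external;

(: One composite key per ChemicalEcologicalQuantitativeMonitoring element. :)
let $keys :=
  for $ms in $Monitoring/*:MonitoringSite
  let $emsc := $ms/*:euMonitoringSiteCode
  for $ceqm in $ms/*:ChemicalEcologicalQuantitativeMonitoring
  return $emsc || ': ' || $ceqm/*:parameterCode || '/' || $ceqm/*:parameterOther
    || '/' || $ceqm/*:chemicalMatrix
(: 'combine' keeps one value per occurrence, so count() gives the frequency. :)
let $counts := map:merge(
  for $key in $keys return map:entry($key, 1),
  map { 'duplicates': 'combine' }
)
for $key in map:keys($counts)
where count($counts($key)) > 1
return $key
```

Whether this is faster than group by for a given document would have to be
measured; it is only meant to show the shape of the map-based variant.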

If BaseX turns out to be the way to go, it’s definitely worth taking
advantage of the database aspect. In BaseX, databases are fairly
light-weight, which means you can simply create them before running the
queries (e.g., with a single 'CREATE DB poc
/path/to/poc_rapportage_controle-main/xml' command) and use db:get('poc',
'your-doc.xml') in the queries to access a document (or even stick with
doc('your-doc.xml') if you enable DEFAULTDB [1]).
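
As a rough sketch of that workflow: once the database exists (created with the
CREATE DB command above), a query can read the document from it instead of
parsing the file on every run. The names 'poc' and 'your-doc.xml' are the
placeholders from this mail, db:get requires BaseX 10 or later, and the
MonitoringSite count is only an illustrative stand-in for the real checks.

```xquery
(: Assumes a database named 'poc' that contains 'your-doc.xml'. :)
let $doc := db:get('poc', 'your-doc.xml')
(: Illustrative stand-in for a real check: count the monitoring sites. :)
return count($doc//*:MonitoringSite)
```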

Hope this helps,
Christian

[1] https://docs.basex.org/wiki/Options#DEFAULTDB



Re: [basex-talk] file:path-to-native() throws an error if its argument does not exist

2024-04-22 Thread Imsieke, Gerrit, le-tex

Thank you Christian, I should have read the description of file:resolve-path(). 
It does exactly what I need.

Gerrit

On 22.04.2024 10:59, Christian Grün wrote:

Hi Gerrit,

If you don’t need the canonical path to a file resource on the file system, 
file:resolve-path may be the better choice. It can be used for both file URIs 
and local (relative or absolute) paths.

Hope this helps,
Christian


On Mon, Apr 22, 2024 at 8:00 AM Imsieke, Gerrit, le-tex <gerrit.imsi...@le-tex.de> wrote:

I have a file:// URI that corresponds to a directory that I need to create 
(using svn mkdir, therefore file:create-dir() is not an option here) if it 
doesn’t exist. Calling file:path-to-native() on it results in a file:not-found 
error. Is there a fundamental reason why the file needs to exist before 
transforming its URI into the OS-native representation? Using BaseX 10.7.

Gerrit



Re: [basex-talk] file:path-to-native() throws an error if its argument does not exist

2024-04-22 Thread Christian Grün
Hi Gerrit,

If you don’t need the canonical path to a file resource on the file system,
file:resolve-path may be the better choice. It can be used for both file
URIs and local (relative or absolute) paths.
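
A minimal sketch of how this could look for the use case in this thread. The
URI is a made-up example, and the query only reports what it would do instead
of calling svn. It relies on file URIs being accepted as described above.

```xquery
declare namespace file = "http://expath.org/ns/file";

(: Hypothetical file URI of a directory that may not exist yet. :)
let $uri  := 'file:///tmp/example/new-dir/'
(: Unlike file:path-to-native(), this does not require the target to exist. :)
let $path := file:resolve-path($uri)
return
  if (file:exists($path))
  then 'already there: ' || $path
  else 'to be created: ' || $path
```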

Hope this helps,
Christian


On Mon, Apr 22, 2024 at 8:00 AM Imsieke, Gerrit, le-tex <
gerrit.imsi...@le-tex.de> wrote:

> I have a file:// URI that corresponds to a directory that I need to create
> (using svn mkdir, therefore file:create-dir() is not an option here) if it
> doesn’t exist. Calling file:path-to-native() on it results in a
> file:not-found error. Is there a fundamental reason why the file needs to
> exist before transforming its URI into the OS-native representation? Using
> BaseX 10.7.
>
> Gerrit
>


Re: [basex-talk] Performance issue with BaseX CLI

2024-04-22 Thread Christian Grün
Hi Antonio,

As Liam indicated, you may get better performance when adding your
documents to a database.

In general, though, the runtimes of BaseX and Saxon have aligned pretty
much over the years, and I assume there’ll be a trivial reason behind the
drastic difference in the runtime.

Your test setup is probably too complex for us readers to spend more time
with it. Could you possibly share a more basic example with us, ideally
with a single document and query file?

Thanks in advance,
Christian



On Mon, Apr 22, 2024 at 8:54 AM ANDRADE Antonio wrote:

> @Liam R. E. Quin: Thanks for your feedback. The processing time is
> between 2 minutes and more than 11 hours (see table below). Thus, the
> loading time of the Java virtual machine has little impact. The main
> XQuery script loads the XML document once at the start of processing. It
> is then queried several times as part of more or less complex quality
> controls. At the moment, the XML document is not intended to be stored.
> This is why it is not loaded into a database before processing.
>
>                              Saxon                            BaseX
>                              Start     Stop      Elapsed      Start     Stop      Elapsed
> Check Monitoring 2022 FRH    06:16:54  06:19:30  00:02:36     06:44:06  10:05:21  03:21:15
> Check Multi schéma 2022 FRH  06:25:46  06:31:47  00:06:01     10:05:55  11:39:07  01:33:12
>
> From: Liam R. E. Quin
> Sent: Saturday, April 20, 2024 05:00
> To: ANDRADE Antonio; basex-talk@mailman.uni-konstanz.de
> Subject: Re: [basex-talk] Performance issue with BaseX CLI
>
> On Fri, 2024-04-19 at 10:45 +0200, ANDRADE Antonio wrote:
>
> > Hi,
> >
> > For the purposes of European Water Framework Directive reporting, I
> > compared the performances of the Saxon and BaseX XQuery engines.
>
> First, you should consider (as i think Martin said) the Java runtime
> startup time, typically a second or so.
>
> Second, BaseX is a database. If you will process the same document many
> times, first load it into a database and then use the Python BaseX client.
> This will avoid startup time, and, more importantly, will allow BaseX to
> make use of database indexes.
>
> If you will only process any given document once, then Saxon may well be
> the appropriate tool.
>
> liam
>
> --
> Liam Quin, https://www.delightfulcomputing.com/
> Available for XML/Document/Information Architecture/XSLT/
> XSL/XQuery/Web/Text Processing/A11Y training, work & consulting.
> Barefoot Web-slave, antique illustrations:  http://www.fromoldbooks.org
>


Re: [basex-talk] Performance issue with BaseX CLI

2024-04-22 Thread Liam R. E. Quin
On Mon, 2024-04-22 at 08:54 +0200, ANDRADE Antonio wrote:
> At this moment, the XML document is not intended to be stored. This
> is why it is not loaded into a database before processing.

BaseX is designed to operate primarily on documents in the database,
which is why i suggest trying that.

Otherwise it’d be like comparing Excel and Oracle Database in a
benchmark that loaded a CSV file for each query, and concluding Excel
was faster :-)

liam

-- 
Liam Quin, https://www.delightfulcomputing.com/
Available for XML/Document/Information Architecture/XSLT/
XSL/XQuery/Web/Text Processing/A11Y training, work & consulting.
Barefoot Web-slave, antique illustrations:  http://www.fromoldbooks.org


Re: [basex-talk] file:path-to-native() throws an error if its argument does not exist

2024-04-22 Thread Liam R. E. Quin
On Mon, 2024-04-22 at 08:00 +0200, Imsieke, Gerrit, le-tex wrote:
> I have a file:// URI that corresponds to a directory that I need to
> create (using svn mkdir, therefore file:create-dir() is not an option
> here) if it doesn’t exist. Calling file:path-to-native() on it
> results in a file:not-found error. Is there a fundamental reason why
> the file needs to exist before transforming its URI into the OS-
> native representation? Using BaseX 10.7.

First, as you prolly saw, the spec [1] does say it's an error if the file
does not exist.

Second, it behaves differently if it points to a directory than if it
points to a file, and symbolic links are resolved.

But i agree there doesn't seem a reason for it to be an error if the
file isn't there, except to the extent that it doesn't know whether to
append the file separator for a directory...

It’s documented as non-deterministic, though, so that's OK.

liam

[1] http://expath.org/spec/file#fn.path-to-native

-- 
Liam Quin, https://www.delightfulcomputing.com/
Available for XML/Document/Information Architecture/XSLT/
XSL/XQuery/Web/Text Processing/A11Y training, work & consulting.
Barefoot Web-slave, antique illustrations:  http://www.fromoldbooks.org