Re: [basex-talk] format-number rounding

2017-01-14 Thread Christian Grün
Hi George,

I guess you’ll need to write an extra function:

  declare function local:floor(
$number as xs:decimal,
$scale  as xs:integer
  ) as xs:decimal {
let $factor := xs:decimal(math:pow(10, $scale))
return floor($number * $factor) div $factor
  };
  local:floor($y, 3)
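
With your two values, a quick check (three digits, truncated rather
than rounded):

  local:floor(15.224134, 3)      (: 15.224 :)
  local:floor(15.2249348734, 3)  (: 15.224 :)

If you need trailing zeros in the output, you can still run the result
through format-number afterwards, e.g.
format-number(local:floor($y, 3), '0.000').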

Hope this helps,
Christian



On Fri, Jan 13, 2017 at 2:16 PM, George Sofianos  wrote:
> Hi there, I was wondering if there is a similar function to format-number,
> but without rounding, so I don't have to create a custom one that involves
> string manipulation.
>
> For example I have two values:
> let $x := 15.224134
> let $y := 15.2249348734
>
> The following command will create different output for the two values
> (result for $x will be 15.224, result for $y will be 15.225)
>
> format-number(xs:decimal($x), "0.000")
>
> Thanks,
>
> George
>


Re: [basex-talk] Losing some space entities

2017-01-14 Thread Christian Grün
Hi France,

In the database, all Unicode characters will be stored in their
standard (decoded) representation. As a result, it is not possible to
preserve entities from an original document. For XML serialization via
WebDAV, we have one special rule for converting non-breaking spaces
(xA0) to entities. Which other Unicode characters would you like to
have converted to entities?
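
As a quick illustration with standard functions only: a character
reference survives parsing merely as its decoded codepoint, and the
serializer later decides how the character is written out again:

  string-to-codepoints(parse-xml('<x>&amp;#160;</x>')/x)
  (: yields 160 – the entity itself is gone after parsing :)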

Cheers,
Christian


On Fri, Jan 13, 2017 at 7:45 PM, France Baril
 wrote:
> Hi,
>
> When I serialize content to HTML5, I lose some entities.
>
> xquery: adds '&#8205;' in front of some content and outputs html
> html: should have '&#8205;', but doesn't. I've tried with 8204 and even the
> half space 8201.
>
> The only special space that seems to work is &#160;, but it won't do for
> what I need right now.
>
> Code sample:
>
> let $target-table :=
>   copy $copy := $base-table
>   modify(
>  for $td in $copy//tr/td[position()=$column-to-filter-by]
> let $new-value := ('&#8205;', for $node in $td/node() return $node)
>  return replace value of node $td with $new-value
>   )
>   return $copy
>
> return
>   $target-table
>
>
> Is there any way to solve this?
>
>
> --
> France Baril
> Architecte documentaire / Documentation architect
> france.ba...@architextus.com


Re: [basex-talk] How to optimally and reliably go about sequentially applying updates with long running transactions?

2017-01-14 Thread Christian Grün
Hi Nicolai,

Welcome to the list.

The semantics of XQuery Update ensures that updates are only executed
at the end of query evaluation. This also means that your updates
will always be applied in the same order [1].
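
A small standalone example of this snapshot semantics:

  copy $c := <x old="1"/>
  modify (
    rename node $c as 'y',
    (: evaluated against the original snapshot, even though the
       rename is listed first; all updates are applied at the end :)
    insert node attribute new { $c/@old } into $c
  )
  return $c
  (: yields <y old="1" new="1"/> :)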

In many cases, it is possible to rewrite sequential update
operations into a single one. If that’s not possible – and in your case
it may be tricky – you will have to define them in multiple RESTXQ
functions and call them successively (see e.g. [2]).
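
A rough sketch of such a chain (all names are made up; the prefix,
redirect targets and test data are only placeholders):

  module namespace app = 'http://www.example.com/app';

  declare %updating %rest:path('/step1') function app:step1() {
    (: step 1: drop all databases with a given prefix :)
    for $db in db:list()[starts-with(., 'app-')]
    return db:drop($db),
    (: the client is redirected once all updates have been applied :)
    db:output(web:redirect('/step2'))
  };

  declare %updating %rest:path('/step2') function app:step2() {
    (: step 2: repopulate (e.g. from your SQL query), then go on :)
    db:create('app-1', <data/>, 'data.xml'),
    db:output(web:redirect('/step3'))
  };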

Feel free to provide us with a minimized example, and I can give you
some more hints.

Cheers,
Christian

[1] http://docs.basex.org/wiki/XQuery_Update#Concepts
[2] http://docs.basex.org/wiki/RESTXQ#Response


On Sat, Jan 14, 2017 at 12:52 AM, Mustard Seed
 wrote:
> I'm new to BaseX and XQuery, so I apologize if there's some glaring thing
> I've missed.
>
> What I have are three updating functions that I want to execute against
> BaseX sequentially and reliably.
>
> When I simply place them in my RESTXQ function, delimited by commas, they
> all execute, but since at times certain ones take longer than others, and
> since their results can overwrite/negate the results of each other, I can't
> manage to get them to consistently execute in the same order.
>
> It seems to me that, since everything seemingly runs in parallel, whatever
> happens to finish last determines my final state.
>
> I imagine there's got to be some blatantly obvious way to do this, but I'm
> just not finding it. I went through the BaseX manual a second time and have
> searched every possible combination of things I can think of to try and
> figure out a way around this.
>
>
> One function does a mass delete of all dbs with a given prefix, another
> repopulates from an SQL query, and a third updates the repopulated dbs based
> on yet another SQL query. That's the order I want them to execute in; I'm
> just about at my wits' end trying to figure out how to do this. I even made
> a hackish attempt at scheduling jobs based on a rough guess of the longest
> expected execution times for the functions. But even that wasn't working.
>
> Any help will be greatly appreciated, as ensuring the order of execution is
> rather vital for the overall project: I need to not only find a solution for
> the present use case, but also understand how to ensure the same for any
> future composite functions.
>
> Thanks,
>
> --Nicolai


Re: [basex-talk] How to optimally and reliably go about sequentially applying updates with long running transactions?

2017-01-14 Thread Christian Grün
PS: I have added a little example to our documentation that
demonstrates how RESTXQ redirection works:

  http://docs.basex.org/wiki/Web_Module#web:redirect


On Sat, Jan 14, 2017 at 10:40 AM, Christian Grün
 wrote:
> [...]


Re: [basex-talk] Severe performance degradation when persisting more than 5,000 XML data structures of 160 KB each

2017-01-14 Thread Christian Grün
Hi Lucian,

I have a hard time reproducing the reported behavior. The attached,
revised Java example (without AUTOFLUSH) required around 30 ms for the
first documents and 120 ms for the last documents, which is still
pretty far from what you’ve been encountering:

> it would go from ~ 10 ms at the beginning up to ~ 2500 ms

But obviously something weird has been going on in your setup. Let’s
see what alternatives we have…

• Could you possibly try to update my example code such that it shows
the reported behavior? Ideally with small input, in order to speed up
the process. Maybe the runtime increase can also be demonstrated after
1,000 or 10,000 documents...
• You could also send me a list of the files of your test_database
directory; maybe the file sizes indicate some unusual patterns.
• You could start BaseXServer with the JVM flag -Xrunhprof:cpu=samples
(to be inserted in the basexserver script), start the server, run your
script, stop the server directly afterwards, and send me the result
file, which will be stored in the directory from where you started
BaseX (java.hprof.txt).
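
As an aside: if constant insert times matter more than immediate
durability, you can switch off AUTOFLUSH and flush the data once at
the end. A sketch with a hypothetical database and paths:

  SET AUTOFLUSH OFF
  OPEN test_database
  ADD /path/to/document1.xml
  ADD /path/to/document2.xml
  FLUSH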

Best,
Christian


On Wed, Jan 11, 2017 at 4:57 PM, Christian Grün
 wrote:
> Hi Lucian,
>
> Thanks for your analysis. Indeed I’m wondering about the monotonic
> delay caused by auto flushing the data; this hasn’t always been the
> case. I’m wondering even more why no one else noticed this in recent
> time.. Maybe it’s not too long ago that this was introduced. It may
> take some time to find the culprit, but I’ll keep you updated.
>
> All the best,
> Christian
>
>
> On Wed, Jan 11, 2017 at 2:46 PM, Bularca, Lucian
>  wrote:
>> Hi Christian,
>>
>> I've made a comparison of the persistence time series running your example
>> code and mine, in all possible combinations of the following scenarios:
>> - with and without "set intparse on"
>> - using my prepared test data and your test data
>> - closing and reopening the DB connection after every n-th insert operation
>> (where n in {5, 100, 500, 1000})
>> - with and without "set autoflush on".
>>
>> I finally found out that the only relevant variable influencing the
>> insert operation duration is the value of the AUTOFLUSH option.
>>
>> If AUTOFLUSH = OFF when opening a database, the persistence durations
>> remain relatively constant (on my machine about 43 ms) during the entire
>> sequence of insert operations (50,000 or 100,000 times), for all possible
>> combinations named above.
>>
>> If AUTOFLUSH = ON when opening a database, the persistence durations
>> increase monotonically, for all possible combinations named above.
>>
>> The persistence duration, if AUTOFLUSH = ON, is directly proportional to
>> the number of DB clients executing these insert operations and to the
>> length of the sequence of insert operations executed by a DB client.
>>
>> In my opinion, this behaviour is an issue in BaseX, because AUTOFLUSH is
>> implicitly set to ON (see the BaseX documentation,
>> http://docs.basex.org/wiki/Options#AUTOFLUSH), so DB clients must
>> explicitly set AUTOFLUSH = OFF in order to keep the insert operation
>> durations relatively constant over time. Additionally, not flushing data
>> explicitly increases the risk of data loss (see the same documentation
>> page), but clients who repeatedly execute the FLUSH command increase the
>> durations of the subsequent insert operations.
>>
>> Regards,
>> Lucian
>>
>> 
>> From: Christian Grün [christian.gr...@gmail.com]
>> Sent: Tuesday, January 10, 2017 17:33
>> To: Bularca, Lucian
>> Cc: Dirk Kirsten; basex-talk@mailman.uni-konstanz.de
>> Subject: Re: [basex-talk] Severe performance degradation when persisting
>> more than 5,000 XML data structures of 160 KB each
>>
>> Hi Lucian,
>>
>> I couldn’t run your code example out of the box. 24 hours sounds
>> pretty alarming, though, so I have written my own example (attached).
>> It creates 50.000 XML documents, each sized around 160 KB. It’s not as
>> fast as I had expected, but the total runtime is around 13 minutes,
>> and it only slow down a little when adding more documents...
>>
>> 1: 125279.45 ms
>> 2: 128244.23 ms
>> 3: 130499.9 ms
>> 4: 132286.05 ms
>> 5: 134814.82 ms
>>
>> Maybe you could compare the code with yours, and we can find out what
>> causes the delay?
>>
>> Best,
>> Christian
>>
>>
>> On Tue, Jan 10, 2017 at 4:44 PM, Bularca, Lucian
>>  wrote:
>>> Hi Dirk,
>>>
>>> Of course, querying millions of data entries in a single database raises
>>> problems. This is equally problematic for all databases, not only for
>>> BaseX, and certain storage strategies will be mandatory at production
>>> time.
>>>
>>> The actual problem is that adding 50,000 XML structures of 160 KB took 24
>>> hours, because of that inexplicable monotonic increase of the insert
>>> operation durations.
>>>
>>> I'd really appreciate it if someone could explain this behaviour or
>>> provide a counterexample […]

Re: [basex-talk] Severe performance degradation when persisting more than 5,000 XML data structures of 160 KB each

2017-01-14 Thread Bram Vanroy | KU Leuven
Possibly related, but I'm not sure:

When creating millions of databases in a loop in the same session, I found that
after some thousands I'd get an OOM error from BaseX. This seemed odd to me,
because after each iteration the database creation query was closed (and I'd
expect GC to run at such a time?). To bypass this, I just closed the session
and opened a new one every couple of thousand iterations of the loop.

Maybe there is a (small) memory leak somewhere in BaseX that only becomes
noticeable (and annoying) after hundreds of thousands or even millions of
queries?
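
For what it's worth, the recycling workaround looks roughly like this
(a sketch; class and package names as in the BaseX 8 Java client API,
host and credentials to be adjusted):

import org.basex.api.client.ClientSession;

public class RecycleSessions {
  public static void main(String[] args) throws Exception {
    ClientSession session = new ClientSession("localhost", 1984, "admin", "admin");
    for(int i = 1; i <= 1000000; i++) {
      session.execute("CREATE DB db" + i);
      // recycle the session every few thousand databases
      if(i % 5000 == 0) {
        session.close();
        session = new ClientSession("localhost", 1984, "admin", "admin");
      }
    }
    session.close();
  }
}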

-----Original Message-----
From: basex-talk-boun...@mailman.uni-konstanz.de
[mailto:basex-talk-boun...@mailman.uni-konstanz.de] On Behalf Of Christian Grün
Sent: Saturday, January 14, 2017 12:09
To: Bularca, Lucian
CC: basex-talk@mailman.uni-konstanz.de
Subject: Re: [basex-talk] Severe performance degradation when persisting
more than 5,000 XML data structures of 160 KB each

[...]

Re: [basex-talk] Losing some space entities

2017-01-14 Thread France Baril
I use &#160; (the non-breaking space) a lot. It's used widely in French.

I wanted:

&zwnj; - &#8204;
&zwj; - &#8205;

I've had clients who have asked for the thin space (&#8201;) in the past... I
nicely told them 'no'.

I've resolved my joiner issue with an alternate solution for now. But if
you are opening up the can of worms, and it's not much more work, why not
implement all the space and joiner entities &#8192; to &#8205;?
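
In case it helps anyone else in the meantime: one way to keep such a
reference in the output, with standard functions only, is to re-encode
the character after serialization. A sketch, assuming $target-table
from my earlier sample and map-based serialization parameters:

  let $html := serialize($target-table, map { 'method': 'html' })
  (: '&amp;#8205;' is XQuery notation for the literal text &#8205; :)
  return replace($html, codepoints-to-string(8205), '&amp;#8205;')

Not pretty, but the reference survives the trip to the browser.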

Regards,

France

On Sat, Jan 14, 2017 at 4:34 AM, Christian Grün 
wrote:

> [...]



-- 
France Baril
Architecte documentaire / Documentation architect
france.ba...@architextus.com