Re: [basex-talk] BaseX optimizer performance on REx-generated parser

2016-04-04 Thread Gunther Rademacher
Hi Christian,

thanks for reviewing the code!

REx v5.38 is now online and it includes the -basex option. Use it as follows:

   - create some grammar G.ebnf, e.g. by using a sample grammar from
     http://bottlecaps.de/rex/

   - create a parser by generating it on http://bottlecaps.de/rex/ using
     command line options: -tree -java -basex

   - compile G.java and make the class files available on the BaseX classpath

   - invoke the parser from XQuery, e.g.

       declare namespace p="G";
       p:parse-S($input)

     where S is a start symbol of the grammar.

   - for an XQuery-coded parser, use REx options: -tree -xquery

   - for just a syntax checker, with no parse tree, omit option: -tree

If you encounter any problems with this, please let me know.

Also please note that I have restored the type of p:match2 in generated
XQuery code, which caused a bad performance problem with earlier BaseX
versions (no longer with 8.4.2). p:transition and some others still
come without a type spec, but they are not known to cause any problem
and this saves me adding another case distinction to code generation.

Best regards
Gunther
 

Sent: Monday, 04 April 2016 at 19:36
From: "Christian Grün" 
To: "Gunther Rademacher" 
Cc: BaseX 
Subject: Re: [basex-talk] BaseX optimizer performance on REx-generated parser
Hi Gunther,

Your code looks excellent. And I was amazed to see you’ve even found
out how to build the error node via our MemBuilder code (which is not
really well-documented, compared to real APIs)… I’ll be happy to play
around with it and give more feedback once the -basex option is
available.

Thanks!
Christian


Re: [basex-talk] BaseX optimizer performance on REx-generated parser

2016-04-04 Thread Christian Grün
Hi Gunther,

Your code looks excellent. And I was amazed to see you’ve even found
out how to build the error node via our MemBuilder code (which is not
really well-documented, compared to real APIs)… I’ll be happy to play
around with it and give more feedback once the -basex option is
available.

Thanks!
Christian


On Mon, Apr 4, 2016 at 8:07 AM, Gunther Rademacher  wrote:
> Hi Christian,
>
> thank you for the tree builder proposal, it works fine indeed.
>
> I have slightly modified the extension function such that it behaves the
> same as generated XQuery code, so it can be used to replace it without further
> adaptations of the code that calls it.
>
> Also I have used Str rather than String, in order to create a unique signature
> identifying a BaseX extension function.
>
> Finally, the call of the parser's parse_x method was isolated in order to
> prepare for multiple extension functions in a single class. This occurs when
> there are multiple start symbols in a grammar.
>
> The modified code is attached to this mail. It is stripped down to what
> would be added to REx-generated code for '-basex'.
>
> Best regards
> Gunther
>
>
> Sent: Friday, 01 April 2016 at 17:57
> From: "Christian Grün" 
> To: "Gunther Rademacher" 
> Cc: BaseX 
> Subject: Re: [basex-talk] BaseX optimizer performance on REx-generated parser
> Hi Gunther,
>
> Thanks again! Thanks to your examples, which create 38 MB of
> serialized XML, I now see why it is in fact beneficial to use a tree
> builder ;)
>
> I finally looked at your Saxon code a bit closer, and I rewrote it a
> bit to work with BaseX:
>
> * I added a parse(String query) function, which basically does what
> ExtensionFunctionCall.call does
> * I renamed SaxonTreeBuilder to BaseXTreeBuilder, which now calls the
> appropriate BaseX builder functions
> * The TopDownTreeBuilder stays unchanged
>
> I have attached the resulting code; it seems to be much faster indeed.
> Does it make any sense to you? Do you think it would make sense to
> provide both a Saxon and BaseX option on your parser page?
>
> Christian
>
>
>
> On Fri, Apr 1, 2016 at 12:32 AM, Gunther Rademacher  wrote:
>> Hi Christian,
>>
>> please find my code attached. I have tested it along with an XQuery 3.1
>> parser, that was generated using command line options:
>>
>> -tree -main -java -saxon
>>
>> It contains the DOM tree builder, as well as your approach using
>> XmlSerializer followed by XML parsing, both for BaseX and for Saxon.
>>
>> In my tests I have parsed the XQuery code for the same grammar, roughly
>> 1 MB, and counted nodes of the parse tree.
>>
>> These are the commands that I have used:
>>
>> java org.basex.BaseX -q "declare namespace p='java:XQueryParser'; 
>> p:parseXQueryToDOM(unparsed-text('file:///C:/temp/CR-xquery-31-20151217.xquery'))/count(descendant-or-self::node())"
>> java org.basex.BaseX -q "declare namespace p='java:XQueryParser'; 
>> p:parseXQueryToDBNode(unparsed-text('file:///C:/temp/CR-xquery-31-20151217.xquery'))/count(descendant-or-self::node())"
>> java net.sf.saxon.Query -qs:"declare namespace p='java:XQueryParser'; 
>> p:parseXQueryToDOM(unparsed-text('file:///C:/temp/CR-xquery-31-20151217.xquery'))/count(descendant-or-self::node())"
>> java net.sf.saxon.Query -init:XQueryParser$SaxonInitializer -qs:"declare 
>> namespace p='XQueryParser'; 
>> p:parseXQueryToNodeInfo(unparsed-text('file:///C:/temp/CR-xquery-31-20151217.xquery'))/count(descendant-or-self::node())"
>> java net.sf.saxon.Query -init:CR_xquery_31_20151217$SaxonInitializer 
>> -qs:"declare namespace p='CR_xquery_31_20151217'; 
>> p:parse-XQuery(unparsed-text('file:///C:/temp/CR-xquery-31-20151217.xquery'))/count(descendant-or-self::node())"
>>
>> And here are the results (best runtime in seconds out of several executions):
>>
>>                | BaseX | SaxonEE
>> ---------------+-------+--------
>> DOM builder    | 4.48  | 2.98
>> parseXml       | 3.57  | 3.24
>> native builder | -     | 2.36
>>
>> As you expected, using DOM seems not to be advantageous for BaseX. However
>> the Saxon results suggest that a native tree builder API can do better than
>> parsing XML.
>>
>> Best regards
>> Gunther
>>
>>
>> Sent: Thursday, 31 March 2016 at 15:01
>> From: "Christian Grün" 
>> To: "Gunther Rademacher" 
>> Cc: BaseX 
>> Subject: Re: [basex-talk] BaseX optimizer performance on REx-generated parser
>> Hi Gunther,
>>
>>> I am busy right now, but will be able to present some code tonight.
>>
>> Thanks! Take your time.
>>
>>> Is there a different tree model than DOM, that you would prefer for BaseX?
>>
>> I assume that the difference between DOM and String inputs will be
>> marginal. If the method will be called from XQuery, one of the fastest
>> solutions is probably to write everything to a temporary string or
>> byte array and create an XQuery node 

Re: [basex-talk] Matching multiple names across a list of sequences of names

2016-04-04 Thread Christian Grün
Hi Graydon,

> I can't give you a real example because it's the client's health care data,

No problem, your example looks fine.

> let $found := //*[@name eq $match(1)][./descendant::*[@name eq
> $match(2)][./descendant::*[@name eq $match(3)]]]

Right. You could try to rewrite this for index access:

1. You’ll have to mark the generated arrays as string arrays:

   let $composedNames as array(xs:string)* :=
     for $x in $composed//composed
     return array { tokenize($x/string(), '\.') }

2. You need to replace "eq" with "=", and you can simplify the
predicates a little:

   let $found := //*[@name = $match(1)]
                    [descendant::*/@name = $match(2)]
                    [descendant::*/@name = $match(3)]

You indicated that you’ll have thousands of paths. What do they look
like? Could you add some more examples (besides
"class.operation.specifier")? Are some parts of the paths more
specific than others? E.g...

   A.A.A
   A.A.B
   A.A.C
   A.B.D
   A.B.E
   A.B.F
   ...

In this case, it could make sense to only look for the last path
segment via the index. You could also try to group your results by the
first segment, then do the search on the second segment, etc. See my
attached query as an example (I’m sure it needs to be revised to work
properly, because I have only run it with your simple example file).
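
To make the grouping idea concrete outside of XQuery, here is a minimal Java sketch. It is not the attached query. The class name TripleMatch and its three methods are hypothetical, and it assumes each path is available as an ordered list of name strings from root to leaf. Compound names are grouped by their first segment, so only the candidates whose first segment actually occurs in a path are checked for an in-order match.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper class; all names here are illustrative only.
class TripleMatch {

    // True if the names in `triple` occur in `path` in the same order
    // (not necessarily adjacent), i.e. triple is a subsequence of path.
    static boolean isSubsequence(List<String> triple, List<String> path) {
        int next = 0;
        for (String name : path) {
            if (next < triple.size() && name.equals(triple.get(next))) {
                next++;
            }
        }
        return next == triple.size();
    }

    // Groups dotted names such as "class.operation.specifier"
    // by their first segment.
    static Map<String, List<List<String>>> groupByFirst(List<String> dotted) {
        Map<String, List<List<String>>> groups = new HashMap<>();
        for (String d : dotted) {
            List<String> parts = Arrays.asList(d.split("\\."));
            groups.computeIfAbsent(parts.get(0), k -> new ArrayList<>()).add(parts);
        }
        return groups;
    }

    // Returns the dotted names that match the given path. Only groups whose
    // first segment occurs somewhere in the path are examined.
    static List<String> matches(Map<String, List<List<String>>> groups,
                                List<String> path) {
        Set<String> found = new LinkedHashSet<>();
        for (int i = 0; i < path.size(); i++) {
            List<List<String>> candidates = groups.get(path.get(i));
            if (candidates == null) {
                continue;
            }
            List<String> rest = path.subList(i, path.size());
            for (List<String> triple : candidates) {
                if (isSubsequence(triple, rest)) {
                    found.add(String.join(".", triple));
                }
            }
        }
        return new ArrayList<>(found);
    }
}
```

Whether this pays off depends on how selective the first segment is. If nearly all names share the same first segment, grouping by the last segment instead would probably prune more.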

Does this help?
Christian




>
> This works, but it's going over the entire database for every three part
> class-operation-specifier compound name.  I can't shake the feeling that
> there's a more efficient way to do this, but I can't see what it might be.
>
> Thanks!
> Graydon
>
> On Fri, Apr 1, 2016 at 12:04 PM, Christian Grün 
> wrote:
>>
>> Hi Graydon,
>>
>> Do you think there’d be a chance for us to get a minimized,
>> self-contained example, which demonstrates the n^2 solution?
>>
>> Thanks in advance,
>> Christian
>>
>>
>>
>> On Fri, Apr 1, 2016 at 5:24 PM, Graydon Saunders 
>> wrote:
>> > Hello -
>> >
>> > I've got a problem I'm not sure how to best approach.
>> >
>> > I've got triplets of names -- class.operation.specifier -- that I need
>> > to
>> > match against much longer sequences of names. (Which are in attributes
>> > in an
>> > XML hierarchy; each sequence of names derives from a path to a leaf
>> > element.)
>> >
>> > If there is a match (as there usually is not) one of the names in the
>> > sequence of names will match to the class, a subsequent name to the
>> > operation,  and a name subsequent to that match to the specifier. (All
>> > simple string values.)
>> >
>> > The naive n^2 version is much too slow for the amount of data involved.
>> >
>> > Is there an efficient way to do this kind of matching?
>> >
>> > Thanks!
>> > Graydon
>
>


threePartMatchesEG-Grouping.xq
Description: Binary data


Re: [basex-talk] BaseX XQuery Validation in Oxygen

2016-04-04 Thread Christian Grün
Hi Günter,

feel free to have a look at the Argon project [1]. For basic oXygen
integration, our Wiki page may be of interest as well [2].

Cheers,
Christian

[1] http://argon-author.com/
[2] http://docs.basex.org/wiki/Integrating_oXygen




On Sat, Apr 2, 2016 at 2:25 PM, kleist-digital  wrote:
> Hi all, hi Christian,
>
> in BaseX-specific cases, the Oxygen XQuery validation, which is based on
> the Saxon engine, doesn’t work. For example,
>
> import module namespace kleist = "http://kleist-digital.de/ns/kleist";
>
> gives the feedback: Failed to resolve URI of imported module: Cannot locate 
> module for namespace http://kleist-digital.de/ns/kleist
>
> Is there any way to use the BaseX XQuery parser inside of Oxygen?
>
> Best regards,
> Günter
>
>


Re: [basex-talk] node-basex

2016-04-04 Thread Andy Bunce
Hi again Günter,

>Just saw, that you’ve also written a node-based Basex-Client

There are quite a few https://www.npmjs.com/search?q=basex

>Are there any advantages (disadvantages) to change to a classical
>Server-Client Scenario

I think it all depends on what you want to do.
I wrote my client before RESTXQ existed; now I find I can do (almost)
everything I want from BaseX.
So my client has not had much attention recently, and its handling of
streaming and various edge cases could certainly be improved.

If I was looking to connect Node and BaseX now I would look at an HTTP
interface to RESTXQ, but this may not be the most performant,
resource-light option.

/Andy

On 3 April 2016 at 21:46, kleist-digital  wrote:

> Hi Andy,
>
> me again. Just saw that you’ve also written a Node-based BaseX client. My
> frontend is a Node app. So far I’m using AJAX calls to communicate via the
> REST interface with the BaseX server; in general I use the run command to
> execute different queries.
>
> Are there any advantages (disadvantages) to change to a classical
> Server-Client Scenario (performance, less resources, security)? You know, I
> want to use it in an OpenShift-Environment for the future.
>
> Thanks for any advice.
>
> Günter
>


[basex-talk] Issue with db:add via restxq in 8.4.2

2016-04-04 Thread Vincenzo Cestone
Hi guys,

in the new basex 8.4.2 version I cannot write on the db via restxq.

For example if you create an empty 'testdb',
set in the .basex conf file the option MIXUPDATES = true,
and call the '/dbaddtest' restxq implemented as in the following:
---
module namespace o = "test";

declare
  %rest:path("/dbaddtest")
  %rest:GET
function o:dbaddtest() {
  let $o := db:add('testdb', , '111.xml')
  return 
};
---
it will not work in the basex 8.4.2 i.e. the document is not added to the
db, but it will return  without errors.
On the contrary the db:add executed in the BaseX.jar client is working well.

And also the above restxq code is working as expected in the basex 8.3.1
version.

Thank you in advance.

Regards,
Vincenzo


Re: [basex-talk] BaseX optimizer performance on REx-generated parser

2016-04-04 Thread Gunther Rademacher
Hi Christian,

thank you for the tree builder proposal, it works fine indeed.

I have slightly modified the extension function such that it behaves the 
same as generated XQuery code, so it can be used to replace it without further 
adaptations of the code that calls it.

Also I have used Str rather than String, in order to create a unique signature
identifying a BaseX extension function.

Finally, the call of the parser's parse_x method was isolated in order to 
prepare for multiple extension functions in a single class. This occurs when
there are multiple start symbols in a grammar.
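
As a rough illustration of that last point (a simplified sketch, not the attached code): the class SketchParser and its methods below are hypothetical stand-ins for a generated parser with two start symbols A and B, and plain String is used here instead of Str and the BaseX node builder, so that the sketch runs without BaseX on the classpath. Everything except the parse_x call lives in one shared helper, and each start symbol gets a thin entry point.

```java
import java.util.function.Consumer;

// Hypothetical stand-in for a generated parser with start symbols A and B.
class SketchParser {
    private final String input;
    private final StringBuilder tree = new StringBuilder();

    SketchParser(String input) {
        this.input = input;
    }

    // Stand-ins for the generated parse_x methods, one per start symbol.
    void parse_A() { tree.append("<A>").append(input).append("</A>"); }
    void parse_B() { tree.append("<B>").append(input).append("</B>"); }

    // Shared driver: set-up and result handling are written once;
    // only the parse_x call differs between start symbols.
    private static String run(String input, Consumer<SketchParser> start) {
        SketchParser parser = new SketchParser(input);
        start.accept(parser); // the isolated parse_x call
        return parser.tree.toString();
    }

    // One thin entry point per start symbol.
    static String parseA(String input) { return run(input, SketchParser::parse_A); }
    static String parseB(String input) { return run(input, SketchParser::parse_B); }
}
```

Calling SketchParser.parseA("x") here yields the string "<A>x</A>". In the real extension functions the result would be a node built via the tree builder rather than a string, and error handling is left out of the sketch.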

The modified code is attached to this mail. It is stripped down to what
would be added to REx-generated code for '-basex'.

Best regards
Gunther


Sent: Friday, 01 April 2016 at 17:57
From: "Christian Grün" 
To: "Gunther Rademacher" 
Cc: BaseX 
Subject: Re: [basex-talk] BaseX optimizer performance on REx-generated parser
Hi Gunther,

Thanks again! Thanks to your examples, which create 38 MB of
serialized XML, I now see why it is in fact beneficial to use a tree
builder ;)

I finally looked at your Saxon code a bit closer, and I rewrote it a
bit to work with BaseX:

* I added a parse(String query) function, which basically does what
ExtensionFunctionCall.call does
* I renamed SaxonTreeBuilder to BaseXTreeBuilder, which now calls the
appropriate BaseX builder functions
* The TopDownTreeBuilder stays unchanged

I have attached the resulting code; it seems to be much faster indeed.
Does it make any sense to you? Do you think it would make sense to
provide both a Saxon and BaseX option on your parser page?

Christian



On Fri, Apr 1, 2016 at 12:32 AM, Gunther Rademacher  wrote:
> Hi Christian,
>
> please find my code attached. I have tested it along with an XQuery 3.1
> parser, that was generated using command line options:
>
> -tree -main -java -saxon
>
> It contains the DOM tree builder, as well as your approach using
> XmlSerializer followed by XML parsing, both for BaseX and for Saxon.
>
> In my tests I have parsed the XQuery code for the same grammar, roughly
> 1 MB, and counted nodes of the parse tree.
>
> These are the commands that I have used:
>
> java org.basex.BaseX -q "declare namespace p='java:XQueryParser'; 
> p:parseXQueryToDOM(unparsed-text('file:///C:/temp/CR-xquery-31-20151217.xquery'))/count(descendant-or-self::node())"
> java org.basex.BaseX -q "declare namespace p='java:XQueryParser'; 
> p:parseXQueryToDBNode(unparsed-text('file:///C:/temp/CR-xquery-31-20151217.xquery'))/count(descendant-or-self::node())"
> java net.sf.saxon.Query -qs:"declare namespace p='java:XQueryParser'; 
> p:parseXQueryToDOM(unparsed-text('file:///C:/temp/CR-xquery-31-20151217.xquery'))/count(descendant-or-self::node())"
> java net.sf.saxon.Query -init:XQueryParser$SaxonInitializer -qs:"declare 
> namespace p='XQueryParser'; 
> p:parseXQueryToNodeInfo(unparsed-text('file:///C:/temp/CR-xquery-31-20151217.xquery'))/count(descendant-or-self::node())"
> java net.sf.saxon.Query -init:CR_xquery_31_20151217$SaxonInitializer 
> -qs:"declare namespace p='CR_xquery_31_20151217'; 
> p:parse-XQuery(unparsed-text('file:///C:/temp/CR-xquery-31-20151217.xquery'))/count(descendant-or-self::node())"
>
> And here are the results (best runtime in seconds out of several executions):
>
>                | BaseX | SaxonEE
> ---------------+-------+--------
> DOM builder    | 4.48  | 2.98
> parseXml       | 3.57  | 3.24
> native builder | -     | 2.36
>
> As you expected, using DOM seems not to be advantageous for BaseX. However
> the Saxon results suggest that a native tree builder API can do better than
> parsing XML.
>
> Best regards
> Gunther
>
>
> Sent: Thursday, 31 March 2016 at 15:01
> From: "Christian Grün" 
> To: "Gunther Rademacher" 
> Cc: BaseX 
> Subject: Re: [basex-talk] BaseX optimizer performance on REx-generated parser
> Hi Gunther,
>
>> I am busy right now, but will be able to present some code tonight.
>
> Thanks! Take your time.
>
>> Is there a different tree model than DOM, that you would prefer for BaseX?
>
> I assume that the difference between DOM and String inputs will be
> marginal. If the method will be called from XQuery, one of the fastest
> solutions is probably to write everything to a temporary string or
> byte array and create an XQuery node representation (which is an
> instance of DBNode in BaseX):
>
> import org.basex.io.IO;
> import org.basex.query.value.node.DBNode;
>
> static DBNode parseXml() throws Exception {
> String input = "";
> return new DBNode(IO.get(input));
> }
>
> Thinking about this, I noticed that my previous parse-xquery.xq
> example will be executed faster (from 5ms to 2ms if executed
> repeatedly) if fn:parse-xml is replaced with
> fn:parse-xml-fragment. This is why our internal XML parser instead of
> Java’s default XML parser is used for the