Re: [basex-talk] BaseX optimizer performance on REx-generated parser
Hi Christian,

thanks for reviewing the code!

REx v5.38 is now online and it includes the -basex option. Use it as follows:

- create some grammar G.ebnf, e.g. by using a sample grammar from http://bottlecaps.de/rex/
- create a parser by generating it on http://bottlecaps.de/rex/ using command line options:

    -tree -java -basex

- compile G.java and make the class files available on the BaseX classpath
- invoke the parser from XQuery, e.g.

    declare namespace p="G";
    p:parse-S($input)

  where S is a start symbol of the grammar.
- for an XQuery-coded parser, use REx options:

    -tree -xquery

- for just a syntax checker, with no parse tree, omit option:

    -tree

If you encounter any problems with this, please let me know.

Also please note that I have restored the type of p:match2 in generated XQuery code, which caused bad performance problems with earlier BaseX versions (no longer with 8.4.2). p:transition and some others still come without a type spec, but they are not known to cause any problems, and this saves me adding another case distinction to code generation.

Best regards
Gunther

Sent: Monday, 04 April 2016 at 19:36
From: "Christian Grün"
To: "Gunther Rademacher"
Cc: BaseX
Subject: Re: [basex-talk] BaseX optimizer performance on REx-generated parser

Hi Gunther,

Your code looks excellent. And I was amazed to see you’ve even found out how to build the error node via our MemBuilder code (which is not really well-documented, compared to real APIs)…

I’ll be happy to play around with it and give more feedback once the -basex option is available.

Thanks!
Christian
Re: [basex-talk] BaseX optimizer performance on REx-generated parser
Hi Gunther,

Your code looks excellent. And I was amazed to see you’ve even found out how to build the error node via our MemBuilder code (which is not really well-documented, compared to real APIs)…

I’ll be happy to play around with it and give more feedback once the -basex option is available.

Thanks!
Christian

On Mon, Apr 4, 2016 at 8:07 AM, Gunther Rademacher wrote:
> Hi Christian,
>
> thank you for the tree builder proposal, it works fine indeed.
>
> I have slightly modified the extension function such that it behaves the
> same as generated XQuery code, so it can be used to replace it without
> further adaptations of the code that calls it.
>
> Also I have used Str rather than String, in order to create a unique
> signature identifying a BaseX extension function.
>
> Finally, the call of the parser's parse_x method was isolated in order to
> prepare for multiple extension functions in a single class. This occurs when
> there are multiple start symbols in a grammar.
>
> The modified code is attached to this mail. It is stripped down to what
> would be added to REx-generated code for '-basex'.
>
> Best regards
> Gunther
Re: [basex-talk] Matching multiple names across a list of sequences of names
Hi Graydon,

> I can't give you a real example because it's the client's health care data,

No problem, your example looks fine.

> let $found := //*[@name eq $match(1)][./descendant::*[@name eq
> $match(2)][./descendant::*[@name eq $match(3)]]]

Right. You could try to rewrite this for index access:

1. You’ll have to mark the generated arrays as string arrays:

  let $composedNames as array(xs:string)* :=
    for $x in $composed//composed
    return array { tokenize($x/string(), '\.') }

2. You need to replace "eq" with "=", and you can simplify the predicates a little:

  let $found := //*[@name = $match(1)]
    [descendant::*/@name = $match(2)]
    [descendant::*/@name = $match(3)]

You indicated that you’ll have thousands of paths. What do they look like? Could you add some more examples (besides "class.operation.specifier")? Are some parts of the paths more specific than others? E.g.:

  A.A.A
  A.A.B
  A.A.C
  A.B.D
  A.B.E
  A.B.F
  ...

In this case, it could make sense to only look for the last path segment via the index.

You could also try to group your results by the first segment, then do the search on the second segment, etc. See my attached query as an example (I’m sure it needs to be revised to work properly, because I have only run it with your simple example file).

Does this help?
Christian

> This works, but it's going over the entire database for every three-part
> class-operation-specifier compound name. I can't shake the feeling that
> there's a more efficient way to do this, but I can't see what it might be.
>
> Thanks!
> Graydon
>
> On Fri, Apr 1, 2016 at 12:04 PM, Christian Grün wrote:
>>
>> Hi Graydon,
>>
>> Do you think there’d be a chance for us to get a minimized,
>> self-contained example, which demonstrates the n^2 solution?
>>
>> Thanks in advance,
>> Christian
>>
>> On Fri, Apr 1, 2016 at 5:24 PM, Graydon Saunders wrote:
>> > Hello -
>> >
>> > I've got a problem I'm not sure how to best approach.
>> >
>> > I've got triplets of names -- class.operation.specifier -- that I need
>> > to match against much longer sequences of names. (Which are in
>> > attributes in an XML hierarchy; each sequence of names derives from a
>> > path to a leaf element.)
>> >
>> > If there is a match (as there usually is not) one of the names in the
>> > sequence of names will match to the class, a subsequent name to the
>> > operation, and a name subsequent to that match to the specifier. (All
>> > simple string values.)
>> >
>> > The naive n^2 version is much too slow for the amount of data involved.
>> >
>> > Is there an efficient way to do this kind of matching?
>> >
>> > Thanks!
>> > Graydon

Attachment: threePartMatchesEG-Grouping.xq
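The grouping idea from this thread (group the dotted names by their first segment, then only test the remaining segments where the first one has been seen) can be sketched outside XQuery as well. The following is a minimal Java sketch under stated assumptions, not the attached query: the class and method names (TripletMatcher, groupByFirstSegment, restFollows, matches) are hypothetical, and each path is assumed to be available as a list of name strings ordered from the root to a leaf.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper class; nothing here is part of BaseX.
class TripletMatcher {

    /** Groups dotted names such as "class.operation.specifier" by their first segment. */
    static Map<String, List<String[]>> groupByFirstSegment(List<String> dottedNames) {
        Map<String, List<String[]>> groups = new HashMap<>();
        for (String dotted : dottedNames) {
            String[] parts = dotted.split("\\.");
            groups.computeIfAbsent(parts[0], k -> new ArrayList<>()).add(parts);
        }
        return groups;
    }

    /** True if parts[from..] occur in the path at or after index start, in order but not necessarily adjacent. */
    static boolean restFollows(List<String> path, int start, String[] parts, int from) {
        int next = from;
        for (int i = start; i < path.size() && next < parts.length; i++) {
            if (path.get(i).equals(parts[next])) next++;
        }
        return next == parts.length;
    }

    /** Returns the dotted names whose segments occur in order along the given path. */
    static Set<String> matches(List<String> path, Map<String, List<String[]>> groups) {
        Set<String> found = new LinkedHashSet<>();
        for (int i = 0; i < path.size(); i++) {
            // Only names whose first segment equals the current path name are candidates.
            List<String[]> candidates = groups.get(path.get(i));
            if (candidates == null) continue;
            for (String[] parts : candidates) {
                if (restFollows(path, i + 1, parts, 1)) found.add(String.join(".", parts));
            }
        }
        return found;
    }

    public static void main(String[] args) {
        Map<String, List<String[]>> groups =
            groupByFirstSegment(List.of("class.operation.specifier", "class.other.thing"));
        List<String> path = List.of("root", "class", "x", "operation", "y", "specifier");
        System.out.println(matches(path, groups)); // prints [class.operation.specifier]
    }
}
```

With this layout, a path is only compared against the names that share a first segment with one of its own names, rather than against every name in the list. Whether that helps in practice depends on how selective the first segment is, which is the question raised in the reply above.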
Re: [basex-talk] BaseX XQuery Validation in Oxygen
Hi Günter,

feel free to have a look at the Argon project [1]. For basic oXygen integration, our Wiki page may be of interest as well [2].

Cheers,
Christian

[1] http://argon-author.com/
[2] http://docs.basex.org/wiki/Integrating_oXygen

On Sat, Apr 2, 2016 at 2:25 PM, kleist-digital wrote:
> Hi all, hi Christian,
>
> in BaseX-specific cases, the Oxygen XQuery validation, which is based on
> the Saxon engine, doesn’t work. For example,
>
> import module namespace kleist = "http://kleist-digital.de/ns/kleist";
>
> gives the feedback: Failed to resolve URI of imported module: Cannot locate
> module for namespace http://kleist-digital.de/ns/kleist
>
> Is there any way to use the BaseX XQuery parser inside of Oxygen?
>
> Best regards,
> Günter
Re: [basex-talk] node-basex
Hi again Günter,

> Just saw that you’ve also written a Node-based BaseX client

There are quite a few: https://www.npmjs.com/search?q=basex

> Are there any advantages (disadvantages) to change to a classical Server-Client Scenario

I think it all depends on what you want to do. I wrote my client before RESTXQ existed; now I find I can do (almost) everything I want from BaseX. So my client has not had much attention recently, and its handling of streaming and various edge cases could certainly be improved.

If I were looking to connect Node and BaseX now, I would look at an HTTP interface to RESTXQ, but this may not be the most performant, resource-light option.

/Andy

On 3 April 2016 at 21:46, kleist-digital wrote:
> Hi Andy,
>
> me again. Just saw that you’ve also written a Node-based BaseX client. My
> frontend is a Node app. So far I’m using AJAX calls to communicate via the
> REST interface with the BaseX server; in general I use the run command to
> execute different queries.
>
> Are there any advantages (disadvantages) to change to a classical
> Server-Client Scenario (performance, less resources, security)? You know, I
> want to use it in an OpenShift environment in the future.
>
> Thanks for any advice.
>
> Günter
[basex-talk] Issue with db:add via restxq in 8.4.2
Hi guys,

in the new BaseX 8.4.2 version I cannot write to the db via RESTXQ.

For example, if you create an empty 'testdb', set the option MIXUPDATES = true in the .basex configuration file, and call the '/dbaddtest' RESTXQ function implemented as follows:

---
module namespace o = "test";

declare
  %rest:path("/dbaddtest")
  %rest:GET
function o:dbaddtest() {
  let $o := db:add('testdb', , '111.xml')
  return
};
---

it will not work in BaseX 8.4.2, i.e. the document is not added to the db, but the call returns without errors.

In contrast, db:add executed in the BaseX.jar client works well, and the above RESTXQ code also works as expected in BaseX 8.3.1.

Thank you in advance.

Regards,
Vincenzo
Re: [basex-talk] BaseX optimizer performance on REx-generated parser
Hi Christian,

thank you for the tree builder proposal, it works fine indeed.

I have slightly modified the extension function such that it behaves the same as generated XQuery code, so it can be used to replace it without further adaptations of the code that calls it.

Also I have used Str rather than String, in order to create a unique signature identifying a BaseX extension function.

Finally, the call of the parser's parse_x method was isolated in order to prepare for multiple extension functions in a single class. This occurs when there are multiple start symbols in a grammar.

The modified code is attached to this mail. It is stripped down to what would be added to REx-generated code for '-basex'.

Best regards
Gunther

Sent: Friday, 01 April 2016 at 17:57
From: "Christian Grün"
To: "Gunther Rademacher"
Cc: BaseX
Subject: Re: [basex-talk] BaseX optimizer performance on REx-generated parser

Hi Gunther,

Thanks again! Thanks to your examples, which create 38 MB of serialized XML, I now see why it is in fact beneficial to use a tree builder ;)

I finally looked at your Saxon code a bit closer, and I rewrote it a bit to work with BaseX:

* I added a parse(String query) function, which basically does what ExtensionFunctionCall.call does
* I renamed SaxonTreeBuilder to BaseXTreeBuilder, which now calls the appropriate BaseX builder functions
* The TopDownTreeBuilder stays unchanged

I have attached the resulting code; it seems to be much faster indeed. Does it make any sense to you? Do you think it would make sense to provide both a Saxon and BaseX option on your parser page?

Christian

On Fri, Apr 1, 2016 at 12:32 AM, Gunther Rademacher wrote:
> Hi Christian,
>
> please find my code attached. I have tested it along with an XQuery 3.1
> parser that was generated using command line options:
>
> -tree -main -java -saxon
>
> It contains the DOM tree builder, as well as your approach using
> XmlSerializer followed by XML parsing, both for BaseX and for Saxon.
>
> In my tests I have parsed the XQuery code for the same grammar, roughly
> 1 MB, and counted nodes of the parse tree.
>
> These are the commands that I have used:
>
> java org.basex.BaseX -q "declare namespace p='java:XQueryParser';
> p:parseXQueryToDOM(unparsed-text('file:///C:/temp/CR-xquery-31-20151217.xquery'))/count(descendant-or-self::node())"
>
> java org.basex.BaseX -q "declare namespace p='java:XQueryParser';
> p:parseXQueryToDBNode(unparsed-text('file:///C:/temp/CR-xquery-31-20151217.xquery'))/count(descendant-or-self::node())"
>
> java net.sf.saxon.Query -qs:"declare namespace p='java:XQueryParser';
> p:parseXQueryToDOM(unparsed-text('file:///C:/temp/CR-xquery-31-20151217.xquery'))/count(descendant-or-self::node())"
>
> java net.sf.saxon.Query -init:XQueryParser$SaxonInitializer -qs:"declare
> namespace p='XQueryParser';
> p:parseXQueryToNodeInfo(unparsed-text('file:///C:/temp/CR-xquery-31-20151217.xquery'))/count(descendant-or-self::node())"
>
> java net.sf.saxon.Query -init:CR_xquery_31_20151217$SaxonInitializer
> -qs:"declare namespace p='CR_xquery_31_20151217';
> p:parse-XQuery(unparsed-text('file:///C:/temp/CR-xquery-31-20151217.xquery'))/count(descendant-or-self::node())"
>
> And here are the results (best runtime in seconds out of several executions):
>
>                | BaseX | SaxonEE
> ---------------+-------+--------
> DOM builder    | 4.48  | 2.98
> parseXml       | 3.57  | 3.24
> native builder | -     | 2.36
>
> As you expected, using DOM seems not to be advantageous for BaseX. However
> the Saxon results suggest that a native tree builder API can do better than
> parsing XML.
>
> Best regards
> Gunther
>
>
> Sent: Thursday, 31 March 2016 at 15:01
> From: "Christian Grün"
> To: "Gunther Rademacher"
> Cc: BaseX
> Subject: Re: [basex-talk] BaseX optimizer performance on REx-generated parser
>
> Hi Gunther,
>
>> I am busy right now, but will be able to present some code tonight.
>
> Thanks! Take your time.
>
>> Is there a different tree model than DOM, that you would prefer for BaseX?
>
> I assume that the difference between DOM and String inputs will be
> marginal. If the method will be called from XQuery, one of the fastest
> solutions is probably to write everything to a temporary string or
> byte array and create an XQuery node representation (which is an
> instance of DBNode in BaseX):
>
> import org.basex.io.IO;
> import org.basex.query.value.node.DBNode;
>
> static DBNode parseXml() throws Exception {
>   String input = "";
>   return new DBNode(IO.get(input));
> }
>
> Thinking about this, I noticed that my previous parse-xquery.xq
> example will be executed faster (from 5ms to 2ms if executed
> repeatedly) if fn:parse-xml is replaced with
> fn:parse-xml-fragment. This is why our internal XML parser instead of
> Java’s default XML parser is used for the
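The "write everything to a temporary string, then parse it" approach discussed in this thread can be illustrated with JDK classes only. The following is a minimal sketch under stated assumptions: the JDK's DOM parser stands in for BaseX's DBNode (which needs the BaseX jar on the classpath), and the names SerializeThenParse, XmlTextBuilder, startNonterminal, endNonterminal and terminal are hypothetical, chosen for illustration rather than taken from REx-generated code. The node count mirrors the count(descendant-or-self::node()) expression used in the test commands.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

// Hypothetical illustration; JDK DOM is used as a stand-in for a BaseX node.
class SerializeThenParse {

    /** Event sink that writes parse-tree events as XML text into a string buffer. */
    static final class XmlTextBuilder {
        private final StringBuilder out = new StringBuilder();

        void startNonterminal(String name) { out.append('<').append(name).append('>'); }

        void endNonterminal(String name) { out.append("</").append(name).append('>'); }

        /** Appends token text, escaping the characters that are special in XML content. */
        void terminal(String text) {
            for (int i = 0; i < text.length(); i++) {
                char c = text.charAt(i);
                switch (c) {
                    case '<': out.append("&lt;"); break;
                    case '>': out.append("&gt;"); break;
                    case '&': out.append("&amp;"); break;
                    default: out.append(c);
                }
            }
        }

        String result() { return out.toString(); }
    }

    /** Parses the serialized tree with the JDK's default XML parser. */
    static Document parseXml(String xml) {
        try {
            byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(bytes));
        } catch (Exception e) {
            throw new IllegalStateException("serialized parse tree is not well-formed", e);
        }
    }

    /** Counts the node itself plus all of its descendants. */
    static int countNodes(Node node) {
        int count = 1;
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            count += countNodes(child);
        }
        return count;
    }

    public static void main(String[] args) {
        XmlTextBuilder builder = new XmlTextBuilder();
        builder.startNonterminal("XQuery");
        builder.startNonterminal("Literal");
        builder.terminal("1");
        builder.endNonterminal("Literal");
        builder.endNonterminal("XQuery");
        Document doc = parseXml(builder.result());
        System.out.println(countNodes(doc)); // prints 4: document, two elements, one text node
    }
}
```

The sketch makes the trade-off in the timings above easier to see: every token is first copied into a text buffer and then scanned again by the XML parser, whereas a native tree builder receives the same events and creates nodes directly.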