[basex-talk] html document retrieval runs out of main memory

2022-04-07 Thread Graydon Saunders
Hello --

I'm using the basexgui to run the following query (minus some identifying
actual values defined earlier in the query):

(: for each path, retrieve the document :)
for $remote in $paths
  let $name as xs:string := file:name($remote)
  let $target as xs:string := file:resolve-path($name, $targetBase)
  (: the request element was lost in the archive; a plain GET is assumed :)
  let $fetched :=
    http:send-request(<http:request method='get'/>, $remote)[2]
  let $use as item() := try {
    html:parse($fetched)
  } catch * {
    $fetched
  }
  return if ($use instance of document-node())
    then file:write($target, $use)
    else file:write-binary($target, $use)

It works, in that I get exactly 100 documents retrieved.  (There are
unfortunately 140+ documents in the list.)

However, the query fails with an "out of main memory" error when using a
recent 10.0 beta or 9.7 with Xmx set to 2g.  Setting Xmx to 16g with 9.7
produces the same "out of memory" error in the same length of time (about 5
minutes).

java -version says
20:27 test % java -version
openjdk version "11.0.14.1" 2022-02-08
OpenJDK Runtime Environment 18.9 (build 11.0.14.1+1)
OpenJDK 64-Bit Server VM 18.9 (build 11.0.14.1+1, mixed mode, sharing)

It's entirely possible I'm going about fetching files off a web server the
wrong way; it's possible there's something there that's rather large, but I
doubt it's that large.

What should I be doing instead?

Thanks!
Graydon


Re: [basex-talk] Basexgui Feature suggestion

2022-04-07 Thread Tamara Marnell
Hi Steven,

Just another small tip for this:


> I would say that for my use cases, a few extra clicks is less trouble than
> managing temporary edits to files and reverting them back again.


In the BaseX GUI you can make changes on the fly in the Editor window and
click the Run query button, then just close the XQ file without saving. You
don't need to save the file, run it, then revert your changes and save
again.

-Tamara

-- 

Tamara Marnell
Program Manager, Systems
Orbis Cascade Alliance (orbiscascade.org)
Pronouns: she/her/hers


Re: [basex-talk] Basexgui Feature suggestion

2022-04-07 Thread Majewski, Steven Dennis (sdm7g)

Reading the docs closer 
https://docs.basex.org/wiki/Graphical_User_Interface#Editor 
 
It looks like the best shortcut would be saving the results to an .xml file
and using the right-click feature to bind the file to ‘.’; however,
that doesn’t appear to work as described.

I get a shorter menu of options than what’s shown in that doc:

Open
Open Externally
Run Tests  (greyed out and inactive) 
Refresh 
Copy Path

No “set as context” option.

I’m running BaseX 9.7 on a Mac. 
Is this difference from the docs due to a Mac / PC difference, or has this 
capability changed and the docs not updated? 


I’ve also realized, after thinking about some more use cases, that because
results could be things other than nodes, like maps or arrays, or could be
serialized differently, there may be more issues than I had initially
thought with simply binding results to context.

I would say that for my use cases, a few extra clicks is less trouble than 
managing temporary edits to files and reverting them back again. 

— Steve M.




Re: [basex-talk] Basexgui Feature suggestion

2022-04-07 Thread Tamara Marnell
Hi Steven,

To save a few clicks, you can create a new database directly from your
results in XQuery, without saving them to a file first. Instead of
returning the results outright, assign them to a variable to pass as the
input to db:create() in the return, with a made-up file name for the path.

let $results := <results>{
  [Your original return in here]
}</results>
return db:create('my_results_db', $results, 'my_results.xml')

Then you can run further queries using db:open('my_results_db')/results,
and run DROP DB my_results_db in the command input bar when you don't
need it anymore.
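
As a rough sketch of such a follow-up query (assuming BaseX 9.x, where the function is still called db:open, and assuming the stored document has a root element named results, as the path above implies):

```xquery
(: Runs against the temporary database created by the db:create() call above.
   Assumption: the stored document has a <results> root element. :)
let $results := db:open('my_results_db')/results
return (
  (: how many top-level result items were stored :)
  count($results/*),
  (: which element names occur among them :)
  distinct-values($results/*/name())
)
```

Running db:drop('my_results_db') as a separate query should have the same effect as dropping the database from the command input bar.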

This isn't to say a new feature isn't a good idea, just that there's a way
you can streamline your workflow before that feature exists!

-Tamara



-- 

Tamara Marnell
Program Manager, Systems
Orbis Cascade Alliance (orbiscascade.org)
Pronouns: she/her/hers


[basex-talk] Basexgui Feature suggestion

2022-04-07 Thread Majewski, Steven Dennis (sdm7g)

It would be handy if there was a way to bind results to current context for 
additional inspection/investigation/query of results. Currently, it seems, you 
would have to save results to a file and then create database from that file to 
make an additional query on results. 
That binding would be available from editor or input bar. 

Or perhaps alternatively, it could include an option to base visualizations on 
results instead of open database — although I’m guessing the former would be 
easier to implement than the latter. 

— Steve Majewski



[basex-talk] An automated workflow for creating tested and sustainable REST API/RestXQ containers that use BaseX

2022-04-07 Thread Omar Siam
For the last few years, I have tried to integrate BaseX into a CI/CD
workflow (the one used by GitLab [1]).


My understanding of CI/CD explicitly includes automated tests so I can 
be as sure as possible I don’t break anything when doing further 
development.
I considered using BaseX’s built-in unit test framework [2], which I
think is very good in many situations, but I was not really satisfied
with it when it comes to checking RestXQ endpoints. The way it needs to
be launched [3] is a bit awkward to me as a test for RestXQ, and I could
not come up with an easy way to automate this.
When serving HTML-based apps that interact with BaseX as a data store,
there is also no well-established way to do end-to-end testing, that is,
to programmatically click a button and capture the result, for example.
Such tools do exist for Node.js [4], though.
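
For illustration, here is a minimal sketch of how a RestXQ function can be exercised with the Unit module. The module namespace, the path and the function names are made up for this example and are not taken from any of the projects mentioned. Note that the test calls the function directly rather than over HTTP, so it does not check routing or serialization, which is the limitation described above.

```xquery
(: Hypothetical module: namespace, path and names are placeholders. :)
module namespace app = 'http://example.org/app';

(: A RestXQ endpoint that answers GET /hello/{name}. :)
declare
  %rest:GET
  %rest:path('/hello/{$name}')
function app:hello($name as xs:string) as element(greeting) {
  <greeting>Hello, { $name }!</greeting>
};

(: A unit test that invokes the endpoint function as a plain XQuery function
   and compares the returned element with the expected one. :)
declare %unit:test function app:test-hello() {
  unit:assert-equals(
    app:hello('BaseX'),
    <greeting>Hello, BaseX!</greeting>
  )
};
```

A module like this can be run with the BaseX TEST command.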


Unfortunately my projects almost never allow for more than a few tests 
due to budget and time constraints. So, I want to make them count. My 
experience is that such end-to-end tests are what can test the most code 
in JS and RestXQ with the least amount of test code. Admittedly, if you 
get an error, you go hunting for the real problem.
Similarly, I use Node.js to do a kind of outside integration test for
RESTful APIs. I have to admit I have a personal aversion to using
many programming languages throughout my day. I try to use two or
at most three on any given day. But there are also other reasons:
* For one, JS and TypeScript are very likely consumers of a RESTful API.
It is hard to escape JS or TypeScript today.
* Other people are probably much better at their preferred programming
languages, but I hope almost anyone has a passive knowledge of JS. If
this is true, then tests written in JS are good examples and candidates
for translation into another language’s test framework if necessary.


The (now old) gitlab CI/CD workflow uses herokuish [5] and Heroku 
buildpacks [6] (but not the actual commercial Heroku service) to achieve 
two things:
* Bundle an application written in some programming language in a
container without the author of the program needing to have any
knowledge of containers, Dockerfiles etc. The author just needs to know
the dependency tools of their programming language well.
* Provide a well-defined way to start up tests and then either proceed
with deployment if the tests succeed or stop right there.
The second point is something that the process currently used for
building containers in GitLab, Cloud Native Buildpacks [7], still cannot
do, because a container interface for launching tests is only in the
planning phase there.


As I wanted to bundle everything needed to run and test an application 
running in the RestXQ environment provided by BaseX for the GitLab CI/CD 
process, I created a Heroku buildpack for BaseX based on the one for 
Node.js [8].
As I extended the Node.js buildpack, everything is still controlled from
package.json in principle. The “engines” definitions are amended to also
include a version of BaseX and a version of Saxon to use [9]. Because the
buildpack inherits from the Node.js one, JS dependencies for an HTML page
that acts as the user interface for some RestXQ endpoints can be easily
specified, and a “build” script can build sources written for the
Vue.js, React or Angular frameworks. Or maybe only the dependencies of a
more elaborate test suite need to be fetched, like mocha, chai etc. [10]
Also, a shell script “deployment/initial.sh” [11] is run so that XML
data can be fetched and loaded into BaseX at the time the container is
built. For example, I use it to pull data from another git repository
and execute a BaseX bxs script [12] to load the data, generate indices
etc. A “test” script then needs to be defined to run the RESTful API
endpoint tests [13] or the end-to-end test [14] automatically.
Using this approach, it is also possible to launch any external process
for testing, so, for example, tests based on the BaseX Unit module could
be launched in addition to or instead of something truly Node.js based.
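
The data-loading step can also be written in XQuery instead of a command script. The following is only a sketch of that alternative, not the script referenced in [12]; the database name and the directory are placeholders.

```xquery
(: Creates a database from all documents found in a directory and builds
   text, attribute and full-text indexes in the same step.
   'app-data' and '/tmp/data/' are hypothetical placeholders. :)
db:create(
  'app-data',
  '/tmp/data/',
  (),
  map { 'textindex': true(), 'attrindex': true(), 'ftindex': true() }
)
```

Which indexes are worth building depends on the queries the application runs, so the options map here is just one possible choice.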


We decided a while ago that we want to have private git
repositories with all their CI/CD at gitlab.com. We provide the compute
resources to run all builds and checks using a Kubernetes cluster we
own. Back then, GitHub just didn’t provide any private repositories for
free, but we wanted to be present there, and so we have all the public
projects at github.com.
The good thing about public repositories on github.com is the Actions
workflows there, which run in a VM that provides a build environment for
just about any conceivable programming language, plus container
building tools and more.
The bad thing is: there is no immediately obvious recipe for how to make
good use of that VM on github.com.
But if there are container building tools, we can run pretty much a copy
of the GitLab CI/CD workflow that people over there came up with and
that is documented in their AutoDevOps workflow definitions.
So I cannot