Re: [basex-talk] Finalizing Query-Objects

2020-02-03 Thread Christian Grün
It makes no difference for the BaseX server if you close the session and
have open query objects (query objects exclusively reside in the client).

It can make a difference in client implementations, though. If you have a
chance to always close queries after the execution, I think you should do
so. I assume you are caching the query results before iterating over them,
as is done in the other client implementations?
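
A minimal sketch of the "always close queries" advice, in plain Java rather than R and without any BaseX client API: the session keeps a registry of its open query objects, so that closing the session can close whatever is still open. The names SessionSketch and QuerySketch are hypothetical, and nothing is sent to a server here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical client-side session that remembers its open queries.
class SessionSketch implements AutoCloseable {
    private final List<QuerySketch> open = new ArrayList<>();

    // Creates a query object and registers it with this session.
    QuerySketch query(String text) {
        QuerySketch q = new QuerySketch(this, text);
        open.add(q);
        return q;
    }

    // Called by a query when it is closed explicitly.
    void forget(QuerySketch q) {
        open.remove(q);
    }

    int openQueries() {
        return open.size();
    }

    @Override
    public void close() {
        // Iterate over a copy: each q.close() removes q from 'open'.
        for (QuerySketch q : new ArrayList<>(open)) {
            q.close();
        }
    }

    public static void main(String[] args) {
        SessionSketch session = new SessionSketch();
        QuerySketch forgotten = session.query("1 to 3");
        try (QuerySketch q = session.query("4 to 6")) {
            System.out.println(session.openQueries()); // 2
        }
        System.out.println(session.openQueries());     // 1
        session.close();
        System.out.println(forgotten.isClosed());      // true
    }
}

// Hypothetical query object; closing it twice is harmless.
class QuerySketch implements AutoCloseable {
    private final SessionSketch session;
    private final String text;
    private boolean closed;

    QuerySketch(SessionSketch session, String text) {
        this.session = session;
        this.text = text;
    }

    String text() {
        return text;
    }

    boolean isClosed() {
        return closed;
    }

    @Override
    public void close() {
        if (closed) {
            return;
        }
        closed = true;
        session.forget(this);
    }
}
```

With try-with-resources a query is closed as soon as its block ends; the registry only catches the ones that were forgotten.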




Ben Engbers wrote on Mon, 3 Feb 2020, 11:01:

> Hi,
>
> The people from CRAN strongly suggested adding tests (comparable to
> Unit-tests) to my package (RBaseX). Their request led me to take another
> critical look at my code.
> So far the tests do not give an error message. But after completing the
> last test, 'testthat' reports 1 failure without further explanation.
> After changing the order in which the tests are executed, the failure is
> always caused by the last test. Therefore I think that it is not the
> tests that cause the error, but the finalize-process.
>
> At this moment, my code is based upon 3 classes: 'RBaseXClient' creates
> a new client-session. This session uses 'SocketClass' to communicate with
> basexserver.  When used in query-mode, the session uses 'QueryClass' to
> create new query-objects. Due to this architecture, it is easy to
> explicitly close a regular query-object, but (at least in R) it is
> difficult to close query-objects when finalizing the session-object.
>
> How does the basexserver respond to closing the session without first
> explicitly closing all open queries? Does this result in an error?
>
> Ben
>


Re: [basex-talk] No difference for output from 'FULL' or 'RESULTS'

2020-02-03 Thread Christian Grün
Hi Ben,

The client API code hasn’t changed since BaseX 8. Maybe you need to revise
your code.

If you believe something is going wrong in the API, I’d still need some more
information on what exactly you believe has changed.

Best,
Christian





Ben Engbers wrote on Mon, 3 Feb 2020, 15:11:

> Hi,
>
> As far as I can remember when using early versions of my
> client-software, the main difference in output after sending \04 or \1F
> to the database was that in the latter case the output was preceded
> by XDM metadata.
>
> # Full
> query_txt <- "for $i in 1 to 2 return Text { $i }"
> query_obj <- Query(Session, query_txt)
> result <- Full(query_obj)
>
> resulted in:
> "0b" "Text 1" "0b" "Text 2"
>
> # Iterate over query
> query2 <- "for $i in 3 to 4 return Iter { $i }"
> query_iterate <- Query(Session, query2)   # <== Alternative call to query-object
> while (More(query_iterate)) {
>   cat(Next(query_iterate), "\n")
> }
>
> resulted in:
> Iter 3
> Iter 4
>
> Now, iterating over the same query gives:
> 0b
> Iter 3
> 0b
> Iter 4
>
> Did something change in the client/server protocol or did I introduce an
> error somewhere?
>
> Ben
>


Re: [basex-talk] how to count and remove "entities"

2020-02-03 Thread Christian Grün
You could use REPLACE instead of ADD (or db:replace instead of db:add) and
name your tweet by the JSON id. For more details, have a look at our
documentation [1].

Deleting duplicates after the insertion would be another approach, but it
surely is too slow if your plan is to store thousands or millions of tweets.

[1] http://docs.basex.org/wiki/Database_Module#db:replace
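
To illustrate why naming each tweet by its id avoids duplicates, here is a small sketch in plain Java collections. It does not call BaseX at all: TweetDedupSketch and the sample ids are hypothetical, and the map only mimics the idea that storing under an existing name replaces the old entry instead of adding a second one.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class TweetDedupSketch {
    // Counts how many times each id occurs in the incoming batch.
    static Map<String, Integer> countById(List<Map.Entry<String, String>> tweets) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Map.Entry<String, String> t : tweets) {
            counts.merge(t.getKey(), 1, Integer::sum);
        }
        return counts;
    }

    // Stores each tweet under its id: a repeated id overwrites, it does not append.
    static Map<String, String> storeById(List<Map.Entry<String, String>> tweets) {
        Map<String, String> store = new LinkedHashMap<>();
        for (Map.Entry<String, String> t : tweets) {
            store.put(t.getKey(), t.getValue());
        }
        return store;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, String>> tweets = List.of(
            Map.entry("101", "first"),
            Map.entry("102", "second"),
            Map.entry("101", "first again"));
        System.out.println(countById(tweets)); // {101=2, 102=1}
        System.out.println(storeById(tweets)); // {101=first again, 102=second}
    }
}
```

The count answers the "how many occurrences per id" question; the keyed store shows why no clean-up pass is needed when the id is used as the name.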



thufir wrote on Tue, 4 Feb 2020, 07:41:

> Not sure of the correct lingo, but I'm building a database of tweets.
> As I run it, duplicate tweets are added to the database.  I can see the
> duplicates with:
>
> for $tweets  in db:open("twitter")
> return {$tweets/json/id__str}
>
> Firstly, how would I select the json node for a duplicate entity.  But,
> before even selecting that node, recursively look to see if there's more
> than one result for that id__str value.
>
> How would I even generate a count of each occurrence for the data of a
> specific id__str?
>
>
> thanks,
>
> Thufir
>


Re: [basex-talk] how to count and remove "entities"

2020-02-03 Thread thufir

I think distinct-values is helpful here:

https://stackoverflow.com/q/60051384/262852

as is count.  How would I pipe the result from the set of
distinct-values to a count?  If the count is >1 then I could delete that tweet.


Just thinking out loud.  Is that reasonable?  Or am I perhaps
re-inventing the wheel here?



On 2020-02-03 10:41 p.m., thufir wrote:
Not sure of the correct lingo, but I'm building a database of tweets. As 
I run it, duplicate tweets are added to the database.  I can see the 
duplicates with:


for $tweets  in db:open("twitter")
return {$tweets/json/id__str}

Firstly, how would I select the json node for a duplicate entity.  But, 
before even selecting that node, recursively look to see if there's more 
than one result for that id__str value.


How would I even generate a count of each occurrence for the data of a 
specific id__str?



thanks,

Thufir


[basex-talk] how to count and remove "entities"

2020-02-03 Thread thufir
Not sure of the correct lingo, but I'm building a database of tweets. 
As I run it, duplicate tweets are added to the database.  I can see the 
duplicates with:


for $tweets  in db:open("twitter")
return {$tweets/json/id__str}

Firstly, how would I select the json node for a duplicate entity?  But,
before even selecting that node, how would I look (recursively?) to see
whether there's more than one result for that id__str value?


How would I even generate a count of each occurrence for the data of a 
specific id__str?



thanks,

Thufir


Re: [basex-talk] Add command: name of the input will be set as path?

2020-02-03 Thread thufir

I got it to work in a very kludgy way:


new Open(databaseName).execute(context);
for (int i = 0; i < tweets.length(); i++) {
    jsonStringTweet = tweets.get(i).toString();
    jsonObjectTweet = new org.json.JSONObject(jsonStringTweet);
    stringXml = XML.toString(jsonObjectTweet);
    stringXml = wrap(stringXml);
    write(stringXml, fileName);
    String stringFromFile = read(fileName);
    log.fine(stringFromFile);
    new Add(fileName, stringXml).execute(context);
}

but there I'm passing the fileName -- certainly I can just pass
stringXml by itself somehow?


see also:

https://stackoverflow.com/a/60047738/262852



thanks,

Thufir

On 2020-02-03 1:42 p.m., Christian Grün wrote:

In this case there's no path argument, but there is an input argument of
stringXml.  Is that how to pass a String to Add()?

There are various ways; one is as follows:

 String json = "{ \"A\": 123 }";
 Context ctx = new Context();
 new CreateDB("test").execute(ctx);
 new Set("parser", "json").execute(ctx);
 Command add = new Add("json.xml");
 add.setInput(new ArrayInput(json));
 add.execute(ctx);
 System.out.println(new XQuery(".").execute(ctx));




On Mon, Feb 3, 2020 at 10:16 PM thufir  wrote:




On 2020-02-03 6:46 a.m., Christian Grün wrote:

What does it mean that "if null, the name of input will be set as the path"?


If your path argument points to a directory or a single file, and if
you specify no argument for the input variable, the filenames
resulting from your first argument will be adopted as database paths.

If you run the command "ADD myfile.xml", the input argument will be
null. If you run "ADD TO /db/path myfile.xml", input will be
"/db/path".




Right, but I'm not looking to run the command "ADD myfile.xml" from the
console but rather:


  new Add(null, stringXml).execute(context);

In this case there's no path argument, but there is an input argument of
stringXml.  Is that how to pass a String to Add()?



thanks,

Thufir


Re: [basex-talk] Add command: name of the input will be set as path?

2020-02-03 Thread Christian Grün
> In this case there's no path argument, but there is an input argument of
> stringXml.  Is that how to pass a String to Add()?

There are various ways; one is as follows:

String json = "{ \"A\": 123 }";
Context ctx = new Context();
new CreateDB("test").execute(ctx);
new Set("parser", "json").execute(ctx);
Command add = new Add("json.xml");
add.setInput(new ArrayInput(json));
add.execute(ctx);
System.out.println(new XQuery(".").execute(ctx));




On Mon, Feb 3, 2020 at 10:16 PM thufir  wrote:
>
>
>
> On 2020-02-03 6:46 a.m., Christian Grün wrote:
> >> What does it mean that "if null, the name of input will be set as the 
> >> path"?
> >
> > If your path argument points to a directory or a single file, and if
> > you specify no argument for the input variable, the filenames
> > resulting from your first argument will be adopted as database paths.
> >
> > If you run the command "ADD myfile.xml", the input argument will be
> > null. If you run "ADD TO /db/path myfile.xml", input will be
> > "/db/path".
> >
>
>
> Right, but I'm not looking to run the command "ADD myfile.xml" from the
> console but rather:
>
>
>  new Add(null, stringXml).execute(context);
>
> In this case there's no path argument, but there is an input argument of
> stringXml.  Is that how to pass a String to Add()?
>
>
>
> thanks,
>
> Thufir


Re: [basex-talk] Add command: name of the input will be set as path?

2020-02-03 Thread thufir




On 2020-02-03 6:46 a.m., Christian Grün wrote:

What does it mean that "if null, the name of input will be set as the path"?


If your path argument points to a directory or a single file, and if
you specify no argument for the input variable, the filenames
resulting from your first argument will be adopted as database paths.

If you run the command "ADD myfile.xml", the input argument will be
null. If you run "ADD TO /db/path myfile.xml", input will be
"/db/path".




Right, but I'm not looking to run the command "ADD myfile.xml" from the 
console but rather:



new Add(null, stringXml).execute(context);

In this case there's no path argument, but there is an input argument of 
stringXml.  Is that how to pass a String to Add()?




thanks,

Thufir


Re: [basex-talk] convert JSON to XML to add to database

2020-02-03 Thread thufir

is this what you're referring to?

Command:
SET PARSER json
Command:
CREATE DB tweet /home/thufir/json/tweet.json
Result:
Database 'tweet' created in 166.11 ms.


Which, yes, is exactly the sequence which I'm looking to capture or 
replicate -- but not from a file as above.  It's more the usage of "Add" 
to add a string.


I've converted the JSON to XML, so that rather than tweet.json I have 
tweet.xml for convenience.


Using either ADD or CREATE is my goal -- but not with files.  Trying to 
use Strings.


thanks,

Thufir

On 2020-02-03 6:40 a.m., Christian Grün wrote:

How is JSON converted to XML in order to ADD to a database?

 JSONObject jsonTweet = tweets.getJSONObject(Long.toString(id));
 xmlStringTweet = XML.toString(jsonTweet);


Do you know how to create a database and add documents as JSON via the
BaseX GUI? If yes, you can enable the InfoView panel, and you will see
the commands that are called in the background. In the next step, you
can call these commands with Java.

See [1] for the available BaseX options, and see [2] for an example
that assigns an option via the SET command.

[1] http://docs.basex.org/wiki/Options
[2] https://github.com/BaseXdb/basex/blob/master/basex-examples/src/main/java/org/basex/examples/local/CreateCollection.java



Re: [basex-talk] filtering NaN from a sequence

2020-02-03 Thread Graydon
On Mon, Feb 03, 2020 at 03:24:48PM +0100, Christian Grün wrote:
> > > for $value in $xmlReport/csv/record/Payment_Amount
> > >   where $value castable as xs:double
> > >   return xs:double($value)
> >
> > That errors out!
> > [XPTY0004] Cannot convert element()* to xs:double+: 
> > $xmlReport_1/element(csv)/element(record)/element(Payment_Amount)[. 
> > castable as xs:double].
> 
> Did you get this error message for the suggested "for" clause, or a let 
> clause?

The type is on a let clause that derives its value from a for:

let $made as xs:double+ := for $value in $xmlReport/csv/record/Payment_Amount
  where $value castable as xs:double
  return $value

> XQuery's Pandora's box provides a lot of type conversions that all
> work slightly differently: If you specify a type after the let
> clause, it is (close to) identical to the "treat as" expression.
> Treating values as another type won’t trigger explicit casts; that
> is, your element nodes won’t be converted to doubles.

I have learned something!  Thank you, that makes it make sense.

> However, if you specify types in functions, …
> 
>   declare function local:bla($made as xs:double+) { ... }
> 
> …the values will be "promoted" to the specific type (and this is
> similar to casts).

And now I have learned something else. :)

That's very helpful; much appreciated.

-- Graydon


Re: [basex-talk] Add command: name of the input will be set as path?

2020-02-03 Thread Christian Grün
> What does it mean that "if null, the name of input will be set as the path"?

If your path argument points to a directory or a single file, and if
you specify no argument for the input variable, the filenames
resulting from your first argument will be adopted as database paths.

If you run the command "ADD myfile.xml", the input argument will be
null. If you run "ADD TO /db/path myfile.xml", input will be
"/db/path".


Re: [basex-talk] convert JSON to XML to add to database

2020-02-03 Thread Christian Grün
> How is JSON converted to XML in order to ADD to a database?
>
> JSONObject jsonTweet = tweets.getJSONObject(Long.toString(id));
> xmlStringTweet = XML.toString(jsonTweet);

Do you know how to create a database and add documents as JSON via the
BaseX GUI? If yes, you can enable the InfoView panel, and you will see
the commands that are called in the background. In the next step, you
can call these commands with Java.

See [1] for the available BaseX options, and see [2] for an example
that assigns an option via the SET command.

[1] http://docs.basex.org/wiki/Options
[2] https://github.com/BaseXdb/basex/blob/master/basex-examples/src/main/java/org/basex/examples/local/CreateCollection.java


Re: [basex-talk] JSON to XML conversion

2020-02-03 Thread Christian Grün
>  public void transform(String fileName) throws IOException {
>  String content = new
> String(Files.readAllBytes(Paths.get(fileName)), StandardCharsets.UTF_8);
>  org.json.JSONObject json = new org.json.JSONObject(content);
>  log.info(org.json.XML.toString(json));
>  }

What you seem to want to achieve is:

1. Open a JSON file as a string;
2. Convert this string to a JSON object;
3. Write this JSON object as XML to a log output (?)

This would be the XQuery way to do it:

  let $content := file:read-text('x.json')
  let $json := json:parse($content)
  return admin:write-log($json)

If you address the BaseX Java code, you can work with different
abstraction levels. Maybe it’s already sufficient if you evaluate the
XQuery string above as a command:

  Context ctx = new Context();
  String query = "let $content...";
  XQuery cmd = new XQuery(query);
  System.out.println(cmd.execute(ctx));


Re: [basex-talk] filtering NaN from a sequence

2020-02-03 Thread Christian Grün
> > for $value in $xmlReport/csv/record/Payment_Amount
> >   where $value castable as xs:double
> >   return xs:double($value)
>
> That errors out!
> [XPTY0004] Cannot convert element()* to xs:double+: 
> $xmlReport_1/element(csv)/element(record)/element(Payment_Amount)[. castable 
> as xs:double].

Did you get this error message for the suggested "for" clause, or a let clause?

> I conclude from this that NaN is castable as xs:double which surprised
> me when I first tried something like this, but which does make sense in
> as much as NaN has to be pseudo-numeric.

Exactly: NaN is a valid double value (as are INF and -INF).
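
For comparison, roughly the same filter-then-sum in plain Java (a sketch, not BaseX code; PaymentSumSketch and its methods are hypothetical names). Java draws the line in the same place: an empty field cannot be parsed as a double, whereas the string "NaN" parses to a valid double value, so it needs its own check if it should be excluded.

```java
import java.util.List;
import java.util.OptionalDouble;

class PaymentSumSketch {
    // Returns the parsed amount, or empty if the text is not numeric or is NaN.
    static OptionalDouble parseAmount(String text) {
        try {
            double value = Double.parseDouble(text);
            return Double.isNaN(value) ? OptionalDouble.empty()
                                       : OptionalDouble.of(value);
        } catch (NumberFormatException e) {
            // Empty or non-numeric field: the analogue of "not castable".
            return OptionalDouble.empty();
        }
    }

    // Sums all usable amounts and rounds the result to two decimals.
    static double sumValid(List<String> fields) {
        double sum = 0;
        for (String field : fields) {
            OptionalDouble value = parseAmount(field);
            if (value.isPresent()) {
                sum += value.getAsDouble();
            }
        }
        return Math.round(sum * 100.0) / 100.0;
    }

    public static void main(String[] args) {
        System.out.println(sumValid(List.of("3.38", "", "1.5", "NaN")));
    }
}
```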

> let $made as xs:double+ := for $value in $xmlReport/csv/record/Payment_Amount
>   where $value castable as xs:double
>   return $value
>
> doesn't strike me as obviously wrongly typed on $made.  I'd expect that
> to fail without the where clause but to be OK with it.

XQuery's Pandora's box provides a lot of type conversions that all
work slightly differently: If you specify a type after the let
clause, it is (close to) identical to the "treat as" expression.
Treating values as another type won’t trigger explicit casts; that
is, your element nodes won’t be converted to doubles.

However, if you specify types in functions, …

  declare function local:bla($made as xs:double+) { ... }

…the values will be "promoted" to the specific type (and this is
similar to casts).


[basex-talk] No difference for output from 'FULL' or 'RESULTS'

2020-02-03 Thread Ben Engbers
Hi,

As far as I can remember when using early versions of my
client-software, the main difference in output after sending \04 or \1F
to the database was that in the latter case the output was preceded
by XDM metadata.

# Full
query_txt <- "for $i in 1 to 2 return Text { $i }"
query_obj <- Query(Session, query_txt)
result <- Full(query_obj)

resulted in:
"0b" "Text 1" "0b" "Text 2"

# Iterate over query
query2 <- "for $i in 3 to 4 return Iter { $i }"
query_iterate <- Query(Session, query2)   # <== Alternative call to query-object
while (More(query_iterate)) {
  cat(Next(query_iterate), "\n")
}

resulted in:
Iter 3
Iter 4

Now, iterating over the same query gives:
0b
Iter 3
0b
Iter 4

Did something change in the client/server protocol or did I introduce an
error somewhere?

Ben
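
One possible reading of the output above, not confirmed in this thread: the "0b" values are a per-item type byte that the iterating call returns in front of every item, and the client now prints it instead of skipping it. The sketch below, in plain Java with hypothetical names, assumes a response buffer laid out as [type byte][UTF-8 text][\00] per item, closed by one extra \00. It ignores any byte escaping the real protocol may apply.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class ItemStreamSketch {
    // Splits a buffer into item strings, dropping the leading type byte of each item.
    // Assumed layout (an assumption, see above): [type][text bytes][0x00] ... [0x00]
    static List<String> readItems(byte[] buf) {
        List<String> items = new ArrayList<>();
        int pos = 0;
        while (pos < buf.length && buf[pos] != 0) {
            pos++;                       // skip the type byte (0x0b in the example)
            int start = pos;
            while (pos < buf.length && buf[pos] != 0) {
                pos++;                   // advance to the item terminator
            }
            items.add(new String(buf, start, pos - start, StandardCharsets.UTF_8));
            pos++;                       // skip the terminating 0x00
        }
        return items;
    }

    public static void main(String[] args) {
        byte[] response = {
            0x0b, 'I', 't', 'e', 'r', ' ', '3', 0,
            0x0b, 'I', 't', 'e', 'r', ' ', '4', 0,
            0
        };
        System.out.println(readItems(response)); // [Iter 3, Iter 4]
    }
}
```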


Re: [basex-talk] filtering NaN from a sequence

2020-02-03 Thread Graydon
On Mon, Feb 03, 2020 at 02:09:03PM +0100, Christian Grün wrote:
> Martin’s suggestion is indeed the cleanest solution I can see.

Thank you!

> A curious side note regarding your approach:
> 
> > where not($value = number('NaN'))
> 
> Comparisons with NaN doubles always yield false, no matter if you use
> XQuery, Java or other languages:
> 
>   let $d := xs:double('NaN')
>   return $d = $d

Well, then I've learned at least one new thing today!

Thank you!

-- Graydon


Re: [basex-talk] filtering NaN from a sequence

2020-02-03 Thread Graydon
On Mon, Feb 03, 2020 at 08:27:09AM +0100, Martin Honnen wrote:
> On 03.02.2020 at 01:22, Graydon Saunders wrote:
> > for $value in $xmlReport/csv/record/Payment_Amount/number()
> >   where ???
> >   return $value
> 
> Can you live with
> 
> for $value in $xmlReport/csv/record/Payment_Amount
>   where $value castable as xs:double
>   return xs:double($value)

That errors out!
[XPTY0004] Cannot convert element()* to xs:double+: 
$xmlReport_1/element(csv)/element(record)/element(Payment_Amount)[. castable as 
xs:double].

If I do that with /number() at the end of the XPath

for $value in $xmlReport/csv/record/Payment_Amount/number()

I get "NaN" as the overall result.

I conclude from this that NaN is castable as xs:double which surprised
me when I first tried something like this, but which does make sense in
as much as NaN has to be pseudo-numeric.

If I take the type off the variable:

let $made := for $value in $xmlReport/csv/record/Payment_Amount

instead of

let $made as xs:double+ := for $value in $xmlReport/csv/record/Payment_Amount

then it works.

Which really surprised me because the whole statement should return a
sequence of doubles:

let $made as xs:double+ := for $value in $xmlReport/csv/record/Payment_Amount
  where $value castable as xs:double
  return $value

doesn't strike me as obviously wrongly typed on $made.  I'd expect that
to fail without the where clause but to be OK with it.

Thanks!
Graydon


Re: [basex-talk] filtering NaN from a sequence

2020-02-03 Thread Christian Grün
Martin’s suggestion is indeed the cleanest solution I can see.

A curious side note regarding your approach:

> where not($value = number('NaN'))

Comparisons with NaN doubles always yield false, no matter if you use
XQuery, Java or other languages:

  let $d := xs:double('NaN')
  return $d = $d
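
The same effect in plain Java, as a small sketch with hypothetical names: a filter written with == never removes anything, because the comparison with NaN is always false, whereas Double.isNaN does the job.

```java
import java.util.List;
import java.util.stream.Collectors;

class NanFilterSketch {
    // Analogue of "where not($value = number('NaN'))": keeps every value,
    // because v == NaN is false even when v is NaN.
    static List<Double> naiveFilter(List<Double> values) {
        return values.stream()
                     .filter(v -> !(v == Double.NaN))
                     .collect(Collectors.toList());
    }

    // Working variant: asks explicitly whether the value is NaN.
    static List<Double> realFilter(List<Double> values) {
        return values.stream()
                     .filter(v -> !Double.isNaN(v))
                     .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        double d = Double.NaN;
        System.out.println(d == d);                     // false
        List<Double> values = List.of(3.38, Double.NaN, 1.5);
        System.out.println(naiveFilter(values).size()); // 3
        System.out.println(realFilter(values));         // [3.38, 1.5]
    }
}
```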

Best,
Christian




On Mon, Feb 3, 2020 at 2:14 AM Graydon Saunders  wrote:
>
> Hi Bridger
>
> functx:is-a-number does indeed work, but its guts are
>
> string(number($value)) != 'NaN'
>
> Which seems improper somehow; it's relying on knowing the string that
> corresponds to the conceptual NaN result.
>
> I may be looking for more elegance than I can plausibly expect, here. :)
>
> Thanks!
> Graydon
>
> On Sun, Feb 2, 2020 at 8:07 PM Bridger Dyson-Smith  
> wrote:
>>
>> Hi Graydon,
>> I'm mobile at the moment, so please excuse the abbreviated reply. Would 
>> functx:is-a-number() [#1] work in your where clause?
>>
>> I'm completely unable to test... apologies.
>>
>> Best,
>> Bridger
>>
>> #1 http://www.xqueryfunctions.com/xq/functx_is-a-number.html
>>
>> On Sun, Feb 2, 2020, 7:22 PM Graydon Saunders  wrote:
>>>
>>> Hello all --
>>>
>>> So I have a CSV file, and I can pull that into BaseX in the hopes of 
>>> writing a query to extract a report.  I'm using 9.3.1 for the purpose.
>>>
>>> Not all of the Payment_Amount fields have a value, so any report-extracting 
>>> query has to filter those out of any calculations or the whole thing gets 
>>> infested with NaN.
>>>
>>> This works:
>>> let $xmlReport as document-node(element(csv)) :=
>>>  file:read-text('report.csv') => csv:parse( map { 'header': true(), 
>>> 'separator' : 'tab' })
>>>
>>> let $made as xs:double+ := for $value in 
>>> $xmlReport/csv/record/Payment_Amount[text() castable as xs:double]/number()
>>>   return $value
>>>
>>> return sum($made) => round(2)
>>>
>>> If I wanted to use a where clause,
>>>
>>> let $xmlReport as document-node(element(csv)) :=
>>>  file:read-text('report.csv') => csv:parse( map { 'header': true(), 
>>> 'separator' : 'tab' })
>>>
>>> let $made as xs:double+ := for $value in 
>>> $xmlReport/csv/record/Payment_Amount/number()
>>>   where ???
>>>   return $value
>>>
>>> return sum($made) => round(2)
>>>
>>> What do I put in the where clause?  I tried
>>> where not($value = NaN)
>>> and that was not successful:
>>> "Stopped at /home/graydon/git/writing/transform/urk.xq, 6/25:
>>> [XPTY0020] element(NaN): node expected, xs:double found: 3.38."
>>>
>>> where not($value = number('NaN'))
>>>
>>> didn't give an error but the query returns NaN so I know I didn't filter 
>>> any of the empty records from the sum.
>>>
>>> How ought that where clause be written?
>>>
>>> Thanks!
>>> Graydon
>>>


[basex-talk] Finalizing Query-Objects

2020-02-03 Thread Ben Engbers
Hi,

The people from CRAN strongly suggested adding tests (comparable to
Unit-tests) to my package (RBaseX). Their request led me to take another
critical look at my code.
So far the tests do not give an error message. But after completing the
last test, 'testthat' reports 1 failure without further explanation.
After changing the order in which the tests are executed, the failure is
always caused by the last test. Therefore I think that it is not the
tests that cause the error, but the finalize-process.

At this moment, my code is based upon 3 classes: 'RBaseXClient' creates
a new client-session. This session uses 'SocketClass' to communicate with
basexserver.  When used in query-mode, the session uses 'QueryClass' to
create new query-objects. Due to this architecture, it is easy to
explicitly close a regular query-object, but (at least in R) it is
difficult to close query-objects when finalizing the session-object.

How does the basexserver respond to closing the session without first
explicitly closing all open queries? Does this result in an error?

Ben


[basex-talk] Add command: name of the input will be set as path?

2020-02-03 Thread thufir



What does it mean that "if null, the name of input will be set as the path"?


Javadoc:

Add

public Add(java.lang.String path,
   java.lang.String input)

Constructor, specifying a target path and an input.

Parameters:
path - target path, optionally terminated by a new file name. 
If null, the name of the input will be set as path.

input - input file or XML string



I'm looking to add an xml file, so am using "null" for the path:

https://stackoverflow.com/q/60035605/262852


but what are the implications?  the "name of the input" will be "set as 
path"?  Where is the "name of the input"?  What is "path" in relation to 
a String which exists only in memory?



Just pass a string like:

new Add(null, stringXml).execute(context);

and that should add to the currently open database?






thanks,

Thufir