from:"Menashè Eliezer"

Re: [basex-talk] Slow query

2015-08-03 Thread Menashè Eliezer


I think I've already mentioned that the new query is different.
The reference to 8.2.1 is included here where also the old query can be 
found: 
https://www.mail-archive.com/basex-talk%40mailman.uni-konstanz.de/msg06544.html


With kind regards,
Menashè

On 08/03/2015 03:38 PM, Christian Grün wrote:

What was the last version it was working with?

8.2.1. Not really working, but better...

I ran the attached query with 8.2.1, and no index was used either. Are
you sure you sent me the correct query?

Sorry for confronting you with all those questions, but to help you, I
really need your help as well. Could you check the attached files
again and give me some hints on how to proceed?

Re: [basex-talk] Slow query

2015-08-03 Thread Menashè Eliezer


On 08/03/2015 03:24 PM, Christian Grün wrote:

What was the last version it was working with?

8.2.1. Not really working, but better...

Re: [basex-talk] No debug info

2015-08-03 Thread Menashè Eliezer


Hi,
I've simply used the client GUI. Should I see query logs in the server?
In the past maybe the logs I've seen were always a result of the Java 
client queries, not the GUI.


With kind regards,
Menashè

On 07/30/2015 10:23 AM, Christian Grün wrote:

I don't know why I don't see anymore query log events like "admin OK
QUERY" in the .logs folder.

I need more information: When did you get these particular log info?
Which API did you use? Any example available to reproduce that easily?

Re: [basex-talk] Applying indexes

2015-08-03 Thread Menashè Eliezer


Hi,

So with six exact-match conditions (on different XPaths), should I see 
six indexes being used?

Anyway, I'll send the exact query inside the old thread.

On 07/30/2015 10:21 AM, Christian Grün wrote:

Hi Menashè,

Because none of our index structures is particularly suited for range
queries, such index-driven requests may be slower than sequential
scans. Maybe you remember my last mail: Have you already tried to
store latitudes and longitudes in a fixed-size string representation?

Sure, I've also responded to it: 
https://www.mail-archive.com/basex-talk%40mailman.uni-konstanz.de/msg06648.html


Best regards,
Menashè

[basex-talk] Query plan: long printing

2015-07-29 Thread Menashè Eliezer

 

Hello,

I have a query which is quite fast, but I see in the query plan (using
the client GUI) that the printing is very slow, even when the number of
hits is zero.

I wonder what this means, and whether it is a problem.

I'm sending these messages as separate posts to help users find this
information.

--
With kind regards,
Menashè

[basex-talk] No debug info

2015-07-29 Thread Menashè Eliezer


Hello,

I don't know why I no longer see query log events like "admin OK QUERY" 
in the .logs folder.

Not even when I use the -d flag.
BaseX 8.2.2

--
With kind regards,
Menashè

[basex-talk] Applying indexes

2015-07-29 Thread Menashè Eliezer


Hello,

1. In which cases should I see the phrase "applying index" in the query
   plan? I have an old query (the one I sent here on 06/22/2015) where
   I saw "applying index" for only one condition, and even that one no
   longer appears with a newer version of BaseX (8.2.2) and a larger
   database. The database properties look right:
     Indexes
       Up-to-date: true
       TEXTINDEX: true
       ATTRINDEX: true
2. Only when a value mentioned in the query is found in the index, in
   the case of an "=" condition?
3. Always, in the case of a "<" or ">" condition, assuming the specific
   text/attribute has been indexed?
4. Everything I see in the result of index:facets is indexed, right?
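
For the "=" case, one way to check whether the text index can answer a lookup at all is to call it explicitly. This is only a minimal sketch: it assumes the database is named CDI, as elsewhere in these threads, and that "Italy" is a text value stored in it. The value is illustrative.

```xquery
(: Database properties, including the index flags listed above. :)
db:info("CDI"),

(: Explicit text index access: returns all text nodes whose value is
   exactly "Italy", then steps up to their parent elements. :)
db:text("CDI", "Italy")/..
```

If the explicit lookup is fast while the equivalent path expression is slow, the path was probably not rewritten for index access.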


--
With kind regards,
Menashè

Re: [basex-talk] Slow query

2015-07-14 Thread Menashè Eliezer


:)
I've thought to do it as a second step, but I should do it earlier.
Thank you.

With kind regards,
Menashè

On 07/14/2015 03:22 PM, Christian Grün wrote:

...it only makes sense if you store the data in its normalized representation.



Re: [basex-talk] Reporting function for a subset

2015-07-14 Thread Menashè Eliezer


Thank you for the helpful ideas!

With kind regards,
Menashè

On 07/14/2015 12:56 PM, Christian Grün wrote:

What about this?

let $nodes :=
   let $db := db:open("CDI")
   for $x in $db
   let $beginPosition := $x//startTime
   let $lon := xs:float($x//longitudine)
   let $lat := xs:float($x//latitudine)

   where
   $beginPosition>="1889-01-01" and $beginPosition<="2015-07-10"
   and $lat<=46.733  and  $lat>=-67.81
   and $lon<=72.7006667 and $lon >=-79.967
   return $x

return ...
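
One possible continuation of the elided return clause, combining this with the function items from earlier in the thread, is sketched below. The names $report-count and $report-years and the output element names are hypothetical. The sketch also assumes that each document holds exactly one startTime element, and it uses only the date filter for brevity.

```xquery
(: Run the main query once and keep the matching documents. :)
let $nodes :=
  for $x in db:open("CDI")
  let $beginPosition := $x//startTime
  where $beginPosition >= "1889-01-01" and $beginPosition <= "2015-07-10"
  return $x

(: Report 1: total number of matching documents. :)
let $report-count := function($n as node()*) {
  <count>{ count($n) }</count>
}

(: Report 2: number of matching documents per year. :)
let $report-years := function($n as node()*) {
  for $x in $n
  group by $year := substring($x//startTime, 1, 4)
  order by $year
  return <year value="{ $year }" records="{ count($x) }"/>
}

(: Apply every report to the same subset and wrap the results. :)
return <reports>{
  for $report in ($report-count, $report-years)
  return $report($nodes)
}</reports>
```

The filters are evaluated once, and each reporting function only sees the subset bound to $nodes.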



Re: [basex-talk] Slow query

2015-07-14 Thread Menashè Eliezer


Hi,
It sounds like a great idea and I can also apply it to the date 
comparisons, but unfortunately the new query is much slower.

Please see the attached log.

With kind regards,
Menashè

On 07/14/2015 12:50 PM, Christian Grün wrote:

Should geo:within of http://docs.basex.org/wiki/Geo_Module help?

The functions of the Geo Module don't use any index structures, so I
am afraid they won't speed up the query.

One more idea: you could convert all latitudes and longitudes to
strings with a fixed number of digits
_

(:~ Allowed range. :)
declare variable $RANGE := 999999;
(:~ Minimum latitude. :)
declare variable $LAT-MIN := -90;
(:~ Maximum latitude. :)
declare variable $LAT-MAX := 90;

(:~
  : Converts a double value to a normalized string value
  : with a fixed number of digits.
  : @param $num number to be converted
  : @param $min minimum allowed value
  : @param $max maximum allowed value
  : @return resulting value
  :)
declare function local:normalize(
   $num as xs:double,
   $min as xs:integer,
   $max as xs:integer
) {
   let $norm := $RANGE * ($num - $min) div ($max - $min)
   return format-number($norm, '000000')
};

(: Run code for various latitude values :)
for $latitude in (-90, -89., -13.345, 0, 89.9)
return local:normalize($latitude, $LAT-MIN, $LAT-MAX)
_

Next, you could do string comparisons on these values:

   for $doc in db:open("CDI")
   let $lat := $doc//latitude
   let $lon := $doc//longitude
   where $lat >= "883387" and $lat <= "893463"
     and $lon >= "173467" and $lon <= "178745"
   return db:node-pre($doc)

It should be fast enough if the maximum value is not much bigger than
the minimum value.
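
The two steps above can be combined so that the query bounds are normalized by the same function as the stored values. This is a sketch only: it assumes the latitude elements in the CDI database already hold six-digit normalized strings, and it takes the numeric bounds from the original query in this thread.

```xquery
declare variable $RANGE := 999999;

(: Maps a value from [$min, $max] to a zero-padded six-digit string. :)
declare function local:normalize(
  $num as xs:double,
  $min as xs:integer,
  $max as xs:integer
) as xs:string {
  format-number($RANGE * ($num - $min) div ($max - $min), '000000')
};

(: Normalize the bounds once, outside the loop. :)
let $south := local:normalize(-67.81, -90, 90)
let $north := local:normalize(46.733, -90, 90)
for $doc in db:open("CDI")
let $lat := $doc//latitude
(: String comparison: only meaningful because all values have the same length. :)
where $lat >= $south and $lat <= $north
return db:node-pre($doc)
```

Longitudes would be handled the same way, with -180 and 180 as $min and $max.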


Compiling:
- inlining $norm_3
- simplifying flwor expression
- pre-evaluating -90
- pre-evaluating -180
- pre-evaluating db:open("CDI")
- inlining local:normalize#3
- removing redundant $num_13 as xs:double cast.
- removing redundant $min_14 as xs:integer cast.
- removing redundant $max_15 as xs:integer cast.
- inlining $num_13
- inlining $min_14
- pre-evaluating (46.733 - -90)
- pre-evaluating (99 * 136.733)
- inlining $max_15
- pre-evaluating (90 - -90)
- pre-evaluating (1.36732863267E8 div 180)
- pre-evaluating format-number(759627.018149, "00")
- simplifying flwor expression
- pre-evaluating -67.81
- inlining local:normalize#3
- removing redundant $num_16 as xs:double cast.
- removing redundant $min_17 as xs:integer cast.
- removing redundant $max_18 as xs:integer cast.
- inlining $num_16
- inlining $min_17
- pre-evaluating (-67.81 - -90)
- pre-evaluating (99 * 22.188)
- inlining $max_18
- pre-evaluating (90 - -90)
- pre-evaluating (2.218997781E7 div 180)
- pre-evaluating format-number(123277.6544999, "00")
- simplifying flwor expression
- inlining local:normalize#3
- removing redundant $num_19 as xs:double cast.
- removing redundant $min_20 as xs:integer cast.
- removing redundant $max_21 as xs:integer cast.
- inlining $num_19
- inlining $min_20
- pre-evaluating (72.7006667 - -180)
- pre-evaluating (99 * 252.7006667)
- inlining $max_21
- pre-evaluating (180 - -180)
- pre-evaluating (2.52700413999E8 div 360)
- pre-evaluating format-number(701945.5944425925, "00")
- simplifying flwor expression
- pre-evaluating -79.967
- inlining local:normalize#3
- removing redundant $num_22 as xs:double cast.
- removing redundant $min_23 as xs:integer cast.
- removing redundant $max_24 as xs:integer cast.
- inlining $num_22
- inlining $min_23
- pre-evaluating (-79.967 - -180)
- pre-evaluating (99 * 100.033)
- inlining $max_24
- pre-evaluating (180 - -180)
- pre-evaluating (1.00033233267E8 div 360)
- pre-evaluating format-number(277870.0924074075, "00")
- simplifying flwor expression
- rewriting descendant-or-self step(s)
- rewriting descendant-or-self step(s)
- inlining local:normalize#3
- removing redundant $min_26 as xs:integer cast.
- removing redundant $max_27 as xs:integer cast.
- inlining $num_25 as xs:double
- inlining $min_26
- inlining $max_27
- pre-evaluating (180 - -180)
- simplifying flwor expression
- rewriting descendant-or-self step(s)
- inlining local:normalize#3
- removing redundant $min_29 as xs:integer cast.
- removing redundant $max_30 as xs:integer cast.
- inlining $num_28 as xs:double
- inlining $min_29
- inlining $max_30
- pre-evaluating (90 - -90)
- simplifying flwor expression
- rewriting ($beginPosition_10 >= "1889-01-01")
- rewriting ($beginPosition_10 <= "2015-07-10")
- atomic evaluation of ($lat_12 <= $north_5)
- atomic evaluation of ($lat_12 >= $south_6)
- atomic evaluation of ($lon_11 <= $east_7)
- atomic evaluation of ($lon_11 >= $west_8)
- rewriting ("1889-01-01" <= $beginPosition_10 and $beginPosition_10 <= "2015-07-10" and ($lat_12 <= $north_5) and ($lat_12 >= $south_6) and ($lon_11 <= $east_7) and ($lon_11 >= $west_8))
- inlining $db_4
- inlining $north_5
- rewriting ($lat_12 <= "759627")
- inlining $s

Re: [basex-talk] Reporting function for a subset

2015-07-14 Thread Menashè Eliezer

I'm sorry, but it's not clear how $nodes can include the result of my 
main query:


   xquery version "3.0"; declare option output:item-separator ",";
   let $db := db:open("CDI")
   for $x in $db
   let $beginPosition := $x//startTime
   let $lon := xs:float($x//longitudine)
   let $lat := xs:float($x//latitudine)

   where
   $beginPosition>="1889-01-01" and $beginPosition<="2015-07-10"
   and $lat<=46.733  and  $lat>=-67.81
   and $lon<=72.7006667 and $lon >=-79.967
   return $x

With kind regards,
Menashè

On 07/14/2015 12:51 PM, Christian Grün wrote:

E.g. like that:

   let $count := function($nodes) { count($nodes) }
   let $nodes := (<a/>, <b/>)  (: any node sequence :)
   return $count($nodes)




Re: [basex-talk] Reporting function for a subset

2015-07-14 Thread Menashè Eliezer

Thank you, but would you please show me how to pass (only once) for each 
function the xml sequence which results from my main query, instead of 
simple numbers as in your example?


With kind regards,
Menashè

On 07/14/2015 12:30 PM, Christian Grün wrote:

I hope it's clear.

Sorry, I'm still confuzzled. What is the problem? I guess you want to
define different, exchangable reporting functions for more or less the
same input (dataType, device, ...)?

Here is one way to define functions and call them in a second step:

   let $add := function($a, $b) { $a + $b }
   let $multiply := function($a, $b) { $a * $b }
   for $function in ($add, $multiply)
   return $function(3, 5)

Instead of $add and $multiply, you could have $report-pivoting and
$report-count.




Re: [basex-talk] Reporting function for a subset

2015-07-14 Thread Menashè Eliezer


Hi,
The initial part of the code still needs to be modified, so here is only
the essence of one of the pivoting reports:

for $singleDataType in $dataType
for $singleDevice in $device
for $singleAvailability in $availability
for $singleCountry in $country
for $singleParameter in $parameter
group by $singleDataType, $singleDevice, $singleAvailability,
         $singleCountry, $singleParameter
order by $singleDataType, $singleDevice, $singleAvailability,
         $singleCountry, $singleParameter


return Availability="{$singleAvailability}" Country="{$singleCountry}" 
Parameter="{$singleParameter}" NumberOfRecords="{count($current-pre)}"/>


Another report will count all records with one condition fewer:

group by $singleDataType,$singleDevice,$singleAvailability,$singleCountry
order by $singleDataType,$singleDevice,$singleAvailability,$singleCountry

return Availability="{$singleAvailability}" Country="{$singleCountry}" 
Parameter="Any" NumberOfRecords="{count($current-pre)}"/>


I hope it's clear.

With kind regards,
Menashè

Re: [basex-talk] Slow query

2015-07-14 Thread Menashè Eliezer


Hi,

On 07/14/2015 11:05 AM, Christian Grün wrote:

It may be slightly faster if you remove the explicit string() conversion

No, it's actually slower.
But please note that BaseX provides no native range index, which would 
be a good fit for your longitude/latitude filter. 

Should geo:within of http://docs.basex.org/wiki/Geo_Module help?

Re: [basex-talk] Reporting function for a subset

2015-07-14 Thread Menashè Eliezer


Hi Christian,

On 07/14/2015 09:30 AM, Christian Grün wrote:

What about the reporting function, does it already exist? What is a
subset: Is it a sequence of XML nodes resulting from a path
expression? Could you possibly provide us with some code you have
written so far?
Christian

I mean a sequence of XML nodes resulting from a 'for' loop. Then there
are queries that, for the same filters (or with a single extra condition),
return different XML results.

For example: "group by" and time distribution table per year and month.

With kind regards,
Menashè

Re: [basex-talk] Slow query

2015-07-14 Thread Menashè Eliezer


Hi Christian,
oops, I'm sorry. It's attached.
There are text and attribute indexes.

With kind regards,
Menashè

On 07/14/2015 09:32 AM, Christian Grün wrote:

Hi Menashè,

The attached log file is empty. Maybe it's sufficient if you provide
us with the query and give us information on the query compilation
(are any indexes used?).

C.






Compiling:
- pre-evaluating db:open("CDI")
- rewriting descendant-or-self step(s)
- rewriting descendant-or-self step(s)
- rewriting descendant-or-self step(s)
- rewriting ($beginPosition_2 >= "1889-01-01")
- rewriting ($beginPosition_2 <= "2015-07-10")
- rewriting ($lat_4 <= 46.733)
- pre-evaluating -67.81
- rewriting ($lat_4 >= -67.81)
- rewriting ($lon_3 <= 72.7006667)
- pre-evaluating -79.967
- rewriting ($lon_3 >= -79.967)
- rewriting ("1889-01-01" <= $beginPosition_2 and $beginPosition_2 <= "2015-07-10" and $lat_4 <= 46.733 and -67.81 <= $lat_4 and $lon_3 <= 72.7006667 and -79.967 <= $lon_3)
- inlining $db_0
- inlining $beginPosition_2
- inlining $lon_3
- inlining $lat_4
- rewriting where clause(s)
Query:
xquery version "3.0"; declare option output:item-separator ","; let $db := db:open("CDI") for $x in $db let $beginPosition := string($x//startTime) let $lon := xs:float($x//longitudine) let $lat := xs:float($x//latitudine) where $beginPosition>="1889-01-01" and $beginPosition<="2015-07-10" and $lat<=46.733 and $lat>=-67.81 and $lon<=72.7006667 and $lon >=-79.967 return db:node-pre($x)
Optimized Query:
for $x_1 in ((db:open-pre("CDI",0), ...))["1889-01-01" <= string(descendant::*:startTime) <= "2015-07-10"][-67.81 <= descendant::*:latitudine cast as xs:float? <= 46.733][-79.967 <= descendant::*:longitudine cast as xs:float? <= 72.7006667] return db:node-pre($x_1)
Result:
- Hit(s): 374736 Items
- Updated: 0 Items
- Printed: 2048 KB
- Read Locking: local [CDI]
- Write Locking: none
Timing:
- Parsing: 4.51 ms
- Compiling: 95.07 ms
- Evaluating: 28.92 ms
- Printing: 4458.8 ms
- Total Time: 4587.29 ms
Query plan:

[basex-talk] Reporting function for a subset

2015-07-13 Thread Menashè Eliezer


Hello,
I want to call a reporting function with a subset of documents created
inside a loop; the function should return an XML report.

I couldn't find information on whether and how this can be done.
The idea is that there is a main query which results in a subset. Then I
want to run different processing steps on this subset only, without
querying the whole collection again with the filters I've already applied
in the main query.
The all-in-one query will then return the results of the different
functions inside my own XML.

Is it possible?

--
With kind regards,
Menashè

Re: [basex-talk] Slow query

2015-07-13 Thread Menashè Eliezer


Hello,
Creating a database of partial XML documents had almost no effect.
Therefore I've created a database with a very simple XML structure. I'm
attaching an example (demo.xml).

BaseX version: 8.2.2
Number of documents: 374739

However, the attached query takes 4 seconds (see the attached
simple_query.log). I don't know if that is considered normal performance,
but my real query is different:
I'm copying all the documents which match my query to a newly
created temporary collection, to get faster processing for this
subset: reporting, etc.

Adding to db (remote Java client): 12 sec.
Optimising the db (remote Java client): 23 sec.
Both the Java client and the BaseX server are installed on powerful servers.
Are these numbers normal?
The attached results are based on a local client (using the BaseX GUI).

In the future, I will have many more documents to handle...
Any ideas?
I can also change the schema of my new XML.

As for the idea of creating a new temporary db, I'm checking an
alternative: returning everything I need, including reports, in one
query and in one XML document.


With kind regards,
Menashè




  2012-08-29T00:00:00
  2012-08-29T01:00:00
  45.6265
  13.7718333
  Ferriera
  by negotiation
  
  AHGT (Vertical spatial coordinates)
  AMON (Ammonium concentration parameters in the water column)
  NTOT (Particulate total and organic nitrogen concentrations in the water column)
  NTRA (Nitrate concentration parameters in the water column)
  NTRI (Nitrite concentration parameters in the water column)
  PHOS (Phosphate concentration parameters in the water column)
  SLCA (Silicate concentration parameters in the water column)
  TDPX (Dissolved total or organic phosphorus concentration in the water column)
  
  
  discrete water samplers
  
  
  
  2431 (OGS (Istituto Nazionale di Oceanografia e di Geofisica Sperimentale), Department of Biological Oceanography), from Italy
Italy

Re: [basex-talk] Slow query

2015-06-25 Thread Menashè Eliezer


Hi,


Just create a new database from the input data with this option turned on.

I've expected db:add to do it. Not important.


If it's not well-formed, you can't store it in BaseX. If you can do
so, it would be an error (and rather surprising to me ;).

Well, "not well-formed" is the response I get when I query for the
document after doing what I described earlier.

Maybe this is the reason that the new db does not 'function'.
I'll try adding the header lines to each new XML document before using db:add.


Cheers,
C.

Re: [basex-talk] Slow query

2015-06-25 Thread Menashè Eliezer


Hi Christian,
True. I forgot to mention that the 'stripns' option (as all other XML 
parsing options [1]) only applies to newly parsed XML strings. 

But these strings belong to new documents being added using db:add.
Anyway, how can I strip the namespaces in my new database? I don't need 
them.



Anyway, maybe because the xml are not valid, I get always 0 hits unless I
ask to return the doc itself.

Hm, the stored data must be well-formed, otherwise it couldn't be
stored. And it will always be well-formed if you store it via db:add
(in XML terminology, validity requires a schema [2]).

Did you already have a look at the data stored in your new database?
Christian

I've used db:add. The data is not well-formed. It's just like a copy & paste 
of the relevant XML. No headers.

This is how I've created it:

   declare namespace gco = "http://www.isotc211.org/2005/gco";
   declare namespace gmd = "http://www.isotc211.org/2005/gmd";
   declare namespace gml = "http://www.opengis.net/gml";
   declare namespace gmx = "http://www.isotc211.org/2005/gmx";
   declare namespace sdn = "http://www.seadatanet.org";

   declare namespace fn = "http://www.w3.org/2005/xpath-functions";
   declare namespace xs = "http://www.w3.org/2001/XMLSchema";

   let $db := db:open("ENTIRE-CDI", "Vertical_profiles")
   for $x in $db/gmd:MD_Metadata/gmd:identificationInfo/sdn:SDN_DataIdentification
   let $id := string($x/gmd:citation/gmd:CI_Citation/gmd:alternateTitle/gco:CharacterString)
   return db:add("CDI", $x, 'Vertical_profiles/' || $id || '.xml',
     map { 'stripns': true(), 'intparse': true() })


With kind regards,
Menashè

On 06/25/2015 03:05 PM, Christian Grün wrote:

Hi Menashè,


I've used  map { 'stripns': true(), 'intparse': true() }) in db:add, but the
namespaces were not removed, e.g. there is gml:beginPosition.

True. I forgot to mention that the 'stripns' option (as all other XML
parsing options [1]) only applies to newly parsed XML strings.


Anyway, maybe because the xml are not valid, I get always 0 hits unless I
ask to return the doc itself.

Hm, the stored data must be well-formed, otherwise it couldn't be
stored. And it will always be well-formed if you store it via db:add
(in XML terminology, validity requires a schema [2]).

Did you already have a look at the data stored in your new database?
Christian

[1] http://docs.basex.org/wiki/Options#STRIPNS
[2] https://en.wikipedia.org/wiki/XML#Schemas_and_validation

Re: [basex-talk] Slow query

2015-06-25 Thread Menashè Eliezer


Hi Christian,

I've created a new database with only the relevant part of each xml. 
It's much smaller and I hope it would help.
The created XML is not valid since the xml and xml-model tags are 
missing, but that shouldn't be a problem.
I've used map { 'stripns': true(), 'intparse': true() } in db:add, but 
the namespaces were not removed, e.g. there is gml:beginPosition.
Anyway, maybe because the XML is not valid, I always get 0 hits unless 
I ask to return the doc itself.

This happens even with where db:node-id($ext)=0,
or without conditions, when I ask to return 
$ext/sdn:SDN_DataIdentification instead of $ext (the XML doc).


With kind regards,
Menashè

On 06/24/2015 01:58 PM, Christian Grün wrote:

I couldn't find an option in db:add to specify an XPath. In my case, I
need to extract only the elements under
/gmd:MD_Metadata/gmd:identificationInfo/sdn:SDN_DataIdentification

We try to avoid XPath strings arguments whenever possible. Instead,
simply use XQuery, which allows you to do all kinds of things.

Example 1 (add one document per element):

   for $node at $pos in /gmd:MD_Metadata/.
   return db:add('db', $node, $pos || '.xml')

Example 2 (add single document):

   db:add('db', element xml {
     /gmd:MD_Metadata/.
   }, 'doc.xml')

Cheers,
Christian

Re: [basex-talk] db:add OPTIONS

2015-06-25 Thread Menashè Eliezer


Great. Thanks!

With kind regards,
Menashè

On 06/25/2015 01:51 PM, Christian Grün wrote:

I've followed the documentation of db:add which states: Allowed options are
all parsing and XML parsing options.

"The key must be in lower case"… Agreed, we should possibly add this
in the documentation.



using OPTIONS as map { 'STRIPNS': true() }
However, I get an error that STRIPNS is unknown database option.
Any idea?

--
With kind regards,
Menashè

[basex-talk] db:add OPTIONS

2015-06-25 Thread Menashè Eliezer


Hello,
I've followed the documentation of db:add, which states: "Allowed options 
are all parsing and XML parsing options."

I'm using OPTIONS as map { 'STRIPNS': true() }.
However, I get an error that STRIPNS is an unknown database option.
Any idea?

--
With kind regards,
Menashè

Re: [basex-talk] Slow query

2015-06-24 Thread Menashè Eliezer


Hi Christian,

The usual approach is to simply create another database that only
contains the relevant parts of your document. This can directly be
done in XQuery (using db:create, db:add, ...), or, if memory
consumption is too high, by exporting and importing parts of your
document.

I couldn't find an option in db:add to specify an XPath. In my case, I 
need to extract only the elements under 
/gmd:MD_Metadata/gmd:identificationInfo/sdn:SDN_DataIdentification


With kind regards,
Menashè

Re: [basex-talk] Slow query

2015-06-23 Thread Menashè Eliezer


Thank you Christian for the helpful reply.

With kind regards,
Menashè

On 06/23/2015 01:32 PM, Christian Grün wrote:

Is there also an option to define inside the part only the xpaths which I would 
need?

I guess no, but to be honest, I am not exactly sure what you mean?
Would you like to restrict indexing to specific parts of the document?
In that case, you'll have to wait for someone implementing [1]
(contributors are always welcome)…


Another question, how can I know if the following values have been exceeded in 
a specific database? Quoting:
MAXLEN

You will know by looking at your data. Just write a query that returns
the maximum string lengths of all distinct paths.


MAXCATS

You can e.g. use index:facets().

Hope this helps,
Christian

[1] https://github.com/BaseXdb/basex/issues/59
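
The MAXLEN check suggested above could look roughly like this. It is a sketch under assumptions: the database is called "CDI", 96 is the documented default of MAXLEN, and string-length() is taken as a stand-in for however the limit is actually measured, which this thread does not state. Only text nodes are inspected; attribute values would need a similar query.

```xquery
(: Sketch only: lists element paths whose text exceeds the assumed limit. :)
let $maxlen := 96
for $t in db:open('CDI')//text()
let $len := string-length($t)
where $len > $maxlen
group by $path := string-join($t/ancestor::*/name(), '/')
order by max($len) descending
return $path || ': ' || max($len)
```

An empty result would suggest that no text value is longer than the limit. For MAXCATS, the call mentioned in the reply is index:facets('CDI').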

Re: [basex-talk] Slow query

2015-06-23 Thread Menashè Eliezer


Thank you Christian,
I may try it later as a last option. I hope you can find an alternative 
solution.
Is there also an option to define inside the part only the xpaths which 
I would need?
Otherwise, many elements and attributes which I don't need are being 
indexed.


Another question, how can I know if the following values have been 
exceeded in a specific database? Quoting:



MAXLEN

Signature: MAXLEN [int]
Default: 96
Summary: Specifies the maximum length of strings that are to be 
indexed by the name, path, value, and full-text index structures. The 
value of this option will be assigned once to a new database, and cannot 
be changed after that.


MAXCATS

Signature: MAXCATS [int]
Default: 100
Summary: Specifies the maximum number of distinct values (categories) 
that will be stored together with the element/attribute names or unique 
paths in the Name Index or Path Index. The value of this option 
will be assigned once to a new database, and cannot be changed after that.



With kind regards,
Menashè

On 06/23/2015 12:51 PM, Christian Grün wrote:

Is there an option to ask BaseX to parse only a part of the imported xml
files under a specific xpath, (or at least limit useless indexing of non
relevant components)? I don't need the rest of the xml files, even though
it's not too big. Maybe it can help.

The usual approach is to simply create another database that only
contains the relevant parts of your document. This can directly be
done in XQuery (using db:create, db:add, ...), or, if memory
consumption is too high, by exporting and importing parts of your
document.

Hope this helps,
Christian

Re: [basex-talk] Slow query

2015-06-23 Thread Menashè Eliezer

Hi Christian,
Even when I leave only the first filter and test it as standalone it
takes more than 8 seconds:

Result:
- Hit(s): 25 Items
- Updated: 0 Items
- Printed: 2048 KB
- Read Locking: local [CDI]
- Write Locking: none
Timing:
- Parsing: 2.0 ms
- Compiling: 107.74 ms
- Evaluating: 8085.55 ms
- Printing: 106.4 ms
- Total Time: 8301.69 ms

With kind regards,
Menashè

On 06/22/2015 07:57 PM, Christian Grün wrote:

Hi Menashè,

QUERY[0] xquery version "3.0"; declare namespace queryName ='GetIDS';
declare namespace gco = "http://www.isotc211.org/2005/gco"; declare
[...]

It would be great if you could help us and simplify the query, such
that we can have a look at the core issue.

Is there an undocumented way to log the full xquery in BaseX server logs?

The maximum size of log entries can be adjusted via the option LOGMSGMAXLEN [1].

Cheers,
Christian

[1] http://docs.basex.org/wiki/Options#LOGMSGMAXLEN

I've seen the -V option, but I don't use the standalone version, but:
java -cp /usr/share/java/basex.jar org.basex.BaseXServer
-d doesn't give me extra query info.

With kind regards,
Menashè

On 02/03/2015 01:13 PM, Menashè Eliezer wrote:

Hi Christian,

Thank you! The query time is now down to 0.5 sec!

The biggest improvement is related to the query rephrasing you've
suggested.
Then the latest snapshot also helps a lot!
You may want to know that in the log of the latest snapshot I see
applying attribute index for "7827"
which is not clear to the user, whereas BaseX80-20150130.124009, which
also used indexing, showed:
applying attribute index for ("ALKY", "AYMD")

I'm attaching the first and the second launch of the query using
BaseXGUI. Relaunching the same query reduces the time from over 1 second to
0.5 second.
Some data:
BaseX80-20150130.124009
Total Time: 30676.02 ms
After using "for $x in
collection("ALL-CDIS")/gmd:MD_Metadata/gmd:identificationInfo/sdn:SDN_DataIdentification":
Total Time: 5456.74 ms
applying attribute index for ("ALKY", "AYMD") in log.
Second launch: 1333.71 ms
Latest snapshot (BaseX80-20150202.121033):
1st: Total Time: 1873.02 ms
2nd: Total Time: 548.62 ms

With kind regards,
Menashè

On 02/02/2015 02:02 PM, Menashè Eliezer wrote:

Hi Christian,

Thank you very much! Unfortunately I'll be at the office only tomorrow.

Menashè

On Sat, 31 Jan 2015 16:42:32 +0100, Christian Grün
wrote:

Hi Menashè,

With the latest snapshot [1], your original query should now be
rewritten for index access as well. Looking forward to your tests,

Christian

PS: In terms of performance, it may still be worthwhile to move
redundant paths to the for clause; but just try and see.

[1] http://files.basex.org/releases/latest/

On Fri, Jan 30, 2015 at 9:49 PM, Christian Grün
wrote:

Hi Menashè,

Should I expect to see the usage of an index for each of the where
phrases?

Usually, only one predicate will be rewritten for index access, and
the remaining conditions will be answered sequentially.

Have a nice weekend!

Enjoy,
Christian

Menashè

On Fri, 30 Jan 2015 18:11:59 +0100, Christian Grün
wrote:

Hi Menashè,

Thanks for the XML samples you sent me in private. I noticed that the
index rewritings will only be triggered if you formulate your query as
follows:

OLD:
for $x in collection("ALL-CDIS")
where $x/gmd:MD_Metadata/gmd:identificationInfo/...
return ...

NEW:
for $x in collection("ALL-CDIS")/gmd:MD_Metadata
where $x/gmd:identificationInfo/...
return ...

It's difficult to explain in short sentences why Variant 1 cannot be
optimized that straightforwardly (basically, it's quite a different
pattern to look for), but I'll check whether we can extend our matcher
to also support this kind of query.

So, if possible, I would recommend that for now (and at least for
testing) you move the root element test after the collection()
function. I noticed that the first three child steps are the same in
all of your conditions:

gmd:MD_Metadata/gmd:identificationInfo/sdn:SDN_DataIdentification

If that is always the case, it surely makes sense to move all
of them to the "for" clause.

Looking forward to your updated performance tests,
Christian
___

On Fri, Jan 30, 2015 at 5:55 PM, Christian Grün
wrote:

Could you possibly provide me with a small snapshot of your data
sources (one, two documents might be sufficient)?

On Fri, Jan 30, 2015 at 5:52 PM, Menashè Eliezer
wrote:

Almost the same speed with version 8.0.
No indexing (no "applying" in the query info).
As I've attached before, indexes are active for this DB.

With kind regards,
Menashè

On 01/30/2015 05:31 PM, Christian Grün wrote:

It's indeed interesting that your query does not use any of the
existing index structures (if they did, you would f

Re: [basex-talk] Slow query

2015-06-22 Thread Menashè Eliezer


Hi,
I've used ssh -X for producing query info right from the server machine. 
Please see attached.

I hope it would help.

With kind regards,
Menashè

Re: [basex-talk] Slow query

2015-06-22 Thread Menashè Eliezer


Hi Christian,
I'm having performance problems again. I have BaseX 8.2.1.
As you may remember, you've recommended changing
'for $x in collection("CDI")' to 'for $x in 
collection("CDI")/gmd:MD_Metadata/gmd:identificationInfo/sdn:SDN_DataIdentification'.
However, I've discovered I cannot specify an XPath while working with IDs 
(db:node-pre).
It's a multi-step process: the client program sends the search filter 
defined by the end user to the server and gets IDs back.
Then there are several queries for getting different information about 
this specific subset. Instead of redefining the filters, the only 
condition is

where db:node-pre($x)=$ids

for better performance.
Once I specify an XPath, it seems that the IDs have no meaning. The 
result set is always empty once they are used.
So, I've gone back to using 'for $x in collection("CDI")' in the first 
query that gets all IDs, but the performance is extremely low.


I'm attaching the query and its related info using BaseXGUI (local 
server) with a much smaller database. The performance seems OK.


I'm using your BaseXClient.java; however, I already see the delay in the 
BaseX server logs:
 QUERY[0] xquery version "3.0"; declare namespace queryName ='GetIDS'; 
declare namespace gco = "http://www.isotc211.org/2005/gco"; declare 
namespace gmd = "http://www.isotc211.org/2005/gmd"; declare 
namespace gml = "http://www.opengis.net/gml"; declare namespace 
gmx="http://www.isotc211.org/2005/gmx"; declare namespace sdn = 
"http://www.seadatanet.org"; declare namespace fn = 
"http://www.w3.org/2005/xpath-functions"; declare namespace xs = 
"http://www.w3.org/2001/XMLSchema"; declare namespace output = 
"http://www.w3.org/2010/xslt-xquery-serialization"; declare option 
output:method 'xml';declare option output:item-separator ","; let $db := 
db:open("CDI") for $x in $db where 
$x/gmd:MD_Metadata/gmd:identificationInfo/sdn:SDN_DataIdentification/gmd:extent/gmd:EX_Extent/gmd:geographicElement/gmd:EX_GeographicBoundingBox/gmd:westBoundLongitude/gco:Decimal>="-5.8447265625" 
and 
$x/gmd:MD_Metadata/gmd:identificationInfo/sdn:SDN_DataIdentification/gmd:extent/gmd:EX_Extent/gmd:geographicElement...0.17 ms
110 16:36:09.713 192.168.155.30:39211 admin OK RESULTS[0] 25957.11 ms


Then I have other slow queries, but I would like to focus in this phase 
on the biggest delay.

Server: Java 1.7.0_79, VM="-XX:MaxPermSize=512m -Xms3096m -Xmx3096m"
The network layer between client and server is very fast.

P.S.
Is there an undocumented way to log the full xquery in BaseX server logs?
I've seen the -V option, but I don't use the standalone version, but:
java -cp /usr/share/java/basex.jar org.basex.BaseXServer
-d doesn't give me extra query info.


With kind regards,
Menashè
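
One possible cause of the empty result set, offered as an assumption rather than a confirmed diagnosis: a pre value identifies one specific node, so IDs collected from document nodes will not match the sdn:SDN_DataIdentification elements that the longer path binds to $x. The sketch below collects the IDs from the same kind of node that is looked up later, and resolves them with db:open-pre instead of scanning the collection. It uses a numeric literal in the comparison, whereas the logged query compares against a quoted string; in the real multi-step setup, $ids would be supplied by the client.

```xquery
declare namespace gco = "http://www.isotc211.org/2005/gco";
declare namespace gmd = "http://www.isotc211.org/2005/gmd";
declare namespace sdn = "http://www.seadatanet.org";

(: Step 1: collect pre values of the SDN_DataIdentification elements. :)
let $ids :=
  for $x in db:open("CDI")/gmd:MD_Metadata/gmd:identificationInfo/sdn:SDN_DataIdentification
  where $x/gmd:extent/gmd:EX_Extent/gmd:geographicElement/gmd:EX_GeographicBoundingBox/gmd:westBoundLongitude/gco:Decimal >= -5.8447265625
  return db:node-pre($x)

(: Step 2: resolve the same pre values directly, without a "where" scan. :)
for $id in $ids
return db:open-pre("CDI", $id)
```

Pre values can change when the database is updated, so this only holds as long as the data stays unchanged between the two steps.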

[basex-talk] BaseX Java examples

2015-05-14 Thread Menashè Eliezer


Hello,
The page http://docs.basex.org/wiki/Java_Examples is very detailed.
However, I wonder in which cases one should use the BaseX XQJ API and 
not the standard client.

e.g., which client offers a better performance?

--
With kind regards,
Menashè

Re: [basex-talk] Add metadata to a group of documents inside the database

2015-05-13 Thread Menashè Eliezer


Vincent,

Thank you for the creative solution.

I've also commented the feature request as a reply to your question.

With kind regards,
Menashè

On 05/12/2015 03:57 PM, Lizzi, Vincent wrote:

Menashè,

In similar situations I've used a second database to store metadata at the same 
path as the document in the primary database. For example:

db:open('database', '/path/a.xml')
db:open('database_metadata', '/path/a.xml')

Also, there is a feature request for adding properties as additional metadata 
for documents in the database:

https://github.com/BaseXdb/basex/issues/988

If I may ask, what metadata information do you need to record about each 
document?

Vincent



-Original Message-
From:basex-talk-boun...@mailman.uni-konstanz.de  
[mailto:basex-talk-boun...@mailman.uni-konstanz.de] On Behalf Of Christian Grün
Sent: Tuesday, May 12, 2015 8:41 AM
To: Menashè Eliezer
Cc: BaseX
Subject: Re: [basex-talk] Add metadata to a group of documents inside the 
database

If you create a database with a single CREATE or db:create call, the resulting 
path structure will reflect the original directory structure. However, you can 
also add files and directories to specific target paths in a second step, e.g. 
via db:add [1]. Our Wiki should contain all relevant information (see e.g. [2]).

[1]http://docs.basex.org/wiki/Database_Module#db:add
[2]http://docs.basex.org/wiki/Databases
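
Adding each group to its own target path could be sketched like this. The database name and the two source directories are hypothetical; only the use of db:add with a target path comes from the reply above.

```xquery
(: Sketch only: database name and directories are hypothetical. :)
(
  db:add('db', '/data/origin-a/', 'group1/'),
  db:add('db', '/data/origin-b/', 'group2/')
)
```

A later query can then be restricted to one group with db:open('db', 'group1/'), or cover both groups with db:open('db').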



Re: [basex-talk] Add metadata to a group of documents inside the database

2015-05-12 Thread Menashè Eliezer


Hi Christian,

Thank you very much for your reply!
As for the path: if a database is created from all files in a specific 
folder that has subfolders 'group1', 'group2', etc., is the path derived 
from these subfolders when the files are added to the database?


With kind regards,
Menashè

On 05/12/2015 02:19 PM, Christian Grün wrote:

Hi Menashè,

Just use different database paths for each group (e.g. '/path1/' and
'/path2/'), and specify the sub path with db:open:

   db:open('db', '/path1')/...

You can also store the documents in two separate databases and use a
single XQuery expression to query all documents:

   for $db in ('db1', 'db2')
   return db:open($db)/...

Hope this helps,
Christian



[basex-talk] Add metadata to a group of documents inside the database

2015-05-12 Thread Menashè Eliezer


Hello,
I have two groups of XML files to be included in the same database.
Usually the same query will be performed on both of them, but I need to 
be able to query only one group.
The difference between the groups is known only when adding the XML files, 
e.g. the origin of the files. This information is not found inside the 
files, and I prefer not to modify their content.


I was hoping to be able to define groups, using subfolders and base-uri 
or multiple collections, but I know it's not possible using BaseX.
Maybe I can tag them? The group type is both a property and a possible 
filter in a query, so I need good performance.


Right now I see two alternatives:
1. Using two separate databases. Once I need to query all files, I'll 
make the same query on multiple databases...
2. One database, but a query to a new xml document, which includes list 
of node-ids per group type, will be used both for knowing the type and 
for querying a subset.


Any ideas, please?

--
With kind regards,
Menashè

Re: [basex-talk] Querying a subset

2015-05-04 Thread Menashè Eliezer


Hi Christian,

Thank you for your valuable help!

With kind regards,
Menashè

On 04/30/2015 06:50 PM, Christian Grün wrote:

Hi Menashè,


The following query is really fast.

This should even be faster:

   let $ids := (161,891)
   for $id in $ids
   return db:open-id("collection_name", $id)


Should I use it also for thousands of
possible values instead of creating a temporal collection of the subset?

If the id approach does what you need, there is probably no need for
an additional collection. If you update your data, you may need to use
the UPDINDEX flag before creating the database; otherwise, id lookups
will be pretty slow.

Best,
Christian
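
Setting UPDINDEX at creation time might look like the following sketch. The input directory is hypothetical, and it is an assumption that the installed version accepts 'updindex' among the db:create options; if it does not, the option can be set before running the CREATE DB command.

```xquery
(: Sketch only: the input directory is hypothetical. :)
db:create('collection_name', '/data/xml/', (), map { 'updindex': true() })
```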



let $ids := (161,891)
for $x in collection("collection_name")
where db:node-id($x)=$ids
return ...

With kind regards,
Menashè

On 04/30/2015 04:55 PM, Christian Grün wrote:

Hi Menashé,

If you want to directly address XML nodes of a BaseX database, you can
use the db:node-pre/db:open-pre or db:node-id/db:open-id functions.
Please have a look at the Wiki for more information [1].

Hope this helps,
Christian

[1] http://docs.basex.org/wiki/Database_Module#Read_Operations




Re: [basex-talk] Querying a subset

2015-04-30 Thread Menashè Eliezer


Hi Christian,

Thank you for this valuable reply!
The following query is really fast. Should I use it also for thousands 
of possible values instead of creating a temporary collection of the subset?


let $ids := (161,891)
for $x in collection("collection_name")
where db:node-id($x)=$ids
return ...

With kind regards,
Menashè

On 04/30/2015 04:55 PM, Christian Grün wrote:

Hi Menashé,

If you want to directly address XML nodes of a BaseX database, you can
use the db:node-pre/db:open-pre or db:node-id/db:open-id functions.
Please have a look at the Wiki for more information [1].

Hope this helps,
Christian

[1] http://docs.basex.org/wiki/Database_Module#Read_Operations




Re: [basex-talk] Querying a subset

2015-04-30 Thread Menashè Eliezer


And:
XQuery has no formal notion of files inside a database/collection
http://stackoverflow.com/questions/3363442/refer-to-a-specific-document-in-a-basex-db-using-xquery

With kind regards,
Menashè


[basex-talk] Querying a subset

2015-04-30 Thread Menashè Eliezer


Hello,
I'm using the Java org.xmldb.api package for accessing the BaseX server 
(xmldb:basex://...).
After getting the resultSet, I need to make further queries about the 
requested subset (for reporting, etc.).
I have seen that getId() cannot be used, since the Resource will be 
anonymous if it is obtained as the result of a query.

Source: http://xmldb-org.sourceforge.net/xapi/api/index.html

The queries are not fixed and are based on end-user selection.
I couldn't find a way to have direct access/reference to a document. 
In my case the ID is simply the filename, but it doesn't seem to be very 
efficient.
I wonder if creating a temporary collection for the subset would be 
faster than making queries similar to the following example, in 
which there are only two IDs, but I can have thousands of them:


let $ids := ('360836','300139')
for $x in collection("collection_name")
let $filename := substring-after(base-uri($x),'/')
let $id :=  substring-before($filename,'.')
where $id = $ids
return ...

--
With kind regards,
Menashè

Re: [basex-talk] xml serialisation

2015-03-12 Thread Menashè Eliezer

Great! Now I'm able to create a perfectly clean and customised xml 
result freely using @codeListValue, /text(), string(), etc.
I've also added string handling so I already have the name of the 
collection: 


Thank you Marco!

With kind regards,
Menashè

On 03/12/2015 02:07 PM, Marco Lettere wrote:


Maybe something like:

for $x in db:open('mydb')
return

{$x}


optionally, if you want everything included to become a unique XML 
element:


{
for $x in db:open('mydb')
return

{$x}

}

Hope I understood your question correctly...
M.
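
The element constructors in the snippets above did not survive in this archive. The idea of wrapping each document and attaching its base URI might be sketched as follows; the element names results and record and the attribute name source are hypothetical, not the names used in the original mail.

```xquery
(: Sketch only: 'results', 'record' and 'source' are hypothetical names. :)
<results>{
  for $x in db:open('mydb')
  return <record source="{ base-uri($x) }">{ $x }</record>
}</results>
```

With a single wrapper element, the whole reply can be parsed as one XML document on the client side.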


On 12/03/2015 13:46, Christian Grün wrote:

Hi Menashè,

BaseX 8.0 and later uses the "adaptive" serialization as default [1];
maybe this gives you the freedom you need? If not, please provide us
with a little example of how you would like the input to be formatted.

Best,
Christian

[1] http://docs.basex.org/wiki/XQuery_3.1#Adaptive_Serialization



[basex-talk] xml serialisation

2015-03-12 Thread Menashè Eliezer


Hello,
I need an XQuery reply that is composed of multiple values.
I've chosen the XML format, but I also need the collection name of each 
record, since I'm searching in multiple collections.
However, to parse the entire reply as XML, I need to wrap base-uri($x) 
in an arbitrary XML tag.

I couldn't find a way to do it.
Any ideas?

Background: I've tried first the JSON format as output:method, however 
it cannot be used for more than one record.
I've tried text format, but I need an attribute and I get: 
@codeListValue cannot be serialised
So, xml format provides me the option to query the attribute, even 
though the reply length is bigger.


I'm using BaseX 8 beta.

--
With kind regards,
Menashè

Re: [basex-talk] impressed! and already trying to run before I can walk

2015-02-03 Thread Menashè Eliezer


Hi Niels,
Just confirming BaseX is very good. I've compared its performance and 
support with the biggest open source alternative, whose name I won't 
mention here.


With kind regards,
Menashè

On 02/02/2015 02:48 PM, Bridger Dyson-Smith wrote:

Hi Niels,

BaseX has some very nice documentation in the wiki [1]. The Wikibooks
XQuery document can be helpful [2] and Priscilla Walmsley's XQuery book
is an absolutely fantastic reference.


Hope that's helpful.
Cheers,
Bridger

[1] http://docs.basex.org/wiki/XQuery
[2] http://en.wikibooks.org/wiki/XQuery

On Mon, Feb 2, 2015 at 8:42 AM, Niels Grundtvig Nielsen wrote:


BaseX already looks/feels like a program I am going to enjoy
using, even with the general challenge that I don't know enough
about XQuery. I would welcome suggestions for anywhere to find a
friendly introduction!

My first specific challenge is trying to understand the syntax for
replace. In the code, I've managed to retrieve a single xml
document where <title> contains "35. Andan" … I made a copy/paste
error, and in this document the value of title should be '35.
Andantino' instead of '35. Andantino mesto'. My experiment is
obviously incorrect, so I'd welcome tips on what's wrong with it.

//score[contains(title, '35. Andan')]//title
replace value of (title) with '35. Andantino'

Thanks in advance!
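
A possible corrected form of the experiment above, offered as a sketch rather than as the answer given on the list. Two things probably go wrong in the attempt: the XQuery Update syntax is "replace value of node TARGET with VALUE" (the keyword "node" is missing), and the path that finds the title is not connected to the update expression. The exact-match predicate on '35. Andantino mesto' is an assumption based on the description in the message.

```xquery
(: Assumes the database with the score documents is the current context. :)
for $title in //score/title[. = '35. Andantino mesto']
return replace value of node $title with '35. Andantino'
```

The for loop is there because "replace value of node" expects exactly one target node; binding each match to $title keeps the update valid even if more than one title matches.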

Re: [basex-talk] Slow query

2015-02-03 Thread Menashè Eliezer

Hi Christian,

Thank you! The query time is now down to 0.5 sec!

The biggest improvement comes from the query rephrasing you suggested.
The latest snapshot also helps a lot!
You may want to know that the log of the latest snapshot shows
applying attribute index for "7827"
which is not clear to the user, whereas BaseX80-20150130.124009, which
also used indexing, showed:
applying attribute index for ("ALKY", "AYMD")

I'm attaching the first and the second launch of the query using BaseXGUI.
Relaunching the same query reduces the time from over 1 second to 0.5 second.
Some data:
BaseX80-20150130.124009
Total Time: 30676.02 ms
After using "for $x in
collection("ALL-CDIS")/gmd:MD_Metadata/gmd:identificationInfo/sdn:SDN_DataIdentification":
Total Time: 5456.74 ms
applying attribute index for ("ALKY", "AYMD") in log.
Second launch: 1333.71 ms
Latest snapshot (BaseX80-20150202.121033):
1st: Total Time: 1873.02 ms
2nd: Total Time: 548.62 ms

With kind regards,
Menashè

On 02/02/2015 02:02 PM, Menashè Eliezer wrote:

Hi Christian,

Thank you very much! Unfortunately I'll be at the office only tomorrow.

Menashè

On Sat, 31 Jan 2015 16:42:32 +0100, Christian Grün
wrote:

Hi Menashè,

With the latest snapshot [1], your original query should now be
rewritten for index access as well. Looking forward to your tests,

Christian

PS: In terms of performance, it may still be worthwhile to move
redundant paths to the for clause; but just try and see.

[1] http://files.basex.org/releases/latest/

On Fri, Jan 30, 2015 at 9:49 PM, Christian Grün
wrote:

Hi Menashè,

Should I expect to see the usage of an index for each of the where
phrases?

Usually, only one predicate will be rewritten for index access, and
the remaining conditions will be answered sequentially.

Have a nice weekend!

Enjoy,
Christian

Menashè

On Fri, 30 Jan 2015 18:11:59 +0100, Christian Grün
wrote:

Hi Menashè,

Thanks for the XML samples you sent me in private. I noticed that the
index rewritings will only be triggered if you formulate your query as
follows:

OLD:
for $x in collection("ALL-CDIS")
where $x/gmd:MD_Metadata/gmd:identificationInfo/...
return ...

NEW:
for $x in collection("ALL-CDIS")/gmd:MD_Metadata
where $x/gmd:identificationInfo/...
return ...

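It's difficult to explain in short sentences why Variant 1 cannot be
optimized that straightforward (basically, it's quite a different
pattern to look for), but I'll check out if we can extend our matcher
to also support these kind of queries.

So, if possible, I would recommend you for now (and at least for
testing) to move the root element test after the collection()
function. I noticed that the first three child steps are the same in
all of your conditions:

gmd:MD_Metadata/gmd:identificationInfo/sdn:SDN_DataIdentification

If that will always be the case, it surely makes sense to move all
of them to the "for" clause.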

Looking forward to your updated performance tests,
Christian
___

On Fri, Jan 30, 2015 at 5:55 PM, Christian Grün
wrote:

Could you possibly provide me with a small snapshot of your data
sources (one, two documents might be sufficient)?

On Fri, Jan 30, 2015 at 5:52 PM, Menashè Eliezer
wrote:

Almost the same speed with version 8.0.
No indexing (no "applying" in the query info).
As I've attached before, indexes are active for this DB.

With kind regards,
Menashè

On 01/30/2015 05:31 PM, Christian Grün wrote:

It's indeed interesting that your query does not use any of the
existing index structures (if they did, you would find strings like
"applying text index" or "applying attribute index" in the query
info). Maybe/hopefully things look different with Version 8.0.

On Fri, Jan 30, 2015 at 5:26 PM, Menashè Eliezer
wrote:

On 01/30/2015 05:18 PM, Christian Grün wrote:

/gmd:MD_Metadata/gmd:identificationInfo/sdn:SDN_DataIdentification/gmd:descriptiveKeywords[1]/gmd:MD_Keywords/gmd:keyword[2]/sdn:SDN_ParameterDiscoveryCode/@codeListValue

How can I remove *?

Simply remove the predicate; a[*]/b is the same as a/b.

Maybe I wasn't clear. The actual number appears in the xml file, e.g.,
gmd:descriptiveKeywords[1]
Anyway, I've removed all [*] and I get the same correct result, however the
processing time is doubled...

* In some cases, if you know that an element name is distinct, you can
get rid of all the explicit child steps and directly address the node
via the descendant axis.

Thanks, but it's not relevant in my case.

Is it because the element names are not distinct? Or is it because
your input form allows users to choose arbitrary paths for arbitrary
documents?

The element names are not distinct.

Sure, I'll also try BaseX 8.0 and compare. Should I recreate the db importing
the xml f

Re: [basex-talk] Slow query

2015-02-02 Thread Menashè Eliezer

Hi Christian,

Thank you very much! Unfortunately I'll be at the office only tomorrow.

Menashè

On Sat, 31 Jan 2015 16:42:32 +0100, Christian Grün
 wrote:
> Hi Menashè,
> 
> With the latest snapshot [1], your original query should now be
> rewritten for index access as well. Looking forward to your tests,
> 
> Christian
> 
> PS: In terms of performance, it may still be worthwhile to move
> redundant paths to the for clause; but just try and see.
> 
> [1] http://files.basex.org/releases/latest/
> 
> 
> 
> On Fri, Jan 30, 2015 at 9:49 PM, Christian Grün
>  wrote:
>> Hi Menashè,
>>
>>> Should I expect to see the usage of an index for each of the where
>>> phrases?
>>
>> Usually, only one predicate will be rewritten for index access, and
>> the remaining conditions will be answered sequentially.
>>
>>> Have a nice weekend!
>>
>> Enjoy,
>> Christian
>>
>>
>>> Menashè
>>>
>>> On Fri, 30 Jan 2015 18:11:59 +0100, Christian Grün
>>>  wrote:
>>>> Hi Menashè,
>>>>
>>>> Thanks for the XML samples you sent me in private. I noticed that the
>>>> index rewritings will only be triggered if you formulate your query as
>>>> follows:
>>>>
>>>> OLD:
>>>>   for $x in collection("ALL-CDIS")
>>>>   where $x/gmd:MD_Metadata/gmd:identificationInfo/...
>>>>   return ...
>>>>
>>>> NEW:
>>>>   for $x in collection("ALL-CDIS")/gmd:MD_Metadata
>>>>   where $x/gmd:identificationInfo/...
>>>>   return ...
>>>>
>>>> It's difficult to explain in short sentences why Variant 1 cannot be
>>>> optimized that straightforward (basically, it's quite a different
>>>> pattern to look for), but I'll check out if we can extend our matcher
>>>> to also support these kind of queries.
>>>>
>>>> So, if possible, I would recommend you for now (and at least for
>>>> testing) to move the root element test after the collection()
>>>> function. I noticed that the first three child steps are the same in
>>>> all of your conditions:
>>>>
>>>>   gmd:MD_Metadata/gmd:identificationInfo/sdn:SDN_DataIdentification
>>>>
>>>> If that will always be the case, it surely makes sense to move all
>>>> of them to the "for" clause.
>>>>
>>>> Looking forward to your updated performance tests,
>>>> Christian
>>>> ___
>>>>
>>>> On Fri, Jan 30, 2015 at 5:55 PM, Christian Grün
>>>>  wrote:
>>>>> Could you possibly provide me with a small snapshot of your data
>>>>> sources (one, two documents might be sufficient)?
>>>>>
>>>>>
>>>>> On Fri, Jan 30, 2015 at 5:52 PM, Menashè Eliezer
>>>>>  wrote:
>>>>>> Almost the same speed with version 8.0.
>>>>>> No indexing (no "applying" in the query info).
>>>>>> As I've attached before, indexes are active for this DB.
>>>>>>
>>>>>> With kind regards,
>>>>>> Menashè
>>>>>>
>>>>>>
>>>>>> On 01/30/2015 05:31 PM, Christian Grün wrote:
>>>>>>>
>>>>>>> It's indeed interesting that your query does not use any of the
>>>>>>> existing index structures (if they did, you would find strings like
>>>>>>> "applying text index" or "applying attribute index" in the query
>>>>>>> info). Maybe/hopefully things look different with Version 8.0.
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Jan 30, 2015 at 5:26 PM, Menashè Eliezer
>>>>>>>  wrote:
>>>>>>>>
>>>>>>>> On 01/30/2015 05:18 PM, Christian Grün wrote:
>>>>>>>>>
>>>>>>>>>> /gmd:MD_Metadata/gmd:identificationInfo/sdn:SDN_DataIdentification/gmd:descriptiveKeywords[1]/gmd:MD_Keywords/gmd:keyword[2]/sdn:SDN_ParameterDiscoveryCode/@codeListValue
>>>>>>>>>>
>>>>>>>>>> How can I remove *?
>>>>>>>>>
>>>>>>>>> Simply remove the predicate; a[*]/b is the

Re: [basex-talk] Slow query

2015-01-30 Thread Menashè Eliezer

Hi Christian,

Interesting! I'll check it when I'm back at the office and keep you
updated.
I'll use for $x in
collection("ALL-CDIS")/gmd:MD_Metadata/gmd:identificationInfo/sdn:SDN_DataIdentification
as you've suggested.
Should I expect to see the usage of an index for each of the where phrases?

Have a nice weekend!
Menashè
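
For readers following the thread, the rephrasing discussed here could look roughly like the sketch below. The namespace URIs and the keyword condition are taken from the query log posted in this thread; the return clause is an assumption, since the original one is not shown. Only the first condition is spelled out; the remaining conditions would follow the same pattern, each starting from $x with a path relative to sdn:SDN_DataIdentification.

```xquery
declare namespace gmd = "http://www.isotc211.org/2005/gmd";
declare namespace sdn = "http://www.seadatanet.org";

(: The three shared child steps are moved from the where clause into the for clause. :)
for $x in collection("ALL-CDIS")
  /gmd:MD_Metadata/gmd:identificationInfo/sdn:SDN_DataIdentification
where $x/gmd:descriptiveKeywords/gmd:MD_Keywords/gmd:keyword
        /sdn:SDN_ParameterDiscoveryCode/@codeListValue = ("ALKY", "AYMD")
(: Assumed return value: the URI of each matching document. :)
return base-uri($x)
```

Whether the optimizer then uses an index can be checked in the query info, which should contain a line starting with "applying attribute index".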

On Fri, 30 Jan 2015 18:11:59 +0100, Christian Grün
 wrote:
> Hi Menashè,
> 
> Thanks for the XML samples you sent me in private. I noticed that the
> index rewritings will only be triggered if you formulate your query as
> follows:
> 
> OLD:
>   for $x in collection("ALL-CDIS")
>   where $x/gmd:MD_Metadata/gmd:identificationInfo/...
>   return ...
> 
> NEW:
>   for $x in collection("ALL-CDIS")/gmd:MD_Metadata
>   where $x/gmd:identificationInfo/...
>   return ...
> 
> It's difficult to explain in short sentences why Variant 1 cannot be
> optimized that straightforward (basically, it's quite a different
> pattern to look for), but I'll check out if we can extend our matcher
> to also support these kind of queries.
> 
> So, if possible, I would recommend you for now (and at least for
> testing) to move the root element test after the collection()
> function. I noticed that the first three child steps are the same in
> all of your conditions:
> 
>   gmd:MD_Metadata/gmd:identificationInfo/sdn:SDN_DataIdentification
> 
> If that will always be the case, it surely makes sense to move all
> of them to the "for" clause.
> 
> Looking forward to your updated performance tests,
> Christian
> ___
> 
> On Fri, Jan 30, 2015 at 5:55 PM, Christian Grün
>  wrote:
>> Could you possibly provide me with a small snapshot of your data
>> sources (one, two documents might be sufficient)?
>>
>>
>> On Fri, Jan 30, 2015 at 5:52 PM, Menashè Eliezer
>>  wrote:
>>> Almost the same speed with version 8.0.
>>> No indexing (no "applying" in the query info).
>>> As I've attached before, indexes are active for this DB.
>>>
>>> With kind regards,
>>> Menashè
>>>
>>>
>>> On 01/30/2015 05:31 PM, Christian Grün wrote:
>>>>
>>>> It's indeed interesting that your query does not use any of the
>>>> existing index structures (if they did, you would find strings like
>>>> "applying text index" or "applying attribute index" in the query
>>>> info). Maybe/hopefully things look different with Version 8.0.
>>>>
>>>>
>>>> On Fri, Jan 30, 2015 at 5:26 PM, Menashè Eliezer
>>>>  wrote:
>>>>>
>>>>> On 01/30/2015 05:18 PM, Christian Grün wrote:
>>>>>>
>>>>>>> /gmd:MD_Metadata/gmd:identificationInfo/sdn:SDN_DataIdentification/gmd:descriptiveKeywords[1]/gmd:MD_Keywords/gmd:keyword[2]/sdn:SDN_ParameterDiscoveryCode/@codeListValue
>>>>>>>
>>>>>>> How can I remove *?
>>>>>>
>>>>>> Simply remove the predicate; a[*]/b is the same as a/b.
>>>>>
>>>>> Maybe I wasn't clear. The actual number appears in the xml file, e.g.,
>>>>> gmd:descriptiveKeywords[1]
>>>>> Anyway, I've removed all [*] and I get the same correct result, however
>>>>> the processing time is doubled...
>>>>>>
>>>>>>
>>>>>>>> * In some cases, if you know that an element name is distinct, you can
>>>>>>>> get rid of all the explicit child steps and directly address the node
>>>>>>>> via the descendant axis.
>>>>>>>
>>>>>>> Thanks, but it's not relevant in my case.
>>>>>>
>>>>>> Is it because the element names are not distinct? Or is it because
>>>>>> your input form allows users to choose arbitrary paths for arbitrary
>>>>>> documents?
>>>>>
>>>>> The element names are not distinct.
>>>>>
>>>>>>> Sure, I'll also try BaseX 8.0 and compare. Should I recreate the db
>>>>>>> importing the xml files for testing the improved indexing?
>>>>>>
>>>>>> We have actually improved support for collections, but the database
>>>>>> format itself has not changed, so it shouldn't make a difference in
>>>>>> your case.
>>>>>>
>>>>>> Christian
>>>>>>
>>>>>>
>>>>>>>> [1] http://files.basex.org/releases/latest
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, Jan 30, 2015 at 3:55 PM, Menashè Eliezer
>>>>>>>>  wrote:
>>>>>>>>>
>>>>>>>>> Hello,
>>>>>>>>> I wonder if the attached query can be optimised. I'm attaching all
>>>>>>>>> relevant information.
>>>>>>>>> BaseX 7.9, Debian, powerful server.
>>>>>>>>> This is just an example. The queries will be built based on a
>>>>>>>>> compilation of a search form.
>>>>>>>>> Any help would be appreciated.
>>>>>>>>> 40 seconds are not acceptable.
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> With kind regards,
>>>>>>>>> Menashè
>>>>>>>>>
>>>>>>> --
>>>>>>> With kind regards,
>>>>>>> Menashè
>>>>>>>
>>>>>>>
>>>>> With kind regards,
>>>>> Menashè
>>>>>
>>>
-- 
Menashè

Re: [basex-talk] Slow query

2015-01-30 Thread Menashè Eliezer

Almost the same speed with version 8.0.
No indexing (no "applying" in the query info).
As I've attached before, indexes are active for this DB.

With kind regards,
Menashè
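
One way to check whether the attribute index can answer the keyword condition at all is to query it directly. The sketch below uses db:attribute() from the BaseX Database Module; the three-argument form (database, value, attribute name) is used as I understand it from the BaseX documentation, so please verify it against the version in use. The database name and the two codes come from this thread.

```xquery
(: Assumes the ALL-CDIS database was created with an attribute index. :)
for $code in ("ALKY", "AYMD")
for $attr in db:attribute("ALL-CDIS", $code, "codeListValue")
(: Each hit is an attribute node; report the document it belongs to. :)
return base-uri($attr)
```

If this returns quickly while the full query does not, the index itself is fine and the slowdown more likely comes from how the query is rewritten.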

On 01/30/2015 05:31 PM, Christian Grün wrote:

On Fri, Jan 30, 2015 at 5:26 PM, Menashè Eliezer
wrote:

On 01/30/2015 05:18 PM, Christian Grün wrote:

/gmd:MD_Metadata/gmd:identificationInfo/sdn:SDN_DataIdentification/gmd:descriptiveKeywords[1]/gmd:MD_Keywords/gmd:keyword[2]/sdn:SDN_ParameterDiscoveryCode/@codeListValue

How can I remove *?

Simply remove the predicate; a[*]/b is the same as a/b.

Maybe I wasn't clear. The actual number appears in the xml file, e.g.,
gmd:descriptiveKeywords[1]
Anyway, I've removed all [*] and I get the same correct result, however the
processing time is doubled...

* In some cases, if you know that an element name is distinct, you can
get rid of all the explicit child steps and directly address the node
via the descendant axis.

Thanks, but it's not relevant in my case.

Is it because the element names are not distinct? Or is it because
your input form allows users to choose arbitrary paths for arbitrary
documents?

The element names are not distinct.

Sure, I'll also try BaseX 8.0 and compare. Should I recreate the db
importing the xml files for testing the improved indexing?

We have actually improved support for collections, but the database
format itself has not changed, so it shouldn't make a difference in
your case.

Christian

[1] http://files.basex.org/releases/latest

On Fri, Jan 30, 2015 at 3:55 PM, Menashè Eliezer
wrote:

Hello,
I wonder if the attached query can be optimised. I'm attaching all
relevant information.
BaseX 7.9, Debian, powerful server.
This is just an example. The queries will be built based on a
compilation of a search form.
Any help would be appreciated.
40 seconds are not acceptable.

--
With kind regards,
Menashè

With kind regards,
Menashè

Re: [basex-talk] Slow query

2015-01-30 Thread Menashè Eliezer

On 01/30/2015 05:18 PM, Christian Grün wrote:

/gmd:MD_Metadata/gmd:identificationInfo/sdn:SDN_DataIdentification/gmd:descriptiveKeywords[1]/gmd:MD_Keywords/gmd:keyword[2]/sdn:SDN_ParameterDiscoveryCode/@codeListValue

How can I remove *?

Simply remove the predicate; a[*]/b is the same as a/b.

Maybe I wasn't clear. The actual number appears in the xml file, e.g.,
gmd:descriptiveKeywords[1]
Anyway, I've removed all [*] and I get the same correct result, however
the processing time is doubled...

* In some cases, if you know that an element name is distinct, you can
get rid of all the explicit child steps and directly address the node
via the descendant axis.

Thanks, but it's not relevant in my case.

Is it because the element names are not distinct? Or is it because
your input form allows users to choose arbitrary paths for arbitrary
documents?

The element names are not distinct.

Sure, I'll also try BaseX 8.0 and compare. Should I recreate the db importing
the xml files for testing the improved indexing?

We have actually improved support for collections, but the database
format itself has not changed, so it shouldn't make a difference in
your case.

Christian

[1] http://files.basex.org/releases/latest

On Fri, Jan 30, 2015 at 3:55 PM, Menashè Eliezer
wrote:

Hello,
I wonder if the attached query can be optimised. I'm attaching all
relevant information.
BaseX 7.9, Debian, powerful server.
This is just an example. The queries will be built based on a compilation
of a search form.
Any help would be appreciated.
40 seconds are not acceptable.

--
With kind regards,
Menashè

With kind regards,
Menashè
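
A small self-contained illustration of the point about [*], using a made-up document (the element names a-list, a and b are hypothetical). The predicate [*] is not a positional wildcard; it keeps only those elements that have at least one child element. Since a/b can only return something for a elements that have a b child anyway, a[*]/b and a/b select the same nodes, whereas a real positional predicate such as [1] narrows the result.

```xquery
let $doc :=
  <a-list>
    <a><b>1</b></a>
    <a><b>2</b><b>3</b></a>
    <a/>
  </a-list>
return (
  count($doc/a[*]/b),  (: 3: [*] only drops the empty a, which has no b anyway :)
  count($doc/a/b),     (: 3: same nodes without the predicate :)
  count($doc/a[1]/b)   (: 1: a positional predicate restricts to the first a :)
)
```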

Re: [basex-talk] Slow query

2015-01-30 Thread Menashè Eliezer

Hi  Christian,

Thank you for your reply. Updated files are attached.

On 01/30/2015 04:35 PM, Christian Grün wrote:
> Hi Menashè,
>
> First of all, I wonder if your query really does what you want it to
> do. I noticed for example that some of the where conditions start with
> "$x/", while others start with "/" and some others start with no
> slash. Is this intentional?

I've added $x and now it takes a little less: 30 sec.
I haven't seen a case of no slash.

> Some more comments:
>
> * I would recommend you to avoid numeric tests in the @codeListValue
> tests and use string tests instead (/@codeListValue = "7827", etc).

Done. Down to 23 sec.

> * Usually, you can also get rid of the xs:dateTime() conversions,
> because items of type date and time can also be compared as strings.

Done. Down to almost 19 sec. Still too much.

> * I'm not sure what the predicates [*] are supposed to do in your
> query. If you remove them, you will get the same results.

* means that I don't know if it's 1, 2 or any other number inside the
XPath, e.g.
/gmd:MD_Metadata/gmd:identificationInfo/sdn:SDN_DataIdentification/gmd:descriptiveKeywords[1]/gmd:MD_Keywords/gmd:keyword[2]/sdn:SDN_ParameterDiscoveryCode/@codeListValue

How can I remove *?

> * In some cases, if you know that an element name is distinct, you can
> get rid of all the explicit child steps and directly address the node
> via the descendant axis.

Thanks, but it's not relevant in my case.

>> So reordering the conditions for having smaller subset right from the
>> beginning isn't relevant.
>
> Reordering shouldn't make a big difference anyway, because BaseX tries
> to find the cheapest index request by itself, based on the database
> statistics.

Great, as I expect from a good product :)

> Beside that, I would be interested to hear if you get better results
> with BaseX 8.0 [1], because we recently spent quite some time to
> further improve our index rewriting rules.

Sure, I'll also try BaseX 8.0 and compare. Should I recreate the db
importing the xml files for testing the improved indexing?

> Hope this helps,
> Christian
>
> [1] http://files.basex.org/releases/latest
>
> On Fri, Jan 30, 2015 at 3:55 PM, Menashè Eliezer
>  wrote:
>> Hello,
>> I wonder if the attached query can be optimised. I'm attaching all relevant
>> information.
>> BaseX 7.9, Debian, powerful server.
>> This is just an example. The queries will be built based on a compilation of
>> a search form.
>> Any help would be appreciated.
>> 40 seconds are not acceptable.
>>
>> --
>> With kind regards,
>> Menashè

--
With kind regards,
Menashè
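
The suggestion to drop the xs:dateTime() conversions can be illustrated with a tiny self-contained example. The sample record is invented; only the element names, the gml namespace URI and the two boundary values come from the query in this thread. The string comparison gives the same answer as the typed one here because all values share one fixed-width lexical format; that would not hold for mixed formats or differing time zones.

```xquery
declare namespace gml = "http://www.opengis.net/gml";

(: Hypothetical sample record with a begin position inside the search window. :)
let $period :=
  <gml:TimePeriod>
    <gml:beginPosition>2012-04-02T10:15:00</gml:beginPosition>
  </gml:TimePeriod>
let $begin := string($period/gml:beginPosition)
return (
  (: typed comparison, as in the original query :)
  xs:dateTime($begin) >= xs:dateTime("2012-03-30T09:07:00"),
  (: plain string comparison, as suggested in the reply :)
  $begin >= "2012-03-30T09:07:00" and $begin <= "2012-04-30T09:07:00"
)
```

Both expressions evaluate to true for this sample, and the string form leaves the optimizer with a simpler expression to rewrite.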

Compiling:
- pre-evaluating fn:collection("ALL-CDIS")
- pre-evaluating ("ALKY", "AYMD")
- rewriting 
($x_0/gmd:MD_Metadata/gmd:identificationInfo/sdn:SDN_DataIdentification/gmd:extent/gmd:EX_Extent/gmd:geographicElement/gmd:EX_GeographicBoundingBox/gmd:westBoundLongitude/gco:Decimal
 >= "13.708333")
- rewriting 
($x_0/gmd:MD_Metadata/gmd:identificationInfo/sdn:SDN_DataIdentification/gmd:extent/gmd:EX_Extent/gmd:geographicElement/gmd:EX_GeographicBoundingBox/gmd:eastBoundLongitude/gco:Decimal
 <= "15.708333")
- rewriting 
($x_0/gmd:MD_Metadata/gmd:identificationInfo/sdn:SDN_DataIdentification/gmd:extent/gmd:EX_Extent/gmd:geographicElement/gmd:EX_GeographicBoundingBox/gmd:southBoundLatitude/gco:Decimal
 >= "45.6976667")
- rewriting 
($x_0/gmd:MD_Metadata/gmd:identificationInfo/sdn:SDN_DataIdentification/gmd:extent/gmd:EX_Extent/gmd:geographicElement/gmd:EX_GeographicBoundingBox/gmd:northBoundLatitude/gco:Decimal
 <= "55.6976667")
- rewriting 
(fn:string($x_0/gmd:MD_Metadata/gmd:identificationInfo/sdn:SDN_DataIdentification/gmd:extent/gmd:EX_Extent/gmd:temporalElement/gmd:EX_TemporalExtent/gmd:extent/gml:TimePeriod/gml:beginPosition)
 >= "2012-03-30T09:07:00")
- rewriting 
(fn:string($x_0/gmd:MD_Metadata/gmd:identificationInfo/sdn:SDN_DataIdentification/gmd:extent/gmd:EX_Extent/gmd:temporalElement/gmd:EX_TemporalExtent/gmd:extent/gml:TimePeriod/gml:beginPosition)
 <= "2012-04-30T09:07:00")
- removing context expression (.)
- removing context expression (.)
- removing context expression (.)
- removing context expression (.)
- removing context expression (.)
- removing context expression (.)
- removing context expression (.)
- removing context expression (.)
- removing context expression (.)
- removing context expression (.)
- removing context expression (.)
- removing context expression (.)
- rewriting where clause(s)
Query:
xquery version "3.0"; declare namespace gco = 
"http://www.isotc211.org/2005/gco"; declare namespace gmd = 
"http://www.isotc211.org/2005/gmd"; declare namespace gml = 
"http://www.opengis.net/gml"; declare namespace 
gmx="http://www.isotc211.org/2005/gmx"; declare namespace sdn = 
"http://www.seadatanet.org"; declare namespace fn = 
"http://www.w3

[basex-talk] Slow query

2015-01-30 Thread Menashè Eliezer


Hello,
I wonder if the attached query can be optimised. I'm attaching all
relevant information.

BaseX 7.9, Debian, powerful server.
This is just an example. The queries will be built from the fields
filled in on a search form.
So reordering the conditions to get a smaller subset right from the
beginning isn't relevant.

Any help would be appreciated.
40 seconds are not acceptable.

--
With kind regards,
Menashè

Compiling:
- pre-evaluating fn:collection("ALL-CDIS")
- pre-evaluating ("ALKY", "AYMD")
- rewriting ((db:open-pre("ALL-CDIS",0), db:open-pre("ALL-CDIS",747), 
...)/gmd:MD_Metadata/gmd:identificationInfo/sdn:SDN_DataIdentification/gmd:extent/gmd:EX_Extent/gmd:geographicElement/gmd:EX_GeographicBoundingBox/gmd:westBoundLongitude/gco:Decimal
 >= "13.708333")
- rewriting ((db:open-pre("ALL-CDIS",0), db:open-pre("ALL-CDIS",747), 
...)/gmd:MD_Metadata/gmd:identificationInfo/sdn:SDN_DataIdentification/gmd:extent/gmd:EX_Extent/gmd:geographicElement/gmd:EX_GeographicBoundingBox/gmd:eastBoundLongitude/gco:Decimal
 <= "15.708333")
- rewriting ((db:open-pre("ALL-CDIS",0), db:open-pre("ALL-CDIS",747), 
...)/gmd:MD_Metadata/gmd:identificationInfo/sdn:SDN_DataIdentification/gmd:extent/gmd:EX_Extent/gmd:geographicElement/gmd:EX_GeographicBoundingBox/gmd:southBoundLatitude/gco:Decimal
 >= "45.6976667")
- rewriting ((db:open-pre("ALL-CDIS",0), db:open-pre("ALL-CDIS",747), 
...)/gmd:MD_Metadata/gmd:identificationInfo/sdn:SDN_DataIdentification/gmd:extent/gmd:EX_Extent/gmd:geographicElement/gmd:EX_GeographicBoundingBox/gmd:northBoundLatitude/gco:Decimal
 <= "55.6976667")
- pre-evaluating "2012-03-30T09:07:00" cast as xs:dateTime
- atomic evaluation of 
(fn:string($x_0/gmd:MD_Metadata/gmd:identificationInfo/sdn:SDN_DataIdentification/gmd:extent/gmd:EX_Extent/gmd:temporalElement/gmd:EX_TemporalExtent/gmd:extent/gml:TimePeriod/gml:beginPosition)
 cast as xs:dateTime >= "2012-03-30T09:07:00")
- pre-evaluating "2012-04-30T09:07:00" cast as xs:dateTime
- atomic evaluation of 
(fn:string($x_0/gmd:MD_Metadata/gmd:identificationInfo/sdn:SDN_DataIdentification/gmd:extent/gmd:EX_Extent/gmd:temporalElement/gmd:EX_TemporalExtent/gmd:extent/gml:TimePeriod/gml:beginPosition)
 cast as xs:dateTime <= "2012-04-30T09:07:00")
- removing context expression (.)
- removing context expression (.)
- atomic evaluation of 
(fn:string(gmd:MD_Metadata/gmd:identificationInfo/sdn:SDN_DataIdentification/gmd:extent/gmd:EX_Extent/gmd:temporalElement/gmd:EX_TemporalExtent/gmd:extent/gml:TimePeriod/gml:beginPosition)
 cast as xs:dateTime >= "2012-03-30T09:07:00")
- removing context expression (.)
- atomic evaluation of 
(fn:string(gmd:MD_Metadata/gmd:identificationInfo/sdn:SDN_DataIdentification/gmd:extent/gmd:EX_Extent/gmd:temporalElement/gmd:EX_TemporalExtent/gmd:extent/gml:TimePeriod/gml:beginPosition)
 cast as xs:dateTime <= "2012-04-30T09:07:00")
- removing context expression (.)
- removing context expression (.)
- removing context expression (.)
- removing context expression (.)
- removing context expression (.)
- rewriting where clause(s)
Query:
xquery version "3.0"; declare namespace gco = 
"http://www.isotc211.org/2005/gco"; declare namespace gmd = 
"http://www.isotc211.org/2005/gmd"; declare namespace gml = 
"http://www.opengis.net/gml"; declare namespace 
gmx="http://www.isotc211.org/2005/gmx"; declare namespace sdn = 
"http://www.seadatanet.org"; declare namespace fn = 
"http://www.w3.org/2005/xpath-functions"; declare namespace xs = 
"http://www.w3.org/2001/XMLSchema"; for $x in collection("ALL-CDIS") where 
$x/gmd:MD_Metadata/gmd:identificationInfo/sdn:SDN_DataIdentification/gmd:descriptiveKeywords[*]/gmd:MD_Keywords/gmd:keyword[*]/sdn:SDN_ParameterDiscoveryCode/@codeListValue
 = ("ALKY","AYMD") and 
/gmd:MD_Metadata/gmd:identificationInfo/sdn:SDN_DataIdentification/gmd:extent/gmd:EX_Extent/gmd:geographicElement/gmd:EX_GeographicBoundingBox/gmd:westBoundLongitude/gco:Decimal
 >= "13.708333" and 
/gmd:MD_Metadata/gmd:identificationInfo/sdn:SDN_DataIdentification/gmd:extent/gmd:EX_Extent/gmd:geographicElement/gmd:EX_GeographicBoundingBox/gmd:eastBoundLongitude/gco:Decimal
 <= "15.708333" and 
/gmd:MD_Metadata/gmd:identificationInfo/sdn:SDN_DataIdentification/gmd:extent/gmd:EX_Extent/gmd:geographicElement/gmd:EX_GeographicBoundingBox/gmd:southBoundLatitude/gco:Decimal
 >= "45.6976667" and 
/gmd:MD_Metadata/gmd:identificationInfo/sdn:SDN_DataIdentification/gmd:extent/gmd:EX_Extent/gmd:geographicElement/gmd:EX_GeographicBoundingBox/gmd:northBoundLatitude/gco:Decimal
 <= "55.6976667" and 
xs:dateTime(string($x/gmd:MD_Metadata/gmd:identificationInfo/sdn:SDN_DataIdentification/gmd:extent/gmd:EX_Extent/gmd:temporalElement/gmd:EX_TemporalExtent/gmd:extent/gml:TimePeriod/gml:beginPosition))
 >= xs:dateTime("2012-03-30T09:07:00") and 
xs:dateTime(string($x/gmd:MD_Metadata/gmd:identificationInfo/sdn:SDN_DataIdentification/gmd:extent/gmd:EX_Extent/gmd:temporalElement/gmd:EX_TemporalExtent/gmd:extent/gml:TimePeriod/gml:beginPosition))
 <= xs:dat
