Re: [basex-talk] Stopword-related NullPointerException (in FTWords.java)

2015-09-15 Thread Ron Katriel
Christian,

Please ignore the report below. It was triggered by a syntax error in the query 
(in a section not shown below). The error message threw me off as it was 
unrelated.

Thanks,
Ron


Ron Katriel, Ph.D. | Senior Data Scientist | Medidata Solutions
350 Hudson Street, 7th Floor, New York, NY 10014
rkatr...@mdsol.com | direct: +1 201 337 3622 | mobile: +1 201 675 5598 | main: +1 212 918 1800


On September 15, 2015 at 1:43:44 PM, Ron Katriel (rkatr...@mdsol.com) wrote:

Hi Christian,

I downloaded the latest release and confirmed the fix. Thanks for the quick 
turnaround!

However, I am now getting a different (presumably unrelated) error:

Stopped at /Users/rkatriel/Documents/Data Science/Data Sets/CUR/CustomerUsageReport/2015/07-15-2015/JOIN/cdsstudies.drugbank.join.result/file, 1/51:
[XQST0033] Duplicate declaration of prefix 'functx'.

when executing the following query (the rest of the code is omitted for brevity):

declare namespace functx = "http://www.functx.com";

declare function functx:value-union($arg1 as xs:anyAtomicType*,
    $arg2 as xs:anyAtomicType*) as xs:anyAtomicType* {
  ($arg1, $arg2)
};

Changing the namespace or function name does not help; omitting the namespace
declaration produces a different error (No namespace declared for 'functx:value-union').

This worked before. Has something changed?
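For reference, here is a minimal, self-contained version of the declaration quoted above. It should parse and run on its own, which fits with the later finding that the XQST0033 error came from a syntax error elsewhere in the query. The namespace URI and function body are taken from the message; the final call is only a hypothetical usage example. Note that this body simply concatenates the two sequences, so duplicate values are kept.

```xquery
(: Namespace URI and function body as in the message above :)
declare namespace functx = "http://www.functx.com";

declare function functx:value-union(
  $arg1 as xs:anyAtomicType*,
  $arg2 as xs:anyAtomicType*
) as xs:anyAtomicType* {
  (: Concatenates both sequences; duplicates are not removed :)
  ($arg1, $arg2)
};

(: Hypothetical usage example :)
functx:value-union((1, 2), (2, 3))
```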

Best,
Ron



On September 15, 2015 at 7:05:59 AM, Christian Grün (christian.gr...@gmail.com) 
wrote:

Hi Ron,

The problem is fixed in the latest snapshot [1].

By the way: If you specify a stopword when creating a database, there
is no need to specify it in the query. I have also updated our Wiki
article on Full Text Index Processing to make this more explicit [2].

Hope this helps,
Christian

[1] http://files.basex.org/releases/latest/
[2] http://docs.basex.org/wiki/Full-Text#Index_Processing



Re: [basex-talk] Unexpected error: Improper use? Potential bug?

2015-09-15 Thread Pierre-Yves JALLUD

Hi Christian,
I hope your vacation was beautiful and that the return will not be too
hard ;-)
To answer your questions: we are currently using version 8.2.3... I will
install the latest snapshot to see if the behavior is the same.
Regarding the second question: we have a BaseX installation (... I will
tell you the exact version later...) that does not have the problem. It is
not part of our IT infrastructure; it runs in a VM in another research
center and is not open to the web. I will try to learn more about the
whole configuration.


I will tell you all this ASAP.
Greetings
Pierre-Yves

On 14/09/2015 15:51, Christian Grün wrote:

Hi Pierre-Yves,

Finally some feedback… And only a short one: Did you check whether the
problems persist with the latest BaseX 8.3 snapshot [1]?

And maybe another question: Did you experience similar problems in a
simple setting without the load balancer, Apache and HAProxy?

Thanks,
Christian

[1] http://files.basex.org/releases/latest/


On Thu, Aug 27, 2015 at 2:25 PM, Pierre-Yves JALLUD wrote:

Hi all,
I have some trouble with BaseX requests and WebDAV access. My situation is
the following: WebDAV access to BaseX is done with oXygen (Developer
version 17) on Ubuntu 14.04, and the BaseX server is running under
Scientific Linux 64-bit (2.6.32-573.3.1.el6.x86_64 ... if the precision is
necessary ;)). The BaseX server is behind a web cluster using a load
balancer and an Apache HTTP server. All access passes through this server,
and access to the different ports is managed with HAProxy.
If the context is not clear enough, don't hesitate to ask for more details.

The symptom is that the BaseX server freezes after a WebDAV request. The
log extracts below track the following use case:
   - I try to access http://BaseX.huma-num.fr/webdav/bvm/bvm.xml with oXygen
   - I receive an HTTP error 500
   - I try to access the BaseX server from my web site, and more than one minute later, I get a timeout.

... but the worst symptom is that in some cases (not always), the requested
db gets corrupted and emptied (totally or partly): resources are missing, or
they throw End of File exceptions when we then try to reopen them.

This use case can occur many times a day... but not systematically. Other
colleagues have the same problems with other versions of oXygen on other
operating systems (Windows), but always with the same BaseX server.

Does anyone have an idea how to solve my problem?

Thanks
Pierre-Yves

Some extracts of the log file:

...

17:13:06.694  X.X.X.X:34953  admin  REQUEST [GET] http://BaseX.huma-num.fr/webdav/bvm/bvm.xml
17:13:06.845  X.X.X.X:34954  admin  REQUEST [LOCK] http://BaseX.huma-num.fr/webdav/bvm/bvm.xml
17:13:06.888  X.X.X.X:34954  admin  200  43.82 ms
17:13:08.437  X.X.X.X:34953  admin  500  Unexpected error: Improper use? Potential bug?
  Your feedback is welcome: Contact: basex-talk@mailman.uni-konstanz.de
  Version: BaseX 8.2.1  Java: Oracle Corporation, 1.8.0_31  OS: Linux, amd64
  Stack Trace:
  java.lang.RuntimeException: org.eclipse.jetty.io.EofException
  at org.basex.http.webdav.BXServletResponse.close(BXServletResponse.java:80)
  at org.basex.http.webdav.WebDAVServlet.run(WebDAVServlet.java:46)
  at org.basex.http.BaseXServlet.service(BaseXServlet.java:64)
  at javax.servlet.http.HttpServlet.service(HttpServlet.java:848)
  at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:684)
  at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:503)
  at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
  at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
  at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
  at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextH...
  1743.02 ms

...

17:15:20.041  X.X.X.X:38436  admin  REQUEST [PROPFIND] http://BaseX.huma-num.fr/webdav/skepsis/Cicero_De_Finibus.xml
17:15:40.067  X.X.X.X:38770  admin  REQUEST [PROPFIND] http://BaseX.huma-num.fr/webdav/skepsis/Cicero_De_Finibus.xml
17:16:00.078  X.X.X.X:39090  admin  REQUEST [PROPFIND] http://BaseX.huma-num.fr/webdav/skepsis/
17:16:00.170  X.X.X.X:39091  admin  REQUEST [PROPFIND] http://BaseX.huma-num.fr/webdav/bvm/bvm.xml
17:16:20.194  X.X.X.X:39582  admin  REQUEST [PROPFIND] http://BaseX.huma-num.fr/webdav/bvm/bvm.xml
17:17:00.904  X.X.X.X:39091  admin  200  60733.77 ms
17:17:00.907  X.X.X.X:39582  admin  200  40712.41 ms
17:17:00.909  X.X.X.X:38770  admin  200  80842.45 ms
17:17:00.910  X.X.X.X:38436  admin  200  100868.15 ms
17:17:01.004  X.X.X.X:39090  admin  200  60926.05 ms
17:19:37.189  Y.Y.Y.Y:39926  admin  ERROR  Timeout exceeded.  6.62 ms

...



Re: [basex-talk] Database corruption

2015-09-15 Thread Christian Grün
Hi France,

As you may have seen in my other reply to the mailing list, I have
refactored our WebDAV implementation. Could you please give it a try
[1]?

Thanks,
Christian

[1] http://files.basex.org/releases/latest/



On Mon, Sep 14, 2015 at 12:31 PM, Christian Grün wrote:
> Hi France,
>
> The NullPointerException in the stack trace indicates that the problem
> is related to a buggy date conversion. I remember that we recently
> came across a similar issue. Does the problem persist with the latest
> snapshot [1]?
>
> Best,
> Christian
>
> [1] http://files.basex.org/releases/latest/
>
>
> On Tue, Sep 1, 2015 at 10:59 PM, France Baril wrote:
>> Hi,
>>
>> Once in a while, when saving a file that is not well-formed from oXygen,
>> instead of just getting a raw file on the server, we end up with a corrupted
>> database. It doesn't happen every time, but we were able to capture what we
>> think is a relevant error message.
>>
>> We think that saving
>> 310.mot.com:8984/webdav/en-us/topics/legal-privacy-modular.xml while it
>> wasn't well-formed is what caused the issue.
>>
>> local:8984/webdav/en-us/topics/legal-privacy-modular.xml
>> [qtp1977419222-62] INFO com.bradmcevoy.http.HttpManager - PUT :: http://~local:8984/webdav/en-us/topics/legal-privacy-modular.xml - http://~local:8984/webdav/en-us/topics/legal-privacy-modular.xml
>> [qtp1977419222-64] INFO com.bradmcevoy.http.HttpManager - GET :: http://~local:8984/webdav/AppResources/dtds/catalog.xml - http://~local:8984/webdav/AppResources/dtds/catalog.xml
>> [qtp1977419222-65] INFO com.bradmcevoy.http.HttpManager - GET :: http://~local:8984/webdav/AppResources/dtds/catalog.xml - http://~local:8984/webdav/AppResources/dtds/catalog.xml
>> java.lang.NullPointerException
>> java.lang.NullPointerException
>> [qtp1977419222-62] WARN com.bradmcevoy.http.http11.PutHandler - parent exists but is not a collection resource: /webdav/en-us/topics
>> [qtp1977419222-61] WARN com.bradmcevoy.http.http11.PutHandler - parent exists but is not a collection resource: /webdav/en-us/topics
>> java.lang.NullPointerException
>> java.lang.NullPointerException
>> java.lang.NullPointerException
>> [qtp1977419222-62] WARN com.bradmcevoy.http.http11.PutHandler - Your resource factory returned a resource with a different name to that requested!!! Requested: webdav returned:  - resource factory: class org.basex.http.webdav.BXResourceFactory
>> java.lang.NullPointerException
>> [qtp1977419222-61] WARN com.bradmcevoy.http.http11.PutHandler - Your resource factory returned a resource with a different name to that requested!!! Requested: webdav returned:  - resource factory: class org.basex.http.webdav.BXResourceFactory
>> java.lang.NullPointerException
>> java.lang.NullPointerException
>> [qtp1977419222-61] ERROR com.bradmcevoy.http.StandardFilter - process
>> [qtp1977419222-62] ERROR com.bradmcevoy.http.StandardFilter - process
>> java.lang.NullPointerException
>> at java.text.SimpleDateFormat.parse(Unknown Source)
>> at java.text.DateFormat.parse(Unknown Source)
>> at org.basex.util.DateTime.parse(DateTime.java:66)
>> at org.basex.http.webdav.impl.WebDAVService.timestamp(WebDAVService.java:120)
>> at org.basex.http.webdav.impl.WebDAVService.createDb(WebDAVService.java:269)
>> at org.basex.http.webdav.BXRoot$3.get(BXRoot.java:77)
>> at org.basex.http.webdav.BXRoot$3.get(BXRoot.java:1)
>> at org.basex.http.webdav.BXCode.eval(BXCode.java:37)
>> at org.basex.http.webdav.BXRoot.createCollection(BXRoot.java:79)
>> at org.basex.http.webdav.BXRoot.createCollection(BXRoot.java:1)
>> at com.bradmcevoy.http.http11.PutHandler.findOrCreateFolders(PutHandler.java:261)
>> at com.bradmcevoy.http.http11.PutHandler.findOrCreateFolders(PutHandler.java:249)
>> at com.bradmcevoy.http.http11.PutHandler.process(PutHandler.java:175)
>> at com.bradmcevoy.http.StandardFilter.process(StandardFilter.java:52)
>> at com.bradmcevoy.http.FilterChain.process(FilterChain.java:40)
>> at com.bradmcevoy.http.HttpManager.process(HttpManager.java:228)
>> at org.basex.http.webdav.WebDAVServlet.run(WebDAVServlet.java:43)
>> at org.basex.http.BaseXServlet.service(BaseXServlet.java:64)
>> at javax.servlet.http.HttpServlet.service(HttpServlet.java:848)
>> at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:684)
>> at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:503)
>> at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
>> at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
>> at
>> 

Re: [basex-talk] Unexpected error: Improper use? Potential bug?

2015-09-15 Thread Christian Grün
> I hope your vacancies where beautiful and the return will not be too hard

Thanks; both my vacation and my return were pretty pleasant, so I
cannot complain so far ;)

> To answer to your questions, we use actually the 8.2.3 version... I will
> install the last snapshot version to see if the behavior is the same.

I have refactored our WebDAV implementation a bit. Although I have no
real clue if my changes will have a positive effect in your
environment, I invite you to check out our latest snapshot [1] and
give us some more feedback.

Thanks in advance,
Christian

[1] http://files.basex.org/releases/latest/




Re: [basex-talk] Stopword-related NullPointerException (in FTWords.java)

2015-09-15 Thread Christian Grün
Thanks again. I isolated the problem, and I will try to provide a bug
fix soon [1].

Best,
Christian

[1] https://github.com/BaseXdb/basex/issues/1192



On Mon, Sep 14, 2015 at 4:30 PM, Ron Katriel  wrote:
> Hi Christian,
>
> Thanks for following up on this. Please use the attached XML files to create
> the CTGov and MeSH databases (the first contains just NCT00303472, while the
> second contains the definitions of the 4 MeSH terms referenced in the
>  section of this CT.gov trial). Also attached is the
> stopwords file (containing just 'syndrome'). I verified that the issue is
> reproducible with these minimal files.
>
> Note: I enabled full text indexing for both databases (using SET FTINDEX
> true), in case it matters.
>
> Looking forward to having this resolved.
>
> Best,
> Ron
>
>
>
>
> On September 14, 2015 at 6:28:24 AM, Christian Grün
> (christian.gr...@gmail.com) wrote:
>
> Hi Ron,
>
> Sorry for the late reply, and thanks for your bug report. I am pretty sure
> this is a bug -- but it's difficult to guess what's going wrong. Could
> you possibly point me to the XML source documents or, ideally, provide
> me with a small example to test?
>
> Thanks,
> Christian
>
>
> On Sun, Aug 30, 2015 at 5:56 PM, Ron Katriel  wrote:
>> Hi,
>>
>> I encountered a peculiar error with a query using a stopwords file in the
>> context of a full-text search. The query joins two XML databases: CT.gov
>> (containing 86,635,503 nodes) and the 2015 MeSH dictionary (containing
>> 12,064,461 nodes). I am debugging using CT.gov trial NCT00303472, hardcoded
>> in the 'where' clause of the following query:
>>
>> let $trees := db:open('MeSH')/DescriptorRecordSet/DescriptorRecord
>> for $article in db:open('CTGov')/clinical_study
>> where $article/id_info/nct_id = 'NCT00303472'
>> let $mesh := $article/condition_browse/mesh_term
>> let $tn1 := $trees[DescriptorName/String contains text { $mesh }]
>> let $tn2 := $trees[DescriptorName/String contains text { $mesh }
>>   using stop words at "/Volumes/Extra/Documents/Standards/MeSH/stopwords.txt"]
>>   /TreeNumberList/TreeNumber
>> return  { $article/id_info/nct_id, $mesh, $tn2 } 
>>
>> When the return clause contains the variable $tn2 (i.e., using stopwords,
>> as shown above), a Java NullPointerException is generated (see the stack
>> trace below). However, when only $tn1 is returned, there is no problem (the
>> code for $tn2 is removed by the optimizer).
>>
>> The issue is related to a specific stopword (“syndrome”). When the stopword
>> is removed from the file, the exception does not occur. Surprisingly, when
>> the stopword is capitalized (“Syndrome”), the issue does not occur either,
>> even though the target MeSH term in this CT.gov trial is capitalized, that is
>>
>> Syndrome
>>
>> Am I doing something wrong, or is this a real bug in BaseX? If the former,
>> please suggest a workaround, as I would like to filter out generic MeSH
>> terms that match the stopwords before any further processing (I removed a
>> lot of code from the above query to make it easier to debug).
>>
>> Thanks,
>> Ron
>>
>>
>> Error:
>> Improper use? Potential bug? Your feedback is welcome:
>> Contact: basex-talk@mailman.uni-konstanz.de
>> Version: BaseX 8.2
>> Java: Oracle Corporation, 1.8.0_20
>> OS: Mac OS X, x86_64
>> Stack Trace:
>> java.lang.NullPointerException
>> at org.basex.query.expr.ft.FTWords$1.next(FTWords.java:166)
>> at org.basex.query.expr.ft.FTIndexAccess$1.next(FTIndexAccess.java:48)
>> at org.basex.query.expr.ft.FTIndexAccess$1.next(FTIndexAccess.java:45)
>> at org.basex.query.iter.Iter.value(Iter.java:53)
>> at org.basex.query.expr.ParseExpr.value(ParseExpr.java:67)
>> at org.basex.query.QueryContext.value(QueryContext.java:421)
>> at org.basex.query.expr.path.CachedPath.iter(CachedPath.java:41)
>> at org.basex.query.expr.path.CachedPath.iter(CachedPath.java:22)
>> at org.basex.query.QueryContext.iter(QueryContext.java:410)
>> at org.basex.query.expr.List$1.next(List.java:133)
>> at org.basex.query.expr.constr.Constr.add(Constr.java:70)
>> at org.basex.query.expr.constr.CElem.item(CElem.java:92)
>> at org.basex.query.expr.constr.CElem.item(CElem.java:23)
>> at org.basex.query.expr.ParseExpr.iter(ParseExpr.java:43)
>> at org.basex.query.expr.gflwor.GFLWOR$1.next(GFLWOR.java:99)
>> at org.basex.query.MainModule$1.next(MainModule.java:114)
>> at org.basex.query.QueryContext.cache(QueryContext.java:660)
>> at org.basex.query.QueryProcessor.cache(QueryProcessor.java:103)
>> at org.basex.core.cmd.AQuery.query(AQuery.java:83)
>> at org.basex.core.cmd.XQuery.run(XQuery.java:22)
>> at org.basex.core.Command.run(Command.java:398)
>> at org.basex.core.Command.execute(Command.java:100)
>> at org.basex.gui.GUI.exec(GUI.java:472)
>> at 
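The stop word clause used for $tn2 in the query above can also be tried in isolation. The sketch below is an illustration under stated assumptions, not a reproduction of the bug: it replaces the stop word file with an inline list (the parenthesized form of the XQuery Full Text stop word option) and uses a hypothetical descriptor name instead of MeSH data. In XQuery Full Text, a stop word in the query does not have to match the word at its position in the text, so the first comparison is expected to be false and the second true.

```xquery
(: Hypothetical descriptor name standing in for DescriptorName/String :)
let $name := "Marfan Disease"
return (
  (: No stop words: the phrase "marfan syndrome" must occur in the text :)
  $name contains text "marfan syndrome",
  (: "syndrome" is a stop word here, so it no longer has to match
     the word at its position in the text :)
  $name contains text "marfan syndrome" using stop words ("syndrome")
)
```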

Re: [basex-talk] Stopword-related NullPointerException (in FTWords.java)

2015-09-15 Thread Christian Grün
Hi Ron,

The problem is fixed in the latest snapshot [1].

By the way: If you specify stopwords when creating a database, there
is no need to specify them in the query. I have also updated our Wiki
article on Full Text Index Processing to make this more explicit [2].
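A sketch of what Christian describes, assuming BaseX's db:create function with an options map (option names in lowercase): the stop word file is attached to the full-text index when the database is created, so the query no longer needs the "using stop words at ..." clause. The stop word path is the one from Ron's query; the input file name mesh.xml is a hypothetical placeholder. Since db:create is an updating function, this would run as a separate query before the lookup.

```xquery
(: Run once, as its own query. "mesh.xml" is a hypothetical input path. :)
db:create(
  "MeSH",
  "mesh.xml",
  (),
  map {
    (: build a full-text index ... :)
    "ftindex": true(),
    (: ... ignoring the terms listed in this stop word file :)
    "stopwords": "/Volumes/Extra/Documents/Standards/MeSH/stopwords.txt"
  }
)
```

Per the note above, the lookup can then use the plain form from the thread, e.g. $trees[DescriptorName/String contains text { $mesh }].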

Hope this helps,
Christian

[1] http://files.basex.org/releases/latest/
[2] http://docs.basex.org/wiki/Full-Text#Index_Processing



Re: [basex-talk] Fwd: Re : Basex Query Optimization Support

2015-09-15 Thread Christian Grün
Hi Adi,

What does your new code look like? How did you rewrite the collection function?

Christian



On Tue, Sep 15, 2015 at 12:48 PM, Adi Babu  wrote:
> Hi Martin,
> I have been working with a collection database from the start. The
> collection database is too large (more than 7 GB) to share by mail,
> which is why I shared a sample query (with an XML document, although I
> am actually using the collection).
> I have tried all the suggestions in this thread, but the query takes longer
> in 8.2.3 than in 7.9. Can you please advise me on how to proceed?
>
> Thanks & Regards,
> Adi
>
> On Fri, Sep 11, 2015 at 10:04 PM, Martín Ferrari wrote:
>>
>> Hi Adi,
>> Have you tried Max's suggestion of adding the XML data to a BaseX database
>> first? Based on this:
>>
>> for $details in
>>   (collection("E:\Web-Projects\VodafoneUK\Docs\Cloud_Test_Stats\Performance_Analysis\Freq_Called_Sample_XML_Data.xml")
>>
>> it looks like you're using the file Freq_Called_Sample_XML_Data.xml
>> directly from the file system, which means it has no indexes and must be
>> fully scanned to get results. You should create a database and add that XML
>> file with TEXTINDEX and ATTRINDEX enabled (which is the default). You might
>> need to run the OPTIMIZE ALL command after adding the XML file; the INFO DB
>> command will show whether your database has up-to-date indexes. Once you
>> have a database, you can also use the GUI to check the query plan that is
>> used for a particular query.
>>
>>
>> Martín.
>>
>>
>> 
>> Date: Fri, 11 Sep 2015 18:37:05 +0530
>> From: adibab...@gmail.com
>> To: mgaer...@arcor.de
>> CC: basex-talk@mailman.uni-konstanz.de
>> Subject: Re: [basex-talk] Fwd: Re : Basex Query Optimization Support
>>
>> Dear Max,
>> I am currently using only the collection (size 7.96 GB); I am sharing the
>> query for your reference only. I also tried the latest version, 8.2.3, but
>> it takes more time than 7.9.
>> Can you please help me proceed further?
>>
>> Thanks & Regards,
>> Adi
>>
>> On Tue, Sep 8, 2015 at 9:15 PM, Maximilian Gärber wrote:
>>
>> Hi Adi,
>>
>> some general pointers:
>>  * Server version 7.9 is missing all the latest performance improvements;
>> the current version is 8.2.3
>>  * You are accessing the files in the file system - try adding them to
>> the database first, then you can make use of the text index etc. See
>> http://docs.basex.org/wiki/Indexes#Text_Index
>>
>> Regards,
>>
>> Max
>>
>>
>> 2015-09-04 8:31 GMT+02:00 Adi Babu :
>> > Dear Team,
>> > I am facing a performance issue: the XQuery takes a long time to execute
>> > in BaseX (client/server architecture). Can you please advise how to
>> > optimize the query performance?
>> >
>> > Below are the statistics and the BaseX server hardware specifications.
>> >
>> > Statistics :
>> > -
>> > Total Records count : 600
>> > Fetch count : 1
>> > Hits Parallel hits  : 100
>> > Time Taken : 750 seconds
>> > Architecture: Client Server Architecture
>> >
>> > Hardware Specifications of BaseX Server :
>> > --
>> > Operating System : Red Hat Enterprise Linux 64bit
>> > OS Version : Linux 2.6.32
>> > Hard Disk : 500 GB
>> > Memory Capacity : 16 GB RAM
>> > Processor Family : Intel(R) Xeon(R) CPU E5-2680 v2
>> > Processor Speed : 2.80 GHz
>> > Physical Processors : 8
>> > Virtual Processors/cores : 4
>> > Server Version : BaseX 7.9
>> >
>> > Note: I have attached the sample XML data and XQuery.
>> >
>> >
>> > Thanks & Regards
>> > Adi
>> >
>>
>>
>
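Martín's and Max's point about indexes can be illustrated with a toy model. The following is a plain-Java sketch, not BaseX's actual index implementation; the class name `TextIndexSketch` and the sample text values are hypothetical. It contrasts a full scan, which touches every text node on every query, with a map from text value to node ids that is built once up front, which is roughly the idea behind a text index built when a database is created or optimized.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TextIndexSketch {
    // Full scan: every text node is compared with the search term
    // (comparable to querying XML files straight from the file system).
    static List<Integer> scan(List<String> texts, String term) {
        List<Integer> hits = new ArrayList<>();
        for (int id = 0; id < texts.size(); id++) {
            if (texts.get(id).equals(term)) hits.add(id);
        }
        return hits;
    }

    // Built once: maps each distinct text value to the ids of the nodes holding it.
    static Map<String, List<Integer>> buildIndex(List<String> texts) {
        Map<String, List<Integer>> index = new HashMap<>();
        for (int id = 0; id < texts.size(); id++) {
            index.computeIfAbsent(texts.get(id), k -> new ArrayList<>()).add(id);
        }
        return index;
    }

    // Indexed lookup: one hash lookup instead of a pass over all nodes.
    static List<Integer> lookup(Map<String, List<Integer>> index, String term) {
        return index.getOrDefault(term, List.of());
    }

    public static void main(String[] args) {
        List<String> texts = List.of("Fund1", "EUR", "Fund2", "EUR", "Fund3");
        Map<String, List<Integer>> index = buildIndex(texts);
        System.out.println(scan(texts, "EUR"));   // [1, 3]
        System.out.println(lookup(index, "EUR")); // [1, 3]
    }
}
```

Both calls return the same node ids; the difference is that the scan cost grows with the size of the data on every query, while the index pays that cost once when it is built.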


Re: [basex-talk] Xquery collections

2015-09-15 Thread Marc

Hi Michele,
Isn't it because the following axis doesn't return the children?
Have you tried without the except, to see whether you get all your text() nodes?
Marc

On 04/09/2015 15:49, michele.gre...@email.it wrote:

Hi, I tried this XQuery:

for $e in db:open("dbName")//w:tc[.//text()="Nome"]
return $e/./following::text() except
  (for $x in db:open("dbName")//w:tc[.//text()="Indirizzo"]
   return $x//following::text())

but it only returns the result of the first document.
How can I do this for all the documents?
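Marc's hypothesis concerns the semantics of the `following` axis: it returns every node that starts after the context node ends in document order, so the context node's own descendants (and its ancestors) are excluded, while the descendants of later nodes are included. A minimal plain-Java sketch of that rule, assuming a simple tree numbered in document order; the `Node` class, the `tr`/`tc` names and the text values are hypothetical stand-ins for the `w:tc` cells in the query.

```java
import java.util.ArrayList;
import java.util.List;

public class FollowingAxis {
    // Minimal tree node; text nodes are leaves named "text:<value>".
    static final class Node {
        final String name;
        final List<Node> children = new ArrayList<>();
        int pre;   // position in document order (preorder number)
        int size;  // nodes in this subtree, including the node itself

        Node(String name, Node... kids) {
            this.name = name;
            for (Node k : kids) children.add(k);
        }
    }

    // Assigns preorder numbers and subtree sizes; collects nodes in document order.
    static int number(Node n, int pre, List<Node> order) {
        n.pre = pre;
        order.add(n);
        int next = pre + 1;
        for (Node c : n.children) next = number(c, next, order);
        n.size = next - pre;
        return next;
    }

    // following:: = every node starting after the context node's subtree ends,
    // so the node's own descendants and its ancestors are never included.
    static List<String> following(Node ctx, List<Node> order) {
        List<String> result = new ArrayList<>();
        for (int p = ctx.pre + ctx.size; p < order.size(); p++) {
            result.add(order.get(p).name);
        }
        return result;
    }

    public static void main(String[] args) {
        Node nome = new Node("tc", new Node("text:Nome"));
        Node row = new Node("tr", nome, new Node("tc", new Node("text:Mario")));
        List<Node> order = new ArrayList<>();
        number(row, 0, order);
        // prints [tc, text:Mario]: "Nome", inside the context cell, is not returned
        System.out.println(following(nome, order));
    }
}
```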

- Original Message 
From: "Dirk Kirsten" 
To:
Cc: basex-talk@mailman.uni-konstanz.de
Subject: Re: [basex-talk] Xquery collections
Date: 03/09/15 18:08

Ciao Michele,

welcome to the community :-)

Collections in BaseX are basically databases. You can simply open a
database, e.g. by issuing the XQuery db:open('mydatabase'). This
will return all documents in your database.

You can find some more examples at http://docs.basex.org/wiki/Databases

It might also be helpful for you to read some of the tutorials listed
at http://docs.basex.org/wiki/Getting_Started, such as BaseX for
Dummies, which gives a concise introduction to BaseX.

Hope this helps,
Dirk



On 09/03/2015 05:48 PM, michele.gre...@email.it wrote:

Hello, I'm Michele and I'm new to BaseX.
I created a collection with several .xml documents.
I want to know how to query the entire collection and its
documents with one XQuery.
I'm trying, but I can't.
Thanks.
MG





-- 
Dirk Kirsten, BaseX GmbH, http://basexgmbh.de
|-- Registered office: Blarerstrasse 56, 78462 Konstanz
|-- Register court Freiburg, HRB: 708285, Managing directors:
|   Dr. Christian Grün, Dr. Alexander Holupirek, Michael Seiferle
`-- Phone: 0049 7531 28 28 676, Fax: 0049 7531 20 05 22





Re: [basex-talk] Fwd: Re : Basex Query Optimization Support

2015-09-15 Thread Christian Grün
Hi Adi,

I may not be up-to-date, as I expected you to send me an updated
version of your XQuery code. Did you try what Martín suggested?

Christian



On Tue, Sep 15, 2015 at 1:19 PM, Adi Babu  wrote:
> Hi Christian,
> Please find attached a sample piece of code.
>
> Thanks & Regards
> Adi
>
>
> On Tue, Sep 15, 2015 at 4:34 PM, Christian Grün 
> wrote:
>>
>> Hi Adi,
>>
>> What does your new code look like? How did you rewrite the collection
>> function?
>>
>> Christian
>>
>>
>>
>
>


Re: [basex-talk] Fwd: Re : Basex Query Optimization Support

2015-09-15 Thread Adi Babu
Hi Christian,
I tried what Martín suggested. Below is the sample code I am using to
create the collection.


import java.io.File;
import java.io.IOException;
import java.util.List;
import org.basex.api.client.ClientSession; // BaseX 8.x; in 7.x: org.basex.server.ClientSession

/*
This method takes the list of XML file names as input and processes it:
it creates the collection DB if it does not exist yet, adds each XML file
to the collection and, if a document already exists, replaces it.
*/

public static void createCollection(List<String> listFiles) {
    ClientSession clientSession = null;
    try {
        clientSession = new ClientSession("<>", 1984, "<>", "<>");
        String strDbName = "Sample_Collection";
        // Commands are executed by the server, so this directory
        // must be readable on the server machine
        String strKeysFileLocation =
            "D:\\TEMP\\7GB_Collection_TestingTeam\\Files";

        // DB existence check: create the DB, or open it if it already exists
        String res = clientSession.execute("XQUERY db:exists('" + strDbName + "')");
        if (!res.equalsIgnoreCase("true")) {
            clientSession.execute("CREATE DB " + strDbName);
        } else {
            clientSession.execute("OPEN " + strDbName);
        }

        for (String strIndexedFile : listFiles) {
            String strFilePath = strKeysFileLocation + File.separator + strIndexedFile;
            // Document existence check inside the DB
            String tempFlag = clientSession.execute("XQUERY db:exists('"
                + strDbName + "','" + strIndexedFile + "')");
            if (tempFlag.equalsIgnoreCase("true")) {
                // Replace the existing document instead of adding a duplicate
                clientSession.execute("REPLACE " + strIndexedFile + " " + strFilePath);
            } else {
                clientSession.execute("ADD " + strFilePath);
            }
        }
        // INFO DB shows, among other things, whether the indexes are up to date
        System.out.println("Information:: " + clientSession.execute("INFO DB"));
        // Optimize the currently opened DB; ALL rebuilds all database structures
        clientSession.execute("OPTIMIZE ALL");
        System.out.println("Information:: " + clientSession.execute("INFO DB"));

    } catch (Exception e) {
        System.out.println("Exception while updating the files DB: " + e.getMessage());
        e.printStackTrace();
    } finally {
        if (clientSession != null) {
            try {
                clientSession.close();
            } catch (IOException ex) {
                ex.printStackTrace();
            }
        }
    }
}


Thanks & Regards,
Adi



Re: [basex-talk] Stopword-related NullPointerException (in FTWords.java)

2015-09-15 Thread Ron Katriel
Hi Christian,

I downloaded the latest release and confirmed the fix. Thanks for the quick 
turnaround!

However, I am now getting a different (presumably unrelated) error

Stopped at /Users/rkatriel/Documents/Data Science/Data Sets/CUR/CustomerUsageReport/2015/07-15-2015/JOIN/cdsstudies.drugbank.join.result/file, 1/51:
[XQST0033] Duplicate declaration of prefix 'functx'.

when executing the following query (rest of code omitted for brevity)

declare namespace functx = "http://www.functx.com";

declare function functx:value-union ($arg1 as xs:anyAtomicType*, $arg2 as 
xs:anyAtomicType*) as xs:anyAtomicType* {
  ($arg1, $arg2)
};

Changing the namespace or function name does not help; omitting the namespace 
declaration produces a different error (No namespace declared for 'functx:value-union').

This worked before. Has something changed?

Best,
Ron


Ron Katriel, Ph.D. | Senior Data Scientist | Medidata Solutions
350 Hudson Street, 7th Floor, New York, NY 10014
rkatr...@mdsol.com | direct: +1 201 337 3622 | mobile: +1 201 675 5598 | main: +1 212 918 1800

On September 15, 2015 at 7:05:59 AM, Christian Grün (christian.gr...@gmail.com) 
wrote:

Hi Ron,  

The problem is fixed in the latest snapshot [1].  

By the way: If you specify a stopword when creating a database, there  
is no need to specify it in the query. I have also updated our Wiki  
article on Full Text Index Processing to make this more explicit [2].  

Hope this helps,  
Christian  

[1] http://files.basex.org/releases/latest/  
[2] http://docs.basex.org/wiki/Full-Text#Index_Processing  


On Mon, Sep 14, 2015 at 4:30 PM, Ron Katriel  wrote:  
> Hi Christian,  
>  
> Thanks for following up on this. Please use the attached XML files to create  
> the CTGov and MeSH databases (the first contains just NCT00303472, while the  
> second contains the definitions of the 4 MeSH terms referenced in the  
>  section of this CT.gov trial). Also attached is the  
> stopwords file (containing just 'syndrome'). I verified that the issue is  
> reproducible with these minimal files.  
>  
> Note: I enabled full text indexing for both databases (using SET FTINDEX  
> true), in case it matters.  
>  
> Looking forward to having this resolved.  
>  
> Best,  
> Ron  
>  
>  
>  
>  
> On September 14, 2015 at 6:28:24 AM, Christian Grün  
> (christian.gr...@gmail.com) wrote:  
>  
> Hi Ron,  
>  
> Sorry for the late reply, and thanks for your bug report. I am pretty sure  
> this is a bug -- but it's difficult to guess what's going wrong. Could  
> you possibly point me to the XML source documents or ideally provide  
> me a small example to test?  
>  
> Thanks,  
> Christian  
>  
>  
> On Sun, Aug 30, 2015 at 5:56 PM, Ron Katriel  wrote:  
>> Hi,  
>>  
>> I encountered a peculiar error with a query using a stopwords file in the  
>> context of a full text search. The query joins two XML databases: CT.gov  
>> (containing 86635503 nodes) and the 2015 MeSH dictionary (containing  
>> 12064461 nodes). I am debugging using CT.gov trial NCT00303472, hardcoded  
>> in the 'where' clause of the following query:  
>>  
>> let $trees := db:open('MeSH')/DescriptorRecordSet/DescriptorRecord  
>> for $article in db:open('CTGov')/clinical_study  
>> where $article/id_info/nct_id = 'NCT00303472'  
>> let $mesh := $article/condition_browse/mesh_term  
>> let $tn1 := $trees[DescriptorName/String contains text { $mesh }]  
>> let $tn2 := $trees[DescriptorName/String contains text { $mesh }
>>   using stop words at "/Volumes/Extra/Documents/Standards/MeSH/stopwords.txt"]/TreeNumberList/TreeNumber
>> return  { $article/id_info/nct_id, $mesh, $tn2 }   
>>  
>> When the return clause contains the variable $tn2 (i.e., using stopwords -  
>> as shown above) a Java NullPointerException is generated (see the stack  
>> trace below). However, when only $tn1 is returned there is no problem (the  
>> code for $tn2 is removed by the optimizer).  
>>  
>> The issue is related to a specific stopword (“syndrome”). When the  
>> stopword is removed from the file, the exception does not occur.  
>> Surprisingly, when the stopword is capitalized (“Syndrome”), the issue does  
>> not occur either - even though the target MeSH term in this CT.gov trial is  
>> capitalized, that is  
>>  
>> Syndrome  
>>  
>> Am I doing something wrong, or is this a real bug in BaseX? If the former,  
>> please suggest a workaround as I would like to filter out generic MeSH  
>> terms  
>> that match the stopwords before any further processing (I removed a lot of  
>> code from the above query to make it easier to debug).  
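As a workaround outside the full-text query, the generic terms could also be dropped before MeSH is queried. The sketch below is plain Java, not BaseX's stopword mechanism; it assumes the stopword list is already loaded (for example with `java.nio.file.Files.readAllLines`) and that matching should ignore case. The class name `StopwordFilter` and the sample terms are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class StopwordFilter {
    // Lower-cases the stopwords once, so "syndrome" also matches "Syndrome".
    static Set<String> normalize(List<String> stopwords) {
        Set<String> set = new HashSet<>();
        for (String w : stopwords) set.add(w.trim().toLowerCase(Locale.ROOT));
        return set;
    }

    // Keeps only the terms that do not match a stopword (case-insensitively).
    static List<String> filter(List<String> terms, Set<String> stop) {
        List<String> kept = new ArrayList<>();
        for (String t : terms) {
            if (!stop.contains(t.trim().toLowerCase(Locale.ROOT))) kept.add(t);
        }
        return kept;
    }

    public static void main(String[] args) {
        Set<String> stop = normalize(List.of("syndrome"));
        // prints [Sepsis]
        System.out.println(filter(List.of("Syndrome", "Sepsis"), stop));
    }
}
```

Ignoring case here is a design choice of the sketch; drop the `toLowerCase` calls if the comparison should be exact.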
>>  
>> Thanks,  
>> Ron  
>>  
>>  
>> Error:  
>> Improper use? Potential bug? Your feedback is welcome:  
>> Contact: 

Re: [basex-talk] Different behavior with same query in BaseX 7.9 and 8.2.3

2015-09-15 Thread Antoine WOLF
Hello Christian, hello Dirk,

Thank you for your quick answer and for the fix.
We will update to the latest snapshot.

Best Regards,
Antoine







On Mon, Sep 14, 2015 at 4:41 PM, Christian Grün 
wrote:

> Thanks Antoine, thanks Dirk,
> The bug has been fixed in the latest snapshot [1]. BaseX 8.3 is
> planned to be released this month.
> Best,
> Christian
> [1] http://files.basex.org/releases/latest
> On Wed, Sep 9, 2015 at 1:29 PM, Dirk Kirsten  wrote:
>> Hello Antoine,
>>
>> this appears to be a bug in the most recent BaseX version. The optimizer seems
>> to treat the where clause as equivalent to a positional predicate, and the
>> query is rewritten to
>>
>>   for $d_0 in
>> document-node()()/descendant::fund/properties/domain[position() = 1] return
>> data($d_0/@reference)
>>
>> i.e. without a where clause but with the position() predicate. Also, if you
>> return something like "return data($d/@reference) || $i" it works as
>> expected.
>>
>> Hence, I created a new issue for this at
>> https://github.com/BaseXdb/basex/issues/1189.
>>
>> Thanks for reporting this. Please be aware that our main developer, Christian,
>> is currently on vacation (until next week), so it will take slightly longer
>> to get this fixed (knowing Christian, it would usually be fixed by the
>> end of the day). I hope this doesn't influence the decision-making process
>> in your evaluation.
>>
>> Cheers
>> Dirk
>>
>>
>>
>>
>>
>> On 09/09/2015 11:22 AM, Antoine WOLF wrote:
>>
>> Hi,
>>
>> We are in the process of evaluating BaseX and recently upgraded from
>> version 7.9 to 8.2.3.
>>
>>
>> We have a DB “TESTDB” with 50 xml files, named Fund1.xml, Fund2.xml,
>> Fund3.xml,…. each having the same structure:
>>
>> Fund1.xml
>> --
>>
>> < fund>
>> …
>> 
>> …
>> 
>> …
>> 
>> …
>> 
>> …
>> 
>>
>>
>> Fund2.xml
>> --
>>
>> < fund>
>> …
>> 
>> …
>> 
>> …
>> 
>> …
>> 
>> …
>> 
>>
>> ….
>> ….
>>
>>
>>
>> The following query does not behave the same on both versions :
>>
>>
>> for $d at $i in db:open("TESTDB")/fund/properties/domain
>> where $i < 2
>> return data($d/@reference)
>>
>>
>> Result in 7.9
>> --
>>
>> “Fund1”
>>
>>
>> Result in 8.2.3
>> -
>>
>> “Fund1 Fund2 Fund3 Fund4 …”
>>
>>
>> Is this a bug, or is there something wrong with the query?
>>
>>
>> Best Regards,
>>
>> Antoine Wolf
>>
>>
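Dirk's explanation of the faulty rewrite comes down to where the position is counted. In `for $d at $i in .../domain where $i < 2`, `$i` numbers the items of the whole flattened sequence, so only the first domain overall survives (the 7.9 result); `domain[position() = 1]` is evaluated separately for each parent, so every document contributes its first domain (the 8.2.3 result). The plain-Java sketch below models this; it is not BaseX code, the class and method names and the data are hypothetical, and it assumes each fund document has a single `properties` element.

```java
import java.util.ArrayList;
import java.util.List;

public class PositionalFilter {
    // domainsPerFund.get(d) holds the @reference values of the domain
    // elements of the d-th fund document, in document order.

    // for $d at $i in .../domain where $i < 2:
    // $i counts across the whole sequence, so only the first domain overall is kept.
    static List<String> atIndexWhereLessThan2(List<List<String>> domainsPerFund) {
        List<String> result = new ArrayList<>();
        int i = 0;
        for (List<String> fund : domainsPerFund) {
            for (String ref : fund) {
                i++; // 1-based position in the flattened sequence
                if (i < 2) result.add(ref);
            }
        }
        return result;
    }

    // .../domain[position() = 1]: the predicate is applied per parent,
    // so the first domain of every document is returned.
    static List<String> firstPerParent(List<List<String>> domainsPerFund) {
        List<String> result = new ArrayList<>();
        for (List<String> fund : domainsPerFund) {
            if (!fund.isEmpty()) result.add(fund.get(0));
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> funds = List.of(List.of("Fund1"), List.of("Fund2"), List.of("Fund3"));
        System.out.println(atIndexWhereLessThan2(funds)); // [Fund1]
        System.out.println(firstPerParent(funds));        // [Fund1, Fund2, Fund3]
    }
}
```

In XQuery itself, parenthesizing the path, as in `(db:open("TESTDB")/fund/properties/domain)[1]`, applies the predicate to the complete sequence and selects only the first domain overall.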

Re: [basex-talk] Fwd: Re : Basex Query Optimization Support

2015-09-15 Thread Adi Babu
Hi Martín,
I have been working with a collection database from the start. It is not
possible to share the large collection DB (more than 7 GB) by mail, which is
why I am sharing a sample query (with an XML document, although I actually
use the collection).
I have tried all the suggestions in this mail thread, but the query takes
longer in 8.2.3 than in 7.9. Can you please advise how to proceed?

Thanks & Regards,
Adi
