Hi Christian,

I downloaded the latest release and confirmed the fix. Thanks for the quick 
turnaround!

However, I am now getting a different (presumably unrelated) error:

Stopped at /Users/rkatriel/Documents/Data Science/Data Sets/CUR/CustomerUsageReport/2015/07-15-2015/JOIN/cdsstudies.drugbank.join.result/file, 1/51:
[XQST0033] Duplicate declaration of prefix 'functx'.

when executing the following query (rest of code omitted for brevity):

declare namespace functx = "http://www.functx.com";

declare function functx:value-union ($arg1 as xs:anyAtomicType*, $arg2 as 
xs:anyAtomicType*) as xs:anyAtomicType* {
  ($arg1, $arg2)
};

Changing the namespace or function name does not help; omitting the declaration 
produces a different error (No namespace declared for 'functx:value-union').

This worked before. Has something changed?
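
For reference, my understanding is that XQST0033 is raised when the same prefix is bound more than once in a module's prolog. Below is a minimal sketch of that situation. It is an illustration of the error in general, not an excerpt from my actual file, and the second declaration is added on purpose:

```xquery
(: Sketch only, not the real query: the prefix 'functx' is bound twice,
   which on its own is enough to trigger XQST0033. :)
declare namespace functx = "http://www.functx.com";
declare namespace functx = "http://www.functx.com";

declare function functx:value-union(
  $arg1 as xs:anyAtomicType*,
  $arg2 as xs:anyAtomicType*
) as xs:anyAtomicType* {
  ($arg1, $arg2)
};

functx:value-union((1, 2), (3))
```

With the second declaration removed, the sketch evaluates to the sequence 1 2 3. A module import that binds the same prefix elsewhere in the prolog would, as far as I know, count as a duplicate binding too.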

Best,
Ron


Ron Katriel, Ph.D. | Senior Data Scientist | Medidata Solutions
350 Hudson Street, 7th Floor, New York, NY 10014
rkatr...@mdsol.com | direct: +1 201 337 3622 | mobile: +1 201 675 5598 | main: 
+1 212 918 1800

On September 15, 2015 at 7:05:59 AM, Christian Grün (christian.gr...@gmail.com) 
wrote:

Hi Ron,  

The problem is fixed in the latest snapshot [1].  

By the way: If you specify a stopword when creating a database, there  
is no need to specify it in the query. I have also updated our Wiki  
article on Full Text Index Processing to make this more explicit [2].  
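
As a rough sketch of what this could look like (the file paths are placeholders, and it assumes the database is created with db:create rather than via the GUI or commands), the stopword list is passed as an index option at creation time:

```xquery
(: Sketch only: both paths are placeholders, not real locations.
   Creates the MeSH database with a full-text index that drops the
   tokens listed in the given stopword file. :)
db:create(
  'MeSH',
  '/path/to/mesh.xml',
  'mesh.xml',
  map {
    'ftindex': true(),
    'stopwords': '/path/to/stopwords.txt'
  }
)
```

A query against a database created this way should then be able to use a plain contains text { $mesh } expression without the using stop words at clause.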

Hope this helps,  
Christian  

[1] http://files.basex.org/releases/latest/  
[2] http://docs.basex.org/wiki/Full-Text#Index_Processing  


On Mon, Sep 14, 2015 at 4:30 PM, Ron Katriel <rkatr...@mdsol.com> wrote:  
> Hi Christian,  
>  
> Thanks for following up on this. Please use the attached XML files to create  
> the CTGov and MeSH databases (the first contains just NCT00303472 while the  
> second contains the definitions of the 4 MeSH terms referenced in the  
> <condition_browse> section of this CT.gov trial). Also attached is the  
> stopwords file (containing just 'syndrome'). I verified that the issue is  
> reproducible with these minimal files.  
>  
> Note: I enabled full text indexing for both databases (using SET FTINDEX  
> true), in case it matters.  
>  
> Looking forward to having this resolved.  
>  
> Best,  
> Ron  
>  
>  
>  
>  
> On September 14, 2015 at 6:28:24 AM, Christian Grün  
> (christian.gr...@gmail.com) wrote:  
>  
> Hi Ron,  
>  
> Sorry for the late reply and thanks for your bug report. I am pretty sure  
> this is a bug -- but it's difficult to guess what's going wrong. Could  
> you possibly point me to the XML source documents or ideally provide  
> me with a small example to test?  
>  
> Thanks,  
> Christian  
>  
>  
> On Sun, Aug 30, 2015 at 5:56 PM, Ron Katriel <rkatr...@mdsol.com> wrote:  
>> Hi,  
>>  
>> I encountered a peculiar error with a query using a stopwords file in the  
>> context of a full text search. The query joins two XML databases: CT.gov  
>> (containing 86635503 nodes) and the 2015 MeSH dictionary (containing  
>> 12064461 nodes). I am debugging using CT.gov trial NCT00303472, hardcoded  
>> in the 'where' clause of the following query:  
>>  
>> let $trees := db:open('MeSH')/DescriptorRecordSet/DescriptorRecord  
>> for $article in db:open('CTGov')/clinical_study  
>> where $article/id_info/nct_id = 'NCT00303472'  
>> let $mesh := $article/condition_browse/mesh_term  
>> let $tn1 := $trees[DescriptorName/String contains text { $mesh }]  
>> let $tn2 := $trees[DescriptorName/String contains text { $mesh }  
>>   using stop words at "/Volumes/Extra/Documents/Standards/MeSH/stopwords.txt"  
>> ]/TreeNumberList/TreeNumber  
>> return <match> { $article/id_info/nct_id, $mesh, $tn2 } </match>  
>>  
>> When the return clause contains the variable $tn2 (i.e., using stopwords -  
>> as shown above) a Java NullPointerException is generated (see the stack  
>> trace below). However, when only $tn1 is returned there is no problem (the  
>> code for $tn2 is removed by the optimizer).  
>>  
>> The issue is related to a specific stopword (“syndrome”). When the stopword  
>> is removed from the file the exception does not occur. Surprisingly, when  
>> the stopword is in uppercase (“Syndrome”) the issue does not occur - even  
>> though the target MeSH term in this CT.gov trial is in uppercase, that is  
>>  
>> <mesh_term>Syndrome</mesh_term>  
>>  
>> Am I doing something wrong, or is this a real bug in BaseX? If the former,  
>> please suggest a workaround as I would like to filter out generic MeSH  
>> terms that match the stopwords before any further processing (I removed a  
>> lot of code from the above query to make it easier to debug).  
>>  
>> Thanks,  
>> Ron  
>>  
>>  
>> Error:  
>> Improper use? Potential bug? Your feedback is welcome:  
>> Contact: basex-talk@mailman.uni-konstanz.de  
>> Version: BaseX 8.2  
>> Java: Oracle Corporation, 1.8.0_20  
>> OS: Mac OS X, x86_64  
>> Stack Trace:  
>> java.lang.NullPointerException  
>> at org.basex.query.expr.ft.FTWords$1.next(FTWords.java:166)  
>> at org.basex.query.expr.ft.FTIndexAccess$1.next(FTIndexAccess.java:48)  
>> at org.basex.query.expr.ft.FTIndexAccess$1.next(FTIndexAccess.java:45)  
>> at org.basex.query.iter.Iter.value(Iter.java:53)  
>> at org.basex.query.expr.ParseExpr.value(ParseExpr.java:67)  
>> at org.basex.query.QueryContext.value(QueryContext.java:421)  
>> at org.basex.query.expr.path.CachedPath.iter(CachedPath.java:41)  
>> at org.basex.query.expr.path.CachedPath.iter(CachedPath.java:22)  
>> at org.basex.query.QueryContext.iter(QueryContext.java:410)  
>> at org.basex.query.expr.List$1.next(List.java:133)  
>> at org.basex.query.expr.constr.Constr.add(Constr.java:70)  
>> at org.basex.query.expr.constr.CElem.item(CElem.java:92)  
>> at org.basex.query.expr.constr.CElem.item(CElem.java:23)  
>> at org.basex.query.expr.ParseExpr.iter(ParseExpr.java:43)  
>> at org.basex.query.expr.gflwor.GFLWOR$1.next(GFLWOR.java:99)  
>> at org.basex.query.MainModule$1.next(MainModule.java:114)  
>> at org.basex.query.QueryContext.cache(QueryContext.java:660)  
>> at org.basex.query.QueryProcessor.cache(QueryProcessor.java:103)  
>> at org.basex.core.cmd.AQuery.query(AQuery.java:83)  
>> at org.basex.core.cmd.XQuery.run(XQuery.java:22)  
>> at org.basex.core.Command.run(Command.java:398)  
>> at org.basex.core.Command.execute(Command.java:100)  
>> at org.basex.gui.GUI.exec(GUI.java:472)  
>> at org.basex.gui.GUI.access$400(GUI.java:43)  
>> at org.basex.gui.GUI$7.run(GUI.java:412)  
>> Compiling:  
>> - pre-evaluating db:open("MeSH")  
>> - pre-evaluating db:open("CTGov")  
>> - inlining $trees_0  
>> - applying full-text index for { $mesh_2 } using language 'English'  
>> - applying full-text index for { $mesh_2 } using language 'English'  
>> - inlining $tn2_4  
>> - removing variable $tn1_3  
>> - applying text index for "NCT00303472"  
>> - rewriting where clause(s)  
>> Query:  
>> let $trees := db:open('MeSH')/DescriptorRecordSet/DescriptorRecord for  
>> $article in db:open('CTGov')/clinical_study where $article/id_info/nct_id =  
>> 'NCT00303472' let $mesh := $article/condition_browse/mesh_term let $tn1 :=  
>> $trees[DescriptorName/String contains text { $mesh }] let $tn2 :=  
>> $trees[DescriptorName/String contains text { $mesh } using stop words at  
>> "/Volumes/Extra/Documents/Standards/MeSH/stopwords.txt"]/TreeNumberList/TreeNumber  
>> return <match> { $article/id_info/nct_id, $mesh, $tn2 } </match>  
>> Optimized Query:  
>> for $article_1 in db:text("CTGov",  
>> "NCT00303472")/parent::*:nct_id/parent::*:id_info/parent::*:clinical_study  
>> let $mesh_2 := $article_1/*:condition_browse/*:mesh_term return element  
>> match { (($article_1/*:id_info/*:nct_id, $mesh_2, ft:search("MeSH", {  
>> $mesh_2 } using language  
>> 'English')/parent::*:String/parent::*:DescriptorName/parent::*:DescriptorRecord/TreeNumberList/TreeNumber))  
>> }  
>> Query plan:  
>> <QueryPlan compiled="true">  
>> <GFLWOR>  
>> <For>  
>> <Var name="$article" id="1"/>  
>> <IterPath>  
>> <ValueAccess data="CTGov" type="TEXT" name="*:nct_id">  
>> <Str value="NCT00303472" type="xs:string"/>  
>> </ValueAccess>  
>> <IterStep axis="parent" test="*:id_info"/>  
>> <IterStep axis="parent" test="*:clinical_study"/>  
>> </IterPath>  
>> </For>  
>> <Let>  
>> <Var name="$mesh" id="2"/>  
>> <IterPath>  
>> <VarRef>  
>> <Var name="$article" id="1"/>  
>> </VarRef>  
>> <IterStep axis="child" test="*:condition_browse"/>  
>> <IterStep axis="child" test="*:mesh_term"/>  
>> </IterPath>  
>> </Let>  
>> <CElem>  
>> <QNm value="match" type="xs:QName"/>  
>> <List>  
>> <IterPath>  
>> <VarRef>  
>> <Var name="$article" id="1"/>  
>> </VarRef>  
>> <IterStep axis="child" test="*:id_info"/>  
>> <IterStep axis="child" test="*:nct_id"/>  
>> </IterPath>  
>> <VarRef>  
>> <Var name="$mesh" id="2"/>  
>> </VarRef>  
>> <CachedPath>  
>> <FTIndexAccess data="MeSH">  
>> <FTWords>  
>> <VarRef>  
>> <Var name="$mesh" id="2"/>  
>> </VarRef>  
>> </FTWords>  
>> </FTIndexAccess>  
>> <IterStep axis="parent" test="*:String"/>  
>> <IterStep axis="parent" test="*:DescriptorName"/>  
>> <IterStep axis="parent" test="*:DescriptorRecord"/>  
>> <IterStep axis="child" test="TreeNumberList"/>  
>> <IterStep axis="child" test="TreeNumber"/>  
>> </CachedPath>  
>> </List>  
>> </CElem>  
>> </GFLWOR>  
>> </QueryPlan>  
>>  
>>  
>>  
