Christian,

Please ignore the report below. It was triggered by a syntax error in the query (in a section not shown below). The error message threw me off as it was unrelated.

Thanks,
Ron

Ron Katriel, Ph.D. | Senior Data Scientist | Medidata Solutions
350 Hudson Street, 7th Floor, New York, NY 10014
rkatr...@mdsol.com | direct: +1 201 337 3622 | mobile: +1 201 675 5598 | main: +1 212 918 1800

On September 15, 2015 at 1:43:44 PM, Ron Katriel (rkatr...@mdsol.com) wrote:

Hi Christian,

I downloaded the latest release and confirmed the fix. Thanks for the quick turnaround!

However, I am now getting a different (presumably unrelated) error

Stopped at /Users/rkatriel/Documents/Data Science/Data Sets/CUR/CustomerUsageReport/2015/07-15-2015/JOIN/cdsstudies.drugbank.join.result/file, 1/51:
[XQST0033] Duplicate declaration of prefix 'functx'.

when executing the following query (rest of code omitted for brevity)

declare namespace functx = "http://www.functx.com";
declare function functx:value-union
  ($arg1 as xs:anyAtomicType*,
   $arg2 as xs:anyAtomicType*) as xs:anyAtomicType* {
  ($arg1, $arg2)
};

Changing the namespace or function name does not help; omitting it produces a different error (No namespace declared for 'functx:value-union'). This worked before. Has something changed?

Best,
Ron

Ron Katriel, Ph.D. | Senior Data Scientist | Medidata Solutions
350 Hudson Street, 7th Floor, New York, NY 10014
rkatr...@mdsol.com | direct: +1 201 337 3622 | mobile: +1 201 675 5598 | main: +1 212 918 1800

On September 15, 2015 at 7:05:59 AM, Christian Grün (christian.gr...@gmail.com) wrote:

Hi Ron,

The problem is fixed in the latest snapshot [1]. By the way: If you specify a stopword when creating a database, there is no need to specify it in the query. I have also updated our Wiki article on Full Text Index Processing to make this more explicit [2].

Hope this helps,
Christian

[1] http://files.basex.org/releases/latest/
[2] http://docs.basex.org/wiki/Full-Text#Index_Processing

On Mon, Sep 14, 2015 at 4:30 PM, Ron Katriel <rkatr...@mdsol.com> wrote:
> Hi Christian,
>
> Thanks for following up on this.
> Please use the attached XML files to create the CTGov and MeSH databases (the first contains just NCT00303472, while the second contains the definitions of the 4 MeSH terms referenced in the <condition_browse> section of this CT.gov trial). Also attached is the stopwords file (containing just 'syndrome'). I verified that the issue is reproducible with these minimal files.
>
> Note: I enabled full text indexing for both databases (using SET FTINDEX true), in case it matters.
>
> Looking forward to having this resolved.
>
> Best,
> Ron
>
> Ron Katriel, Ph.D. | Senior Data Scientist | Medidata Solutions
> 350 Hudson Street, 7th Floor, New York, NY 10014
> rkatr...@mdsol.com | direct: +1 201 337 3622 | mobile: +1 201 675 5598 | main: +1 212 918 1800
>
> On September 14, 2015 at 6:28:24 AM, Christian Grün (christian.gr...@gmail.com) wrote:
>
> Hi Ron,
>
> Sorry for the late reply and thanks for your bug report. I am pretty sure this is a bug -- but it's difficult to guess what's going wrong. Could you possibly point me to the XML source documents or ideally provide me a small example to test?
>
> Thanks,
> Christian
>
> On Sun, Aug 30, 2015 at 5:56 PM, Ron Katriel <rkatr...@mdsol.com> wrote:
>> Hi,
>>
>> I encountered a peculiar error with a query using a stopwords file in the context of a full text search. The query joins two XML databases: CT.gov (containing 86635503 nodes) and the 2015 MeSH dictionary (containing 12064461 nodes).
>> I am debugging using CT.gov trial NCT00303472, hardcoded in the 'where' clause of the following query:
>>
>> let $trees := db:open('MeSH')/DescriptorRecordSet/DescriptorRecord
>> for $article in db:open('CTGov')/clinical_study
>> where $article/id_info/nct_id = 'NCT00303472'
>> let $mesh := $article/condition_browse/mesh_term
>> let $tn1 := $trees[DescriptorName/String contains text { $mesh }]
>> let $tn2 := $trees[DescriptorName/String contains text { $mesh } using stop words at
>>   "/Volumes/Extra/Documents/Standards/MeSH/stopwords.txt"]/TreeNumberList/TreeNumber
>> return <match> { $article/id_info/nct_id, $mesh, $tn2 } </match>
>>
>> When the return clause contains the variable $tn2 (i.e., using stopwords - as shown above) a Java NullPointerException is generated (see the stack trace below). However, when only $tn1 is returned there is no problem (the code for $tn2 is removed by the optimizer).
>>
>> The issue is related to a specific stopword (“syndrome”). When the stopword is removed from the file the exception does not occur. Surprisingly, when the stopword is in uppercase (“Syndrome”) the issue does not occur - even though the target MeSH term in this CT.gov trial is in uppercase, that is
>>
>> <mesh_term>Syndrome</mesh_term>
>>
>> Am I doing something wrong, or is this a real bug in BaseX? If the former, please suggest a workaround as I would like to filter out generic MeSH terms that match the stopwords before any further processing (I removed a lot of code from the above query to make it easier to debug).
>>
>> Thanks,
>> Ron
>>
>> Error:
>> Improper use? Potential bug?
>> Your feedback is welcome:
>> Contact: basex-talk@mailman.uni-konstanz.de
>> Version: BaseX 8.2
>> Java: Oracle Corporation, 1.8.0_20
>> OS: Mac OS X, x86_64
>> Stack Trace:
>> java.lang.NullPointerException
>>   at org.basex.query.expr.ft.FTWords$1.next(FTWords.java:166)
>>   at org.basex.query.expr.ft.FTIndexAccess$1.next(FTIndexAccess.java:48)
>>   at org.basex.query.expr.ft.FTIndexAccess$1.next(FTIndexAccess.java:45)
>>   at org.basex.query.iter.Iter.value(Iter.java:53)
>>   at org.basex.query.expr.ParseExpr.value(ParseExpr.java:67)
>>   at org.basex.query.QueryContext.value(QueryContext.java:421)
>>   at org.basex.query.expr.path.CachedPath.iter(CachedPath.java:41)
>>   at org.basex.query.expr.path.CachedPath.iter(CachedPath.java:22)
>>   at org.basex.query.QueryContext.iter(QueryContext.java:410)
>>   at org.basex.query.expr.List$1.next(List.java:133)
>>   at org.basex.query.expr.constr.Constr.add(Constr.java:70)
>>   at org.basex.query.expr.constr.CElem.item(CElem.java:92)
>>   at org.basex.query.expr.constr.CElem.item(CElem.java:23)
>>   at org.basex.query.expr.ParseExpr.iter(ParseExpr.java:43)
>>   at org.basex.query.expr.gflwor.GFLWOR$1.next(GFLWOR.java:99)
>>   at org.basex.query.MainModule$1.next(MainModule.java:114)
>>   at org.basex.query.QueryContext.cache(QueryContext.java:660)
>>   at org.basex.query.QueryProcessor.cache(QueryProcessor.java:103)
>>   at org.basex.core.cmd.AQuery.query(AQuery.java:83)
>>   at org.basex.core.cmd.XQuery.run(XQuery.java:22)
>>   at org.basex.core.Command.run(Command.java:398)
>>   at org.basex.core.Command.execute(Command.java:100)
>>   at org.basex.gui.GUI.exec(GUI.java:472)
>>   at org.basex.gui.GUI.access$400(GUI.java:43)
>>   at org.basex.gui.GUI$7.run(GUI.java:412)
>> Compiling:
>> - pre-evaluating db:open("MeSH")
>> - pre-evaluating db:open("CTGov")
>> - inlining $trees_0
>> - applying full-text index for { $mesh_2 } using language 'English'
>> - applying full-text index for { $mesh_2 } using language 'English'
>> - inlining $tn2_4
>> - removing variable $tn1_3
>> - applying text index for "NCT00303472"
>> - rewriting where clause(s)
>> Query:
>> let $trees := db:open('MeSH')/DescriptorRecordSet/DescriptorRecord for $article in db:open('CTGov')/clinical_study where $article/id_info/nct_id = 'NCT00303472' let $mesh := $article/condition_browse/mesh_term let $tn1 := $trees[DescriptorName/String contains text { $mesh }] let $tn2 := $trees[DescriptorName/String contains text { $mesh } using stop words at "/Volumes/Extra/Documents/Standards/MeSH/stopwords.txt"]/TreeNumberList/TreeNumber return <match> { $article/id_info/nct_id, $mesh, $tn2 } </match>
>> Optimized Query:
>> for $article_1 in db:text("CTGov", "NCT00303472")/parent::*:nct_id/parent::*:id_info/parent::*:clinical_study let $mesh_2 := $article_1/*:condition_browse/*:mesh_term return element match { (($article_1/*:id_info/*:nct_id, $mesh_2, ft:search("MeSH", { $mesh_2 } using language 'English')/parent::*:String/parent::*:DescriptorName/parent::*:DescriptorRecord/TreeNumberList/TreeNumber)) }
>> Query plan:
>> <QueryPlan compiled="true">
>>   <GFLWOR>
>>     <For>
>>       <Var name="$article" id="1"/>
>>       <IterPath>
>>         <ValueAccess data="CTGov" type="TEXT" name="*:nct_id">
>>           <Str value="NCT00303472" type="xs:string"/>
>>         </ValueAccess>
>>         <IterStep axis="parent" test="*:id_info"/>
>>         <IterStep axis="parent" test="*:clinical_study"/>
>>       </IterPath>
>>     </For>
>>     <Let>
>>       <Var name="$mesh" id="2"/>
>>       <IterPath>
>>         <VarRef>
>>           <Var name="$article" id="1"/>
>>         </VarRef>
>>         <IterStep axis="child" test="*:condition_browse"/>
>>         <IterStep axis="child" test="*:mesh_term"/>
>>       </IterPath>
>>     </Let>
>>     <CElem>
>>       <QNm value="match" type="xs:QName"/>
>>       <List>
>>         <IterPath>
>>           <VarRef>
>>             <Var name="$article" id="1"/>
>>           </VarRef>
>>           <IterStep axis="child" test="*:id_info"/>
>>           <IterStep axis="child" test="*:nct_id"/>
>>         </IterPath>
>>         <VarRef>
>>           <Var name="$mesh" id="2"/>
>>         </VarRef>
>>         <CachedPath>
>>           <FTIndexAccess data="MeSH">
>>             <FTWords>
>>               <VarRef>
>>                 <Var name="$mesh" id="2"/>
>>               </VarRef>
>>             </FTWords>
>>           </FTIndexAccess>
>>           <IterStep axis="parent" test="*:String"/>
>>           <IterStep axis="parent" test="*:DescriptorName"/>
>>           <IterStep axis="parent" test="*:DescriptorRecord"/>
>>           <IterStep axis="child" test="TreeNumberList"/>
>>           <IterStep axis="child" test="TreeNumber"/>
>>         </CachedPath>
>>       </List>
>>     </CElem>
>>   </GFLWOR>
>> </QueryPlan>
>>
>> Ron Katriel, Ph.D. | Senior Data Scientist | Medidata Solutions
>> 350 Hudson Street, 7th Floor, New York, NY 10014
>> rkatr...@mdsol.com | direct: +1 201 337 3622 | mobile: +1 201 675 5598 | main: +1 212 918 1800
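
Christian's tip earlier in the thread (declare the stopword list when the database is created, so the query needs no `using stop words at` clause) could be applied roughly as in the BaseX command script below. This is only a sketch under assumptions: the stopwords path is the one from the original query, `mesh.xml` is a hypothetical name for the MeSH input file, and `#` comment lines are assumed to be accepted by the command-script parser of the BaseX version in use. The two `SET` commands come before `CREATE DB` because index options are expected to take effect when the database is built.

```
# Sketch only: mesh.xml is a hypothetical input file name, not taken from the thread.
SET FTINDEX true
SET STOPWORDS /Volumes/Extra/Documents/Standards/MeSH/stopwords.txt
CREATE DB MeSH mesh.xml
```

With the stopwords stored in the full-text index, the predicate in the query would presumably reduce to the form already used for $tn1, i.e. `$trees[DescriptorName/String contains text { $mesh }]`.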