;) Looks good! Thanks for the updated report,
Christian

On Tue, Feb 3, 2015 at 1:13 PM, Menashè Eliezer <melie...@ogs.trieste.it> wrote:
> Hi Christian,
>
> Thank you! The performance arrives to 0.5 sec!
>
> The biggest improvement is related to the query rephrasing you've suggested.
> Then the latest snapshot also helps a lot!
> You may want to know that in the log of the latest snapshot I see
>   applying attribute index for "7827"
> which is not clear to the user, instead of BaseX80-20150130.124009 which has
> also used indexing:
>   applying attribute index for ("ALKY", "AYMD")
>
> I'm attaching the first and the second launch of the query using BaseXGUI.
> Relaunching the same query reduces the time from over 1 second to 0.5
> second.
> Some data:
> BaseX80-20150130.124009
>   Total Time: 30676.02 ms
>   After using "for $x in
>   collection("ALL-CDIS")/gmd:MD_Metadata/gmd:identificationInfo/sdn:SDN_DataIdentification":
>   Total Time: 5456.74 ms
>   applying attribute index for ("ALKY", "AYMD") in log.
>   Second launch: 1333.71 ms
> Latest snapshot (BaseX80-20150202.121033):
>   1st: Total Time: 1873.02 ms
>   2nd: Total Time: 548.62 ms
>
> With kind regards,
> Menashè
>
>
> On 02/02/2015 02:02 PM, Menashè Eliezer wrote:
>>
>> Hi Christian,
>>
>> Thank you very much! Unfortunately I'll be at the office only tomorrow.
>>
>> Menashè
>>
>> On Sat, 31 Jan 2015 16:42:32 +0100, Christian Grün
>> <christian.gr...@gmail.com> wrote:
>>>
>>> Hi Menashè,
>>>
>>> With the latest snapshot [1], your original query should now be
>>> rewritten for index access as well. Looking forward to your tests,
>>>
>>> Christian
>>>
>>> PS: In terms of performance, it may still be worthwhile to move
>>> redundant paths to the for clause; but just try and see.
>>>
>>> [1] http://files.basex.org/releases/latest/
>>>
>>>
>>> On Fri, Jan 30, 2015 at 9:49 PM, Christian Grün
>>> <christian.gr...@gmail.com> wrote:
>>>>
>>>> Hi Menashè,
>>>>
>>>>> Should I expect to see the usage of an index for each of the where
>>>>> phrases?
>>>>
>>>> Usually, only one predicate will be rewritten for index access, and
>>>> the remaining conditions will be answered sequentially.
>>>>
>>>>> Have a nice weekend!
>>>>
>>>> Enjoy,
>>>> Christian
>>>>
>>>>
>>>>> Menashè
>>>>>
>>>>> On Fri, 30 Jan 2015 18:11:59 +0100, Christian Grün
>>>>> <christian.gr...@gmail.com> wrote:
>>>>>>
>>>>>> Hi Menashè,
>>>>>>
>>>>>> Thanks for the XML samples you sent me in private. I noticed that the
>>>>>> index rewritings will only be triggered if you formulate your query as
>>>>>> follows:
>>>>>>
>>>>>> OLD:
>>>>>>   for $x in collection("ALL-CDIS")
>>>>>>   where $x/gmd:MD_Metadata/gmd:identificationInfo/...
>>>>>>   return ...
>>>>>>
>>>>>> NEW:
>>>>>>   for $x in collection("ALL-CDIS")/gmd:MD_Metadata
>>>>>>   where $x/gmd:identificationInfo/...
>>>>>>   return ...
>>>>>>
>>>>>> It's difficult to explain in short sentences why Variant 1 cannot be
>>>>>> optimized that straightforwardly (basically, it's quite a different
>>>>>> pattern to look for), but I'll check out if we can extend our matcher
>>>>>> to also support these kinds of queries.
>>>>>>
>>>>>> So, if possible, I would recommend you for now (and at least for
>>>>>> testing) to move the root element test after the collection()
>>>>>> function. I noticed that the first three child steps are the same in
>>>>>> all of your conditions:
>>>>>>
>>>>>>   gmd:MD_Metadata/gmd:identificationInfo/sdn:SDN_DataIdentification
>>>>>>
>>>>>> If that will always be the case, it surely makes sense to move all
>>>>>> of them to the "for" clause.
>>>>>>
>>>>>> Looking forward to your updated performance tests,
>>>>>> Christian
>>>>>> _______________________________
>>>>>>
>>>>>> On Fri, Jan 30, 2015 at 5:55 PM, Christian Grün
>>>>>> <christian.gr...@gmail.com> wrote:
>>>>>>>
>>>>>>> Could you possibly provide me with a small snapshot of your data
>>>>>>> sources (one, two documents might be sufficient)?
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Jan 30, 2015 at 5:52 PM, Menashè Eliezer
>>>>>>> <melie...@ogs.trieste.it> wrote:
>>>>>>>>
>>>>>>>> Almost the same speed with version 8.0.
>>>>>>>> No indexing (no "applying" in the query info).
>>>>>>>> As I've attached before, indexes are active for this DB.
>>>>>>>>
>>>>>>>> With kind regards,
>>>>>>>> Menashè
>>>>>>>>
>>>>>>>>
>>>>>>>> On 01/30/2015 05:31 PM, Christian Grün wrote:
>>>>>>>>>
>>>>>>>>> It's indeed interesting that your query does not use any of the
>>>>>>>>> existing index structures (if they did, you would find strings like
>>>>>>>>> "applying text index" or "applying attribute index" in the query
>>>>>>>>> info). Maybe/hopefully things look different with Version 8.0.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Fri, Jan 30, 2015 at 5:26 PM, Menashè Eliezer
>>>>>>>>> <melie...@ogs.trieste.it> wrote:
>>>>>>>>>>
>>>>>>>>>> On 01/30/2015 05:18 PM, Christian Grün wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> /gmd:MD_Metadata/gmd:identificationInfo/sdn:SDN_DataIdentification/gmd:descriptiveKeywords[1]/gmd:MD_Keywords/gmd:keyword[2]/sdn:SDN_ParameterDiscoveryCode/@codeListValue
>>>>>>>>>>>>
>>>>>>>>>>>> How can I remove *?
>>>>>>>>>>>
>>>>>>>>>>> Simply remove the predicate; a[*]/b is the same as a/b.
>>>>>>>>>>
>>>>>>>>>> Maybe I wasn't clear. The actual number appears in the xml file, e.g.,
>>>>>>>>>> gmd:descriptiveKeywords[1]
>>>>>>>>>> Anyway, I've removed all [*] and I get the same correct result, however
>>>>>>>>>> the processing time is doubled...
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>> * In some cases, if you know that an element name is distinct, you can
>>>>>>>>>>>>> get rid of all the explicit child steps and directly address the node
>>>>>>>>>>>>> via the descendant axis.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks, but it's not relevant in my case.
>>>>>>>>>>>
>>>>>>>>>>> Is it because the element names are not distinct? Or is it because
>>>>>>>>>>> your input form allows users to choose arbitrary paths for arbitrary
>>>>>>>>>>> documents?
>>>>>>>>>>
>>>>>>>>>> The element names are not distinct.
>>>>>>>>>>
>>>>>>>>>>>> Sure, I'll also try BaseX 8.0 and compare. Should I recreate the db
>>>>>>>>>>>> importing the xml files for testing the improved indexing?
>>>>>>>>>>>
>>>>>>>>>>> We have actually improved support for collections, but the database
>>>>>>>>>>> format itself has not changed, so it shouldn't make a difference in
>>>>>>>>>>> your case.
>>>>>>>>>>>
>>>>>>>>>>> Christian
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>> [1] http://files.basex.org/releases/latest
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Fri, Jan 30, 2015 at 3:55 PM, Menashè Eliezer
>>>>>>>>>>>>> <melie...@ogs.trieste.it> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>> I wonder if the attached query can be optimised. I'm attaching all
>>>>>>>>>>>>>> relevant information.
>>>>>>>>>>>>>> BaseX 7.9, Debian, powerful server.
>>>>>>>>>>>>>> This is just an example. The queries will be built based on a
>>>>>>>>>>>>>> compilation of a search form.
>>>>>>>>>>>>>> Any help would be appreciated.
>>>>>>>>>>>>>> 40 seconds are not acceptable.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> With kind regards,
>>>>>>>>>>>>>> Menashè
>>>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> With kind regards,
>>>>>>>>>>>> Menashè
>>>>>>>>>>>>
>>>>>>>>>> With kind regards,
>>>>>>>>>> Menashè
>>>>>>>>>>
>>>>> --
>>>>> Menashè
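
For anyone reading this thread later: the rewrite discussed above (moving the shared leading steps into the "for" clause so that the remaining condition can be rewritten for index access) might look roughly like the sketch below. It is only an illustration, not the original query, which was sent as an attachment. Several parts are assumptions: the sdn namespace URI, the comparison against "ALKY" and "AYMD" (taken from the "applying attribute index" log line), and the return clause. The gmd URI is the usual ISO 19139 one; check both declarations against your own documents.

```xquery
(: Namespace URIs are assumptions; verify them against the stored documents. :)
declare namespace gmd = "http://www.isotc211.org/2005/gmd";
declare namespace sdn = "http://www.seadatanet.org";

(: The three child steps shared by all conditions are moved into the
   "for" clause, as recommended in the thread. :)
for $x in collection("ALL-CDIS")
          /gmd:MD_Metadata/gmd:identificationInfo/sdn:SDN_DataIdentification
(: Hypothetical condition: the remaining path steps come from the thread,
   the compared values are taken from the query info log. :)
where $x/gmd:descriptiveKeywords[1]/gmd:MD_Keywords/gmd:keyword[2]
        /sdn:SDN_ParameterDiscoveryCode/@codeListValue = ("ALKY", "AYMD")
(: Hypothetical result: the URI of each matching document. :)
return base-uri($x)
```

To see whether the optimizer picked up an index, look for a line such as "applying attribute index" in the query info, as described earlier in the thread. Keep in mind that usually only one predicate is rewritten for index access, so additional "where" conditions would be evaluated sequentially.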