Re: [basex-talk] Slow query

Christian Grün Tue, 03 Feb 2015 06:03:41 -0800

;) Looks good!

Thanks for the updated report,
Christian



On Tue, Feb 3, 2015 at 1:13 PM, Menashè Eliezer <melie...@ogs.trieste.it> wrote:
> Hi Christian,
>
> Thank you! The query time is now down to 0.5 sec!
>
> The biggest improvement is related to the query rephrasing you've suggested.
> Then the latest snapshot also helps a lot!
> You may want to know that the log of the latest snapshot shows
> applying attribute index for "7827"
> which is not clear to the user, whereas BaseX80-20150130.124009, which
> also used the index, showed:
> applying attribute index for ("ALKY", "AYMD")
>
> I'm attaching the first and the second launch of the query using BaseXGUI.
> Relaunching the same query reduces the time from over 1 second to 0.5
> seconds.
> Some data:
> BaseX80-20150130.124009
> Total Time: 30676.02 ms
> After using "for $x in
> collection("ALL-CDIS")/gmd:MD_Metadata/gmd:identificationInfo/sdn:SDN_DataIdentification":
> Total Time: 5456.74 ms
> applying attribute index for ("ALKY", "AYMD") in log.
> Second launch: 1333.71 ms
> Latest snapshot (BaseX80-20150202.121033):
> 1st: Total Time: 1873.02 ms
> 2nd: Total Time: 548.62 ms
>
> With kind regards,
> Menashè
>
>
> On 02/02/2015 02:02 PM, Menashè Eliezer wrote:
>>
>> Hi Christian,
>>
>> Thank you very much! Unfortunately I'll be at the office only tomorrow.
>>
>> Menashè
>>
>> On Sat, 31 Jan 2015 16:42:32 +0100, Christian Grün
>> <christian.gr...@gmail.com> wrote:
>>>
>>> Hi Menashè,
>>>
>>> With the latest snapshot [1], your original query should now be
>>> rewritten for index access as well. Looking forward to your tests,
>>>
>>> Christian
>>>
>>> PS: In terms of performance, it may still be worthwhile to move
>>> redundant paths to the for clause; but just try and see.
>>>
>>> [1] http://files.basex.org/releases/latest/
>>>
>>>
>>>
>>> On Fri, Jan 30, 2015 at 9:49 PM, Christian Grün
>>> <christian.gr...@gmail.com> wrote:
>>>>
>>>> Hi Menashè,
>>>>
>>>>> Should I expect to see the usage of an index for each of the where
>>>>> phrases?
>>>>
>>>> Usually, only one predicate will be rewritten for index access, and
>>>> the remaining conditions will be answered sequentially.
>>>>
>>>>> Have a nice weekend!
>>>>
>>>> Enjoy,
>>>> Christian
>>>>
>>>>
>>>>> Menashè
>>>>>
>>>>> On Fri, 30 Jan 2015 18:11:59 +0100, Christian Grün
>>>>> <christian.gr...@gmail.com> wrote:
>>>>>>
>>>>>> Hi Menashè,
>>>>>>
>>>>>> Thanks for the XML samples you sent me in private. I noticed that the
>>>>>> index rewritings will only be triggered if you formulate your query as
>>>>>> follows:
>>>>>>
>>>>>> OLD:
>>>>>>    for $x in collection("ALL-CDIS")
>>>>>>    where $x/gmd:MD_Metadata/gmd:identificationInfo/...
>>>>>>    return ...
>>>>>>
>>>>>> NEW:
>>>>>>    for $x in collection("ALL-CDIS")/gmd:MD_Metadata
>>>>>>    where $x/gmd:identificationInfo/...
>>>>>>    return ...
>>>>>>
>>>>>> It's difficult to explain in short sentences why the OLD variant cannot
>>>>>> be optimized that straightforwardly (basically, it's quite a different
>>>>>> pattern to look for), but I'll check out if we can extend our matcher
>>>>>> to also support this kind of query.
>>>>>>
>>>>>> So, if possible, I would recommend that, for now (and at least for
>>>>>> testing), you move the root element test after the collection()
>>>>>> function. I noticed that the first three child steps are the same in
>>>>>> all of your conditions:
>>>>>>
>>>>>>    gmd:MD_Metadata/gmd:identificationInfo/sdn:SDN_DataIdentification
>>>>>>
>>>>>> If that will always be the case, it surely makes sense to move all
>>>>>> of them to the "for" clause.
>>>>>>
>>>>>> Looking forward to your updated performance tests,
>>>>>> Christian
>>>>>> _______________________________
>>>>>>
>>>>>> On Fri, Jan 30, 2015 at 5:55 PM, Christian Grün
>>>>>> <christian.gr...@gmail.com> wrote:
>>>>>>>
>>>>>>> Could you possibly provide me with a small snapshot of your data
>>>>>>> sources (one or two documents might be sufficient)?
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Jan 30, 2015 at 5:52 PM, Menashè Eliezer
>>>>>>> <melie...@ogs.trieste.it> wrote:
>>>>>>>>
>>>>>>>> Almost the same speed with version 8.0.
>>>>>>>> No indexing (no "applying" in the query info).
>>>>>>>> As I've attached before, indexes are active for this DB.
>>>>>>>>
>>>>>>>> With kind regards,
>>>>>>>> Menashè
>>>>>>>>
>>>>>>>>
>>>>>>>> On 01/30/2015 05:31 PM, Christian Grün wrote:
>>>>>>>>>
>>>>>>>>> It's indeed interesting that your query does not use any of the
>>>>>>>>> existing index structures (if it did, you would find strings like
>>>>>>>>> "applying text index" or "applying attribute index" in the query
>>>>>>>>> info). Maybe/hopefully things look different with Version 8.0.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Fri, Jan 30, 2015 at 5:26 PM, Menashè Eliezer
>>>>>>>>> <melie...@ogs.trieste.it> wrote:
>>>>>>>>>>
>>>>>>>>>> On 01/30/2015 05:18 PM, Christian Grün wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> /gmd:MD_Metadata/gmd:identificationInfo/sdn:SDN_DataIdentification/gmd:descriptiveKeywords[1]/gmd:MD_Keywords/gmd:keyword[2]/sdn:SDN_ParameterDiscoveryCode/@codeListValue
>>>>>>>>>>>>
>>>>>>>>>>>> How can I remove *?
>>>>>>>>>>>
>>>>>>>>>>> Simply remove the predicate; a[*]/b is the same as a/b.
>>>>>>>>>>
>>>>>>>>>> Maybe I wasn't clear. The actual number appears in the xml file, e.g.,
>>>>>>>>>> gmd:descriptiveKeywords[1]
>>>>>>>>>> Anyway, I've removed all [*] and I get the same correct result, however
>>>>>>>>>> the processing time is doubled...
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>> * In some cases, if you know that an element name is distinct, you can
>>>>>>>>>>>>> get rid of all the explicit child steps and directly address the node
>>>>>>>>>>>>> via the descendant axis.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks, but it's not relevant in my case.
>>>>>>>>>>>
>>>>>>>>>>> Is it because the element names are not distinct? Or is it because
>>>>>>>>>>> your input form allows users to choose arbitrary paths for arbitrary
>>>>>>>>>>> documents?
>>>>>>>>>>
>>>>>>>>>> The element names are not distinct.
>>>>>>>>>>
>>>>>>>>>>>> Sure, I'll also try BaseX 8.0 and compare. Should I recreate the db,
>>>>>>>>>>>> importing the xml files, for testing the improved indexing?
>>>>>>>>>>>
>>>>>>>>>>> We have actually improved support for collections, but the database
>>>>>>>>>>> format itself has not changed, so it shouldn't make a difference in
>>>>>>>>>>> your case.
>>>>>>>>>>>
>>>>>>>>>>> Christian
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>> [1] http://files.basex.org/releases/latest
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Fri, Jan 30, 2015 at 3:55 PM, Menashè Eliezer
>>>>>>>>>>>>> <melie...@ogs.trieste.it> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>> I wonder if the attached query can be optimised. I'm attaching all
>>>>>>>>>>>>>> relevant information.
>>>>>>>>>>>>>> BaseX 7.9, Debian, powerful server.
>>>>>>>>>>>>>> This is just an example. The queries will be built based on a
>>>>>>>>>>>>>> compilation of a search form.
>>>>>>>>>>>>>> Any help would be appreciated.
>>>>>>>>>>>>>> 40 seconds are not acceptable.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> With kind regards,
>>>>>>>>>>>>>> Menashè
>>>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> With kind regards,
>>>>>>>>>>>> Menashè
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>> With kind regards,
>>>>>>>>>> Menashè
>>>>>>>>>>
>>>>> --
>>>>> Menashè
>
>
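The rewrite that resolved this thread can be sketched roughly as follows. This is only a minimal illustration, not the original query (which was sent as an attachment): the namespace URIs, the keyword path used in the where clause, and the base-uri() return value are assumptions, pieced together from the path and the index values ("ALKY", "AYMD") quoted in the messages above.

```xquery
(: Namespace URIs are assumed; use the ones declared in your documents. :)
declare namespace gmd = "http://www.isotc211.org/2005/gmd";
declare namespace sdn = "http://www.seadatanet.org";

(: The three shared child steps live in the "for" clause, so the
   "where" clause starts at SDN_DataIdentification. :)
for $x in collection("ALL-CDIS")
  /gmd:MD_Metadata/gmd:identificationInfo/sdn:SDN_DataIdentification
(: Hypothetical condition; the general comparison matches any listed value. :)
where $x/gmd:descriptiveKeywords/gmd:MD_Keywords/gmd:keyword
        /sdn:SDN_ParameterDiscoveryCode/@codeListValue = ("ALKY", "AYMD")
return base-uri($x)
```

If the rewrite is applied, the query info should contain a line such as "applying attribute index". As noted in the thread, usually only one predicate is rewritten for index access, and any remaining conditions are evaluated sequentially.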
