Re: [basex-talk] Text index on uncommon node

Matthew Dziuban Mon, 30 Oct 2023 14:58:19 -0700

Hi Christian,

Good thinking! Indeed, 'false' appears 7.25 million times in the database.
I'll look into selective indexing, thanks for the link, and thanks again
for the help -- this gives me a much better idea of how the text index
works in general.


Best,
Matt

On Mon, Oct 30, 2023 at 5:23 PM Christian Grün <christian.gr...@gmail.com>
wrote:

> Hi Matt,
>
> I assume the culprit is the common string you're looking up. It probably
> occurs very often in your database. You can e.g. verify this via
> index:texts('MyDatabase', 'false') or count(db:get('MyDatabase')//text()[.
> = 'false']).
>
> If you don't need to perform exact queries on arbitrary elements, you
> could think about restricting the text index to specific element names to
> reduce the number of intermediate hits [1].
>
> Hope this helps,
> Christian
>
> [1] https://docs.basex.org/wiki/Indexes#Selective_Indexing
>
>
> Matthew Dziuban <mrdziu...@gmail.com> schrieb am Mo., 30. Okt. 2023,
> 22:09:
>
>> Hi Christian,
>>
>> Thanks for the quick response, and sure thing. It does look like the text
>> index is applied in both cases. While I was writing up an example of the
>> slow query, I realized that the way I'll actually be querying it is by
>> wrapping the condition in not(...). After doing so, it now only takes 6
>> seconds to run -- still slower but better. The slow query looks like this:
>>
>> for $x in db:open('MyDatabase')/data/element
>> where not($x/child1/child2/valid = 'false')
>> return $x
>>
>> And the fast query looks like this:
>>
>> for $x in db:open('MyDatabase')/data/element
>> where $x/child3/child4/id = '123'
>> return $x
>>
>> Let me know if this is helpful -- if not, I could share more info in a
>> direct email about the actual structure of the database and the Info output.
>>
>> Thanks,
>> Matt
>>
>> On Mon, Oct 30, 2023 at 4:40 PM Christian Grün <christian.gr...@gmail.com>
>> wrote:
>>
>>> Hi Matt,
>>>
>>> In general, all nodes are treated identically, no matter what the
>>> hierarchy is or regular the target path is.
>>>
>>> Could you share some more information with us? How do the queries look
>>> like (the slow and the fast one)? Is the text index applied in both cases?
>>>
>>> Thanks in advance,
>>> Christian
>>>
>>>
>>>
>>> Matthew Dziuban <mrdziu...@gmail.com> schrieb am Mo., 30. Okt. 2023,
>>> 21:33:
>>>
>>>> Hi all,
>>>>
>>>> I'm working with a database structured like so:
>>>>
>>>> <data>
>>>>   <element>...</element>
>>>>   <element>...</element>
>>>>   <element>...</element>
>>>> </data>
>>>>
>>>> There are a total of about 1.5 million <element> nodes in the database.
>>>> Each <element> has many child nodes, one of which is uncommon -- it only
>>>> appears in 727 <element>s.
>>>>
>>>> I'm writing a query that has a condition on this uncommon field, but
>>>> the query takes about 20 seconds to run, whereas another with a condition
>>>> on a child node that appears in every <element> only takes about 20
>>>> milliseconds to run.
>>>>
>>>> Based on the Info in the GUI, it does appear that the text index is
>>>> being used -- I see 'apply text index for "..."'. Is it expected that the
>>>> query time would be this much longer? Is the text index somehow built
>>>> differently for nodes that don't appear often in the database?
>>>>
>>>> Thanks in advance,
>>>> Matt
>>>>
>>>
>>
>> --
>> Matthew R. Dziuban
>> mattdziuban.com
>> 703-973-6717
>> mrdziu...@gmail.com
>>
>

-- 
Matthew R. Dziuban
mattdziuban.com
703-973-6717
mrdziu...@gmail.com

Re: [basex-talk] Text index on uncommon node

Reply via email to