Hi Christian, Good thinking! Indeed, 'false' appears 7.25 million times in the database. I'll look into selective indexing, thanks for the link, and thanks again for the help -- this gives me a much better idea of how the text index works in general.
Best, Matt On Mon, Oct 30, 2023 at 5:23 PM Christian Grün <christian.gr...@gmail.com> wrote: > Hi Matt, > > I assume the culprit is the common string you're looking up. It probably > occurs very often in your database. You can e.g. verify this via > index:texts('MyDatabase', 'false') or count(db:get('MyDatabase')//text()[. > = 'false']). > > If you don't need to perform exact queries on arbitrary elements, you > could think about restricting the text index to specific element names to > reduce the number of intermediate hits [1]. > > Hope this helps, > Christian > > [1] https://docs.basex.org/wiki/Indexes#Selective_Indexing > > > Matthew Dziuban <mrdziu...@gmail.com> schrieb am Mo., 30. Okt. 2023, > 22:09: > >> Hi Christian, >> >> Thanks for the quick response, and sure thing. It does look like the text >> index is applied in both cases. While I was writing up an example of the >> slow query, I realized that the way I'll actually be querying it is by >> wrapping the condition in not(...). After doing so, it now only takes 6 >> seconds to run -- still slower but better. The slow query looks like this: >> >> for $x in db:open('MyDatabase')/data/element >> where not($x/child1/child2/valid = 'false') >> return $x >> >> And the fast query looks like this: >> >> for $x in db:open('MyDatabase')/data/element >> where $x/child3/child4/id = '123' >> return $x >> >> Let me know if this is helpful -- if not, I could share more info in a >> direct email about the actual structure of the database and the Info output. >> >> Thanks, >> Matt >> >> On Mon, Oct 30, 2023 at 4:40 PM Christian Grün <christian.gr...@gmail.com> >> wrote: >> >>> Hi Matt, >>> >>> In general, all nodes are treated identically, no matter what the >>> hierarchy is or regular the target path is. >>> >>> Could you share some more information with us? How do the queries look >>> like (the slow and the fast one)? Is the text index applied in both cases? >>> >>> Thanks in advance, >>> Christian >>> >>> >>> >>> Matthew Dziuban <mrdziu...@gmail.com> schrieb am Mo., 30. Okt. 2023, >>> 21:33: >>> >>>> Hi all, >>>> >>>> I'm working with a database structured like so: >>>> >>>> <data> >>>> <element>...</element> >>>> <element>...</element> >>>> <element>...</element> >>>> </data> >>>> >>>> There are a total of about 1.5 million <element> nodes in the database. >>>> Each <element> has many child nodes, one of which is uncommon -- it only >>>> appears in 727 <element>s. >>>> >>>> I'm writing a query that has a condition on this uncommon field, but >>>> the query takes about 20 seconds to run, whereas another with a condition >>>> on a child node that appears in every <element> only takes about 20 >>>> milliseconds to run. >>>> >>>> Based on the Info in the GUI, it does appear that the text index is >>>> being used -- I see 'apply text index for "..."'. Is it expected that the >>>> query time would be this much longer? Is the text index somehow built >>>> differently for nodes that don't appear often in the database? >>>> >>>> Thanks in advance, >>>> Matt >>>> >>> >> >> -- >> Matthew R. Dziuban >> mattdziuban.com >> 703-973-6717 >> mrdziu...@gmail.com >> > -- Matthew R. Dziuban mattdziuban.com 703-973-6717 mrdziu...@gmail.com