Hi, Christian!
 
Thank you again for your help!!
 
Yes, I have the full-text index enabled.
 
I have tried something like this:
let $dbs := for $i in db:list()[starts-with(.,'000999~')] return $i
for $db in $dbs
for $doc in db:open($db)/.//*[(# db:enforceindex #) { text() contains text { 'TEN-9258' } any }]
return $doc
 
No effect.
 
Then I have tried:
let $dbs := for $i in db:list()[starts-with(.,'000999~')] return $i
for $db in $dbs
let $ft := ft:search($db, "TEN-9258")/parent::*
for $node in $ft
return root($node)
 
And it works like a charm =)
 
But I don't understand why the first way doesn't work...
 
 
20.07.2018, 11:31, "Christian Grün" <christian.gr...@gmail.com>:

Hi Vladimir,
 

 But if I search in "db:list...db:open..." - it takes about 12-15 seconds.


If the name of the database is not statically known, the query cannot
be rewritten for index access (because one of the targeted databases may
not have the required index). I guess you have the full-text index
enabled?

However, since BaseX 9, you can take advantage of the ENFORCEINDEX
option: all queries will then be optimized for index operations, based on
your knowledge that an index will be available. See [1] for
further details.
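
For illustration, a minimal sketch of what this could look like with the
database prefix and search term from your examples (the option can also be
set locally via the db:enforceindex pragma; all matching databases are
assumed to have an up-to-date full-text index):

(# db:enforceindex #) {
  (: every database addressed here is assumed to have the full-text index :)
  for $db in db:list()[starts-with(., '000999~')]
  return db:open($db)//*[text() contains text { 'TEN-9258' } any]
}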

By the way, you can have a look at the compilation section of the Info
View in the GUI to see if indexes will be applied in your query.

Best,
Christian

[1] http://docs.basex.org/wiki/Indexes#Enforce_Rewritings

 

 Example takes ~12-15s:
 let $db := for $i in db:list()[starts-with(.,'000999~')] return try {db:open($i)} catch * {}
 for $doc in $db/.//*[text() contains text { 'TEN-9258' } any]
 return $doc

 Example takes ~180ms (returns 2 rows):
 let $db := for $i in db:list()[starts-with(.,'000999~201807')] return db:open($i)
 for $doc in $db/.//*[text() contains text { 'TEN-9258' } any]
 return $doc

 Example takes ~10ms (returns 2 rows):
 for $doc in db:open('000999~201807')/.//*[text() contains text { 'TEN-9258' } any]
 return $doc

 Why do the last 2 examples take different times?
 How can I improve this?

 Example takes ~2s (returns 0 rows):
 let $db := for $i in db:list()[starts-with(.,'000999~201806')] return db:open($i)
 for $doc in $db/.//*[text() contains text { 'TEN-9258' } any]
 return $doc

 Example takes ~12ms (returns 0 rows):
 for $doc in db:open('000999~201806')/.//*[text() contains text { 'TEN-9258' } any]
 return $doc


 25.06.2018, 13:07, "Alexander Shpack" <shadow...@gmail.com>:

 Hi, Vladimir,

 If you give your database names a particular prefix, for example "db_", you can use the following code:

 let $docs := for $i in db:list()[starts-with(., "db_")] return db:open($i)
 return $docs/*



 On Mon, Jun 25, 2018 at 12:32 PM Ветошкин Владимир <en-tra...@yandex.ru> wrote:

 Hi, Alexander,

 Some questions:
 After that, how can I perform a search across all of these databases?
 Can I search for a substring using only the text index, without full-text?

 25.06.2018, 11:56, "Alexander Shpack" <shadow...@gmail.com>:

 Hey Vladimir,

 You can use a sharding approach for your data import and split the data into separate DBs, e.g. one per month.
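
 For example (just a sketch; the prefix 'docs~' and the file name are placeholders), the name of the monthly shard can be derived from the current date at import time:

 (: the shard database is assumed to exist already; otherwise create it first with db:create :)
 let $shard := 'docs~' || format-date(current-date(), '[Y0001][M01]')
 return db:add($shard, doc('import.xml'), 'import.xml')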



 On Mon, Jun 25, 2018 at 11:50 AM Ветошкин Владимир <en-tra...@yandex.ru> wrote:

 Hi, Alexander!
 Thank you!

 In my previous letter I described the process in short.
 I'll think about a separate DB. But I'm afraid that this database will also become very big in the future.
 Although I could try to split the data into several databases - one per year... Hmm...

 25.06.2018, 11:25, "Alexander Shpack" <shadow...@gmail.com>:

 Hey, Vladimir!

 Just put these specific files into a separate DB and then index it.
  You can do this automatically: BaseX allows you to create and index a DB right from XQuery.
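
 A rough sketch of that (the database name and input file are only placeholders):

 (: creates the database and builds its full-text index in one go :)
 db:create('tickets-201806', doc('tickets.xml'), 'tickets.xml',
   map { 'ftindex': true() })

 The same options can later be passed to db:optimize if the indexes need to be rebuilt.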

 I hope this helps. Anyhow, you can provide more details about your task and we can figure out the best solution for you.



 On Mon, Jun 25, 2018 at 10:42 AM Ветошкин Владимир <en-tra...@yandex.ru> wrote:

 Hi, Fabrice!
 Thank you.

 All databases change constantly. That is why there is no way to single out "a big readonly collection" :(
 Maybe it is possible to use some other incremental indexes?
 I have to index specific xml-files, not all files in the database.

 21.06.2018, 17:16, "Fabrice ETANCHAUD" <fetanch...@pch.cerfrance.fr>:

 Hi Vladimir,



 I don’t think there is something like an incremental full-text index at the moment [1].

 As the index is per collection, the recommended approach would be to split your data into two collections:

 - A big read-only collection of all the past updates, indexed once

 - A small/medium-sized collection whose full-text index can be recreated in an acceptable time after each update.

 At the end of a predefined time period, you have to add the live collection to the read-only one, reindex it, and truncate the live one.
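
 A rough sketch of that rollover in XQuery ('archive' and 'live' are placeholder database names, and the two steps could also be run as separate queries):

 ( (: 1. append all documents of the live collection to the read-only one :)
   for $doc in db:open('live')
   return db:add('archive', $doc, db:path($doc)),
   (: 2. rebuild the archive's indexes, full-text index included :)
   db:optimize('archive', true(), map { 'ftindex': true() }) )

 Truncating the live collection afterwards (e.g. dropping and re-creating that database) would complete the cycle.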



 Best regards from France,

 Fabrice Etanchaud



 [1] http://docs.basex.org/wiki/Indexes#Updates









 From: BaseX-Talk [mailto:basex-talk-boun...@mailman.uni-konstanz.de] On behalf of Ветошкин Владимир
 Sent: Thursday, 21 June 2018 16:02
 To: BaseX
 Subject: [basex-talk] Full-Text



 Hi, everyone!



 Is there any way to index only the imported xml-files?

 Now, when I import xml-files, the full-text index is deleted.

 After importing, I recreate the whole full-text index, and it takes too much time :(



 --

 Best regards,

 Ветошкин Владимир Владимирович





-- 
Best regards,
Ветошкин Владимир Владимирович
 
