Hello Martin,

I would like to add that Christian just implemented selective indexes,
so if you want to index in a more granular fashion, this should now be
possible. See https://github.com/BaseXdb/basex/issues/59 for more
details. Of course, this is not stable software yet, so use it with
care. But we are always happy about feedback.
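
To illustrate (a rough, untested sketch; I am assuming the option names
from the current implementation, which may still change while the
feature stabilizes), you could restrict the value indexes to selected
element and attribute names before creating a database:

  SET TEXTINCLUDE filename
  SET ATTRINCLUDE id
  CREATE DB mydb /path/to/docs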

Regarding the initial issue: I do think using subfolders in the database
should be the easiest and fastest way. Is there any reason not to use
distinct directories instead of naming the files using some pattern?
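
A quick sketch of what I mean (untested, with made-up database and path
names): store each document under a common directory prefix,

  db:add('mydb', 'file.xml', 'group42/file.xml')

and later fetch the whole group in one go:

  db:open('mydb', 'group42/')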

Cheers
Dirk

On 08/31/2015 06:35 PM, Martín Ferrari wrote:
> Hi Mansi,
>      I have a similar situation. I don't think there's a fast way to
> get documents by knowing only a part of their names; it seems you
> need to know the exact name. In my case, we might be able to group
> documents by a common id, so we might create subfolders inside the DB
> and store/get the contents of the subfolder directly, which is pretty
> fast.
>      I've also tried indexing, but insertions got really slow (I
> assume because indexing is not granular: it indexes all values), and
> we need performance.
>
>      Oh, I've also tried using starts-with() instead of contains(),
> but it seems it does not pick up indexes.
>
> Martín.
>
> ------------------------------------------------------------------------
> Date: Fri, 28 Aug 2015 16:52:37 -0400
> From: mansi.sh...@gmail.com
> To: basex-talk@mailman.uni-konstanz.de
> Subject: [basex-talk] Finding document based on filename
>
> Hello, 
>
> I will have hundreds of databases, with each database holding 100 XML
> documents. I want to devise an algorithm where, given a part of an
> XML file name, I can find out which database(s) contain it, or get an
> empty result if the document is not present in any database. Based on
> that, I add the current document to the database. The goal is to
> always keep the latest version of a document in the DB, removing the
> older version when adding the newer one.
>
> So far, the only way I could come up with is something like:
>
> let $part := 'sub-xml-file-name'
> for $db in db:list()
> where some $path in db:list($db) satisfies contains($path, $part)
> return $db
>
> The above algorithm seems highly inefficient. Is there any indexing
> that could help? Or do you suggest that, for each document insert, I
> maintain a separate XML document which lists each inserted file, etc.?
>
> Once I get hold of the above list of databases, I would eventually
> delete that file and insert the latest version of it (which would
> have the same sub-xml file name). So constantly updating this
> external document also seems painful (maybe?).
>
> Also, would it be faster to run XQuery script files through Java
> code, or to use the Java API for such operations?
>
> How do you all deal with such operations ?
>
> - Mansi

-- 
Dirk Kirsten, BaseX GmbH, http://basexgmbh.de
|-- Firmensitz: Blarerstrasse 56, 78462 Konstanz
|-- Registergericht Freiburg, HRB: 708285, Geschäftsführer:
|   Dr. Christian Grün, Dr. Alexander Holupirek, Michael Seiferle
`-- Phone: 0049 7531 28 28 676, Fax: 0049 7531 20 05 22
