Thanks, guys, for all the expert comments. Currently, I am experimenting with the performance of just deleting and inserting using the Java API. I figured that if this process takes a tiny bit longer, I don't really care :) If it becomes unacceptable, I will use one of these suggestions.
Thanks once again.

    StringList databases = List.list(context);
    String query = "";
    for (String database : databases) {
        // List all resources in this database.
        query = "db:list('" + database + "')";
        try {
            for (String fileName : query(query).split(" ")) {
                query = "db:delete('" + database + "','" + fileName + "')";
                // Delete the first resource whose name contains the key part.
                if (fileName.contains(XMLFileName.split("_")[1])) {
                    query(query);
                    logger.info("Deleted " + fileName + " from " + database);
                    retVal = true;
                    break;
                }
            }
        } catch (BaseXException e) {
            e.printStackTrace();
        }
    }

On Mon, Aug 31, 2015 at 9:45 PM, Martín Ferrari <ferrari_mar...@hotmail.com> wrote:

> I forgot one thing: I got much better performance by just calling
> replace rather than delete and insert, but this is a db with more than
> one million records. If performance is not important, I believe either
> way will do.
>
> Martín.
>
> ------------------------------
> From: ferrari_mar...@hotmail.com
> To: mansi.sh...@gmail.com; basex-talk@mailman.uni-konstanz.de
> Date: Mon, 31 Aug 2015 16:35:33 +0000
> Subject: Re: [basex-talk] Finding document based on filename
>
> Hi Mansi,
> I have a similar situation. I don't think there's a fast way to get
> documents by only knowing a part of their names. It seems you need to
> know the exact name. In my case, we might be able to group documents by
> a common id, so we might create subfolders inside the DB and store/get
> the contents of the subfolder directly, which is pretty fast.
> I've also tried indexing, but insertions got really slow (I assume
> because indexing is not granular, it indexes all values) and we
> need performance.
>
> Oh, I've also tried using starts-with() instead of contains(), but it
> seems it does not pick up indexes.
>
> Martín.
>
> ------------------------------
> Date: Fri, 28 Aug 2015 16:52:37 -0400
> From: mansi.sh...@gmail.com
> To: basex-talk@mailman.uni-konstanz.de
> Subject: [basex-talk] Finding document based on filename
>
> Hello,
>
> I will have hundreds of databases, with each database holding 100 XML
> documents. I want to devise an algorithm where, given a part of an XML
> file name, I can find out which database(s) contain it, or get null if
> the document is not currently present in any database. Based on that,
> I add the current document to the database. This is to always keep the
> latest version of a document in the DB, removing the older version
> while adding the newer one.
>
> So far, the only way I could come up with is:
>
> for $db in all-databases:
>     open $db
>     $fileNames = list $db
>     for $eachFileName in $fileNames:
>         if $eachFileName.contains(sub-xml filename):
>             add to ret-list-db
>
> return ret-list-db
>
> The above algorithm seems highly inefficient. Is there any indexing
> that can be done? Do you suggest that, for each document insert, I
> maintain a separate XML document which lists each file inserted, etc.?
>
> Once I get hold of the above list of databases, I would eventually
> delete that file and insert the latest version of it (which would have
> the same sub-xml file name). So constantly updating this external
> document also seems painful (a Map, maybe?).
>
> Also, would it be faster to use XQuery script files through Java code,
> or to use the Java API for such operations?
>
> How do you all deal with such operations?
>
> - Mansi

--
- Mansi
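
For illustration, here is a minimal sketch of the "separate list of inserted files" idea raised in the original question, kept as an in-memory Java map rather than an XML document. It is only a sketch under stated assumptions: the class DocumentLocator, its methods, and the file names in the usage note are hypothetical, and the key convention (the second "_"-separated token) is borrowed from the XMLFileName.split("_")[1] call in the reply above. It does not talk to BaseX at all; the caller would still have to run the actual delete and add against the database it names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical in-memory index: file-name key -> where the current version is stored. */
class DocumentLocator {

    /** Database name plus the full file name stored in it. */
    static final class Location {
        final String database;
        final String fileName;

        Location(String database, String fileName) {
            this.database = database;
            this.fileName = fileName;
        }
    }

    private final Map<String, Location> byKey = new HashMap<>();

    /** Assumed convention: the key is the second "_"-separated token of the file name. */
    static String keyOf(String fileName) {
        String[] parts = fileName.split("_");
        if (parts.length < 2) {
            throw new IllegalArgumentException("No '_'-separated key in: " + fileName);
        }
        return parts[1];
    }

    /** Returns where a document sharing this file's key is currently recorded, if anywhere. */
    Optional<Location> find(String fileName) {
        return Optional.ofNullable(byKey.get(keyOf(fileName)));
    }

    /**
     * Records the new version and returns the location of the version it supersedes,
     * so the caller knows which database and file to delete.
     */
    Optional<Location> put(String database, String fileName) {
        Location previous = byKey.put(keyOf(fileName), new Location(database, fileName));
        return Optional.ofNullable(previous);
    }
}
```

With this, put("db2", "report_12345_v2.xml") after an earlier put("db1", "report_12345_v1.xml") would hand back the db1 location in a single map lookup instead of a scan over every database. Note that this matches on the exact key rather than with contains(), and the map would have to be rebuilt (for example from the database listings) whenever the process restarts, so it trades the scan for the bookkeeping Mansi was worried about.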