Re: [basex-talk] Finding document based on filename

Mansi Sheth Tue, 01 Sep 2015 14:07:43 -0700

Thanks, guys, for all the expert comments. Currently, I am experimenting
with the performance of just deleting and inserting using the Java API. If
this process takes a tiny bit longer, I don't really care, is what I figured :)
If it becomes unacceptable, I will use one of these suggestions.


Thanks once again.

// Key to look for: the second "_"-separated token of the new file's name.
String key = XMLFileName.split("_")[1];
StringList databases = List.list(context);

for (String database : databases) {
  try {
    // query(...) is a helper defined elsewhere; its result is split on spaces.
    for (String fileName : query("db:list('" + database + "')").split(" ")) {
      if (fileName.contains(key)) {
        query("db:delete('" + database + "','" + fileName + "')");
        logger.info("Deleted " + fileName + " from " + database);
        retVal = true;
        break; // stop after the first match in this database
      }
    }
  } catch (BaseXException e) {
    e.printStackTrace();
  }
}
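
For anyone who wants to try the matching step without a BaseX server, here is a minimal sketch of the same logic in plain Java. It assumes the per-database document lists have already been fetched (for example with db:list) into a map. The class name DocumentLocator, the method names, and the sample file names such as scan_1002_v1.xml are hypothetical and only for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper; not part of the BaseX API.
final class DocumentLocator {

  /** Returns the second "_"-separated token of a file name, as the loop above does. */
  static String nameKey(String fileName) {
    String[] parts = fileName.split("_");
    if (parts.length < 2) {
      throw new IllegalArgumentException("No '_'-separated key in: " + fileName);
    }
    return parts[1];
  }

  /** Maps each database to the first document path in it that contains the key. */
  static Map<String, String> findMatches(Map<String, List<String>> docsByDb, String key) {
    Map<String, String> matches = new LinkedHashMap<>();
    for (Map.Entry<String, List<String>> entry : docsByDb.entrySet()) {
      for (String path : entry.getValue()) {
        if (path.contains(key)) {
          matches.put(entry.getKey(), path);
          break; // first match per database, like the loop above
        }
      }
    }
    return matches;
  }

  public static void main(String[] args) {
    // Assumed sample data; in practice these lists would come from the databases.
    Map<String, List<String>> docsByDb = new LinkedHashMap<>();
    docsByDb.put("db1", Arrays.asList("scan_1001_v1.xml", "scan_1002_v1.xml"));
    docsByDb.put("db2", Arrays.asList("scan_2001_v1.xml"));

    String key = nameKey("scan_1002_v2.xml");
    System.out.println(findMatches(docsByDb, key)); // prints {db1=scan_1002_v1.xml}
  }
}
```

Each returned entry names a database and the stale document to delete before the new version is added. The cost is still a scan over every document name, so this only separates the logic from the I/O; it does not make the lookup faster.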

On Mon, Aug 31, 2015 at 9:45 PM, Martín Ferrari <ferrari_mar...@hotmail.com>
wrote:

>     I forgot one thing, I got much better performance by just calling
> replace rather than delete and insert, but this is a db with more than one
> million records. If performance is not important, I believe either way will
> do.
>
> Martín.
>
> ------------------------------
> From: ferrari_mar...@hotmail.com
> To: mansi.sh...@gmail.com; basex-talk@mailman.uni-konstanz.de
> Date: Mon, 31 Aug 2015 16:35:33 +0000
> Subject: Re: [basex-talk] Finding document based on filename
>
>
> Hi Mansi,
>      I have a similar situation. I don't think there's a fast way to get
> documents by only knowing a part of their names. It seems you need to know
> the exact name. In my case, we might be able to group documents by a common
> id, so we might create subfolders inside the DB and store/get the contents
> of the subfolder directly, which is pretty fast.
>      I've also tried indexing, but insertions got really slow (I assume
> maybe because indexing is not granular, it indexes all values) and we
> need performance.
>
>      Oh, I've also tried using starts-with() instead of contains(), but it
> seems it does not pick up indexes.
>
> Martín.
>
> ------------------------------
> Date: Fri, 28 Aug 2015 16:52:37 -0400
> From: mansi.sh...@gmail.com
> To: basex-talk@mailman.uni-konstanz.de
> Subject: [basex-talk] Finding document based on filename
>
> Hello,
>
> I will have hundreds of databases, each holding 100 XML documents.
> Given part of an XML file name, I want to find out which database(s)
> contain it, or get null if the document is not currently present in any
> database. Based on that, I add the current document to the database.
> The goal is to always keep the latest version of a document in the DB,
> removing the older version when the newer one is added.
>
> So far, the only way I could come up with is:
>
> for $db in all-databases:
>       open $db
>       $fileNames = list $db
>       for $fileName in $fileNames:
>             if $fileName.contains(sub-xml filename):
>                   add $db to ret-list-db
>
> return ret-list-db
>
> The above algorithm seems highly inefficient. Is there any indexing that
> can be done? Or do you suggest that, for each document insert, I maintain
> a separate XML document that lists each file inserted?
>
> Once I have the above list of DBs, I would eventually delete that
> file and insert the latest version of it (which would have the same
> sub-XML file name). So constantly updating this external document also
> seems painful (maybe?).
>
> Also, which would be faster for such operations: running XQuery script
> files through Java code, or using the Java API?
>
> How do you all deal with such operations?
>
> - Mansi
>
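
The original question also raised the idea of keeping a separate record of every inserted file. A minimal in-memory sketch of that idea follows, assuming the name key uniquely identifies a document across all databases. DocumentIndex, Location, register and lookup are hypothetical names, and persisting the index (for example as its own XML document) is left out.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical side index; not part of the BaseX API.
final class DocumentIndex {

  /** Where one document currently lives. */
  static final class Location {
    final String database;
    final String path;

    Location(String database, String path) {
      this.database = database;
      this.path = path;
    }
  }

  private final Map<String, Location> byKey = new HashMap<>();

  /** Records a new version and returns the previous location, if there was one. */
  Optional<Location> register(String key, String database, String path) {
    return Optional.ofNullable(byKey.put(key, new Location(database, path)));
  }

  /** Looks up the current location for a key without scanning any database. */
  Optional<Location> lookup(String key) {
    return Optional.ofNullable(byKey.get(key));
  }

  public static void main(String[] args) {
    DocumentIndex index = new DocumentIndex();
    index.register("1002", "db1", "scan_1002_v1.xml");
    // Registering a newer version reports the old location to delete.
    index.register("1002", "db1", "scan_1002_v2.xml")
        .ifPresent(old -> System.out.println("delete " + old.path + " from " + old.database));
    // prints: delete scan_1002_v1.xml from db1
  }
}
```

The trade-off is the one the question worries about: every insert and delete has to update the index too, in exchange for a constant-time lookup instead of a scan over all databases.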



-- 
- Mansi
