Dear Peter,
I'm just a BaseX user, and Christian's team will correct me, but in my
experience, document size does not matter, at least for querying.
Why do you talk about distributing data? Did you reach the 2 billion node
limit?
As BaseX indexes all nodes, and depending on the value distribution, creating
a new collection containing hand-made indices can speed up your queries.
For example, for append-only collections, I usually create an index
collection like this:
<index>
  <item value='value to be indexed'>
    the 'pre' pointer to the indexed element
  </item>
  <item>...</item>
</index>
And access that 'index' with something like this (db:open-pre expects the
integer 'pre' value, hence the cast):

for $i in //item[@value = 'searched value']
return db:open-pre('mydb', xs:integer($i))
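Populating such an index collection could be sketched like this. This is only
an illustration, not code from the thread: the database name 'mydb', the
'product' elements, and the '@id' attribute are assumptions; db:node-pre()
returns the 'pre' value of a node, and db:add() stores a new document in an
existing database.

```xquery
(: Sketch: build a hand-made index for an assumed database 'mydb',
   indexing the @id attribute of hypothetical 'product' elements. :)
let $index :=
  <index>{
    for $e in db:open('mydb')//product
    (: db:node-pre($e) yields the integer 'pre' pointer to $e :)
    return <item value="{ $e/@id }">{ db:node-pre($e) }</item>
  }</index>
(: store the index as a separate document in the same database :)
return db:add('mydb', $index, 'index.xml')
```

Note that 'pre' values change when documents are inserted or deleted before
the indexed nodes, which is why this approach suits append-only collections.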
Also, a large number of documents may slow down the Properties window display
in the GUI, because of the document tree view.
Question to the BaseX team: would 'user-defined' indices be an interesting
feature?
Regards
-----Original Message-----
From: [email protected] [mailto:[email protected]]
Sent: Monday, February 11, 2013 17:13
To: Fabrice Etanchaud; [email protected]; [email protected]
Subject: RE: [basex-talk] handling large files: is there a streaming solution?
Thanks Fabrice, I am making good progress following your advice. Do you have
any heuristics for the best way to distribute data for performant searches and
subsetting of data? Am I better off with lots of small files or a few large
files in a collection?
>
>
>
>---- Original Message ----
>From: [email protected]
>To: [email protected], [email protected]
>Subject: RE: [basex-talk] handling large files: is there a
>streaming solution?
>Date: Mon, 11 Feb 2013 14:38:54 +0000
>
>>Dear Peter,
>>
>>Did you try to create a collection with the files (CREATE command)?
>>You should start that way; I don't see the point in using the file:
>module for import.
>>I think that once in the database, file size does not matter (until
>you reach millions of files in the collection and do a lot of
>document-related operations (list, etc.)).
>>
>>
>>
>>-----Original Message-----
>>From: [email protected]
>[mailto:[email protected]] On behalf of
>[email protected]
>>Sent: Monday, February 11, 2013 15:33
>>To: [email protected]
>>Subject: [basex-talk] handling large files: is there a streaming
>solution?
>>
>>Hello List
>>I want to do a join with some large (3-400 MB) XML files and
>would appreciate guidance on the optimal strategy.
>>At present these files are on the filesystem and not in a database
>>
>>Is there any equivalent to the Zorba streaming xml:parse()?
>>
>>Would loading the files into a database directly be the approach, or
>is it better to split them into smaller files?
>>
>>Is the file: module a suitable route through which to import the
>files?
>>
>>Thanks for your help
>>
>>Peter
>>
>>_______________________________________________
>>BaseX-Talk mailing list
>>[email protected]
>>https://mailman.uni-konstanz.de/mailman/listinfo/basex-talk
>>