Dear Peter,
I'm just a BaseX user, and Christian's team will correct me, but in my
experience, document size does not matter, at least for querying.
Why do you talk about distributing data? Did you reach the 2 billion node
limit?
As BaseX indexes all nodes, and depending on the value distribution, creating
a new collection containing hand-made indices can speed up your queries.
For example, for append-only collections, I usually create an index
collection like this:
<index>
  <item value='value to be indexed'>
    the 'pre' pointer to the indexed element
  </item>
  <item>...</item>
</index>
And access that 'index' with something like this (db:open-pre expects the
integer 'pre' value, hence the cast):

for $i in //item[@value = 'searched value']
return db:open-pre('mydb', xs:integer($i))
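Populating such an index collection could be sketched like this. This is only
an illustration, not code from the thread: the database name 'mydb', the
'product' elements, and the '@id' attribute are assumptions; db:node-pre()
returns the 'pre' value of a node, and db:add() stores a new document in an
existing database.

```xquery
(: Sketch: build a hand-made index for an assumed database 'mydb',
   indexing the @id attribute of hypothetical 'product' elements. :)
let $index :=
  <index>{
    for $e in db:open('mydb')//product
    (: db:node-pre($e) yields the integer 'pre' pointer to $e :)
    return <item value="{ $e/@id }">{ db:node-pre($e) }</item>
  }</index>
(: store the index as a separate document in the same database :)
return db:add('mydb', $index, 'index.xml')
```

Note that 'pre' values change when documents are inserted or deleted before
the indexed nodes, which is why this approach suits append-only collections.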
Also, a large number of documents may slow down the Properties window display
in the GUI, because of the document tree view.
Question to the BaseX team: would 'user-defined' indices be an interesting
feature?
Regards
-----Original Message-----
From: [email protected] [mailto:[email protected]]
Sent: Monday, February 11, 2013 17:13
To: Fabrice Etanchaud; [email protected]; [email protected]
Subject: RE: [basex-talk] handling large files: is there a streaming solution?
Thanks Fabrice, I am making good progress following your advice. Do you have
any heuristics for the best way to distribute data for performant searches and
subsetting of data? Am I better off with lots of small files or a few large
files in a collection?
>
>
>
>---- Original Message ----
>From: [email protected]
>To: [email protected], [email protected]
>Subject: RE: [basex-talk] handling large files: is there a
>streaming solution?
>Date: Mon, 11 Feb 2013 14:38:54 +0000
>
>>Dear Peter,
>>
>>Did you try to create a collection with the files (CREATE command)?
>>You should start that way; I don't see the point in using the file:
>module for import.
>>I think that once in the database, file size does not matter (until
>you reach millions of files in the collection and do a lot of
>document-related operations (list, etc.)).
>>
>>
>>
>>-----Original Message-----
>>From: [email protected]
>[mailto:[email protected]] On behalf of
>[email protected]
>>Sent: Monday, February 11, 2013 15:33
>>To: [email protected]
>>Subject: [basex-talk] handling large files: is there a streaming
>solution?
>>
>>Hello List
>>I want to do a join with some large (3-400 MB) XML files and
>would appreciate guidance on the optimal strategy.
>>At present these files are on the filesystem and not in a database
>>
>>Is there any equivalent to the Zorba streaming xml:parse()?
>>
>>Would loading the files into a database directly be the approach, or
>is it better to split them into smaller files?
>>
>>Is the file: module a suitable route through which to import the
>files?
>>
>>Thanks for your help
>>
>>Peter
>>
>>_______________________________________________
>>BaseX-Talk mailing list
>>[email protected]
>>https://mailman.uni-konstanz.de/mailman/listinfo/basex-talk
>>