Hi Fabrice and list,

I am dealing with data-centric XML rather than documents, so there is a fairly high node-to-content ratio. I have about 250 million nodes in total, and I find that about 15 million nodes per database seems to work well. This is just a guesstimate, though; I am really looking for performance profiles or heuristics so that I can cap the number of nodes in each database before performance degrades.
Cheers
Peter

>---- Original Message ----
>From: [email protected]
>To: [email protected], [email protected], [email protected]
>Subject: RE: [basex-talk] handling large files: is there a streaming solution?
>Date: Tue, 12 Feb 2013 09:07:40 +0000
>
>>Dear Peter,
>>
>>I'm just a BaseX user, and Christian's team will correct me, but in my experience document size does not matter, at least for querying.
>>
>>Why do you talk about distributing data? Did you reach the 2 billion node limit?
>>
>>As BaseX indexes all nodes, depending on the value distribution, creating a new collection containing hand-made indices can speed up your queries.
>>
>>For example, for append-only collections, I am used to creating an index collection like this:
>>
>><index>
>>  <item value='value to be indexed'>
>>    the 'pre' pointer to the indexed element
>>  </item>
>>  <item>...
>></index>
>>
>>and accessing that 'index' with something like this:
>>
>>for $i in
>>  //item[@value='searched value']
>>return
>>  db:open-pre('mydb', $i)
>>
>>Also, a big number of documents may slow down the Properties window display in the GUI, because of the document tree view.
>>
>>Question to the BaseX team: would 'user-defined' indices be an interesting feature?
>>
>>Regards
>>
>>-----Original Message-----
>>From: [email protected] [mailto:[email protected]]
>>Sent: Monday, 11 February 2013 17:13
>>To: Fabrice Etanchaud; [email protected]; [email protected]
>>Subject: RE: [basex-talk] handling large files: is there a streaming solution?
>>
>>Thanks Fabrice, I am making good progress following your advice. Do you have any heuristics for the best way to distribute data for performant searches and subsetting of data? Am I better off having lots of small files or a few large files in a collection?
>>
>>>---- Original Message ----
>>>From: [email protected]
>>>To: [email protected], [email protected]
>>>Subject: RE: [basex-talk] handling large files: is there a streaming solution?
>>>Date: Mon, 11 Feb 2013 14:38:54 +0000
>>>
>>>>Dear Peter,
>>>>
>>>>Did you try to create a collection with the files (CREATE command)? You should start that way; I don't see the point in using the file: module for import.
>>>>I think that once the data is in the database, file size does not matter (until you reach millions of files in the collection and do a lot of document-related operations: list, etc.).
>>>>
>>>>-----Original Message-----
>>>>From: [email protected] [mailto:[email protected]] On behalf of [email protected]
>>>>Sent: Monday, 11 February 2013 15:33
>>>>To: [email protected]
>>>>Subject: [basex-talk] handling large files: is there a streaming solution?
>>>>
>>>>Hello List,
>>>>I want to do a join with some large (300-400 MB) XML files and would appreciate guidance on the optimal strategy.
>>>>At present these files are on the file system and not in a database.
>>>>
>>>>Is there any equivalent to the Zorba streaming xml:parse()?
>>>>
>>>>Would loading the files into a database directly be the right approach, or is it better to split them into smaller files?
>>>>
>>>>Is the file: module a suitable route through which to import the files?
>>>>
>>>>Thanks for your help
>>>>
>>>>Peter
>>>>
>>>>_______________________________________________
>>>>BaseX-Talk mailing list
>>>>[email protected]
>>>>https://mailman.uni-konstanz.de/mailman/listinfo/basex-talk
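
[Editor's note] Fabrice's hand-made index pattern from the thread can be sketched end to end in XQuery. This is a minimal sketch, not Fabrice's exact code: the database name 'mydb', the index database 'myidx', the element name `record`, and the `@code` attribute are all illustrative assumptions; `db:create`, `db:open`, `db:node-pre`, and `db:open-pre` are functions from the BaseX Database Module. Building the index is an updating query:

```xquery
(: Build a hand-made index database for an append-only database.        :)
(: Assumptions (illustrative): database 'mydb' exists, and the elements :)
(: worth indexing are <record> nodes carrying a @code attribute.        :)
db:create(
  'myidx',
  <index>{
    for $r in db:open('mydb')//record
    return
      (: store each node's 'pre' pointer next to its indexed value :)
      <item value="{ $r/@code }">{ db:node-pre($r) }</item>
  }</index>,
  'index.xml'
)
```

A lookup then runs as a separate (non-updating) query, resolving the stored 'pre' pointer straight back to the original node:

```xquery
for $i in db:open('myidx')//item[@value = 'searched value']
return db:open-pre('mydb', xs:integer($i))
```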
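
[Editor's note] On the CREATE suggestion: loading the large source files into a database up front can also be done from XQuery with `db:create`, which accepts a directory as input. A minimal sketch, assuming the files live under /data/xml (both the path and the database name are illustrative):

```xquery
(: Create database 'mycoll' from every XML file below the given :)
(: directory; name and path are illustrative assumptions.       :)
db:create('mycoll', '/data/xml')
```

The equivalent console command is `CREATE DB mycoll /data/xml`. Once the data is in the database, queries run against BaseX's indexed representation instead of re-parsing the 300-400 MB source files on every access.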

