Hello Hans-Juergen,

> So my understanding is that the messages are inserted as child elements into 
> this root element - and the end result is one document with one root element 
> and millions of child elements representing the individual messages, yes? 


Yes, that is correct: I have one root element at the beginning and insert the 
incoming items as child nodes of the root.
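
As a rough sketch of what such an insert could look like when the query string 
is built on the client side: the element name `tweet`, the helper names and the 
sample values are assumptions for illustration only, not my actual code. The 
same function covers a single item and a cached list of items sent as one 
XQuery Update.

```python
from xml.sax.saxutils import escape, quoteattr


def xq_text(value: str) -> str:
    """Escape text for an XQuery direct element constructor.

    Besides the usual XML escaping, curly braces must be doubled,
    because XQuery treats { ... } as an enclosed expression.
    """
    return escape(value).replace("{", "{{").replace("}", "}}")


def xq_attr(value: str) -> str:
    """Quote and escape a value for use as an attribute in a constructor."""
    return quoteattr(value).replace("{", "{{").replace("}", "}}")


def tweet_element(tweet_id: str, text: str) -> str:
    """Serialize one message as a <tweet> element (hypothetical element name)."""
    return "<tweet id=" + xq_attr(tweet_id) + ">" + xq_text(text) + "</tweet>"


def insert_query(items: list[str]) -> str:
    """Build one XQuery Update expression that appends all items to /tweets."""
    if not items:
        raise ValueError("nothing to insert")
    return "insert nodes (" + ", ".join(items) + ") into /tweets"


if __name__ == "__main__":
    batch = [tweet_element("1", "a < b"), tweet_element("2", "{x}")]
    print(insert_query(batch))
```

The resulting string would then be sent to the server as an ordinary query 
against the opened database.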

> Therefore you do not have to come up with URIs, as there is only one single 
> document. A monster document, but I conclude from your approach that this is 
> no problem, and not worse (or even better) than having a million individual, 
> small documents. Is it correct - would you recommend to store the messages in 
> one single document?

In my use case, tweets have unique id attributes, so I don't need any URIs to 
identify them. It would probably help if you described how you plan to query 
the data, so it is easier to understand what you want to do.
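
To illustrate addressing an item by its id instead of a URI, here is a minimal 
sketch. It assumes the items are stored as `tweet` elements under the `tweets` 
root and that ids are purely numeric; both are assumptions for illustration, 
and `lookup_query` is a hypothetical helper.

```python
def lookup_query(tweet_id: str) -> str:
    """Build an XPath expression selecting the item with the given id.

    Only ASCII digits are accepted, so the id can be embedded in the
    query string without further escaping.
    """
    if not (tweet_id.isascii() and tweet_id.isdigit()):
        raise ValueError("expected a numeric id")
    return '/tweets/tweet[@id = "' + tweet_id + '"]'


if __name__ == "__main__":
    print(lookup_query("220000000000000001"))
```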

> If the loading process cannot run concurrently with queries - would there be 
> any way to periodically "shift" packages of messages into a "read only" 
> database? Or perhaps better the other way around, let the server periodically 
> interrupt its loading activity, close the database, rename it, open and 
> initialize a new base and then continue to load? Or is there presently simply 
> no solution available?


That's exactly what I do every hour: I rename the current database so that its 
name carries the current date and hour (date_hour), and create a new database 
for the next incoming items. Shifting is not really an alternative, because it 
would probably take too long to insert the items into a second database and 
delete them from the "main" database.
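
A minimal sketch of the hourly rotation, only generating the names and the 
command strings. The database name `tweets`, the exact date_hour format and the 
helper names are assumptions for illustration; the commands are meant to be the 
BaseX commands CLOSE, ALTER DB and CREATE DB, so please check them against your 
BaseX version before relying on them.

```python
from datetime import datetime


def archive_name(base: str, now: datetime) -> str:
    """Name for the finished database, e.g. tweets_20120704_10 (assumed format)."""
    return f"{base}_{now:%Y%m%d_%H}"


def rotation_commands(base: str, now: datetime) -> list[str]:
    """Commands to close and rename the current database and start a fresh one.

    The new database starts with the empty <tweets/> root node again.
    """
    return [
        "CLOSE",
        f"ALTER DB {base} {archive_name(base, now)}",
        f"CREATE DB {base} <tweets/>",
    ]


if __name__ == "__main__":
    for command in rotation_commands("tweets", datetime(2012, 7, 4, 10, 0)):
        print(command)
```

The renamed databases are no longer written to, so they can be queried without 
blocking the loader.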

Kind regards,
Andreas


On 03.07.2012 at 23:58, Hans-Juergen Rennau wrote:

> Hello Andreas,
> 
> thank you very much for this information! Indeed, the use cases are 
> similar. 
> 
> I try to understand how exactly you stored the messages. The Wiki says: "the 
> initial database just contained a root node <tweets/>". So my understanding 
> is that the messages are inserted as child elements into this root element - 
> and the end result is one document with one root element and millions of 
> child elements representing the individual messages, yes? Therefore you do not 
> have to come up with URIs, as there is only one single document. A monster 
> document, but I conclude from your approach that this is no problem, and not 
> worse (or even better) than having a million individual, small documents. Is 
> it correct - would you recommend to store the messages in one single document?
> 
> If the loading process cannot run concurrently with queries - would there be 
> any way to periodically "shift" packages of messages into a "read only" 
> database? Or perhaps better the other way around, let the server periodically 
> interrupt its loading activity, close the database, rename it, open and 
> initialize a new base and then continue to load? Or is there presently simply 
> no solution available?
> 
> Kind regards,
> Hans-Juergen
> 
> From: Andreas Weiler <andreas.wei...@uni-konstanz.de>
> To: Hans-Juergen Rennau <hren...@yahoo.de> 
> CC: "basex-talk@mailman.uni-konstanz.de" <basex-talk@mailman.uni-konstanz.de> 
> Sent: 15:51 Tuesday, 3 July 2012
> Subject: Re: [basex-talk] BaseX as a log msg store?
> 
> Hello Hans-Juergen,
> 
> here are some details about my use case, which is similar to yours.
> I'm using BaseX to insert the live public Twitter Stream into databases (see 
> Wiki Entry [1]).
> 
> One Twitter message is around 4 KB in size, and I'm able to insert about 2000 
> of them per second using single XQuery Update inserts. So that would probably 
> work for you, too.
> If you use bulk inserts, e.g. caching the items in an item list and running 
> one XQuery Update for all of them, the insert rate would increase further.
> 
>> thus made available for querying
> 
> This could be a bigger problem, because as long as you are writing items into 
> the database (which will never stop in your use case), the readers are 
> blocked.
> And while one of your readers is running, the writers are blocked.
> 
> Hope this helps,
> Andreas
> 
> [1] http://docs.basex.org/wiki/Twitter
> 
> 

_______________________________________________
BaseX-Talk mailing list
BaseX-Talk@mailman.uni-konstanz.de
https://mailman.uni-konstanz.de/mailman/listinfo/basex-talk
