Hello again,

> I implemented this and it looks like it works nicely (to be confirmed
> soon - I started a run on a 600k records collection).
This runs nicely, in that the machine doesn't run out of memory
anymore. There is one thing I noticed, however, which I had also
noticed earlier when a big collection was being processed: any attempt
to talk to the server seems not to work, i.e. even when I connect via
the command-line basexadmin and run a command such as "list" or "open
db foo", I do not get a reply. I can see the commands in the log,
though:

17:28:06.532 [127.0.0.1:33112] LOGIN admin OK
17:28:08.158 [127.0.0.1:33112] LIST
17:28:21.288 [127.0.0.1:33114] LOGIN admin OK
17:28:25.602 [127.0.0.1:33114] LIST
17:28:52.676 [127.0.0.1:33116] LOGIN admin OK

Could it be that the long session is blocking the output stream coming
from the server?

Thanks,

Manuel

On Mon, May 21, 2012 at 4:40 PM, Manuel Bernhardt
<[email protected]> wrote:
> Hi Christian,
>
>> as you have already seen, all results are first cached by the client
>> if they are requested via the iterative query protocol. In earlier
>> versions of BaseX, results were returned in a purely iterative manner
>> -- which was more convenient and flexible from a user's point of view,
>> but led to numerous deadlocks if reading and writing queries were
>> mixed.
>>
>> If you only need parts of the requested results, I would recommend
>> limiting the number of results via XQuery, e.g. as follows:
>>
>> ( for $i in /record[@version = 0]
>>   order by $i/system/index
>>   return $i )[position() = 1 to 1000]
>
> I had considered this, but haven't used that approach - yet - mainly
> because I wanted to try the streaming approach first. So far our
> system only used MongoDB and we are used to working with cursors as
> query results, so I'm trying to keep that somehow aligned if possible.
>
>> Next, it is important to note that the "order by" clause can get very
>> expensive, as all results have to be cached anyway before they can be
>> returned.
>> Our top-k functions will probably give you better results if
>> it's possible in your use case to limit the number of results [1].
>
> Ok, thanks. If this becomes a problem, I'll consider using this. Is
> the query time of 0.06 ms otherwise the actual time the query takes
> to run? If so, then I'm not too worried about query performance :)
> In general, the bottleneck in our system is not so much the querying
> but rather the processing of the records - I started rewriting this
> part concurrently using Akka, but am now stuck with a classloader
> deadlock (no pun intended). It will likely take quite some effort for
> the processing to be faster than the query iteration.
>
>> A popular alternative to client-side caching (well, you mentioned that
>> already) is to overwrite the code of the query client and directly
>> process the returned results. Note, however, that you then need to
>> loop through all results, even if you only need parts of them.
>
> I implemented this and it looks like it works nicely (to be confirmed
> soon - I started a run on a 600k records collection).
>
> Thanks for your time!
>
> Manuel
>
>> Hope this helps,
>> Christian
>>
>> [1] http://docs.basex.org/wiki/Higher-Order_Functions_Module#hof:top-k-by

_______________________________________________
BaseX-Talk mailing list
[email protected]
https://mailman.uni-konstanz.de/mailman/listinfo/basex-talk
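[Editorial note: the symptom reported above - LOGIN and LIST appear in the server log, but no reply arrives until the long session finishes - is consistent with commands being serialized behind a single server-side lock, as the question suggests. The following is a minimal, hypothetical Python sketch of that pattern; the lock, thread names, and timings are illustrative stand-ins, not BaseX's actual transaction monitor implementation.]

```python
import threading
import time

db_lock = threading.Lock()   # stand-in for a global server-side transaction lock
log = []                     # records the order of events, like the server log above


def long_running_session():
    """Simulates a session iterating over a large (e.g. 600k-record) result set."""
    with db_lock:
        log.append("LONG QUERY start")
        time.sleep(0.5)      # the session holds the lock for its whole duration
        log.append("LONG QUERY done")


def list_command():
    """Simulates the command-line client issuing LIST."""
    log.append("LIST received")   # the command is logged immediately on arrival...
    with db_lock:                 # ...but its reply cannot be produced until the
        log.append("LIST answered")  # long session releases the lock


t1 = threading.Thread(target=long_running_session)
t2 = threading.Thread(target=list_command)
t1.start()
time.sleep(0.1)              # give the long session time to acquire the lock first
t2.start()
t1.join()
t2.join()

# LIST is received while the long query runs, but only answered after it finishes:
print(log)
```

Under this model, the client appears to hang exactly as described: the server accepts and logs the command, but the reply is delayed until the long-running session completes.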

