Hello again,

> I implemented this and it looks like it works nicely (to be confirmed
> soon - I started a run on a 600k-record collection).

This runs nicely, in that the machine no longer runs out of memory.
There is one thing I noticed, however, and that I had also noticed
earlier when a big collection was being processed: any attempt to talk
to the server fails, i.e. even when I connect via the command-line
basexadmin and run a command such as "list" or "open db foo", I get
no reply. I can see the commands in the log, though:

17:28:06.532    [127.0.0.1:33112]       LOGIN admin     OK
17:28:08.158    [127.0.0.1:33112]       LIST
17:28:21.288    [127.0.0.1:33114]       LOGIN admin     OK
17:28:25.602    [127.0.0.1:33114]       LIST
17:28:52.676    [127.0.0.1:33116]       LOGIN admin     OK

Could it be that the long session is blocking the output stream coming
from the server?

Thanks,

Manuel

On Mon, May 21, 2012 at 4:40 PM, Manuel Bernhardt
<[email protected]> wrote:
> Hi Christian,
>
>> as you have already seen, all results are first cached by the client
>> if they are requested via the iterative query protocol. In earlier
>> versions of BaseX, results were returned in a purely iterative manner
>> -- which was more convenient and flexible from a user's point of view,
>> but led to numerous deadlocks if reading and writing queries were
>> mixed.
>>
>> If you only need parts of the requested results, I would recommend
>> limiting the number of results via XQuery, e.g. as follows:
>>
>>  ( for $i in /record[@version > 0]
>>  order by $i/system/index
>>  return $i) [position() = 1 to 1000]
>>
>
> I had considered this, but haven't used that approach - yet - mainly
> because I wanted to try the streaming approach first. So far our
> system has only used MongoDB, and we are used to working with cursors
> as query results, so I'm trying to keep things aligned with that if
> possible.
>
>> Next, it is important to note that the "order by" clause can get very
>> expensive, as all results have to be cached anyway before they can be
>> returned. Our top-k functions will probably give you better results if
>> it's possible in your use case to limit the number of results [1].
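(For reference, a top-k version of the query above might look like the
following sketch - this is illustrative, not from the thread: per the
linked docs, hof:top-k-by takes the input sequence, a key function, and
k, and keeps the k items with the largest keys, so the key is negated
here to get the records with the smallest index first.)

```xquery
(: sketch: the 1000 records with the smallest system/index;
   the key is negated because hof:top-k-by keeps the largest keys :)
hof:top-k-by(
  /record[@version > 0],
  function($r) { -xs:integer($r/system/index) },
  1000
)
```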
>
> Ok, thanks. If this becomes a problem, I'll consider using it. Is
> the reported query time of 0.06 ms the actual time the query takes to
> run? If so, I'm not too worried about query performance :)
> In general, the bottleneck in our system is not so much the querying
> as the processing of the records - I started rewriting that part to
> run concurrently using Akka, but am now stuck on a classloader
> deadlock (no pun intended). It will likely take quite some effort for
> the processing to become faster than the query iteration.
>
>> A popular alternative to client-side caching (well, you mentioned that
>> already) is to override the code of the query client and process the
>> returned results directly. Note, however, that you need to loop
>> through all results, even if you only need some of them.
>
> I implemented this and it looks like it works nicely (to be confirmed
> soon - I started a run on a 600k-record collection).
>
>
> Thanks for your time!
>
> Manuel
>
>
>> Hope this helps,
>> Christian
>>
>> [1] http://docs.basex.org/wiki/Higher-Order_Functions_Module#hof:top-k-by
_______________________________________________
BaseX-Talk mailing list
[email protected]
https://mailman.uni-konstanz.de/mailman/listinfo/basex-talk
