On 04.09.2010 21:45, Justin Erenkrantz wrote:
> On Sat, Sep 4, 2010 at 10:18 AM, Justin Erenkrantz
> <jus...@erenkrantz.com> wrote:
>> Notably, AFAICT, we're repeating a few of these queries:
>>
>> - STMT_SELECT_WORKING_NODE (2 times)
>> - STMT_SELECT_ACTUAL_NODE (3 times)
>> - STMT_SELECT_WORKING_PROPS (2 times)
>> - STMT_SELECT_BASE_PROPS (2 times)
>>
>> I haven't yet dug into why we're repeating the queries.
> Okay - I now know why we're repeating the core queries twice.
>
> In get_dir_status, we want to do a check to identify if the node
> exists and what kind it is - which is done by a call to
> svn_wc__db_read_info (around line 1269 in status.c). But, most of the
> parameters (except for node and kind) are NULL. If it's not excluded
> and we can go into the depth, then we call handle_dir_entry on the
> entry a few lines down - which turns right around and calls
> svn_wc__db_read_info - this time asking for everything.
>
> This causes the core per-file queries to be executed twice.
>
> I'm going to see what a quick check to retrieve just the kind and
> status will do for the query volume. I think it's unlikely we have to
> pull everything out of sqlite to answer that basic question.
> -- justin

Possibly this existence check could be one single query for the whole WC,
with the results cached in memory? There shouldn't be a significant
difference in per-query overhead, and you need all those results in any
case for a whole-depth status.

Of course it increases memory usage, but really ... I can't see that as
terribly significant.

$ sudo find -x / -print | wc
  775161 1091167 81342644

That's 80 megs of "file metadata" on my box, with some 120 gigs of stuff
and an OS install on it; I doubt even a fairly large working copy would
do worse than that.

-- Brane
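
A minimal sketch of the one-query-plus-cache idea, in Python with sqlite3
rather than the C used in libsvn_wc. Everything named here is an
assumption made for illustration: the `nodes` table, its columns, and the
helpers `load_node_cache` and `walk` are hypothetical and do not reflect
the actual wc.db schema or any svn_wc__db_* function. It only shows that
the existence/kind check can be answered from memory after a single
SELECT, so the walk itself issues no further queries.

```python
import sqlite3
from collections import defaultdict


def load_node_cache(conn):
    """Read kind and presence for every node with one query.

    Returns (cache, children): cache maps relpath -> (kind, presence),
    children maps a parent relpath -> list of child relpaths.
    The `nodes` table and its columns are hypothetical.
    """
    cache = {}
    children = defaultdict(list)
    rows = conn.execute(
        "SELECT local_relpath, parent_relpath, kind, presence FROM nodes"
    )
    for relpath, parent, kind, presence in rows:
        cache[relpath] = (kind, presence)
        if parent is not None:
            children[parent].append(relpath)
    return cache, children


def walk(cache, children, relpath=""):
    """Yield (relpath, kind) depth-first, consulting only the cache."""
    kind, presence = cache[relpath]
    if presence == "excluded":
        return  # skip excluded nodes and everything beneath them
    yield relpath, kind
    if kind == "dir":
        for child in sorted(children[relpath]):
            yield from walk(cache, children, child)


def demo():
    """Build a tiny in-memory database and walk it via the cache."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE nodes (local_relpath TEXT PRIMARY KEY, "
        "parent_relpath TEXT, kind TEXT, presence TEXT)"
    )
    conn.executemany(
        "INSERT INTO nodes VALUES (?, ?, ?, ?)",
        [
            ("", None, "dir", "normal"),
            ("README", "", "file", "normal"),
            ("src", "", "dir", "normal"),
            ("src/main.c", "src", "file", "normal"),
            ("docs", "", "dir", "excluded"),
        ],
    )
    cache, children = load_node_cache(conn)
    conn.close()
    return list(walk(cache, children))
```

The memory cost is one small tuple per node, which is the trade-off
discussed above; whether that is acceptable for a very large working copy
would have to be measured rather than assumed.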