On Wed, Jan 23, 2013 at 6:56 PM, Julian Foad <[email protected]>wrote:
> > URL: http://svn.apache.org/viewvc?rev=1435746&view=rev > > > Log: > > Fix issues #3995 (redesign svn_fs_verify() for 1.8) and #4211 (verify > > is slow and needs to handle node verification better). > > > > This patch does two things. First, it adds a notification callback > > to svn_fs_verify() and forwards its output to our console stream. > > > > The second change is that revision-specific checks are begin moved > > from svn_fs_fs__verify() to a new svn_fs_verify_rev() function. > > Because the latter is being called as part of the per-revision checks, > > progress is visible and access / cache locality is very much improved. > > The latter is now being called as part of svn_repos_verify_fs2()'s > per-revision checking loop (not as part of svn_fs_[fs__]verify()'s per-rev > checks, which is how I interpreted this at first). > I updated the log message to make that clear. > Because of that, the already existing '* Verified revision R' notification > from that loop now means both 'dump revision R' and svn_fs_verify_rev(R) > have been done. There will no longer be the initial long delay which was > previously caused by svn_fs_verify() verifying all revs in the repo before > the first '* Verified revision R' notification. > But it may still take quite some time because all representations will be accessed once to verify that they actually exist as promised by the rep-cache. Reading 10 .. 100GB from disk will still take time. > But you are still calling svn_fs_verify(..., start, end, ...) before the > per-rev loop. This was puzzling until I saw that the 'start' and 'end' > arguments here are no longer used for per-rev full verification but are > still used for rep-cache verification. This revision range is rather an optimization reducing the amount of data to process. Depending on the nature of the tests we might perform in the future, findings / error messages may still refer to revisions outside that range. E.g. "r123 based on r99, which is empty". This is now the only potentially long-running check before the main loop > starts, and it now sends its own notifications. > Depending on what you check, the order may also be non-linear and even repeating. The only point here is to give the user some heartbeat signal and ideally sense of progress. > > So the overall output is now like this (on one of my test repos): > * Verifying global structure ... > * Verifying structure at revision 74 ... > * Verifying structure at revision 481 ... > * Verifying structure at revision 831 ... > * Verified revision 0. > * Verified revision 1. > * Verified revision 2. > * Verified revision 3. > > OK. Maybe s/structure/repository metadata/; what do others think? > Done in r1439211. > > > /** > > * Perform backend-specific data consistency and correctness validations > > * to the Subversion filesystem located in the directory @a path. > > * > > * @a start and @a end may be #SVN_INVALID_REVNUM, in which case > > * svn_repos_verify_fs2()'s semantics apply. When @c r0 is being > > * verified, global invariants may be verified as well. > > * ... */ > > svn_fs_verify(path, cancel..., notify..., start, end, pool); > > This needs to say what the START and END parameters are for, and make > clear that it doesn't call svn_fs_verify_rev() for each rev. > Tried to clear that up in r1439214. > > /** > > * Perform backend-specific data consistency and correctness validations > > * to revision @a revision of the Subversion filesystem @a fs. > > * ... */ > > svn_fs_verify_rev(fs, revision, pool); > > I'm still unclear of the relationship between svn_fs_verify_rev() and > performing a dump of the rev. I assume they are complementary, but it > seems odd, as it feels to me like a good implementation of > svn_fs_verify_rev() would do all that a dump does and more. > Some tests are very expensive to set up. E.g. rep-cache.db does not have a revision index and will thus we always do a full table scan. > > /* Verify the fsfs filesystem FS. Use POOL for temporary allocations. */ > > svn_fs_fs__verify(fs, cancel..., notify..., start, end, pool); > > > > /* Verify REVISION in filesystem FS. Use POOL for temporary > allocations. */ > > svn_fs_fs__verify_rev(fs, revision, pool); > > > > /* Verify metadata for ROOT. > > ### Currently only implemented for revision roots. */ > > svn_fs_fs__verify_root(svn_fs_root_t *root, pool); > > Similar docs are needed here too, or a cross-reference such as "Implements > svn_fs_verify...() for FSFS". > Done in r1439210. The order of parameters in svn_fs_verify() and the corresponding vtable > function and similar places would be more consistent with elsewhere if > rearranged as: > (main params, > notification, > cancellation, > pools). > Done in same revision. > > Modified: subversion/trunk/subversion/libsvn_fs_fs/rep-cache.c > > > ============================================================================== > > --- subversion/trunk/subversion/libsvn_fs_fs/rep-cache.c (original) > > +++ subversion/trunk/subversion/libsvn_fs_fs/rep-cache.c Sat Jan 19 > 22:45:02 > > @@ -135,6 +135,8 @@ svn_fs_fs__walk_rep_reference(svn_fs_t * > > void *walker_baton, > > svn_cancel_func_t cancel_func, > > void *cancel_baton, > > + svn_fs_progress_notify_func_t notify_func, > > + void *notify_baton, > > svn_revnum_t start, > > svn_revnum_t end, > > apr_pool_t *pool) > > @@ -164,6 +167,9 @@ svn_fs_fs__walk_rep_reference(svn_fs_t * > > SVN_ERR(svn_sqlite__reset(stmt)); > > if (SVN_IS_VALID_REVNUM(max)) /* The rep-cache could be empty. */ > > SVN_ERR(svn_fs_fs__revision_exists(max, fs, iterpool)); > > + > > + if (notify_func) > > + notify_func(SVN_INVALID_REVNUM, notify_baton, iterpool); > > } > > > > SVN_ERR(svn_sqlite__get_statement(&stmt, ffd->rep_cache_db, > > @@ -210,6 +216,16 @@ svn_fs_fs__walk_rep_reference(svn_fs_t * > > return svn_error_compose_create(err, svn_sqlite__reset(stmt)); > > > > SVN_ERR(svn_sqlite__step(&have_row, stmt)); > > + > > + /* Notify (occasionally, because walking is fast and we can't > > + guarantee a properly ordered notification sequence anyway) */ > > + if ( notify_func > > + && (iterations % 1024 == 0) > > + && (rep->revision != last_notified_revision)) > > + { > > + notify_func(rep->revision, notify_baton, iterpool); > > + last_notified_revision = rep->revision; > > + } > > } > > This function already calls a callback as its main purpose. We don't need > to add a separate notification > mechanism so it can tell us that it's called the callback: we can > implement whatever notification we want, *inside* the > existing callback. > > There's only one special case here, which is you put a notification in > after it does the initial global invariant. That's not really needed, as > the caller knows it's about to be done when it requests start==0, and knows > it's been done when it gets the first callback. (If you really want to > make a call at this point, it would be equally OK to call the main callback > with REP=NULL.) > > Advantages: (1) simplicity; (2) the *user* of the function controls how > often it notifies: it might want to do so on the basis of elapsed time > rather than number of loop iterations for example. > Good point! Done in r1439212. Once again an excellent review, Julian! -- Stefan^2. -- Certified & Supported Apache Subversion Downloads: * http://www.wandisco.com/subversion/download *

