On Dec 25, 2012, at 2:47 PM, Urs Thuermann wrote:

> Paul Sander <[email protected]> writes:
>
>> The specific reason for this is because CVS assumes that it was the
>> last to modify a file if its mod time matches the one recorded in
>> its Entries file. If it's quickly modified by something else, then
>> CVS may still think it's up to date and both "cvs update" and "cvs
>> commit" will produce incorrect results.
>>
>> There has been much discussion on this topic, and you can see
>> discussion of the rationale in the info-cvs archives.
>
> OK, I've looked up the topic in the archives. I assume it has already
> been suggested to change the "Entries" file format to use a hash
> instead of a time stamp. But I haven't seen this in the info-cvs
> archive. So wouldn't this be an option? Otherwise, I'd like a
> command line option to disable the sleep, probably with a BIG warning
> that it should only be used if you know what you do.
I think that using hashes might have been discussed, but I don't recall specific conversations. The reliability of hashes, even cryptographic ones, isn't foolproof either. Random file content is a simplifying assumption when designing applications around hashes, and source code isn't random content. The MD5 and SHA-1 hashes have been broken in ways that I believe match use cases describing the natural evolution of source code. This weakens the reliability claims of hashes to some degree, but truthfully I don't know to what extent. (Perhaps the effect is negligible in the real world, at least in projects for which CVS is used. Reducing the theoretical probability of collision by several orders of magnitude still leaves a huge number of files processed without incident.)

Anyway, if the use of hashes is limited strictly to replacing the timestamps in the Entries file (i.e., compute a hash when CVS writes the file to the sandbox and record it in the Entries file, then later recompute the hash and compare it to the recorded value), then the effect of a collision is the same as what we have observed with timestamps: incorrect behavior of subsequent operations because files are believed to be up to date when they are not. The difference is that the breakage will be deterministic and there will be no simple workaround, and the overhead of computing the hashes may become a factor. (Note that it is useless to store hashes in the RCS files due to keyword expansion, so you can't amortize part of the cost by storing them at commit time.)

> I have a script that calls cvs checkout hundreds to thousands of times
> and that causes the script to run for half an hour or so instead of a
> few seconds. The info-cvs archive also suggests using RCS tools
> instead of CVS. Is it guaranteed that the CVS repository files will
> always have RCS format and RCS tools will work on them?

What is your use case that requires you to invoke "cvs checkout" so many times?
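To make the hash idea concrete, here is a minimal sketch (not CVS code) of the timestamp-for-hash substitution, using sha256sum and a hypothetical side file named Entries.hash in place of the real Entries format:

```shell
# Sketch: record a content hash at write time, recompute it later to
# decide whether the file is up to date. Entries.hash is a hypothetical
# stand-in for the hash field this thread proposes adding to Entries.
mkdir -p /tmp/hashdemo && cd /tmp/hashdemo
printf 'int main(void) { return 0; }\n' > main.c

# At "checkout" time: record hash and file name.
sha256sum main.c > Entries.hash

# Later: --check --status exits 0 only if the content still matches.
if sha256sum --check --status Entries.hash; then
    echo "up to date"      # prints "up to date": file is unchanged
else
    echo "modified"
fi
```

Unlike a timestamp comparison, this check cannot be fooled by a quick edit within the filesystem's timestamp resolution; the cost is one full read and hash of each file per check.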
Over a 30-minute interval, roughly 900 invocations is the point where the sleeps begin to dominate execution time. At that rate, CVS' locking mechanisms are also significantly impacting performance.

Perhaps you are checking out each source file individually? If so, you should consider reducing the number of invocations. You can do this by checking out directory trees, or by specifying multiple paths on a single "cvs checkout" command line. Tags, or branch/timestamp pairs, are useful here for pinning the versions you want. The use of xargs might also be helpful.

If you have path/version pairs (or path/tag pairs, or even path/branch/timestamp triples) then you can use RCS directly and conjure the CVS meta-data yourself. As you have discovered, there is discussion of this method in the archives detailing why it's fast and reliable. I have used this method successfully myself.

To my knowledge, CVS uses the standard RCS file format. RCS produces warnings if newphrase extensions are used in certain contexts, e.g. in the initial admin section of the RCS file. My experience in that area is dated, so I don't know whether this is still an issue with current versions of either tool.
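For example, assuming your script has a list of paths to fetch (paths.txt and the module names below are hypothetical), xargs can pack many paths into one "cvs checkout" invocation, so you pay the per-invocation sleep and lock overhead once per batch instead of once per file. The last line substitutes echo for cvs purely to show the batching effect:

```shell
# Hypothetical list of paths that were being checked out one at a time.
printf 'module/a.c\nmodule/b.c\nmodule/c.c\n' > /tmp/paths.txt

# Slow pattern: one cvs invocation (and one sleep) per path.
#   while read p; do cvs checkout "$p"; done < /tmp/paths.txt
# Batched pattern: xargs appends as many paths as fit per invocation.
#   xargs cvs checkout < /tmp/paths.txt

# Demonstrate the batching with echo standing in for cvs:
xargs echo cvs checkout < /tmp/paths.txt
# prints: cvs checkout module/a.c module/b.c module/c.c
```

With thousands of paths, xargs would still split the list into a handful of invocations (limited by the system's argument-length limit), which is a large improvement over one invocation per file.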
