Re: Smoke, FUD (was Re: CVS corrupts binary files ...)

2004-07-02 Thread Greg A. Woods
[ On Thursday, July 1, 2004 at 14:33:11 (-0700), Paul Sander wrote: ]
 Subject: Re: Smoke, FUD (was Re: CVS corrupts binary files ...)

 What are you talking about?  I can think of only two ways that CVS
 uses the deltas:

Well, as usual you got off on the wrong track right from the start
again.

What part of RCS Compatible have you misunderstood this time?

 CVS has a notoriously poor diff and merge capability.

Well, that seems to depend entirely on your point of view and your
propensity to try to use the wrong (or at least the least suitable) tool
for the job.

I and no doubt millions of other people have been incredibly satisfied
with the extremely wide applicability of the unix diff and merge
algorithms.  It's almost a miracle that they work so well for such a
variety of different kinds of text files -- or maybe it's just an
indication of how well designed they are for dealing with the vast
majority of forms of representation of human-readable information.

Of course you don't have to like them, but you do have to accept that
they are integral to RCS and thus integral to CVS.  Go away and go play
with xdelta and friends if you want.

-- 
Greg A. Woods

+1 416 218-0098  VE3TCPRoboHack [EMAIL PROTECTED]
Planix, Inc. [EMAIL PROTECTED]  Secrets of the Weird [EMAIL PROTECTED]


___
Info-cvs mailing list
[EMAIL PROTECTED]
http://lists.gnu.org/mailman/listinfo/info-cvs


Re: Smoke, FUD (was Re: CVS corrupts binary files ...)

2004-07-01 Thread Paul Sander
--- Forwarded mail from [EMAIL PROTECTED]

[ On Monday, June 28, 2004 at 19:02:19 (-0700), Paul Sander wrote: ]
 Subject: Re: Smoke, FUD (was Re: CVS corrupts binary files ...)

 I have never, ever advocated changing the format of an RCS file in a
 way that would break the ci, co, rcs, or rlog programs.  And although
 I strongly advocate the replacement of user-exposed diff and merge
 tools, I have never, ever advocated the replacement of the diff tool
 that computes the deltas stored in an RCS file.

Indeed -- instead you would rather use different algorithms for storing
deltas and for using them.

That would be just plain stupid, if indeed not eventually dangerous to
the integrity of a repository.

What are you talking about?  I can think of only two ways that CVS
uses the deltas:  Reconstructing complete versions, and annotating
version history.  For the purposes of this thread, which started out
with diffing and merging files, the tools require reconstructed versions.

Of course, the algorithms that produce the deltas and reconstruct the
original data must agree.  But that's all below the RCS API and is
completely invisible to the user.

Once the user has two or three complete files, he can apply any diff or
merge algorithm he wants to those files.  Recall the following sequence
of operations:

co -pancestor file,v > a
co -pcontributor file,v > c
diff3 -E file a c

Once again, the algorithms and data formats that maintain the integrity
of the RCS files are hidden away and invisible to the user by way of the
co and ci programs.  The user can replace the invocation of diff3 with any
tool that he chooses to perform the content merge.  Once done, the user
uses ci to produce a new delta in the RCS files, using the very algorithm
that produces the correct data for subsequent invocations of co.
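To make the point concrete, here is a minimal Python sketch (not part of CVS or RCS) that builds the same co/co/diff3 sequence with the merge step as a parameter. The helper names merge_steps and run_steps, the scratch file names a and c, and the xmlmerge3 tool in the usage note are illustrative assumptions. Only the last step changes when a different merge tool is plugged in; co still does all of the delta handling.

```python
import subprocess
from typing import List, Optional, Sequence, Tuple

# Each step is (argv, file to capture stdout into, or None to leave stdout alone).
Step = Tuple[List[str], Optional[str]]


def merge_steps(rcs_file: str, ancestor: str, contributor: str, working: str,
                merge_tool: Sequence[str] = ("diff3", "-E")) -> List[Step]:
    """Mirror the co/co/diff3 sequence; only the last step names the merge tool."""
    return [
        (["co", f"-p{ancestor}", rcs_file], "a"),     # reconstruct the ancestor
        (["co", f"-p{contributor}", rcs_file], "c"),  # reconstruct the contributor
        ([*merge_tool, working, "a", "c"], None),     # pluggable content merge
    ]


def run_steps(steps: List[Step]) -> int:
    """Run the steps in order; return the merge tool's exit status.

    GNU diff3 exits with 1 when conflicts remain, so that step is not checked.
    """
    status = 0
    for argv, capture in steps:
        if capture is None:
            status = subprocess.run(argv).returncode
        else:
            with open(capture, "w") as out:
                subprocess.run(argv, stdout=out, check=True)
    return status
```

For example, merge_steps("file,v", "1.2", "1.4", "file", ("xmlmerge3",)) would swap in a hypothetical XML-aware tool without touching how the versions are reconstructed.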

There's absolutely no danger to the integrity of the RCS file, unless
someone mucks with the innards of co or ci.  And nobody is even hinting
that making such changes is desirable, at least with respect to the
deltatext phrases in the RCS files.  (There have been several
recommendations to exploit the areas of the rcsfile format that explicitly
permit extensions, but extensions of this nature have absolutely no effect
on RCS' ability to store and reconstruct versions, which I have demonstrated
in a separate message.)

The tools we now have for calculating and handling deltas are all
designed to work _together_, not in isolation of each other, and that
uniformity is as valuable to CVS as it is to RCS alone, if not more so.

What tools, specifically (and I mean, you need to name them and include
pointers to them so that the rest of us can look), are you talking about?
The RCS programs and CVS in its current implementation are the obvious
ones, and my comments withstand scrutiny on those.  What else are you
referring to?

How about you go off and spend the next, say, two years or so
intensively using such a scheme as you propose on a massively huge
variety of projects.  That should give you about 10% of the experience
the rest of the world has with using diff and diff3 and rcsmerge
uniformly for both purposes.

Then if you still think it's wise to use disparate techniques for
storing deltas and for using deltas then you can show your results and
raise your proposal here again.

In the mean time please keep in mind that there are not just a plethora
of tools for using diff-style deltas, but there's also an enormous
amount of human experience with them too.

I look forward to seeing your list of references, so that we can debate
the relative value of interpreting ed-like scripts for a least-common
denominator level of functionality, versus parsing the entire content of
a reconstructed file and applying domain-specific algorithms that
understand the type of data stored there.

You (and a few others) seem to want to throw the baby out with the bath
water, and all just so that a few hare-brained and lame mis-uses of CVS
will work better.  In the mean time if you (and others) had learned to
use the best tool for the job in the first place then you'd never have
had to dream up such a half-baked idea.

CVS has a notoriously poor diff and merge capability.  Integrating the
user-exposed features with better tools is a very good example of using
the best tool for the job.  And it's not a half-baked idea; the whole
idea of plug-ins is well established in the industry, and its feasibility
in CVS is proven.

--- End of forwarded message from [EMAIL PROTECTED]





Re: Smoke, FUD (was Re: CVS corrupts binary files ...)

2004-07-01 Thread Paul Sander
--- Forwarded mail from [EMAIL PROTECTED]

[ On Monday, June 28, 2004 at 14:58:03 (-0700), Mark D. Baushke wrote: ]
 Subject: Re: Smoke, FUD (was Re: CVS corrupts binary files ...) 

 Yes, but diff is not diff3. diff is used for the
 delta format. diff3 is used by rcsmerge, not for
 fundamental version deltas.

I think you're confused -- the differencing algorithms used are
fundamentally intertwined (and fundamentally based on units of lines of
text).

This is true insofar as it concerns maintaining the integrity of the RCS
files and reconstructing complete versions.

Pretending you can do merges using some other algorithm while still
trying to store your deltas in unix diff format is just leading everyone
down the garden path to a dark dank corner no-one really wants to be in.

What do we care what format the versions are stored in, as long as we
can recover the complete files and apply any tool we want to them?

Although I can imagine such a thing, I don't know of any merge tool that
reads the ed-like scripts produced by the diff program and presents
a user interface to apply or omit specific deltas to an input file.
It's an interesting idea, and it might even be useful, but its
utility is limited.

On the other hand, reconstructing entire versions and applying
content-specific tools is far more useful.  For example, there is
research on hierarchical differencing algorithms that compare
tree-like structures like the ones produced by parsers of programming
languages.  I foresee that this will lead to a new wave of merge
tools that provide a much higher level of utility than line-based
tools like diff3.  This kind of work just isn't possible with line-based
deltas produced by the diff program.  (It's also possible that they
could lead to a new wave of archivers that provide RCS-like capability
but use the hierarchical diffs in the deltatext records, which will be
interesting.  But nobody's suggesting a possible replacement of the RCS
layer just yet.)
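As a small illustration of a content-specific merge (a swapped-in technique, much simpler than the hierarchical differencing research mentioned above), the Python sketch below merges three versions of a key/value record per key instead of per line. The function name and the choice to keep "our" value when both sides changed a key differently are assumptions made for the example.

```python
from typing import Dict, List, Tuple

_MISSING = object()  # marks a key that is absent from one version


def merge3_mapping(base: Dict[str, object], ours: Dict[str, object],
                   theirs: Dict[str, object]) -> Tuple[Dict[str, object], List[str]]:
    """Three-way merge of key/value records, compared per key rather than per line."""
    merged: Dict[str, object] = {}
    conflicts: List[str] = []
    for key in sorted(base.keys() | ours.keys() | theirs.keys()):
        b = base.get(key, _MISSING)
        o = ours.get(key, _MISSING)
        t = theirs.get(key, _MISSING)
        if o == t:        # both sides agree (or neither changed the key)
            result = o
        elif o == b:      # only "theirs" changed this key
            result = t
        elif t == b:      # only "ours" changed this key
            result = o
        else:             # both changed it differently: keep ours and report it
            conflicts.append(key)
            result = o
        if result is not _MISSING:  # _MISSING here means the key was deleted
            merged[key] = result
    return merged, conflicts
```

Because the comparison unit is a key rather than a line, reordering entries or reformatting whitespace in a serialized file would not by itself produce conflicts, which is the kind of advantage a domain-specific tool can offer over diff3.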

The uniform use of differencing algorithms and their corresponding merge
algorithms (which are of course just editing scripts), is what makes
it worthwhile to use something like RCS as the foundation for CVS in the
first place.

It's what makes it possible for systems like RCS to exploit the similarity
of sequential versions for efficient storage, to be sure.  But applying
a delta to reconstruct a version is very different from doing a content
merge of two or three fully reconstructed files.

I.e. it is not sufficient to just use the RCS delta format as a means of
archive compression -- that format is integral to the whole idea of
detecting, reporting, and merging, changes in any RCS-compatible tool.

Once again, no one is suggesting changing the way that RCS works.

 Are there really utilities out there that try
 to read RCS formats directly and do not allow for
 rcsfile(5) syntax to be used? If so, could you
 name any of them?

Humans, for one.  :-)

(I know some folks can do manual merges of SCCS files, and though the
same techniques won't work quite so well on RCS files because of the
reverse delta thing, there are still a great many other valid reasons to
read and even repair RCS files by hand.)

There are a number of commercial software packages which are GNU RCS
compatible, apparently without using RCS source code, with the most
popular perhaps being CS-RCS (though I've not confirmed 100% that it
does not use RCS source code).  SourceCodeManager is apparently another,
and P4D yet another.

Perforce also uses RCS compatible files as its archive format, but I'm
not sure if its core RCS handling was derived from RCS source code or not.

I think I've just scratched the surface too, if any of the rumours I've
heard are close to true.

Well, if these tools are truly RCS compatible then they should be able
to ignore the newphrases we've been talking about.  And since there is
no proposition to change the format of the deltatext phrases, or any of
the other standard components of an RCS file, those tools should continue
to work.

BTW, I have also written a couple of tools that parse the RCS file syntax.
They conform to the rcsfile format and should tolerate extensions made as
newphrases as specified.

I have also seen commercial tools derived from RCS (specifically, the MKS
variety) that have made proprietary extensions and are no longer compatible
with the GNU standard.

--- End of forwarded message from [EMAIL PROTECTED]





Re: Smoke, FUD (was Re: CVS corrupts binary files ...)

2004-06-30 Thread Greg A. Woods
[ On Monday, June 28, 2004 at 19:02:19 (-0700), Paul Sander wrote: ]
 Subject: Re: Smoke, FUD (was Re: CVS corrupts binary files ...)

 I have never, ever advocated changing the format of an RCS file in a
 way that would break the ci, co, rcs, or rlog programs.  And although
 I strongly advocate the replacement of user-exposed diff and merge
 tools, I have never, ever advocated the replacement of the diff tool
 that computes the deltas stored in an RCS file.

Indeed -- instead you would rather use different algorithms for storing
deltas and for using them.

That would be just plain stupid, if indeed not eventually dangerous to
the integrity of a repository.

The tools we now have for calculating and handling deltas are all
designed to work _together_, not in isolation of each other, and that
uniformity is as valuable to CVS as it is to RCS alone, if not more so.

How about you go off and spend the next, say, two years or so
intensively using such a scheme as you propose on a massively huge
variety of projects.  That should give you about 10% of the experience
the rest of the world has with using diff and diff3 and rcsmerge
uniformly for both purposes.

Then if you still think it's wise to use disparate techniques for
storing deltas and for using deltas then you can show your results and
raise your proposal here again.

In the mean time please keep in mind that there are not just a plethora
of tools for using diff-style deltas, but there's also an enormous
amount of human experience with them too.

You (and a few others) seem to want to throw the baby out with the bath
water, and all just so that a few hare-brained and lame mis-uses of CVS
will work better.  In the mean time if you (and others) had learned to
use the best tool for the job in the first place then you'd never have
had to dream up such a half-baked idea.



Re: Smoke, FUD (was Re: CVS corrupts binary files ...)

2004-06-30 Thread Greg A. Woods
[ On Tuesday, June 29, 2004 at 02:18:26 (-0700), Paul Sander wrote: ]
 Subject: Re: Smoke, FUD (was Re: CVS corrupts binary files ...)

 I.e. How do you propose to make it possible for the standard RCS tools
 alone to re-create _every_ revision from all files created by this
 hacked system?
 
 Simple:  The delta text would not change.  See above.

It would be extremely short-sighted, if not downright stupid, to not
keep the delta format compatible with that used by the new delta tools.

You seem to have no appreciation whatsoever for the depth and breadth to
which this format (and its easily computed variants) is used and
understood.



Re: Smoke, FUD (was Re: CVS corrupts binary files ...)

2004-06-30 Thread Greg A. Woods
[ On Monday, June 28, 2004 at 14:58:03 (-0700), Mark D. Baushke wrote: ]
 Subject: Re: Smoke, FUD (was Re: CVS corrupts binary files ...) 

 Yes, but diff is not diff3. diff is used for the
 delta format. diff3 is used by rcsmerge, not for
 fundamental version deltas.

I think you're confused -- the differencing algorithms used are
fundamentally intertwined (and fundamentally based on units of lines of
text).

Pretending you can do merges using some other algorithm while still
trying to store your deltas in unix diff format is just leading everyone
down the garden path to a dark dank corner no-one really wants to be in.

The uniform use of differencing algorithms and their corresponding merge
algorithms (which are of course just editing scripts), is what makes
it worthwhile to use something like RCS as the foundation for CVS in the
first place.

I.e. it is not sufficient to just use the RCS delta format as a means of
archive compression -- that format is integral to the whole idea of
detecting, reporting, and merging, changes in any RCS-compatible tool.


 Are there really utilities out there that try
 to read RCS formats directly and do not allow for
 rcsfile(5) syntax to be used? If so, could you
 name any of them?

Humans, for one.  :-)

(I know some folks can do manual merges of SCCS files, and though the
same techniques won't work quite so well on RCS files because of the
reverse delta thing, there are still a great many other valid reasons to
read and even repair RCS files by hand.)

There are a number of commercial software packages which are GNU RCS
compatible, apparently without using RCS source code, with the most
popular perhaps being CS-RCS (though I've not confirmed 100% that it
does not use RCS source code).  SourceCodeManager is apparently another,
and P4D yet another.

Perforce also uses RCS compatible files as its archive format, but I'm
not sure if its core RCS handling was derived from RCS source code or not.

I think I've just scratched the surface too, if any of the rumours I've
heard are close to true.



Re: Smoke, FUD (was Re: CVS corrupts binary files ...)

2004-06-29 Thread Paul Sander
--- Forwarded mail from [EMAIL PROTECTED]

Mark, I agree with your response to Greg's claims about RCS
compatibility, or the lack thereof.

In particular, I am not aware of any fundamental
problems rcs 5.7 will have if someone were to
introduce a new keyword which would name a program
other than diff3 to be used in rcsmerge
operations. At most, I would expect a warning
message via the warnignore() function which would
specify

co: file,v: warning: Unknown phrases like `diff3hint ...;' are present.

and even so, a 'co -q file,v' would not generate
such a message.

So, I believe that adding a

'diff3hint someprogram;'

line to the RCS file should not be a problem for
co to still be able to checkout each and every
version of the file.

Rather than use a hint to expose an implementation detail, I suggest
recording a data type instead.  Maybe even a MIME type.  Then provide
a suitable mechanism to map data types to tools that are appropriate
to the environment.
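A rough Python sketch of such a mapping, assuming the recorded data type is a MIME type string: look for an exact match, then a "type/*" wildcard, then fall back to diff3. The table contents, the tool names (xmlmerge3, imgmerge3) and the ("diff3", "-m") fallback are hypothetical; they are not CVS's actual configuration or the exact arguments CVS passes to diff3.

```python
from typing import Dict, Sequence, Tuple

# Fallback when no tool is registered for a type (assumed: GNU diff3 in merge mode).
DEFAULT_MERGE: Tuple[str, ...] = ("diff3", "-m")


def select_merge_tool(media_type: str,
                      table: Dict[str, Sequence[str]]) -> Tuple[str, ...]:
    """Pick a merge command for a recorded data type: exact match, then type/*, then diff3."""
    bare = media_type.split(";", 1)[0].strip().lower()  # drop parameters such as charset
    major = bare.split("/", 1)[0]
    for key in (bare, f"{major}/*"):
        if key in table:
            return tuple(table[key])
    return DEFAULT_MERGE
```

Keeping the table outside the RCS file is what lets each site (or, as suggested below, each client) choose its own tools for the same recorded type.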

BTW, CVS no longer uses rcsmerge; it co's the necessary versions
and runs diff3 directly.  So in a CVS context, pushing this capability
down to RCS isn't really a requirement.  However, I recognize the
usefulness of doing so, and would not oppose such a feature.  On the
other hand, doing so will likely be a duplication of effort because
CVS has client/server concerns that RCS does not, and that may necessitate
a different implementation.

Given that this would appear to be the desire of
at least a few folks out there who might want to
make CVS do a better job at merging structured
ASCII files such as XML or HTML format. And
further, that you seem to have objections to this
approach. And while I have known you to bring up
points I have overlooked in the past...

Not just structured ASCII files as you describe, but any file
containing structured data for which a merge tool is available.

This time around I just do not see anything that
would preclude such an approach of using an
external diff3 hint 'replacement' program for
doing a 'cvs update -jtag1 -jtag2' operation.

I will stipulate that such a program will likely
need to live on the server and furthermore that it
would not be interactive. In the absence of
finding such a program, CVS would likely resort to
using diff3 as a fallback, so its arguments would
likely need to match those of the diff3 program
itself... at least to the extent that cvs currently
uses various arguments to diff3.

I don't believe that such a program MUST live on the server.
Merge tools, like editors, have a way of becoming religious
icons, in situations where users have a choice.  Under such
circumstances, it becomes important to have client side
mappings between data types and merge tools.

Additionally, I don't believe that merge tools necessarily
need to be fully automated.  After the relevant versions have
been downloaded to the client (and the repository locks have
been cleared), the merge tools can run interactively.
However, I believe that CVS currently intersperses merges with
downloads, and that would need to change before interactive
merges can be supported.
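The ordering described above (download everything while the repository is locked, release the lock, and only then run possibly interactive merges) might be sketched in Python as follows. The fetch, merge and repository_lock callables are hypothetical hooks, not part of the CVS client/server protocol.

```python
from typing import Callable, ContextManager, Dict, Iterable, Tuple

Versions = Tuple[str, str, str]  # (ancestor, contributor, working) contents


def update_then_merge(files: Iterable[str],
                      fetch: Callable[[str], Versions],
                      merge: Callable[[str, str, str, str], str],
                      repository_lock: Callable[[], ContextManager[None]]) -> Dict[str, str]:
    """Download every needed version under the lock, then merge with the lock released."""
    fetched: Dict[str, Versions] = {}
    with repository_lock():
        for name in files:
            fetched[name] = fetch(name)
    # The lock is released before any merge starts, so an interactive tool
    # can take as long as the user needs without blocking other clients.
    return {name: merge(name, *versions) for name, versions in fetched.items()}
```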

Also, CVS currently relies on diff3-style mark-ups to warn the
user when merge conflicts remain present at commit time.  Though
strictly speaking such warnings are not necessary, they are
incredibly useful.  And they'll be lost unless merge conflicts
are recorded another way.  One way is to list conflicts in a
file stored in the CVS directory.  At commit time, skip the
scan for diff3 mark-ups and instead read the conflict list and
compare mod times of the relevant files.  If they have changed,
assume the conflicts have been resolved.
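A minimal Python sketch of that conflict list, assuming a JSON file at CVS/Conflicts (a hypothetical name; CVS has no such file) that stores each conflicting file's modification time in nanoseconds. A file whose mod time is unchanged at commit time is reported as still unresolved; one that was edited or removed is assumed resolved, as the paragraph above suggests.

```python
import json
import os
from pathlib import Path
from typing import Iterable, List

# Hypothetical location for the conflict list inside the sandbox's CVS directory.
CONFLICT_LIST = Path("CVS") / "Conflicts"


def record_conflicts(sandbox: Path, names: Iterable[str]) -> None:
    """After a merge, remember which files still conflict and their mod times."""
    entries = {name: os.stat(sandbox / name).st_mtime_ns for name in names}
    path = sandbox / CONFLICT_LIST
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(entries))


def unresolved_conflicts(sandbox: Path) -> List[str]:
    """At commit time, list recorded files whose mod time has not changed."""
    path = sandbox / CONFLICT_LIST
    if not path.exists():
        return []
    entries = json.loads(path.read_text())
    return sorted(
        name for name, mtime in entries.items()
        # a file that has been modified (or removed) is treated as resolved
        if (sandbox / name).exists() and os.stat(sandbox / name).st_mtime_ns == mtime
    )
```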

Let me state the scope of the thought experiment:

Goal: Provide a means whereby a cvs administrator
may cause a program other than diff3 to be used
when doing merge operations as a part of a
three-way merge of files in a sandbox. This
program might be defined as a keyword used as the
value of a 'diff3hint' followed by an 'id' which
could be looked up in a table that cvs could keep
to determine which executable and any additional
arguments above the diff3 form arguments might be
required.

Again, I think that recording a data type is a more straightforward
(or at least more easily understood) implementation.

Assertion: The diff3 replacement must handle
all of the args that cvs normally passes to diff3.

Yes.

Assertion: The diff3 replacement must not be
interactive in nature for client/server repository
uses.

Well, okay for the first implementation.  :-)

Assertion: The diff3 replacement must be able to
run just given the three versions of the file
without any other state.

Yes, but it would be nice to be able to pass in the version
numbers for column headings or the like, if the tool permits.

Assertion: That cvs continue to write new RCS files
in adherence to the syntax defined in rcsfile(5), but
allowing the introduction of one or more new phrases
and associated id word values as allowed for by the
RCS format syntax.
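The first assertion (accept the arguments normally handed to diff3) and Paul's wish to pass version numbers as column headings could both be met by a replacement that parses a diff3-style command line. The Python sketch below accepts only a subset of GNU diff3's options (-E, -m, -a and repeated -L labels) plus the three file names; which options CVS actually passes is not established here, so treat the subset as an assumption.

```python
import argparse
from typing import List


def parse_diff3_style_args(argv: List[str]) -> argparse.Namespace:
    """Accept the assumed subset of GNU diff3 options: -E, -m, -a, -L LABEL."""
    parser = argparse.ArgumentParser(prog="merge-tool", add_help=False)
    parser.add_argument("-E", dest="show_overlap", action="store_true")
    parser.add_argument("-m", dest="merge", action="store_true")
    parser.add_argument("-a", dest="text", action="store_true")
    # Repeated -L labels, e.g. revision numbers to use as column headings.
    parser.add_argument("-L", dest="labels", action="append")
    parser.add_argument("mine")
    parser.add_argument("older")
    parser.add_argument("yours")
    args = parser.parse_args(argv)
    args.labels = args.labels or []
    return args
```

Combined short flags such as -am are handled by argparse, so a tool built this way would accept the same spelling a diff3 invocation uses.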


Re: Smoke, FUD (was Re: CVS corrupts binary files ...)

2004-06-29 Thread Paul Sander
--- Forwarded mail from [EMAIL PROTECTED]

[ On Monday, June 28, 2004 at 01:44:36 (-0700), Mark D. Baushke wrote: ]
 Subject: Re: Smoke, FUD (was Re: CVS corrupts binary files ...)

 The RCS format is very extensible and in fact the
 CVSNT folks have extended it already and I have had
 no problems using CVSNT repositories in conjunction
 with either CVS or RCS.

"Very" is an overstatement of the first order!  ;-)

Sure, it's an extensible format, but not in the way that's been
suggested.  You can't get rid of the _exclusive_ use of diff et al
without entirely losing compatibility with RCS.

Nobody has suggested abandoning diff for computing RCS deltas.
All discussion relating to replacement of diff and merge tools
has revolved around the user interface.  That's completely
different.

 I do not see support for your assertion that
 compatibility is far more than just the
 adherence to the syntax defined in rcsfile(5).

Sadly rcsfile(5) only describes the meta-syntax, not the nuts and bolts of
how RCS files work and how they're actually used by the RCS package.

 So, I believe that adding a
 
 'diff3hint someprogram;'
 
 line to the RCS file should not be a problem for
 co to still be able to checkout each and every
 version of the file.

diff3hint in the way you're hinting it might be used is insufficient.

RCS directly interprets the content of the delta text information,
e.g. the likes of:

   @d5 1
   a5 1
   some new line of text
   d256 1
   @

See, for example, lib/rcsedit.c from the RCS source distribution.
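For readers who have not looked at lib/rcsedit.c, the following Python sketch shows the idea of applying such a deltatext script; it is not RCS's implementation. It assumes the text has already been split into lines with @@ unescaped, that "dN M" deletes M lines starting at line N, that "aN M" appends the next M script lines after line N, and that line numbers refer to the unedited input and appear in increasing order.

```python
from typing import List


def apply_rcs_delta(base: List[str], script: List[str]) -> List[str]:
    """Apply an RCS-style edit script ("dN M" / "aN M" plus M text lines) to base."""
    out: List[str] = []
    consumed = 0  # how many base lines have been copied or deleted so far
    i = 0
    while i < len(script):
        op = script[i][0]
        start, count = (int(n) for n in script[i][1:].split())
        i += 1
        if op == "d":
            out.extend(base[consumed:start - 1])  # keep lines before the deletion
            consumed = start - 1 + count          # skip the deleted lines
        elif op == "a":
            out.extend(base[consumed:start])      # keep lines up to line `start`
            consumed = start
            out.extend(script[i:i + count])       # then insert the new text
            i += count
        else:
            raise ValueError(f"unknown edit command: {script[i - 1]!r}")
    out.extend(base[consumed:])                  # keep everything after the last edit
    return out
```

Since RCS keeps the trunk head in full and stores reverse deltas for earlier trunk revisions, applying a script such as "d4 1" to the head's text reconstructs its predecessor.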

You are obviously missing something here.  We're talking about
adding a newphrase in the admin, delta, or deltatext productions.
Using the deltatext production and your diff output as an example:

1.1
log
@this is a log message
@
diff3hint use-this-tool;
text
@d5 1
a5 1
some new line of text
d256 1
@

This obviously extends the RCS file format in a way that does
not break compatibility with the existing RCS software.  Following
is a complete RCS file that contains not one but three extensions,
but they're done in a way that is supported by the RCS file format.
And none of the RCS programmatic interfaces break.

head	1.4;
access;
symbols;
locks; strict;
comment @# @;
admin-ext @this is an admin extension.@;


1.4
date	2004.06.29.09.08.54;	author paul;	state Exp;
branches;
next	1.3;

1.3
date	2004.06.29.09.05.20;	author paul;	state Exp;
branches;
next	1.2;
delta-ext @this is a delta extension.@;

1.2
date	2004.06.29.09.04.53;	author paul;	state Exp;
branches;
next	1.1;

1.1
date	2004.06.29.09.04.24;	author paul;	state Exp;
branches;
next;


desc
@Test file.
@


1.4
log
@Added the beep!
@
text
@This is a test.  This is only a test.
If this had been an actual emergency,
it would have been too late.
BEP!
@


1.3
log
@Done!
@
deltatext-ext @this is a deltatext extension.@;
text
@d4 1
@


1.2
log
@First change.  Needs more work.
@
text
@d3 1
@


1.1
log
@Initial revision
@
text
@d2 1
@
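To show why a conforming reader can simply skip such extensions, here is a Python sketch of a tolerant reader for the admin section only. The tokenizer follows the rcsfile(5) lexical rules in simplified form (@strings@ with @@ for a literal @, ';', ':' and bare words); the KNOWN_ADMIN set and the function names are assumptions, and a real tool would also have to parse the delta and deltatext sections. Any phrase it does not recognize, such as admin-ext in the file above, lands in a separate dictionary instead of causing an error.

```python
import re
from typing import Dict, Iterator, List, Tuple

# Simplified rcsfile(5) lexical items: @strings@ (@@ = literal @), ';', ':', bare words.
_TOKEN = re.compile(r"@(?:[^@]|@@)*@|[;:]|[^\s;:@]+")

KNOWN_ADMIN = {"head", "branch", "access", "symbols", "locks", "strict",
               "comment", "expand"}


def tokens(text: str) -> Iterator[Tuple[str, str]]:
    for match in _TOKEN.finditer(text):
        tok = match.group(0)
        if tok.startswith("@"):
            yield "string", tok[1:-1].replace("@@", "@")
        elif tok in (";", ":"):
            yield "punct", tok
        else:
            yield "word", tok


def admin_phrases(text: str) -> Tuple[Dict[str, List[str]], Dict[str, List[str]]]:
    """Split the admin section into standard phrases and unknown newphrases."""
    known: Dict[str, List[str]] = {}
    unknown: Dict[str, List[str]] = {}
    toks = list(tokens(text))
    i = 0
    while i < len(toks):
        kind, key = toks[i]
        # The admin section ends at the first delta (a revision number) or at desc.
        if kind != "word" or key[0].isdigit() or key == "desc":
            break
        j = i + 1
        values: List[str] = []
        while j < len(toks) and toks[j] != ("punct", ";"):
            values.append(toks[j][1])
            j += 1
        (known if key in KNOWN_ADMIN else unknown)[key] = values
        i = j + 1  # step past the ';'
    return known, unknown
```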

Any modification of the diff algorithm would almost certainly require
changes to the syntax of this delta text.

Actually, this isn't true.  The diff program itself implements multiple
algorithms.  But that's neither here nor there because nobody is
recommending that the format of the differences be changed.

As far as I can tell the extensibility of the RCS,v syntax does not go
so far as to provide for callouts to add-on programs and I'm arguing
that it's _far_ too late to try to modify this widely used standard file
format now.

It's never too late to update a standard.  In any case, RCS file
extensibility has been in the standard for a very long time now.

So, how _exactly_ do you propose to convince the standard co program
(or the equivalent in any other RCS-compatible tool suite, including the
current CVS implementations) to actually make use of the new delta
text syntax that such a hack would create?

I.e. How do you propose to make it possible for the standard RCS tools
alone to re-create _every_ revision from all files created by this
hacked system?

Simple:  The delta text would not change.  See above.

It's simply not possible.  Like I said, only the bare surface of RCS
compatibility is scratched by the meta-syntax described in rcsfile(5).

Absolutely untrue, as demonstrated by the RCS file above.

The RCS file format is intricately intertwined with the unix diff
algorithm, which is itself tightly dependent on the normal use of
lines of text to represent elements of the source files being managed

This much is true.

(at least when it comes to automated merging for concurrent editing).

But that is not.

The diff and merge algorithms that the user applies are distinct from
the diff algorithm that computes the deltas.  They can be changed, as
I've demonstrated before on several occasions.  It so happens that
the rcsdiff and rcsmerge programs, and CVS itself, use the same
programs for both purposes, but there is no technical

Re: Smoke, FUD (was Re: CVS corrupts binary files ...)

2004-06-29 Thread Mark D. Baushke

Paul Sander [EMAIL PROTECTED] writes:

 --- Forwarded mail from [EMAIL PROTECTED]
 
 Rather than use a hint to expose an
 implementation detail, I suggest recording a
 data type instead. Maybe even a MIME type. Then
 provide a suitable mechanism to map data types
 to tools that are appropriate to the
 environment.

I have no fundamental objection to saving the MIME
type. I suggest that it may need to be inside of a
string to pass the syntax of rcsfile(5). I would
actually suggest that it might be useful to just
borrow both of the MIME media-type and charset
concepts. That might allow for a 

  media-type text/plain;
  charset ks_c_5601-1987;

on a given file... the defaults should probably
be text/plain and iso-8859-1 or utf-8
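If the values do have to be strings, writing them is straightforward. A Python sketch, assuming the hypothetical media-type and charset keywords from the message above: rcsfile(5) strings are delimited by @, with any embedded @ doubled.

```python
def rcs_string(value: str) -> str:
    """Quote a value as an rcsfile(5) string: wrap it in @ and double any embedded @."""
    return "@" + value.replace("@", "@@") + "@"


def newphrase(keyword: str, *values: str) -> str:
    """Format a newphrase such as `media-type @text/plain@;` for an RCS admin section."""
    return " ".join([keyword, *(rcs_string(v) for v in values)]) + ";"
```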

 BTW, CVS no longer uses rcsmerge; it co's the
 necessary versions and runs diff3 directly. So
 in a CVS context, pushing this capability down
 to RCS isn't really a requirement. However, I
 recognize the usefulness of doing so, and would
 not oppose such a feature. On the other hand,
 doing so will likely be a duplication of effort
 because CVS has client/server concerns that RCS
 does not, and that may necessitate a different
 implementation.

Yes, I am aware that CVS no longer uses rcsmerge.
However, Greg was suggesting that RCS
compatibility would be broken by an extension such
as the one outlined in the thought experiment I
provided, so I felt it reasonable to mention how
RCS itself used diff3 in the past.

 Given that this would appear to be the desire of
 at least a few folks out there who might want to
 make CVS do a better job at merging structured
 ASCII files such as XML or HTML format. And
 further, that you seem to have objections to this
 approach. And while I have known you to bring up
 points I have overlooked in the past...
 
 Not just structured ASCII files as you describe,
 but any file containing structured data for
 which a merge tool is available.

Ahh, but I am not really trying to suggest that
binary files are suitable in the general case
for CVS control. That is a separate argument.

That said, I suppose that a merge utility that
understands how to merge a file containing lines
in a non-ISO-LATIN character set might also fall
into the category of a diff3 replacement and that
such files might be considered 'binary' by some
programs.

 This time around I just do not see anything that
 would preclude such an approach of using an
 external diff3 hint 'replacement' program for
 doing a 'cvs update -jtag1 -jtag2' operation.
 
 I will stipulate that such a program will likely
 need to live on the server and furthermore that it
 would not be interactive. In the absence of
 finding such a program, CVS would likely resort to
 using diff3 as a fallback, so its arguments would
 likely need to match those of the diff3 program
 itself... at least to the extent that cvs currently
 uses various arguments to diff3.
 
 I don't believe that such a program MUST live on
 the server.

The changes needed to allow the client-side to do
a merge are very large. I am not willing to
stipulate an implementation that would allow CVS
to deal with an interactive merge operation for a
random 'cvs update' command. The repository would
have a lock open for too long in that case.

 Merge tools, like editors, have a way of
 becoming religious icons, in situations where
 users have a choice. Under such circumstances,
 it becomes important to have client side
 mappings between data types and merge tools.

Your arguments almost help to make a case in
Greg's favor against allowing a diff3 replacement.

The kind of flexibility you desire is not
something that I think makes sense to bolt into
the 'diff3' slot.

What you propose would potentially best be handled
with an entirely new kind of update paradigm.
Possibly the use of a CVS/Base/file file and a
'patch' that would bring CVS/Base/file up to the
latest version would be 'better' in this case...

 Additionally, I don't believe that merge tools
 necessarily need to be fully automated.

Here we do not agree. Without such automation,
lock contention on directories could get very
intense.

 After the relevant versions have been downloaded
 to the client (and the repository locks have
 been cleared), the merge tools can run
 interactively. However, I believe that CVS
 currently intersperses merges with downloads, and
 that would need to change before interactive
 merges can be supported.

The current CVS operations all occur on the server
side prior to downloading patches to the client.

What you are suggesting is a fairly major overhaul
to the cvs client/server protocol and as such
there is probably a 'better' way to deal with this
than a 'simple' alternative table of diff3-style
programs to do alternative merger algorithms.

 Also, CVS currently relies on diff3-style
 mark-ups to warn the user when merge conflicts
 remain present at commit time.

Yes, I should have stated that a failed merger
will 

Re: Smoke, FUD (was Re: CVS corrupts binary files ...)

2004-06-29 Thread Paul Sander
--- Forwarded mail from [EMAIL PROTECTED]

Paul Sander [EMAIL PROTECTED] writes:

 --- Forwarded mail from [EMAIL PROTECTED]

 Rather than use a hint to expose an
 implementation detail, I suggest recording a
 data type instead. Maybe even a MIME type. Then
 provide a suitable mechanism to map data types
 to tools that are appropriate to the
 environment.

I have no fundamental objection to saving the MIME
type. I suggest that it may need to be inside of a
string to pass the syntax of rcsfile(5). I would
actually suggest that it might be useful to just
borrow both of the MIME media-type and charset
concepts. That might allow for a

  media-type text/plain;
  charset ks_c_5601-1987;

on a given file... the defaults should probably
be text/plain and iso-8859-1 or utf-8
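A minimal sketch of how a client might read such phrases back out of an RCS admin section. It assumes the hypothetical phrase names `media-type` and `charset` from the example above, and picks UTF-8 as the default charset (one of the two defaults suggested). A value is accepted either as a bare word or as an RCS `@string@`, where `@@` stands for a literal `@`. The function name `read_type_phrases` is made up for illustration, and a real parser would also have to cope with phrases that span lines.

```python
import re

# Hypothetical admin-section phrases; defaults follow the suggestion above,
# with UTF-8 chosen as the default charset.
DEFAULTS = {"media-type": "text/plain", "charset": "utf-8"}

# A phrase value may be a bare word or an RCS @string@ ("@@" is a literal "@").
PHRASE = re.compile(
    r"^(media-type|charset)\s+(@(?:[^@]|@@)*@|[^\s;]+)\s*;", re.MULTILINE
)


def read_type_phrases(admin_text):
    """Return the media type and charset recorded in an admin section."""
    found = dict(DEFAULTS)
    for name, value in PHRASE.findall(admin_text):
        if value.startswith("@"):
            # Strip the @...@ quoting and undo the @@ escaping.
            value = value[1:-1].replace("@@", "@")
        found[name] = value
    return found
```

Treating each phrase independently means a file can carry a media type with no charset, and simply inherits the default.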

Do you propose that the media-type be valid on its own, for data
types where charsets have no meaning?  Or put another way, is
the charset solely to provide additional processing hints to supplement
the media-type, or is the charset also required?

 Given that this would appear to be the desire of
 at least a few folks out there who might want to
 make CVS do a better job at merging structured
 ASCII files such as XML or HTML format. And
 further, that you seem to have objections to this
 approach. And while I have known you to bring up
 points I have overlooked in the past...
 
 Not just structured ASCII files as you describe,
 but any file containing structured data for
 which a merge tool is available.

Ahh, but I am not really trying to suggest that
binary files are suitable in the general case
for CVS control. That is a separate argument.

Fair enough, but the practice is more common than anyone wants to
admit.  The issue must be faced at some point.

That said, I suppose that a merge utility that
understands how to merge a file containing lines
in a non-ISO-LATIN character set might also fall
into the category of a diff3 replacement and that
such files might be considered 'binary' by some
programs.

Indeed.

 This time around I just do not see anything that
 would preclude such an approach of using an
 external diff3 hint 'replacement' program for
 doing a 'cvs update -jtag1 -jtag2' operation.
 
 I will stipulate that such a program will likely
 need to live on the server and furthermore that it
 would not be interactive. In the absence of
 finding such a program, CVS would likely resort to
 using diff3 as a fallback, so its arguments would
 likely need to match those of the diff3 program
 itself... at least to the extent that cvs currently
 uses various arguments to diff3.
 
 I don't believe that such a program MUST live on
 the server.

The changes needed to allow the client-side to do
a merge are very large. I am not willing to
stipulate an implementation that would allow CVS
to deal with an interactive merge operation for a
random 'cvs update' command. The repository would
have a lock open for too long in that case.

Yes, to avoid long-lived locks, the necessary files must be
copied to the client before the merge begins.  This would
involve a significant change to the client, but I'm not
convinced that it would be a significant change to the server.
The server already has the ability to send whole revisions
to the client, and it need not be involved with the merge
once it starts.

 Merge tools, like editors, have a way of
 becoming religious icons, in situations where
 users have a choice. Under such circumstances,
 it becomes important to have client side
 mappings between data types and merge tools.

Your arguments almost help to make a case in
Greg's favor against allowing a diff3 replacement.

Horrors!  I sure hope not!  :-)

The kind of flexibility you desire is not
something that I think makes sense to bolt into
the 'diff3' slot.

Then bolt in a wrapper that reads the user's environment
and invokes a suitable merge tool based on preferences
that are found there.  And provide a default, like diff3,
if such information is missing.
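One way such a wrapper could look, as a sketch only. The environment variable `CVS_MERGE_TOOLS`, its `type=template;type=template` syntax, and the `xmlmerge` tool in the usage below are all hypothetical. When no preference matches, it falls back to `diff3 -m MINE OLDER YOURS`, which is GNU diff3's merged-output form.

```python
import os
import shlex

# Fallback when the user has expressed no preference for a data type.
DEFAULT_TEMPLATE = "diff3 -m {mine} {older} {yours}"


def load_preferences(env=None):
    """Parse the hypothetical CVS_MERGE_TOOLS variable into {type: template}."""
    env = os.environ if env is None else env
    prefs = {}
    for entry in env.get("CVS_MERGE_TOOLS", "").split(";"):
        if "=" in entry:
            media_type, template = entry.split("=", 1)
            prefs[media_type.strip()] = template.strip()
    return prefs


def merge_argv(media_type, mine, older, yours, env=None):
    """Build the command line for merging one file of the given type."""
    template = load_preferences(env).get(media_type, DEFAULT_TEMPLATE)
    # Split the template first, then substitute, so file names that contain
    # spaces remain single arguments.
    return [arg.format(mine=mine, older=older, yours=yours)
            for arg in shlex.split(template)]
```

For example, with `CVS_MERGE_TOOLS="text/xml=xmlmerge --base {older} {mine} {yours}"`, an XML file would go to `xmlmerge` and everything else would go to diff3.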

What you propose would potentially best be handled
with an entirely new kind of update paradigm.
Possibly the use of a CVS/Base/file file and a
'patch' that would bring CVS/Base/file up to the
latest version would be 'better' in this case...

Whatever's most efficient to get the other contributor
and common ancestor to the client.  Clean-up needs to
be considered as well.

 Additionally, I don't believe that merge tools
 necessarily need to be fully automated.

Here we do not agree. Without such automation,
lock contention on directories could get very
intense.

Again, running the merge after relevant data have been
copied out and freeing the locks would remove this
issue.

Actually, the ancestor and contributor are checked-in
versions, and they're known in advance either by version
number or branch/timestamp.  Correct me if I'm wrong here.
If this really is true, then directory locks aren't even
needed in the repository.
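A sketch of the copy-out-then-merge flow, assuming the two revisions are already known by number or tag. Each input is fetched with `cvs update -p -r REV FILE`, which writes that revision to standard output without touching the working file. The repository is therefore read only for the duration of each short fetch. The scratch-file naming (`FILE.ancestor`, `FILE.contributor`) and the function names are hypothetical. The merge step itself would be whatever tool the client selects.

```python
import subprocess
from dataclasses import dataclass


@dataclass
class MergeInputs:
    path: str             # working file, e.g. "foo.c"
    ancestor_rev: str     # common ancestor, e.g. "1.4"
    contributor_rev: str  # other side, e.g. "1.4.2.3" or a tag


def fetch_plan(m):
    """Return (argv, scratch_file) pairs that copy both inputs to the client."""
    plan = []
    for role, rev in (("ancestor", m.ancestor_rev),
                      ("contributor", m.contributor_rev)):
        scratch = f"{m.path}.{role}"  # hypothetical naming scheme
        plan.append((["cvs", "-Q", "update", "-p", "-r", rev, m.path], scratch))
    return plan


def run_fetches(plan):
    """Run each fetch; the repository is only touched while cvs is running."""
    for argv, scratch in plan:
        with open(scratch, "wb") as out:
            subprocess.run(argv, stdout=out, check=True)
    # From here on, a merge (interactive or not) works on local files only.
    # The caller is responsible for deleting these scratch files afterwards.
    return [scratch for _, scratch in plan]
```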

This specific issue has been discussed in this forum
once before, 

Re: Smoke, FUD (was Re: CVS corrupts binary files ...)

2004-06-28 Thread Greg A. Woods
[ On Monday, June 28, 2004 at 01:44:36 (-0700), Mark D. Baushke wrote: ]
 Subject: Re: Smoke, FUD (was Re: CVS corrupts binary files ...)

 The RCS format is very extensible and in fact the
 CVSNT folks have extended it already and I have had
 no problems using CVSNT repositories in conjunction
 with either CVS or RCS.

"Very" is an overstatement of the first order!  ;-)

Sure, it's an extensible format, but not in the way that's been
suggested.  You can't get rid of the _exclusive_ use of diff et al
without entirely losing compatibility with RCS.

 I do not see support for your assertion that
 compatibility is far more than just the
 adherence to the syntax defined in rcsfile(5).

Sadly rcsfile(5) only describes the meta-syntax, not the nuts and bolts of
how RCS files work and how they're actually used by the RCS package.

 So, I believe that adding a
 
 'diff3hint someprogram;'
 
 line to the RCS file should not be a problem for
 co to still be able to checkout each and every
 version of the file.

diff3hint in the way you're hinting it might be used is insufficient.

RCS directly interprets the content of the delta text information,
e.g. the likes of:

  @d5 1
  a5 1
  some new line of text
  d256 1
  @

See, for example, lib/rcsedit.c from the RCS source distribution.
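To make the delta-text format concrete, here is a minimal sketch of applying one such delta to a list of lines. It makes the following assumptions:

* The text has already been taken out of its `@...@` quoting.
* `dN M` deletes M lines starting at line N.
* `aN M` appends the M lines that follow it after line N.
* All line numbers refer to the revision being edited.

It is an illustration of the format only, not the code in rcsedit.c.

```python
def apply_rcs_delta(base_lines, delta_text):
    """Apply RCS-style "dN M" / "aN M" delta commands to base_lines."""
    cmds = delta_text.splitlines()
    out = []
    pos = 0  # number of base lines already copied or deleted
    i = 0
    while i < len(cmds):
        cmd = cmds[i]
        op = cmd[0]
        start, count = (int(field) for field in cmd[1:].split())
        if op == "d":
            # Copy up to line start-1, then skip `count` base lines.
            out.extend(base_lines[pos:start - 1])
            pos = start - 1 + count
            i += 1
        elif op == "a":
            # Copy through base line `start`, then insert the next `count` lines.
            out.extend(base_lines[pos:start])
            pos = start
            out.extend(cmds[i + 1:i + 1 + count])
            i += 1 + count
        else:
            raise ValueError(f"unknown delta command: {cmd!r}")
    out.extend(base_lines[pos:])  # whatever follows the last command
    return out
```

Note that the added text is consumed by count, not by inspection. A new line that happens to begin with `d` or `a` is therefore never mistaken for a command.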

Any modification of the diff algorithm would almost certainly require
changes to the syntax of this delta text.

As far as I can tell the extensibility of the RCS,v syntax does not go
so far as to provide for callouts to add-on programs and I'm arguing
that it's _far_ too late to try to modify this widely used standard file
format now.

So, how _exactly_ do you propose to convince the standard co program
(or the equivalent in any other RCS-compatible tool suite, including the
current CVS implementations) to actually make use of the new delta
text syntax that such a hack would create?

I.e. How do you propose to make it possible for the standard RCS tools
alone to re-create _every_ revision from all files created by this
hacked system?

It's simply not possible.  Like I said, only the bare surface of RCS
compatibility is scratched by the meta-syntax described in rcsfile(5).

The RCS file format is intricately intertwined with the unix diff
algorithm, which is itself tightly dependent on the normal use of
lines of text to represent elements of the source files being managed
(at least when it comes to automated merging for concurrent editing).

Meanwhile there are other change delta file formats and other version
tracking tools that use those other formats, and often there are also
tools that will convert RCS/CVS repositories into those other formats.
I.e. there's no _real_ fundamental need to hack on RCS,v syntax.

-- 
Greg A. Woods

+1 416 218-0098  VE3TCPRoboHack [EMAIL PROTECTED]
Planix, Inc. [EMAIL PROTECTED]  Secrets of the Weird [EMAIL PROTECTED]




Re: Smoke, FUD (was Re: CVS corrupts binary files ...)

2004-06-28 Thread Mark D. Baushke

Greg A. Woods [EMAIL PROTECTED] writes:

 [ On Monday, June 28, 2004 at 01:44:36 (-0700), Mark D. Baushke wrote: ]
  Subject: Re: Smoke, FUD (was Re: CVS corrupts binary files ...)
 
  The RCS format is very extensible and in fact the
  CVSNT folks have extended it already and I have had
  no problems using CVSNT repositories in conjunction
  with either CVS or RCS.
 
 "Very" is an overstatement of the first order!  ;-)

Agreed. :-)

 Sure, it's an extensible format, but not in the
 way that's been suggested. You can't get rid of
 the _exclusive_ use of diff et al without
 entirely losing compatibility with RCS.

Yes, but diff is not diff3. diff is used for the
delta format. diff3 is used by rcsmerge, not for
fundamental version deltas.
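To illustrate the distinction, here is a sketch that produces a two-way delta in the same `dN M` / `aN M` form. GNU diff can emit this format itself with `diff -n` (`--rcs`). The sketch substitutes Python's `difflib.SequenceMatcher` for diff's own algorithm, so its hunks may differ from the ones diff would choose. Either way, only two inputs are ever compared, and no three-way merge is involved.

```python
import difflib


def rcs_delta(old_lines, new_lines):
    """Return RCS-style delta text that turns old_lines into new_lines."""
    out = []
    matcher = difflib.SequenceMatcher(None, old_lines, new_lines, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            # Delete old lines i1+1 .. i2 (1-based numbering of old_lines).
            out.append(f"d{i1 + 1} {i2 - i1}")
        if tag in ("insert", "replace"):
            # Append the new lines after old line i2.
            out.append(f"a{i2} {j2 - j1}")
            out.extend(new_lines[j1:j2])
    return "\n".join(out)
```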

  I do not see support for your assertion that
  compatibility is far more than just the
  adherence to the syntax defined in rcsfile(5).
 
 Sadly rcsfile(5) only describes the meta-syntax,
 not the nuts and bolts of how RCS files work and how
 they're actually used by the RCS package.

True, but examination of the rcs sources (or cvs
sources) can help you a lot.

  So, I believe that adding a
  
  'diff3hint someprogram;'
  
  line to the RCS file should not be a problem
  for co to still be able to checkout each and
  every version of the file.
 
 diff3hint in the way you're hinting it might
 be used is insufficient.

Why?

 RCS directly interprets the content of the delta
 text information, e.g. the likes of:
 
   @d5 1
   a5 1
   some new line of text
   d256 1
   @
 
 See, for example, lib/rcsedit.c from the RCS
 source distribution.

Yes, and that is the concern of 'diff' NOT 'diff3'.

My assumptions explicitly did NOT address any
requirements other than that a 'diff3' replacement
be used. Where did your assertion that this requires
'diff' to be changed arise?

 Any modification of the diff algorithm would
 almost certainly require changes to the syntax
 of this delta text.

I did not suggest modification of the diff format.
I suggested modification of the diff3 program to
be used.

 As far as I can tell the extensibility of the
 RCS,v syntax does not go so far as to provide
 for callouts to add-on programs and I'm arguing
 that it's _far_ too late to try to modify this
 widely used standard file format now.

With existing RCS, you may compile it to use
DIFF3_BIN as any path you wish. There is nothing
to guarantee that the diff3 does what the GNU
diff3 program did...
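A sketch of what such a drop-in program might do. It assumes the caller passes GNU diff3-style options (`-E`, `-A`, `-a`, `-m` and up to three `-L` labels), followed by MINE, OLDER and YOURS. Which of these options a particular RCS or CVS build actually passes is not established here. The extension-to-tool table and `xmlmerge` are hypothetical. Keying on the file extension is only a stand-in for real type information. Any replacement tool would also have to produce output in the form the caller expects from diff3.

```python
import argparse
import os

# Hypothetical table of specialised merge tools, keyed by file extension.
TOOLS = {".xml": "xmlmerge"}


def parse_diff3_args(argv):
    """Accept the diff3-style options assumed above (combined flags like -am work)."""
    p = argparse.ArgumentParser(prog="diff3-wrapper", add_help=False)
    for flag in ("-E", "-A", "-a", "-m"):
        p.add_argument(flag, action="store_true")
    p.add_argument("-L", action="append", dest="labels")
    p.add_argument("files", nargs=3)  # MINE OLDER YOURS
    ns = p.parse_args(argv)
    ns.labels = ns.labels or []
    return ns


def choose_command(argv):
    """Pick a specialised tool if one is registered, else defer to diff3."""
    ns = parse_diff3_args(argv)
    mine, older, yours = ns.files
    tool = TOOLS.get(os.path.splitext(mine)[1])
    if tool is None:
        # No specialised tool: pass the caller's arguments through unchanged.
        return ["diff3", *argv]
    return [tool, mine, older, yours]
```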

 So, how _exactly_ do you propose to convince the
 standard co program (or the equivalent in any
 other RCS-compatible tool suite, including the
 current CVS implementations) to actually make
 use of the new delta text syntax that such a
 hack would create?

I propose that co use diff just as it has
always done.

I am not proposing any change to the delta
structure at all. 

The thought experiment is proposing a change in
the function called to do three way diff and merge
operations.

 I.e. How do you propose to make it possible for
 the standard RCS tools alone to re-create
 _every_ revision from all files created by this
 hacked system?

What I suggested does not require this.

 It's simply not possible.

You say this, but are assuming facts that were not
supported. Why does a change to 'diff3' for merge
operations imply or require a change to 'diff' for
everything else?

 Like I said, only the bare surface of RCS
 compatibility is scratched by the meta-syntax
 described in rcsfile(5).

Why or how would a change in diff3 impact delta
formats for RCS? The DIFF3 binary is used only in
rcs-5.7/src/merger.c and plays no direct role in
checkout or commit of RCS files.

 The RCS file format is intricately intertwined
 with the unix diff algorithm, 

Actually, I suspect this to be false. I believe
the RCS delta section format is intertwined with
the ed(1) command format.

 which is itself tightly dependent on the
 normal use of lines of text to represent
 elements of the source files being managed (at
 least when it comes to automated merging for
 concurrent editing).

And all of that is not material to the current
thought experiment.

 Meanwhile there are other change delta file
 formats and other version tracking tools that
 use those other formats, and often there are
 also tools that will convert RCS/CVS
 repositories into those other formats. I.e.
 there's no _real_ fundamental need to hack on
 RCS,v syntax.

Hmmm... I may have missed something completely.
Most of the ones I have seen (the CVS to ClearCase
script for example) use the RCS commands to
checkout a file, the log message, and the date and
author information and then checkin that file to
their own system.

Are there really utilities out there that try to
read RCS formats directly and do not allow for
rcsfile(5) syntax to be used? If so, could you
name any of them?

If such do exist, there would probably need to be
a utility to strip out the 'diff3hint ...;' lines
from

Re: Smoke, FUD (was Re: CVS corrupts binary files ...)

2004-06-28 Thread Paul Sander
--- Forwarded mail from Greg Woods:

[ On Thursday, June 17, 2004 at 16:49:42 (-0700), Paul Sander wrote: ]
 Subject: Smoke, FUD (was Re: CVS corrupts binary files ...)

 If this is true, then we're in violent agreement.  But to date, you have
 argued that making the necessary changes to CVS to give better support
 for data types not handled well specifically by the diff and diff3 programs
 would break compatibility with RCS, which is demonstrably false.

Have you not looked at the content of an RCS file lately Paul?

RCS compatibility is far more than just the adherence to the syntax
defined in rcsfile(5).  If the generic co program from the RCS package
cannot extract any and every revision of a file from a file claiming to
be an RCS file then that file is clearly not RCS compatible.

I have never, ever advocated changing the format of an RCS file in a
way that would break the ci, co, rcs, or rlog programs.  And although
I strongly advocate the replacement of user-exposed diff and merge
tools, I have never, ever advocated the replacement of the diff tool
that computes the deltas stored in an RCS file.

(That is not to say that I have never suggested making incompatible
changes, but in context such suggestions have always carried caveats
and recognized that losing a valuable feature is undesirable.)

I don't know where you seem to be getting the idea that I'm recommending
doing a global search and replace of diff with some other tool.  That
is clearly not the case.  The RCS file format must be retained, unless
we as a group decide to abandon it after weighing the consequences.

However, I do advocate extending the RCS file format in ways that
the RCS API can accommodate.  The rcsfile(5) manual specifically allows
for extensions in the admin and delta sections of the file.  For
example, I do recommend using a newphrase in the admin section to identify
the type of data stored in the file, but not until the rename problem is
solved.

 How am I spreading Fear, Uncertainty, or Doubt? 

Maybe hypocrisy would be a better description of your approach to CVS.

I don't believe I'm misrepresenting any of my beliefs about CVS or SCM
in general.  I've tried very hard to explain them clearly, and I've tried
especially hard to drill them into that rock that you carry on your
shoulders, but I'm obviously using the wrong screwdriver.

--- End of forwarded message from [EMAIL PROTECTED]


