RE: merge mode for XML

2002-05-15 Thread Sean Hager

This thread is a die hard, but it is still the best conversation on the list
;)

sean.


 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED]]On Behalf Of
 Peter Ring
 Sent: Tuesday, May 14, 2002 7:16 PM
 To: [EMAIL PROTECTED]
 Subject: RE: merge mode for XML
 Importance: Low


 A paper that will interest you:

 (preliminary version)
 http://citeseer.nj.nec.com/cache/papers/cs/15339/http:zSzzSzwww.cs.arizona.eduzSzpeoplezSztodszSzacceptedzSz2000zSzParsonsEmancipating.pdf/parsons00emancipating.pdf

 (published)
 http://portal.acm.org/citation.cfm?id=357778coll=portaldl=ACMCFID=2131136CFTOKEN=70981949

 Abstract:
 Database design commonly assumes, explicitly or implicitly, that instances
 must belong to classes. This can be termed the assumption of inherent
 classification. We argue that the extent and complexity of problems in
 schema integration, schema evolution, and interoperability are, to a large
 extent, consequences of inherent classification. Furthermore, we make the
 case that the assumption of inherent classification violates philosophical
 and cognitive guidelines on classification and is, therefore, inappropriate
 in view of the role of data modeling in representing knowledge about
 application domains.

 Also, a search for 'semantic interoperability' should return some
 interesting hits.

 To tell the difference between two (or three) sequences of bytes is not too
 difficult; comparing two sequences A and B to determine their longest common
 subsequence (LCS) or the edit distance between them has been much studied.
 GNU diff is based on an algorithm published by Eugene W. Myers in 1986.

 To tell the difference (distance) between two semantic structures is
 difficult in a very fundamental way.

 Kind regards
 Peter Ring


 -Original Message-
 From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED]]On Behalf Of
Glew, Andy
Sent: 13. maj 2002 19:32
To: [EMAIL PROTECTED]; Glew, Andy
Cc: Gary Bisaga
Subject: RE: merge mode for XML


  Motivation: schema changes in most existing relational databases are
  onerous.

 For very good reason.

And what is that reason?

OK, I admit that some RDBMS applications in production
need stability - just like some systems software applications
(the kind Greg seems to work on, the kind I used to
work on) value stability above all else, and actively
want to make it hard to change things.

However, there are other application domains
- in programming, the domains attacked by agile
methodologies like XP (eXtreme Programming).
{Donning asbestos underwear, expecting Greg
to flame.}

An application area that I frequently work in nowadays
is experimental databases - databases for experimental data.
I want to archive all of my experimental data in a form that
allows me to do arbitrary SQL-like queries over it.

Problem is, as I continue my research, the format of
my records is continually changing.  For example, a few years
ago I might have recorded CPU MHz and Cache Size as
configuration parameters - now I have to record at least
3 different cache sizes, as well as multiple clock domain
frequencies. Not to mention that the observations that
I record are constantly changing.
Rather than continually reformatting my database,
adding new fields which are Unknown or Null on old data,
I find it easier to add records containing fields that were not
known earlier.

<snip />


___
Info-cvs mailing list
[EMAIL PROTECTED]
http://mail.gnu.org/mailman/listinfo/info-cvs






RE: merge mode for XML

2002-05-15 Thread Greg A. Woods

[ On Monday, May 13, 2002 at 10:31:36 (-0700), Glew, Andy wrote: ]
 Subject: RE: merge mode for XML

   Motivation: schema changes in most existing relational databases are
   onerous.
  
  For very good reason.
 
 And what is that reason?

Because an RDBMS cannot understand the semantic qualities of your data
unless you describe them to it -- they are, by definition, not intuitive.

 An application area that I frequently work in nowadays
 is experimental databases - databases for experimental data.
 I want to archive all of my experimental data in a form that
 allows me to do arbitrary SQL-like queries over it.

I think you really need to look at flat text files.  IIRC there are SQL
engines available that will access them, with roughly comparable
performance expectations, though in my book AWK will do as well.
Even if you have millions of results per table, AWK will do just fine.

 Problem is, as I continue my research, the format of
 my records is continually changing.

Or maybe OODBMS technology is more what you need.  But that may cost
you more than using AWK.

 I've tried to do this in a traditional RDBMS database.
 I've asked database experts like deWitt and the guy who
 invented transactions whose name I can't remember now...
 and the answer always comes that the traditional RDBMS way
 is to create a database in fully normalized form,
 of the form Experiment#:Metric:Value.
 Worse, it may be necessary to create several different tables
 for each type.  It is impossible for ordinary humans
 to write queries in such a form.

Heh.  Yes, you do need an OODBMS.

 Yet, self-schematization makes it trivial to do.

Perhaps only in an OODBMS.

 Well, deWitt is the big advocate of ORDBMS
 - Object Relational DBMS.

Hmm.  I've not been a big fan of ORDBMS, but then I'm not a huge
DBMS user.  I don't know what the tradeoffs are w.r.t. a true OODBMS.

-- 
Greg A. Woods

+1 416 218-0098;  [EMAIL PROTECTED];  [EMAIL PROTECTED];  [EMAIL PROTECTED]
Planix, Inc. [EMAIL PROTECTED]; VE3TCP; Secrets of the Weird [EMAIL PROTECTED]




RE: merge mode for XML

2002-05-14 Thread Glew, Andy

  Motivation: schema changes in most existing relational databases are
  onerous.
 
 For very good reason.

And what is that reason?

OK, I admit that some RDBMS applications in production
need stability - just like some systems software applications
(the kind Greg seems to work on, the kind I used to
work on) value stability above all else, and actively
want to make it hard to change things.

However, there are other application domains
- in programming, the domains attacked by agile
methodologies like XP (eXtreme Programming).
{Donning asbestos underwear, expecting Greg
to flame.}

An application area that I frequently work in nowadays
is experimental databases - databases for experimental data.
I want to archive all of my experimental data in a form that
allows me to do arbitrary SQL-like queries over it.

Problem is, as I continue my research, the format of
my records is continually changing.  For example, a few years
ago I might have recorded CPU MHz and Cache Size as 
configuration parameters - now I have to record at least
3 different cache sizes, as well as multiple clock domain 
frequencies. Not to mention that the observations that
I record are constantly changing.
Rather than continually reformatting my database,
adding new fields which are Unknown or Null on old data,
I find it easier to add records containing fields that were not
known earlier.

I've tried to do this in a traditional RDBMS database.
I've asked database experts like deWitt and the guy who
invented transactions whose name I can't remember now...
and the answer always comes that the traditional RDBMS way
is to create a database in fully normalized form,
of the form Experiment#:Metric:Value.
Worse, it may be necessary to create several different tables
for each type.  It is impossible for ordinary humans
to write queries in such a form.

Yet, self-schematization makes it trivial to do.
All that is needed is more flexible handling of nulls
than most RDBMSes support - more like the handling
that Codd, Date, and Darwen advocate.
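
A minimal sketch of the idea in Python (not Perl-SQL itself, whose
interface is not shown in this thread): every record carries its own
fields, and a query treats a field that a record never defined as null
(None) instead of forcing a schema change.  The sample records and the
helper name select are hypothetical.

```python
# Hypothetical experiment records: each dict defines its own fields.
records = [
    {"experiment": 1, "cpu_mhz": 400, "cache_kb": 512},
    {"experiment": 2, "cpu_mhz": 1000, "l1_kb": 32, "l2_kb": 256, "l3_kb": 2048},
    {"experiment": 3, "core_mhz": 1400, "bus_mhz": 133, "l1_kb": 32, "l2_kb": 512},
]


def select(rows, fields, where=lambda row: True):
    """Project `fields` from each matching row; absent fields become None."""
    return [
        tuple(row.get(name) for name in fields)
        for row in rows
        if where(row)
    ]


# Old records simply report None for fields that did not exist yet.
result = select(records, ["experiment", "cpu_mhz", "l2_kb"])

# A predicate has to decide what a missing field means; here it is "no match".
big_l2 = select(
    records,
    ["experiment"],
    where=lambda row: row.get("l2_kb") is not None and row["l2_kb"] >= 512,
)
```

The second query is where the handling of nulls matters: the predicate
must say explicitly whether an unknown value counts as a match.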



 I suspect Dewitt is thinking a little bit deeper than you suspect.
 Certainly data can be self-describing -- that's what OO is all about.
 OO databases can effectively be queried about their schemas...
 An RDBMS, however, is not an OODBMS.

Well, deWitt is the big advocate of ORDBMS
- Object Relational DBMS.


 Whether an XML document without a DTD and/or schema can be considered
 self-describing enough to be independent like an object instance or a
 set of object instances, is probably what you're trying to argue, but I
 won't go any further since such a thing is strictly outside the scope of
 XML proper and is way outside the scope of what a common tool like CVS
 should ever deem worthy of dealing with.

Fair enough.

My original email was prompted by email from you,
Greg, that sounded like CVS should not have support for
XML, such as file-format-specific diff and merge,
because XML without a DTD is meaningless.

I reject that as a specious argument.

Your remaining argument, that nobody has stepped up to do
external diff and merge, remains valid. (Ditto wrt file renaming,
multiple repositories, etc.)




RE: merge mode for XML

2002-05-14 Thread Peter Ring

A paper that will interest you:

(preliminary version)
http://citeseer.nj.nec.com/cache/papers/cs/15339/http:zSzzSzwww.cs.arizona.eduzSzpeoplezSztodszSzacceptedzSz2000zSzParsonsEmancipating.pdf/parsons00emancipating.pdf

(published)
http://portal.acm.org/citation.cfm?id=357778coll=portaldl=ACMCFID=2131136CFTOKEN=70981949

Abstract:
Database design commonly assumes, explicitly or implicitly, that instances
must belong to classes. This can be termed the assumption of inherent
classification. We argue that the extent and complexity of problems in
schema integration, schema evolution, and interoperability are, to a large
extent, consequences of inherent classification. Furthermore, we make the
case that the assumption of inherent classification violates philosophical
and cognitive guidelines on classification and is, therefore, inappropriate
in view of the role of data modeling in representing knowledge about
application domains.

Also, a search for 'semantic interoperability' should return some
interesting hits.

To tell the difference between two (or three) sequences of bytes is not too
difficult; comparing two sequences A and B to determine their longest common
subsequence (LCS) or the edit distance between them has been much studied.
GNU diff is based on an algorithm published by Eugene W. Myers in 1986.
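
For concreteness, a minimal sketch in Python of the textbook
dynamic-programming LCS.  This is not the Myers algorithm mentioned
above; it is the simpler quadratic method, shown only to illustrate
what "longest common subsequence" and an insert/delete edit distance
mean for two byte sequences.

```python
def lcs(a: bytes, b: bytes) -> bytes:
    """Return one longest common subsequence of a and b (O(len(a)*len(b)))."""
    m, n = len(a), len(b)
    # table[i][j] holds the LCS length of the suffixes a[i:] and b[j:].
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    # Walk the table from the start to recover one subsequence.
    out = bytearray()
    i = j = 0
    while i < m and j < n:
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1
        else:
            j += 1
    return bytes(out)


def edit_distance(a: bytes, b: bytes) -> int:
    """Insert/delete-only edit distance, derived from the LCS length."""
    return len(a) + len(b) - 2 * len(lcs(a, b))
```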

To tell the difference (distance) between two semantic structures is
difficult in a very fundamental way.

Kind regards
Peter Ring


-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]On Behalf Of
Glew, Andy
Sent: 13. maj 2002 19:32
To: [EMAIL PROTECTED]; Glew, Andy
Cc: Gary Bisaga
Subject: RE: merge mode for XML


  Motivation: schema changes in most existing relational databases are
  onerous.

 For very good reason.

And what is that reason?

OK, I admit that some RDBMS applications in production
need stability - just like some systems software applications
(the kind Greg seems to work on, the kind I used to
work on) value stability above all else, and actively
want to make it hard to change things.

However, there are other application domains
- in programming, the domains attacked by agile
methodologies like XP (eXtreme Programming).
{Donning asbestos underwear, expecting Greg
to flame.}

An application area that I frequently work in nowadays
is experimental databases - databases for experimental data.
I want to archive all of my experimental data in a form that
allows me to do arbitrary SQL-like queries over it.

Problem is, as I continue my research, the format of
my records is continually changing.  For example, a few years
ago I might have recorded CPU MHz and Cache Size as
configuration parameters - now I have to record at least
3 different cache sizes, as well as multiple clock domain
frequencies. Not to mention that the observations that
I record are constantly changing.
Rather than continually reformatting my database,
adding new fields which are Unknown or Null on old data,
I find it easier to add records containing fields that were not
known earlier.

<snip />





RE: merge mode for XML

2002-05-06 Thread Sean Hager


 I disagree.  Doing this would force a policy onto CVS
 users where such a policy isn't really necessary.
 
 I think using extensions for any decision making is
 bad.  Don't you think it would be bad to force the
 same diff/merge onto several files that had no
 extension?
 
 There are two important issues here, really:
 1. The default diff/merge for a new file.
 2. The actual diff/merge of an existing file.
 
 Greg is talking about the second issue.  I have an
 inkling keeping this info on a per-version basis won't
 work but I haven't come up with anything substantial.
 
 I'm not sure which issue you're talking about.  If
 it's the second, then using extensions would not allow
 anyone to override the diff/merge for any reason
 thereby putting the users at the mercy of CVS.

Actually pattern matching would put the users at the mercy
of CVS more than extension (really I mean wild card)
matching.  Pattern matching could be very unreliable and
produce different results based on the content of the
document per version, when the format per version has not
changed.  Wild card matching puts the users in the driver's
seat.  You can control how CVS will work with your files
with naming conventions.  I think programmers
are smart enough to follow naming conventions, and
understand the consequences of breaking the conventions.

sean.








RE: merge mode for XML

2002-05-06 Thread Noel Yap

--- Sean Hager [EMAIL PROTECTED] wrote:
 Actually pattern matching would put the users at the mercy
 of CVS more than extension (really I mean wild card)
 matching.  Pattern matching could be very unreliable and
 produce different results based on the content of the
 document per version, when the format per version has not
 changed.  Wild card matching puts the users in the driver's
 seat.  You can control how CVS will work with your files
 with naming conventions.

I had understood pattern matching to be pattern
matching the name, not the contents, of the file.  In
this context, pattern matching would be an extended
form of extension matching.

OTOH, the pattern matching you mention is more like
the magic file.  I actually think this is an even
better mechanism.  IIRC, magic files work in
several ways (including extension matching and some
content checking) to guess at a file's type.  Note
that the content that magic looks at is typically in
the header or footer of the file in question.  These
first and/or last few bytes of files are pretty good
ways to guess at a file's type.  See "man file" for an
example of how well this works.
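
A rough sketch in Python of that kind of guessing, assuming content is
checked first and the filename is consulted only as a fallback.  It is
not how file(1) is implemented; the signature bytes are the well-known
ones for these formats, while the pattern table and the function name
guess_type are hypothetical.

```python
import fnmatch

# A few well-known file signatures (leading "magic" bytes).
SIGNATURES = [
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
    (b"\xff\xd8\xff", "jpeg"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"<?xml", "xml"),
]

# Hypothetical filename patterns, used only when the content is inconclusive.
NAME_PATTERNS = [
    ("*.xml", "xml"),
    ("*.jpg", "jpeg"),
    ("*.gif", "gif"),
]


def guess_type(name: str, head: bytes) -> str:
    """Guess a file type from its first bytes, then from its name."""
    for magic, kind in SIGNATURES:
        if head.startswith(magic):
            return kind
    for pattern, kind in NAME_PATTERNS:
        # fnmatchcase keeps the match case-sensitive on every platform.
        if fnmatch.fnmatchcase(name, pattern):
            return kind
    return "unknown"
```

With this ordering a GIF image saved under a .jpg name is still
reported as a GIF, because the leading bytes win over the name.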

  I think programmers
  are smart enough to follow naming conventions, and
  understand the consequences of breaking the conventions.

I agree.  This doesn't, however, cover all the
remaining issues with regards to extension checking:
1. Extensions don't have a one-to-one mapping with file
types.
2. Not all files have extensions (this is actually a
specific case of the former).

It's also not clear whether you're talking about using
extensions for the initial settings, or for the
life-time settings, of the file.  Do you think users
should be able to override whatever CVS thinks
should happen?

Noel





RE: merge mode for XML

2002-05-06 Thread Noel Yap

--- Sean Hager [EMAIL PROTECTED] wrote:
 but only unix files have magic file numbers correct?

No, I have file(1) working on Win2k.

If, however, a system doesn't have magic files, CVS
can fall back on pattern matching filenames to set the
initial behaviours upon cvs add.  The alternative
would be for CVS to implement its own magic system.

  I agree.  This doesn't, however, cover all the
  remaining issues with regards to extension checking:
  1. Extensions don't have a one-to-one mapping with file types.
  2. Not all files have extensions (this is actually a
  specific case of the former).
  
  It's also not clear whether you're talking about using
  extensions for the initial settings, or for the
  life-time settings, of the file.  Do you think users
  should be able to override whatever CVS thinks
  should happen?
  
  Noel
 
 For the solid, predictable, common cases CVS could
 have out of the box configurations.  For the not so clear cases,
 admins would configure the particular installation, and perhaps
 have to establish some conventions (naming) to isolate the
 file type.

Don't you think there'd be a huge discussion as to
what exactly constitutes solid, predictable, common
cases?  Even if key people agreed on this list, don't
you think it'll bloat CVS?

It sounds like you're saying that, once configured,
users cannot override the settings.  This is bad
since, as was stated before, extensions don't have a
one-to-one mapping to types.

 But, at least they have the options to do so if they
 need to.

Not if CVS has the final word as to the type of the
file (whether it's configured or not).

Noel




RE: merge mode for XML

2002-05-06 Thread Greg A. Woods

[ On Monday, May 6, 2002 at 07:58:09 (-0500), Sean Hager wrote: ]
 Subject: RE: merge mode for XML

 
 Actually pattern matching would put the users at the mercy
  of CVS more than extension ( really I mean wild card )
  matching.

Wildcard matching *is* pattern matching!

Unless you mean regular expressions when you say pattern.
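
To illustrate the point, a small Python sketch: the standard fnmatch
module turns a shell-style wildcard into an ordinary regular
expression, so the wildcard is just a restricted way of writing a
pattern.  The helper name wildcard_to_regex and the sample pattern are
made up for the example.

```python
import fnmatch
import re


def wildcard_to_regex(pattern):
    """Compile a shell-style wildcard into an equivalent regular expression."""
    return re.compile(fnmatch.translate(pattern))


xml_files = wildcard_to_regex("*.xml")
matches = [
    name
    for name in ("build.xml", "build.xml.bak", "README")
    if xml_files.match(name)
]
```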

  Pattern matching could be very unreliable and
  produce different results based on the content of the
  document per version, when the format per version has not
  changed.

Ah, you're talking about some form of pattern matching on the
content as opposed to filename matching.

Content matching is actually a hell of a lot more reliable than
filename matching.  After all it's the content type we're trying to
determine, not some mythical file naming convention!

PLEASE look at the sources of, and references to, the file(1) command
(including, e.g., the Apache mod_mime_magic content identification module).

  Wild card matching puts the users in the driver's
  seat.  You can control how CVS will work with your files
  with naming conventions.  I think programmers
  are smart enough to follow naming conventions, and
  understand the consequences of breaking the conventions.

No matter what the matching technology, nor whether it matches against
the filename or the file content, it's still got to be used _only_ as an
initial guess as to the file content type.  Every revision's deltatext
should contain the file type in a newphrase (this is simpler than trying
to track resurrections against branches, etc.).  It must also be
possible to set and reset the file type for any given revision(s) in
order to correct any initial matching failures.
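
A minimal in-memory sketch of that rule in Python.  It does not touch
the RCS file format at all; the class and method names are
hypothetical, and inheriting the type from the parent revision when
nothing is declared is an assumption added here for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class FileHistory:
    """Hypothetical model: one content type recorded per revision."""

    guess: Callable[[str, bytes], str]
    types: Dict[str, str] = field(default_factory=dict)

    def commit(self, revision: str, name: str, content: bytes,
               parent: Optional[str] = None,
               declared: Optional[str] = None) -> None:
        if declared is not None:
            kind = declared                    # the user always wins
        elif parent is not None:
            kind = self.types[parent]          # assumed: inherit from parent
        else:
            kind = self.guess(name, content)   # matcher is only an initial guess
        self.types[revision] = kind

    def set_type(self, revision: str, kind: str) -> None:
        """Correct a wrong initial guess on an existing revision."""
        if revision not in self.types:
            raise KeyError(revision)
        self.types[revision] = kind
```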

-- 
Greg A. Woods

+1 416 218-0098;  [EMAIL PROTECTED];  [EMAIL PROTECTED];  [EMAIL PROTECTED]
Planix, Inc. [EMAIL PROTECTED]; VE3TCP; Secrets of the Weird [EMAIL PROTECTED]




Re: merge mode for XML

2002-05-05 Thread Mark A. Flacy

<sarcasm>
No doubt that's why nobody ever does it the other way on planet Earth.

Except, maybe, Apache MIME magic.  Or the file test.
</sarcasm>



Re: merge mode for XML

2002-05-05 Thread Pierre Asselin

In [EMAIL PROTECTED] [EMAIL PROTECTED] (Greg A. Woods) writes:

It could be a good idea, if it were modified slightly.  Since the type
of content in a CVS RCS file revision can change from one revision to
another the type of merge tool must be declared in a newphrase in each
deltatext section.

That would be the type of the *file*, not the merge tool.  In a
three-way merge, the tool would depend on the three newphrases
of the revisions involved.  Ain't that fun?



RE: merge mode for XML

2002-05-04 Thread Noel Yap

--- Sean Hager [EMAIL PROTECTED] wrote:
  No.  Not on extension, but based on *regular expressions*, or at least
  shell-style pattern matching expressions.  Extensions are too
  simplistic.  (c.f. CVSROOT/cvswrappers, CVSROOT/cvsignore)
 
 Extensions would work fine, pattern matching is overkill.

I think use of a magic file would be more appropriate
to set the default diff/merge program.

  Yes.  Some mechanisms like ~/.mime.types plus ~/.mailcap would be
  desirable.  But one more complication would be the version of these
  external programs.  Maybe, CVS needs to keep track of which version of
  the tools were used for which file revisions, so as to reliably and
  faithfully reproduce any snapshot.
 
 This is a bit more overkill.  Admins should test and make backups before
 changing diff/merge programs during production.  Even still, most updates in
 diff/merge programs would be to fix bugs and would not dramatically change
 the program functionality.

I agree.  Further, if versioning were desired, the
diff/merge tool can be versioned if it's kept in
CVSROOT like all the other admin stuff.

Noel






RE: merge mode for XML

2002-05-04 Thread Noel Yap

--- Sean Hager [EMAIL PROTECTED] wrote:
 On earth, extension matching would be fine.  Unless you have rogue
 developers that try to break the system by changing the file format while
 keeping extensions the same (save it as a jpg, but it is really a gif
 format) you should not have a problem.  If you do have rogue developers, or
 even developers that can't follow simple instructions such as "hey, if it is
 not a jpeg then don't save it as a jpeg!" then you have much larger
 problems.
 I.e., maybe the inmates of San Quentin do not make the ideal
 development team.

I disagree.  Doing this would force a policy onto CVS
users where such a policy isn't really necessary.

I think using extensions for any decision making is
bad.  Don't you think it would be bad to force the
same diff/merge onto several files that had no
extension?

There are two important issues here, really:
1. The default diff/merge for a new file.
2. The actual diff/merge of an existing file.

Greg is talking about the second issue.  I have an
inkling keeping this info on a per-version basis won't
work but I haven't come up with anything substantial.

I'm not sure which issue you're talking about.  If
it's the second, then using extensions would not allow
anyone to override the diff/merge for any reason
thereby putting the users at the mercy of CVS.

Noel





RE: merge mode for XML

2002-05-04 Thread Paul Sander

--- Forwarded mail from [EMAIL PROTECTED]

 Yeah.  That'd be a cool feature.  But then, CVS will no longer be a
 standalone program.  If you move the repository to another server
 where the modules are missing, how would you expect CVS to behave?

The plugins would be part of the module, so if you moved the module to
another CVS repository running the same versions of CVS everything would
still work perfectly (or if you moved the entire repository).

Agreed.

 Consider the case where programmers are working on UNIX workstations
 with an NFS-mounted CVS repository.  The cvs command just runs
 locally, without using the CVS client-server operations.  In this
 case, it is possible that those pluggable modules are present on some
 workstations but not the others.  So, files committed from one machine
 may fail to check out on another.  And how can we now make sure the
 snapshots (tags) are really reproducible, when CVS now depends on
 extension modules?

CVS would have to use a thin client, and the diff/merge program would have
to be executed on the server side.  This would ensure that rules are
enforced correctly.

Unfortunately, the diff/merge programs might have too high an overhead,
or they might be interactive and therefore inappropriate for use on the
server side.  My preference is to have diffs and merges run on the client
side, but I recognize the value of having the option of running them on
the server (e.g. if the merge can be done completely automatically in
all cases, as with diff3).  If I were to implement something like this
today, I'd put a switch in the type manager to specify where the merge
program runs.

Another issue is that the diff/merge programs may not even be present
on the server.  Take sourceforge as an example:  They won't install
custom software on their servers just to supply a free service, especially
if the tools come from commercial vendors.

--- End of forwarded message from [EMAIL PROTECTED]





RE: merge mode for XML

2002-05-04 Thread Paul Sander

--- Forwarded mail from [EMAIL PROTECTED]

[ On Friday, May 3, 2002 at 14:49:11 (-0500), Sean Hager wrote: ]
 Subject: RE: merge mode for XML

 
 
  No.  Not on extension, but based on *regular expressions*, or at least
  shell-style pattern matching expressions.  Extensions are too
  simplistic.  (c.f. CVSROOT/cvswrappers, CVSROOT/cvsignore)
 
 Extensions would work fine, pattern matching is overkill.

Neither is suitable or sufficient.

The actual type must be explicitly recorded in every delta, or at least
the initial delta and every delta following a dead delta.

'Course, if CVS used a single RCS file for the entire lifetime of a
single file, then the admin newphrase works fine.  But that's another
argument left for the past...

--- End of forwarded message from [EMAIL PROTECTED]





RE: merge mode for XML

2002-05-04 Thread Paul Sander

--- Forwarded mail from [EMAIL PROTECTED]

   No.  Not on extension, but based on *regular expressions*, or at least
   shell-style pattern matching expressions.  Extensions are too
   simplistic.  (c.f. CVSROOT/cvswrappers, CVSROOT/cvsignore)
 
  Extensions would work fine, pattern matching is overkill.

 Neither is suitable or sufficient.

 The actual type must be explicitly recorded in every delta, or at least
 the initial delta and every delta following a dead delta.

On earth, extension matching would be fine.  Unless you have rogue
developers that try to break the system by changing the file format while
keeping extensions the same (save it as a jpg, but it is really a gif
format) you should not have a problem.  If you do have rogue developers, or
even developers that can't follow simple instructions such as "hey, if it is
not a jpeg then don't save it as a jpeg!" then you have much larger
problems.
I.e., maybe the inmates of San Quentin do not make the ideal
development team.

I think Greg's point here is that there are times when a file is removed
and later replaced with a new one that contains a different type of data.
Case in point:  An ASCII text file named foo.doc is replaced by a Microsoft
Word document containing the same content and is stored with the same
name.  It could be argued that this practice is unsound, but it does happen.

There's no good solution to this problem using the current CVS design,
but Greg suggests a workaround that works well enough in some common cases.
There are other times when the data type is the same but the content is
completely unrelated, in which case the diff/merge should be avoided
altogether.  But CVS punts on this scenario.

--- End of forwarded message from [EMAIL PROTECTED]





RE: merge mode for XML

2002-05-04 Thread Paul Sander

--- Forwarded mail from [EMAIL PROTECTED]

Greg is talking about the second issue.  I have an
inkling keeping this info on a per-version basis won't
work but I haven't come up with anything substantial.

Here's one:

- Create a new file and check in a few versions on the trunk.
- Create a branch.
- Check in a few versions on the branch, but change the data type before
  the first commit.  All versions on the branch contain the same data type,
  but they differ from the trunk.
- Merge the branch to the trunk *OR* update the branch from the trunk.

What's the correct action for the last step?  Is it to refuse to merge
because the tools are different?  Is it to copy the contributor and change
the data type on the target?  Is it to somehow convert one data type to
the other and invoke the proper merge tool and record that tool with the
next commit?

None of these choices is a good one because there are valid cases where
each one is correct.  I'm inclined to go with the first one and diagnose
an error, but I'm sure others would disagree.
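
A sketch of that first choice in Python, assuming each of the three
revisions in a merge (common ancestor, target, contributor) has a
recorded type string.  The exception and function names are
hypothetical; nothing like this exists in CVS.

```python
class MergeTypeError(Exception):
    """Raised when the revisions in a three-way merge disagree on type."""


def common_type(ancestor: str, ours: str, theirs: str) -> str:
    """Return the shared type, or refuse when the three types differ."""
    kinds = {ancestor, ours, theirs}
    if len(kinds) > 1:
        raise MergeTypeError(
            "refusing to merge revisions of different types: "
            + ", ".join(sorted(kinds))
        )
    return ancestor
```

In the branch scenario above the trunk and branch revisions carry
different types, so the merge is diagnosed as an error instead of being
handed to a tool that understands only one of them.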

--- End of forwarded message from [EMAIL PROTECTED]





RE: merge mode for XML

2002-05-04 Thread Noel Yap

--- Paul Sander [EMAIL PROTECTED] wrote:
 --- Forwarded mail from [EMAIL PROTECTED]
 
 Greg is talking about the second issue.  I have an
 inkling keeping this info on a per-version basis won't
 work but I haven't come up with anything substantial.
 
 Here's one:
 
 - Create a new file and check in a few versions on the trunk.
 - Create a branch.
 - Check in a few versions on the branch, but change the data type before
   the first commit.  All versions on the branch contain the same data type,
   but they differ from the trunk.
 - Merge the branch to the trunk *OR* update the branch from the trunk.
 
 What's the correct action for the last step?  Is it to refuse to merge
 because the tools are different?  Is it to copy the contributor and change
 the data type on the target?  Is it to somehow convert one data type to
 the other and invoke the proper merge tool and record that tool with the
 next commit?
 
 None of these choices is a good one because there are valid cases where
 each one is correct.  I'm inclined to go with the first one and diagnose
 an error, but I'm sure others would disagree.

Yes, I think it should be an error since it would be
unwise to expect CVS to be able to act on files of two
different types.  I guess ideally CVS would be able to
be configured to use diff/merge programs that did
their work on files of different types (eg similar to
overloaded functions), but this is way too much to
hope for.

As a workaround, the user can change the diff/merge
program for the trunk, then perform the operation
with impunity.

Come to think of it (you might've mentioned this
before), all that the file has to keep track of is its
type.  CVS can then map that type to a particular
diff/merge program.  This has two advantages:
1. Changing the diff/merge for a bunch of files with
the same type is easy.
2. Adding more type-specific behaviour in the future
would be easier.
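
The indirection can be sketched in Python as two small tables.  The
file names, type names and the tool name xmlmerge are hypothetical
(diff3 is the only real program mentioned); this is an illustration,
not a proposed CVS data layout.

```python
# Each file records only its type ...
file_types = {"build.xml": "xml", "site.xml": "xml", "README": "text"}

# ... and one table maps a type to its diff/merge program.
programs = {"xml": "xmlmerge", "text": "diff3"}


def merge_program(path):
    """Look up the merge program for a file via its recorded type."""
    return programs[file_types[path]]


before = merge_program("build.xml")

# Advantage 1: a single change retargets every file of that type.
programs["xml"] = "xmlmerge2"
after = [merge_program(path) for path in ("build.xml", "site.xml")]
```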

Noel





Re: merge mode for XML

2002-05-03 Thread Greg A. Woods

[ On Thursday, May 2, 2002 at 09:33:45 (+0200), Lee Sau Dan wrote: ]
 Subject: Re: merge mode for XML

  Paul == Paul Sander [EMAIL PROTECTED] writes:
 
 Paul A better implementation would be to code a symbolic name for
 Paul the merge tool in a newphrase in the admin section of the RCS
 Paul file, and look up that symbolic name on the client to locate
 Paul the proper tool.
 
 Good idea!  There could be a lookup table in CVSROOT to provide the
 defaults, and then the client will have its own config file to
 override the defaults.  (Like CVSROOT/cvsignore vs. ~/.cvsignore.)

It could be a good idea, if it were modified slightly.  Since the type
of content in a CVS RCS file revision can change from one revision to
another, the type of merge tool must be declared in a newphrase in each
deltatext section.
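
A rough sketch of the lookup being discussed, in Python, under stated
assumptions: a symbolic tool name is stored per revision, the repository
supplies defaults, and the client may override them.  The function names,
the dictionaries, and the tool paths are all hypothetical; no such table
exists in CVS today.

```python
# Hypothetical sketch: the tables and paths below are invented for
# illustration and do not correspond to real CVS configuration files.

def resolve_tool(symbolic_name, repo_defaults, client_overrides):
    """Map a symbolic tool name to a program; client settings win."""
    if symbolic_name in client_overrides:
        return client_overrides[symbolic_name]
    return repo_defaults[symbolic_name]


def tool_for_revision(revision, names_by_revision, repo_defaults,
                      client_overrides):
    """Look up the symbolic name recorded for one revision, then resolve it."""
    return resolve_tool(names_by_revision[revision], repo_defaults,
                        client_overrides)


if __name__ == "__main__":
    repo_defaults = {"text-merge": "/usr/bin/diff3",
                     "xml-merge": "/usr/local/bin/xml_dm"}
    client_overrides = {"xml-merge": "/home/me/bin/my_xml_merge"}
    # The content type changed between 1.2 and 1.3, so the name is per revision.
    names_by_revision = {"1.2": "text-merge", "1.3": "xml-merge"}
    for rev in ("1.2", "1.3"):
        print(rev, tool_for_revision(rev, names_by_revision,
                                     repo_defaults, client_overrides))
```

Keeping only a symbolic name in the RCS file leaves the choice of actual
program to each site and each client.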

-- 
Greg A. Woods

+1 416 218-0098;  [EMAIL PROTECTED];  [EMAIL PROTECTED];  [EMAIL PROTECTED]
Planix, Inc. [EMAIL PROTECTED]; VE3TCP; Secrets of the Weird [EMAIL PROTECTED]




RE: merge mode for XML

2002-05-03 Thread Greg A. Woods

[ On Wednesday, May 1, 2002 at 13:33:08 (-0700), Glew, Andy wrote: ]
 Subject: RE: merge mode for XML

 Well, I wrote Perl-SQL, a relational database system that
 is self-schematizing - where every record can define its own schema,
 with its own fields.

Yeah, that sounds like something a perl hacker would do

 Motivation: schema changes in most existing relational databases are
 onerous.

For very good reason.

 3-4 years ago I discussed self-schematization with Prof. David DeWitt, a man of some
 renown in database circles.  His take is that self-schematization was not
 done in the early days to save space, and that now it is not unreasonable to do so.
 
 However, rather than Perl-SQL, he pointed me towards XML, saying something
 like "well formed SQL doesn't require a schema or DTD -- that is the future."
 
 ---
 
 I.e., Greg, not all RDBMS experts agree with you about schemas; ditto DTDs.

I suspect DeWitt is thinking a little bit deeper than you suspect.
Certainly data can be self-describing -- that's what OO is all about.
OO databases can effectively be queried about their schemas, and since
all proper objects know how to interact with other objects, even those
in different classes (i.e. of different types), their relationships are
self-defining.

An RDBMS, however, is not an OODBMS.

Whether an XML document without a DTD and/or schema can be considered
self-describing enough to be independent like an object instance or a
set of object instances, is probably what you're trying to argue, but I
won't go any further since such a thing is strictly outside the scope of
XML proper and is way outside the scope of what a common tool like CVS
should ever deem worthy of dealing with.

-- 
Greg A. Woods

+1 416 218-0098;  [EMAIL PROTECTED];  [EMAIL PROTECTED];  [EMAIL PROTECTED]
Planix, Inc. [EMAIL PROTECTED]; VE3TCP; Secrets of the Weird [EMAIL PROTECTED]




RE: merge mode for XML

2002-05-03 Thread Sean Hager



 No.  Not on extension, but based on *regular expressions*, or at least
 shell-style   pattern  matching   expressions.   Extensions   are  too
 simplistic.  (c.f. CVSROOT/cvswrappers,  CVSROOT/cvsignore)

Extensions would work fine, pattern matching is overkill.


 Yes.   Some mechanisms  like  ~/.mime.types plus  ~/.mailcap would  be
 desirable.  But  one more complication  would be the version  of these
 external programs.  Maybe, CVS needs to keep track of which version of
 the tools  were used for which  file revisions, so as  to reliably and
 faithfully reproduce any snapshot.

This is a bit more overkill.  Admins should test and make backups before
changing diff/merge programs in production.  Even so, most updates to
diff/merge programs would be to fix bugs and would not dramatically change
the program functionality.

Sean.






RE: merge mode for XML

2002-05-03 Thread Sean Hager


 Yeah.  That'd  be a cool feature.  But  then, CVS will no  longer be a
 standalone  program.  If  you move  the repository  to  another server
 where the modules are missing, how would you expect CVS to behave?

The plugins would be part of the module, so if you moved the module to
another CVS repository running the same version of CVS (or if you moved
the entire repository), everything would still work perfectly.


 Consider the  case where programmers are working  on UNIX workstations
 with  an NFS  mounted CVS  repository.   The cvs  command just  runs
 locally,  without using  the  CVS client-server  operations.  In  this
 case, it is possible that  those pluggable modules are present on some
 workstations but not the others.   So, files commited from one machine
 may fail to  check out in another.   And how can we now  make sure the
 snapshots  (tags) is  really  reproducible, when  CVS  now depends  on
 extension modules?

CVS would have to use a thin client, and the diff/merge program would have
to be executed on the server side.  This would ensure that rules are
enforced correctly.

Sean.







RE: merge mode for XML

2002-05-03 Thread Greg A. Woods

[ On Friday, May 3, 2002 at 14:49:11 (-0500), Sean Hager wrote: ]
 Subject: RE: merge mode for XML

 
 
  No.  Not on extension, but based on *regular expressions*, or at least
  shell-style   pattern  matching   expressions.   Extensions   are  too
  simplistic.  (c.f. CVSROOT/cvswrappers,  CVSROOT/cvsignore)
 
 Extensions would work fine, pattern matching is overkill.

Neither is suitable or sufficient.

The actual type must be explicitly recorded in every delta, or at least
the initial delta and every delta following a dead delta.
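
One way to read this rule, sketched in Python under stated assumptions: each
delta may carry an explicit type, a delta without one inherits the type of
the previous delta, and the first delta after a dead delta must declare its
type again.  The function effective_types and the tuple layout are
hypothetical; "Exp" and "dead" are used here as the delta states.

```python
# Hypothetical sketch: not an RCS parser, just the inheritance rule.

def effective_types(deltas):
    """Compute the content type of every revision.

    deltas is a list of (revision, state, declared_type_or_None) tuples,
    oldest first.
    """
    result = {}
    current = None  # no type known yet, or the file was just resurrected
    for revision, state, declared in deltas:
        if declared is not None:
            current = declared
        elif current is None:
            raise ValueError(f"revision {revision} needs an explicit type")
        result[revision] = current
        if state == "dead":
            current = None  # the next delta must declare its type again
    return result


if __name__ == "__main__":
    history = [
        ("1.1", "Exp", "text"),
        ("1.2", "Exp", None),    # inherits "text"
        ("1.3", "dead", None),   # file removed
        ("1.4", "Exp", "xml"),   # resurrected with a new type
    ]
    print(effective_types(history))
```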

-- 
Greg A. Woods

+1 416 218-0098;  [EMAIL PROTECTED];  [EMAIL PROTECTED];  [EMAIL PROTECTED]
Planix, Inc. [EMAIL PROTECTED]; VE3TCP; Secrets of the Weird [EMAIL PROTECTED]




RE: merge mode for XML

2002-05-03 Thread Sean Hager


   No.  Not on extension, but based on *regular
 expressions*, or at least
   shell-style   pattern  matching   expressions.
 Extensions   are  too
   simplistic.  (c.f. CVSROOT/cvswrappers,  CVSROOT/cvsignore)
 
  Extensions would work fine, pattern matching is overkill.

 Neither is suitable or sufficient.

 The actual type must be explicitly recorded in every delta,
 or at least
 the initial delta and every delta following a dead delta.



On earth, extension matching would be fine.  Unless you have rogue
developers that try to break the system by changing file formats while
keeping extensions the same (save it as a jpg, but it is really a gif
format) you should not have a problem.  If you do have rogue developers, or
even developers that can't follow simple instructions such as "hey, if it is
not a jpeg then don't save it as a jpeg!" then you have much larger
problems.
I.e., maybe the inmates of San Quentin do not make the ideal
development team.


sean.






Re: merge mode for XML

2002-05-03 Thread Eric Siegerman

On Fri, May 03, 2002 at 04:43:11PM -0400, Greg A. Woods wrote:
 [ On Friday, May 3, 2002 at 14:49:11 (-0500), Sean Hager wrote: ]
  Subject: RE: merge mode for XML
   No.  Not on extension, but based on *regular expressions*, or at least
   shell-style   pattern  matching   expressions.   Extensions   are  too
   simplistic.  (c.f. CVSROOT/cvswrappers,  CVSROOT/cvsignore)
  
  Extensions would work fine, pattern matching is overkill.
 
 Neither is suitable or sufficient.

Agreed!  I *know* I've had conflicts in CVSROOT/cvswrappers in
the past (having a text file treated as binary because it happens
to have an extension that also signifies some binary format);
just can't remember what they were.

But here are some counterexamples anyway:

  - .doc:  we all know what M$ thinks it means, but other people
    have their own ideas.  I've seen text files with .doc
    extensions, and I'd bet other word- or document-processors
    have used it too

  - .cfg:  some kind of configuration file ... but it could be
    anything from Windows-ini format to XML to pseudo-Lisp to
    binary

  - .cgi:  again, that says how it functions -- in this case,
    what its API is -- but says *nothing* about the file format;
    CGI scripts can be in any language you like, including C

  - In general, on *NIX machines where extensions are more a
    convention than an OS-mandated thing, people tend to play
    fast and loose with them.  E.g. one could conceive of a
    directory full of files named for Internet domains --
    including Australian ones ending in .au, which is also an
    audio format

  - And the killer is .bak:  can be anything under the sun, of
    course.  Usually people don't check them in, but it would be
    foolish in the extreme to presume they *never* do, and to
    make doing so functionally useless

That's just from the A-D part of a list of file extensions.  I'm
sure there are lots more conflicts in E-Z.

Oh, and of course there's .sys;  even M$ can't decide what that
signifies.

--

|  | /\
|-_|/ Eric Siegerman, Toronto, Ont.[EMAIL PROTECTED]
|  |  /
Anyone who swims with the current will reach the big music steamship;
whoever swims against the current will perhaps reach the source.
- Paul Schneider-Esleben




Re: merge mode for XML

2002-05-02 Thread Lee Sau Dan

 Sean == Sean Hager [EMAIL PROTECTED] writes:

Sean I agree that XML is overkill, but the truth is that it is
Sean here to stay.

Sean XML is fastly becoming excepted as the defacto standard for
.

accepted?  :)


Sean If CVS had a way to use modular plug-in diff and merge
Sean programs, we could set up a wrapper file that would
Sean automatically diff/merge the file differently based on the
Sean extension.

Yeah.  That'd be a cool feature.  But then, CVS will no longer be a
standalone program.  If you move the repository to another server
where the modules are missing, how would you expect CVS to behave?

Consider the case where programmers are working on UNIX workstations
with an NFS-mounted CVS repository.  The cvs command just runs
locally, without using the CVS client-server operations.  In this
case, it is possible that those pluggable modules are present on some
workstations but not the others.  So, files committed from one machine
may fail to check out on another.  And how can we make sure the
snapshots (tags) are really reproducible, when CVS now depends on
extension modules?

More thought is needed before this becomes reliable, and more
changes to the current architecture will be needed.  This is not
a simple change.  Of course, I'm not saying pluggable extension
modules are a bad thing.  And it would also be nice to have
keyword-substitution modules, too, so that we can keep $Id$ as comment
tags in GIF/JPEG/PNG/TIFF, for instance.  But some major redesign of
CVS would be required to track which versions of which modules
were used to commit which changes, so that CVS could faithfully and
reliably reproduce the snapshots.




-- 
Lee Sau Dan 李守敦

E-mail: [EMAIL PROTECTED]
Home page: http://www.informatik.uni-freiburg.de/~danlee



RE: merge mode for XML

2002-05-02 Thread Glew, Andy

[Greg Woods]:
... conversations about XML and DTDs ...
 ...
 well formed by definition should mean in conformance to a 
 pre-existing DTD!
 ...
 Do you build relational databases without defining a schema?  

Well, I wrote Perl-SQL, a relational database system that
is self-schematizing - where every record can define its own schema,
with its own fields.

Motivation: schema changes in most existing relational databases are
onerous.  Even just adding fields is painful.  Self-schematization allows
new fields to be added on the fly, improving documentation of the
experiment results that are my target data, because any observed features
can be easily added to the schema in a structured way.  As opposed to all
of those database schemas that have a miscellaneous text or comment field,
where far too often all the critical data that you wish to process lives.
Self-schematization allows you to do all SQL operations across
spontaneously added fields.

---

3-4 years ago I discussed self-schematization with Prof. David DeWitt, a man
of some renown in database circles.  His take is that self-schematization
was not done in the early days to save space, and that now it is not
unreasonable to do so.

However, rather than Perl-SQL, he pointed me towards XML, saying something
like "well formed SQL doesn't require a schema or DTD -- that is the
future."

---

I.e., Greg, not all RDBMS experts agree with you about schemas; ditto DTDs.




Re: merge mode for XML

2002-05-02 Thread Lee Sau Dan

 Paul == Paul Sander [EMAIL PROTECTED] writes:

Paul A better implementation would be to code a symbolic name for
Paul the merge tool in a newphrase in the admin section of the RCS
Paul file, and look up that symbolic name on the client to locate
Paul the proper tool.

Good idea!  There could be a lookup table in CVSROOT to provide the
defaults, and then the client will have its own config file to
override the defaults.  (Like CVSROOT/cvsignore vs. ~/.cvsignore.)




-- 
Lee Sau Dan 李守敦

E-mail: [EMAIL PROTECTED]
Home page: http://www.informatik.uni-freiburg.de/~danlee



Re: merge mode for XML

2002-05-02 Thread Jesús M. NAVARRO

Hi, Greg:

Greg A. Woods wrote:
 [ On Wednesday, May 1, 2002 at 11:34:02 (-0400), Gary Bisaga wrote: ]
 
Subject: RE: merge mode for XML

Good point, Noel. At my last job we had a partner we were required to
connect to, and it was a job getting even an example XML document out of
them, let alone a DTD or schema.
 
 
 You guys are just re-iterating all the silly arguments that have come
 and gone with EDI.  New face, same old problem.
 

Yeah! Been there, seen that.  What's the good of XML if there's no
agreement on the DTD/schema to be used!?
-- 
Cheers,
Jesus
***
[EMAIL PROTECTED]
***
From Zaragoza, looking for work -
http://www.geocities.com/jesusm_navarro/CV/cv.html
***




Re: merge mode for XML

2002-05-02 Thread Lee Sau Dan

 Noel == Noel Yap [EMAIL PROTECTED] writes:

 If CVS had away to use modular plug in diff and merge
 programs, we could setup a wrapper file that would
 automatically diff/merge the file differently based on the
 extension.  e.g.:
 
 *.xml xml_dm
 *.html html_dm

Noel Ideally, the diff/merge tool would be tied to the type of

Please add also a keyword-substitution tool to the check list.


Noel the file and the type of the file is initially set depending
Noel on the extension.

No.  Not on extension, but based on *regular expressions*, or at least
shell-style   pattern  matching   expressions.   Extensions   are  too
simplistic.  (c.f. CVSROOT/cvswrappers,  CVSROOT/cvsignore) 
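
To make the difference concrete, here is a small Python sketch of a
pattern table that assigns an initial type by shell-style matching rather
than by extension alone.  It is only an illustration: the rule list, the
type names, and the function initial_type are assumptions and do not
reflect the actual CVSROOT/cvswrappers syntax.

```python
# Hypothetical sketch: shell-style patterns, first match wins.
from fnmatch import fnmatchcase

RULES = [
    ("ChangeLog*", "changelog"),  # a pattern that no extension test can express
    ("*.xml", "xml"),
    ("*.html", "html"),
]


def initial_type(filename, rules=RULES, default="text"):
    """Return the type of the first matching pattern, else the default."""
    for pattern, file_type in rules:
        if fnmatchcase(filename, pattern):
            return file_type
    return default


if __name__ == "__main__":
    for name in ("build.xml", "ChangeLog.2002", "README"):
        print(name, "->", initial_type(name))
```

This only picks the initial type; as noted elsewhere in the thread, the
type could then be changed independently of the file name.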


Noel   This way, one would be able to change the
Noel type of the file independent of its extension.

Yes.  Some mechanism like ~/.mime.types plus ~/.mailcap would be
desirable.  But one more complication would be the version of these
external programs.  Maybe CVS needs to keep track of which version of
the tools was used for which file revisions, so as to reliably and
faithfully reproduce any snapshot.


-- 
Lee Sau Dan 李守敦

E-mail: [EMAIL PROTECTED]
Home page: http://www.informatik.uni-freiburg.de/~danlee



RE: merge mode for XML

2002-05-01 Thread Gary Bisaga

Sorry, this strikes me as just a little bit extreme. I agree that you ought
to write DTDs or schemas (just yesterday I had to make one of our developers
do so, and our own internal XML infrastructure requires them). But to call
documents without DTDs/schemas "not XML" and unworthy of configuration
management is certainly not supported by the XML spec or common usage. For
one thing, as I'm sure you know, the XML spec does not seem to deprecate
well-formed XML documents. When I was in the W3C XML working group (1999)
there was certainly a group of us (not everybody) who believed that
well-formed documents had a place in the world.

And if we take this tack, what about constructs not declarable with DTDs?
XML Schemas will certainly improve this, but many people are not using them
yet. Are DTDs with ANY declarations also not XML, since they really don't
describe the semantics of the document? Since DTDs can't describe data types
or other restrictions (such as field length), is any DTD'ed document not
XML?

DTDs and schemas are good and should be used wherever possible. But there
are realities of life.

 gary

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]On Behalf Of
Greg A. Woods
Sent: Wednesday, May 01, 2002 1:56 AM
To: Peter Ring
Cc: CVS-II Discussion Mailing List
Subject: RE: merge mode for XML
 <rant>There's a class of simple XML documents that live and
 die without getting near either a DTD or revision control.
 Without a schema and accompanying documentation, there's no
 way to tell the semantics of the XML document, and not much
 point in version management.</rant>

Amen.  I couldn't agree more!
Those who dare call such things XML are sadly mistaken.





RE: merge mode for XML

2002-05-01 Thread Noel Yap

Not only that, but in the end it is the client who
decides the real semantics of the document with or
without DTDs and Schemas.

Noel
--- Gary Bisaga [EMAIL PROTECTED] wrote:
 Sorry, this strikes me as just a little bit extreme.
 I agree that you ought
 to write DTDs or schemas (just yesterday I had to
 make one of our developers
 do so, and our own internal XML infrastructure
 requires them). But to call
 documents without DTDs/schemas not XML and
 unworthy of configuration
 management is certainly not supported by the XML
 spec or common usage. For
 one thing, as I'm sure you know, the XML spec does
 not seem to deprecate
 well-formed XML documents. When I was in the W3 XML
 working group (1999)
 there was certainly a group of us (not everybody)
 who believed that
 well-formed documents had a place in the world.
 
 And if we take this tack, what about constructs not
 declarable with DTDs?
 XML Schemas will certainly improve this, but many
 people are not using them
 yet. Are DTDs with ANY declarations also not XML,
 since they really don't
 describe the semantics of the document? Since DTDs
 can't describe data types
 or other restrictions (such as field length), is any
 DTD'ed document not
 XML?
 
 DTDs and schemas are good and should be used
 wherever possible. But there
 are realities of life.
 
  gary
 
 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED]]On Behalf Of
 Greg A. Woods
 Sent: Wednesday, May 01, 2002 1:56 AM
 To: Peter Ring
 Cc: CVS-II Discussion Mailing List
 Subject: RE: merge mode for XML
  <rant>There's a class of simple XML documents that live and
  die without getting near either a DTD or revision control.
  Without a schema and accompanying documentation, there's no
  way to tell the semantics of the XML document, and not much
  point in version management.</rant>
 
 Amen.  I couldn't agree more!
 Those who dare call such things XML are sadly
 mistaken.
 
 





RE: merge mode for XML

2002-05-01 Thread Gary Bisaga

Good point, Noel. At my last job we had a partner we were required to
connect to, and it was a job getting even an example XML document out of
them, let alone a DTD or schema.

 gary

-Original Message-
From: Noel Yap [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, May 01, 2002 11:25 AM
To: Gary Bisaga; CVS-II Discussion Mailing List
Subject: RE: merge mode for XML


Not only that, but in the end it is the client who
decides the real semantics of the document with or
without DTDs and Schemas.

Noel
--- Gary Bisaga [EMAIL PROTECTED] wrote:
 Sorry, this strikes me as just a little bit extreme.
 I agree that you ought
 to write DTDs or schemas (just yesterday I had to
 make one of our developers
 do so, and our own internal XML infrastructure
 requires them). But to call
 documents without DTDs/schemas not XML and
 unworthy of configuration
 management is certainly not supported by the XML
 spec or common usage. For
 one thing, as I'm sure you know, the XML spec does
 not seem to deprecate
 well-formed XML documents. When I was in the W3 XML
 working group (1999)
 there was certainly a group of us (not everybody)
 who believed that
 well-formed documents had a place in the world.

 And if we take this tack, what about constructs not
 declarable with DTDs?
 XML Schemas will certainly improve this, but many
 people are not using them
 yet. Are DTDs with ANY declarations also not XML,
 since they really don't
 describe the semantics of the document? Since DTDs
 can't describe data types
 or other restrictions (such as field length), is any
 DTD'ed document not
 XML?

 DTDs and schemas are good and should be used
 wherever possible. But there
 are realities of life.

  gary

 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED]]On Behalf Of
 Greg A. Woods
 Sent: Wednesday, May 01, 2002 1:56 AM
 To: Peter Ring
 Cc: CVS-II Discussion Mailing List
 Subject: RE: merge mode for XML
  <rant>There's a class of simple XML documents that live and
  die without getting near either a DTD or revision control.
  Without a schema and accompanying documentation, there's no
  way to tell the semantics of the XML document, and not much
  point in version management.</rant>

 Amen.  I couldn't agree more!
 Those who dare call such things XML are sadly
 mistaken.







RE: merge mode for XML

2002-05-01 Thread Greg A. Woods

[ On Wednesday, May 1, 2002 at 11:11:32 (-0400), Gary Bisaga wrote: ]
 Subject: RE: merge mode for XML

 Sorry, this strikes me as just a little bit extreme. I agree that you ought
 to write DTDs or schemas (just yesterday I had to make one of our developers
 do so, and our own internal XML infrastructure requires them). But to call
 documents without DTDs/schemas not XML and unworthy of configuration
 management is certainly not supported by the XML spec or common usage. For
 one thing, as I'm sure you know, the XML spec does not seem to deprecate
 well-formed XML documents. When I was in the W3 XML working group (1999)
 there was certainly a group of us (not everybody) who believed that
 well-formed documents had a place in the world.

Just because some stupid ill-advised practice is common doesn't mean it
should be condoned.  Specification committees are, by their very nature,
political creatures.  Those of you who practice such shady techniques
are able to sway a political process, but that doesn't mean what you do
is right.

In any case, what do you mean by "ought to"!?!?!?!?  SGML syntax is not
self-documenting in and of itself.  Do you build relational databases
without defining a schema?  Do you design data structures which have the
sole purpose of interchanging data between published APIs but that are
not documented?

"Well formed" by definition should mean "in conformance to a pre-existing DTD"!

As has been mentioned already, no XML parser worth its salt can even
begin to interpret an XML document without first reading the DTD that
describes it!  What do you do, parse your XML documents by hand?

Finally let's also note that if you are using loosely defined XML-like
formats for data interchange between programs then presumably you are
careful enough only to do so in contexts where the content is highly
dynamic (i.e. not static and long lived), in which case it certainly
isn't suitable for CVS, if indeed any form of change tracking

 And if we take this tack, what about constructs not declarable with DTDs?
 XML Schemas will certainly improve this, but many people are not using them
 yet. Are DTDs with ANY declarations also not XML, since they really don't
 describe the semantics of the document? Since DTDs can't describe data types
 or other restrictions (such as field length), is any DTD'ed document not
 XML?

What about them?   XML is no more a final all-encompassing solution than
the entire SGML framework from which it comes!  Do all your tools look
exactly like hammers?

This is yet another situation where you must learn to use the right tool
for the job!

 DTDs and schemas are good and should be used wherever possible. But there
 are realities of life.

Indeed, and in the reality you're existing in you are defining overly
complex little languages which you then mis-name as XML.

-- 
Greg A. Woods

+1 416 218-0098;  [EMAIL PROTECTED];  [EMAIL PROTECTED];  [EMAIL PROTECTED]
Planix, Inc. [EMAIL PROTECTED]; VE3TCP; Secrets of the Weird [EMAIL PROTECTED]




RE: merge mode for XML

2002-05-01 Thread Greg A. Woods

[ On Wednesday, May 1, 2002 at 11:34:02 (-0400), Gary Bisaga wrote: ]
 Subject: RE: merge mode for XML

 Good point, Noel. At my last job we had a partner we were required to
 connect to, and it was a job getting even an example XML document out of
 them, let alone a DTD or schema.

You guys are just re-iterating all the silly arguments that have come
and gone with EDI.  New face, same old problem.

-- 
Greg A. Woods

+1 416 218-0098;  [EMAIL PROTECTED];  [EMAIL PROTECTED];  [EMAIL PROTECTED]
Planix, Inc. [EMAIL PROTECTED]; VE3TCP; Secrets of the Weird [EMAIL PROTECTED]




RE: merge mode for XML

2002-05-01 Thread Peter Ring

I might just sit back and watch the show, but I like to be part of the
fun ;)

DTDs are certainly not enough in a lot of situations, and XML Schemas,
RELAX NG, or what-have-you won't ever obviate the need for documentation
and project management.

But sometimes, well-formed XML is worthy of version management even
when no-one bothers about a schema. For example, you might need to save
some of those short-lived XML-formatted messages for a regression test.

Also, there is now a trend towards a more Unix-like 'tool-set' approach,
in which different parts of a chain of processes are responsible for
different tasks. While it might be valuable early in a process to validate
an XML instance, it might be a waste of resources later in the process.

There's also another interesting trend: HyTime is slowly being re-invented
in XML incarnation. Which to me is proof-of-concept: you can dream up
another syntax (and a full-blown SGML parser might even be able to parse
it, given a suitable SGML declaration), but in the end, it doesn't make
much of a difference whether you write <concept>something</concept> or
(concept something) or \concept{something} or whatever; the fundamental
issue is the same: You are not really supposed to look at the markup
except as an expression of structure.

Which was what started this thread: how to diff and merge in a meaningful
way, i.e. in a way that knows that <whatnot a="bar" b="foo" /> isn't
different from <whatnot   b="foo"   a="bar" /> in a way that most XML
applications should care about. You can come this far with just well-formed
XML. When it comes to whitespace in character context, things get really
interesting. XML in essence leaves it up to the application what to do with
whitespace, so you have to know the application in order to decide whether
a whitespace difference matters. A DTD or schema helps a lot because you
can then ignore whitespace in element context.
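
A small Python sketch of that kind of comparison, using the standard
xml.etree.ElementTree module on well-formed input only.  The helper names
same_xml and same_element and the ignore_ws switch are assumptions made for
this illustration; treating whitespace-only text as insignificant is an
application decision, not something XML itself prescribes.

```python
# Illustrative sketch: structural equality of two well-formed XML documents.
import xml.etree.ElementTree as ET


def _text(value, ignore_ws):
    """Normalise missing text, optionally dropping whitespace-only text."""
    value = value or ""
    if ignore_ws and value.strip() == "":
        return ""
    return value


def same_element(a, b, ignore_ws=False):
    """Compare two elements; attribute order never matters (dict compare)."""
    if a.tag != b.tag or a.attrib != b.attrib:
        return False
    if _text(a.text, ignore_ws) != _text(b.text, ignore_ws):
        return False
    if _text(a.tail, ignore_ws) != _text(b.tail, ignore_ws):
        return False
    if len(a) != len(b):
        return False
    return all(same_element(x, y, ignore_ws) for x, y in zip(a, b))


def same_xml(doc_a, doc_b, ignore_ws=False):
    """Parse both documents and compare them structurally."""
    return same_element(ET.fromstring(doc_a), ET.fromstring(doc_b), ignore_ws)


if __name__ == "__main__":
    print(same_xml('<whatnot a="bar" b="foo" />',
                   '<whatnot   b="foo"   a="bar" />'))
    print(same_xml('<r><x/></r>', '<r>\n  <x/>\n</r>'))
    print(same_xml('<r><x/></r>', '<r>\n  <x/>\n</r>', ignore_ws=True))
```

Whether ignore_ws is safe for a given document is exactly the point where
knowledge of the application, or a DTD or schema, is needed.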

BTW, I stumbled over yet another XML diff, this one written by
Norman Walsh:

  http://nwalsh.com/java/diffmk/

and a small feature about whitespace and prettyprinting XML:

  http://www.xml.com/pub/a/2002/01/02/whitespace.html


Kind regards

Peter Ring


-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]On Behalf Of
Gary Bisaga
Sent: 1. maj 2002 17:12
To: CVS-II Discussion Mailing List
Subject: RE: merge mode for XML


Sorry, this strikes me as just a little bit extreme. I agree that you ought
to write DTDs or schemas (just yesterday I had to make one of our developers
do so, and our own internal XML infrastructure requires them). But to call
documents without DTDs/schemas not XML and unworthy of configuration
management is certainly not supported by the XML spec or common usage. For
one thing, as I'm sure you know, the XML spec does not seem to deprecate
well-formed XML documents. When I was in the W3 XML working group (1999)
there was certainly a group of us (not everybody) who believed that
well-formed documents had a place in the world.

And if we take this tack, what about constructs not declarable with DTDs?
XML Schemas will certainly improve this, but many people are not using them
yet. Are DTDs with ANY declarations also not XML, since they really don't
describe the semantics of the document? Since DTDs can't describe data types
or other restrictions (such as field length), is any DTD'ed document not
XML?

DTDs and schemas are good and should be used wherever possible. But there
are realities of life.

 gary

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]On Behalf Of
Greg A. Woods
Sent: Wednesday, May 01, 2002 1:56 AM
To: Peter Ring
Cc: CVS-II Discussion Mailing List
Subject: RE: merge mode for XML
 <rant>There's a class of simple XML documents that live and
 die without getting near either a DTD or revision control.
 Without a schema and accompanying documentation, there's no
 way to tell the semantics of the XML document, and not much
 point in version management.</rant>

Amen.  I couldn't agree more!
Those who dare call such things XML are sadly mistaken.





RE: merge mode for XML

2002-04-30 Thread Noel Yap

--- Greg A. Woods [EMAIL PROTECTED] wrote:
 In any case if you re-read what I wrote a little
 more carefully you'll
 note that I'm still only talking about XML, using
 HTML only as an
 example (because it uses the same syntax).  Since
 all the XML parsers I
 know of are very much unrelated to any known HTML
 rendering engines any
 issues with the latter cannot possibly have anything
 whatsoever to do
 with the former.

Very well, we'll talk about what you want to talk
about, not what was being talked about at the time I
made my post.

Noel



__
Do You Yahoo!?
Yahoo! Health - your guide to health and wellness
http://health.yahoo.com




RE: merge mode for XML

2002-04-30 Thread Greg A. Woods

[ On Tuesday, April 30, 2002 at 07:43:21 (-0500), Sean Hager wrote: ]
 Subject: RE: merge mode for XML

 Thanks for offering up the samples Paul.  I read through last Septembers
 thread on giving up cvs.  I see that I stirred up an old debate here (man
 you guys really had it out last time ;).
 
 With the emergence of xml more and more programs are supporting it as a
 format.  If a cvs - xml diff/merge solution was implemented then cvs could
 capture a huge new level of concurrent development in documentation,
 configuration, and help system docs, etc...
 
 Any chance some of you cvs wizes could ever implement a modular diff/merge
 subprogram architecture into CVS.  Then we could implement an XML wrapper.

Don't hold your breath.  Even the biggest proponents of this idea have
not yet come up with working code as a solid proposal -- only what
amounts to no more than a functional specification, and one that in my
opinion contains several concerns for existing CVS users.  Note too that
this is after over a half a decade of debate.  Unless you're prepared to
do the implementation yourself, or at least fund it, it may not happen
for another half decade  :-)

-- 
Greg A. Woods

+1 416 218-0098;  [EMAIL PROTECTED];  [EMAIL PROTECTED];  [EMAIL PROTECTED]
Planix, Inc. [EMAIL PROTECTED]; VE3TCP; Secrets of the Weird [EMAIL PROTECTED]




RE: merge mode for XML

2002-04-30 Thread Noel Yap

 Don't hold your breath.  Even the biggest proponents
 of this idea have
 not yet come up with working code as a solid
 proposal -- only what
 amounts to no more than a functional specification,
 and one that in my
 opinion contains several concerns for existing CVS
 users.  Note too that
 this is after over a half a decade of debate. 
 Unless you're prepared to
 do the implementation yourself, or at least fund it,
 it may not happen
 for another half decade  :-)

Unfortunately, this is true.  IMO, it would be great
if diff/merge tools could be plugged into CVS (I think
CVS shouldn't dictate how things are diffed/merged,
although it can suggest; it should only decide when
the diff/merge is supposed to happen).

ClearCase is the only tool on the market I know that
supports this.  I think there was some effort in
creating an open source ClearCase-like tool (katie and
possibly another one?) but I haven't heard anything as
of late.

Noel





RE: merge mode for XML

2002-04-30 Thread Greg A. Woods

[ On Monday, April 29, 2002 at 08:31:24 (+0200), Peter Ring wrote: ]
 Subject: RE: merge mode for XML


I sort of agree with the logic of the arguments for SGML and its
derivatives, but I find the rhetoric about it being the only choice
because it's the best there is (something I've heard whined about for
nearly two decades now) to be no more than self-serving, at best.

As for source code beautification issues w.r.t. XML, well those are no
different than when dealing with any kind of source code primarily
written and edited with an IDE.

 There are, BTW, XML diff tools. See e.g.:
 
   http://www.alphaworks.ibm.com/tech/xmldiffmerge
   http://www.deltaxml.com
   http://www.vmguys.com/vmtools
   http://www.logilab.org/xmldiff
 
 The first one can be used as a merge tool. The other ones can produce an
 XML diff file that -- given a proper XML patch utility -- can update
 one XML file to become the other one.

Well with CVS there's always the choice of manually re-doing every
merge.  You don't have to use CVS to do merges and diffs -- it'll
happily store your files with text-based (i.e. newline separated) diffs
and you can use the better format-specific tools to view deltas and do
merges as you wish.  Obviously a front-end wrapper to CVS that
integrates these tools would be helpful, but it's not strictly necessary
(unless your users are less than clueless :-).

For instance PCL-CVS, the emacs front-end to CVS, allows one to re-do
merges with ediff.  I don't know if ediff could be extended to use
external diff tools (and also perhaps alternate merge tools), or not,
but that may be the best way for users with immediate needs to proceed.
(I.e. even if you're not an emacs user, treat emacs as an application
framework and use emacs+PCL-CVS+ediff as a stand-alone CVS interface.)

 There are, to the best of my knowledge, no freely available stand-alone
 SGML diff tools. Some editors, e.g. ArborText Epic, can do a very nice
 compare.

Would not a full stand-alone SGML diff tool be required to understand
the DTD in order to do a proper job of knowing just how different tagged
elements relate to each other in order to know whether or not they have
to be included in any delta or merge?

-- 
Greg A. Woods

+1 416 218-0098;  [EMAIL PROTECTED];  [EMAIL PROTECTED];  [EMAIL PROTECTED]
Planix, Inc. [EMAIL PROTECTED]; VE3TCP; Secrets of the Weird [EMAIL PROTECTED]




Re: merge mode for XML

2002-04-30 Thread Paul Sander

I'm looking into a ground-up rewrite of CVS from Dick Grune's last shell 
script implementation.  It will take a while to complete a prototype 
because my life is pretty turbulent right now, but it will get done.

On Tuesday, April 30, 2002, at 05:43  AM, [EMAIL PROTECTED] wrote:

 Thanks for offering up the samples Paul.  I read through last Septembers
 thread on giving up cvs.  I see that I stirred up an old debate here 
 (man
 you guys really had it out last time ;).

 With the emergence of xml more and more programs are supporting it as a
 format.  If a cvs - xml diff/merge solution was implemented then cvs 
 could
 capture a huge new level of concurrent development in documentation,
 configuration, and help system docs, etc...

 Any chance some of you cvs wizes could ever implement a modular 
 diff/merge
 subprogram architecture into CVS.  Then we could implement an XML 
 wrapper.

 sean.


 -Original Message-
 From: Paul Sander [mailto:[EMAIL PROTECTED]]
 Sent: Monday, April 29, 2002 12:43 PM
 To: [EMAIL PROTECTED]; [EMAIL PROTECTED]
 Subject: RE: merge mode for XML


 Once again, take a look at message ID#
 [EMAIL PROTECTED]
 posted to this forum on September 16, 2001.  It illustrates
 one way (though
 perhaps not the best way) to do just this.  It relies on a
 lookup table that
 looks up a diff tool given a file's name.

 A better implementation would be to code a symbolic name for
 the merge tool
 in a newphrase in the admin section of the RCS file, and look up
 that symbolic
 name on the client to locate the proper tool.

 --- Forwarded mail from [EMAIL PROTECTED]

 A better approach is to avoid XML entirely in the first place
 -- it's a
 really really horrid syntax with all kinds of goo that's
 usually way
 over-kill for the application, being SGML based and all that


 I agree that XML is overkill, but the truth is that it is
 here to stay.

 XML is fast becoming accepted as the de facto standard for
 data exchange.
 Opto 22 makes machine control sensors / PLCs that publish
 data in XML.
 Siemens is doing similar things from what I understand.
 Java uses XML for
 all of the enterprise application descriptors.  It seems that I can't
 interface to machines, or program without looking at XML.

 If CVS had a way to use modular plug-in diff and merge
 programs, we could
 setup a wrapper file that would automatically diff/merge the file
 differently based on the extension.  e.g.:

 *.xml   xml_dm
 *.html  html_dm

 This way we could write our own diff programs without having
 to understand
 all the complexities of tying into CVS code seamlessly.
 Interfacing is much
 easier.  We could even take the XML diff/merge programs that
 are already
 available and just write wrappers for them.  No point in
 reinventing the
 wheel here.

 --- End of forwarded message from [EMAIL PROTECTED]





RE: merge mode for XML

2002-04-30 Thread Peter Ring

I tried not to argue about the virtues of SGML/XML -- the fact
is that it is there, and any non-proprietary alternatives have
similar properties wrt. meaningful diffs (and thus merges).

My IDE for editing XML and SGML files is usually emacs+psgml.
I like the idea of extending PCL-CVS to invoke another diff
tool (but I'll probably not get around to exploring the idea).

SGML and XML files are really just serialized representations
of parse trees, infosets, and an infoset can be serialized in
many equivalent ways. So diff'ing XML and SGML alike needs a
validating parser, i.e. one that uses a schema such as a DTD.
<rant>There's a class of simple XML documents that live and
die without getting near either a DTD or revision control.
Without a schema and accompanying documentation, there's no
way to tell the semantics of the XML document, and not much
point in version management.</rant>

kind regards
Peter Ring

-Original Message-
From: Greg A. Woods [mailto:[EMAIL PROTECTED]]
Sent: 30. april 2002 19:09
To: Peter Ring
Cc: CVS-II Discussion Mailing List
Subject: RE: merge mode for XML


[ On Monday, April 29, 2002 at 08:31:24 (+0200), Peter Ring wrote: ]
 Subject: RE: merge mode for XML


I sort of agree with the logic of the arguments for SGML and its
derivatives, but I find the rhetoric about it being the only choice
because it's the best there is (something I've heard whined about for
nearly two decades now) to be no more than self-serving, at best.

As for source code beautification issues w.r.t. XML, well those are no
different than when dealing with any kind of source code primarily
written and edited with an IDE.

<snip />

For instance PCL-CVS, the emacs front-end to CVS, allows one to re-do
merges with ediff.  I don't know if ediff could be extended to use
external diff tools (and also perhaps alternate merge tools), or not,
but that may be the best way for users with immediate needs to proceed.
(I.e. even if you're not an emacs user, treat emacs as an application
framework and use emacs+PCL-CVS+ediff as a stand-alone CVS interface.)

 There are, to the best of my knowledge, no freely available stand-alone
 SGML diff tools. Some editors, e.g. ArborText Epic, can do a very nice
 compare.

Would not a full stand-alone SGML diff tool be required to understand
the DTD in order to do a proper job of knowing just how different tagged
elements relate to each other in order to know whether or not they have
to be included in any delta or merge?

--
Greg A. Woods

+1 416 218-0098;  [EMAIL PROTECTED];  [EMAIL PROTECTED];
[EMAIL PROTECTED]
Planix, Inc. [EMAIL PROTECTED]; VE3TCP; Secrets of the Weird
[EMAIL PROTECTED]





RE: merge mode for XML

2002-04-30 Thread Greg A. Woods

[ On Wednesday, May 1, 2002 at 00:04:39 (+0200), Peter Ring wrote: ]
 Subject: RE: merge mode for XML

 SGML and XML files are really just serialized representations
 of parse trees, infosets, and an infoset can be serialized in
 many equivalent ways.

Hmmm... it's just that they have the most horrid syntax...  It's
really too bad, but after all these years that's one heck of a herd of
elephants to try and turn on the head of a pin  :-)

 So diff'ing XML and SGML alike need a
 validating parser, i.e. one that uses a schema such as a DTD.

Yes -- that's what I thought...

 <rant>There's a class of simple XML documents that live and
 die without getting near either a DTD or revision control.
 Without a schema and accompanying documentation, there's no
 way to tell the semantics of the XML document, and not much
 point in version management.</rant>

Amen.  I couldn't agree more!

Those who dare call such things XML are sadly mistaken.

-- 
Greg A. Woods

+1 416 218-0098;  [EMAIL PROTECTED];  [EMAIL PROTECTED];  [EMAIL PROTECTED]
Planix, Inc. [EMAIL PROTECTED]; VE3TCP; Secrets of the Weird [EMAIL PROTECTED]




RE: merge mode for XML

2002-04-29 Thread Peter Ring

One of the most widespread uses of XML is as a neutral storage
and exchange format for documents. In these cases, avoiding XML
or SGML would just imply going back to Word or FrameMaker (and we
don't want that), or to LaTeX or Texi, which are similar wrt.
merging. Or HTML, an application of SGML. And anyway, a lot of
documentation for open source projects is being written in or
converted to DocBook, and will be maintained using the same
revision control tools as the rest of the projects, i.e., cvs.
So we are going to see questions about XML or SGML pop up more
frequently.

Many of the issues wrt. cvs are essentially not much different
from maintaining documentation written in LaTeX or Texi format.
Except you can't assume that authors will be using vi or emacs.
A lot of different tools will be used -- that was one of the
main points of using SGML or XML.

It is common sense to break up a document into mini- or
micro-documents, each with its own lifecycle -- just as you do
for programming source code. The concept of storage management
is built into SGML and XML at a very low level. The customary
way to do this is by declaring entities, symbolic names for
storage objects, which can then be included in other documents
at appropriate places. XInclude and XLink (and for SGML, HyTime)
also offer ways to include or locate parts of documents in terms
of parse trees.

But how about the physical storage format of each file? Authors
will often be using different XML or SGML editors that will
'beautify' the XML or SGML source in different ways, introducing
spurious differences and conflicts. Other sources of spurious
conflicts are character encoding, namespace declarations, and
order of attributes; most documents can be stored in a number of
different ways with no loss of information for the intended use.
But a simple diff will show a lot of differences that are
essentially not there.

Until proper XML repositories become as ubiquitous as cvs, we
might as well find a way to live with it.

The character encoding is easy to control -- SGML and XML are
very explicit about it, and editors do in general handle encoding
gracefully.

Namespace declarations and attribute order are tricky. Things
can be normalized, see Canonical XML, http://www.w3.org/TR/xml-c14n,
but full canonicalization of a document will be too much.
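
As a rough illustration of the normalization idea, and assuming Python 3.8 or later (whose standard library provides `xml.etree.ElementTree.canonicalize`, implementing C14N 2.0 rather than the 1.0 document linked above), two serializations that differ only in attribute order and quote style reduce to the same text. The sample element is made up for the purpose:

```python
from xml.etree.ElementTree import canonicalize

# Two serializations of the same element: attribute order and quote
# style differ, but the information content is identical.
a = "<emphasis role='bold' id=\"e1\">some text</emphasis>"
b = '<emphasis id="e1" role="bold">some text</emphasis>'

canon_a = canonicalize(xml_data=a)
canon_b = canonicalize(xml_data=b)

# A plain text comparison sees a difference; the canonical forms do not.
print(a == b)
print(canon_a == canon_b)
```

Canonicalizing before every commit would make such spurious differences disappear from a line-based diff, at the price of the heavier rewriting mentioned above.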

The 'beautify' problem is even worse, i.e., how to introduce and
remove whitespace in a way that makes cvs behave meaningfully.
I have not yet found a simple recipe for beautifying SGML and XML.
Here are some of the options:

The most generic, simple and safe way to break XML or SGML into
lines is unfortunately not too pretty. Keep any line breaks
already present in the source and, in addition, break just _before_
the markup delimiter close character '>' on the start tag, e.g.:

$ osx xml.dcl beautify.xml
<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE section PUBLIC
"-//OASIS//DTD DocBook XML V4.2//EN"
"docbookx.dtd">
<section
><title
>Beautifying XML</title><para
>Papageno</para><para
>Break inside markup like
this: <emphasis
role="bold"
>some text</emphasis>.</para><para
>Papagena</para></section>

Some tools can beautify in a way more suitable for human consumption:

$ xmllint --format beautify.xml
<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE section PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
"docbookx.dtd">
<section>
  <title>Beautifying XML</title>
  <para>Papageno</para>
  <para>Break inside markup like
this: <emphasis role="bold">some text</emphasis>.</para>
  <para>Papagena</para>
</section>

Keeping white space in character context while beautifying is a simple
way to avoid problems with NOTATION linespecific AKA xml:space='preserve'
AKA <pre>. But the reason we needed a beautifier in the first place is
that editors put in different amounts of whitespace in different places.
If someone out there has a nice and robust XSLT stylesheet for
normalizing/beautifying XML, please publish!

There are, BTW, XML diff tools. See e.g.:

  http://www.alphaworks.ibm.com/tech/xmldiffmerge
  http://www.deltaxml.com
  http://www.vmguys.com/vmtools
  http://www.logilab.org/xmldiff

The first one can be used as a merge tool. The other ones can produce an
XML diff file that -- given a proper XML patch utility -- can update
one XML file to become the other one.

There are, to the best of my knowledge, no freely available stand-alone
SGML diff tools. Some editors, e.g. ArborText Epic, can do a very nice
compare.

kind regards,
Peter Ring


-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]On Behalf Of
Greg A. Woods
Sent: 26. april 2002 23:45
To: CVS-II Discussion Mailing List
Subject: RE: merge mode for XML


<snip>

A better approach is to avoid XML entirely in the first place -- it's a
really really horrid syntax with all kinds of goo that's usually way
over-kill for the application, being SGML based and all that

</snip>



RE: merge mode for XML

2002-04-29 Thread Sean Hager


 A better approach is to avoid XML entirely in the first place
 -- it's a
 really really horrid syntax with all kinds of goo that's usually way
 over-kill for the application, being SGML based and all that


I agree that XML is overkill, but the truth is that it is here to stay.

XML is fast becoming accepted as the de facto standard for data exchange.
Opto 22 makes machine control sensors / PLCs that publish data in XML.
Siemens is doing similar things from what I understand.  Java uses XML for
all of the enterprise application descriptors.  It seems that I can't
interface to machines, or program without looking at XML.

If CVS had a way to use modular plug-in diff and merge programs, we could
setup a wrapper file that would automatically diff/merge the file
differently based on the extension.  e.g.:

*.xml   xml_dm
*.html  html_dm

This way we could write our own diff programs without having to understand
all the complexities of tying into CVS code seamlessly.  Interfacing is much
easier.  We could even take the XML diff/merge programs that are already
available and just write wrappers for them.  No point in reinventing the
wheel here.
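
A minimal sketch of how such a wrapper table might be consulted, assuming nothing about CVS internals. The table contents repeat the two entries above; the function name `merge_tool_for` and the `diff3` fallback are hypothetical choices made only for this illustration:

```python
import fnmatch

# Hypothetical wrapper table: filename pattern -> diff/merge program name.
# xml_dm and html_dm are the placeholder names used in the message above.
WRAPPERS = [
    ("*.xml", "xml_dm"),
    ("*.html", "html_dm"),
]

# Assumed fallback: the ordinary line-based three-way merge.
DEFAULT_TOOL = "diff3"


def merge_tool_for(filename):
    """Return the tool for the first matching pattern, else the default."""
    for pattern, tool in WRAPPERS:
        # fnmatchcase avoids platform-dependent case folding.
        if fnmatch.fnmatchcase(filename, pattern):
            return tool
    return DEFAULT_TOOL


print(merge_tool_for("build.xml"))
print(merge_tool_for("main.c"))
```

Keeping the lookup outside the version control tool itself is the point of the proposal: a new file type only needs a new table row and a wrapper program.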

Sean.






RE: merge mode for XML

2002-04-29 Thread Gary Bisaga

Doesn't everyone format their XML like that?  I.e. like HTML so that
tags are on their own lines and there are extra blank lines (that won't
be treated as data) between groups of items and even between items too?

Not only do tools not always follow these rules, you can't even always treat
HTML like that. Besides making a huge file, it messes up the rendering of
tables with sliced-up images.





RE: merge mode for XML

2002-04-29 Thread Paul Sander

Once again, take a look at message ID# [EMAIL PROTECTED]
posted to this forum on September 16, 2001.  It illustrates one way (though
perhaps not the best way) to do just this.  It relies on a lookup table that
looks up a diff tool given a file's name.

A better implementation would be to code a symbolic name for the merge tool
in a newphrase in the admin section of the RCS file, and look up that symbolic
name on the client to locate the proper tool.

--- Forwarded mail from [EMAIL PROTECTED]

 A better approach is to avoid XML entirely in the first place
 -- it's a
 really really horrid syntax with all kinds of goo that's usually way
 over-kill for the application, being SGML based and all that


I agree that XML is overkill, but the truth is that it is here to stay.

XML is fast becoming accepted as the de facto standard for data exchange.
Opto 22 makes machine control sensors / PLCs that publish data in XML.
Siemens is doing similar things from what I understand.  Java uses XML for
all of the enterprise application descriptors.  It seems that I can't
interface to machines, or program without looking at XML.

If CVS had a way to use modular plug-in diff and merge programs, we could
setup a wrapper file that would automatically diff/merge the file
differently based on the extension.  e.g.:

*.xml  xml_dm
*.html html_dm

This way we could write our own diff programs without having to understand
all the complexities of tying into CVS code seamlessly.  Interfacing is much
easier.  We could even take the XML diff/merge programs that are already
available and just write wrappers for them.  No point in reinventing the
wheel here.

--- End of forwarded message from [EMAIL PROTECTED]





RE: merge mode for XML

2002-04-29 Thread Greg A. Woods

[ On Monday, April 29, 2002 at 07:23:03 (-0700), Noel Yap wrote: ]
 Subject: RE: merge mode for XML

 --- Sean Hager [EMAIL PROTECTED] wrote:
  If CVS had a way to use modular plug-in diff and
  merge programs, we could
  setup a wrapper file that would automatically
  diff/merge the file
  differently based on the extension.  e.g.:
  
  *.xml   xml_dm
  *.html  html_dm
 
 Ideally, the diff/merge tool would be tied to the type
 of the file and the type of the file is initially set
 depending on the extension.  This way, one would be
 able to change the type of the file independent of its
 extension.

Yes, please and thankyou!  :-)

-- 
Greg A. Woods

+1 416 218-0098;  [EMAIL PROTECTED];  [EMAIL PROTECTED];  [EMAIL PROTECTED]
Planix, Inc. [EMAIL PROTECTED]; VE3TCP; Secrets of the Weird [EMAIL PROTECTED]




RE: merge mode for XML

2002-04-29 Thread Greg A. Woods

[ On Monday, April 29, 2002 at 09:54:50 (-0400), Gary Bisaga wrote: ]
 Subject: RE: merge mode for XML

 Doesn't everyone format their XML like that?  I.e. like HTML so that
 tags are on their own lines and there are extra blank lines (that won't
 be treated as data) between groups of items and even between items too?
 
 Not only do tools not always follow these rules, you can't even always treat
 HTML like that. Besides making a huge file, it messes up the rendering of
 tables with sliced-up images.

I'm not talking about placing blank lines and/or comments in places
where they would be considered part of the data.  There are plenty of
places to do so without affecting the interpretation of the data.

-- 
Greg A. Woods

+1 416 218-0098;  [EMAIL PROTECTED];  [EMAIL PROTECTED];  [EMAIL PROTECTED]
Planix, Inc. [EMAIL PROTECTED]; VE3TCP; Secrets of the Weird [EMAIL PROTECTED]




RE: merge mode for XML

2002-04-29 Thread Noel Yap

--- Greg A. Woods [EMAIL PROTECTED] wrote:
  Not only do tools not always follow these rules,
 you can't even always treat
  HTML like that. Besides making a huge file, it
 messes up the rendering of
  tables with sliced-up images.
 
 I'm not talking about placing blank lines and/or
 comments in places
 where they would be considered part of the data. 
 There are plenty of
 places to do so without affecting the interpretation
 of the data.

In theory, this is easy to do, but in practice I have
seen browsers act differently due to whitespace that
really shouldn't affect the rendering.  IIRC,
<table><tr> worked differently from <table>\n<tr>
on at least one browser.

Noel





RE: merge mode for XML

2002-04-29 Thread Greg A. Woods

[ On Monday, April 29, 2002 at 13:25:48 (-0700), Noel Yap wrote: ]
 Subject: RE: merge mode for XML

 In theory, this is easy to do, but in practice I have
 seen browsers act differently due to whitespace that
 really shouldn't affect the rendering.  IIRC,
 <table><tr> worked differently from <table>\n<tr>
 on at least one browser.

I don't know of any browsers that parse XML -- you seem to be talking
about HTML.

In any case who the heck cares about a broken HTML parser in some random
browser (even if it is a commonly used version of M$-Exploder)?!?!?!?

If your users can't use standards-compliant browsers then get new users!  :-)

-- 
Greg A. Woods

+1 416 218-0098;  [EMAIL PROTECTED];  [EMAIL PROTECTED];  [EMAIL PROTECTED]
Planix, Inc. [EMAIL PROTECTED]; VE3TCP; Secrets of the Weird [EMAIL PROTECTED]




RE: merge mode for XML

2002-04-29 Thread Noel Yap

--- Greg A. Woods [EMAIL PROTECTED] wrote:
 [ On Monday, April 29, 2002 at 13:25:48 (-0700),
 Noel Yap wrote: ]
  Subject: RE: merge mode for XML
 
  In theory, this is easy to do, but in practice I
 have
  seen browsers act differently due to whitespace
 that
  really shouldn't affect the rendering.  IIRC,
  <table><tr> worked differently from
 <table>\n<tr>
  on at least one browser.
 
 I don't know of any browsers that parse XML -- you
 seem to be talking
 about HTML.

I apologize for truncating the portion of the post you
had responded to.  Here it is as a reminder so that
you can take my post in its intended context:
 Doesn't everyone format their XML like that?  I.e.
like HTML so that
 tags are on their own lines and there are extra
blank lines (that 
won't
 be treated as data) between groups of items and
even between items 
too?

 In any case who the heck cares about a broken HTML
 parser in some random
 browser (even if it is a commonly used version of
 M$-Exploder)?!?!?!?

Actually, I believe it was the other commonly used
browser.

 If your users can't use standards-compliant browsers
 then get new users!  :-)

In any case, those of us who program as a service to
the greater community have to deal with broken
software no matter how much we hate it.  Greg, which
corporation did you say you worked in again?  In what
industry?

Noel




RE: merge mode for XML

2002-04-26 Thread Sean Hager



 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED]]On Behalf Of
 Greg A. Woods
 Sent: Thursday, April 25, 2002 8:24 PM
 To: [EMAIL PROTECTED]
 Cc: CVS-II Discussion Mailing List
 Subject: Re: merge mode for XML
 
 
 [ On Thursday, April 25, 2002 at 16:10:37 (-0500), Sean Hager wrote: ]
  Subject: merge mode for XML
 
  Is there a merge mode or merge algorithm that works well 
 for XML files?
 
 Doesn't diff3 work well enough?
 
 XML files are more or less just text, right?
 
 If the tags are all on separate lines, then regardless of whether
 content is changed, or tags are changed, diff3 will do the 
 right thing.
 
 -- 
   
 Greg A. Woods


I did test the merge with xml files and it worked fine
as long as we made edits more than 5 to 10 lines apart from each other.

When we made edits that were only 1 or 2 lines apart we got conflicts
which I felt the merge should have been able to solve.

I am not a CVS expert, but I was thinking that perhaps the tags
diff3 was looking for were different for xml.  I am going to test it
some more, perhaps I will have to look into the source code, but I
was hoping someone on the list had some experience with this.

sean.







RE: merge mode for XML

2002-04-26 Thread Gary Bisaga

Sean,
Our result is more or less the same as yours. We do this all the time. The
diff does work, but it seems to get confused when people are making changes
close to each other. FWIW, it seems to have problems with whitespace and
attribute values. Not a very scientific result, I grant you, but that has
been my experience.

Our solution (obviously wouldn't work for everybody) is just to partition
the XML into multiple files, each of which will hopefully be worked on
mostly by one person at a time. Our configuration manager reads in all
similarly-named XML files in parallel, so if you put everything into one
file or split it into 100 it doesn't really matter. I even encourage people
working on their own part of the system for a while to create a temporary
'jimsmith.page_registry.xml' file to reduce merge conflicts. Works for us.
Since we have done this, it has cut way back on merge conflicts and the
corresponding system errors.
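
A rough sketch of the "read all similarly-named files" idea, not the actual configuration manager described above. The function names, the `registry` root tag, and the glob pattern are assumptions for illustration; only the file name `jimsmith.page_registry.xml` comes from the message:

```python
from pathlib import Path
import xml.etree.ElementTree as ET


def merge_fragments(xml_texts, root_tag="registry"):
    """Combine the children of several XML documents under one new root."""
    merged = ET.Element(root_tag)
    for text in xml_texts:
        # Iterating an Element yields its direct children.
        merged.extend(ET.fromstring(text))
    return merged


def load_registry(directory, pattern="*page_registry.xml"):
    """Read every matching file, e.g. jimsmith.page_registry.xml."""
    paths = sorted(Path(directory).glob(pattern))
    return merge_fragments(p.read_bytes() for p in paths)


parts = [
    '<registry><page id="a"/></registry>',
    '<registry><page id="b"/><page id="c"/></registry>',
]
merged = merge_fragments(parts)
print([page.get("id") for page in merged])
```

Because the reader treats one file or a hundred the same way, each person can work in a private fragment and two people rarely touch the same file.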

BTW, I found this a useful technique with locking CM systems like PVCS too;
just replace "merge conflicts" with "locked files".

 gary

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]On Behalf Of
Sean Hager
Sent: Friday, April 26, 2002 9:17 AM
To: 'CVS-II Discussion Mailing List'
Subject: RE: merge mode for XML




 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED]]On Behalf Of
 Greg A. Woods
 Sent: Thursday, April 25, 2002 8:24 PM
 To: [EMAIL PROTECTED]
 Cc: CVS-II Discussion Mailing List
 Subject: Re: merge mode for XML


 [ On Thursday, April 25, 2002 at 16:10:37 (-0500), Sean Hager wrote: ]
  Subject: merge mode for XML
 
  Is there a merge mode or merge algorithm that works well
 for XML files?

 Doesn't diff3 work well enough?

 XML files are more or less just text, right?

 If the tags are all on separate lines, then regardless of whether
 content is changed, or tags are changed, diff3 will do the
 right thing.

 --

 Greg A. Woods


I did test the merge with xml files and it worked fine
as long as we made edits more than 5 to 10 lines apart from each other.

When we made edits that were only 1 or 2 lines apart we got conflicts
which I felt the merge should have been able to solve.

I am not a CVS expert, but I was thinking that perhaps the tags
diff3 was looking for were different for xml.  I am going to test it
some more, perhaps I will have to look into the source code, but I
was hoping someone on the list had some experience with this.

sean.







RE: merge mode for XML

2002-04-26 Thread EXT-Corcoran, David

It helps me to think of a plain ASCII text file source (C,java,perl etc) as
a markup language where a newline is the only tag.

To extend the delta generation of a more structured markup language, such as
XML, probably would require knowledge of that syntax by the diff program.

A quick and dirty approach may be to prefix all opening tags with a newline,
suffix all closing tags with a newline, and then remove all blank lines
unless inside a tag (in and among CDATA); essentially run it through an xml
equiv of cb(1) or indent(1).
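
A quick sketch of that approach using regular expressions, under the loud assumption that the input is simple markup with no CDATA sections and no whitespace-sensitive content (a real tool would need a proper parser to leave those alone). The function name `one_tag_per_line` is made up for this example:

```python
import re


def one_tag_per_line(xml_text):
    """Put a newline before opening tags and after closing tags.

    Naive: does not protect CDATA sections or significant whitespace.
    """
    # Newline before each opening tag (skips closing tags, comments,
    # declarations and processing instructions).
    text = re.sub(r"(<[^/!?][^>]*>)", r"\n\1", xml_text)
    # Newline after each closing tag.
    text = re.sub(r"(</[^>]+>)", r"\1\n", text)
    # Drop the blank lines the two steps leave behind.
    return "\n".join(line for line in text.splitlines() if line.strip())


print(one_tag_per_line("<a><b>x</b><c>y</c></a>"))
```

With each element on its own line, a line-based diff has a much better chance of reporting a change to one element without dragging its neighbours along.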


--@@ 
   ~ 
 DavidC

No State shall convert a liberty into a privilege, license it, and charge a
fee therefore. 

~ Murdock v. Pennsylvania, 319 US 105, US Supreme Court, 1943.


 -Original Message-
 From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
 Sent: Thursday, April 25, 2002 6:24 PM
 To: [EMAIL PROTECTED]
 Cc: [EMAIL PROTECTED]
 Subject: Re: merge mode for XML
 
 
 [ On Thursday, April 25, 2002 at 16:10:37 (-0500), Sean Hager wrote: ]
  Subject: merge mode for XML
 
  Is there a merge mode or merge algorithm that works well 
 for XML files?
 
 Doesn't diff3 work well enough?
 
 XML files are more or less just text, right?
 
 If the tags are all on separate lines, then regardless of whether
 content is changed, or tags are changed, diff3 will do the 
 right thing.
 
 -- 
   
 Greg A. Woods
 
 +1 416 218-0098;  [EMAIL PROTECTED];  [EMAIL PROTECTED];  
 [EMAIL PROTECTED]
 Planix, Inc. [EMAIL PROTECTED]; VE3TCP; Secrets of the Weird 
 [EMAIL PROTECTED]
 
 




RE: merge mode for XML

2002-04-26 Thread Greg A. Woods

[ On Friday, April 26, 2002 at 08:16:44 (-0500), Sean Hager wrote: ]
 Subject: RE: merge mode for XML

 I am not a CVS expert, but I was thinking that perhaps the tags
 diff3 was looking for were different for XML.  I am going to test it
 some more; perhaps I will have to look into the source code, but I
 was hoping someone on the list had some experience with this.

Diff and diff3 look for changes across entire lines of text -- i.e. the
separator is the newline.  Judicious use of blank lines, or other syntax
elements that won't normally change (e.g. comments!) will help separate
related units by enough static lines that diff and diff3 won't blur
them into one change.
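
A rough illustration of why nearby edits collide, assuming Python's difflib
as a stand-in for diff. This is not diff3's algorithm; it is a simplified
model in which two edits collide when the changed line ranges, measured
against the common ancestor, overlap or touch with no unchanged line between
them. The names changed_ranges and edits_collide are hypothetical, and real
diff3 results also depend on how diff happens to pair up the lines.

```python
import difflib


def changed_ranges(base, other):
    """Return half-open (start, end) ranges of base lines that differ in other."""
    matcher = difflib.SequenceMatcher(a=base, b=other, autojunk=False)
    return [(i1, i2) for tag, i1, i2, _, _ in matcher.get_opcodes() if tag != "equal"]


def edits_collide(base, mine, yours):
    """True if a change in mine overlaps or touches a change in yours."""
    for a1, a2 in changed_ranges(base, mine):
        for b1, b2 in changed_ranges(base, yours):
            if a1 <= b2 and b1 <= a2:
                return True
    return False


if __name__ == "__main__":
    # Two edits on adjacent lines: no static line separates them.
    base = ["<a>", "<b>1</b>", "<c>2</c>", "</a>"]
    mine = ["<a>", "<b>9</b>", "<c>2</c>", "</a>"]
    yours = ["<a>", "<b>1</b>", "<c>7</c>", "</a>"]
    print(edits_collide(base, mine, yours))

    # The same edits with a blank line between the two elements.
    base2 = ["<a>", "<b>1</b>", "", "<c>2</c>", "</a>"]
    mine2 = ["<a>", "<b>9</b>", "", "<c>2</c>", "</a>"]
    yours2 = ["<a>", "<b>1</b>", "", "<c>7</c>", "</a>"]
    print(edits_collide(base2, mine2, yours2))
```

In this model the blank line is the static separator: with it, the two
changed ranges no longer touch.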

-- 
Greg A. Woods

+1 416 218-0098;  [EMAIL PROTECTED];  [EMAIL PROTECTED];  [EMAIL PROTECTED]
Planix, Inc. [EMAIL PROTECTED]; VE3TCP; Secrets of the Weird [EMAIL PROTECTED]




RE: merge mode for XML

2002-04-26 Thread Greg A. Woods

[ On Friday, April 26, 2002 at 06:51:36 (-0700), EXT-Corcoran, David wrote: ]
 Subject: RE: merge mode for XML

 It helps me to think of a plain ASCII text source file (C, Java, Perl, etc.) as
 a markup language where a newline is the only tag.

:-)

 To extend the delta generation of a more structured markup language, such as
 XML, probably would require knowledge of that syntax by the diff program.

Just as it could with any language.  (E.g. C statements and expressions
should really be treated as units.)

 A quick and dirty approach may be to prefix all opening tags with a newline,
 suffix all closing tags with a newline, and then remove all blank lines
 unless inside a tag (in and among CDATA); essentially run it through an XML
 equivalent of cb(1) or indent(1).

Doesn't everyone format their XML like that?  I.e. like HTML so that
tags are on their own lines and there are extra blank lines (that won't
be treated as data) between groups of items and even between items too?

A better approach is to avoid XML entirely in the first place -- it's a
really, really horrid syntax with all kinds of goo that's usually way
overkill for the application, being SGML-based and all that.

You're no worse off defining a proper little language for your data.
Writing a parser with modern grammar compilers is no harder than writing
a good DTD, and it means you can avoid having to have all that annoying
useless syntax that just gets in your way.  By following common
approaches to simple data description syntax design one can even make a
custom little language trivial for most programmers to learn (thus
avoiding one of the few arguments against using a custom language).
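
As a rough illustration of how small such a little language can be, here is
a sketch assuming Python and a hand-written parser rather than a grammar
compiler. The format itself is hypothetical, not one proposed in this
thread: "key: value" lines, records separated by blank lines, and "#"
comment lines. The name parse_records is also an assumption.

```python
def parse_records(text):
    """Parse blank-line-separated records of 'key: value' lines into dicts."""
    records = []
    current = {}
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if line.startswith("#"):
            continue  # comment line, ignored
        if not line:
            # A blank line ends the current record, if there is one.
            if current:
                records.append(current)
                current = {}
            continue
        key, sep, value = line.partition(":")
        if not sep or not key.strip():
            raise ValueError(f"line {number}: expected 'key: value', got {raw!r}")
        current[key.strip()] = value.strip()
    if current:
        records.append(current)
    return records


if __name__ == "__main__":
    sample = "name: alpha\nport: 80\n\n# second record\nname: beta\n"
    print(parse_records(sample))
```

A format with one field per line and blank lines between records also suits
line-based tools such as diff3, since each field change is confined to one
line.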

-- 
Greg A. Woods

+1 416 218-0098;  [EMAIL PROTECTED];  [EMAIL PROTECTED];  [EMAIL PROTECTED]
Planix, Inc. [EMAIL PROTECTED]; VE3TCP; Secrets of the Weird [EMAIL PROTECTED]
