RE: merge mode for XML
This thread is a die-hard, but it is still the best conversation on the list ;)

sean.

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Peter Ring
Sent: Tuesday, May 14, 2002 7:16 PM
To: [EMAIL PROTECTED]
Subject: RE: merge mode for XML
Importance: Low

<snip />

_______________________________________________
Info-cvs mailing list
[EMAIL PROTECTED]
http://mail.gnu.org/mailman/listinfo/info-cvs
RE: merge mode for XML
[ On Monday, May 13, 2002 at 10:31:36 (-0700), Glew, Andy wrote: ]
> Subject: RE: merge mode for XML
>
> > > Motivation: schema changes in most existing relational databases are onerous.
> >
> > For very good reason.
>
> And what is that reason?

Because an RDBMS cannot understand the semantic qualities of your data unless you describe them to it -- they are, by definition, not intuitive.

> An application area that I frequently work in nowadays is experimental databases - databases for experimental data. I want to archive all of my experimental data in a form that allows me to do arbitrary SQL-like queries over it.

I think you really need to look at flat text files. IIRC there are SQL engines available that will access them -- and with about the same performance expectations, though in my book AWK will do as well. Even if you have millions of results per table AWK will do just fine.

> Problem is, as I continue my research, the format of my records is continually changing.

Or maybe OODBMS technology is more what you need. But that may cost you more than using AWK.

> I've tried to do this in a traditional RDBMS database. I've asked database experts like DeWitt and the guy who invented transactions whose name I can't remember now... and the answer always comes that the traditional RDBMS way is to create a database in fully normalized form, of the form Experiment#:Metric:Value. Worse, it may be necessary to create several different tables for each type. It is impossible for ordinary humans to write queries in such a form.

Heh. Yes, you do need an OODBMS.

> Yet, self-schematization makes it trivial to do.

Perhaps only in an OODBMS.

> Well, DeWitt is the big advocate of ORDBMS - Object Relational DBMS.

Hmm. I've not been a big fan of ORDBMS, but then I'm not a huge DBMS user. I don't know what the tradeoffs are w.r.t. a true OODBMS.

-- 
Greg A. Woods
+1 416 218-0098; [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED]
Planix, Inc. [EMAIL PROTECTED]; VE3TCP; Secrets of the Weird [EMAIL PROTECTED]
RE: merge mode for XML
> > Motivation: schema changes in most existing relational databases are onerous.
>
> For very good reason.

And what is that reason?

OK, I admit that some RDBMS applications in production need stability - just like some systems software applications (the kind Greg seems to work on, the kind I used to work on) value stability above all else, and actively want to make it hard to change things.

However, there are other application domains - in programming, the domains attacked by agile methodologies like XP (eXtreme Programming). {Donning asbestos underwear, expecting Greg to flame.}

An application area that I frequently work in nowadays is experimental databases - databases for experimental data. I want to archive all of my experimental data in a form that allows me to do arbitrary SQL-like queries over it.

Problem is, as I continue my research, the format of my records is continually changing. For example, a few years ago I might have recorded CPU MHz and Cache Size as configuration parameters - now I have to record at least 3 different cache sizes, as well as multiple clock domain frequencies. Not to mention that the observations that I record are constantly changing.

Rather than continually reformatting my database, adding new fields which are Unknown or Null on old data, I find it easier to add records containing fields that were not known earlier.

I've tried to do this in a traditional RDBMS database. I've asked database experts like DeWitt and the guy who invented transactions whose name I can't remember now... and the answer always comes that the traditional RDBMS way is to create a database in fully normalized form, of the form Experiment#:Metric:Value. Worse, it may be necessary to create several different tables for each type. It is impossible for ordinary humans to write queries in such a form.

Yet, self-schematization makes it trivial to do. All that is needed is more flexible handling of nulls than most RDBMSes support - more like the handling that Codd, Date, and Darwen advocate.

> I suspect Dewitt is thinking a little bit deeper than you suspect. Certainly data can be self-describing -- that's what OO is all about. OO databases can effectively be queried about their schemas... An RDBMS, however, is not an OODBMS.

Well, DeWitt is the big advocate of ORDBMS - Object Relational DBMS.

> Whether an XML document without a DTD and/or schema can be considered self-describing enough to be independent like an object instance or a set of object instances, is probably what you're trying to argue, but I won't go any further since such a thing is strictly outside the scope of XML proper and is way outside the scope of what a common tool like CVS should ever deem worthy of dealing with.

Fair enough. My original email was prompted by email from you, Greg, that sounded like CVS should not have support for XML, like supporting file-format-specific diff and merge, because XML without a DTD is meaningless. I reject that as a specious argument.

Your remaining argument, that nobody has stepped up to do external diff and merge, remains valid. (Ditto wrt file renaming, multiple repositories, etc.)
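
The contrast drawn in the message above, between the fully normalized Experiment#:Metric:Value form and records that each carry their own fields, can be sketched roughly as follows. This is only an illustrative Python sketch, not Perl-SQL or any system mentioned in the thread; the field names (cpu_mhz, l2_kb, ipc and so on) and the helper functions are invented for the example.

```python
from typing import Any

# Hypothetical experiment records: each record carries only the fields
# that were known when it was written (all names invented for illustration).
records: list[dict[str, Any]] = [
    {"experiment": 1, "cpu_mhz": 500, "cache_kb": 256, "ipc": 0.9},
    {"experiment": 2, "core_mhz": 2000, "bus_mhz": 400,
     "l1_kb": 8, "l2_kb": 256, "l3_kb": 1024, "ipc": 1.4},
]


def to_triples(recs):
    """Flatten records into the normalized (experiment, metric, value) form."""
    return [(r["experiment"], k, v)
            for r in recs for k, v in r.items() if k != "experiment"]


def select(recs, fields, where=lambda r: True):
    """SQL-like projection; a field a record never defined comes back as None."""
    return [tuple(r.get(f) for f in fields) for r in recs if where(r)]


def select_from_triples(triples, fields):
    """The same projection over triples needs a regrouping (pivot) step first."""
    by_exp: dict[int, dict[str, Any]] = {}
    for exp, metric, value in triples:
        by_exp.setdefault(exp, {"experiment": exp})[metric] = value
    return select(list(by_exp.values()), fields)
```

Querying the per-record form is a single call such as select(records, ["experiment", "l2_kb", "ipc"]), with older records simply yielding None for fields they never had. The triple form holds the same information but has to be pivoted back into records before the same question can be asked, which is roughly the inconvenience the message complains about.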
RE: merge mode for XML
A paper that will interest you:

(preliminary version)
http://citeseer.nj.nec.com/cache/papers/cs/15339/http:zSzzSzwww.cs.arizona.eduzSzpeoplezSztodszSzacceptedzSz2000zSzParsonsEmancipating.pdf/parsons00emancipating.pdf

(published)
http://portal.acm.org/citation.cfm?id=357778&coll=portal&dl=ACM&CFID=2131136&CFTOKEN=70981949

Abstract: Database design commonly assumes, explicitly or implicitly, that instances must belong to classes. This can be termed the assumption of inherent classification. We argue that the extent and complexity of problems in schema integration, schema evolution, and interoperability are, to a large extent, consequences of inherent classification. Furthermore, we make the case that the assumption of inherent classification violates philosophical and cognitive guidelines on classification and is, therefore, inappropriate in view of the role of data modeling in representing knowledge about application domains.

Also, a search for 'semantic interoperability' should return some interesting hits.

To tell the difference between two (or three) sequences of bytes is not too difficult; comparing two sequences A and B to determine their longest common subsequence (LCS) or the edit distance between them has been much studied. GNU diff is based on an algorithm published by Eugene W. Myers in 1986.

To tell the difference (distance) between two semantic structures is difficult in a very fundamental way.

Kind regards
Peter Ring

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Glew, Andy
Sent: 13 May 2002 19:32
To: [EMAIL PROTECTED]; Glew, Andy
Cc: Gary Bisaga
Subject: RE: merge mode for XML

> > Motivation: schema changes in most existing relational databases are onerous.
>
> For very good reason.

And what is that reason?

OK, I admit that some RDBMS applications in production need stability - just like some systems software applications (the kind Greg seems to work on, the kind I used to work on) value stability above all else, and actively want to make it hard to change things.

However, there are other application domains - in programming, the domains attacked by agile methodologies like XP (eXtreme Programming). {Donning asbestos underwear, expecting Greg to flame.}

An application area that I frequently work in nowadays is experimental databases - databases for experimental data. I want to archive all of my experimental data in a form that allows me to do arbitrary SQL-like queries over it.

Problem is, as I continue my research, the format of my records is continually changing. For example, a few years ago I might have recorded CPU MHz and Cache Size as configuration parameters - now I have to record at least 3 different cache sizes, as well as multiple clock domain frequencies. Not to mention that the observations that I record are constantly changing.

Rather than continually reformatting my database, adding new fields which are Unknown or Null on old data, I find it easier to add records containing fields that were not known earlier.

<snip />
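
As a small illustration of the byte-sequence case Peter describes, here is a sketch of the longest common subsequence computed with the textbook dynamic-programming table. Note the assumption: this is the simple O(n*m) formulation, not the Myers algorithm that GNU diff is based on, and the function names are chosen for this example only.

```python
def lcs(a, b):
    """Return one longest common subsequence of sequences a and b, as a list."""
    n, m = len(a), len(b)
    # table[i][j] holds the LCS length of the suffixes a[i:] and b[j:].
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    # Walk the table from the top-left corner to recover one subsequence.
    out, i, j = [], 0, 0
    while i < n and j < m:
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1
        else:
            j += 1
    return out


def edit_distance(a, b):
    """Insert/delete-only edit distance, derived from the LCS length."""
    return len(a) + len(b) - 2 * len(lcs(a, b))
```

Everything outside the common subsequence is what a line-oriented diff reports as deletions and insertions. Nothing comparable exists for two semantic structures, which is Peter's point: the hard part is deciding what counts as "the same element", not the search itself.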
RE: merge mode for XML
> I disagree. Doing this would force a policy onto CVS users where such a policy isn't really necessary. I think using extensions for any decision making is bad. Don't you think it would be bad to force the same diff/merge onto several files that had no extension?
>
> There's two important issues here, really:
>
> 1. The default diff/merge for a new file.
> 2. The actual diff/merge of an existing file.
>
> Greg is talking about the second issue. I have an inkling keeping this info on a per-version basis won't work but I haven't come up with anything substantial.
>
> I'm not sure which issue you're talking about. If it's the second, then using extensions would not allow anyone to override the diff/merge for any reason thereby putting the users at the mercy of CVS.

Actually pattern matching would put the users at the mercy of CVS more than extension (really I mean wildcard) matching. Pattern matching could be very unreliable and produce different results based on the content of the document per version, when the format per version has not changed.

Wildcard matching puts the users in the driver's seat. You can control how CVS will work with your files with naming conventions.

I think programmers are smart enough to follow naming conventions, and understand the consequences of breaking the conventions.

sean.
RE: merge mode for XML
--- Sean Hager [EMAIL PROTECTED] wrote:
> Actually pattern matching would put the users at the mercy of CVS more than extension (really I mean wildcard) matching. Pattern matching could be very unreliable and produce different results based on the content of the document per version, when the format per version has not changed.
>
> Wildcard matching puts the users in the driver's seat. You can control how CVS will work with your files with naming conventions.

I had understood pattern matching to be pattern matching the name, not the contents, of the file. In this context, pattern matching would be an extended form of extension matching.

OTOH, the pattern matching you mention is more like the magic file. I actually think this is an even better mechanism. IIRC, magic files use several methods (including extension matching and some content checking) to guess at a file's type. Note that the content that magic looks at is typically in the header or footer of the file in question. These first and/or last few bytes of files are pretty good ways to guess at a file's type. See "man file" for an example of how well this works.

> I think programmers are smart enough to follow naming conventions, and understand the consequences of breaking the conventions.

I agree. This doesn't, however, cover all the remaining issues with regards to extension checking:

1. Extensions don't have one-to-one mapping with file types.
2. Not all files have extensions (this is actually a specific case of the former).

It's also not clear whether you're talking about using extensions for the initial settings, or for the life-time settings, of the file. Do you think users should be able to override whatever CVS thinks should happen?

Noel
RE: merge mode for XML
--- Sean Hager [EMAIL PROTECTED] wrote:
> but only unix files have magic file numbers correct?

No, I have file working on Win2k. If, however, a system doesn't have magic files, CVS can fall back on pattern matching filenames to set the initial behaviours upon cvs add. The alternative would be for CVS to implement its own magic system.

> > I agree. This doesn't, however, cover all the remaining issues with regards to extension checking:
> >
> > 1. Extensions don't have one-to-one mapping with file types.
> > 2. Not all files have extensions (this is actually a specific case of the former).
> >
> > It's also not clear whether you're talking about using extensions for the initial settings, or for the life-time settings, of the file. Do you think users should be able to override whatever CVS thinks should happen?
> >
> > Noel
>
> For the solid, predictable, common cases CVS could have out of the box configurations. For the not so clear cases, admins would configure the particular installation, and perhaps have to establish some conventions (naming) to isolate the file type.

Don't you think there'd be a huge discussion as to what exactly constitutes solid, predictable, common cases? Even if key people agreed on this list, don't you think it'll bloat CVS?

It sounds like you're saying that, once configured, users cannot override the settings. This is bad since, as was stated before, extensions don't have a one-to-one mapping to types.

> But, at least they have the options to do so if they need to.

Not if CVS has the final word as to the type of the file (whether it's configured or not).

Noel
RE: merge mode for XML
[ On Monday, May 6, 2002 at 07:58:09 (-0500), Sean Hager wrote: ]
> Subject: RE: merge mode for XML
>
> Actually pattern matching would put the users at the mercy of CVS more than extension (really I mean wildcard) matching.

Wildcard matching *is* pattern matching! Unless you mean regular expressions when you say pattern.

> Pattern matching could be very unreliable and produce different results based on the content of the document per version, when the format per version has not changed.

Ah, you're talking about some form of pattern matching on the content as opposed to filename matching.

Content matching is actually a hell of a lot more reliable than filename matching. After all it's the content type we're trying to determine, not some mythical file naming convention!

PLEASE look at the sources of, and references to, the file(1) command. (e.g. including the Apache mod_mime_magic content identification module)

> Wildcard matching puts the users in the driver's seat. You can control how CVS will work with your files with naming conventions.
>
> I think programmers are smart enough to follow naming conventions, and understand the consequences of breaking the conventions.

No matter what the matching technology, nor whether it matches against the filename or the file content, it's still got to be used _only_ as an initial guess as to the file content type.

Every revision's deltatext should contain the file type in a newphrase (this is simpler than trying to track resurrections against branches, etc.).

It must also be possible to set and reset the file type for any given revision(s) in order to correct any initial matching failures.

-- 
Greg A. Woods
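
The scheme argued for above (match on content first, treat the result only as an initial guess, and let an explicit setting correct it) might look roughly like the following. This is a hedged Python sketch and not how file(1), mod_mime_magic or CVS work internally: the two tables are tiny invented stand-ins for a real magic database and a wrappers-style pattern file, and both function names are hypothetical.

```python
import fnmatch

# Hypothetical tables, for illustration only; a real magic database is far larger.
MAGIC = [
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"<?xml", "text/xml"),
]
NAME_PATTERNS = [
    ("*.xml", "text/xml"),
    ("*.jpg", "image/jpeg"),
    ("*.[ch]", "text/plain"),
]


def guess_type(filename, head, default="application/octet-stream"):
    """Initial guess only: leading bytes first, filename pattern as a fallback."""
    for prefix, ftype in MAGIC:
        if head.startswith(prefix):
            return ftype
    for pattern, ftype in NAME_PATTERNS:
        if fnmatch.fnmatchcase(filename, pattern):
            return ftype
    return default


def recorded_type(filename, head, override=None):
    """The type to store with a revision: an explicit user setting always wins."""
    return override if override is not None else guess_type(filename, head)
```

With these tables, a GIF saved under the name logo.jpg is still reported as image/gif because the leading bytes are consulted before the name, and a file whose guess is wrong can be corrected by passing override, which corresponds to setting or resetting the recorded type for a revision.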
Re: merge mode for XML
<sarcasm>
No doubt that's why nobody ever does it the other way on planet Earth. Except, maybe, apache MIME magic. Or the file test.
</sarcasm>
Re: merge mode for XML
In [EMAIL PROTECTED] [EMAIL PROTECTED] (Greg A. Woods) writes:

> It could be a good idea, if it were modified slightly. Since the type of content in a CVS RCS file revision can change from one revision to another the type of merge tool must be declared in a newphrase in each deltatext section.

That would be the type of the *file*, not the merge tool. In a three-way merge, the tool would depend on the three newphrases of the revisions involved. Ain't that fun?
RE: merge mode for XML
--- Sean Hager [EMAIL PROTECTED] wrote:
> > No. Not on extension, but based on *regular expressions*, or at least shell-style pattern matching expressions. Extensions are too simplistic. (cf. CVSROOT/cvswrappers, CVSROOT/cvsignore)
>
> Extensions would work fine, pattern matching is overkill.

I think use of a magic file would be more appropriate to set the default diff/merge program.

> > Yes. Some mechanisms like ~/.mime.types plus ~/.mailcap would be desirable. But one more complication would be the version of these external programs. Maybe, CVS needs to keep track of which version of the tools were used for which file revisions, so as to reliably and faithfully reproduce any snapshot.
>
> This is a bit more overkill. Admins should test and make backups before changing diff/merge programs during production. Even still, most updates in diff/merge programs would be to fix bugs and would not dramatically change the program functionality.

I agree. Further, if versioning were desired, the diff/merge tool can be versioned if it's kept in CVSROOT like all the other admin stuff.

Noel
RE: merge mode for XML
--- Sean Hager [EMAIL PROTECTED] wrote:
> on earth, extension matching would be fine. Unless you have rogue developers that try and break the system by changing file format while keeping extensions the same (save it as a jpg, but it is really a gif format) you should not have a problem. If you do have rogue developers, or even developers that can't follow simple instructions such as "hey, if it is not a jpeg then don't save it as a jpeg!" then you have much larger problems. i.e. maybe the inmates of San Quentin do not make the ideal development team.

I disagree. Doing this would force a policy onto CVS users where such a policy isn't really necessary. I think using extensions for any decision making is bad. Don't you think it would be bad to force the same diff/merge onto several files that had no extension?

There's two important issues here, really:

1. The default diff/merge for a new file.
2. The actual diff/merge of an existing file.

Greg is talking about the second issue. I have an inkling keeping this info on a per-version basis won't work but I haven't come up with anything substantial.

I'm not sure which issue you're talking about. If it's the second, then using extensions would not allow anyone to override the diff/merge for any reason thereby putting the users at the mercy of CVS.

Noel
RE: merge mode for XML
--- Forwarded mail from [EMAIL PROTECTED]

> > Yeah. That'd be a cool feature. But then, CVS will no longer be a standalone program. If you move the repository to another server where the modules are missing, how would you expect CVS to behave?
>
> The plugins would be part of the module, so if you moved the module to another CVS repository running the same versions of CVS everything would still work perfectly. (or if you moved the entire repository)

Agreed.

> > Consider the case where programmers are working on UNIX workstations with an NFS mounted CVS repository. The cvs command just runs locally, without using the CVS client-server operations. In this case, it is possible that those pluggable modules are present on some workstations but not the others. So, files committed from one machine may fail to check out on another. And how can we now make sure the snapshots (tags) are really reproducible, when CVS now depends on extension modules?
>
> CVS would have to use a thin client, and the diff/merge program would have to be executed on the server side. This would ensure that rules are enforced correctly.

Unfortunately, the diff/merge program might have too high an overhead, or they might be interactive and therefore inappropriate for use on the server side. My preference is to have diffs and merges run on the client side, but recognize the value of having the option of running them on the server (e.g. if the merge can be done completely automatically in all cases, as with diff3). If I were to implement something like this today, I'd put a switch in the type manager to specify where the merge program runs.

Another issue is that the diff/merge programs may not even be present on the server. Take sourceforge as an example: They won't install custom software on their servers just to supply a free service, especially if the tools come from commercial vendors.

--- End of forwarded message from [EMAIL PROTECTED]
RE: merge mode for XML
--- Forwarded mail from [EMAIL PROTECTED]

> [ On Friday, May 3, 2002 at 14:49:11 (-0500), Sean Hager wrote: ]
> > Subject: RE: merge mode for XML
> >
> > > No. Not on extension, but based on *regular expressions*, or at least shell-style pattern matching expressions. Extensions are too simplistic. (cf. CVSROOT/cvswrappers, CVSROOT/cvsignore)
> >
> > Extensions would work fine, pattern matching is overkill.
>
> Neither is suitable or sufficient. The actual type must be explicitly recorded in every delta, or at least the initial delta and every delta following a dead delta.

'Course, if CVS used a single RCS file for the entire lifetime of a single file, then the admin newphrase works fine. But that's another argument left for the past...

--- End of forwarded message from [EMAIL PROTECTED]
RE: merge mode for XML
--- Forwarded mail from [EMAIL PROTECTED]

> > > > No. Not on extension, but based on *regular expressions*, or at least shell-style pattern matching expressions. Extensions are too simplistic. (cf. CVSROOT/cvswrappers, CVSROOT/cvsignore)
> > >
> > > Extensions would work fine, pattern matching is overkill.
> >
> > Neither is suitable or sufficient. The actual type must be explicitly recorded in every delta, or at least the initial delta and every delta following a dead delta.
>
> on earth, extension matching would be fine. Unless you have rogue developers that try and break the system by changing file format while keeping extensions the same (save it as a jpg, but it is really a gif format) you should not have a problem. If you do have rogue developers, or even developers that can't follow simple instructions such as "hey, if it is not a jpeg then don't save it as a jpeg!" then you have much larger problems. i.e. maybe the inmates of San Quentin do not make the ideal development team.

I think Greg's point here is that there are times when a file is removed and later replaced with a new one that contains a different type of data. Case in point: an ASCII text file named foo.doc is replaced by a Microsoft Word document containing the same content and is stored with the same name.

It could be argued that this practice is unsound, but it does happen. There's no good solution to this problem using the current CVS design, but Greg suggests a workaround that works well enough in some common cases.

There are other times when the data type is the same but the content is completely unrelated, in which case the diff/merge should be avoided altogether. But CVS punts on this scenario.

--- End of forwarded message from [EMAIL PROTECTED]
RE: merge mode for XML
--- Forwarded mail from [EMAIL PROTECTED]

> Greg is talking about the second issue. I have an inkling keeping this info on a per-version basis won't work but I haven't come up with anything substantial.

Here's one:

- Create a new file and check in a few versions on the trunk.
- Create a branch.
- Check in a few versions on the branch, but change the data type before the first commit. All versions on the branch contain the same data type, but they differ from the trunk.
- Merge the branch to the trunk *OR* update the branch from the trunk.

What's the correct action for the last step? Is it to refuse to merge because the tools are different? Is it to copy the contributor and change the data type on the target? Is it to somehow convert one data type to the other and invoke the proper merge tool and record that tool with the next commit?

None of these choices is a good one because there are valid cases where each one is correct. I'm inclined to go with the first one and diagnose an error, but I'm sure others would disagree.

--- End of forwarded message from [EMAIL PROTECTED]
RE: merge mode for XML
--- Paul Sander [EMAIL PROTECTED] wrote:
> --- Forwarded mail from [EMAIL PROTECTED]
>
> > Greg is talking about the second issue. I have an inkling keeping this info on a per-version basis won't work but I haven't come up with anything substantial.
>
> Here's one:
>
> - Create a new file and check in a few versions on the trunk.
> - Create a branch.
> - Check in a few versions on the branch, but change the data type before the first commit. All versions on the branch contain the same data type, but they differ from the trunk.
> - Merge the branch to the trunk *OR* update the branch from the trunk.
>
> What's the correct action for the last step? Is it to refuse to merge because the tools are different? Is it to copy the contributor and change the data type on the target? Is it to somehow convert one data type to the other and invoke the proper merge tool and record that tool with the next commit?
>
> None of these choices is a good one because there are valid cases where each one is correct. I'm inclined to go with the first one and diagnose an error, but I'm sure others would disagree.

Yes, I think it should be an error since it would be unwise to expect CVS to be able to act on files of two different types. I guess ideally CVS would be able to be configured to use diff/merge programs that did their work on files of different types (e.g. similar to overloaded functions), but this is way too much to hope for. As a workaround, the user can change the diff/merge program for the trunk, then perform the operation with impunity.

Come to think of it, you might've mentioned this before, all that the file has to keep track of is its type. CVS can then map that type to a particular diff/merge program. This has two advantages:

1. Changing the diff/merge for a bunch of files with the same type is easy.
2. Adding more type-specific behaviour in the future would be easier.

Noel
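
The policy discussed in the message above (each revision records only its type, the type is mapped to a diff/merge program, and a merge across differing types is diagnosed as an error) could be sketched along these lines. This is a hypothetical Python illustration, not an existing CVS feature; the type names, the tool table and the name xml-merge are placeholders invented for the example.

```python
class TypeMismatch(Exception):
    """Raised when the revisions in a merge do not share one usable file type."""


# Hypothetical type-to-tool table; the tool names are placeholders only.
MERGE_TOOLS = {
    "text/plain": "diff3",
    "text/xml": "xml-merge",
}


def pick_merge_tool(ancestor_type, ours_type, theirs_type, tools=MERGE_TOOLS):
    """Choose a merge tool from the recorded types of the three revisions."""
    types = {ancestor_type, ours_type, theirs_type}
    if len(types) != 1:
        # The "refuse and diagnose an error" choice from the thread.
        raise TypeMismatch("cannot merge revisions of different types: "
                           + ", ".join(sorted(types)))
    (ftype,) = types
    if ftype not in tools:
        raise TypeMismatch("no merge tool configured for type " + ftype)
    return tools[ftype]
```

Because the revisions store a type rather than a tool, swapping the program used for every XML file is a one-line change to the table, which is the first advantage listed above. The branch scenario from the thread, where the branch changed type and the trunk did not, ends in TypeMismatch until someone resets the recorded type on one side.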
Re: merge mode for XML
[ On Thursday, May 2, 2002 at 09:33:45 (+0200), Lee Sau Dan wrote: ]
> Subject: Re: merge mode for XML
>
> >>>>> "Paul" == Paul Sander [EMAIL PROTECTED] writes:
>
>     Paul> A better implementation would be to code a symbolic name for
>     Paul> the merge tool in a newphrase in the admin section of the RCS
>     Paul> file, and look up that symbolic name on the client to locate
>     Paul> the proper tool.
>
> Good idea! There could be a lookup table in CVSROOT to provide the defaults, and then the client will have its own config file to override the defaults. (Like CVSROOT/cvsignore vs. ~/.cvsignore.)

It could be a good idea, if it were modified slightly.

Since the type of content in a CVS RCS file revision can change from one revision to another the type of merge tool must be declared in a newphrase in each deltatext section.

-- 
Greg A. Woods
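
The lookup described in the quoted text (a symbolic tool name stored with the file, repository-wide defaults, and a per-user file that overrides them) might be sketched as below. This is a minimal Python sketch under stated assumptions: the one-entry-per-line "name command" format is invented here, no such table exists in CVS, and xml-merge and my-xml-merge are placeholder tool names.

```python
def parse_table(text):
    """Parse 'symbolic-name command...' lines; blank lines and # comments are skipped."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        parts = line.split(None, 1)
        if len(parts) == 2:
            table[parts[0]] = parts[1]
    return table


def resolve_tool(symbolic_name, repo_defaults, client_overrides):
    """Client entries win over repository defaults; unknown names give None."""
    merged = {**repo_defaults, **client_overrides}
    return merged.get(symbolic_name)
```

For example, with repository defaults parsed from "xml xml-merge --three-way" and "text diff3 -m", and a client file containing "xml my-xml-merge", resolve_tool("xml", ...) yields the client's command while "text" still resolves to the repository default. Keeping only the symbolic name in the RCS file means the repository never has to know where a tool is installed on any particular client.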
RE: merge mode for XML
[ On Wednesday, May 1, 2002 at 13:33:08 (-0700), Glew, Andy wrote: ]
> Subject: RE: merge mode for XML
>
> Well, I wrote Perl-SQL, a relational database system that is self-schematizing - where every record can define its own schema, with its own fields.

Yeah, that sounds like something a Perl hacker would do.

> Motivation: schema changes in most existing relational databases are onerous.

For very good reason.

> 3-4 years ago I discussed self-schematization with Prof. David DeWitt, a man of some renown in database circles. His take is that self-schematization was not done in the early days to save space, and that now it is not unreasonable to do so. However, rather than Perl-SQL, he pointed me towards XML, saying something like "well formed XML doesn't require a schema or DTD -- that is the future."
>
> ---
>
> I.e., Greg, not all RDBMS experts agree with you about schemas; ditto DTDs.

I suspect DeWitt is thinking a little bit deeper than you suspect.

Certainly data can be self-describing -- that's what OO is all about. OO databases can effectively be queried about their schemas, and since all proper objects know how to interact with other objects, even those in different classes (i.e. of different types), their relationships are self-defining.

An RDBMS, however, is not an OODBMS.

Whether an XML document without a DTD and/or schema can be considered self-describing enough to be independent like an object instance or a set of object instances, is probably what you're trying to argue, but I won't go any further since such a thing is strictly outside the scope of XML proper and is way outside the scope of what a common tool like CVS should ever deem worthy of dealing with.

-- 
Greg A. Woods
RE: merge mode for XML
> No. Not on extension, but based on *regular expressions*, or at least
> shell-style pattern matching expressions. Extensions are too
> simplistic. (c.f. CVSROOT/cvswrappers, CVSROOT/cvsignore)

Extensions would work fine, pattern matching is overkill.

> Yes. Some mechanisms like ~/.mime.types plus ~/.mailcap would be
> desirable. But one more complication would be the version of these
> external programs. Maybe, CVS needs to keep track of which version of
> the tools were used for which file revisions, so as to reliably and
> faithfully reproduce any snapshot.

This is a bit more overkill. Admins should test and make backups before changing diff/merge programs during production. Even still, most updates in diff/merge programs would be to fix bugs and would not dramatically change the program functionality.

Sean.
RE: merge mode for XML
> Yeah. That'd be a cool feature. But then, CVS will no longer be a
> standalone program. If you move the repository to another server where
> the modules are missing, how would you expect CVS to behave?

The plugins would be part of the module, so if you moved the module to another CVS repository running the same version of CVS (or if you moved the entire repository), everything would still work perfectly.

> Consider the case where programmers are working on UNIX workstations
> with an NFS mounted CVS repository. The cvs command just runs locally,
> without using the CVS client-server operations. In this case, it is
> possible that those pluggable modules are present on some workstations
> but not the others. So, files committed from one machine may fail to
> check out in another. And how can we now make sure the snapshots (tags)
> are really reproducible, when CVS now depends on extension modules?

CVS would have to use a thin client, and the diff/merge program would have to be executed on the server side. This would ensure that rules are enforced correctly.

Sean.
RE: merge mode for XML
[ On Friday, May 3, 2002 at 14:49:11 (-0500), Sean Hager wrote: ]
> Subject: RE: merge mode for XML
>
> > No. Not on extension, but based on *regular expressions*, or at least
> > shell-style pattern matching expressions. Extensions are too
> > simplistic. (c.f. CVSROOT/cvswrappers, CVSROOT/cvsignore)
>
> Extensions would work fine, pattern matching is overkill.

Neither is suitable or sufficient. The actual type must be explicitly recorded in every delta, or at least the initial delta and every delta following a dead delta.

--
Greg A. Woods
+1 416 218-0098; [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED]
Planix, Inc. [EMAIL PROTECTED]; VE3TCP; Secrets of the Weird [EMAIL PROTECTED]
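[Editorial sketch, not part of the original thread.] The rule Greg describes, a type recorded on the initial delta and on every delta that follows a dead one, with the deltas in between inheriting it, might look roughly like the following. The `Delta` record, the `content_type` field, and the fallback to a default type are assumptions made for illustration; CVS has no such newphrase, and the sketch only walks a single line of revisions, oldest first.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Delta:
    revision: str
    state: str = "Exp"                  # CVS marks a removed file with state "dead"
    content_type: Optional[str] = None  # hypothetical per-delta newphrase value


def effective_types(deltas: List[Delta], default: str = "text") -> Dict[str, str]:
    """Map each revision to its content type.

    `deltas` must be ordered oldest first. A delta without an explicit type
    inherits the previous one, except for the initial delta and a delta that
    follows a dead delta, which fall back to `default`.
    """
    current = default
    previous_dead = True  # treat the initial delta like one following a dead delta
    result: Dict[str, str] = {}
    for delta in deltas:
        if delta.content_type is not None:
            current = delta.content_type
        elif previous_dead:
            current = default  # nothing is inherited across a dead delta
        result[delta.revision] = current
        previous_dead = delta.state == "dead"
    return result
```

A stricter variant could raise an error instead of using `default` when the type is missing after a dead delta, which is closer to "must be explicitly recorded".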
RE: merge mode for XML
> > > No. Not on extension, but based on *regular expressions*, or at
> > > least shell-style pattern matching expressions. Extensions are too
> > > simplistic. (c.f. CVSROOT/cvswrappers, CVSROOT/cvsignore)
> >
> > Extensions would work fine, pattern matching is overkill.
>
> Neither is suitable or sufficient. The actual type must be explicitly
> recorded in every delta, or at least the initial delta and every delta
> following a dead delta.

on earth, extension matching would be fine. Unless you have rogue developers who try to break the system by changing the file format while keeping the extension the same (save it as a jpg, but it is really a gif), you should not have a problem.

If you do have rogue developers, or even developers who can't follow simple instructions such as "hey, if it is not a jpeg then don't save it as a jpeg!", then you have much larger problems. I.e., maybe the inmates of San Quentin do not make the ideal development team.

sean.
Re: merge mode for XML
On Fri, May 03, 2002 at 04:43:11PM -0400, Greg A. Woods wrote:
> [ On Friday, May 3, 2002 at 14:49:11 (-0500), Sean Hager wrote: ]
> > Subject: RE: merge mode for XML
> >
> > > No. Not on extension, but based on *regular expressions*, or at
> > > least shell-style pattern matching expressions. Extensions are too
> > > simplistic. (c.f. CVSROOT/cvswrappers, CVSROOT/cvsignore)
> >
> > Extensions would work fine, pattern matching is overkill.
>
> Neither is suitable or sufficient.

Agreed! I *know* I've had conflicts in CVSROOT/cvswrappers in the past (having a text file treated as binary because it happens to have an extension that also signifies some binary format); just can't remember what they were. But here are some counterexamples anyway:

- .doc: we all know what M$ thinks it means, but other people have their own ideas. I've seen text files with .doc extensions, and I'd bet other word- or document-processors have used it too

- .cfg: some kind of configuration file ... but it could be anything from Windows-ini format to XML to pseudo-Lisp to binary

- .cgi: again, that says how it functions -- in this case, what its API is -- but says *nothing* about the file format; CGI scripts can be in any language you like, including C

- In general, on *NIX machines where extensions are more a convention than an OS-mandated thing, people tend to play fast and loose with them. E.g. one could conceive of a directory full of files named for Internet domains -- including Australian ones ending in .au, which is also an audio format

- And the killer is .bak: can be anything under the sun, of course. Usually people don't check them in, but it would be foolish in the extreme to presume they *never* do, and to make doing so functionally useless

That's just from the A-D part of a list of file extensions. I'm sure there are lots more conflicts in E-Z. Oh, and of course there's .sys; even M$ can't decide what that signifies.
-- | | /\ |-_|/ Eric Siegerman, Toronto, Ont.[EMAIL PROTECTED] | | / Anyone who swims with the current will reach the big music steamship; whoever swims against the current will perhaps reach the source. - Paul Schneider-Esleben ___ Info-cvs mailing list [EMAIL PROTECTED] http://mail.gnu.org/mailman/listinfo/info-cvs
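[Editorial sketch, not part of the original thread.] One way to see the gap between an extension and the real format (Sean's "saved as a jpg, but really a gif" case) is to compare the type implied by the name with the type implied by the file's leading bytes. The sketch below uses the well-known signatures for PNG, GIF and JPEG only; the function names and the tiny extension table are made up for illustration and are nothing CVS provides.

```python
import os
from typing import Optional, Tuple

# Leading-byte signatures for a few common image formats.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
    (b"\xff\xd8\xff", "jpeg"),
]

# Hypothetical extension table, in the spirit of an extension-based wrapper file.
EXTENSIONS = {".png": "png", ".gif": "gif", ".jpg": "jpeg", ".jpeg": "jpeg"}


def sniff_type(data: bytes) -> Optional[str]:
    """Guess the format from the first bytes; None if no signature matches."""
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    return None


def declared_vs_actual(filename: str, data: bytes) -> Tuple[Optional[str], Optional[str]]:
    """Return (type implied by the extension, type implied by the content)."""
    ext = os.path.splitext(filename)[1].lower()
    return EXTENSIONS.get(ext), sniff_type(data)
```

For a GIF checked in as `photo.jpg` the two values disagree, which is the situation an explicitly recorded type would avoid having to guess about.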
Re: merge mode for XML
Sean == Sean Hager [EMAIL PROTECTED] writes:

    Sean> I agree that XML is overkill, but the truth is that it is
    Sean> here to stay.

    Sean> XML is fastly becoming excepted as the defacto standard for .

accepted? :)

    Sean> If CVS had a way to use modular plug in diff and merge
    Sean> programs, we could setup a wrapper file that would
    Sean> automatically diff/merge the file differently based on the
    Sean> extension.

Yeah. That'd be a cool feature. But then, CVS will no longer be a standalone program. If you move the repository to another server where the modules are missing, how would you expect CVS to behave?

Consider the case where programmers are working on UNIX workstations with an NFS mounted CVS repository. The cvs command just runs locally, without using the CVS client-server operations. In this case, it is possible that those pluggable modules are present on some workstations but not the others. So, files committed from one machine may fail to check out in another. And how can we now make sure the snapshots (tags) are really reproducible, when CVS now depends on extension modules?

Some more thought is needed before it becomes reliable. Some more changes to the current architecture will be needed. This is not a simple change.

Of course, I'm not saying pluggable extension modules are a bad thing. And it would also be nice to have keyword-substitution modules, too, so that we can keep $Id$ as comment tags in GIF/JPEG/PNG/TIFF, for instance. But some major redesign of CVS would be required to manage which versions of which modules were used to commit which changes, so that CVS could faithfully and reliably reproduce the snapshots.

--
Lee Sau Dan §õ¦u´°(Big5)~{@nJX6X~}(HZ)
E-mail: [EMAIL PROTECTED]
Home page: http://www.informatik.uni-freiburg.de/~danlee
RE: merge mode for XML
[Greg Woods]:
> ... conversations about XML and DTDs ...
> ... "well formed" by definition should mean in conformance to a
> pre-existing DTD! ...
> Do you build relational databases without defining a schema?

Well, I wrote Perl-SQL, a relational database system that is self-schematizing - where every record can define its own schema, with its own fields.

Motivation: schema changes in most existing relational databases are onerous. Even just adding fields is painful. Self-schematization allows new fields to be added on the fly, improving documentation of the experiment results that are my target data, because any observed features can be easily added to the schema in a structured way. As opposed to all of those database schemas that have a "miscellaneous text" or "comment" field, where far too often all the critical data that you wish to process lives. Self-schematization allows you to do all SQL operations across spontaneously added fields.

---

3-4 years ago I discussed self-schematization with Prof. David Dewitt, a man of some renown in database circles. His take is that self-schematization was not done in the early days to save space, and that now it is not unreasonable to do so. However, rather than Perl-SQL, he pointed me towards XML, saying something like "well formed SQL doesn't require a schema or DTD -- that is the future".

---

I.e., Greg, not all RDBMS experts agree with you about schemas; ditto DTDs.
Re: merge mode for XML
Paul == Paul Sander [EMAIL PROTECTED] writes:

    Paul> A better implementation would be to code a symbolic name for
    Paul> the merge tool in a newphrase in the admin section of the RCS
    Paul> file, and look up that symbolic name on the client to locate
    Paul> the proper tool.

Good idea! There could be a lookup table in CVSROOT to provide the defaults, and then the client will have its own config file to override the defaults. (Like CVSROOT/cvsignore vs. ~/.cvsignore.)

--
Lee Sau Dan §õ¦u´°(Big5)~{@nJX6X~}(HZ)
E-mail: [EMAIL PROTECTED]
Home page: http://www.informatik.uni-freiburg.de/~danlee
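[Editorial sketch, not part of the original thread.] The lookup discussed here, a symbolic tool name stored with the file and resolved through repository defaults that a per-user file can override, could be sketched as below. The table format (one "name command" pair per line), the tool names, and the fallback command are all assumptions for illustration; CVS defines no such files.

```python
from typing import Dict


def parse_tool_table(text: str) -> Dict[str, str]:
    """Parse lines of the form 'symbolic-name command ...'; '#' starts a comment."""
    table: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        parts = line.split(None, 1)
        if len(parts) == 2:  # skip blank lines and names without a command
            table[parts[0]] = parts[1]
    return table


def resolve_merge_tool(
    symbolic_name: str,
    repo_defaults: Dict[str, str],
    user_overrides: Dict[str, str],
    fallback: str = "merge",
) -> str:
    """User overrides win over repository defaults; unknown names use the fallback."""
    if symbolic_name in user_overrides:
        return user_overrides[symbolic_name]
    return repo_defaults.get(symbolic_name, fallback)
```

This mirrors the CVSROOT/cvsignore versus ~/.cvsignore layering mentioned above: the repository table supplies defaults and the client-side table only needs to list what differs locally.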
Re: merge mode for XML
Hi, Greg:

Greg A. Woods wrote:
> [ On Wednesday, May 1, 2002 at 11:34:02 (-0400), Gary Bisaga wrote: ]
> > Subject: RE: merge mode for XML
> >
> > Good point, Noel. At my last job we had a partner we were required to
> > connect to, and it was a job getting even an example XML document out
> > of them, let alone a DTD or schema.
>
> You guys are just re-iterating all the silly arguments that have come
> and gone with EDI. New face, same old problem.

Yeah! Been there, seen that. What's the good of XML if there's no agreement about the DTD/schema to be used!?

--
Cheers,
Jesus
*** [EMAIL PROTECTED] ***
From Zaragoza, looking for work - http://www.geocities.com/jesusm_navarro/CV/cv.html
Re: merge mode for XML
Noel == Noel Yap [EMAIL PROTECTED] writes:

    >> If CVS had a way to use modular plug in diff and merge programs,
    >> we could setup a wrapper file that would automatically diff/merge
    >> the file differently based on the extension. e.g.:
    >>
    >>     *.xml   xml_dm
    >>     *.html  html_dm

    Noel> Ideally, the diff/merge tool would be tied to the type of

Please add also a keyword-substitution tool to the check list.

    Noel> the file and the type of the file is initially set depending
    Noel> on the extension.

No. Not on extension, but based on *regular expressions*, or at least shell-style pattern matching expressions. Extensions are too simplistic. (c.f. CVSROOT/cvswrappers, CVSROOT/cvsignore)

    Noel> This way, one would be able to change the
    Noel> type of the file independent of its extension.

Yes. Some mechanisms like ~/.mime.types plus ~/.mailcap would be desirable. But one more complication would be the version of these external programs. Maybe, CVS needs to keep track of which version of the tools were used for which file revisions, so as to reliably and faithfully reproduce any snapshot.

--
Lee Sau Dan §õ¦u´°(Big5)~{@nJX6X~}(HZ)
E-mail: [EMAIL PROTECTED]
Home page: http://www.informatik.uni-freiburg.de/~danlee
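[Editorial sketch, not part of the original thread.] The difference between extension matching and shell-style pattern matching can be shown with a small wrapper table. The `xml_dm` and `html_dm` names come from the example quoted above; the other rules, the first-match-wins policy, and the default tool name are assumptions for illustration and do not describe how CVSROOT/cvswrappers is actually processed.

```python
import fnmatch
from typing import List, Tuple

# (pattern, tool) pairs; the first matching pattern wins in this sketch.
RULES: List[Tuple[str, str]] = [
    ("*.xml", "xml_dm"),
    ("*.html", "html_dm"),
    ("[Rr]eadme*", "text_dm"),   # not expressible as a plain extension
    ("Makefile*", "make_dm"),    # hypothetical tool name
]


def tool_for(filename: str, rules: List[Tuple[str, str]], default: str = "text_dm") -> str:
    """Return the diff/merge tool for a file name using shell-style patterns."""
    for pattern, tool in rules:
        if fnmatch.fnmatchcase(filename, pattern):
            return tool
    return default
```

Patterns cover the extension case (`*.xml`) and also names such as `Makefile.in` or `Readme` that have no useful extension at all, which is the point being argued for here.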
RE: merge mode for XML
Sorry, this strikes me as just a little bit extreme. I agree that you ought to write DTDs or schemas (just yesterday I had to make one of our developers do so, and our own internal XML infrastructure requires them). But to call documents without DTDs/schemas "not XML" and unworthy of configuration management is certainly not supported by the XML spec or common usage. For one thing, as I'm sure you know, the XML spec does not seem to deprecate well-formed XML documents. When I was in the W3 XML working group (1999) there was certainly a group of us (not everybody) who believed that well-formed documents had a place in the world.

And if we take this tack, what about constructs not declarable with DTDs? XML Schemas will certainly improve this, but many people are not using them yet. Are DTDs with ANY declarations also "not XML", since they really don't describe the semantics of the document? Since DTDs can't describe data types or other restrictions (such as field length), is any DTD'ed document "not XML"?

DTDs and schemas are good and should be used wherever possible. But there are realities of life.

gary

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]On Behalf Of Greg A. Woods
Sent: Wednesday, May 01, 2002 1:56 AM
To: Peter Ring
Cc: CVS-II Discussion Mailing List
Subject: RE: merge mode for XML

> <rant>There's a class of simple XML documents that live and die without
> getting near either a DTD or revision control. Without a schema and
> accompanying documentation, there's no way to tell the semantics of the
> XML document, and not much point in version management.</rant>

Amen. I couldn't agree more! Those who dare call such things "XML" are sadly mistaken.
RE: merge mode for XML
Not only that, but in the end it is the client who decides the real semantics of the document with or without DTDs and Schemas. Noel --- Gary Bisaga [EMAIL PROTECTED] wrote: Sorry, this strikes me as just a little bit extreme. I agree that you ought to write DTDs or schemas (just yesterday I had to make one of our developers do so, and our own internal XML infrastructure requires them). But to call documents without DTDs/schemas not XML and unworthy of configuration management is certainly not supported by the XML spec or common usage. For one thing, as I'm sure you know, the XML spec does not seem to deprecate well-formed XML documents. When I was in the W3 XML working group (1999) there was certainly a group of us (not everybody) who believed that well-formed documents had a place in the world. And if we take this tack, what about constructs not declarable with DTDs? XML Schemas will certainly improve this, but many people are not using them yet. Are DTDs with ANY declarations also not XML, since they really don't describe the semantics of the document? Since DTDs can't describe data types or other restrictions (such as field length), is any DTD'ed document not XML? DTDs and schemas are good and should be used wherever possible. But there are realities of life. gary -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]On Behalf Of Greg A. Woods Sent: Wednesday, May 01, 2002 1:56 AM To: Peter Ring Cc: CVS-II Discussion Mailing List Subject: RE: merge mode for XML rantThere's a class of simple XML documents that live and die without getting near either a DTD or revision control. Without a schema and accompanying documentation, there's no way to tell the semantics of the XML document, and not much point in version management./rant Amen. I couldn't agree more! Those who dare call such things XML are sadly mistaken. ___ Info-cvs mailing list [EMAIL PROTECTED] http://mail.gnu.org/mailman/listinfo/info-cvs __ Do You Yahoo!? Yahoo! 
RE: merge mode for XML
Good point, Noel. At my last job we had a partner we were required to connect to, and it was a job getting even an example XML document out of them, let alone a DTD or schema. gary -Original Message- From: Noel Yap [mailto:[EMAIL PROTECTED]] Sent: Wednesday, May 01, 2002 11:25 AM To: Gary Bisaga; CVS-II Discussion Mailing List Subject: RE: merge mode for XML Not only that, but in the end it is the client who decides the real semantics of the document with or without DTDs and Schemas. Noel --- Gary Bisaga [EMAIL PROTECTED] wrote: Sorry, this strikes me as just a little bit extreme. I agree that you ought to write DTDs or schemas (just yesterday I had to make one of our developers do so, and our own internal XML infrastructure requires them). But to call documents without DTDs/schemas not XML and unworthy of configuration management is certainly not supported by the XML spec or common usage. For one thing, as I'm sure you know, the XML spec does not seem to deprecate well-formed XML documents. When I was in the W3 XML working group (1999) there was certainly a group of us (not everybody) who believed that well-formed documents had a place in the world. And if we take this tack, what about constructs not declarable with DTDs? XML Schemas will certainly improve this, but many people are not using them yet. Are DTDs with ANY declarations also not XML, since they really don't describe the semantics of the document? Since DTDs can't describe data types or other restrictions (such as field length), is any DTD'ed document not XML? DTDs and schemas are good and should be used wherever possible. But there are realities of life. gary -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]On Behalf Of Greg A. Woods Sent: Wednesday, May 01, 2002 1:56 AM To: Peter Ring Cc: CVS-II Discussion Mailing List Subject: RE: merge mode for XML rantThere's a class of simple XML documents that live and die without getting near either a DTD or revision control. 
Without a schema and accompanying documentation, there's no way to tell the semantics of the XML document, and not much point in version management./rant Amen. I couldn't agree more! Those who dare call such things XML are sadly mistaken. ___ Info-cvs mailing list [EMAIL PROTECTED] http://mail.gnu.org/mailman/listinfo/info-cvs __ Do You Yahoo!? Yahoo! Health - your guide to health and wellness http://health.yahoo.com ___ Info-cvs mailing list [EMAIL PROTECTED] http://mail.gnu.org/mailman/listinfo/info-cvs
RE: merge mode for XML
[ On Wednesday, May 1, 2002 at 11:11:32 (-0400), Gary Bisaga wrote: ] Subject: RE: merge mode for XML Sorry, this strikes me as just a little bit extreme. I agree that you ought to write DTDs or schemas (just yesterday I had to make one of our developers do so, and our own internal XML infrastructure requires them). But to call documents without DTDs/schemas not XML and unworthy of configuration management is certainly not supported by the XML spec or common usage. For one thing, as I'm sure you know, the XML spec does not seem to deprecate well-formed XML documents. When I was in the W3 XML working group (1999) there was certainly a group of us (not everybody) who believed that well-formed documents had a place in the world. Just because some stupid ill-advised practice is common doesn't mean it should be condoned. Specification committees are, by their very nature, political creatures. Those of you who practice such shady techniques are able to sway a political process, but that doesn't mean what you do is right. In any case, what do you mean by ought to!?!?!?!? SGML syntax is not self-documenting by and of itself. Do you build relational databases without defining a schema? Do you design data structures which have the sole purpose of interchanging data between published APIs but that are not documented? well formed by definition should mean in conformance to a pre-existing DTD! As has been mentioned already, no XML parser worth its salt can even begin to interpret an XML document without first reading the DTD that describes it! What do you do, parse your XML documents by hand? Finally let's also note that if you are using loosely defined XML-like formats for data interchange between programs then presumably you are careful enough only to do so in contexts where the content is highly dynamic (i.e. 
not static and long lived), in which case it certainly isn't suitable for CVS, if indeed any form of change tracking And if we take this tack, what about constructs not declarable with DTDs? XML Schemas will certainly improve this, but many people are not using them yet. Are DTDs with ANY declarations also not XML, since they really don't describe the semantics of the document? Since DTDs can't describe data types or other restrictions (such as field length), is any DTD'ed document not XML? What about them? XML is no more a final all-encompassing solution than the entire SGML framework from which it comes! Do all your tools look exactly like hammers? This is yet another situation where you must learn to use the right tool for the job! DTDs and schemas are good and should be used wherever possible. But there are realities of life. Indeed, and in the reality you're existing in you are defining overly complex little languages which you then mis-name as XML. -- Greg A. Woods +1 416 218-0098; [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED] Planix, Inc. [EMAIL PROTECTED]; VE3TCP; Secrets of the Weird [EMAIL PROTECTED] ___ Info-cvs mailing list [EMAIL PROTECTED] http://mail.gnu.org/mailman/listinfo/info-cvs
RE: merge mode for XML
[ On Wednesday, May 1, 2002 at 11:34:02 (-0400), Gary Bisaga wrote: ]
> Subject: RE: merge mode for XML
>
> Good point, Noel. At my last job we had a partner we were required to
> connect to, and it was a job getting even an example XML document out
> of them, let alone a DTD or schema.

You guys are just re-iterating all the silly arguments that have come and gone with EDI. New face, same old problem.

--
Greg A. Woods
+1 416 218-0098; [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED]
Planix, Inc. [EMAIL PROTECTED]; VE3TCP; Secrets of the Weird [EMAIL PROTECTED]
RE: merge mode for XML
I might just sit back and watch the show, but I like to be part of the fun ;)

DTDs are certainly not enough in a lot of situations, and XML Schemas, RELAX NG, or what-have-you won't ever obviate the need for documentation and project management.

But sometimes, well-formed XML is worthy of version management even when no-one bothers about a schema. For example, you might need to save some of those short-lived XML-formatted messages for a regression test.

Also, there is now a trend towards a more Unix-like 'tool-set' approach, in which different parts of a chain of processes are responsible for different tasks. While it might be valuable early in a process to validate an XML instance, it might be a waste of resources later in the process.

There's also another interesting trend: HyTime is slowly being re-invented in XML incarnation. Which to me is proof-of-concept: you can dream up another syntax (and a full-blown SGML parser might even be able to parse it, given a suitable SGML declaration), but in the end, it doesn't make much of a difference whether you write <concept>something</concept> or (concept something) or \concept{something} or whatever; the fundamental issue is the same: you are not really supposed to look at the markup except as an expression of structure.

Which was what started this thread: how to diff and merge in a meaningful way, i.e. in a way that knows that <whatnot a="bar" b="foo" /> isn't different from <whatnot b="foo" a="bar" /> in a way that most XML applications should care about. You can get this far with just well-formed XML.

When it comes to whitespace in character context, things get really interesting. XML in essence leaves it up to the application what to do with whitespace, so you have to know the application in order to decide whether a whitespace difference matters. A DTD or schema helps a lot because you can then ignore whitespace in element context.
BTW, I stumbled over yet another XML diff, this one written by Norman Walsh: http://nwalsh.com/java/diffmk/ and a small feature about whitespace and prettyprinting XML: http://www.xml.com/pub/a/2002/01/02/whitespace.html Kind regards Peter Ring -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]On Behalf Of Gary Bisaga Sent: 1. maj 2002 17:12 To: CVS-II Discussion Mailing List Subject: RE: merge mode for XML Sorry, this strikes me as just a little bit extreme. I agree that you ought to write DTDs or schemas (just yesterday I had to make one of our developers do so, and our own internal XML infrastructure requires them). But to call documents without DTDs/schemas not XML and unworthy of configuration management is certainly not supported by the XML spec or common usage. For one thing, as I'm sure you know, the XML spec does not seem to deprecate well-formed XML documents. When I was in the W3 XML working group (1999) there was certainly a group of us (not everybody) who believed that well-formed documents had a place in the world. And if we take this tack, what about constructs not declarable with DTDs? XML Schemas will certainly improve this, but many people are not using them yet. Are DTDs with ANY declarations also not XML, since they really don't describe the semantics of the document? Since DTDs can't describe data types or other restrictions (such as field length), is any DTD'ed document not XML? DTDs and schemas are good and should be used wherever possible. But there are realities of life. gary -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]On Behalf Of Greg A. Woods Sent: Wednesday, May 01, 2002 1:56 AM To: Peter Ring Cc: CVS-II Discussion Mailing List Subject: RE: merge mode for XML rantThere's a class of simple XML documents that live and die without getting near either a DTD or revision control. 
Without a schema and accompanying documentation, there's no way to tell the semantics of the XML document, and not much point in version management./rant Amen. I couldn't agree more! Those who dare call such things XML are sadly mistaken. ___ Info-cvs mailing list [EMAIL PROTECTED] http://mail.gnu.org/mailman/listinfo/info-cvs ___ Info-cvs mailing list [EMAIL PROTECTED] http://mail.gnu.org/mailman/listinfo/info-cvs
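[Editorial sketch, not part of the original thread.] Peter Ring's point in the message above, that a meaningful XML comparison should not care about attribute order and that whitespace handling is the application's decision, can be illustrated with Python's standard `xml.etree.ElementTree`. The function names and the `strip_whitespace` switch are assumptions made for this sketch; it only compares two well-formed documents and is not a diff or merge tool.

```python
import xml.etree.ElementTree as ET


def canonical(elem, strip_whitespace=False):
    """Reduce an element to a nested tuple that ignores attribute order."""
    def text(value):
        value = value or ""
        # Stripping leading/trailing whitespace is an application-level policy.
        return value.strip() if strip_whitespace else value

    return (
        elem.tag,
        tuple(sorted(elem.attrib.items())),  # order-independent attributes
        text(elem.text),
        tuple((canonical(child, strip_whitespace), text(child.tail)) for child in elem),
    )


def same_structure(xml_a, xml_b, strip_whitespace=False):
    """True if two well-formed documents have the same structure and text."""
    return canonical(ET.fromstring(xml_a), strip_whitespace) == canonical(
        ET.fromstring(xml_b), strip_whitespace
    )
```

With `strip_whitespace=False` a pretty-printed copy of a document compares as different; with `strip_whitespace=True` it compares as equal. Which answer is right depends on the application, which is exactly where a DTD or schema helps.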
RE: merge mode for XML
--- Greg A. Woods [EMAIL PROTECTED] wrote:
> In any case if you re-read what I wrote a little more carefully you'll
> note that I'm still only talking about XML, using HTML only as an
> example (because it uses the same syntax). Since all the XML parsers I
> know of are very much unrelated to any known HTML rendering engines any
> issues with the latter cannot possibly have anything whatsoever to do
> with the former.

Very well, we'll talk about what you want to talk about, not what was being talked about at the time I made my post.

Noel
RE: merge mode for XML
[ On Tuesday, April 30, 2002 at 07:43:21 (-0500), Sean Hager wrote: ] Subject: RE: merge mode for XML Thanks for offering up the samples Paul. I read through last Septembers thread on giving up cvs. I see that I stirred up an old debate here (man you guys really had it out last time ;). With the emergence of xml more and more programs are supporting it as a format. If a cvs - xml diff/merge solution was implemented then cvs could capture a huge new level of concurrent development in documentation, configuration, and help system docs, etc... Any chance some of you cvs wizes could ever implement a modular diff/merge subprogram architecture into CVS. Then we could implement an XML wrapper. Don't hold your breath. Even the biggest proponents of this idea have not yet come up with working code as a solid proposal -- only what amounts to no more than a functional specification, and one that in my opinion contains several concerns for existing CVS users. Note too that this is after over a half a decade of debate. Unless you're prepared to do the implementation yourself, or at least fund it, it may not happen for another half decade :-) -- Greg A. Woods +1 416 218-0098; [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED] Planix, Inc. [EMAIL PROTECTED]; VE3TCP; Secrets of the Weird [EMAIL PROTECTED] ___ Info-cvs mailing list [EMAIL PROTECTED] http://mail.gnu.org/mailman/listinfo/info-cvs
RE: merge mode for XML
> Don't hold your breath. Even the biggest proponents of this idea have
> not yet come up with working code as a solid proposal -- only what
> amounts to no more than a functional specification, and one that in my
> opinion contains several concerns for existing CVS users. Note too that
> this is after over a half a decade of debate. Unless you're prepared to
> do the implementation yourself, or at least fund it, it may not happen
> for another half decade :-)

Unfortunately, this is true. IMO, it would be great if diff/merge tools could be plugged into CVS (I think CVS shouldn't be in a place to dictate, although it can suggest, how things are diffed/merged, just when it's supposed to happen). ClearCase is the only tool on the market I know that supports this. I think there was some effort in creating an open source ClearCase-like tool (katie and possibly another one?) but I haven't heard anything as of late.

Noel
RE: merge mode for XML
[ On Monday, April 29, 2002 at 08:31:24 (+0200), Peter Ring wrote: ]
> Subject: RE: merge mode for XML

I sort of agree with the logic of the arguments for SGML and its derivatives, but I find the rhetoric about it being the only choice because it's the best there is (something I've heard whined about for nearly two decades now) to be no more than self-serving, at best.

As for source code beautification issues w.r.t. XML, well those are no different than when dealing with any kind of source code primarily written and edited with an integrated IDE.

> There are, BTW, XML diff tools. See e.g.:
>
>     http://www.alphaworks.ibm.com/tech/xmldiffmerge
>     http://www.deltaxml.com
>     http://www.vmguys.com/vmtools
>     http://www.logilab.org/xmldiff
>
> The first one can be used as merge tool. The other ones can produce an
> XML diff file that -- given a proper XML patch utility -- can update
> one XML file to become the other one.

Well with CVS there's always the choice of manually re-doing every merge. You don't have to use CVS to do merges and diffs -- it'll happily store your files with text-based (i.e. newline separated) diffs and you can use the better format-specific tools to view deltas and do merges as you wish. Obviously a front-end wrapper to CVS that integrates these tools would be helpful, but it's not strictly necessary (unless your users are less than clueless :-).

For instance PCL-CVS, the emacs front-end to CVS, allows one to re-do merges with ediff. I don't know if ediff could be extended to use external diff tools (and also perhaps alternate merge tools), or not, but that may be the best way for users with immediate needs to proceed. (I.e. even if you're not an emacs user, treat emacs as an application framework and use emacs+PCL-CVS+ediff as a stand-alone CVS interface.)

> There are, to the best of my knowledge, no freely available stand-alone
> SGML diff tools. Some editors, e.g. ArborText Epic, can do a very nice
> compare.
Would not a full stand-alone SGML diff tool be required to understand the DTD in order to do a proper job of knowing just how different tagged elements relate to each other in order to know whether or not they have to be included in any delta or merge? -- Greg A. Woods +1 416 218-0098; [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED] Planix, Inc. [EMAIL PROTECTED]; VE3TCP; Secrets of the Weird [EMAIL PROTECTED] ___ Info-cvs mailing list [EMAIL PROTECTED] http://mail.gnu.org/mailman/listinfo/info-cvs
Re: merge mode for XML
I'm looking into a ground-up rewrite of CVS from Dick Grune's last shell script implementation. It will take a while to complete a prototype because my life is pretty turbulent right now, but it will get done.

On Tuesday, April 30, 2002, at 05:43 AM, [EMAIL PROTECTED] wrote:

Thanks for offering up the samples, Paul. I read through last September's thread on giving up cvs. I see that I stirred up an old debate here (man, you guys really had it out last time ;). With the emergence of XML, more and more programs are supporting it as a format. If an XML diff/merge solution for cvs were implemented, then cvs could capture a huge new level of concurrent development in documentation, configuration, help system docs, etc. Any chance some of you cvs wizards could implement a modular diff/merge subprogram architecture in CVS? Then we could implement an XML wrapper.

sean.

-Original Message- From: Paul Sander [mailto:[EMAIL PROTECTED]] Sent: Monday, April 29, 2002 12:43 PM To: [EMAIL PROTECTED]; [EMAIL PROTECTED] Subject: RE: merge mode for XML

Once again, take a look at message ID# [EMAIL PROTECTED] posted to this forum on September 16, 2001. It illustrates one way (though perhaps not the best way) to do just this. It relies on a lookup table that looks up a diff tool given a file's name. A better implementation would be to code a symbolic name for the merge tool in a newphrase in the admin section of the RCS file, and look up that symbolic name on the client to locate the proper tool.

--- Forwarded mail from [EMAIL PROTECTED]

A better approach is to avoid XML entirely in the first place -- it's a really really horrid syntax with all kinds of goo that's usually way over-kill for the application, being SGML based and all that

I agree that XML is overkill, but the truth is that it is here to stay. XML is fast becoming accepted as the de facto standard for data exchange. Opto 22 makes machine control sensors / PLCs that publish data in XML. Siemens is doing similar things from what I understand. Java uses XML for all of the enterprise application descriptors. It seems that I can't interface to machines, or program, without looking at XML.

If CVS had a way to use modular plug-in diff and merge programs, we could set up a wrapper file that would automatically diff/merge the file differently based on the extension, e.g.:
*.xml xml_dm
*.html html_dm
This way we could write our own diff programs without having to understand all the complexities of tying into CVS code seamlessly. Interfacing is much easier. We could even take the XML diff/merge programs that are already available and just write wrappers for them. No point in reinventing the wheel here.

--- End of forwarded message from [EMAIL PROTECTED]
RE: merge mode for XML
I tried not to argue about the virtues of SGML/XML -- the fact is that it is there, and any non-proprietary alternatives have similar properties wrt. meaningful diffs (and thus merges). My IDE for editing XML and SGML files is usually emacs+psgml. I like the idea of extending PCL-CVS to invoke another diff tool (but I'll probably not get around to exploring the idea).

SGML and XML files are really just serialized representations of parse trees, infosets, and an infoset can be serialized in many equivalent ways. So diff'ing XML and SGML alike needs a validating parser, i.e. one that uses a schema such as a DTD.

<rant>There's a class of simple XML documents that live and die without getting near either a DTD or revision control. Without a schema and accompanying documentation, there's no way to tell the semantics of the XML document, and not much point in version management.</rant>

kind regards
Peter Ring

-Original Message- From: Greg A. Woods [mailto:[EMAIL PROTECTED]] Sent: 30. april 2002 19:09 To: Peter Ring Cc: CVS-II Discussion Mailing List Subject: RE: merge mode for XML

[ On Monday, April 29, 2002 at 08:31:24 (+0200), Peter Ring wrote: ] Subject: RE: merge mode for XML

I sort of agree with the logic of the arguments for SGML and its derivatives, but I find the rhetoric about it being the only choice because it's the best there is (something I've heard whined about for nearly two decades now) to be no more than self-serving, at best. As for source code beautification issues w.r.t. XML, well those are no different than when dealing with any kind of source code primarily written and edited with an IDE.

<snip/>

For instance PCL-CVS, the emacs front-end to CVS, allows one to re-do merges with ediff. I don't know if ediff could be extended to use external diff tools (and also perhaps alternate merge tools), or not, but that may be the best way for users with immediate needs to proceed. (I.e. even if you're not an emacs user, treat emacs as an application framework and use emacs+PCL-CVS+ediff as a stand-alone CVS interface.)

There are, to the best of my knowledge, no freely available stand-alone SGML diff tools. Some editors, e.g. ArborText Epic, can do a very nice compare.

Would not a full stand-alone SGML diff tool be required to understand the DTD in order to do a proper job of knowing just how different tagged elements relate to each other, and so whether or not they have to be included in any delta or merge?

-- Greg A. Woods
RE: merge mode for XML
[ On Wednesday, May 1, 2002 at 00:04:39 (+0200), Peter Ring wrote: ] Subject: RE: merge mode for XML

SGML and XML files are really just serialized representations of parse trees, infosets, and an infoset can be serialized in many equivalent ways.

Hmmm, it's just that they have the most horrid syntax. It's really too bad, but after all these years that's one heck of a herd of elephants to try and turn on the head of a pin :-)

So diff'ing XML and SGML alike need a validating parser, i.e. one that uses a schema such as a DTD.

Yes -- that's what I thought...

<rant>There's a class of simple XML documents that live and die without getting near either a DTD or revision control. Without a schema and accompanying documentation, there's no way to tell the semantics of the XML document, and not much point in version management.</rant>

Amen. I couldn't agree more! Those who dare call such things XML are sadly mistaken.

-- Greg A. Woods
RE: merge mode for XML
One of the most widespread uses of XML is as a neutral storage and exchange format for documents. In these cases, avoiding XML or SGML would just imply going back to Word or FrameMaker (and we don't want that), or to LaTeX or Texinfo, which are similar wrt. merging. Or HTML, an application of SGML. And anyway, a lot of documentation for open source projects is being written in or converted to DocBook, and will be maintained using the same revision control tools as the rest of the project, i.e., cvs. So we are going to see questions about XML or SGML pop up more frequently.

Many of the issues wrt. cvs are essentially not much different from maintaining documentation written in LaTeX or Texinfo format. Except you can't assume that authors will be using vi or emacs. A lot of different tools will be used -- that was one of the main points of using SGML or XML.

It is common sense to break up a document into mini- or micro-documents, each with its own lifecycle -- just as you do for programming source code. The concept of storage management is built into SGML and XML at a very low level. The customary way to do this is by declaring entities, symbolic names for storage objects, which can then be included in other documents at appropriate places. XInclude and XLink (and for SGML, HyTime) also offer ways to include or locate parts of documents in terms of parse trees.

But how about the physical storage format of each file? Authors will often be using different XML or SGML editors that will 'beautify' the XML or SGML source in different ways, introducing spurious differences and conflicts. Other sources of spurious conflicts are character encoding, namespace declarations, and order of attributes; most documents can be stored in a number of different ways with no loss of information for the intended use. But a simple diff will show a lot of differences that are, essentially, not there. Until proper XML repositories become as ubiquitous as cvs, we might as well find a way to live with it.

The character encoding is easy to control -- SGML and XML are very explicit about it, and editors do in general handle encoding gracefully. Namespace declarations and attribute order are tricky. Things can be normalized, see Canonical XML, http://www.w3.org/TR/xml-c14n, but full canonicalization of a document will be too much.

The 'beautify' problem is even worse, i.e., how to introduce and remove whitespace in a way that makes cvs behave meaningfully. I have not yet found a simple recipe for beautifying SGML and XML. Here are some of the options:

The most generic, simple and safe way to break XML or SGML into lines is unfortunately not too pretty. Keep any line breaks already present in the source and, in addition, break just _before_ the markup delimiter close character '>' on the start tag, e.g.:

$ osx xml.dcl beautify.xml
<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE section PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN" "docbookx.dtd">
<section
><title
>Beautifying XML</title><para
>Papageno</para><para
>Break inside markup like this: <emphasis role="bold"
>some text</emphasis>.</para><para
>Papagena</para></section>

Some tools can beautify in a way more suitable for human consumption:

$ xmllint --format beautify.xml
<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE section PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN" "docbookx.dtd">
<section>
  <title>Beautifying XML</title>
  <para>Papageno</para>
  <para>Break inside markup like this: <emphasis role="bold">some text</emphasis>.</para>
  <para>Papagena</para>
</section>

Keeping white space in character context while beautifying is a simple way to avoid problems with NOTATION linespecific AKA xml:space='preserve' AKA <pre>. But the reason we needed a beautifier in the first place is that editors put in different amounts of whitespace in different places. If someone out there has a nice and robust XSLT stylesheet for normalizing/beautifying XML, please publish!

There are, BTW, XML diff tools. See e.g.:
http://www.alphaworks.ibm.com/tech/xmldiffmerge
http://www.deltaxml.com
http://www.vmguys.com/vmtools
http://www.logilab.org/xmldiff
The first one can be used as a merge tool. The other ones can produce an XML diff file that -- given a proper XML patch utility -- can update one XML file to become the other one.

There are, to the best of my knowledge, no freely available stand-alone SGML diff tools. Some editors, e.g. ArborText Epic, can do a very nice compare.

kind regards,
Peter Ring

-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]On Behalf Of Greg A. Woods Sent: 26. april 2002 23:45 To: CVS-II Discussion Mailing List Subject: RE: merge mode for XML

<snip>
A better approach is to avoid XML entirely in the first place -- it's a really really horrid syntax with all kinds of goo that's usually way over-kill for the application, being SGML based and all that
</snip>
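To make the point about attribute order and normalization concrete, here is a minimal sketch in Python (the language is my choice, nothing in the thread uses it). It relies on `xml.etree.ElementTree.canonicalize` from the standard library (Python 3.8 or later), which implements C14N 2.0 rather than the Canonical XML 1.0 document linked above, so treat it as an approximation of the idea. The two sample documents are made up.

```python
import xml.etree.ElementTree as ET

# Two made-up serializations of the same element: attribute order and
# quoting style differ, the information content does not.
DOC_A = '<para role="bold" id="p1">Papageno</para>'
DOC_B = "<para id='p1' role='bold'>Papageno</para>"


def canonical(xml_text: str) -> str:
    """Return the C14N 2.0 form of xml_text (attributes sorted, quotes unified)."""
    return ET.canonicalize(xml_data=xml_text)


if __name__ == "__main__":
    print(DOC_A == DOC_B)                        # the raw text differs
    print(canonical(DOC_A) == canonical(DOC_B))  # the canonical forms match
```

Running both files through a step like this before check-in would remove the attribute-order noise from a line-based diff. It does not touch whitespace between elements by default, so the 'beautify' problem remains a separate question.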
RE: merge mode for XML
A better approach is to avoid XML entirely in the first place -- it's a really really horrid syntax with all kinds of goo that's usually way over-kill for the application, being SGML based and all that

I agree that XML is overkill, but the truth is that it is here to stay. XML is fast becoming accepted as the de facto standard for data exchange. Opto 22 makes machine control sensors / PLCs that publish data in XML. Siemens is doing similar things from what I understand. Java uses XML for all of the enterprise application descriptors. It seems that I can't interface to machines, or program, without looking at XML.

If CVS had a way to use modular plug-in diff and merge programs, we could set up a wrapper file that would automatically diff/merge the file differently based on the extension, e.g.:
*.xml xml_dm
*.html html_dm
This way we could write our own diff programs without having to understand all the complexities of tying into CVS code seamlessly. Interfacing is much easier. We could even take the XML diff/merge programs that are already available and just write wrappers for them. No point in reinventing the wheel here.

Sean.
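The wrapper-file idea above could be sketched roughly as follows. This is only an illustration in Python of a pattern-to-tool lookup, not anything CVS provides: the file format, the function names, and the tool names `xml_dm` and `html_dm` (taken from the example) are all hypothetical, and `diff3` is used here merely as a stand-in default.

```python
import fnmatch
from typing import List, Optional, Tuple

# Hypothetical wrapper table in the spirit of the "*.xml xml_dm" example.
# The tool names are placeholders, not real programs or CVS features.
WRAPPER_TEXT = """\
*.xml   xml_dm
*.html  html_dm
"""


def parse_wrappers(text: str) -> List[Tuple[str, str]]:
    """Parse 'pattern tool' lines, skipping blank lines and '#' comments."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        pattern, tool = line.split(None, 1)
        rules.append((pattern, tool.strip()))
    return rules


def pick_tool(filename: str, rules: List[Tuple[str, str]],
              default: Optional[str] = None) -> Optional[str]:
    """Return the tool for the first matching pattern, else the default."""
    for pattern, tool in rules:
        if fnmatch.fnmatchcase(filename, pattern):
            return tool
    return default


if __name__ == "__main__":
    rules = parse_wrappers(WRAPPER_TEXT)
    for name in ("build.xml", "index.html", "main.c"):
        print(name, "->", pick_tool(name, rules, default="diff3"))
```

First match wins, so more specific patterns would need to be listed before general ones.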
RE: merge mode for XML
Doesn't everyone format their XML like that? I.e. like HTML so that tags are on their own lines and there are extra blank lines (that won't be treated as data) between groups of items and even between items too?

Not only do tools not always follow these rules, you can't even always treat HTML like that. Besides making a huge file, it messes up the rendering of tables with sliced-up images.
RE: merge mode for XML
Once again, take a look at message ID# [EMAIL PROTECTED] posted to this forum on September 16, 2001. It illustrates one way (though perhaps not the best way) to do just this. It relies on a lookup table that looks up a diff tool given a file's name. A better implementation would be to code a symbolic name for the merge tool in a newphrase in the admin section of the RCS file, and look up that symbolic name on the client to locate the proper tool.

--- Forwarded mail from [EMAIL PROTECTED]

A better approach is to avoid XML entirely in the first place -- it's a really really horrid syntax with all kinds of goo that's usually way over-kill for the application, being SGML based and all that

I agree that XML is overkill, but the truth is that it is here to stay. XML is fast becoming accepted as the de facto standard for data exchange. Opto 22 makes machine control sensors / PLCs that publish data in XML. Siemens is doing similar things from what I understand. Java uses XML for all of the enterprise application descriptors. It seems that I can't interface to machines, or program, without looking at XML.

If CVS had a way to use modular plug-in diff and merge programs, we could set up a wrapper file that would automatically diff/merge the file differently based on the extension, e.g.:
*.xml xml_dm
*.html html_dm
This way we could write our own diff programs without having to understand all the complexities of tying into CVS code seamlessly. Interfacing is much easier. We could even take the XML diff/merge programs that are already available and just write wrappers for them. No point in reinventing the wheel here.

--- End of forwarded message from [EMAIL PROTECTED]
RE: merge mode for XML
[ On Monday, April 29, 2002 at 07:23:03 (-0700), Noel Yap wrote: ] Subject: RE: merge mode for XML

--- Sean Hager [EMAIL PROTECTED] wrote:

If CVS had a way to use modular plug-in diff and merge programs, we could set up a wrapper file that would automatically diff/merge the file differently based on the extension, e.g.:
*.xml xml_dm
*.html html_dm

Ideally, the diff/merge tool would be tied to the type of the file, and the type of the file would initially be set depending on the extension. This way, one would be able to change the type of the file independent of its extension.

Yes, please and thank you! :-)

-- Greg A. Woods
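One way to picture the "type is set from the extension at first, but can be changed later" idea is the small Python sketch below. Everything in it is hypothetical: CVS has no such type registry, and the class, the type names, and the tool names are invented for illustration only.

```python
import os
from typing import Dict

# Hypothetical mappings; none of these names exist in CVS.
EXTENSION_TO_TYPE = {".xml": "xml", ".html": "html"}
TYPE_TO_TOOL = {"xml": "xml_dm", "html": "html_dm", "text": "diff3"}


class FileTypes:
    """Remembers a type per file; the extension only supplies the initial guess."""

    def __init__(self) -> None:
        self._types: Dict[str, str] = {}

    def add(self, path: str) -> str:
        """Register a file, deriving its initial type from the extension."""
        ext = os.path.splitext(path)[1].lower()
        self._types[path] = EXTENSION_TO_TYPE.get(ext, "text")
        return self._types[path]

    def set_type(self, path: str, file_type: str) -> None:
        """Change the stored type without renaming the file."""
        if file_type not in TYPE_TO_TOOL:
            raise ValueError(f"unknown type: {file_type}")
        self._types[path] = file_type

    def tool_for(self, path: str) -> str:
        """Return the diff/merge tool for the file's current type."""
        return TYPE_TO_TOOL[self._types[path]]


if __name__ == "__main__":
    types = FileTypes()
    types.add("settings.conf")              # unknown extension, starts as text
    print(types.tool_for("settings.conf"))
    types.set_type("settings.conf", "xml")  # later found to contain XML
    print(types.tool_for("settings.conf"))
```

The design point is that the tool is looked up from the stored type, so the extension is consulted only once, when the file is added.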
RE: merge mode for XML
[ On Monday, April 29, 2002 at 09:54:50 (-0400), Gary Bisaga wrote: ] Subject: RE: merge mode for XML

Doesn't everyone format their XML like that? I.e. like HTML so that tags are on their own lines and there are extra blank lines (that won't be treated as data) between groups of items and even between items too?

Not only do tools not always follow these rules, you can't even always treat HTML like that. Besides making a huge file, it messes up the rendering of tables with sliced-up images.

I'm not talking about placing blank lines and/or comments in places where they would be considered part of the data. There are plenty of places to do so without affecting the interpretation of the data.

-- Greg A. Woods
RE: merge mode for XML
--- Greg A. Woods [EMAIL PROTECTED] wrote:

Not only do tools not always follow these rules, you can't even always treat HTML like that. Besides making a huge file, it messes up the rendering of tables with sliced-up images.

I'm not talking about placing blank lines and/or comments in places where they would be considered part of the data. There are plenty of places to do so without affecting the interpretation of the data.

In theory, this is easy to do, but in practice I have seen browsers act differently due to whitespace that really shouldn't affect the rendering. IIRC, <table><tr> worked differently from <table>\n<tr> on at least one browser.

Noel
RE: merge mode for XML
[ On Monday, April 29, 2002 at 13:25:48 (-0700), Noel Yap wrote: ] Subject: RE: merge mode for XML

In theory, this is easy to do, but in practice I have seen browsers act differently due to whitespace that really shouldn't affect the rendering. IIRC, <table><tr> worked differently from <table>\n<tr> on at least one browser.

I don't know of any browsers that parse XML -- you seem to be talking about HTML.

In any case who the heck cares about a broken HTML parser in some random browser (even if it is a commonly used version of M$-Exploder)?!?!?!?

If your users can't use standards-compliant browsers then get new users! :-)

-- Greg A. Woods
RE: merge mode for XML
--- Greg A. Woods [EMAIL PROTECTED] wrote:

[ On Monday, April 29, 2002 at 13:25:48 (-0700), Noel Yap wrote: ] Subject: RE: merge mode for XML

In theory, this is easy to do, but in practice I have seen browsers act differently due to whitespace that really shouldn't affect the rendering. IIRC, <table><tr> worked differently from <table>\n<tr> on at least one browser.

I don't know of any browsers that parse XML -- you seem to be talking about HTML.

I apologize for truncating the portion of the post you had responded to. Here it is as a reminder so that you can take my post in its intended context:

Doesn't everyone format their XML like that? I.e. like HTML so that tags are on their own lines and there are extra blank lines (that won't be treated as data) between groups of items and even between items too?

In any case who the heck cares about a broken HTML parser in some random browser (even if it is a commonly used version of M$-Exploder)?!?!?!?

Actually, I believe it was the other commonly used browser.

If your users can't use standards-compliant browsers then get new users! :-)

In any case, those of us who program as a service to the greater community have to deal with broken software no matter how much we hate it. Greg, which corporation did you say you worked in again? In what industry?

Noel
RE: merge mode for XML
-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]On Behalf Of Greg A. Woods Sent: Thursday, April 25, 2002 8:24 PM To: [EMAIL PROTECTED] Cc: CVS-II Discussion Mailing List Subject: Re: merge mode for XML

[ On Thursday, April 25, 2002 at 16:10:37 (-0500), Sean Hager wrote: ] Subject: merge mode for XML

Is there a merge mode or merge algorithm that works well for XML files?

Doesn't diff3 work well enough? XML files are more or less just text, right? If the tags are all on separate lines, then regardless of whether content is changed, or tags are changed, diff3 will do the right thing.

-- Greg A. Woods

I did test the merge with XML files and it worked fine as long as we made edits more than 5 to 10 lines apart from each other. When we made edits that were only 1 or 2 lines apart we got conflicts which I felt the merge should have been able to solve. I am not a CVS expert, but I was thinking that perhaps the tags diff3 was looking for were different for XML. I am going to test it some more; perhaps I will have to look into the source code, but I was hoping someone on the list had some experience with this.

sean.
RE: merge mode for XML
Sean,

Our result is more or less the same as yours. We do this all the time. The diff does work, but it seems to get confused when people are making changes close to each other. FWIW, it seems to have problems with whitespace and attribute values. Not a very scientific result, I grant you, but that has been my experience.

Our solution (obviously wouldn't work for everybody) is just to partition the XML into multiple files, each of which will hopefully be worked on mostly by one person at a time. Our configuration manager reads in all similarly-named XML files in parallel, so if you put everything into one file or split it into 100 it doesn't really matter. I even encourage people working on their own part of the system for a while to create a temporary 'jimsmith.page_registry.xml' file to reduce merge conflicts. Works for us. Since we have done this, it has cut way back on merge conflicts and the corresponding system errors.

BTW, I found this a useful technique with locking CM systems like PVCS too. Replace "merge conflict" with "locked files".

gary

-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]On Behalf Of Sean Hager Sent: Friday, April 26, 2002 9:17 AM To: 'CVS-II Discussion Mailing List' Subject: RE: merge mode for XML

-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]On Behalf Of Greg A. Woods Sent: Thursday, April 25, 2002 8:24 PM To: [EMAIL PROTECTED] Cc: CVS-II Discussion Mailing List Subject: Re: merge mode for XML

[ On Thursday, April 25, 2002 at 16:10:37 (-0500), Sean Hager wrote: ] Subject: merge mode for XML

Is there a merge mode or merge algorithm that works well for XML files?

Doesn't diff3 work well enough? XML files are more or less just text, right? If the tags are all on separate lines, then regardless of whether content is changed, or tags are changed, diff3 will do the right thing.

-- Greg A. Woods

I did test the merge with XML files and it worked fine as long as we made edits more than 5 to 10 lines apart from each other. When we made edits that were only 1 or 2 lines apart we got conflicts which I felt the merge should have been able to solve. I am not a CVS expert, but I was thinking that perhaps the tags diff3 was looking for were different for XML. I am going to test it some more; perhaps I will have to look into the source code, but I was hoping someone on the list had some experience with this.

sean.
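The partitioning approach Gary describes (a loader that reads every similarly named XML file, so it does not matter how the content is split) might look something like this in Python. This is a guess at the general shape, not his configuration manager: the function name, the default suffix, and the assumption that each file has a single root whose children are the entries are all mine. Only the 'jimsmith.page_registry.xml' file name comes from the message.

```python
import glob
import os
import xml.etree.ElementTree as ET
from typing import List


def load_entries(directory: str,
                 suffix: str = "page_registry.xml") -> List[ET.Element]:
    """Collect the child elements of every '*<suffix>' file in directory.

    Assumes each file has one root element whose children are the entries.
    """
    entries: List[ET.Element] = []
    pattern = os.path.join(directory, "*" + suffix)
    for path in sorted(glob.glob(pattern)):  # sorted for a stable order
        root = ET.parse(path).getroot()
        entries.extend(list(root))
    return entries


if __name__ == "__main__":
    # Picks up page_registry.xml as well as jimsmith.page_registry.xml
    # if they exist in the current directory.
    for entry in load_entries("."):
        print(entry.tag, entry.attrib)
```

Because the loader only cares about the suffix, a developer can move entries into a personal file and back again without changing what the application sees.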
RE: merge mode for XML
It helps me to think of a plain ASCII text file source (C, java, perl etc.) as a markup language where a newline is the only tag. To extend the delta generation to a more structured markup language, such as XML, would probably require knowledge of that syntax by the diff program.

A quick and dirty approach may be to prefix all opening tags with a newline, suffix all closing tags with a newline, and then remove all blank lines unless inside a tag (in and among CDATA); essentially run it through an XML equivalent of cb(1) or indent(1).

--@@ ~ DavidC No State shall convert a liberty into a privilege, license it, and charge a fee therefore. ~ Murdock v. Pennsylvania, 319 US 105, US Supreme Court, 1943.

-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] Sent: Thursday, April 25, 2002 6:24 PM To: [EMAIL PROTECTED] Cc: [EMAIL PROTECTED] Subject: Re: merge mode for XML

[ On Thursday, April 25, 2002 at 16:10:37 (-0500), Sean Hager wrote: ] Subject: merge mode for XML

Is there a merge mode or merge algorithm that works well for XML files?

Doesn't diff3 work well enough? XML files are more or less just text, right? If the tags are all on separate lines, then regardless of whether content is changed, or tags are changed, diff3 will do the right thing.

-- Greg A. Woods
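The quick and dirty approach described above can be sketched with two regular expressions in Python. This is a deliberately naive illustration, not a real XML beautifier: the function name is made up, it ignores CDATA sections, comments and processing instructions, it will misbehave if an attribute value contains '>', and it changes whitespace inside mixed content, which may or may not matter for a given document type.

```python
import re

# Start tags (including empty-element tags): '<' followed by a name character.
OPEN_TAG = re.compile(r"(<[A-Za-z_][^<>]*>)")
# End tags: '</name>'.
CLOSE_TAG = re.compile(r"(</[^<>]+>)")


def one_tag_per_line(xml_text: str) -> str:
    """Naively put a newline before start tags and after end tags."""
    text = OPEN_TAG.sub(r"\n\1", xml_text)  # newline before each start tag
    text = CLOSE_TAG.sub(r"\1\n", text)     # newline after each end tag
    # Drop the blank lines the two substitutions leave behind.
    lines = [line for line in text.splitlines() if line.strip()]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = ("<section><title>Beautifying XML</title>"
              "<para>Papageno</para></section>")
    print(one_tag_per_line(sample), end="")
```

With one element per line, two people editing different elements touch different lines, which is what a line-based diff needs in order to tell the changes apart.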
RE: merge mode for XML
[ On Friday, April 26, 2002 at 08:16:44 (-0500), Sean Hager wrote: ] Subject: RE: merge mode for XML

I am not a CVS expert, but I was thinking that perhaps the tags diff3 was looking for were different for XML. I am going to test it some more; perhaps I will have to look into the source code, but I was hoping someone on the list had some experience with this.

Diff and diff3 look for changes across entire lines of text -- i.e. the separator is the newline. Judicious use of blank lines, or other syntax elements that won't normally change (e.g. comments!), will help separate related units by enough static lines that diff and diff3 won't blur them into one change.

-- Greg A. Woods
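The effect of a static separator line can be illustrated with Python's `difflib`. Note the assumptions: `difflib.SequenceMatcher` is not the algorithm GNU diff or diff3 uses, and this sketch only counts changed regions between two versions rather than performing a three-way merge. It is meant to show the line-level granularity only, and the sample lines are invented.

```python
from difflib import SequenceMatcher
from typing import List


def changed_regions(old: List[str], new: List[str]) -> int:
    """Count the separate non-equal regions in a line-by-line comparison."""
    opcodes = SequenceMatcher(None, old, new, autojunk=False).get_opcodes()
    return sum(1 for tag, *_ in opcodes if tag != "equal")


# Two edits on adjacent lines: nothing unchanged lies between them.
BASE = ["<item>one</item>", "<item>two</item>"]
EDITED = ["<item>ONE</item>", "<item>TWO</item>"]

# The same edits with a line between them that never changes.
SPACED_BASE = ["<item>one</item>", "<!-- separator -->", "<item>two</item>"]
SPACED_EDITED = ["<item>ONE</item>", "<!-- separator -->", "<item>TWO</item>"]

if __name__ == "__main__":
    print(changed_regions(BASE, EDITED))                # one merged region
    print(changed_regions(SPACED_BASE, SPACED_EDITED))  # two separate regions
```

When two people's edits fall into the same region, a line-based merge has no unchanged line to anchor on, which fits the conflicts reported earlier in the thread for edits only a line or two apart.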
RE: merge mode for XML
[ On Friday, April 26, 2002 at 06:51:36 (-0700), EXT-Corcoran, David wrote: ] Subject: RE: merge mode for XML

It helps me to think of a plain ASCII text file source (C, java, perl etc.) as a markup language where a newline is the only tag.

:-)

To extend the delta generation to a more structured markup language, such as XML, would probably require knowledge of that syntax by the diff program.

Just as it could with any language. (E.g. C statements and expressions should really be treated as units.)

A quick and dirty approach may be to prefix all opening tags with a newline, suffix all closing tags with a newline, and then remove all blank lines unless inside a tag (in and among CDATA); essentially run it through an XML equivalent of cb(1) or indent(1).

Doesn't everyone format their XML like that? I.e. like HTML so that tags are on their own lines and there are extra blank lines (that won't be treated as data) between groups of items and even between items too?

A better approach is to avoid XML entirely in the first place -- it's a really really horrid syntax with all kinds of goo that's usually way over-kill for the application, being SGML based and all that.

You're no worse off defining a proper little language for your data. Writing a parser with modern grammar compilers is no harder than writing a good DTD, and it means you can avoid all that annoying useless syntax that just gets in your way. By following common approaches to simple data description syntax design one can even make a custom little language trivial for most programmers to learn (thus avoiding one of the few arguments against using a custom language).

-- Greg A. Woods