CVS commit: othersrc/external/bsd/bikeshed/dist/design

2014-10-20 Thread David A. Holland
Module Name:    othersrc
Committed By:   dholland
Date:   Tue Oct 21 04:50:11 UTC 2014

Added Files:
othersrc/external/bsd/bikeshed/dist/design: schema.txt

Log Message:
Some notes on storing version control info in a graph datastore.


To generate a diff of this commit:
cvs rdiff -u -r0 -r1.1 othersrc/external/bsd/bikeshed/dist/design/schema.txt

Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.

Added files:

Index: othersrc/external/bsd/bikeshed/dist/design/schema.txt
diff -u /dev/null othersrc/external/bsd/bikeshed/dist/design/schema.txt:1.1
--- /dev/null	Tue Oct 21 04:50:11 2014
+++ othersrc/external/bsd/bikeshed/dist/design/schema.txt	Tue Oct 21 04:50:11 2014
@@ -0,0 +1,166 @@
+Data schema for bikeshed
+
+
+Version control history forms a graph (a directed acyclic graph) so
+the best way to think of the version database is explicitly as a graph
+database. The property graph model is the least toxic extant way to
+handle graph-structured data, so we'll describe things in those terms.
+
+In the property graph model, a database is composed of one or more
+graphs; a graph is composed of nodes and edges; nodes and edges have
+IDs; edges also have labels; and nodes and edges furthermore have an
+arbitrary collection of attributes. A description of a schema gives
+the various types of nodes and edges that appear, and the attributes
+they're expected to have.
+
+
+1. Metadata vs. data
+
+The version control system has data (that is, files whose versions it
+controls) and it also has metadata: commit messages, version numbers,
+tags, etc.
+
+It is clear from the long-simmering argument about editing commit
+messages that most or all metadata needs to be mutable, even in the
+long term. Granting this one then immediately recognizes that metadata
+needs to be versioned: obviously if you e.g. change a commit message
+the old version should be retained for archival purposes, and to give
+an audit trail if needed. Meanwhile, the negative example of
+Mercurial's odd tagging semantics shows that metadata and data
+versions need to be handled separately. In particular, metadata
+changes should not create new data versions... and checking out an old
+data version should not cause the metadata to also be rolled back.
+(Maybe, however, rolling back the metadata should also force rolling
+back to a matching data version; however, it isn't clear if rolling
+back metadata, as opposed to just inspecting its history, is even an
+interesting thing to do.)
+
+
+2. Versions, changesets, branches, etc.
+
+Since we're versioning a whole filesystem tree, we need a
+representation of (a single state of) the tree, the files within the
+tree, and possibly regions within files. This will be some collection
+of graph nodes and edges.
+
+Each such version will chain back to the previous and forward to the
+next. (Or possibly back to more than one previous version for merges,
+and forward to more than one when branching occurs.) There should be
+one master node for the whole thing, which links forward and backward
+to the master nodes for other versions, and which also links to the
+nodes that describe the version.
+
+Individual objects within the version will be represented by nodes and
+these should also chain backwards and forwards to corresponding nodes
+in other versions. This allows traversing the history of an individual
+file cheaply.
+
+Aggregation of sub-versions into a single full version can be done by
+having the toplevel version point to the sub-versions, the same way a
+branch node points to the versions on the branch.
+
+Now, over on the metadata side, each metadata version has a system of
+nodes and edges that's a projection of the graph of data versions. All
+of this links to a master node for the metadata version, and the
+metadata version links forward and backward to other metadata
+versions.
+
+There's a common pattern here: given a graph (whether a filesystem
+tree or version graph), we manage a bigger graph that's a collection
+of versions of that graph. Generalizing this recursion would allow
+tracking meta-metadata, e.g. commit messages for metadata changes, and
+maintaining its history; however, there's no limit and I don't think
+it ends up sane. Hopefully we can track commit messages for metadata
+changes as part of the metadata history; although that becomes
+self-referential in a way that might also cause problems.
+
+We also need nodes for branches that link to the versions that are on
+that branch. The flow relationships between branches should be edges
+between branches.
+
+For tracking which changesets have been propagated to parallel
+branches and which haven't... we could make edges that join equivalent
+or merge-equivalent changesets directly, but this by itself isn't
+sufficient. Because changes aren't necessarily propagated in
+one-to-one fashion (e.g. one might merge several changes into another
+b

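Illustration only, not part of the commit above: a minimal Python sketch
of the property graph model that schema.txt describes, with version nodes
and per-file nodes chained back to their predecessors. The class names,
the edge labels "prev" and "contains", and the attribute names are all
assumptions made for this example; the design notes do not fix any of
them. Only backward links are stored here, on the assumption that forward
links can be recovered by reverse lookup.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Node:
    id: int
    attrs: dict = field(default_factory=dict)


@dataclass
class Edge:
    id: int
    label: str          # edges carry a label in addition to attributes
    src: int
    dst: int
    attrs: dict = field(default_factory=dict)


class Graph:
    """Toy in-memory property graph; nodes and edges share one ID space."""

    def __init__(self):
        self._ids = count(1)
        self.nodes = {}
        self.edges = {}

    def add_node(self, **attrs):
        node = Node(next(self._ids), attrs)
        self.nodes[node.id] = node
        return node.id

    def add_edge(self, label, src, dst, **attrs):
        edge = Edge(next(self._ids), label, src, dst, attrs)
        self.edges[edge.id] = edge
        return edge.id

    def out(self, src, label):
        """IDs of nodes reached from src over edges with the given label."""
        return [e.dst for e in self.edges.values()
                if e.src == src and e.label == label]


def history(graph, start):
    """Breadth-first walk over "prev" edges, so merge nodes with several
    predecessors are handled and no node is visited twice."""
    order, queue, seen = [], [start], {start}
    while queue:
        cur = queue.pop(0)
        order.append(cur)
        for pred in graph.out(cur, "prev"):
            if pred not in seen:
                seen.add(pred)
                queue.append(pred)
    return order


# Two tree versions, each with its own node for the same (hypothetical) file.
g = Graph()
v1 = g.add_node(kind="version", number=1)
v2 = g.add_node(kind="version", number=2)
g.add_edge("prev", v2, v1)            # master node chains to its predecessor

f1 = g.add_node(kind="file", path="README")
f2 = g.add_node(kind="file", path="README")
g.add_edge("contains", v1, f1)        # master node links to the version's objects
g.add_edge("contains", v2, f2)
g.add_edge("prev", f2, f1)            # per-file chain, independent of the tree chain

file_history = history(g, f2)         # walks the file chain only
tree_history = history(g, v2)         # walks the master-node chain
```

Because the file nodes carry their own "prev" edges, walking one file's
history never touches the master version nodes, which is the cheap
per-file traversal the notes ask for.
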
CVS commit: othersrc/external/bsd/bikeshed/dist/design

2014-03-05 Thread David A. Holland
Module Name:    othersrc
Committed By:   dholland
Date:   Thu Mar  6 05:24:08 UTC 2014

Modified Files:
othersrc/external/bsd/bikeshed/dist/design: requirements.txt

Log Message:
a couple minor adjustments, sitting around since last july


To generate a diff of this commit:
cvs rdiff -u -r1.2 -r1.3 \
othersrc/external/bsd/bikeshed/dist/design/requirements.txt

Modified files:

Index: othersrc/external/bsd/bikeshed/dist/design/requirements.txt
diff -u othersrc/external/bsd/bikeshed/dist/design/requirements.txt:1.2 othersrc/external/bsd/bikeshed/dist/design/requirements.txt:1.3
--- othersrc/external/bsd/bikeshed/dist/design/requirements.txt:1.2	Fri May 24 08:25:11 2013
+++ othersrc/external/bsd/bikeshed/dist/design/requirements.txt	Thu Mar  6 05:24:08 2014
@@ -178,7 +178,7 @@ in up front.
 
 Branches that are intended to diverge (releases, for example, or
 outright project forks) are fundamentally different from branches that
-are expected to reconnect to their parent (b.g. feature development
+are expected to reconnect to their parent (e.g. feature development
 branches) once the version control system has any kind of branch
 management or tracking support.
 
@@ -197,7 +197,8 @@ capable of listing the ones that haven't
 possible action. There is absolutely no reason, however, that the
 version control system shouldn't be able to provide this information.
 AIUI, for release branches releng currently has to maintain this
-metadata by hand.
+metadata by hand. (Update: "no existing ..." may actually be "no
+existing free ...".)
 
 If you duplicate a file, such as cloning a device driver template file
 for a new driver, or starting a new pmap by copying an old one,



CVS commit: othersrc/external/bsd/bikeshed/dist/design

2013-05-24 Thread Thomas Klausner
Module Name:    othersrc
Committed By:   wiz
Date:   Fri May 24 08:25:12 UTC 2013

Modified Files:
othersrc/external/bsd/bikeshed/dist/design: requirements.txt

Log Message:
typo.


To generate a diff of this commit:
cvs rdiff -u -r1.1 -r1.2 \
othersrc/external/bsd/bikeshed/dist/design/requirements.txt

Modified files:

Index: othersrc/external/bsd/bikeshed/dist/design/requirements.txt
diff -u othersrc/external/bsd/bikeshed/dist/design/requirements.txt:1.1 othersrc/external/bsd/bikeshed/dist/design/requirements.txt:1.2
--- othersrc/external/bsd/bikeshed/dist/design/requirements.txt:1.1	Fri May 24 00:41:31 2013
+++ othersrc/external/bsd/bikeshed/dist/design/requirements.txt	Fri May 24 08:25:11 2013
@@ -44,7 +44,7 @@ even many recent systems have felt the n
 protocols and network-level constructs.
 
 Easier collaboration with non-committers is an often-requested
-feature and a de facto property of distributed version contorl.
+feature and a de facto property of distributed version control.
 
 Unanswered questions:
 



CVS commit: othersrc/external/bsd/bikeshed/dist/design

2013-05-23 Thread David A. Holland
Module Name:    othersrc
Committed By:   dholland
Date:   Fri May 24 00:41:31 UTC 2013

Added Files:
othersrc/external/bsd/bikeshed/dist/design: requirements.txt

Log Message:
Stuff distilled from my notes and previous arguments and bikeshed sessions


To generate a diff of this commit:
cvs rdiff -u -r0 -r1.1 \
othersrc/external/bsd/bikeshed/dist/design/requirements.txt

Added files:

Index: othersrc/external/bsd/bikeshed/dist/design/requirements.txt
diff -u /dev/null othersrc/external/bsd/bikeshed/dist/design/requirements.txt:1.1
--- /dev/null	Fri May 24 00:41:31 2013
+++ othersrc/external/bsd/bikeshed/dist/design/requirements.txt	Fri May 24 00:41:31 2013
@@ -0,0 +1,340 @@
+The material herein is grouped first by topic and then by priority.
+
+
+
+1. Operational model
+
+- Centralized operation with one master tree
+- Supports disconnected operation
+- No compare-by-hash
+- Native support for synced slave copies of the master tree (like anoncvs)
+- Transport-independent remote operation, supporting both http/https
+  and ssh
+- Checkouts can cache arbitrary amounts of history locally but are not
+obliged to clone everything
+- Non-committers with readonly checkouts should be able to package
+changesets for review and commit by committers.
+
+Rationale:
+
+Centralized operation is posed as a design requirement because it's a
+prerequisite for other things... and because this whole project is
+predicated on the assumption that centralized operation is acceptable.
+If someone comes up with a clever way to support distributed operation
+without compromising other requirements, well and good; otherwise one
+may as well use one of the modern distributed version control systems.
+
+Disconnected operation, meanwhile, covers most of the use cases people
+cite in favor of distributed version control.
+
+Compare-by-hash is bad not because it's slightly sleazy, or because
+the statistical assumptions about the probability of collisions are
+wrong (although in some contexts they're questionable) -- it's because
+cryptographic hash functions don't age well and the standard DVCS
+scheme for hashing chains of versions doesn't provide any decent way
+to migrate an existing repository to a new hash function.
+
+Native support for synced slave copies is needed in order to be able
+to provide anonymous access (like anoncvs) without needing access to
+the master tree. This is also meant to satisfy the use cases where
+currently people rsync the whole CVS repository locally.
+
+Transport-independent remote operation should be a no-brainer, but
+even many recent systems have felt the need to make up their own
+protocols and network-level constructs.
+
+Easier collaboration with non-committers is an often-requested
+feature and a de facto property of distributed version contorl.
+
+Unanswered questions:
+
+- Do we need support for disconnected operation by more than one user
+at a time (or perhaps more than one tree at a time) so that
+uncommitted changesets can be shared? The non-committer changeset
+support might cover this territory adequately, or not, depending on
+how it ends up working.
+
+
+2. Schema
+
+- Supports arbitrary (smallish) metadata attached to changesets, and
+also to files and directories
+- Metadata (including on old versions) is mutable and changes are kept
+in history (this includes commit message text)
+- Provides provenance tracking for changesets/commits
+- Commits/changesets are atomic
+- Version numbers (for projects, files, and subtrees if any) are
+sequential.
+- Supports rename (of files or dirs) properly, and file history
+crosses renames transparently
+- Supports copy/duplicate (of files or dirs) properly
+- Has a coherent semantic model of tree history
+- Supports local-only changes that are not pushed back to the master
+tree
+
+Rationale:
+
+While arbitrary metadata is a nuisance to support (compared to a small
+fixed metadata schema) and in many cases using this metadata facility
+(as opposed to storing information in an ordinary file in the
+repository) would be a mistake, it is nonetheless useful for various
+purposes. One of these is preserving old version numbers from a
+repository conversion; given the large number of references to NetBSD
+CVS file version numbers, including in places like security advisories
+that count as "important", preserving this information and making it
+searchable is highly desirable.
+
+Metadata should be mutable because sometimes it contains errors. One
+of the big weaknesses of current distributed version control is that
+effectively all metadata is immutable once committed; this means any
+botch not immediately detected is graven in stone for all time, unless
+someone does a complete repository rebuild updating all subsequent
+versions. Meanwhile, keeping the history sh