This revision was automatically updated to reflect the committed changes. Closed by commit rHG1fa35ca345a5: internals: document bundle2 format (authored by indygreg, committed by ).
REPOSITORY rHG Mercurial CHANGES SINCE LAST UPDATE https://phab.mercurial-scm.org/D2298?vs=5811&id=6167 REVISION DETAIL https://phab.mercurial-scm.org/D2298 AFFECTED FILES contrib/wix/help.wxs mercurial/help.py mercurial/help/internals/bundle2.txt mercurial/help/internals/bundles.txt tests/test-help.t CHANGE DETAILS diff --git a/tests/test-help.t b/tests/test-help.t --- a/tests/test-help.t +++ b/tests/test-help.t @@ -993,6 +993,7 @@ To access a subtopic, use "hg help internals.{subtopic-name}" + bundle2 Bundle2 bundles Bundles censor Censor changegroups Changegroups @@ -3059,6 +3060,13 @@ <tr><td colspan="2"><h2><a name="topics" href="#topics">Topics</a></h2></td></tr> <tr><td> + <a href="/help/internals.bundle2"> + bundle2 + </a> + </td><td> + Bundle2 + </td></tr> + <tr><td> <a href="/help/internals.bundles"> bundles </a> diff --git a/mercurial/help/internals/bundles.txt b/mercurial/help/internals/bundles.txt --- a/mercurial/help/internals/bundles.txt +++ b/mercurial/help/internals/bundles.txt @@ -63,8 +63,7 @@ ``HG20`` is currently the only defined bundle2 version. -The ``HG20`` format is not yet documented here. See the inline comments -in ``mercurial/exchange.py`` for now. +The ``HG20`` format is documented at :hg:`help internals.bundle2`. Initial ``HG20`` support was added in Mercurial 3.0 (released May 2014). However, bundle2 bundles were hidden behind an experimental flag diff --git a/mercurial/help/internals/bundle2.txt b/mercurial/help/internals/bundle2.txt new file mode 100644 --- /dev/null +++ b/mercurial/help/internals/bundle2.txt @@ -0,0 +1,677 @@ +Bundle2 refers to a data format that is used for both on-disk storage +and over-the-wire transfer of repository data and state. + +The data format allows the capture of multiple components of +repository data. Contrast with the initial bundle format, which +only captured *changegroup* data (and couldn't store bookmarks, +phases, etc). + +Bundle2 is used for: + +* Transferring data from a repository (e.g. as part of an ``hg clone`` + or ``hg pull`` operation). +* Transferring data to a repository (e.g. as part of an ``hg push`` + operation). +* Storing data on disk (e.g. the result of an ``hg bundle`` + operation). +* Transferring the results of a repository operation (e.g. the + reply to an ``hg push`` operation). + +At its highest level, a bundle2 payload is a stream that begins +with some metadata and consists of a series of *parts*, with each +part describing repository data or state or the result of an +operation. New bundle2 parts are introduced over time when there is +a need to capture a new form of data. A *capabilities* mechanism +exists to allow peers to understand which bundle2 parts the other +understands. + +Stream Format +============= + +A bundle2 payload consists of a magic string (``HG20``) followed by +stream level parameters, followed by any number of payload *parts*. + +It may help to think of the stream level parameters as *headers* and the +payload parts as the *body*. + +Stream Level Parameters +----------------------- + +Following the magic string is data that defines parameters applicable to the +entire payload. + +Stream level parameters begin with a 32-bit unsigned big-endian integer. +The value of this integer defines the number of bytes of stream level +parameters that follow. + +The *N* bytes of raw data contains a space separated list of parameters. +Each parameter consists of a required name and an optional value. + +Parameters have the form ``<name>`` or ``<name>=<value>``. + +Both the parameter name and value are URL quoted. + +Names MUST start with a letter. If the first letter is lower case, the +parameter is advisory and can safely be ignored. If the first letter +is upper case, the parameter is mandatory and the handler MUST stop if +it is unable to process it. + +Stream level parameters apply to the entire bundle2 payload. Lower-level +options should go into a bundle2 part instead. + +The following stream level parameters are defined: + +compression + Compression format of payload data. ``GZ`` denotes zlib. ``BZ`` + denotes bzip2. ``ZS`` denotes zstandard. + + When defined, all bytes after the stream level parameters are + compressed using the compression format defined by this parameter. + + If this parameter isn't present, data is raw/uncompressed. + + This parameter MUST be mandatory because attempting to consume + streams without knowing how to decode the underlying bytes will + result in errors. + +Payload Part +------------ + +Following the stream level parameters are 0 or more payload parts. Each +payload part consists of a header and a body. + +The payload part header consists of a 32-bit unsigned big-endian integer +defining the number of bytes in the header that follow. The special +value ``0`` indicates the end of the bundle2 stream. + +The binary format of the part header is as follows: + +* 8-bit unsigned size of the part name +* N-bytes alphanumeric part name +* 32-bit unsigned big-endian part ID +* N bytes part parameter data + +The *part name* identifies the type of the part. A part name with an +UPPERCASE letter is mandatory. Otherwise, the part is advisory. A +consumer should abort if it encounters a mandatory part it doesn't know +how to process. See the sections below for each defined part type. + +The *part ID* is a unique identifier within the bundle used to refer to a +specific part. It should be unique within the bundle2 payload. + +Part parameter data consists of: + +* 1 byte number of mandatory parameters +* 1 byte number of advisory parameters +* 2 * N bytes of sizes of parameter key and values +* N * M blobs of values for parameter key and values + +Following the 2 bytes of mandatory and advisory parameter counts are +2-tuples of bytes of the sizes of each parameter. e.g. +(<key size>, <value size>). + +Following that are the raw values, without padding. Mandatory parameters +come first, followed by advisory parameters. + +Each parameter's key MUST be unique within the part. + +Following the part parameter data is the part payload. The part payload +consists of a series of framed chunks. The frame header is a 32-bit +big-endian integer defining the size of the chunk. The N bytes of raw +payload data follows. + +The part payload consists of 0 or more chunks. + +A chunk with size ``0`` denotes the end of the part payload. Therefore, +there will always be at least 1 32-bit integer following the payload +part header. + +A chunk size of ``-1`` is used to signal an *interrupt*. If such a chunk +size is seen, the stream processor should process the next bytes as a new +payload part. After this payload part, processing of the original, +interrupted part should resume. + +Capabilities +============ + +Bundle2 is a dynamic format that can evolve over time. For example, +when a new repository data concept is invented, a new bundle2 part +is typically invented to hold that data. In addition, parts performing +similar functionality may come into existence if there is a better +mechanism for performing certain functionality. + +Because the bundle2 format evolves over time, peers need to understand +what bundle2 features the other can understand. The *capabilities* +mechanism is how those features are expressed. + +Bundle2 capabilities are logically expressed as a dictionary of +string key-value pairs where the keys are strings and the values +are lists of strings. + +Capabilities are encoded for exchange between peers. The encoded +capabilities blob consists of a newline (``\n``) delimited list of +entries. Each entry has the form ``<key>`` or ``<key>=<value>``, +depending if the capability has a value. + +The capability name is URL quoted (``%XX`` encoding of URL unsafe +characters). + +The value, if present, is formed by URL quoting each value in +the capability list and concatenating the result with a comma (``,``). + +For example, the capabilities ``novaluekey`` and ``listvaluekey`` +with values ``value 1`` and ``value 2``. This would be encoded as: + + listvaluekey=value%201,value%202\nnovaluekey + +The sections below detail the defined bundle2 capabilities. + +HG20 +---- + +Denotes that the peer supports the bundle2 data format. + +bookmarks +--------- + +Denotes that the peer supports the ``bookmarks`` part. + +Peers should not issue mandatory ``bookmarks`` parts unless this +capability is present. + +changegroup +----------- + +Denotes which versions of the *changegroup* format the peer can +receive. Values include ``01``, ``02``, and ``03``. + +The peer should not generate changegroup data for a version not +specified by this capability. + +checkheads +---------- + +Denotes which forms of heads checking the peer supports. + +If ``related`` is in the value, then the peer supports the ``check:heads`` +part and the peer is capable of detecting race conditions when applying +changelog data. + +digests +------- + +Denotes which hashing formats the peer supports. + +Values are names of hashing function. Values include ``md5``, ``sha1``, +and ``sha512``. + +error +----- + +Denotes which ``error:`` parts the peer supports. + +Value is a list of strings of ``error:`` part names. Valid values +include ``abort``, ``unsupportecontent``, ``pushraced``, and ``pushkey``. + +Peers should not issue an ``error:`` part unless the type of that +part is listed as supported by this capability. + +listkeys +-------- + +Denotes that the peer supports the ``listkeys`` part. + +hgtagsfnodes +------------ + +Denotes that the peer supports the ``hgtagsfnodes`` part. + +obsmarkers +---------- + +Denotes that the peer supports the ``obsmarker`` part and which versions +of the obsolescence data format it can receive. Values are strings like +``V<N>``. e.g. ``V1``. + +phases +------ + +Denotes that the peer supports the ``phases`` part. + +pushback +-------- + +Denotes that the peer supports sending/receiving bundle2 data in response +to a bundle2 request. + +This capability is typically used by servers that employ server-side +rewriting of pushed repository data. For example, a server may wish to +automatically rebase pushed changesets. When this capability is present, +the server can send a bundle2 response containing the rewritten changeset +data and the client will apply it. + +pushkey +------- + +Denotes that the peer supports the ``puskey`` part. + +remote-changegroup +------------------ + +Denotes that the peer supports the ``remote-changegroup`` part and +which protocols it can use to fetch remote changegroup data. + +Values are protocol names. e.g. ``http`` and ``https``. + +stream +------ + +Denotes that the peer supports ``stream*`` parts in order to support +*stream clone*. + +Values are which ``stream*`` parts the peer supports. ``v2`` denotes +support for the ``stream2`` part. + +Bundle2 Part Types +================== + +The sections below detail the various bundle2 part types. + +bookmarks +--------- + +The ``bookmarks`` part holds bookmarks information. + +This part has no parameters. + +The payload consists of entries defining bookmarks. Each entry consists of: + +* 20 bytes binary changeset node. +* 2 bytes big endian short defining bookmark name length. +* N bytes defining bookmark name. + +Receivers typically update bookmarks to match the state specified in +this part. + +changegroup +----------- + +The ``changegroup`` part contains *changegroup* data (changelog, manifestlog, +and filelog revision data). + +The following part parameters are defined for this part. + +version + Changegroup version string. e.g. ``01``, ``02``, and ``03``. This parameter + determines how to interpret the changegroup data within the part. + +nbchanges + The number of changesets in this changegroup. This parameter can be used + to aid in the display of progress bars, etc during part application. + +treemanifest + Whether the changegroup contains tree manifests. + +targetphase + The target phase of changesets in this part. Value is an integer of + the target phase. + +The payload of this part is raw changegroup data. See +:hg:`help internals.changegroups` for the format of changegroup data. + +check:bookmarks +--------------- + +The ``check:bookmarks`` part is inserted into a bundle as a means for the +receiver to validate that the sender's known state of bookmarks matches +the receiver's. + +This part has no parameters. + +The payload is a binary stream of bookmark data. Each entry in the stream +consists of: + +* 20 bytes binary node that bookmark is associated with +* 2 bytes unsigned short defining length of bookmark name +* N bytes containing the bookmark name + +If all bits in the node value are ``1``, then this signifies a missing +bookmark. + +When the receiver encounters this part, for each bookmark in the part +payload, it should validate that the current bookmark state matches +the specified state. If it doesn't, then the receiver should take +appropriate action. (In the case of pushes, this mismatch signifies +a race condition and the receiver should consider rejecting the push.) + +check:heads +----------- + +The ``check:heads`` part is a means to validate that the sender's state +of DAG heads matches the receiver's. + +This part has no parameters. + +The body of this part is an array of 20 byte binary nodes representing +changeset heads. + +Receivers should compare the set of heads defined in this part to the +current set of repo heads and take action if there is a mismatch in that +set. + +Note that this part applies to *all* heads in the repo. + +check:phases +------------ + +The ``check:phases`` part validates that the sender's state of phase +boundaries matches the receiver's. + +This part has no parameters. + +The payload consists of an array of 24 byte entries. Each entry is +a big endian 32-bit integer defining the phase integer and 20 byte +binary node value. + +For each changeset defined in this part, the receiver should validate +that its current phase matches the phase defined in this part. The +receiver should take appropriate action if a mismatch occurs. + +check:updated-heads +------------------- + +The ``check:updated-heads`` part validates that the sender's state of +DAG heads updated by this bundle matches the receiver's. + +This type is nearly identical to ``check:heads`` except the heads +in the payload are only a subset of heads in the repository. The +receiver should validate that all nodes specified by the sender are +branch heads and take appropriate action if not. + +error:abort +----------- + +The ``error:abort`` part conveys a fatal error. + +The following part parameters are defined: + +message + The string content of the error message. + +hint + Supplemental string giving a hint on how to fix the problem. + +error:pushkey +------------- + +The ``error:pushkey`` part conveys an error in the *pushkey* protocol. + +The following part parameters are defined: + +namespace + The pushkey domain that exhibited the error. + +key + The key whose update failed. + +new + The value we tried to set the key to. + +old + The old value of the key (as supplied by the client). + +ret + The integer result code for the pushkey request. + +in-reply-to + Part ID that triggered this error. + +This part is generated if there was an error applying *pushkey* data. +Pushkey data includes bookmarks, phases, and obsolescence markers. + +error:pushraced +--------------- + +The ``error:pushraced`` part conveys that an error occurred and +the likely cause is losing a race with another pusher. + +The following part parameters are defined: + +message + String error message. + +This part is typically emitted when a receiver examining ``check:*`` +parts encountered inconsistency between incoming state and local state. +The likely cause of that inconsistency is another repository change +operation (often another client performing an ``hg push``). + +error:unsupportedcontent +------------------------ + +The ``error:unsupportedcontent`` part conveys that a bundle2 receiver +encountered a part or content it was not able to handle. + +The following part parameters are defined: + +parttype + The name of the part that triggered this error. + +params + ``\0`` delimited list of parameters. + +hgtagsfnodes +------------ + +The ``hgtagsfnodes`` type defines file nodes for the ``.hgtags`` file +for various changesets. + +This part has no parameters. + +The payload is an array of pairs of 20 byte binary nodes. The first node +is a changeset node. The second node is the ``.hgtags`` file node. + +Resolving tags requires resolving the ``.hgtags`` file node for changesets. +On large repositories, this can be expensive. Repositories cache the +mapping of changeset to ``.hgtags`` file node on disk as a performance +optimization. This part allows that cached data to be transferred alongside +changeset data. + +Receivers should update their ``.hgtags`` cache file node mappings with +the incoming data. + +listkeys +-------- + +The ``listkeys`` part holds content for a *pushkey* namespace. + +The following part parameters are defined: + +namespace + The pushkey domain this data belongs to. + +The part payload contains a newline (``\n``) delimited list of +tab (``\t``) delimited key-value pairs defining entries in this pushkey +namespace. + +obsmarkers +---------- + +The ``obsmarkers`` part defines obsolescence markers. + +This part has no parameters. + +The payload consists of obsolescence markers using the on-disk markers +format. The first byte defines the version format. + +The receiver should apply the obsolescence markers defined in this +part. A ``reply:obsmarkers`` part should be sent to the sender, if possible. + +output +------ + +The ``output`` part is used to display output on the receiver. + +This part has no parameters. + +The payload consists of raw data to be printed on the receiver. + +phase-heads +----------- + +The ``phase-heads`` part defines phase boundaries. + +This part has no parameters. + +The payload consists of an array of 24 byte entries. Each entry is +a big endian 32-bit integer defining the phase integer and 20 byte +binary node value. + +pushkey +------- + +The ``pushkey`` part communicates an intent to perform a ``pushkey`` +request. + +The following part parameters are defined: + +namespace + The pushkey domain to operate on. + +key + The key within the pushkey namespace that is being changed. + +old + The old value for the key being changed. + +new + The new value for the key being changed. + +This part has no payload. + +The receiver should perform a pushkey operation as described by this +part's parameters. + +If the pushey operation fails, a ``reply:pushkey`` part should be sent +back to the sender, if possible. The ``in-reply-to`` part parameter +should reference the source part. + +pushvars +-------- + +The ``pushvars`` part defines environment variables that should be +set when processing this bundle2 payload. + +The part's advisory parameters define environment variables. + +There is no part payload. + +When received, part parameters are prefixed with ``USERVAR_`` and the +resulting variables are defined in the hooks context for the current +bundle2 application. This part provides a mechanism for senders to +inject extra state into the hook execution environment on the receiver. + +remote-changegroup +------------------ + +The ``remote-changegroup`` part defines an external location of a bundle +to apply. This part can be used by servers to serve pre-generated bundles +hosted at arbitrary URLs. + +The following part parameters are defined: + +url + The URL of the remote bundle. + +size + The size in bytes of the remote bundle. + +digests + A space separated list of the digest types provided in additional + part parameters. + +digest:<type> + The hexadecimal representation of the digest (hash) of the remote bundle. + +There is no payload for this part type. + +When encountered, clients should attempt to fetch the URL being advertised +and read and apply it as a bundle. + +The ``size`` and ``digest:<type>`` parameters should be used to validate +that the downloaded bundle matches what was advertised. If a mismatch occurs, +the client should abort. + +reply:changegroup +----------------- + +The ``reply:changegroup`` part conveys the results of application of a +``changegroup`` part. + +The following part parameters are defined: + +return + Integer return code from changegroup application. + +in-reply-to + Part ID of part this reply is in response to. + +reply:obsmarkers +---------------- + +The ``reply:obsmarkers`` part conveys the results of applying an +``obsmarkers`` part. + +The following part parameters are defined: + +new + The integer number of new markers that were applied. + +in-reply-to + The part ID that this part is in reply to. + +reply:pushkey +------------- + +The ``reply:pushkey`` part conveys the result of a *pushkey* operation. + +The following part parameters are defined: + +return + Integer result code from pushkey operation. + +in-reply-to + Part ID that triggered this pushkey operation. + +This part has no payload. + +replycaps +--------- + +The ``replycaps`` part notifies the receiver that a reply bundle should +be created. + +This part has no parameters. + +The payload consists of a bundle2 capabilities blob. + +stream2 +------- + +The ``stream2`` part contains *streaming clone* version 2 data. + +The following part parameters are defined: + +requirements + URL quoted repository requirements string. Requirements are delimited by a + command (``,``). + +filecount + The total number of files being transferred in the payload. + +bytecount + The total size of file content being transferred in the payload. + +The payload consists of raw stream clone version 2 data. + +The ``filecount`` and ``bytecount`` parameters can be used for progress and +reporting purposes. The values may not be exact. diff --git a/mercurial/help.py b/mercurial/help.py --- a/mercurial/help.py +++ b/mercurial/help.py @@ -197,6 +197,8 @@ return loader internalstable = sorted([ + (['bundle2'], _('Bundle2'), + loaddoc('bundle2', subdir='internals')), (['bundles'], _('Bundles'), loaddoc('bundles', subdir='internals')), (['censor'], _('Censor'), diff --git a/contrib/wix/help.wxs b/contrib/wix/help.wxs --- a/contrib/wix/help.wxs +++ b/contrib/wix/help.wxs @@ -40,6 +40,7 @@ <Directory Id="help.internaldir" Name="internals"> <Component Id="help.internals" Guid="$(var.help.internals.guid)" Win64='$(var.IsX64)'> + <File Id="internals.bundle2.txt" Name="bundle2.txt" /> <File Id="internals.bundles.txt" Name="bundles.txt" KeyPath="yes" /> <File Id="internals.censor.txt" Name="censor.txt" /> <File Id="internals.changegroups.txt" Name="changegroups.txt" /> To: indygreg, #hg-reviewers, yuja Cc: mercurial-devel _______________________________________________ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel