From: Gregory Szorc <gregory.sz...@gmail.com>
Date: Friday, November 11, 2016 at 5:20 AM
To: Stanislau Hlebik <st...@fb.com>
Cc: mercurial-devel <mercurial-devel@mercurial-scm.org>
Subject: Re: [PATCH 2 of 8 V7] bookmarks: introduce binary encoding

On Wed, Nov 2, 2016 at 7:06 AM, Stanislau Hlebik 
<st...@fb.com<mailto:st...@fb.com>> wrote:
# HG changeset patch
# User Stanislau Hlebik <st...@fb.com<mailto:st...@fb.com>>
# Date 1478016027 25200
#      Tue Nov 01 09:00:27 2016 -0700
# Branch stable
# Node ID f3da841e3d47ccbb0be3892b521607c09fdeab13
# Parent  a56a624a8a42b93ec980d2c284756a38719dffe6
bookmarks: introduce binary encoding

Bookmarks binary encoding will be used for `bookmarks` bundle2 part.
The format is: <4 bytes - bookmark size, big-endian><bookmark>
               <1 byte - 0 if node is empty 1 otherwise><20 bytes node>
BookmarksEncodeError and BookmarksDecodeError maybe thrown if input is
incorrect.

diff --git a/mercurial/bookmarks.py b/mercurial/bookmarks.py
--- a/mercurial/bookmarks.py
+++ b/mercurial/bookmarks.py
@@ -7,8 +7,10 @@

 from __future__ import absolute_import

+import StringIO

This module isn't available on Python 3. Use util.stringio or (preferably) 
io.BytesIO (since it assumes it is operating on bytes and raises if it sees 
unicode types).

 import errno
 import os
+import struct

 from .i18n import _
 from .node import (
@@ -23,6 +25,70 @@
     util,
 )

+_NONEMPTYNODE = struct.pack('?', False)
+_EMPTYNODE = struct.pack('?', True)

TIL "?" is a valid format character for struct!

+
+def _unpackbookmarksize(stream):
+    """Returns 0 if stream is empty.
+    """
+
+    expectedsize = struct.calcsize('>i')
+    encodedbookmarksize = stream.read(expectedsize)
+    if len(encodedbookmarksize) == 0:
+        return 0

if not encodedbookmarksize:

+    if len(encodedbookmarksize) != expectedsize:
+        raise ValueError(
+            _('cannot decode bookmark size: '
+              'expected size: %d, actual size: %d') %
+            (expectedsize, len(encodedbookmarksize)))
+    return struct.unpack('>i', encodedbookmarksize)[0]
+
+def encodebookmarks(bookmarks):
+    """Encodes bookmark to node mapping to the byte string.
+
+    Format: <4 bytes - bookmark size><bookmark>
+            <1 byte - 0 if node is empty 1 otherwise><20 bytes node>
+    Node may be 20 byte binary string, 40 byte hex string or empty
+    """

I could make the argument for grouping the data by all sizes+nodes then all the 
names. This allows nice byte aligned packing and unpacking and some 
opportunities for indexing. It may also yield slightly better compression as 
the bookmark names will all be next to each other (although you'd have to 
measure that because compression window sizes may save us). But I'm having 
difficulty convincing myself it is worth it. Of course, having this data layout 
would also prohibit streaming. I think it is worth mentioning the alternative 
but probably not worth switching.

+
+    for bookmark, node in bookmarks.iteritems():

You'll want to use a sorted() here or eventual tests for multiple bookmarks 
will be unstable.

+        yield struct.pack('>i', (len(bookmark)))
+        yield encoding.fromlocal(bookmark)
+        if node:
+            if len(node) != 20 and len(node) != 40:
+                raise ValueError(_('node must be 20 or bytes long'))
+            if len(node) == 40:
+                node = bin(node)

I think the API should accept a single node type: the 20 byte binary node.


I’m fine with accepting only 20 byte nodes, but I don’t understand why 
accepting 40 byte nodes is bad.
Can you please explain?



+            yield _NONEMPTYNODE
+            yield node
+        else:
+            yield _EMPTYNODE

I'm on the fence regarding this single byte flag. On one hand, being explicit 
is nice. On the other, a sequence of \x00 identifies the null or empty node in 
so many other places. What's your reasoning here?

I think I replied somewhere but this letter got lost probably.
I’m using empty node to show that bookmark is deleted.
Nullid can be used for the same purpose but then  it would be impossible to 
push/pull bookmarks that point to null commit
(I know this is a rare case, but it’s possible to do it now).

+
+def decodebookmarks(buf):
+    """Decodes buffer into bookmark to node mapping.
+
+    Node is either 20 bytes or empty.
+    """
+
+    stream = StringIO.StringIO(buf)
+    bookmarks = {}
+    bookmarksize = _unpackbookmarksize(stream)
+    while bookmarksize:
+        bookmark = stream.read(bookmarksize)
+        if len(bookmark) != bookmarksize:
+            raise ValueError(
+                _('cannot decode bookmark: expected size: %d, '
+                'actual size: %d') % (bookmarksize, len(bookmark)))
+        bookmark = encoding.tolocal(bookmark)
+        packedemptynodeflag = stream.read(struct.calcsize('?'))
+        emptynode = struct.unpack('?', packedemptynodeflag)[0]

Yes, `struct` does cache things. But I don't like relying on it inside a loop. 
Please create a `struct.Struct` instance outside the loop.

+        node = ''
+        if not emptynode:
+            node = stream.read(20)
+        bookmarks[bookmark] = node

Alternatively, we could yield key, value pairs. That would facilitate streaming 
and allow the part to adapt to storing multiple values for the same bookmark. 
(YAGNI applies so if multiple nodes per bookmark isn't something you think we 
would need in the immediate future, don't bother.) New bundle parts aren't that 
hard to invent.

+        bookmarksize = _unpackbookmarksize(stream)
+    return bookmarks
+
 def _getbkfile(repo):
     """Hook so that extensions that mess with the store can hook bm storage.

_______________________________________________
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org<mailto:Mercurial-devel@mercurial-scm.org>
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel<https://urldefense.proofpoint.com/v2/url?u=https-3A__www.mercurial-2Dscm.org_mailman_listinfo_mercurial-2Ddevel&d=DQMFaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=1EQ58Dmb5uX1qHujcsT1Mg&m=ScomB9LY6Dj69k_tV10POcPVOn0NOTawu4nKgHZXVcM&s=yCmp6Z-zHlSnEMRrNUI8m71rBxfXOAs3Lz5kehoqut8&e=>

_______________________________________________
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel

Reply via email to