This is an automated email from the ASF dual-hosted git repository.

git-site-role pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/bookkeeper.git


The following commit(s) were added to refs/heads/asf-site by this push:
     new 8edff21  Updated site at revision fe50b8c
8edff21 is described below

commit 8edff214f5309138e9c2c793dd73c5d7ae8fbdea
Author: jenkins <[email protected]>
AuthorDate: Sat Nov 17 17:41:06 2018 +0000

    Updated site at revision fe50b8c
---
 .../bps/BP-34-cluster-metadata-checker/index.html  | 62 ++++++++++++++++++++--
 1 file changed, 59 insertions(+), 3 deletions(-)

diff --git a/content/bps/BP-34-cluster-metadata-checker/index.html 
b/content/bps/BP-34-cluster-metadata-checker/index.html
index 0b9dfab..28ffcfe 100644
--- a/content/bps/BP-34-cluster-metadata-checker/index.html
+++ b/content/bps/BP-34-cluster-metadata-checker/index.html
@@ -298,13 +298,69 @@
 message GetListOfEntriesOfALedgerResponse {
        required StatusCode status = 1;
        required int64 ledgerId = 2;
-       optional bytes availabilityOfEntriesOfLedger = 3; // treated as array 
of bits indicating availability of entry at that particular index location      
          
+       optional bytes availabilityOfEntriesOfLedger = 3; // explained below
 }
 </code></pre></div></div>
 
-<p>For the sake of future extensibility it would be helpful to add version 
info (and possibly some metadata in the future) at the beginning of the 
‘availabilityOfEntriesOfLedger’. So the first 64 bytes will be considered 
reserved space, with the first byte specifying the version for now and the rest 
of the bytes in the reserved space will be 0’s.</p>
+<p>For ‘availabilityOfEntriesOfLedger’ we can use the following condensed 
encoding format, which reduces the number of bytes needed to represent the 
list of entries.</p>
 
-<p>Here Bookie is expected to attain this information just from just 
LedgerCache (Index Files) - IndexPersistenceMgr and IndexInMemPageMgr but it 
doesn’t actually check the availability of this entry in Entrylogger. Since the 
intention of this checker is limited to do just metadata validation at cluster 
level.</p>
+<p>(Note: the following representation is for illustration only; it is not a 
protobuf (or any other) definition.)</p>
+
+<div class="highlighter-rouge"><div class="highlight"><pre 
class="highlight"><code>AvailabilityOfEntriesOfLedger {
+       OrderedCollection sequenceGroup;
+}
+
+SequenceGroup {
+       long firstSequenceStart;
+       long lastSequenceStart;
+       int sequenceSize;
+       int sequencePeriod;
+}
+</code></pre></div></div>
+
+<p>Nomenclature:</p>
+
+<ul>
+  <li>Consecutive entries are grouped as a ‘Sequence’.</li>
+  <li>The number of consecutive entries in a ‘Sequence’ is called 
‘sequenceSize’.</li>
+  <li>The distance between the first entries of consecutive sequences is called 
‘sequencePeriod’; it is 0 when a SequenceGroup contains a single sequence.</li>
+  <li>Consecutive sequences with the same sequenceSize and the same 
sequencePeriod are grouped as a SequenceGroup.</li>
+  <li>‘firstSequenceStart’ is the first entry in the first sequence of the 
SequenceGroup.</li>
+  <li>‘lastSequenceStart’ is the first entry in the last sequence of the 
SequenceGroup.</li>
+  <li>An ordered collection of such SequenceGroups represents the entries of a 
ledger residing in a bookie.</li>
+</ul>
+
+<p>(In the best case there will be only one SequenceGroup, whose 
‘sequencePeriod’ is the ensembleSize and whose ‘sequenceSize’ is the 
writeQuorumSize of the ledger.)</p>
+
+<p>For example:</p>
+
+<p>Example 1 (best case scenario):</p>
+
+<p>1, 2, 4, 5, 7, 8, 10, 11</p>
+
+<p>In this case (1, 2), (4, 5), (7, 8), (10, 11) are the sequences, and they 
form just one SequenceGroup, which can be represented as</p>
+
+<p>{ firstSequenceStart = 1, lastSequenceStart = 10, sequenceSize = 2, 
sequencePeriod = 3 }</p>
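To make the meaning of the four fields concrete, here is a minimal Java sketch that expands a single SequenceGroup back into the entry ids it covers. The class and method names are hypothetical and are not part of BookKeeper; the sketch assumes a sequencePeriod of 0 marks a group that holds a single sequence.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only, not a BookKeeper class.
class SequenceGroupExpander {

    // Lists every entry id covered by one SequenceGroup.
    static List<Long> expand(long firstSequenceStart, long lastSequenceStart,
                             int sequenceSize, int sequencePeriod) {
        List<Long> entries = new ArrayList<>();
        long start = firstSequenceStart;
        while (true) {
            // Each sequence is sequenceSize consecutive entries from its start.
            for (int i = 0; i < sequenceSize; i++) {
                entries.add(start + i);
            }
            // Assumption: a period of 0 means the group has a single sequence.
            if (sequencePeriod <= 0 || start >= lastSequenceStart) {
                break;
            }
            start += sequencePeriod;
        }
        return entries;
    }

    public static void main(String[] args) {
        // The group from example 1.
        System.out.println(expand(1, 10, 2, 3)); // [1, 2, 4, 5, 7, 8, 10, 11]
    }
}
```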
+
+<p>example 2 (an entry is missing):</p>
+
+<p>1, 2, 3, 6, 7, 8, 11, 13, 16, 17, 18, 21, 22</p>
+
+<p>(Entry 12 is missing, and the last sequence has only 2 entries: 21, 22.) 
+In this case (1, 2, 3), (6, 7, 8), (11), (13), (16, 17, 18), (21, 22) are the 
sequences, 
+so the sequence groups are</p>
+
+<p>{ firstSequenceStart = 1, lastSequenceStart = 6, sequenceSize = 3, 
sequencePeriod = 5 }, { firstSequenceStart = 11, lastSequenceStart = 13, 
sequenceSize = 1, sequencePeriod = 2 }, { firstSequenceStart = 16, 
lastSequenceStart = 16, sequenceSize = 3, sequencePeriod = 0 }, { 
firstSequenceStart = 21, lastSequenceStart = 21, sequenceSize = 2, 
sequencePeriod = 0 }</p>
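The grouping used in the two examples can be sketched as a single greedy pass over the sorted entry ids. This is only an illustration under stated assumptions: the class names are hypothetical, the input is assumed to be strictly increasing, and a greedy pass is not claimed to be the encoder the proposal intends or to always produce the fewest groups.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the grouping rules, not BookKeeper code.
class SequenceGroupEncoder {

    static final class SequenceGroup {
        final long firstSequenceStart;
        long lastSequenceStart;
        final int sequenceSize;
        int sequencePeriod; // stays 0 while the group holds one sequence

        SequenceGroup(long start, int size) {
            this.firstSequenceStart = start;
            this.lastSequenceStart = start;
            this.sequenceSize = size;
        }

        @Override
        public String toString() {
            return "{" + firstSequenceStart + ", " + lastSequenceStart + ", "
                    + sequenceSize + ", " + sequencePeriod + "}";
        }
    }

    // Assumes sortedEntries is strictly increasing (no duplicates).
    static List<SequenceGroup> encode(long[] sortedEntries) {
        List<SequenceGroup> groups = new ArrayList<>();
        SequenceGroup current = null;
        int i = 0;
        while (i < sortedEntries.length) {
            // Find the run of consecutive entries that starts at index i.
            int j = i;
            while (j + 1 < sortedEntries.length
                    && sortedEntries[j + 1] == sortedEntries[j] + 1) {
                j++;
            }
            long start = sortedEntries[i];
            int size = j - i + 1;
            boolean extended = false;
            if (current != null && current.sequenceSize == size) {
                long period = start - current.lastSequenceStart;
                // Join the group if it has no period yet or the period matches.
                if (current.sequencePeriod == 0 || current.sequencePeriod == period) {
                    current.sequencePeriod = (int) period; // assumes it fits in an int
                    current.lastSequenceStart = start;
                    extended = true;
                }
            }
            if (!extended) {
                current = new SequenceGroup(start, size);
                groups.add(current);
            }
            i = j + 1;
        }
        return groups;
    }

    public static void main(String[] args) {
        // Example 1: prints [{1, 10, 2, 3}]
        System.out.println(encode(new long[] {1, 2, 4, 5, 7, 8, 10, 11}));
        // Example 2: prints [{1, 6, 3, 5}, {11, 13, 1, 2}, {16, 16, 3, 0}, {21, 21, 2, 0}]
        System.out.println(encode(new long[] {1, 2, 3, 6, 7, 8, 11, 13, 16, 17, 18, 21, 22}));
    }
}
```

Each printed group lists firstSequenceStart, lastSequenceStart, sequenceSize and sequencePeriod in that order, so the output can be compared directly with the groups written out in the two examples.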
+
+<p>Representing a SequenceGroup takes two long values and two int values, so 
each SequenceGroup occupies 2 * 8 + 2 * 4 = 24 bytes.</p>
+
+<p>In the ‘availabilityOfEntriesOfLedger’ byte array, it would be helpful for 
future extensibility to reserve space for metadata at the beginning. So the 
first 64 bytes will be used for metadata: the first four bytes specify the int 
version number, the next four bytes specify the number of entries for now, and 
the rest of the bytes in the reserved space will be 0’s. The encoded format 
follows the first 64 bytes. The ordered 
collection of Sequence [...]
+
+<p>So for a ledger having thousands of entries, this condensed encoding would 
need only one or two SequenceGroups (24 or 48 bytes) in the best case, with no 
holes and no overreplication. That is much less than what a bit vector (an 
array of bits indicating the availability of the entry at each index location) 
would need.</p>
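The size comparison can be made concrete with a few lines of arithmetic. The Java sketch below uses hypothetical names and assumes the condensed payload is the 64 reserved bytes followed by 24 bytes per SequenceGroup, while the bit vector is counted without any header, at one bit per entry id from 0 up to the last entry id.

```java
// Hypothetical size estimate for illustration only, not BookKeeper code.
class EncodingSizeEstimate {

    static final int RESERVED_HEADER_BYTES = 64;
    // Two longs and two ints per SequenceGroup: 2 * 8 + 2 * 4 = 24 bytes.
    static final int SEQUENCE_GROUP_BYTES = 2 * Long.BYTES + 2 * Integer.BYTES;

    // Assumed layout: reserved header followed by the SequenceGroups.
    static long condensedSize(int numSequenceGroups) {
        return RESERVED_HEADER_BYTES + (long) SEQUENCE_GROUP_BYTES * numSequenceGroups;
    }

    // One bit per entry id in [0, lastEntryId], rounded up to whole bytes.
    static long bitVectorSize(long lastEntryId) {
        return (lastEntryId + 1 + 7) / 8;
    }

    public static void main(String[] args) {
        long lastEntryId = 99_999; // a ledger with 100,000 entries
        System.out.println(condensedSize(1));           // 88
        System.out.println(condensedSize(2));           // 112
        System.out.println(bitVectorSize(lastEntryId)); // 12500
    }
}
```

Under these assumptions the condensed form stays near 100 bytes however long the ledger grows, as long as the number of SequenceGroups stays small, whereas the bit vector grows linearly with the last entry id.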
+
+<p>Any encoded format needs an encoder and a decoder at the sending and 
receiving ends of the channel, so the encoding/decoding logic should be 
efficient in both computation and memory.</p>
+
+<p>Here the Bookie is expected to obtain just the index information (say from 
LedgerCache (Index Files) - IndexPersistenceMgr and IndexInMemPageMgr, from 
unflushed entries in EntryMemTable in the SortedLedgerStorage case, and from the 
rocksdb database in the DBLedgerStorage case); it doesn’t actually check the 
availability of each entry in Entrylogger, since the intention of this checker 
is limited to metadata validation at the cluster level.</p>
 
 <h3 id="compatibility-deprecation-and-migration-plan">Compatibility, 
Deprecation, and Migration Plan</h3>
 
