Branch: refs/heads/master
  Home:   https://github.com/btcsuite/btcd
  Commit: f68cd7422dd5d0e0d6002647305c1fd663aee77d
      
https://github.com/btcsuite/btcd/commit/f68cd7422dd5d0e0d6002647305c1fd663aee77d
  Author: Dave Collins <[email protected]>
  Date:   2016-06-03 (Fri, 03 Jun 2016)

  Changed paths:
    M wire/common.go
    M wire/msgtx.go
    M wire/netaddress.go

  Log Message:
  -----------
  wire: Reduce allocs with a binary free list.

This introduces a new binary free list which provides a concurrent safe
list of unused buffers for the purpose of serializing and deserializing
primitive integers to their raw binary bytes.

For convenience, the type also provides functions for each of the
primitive unsigned integers that automatically obtain a buffer from the
free list, perform the necessary binary conversion, read from or write
to the given io.Reader or io.Writer, and return the buffer to the free
list.

A global instance of the type has been introduced with a maximum number
of 1024 items. Since each buffer is 8 bytes, it will consume a maximum
of 8KB.  Theoretically, this value would only allow up to 1024 peers
simultaneously reading and writing without having to resort to burdening
the garbage collector with additional allocations.  However, due to the
fact the code is designed in such a way that the buffers are quickly
used and returned to the free list, in practice it can support much more
than 1024 peers without involving the garbage collector since it is
highly unlikely every peer would need a buffer at the exact same time.

The following is a before and after comparison of the allocations
with the benchmarks that did not change removed:

benchmark              old allocs     new allocs     delta
-------------------------------------------------------------
WriteVarInt1           1              0              -100.00%
WriteVarInt3           1              0              -100.00%
WriteVarInt5           1              0              -100.00%
WriteVarInt9           1              0              -100.00%
ReadVarInt1            1              0              -100.00%
ReadVarInt3            1              0              -100.00%
ReadVarInt5            1              0              -100.00%
ReadVarInt9            1              0              -100.00%
ReadVarStr4            3              2              -33.33%
ReadVarStr10           3              2              -33.33%
WriteVarStr4           2              1              -50.00%
WriteVarStr10          2              1              -50.00%
ReadOutPoint           1              0              -100.00%
WriteOutPoint          1              0              -100.00%
ReadTxOut              3              1              -66.67%
WriteTxOut             2              0              -100.00%
ReadTxIn               5              2              -60.00%
WriteTxIn              3              0              -100.00%
DeserializeTxSmall     15             7              -53.33%
DeserializeTxLarge     33428          16715          -50.00%
SerializeTx            8              0              -100.00%
ReadBlockHeader        7              1              -85.71%
WriteBlockHeader       10             4              -60.00%
DecodeGetHeaders       1004           501            -50.10%
DecodeHeaders          18002          4001           -77.77%
DecodeGetBlocks        1004           501            -50.10%
DecodeAddr             9002           4001           -55.55%
DecodeInv              150005         50003          -66.67%
DecodeNotFound         150004         50002          -66.67%
DecodeMerkleBlock      222            108            -51.35%
TxSha                  10             2              -80.00%


  Commit: 5de5b7354ca458d6e7677d6b4629214d3f871b59
      
https://github.com/btcsuite/btcd/commit/5de5b7354ca458d6e7677d6b4629214d3f871b59
  Author: Dave Collins <[email protected]>
  Date:   2016-06-03 (Fri, 03 Jun 2016)

  Changed paths:
    M wire/blockheader.go
    M wire/common.go
    M wire/msgversion.go
    M wire/netaddress.go

  Log Message:
  -----------
  wire: Avoid allocation on timestamp decodes.

Since the protocol encodes timestamps differently depending on the
message, the code currently decodes into a local variable and then
converts it to a time.Time.  However, this causes an allocation due to
the local having to escape to the heap in order for the readElement
function to write to it.

So, in order to avoid that, this introduces two new types for a
timestamp named uint32Time and int64Time that are encoded as the
respective type on the read.  When calling the readElements function,
the time.Time field in the message is cast to a pointer of the
appropriate type which effectively allows the allocations to be avoided.

The following is a before and after comparison of the allocations
with the benchmarks that did not change removed:

benchmark              old allocs     new allocs     delta
----------------------------------------------------------------------
ReadBlockHeader        1              0              -100.00%
DecodeHeaders          4001           2001           -49.99%
DecodeAddr             4001           3001           -24.99%
DecodeMerkleBlock      108            107            -0.93%


  Commit: 2adfb3b56acd280e84451e94dd0c06203eef9832
      
https://github.com/btcsuite/btcd/commit/2adfb3b56acd280e84451e94dd0c06203eef9832
  Author: Dave Collins <[email protected]>
  Date:   2016-06-03 (Fri, 03 Jun 2016)

  Changed paths:
    M wire/msgaddr.go
    M wire/msggetblocks.go
    M wire/msggetdata.go
    M wire/msggetheaders.go
    M wire/msgheaders.go
    M wire/msginv.go
    M wire/msgmerkleblock.go
    M wire/msgnotfound.go
    M wire/msgtx.go

  Log Message:
  -----------
  wire: Reduce allocs with contiguous slices.

The current code involves a ton of small allocations which is harsh on
the garbage collector and in turn causes a lot of addition runtime
overhead both in terms of additional memory and processing time.

In order to improve the situation, this drasticially reduces the number
of allocations by creating contiguous slices of objects and
deserializing into them.  Since the final data structures consist of
slices of pointers to the objects, they are constructed by pointing them
into the appropriate offset of the contiguous slice.

This could be improved upon even further by converting all of the data
structures provided the wire package to be slices of contiguous objects
directly, however that would be a major breaking API change and would
end up requiring updating a lot more code in every caller.  I do think
that ultimately the API should be changed, but the changes in this
commit already makes a massive difference and it doesn't require
touching any of the callers, so it is a good place to begin.

The following is a before and after comparison of the allocations
with the benchmarks that did not change removed:

benchmark              old allocs     new allocs     delta
-----------------------------------------------------------
DeserializeTxLarge     16715          11146          -33.32%
DecodeGetHeaders       501            2              -99.60%
DecodeHeaders          2001           2              -99.90%
DecodeGetBlocks        501            2              -99.60%
DecodeAddr             3001           2002           -33.29%
DecodeInv              50003          3              -99.99%
DecodeNotFound         50002          3              -99.99%
DecodeMerkleBlock      107            3              -97.20%


  Commit: 6229e3583505a82d4514b1efa86f910b78693825
      
https://github.com/btcsuite/btcd/commit/6229e3583505a82d4514b1efa86f910b78693825
  Author: Dave Collins <[email protected]>
  Date:   2016-06-03 (Fri, 03 Jun 2016)

  Changed paths:
    M wire/bench_test.go
    M wire/msgtx.go

  Log Message:
  -----------
  wire: Further reduce transaction allocs.

This commit drastically reduces the number of allocations needed to
deserialize a transaction and its scripts by using the combination of a
free list for initially deserializing the individual scripts along with
copying them into a single contiguous byte slice after the final size is
known and modifying each script in the transaction to point to its
location within the contiguous blob.

The end result is only a single allocation that holds all of the scripts
for a transaction regardless of the total number of scripts it has.

The script free list allows a maximum of 12,500 items with each buffer
being 512 bytes.  This implies it will have a peak usage of 6.1MB.  The
values were chosen based on profiling data and a desire to allow at
least 100 scripts per transaction to be simultaneously deserialized by
125 peers.

Also, while optimizing, decode directly into the existing previous
outpoint structure of each transaction input in order to avoid the extra
allocation per input that is otherwise caused when the local escapes to
the heap.

The following is a before and after comparison of the allocations
with the benchmarks that did not change removed:

benchmark              old allocs     new allocs     delta
-----------------------------------------------------------
ReadTxOut              1              0              -100.00%
ReadTxIn               2              0              -100.00%
DeserializeTxSmall     7              5              -28.57%
DeserializeTxLarge     11146          6              -99.95%


Compare: https://github.com/btcsuite/btcd/compare/dc83f4ee6a12...6229e3583505

Reply via email to