[bitcoin-dev] Introducing a version field to BIP39 mnemonic phrases

2024-01-11 Thread Leslie via bitcoin-dev

BIP:
Layer: Applications
Title: Versioned BIP39 Mnemonic Phrases
Author: Leslie <0300dbd...@protonmail.com>
Status: None
Type: Standards Track
Created: 2024-01-10


## Abstract

This BIP proposes an enhancement to the BIP39 mnemonic phrases by introducing a 
version field.
The version field will be a 32-bit field, prepended to the entropy of the BIP39 
mnemonic phrase.
The first 24 bits are for general purposes, and the subsequent 8 bits are for 
defining the version used.

## Motivation
The current implementation of BIP39 mnemonic phrases lacks a crucial feature: 
versioning.
This omission has been identified as a significant design flaw, affecting the 
robustness and future-proofness of the mnemonic phrase generation and usage.
Notable community members and projects have expressed concerns over this 
shortcoming:

>The lack of versioning is a serious design flaw in this proposal. On this 
>basis alone I would recommend against use of this proposal.

\- [Greg Maxwell 
2017-03-14](https://github.com/bitcoin/bips/wiki/Comments:BIP-0039/fd2ddb6d840c6a91c98a29146b9a62d6a65d03bf)

Furthermore, the absence of a version number in BIP39 seed phrases poses risks 
and inefficiencies in wallet software development and backward compatibility:

>BIP39 seed phrases do not include a version number. This means that software 
>should always know how to generate keys and addresses. BIP43 suggests that 
>wallet software will try various existing derivation schemes within the BIP32 
>framework. This is extremely inefficient and rests on the assumption that 
>future wallets will support all previously accepted derivation methods. If, in 
>the future, a wallet developer decides not to implement a particular 
>derivation method because it is deprecated, then the software will not be able 
>to detect that the corresponding seed phrases are not supported, and it will 
>return an empty wallet instead. This threatens users funds.
>
>For these reasons, Electrum does not generate BIP39 seeds.

\- [Electrum Documentation 
2017-01-27](https://electrum.readthedocs.io/en/latest/seedphrase.html#motivation)

The proposed BIP aims to address these concerns by introducing a version field 
in the BIP39 mnemonic phrases.
The introduction of versioning is expected to enhance the mnemonic's 
adaptability to future changes, improve the efficiency of wallet software in 
handling different derivation methods, and secure users' funds by reducing the 
risk of incompatibilities between mnemonic phrases and wallet implementations.

## Generating the Mnemonic

In this proposal, we build upon the structure of BIP39 to include a versioned 
enhancement in the mnemonic generation process. The mnemonic encodes entropy, 
as in BIP39, but with a flexible approach to the size of the initial entropy 
(ENT).

### Version Field Inclusion:

1. **Initial Entropy Generation:**
The initial entropy of ENT bits is generated, where ENT can be any size as long 
as it is a multiple of 32 bits.

2. **Version Field Prepending:**
A crucial addition to this process is the prepending of a 32-bit version field 
to the initial entropy. This field is composed of:
- The first 24 bits are reserved for general purposes, which can be utilized 
for various enhancements or specific wallet functionalities.
- The remaining 8 bits are designated for specifying the version of the BIP39 
standard.

3. **Checksum Calculation:**
A checksum is generated following the BIP39 method: taking the first 
(ENT + VF) / 32 bits of the SHA256 hash of the combined entropy (the 32-bit 
version field followed by the initial entropy), where VF = 32 is the size of 
the version field in bits. This checksum is then appended to the combined 
entropy.

4. **Concatenation and Splitting:**
The combined entropy, including the initial entropy, version field, and 
checksum, is split into groups of 11 bits. Each group of bits corresponds to an 
index in the BIP39 wordlist.

5. **Mnemonic Sentence Formation:**
The mnemonic sentence is formed by converting these 11-bit groups into words 
using the standard BIP39 wordlist.
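
The steps above can be sketched in Python as follows. This is only an illustration under stated assumptions, not a reference implementation: the function names `versioned_indices` and `read_version_field` are hypothetical, the version field is assumed to be serialized big-endian (24 reserved bits, then 8 version bits), and the sketch stops at the 11-bit word indices rather than looking them up in the wordlist.

```python
import hashlib

VF_BITS = 32  # size of the version field, per the proposal


def versioned_indices(entropy: bytes, version: int, reserved: int = 0) -> list[int]:
    """Return the 11-bit word indices for a versioned mnemonic (sketch)."""
    if not entropy or len(entropy) % 4 != 0:
        raise ValueError("ENT must be a non-zero multiple of 32 bits")
    if not 0 <= version < 1 << 8:
        raise ValueError("version must fit in 8 bits")
    if not 0 <= reserved < 1 << 24:
        raise ValueError("reserved field must fit in 24 bits")

    # Assumption: 24 reserved bits first, then 8 version bits, big-endian,
    # prepended to the initial entropy.
    version_field = ((reserved << 8) | version).to_bytes(VF_BITS // 8, "big")
    combined = version_field + entropy

    combined_bits = len(combined) * 8
    checksum_bits = combined_bits // 32  # (ENT + VF) / 32
    if checksum_bits > 256:
        raise ValueError("entropy too large for a SHA256-based checksum")

    digest = hashlib.sha256(combined).digest()
    checksum = int.from_bytes(digest, "big") >> (256 - checksum_bits)

    # Append the checksum, then split into 11-bit groups, most significant first.
    value = (int.from_bytes(combined, "big") << checksum_bits) | checksum
    n_words = (combined_bits + checksum_bits) // 11
    return [(value >> (11 * (n_words - 1 - i))) & 0x7FF for i in range(n_words)]


def read_version_field(indices: list[int]) -> tuple[int, int]:
    """Recover (reserved, version) from word indices, verifying the checksum."""
    total_bits = 11 * len(indices)
    if not indices or total_bits % 33 != 0:
        raise ValueError("unexpected number of words")
    checksum_bits = total_bits // 33
    if checksum_bits > 256:
        raise ValueError("mnemonic too long for a SHA256-based checksum")
    combined_bits = total_bits - checksum_bits

    value = 0
    for index in indices:
        value = (value << 11) | index

    combined = (value >> checksum_bits).to_bytes(combined_bits // 8, "big")
    digest = hashlib.sha256(combined).digest()
    expected = int.from_bytes(digest, "big") >> (256 - checksum_bits)
    if value & ((1 << checksum_bits) - 1) != expected:
        raise ValueError("checksum mismatch")

    field = int.from_bytes(combined[:4], "big")
    return field >> 8, field & 0xFF
```

With 128 bits of initial entropy the combined entropy is 160 bits and the checksum is 5 bits, giving 165 bits or 15 words; with 256 bits of initial entropy the result is 27 words.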

## Compatibility Considerations

- **Backward Compatibility:** Systems designed for BIP39, unaware of the 32-bit 
extension, will interpret the mnemonic as a 'Legacy' BIP39 phrase.
- **Forward Compatibility:** The versioning mechanism prepares systems for 
future modifications to the BIP39 standard, facilitating seamless integration.

## Dictionary Dependency

Wallets will still require access to the predefined BIP39 dictionary to 
retrieve the version of the mnemonic seed and validate the checksum.

> 💡 It's noteworthy that the BIP39 English wordlist includes specific words 
> that software can use to identify the mnemonic's version number in a 
> user-friendly manner, reducing dependence on the wordlist for version 
> recognition.
>
> One way to achieve this is by assigning the first 22
> bits of the reserved field to match these words.
>
> 0010110 101 : version zero
> 0010110 10011010101 : version one
> 0010110 11101011101 : version two
> 00

Re: [bitcoin-dev] Compressed Bitcoin Transactions

2024-01-11 Thread Tom Briar via bitcoin-dev
Hi,

After reviewing all the feedback and writing a reference implementation, I have 
linked the updated schema and a Draft PR for a reference Implementation to 
Bitcoin Core.

The major changes are:

Removing the grinding of the nLocktime in favor of a relative block height, 
which all of the Compressed Inputs use.
Adding a second kind of Variable Integer.


Compressed Transaction Schema:

compressed_transactions.md

Reference Impl/Draft PR:

https://github.com/bitcoin/bitcoin/pull/29134

Thanks-
Tom.

=== begin compressed_transactions.md ===

# Compressed Transaction Schema
By (Tom Briar) and (Andrew Poelstra)

## 1. Abstract

With this Transaction Compression Schema we use several methods to compress
transactions, including dropping data and recovering it on decompression by
grinding until we obtain valid signatures.

The bulk of our size savings come from replacing the prevout of each input by a
block height and index. This requires the decompression to have access to the
blockchain, and also means that compression is ineffective for transactions
that spend unconfirmed or insufficiently confirmed outputs.

Even without compression, Taproot keyspends are very small: as witness data they
include only a single 64/65-byte signature and do not repeat the public key or
any other metadata. By using pubkey recovery, we obtain Taproot-like compression
for legacy and Segwit transactions.

The main applications for this schema are steganography, satellite/radio
broadcast, and other low-bandwidth channels with high CPU availability on
decompression. We assume users have some ability to shape their transactions
to improve their compressibility, and therefore give special treatment to
certain transaction forms.

This schema is easily reversible except for compressing the Txid/Vout input
pairs (Method 4). Compressing the input Txid/Vout is optional, and without it
the schema still achieves 50% of the total compression. This allows for the
additional use case of P2P communication.

## 2. Methods

The four main methods for achieving a smaller transaction size are:

1. packing transaction metadata before the transaction and each of its inputs
and outputs to determine the structure of the following data.
2. replacing 32-bit numeric values with either variable-length integers
(VarInts) or compact-integers (CompactSizes).
3. using compressed signatures and public key recovery upon decompression.
4. replacing the 36-byte txid/vout pair with a blockheight and output index.

Method 4 makes the compressed transaction impossible to decompress if a block
reorg occurs at or before the block it's included in. Therefore, we only
compress the Txid if the transaction input is at least one hundred blocks old.
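
As a rough illustration of that rule, the check might look like the sketch below. The function name `can_compress_prevout` and the constant `MIN_PREVOUT_AGE` are hypothetical, and "blocks old" is assumed here to mean the chain tip height minus the height of the block that confirmed the spent output; the reference implementation may count this differently.

```python
from typing import Optional

MIN_PREVOUT_AGE = 100  # "at least one hundred blocks old"


def can_compress_prevout(tip_height: int, prevout_height: Optional[int]) -> bool:
    """Decide whether a txid/vout pair may be replaced by height and index."""
    if prevout_height is None:
        # Unconfirmed output: there is no block height to reference.
        return False
    # Assumption: age is measured as the distance from the current chain tip.
    return tip_height - prevout_height >= MIN_PREVOUT_AGE
```

Inputs that fail this check would keep their full 36-byte txid/vout pair.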


## 3. Schema

### 3.1 Primitives

| Name | Width | Description |
|------|-------|-------------|
| CompactSize | 1-5 Bytes | For 0-253, encode the value directly in one byte. For 254-65535, encode 254 followed by 2 little-endian bytes. For 65536-(2^32-1), encode 255 followed by 4 little-endian bytes. |
| CompactSize flag | 2 Bits | 1, 2 or 3 indicate literal values. 0 indicates that the value will be encoded in a later CompactInt. |
| VarInt | 1+ Bytes | 7-bit little-endian encoding, with each 7-bit word encoded in a byte. The highest bit of each byte is 1 if more bytes follow, and 0 for the last byte. |
| VLP-Bytestream | 2+ Bytes | A VarInt Length Prefixed Bytestream. Has a VarInt prefixed to determine the length. |
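
The primitives in this table can be sketched in Python as follows. The function names are hypothetical and the code follows only the descriptions given in the table; note that the CompactSize thresholds here are the ones this schema states, which differ from the CompactSize used in Bitcoin's P2P serialization.

```python
def encode_compact_size(n: int) -> bytes:
    """CompactSize as described in the table above (1, 3 or 5 bytes)."""
    if not 0 <= n <= 0xFFFFFFFF:
        raise ValueError("CompactSize holds values from 0 to 2^32-1")
    if n <= 253:
        return bytes([n])
    if n <= 0xFFFF:
        return b"\xfe" + n.to_bytes(2, "little")
    return b"\xff" + n.to_bytes(4, "little")


def encode_varint(n: int) -> bytes:
    """7-bit little-endian groups; high bit set on every byte except the last."""
    if n < 0:
        raise ValueError("VarInt holds non-negative values")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)  # last byte
            return bytes(out)


def decode_varint(data: bytes, offset: int = 0) -> tuple[int, int]:
    """Return (value, next_offset) for a VarInt starting at data[offset]."""
    value = 0
    shift = 0
    pos = offset
    while True:
        if pos >= len(data):
            raise ValueError("truncated VarInt")
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return value, pos


def encode_vlp_bytestream(payload: bytes) -> bytes:
    """VarInt length prefix followed by the raw bytes."""
    return encode_varint(len(payload)) + payload
```

For example, `encode_varint(300)` yields the two bytes `ac 02`, and `encode_compact_size(254)` yields `fe fe 00`.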

### 3.2 General Schema

| Name | Width | Description |
|------|-------|-------------|
| Transaction Metadata | 1 Byte | Information on the structure of the transaction. See Section 3.3. |
| Version | 0-5 Bytes | An optional CompactSize containing the transaction's version. |
| Input Count | 0-5 Bytes | An optional CompactSize containing the transaction's input count. |
| Output Count | 0-5 Bytes | An optional CompactSize containing the transaction's output count. |
| LockTime | 0-5 Bytes | An optional CompactSize containing the transaction's LockTime if it is non-zero. |
| Minimum Blockheight | 1-5 Bytes | A VarInt containing the Minimum Blockheight, relative to which the transaction locktime and input blockheights are given as offsets. |
| Input Metadata+Output Metadata | 1+ Bytes | An encoding containing metadata on all the inputs and then all the outputs of the transaction. For each input see Section 3.4, for each output see Section 3.5. |
| Input Data | 66+ Bytes | See Section 3.6 for each input. |
| Output Data | 3+ Bytes | See Section 3.7 for each output. |

For the four CompactSize listed above we could use a more compact bit encoding 
for these but they are already a fall back for the bit encoding of the 
Transa