Hi List,

At the end of June we got together in NYC for the annual specification
meeting. This time around we made an attempt at taking transcript-style
notes, which are available here:
https://docs.google.com/document/d/1MZhAH82YLEXWz4bTnSQcdTQ03FpH4JpukK9Pm7V02bk/edit?usp=sharing
To decrease our dependence on my google drive I've also included the
full set of notes at the end of this email (no promises about the
formatting however).

We made a semi-successful attempt at recording larger group topics, so
these notes roughly follow the structure of the discussions that we had
at the summit (rather than being a summary). Speakers are not
attributed, and any mistakes are my own.

Thanks to everyone who traveled far, to Wolf for hosting us in style in
NYC, and to Michael Levin for helping out with notes <3

# LN Summit - NYC 2023

## Day One

### Package Relay
- The current proposal for package relay is ancestor package relay:
  - One child can have up to 24 ancestors.
  - Right now, we only score mempool transactions by ancestry anyway, so
there isn’t much point in other types of packages.
- For base package relay, commitment transactions will still need to have
the minimum relay fee.
  - No batch bumping is allowed, because it can open up pinning attacks.
  - With one anchor, we can package RBF.
- Once we have package relay, it will be easier to get things into the
mempool.
- Once we have V3 transactions, we can drop minimum relay fees because we
are restricted to one child pays for one parent transaction:
  - The size of these transactions is limited.
  - You can’t arbitrarily attach junk to pin them.
- If we want to get rid of 330 sat anchors, we will need ephemeral anchors:
  - If there is an OP_TRUE output, it can be any value including zero.
  - It must be spent by a child in the same package.
  - Only one child can spend the anchor (because it’s one output).
  - The parent must be zero fee because we never want it in a block on its
own, or relayed without the child.
  - If the child is evicted, we really want the parent to be evicted as
well (there are some odd edge cases at the bottom of the mempool, so zero
ensures that we’ll definitely be evicted).
- The bigger change is with HTLCs:
  - With SIGHASH_ANYONECANPAY, your counterparty can inflate the size of
your transaction - eg: HTLC success, anyone can attach junk.
  - How much do we want to change here?
- So, we can get to zero fee commitment transactions and one (ephemeral)
anchor per transaction.
  - With zero fee commitments, where do we put trimmed HTLCs?
     - You can just drop them in an OP_TRUE output, and reasonably expect
the miner to take it.
- In a commitment with to_local and to_remote and an ephemeral anchor that
must be spent, you can drop the 1 block CSV in the transaction that does
not have a revocation path (ie, you can drop it for to_remote).
  - Any spend of this transaction must spend the one anchor in the same
block.
  - No other output is eligible to have a tx attached to it, so we don’t
need the delay anymore.
     - Theoretically, your counterparty could get hold of your signed local
copy, but then you can just RBF.
- Since these will be V3 transactions, the size of the child must be small
so you can’t pin it.
  - Both parties can RBF the child spending the ephemeral anchor.
  - This isn’t specifically tailored to lightning, it’s a more general
concept.
  - Child transactions of V3 are implicitly RBF, so we won’t have to worry
about it not being replaceable (no inheritance bug).
- In general, when we’re changing the mempool policy we want to make the
minimal relaxation that allows the best improvement.
- We need “top of block” mempool / cluster mempool:
  - The mempool can have clusters of transactions (parents/children
arranged in various topologies) and the whole mempool could even be a
single cluster (think a trellis of parents and children “zigzagging”).
  - The mining algorithm will pick one “vertical” using ancestor fee rate.
  - There are some situations where you add a single transaction and it
would completely change the order in which our current selection picks
things.
  - Cluster mempool groups transactions into “clusters” that make this
easier to sort and reason about.
  - It can be expensive (which introduces a risk of denial of service), but
if we have limits on cluster sizes then we can limit this.
  - This is the only way to get package RBF.
- How far along is all of this?
  - BIP331: the P2P part that allows different package types is moving
along and implementation is happening. This is done in a way that would
allow us to add different types of packages in future if we need them.
     - There are some improvements being made to core’s orphan pool,
because we need to make sure that peers can’t knock orphans out of your
pool that you may later need to retrieve as package ancestors.
     - We reserve some spots with tokens, and if you have a token we keep
some space for you.
  - V3 transactions: implemented on top of package relay, since they don’t
really make sense without it. This is an opt-in regime where you make
things easier to RBF.
  - Ephemeral Anchors: on top of package relay and V3.
  - Cluster Mempool: This is further out, but there’s been some progress.
Right now people are working on the linearization algorithm.
- If these changes don’t suit lightning’s use case, now is the time to
speak because it’s all being worked on.
  - In what way does it not fit LN, as it’s currently designed?
     - With the V3 paradigm, to stop all pinning vectors, HTLC transactions
will need to get anchors and you will have to drop ANYONECANPAY (to use
anchors instead).
     - Could we fix this with additional restrictions on V3 (or a V4)?
     - You could do something where V4 means that you can have no ancestors
and no descendants (in the mempool).
     - V3 restricts the child, you could also restrict the parent further.
     - You could have a heuristic on the number of inputs and outputs, but
anyone can pay or add inputs.
     - You could commit to the whole thing being some size by setting a
series of bits in sequence to mark maximum size but that would involve
running script (which is annoying).
     - The history of V3 was to allow a parent of any size because
commitment transactions can be any size.
- What about keeping HTLCs the way they are today?
  - A lot of other things are less pinnable, maybe that’s ok? It’s a step
forward.
  - The long term solution is to change relay policy to top of mempool. We
shouldn’t be doing things until the long term solution is clear, and we
shouldn’t change the protocol in a way that doesn’t fit with the long term.
  - We already do a lot of arbitrary things, if we were starting from day
one we wouldn’t do HTLCs with anchors (it’s too much bloat), and being
unable to RBF is more bloat because you overpay.
  - If the remote commitment is broadcast, you can spend the HTLC with V2.
You can’t add a V3 child (it won’t get in the mempool).
  - If you want to be consistent, you need a presigned transaction even
when it’s the remote commit; we should just use it as designed today.
- What is the “top of mempool” assumption?
  - If large transactions are at the top of your mempool, you (a miner)
want transactions with higher fee rates to increase your total revenue.
  - Today, you can’t really easily answer or reason about these questions.
  - We want to accept transactions in our mempool in a denial of service
resistant way that will strictly increase our miner fees.
  - If a transaction is at the top of your mempool, you may be more willing
to accept a replacement fee. If it’s way down in the mempool, you would
probably replace more slowly.
  - It’s conceivable that we could get here, but it’s years off.
     - Once we have this, pinning only happens with large transactions at
the bottom of the mempool. If you just slow roll these transactions, you
can just relay what you’ve got.
     - If somebody comes and replaces the bottom with something small
that’ll be mined in the next two blocks, you clearly want to accept that.
- Does cluster mempool fix rule 3?
  - No, but it helps.
  - Today when you get a transaction you don’t know which block it’s going
in - we don’t have a preference ordering.
  - Miners don’t make blocks as they go - with cluster mempool you can make
very fast block templates.
  - When talking about relay, you can ignore block making and just get a
very good estimate.
- The question is: when do we jump?
  - Wait until V3? Package Relay?
     - Package relay will help. You still negotiate fees but they don’t
matter as much.
     - We’d like to kill update fee and have a magic number for fees, can we
do that when we get package relay?
     - Package relay is the hard part, V3 should be relatively easier after
that.
     - When we get V3 we can drop to zero fee.
- Is there a future where miners don’t care about policy at all?
  - Say, the block that I’m mining has V3 transactions. They’re just
maximizing fees so they’ll accept anything out of band, ignoring these new
policy rules.
  - Accelerators are already starting to emerge today - how are they doing
it?
  - We’re carving out a small space in which mempool rules work, and it’s
okay to work around it.
  - If a miner mines it, great - that’s what we want. The problem is not
getting mined (ie pinning).
- Figuring this out is not simple:
  - Today there are times where we accept replacements when we should
reject them and times where we reject replacements when we should accept
them.
  - There can be situations where miners will mine things that aren’t
incentive compatible (ie, not the best block template).
- What if somebody pays out of band not to mine?
- Ephemeral anchors are interesting, they never enter the UTXO set - if the
child gets evicted, you get evicted.
  - The parent transaction being zero fee is optimal; we want to be sure
it’ll be evicted (there are some mempool quirks).
  - It must be zero fee so that it will be evicted.
- Should we add trimmed HTLCs to the ephemeral anchor?
  - We don’t want to get above min relay fee because then we could hang
around in the mempool.
  - The eltoo implementation currently does this.
  - You can’t keep things in OP_TRUE because they’ll be taken.
  - You can also just put it in fees as before.
- More on cluster mempools:
  - It can be used for block selection, but it’s currently focused on
mempool selection.
  - You can simulate template selection with it.
  - When you’re mining, you have to pick from an ancestor downwards.
  - First we linearize the transactions to flatten the structure
intelligently (by topology and fee rate).
  - Then we figure out the best ordering of this flat structure.
  - A cluster is made up of chunks of transactions, and each chunk is less
than 75 transactions.
  - If you would include a transaction with the ancestor, it goes in the
same chunk. Otherwise it goes in the next chunk (see the chunking sketch at
the end of this section).
  - Miners select the highest fee rate chunks, lower fee rate ones can
safely be evicted.
  - For replacement, you can just check replacement for a single cluster,
rechunk it and then resort the chunks.
     - It must beat the fee rate of the chunks to get in.
     - You can check how much it beats the chunk by, whether it would go in
the next block(s) and then decide.
  - Chunk ordering takes into account transaction size, beyond 25
transactions we just do ancestor set feerate.
  - One of the limits is going away, one is sticking around. We still have
sibling eviction issues.
  - Is one of the issues with chunks that they’re limited by transaction
weight, 101 kilo-v-bytes, which is the maximum package size?
     - We should be okay, these limits are higher than present.
     - You can bound chunk size pretty easily.
  - If we get chunk fee rate replacement then you can do batch fee bumping
(eg, a bunch of ephemeral anchors that are all batched together).
- Are there long term policy implications for privacy for ephemeral anchors?
  - You’re essentially opting into transaction sponsors?
  - If everyone uses V3 eventually there’s no issue.
  - For right now, it would be nice if V3 is only for unilateral closes.
  - It’s also useful in a custodial wallet setting, where you have one team
creating on chain transactions and the other attaching fees. This is a
common accounting headache. People can also non-interactively fee bump.
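
To make the chunking idea above concrete, here is a rough sketch in Python.
This is an illustration only, not the actual Bitcoin Core implementation,
and the fee/size numbers are made up: walk the linearized cluster and merge
a transaction into the previous chunk whenever doing so raises that chunk's
feerate, so chunk feerates end up non-increasing. Miners can then take
chunks from the top, and eviction works from the bottom.

```python
# Toy chunking of a linearized cluster: merge a transaction into the
# previous chunk while that increases the previous chunk's feerate, so the
# resulting chunk feerates are monotonically non-increasing.
def chunk(linearized):
    """linearized: list of (fee_sats, size_vbytes) in linearization order."""
    chunks = []  # each chunk is [total_fee, total_size]
    for fee, size in linearized:
        chunks.append([fee, size])
        # Merge while the tail chunk has a higher feerate than the one
        # before it (compare fee/size without floating point).
        while len(chunks) > 1 and (
            chunks[-1][0] * chunks[-2][1] > chunks[-2][0] * chunks[-1][1]
        ):
            fee2, size2 = chunks.pop()
            chunks[-1][0] += fee2
            chunks[-1][1] += size2
    return chunks

# A zero-fee parent and its high-fee child collapse into one chunk, so the
# pair is evaluated together - the CPFP intuition behind ephemeral anchors.
print(chunk([(0, 200), (3000, 150), (100, 400)]))  # [[3000, 350], [100, 400]]
```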

### Taproot
- Spec is still in draft right now, and the code is a bit ahead of the test
vectors.
- The biggest change that has been made is around anchors, which become
more complicated with taproot:
  - The revocation path on to_local takes the script path, so now we need
to reveal data for the anchor.
  - The downside is that revocation is more expensive - 32 more bytes in
the control block.
- The to_remote has a NUMS point, previously we just had multisig keys:
  - If you’re doing a rescan, you won’t know those keys.
  - Now you just need to know the NUMS point and you can always rescan.
  - The NUMS point itself is pretty verifiable, you just start with a
string and hash it.
  - It is constant, you just use it randomized with your key.
- The internal pubkey is constant on to_remote.
- Co-op close [broke into several different discussions]:
  - The idea here is to remove negotiation, since we’ve had disagreement
issues in the past.
  - In this version, the initiator just accepts the fee.
  - Do we want negotiation?
     - In the past we’ve had bugs with fee rate estimates that won’t budge.
  - Why don’t we just pick our fees and send two sets of signatures?
     - This would no longer be symmetric.
     - What if nobody wants to close first? We’d need to work through the
game theory.
     - The person who wants their funds has an incentive.
  - Why is RBF so hard for co-op close?
     - Closing should be marked as RBF, there’s no reason not to.
     - Just pick your fee rate, pay it and then come back and RBF if you’d
like. You can bump/broadcast as much as you like.
     - If we’re going to have something like that where we iterate, why
don’t we just do the simple version where we pick a fee and sign?
     - If you have to pay the whole fee, you have less incentive to sign.
  - Why is this linked to taproot work?
     - It needs to change anyway, and we need to add nonces.
  - What about, whoever wants to close sends a fee rate (paying the fees)
and the responder just sends a signature?
     - If you don’t have enough balance, you can’t close. But why do you
care anyway, you have no funds?
     - We can do this as many times as we want.
  - Shutdown is still useful to clear the air on the channel.
  - When you reconnect, you start a new interaction completely.
  - TL;DR:
     - Shutdown message stays.
     - You send a signature with a fee rate.
     - The remote party signs it.
     - If they disagree, you do it again.
     - It’s all RBF-able.
     - You must retransmit shutdown, and you must respond with a shutdown.
     - You can send nonces at any time:
     - In revoke and ack.
     - On channel re-establish.
- For taproot/musig2 we need nonces:
  - Today we store the commitment signature from the remote party. We don’t
need to store our own signature - we can sign at time of broadcast.
  - To be able to sign you need the verification nonce - you could remember
it, or you could use a counter:
     - Counter based:
     - We re-use shachain and then just use it to generate nonces.
      - Start with a seed, derive from that, use it to generate nonces (see
the sketch at the end of this section).
     - This way you don’t need to remember state, since it can always be
generated from what you already have.
     - Why is this safe?
     - We never re-use nonces.
     - The remote party never sees your partial signature.
     - The message always stays the same (the dangerous re-use case is
using the same nonce for different messages).
     - If we used the same nonce for different messages we could leak our
key.
     - You can combine the sighash + nonce to make it unique - this also
binds more.
     - Remote party will only see the full signature on chain, never your
partial one.
  - Each party has sign and verify nonces, 4 total.
  - Co-op close only has 2 because it’s symmetric.
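
A minimal sketch of the counter-based nonce idea above, assuming the
verification nonce is re-derived from a per-channel seed, the commitment
number and the message rather than stored. The derivation and names here
are assumptions for illustration; a real implementation must follow the
MuSig2 (BIP-327) nonce rules, and the safety argument is the one given in
the notes: the message never changes and the partial signature is never
revealed to the remote party.

```python
import hashlib
import hmac

def derive_secret_nonces(seed: bytes, commitment_index: int, sighash: bytes):
    """Re-derive the two MuSig2 secret nonces for one signing session from
    a per-channel seed, so no per-commitment nonce state needs storing.
    Illustrative only: never reuse a nonce for a different message."""
    nonces = []
    for i in (0, 1):  # MuSig2 uses two nonces per signer per session
        data = commitment_index.to_bytes(8, "big") + bytes([i]) + sighash
        nonces.append(hmac.new(seed, data, hashlib.sha256).digest())
    return nonces

# Usage: re-derive the same nonces at signing time instead of remembering
# them (the seed itself could come from something shachain-like).
seed = hashlib.sha256(b"per-channel nonce seed (example only)").digest()
k1, k2 = derive_secret_nonces(seed, commitment_index=7, sighash=b"\x00" * 32)
```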

### Gossip V1.5 vs V2
- How much do we care about script binding?
  - If it’s loose, it can be any script - you can advertise any UTXO.
  - You reveal less information, just providing a full signature with the
full taproot public key.
  - If it’s tight, you have to provide two keys and then use the BIP 86
tweak to check that it’s a 2-of-2 multisig.
- Should we fully bind to the script, or just allow any taproot output?
  - Don’t see why we’d want additional overhead.
  - Any taproot output can be a channel - let people experiment.
  - We shouldn’t have cared in the first place, so it doesn’t matter what
it’s bound to.
  - It’s just there for anti-DOS, just need to prove that you can sign.
- Let every taproot output be a lightning channel, amen.
- We’re going to decouple:
  - You still need a UTXO but it doesn’t matter what it looks like.
  - This also allows other channel types in future.
  - We send:
     - UTXO: unspent and in the UTXO set
     - Two node pubkeys
     - One signature
- How much do we care about amount binding?
  - Today it is exact.
     - People use it for capacity graphs.
     - Graph go up.
     - We can watch the chain for spends when we know which UTXO to watch
per-channel.
  - Is there an impact on pathfinding if we over-advertize?
     - We use capacity to pathfind.
     - What’s the worst case if people lie? We don’t use them.
  - If we’ve already agreed that this can be a UTXO that isn’t a channel,
then it shouldn’t matter.
  - If you allow value magnification, we can use a single UTXO to claim for
multiple channels. Even in the most naive version (say 5x), you’re only
revealing 20% of your UTXOs.
  - How much leverage can we allow? The only limit is denial of service.
  - There’s the potential for a market for UTXOs.
- There’s a privacy trade-off:
  - If you one-to-one map them, then there’s no privacy gain.
  - Do we know that you get substantial privacy?
     - Even if you have two UTXOs and two channels, those UTXOs are now not
linked (because you can just use the first one to advertise).
     - This is only assuming that somebody implements it/ infrastructure is
built out.
     - People could create more elaborate things over time, even if
implementations do the “dumb” way.
- Gossip 1.5 (ie, with amount binding) fits in the current flow, V2 (ie,
without amount binding) has a very different scope.
  - It’s a big step, and you don’t truly know until you implement it.
  - What about things like: a very large node and a very small node, whose
announced UTXO do you use?
- We decided not to put UTXOs in node announcement, so we’d put it in
channel announcement:
  - Sometimes there’s a UTXO, sometimes there isn’t.
  - You look at a node’s previous channels to see if they still have
“quota”.
  - If you don’t have “quota” left, you have to include a signature TLV.
- With the goal of publicly announcing taproot channels, 1.5 gets us there
and is a much smaller code change.
- We’ve talked a lot about capacity for pathfinding, but we haven’t really
touched on control valves like max HTLC:
  - Currently we don’t use these valves to tune our pathfinding, people
don’t use it.
  - If we get better here, we won’t need capacity.
  - This value is already < 50% anyway.
- If we don’t un-bind amounts now, when will we do it?
  - It’s always a lower priority and everyone is busy.
  - If we allow overcommitting by some factor now, it’s not unrealistic
that it will allow some degree of privacy.
  - Between these features, we have opened the door to leasing UTXOs:
     - Before we do more over-commitment, let’s see if anybody uses it?
- We add channel capacity to the channel announcement with a feature bit:
  - If we turn the feature off, we are one-to-one mapped.
  - But a node can’t use the upgraded version until everyone is upgraded?
     - Our current “get everyone upgraded” cycle is 18 months (or a CVE).
     - If you’re upgrading to 2x multiplier, nobody on 1x will accept that
gossip.
     - People will not pay for privacy (via lost revenue of people not
seeing their gossip).
     - This is additive to defeating chain analysis.
     - Private UTXO management is already complicated, we don’t know the
ideal that we’re working towards.
- What about if we just set it to 2 today?
  - Is 2 qualitatively better than 1 (without script binding) today?
  - Will a marketplace magically emerge if we allow over-commitment?
- We don’t know the implications of setting a global multiplier for routing
or denial of service, and we don’t have a clear view of what privacy would
look like (other than “some” improvement).
- We agree that adding a multiplier doesn’t break the network.
  - People with a lower value will see a subnet when we upgrade.
- We’re going to go with gossip “1.75”:
  - Bind to amount but not script.
  - We include a TLV cut out that paves the way to overcommitment.
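
To make the "1.75" outcome concrete, a hypothetical announcement could
carry roughly the fields below. The names and types are illustrative
assumptions, not the proposed wire format; the point is that the amount is
still bound, the script is not, and a TLV is reserved for future
overcommitment.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ChannelAnnouncement175:
    """Illustrative only: bound to an amount but not to a specific script."""
    outpoint: str            # any unspent taproot UTXO, e.g. "txid:vout"
    capacity_sat: int        # amount binding stays (useful for pathfinding)
    node_id_1: bytes         # 33-byte pubkey of the first node
    node_id_2: bytes         # 33-byte pubkey of the second node
    signature: bytes         # one signature proving control of the UTXO
    overcommit_tlv: Optional[bytes] = None  # reserved: paves the way to
                                            # advertising more than 1x later
```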

### Multi-Sig Channel Parties
- There are a few paths we could take to get multi-sig for one channel
party:
  - Script: just do it on the script level for the UTXO, but it’s heavy
handed.
  - FROSTy: you end up having to do a bunch of things around fault
tolerance which require a more intense setup. You also may not want the
shachain to be known by all of the parties in the setup (we have an ugly
solution for this, we think).
  - Recursive musig:
- Context: you have one key in the party, but you actually want it to be
multiple keys under the hood. You don’t want any single party to know the
revocation secrets, so you have to each have a part and combine them.
- Ugliest solution: just create distinct values and store them.
- Less ugly solution uses multiple shachains (see the sketch at the end of
this section):
  - Right now we have a shachain, and we reveal two leaves of it.
  - You have 8 shachains, and you XOR them all together.
      - Why do we need 8? That’ll serve a 5-of-7.
      - Maybe we need 21? 7 choose 5, we’re not sure.
  - How does this work with K-of-N?
     - Each party has a piece, and you can always combine them in different
combinations (of K pieces) to get to the secret you’re using.
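
A tiny sketch of the XOR idea above, assuming each party contributes one
per-commitment secret (e.g. from its own shachain) and the combined
revocation secret is their XOR, so no single party knows it alone. This is
illustration only; exactly how many chains a k-of-n setup needs was left
open in the discussion.

```python
import os
from functools import reduce

def combine_revocation_secret(per_party_secrets):
    """XOR together each party's per-commitment secret so that no single
    party knows the combined revocation secret on its own."""
    assert per_party_secrets and all(len(s) == 32 for s in per_party_secrets)
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)),
                  per_party_secrets)

# Example with three parties; in practice each secret would be a leaf of
# that party's shachain for the commitment being revoked.
secrets = [os.urandom(32) for _ in range(3)]
combined = combine_revocation_secret(secrets)
```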

### PTLCs
- We can do PTLCs in two ways:
  - Regular musig
  - Adaptor signatures
- Do they work with trampoline as defined today?
  - The sender picks all the blinding factors today, so it’s fine.
- There’s a paper called: splitting locally while routing
interdimensionally:
  - You can let intermediate nodes do splitting because they know all the
local values.
  - They can generate new blinding factors and split out to the next node.
  - Adaptor signatures could possibly be combined for PTLCs that get fanned
out then combined.
- There are a few options for redundant overpayment (ie, “stuckless”
payments):
  - Boomerang:
     - Preimages are coefficients of a polynomial, and you commit to the
polynomial itself.
     - If you have a degree P polynomial and you take P+1 shares, then you
can claim in the other direction.
     - Quite complex.
     - You have to agree on the number of splits in advance.
  - Spear:
     - H2TLC: there are two payment hashes per-HTLC, one is from the sender
and one is from the invoice (see the sketch at the end of this section).
      - When you send a payment, the sender only reveals the right number of
sender preimages.
     - This also gives us HTLC acknowledgement, which is nice.
     - You can concurrently split, and then spray and pray.
     - Interaction is required to get preimages.
- Do we want to add redundant overpayment with PTLCs?
  - We’ll introduce a communication requirement.
  - Do we need to decide that before we do PTLCs?
     - Can we go for the simplest possible option first and then add it?
     - For spear, yes - the intermediate nodes don’t know that it’s two
hashes.
     - We could build them out without thinking about overpayment, and then
mix in a sender secret so that we can claim a subset.
  - We’ll have more round trips, but you also have the ability to ACK HTLCs
that have arrived.
  - Spray and pray uses the same amount as you would with our current “send
/ wait / send”, it’s just done concurrently rather than serially.
- Is it a problem that HTLCs that aren’t settled don’t pay fees?
  - You’re paying the fastest routes.
  - Even if it’s random, you still get a constant factor of what we should
have gotten otherwise.
- It makes sense to use onion messages if they’re available to us.
- Are we getting payment acknowledgement? Seems so!
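
A toy model of the Spear/H2TLC idea above (names and structure are
assumptions, not a proposed contract): each shard can only be claimed with
both the invoice preimage and a per-shard sender preimage, so the sender
can over-send shards and release sender preimages only for the amount it
actually wants claimed.

```python
import hashlib
import os

def h(x: bytes) -> bytes:
    return hashlib.sha256(x).digest()

# Recipient side: the invoice commits to one payment hash as usual.
invoice_preimage = os.urandom(32)
invoice_hash = h(invoice_preimage)

# Sender side: one extra preimage per shard, and shards are over-sent.
sender_preimages = [os.urandom(32) for _ in range(4)]  # 4 shards sent
shard_locks = [(invoice_hash, h(sp)) for sp in sender_preimages]

def can_claim(lock, invoice_pre, sender_pre):
    """A shard is claimable only with both preimages."""
    return h(invoice_pre) == lock[0] and h(sender_pre) == lock[1]

# If only 3 shards are needed, the sender reveals 3 sender preimages; the
# 4th shard can never be claimed even though the recipient holds the
# invoice preimage.
revealed = sender_preimages[:3] + [b"\x00" * 32]
print([can_claim(lock, invoice_preimage, sp)
       for lock, sp in zip(shard_locks, revealed)])  # [True, True, True, False]
```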

## Day Two

### Hybrid Approach to Channel Jamming
- We’ve been talking about jamming for 8 years, back and forth on the
mailing list.
- We’d like to find a way to move forward so that we can get something
done.
- Generally when we think about jamming, there are three “classes” of
mitigations:
  - Monetary: unconditional fees, implemented in various ways.
  - Reputation: locally assessed (global is terrible)
  - Scarce Resources: POW, stake, tokens.
- The problem is that none of these solutions work in isolation.
  - Monetary: the cost that will deter an attacker is unreasonable for an
honest user, and the cost that is reasonable for an honest user is too low
for an attacker.
  - Reputation: any system needs to define some threshold that is
considered good behavior, and an attacker can aim to fall just under it.
Eg: if you need a payment to resolve in 1 minute, you can fall just under
that bar.
  - Scarce resources: like with monetary, pricing doesn’t work out.
     - Paper: proof of work doesn’t work for email spam, as an example.
     - Since scarce resources can be purchased, they could be considered a
subset of monetary.
- There is no silver bullet for jamming mitigation.
- Combination of unconditional fees and reputation:
  - Good behavior grants access to more resources, bad behavior loses it.
  - If you want to fall just below that threshold, we close the gap with
unconditional fees.
- Looking at these three classes implemented in isolation, are there any
unresolved questions that people have - “what about this”?
  - Doesn’t POW get you around the cold start problem in reputation, where
you can put money in to quickly bootstrap?
     - Since POW can be rented, it’s essentially a monetary solution - just
extra steps.
     - We run into the same pricing issues.
- Why these combinations?
  - Since scarce resources are essentially monetary, we think that
unconditional fees are the simplest possible monetary solution.
- Unconditional Fees:
  - As a sender, you’re building a route and losing money if it doesn’t go
through?
     - Yes, but they only need to be trivially small compared to success
case fee budgets (see the worked example at the end of this section).
     - You can also eventually succeed so long as you retry enough, even if
failure rates are very high.
  - How do you know that these fees will be small? The market could decide
otherwise.
     - Routing nodes still need to be competitive. If you put an
unconditional fee of 100x the success case, senders will choose to not send
through you because you have no incentive to forward.
     - We could also add an in-protocol limit or sender-side advisory.
  - With unconditional fees, a fast jamming attack is very clearly paid for.
- Reputation:
  - The easiest way to jam somebody today is to send a bunch of HTLCs
through them and hold them for two weeks. We’re focusing on reputation to
begin with, because in this case we can quite easily identify that people
are doing something wrong (at the extremes).
  - If you have a reputation score that blows up on failed attempts,
doesn’t that fix it without upfront fees?
     - We have to allow some natural rate of failure in the network.
     - An attacker can still aim to fall just below that failure threshold
and go through multiple channels to attack an individual channel.
     - There isn’t any way to set a bar that an attacker can’t fall just
beneath.
     - Isn’t this the same for reputation? We have a suggestion for
reputation but all of them fail because they can be gamed below the bar.
  - If reputation matches the regular operation of nodes on the network,
you will naturally build reputation up over time.
     - If we do not match reputation accumulation to what normal nodes do,
then an attacker can take some other action to get more reputation than the
rest of the network. We don’t want attackers to be able to get ahead of
regular nodes.
     - Let’s say you get one point for success and one for failure, a
normal node will always have bad reputation. An attacker could then send 1
sat payments all day long, pay a fee for it and gain reputation.
- Can you define jamming? Is it stuck HTLCs or a lot of 1 sat HTLCs
spamming up your DB?
  - Jamming is holding HTLCs, or streaming constant failed HTLCs, to
prevent a channel from operating.
  - This can be achieved with slots or liquidity.
- Does the system still work if users are playing with reputation?
  - In the steady state, it doesn’t really matter whether a node has a good
reputation or not.
  - If users start to set reputation in a way that doesn’t reflect normal
operation of the network, it will only affect their ability to route when
under attack.
- Isn’t reputation monetary as well, as you can buy a whole node?
  - There is a connection, and yes in the extreme case you can buy an
entire identity.
  - Even if you do this, the resource bucketing doesn’t give you a “golden
ticket” to consume all slots/liquidity with good reputation, so you’re
still limited in what you can do.
- Can we learn anything from research elsewhere / the way things are done
on the internet?
  - A lot of our confidence that these solutions don’t work in isolation is
based on previous work looking at spam on the internet.
  - Lightning is also unique because it is a monetary network - we have
money built in, so we have different tools to use.
- To me, it seems like if the scarce resource that we’re trying to allocate
is HTLC slots and we use upfront fees, you can pay me upfront fees for the
worst case (say two weeks) and then if it settles in 5 seconds you give it
back?
  - The dream solution is to only pay for the amount of time that a HTLC is
held in flight.
  - The problem here is that there’s no way to prove time when things go
wrong, and any solution without a universal clock will fall back on
cooperation which breaks down in the case of an attack.
  - No honest user will be willing to pay the price for the worst case,
which gets us back to the pricing issue.
  - There’s also an incentives issue when the “rent” we pay for these two
weeks worst case is more than the forwarding fee, so a router may be
incentivized to just hang on to that amount and bank it.
  - We’ve talked about forwards and backwards fees extensively on the
mailing list:
     - They’re not large enough to be enforceable, so somebody always has
to give the money back off chain.
     - This means that we rely on cooperation for this refund.
     - The complexity of this type of system is very high, and we start to
open up new “non-cooperation” concerns - can we be attacked using this
mechanism itself?
     - Doesn’t an attacker need to be directly connected to you to steal in
the non-cooperative case?
     - At the end of the day, somebody ends up getting robbed when we can’t
pull the money from the source (attacker).
- Does everybody feel resolved on the statement that we need to take this
hybrid approach to clamp down on jamming? Are there any “what about
solution X” questions left for anyone? Nothing came up.
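
As a back-of-the-envelope illustration of the pricing argument above (all
numbers are invented for illustration, not proposed values): an honest
sender who retries a few times pays a negligible premium, while an attacker
streaming failing HTLCs all day pays for every single one.

```python
# Illustrative numbers only; nothing here is a proposed fee schedule.
success_fee_msat = 1_000      # fee paid on a successful forward
unconditional_fee_msat = 10   # small fee paid whether or not it succeeds

# Honest sender: even with a 50% failure rate the expected number of
# attempts is 2, so unconditional fees add ~2% on top of the success fee.
expected_attempts = 1 / 0.5
honest_overhead = expected_attempts * unconditional_fee_msat
print(honest_overhead / success_fee_msat)  # 0.02

# Attacker: streaming one failing HTLC per slot per minute for a day now
# has an explicit, unavoidable cost instead of being free.
slots, minutes_per_day = 483, 60 * 24
attacker_cost_msat = slots * minutes_per_day * unconditional_fee_msat
print(attacker_cost_msat)  # 6955200 msat per channel per day
```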

### Reputation for Channel Jamming
- Resource bucketing allows us to limit the number of slots and amount of
liquidity that are available for nodes that do not have good reputation.
  - No reputation system is perfect, and we will always have nodes that
have low-to-no activity, or are new to the network that we can’t form
reputation scores for.
  - It would be a terrible outcome for lightning to just drop these HTLCs,
so we reserve some portion of resources for them.
- We have two buckets: protected and general (split 50/50 for the purposes
of explanation, but we’ll find more intelligent numbers with further
research):
  - In the normal operation of the network, it doesn’t matter if you get
into the protected slots. When everyone is using the network as usual,
things clear out quickly so the general bucket won’t fill up.
  - When the network comes under attack, an attacker will fill up slots and
liquidity in the general bucket. When this happens, only nodes with good
reputation will be able to use the protected slots, other HTLCs will be
dropped.
  - During an attack, nodes that don’t have a good reputation will
experience lower quality of service - we’ll gradually degrade.
- What do you mean by the steady state?
  - Nobody is doing anything malicious, payments are clearing out as usual
- not sitting on the channel using all 483 slots.
- We decide which bucket the HTLC goes into using two signals:
  - Reputation: whether the upstream node had good reputation with our
local node.
  - Endorsement: whether the upstream node has indicated that the HTLC is
expected to be honest (0 if uncertain, 1 if expected to be honest).
  - If reputation && endorsement, then we’ll allow the HTLC into protected
slots and forward the HTLC on with endorsed=1.
  - We need reputation to add a local viewpoint to this endorsement signal
- otherwise we can just trivially be jammed if we just copy what the
incoming peer said.
  - We need endorsement to be able to propagate this signal over multiple
hops - once it drops, it’s dropped for good.
  - There’s a privacy question for when senders set endorsed:
     - You can flip a coin or set the endorsed field for your payments at
the same proportion as you endorse forwards.
- We think about reputation in terms of the maximum amount of damage that
can be done by abusing it:
  - Longest CLTV that we allow in the future from current height is 2016
blocks (~2 weeks): this is the longest that we can be slow jammed.
  - Total route length ~27 hops: this is the largest amplifying factor an
attacker can have.
- We use the two week period to calculate the node’s total routing revenue,
this is what we have to lose if we are jammed.
- We then look at a longer period, 10x the two week period, to see what the
peer has forwarded us over that longer period.
- If they have forwarded us more over that longer period than what we have
to lose in the shorter period, then they have good reputation (see the
sketch at the end of this section).
- This is the damage that is observable to us - there are values outside of
the protocol that are also affected by jamming:
  - Business reliability, joy of running a node, etc
  - We contend that these values are inherently unmeasurable to protocol
devs:
     - End users can’t easily put a value on them.
     - If we try to approximate them, users will likely just run the
defaults.
- One of the simplest attacks we can expect is an “about turn” where an
attacker behaves perfectly and then attacks:
  - So, once you have good reputation we can’t just give you full access to
protected slots.
- We want to reward behavior that we consider to be honest, so we consider
“effective” HTLC fees - the fee value that a HTLC has given us relative to
how long it took to resolve:
  - Resolution period: the amount of time that a HTLC can reasonably take
to resolve - based on MPP timeout / 1 minute.
  - We calculate opportunity cost for every minute after the first
“allowed” minute as the fees that we could have earned with that
liquidity/slot.
  - Reputation is only negatively affected if you endorsed the HTLC.
  - If you did not endorse, then you only gain reputation for fast success
(allowing bootstrapping).
- When do I get access to protected slots?
  - When you get good reputation, you can use protected slots for your
endorsed HTLCs but there is a cap on the number of in flight HTLCs that are
allowed.
  - We treat every HTLC as if it will resolve with the worst possible
outcome, and temporarily dock reputation until it resolves:
     - In the good case, it resolves quickly and you get your next HTLC
endorsed.
     - In the bad case, you don’t get any more HTLCs endorsed and your
reputation remains docked once it resolves (slowly).
- Wouldn’t a decaying average be easier to implement, rather than a sliding
window?
  - If you’re going to use large windows, then a day here or there doesn’t
matter so much.
- Have you thought about how to make this more visible to node operators?
  - In the steady state we don’t expect this to have any impact on routing
operations, so they’ll only need a high level view.
- Can you elaborate on slots vs liquidity for these buckets?
  - Since we have a proportional fee for HTLCs, this indirectly represents
liquidity: larger HTLCs will have larger fees so will be more “expensive”
to get endorsed.
- Where do we go from here?
  - We would like to dry run with an experimental endorsement field, and
ask volunteers to gather data for us.
  - Are there any objections to an experimental TLV?
     - No.
     - We could also test multiple endorsement signals / algorithms in
parallel.
- In your simulations, have you looked at the ability to segment off
attacks as they happen? To see how quickly an attacker's reputation drops
off, and that you have a protected path?
  - Not yet, but plan to.
- Do any of these assumptions change with trampoline? Don’t think it’s
related.
- Your reputation is at stake when you endorse, so when do you decide to
endorse your own payments?
  - You do the things you were already doing to figure out a good route.
Paths that you think have good liquidity, and have had success with in the
past.
- What about redundant overpayment, some of your HTLCs are bound to fail?
  - Provided that they fail fast, it shouldn’t be a problem.
- Is it possible that the case where general slots are perpetually filled
by attackers becomes the steady state? And we can’t tell the difference
between a regular user and attacker.
  - This is where unconditional fees come in, if somebody wants to
perpetually fill up the general bucket they have to pay for it.
- Is there anything we’d like to see that will help us have more confidence
here?
  - What do you think is missing from the information presented?
     - We can simulate the steady state / create synthetic data, but can’t
simulate every attack. Would like to spend more time thinking through the
ways this could possibly be abused.
  - Would it help to run this on signet? Or scaling lightning?
     - It’s a little easier to produce various profiles of activity on
regtest.
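
Pulling the pieces above together, a rough sketch of the bucketing and
reputation checks might look like the following. The names, thresholds and
the opportunity-cost formula are assumptions for illustration; the proposal
itself is still being researched.

```python
def has_good_reputation(peer_fees_long_window_msat: int,
                        our_revenue_risk_window_msat: int) -> bool:
    """Good reputation: what the peer forwarded us over the long window
    (10x the risk window) covers what we stand to lose if slow jammed
    over the ~2 week risk window."""
    return peer_fees_long_window_msat >= our_revenue_risk_window_msat

def bucket_for_htlc(upstream_has_good_reputation: bool,
                    incoming_endorsed: bool):
    """Reputation AND endorsement: protected resources, forwarded with
    endorsed=1. Anything else: general bucket, forwarded with endorsed=0."""
    if upstream_has_good_reputation and incoming_endorsed:
        return "protected", 1
    return "general", 0

def effective_fee_msat(fee_msat: float, minutes_to_resolve: float,
                       opportunity_cost_per_minute_msat: float) -> float:
    """Fee earned minus the fees this slot/liquidity could have earned for
    every minute past the first 'allowed' minute (the resolution period)."""
    overtime = max(0.0, minutes_to_resolve - 1.0)
    return fee_msat - overtime * opportunity_cost_per_minute_msat
```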

### Simplified Commitments
- Simplified commitments make our state machine easier to think about.
  - Advertise option_simplified_commitment: once both peers upgrade, we can
just use it.
  - Simplify our state machine before we make any more changes to it.
- Right now Alice and Bob can have changes in flight at the same time:
  - Impossible to debug, though technically optimal.
  - Everyone is afraid of touching the state machine.
- We can simplify this by introducing turn taking (see the sketch at the
end of this section):
  - First turn is taken by the lower pubkey.
  - Alice: update / commit.
  - Bob: revoke and ack.
  - If Alice wants to go when it’s Bob’s turn, she can just send a message.
  - Bob can ignore it, or yield and accept it.
  - This has been implemented in CLN for LNSymmetry.
- It’s less code to not have the ignore message, but for real performance
we’ll want it. Don’t want to suck up all of that latency.
- The easiest way to upgrade is on re-establish.
  - If it wasn’t somebody’s turn, you can just do lowest pubkey.
  - If it was somebody’s turn, you can just resume it.
- We could also add REVOKE and NACK:
  - Right now we have no way to refuse updates.
  - Why do we want a NACK?
     - You currently have to express to the other side what they can put in
your channel because you can’t handle it if they give you something you
don’t like (eg, a HTLC below min_htlc).
     - You can likely force a break by trying to send things that aren't
allowed, which is a robustness issue.
     - We just force close when we get things we don’t allow.
     - Could possibly trigger force closes.
- There’s an older proposal called fastball where you send a HTLC and
advise that you’re going to fail it.
  - If Alice gets it, she can reply with UNADD.
  - If you don’t get it in time, you just go through the regular cycle.
- When you get commitment signed, you could NACK it. This could mean you’re
failing the whole commitment, or just a few HTLCs.
  - You can’t fail a HTLC when you’ve sent commitment signed, so you need a
new cycle to clear it out.
  - What NACK says is: I’ve ignored all of your updates and I’m progressing
to the next commitment.
- Revoke and NACK is followed by commitment signed where you clear out all
the bad HTLCs, ending that set of updates.
- You have to NACK and then wait for another commitment signature, signing
for the same revocation number.
- Bob never has to hold a HTLC that he doesn't want from Alice on his
commitment.
- This is bad for latency, good for robustness.
  - Alice can send whatever she wants, and Bob has a way to reject it.
  - There are a whole lot of protocol violations that Alice can force a
force close with, now they can be NACKed.
- This is good news for remote signers, because we have cases where policy
has been violated and our only option right now is to close the channel.
- You still want Alice to know Bob’s limits so that you can avoid endless
invalid HTLCs.
- Simplified commitment allows us to do things more easily in the protocol.
  - When we specced this all out, we didn’t foresee that update fee would
be so complicated, with this we know update fee will be correct.
  - If we don’t do this, we have to change update fee?
     - Sender of the HTLC adds fee.
     - Or fixed fee.
- Even if we have zero fees, don’t we still have HTLC dust problems?
  - You can have a bit on update add that says the HTLC is dust.
  - You can’t be totally fee agnostic because you have to be able to
understand when to trim HTLCs.
- Even update fee aside, shouldn’t things just be simpler?
- Would a turn based protocol have implications for musig nonces?
  - If you’re taking a turn, it’s a session.
  - You’d need to have different nonces for different sessions.
- We should probably do this before we make another major change, it
simplifies things.
- Upgrade on re-establish is pretty neat because you can just tell them
what type you’d like.
  - This worked very well for CLN getting rid of static remote.
- What about parameter exchange?
  - There’s a version of splice that allows you to add new inputs and
outputs.
  - Splice no splice, which means that you can only make a new commitment
transaction, no on-chain work.
  - Seems like you can get something like dynamic commitments with this,
and it’s a subset of splicing.
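
A minimal sketch of the turn-taking idea above (message names and the
yield/ignore handling are assumptions for illustration, not the proposed
protocol):

```python
class TurnBasedChannel:
    """Toy model of turn taking: the node with the lower pubkey takes the
    first turn, and an out-of-turn proposal can be yielded to or ignored."""

    def __init__(self, our_pubkey: bytes, their_pubkey: bytes):
        self.our_turn = our_pubkey < their_pubkey  # lower pubkey goes first

    def want_to_update(self) -> str:
        if self.our_turn:
            return "send update_* messages followed by commitment_signed"
        # Not our turn: ask anyway; the peer may yield or simply ignore us.
        return "send a turn request and wait"

    def on_peer_turn_request(self, willing_to_yield: bool) -> str:
        if willing_to_yield and self.our_turn:
            self.our_turn = False  # hand the turn over to the peer
            return "yield"
        return "ignore"            # keep our turn; the peer waits

    def on_cycle_complete(self):
        # After a full update/commit/revoke_and_ack cycle the turn flips.
        self.our_turn = not self.our_turn
```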

## Day Three

### Meta Spec Process
- Do we want to re-evaluate the concept of a “living document”?
  - It’s only going to get longer.
  - As we continue to update, we have two options:
     - Remove old text and replace entirely.
     - Do an extension and then one day replace.
- If implementing from scratch, what would you want to use?
  - Nobody is currently doing this.
  - By the time they finished, everything would have moved.
- The protocol isn’t actually that modular:
  - Except for untouched BOLT-08, which can be read in isolation.
  - Other things have tentacles.
     - We should endeavor for things to be as modular as possible.
- Back in Adelaide we had a version with a set of features.
  - We have not re-visited that discussion.
  - Is it possible for us to come up with versions and hold ourselves
accountable to them?
  - If we’re going to start having different ideas of what lightning looks
like, versioning helps.
  - Versioning is not tied one-to-one to the protocol:
     - Features make it less clean because you have a grab bag of features
on top of any “base” version we decide on.
     - Does a version imply that we’re implementing in lock step?
  - If we do extraction, remove and cleanup, we could say that we’re on
version X with features A/B/C.
- How confident are we that we can pull things out? Only things that are
brand new will be easy to do this with.
- Keeping context in your head is hard, and jumping between documents
breaks up thought.
  - Should we fully copy the current document and then edit the copy?
  - Not everything is a rewrite, some things are optional.
- What’s our design goal?
  - To be able to more easily speak about compatibility.
  - To have a readable document for implementation.
- A single living document works when we were all making a unified push,
now it makes less sense:
  - You can’t be compliant “by commit” because things are implemented in
different orders.
  - We can’t fix that with extensions, they’re hard to keep up to date?
     - RFCs work like this, they have replacement ones.
- BOLT 9 can act as a control bolt because it defines features.
- Extensions seem helpful:
  - Can contain more rationale.
  - You can have spaghetti and ravioli code, these could be raviolo
extensions.
  - If everything is an extension BOLT with minimal references, we avoid
the if-else-ey structure we have right now.
- For small things, we can just throw out the old stuff.
- If it were possible to modularize and have working groups, that would be
great but it seems like we’d tread on each other’s toes.
- We must avoid scenarios like the vfprintf man page:
  - The return value says “vfprintf returns -1 on error”
  - The next sentence says “this was true until version 2”
  - But nobody reads the next sentence.
- Cleanup PRs won’t be looked at, and need to be maintained as new stuff
gets in.
- Eg: legacy onion - we just looked at network use and removed when it was
unused:
  - Rip out how you generate them.
  - Rip out how you handle them.
- One of the issues with deleting old text is that existing software still
implements it, and deleting the old text is annoying when you run into
interop issues on the old spec version.
  - If there are real things to deal with on the network, we must keep them.
  - We must reference old commits so that people at least know what was
there and can do git archeology to find out what it used to be.
- We can remove some things today!
  - Static remote
  - Non-zero fee anchors
  - ANYSEGWIT is default (/compulsory)
  - Payment secrets / basic MPP.
- Should we have regular cleanups?
  - Even if we do, they need review.
- For now, let’s do wholesale replacement to avoid cleanup.
- The proposals folder is nice to know what’s touched by what changes.
- Rationale sections need improvement: sometimes they’re detailed,
sometimes vague.
- Once a feature becomes compulsory on the network we can possibly ignore
it.
- What about things that are neither bolts nor blips - like inbound fees?
  - Why does it need to be merged anywhere?
  - If it’s an implementation experiment, we can merge it once we’re
convinced it works.
  - If we reach a stage where we all agree it should be universally done,
then it should be a bolt.
     - This is a BLIP to BOLT path.
- Communication:
  - We’re not really using IRC anymore - bring it back!
  - We need a canonical medium, recommit to lightning-dev.

### Async Payments / Trampoline
- Blinded payments are a nice improvement for trampoline because you don’t
know where the recipient is.
- The high level idea is:
  - Light nodes only see a small part of the network that they are close
to.
  - Recipients only give out a few trampolines in the network via which
they can be reached.
  - In the onion for the first trampoline, there will be an onion for the
second trampoline (see the sketch at the end of this section).
  - You just need to give a trampoline a blinded path and they can do the
rest.
- If you only have one trampoline, they can probably make a good guess
where the payment came from (it’s in the reachable neighborhood).
- Is there a new sync mode for trampoline gossip?
  - We’d now need radius-based gossip rather than block based.
  - The trust version is just getting this from a LSP.
  - In cold bootstrap, you’re probably going to open a channel so you ask
them for gossip.
- Can you split MPP over trampoline? Yes.
- Routing nodes can learn more about the network because they make their
own attempts.
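
As an illustration of the nesting described above (conceptual structure
only, not the actual onion format), the sender's payload might look roughly
like this:

```python
# The sender routes to the first trampoline through nodes it knows locally;
# the trampoline onion it carries tells trampoline 1 how to reach
# trampoline 2, which finally uses the recipient-provided blinded path.
payment = {
    "outer_onion": {
        "hops": ["local_peer", "trampoline_1"],
        "final_payload": {
            "trampoline_onion": {
                "hop_1": {"forward_to": "trampoline_2"},
                "hop_2": {"blinded_path": "<recipient-provided blinded path>"},
            }
        },
    }
}
```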