Good morning ZmnSCPxj, thanks for the response.
> I would be hesitant to divide the world in such manner.
> I understand that in typical computer science, splitting your objects up into
> smaller parts is a long-accepted method of doing things.
> However, when it comes to finances and political power (the power to censor
> or disrupt), such splitting is highly undesirable.
> Thus I am strongly allergic to things that would "hierarchical"ize or
> "bucket"ize or even just have a separation between "endpoint" and "core"
> network.
> I would rather propose for acceptance into BOLT, such proposals that would
> keep the network homogeneous.
Firstly, I completely agree with you that we should not be splitting up the
network in any way in which nodes or channels can be politically targeted, or
in which any node is designated special privilege or status. I have the same
allergy as you and took this into consideration when proposing this, and some
of your suggestions are along a similar line of thinking to mine.
I think you might have misunderstood what I was proposing, but it's probably my
fault for not expressing it well. With my suggestion, all nodes continue to be
equal participants and there are no special nodes. I used the term "endpoint"
previously to mean one of the two nodes which a regular channel belongs to, and
not to mean some kind of special node. Any node can open channels to any other
node - the "buckets" are merely a strategy for locally organizing gossip so
that it can be made more efficient. The term "buckets" is also used in
descriptions of other DHTs such as Kademlia; there it doesn't refer to
splitting up the global network either, but merely provides a perspective for
looking at subsections of it.
I'm certainly interested in any solutions which keep the network permissionless
because we really don't want to end up with DNS 2.0 or similar.
> Various nodes may have different resources (BTC, CPU, memory, bandwidth,
> latency).
> Such inequalities are inevitable in this universe.
> But these nodes should still, as much as we can, remain peers on the network.
My suggestion accounts for differences in computational requirements, as each
node can determine its own depth in the tree based on the approximate quantity
of information it wishes to gossip. A node could filter information to
whichever depth of the tree it wished, by setting the bucket size at which it
spills. This also allows the size of gossip to shrink dynamically as the
network grows, and is similar to garbage collection: anything which isn't part
of the destination bucket on spilling is purged.
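As a minimal sketch (Python; bucket_prefix() is a hypothetical helper, and I
assume the quadtree consumes 2 bits of the PKH per level):

    def bucket_prefix(pkh, depth, id_bits=256):
        # The first 2*depth bits of the PKH identify the quadtree bucket.
        return pkh >> (id_bits - 2 * depth)

    def keep_channel(own_pkh, pkh_a, pkh_b, depth, id_bits=256):
        # Keep gossip for a channel iff either endpoint's PKH shares our
        # prefix at our chosen depth; at depth=0 everything is kept.
        own = bucket_prefix(own_pkh, depth, id_bits)
        return (bucket_prefix(pkh_a, depth, id_bits) == own
                or bucket_prefix(pkh_b, depth, id_bits) == own)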
Nodes could also pick (multiple) specific quadtree buckets and negotiate
filters to receive all gossip about them (the filter being the hash prefix of
the desired bucket). It might be desirable for nodes to broadcast their
filters as part of the gossip itself, so that other nodes can learn who the
better information providers are, but again this would be completely optional.
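A negotiated filter could then be as simple as a set of (prefix, depth)
pairs - a sketch of the matching rule only, not a concrete wire format:

    def in_bucket(pkh, prefix, depth, id_bits=256):
        return (pkh >> (id_bits - 2 * depth)) == prefix

    def passes_filter(pkh_a, pkh_b, filters, id_bits=256):
        # filters: the set of (prefix, depth) pairs the peers negotiated
        return any(in_bucket(p, prefix, depth, id_bits)
                   for (prefix, depth) in filters
                   for p in (pkh_a, pkh_b))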
---
> This rule can be probabilistically fulfilled by having each node N know *at
> least* routes to some nodes Ls..., where for all L <- Ls, dist(N, L) < some
> constant.
> The constant here can be selected by each node independently, depending on
> its memory and CPU capacity.
(the node knowing more routes than that is perfectly fine)
The quadtree proposes a similar idea, but instead of using some constant, N
knows about routes to Ls, where each L is within the same bucket. Essentially,
i < dist(N, L) <= j, where i=BUCKET_MIN and j=BUCKET_MAX. For example, if we
use 8-bit ids where the first two bits identify the bucket, then i=00000000
and j=00111111. I guess the equivalent distance using your constant would be
k=00100000, covering the same range but without discriminating the results
based on boundaries the way my suggestion does.
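In code, using the 8-bit example above and assuming a Kademlia-style XOR
metric for dist() (a sketch only - the metric is open to choice):

    BUCKET_MIN = 0b00000000  # i: exclusive lower bound
    BUCKET_MAX = 0b00111111  # j: ids differing from ours only in the low 6 bits

    def dist(a, b):
        return a ^ b  # assumed XOR metric, as in Kademlia

    def same_bucket(n, l):
        return BUCKET_MIN < dist(n, l) <= BUCKET_MAX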
> Then, we can have the following global rule:
> * Given three nodes X, Y, and Z, and dist(X, Z) < dist(Y, Z), then X SHOULD
> be more likely to know a route from itself to Z, than Y to know a route from
> itself to Z, is also met for nodes which are in different buckets.
This rule is essentially the same as what I was thinking for the distance
between buckets. With the autopilot suggestion of decreasing the number of
channels opened as distance increases, the probability of knowing about a
neighbouring bucket is increased compared with a distant one.
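For illustration, the kind of distance weighting I have in mind for autopilot
might look like this (a rough Python sketch; the halving schedule is
indicative only, chosen to match the crude splits in the example further
down):

    def channel_shares(depth):
        # Halve the budget for the own bucket, then halve again for each
        # successive distance tier (siblings, cousins, ...); the most
        # distant tier keeps the remainder so the shares sum to 1.
        shares, remaining = [], 1.0
        for _ in range(depth):
            remaining /= 2
            shares.append(remaining)
        shares.append(remaining)
        return shares  # [own bucket, tier 1 (siblings), ..., tier depth]

    # channel_shares(1) -> [0.5, 0.5]
    # channel_shares(2) -> [0.5, 0.25, 0.25]
    # channel_shares(3) -> [0.5, 0.25, 0.125, 0.125]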
---
> It looks for any node L <- Ls such that dist(L, D) < dist(N, D).
> In fact, it can just sort those nodes according to dist(L, D) and start with
> the lowest distance, and fail once it reaches a dist(L, D) that exceeds its
> own dist(N, D).
> The above algorithm converges since dist(N, D) for each N that is delegated
> to will progressively get smaller until we reach some N that knows the
> destination D.
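(For concreteness, a minimal sketch of how I read that selection step, with
dist() passed in as a parameter since the metric is yours to define:)

    def closer_candidates(n, d, ls, dist):
        # All L from Ls with dist(L, D) < dist(N, D), nearest first;
        # delegation tries these in order and fails once no candidate
        # is closer to D than N itself.
        own = dist(n, d)
        return [l for l in sorted(ls, key=lambda l: dist(l, d))
                if dist(l, d) < own]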
This proposal also seems very similar to how existing DHTs work, unless I'm
mistaken. My problem with the approach of existing DHTs is that they are
suboptimal in the number of queries which must be made to find a route - which
is O(log n) in the worst case in Chord or Kademlia, for example. This isn't a
problem for things like file-sharing where latency isn't a major concern, but
we don't really want to be waiting on a string of queries while standing in a
queue to pay for a coffee (especially given that some routes may fail due to
inherent constraints of the LN itself). As the network grows, the efficiency
of route finding declines too.
I came up with the idea of using the quadtree specifically to reduce the
maximum (or typical) number of queries needed to find a route where possible,
at the expense of storing much more information than existing DHTs, while
still trying to get reasonable savings on resources. Although the worst-case
query cost is still O(log n), it seems that this is unlikely to occur, and
O(1) seems plausible for small depths.
The other expense is that this approach will not find the optimal routes, as
it prioritizes considering the smallest number of buckets. However, it is not
possible to know the optimal path without knowing the entire network topology
anyway, so this problem exists with the DHT too. I've optimized for reduced
querying.
> All nodes remain peers, there are no special "flare nodes", no special "knows
> the entire map" nodes, no "knows my octant" nodes, no "endpoint" nodes, no
> hubs and spokes.
There are no special nodes in my approach, only a commitment to maintain
information about nodes (and their channels) whose PKH prefix matches your own
at the depth you have chosen to gossip, regardless of whether or not they've
advertised the same feature. A node expressing depth=0 would be a "knows the
entire map" node, but that does not give it any special status.
> The network remains homogeneous and all are still peers, some might have
> smaller routemaps than others, but all are still equal in the network, as all
> things should be.
All nodes are still equal, but the network isn't entirely homogeneous (in the
distribution of channels) because of the autopilot suggestions which prioritize
local channels and deprioritize distant channels, with the goal of improving
efficiency. I guess there is some trade-off with this approach, but it's still
worth considering, because "perfectly unstructured" isn't the end goal - there
may be some middle ground between that and other approaches which tend towards
centralization.
It may even be harmful to try to keep it perfectly unstructured, because with
no structure the network will effectively be "maximally inefficient" for those
not maintaining sufficient gossip information. If, by default, it is
inefficient to find a route, then there will inevitably be somebody who fills
that market gap. Big players will have no problem keeping knowledge of the
whole network and giving information about paths in just one query - in
exchange for people's data. Like DNS.
A semi-structured approach might provide enough incentive for everyone to keep
as much information locally as realistic for the resources they have, and like
Adam Smith's invisible hand, by doing so they not only benefit themselves, but
they benefit everybody by improving the efficiency of the network overall. The
network would become "reasonably efficient" for everyone, rather than
inefficient for most except the few with sufficient resources to maintain a
useful routemap.
---
I'll try to convey by example how I think censorship should not be a cause for
concern with this quadtree, and what savings might be expected. This makes
some crude assumptions using the current network size, with my previous
suggestions for autopilot (i.e., nodes open 1/2 of their channels to nodes in
their own bucket, decreasing with distance), and assumes a uniform
distribution of node ids (i.e., no numeric bias in the hashes of public keys).
There are currently ~4000 nodes with an average of ~20 channels per node
(~40,000 channels) which we need to keep information about at depth=0.
If we spill to depth=1, then ~1,000 nodes will go into each bucket. If each has
(on average) ~10 channels to other nodes in the same bucket, then there will be
~5,000 "intra-bucket" channels in each bucket. Each node also has ~10 channels
to nodes in other buckets ("inter-bucket" channels), which is another ~10,000
channels it maintains information for.
Note that "intra-bucket" and "inter-bucket" channels are just plain channels -
there's nothing special about them and they are merely the difference in
perspective which each node will view them based on whether either node's PKH
prefix is the same bucket as their own PKH prefix at the depth which they
filter gossip.
Since we've filtered out all channels where neither node is in our own bucket,
we now only need to know about ~15,000 channels in total, compared to the
original ~40,000 (~37.5% of the channels in the network). We still need to
know about the ~1,000 nodes in our own bucket, plus the nodes at the far end
of the ~10,000 inter-bucket channels we know about. In principle those could
cover up to 100% of the nodes in the global network, but any node to which
none of those 10,000 channels belong is filtered out, so the number of nodes
we track is 1000 < N <= 4000.
We don't suddenly lose fault-tolerance, or face a censorship threat when
finding a node in another bucket. There are still ~3,333 known channels on
average between our bucket and each of the other 3 buckets at depth=1. Since
those nodes keep all of the gossip for their own bucket, it is likely that any
of them knows the destination of a payment - so the number of queries you
would typically need to make is 1 (or several *in parallel*). The node you
query should know the remaining route from itself to the destination. You may
choose to query for more information as a potential privacy enhancement.
If we spill to depth=2, then we now have 16 total leaf-buckets, with ~250
nodes each. Each bucket would have ~1,250 intra-bucket channels and ~2,500
inter-bucket channels. The information requirement per bucket is now ~3,750
channels (~10% of the total network, which is a reasonable saving). Of the
inter-bucket channels, 1/2 are to the 3 sibling buckets, making ~400 potential
routes to each sibling, and the other 1/2 are to "cousin" buckets, of which
there are 12, resulting in ~100 channels on average to each of those buckets.
That is still likely sufficient for a typical query count of 1, with
fault-tolerance.
At depth=3 there are ~62 nodes per bucket, ~310 intra-bucket channels, ~620
inter-bucket channels, and 64 possible leaf-buckets, resulting in ~100
channels to each sibling, ~13 channels to each cousin, and just 1 or 2
channels to each second-cousin (the most distant buckets).
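(These per-depth figures can be reproduced with a quick back-of-the-envelope
script, under the same crude assumptions - ~4000 nodes, ~20 channels per node,
uniform ids, and half of each node's channels inside its own bucket:)

    NODES, CHANNELS_PER_NODE = 4000, 20
    total = NODES * CHANNELS_PER_NODE // 2  # each channel has two endpoints

    for depth in range(1, 4):
        per_bucket = NODES // 4 ** depth
        intra = per_bucket * (CHANNELS_PER_NODE // 2) // 2  # both ends local
        inter = per_bucket * (CHANNELS_PER_NODE // 2)       # one end local
        print(f"depth={depth}: {per_bucket} nodes/bucket, {intra} intra + "
              f"{inter} inter = {intra + inter} channels "
              f"({(intra + inter) / total:.1%} of {total})")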
Only at depth=3 does querying start to become potentially expensive if you
want to make a payment to a distant node. You might still have some direct
routes to the second-cousin buckets, but not much fault-tolerance. However,
your cousins, or the siblings of your second-cousin, have a higher probability
of knowing about a node at that distance than local nodes do, so you still
have numerous options for discovering a node, though it might take multiple
queries.
Queries would use a similar greedy algorithm approach to the one you have
suggested. It should still be significantly less than worst-case query cost.
Resource-constrained devices might spill to these limits at the cost of
requiring more queries for payments. In practice, you probably wouldn't go
beyond depth=2 as above unless the global network gets sufficiently large that
bandwidth requirements at depth=2 become a problem, which probably isn't going
to be the case until the network is an order of magnitude larger than it
already is - in which case the number of "inter-bucket" channels will be
tenfold the current amount and spilling to depth=4 or 5 may become plausible.
There is also potential for analysts to find out which buckets have few or no
channels between them, and to take the opportunity to fill that gap to benefit
from routing fees, as those new channels would be prioritized over longer
routes which span multiple buckets. The autopilot suggestions are only a
starting point; eventually it will be mostly market driven.
---
My idea is not fully researched, but since the topic was raised, I decided to
share my incomplete thoughts about it. The choice of a quadtree itself was
quite arbitrary, but it seemed reasonable after briefly considering trees of
other arities, and the potential for linking this to approximate geographic
location to optimize local payments seemed like it could be valuable too.
I'll give some more thought to your suggestions and reconsider my own in light
of them.
Regards,
Mark H