Good morning ZmnSCPxj, thanks for the response.
> I would be hesitant to divide the world in such manner.
> I understand that in typical computer science, splitting your objects up into
> smaller parts is a long-accepted method of doing things.
> However, when it comes to finances and political power (the power to censor
> or disrupt), such splitting is highly undesirable.
> Thus I am strongly allergic to things that would "hierarchical"ize or
> "bucket"ize or even just have a separation between "endpoint" and "core"
> network.
> I would rather propose for acceptance into BOLT, such proposals that would
> keep the network homogeneous.
Firstly, I completely agree with you that we should not be splitting up the
network in any way in which nodes or channels can be politically targeted, or
in which any node is designated special privilege or status. I have the same
allergy as you and took this into consideration when proposing this, and some
of your suggestions are along a similar line of thinking to mine.
I think you might have misunderstood what I was proposing, but it's probably my
fault for not expressing it well. With my suggestion, all nodes continue to be
equal participants and there are no special nodes. I used the term "endpoint"
previously to mean one of the two nodes which a regular channel belongs to, and
not to mean some kind of special node. Any node can open channels to any other
node - the "buckets" are merely a strategy for locally organizing gossip so
that it can be made more efficient. The term "buckets" is also used in
descriptions of other DHTs such as Kademlia; there it doesn't refer to
splitting up the global network either, but merely provides a perspective for
looking at subsections of it.
I'm certainly interested in any solutions which keep the network permissionless
because we really don't want to end up with DNS 2.0 or similar.
> Various nodes may have different resources (BTC, CPU, memory, bandwidth,
> latency).
> Such inequalities are inevitable in this universe.
> But these nodes should still, as much as we can, remain peers on the network.
My suggestion accounts for differences in computational requirements, as each
node can determine its own depth in the tree based on the approximate quantity
of information it wishes to gossip. A node could filter information to
whichever depth of the tree it wished, by setting the bucket size at which it
spills. This also allows the size of gossip to shrink dynamically as the
network grows, and is similar to garbage collection: anything which isn't part
of the destination bucket on spilling is purged.
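As a minimal sketch (Python; bucket_prefix() is a hypothetical helper, and I
assume the quadtree consumes 2 bits of the PKH per level):

    def bucket_prefix(pkh, depth, id_bits=256):
        # The first 2*depth bits of the PKH identify the quadtree bucket.
        return pkh >> (id_bits - 2 * depth)

    def keep_channel(own_pkh, pkh_a, pkh_b, depth, id_bits=256):
        # Keep gossip for a channel iff either endpoint's PKH shares our
        # prefix at our chosen depth; at depth=0 everything is kept.
        own = bucket_prefix(own_pkh, depth, id_bits)
        return (bucket_prefix(pkh_a, depth, id_bits) == own
                or bucket_prefix(pkh_b, depth, id_bits) == own)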
Nodes could also pick (multiple) specific quadtree buckets and negotiate
filters to receive all gossip about them (the filter being the hash prefix of
the desired bucket). It might be desirable for nodes to broadcast their
filters as part of the gossip itself, so that other nodes can learn who the
better information providers are, but again this would be completely optional.
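A negotiated filter could then be as simple as a set of (prefix, depth)
pairs - a sketch of the matching rule only, not a concrete wire format:

    def in_bucket(pkh, prefix, depth, id_bits=256):
        return (pkh >> (id_bits - 2 * depth)) == prefix

    def passes_filter(pkh_a, pkh_b, filters, id_bits=256):
        # filters: the set of (prefix, depth) pairs the peers negotiated
        return any(in_bucket(p, prefix, depth, id_bits)
                   for (prefix, depth) in filters
                   for p in (pkh_a, pkh_b))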
---
> This rule can be probabilistically fulfilled by having each node N know *at
> least* routes to some nodes Ls..., where for all L <- Ls, dist(N, L) < some
> constant.
> The constant here can be selected by each node independently, depending on
> its memory and CPU capacity.
(the node knowing more routes than that is perfectly fine)
The quadtree proposes a similar idea, but instead of using some constant, N
knows about routes to Ls, where each L is within the same bucket. Essentially,
i < dist(N, L) <= j, where i=BUCKET_MIN and j=BUCKET_MAX. For example, if we
use 8-bit ids where the first two bits identify the bucket, then i=00000000
and j=00111111. I guess the equivalent distance using your constant would be
k=00100000, covering the same range but without discriminating the results
based on boundaries the way my suggestion does.
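In code, using the 8-bit example above and assuming a Kademlia-style XOR
metric for dist() (a sketch only - the metric is open to choice):

    BUCKET_MIN = 0b00000000  # i: exclusive lower bound
    BUCKET_MAX = 0b00111111  # j: ids differing from ours only in the low 6 bits

    def dist(a, b):
        return a ^ b  # assumed XOR metric, as in Kademlia

    def same_bucket(n, l):
        return BUCKET_MIN < dist(n, l) <= BUCKET_MAX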
> Then, we can have the following global rule:
> * Given three nodes X, Y, and Z, and dist(X, Z) < dist(Y, Z), then X SHOULD
> be more likely to know a route from itself to Z, than Y to know a route from
> itself to Z, is also met for nodes which are in different buckets.
This rule is essentially the same as what I was thinking for the distance
between buckets. With the autopilot suggestion of decreasing the number of
channels opened as distance increases, the probability of knowing about a
neighbouring bucket is increased compared with a distant one.
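For illustration, the kind of distance weighting I have in mind for autopilot
might look like this (a rough Python sketch; the halving schedule is
indicative only, chosen to match the crude splits in the example further
down):

    def channel_shares(depth):
        # Halve the budget for the own bucket, then halve again for each
        # successive distance tier (siblings, cousins, ...); the most
        # distant tier keeps the remainder so the shares sum to 1.
        shares, remaining = [], 1.0
        for _ in range(depth):
            remaining /= 2
            shares.append(remaining)
        shares.append(remaining)
        return shares  # [own bucket, tier 1 (siblings), ..., tier depth]

    # channel_shares(1) -> [0.5, 0.5]
    # channel_shares(2) -> [0.5, 0.25, 0.25]
    # channel_shares(3) -> [0.5, 0.25, 0.125, 0.125]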
---
> It looks for any node L <- Ls such that dist(L, D) < dist(N, D).
> In fact, it can just sort those nodes according to dist(L, D) and start with
> the lowest distance, and fail once it reaches a dist(L, D) that exceeds its
> own dist(N, D).
> The above algorithm converges since dist(N, D) for each N that is delegated
> to will progressively get smaller until we reach some N that knows the
> destination D.
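(For concreteness, a minimal sketch of how I read that selection step, with
dist() passed in as a parameter since the metric is yours to define:)

    def closer_candidates(n, d, ls, dist):
        # All L from Ls with dist(L, D) < dist(N, D), nearest first;
        # delegation tries these in order and fails once no candidate
        # is closer to D than N itself.
        own = dist(n, d)
        return [l for l in sorted(ls, key=lambda l: dist(l, d))
                if dist(l, d) < own]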
This proposal also seems very similar to how existing DHTs work, unless I'm
mistaken. My problem with the approach of existing DHTs is that they are
suboptimal in the number of queries which must be made to find a route - which
is O(log n) in the worst case in Chord or Kademlia, for example. This isn't a
problem for things like file-sharing where latency isn't a major concern, but
we don't really want to be waiting on a string of queries while standing in a
queue to pay for a coffee (especially given that some routes may fail due to
inherent constraints of the LN itself). As the network grows, the efficiency
of route finding declines too.
I came up with the idea of using the quadtree specifically to reduce the
maximum (or typical) number of queries needed to find a route where possible,
at the expense of storing much more information than existing DHTs, while
still trying to get reasonable savings on resources. Although the worst-case
query cost is still O(log n), it seems that this is unlikely to occur, and
O(1) seems plausible for small depths.
The other expense is that this approach will not find the optimal routes, as
it prioritizes considering the smallest number of buckets. However, it is not
possible to know the optimal path without knowing the entire network topology
anyway, so this problem exists with the DHT too. I've optimized for reduced
querying.
> All nodes remain peers, there are no special "flare nodes", no special "knows
> the entire map" nodes, no "knows my octant" nodes, no "endpoint" nodes, no
> hubs and spokes.
There are no special nodes in my approach, only a commitment to maintain
information about nodes (and their channels) whose PKH prefix matches your own
at the depth you have chosen to gossip, regardless of whether or not they've
advertised the same feature. A node expressing depth=0 would be a "knows the
entire map" node, but that does not give it any special status.
> The network remains homogeneous and all are still peers, some might have
> smaller routemaps than others, but all are still equal in the network, as all
> things should be.
All nodes are still equal, but the network isn't entirely homogeneous (in the
distribution of channels) because of the autopilot suggestions which prioritize
local channels and deprioritize distant channels, with the goal of improving
efficiency. I guess there is some trade-off with this approach, but it's still
worth considering, because "perfectly unstructured" isn't the end goal - there
may be some middle ground between that and other approaches which tend towards
centralization.
It may even be harmful to try to keep it perfectly unstructured, because with
no structure the network will effectively be "maximally inefficient" for those
not maintaining sufficient gossip information. If, by default, it is
inefficient to find a route, then there will inevitably be somebody who fills
that market gap. Big players will have no problem keeping knowledge of the
whole network and giving information about paths in just one query - in
exchange for people's data. Like DNS.
A semi-structured approach might provide enough incentive for everyone to keep
as much information locally as realistic for the resources they have, and like
Adam Smith's invisible hand, by doing so they not only benefit themselves, but
they benefit everybody by improving the efficiency of the network overall. The
network would become "reasonably efficient" for everyone, rather than
inefficient for most except the few with sufficient resources to maintain a
useful routemap.
---
I'll try to convey by example how I think censorship should not be a cause for
concern with this quadtree, and what savings might be expected. This makes
some crude assumptions using the current network size, with my previous
suggestions for autopilot (i.e., nodes open 1/2 of their channels to nodes in
their own bucket, decreasing with distance), and assumes a uniform
distribution of node ids (i.e., no numeric bias in the hashes of public keys).
There are currently ~4000 nodes with an average of ~20 channels per node
(~40,000 channels) which we need to keep information about at depth=0.
If we spill to depth=1, then ~1,000 nodes will go into each bucket. If each has
(on average) ~10 channels to other nodes in the same bucket, then there will be
~5,000 "intra-bucket" channels in each bucket. Each node also has ~10 channels
to nodes in other buckets ("inter-bucket" channels), which is another ~10,000
channels it maintains information for.
Note that "intra-bucket" and "inter-bucket" channels are just plain channels -
there's nothing special about them and they are merely the difference in
perspective which each node will view them based on whether either node's PKH
prefix is the same bucket as their own PKH prefix at the depth which they
filter gossip.
Since we've filtered out all channels where neither node is in our own bucket,
we now only need to know about ~15,000 channels in total, compared to the
original ~40,000 (~37.5% of the channels in the network). We still need to
know about the ~1,000 nodes in our own bucket, plus the nodes at the far end
of the ~10,000 inter-bucket channels we know about. In principle those could
cover up to 100% of the nodes in the global network, but any node to which
none of those 10,000 channels belong is filtered out, so the number of nodes
we track is 1000 < N <= 4000.
We don't suddenly lose fault-tolerance, or face a censorship threat when
finding a node in another bucket. There are still ~3,333 known channels on
average between our bucket and each of the other 3 buckets at depth=1. Since
those nodes keep all of the gossip for their own bucket, it is likely that any
of them knows the destination of a payment - so the number of queries you
would typically need to make is 1 (or several *in parallel*). The node you
query should know the remaining route from itself to the destination. You may
choose to query for more information as a potential privacy enhancement.
If we spill to depth=2, then we now have 16 total leaf-buckets, with ~250
nodes each. Each bucket would have ~1,250 intra-bucket channels and ~2,500
inter-bucket channels. The information requirement per bucket is now ~3,750
channels (~10% of the total network, which is a reasonable saving). Of the
inter-bucket channels, 1/2 are to the 3 sibling buckets, making ~400 potential
routes to each sibling, and the other 1/2 are to "cousin" buckets, of which
there are 12, resulting in ~100 channels on average to each of those buckets.
That is still likely sufficient for a typical query count of 1, with
fault-tolerance.
At depth=3 there are ~62 nodes per bucket, ~310 intra-bucket channels, ~620
inter-bucket channels, and 64 possible leaf-buckets, resulting in ~100
channels to each sibling, ~13 channels to each cousin, and just 1 or 2
channels to each second-cousin (the most distant buckets).
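(These per-depth figures can be reproduced with a quick back-of-the-envelope
script, under the same crude assumptions - ~4000 nodes, ~20 channels per node,
uniform ids, and half of each node's channels inside its own bucket:)

    NODES, CHANNELS_PER_NODE = 4000, 20
    total = NODES * CHANNELS_PER_NODE // 2  # each channel has two endpoints

    for depth in range(1, 4):
        per_bucket = NODES // 4 ** depth
        intra = per_bucket * (CHANNELS_PER_NODE // 2) // 2  # both ends local
        inter = per_bucket * (CHANNELS_PER_NODE // 2)       # one end local
        print(f"depth={depth}: {per_bucket} nodes/bucket, {intra} intra + "
              f"{inter} inter = {intra + inter} channels "
              f"({(intra + inter) / total:.1%} of {total})")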
Only at depth=3 does querying start to become potentially expensive if you
want to make a payment to a distant node. You might still have some direct
routes to the second-cousin buckets, but not much fault-tolerance. However,
your cousins, or the siblings of your second-cousin, have a higher probability
of knowing about a node at that distance than local nodes do, so you still
have numerous options for discovering a node, though it might take multiple
queries.
Queries would use a similar greedy algorithm approach to the one you have
suggested. It should still be significantly less than worst-case query cost.
Resource-constrained devices might spill to these limits at the cost of
requiring more queries for payments. In practice, you probably wouldn't go
beyond depth=2 as above unless the global network gets sufficiently large that
bandwidth requirements at depth=2 become a problem, which probably isn't going
to be the case until the network is an order of magnitude larger than it
already is - in which case the number of "inter-bucket" channels will be
tenfold the current amount and spilling to depth=4 or 5 may become plausible.
There is also potential for analysts to find out which buckets have few or no
channels between them, and to take the opportunity to fill that gap to benefit
from routing fees, as those new channels would be prioritized over longer
routes which span multiple buckets. The autopilot suggestions are only a
starting point; eventually it will be mostly market driven.
---
My idea is not fully researched, but since the topic was raised, I decided to
share my incomplete thoughts about it. The choice of a quadtree itself was
quite arbitrary, but it seemed reasonable after briefly considering trees of
other arities, and the potential for linking this to approximate geographic
location to optimize local payments seemed like it could be valuable too.
I'll give some more thought to your suggestions and reconsider my own in light
of them.
Regards,
Mark H