I'm still pushing for the hash-based solution because it can be
implemented quickly and easily, and it fixes the main issues
we've seen on testnet:
- routing sync on mobile nodes
- "route not found" errors when you're missing routing info.

It can be deployed as an optional feature and will give us time to
specify and implement proper IBLT-based filters.

It can be combined with the timestamp approach: nodes would send
bucket hashes + low and high watermarks.

I've tried to summarise the issue below:

## The problem

The current scheme (broadcast + optionally ask for a full routing
table when you connect) works well for nodes which are never
completely offline, but is becoming impractical for mobile/end-user
nodes which are often offline and connected to only a few
peers. We need a way to improve the initial routing sync and retrieve
announcements that we've missed without having to download the entire
routing table.

Additionally, the only way to check that routing information is
consistent between different nodes is to ask each one of them to send
you their entire routing table. Exchanging filters/digests/… would
mitigate the issue of having to “trust” that your peers do provide
you with a good routing table, especially when you’re connected to very
few peers.

## Requirements

- Easy to specify and implement
- Low overhead
- Ability to retrieve missing routing information
- (Nice to have) Ability to query multiple peers for consistency checks

## Background

The current method for retrieving the routing table is to set the
`initial_routing_sync` flag in the `init` message that you send every
time you connect/reconnect to a peer; the peer will then send back its
entire routing table (currently 6000 channels on testnet).
If a node believes that a channel is available when it has in fact
been closed, and uses it to build a route, it will receive an error
message and try again without this specific channel.
But if a node is missing a channel, and cannot route payments, there
is currently no way to recover it: it has to wait until the channel
parameters are updated, and will then receive a `channel_announcement`
and the matching `channel_update`. This could take a very long time.
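
To make the current mechanism concrete, here is a rough sketch (Python,
with purely illustrative names such as `Peer` and `RoutingTable`; this
is not the actual wire protocol) of the dump-on-connect behaviour:

```python
# Hedged sketch of today's behaviour: on every (re)connect, a node that sets
# initial_routing_sync receives the peer's whole routing table.
# All names here are illustrative, not taken from the BOLTs.

from dataclasses import dataclass, field

@dataclass
class RoutingTable:
    # short_channel_id -> (channel_announcement, channel_update)
    channels: dict = field(default_factory=dict)

@dataclass
class Peer:
    routing_table: RoutingTable

    def on_init(self, initial_routing_sync: bool):
        """Send back everything we know if the flag is set."""
        if initial_routing_sync:
            return list(self.routing_table.channels.items())
        return []

def reconnect(local: RoutingTable, remote: Peer):
    # The whole table (~6000 channels on testnet today) crosses the wire on
    # every reconnection, even if we are only missing a handful of entries.
    for short_channel_id, gossip in remote.on_init(initial_routing_sync=True):
        local.channels[short_channel_id] = gossip
```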

Hence, the only option for mobile nodes is to request a routing table
dump every time they connect/reconnect, which simply does not work.

We need a way to improve this initial table sync, which is simple
enough to be implemented and deployed quickly. Otherwise, these nodes
will probably use ugly ad-hoc hacks (like using their own mechanisms
for retrieving and syncing routing tables) or even rely on remote
servers to compute routes for them.

## Proposed solutions

### Timestamps/watermarks

When they connect to another peer, nodes send a timestamp (I know the
routing table up to X) or a pair of timestamps (I know the routing
table from time X to time Y).
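
As a rough illustration (function names and the tuple layout are made
up, this is not a wire-format proposal), the filtering a peer would
apply looks like this:

```python
# Sketch of the timestamp/watermark idea: on connect, tell the peer which time
# range of channel_update timestamps you already know, and only receive gossip
# that falls outside that range.

def select_gossip_to_send(our_updates, their_low, their_high):
    """our_updates: iterable of (short_channel_id, timestamp, update) tuples.

    The peer claims to know everything timestamped within [their_low, their_high];
    we send back only what falls outside that window.
    """
    return [(scid, ts, upd) for (scid, ts, upd) in our_updates
            if ts < their_low or ts > their_high]

def select_newer_than(our_updates, their_watermark):
    """Single-watermark variant ("I know the routing table up to X")."""
    return [(scid, ts, upd) for (scid, ts, upd) in our_updates
            if ts > their_watermark]
```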

Pros:
- Very simple to specify (use `channel_update` timestamps) and implement.
- Very little overhead
- Solves the “download the entire routing table every time” issue

Cons:
- Does not solve the "missing announcements" issue: if you're missing
routing info you still have to wait for channel parameters to be
updated, as above

### Bucket hashes

Routing info (i.e. channel announcements) is grouped into buckets, one
bucket being a group of 144 blocks. A hash is computed for each
bucket, peers exchange these hashes, and each peer sends back all
announcements for the buckets whose hashes don't match.
We propose to use one bucket per block for the last 7 days, and one
bucket per 144 blocks for older announcements.
This gives a total of `(365 + 7*144) = 1373` hashes for a year of announcements.
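
Below is a sketch of how the buckets could be built and compared; the
bucket boundaries follow the figures above, but the hash function and
the encoding of short channel ids are placeholders, not a wire-format
proposal:

```python
# Sketch of the bucket-hash idea: group channel announcements by block height,
# hash each bucket, and resend in full only the buckets whose hashes differ.
# Bucket sizes (1 block for the last 7 days, 144 blocks otherwise) follow the
# text above; hash choice and serialization are illustrative.

import hashlib

BLOCKS_PER_DAY = 144

def bucket_id(block_height, current_height):
    """Fine-grained buckets for recent blocks, coarse buckets for older ones."""
    if current_height - block_height < 7 * BLOCKS_PER_DAY:
        return ("fine", block_height)                  # one bucket per block
    return ("coarse", block_height // BLOCKS_PER_DAY)  # one bucket per 144 blocks

def bucket_hashes(announcements, current_height):
    """announcements: iterable of (block_height, short_channel_id as int) pairs."""
    buckets = {}
    for height, scid in announcements:
        buckets.setdefault(bucket_id(height, current_height), []).append(scid)
    return {
        bid: hashlib.sha256(b"".join(s.to_bytes(8, "big") for s in sorted(scids))).hexdigest()
        for bid, scids in buckets.items()
    }

def mismatching_buckets(ours, theirs):
    """Buckets whose hashes differ (or that only one side has) get resent in full."""
    return {bid for bid in set(ours) | set(theirs) if ours.get(bid) != theirs.get(bid)}
```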

Pros:
- Simple to specify and implement
- Little overhead (a few dozen KB)
- If a node is missing a few elements it will immediately recover
them, even if they're very old
- Works well when routing tables are mostly synchronized, which would
be the case for nodes which connect to a very small number of peers
- Bucket hashes are the same for all peers you connect to, and can be
used for consistency checks between peers

Cons:
- Bucket hashes are not queryable filters
- Since a single mismatch invalidates an entire bucket, nodes with even
small differences could end up having to send their entire
routing table (which is exactly what they are doing today)

### IBLT filters

Upon connection, nodes exchange information to estimate the number of
differences between their routing tables.
Using this estimate, they build and exchange IBLT filters, and use
them to compute the announcements that they should send to their peer.
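
For illustration only, here is a minimal (and deliberately naive) IBLT
sketch; the cell layout, number of hash functions and table sizing are
placeholders, and sizing the table correctly is exactly where the
difference estimator comes in:

```python
# Minimal IBLT sketch: each side inserts its short_channel_ids, the tables are
# subtracted cell-wise, and peeling recovers the ids present on one side only.
# Not an optimized or spec-level design.

import hashlib

NUM_HASHES = 3

def _cells(key, cells_per_hash):
    """Map a key (int) to one cell in each of NUM_HASHES disjoint regions."""
    return [i * cells_per_hash +
            int.from_bytes(hashlib.sha256(b"%d:%d" % (i, key)).digest()[:8], "big") % cells_per_hash
            for i in range(NUM_HASHES)]

def _checksum(key):
    return int.from_bytes(hashlib.sha256(b"chk:%d" % key).digest()[:8], "big")

class IBLT:
    def __init__(self, cells_per_hash):
        self.cells_per_hash = cells_per_hash
        n = NUM_HASHES * cells_per_hash
        self.count, self.key_sum, self.chk_sum = [0] * n, [0] * n, [0] * n

    def insert(self, key):
        for i in _cells(key, self.cells_per_hash):
            self.count[i] += 1
            self.key_sum[i] ^= key
            self.chk_sum[i] ^= _checksum(key)

    def subtract(self, other):
        """Cell-wise difference: only keys present in exactly one table survive."""
        diff = IBLT(self.cells_per_hash)
        for i in range(len(self.count)):
            diff.count[i] = self.count[i] - other.count[i]
            diff.key_sum[i] = self.key_sum[i] ^ other.key_sum[i]
            diff.chk_sum[i] = self.chk_sum[i] ^ other.chk_sum[i]
        return diff

    def decode(self):
        """Peel 'pure' cells; returns (only_in_self, only_in_other)."""
        ours, theirs = set(), set()
        progress = True
        while progress:
            progress = False
            for i in range(len(self.count)):
                if self.count[i] in (1, -1) and _checksum(self.key_sum[i]) == self.chk_sum[i]:
                    key, sign = self.key_sum[i], self.count[i]
                    (ours if sign == 1 else theirs).add(key)
                    for j in _cells(key, self.cells_per_hash):
                        self.count[j] -= sign
                        self.key_sum[j] ^= key
                        self.chk_sum[j] ^= _checksum(key)
                    progress = True
        return ours, theirs
```

Each peer inserts its short_channel_ids, the (fixed-size) tables are
exchanged, subtracted and decoded, and only the announcements for the
recovered ids are transmitted; decoding succeeds when the table is a
small constant factor larger than the actual number of differences,
which is why the estimation step matters.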

Pros:
- Queryable filters
- Very efficient if the number of differences is small, even with very
large routing tables

Cons:
- More complex to specify and implement: we need a good estimator for
the number of differences (send min height + max height + a sample of
announcements?)
- Filters become peer-specific (similar to the server-side vs
client-side filtering for SPV clients)



On 16 February 2018 at 13:34, Fabrice Drouin <fabrice.dro...@acinq.fr> wrote:
> I like the IBLT idea very much but my understanding of how they work
> is that the tricky part would be first to estimate the number of
> differences between "our" and "their" routing tables.
> So when we open a connection we would first exchange messages to
> estimate how far off we are (by sending a sample of shortids and
> extrapolate ?) then send filters which would be specific to each peer.
> This may become an issue if a node is connected to many peers, and is
> similar to the server-side vs client-side filtering issue for SPV
> clients.
> Basically, I fear that it would take some time before it is agreed
> upon and available, hindering the development of mobile nodes.
>
> The bucket hash idea is naive but is very simple to implement and
> could buy us enough time to implement IBLT filters properly. Imho the
> timestamp idea does not work for the mobile phone use case (but is
> probably simpler and better than bucket hashes for "centre" nodes
> which are never completely off the grid)
>
>
> On 14 February 2018 at 01:24, Rusty Russell <ru...@rustcorp.com.au> wrote:
>> Fabrice Drouin <fabrice.dro...@acinq.fr> writes:
>>> Yes, real filters would be better, but the 'bucket hash' idea works
>>> (from what I've seen on testnet) for our specific target (nodes which
>>> are connected to very small number of peers and go offline very
>>> often)
>>
>> What if we also add an 'announce_query' message: if you see a
>> 'channel_update' which you discard because you don't know the channel,
>> 'announce_query' asks them to send the 'channel_announce' for that
>> 'short_channel_id' followed by re-sending the 'channel_update'(s)?
>> (Immediately, rather than waiting for next gossip batch).
>>
>> I think we would want this for IBLT, too, since you'd want this to query
>> any short-channel-id you extract from that which you don't know about.
>
> Yes, unless it is part of the initial sync (compare filters, then send
> what they're missing)
>
>> I see.  (BTW, your formatting makes your post sounds very Zen!).
> Sorry about that, I've disabled the haiku mode :)
>
>> Yes, we can probably use difference encoding and use 1 bit for output
>> index (with anything which is greater appended) and get down to 1 byte
>> per channel_id at scale.
>>
>> But my rule-of-thumb for scaling today is 1M - 10M channels, and that
>> starts to get a little excessive.  Hence my interest in IBLTs, which are
>> still pretty trivial to implement.
>
> Yes, sending all shortids would also have been a temporary measure
> while a more sophisticated approach is being designed.
>>
>> Cheers,
>> Rusty.