On 2014/04/04 (Apr), at 2:16 PM, Matthew Toseland wrote:

> For several reasons I think we need to discuss the dubious security
> assumption of "it's hard to probe your peers' datastores", and the
> performance tradeoffs that go with it ...
> 
> Why is this important now?
> - Darknet bloom filters. Okay, it's darknet; there's more trust; so
> that's okay then? Maybe.
> - Opennet bloom filters (for long-lived peers). We'll need to identify
> long-lived peers anyway for tunnels and security (MAST countermeasures).
> So it'd be good to do bloom filter sharing with them.
> - Opportunistic data exchange (possibly with CBR) (on both darknet and
> opennet).
> - Broadcast probes at low HTL. Same effect as bloom filter sharing but
> work with low uptime nodes. Can be made cheap; we don't have to wait for
> them before forwarding, and because latency isn't a big deal we can
> aggregate them and make them cheap. (We might want to wait for them,
> with timeouts, before finishing with DNF) Would probably have to be
> non-forwardable, to cut costs ... so it's a true probe. But then the
> security model is the same for bloom filter sharing - except for the
> bandwidth cost, which makes probing bloom filters a bit more costly...

This is in line with the saying that it is easier to move an operation ("do you 
have data X?") than to move data ("here's all the data I have!").
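
To put it concretely: a bloom filter is just a compressed, lossy form of "do you
have data X?". A minimal sketch of the shape of the thing, in case it helps
(this is *not* Fred's real code; the class name, the SHA-256 hashing, and the
sizes are all invented for illustration):

    import java.security.MessageDigest;
    import java.security.NoSuchAlgorithmException;
    import java.util.BitSet;

    // Hedged sketch only: why shipping a compact "do you have X?" structure is
    // cheaper than shipping the data itself.
    public class PeerStoreFilter {
        private final BitSet bits;
        private final int sizeBits;   // m: number of bits in the filter
        private final int numHashes;  // k: number of hash functions (<= 8 here)

        public PeerStoreFilter(int sizeBits, int numHashes) {
            this.bits = new BitSet(sizeBits);
            this.sizeBits = sizeBits;
            this.numHashes = numHashes;
        }

        // Derive k bit positions from the routing key (SHA-256, purely illustrative).
        private int[] positions(byte[] routingKey) {
            final byte[] d;
            try {
                d = MessageDigest.getInstance("SHA-256").digest(routingKey);
            } catch (NoSuchAlgorithmException e) {
                throw new AssertionError(e); // SHA-256 is always present
            }
            int[] out = new int[numHashes];
            for (int i = 0; i < numHashes; i++) {
                int v = ((d[4 * i] & 0xFF) << 24) | ((d[4 * i + 1] & 0xFF) << 16)
                      | ((d[4 * i + 2] & 0xFF) << 8) | (d[4 * i + 3] & 0xFF);
                out[i] = Math.floorMod(v, sizeBits);
            }
            return out;
        }

        public void add(byte[] routingKey) {
            for (int p : positions(routingKey)) bits.set(p);
        }

        // "Do you have data X?" -- false positives possible, false negatives not.
        public boolean mightContain(byte[] routingKey) {
            for (int p : positions(routingKey)) if (!bits.get(p)) return false;
            return true;
        }
    }

At roughly 10 bits per key, a filter covering a whole store should come in well
under a megabyte (guessing at typical store sizes), versus gigabytes for the
data itself... which is the whole "move the operation, not the data" win.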

In my opinion, much of this almost-reached-the-data stuff (bloom filters, data 
probes, FOAF, etc) serves to hide some deep bug or design flaw; that is, to 
make a broken system usable. Passing INSERTs only to veteran nodes & having an 
"outer ring" of connectedness (only applicable to opennet) might fix the 
lower-level issues.

IIRC, the motivation for bloom filter sharing was to hide the lookup from your 
peer; the theory being... the fact that your node has a particular datum is 
less interesting or volatile than the fact that someone is requesting it.

It's a bit curious, but intriguing, that you mention aggregating the data 
probes... seems kinda like hinting to your neighbors: "I'm working on X, Y, & 
Z... any leads?"... esp. if we need some packet padding anyway... then, well... 
in the common end case *most* of our neighbors will have already seen the 
request (right?)... so I'm not sure how much this buys us.
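
Still, to make "aggregating" concrete, I imagine something like this (the names,
the batch size, and the flush interval are all invented; nothing like it exists
in Fred): queue up the keys you're currently working on and send each peer one
fixed-size, padded message per interval, instead of one probe per key.

    import java.util.ArrayList;
    import java.util.List;

    // Hedged sketch of "aggregate the probes": collect the keys we're working on
    // and flush them to a peer as one fixed-size (i.e. constantly padded) message.
    // KEY_BYTES, BATCH_BYTES and the flush policy are invented numbers.
    public class ProbeBatcher {
        private static final int KEY_BYTES = 32;     // routing key size (illustrative)
        private static final int BATCH_BYTES = 1024; // every batch padded to this size

        private final List<byte[]> pending = new ArrayList<>();

        public synchronized void queue(byte[] routingKey) {
            if (pending.size() < BATCH_BYTES / KEY_BYTES)
                pending.add(routingKey.clone());
            // else: drop or defer -- a real version needs a policy here
        }

        // Called on a timer, e.g. once every few seconds, per peer.
        public synchronized byte[] flush() {
            byte[] message = new byte[BATCH_BYTES]; // constant size == constant padding
            int off = 0;
            for (byte[] key : pending) {
                System.arraycopy(key, 0, message, off, KEY_BYTES);
                off += KEY_BYTES;
            }
            pending.clear();
            return message; // trailing zeros are the padding; real code would carry a count
        }
    }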

> What is the model?
> - Censorship: We want it to be hard for bad guys to identify nodes
> containing a specific key and eliminate them. Classically, fetching a
> key tells you where it's stored (the data source, used for path
> folding), but in fetching it you have propagated it. The problem with
> this theory is you've only propagated it to *caches*, not stores.

Hmm... what if... whenever an item drops from the cache, if it is a small CHK 
(i.e. one data block, not a superblock) we turn it into a FAST_INSERT (a 
one-shot insert, no thread/state required)... you just drain your cache back 
into the network/store?
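
Very roughly, something like this hook (onCacheEvict(), isSingleBlockCHK() and
sendFastInsert() are hypothetical names, just to show the shape of the idea):

    // Hedged sketch of the "drain the cache back into the network" idea.
    // None of these methods exist in Fred; they only illustrate the mechanism.
    public class CacheDrainHook {

        // Called by the cache when an entry is about to be dropped.
        public void onCacheEvict(byte[] routingKey, byte[] data) {
            // Only bother for small CHKs: one data block, no splitfile layers above it.
            if (!isSingleBlockCHK(routingKey, data)) return;

            // Fire-and-forget: re-insert at minimal cost, keeping no local thread/state.
            sendFastInsert(routingKey, data);
        }

        private boolean isSingleBlockCHK(byte[] routingKey, byte[] data) {
            return data.length <= 32 * 1024; // one 32KiB CHK data block (illustrative)
        }

        private void sendFastInsert(byte[] routingKey, byte[] data) {
            // A one-shot insert routed toward the key's location, with no
            // acknowledgement tracked; a failure just means the data stays gone.
            // (Left empty: this is the part that would need real protocol work.)
        }
    }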

> -- Does this mean we need to give more protection to the store than the
> cache? E.g. only do bloom filter sharing for stores, only read store for
> broadcast probes?

Wouldn't that be providing even more information to an attacker, if we let them 
differentiate between what is stored versus what is cached? Although... even 
just the address of the node & the data might give a good indication of that...

> -- Possibly we could avoid path folding if we served it from the store?
> However this might make path folding less effective...

...by one hop, and arguably make the attack less effective... by one hop... 
right?

> - Plausible deniability: The bad guys request a key from you. You return
> it. It might be in your store or it might not be; you might have
> forwarded the request, and the attacker can't tell. This is IMHO
> legalistic bull****. It's easy to tell with a timing attack (and
> whatever randomisation we add won't make it much harder, as long as you
> are fetching multiple keys). Also, RejectedOverload{local=false} implies
> it came from downstream, though probing would use minimal HTL so might
> not see that... (But at least *that* is fixable; it is planned to
> generate RejectedOverload{local=false} locally when we are over some
> fraction of our capacity, as part of load management reforms).
> 
> The security issues are doubtful IMHO. So it really comes down to
> censorship attacks... The problem with censorship attacks is inserts
> store the key on ~ 3 nodes; it's cached on lots of nodes, but the CHK
> cache turnover is very high; if you take out those 3 nodes, the data
> goes away very quickly. So the classic attack of "fetch it and kill the
> data source" may actually work - if you care about blocking single keys,
> or can kill lots of nodes (e.g. all elements of a splitfile middle layer
> etc).

All the more reason to (somehow) drain caches back into the network stores, and 
oddly enough... the *least-requested* may be the most important, in this case.

> Do we need even more "put it back where it should be" mechanisms? E.g.
> if a sink node finds the data from a store probe it should store it?
> Would this help with censorship attacks?

Wouldn't that require a *huge* lookup table (or bloom filter?!) à la 
RecentlyFailed?
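
...though maybe not that huge, if false positives are tolerable (worst case we
occasionally skip a store we should have done). Something like the sketch below
might do; the sizes and the crude ageing are invented and have nothing to do
with the real RecentlyFailed code:

    import java.util.BitSet;

    // Hedged sketch of a cheap "did I already (re)store this key recently?" check.
    public class RecentlyStoredFilter {
        private static final int BITS = 1 << 20; // ~128KiB of RAM (invented size)
        private BitSet bits = new BitSet(BITS);
        private int marked = 0;

        // Routing keys are already uniformly-distributed hashes, so slices of the
        // key itself can serve as bit indices; no extra hashing needed.
        private static int index(byte[] key, int offset) {
            return (((key[offset] & 0xFF) << 16) | ((key[offset + 1] & 0xFF) << 8)
                    | (key[offset + 2] & 0xFF)) % BITS;
        }

        // Returns true if the key was probably seen already; marks it either way.
        public synchronized boolean seenOrMark(byte[] routingKey) {
            int a = index(routingKey, 0), b = index(routingKey, 3);
            boolean seen = bits.get(a) && bits.get(b);
            bits.set(a);
            bits.set(b);
            // Crude ageing: throw the whole filter away once it gets crowded.
            if (++marked > BITS / 8) { bits = new BitSet(BITS); marked = 0; }
            return seen;
        }
    }

~128KiB of RAM buys a "probably already handled it" answer for on the order of
a hundred thousand keys between resets, which doesn't sound so bad.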

> The average store turnover is unknown but guesswork suggests it's around
> 2 weeks ...

That sounds quite disagreeable. I'd much prefer the original design goal of a 
nearly-infinite vat of nearly-permanent storage. :-)

Do you mean "time until the data is not reachable", or a node's store 
*actually* getting so much data that it is rolling over?
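
If the latter, here's the sort of back-of-the-envelope I'd expect behind a 
"~2 weeks" figure (every number below is a guess, plugged in only to show the
arithmetic):

    // Hedged back-of-the-envelope: how long until a node's store rolls over?
    // All inputs are guesses for illustration, not measurements.
    public class StoreTurnoverEstimate {
        public static void main(String[] args) {
            double storeBytes = 20L * 1024 * 1024 * 1024; // assume a 20 GiB store
            double blockBytes = 32 * 1024;                // one CHK data block
            double storeWritesPerHour = 2500;             // guessed accepted-write rate

            double blocksHeld = storeBytes / blockBytes;  // ~655k blocks
            double hoursToRoll = blocksHeld / storeWritesPerHour;
            System.out.printf("Store rolls over in ~%.1f days%n", hoursToRoll / 24.0);
            // With these guesses: 655360 / 2500 ~= 262 hours ~= 11 days, i.e. the
            // same order as "~2 weeks"; but "time until the data is unreachable"
            // could be much shorter if only ~3 sink nodes ever stored it.
        }
    }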

--
Robert Hailey

> 
> Possibly we need to know the average turnover times of the store, cache,
> for SSKs and CHKs. Do we have a probe for that?
> (Might be a good reason for people to insert as SSK!)
