On Apr 30, 2008, at 6:56 PM, Matt Mahoney wrote:
> Which protocol are you referring to, the one described on my web page
> or the abstract one described in my thesis?
Thesis.
> In the abstract one I described a network of n identical (but
> unreliable) peers, each connected to c = O(log n) peers,...
Yeah, that is kind of what I meant. You have to design the protocol
to guarantee that the aggregate network does not develop pathologies
that greatly reduce its efficiency and utility, otherwise you *will*
end up at a pathological minimum. If you look at, for example, the
evolution of routing protocols, you will see that this lesson was
learned the hard way -- multiple times. If you study Nash equilibria
and similar, you will find that there are constrained cases for which
you can design a protocol that generates efficient equilibria, but
real network characteristics increasingly violate the set of
constrained cases that we know how to design a decentralized protocol
for. If you study BGP4 (the core Internet routing protocol), you will
see that it essentially uses the venerable Marginal Cost Pricing
(MCP) optimization strategy, but with an increasing number of patches
to deal with the fact that the assumptions that make it work are
increasingly violated.
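To make the equilibrium point concrete, here is a toy Pigou-style
sketch (a standard textbook illustration, nothing specific to your
protocol; the numbers are made up): selfish latency-minimizing
traffic all piles onto the congestible link, and the resulting
equilibrium is strictly worse than a coordinated split.

# Link A has constant latency 1.0; link B has latency equal to the
# fraction of traffic routed over it. Selfish routing sends everything
# over B, which is worse on average than a 50/50 split.

def average_latency(frac_on_b: float) -> float:
    """Average latency when frac_on_b of the traffic uses link B."""
    return (1 - frac_on_b) * 1.0 + frac_on_b * frac_on_b

nash = average_latency(1.0)     # selfish equilibrium: everyone on B
optimum = average_latency(0.5)  # coordinated split
print(f"price of anarchy ~ {nash / optimum:.2f}")   # ~1.33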
Your model above tacitly predicates its optimality on a naive MCP
strategy, but it is not particularly well suited to one. In short,
this means you are assuming that the aggregate latency function for a
transaction over the network is a close proxy for the transaction
cost. At one time this might have been a reasonable assumption, but
it becomes less true every year. There are some interesting new
strategies from other fields of mathematics that look like a more
adaptive MCP strategy but which do not really solve the problem -- a
Google search on MCP will put you in the right ballpark to find the
rest of the literature. It is actually possible to make applications
like yours work in an MCP model, but it requires fine-grained,
authoritative accounting in an architecture that is supposed to be
decentralized, which kind of defeats the purpose -- you have to be
able to trust every node in a strong sense.
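To spell out the proxy assumption with a trivial (entirely made-up)
sketch: a naive MCP-style selector ranks routes by marginal latency,
and as soon as the true per-transaction cost diverges from latency
the two rankings stop agreeing.

from dataclasses import dataclass

@dataclass
class Route:
    name: str
    latency_ms: float   # what a naive MCP model optimizes
    true_cost: float    # what the transaction actually costs (arbitrary units)

routes = [
    Route("peer-path-1", latency_ms=40.0, true_cost=9.0),
    Route("peer-path-2", latency_ms=55.0, true_cost=2.0),
]

picked = min(routes, key=lambda r: r.latency_ms)   # latency-as-cost pick
wanted = min(routes, key=lambda r: r.true_cost)    # what we actually wanted
print(picked.name, "vs", wanted.name)              # the two disagree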
There are an increasing number of problem and application spaces that
are very pathological on decentralized topologies optimized with
conventional MCP models. This is a big theoretical concern right now,
both because conventional networks are diverging from the robust
assumptions that make MCP valid and because there are organizations
that want to build decentralized systems that they know up front
cannot be built without collapsing into an expensively pathological
morass. Most of the basic computer science literature on
decentralized systems limits itself to models that assume either a
naive MCP strategy or partially centralized control to maintain
optimality.
I cannot tell you how to solve the problem, as I am not aware of a
good solution other than strictly centralizing control of key pieces
of metadata, which imposes its own serious issues. One of the
benefits of tightly controlled cloud/cluster models (e.g. Google) is
that you can impose optimizing constraints that are utterly
impractical or impossible to impose on the wild and woolly internet
at large.
This is a problem I have spent a lot of time on, but other than
developing a really deep appreciation of the theory surrounding the
development of pervasively decentralized topology protocols that do
not slowly destroy themselves in the wild, I cannot say that I have
made any progress on it. It is a broad theoretical problem that is
kind of sneaking up on us, and I suspect that over the long run we'll
use one of the less-than-ideal modifications of MCP (like using a
central authority) to deal with it.
Like I said, I don't have a good solution for you that will allow
things to scale the way you want; I mostly wanted to point out that
these issues had not been addressed appropriately for the scale being
discussed. Interestingly, it seems that hardly anyone who works on
decentralized systems design is aware of these issues -- it is the
domain of theoretical mathematicians and routing geeks. For small
systems it really doesn't matter, which may be part of the reason
why.
For your purposes in the broadest sense, things like kD-trees will
drop dead for pretty trivial systems, never mind for something
ambitious. On the other hand, generalized distribution with O(n)
storage complexity was solved last year, which may or may not address
your issues.
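A rough way to see why they drop dead (my own back-of-envelope, not a
statement about any particular implementation): a range query has to
visit every partition whose region it intersects, so if you cut a
d-dimensional space into a grid of m cells per dimension across
peers, a box query covering a fraction f of each dimension fans out
to roughly (ceil(f*m) + 1)^d peers -- tolerable at d = 2, hopeless
well before d = 10.

import math

def cells_touched(m: int, f: float, d: int) -> int:
    """Rough count of grid cells a box query can intersect."""
    per_dim = math.ceil(f * m) + 1   # cells the range can straddle per axis
    return per_dim ** d

for d in (2, 3, 6, 10):
    print(d, cells_touched(m=16, f=0.1, d=d))   # 9, 27, 729, 59049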
> Storage is O(n) using an organizational tree, but that would not be
> robust. I think the O(log n) factor is not a big penalty. You want
> to have more pointers per peer as the network grows, and you want
> to have more cached and backup copies of messages.
No existing spatial index that is general (i.e. not restricted to a
set of data with carefully tailored characteristics) will distribute
beyond a few dozen nodes, and it gets worse fast when you add
dimensions. The new solution I mention above is fully general and
scales extremely well in a distributed, decentralized environment,
which seemed to be the issue that needed solving.
The exception to this is if your spatial index is built once and
never modified (i.e. read-only), in which case it is possible to have
one that is general and which scales to modest size. It's a real
mess, and scalability is something like a five- or six-axis space
when talking about these kinds of algorithms.
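For what it's worth, the read-only case is easier precisely because
the partitioning is built once, offline, and never rebalanced.
Something as crude as the sketch below (one-shot recursive median
splits, each leaf pinned to a peer; purely illustrative, not a real
index) already distributes -- the hard parts are exactly what it
ignores: updates, skew, and the other scalability axes.

import random

def static_partition(points, depth, axis=0):
    """One-shot kD-style partition: recursive median splits to a fixed
    depth. Because it is never updated, each leaf bucket can simply be
    pinned to a peer and served read-only."""
    if depth == 0 or len(points) <= 1:
        return [points]
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    next_axis = (axis + 1) % len(points[0])
    return (static_partition(points[:mid], depth - 1, next_axis) +
            static_partition(points[mid:], depth - 1, next_axis))

pts = [(random.random(), random.random()) for _ in range(1000)]
leaves = static_partition(pts, depth=4)          # 16 buckets, one per peer
print(len(leaves), sum(len(b) for b in leaves))  # 16 1000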
J. Andrew Rogers