On Apr 30, 2008, at 6:56 PM, Matt Mahoney wrote:
Which protocol are you referring to, the one described on my web page
or the abstract one described in my thesis?


Thesis.


In the abstract one I
described a network of n identical (but unreliable) peers, each
connected to c = O(log n) peers,...


Yeah, that is kind of what I meant. You have to design the protocol to guarantee that the aggregate network does not develop pathologies that greatly reduce its efficiency and utility; otherwise you *will* end up at a pathological minimum. If you look at, for example, the evolution of routing protocols, you will see that this lesson was learned the hard way -- multiple times. If you study Nash equilibria and similar, you will find that there are constrained cases for which you can design a protocol that generates efficient equilibria, but real network characteristics increasingly violate the constrained cases that we know how to design a decentralized protocol for. If you study BGP4 (the core routing protocol), you will see that it essentially uses the venerable Marginal Cost Pricing (MCP) optimization strategy, but with an increasing number of patches to deal with the fact that the assumptions that make it work are increasingly violated.

Your model above tacitly predicates its optimality on a naive MCP strategy, but it is not particularly well suited for it. In short, this means you are assuming that the aggregate latency function for a transaction over the network is a close proxy for the transaction cost. At one time this might have been a reasonable assumption, but it becomes less true every year. There are some interesting new strategies from other fields of mathematics that look like a more adaptive MCP strategy but which do not really solve the problem -- a Google search on MCP will put you in the right ballpark to find the rest of the literature. It is actually possible to make applications like yours work in an MCP model, but it requires fine-grained, authoritative accounting in an architecture that is supposed to be decentralized, which kind of defeats the purpose -- you have to be able to trust every node in a strong sense.
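To make that assumption concrete, here is a minimal sketch (mine, purely illustrative, not the protocol from your thesis) of what naive MCP-style selection looks like in practice: each peer routes over the path whose accumulated latency is lowest, treating measured latency as a stand-in for the true transaction cost. The graph representation and function names are hypothetical.

    import heapq

    def cheapest_route(graph, src, dst):
        """Dijkstra over per-link latency -- the 'latency is cost' proxy."""
        # graph: {node: [(neighbor, latency_ms), ...]}
        frontier = [(0.0, src, [src])]
        seen = set()
        while frontier:
            cost, node, path = heapq.heappop(frontier)
            if node == dst:
                return cost, path
            if node in seen:
                continue
            seen.add(node)
            for neighbor, latency in graph.get(node, []):
                if neighbor not in seen:
                    heapq.heappush(frontier,
                                   (cost + latency, neighbor, path + [neighbor]))
        return float("inf"), []

The failure mode is exactly the one described above: when the real cost of a hop (congestion, unreliability, adversarial peers) diverges from its measured latency, every node independently optimizing this proxy can drive the aggregate network toward a pathological equilibrium.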


There are an increasing number of problem and application spaces that are very pathological on decentralized topologies optimized with conventional MCP models. This is a big theoretical concern right now, both because conventional networks are diverging from the robust assumptions that make the MCP model valid and because there are organizations that want to build decentralized systems that, as we know up front, cannot be built in a way that keeps them from collapsing into an expensively pathological morass. Most of the basic computer science literature on decentralized systems limits itself to models that assume either a naive MCP strategy or partially centralized control to maintain optimality.

I cannot tell you how to solve the problem, as I am not aware of a good solution other than strictly centralizing key pieces of metadata control, which imposes its own serious issues. One of the benefits of tightly controlled cloud/cluster models (e.g. Google) is that you can impose optimizing constraints that are utterly impractical or impossible to impose on the wild and woolly internet at large.

This is a problem I have spent a lot of time on, but other than developing a really deep appreciation of the theory surrounding pervasively decentralized topology protocols that do not slowly destroy themselves in the wild, I cannot say that I have made any progress on it. It is a broad theoretical problem that is kind of sneaking up on us, and I suspect that over the long run we'll use one of the less-than-ideal modifications of MCP (like using a central authority) to deal with it.

Like I said, I don't have a good solution for you that will allow things to scale the way you want; I mostly wanted to point out that these issues have not been addressed appropriately for the scale being discussed. Interestingly, it seems that hardly anyone who works on decentralized systems design is aware of these issues -- it is the domain of theoretical mathematicians and routing geeks. For small systems it really doesn't matter, which may be part of the reason why.


For your purposes in the broadest sense, things like kD-trees
will drop dead for pretty trivial systems, never mind for something
ambitious.  On the other hand, generalized distribution with O(n)
storage complexity was solved last year which may or may not address
your issues.

Storage is O(n) using an organizational tree, but that would not be
robust. I think the O(log n) factor is not a big penalty. You want to
have more pointers per peer as the network grows, and you want to have
more cached and backup copies of messages.


No existing spatial index that is general (i.e. not restricted to a set of data with carefully tailored characteristics) will distribute beyond a few dozen nodes, and it gets worse fast when you add dimensions. The new solution I mention above is fully general and scales extremely well in a distributed, decentralized environment, which seemed to be the issue that needed solving.

The exception to this is if your spatial index is built once and never modified (i.e. read-only), in which case it is possible to have one that is general and scales to a modest size. It's a real mess, and scalability is something like a five- or six-axis space when talking about these kinds of algorithms.
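For the read-only case, here is a minimal sketch (illustrative only, and not the generalized solution I mentioned) of why build-once helps: the structure is bulk-built balanced and then only queried, so replicas can be distributed without any of the rebalancing traffic that kills the dynamic case. The function names and dictionary layout are hypothetical.

    import math

    def build_kdtree(points, depth=0):
        """Bulk-build a balanced kD-tree by median split; never modified afterward."""
        if not points:
            return None
        axis = depth % len(points[0])
        points = sorted(points, key=lambda p: p[axis])
        mid = len(points) // 2
        return {"point": points[mid], "axis": axis,
                "left": build_kdtree(points[:mid], depth + 1),
                "right": build_kdtree(points[mid + 1:], depth + 1)}

    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    def nearest(node, target, best=None):
        """Nearest-neighbor descent; read-only, so safe against any replica."""
        if node is None:
            return best
        point, axis = node["point"], node["axis"]
        if best is None or dist(point, target) < dist(best, target):
            best = point
        near, far = ((node["left"], node["right"])
                     if target[axis] < point[axis]
                     else (node["right"], node["left"]))
        best = nearest(near, target, best)
        # Cross the splitting plane only if a closer point could be on the far side.
        if abs(target[axis] - point[axis]) < dist(best, target):
            best = nearest(far, target, best)
        return best

    tree = build_kdtree([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)])
    print(nearest(tree, (9, 2)))   # -> (8, 1)

Nothing in the query path mutates the tree, which is what makes distributing it tractable; the moment you allow inserts and deletes, the rebalancing and cache-invalidation traffic is where the mess starts, and it gets worse quickly as you add dimensions.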

J. Andrew Rogers
