On 6 October 2010 07:40, Zooko O'Whielacronx <[email protected]> wrote:
> Dear Rufus Pollock and other OKF folks:
Thanks for writing Zooko -- and let me reiterate that I have nothing but
admiration for the sterling work you guys have been doing. Anyway, to
comments on particular points ...

[...]

> I've reviewed the discussion that Rufus started on the tahoe-dev
> mailing list nine months ago [1]. Back then I thought that what Rufus
> was asking for sounded reasonable enough, and much of it seemed
> definitely doable, but for some of it I wasn't really sure of the
> details—what specifically was required and if it was a reasonable
> thing to want or if it was even possible to implement it all. I'm

Absolutely -- also, my impression was that storage accounting was a
fairly big job ...

> still not entirely sure today, and I'm interested in seeing how some
> other tools such as MongoDB provide for OKF's needs. If it can, then
> that example can show me how Tahoe-LAFS can be used likewise. If it
> can't, then this gives me increased confidence that the original
> desiderata for the OKF grid were too strong.

Excellent point -- I was talking with an ex-researcher in wide-area
distributed storage a few months ago and he basically said: this is a
hard problem and no-one has solved it yet (not necessarily hard
technically but socially -- having enough people participating to ensure
a stable grid). For more on these challenges see:

<http://lists.okfn.org/pipermail/open-science/2010-April/000265.html>

The current spec of overall requirements is (from
<http://wiki.okfn.org/p/Distributed_Storage/>):

There is an addressable file-space (e.g. a virtual file-system) which is
distributed over multiple machines (nodes). Key features:

* '''Wide area''': we have a preference for a wide-area system, i.e. we
  do not expect all the nodes to be in a single data-centre or on a
  single high-speed network but rather to be distributed across the
  Internet.
  * Even a single data-centre solution would be interesting though
* '''Robustness''': data must not be lost if a given node (or even k
  nodes) disappears
  * This implies replication, i.e. data must be automatically
    replicated across nodes
* '''Easy addition of nodes''': it should be easy for an average
  sysadmin to install and configure a node (e.g. a debian package
  should be available)
  * We want people to be able to easily "donate" nodes
* '''Share/shard-rebalancing''': should have good re-balancing to
  handle (permanent) node entry and exit
* '''Different file sizes''': the system should be able to handle small
  and very large files (so files should be automatically sharded)
* '''Availability''': high guarantee of data availability (so the
  disappearance of a given node should not make data unavailable)
* '''Open data focused''': focused on data/content that is
  [[http://opendefinition.org/|open]], so encryption/privacy is
  '''not''' a priority
* '''F/OSS''': must be free/open source software so we can build
  [[http://opendefinition.org/ossd|open services]]
* '''Eventually consistent''': concurrency/consistency is not required
  as long as the system is eventually consistent (we know our CAP)

> In this note I'll talk about first encryption and then space accounting.
>
>
> Let's tackle the issue of encryption, because I think it is kind of a
> red herring and I hope to get it out of the way and concentrate on the
> really hard issues. Tahoe-LAFS's encryption can be understood as:

I mostly agree (my point was always about usability, not about any
flaws in the model), so let's assume complete agreement and I'll snip
this section.

[...]

> Next, let's talk about the "space accounting" issue. This one I
> definitely understand as being a reasonable thing to want and a thing
> that could be feasibly implemented. Let's distinguish between two
> goals:
>
> Goal 1: I want to allow users to read (download) files without thereby
> allowing them to write (upload) them.

Yes.
> Goal 2: I want to allow server operators to contribute space on their
> storage server without thereby allowing them to consume space on other
> storage servers.

Yes. Though the two relate. Given the p2p nature of Tahoe (if I
understand correctly), if someone else starts a node, joins the network
and allows upload on *that* node, that content will propagate to other
nodes. I guess the answer is that node owners should shut down write
access except through the main proxy.

> Goal 1 is already possible using an HTTP proxy in front of the
> Tahoe-LAFS gateway. This is already done in practice, as recently
> discussed on the tahoe-dev list [2].

That's what we also implemented with
<http://knowledgeforge.net/okfn/grid/> (I apologize for not announcing
that on tahoe-dev but it was, and is, rather alpha ...)

> Goal 2 is much trickier. To allow goal 2, as has been mentioned on
> this thread, Tahoe-LAFS developers have a plan to add strong
> distributed space accounting in the future, which plan we haven't made
> much progress on in the last nine months.
>
> What interests me for the OKF grid is: what are the alternatives? From
> my experience using Cassandra I'm pretty sure that it is even less

We don't have much interest in Cassandra ...

> capable than Tahoe-LAFS is at goal 2, and it can be served up behind
> an HTTP proxy just as well as Tahoe-LAFS can. I would assume (without
> knowing much) that the same goes for MongoDB and couchdb and every
> other system on the planet. :-)

Yes, you are quite correct that the same applies to any system that
doesn't "tag" each bit of data with its owner/source node. However,
there is one difference with Tahoe, I believe (if I remember correctly
and matters haven't changed): in Tahoe someone can upload files and fail
to make the readcap available. I also believe they can upload to a new
root node they create, in which case I won't even see that data if I
"walk" the filesystem.
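(For concreteness, the goal-1 setup is just an HTTP front-end that
forwards read requests to the gateway and refuses everything else. A
minimal sketch in Python -- hypothetical, not our actual knowledgeforge
code; the gateway address, port and method list are assumptions:)

```python
# Sketch of a read-only HTTP proxy in front of a storage gateway
# (goal 1): readers can download, but uploads never reach the gateway.
# Hypothetical example -- the gateway URL and port are assumed values.
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen

GATEWAY = "http://127.0.0.1:3456"   # assumed address of the local gateway
READ_ONLY_METHODS = {"GET", "HEAD"}

def allowed(method):
    """Return True only for read-only HTTP methods."""
    return method.upper() in READ_ONLY_METHODS

class ReadOnlyProxy(BaseHTTPRequestHandler):
    def do_GET(self):
        # Forward read requests to the gateway unchanged.
        with urlopen(GATEWAY + self.path) as resp:
            body = resp.read()
        self.send_response(resp.status)
        self.end_headers()
        self.wfile.write(body)

    def do_PUT(self):
        self.reject()

    def do_POST(self):
        self.reject()

    def reject(self):
        # Writes are refused here, so readers cannot consume grid space.
        self.send_error(405, "read-only grid: uploads are not accepted")

# To run: HTTPServer(("", 8080), ReadOnlyProxy).serve_forever()
```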
In other systems, if someone uploads content "I" will definitely be able
to see it -- and can, for example, enforce a policy such as: any piece
of content without a valid owner field will be deleted.

> So in sum, Tahoe-LAFS already allows goal 1 and is actually used that
> way in practice, and Tahoe-LAFS might in the future (especially if
> someone else pitches in and helps) achieve goal 2, which no other
> current system to my knowledge can offer either.

Agreed, modulo the major caveat above.

> Oh, we should really think about another goal which wasn't explicitly
> mentioned before but which is probably actually very important:
>
> Goal 3: I want to allow server operators to contribute space on their
> storage server without thereby allowing them to overwrite or delete
> files on other storage servers.

Yes.

> Tahoe-LAFS already offers goal 3, and I'm pretty sure that it is the
> only system that offers goal 3 and the only one that is likely to in
> the near future. (I would love to be proven wrong.)

You also offer sharding and share-rebalancing (which some others do
too, but they remain a major challenge)!

> Okay, so now that I've sat down and written this letter, it sounds to
> me like maybe Tahoe-LAFS is a reasonable tool for OKF to move forward
> with after all. Or at least, it isn't that much more unreasonable than
> any alternative that I know of. ;-)

Yes, you've definitely made it clear we should revisit this and see
what we can do.

> I'm sorry that I didn't figure this out and write this letter nine
> months ago when you first asked, but honestly, I was uncertain. In the
> time that has passed since then I've learned a lot and gotten familiar
> with Cassandra. It wasn't until I actually wrote this letter that I
> thought things through in these terms.

Thank-you very much for taking the time to write :) and I look forward
to your responses to some of my queries above.
Regards,

Rufus

> [1] http://tahoe-lafs.org/pipermail/tahoe-dev/2009-June/001985.html
> [2] http://tahoe-lafs.org/pipermail/tahoe-dev/2010-October/005336.html
>
> _______________________________________________
> okfn-discuss mailing list
> [email protected]
> http://lists.okfn.org/mailman/listinfo/okfn-discuss

--
Open Knowledge Foundation
Promoting Open Knowledge in a Digital Age
http://www.okfn.org/ - http://blog.okfn.org/
