Dear all,

I think the topic raised by this mail is very interesting and I'd like to contribute to the discussion (actually, we are a group working on this subject, and what I'm about to write is the result of our discussions).
As a first observation, imho, we should draw a line separating wide-area file systems, online (p2p) storage, and online (p2p) backup, as the working assumptions are different for each of these applications. So let me focus for now on *backup*.

Before getting into technical details, I'd like to pick up on the observations made by Alex: why a p2p approach to online backup? Why can't we just use a USB drive or a NAS box to back up / archive our data? One argument is that such an approach is not really seamless for the user: you need to plug in your USB drive, you need to make sure your NAS box is up and running, etc. (things are easier with Time Machine / Time Capsule -like products). Another argument is that USB disks / NAS boxes can break down, can be stolen (co-location of your PC/laptop and such devices does not help here), or burn... Well, at least these are some of the arguments in favor of "cloud" backup/storage solutions. Think of Dropbox, Symantec/Norton backup and similar.

So the natural follow-up question is: why are existing "cloud" storage systems not enough? Why do we need a p2p approach? There may be many arguments to discuss here, but I'll focus on just a few of them.

Think of price. At first sight, price seems negligible: 100 bucks for 50GB a year is not much. Right: but here we're talking about backup, i.e., *long-term* storage. Now, let's do a simple back-of-the-envelope calculation. I buy a PC today (no fancy peripherals) and I pay roughly $300. The average life span of this PC is 3-5 years, say 3. Assume I generate 50GB today and I want to back it up for 3 years. Let me also be imprecise and assume that the rate at which I generate new data exactly compensates the rate at which online storage prices go down (note: S3 storage prices have been stable in the past few years). So 50GB in total, at a constant price of $100 a year for 3 years, is $300. => Backing up your 50GB costs you as much as your PC!
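The back-of-the-envelope calculation above can be written out in a few lines; all the figures (PC price, life span, storage price) are the assumptions stated in the text, not measured data:

```python
# Assumptions taken from the text above (rough 2011 figures).
pc_cost = 300          # USD, price of a modest PC today
lifetime_years = 3     # assumed PC life span
price_per_year = 100   # USD per year for 50GB of cloud backup

# Data growth is assumed to exactly cancel out price decline,
# so the yearly price stays constant over the PC's lifetime.
backup_cost = price_per_year * lifetime_years

print(backup_cost)             # 300
print(backup_cost >= pc_cost)  # True: backup costs as much as the PC
```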
Think about your data in the hands of a company that can go out of business (hopefully this doesn't apply to Amazon), or think about a government that tells this company to erase your data (WikiLeaks, Amazon). What if I can offer you the same backup service, at essentially no or very small fees, with no fear of putting all your data in one basket? You could save your 300 bucks and maybe even go on a trip and take pictures to actually back up!

I think this could be a good motivating example to work on a p2p approach (or hybrid, as we do) to online backup. If you are interested in what we have been working on, here are some links to our work:

http://bit.ly/p2pbackup
http://arxiv.org/pdf/1009.1344v1
http://www.eurecom.fr/util/popuppubli.en.htm?page=copyright&id=3140

In a few words, for those who don't like papers:

* p2p backup requires redundancy: apply your favorite coding and place fragments on remote peers (trivial, state of the art)

* p2p backup => you keep a local copy of your data. So what? There is no need to achieve high data availability for low-latency access to individual files. Moreover, no need to go bonkers with complex repair techniques: one remote peer goes dead and you need to replace the missing fragment? Without a local copy (i.e., storage) you need to download enough redundant blocks to generate a new one and place it; with a local copy, you just output a new encoded block.

* p2p backup => durability is key, and before that, making sure you actually complete a backup operation as fast as possible is even more important (do that before you crash!). You can trade off backup performance against restore performance: it's not like storage, where you have to access your files quickly. Restores happen (hopefully) rarely, so it's better to have a low time to backup at the price of a (slightly) larger time to restore. How to achieve that? Reduce redundancy, which implies less data to upload and better storage efficiency. Just be careful not to reduce it too much, otherwise you may lose data.
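The repair argument in the second bullet can be sketched with a toy single-parity erasure code (my own minimal example, not the coding scheme from our papers): k data fragments plus one XOR parity fragment, which tolerates the loss of any one fragment. With a local copy, repairing a lost fragment is a purely local re-encode; without one, you must first download the k surviving fragments and decode:

```python
# Toy (k+1, k) single-parity code illustrating repair with and without
# a local copy of the data. This is a hypothetical sketch, not the
# coding scheme used by any real p2p backup system.

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def encode(data: bytes, k: int) -> list:
    """Split data into k equal fragments and append one XOR parity fragment."""
    size = len(data) // k
    frags = [data[i * size:(i + 1) * size] for i in range(k)]
    parity = frags[0]
    for f in frags[1:]:
        parity = xor(parity, f)
    return frags + [parity]  # any k of these k+1 fragments recover the data

def repair_with_local_copy(data: bytes, k: int, lost: int) -> bytes:
    # Local copy available: re-encode and output the missing fragment directly.
    return encode(data, k)[lost]

def repair_without_local_copy(surviving: list) -> bytes:
    # No local copy: download the k surviving fragments and XOR them
    # together (for single parity, this reconstructs the missing one).
    new = surviving[0]
    for f in surviving[1:]:
        new = xor(new, f)
    return new

data = b"16 bytes of data"
k = 4
frags = encode(data, k)
lost = 2  # pretend the peer holding fragment 2 went dead
survivors = [f for i, f in enumerate(frags) if i != lost]
```

Both repairs produce the same fragment, but the first needs zero downloads while the second needs k of them, which is the bandwidth cost the bullet point refers to.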
We have done some more nerdy work on optimality vs. random scheduling, incentives, security and so on, but I realize this mail is long and I don't want to push it too much. If you're interested I'll be happy to give more details.

Let me conclude with this. Besides the intellectual and technical challenges that such an application poses, we've also been considering some business cases. We work on a project (http://www.nanodatacenters.eu/) in which a telco could use edge resources (set-top boxes) to deploy services, and p2p backup is one such service. You spread data on set-top boxes (which, by the way, are up most of the time, which further reduces redundancy requirements), the telco can help you out with additional storage and coordination, and bandwidth remains within the telco's network. We're building an (open source) application out of all this, and will post it here asap for you to have a look.

Ciao,
Pietro, Matteo, Laszlo and Mario

On Fri, Jan 28, 2011 at 4:44 PM, Alen Peacock <alenlpeac...@gmail.com> wrote:
> I dabbled in this area a number of years ago, and still maintain the
> flŭd backup website (http://flud.org). flŭd had almost identical
> design goals to the ones you describe. Unfortunately, other pursuits
> caused me to largely abandon flud. Not much has been done on it in the
> past few years. Still, you might be interested in some of the
> discussion and designs revolving around durability, privacy, localized
> trust in an untrusted environment, attack resistance, etc. explained
> in the wiki and blog.
>
> I'll warn you upfront, though, having worked in this industry for the
> past 4.5 years: automatic offsite backup is a problem that appears
> very simple at first but is deceptively complex, with all sorts of
> really high-effort-to-get-right features required (many of them
> client-side), even if you eliminate the p2p complexities.
>
> Alen
>
>
> On Wed, Jan 26, 2011 at 12:04 PM, Michael Militzer <mich...@xvid.org> wrote:
>> Hi all,
>>
>> I am new to the list and also have no background with P2P. However,
>> I'd like to realize a P2P-related project and have therefore read a
>> bit on the topic during the past year.
>>
>> Basically, what I have in mind is a peer-to-peer, wide-area persistent
>> file storage system for a backup use case. That may not sound very
>> exciting because several other programs promising the same already
>> exist. However, when taking a closer look it seems to me all existing
>> solutions have serious shortcomings in one or another area.
>>
>> That's also the main topic of my post here. I'd appreciate your
>> feedback on whether my analysis of related software in the following
>> is correct or whether I'm missing something important.
>>
>> I envision a storage network that is open to everyone, so it is comprised
>> of untrusted nodes deployed on a global scale. Like with any
>> external backup service, I think the most important design goal is to
>> ensure "trust". So such a system must be robust and secure to a very
>> high degree - after all, users are supposed to entrust the system with their
>> irreplaceable data.
>>
>> Therefore, I think robustness and security here cannot mean only a
>> promise of data integrity by some system operator or software vendor.
>> Data availability, privacy and also censorship resistance must be
>> verifiable. In addition, a secure storage system must withstand
>> adversarial attacks. A direct consequence of this is that the peer
>> software and protocol must be open source. A storage system built
>> around a secret protocol and proprietary software cannot be trusted.
>>
>> And with these requirements in mind, what is currently available seems
>> somewhat disappointing (but maybe it's also just my inability to conduct
>> proper research - so if you know more, please give me some input):
>>
>>
>> Allmydata/Tahoe:
>>
>> The only true open-source contender I know of. Unfortunately, not
>> really targeted towards a global-scale network of untrusted nodes. Also,
>> no particular measures to withstand adversarial attacks (but these are
>> also not needed when deployed in a trusted environment).
>>
>> Cleversafe:
>>
>> Apparently not open source anymore. Also not P2P in the sense of a
>> wide-area network of untrusted nodes.
>>
>> Crashplan:
>>
>> Proprietary. P2P only to set up a "friends network", so no untrusted
>> nodes.
>>
>> Freenet:
>>
>> Open source. Not really a persistent file store, and has other design
>> goals that don't quite fit a backup storage system.
>>
>> Maidsafe/PerpetualData:
>>
>> Some support libraries are open-sourced, but not the actual protocol and
>> client software. The software is not yet publicly available. From what is
>> known about the protocol, it looks complex. Not sure how it
>> will scale or what robustness it can provide.
>>
>> Powerfolder:
>>
>> Some source code seems to be available. However, it is based on manual
>> peer selection, so it is also a "friends network".
>>
>> Wuala:
>>
>> Proprietary software. Not much is known about how it works internally,
>> in particular the relation between Wuala's central servers
>> and the storage provided by peers (So: what dominates? Is Wuala
>> actually a normal cloud storage service with some P2P buzz, or is the
>> storage really mainly P2P-organized?). Nothing is known about how
>> Wuala can withstand adversarial attacks (security by secrecy because
>> of the secret protocol?). To my knowledge, it seems to be the only
>> global-scale P2P storage system in "production use" today.
>>
>>
>> So it seems one is a bit at a loss when looking for an open-source P2P
>> storage system that is built on a network of untrusted nodes. There
>> are some more open-source programs derived from research, e.g.
>> OceanStore, but these seem unmaintained and not actually deployed.
>> I haven't found a P2P backup solution that has:
>>
>> - Deployability on a global scale with untrusted nodes
>> - Secure, private and persistent data storage
>> - Open-source protocol and software
>> - Censorship resistance
>> - Resilience to adversarial attacks
>> - A reasonably simple and manageable design
>>
>> This, however, is the kind of project I'd like to explore further if
>> it is not already available. If anyone is interested, I could briefly
>> describe the design I have in mind in a later post. I'd like to connect
>> with people who have practical experience with P2P networking (other
>> than me) to discuss and further refine design ideas...
>>
>> Thanks for any input you can provide!
>>
>> Best regards,
>> Michael

--
Pietro Michiardi, PhD
Faculty, Networking Dept. Eurecom
http://www.eurecom.fr/~michiard

_______________________________________________
p2p-hackers mailing list
p2p-hackers@lists.zooko.com
http://lists.zooko.com/mailman/listinfo/p2p-hackers