Hi David, I'm sure Zooko will chime in a bit more later, but he's in transit right now, so I'll give some ballpark information to get started.
Below is a provisioning page where we try to give some guidance on setting up a grid:

  http://tahoebs1.allmydata.com:8123/provisioning/

The way we currently run is with 3.3x expansion, which means that for every 1MB of input data we generate 3.3MB of encoded data (plus a bit of overhead for directory and file management, but let's ignore that since it's relatively small). This gives us very good reliability for the types of servers that we manage.

The current distribution algorithm for the data shares does not provide many knobs to tweak yet, and you don't have much control over where the shares are sent in the grid. That said, we are planning to add the ability to direct share distribution in various ways (node quality, geography, network topology, etc.).

One example scenario that we've thought a bit about: make sure that each colocation facility almost always has enough shares to completely rebuild the file. So, if you have k facilities, you'd want to set your expansion factor to k+1. That way you could completely rebuild any file locally and not incur any external (expensive) network traffic to do so.

Hope this helps,
Peter

David Barrett wrote:
> If I store a 1MB file in Tahoe, how much total storage (ballpark) does
> it use summed across all nodes?
>
> I'm reading the "expansion factor" section of the architecture.txt file
> and it says:
>
>> In general, small private grids should work well, but the participants will
>> have to decide between storage overhead and reliability. Large stable grids
>> will be able to reduce the expansion factor down to a bare minimum while
>> still retaining high reliability, but large unstable grids (where nodes are
>> coming and going very quickly) may require more repair/verification bandwidth
>> than actual upload/download traffic.
>
> What's a reasonable estimate of the total storage capacity of a Tahoe
> grid across 100 servers, each devoting 10GB of storage?
> The total disk
> capacity would be 1000GB, but with encoding and replication, does that
> mean we could store 999GB of data? Or 750? Or 500? Less?
>
> Similarly, what are reasonable configuration values for this, assuming
> each server has >90% uptime and the 100 are split between 4 clusters
> of 25.
>
> (Can we configure Tahoe to avoid storing in servers inside the same
> cluster?)
>
> Just trying to wrap my head around Tahoe's capabilities while
> considering its utility for harvesting extra space within a large
> server cluster.
>
> Thanks!
>
> -david
>
> _______________________________________________
> p2p-hackers mailing list
> [email protected]
> http://lists.zooko.com/mailman/listinfo/p2p-hackers
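P.S. For anyone who wants to check the arithmetic in this thread, here is a rough back-of-the-envelope sketch in Python. The k/n share counts below are illustrative assumptions, not Tahoe's actual defaults, and I use "facilities" for the colo count so it doesn't collide with the erasure-coding k.

```python
# Back-of-the-envelope numbers for the thread above.
# Assumption: erasure coding with k "needed" shares out of n total
# shares, so expansion = n / k.

def expansion(k, n):
    """Encoded bytes generated per input byte."""
    return n / k

def usable_capacity(raw_gb, k, n):
    """Approximate user data a grid can hold, ignoring metadata overhead."""
    return raw_gb * k / n

def shares_per_facility(n, facilities):
    """Shares landing in each facility, assuming an even spread."""
    return n // facilities

# The 3.3x expansion mentioned above: e.g. k=3, n=10 gives ~3.33x.
print(round(expansion(3, 10), 2))           # 3.33

# David's grid: 100 servers x 10GB = 1000GB raw, so roughly
# 1000GB * k/n of actual user data.
print(round(usable_capacity(1000, 3, 10)))  # 300

# The colocation scenario: with f facilities, a facility can rebuild a
# file locally only if it holds at least k shares, i.e. n >= f * k,
# i.e. expansion >= f (the "facilities + 1" rule above adds margin so
# one lost share doesn't break local rebuild).
f = 4
k, n = 3, 3 * f   # expansion factor equal to the facility count
assert shares_per_facility(n, f) >= k
```

This is just arithmetic over the stated parameters, not Tahoe code; the real client's share placement is not guaranteed to spread shares evenly across facilities (which is exactly the control Peter says is planned).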
