Hi there,
For those of you who are crazy about facts and numbers,
I have attached some data size info for 3 large FSFS
repositories. They are 1.8-format mirrors of the
Apache, KDE and Wordpress repositories. I used
my new fsfs-stats tool to extract the info.
Some of my findings:
* Apache: lots of large zip files added lately
(low overall compression ratio, but the tool does not
yet list zip files etc. as the reason)
* KDE: still larger than Apache with an excellent
compression ratio (lots of large .po files); >1TB expanded
* Wordpress: directory compression eliminated
directory storage overhead (5000% => <10%)
* rep sharing is most effective when you have many
"casual" users (> factor 2 in wordpress; 25% savings
for Apache; insignificant for KDE since po files
are not shared / identical between branches)
* noderevs + changes lists take up 10..30% of
the total repo size, i.e. the actual content is
already well compressed
* more distinct file prop reps than I expected
(probably due to old per-file mergeinfo)
* >50% of all nodes in the Apache repo have props
* rep sharing + deltification brings prop info down
to ~10 bytes / rev for Apache
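
The ratios quoted in the list above can be re-derived from the attached dumps. The following is a minimal sketch, not part of fsfs-stats: the dictionary layout and the function name are made up for illustration, and it assumes the three dumps appear in the order Apache, KDE, Wordpress (which the >1TB expanded size of the second dump and the directory figures of the third suggest). All numbers are copied from the "Global statistics" and "File representation statistics" sections.

```python
# Figures copied from the attached dumps. Assumption: dump order is
# Apache, KDE, Wordpress. Key names are hypothetical, chosen for this sketch.
REPOS = {
    "Apache": dict(total=43_571_200_544, changes=1_719_438_790,
                   noderevs=8_527_042_341, file_reps=31_386_080_727,
                   file_expanded=160_724_606_881,
                   file_unshared=215_992_591_222),
    "KDE": dict(total=42_516_758_377, changes=2_112_852_964,
                noderevs=9_918_750_627, file_reps=28_200_181_907,
                file_expanded=1_087_898_597_508,
                file_unshared=1_126_563_329_834),
    "Wordpress": dict(total=8_233_212_081, changes=336_363_580,
                      noderevs=1_205_197_688, file_reps=6_181_996_505,
                      file_expanded=18_096_237_254,
                      file_unshared=42_360_997_646),
}

def summarize(s):
    """Derive the ratios discussed in the mail from one repo's figures."""
    return {
        # expanded file content vs. bytes actually stored for file reps
        "file_compression": s["file_expanded"] / s["file_reps"],
        # how much larger the expanded file data would be without rep sharing
        "sharing_factor": s["file_unshared"] / s["file_expanded"],
        # fraction of expanded file data saved by rep sharing
        "sharing_savings": 1 - s["file_expanded"] / s["file_unshared"],
        # noderevs + changes lists as a share of the total repo size
        "metadata_share": (s["changes"] + s["noderevs"]) / s["total"],
    }

SUMMARY = {name: summarize(s) for name, s in REPOS.items()}

if __name__ == "__main__":
    for name, r in SUMMARY.items():
        print(f"{name:10s} compression {r['file_compression']:6.1f}x  "
              f"sharing factor {r['sharing_factor']:.2f}  "
              f"metadata {r['metadata_share']:.0%}")
```

Rep sharing is measured here on file representations only, since that is where the "> factor 2" for Wordpress shows up; the global figures are dominated by directory data.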
-- Stefan^2.
Global statistics:
43,571,200,544 bytes in 1,407,978 revisions
1,719,438,790 bytes in 11,919,461 changes
8,527,042,341 bytes in 28,631,286 node revision records
32,606,404,032 bytes in 26,042,259 representations
175,991,665,585 bytes expanded representation size
232,589,088,405 bytes with rep-sharing off
Noderev statistics:
8,527,042,341 bytes in 28,631,286 nodes total
4,529,280,752 bytes in 18,195,547 directory noderevs
3,997,761,589 bytes in 10,435,739 file noderevs
Representation statistics:
32,606,404,032 bytes in 26,042,259 representations total
1,206,577,410 bytes in 17,999,442 directory representations
31,386,080,727 bytes in 7,866,975 file representations
7,936,967 bytes in 102,123 directory property representations
5,808,928 bytes in 73,719 file property representations
703,824,567 bytes in header & footer overhead
Directory representation statistics:
1,206,577,410 bytes in 17,999,442 reps
7,198,044 bytes in 76,251 shared reps
14,900,076,043 bytes expanded size
54,380,469 bytes expanded shared size
15,067,384,452 bytes with rep-sharing off
140,449 shared references
File representation statistics:
31,386,080,727 bytes in 7,866,975 reps
6,957,017,837 bytes in 1,308,907 shared reps
160,724,606,881 bytes expanded size
26,699,217,946 bytes expanded shared size
215,992,591,222 bytes with rep-sharing off
2,568,681 shared references
Directory property representation statistics:
7,936,967 bytes in 102,123 reps
2,435,475 bytes in 30,208 shared reps
236,898,639 bytes expanded size
48,224,988 bytes expanded shared size
959,652,575 bytes with rep-sharing off
3,267,341 shared references
File property representation statistics:
5,808,928 bytes in 73,719 reps
691,141 bytes in 8,936 shared reps
130,084,022 bytes expanded size
4,241,945 bytes expanded shared size
569,460,156 bytes with rep-sharing off
6,554,789 shared references
Global statistics:
42,516,758,377 bytes in 1,325,037 revisions
2,112,852,964 bytes in 18,163,503 changes
9,918,750,627 bytes in 31,461,675 node revision records
29,614,818,603 bytes in 29,269,280 representations
1,114,881,994,595 bytes expanded representation size
1,155,846,558,984 bytes with rep-sharing off
Noderev statistics:
9,918,750,627 bytes in 31,461,675 nodes total
3,641,226,857 bytes in 14,233,846 directory noderevs
6,277,523,770 bytes in 17,227,829 file noderevs
Representation statistics:
29,614,818,603 bytes in 29,269,280 representations total
1,411,801,736 bytes in 14,143,671 directory representations
28,200,181,907 bytes in 15,087,277 file representations
1,465,071 bytes in 17,885 directory property representations
1,369,889 bytes in 20,447 file property representations
856,408,582 bytes in header & footer overhead
Directory representation statistics:
1,411,801,736 bytes in 14,143,671 reps
5,670,142 bytes in 51,339 shared reps
26,884,721,654 bytes expanded size
61,486,365 bytes expanded shared size
26,955,905,794 bytes with rep-sharing off
63,390 shared references
File representation statistics:
28,200,181,907 bytes in 15,087,277 reps
3,087,013,223 bytes in 1,136,350 shared reps
1,087,898,597,508 bytes expanded size
23,485,645,700 bytes expanded shared size
1,126,563,329,834 bytes with rep-sharing off
2,140,551 shared references
Directory property representation statistics:
1,465,071 bytes in 17,885 reps
782,037 bytes in 8,669 shared reps
93,340,801 bytes expanded size
30,873,623 bytes expanded shared size
1,374,010,811 bytes with rep-sharing off
8,070,095 shared references
File property representation statistics:
1,369,889 bytes in 20,447 reps
188,512 bytes in 3,028 shared reps
5,334,632 bytes expanded size
855,812 bytes expanded shared size
953,312,545 bytes with rep-sharing off
9,041,782 shared references
Global statistics:
8,233,212,081 bytes in 507,189 revisions
336,363,580 bytes in 3,473,008 changes
1,205,197,688 bytes in 5,125,527 node revision records
6,610,608,683 bytes in 3,175,300 representations
416,559,053,291 bytes expanded representation size
440,976,526,859 bytes with rep-sharing off
Noderev statistics:
1,205,197,688 bytes in 5,125,527 nodes total
403,048,125 bytes in 2,263,745 directory noderevs
802,149,563 bytes in 2,861,782 file noderevs
Representation statistics:
6,610,608,683 bytes in 3,175,300 representations total
428,471,684 bytes in 2,111,717 directory representations
6,181,996,505 bytes in 1,061,535 file representations
116,243 bytes in 1,742 directory property representations
24,251 bytes in 306 file property representations
75,980,107 bytes in header & footer overhead
Directory representation statistics:
428,471,684 bytes in 2,111,717 reps
5,577,314 bytes in 36,636 shared reps
398,462,596,403 bytes expanded size
79,861,877 bytes expanded shared size
398,549,277,881 bytes with rep-sharing off
42,953 shared references
File representation statistics:
6,181,996,505 bytes in 1,061,535 reps
3,029,368,482 bytes in 446,128 shared reps
18,096,237,254 bytes expanded size
7,064,016,710 bytes expanded shared size
42,360,997,646 bytes with rep-sharing off
1,800,236 shared references
Directory property representation statistics:
116,243 bytes in 1,742 reps
78,252 bytes in 1,100 shared reps
193,351 bytes expanded size
106,096 bytes expanded shared size
4,082,036 bytes with rep-sharing off
68,921 shared references
File property representation statistics:
24,251 bytes in 306 reps
18,453 bytes in 239 shared reps
26,283 bytes expanded size
18,931 bytes expanded shared size
62,169,296 bytes with rep-sharing off
1,213,859 shared references
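
For anyone who wants to post-process these dumps, here is a minimal parsing sketch. It assumes only the line shapes visible in the listings above ("N bytes in M things", "N bytes some label", "N shared references", and section titles ending in a colon); it is not derived from the fsfs-stats source, and the function and field names are hypothetical. It handles one dump at a time, since section titles repeat across the three repositories.

```python
import re

# One entry line: "<number> [bytes] [in] [<count>] <label>".
ENTRY = re.compile(r"^([\d,]+) (bytes )?(?:in )?(?:([\d,]+) )?(.+)$")

def to_int(s):
    """Convert a comma-grouped number such as '1,407,978' to an int."""
    return int(s.replace(",", ""))

def parse_dump(text):
    """Parse one dump into {section title: [entry dict, ...]}."""
    sections = {}
    entries = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.endswith(":"):
            # Section title, e.g. "Global statistics:"
            entries = sections.setdefault(line[:-1], [])
            continue
        m = ENTRY.match(line)
        if m is None or entries is None:
            continue  # skip blank or unrecognized lines
        value, unit, count, label = m.groups()
        if unit:
            # "<bytes> bytes [in <count>] <label>"
            entry = {"label": label, "bytes": to_int(value),
                     "count": to_int(count) if count else None}
        else:
            # "<count> <label>", e.g. "140,449 shared references"
            entry = {"label": label, "bytes": None, "count": to_int(value)}
        entries.append(entry)
    return sections

# A few lines copied from the first dump above.
SAMPLE = """\
Directory representation statistics:
1,206,577,410 bytes in 17,999,442 reps
7,198,044 bytes in 76,251 shared reps
14,900,076,043 bytes expanded size
140,449 shared references
"""

if __name__ == "__main__":
    for title, items in parse_dump(SAMPLE).items():
        print(title)
        for item in items:
            print("  ", item)
```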