Dear list,

we have replicated gluster volumes that are much slower than their brick 
disks, and I wonder whether this is a configuration issue, a conceptual issue 
with our setup, or really just how slow gluster is.

The setup:

  *   Three servers in a ring, connected via IP over InfiniBand with 100 Gb/s 
on each link
  *   2 x 2 TB U.2 SSDs as RAID1 on a MegaRAID controller, attached via NVMe 
links, on each node
  *   A 3x replicated volume on an LVM thin pool on the SSD RAIDs (created 
roughly as sketched after this list)
  *   2 x 26 cores and plenty of RAM per node
  *   The volumes are mounted locally and used as shared disks for computation 
jobs (CPU and GPU) on the nodes.
  *   The LVM thin pool is shared with other volumes and with a cache pool for 
the hard disks.
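
For reference, the volume corresponds to a replica-3 create along these lines 
(hostnames, volume name and brick paths are placeholders, not our real ones):

    gluster volume create gv0 replica 3 \
        node1:/data/thinlv/gv0/brick \
        node2:/data/thinlv/gv0/brick \
        node3:/data/thinlv/gv0/brick
    gluster volume start gv0
    # mounted locally on every node via FUSE:
    mount -t glusterfs localhost:/gv0 /mnt/gv0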

The measurement:

  *   iozone in fully automatic mode, on the gluster volume (set1) and on the 
locally mounted brick disk (baseline) - roughly as sketched below
  *   Results compared with iozone_results_comparator.py
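
The runs looked roughly like this, once inside the brick's mount point and 
once inside the gluster mount (file names are examples; I am citing the 
comparator's options from memory, so double-check them):

    # baseline: run in the brick filesystem; set1: run in the gluster mount
    iozone -a > baseline.txt
    iozone -a > set1.txt
    # compare the two reports and generate the HTML/summary output:
    ./iozone_results_comparator.py --baseline baseline.txt --set1 set1.txt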

The issue:

  *   Smaller files are around a factor of 2 slower than larger files on the 
SSD, but a factor of 6-10 slower than larger files on the gluster volume 
(somewhat expected)
  *   Larger files are, except for certain read accesses, still a factor of 
3-10 slower on the gluster volume than on the SSD RAID directly, depending on 
the operation.
  *   Operations on many, admittedly smaller, files (checkouts, copying, rsync 
and unpacking) can stretch into hours where they take tens of seconds to a few 
minutes on the disk directly (a simple reproducer is sketched after this list).
  *   atop sometimes shows 9x% busy on some of the LVM block devices, and 
sometimes the avio value jumps from 4-26 ms to 3000-5000 ms. Otherwise nothing 
noteworthy shows up in the system load.
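
A simple way to reproduce the small-file case (archive name and target paths 
are just examples):

    # same node, same archive; only the target filesystem differs
    time tar -xf linux-6.1.tar.xz -C /data/scratch     # local LV: minutes at most
    time tar -xf linux-6.1.tar.xz -C /mnt/gv0/scratch  # gluster mount: up to hours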

This does not seem to be network-bound. Accessing the volume from a remote VM 
via a glusterfs mount is not much slower, despite connecting via 10 GbE 
instead of 100 Gb/s. Nethogs shows sent traffic on the InfiniBand interface 
going up to 30000 kB/s for large records - that is 240 Mb/s over a 100 Gb/s 
interface ...
The machines themselves are mostly idle - no iowait, a load average of 6 on 
104 logical CPUs. Glusterfsd occasionally pushes up to 10% CPU. I just do not 
see where the bottleneck could be.
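
If it helps to localize the latency, I can collect output from gluster's 
built-in profiling, which as far as I understand reports per-brick FOP counts 
and latencies ("gv0" again stands in for our volume name):

    gluster volume profile gv0 start
    # ... run a representative workload ...
    gluster volume profile gv0 info    # cumulative per-brick statistics
    gluster volume profile gv0 stop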

Below is a summary of the compared iozone reports.

Any hints on whether it is worth trying to optimize this at all, and where to 
start looking? I believe I have checked all the "standard" hints (along the 
lines of the options sketched below), but might have missed something.
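
For reference, these are the kinds of small-file options I understand the 
usual tuning guides to suggest (availability and defaults depend on the 
gluster version, so please treat this as a sketch rather than a recipe):

    gluster volume set gv0 performance.parallel-readdir on
    gluster volume set gv0 performance.nl-cache on
    gluster volume set gv0 features.cache-invalidation on
    gluster volume set gv0 performance.cache-invalidation on
    gluster volume set gv0 network.inode-lru-limit 200000
    gluster volume set gv0 cluster.lookup-optimize on
    gluster volume set gv0 server.event-threads 4
    gluster volume set gv0 client.event-threads 4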

The alternative would be a single storage node exported via NFS, sacrificing 
the redundancy/high availability of the replicated filesystem, which we 
probably do not need anyway.

Thanks a lot for any comments in advance,

Best
Rupert
[summary]

Per-operation throughput statistics as reported by iozone_results_comparator.py;
"baseline" is the brick disk, "set1" the gluster volume.

baseline:

Operation      |  1st qu. |   median |  3rd qu. |  minimum |  maximum |     mean |  std dev | ci90 min | ci90 max | geo mean
Write          |  1260.16 |  1474.31 |  1595.17 |   194.64 |  2041.55 |  1408.31 |   306.93 |   1363.0 |  1453.62 |  1362.58
Re-write       |  1894.11 |   2145.7 |   2318.0 |   960.92 |  2819.84 |  2070.62 |   366.43 |  2016.52 |  2124.71 |   2034.2
Read           |  3171.44 |   3637.6 |  3992.78 |  1785.65 |  5628.61 |  3663.03 |   652.73 |  3566.67 |  3759.39 |  3604.92
Re-read        |  3002.55 |  3383.24 |   3840.0 |  2152.15 |  7342.68 |  3505.74 |    816.6 |  3385.19 |   3626.3 |  3422.21
Random read    |  3679.12 |  4167.16 |  4803.77 |  1840.86 |  7142.61 |  4231.31 |  1048.38 |  4076.54 |  4386.08 |  4094.96
Random write   |  2214.07 |  2570.24 |  3013.13 |   947.03 |  4776.16 |  2633.07 |   781.84 |  2517.65 |   2748.5 |  2511.93
Backwards read |  2987.15 |  3435.44 |  3864.87 |   1491.1 |  6091.17 |  3444.43 |   796.13 |   3326.9 |  3561.96 |  3346.74
Record rewrite |   2701.4 |  3751.41 |  4420.33 |  1135.78 | 15908.02 |  4102.42 |  2277.33 |  3766.21 |  4438.62 |  3637.05
Strided Read   |  3361.62 |  3967.55 |  4673.13 |  1732.88 |   6922.3 |  4029.73 |  1041.12 |  3876.03 |  4183.44 |  3892.76
Fwrite         |  1783.45 |  2059.25 |   2354.6 |   999.01 |  3540.22 |  2087.25 |   511.69 |  2011.71 |  2162.79 |  2024.11
Frewrite       |   1739.6 |  1992.64 |  2258.15 |   933.54 |  3787.35 |  2027.23 |   464.33 |  1958.68 |  2095.78 |  1975.46
Fread          |  3231.97 |  3611.78 |  4036.73 |  1745.15 |   5933.0 |   3641.8 |   687.27 |  3540.34 |  3743.27 |  3576.11
Freread        |   3405.8 |  3936.96 |  4570.02 |  2083.72 |  7510.19 |  4113.02 |  1012.13 |   3963.6 |  4262.44 |  3996.51
ALL            |  2152.28 |  3125.35 |  3915.54 |   194.64 | 15908.02 |  3150.61 |  1335.36 |  3096.31 |  3204.91 |  2883.44

set1:

Operation      |  1st qu. |   median |  3rd qu. |  minimum |  maximum |     mean |  std dev | ci90 min | ci90 max | geo mean
Write          |    173.4 |   312.05 |   414.07 |    44.11 |   606.95 |   308.37 |   154.66 |   285.54 |    331.2 |   260.27
Re-write       |   174.43 |   293.47 |   410.15 |     45.6 |   610.49 |   309.17 |   157.37 |   285.93 |    332.4 |   260.16
Read           |   208.58 |   267.53 |   401.42 |    59.07 |   4457.8 |   544.27 |   892.64 |   412.49 |   676.05 |   318.88
Re-read        |  2813.71 |  3134.22 |  3446.69 |  2089.07 |  6248.75 |  3181.18 |   635.03 |  3087.43 |  3274.93 |  3127.01
Random read    |  3498.77 |  4005.72 |   4429.1 |  2004.07 |  6153.85 |  3964.96 |   825.98 |  3843.02 |   4086.9 |  3872.62
Random write   |   139.63 |   332.09 |   436.45 |     9.39 |   666.55 |   305.76 |   183.62 |   278.65 |   332.87 |   220.54
Backwards read |    51.51 |   110.38 |   178.94 |     8.67 |   374.16 |   123.49 |    87.03 |   110.65 |   136.34 |    86.71
Record rewrite |    91.49 |   302.02 |    429.6 |     2.77 |   696.91 |   282.89 |   191.43 |   254.63 |   311.15 |    174.2
Strided Read   |   124.34 |   216.41 |   397.15 |    32.58 |  2593.43 |   402.01 |   493.51 |   329.15 |   474.86 |   243.43
Fwrite         |   201.55 |   359.67 |   463.62 |    38.63 |   647.26 |   337.11 |   160.15 |   313.47 |   360.76 |   285.24
Frewrite       |   199.05 |   355.33 |   480.29 |    41.09 |   700.02 |   342.11 |    173.1 |   316.56 |   367.67 |   286.02
Fread          |   187.72 |   234.42 |   317.98 |     26.0 |   451.39 |   248.76 |   100.31 |   233.95 |   263.57 |   223.44
Freread        |  2509.21 |  2742.35 |  3092.73 |  1359.63 |  4353.18 |  2780.37 |   484.33 |  2708.87 |  2851.87 |  2737.93
ALL            |   189.36 |   357.31 |   686.35 |     2.77 |  6248.75 |  1010.03 |  1358.34 |    954.8 |  1065.27 |   414.26

set1 vs. baseline:

Operation      | lin. regr. slope 90% | t-test equality | difference | t-test p-value
Write          |          0.17 - 0.28 |            DIFF |    -78.1 % |            0.0
Re-write       |           0.1 - 0.21 |            DIFF |   -85.07 % |            0.0
Read           |         -0.05 - 0.35 |            DIFF |   -85.14 % |            0.0
Re-read        |          0.83 - 0.97 |            DIFF |    -9.26 % |         0.0005
Random read    |           0.85 - 1.0 |            DIFF |    -6.29 % |          0.026
Random write   |          0.08 - 0.15 |            DIFF |   -88.39 % |            0.0
Backwards read |          0.02 - 0.05 |            DIFF |   -96.41 % |            0.0
Record rewrite |          0.04 - 0.07 |            DIFF |    -93.1 % |            0.0
Strided Read   |          0.04 - 0.17 |            DIFF |   -90.02 % |            0.0
Fwrite         |           0.12 - 0.2 |            DIFF |   -83.85 % |            0.0
Frewrite       |          0.13 - 0.22 |            DIFF |   -83.12 % |            0.0
Fread          |          0.05 - 0.09 |            DIFF |   -93.17 % |            0.0
Freread        |           0.6 - 0.72 |            DIFF |    -32.4 % |            0.0
ALL            |          0.28 - 0.35 |            DIFF |   -67.94 % |            0.0


