Re: [ceph-users] client with uid

2018-02-06 Thread Patrick Donnelly
On Mon, Feb 5, 2018 at 9:08 AM, Keane Wolter  wrote:
> Hi Patrick,
>
> Thanks for the info. Looking at the fuse options in the man page, I should
> be able to pass "-o uid=$(id -u)" at the end of the ceph-fuse command.
> However, when I do, it returns with an unknown option for fuse and
> segfaults. Any pointers would be greatly appreciated. This is the result I
> get:

I'm not familiar with that uid= option; you'll have to redirect that
question to the FUSE devs. (However, I don't think it does what you want
it to: it says it only hard-codes the st_uid field returned by stat.)

> daemoneye@wolterk:~$ ceph-fuse --id=kwolter_test1 -r /user/kwolter/
> /home/daemoneye/ceph/ --client-die-on-failed-remount=false -o uid=$(id -u)
> ceph-fuse[25156]: starting ceph client
> fuse: unknown option `uid=1000'
> ceph-fuse[25156]: fuse failed to start
> *** Caught signal (Segmentation fault) **
>  in thread 7efc7da86100 thread_name:ceph-fuse
>  ceph version 12.2.2 (cf0baba3b47f9427c6c97e2144b094b7e5ba) luminous
> (stable)
>  1: (()+0x6a8784) [0x5583372d8784]
>  2: (()+0x12180) [0x7efc7bb4f180]
>  3: (Client::_ll_drop_pins()+0x67) [0x558336e5dea7]
>  4: (Client::unmount()+0x943) [0x558336e67323]
>  5: (main()+0x7ed) [0x558336e02b0d]
>  6: (__libc_start_main()+0xea) [0x7efc7a892f2a]
>  7: (_start()+0x2a) [0x558336e0b73a]
> ceph-fuse [25154]: (33) Numerical argument out of domain
> daemoneye@wolterk:~$

I wasn't able to reproduce this.

-- 
Patrick Donnelly
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] OSD Segfaults after Bluestore conversion

2018-02-06 Thread Kyle Hutson
We had a 26-node production Ceph cluster which we upgraded to Luminous a
little over a month ago. I added a 27th node with Bluestore and didn't have
any issues, so I began converting the others, one at a time. The first two
went off pretty smoothly, but the 3rd is doing something strange.

Initially, all the OSDs came up fine, but then some started to segfault.
Out of curiosity more than anything else, I did reboot the server to see if
it would get better or worse, and it pretty much stayed the same: 12 of
the 18 OSDs did not properly come up. Of those, 3 again segfaulted.

I picked one that didn't properly come up and copied the log to where
anybody can view it:
http://people.beocat.ksu.edu/~kylehutson/ceph-osd.426.log

You can contrast that with one that is up:
http://people.beocat.ksu.edu/~kylehutson/ceph-osd.428.log

(which is still showing segfaults in the logs, but seems to be recovering
from them OK?)

Any ideas?
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] object lifecycle scope

2018-02-06 Thread Robert Stanford
 Hello Ceph users.  Is object lifecycle (currently expiration) for rgw
implementable on a per-object basis, or is the smallest scope the bucket?
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RBD device as SBD device for pacemaker cluster

2018-02-06 Thread Lars Marowsky-Bree
On 2018-02-06T13:00:59, Kai Wagner  wrote:

> I had the idea to use an RBD device as the SBD device for a pacemaker
> cluster, so I don't have to fiddle with multipathing and all that stuff.
> Has someone already tested this somewhere and can tell how the cluster
> reacts to it?

SBD should work on top of RBD; any shared block device will do. I'd
recommend slightly higher timeouts than normal; check how/if the Ceph
cluster blocks IO during recovery.
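
A rough sketch of the setup I have in mind (image name, size and the
timeout values below are illustrative only; derive the timeouts from how
long your cluster can realistically block IO):

rbd create rbd/sbd --size 16            # 16 MB is plenty for an SBD device
rbd map rbd/sbd                         # shows up as e.g. /dev/rbd0
sbd -d /dev/rbd0 -1 30 -4 60 create     # watchdog 30s, msgwait 60s, i.e. above the usual defaults
sbd -d /dev/rbd0 dump                   # verify header and timeouts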


Regards,
Lars

-- 
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 
(AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] High apply latency

2018-02-06 Thread Frédéric Nass

Hi Jakub,


On 06/02/2018 at 16:03, Jakub Jaszewski wrote:

Hi Frederic,

I've not enabled debug-level logging on all OSDs, just on the one for the
test; I need to double-check that.
But it looks like merging is ongoing on a few OSDs, or some OSDs are
faulty; I will dig into that tomorrow.

Write bandwidth is very random


I just reread the whole thread:

- Splitting is not happening anymore (if it ever did), that's for sure.
- Regarding the write bandwidth variations, it seems that these
variations only concern the EC 6+3 pools.
- As you get more than 1.2 GB/s on replicated pools with 4 MB writes, I
would think that neither the NVMe, the PERC, nor the HDDs are to blame.


Did you check the CPU load during EC 6+3 writes on pool
default.rgw.buckets.data?


If you don't see any CPU pegged at 100%, nor any 100% utilization in
iostat on either the NVMe disk or the HDDs, then I would benchmark the
network for bandwidth or latency issues.
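
Something like this is what I have in mind while the bench is running
(illustrative commands; adjust device names and target hosts):

iostat -xm 2                            # look for %util pinned near 100% on the NVMe or HDDs
sar -u 2                                # or top; look for saturated cores during EC writes
iperf3 -c <other-osd-host>              # raw TCP bandwidth between two OSD hosts
ping -c 100 -s 8192 <other-osd-host>    # rough latency check with larger payloads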


BTW, did you see that some of your OSDs were not tagged as 'hdd' (ceph
osd df tree)?





# rados bench -p default.rgw.buckets.data 120 write
hints = 1
Maintaining 16 concurrent writes of 4194432 bytes to objects of size 4194432 for up to 120 seconds or 0 objects

Object prefix: benchmark_data_sg08-09_59104
  sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat(s)  avg lat(s)

0       0         0         0         0         0  -           0
1      16       155       139    555.93   556.017  0.0750027     0.10687
2      16       264       248   495.936   436.013 0.154185    0.118693
3      16       330       314   418.616   264.008 0.118476    0.142667
4      16       415       399   398.953    340.01  0.0873379     0.15102
5      16       483       467   373.557   272.008 0.750453    0.159819
6      16       532       516   343.962   196.006  0.0298334    0.171218
7      16       617       601   343.391    340.01 0.192698    0.177288
8      16       700       684   341.963    332.01  0.0281355    0.171277
9      16       762       746   331.521   248.008  0.0962037    0.163734
 10      16       804       788   315.167   168.005  1.40356    0.196298
 11      16       897       881    320.33   372.011  0.0369085     0.19496
 12      16       985       969   322.966   352.011  0.0290563    0.193986
 13      15      1106      1091   335.657   488.015  0.0617642    0.188703
 14      16      1166      1150   328.537   236.007  0.0401884    0.186206
 15      16      1251      1235   329.299    340.01 0.171256    0.190974
 16      16      1339      1323   330.716   352.011 0.024222    0.189901
 17      16      1417      1401   329.613    312.01  0.0289473    0.186562
 18      16      1465      1449   321.967   192.006 0.028123    0.189153
 19      16      1522      1506    317.02   228.007 0.265448    0.188288
2018-02-06 13:43:21.412512 min lat: 0.0204657 max lat: 3.61509 avg lat: 0.18918
  sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat(s)  avg lat(s)

 20      16      1564      1548   309.568   168.005  0.0327581     0.18918
 21      16      1636      1620    308.54   288.009  0.0715159    0.187381
 22      16      1673      1657   301.242   148.005  1.57285    0.191596
 23      16      1762      1746   303.621   356.011  6.00352    0.206217
 24      16      1885      1869   311.468   492.015  0.0298435    0.203874
 25      16      2010      1994   319.008   500.015  0.0258761    0.199652
 26      16      2116      2100   323.044   424.013  0.0533319     0.19631
 27      16      2201      2185    323.67    340.01 0.134796    0.195953
 28      16      2257      2241    320.11   224.007 0.473629    0.196464
 29      16      2333      2317   319.554   304.009  0.0362741    0.198054
 30      16      2371      2355   313.968   152.005 0.438141    0.200265
 31      16      2459      2443   315.194   352.011  0.0610629    0.200858
 32      16      2525      2509   313.593   264.008  0.0234799    0.201008
 33      16      2612      2596   314.635   348.011 0.072019    0.199094
 34      16      2682      2666   313.615   280.009  0.10062    0.197586
 35      16      2757      2741   313.225   300.009  0.0552581    0.196981
 36      16      2849      2833   314.746   368.011 0.257323     0.19565
 37      16      2891      2875   310.779   168.005  0.0918386     0.19556
 38      16      2946      2930    308.39   220.007  0.0276621    0.195792
 39      16      2975      2959   303.456   116.004  0.0588971     0.19952
2018-02-06 13:43:41.415107 min lat: 0.0204657 max lat: 7.9873 avg lat: 0.198749
  sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat(s)  avg lat(s)

 40      16      3060      3044   304.369    340.01  0.0217136    0.198749
 41      16      3098      3082   300.652   152.005  0.0717398    0.199052
 42      16      3141      3125   297.589   172.005  0.0257422    0.201899
 43      15      3241      3226   300.063   404.012  0.0733869    0.209446
 44      16      3332      3316   301.424   360.011  0.0327249    0.206686
 45      16      3430      3414   303.436   392.012  0.0413156    

Re: [ceph-users] High apply latency

2018-02-06 Thread Jakub Jaszewski
Hi Frederic,

I've not enabled debug-level logging on all OSDs, just on the one for the
test; I need to double-check that.
But it looks like merging is ongoing on a few OSDs, or some OSDs are
faulty; I will dig into that tomorrow.
Write bandwidth is very random

# rados bench -p default.rgw.buckets.data 120 write
hints = 1
Maintaining 16 concurrent writes of 4194432 bytes to objects of size 4194432 for up to 120 seconds or 0 objects
Object prefix: benchmark_data_sg08-09_59104
  sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat(s)  avg lat(s)
    0       0         0         0         0         0            -           0
    1      16       155       139    555.93   556.017    0.0750027     0.10687
    2      16       264       248   495.936   436.013     0.154185    0.118693
    3      16       330       314   418.616   264.008     0.118476    0.142667
    4      16       415       399   398.953    340.01    0.0873379     0.15102
    5      16       483       467   373.557   272.008     0.750453    0.159819
    6      16       532       516   343.962   196.006    0.0298334    0.171218
    7      16       617       601   343.391    340.01     0.192698    0.177288
    8      16       700       684   341.963    332.01    0.0281355    0.171277
    9      16       762       746   331.521   248.008    0.0962037    0.163734
   10      16       804       788   315.167   168.005      1.40356    0.196298
   11      16       897       881    320.33   372.011    0.0369085     0.19496
   12      16       985       969   322.966   352.011    0.0290563    0.193986
   13      15      1106      1091   335.657   488.015    0.0617642    0.188703
   14      16      1166      1150   328.537   236.007    0.0401884    0.186206
   15      16      1251      1235   329.299    340.01     0.171256    0.190974
   16      16      1339      1323   330.716   352.011     0.024222    0.189901
   17      16      1417      1401   329.613    312.01    0.0289473    0.186562
   18      16      1465      1449   321.967   192.006     0.028123    0.189153
   19      16      1522      1506    317.02   228.007     0.265448    0.188288
2018-02-06 13:43:21.412512 min lat: 0.0204657 max lat: 3.61509 avg lat: 0.18918
  sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat(s)  avg lat(s)
   20      16      1564      1548   309.568   168.005    0.0327581     0.18918
   21      16      1636      1620    308.54   288.009    0.0715159    0.187381
   22      16      1673      1657   301.242   148.005      1.57285    0.191596
   23      16      1762      1746   303.621   356.011      6.00352    0.206217
   24      16      1885      1869   311.468   492.015    0.0298435    0.203874
   25      16      2010      1994   319.008   500.015    0.0258761    0.199652
   26      16      2116      2100   323.044   424.013    0.0533319     0.19631
   27      16      2201      2185    323.67    340.01     0.134796    0.195953
   28      16      2257      2241    320.11   224.007     0.473629    0.196464
   29      16      2333      2317   319.554   304.009    0.0362741    0.198054
   30      16      2371      2355   313.968   152.005     0.438141    0.200265
   31      16      2459      2443   315.194   352.011    0.0610629    0.200858
   32      16      2525      2509   313.593   264.008    0.0234799    0.201008
   33      16      2612      2596   314.635   348.011     0.072019    0.199094
   34      16      2682      2666   313.615   280.009      0.10062    0.197586
   35      16      2757      2741   313.225   300.009    0.0552581    0.196981
   36      16      2849      2833   314.746   368.011     0.257323     0.19565
   37      16      2891      2875   310.779   168.005    0.0918386     0.19556
   38      16      2946      2930    308.39   220.007    0.0276621    0.195792
   39      16      2975      2959   303.456   116.004    0.0588971     0.19952
2018-02-06 13:43:41.415107 min lat: 0.0204657 max lat: 7.9873 avg lat: 0.198749
  sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat(s)  avg lat(s)
   40      16      3060      3044   304.369    340.01    0.0217136    0.198749
   41      16      3098      3082   300.652   152.005    0.0717398    0.199052
   42      16      3141      3125   297.589   172.005    0.0257422    0.201899
   43      15      3241      3226   300.063   404.012    0.0733869    0.209446
   44      16      3332      3316   301.424   360.011    0.0327249    0.206686
   45      16      3430      3414   303.436   392.012    0.0413156    0.203727
   46      16      3534      3518   305.882   416.013     0.033638    0.202182
   47      16      3602      3586   305.161   272.008    0.0453557    0.200028
   48      16      3663      3647   303.886   244.007    0.0779019    0.199777
   49      16      3736      3720   303.643   292.009    0.0285231    0.206274
   50      16      3849      3833   306.609   452.014    0.0537071    0.208127
   51      16      3909      3893   305.303   240.007    0.0366709    0.207793
   52      16      3972      3956   304.277   252.008    0.0289131    0.207989
   53      16      4048      4032   304.272   304.009    0.0348617    0.207844
   54      16      4114      4098   303.525   264.008    0.0799526     0.20701
   55      16

Re: [ceph-users] RBD device as SBD device for pacemaker cluster

2018-02-06 Thread Wido den Hollander



On 02/06/2018 01:00 PM, Kai Wagner wrote:

Hi all,

I had the idea to use an RBD device as the SBD device for a pacemaker
cluster, so I don't have to fiddle with multipathing and all that stuff.
Has someone already tested this somewhere and can tell how the cluster
reacts to it?

I think this shouldn't be a problem, but I'm just wondering if there's
anything that I'm not aware of?



I do think it will work. You might need to disable the exclusive-lock
feature, but I'm not sure about that.


I thought I heard of somebody using it a while back, but I don't remember
the details anymore.


Wido


Thanks

Kai





___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] RBD device as SBD device for pacemaker cluster

2018-02-06 Thread Kai Wagner
Hi all,

I had the idea to use an RBD device as the SBD device for a pacemaker
cluster, so I don't have to fiddle with multipathing and all that stuff.
Has someone already tested this somewhere and can tell how the cluster
reacts to it?

I think this shouldn't be a problem, but I'm just wondering if there's
anything that I'm not aware of?

Thanks

Kai

-- 
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 
(AG Nürnberg)




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Changing osd crush chooseleaf type at runtime

2018-02-06 Thread Flemming Frandsen

Ah! Right, I guess my actual question was:

How do osd crush chooseleaf type = 0 and 1 alter the crushmap?


By experimentation I've figured out that:

"osd crush chooseleaf type = 0" turns into "step choose firstn 0 type 
osd" and


"osd crush chooseleaf type = 1" turns into "step chooseleaf firstn 0 
type host".
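
In crushmap terms the two resulting replicated rules look roughly like
this (a sketch; rule names and ids are illustrative, and the exact
keywords may differ between releases):

rule replicated_osd {
        id 0
        type replicated
        min_size 1
        max_size 10
        step take default
        step choose firstn 0 type osd
        step emit
}

rule replicated_host {
        id 1
        type replicated
        min_size 1
        max_size 10
        step take default
        step chooseleaf firstn 0 type host
        step emit
}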



Changing the crushmap in this way worked perfectly for me; ceph -s
complained while the rebalancing was in progress, but eventually it was
happy with the result.
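
For reference, the dump/edit/import cycle looks roughly like this (file
names are illustrative):

ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt
# edit crushmap.txt, e.g. switch the rule to "step chooseleaf firstn 0 type host"
crushtool -c crushmap.txt -o crushmap.new
ceph osd setcrushmap -i crushmap.new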



On 02/02/18 17:07, Gregory Farnum wrote:
Once you've created a crush map you need to edit it directly (either 
by dumping it from the cluster, editing with the crush tool, and 
importing; or via the ceph cli commands), rather than by updating 
config settings. I believe doing so is explained in the ceph docs.


On Fri, Feb 2, 2018 at 4:47 AM Flemming Frandsen wrote:


Hi, I'm just starting to play around with Ceph, so please excuse my
complete lack of a clue if this question is covered somewhere, but I
have been unable to find an answer.


I have a single machine running Ceph which was set up with osd crush
chooseleaf type = 0 in /etc/ceph/ceph.conf. Now I've added a new machine
with some new OSDs, so I'd like to change to osd crush chooseleaf type = 1
and have Ceph re-balance the replicas.

How do I do that?

Preferably I'd like to make the change without making the cluster
unavailable.


So far I've edited the config file and tried restarting daemons,
including rebooting the entire OS, but I still see PGs that live only on
one host.

I've read the config documentation page but it doesn't mention what to
do to make that specific config change take effect:

http://docs.ceph.com/docs/master/rados/configuration/ceph-conf/

I've barked up the crushmap tree a bit, but I did not see how "osd crush
chooseleaf type" relates to that in any way.


--
  Regards Flemming Frandsen - Stibo Systems - DK - STEP Release Manager
  Please use rele...@stibo.com for all Release Management requests




--
 Regards Flemming Frandsen - Stibo Systems - DK - STEP Release Manager
 Please use rele...@stibo.com for all Release Management requests

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] resolved - unusual growth in cluster after replacing journalSSDs

2018-02-06 Thread Jogi Hofmüller
Dear all,

we finally found the reason for the unexpected growth in our cluster. 
The data was created by a collectd plugin [1] that measures latency by
running rados bench once a minute.  Since our cluster was stressed out
for a while, removing the objects created by rados bench failed.  We
completely overlooked the log messages that should have given us the
hint a lot earlier.  e.g.:

Jan 18 23:26:09 ceph1 ceph-osd: 2018-01-18 23:26:09.931638
7f963389f700  0 -- IP:6802/1986 submit_message osd_op_reply(374
benchmark_data_ceph3_31746_object158 [delete] v21240'22867646
uv22867646 ack = 0) v7 remote, IP:0/3091801967, failed lossy con,
dropping message 0x7f96672a6680

Over time we "collected" some 1.5TB of benchmark data :(

Furthermore, due to a misunderstanding, the collectd plugin that runs
the benchmarks was running on two machines, doubling the stress on the
cluster.

And finally we created benchmark data in our main production pool,
which also was a bad idea.
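
In case it helps anyone, the cleanup itself boils down to something like
this (pool name is a placeholder; rados has a cleanup subcommand for
objects written by rados bench):

rados -p <pool> ls | grep '^benchmark_data' | head    # see what is lying around
rados -p <pool> cleanup --prefix benchmark_data       # remove the leftover bench objects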

Hope this info will be useful for someone :)

[1]  https://github.com/rochaporto/collectd-ceph

Cheers,
-- 
J.Hofmüller
We are all idiots with deadlines.
- Mike West


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] how to delete a cluster network

2018-02-06 Thread Александр Пивушков
Hello! My cluster uses two networks.
In ceph.conf there are two entries: public_network = 10.53.8.0/24 and
cluster_network = 10.0.0.0/24.
Servers and clients are connected to one switch.
To store data in Ceph, the clients use CephFS:
10.53.8.141:6789,10.53.8.143:6789,10.53.8.144:6789:/ on /mnt/mycephfs type
ceph (rw,noatime,name=admin,secret=,acl)
How can I leave only one network for the whole cluster and the clients,
10.0.0.0/24 (which is now the cluster_network), and remove the network
10.53.8.0/24 (which is now the public_network)?
The cluster has 5 clients, 3 MONs and 1 MDS.
Editing ceph.conf on the entire cluster (deleting the cluster_network
entry) and restarting the MON services (and even rebooting all three MONs)
did not help.


Александр Пивушков
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Latency for the Public Network

2018-02-06 Thread Christian Balzer

Hello,

On Tue, 6 Feb 2018 09:21:22 +0100 Tobias Kropf wrote:

> On 02/06/2018 04:03 AM, Christian Balzer wrote:
> > Hello,
> >
> > On Mon, 5 Feb 2018 22:04:00 +0100 Tobias Kropf wrote:
> >  
> >> Hi ceph list,
> >>
> >> we have a hyperconvergent ceph cluster with kvm on 8 nodes with ceph
> >> hammer 0.94.10.   
> > Do I smell Proxmox?  
> Yes, we currently use Proxmox
> >  
> >> The cluster is now 3 years old and we are planning a new
> >> cluster for a high iops project. We use replicated pools 3/2 and have
> >> not the best latency on our switch backend.
> >>
> >>
> >> ping -s 8192 10.10.10.40 
> >>
> >> 8200 bytes from 10.10.10.40: icmp_seq=1 ttl=64 time=0.153 ms
> >>  
> > Not particularly great, yes.
> > However your network latency is only one factor, Ceph OSDs add quite
> > another layer there and do affect IOPS even more usually. 
> > For high IOPS you need of course fast storage, network AND CPUs.   
> Yes, we know that... the network is our first job. We plan new
> hardware for the mon and osd services, with a lot of NVMe flash disks and
> high-GHz CPUs.
> >  
> >> We plan to split the hyperconvergent setup into storage and compute nodes
> >> and want to split ceph cluster and public network. Cluster network with
> >> 40 gbit mellanox switches and public network with the existant 10gbit
> >> switches.
> >>  
> > You'd do a lot better if you were to go all 40Gb/s and forget about
> > splitting networks.   
> Use public and cluster network over the same nics and the same subnet?

Yes, at least for NICs. 
If for some reason your compute nodes have no dedicated links/NICs for the
Ceph cluster and it makes you feel warm and fuzzy, you can segregate
traffic with VLANs. 
But in most cases that really comes down to "security theater": if a
compute node gets compromised, they have access to your Ceph cluster
network anyway.

When looking at the ML archives you'll find a number of people suggesting
to keep things simple if not otherwise needed. 

> >
> > The faster replication network will:
> > a) be underutilized all of the time in terms of bandwidth 
> > b) not help with read IOPS at all
> > c) still be hobbled by the public network latency when it comes to write
> > IOPS (but of course help in regards to replication latency). 
> >  
> >> Now my question... are 0.153ms - 0.170ms fast enough for the public
> >> network? We must deploy a setup with 1500 - 2000 terminal servers
> >>  
> > Define terminal server, are we talking Windows Virtual Desktops with RDP?
> > Windows is quite the hog when it comes to I/O.  
> Yes, we're talking about Windows virtual desktops with RDP.
> Our calculation is... 1x DC = 60-80 IOPS, 1x TS = 60-80 IOPS, N users * 10
> IOPS ...
> 
> For this system we want to work with cache tiering in front on NVMe
> disks and SATA disks in an EC pool.  Is it a good idea to use cache
> tiering in this setup?
> 
Depends on the size of your cache-tier really.
I have done no analysis of Windows I/O behavior other than it being
insanely swap happy w/o needs, so if you can, eliminate the pagefile. 

If all your typical writes can be satisfied from the cache-tier, good.
Reads (like OS boot, etc) should be fine from the EC pool, so cache-tier
in read-forward mode. 

But you _really_ need to test this: a non-fitting cache tier can be worse
than no cache at all.
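
To illustrate the moving parts (pool names are examples and every
threshold below is a placeholder you have to size and test yourself; some
releases want --yes-i-really-mean-it for the forward modes):

ceph osd tier add ecpool cachepool
ceph osd tier cache-mode cachepool readforward
ceph osd tier set-overlay ecpool cachepool
ceph osd pool set cachepool hit_set_type bloom
ceph osd pool set cachepool target_max_bytes 1099511627776    # e.g. 1 TiB, placeholder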

Christian

> 
> >
> > Regards,
> >
> > Christian  
> 


-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com           Rakuten Communications
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Latency for the Public Network

2018-02-06 Thread Tobias Kropf


On 02/06/2018 04:03 AM, Christian Balzer wrote:
> Hello,
>
> On Mon, 5 Feb 2018 22:04:00 +0100 Tobias Kropf wrote:
>
>> Hi ceph list,
>>
>> we have a hyperconvergent ceph cluster with kvm on 8 nodes with ceph
>> hammer 0.94.10. 
> Do I smell Proxmox?
Yes, we currently use Proxmox
>
>> The cluster is now 3 years old and we are planning a new
>> cluster for a high iops project. We use replicated pools 3/2 and have
>> not the best latency on our switch backend.
>>
>>
>> ping -s 8192 10.10.10.40 
>>
>> 8200 bytes from 10.10.10.40: icmp_seq=1 ttl=64 time=0.153 ms
>>
> Not particularly great, yes.
> However your network latency is only one factor, Ceph OSDs add quite
> another layer there and do affect IOPS even more usually. 
> For high IOPS you need of course fast storage, network AND CPUs. 
Yes, we know that... the network is our first job. We plan new
hardware for the mon and osd services, with a lot of NVMe flash disks and
high-GHz CPUs.
>
>> We plan to split the hyperconvergent setup into storage and compute nodes
>> and want to split ceph cluster and public network. Cluster network with
>> 40 gbit mellanox switches and public network with the existant 10gbit
>> switches.
>>
> You'd do a lot better if you were to go all 40Gb/s and forget about
> splitting networks. 
Use public and cluster network over the same nics and the same subnet?
>
> The faster replication network will:
> a) be underutilized all of the time in terms of bandwidth 
> b) not help with read IOPS at all
> c) still be hobbled by the public network latency when it comes to write
> IOPS (but of course help in regards to replication latency). 
>
>> Now my question... are 0.153ms - 0.170ms fast enough for the public
>> network? We must deploy a setup with 1500 - 2000 terminal servers
>>
> Define terminal server, are we talking Windows Virtual Desktops with RDP?
> Windows is quite the hog when it comes to I/O.
Yes, we're talking about Windows virtual desktops with RDP.
Our calculation is... 1x DC = 60-80 IOPS, 1x TS = 60-80 IOPS, N users * 10
IOPS ...

For this system we want to work with cache tiering in front on NVMe
disks and SATA disks in an EC pool.  Is it a good idea to use cache
tiering in this setup?


>
> Regards,
>
> Christian

-- 
Tobias Kropf

Technik

--

inett GmbH » Ihr IT Systemhaus in Saarbrücken

Mainzerstrasse 183
66121 Saarbrücken
Geschäftsführer: Marco Gabriel
Handelsregister Saarbrücken
HRB 16588


Telefon: 0681 / 41 09 93 – 0
Telefax: 0681 / 41 09 93 – 99
E-Mail: i...@inett.de
Web: www.inett.de

Cyberoam Gold Partner - Zarafa Gold Partner - Proxmox Authorized Reseller - 
Proxmox Training Center - SEP sesam Certified Partner – Open-E Partner - Endian 
Certified Partner - Kaspersky Silver Partner – ESET Silver Partner - Mitglied 
im iTeam Systemhausverbund für den Mittelstand 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Infinite loop in radosgw-usage show

2018-02-06 Thread Ingo Reimann
Just to add -

We wrote a little wrapper that reads the output of "radosgw-admin usage
show" and stops when the loop starts. When we add up all the entries
ourselves, the result is correct. Moreover, the duplicate timestamp that
we detect to break the loop is not the last one taken into account. E.g.:

./radosgw-admin-break-loop --uid=TestUser --start-date=2017-12-01
--end-date=2018-01-01 [...]
"bytes_received": 1472051975516,
Loop detected at "2017-12-21 08:00:00.00Z"

./radosgw-admin-break-loop --uid=TestUser --start-date=2017-12-01
--end-date=2017-12-22 [...]
"bytes_received": 1245051973424,
Loop detected at "2017-12-21 08:00:00.00Z"

This leads to the assumption that the loop occurs after the processing of
the raw data.
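
Conceptually the wrapper does no more than this (a rough sketch, not our
actual tool; it assumes each usage entry carries a "time" line in the JSON
output):

radosgw-admin usage show --uid=TestUser --start-date=2017-12-01 --end-date=2018-01-01 \
  | awk '/"time":/ && seen[$0]++ { print "Loop detected at " $0; exit } { print }'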

Looks like a bug?

Best regards,

Ingo Reimann
Dunkel GmbH
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] osd_recovery_max_chunk value

2018-02-06 Thread Christian Balzer
On Tue, 6 Feb 2018 13:27:22 +0530 Karun Josy wrote:

>  Hi Christian,
> 
> Thank you for your help.
> 
> Ceph version is 12.2.2. So is this value bad? Do you have any suggestions?
> 
> 
> So to reduce the max chunk, I assume I can choose something like
> 7 << 20, i.e. 7340032?
> 
More like 4 MB, to match things up nicely in the binary world.
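
Since 8 << 20 is 8 * 2^20 = 8388608 bytes (8 MiB), halving it would for
example be (same injectargs mechanism you already used; as always, verify
the effect on your own cluster):

ceph tell osd.* injectargs '--osd_recovery_max_chunk 4194304'    # 4 << 20 = 4 MiB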

Christian
> Karun Josy
> 
> On Tue, Feb 6, 2018 at 1:15 PM, Christian Balzer  wrote:
> 
> > On Tue, 6 Feb 2018 13:01:12 +0530 Karun Josy wrote:
> >  
> > > Hello,
> > >
> > > We are seeing slow requests while the recovery process is going on.
> > >
> > > I am trying to slow down the recovery process. I set  
> > osd_recovery_max_active  
> > > and  osd_recovery_sleep as below :
> > > --
> > > ceph tell osd.* injectargs '--osd_recovery_max_active 1'
> > > ceph tell osd.* injectargs '--osd_recovery_sleep .1'
> > > --  
> > What version of Ceph? In some versions, "sleep" values will make things _worse_!
> > Would be nice if that was documented in, like, the documentation...
> >  
> > >
> > > But I am confused with the  osd_recovery_max_chunk. Currently, it shows
> > > 8388608.
> > >
> > > # ceph daemon osd.4 config get osd_recovery_max_chunk
> > > {
> > > "osd_recovery_max_chunk": "8388608"
> > >
> > >
> > > In ceph documentation, it shows
> > >
> > > ---
> > > osd recovery max chunk
> > > Description: The maximum size of a recovered chunk of data to push.
> > > Type: 64-bit Unsigned Integer
> > > Default: 8 << 20
> > > 
> > >
> > > I am confused. Can anyone let me know what is the value that I have to  
> > give  
> > > to reduce this parameter ?
> > >  
> > This is what you get when programmers write docs.
> >
> > The above is a left-shift operation, see for example:
> > http://bit-calculator.com/bit-shift-calculator
> >
> > Now if shrinking that value is beneficial for reducing recovery load,
> > that's for you to find out.
> >
> > Christian
> >  
> > >
> > >
> > > Karun Josy  
> >
> >
> > --
> > Christian Balzer        Network/Systems Engineer
> > ch...@gol.com           Rakuten Communications
> >  


-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com           Rakuten Communications
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] osd_recovery_max_chunk value

2018-02-06 Thread Christian Balzer
On Tue, 6 Feb 2018 13:24:18 +0530 Karun Josy wrote:

> Hi Christian,
> 
> Thank you for your help.
> 
> Ceph version is 12.2.2. So is this value bad? Do you have any suggestions?
>
That should be fine AFAIK; some (all?) versions of Jewel definitely are
not.

> 
> ceph tell osd.* injectargs '--osd_recovery_sleep .1'
> -
> 
> 
> Karun Josy
> 
> On Tue, Feb 6, 2018 at 1:15 PM, Christian Balzer  wrote:
> 
> > On Tue, 6 Feb 2018 13:01:12 +0530 Karun Josy wrote:
> >  
> > > Hello,
> > >
> > > We are seeing slow requests while the recovery process is going on.
> > >
> > > I am trying to slow down the recovery process. I set  
> > osd_recovery_max_active  
> > > and  osd_recovery_sleep as below :
> > > --
> > > ceph tell osd.* injectargs '--osd_recovery_max_active 1'
> > > ceph tell osd.* injectargs '--osd_recovery_sleep .1'
> > > --  
> > What version of Ceph? In some versions, "sleep" values will make things _worse_!
> > Would be nice if that was documented in, like, the documentation...
> >  
> > >
> > > But I am confused with the  osd_recovery_max_chunk. Currently, it shows
> > > 8388608.
> > >
> > > # ceph daemon osd.4 config get osd_recovery_max_chunk
> > > {
> > > "osd_recovery_max_chunk": "8388608"
> > >
> > >
> > > In ceph documentation, it shows
> > >
> > > ---
> > > osd recovery max chunk
> > > Description: The maximum size of a recovered chunk of data to push.
> > > Type: 64-bit Unsigned Integer
> > > Default: 8 << 20
> > > 
> > >
> > > I am confused. Can anyone let me know what is the value that I have to  
> > give  
> > > to reduce this parameter ?
> > >  
> > This is what you get when programmers write docs.
> >
> > The above is a left-shift operation, see for example:
> > http://bit-calculator.com/bit-shift-calculator
> >
> > Now if shrinking that value is beneficial for reducing recovery load,
> > that's for you to find out.
> >
> > Christian
> >  
> > >
> > >
> > > Karun Josy  
> >
> >
> > --
> > Christian Balzer        Network/Systems Engineer
> > ch...@gol.com           Rakuten Communications
> >  


-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com           Rakuten Communications
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com