[ceph-users] Re: Ssd cache question

2019-11-18 Thread Wesley Peng
Thanks Manuel for letting me know this.

18.11.2019, 22:11, "EDH - Manuel Rios Fernandez" :
> Hi Wesley,
>
> It's a common assumption that an SSD cache will help in Ceph. Normally it
> causes other issues, also related to performance; there are a lot of
> mailing list threads about this. Our recommendation: in an RBD setup it
> doesn't help.
>
> Regards,
> Manuel
>
> From: Wesley Peng
> Sent: Monday, 18 November 2019 14:54
> To: ceph-users@ceph.io
> Subject: [ceph-users] Ssd cache question
>
> Hello,
>
> For today's Ceph deployments, is an SSD cache pool a must for performance?
> Thank you.
>
> Regards
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
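A more common way to get SSD benefit with BlueStore is to put the RocksDB/WAL
on the fast device instead of running a cache tier. A minimal sketch with
ceph-volume (device paths are placeholders, not a recommendation):

# One OSD with HDD data and its DB (and, implicitly, the WAL) on a fast partition.
ceph-volume lvm create --bluestore \
    --data /dev/sdb \
    --block.db /dev/nvme0n1p1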


[ceph-users] Re: msgr2 not used on OSDs in some Nautilus clusters

2019-11-18 Thread Bryan Stillwell
I cranked up debug_ms to 20 on two of these clusters today and I still don't
understand why some of the clusters use v2 and some only use v1.
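For anyone comparing their own clusters, a few checks that usually show whether
v2 is in play (standard Nautilus commands; a sketch, not a diagnosis of the
problem above):

# OSD map entries listing both v2:...:68xx and v1:...:68xx mean msgr2 is advertised.
ceph osd dump | grep '^osd\.'

# The monitors should expose a v2 (port 3300) endpoint as well.
ceph mon dump

# msgr2 binding on OSDs is controlled by ms_bind_msgr2 (default: true).
ceph config get osd ms_bind_msgr2

# If the mons themselves only show v1 addresses, v2 can be enabled with
# 'ceph mon enable-msgr2' once all mons run Nautilus.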

Here's the boot/peering process for the cluster which uses v2:

2019-11-18 16:46:03.027 7fabb6281dc0  0 osd.0 39101 done with init, starting 
boot process
2019-11-18 16:46:03.028 7fabb6281dc0  1 osd.0 39101 start_boot
2019-11-18 16:46:03.030 7fabaebac700  5 --2- 
[v2:10.0.32.67:6800/258117,v1:10.0.32.67:6801/258117] >> 
[v2:10.0.32.3:6800/1473285,v1:10.0.32.3:6801/1473285] conn(0x5596b30c3000 
0x5596b4bf4000 unknown :-1 s=HELLO_CONNECTING pgs=0 cs=0 l=1 rx=0 
tx=0).handle_hello received hello: peer_type=16 
peer_addr_for_me=v2:10.0.32.67:51508/0
2019-11-18 16:46:03.034 7faba8116700  1 -- 
[v2:10.0.32.67:6800/258117,v1:10.0.32.67:6801/258117] --> 
[v2:10.0.32.65:3300/0,v1:10.0.32.65:6789/0] -- osd_boot(osd.0 booted 0 features 
4611087854031667199 v39101) v7 -- 0x5596b4bd6000 con 0x5596b3b06400
2019-11-18 16:46:03.034 7faba8116700  5 --2- 
[v2:10.0.32.67:6800/258117,v1:10.0.32.67:6801/258117] >> 
[v2:10.0.32.65:3300/0,v1:10.0.32.65:6789/0] conn(0x5596b3b06400 0x5596b2bca580 
crc :-1 s=READY pgs=11687624 cs=0 l=1 rx=0 tx=0).send_message enqueueing 
message m=0x5596b4bd6000 type=71 osd_boot(osd.0 booted 0 features 
4611087854031667199 v39101) v7
2019-11-18 16:46:03.034 7fabaf3ad700 20 --2- 
[v2:10.0.32.67:6800/258117,v1:10.0.32.67:6801/258117] >> 
[v2:10.0.32.65:3300/0,v1:10.0.32.65:6789/0] conn(0x5596b3b06400 0x5596b2bca580 
crc :-1 s=READY pgs=11687624 cs=0 l=1 rx=0 tx=0).prepare_send_message 
m=osd_boot(osd.0 booted 0 features 4611087854031667199 v39101) v7
2019-11-18 16:46:03.034 7fabaf3ad700 20 --2- 
[v2:10.0.32.67:6800/258117,v1:10.0.32.67:6801/258117] >> 
[v2:10.0.32.65:3300/0,v1:10.0.32.65:6789/0] conn(0x5596b3b06400 0x5596b2bca580 
crc :-1 s=READY pgs=11687624 cs=0 l=1 rx=0 tx=0).prepare_send_message encoding 
features 4611087854031667199 0x5596b4bd6000 osd_boot(osd.0 booted 0 features 
4611087854031667199 v39101) v7
2019-11-18 16:46:03.034 7fabaf3ad700  5 --2- 
[v2:10.0.32.67:6800/258117,v1:10.0.32.67:6801/258117] >> 
[v2:10.0.32.65:3300/0,v1:10.0.32.65:6789/0] conn(0x5596b3b06400 0x5596b2bca580 
crc :-1 s=READY pgs=11687624 cs=0 l=1 rx=0 tx=0).write_message sending message 
m=0x5596b4bd6000 seq=8 osd_boot(osd.0 booted 0 features 4611087854031667199 
v39101) v7
2019-11-18 16:46:03.352 7fab9d100700  1 osd.0 39104 state: booting -> active
2019-11-18 16:46:03.354 7fabaebac700  5 --2- 
[v2:10.0.32.67:6802/258117,v1:10.0.32.67:6803/258117] >> 
[v2:10.0.32.9:6802/3892454,v1:10.0.32.9:6803/3892454] conn(0x5596b4d68800 
0x5596b4bf5080 unknown :-1 s=HELLO_CONNECTING pgs=0 cs=0 l=0 rx=0 
tx=0).handle_hello received hello: peer_type=4 
peer_addr_for_me=v2:10.0.32.67:45488/0
2019-11-18 16:46:03.354 7fabafbae700  5 --2- 
[v2:10.0.32.67:6802/258117,v1:10.0.32.67:6803/258117] >> 
[v2:10.0.32.142:6810/2881684,v1:10.0.32.142:6811/2881684] conn(0x5596b4d68000 
0x5596b4bf4580 unknown :-1 s=HELLO_CONNECTING pgs=0 cs=0 l=0 rx=0 
tx=0).handle_hello received hello: peer_type=4 
peer_addr_for_me=v2:10.0.32.67:39044/0
2019-11-18 16:46:03.355 7fabaf3ad700  5 --2-  >> 
[v2:10.0.32.67:6814/100535,v1:10.0.32.67:6815/100535] conn(0x5596b4d68400 
0x5596b4bf4b00 unknown :-1 s=HELLO_CONNECTING pgs=0 cs=0 l=1 rx=0 
tx=0).handle_hello received hello: peer_type=4 
peer_addr_for_me=v2:10.0.32.67:51558/0
2019-11-18 16:46:03.355 7fabaf3ad700  1 -- 10.0.32.67:0/258117 learned_addr 
learned my addr 10.0.32.67:0/258117 (peer_addr_for_me v2:10.0.32.67:0/0)
2019-11-18 16:46:03.355 7fabafbae700  5 --2-  >> 
[v2:10.0.32.67:6812/100535,v1:10.0.32.67:6813/100535] conn(0x5596b4d68c00 
0x5596b4bf5600 unknown :-1 s=HELLO_CONNECTING pgs=0 cs=0 l=1 rx=0 
tx=0).handle_hello received hello: peer_type=4 
peer_addr_for_me=v2:10.0.32.67:40378/0
2019-11-18 16:46:03.355 7fabafbae700  1 -- 10.0.32.67:0/258117 learned_addr 
learned my addr 10.0.32.67:0/258117 (peer_addr_for_me v2:10.0.32.67:0/0)


You can see at the end it learns the address to be v2:10.0.32.67:0/0, but 
compare that to the cluster which uses v1:

2019-11-18 16:46:05.066 7f9182d8ce00  0 osd.0 46410 done with init, starting 
boot process
2019-11-18 16:46:05.066 7f9182d8ce00  1 osd.0 46410 start_boot
2019-11-18 16:46:05.069 7f917becf700  5 --2- 
[v2:10.0.13.2:6800/3084510,v1:10.0.13.2:6801/3084510] >> 
[v2:10.0.12.131:6800/3507,v1:10.0.12.131:6801/3507] conn(0x56011fc4 
0x56010f0b3b80 unknown :-1 s=HELLO_CONNECTING pgs=0 cs=0 l=1 rx=0 
tx=0).handle_hello received hello: peer_type=16 
peer_addr_for_me=v2:10.0.13.2:44708/0
2019-11-18 16:46:05.072 7f9174c3e700  1 -- 
[v2:10.0.13.2:6800/3084510,v1:10.0.13.2:6801/3084510] --> 
[v2:10.0.13.137:3300/0,v1:10.0.13.137:6789/0] -- osd_boot(osd.0 booted 0 
features 4611087854031667199 v46410) v7 -- 0x56011fc26000 con 0x56010f291400
2019-11-18 16:46:05.072 7f9174c3e700  5 --2- 
[v2:10.0.13.2:6800/3084510,v1:10.0.13.2:6801/3084510] >> 
[v2:10.0.13.137:3300/0,v1:10.0.13.137:6789/0] conn(0x56010f291400 

[ceph-users] Re: add debian buster stable support for ceph-deploy

2019-11-18 Thread Daniel Swarbrick
Yes of course, these packages first have to make it through -testing before 
they can even be considered for buster-backports.

FWIW, we have successfully been running cross-ported mimic packages from Ubuntu 
on buster for a few months now, rebuilt with buster toolchain, along with a few 
minor patches. We have nautilus packages ready to go, but have been waiting for 
it to mature a bit first (and are now waiting for 14.2.5).

Kevin Olbrich wrote:
> I don't think the Debian teams will package them for buster, as the policy
> forbids that.
> Maybe backports will (like v10 vs. v12 for stretch), but we will only know for
> sure when it's there.
> 
> Kevin
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: add debian buster stable support for ceph-deploy

2019-11-18 Thread Kevin Olbrich
I don't think the Debian teams will package them for buster, as the policy
forbids that.
Maybe backports will (like v10 vs. v12 for stretch), but we will only know for
sure when it's there.

Kevin


Am Mo., 18. Nov. 2019 um 20:48 Uhr schrieb Daniel Swarbrick <
daniel.swarbr...@gmail.com>:

> It looks like Debian's own packaging efforts have finally kicked back into
> life:
>
> https://salsa.debian.org/ceph-team/ceph/commits/debian/14.2.4-1
>
> Perhaps we can expect some official Debian packages again soon.
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Balancing PGs across OSDs

2019-11-18 Thread Paul Emmerich
You have way too few PGs in one of the roots. Many OSDs have so few
PGs that you should see a lot of health warnings because of it.
The other root has a factor of 5 difference in disk sizes, which isn't ideal either.
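A quick way to see this is the PGS column of ceph osd df tree, together with the
usual rule of thumb for sizing (a sketch; the numbers are illustrative):

# PG count per OSD (PGS column), grouped by CRUSH root.
ceph osd df tree

# Rule of thumb for a replicated pool within one root:
#   pg_num ~= (number of OSDs * 100) / pool size, rounded to a power of two
# e.g. 24 OSDs at size 3: 24 * 100 / 3 = 800 -> pg_num 1024 (or 512)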


Paul

-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90

On Mon, Nov 18, 2019 at 3:03 PM Thomas Schneider <74cmo...@gmail.com> wrote:
>
> Hi,
>
> in this 
> blog post I find this statement:
> "So, in our ideal world so far (assuming equal size OSDs), every OSD now
> has the same number of PGs assigned."
>
> My issue is that across all pools the number of PGs per OSD is not equal.
> And I conclude that this is causing very unbalanced data placement.
> As a matter of fact, the data stored on my 1.6TB HDDs in the specific pool
> "hdb_backup" is in a range starting with
> osd.228 size: 1.6 usage: 52.61 reweight: 1.0
> and ending with
> osd.145 size: 1.6 usage: 81.11 reweight: 1.0
>
> This heavily impacts the amount of data that can be stored in the cluster.
>
> Ceph balancer is enabled, but this is not solving this issue.
> root@ld3955:~# ceph balancer status
> {
> "active": true,
> "plans": [],
> "mode": "upmap"
> }
>
> Therefore I would ask you for suggestions on how to address this unbalanced
> data distribution.
>
> I have attached pastebin for
> - ceph osd df sorted by usage 
> - ceph osd df tree 
>
> My cluster has multiple crush roots representing different disks.
> In addition I have defined multiple pools, one pool for each disk type:
> hdd, ssd, nvme.
>
> THX
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: add debian buster stable support for ceph-deploy

2019-11-18 Thread Paul Emmerich
We maintain an unofficial mirror for Buster packages:
https://croit.io/2019/07/07/2019-07-07-debian-mirror
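The exact repository line and signing key are in the post above; on Buster the
general shape is something like the following (the URLs here are placeholders,
not the real mirror addresses):

# Placeholder URLs - take the real deb line and key from the linked post.
echo "deb https://mirror.example.com/debian-nautilus buster main" \
    > /etc/apt/sources.list.d/ceph.list
wget -qO - https://mirror.example.com/keys/release.asc | apt-key add -
apt-get update && apt-get install ceph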


Paul

-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90

On Mon, Nov 18, 2019 at 5:16 PM Jelle de Jong  wrote:
>
> Hello everybody,
>
> Can somebody add support for Debian buster and ceph-deploy:
> https://tracker.ceph.com/issues/42870
>
> Highly appreciated,
>
> Regards,
>
> Jelle de Jong
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] add debian buster stable support for ceph-deploy

2019-11-18 Thread Jelle de Jong

Hello everybody,

Can somebody add support for Debian buster and ceph-deploy: 
https://tracker.ceph.com/issues/42870


Highly appreciated,

Regards,

Jelle de Jong
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Ceph manager causing MGR active switch

2019-11-18 Thread Thomas Schneider
Hi,

I can see the following error message regularly in the MGR log:
2019-11-18 14:25:48.847 7fd9e6a3a700  0 mgr[dashboard]
[18/Nov/2019:14:25:48] ENGINE Error in HTTPServer.tick
Traceback (most recent call last):
  File
"/usr/lib/python2.7/dist-packages/cherrypy/wsgiserver/__init__.py", line
2021, in start
    self.tick()
  File
"/usr/lib/python2.7/dist-packages/cherrypy/wsgiserver/__init__.py", line
2090, in tick
    s, ssl_env = self.ssl_adapter.wrap(s)
  File
"/usr/lib/python2.7/dist-packages/cherrypy/wsgiserver/ssl_builtin.py",
line 67, in wrap
    server_side=True)
  File "/usr/lib/python2.7/ssl.py", line 369, in wrap_socket
    _context=self)
  File "/usr/lib/python2.7/ssl.py", line 599, in __init__
    self.do_handshake()
  File "/usr/lib/python2.7/ssl.py", line 828, in do_handshake
    self._sslobj.do_handshake()
error: [Errno 0] Error

2019-11-18 14:25:49.027 7fd9e6a3a700  0 mgr[dashboard]
[18/Nov/2019:14:25:49] ENGINE Error in HTTPServer.tick
Traceback (most recent call last):
  File
"/usr/lib/python2.7/dist-packages/cherrypy/wsgiserver/__init__.py", line
2021, in start
    self.tick()
  File
"/usr/lib/python2.7/dist-packages/cherrypy/wsgiserver/__init__.py", line
2090, in tick
    s, ssl_env = self.ssl_adapter.wrap(s)
  File
"/usr/lib/python2.7/dist-packages/cherrypy/wsgiserver/ssl_builtin.py",
line 67, in wrap
    server_side=True)
  File "/usr/lib/python2.7/ssl.py", line 369, in wrap_socket
    _context=self)
  File "/usr/lib/python2.7/ssl.py", line 599, in __init__
    self.do_handshake()
  File "/usr/lib/python2.7/ssl.py", line 828, in do_handshake
    self._sslobj.do_handshake()
SSLError: [SSL: SSLV3_ALERT_CERTIFICATE_UNKNOWN] sslv3 alert certificate
unknown (_ssl.c:727)

In many cases this error causes a switch of the active MGR node.
There is also a direct impact on the Ceph Dashboard, which hangs
completely when this error is logged.
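The SSLV3_ALERT_CERTIFICATE_UNKNOWN part usually points at the dashboard's TLS
certificate rather than at the mgr itself, so one thing worth trying is to
regenerate or replace the certificate and reload the module (a sketch using the
built-in dashboard commands; whether this also stops the mgr failovers is not
certain):

# Regenerate the dashboard's self-signed certificate ...
ceph dashboard create-self-signed-cert

# ... or, on Nautilus, install a proper certificate/key pair instead:
#   ceph dashboard set-ssl-certificate -i dashboard.crt
#   ceph dashboard set-ssl-certificate-key -i dashboard.key

# Reload the dashboard so the new certificate is picked up.
ceph mgr module disable dashboard
ceph mgr module enable dashboard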

Any advice on how to fix this issue is appreciated.

THX
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Balancing PGs across OSDs

2019-11-18 Thread Thomas Schneider
Hi,

in this 
blog post I find this statement:
"So, in our ideal world so far (assuming equal size OSDs), every OSD now
has the same number of PGs assigned."

My issue is that across all pools the number of PGs per OSD is not equal.
And I conclude that this is causing very unbalanced data placement.
As a matter of fact, the data stored on my 1.6TB HDDs in the specific pool
"hdb_backup" is in a range starting with
osd.228 size: 1.6 usage: 52.61 reweight: 1.0
and ending with
osd.145 size: 1.6 usage: 81.11 reweight: 1.0

This heavily impacts the amount of data that can be stored in the cluster.

Ceph balancer is enabled, but this is not solving this issue.
root@ld3955:~# ceph balancer status
{
    "active": true,
    "plans": [],
    "mode": "upmap"
}
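When the balancer reports no plans like this, it can still be asked for an
evaluation and an explicit plan; a sketch with the standard balancer commands
(the plan name is arbitrary):

# Score the current distribution (lower is better).
ceph balancer eval

# Create, inspect and apply an upmap plan.
ceph balancer optimize myplan
ceph balancer show myplan
ceph balancer execute myplan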

Therefore I would ask you for suggestions on how to address this unbalanced
data distribution.

I have attached pastebin for
- ceph osd df sorted by usage 
- ceph osd df tree 

My cluster has multiple crush roots representing different disks.
In addition I have defined multiple pools, one pool for each disk type:
hdd, ssd, nvme.

THX
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: nfs ganesha rgw write errors

2019-11-18 Thread Daniel Gryniewicz

On 11/17/19 1:42 PM, Marc Roos wrote:

Hi Daniel,

I am able to mount the buckets with your config; however, when I try to
write something, my logs show a lot of these errors:

svc_732] nfs4_Errno_verbose :NFS4 :CRIT :Error I/O error in
nfs4_write_cb converted to NFS4ERR_IO but was set non-retryable

Any chance you know how to resolve this?

Sounds like the RGW user configured in Ganesha doesn't have permissions 
to write to the buckets in question.
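For comparison, the pieces that usually matter here are the credentials in the
export's FSAL block and the ownership/ACL of the bucket itself. A rough sketch
(export path, user and keys are placeholders):

# ganesha.conf - the RGW user named here must be allowed to write to the
# exported bucket (it normally needs to own it, or have a suitable ACL).
EXPORT {
    Export_ID = 1;
    Path = "/mybucket";
    Pseudo = "/mybucket";
    Access_Type = RW;
    Squash = No_Root_Squash;
    FSAL {
        Name = RGW;
        User_Id = "nfsuser";
        Access_Key_Id = "ACCESSKEY";
        Secret_Access_Key = "SECRETKEY";
    }
}

# Check who actually owns the bucket:
#   radosgw-admin bucket stats --bucket=mybucket | grep owner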


Daniel
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Full FLash NVME Cluster recommendation

2019-11-18 Thread Darren Soothill
Hi Yoann,

So I would not put a single 6.4TB device in each node, but rather multiple
smaller devices per node.

What CPU are you thinking of using?
How many CPUs? If you have one PCIe card then it will only be connected to one CPU,
so will you be able to use all of the performance of multiple CPUs?
What network are you thinking of? I wouldn't do less than 100G or multiple
25G connections.
Where is your CephFS metadata being stored? What are you doing about CephFS
metadata servers?
What about some faster storage for the WAL?

What is your IO Profile? Read/Write split?

It may be the case that EC is not the best fit for the workload you are trying 
to do.
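For what it's worth, if EC does fit, the usual shape of an 8+3 CephFS data pool
looks roughly like this (pool names, pg_num and failure domain are illustrative;
the metadata pool should stay replicated):

# 8+3 erasure-code profile; with 38 nodes a host failure domain fits.
ceph osd erasure-code-profile set ec83 k=8 m=3 crush-failure-domain=host

# EC data pool for CephFS; overwrites must be enabled for CephFS use.
ceph osd pool create cephfs_data_ec 1024 1024 erasure ec83
ceph osd pool set cephfs_data_ec allow_ec_overwrites true

# Attach it to the filesystem as an additional data pool; direct files to it
# with a directory layout (ceph.dir.layout.pool) where wanted.
ceph fs add_data_pool cephfs cephfs_data_ec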

Darren



On 15/11/2019, 15:26, "Yoann Moulin"  wrote:

Hello,

I'm going to deploy a new cluster soon based on 6.4TB NVME PCI-E Cards, I 
will have only 1 NVME card per node and 38 nodes.

The use case is to offer cephfs volumes for a k8s platform, I plan to use 
an EC-POOL 8+3 for the cephfs_data pool.

Do you have recommendations for the setup, or mistakes to avoid? I use
ceph-ansible to deploy all my clusters.

Best regards,

-- 
Yoann Moulin
EPFL IC-IT
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Possible data corruption with 14.2.3 and 14.2.4

2019-11-18 Thread Simon Ironside

Hi Igor,

Thanks very much for providing all this detail.

On 18/11/2019 10:43, Igor Fedotov wrote:


- Check how full their DB devices are?
For your case it makes sense to check this, and then safely wait for
14.2.5 if it's not full.


bluefs.db_used_bytes / bluefs.db_total_bytes is only around 1-2% (I am
almost exclusively RBD and using a 64GB DB/WAL partition) and
bluefs.slow_used_bytes is 0 on them all, so it would seem I have little
to worry about here, with an essentially zero chance of corruption so far.
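For anyone wanting to check the same thing, these counters are visible through
the OSD admin socket; a sketch (jq is only used for readability):

# BlueFS space usage for one OSD, via its admin socket.
ceph daemon osd.0 perf dump bluefs | \
    jq '.bluefs | {db_total_bytes, db_used_bytes, slow_used_bytes}'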


I will sit tight and wait for 14.2.5.

Thanks again,
Simon.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Possible data corruption with 14.2.3 and 14.2.4

2019-11-18 Thread Igor Fedotov

Hi Simon,

On 11/15/2019 6:02 PM, Simon Ironside wrote:

Hi Igor,

On 15/11/2019 14:22, Igor Fedotov wrote:

Do you mean both standalone DB and(!!) standalone WAL 
devices/partitions by having SSD DB/WAL?


No, 1x combined DB/WAL partition on an SSD and 1x data partition on an 
HDD per OSD. I.e. created like:


ceph-deploy osd create --data /dev/sda --block-db ssd0/ceph-db-disk0
ceph-deploy osd create --data /dev/sdb --block-db ssd0/ceph-db-disk1
ceph-deploy osd create --data /dev/sdc --block-db ssd0/ceph-db-disk2

--block-wal wasn't used.

If so then BlueFS might eventually overwrite some data on your DB
volume with BlueFS log content, which most probably makes the OSD crash
and become unable to restart one day. This is a quite random and not very
frequent event which to some degree depends on cluster load. And the
period between the actual data corruption and any evidence of it is
non-zero most of the time - we tend to see it mostly when RocksDB is
performing compaction.


So this, if I've understood you correctly, is for those with 3 
separate (DB + WAL + Data) devices per OSD. Not my setup.



right
Another OSD configuration which might suffer from the issue is main
device + WAL device.


The failure probability is much lower for the main + DB layout. It
requires an almost full DB for the issue to have any chance of appearing.


This sounds like my setup: 2 separate (DB/WAL combined + Data) devices 
per OSD.

yep


Main-only device configurations aren't under threat as far as I
can tell.


And this is for all-in-one devices that aren't at risk. Understood.

While we're waiting for 14.2.5 to be released, what should 14.2.3/4 
users with an at risk setup do in the meantime, if anything?


- Check how full their DB devices are?
For your case it makes sense to check this, and then safely wait for
14.2.5 if it's not full.

- Avoid adding new data/load to the cluster?
This is probably the last resort, when you already start seeing the
issue and are absolutely uncomfortable with the probability of data loss.
Not a panacea anyway though, as one can already have broken but still
undiscovered data at multiple OSDs.

- Would deep scrubbing detect any undiscovered corruption?


Maybe. We tend to see it during DB compaction (mostly triggered by DB
write access), but IMO it can be detected during scrubbing and/or a store
fsck as well.
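For anyone wanting to check an OSD proactively, an offline fsck with
ceph-bluestore-tool is one option (a sketch; the OSD must be stopped first, and
--deep also reads and verifies object data, so it takes much longer):

systemctl stop ceph-osd@0
ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-0
# deeper (and much slower) variant:
ceph-bluestore-tool fsck --deep 1 --path /var/lib/ceph/osd/ceph-0
systemctl start ceph-osd@0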




- Get backups ready to restore? I mean, how bad is this?


As per multiple reports there is some chance of losing OSD data. E.g.
we've got reports of reproducing 1-2 OSD failures per day under some
stress(!!!) load testing. That's probably not the general case and
production clusters might suffer from this much less frequently. E.g.
across our multiple QA activities we've observed the issue just once since
it was introduced.


Anyway, it's possible to lose multiple OSDs simultaneously. The probability
is not that large, but it's definitely non-zero.


But as the fix is almost ready, I'd recommend waiting for it and applying it ASAP.



Thanks,
Simon.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io