[ceph-users] ceph cluster monitoring tool

2018-07-23 Thread Satish Patel
My 5-node Ceph cluster is ready for production, and now I am looking for a
good open-source monitoring tool. What is the majority of folks using in
their production setups?
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Reclaim free space on RBD images that use Bluestore?????

2018-07-23 Thread Satish Patel
Forgive me, I found this post, which solved my issue:

https://www.sebastien-han.fr/blog/2015/02/02/openstack-and-ceph-rbd-discard/
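For readers who hit this later: as far as I recall, the gist of that post is that the guest can only discard blocks if the disk is exposed over virtio-scsi with discard enabled. A rough sketch under that assumption (the image UUID is a placeholder, and the property names should be double-checked against the post and your OpenStack release):

```shell
# Hypothetical example: tag the Glance image so Nova attaches its disks
# via virtio-scsi, which can pass discards through to RBD.
openstack image set \
    --property hw_scsi_model=virtio-scsi \
    --property hw_disk_bus=scsi \
    <image-uuid>

# After rebuilding the VM from that image, trim works inside the guest:
sudo fstrim -v /
```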

On Mon, Jul 23, 2018 at 11:22 PM, Satish Patel  wrote:
> I have same issue, i just build new Ceph cluster for my Openstack VMs
> workload using rbd and i have created bunch of VM did some dd test to
> create big big file to test performance now i deleted all dd file but
> ceph still showing USED space.
>
> I tried to do from guest VM
>
> [root@c7-vm ~]# sudo fstrim /
> fstrim: /: the discard operation is not supported
>
>
> Can we run fstrim on ceph OSD node? what if i delete my VM in that
> case how do i run fstrim ?
>
> On Mon, Jul 23, 2018 at 6:13 PM, Ronny Aasen  
> wrote:
>> On 23.07.2018 22:18, Sean Bolding wrote:
>>
>> I have XenServers that connect via iSCSI to Ceph gateway servers that use
>> lrbd and targetcli. On my ceph cluster the RBD images I create are used as
>> storage repositories in Xenserver for the virtual machine vdisks.
>>
>>
>>
>> Whenever I delete a virtual machine, XenServer shows that the repository
>> size has decreased. This also happens when I mount a virtual drive in
>> Xenserver as a virtual drive in a Windows guest. If I delete a large file,
>> such as an exported VM, it shows as deleted and space available. However;
>> when check in Ceph  using ceph –s or ceph df it still shows the space being
>> used.
>>
>>
>>
>> I checked everywhere and it seems there was a reference to it here
>> https://github.com/ceph/ceph/pull/14727 but not sure if a way to trim or
>> discard freed blocks was ever implemented.
>>
>>
>>
>> The only way I have found is to play musical chairs and move the VMs to
>> different repositories and then completely remove the old RBD images in
>> ceph. This is not exactly easy to do.
>>
>>
>>
>> Is there a way to reclaim free space on RBD images that use Bluestore?
>> What commands do I use and where do I use this from? If such command exist
>> do I run them on the ceph cluster or do I run them from XenServer? Please
>> help.
>>
>>
>>
>>
>>
>> Sean
>>
>>
>>
>>
>>
>>
>>
>>
>> I am not familiar with Xen, but it does sounds like you have a rbd mounted
>> with a filesystem on the xen server.
>> in that case it is the same as for other filesystems. Deleted files are just
>> deleted in the file allocation table, and the RBD space is "reclaimed" when
>> the filesystem zeroes out the now unused blocks.
>>
>> in many filesystems you would run the fstrim command to overwrite free'd
>> blocks with zeroes, optionally mount the fs with the the discard option.
>> in xenserver >6.5 this should be a button in xencenter to reclaim freed
>> space.
>>
>>
>> kind regards
>> Ronny Aasen
>>


Re: [ceph-users] Insane CPU utilization in ceph.fuse

2018-07-23 Thread Yan, Zheng
Could you profile the memory allocation of the MDS?

http://docs.ceph.com/docs/mimic/rados/troubleshooting/memory-profiling/
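For anyone following along, the workflow in that document is roughly the following (command names from memory, so double-check them against the page, and substitute your own MDS id):

```shell
# Profile the MDS heap with tcmalloc's built-in profiler.
ceph tell mds.<id> heap start_profiler   # begin sampling allocations
ceph tell mds.<id> heap dump             # write a .heap dump file
ceph tell mds.<id> heap stats            # quick textual summary
ceph tell mds.<id> heap stop_profiler    # stop sampling
# Analyse the dump (tool name and paths vary by distro):
google-pprof --text /usr/bin/ceph-mds /var/log/ceph/mds.<id>.profile.0001.heap
```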
On Tue, Jul 24, 2018 at 7:54 AM Daniel Carrasco  wrote:
>
> Yeah, is also my thread. This thread was created before lower the cache size 
> from 512Mb to 8Mb. I thought that maybe was my fault and I did a 
> misconfiguration, so I've ignored the problem until now.
>
> Greetings!
>
> El mar., 24 jul. 2018 1:00, Gregory Farnum  escribió:
>>
>> On Mon, Jul 23, 2018 at 11:08 AM Patrick Donnelly  
>> wrote:
>>>
>>> On Mon, Jul 23, 2018 at 5:48 AM, Daniel Carrasco  
>>> wrote:
>>> > Hi, thanks for your response.
>>> >
>>> > Clients are about 6, and 4 of them are the most of time on standby. Only 
>>> > two
>>> > are active servers that are serving the webpage. Also we've a varnish on
>>> > front, so are not getting all the load (below 30% in PHP is not much).
>>> > About the MDS cache, now I've the mds_cache_memory_limit at 8Mb.
>>>
>>> What! Please post `ceph daemon mds. config diff`,  `... perf
>>> dump`, and `... dump_mempools `  from the server the active MDS is on.
>>>
>>> > I've tested
>>> > also 512Mb, but the CPU usage is the same and the MDS RAM usage grows up 
>>> > to
>>> > 15GB (on a 16Gb server it starts to swap and all fails). With 8Mb, at 
>>> > least
>>> > the memory usage is stable on less than 6Gb (now is using about 1GB of 
>>> > RAM).
>>>
>>> We've seen reports of possible memory leaks before and the potential
>>> fixes for those were in 12.2.6. How fast does your MDS reach 15GB?
>>> Your MDS cache size should be configured to 1-8GB (depending on your
>>> preference) so it's disturbing to see you set it so low.
>>
>>
>> See also the thread "[ceph-users] Fwd: MDS memory usage is very high", which 
>> had more discussion of that. The MDS daemon seemingly had 9.5GB of allocated 
>> RSS but only believed 489MB was in use for the cache...
>> -Greg
>>
>>>
>>>
>>> --
>>> Patrick Donnelly


Re: [ceph-users] Reclaim free space on RBD images that use Bluestore?????

2018-07-23 Thread Satish Patel
I have the same issue. I just built a new Ceph cluster for my OpenStack VM
workload using RBD, created a bunch of VMs, and ran some dd tests creating
big files to check performance. I have now deleted all the dd files, but
ceph still shows the space as USED.

I tried this from the guest VM:

[root@c7-vm ~]# sudo fstrim /
fstrim: /: the discard operation is not supported


Can we run fstrim on a Ceph OSD node? And if I delete my VM, how do I run
fstrim in that case?

On Mon, Jul 23, 2018 at 6:13 PM, Ronny Aasen  wrote:
> On 23.07.2018 22:18, Sean Bolding wrote:
>
> I have XenServers that connect via iSCSI to Ceph gateway servers that use
> lrbd and targetcli. On my ceph cluster the RBD images I create are used as
> storage repositories in Xenserver for the virtual machine vdisks.
>
>
>
> Whenever I delete a virtual machine, XenServer shows that the repository
> size has decreased. This also happens when I mount a virtual drive in
> Xenserver as a virtual drive in a Windows guest. If I delete a large file,
> such as an exported VM, it shows as deleted and space available. However;
> when check in Ceph  using ceph –s or ceph df it still shows the space being
> used.
>
>
>
> I checked everywhere and it seems there was a reference to it here
> https://github.com/ceph/ceph/pull/14727 but not sure if a way to trim or
> discard freed blocks was ever implemented.
>
>
>
> The only way I have found is to play musical chairs and move the VMs to
> different repositories and then completely remove the old RBD images in
> ceph. This is not exactly easy to do.
>
>
>
> Is there a way to reclaim free space on RBD images that use Bluestore?
> What commands do I use and where do I use this from? If such command exist
> do I run them on the ceph cluster or do I run them from XenServer? Please
> help.
>
>
>
>
>
> Sean
>
>
>
>
>
>
>
>
> I am not familiar with Xen, but it does sounds like you have a rbd mounted
> with a filesystem on the xen server.
> in that case it is the same as for other filesystems. Deleted files are just
> deleted in the file allocation table, and the RBD space is "reclaimed" when
> the filesystem zeroes out the now unused blocks.
>
> in many filesystems you would run the fstrim command to overwrite free'd
> blocks with zeroes, optionally mount the fs with the the discard option.
> in xenserver >6.5 this should be a button in xencenter to reclaim freed
> space.
>
>
> kind regards
> Ronny Aasen
>


Re: [ceph-users] Insane CPU utilization in ceph.fuse

2018-07-23 Thread Daniel Carrasco
Yeah, that is also my thread. It was created before I lowered the cache
size from 512 MB to 8 MB. I thought it might have been my fault, a
misconfiguration, so I had ignored the problem until now.

Greetings!

El mar., 24 jul. 2018 1:00, Gregory Farnum  escribió:

> On Mon, Jul 23, 2018 at 11:08 AM Patrick Donnelly 
> wrote:
>
>> On Mon, Jul 23, 2018 at 5:48 AM, Daniel Carrasco 
>> wrote:
>> > Hi, thanks for your response.
>> >
>> > Clients are about 6, and 4 of them are the most of time on standby.
>> Only two
>> > are active servers that are serving the webpage. Also we've a varnish on
>> > front, so are not getting all the load (below 30% in PHP is not much).
>> > About the MDS cache, now I've the mds_cache_memory_limit at 8Mb.
>>
>> What! Please post `ceph daemon mds. config diff`,  `... perf
>> dump`, and `... dump_mempools `  from the server the active MDS is on.
>>
>> > I've tested
>> > also 512Mb, but the CPU usage is the same and the MDS RAM usage grows
>> up to
>> > 15GB (on a 16Gb server it starts to swap and all fails). With 8Mb, at
>> least
>> > the memory usage is stable on less than 6Gb (now is using about 1GB of
>> RAM).
>>
>> We've seen reports of possible memory leaks before and the potential
>> fixes for those were in 12.2.6. How fast does your MDS reach 15GB?
>> Your MDS cache size should be configured to 1-8GB (depending on your
>> preference) so it's disturbing to see you set it so low.
>>
>
> See also the thread "[ceph-users] Fwd: MDS memory usage is very high",
> which had more discussion of that. The MDS daemon seemingly had 9.5GB of
> allocated RSS but only believed 489MB was in use for the cache...
> -Greg
>
>
>>
>> --
>> Patrick Donnelly


Re: [ceph-users] Mimic 13.2.1 release date

2018-07-23 Thread Sergey Malinin
Looks like we're not getting it soon:
http://tracker.ceph.com/issues/24981


> On 23.07.2018, at 13:45, Wido den Hollander  wrote:
> 
> Any news on this yet? 13.2.1 would be very welcome! :-)
> 
> Wido
> 
> On 07/09/2018 05:11 PM, Wido den Hollander wrote:
>> Hi,
>> 
>> Is there a release date for Mimic 13.2.1 yet?
>> 
>> There are a few issues which currently make deploying with Mimic 13.2.0
>> a bit difficult, for example:
>> 
>> - https://tracker.ceph.com/issues/24423
>> - https://github.com/ceph/ceph/pull/22393
>> 
>> Especially the first one makes it difficult.
>> 
>> 13.2.1 would be very welcome with these fixes in there.
>> 
>> Is there a ETA for this version yet?
>> 
>> Wido


Re: [ceph-users] Insane CPU utilization in ceph.fuse

2018-07-23 Thread Gregory Farnum
On Mon, Jul 23, 2018 at 11:08 AM Patrick Donnelly 
wrote:

> On Mon, Jul 23, 2018 at 5:48 AM, Daniel Carrasco 
> wrote:
> > Hi, thanks for your response.
> >
> > Clients are about 6, and 4 of them are the most of time on standby. Only
> two
> > are active servers that are serving the webpage. Also we've a varnish on
> > front, so are not getting all the load (below 30% in PHP is not much).
> > About the MDS cache, now I've the mds_cache_memory_limit at 8Mb.
>
> What! Please post `ceph daemon mds. config diff`,  `... perf
> dump`, and `... dump_mempools `  from the server the active MDS is on.
>
> > I've tested
> > also 512Mb, but the CPU usage is the same and the MDS RAM usage grows up
> to
> > 15GB (on a 16Gb server it starts to swap and all fails). With 8Mb, at
> least
> > the memory usage is stable on less than 6Gb (now is using about 1GB of
> RAM).
>
> We've seen reports of possible memory leaks before and the potential
> fixes for those were in 12.2.6. How fast does your MDS reach 15GB?
> Your MDS cache size should be configured to 1-8GB (depending on your
> preference) so it's disturbing to see you set it so low.
>

See also the thread "[ceph-users] Fwd: MDS memory usage is very high",
which had more discussion of that. The MDS daemon seemingly had 9.5GB of
allocated RSS but only believed 489MB was in use for the cache...
-Greg


>
> --
> Patrick Donnelly


Re: [ceph-users] Reclaim free space on RBD images that use Bluestore?????

2018-07-23 Thread Ronny Aasen

On 23.07.2018 22:18, Sean Bolding wrote:


I have XenServers that connect via iSCSI to Ceph gateway servers that 
use lrbd and targetcli. On my ceph cluster the RBD images I create are 
used as storage repositories in Xenserver for the virtual machine vdisks.


Whenever I delete a virtual machine, XenServer shows that the 
repository size has decreased. This also happens when I mount a 
virtual drive in Xenserver as a virtual drive in a Windows guest. If I 
delete a large file, such as an exported VM, it shows as deleted and 
space available. However; when check in Ceph  using ceph –s or ceph df 
it still shows the space being used.


I checked everywhere and it seems there was a reference to it here 
https://github.com/ceph/ceph/pull/14727 but not sure if a way to trim 
or discard freed blocks was ever implemented.


The only way I have found is to play musical chairs and move the VMs 
to different repositories and then completely remove the old RBD 
images in ceph. This is not exactly easy to do.


Is there a way to reclaim free space on RBD images that use 
Bluestore? What commands do I use and where do I use this from? If 
such command exist do I run them on the ceph cluster or do I run them 
from XenServer? Please help.


Sean



I am not familiar with Xen, but it does sound like you have an RBD mounted
with a filesystem on the Xen server.
In that case it is the same as for other filesystems: deleted files are
only removed from the file allocation table, and the RBD space is
"reclaimed" when the filesystem discards or zeroes out the now-unused
blocks.

On many filesystems you would run the fstrim command to discard freed
blocks; optionally, mount the fs with the discard option.
In XenServer >= 6.5 there should be a button in XenCenter to reclaim freed
space.



kind regards
Ronny Aasen
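To make Ronny's advice concrete, the guest-side options look roughly like this (device and mount paths are examples, not taken from the thread):

```shell
# One-shot: discard unused blocks on a mounted filesystem.
sudo fstrim -v /mnt/data

# Continuous: mount with the discard option, e.g. an /etc/fstab line:
#   /dev/sda1  /mnt/data  ext4  defaults,discard  0 2

# Either way, the discards only shrink the RBD image if every layer in
# between (hypervisor, iSCSI target) passes SCSI UNMAP through.
```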


[ceph-users] Reclaim free space on RBD images that use Bluestore?????

2018-07-23 Thread Sean Bolding
I have XenServers that connect via iSCSI to Ceph gateway servers that use
lrbd and targetcli. On my ceph cluster the RBD images I create are used as
storage repositories in Xenserver for the virtual machine vdisks. 

 

Whenever I delete a virtual machine, XenServer shows that the repository
size has decreased. This also happens when I mount a virtual drive in
XenServer as a virtual drive in a Windows guest. If I delete a large file,
such as an exported VM, it shows as deleted and the space as available.
However, when I check in Ceph using ceph -s or ceph df, it still shows the
space being used.

 

I checked everywhere and it seems there was a reference to it here
https://github.com/ceph/ceph/pull/14727 but I am not sure whether a way to
trim or discard freed blocks was ever implemented.

 

The only way I have found is to play musical chairs and move the VMs to
different repositories and then completely remove the old RBD images in
ceph. This is not exactly easy to do.

 

Is there a way to reclaim free space on RBD images that use Bluestore?
What commands do I use, and where do I run them from? If such commands
exist, do I run them on the Ceph cluster or from XenServer? Please help.

 

 

Sean

 

 

 

 
 


[ceph-users] Technical Writer - Red Hat Ceph Storage

2018-07-23 Thread Kenneth Hartsoe
Hello, posting for greater visibility of this opportunity; thank you.

Technical Writer - Red Hat Ceph Storage
US-MA-Boston
Posting date (7/19/2018 2:14 AM)

Job ID: 64257

Category: Product Documentation

URL: 
https://us-redhat.icims.com/jobs/64257/technical-writer---red-hat-ceph-storage/job?hub=7=false=1170=500=true=false=-480=-420

Ken Hartsoe 
Senior Content Strategist 
Red Hat Storage Documentation 

Raleigh, North Carolina
khart...@redhat.com; IRC: khartsoe 
 



Re: [ceph-users] Insane CPU utilization in ceph.fuse

2018-07-23 Thread Daniel Carrasco
Hi,

I forgot to say that the diff may show a lower value than the real one
(8 MB), because memory usage was still high and I had prepared a new
configuration with a lower limit (5 MB). I have not reloaded the daemons
yet, but maybe the configuration was loaded again today, and that is why it
is using less than 1 GB of RAM just now. Of course I have not rebooted the
machine, but if the daemon was killed for high memory usage, then the new
configuration would have been loaded.

Greetings!


2018-07-23 21:07 GMT+02:00 Daniel Carrasco :

> Thanks!,
>
> It's true that I've seen a continuous memory growth, but I've not thought
> in a memory leak. I don't remember exactly how many hours were neccesary to
> fill the memory, but I calculate that were about 14h.
>
> With the new configuration looks like memory grows slowly and when it
> reaches 5-6 GB stops. Sometimes looks like the daemon flush the memory and
> down again to less than 1Gb grown again to 5-6Gb slowly.
>
> Just today I don't know why and how, because I've not changed anything on
> the ceph cluster, but the memory has down to less than 1 Gb and still there
> 8 hours later. I've only deployed a git repository with some changes.
>
> I've some nodes on version 12.2.5 because I've detected this problem and I
> didn't know if was for the latest version, so I've stopped the update. The
> one that is the active MDS is on latest version (12.2.7), and I've
> programmed an update for the rest of nodes the thursday.
>
> A graphic of the memory usage of latest days with that configuration:
> https://imgur.com/a/uSsvBi4
>
> I haven't info about when the problem was worst (512MB of MDS memory limit
> and 15-16Gb of usage), because memory usage was not logged. I've only a
> heap stats from that were dumped when the daemon was in progress to fill
> the memory:
>
> # ceph tell mds.kavehome-mgto-pro-fs01  heap stats
> 2018-07-19 00:43:46.142560 7f5a7a7fc700  0 client.1318388 ms_handle_reset
> on 10.22.0.168:6800/1129848128
> 2018-07-19 00:43:46.181133 7f5a7b7fe700  0 client.1318391 ms_handle_reset
> on 10.22.0.168:6800/1129848128
> mds.kavehome-mgto-pro-fs01 tcmalloc heap stats:
> 
> MALLOC: 9982980144 ( 9520.5 MiB) Bytes in use by application
> MALLOC: +0 (0.0 MiB) Bytes in page heap freelist
> MALLOC: +172148208 (  164.2 MiB) Bytes in central cache freelist
> MALLOC: + 19031168 (   18.1 MiB) Bytes in transfer cache freelist
> MALLOC: + 23987552 (   22.9 MiB) Bytes in thread cache freelists
> MALLOC: + 20869280 (   19.9 MiB) Bytes in malloc metadata
> MALLOC:   
> MALLOC: =  10219016352 ( 9745.6 MiB) Actual memory used (physical + swap)
> MALLOC: +   3913687040 ( 3732.4 MiB) Bytes released to OS (aka unmapped)
> MALLOC:   
> MALLOC: =  14132703392 (13478.0 MiB) Virtual address space used
> MALLOC:
> MALLOC:  63875  Spans in use
> MALLOC: 16  Thread heaps in use
> MALLOC:   8192  Tcmalloc page size
> 
> Call ReleaseFreeMemory() to release freelist memory to the OS (via
> madvise()).
> Bytes released to the OS take up virtual address space but no physical
> memory.
>
>
>
> Here's the Diff:
> 
> 
> {
> "diff": {
> "current": {
> "admin_socket": "/var/run/ceph/ceph-mds.
> kavehome-mgto-pro-fs01.asok",
> "auth_client_required": "cephx",
> "bluestore_cache_size_hdd": "80530636",
> "bluestore_cache_size_ssd": "80530636",
> "err_to_stderr": "true",
> "fsid": "f015f888-6e0c-4203-aea8-ef0f69ef7bd8",
> "internal_safe_to_start_threads": "true",
> "keyring": "/var/lib/ceph/mds/ceph-kavehome-mgto-pro-fs01/
> keyring",
> "log_file": "/var/log/ceph/ceph-mds.
> kavehome-mgto-pro-fs01.log",
> "log_max_recent": "1",
> "log_to_stderr": "false",
> "mds_cache_memory_limit": "53687091",
> "mds_data": "/var/lib/ceph/mds/ceph-kavehome-mgto-pro-fs01",
> "mgr_data": "/var/lib/ceph/mgr/ceph-kavehome-mgto-pro-fs01",
> "mon_cluster_log_file": "default=/var/log/ceph/ceph.$channel.log
> cluster=/var/log/ceph/ceph.log",
> "mon_data": "/var/lib/ceph/mon/ceph-kavehome-mgto-pro-fs01",
> "mon_debug_dump_location": "/var/log/ceph/ceph-mds.
> kavehome-mgto-pro-fs01.tdump",
> "mon_host": "10.22.0.168,10.22.0.140,10.22.0.127",
> "mon_initial_members": "kavehome-mgto-pro-fs01,
> kavehome-mgto-pro-fs02, kavehome-mgto-pro-fs03",
> "osd_data": "/var/lib/ceph/osd/ceph-kavehome-mgto-pro-fs01",
> "osd_journal": "/var/lib/ceph/osd/ceph-kavehome-mgto-pro-fs01/
> journal",
> "public_addr": "10.22.0.168:0/0",
> 

Re: [ceph-users] Fwd: MDS memory usage is very high

2018-07-23 Thread Daniel Carrasco
Hi,

I forgot to say that the diff may show a lower value than the real one
(8 MB), because memory usage was still high and I had prepared a new
configuration with a lower limit (5 MB). I have not reloaded the daemons
yet, but maybe the configuration was loaded again today, and that is why it
is using less than 1 GB of RAM just now. Of course I have not rebooted the
machine, but if the daemon was killed for high memory usage, then the new
configuration would have been loaded.

Greetings!
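An editorial aside on the numbers in this thread: the cache limits are configured in bytes (536870910 and 53687091 in the configs quoted below), so the "512Mb"/"5Mb" shorthand understates the new value by roughly a factor of ten. A quick check in plain Python:

```python
# Sanity-check the cache limits quoted in this thread. The ceph.conf
# values are bytes; the "Mb" shorthand used in the discussion is off by
# roughly a factor of ten for the lowered limit.
def to_mib(num_bytes):
    """Convert a byte count to binary megabytes (MiB)."""
    return num_bytes / 2**20

old_limit = 536870910  # mds_cache_memory_limit before the change
new_limit = 53687091   # the same limit after dividing by 10

print(f"old limit: {to_mib(old_limit):.1f} MiB")  # 512.0 MiB
print(f"new limit: {to_mib(new_limit):.1f} MiB")  # 51.2 MiB, not 5 MB
```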

2018-07-19 11:35 GMT+02:00 Daniel Carrasco :

> Hello again,
>
> It is still early to say that is working fine now, but looks like the MDS
> memory is now under 20% of RAM and the most of time between 6-9%. Maybe was
> a mistake on configuration.
>
> As appointment, I've changed this client config:
> [global]
> ...
> bluestore_cache_size_ssd = 805306360
> bluestore_cache_size_hdd = 805306360
> mds_cache_memory_limit = 536870910
>
> [client]
>   client_reconnect_stale = true
>   client_cache_size = 32768
>   client_mount_timeout = 30
>   client_oc_max_objects = 2000
>   client_oc_size = 629145600
>   rbd_cache = true
>   rbd_cache_size = 671088640
>
>
> for this (just client cache sizes / 10):
> [global]
> ...
> bluestore_cache_size_ssd = 80530636
> bluestore_cache_size_hdd = 80530636
> mds_cache_memory_limit = 53687091
>
> [client]
>   client_cache_size = 32768
>   client_mount_timeout = 30
>   client_oc_max_objects = 2000
>   client_oc_size = 62914560
>   rbd_cache = true
>   rbd_cache_size = 67108864
>
>
>
> Now the heap stats are:
> mds.kavehome-mgto-pro-fs01 tcmalloc heap stats:
> 
> MALLOC:  714063568 (  681.0 MiB) Bytes in use by application
> MALLOC: +0 (0.0 MiB) Bytes in page heap freelist
> MALLOC: +132992224 (  126.8 MiB) Bytes in central cache freelist
> MALLOC: + 21929920 (   20.9 MiB) Bytes in transfer cache freelist
> MALLOC: + 31806608 (   30.3 MiB) Bytes in thread cache freelists
> MALLOC: + 30666912 (   29.2 MiB) Bytes in malloc metadata
> MALLOC:   
> MALLOC: =931459232 (  888.3 MiB) Actual memory used (physical + swap)
> MALLOC: +  21886803968 (20872.9 MiB) Bytes released to OS (aka unmapped)
> MALLOC:   
> MALLOC: =  22818263200 (21761.2 MiB) Virtual address space used
> MALLOC:
> MALLOC:  21311  Spans in use
> MALLOC: 18  Thread heaps in use
> MALLOC:   8192  Tcmalloc page size
> 
> Call ReleaseFreeMemory() to release freelist memory to the OS (via
> madvise()).
> Bytes released to the OS take up virtual address space but no physical
> memory.
>
> And sometimes even better (taken later than above):
> mds.kavehome-mgto-pro-fs01 tcmalloc heap stats:
> 
> MALLOC:  516434072 (  492.5 MiB) Bytes in use by application
> MALLOC: +0 (0.0 MiB) Bytes in page heap freelist
> MALLOC: +  7564936 (7.2 MiB) Bytes in central cache freelist
> MALLOC: +  2751072 (2.6 MiB) Bytes in transfer cache freelist
> MALLOC: +  2707072 (2.6 MiB) Bytes in thread cache freelists
> MALLOC: +  2715808 (2.6 MiB) Bytes in malloc metadata
> MALLOC:   
> MALLOC: =532172960 (  507.5 MiB) Actual memory used (physical + swap)
> MALLOC: +   573440 (0.5 MiB) Bytes released to OS (aka unmapped)
> MALLOC:   
> MALLOC: =532746400 (  508.1 MiB) Virtual address space used
> MALLOC:
> MALLOC:  21990  Spans in use
> MALLOC: 16  Thread heaps in use
> MALLOC:   8192  Tcmalloc page size
> 
> Call ReleaseFreeMemory() to release freelist memory to the OS (via
> madvise()).
> Bytes released to the OS take up virtual address space but no physical
> memory.
>
>
> Greetings!!
>
> 2018-07-19 10:24 GMT+02:00 Daniel Carrasco :
>
>> Hello,
>>
>> Finally I've to remove CephFS and use a simple NFS, because the MDS
>> daemon starts to use a lot of memory and is unstable. After reboot one node
>> because it started to swap (the cluster will be able to survive without a
>> node), the cluster goes down because one of the other MDS starts to use
>> about 15Gb of RAM and crash all the time, so the cluster is unable to come
>> back. The only solution is to reboot all nodes and is not good for HA.
>>
>> If somebody knows something about this, I'll be pleased to test it on a
>> test environment to see if we can find a solution.
>>
>> Greetings!
>>
>> 2018-07-19 1:07 GMT+02:00 Daniel Carrasco :
>>
>>> Thanks again,
>>>
>>> I was trying to use fuse client instead Ubuntu 16.04 kernel module to
>>> see if maybe is a client side problem, but CPU usage on fuse client is very
>>> high (a 100% and even more in a two cores machine), so I'd to rever to
>>> kernel client that uses much less CPU.
>>>
>>> Is a web server, so maybe 

Re: [ceph-users] Insane CPU utilization in ceph.fuse

2018-07-23 Thread Daniel Carrasco
Thanks!,

It's true that I have seen continuous memory growth, but I had not thought
of a memory leak. I don't remember exactly how many hours were necessary to
fill the memory, but I estimate about 14 hours.

With the new configuration it looks like memory grows slowly and stops when
it reaches 5-6 GB. Sometimes the daemon seems to flush the memory, dropping
again to less than 1 GB and growing back to 5-6 GB slowly.

Just today, I don't know why or how, because I have not changed anything on
the Ceph cluster, the memory dropped to less than 1 GB and is still there
8 hours later. I have only deployed a git repository with some changes.

Some nodes are still on version 12.2.5 because I detected this problem and
did not know whether it was caused by the latest version, so I stopped the
update. The node that is the active MDS is on the latest version (12.2.7),
and I have scheduled an update for the rest of the nodes on Thursday.

A graph of the memory usage over the last few days with that configuration:
https://imgur.com/a/uSsvBi4

I have no data from when the problem was at its worst (512 MB MDS memory
limit and 15-16 GB of usage), because memory usage was not logged. I only
have heap stats that were dumped while the daemon was in the process of
filling the memory:

# ceph tell mds.kavehome-mgto-pro-fs01  heap stats
2018-07-19 00:43:46.142560 7f5a7a7fc700  0 client.1318388 ms_handle_reset on
 10.22.0.168:6800/1129848128
2018-07-19 00:43:46.181133 7f5a7b7fe700  0 client.1318391 ms_handle_reset on
 10.22.0.168:6800/1129848128
mds.kavehome-mgto-pro-fs01 tcmalloc heap stats:

MALLOC: 9982980144 ( 9520.5 MiB) Bytes in use by application
MALLOC: +0 (0.0 MiB) Bytes in page heap freelist
MALLOC: +172148208 (  164.2 MiB) Bytes in central cache freelist
MALLOC: + 19031168 (   18.1 MiB) Bytes in transfer cache freelist
MALLOC: + 23987552 (   22.9 MiB) Bytes in thread cache freelists
MALLOC: + 20869280 (   19.9 MiB) Bytes in malloc metadata
MALLOC:   
MALLOC: =  10219016352 ( 9745.6 MiB) Actual memory used (physical + swap)
MALLOC: +   3913687040 ( 3732.4 MiB) Bytes released to OS (aka unmapped)
MALLOC:   
MALLOC: =  14132703392 (13478.0 MiB) Virtual address space used
MALLOC:
MALLOC:  63875  Spans in use
MALLOC: 16  Thread heaps in use
MALLOC:   8192  Tcmalloc page size

Call ReleaseFreeMemory() to release freelist memory to the OS (via
madvise()).
Bytes released to the OS take up virtual address space but no physical
memory.
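Since the heap dump above is plain text, a small parser makes it easier to track these numbers over time instead of eyeballing them. A sketch in plain Python (the sample lines are copied from the dump above; the format is tcmalloc's, so treat the regex as an assumption about that format):

```python
import re

# Two lines copied verbatim from the tcmalloc heap dump above.
sample = """\
MALLOC: 9982980144 ( 9520.5 MiB) Bytes in use by application
MALLOC: +   3913687040 ( 3732.4 MiB) Bytes released to OS (aka unmapped)
"""

def parse_heap_stats(text):
    """Map each MALLOC stat label to its byte count."""
    stats = {}
    for m in re.finditer(r"MALLOC:\s*[+=]?\s*(\d+)\s*\(\s*[\d.]+ MiB\) (.+)", text):
        stats[m.group(2).strip()] = int(m.group(1))
    return stats

stats = parse_heap_stats(sample)
in_use = stats["Bytes in use by application"]
unmapped = stats["Bytes released to OS (aka unmapped)"]
print(f"in use:   {in_use / 2**20:.1f} MiB")
print(f"unmapped: {unmapped / 2**20:.1f} MiB")
```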



Here's the Diff:

{
"diff": {
"current": {
"admin_socket":
"/var/run/ceph/ceph-mds.kavehome-mgto-pro-fs01.asok",
"auth_client_required": "cephx",
"bluestore_cache_size_hdd": "80530636",
"bluestore_cache_size_ssd": "80530636",
"err_to_stderr": "true",
"fsid": "f015f888-6e0c-4203-aea8-ef0f69ef7bd8",
"internal_safe_to_start_threads": "true",
"keyring":
"/var/lib/ceph/mds/ceph-kavehome-mgto-pro-fs01/keyring",
"log_file": "/var/log/ceph/ceph-mds.kavehome-mgto-pro-fs01.log",
"log_max_recent": "1",
"log_to_stderr": "false",
"mds_cache_memory_limit": "53687091",
"mds_data": "/var/lib/ceph/mds/ceph-kavehome-mgto-pro-fs01",
"mgr_data": "/var/lib/ceph/mgr/ceph-kavehome-mgto-pro-fs01",
"mon_cluster_log_file":
"default=/var/log/ceph/ceph.$channel.log cluster=/var/log/ceph/ceph.log",
"mon_data": "/var/lib/ceph/mon/ceph-kavehome-mgto-pro-fs01",
"mon_debug_dump_location":
"/var/log/ceph/ceph-mds.kavehome-mgto-pro-fs01.tdump",
"mon_host": "10.22.0.168,10.22.0.140,10.22.0.127",
"mon_initial_members": "kavehome-mgto-pro-fs01,
kavehome-mgto-pro-fs02, kavehome-mgto-pro-fs03",
"osd_data": "/var/lib/ceph/osd/ceph-kavehome-mgto-pro-fs01",
"osd_journal":
"/var/lib/ceph/osd/ceph-kavehome-mgto-pro-fs01/journal",
"public_addr": "10.22.0.168:0/0",
"public_network": "10.22.0.0/24",
"rgw_data": "/var/lib/ceph/radosgw/ceph-kavehome-mgto-pro-fs01",
"setgroup": "ceph",
"setuser": "ceph"
},
"defaults": {
"admin_socket": "",
"auth_client_required": "cephx, none",
"bluestore_cache_size_hdd": "1073741824",
"bluestore_cache_size_ssd": "3221225472",
"err_to_stderr": "false",
"fsid": "----",
"internal_safe_to_start_threads": "false",
"keyring":
"/etc/ceph/$cluster.$name.keyring,/etc/ceph/$cluster.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,",
"log_file": "",
  

Re: [ceph-users] JBOD question

2018-07-23 Thread Satish Patel
I am planning to buy an "LSI SAS 9207-8i". Does anyone know whether it
supports RAID and JBOD mode together, so I can do RAID-1 on the OS disks
and JBOD on the other disks?

On Sat, Jul 21, 2018 at 11:16 AM, Willem Jan Withagen  wrote:
> On 21/07/2018 01:45, Oliver Freyermuth wrote:
>>
>> Hi Satish,
>>
>> that really completely depends on your controller.
>>
>
> This is what I get on an older AMCC 9550 controller.
> Note that the disk type is set to JBOD. But the disk descriptors are hidden.
> And you'll never know what more is not done right.
>
> Geom name: da6
> Providers:
> 1. Name: da6
>Mediasize: 1000204886016 (932G)
>Sectorsize: 512
>Mode: r1w1e2
>descr: AMCC 9550SXU-8L DISK
>lunname: AMCCZ1N00KBD
>lunid: AMCCZ1N00KBD
>ident: Z1N00KBD
>rotationrate: unknown
>fwsectors: 63
>fwheads: 255
>
> This is an LSI 9802 controller in IT mode:
> (And that gives me a bit more faith)
> Geom name: da7
> Providers:
> 1. Name: da7
>Mediasize: 3000592982016 (2.7T)
>Sectorsize: 512
>Mode: r1w1e1
>descr: WDC WD30EFRX-68AX9N0
>lunid: 0004d927f870
>ident: WD-WMC1T4088693
>rotationrate: unknown
>fwsectors: 63
>fwheads: 255
>
> --WjW


Re: [ceph-users] Why lvm is recommended method for bluestore

2018-07-23 Thread Satish Patel
Alfredo,

Thanks, I think i should go with LVM then :)

I have a question here. I have 4 physical SSDs per server; for some reason I
am using ceph-ansible 3.0.8, which doesn't create the LVM volumes itself, so
I have to create them manually.

I am using bluestore (I want to keep the WAL/DB on the same data disk). How
do I create the LVM volumes manually on a single physical disk? Do I need to
create two logical volumes (one for the journal and one for data)?

I am reading this
http://docs.ceph.com/ceph-ansible/master/osds/scenarios.html (at
bottom)

lvm_volumes:
  - data: data-lv1
data_vg: vg1
crush_device_class: foo


In the above example, did they create vg1 (the volume group) and data-lv1
(the logical volume) beforehand? If I want to add a journal, do I need to
create one more logical volume? I am confused by that document, so I need
some clarification.
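For what it's worth, with bluestore and the WAL/DB colocated on the data disk
you should only need a single data LV per SSD (the separate journal LV is a
filestore concept). A minimal sketch of the manual steps, with a hypothetical
device name and the destructive LVM commands left commented out:

```shell
# Hypothetical device and names -- adjust to your hardware.
DISK=/dev/sdb
VG=vg1
LV=data-lv1

# Destructive; run manually once per SSD:
# pvcreate "$DISK"
# vgcreate "$VG" "$DISK"
# lvcreate -l 100%FREE -n "$LV" "$VG"

# Matching ceph-ansible entry (bluestore, WAL/DB colocated on the data LV,
# so no separate journal/db/wal volume is listed):
cat <<EOF
lvm_volumes:
  - data: $LV
    data_vg: $VG
EOF
```

If you later want the DB/WAL on a separate device, that is when you would add
extra LVs and the corresponding `db:`/`wal:` keys to the entry.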

On Mon, Jul 23, 2018 at 2:06 PM, Alfredo Deza  wrote:
> On Mon, Jul 23, 2018 at 1:56 PM, Satish Patel  wrote:
>> This is a great explanation. Based on your details it looks like when an
>> OSD node reboots it will take a long time to initialize all of the OSDs,
>> but if we use LVM it shortens that time.
>
> That is one aspect, yes. Most importantly: all OSDs will consistently
> come up with ceph-volume. This wasn't the case with ceph-disk and it
> was impossible to
> replicate or understand why (hence the 3 hour timeout)
>
>>
>> There is a good chance that LVM impacts performance because of the extra
>> layer. Does anyone have any data that can provide some insight into good
>> or bad performance? It would be great if you shared it, to help us
>> understand the impact.
>
> There isn't performance impact, and if there is, it is negligible.
>
>>
>>
>>
>> On Mon, Jul 23, 2018 at 8:37 AM, Alfredo Deza  wrote:
>>> On Mon, Jul 23, 2018 at 6:09 AM, Nicolas Huillard  
>>> wrote:
 Le dimanche 22 juillet 2018 à 09:51 -0400, Satish Patel a écrit :
> I read that post and that's why I open this thread for few more
> questions and clearence,
>
> When you said OSD doesn't come up what actually that means?  After
> reboot of node or after service restart or installation of new disk?
>
> You said we are using manual method what is that?
>
> I'm building new cluster and had zero prior experience so how can I
> produce this error to see lvm is really life saving tool here? I'm
> sure there are plenty of people using but I didn't find and good
> document except that mailing list which raising more questions in my
> mind.

 When I had to change a few drives manually, copying the old contents
 over, I noticed that the logical volumes are tagged with lots of
 information related to how they should be handled at boot time by the
 OSD startup system.
 These LVM tags are a good standard way to add that meta-data within the
 volumes themselves. Apparently, there is no other way to add these tags
 that allow for bluestore/filestore, SATA/SAS/NVMe, whole drive or
 partition, etc.
 They are easy to manage and fail-safe in many configurations.
>>>
>>> This is spot on. To clarify even further, let me give a brief overview
>>> of how that worked with ceph-disk and GPT GUID:
>>>
>>> * at creation time, ceph-disk would add a GUID to the partitions so
>>> that it would later be recognized. These GUID were unique so they
>>> would ensure accuracy
>>> * a set of udev rules would be in place to detect when these GUID
>>> would become available in the system
>>> * at boot time, udev would start detecting devices coming online, and
>>> the rules would call out to ceph-disk (the executable)
>>> * the ceph-disk executable would then call out to the ceph-disk
>>> systemd unit, with a timeout of three hours the device to which it was
>>> assigned (e.g. ceph-disk@/dev/sda )
>>> * the previous step would be done *per device*, waiting for all
>>> devices associated with the OSD to become available (hence the 3 hour
>>> timeout)
>>> * the ceph-disk systemd unit would call back again to the ceph-disk
>>> command line tool signaling devices are ready (with --sync)
>>> * the ceph-disk command line tool would call *the ceph-disk command
>>> line tool again* to "activate" the OSD, having detected (finally) the
>>> device type (encrypted, partially prepared, etc...)
>>>
>>> The above workflow worked for pre-systemd systems, it could've
>>> probably be streamlined better, but it was what allowed to "discover"
>>> devices at boot time. The 3 hour timeout was there because
>>> udev would find these devices being active asynchronously, and
>>> ceph-disk was trying to coerce a more synchronous behavior to get all
>>> devices needed. In a dense OSD node, this meant that OSDs
>>> would not come up at all, inconsistently (sometimes all of them would 
>>> work!).
>>>
>>> Device discovery is a tremendously complicated and difficult problem
>>> to solve, and we thought that a few simple rules with UDEV would be
>>> the answer (they weren't). 

Re: [ceph-users] Insane CPU utilization in ceph.fuse

2018-07-23 Thread Patrick Donnelly
On Mon, Jul 23, 2018 at 5:48 AM, Daniel Carrasco  wrote:
> Hi, thanks for your response.
>
> Clients are about 6, and 4 of them are the most of time on standby. Only two
> are active servers that are serving the webpage. Also we've a varnish on
> front, so are not getting all the load (below 30% in PHP is not much).
> About the MDS cache, now I've the mds_cache_memory_limit at 8Mb.

What! Please post `ceph daemon mds.<id> config diff`, `ceph daemon mds.<id>
perf dump`, and `ceph daemon mds.<id> dump_mempools` from the server the
active MDS is on.

> I've tested
> also 512Mb, but the CPU usage is the same and the MDS RAM usage grows up to
> 15GB (on a 16Gb server it starts to swap and all fails). With 8Mb, at least
> the memory usage is stable on less than 6Gb (now is using about 1GB of RAM).

We've seen reports of possible memory leaks before and the potential
fixes for those were in 12.2.6. How fast does your MDS reach 15GB?
Your MDS cache size should be configured to 1-8GB (depending on your
preference) so it's disturbing to see you set it so low.
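For reference, the diagnostic commands above plus a sketch of raising the
limit to the low end of that range (`mds.<id>` is a placeholder for your MDS
name; `mds_cache_memory_limit` takes a value in bytes, which is why "8Mb"
ends up being a tiny cache):

```shell
# Diagnostics -- run on the host with the active MDS (mds.<id> is a
# placeholder), via the admin socket:
# ceph daemon mds.<id> config diff
# ceph daemon mds.<id> perf dump
# ceph daemon mds.<id> dump_mempools

# Compute 1 GiB in bytes, the low end of the recommended 1-8 GB range:
LIMIT=$((1 * 1024 * 1024 * 1024))
echo "$LIMIT"    # 1073741824
# ceph daemon mds.<id> config set mds_cache_memory_limit "$LIMIT"
```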

-- 
Patrick Donnelly


Re: [ceph-users] Why lvm is recommended method for bluestore

2018-07-23 Thread Alfredo Deza
On Mon, Jul 23, 2018 at 1:56 PM, Satish Patel  wrote:
> This is a great explanation. Based on your details it looks like when an
> OSD node reboots it will take a long time to initialize all of the OSDs,
> but if we use LVM it shortens that time.

That is one aspect, yes. Most importantly: all OSDs will consistently
come up with ceph-volume. This wasn't the case with ceph-disk and it
was impossible to
replicate or understand why (hence the 3 hour timeout)

>
> There is a good chance that LVM impacts performance because of the extra
> layer. Does anyone have any data that can provide some insight into good
> or bad performance? It would be great if you shared it, to help us
> understand the impact.

There isn't performance impact, and if there is, it is negligible.

>
>
>
> On Mon, Jul 23, 2018 at 8:37 AM, Alfredo Deza  wrote:
>> On Mon, Jul 23, 2018 at 6:09 AM, Nicolas Huillard  
>> wrote:
>>> Le dimanche 22 juillet 2018 à 09:51 -0400, Satish Patel a écrit :
 I read that post and that's why I open this thread for few more
 questions and clearence,

 When you said OSD doesn't come up what actually that means?  After
 reboot of node or after service restart or installation of new disk?

 You said we are using manual method what is that?

 I'm building new cluster and had zero prior experience so how can I
 produce this error to see lvm is really life saving tool here? I'm
 sure there are plenty of people using but I didn't find and good
 document except that mailing list which raising more questions in my
 mind.
>>>
>>> When I had to change a few drives manually, copying the old contents
>>> over, I noticed that the logical volumes are tagged with lots of
>>> information related to how they should be handled at boot time by the
>>> OSD startup system.
>>> These LVM tags are a good standard way to add that meta-data within the
>>> volumes themselves. Apparently, there is no other way to add these tags
>>> that allow for bluestore/filestore, SATA/SAS/NVMe, whole drive or
>>> partition, etc.
>>> They are easy to manage and fail-safe in many configurations.
>>
>> This is spot on. To clarify even further, let me give a brief overview
>> of how that worked with ceph-disk and GPT GUID:
>>
>> * at creation time, ceph-disk would add a GUID to the partitions so
>> that it would later be recognized. These GUID were unique so they
>> would ensure accuracy
>> * a set of udev rules would be in place to detect when these GUID
>> would become available in the system
>> * at boot time, udev would start detecting devices coming online, and
>> the rules would call out to ceph-disk (the executable)
>> * the ceph-disk executable would then call out to the ceph-disk
>> systemd unit, with a timeout of three hours the device to which it was
>> assigned (e.g. ceph-disk@/dev/sda )
>> * the previous step would be done *per device*, waiting for all
>> devices associated with the OSD to become available (hence the 3 hour
>> timeout)
>> * the ceph-disk systemd unit would call back again to the ceph-disk
>> command line tool signaling devices are ready (with --sync)
>> * the ceph-disk command line tool would call *the ceph-disk command
>> line tool again* to "activate" the OSD, having detected (finally) the
>> device type (encrypted, partially prepared, etc...)
>>
>> The above workflow worked for pre-systemd systems, it could've
>> probably be streamlined better, but it was what allowed to "discover"
>> devices at boot time. The 3 hour timeout was there because
>> udev would find these devices being active asynchronously, and
>> ceph-disk was trying to coerce a more synchronous behavior to get all
>> devices needed. In a dense OSD node, this meant that OSDs
>> would not come up at all, inconsistently (sometimes all of them would work!).
>>
>> Device discovery is a tremendously complicated and difficult problem
>> to solve, and we thought that a few simple rules with UDEV would be
>> the answer (they weren't). The LVM implementation of ceph-volume
>> limits itself to just ask LVM about devices and then gets them
>> "activated" at once. On some tests on nodes with ~20 OSDs, we were 10x
>> faster to come up (compared to ceph-disk), and fully operational -
>> every time.
>>
>> Since this is a question that keeps coming up, and answers are now
>> getting a bit scattered, I'll compound them all into a section in the
>> docs. I'll try to address the "layer of complexity", "performance
>> overhead", and other
>> recurring issues that keep being used.
>>
>> Any other ideas are welcomed if some of the previously discussed
>> things are still not entirely clear.
>>
>>>
 Sent from my iPhone

 > On Jul 22, 2018, at 6:31 AM, Marc Roos 
 > wrote:
 >
 >
 >
 > I don’t think it will get any more basic than that. Or maybe this?
 > If
 > the doctor diagnoses you, you can either accept this, get 2nd
 > opinion,
 > or study medicine to verify 

Re: [ceph-users] Why lvm is recommended method for bluestore

2018-07-23 Thread Satish Patel
This is a great explanation. Based on your details it looks like when an OSD
node reboots it will take a long time to initialize all of the OSDs, but if
we use LVM it shortens that time.

There is a good chance that LVM impacts performance because of the extra
layer. Does anyone have any data that can provide some insight into good or
bad performance? It would be great if you shared it, to help us understand
the impact.



On Mon, Jul 23, 2018 at 8:37 AM, Alfredo Deza  wrote:
> On Mon, Jul 23, 2018 at 6:09 AM, Nicolas Huillard  
> wrote:
>> Le dimanche 22 juillet 2018 à 09:51 -0400, Satish Patel a écrit :
>>> I read that post and that's why I open this thread for few more
>>> questions and clearence,
>>>
>>> When you said OSD doesn't come up what actually that means?  After
>>> reboot of node or after service restart or installation of new disk?
>>>
>>> You said we are using manual method what is that?
>>>
>>> I'm building new cluster and had zero prior experience so how can I
>>> produce this error to see lvm is really life saving tool here? I'm
>>> sure there are plenty of people using but I didn't find and good
>>> document except that mailing list which raising more questions in my
>>> mind.
>>
>> When I had to change a few drives manually, copying the old contents
>> over, I noticed that the logical volumes are tagged with lots of
>> information related to how they should be handled at boot time by the
>> OSD startup system.
>> These LVM tags are a good standard way to add that meta-data within the
>> volumes themselves. Apparently, there is no other way to add these tags
>> that allow for bluestore/filestore, SATA/SAS/NVMe, whole drive or
>> partition, etc.
>> They are easy to manage and fail-safe in many configurations.
>
> This is spot on. To clarify even further, let me give a brief overview
> of how that worked with ceph-disk and GPT GUID:
>
> * at creation time, ceph-disk would add a GUID to the partitions so
> that it would later be recognized. These GUID were unique so they
> would ensure accuracy
> * a set of udev rules would be in place to detect when these GUID
> would become available in the system
> * at boot time, udev would start detecting devices coming online, and
> the rules would call out to ceph-disk (the executable)
> * the ceph-disk executable would then call out to the ceph-disk
> systemd unit, with a timeout of three hours the device to which it was
> assigned (e.g. ceph-disk@/dev/sda )
> * the previous step would be done *per device*, waiting for all
> devices associated with the OSD to become available (hence the 3 hour
> timeout)
> * the ceph-disk systemd unit would call back again to the ceph-disk
> command line tool signaling devices are ready (with --sync)
> * the ceph-disk command line tool would call *the ceph-disk command
> line tool again* to "activate" the OSD, having detected (finally) the
> device type (encrypted, partially prepared, etc...)
>
> The above workflow worked for pre-systemd systems, it could've
> probably be streamlined better, but it was what allowed to "discover"
> devices at boot time. The 3 hour timeout was there because
> udev would find these devices being active asynchronously, and
> ceph-disk was trying to coerce a more synchronous behavior to get all
> devices needed. In a dense OSD node, this meant that OSDs
> would not come up at all, inconsistently (sometimes all of them would work!).
>
> Device discovery is a tremendously complicated and difficult problem
> to solve, and we thought that a few simple rules with UDEV would be
> the answer (they weren't). The LVM implementation of ceph-volume
> limits itself to just ask LVM about devices and then gets them
> "activated" at once. On some tests on nodes with ~20 OSDs, we were 10x
> faster to come up (compared to ceph-disk), and fully operational -
> every time.
>
> Since this is a question that keeps coming up, and answers are now
> getting a bit scattered, I'll compound them all into a section in the
> docs. I'll try to address the "layer of complexity", "performance
> overhead", and other
> recurring issues that keep being used.
>
> Any other ideas are welcomed if some of the previously discussed
> things are still not entirely clear.
>
>>
>>> Sent from my iPhone
>>>
>>> > On Jul 22, 2018, at 6:31 AM, Marc Roos 
>>> > wrote:
>>> >
>>> >
>>> >
>>> > I don’t think it will get any more basic than that. Or maybe this?
>>> > If
>>> > the doctor diagnoses you, you can either accept this, get 2nd
>>> > opinion,
>>> > or study medicine to verify it.
>>> >
>>> > In short lvm has been introduced to solve some issues of related
>>> > to
>>> > starting osd's (which I did not have, probably because of a
>>> > 'manual'
>>> > configuration). And it opens the ability to support (more future)
>>> > devices.
>>> >
>>> > I gave you two links, did you read the whole thread?
>>> > https://www.mail-archive.com/ceph-users@lists.ceph.com/msg47802.htm
>>> > l
>>> >
>>> >
>>> >
>>> >
>>> >
>>> > 

Re: [ceph-users] Cephfs kernel driver availability

2018-07-23 Thread Michael Kuriger
If you're using CentOS/RHEL you can try the elrepo kernels

Mike Kuriger 



-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of John 
Spray
Sent: Monday, July 23, 2018 5:07 AM
To: Bryan Henderson
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Cephfs kernel driver availability

On Sun, Jul 22, 2018 at 9:03 PM Bryan Henderson  wrote:
>
> Is there some better place to get a filesystem driver for the longterm
> stable Linux kernel (3.16) than the regular kernel.org source distribution?

The general advice[1] on this is not to try to use a 3.x kernel with
CephFS.  The only exception is if your distro provider is doing special
backports (the latest RHEL releases have CephFS backports).  This causes
some confusion, because a number of distros have shipped "stable" kernels
with older, known-unstable CephFS code.

If you're building your own kernels then you definitely want to be on
a recent 4.x

John

1. http://docs.ceph.com/docs/master/cephfs/best-practices/#which-kernel-version

> The reason I ask is that I have been trying to get some clients running
> Linux kernel 3.16 (the current long term stable Linux kernel) and so far
> I have run into two serious bugs that, it turns out, were found and fixed
> years ago in more current mainline kernels.
>
> In both cases, I emailed Ben Hutchings, the apparent maintainer of 3.16,
> asking if the fixes could be added to 3.16, but was met with silence.  This
> leads me to believe that there are many more bugs in the 3.16 cephfs
> filesystem driver waiting for me.  Indeed, I've seen panics not yet explained.
>
> So what are other people using?  A less stable kernel?  An out-of-tree driver?
> FUSE?  Is there a working process for getting known bugs fixed in 3.16?
>
> --
> Bryan Henderson   San Jose, California
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Omap warning in 12.2.6

2018-07-23 Thread Brent Kennedy
Thanks for the heads up.  I upgraded the cluster to 12.2.7 and the message went 
away.  No CRC errors luckily.

 

-Brent

 

From: Brady Deetz [mailto:bde...@gmail.com] 
Sent: Thursday, July 19, 2018 3:26 PM
To: Brent Kennedy 
Cc: ceph-users 
Subject: Re: [ceph-users] Omap warning in 12.2.6

 

12.2.6 has a regression. See "v12.2.7 Luminous released" and all of the related 
disaster posts. Also in the release notes for .7 is a bug disclosure for 12.2.5 
that affects rgw users pretty badly during upgrade. You might take a look there.

 

On Thu, Jul 19, 2018 at 2:13 PM Brent Kennedy wrote:

I just upgraded our cluster to 12.2.6 and now I see this warning about 1 large 
omap object.  I looked and it seems this warning was just added in 12.2.6.  I 
found a few discussions on what it was but not much information on addressing 
it properly.  Our cluster uses rgw exclusively with just a few buckets in the 
.rgw.buckets pool.  Our largest bucket has millions of objects in it.

 

Any thoughts or links on this?
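One way to track it down on an rgw cluster, sketched below. The bucket name
and object count are hypothetical, and the `radosgw-admin` reshard commands
should be checked against your version first (dynamic bucket index resharding
only landed in Luminous):

```shell
# Locate the large omap object (pool/PG are named in the health detail
# and in the cluster log):
# ceph health detail
# grep 'Large omap object' /var/log/ceph/ceph.log

# For a multi-million-object rgw bucket, the usual fix is resharding the
# bucket index. Rule of thumb: roughly 100k keys per shard.
OBJECTS=5000000          # hypothetical object count for the big bucket
PER_SHARD=100000
SHARDS=$(( (OBJECTS + PER_SHARD - 1) / PER_SHARD ))
echo "$SHARDS"           # 50
# radosgw-admin reshard add --bucket=my-big-bucket --num-shards="$SHARDS"
# radosgw-admin reshard process
```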

 

 

Regards,

-Brent

 

Existing Clusters:

Test: Luminous 12.2.6 with 3 osd servers, 1 mon/man, 1 gateway ( all virtual )

US Production: Firefly with 4 osd servers, 3 mons, 3 gateways behind haproxy LB

UK Production: Luminous 12.2.6 with 8 osd servers, 3 mons/man, 3 gateways 
behind haproxy LB

 

 



[ceph-users] alert conditions

2018-07-23 Thread Jan Fajerski

Hi community,
the topic of alerting conditions for a ceph cluster comes up in various 
contexts. Some folks use prometheus or grafana, (I believe) some people would 
like snmp traps from ceph, the mgr dashboard could provide basic alerting 
capabilities, and there is of course ceph -s.

Also see "Improving alerting/health checks" on ceph-devel.

Working on some prometheus stuff, I think it would be nice to have some basic 
alerting rules in the ceph repo. This could serve as an out-of-the-box default 
as well as an example or best practice for which conditions should be watched.


So I'm wondering: what does the community think? What do operators use as 
alert conditions or find alert-worthy?
I'm aware that this is very open-ended, highly dependent on the cluster and 
its workload, and can range from the obvious (health_err, anyone?) to 
intricate conditions designed for a particular cluster. I'm wondering if we 
can distill some non-trivial alert conditions that ceph itself does not (yet) 
provide.


If you have any conditions fitting that description, feel free to add them to 
https://pad.ceph.com/p/alert-conditions. Otherwise looking forward to feedback.
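As a strawman for the pad, here are two rules I would consider baseline,
written against the metrics the mgr prometheus module exports (metric names
as I understand them from the Luminous module; adjust if yours differ):

```yaml
groups:
- name: ceph-baseline.rules
  rules:
  - alert: CephHealthError
    # ceph_health_status: 0 = HEALTH_OK, 1 = HEALTH_WARN, 2 = HEALTH_ERR
    expr: ceph_health_status == 2
    for: 5m
    labels:
      severity: critical
    annotations:
      description: "Cluster has been in HEALTH_ERR for 5 minutes"
  - alert: CephOsdsDown
    # fires when any OSD reports down for a sustained period
    expr: count(ceph_osd_up == 0) > 0
    for: 10m
    labels:
      severity: warning
    annotations:
      description: "One or more OSDs have been down for 10 minutes"
```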


jan

--
Jan Fajerski
Engineer Enterprise Storage
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton,
HRB 21284 (AG Nürnberg)


Re: [ceph-users] Self shutdown of 1 whole system (Derbian stretch/Ceph 12.2.7/bluestore)

2018-07-23 Thread Oliver Freyermuth
On 23.07.2018 at 14:59, Nicolas Huillard wrote:
> Le lundi 23 juillet 2018 à 12:40 +0200, Oliver Freyermuth a écrit :
>> Am 23.07.2018 um 11:18 schrieb Nicolas Huillard:
>>> Le lundi 23 juillet 2018 à 18:23 +1000, Brad Hubbard a écrit :
 Ceph doesn't shut down systems as in kill or reboot the box if
 that's
 what you're saying?
>>>
>>> That's the first part of what I was saying, yes. I was pretty sure
>>> Ceph
>>> doesn't reboot/shutdown/reset, but now it's 100% sure, thanks.
>>> Maybe systemd triggered something, but without any lasting traces.
>>> The kernel didn't leave any more traces in kernel.log, and since
>>> the
>>> server was off, there was no oops remaining on the console...
>>
>> If there was an oops, it should also be recorded in pstore. 
>> If the kernel was still running and able to show a stacktrace, even
>> if disk I/O has become impossible,
>> it will in general dump the stacktrace to pstore (e.g. UEFI pstore if
>> you boot via EFI, or ACPI pstore, if available). 
> 
> I was sure I would learn something from this thread. Thanks!
> Unfortunately, those machines don't boot using UEFI, /sys/fs/pstore/ is
> empty, and:
> /sys/module/pstore/parameters/backend:(null)
> /sys/module/pstore/parameters/update_ms:-1
> 
> I suppose this pstore is also shown in the BMC web interface as "Server
> Health / System Log". This is empty too, and I wondered what would fill
> it. Maybe I'll use UEFI boot next time.

It's usually not shown anywhere else - in the end, the UEFI pstore is just 
permanent storage which the Linux kernel uses to save OOPSes and other kinds 
of PANICs. 
It's very unlikely that the BMC can interpret the very same format the Linux 
kernel writes there. 

Sadly, it seems your machine does not have any backend available (unless booted 
via UEFI). 
Our machines can luckily use ACPI ERST (Error Record Serialization Table) even 
if legacy-booted. 

So probably, booting via UEFI is your only option (other options could be 
netconsole, but it is less robust / does not capture everything, or ramoops, 
but I've never used that). 

Cheers,
Oliver





Re: [ceph-users] Self shutdown of 1 whole system (Derbian stretch/Ceph 12.2.7/bluestore)

2018-07-23 Thread Nicolas Huillard
On Monday, 23 July 2018 at 12:40 +0200, Oliver Freyermuth wrote:
> Am 23.07.2018 um 11:18 schrieb Nicolas Huillard:
> > Le lundi 23 juillet 2018 à 18:23 +1000, Brad Hubbard a écrit :
> > > Ceph doesn't shut down systems as in kill or reboot the box if
> > > that's
> > > what you're saying?
> > 
> > That's the first part of what I was saying, yes. I was pretty sure
> > Ceph
> > doesn't reboot/shutdown/reset, but now it's 100% sure, thanks.
> > Maybe systemd triggered something, but without any lasting traces.
> > The kernel didn't leave any more traces in kernel.log, and since
> > the
> > server was off, there was no oops remaining on the console...
> 
> If there was an oops, it should also be recorded in pstore. 
> If the kernel was still running and able to show a stacktrace, even
> if disk I/O has become impossible,
> it will in general dump the stacktrace to pstore (e.g. UEFI pstore if
> you boot via EFI, or ACPI pstore, if available). 

I was sure I would learn something from this thread. Thanks!
Unfortunately, those machines don't boot using UEFI, /sys/fs/pstore/ is
empty, and:
/sys/module/pstore/parameters/backend:(null)
/sys/module/pstore/parameters/update_ms:-1

I suppose this pstore is also shown in the BMC web interface as "Server
Health / System Log". This is empty too, and I wondered what would fill
it. Maybe I'll use UEFI boot next time.

-- 
Nicolas Huillard


Re: [ceph-users] Insane CPU utilization in ceph.fuse

2018-07-23 Thread Daniel Carrasco
Hi, thanks for your response.

There are about 6 clients, and 4 of them are on standby most of the time. Only
two are active servers that are serving the webpage. Also, we have a Varnish
in front, so they are not getting all the load (below 30% in PHP is not much).
About the MDS cache, I now have mds_cache_memory_limit at 8Mb. I've tested
512Mb as well, but the CPU usage is the same and the MDS RAM usage grows up to
15GB (on a 16GB server it starts to swap and everything fails). With 8Mb, at
least the memory usage is stable at less than 6GB (now it is using about 1GB
of RAM).

What catches my attention is the huge difference between kernel and fuse:
why is the kernel client unnoticeable while the fuse client uses most of the
CPU power...

Greetings.

2018-07-23 14:01 GMT+02:00 Paul Emmerich :

> Hi,
>
> do you happen to have a relatively large number of clients and a
> relatively small cache size on the MDS?
>
>
> Paul
>
> 2018-07-23 13:16 GMT+02:00 Daniel Carrasco :
>
>> Hello,
>>
>> I've created a Ceph cluster of 3 nodes (3 mons, 3 osd, 3 mgr and 3 mds
>> with two active). This cluster is for mainly for server a webpage (small
>> files) and is configured to have three copies of files (a copy on every
>> OSD).
>> My question is about ceph.fuse clients: I've noticed an insane CPU usage
>> when the fuse client is used, while the kernel client usage is unnoticeable.
>>
>> For example, now i've that machines working with kernel client and the
>> CPU usage is less than 30% (all used by php processes). When I change to
>> ceph.fuse the CPU usage raise to more than 130% and even sometimes up to
>> 190-200% (on a two cores machines means burn the CPU).
>>
>> Now I've seen two warnings on the cluster:
>> 1 MDSs report oversized cache
>> 4 clients failing to respond to cache pressure
>>
>> and I think that maybe is a lack of capabilities on ceph kernel modules,
>> so I want to give a try to fuse module but I've the above problem.
>>
>> My SO is Ubuntu 16.04 x64 with kernel version 4.13.0-45-generic and ceph
>> server/client version is 12.2.7.
>>
>> How I can debug why that CPU usage?.
>>
>> Thanks!
>> --
>> _
>>
>>   Daniel Carrasco Marín
>>   Ingeniería para la Innovación i2TIC, S.L.
>>   Tlf:  +34 911 12 32 84 Ext: 223
>>   www.i2tic.com
>> _
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>
>
>
> --
> Paul Emmerich
>
> Looking for help with your Ceph cluster? Contact us at https://croit.io
>
> croit GmbH
> Freseniusstr. 31h
> 
> 81247 München
> 
> www.croit.io
> Tel: +49 89 1896585 90
>



-- 
_

  Daniel Carrasco Marín
  Ingeniería para la Innovación i2TIC, S.L.
  Tlf:  +34 911 12 32 84 Ext: 223
  www.i2tic.com
_


Re: [ceph-users] Why lvm is recommended method for bluestore

2018-07-23 Thread Alfredo Deza
On Mon, Jul 23, 2018 at 6:09 AM, Nicolas Huillard  wrote:
> Le dimanche 22 juillet 2018 à 09:51 -0400, Satish Patel a écrit :
>> I read that post and that's why I open this thread for few more
>> questions and clearence,
>>
>> When you said OSD doesn't come up what actually that means?  After
>> reboot of node or after service restart or installation of new disk?
>>
>> You said we are using manual method what is that?
>>
>> I'm building new cluster and had zero prior experience so how can I
>> produce this error to see lvm is really life saving tool here? I'm
>> sure there are plenty of people using but I didn't find and good
>> document except that mailing list which raising more questions in my
>> mind.
>
> When I had to change a few drives manually, copying the old contents
> over, I noticed that the logical volumes are tagged with lots of
> information related to how they should be handled at boot time by the
> OSD startup system.
> These LVM tags are a good standard way to add that meta-data within the
> volumes themselves. Apparently, there is no other way to add these tags
> that allow for bluestore/filestore, SATA/SAS/NVMe, whole drive or
> partition, etc.
> They are easy to manage and fail-safe in many configurations.

This is spot on. To clarify even further, let me give a brief overview
of how that worked with ceph-disk and GPT GUID:

* at creation time, ceph-disk would add a GUID to the partitions so
that it would later be recognized. These GUID were unique so they
would ensure accuracy
* a set of udev rules would be in place to detect when these GUID
would become available in the system
* at boot time, udev would start detecting devices coming online, and
the rules would call out to ceph-disk (the executable)
* the ceph-disk executable would then call out to the ceph-disk
systemd unit, with a timeout of three hours the device to which it was
assigned (e.g. ceph-disk@/dev/sda )
* the previous step would be done *per device*, waiting for all
devices associated with the OSD to become available (hence the 3 hour
timeout)
* the ceph-disk systemd unit would call back again to the ceph-disk
command line tool signaling devices are ready (with --sync)
* the ceph-disk command line tool would call *the ceph-disk command
line tool again* to "activate" the OSD, having detected (finally) the
device type (encrypted, partially prepared, etc...)

The above workflow worked for pre-systemd systems; it could probably
have been streamlined, but it was what allowed devices to be
"discovered" at boot time. The 3 hour timeout was there because
udev would find these devices coming online asynchronously, and
ceph-disk was trying to coerce a more synchronous behavior to get all
the devices it needed. On a dense OSD node, this meant that OSDs would
inconsistently fail to come up at all (sometimes all of them would work!).

Device discovery is a tremendously complicated and difficult problem
to solve, and we thought that a few simple rules with UDEV would be
the answer (they weren't). The LVM implementation of ceph-volume
limits itself to just asking LVM about devices and then getting them
"activated" at once. In some tests on nodes with ~20 OSDs, we were 10x
faster to come up (compared to ceph-disk), and fully operational,
every time.
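The enumeration step described above can be sketched in a few lines of shell. The tag names follow ceph-volume's `ceph.*` convention, but the values below are made up; a real node would query LVM itself (e.g. via `lvs -o lv_tags`) rather than use a canned string:

```shell
# Simulated `lvs --noheadings -o lv_tags` output for two OSD volumes
# (hypothetical values; a real node would query LVM directly)
lvs_output="ceph.osd_id=0,ceph.type=block,ceph.osd_fsid=aaaa
ceph.osd_id=1,ceph.type=block,ceph.osd_fsid=bbbb"

# One synchronous pass: count every volume carrying an OSD id tag,
# instead of waiting on asynchronous udev events per device
count=$(printf '%s\n' "$lvs_output" | grep -c 'ceph.osd_id=')
echo "$count OSD volumes ready for activation"
```

In practice this is the idea `ceph-volume lvm activate --all` builds on: one query, then activation, with no dependence on event ordering.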

Since this is a question that keeps coming up, and answers are now
getting a bit scattered, I'll consolidate them all into a section in the
docs. I'll try to address the "layer of complexity", "performance
overhead", and other recurring concerns that keep being raised.

Any other ideas are welcome if some of the previously discussed
things are still not entirely clear.

>
>> Sent from my iPhone
>>
>> > On Jul 22, 2018, at 6:31 AM, Marc Roos 
>> > wrote:
>> >
>> >
>> >
>> > I don’t think it will get any more basic than that. Or maybe this?
>> > If
>> > the doctor diagnoses you, you can either accept this, get 2nd
>> > opinion,
>> > or study medicine to verify it.
>> >
>> > In short, lvm has been introduced to solve some issues related
>> > to
>> > starting osd's (which I did not have, probably because of a
>> > 'manual'
>> > configuration). And it opens the ability to support (more future)
>> > devices.
>> >
>> > I gave you two links, did you read the whole thread?
>> > https://www.mail-archive.com/ceph-users@lists.ceph.com/msg47802.html
>> >
>> >
>> >
>> >
>> >
>> > -Original Message-
>> > From: Satish Patel [mailto:satish@gmail.com]
>> > Sent: zaterdag 21 juli 2018 20:59
>> > To: ceph-users
>> > Subject: [ceph-users] Why lvm is recommended method for bleustore
>> >
>> > Folks,
>> >
>> > I think I am going to boil the ocean here: I googled a lot about why
>> > lvm is the recommended method for bluestore, but didn't find any good
>> > and detailed explanation, not even on the official Ceph website.
>> >
>> > Can someone explain in basic language, because I am in no way an
>> > expert and just want to understand the advantage of adding an extra
>> > layer of complexity?
>> >

Re: [ceph-users] Cephfs kernel driver availability

2018-07-23 Thread John Spray
On Sun, Jul 22, 2018 at 9:03 PM Bryan Henderson  wrote:
>
> Is there some better place to get a filesystem driver for the longterm
> stable Linux kernel (3.16) than the regular kernel.org source distribution?

The general advice[1] on this is not to try and use a 3.x kernel with
CephFS.  The only exception is if your distro provider is doing
special backports (latest RHEL releases have CephFS backports).  This
causes some confusion, because a number of distros have shipped
"stable" kernels with older, known unstable CephFS code.

If you're building your own kernels then you definitely want to be on
a recent 4.x.

John

1. http://docs.ceph.com/docs/master/cephfs/best-practices/#which-kernel-version
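The check John describes can be scripted along these lines; the version string below is hard-coded for illustration (on a live client you would substitute `$(uname -r)`):

```shell
# Hard-coded kernel version for illustration; in real use:
#   kver=$(uname -r)
kver="3.16.0-4-amd64"

# Extract the major version number (everything before the first dot)
major=${kver%%.*}
if [ "$major" -lt 4 ]; then
  status="too old"
else
  status="recent enough"
fi
echo "kernel $kver is $status for the CephFS kernel client"
```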

> The reason I ask is that I have been trying to get some clients running
> Linux kernel 3.16 (the current long term stable Linux kernel) and so far
> I have run into two serious bugs that, it turns out, were found and fixed
> years ago in more current mainline kernels.
>
> In both cases, I emailed Ben Hutchings, the apparent maintainer of 3.16,
> asking if the fixes could be added to 3.16, but was met with silence.  This
> leads me to believe that there are many more bugs in the 3.16 cephfs
> filesystem driver waiting for me.  Indeed, I've seen panics not yet explained.
>
> So what are other people using?  A less stable kernel?  An out-of-tree driver?
> FUSE?  Is there a working process for getting known bugs fixed in 3.16?
>
> --
> Bryan Henderson   San Jose, California
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Insane CPU utilization in ceph.fuse

2018-07-23 Thread Paul Emmerich
Hi,

do you happen to have a relatively large number of clients and a relatively
small cache size on the MDS?


Paul

2018-07-23 13:16 GMT+02:00 Daniel Carrasco :

> Hello,
>
> I've created a Ceph cluster of 3 nodes (3 mons, 3 osd, 3 mgr and 3 mds
> with two active). This cluster is mainly for serving a webpage (small
> files) and is configured to have three copies of files (a copy on every
> OSD).
> My question is about ceph.fuse clients: I've noticed an insane CPU usage
> when the fuse client is used, while the kernel client usage is unnoticeable.
>
> For example, right now those machines are working with the kernel client
> and the CPU usage is less than 30% (all used by php processes). When I
> change to ceph.fuse the CPU usage raises to more than 130%, and sometimes
> even up to 190-200% (which on a two-core machine means burning the CPU).
>
> Now I've seen two warnings on the cluster:
> 1 MDSs report oversized cache
> 4 clients failing to respond to cache pressure
>
> and I think that maybe it is a lack of capabilities in the ceph kernel
> module, so I want to give the fuse module a try, but I have the above
> problem.
>
> My OS is Ubuntu 16.04 x64 with kernel version 4.13.0-45-generic and ceph
> server/client version is 12.2.7.
>
> How can I debug that CPU usage?
>
> Thanks!
> --
> _
>
>   Daniel Carrasco Marín
>   Ingeniería para la Innovación i2TIC, S.L.
>   Tlf:  +34 911 12 32 84 Ext: 223
>   www.i2tic.com
> _
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>


-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Insane CPU utilization in ceph.fuse

2018-07-23 Thread Daniel Carrasco
Hello,

I've created a Ceph cluster of 3 nodes (3 mons, 3 osd, 3 mgr and 3 mds with
two active). This cluster is mainly for serving a webpage (small files)
and is configured to have three copies of files (a copy on every OSD).
My question is about ceph.fuse clients: I've noticed an insane CPU usage
when the fuse client is used, while the kernel client usage is unnoticeable.

For example, right now those machines are working with the kernel client
and the CPU usage is less than 30% (all used by php processes). When I
change to ceph.fuse the CPU usage raises to more than 130%, and sometimes
even up to 190-200% (which on a two-core machine means burning the CPU).

Now I've seen two warnings on the cluster:
1 MDSs report oversized cache
4 clients failing to respond to cache pressure

and I think that maybe it is a lack of capabilities in the ceph kernel
module, so I want to give the fuse module a try, but I have the above problem.

My OS is Ubuntu 16.04 x64 with kernel version 4.13.0-45-generic and ceph
server/client version is 12.2.7.

How can I debug that CPU usage?

Thanks!
-- 
_

  Daniel Carrasco Marín
  Ingeniería para la Innovación i2TIC, S.L.
  Tlf:  +34 911 12 32 84 Ext: 223
  www.i2tic.com
_
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Self shutdown of 1 whole system (Derbian stretch/Ceph 12.2.7/bluestore)

2018-07-23 Thread Nicolas Huillard
Le lundi 23 juillet 2018 à 11:40 +0100, Matthew Vernon a écrit :
> One of my server silently shutdown last night, with no explanation
> > whatsoever in any logs. According to the existing logs, the
> > shutdown
> 
> We have seen similar things with our SuperMicro servers; our current
> best theory is that it's related to CPU power management. Disabling
> it
> in BIOS seems to have helped.

Too bad, my hardware design heavily relies on power management, and thus
on silence...

-- 
Nicolas Huillard
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] "CPU CATERR Fault" Was: Self shutdown of 1 whole system (Derbian stretch/Ceph 12.2.7/bluestore)

2018-07-23 Thread Nicolas Huillard
Le lundi 23 juillet 2018 à 12:43 +0200, Oliver Freyermuth a écrit :
> There ARE chassis/BMC/IPMI level events, one of which is "CPU
> > CATERR
> > Fault", with a timestamp matching the timestamps below, and no more
> > information.
> 
> If this kind of failure (or a less severe one) also happens at
> runtime, mcelog should catch it. 

I'll install mcelog ASAP, even though it probably wouldn't have added
much in that case.

> For CATERR errors, we also found that sometimes the web interface of
> the BMC shows more information for the event log entry 
> than querying the event log via ipmitool - you may want to check
> this. 

I got that from the web interface. ipmitool does not give more
information anyway (lots of "missing" and "unknown", and no
description...):
ipmitool> sel get 118
SEL Record ID  : 0076
 Record Type   : 02
 Timestamp : 07/21/2018 01:58:48
 Generator ID  : 0020
 EvM Revision  : 04
 Sensor Type   : Unknown
 Sensor Number : 76
 Event Type: Sensor-specific Discrete
 Event Direction   : Assertion Event
 Event Data (RAW)  : 00
 Event Interpretation  : Missing
 Description   : 

Sensor ID  : CPU CATERR (0x76)
 Entity ID : 26.1
 Sensor Type (Discrete): Unknown

-- 
Nicolas Huillard
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Add Partitions to Ceph Cluster

2018-07-23 Thread Mehmet

Hi Dimitri,

what is the output of

- ceph osd tree?

Perhaps you have an initial crush weight of 0; in this case there 
wouldn't be any change in the PGs until you change the weight.
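A zero crush weight shows up in the WEIGHT column of `ceph osd tree`. The row below is fabricated for illustration (real output comes from the cluster); the reweight command in the comment is the usual fix, with the value chosen to match the disk's size in TiB:

```shell
# Fabricated `ceph osd tree` row for the new OSD; on a real cluster:
#   ceph osd tree
row="4   hdd   0.00000   osd.4   up   1.00000   1.00000"

# The third column is the crush weight
weight=$(printf '%s\n' "$row" | awk '{print $3}')
if [ "$weight" = "0.00000" ]; then
  # Give the OSD a weight matching its size in TiB, e.g.:
  #   ceph osd crush reweight osd.4 1.81940
  echo "osd.4 has crush weight 0: no PGs will be mapped to it"
fi
```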


- Mehmet

Am 2018-07-10 11:58, schrieb Dimitri Roschkowski:

Hi,

is it possible to use just a partition instead of a whole disk for an
OSD? On a server I already use hdb for Ceph and want to add hda4 to be
used in the Ceph cluster, but it didn’t work for me.

On the server with the partition I tried:

ceph-disk prepare /dev/sda4

and

ceph-disk activate /dev/sda4

And with df I see, that ceph did something on the partition:

/dev/sda4   1.8T  2.8G  1.8T   1% /var/lib/ceph/osd/ceph-4


My problem is that after I activated the disk, I didn't see a change
in the ceph status output:

  data:
pools:   6 pools, 168 pgs
objects: 25.84 k objects, 100 GiB
usage:   305 GiB used, 6.8 TiB / 7.1 TiB avail
pgs: 168 active+clean

Can some one help me?
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Mimic 13.2.1 release date

2018-07-23 Thread Wido den Hollander
Any news on this yet? 13.2.1 would be very welcome! :-)

Wido

On 07/09/2018 05:11 PM, Wido den Hollander wrote:
> Hi,
> 
> Is there a release date for Mimic 13.2.1 yet?
> 
> There are a few issues which currently make deploying with Mimic 13.2.0
> a bit difficult, for example:
> 
> - https://tracker.ceph.com/issues/24423
> - https://github.com/ceph/ceph/pull/22393
> 
> Especially the first one makes it difficult.
> 
> 13.2.1 would be very welcome with these fixes in there.
> 
> Is there an ETA for this version yet?
> 
> Wido
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] "CPU CATERR Fault" Was: Self shutdown of 1 whole system (Derbian stretch/Ceph 12.2.7/bluestore)

2018-07-23 Thread Oliver Freyermuth
Am 23.07.2018 um 11:39 schrieb Nicolas Huillard:
> Le lundi 23 juillet 2018 à 10:28 +0200, Caspar Smit a écrit :
>> Do you have any hardware watchdog running in the system? A watchdog
>> could
>> trigger a powerdown if it meets some value. Any event logs from the
>> chassis
>> itself?
> 
> Nice suggestions ;-)
> 
> I see some [watchdog/N] and one [watchdogd] kernel threads, along with
> a "kernel: [0.116002] NMI watchdog: enabled on all CPUs,
> permanently consumes one hw-PMU counter." line in the kernel log, but
> no user-land watchdog daemon: I'm not sure if the watchdog is actually
> active.
> 
> There ARE chassis/BMC/IPMI level events, one of which is "CPU CATERR
> Fault", with a timestamp matching the timestamps below, and no more
> information.

If this kind of failure (or a less severe one) also happens at runtime, mcelog 
should catch it. 
For CATERR errors, we also found that sometimes the web interface of the BMC 
shows more information for the event log entry 
than querying the event log via ipmitool - you may want to check this. 


> If I understand correctly, this is a signal emitted by the CPU, to the
> BMC, upon "catastrophic error" (more than "fatal"), which the BMC must
> respond to the way it wants, Intel suggestions including resetting the
> chassis.
> 
> https://www.intel.in/content/dam/www/public/us/en/documents/white-papers/platform-level-error-strategies-paper.pdf
> 
> Does that mean that the hardware is failing, or a neutrino just crossed
> some CPU register?
> CPU is a Xeon D-1521 with ECC memory.
> 
>> Kind regards,
> 
> Many thanks!
> 
>>
>> Caspar
>>
>> 2018-07-21 10:31 GMT+02:00 Nicolas Huillard :
>>
>>> Hi all,
>>>
>>> One of my server silently shutdown last night, with no explanation
>>> whatsoever in any logs. According to the existing logs, the
>>> shutdown
>>> (without reboot) happened between 03:58:20.061452 (last timestamp
>>> from
>>> /var/log/ceph/ceph-mgr.oxygene.log) and 03:59:01.515308 (new MON
>>> election called, for which oxygene didn't answer).
>>>
>>> Is there any way in which Ceph could silently shutdown a server?
>>> Can SMART self-test influence scrubbing or compaction?
>>>
>>> The only thing I have is that smartd stated a long self-test on
>>> both
>>> OSD spinning drives on that host:
>>> Jul 21 03:21:35 oxygene smartd[712]: Device: /dev/sda [SAT],
>>> starting
>>> scheduled Long Self-Test.
>>> Jul 21 03:21:35 oxygene smartd[712]: Device: /dev/sdb [SAT],
>>> starting
>>> scheduled Long Self-Test.
>>> Jul 21 03:21:35 oxygene smartd[712]: Device: /dev/sdc [SAT],
>>> starting
>>> scheduled Long Self-Test.
>>> Jul 21 03:51:35 oxygene smartd[712]: Device: /dev/sda [SAT], self-
>>> test in
>>> progress, 90% remaining
>>> Jul 21 03:51:35 oxygene smartd[712]: Device: /dev/sdb [SAT], self-
>>> test in
>>> progress, 90% remaining
>>> Jul 21 03:51:35 oxygene smartd[712]: Device: /dev/sdc [SAT],
>>> previous
>>> self-test completed without error
>>>
>>> ...and smartctl now says that the self-tests didn't finish (on both
>>> drives) :
>>> # 1  Extended offline    Interrupted (host reset)    00%    10636    -
>>>
>>> MON logs on oxygene talks about rockdb compaction a few minutes
>>> before
>>> the shutdown, and a deep-scrub finished earlier:
>>> /var/log/ceph/ceph-osd.6.log
>>> 2018-07-21 03:32:54.086021 7fd15d82c700  0 log_channel(cluster) log
>>> [DBG]
>>> : 6.1d deep-scrub starts
>>> 2018-07-21 03:34:31.185549 7fd15d82c700  0 log_channel(cluster) log
>>> [DBG]
>>> : 6.1d deep-scrub ok
>>> 2018-07-21 03:43:36.720707 7fd178082700  0 --
>>> 172.22.0.16:6801/478362 >>
>>> 172.21.0.16:6800/1459922146 conn(0x556f0642b800 :6801
>>> s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0
>>> l=1).handle_connect_msg: challenging authorizer
>>>
>>> /var/log/ceph/ceph-mgr.oxygene.log
>>> 2018-07-21 03:58:16.060137 7fbcd300  1 mgr send_beacon standby
>>> 2018-07-21 03:58:18.060733 7fbcd300  1 mgr send_beacon standby
>>> 2018-07-21 03:58:20.061452 7fbcd300  1 mgr send_beacon standby
>>>
>>> /var/log/ceph/ceph-mon.oxygene.log
>>> 2018-07-21 03:52:27.702314 7f25b5406700  4 rocksdb: (Original Log
>>> Time
>>> 2018/07/21-03:52:27.702302) [/build/ceph-12.2.7/src/
>>> rocksdb/db/db_impl_compaction_flush.cc:1392] [default] Manual
>>> compaction
>>> from level-0 to level-1 from 'mgrstat .. '
>>> 2018-07-21 03:52:27.702321 7f25b5406700  4 rocksdb:
>>> [/build/ceph-12.2.7/src/rocksdb/db/compaction_job.cc:1403]
>>> [default] [JOB
>>> 1746] Compacting 1@0 + 1@1 files to L1, score -1.00
>>> 2018-07-21 03:52:27.702329 7f25b5406700  4 rocksdb:
>>> [/build/ceph-12.2.7/src/rocksdb/db/compaction_job.cc:1407]
>>> [default]
>>> Compaction start summary: Base version 1745 Base level 0, inputs:
>>> [149507(602KB)], [149505(13MB)]
>>> 2018-07-21 03:52:27.702348 7f25b5406700  4 rocksdb: EVENT_LOG_v1
>>> {"time_micros": 1532137947702334, "job": 1746, "event":
>>> "compaction_started", "files_L0": [149507], "files_L1": [149505],
>>> "score":
>>> -1, "input_data_size": 14916379}

Re: [ceph-users] Self shutdown of 1 whole system (Derbian stretch/Ceph 12.2.7/bluestore)

2018-07-23 Thread Oliver Freyermuth
Am 23.07.2018 um 11:18 schrieb Nicolas Huillard:
> Le lundi 23 juillet 2018 à 18:23 +1000, Brad Hubbard a écrit :
>> Ceph doesn't shut down systems as in kill or reboot the box if that's
>> what you're saying?
> 
> That's the first part of what I was saying, yes. I was pretty sure Ceph
> doesn't reboot/shutdown/reset, but now it's 100% sure, thanks.
> Maybe systemd triggered something, but without any lasting traces.
> The kernel didn't leave any more traces in kernel.log, and since the
> server was off, there was no oops remaining on the console...

If there was an oops, it should also be recorded in pstore. 
If the kernel was still running and able to show a stacktrace, even if disk I/O 
has become impossible,
it will in general dump the stacktrace to pstore (e.g. UEFI pstore if you boot 
via EFI, or ACPI pstore, if available). 
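Checking for preserved crash records after such a reboot is straightforward; the mount point below is the standard pstore location on kernels with pstore support (the directory may be absent or empty on a healthy machine):

```shell
# List any crash records preserved across the reboot; /sys/fs/pstore
# is the standard mount point when pstore support is available
pstore_dir="/sys/fs/pstore"
if [ -d "$pstore_dir" ]; then
  entries=$(ls -A "$pstore_dir" | wc -l)
else
  entries=0
fi
echo "$entries pstore entries found"
```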

Cheers,
Oliver

> 
> I'm currently activating "Auto video recording" at the BMC/IPMI level,
> as that may help next time this event occurs... Triggers look like
> they're tuned for Windows BSOD though...
> 
> Thanks for all answers ;-)
> 
>> On Mon, Jul 23, 2018 at 5:04 PM, Nicolas Huillard > .fr> wrote:
>>> Le lundi 23 juillet 2018 à 11:07 +0700, Konstantin Shalygin a écrit
>>> :
> I even have no fancy kernel or device, just real standard
> Debian.
> The
> uptime was 6 days since the upgrade from 12.2.6...

 Nicolas, you should upgrade your 12.2.6 to 12.2.7 due to bugs in
 this
 release.
> 



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Self shutdown of 1 whole system (Derbian stretch/Ceph 12.2.7/bluestore)

2018-07-23 Thread Matthew Vernon
Hi,

> One of my server silently shutdown last night, with no explanation
> whatsoever in any logs. According to the existing logs, the shutdown

We have seen similar things with our SuperMicro servers; our current
best theory is that it's related to CPU power management. Disabling it
in BIOS seems to have helped.

Regards,

Matthew


-- 
 The Wellcome Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Error bluestore doesn't support lvm

2018-07-23 Thread Matthew Vernon
Hi,

On 21/07/18 04:24, Satish Patel wrote:
> I am using openstack-ansible with ceph-ansible to deploy my Ceph
> custer and here is my config in yml file

You might like to know that there's a dedicated (if quiet!) list for
ceph-ansible - ceph-ansi...@lists.ceph.com

Regards,

Matthew


-- 
 The Wellcome Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph bluestore data cache on osd

2018-07-23 Thread Igor Fedotov
Firstly, I'd suggest inspecting the bluestore performance counters before 
and after adjusting the cache parameters (and running the same test suite 
each time).


Namely:

"bluestore_buffer_bytes"

"bluestore_buffer_hit_bytes"

"bluestore_buffer_miss_bytes"


Is the hit ratio (bluestore_buffer_hit_bytes) much different after the 
data cache increase? Is the amount of cached data 
(bluestore_buffer_bytes) much different?
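The hit ratio itself isn't exported directly; it can be derived from the two byte counters. A sketch with made-up values follows (on a live OSD the numbers would come from `ceph daemon osd.<id> perf dump`):

```shell
# Made-up counter values; on a real OSD pull them from:
#   ceph daemon osd.66 perf dump
hit_bytes=800000000
miss_bytes=200000000

# hit ratio = hits / (hits + misses)
ratio=$(awk -v h="$hit_bytes" -v m="$miss_bytes" \
  'BEGIN { printf "%.2f", h / (h + m) }')
echo "bluestore buffer cache hit ratio: $ratio"
```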



Thanks,

Igor


On 7/23/2018 12:50 PM, nokia ceph wrote:

Hi Team,

We need a mechanism to have some data cache on OSDs built on bluestore.
Is there an option available to enable the data cache?


With default configurations, the OSD logs state that the data cache is 
disabled by default:

 bluestore(/var/lib/ceph/osd/ceph-66) _set_cache_sizes cache_size 
1073741824 meta 0.5 kv 0.5 data 0

We tried to change the config to have 49% for data and the OSD logs 
reflected it as follows; however, we don't see any improvement in IOPS.

bluestore(/var/lib/ceph/osd/ceph-66) _set_cache_sizes cache_size 
1073741824 meta 0.01 kv 0.5 data 0.49
Thanks,
Muthu



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CFP: linux.conf.au 2019 (Christchurch, New Zealand)

2018-07-23 Thread Tim Serong
Just a friendly reminder, the linux.conf.au 2019 CFP closes next Monday,
July 30.  Don't miss out!  :-)

On 07/02/2018 04:10 PM, Tim Serong wrote:
> Hi All,
> 
> As happened last year, I forwarded the LCA CFP to ceph-users and
> ceph-devel, but it didn't make it to ceph-devel due to some alleged spam
> filter somewhere.
> 
> TL;DR: Best F/OSS tech conference in the southern hemisphere, this time
> in Christchurch, New Zealand 21-25 January 2019.  Everyone should go
> submit a talk right now, or at least plan to attend :-)
> 
>   https://linux.conf.au/call-for-papers/
> 
> Here's the full announcement with a bunch more details:
> 
>   http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-July/027788.html
> 
> Regards,
> 
> Tim
> 
-- 
Tim Serong
Senior Clustering Engineer
SUSE
tser...@suse.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Why lvm is recommended method for bleustore

2018-07-23 Thread Nicolas Huillard
Le dimanche 22 juillet 2018 à 09:51 -0400, Satish Patel a écrit :
> I read that post and that's why I opened this thread, for a few more
> questions and clarification.
> 
> When you said the OSD doesn't come up, what does that actually mean?
> After a reboot of the node, after a service restart, or after
> installation of a new disk?
> 
> You said we are using a manual method; what is that?
> 
> I'm building a new cluster with zero prior experience, so how can I
> reproduce this error and see that lvm is really a life-saving tool
> here? I'm sure there are plenty of people using it, but I didn't find
> any good document except that mailing list thread, which raised more
> questions in my mind.

When I had to change a few drives manually, copying the old contents
over, I noticed that the logical volumes are tagged with lots of
information related to how they should be handled at boot time by the
OSD startup system.
These LVM tags are a good standard way to add that meta-data within the
volumes themselves. Apparently, there is no other standard way to attach
this kind of metadata (bluestore/filestore, SATA/SAS/NVMe, whole drive
or partition, etc.).
They are easy to manage and fail-safe in many configurations.
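These tags can be listed per volume with `lvs -o lv_name,lv_tags`, and parsing one is trivial. The tag string below is fabricated, but follows ceph-volume's `ceph.*` naming convention:

```shell
# Fabricated LVM tag string in ceph-volume's ceph.* convention
tags="ceph.osd_id=4,ceph.type=block,ceph.cluster_name=ceph"

# Split on commas and pick out individual key=value pairs
osd_id=$(printf '%s\n' "$tags" | tr ',' '\n' \
  | awk -F= '$1 == "ceph.osd_id" {print $2}')
osd_type=$(printf '%s\n' "$tags" | tr ',' '\n' \
  | awk -F= '$1 == "ceph.type" {print $2}')
echo "osd_id=$osd_id type=$osd_type"
```

This is the metadata the OSD startup machinery reads at boot to decide how each volume should be activated.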

> Sent from my iPhone
> 
> > On Jul 22, 2018, at 6:31 AM, Marc Roos 
> > wrote:
> > 
> > 
> > 
> > I don’t think it will get any more basic than that. Or maybe this?
> > If 
> > the doctor diagnoses you, you can either accept this, get 2nd
> > opinion, 
> > or study medicine to verify it. 
> > 
> > In short, lvm has been introduced to solve some issues related
> > to 
> > starting osd's (which I did not have, probably because of a
> > 'manual' 
> > configuration). And it opens the ability to support (more future) 
> > devices.
> > 
> > I gave you two links, did you read the whole thread?
> > https://www.mail-archive.com/ceph-users@lists.ceph.com/msg47802.html
> > 
> > 
> > 
> > 
> > 
> > -Original Message-
> > From: Satish Patel [mailto:satish@gmail.com] 
> > Sent: zaterdag 21 juli 2018 20:59
> > To: ceph-users
> > Subject: [ceph-users] Why lvm is recommended method for bleustore
> > 
> > Folks,
> > 
> > I think I am going to boil the ocean here: I googled a lot about why
> > lvm is the recommended method for bluestore, but didn't find any good
> > and detailed explanation, not even on the official Ceph website.
> > 
> > Can someone explain in basic language, because I am in no way an
> > expert and just want to understand the advantage of adding an extra
> > layer of complexity?
> > 
> > I found this post, but I got lost reading it, and want to see what
> > other folks suggest and offer in their own words:
> > https://www.mail-archive.com/ceph-users@lists.ceph.com/msg46768.html
> > 
> > ~S
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > 
> > 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
-- 
Nicolas Huillard
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Why lvm is recommended method for bleustore

2018-07-23 Thread Willem Jan Withagen

On 22-7-2018 15:51, Satish Patel wrote:

I read that post and that's why I opened this thread, for a few more 
questions and clarification.

When you said the OSD doesn't come up, what does that actually mean? After 
a reboot of the node, after a service restart, or after installation of a 
new disk?

You said we are using a manual method; what is that?

I'm building a new cluster with zero prior experience, so how can I 
reproduce this error and see that lvm is really a life-saving tool here? 
I'm sure there are plenty of people using it, but I didn't find any good 
document except that mailing list thread, which raised more questions in 
my mind.


Satish

It is a choice made during the design of the new setup with ceph-volume.
For reasons set out by Sage in one of the refered posts.

Just as many engineering questions get solved by selecting a tool that 
does the work, in this case it was LVM.
And I do not think a huge amount of consideration was given to using 
it. If I had to guess, the ability to attach attributes directly to 
volumes was one of the deciding factors.
(I'm not even sure there is an alternative low-impact middle layer that 
can do disk abstraction on Linux.)


LVM is sort of the first tool of the trade if you do not want to deal 
with raw disks...
And as Marc said: you would need a full study of the possible 
alternatives to answer the questions raised.


I personally would not waste the time on that. On the developers list 
ceph-volume has been gone over in a few posts, and the discussion was 
rarely about the selection of LVM.


--WjW


Sent from my iPhone


On Jul 22, 2018, at 6:31 AM, Marc Roos  wrote:



I don’t think it will get any more basic than that. Or maybe this? If
the doctor diagnoses you, you can either accept this, get 2nd opinion,
or study medicine to verify it.

In short, lvm has been introduced to solve some issues related to
starting osd's (which I did not have, probably because of a 'manual'
configuration). And it opens the ability to support (more future)
devices.

I gave you two links, did you read the whole thread?
https://www.mail-archive.com/ceph-users@lists.ceph.com/msg47802.html





-Original Message-
From: Satish Patel [mailto:satish@gmail.com]
Sent: zaterdag 21 juli 2018 20:59
To: ceph-users
Subject: [ceph-users] Why lvm is recommended method for bleustore

Folks,

I think I am going to boil the ocean here: I googled a lot about why
lvm is the recommended method for bluestore, but didn't find any good
and detailed explanation, not even on the official Ceph website.

Can someone explain in basic language, because I am in no way an expert
and just want to understand the advantage of adding an extra layer of
complexity?

I found this post, but I got lost reading it, and want to see what
other folks suggest and offer in their own words:
https://www.mail-archive.com/ceph-users@lists.ceph.com/msg46768.html

~S
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] [Ceph-deploy] Cluster Name

2018-07-23 Thread Glen Baars
How very timely, I am facing the exact same issue.

Kind regards,
Glen Baars

-Original Message-
From: ceph-users  On Behalf Of Thode Jocelyn
Sent: Monday, 23 July 2018 1:42 PM
To: Vasu Kulkarni 
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] [Ceph-deploy] Cluster Name

Hi,

Yes, my rbd-mirror is colocated with my mon/osd. It only affects nodes where 
they are colocated, as they all use the "/etc/sysconfig/ceph" configuration 
file.

Best
Jocelyn Thode

-Original Message-
From: Vasu Kulkarni [mailto:vakul...@redhat.com]
Sent: vendredi, 20 juillet 2018 17:25
To: Thode Jocelyn 
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] [Ceph-deploy] Cluster Name

On Fri, Jul 20, 2018 at 7:29 AM, Thode Jocelyn  wrote:
> Hi,
>
>
>
> I noticed that in commit
> https://github.com/ceph/ceph-deploy/commit/b1c27b85d524f2553af2487a98023b60efe421f3,
> the ability to specify a cluster name was removed. Is
> there a reason for this removal ?
>
>
>
> Because right now there is no possibility to create a ceph cluster
> with a different name with ceph-deploy, which is a big problem when
> having two clusters replicating with rbd-mirror, as we need different names.
>
>
>
> And even when following the doc here:
> https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/3/html/block_device_guide/block_device_mirroring#rbd-mirroring-clusters-with-the-same-name
>
>
>
> This is not sufficient, as once we change the CLUSTER variable in the
> sysconfig file, mon, osd, mds etc. all use it and fail to start on a
> reboot, as they then try to load data from a path in /var/lib/ceph
> containing the cluster name.

Is your rbd-mirror client also colocated with mon/osd? This needs to be changed 
only on the client side where you are doing mirroring; the rest of the nodes are 
not affected.


>
>
>
> Is there a solution to this problem ?
>
>
>
> Best Regards
>
> Jocelyn Thode
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
This e-mail is intended solely for the benefit of the addressee(s) and any 
other named recipient. It is confidential and may contain legally privileged or 
confidential information. If you are not the recipient, any use, distribution, 
disclosure or copying of this e-mail is prohibited. The confidentiality and 
legal privilege attached to this communication is not waived or lost by reason 
of the mistaken transmission or delivery to you. If you have received this 
e-mail in error, please notify us immediately.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] ceph bluestore data cache on osd

2018-07-23 Thread nokia ceph
Hi Team,

We need a mechanism to have some data cache on OSDs built on BlueStore. Is
there an option available to enable the data cache?

With the default configuration, the OSD logs state that the data cache is
disabled:

 bluestore(/var/lib/ceph/osd/ceph-66) _set_cache_sizes cache_size 1073741824
 meta 0.5 kv 0.5 data 0

We tried changing the config to give 49% to data, and the OSD logs reflected
it as follows; however, we don't see any improvement in IOPS:

bluestore(/var/lib/ceph/osd/ceph-66) _set_cache_sizes cache_size
1073741824 meta 0.01 kv 0.5 data 0.49
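For reference, the split shown in those log lines maps onto the Luminous-era
BlueStore cache options; a ceph.conf sketch matching the ratios above (the
data share is simply whatever remains after meta and kv):

```ini
[osd]
# total BlueStore cache per OSD in bytes (1 GiB here, matching the log)
bluestore_cache_size = 1073741824
# fraction of the cache reserved for onode metadata
bluestore_cache_meta_ratio = 0.01
# fraction reserved for RocksDB key/value data
bluestore_cache_kv_ratio = 0.5
# the remainder (1 - 0.01 - 0.5 = 0.49) becomes the object data cache
```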

Thanks,
Muthu
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] "CPU CATERR Fault" Was: Self shutdown of 1 whole system (Debian stretch/Ceph 12.2.7/bluestore)

2018-07-23 Thread Nicolas Huillard
On Monday 23 July 2018 at 10:28 +0200, Caspar Smit wrote:
> Do you have any hardware watchdog running in the system? A watchdog
> could trigger a powerdown if some threshold is met. Any event logs
> from the chassis itself?

Nice suggestions ;-)

I see several [watchdog/N] kernel threads and one [watchdogd], along with
a "kernel: [0.116002] NMI watchdog: enabled on all CPUs,
permanently consumes one hw-PMU counter." line in the kernel log, but
no user-land watchdog daemon, so I'm not sure the watchdog is actually
active.

There ARE chassis/BMC/IPMI level events, one of which is "CPU CATERR
Fault", with a timestamp matching the timestamps below, and no more
information.
If I understand correctly, this is a signal emitted by the CPU to the
BMC upon a "catastrophic error" (worse than "fatal"), to which the BMC
may respond however it chooses; Intel's suggestions include resetting
the chassis.

https://www.intel.in/content/dam/www/public/us/en/documents/white-papers/platform-level-error-strategies-paper.pdf

Does that mean the hardware is failing, or did a neutrino just flip some
CPU register?
The CPU is a Xeon D-1521 with ECC memory.

> Kind regards,

Many thanks!

> 
> Caspar
> 
> 2018-07-21 10:31 GMT+02:00 Nicolas Huillard :
> 
> > Hi all,
> > 
> > One of my servers silently shut down last night, with no explanation
> > whatsoever in any logs. According to the existing logs, the
> > shutdown
> > (without reboot) happened between 03:58:20.061452 (last timestamp
> > from
> > /var/log/ceph/ceph-mgr.oxygene.log) and 03:59:01.515308 (new MON
> > election called, for which oxygene didn't answer).
> > 
> > Is there any way in which Ceph could silently shutdown a server?
> > Can SMART self-test influence scrubbing or compaction?
> > 
> > The only thing I have is that smartd started a long self-test on
> > both
> > OSD spinning drives on that host:
> > Jul 21 03:21:35 oxygene smartd[712]: Device: /dev/sda [SAT],
> > starting
> > scheduled Long Self-Test.
> > Jul 21 03:21:35 oxygene smartd[712]: Device: /dev/sdb [SAT],
> > starting
> > scheduled Long Self-Test.
> > Jul 21 03:21:35 oxygene smartd[712]: Device: /dev/sdc [SAT],
> > starting
> > scheduled Long Self-Test.
> > Jul 21 03:51:35 oxygene smartd[712]: Device: /dev/sda [SAT], self-
> > test in
> > progress, 90% remaining
> > Jul 21 03:51:35 oxygene smartd[712]: Device: /dev/sdb [SAT], self-
> > test in
> > progress, 90% remaining
> > Jul 21 03:51:35 oxygene smartd[712]: Device: /dev/sdc [SAT],
> > previous
> > self-test completed without error
> > 
> > ...and smartctl now says that the self-tests didn't finish (on both
> > drives):
> > # 1  Extended offline    Interrupted (host reset)   00%   10636   -
> > 
> > MON logs on oxygene talk about a rocksdb compaction a few minutes
> > before
> > the shutdown, and a deep-scrub finished earlier:
> > /var/log/ceph/ceph-osd.6.log
> > 2018-07-21 03:32:54.086021 7fd15d82c700  0 log_channel(cluster) log
> > [DBG]
> > : 6.1d deep-scrub starts
> > 2018-07-21 03:34:31.185549 7fd15d82c700  0 log_channel(cluster) log
> > [DBG]
> > : 6.1d deep-scrub ok
> > 2018-07-21 03:43:36.720707 7fd178082700  0 --
> > 172.22.0.16:6801/478362 >>
> > 172.21.0.16:6800/1459922146 conn(0x556f0642b800 :6801
> > s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0
> > l=1).handle_connect_msg: challenging authorizer
> > 
> > /var/log/ceph/ceph-mgr.oxygene.log
> > 2018-07-21 03:58:16.060137 7fbcd300  1 mgr send_beacon standby
> > 2018-07-21 03:58:18.060733 7fbcd300  1 mgr send_beacon standby
> > 2018-07-21 03:58:20.061452 7fbcd300  1 mgr send_beacon standby
> > 
> > /var/log/ceph/ceph-mon.oxygene.log
> > 2018-07-21 03:52:27.702314 7f25b5406700  4 rocksdb: (Original Log
> > Time
> > 2018/07/21-03:52:27.702302) [/build/ceph-12.2.7/src/
> > rocksdb/db/db_impl_compaction_flush.cc:1392] [default] Manual
> > compaction
> > from level-0 to level-1 from 'mgrstat .. '
> > 2018-07-21 03:52:27.702321 7f25b5406700  4 rocksdb:
> > [/build/ceph-12.2.7/src/rocksdb/db/compaction_job.cc:1403]
> > [default] [JOB
> > 1746] Compacting 1@0 + 1@1 files to L1, score -1.00
> > 2018-07-21 03:52:27.702329 7f25b5406700  4 rocksdb:
> > [/build/ceph-12.2.7/src/rocksdb/db/compaction_job.cc:1407]
> > [default]
> > Compaction start summary: Base version 1745 Base level 0, inputs:
> > [149507(602KB)], [149505(13MB)]
> > 2018-07-21 03:52:27.702348 7f25b5406700  4 rocksdb: EVENT_LOG_v1
> > {"time_micros": 1532137947702334, "job": 1746, "event":
> > "compaction_started", "files_L0": [149507], "files_L1": [149505],
> > "score":
> > -1, "input_data_size": 14916379}
> > 2018-07-21 03:52:27.785532 7f25b5406700  4 rocksdb:
> > [/build/ceph-12.2.7/src/rocksdb/db/compaction_job.cc:1116]
> > [default] [JOB
> > 1746] Generated table #149508: 4904 keys, 14808953 bytes
> > 2018-07-21 03:52:27.785587 7f25b5406700  4 rocksdb: EVENT_LOG_v1
> > {"time_micros": 1532137947785565, "cf_name": "default", "job":
> > 1746,
> > "event": "table_file_creation", "file_number": 149508, 

[ceph-users] Checksum verification of BlueStore superblock using Python

2018-07-23 Thread Bausch, Florian
Hi,

I'm trying to use Python 3 to verify the checksum at the end of a BlueStore 
superblock, but I cannot figure out how to do it.

In my test scenario, the superblock is 0x158 bytes long (starting with 
"bluestore block device\n\n"), then 4 bytes of CRC32 follow. In my case 
the checksum is 0xb759e167.

To verify the checksum I read bytes 0x0-0x157 and compute the CRC32.
As I understand the code, the CRC is initialized with -1.
Therefore my code looks like this:
crc = binascii.crc32(f.read(0x158), -1)

And afterwards crc contains 0xbaa01c56.

I tried several Python libraries for CRC32 computation and cannot get it 
working with any of them.

I guess I'm missing something, like an extra step, or I have to use a
certain library. Maybe one of you has an idea.
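One thing worth checking (a sketch, not a verified answer): Ceph's checksums
here are CRC-32C (the Castagnoli polynomial), not the zlib CRC-32 that
binascii.crc32 computes. As I read it, Ceph's crc32c is also chainable: it
takes the caller's seed (BlueStore passes -1, i.e. 0xFFFFFFFF) and applies no
final bit-inversion. A self-contained pure-Python version under those
assumptions:

```python
# CRC-32C (Castagnoli), table-driven, pure Python. Assumptions worth
# double-checking against the Ceph source: ceph_crc32c() starts from the
# caller's seed (BlueStore uses -1 == 0xFFFFFFFF) and applies NO final
# bit-inversion, so checksums can be chained across buffers.
POLY = 0x82F63B78  # reflected form of the Castagnoli polynomial 0x1EDC6F41

TABLE = []
for i in range(256):
    c = i
    for _ in range(8):
        c = (c >> 1) ^ POLY if c & 1 else c >> 1
    TABLE.append(c)

def ceph_crc32c(data: bytes, seed: int = 0xFFFFFFFF) -> int:
    """CRC-32C with a caller-supplied seed and no final XOR (Ceph-style)."""
    crc = seed & 0xFFFFFFFF
    for byte in data:
        crc = (crc >> 8) ^ TABLE[(crc ^ byte) & 0xFF]
    return crc

# Hypothetical usage against a dumped superblock ("label.bin" is made up):
# with open("label.bin", "rb") as f:
#     payload = f.read(0x158)
#     stored = int.from_bytes(f.read(4), "little")  # endianness assumed
# print(hex(ceph_crc32c(payload)), hex(stored))
```

If the stored value still doesn't match, the remaining suspects would be the
exact byte range covered by the checksum and the endianness of the stored
word (assumed little-endian above).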


Thanks in advance,
Florian
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Self shutdown of 1 whole system (Debian stretch/Ceph 12.2.7/bluestore)

2018-07-23 Thread Nicolas Huillard
On Monday 23 July 2018 at 18:23 +1000, Brad Hubbard wrote:
> Ceph doesn't shut down systems as in kill or reboot the box if that's
> what you're saying?

That's the first part of what I was saying, yes. I was pretty sure Ceph
doesn't reboot/shutdown/reset, but now I'm 100% sure, thanks.
Maybe systemd triggered something, but without any lasting traces.
The kernel didn't leave any more traces in kernel.log, and since the
server was off, there was no oops remaining on the console...

I'm currently activating "Auto video recording" at the BMC/IPMI level,
as that may help next time this event occurs... Triggers look like
they're tuned for Windows BSOD though...

Thanks for all answers ;-)

> On Mon, Jul 23, 2018 at 5:04 PM, Nicolas Huillard wrote:
> > On Monday 23 July 2018 at 11:07 +0700, Konstantin Shalygin wrote:
> > > > I even have no fancy kernel or device, just real standard
> > > > Debian.
> > > > The
> > > > uptime was 6 days since the upgrade from 12.2.6...
> > > 
> > > Nicolas, you should upgrade your 12.2.6 to 12.2.7 due to bugs in
> > > this release.

-- 
Nicolas Huillard
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Converting to multisite

2018-07-23 Thread Robert Stanford
 I already have a set of default.rgw.* pools, and they are in use.  I want to
convert to multisite.  The tutorials show creating new pools
(zone.rgw.*).  Do I have to destroy my old pools and lose all the data in
order to convert to multisite?
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Self shutdown of 1 whole system (Debian stretch/Ceph 12.2.7/bluestore)

2018-07-23 Thread Caspar Smit
Do you have any hardware watchdog running in the system? A watchdog could
trigger a powerdown if some threshold is met. Any event logs from the chassis
itself?

Kind regards,

Caspar

2018-07-21 10:31 GMT+02:00 Nicolas Huillard :

> Hi all,
>
> One of my servers silently shut down last night, with no explanation
> whatsoever in any logs. According to the existing logs, the shutdown
> (without reboot) happened between 03:58:20.061452 (last timestamp from
> /var/log/ceph/ceph-mgr.oxygene.log) and 03:59:01.515308 (new MON
> election called, for which oxygene didn't answer).
>
> Is there any way in which Ceph could silently shutdown a server?
> Can SMART self-test influence scrubbing or compaction?
>
> The only thing I have is that smartd started a long self-test on both
> OSD spinning drives on that host:
> Jul 21 03:21:35 oxygene smartd[712]: Device: /dev/sda [SAT], starting
> scheduled Long Self-Test.
> Jul 21 03:21:35 oxygene smartd[712]: Device: /dev/sdb [SAT], starting
> scheduled Long Self-Test.
> Jul 21 03:21:35 oxygene smartd[712]: Device: /dev/sdc [SAT], starting
> scheduled Long Self-Test.
> Jul 21 03:51:35 oxygene smartd[712]: Device: /dev/sda [SAT], self-test in
> progress, 90% remaining
> Jul 21 03:51:35 oxygene smartd[712]: Device: /dev/sdb [SAT], self-test in
> progress, 90% remaining
> Jul 21 03:51:35 oxygene smartd[712]: Device: /dev/sdc [SAT], previous
> self-test completed without error
>
> ...and smartctl now says that the self-tests didn't finish (on both
> drives):
> # 1  Extended offline    Interrupted (host reset)   00%   10636   -
>
> MON logs on oxygene talk about a rocksdb compaction a few minutes before
> the shutdown, and a deep-scrub finished earlier:
> /var/log/ceph/ceph-osd.6.log
> 2018-07-21 03:32:54.086021 7fd15d82c700  0 log_channel(cluster) log [DBG]
> : 6.1d deep-scrub starts
> 2018-07-21 03:34:31.185549 7fd15d82c700  0 log_channel(cluster) log [DBG]
> : 6.1d deep-scrub ok
> 2018-07-21 03:43:36.720707 7fd178082700  0 -- 172.22.0.16:6801/478362 >>
> 172.21.0.16:6800/1459922146 conn(0x556f0642b800 :6801
> s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0
> l=1).handle_connect_msg: challenging authorizer
>
> /var/log/ceph/ceph-mgr.oxygene.log
> 2018-07-21 03:58:16.060137 7fbcd300  1 mgr send_beacon standby
> 2018-07-21 03:58:18.060733 7fbcd300  1 mgr send_beacon standby
> 2018-07-21 03:58:20.061452 7fbcd300  1 mgr send_beacon standby
>
> /var/log/ceph/ceph-mon.oxygene.log
> 2018-07-21 03:52:27.702314 7f25b5406700  4 rocksdb: (Original Log Time
> 2018/07/21-03:52:27.702302) [/build/ceph-12.2.7/src/
> rocksdb/db/db_impl_compaction_flush.cc:1392] [default] Manual compaction
> from level-0 to level-1 from 'mgrstat .. '
> 2018-07-21 03:52:27.702321 7f25b5406700  4 rocksdb:
> [/build/ceph-12.2.7/src/rocksdb/db/compaction_job.cc:1403] [default] [JOB
> 1746] Compacting 1@0 + 1@1 files to L1, score -1.00
> 2018-07-21 03:52:27.702329 7f25b5406700  4 rocksdb:
> [/build/ceph-12.2.7/src/rocksdb/db/compaction_job.cc:1407] [default]
> Compaction start summary: Base version 1745 Base level 0, inputs:
> [149507(602KB)], [149505(13MB)]
> 2018-07-21 03:52:27.702348 7f25b5406700  4 rocksdb: EVENT_LOG_v1
> {"time_micros": 1532137947702334, "job": 1746, "event":
> "compaction_started", "files_L0": [149507], "files_L1": [149505], "score":
> -1, "input_data_size": 14916379}
> 2018-07-21 03:52:27.785532 7f25b5406700  4 rocksdb:
> [/build/ceph-12.2.7/src/rocksdb/db/compaction_job.cc:1116] [default] [JOB
> 1746] Generated table #149508: 4904 keys, 14808953 bytes
> 2018-07-21 03:52:27.785587 7f25b5406700  4 rocksdb: EVENT_LOG_v1
> {"time_micros": 1532137947785565, "cf_name": "default", "job": 1746,
> "event": "table_file_creation", "file_number": 149508, "file_size":
> 14808953, "table_properties": {"data
> 2018-07-21 03:52:27.785627 7f25b5406700  4 rocksdb:
> [/build/ceph-12.2.7/src/rocksdb/db/compaction_job.cc:1173] [default] [JOB
> 1746] Compacted 1@0 + 1@1 files to L1 => 14808953 bytes
> 2018-07-21 03:52:27.785656 7f25b5406700  3 rocksdb:
> [/build/ceph-12.2.7/src/rocksdb/db/version_set.cc:2087] More existing
> levels in DB than needed. max_bytes_for_level_multiplier may not be
> guaranteed.
> 2018-07-21 03:52:27.791640 7f25b5406700  4 rocksdb: (Original Log Time
> 2018/07/21-03:52:27.791526) [/build/ceph-12.2.7/src/
> rocksdb/db/compaction_job.cc:621] [default] compacted to: base level 1
> max bytes base 26843546 files[0 1 0 0 0 0 0]
> 2018-07-21 03:52:27.791657 7f25b5406700  4 rocksdb: (Original Log Time
> 2018/07/21-03:52:27.791563) EVENT_LOG_v1 {"time_micros": 1532137947791548,
> "job": 1746, "event": "compaction_finished", "compaction_time_micros":
> 83261, "output_level"
> 2018-07-21 03:52:27.792024 7f25b5406700  4 rocksdb: EVENT_LOG_v1
> {"time_micros": 1532137947792019, "job": 1746, "event":
> "table_file_deletion", "file_number": 149507}
> 2018-07-21 03:52:27.796596 7f25b5406700  4 rocksdb: EVENT_LOG_v1
> {"time_micros": 1532137947796592, "job": 1746, "event":
> 

Re: [ceph-users] Self shutdown of 1 whole system (Debian stretch/Ceph 12.2.7/bluestore)

2018-07-23 Thread Brad Hubbard
Ceph doesn't shut down systems as in kill or reboot the box if that's
what you're saying?

On Mon, Jul 23, 2018 at 5:04 PM, Nicolas Huillard wrote:
> On Monday 23 July 2018 at 11:07 +0700, Konstantin Shalygin wrote:
>> > I even have no fancy kernel or device, just real standard Debian.
>> > The
>> > uptime was 6 days since the upgrade from 12.2.6...
>>
>> Nicolas, you should upgrade your 12.2.6 to 12.2.7 due to bugs in this
>> release.
>
> That was done (cf. subject).
> This is happening with 12.2.7, fresh and 6 days old.
>
> --
> Nicolas Huillard
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
Cheers,
Brad
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Self shutdown of 1 whole system (Debian stretch/Ceph 12.2.7/bluestore)

2018-07-23 Thread Nicolas Huillard
On Monday 23 July 2018 at 11:07 +0700, Konstantin Shalygin wrote:
> > I even have no fancy kernel or device, just real standard Debian.
> > The
> > uptime was 6 days since the upgrade from 12.2.6...
> 
> Nicolas, you should upgrade your 12.2.6 to 12.2.7 due bugs in this
> release.

That was done (cf. subject).
This is happening with 12.2.7, fresh and 6 days old.

-- 
Nicolas Huillard
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com