Re: [ceph-users] Investigating Config Error, 300x reduction in IOPs performance on RGW layer

2019-07-18 Thread Paul Emmerich
On Thu, Jul 18, 2019 at 3:44 AM Robert LeBlanc  wrote:

> I'm pretty new to RGW, but I also need to get max performance.
> Have you tried moving your RGW metadata pools to nvme? Carve out a bit of
> NVMe space and then pin the pool to the SSD class in CRUSH, that way the
> small metadata ops aren't on slow media.
>
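
(For reference, the suggestion above amounts to roughly the following; the
rule name is a placeholder and the pool is only an example:

  ceph osd crush rule create-replicated rgw-meta-ssd default host ssd
  ceph osd pool set default.rgw.buckets.index crush_rule rgw-meta-ssd
)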

no, don't do that:

1) a performance difference of 130 vs. 48k IOPS is not due to SSD vs. NVMe
for metadata, unless the SSD is absolute crap
2) the OSDs already have an NVMe DB device; it's much easier to use that
directly than to partition the NVMes and run a separate partition as a
normal OSD


Assuming your NVMe disks are a reasonable size (30GB per OSD): keep the
metadata pools on the HDD OSDs; their metadata ends up in RocksDB on the NVMe
DB device anyway. It's better to have 48 OSDs with 4 NVMes behind them
handling metadata than only 4 SSD OSDs.

Running MONs in VMs with a gigabit network is fine for small clusters and is
not a performance problem.


How are you benchmarking?
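
For reference, 4K random reads at the RADOS layer are typically measured with
something like this (pool name is a placeholder):

  # write 4K objects first and keep them, then do random reads against them
  rados bench -p rgw-bench 60 write -b 4096 -t 16 --no-cleanup
  rados bench -p rgw-bench 60 rand -t 16

Comparing that against your S3 load generator with the same object size and
concurrency helps narrow down where the gap comes from.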

Paul


> 
> Robert LeBlanc
> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
>
>
> On Wed, Jul 17, 2019 at 5:59 PM Ravi Patel  wrote:
>
>> Hello,
>>
>> We have deployed a Ceph cluster and are trying to debug a massive drop
>> in performance between the RADOS layer and the RGW layer.
>>
>> ## Cluster config
>> 4 OSD nodes (12 Drives each, NVME Journals, 1 SSD drive) 40GbE NIC
>> 2 RGW nodes ( DNS RR load balancing) 40GbE NIC
>> 3 MON nodes 1 GbE NIC
>>
>> ## Pool configuration
>> RGW data pool  - replicated 3x 4M stripe (HDD)
>> RGW metadata pool - replicated 3x (SSD) pool
>>
>> ## Benchmarks
>> 4K read performance using RADOS bench: ~48,000 IOPS
>> 4K read performance via the S3 interface (RGW): ~130 IOPS
>>
>> Really trying to understand how to debug this issue. None of the nodes ever
>> breaks 15% CPU utilization and there is plenty of RAM. The one pathological
>> issue in our cluster is that the MON nodes are currently on VMs that are
>> sitting behind a single 1 GbE NIC. (We are in the process of moving them,
>> but are unsure if that will fix the issue.)
>>
>> What metrics should we be looking at to debug the RGW layer? Where do we
>> need to look?
>>
>> ---
>>
>> Ravi Patel, PhD
>> Machine Learning Systems Lead
>> Email: r...@kheironmed.com
>>
>>
>> *Kheiron Medical Technologies*
>>
>> kheironmed.com | supporting radiologists with deep learning
>>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Investigating Config Error, 300x reduction in IOPs performance on RGW layer

2019-07-18 Thread Burkhard Linke

Hi,


On 7/18/19 8:57 AM, Ravi Patel wrote:
We’ve been debugging this for a while. The data pool was originally EC
backed, with the bucket indexes on HDD pools. Moving the metadata to
SSD-backed pools improved usability and consistency, and the change
from EC to replicated improved the RADOS-layer IOPS by 4x, but didn't
seem to affect RGW IOPS very much. Based on that I think
there is a configuration error somewhere.


We can try it, but we're not sure that the hardware is the bottleneck.

It would be good to understand whether there are any performance counters or
metrics we should be looking at to see where the issue might be.



Just my 2 ct:


What kind of authentication do you use within RGW? Local authentication
(based on username/password stored in RGW metadata), Keystone or LDAP?
If you do not use local authentication, each request has to be validated
against an external source. In the case of Keystone this means the RGW has
to send the request and authentication information to Keystone for
validation, since it does not have access to the plaintext password/secret
key. This adds an extra round trip for each request.



If this upcall uses an SSL/TLS based connection, you might even need
to do a complete handshake for each upcall (not sure whether RGW is
using keep-alive and persistent connections in this case, maybe a
developer can comment?).


For local authentication I'm also not sure whether the metadata is
cached; if it isn't, that would require another round trip to Ceph to
retrieve the password.



If you are using Keystone, you can test this by creating a local user +
bucket and benchmarking that account vs. a Keystone-based account.
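
A rough sketch of that test (the uid and display name are just examples):

  radosgw-admin user create --uid=benchtest --display-name="Bench Test"

The command prints an access key and secret key; point your S3 benchmark tool
at those credentials and compare the results against the Keystone-backed
account.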



Regards,

Burkhard


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Allocation recommendations for separate blocks.db and WAL

2019-07-18 Thread Lars Marowsky-Bree
On 2019-07-17T11:56:25, Robert LeBlanc  wrote:

> So, I see the recommendation for 4% of OSD space for blocks.db/WAL and the
> corresponding discussion regrading the 3/30/300GB vs 6/60/600GB allocation.
> 
> How does this change when WAL is separate from blocks.db?
> 
> Reading [0] it seems that 6/60/600 is not correct. It seems that to compact
> a 300GB DB, you take values from the layer above (which is only 10% of
> the lower layer, and only the percentage that exceeds the trigger point
> will be merged down) and merge them in, so in the worst case you would
> need 333GB (300+30+3) plus some headroom.

I think the doubling of values is mainly used to leave sufficient
headroom for all possible overhead.

The most common choice we see here is the 60/64 GB scenario. (Computer
folks tend to think in powers of two. ;-)

It's not cost effective to haggle too much; at any given 1:n ratio, the
60 GB * n on the shared device is not the significant cost factor. Going
too low, however, would likely be rather annoying in the future, so why
not play it safe?

The 4% general advice seems incomplete; if anything, one should possibly
then round up to the next sensible value. But this heavily depends on
the workload - if the cluster only hosts RBDs, you'll see much less
metadata, for example. Unfortunately, we don't seem to have
significantly better recommendations yet.
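
A rough worked example of the 4% rule (numbers purely illustrative):

  4 TB data device -> 0.04 * 4000 GB = 160 GB block.db

which one would then round up to the next sensible boundary rather than
provisioning exactly 160 GB.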


Regards,
Lars

-- 
SUSE Linux GmbH, GF: Felix Imendörffer, Mary Higgins, Sri Rasiah, HRB 21284 (AG 
Nürnberg)
"Architects should open possibilities and not determine everything." (Ueli 
Zbinden)
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] OSD replacement causes slow requests

2019-07-18 Thread Eugen Block

Hi list,

we're facing an unexpected recovery behavior of an upgraded cluster  
(Luminous -> Nautilus).


We added new servers with Nautilus to the existing Luminous cluster,
so we could first replace the MONs step by step. Then we moved the old
servers to a new root in the crush map and added the new OSDs to
the default root so we would need to rebalance the data only once.
This almost worked as planned, except for many slow and stuck
requests. We did this after business hours so the impact was
negligible, and we didn't really investigate; the goal was to finish
the rebalancing.


But after only two days one of the new OSDs (osd.30) already reported
errors, so we need to replace that disk.
The replacement disk (osd.0) has been added with an initial crush
weight of 0 (and reweight 0) to control the backfill in small steps.
This turns out to be harder than it should be (and than we have experienced
so far): no matter how small the steps are, the cluster immediately
reports slow requests. We can't disrupt the production environment, so
we cancelled the backfill/recovery for now. This procedure has
been successful in the past with Luminous, which is why we're so
surprised.
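
For reference, the procedure looks roughly like this (values illustrative):

  # raise the weight of the replacement OSD in small increments
  ceph osd crush reweight osd.0 0.2
  # wait for backfill to settle, then continue with 0.4, 0.6, ...
  # optionally slow recovery down further:
  ceph tell 'osd.*' injectargs '--osd_recovery_sleep_hdd 0.2'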


The recovery and backfill parameters are pretty low:

"osd_max_backfills": "1",
"osd_recovery_max_active": "3",

This usually allowed a slow backfill while production work continued;
now it doesn't.


Our ceph version is (only the active MDS still runs Luminous, the  
designated server is currently being upgraded):


14.2.0-300-gacd2f2b9e1 (acd2f2b9e196222b0350b3b59af9981f91706c7f)  
nautilus (stable)


Is there anything we missed that we should be aware of in Nautilus
regarding recovery and replacement scenarios?
We couldn't reduce the weight of that OSD below 0.16; anything
lower results in slow requests.
During the weight reduction several PGs get stuck in
activating+remapped state, sometimes only recoverable by
restarting the affected OSD several times. Reducing the crush weight
leads to the same effect.


Please note: the old servers in root-ec are going to be EC-only OSDs;
that's why they're still in the cluster.


Any pointers to what goes wrong here would be highly appreciated! If  
you need any other information I'd be happy to provide it.


Best regards,
Eugen


This is our osd tree:

ID  CLASS WEIGHT   TYPE NAME       STATUS REWEIGHT PRI-AFF
-19       11.09143 root root-ec
 -2        5.54572     host ceph01
  1   hdd  0.92429         osd.1      down        0 1.00000
  4   hdd  0.92429         osd.4        up        0 1.00000
  6   hdd  0.92429         osd.6        up        0 1.00000
 13   hdd  0.92429         osd.13       up        0 1.00000
 16   hdd  0.92429         osd.16       up        0 1.00000
 18   hdd  0.92429         osd.18       up        0 1.00000
 -3        5.54572     host ceph02
  2   hdd  0.92429         osd.2        up        0 1.00000
  5   hdd  0.92429         osd.5        up        0 1.00000
  7   hdd  0.92429         osd.7        up        0 1.00000
 12   hdd  0.92429         osd.12       up        0 1.00000
 17   hdd  0.92429         osd.17       up        0 1.00000
 19   hdd  0.92429         osd.19       up        0 1.00000
 -5              0     host ceph03
 -1       38.32857 root default
-31       10.79997     host ceph04
 25   hdd  3.59999         osd.25       up  1.00000 1.00000
 26   hdd  3.59999         osd.26       up  1.00000 1.00000
 27   hdd  3.59999         osd.27       up  1.00000 1.00000
-34       14.39995     host ceph05
  0   hdd  3.59998         osd.0        up        0 1.00000
 28   hdd  3.59999         osd.28       up  1.00000 1.00000
 29   hdd  3.59999         osd.29       up  1.00000 1.00000
 30   hdd  3.59999         osd.30       up  0.15999       0
-37       10.79997     host ceph06
 31   hdd  3.59999         osd.31       up  1.00000 1.00000
 32   hdd  3.59999         osd.32       up  1.00000 1.00000
 33   hdd  3.59999         osd.33       up  1.00000 1.00000


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] pools limit

2019-07-18 Thread M Ranga Swami Reddy
Hi - I can start with 64 PGs, as I have 10 nodes with 18 OSDs
per node.

On Tue, Jul 16, 2019 at 9:01 PM Janne Johansson  wrote:

> Den tis 16 juli 2019 kl 16:16 skrev M Ranga Swami Reddy <
> swamire...@gmail.com>:
>
>> Hello - I have created 10 nodes ceph cluster with 14.x version. Can you
>> please confirm below:
>> Q1 - Can I create 100+ pool (or more) on the cluster? (the reason is -
>> creating a pool per project). Any limitation on pool creation?
>>
>> Q2 - In the above pool - I use 128 PG-NUM - to start with and enable
>> autoscale for PG_NUM, so that based on the data in the pool, PG_NUM will
>> increase by ceph itself.
>>
>>
> 12800 PGs in total might be a bit much, depending on how many OSDs you
> have in total for these pools. OSDs aim for something like ~100 PGs per OSD
> at most, so 12800 PGs in total, times 3 for replication=3, makes it
> necessary to have quite a lot of OSDs per host. I guess the autoscaler might
> end up scaling your pools downwards instead of upwards. There is nothing wrong
> with starting with PG_NUM 8 or so and having the autoscaler increase the pools
> that actually do get a lot of data.
>
> 100 pools * repl = 3 * pg_num 8 => 2400 PGs, which is fine for 24 OSDs but
> would need more OSDs as some of those pools grow in data/objects.
>
> 100 * 3 * 128 => 38400 PGs, which requires 384 OSDs, or close to 40 OSDs
> per host in your setup. That might become a limiting factor in itself,
> sticking so many OSDs in a single box.
>
> --
> May the most significant bit of your life be positive.
>
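
For reference, the rough math for our setup, and how a pool could be started
small with the autoscaler enabled as suggested (pool name is a placeholder):

  # 10 nodes * 18 OSDs = 180 OSDs; at ~100 PGs per OSD and replica 3:
  #   180 * 100 / 3 = 6000 PGs total across all pools
  ceph osd pool create project-a 8 8
  ceph osd pool set project-a pg_autoscale_mode on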
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Ceph OSD daemon causes network card issues

2019-07-18 Thread Geoffrey Rhodes
Hi Cephers,

I've been having an issue since upgrading my cluster to Mimic 6 months ago
(previously installed with Luminous 12.2.1).
All nodes that have the same PCIe network card seem to lose network
connectivity randomly (the frequency ranges from a few days to weeks per host
node).
The affected nodes only have the Intel 82576 LAN Card in common, different
motherboards, installed drives, RAM and even PSUs.
Nodes that have the Intel I350 cards are not affected by the Mimic upgrade.
Each host node has recommended RAM installed and has between 4 and 6 OSDs /
sata hard drives installed.
The cluster operated for over a year (Luminous) without a single issue,
only after the Mimic upgrade did the issues begin with these nodes.
The cluster is only used for CephFS (file storage, low intensity usage) and
makes use of erasure data pool (K=4, M=2).

I've tested many things, different kernel versions, different Ubuntu LTS
releases, re-installation and even CENTOS 7, different releases of Mimic,
different igb drivers.
If I stop the ceph-osd daemons the issue does not occur.  If I swap out the
Intel 82576 card with the Intel I350 the issue is resolved.
I don't have any more ideas other than replacing the cards, but I feel the
issue is linked to the ceph-osd daemon and a change in the Mimic release.
Below are the various software versions and drivers I've tried and a log
extract from a node that lost network connectivity. - Any help or
suggestions would be greatly appreciated.

*OS:*  Ubuntu 16.04 / 18.04 and recently CENTOS 7
*Ceph Version:*Mimic (currently 13.2.6)
*Network card:*4-PORT 1GB INTEL 82576 LAN CARD (AOC-SG-I4)
*Driver:  *   igb
*Driver Versions:* 5.3.0-k / 5.3.5.22s / 5.4.0-k
*Network Config:* 2 x bonded (LACP) 1GB nic for public net,   2 x
bonded (LACP) 1GB nic for private net
*Log errors:*
Jun 27 12:10:28 cephnode5 kernel: [497346.638608] igb :03:00.0
enp3s0f0: PCIe link lost, device now detached
Jun 27 12:10:28 cephnode5 kernel: [497346.686752] igb :04:00.1
enp4s0f1: PCIe link lost, device now detached
Jun 27 12:10:29 cephnode5 kernel: [497347.550473] igb :03:00.1
enp3s0f1: PCIe link lost, device now detached
Jun 27 12:10:29 cephnode5 kernel: [497347.646785] igb :04:00.0
enp4s0f0: PCIe link lost, device now detached
Jun 27 12:10:43 cephnode5 ceph-osd[2575]: 2019-06-27 12:10:43.793
7f73ca637700 -1 osd.15 28497 heartbeat_check: no reply from 10.100.4.1:6809
osd.16 since back 2019-06
-27 12:10:27.438961 front 2019-06-27 12:10:23.338012 (cutoff 2019-06-27
12:10:23.796726)
Jun 27 12:10:43 cephnode5 ceph-osd[2575]: 2019-06-27 12:10:43.793
7f73ca637700 -1 osd.15 28497 heartbeat_check: no reply from 10.100.6.1:6804
osd.20 since back 2019-06
-27 12:10:27.438961 front 2019-06-27 12:10:23.338012 (cutoff 2019-06-27
12:10:23.796726)
Jun 27 12:10:43 cephnode5 ceph-osd[2575]: 2019-06-27 12:10:43.793
7f73ca637700 -1 osd.15 28497 heartbeat_check: no reply from 10.100.7.1:6803
osd.25 since back 2019-06
-27 12:10:23.338012 front 2019-06-27 12:10:23.338012 (cutoff 2019-06-27
12:10:23.796726)
Jun 27 12:10:43 cephnode5 ceph-osd[2575]: 2019-06-27 12:10:43.793
7f73ca637700 -1 osd.15 28497 heartbeat_check: no reply from 10.100.8.1:6803
osd.30 since back 2019-06
-27 12:10:27.438961 front 2019-06-27 12:10:23.338012 (cutoff 2019-06-27
12:10:23.796726)
Jun 27 12:10:43 cephnode5 ceph-osd[2575]: 2019-06-27 12:10:43.793
7f73ca637700 -1 osd.15 28497 heartbeat_check: no reply from 10.100.9.1:6808
osd.43 since back 2019-06
-27 12:10:23.338012 front 2019-06-27 12:10:23.338012 (cutoff 2019-06-27
12:10:23.796726)


Kind regards
Geoffrey Rhodes
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph OSD daemon causes network card issues

2019-07-18 Thread Konstantin Shalygin

I've been having an issue since upgrading my cluster to Mimic 6 months ago
(previously installed with Luminous 12.2.1).
All nodes that have the same PCIe network card seem to loose network
connectivity randomly. (frequency ranges from a few days to weeks per host
node)
The affected nodes only have the Intel 82576 LAN Card in common, different
motherboards, installed drives, RAM and even PSUs.
Nodes that have the Intel I350 cards are not affected by the Mimic upgrade.
Each host node has recommended RAM installed and has between 4 and 6 OSDs /
sata hard drives installed.
The cluster operated for over a year (Luminous) without a single issue,
only after the Mimic upgrade did the issues begin with these nodes.
The cluster is only used for CephFS (file storage, low intensity usage) and
makes use of erasure data pool (K=4, M=2).

I've tested many things, different kernel versions, different Ubuntu LTS
releases, re-installation and even CENTOS 7, different releases of Mimic,
different igb drivers.
If I stop the ceph-osd daemons the issue does not occur.  If I swap out the
Intel 82576 card with the Intel I350 the issue is resolved.
I haven't any more ideas other than replacing the cards but I feel the
issue is linked to the ceph-osd daemon and a change in the Mimic release.
Below are the various software versions and drivers I've tried and a log
extract from a node that lost network connectivity. - Any help or
suggestions would be greatly appreciated.

*OS:*  Ubuntu 16.04 / 18.04 and recently CENTOS 7
*Ceph Version:*Mimic (currently 13.2.6)
*Network card:*4-PORT 1GB INTEL 82576 LAN CARD (AOC-SG-I4)
*Driver:  *   igb
*Driver Versions:* 5.3.0-k / 5.3.5.22s / 5.4.0-k
*Network Config:* 2 x bonded (LACP) 1GB nic for public net,   2 x
bonded (LACP) 1GB nic for private net
*Log errors:*
Jun 27 12:10:28 cephnode5 kernel: [497346.638608] igb :03:00.0
enp3s0f0: PCIe link lost, device now detached
Jun 27 12:10:28 cephnode5 kernel: [497346.686752] igb :04:00.1
enp4s0f1: PCIe link lost, device now detached
Jun 27 12:10:29 cephnode5 kernel: [497347.550473] igb :03:00.1
enp3s0f1: PCIe link lost, device now detached
Jun 27 12:10:29 cephnode5 kernel: [497347.646785] igb :04:00.0
enp4s0f0: PCIe link lost, device now detached
Jun 27 12:10:43 cephnode5 ceph-osd[2575]: 2019-06-27 12:10:43.793
7f73ca637700 -1 osd.15 28497 heartbeat_check: no reply from 10.100.4.1:6809
osd.16 since back 2019-06
-27 12:10:27.438961 front 2019-06-27 12:10:23.338012 (cutoff 2019-06-27
12:10:23.796726)
Jun 27 12:10:43 cephnode5 ceph-osd[2575]: 2019-06-27 12:10:43.793
7f73ca637700 -1 osd.15 28497 heartbeat_check: no reply from 10.100.6.1:6804
osd.20 since back 2019-06
-27 12:10:27.438961 front 2019-06-27 12:10:23.338012 (cutoff 2019-06-27
12:10:23.796726)
Jun 27 12:10:43 cephnode5 ceph-osd[2575]: 2019-06-27 12:10:43.793
7f73ca637700 -1 osd.15 28497 heartbeat_check: no reply from 10.100.7.1:6803
osd.25 since back 2019-06
-27 12:10:23.338012 front 2019-06-27 12:10:23.338012 (cutoff 2019-06-27
12:10:23.796726)
Jun 27 12:10:43 cephnode5 ceph-osd[2575]: 2019-06-27 12:10:43.793
7f73ca637700 -1 osd.15 28497 heartbeat_check: no reply from 10.100.8.1:6803
osd.30 since back 2019-06
-27 12:10:27.438961 front 2019-06-27 12:10:23.338012 (cutoff 2019-06-27
12:10:23.796726)
Jun 27 12:10:43 cephnode5 ceph-osd[2575]: 2019-06-27 12:10:43.793
7f73ca637700 -1 osd.15 28497 heartbeat_check: no reply from 10.100.9.1:6808
osd.43 since back 2019-06
-27 12:10:23.338012 front 2019-06-27 12:10:23.338012 (cutoff 2019-06-27
12:10:23.796726)


Paste your `ethtool -S <iface>`, `ethtool -i <iface>` and `dmesg
-TL | grep igb`.




k

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph OSD daemon causes network card issues

2019-07-18 Thread Geoffrey Rhodes
Sure, also attached.

cephuser@cephnode6:~$ ethtool -S enp3s0f0
NIC statistics:
 rx_packets: 3103528
 tx_packets: 20954382
 rx_bytes: 1385006975
 tx_bytes: 30063866207
 rx_broadcast: 8
 tx_broadcast: 2
 rx_multicast: 14098
 tx_multicast: 476
 multicast: 14098
 collisions: 0
 rx_crc_errors: 0
 rx_no_buffer_count: 0
 rx_missed_errors: 0
 tx_aborted_errors: 0
 tx_carrier_errors: 0
 tx_window_errors: 0
 tx_abort_late_coll: 0
 tx_deferred_ok: 0
 tx_single_coll_ok: 0
 tx_multi_coll_ok: 0
 tx_timeout_count: 0
 rx_long_length_errors: 0
 rx_short_length_errors: 0
 rx_align_errors: 0
 tx_tcp_seg_good: 1563318
 tx_tcp_seg_failed: 0
 rx_flow_control_xon: 0
 rx_flow_control_xoff: 0
 tx_flow_control_xon: 0
 tx_flow_control_xoff: 0
 rx_long_byte_count: 1385006975
 tx_dma_out_of_sync: 0
 tx_smbus: 0
 rx_smbus: 0
 dropped_smbus: 0
 os2bmc_rx_by_bmc: 0
 os2bmc_tx_by_bmc: 0
 os2bmc_tx_by_host: 0
 os2bmc_rx_by_host: 0
 tx_hwtstamp_timeouts: 0
 tx_hwtstamp_skipped: 0
 rx_hwtstamp_cleared: 0
 rx_errors: 0
 tx_errors: 0
 tx_dropped: 0
 rx_length_errors: 0
 rx_over_errors: 0
 rx_frame_errors: 0
 rx_fifo_errors: 0
 tx_fifo_errors: 0
 tx_heartbeat_errors: 0
 tx_queue_0_packets: 292684
 tx_queue_0_bytes: 216244150
 tx_queue_0_restart: 0
 tx_queue_1_packets: 6489256
 tx_queue_1_bytes: 9529075383
 tx_queue_1_restart: 0
 tx_queue_2_packets: 325263
 tx_queue_2_bytes: 330734519
 tx_queue_2_restart: 0
 tx_queue_3_packets: 3074167
 tx_queue_3_bytes: 4363429551
 tx_queue_3_restart: 0
 tx_queue_4_packets: 319961
 tx_queue_4_bytes: 242633539
 tx_queue_4_restart: 0
 tx_queue_5_packets: 252717
 tx_queue_5_bytes: 191717682
 tx_queue_5_restart: 0
 tx_queue_6_packets: 3402851
 tx_queue_6_bytes: 4966526009
 tx_queue_6_restart: 0
 tx_queue_7_packets: 6797483
 tx_queue_7_bytes: 10139687288
 tx_queue_7_restart: 0
 rx_queue_0_packets: 215189
 rx_queue_0_bytes: 153838496
 rx_queue_0_drops: 0
 rx_queue_0_csum_err: 0
 rx_queue_0_alloc_failed: 0
 rx_queue_1_packets: 528614
 rx_queue_1_bytes: 192679289
 rx_queue_1_drops: 0
 rx_queue_1_csum_err: 0
 rx_queue_1_alloc_failed: 0
 rx_queue_2_packets: 186822
 rx_queue_2_bytes: 141708803
 rx_queue_2_drops: 0
 rx_queue_2_csum_err: 0
 rx_queue_2_alloc_failed: 0
 rx_queue_3_packets: 173099
 rx_queue_3_bytes: 131321568
 rx_queue_3_drops: 0
 rx_queue_3_csum_err: 0
 rx_queue_3_alloc_failed: 0
 rx_queue_4_packets: 147423
 rx_queue_4_bytes: 111807376
 rx_queue_4_drops: 0
 rx_queue_4_csum_err: 0
 rx_queue_4_alloc_failed: 0
 rx_queue_5_packets: 133384
 rx_queue_5_bytes: 110116165
 rx_queue_5_drops: 0
 rx_queue_5_csum_err: 0
 rx_queue_5_alloc_failed: 0
 rx_queue_6_packets: 1598989
 rx_queue_6_bytes: 440079884
 rx_queue_6_drops: 0
 rx_queue_6_csum_err: 0
 rx_queue_6_alloc_failed: 0
 rx_queue_7_packets: 120008
 rx_queue_7_bytes: 91041282
 rx_queue_7_drops: 0
 rx_queue_7_csum_err: 0
 rx_queue_7_alloc_failed: 0
cephuser@cephnode6:~$
cephuser@cephnode6:~$ ethtool -S enp3s0f1
NIC statistics:
 rx_packets: 2417818
 tx_packets: 1605247
 rx_bytes: 1790337041
 tx_bytes: 1268847302
 rx_broadcast: 80
 tx_broadcast: 11
 rx_multicast: 14235
 tx_multicast: 463
 multicast: 14235
 collisions: 0
 rx_crc_errors: 0
 rx_no_buffer_count: 0
 rx_missed_errors: 0
 tx_aborted_errors: 0
 tx_carrier_errors: 0
 tx_window_errors: 0
 tx_abort_late_coll: 0
 tx_deferred_ok: 0
 tx_single_coll_ok: 0
 tx_multi_coll_ok: 0
 tx_timeout_count: 0
 rx_long_length_errors: 0
 rx_short_length_errors: 0
 rx_align_errors: 0
 tx_tcp_seg_good: 511041
 tx_tcp_seg_failed: 0
 rx_flow_control_xon: 0
 rx_flow_control_xoff: 0
 tx_flow_control_xon: 0
 tx_flow_control_xoff: 0
 rx_long_byte_count: 1790337041
 tx_dma_out_of_sync: 0
 tx_smbus: 0
 rx_smbus: 0
 dropped_smbus: 0
 os2bmc_rx_by_bmc: 0
 os2bmc_tx_by_bmc: 0
 os2bmc_tx_by_host: 0
 os2bmc_rx_by_host: 0
 tx_hwtstamp_timeouts: 0
 tx_hwtstamp_skipped: 0
 rx_hwtstamp_cleared: 0
 rx_errors: 0
 tx_errors: 0
 tx_dropped: 0
 rx_length_errors: 0
 rx_over_errors: 0
 rx_frame_errors: 0
 rx_fifo_errors: 0
 tx_fifo_errors: 0
 tx_heartbeat_errors: 0
 tx_queue_0_packets: 187486
 tx_queue_0_bytes: 158004014
 tx_queue_0_restart: 0
 tx_queue_1_packets: 292695
 tx_queue_1_bytes: 216338071
 tx_queue_1_restart: 0
 tx_queue_2_packets: 230556
 tx_queue_2_bytes: 187701434
 tx_queue_2_restart: 0
 tx_queue_3_packets: 175072
 tx_queue_3_bytes: 132739732
 t

[ceph-users] Ceph Day London - October 24 (Call for Papers!)

2019-07-18 Thread Wido den Hollander
Hi,

We will be having Ceph Day London October 24th!

https://ceph.com/cephdays/ceph-day-london-2019/

The CFP is now open for you to get your Ceph related content in front
of the Ceph community ranging from all levels of expertise:

https://forms.zohopublic.com/thingee/form/CephDayLondon2019/formperma/h96jZGAAm1dmEzuk2FxbKSpxSk4Y7u4zLtZJcq2Vijk

If your company is interested in sponsoring the event, we would be
delighted to have you. Please contact me directly for further information.

The Ceph Day is co-located with the Apache CloudStack project. There
will be two tracks where people can choose between Ceph and CloudStack.

After the Ceph Day there's going to be beers in the pub nearby to make
new friends.

Join us in London on October 24th!

Wido
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph OSD daemon causes network card issues

2019-07-18 Thread Paul Emmerich
Hi,

Intel 82576 is bad. I've seen quite a few problems with these older igb
family NICs, but losing the PCIe link is a new one.
I usually see them getting stuck with a message like "tx queue X hung,
resetting device..."

Try disabling offloading features using ethtool; that sometimes helps with
the problems I've seen. Maybe yours is just a variant of the stuck
problem?
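
Something along these lines (the interface name is a placeholder, and which
offloads exist depends on the driver):

  ethtool -k enp3s0f0                                 # list current offloads
  ethtool -K enp3s0f0 tso off gso off gro off sg off  # disable them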


Paul

-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90


On Thu, Jul 18, 2019 at 12:47 PM Geoffrey Rhodes 
wrote:

> Hi Cephers,
>
> I've been having an issue since upgrading my cluster to Mimic 6 months ago
> (previously installed with Luminous 12.2.1).
> All nodes that have the same PCIe network card seem to loose network
> connectivity randomly. (frequency ranges from a few days to weeks per host
> node)
> The affected nodes only have the Intel 82576 LAN Card in common, different
> motherboards, installed drives, RAM and even PSUs.
> Nodes that have the Intel I350 cards are not affected by the Mimic upgrade.
> Each host node has recommended RAM installed and has between 4 and 6 OSDs
> / sata hard drives installed.
> The cluster operated for over a year (Luminous) without a single issue,
> only after the Mimic upgrade did the issues begin with these nodes.
> The cluster is only used for CephFS (file storage, low intensity usage)
> and makes use of erasure data pool (K=4, M=2).
>
> I've tested many things, different kernel versions, different Ubuntu LTS
> releases, re-installation and even CENTOS 7, different releases of Mimic,
> different igb drivers.
> If I stop the ceph-osd daemons the issue does not occur.  If I swap out
> the Intel 82576 card with the Intel I350 the issue is resolved.
> I haven't any more ideas other than replacing the cards but I feel the
> issue is linked to the ceph-osd daemon and a change in the Mimic release.
> Below are the various software versions and drivers I've tried and a log
> extract from a node that lost network connectivity. - Any help or
> suggestions would be greatly appreciated.
>
> *OS:*  Ubuntu 16.04 / 18.04 and recently CENTOS 7
> *Ceph Version:*Mimic (currently 13.2.6)
> *Network card:*4-PORT 1GB INTEL 82576 LAN CARD (AOC-SG-I4)
> *Driver:  *   igb
> *Driver Versions:* 5.3.0-k / 5.3.5.22s / 5.4.0-k
> *Network Config:* 2 x bonded (LACP) 1GB nic for public net,   2 x
> bonded (LACP) 1GB nic for private net
> *Log errors:*
> Jun 27 12:10:28 cephnode5 kernel: [497346.638608] igb :03:00.0
> enp3s0f0: PCIe link lost, device now detached
> Jun 27 12:10:28 cephnode5 kernel: [497346.686752] igb :04:00.1
> enp4s0f1: PCIe link lost, device now detached
> Jun 27 12:10:29 cephnode5 kernel: [497347.550473] igb :03:00.1
> enp3s0f1: PCIe link lost, device now detached
> Jun 27 12:10:29 cephnode5 kernel: [497347.646785] igb :04:00.0
> enp4s0f0: PCIe link lost, device now detached
> Jun 27 12:10:43 cephnode5 ceph-osd[2575]: 2019-06-27 12:10:43.793
> 7f73ca637700 -1 osd.15 28497 heartbeat_check: no reply from
> 10.100.4.1:6809 osd.16 since back 2019-06
> -27 12:10:27.438961 front 2019-06-27 12:10:23.338012 (cutoff 2019-06-27
> 12:10:23.796726)
> Jun 27 12:10:43 cephnode5 ceph-osd[2575]: 2019-06-27 12:10:43.793
> 7f73ca637700 -1 osd.15 28497 heartbeat_check: no reply from
> 10.100.6.1:6804 osd.20 since back 2019-06
> -27 12:10:27.438961 front 2019-06-27 12:10:23.338012 (cutoff 2019-06-27
> 12:10:23.796726)
> Jun 27 12:10:43 cephnode5 ceph-osd[2575]: 2019-06-27 12:10:43.793
> 7f73ca637700 -1 osd.15 28497 heartbeat_check: no reply from
> 10.100.7.1:6803 osd.25 since back 2019-06
> -27 12:10:23.338012 front 2019-06-27 12:10:23.338012 (cutoff 2019-06-27
> 12:10:23.796726)
> Jun 27 12:10:43 cephnode5 ceph-osd[2575]: 2019-06-27 12:10:43.793
> 7f73ca637700 -1 osd.15 28497 heartbeat_check: no reply from
> 10.100.8.1:6803 osd.30 since back 2019-06
> -27 12:10:27.438961 front 2019-06-27 12:10:23.338012 (cutoff 2019-06-27
> 12:10:23.796726)
> Jun 27 12:10:43 cephnode5 ceph-osd[2575]: 2019-06-27 12:10:43.793
> 7f73ca637700 -1 osd.15 28497 heartbeat_check: no reply from
> 10.100.9.1:6808 osd.43 since back 2019-06
> -27 12:10:23.338012 front 2019-06-27 12:10:23.338012 (cutoff 2019-06-27
> 12:10:23.796726)
>
>
> Kind regards
> Geoffrey Rhodes
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] reproducable rbd-nbd crashes

2019-07-18 Thread Marc Schöchlin
Hello cephers,

rbd-nbd crashes in a reproducible way here.

I created the following bug report: https://tracker.ceph.com/issues/40822

Do you also experience this problem?
Do you have suggestions for in depth debug data collection?

I invoke the following command on a freshly mapped rbd and rbd-nbd crashes:

# find . -type f -name "*.sql" -exec ionice -c3 nice -n 20 gzip -v {} \;
gzip: ./deprecated_data/data_archive.done/entry_search_201232.sql.gz already 
exists; do you wish to overwrite (y or n)? y
./deprecated_data/data_archive.done/entry_search_201232.sql: 84.1% -- 
replaced with ./deprecated_data/data_archive.done/entry_search_201232.sql.gz
./deprecated_data/data_archive.done/entry_search_201233.sql:
gzip: ./deprecated_data/data_archive.done/entry_search_201233.sql: Input/output 
error
gzip: ./deprecated_data/data_archive.done/entry_search_201234.sql: Input/output 
error
gzip: ./deprecated_data/data_archive.done/entry_search_201235.sql: Input/output 
error
gzip: ./deprecated_data/data_archive.done/entry_search_201236.sql: Input/output 
error


dmesg output:

[579763.020890] block nbd0: Connection timed out
[579763.020926] block nbd0: shutting down sockets
[579763.020943] print_req_error: I/O error, dev nbd0, sector 3221296950
[579763.020946] block nbd0: Receive data failed (result -32)
[579763.020952] print_req_error: I/O error, dev nbd0, sector 4523172248
[579763.021001] XFS (nbd0): metadata I/O error: block 0xc0011736 
("xlog_iodone") error 5 numblks 512
[579763.021031] XFS (nbd0): xfs_do_force_shutdown(0x2) called from line 1261 of 
file /build/linux-hwe-xJVMkx/linux-hwe-4.15.0/fs/xfs/xfs_log.c.  Return address 
= 0x918af758
[579763.021046] print_req_error: I/O error, dev nbd0, sector 4523172248
[579763.021161] XFS (nbd0): Log I/O Error Detected.  Shutting down filesystem
[579763.021176] XFS (nbd0): Please umount the filesystem and rectify the 
problem(s)
[579763.176834] print_req_error: I/O error, dev nbd0, sector 3221296969
[579763.176856] print_req_error: I/O error, dev nbd0, sector 2195113096
[579763.176869] XFS (nbd0): metadata I/O error: block 0xc0011749 
("xlog_iodone") error 5 numblks 512
[579763.176884] XFS (nbd0): xfs_do_force_shutdown(0x2) called from line 1261 of 
file /build/linux-hwe-xJVMkx/linux-hwe-4.15.0/fs/xfs/xfs_log.c.  Return address 
= 0x918af758
[579763.252836] print_req_error: I/O error, dev nbd0, sector 2195113352
[579763.252859] print_req_error: I/O error, dev nbd0, sector 2195113608
[579763.252869] print_req_error: I/O error, dev nbd0, sector 2195113864
[579763.356841] print_req_error: I/O error, dev nbd0, sector 2195114120
[579763.356885] print_req_error: I/O error, dev nbd0, sector 2195114376
[579763.358040] XFS (nbd0): writeback error on sector 2195119688
[579763.916813] block nbd0: Connection timed out
[579768.140839] block nbd0: Connection timed out
[579768.140859] print_req_error: 21 callbacks suppressed
[579768.140860] print_req_error: I/O error, dev nbd0, sector 2195112840
[579768.141101] XFS (nbd0): writeback error on sector 2195115592

/var/log/ceph/ceph-client.archiv.log

2019-07-18 14:52:55.387815 7fffcf7fe700  1 -- 10.23.27.200:0/3920476044 --> 
10.23.27.151:6806/2322641 -- osd_op(unknown.0.0:1853 34.132 
34:4cb446f4:::rbd_header.6c73776b8b4567:head [watch unwatch cookie 
140736414969824] snapc 0=[] ondisk+write+known_if_redirected e256219) v8 -- 
0x7fffc803a340 con 0
2019-07-18 14:52:55.388656 7fffe913b700  1 -- 10.23.27.200:0/3920476044 <== 
osd.17 10.23.27.151:6806/2322641 90  watch-notify(notify (1) cookie 
140736414969824 notify 1100452225614816 ret 0) v3  68+0+0 (1852866777 0 0) 
0x7fffe187b4c0 con 0x7fffc00054d0
2019-07-18 14:52:55.388738 7fffe913b700  1 -- 10.23.27.200:0/3920476044 <== 
osd.17 10.23.27.151:6806/2322641 91  osd_op_reply(1852 
rbd_header.6c73776b8b4567 [notify cookie 140736550101040] v0'0 uv2102967 ondisk 
= 0) v8  169+0+8 (3077247585 0 3199212159) 0x7fffe0002ef0 con 0x7fffc00054d0
2019-07-18 14:52:55.388815 7fffc700  5 librbd::Watcher: 0x7fffc0001010 
notifications_blocked: blocked=1
2019-07-18 14:52:55.388904 7fffc700  1 -- 10.23.27.200:0/3920476044 --> 
10.23.27.151:6806/2322641 -- osd_op(unknown.0.0:1854 34.132 
34:4cb446f4:::rbd_header.6c73776b8b4567:head [notify-ack cookie 0] snapc 0=[] 
ondisk+read+known_if_redirected e256219) v8 -- 0x7fffc00600a0 con 0
2019-07-18 14:52:55.389594 7fffe913b700  1 -- 10.23.27.200:0/3920476044 <== 
osd.17 10.23.27.151:6806/2322641 92  osd_op_reply(1853 
rbd_header.6c73776b8b4567 [watch unwatch cookie 140736414969824] 
v256219'2102968 uv2102967 ondisk = 0) v8  169+0+0 (242862078 0 0) 
0x7fffe0002ef0 con 0x7fffc00054d0
2019-07-18 14:52:55.389838 7fffcd7fa700 10 librbd::image::CloseRequest: 
0x55946390 handle_unregister_image_watcher: r=0
2019-07-18 14:52:55.389849 7fffcd7fa700 10 librbd::image::CloseRequest: 
0x55946390 send_flush_readahead
2019-07-18 14:52:55.389848 7fffe913b700  1 -- 10.23.27.200:0/3920476044 <== 
osd.17 10.23.27.1

[ceph-users] Need to replace OSD. How do I find physical disk

2019-07-18 Thread Pelletier, Robert
How do I find the physical disk in a Ceph Luminous cluster in order to replace
it? osd.9 is down in my cluster; it resides on the ceph-osd1 host.

If I run lsblk -io KNAME,TYPE,SIZE,MODEL,SERIAL I can get the serial numbers of
all the physical disks, for example:
sdb    disk   1.8T ST2000DM001-1CH1 Z1E5VLRG

But how do I find out which osd is mapped to sdb and so on?
When I run df -h I get this

[root@ceph-osd1 ~]# df -h
Filesystem   Size  Used Avail Use% Mounted on
/dev/mapper/ceph--osd1-root   19G  1.9G   17G  10% /
devtmpfs  48G 0   48G   0% /dev
tmpfs 48G 0   48G   0% /dev/shm
tmpfs 48G  9.3M   48G   1% /run
tmpfs 48G 0   48G   0% /sys/fs/cgroup
/dev/sda3947M  232M  716M  25% /boot
tmpfs 48G   24K   48G   1% /var/lib/ceph/osd/ceph-2
tmpfs 48G   24K   48G   1% /var/lib/ceph/osd/ceph-5
tmpfs 48G   24K   48G   1% /var/lib/ceph/osd/ceph-0
tmpfs 48G   24K   48G   1% /var/lib/ceph/osd/ceph-8
tmpfs 48G   24K   48G   1% /var/lib/ceph/osd/ceph-7
tmpfs 48G   24K   48G   1% /var/lib/ceph/osd/ceph-33
tmpfs 48G   24K   48G   1% /var/lib/ceph/osd/ceph-10
tmpfs 48G   24K   48G   1% /var/lib/ceph/osd/ceph-1
tmpfs 48G   24K   48G   1% /var/lib/ceph/osd/ceph-38
tmpfs 48G   24K   48G   1% /var/lib/ceph/osd/ceph-4
tmpfs 48G   24K   48G   1% /var/lib/ceph/osd/ceph-6
tmpfs9.5G 0  9.5G   0% /run/user/0


Robert Pelletier, IT and Security Specialist
Eastern Maine Community College
(207) 974-4782 | 354 Hogan Rd., Bangor, ME 04401

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Need to replace OSD. How do I find physical disk

2019-07-18 Thread John Petrini
Try ceph-disk list
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] reproducable rbd-nbd crashes

2019-07-18 Thread Jason Dillaman
On Thu, Jul 18, 2019 at 1:47 PM Marc Schöchlin  wrote:
>
> Hello cephers,
>
> rbd-nbd crashes in a reproducible way here.

I don't see a crash report in the log below. Is it really crashing or
is it shutting down? If it is crashing and it's reproducible, can you
install the debuginfo packages, attach gdb, and get a full backtrace
of the crash?

It seems like your cluster cannot keep up w/ the load and the nbd
kernel driver is timing out the IO and shutting down. There is a
"--timeout" option on "rbd-nbd" that you can use to increase the
kernel IO timeout for nbd.
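
For example, something like this when mapping (the image spec and timeout
value are placeholders):

  rbd-nbd map --timeout 120 rbd/archive-image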

> I created the following bug report: https://tracker.ceph.com/issues/40822
>
> Do you also experience this problem?
> Do you have suggestions for in depth debug data collection?
>
> I invoke the following command on a freshly mapped rbd and rbd_rbd crashes:
>
> # find . -type f -name "*.sql" -exec ionice -c3 nice -n 20 gzip -v {} \;
> gzip: ./deprecated_data/data_archive.done/entry_search_201232.sql.gz already 
> exists; do you wish to overwrite (y or n)? y
> ./deprecated_data/data_archive.done/entry_search_201232.sql: 84.1% -- 
> replaced with ./deprecated_data/data_archive.done/entry_search_201232.sql.gz
> ./deprecated_data/data_archive.done/entry_search_201233.sql:
> gzip: ./deprecated_data/data_archive.done/entry_search_201233.sql: 
> Input/output error
> gzip: ./deprecated_data/data_archive.done/entry_search_201234.sql: 
> Input/output error
> gzip: ./deprecated_data/data_archive.done/entry_search_201235.sql: 
> Input/output error
> gzip: ./deprecated_data/data_archive.done/entry_search_201236.sql: 
> Input/output error
> 
>
> dmesg output:
>
> [579763.020890] block nbd0: Connection timed out
> [579763.020926] block nbd0: shutting down sockets
> [579763.020943] print_req_error: I/O error, dev nbd0, sector 3221296950
> [579763.020946] block nbd0: Receive data failed (result -32)
> [579763.020952] print_req_error: I/O error, dev nbd0, sector 4523172248
> [579763.021001] XFS (nbd0): metadata I/O error: block 0xc0011736 
> ("xlog_iodone") error 5 numblks 512
> [579763.021031] XFS (nbd0): xfs_do_force_shutdown(0x2) called from line 1261 
> of file /build/linux-hwe-xJVMkx/linux-hwe-4.15.0/fs/xfs/xfs_log.c.  Return 
> address = 0x918af758
> [579763.021046] print_req_error: I/O error, dev nbd0, sector 4523172248
> [579763.021161] XFS (nbd0): Log I/O Error Detected.  Shutting down filesystem
> [579763.021176] XFS (nbd0): Please umount the filesystem and rectify the 
> problem(s)
> [579763.176834] print_req_error: I/O error, dev nbd0, sector 3221296969
> [579763.176856] print_req_error: I/O error, dev nbd0, sector 2195113096
> [579763.176869] XFS (nbd0): metadata I/O error: block 0xc0011749 
> ("xlog_iodone") error 5 numblks 512
> [579763.176884] XFS (nbd0): xfs_do_force_shutdown(0x2) called from line 1261 
> of file /build/linux-hwe-xJVMkx/linux-hwe-4.15.0/fs/xfs/xfs_log.c.  Return 
> address = 0x918af758
> [579763.252836] print_req_error: I/O error, dev nbd0, sector 2195113352
> [579763.252859] print_req_error: I/O error, dev nbd0, sector 2195113608
> [579763.252869] print_req_error: I/O error, dev nbd0, sector 2195113864
> [579763.356841] print_req_error: I/O error, dev nbd0, sector 2195114120
> [579763.356885] print_req_error: I/O error, dev nbd0, sector 2195114376
> [579763.358040] XFS (nbd0): writeback error on sector 2195119688
> [579763.916813] block nbd0: Connection timed out
> [579768.140839] block nbd0: Connection timed out
> [579768.140859] print_req_error: 21 callbacks suppressed
> [579768.140860] print_req_error: I/O error, dev nbd0, sector 2195112840
> [579768.141101] XFS (nbd0): writeback error on sector 2195115592
>
> /var/log/ceph/ceph-client.archiv.log
>
> 2019-07-18 14:52:55.387815 7fffcf7fe700  1 -- 10.23.27.200:0/3920476044 --> 
> 10.23.27.151:6806/2322641 -- osd_op(unknown.0.0:1853 34.132 
> 34:4cb446f4:::rbd_header.6c73776b8b4567:head [watch unwatch cookie 
> 140736414969824] snapc 0=[] ondisk+write+known_if_redirected e256219) v8 -- 
> 0x7fffc803a340 con 0
> 2019-07-18 14:52:55.388656 7fffe913b700  1 -- 10.23.27.200:0/3920476044 <== 
> osd.17 10.23.27.151:6806/2322641 90  watch-notify(notify (1) cookie 
> 140736414969824 notify 1100452225614816 ret 0) v3  68+0+0 (1852866777 0 
> 0) 0x7fffe187b4c0 con 0x7fffc00054d0
> 2019-07-18 14:52:55.388738 7fffe913b700  1 -- 10.23.27.200:0/3920476044 <== 
> osd.17 10.23.27.151:6806/2322641 91  osd_op_reply(1852 
> rbd_header.6c73776b8b4567 [notify cookie 140736550101040] v0'0 uv2102967 
> ondisk = 0) v8  169+0+8 (3077247585 0 3199212159) 0x7fffe0002ef0 con 
> 0x7fffc00054d0
> 2019-07-18 14:52:55.388815 7fffc700  5 librbd::Watcher: 0x7fffc0001010 
> notifications_blocked: blocked=1
> 2019-07-18 14:52:55.388904 7fffc700  1 -- 10.23.27.200:0/3920476044 --> 
> 10.23.27.151:6806/2322641 -- osd_op(unknown.0.0:1854 34.132 
> 34:4cb446f4:::rbd_header.6c73776b8b4567:head [notify-ack cookie 0] snapc 0=[] 
> ondisk+read+known_if_redirected e256219)

Re: [ceph-users] Need to replace OSD. How do I find physical disk

2019-07-18 Thread Paul Emmerich
On Thu, Jul 18, 2019 at 8:10 PM John Petrini  wrote:

> Try ceph-disk list
>

No, this system is running ceph-volume, not ceph-disk, because the
mountpoints are in tmpfs.

ceph-volume lvm list

But it looks like the disk is just completely broken and disappeared from
the system.


-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Need to replace OSD. How do I find physical disk

2019-07-18 Thread Reed Dier
You can use ceph-volume to get the LV ID

> # ceph-volume lvm list
> 
> == osd.24 ==
> 
>   [block]
> /dev/ceph-edeb727e-c6d3-4347-bfbb-b9ce7f60514b/osd-block-1da5910e-136a-48a7-8cf1-1c265b7b612a
> 
>   type  block
>   osd id24
>   osd fsid  1da5910e-136a-48a7-8cf1-1c265b7b612a
>   db device /dev/nvme0n1p4
>   db uuid   c4939e17-c787-4630-9ec7-b44565ecf845
>   block uuidn8mCnv-PW4n-43R6-I4uN-P1E0-7qDh-I5dslh
>   block device  
> /dev/ceph-edeb727e-c6d3-4347-bfbb-b9ce7f60514b/osd-block-1da5910e-136a-48a7-8cf1-1c265b7b612a
>   devices   /dev/sda
> 
>   [  db]/dev/nvme0n1p4
> 
>   PARTUUID  c4939e17-c787-4630-9ec7-b44565ecf845

And you can then match this against lsblk which should give you the LV

> $ lsblk -a
> NAME  
> MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
> sda   
>   8:00   1.8T  0 disk
> └─ceph--edeb727e--c6d3--4347--bfbb--b9ce7f60514b-osd--block--1da5910e--136a--48a7--8cf1--1c265b7b612a
>  253:60   1.8T  0 lvm
> nvme0n1   
> 259:00 372.6G  0 disk
> ├─nvme0n1p4   
> 259:40  14.9G  0 part

And if the device has just dropped off, which I have seen before, you should be 
able to see that in dmesg

> [Sat May 11 22:56:27 2019] sd 1:0:17:0: attempting task abort! 
> scmd(2d043ad6)
> [Sat May 11 22:56:27 2019] sd 1:0:17:0: [sdr] tag#0 CDB: Inquiry 12 00 00 00 
> 24 00
> [Sat May 11 22:56:27 2019] scsi target1:0:17: handle(0x001b), 
> sas_address(0x500304801f12eca1), phy(33)
> [Sat May 11 22:56:27 2019] scsi target1:0:17: enclosure logical 
> id(0x500304801f12ecbf), slot(17)
> [Sat May 11 22:56:27 2019] scsi target1:0:17: enclosure level(0x), 
> connector name( )
> [Sat May 11 22:56:28 2019] sd 1:0:17:0: device_block, handle(0x001b)
> [Sat May 11 22:56:30 2019] sd 1:0:17:0: device_unblock and setting to 
> running, handle(0x001b)
> [Sat May 11 22:56:30 2019] sd 1:0:17:0: [sdr] Synchronizing SCSI cache
> [Sat May 11 22:56:30 2019] sd 1:0:17:0: [sdr] Synchronize Cache(10) failed: 
> Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
> [Sat May 11 22:56:31 2019] scsi 1:0:17:0: task abort: SUCCESS 
> scmd(2d043ad6)
> [Sat May 11 22:56:31 2019] mpt3sas_cm0: removing handle(0x001b), 
> sas_addr(0x500304801f12eca1)
> [Sat May 11 22:56:31 2019] mpt3sas_cm0: enclosure logical 
> id(0x500304801f12ecbf), slot(17)
> [Sat May 11 22:56:31 2019] mpt3sas_cm0: enclosure level(0x), connector 
> name( )
> [Sat May 11 23:00:57 2019] Buffer I/O error on dev dm-20, logical block 
> 488378352, async page read
> [Sat May 11 23:00:57 2019] Buffer I/O error on dev dm-20, logical block 1, 
> async page read
> [Sat May 11 23:00:58 2019] Buffer I/O error on dev dm-20, logical block 
> 488378352, async page read
> [Sat May 11 23:00:58 2019] Buffer I/O error on dev dm-20, logical block 1, 
> async page read
> 
> # smartctl -a /dev/sdr
> smartctl 6.6 2016-05-31 r4324 [x86_64-linux-4.15.0-46-generic] (local build)
> Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
> 
> Smartctl open device: /dev/sdr failed: No such device
Hopefully that helps.

Reed

> On Jul 18, 2019, at 1:11 PM, Paul Emmerich  wrote:
> 
> 
> 
> On Thu, Jul 18, 2019 at 8:10 PM John Petrini  > wrote:
> Try ceph-disk list
> 
> no, this system is running ceph-volume not ceph-disk because the mountpoints 
> are in tmpfs
> 
> ceph-volume lvm list
> 
> But it looks like the disk is just completely broken and disappeared from the 
> system.
> 
> 
> -- 
> Paul Emmerich
> 
> Looking for help with your Ceph cluster? Contact us at https://croit.io 
> 
> 
> croit GmbH
> Freseniusstr. 31h
> 81247 München
> www.croit.io 
> Tel: +49 89 1896585 90
>  
> 



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Need to replace OSD. How do I find physical disk

2019-07-18 Thread ☣Adam
The block device can be found in /var/lib/ceph/osd/ceph-$ID/block
# ls -l /var/lib/ceph/osd/ceph-9/block

In my case it links to /dev/sdbvg/sdb which makes it pretty obvious
which drive this is, but the Volume Group and Logical Volume could be
named anything.  To see what physical disk(s) make up this volume group,
use lsblk (as Reed suggested)
# lsblk

If that drive needs to be located in a computer with many drives,
smartctl can be used to pull the make, model, and serial
number:
# smartctl -i /dev/sdb
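
To go straight from the VG/LV to the physical device(s) backing it, lvs can
also print that mapping:

# lvs -o lv_name,vg_name,devices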


I was not aware of ceph-volume, or `ceph-disk list` (which is apparently
now deprecated in favor of ceph-volume), so thank you to all in this
thread for teaching about alternative (arguably more proper) ways of
doing this. :-)

On 7/18/19 12:58 PM, Pelletier, Robert wrote:
> How do I find the physical disk in a Ceph luminous cluster in order to
> replace it. Osd.9 is down in my cluster which resides on ceph-osd1 host.
> 
>  
> 
> If I run lsblk -io KNAME,TYPE,SIZE,MODEL,SERIAL I can get the serial
> numbers of all the physical disks for example
> 
> sdb    disk  1.8T ST2000DM001-1CH1 Z1E5VLRG
> 
>  
> 
> But how do I find out which osd is mapped to sdb and so on?
> 
> When I run df –h I get this
> 
>  
> 
> [root@ceph-osd1 ~]# df -h
> 
> Filesystem   Size  Used Avail Use% Mounted on
> 
> /dev/mapper/ceph--osd1-root   19G  1.9G   17G  10% /
> 
> devtmpfs  48G 0   48G   0% /dev
> 
> tmpfs 48G 0   48G   0% /dev/shm
> 
> tmpfs 48G  9.3M   48G   1% /run
> 
> tmpfs 48G 0   48G   0% /sys/fs/cgroup
> 
> /dev/sda3    947M  232M  716M  25% /boot
> 
> tmpfs 48G   24K   48G   1% /var/lib/ceph/osd/ceph-2
> 
> tmpfs 48G   24K   48G   1% /var/lib/ceph/osd/ceph-5
> 
> tmpfs 48G   24K   48G   1% /var/lib/ceph/osd/ceph-0
> 
> tmpfs 48G   24K   48G   1% /var/lib/ceph/osd/ceph-8
> 
> tmpfs 48G   24K   48G   1% /var/lib/ceph/osd/ceph-7
> 
> tmpfs 48G   24K   48G   1% /var/lib/ceph/osd/ceph-33
> 
> tmpfs 48G   24K   48G   1% /var/lib/ceph/osd/ceph-10
> 
> tmpfs 48G   24K   48G   1% /var/lib/ceph/osd/ceph-1
> 
> tmpfs 48G   24K   48G   1% /var/lib/ceph/osd/ceph-38
> 
> tmpfs 48G   24K   48G   1% /var/lib/ceph/osd/ceph-4
> 
> tmpfs 48G   24K   48G   1% /var/lib/ceph/osd/ceph-6
> 
> tmpfs    9.5G 0  9.5G   0% /run/user/0
> 
>  
> 
>  
> 
> *Robert Pelletier, **IT and Security Specialist***
> 
> Eastern Maine Community College
> (207) 974-4782 | 354 Hogan Rd., Bangor, ME 04401
> 
>  
> 
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Bluestore Runaway Memory

2019-07-18 Thread Brett Kelly

Hello,

We have a Nautilus cluster exhibiting what looks like this bug: 
https://tracker.ceph.com/issues/39618


No matter what is set as the osd_memory_target (currently 2147483648),
each OSD process will surpass this value, peak around ~4.0GB, and then
eventually start using swap. The cluster stays stable for about a week and
then starts running into OOM issues, kills off OSDs, and requires a
reboot of each node to get back to a stable state.


Has anyone run into similar/workarounds ?

Ceph version: 14.2.1, RGW Clients

CentOS Linux release 7.6.1810 (Core)

Kernel: 3.10.0-957.12.1.el7.x86_64

256GB RAM per OSD node, 60 OSDs in each node.


Thanks,

--
Brett Kelly

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Bluestore Runaway Memory

2019-07-18 Thread Mark Nelson

Hi Brett,


Can you enable debug_bluestore = 5 and debug_prioritycache = 5 on one of 
the OSDs that's showing the behavior?  You'll want to look in the logs 
for lines that look like this:



2019-07-18T19:34:42.587-0400 7f4048b8d700  5 prioritycache tune_memory 
target: 4294967296 mapped: 4260962304 unmapped: 856948736 heap: 
5117911040 old mem: 2845415707 new mem: 2845415707
2019-07-18T19:34:33.527-0400 7f4048b8d700  5 
bluestore.MempoolThread(0x55a6d330ead0) _resize_shards cache_size: 
2845415707 kv_alloc: 1241513984 kv_used: 874833889 meta_alloc: 
1258291200 meta_used: 889040246 data_alloc: 318767104 data_used: 0


The first line will tell you what your memory target is set to, how much
memory is currently mapped, how much is unmapped (i.e. what's been freed
but the kernel hasn't reclaimed), the total heap size, and the old and
new aggregate size for all of bluestore's caches.  The second line also
tells you the aggregate cache size, and then how much space is being
allocated and used for the kv, meta, and data caches.  If there's a leak
somewhere in the OSD or bluestore, the autotuner will shrink the cache
way down but eventually won't be able to contain it, and your
process will start growing beyond the target size despite having a tiny
amount of bluestore cache.  If it's something else, like a huge amount of
freed memory not being reclaimed by the kernel, you'll see a large amount
of unmapped memory and a big heap size despite the mapped memory staying
near the target.  If it's a bug in the autotuner, we might see the
mapped memory greatly exceeding the target.
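
If it helps, the debug levels can be raised at runtime on a single OSD with
something like this (the OSD id is a placeholder):

ceph tell osd.12 injectargs '--debug_bluestore 5 --debug_prioritycache 5'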



Mark


On 7/18/19 4:02 PM, Brett Kelly wrote:


Hello,

We have a Nautilus cluster exhibiting what looks like this bug: 
https://tracker.ceph.com/issues/39618


No matter what is set as the osd_memory_target (currently 2147483648 
), each OSD process will surpass this value and peak around ~4.0GB 
then eventually start using swap. Cluster stays stable for about a 
week and then starts running into OOM issues, kills off OSDs and 
requires a reboot of each node to get back to a stable state.


Has anyone run into similar/workarounds ?

Ceph version: 14.2.1, RGW Clients

CentOS Linux release 7.6.1810 (Core)

Kernel: 3.10.0-957.12.1.el7.x86_64

256GB RAM per OSD node, 60 OSD's in each node.


Thanks,

--
Brett Kelly


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Changing the release cadence

2019-07-18 Thread Konstantin Shalygin

Arch Linux packager for Ceph here o/


I'll take this opportunity to raise something that is not about Ceph
packaging itself, but is Arch Linux + Ceph related.
With the current Arch Linux packaging it is impossible to build a "Samba CTDB
Cluster with CephFS backend". This is caused by missing build options;
tickets requesting this have been ignored for years: [1], [2]. AFAIK all
distros lack full RADOS support; the only exception is SUSE, because
most of the RADOS features come to Samba from SUSE employees. For CentOS 7
this is covered here [3].


Maybe Thore can raise this question among the Samba package maintainers.



[1] https://bugs.archlinux.org/task/53467
[2] https://bugs.archlinux.org/task/49356
[3] https://lists.samba.org/archive/samba/2019-July/224288.html

Thanks,
k
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph OSD daemon causes network card issues

2019-07-18 Thread Konstantin Shalygin

On 7/18/19 7:43 PM, Geoffrey Rhodes wrote:

Sure, also attached.


Try to disable flow control (`ethtool -A <iface> rx off tx off`) and rx/tx
checksum offloads (`ethtool -K <iface> rx off tx off`).



k


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com