Re: [ceph-users] Ceph 10.2.11 - Status not working
On 2018-12-17 20:16, Brad Hubbard wrote:
> On Tue, Dec 18, 2018 at 10:23 AM Mike O'Connor wrote:
>> Hi All
>>
>> I have a ceph cluster which has been working without issues for about
>> 2 years now; it was upgraded about 6 months ago to 10.2.11.
>>
>> [... log output and ceph.conf trimmed; quoted in full in the original
>> post below ...]
>>
>> Any ideas? More information?
>
> The system on which you are running the "ceph" client, blade3
> (10.1.5.203), is trying to contact monitors on 10.1.5.207 (blade7)
> port 6789 and 10.1.5.205 (blade5) port 6789. You need to check that
> the ceph-mon binary is running on blade7 and blade5, that they are
> listening on port 6789, and that that port is accessible from blade3.
>
> The simplest explanation is that the MONs are not running. The next
> simplest is that there is a firewall interfering with blade3's ability
> to connect to port 6789 on those machines. Check the above and see
> what you find.

Hi,

Following on from what Brad wrote, as for what could cause your MONs to
not be running...

Check the kernel logs / dmesg... bad blocks? (Unlikely to knock out both
MONs.)

Check disk space on /var/lib/ceph/mon/... Did it fill up? (Check both
blocks and inodes.)

You said it was running without issues... just to double-check: were ALL
your PGs healthy (i.e. active+clean)? MONs will not trim their logs if
any PG is not healthy, and newer versions of Ceph grow their logs just as
fast as the older versions.

Good luck!
Dyweni
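A sketch of those checks as shell commands, run on the monitor hosts. The
mon ID "1" below comes from the ceph.conf quoted in the original post
(blade3 is mon.1), so substitute the ID for whichever host you are on:

  # Kernel logs: any sign of failing disks?
  dmesg | grep -iE 'error|fail|bad'

  # Disk space under the mon store: check both blocks and inodes
  df -h /var/lib/ceph/mon
  df -i /var/lib/ceph/mon

  # If the local mon process is up, ask it directly over its admin socket
  ceph daemon mon.1 mon_status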
Re: [ceph-users] Ceph 10.2.11 - Status not working
On Tue, Dec 18, 2018 at 10:23 AM Mike O'Connor wrote:
>
> Hi All
>
> I have a ceph cluster which has been working without issues for about
> 2 years now; it was upgraded about 6 months ago to 10.2.11.
>
> root@blade3:/var/lib/ceph/mon# ceph status
> 2018-12-18 10:42:39.242217 7ff770471700 0 -- 10.1.5.203:0/1608630285 >>
> 10.1.5.207:6789/0 pipe(0x7ff768000c80 sd=4 :0 s=1 pgs=0 cs=0 l=1
> c=0x7ff768001f90).fault
>
> [... further .fault lines and ceph.conf trimmed; quoted in full in the
> original post below ...]
>
> All systems can ping each other. I simply cannot see why it's failing.
>
> Any ideas? More information?

The system on which you are running the "ceph" client, blade3
(10.1.5.203), is trying to contact monitors on 10.1.5.207 (blade7) port
6789 and 10.1.5.205 (blade5) port 6789. You need to check that the
ceph-mon binary is running on blade7 and blade5, that they are listening
on port 6789, and that that port is accessible from blade3.

The simplest explanation is that the MONs are not running. The next
simplest is that there is a firewall interfering with blade3's ability to
connect to port 6789 on those machines. Check the above and see what you
find.

--
Cheers,
Brad
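A sketch of these checks with standard Linux tooling (swap ss for
netstat -tlnp on older systems):

  # On blade5 and blade7: is ceph-mon running and listening on 6789?
  ps aux | grep '[c]eph-mon'
  ss -tlnp | grep 6789

  # From blade3: can we actually reach port 6789 on each mon host?
  nc -zv 10.1.5.205 6789
  nc -zv 10.1.5.207 6789

  # On the mon hosts: any firewall rules in the way?
  iptables -L -n -v | grep 6789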
[ceph-users] Ceph 10.2.11 - Status not working
Hi All

I have a ceph cluster which has been working without issues for about 2
years now; it was upgraded about 6 months ago to 10.2.11.

root@blade3:/var/lib/ceph/mon# ceph status
2018-12-18 10:42:39.242217 7ff770471700 0 -- 10.1.5.203:0/1608630285 >> 10.1.5.207:6789/0 pipe(0x7ff768000c80 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7ff768001f90).fault
2018-12-18 10:42:45.242745 7ff770471700 0 -- 10.1.5.203:0/1608630285 >> 10.1.5.207:6789/0 pipe(0x7ff7680051e0 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7ff768002410).fault
2018-12-18 10:42:51.243230 7ff770471700 0 -- 10.1.5.203:0/1608630285 >> 10.1.5.207:6789/0 pipe(0x7ff7680051e0 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7ff768002f40).fault
2018-12-18 10:42:54.243452 7ff770572700 0 -- 10.1.5.203:0/1608630285 >> 10.1.5.205:6789/0 pipe(0x7ff768000c80 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7ff768008060).fault
2018-12-18 10:42:57.243715 7ff770471700 0 -- 10.1.5.203:0/1608630285 >> 10.1.5.207:6789/0 pipe(0x7ff7680051e0 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7ff768003580).fault
2018-12-18 10:43:03.244280 7ff7781b9700 0 -- 10.1.5.203:0/1608630285 >> 10.1.5.205:6789/0 pipe(0x7ff7680051e0 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7ff768003670).fault

All systems can ping each other. I simply cannot see why it's failing.

ceph.conf:

[global]
auth client required = cephx
auth cluster required = cephx
auth service required = cephx
cluster network = 10.1.5.0/24
filestore xattr use omap = true
fsid = 42a0f015-76da-4f47-b506-da5cdacd030f
keyring = /etc/pve/priv/$cluster.$name.keyring
osd journal size = 5120
osd pool default min size = 1
public network = 10.1.5.0/24
mon_pg_warn_max_per_osd = 0

[client]
rbd cache = true

[osd]
keyring = /var/lib/ceph/osd/ceph-$id/keyring
osd max backfills = 1
osd recovery max active = 1
osd_disk_threads = 1
osd_disk_thread_ioprio_class = idle
osd_disk_thread_ioprio_priority = 7

[mon.2]
host = blade5
mon addr = 10.1.5.205:6789

[mon.1]
host = blade3
mon addr = 10.1.5.203:6789

[mon.3]
host = blade7
mon addr = 10.1.5.207:6789

[mon.0]
host = blade1
mon addr = 10.1.5.201:6789

[mds]
mds data = /var/lib/ceph/mds/mds.$id
keyring = /var/lib/ceph/mds/mds.$id/mds.$id.keyring

[mds.0]
host = blade1

[mds.1]
host = blade3

[mds.2]
host = blade5

[mds.3]
host = blade7

Any ideas? More information?

Mike
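For anyone hitting the same symptom: one way to narrow down which monitor
is unreachable is to target each mon individually with the ceph CLI's -m
flag (addresses taken from the ceph.conf above); a short sketch, assuming
the jewel-era CLI accepts config overrides on the command line:

  # Try each monitor in turn; a short timeout keeps failures quick
  ceph --connect-timeout 10 -m 10.1.5.201:6789 status
  ceph --connect-timeout 10 -m 10.1.5.203:6789 status
  ceph --connect-timeout 10 -m 10.1.5.205:6789 status
  ceph --connect-timeout 10 -m 10.1.5.207:6789 status

  # Raise messenger debugging to see connection details on stderr
  ceph --debug-ms 1 status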