Re: [ceph-users] Ceph 10.2.11 - Status not working

2018-12-17 Thread Dyweni - Ceph-Users



On 2018-12-17 20:16, Brad Hubbard wrote:

On Tue, Dec 18, 2018 at 10:23 AM Mike O'Connor wrote:

Hi All

I have a ceph cluster which has been working without issues for about 2
years now; it was upgraded to 10.2.11 about 6 months ago.

root@blade3:/var/lib/ceph/mon# ceph status
2018-12-18 10:42:39.242217 7ff770471700  0 -- 10.1.5.203:0/1608630285 >>
10.1.5.207:6789/0 pipe(0x7ff768000c80 sd=4 :0 s=1 pgs=0 cs=0 l=1
c=0x7ff768001f90).fault
2018-12-18 10:42:45.242745 7ff770471700  0 -- 10.1.5.203:0/1608630285 >>
10.1.5.207:6789/0 pipe(0x7ff7680051e0 sd=3 :0 s=1 pgs=0 cs=0 l=1
c=0x7ff768002410).fault
2018-12-18 10:42:51.243230 7ff770471700  0 -- 10.1.5.203:0/1608630285 >>
10.1.5.207:6789/0 pipe(0x7ff7680051e0 sd=3 :0 s=1 pgs=0 cs=0 l=1
c=0x7ff768002f40).fault
2018-12-18 10:42:54.243452 7ff770572700  0 -- 10.1.5.203:0/1608630285 >>
10.1.5.205:6789/0 pipe(0x7ff768000c80 sd=4 :0 s=1 pgs=0 cs=0 l=1
c=0x7ff768008060).fault
2018-12-18 10:42:57.243715 7ff770471700  0 -- 10.1.5.203:0/1608630285 >>
10.1.5.207:6789/0 pipe(0x7ff7680051e0 sd=3 :0 s=1 pgs=0 cs=0 l=1
c=0x7ff768003580).fault
2018-12-18 10:43:03.244280 7ff7781b9700  0 -- 10.1.5.203:0/1608630285 >>
10.1.5.205:6789/0 pipe(0x7ff7680051e0 sd=3 :0 s=1 pgs=0 cs=0 l=1
c=0x7ff768003670).fault

All systems can ping each other. I simply cannot see why it's failing.


ceph.conf

[global]
 auth client required = cephx
 auth cluster required = cephx
 auth service required = cephx
 cluster network = 10.1.5.0/24
 filestore xattr use omap = true
 fsid = 42a0f015-76da-4f47-b506-da5cdacd030f
 keyring = /etc/pve/priv/$cluster.$name.keyring
 osd journal size = 5120
 osd pool default min size = 1
 public network = 10.1.5.0/24
 mon_pg_warn_max_per_osd = 0

[client]
 rbd cache = true
[osd]
 keyring = /var/lib/ceph/osd/ceph-$id/keyring
 osd max backfills = 1
 osd recovery max active = 1
 osd_disk_threads = 1
 osd_disk_thread_ioprio_class = idle
 osd_disk_thread_ioprio_priority = 7
[mon.2]
 host = blade5
 mon addr = 10.1.5.205:6789
[mon.1]
 host = blade3
 mon addr = 10.1.5.203:6789
[mon.3]
 host = blade7
 mon addr = 10.1.5.207:6789
[mon.0]
 host = blade1
 mon addr = 10.1.5.201:6789
[mds]
 mds data = /var/lib/ceph/mds/mds.$id
 keyring = /var/lib/ceph/mds/mds.$id/mds.$id.keyring
[mds.0]
 host = blade1
[mds.1]
 host = blade3
[mds.2]
 host = blade5
[mds.3]
 host = blade7


Any ideas? More information?


The system on which you are running the "ceph" client, blade3
(10.1.5.203), is trying to contact monitors on 10.1.5.207 (blade7) port
6789 and 10.1.5.205 (blade5) port 6789. You need to check that the
ceph-mon binary is running on blade7 and blade5, that they are
listening on port 6789, and that the port is reachable from blade3.
The simplest explanation is that the MONs are not running. The next
simplest is that there is a firewall interfering with blade3's ability
to connect to port 6789 on those machines. Check the above and see
what you find.



Hi,

Following on from what Brad wrote, here are a few things that could
cause your MONs to not be running...


Check kernel logs / dmesg... bad blocks?  (Unlikely to knock out both
MONs.)
Check disk space on /var/lib/ceph/mon/...  Did it fill up?  (Check both
blocks and inodes; see the example commands just below.)
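A rough sketch of those checks (assuming the default /var/lib/ceph
layout and standard GNU tools; adjust paths to your setup):

  dmesg | grep -i error        # quick scan for disk/I-O errors in the kernel log
  df -h /var/lib/ceph/mon      # free space (blocks)
  df -i /var/lib/ceph/mon      # free inodes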


You said it was running without issues... just to double check: were
ALL your PGs healthy (i.e. active+clean)?  MONs will not trim their
store/logs while any PG is unhealthy, so the MON data directory can
keep growing until the disk fills.  Newer versions of Ceph do not grow
their logs as fast as the older versions did.
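If you want to gauge how large the local MON store has grown, a rough
check (the store.db path below is the usual Jewel default; it may
differ on your install):

  du -sh /var/lib/ceph/mon/ceph-*/store.db   # size of each local MON's store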


Good luck!
Dyweni



Re: [ceph-users] Ceph 10.2.11 - Status not working

2018-12-17 Thread Brad Hubbard
On Tue, Dec 18, 2018 at 10:23 AM Mike O'Connor wrote:
>
> Hi All
>
> I have a ceph cluster which has been working without issues for about 2
> years now; it was upgraded to 10.2.11 about 6 months ago.
>
> root@blade3:/var/lib/ceph/mon# ceph status
> 2018-12-18 10:42:39.242217 7ff770471700  0 -- 10.1.5.203:0/1608630285 >>
> 10.1.5.207:6789/0 pipe(0x7ff768000c80 sd=4 :0 s=1 pgs=0 cs=0 l=1
> c=0x7ff768001f90).fault
> 2018-12-18 10:42:45.242745 7ff770471700  0 -- 10.1.5.203:0/1608630285 >>
> 10.1.5.207:6789/0 pipe(0x7ff7680051e0 sd=3 :0 s=1 pgs=0 cs=0 l=1
> c=0x7ff768002410).fault
> 2018-12-18 10:42:51.243230 7ff770471700  0 -- 10.1.5.203:0/1608630285 >>
> 10.1.5.207:6789/0 pipe(0x7ff7680051e0 sd=3 :0 s=1 pgs=0 cs=0 l=1
> c=0x7ff768002f40).fault
> 2018-12-18 10:42:54.243452 7ff770572700  0 -- 10.1.5.203:0/1608630285 >>
> 10.1.5.205:6789/0 pipe(0x7ff768000c80 sd=4 :0 s=1 pgs=0 cs=0 l=1
> c=0x7ff768008060).fault
> 2018-12-18 10:42:57.243715 7ff770471700  0 -- 10.1.5.203:0/1608630285 >>
> 10.1.5.207:6789/0 pipe(0x7ff7680051e0 sd=3 :0 s=1 pgs=0 cs=0 l=1
> c=0x7ff768003580).fault
> 2018-12-18 10:43:03.244280 7ff7781b9700  0 -- 10.1.5.203:0/1608630285 >>
> 10.1.5.205:6789/0 pipe(0x7ff7680051e0 sd=3 :0 s=1 pgs=0 cs=0 l=1
> c=0x7ff768003670).fault
>
> All systems can ping each other. I simply cannot see why it's failing.
>
>
> ceph.conf
>
> [global]
>  auth client required = cephx
>  auth cluster required = cephx
>  auth service required = cephx
>  cluster network = 10.1.5.0/24
>  filestore xattr use omap = true
>  fsid = 42a0f015-76da-4f47-b506-da5cdacd030f
>  keyring = /etc/pve/priv/$cluster.$name.keyring
>  osd journal size = 5120
>  osd pool default min size = 1
>  public network = 10.1.5.0/24
>  mon_pg_warn_max_per_osd = 0
>
> [client]
>  rbd cache = true
> [osd]
>  keyring = /var/lib/ceph/osd/ceph-$id/keyring
>  osd max backfills = 1
>  osd recovery max active = 1
>  osd_disk_threads = 1
>  osd_disk_thread_ioprio_class = idle
>  osd_disk_thread_ioprio_priority = 7
> [mon.2]
>  host = blade5
>  mon addr = 10.1.5.205:6789
> [mon.1]
>  host = blade3
>  mon addr = 10.1.5.203:6789
> [mon.3]
>  host = blade7
>  mon addr = 10.1.5.207:6789
> [mon.0]
>  host = blade1
>  mon addr = 10.1.5.201:6789
> [mds]
>  mds data = /var/lib/ceph/mds/mds.$id
>  keyring = /var/lib/ceph/mds/mds.$id/mds.$id.keyring
> [mds.0]
>  host = blade1
> [mds.1]
>  host = blade3
> [mds.2]
>  host = blade5
> [mds.3]
>  host = blade7
>
>
> Any ideas? More information?

The system on which you are running the "ceph" client, blade3
(10.1.5.203), is trying to contact monitors on 10.1.5.207 (blade7) port
6789 and 10.1.5.205 (blade5) port 6789. You need to check that the
ceph-mon binary is running on blade7 and blade5, that they are
listening on port 6789, and that the port is reachable from blade3.
The simplest explanation is that the MONs are not running. The next
simplest is that there is a firewall interfering with blade3's ability
to connect to port 6789 on those machines. Check the above, for
example with the commands below, and see what you find.
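
For example, a rough way to check (assuming pgrep, ss and nc are
available on these hosts; use whatever equivalents you have):

  # on blade7 and blade5: is the MON daemon up and listening?
  pgrep -a ceph-mon          # any ceph-mon process at all?
  ss -tlnp | grep 6789       # is ceph-mon bound to port 6789?

  # from blade3: is the MON port reachable, or is a firewall in the way?
  nc -zv 10.1.5.207 6789
  nc -zv 10.1.5.205 6789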

-- 
Cheers,
Brad


[ceph-users] Ceph 10.2.11 - Status not working

2018-12-17 Thread Mike O'Connor
Hi All

I have a ceph cluster which has been working without issues for about 2
years now; it was upgraded to 10.2.11 about 6 months ago.

root@blade3:/var/lib/ceph/mon# ceph status
2018-12-18 10:42:39.242217 7ff770471700  0 -- 10.1.5.203:0/1608630285 >>
10.1.5.207:6789/0 pipe(0x7ff768000c80 sd=4 :0 s=1 pgs=0 cs=0 l=1
c=0x7ff768001f90).fault
2018-12-18 10:42:45.242745 7ff770471700  0 -- 10.1.5.203:0/1608630285 >>
10.1.5.207:6789/0 pipe(0x7ff7680051e0 sd=3 :0 s=1 pgs=0 cs=0 l=1
c=0x7ff768002410).fault
2018-12-18 10:42:51.243230 7ff770471700  0 -- 10.1.5.203:0/1608630285 >>
10.1.5.207:6789/0 pipe(0x7ff7680051e0 sd=3 :0 s=1 pgs=0 cs=0 l=1
c=0x7ff768002f40).fault
2018-12-18 10:42:54.243452 7ff770572700  0 -- 10.1.5.203:0/1608630285 >>
10.1.5.205:6789/0 pipe(0x7ff768000c80 sd=4 :0 s=1 pgs=0 cs=0 l=1
c=0x7ff768008060).fault
2018-12-18 10:42:57.243715 7ff770471700  0 -- 10.1.5.203:0/1608630285 >>
10.1.5.207:6789/0 pipe(0x7ff7680051e0 sd=3 :0 s=1 pgs=0 cs=0 l=1
c=0x7ff768003580).fault
2018-12-18 10:43:03.244280 7ff7781b9700  0 -- 10.1.5.203:0/1608630285 >>
10.1.5.205:6789/0 pipe(0x7ff7680051e0 sd=3 :0 s=1 pgs=0 cs=0 l=1
c=0x7ff768003670).fault

All systems can ping each other. I simply cannot see why it's failing.


ceph.conf

[global]
     auth client required = cephx
     auth cluster required = cephx
     auth service required = cephx
     cluster network = 10.1.5.0/24
     filestore xattr use omap = true
     fsid = 42a0f015-76da-4f47-b506-da5cdacd030f
     keyring = /etc/pve/priv/$cluster.$name.keyring
     osd journal size = 5120
     osd pool default min size = 1
     public network = 10.1.5.0/24
     mon_pg_warn_max_per_osd = 0

[client]
     rbd cache = true
[osd]
     keyring = /var/lib/ceph/osd/ceph-$id/keyring
     osd max backfills = 1
     osd recovery max active = 1
     osd_disk_threads = 1
     osd_disk_thread_ioprio_class = idle
     osd_disk_thread_ioprio_priority = 7
[mon.2]
     host = blade5
     mon addr = 10.1.5.205:6789
[mon.1]
     host = blade3
     mon addr = 10.1.5.203:6789
[mon.3]
     host = blade7
     mon addr = 10.1.5.207:6789
[mon.0]
     host = blade1
     mon addr = 10.1.5.201:6789
[mds]
     mds data = /var/lib/ceph/mds/mds.$id
     keyring = /var/lib/ceph/mds/mds.$id/mds.$id.keyring
[mds.0]
     host = blade1
[mds.1]
     host = blade3
[mds.2]
     host = blade5
[mds.3]
     host = blade7


Any ideas? More information?


Mike

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com