[ceph-users] protocol feature mismatch after upgrading to Hammer
I upgraded from giant to hammer yesterday and now 'ceph -w' is constantly repeating this message:

2015-04-09 08:50:26.318042 7f95dbf86700 0 -- 10.5.38.1:0/2037478 10.5.38.1:6789/0 pipe(0x7f95e00256e0 sd=3 :39489 s=1 pgs=0 cs=0 l=1 c=0x7f95e0023670).connect protocol feature mismatch, my 3fff peer 13fff missing 1

It isn't always the same IP for the destination - here's another:

2015-04-09 08:50:20.322059 7f95dc087700 0 -- 10.5.38.1:0/2037478 10.5.38.8:6789/0 pipe(0x7f95e00262f0 sd=3 :54047 s=1 pgs=0 cs=0 l=1 c=0x7f95e002b480).connect protocol feature mismatch, my 3fff peer 13fff missing 1

Some details about our install:
We have 24 hosts with 18 OSDs each. 16 per host are spinning disks in an erasure coded pool (k=8 m=4). 2 OSDs per host are SSD partitions used for a caching tier in front of the EC pool. All 24 hosts are monitors. 4 hosts are mds. We are running cephfs with a client trying to write data over cephfs when we're seeing these messages.

Any ideas?

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] protocol feature mismatch after upgrading to Hammer
Can you dump your crush map and post it on pastebin or something?

On Thu, Apr 9, 2015 at 7:26 AM, Kyle Hutson kylehut...@ksu.edu wrote:
Nope - it's 64-bit. (Sorry, I missed the reply-all last time.)

On Thu, Apr 9, 2015 at 9:24 AM, Gregory Farnum g...@gregs42.com wrote:
[Re-added the list]
Hmm, I'm checking the code and that shouldn't be possible. What's your client? (In particular, is it 32-bit? That's the only thing I can think of that might have slipped through our QA.)

On Thu, Apr 9, 2015 at 7:17 AM, Kyle Hutson kylehut...@ksu.edu wrote:
I did nothing to enable anything else. Just changed my ceph repo from 'giant' to 'hammer', then did 'yum update' and restarted services.

On Thu, Apr 9, 2015 at 9:15 AM, Gregory Farnum g...@gregs42.com wrote:
Did you enable the straw2 stuff? CRUSHV4 shouldn't be required by the cluster unless you made changes to the layout requiring it. If you did, the clients have to be upgraded to understand it. You could disable all the v4 features; that should let them connect again.
-Greg

On Thu, Apr 9, 2015 at 7:07 AM, Kyle Hutson kylehut...@ksu.edu wrote:
This particular problem I just figured out myself ('ceph -w' was still running from before the upgrade, and ctrl-c and restarting solved that issue), but I'm still having a similar problem on the ceph client:

libceph: mon19 10.5.38.20:6789 feature set mismatch, my 2b84a042aca server's 102b84a042aca, missing 1

It appears that even the latest kernel doesn't have support for CEPH_FEATURE_CRUSH_V4.

How do I make my ceph cluster backward-compatible with the old cephfs client?
On Thu, Apr 9, 2015 at 8:58 AM, Kyle Hutson kylehut...@ksu.edu wrote:
I upgraded from giant to hammer yesterday and now 'ceph -w' is constantly repeating this message:

2015-04-09 08:50:26.318042 7f95dbf86700 0 -- 10.5.38.1:0/2037478 10.5.38.1:6789/0 pipe(0x7f95e00256e0 sd=3 :39489 s=1 pgs=0 cs=0 l=1 c=0x7f95e0023670).connect protocol feature mismatch, my 3fff peer 13fff missing 1

It isn't always the same IP for the destination - here's another:

2015-04-09 08:50:20.322059 7f95dc087700 0 -- 10.5.38.1:0/2037478 10.5.38.8:6789/0 pipe(0x7f95e00262f0 sd=3 :54047 s=1 pgs=0 cs=0 l=1 c=0x7f95e002b480).connect protocol feature mismatch, my 3fff peer 13fff missing 1

Some details about our install:
We have 24 hosts with 18 OSDs each. 16 per host are spinning disks in an erasure coded pool (k=8 m=4). 2 OSDs per host are SSD partitions used for a caching tier in front of the EC pool. All 24 hosts are monitors. 4 hosts are mds. We are running cephfs with a client trying to write data over cephfs when we're seeing these messages.

Any ideas?
Re: [ceph-users] protocol feature mismatch after upgrading to Hammer
[Re-added the list]

Hmm, I'm checking the code and that shouldn't be possible. What's your client? (In particular, is it 32-bit? That's the only thing I can think of that might have slipped through our QA.)
Re: [ceph-users] protocol feature mismatch after upgrading to Hammer
http://people.beocat.cis.ksu.edu/~kylehutson/crushmap

On Thu, Apr 9, 2015 at 11:25 AM, Gregory Farnum g...@gregs42.com wrote:
Hmmm. That does look right and neither I nor Sage can come up with anything via code inspection. Can you post the actual binary crush map somewhere for download so that we can inspect it with our tools?
-Greg
Re: [ceph-users] protocol feature mismatch after upgrading to Hammer
Here 'tis: https://dpaste.de/POr1
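Greg's straw2 question can be checked against a decompiled text CRUSH map, such as the dump posted in this message, by looking at the `alg` line of each bucket. Below is a rough Python sketch, assuming the usual decompiled layout in which each bucket block contains a line such as `alg straw`. The sample map, bucket names, and helper names (`bucket_algs`, `straw2_buckets`) are invented for illustration and are not taken from this cluster or from any Ceph tool.

```python
import re
from typing import Dict, List, Optional

# Hypothetical excerpt in decompiled CRUSH map syntax (NOT the poster's map).
SAMPLE_MAP = """\
host node1 {
    id -2
    alg straw
    hash 0
    item osd.0 weight 1.000
}
host node2 {
    id -3
    alg straw2
    hash 0
    item osd.1 weight 1.000
}
"""

# "<type> <name> {" opens a block; "alg <name>" names the bucket algorithm.
BLOCK_START = re.compile(r"^\s*(\w+)\s+(\S+)\s*\{\s*$")
ALG_LINE = re.compile(r"^\s*alg\s+(\w+)\s*$")


def bucket_algs(text: str) -> Dict[str, str]:
    """Map each bucket name to the algorithm given on its 'alg' line."""
    algs: Dict[str, str] = {}
    current: Optional[str] = None
    for line in text.splitlines():
        start = BLOCK_START.match(line)
        if start:
            current = start.group(2)
            continue
        alg = ALG_LINE.match(line)
        if alg and current is not None:
            algs[current] = alg.group(1)
        elif line.strip() == "}":
            current = None
    return algs


def straw2_buckets(text: str) -> List[str]:
    """Return the sorted names of buckets whose algorithm is straw2."""
    return sorted(n for n, a in bucket_algs(text).items() if a == "straw2")


print(straw2_buckets(SAMPLE_MAP))
```

An empty list would suggest that straw2 buckets are not what is making the cluster require CRUSH_V4. The sketch only inspects bucket algorithms, so it cannot rule out other parts of the map.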
Re: [ceph-users] protocol feature mismatch after upgrading to Hammer
Nope - it's 64-bit. (Sorry, I missed the reply-all last time.)
Re: [ceph-users] protocol feature mismatch after upgrading to Hammer
This particular problem I just figured out myself ('ceph -w' was still running from before the upgrade, and ctrl-c and restarting solved that issue), but I'm still having a similar problem on the ceph client:

libceph: mon19 10.5.38.20:6789 feature set mismatch, my 2b84a042aca server's 102b84a042aca, missing 1

It appears that even the latest kernel doesn't have support for CEPH_FEATURE_CRUSH_V4.

How do I make my ceph cluster backward-compatible with the old cephfs client?
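The two hex values in these mismatch messages are feature bitmasks, so the bit the client lacks can be worked out from the values themselves with a bitwise AND-NOT. Below is a minimal Python sketch that uses only the masks quoted in this thread. The helper names `missing_features` and `bit_positions` are made up for illustration and are not part of any Ceph tool.

```python
from typing import List


def missing_features(mine: int, peer: int) -> int:
    """Return the bits that are set in the peer's mask but not in ours."""
    return peer & ~mine


def bit_positions(mask: int) -> List[int]:
    """List the zero-based positions of the bits set in mask."""
    return [i for i in range(mask.bit_length()) if (mask >> i) & 1]


# Masks copied from the kernel client (libceph) message in this thread.
kernel_mine = 0x2b84a042aca
kernel_server = 0x102b84a042aca

# Masks copied from the 'ceph -w' messages in this thread.
cli_mine = 0x3fff
cli_peer = 0x13fff

kernel_missing = missing_features(kernel_mine, kernel_server)
cli_missing = missing_features(cli_mine, cli_peer)

print(hex(kernel_missing), bit_positions(kernel_missing))
print(hex(cli_missing), bit_positions(cli_missing))
```

For the kernel client this leaves a single bit, bit 48. That would match the reading that CEPH_FEATURE_CRUSH_V4 is what the client lacks, assuming the flag is defined as bit 48 in the Ceph feature header; that is worth confirming against the header for your release. The same calculation on the 'ceph -w' masks leaves bit 16.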