[ceph-users] protocol feature mismatch after upgrading to Hammer

2015-04-09 Thread Kyle Hutson
I upgraded from giant to hammer yesterday and now 'ceph -w' is constantly
repeating this message:

2015-04-09 08:50:26.318042 7f95dbf86700  0 -- 10.5.38.1:0/2037478 >>
10.5.38.1:6789/0 pipe(0x7f95e00256e0 sd=3 :39489 s=1 pgs=0 cs=0 l=1
c=0x7f95e0023670).connect protocol feature mismatch, my 3fff < peer
13fff missing 1

It isn't always the same IP for the destination - here's another:
2015-04-09 08:50:20.322059 7f95dc087700  0 -- 10.5.38.1:0/2037478 >>
10.5.38.8:6789/0 pipe(0x7f95e00262f0 sd=3 :54047 s=1 pgs=0 cs=0 l=1
c=0x7f95e002b480).connect protocol feature mismatch, my 3fff < peer
13fff missing 1

Some details about our install:
We have 24 hosts with 18 OSDs each. 16 per host are spinning disks in an
erasure coded pool (k=8 m=4). 2 OSDs per host are SSD partitions used for a
caching tier in front of the EC pool. All 24 hosts are monitors. 4 hosts
are mds. We are running cephfs with a client trying to write data over
cephfs when we're seeing these messages.
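
(For anyone trying to reproduce the layout: it would have been built with
commands along these lines -- the pool names, PG counts, and profile name
are illustrative placeholders, not our real ones:)

ceph osd erasure-code-profile set ec-k8m4 k=8 m=4
ceph osd pool create ecpool 4096 4096 erasure ec-k8m4   # EC pool on the spinners
ceph osd pool create cachepool 1024                     # SSD pool for the cache tier
ceph osd tier add ecpool cachepool
ceph osd tier cache-mode cachepool writeback
ceph osd tier set-overlay ecpool cachepool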

Any ideas?


Re: [ceph-users] protocol feature mismatch after upgrading to Hammer

2015-04-09 Thread Gregory Farnum
Can you dump your crush map and post it on pastebin or something?
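
(That is, something like this -- output paths are arbitrary:)

ceph osd getcrushmap -o /tmp/crushmap             # fetch the compiled map
crushtool -d /tmp/crushmap -o /tmp/crushmap.txt   # decompile to readable text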

On Thu, Apr 9, 2015 at 7:26 AM, Kyle Hutson kylehut...@ksu.edu wrote:
 Nope - it's 64-bit.


Re: [ceph-users] protocol feature mismatch after upgrading to Hammer

2015-04-09 Thread Gregory Farnum
[Re-added the list]

Hmm, I'm checking the code and that shouldn't be possible. What's your
client? (In particular, is it 32-bit? That's the only thing I can
think of that might have slipped through our QA.)

On Thu, Apr 9, 2015 at 7:17 AM, Kyle Hutson kylehut...@ksu.edu wrote:
 I did nothing to enable anything else. Just changed my ceph repo from
 'giant' to 'hammer', then did 'yum update' and restarted services.
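
(i.e., roughly the following, assuming the stock ceph.repo layout; the
restart command depends on the init system in use:)

sed -i 's/giant/hammer/g' /etc/yum.repos.d/ceph.repo
yum -y update
/etc/init.d/ceph restart   # restart mon/osd/mds services on each host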

 On Thu, Apr 9, 2015 at 9:15 AM, Gregory Farnum g...@gregs42.com wrote:

 Did you enable the straw2 stuff? CRUSHV4 shouldn't be required by the
 cluster unless you made changes to the layout requiring it.

 If you did, the clients have to be upgraded to understand it. You
 could disable all the v4 features; that should let them connect again.
 -Greg
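
(A quick way to check whether the map actually picked up any v4
dependencies -- from memory, so the exact field names may vary by release:)

ceph osd crush show-tunables   # on hammer this should report straw2/v4 bucket usage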



Re: [ceph-users] protocol feature mismatch after upgrading to Hammer

2015-04-09 Thread Kyle Hutson
http://people.beocat.cis.ksu.edu/~kylehutson/crushmap

On Thu, Apr 9, 2015 at 11:25 AM, Gregory Farnum g...@gregs42.com wrote:

 Hmmm. That does look right and neither I nor Sage can come up with
 anything via code inspection. Can you post the actual binary crush map
 somewhere for download so that we can inspect it with our tools?
 -Greg
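
(i.e., pull the posted map down and decompile it with the stock tools:)

wget http://people.beocat.cis.ksu.edu/~kylehutson/crushmap
crushtool -d crushmap -o crushmap.txt   # decompile the binary map
grep 'alg ' crushmap.txt                # straw2 buckets here would explain a CRUSH_V4 requirement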

  On Thu, Apr 9, 2015 at 7:57 AM, Kyle Hutson kylehut...@ksu.edu wrote:
   Here 'tis:
   https://dpaste.de/POr1


Re: [ceph-users] protocol feature mismatch after upgrading to Hammer

2015-04-09 Thread Kyle Hutson
Here 'tis:
https://dpaste.de/POr1


On Thu, Apr 9, 2015 at 9:49 AM, Gregory Farnum g...@gregs42.com wrote:

 Can you dump your crush map and post it on pastebin or something?



Re: [ceph-users] protocol feature mismatch after upgrading to Hammer

2015-04-09 Thread Kyle Hutson
Nope - it's 64-bit.

(Sorry, I missed the reply-all last time.)

On Thu, Apr 9, 2015 at 9:24 AM, Gregory Farnum g...@gregs42.com wrote:

 [Re-added the list]

 Hmm, I'm checking the code and that shouldn't be possible. What's your
 client? (In particular, is it 32-bit? That's the only thing I can
 think of that might have slipped through our QA.)
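
(For the record, the quick check for the 32-bit question on the client box:)

uname -m   # prints x86_64 on a 64-bit kernel, i686/i386 on 32-bit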



Re: [ceph-users] protocol feature mismatch after upgrading to Hammer

2015-04-09 Thread Kyle Hutson
This particular problem I just figured out myself ('ceph -w' was still
running from before the upgrade, and ctrl-c and restarting solved that
issue), but I'm still having a similar problem on the ceph client:

libceph: mon19 10.5.38.20:6789 feature set mismatch, my 2b84a042aca <
server's 102b84a042aca, missing 1000000000000

It appears that even the latest kernel doesn't have support
for CEPH_FEATURE_CRUSH_V4
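
(The missing bit can be worked out from the two masks in the log line:)

printf '%#x\n' $(( 0x102b84a042aca & ~0x2b84a042aca ))   # -> 0x1000000000000
# that's 1 << 48, i.e. CEPH_FEATURE_CRUSH_V4 (straw2 buckets)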

How do I make my ceph cluster backward-compatible with the old cephfs
client?
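
(I'm guessing the answer is to rebuild the crush map with straw rather than
straw2 buckets, per Greg's earlier suggestion -- an untested sketch, file
names arbitrary:)

ceph osd getcrushmap -o /tmp/cm.bin
crushtool -d /tmp/cm.bin -o /tmp/cm.txt
sed -i 's/alg straw2/alg straw/' /tmp/cm.txt   # drop the v4-only bucket algorithm
crushtool -c /tmp/cm.txt -o /tmp/cm-straw.bin
ceph osd setcrushmap -i /tmp/cm-straw.bin      # beware: changing bucket algs can move data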

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com