On 11.06.2014 16:47, Alfredo Deza wrote:
On Wed, Jun 11, 2014 at 9:29 AM, Markus Goldberg
<goldb...@uni-hildesheim.de> wrote:
Hi,
ceph-deploy 1.5.3 can cause trouble if a reboot is done between the preparation
and the activation of an OSD:

The OSD disk was /dev/sdb at this time: the OSD data was to go to sdb1
(previously cleared, to be formatted as btrfs) and the journal to sdb2.
I prepared an OSD:

root@bd-a:/etc/ceph# ceph-deploy -v --overwrite-conf osd --fs-type btrfs
prepare bd-1:/dev/sdb1:/dev/sdb2
[ceph_deploy.conf][DEBUG ] found configuration file at:
/root/.cephdeploy.conf
[ceph_deploy.cli][INFO  ] Invoked (1.5.3): /usr/bin/ceph-deploy -v
--overwrite-conf osd --fs-type btrfs prepare bd-1:/dev/sdb1:/dev/sdb2
[ceph_deploy.osd][DEBUG ] Preparing cluster ceph disks
bd-1:/dev/sdb1:/dev/sdb2
[bd-1][DEBUG ] connected to host: bd-1
[bd-1][DEBUG ] detect platform information from remote host
[bd-1][DEBUG ] detect machine type
[ceph_deploy.osd][INFO  ] Distro info: Ubuntu 14.04 trusty
[ceph_deploy.osd][DEBUG ] Deploying osd to bd-1
[bd-1][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
[bd-1][INFO  ] Running command: udevadm trigger --subsystem-match=block
--action=add
[ceph_deploy.osd][DEBUG ] Preparing host bd-1 disk /dev/sdb1 journal
/dev/sdb2 activate False
[bd-1][INFO  ] Running command: ceph-disk-prepare --fs-type btrfs --cluster
ceph -- /dev/sdb1 /dev/sdb2
[bd-1][DEBUG ]
[bd-1][DEBUG ] WARNING! - Btrfs v3.12 IS EXPERIMENTAL
[bd-1][DEBUG ] WARNING! - see http://btrfs.wiki.kernel.org before using
[bd-1][DEBUG ]
[bd-1][DEBUG ] fs created label (null) on /dev/sdb1
[bd-1][DEBUG ]  nodesize 32768 leafsize 32768 sectorsize 4096 size 19.99TiB
[bd-1][DEBUG ] Btrfs v3.12
[bd-1][WARNIN] WARNING:ceph-disk:OSD will not be hot-swappable if journal is
not the same device as the osd data
[bd-1][WARNIN] Turning ON incompat feature 'extref': increased hardlink
limit per file to 65536
[bd-1][WARNIN] Error: Partition(s) 1 on /dev/sdb1 have been written, but we
have been unable to inform the kernel of the change, probably because
it/they are in use.  As a result, the old partition(s) will remain in use.
You should reboot now before making further changes.
[bd-1][INFO  ] checking OSD status...
[bd-1][INFO  ] Running command: ceph --cluster=ceph osd stat --format=json
[ceph_deploy.osd][DEBUG ] Host bd-1 is now ready for osd use.
Unhandled exception in thread started by
sys.excepthook is missing
lost sys.stderr

ceph-deploy told me to reboot, so I did.
This is actually not ceph-deploy asking you for a reboot but the
stderr captured from the
remote node (bd-1 in your case).

ceph-deploy logs output from remote nodes and prefaces those log lines with
the hostname of the node they came from. Remote stderr is logged at
WARNING level and stdout at DEBUG.

So in your case these lines are output from ceph-disk-prepare/btrfs:

[bd-1][WARNIN] Error: Partition(s) 1 on /dev/sdb1 have been written, but we
have been unable to inform the kernel of the change, probably because
it/they are in use.  As a result, the old partition(s) will remain in use.
You should reboot now before making further changes.
Have you tried 'create' instead of 'prepare' and 'activate'?
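With your arguments that would presumably look something like this (only a
sketch, I have not tested it with btrfs):

ceph-deploy -v --overwrite-conf osd --fs-type btrfs create bd-1:/dev/sdb1:/dev/sdb2

create runs the prepare and activate steps in one go, so there is no window
for a reboot in between.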
Hello Alfredo,
Yes, I have tried create before. It stopped after the prepare part because of this problem.
The two partitions were not in use; they were not mounted.
Other programs like gparted are able to inform the kernel, so a reboot is not necessary. ceph-deploy initiates the formatting of the OSD partition; it is of type btrfs after the prepare step.
The journal partition is still unattached at this moment.
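As far as I know the kernel can also be told to re-read the partition table
without a reboot, for example with one of these (assuming parted and
util-linux are installed on bd-1; I have not checked how this interacts with
ceph-disk):

partprobe /dev/sdb
partx --update /dev/sdb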

Bye,
  Markus

After the reboot the OSD disk had changed from sdb to sda. This is a known
problem with Linux (Ubuntu) device naming.
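One way to avoid this renaming might be to reference the partitions by a
persistent udev path instead of /dev/sdX, roughly like this (only a sketch;
I have not verified that ceph-deploy 1.5.3 accepts such paths, and <disk-id>
is a placeholder for the disk's stable identifier):

ls -l /dev/disk/by-id/
ceph-deploy -v osd activate bd-1:/dev/disk/by-id/<disk-id>-part1:/dev/disk/by-id/<disk-id>-part2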

root@bd-a:/etc/ceph# ceph-deploy -v osd activate bd-1:/dev/sda1:/dev/sda2
[ceph_deploy.conf][DEBUG ] found configuration file at:
/root/.cephdeploy.conf
[ceph_deploy.cli][INFO  ] Invoked (1.5.3): /usr/bin/ceph-deploy -v osd
activate bd-1:/dev/sda1:/dev/sda2
[ceph_deploy.osd][DEBUG ] Activating cluster ceph disks
bd-1:/dev/sda1:/dev/sda2
[bd-1][DEBUG ] connected to host: bd-1
[bd-1][DEBUG ] detect platform information from remote host
[bd-1][DEBUG ] detect machine type
[ceph_deploy.osd][INFO  ] Distro info: Ubuntu 14.04 trusty
[ceph_deploy.osd][DEBUG ] activating host bd-1 disk /dev/sda1
[ceph_deploy.osd][DEBUG ] will use init type: upstart
[bd-1][INFO  ] Running command: ceph-disk-activate --mark-init upstart
--mount /dev/sda1
[bd-1][WARNIN] got monmap epoch 1
[bd-1][WARNIN]  HDIO_DRIVE_CMD(identify) failed: Invalid argument
[bd-1][WARNIN] 2014-06-10 11:45:07.222697 7f5c111af800 -1 journal check:
ondisk fsid c8ce6ee2-f21b-4ba3-a20e-649224244b9a doesn't match expected
fcaaf66f-b7b7-4702-83a4-54832b7131fa, invalid (someone else's?) journal
[bd-1][WARNIN]  HDIO_DRIVE_CMD(identify) failed: Invalid argument
[bd-1][WARNIN]  HDIO_DRIVE_CMD(identify) failed: Invalid argument
[bd-1][WARNIN]  HDIO_DRIVE_CMD(identify) failed: Invalid argument
[bd-1][WARNIN] 2014-06-10 11:45:08.125384 7f5c111af800 -1
filestore(/var/lib/ceph/tmp/mnt.LryOxo) could not find
23c2fcde/osd_superblock/0//-1 in index: (2) No such file or directory
[bd-1][WARNIN] 2014-06-10 11:45:08.320327 7f5c111af800 -1 created object
store /var/lib/ceph/tmp/mnt.LryOxo journal
/var/lib/ceph/tmp/mnt.LryOxo/journal for osd.4 fsid
08066b4a-3f36-4e3f-bd1e-15c006a09057
[bd-1][WARNIN] 2014-06-10 11:45:08.320367 7f5c111af800 -1 auth: error
reading file: /var/lib/ceph/tmp/mnt.LryOxo/keyring: can't open
/var/lib/ceph/tmp/mnt.LryOxo/keyring: (2) No such file or directory
[bd-1][WARNIN] 2014-06-10 11:45:08.320419 7f5c111af800 -1 created new key in
keyring /var/lib/ceph/tmp/mnt.LryOxo/keyring
[bd-1][WARNIN] added key for osd.4
[bd-1][INFO  ] checking OSD status...
[bd-1][INFO  ] Running command: ceph --cluster=ceph osd stat --format=json
[bd-1][WARNIN] there are 2 OSDs down
[bd-1][WARNIN] there are 2 OSDs out
root@bd-a:/etc/ceph# ceph -s
     cluster 08066b4a-3f36-4e3f-bd1e-15c006a09057
      health HEALTH_WARN 679 pgs degraded; 992 pgs stuck unclean; recovery
19/60 objects degraded (31.667%); clock skew detected on mon.bd-1
      monmap e1: 3 mons at
{bd-0=xxx.xxx.xxx.20:6789/0,bd-1=xxx.xxx.xxx.21:6789/0,bd-2=xxx.xxx.xxx.22:6789/0},
election epoch 4034, quorum 0,1,2 bd-0,bd-1,bd-2
      mdsmap e2815: 1/1/1 up {0=bd-2=up:active}, 2 up:standby
      osdmap e1717: 6 osds: 4 up, 4 in
       pgmap v46008: 992 pgs, 11 pools, 544 kB data, 20 objects
             10324 MB used, 125 TB / 125 TB avail
             19/60 objects degraded (31.667%)
                    2 active
                  679 active+degraded
                  311 active+remapped
root@bd-a:/etc/ceph# ceph osd tree
# id    weight  type name       up/down reweight
-1      189.1   root default
-2      63.63           host bd-0
0       43.64                   osd.0   up      1
3       19.99                   osd.3   up      1
-3      63.63           host bd-1
1       43.64                   osd.1   down    0
4       19.99                   osd.4   down    0
-4      61.81           host bd-2
2       43.64                   osd.2   up      1
5       18.17                   osd.5   up      1
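One could presumably also check directly on bd-1 why osd.1 and osd.4 stay
down, for example (assuming the standard upstart job name and log locations
on Ubuntu 14.04):

status ceph-osd id=4
tail -n 50 /var/log/ceph/ceph-osd.4.log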

At this time I rebooted bd-1 once more and the OSD disk was /dev/sdb again.
So I tried once more to activate the OSD:


root@bd-a:/etc/ceph# ceph-deploy -v osd activate bd-1:/dev/sdb1:/dev/sdb2
[ceph_deploy.conf][DEBUG ] found configuration file at:
/root/.cephdeploy.conf
[ceph_deploy.cli][INFO  ] Invoked (1.5.3): /usr/bin/ceph-deploy -v osd
activate bd-1:/dev/sdb1:/dev/sdb2
[ceph_deploy.osd][DEBUG ] Activating cluster ceph disks
bd-1:/dev/sdb1:/dev/sdb2
[bd-1][DEBUG ] connected to host: bd-1
[bd-1][DEBUG ] detect platform information from remote host
[bd-1][DEBUG ] detect machine type
[ceph_deploy.osd][INFO  ] Distro info: Ubuntu 14.04 trusty
[ceph_deploy.osd][DEBUG ] activating host bd-1 disk /dev/sdb1
[ceph_deploy.osd][DEBUG ] will use init type: upstart
[bd-1][INFO  ] Running command: ceph-disk-activate --mark-init upstart
--mount /dev/sdb1
[bd-1][INFO  ] checking OSD status...
[bd-1][INFO  ] Running command: ceph --cluster=ceph osd stat --format=json
[bd-1][WARNIN] there are 2 OSDs down
[bd-1][WARNIN] there are 2 OSDs out
root@bd-a:/etc/ceph# ceph osd tree
# id    weight  type name       up/down reweight
-1      189.1   root default
-2      63.63           host bd-0
0       43.64                   osd.0   up      1
3       19.99                   osd.3   up      1
-3      63.63           host bd-1
1       43.64                   osd.1   down    0
4       19.99                   osd.4   down    0
-4      61.81           host bd-2
2       43.64                   osd.2   up      1
5       18.17                   osd.5   up      1
root@bd-a:/etc/ceph# ceph -s
     cluster 08066b4a-3f36-4e3f-bd1e-15c006a09057
      health HEALTH_WARN 679 pgs degraded; 992 pgs stuck unclean; recovery
10/60 objects degraded (16.667%); clock skew detected on mon.bd-1
      monmap e1: 3 mons at
{bd-0=xxx.xxx.xxx.20:6789/0,bd-1=xxx.xxx.xxx.21:6789/0,bd-2=xxx.xxx.xxx.22:6789/0},
election epoch 4060, quorum 0,1,2 bd-0,bd-1,bd-2
      mdsmap e2823: 1/1/1 up {0=bd-2=up:active}, 2 up:standby
      osdmap e1759: 6 osds: 4 up, 4 in
       pgmap v46110: 992 pgs, 11 pools, 544 kB data, 20 objects
             10320 MB used, 125 TB / 125 TB avail
             10/60 objects degraded (16.667%)
                  679 active+degraded
                  313 active+remapped
root@bd-a:/etc/ceph#

After another reboot everything was OK:

ceph -s
     cluster 08066b4a-3f36-4e3f-bd1e-15c006a09057
      health HEALTH_OK
      monmap e1: 3 mons at
{bd-0=xxx.xxx.xxx.20:6789/0,bd-1=xxx.xxx.xxx.21:6789/0,bd-2=xxx.xxx.xxx.22:6789/0},
election epoch 4220, quorum 0,1,2 bd-0,bd-1,bd-2
      mdsmap e2895: 1/1/1 up {0=bd-2=up:active}, 2 up:standby
      osdmap e1939: 6 osds: 6 up, 6 in
       pgmap v47099: 992 pgs, 11 pools, 551 kB data, 20 objects
             117 MB used, 189 TB / 189 TB avail
                  992 active+clean
root@bd-a:~#


Is it possible for the author of ceph-deploy to make the reboot unnecessary
between these two steps?
Then it would also be possible to use create instead of prepare+activate.

Thank you,
   Markus

--
Kind regards,
   Markus Goldberg

--------------------------------------------------------------------------
Markus Goldberg       Universität Hildesheim
                       Rechenzentrum
Tel +49 5121 88392822 Marienburger Platz 22, D-31141 Hildesheim, Germany
Fax +49 5121 88392823 email goldb...@uni-hildesheim.de
--------------------------------------------------------------------------

