[ceph-users] ceph osd activate error

2017-03-01 Thread gjprabu
Hi Team,



 


   We are installing a new Ceph setup (version Jewel) and while activating the OSDs it is 
throwing the error RuntimeError: Failed to execute command: /usr/sbin/ceph-disk 
-v activate --mark-init systemd --mount /home/data/osd1.  We tried to reinstall 
the OSD machine and still get the same error. Kindly let us know if there is any 
solution for this error.



root@cephadmin~/mycluster#ceph-deploy osd activate cephnode1:/home/data/osd1 
cephnode2:/home/data/osd2 cephnode3:/home/data/osd3

[ceph_deploy.conf][DEBUG ] found configuration file at: /root/.cephdeploy.conf

[ceph_deploy.cli][INFO  ] Invoked (1.5.37): /usr/bin/ceph-deploy osd activate 
cephnode1:/home/data/osd1 cephnode2:/home/data/osd2 cephnode3:/home/data/osd3

[ceph_deploy.cli][INFO  ] ceph-deploy options:

[ceph_deploy.cli][INFO  ]  username  : None

[ceph_deploy.cli][INFO  ]  verbose   : False

[ceph_deploy.cli][INFO  ]  overwrite_conf: False

[ceph_deploy.cli][INFO  ]  subcommand: activate

[ceph_deploy.cli][INFO  ]  quiet : False

[ceph_deploy.cli][INFO  ]  cd_conf   : 


[ceph_deploy.cli][INFO  ]  cluster   : ceph

[ceph_deploy.cli][INFO  ]  func  : 

[ceph_deploy.cli][INFO  ]  ceph_conf : None

[ceph_deploy.cli][INFO  ]  default_release   : False

[ceph_deploy.cli][INFO  ]  disk  : [('cephnode1', 
'/home/data/osd1', None), ('cephnode2', '/home/data/osd2', None), ('cephnode3', 
'/home/data/osd3', None)]

[ceph_deploy.osd][DEBUG ] Activating cluster ceph disks 
cephnode1:/home/data/osd1: cephnode2:/home/data/osd2: cephnode3:/home/data/osd3:

[cephnode1][DEBUG ] connected to host: cephnode1

[cephnode1][DEBUG ] detect platform information from remote host

[cephnode1][DEBUG ] detect machine type

[cephnode1][DEBUG ] find the location of an executable

[ceph_deploy.osd][INFO  ] Distro info: CentOS Linux 7.3.1611 Core

[ceph_deploy.osd][DEBUG ] activating host cephnode1 disk /home/data/osd1

[ceph_deploy.osd][DEBUG ] will use init type: systemd

[cephnode1][DEBUG ] find the location of an executable

[cephnode1][INFO  ] Running command: /usr/sbin/ceph-disk -v activate 
--mark-init systemd --mount /home/data/osd1

[cephnode1][WARNIN] main_activate: path = /home/data/osd1

[cephnode1][WARNIN] activate: Cluster uuid is 
228e2b14-a6f2-4a46-b99e-673e3cd6774f

[cephnode1][WARNIN] command: Running command: /usr/bin/ceph-osd --cluster=ceph 
--show-config-value=fsid

[cephnode1][WARNIN] activate: Cluster name is ceph

[cephnode1][WARNIN] activate: OSD uuid is 147347cb-cc6b-400d-9a72-abae8cc75207

[cephnode1][WARNIN] allocate_osd_id: Allocating OSD id...

[cephnode1][WARNIN] command: Running command: /usr/bin/ceph --cluster ceph 
--name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring 
osd create --concise 147347cb-cc6b-400d-9a72-abae8cc75207

[cephnode1][WARNIN] command: Running command: /usr/sbin/restorecon -R 
/home/data/osd1/whoami.3203.tmp

[cephnode1][WARNIN] command: Running command: /usr/bin/chown -R ceph:ceph 
/home/data/osd1/whoami.3203.tmp

[cephnode1][WARNIN] activate: OSD id is 0

[cephnode1][WARNIN] activate: Initializing OSD...

[cephnode1][WARNIN] command_check_call: Running command: /usr/bin/ceph 
--cluster ceph --name client.bootstrap-osd --keyring 
/var/lib/ceph/bootstrap-osd/ceph.keyring mon getmap -o 
/home/data/osd1/activate.monmap

[cephnode1][WARNIN] got monmap epoch 1

[cephnode1][WARNIN] command: Running command: /usr/bin/timeout 300 ceph-osd 
--cluster c

Re: [ceph-users] How to hide internal ip on ceph mount

2017-03-01 Thread gjprabu
Hi,

    We tried to use the host name instead of the IP address, but the mounted 
partition shows only the IP address. How do we show the host name instead of the IP address?

 On Wed, 01 Mar 2017 07:43:17 +0530  superdebu...@gmail.com wrote 

We did try to use DNS to hide the IPs and achieve a kind of HA, but failed.

mount.ceph will resolve whatever you provide to an IP address and pass it to 
the kernel.
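For illustration, this is roughly what that looks like in practice (the host 
name, mount point and secret file below are hypothetical):

  # mount.ceph resolves the name to monitor IP(s) before calling mount(2)
  mount -t ceph mon1.example.com:6789:/ /mnt/cephfs -o name=admin,secretfile=/etc/ceph/admin.secret
  # the mounted filesystem is then reported with the resolved IPs, not the name
  df -h /mnt/cephfs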

2017-02-28 16:14 GMT+08:00 Robert Sander :
On 28.02.2017 07:19, gjprabu wrote:

>              How to hide internal ip address on cephfs mounting. Due to
> security reason we need to hide ip address. Also we are running docker
> container in the base machine and which will shown the partition details
> over there. Kindly let us know is there any solution for this.
>
> 192.168.xxx.xxx:6789,192.168.xxx.xxx:6789,192.168.xxx.xxx:6789:/
> ceph      6.4T  2.0T  4.5T  31% /home/

If this is needed as a "security measure" you should not mount CephFS on
this host in the first place.

Only mount CephFS on hosts you trust (especially the root user) as the
Filesystem uses the local accounts for access control.

Regards
--
Robert Sander
Heinlein Support GmbH
Schwedter Str. 8/9b, 10119 Berlin

http://www.heinlein-support.de

Tel: 030 / 405051-43
Fax: 030 / 405051-19

Zwangsangaben lt. §35a GmbHG:
HRB 93818 B / Amtsgericht Berlin-Charlottenburg,
Geschäftsführer: Peer Heinlein -- Sitz: Berlin


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to hide internal ip on ceph mount

2017-03-01 Thread gjprabu
Hi Robert,

   As per my understanding, whichever partition the base machine has will be 
replicated from the base machine to the docker container. My only concern is 
how to show the DNS name instead of the IP.

Regards
Prabu 

 On Tue, 28 Feb 2017 13:44:30 +0530 r.san...@heinlein-support.de wrote 

On 28.02.2017 07:19, gjprabu wrote: 

> How to hide internal ip address on cephfs mounting. Due to 
> security reason we need to hide ip address. Also we are running docker 
> container in the base machine and which will shown the partition details 
> over there. Kindly let us know is there any solution for this. 
> 
> 192.168.xxx.xxx:6789,192.168.xxx.xxx:6789,192.168.xxx.xxx:6789:/ 
> ceph 6.4T 2.0T 4.5T 31% /home/ 

If this is needed as a "security measure" you should not mount CephFS on 
this host in the first place. 

Only mount CephFS on hosts you trust (especially the root user) as the 
Filesystem uses the local accounts for access control. 

Regards 
-- 
Robert Sander 
Heinlein Support GmbH 
Schwedter Str. 8/9b, 10119 Berlin 

http://www.heinlein-support.de 

Tel: 030 / 405051-43 
Fax: 030 / 405051-19 

Zwangsangaben lt. §35a GmbHG: 
HRB 93818 B / Amtsgericht Berlin-Charlottenburg, 
Geschäftsführer: Peer Heinlein -- Sitz: Berlin 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RADOS as a simple object storage

2017-03-01 Thread Jan Kasprzak
Wido den Hollander wrote:
: 
: > On 27 February 2017 at 15:59 Jan Kasprzak  wrote:
: > : > : > Here is some statistics from our biggest instance of the object 
storage:
: > : > : >
: > : > : > objects stored: 100_000_000
: > : > : > < 1024 bytes:10_000_000
: > : > : > 1k-64k bytes:80_000_000
: > : > : > 64k-4M bytes:10_000_000
: > : > : > 4M-256M bytes:1_000_000
: > : > : >> 256M bytes:10_000
: > : > : > biggest object:   15 GBytes
: > : > : >
: > : > : > Would it be feasible to put 100M to 1G objects as a native RADOS 
objects
: > : > : > into a single pool?
[...]
: > 
https://github.com/ceph/ceph/blob/master/src/libradosstriper/RadosStriperImpl.cc#L33
: > 
: > If I understand it correctly, it looks like libradosstriper only splits
: > large stored objects into smaller pieces (RADOS objects), but does not
: > consolidate more small stored objects into larger RADOS objects.
: 
: Why would you want to do that? Yes, very small objects can be a problem if 
you have millions of them since it takes a bit more to replicate them and 
recover them.

	Yes. This is what I was afraid of. The immutability of my objects
would allow consolidating smaller objects into larger bundles, but
if you say that is not necessary for a problem of my size, I'll store them as
individual RADOS objects.
: 
: But overall I wouldn't bother about it too much.

OK, thanks!

: > So do you think I am ok with >10M tiny objects (smaller than 1KB)
: > and ~100,000,000 to 1,000,000,000 total objects, provided that I split
: > huge objects using libradosstriper?

-Yenya

-- 
| Jan "Yenya" Kasprzak  |
| http://www.fi.muni.cz/~kas/ GPG: 4096R/A45477D5 |
"Assuming that OpenSSL is written as carefully as Wietse's own code,
every 1000 lines introduce one additional bug into Postfix."   --TLS_README
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Separate Network (RDB, RGW) and CephFS

2017-03-01 Thread Jimmy Goffaux
 

Hello,

I have a Ceph cluster (10.2.5-1trusty) and I use it in several ways:

- Block

- Object

- CephFS


root@ih-par1-cld1-ceph-01:~# cat /etc/ceph/ceph.conf
[]
mon_host = 10.4.0.1, 10.4.0.3, 10.4.0.5
[]
public_network = 10.4.0.0/24
cluster_network = 192.168.33.0/24
[]

I have dedicated servers for the Block and Object storage, and other servers 
for CephFS (full SSD):

root@ih-par1-cld1-ceph-01:~# ceph osd tree
ID  WEIGHT    TYPE NAME                     UP/DOWN REWEIGHT PRIMARY-AFFINITY
 -6   2.79593 root ssdforcephfs
 -7   0.46599     host ih-prd-cephfs-02
 32   0.23299         osd.32                     up      1.0              1.0
 33   0.23299         osd.33                     up      1.0              1.0
 -8   0.46599     host ih-prd-cephfs-03
 34   0.23299         osd.34                     up      1.0              1.0
 35   0.23299         osd.35                     up      1.0              1.0
 -9   0.46599     host ih-prd-cephfs-05
 36   0.23299         osd.36                     up      1.0              1.0
 37   0.23299         osd.37                     up      1.0              1.0
-10   0.46599     host ih-prd-cephfs-01
 38   0.23299         osd.38                     up      1.0              1.0
 39   0.23299         osd.39                     up      1.0              1.0
-11   0.46599     host ih-prd-cephfs-04
 40   0.23299         osd.40                     up      1.0              1.0
 41   0.23299         osd.41                     up      1.0              1.0
-12   0.46599     host ih-prd-cephfs-07
 42   0.23299         osd.42                     up      1.0              1.0
 43   0.23299         osd.43                     up      1.0              1.0
 -1 116.47998 root default
 -2  43.67999     host ih-par1-cld1-ceph-01
  0   3.64000         osd.0                      up      1.0              1.0
  2   3.64000         osd.2                      up      1.0              1.0
  6   3.64000         osd.6                      up      1.0              1.0
  8   3.64000         osd.8                      up      1.0              1.0
 15   3.64000         osd.15                     up      1.0              1.0
 16   3.64000         osd.16                     up      1.0              1.0
 19   3.64000         osd.19                     up      1.0              1.0
 22   3.64000         osd.22                     up      1.0              1.0
 24   3.64000         osd.24                     up      1.0              1.0
 26   3.64000         osd.26                     up      1.0              1.0
 28   3.64000         osd.28                     up      1.0              1.0
 30   3.64000         osd.30                     up      1.0              1.0
 -3  43.67999     host ih-par1-cld1-ceph-03
  1   3.64000         osd.1                      up      1.0              1.0
  3   3.64000         osd.3                      up      1.0              1.0
  5   3.64000         osd.5                      up      1.0              1.0
  7   3.64000         osd.7                      up      1.0              1.0
 13   3.64000         osd.13                     up      1.0              1.0
  4   3.64000         osd.4                      up      1.0              1.0
 20   3.64000         osd.20                     up      1.0              1.0
 23   3.64000         osd.23                     up      1.0              1.0
 25   3.64000         osd.25                     up      1.0              1.0
 27   3.64000         osd.27                     up      1.0              1.0
 29   3.64000         osd.29                     up      1.0              1.0
 31   3.64000         osd.31                     up      1.0              1.0
 -5  29.12000     host ih-par1-cld1-ceph-05
  9   3.64000         osd.9                      up      1.0              1.0
 10   3.64000         osd.10                     up      1.0              1.0
 11   3.64000         osd.11                     up      1.0              1.0
 12   3.64000         osd.12                     up      1.0              1.0
 14   3.64000         osd.14                     up      1.0              1.0
 17   3.64000         osd.17                     up      1.0              1.0
 18   3.64000         osd.18                     up      1.0              1.0
 21   3.64000         osd.21                     up      1.0              1.0

I use OpenNebula with RBD on the public network: 10.4.0.0/16.

I would like to separate the RBD, RGW and CephFS networks... At the moment my 
CephFS customers can reach the whole RBD network and the OpenNebula hypervisors.

Example:

Customer A (CephFS, path: /client1) => currently reaches the whole 10.4.0.0/16 network
Customer B (CephFS, path: /client2) => currently reaches the whole 10.4.0.0/16 network

How is it possible to separate the RBD and RGW networks and have multiple 
access networks for CephFS?

I hope I have been clear :/

Thank you
 ___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to hide internal ip on ceph mount

2017-03-01 Thread Robert Sander
On 01.03.2017 10:54, gjprabu wrote:
> Hi,
> 
> We try to use host name instead of ip address but mounted partion
> showing up address only . How show the host name instead of ip address.

What is the security gain you try to achieve by hiding the IPs?

Regards
-- 
Robert Sander
Heinlein Support GmbH
Schwedter Str. 8/9b, 10119 Berlin

http://www.heinlein-support.de

Tel: 030 / 405051-43
Fax: 030 / 405051-19

Zwangsangaben lt. §35a GmbHG:
HRB 93818 B / Amtsgericht Berlin-Charlottenburg,
Geschäftsführer: Peer Heinlein -- Sitz: Berlin



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Separate Network (RDB, RGW) and CephFS

2017-03-01 Thread John Spray
On Wed, Mar 1, 2017 at 10:15 AM, Jimmy Goffaux  wrote:
>
>
> Hello,
>
> I have a cluster CEPH (10.2.5-1trusty) I use the various possibilities:
>
> -Block
>
> - Object
>
> - CephFS
>
>
>
>
>
> root@ih-par1-cld1-ceph-01:~# cat /etc/ceph/ceph.conf
> []
> mon_host = 10.4.0.1, 10.4.0.3, 10.4.0.5
> []
> public_network = 10.4.0.0/24
> cluster_network = 192.168.33.0/24
> []
>
> I have dedicated servers for the storage Block, Object and other servers for
> CephFS (Full SSD):
>
>
>
>
> root@ih-par1-cld1-ceph-01:~# ceph osd tree
> ID  WEIGHTTYPE NAME UP/DOWN REWEIGHT
> PRIMARY-AFFINITY
>  -6   2.79593 root ssdforcephfs
>  -7   0.46599 host ih-prd-cephfs-02
>  32   0.23299 osd.32 up  1.0
> 1.0
>  33   0.23299 osd.33 up  1.0
> 1.0
>  -8   0.46599 host ih-prd-cephfs-03
>  34   0.23299 osd.34 up  1.0
> 1.0
>  35   0.23299 osd.35 up  1.0
> 1.0
>  -9   0.46599 host ih-prd-cephfs-05
>  36   0.23299 osd.36 up  1.0
> 1.0
>  37   0.23299 osd.37 up  1.0
> 1.0
> -10   0.46599 host ih-prd-cephfs-01
>  38   0.23299 osd.38 up  1.0
> 1.0
>  39   0.23299 osd.39 up  1.0
> 1.0
> -11   0.46599 host ih-prd-cephfs-04
>  40   0.23299 osd.40 up  1.0
> 1.0
>  41   0.23299 osd.41 up  1.0
> 1.0
> -12   0.46599 host ih-prd-cephfs-07
>  42   0.23299 osd.42 up  1.0
> 1.0
>  43   0.23299 osd.43 up  1.0
> 1.0
>  -1 116.47998 root default
>  -2  43.67999 host ih-par1-cld1-ceph-01
>   0   3.64000 osd.0  up  1.0
> 1.0
>   2   3.64000 osd.2  up  1.0
> 1.0
>   6   3.64000 osd.6  up  1.0
> 1.0
>   8   3.64000 osd.8  up  1.0
> 1.0
>  15   3.64000 osd.15 up  1.0
> 1.0
>  16   3.64000 osd.16 up  1.0
> 1.0
>  19   3.64000 osd.19 up  1.0
> 1.0
>  22   3.64000 osd.22 up  1.0
> 1.0
>  24   3.64000 osd.24 up  1.0
> 1.0
>  26   3.64000 osd.26 up  1.0
> 1.0
>  28   3.64000 osd.28 up  1.0
> 1.0
>  30   3.64000 osd.30 up  1.0
> 1.0
>  -3  43.67999 host ih-par1-cld1-ceph-03
>   1   3.64000 osd.1  up  1.0
> 1.0
>   3   3.64000 osd.3  up  1.0
> 1.0
>   5   3.64000 osd.5  up  1.0
> 1.0
>   7   3.64000 osd.7  up  1.0
> 1.0
>  13   3.64000 osd.13 up  1.0
> 1.0
>   4   3.64000 osd.4  up  1.0
> 1.0
>  20   3.64000 osd.20 up  1.0
> 1.0
>  23   3.64000 osd.23 up  1.0
> 1.0
>  25   3.64000 osd.25 up  1.0
> 1.0
>  27   3.64000 osd.27 up  1.0
> 1.0
>  29   3.64000 osd.29 up  1.0
> 1.0
>  31   3.64000 osd.31 up  1.0
> 1.0
>  -5  29.12000 host ih-par1-cld1-ceph-05
>   9   3.64000 osd.9  up  1.0
> 1.0
>  10   3.64000 osd.10 up  1.0
> 1.0
>  11   3.64000 osd.11 up  1.0
> 1.0
>  12   3.64000 osd.12 up  1.0
> 1.0
>  14   3.64000 osd.14 up  1.0
> 1.0
>  17   3.64000 osd.17 up  1.0
> 1.0
>  18   3.64000 osd.18 up  1.0
> 1.0
>  21   3.64000 osd.21 up  1.0
> 1.0
>
>
>
>
>
> I use OpenNebula for the use of RDB on the public network: 10.4.0.0/16.
>
> I shall like separating the RDB, RGW network and CephFS... I have my
> customers CephFS who can accèder to all the network RBD, the hypervisors
> OpenNebula
>
> Example:
>
> Customer A (CephFS, path: /client1) = > Reaches at present all the network
> 10.4.0.0/16
> Customer B (CephFS, path: /client2) = > Reaches at present all the network
> 10.4.0.0/16
>
> How is it possible to separate networks: RBD, RGW and have multiple access
> networks for CephFS?

CephFS clients talk directly to OSDs, like RBD clients -- if you want
to avoid giving your CephFS clients access to your Ceph public network
then the simplest way to acco

Re: [ceph-users] How to hide internal ip on ceph mount

2017-03-01 Thread gjprabu
Hi Robert,


  This container host will be provided to end users and we don't want to expose 
this IP to them.



Regards

Prabu GJ




 On Wed, 01 Mar 2017 16:03:49 +0530 Robert Sander 
 wrote 




On 01.03.2017 10:54, gjprabu wrote:

> Hi,
>
> We try to use host name instead of ip address but mounted partion
> showing up address only . How show the host name instead of ip address.

What is the security gain you try to achieve by hiding the IPs?

Regards
--
Robert Sander
Heinlein Support GmbH
Schwedter Str. 8/9b, 10119 Berlin

http://www.heinlein-support.de

Tel: 030 / 405051-43
Fax: 030 / 405051-19

Zwangsangaben lt. §35a GmbHG:
HRB 93818 B / Amtsgericht Berlin-Charlottenburg,
Geschäftsführer: Peer Heinlein -- Sitz: Berlin

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com






___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Antw: Safely Upgrading OS on a live Ceph Cluster

2017-03-01 Thread Heller, Chris
In my case the version will be identical. But I might have to do this node-by-node 
approach if I can't stabilize the more general shutdown/bring-up approach. 
There are 192 OSDs in my cluster, so it will take a while to go node by node, 
unfortunately.

-Chris
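
For reference, the node-by-node approach Steffen outlines below could be 
sketched roughly as follows (the OSD id, weight and upgrade step are only 
placeholders, not part of the original mail):

  # drain every OSD on the node being reinstalled
  ceph osd crush reweight osd.12 0      # repeat for each osd.N on that node
  ceph -s                               # wait for rebalancing to finish (active+clean)
  # ... reinstall the OS, pin the wanted Ceph version, verify a clean reboot ...
  ceph osd crush reweight osd.12 3.64   # restore the original CRUSH weight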

> On Mar 1, 2017, at 2:50 AM, Steffen Weißgerber  wrote:
> 
> Hello,
> 
> some time ago I upgraded our 6 node cluster (0.94.9) running on Ubuntu from 
> Trusty
> to Xenial.
> 
> The problem here was that with the OS update Ceph was also upgraded, which we 
> did not want in the same step, because then we would have had to upgrade all 
> nodes at the same time.
> 
> Therefore we did it node by node, first freeing the OSDs on the node by 
> setting their weight to 0.
> 
> After the OS update, after configuring the right Ceph version for our setup 
> and testing the reboot so that all components start up correctly, we set the 
> OSD weights back to their normal values so that the cluster rebalanced.
> 
> With this procedure the cluster was always up.
> 
> Regards
> 
> Steffen
> 
> 
 "Heller, Chris"  schrieb am Montag, 27. Februar 2017 um
> 18:01:
>> I am attempting an operating system upgrade of a live Ceph cluster. Before I 
>> go an screw up my production system, I have been testing on a smaller 
>> installation, and I keep running into issues when bringing the Ceph FS 
>> metadata server online.
>> 
>> My approach here has been to store all Ceph critical files on non-root 
>> partitions, so the OS install can safely proceed without overwriting any of 
>> the Ceph configuration or data.
>> 
>> Here is how I proceed:
>> 
>> First I bring down the Ceph FS via `ceph mds cluster_down`.
>> Second, to prevent OSDs from trying to repair data, I run `ceph osd set 
>> noout`
>> Finally I stop the ceph processes in the following order: ceph-mds, 
>> ceph-mon, 
>> ceph-osd
>> 
>> Note my cluster has 1 mds and 1 mon, and 7 osd.
>> 
>> I then install the new OS and then bring the cluster back up by walking the 
>> steps in reverse:
>> 
>> First I start the ceph processes in the following order: ceph-osd, ceph-mon, 
>> ceph-mds
>> Second I restore OSD functionality with `ceph osd unset noout`
>> Finally I bring up the Ceph FS via `ceph mds cluster_up`
>> 
>> Everything works smoothly except the Ceph FS bring up. The MDS starts in the 
>> active:replay state and eventually crashes with the following backtrace:
>> 
>> starting mds.cuba at :/0
>> 2017-02-27 16:56:08.233680 7f31daa3b7c0 -1 mds.-1.0 log_to_monitors 
>> {default=true}
>> 2017-02-27 16:56:08.537714 7f31d30df700 -1 mds.0.sessionmap _load_finish got 
>> (2) No such file or directory
>> mds/SessionMap.cc : In function 'void 
>> SessionMap::_load_finish(int, ceph::bufferlist&)' thread 7f31d30df700 time 
>> 2017-02-27 16:56:08.537739
>> mds/SessionMap.cc : 98: FAILED assert(0 == "failed to 
>> load sessionmap")
>> ceph version 0.94.7 (d56bdf93ced6b80b07397d57e3fa68fe68304432)
>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char 
>> const*)+0x8b) [0x98bb4b]
>> 2: (SessionMap::_load_finish(int, ceph::buffer::list&)+0x2b4) [0x7df2a4]
>> 3: (MDSIOContextBase::complete(int)+0x95) [0x7e34b5]
>> 4: (Finisher::finisher_thread_entry()+0x190) [0x8bd6d0]
>> 5: (()+0x8192) [0x7f31d9c8f192]
>> 6: (clone()+0x6d) [0x7f31d919c51d]
>> NOTE: a copy of the executable, or `objdump -rdS ` is needed to 
>> interpret this.
>> 2017-02-27 16:56:08.538493 7f31d30df700 -1 mds/SessionMap.cc 
>> : In function 'void SessionMap::_load_finish(int, 
>> ceph::bufferlist&)' thread 7f31d30df700 time 2017-02-27 16:56:08.537739
>> mds/SessionMap.cc : 98: FAILED assert(0 == "failed to 
>> load sessionmap")
>> 
>> ceph version 0.94.7 (d56bdf93ced6b80b07397d57e3fa68fe68304432)
>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char 
>> const*)+0x8b) [0x98bb4b]
>> 2: (SessionMap::_load_finish(int, ceph::buffer::list&)+0x2b4) [0x7df2a4]
>> 3: (MDSIOContextBase::complete(int)+0x95) [0x7e34b5]
>> 4: (Finisher::finisher_thread_entry()+0x190) [0x8bd6d0]
>> 5: (()+0x8192) [0x7f31d9c8f192]
>> 6: (clone()+0x6d) [0x7f31d919c51d]
>> NOTE: a copy of the executable, or `objdump -rdS ` is needed to 
>> interpret this.
>> 
>> -106> 2017-02-27 16:56:08.233680 7f31daa3b7c0 -1 mds.-1.0 log_to_monitors 
>> {default=true}
>>   -1> 2017-02-27 16:56:08.537714 7f31d30df700 -1 mds.0.sessionmap 
>> _load_finish 
>> got (2) No such file or directory
>>0> 2017-02-27 16:56:08.538493 7f31d30df700 -1 mds/SessionMap.cc 
>> : In function 'void SessionMap::_load_finish(int, 
>> ceph::bufferlist&)' thread 7f31d30df700 time 2017-02-27 16:56:08.537739
>> mds/SessionMap.cc : 98: FAILED assert(0 == "failed to 
>> load sessionmap")
>> 
>> ceph version 0.94.7 (d56bdf93ced6b80b07397d57e3fa68fe68304432)
>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char 
>> const*)+0x8b) [0x98bb4b]
>> 2: (SessionMap::_load_finish(int, ceph::buffer

Re: [ceph-users] Safely Upgrading OS on a live Ceph Cluster

2017-03-01 Thread Peter Maloney
On 02/28/17 18:55, Heller, Chris wrote:
> Quick update. So I'm trying out the procedure as documented here.
>
> So far I've:
>
> 1. Stopped ceph-mds
> 2. set noout, norecover, norebalance, nobackfill
> 3. Stopped all ceph-osd
> 4. Stopped ceph-mon
> 5. Installed new OS
> 6. Started ceph-mon
> 7. Started all ceph-osd
>
> This is where I've stopped. All but one OSD came back online. One has
> this backtrace:
>
> 2017-02-28 17:44:54.884235 7fb2ba3187c0 -1 journal
> FileJournal::_open: disabling aio for non-block journal.  Use
> journal_force_aio to force use of aio anyway
>
Are the journals inline? or separate? If they're separate, the above
means the journal symlink/config is missing, so it would possibly make a
new journal, which would be bad if you didn't flush the old journal before.

And also just one osd is easy enough to replace (which I wouldn't do
until the cluster settled down and recovered). So it's lame for it to be
broken, but it's still recoverable if that's the only issue.
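
For reference, the flags in step 2 of the quoted procedure are set and cleared 
with the ceph CLI:

  ceph osd set noout
  ceph osd set norecover
  ceph osd set norebalance
  ceph osd set nobackfill
  # ... maintenance / reinstall ...
  ceph osd unset nobackfill
  ceph osd unset norebalance
  ceph osd unset norecover
  ceph osd unset noout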
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph osd activate error

2017-03-01 Thread gjprabu
Hi All,



 Has anybody faced a similar issue, and is there any solution for this?



Regards

Prabu GJ




 On Wed, 01 Mar 2017 14:21:14 +0530 gjprabu  
wrote 




Hi Team,



 



   We are installing a new Ceph setup (version Jewel) and while activating the OSDs it is 
throwing the error RuntimeError: Failed to execute command: /usr/sbin/ceph-disk 
-v activate --mark-init systemd --mount /home/data/osd1.  We tried to reinstall 
the OSD machine and still get the same error. Kindly let us know if there is any 
solution for this error.



root@cephadmin~/mycluster#ceph-deploy osd activate cephnode1:/home/data/osd1 
cephnode2:/home/data/osd2 cephnode3:/home/data/osd3

[ceph_deploy.conf][DEBUG ] found configuration file at: /root/.cephdeploy.conf

[ceph_deploy.cli][INFO  ] Invoked (1.5.37): /usr/bin/ceph-deploy osd activate 
cephnode1:/home/data/osd1 cephnode2:/home/data/osd2 cephnode3:/home/data/osd3

[ceph_deploy.cli][INFO  ] ceph-deploy options:

[ceph_deploy.cli][INFO  ]  username  : None

[ceph_deploy.cli][INFO  ]  verbose   : False

[ceph_deploy.cli][INFO  ]  overwrite_conf: False

[ceph_deploy.cli][INFO  ]  subcommand: activate

[ceph_deploy.cli][INFO  ]  quiet : False

[ceph_deploy.cli][INFO  ]  cd_conf   : 


[ceph_deploy.cli][INFO  ]  cluster   : ceph

[ceph_deploy.cli][INFO  ]  func  : 

[ceph_deploy.cli][INFO  ]  ceph_conf : None

[ceph_deploy.cli][INFO  ]  default_release   : False

[ceph_deploy.cli][INFO  ]  disk  : [('cephnode1', 
'/home/data/osd1', None), ('cephnode2', '/home/data/osd2', None), ('cephnode3', 
'/home/data/osd3', None)]

[ceph_deploy.osd][DEBUG ] Activating cluster ceph disks 
cephnode1:/home/data/osd1: cephnode2:/home/data/osd2: cephnode3:/home/data/osd3:

[cephnode1][DEBUG ] connected to host: cephnode1

[cephnode1][DEBUG ] detect platform information from remote host

[cephnode1][DEBUG ] detect machine type

[cephnode1][DEBUG ] find the location of an executable

[ceph_deploy.osd][INFO  ] Distro info: CentOS Linux 7.3.1611 Core

[ceph_deploy.osd][DEBUG ] activating host cephnode1 disk /home/data/osd1

[ceph_deploy.osd][DEBUG ] will use init type: systemd

[cephnode1][DEBUG ] find the location of an executable

[cephnode1][INFO  ] Running command: /usr/sbin/ceph-disk -v activate 
--mark-init systemd --mount /home/data/osd1

[cephnode1][WARNIN] main_activate: path = /home/data/osd1

[cephnode1][WARNIN] activate: Cluster uuid is 
228e2b14-a6f2-4a46-b99e-673e3cd6774f

[cephnode1][WARNIN] command: Running command: /usr/bin/ceph-osd --cluster=ceph 
--show-config-value=fsid

[cephnode1][WARNIN] activate: Cluster name is ceph

[cephnode1][WARNIN] activate: OSD uuid is 147347cb-cc6b-400d-9a72-abae8cc75207

[cephnode1][WARNIN] allocate_osd_id: Allocating OSD id...

[cephnode1][WARNIN] command: Running command: /usr/bin/ceph --cluster ceph 
--name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring 
osd create --concise 147347cb-cc6b-400d-9a72-abae8cc75207

[cephnode1][WARNIN] command: Running command: /usr/sbin/restorecon -R 
/home/data/osd1/whoami.3203.tmp

[cephnode1][WARNIN] command: Running command: /usr/bin/chown -R ceph:ceph 
/home/data/osd1/whoami.3203.tmp

[cephnode1][WARNIN] activate: OSD id is 0

[cephnode1][WARNIN] activate: Initializing OSD...

[cephnode1][WARNIN] command_check_call: Running command: /usr/bin/ceph 
--cluster ceph --name client.bootstrap-osd --keyring 
/var/lib/ceph/boo

Re: [ceph-users] Safely Upgrading OS on a live Ceph Cluster

2017-03-01 Thread Heller, Chris
That is a good question, and I'm not sure how to answer. The journal is on its 
own volume, and is not a symlink. Also how does one flush the journal? That 
seems like an important step when bringing down a cluster safely.

-Chris
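
For the record, flushing a FileStore journal (only needed when the journal is 
being moved or replaced, as noted in the reply below) is roughly the following, 
with the OSD id being just an example:

  # stop the OSD first (service manager syntax depends on the distro/release)
  systemctl stop ceph-osd@3
  ceph-osd -i 3 --flush-journal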

> On Mar 1, 2017, at 8:37 AM, Peter Maloney 
>  wrote:
> 
> On 02/28/17 18:55, Heller, Chris wrote:
>> Quick update. So I'm trying out the procedure as documented here.
>> 
>> So far I've:
>> 
>> 1. Stopped ceph-mds
>> 2. set noout, norecover, norebalance, nobackfill
>> 3. Stopped all ceph-osd
>> 4. Stopped ceph-mon
>> 5. Installed new OS
>> 6. Started ceph-mon
>> 7. Started all ceph-osd
>> 
>> This is where I've stopped. All but one OSD came back online. One has this 
>> backtrace:
>> 
>> 2017-02-28 17:44:54.884235 7fb2ba3187c0 -1 journal FileJournal::_open: 
>> disabling aio for non-block journal.  Use journal_force_aio to force use of 
>> aio anyway
> Are the journals inline? or separate? If they're separate, the above means 
> the journal symlink/config is missing, so it would possibly make a new 
> journal, which would be bad if you didn't flush the old journal before.
> 
> And also just one osd is easy enough to replace (which I wouldn't do until 
> the cluster settled down and recovered). So it's lame for it to be broken, 
> but it's still recoverable if that's the only issue.



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph osd activate error

2017-03-01 Thread Iban Cabrillo
Hi,
Are you sure ceph-disk is installed on the target machine?

Regards, I

On Wed, 1 Mar 2017 at 14:38, gjprabu  wrote:

> Hi All,
>
>  Anybody faced similar issue and is there any solution on this.
>
> Regards
> Prabu GJ
>
>
>  On Wed, 01 Mar 2017 14:21:14 +0530 *gjprabu  >* wrote 
>
> Hi Team,
>
>
>
>We are installing new ceph setup version jewel and while active tehe
> osd its throughing error *RuntimeError: Failed to execute command:
> /usr/sbin/ceph-disk -v activate --mark-init systemd --mount
> /home/data/osd1.  *We try to reinstall the osd machine and still same
> error . Kindly let us know is there any solution on this error.
>
> root@cephadmin~/mycluster#ceph-deploy osd activate
> cephnode1:/home/data/osd1 cephnode2:/home/data/osd2
> cephnode3:/home/data/osd3
>
> [ceph_deploy.conf][DEBUG ] found configuration file at:
> /root/.cephdeploy.conf
>
> [ceph_deploy.cli][INFO  ] Invoked (1.5.37): /usr/bin/ceph-deploy osd
> activate cephnode1:/home/data/osd1 cephnode2:/home/data/osd2
> cephnode3:/home/data/osd3
>
> [ceph_deploy.cli][INFO  ] ceph-deploy options:
>
> [ceph_deploy.cli][INFO  ]  username  : None
>
> [ceph_deploy.cli][INFO  ]  verbose   : False
>
> [ceph_deploy.cli][INFO  ]  overwrite_conf: False
>
> [ceph_deploy.cli][INFO  ]  subcommand: activate
>
> [ceph_deploy.cli][INFO  ]  quiet : False
>
> [ceph_deploy.cli][INFO  ]  cd_conf   :
> 
>
> [ceph_deploy.cli][INFO  ]  cluster   : ceph
>
> [ceph_deploy.cli][INFO  ]  func  :  at 0xbbc050>
>
> [ceph_deploy.cli][INFO  ]  ceph_conf : None
>
> [ceph_deploy.cli][INFO  ]  default_release   : False
>
> [ceph_deploy.cli][INFO  ]  disk  : [('cephnode1',
> '/home/data/osd1', None), ('cephnode2', '/home/data/osd2', None),
> ('cephnode3', '/home/data/osd3', None)]
>
> [ceph_deploy.osd][DEBUG ] Activating cluster ceph disks
> cephnode1:/home/data/osd1: cephnode2:/home/data/osd2:
> cephnode3:/home/data/osd3:
>
> [cephnode1][DEBUG ] connected to host: cephnode1
>
> [cephnode1][DEBUG ] detect platform information from remote host
>
> [cephnode1][DEBUG ] detect machine type
>
> [cephnode1][DEBUG ] find the location of an executable
>
> [ceph_deploy.osd][INFO  ] Distro info: CentOS Linux 7.3.1611 Core
>
> [ceph_deploy.osd][DEBUG ] activating host cephnode1 disk /home/data/osd1
>
> [ceph_deploy.osd][DEBUG ] will use init type: systemd
>
> [cephnode1][DEBUG ] find the location of an executable
>
> [cephnode1][INFO  ] Running command: /usr/sbin/ceph-disk -v activate
> --mark-init systemd --mount /home/data/osd1
>
> [cephnode1][WARNIN] main_activate: path = /home/data/osd1
>
> [cephnode1][WARNIN] activate: Cluster uuid is
> 228e2b14-a6f2-4a46-b99e-673e3cd6774f
>
> [cephnode1][WARNIN] command: Running command: /usr/bin/ceph-osd
> --cluster=ceph --show-config-value=fsid
>
> [cephnode1][WARNIN] activate: Cluster name is ceph
>
> [cephnode1][WARNIN] activate: OSD uuid is
> 147347cb-cc6b-400d-9a72-abae8cc75207
>
> [cephnode1][WARNIN] allocate_osd_id: Allocating OSD id...
>
> [cephnode1][WARNIN] command: Running command: /usr/bin/ceph --cluster ceph
> --name client.bootstrap-osd --keyring
> /var/lib/ceph/bootstrap-osd/ceph.keyring osd create --concise
> 147347cb-cc6b-400d-9a72-abae8cc75207
>
> [cephnode1][WARNIN] command: Running command: /usr/sbin/restorecon -R
> /home/data/osd1/whoami.3203.tmp
>
> [cephnode1][WARNIN] command: Running command: /usr/bin/chown -R ceph:ceph
> /home/data/osd1/whoami.3203.tmp
>
> [cephnode1][WARNIN] activate: 

Re: [ceph-users] ceph osd activate error

2017-03-01 Thread gjprabu
Hi Iban,



  Sure, it is there. The ceph-deploy osd prepare step worked properly and it is 
the activate step that throws the error.



root@cephnode1~#df -Th
Filesystem                 Type      Size  Used Avail Use% Mounted on
/dev/vda2                  ext4      7.6G  2.2G  5.4G  29% /
devtmpfs                   devtmpfs  3.9G     0  3.9G   0% /dev
tmpfs                      tmpfs     3.9G     0  3.9G   0% /dev/shm
tmpfs                      tmpfs     3.9G  8.4M  3.9G   1% /run
tmpfs                      tmpfs     3.9G     0  3.9G   0% /sys/fs/cgroup
/dev/vda1                  ext4      9.5G  293M  9.1G   4% /var
/dev/vda5                  ext4      9.5G   37M  9.4G   1% /tmp
/dev/mapper/vg000-mysqlvol ext4      255G  5.1G  247G   3% /home
tmpfs                      tmpfs     782M     0  782M   0% /run/user/0

root@cephnode1/home/data/osd1#pwd
/home/data/osd1

root@cephnode1/home#ls -ld data/
drwxr-xr-x 3 ceph ceph 4096 Mar  1 14:08 data/
root@zoho-cephnode1/home#ls -ld data/osd1/
drwxr-xr-x 3 ceph ceph 4096 Mar  1 14:12 data/osd1/





Ceph Prepare
root@cephadmin~/mycluster#ceph-deploy osd prepare cephnode1:/home/data/osd1 
cephnode2:/home/data/osd2 cephnode3:/home/data/osd3

[ceph_deploy.conf][DEBUG ] found configuration file at: /root/.cephdeploy.conf

[ceph_deploy.cli][INFO ] Invoked (1.5.37): /usr/bin/ceph-deploy osd prepare 
cephnode1:/home/data/osd1 cephnode2:/home/data/osd2 cephnode3:/home/data/osd3

[ceph_deploy.cli][INFO ] ceph-deploy options:

[ceph_deploy.cli][INFO ] username : None

[ceph_deploy.cli][INFO ] disk : [('cephnode1', '/home/data/osd1', None), 
('cephnode2', '/home/data/osd2', None), ('cephnode3', '/home/data/osd3', None)]

[ceph_deploy.cli][INFO ] dmcrypt : False

[ceph_deploy.cli][INFO ] verbose : False

[ceph_deploy.cli][INFO ] bluestore : None

[ceph_deploy.cli][INFO ] overwrite_conf : False

[ceph_deploy.cli][INFO ] subcommand : prepare

[ceph_deploy.cli][INFO ] dmcrypt_key_dir : /etc/ceph/dmcrypt-keys

[ceph_deploy.cli][INFO ] quiet : False

[ceph_deploy.cli][INFO ] cd_conf : 

[ceph_deploy.cli][INFO ] cluster : ceph

[ceph_deploy.cli][INFO ] fs_type : xfs

[ceph_deploy.cli][INFO ] func : 

[ceph_deploy.cli][INFO ] ceph_conf : None

[ceph_deploy.cli][INFO ] default_release : False

[ceph_deploy.cli][INFO ] zap_disk : False

[ceph_deploy.osd][DEBUG ] Preparing cluster ceph disks 
cephnode1:/home/data/osd1: cephnode2:/home/data/osd2: cephnode3:/home/data/osd3:

[cephnode1][DEBUG ] connected to host: cephnode1

[cephnode1][DEBUG ] detect platform information from remote host

[cephnode1][DEBUG ] detect machine type

[cephnode1][DEBUG ] find the location of an executable

[ceph_deploy.osd][INFO ] Distro info: CentOS Linux 7.3.1611 Core

[ceph_deploy.osd][DEBUG ] Deploying osd to cephnode1

[cephnode1][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf

[cephnode1][WARNIN] osd keyring does not exist yet, creating one

[cephnode1][DEBUG ] create a keyring file

[ceph_deploy.osd][DEBUG ] Preparing host cephnode1 disk /home/data/osd1 journal 
None activate False

[cephnode1][DEBUG ] find the location of an executable

[cephnode1][INFO ] Running command: /usr/sbin/ceph-disk -v prepare --cluster 
ceph --fs-type xfs -- /home/data/osd1

[cephnode1][WARNIN] command: Running command: /usr/bin/ceph-osd --cluster=ceph 
--show-config-value=fsid

[cephnode1][WARNIN] command: Running command: /usr/bin/ceph-osd 
--check-allows-journal -i 0 --cluster ceph

[cephnode1][WARNIN] command: Running command: /usr/bin/ceph-osd 
--check-wants-journal -i 0 --cluster ceph

[cephnode1][WARNIN] command: Running command: /usr/bin/ceph-osd 
--check-needs-journal -i

Re: [ceph-users] Safely Upgrading OS on a live Ceph Cluster

2017-03-01 Thread Peter Maloney
On 03/01/17 14:41, Heller, Chris wrote:
> That is a good question, and I'm not sure how to answer. The journal
> is on its own volume, and is not a symlink. Also how does one flush
> the journal? That seems like an important step when bringing down a
> cluster safely.
>
You only need to flush the journal if you are removing it from the osd,
replacing it with a different journal.

So since your journal is on its own, then you need either a symlink in
the osd directory named "journal" which points to the device (ideally
not /dev/sdx but /dev/disk/by-.../), or you put it in the ceph.conf.

And since it said you have a non-block journal now, it probably means
there is a file... you should remove that (rename it to journal.junk
until you're sure it's not an important file, and delete it later).
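
As a rough sketch of those two options (the device path and OSD id below are 
placeholders only):

  # option 1: a "journal" symlink inside the OSD data directory
  ln -s /dev/disk/by-partuuid/<journal-part-uuid> /var/lib/ceph/osd/ceph-3/journal
  # option 2: point the OSD at its journal in ceph.conf instead
  #   [osd.3]
  #   osd journal = /dev/disk/by-partuuid/<journal-part-uuid>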

> -Chris
>
>> On Mar 1, 2017, at 8:37 AM, Peter Maloney
>> > > wrote:
>>
>> On 02/28/17 18:55, Heller, Chris wrote:
>>> Quick update. So I'm trying out the procedure as documented here.
>>>
>>> So far I've:
>>>
>>> 1. Stopped ceph-mds
>>> 2. set noout, norecover, norebalance, nobackfill
>>> 3. Stopped all ceph-osd
>>> 4. Stopped ceph-mon
>>> 5. Installed new OS
>>> 6. Started ceph-mon
>>> 7. Started all ceph-osd
>>>
>>> This is where I've stopped. All but one OSD came back online. One
>>> has this backtrace:
>>>
>>> 2017-02-28 17:44:54.884235 7fb2ba3187c0 -1 journal
>>> FileJournal::_open: disabling aio for non-block journal.  Use
>>> journal_force_aio to force use of aio anyway
>>>
>> Are the journals inline? or separate? If they're separate, the above
>> means the journal symlink/config is missing, so it would possibly
>> make a new journal, which would be bad if you didn't flush the old
>> journal before.
>>
>> And also just one osd is easy enough to replace (which I wouldn't do
>> until the cluster settled down and recovered). So it's lame for it to
>> be broken, but it's still recoverable if that's the only issue.
>


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Safely Upgrading OS on a live Ceph Cluster

2017-03-01 Thread Heller, Chris
I see. My journal is specified in ceph.conf. I'm not removing it from the OSD 
so sounds like flushing isn't needed in my case.

-Chris
> On Mar 1, 2017, at 9:31 AM, Peter Maloney 
>  wrote:
> 
> On 03/01/17 14:41, Heller, Chris wrote:
>> That is a good question, and I'm not sure how to answer. The journal is on 
>> its own volume, and is not a symlink. Also how does one flush the journal? 
>> That seems like an important step when bringing down a cluster safely.
>> 
> You only need to flush the journal if you are removing it from the osd, 
> replacing it with a different journal.
> 
> So since your journal is on its own, then you need either a symlink in the 
> osd directory named "journal" which points to the device (ideally not 
> /dev/sdx but /dev/disk/by-.../), or you put it in the ceph.conf.
> 
> And since it said you have a non-block journal now, it probably means there 
> is a file... you should remove that (rename it to journal.junk until you're 
> sure it's not an important file, and delete it later).
> 
>> -Chris
>> 
>>> On Mar 1, 2017, at 8:37 AM, Peter Maloney 
>>> >> > wrote:
>>> 
>>> On 02/28/17 18:55, Heller, Chris wrote:
 Quick update. So I'm trying out the procedure as documented here.
 
 So far I've:
 
 1. Stopped ceph-mds
 2. set noout, norecover, norebalance, nobackfill
 3. Stopped all ceph-osd
 4. Stopped ceph-mon
 5. Installed new OS
 6. Started ceph-mon
 7. Started all ceph-osd
 
 This is where I've stopped. All but one OSD came back online. One has this 
 backtrace:
 
 2017-02-28 17:44:54.884235 7fb2ba3187c0 -1 journal FileJournal::_open: 
 disabling aio for non-block journal.  Use journal_force_aio to force use 
 of aio anyway
>>> Are the journals inline? or separate? If they're separate, the above means 
>>> the journal symlink/config is missing, so it would possibly make a new 
>>> journal, which would be bad if you didn't flush the old journal before.
>>> 
>>> And also just one osd is easy enough to replace (which I wouldn't do until 
>>> the cluster settled down and recovered). So it's lame for it to be broken, 
>>> but it's still recoverable if that's the only issue.
>> 
> 
> 



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Safely Upgrading OS on a live Ceph Cluster

2017-03-01 Thread Peter Maloney
On 03/01/17 15:36, Heller, Chris wrote:
> I see. My journal is specified in ceph.conf. I'm not removing it from
> the OSD so sounds like flushing isn't needed in my case.
>
Okay but it seems it's not right if it's saying it's a non-block
journal. (meaning a file, not a block device).

Double check your ceph.conf... make sure the path works, and somehow
make sure the [osd.x] actually matches that osd (no idea how to test
that, esp. if the osd doesn't start ... maybe just increase logging).

Or just make a symlink for now, just to see if it solves the problem,
which would imply the ceph.conf is wrong.


> -Chris
>> On Mar 1, 2017, at 9:31 AM, Peter Maloney
>> > > wrote:
>>
>> On 03/01/17 14:41, Heller, Chris wrote:
>>> That is a good question, and I'm not sure how to answer. The journal
>>> is on its own volume, and is not a symlink. Also how does one flush
>>> the journal? That seems like an important step when bringing down a
>>> cluster safely.
>>>
>> You only need to flush the journal if you are removing it from the
>> osd, replacing it with a different journal.
>>
>> So since your journal is on its own, then you need either a symlink
>> in the osd directory named "journal" which points to the device
>> (ideally not /dev/sdx but /dev/disk/by-.../), or you put it in the
>> ceph.conf.
>>
>> And since it said you have a non-block journal now, it probably means
>> there is a file... you should remove that (rename it to journal.junk
>> until you're sure it's not an important file, and delete it later).
> This is where I've stopped. All but one OSD came back online. One
> has this backtrace:
>
> 2017-02-28 17:44:54.884235 7fb2ba3187c0 -1 journal
> FileJournal::_open: disabling aio for non-block journal.  Use
> journal_force_aio to force use of aio anyway
>
 Are the journals inline? or separate? If they're separate, the
 above means the journal symlink/config is missing, so it would
 possibly make a new journal, which would be bad if you didn't flush
 the old journal before.

 And also just one osd is easy enough to replace (which I wouldn't
 do until the cluster settled down and recovered). So it's lame for
 it to be broken, but it's still recoverable if that's the only issue.
>>>
>>
>>
>


-- 


Peter Maloney
Brockmann Consult
Max-Planck-Str. 2
21502 Geesthacht
Germany
Tel: +49 4152 889 300
Fax: +49 4152 889 333
E-mail: peter.malo...@brockmann-consult.de
Internet: http://www.brockmann-consult.de


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to hide internal ip on ceph mount

2017-03-01 Thread Xiaoxi Chen
Well, I think the argument here is not all about security gain; it is just
not a user-friendly way to let "df" show the 7 IPs of the monitors. Much
better if they see something like "mycephfs.mydomain.com".

And using DNS gives you the flexibility of changing your monitor quorum
members without notifying end users to change their fstab entry, or
whatever mount point record.

2017-03-01 18:46 GMT+08:00 gjprabu :

> Hi Robert,
>
>   This container host will be provided to end user and we don't want to
> expose this ip to end users.
>
> Regards
> Prabu GJ
>
>
>  On Wed, 01 Mar 2017 16:03:49 +0530 *Robert Sander
> >* wrote 
>
> On 01.03.2017 10:54, gjprabu wrote:
> > Hi,
> >
> > We try to use host name instead of ip address but mounted partion
> > showing up address only . How show the host name instead of ip address.
>
> What is the security gain you try to achieve by hiding the IPs?
>
> Regards
> --
> Robert Sander
> Heinlein Support GmbH
> Schwedter Str. 8/9b, 10119 Berlin
>
> http://www.heinlein-support.de
>
> Tel: 030 / 405051-43
> Fax: 030 / 405051-19
>
> Zwangsangaben lt. §35a GmbHG:
> HRB 93818 B / Amtsgericht Berlin-Charlottenburg,
> Geschäftsführer: Peer Heinlein -- Sitz: Berlin
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to hide internal ip on ceph mount

2017-03-01 Thread Wido den Hollander

> On 1 March 2017 at 15:40 Xiaoxi Chen  wrote:
> 
> 
> Well , I think the argument here is not all about security gain, it just
> NOT a user friendly way to let "df" show out 7 IPs of monitorsMuch
> better if they seeing something like "mycephfs.mydomain.com".
> 

mount / df simply prints the monmap. It doesn't print what you added when you 
mounted the filesystem.

Totally normal behavior.
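
For reference, the addresses shown come from the monmap, which can be inspected 
with e.g.:

  ceph mon dump    # prints the monmap epoch and each monitor's name and IP:port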

> And using DNS give you the flexibility of changing your monitor quorum
> members , without notifying end user to change their fstab entry , or
> whatever mount point record.
> 

Still applies. Just create a Round Robin DNS record. The clients will obtain a 
new monmap while they are connected to the cluster.

Wido
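
A minimal sketch of that idea, with hypothetical names and addresses (zone 
records shown as comments, plus a matching fstab entry):

  # DNS zone for mydomain.com: one A record per monitor, all under the same name
  #   mycephfs  IN  A  192.168.0.1
  #   mycephfs  IN  A  192.168.0.2
  #   mycephfs  IN  A  192.168.0.3
  # /etc/fstab entry using the round-robin name instead of monitor IPs
  mycephfs.mydomain.com:6789:/  /mnt/cephfs  ceph  name=admin,secretfile=/etc/ceph/admin.secret,_netdev,noatime  0 2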

> 2017-03-01 18:46 GMT+08:00 gjprabu :
> 
> > Hi Robert,
> >
> >   This container host will be provided to end user and we don't want to
> > expose this ip to end users.
> >
> > Regards
> > Prabu GJ
> >
> >
> >  On Wed, 01 Mar 2017 16:03:49 +0530 *Robert Sander
> > >* wrote 
> >
> > On 01.03.2017 10:54, gjprabu wrote:
> > > Hi,
> > >
> > > We try to use host name instead of ip address but mounted partion
> > > showing up address only . How show the host name instead of ip address.
> >
> > What is the security gain you try to achieve by hiding the IPs?
> >
> > Regards
> > --
> > Robert Sander
> > Heinlein Support GmbH
> > Schwedter Str. 8/9b, 10119 Berlin
> >
> > http://www.heinlein-support.de
> >
> > Tel: 030 / 405051-43
> > Fax: 030 / 405051-19
> >
> > Zwangsangaben lt. §35a GmbHG:
> > HRB 93818 B / Amtsgericht Berlin-Charlottenburg,
> > Geschäftsführer: Peer Heinlein -- Sitz: Berlin
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> >
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> >
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to hide internal ip on ceph mount

2017-03-01 Thread Sage Weil
On Wed, 1 Mar 2017, Wido den Hollander wrote:
> > On 1 March 2017 at 15:40 Xiaoxi Chen  wrote:
> > 
> > 
> > Well , I think the argument here is not all about security gain, it just
> > NOT a user friendly way to let "df" show out 7 IPs of monitorsMuch
> > better if they seeing something like "mycephfs.mydomain.com".
> > 
> 
> mount / df simply prints the monmap. It doesn't print what you added when you 
> mounted the filesystem.
> 
> Totally normal behavior.

Yep.  This *could* be changed, though: modern kernels have DNS resolution 
capability.  Not sure if all distros compile it in, but if so, mount.ceph 
could first try to pass in the DNS name and only do the DNS resolution if 
the kernel can't.  And the kernel client could be updated to remember the 
DNS name and use that.  It's a bit friendlier, but imprecise, since DNS 
might change.  What does NFS do in this case? (Show an IP or a name?)

sage


> > And using DNS give you the flexibility of changing your monitor quorum
> > members , without notifying end user to change their fstab entry , or
> > whatever mount point record.
> > 
> 
> Still applies. Just create a Round Robin DNS record. The clients will obtain 
> a new monmap while they are connected to the cluster.
> 
> Wido
> 
> > 2017-03-01 18:46 GMT+08:00 gjprabu :
> > 
> > > Hi Robert,
> > >
> > >   This container host will be provided to end user and we don't want to
> > > expose this ip to end users.
> > >
> > > Regards
> > > Prabu GJ
> > >
> > >
> > >  On Wed, 01 Mar 2017 16:03:49 +0530 *Robert Sander
> > > >* wrote 
> > >
> > > On 01.03.2017 10:54, gjprabu wrote:
> > > > Hi,
> > > >
> > > > We try to use host name instead of ip address but mounted partion
> > > > showing up address only . How show the host name instead of ip address.
> > >
> > > What is the security gain you try to achieve by hiding the IPs?
> > >
> > > Regards
> > > --
> > > Robert Sander
> > > Heinlein Support GmbH
> > > Schwedter Str. 8/9b, 10119 Berlin
> > >
> > > http://www.heinlein-support.de
> > >
> > > Tel: 030 / 405051-43
> > > Fax: 030 / 405051-19
> > >
> > > Zwangsangaben lt. §35a GmbHG:
> > > HRB 93818 B / Amtsgericht Berlin-Charlottenburg,
> > > Geschäftsführer: Peer Heinlein -- Sitz: Berlin
> > >
> > > ___
> > > ceph-users mailing list
> > > ceph-users@lists.ceph.com
> > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > >
> > >
> > >
> > > ___
> > > ceph-users mailing list
> > > ceph-users@lists.ceph.com
> > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > >
> > >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> ___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to hide internal ip on ceph mount

2017-03-01 Thread Wido den Hollander

> On 1 March 2017 at 16:57 Sage Weil  wrote:
> 
> 
> On Wed, 1 Mar 2017, Wido den Hollander wrote:
> > > On 1 March 2017 at 15:40 Xiaoxi Chen  wrote:
> > > 
> > > 
> > > Well , I think the argument here is not all about security gain, it just
> > > NOT a user friendly way to let "df" show out 7 IPs of monitorsMuch
> > > better if they seeing something like "mycephfs.mydomain.com".
> > > 
> > 
> > mount / df simply prints the monmap. It doesn't print what you added when 
> > you mounted the filesystem.
> > 
> > Totally normal behavior.
> 
> Yep.  This *could* be changed, though: modern kernels have DNS resolution 
> capability.  Not sure if all distros compile it in, but if so, mount.ceph 
> could first try to pass in the DNS name and only do the DNS resolution if 
> the kernel can't.  And the kernel client could be updated to remember the 
> DNS name and use that.  It's a bit friendlier, but imprecise, since DNS 
> might change.  What does NFS do in this case? (Show an IP or a name?)
> 

A "df" will show the entry as it's in the fstab file, but mount will show the 
IPs as well.

But Ceph is a different story here due to the monmap.

Wido

> sage
> 
> 
> > > And using DNS give you the flexibility of changing your monitor quorum
> > > members , without notifying end user to change their fstab entry , or
> > > whatever mount point record.
> > > 
> > 
> > Still applies. Just create a Round Robin DNS record. The clients will 
> > obtain a new monmap while they are connected to the cluster.
> > 
> > Wido
> > 
> > > 2017-03-01 18:46 GMT+08:00 gjprabu :
> > > 
> > > > Hi Robert,
> > > >
> > > >   This container host will be provided to end user and we don't want to
> > > > expose this ip to end users.
> > > >
> > > > Regards
> > > > Prabu GJ
> > > >
> > > >
> > > >  On Wed, 01 Mar 2017 16:03:49 +0530 *Robert Sander
> > > > >* wrote 
> > > > 
> > > >
> > > > On 01.03.2017 10:54, gjprabu wrote:
> > > > > Hi,
> > > > >
> > > > > We try to use host name instead of ip address but mounted partion
> > > > > showing up address only . How show the host name instead of ip 
> > > > > address.
> > > >
> > > > What is the security gain you try to achieve by hiding the IPs?
> > > >
> > > > Regards
> > > > --
> > > > Robert Sander
> > > > Heinlein Support GmbH
> > > > Schwedter Str. 8/9b, 10119 Berlin
> > > >
> > > > http://www.heinlein-support.de
> > > >
> > > > Tel: 030 / 405051-43
> > > > Fax: 030 / 405051-19
> > > >
> > > > Zwangsangaben lt. §35a GmbHG:
> > > > HRB 93818 B / Amtsgericht Berlin-Charlottenburg,
> > > > Geschäftsführer: Peer Heinlein -- Sitz: Berlin
> > > >
> > > > ___
> > > > ceph-users mailing list
> > > > ceph-users@lists.ceph.com
> > > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > > >
> > > >
> > > >
> > > > ___
> > > > ceph-users mailing list
> > > > ceph-users@lists.ceph.com
> > > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > > >
> > > >
> > > ___
> > > ceph-users mailing list
> > > ceph-users@lists.ceph.com
> > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] ceph crush map rules for EC pools and out OSDs ?

2017-03-01 Thread SCHAER Frederic
Hi,

I have 5 data nodes (bluestore, kraken), each with 24 OSDs.
I enabled the optimal crush tunables.
I'd like to try to "really" use EC pools, but until now I've faced cluster 
lockups when I was using 3+2 EC pools with a host failure domain.
When a host was down for instance ;)

Since I'd like the erasure codes to be more than a "nice to have feature with 
12+ ceph data nodes", I wanted to try this :


-  Use a 14+6 EC rule

-  And for each data chunk:

   o  select 4 hosts

   o  on these hosts, select 5 OSDs

In order to do that, I created this rule in the crush map :

rule 4hosts_20shards {
ruleset 3
type erasure
min_size 20
max_size 20
step set_chooseleaf_tries 5
step set_choose_tries 100
step take default
step choose indep 4 type host
step chooseleaf indep 5 type osd
step emit
}
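
For anyone wanting to reproduce this, the usual decompile/edit/recompile cycle 
for getting such a rule into the CRUSH map looks something like the following 
(file names are arbitrary):

  ceph osd getcrushmap -o crushmap.bin        # export the current CRUSH map
  crushtool -d crushmap.bin -o crushmap.txt   # decompile to text; add the rule above
  crushtool -c crushmap.txt -o crushmap.new   # recompile
  ceph osd setcrushmap -i crushmap.new        # inject the edited map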

I then created an EC pool with this erasure profile :
ceph osd erasure-code-profile set erasurep14_6_osd  ruleset-failure-domain=osd 
k=14 m=6
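
And then, as a rough sketch (pool name and PG counts are made up here), the pool
can be created against that profile and tied to the custom ruleset by name,
otherwise a ruleset is generated from the profile :

ceph osd pool create ecpool14_6 1024 1024 erasure erasurep14_6_osd 4hosts_20shards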

I hoped this would allow for losing 1 host completely without locking the 
cluster, and I have the impression this is working...
But. There's always a but ;)

I tried to make all OSDs down by stopping the ceph-osd daemons on one node.
And according to ceph, the cluster is unhealthy.
The ceph health detail gives me for instance this (for the 3+2 and 14+6 pools) :

pg 5.18b is active+undersized+degraded, acting [57,47,2147483647,23,133]
pg 9.186 is active+undersized+degraded, acting 
[2147483647,2147483647,2147483647,2147483647,2147483647,133,142,125,131,137,50,48,55,65,52,16,13,18,22,3]

My question therefore is : why aren't the down PGs remapped onto my 5th data 
node since I made sure the 20 EC shards were spread onto 4 hosts only ?
I thought/hoped that because osds were down, the data would be rebuilt onto 
another OSD/host ?
I can understand that the 3+2 EC pool cannot allocate OSDs on another host because 
3+2=5 already uses all the hosts, but I don't understand why the 14+6 EC pool/pgs do not 
rebuild somewhere else ?

I do not find anything worthwhile in a "ceph pg query"; the up and acting parts are 
equal and do contain the 2147483647 value (which means "none" as far as I 
understood).

I've also tried to "ceph osd out" all the OSDs from one host : in that case, 
the 3+2 EC PGs behave as previously, but the 14+6 EC PGs seem happy despite 
the fact they are still saying the out OSDs are up and acting.
Is my crush rule that wrong ?
Is it possible to do what I want ?

Thanks for any hints...

Regards
Frederic

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Ceph - reclaim free space - aka trim rbd image

2017-03-01 Thread Massimiliano Cuttini

Dear all,

I use the rbd-nbd connector.
Is there a way to reclaim free space from an rbd image using this component 
or not?



Thanks,
Max

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] S3 Multi-part upload broken with newer AWS Java SDK and Kraken RGW

2017-03-01 Thread John Nielsen
Hi all-

We use Amazon S3 quite a bit at $WORK but are evaluating Ceph+radosgw as an 
alternative for some things. We have an "S3 smoke test" written using the AWS 
Java SDK that we use to validate a number of operations. On my Kraken cluster, 
multi-part uploads work fine for s3cmd. Our smoke test also passes fine using 
version 1.9.27 of the AWS SDK. However in SDK 1.11.69 the multi-part upload 
fails. The initial POST (to reserve the object name and start the upload) 
succeeds, but the first PUT fails with a 403 error.

So, does anyone know offhand what might be going on here? If not, how can I get 
more details about the 403 error and what is causing it?

The cluster was installed with Jewel and recently updated to Kraken. Using the 
built-in civetweb server.

Here is the log output for three multi-part uploads. The first two are s3cmd 
and the older SDK, respectively. The last is the failing one with the newer SDK.

S3cmd, Succeeds.
2017-03-01 17:33:16.845613 7f80b06de700  1 == starting new request 
req=0x7f80b06d8340 =
2017-03-01 17:33:16.856522 7f80b06de700  1 == req done req=0x7f80b06d8340 
op status=0 http_status=200 ==
2017-03-01 17:33:16.856628 7f80b06de700  1 civetweb: 0x7f81131fd000: 
10.251.50.7 - - [01/Mar/2017:17:33:16 +] "POST 
/testdomobucket10x3x104x64250438/multipartStreamTest?uploads HTTP/1.1" 1 0 - -
2017-03-01 17:33:16.953967 7f80b06de700  1 == starting new request 
req=0x7f80b06d8340 =
2017-03-01 17:33:24.094134 7f80b06de700  1 == req done req=0x7f80b06d8340 
op status=0 http_status=200 ==
2017-03-01 17:33:24.094211 7f80b06de700  1 civetweb: 0x7f81131fd000: 
10.251.50.7 - - [01/Mar/2017:17:33:16 +] "PUT 
/testdomobucket10x3x104x64250438/multipartStreamTest?partNumber=1&uploadId=2~IGYuZC4uDC27TGWfpFkKk-Makqvk_XB
 HTTP/1.1" 1 0 - -
2017-03-01 17:33:24.193747 7f80b06de700  1 == starting new request 
req=0x7f80b06d8340 =
2017-03-01 17:33:30.002050 7f80b06de700  1 == req done req=0x7f80b06d8340 
op status=0 http_status=200 ==
2017-03-01 17:33:30.002124 7f80b06de700  1 civetweb: 0x7f81131fd000: 
10.251.50.7 - - [01/Mar/2017:17:33:16 +] "PUT 
/testdomobucket10x3x104x64250438/multipartStreamTest?partNumber=2&uploadId=2~IGYuZC4uDC27TGWfpFkKk-Makqvk_XB
 HTTP/1.1" 1 0 - -
2017-03-01 17:33:30.085033 7f80b06de700  1 == starting new request 
req=0x7f80b06d8340 =
2017-03-01 17:33:30.104944 7f80b06de700  1 == req done req=0x7f80b06d8340 
op status=0 http_status=200 ==
2017-03-01 17:33:30.105007 7f80b06de700  1 civetweb: 0x7f81131fd000: 
10.251.50.7 - - [01/Mar/2017:17:33:16 +] "POST 
/testdomobucket10x3x104x64250438/multipartStreamTest?uploadId=2~IGYuZC4uDC27TGWfpFkKk-Makqvk_XB
 HTTP/1.1" 1 0 - -

AWS SDK (1.9.27). Succeeds.
2017-03-01 17:54:50.720093 7f80c0eff700  1 == starting new request 
req=0x7f80c0ef9340 =
2017-03-01 17:54:50.733109 7f80c0eff700  1 == req done req=0x7f80c0ef9340 
op status=0 http_status=200 ==
2017-03-01 17:54:50.733188 7f80c0eff700  1 civetweb: 0x7f811314c000: 
10.251.50.7 - - [01/Mar/2017:17:54:42 +] "POST 
/testdomobucket10x3x104x6443285/multipartStreamTest?uploads HTTP/1.1" 1 0 - 
aws-sdk-java/1.9.27 Mac_OS_X/10.10.5 
Java_HotSpot(TM)_64-Bit_Server_VM/24.71-b01/1.7.0_71
2017-03-01 17:54:50.831618 7f80c0eff700  1 == starting new request 
req=0x7f80c0ef9340 =
2017-03-01 17:54:58.057011 7f80c0eff700  1 == req done req=0x7f80c0ef9340 
op status=0 http_status=200 ==
2017-03-01 17:54:58.057082 7f80c0eff700  1 civetweb: 0x7f811314c000: 
10.251.50.7 - - [01/Mar/2017:17:54:42 +] "PUT 
/testdomobucket10x3x104x6443285/multipartStreamTest?uploadId=2%7EPlNR4meSvAvCYtvbqz8JLlSKu5_laxo&partNumber=1
 HTTP/1.1" 1 0 - aws-sdk-java/1.9.27 Mac_OS_X/10.10.5 
Java_HotSpot(TM)_64-Bit_Server_VM/24.71-b01/1.7.0_71
2017-03-01 17:54:58.143235 7f80c0eff700  1 == starting new request 
req=0x7f80c0ef9340 =
2017-03-01 17:54:58.328351 7f80c0eff700  1 == req done req=0x7f80c0ef9340 
op status=0 http_status=200 ==
2017-03-01 17:54:58.328437 7f80c0eff700  1 civetweb: 0x7f811314c000: 
10.251.50.7 - - [01/Mar/2017:17:54:42 +] "PUT 
/testdomobucket10x3x104x6443285/multipartStreamTest?uploadId=2%7EPlNR4meSvAvCYtvbqz8JLlSKu5_laxo&partNumber=2
 HTTP/1.1" 1 0 - aws-sdk-java/1.9.27 Mac_OS_X/10.10.5 
Java_HotSpot(TM)_64-Bit_Server_VM/24.71-b01/1.7.0_71
2017-03-01 17:54:58.415890 7f80c0eff700  1 == starting new request 
req=0x7f80c0ef9340 =
2017-03-01 17:54:58.438199 7f80c0eff700  1 == req done req=0x7f80c0ef9340 
op status=0 http_status=200 ==
2017-03-01 17:54:58.438253 7f80c0eff700  1 civetweb: 0x7f811314c000: 
10.251.50.7 - - [01/Mar/2017:17:54:42 +] "POST 
/testdomobucket10x3x104x6443285/multipartStreamTest?uploadId=2%7EPlNR4meSvAvCYtvbqz8JLlSKu5_laxo
 HTTP/1.1" 1 0 - aws-sdk-java/1.9.27 Mac_OS_X/10.10.5 
Java_HotSpot(TM)_64-Bit_Server_VM/24.71-b01/1.7.0_71

AWS SDK (1.11.69), fails.
2017-03-01 17:37:31.833693 7f80c6f0b700  1 == starting new request 

Re: [ceph-users] S3 Multi-part upload broken with newer AWS Java SDK and Kraken RGW

2017-03-01 Thread Roger Brown
I had similar issues when I created all the rbd-related pools with
erasure-coding instead of replication. -Roger


On Wed, Mar 1, 2017 at 11:47 AM John Nielsen  wrote:

> Hi all-
>
> We use Amazon S3 quite a bit at $WORK but are evaluating Ceph+radosgw as
> an alternative for some things. We have an "S3 smoke test" written using
> the AWS Java SDK that we use to validate a number of operations. On my
> Kraken cluster, multi-part uploads work fine for s3cmd. Our smoke test also
> passes fine using version 1.9.27 of the AWS SDK. However in SDK 1.11.69 the
> multi-part upload fails. The initial POST (to reserve the object name and
> start the upload) succeeds, but the first PUT fails with a 403 error.
>
> So, does anyone know offhand what might be going on here? If not, how can
> I get more details about the 403 error and what is causing it?
>
> The cluster was installed with Jewel and recently updated to Kraken. Using
> the built-in civetweb server.
>
> Here is the log output for three multi-part uploads. The first two are
> s3cmd and the older SDK, respectively. The last is the failing one with the
> newer SDK.
>
> S3cmd, Succeeds.
> 2017-03-01 17:33:16.845613 7f80b06de700  1 == starting new request
> req=0x7f80b06d8340 =
> 2017-03-01 17:33:16.856522 7f80b06de700  1 == req done
> req=0x7f80b06d8340 op status=0 http_status=200 ==
> 2017-03-01 17:33:16.856628 7f80b06de700  1 civetweb: 0x7f81131fd000:
> 10.251.50.7 - - [01/Mar/2017:17:33:16 +] "POST
> /testdomobucket10x3x104x64250438/multipartStreamTest?uploads HTTP/1.1" 1 0
> - -
> 2017-03-01 17:33:16.953967 7f80b06de700  1 == starting new request
> req=0x7f80b06d8340 =
> 2017-03-01 17:33:24.094134 7f80b06de700  1 == req done
> req=0x7f80b06d8340 op status=0 http_status=200 ==
> 2017-03-01 17:33:24.094211 7f80b06de700  1 civetweb: 0x7f81131fd000:
> 10.251.50.7 - - [01/Mar/2017:17:33:16 +] "PUT
> /testdomobucket10x3x104x64250438/multipartStreamTest?partNumber=1&uploadId=2~IGYuZC4uDC27TGWfpFkKk-Makqvk_XB
> HTTP/1.1" 1 0 - -
> 2017-03-01 17:33:24.193747 7f80b06de700  1 == starting new request
> req=0x7f80b06d8340 =
> 2017-03-01 17:33:30.002050 7f80b06de700  1 == req done
> req=0x7f80b06d8340 op status=0 http_status=200 ==
> 2017-03-01 17:33:30.002124 7f80b06de700  1 civetweb: 0x7f81131fd000:
> 10.251.50.7 - - [01/Mar/2017:17:33:16 +] "PUT
> /testdomobucket10x3x104x64250438/multipartStreamTest?partNumber=2&uploadId=2~IGYuZC4uDC27TGWfpFkKk-Makqvk_XB
> HTTP/1.1" 1 0 - -
> 2017-03-01 17:33:30.085033 7f80b06de700  1 == starting new request
> req=0x7f80b06d8340 =
> 2017-03-01 17:33:30.104944 7f80b06de700  1 == req done
> req=0x7f80b06d8340 op status=0 http_status=200 ==
> 2017-03-01 17:33:30.105007 7f80b06de700  1 civetweb: 0x7f81131fd000:
> 10.251.50.7 - - [01/Mar/2017:17:33:16 +] "POST
> /testdomobucket10x3x104x64250438/multipartStreamTest?uploadId=2~IGYuZC4uDC27TGWfpFkKk-Makqvk_XB
> HTTP/1.1" 1 0 - -
>
> AWS SDK (1.9.27). Succeeds.
> 2017-03-01 17:54:50.720093 7f80c0eff700  1 == starting new request
> req=0x7f80c0ef9340 =
> 2017-03-01 17:54:50.733109 7f80c0eff700  1 == req done
> req=0x7f80c0ef9340 op status=0 http_status=200 ==
> 2017-03-01 17:54:50.733188 7f80c0eff700  1 civetweb: 0x7f811314c000:
> 10.251.50.7 - - [01/Mar/2017:17:54:42 +] "POST
> /testdomobucket10x3x104x6443285/multipartStreamTest?uploads HTTP/1.1" 1 0 -
> aws-sdk-java/1.9.27 Mac_OS_X/10.10.5
> Java_HotSpot(TM)_64-Bit_Server_VM/24.71-b01/1.7.0_71
> 2017-03-01 17:54:50.831618 7f80c0eff700  1 == starting new request
> req=0x7f80c0ef9340 =
> 2017-03-01 17:54:58.057011 7f80c0eff700  1 == req done
> req=0x7f80c0ef9340 op status=0 http_status=200 ==
> 2017-03-01 17:54:58.057082 7f80c0eff700  1 civetweb: 0x7f811314c000:
> 10.251.50.7 - - [01/Mar/2017:17:54:42 +] "PUT
> /testdomobucket10x3x104x6443285/multipartStreamTest?uploadId=2%7EPlNR4meSvAvCYtvbqz8JLlSKu5_laxo&partNumber=1
> HTTP/1.1" 1 0 - aws-sdk-java/1.9.27 Mac_OS_X/10.10.5
> Java_HotSpot(TM)_64-Bit_Server_VM/24.71-b01/1.7.0_71
> 2017-03-01 17:54:58.143235 7f80c0eff700  1 == starting new request
> req=0x7f80c0ef9340 =
> 2017-03-01 17:54:58.328351 7f80c0eff700  1 == req done
> req=0x7f80c0ef9340 op status=0 http_status=200 ==
> 2017-03-01 17:54:58.328437 7f80c0eff700  1 civetweb: 0x7f811314c000:
> 10.251.50.7 - - [01/Mar/2017:17:54:42 +] "PUT
> /testdomobucket10x3x104x6443285/multipartStreamTest?uploadId=2%7EPlNR4meSvAvCYtvbqz8JLlSKu5_laxo&partNumber=2
> HTTP/1.1" 1 0 - aws-sdk-java/1.9.27 Mac_OS_X/10.10.5
> Java_HotSpot(TM)_64-Bit_Server_VM/24.71-b01/1.7.0_71
> 2017-03-01 17:54:58.415890 7f80c0eff700  1 == starting new request
> req=0x7f80c0ef9340 =
> 2017-03-01 17:54:58.438199 7f80c0eff700  1 == req done
> req=0x7f80c0ef9340 op status=0 http_status=200 ==
> 2017-03-01 17:54:58.438253 7f80c0eff700  1 civetweb: 0x7f811314c000:
> 10.251.50.7 - - [01/Mar/2017:17:54:42 +] "POST
> /testdomobuck

Re: [ceph-users] Ceph - reclaim free space - aka trim rbd image

2017-03-01 Thread Jason Dillaman
You should be able to issue an fstrim against the filesystem on top of
the nbd device or run blkdiscard against the raw device if you don't
have a filesystem.
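
For example (a rough sketch; the pool/image name, nbd device and mountpoint are
made up):

rbd-nbd map rbd/myimage        # suppose this prints /dev/nbd0
fstrim -v /mnt/myimage         # if /dev/nbd0 holds a filesystem mounted at /mnt/myimage
blkdiscard /dev/nbd0           # only if there is no filesystem on the device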

On Wed, Mar 1, 2017 at 1:26 PM, Massimiliano Cuttini  wrote:
> Dear all,
>
> i use the rbd-nbd connector.
> Is there a way to reclaim free space from rbd image using this component or
> not?
>
>
> Thanks,
> Max
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
Jason
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] S3 Multi-part upload broken with newer AWS Java SDK and Kraken RGW

2017-03-01 Thread Yehuda Sadeh-Weinraub
This sounds like this bug:
http://tracker.ceph.com/issues/17076

Will be fixed in 10.2.6. It's triggered by aws4 auth, so a workaround
would be to use aws2 instead.

Yehuda


On Wed, Mar 1, 2017 at 10:46 AM, John Nielsen  wrote:
> Hi all-
>
> We use Amazon S3 quite a bit at $WORK but are evaluating Ceph+radosgw as an 
> alternative for some things. We have an "S3 smoke test" written using the AWS 
> Java SDK that we use to validate a number of operations. On my Kraken 
> cluster, multi-part uploads work fine for s3cmd. Our smoke test also passes 
> fine using version 1.9.27 of the AWS SDK. However in SDK 1.11.69 the 
> multi-part upload fails. The initial POST (to reserve the object name and 
> start the upload) succeeds, but the first PUT fails with a 403 error.
>
> So, does anyone know offhand what might be going on here? If not, how can I 
> get more details about the 403 error and what is causing it?
>
> The cluster was installed with Jewel and recently updated to Kraken. Using 
> the built-in civetweb server.
>
> Here is the log output for three multi-part uploads. The first two are s3cmd 
> and the older SDK, respectively. The last is the failing one with the newer 
> SDK.
>
> S3cmd, Succeeds.
> 2017-03-01 17:33:16.845613 7f80b06de700  1 == starting new request 
> req=0x7f80b06d8340 =
> 2017-03-01 17:33:16.856522 7f80b06de700  1 == req done req=0x7f80b06d8340 
> op status=0 http_status=200 ==
> 2017-03-01 17:33:16.856628 7f80b06de700  1 civetweb: 0x7f81131fd000: 
> 10.251.50.7 - - [01/Mar/2017:17:33:16 +] "POST 
> /testdomobucket10x3x104x64250438/multipartStreamTest?uploads HTTP/1.1" 1 0 - -
> 2017-03-01 17:33:16.953967 7f80b06de700  1 == starting new request 
> req=0x7f80b06d8340 =
> 2017-03-01 17:33:24.094134 7f80b06de700  1 == req done req=0x7f80b06d8340 
> op status=0 http_status=200 ==
> 2017-03-01 17:33:24.094211 7f80b06de700  1 civetweb: 0x7f81131fd000: 
> 10.251.50.7 - - [01/Mar/2017:17:33:16 +] "PUT 
> /testdomobucket10x3x104x64250438/multipartStreamTest?partNumber=1&uploadId=2~IGYuZC4uDC27TGWfpFkKk-Makqvk_XB
>  HTTP/1.1" 1 0 - -
> 2017-03-01 17:33:24.193747 7f80b06de700  1 == starting new request 
> req=0x7f80b06d8340 =
> 2017-03-01 17:33:30.002050 7f80b06de700  1 == req done req=0x7f80b06d8340 
> op status=0 http_status=200 ==
> 2017-03-01 17:33:30.002124 7f80b06de700  1 civetweb: 0x7f81131fd000: 
> 10.251.50.7 - - [01/Mar/2017:17:33:16 +] "PUT 
> /testdomobucket10x3x104x64250438/multipartStreamTest?partNumber=2&uploadId=2~IGYuZC4uDC27TGWfpFkKk-Makqvk_XB
>  HTTP/1.1" 1 0 - -
> 2017-03-01 17:33:30.085033 7f80b06de700  1 == starting new request 
> req=0x7f80b06d8340 =
> 2017-03-01 17:33:30.104944 7f80b06de700  1 == req done req=0x7f80b06d8340 
> op status=0 http_status=200 ==
> 2017-03-01 17:33:30.105007 7f80b06de700  1 civetweb: 0x7f81131fd000: 
> 10.251.50.7 - - [01/Mar/2017:17:33:16 +] "POST 
> /testdomobucket10x3x104x64250438/multipartStreamTest?uploadId=2~IGYuZC4uDC27TGWfpFkKk-Makqvk_XB
>  HTTP/1.1" 1 0 - -
>
> AWS SDK (1.9.27). Succeeds.
> 2017-03-01 17:54:50.720093 7f80c0eff700  1 == starting new request 
> req=0x7f80c0ef9340 =
> 2017-03-01 17:54:50.733109 7f80c0eff700  1 == req done req=0x7f80c0ef9340 
> op status=0 http_status=200 ==
> 2017-03-01 17:54:50.733188 7f80c0eff700  1 civetweb: 0x7f811314c000: 
> 10.251.50.7 - - [01/Mar/2017:17:54:42 +] "POST 
> /testdomobucket10x3x104x6443285/multipartStreamTest?uploads HTTP/1.1" 1 0 - 
> aws-sdk-java/1.9.27 Mac_OS_X/10.10.5 
> Java_HotSpot(TM)_64-Bit_Server_VM/24.71-b01/1.7.0_71
> 2017-03-01 17:54:50.831618 7f80c0eff700  1 == starting new request 
> req=0x7f80c0ef9340 =
> 2017-03-01 17:54:58.057011 7f80c0eff700  1 == req done req=0x7f80c0ef9340 
> op status=0 http_status=200 ==
> 2017-03-01 17:54:58.057082 7f80c0eff700  1 civetweb: 0x7f811314c000: 
> 10.251.50.7 - - [01/Mar/2017:17:54:42 +] "PUT 
> /testdomobucket10x3x104x6443285/multipartStreamTest?uploadId=2%7EPlNR4meSvAvCYtvbqz8JLlSKu5_laxo&partNumber=1
>  HTTP/1.1" 1 0 - aws-sdk-java/1.9.27 Mac_OS_X/10.10.5 
> Java_HotSpot(TM)_64-Bit_Server_VM/24.71-b01/1.7.0_71
> 2017-03-01 17:54:58.143235 7f80c0eff700  1 == starting new request 
> req=0x7f80c0ef9340 =
> 2017-03-01 17:54:58.328351 7f80c0eff700  1 == req done req=0x7f80c0ef9340 
> op status=0 http_status=200 ==
> 2017-03-01 17:54:58.328437 7f80c0eff700  1 civetweb: 0x7f811314c000: 
> 10.251.50.7 - - [01/Mar/2017:17:54:42 +] "PUT 
> /testdomobucket10x3x104x6443285/multipartStreamTest?uploadId=2%7EPlNR4meSvAvCYtvbqz8JLlSKu5_laxo&partNumber=2
>  HTTP/1.1" 1 0 - aws-sdk-java/1.9.27 Mac_OS_X/10.10.5 
> Java_HotSpot(TM)_64-Bit_Server_VM/24.71-b01/1.7.0_71
> 2017-03-01 17:54:58.415890 7f80c0eff700  1 == starting new request 
> req=0x7f80c0ef9340 =
> 2017-03-01 17:54:58.438199 7f80c0eff700  1 == req done req=0x7f80c0ef9340 
> op status=0 http_status=200 ==
> 2017-03-01 17:54:58.438253 7f8

Re: [ceph-users] S3 Multi-part upload broken with newer AWS Java SDK and Kraken RGW

2017-03-01 Thread John Nielsen
Thanks! Changing to V2 auth does indeed work around the issue with the newer 
SDK.

> On Mar 1, 2017, at 12:33 PM, Yehuda Sadeh-Weinraub  wrote:
> 
> This sounds like this bug:
> http://tracker.ceph.com/issues/17076
> 
> Will be fixed in 10.2.6. It's triggered by aws4 auth, so a workaround
> would be to use aws2 instead.
> 
> Yehuda
> 
> 
> On Wed, Mar 1, 2017 at 10:46 AM, John Nielsen  wrote:
>> Hi all-
>> 
>> We use Amazon S3 quite a bit at $WORK but are evaluating Ceph+radosgw as an 
>> alternative for some things. We have an "S3 smoke test" written using the 
>> AWS Java SDK that we use to validate a number of operations. On my Kraken 
>> cluster, multi-part uploads work fine for s3cmd. Our smoke test also passes 
>> fine using version 1.9.27 of the AWS SDK. However in SDK 1.11.69 the 
>> multi-part upload fails. The initial POST (to reserve the object name and 
>> start the upload) succeeds, but the first PUT fails with a 403 error.
>> 
>> So, does anyone know offhand what might be going on here? If not, how can I 
>> get more details about the 403 error and what is causing it?
>> 
>> The cluster was installed with Jewel and recently updated to Kraken. Using 
>> the built-in civetweb server.
>> 
>> Here is the log output for three multi-part uploads. The first two are s3cmd 
>> and the older SDK, respectively. The last is the failing one with the newer 
>> SDK.
>> 
>> S3cmd, Succeeds.
>> 2017-03-01 17:33:16.845613 7f80b06de700  1 == starting new request 
>> req=0x7f80b06d8340 =
>> 2017-03-01 17:33:16.856522 7f80b06de700  1 == req done 
>> req=0x7f80b06d8340 op status=0 http_status=200 ==
>> 2017-03-01 17:33:16.856628 7f80b06de700  1 civetweb: 0x7f81131fd000: 
>> 10.251.50.7 - - [01/Mar/2017:17:33:16 +] "POST 
>> /testdomobucket10x3x104x64250438/multipartStreamTest?uploads HTTP/1.1" 1 0 - 
>> -
>> 2017-03-01 17:33:16.953967 7f80b06de700  1 == starting new request 
>> req=0x7f80b06d8340 =
>> 2017-03-01 17:33:24.094134 7f80b06de700  1 == req done 
>> req=0x7f80b06d8340 op status=0 http_status=200 ==
>> 2017-03-01 17:33:24.094211 7f80b06de700  1 civetweb: 0x7f81131fd000: 
>> 10.251.50.7 - - [01/Mar/2017:17:33:16 +] "PUT 
>> /testdomobucket10x3x104x64250438/multipartStreamTest?partNumber=1&uploadId=2~IGYuZC4uDC27TGWfpFkKk-Makqvk_XB
>>  HTTP/1.1" 1 0 - -
>> 2017-03-01 17:33:24.193747 7f80b06de700  1 == starting new request 
>> req=0x7f80b06d8340 =
>> 2017-03-01 17:33:30.002050 7f80b06de700  1 == req done 
>> req=0x7f80b06d8340 op status=0 http_status=200 ==
>> 2017-03-01 17:33:30.002124 7f80b06de700  1 civetweb: 0x7f81131fd000: 
>> 10.251.50.7 - - [01/Mar/2017:17:33:16 +] "PUT 
>> /testdomobucket10x3x104x64250438/multipartStreamTest?partNumber=2&uploadId=2~IGYuZC4uDC27TGWfpFkKk-Makqvk_XB
>>  HTTP/1.1" 1 0 - -
>> 2017-03-01 17:33:30.085033 7f80b06de700  1 == starting new request 
>> req=0x7f80b06d8340 =
>> 2017-03-01 17:33:30.104944 7f80b06de700  1 == req done 
>> req=0x7f80b06d8340 op status=0 http_status=200 ==
>> 2017-03-01 17:33:30.105007 7f80b06de700  1 civetweb: 0x7f81131fd000: 
>> 10.251.50.7 - - [01/Mar/2017:17:33:16 +] "POST 
>> /testdomobucket10x3x104x64250438/multipartStreamTest?uploadId=2~IGYuZC4uDC27TGWfpFkKk-Makqvk_XB
>>  HTTP/1.1" 1 0 - -
>> 
>> AWS SDK (1.9.27). Succeeds.
>> 2017-03-01 17:54:50.720093 7f80c0eff700  1 == starting new request 
>> req=0x7f80c0ef9340 =
>> 2017-03-01 17:54:50.733109 7f80c0eff700  1 == req done 
>> req=0x7f80c0ef9340 op status=0 http_status=200 ==
>> 2017-03-01 17:54:50.733188 7f80c0eff700  1 civetweb: 0x7f811314c000: 
>> 10.251.50.7 - - [01/Mar/2017:17:54:42 +] "POST 
>> /testdomobucket10x3x104x6443285/multipartStreamTest?uploads HTTP/1.1" 1 0 - 
>> aws-sdk-java/1.9.27 Mac_OS_X/10.10.5 
>> Java_HotSpot(TM)_64-Bit_Server_VM/24.71-b01/1.7.0_71
>> 2017-03-01 17:54:50.831618 7f80c0eff700  1 == starting new request 
>> req=0x7f80c0ef9340 =
>> 2017-03-01 17:54:58.057011 7f80c0eff700  1 == req done 
>> req=0x7f80c0ef9340 op status=0 http_status=200 ==
>> 2017-03-01 17:54:58.057082 7f80c0eff700  1 civetweb: 0x7f811314c000: 
>> 10.251.50.7 - - [01/Mar/2017:17:54:42 +] "PUT 
>> /testdomobucket10x3x104x6443285/multipartStreamTest?uploadId=2%7EPlNR4meSvAvCYtvbqz8JLlSKu5_laxo&partNumber=1
>>  HTTP/1.1" 1 0 - aws-sdk-java/1.9.27 Mac_OS_X/10.10.5 
>> Java_HotSpot(TM)_64-Bit_Server_VM/24.71-b01/1.7.0_71
>> 2017-03-01 17:54:58.143235 7f80c0eff700  1 == starting new request 
>> req=0x7f80c0ef9340 =
>> 2017-03-01 17:54:58.328351 7f80c0eff700  1 == req done 
>> req=0x7f80c0ef9340 op status=0 http_status=200 ==
>> 2017-03-01 17:54:58.328437 7f80c0eff700  1 civetweb: 0x7f811314c000: 
>> 10.251.50.7 - - [01/Mar/2017:17:54:42 +] "PUT 
>> /testdomobucket10x3x104x6443285/multipartStreamTest?uploadId=2%7EPlNR4meSvAvCYtvbqz8JLlSKu5_laxo&partNumber=2
>>  HTTP/1.1" 1 0 - aws-sdk-java/1.9.27 Mac_OS_X/10.10.5 
>> Java_HotSpot(TM)_64-Bit_Server_VM/24.71-b01/1

Re: [ceph-users] How to hide internal ip on ceph mount

2017-03-01 Thread Xiaoxi Chen
>Still applies. Just create a Round Robin DNS record. The clients will
obtain a new monmap while they are connected to the cluster.

It works to some extent, but it causes issues with "mount -a". We have such a
deployment nowadays: a GTM (a kind of DNS) record created with all MDS IPs, and
it works fine in terms of failover / mount.

But users usually automate such mounts via fstab, and "mount -a" may even be
called periodically. With the DNS approach above, they will get a "mount
point busy" message every time, simply because mount.ceph resolves the DNS name
to another IP, and the kernel client thinks you are trying to attach
another fs...
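
For what it's worth, a rough sketch of such an fstab entry (the DNS name,
mountpoint and secret file path are made up):

mons.mydomain.com:6789:/  /mnt/cephfs  ceph  name=admin,secretfile=/etc/ceph/admin.secret,noatime,_netdev  0  0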



2017-03-02 0:29 GMT+08:00 Wido den Hollander :

>
> > Op 1 maart 2017 om 16:57 schreef Sage Weil :
> >
> >
> > On Wed, 1 Mar 2017, Wido den Hollander wrote:
> > > > Op 1 maart 2017 om 15:40 schreef Xiaoxi Chen  >:
> > > >
> > > >
> > > > Well , I think the argument here is not all about security gain, it
> just
> > > > NOT a user friendly way to let "df" show out 7 IPs of
> monitorsMuch
> > > > better if they seeing something like "mycephfs.mydomain.com".
> > > >
> > >
> > > mount / df simply prints the monmap. It doesn't print what you added
> when you mounted the filesystem.
> > >
> > > Totally normal behavior.
> >
> > Yep.  This *could* be changed, though: modern kernels have DNS resolution
> > capability.  Not sure if all distros compile it in, but if so, mount.ceph
> > could first try to pass in the DNS name and only do the DNS resolution if
> > the kernel can't.  And the kernel client could be updated to remember the
> > DNS name and use that.  It's a bit friendlier, but imprecise, since DNS
> > might change.  What does NFS do in this case? (Show an IP or a name?)
> >
>
> A "df" will show the entry as it's in the fstab file, but mount will show
> the IPs as well.
>
> But Ceph is a different story here due to the monmap.
>
> Wido
>
> > sage
> >
> >
> > > > And using DNS give you the flexibility of changing your monitor
> quorum
> > > > members , without notifying end user to change their fstab entry , or
> > > > whatever mount point record.
> > > >
> > >
> > > Still applies. Just create a Round Robin DNS record. The clients will
> obtain a new monmap while they are connected to the cluster.
> > >
> > > Wido
> > >
> > > > 2017-03-01 18:46 GMT+08:00 gjprabu :
> > > >
> > > > > Hi Robert,
> > > > >
> > > > >   This container host will be provided to end user and we don't
> want to
> > > > > expose this ip to end users.
> > > > >
> > > > > Regards
> > > > > Prabu GJ
> > > > >
> > > > >
> > > > >  On Wed, 01 Mar 2017 16:03:49 +0530 *Robert Sander
> > > > > >*
> wrote 
> > > > >
> > > > > On 01.03.2017 10:54, gjprabu wrote:
> > > > > > Hi,
> > > > > >
> > > > > > We try to use host name instead of ip address but mounted partion
> > > > > > showing up address only . How show the host name instead of ip
> address.
> > > > >
> > > > > What is the security gain you try to achieve by hiding the IPs?
> > > > >
> > > > > Regards
> > > > > --
> > > > > Robert Sander
> > > > > Heinlein Support GmbH
> > > > > Schwedter Str. 8/9b, 10119 Berlin
> > > > >
> > > > > http://www.heinlein-support.de
> > > > >
> > > > > Tel: 030 / 405051-43
> > > > > Fax: 030 / 405051-19
> > > > >
> > > > > Zwangsangaben lt. §35a GmbHG:
> > > > > HRB 93818 B / Amtsgericht Berlin-Charlottenburg,
> > > > > Geschäftsführer: Peer Heinlein -- Sitz: Berlin
> > > > >
> > > > > ___
> > > > > ceph-users mailing list
> > > > > ceph-users@lists.ceph.com
> > > > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > > > >
> > > > >
> > > > >
> > > > > ___
> > > > > ceph-users mailing list
> > > > > ceph-users@lists.ceph.com
> > > > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > > > >
> > > > >
> > > > ___
> > > > ceph-users mailing list
> > > > ceph-users@lists.ceph.com
> > > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > > ___
> > > ceph-users mailing list
> > > ceph-users@lists.ceph.com
> > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > >
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to hide internal ip on ceph mount

2017-03-01 Thread Xiaoxi Chen
> mount / df simply prints the monmap. It doesn't print what you added when
you mounted the filesystem.
>
> Totally normal behavior.


Not true again.

df only shows the IP or IPs you added when mounting, and so does mount:

10.189.11.138:6789:/sharefs_prod/8c285b3b59a843b6aab623314288ee36  2.8P
 108T  2.7P   4% /mnt/slc_cephFS_8c285b3b59a843b6aab623314288ee36
10.135.3.136:6789:/sharefs_prod/8c285b3b59a843b6aab623314288ee36   2.7P
91T  2.6P   4% /mnt/lvs_cephFS_8c285b3b59a843b6aab623314288ee36

But we do have 5/7 mons for each cluster.








2017-03-02 7:42 GMT+08:00 Xiaoxi Chen :

> >Still applies. Just create a Round Robin DNS record. The clients will
> obtain a new monmap while they are connected to the cluster.
>
> It works to some extent, but causing issue for "mount -a". We have such
> deployment nowaday, a GTM(kinds of dns) record created with all MDS ips and
> it works fine in terms of failover/ mount.
>
> But, user usually automation such mount by fstab and even, "mount -a " are
> periodically called. With such DNS approach above, they will get mount
> point busy message every time. Just due to mount.ceph resolve the DNS name
> to another IP, and kernel client was feeling like you are trying to attach
> another fs...
>
>
>
> 2017-03-02 0:29 GMT+08:00 Wido den Hollander :
>
>>
>> > Op 1 maart 2017 om 16:57 schreef Sage Weil :
>> >
>> >
>> > On Wed, 1 Mar 2017, Wido den Hollander wrote:
>> > > > Op 1 maart 2017 om 15:40 schreef Xiaoxi Chen <
>> superdebu...@gmail.com>:
>> > > >
>> > > >
>> > > > Well , I think the argument here is not all about security gain, it
>> just
>> > > > NOT a user friendly way to let "df" show out 7 IPs of
>> monitorsMuch
>> > > > better if they seeing something like "mycephfs.mydomain.com".
>> > > >
>> > >
>> > > mount / df simply prints the monmap. It doesn't print what you added
>> when you mounted the filesystem.
>> > >
>> > > Totally normal behavior.
>> >
>> > Yep.  This *could* be changed, though: modern kernels have DNS
>> resolution
>> > capability.  Not sure if all distros compile it in, but if so,
>> mount.ceph
>> > could first try to pass in the DNS name and only do the DNS resolution
>> if
>> > the kernel can't.  And the kernel client could be updated to remember
>> the
>> > DNS name and use that.  It's a bit friendlier, but imprecise, since DNS
>> > might change.  What does NFS do in this case? (Show an IP or a name?)
>> >
>>
>> A "df" will show the entry as it's in the fstab file, but mount will show
>> the IPs as well.
>>
>> But Ceph is a different story here due to the monmap.
>>
>> Wido
>>
>> > sage
>> >
>> >
>> > > > And using DNS give you the flexibility of changing your monitor
>> quorum
>> > > > members , without notifying end user to change their fstab entry ,
>> or
>> > > > whatever mount point record.
>> > > >
>> > >
>> > > Still applies. Just create a Round Robin DNS record. The clients will
>> obtain a new monmap while they are connected to the cluster.
>> > >
>> > > Wido
>> > >
>> > > > 2017-03-01 18:46 GMT+08:00 gjprabu :
>> > > >
>> > > > > Hi Robert,
>> > > > >
>> > > > >   This container host will be provided to end user and we don't
>> want to
>> > > > > expose this ip to end users.
>> > > > >
>> > > > > Regards
>> > > > > Prabu GJ
>> > > > >
>> > > > >
>> > > > >  On Wed, 01 Mar 2017 16:03:49 +0530 *Robert Sander
>> > > > > >*
>> wrote 
>> > > > >
>> > > > > On 01.03.2017 10:54, gjprabu wrote:
>> > > > > > Hi,
>> > > > > >
>> > > > > > We try to use host name instead of ip address but mounted
>> partion
>> > > > > > showing up address only . How show the host name instead of ip
>> address.
>> > > > >
>> > > > > What is the security gain you try to achieve by hiding the IPs?
>> > > > >
>> > > > > Regards
>> > > > > --
>> > > > > Robert Sander
>> > > > > Heinlein Support GmbH
>> > > > > Schwedter Str. 8/9b, 10119 Berlin
>> > > > >
>> > > > > http://www.heinlein-support.de
>> > > > >
>> > > > > Tel: 030 / 405051-43
>> > > > > Fax: 030 / 405051-19
>> > > > >
>> > > > > Zwangsangaben lt. §35a GmbHG:
>> > > > > HRB 93818 B / Amtsgericht Berlin-Charlottenburg,
>> > > > > Geschäftsführer: Peer Heinlein -- Sitz: Berlin
>> > > > >
>> > > > > ___
>> > > > > ceph-users mailing list
>> > > > > ceph-users@lists.ceph.com
>> > > > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> > > > >
>> > > > >
>> > > > >
>> > > > > ___
>> > > > > ceph-users mailing list
>> > > > > ceph-users@lists.ceph.com
>> > > > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> > > > >
>> > > > >
>> > > > ___
>> > > > ceph-users mailing list
>> > > > ceph-users@lists.ceph.com
>> > > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> > > ___
>> > > ceph-users mailing list
>> > > ceph-users@lists.ceph.com
>> > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> > >
>>
>

Re: [ceph-users] How to hide internal ip on ceph mount

2017-03-01 Thread Sage Weil
On Thu, 2 Mar 2017, Xiaoxi Chen wrote:
> >Still applies. Just create a Round Robin DNS record. The clients will
> obtain a new monmap while they are connected to the cluster.
> It works to some extent, but causing issue for "mount -a". We have such
> deployment nowaday, a GTM(kinds of dns) record created with all MDS ips and
> it works fine in terms of failover/ mount.
> 
> But, user usually automation such mount by fstab and even, "mount -a " are
> periodically called. With such DNS approach above, they will get mount point
> busy message every time. Just due to mount.ceph resolve the DNS name to
> another IP, and kernel client was feeling like you are trying to attach
> another fs...

The kernel client is (should be!) smart enough to tell that it is the same 
mount point and will share the superblock.  If you see a problem here it's 
a bug.

sage


> 
> 
> 2017-03-02 0:29 GMT+08:00 Wido den Hollander :
> 
>   > Op 1 maart 2017 om 16:57 schreef Sage Weil
>   :
>   >
>   >
>   > On Wed, 1 Mar 2017, Wido den Hollander wrote:
>   > > > Op 1 maart 2017 om 15:40 schreef Xiaoxi Chen
>   :
>   > > >
>   > > >
>   > > > Well , I think the argument here is not all about security
>   gain, it just
>   > > > NOT a user friendly way to let "df" show out 7 IPs of
>   monitorsMuch
>   > > > better if they seeing something like
>   "mycephfs.mydomain.com".
>   > > >
>   > >
>   > > mount / df simply prints the monmap. It doesn't print what
>   you added when you mounted the filesystem.
>   > >
>   > > Totally normal behavior.
>   >
>   > Yep.  This *could* be changed, though: modern kernels have DNS
>   resolution
>   > capability.  Not sure if all distros compile it in, but if so,
>   mount.ceph
>   > could first try to pass in the DNS name and only do the DNS
>   resolution if
>   > the kernel can't.  And the kernel client could be updated to
>   remember the
>   > DNS name and use that.  It's a bit friendlier, but imprecise,
>   since DNS
>   > might change.  What does NFS do in this case? (Show an IP or a
>   name?)
>   >
> 
>   A "df" will show the entry as it's in the fstab file, but mount
>   will show the IPs as well.
> 
>   But Ceph is a different story here due to the monmap.
> 
>   Wido
> 
>   > sage
>   >
>   >
>   > > > And using DNS give you the flexibility of changing your
>   monitor quorum
>   > > > members , without notifying end user to change their fstab
>   entry , or
>   > > > whatever mount point record.
>   > > >
>   > >
>   > > Still applies. Just create a Round Robin DNS record. The
>   clients will obtain a new monmap while they are connected to the
>   cluster.
>   > >
>   > > Wido
>   > >
>   > > > 2017-03-01 18:46 GMT+08:00 gjprabu :
>   > > >
>   > > > > Hi Robert,
>   > > > >
>   > > > >   This container host will be provided to end user and
>   we don't want to
>   > > > > expose this ip to end users.
>   > > > >
>   > > > > Regards
>   > > > > Prabu GJ
>   > > > >
>   > > > >
>   > > > >  On Wed, 01 Mar 2017 16:03:49 +0530 *Robert Sander
>   > > > >>* wrote 
>   > > > >
>   > > > > On 01.03.2017 10:54, gjprabu wrote:
>   > > > > > Hi,
>   > > > > >
>   > > > > > We try to use host name instead of ip address but
>   mounted partion
>   > > > > > showing up address only . How show the host name
>   instead of ip address.
>   > > > >
>   > > > > What is the security gain you try to achieve by hiding
>   the IPs?
>   > > > >
>   > > > > Regards
>   > > > > --
>   > > > > Robert Sander
>   > > > > Heinlein Support GmbH
>   > > > > Schwedter Str. 8/9b, 10119 Berlin
>   > > > >
>   > > > > http://www.heinlein-support.de
>   > > > >
>   > > > > Tel: 030 / 405051-43
>   > > > > Fax: 030 / 405051-19
>   > > > >
>   > > > > Zwangsangaben lt. §35a GmbHG:
>   > > > > HRB 93818 B / Amtsgericht Berlin-Charlottenburg,
>   > > > > Geschäftsführer: Peer Heinlein -- Sitz: Berlin
>   > > > >
>   > > > > ___
>   > > > > ceph-users mailing list
>   > > > > ceph-users@lists.ceph.com
>   > > > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>   > > > >
>   > > > >
>   > > > >
>   > > > > ___
>   > > > > ceph-users mailing list
>   > > > > ceph-users@lists.ceph.com
>   > > > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>   > > > >
>   > > > >
>   > > > ___
>   > > > ceph-users mailing list
>   > > > ceph-users@lists.ceph.com
>   > > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>   >

Re: [ceph-users] ceph-users Digest, Vol 50, Issue 1

2017-03-01 Thread Jon Wright

Thank you for your response. :)

Version was Jewel - 10.2.2.  And, yes I did restart the monitors with no 
change in results.



For the record, here's the problem.  It was a multi-pool cluster, and 
the CRUSH rules had an inappropriately large number for the step 
chooseleaf line.  I won't get into details because it would raise more 
questions than it would answer.  But enough OSDs went down to result in 
there being no solution for some PG groups.  Nevertheless the monitors 
continued searching for a solution (based on the chooseleaf parameter 
applied to each placement group).
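
For context, a typical replicated rule uses a chooseleaf step like the
following (a generic sketch, not this cluster's actual rule):

step chooseleaf firstn 0 type host    # 0 means "as many leaves as the pool size"

Hard-coding a much larger number there makes CRUSH retry far more often per
placement group, which is consistent with the monitors spending their time in
crush calculations.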



The monitors were spending all their time running crush and then calling 
an election when a timeout expired.



That's the cause of the problem.   Maybe I'll post a solution if and 
when we get out of this state.



Sadly, my fault.  It might be nice to get a warning when you try to do 
something really stupid like that.



On 03/01/2017 03:02 PM, ceph-users-requ...@lists.ceph.com wrote:


Date: Tue, 28 Feb 2017 23:52:26 +
From: Joao Eduardo Luis
To:ceph-users@lists.ceph.com
Subject: Re: [ceph-users] monitors at 100%; cluster out of service
Message-ID:
Content-Type: text/plain; charset=windows-1252; format=flowed

On 02/28/2017 09:53 PM, WRIGHT, JON R (JON R) wrote:

>I currently have a situation where the monitors are running at 100% CPU,
>and can't run any commands because authentication times out after 300
>seconds.
>
>I stopped the leader, and the resulting election picked a new leader,
>but that monitor shows exactly the same behavor.
>
>Now both monitors *think* they are the leader and call new elections
>against the third monitor, both winning each time.   Essentially they
>alternate between calling an election (which they win) and then pegging
>one of the CPUs at 100%.
>
>strace suggests that the monitor daemons are spending the "pegged" time
>in user space, and attaching a debugger to the running process suggests
>that the monitor is spending its time doing crushmap calculations in
>fn_monstore.
>
>Setting paxos_debug to 10 produces this log message:
>
>2017-02-28 16:50:49.503712 7f218ccd4700  7
>mon.hlxkvm001-storage@0(leader).paxosservice(osdmap 1252..1873) _active
>creating new pending
>
>during the time when the monitor process is pegged at 100%.
>
>The problem started when one of the hosts running a peon was rebooted,
>but didn't have the correct mtu setting in /etc/network/interfaces.
>The problem showed up after correcting the mtu value.
>
>Also, we are using a hyperconverged architecture where the same host
>runs a monitor and multiple OSDs.
>
>Any thoughts on recovery would be greatly appreciated.

What version is this?

How many monitors are you running?

Are the monitors consuming an unusual amount of memory? What about the
OSDs in the same nodes?

Is the size of the monitor stores abnormally high?

Have you tried restarting all monitors and see if they hit the same issue?

-Joao


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Fwd: Re: ceph-users Digest, Vol 50, Issue 1

2017-03-01 Thread song.baisen
Hi! Maybe you should check your network. Is the network between your mons OK?
Also, do you only run Ceph on the mon hosts, or are there other programs
running and wasting CPU on those hosts? If that is all OK, you can try
extending the lease expire time for the mons; this may reduce the number of
new elections for each mon.
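
If you go that route, a rough sketch of the ceph.conf tweak (the value here is
only an example; the default lease is 5 seconds):

[mon]
    mon lease = 10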

Original Message

From: <jonrodwri...@gmail.com>
To: <ceph-users@lists.ceph.com>
Date: 2017-03-02 08:32
Subject: Re: [ceph-users] ceph-users Digest, Vol 50, Issue 1





Thank you for your response. :)

Version was Jewel - 10.2.2.  And, yes I did restart the monitors with no
change in results.

For the record, here's the problem.  It was a multi-pool cluster, and
the CRUSH rules had an inappropriately large number for the step
chooseleaf line.  I won't get into details because it would raise more
questions than it would answer.  But enough OSDs went down to result in
there being no solution for some PG groups.  Nevertheless the monitors
continued searching for a solution (based on the chooseleaf parameter
applied to each placement group).

The monitors were spending all their time running crush and then calling
an election when a timeout expired.

That's the cause of the problem.  Maybe I'll post a solution if and when
we get out of this state.

Sadly, my fault.  It might be nice to get a warning when you try to do
something really stupid like that.

On 03/01/2017 03:02 PM, ceph-users-requ...@lists.ceph.com wrote:


Date: Tue, 28 Feb 2017 23:52:26 +
From: Joao Eduardo Luis <j...@suse.de>
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] monitors at 100%; cluster out of service
Message-ID: <d7904f61-63fc-e59c-e31e-791dd4e05...@suse.de>
Content-Type: text/plain; charset=windows-1252; format=flowed

On 02/28/2017 09:53 PM, WRIGHT, JON R (JON R) wrote:

> I currently have a situation where the monitors are running at 100% CPU,
> and can't run any commands because authentication times out after 300
> seconds.
>
> I stopped the leader, and the resulting election picked a new leader,
> but that monitor shows exactly the same behavor.
>
> Now both monitors *think* they are the leader and call new elections
> against the third monitor, both winning each time.   Essentially they
> alternate between calling an election (which they win) and then pegging
> one of the CPUs at 100%.
>
> strace suggests that the monitor daemons are spending the "pegged" time
> in user space, and attaching a debugger to the running process suggests
> that the monitor is spending its time doing crushmap calculations in
> fn_monstore.
>
> Setting paxos_debug to 10 produces this log message:
>
> 2017-02-28 16:50:49.503712 7f218ccd4700  7
> mon.hlxkvm001-storage@0(leader).paxosservice(osdmap 1252..1873) _active
> creating new pending
>
> during the time when the monitor process is pegged at 100%.
>
> The problem started when one of the hosts running a peon was rebooted,
> but didn't have the correct mtu setting in /etc/network/interfaces.
> The problem showed up after correcting the mtu value.
>
> Also, we are using a hyperconverged architecture where the same host
> runs a monitor and multiple OSDs.
>
> Any thoughts on recovery would be greatly appreciated.

What version is this?

How many monitors are you running?

Are the monitors consuming an unusual amount of memory? What about the
OSDs in the same nodes?

Is the size of the monitor stores abnormally high?

Have you tried restarting all monitors and see if they hit the same issue?

-Joao
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Not able to map a striped RBD image - Format 2

2017-03-01 Thread Daleep Singh Bais
Hi,

I am using Ceph Jewel, version 10.2.2, and trying to map an RBD image
which has stripe parameters, to test performance. However, when I try to
map it, I get "rbd: map failed: (22) Invalid argument". Please
confirm whether we cannot use the stripe parameters with this version and
need to go with the base stripe parameters as in Format 1.

I have gone through earlier mails and the issue was due to striping not
being supported.

~# ceph --version
ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)

root@ceph-host:~# rbd info Stripetest2 -p test-pool
rbd image 'Stripetest2':
size 500 GB in 128112 objects
order 22 (4096 kB objects)
block_name_prefix: rbd_data.fd3ca238e1f29
format: 2
features: layering, striping
flags:
stripe unit: 1024 kB
stripe count: 16
root@ceph-host:~# rbd map Stripetest2 -p test-pool
rbd: sysfs write failed
In some cases useful info is found in syslog - try "dmesg | tail" or so.
rbd: map failed: (22) Invalid argument

Any information about this would be appreciated.

Thanks,

Daleep Singh Bais
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Not able to map a striped RBD image - Format 2

2017-03-01 Thread Jason Dillaman
On Wed, Mar 1, 2017 at 8:23 PM, Daleep Singh Bais  wrote:
> Hi,
>
> I am using Ceph Jewel, version 10.2.2 and trying to map a RBD image
> which has stripe parameters to test performance, however, when I try to
> mount it, i get "rbd: map failed: (22) Invalid argument" . Please
> confirm if we cannot use the stripe parameters with this version as well
> and need to go with base stripe parameters as in Format 1.

"Fancy" striping is not currently supported by krbd. Note image format
v1 is (1) deprecated and (2) effectively laid out in the same basic
striping pattern as default v2 images (i.e. stripe unit == object size
and stripe count == 1).
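
For comparison, a rough sketch of creating an image with the default layout
that krbd can map (pool/image name and size are made up; size is in MB):

rbd create NoFancyStripe -p test-pool --size 512000 --image-format 2 --image-feature layering
rbd map NoFancyStripe -p test-pool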

> I have gone through earlier mails and the issue was due to striping not
> being supported.
>
> ~# ceph --version
> ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
>
> root@ceph-host:~# rbd info Stripetest2 -p test-pool
> rbd image 'Stripetest2':
> size 500 GB in 128112 objects
> order 22 (4096 kB objects)
> block_name_prefix: rbd_data.fd3ca238e1f29
> format: 2
> features: layering, striping
> flags:
> stripe unit: 1024 kB
> stripe count: 16
> root@ceph-host:~# rbd map Stripetest2 -p test-pool
> rbd: sysfs write failed
> In some cases useful info is found in syslog - try "dmesg | tail" or so.
> rbd: map failed: (22) Invalid argument
>
> Any information about this would be appreciated.
>
> Thanks,
>
> Daleep Singh Bais
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
Jason
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Hammer update

2017-03-01 Thread Sasha Litvak
Hello everyone,

The Hammer 0.94.10 update was announced on the blog a week ago.  However, there
are no packages available for either version of Red Hat.  Can someone tell
me what is going on?
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Slow request log format, negative IO size?

2017-03-01 Thread Stephen Blinick
Hello, I'm chasing down a situation where there are periodic slow requests
occurring.  While the specific version in this case is 0.80.7 Firefly, I
think this log format is the same in newer versions.  I can verify.

There's a host of symptoms going on, but one strange anomaly I found that I
wasn't able to chase down or find in search has to do with the Op
parameters for an osd_op being logged as a slow request.

Specifically, in some cases the byte range for an op of various types (e.g.
read, writefull) appears to be negative.  Here's an example of two slow
request log entries:

#1
2017-02-28 18:39:09.943169 osd.27 10.0.1.84:6822/2402845 5255 : [WRN] slow
request 16.440574 seconds old, received at 2017-02-28 18:38:53.502539:
osd_op(client.2000529.0:496 ObjectNameOne [writefull 0~4194304] 3.3f26fcc9
ondisk+write e691) v4 currently commit sent

#2
2017-02-28 18:39:05.959253 osd.40 10.0.1.88:6831/2187230 6180 : [WRN] slow
request 8.470175 seconds old, received at 2017-02-28 18:38:57.489045:
osd_op(client.1941470.0:21164213 ObjectNameTwo [read 3670016~524288]
3.3a50c331 ack+read e691) v4 currently started

As you can see, some of them show the byte range A~B where B is lower than
A.  I'm mostly interested to find out if this is an indication of any
problem.   This is an EC pool, 3+2.

Thanks,

Stephen
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Slow request log format, negative IO size?

2017-03-01 Thread Sage Weil
On Wed, 1 Mar 2017, Stephen Blinick wrote:
> Hello, I'm chasing down a situation where there's periodic slow requests
> occurring.   While the specific version in this case is 0.80.7 Firefly, I
> think this log format is the same in newer versions.  I can verify.
> There's a host of symptoms going on, but one strange anomaly I found that I
> wasn't able to chase down or find in search has to do with the Op parameters
> for an osd_op being logged as a slow request. 
> 
> Specifically, in some cases the byte range for an op of various types (i.e.
> read, writefull) is sometimes negative.  Here's an example of two slow
> request log entries:
> 
> #1
> 2017-02-28 18:39:09.943169 osd.27 10.0.1.84:6822/2402845 5255 : [WRN] slow
> request 16.440574 seconds old, received at 2017-02-28 18:38:53.502539:
> osd_op(client.2000529.0:496 ObjectNameOne [writefull 0~4194304] 3.3f26fcc9
> ondisk+write e691) v4 currently commit sent
> 
> #2
> 2017-02-28 18:39:05.959253 osd.40 10.0.1.88:6831/2187230 6180 : [WRN] slow
> request 8.470175 seconds old, received at 2017-02-28 18:38:57.489045:
> osd_op(client.1941470.0:21164213 ObjectNameTwo [read 3670016~524288]
> 3.3a50c331 ack+read e691) v4 currently started
> 
> As you can see, some of them show the byte range A~B where B is lower than
> A.  I'm mostly interested to find out if this is an indication of any

This is quirky Ceph convention for printing extents as offset~length (it's 
not start~end).  So these look fine.
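
(In other words, the second entry above, [read 3670016~524288], is a
524288-byte read starting at offset 3670016, i.e. it covers bytes 3670016
through 4194303.)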

Firefly 0.80.7?  You should really upgrade.  Twice (to hammer and then to 
jewel).

sage

> problem.   This is an EC pool, 3+2.
> 
> Thanks,
> 
> Stephen 
> 
> ___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Slow request log format, negative IO size?

2017-03-01 Thread Stephen Blinick
On Wed, Mar 1, 2017 at 11:34 PM, Sage Weil  wrote:

> On Wed, 1 Mar 2017, Stephen Blinick wrote:
> > Hello, I'm chasing down a situation where there's periodic slow requests
> > occurring.   While the specific version in this case is 0.80.7 Firefly, I
> > think this log format is the same in newer versions.  I can verify.
> > There's a host of symptoms going on, but one strange anomaly I found
> that I
> > wasn't able to chase down or find in search has to do with the Op
> parameters
> > for an osd_op being logged as a slow request.
> >
> > Specifically, in some cases the byte range for an op of various types
> (i.e.
> > read, writefull) is sometimes negative.  Here's an example of two slow
> > request log entries:
> >
> > #1
> > 2017-02-28 18:39:09.943169 osd.27 10.0.1.84:6822/2402845 5255 : [WRN]
> slow
> > request 16.440574 seconds old, received at 2017-02-28 18:38:53.502539:
> > osd_op(client.2000529.0:496 ObjectNameOne [writefull 0~4194304]
> 3.3f26fcc9
> > ondisk+write e691) v4 currently commit sent
> >
> > #2
> > 2017-02-28 18:39:05.959253 osd.40 10.0.1.88:6831/2187230 6180 : [WRN]
> slow
> > request 8.470175 seconds old, received at 2017-02-28 18:38:57.489045:
> > osd_op(client.1941470.0:21164213 ObjectNameTwo [read 3670016~524288]
> > 3.3a50c331 ack+read e691) v4 currently started
> >
> > As you can see, some of them show the byte range A~B where B is lower
> than
> > A.  I'm mostly interested to find out if this is an indication of any
>
> This is quirky Ceph convention for printing extents as offset~length (it's
> not start~end).  So these look fine.
>

Ahh I should have guessed.  This makes a lot more sense. I'll update the
parser accordingly.  Thanks!

>
> Firefly 0.80.7?  You should really upgrade.  Twice (to hammer and then to
> jewel).
>
>
Indeed!  We already have for the most part, but this system is in
production, so that always adds friction to any upgrade.  Definitely makes
problem debug more of a 'forensic' exercise :)


> sage
>
> > problem.   This is an EC pool, 3+2.
> >
> > Thanks,
> >
> > Stephen
> >
> >
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] "STATE_CONNECTING_WAIT_BANNER_AND_IDENTIFY" showing in ceph -s

2017-03-01 Thread nokia ceph
Thanks Greg for the info.

As per our testing, we fixed this warning by disabling the ceph-mgr
service on all the ceph nodes. If the warning still persisted, we went to the
last ceph node of the cluster and tried starting and stopping the ceph-mgr
service, as this operation solved the issue.

Do you have any other suggestion on how to get rid of this warning?

Thanks


On Mon, Feb 27, 2017 at 8:47 PM, Gregory Farnum  wrote:

> On Sun, Feb 26, 2017 at 10:41 PM, nokia ceph 
> wrote:
> > Hello,
> >
> > On a fresh installation ceph kraken 11.2.0 , we are facing below error in
> > the "ceph -s" output.
> >
> > 
> > 0 -- 10.50.62.152:0/675868622 >> 10.50.62.152:6866/13884
> conn(0x7f576c002750
> > :-1 s=STATE_CONNECTING_WAIT_BANNER_AND_IDENTIFY pgs=0 cs=0
> > l=1)._process_connection connect claims to be 10.50.62.152:6866/1244305
> not
> > 10.50.62.152:6866/13884 - wrong node!
> > 
>
> As you see when comparing addresses, they differ only at the end, in
> what we call the nonce. This most commonly just means that one end or
> the other has a newer osd map epoch indicating the OSD went down and
> it restarted itself. If it persists once they've all finished their
> startup work, you may have an issue with your network config or
> something.
> -Greg
>
> >
> > May I know under what scenerio the above message will prompt in the
> screen.
> > Also let me know what is the impact of this message.
> >
> > I suspect this message raised because of something wrong with the OSD
> > creation.
> >
> > Env:-
> > Kraken - 11.2.0 , 4 node , 3 mon
> > RHEL 7.2
> > EC 3+1 , 68 disks , bluestore
> >
> > Please suggest how to remove or skip these errors.
> > FYI -
> > https://github.com/ceph/ceph/blob/master/src/msg/async/
> AsyncConnection.h#L237
> >
> > Thanks
> >
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com