Re: [ceph-users] Cache tiering and target_max_bytes

2014-08-18 Thread Paweł Sadowski
On 08/14/2014 10:30 PM, Sage Weil wrote:
 On Thu, 14 Aug 2014, Paweł Sadowski wrote:
 On 14.08.2014 17:20, Sage Weil wrote:
 On Thu, 14 Aug 2014, Paweł Sadowski wrote:
 Hello,

 I've a cluster of 35 OSD (30 HDD, 5 SSD) with cache tiering configured.
 During tests it looks like ceph is not respecting target_max_bytes
 settings. Steps to reproduce:
  - configure cache tiering
  - set target_max_bytes to 32G (on hot pool)
  - write more than 32G of data
  - nothing happens
 <snip details>

 The reason the agent isn't doing any work is because you don't have 
 hit_set_* configured for the cache pool, which means the cluster isn't 
 tracking what objects get read to inform the flush/evict 
 decisions.  Configuring that will fix this.  Try

  ceph osd pool set cache hit_set_type bloom
  ceph osd pool set cache hit_set_count 8
  ceph osd pool set cache hit_set_period 3600

 or similar.

 The agent could still run in a brain-dead mode without it, but it suffers 
 from the bug you found.  That was fixed after 0.80.5 and will be in 
 0.80.6.
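For reference, a fuller sequence for setting up a cache pool along these lines
might look like the following sketch. The pool names 'cold' and 'hot' and the
thresholds are placeholders, not values from this cluster:

 ceph osd tier add cold hot
 ceph osd tier cache-mode hot writeback
 ceph osd tier set-overlay cold hot
 ceph osd pool set hot hit_set_type bloom
 ceph osd pool set hot hit_set_count 8
 ceph osd pool set hot hit_set_period 3600
 ceph osd pool set hot target_max_bytes 34359738368   # 32G
 ceph osd pool set hot cache_target_dirty_ratio 0.4
 ceph osd pool set hot cache_target_full_ratio 0.8

The hit_set_* settings give the tiering agent the access statistics it needs,
while target_max_bytes and the two ratios tell it when to start flushing and
evicting.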
Thanks!

PS
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] pools with latest master

2014-08-18 Thread Varada Kari
Yes, these are recent changes from John. Because of these changes:


commit 90e6daec9f3fe2a3ba051301ee50940278ade18b
Author: John Spray john.sp...@inktank.com
Date:   Tue Apr 29 15:39:45 2014 +0100

osdmap: Don't create FS pools by default

Because many Ceph users don't use the filesystem,
don't create the 'data' and 'metadata' pools by
default -- they will be created by newfs if
they are needed.

Signed-off-by: John Spray john.sp...@inktank.com

commit 7294e8c4df6df9d0898f82bb6e0839ed98149310
Author: John Spray john.sp...@inktank.com
Date:   Tue May 27 11:04:43 2014 +0100

test/qa: update for MDSMonitor changes

Accomodate changes:
 * data and metadata pools no longer exist by default
 * filesystem-using tests must use `fs new` to create
   the filesystem first.

Signed-off-by: John Spray john.sp...@inktank.com
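
So with current master the CephFS pools and the filesystem itself have to be
created explicitly before an MDS can be used. A minimal sketch (pool names and
PG counts here are arbitrary examples):

 ceph osd pool create cephfs_data 128
 ceph osd pool create cephfs_metadata 128
 ceph fs new cephfs cephfs_metadata cephfs_data

Clusters that only use RBD or RGW don't need to do anything.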


Varada

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
Somnath Roy
Sent: Saturday, August 16, 2014 3:19 AM
To: ceph-users@lists.ceph.com; ceph-de...@vger.kernel.org
Subject: [ceph-users] pools with latest master

Hi,
I have created a single node/single osd cluster with the latest master 
for some experiments and saw that it is creating only the rbd pool by default, 
not the data/metadata pools. Is this something that changed recently?

Thanks & Regards
Somnath



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph cluster inconsistency?

2014-08-18 Thread Kenneth Waegeman

Hi,

I tried this after restarting the osd, but I guess that was not the aim
(
# ceph-kvstore-tool /var/lib/ceph/osd/ceph-67/current/ list  
_GHOBJTOSEQ_| grep 6adb1100 -A 100
IO error: lock /var/lib/ceph/osd/ceph-67/current//LOCK: Resource  
temporarily unavailable
tools/ceph_kvstore_tool.cc: In function 'StoreTool::StoreTool(const  
string&)' thread 7f8fecf7d780 time 2014-08-18 11:12:29.551780

tools/ceph_kvstore_tool.cc: 38: FAILED assert(!db_ptr->open(std::cerr))
..
)

When I run it after bringing the osd down, it takes a while, but it  
has no output.. (When running it without the grep, I'm getting a huge  
list )


Or should I run this immediately after the osd has crashed (because it  
may have been rebalanced? I have already restarted the cluster)



I don't know if it is related, but before I could do all that, I had  
to fix something else: a monitor ran out of disk space, using 8GB  
for its store.db folder (lots of sst files). Other monitors are also  
near that level.
Never had that problem on previous setups before. I recreated the  
monitor and now it uses 3.8GB.


Thanks!

Kenneth



- Message from Sage Weil sw...@redhat.com -
   Date: Fri, 15 Aug 2014 06:10:34 -0700 (PDT)
   From: Sage Weil sw...@redhat.com
Subject: Re: [ceph-users] ceph cluster inconsistency?
 To: Haomai Wang haomaiw...@gmail.com
 Cc: Kenneth Waegeman kenneth.waege...@ugent.be,  
ceph-users@lists.ceph.com




On Fri, 15 Aug 2014, Haomai Wang wrote:

Hi Kenneth,

I don't find valuable info in your logs; they lack the necessary
debug output from when the crash code is hit.

But I scanned the encode/decode implementation in GenericObjectMap and
found something bad.

For example, two oids have the same hash and their names are:
A: rb.data.123
B: rb-123

At the ghobject_t compare level, A > B. But GenericObjectMap encodes "." to
"%e", so the keys in the DB are:
A: _GHOBJTOSEQ_:blah!51615000!!none!!rb%edata%e123!head
B: _GHOBJTOSEQ_:blah!51615000!!none!!rb-123!head

and there A < B.

And it seemed that the escape function is useless and should be disabled.
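
(The flip is easy to see with a plain byte-wise sort, which is how leveldb
orders its keys. Unescaped, 'rb-123' sorts before 'rb.data.123'; after
escaping, the order of the two objects reverses:

 $ printf '%s\n' 'rb.data.123' 'rb-123' | LC_ALL=C sort
 rb-123
 rb.data.123
 $ printf '%s\n' 'rb%edata%e123' 'rb-123' | LC_ALL=C sort
 rb%edata%e123
 rb-123

so the on-disk key order no longer matches the ghobject_t order the code
assumes.)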

I'm not sure whether Kenneth's problem is hitting this bug, because
this scenario only occurs when the object set is very large, making
two objects share the same hash value.

Kenneth, could you find time to run ceph-kvstore-tool [path-to-osd] list
_GHOBJTOSEQ_ | grep 6adb1100 -A 100? ceph-kvstore-tool is a debug tool
which can be compiled from source. You can clone the ceph repo and run
./autogen.sh; ./configure; cd src; make ceph-kvstore-tool.
path-to-osd should be /var/lib/ceph/osd-[id]/current/. 6adb1100
is from your verbose log, and the next 100 rows should give the necessary
info.


You can also get ceph-kvstore-tool from the 'ceph-tests' package.


Hi Sage, do you think we need to provide an upgrade function to fix it?


Hmm, we might.  This only affects the key/value encoding right?  The
FileStore is using its own function to map these to file names?

Can you open a ticket in the tracker for this?

Thanks!
sage




On Thu, Aug 14, 2014 at 7:36 PM, Kenneth Waegeman
kenneth.waege...@ugent.be wrote:


 - Message from Haomai Wang haomaiw...@gmail.com -
Date: Thu, 14 Aug 2014 19:11:55 +0800

From: Haomai Wang haomaiw...@gmail.com
 Subject: Re: [ceph-users] ceph cluster inconsistency?
  To: Kenneth Waegeman kenneth.waege...@ugent.be


 Could you add config debug_keyvaluestore = 20/20 to the crashed osd
 and replay the command causing crash?

 I would like to get more debug infos! Thanks.


 I included the log in attachment!
 Thanks!


 On Thu, Aug 14, 2014 at 4:41 PM, Kenneth Waegeman
 kenneth.waege...@ugent.be wrote:


 I have:
 osd_objectstore = keyvaluestore-dev

 in the global section of my ceph.conf


 [root@ceph002 ~]# ceph osd erasure-code-profile get profile11
 directory=/usr/lib64/ceph/erasure-code
 k=8
 m=3
 plugin=jerasure
 ruleset-failure-domain=osd
 technique=reed_sol_van

 the ecdata pool has this as profile

 pool 3 'ecdata' erasure size 11 min_size 8 crush_ruleset 2 object_hash
 rjenkins pg_num 128 pgp_num 128 last_change 161 flags hashpspool
 stripe_width 4096

 ECrule in crushmap

 rule ecdata {
 ruleset 2
 type erasure
 min_size 3
 max_size 20
 step set_chooseleaf_tries 5
 step take default-ec
 step choose indep 0 type osd
 step emit
 }
 root default-ec {
 id -8   # do not change unnecessarily
 # weight 140.616
 alg straw
 hash 0  # rjenkins1
 item ceph001-ec weight 46.872
 item ceph002-ec weight 46.872
 item ceph003-ec weight 46.872
 ...

 Cheers!
 Kenneth

 - Message from Haomai Wang haomaiw...@gmail.com -
Date: Thu, 14 Aug 2014 10:07:50 +0800
From: Haomai Wang haomaiw...@gmail.com
 Subject: Re: [ceph-users] ceph cluster inconsistency?
  To: Kenneth Waegeman kenneth.waege...@ugent.be
  Cc: ceph-users ceph-users@lists.ceph.com



 Hi Kenneth,

 Could you give your configuration related to EC and KeyValueStore?
 Not sure 

Re: [ceph-users] RadosGW problems

2014-08-18 Thread Marco Garcês
Hi there,

I have "FastCgiWrapper Off" in the fastcgi.conf file; I also have SELinux in
permissive state; 'ps aux | grep rados' shows me radosgw is running.

The problem stays the same... I can log in with S3 credentials and create
buckets, but uploads write this in the logs:
[Mon Aug 18 12:00:28.636378 2014] [:error] [pid 11251] [client
10.5.1.1:49680] FastCGI: comm with server /var/www/cgi-bin/s3gw.fcgi
aborted: idle timeout (3
0 sec)
[Mon Aug 18 12:00:28.676825 2014] [:error] [pid 11251] [client
10.5.1.1:49680] FastCGI: incomplete headers (0 bytes) received from server
/var/www/cgi-bin/s3
gw.fcgi

When I try Swift credentials, I cannot log in at all. I have tested both
Cyberduck and the Swift client on the command line, and I always get this in
the logs:
GET /v1.0 HTTP/1.1 404 78 - Cyberduck/4.5 (Mac OS X/10.9.3) (x86_64)
GET /v1.0 HTTP/1.1 404 78 - python-swiftclient-2.2.0
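
(A 404 on GET /v1.0 from radosgw usually means the client is not hitting the
Swift auth entry point -- by default radosgw serves it under /auth, e.g.
https://gateway.testes.local/auth/v1.0 -- and/or that no Swift subuser and key
exist for the user. A sketch, assuming the S3 user is 'teste':

 radosgw-admin subuser create --uid=teste --subuser=teste:swift --access=full
 radosgw-admin key create --subuser=teste:swift --key-type=swift --gen-secret
 swift -A https://gateway.testes.local/auth/v1.0 -U teste:swift -K SWIFT_SECRET_KEY list

where SWIFT_SECRET_KEY is the key printed by the second command.)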

In S3 login, when I upload a file, I can see it almost at 100% complete,
but then it fails with the above errors.

A strange thing is... the /var/log/ceph/client.radosgw.gateway.log is not
getting updated, I don't see any new logs in there.

Thank you once again for your help, Marco Garcês


*Marco Garcês*
*#sysadmin*
Maputo - Mozambique
*[Phone]* +258 84 4105579
*[Skype]* marcogarces


On Mon, Aug 18, 2014 at 12:08 AM, Linux Chips linux.ch...@gmail.com wrote:

 On Mon 18 Aug 2014 12:45:33 AM AST, Bachelder, Kurt wrote:

 Hi Marco –

 In CentOS 6, you also had to edit /etc/httpd/conf.d/fastcgi.conf to
 turn OFF the fastcgi wrapper.  I haven’t tested in v7 yet, but I’d
 guess it’s required there too:

 # wrap all fastcgi script calls in suexec

 FastCgiWrapper Off

 Give that a try, if you haven’t already – restart httpd and
 ceph-radosgw afterward.

 Kurt

 *From:*ceph-users [mailto:ceph-users-boun...@lists.ceph.com] *On
 Behalf Of *Marco Garcês
 *Sent:* Friday, August 15, 2014 12:46 PM
 *To:* ceph-users@lists.ceph.com
 *Subject:* [ceph-users] RadosGW problems


 Hi there,

 I am using CentOS 7 with Ceph version 0.80.5
 (38b73c67d375a2552d8ed67843c8a65c2c0feba6), 3 OSD, 3 MON, 1 RadosGW
 (which also serves as ceph-deploy node)

 I followed all the instructions in the docs, regarding setting up a
 basic Ceph cluster, and then followed the one to setup RadosGW.

 I can't seem to use the Swift interface, and the S3 interface, times
 out after 30 seconds.

 [Fri Aug 15 18:25:33.290877 2014] [:error] [pid 6197] [client
 10.5.5.222:58051] FastCGI: comm with server
 /var/www/cgi-bin/s3gw.fcgi aborted: idle timeout (30 sec)

 [Fri Aug 15 18:25:33.291781 2014] [:error] [pid 6197] [client
 10.5.5.222:58051] FastCGI: incomplete
 headers (0 bytes) received from server /var/www/cgi-bin/s3gw.fcgi

 *My ceph.conf:*


 [global]

 fsid = 581bcd61-8760-4756-a7c8-e8275c0957ad

 mon_initial_members = CEPH01, CEPH02, CEPH03

 mon_host = 10.2.27.81,10.2.27.82,10.2.27.83

 public network = 10.2.27.0/25


 auth_cluster_required = cephx

 auth_service_required = cephx

 auth_client_required = cephx

 filestore_xattr_use_omap = true

 osd pool default size = 2

 osd pool default pg num = 333

 osd pool default pgp num = 333

 osd journal size = 1024

 [client.radosgw.gateway]

 host = GATEWAY

 keyring = /etc/ceph/ceph.client.radosgw.keyring

 rgw socket path = /var/run/ceph/ceph.radosgw.gateway.fastcgi.sock

 log file = /var/log/ceph/client.radosgw.gateway.log

 rgw print continue = false

 rgw enable ops log = true

 *My apache rgw.conf:*


 FastCgiExternalServer /var/www/cgi-bin/s3gw.fcgi -socket
 /var/run/ceph/ceph.radosgw.gateway.fastcgi.sock

 <VirtualHost *:443>

 SSLEngine on

 SSLCertificateFile /etc/pki/tls/certs/ca_rgw.crt

 SSLCertificateKeyFile /etc/pki/tls/private/ca_rgw.key

 SetEnv SERVER_PORT_SECURE 443

 ServerName gateway.testes.local

 ServerAlias *.gateway.testes.local

 ServerAdmin marco.gar...@testes.co.mz


 DocumentRoot /var/www/cgi-bin

 RewriteEngine On

 #RewriteRule ^/(.*) /s3gw.fcgi?%{QUERY_STRING}
 [E=HTTP_AUTHORIZATION:%{HTTP:Authorization},L]

 RewriteRule ^/([a-zA-Z0-9-_.]*)([/]?.*)
 /s3gw.fcgi?page=$1params=$2%{QUERY_STRING}
 [E=HTTP_AUTHORIZATION:%{HTTP:Authorization},L]

 <IfModule mod_fastcgi.c>

 <Directory /var/www>

 Options +ExecCGI

 AllowOverride All

 SetHandler fastcgi-script

 Order allow,deny

 Allow from all

 AuthBasicAuthoritative Off

 </Directory>

 </IfModule>

 AllowEncodedSlashes On

 ErrorLog /var/log/httpd/error_rgw_ssl.log

 CustomLog /var/log/httpd/access_rgw_ssl.log combined

 ServerSignature Off

 </VirtualHost>

 *My /var/www/cgi-bin/s3gw.fcgi *


 #!/bin/sh

 exec /usr/bin/radosgw -c /etc/ceph/ceph.conf -n 

Re: [ceph-users] ceph cluster inconsistency?

2014-08-18 Thread Haomai Wang
On Mon, Aug 18, 2014 at 5:38 PM, Kenneth Waegeman
kenneth.waege...@ugent.be wrote:
 Hi,

 I tried this after restarting the osd, but I guess that was not the aim
 (
 # ceph-kvstore-tool /var/lib/ceph/osd/ceph-67/current/ list _GHOBJTOSEQ_|
 grep 6adb1100 -A 100
 IO error: lock /var/lib/ceph/osd/ceph-67/current//LOCK: Resource temporarily
 unavailable
 tools/ceph_kvstore_tool.cc: In function 'StoreTool::StoreTool(const
 string&)' thread 7f8fecf7d780 time 2014-08-18 11:12:29.551780
 tools/ceph_kvstore_tool.cc: 38: FAILED assert(!db_ptr->open(std::cerr))
 ..
 )

 When I run it after bringing the osd down, it takes a while, but it has no
 output.. (When running it without the grep, I'm getting a huge list )

Oh, sorry about that! I made a mistake: the hash value (6adb1100) is
reversed in the leveldb key.
So grep'ing for benchmark_data_ceph001.cubone.os_5560_object789734 should help.


 Or should I run this immediately after the osd is crashed, (because it maybe
 rebalanced?  I did already restarted the cluster)


 I don't know if it is related, but before I could all do that, I had to fix
 something else: A monitor did run out if disk space, using 8GB for his
 store.db folder (lot of sst files). Other monitors are also near that level.
 Never had that problem on previous setups before. I recreated a monitor and
 now it uses 3.8GB.

There is some duplicate data that needs to be compacted.
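A leveldb-based monitor store can also be compacted explicitly; a sketch, where
MONID is the monitor's name:

 ceph tell mon.MONID compact

Alternatively, setting 'mon compact on start = true' in ceph.conf makes each
monitor compact its store when it starts.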


Another idea: maybe you can make KeyValueStore's stripe size align
with the EC stripe size.
I haven't thought about it deeply; maybe I will try it later.

 Thanks!

 Kenneth



 - Message from Sage Weil sw...@redhat.com -
Date: Fri, 15 Aug 2014 06:10:34 -0700 (PDT)
From: Sage Weil sw...@redhat.com

 Subject: Re: [ceph-users] ceph cluster inconsistency?
  To: Haomai Wang haomaiw...@gmail.com
  Cc: Kenneth Waegeman kenneth.waege...@ugent.be,
 ceph-users@lists.ceph.com



 On Fri, 15 Aug 2014, Haomai Wang wrote:

 Hi Kenneth,

 I don't find valuable info in your logs, it lack of the necessary
 debug output when accessing crash code.

 But I scan the encode/decode implementation in GenericObjectMap and
 find something bad.

 For example, two oid has same hash and their name is:
 A: rb.data.123
 B: rb-123

 In ghobject_t compare level, A > B. But GenericObjectMap encodes . to
 %e, so the key in DB is:
 A: _GHOBJTOSEQ_:blah!51615000!!none!!rb%edata%e123!head
 B: _GHOBJTOSEQ_:blah!51615000!!none!!rb-123!head

 A < B

 And it seemed that the escape function is useless and should be disabled.

 I'm not sure whether Kenneth's problem is touching this bug. Because
 this scene only occur when the object set is very large and make the
 two object has same hash value.

 Kenneth, could you have time to run ceph-kv-store [path-to-osd] list
 _GHOBJTOSEQ_| grep 6adb1100 -A 100. ceph-kv-store is a debug tool
 which can be compiled from source. You can clone ceph repo and run
 ./autogen.sh; ./configure; cd src; make ceph-kvstore-tool.
 path-to-osd should be /var/lib/ceph/osd-[id]/current/. 6adb1100
 is from your verbose log and the next 100 rows should know necessary
 infos.


 You can also get ceph-kvstore-tool from the 'ceph-tests' package.

 Hi sage, do you think we need to provided with upgrade function to fix
 it?


 Hmm, we might.  This only affects the key/value encoding right?  The
 FileStore is using its own function to map these to file names?

 Can you open a ticket in the tracker for this?

 Thanks!
 sage



 On Thu, Aug 14, 2014 at 7:36 PM, Kenneth Waegeman
 kenneth.waege...@ugent.be wrote:

 
  - Message from Haomai Wang haomaiw...@gmail.com -
 Date: Thu, 14 Aug 2014 19:11:55 +0800
 
 From: Haomai Wang haomaiw...@gmail.com
  Subject: Re: [ceph-users] ceph cluster inconsistency?
   To: Kenneth Waegeman kenneth.waege...@ugent.be
 
 
  Could you add config debug_keyvaluestore = 20/20 to the crashed osd
  and replay the command causing crash?
 
  I would like to get more debug infos! Thanks.
 
 
  I included the log in attachment!
  Thanks!
 
 
  On Thu, Aug 14, 2014 at 4:41 PM, Kenneth Waegeman
  kenneth.waege...@ugent.be wrote:
 
 
  I have:
  osd_objectstore = keyvaluestore-dev
 
  in the global section of my ceph.conf
 
 
  [root@ceph002 ~]# ceph osd erasure-code-profile get profile11
  directory=/usr/lib64/ceph/erasure-code
  k=8
  m=3
  plugin=jerasure
  ruleset-failure-domain=osd
  technique=reed_sol_van
 
  the ecdata pool has this as profile
 
  pool 3 'ecdata' erasure size 11 min_size 8 crush_ruleset 2
  object_hash
  rjenkins pg_num 128 pgp_num 128 last_change 161 flags hashpspool
  stripe_width 4096
 
  ECrule in crushmap
 
  rule ecdata {
  ruleset 2
  type erasure
  min_size 3
  max_size 20
  step set_chooseleaf_tries 5
  step take default-ec
  step choose indep 0 type osd
  step emit
  }
  root default-ec {
  id -8   # do not change unnecessarily
  # weight 140.616
  alg straw
  hash 0 

Re: [ceph-users] ceph cluster inconsistency?

2014-08-18 Thread Kenneth Waegeman


- Message from Haomai Wang haomaiw...@gmail.com -
   Date: Mon, 18 Aug 2014 18:34:11 +0800
   From: Haomai Wang haomaiw...@gmail.com
Subject: Re: [ceph-users] ceph cluster inconsistency?
 To: Kenneth Waegeman kenneth.waege...@ugent.be
 Cc: Sage Weil sw...@redhat.com, ceph-users@lists.ceph.com



On Mon, Aug 18, 2014 at 5:38 PM, Kenneth Waegeman
kenneth.waege...@ugent.be wrote:

Hi,

I tried this after restarting the osd, but I guess that was not the aim
(
# ceph-kvstore-tool /var/lib/ceph/osd/ceph-67/current/ list _GHOBJTOSEQ_|
grep 6adb1100 -A 100
IO error: lock /var/lib/ceph/osd/ceph-67/current//LOCK: Resource temporarily
unavailable
tools/ceph_kvstore_tool.cc: In function 'StoreTool::StoreTool(const
string)' thread 7f8fecf7d780 time 2014-08-18 11:12:29.551780
tools/ceph_kvstore_tool.cc: 38: FAILED assert(!db_ptr->open(std::cerr))
..
)

When I run it after bringing the osd down, it takes a while, but it has no
output.. (When running it without the grep, I'm getting a huge list )


Oh, sorry for it! I made a mistake, the hash value(6adb1100) will be
reversed into leveldb.
So grep benchmark_data_ceph001.cubone.os_5560_object789734 should  
be help it.



this gives:

[root@ceph003 ~]# ceph-kvstore-tool /var/lib/ceph/osd/ceph-67/current/  
list _GHOBJTOSEQ_ | grep 5560_object789734 -A 100

_GHOBJTOSEQ_:3%e0s0_head!0011BDA6!!3!!benchmark_data_ceph001%ecubone%eos_5560_object789734!head
_GHOBJTOSEQ_:3%e0s0_head!0011C027!!3!!benchmark_data_ceph001%ecubone%eos_31461_object1330170!head
_GHOBJTOSEQ_:3%e0s0_head!0011C6FD!!3!!benchmark_data_ceph001%ecubone%eos_4919_object227366!head
_GHOBJTOSEQ_:3%e0s0_head!0011CB03!!3!!benchmark_data_ceph001%ecubone%eos_5560_object1363631!head
_GHOBJTOSEQ_:3%e0s0_head!0011CDF0!!3!!benchmark_data_ceph001%ecubone%eos_5560_object1573957!head
_GHOBJTOSEQ_:3%e0s0_head!0011D02C!!3!!benchmark_data_ceph001%ecubone%eos_5560_object1019282!head
_GHOBJTOSEQ_:3%e0s0_head!0011E2B5!!3!!benchmark_data_ceph001%ecubone%eos_31461_object1283563!head
_GHOBJTOSEQ_:3%e0s0_head!0011E511!!3!!benchmark_data_ceph001%ecubone%eos_4919_object273736!head
_GHOBJTOSEQ_:3%e0s0_head!0011E547!!3!!benchmark_data_ceph001%ecubone%eos_5560_object1170628!head
_GHOBJTOSEQ_:3%e0s0_head!0011EAAB!!3!!benchmark_data_ceph001%ecubone%eos_4919_object256335!head
_GHOBJTOSEQ_:3%e0s0_head!0011F446!!3!!benchmark_data_ceph001%ecubone%eos_5560_object1484196!head
_GHOBJTOSEQ_:3%e0s0_head!0011FC59!!3!!benchmark_data_ceph001%ecubone%eos_5560_object884178!head
_GHOBJTOSEQ_:3%e0s0_head!001203F3!!3!!benchmark_data_ceph001%ecubone%eos_5560_object853746!head
_GHOBJTOSEQ_:3%e0s0_head!001208E3!!3!!benchmark_data_ceph001%ecubone%eos_5560_object36633!head
_GHOBJTOSEQ_:3%e0s0_head!00120B37!!3!!benchmark_data_ceph001%ecubone%eos_31461_object1235337!head
_GHOBJTOSEQ_:3%e0s0_head!001210B6!!3!!benchmark_data_ceph001%ecubone%eos_5560_object1661351!head
_GHOBJTOSEQ_:3%e0s0_head!001210CB!!3!!benchmark_data_ceph001%ecubone%eos_5560_object238126!head
_GHOBJTOSEQ_:3%e0s0_head!0012184C!!3!!benchmark_data_ceph001%ecubone%eos_5560_object339943!head
_GHOBJTOSEQ_:3%e0s0_head!00121916!!3!!benchmark_data_ceph001%ecubone%eos_5560_object1047094!head
_GHOBJTOSEQ_:3%e0s0_head!001219C1!!3!!benchmark_data_ceph001%ecubone%eos_31461_object520642!head
_GHOBJTOSEQ_:3%e0s0_head!001222BB!!3!!benchmark_data_ceph001%ecubone%eos_5560_object639565!head
_GHOBJTOSEQ_:3%e0s0_head!001223AA!!3!!benchmark_data_ceph001%ecubone%eos_4919_object231080!head
_GHOBJTOSEQ_:3%e0s0_head!0012243C!!3!!benchmark_data_ceph001%ecubone%eos_5560_object858050!head
_GHOBJTOSEQ_:3%e0s0_head!0012289C!!3!!benchmark_data_ceph001%ecubone%eos_5560_object241796!head
_GHOBJTOSEQ_:3%e0s0_head!00122D28!!3!!benchmark_data_ceph001%ecubone%eos_4919_object7462!head
_GHOBJTOSEQ_:3%e0s0_head!00122DFE!!3!!benchmark_data_ceph001%ecubone%eos_5560_object243798!head
_GHOBJTOSEQ_:3%e0s0_head!00122EFC!!3!!benchmark_data_ceph001%ecubone%eos_8961_object109512!head
_GHOBJTOSEQ_:3%e0s0_head!001232D7!!3!!benchmark_data_ceph001%ecubone%eos_31461_object653973!head
_GHOBJTOSEQ_:3%e0s0_head!001234A3!!3!!benchmark_data_ceph001%ecubone%eos_5560_object1378169!head
_GHOBJTOSEQ_:3%e0s0_head!00123714!!3!!benchmark_data_ceph001%ecubone%eos_5560_object512925!head
_GHOBJTOSEQ_:3%e0s0_head!001237D9!!3!!benchmark_data_ceph001%ecubone%eos_4919_object23289!head
_GHOBJTOSEQ_:3%e0s0_head!00123854!!3!!benchmark_data_ceph001%ecubone%eos_31461_object1108852!head
_GHOBJTOSEQ_:3%e0s0_head!00123971!!3!!benchmark_data_ceph001%ecubone%eos_5560_object704026!head
_GHOBJTOSEQ_:3%e0s0_head!00123F75!!3!!benchmark_data_ceph001%ecubone%eos_8961_object250441!head
_GHOBJTOSEQ_:3%e0s0_head!00124083!!3!!benchmark_data_ceph001%ecubone%eos_31461_object706178!head
_GHOBJTOSEQ_:3%e0s0_head!001240FA!!3!!benchmark_data_ceph001%ecubone%eos_5560_object316952!head
_GHOBJTOSEQ_:3%e0s0_head!0012447D!!3!!benchmark_data_ceph001%ecubone%eos_5560_object538734!head
_GHOBJTOSEQ_:3%e0s0_head!001244D9!!3!!benchmark_data_ceph001%ecubone%eos_31461_object789215!head

Re: [ceph-users] HEALTH_WARN 4 pgs incomplete; 4 pgs stuck inactive; 4 pgs stuck unclean

2014-08-18 Thread Riederer, Michael
Hi Craig,

I brought the cluster into a stable condition. All slow osds are no longer in the 
cluster. All remaining 36 osds are writeable at more than 100 MB/sec (dd 
if=/dev/zero of=testfile-2.txt bs=1024 count=4096000). No ceph client is 
connected to the cluster. The ceph nodes are idle. The state now looks as 
follows:

root@ceph-admin-storage:~# ceph -s
cluster 6b481875-8be5-4508-b075-e1f660fd7b33
 health HEALTH_WARN 3 pgs down; 3 pgs incomplete; 3 pgs stuck inactive; 3 
pgs stuck unclean
 monmap e2: 3 mons at 
{ceph-1-storage=10.65.150.101:6789/0,ceph-2-storage=10.65.150.102:6789/0,ceph-3-storage=10.65.150.103:6789/0},
 election epoch 5018, quorum 0,1,2 ceph-1-storage,ceph-2-storage,ceph-3-storage
 osdmap e36830: 36 osds: 36 up, 36 in
  pgmap v10907190: 6144 pgs, 3 pools, 10997 GB data, 2760 kobjects
22051 GB used, 68206 GB / 90258 GB avail
6140 active+clean
   3 down+incomplete
   1 active+clean+replay

root@ceph-admin-storage:~# ceph health detail
HEALTH_WARN 3 pgs down; 3 pgs incomplete; 3 pgs stuck inactive; 3 pgs stuck 
unclean
pg 2.c1 is stuck inactive since forever, current state down+incomplete, last 
acting [13,8]
pg 2.e3 is stuck inactive since forever, current state down+incomplete, last 
acting [20,8]
pg 2.587 is stuck inactive since forever, current state down+incomplete, last 
acting [13,8]
pg 2.c1 is stuck unclean since forever, current state down+incomplete, last 
acting [13,8]
pg 2.e3 is stuck unclean since forever, current state down+incomplete, last 
acting [20,8]
pg 2.587 is stuck unclean since forever, current state down+incomplete, last 
acting [13,8]
pg 2.587 is down+incomplete, acting [13,8]
pg 2.e3 is down+incomplete, acting [20,8]
pg 2.c1 is down+incomplete, acting [13,8]

I have tried the following:

root@ceph-admin-storage:~# ceph pg scrub 2.587
instructing pg 2.587 on osd.13 to scrub
root@ceph-admin-storage:~# ceph pg scrub 2.e3
instructing pg 2.e3 on osd.20 to scrub
root@ceph-admin-storage:~# ceph pg scrub 2.c1
instructing pg 2.c1 on osd.13 to scrub

root@ceph-admin-storage:~# ceph pg deep-scrub 2.587
instructing pg 2.587 on osd.13 to deep-scrub
root@ceph-admin-storage:~# ceph pg deep-scrub 2.e3
instructing pg 2.e3 on osd.20 to deep-scrub
root@ceph-admin-storage:~# ceph pg deep-scrub 2.c1
instructing pg 2.c1 on osd.13 to deep-scrub

root@ceph-admin-storage:~# ceph pg repair 2.587
instructing pg 2.587 on osd.13 to repair
root@ceph-admin-storage:~# ceph pg repair 2.e3
instructing pg 2.e3 on osd.20 to repair
root@ceph-admin-storage:~# ceph pg repair 2.c1
instructing pg 2.c1 on osd.13 to repair

In the monitor logfiles (ceph-mon.ceph-1/2/3-storage.log) I see the pg scrub, 
pg deep-scrub and pg repair commands, but I do not see anything in ceph.log and 
nothing in the ceph-osd.13/20/8.log.
(2014-08-18 13:24:49.337954 7f24ac111700  0 mon.ceph-1-storage@0(leader) e2 
handle_command mon_command({"prefix": "pg repair", "pgid": "2.587"} v 0) v1)

Is it possible to repair the ceph-cluster?

root@ceph-admin-storage:~# ceph pg force_create_pg 2.587
pg 2.587 now creating, ok

But nothing happens; the pg will not be created.

root@ceph-admin-storage:~# ceph -s
cluster 6b481875-8be5-4508-b075-e1f660fd7b33
 health HEALTH_WARN 2 pgs down; 2 pgs incomplete; 3 pgs stuck inactive; 3 
pgs stuck unclean
 monmap e2: 3 mons at 
{ceph-1-storage=10.65.150.101:6789/0,ceph-2-storage=10.65.150.102:6789/0,ceph-3-storage=10.65.150.103:6789/0},
 election epoch 5018, quorum 0,1,2 ceph-1-storage,ceph-2-storage,ceph-3-storage
 osdmap e36830: 36 osds: 36 up, 36 in
  pgmap v10907191: 6144 pgs, 3 pools, 10997 GB data, 2760 kobjects
22051 GB used, 68206 GB / 90258 GB avail
   1 creating
6140 active+clean
   2 down+incomplete
   1 active+clean+replay
root@ceph-admin-storage:~# ceph health detail
HEALTH_WARN 2 pgs down; 2 pgs incomplete; 3 pgs stuck inactive; 3 pgs stuck 
unclean
pg 2.c1 is stuck inactive since forever, current state down+incomplete, last 
acting [13,8]
pg 2.e3 is stuck inactive since forever, current state down+incomplete, last 
acting [20,8]
pg 2.587 is stuck inactive since forever, current state creating, last acting []
pg 2.c1 is stuck unclean since forever, current state down+incomplete, last 
acting [13,8]
pg 2.e3 is stuck unclean since forever, current state down+incomplete, last 
acting [20,8]
pg 2.587 is stuck unclean since forever, current state creating, last acting []
pg 2.e3 is down+incomplete, acting [20,8]
pg 2.c1 is down+incomplete, acting [13,8]

What can I do to get rid of the incomplete or creating pg?
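
(For anyone in the same spot: the per-PG view of why a PG is stuck is usually
the most useful next step; a sketch, using one of the PGs above:

 ceph pg 2.587 query
 ceph pg dump_stuck inactive

In the query output, the [recovery_state] section shows which OSDs the PG still
wants to probe -- 'probing_osds' / 'down_osds_we_would_probe' -- before it will
go active.)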

Regards,
Mike



From: Craig Lewis [cle...@centraldesktop.com]
Sent: Thursday, 14 August 2014 19:56
To: Riederer, Michael
Cc: Karan Singh; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] HEALTH_WARN 4 pgs incomplete; 4 pgs stuck inactive; 4 
pgs stuck unclean

It sound 

Re: [ceph-users] HEALTH_WARN 4 pgs incomplete; 4 pgs stuck inactive; 4 pgs stuck unclean

2014-08-18 Thread Riederer, Michael
What has changed in the cluster compared to my first mail: the cluster was 
able to repair one pg, but now has a different pg in status 
active+clean+replay.

root@ceph-admin-storage:~# ceph pg dump | grep ^2.92
dumped all in format plain
2.92    0  0  0  0  0  0  0  active+clean  2014-08-18 10:37:20.962858  0'0  36830:577  [8,13]  8  [8,13]  8  0'0  2014-08-18 10:37:20.962728  13503'1390419  2014-08-14 10:37:12.497492
root@ceph-admin-storage:~# ceph pg dump | grep replay
dumped all in format plain
0.49a   0  0  0  0  0  0  0  active+clean+replay  2014-08-18 13:09:15.317221  0'0  36830:1704  [12,10]  12  [12,10]  12  0'0  2014-08-18 13:09:15.317131  0'0  2014-08-18 13:09:15.317131

Mike


From: ceph-users [ceph-users-boun...@lists.ceph.com] on behalf of Riederer, 
Michael [michael.riede...@br.de]
Sent: Monday, 18 August 2014 13:40
To: Craig Lewis
Cc: ceph-users@lists.ceph.com; Karan Singh
Subject: Re: [ceph-users] HEALTH_WARN 4 pgs incomplete; 4 pgs stuck inactive; 4 
pgs stuck unclean

Hi Craig,

I brought the cluster into a stable condition. All slow osds are no longer in the 
cluster. All remaining 36 osds are writeable at more than 100 MB/sec (dd 
if=/dev/zero of=testfile-2.txt bs=1024 count=4096000). No ceph client is 
connected to the cluster. The ceph nodes are idle. The state now looks as 
follows:

root@ceph-admin-storage:~# ceph -s
cluster 6b481875-8be5-4508-b075-e1f660fd7b33
 health HEALTH_WARN 3 pgs down; 3 pgs incomplete; 3 pgs stuck inactive; 3 
pgs stuck unclean
 monmap e2: 3 mons at 
{ceph-1-storage=10.65.150.101:6789/0,ceph-2-storage=10.65.150.102:6789/0,ceph-3-storage=10.65.150.103:6789/0},
 election epoch 5018, quorum 0,1,2 ceph-1-storage,ceph-2-storage,ceph-3-storage
 osdmap e36830: 36 osds: 36 up, 36 in
  pgmap v10907190: 6144 pgs, 3 pools, 10997 GB data, 2760 kobjects
22051 GB used, 68206 GB / 90258 GB avail
6140 active+clean
   3 down+incomplete
   1 active+clean+replay

root@ceph-admin-storage:~# ceph health detail
HEALTH_WARN 3 pgs down; 3 pgs incomplete; 3 pgs stuck inactive; 3 pgs stuck 
unclean
pg 2.c1 is stuck inactive since forever, current state down+incomplete, last 
acting [13,8]
pg 2.e3 is stuck inactive since forever, current state down+incomplete, last 
acting [20,8]
pg 2.587 is stuck inactive since forever, current state down+incomplete, last 
acting [13,8]
pg 2.c1 is stuck unclean since forever, current state down+incomplete, last 
acting [13,8]
pg 2.e3 is stuck unclean since forever, current state down+incomplete, last 
acting [20,8]
pg 2.587 is stuck unclean since forever, current state down+incomplete, last 
acting [13,8]
pg 2.587 is down+incomplete, acting [13,8]
pg 2.e3 is down+incomplete, acting [20,8]
pg 2.c1 is down+incomplete, acting [13,8]

I have tried the following:

root@ceph-admin-storage:~# ceph pg scrub 2.587
instructing pg 2.587 on osd.13 to scrub
root@ceph-admin-storage:~# ceph pg scrub 2.e3
instructing pg 2.e3 on osd.20 to scrub
root@ceph-admin-storage:~# ceph pg scrub 2.c1
instructing pg 2.c1 on osd.13 to scrub

root@ceph-admin-storage:~# ceph pg deep-scrub 2.587
instructing pg 2.587 on osd.13 to deep-scrub
root@ceph-admin-storage:~# ceph pg deep-scrub 2.e3
instructing pg 2.e3 on osd.20 to deep-scrub
root@ceph-admin-storage:~# ceph pg deep-scrub 2.c1
instructing pg 2.c1 on osd.13 to deep-scrub

root@ceph-admin-storage:~# ceph pg repair 2.587
instructing pg 2.587 on osd.13 to repair
root@ceph-admin-storage:~# ceph pg repair 2.e3
instructing pg 2.e3 on osd.20 to repair
root@ceph-admin-storage:~# ceph pg repair 2.c1
instructing pg 2.c1 on osd.13 to repair

In the monitor logfiles (ceph-mon.ceph-1/2/3-storage.log) I see the pg scrub, 
pg deep-scrub and pg repair commands, but I do not see anything in ceph.log and 
nothing in the ceph-osd.13/20/8.log.
(2014-08-18 13:24:49.337954 7f24ac111700  0 mon.ceph-1-storage@0(leader) e2 
handle_command mon_command({prefix: pg repair, pgid: 2.587} v 0) v1)

Is it possible to repair the ceph-cluster?

root@ceph-admin-storage:~# ceph pg force_create_pg 2.587
pg 2.587 now creating, ok

But nothing happens, the pg will not created.

root@ceph-admin-storage:~# ceph -s
cluster 6b481875-8be5-4508-b075-e1f660fd7b33
 health HEALTH_WARN 2 pgs down; 2 pgs incomplete; 3 pgs stuck inactive; 3 
pgs stuck unclean
 monmap e2: 3 mons at 
{ceph-1-storage=10.65.150.101:6789/0,ceph-2-storage=10.65.150.102:6789/0,ceph-3-storage=10.65.150.103:6789/0},
 election epoch 5018, quorum 0,1,2 ceph-1-storage,ceph-2-storage,ceph-3-storage
 osdmap e36830: 36 osds: 36 up, 36 in
  pgmap v10907191: 6144 pgs, 3 pools, 10997 GB data, 2760 kobjects
22051 GB used, 68206 GB / 90258 GB avail
   1 creating
6140 active+clean
   2 down+incomplete
   1 

[ceph-users] [radosgw-admin] bilog list confusion

2014-08-18 Thread Patrycja Szabłowska
Hi,


Is there any configuration option in ceph.conf for enabling/disabling
the bilog list?
I mean the result of this command:
radosgw-admin bilog list

One ceph cluster gives me results - a list of operations which were made
on the bucket - and the other one gives me just an empty list. I can't
see what the reason is.


I can't find it anywhere here in the ceph.conf file.
http://ceph.com/docs/master/rados/configuration/ceph-conf/

My guess is that it's in the region info, but when I changed these values to
false for the cluster with the working bilog, the bilog still showed.
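
(One thing worth double-checking: edits to the region JSON only take effect
after they are written back and the gateway is restarted; a sketch, assuming
the default region:

 radosgw-admin region get > region.json    # the log_meta/log_data flags live here
 radosgw-admin region set < region.json
 radosgw-admin regionmap update
 service ceph-radosgw restart              # or however the gateway is started on your distro

Otherwise the running radosgw keeps using the old settings.)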

1. cluster with empty bilog list:
  "zones": [
{ "name": "default",
  "endpoints": [],
  "log_meta": "false",
  "log_data": "false"}],
2. cluster with *proper* bilog list:
  "zones": [
{ "name": "master-1",
  "endpoints": [
"http:\/\/[...]"],
  "log_meta": "true",
  "log_data": "true"}],


Here are pools on both of the clusters:

1. cluster with *proper* bilog list:
rbd
.rgw.root
.rgw.control
.rgw
.rgw.gc
.users.uid
.users.email
.users
.rgw.buckets
.rgw.buckets.index
.log
''

2. cluster with empty bilog list:
data
metadata
rbd
.rgw.root
.rgw.control
.rgw
.rgw.gc
.users.uid
.users.email
.users
''
.rgw.buckets.index
.rgw.buckets
.log


And here is the zone info (just the placement_pools, rest of the
config is the same):
1. cluster with *proper* bilog list:
placement_pools: []

2. cluster with *empty* bilog list:
  "placement_pools": [
{ "key": "default-placement",
  "val": { "index_pool": ".rgw.buckets.index",
  "data_pool": ".rgw.buckets",
  "data_extra_pool": ""}}]}


Any thoughts? I've tried to figure it out by myself, but no luck.



Thanks,
Patrycja Szabłowska
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RadosGW problems

2014-08-18 Thread Bachelder, Kurt
Hi Marco,

Is your DNS set up to use the wildcard (*.gateway.testes.local)?

I noticed that you're using it in the server alias, but that you don't have an 
rgw_dns_name configured in your ceph.conf.  The rgw_dns_name should be set to 
gateway.testes.local if your dns is configured to use the wildcard naming 
with that subdomain.

Also see that you're using SSL... which domain have you signed? 
*.gateway.testes.local?

Since you can create a bucket, but not write to it, I'm wondering if there's an 
issue with the way your client is attempting to access the bucket... can you 
resolve bucket.gateway.testes.local from your client?

Kurt


 Original message 
From: Marco Garcês
Date:08/18/2014 6:33 AM (GMT-05:00)
To: Linux Chips
Cc: Bachelder, Kurt , ceph-users@lists.ceph.com
Subject: Re: [ceph-users] RadosGW problems

Hi there,

I have FastCgiWrapper Off in fastcgi.conf file; I also have SELinux in 
permissive state; 'ps aux | grep rados' shows me radosgw is running;

The problems stays the same... I can login with S3 credentials, create buckets, 
but uploads write this in the logs:
[Mon Aug 18 12:00:28.636378 2014] [:error] [pid 11251] [client 
10.5.1.1:49680] FastCGI: comm with server 
/var/www/cgi-bin/s3gw.fcgi aborted: idle timeout (30 sec)
[Mon Aug 18 12:00:28.676825 2014] [:error] [pid 11251] [client 
10.5.1.1:49680] FastCGI: incomplete headers (0 bytes) 
received from server /var/www/cgi-bin/s3gw.fcgi

When I try Swift credentials, I cannot login at all.. I have tested both 
Cyberduck and Swift client on the command line, and I always get this on the 
logs:
GET /v1.0 HTTP/1.1 404 78 - Cyberduck/4.5 (Mac OS X/10.9.3) (x86_64)
GET /v1.0 HTTP/1.1 404 78 - python-swiftclient-2.2.0

In S3 login, when I upload a file, I can see it almost at 100% complete, but 
then it fails with the above errors.

A strange thing is... the /var/log/ceph/client.radosgw.gateway.log is not 
getting updated, I don't see any new logs in there.

Thank you once again for your help, Marco Garcês


Marco Garcês
#sysadmin
Maputo - Mozambique
[Phone] +258 84 4105579
[Skype] marcogarces


On Mon, Aug 18, 2014 at 12:08 AM, Linux Chips 
linux.ch...@gmail.com wrote:
On Mon 18 Aug 2014 12:45:33 AM AST, Bachelder, Kurt wrote:
Hi Marco –

In CentOS 6, you also had to edit /etc/httpd/conf.d/fastcgi.conf to
turn OFF the fastcgi wrapper.  I haven’t tested in v7 yet, but I’d
guess it’s required there too:

# wrap all fastcgi script calls in suexec

FastCgiWrapper Off

Give that a try, if you haven’t already – restart httpd and
ceph-radosgw afterward.

Kurt

*From:*ceph-users 
[mailto:ceph-users-boun...@lists.ceph.commailto:ceph-users-boun...@lists.ceph.com]
 *On
Behalf Of *Marco Garcês
*Sent:* Friday, August 15, 2014 12:46 PM
*To:* ceph-users@lists.ceph.commailto:ceph-users@lists.ceph.com
*Subject:* [ceph-users] RadosGW problems


Hi there,

I am using CentOS 7 with Ceph version 0.80.5
(38b73c67d375a2552d8ed67843c8a65c2c0feba6), 3 OSD, 3 MON, 1 RadosGW
(which also serves as ceph-deploy node)

I followed all the instructions in the docs, regarding setting up a
basic Ceph cluster, and then followed the one to setup RadosGW.

I can't seem to use the Swift interface, and the S3 interface, times
out after 30 seconds.

[Fri Aug 15 18:25:33.290877 2014] [:error] [pid 6197] [client
10.5.5.222:58051] FastCGI: comm with server
/var/www/cgi-bin/s3gw.fcgi aborted: idle timeout (30 sec)

[Fri Aug 15 18:25:33.291781 2014] [:error] [pid 6197] [client
10.5.5.222:58051] FastCGI: incomplete
headers (0 bytes) received from server /var/www/cgi-bin/s3gw.fcgi

*My ceph.conf:*


[global]

fsid = 581bcd61-8760-4756-a7c8-e8275c0957ad

mon_initial_members = CEPH01, CEPH02, CEPH03

mon_host = 10.2.27.81,10.2.27.82,10.2.27.83

public network = 10.2.27.0/25


auth_cluster_required = cephx

auth_service_required = cephx

auth_client_required = cephx

filestore_xattr_use_omap = true

osd pool default size = 2

osd pool default pg num = 333

osd pool default pgp num = 333

osd journal size = 1024

[client.radosgw.gateway]

host = GATEWAY

keyring = /etc/ceph/ceph.client.radosgw.keyring

rgw socket path = /var/run/ceph/ceph.radosgw.gateway.fastcgi.sock

log file = /var/log/ceph/client.radosgw.gateway.log

rgw print continue = false

rgw enable ops log = true

*My apache rgw.conf:*


FastCgiExternalServer /var/www/cgi-bin/s3gw.fcgi -socket
/var/run/ceph/ceph.radosgw.gateway.fastcgi.sock

VirtualHost *:443

SSLEngine on

SSLCertificateFile /etc/pki/tls/certs/ca_rgw.crt

SSLCertificateKeyFile /etc/pki/tls/private/ca_rgw.key

SetEnv SERVER_PORT_SECURE 443

ServerName gateway.testes.local

ServerAlias *.gateway.testes.local

ServerAdmin marco.gar...@testes.co.mz

Re: [ceph-users] RadosGW problems

2014-08-18 Thread Marco Garcês
Hi Kurt,

I have pointed my DNS '*.gateway.testes.local' and 'gateway.testes.local'
to the same IP (the radosgw server).

I have added rgw_dns_name as you suggested to the config (it was commented
out). I will try everything and give feedback.

By the way, when I restart ceph-radosgw service, I get this in the logs
(which previous I did not see anything):

2014-08-18 15:19:44.812039 7fbf417fa700  1 handle_sigterm
2014-08-18 15:19:44.812104 7fbf417fa700  1 handle_sigterm set alarm for 120
2014-08-18 15:19:44.812235 7fbf5c495880 -1 shutting down
2014-08-18 15:19:44.812305 7fbf40ff9700  0 ERROR: FCGX_Accept_r returned -4
2014-08-18 15:19:44.812432 7fbf417fa700  1 handle_sigterm
2014-08-18 15:19:44.857506 7fbf5c495880  1 final shutdown
2014-08-18 15:19:45.010597 7fb318b96880  0 ceph version 0.80.5
(38b73c67d375a2552d8ed67843c8a65c2c0feba6), process radosgw, pid 3242
2014-08-18 15:19:45.219582 7fb318b96880  0 framework: fastcgi
2014-08-18 15:19:45.219599 7fb318b96880  0 starting handler: fastcgi
2014-08-18 15:19:45.692248 7fb2fe6fb700  0 ERROR: can't read user header:
ret=-2
2014-08-18 15:19:45.692273 7fb2fe6fb700  0 ERROR: sync_user() failed,
user=teste ret=-2

The last 2 lines look suspicious...


*Marco Garcês*
*#sysadmin*
Maputo - Mozambique
*[Phone]* +258 84 4105579
*[Skype]* marcogarces


On Mon, Aug 18, 2014 at 2:58 PM, Bachelder, Kurt 
kurt.bachel...@sierra-cedar.com wrote:

  Hi Marco,

  Is your DNS setup to use the wildcard (*.gateway.testes.local)?

  I noticed that you're using it in the server alias, but that you don't
 have an rgw_dns_name configured in your ceph.conf.  The rgw_dns_name
 should be set to gateway.testes.local if your dns is configured to use
 the wildcard naming with that subdomain.

  Also see that you're using SSL... which domain have you signed?
 *.gateway.testes.local?

  Since you can create a bucket, but not write to it, I'm wondering if
 there's an issue with the way your client is attempting to access the
 bucket... can you resolve bucket.gateway.testes.local from your client?

  Kurt


  Original message 
 From: Marco Garcês
 Date:08/18/2014 6:33 AM (GMT-05:00)
 To: Linux Chips
 Cc: Bachelder, Kurt , ceph-users@lists.ceph.com
 Subject: Re: [ceph-users] RadosGW problems

   Hi there,

  I have FastCgiWrapper Off in fastcgi.conf file; I also have SELinux in
 permissive state; 'ps aux | grep rados' shows me radosgw is running;

  The problems stays the same... I can login with S3 credentials, create
 buckets, but uploads write this in the logs:
 [Mon Aug 18 12:00:28.636378 2014] [:error] [pid 11251] [client
 10.5.1.1:49680] FastCGI: comm with server /var/www/cgi-bin/s3gw.fcgi
 aborted: idle timeout (3
 0 sec)
 [Mon Aug 18 12:00:28.676825 2014] [:error] [pid 11251] [client
 10.5.1.1:49680] FastCGI: incomplete headers (0 bytes) received from
 server /var/www/cgi-bin/s3
 gw.fcgi

  When I try Swift credentials, I cannot login at all.. I have tested both
 Cyberduck and Swift client on the command line, and I always get this on
 the logs:
 GET /v1.0 HTTP/1.1 404 78 - Cyberduck/4.5 (Mac OS X/10.9.3) (x86_64)
 GET /v1.0 HTTP/1.1 404 78 - python-swiftclient-2.2.0

  In S3 login, when I upload a file, I can see it almost at 100% complete,
 but then it fails with the above errors.

  A strange thing is... the /var/log/ceph/client.radosgw.gateway.log is
 not getting updated, I don't see any new logs in there.

  Thank you once again for your help, Marco Garcês


  *Marco Garcês*
 *#sysadmin*
  Maputo - Mozambique
 *[Phone]* +258 84 4105579
 *[Skype]* marcogarces


 On Mon, Aug 18, 2014 at 12:08 AM, Linux Chips linux.ch...@gmail.com
 wrote:

 On Mon 18 Aug 2014 12:45:33 AM AST, Bachelder, Kurt wrote:

 Hi Marco –

 In CentOS 6, you also had to edit /etc/httpd/conf.d/fastcgi.conf to
 turn OFF the fastcgi wrapper.  I haven’t tested in v7 yet, but I’d
 guess it’s required there too:

 # wrap all fastcgi script calls in suexec

 FastCgiWrapper Off

 Give that a try, if you haven’t already – restart httpd and
 ceph-radosgw afterward.

 Kurt

  *From:*ceph-users [mailto:ceph-users-boun...@lists.ceph.com] *On
 Behalf Of *Marco Garcês
 *Sent:* Friday, August 15, 2014 12:46 PM
 *To:* ceph-users@lists.ceph.com
 *Subject:* [ceph-users] RadosGW problems


 Hi there,

 I am using CentOS 7 with Ceph version 0.80.5
 (38b73c67d375a2552d8ed67843c8a65c2c0feba6), 3 OSD, 3 MON, 1 RadosGW
 (which also serves as ceph-deploy node)

 I followed all the instructions in the docs, regarding setting up a
 basic Ceph cluster, and then followed the one to setup RadosGW.

 I can't seem to use the Swift interface, and the S3 interface, times
 out after 30 seconds.

 [Fri Aug 15 18:25:33.290877 2014] [:error] [pid 6197] [client
  10.5.5.222:58051 http://10.5.5.222:58051] FastCGI: comm with server

 /var/www/cgi-bin/s3gw.fcgi aborted: idle timeout (30 sec)

 [Fri Aug 15 18:25:33.291781 2014] [:error] [pid 6197] [client
  10.5.5.222:58051 http://10.5.5.222:58051] FastCGI: incomplete

 headers (0 bytes) 

[ceph-users] mds isn't working anymore after osd's running full

2014-08-18 Thread Jasper Siero
Hi all,

We have a small ceph cluster running version 0.80.1 with cephfs on five nodes.
Last week some osd's were full and shut themselves down. To help the osd's start 
again I added some extra osd's and moved some placement group directories on 
the full osd's (which have a copy on another osd) to another place on the node 
(as mentioned in 
http://ceph.com/docs/master/rados/troubleshooting/troubleshooting-osd/).
After clearing some space on the full osd's I started them again. After a lot 
of deep scrubbing and two pg inconsistencies which needed to be repaired, 
everything looked fine except the mds, which is still in the replay state and 
stays that way.
The log below says that the mds needs osdmap epoch 1833 but has 1832.

2014-08-18 12:29:22.268248 7fa786182700  1 mds.-1.0 handle_mds_map standby
2014-08-18 12:29:22.273995 7fa786182700  1 mds.0.25 handle_mds_map i am now 
mds.0.25
2014-08-18 12:29:22.273998 7fa786182700  1 mds.0.25 handle_mds_map state change 
up:standby --> up:replay
2014-08-18 12:29:22.274000 7fa786182700  1 mds.0.25 replay_start
2014-08-18 12:29:22.274014 7fa786182700  1 mds.0.25  recovery set is
2014-08-18 12:29:22.274016 7fa786182700  1 mds.0.25  need osdmap epoch 1833, 
have 1832
2014-08-18 12:29:22.274017 7fa786182700  1 mds.0.25  waiting for osdmap 1833 
(which blacklists prior instance)

 # ceph status
cluster c78209f5-55ea-4c70-8968-2231d2b05560
 health HEALTH_WARN mds cluster is degraded
 monmap e3: 3 mons at 
{th1-mon001=10.1.2.21:6789/0,th1-mon002=10.1.2.22:6789/0,th1-mon003=10.1.2.23:6789/0},
 election epoch 362, quorum 0,1,2 th1-mon001,th1-mon002,th1-mon003
 mdsmap e154: 1/1/1 up {0=th1-mon001=up:replay}, 1 up:standby
 osdmap e1951: 12 osds: 12 up, 12 in
  pgmap v193685: 492 pgs, 4 pools, 60297 MB data, 470 kobjects
124 GB used, 175 GB / 299 GB avail
 492 active+clean

# ceph osd tree
# idweighttype nameup/downreweight
-10.2399root default
-20.05997host th1-osd001
00.01999osd.0up1
10.01999osd.1up1
20.01999osd.2up1
-30.05997host th1-osd002
30.01999osd.3up1
40.01999osd.4up1
50.01999osd.5up1
-40.05997host th1-mon003
60.01999osd.6up1
70.01999osd.7up1
80.01999osd.8up1
-50.05997host th1-mon002
90.01999osd.9up1
100.01999osd.10up1
110.01999osd.11up1

What is the way to get the mds up and running again?

I still have all the placement group directories which I moved from the full 
osds that were down, to free up disk space.



Kind regards,

Jasper Siero
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Cache tiering and CRUSH map

2014-08-18 Thread Michael Kolomiets
Hi
I am trying to use cache tiering and read the topic about mapping OSD
with pools 
(http://ceph.com/docs/master/rados/operations/crush-map/#placing-different-pools-on-different-osds).
I can't understand why the OSDs were split into spinner and SSD types at the root
level of the CRUSH map.

Is it possible to use some location type under the host level to group
OSDs by type and then use it in the mapping rules?
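
As far as I can tell the split at the root level in that example is only a
convention; CRUSH lets you group OSDs under any bucket and point a pool at them
with a rule. A rough sketch along the lines of the linked page (bucket names,
ids, weights and the ruleset number are only examples):

root ssd {
        id -20
        alg straw
        hash 0
        item osd.30 weight 1.000
        item osd.31 weight 1.000
}

rule ssd {
        ruleset 4
        type replicated
        min_size 1
        max_size 10
        step take ssd
        step chooseleaf firstn 0 type osd
        step emit
}

and then: ceph osd pool set <cachepool> crush_ruleset 4. A custom bucket type
underneath host is also possible, but rules of the 'chooseleaf ... type host'
form then need care, since one physical host shows up more than once.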

-- 
Michael Kolomiets
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Ceph Days are back with a vengeance!

2014-08-18 Thread Patrick McGarry
Greetings cephalofolk,

Now that the Ceph Day events are becoming much more of a community
undertaking (as opposed to an Inktank-hosted event), we are really
ramping things up.  There are currently four events planned in the
near future, and we need speakers for all of them!

http://ceph.com/cephdays/

If you are interested in speaking at any of these events just send me
the following:

1) Title
2) Abstract (brief outline of your ceph-related talk)
3) Speaker Name/title
4) Organization/Affiliation (or just Ceph Community if you are
speaking on your own)
5) Event at which you wish to speak


Currently we have openings at the following events:

* 18 SEP 2014 -- Paris, France :: Le Comptoir General Ghetto Museum
* 24 SEP 2014 -- San Jose, CA USA :: Brocade Communication Systems HQ
* 08 OCT 2014 -- New York, NY USA :: Humphrey at the Eventi Hotel
* 22 OCT 2014 -- London, UK :: etc. Venues St Paul's


We obviously love any talks that are Ceph-related, but we're
especially interested in some of the following topics:

* CephFS
* Performance Tuning
* Integrations work
* Crazy experiments
* Large scale deployment/management use cases
* Using embedded object classes

Please let me know if you have questions or concerns before
submitting, but hurry as spots will fill up quickly!  Thanks.


Best Regards,

Patrick McGarry
Director Ceph Community || Red Hat
http://ceph.com  ||  http://community.redhat.com
@scuttlemonkey || @ceph
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph-Deploy Install Error

2014-08-18 Thread Alfredo Deza
Do you have the full paste of the ceph-deploy output?

Tracing the URL, we definitely do not have google-perftools packages for
Wheezy; the full output might help in understanding what is going on.
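
If you do have a repository that carries armhf builds, ceph-deploy can be
pointed at it instead of the ceph.com defaults; a sketch (the URLs are
placeholders):

 ceph-deploy install --repo-url http://your.mirror/debian-firefly --gpg-url http://your.mirror/release.asc ceph1
 # or, if the repo is already configured in the node's apt sources:
 ceph-deploy install --no-adjust-repos ceph1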

On Mon, Aug 11, 2014 at 8:01 PM, joshua Kay scjo...@gmail.com wrote:
 Hi,

 When I attempt to use the ceph-deploy install command on one of my nodes I
 get this error:


 ][WARNIN] W: Failed to fetch
 http://ceph.com/packages/google-perftools/debian/dists/wheezy/main/binary-armhf/Packages
 404  Not Found [IP: 208.113.241.137 80]
 [ceph1][WARNIN]
 [ceph1][WARNIN] E: Some index files failed to download. They have been
 ignored, or old ones used instead.
 [ceph1][ERROR ] RuntimeError: command returned non-zero exit status: 100
 [ceph_deploy][ERROR ] RuntimeError: Failed to execute command: apt-get -q
 update

 Does anyone know the cause of this problem and the solution?

 Thanks,
 Josh

 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Fixed all active+remapped PGs stuck forever (but I have no clue why)

2014-08-18 Thread John Morris


On 08/14/2014 02:35 AM, Christian Balzer wrote:


The default (firefly, but previous ones are functionally identical) crush
map has:
---
# rules
rule replicated_ruleset {
 ruleset 0
 type replicated
 min_size 1
 max_size 10
 step take default
 step chooseleaf firstn 0 type host
 step emit
}
---

The type host states that there will be no more than one replica per host
(node), so with size=3 you will need at least 3 hosts to choose from.
If you were to change this to type osd, all 3 replicas could wind up on
the same host, not really a good idea.


Ah, this is a great clue.  (On my cluster, the default rule contains 
'step choose firstn 0 type osd', and thus has the problem you hint at here.)


So I played with a new rule set with the buckets 'root', 'rack', 'host', 
'bank' and 'osd', of which 'rack' and 'host' are unused.  The 'bank' 
bucket:  the OSD nodes each contain two 'banks' of disks with a separate 
disk controller channel, a separate power supply cable, and a separate 
SSD.  Thus, 'bank' actually does represent a real failure domain.  More 
importantly, this provides a bucket level below 'host' that is big enough 
for 3-4 replicas.  Here's the rule:


rule by_bank {
ruleset 3
type replicated
min_size 3
max_size 4
step take default
step choose firstn 0 type bank
step choose firstn 0 type osd
step emit
}

If the OP (sorry, Craig, you do have a name ;) wants to play with CRUSH 
map rules, here's the quick and dirty of what I did:


# get the current 'orig' CRUSH map, decompile and edit; see:
# 
http://ceph.com/docs/master/rados/operations/crush-map/#editing-a-crush-map


ceph osd getcrushmap -o /tmp/crush-orig.bin
crushtool -d /tmp/crush-orig.bin -o /tmp/crush.txt
$EDITOR /tmp/crush.txt

# edit the crush map with your fave editor; see:
# http://ceph.com/docs/master/rados/operations/crush-map
#
# in my case, I added the bank type:

type 0 osd
type 1 bank
type 2 host
type 3 rack
type 4 root

# the banks (repeat as applicable):

bank bank0 {
id -6
alg straw
hash 0
item osd.0 weight 1.000
item osd.1 weight 1.000
}

bank bank1 {
id -7
alg straw
hash 0
item osd.2 weight 1.000
item osd.3 weight 1.000
}

# updated the hosts (repeat as applicable):

host host0 {
id -4   # do not change unnecessarily
# weight 3.000
alg straw
hash 0  # rjenkins1
item bank0 weight 2.000
item bank1 weight 2.000
}

# and added the rule:

rule by_bank {
ruleset 3
type replicated
min_size 3
max_size 4
step take default
step choose firstn 0 type bank
step choose firstn 0 type osd
step emit
}

# compile the crush map:

crushtool -c /tmp/crush.txt -o /tmp/crush-new.bin

# and run some tests; the replica sizes tested come from
# 'min_size' and 'max_size' in the above rule; see:
# http://ceph.com/docs/master/man/8/crushtool/#running-tests-with-test
#
# show sample PG-OSD maps:

crushtool -i /tmp/crush-new.bin --test --show-statistics

# show bad mappings; if the CRUSH map is correct,
# this should be empty:

crushtool -i /tmp/crush-new.bin --test --show-bad-mappings

# show per-OSD pg utilization:

crushtool -i /tmp/crush-new.bin --test --show-utilization



You might finagle something like that (again, the rule splits on hosts) by
having multiple hosts on one physical machine, but therein lies madness.


Well, the bucket names can be changed, as above, and Sage hints at doing 
something like this here:


http://wiki.ceph.com/Planning/Blueprints/Dumpling/extend_crush_rule_language

(And IIUC he also proposes something to implement my original 
intentions:  distribute four replicas, two on each of two racks, and 
don't put two replicas on the same host within a rack; this is more 
easily generalized than the above funky configuration.)
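
(Until then, the stock rule language can already express the two-racks,
two-hosts-each layout for four replicas; a sketch, untested:

rule two_per_rack {
        ruleset 4
        type replicated
        min_size 4
        max_size 4
        step take default
        step choose firstn 2 type rack
        step chooseleaf firstn 2 type host
        step emit
}

'choose firstn 2 type rack' picks two racks, and 'chooseleaf firstn 2 type
host' then picks two distinct hosts, and an OSD under each, within each rack.)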


John
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-deploy error

2014-08-18 Thread Alfredo Deza
Oh yes, we don't have ARM packages for wheezy.



On Mon, Aug 11, 2014 at 7:12 PM, joshua Kay scjo...@gmail.com wrote:
 Hi,



 I am running into an error when I am attempting to use ceph-deploy install
 when creating my cluster. I am attempting to run ceph on Debian 7.0 wheezy
 with an ARM processor. When I attempt to run ceph-deploy install I get the
 following errors:



 [ceph1][WARNIN] E: Unable to locate package ceph

 [ceph1][WARNIN] E: Unable to locate package ceph-mds

 [ceph1][WARNIN] E: Unable to locate package ceph-common

 [ceph1][WARNIN] E: Unable to locate package ceph-fs-common

 [ceph1][ERROR ] RuntimeError: command returned non-zero exit status: 100

 [ceph_deploy][ERROR ] RuntimeError: Failed to execute command: env
 DEBIAN_FRONTEND=noninteractive DEBIAN_PRIORITY=critical apt-get -q -o
 Dpkg::Options::=--force-confnew --no-install-recommends --assume-yes install
 -- ceph ceph-mds ceph-common ceph-fs-common gdisk



 I am assuming I do not have all the packages required for debian wheezy, but
 I have tried to set up a repository and manually insert the packages from
 this documentation: http://ceph.com/docs/master/install/get-packages/



 Does anyone know the issue or the proper way to set up a repository for a
 Debian wheezy system?



 Thanks


 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] HEALTH_WARN 4 pgs incomplete; 4 pgs stuck inactive; 4 pgs stuck unclean

2014-08-18 Thread Craig Lewis
I take it that OSD 8, 13, and 20 are some of the stopped OSDs.

I wasn't able to get ceph to execute ceph pg force_create until the OSDs in
[recovery_state][probing_osds] from ceph pg query were online.  I ended up
reformatting most of them, and re-adding them to the cluster.

What's wrong with those OSDs?  How slow are they?  If the problem is just
that they're really slow, try starting them up, and manually marking them
UP and OUT.  That way Ceph will read from them, but not write to them.  If
they won't stay up, I'd replace them, and get the replacements back in the
cluster.  I'd leave the replacements UP and OUT.  You can rebalance later,
after the cluster is healthy again.
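
Roughly, that would be something like the following (osd.8 is just an
example id; adjust the id and the init system call for your setup):

# start the daemon so the OSD reports as UP again:
service ceph start osd.8

# mark it OUT so no new data is mapped to it, while the existing
# replicas can still be read from it during recovery:
ceph osd out 8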



I've never seen the replay state, I'm not sure what to do with that.



On Mon, Aug 18, 2014 at 5:05 AM, Riederer, Michael michael.riede...@br.de
wrote:

 What has changed in the cluster compared to my first mail: the cluster was
 able to repair one pg, but now has a different pg in status
 active+clean+replay

 root@ceph-admin-storage:~# ceph pg dump | grep ^2.92
 dumped all in format plain
 2.920000000active+clean2014-08-18
 10:37:20.9628580'036830:577[8,13]8[8,13]80'0
 2014-08-18 10:37:20.96272813503'13904192014-08-14 10:37:12.497492
 root@ceph-admin-storage:~# ceph pg dump | grep replay
 dumped all in format plain
 0.49a0000000active+clean+replay
 2014-08-18 13:09:15.3172210'036830:1704[12,10]12
 [12,10]120'02014-08-18 13:09:15.3171310'02014-08-18
 13:09:15.317131

 Mike

  --
 *Von:* ceph-users [ceph-users-boun...@lists.ceph.com] im Auftrag von
 Riederer, Michael [michael.riede...@br.de]
 *Gesendet:* Montag, 18. August 2014 13:40
 *An:* Craig Lewis
 *Cc:* ceph-users@lists.ceph.com; Karan Singh

 *Betreff:* Re: [ceph-users] HEALTH_WARN 4 pgs incomplete; 4 pgs stuck
 inactive; 4 pgs stuck unclean

   Hi Craig,

 I brought the cluster into a stable condition. All slow osds are no longer in
 the cluster. All remaining 36 osds can be written at more than 100 MB/sec
 (dd if=/dev/zero of=testfile-2.txt bs=1024 count=4096000). No ceph client
 is connected to the cluster. The ceph nodes are idle. Now the state looks
 as follows:

 root@ceph-admin-storage:~# ceph -s
 cluster 6b481875-8be5-4508-b075-e1f660fd7b33
  health HEALTH_WARN 3 pgs down; 3 pgs incomplete; 3 pgs stuck
 inactive; 3 pgs stuck unclean
  monmap e2: 3 mons at {ceph-1-storage=
 10.65.150.101:6789/0,ceph-2-storage=10.65.150.102:6789/0,ceph-3-storage=10.65.150.103:6789/0},
 election epoch 5018, quorum 0,1,2
 ceph-1-storage,ceph-2-storage,ceph-3-storage
  osdmap e36830: 36 osds: 36 up, 36 in
   pgmap v10907190: 6144 pgs, 3 pools, 10997 GB data, 2760 kobjects
 22051 GB used, 68206 GB / 90258 GB avail
 6140 active+clean
3 down+incomplete
1 active+clean+replay

 root@ceph-admin-storage:~# ceph health detail
 HEALTH_WARN 3 pgs down; 3 pgs incomplete; 3 pgs stuck inactive; 3 pgs
 stuck unclean
 pg 2.c1 is stuck inactive since forever, current state down+incomplete,
 last acting [13,8]
 pg 2.e3 is stuck inactive since forever, current state down+incomplete,
 last acting [20,8]
 pg 2.587 is stuck inactive since forever, current state down+incomplete,
 last acting [13,8]
 pg 2.c1 is stuck unclean since forever, current state down+incomplete,
 last acting [13,8]
 pg 2.e3 is stuck unclean since forever, current state down+incomplete,
 last acting [20,8]
 pg 2.587 is stuck unclean since forever, current state down+incomplete,
 last acting [13,8]
 pg 2.587 is down+incomplete, acting [13,8]
 pg 2.e3 is down+incomplete, acting [20,8]
 pg 2.c1 is down+incomplete, acting [13,8]

 I have tried the following:

 root@ceph-admin-storage:~# ceph pg scrub 2.587
 instructing pg 2.587 on osd.13 to scrub
 root@ceph-admin-storage:~# ceph pg scrub 2.e3
 ^[[Ainstructing pg 2.e3 on osd.20 to scrub
 root@ceph-admin-storage:~# ceph pg scrub 2.c1
 instructing pg 2.c1 on osd.13 to scrub

 root@ceph-admin-storage:~# ceph pg deep-scrub 2.587
 instructing pg 2.587 on osd.13 to deep-scrub
 root@ceph-admin-storage:~# ceph pg deep-scrub 2.e3
 instructing pg 2.e3 on osd.20 to deep-scrub
 root@ceph-admin-storage:~# ceph pg deep-scrub 2.c1
 instructing pg 2.c1 on osd.13 to deep-scrub

 root@ceph-admin-storage:~# ceph pg repair 2.587
 instructing pg 2.587 on osd.13 to repair
 root@ceph-admin-storage:~# ceph pg repair 2.e3
 instructing pg 2.e3 on osd.20 to repair
 root@ceph-admin-storage:~# ceph pg repair 2.c1
 instructing pg 2.c1 on osd.13 to repair

 In the monitor logfiles (ceph-mon.ceph-1/2/3-storage.log) I see the pg
 scrub, pg deep-scrub and pg repair commands, but I do not see anything in
 ceph.log and nothing in the ceph-osd.13/20/8.log.
 (2014-08-18 13:24:49.337954 7f24ac111700  0 mon.ceph-1-storage@0(leader)
 e2 handle_command 

Re: [ceph-users] Fixed all active+remapped PGs stuck forever (but I have no clue why)

2014-08-18 Thread John Morris



On 08/18/2014 12:13 PM, John Morris wrote:


On 08/14/2014 02:35 AM, Christian Balzer wrote:


The default (firefly, but previous ones are functionally identical) crush
map has:
---
# rules
rule replicated_ruleset {
 ruleset 0
 type replicated
 min_size 1
 max_size 10
 step take default
 step chooseleaf firstn 0 type host
 step emit
}
---

The type host states that there will be no more than one replica per host
(node), so with size=3 you will need at least 3 hosts to choose from.
If you were to change this to type OSD, all 3 replicas could wind up on
the same host, not really a good idea.


Ah, this is a great clue.  (On my cluster, the default rule contains
'step choose firstn 0 type osd', and thus has the problem you hint at
here.)

So I played with a new rule set with the buckets 'root', 'rack', 'host',
'bank' and 'osd', of which 'rack' and 'host' are unused.  The 'bank'
bucket:  the OSD nodes each contain two 'banks' of disks with a separate
disk controller channel, a separate power supply cable, and a separate
SSD.  Thus, 'bank' actually does represent a real failure domain.  More
importantly, this provides a bucket level below 'osd' that is big enough
for 3-4 replicas.  Here's the rule:

rule by_bank {
 ruleset 3
 type replicated
 min_size 3
 max_size 4
 step take default
 step choose firstn 0 type bank
 step choose firstn 0 type osd
 step emit
}


Ah, with the 'legacy' tunables, a 'chooseleaf' version of the above rule 
generates bad mappings.  But by injecting tunables into the map 
(recommended in the link below), the rule can be shortened to the following:


rule by_bank {
ruleset 3
type replicated
min_size 3
max_size 4
step take default
step chooseleaf firstn 0 type bank
step emit
}

See this link:

http://ceph.com/docs/master/rados/operations/crush-map/#tuning-crush-the-hard-way

Below, after re-compiling the new CRUSH map, but before running tests, 
inject the tunables into the binary map, and then run the tests on 
/tmp/crush-new-tuned.bin instead:


crushtool --enable-unsafe-tunables \
  --set-choose-local-tries 0 \
  --set-choose-local-fallback-tries 0 \
  --set-choose-total-tries 50 \
  -i /tmp/crush-new.bin -o /tmp/crush-new-tuned.bin
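
(Side note, untested on a live cluster: those three values are part of what
the docs call the 'bobtail' tunables profile, which additionally sets
chooseleaf_descend_once 1.  So with new-enough clients the online
equivalent is probably just:

ceph osd crush tunables bobtail

rather than hand-editing the binary map.)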



If the OP (sorry, Craig, you do have a name ;) wants to play with CRUSH
map rules, here's the quick and dirty of what I did:

# get the current 'orig' CRUSH map, decompile and edit; see:
#
http://ceph.com/docs/master/rados/operations/crush-map/#editing-a-crush-map

ceph osd getcrushmap -o /tmp/crush-orig.bin
crushtool -d /tmp/crush-orig.bin -o /tmp/crush.txt
$EDITOR /tmp/crush.txt

# edit the crush map with your fave editor; see:
# http://ceph.com/docs/master/rados/operations/crush-map
#
# in my case, I added the bank type:

type 0 osd
type 1 bank
type 2 host
type 3 rack
type 4 root

# the banks (repeat as applicable):

bank bank0 {
 id -6
 alg straw
 hash 0
 item osd.0 weight 1.000
 item osd.1 weight 1.000
}

bank bank1 {
 id -7
 alg straw
 hash 0
 item osd.2 weight 1.000
 item osd.3 weight 1.000
}

# updated the hosts (repeat as applicable):

host host0 {
 id -4   # do not change unnecessarily
 # weight 3.000
 alg straw
 hash 0  # rjenkins1
 item bank0 weight 2.000
 item bank1 weight 2.000
}

# and added the rule:

rule by_bank {
 ruleset 3
 type replicated
 min_size 3
 max_size 4
 step take default
 step choose firstn 0 type bank
 step choose firstn 0 type osd
 step emit
}

# compile the crush map:

crushtool -c /tmp/crush.txt -o /tmp/crush-new.bin

# and run some tests; the replica sizes tested come from
# 'min_size' and 'max_size' in the above rule; see:
# http://ceph.com/docs/master/man/8/crushtool/#running-tests-with-test
#
# show sample PG-OSD maps:

crushtool -i /tmp/crush-new.bin --test --show-statistics

# show bad mappings; if the CRUSH map is correct,
# this should be empty:

crushtool -i /tmp/crush-new.bin --test --show-bad-mappings

# show per-OSD pg utilization:

crushtool -i /tmp/crush-new.bin --test --show-utilization



You might finagle something like that (again the rule splits on
hosts) by
having multiple hosts on one physical machine, but therein lies
madness.


Well, the bucket names can be changed, as above, and Sage hints at doing
something like this here:

http://wiki.ceph.com/Planning/Blueprints/Dumpling/extend_crush_rule_language


(And IIUC he also proposes something to implement my original
intentions:  distribute four replicas, two on each of two racks, and
don't put two replicas on the same host within a rack; this is more
easily generalized than the above funky configuration.)

 John

___

Re: [ceph-users] Fixed all active+remapped PGs stuck forever (but I have no clue why)

2014-08-18 Thread Sage Weil
On Mon, 18 Aug 2014, John Morris wrote:
 rule by_bank {
 ruleset 3
 type replicated
 min_size 3
 max_size 4
 step take default
 step choose firstn 0 type bank
 step choose firstn 0 type osd
 step emit
 }

You probably want:

 step choose firstn 0 type bank
 step choose firstn 1 type osd

I.e., 3 (or 4) banks, and 1 osd in each.. not 3 banks with 3 osds in each 
or 4 banks with 4 osds in each (for a total of 9 or 16 OSDs).
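
Folding that in, the whole rule would then read roughly:

rule by_bank {
        ruleset 3
        type replicated
        min_size 3
        max_size 4
        step take default
        step choose firstn 0 type bank
        step choose firstn 1 type osd
        step emit
}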

sage
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Managing OSDs on twin machines

2014-08-18 Thread Pierre Jaury
Hello guys,

I just acquired some brand new machines I would like to rely upon for a
storage cluster (and some virtualization). These machines are, however,
« twin servers », i.e. each blade (1U) comes with two different machines
but a single PSU.

I think two replicas would be enough for the intended purpose. Yet I
cannot guarantee that all replicas of a given object are stored on two
different blades.

I basically have N blades, each blade has 2 distinct machines but a
single psu, each machine has 2 hard drives. Is it possible to configure
mutual exclusion between OSDs where replicas of a single object are stored?

Regards

-- 
Pierre Jaury @ kaiyou
http://kaiyou.fr/contact.html



signature.asc
Description: OpenPGP digital signature
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Managing OSDs on twin machines

2014-08-18 Thread Jason Harley
Hi Pierre —

You can manipulate your CRUSH map to make use of ‘chassis’ in addition to the 
default ‘host’ type.  I’ve done this with FatTwin and FatTwin^2 boxes with 
great success.

For more reading take a look at: 
http://ceph.com/docs/master/rados/operations/crush-map/

In particular the ‘Move a Bucket’ section: 
http://ceph.com/docs/master/rados/operations/crush-map/#move-a-bucket
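
As a rough sketch (the bucket names 'blade01', 'node01a', 'node01b' below are
placeholders for your own blades and hosts), the idea is:

# create a chassis bucket per twin blade and move the two host
# buckets underneath it:
ceph osd crush add-bucket blade01 chassis
ceph osd crush move blade01 root=default
ceph osd crush move node01a chassis=blade01
ceph osd crush move node01b chassis=blade01

# then make the replication rule separate replicas by chassis rather
# than host, i.e. in the decompiled CRUSH map:
#   step chooseleaf firstn 0 type chassis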

./JRH

On Aug 18, 2014, at 2:57 PM, Pierre Jaury pie...@jaury.eu wrote:

 Hello guys,
 
 I just acquired some brand new machines I would like to rely upon for a
 storage cluster (and some virtualization). These machines are, however,
 « twin servers », ie. each blade (1U) comes with two different machines
 but a single psu.
 
 I think two replicas would be enough for the intended purpose. Yet I
 cannot guarantee that all replicas of a given object are stored on two
 different blades.
 
 I basically have N blades, each blade has 2 distinct machines but a
 single psu, each machine has 2 hard drives. Is it possible to configure
 mutual exclusion between OSDs where replicas of a single object are stored?
 
 Regards
 
 -- 
 Pierre Jaury @ kaiyou
 http://kaiyou.fr/contact.html
 
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Fixed all active+remapped PGs stuck forever (but I have no clue why)

2014-08-18 Thread John Morris



On 08/18/2014 01:49 PM, Sage Weil wrote:

On Mon, 18 Aug 2014, John Morris wrote:

rule by_bank {
 ruleset 3
 type replicated
 min_size 3
 max_size 4
 step take default
 step choose firstn 0 type bank
 step choose firstn 0 type osd
 step emit
}


You probably want:

  step choose firstn 0 type bank
  step choose firstn 1 type osd

I.e., 3 (or 4) banks, and 1 osd in each.. not 3 banks with 3 osds in each
or 4 banks with 4 osds in each (for a total of 9 or 16 OSDs).


Yes, thanks.  Funny, testing still works with the incorrect version, and 
the --show-utilization test results look similar.


In re. to my last email about tunables, those can also be expressed in 
the human-readable map as such:


tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50

John




sage
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] v0.84 released

2014-08-18 Thread Sage Weil
The next Ceph development release is here!  This release contains several 
meaty items, including some MDS improvements for journaling, the ability 
to remove the CephFS file system (and name it), several mon cleanups with 
tiered pools, several OSD performance branches, a new read forward RADOS 
caching mode, a prototype Kinetic OSD backend, and various radosgw 
improvements (especially with the new standalone civetweb frontend).  And 
there are a zillion OSD bug fixes. Things are looking pretty good for the 
Giant release that is coming up in the next month.

Upgrading
---------

* The *_kb perf counters on the monitor have been removed.  These are
  replaced with a new set of *_bytes counters (e.g., cluster_osd_kb is
  replaced by cluster_osd_bytes).

* The rd_kb and wr_kb fields in the JSON dumps for pool stats (accessed via
  the 'ceph df detail -f json-pretty' and related commands) have been replaced
  with corresponding *_bytes fields.  Similarly, the 'total_space', 
'total_used',
  and 'total_avail' fields are replaced with 'total_bytes', 
  'total_used_bytes', and 'total_avail_bytes' fields.

* The 'rados df --format=json' output 'read_bytes' and 'write_bytes'
  fields were incorrectly reporting ops; this is now fixed.

* The 'rados df --format=json' output previously included 'read_kb' and
  'write_kb' fields; these have been removed.  Please use 'read_bytes' and
  'write_bytes' instead (and divide by 1024 if appropriate).

Notable Changes
---------------

* ceph-conf: flush log on exit (Sage Weil)
* ceph-dencoder: refactor build a bit to limit dependencies (Sage Weil, 
  Dan Mick)
* ceph.spec: split out ceph-common package, other fixes (Sandon Van Ness)
* ceph_test_librbd_fsx: fix RNG, make deterministic (Ilya Dryomov)
* cephtool: refactor and improve CLI tests (Joao Eduardo Luis)
* client: improved MDS session dumps (John Spray)
* common: fix dup log messages (#9080, Sage Weil)
* crush: include new tunables in dump (Sage Weil)
* crush: only require rule features if the rule is used (#8963, Sage Weil)
* crushtool: send output to stdout, not stderr (Wido den Hollander)
* fix i386 builds (Sage Weil)
* fix struct vs class inconsistencies (Thorsten Behrens)
* hadoop: update hadoop tests for Hadoop 2.0 (Haumin Chen)
* librbd, ceph-fuse: reduce cache flush overhead (Haomai Wang)
* librbd: fix error path when opening image (#8912, Josh Durgin)
* mds: add file system name, enabled flag (John Spray)
* mds: boot refactor, cleanup (John Spray)
* mds: fix journal conversion with standby-replay (John Spray)
* mds: separate inode recovery queue (John Spray)
* mds: session ls, evict commands (John Spray)
* mds: submit log events in async thread (Yan, Zheng)
* mds: use client-provided timestamp for user-visible file metadata (Yan, 
  Zheng)
* mds: validate journal header on load and save (John Spray)
* misc build fixes for OS X (John Spray)
* misc integer size cleanups (Kevin Cox)
* mon: add get-quota commands (Joao Eduardo Luis)
* mon: do not create file system by default (John Spray)
* mon: fix 'ceph df' output for available space (Xiaoxi Chen)
* mon: fix bug when no auth keys are present (#8851, Joao Eduardo Luis)
* mon: fix compat version for MForward (Joao Eduardo Luis)
* mon: restrict some pool properties to tiered pools (Joao Eduardo Luis)
* msgr: misc locking fixes for fast dispatch (#8891, Sage Weil)
* osd: add 'dump_reservations' admin socket command (Sage Weil)
* osd: add READFORWARD caching mode (Luis Pabon)
* osd: add header cache for KeyValueStore (Haomai Wang)
* osd: add prototype KineticStore based on Seagate Kinetic (Josh Durgin)
* osd: allow map cache size to be adjusted at runtime (Sage Weil)
* osd: avoid refcounting overhead by passing a few things by ref (Somnath 
  Roy)
* osd: avoid sharing PG info that is not durable (Samuel Just)
* osd: clear slow request latency info on osd up/down (Sage Weil)
* osd: fix PG object listing/ordering bug (Guang Yang)
* osd: fix PG stat errors with tiering (#9082, Sage Weil)
* osd: fix bug with long object names and rename (#8701, Sage Weil)
* osd: fix cache full - not full requeueing (#8931, Sage Weil)
* osd: fix gating of messages from old OSD instances (Greg Farnum)
* osd: fix memstore bugs with collection_move_rename, lock ordering (Sage 
  Weil)
* osd: improve locking for KeyValueStore (Haomai Wang)
* osd: make tiering behave if hit_sets aren't enabled (Sage Weil)
* osd: mark pools with incomplete clones (Sage Weil)
* osd: misc locking fixes for fast dispatch (Samuel Just, Ma Jianpeng)
* osd: prevent old rados clients from using tiered pools (#8714, Sage 
  Weil)
* osd: reduce OpTracker overhead (Somnath Roy)
* osd: set configurable hard limits on object and xattr names (Sage Weil, 
  Haomai Wang)
* osd: trim old EC objects quickly; verify on scrub (Samuel Just)
* osd: work around GCC 4.8 bug in journal code (Matt Benjamin)
* rados bench: fix arg order (Kevin Dalley)
* rados: fix {read,write}_ops values for df output (Sage Weil)
* rbd: add rbdmap pre- 

[ceph-users] cephfs set_layout / setfattr ... does not work anymore for pools

2014-08-18 Thread Kasper Dieter
Hi Sage,

a couple of months ago (maybe last year) I was able to change the
assignment of directories and files of CephFS to different pools 
back and forth (with cephfs set_layout as well as with setfattr).

Now (with ceph v0.81 and Kernel 3.10 on the client side)
neither 'cephfs set_layout' nor 'setfattr' works anymore:

# mount | grep ceph
ceph-fuse on /mnt/ceph-fuse type fuse.ceph-fuse 
(rw,nosuid,nodev,allow_other,default_permissions)
192.168.113.52:6789:/ on /mnt/cephfs type ceph (name=admin,key=client.admin)

# ls -l /mnt/cephfs
total 0
-rw-r--r-- 1 root root 0 Aug 18 21:06 file
-rw-r--r-- 1 root root 0 Aug 18 21:10 file2
-rw-r--r-- 1 root root 0 Aug 18 21:11 file3
drwxr-xr-x 1 root root 0 Aug 18 20:54 sas-r2
drwxr-xr-x 1 root root 0 Aug 18 20:54 ssd-r2

# getfattr -d -m - /mnt/cephfs
getfattr: Removing leading '/' from absolute path names
# file: mnt/cephfs
ceph.dir.entries=5
ceph.dir.files=3
ceph.dir.layout=stripe_unit=4194304 stripe_count=1 object_size=4194304 
pool=SAS-r2
ceph.dir.rbytes=0
ceph.dir.rctime=0.090
ceph.dir.rentries=1
ceph.dir.rfiles=0
ceph.dir.rsubdirs=1
ceph.dir.subdirs=2

# setfattr -n ceph.dir.layout.pool -v SSD-r2 /mnt/cephfs
setfattr: /mnt/cephfs: Invalid argument

# ceph osd dump | grep pool
pool 0 'data' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins 
pg_num 8064 pgp_num 8064 last_change 1 flags hashpspool crash_replay_interval 
45 stripe_width 0
pool 1 'metadata' replicated size 3 min_size 2 crush_ruleset 0 object_hash 
rjenkins pg_num 8064 pgp_num 8064 last_change 1 flags hashpspool stripe_width 0
pool 2 'rbd' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins 
pg_num 8064 pgp_num 8064 last_change 1 flags hashpspool stripe_width 0
pool 3 'SSD-r2' replicated size 2 min_size 2 crush_ruleset 3 object_hash 
rjenkins pg_num 6000 pgp_num 6000 last_change 404 flags hashpspool stripe_width 0
pool 4 'SAS-r2' replicated size 2 min_size 2 crush_ruleset 4 object_hash 
rjenkins pg_num 6000 pgp_num 6000 last_change 408 flags hashpspool stripe_width 0
pool 5 'SSD-r3' replicated size 3 min_size 2 crush_ruleset 3 object_hash 
rjenkins pg_num 4000 pgp_num 4000 last_change 413 flags hashpspool stripe_width 0
pool 6 'SAS-r3' replicated size 3 min_size 2 crush_ruleset 4 object_hash 
rjenkins pg_num 4000 pgp_num 4000 last_change 416 flags hashpspool stripe_width 0

# getfattr -d -m - /mnt/cephfs/ssd-r2
getfattr: Removing leading '/' from absolute path names
# file: mnt/cephfs/ssd-r2
ceph.dir.entries=0
ceph.dir.files=0
ceph.dir.rbytes=0
ceph.dir.rctime=0.090
ceph.dir.rentries=1
ceph.dir.rfiles=0
ceph.dir.rsubdirs=1

# setfattr -n ceph.dir.layout.pool -v SSD-r2 /mnt/cephfs/ssd-r2
setfattr: /mnt/cephfs/ssd-r2: Invalid argument

# cephfs /mnt/cephfs/ssd-r2   set_layout -p 3 -s 4194304 -u 4194304 -c 1
Error setting layout: (22) Invalid argument


Any recommendations ?
Is this a bug, or a new feature ?
Do I have to use a newer Kernel ?


Kind Regards,
-Dieter



On Sat, Aug 31, 2013 at 02:26:48AM +0200, Sage Weil wrote:
 On Fri, 30 Aug 2013, Joao Pedras wrote:
  
  Greetings all!
  
  I am bumping into a small issue and I am wondering if someone has any
  insight on it.
  
  I am trying to use a pool other than 'data' for cephfs. Said pool has id #3
  and I have run 'ceph mds add_data_pool 3'.
  
  After mounting cephfs seg faults when trying to set the layout:
  
  $ cephfs /path set_layout -p 3
  
  Segmentation fault
  
  Actually plainly running 'cephfs /path set_layout' without more options will
  seg fault as well.
  
  Version is 0.61.8 on ubuntu 12.04.
  
  A question that comes to mind here is if there is a way of accomplishing
  this when using ceph-fuse (3.x kernels).
 
 You can adjust this more easily using the xattr interface:
 
  getfattr -n ceph.dir.layout dir
  setfattr -n ceph.dir.layout.pool -v mypool
  getfattr -n ceph.dir.layout dir
 
 The interface tests are probably a decent reference given this isn't 
 explicitly documented anywhere:
 
  https://github.com/ceph/ceph/blob/master/qa/workunits/misc/layout_vxattrs.sh
 
 sage
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] cephfs set_layout / setfattr ... does not work anymore for pools

2014-08-18 Thread Sage Weil
Hi Dieter,

There is a new xattr based interface.  See


https://github.com/ceph/ceph/blob/master/qa/workunits/fs/misc/layout_vxattrs.sh

The nice part about this interface is no new tools are necessary (just 
standard 'attr' or 'setfattr' commands) and it is the same with both 
ceph-fuse and the kernel client.

sage


On Mon, 18 Aug 2014, Kasper Dieter wrote:

 Hi Sage,
 
 a couple of months ago (maybe last year) I was able to change the
 assignment of Directorlies and Files of CephFS to different pools 
 back and forth (with cephfs set_layout as well as with setfattr).
 
 Now (with ceph v0.81 and Kernel 3.10 an the client side)
 neither 'cephfs set_layout' nor 'setfattr' works anymore:
 
 # mount | grep ceph
 ceph-fuse on /mnt/ceph-fuse type fuse.ceph-fuse 
 (rw,nosuid,nodev,allow_other,default_permissions)
 192.168.113.52:6789:/ on /mnt/cephfs type ceph (name=admin,key=client.admin)
 
 # ls -l /mnt/cephfs
 total 0
 -rw-r--r-- 1 root root 0 Aug 18 21:06 file
 -rw-r--r-- 1 root root 0 Aug 18 21:10 file2
 -rw-r--r-- 1 root root 0 Aug 18 21:11 file3
 drwxr-xr-x 1 root root 0 Aug 18 20:54 sas-r2
 drwxr-xr-x 1 root root 0 Aug 18 20:54 ssd-r2
 
 # getfattr -d -m - /mnt/cephfs
 getfattr: Removing leading '/' from absolute path names
 # file: mnt/cephfs
 ceph.dir.entries=5
 ceph.dir.files=3
 ceph.dir.layout=stripe_unit=4194304 stripe_count=1 object_size=4194304 
 pool=SAS-r2
 ceph.dir.rbytes=0
 ceph.dir.rctime=0.090
 ceph.dir.rentries=1
 ceph.dir.rfiles=0
 ceph.dir.rsubdirs=1
 ceph.dir.subdirs=2
 
 # setfattr -n ceph.dir.layout.pool -v SSD-r2 /mnt/cephfs
 setfattr: /mnt/cephfs: Invalid argument
 
 # ceph osd dump | grep pool
 pool 0 'data' replicated size 3 min_size 2 crush_ruleset 0 object_hash 
 rjenkins pg_num 8064 pgp_num 8064 last_change 1 flags hashpspool 
 crash_replay_interval 45 stripe_width 0
 pool 1 'metadata' replicated size 3 min_size 2 crush_ruleset 0 object_hash 
 rjenkins pg_num 8064 pgp_num 8064 last_change 1 flags hashpspool stripe_width  0
 pool 2 'rbd' replicated size 3 min_size 2 crush_ruleset 0 object_hash 
 rjenkins pg_num 8064 pgp_num 8064 last_change 1 flags hashpspool stripe_width  0
 pool 3 'SSD-r2' replicated size 2 min_size 2 crush_ruleset 3 object_hash 
 rjenkins pg_num 6000 pgp_num 6000 last_change 404 flags hashpspool 
 stripe_width 0
 pool 4 'SAS-r2' replicated size 2 min_size 2 crush_ruleset 4 object_hash 
 rjenkins pg_num 6000 pgp_num 6000 last_change 408 flags hashpspool 
 stripe_width 0
 pool 5 'SSD-r3' replicated size 3 min_size 2 crush_ruleset 3 object_hash 
 rjenkins pg_num 4000 pgp_num 4000 last_change 413 flags hashpspool 
 stripe_width 0
 pool 6 'SAS-r3' replicated size 3 min_size 2 crush_ruleset 4 object_hash 
 rjenkins pg_num 4000 pgp_num 4000 last_change 416 flags hashpspool 
 stripe_width 0
 
 # getfattr -d -m - /mnt/cephfs/ssd-r2
 getfattr: Removing leading '/' from absolute path names
 # file: mnt/cephfs/ssd-r2
 ceph.dir.entries=0
 ceph.dir.files=0
 ceph.dir.rbytes=0
 ceph.dir.rctime=0.090
 ceph.dir.rentries=1
 ceph.dir.rfiles=0
 ceph.dir.rsubdirs=1
 
 # setfattr -n ceph.dir.layout.pool -v SSD-r2 /mnt/cephfs/ssd-r2
 setfattr: /mnt/cephfs/ssd-r2: Invalid argument
 
 # cephfs /mnt/cephfs/ssd-r2   set_layout -p 3 -s 4194304 -u 4194304 -c 1
 Error setting layout: (22) Invalid argument
 
 
 Any recommendations ?
 Is this a bug, or a new feature ?
 Do I have to use a newer Kernel ?
 
 
 Kind Regards,
 -Dieter
 
 
 
 On Sat, Aug 31, 2013 at 02:26:48AM +0200, Sage Weil wrote:
  On Fri, 30 Aug 2013, Joao Pedras wrote:
   
   Greetings all!
   
   I am bumping into a small issue and I am wondering if someone has any
   insight on it.
   
   I am trying to use a pool other than 'data' for cephfs. Said pool has id 
   #3
   and I have run 'ceph mds add_data_pool 3'.
   
   After mounting cephfs seg faults when trying to set the layout:
   
   $ cephfs /path set_layout -p 3
   
   Segmentation fault
   
   Actually plainly running 'cephfs /path set_layout' without more options 
   will
   seg fault as well.
   
   Version is 0.61.8 on ubuntu 12.04.
   
   A question that comes to mind here is if there is a way of accomplishing
   this when using ceph-fuse (3.x kernels).
  
  You can adjust this more easily using the xattr interface:
  
   getfattr -n ceph.dir.layout dir
   setfattr -n ceph.dir.layout.pool -v mypool
   getfattr -n ceph.dir.layout dir
  
  The interface tests are probably a decent reference given this isn't 
  explicitly documented anywhere:
  
   
  https://github.com/ceph/ceph/blob/master/qa/workunits/misc/layout_vxattrs.sh
  
  sage
  ___
  ceph-users mailing list
  ceph-users@lists.ceph.com
  http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 
 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] setfattr ... does not work anymore for pools

2014-08-18 Thread Kasper Dieter
Hi Sage,

I know about the setfattr syntax from

https://github.com/ceph/ceph/blob/master/qa/workunits/fs/misc/layout_vxattrs.sh
=
setfattr -n ceph.dir.layout.pool -v data dir
setfattr -n ceph.dir.layout.pool -v 2 dir

But, in my case it is not working:

[root@rx37-1 ~]# setfattr -n ceph.dir.layout.pool -v 3 /mnt/cephfs/ssd-r2
setfattr: /mnt/cephfs/ssd-r2: Invalid argument

[root@rx37-1 ~]# setfattr -n ceph.dir.layout.pool -v SSD-r2 /mnt/cephfs/ssd-r2
setfattr: /mnt/cephfs/ssd-r2: Invalid argument

[root@rx37-1 ~]# strace setfattr -n ceph.dir.layout.pool -v SSD-r2 
/mnt/cephfs/ssd-r2
(...)
setxattr("/mnt/cephfs/ssd-r2", "ceph.dir.layout.pool", "SSD-r2", 6, 0) = -1 
EINVAL (Invalid argument)

Same with ceph-fuse:
[root@rx37-1 ~]# strace setfattr -n ceph.dir.layout.pool -v SSD-r2 
/mnt/ceph-fuse/ssd-r2
(...)
setxattr("/mnt/ceph-fuse/ssd-r2", "ceph.dir.layout.pool", "SSD-r2", 6, 0) = -1 
EINVAL (Invalid argument)


Setting all layout attributes at once does not work either:
[root@rx37-1 cephfs]# setfattr -n ceph.dir.layout -v "stripe_unit=2097152 
stripe_count=1 object_size=4194304 pool=SSD-r2" /mnt/cephfs/ssd-r2
setfattr: /mnt/cephfs/ssd-r2: Invalid argument


How can I debug this further ?
It seems the Directory has no layout at all:

# getfattr -d -m - /mnt/cephfs/ssd-r2
# file: ssd-r2
ceph.dir.entries=0
ceph.dir.files=0
ceph.dir.rbytes=0
ceph.dir.rctime=0.090
ceph.dir.rentries=1
ceph.dir.rfiles=0
ceph.dir.rsubdirs=1
ceph.dir.subdirs=0


Kind Regards,
-Dieter



On Mon, Aug 18, 2014 at 09:37:39PM +0200, Sage Weil wrote:
 Hi Dieter,
 
 There is a new xattr based interface.  See
 
   
 https://github.com/ceph/ceph/blob/master/qa/workunits/fs/misc/layout_vxattrs.sh
 
 The nice part about this interface is no new tools are necessary (just 
 standard 'attr' or 'setfattr' commands) and it is the same with both 
 ceph-fuse and the kernel client.
 
 sage
 
 
 On Mon, 18 Aug 2014, Kasper Dieter wrote:
 
  Hi Sage,
  
  a couple of months ago (maybe last year) I was able to change the
  assignment of Directorlies and Files of CephFS to different pools 
  back and forth (with cephfs set_layout as well as with setfattr).
  
  Now (with ceph v0.81 and Kernel 3.10 an the client side)
  neither 'cephfs set_layout' nor 'setfattr' works anymore:
  
  # mount | grep ceph
  ceph-fuse on /mnt/ceph-fuse type fuse.ceph-fuse 
  (rw,nosuid,nodev,allow_other,default_permissions)
  192.168.113.52:6789:/ on /mnt/cephfs type ceph (name=admin,key=client.admin)
  
  # ls -l /mnt/cephfs
  total 0
  -rw-r--r-- 1 root root 0 Aug 18 21:06 file
  -rw-r--r-- 1 root root 0 Aug 18 21:10 file2
  -rw-r--r-- 1 root root 0 Aug 18 21:11 file3
  drwxr-xr-x 1 root root 0 Aug 18 20:54 sas-r2
  drwxr-xr-x 1 root root 0 Aug 18 20:54 ssd-r2
  
  # getfattr -d -m - /mnt/cephfs
  getfattr: Removing leading '/' from absolute path names
  # file: mnt/cephfs
  ceph.dir.entries=5
  ceph.dir.files=3
  ceph.dir.layout=stripe_unit=4194304 stripe_count=1 object_size=4194304 
  pool=SAS-r2
  ceph.dir.rbytes=0
  ceph.dir.rctime=0.090
  ceph.dir.rentries=1
  ceph.dir.rfiles=0
  ceph.dir.rsubdirs=1
  ceph.dir.subdirs=2
  
  # setfattr -n ceph.dir.layout.pool -v SSD-r2 /mnt/cephfs
  setfattr: /mnt/cephfs: Invalid argument
  
  # ceph osd dump | grep pool
  pool 0 'data' replicated size 3 min_size 2 crush_ruleset 0 object_hash 
  rjenkins pg_num 8064 pgp_num 8064 last_change 1 flags hashpspool 
  crash_replay_interval 45 stripe_width 0
  pool 1 'metadata' replicated size 3 min_size 2 crush_ruleset 0 object_hash 
  rjenkins pg_num 8064 pgp_num 8064 last_change 1 flags hashpspool 
  stripe_width 0
  pool 2 'rbd' replicated size 3 min_size 2 crush_ruleset 0 object_hash 
  rjenkins pg_num 8064 pgp_num 8064 last_change 1 flags hashpspool 
  stripe_width 0
  pool 3 'SSD-r2' replicated size 2 min_size 2 crush_ruleset 3 object_hash 
  rjenkins pg_num 6000 pgp_num 6000 last_change 404 flags hashpspool 
  stripe_width 0
  pool 4 'SAS-r2' replicated size 2 min_size 2 crush_ruleset 4 object_hash 
  rjenkins pg_num 6000 pgp_num 6000 last_change 408 flags hashpspool 
  stripe_width 0
  pool 5 'SSD-r3' replicated size 3 min_size 2 crush_ruleset 3 object_hash 
  rjenkins pg_num 4000 pgp_num 4000 last_change 413 flags hashpspool 
  stripe_width 0
  pool 6 'SAS-r3' replicated size 3 min_size 2 crush_ruleset 4 object_hash 
  rjenkins pg_num 4000 pgp_num 4000 last_change 416 flags hashpspool 
  stripe_width 0
  
  # getfattr -d -m - /mnt/cephfs/ssd-r2
  getfattr: Removing leading '/' from absolute path names
  # file: mnt/cephfs/ssd-r2
  ceph.dir.entries=0
  ceph.dir.files=0
  ceph.dir.rbytes=0
  ceph.dir.rctime=0.090
  ceph.dir.rentries=1
  ceph.dir.rfiles=0
  ceph.dir.rsubdirs=1
  
  # setfattr -n ceph.dir.layout.pool -v SSD-r2 /mnt/cephfs/ssd-r2
  setfattr: /mnt/cephfs/ssd-r2: Invalid argument
  
  # cephfs /mnt/cephfs/ssd-r2   set_layout -p 3 -s 4194304 -u 4194304 -c 1
  Error setting layout: (22) Invalid argument
  
  
  Any recommendations ?
  Is this a bug, or a new feature ?
  Do 

Re: [ceph-users] setfattr ... works after 'ceph mds add_data_pool'

2014-08-18 Thread Kasper Dieter
Hi Sage,

it seems the pools must be added to the MDS first:

ceph mds add_data_pool 3    # = SSD-r2
ceph mds add_data_pool 4    # = SAS-r2

After these commands the setfattr -n ceph.dir.layout.pool worked.
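
For anyone hitting the same EINVAL, the whole working sequence was roughly
the following (pool ids 3 and 4 come from my 'ceph osd dump | grep pool'
output earlier in the thread):

# find the pool ids:
ceph osd dump | grep pool

# allow the MDS to place file data in those pools:
ceph mds add_data_pool 3
ceph mds add_data_pool 4

# now the layout xattr is accepted:
setfattr -n ceph.dir.layout.pool -v SSD-r2 /mnt/cephfs/ssd-r2
getfattr -n ceph.dir.layout /mnt/cephfs/ssd-r2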

Thanks,
-Dieter


On Mon, Aug 18, 2014 at 10:19:08PM +0200, Kasper Dieter wrote:
 Hi Sage,
 
 I know about the setattr syntax from
   
 https://github.com/ceph/ceph/blob/master/qa/workunits/fs/misc/layout_vxattrs.sh
 =
 setfattr -n ceph.dir.layout.pool -v data dir
 setfattr -n ceph.dir.layout.pool -v 2 dir
 
 But, in my case it is not working:
 
 [root@rx37-1 ~]# setfattr -n ceph.dir.layout.pool -v 3 /mnt/cephfs/ssd-r2
 setfattr: /mnt/cephfs/ssd-r2: Invalid argument
 
 [root@rx37-1 ~]# setfattr -n ceph.dir.layout.pool -v SSD-r2 /mnt/cephfs/ssd-r2
 setfattr: /mnt/cephfs/ssd-r2: Invalid argument
 
 [root@rx37-1 ~]# strace setfattr -n ceph.dir.layout.pool -v SSD-r2 
 /mnt/cephfs/ssd-r2
 (...)
 setxattr(/mnt/cephfs/ssd-r2, ceph.dir.layout.pool, SSD-r2, 6, 0) = -1 
 EINVAL (Invalid argument)
 
 Same with ceph-fuse:
 [root@rx37-1 ~]# strace setfattr -n ceph.dir.layout.pool -v SSD-r2 
 /mnt/ceph-fuse/ssd-r2
 (...)
 setxattr(/mnt/ceph-fuse/ssd-r2, ceph.dir.layout.pool, SSD-r2, 6, 0) = 
 -1 EINVAL (Invalid argument)
 
 
 Setting all layout attribute at once does not work either:
 [root@rx37-1 cephfs]# setfattr -n ceph.dir.layout -v stripe_unit=2097152 
 stripe_count=1 object_size=4194304 pool=SSD-r2 /mnt/cephfs/ssd-r2
 setfattr: /mnt/cephfs/ssd-r2: Invalid argument
 
 
 How can I debug this further ?
 It seems the Directory has no layout at all:
 
 # getfattr -d -m - /mnt/cephfs/ssd-r2
 # file: ssd-r2
 ceph.dir.entries=0
 ceph.dir.files=0
 ceph.dir.rbytes=0
 ceph.dir.rctime=0.090
 ceph.dir.rentries=1
 ceph.dir.rfiles=0
 ceph.dir.rsubdirs=1
 ceph.dir.subdirs=0
 
 
 Kind Regards,
 -Dieter
 
 
 
 On Mon, Aug 18, 2014 at 09:37:39PM +0200, Sage Weil wrote:
  Hi Dieter,
  
  There is a new xattr based interface.  See
  
  
  https://github.com/ceph/ceph/blob/master/qa/workunits/fs/misc/layout_vxattrs.sh
  
  The nice part about this interface is no new tools are necessary (just 
  standard 'attr' or 'setfattr' commands) and it is the same with both 
  ceph-fuse and the kernel client.
  
  sage
  
  
  On Mon, 18 Aug 2014, Kasper Dieter wrote:
  
   Hi Sage,
   
   a couple of months ago (maybe last year) I was able to change the
   assignment of Directorlies and Files of CephFS to different pools 
   back and forth (with cephfs set_layout as well as with setfattr).
   
   Now (with ceph v0.81 and Kernel 3.10 an the client side)
   neither 'cephfs set_layout' nor 'setfattr' works anymore:
   
   # mount | grep ceph
   ceph-fuse on /mnt/ceph-fuse type fuse.ceph-fuse 
   (rw,nosuid,nodev,allow_other,default_permissions)
   192.168.113.52:6789:/ on /mnt/cephfs type ceph 
   (name=admin,key=client.admin)
   
   # ls -l /mnt/cephfs
   total 0
   -rw-r--r-- 1 root root 0 Aug 18 21:06 file
   -rw-r--r-- 1 root root 0 Aug 18 21:10 file2
   -rw-r--r-- 1 root root 0 Aug 18 21:11 file3
   drwxr-xr-x 1 root root 0 Aug 18 20:54 sas-r2
   drwxr-xr-x 1 root root 0 Aug 18 20:54 ssd-r2
   
   # getfattr -d -m - /mnt/cephfs
   getfattr: Removing leading '/' from absolute path names
   # file: mnt/cephfs
   ceph.dir.entries=5
   ceph.dir.files=3
   ceph.dir.layout=stripe_unit=4194304 stripe_count=1 object_size=4194304 
   pool=SAS-r2
   ceph.dir.rbytes=0
   ceph.dir.rctime=0.090
   ceph.dir.rentries=1
   ceph.dir.rfiles=0
   ceph.dir.rsubdirs=1
   ceph.dir.subdirs=2
   
   # setfattr -n ceph.dir.layout.pool -v SSD-r2 /mnt/cephfs
   setfattr: /mnt/cephfs: Invalid argument
   
   # ceph osd dump | grep pool
   pool 0 'data' replicated size 3 min_size 2 crush_ruleset 0 object_hash 
   rjenkins pg_num 8064 pgp_num 8064 last_change 1 flags hashpspool 
   crash_replay_interval 45 stripe_width 0
   pool 1 'metadata' replicated size 3 min_size 2 crush_ruleset 0 
   object_hash rjenkins pg_num 8064 pgp_num 8064 last_change 1 flags 
   hashpspool stripe_width 0
   pool 2 'rbd' replicated size 3 min_size 2 crush_ruleset 0 object_hash 
   rjenkins pg_num 8064 pgp_num 8064 last_change 1 flags hashpspool 
   stripe_width 0
   pool 3 'SSD-r2' replicated size 2 min_size 2 crush_ruleset 3 object_hash 
   rjenkins pg_num 6000 pgp_num 6000 last_change 404 flags hashpspool 
   stripe_width 0
   pool 4 'SAS-r2' replicated size 2 min_size 2 crush_ruleset 4 object_hash 
   rjenkins pg_num 6000 pgp_num 6000 last_change 408 flags hashpspool 
   stripe_width 0
   pool 5 'SSD-r3' replicated size 3 min_size 2 crush_ruleset 3 object_hash 
   rjenkins pg_num 4000 pgp_num 4000 last_change 413 flags hashpspool 
   stripe_width 0
   pool 6 'SAS-r3' replicated size 3 min_size 2 crush_ruleset 4 object_hash 
   rjenkins pg_num 4000 pgp_num 4000 last_change 416 flags hashpspool 
   stripe_width 0
   
   # getfattr -d -m - /mnt/cephfs/ssd-r2
   getfattr: Removing leading '/' from absolute path names
   # file: mnt/cephfs/ssd-r2
   ceph.dir.entries=0
   

Re: [ceph-users] Fixed all active+remapped PGs stuck forever (but I have no clue why)

2014-08-18 Thread John Morris



On 08/18/2014 02:20 PM, John Morris wrote:



On 08/18/2014 01:49 PM, Sage Weil wrote:

On Mon, 18 Aug 2014, John Morris wrote:

rule by_bank {
 ruleset 3
 type replicated
 min_size 3
 max_size 4
 step take default
 step choose firstn 0 type bank
 step choose firstn 0 type osd
 step emit
}


You probably want:

  step choose firstn 0 type bank
  step choose firstn 1 type osd

I.e., 3 (or 4) banks, and 1 osd in each.. not 3 banks with 3 osds in each
or 4 banks with 4 osds in each (for a total of 9 or 16 OSDs).


Yes, thanks.  Funny, testing still works with the incorrect version, and
the --show-utilization test results look similar.

In re. to my last email about tunables, those can also be expressed in
the human-readable map as such:

tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50


Wrapping up this exercise:

This little script helps to see exactly where things go, and show what 
goes wrong with my original, incorrect map.


#!/bin/bash
echo "compiling crush map"
crushtool -c /tmp/crush.txt -o /tmp/crush-new.bin \
    --enable-unsafe-tunables
bad=$(crushtool -i /tmp/crush-new.bin --test \
    --show-bad-mappings 2>&1 | \
    wc -l)
echo "number of bad mappings:  $bad"

distribution() {
    crushtool -i /tmp/crush-new.bin --test --show-statistics \
        --num-rep $1 2>&1 | \
        awk '/\[.*\]/ {
            gsub("[][]","",$6);
            split($6,a,",");
            asort(a,d);
            print d[1], d[2], d[3], d[4]; }' | \
        sort | uniq -c
}
for i in 3 4; do
    echo "distribution of size=${i} replicas:"
    distribution $i
done


For --num-rep=4, the result looks like the following; it's easily seen 
that two sets of OSDs in the same bank are always picked, exactly what 
we do NOT want (note OSDs 0+1 in bank0, 2+3 in bank1, etc.):


173 0 1 2 3
176 0 1 4 5
184 0 1 6 7
171 2 3 4 5
156 2 3 6 7
164 4 5 6 7

After Sage's correction, the result looks like the following, with one 
OSD from each bank:


 70 0 2 4 6
 74 0 2 4 7
 65 0 2 5 6
 58 0 2 5 7
 60 0 3 4 6
 72 0 3 4 7
 80 0 3 5 6
 64 0 3 5 7
 48 1 2 4 6
 66 1 2 4 7
 72 1 2 5 6
 46 1 2 5 7
 73 1 3 4 6
 70 1 3 4 7
 51 1 3 5 6
 55 1 3 5 7

When replicas=3, the result is also correct.

So this is a bit of a hack, but it does seem to work to evenly 
distribute 3-4 replicas across a bucket level with only two nodes.  Late 
into this exploration, it appears that if the 'bank' layer is 
undesirable, this also works to distribute evenly across hosts:


step choose firstn 0 type host
step choose firstn 2 type osd

In conclusion, this example doesn't seem so far-fetched, since it's easy 
to imagine wanting to distribute OSDs across two racks, or PDUs, or data 
centers, where it's not so unreasonable to say a third is out of the budget.


John
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] active+remapped after remove osd via ceph osd out

2014-08-18 Thread Dominik Mostowiec
After replacing the broken disk and bringing the ceph osd back in, the cluster shows:
ceph health detail
HEALTH_WARN 2 pgs stuck unclean; recovery 60/346857819 degraded (0.000%)
pg 3.884 is stuck unclean for 570722.873270, current state
active+remapped, last acting [143,261,314]
pg 3.154a is stuck unclean for 577659.917066, current state
active+remapped, last acting [85,224,64]
recovery 60/346857819 degraded (0.000%)

What can be wrong?
Is it possible this is caused by 'ceph osd reweight-by-utilization'?

More info:
ceph -v
ceph version 0.67.9 (ba340a97c3dafc9155023da8d515eecc675c619a)

Enabled tunnables:
# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1

df osd:
143 - 78%
261 - 78%
314 - 80%

85 - 76%
224 - 76%
64 - 75%

ceph osd dump | grep -i pool
pool 0 'data' rep size 3 min_size 1 crush_ruleset 0 object_hash
rjenkins pg_num 64 pgp_num 64 last_change 28459 owner 0
crash_replay_interval 45
pool 1 'metadata' rep size 3 min_size 1 crush_ruleset 1 object_hash
rjenkins pg_num 64 pgp_num 64 last_change 28460 owner 0
pool 2 'rbd' rep size 3 min_size 1 crush_ruleset 2 object_hash
rjenkins pg_num 64 pgp_num 64 last_change 28461 owner 0
pool 3 '.rgw.buckets' rep size 3 min_size 1 crush_ruleset 0
object_hash rjenkins pg_num 8192 pgp_num 8192 last_change 73711 owner
0
pool 4 '.log' rep size 3 min_size 1 crush_ruleset 0 object_hash
rjenkins pg_num 2048 pgp_num 2048 last_change 90517 owner 0
pool 5 '.rgw' rep size 3 min_size 1 crush_ruleset 0 object_hash
rjenkins pg_num 128 pgp_num 128 last_change 72467 owner 0
pool 6 '.users.uid' rep size 3 min_size 1 crush_ruleset 0 object_hash
rjenkins pg_num 8 pgp_num 8 last_change 28465 owner 0
pool 7 '.users' rep size 3 min_size 1 crush_ruleset 0 object_hash
rjenkins pg_num 8 pgp_num 8 last_change 28466 owner 0
pool 8 '.usage' rep size 2 min_size 1 crush_ruleset 0 object_hash
rjenkins pg_num 8 pgp_num 8 last_change 28467 owner
18446744073709551615
pool 9 '.intent-log' rep size 3 min_size 1 crush_ruleset 0 object_hash
rjenkins pg_num 8 pgp_num 8 last_change 28468 owner
18446744073709551615
pool 10 '.rgw.control' rep size 3 min_size 1 crush_ruleset 0
object_hash rjenkins pg_num 8 pgp_num 8 last_change 33485 owner
18446744073709551615
pool 11 '.rgw.gc' rep size 3 min_size 1 crush_ruleset 0 object_hash
rjenkins pg_num 8 pgp_num 8 last_change 33487 owner
18446744073709551615
pool 12 '.rgw.root' rep size 2 min_size 1 crush_ruleset 0 object_hash
rjenkins pg_num 8 pgp_num 8 last_change 44540 owner 0
pool 13 '' rep size 2 min_size 1 crush_ruleset 0 object_hash rjenkins
pg_num 8 pgp_num 8 last_change 46912 owner 0

ceph pg 3.884 query
{ state: active+remapped,
  epoch: 160655,
  up: [
143],
  acting: [
143,
261,
314],
  info: { pgid: 3.884,
  last_update: 160655'111533,
  last_complete: 160655'111533,
  log_tail: 159997'108532,
  last_backfill: MAX,
  purged_snaps: [],
  history: { epoch_created: 4,
  last_epoch_started: 160261,
  last_epoch_clean: 160261,
  last_epoch_split: 11488,
  same_up_since: 160252,
  same_interval_since: 160260,
  same_primary_since: 160252,
  last_scrub: 155516'107396,
  last_scrub_stamp: 2014-08-06 03:15:18.193611,
  last_deep_scrub: 155516'107293,
  last_deep_scrub_stamp: 2014-08-03 06:45:59.215397,
  last_clean_scrub_stamp: 2014-08-06 03:15:18.193611},
  stats: { version: 160655'111533,
  reported_seq: 856860,
  reported_epoch: 160655,
  state: active+remapped,
  last_fresh: 2014-08-18 23:06:47.068588,
  last_change: 2014-08-17 21:12:29.452628,
  last_active: 2014-08-18 23:06:47.068588,
  last_clean: 2014-08-12 08:44:00.293916,
  last_became_active: 2013-10-25 14:54:55.902442,
  last_unstale: 2014-08-18 23:06:47.068588,
  mapping_epoch: 160258,
  log_start: 159997'108532,
  ondisk_log_start: 159997'108532,
  created: 4,
  last_epoch_clean: 160261,
  parent: 0.0,
  parent_split_bits: 0,
  last_scrub: 155516'107396,
  last_scrub_stamp: 2014-08-06 03:15:18.193611,
  last_deep_scrub: 155516'107293,
  last_deep_scrub_stamp: 2014-08-03 06:45:59.215397,
  last_clean_scrub_stamp: 2014-08-06 03:15:18.193611,
  log_size: 3001,
  ondisk_log_size: 3001,
  stats_invalid: 0,
  stat_sum: { num_bytes: 2750235192,
  num_objects: 12015,
  num_object_clones: 0,
  num_object_copies: 0,
  num_objects_missing_on_primary: 0,
  num_objects_degraded: 0,
  num_objects_unfound: 0,
  num_read: 708045,
  num_read_kb: 39418032,
  num_write: 120983,
  num_write_kb: 2383937,
  num_scrub_errors: 0,
  num_shallow_scrub_errors: 0,
   

Re: [ceph-users] [radosgw-admin] bilog list confusion

2014-08-18 Thread Craig Lewis
I have the same results.  The primary zone (with log_meta and log_data
true) has bilog data, the secondary zone (with log_meta and log_data
false) does not have bilog data.

I'm just guessing here (I can't test it right now)...  I would think that
disabling log_meta and log_data will stop adding new information to the
bilog, but keep existing bilogs.  If that's true, bilog trim should clean
up the old logs (along with mdlog trim and datalog trim).
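
Untested here, but the trim side would look something like the commands
below (bucket name and cut-off time are placeholders; check the
radosgw-admin man page for the exact options in your version):

radosgw-admin bilog trim --bucket=mybucket
radosgw-admin mdlog trim --end-time="2014-08-01 00:00:00"
radosgw-admin datalog trim --end-time="2014-08-01 00:00:00"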





On Mon, Aug 18, 2014 at 5:43 AM, Patrycja Szabłowska 
szablowska.patry...@gmail.com wrote:

 Hi,


 Is there any configuration option in ceph.conf for enabling/disabling
 the bilog list?
 I mean the result of this command:
 radosgw-admin bilog list

 One ceph cluster gives me results - a list of operations that were made
 on the bucket, and the other one gives me just an empty list. I can't
 see what the reason is.


 I can't find it anywhere here in the ceph.conf file.
 http://ceph.com/docs/master/rados/configuration/ceph-conf/

 My guess is it's in the region info, but when I changed these values to
 false for the cluster with the working bilog, the bilog would still show.

 1. cluster with empty bilog list:
   zones: [
 { name: default,
   endpoints: [],
   log_meta: false,
   log_data: false}],
 2. cluster with *proper* bilog list:
   zones: [
 { name: master-1,
   endpoints: [
 http:\/\/[...]],
   log_meta: true,
   log_data: true}],


 Here are pools on both of the clusters:

 1. cluster with *proper* bilog list:
 rbd
 .rgw.root
 .rgw.control
 .rgw
 .rgw.gc
 .users.uid
 .users.email
 .users
 .rgw.buckets
 .rgw.buckets.index
 .log
 ''

 2. cluster with empty bilog list:
 data
 metadata
 rbd
 .rgw.root
 .rgw.control
 .rgw
 .rgw.gc
 .users.uid
 .users.email
 .users
 ''
 .rgw.buckets.index
 .rgw.buckets
 .log


 And here is the zone info (just the placement_pools, rest of the
 config is the same):
 1. cluster with *proper* bilog list:
 placement_pools: []

 2. cluster with *empty* bilog list:
   placement_pools: [
 { key: default-placement,
   val: { index_pool: .rgw.buckets.index,
   data_pool: .rgw.buckets,
   data_extra_pool: }}]}


 Any thoughts? I've tried to figure it out by myself, but no luck.



 Thanks,
 Patrycja Szabłowska
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] v0.84 released

2014-08-18 Thread Robert LeBlanc
This may be a better question for Federico. I've pulled the systemd stuff
from git and I have it working, but only if I have the volumes listed in
fstab. Is this the intended way that systemd will function for now or am I
missing a step? I'm pretty new to systemd.

Thanks,
Robert LeBlanc


On Mon, Aug 18, 2014 at 1:14 PM, Sage Weil s...@inktank.com wrote:

 The next Ceph development release is here!  This release contains several
 meaty items, including some MDS improvements for journaling, the ability
 to remove the CephFS file system (and name it), several mon cleanups with
 tiered pools, several OSD performance branches, a new read forward RADOS
 caching mode, a prototype Kinetic OSD backend, and various radosgw
 improvements (especially with the new standalone civetweb frontend).  And
 there are a zillion OSD bug fixes. Things are looking pretty good for the
 Giant release that is coming up in the next month.

 Upgrading
 -

 * The *_kb perf counters on the monitor have been removed.  These are
   replaced with a new set of *_bytes counters (e.g., cluster_osd_kb is
   replaced by cluster_osd_bytes).

 * The rd_kb and wr_kb fields in the JSON dumps for pool stats (accessed via
   the 'ceph df detail -f json-pretty' and related commands) have been
 replaced
   with corresponding *_bytes fields.  Similarly, the 'total_space',
 'total_used',
   and 'total_avail' fields are replaced with 'total_bytes',
   'total_used_bytes', and 'total_avail_bytes' fields.

 * The 'rados df --format=json' output 'read_bytes' and 'write_bytes'
   fields were incorrectly reporting ops; this is now fixed.

 * The 'rados df --format=json' output previously included 'read_kb' and
   'write_kb' fields; these have been removed.  Please use 'read_bytes' and
   'write_bytes' instead (and divide by 1024 if appropriate).

 Notable Changes
 ---

 * ceph-conf: flush log on exit (Sage Weil)
 * ceph-dencoder: refactor build a bit to limit dependencies (Sage Weil,
   Dan Mick)
 * ceph.spec: split out ceph-common package, other fixes (Sandon Van Ness)
 * ceph_test_librbd_fsx: fix RNG, make deterministic (Ilya Dryomov)
 * cephtool: refactor and improve CLI tests (Joao Eduardo Luis)
 * client: improved MDS session dumps (John Spray)
 * common: fix dup log messages (#9080, Sage Weil)
 * crush: include new tunables in dump (Sage Weil)
 * crush: only require rule features if the rule is used (#8963, Sage Weil)
 * crushtool: send output to stdout, not stderr (Wido den Hollander)
 * fix i386 builds (Sage Weil)
 * fix struct vs class inconsistencies (Thorsten Behrens)
 * hadoop: update hadoop tests for Hadoop 2.0 (Haumin Chen)
 * librbd, ceph-fuse: reduce cache flush overhead (Haomai Wang)
 * librbd: fix error path when opening image (#8912, Josh Durgin)
 * mds: add file system name, enabled flag (John Spray)
 * mds: boot refactor, cleanup (John Spray)
 * mds: fix journal conversion with standby-replay (John Spray)
 * mds: separate inode recovery queue (John Spray)
 * mds: session ls, evict commands (John Spray)
 * mds: submit log events in async thread (Yan, Zheng)
 * mds: use client-provided timestamp for user-visible file metadata (Yan,
   Zheng)
 * mds: validate journal header on load and save (John Spray)
 * misc build fixes for OS X (John Spray)
 * misc integer size cleanups (Kevin Cox)
 * mon: add get-quota commands (Joao Eduardo Luis)
 * mon: do not create file system by default (John Spray)
 * mon: fix 'ceph df' output for available space (Xiaoxi Chen)
 * mon: fix bug when no auth keys are present (#8851, Joao Eduardo Luis)
 * mon: fix compat version for MForward (Joao Eduardo Luis)
 * mon: restrict some pool properties to tiered pools (Joao Eduardo Luis)
 * msgr: misc locking fixes for fast dispatch (#8891, Sage Weil)
 * osd: add 'dump_reservations' admin socket command (Sage Weil)
 * osd: add READFORWARD caching mode (Luis Pabon)
 * osd: add header cache for KeyValueStore (Haomai Wang)
 * osd: add prototype KineticStore based on Seagate Kinetic (Josh Durgin)
 * osd: allow map cache size to be adjusted at runtime (Sage Weil)
 * osd: avoid refcounting overhead by passing a few things by ref (Somnath
   Roy)
 * osd: avoid sharing PG info that is not durable (Samuel Just)
 * osd: clear slow request latency info on osd up/down (Sage Weil)
 * osd: fix PG object listing/ordering bug (Guang Yang)
 * osd: fix PG stat errors with tiering (#9082, Sage Weil)
 * osd: fix bug with long object names and rename (#8701, Sage Weil)
 * osd: fix cache full - not full requeueing (#8931, Sage Weil)
 * osd: fix gating of messages from old OSD instances (Greg Farnum)
 * osd: fix memstore bugs with collection_move_rename, lock ordering (Sage
   Weil)
 * osd: improve locking for KeyValueStore (Haomai Wang)
 * osd: make tiering behave if hit_sets aren't enabled (Sage Weil)
 * osd: mark pools with incomplete clones (Sage Weil)
 * osd: misc locking fixes for fast dispatch (Samuel Just, Ma Jianpeng)
 * osd: prevent old rados clients from using 

Re: [ceph-users] v0.84 released

2014-08-18 Thread Sage Weil
On Mon, 18 Aug 2014, Robert LeBlanc wrote:
 This may be a better question for Federico. I've pulled the systemd stuff
 from git and I have it working, but only if I have the volumes listed in
 fstab. Is this the intended way that systemd will function for now or am I
 missing a step? I'm pretty new to systemd.

The OSDs are normally mounted and started via udev, which will call 
'ceph-disk activate device'.  The missing piece is teaching ceph-disk 
how to start up the systemd service for the OSD.  I suspect that this can 
be completely dynamic, based on udev events, and not using the 'enable' thing 
where systemd persistently registers that a service is to be started...?
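
(For reference, the manual equivalents of what that path ends up doing are
roughly the following -- the device and OSD id are only examples:

ceph-disk activate /dev/sdb1
systemctl start ceph-osd@12.service

The missing glue is having the first step trigger the second automatically.)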

sage




 Thanks,
 Robert LeBlanc
 
 
 On Mon, Aug 18, 2014 at 1:14 PM, Sage Weil s...@inktank.com wrote:
   The next Ceph development release is here!  This release
   contains several
   meaty items, including some MDS improvements for journaling, the
   ability
   to remove the CephFS file system (and name it), several mon
   cleanups with
   tiered pools, several OSD performance branches, a new read
   forward RADOS
   caching mode, a prototype Kinetic OSD backend, and various
   radosgw
   improvements (especially with the new standalone civetweb
   frontend).  And
   there are a zillion OSD bug fixes. Things are looking pretty
   good for the
   Giant release that is coming up in the next month.
 
   Upgrading
   -
 
   * The *_kb perf counters on the monitor have been removed. 
   These are
     replaced with a new set of *_bytes counters (e.g.,
   cluster_osd_kb is
     replaced by cluster_osd_bytes).
 
   * The rd_kb and wr_kb fields in the JSON dumps for pool stats
   (accessed via
     the 'ceph df detail -f json-pretty' and related commands) have
   been replaced
     with corresponding *_bytes fields.  Similarly, the
   'total_space', 'total_used',
     and 'total_avail' fields are replaced with 'total_bytes',
     'total_used_bytes', and 'total_avail_bytes' fields.
 
   * The 'rados df --format=json' output 'read_bytes' and
   'write_bytes'
     fields were incorrectly reporting ops; this is now fixed.
 
   * The 'rados df --format=json' output previously included
   'read_kb' and
     'write_kb' fields; these have been removed.  Please use
   'read_bytes' and
     'write_bytes' instead (and divide by 1024 if appropriate).
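  
    For scripts that parsed the old fields, a minimal migration sketch
    (assuming jq is available; the exact JSON layout, such as a top-level
    'pools' array, is an assumption and may differ between versions):
  
      # old: rados df --format=json | jq '.pools[] | {name, read_kb, write_kb}'
      rados df --format=json | jq '.pools[] | {name, read_bytes, write_bytes}'
      # if a consumer still expects KiB, convert explicitly:
      rados df --format=json | jq '.pools[] | {name, read_kb: (.read_bytes / 1024)}'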
 
   Notable Changes
    ---------------
 
   * ceph-conf: flush log on exit (Sage Weil)
   * ceph-dencoder: refactor build a bit to limit dependencies
   (Sage Weil,
     Dan Mick)
   * ceph.spec: split out ceph-common package, other fixes (Sandon
   Van Ness)
   * ceph_test_librbd_fsx: fix RNG, make deterministic (Ilya
   Dryomov)
   * cephtool: refactor and improve CLI tests (Joao Eduardo Luis)
   * client: improved MDS session dumps (John Spray)
   * common: fix dup log messages (#9080, Sage Weil)
   * crush: include new tunables in dump (Sage Weil)
   * crush: only require rule features if the rule is used (#8963,
   Sage Weil)
   * crushtool: send output to stdout, not stderr (Wido den
   Hollander)
   * fix i386 builds (Sage Weil)
   * fix struct vs class inconsistencies (Thorsten Behrens)
   * hadoop: update hadoop tests for Hadoop 2.0 (Haumin Chen)
   * librbd, ceph-fuse: reduce cache flush overhead (Haomai Wang)
   * librbd: fix error path when opening image (#8912, Josh Durgin)
   * mds: add file system name, enabled flag (John Spray)
   * mds: boot refactor, cleanup (John Spray)
   * mds: fix journal conversion with standby-replay (John Spray)
   * mds: separate inode recovery queue (John Spray)
   * mds: session ls, evict commands (John Spray)
   * mds: submit log events in async thread (Yan, Zheng)
   * mds: use client-provided timestamp for user-visible file
   metadata (Yan,
     Zheng)
   * mds: validate journal header on load and save (John Spray)
   * misc build fixes for OS X (John Spray)
   * misc integer size cleanups (Kevin Cox)
   * mon: add get-quota commands (Joao Eduardo Luis)
   * mon: do not create file system by default (John Spray)
   * mon: fix 'ceph df' output for available space (Xiaoxi Chen)
   * mon: fix bug when no auth keys are present (#8851, Joao
   Eduardo Luis)
   * mon: fix compat version for MForward (Joao Eduardo Luis)
   * mon: restrict some pool properties to tiered pools (Joao
   Eduardo Luis)
   * msgr: misc locking fixes for fast dispatch (#8891, Sage Weil)
   * osd: add 'dump_reservations' admin socket command (Sage Weil)
   * osd: add READFORWARD caching mode (Luis Pabon)
   * osd: add header cache for KeyValueStore (Haomai Wang)
   * osd: add prototype KineticStore based on Seagate Kinetic (Josh
   Durgin)
   

Re: [ceph-users] ceph cluster inconsistency?

2014-08-18 Thread Haomai Wang
On Mon, Aug 18, 2014 at 7:32 PM, Kenneth Waegeman
kenneth.waege...@ugent.be wrote:

 - Message from Haomai Wang haomaiw...@gmail.com -
Date: Mon, 18 Aug 2014 18:34:11 +0800

From: Haomai Wang haomaiw...@gmail.com
 Subject: Re: [ceph-users] ceph cluster inconsistency?
  To: Kenneth Waegeman kenneth.waege...@ugent.be
  Cc: Sage Weil sw...@redhat.com, ceph-users@lists.ceph.com



 On Mon, Aug 18, 2014 at 5:38 PM, Kenneth Waegeman
 kenneth.waege...@ugent.be wrote:

 Hi,

 I tried this after restarting the osd, but I guess that was not the aim
 (
 # ceph-kvstore-tool /var/lib/ceph/osd/ceph-67/current/ list _GHOBJTOSEQ_|
 grep 6adb1100 -A 100
 IO error: lock /var/lib/ceph/osd/ceph-67/current//LOCK: Resource
 temporarily
 unavailable
 tools/ceph_kvstore_tool.cc: In function 'StoreTool::StoreTool(const
 string&)' thread 7f8fecf7d780 time 2014-08-18 11:12:29.551780
 tools/ceph_kvstore_tool.cc: 38: FAILED assert(!db_ptr->open(std::cerr))
 ..
 )

 When I run it after bringing the osd down, it takes a while, but it has no
 output. (When running it without the grep, I'm getting a huge list.)
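
 The LOCK error above is leveldb refusing to open a store that the running
 OSD process still holds; a minimal sketch of the safe sequence (the init
 commands are assumptions and vary by distro):

  service ceph stop osd.67        # or: systemctl stop ceph-osd@67
  ceph-kvstore-tool /var/lib/ceph/osd/ceph-67/current/ list _GHOBJTOSEQ_ | less
  service ceph start osd.67       # restart the OSD afterwards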


 Oh, sorry about that! I made a mistake: the hash value (6adb1100) is stored
 reversed in leveldb.
 So grepping for benchmark_data_ceph001.cubone.os_5560_object789734 should
 work instead.
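
 That reversal is visible directly in the keys below: the hash 6adb1100
 shows up as 0011BDA6 inside the _GHOBJTOSEQ_ key, i.e. the hex string
 written backwards (a quick check from a shell):

  $ python -c "print('6adb1100'[::-1].upper())"
  0011BDA6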

 this gives:

 [root@ceph003 ~]# ceph-kvstore-tool /var/lib/ceph/osd/ceph-67/current/ list
 _GHOBJTOSEQ_ | grep 5560_object789734 -A 100
 _GHOBJTOSEQ_:3%e0s0_head!0011BDA6!!3!!benchmark_data_ceph001%ecubone%eos_5560_object789734!head
 _GHOBJTOSEQ_:3%e0s0_head!0011C027!!3!!benchmark_data_ceph001%ecubone%eos_31461_object1330170!head
 _GHOBJTOSEQ_:3%e0s0_head!0011C6FD!!3!!benchmark_data_ceph001%ecubone%eos_4919_object227366!head
 _GHOBJTOSEQ_:3%e0s0_head!0011CB03!!3!!benchmark_data_ceph001%ecubone%eos_5560_object1363631!head
 _GHOBJTOSEQ_:3%e0s0_head!0011CDF0!!3!!benchmark_data_ceph001%ecubone%eos_5560_object1573957!head
 _GHOBJTOSEQ_:3%e0s0_head!0011D02C!!3!!benchmark_data_ceph001%ecubone%eos_5560_object1019282!head
 _GHOBJTOSEQ_:3%e0s0_head!0011E2B5!!3!!benchmark_data_ceph001%ecubone%eos_31461_object1283563!head
 _GHOBJTOSEQ_:3%e0s0_head!0011E511!!3!!benchmark_data_ceph001%ecubone%eos_4919_object273736!head
 _GHOBJTOSEQ_:3%e0s0_head!0011E547!!3!!benchmark_data_ceph001%ecubone%eos_5560_object1170628!head
 _GHOBJTOSEQ_:3%e0s0_head!0011EAAB!!3!!benchmark_data_ceph001%ecubone%eos_4919_object256335!head
 _GHOBJTOSEQ_:3%e0s0_head!0011F446!!3!!benchmark_data_ceph001%ecubone%eos_5560_object1484196!head
 _GHOBJTOSEQ_:3%e0s0_head!0011FC59!!3!!benchmark_data_ceph001%ecubone%eos_5560_object884178!head
 _GHOBJTOSEQ_:3%e0s0_head!001203F3!!3!!benchmark_data_ceph001%ecubone%eos_5560_object853746!head
 _GHOBJTOSEQ_:3%e0s0_head!001208E3!!3!!benchmark_data_ceph001%ecubone%eos_5560_object36633!head
 _GHOBJTOSEQ_:3%e0s0_head!00120B37!!3!!benchmark_data_ceph001%ecubone%eos_31461_object1235337!head
 _GHOBJTOSEQ_:3%e0s0_head!001210B6!!3!!benchmark_data_ceph001%ecubone%eos_5560_object1661351!head
 _GHOBJTOSEQ_:3%e0s0_head!001210CB!!3!!benchmark_data_ceph001%ecubone%eos_5560_object238126!head
 _GHOBJTOSEQ_:3%e0s0_head!0012184C!!3!!benchmark_data_ceph001%ecubone%eos_5560_object339943!head
 _GHOBJTOSEQ_:3%e0s0_head!00121916!!3!!benchmark_data_ceph001%ecubone%eos_5560_object1047094!head
 _GHOBJTOSEQ_:3%e0s0_head!001219C1!!3!!benchmark_data_ceph001%ecubone%eos_31461_object520642!head
 _GHOBJTOSEQ_:3%e0s0_head!001222BB!!3!!benchmark_data_ceph001%ecubone%eos_5560_object639565!head
 _GHOBJTOSEQ_:3%e0s0_head!001223AA!!3!!benchmark_data_ceph001%ecubone%eos_4919_object231080!head
 _GHOBJTOSEQ_:3%e0s0_head!0012243C!!3!!benchmark_data_ceph001%ecubone%eos_5560_object858050!head
 _GHOBJTOSEQ_:3%e0s0_head!0012289C!!3!!benchmark_data_ceph001%ecubone%eos_5560_object241796!head
 _GHOBJTOSEQ_:3%e0s0_head!00122D28!!3!!benchmark_data_ceph001%ecubone%eos_4919_object7462!head
 _GHOBJTOSEQ_:3%e0s0_head!00122DFE!!3!!benchmark_data_ceph001%ecubone%eos_5560_object243798!head
 _GHOBJTOSEQ_:3%e0s0_head!00122EFC!!3!!benchmark_data_ceph001%ecubone%eos_8961_object109512!head
 _GHOBJTOSEQ_:3%e0s0_head!001232D7!!3!!benchmark_data_ceph001%ecubone%eos_31461_object653973!head
 _GHOBJTOSEQ_:3%e0s0_head!001234A3!!3!!benchmark_data_ceph001%ecubone%eos_5560_object1378169!head
 _GHOBJTOSEQ_:3%e0s0_head!00123714!!3!!benchmark_data_ceph001%ecubone%eos_5560_object512925!head
 _GHOBJTOSEQ_:3%e0s0_head!001237D9!!3!!benchmark_data_ceph001%ecubone%eos_4919_object23289!head
 _GHOBJTOSEQ_:3%e0s0_head!00123854!!3!!benchmark_data_ceph001%ecubone%eos_31461_object1108852!head
 _GHOBJTOSEQ_:3%e0s0_head!00123971!!3!!benchmark_data_ceph001%ecubone%eos_5560_object704026!head
 _GHOBJTOSEQ_:3%e0s0_head!00123F75!!3!!benchmark_data_ceph001%ecubone%eos_8961_object250441!head
 _GHOBJTOSEQ_:3%e0s0_head!00124083!!3!!benchmark_data_ceph001%ecubone%eos_31461_object706178!head
 _GHOBJTOSEQ_:3%e0s0_head!001240FA!!3!!benchmark_data_ceph001%ecubone%eos_5560_object316952!head