Re: [ceph-users] Ceph Rust Librados

2016-09-21 Thread Brad Hubbard
On Thu, Sep 22, 2016 at 1:42 AM, Chris Jones  wrote:
> Ceph-rust for librados has been released. It's an API interface in Rust for
> all of librados that's a thin layer above the C APIs. There are low-level
> direct access and higher level Rust helpers that make working directly with
> librados simple.
>
> The official repo is:
> https://github.com/ceph/ceph-rust
>
> The Rust Crate is:
> ceph-rust
>
> Rust is a systems programming language that gives you the speed and low-level
> access of C but with the benefits of a higher-level language. The main
> benefits of Rust are:
> 1. Speed
> 2. Prevents segfaults
> 3. Guarantees thread safety
> 4. Strong typing
> 5. Compiled
>
> You can find out more at: https://www.rust-lang.org
>
> Contributions are encouraged and welcomed.
>
> This is the base for a number of larger Ceph related projects.

Can you give us any hints as to what these projects will entail?

> Updates to
> the library will be frequent.
>
> Also, there will be new Ceph tools coming soon and you can use the following
> for RGW/S3 access from Rust: (Supports V2 and V4 signatures)
> Crate: aws-sdk-rust - https://github.com/lambdastackio/aws-sdk-rust
>
> Thanks,
> Chris Jones
>



-- 
Cheers,
Brad


Re: [ceph-users] Snap delete performance impact

2016-09-21 Thread Adrian Saul

Any guidance on this?  I have osd_snap_trim_sleep set to 1 and it seems to have
tempered some of the issues, but it's still bad enough that NFS storage served off RBD
volumes becomes unavailable for over 3 minutes.

It seems that the activity triggered when the snapshot deletes are actioned causes
massive disk load for around 30 minutes.  The logs show OSDs marking each other
out, OSDs complaining they were wrongly marked out, and blocked-request errors
for around 10 minutes at the start of this activity.

Is there any way to throttle snapshot deletes to make them much more of a 
background activity?  It really should not make the entire platform unusable 
for 10 minutes.
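For reference, these throttles can be pushed to the running OSDs on a Jewel cluster
roughly as follows; the values shown are illustrative starting points rather than
recommendations:

# inject at runtime; a higher sleep means slower, gentler trimming
ceph tell osd.* injectargs '--osd_snap_trim_sleep 0.5'
ceph tell osd.* injectargs '--osd_snap_trim_priority 1'

# persist in ceph.conf on the OSD hosts so restarts keep the values:
#   [osd]
#   osd snap trim sleep = 0.5
#   osd snap trim priority = 1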



> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
> Adrian Saul
> Sent: Wednesday, 6 July 2016 3:41 PM
> To: 'ceph-users@lists.ceph.com'
> Subject: [ceph-users] Snap delete performance impact
>
>
> I recently started a process of using rbd snapshots to set up a backup regime
> for a few file systems contained in RBD images.  While this generally works
> well, at the time of the snapshots there is a massive increase in latency (10ms
> to multiple seconds of rbd device latency) across the entire cluster.  This has
> flow-on effects for some cluster timeouts as well as general performance hits
> to applications.
>
> In my research I have found some references to osd_snap_trim_sleep being the
> way to throttle this activity, but no real guidance on values for it.  I also see
> some other osd_snap_trim tunables (priority and cost).
>
> Are there any recommendations around setting these for a Jewel cluster?
>
> cheers,
>  Adrian


Re: [ceph-users] Faulting MDS clients, HEALTH_OK

2016-09-21 Thread Heller, Chris
So just to put more info out there, here is what I’m seeing with a Spark/HDFS 
client:

2016-09-21 20:09:25.076595 7fd61c16f700  0 -- 192.168.1.157:0/634334964 >> 
192.168.1.190:6802/32183 pipe(0x7fd5fcef8ca0 sd=66 :53864 s=2 pgs=50445 cs=1 
l=0 c=0x7fd5fdd371d0).fault, initiating reconnect
2016-09-21 20:09:25.077328 7fd60c579700  0 -- 192.168.1.157:0/634334964 >> 
192.168.1.190:6802/32183 pipe(0x7fd5fcef8ca0 sd=66 :53994 s=1 pgs=50445 cs=2 
l=0 c=0x7fd5fdd371d0).connect got RESETSESSION
2016-09-21 20:09:25.077429 7fd60fd80700  0 client.585194220 
ms_handle_remote_reset on 192.168.1.190:6802/32183
2016-09-21 20:20:55.990686 7fd61c16f700  0 -- 192.168.1.157:0/634334964 >> 
192.168.1.190:6802/32183 pipe(0x7fd5fcef8ca0 sd=66 :53994 s=2 pgs=50630 cs=1 
l=0 c=0x7fd5fdd371d0).fault, initiating reconnect
2016-09-21 20:20:55.990890 7fd60c579700  0 -- 192.168.1.157:0/634334964 >> 
192.168.1.190:6802/32183 pipe(0x7fd5fcef8ca0 sd=66 :53994 s=1 pgs=50630 cs=2 
l=0 c=0x7fd5fdd371d0).fault
2016-09-21 20:21:09.385228 7fd60c579700  0 -- 192.168.1.157:0/634334964 >> 
192.168.1.154:6800/17142 pipe(0x7fd6401e8160 sd=184 :39160 s=1 pgs=0 cs=0 l=0 
c=0x7fd6400433c0).fault

And here is its session info from ‘session ls’:

{
    "id": 585194220,
    "num_leases": 0,
    "num_caps": 16385,
    "state": "open",
    "replay_requests": 0,
    "reconnecting": false,
    "inst": "client.585194220 192.168.1.157:0\/634334964",
    "client_metadata": {
        "ceph_sha1": "d56bdf93ced6b80b07397d57e3fa68fe68304432",
        "ceph_version": "ceph version 0.94.7 (d56bdf93ced6b80b07397d57e3fa68fe68304432)",
        "entity_id": "hdfs.user",
        "hostname": "a192-168-1-157.d.a.com"
    }
},

-Chris

On 9/21/16, 9:27 PM, "Heller, Chris"  wrote:

I also went and bumped mds_cache_size up to 1 million… still seeing cache 
pressure, but I might just need to evict those clients…

On 9/21/16, 9:24 PM, "Heller, Chris"  wrote:

What is the interesting value in ‘session ls’? Is it ‘num_leases’ or
‘num_caps’? Leases appear to be, on average, 1, but caps seem to be 16385 for
many, many clients!

-Chris

On 9/21/16, 9:22 PM, "Gregory Farnum"  wrote:

On Wed, Sep 21, 2016 at 6:13 PM, Heller, Chris  
wrote:
> I’m suspecting something similar, we have millions of files and 
can read a huge subset of them at a time, presently the client is Spark 1.5.2 
which I suspect is leaving the closing of file descriptors up to the garbage 
collector. That said, I’d like to know if I could verify this theory using the 
ceph tools. I’ll try upping “mds cache size”, are there any other configuration 
settings I might adjust to perhaps ease the problem while I track it down in 
the HDFS tools layer?

That's the big one. You can also go through the admin socket 
commands
for things like "session ls" that will tell you how many files the
client is holding on to and compare.

>
> -Chris
>
> On 9/21/16, 4:34 PM, "Gregory Farnum"  wrote:
>
> On Wed, Sep 21, 2016 at 1:16 PM, Heller, Chris 
 wrote:
> > Ok. I just ran into this issue again. The mds rolled after 
many clients were failing to relieve cache pressure.
>
> That definitely could have had something to do with it, if 
say they
> overloaded the MDS so much it got stuck in a directory read 
loop.
> ...actually now I come to think of it, I think there was some 
problem
> with Hadoop not being nice about closing files and so forcing 
clients
> to keep them pinned, which will make the MDS pretty unhappy 
if they're
> holding more than it's configured for.
>
> >
> > Now here is the result of `ceph –s`
> >
> > # ceph -s
> > cluster b126570e-9e7c-0bb2-991f-ecf9abe3afa0
> >  health HEALTH_OK
> >  monmap e1: 5 mons at 
{a154=192.168.1.154:6789/0,a155=192.168.1.155:6789/0,a189=192.168.1.189:6789/0,a190=192.168.1.190:6789/0,a191=192.168.1.191:6789/0}
> > election epoch 130, quorum 0,1,2,3,4 
a154,a155,a189,a190,a191
> >  mdsmap e18676: 1/1/1 up {0=a190=up:active}, 1 
up:standby-replay, 3 up:standby
> >  osdmap e118886: 192 osds: 192 up, 192 in
> >   pgmap v13706298: 11328 pgs, 5 pools, 22704 GB data, 
63571 kobjects
> > 69601 GB used, 37656 GB / 104 TB avail
> >11309 active+clean
> >   13 

Re: [ceph-users] radosgw bucket name performance

2016-09-21 Thread Stas Starikevich
Felix,

According to my tests there is no difference in performance between normally named
buckets (test, test01, test02), uuid-named buckets (like
'7c9e4a81-df86-4c9d-a681-3a570de109db') or just dates ('2016-09-20-16h').
I'm getting ~3x more upload performance (220 uploads/s vs 650 uploads/s) with
SSD-backed indexes or the 'blind buckets' feature enabled.
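For anyone who wants to repeat the comparison, a crude serial version of the test
looks roughly like this; the bucket names mirror the examples above, the object count
is arbitrary, and it assumes an s3cmd profile already pointing at the RGW endpoint
(the rates quoted above came from a parallel load generator, so a serial loop will
run much slower):

for b in test01 7c9e4a81-df86-4c9d-a681-3a570de109db 2016-09-20-16h; do
    s3cmd mb "s3://$b"
    start=$(date +%s)
    for i in $(seq 1 1000); do
        # small fixed-size objects; the interesting cost is the PUT against the bucket index
        s3cmd put --no-progress /etc/hosts "s3://$b/obj-$i" >/dev/null
    done
    echo "$b: 1000 PUTs in $(( $(date +%s) - start ))s"
done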

Stas

> On Sep 21, 2016, at 1:28 PM, Félix Barbeira  wrote:
> 
> Hi,
> 
> According to the Amazon S3 documentation, it is advised to insert a bit of random
> characters in the bucket name in order to gain performance. This is related to how
> Amazon stores key names. It looks like they store an index of object key names
> in each region.
> 
> http://docs.aws.amazon.com/AmazonS3/latest/dev/request-rate-perf-considerations.html#workloads-with-mix-request-types
>  
> 
> 
> My question is: is this also a good practice in a Ceph cluster where all the
> nodes are in the same datacenter? Is the name of the bucket relevant in Ceph
> for gaining more performance? I think it's not, because all the data is
> spread across the placement groups on all the OSD nodes, no matter what bucket
> name it has. Can anyone confirm this?
> 
> Thanks in advance.
> 
> -- 
> Félix Barbeira.


Re: [ceph-users] Faulting MDS clients, HEALTH_OK

2016-09-21 Thread Heller, Chris
I also went and bumped mds_cache_size up to 1 million… still seeing cache 
pressure, but I might just need to evict those clients…

On 9/21/16, 9:24 PM, "Heller, Chris"  wrote:

What is the interesting value in ‘session ls’? Is it ‘num_leases’ or
‘num_caps’? Leases appear to be, on average, 1, but caps seem to be 16385 for
many, many clients!

-Chris

On 9/21/16, 9:22 PM, "Gregory Farnum"  wrote:

On Wed, Sep 21, 2016 at 6:13 PM, Heller, Chris  
wrote:
> I’m suspecting something similar, we have millions of files and can 
read a huge subset of them at a time, presently the client is Spark 1.5.2 which 
I suspect is leaving the closing of file descriptors up to the garbage 
collector. That said, I’d like to know if I could verify this theory using the 
ceph tools. I’ll try upping “mds cache size”, are there any other configuration 
settings I might adjust to perhaps ease the problem while I track it down in 
the HDFS tools layer?

That's the big one. You can also go through the admin socket commands
for things like "session ls" that will tell you how many files the
client is holding on to and compare.

>
> -Chris
>
> On 9/21/16, 4:34 PM, "Gregory Farnum"  wrote:
>
> On Wed, Sep 21, 2016 at 1:16 PM, Heller, Chris 
 wrote:
> > Ok. I just ran into this issue again. The mds rolled after many 
clients were failing to relieve cache pressure.
>
> That definitely could have had something to do with it, if say 
they
> overloaded the MDS so much it got stuck in a directory read loop.
> ...actually now I come to think of it, I think there was some 
problem
> with Hadoop not being nice about closing files and so forcing 
clients
> to keep them pinned, which will make the MDS pretty unhappy if 
they're
> holding more than it's configured for.
>
> >
> > Now here is the result of `ceph –s`
> >
> > # ceph -s
> > cluster b126570e-9e7c-0bb2-991f-ecf9abe3afa0
> >  health HEALTH_OK
> >  monmap e1: 5 mons at 
{a154=192.168.1.154:6789/0,a155=192.168.1.155:6789/0,a189=192.168.1.189:6789/0,a190=192.168.1.190:6789/0,a191=192.168.1.191:6789/0}
> > election epoch 130, quorum 0,1,2,3,4 
a154,a155,a189,a190,a191
> >  mdsmap e18676: 1/1/1 up {0=a190=up:active}, 1 
up:standby-replay, 3 up:standby
> >  osdmap e118886: 192 osds: 192 up, 192 in
> >   pgmap v13706298: 11328 pgs, 5 pools, 22704 GB data, 63571 
kobjects
> > 69601 GB used, 37656 GB / 104 TB avail
> >11309 active+clean
> >   13 active+clean+scrubbing
> >6 active+clean+scrubbing+deep
> >
> > And here are the ops in flight:
> >
> > # ceph daemon mds.a190 dump_ops_in_flight
> > {
> > "ops": [],
> > "num_ops": 0
> > }
> >
> > And a tail of the active mds log at debug_mds 5/5
> >
> > 2016-09-21 20:15:53.354226 7fce3b626700  4 mds.0.server 
handle_client_request client_request(client.585124080:17863 lookup 
#1/stream2store 2016-09-21 20:15:53.352390) v2
> > 2016-09-21 20:15:53.354234 7fce3b626700  5 mds.0.server session 
closed|closing|killing, dropping
>
> This is also pretty solid evidence that the MDS is zapping clients
> when they misbehave.
>
> You can increase "mds cache size" past its default 100,000 dentries and
> see if that alleviates (or just draws out) the problem.
> -Greg
>
> > 2016-09-21 20:15:54.867108 7fce3b626700  3 mds.0.server 
handle_client_session client_session(request_renewcaps seq 235) v1 from 
client.507429717
> > 2016-09-21 20:15:54.980907 7fce3851f700  2 mds.0.cache 
check_memory_usage total 1475784, rss 666432, heap 79712, malloc 584052 mmap 0, 
baseline 79712, buffers 0, max 1048576, 0 / 93392 inodes have caps, 0 caps, 0 
caps per inode
> > 2016-09-21 20:15:54.980960 7fce3851f700  5 mds.0.bal mds.0 
epoch 38 load mdsload<[0,0 0]/[0,0 0], req 1987, hr 0, qlen 0, cpu 0.34>
> > 2016-09-21 20:15:55.247885 7fce3b626700  3 mds.0.server 
handle_client_session client_session(request_renewcaps seq 233) v1 from 
client.538555196
> > 2016-09-21 20:15:55.455566 7fce3b626700  3 mds.0.server 
handle_client_session client_session(request_renewcaps seq 365) v1 from 
client.507390467
> > 2016-09-21 20:15:55.807704 7fce3b626700  3 mds.0.server 

Re: [ceph-users] Faulting MDS clients, HEALTH_OK

2016-09-21 Thread Heller, Chris
What is the interesting value in ‘session ls’? Is it ‘num_leases’ or ‘num_caps’?
Leases appear to be, on average, 1, but caps seem to be 16385 for many, many
clients!

-Chris

On 9/21/16, 9:22 PM, "Gregory Farnum"  wrote:

On Wed, Sep 21, 2016 at 6:13 PM, Heller, Chris  wrote:
> I’m suspecting something similar, we have millions of files and can read 
a huge subset of them at a time, presently the client is Spark 1.5.2 which I 
suspect is leaving the closing of file descriptors up to the garbage collector. 
That said, I’d like to know if I could verify this theory using the ceph tools. 
I’ll try upping “mds cache size”, are there any other configuration settings I 
might adjust to perhaps ease the problem while I track it down in the HDFS 
tools layer?

That's the big one. You can also go through the admin socket commands
for things like "session ls" that will tell you how many files the
client is holding on to and compare.

>
> -Chris
>
> On 9/21/16, 4:34 PM, "Gregory Farnum"  wrote:
>
> On Wed, Sep 21, 2016 at 1:16 PM, Heller, Chris  
wrote:
> > Ok. I just ran into this issue again. The mds rolled after many 
clients were failing to relieve cache pressure.
>
> That definitely could have had something to do with it, if say they
> overloaded the MDS so much it got stuck in a directory read loop.
> ...actually now I come to think of it, I think there was some problem
> with Hadoop not being nice about closing files and so forcing clients
> to keep them pinned, which will make the MDS pretty unhappy if they're
> holding more than it's configured for.
>
> >
> > Now here is the result of `ceph –s`
> >
> > # ceph -s
> > cluster b126570e-9e7c-0bb2-991f-ecf9abe3afa0
> >  health HEALTH_OK
> >  monmap e1: 5 mons at 
{a154=192.168.1.154:6789/0,a155=192.168.1.155:6789/0,a189=192.168.1.189:6789/0,a190=192.168.1.190:6789/0,a191=192.168.1.191:6789/0}
> > election epoch 130, quorum 0,1,2,3,4 
a154,a155,a189,a190,a191
> >  mdsmap e18676: 1/1/1 up {0=a190=up:active}, 1 
up:standby-replay, 3 up:standby
> >  osdmap e118886: 192 osds: 192 up, 192 in
> >   pgmap v13706298: 11328 pgs, 5 pools, 22704 GB data, 63571 
kobjects
> > 69601 GB used, 37656 GB / 104 TB avail
> >11309 active+clean
> >   13 active+clean+scrubbing
> >6 active+clean+scrubbing+deep
> >
> > And here are the ops in flight:
> >
> > # ceph daemon mds.a190 dump_ops_in_flight
> > {
> > "ops": [],
> > "num_ops": 0
> > }
> >
> > And a tail of the active mds log at debug_mds 5/5
> >
> > 2016-09-21 20:15:53.354226 7fce3b626700  4 mds.0.server 
handle_client_request client_request(client.585124080:17863 lookup 
#1/stream2store 2016-09-21 20:15:53.352390) v2
> > 2016-09-21 20:15:53.354234 7fce3b626700  5 mds.0.server session 
closed|closing|killing, dropping
>
> This is also pretty solid evidence that the MDS is zapping clients
> when they misbehave.
>
> You can increase "mds cache size" past its default 100,000 dentries and
> see if that alleviates (or just draws out) the problem.
> -Greg
>
> > 2016-09-21 20:15:54.867108 7fce3b626700  3 mds.0.server 
handle_client_session client_session(request_renewcaps seq 235) v1 from 
client.507429717
> > 2016-09-21 20:15:54.980907 7fce3851f700  2 mds.0.cache 
check_memory_usage total 1475784, rss 666432, heap 79712, malloc 584052 mmap 0, 
baseline 79712, buffers 0, max 1048576, 0 / 93392 inodes have caps, 0 caps, 0 
caps per inode
> > 2016-09-21 20:15:54.980960 7fce3851f700  5 mds.0.bal mds.0 epoch 38 
load mdsload<[0,0 0]/[0,0 0], req 1987, hr 0, qlen 0, cpu 0.34>
> > 2016-09-21 20:15:55.247885 7fce3b626700  3 mds.0.server 
handle_client_session client_session(request_renewcaps seq 233) v1 from 
client.538555196
> > 2016-09-21 20:15:55.455566 7fce3b626700  3 mds.0.server 
handle_client_session client_session(request_renewcaps seq 365) v1 from 
client.507390467
> > 2016-09-21 20:15:55.807704 7fce3b626700  3 mds.0.server 
handle_client_session client_session(request_renewcaps seq 367) v1 from 
client.538485341
> > 2016-09-21 20:15:56.243462 7fce3b626700  3 mds.0.server 
handle_client_session client_session(request_renewcaps seq 189) v1 from 
client.538577596
> > 2016-09-21 20:15:56.986901 7fce3b626700  3 mds.0.server 
handle_client_session client_session(request_renewcaps seq 232) v1 from 
client.507430372
> > 2016-09-21 20:15:57.026206 7fce3b626700  3 mds.0.server 

Re: [ceph-users] rgw bucket index manual copy

2016-09-21 Thread Stas Starikevich
Ben,

Works fine as far as I see:

[root@273aa9f2ee9f /]# s3cmd mb s3://test
Bucket 's3://test/' created

[root@273aa9f2ee9f /]# s3cmd put /etc/hosts s3://test
upload: '/etc/hosts' -> 's3://test/hosts'  [1 of 1]
 196 of 196   100% in    0s   404.87 B/s  done

[root@273aa9f2ee9f /]# s3cmd ls s3://test

[root@273aa9f2ee9f /]# ls -al /tmp/hosts
ls: cannot access /tmp/hosts: No such file or directory

[root@273aa9f2ee9f /]# s3cmd get s3://test/hosts /tmp/hosts
download: 's3://test/hosts' -> '/tmp/hosts'  [1 of 1]
 196 of 196   100% in    0s  2007.56 B/s  done

[root@273aa9f2ee9f /]# cat /tmp/hosts
172.17.0.4  273aa9f2ee9f

[root@ceph-mon01 ~]# radosgw-admin bucket rm --bucket=test --purge-objects
[root@ceph-mon01 ~]# 

[root@273aa9f2ee9f /]# s3cmd ls 
[root@273aa9f2ee9f /]# 

>> If not, I imagine rados could be used to delete them manually by prefix.
That would be a pain with more than a few million objects :)

Stas

> On Sep 21, 2016, at 9:10 PM, Ben Hines  wrote:
> 
> Thanks. Will try it out once we get on Jewel.
> 
> Just curious, does bucket deletion with --purge-objects work via 
> radosgw-admin with the no index option?
> If not, i imagine rados could be used to delete them manually by prefix.
> 
> 
> On Sep 21, 2016 6:02 PM, "Stas Starikevich"  > wrote:
> Hi Ben,
> 
> Since the 'Jewel' RadosGW supports blind buckets.
> To enable blind buckets configuration I used:
> 
> radosgw-admin zone get --rgw-zone=default > default-zone.json
> #change index_type from 0 to 1
> vi default-zone.json
> radosgw-admin zone set --rgw-zone=default --infile default-zone.json
> 
> To apply changes you have to restart all the RGW daemons. Then all newly 
> created buckets will not have index (bucket list will provide empty output), 
> but GET\PUT works perfectly.
> In my tests there is no performance difference between SSD-backed indexes and 
> 'blind bucket' configuration.
> 
> Stas
> 
> > On Sep 21, 2016, at 2:26 PM, Ben Hines  > > wrote:
> >
> > Nice, thanks! Must have missed that one. It might work well for our use 
> > case since we don't really need the index.
> >
> > -Ben
> >
> > On Wed, Sep 21, 2016 at 11:23 AM, Gregory Farnum  > > wrote:
> > On Wednesday, September 21, 2016, Ben Hines  > > wrote:
> > Yes, 200 million is way too big for a single ceph RGW bucket. We 
> > encountered this problem early on and sharded our buckets into 20 buckets, 
> > each which have the sharded bucket index with 20 shards.
> >
> > Unfortunately, enabling the sharded RGW index requires recreating the 
> > bucket and all objects.
> >
> > The fact that ceph uses ceph itself for the bucket indexes makes RGW less 
> > reliable in our experience. Instead of depending on one object you're 
> > depending on two, with the index and the object itself. If the cluster has 
> > any issues with the index the fact that it blocks access to the object 
> > itself is very frustrating. If we could retrieve / put objects into RGW 
> > without hitting the index at all we would - we don't need to list our 
> > buckets.
> >
> > I don't know the details or which release it went into, but indexless 
> > buckets are now a thing -- check the release notes or search the lists! :)
> > -Greg
> >
> >
> >
> > -Ben
> >
> > On Tue, Sep 20, 2016 at 1:57 AM, Wido den Hollander  > > wrote:
> >
> > > On 20 September 2016 at 10:55, Василий Ангапов wrote:
> > >
> > >
> > > Hello,
> > >
> > > Is there any way to copy rgw bucket index to another Ceph node to
> > > lower the downtime of RGW? For now I have  a huge bucket with 200
> > > million files and its backfilling is blocking RGW completely for an
> > > hour and a half even with 10G network.
> > >
> >
> > No, not really. What you really want is the bucket sharding feature.
> >
> > So what you can do is enable the sharding, create a NEW bucket and copy 
> > over the objects.
> >
> > Afterwards you can remove the old bucket.
> >
> > Wido
> >
> > > Thanks!

Re: [ceph-users] Faulting MDS clients, HEALTH_OK

2016-09-21 Thread Gregory Farnum
On Wed, Sep 21, 2016 at 6:13 PM, Heller, Chris  wrote:
> I’m suspecting something similar, we have millions of files and can read a 
> huge subset of them at a time, presently the client is Spark 1.5.2 which I 
> suspect is leaving the closing of file descriptors up to the garbage 
> collector. That said, I’d like to know if I could verify this theory using 
> the ceph tools. I’ll try upping “mds cache size”, are there any other 
> configuration settings I might adjust to perhaps ease the problem while I 
> track it down in the HDFS tools layer?

That's the big one. You can also go through the admin socket commands
for things like "session ls" that will tell you how many files the
client is holding on to and compare.
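A minimal sketch of both checks from the admin socket, assuming the active MDS is
a190 as in the output elsewhere in this thread, and using 1,000,000 purely as an
example value (size it to the RAM on the MDS host):

# per-client caps/leases, to spot the sessions pinning the most inodes
ceph daemon mds.a190 session ls | grep -E '"id"|"num_caps"|"num_leases"'

# raise the cache limit on the running daemon (the default is 100000 dentries)
ceph daemon mds.a190 config set mds_cache_size 1000000

# and persist it in ceph.conf on the MDS hosts:
#   [mds]
#   mds cache size = 1000000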

>
> -Chris
>
> On 9/21/16, 4:34 PM, "Gregory Farnum"  wrote:
>
> On Wed, Sep 21, 2016 at 1:16 PM, Heller, Chris  wrote:
> > Ok. I just ran into this issue again. The mds rolled after many clients 
> were failing to relieve cache pressure.
>
> That definitely could have had something to do with it, if say they
> overloaded the MDS so much it got stuck in a directory read loop.
> ...actually now I come to think of it, I think there was some problem
> with Hadoop not being nice about closing files and so forcing clients
> to keep them pinned, which will make the MDS pretty unhappy if they're
> holding more than it's configured for.
>
> >
> > Now here is the result of `ceph –s`
> >
> > # ceph -s
> > cluster b126570e-9e7c-0bb2-991f-ecf9abe3afa0
> >  health HEALTH_OK
> >  monmap e1: 5 mons at 
> {a154=192.168.1.154:6789/0,a155=192.168.1.155:6789/0,a189=192.168.1.189:6789/0,a190=192.168.1.190:6789/0,a191=192.168.1.191:6789/0}
> > election epoch 130, quorum 0,1,2,3,4 
> a154,a155,a189,a190,a191
> >  mdsmap e18676: 1/1/1 up {0=a190=up:active}, 1 up:standby-replay, 3 
> up:standby
> >  osdmap e118886: 192 osds: 192 up, 192 in
> >   pgmap v13706298: 11328 pgs, 5 pools, 22704 GB data, 63571 kobjects
> > 69601 GB used, 37656 GB / 104 TB avail
> >11309 active+clean
> >   13 active+clean+scrubbing
> >6 active+clean+scrubbing+deep
> >
> > And here are the ops in flight:
> >
> > # ceph daemon mds.a190 dump_ops_in_flight
> > {
> > "ops": [],
> > "num_ops": 0
> > }
> >
> > And a tail of the active mds log at debug_mds 5/5
> >
> > 2016-09-21 20:15:53.354226 7fce3b626700  4 mds.0.server 
> handle_client_request client_request(client.585124080:17863 lookup 
> #1/stream2store 2016-09-21 20:15:53.352390) v2
> > 2016-09-21 20:15:53.354234 7fce3b626700  5 mds.0.server session 
> closed|closing|killing, dropping
>
> This is also pretty solid evidence that the MDS is zapping clients
> when they misbehave.
>
> You can increase "mds cache size" past its default 100,000 dentries and
> see if that alleviates (or just draws out) the problem.
> -Greg
>
> > 2016-09-21 20:15:54.867108 7fce3b626700  3 mds.0.server 
> handle_client_session client_session(request_renewcaps seq 235) v1 from 
> client.507429717
> > 2016-09-21 20:15:54.980907 7fce3851f700  2 mds.0.cache 
> check_memory_usage total 1475784, rss 666432, heap 79712, malloc 584052 mmap 
> 0, baseline 79712, buffers 0, max 1048576, 0 / 93392 inodes have caps, 0 
> caps, 0 caps per inode
> > 2016-09-21 20:15:54.980960 7fce3851f700  5 mds.0.bal mds.0 epoch 38 
> load mdsload<[0,0 0]/[0,0 0], req 1987, hr 0, qlen 0, cpu 0.34>
> > 2016-09-21 20:15:55.247885 7fce3b626700  3 mds.0.server 
> handle_client_session client_session(request_renewcaps seq 233) v1 from 
> client.538555196
> > 2016-09-21 20:15:55.455566 7fce3b626700  3 mds.0.server 
> handle_client_session client_session(request_renewcaps seq 365) v1 from 
> client.507390467
> > 2016-09-21 20:15:55.807704 7fce3b626700  3 mds.0.server 
> handle_client_session client_session(request_renewcaps seq 367) v1 from 
> client.538485341
> > 2016-09-21 20:15:56.243462 7fce3b626700  3 mds.0.server 
> handle_client_session client_session(request_renewcaps seq 189) v1 from 
> client.538577596
> > 2016-09-21 20:15:56.986901 7fce3b626700  3 mds.0.server 
> handle_client_session client_session(request_renewcaps seq 232) v1 from 
> client.507430372
> > 2016-09-21 20:15:57.026206 7fce3b626700  3 mds.0.server 
> handle_client_session client_session(request_renewcaps seq 364) v1 from 
> client.491885158
> > 2016-09-21 20:15:57.369281 7fce3b626700  3 mds.0.server 
> handle_client_session client_session(request_renewcaps seq 364) v1 from 
> client.507390682
> > 2016-09-21 20:15:57.445687 7fce3b626700  3 mds.0.server 
> handle_client_session client_session(request_renewcaps seq 364) v1 from 
> client.538485996
> > 2016-09-21 20:15:57.579268 7fce3b626700  3 mds.0.server 

Re: [ceph-users] Faulting MDS clients, HEALTH_OK

2016-09-21 Thread Heller, Chris
I’m suspecting something similar. We have millions of files and can read a huge
subset of them at a time, and presently the client is Spark 1.5.2, which I suspect
is leaving the closing of file descriptors up to the garbage collector. That
said, I’d like to know if I could verify this theory using the ceph tools. I’ll
try upping “mds cache size”; are there any other configuration settings I might
adjust to perhaps ease the problem while I track it down in the HDFS tools
layer?

-Chris

On 9/21/16, 4:34 PM, "Gregory Farnum"  wrote:

On Wed, Sep 21, 2016 at 1:16 PM, Heller, Chris  wrote:
> Ok. I just ran into this issue again. The mds rolled after many clients 
were failing to relieve cache pressure.

That definitely could have had something to do with it, if say they
overloaded the MDS so much it got stuck in a directory read loop.
...actually now I come to think of it, I think there was some problem
with Hadoop not being nice about closing files and so forcing clients
to keep them pinned, which will make the MDS pretty unhappy if they're
holding more than it's configured for.

>
> Now here is the result of `ceph –s`
>
> # ceph -s
> cluster b126570e-9e7c-0bb2-991f-ecf9abe3afa0
>  health HEALTH_OK
>  monmap e1: 5 mons at 
{a154=192.168.1.154:6789/0,a155=192.168.1.155:6789/0,a189=192.168.1.189:6789/0,a190=192.168.1.190:6789/0,a191=192.168.1.191:6789/0}
> election epoch 130, quorum 0,1,2,3,4 a154,a155,a189,a190,a191
>  mdsmap e18676: 1/1/1 up {0=a190=up:active}, 1 up:standby-replay, 3 
up:standby
>  osdmap e118886: 192 osds: 192 up, 192 in
>   pgmap v13706298: 11328 pgs, 5 pools, 22704 GB data, 63571 kobjects
> 69601 GB used, 37656 GB / 104 TB avail
>11309 active+clean
>   13 active+clean+scrubbing
>6 active+clean+scrubbing+deep
>
> And here are the ops in flight:
>
> # ceph daemon mds.a190 dump_ops_in_flight
> {
> "ops": [],
> "num_ops": 0
> }
>
> And a tail of the active mds log at debug_mds 5/5
>
> 2016-09-21 20:15:53.354226 7fce3b626700  4 mds.0.server 
handle_client_request client_request(client.585124080:17863 lookup 
#1/stream2store 2016-09-21 20:15:53.352390) v2
> 2016-09-21 20:15:53.354234 7fce3b626700  5 mds.0.server session 
closed|closing|killing, dropping

This is also pretty solid evidence that the MDS is zapping clients
when they misbehave.

> You can increase "mds cache size" past its default 100,000 dentries and
see if that alleviates (or just draws out) the problem.
-Greg

> 2016-09-21 20:15:54.867108 7fce3b626700  3 mds.0.server 
handle_client_session client_session(request_renewcaps seq 235) v1 from 
client.507429717
> 2016-09-21 20:15:54.980907 7fce3851f700  2 mds.0.cache check_memory_usage 
total 1475784, rss 666432, heap 79712, malloc 584052 mmap 0, baseline 79712, 
buffers 0, max 1048576, 0 / 93392 inodes have caps, 0 caps, 0 caps per inode
> 2016-09-21 20:15:54.980960 7fce3851f700  5 mds.0.bal mds.0 epoch 38 load 
mdsload<[0,0 0]/[0,0 0], req 1987, hr 0, qlen 0, cpu 0.34>
> 2016-09-21 20:15:55.247885 7fce3b626700  3 mds.0.server 
handle_client_session client_session(request_renewcaps seq 233) v1 from 
client.538555196
> 2016-09-21 20:15:55.455566 7fce3b626700  3 mds.0.server 
handle_client_session client_session(request_renewcaps seq 365) v1 from 
client.507390467
> 2016-09-21 20:15:55.807704 7fce3b626700  3 mds.0.server 
handle_client_session client_session(request_renewcaps seq 367) v1 from 
client.538485341
> 2016-09-21 20:15:56.243462 7fce3b626700  3 mds.0.server 
handle_client_session client_session(request_renewcaps seq 189) v1 from 
client.538577596
> 2016-09-21 20:15:56.986901 7fce3b626700  3 mds.0.server 
handle_client_session client_session(request_renewcaps seq 232) v1 from 
client.507430372
> 2016-09-21 20:15:57.026206 7fce3b626700  3 mds.0.server 
handle_client_session client_session(request_renewcaps seq 364) v1 from 
client.491885158
> 2016-09-21 20:15:57.369281 7fce3b626700  3 mds.0.server 
handle_client_session client_session(request_renewcaps seq 364) v1 from 
client.507390682
> 2016-09-21 20:15:57.445687 7fce3b626700  3 mds.0.server 
handle_client_session client_session(request_renewcaps seq 364) v1 from 
client.538485996
> 2016-09-21 20:15:57.579268 7fce3b626700  3 mds.0.server 
handle_client_session client_session(request_renewcaps seq 364) v1 from 
client.538486021
> 2016-09-21 20:15:57.595568 7fce3b626700  3 mds.0.server 
handle_client_session client_session(request_renewcaps seq 364) v1 from 
client.507390702
> 2016-09-21 20:15:57.604356 7fce3b626700  3 mds.0.server 
handle_client_session client_session(request_renewcaps seq 364) v1 from 
client.507390712
> 2016-09-21 

Re: [ceph-users] rgw bucket index manual copy

2016-09-21 Thread Ben Hines
Thanks. Will try it out once we get on Jewel.

Just curious, does bucket deletion with --purge-objects work via
radosgw-admin with the no-index option?
If not, I imagine rados could be used to delete them manually by prefix.

On Sep 21, 2016 6:02 PM, "Stas Starikevich" 
wrote:

> Hi Ben,
>
> Since the 'Jewel' RadosGW supports blind buckets.
> To enable blind buckets configuration I used:
>
> radosgw-admin zone get --rgw-zone=default > default-zone.json
> #change index_type from 0 to 1
> vi default-zone.json
> radosgw-admin zone set --rgw-zone=default --infile default-zone.json
>
> To apply changes you have to restart all the RGW daemons. Then all newly
> created buckets will not have index (bucket list will provide empty
> output), but GET\PUT works perfectly.
> In my tests there is no performance difference between SSD-backed indexes
> and 'blind bucket' configuration.
>
> Stas
>
> > On Sep 21, 2016, at 2:26 PM, Ben Hines  wrote:
> >
> > Nice, thanks! Must have missed that one. It might work well for our use
> case since we don't really need the index.
> >
> > -Ben
> >
> > On Wed, Sep 21, 2016 at 11:23 AM, Gregory Farnum 
> wrote:
> > On Wednesday, September 21, 2016, Ben Hines  wrote:
> > Yes, 200 million is way too big for a single ceph RGW bucket. We
> encountered this problem early on and sharded our buckets into 20 buckets,
> each which have the sharded bucket index with 20 shards.
> >
> > Unfortunately, enabling the sharded RGW index requires recreating the
> bucket and all objects.
> >
> > The fact that ceph uses ceph itself for the bucket indexes makes RGW
> less reliable in our experience. Instead of depending on one object you're
> depending on two, with the index and the object itself. If the cluster has
> any issues with the index the fact that it blocks access to the object
> itself is very frustrating. If we could retrieve / put objects into RGW
> without hitting the index at all we would - we don't need to list our
> buckets.
> >
> > I don't know the details or which release it went into, but indexless
> buckets are now a thing -- check the release notes or search the lists! :)
> > -Greg
> >
> >
> >
> > -Ben
> >
> > On Tue, Sep 20, 2016 at 1:57 AM, Wido den Hollander 
> wrote:
> >
> > > Op 20 september 2016 om 10:55 schreef Василий Ангапов <
> anga...@gmail.com>:
> > >
> > >
> > > Hello,
> > >
> > > Is there any way to copy rgw bucket index to another Ceph node to
> > > lower the downtime of RGW? For now I have  a huge bucket with 200
> > > million files and its backfilling is blocking RGW completely for an
> > > hour and a half even with 10G network.
> > >
> >
> > No, not really. What you really want is the bucket sharding feature.
> >
> > So what you can do is enable the sharding, create a NEW bucket and copy
> over the objects.
> >
> > Afterwards you can remove the old bucket.
> >
> > Wido
> >
> > > Thanks!


Re: [ceph-users] rgw bucket index manual copy

2016-09-21 Thread Stas Starikevich
Hi Ben,

Since the 'Jewel' release, RadosGW supports blind buckets.
To enable blind buckets configuration I used:

radosgw-admin zone get --rgw-zone=default > default-zone.json
#change index_type from 0 to 1
vi default-zone.json
radosgw-admin zone set --rgw-zone=default --infile default-zone.json
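The field to change lives under placement_pools in that JSON; on a Jewel default
zone the edited section looks roughly like the excerpt below (pool names vary
between setups; index_type 0 is a normal index, 1 is indexless/blind):

    "placement_pools": [
        {
            "key": "default-placement",
            "val": {
                "index_pool": "default.rgw.buckets.index",
                "data_pool": "default.rgw.buckets.data",
                "data_extra_pool": "default.rgw.buckets.non-ec",
                "index_type": 1
            }
        }
    ]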

To apply the changes you have to restart all the RGW daemons. Then all newly
created buckets will not have an index (bucket listing will return empty output),
but GET/PUT works perfectly.
In my tests there is no performance difference between SSD-backed indexes and 
'blind bucket' configuration.

Stas

> On Sep 21, 2016, at 2:26 PM, Ben Hines  wrote:
> 
> Nice, thanks! Must have missed that one. It might work well for our use case 
> since we don't really need the index. 
> 
> -Ben
> 
> On Wed, Sep 21, 2016 at 11:23 AM, Gregory Farnum  wrote:
> On Wednesday, September 21, 2016, Ben Hines  wrote:
> Yes, 200 million is way too big for a single ceph RGW bucket. We encountered 
> this problem early on and sharded our buckets into 20 buckets, each which 
> have the sharded bucket index with 20 shards.
> 
> Unfortunately, enabling the sharded RGW index requires recreating the bucket 
> and all objects.
> 
> The fact that ceph uses ceph itself for the bucket indexes makes RGW less 
> reliable in our experience. Instead of depending on one object you're 
> depending on two, with the index and the object itself. If the cluster has 
> any issues with the index the fact that it blocks access to the object itself 
> is very frustrating. If we could retrieve / put objects into RGW without 
> hitting the index at all we would - we don't need to list our buckets.
> 
> I don't know the details or which release it went into, but indexless buckets 
> are now a thing -- check the release notes or search the lists! :)
> -Greg
> 
>  
> 
> -Ben
> 
> On Tue, Sep 20, 2016 at 1:57 AM, Wido den Hollander  wrote:
> 
> > On 20 September 2016 at 10:55, Василий Ангапов wrote:
> >
> >
> > Hello,
> >
> > Is there any way to copy rgw bucket index to another Ceph node to
> > lower the downtime of RGW? For now I have  a huge bucket with 200
> > million files and its backfilling is blocking RGW completely for an
> > hour and a half even with 10G network.
> >
> 
> No, not really. What you really want is the bucket sharding feature.
> 
> So what you can do is enable the sharding, create a NEW bucket and copy over 
> the objects.
> 
> Afterwards you can remove the old bucket.
> 
> Wido
> 
> > Thanks!


[ceph-users] Upgrading 0.94.6 -> 0.94.9 saturating mon node networking

2016-09-21 Thread Stillwell, Bryan J
While attempting to upgrade a 1200+ OSD cluster from 0.94.6 to 0.94.9 I've
run into serious performance issues every time I restart an OSD.

At first I thought the problem I was running into was caused by the osdmap
encoding bug that Dan and Wido ran into when upgrading to 0.94.7, because
I was seeing a ton (millions) of these messages in the logs:

2016-09-21 20:48:32.831040 osd.504 24.161.248.128:6810/96488 24 : cluster
[WRN] failed to encode map e727985 with expected crc

Here are the links to their descriptions of the problem:

http://www.spinics.net/lists/ceph-devel/msg30450.html
https://www.mail-archive.com/ceph-users@lists.ceph.com/msg30783.html

I tried the solution of using the following command to stop those errors
from occurring:

ceph tell osd.* injectargs '--clog_to_monitors false'

Which did get the messages to stop spamming the log files, however, it
didn't fix the performance issue for me.

Using dstat on the mon nodes I was able to determine that every time the
osdmap is updated (by running 'ceph osd pool set data size 2' in this
example) it causes the outgoing network on all mon nodes to be saturated
for multiple seconds at a time:

----system---- --total-cpu-usage-- ------memory-usage----- -net/total- -dsk/total- --io/total-
     time     |usr sys idl wai hiq siq| used  buff  cach  free| recv  send| read  writ| read  writ
21-09 21:06:53|  1   0  99   0   0   0|11.8G  273M 18.7G  221G|2326k 9015k|   0  1348k|   0  16.0
21-09 21:06:54|  1   1  98   0   0   0|11.9G  273M 18.7G  221G|  15M   10M|   0  1312k|   0  16.0
21-09 21:06:55|  2   2  94   0   0   1|12.3G  273M 18.7G  220G|  14M  311M|   0    48M|   0   309
21-09 21:06:56|  2   3  93   0   0   3|12.2G  273M 18.7G  220G|7745k 1190M|   0    16M|   0  93.0
21-09 21:06:57|  1   2  96   0   0   1|12.0G  273M 18.7G  220G|8269k 1189M|   0  1956k|   0  10.0
21-09 21:06:58|  3   1  95   0   0   1|11.8G  273M 18.7G  221G|4854k  752M|   0  4960k|   0  21.0
21-09 21:06:59|  3   0  97   0   0   0|11.8G  273M 18.7G  221G|3098k   25M|   0  5036k|   0  26.0
21-09 21:07:00|  1   0  98   0   0   0|11.8G  273M 18.7G  221G|2247k   25M|   0  9980k|   0  45.0
21-09 21:07:01|  2   1  97   0   0   0|11.8G  273M 18.7G  221G|4149k   17M|   0    76M|   0   427

That would be 1190 MiB/s (or 9.982 Gbps).
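A rough way to sanity-check that number is to pull one full osdmap and multiply its
size by the number of OSDs that will re-request full maps on each new epoch (which
is what the failed-crc path forces them to do), e.g.:

# fetch the latest full osdmap and check its size
ceph osd getmap -o /tmp/osdmap.full
ls -lh /tmp/osdmap.full
# (full map size) x (OSDs asking for full maps) per epoch ~= the burst the mons must serve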

Restarting every OSD on a node at once as part of the upgrade causes a
couple of minutes' worth of network saturation on all three mon nodes.  This
causes thousands of slow requests and many unhappy OpenStack users.

I'm now stuck about 15% into the upgrade and haven't been able to
determine how to move forward (or even backward) without causing another
outage.

I've attempted to run the same test on another cluster with 1300+ OSDs and
the outgoing network on the mon nodes didn't exceed 15 MiB/s (0.126 Gbps).

Any suggestions on how I can proceed?

Thanks,
Bryan



[ceph-users] rgw multi-site replication issues

2016-09-21 Thread John Rowe
Hello,

We have 2 Ceph clusters running in two separate data centers, each one with
3 mons, 3 rgws, and 5 osds. I am attempting to get bi-directional
multi-site replication setup as described in the ceph documentation here:
http://docs.ceph.com/docs/jewel/radosgw/multisite/

We are running Jewel v 10.2.2:
rpm -qa | grep ceph
ceph-base-10.2.2-0.el7.x86_64
ceph-10.2.2-0.el7.x86_64
ceph-radosgw-10.2.2-0.el7.x86_64
libcephfs1-10.2.2-0.el7.x86_64
python-cephfs-10.2.2-0.el7.x86_64
ceph-selinux-10.2.2-0.el7.x86_64
ceph-mon-10.2.2-0.el7.x86_64
ceph-osd-10.2.2-0.el7.x86_64
ceph-release-1-1.el7.noarch
ceph-common-10.2.2-0.el7.x86_64
ceph-mds-10.2.2-0.el7.x86_64

It appears syncing is happening; however, it is not able to sync the
metadata, and therefore no users/buckets from the primary are making it to the
secondary.
Primary sync status:

radosgw-admin sync status
          realm 3af93a86-916a-490f-b38f-17922b472b19 (my_realm)
      zonegroup 235b010c-22e2-4b43-8fcc-8ae01939273e (us)
           zone 6c830b44-4e39-4e19-9bd8-03c37c2021f2 (us-dfw)
  metadata sync no sync (zone is master)
      data sync source: 58aa3eef-fc1f-492c-a08e-9c6019e7c266 (us-phx)
                        syncing
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
                        data is caught up with source

radosgw-admin data sync status --source-zone=us-phx
{
    "sync_status": {
        "info": {
            "status": "sync",
            "num_shards": 128
        },
        "markers": [...]
    },
    ...
}

radosgw-admin metadata sync status
{
    "sync_status": {
        "info": {
            "status": "init",
            "num_shards": 0,
            "period": "",
            "realm_epoch": 0
        },
        "markers": []
    },
    "full_sync": {
        "total": 0,
        "complete": 0
    }
}

Secondary sync status:

radosgw-admin sync status
          realm 3af93a86-916a-490f-b38f-17922b472b19 (pardot)
      zonegroup 235b010c-22e2-4b43-8fcc-8ae01939273e (us)
           zone 58aa3eef-fc1f-492c-a08e-9c6019e7c266 (us-phx)
  metadata sync failed to read sync status: (2) No such file or directory
      data sync source: 6c830b44-4e39-4e19-9bd8-03c37c2021f2 (us-dfw)
                        syncing
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
                        data is behind on 10 shards
                        oldest incremental change not applied: 2016-09-20 15:00:17.0.330225s

radosgw-admin data sync status --source-zone=us-dfw
{
    "sync_status": {
        "info": {
            "status": "building-full-sync-maps",
            "num_shards": 128
        },
        ...
    }
}

radosgw-admin metadata sync status --source-zone=us-dfw
ERROR: sync.read_sync_status() returned ret=-2

In the logs I am seeing (date stamps not in order to pick out non-dupes):
Primary logs:
2016-09-20 15:02:44.313204 7f2a2dffb700  0 ERROR:
client_io->complete_request() returned -5
2016-09-20 10:31:57.501247 7faf4bfff700  0 ERROR: failed to wait for op,
ret=-11: POST
http://pardot0-cephrgw1-3-phx.ops.sfdc.net:80/admin/realm/period?period=385c44c7-0506-4204-90d7-9d26a6cbaad2=12=f46ce11b-ee5d-489b-aa30-752fc5353931
2016-09-20 10:32:03.391118 7fb12affd700  0 ERROR: failed to fetch datalog
info
2016-09-20 10:32:03.491520 7fb12affd700  0 ERROR: lease cr failed, done
early

Secondary logs;
2016-09-20 10:28:15.290050 7faab2fed700  0 ERROR: failed to get bucket
instance info for bucket
id=BUCKET1:78301214-35bb-41df-a77f-24968ee4b3ff.104293.1
2016-09-20 10:28:15.290108 7faab5ff3700  0 ERROR: failed to get bucket
instance info for bucket
id=BUCKET1:78301214-35bb-41df-a77f-24968ee4b3ff.104293.1
2016-09-20 10:28:15.290571 7faab77f6700  0 ERROR: failed to get bucket
instance info for bucket
id=BUCKET1:78301214-35bb-41df-a77f-24968ee4b3ff.104293.1
2016-09-20 10:28:15.304619 7faaad7e2700  0 ERROR: failed to get bucket
instance info for bucket
id=BUCKET1:78301214-35bb-41df-a77f-24968ee4b3ff.104293.1
2016-09-20 10:28:38.169629 7fa98bfff700  0 ERROR: failed to distribute
cache for .rgw.root:periods.385c44c7-0506-4204-90d7-9d26a6cbaad2.12
2016-09-20 10:28:38.169642 7fa98bfff700 -1 period epoch 12 is not newer
than current epoch 12, discarding update
2016-09-21 03:19:01.550808 7fe10bfff700  0 rgw meta sync: ERROR: failed to
fetch mdlog info
2016-09-21 15:45:09.799195 7fcd677fe700  0 ERROR: failed to fetch remote
data log info: ret=-11

Each of those log lines is repeated several times, and constantly.

Any help would be greatly appreciated.  Thanks!
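For reference, a typical sequence for re-initialising metadata sync on the secondary
zone in Jewel looks roughly like the following (the URL and system-user credentials
are placeholders; whether it recreates the missing sync-status objects depends on the
underlying cause):

radosgw-admin period pull --url=http://<primary-rgw-endpoint>:80 \
    --access-key=<system-user-access-key> --secret=<system-user-secret-key>
radosgw-admin period update --commit
radosgw-admin metadata sync init
radosgw-admin metadata sync run     # or restart the secondary RGWs and re-check
radosgw-admin sync status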


Re: [ceph-users] Object lost

2016-09-21 Thread Jason Dillaman
Unfortunately, it sounds like the image's header object was lost
during your corruption event. While you can manually retrieve the
image data blocks from the cluster, undoubtedly many might be lost
and/or corrupted as well.

You'll first need to determine the internal id of your image:
$ rados --pool images getomapval rbd_directory name_07e54256-d123-4e61-a23a-7f8008340751
value (16 bytes) :
  0c 00 00 00 31 30 31 34  31 30 39 63 66 39 32 65  |1014109cf92e|
0010

In my example above, the image id (1014109cf92e in this case) is the
string starting after the first four bytes (the id length). I can then
use the rados tool to list all available data blocks:

$ rados --pool images ls | grep rbd_data.1014109cf92e | sort
rbd_data.1014109cf92e.
rbd_data.1014109cf92e.000b
rbd_data.1014109cf92e.0010

The sequence of hex numbers at the end of each data object is the
object number and it represents the byte offset within the image (4MB
* object number = byte offset assuming default 4MB object size and no
fancy striping enabled).

You should be able to script something up to rebuild a sparse image
with whatever data is still available in your cluster.
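A rough sketch of such a script, assuming the default 4 MB object size and no fancy
striping as noted above, and reusing the example pool and image id; the size passed
to truncate is a placeholder for the original image's size:

prefix=rbd_data.1014109cf92e
out=/tmp/recovered.img
truncate -s 100G "$out"                    # placeholder: use the original image size here
rados --pool images ls | grep "^$prefix\." | sort | while read -r obj; do
    objno=$((16#${obj##*.}))               # trailing hex suffix -> object number
    rados --pool images get "$obj" /tmp/chunk
    # write each surviving chunk at its byte offset; missing objects simply remain holes
    dd if=/tmp/chunk of="$out" bs=4M seek="$objno" conv=notrunc 2>/dev/null
done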

On Wed, Sep 21, 2016 at 11:12 AM, Fran Barrera  wrote:
> Hello,
>
> I have a Ceph Jewel cluster with 4 osds and only one monitor integrated with
> Openstack Mitaka.
>
> Two OSDs were down; with a service restart one of them was recovered. The
> cluster began to recover and was OK. Finally, the disk of the other OSD turned out
> to be corrupted and the solution was to format it and recreate the OSD.
>
> Now I have the cluster OK, but the problem now is with some of the images
> stored in Ceph.
>
> $ rbd list -p images|grep 07e54256-d123-4e61-a23a-7f8008340751
> 07e54256-d123-4e61-a23a-7f8008340751
>
> $ rbd export -p images 07e54256-d123-4e61-a23a-7f8008340751 /tmp/image.img
> 2016-09-21 17:07:00.889379 7f51f9520700 -1 librbd::image::OpenRequest:
> failed to retreive immutable metadata: (2) No such file or directory
> rbd: error opening image 07e54256-d123-4e61-a23a-7f8008340751: (2) No such
> file or directory
>
> Ceph can list the image but nothing more, for example an export. So
> OpenStack cannot retrieve this image. I tried repairing the PG but it appears OK.
> Is there any solution for this?
>
> Kind Regards,
> Fran.
>



-- 
Jason


Re: [ceph-users] Faulting MDS clients, HEALTH_OK

2016-09-21 Thread Gregory Farnum
On Wed, Sep 21, 2016 at 1:16 PM, Heller, Chris  wrote:
> Ok. I just ran into this issue again. The mds rolled after many clients were 
> failing to relieve cache pressure.

That definitely could have had something to do with it, if say they
overloaded the MDS so much it got stuck in a directory read loop.
...actually now I come to think of it, I think there was some problem
with Hadoop not being nice about closing files and so forcing clients
to keep them pinned, which will make the MDS pretty unhappy if they're
holding more than it's configured for.

>
> Now here is the result of `ceph –s`
>
> # ceph -s
> cluster b126570e-9e7c-0bb2-991f-ecf9abe3afa0
>  health HEALTH_OK
>  monmap e1: 5 mons at 
> {a154=192.168.1.154:6789/0,a155=192.168.1.155:6789/0,a189=192.168.1.189:6789/0,a190=192.168.1.190:6789/0,a191=192.168.1.191:6789/0}
> election epoch 130, quorum 0,1,2,3,4 a154,a155,a189,a190,a191
>  mdsmap e18676: 1/1/1 up {0=a190=up:active}, 1 up:standby-replay, 3 
> up:standby
>  osdmap e118886: 192 osds: 192 up, 192 in
>   pgmap v13706298: 11328 pgs, 5 pools, 22704 GB data, 63571 kobjects
> 69601 GB used, 37656 GB / 104 TB avail
>11309 active+clean
>   13 active+clean+scrubbing
>6 active+clean+scrubbing+deep
>
> And here are the ops in flight:
>
> # ceph daemon mds.a190 dump_ops_in_flight
> {
> "ops": [],
> "num_ops": 0
> }
>
> And a tail of the active mds log at debug_mds 5/5
>
> 2016-09-21 20:15:53.354226 7fce3b626700  4 mds.0.server handle_client_request 
> client_request(client.585124080:17863 lookup #1/stream2store 2016-09-21 
> 20:15:53.352390) v2
> 2016-09-21 20:15:53.354234 7fce3b626700  5 mds.0.server session 
> closed|closing|killing, dropping

This is also pretty solid evidence that the MDS is zapping clients
when they misbehave.

> You can increase "mds cache size" past its default 100,000 dentries and
see if that alleviates (or just draws out) the problem.
-Greg

> 2016-09-21 20:15:54.867108 7fce3b626700  3 mds.0.server handle_client_session 
> client_session(request_renewcaps seq 235) v1 from client.507429717
> 2016-09-21 20:15:54.980907 7fce3851f700  2 mds.0.cache check_memory_usage 
> total 1475784, rss 666432, heap 79712, malloc 584052 mmap 0, baseline 79712, 
> buffers 0, max 1048576, 0 / 93392 inodes have caps, 0 caps, 0 caps per inode
> 2016-09-21 20:15:54.980960 7fce3851f700  5 mds.0.bal mds.0 epoch 38 load 
> mdsload<[0,0 0]/[0,0 0], req 1987, hr 0, qlen 0, cpu 0.34>
> 2016-09-21 20:15:55.247885 7fce3b626700  3 mds.0.server handle_client_session 
> client_session(request_renewcaps seq 233) v1 from client.538555196
> 2016-09-21 20:15:55.455566 7fce3b626700  3 mds.0.server handle_client_session 
> client_session(request_renewcaps seq 365) v1 from client.507390467
> 2016-09-21 20:15:55.807704 7fce3b626700  3 mds.0.server handle_client_session 
> client_session(request_renewcaps seq 367) v1 from client.538485341
> 2016-09-21 20:15:56.243462 7fce3b626700  3 mds.0.server handle_client_session 
> client_session(request_renewcaps seq 189) v1 from client.538577596
> 2016-09-21 20:15:56.986901 7fce3b626700  3 mds.0.server handle_client_session 
> client_session(request_renewcaps seq 232) v1 from client.507430372
> 2016-09-21 20:15:57.026206 7fce3b626700  3 mds.0.server handle_client_session 
> client_session(request_renewcaps seq 364) v1 from client.491885158
> 2016-09-21 20:15:57.369281 7fce3b626700  3 mds.0.server handle_client_session 
> client_session(request_renewcaps seq 364) v1 from client.507390682
> 2016-09-21 20:15:57.445687 7fce3b626700  3 mds.0.server handle_client_session 
> client_session(request_renewcaps seq 364) v1 from client.538485996
> 2016-09-21 20:15:57.579268 7fce3b626700  3 mds.0.server handle_client_session 
> client_session(request_renewcaps seq 364) v1 from client.538486021
> 2016-09-21 20:15:57.595568 7fce3b626700  3 mds.0.server handle_client_session 
> client_session(request_renewcaps seq 364) v1 from client.507390702
> 2016-09-21 20:15:57.604356 7fce3b626700  3 mds.0.server handle_client_session 
> client_session(request_renewcaps seq 364) v1 from client.507390712
> 2016-09-21 20:15:57.693546 7fce3b626700  3 mds.0.server handle_client_session 
> client_session(request_renewcaps seq 364) v1 from client.507390717
> 2016-09-21 20:15:57.819536 7fce3b626700  3 mds.0.server handle_client_session 
> client_session(request_renewcaps seq 364) v1 from client.491885168
> 2016-09-21 20:15:57.894058 7fce3b626700  3 mds.0.server handle_client_session 
> client_session(request_renewcaps seq 364) v1 from client.507390732
> 2016-09-21 20:15:57.983329 7fce3b626700  3 mds.0.server handle_client_session 
> client_session(request_renewcaps seq 364) v1 from client.507390742
> 2016-09-21 20:15:58.077915 7fce3b626700  3 mds.0.server handle_client_session 
> client_session(request_renewcaps seq 364) v1 from client.538486031
> 2016-09-21 20:15:58.141710 7fce3b626700  3 

Re: [ceph-users] Faulting MDS clients, HEALTH_OK

2016-09-21 Thread Heller, Chris
Ok. I just ran into this issue again. The mds rolled after many clients were 
failing to relieve cache pressure.

Now here is the result of `ceph –s`

# ceph -s
cluster b126570e-9e7c-0bb2-991f-ecf9abe3afa0
 health HEALTH_OK
 monmap e1: 5 mons at 
{a154=192.168.1.154:6789/0,a155=192.168.1.155:6789/0,a189=192.168.1.189:6789/0,a190=192.168.1.190:6789/0,a191=192.168.1.191:6789/0}
election epoch 130, quorum 0,1,2,3,4 a154,a155,a189,a190,a191
 mdsmap e18676: 1/1/1 up {0=a190=up:active}, 1 up:standby-replay, 3 
up:standby
 osdmap e118886: 192 osds: 192 up, 192 in
  pgmap v13706298: 11328 pgs, 5 pools, 22704 GB data, 63571 kobjects
69601 GB used, 37656 GB / 104 TB avail
               11309 active+clean
                  13 active+clean+scrubbing
                   6 active+clean+scrubbing+deep

And here are the ops in flight:

# ceph daemon mds.a190 dump_ops_in_flight
{
"ops": [],
"num_ops": 0
}

And a tail of the active mds log at debug_mds 5/5

2016-09-21 20:15:53.354226 7fce3b626700  4 mds.0.server handle_client_request 
client_request(client.585124080:17863 lookup #1/stream2store 2016-09-21 
20:15:53.352390) v2
2016-09-21 20:15:53.354234 7fce3b626700  5 mds.0.server session 
closed|closing|killing, dropping
2016-09-21 20:15:54.867108 7fce3b626700  3 mds.0.server handle_client_session 
client_session(request_renewcaps seq 235) v1 from client.507429717
2016-09-21 20:15:54.980907 7fce3851f700  2 mds.0.cache check_memory_usage total 
1475784, rss 666432, heap 79712, malloc 584052 mmap 0, baseline 79712, buffers 
0, max 1048576, 0 / 93392 inodes have caps, 0 caps, 0 caps per inode
2016-09-21 20:15:54.980960 7fce3851f700  5 mds.0.bal mds.0 epoch 38 load 
mdsload<[0,0 0]/[0,0 0], req 1987, hr 0, qlen 0, cpu 0.34>
2016-09-21 20:15:55.247885 7fce3b626700  3 mds.0.server handle_client_session 
client_session(request_renewcaps seq 233) v1 from client.538555196
2016-09-21 20:15:55.455566 7fce3b626700  3 mds.0.server handle_client_session 
client_session(request_renewcaps seq 365) v1 from client.507390467
2016-09-21 20:15:55.807704 7fce3b626700  3 mds.0.server handle_client_session 
client_session(request_renewcaps seq 367) v1 from client.538485341
2016-09-21 20:15:56.243462 7fce3b626700  3 mds.0.server handle_client_session 
client_session(request_renewcaps seq 189) v1 from client.538577596
2016-09-21 20:15:56.986901 7fce3b626700  3 mds.0.server handle_client_session 
client_session(request_renewcaps seq 232) v1 from client.507430372
2016-09-21 20:15:57.026206 7fce3b626700  3 mds.0.server handle_client_session 
client_session(request_renewcaps seq 364) v1 from client.491885158
2016-09-21 20:15:57.369281 7fce3b626700  3 mds.0.server handle_client_session 
client_session(request_renewcaps seq 364) v1 from client.507390682
2016-09-21 20:15:57.445687 7fce3b626700  3 mds.0.server handle_client_session 
client_session(request_renewcaps seq 364) v1 from client.538485996
2016-09-21 20:15:57.579268 7fce3b626700  3 mds.0.server handle_client_session 
client_session(request_renewcaps seq 364) v1 from client.538486021
2016-09-21 20:15:57.595568 7fce3b626700  3 mds.0.server handle_client_session 
client_session(request_renewcaps seq 364) v1 from client.507390702
2016-09-21 20:15:57.604356 7fce3b626700  3 mds.0.server handle_client_session 
client_session(request_renewcaps seq 364) v1 from client.507390712
2016-09-21 20:15:57.693546 7fce3b626700  3 mds.0.server handle_client_session 
client_session(request_renewcaps seq 364) v1 from client.507390717
2016-09-21 20:15:57.819536 7fce3b626700  3 mds.0.server handle_client_session 
client_session(request_renewcaps seq 364) v1 from client.491885168
2016-09-21 20:15:57.894058 7fce3b626700  3 mds.0.server handle_client_session 
client_session(request_renewcaps seq 364) v1 from client.507390732
2016-09-21 20:15:57.983329 7fce3b626700  3 mds.0.server handle_client_session 
client_session(request_renewcaps seq 364) v1 from client.507390742
2016-09-21 20:15:58.077915 7fce3b626700  3 mds.0.server handle_client_session 
client_session(request_renewcaps seq 364) v1 from client.538486031
2016-09-21 20:15:58.141710 7fce3b626700  3 mds.0.server handle_client_session 
client_session(request_renewcaps seq 364) v1 from client.491885178
2016-09-21 20:15:58.159134 7fce3b626700  3 mds.0.server handle_client_session 
client_session(request_renewcaps seq 364) v1 from client.491885188

-Chris

On 9/21/16, 11:23 AM, "Heller, Chris"  wrote:

Perhaps related, I was watching the active mds with debug_mds set to 5/5, 
when I saw this in the log:

2016-09-21 15:13:26.067698 7fbaec248700  0 -- 192.168.1.196:6802/13581 >> 
192.168.1.238:0/3488321578 pipe(0x55db000 sd=49 :6802 s=2 pgs=2 cs=1 l=0 
c=0x5631ce0).fault with nothing to send, going to standby
2016-09-21 15:13:26.067717 7fbaf64ea700  0 -- 192.168.1.196:6802/13581 >> 
192.168.1.214:0/3252234463 pipe(0x54d1000 sd=76 :6802 s=2 pgs=2 cs=1 l=0 
c=0x237e8420).fault with 

Re: [ceph-users] Stat speed for objects in ceph

2016-09-21 Thread Iain Buclaw
On 21 September 2016 at 17:28, Haomai Wang  wrote:
> BTW, why you need to iterate so much objects. I think it should be
> done by other ways to achieve the goal.
>

Mostly it's just a brute-force way to identify objects that shouldn't
exist, or objects that have been orphaned (e.g. last modification time
was over 60 days ago).  This housekeeping probably wouldn't be needed
if it were possible to rely on the storage platform, and on the index
referencing all stored objects, always being correct.

In reality, strange things happen - data was never written, or goes
missing during or after a migration, or disk failure, etc...

When the lifecycle of an object ends, it gets removed from the index and
deleted from disk.  Again, in reality - data was never deleted, or gets
recreated during a migration, etc... :-)

This goes back to iterating all objects and validating that there's
nothing unexpected still on disk.
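
For reference, the scan is essentially this kind of loop against the librados C
API -- a stripped-down sketch, where the pool name, the 60-day cutoff and the
missing error handling are just placeholders for what our client really does:

#include <stdint.h>
#include <stdio.h>
#include <time.h>
#include <rados/librados.h>

int main(void)
{
    rados_t cluster;
    rados_ioctx_t io;
    rados_list_ctx_t it;
    const char *oid;
    int r;

    /* default client.admin identity, default ceph.conf search path */
    rados_create(&cluster, NULL);
    rados_conf_read_file(cluster, NULL);
    if (rados_connect(cluster) < 0)
        return 1;
    if (rados_ioctx_create(cluster, "mypool", &io) < 0)  /* pool name is a placeholder */
        return 1;

    rados_nobjects_list_open(io, &it);
    while ((r = rados_nobjects_list_next(it, &oid, NULL, NULL)) == 0) {
        uint64_t size;
        time_t mtime;
        /* one stat round trip per listed object -- this is what dominates the run */
        if (rados_stat(io, oid, &size, &mtime) == 0 &&
            time(NULL) - mtime > 60 * 24 * 3600)
            printf("orphan candidate: %s\n", oid);
    }
    /* rados_nobjects_list_next() returns -ENOENT once the listing is exhausted */

    rados_nobjects_list_close(it);
    rados_ioctx_destroy(io);
    rados_shutdown(cluster);
    return 0;
}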

Now that I have (mostly) one region migrated over to Ceph, maybe there
will start being less reliance on this sort of housekeeping.  But the
constant stat'ing of objects to check their existence must always happen
during periodic refreshes.

But from what I gather from my local tests, and feedback on here, it
seems like there should be room for ample improvement in object
iteration.  If I request an object via
rados_nobjects_list_next(), the chances of me asking for the next
object via the same call should be pretty high, right?  And it
would do no harm to prefetch that data before it's requested by the
rados client.

--
Iain Buclaw
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rgw bucket index manual copy

2016-09-21 Thread Ben Hines
Nice, thanks! Must have missed that one. It might work well for our use
case since we don't really need the index.

-Ben

On Wed, Sep 21, 2016 at 11:23 AM, Gregory Farnum  wrote:

> On Wednesday, September 21, 2016, Ben Hines  wrote:
>
>> Yes, 200 million is way too big for a single ceph RGW bucket. We
>> encountered this problem early on and sharded our buckets into 20 buckets,
>> each which have the sharded bucket index with 20 shards.
>>
>> Unfortunately, enabling the sharded RGW index requires recreating the
>> bucket and all objects.
>>
>> The fact that ceph uses ceph itself for the bucket indexes makes RGW less
>> reliable in our experience. Instead of depending on one object you're
>> depending on two, with the index and the object itself. If the cluster has
>> any issues with the index the fact that it blocks access to the object
>> itself is very frustrating. If we could retrieve / put objects into RGW
>> without hitting the index at all we would - we don't need to list our
>> buckets.
>>
>
> I don't know the details or which release it went into, but indexless
> buckets are now a thing -- check the release notes or search the lists! :)
> -Greg
>
>
>
>>
>> -Ben
>>
>> On Tue, Sep 20, 2016 at 1:57 AM, Wido den Hollander 
>> wrote:
>>
>>>
>>> > Op 20 september 2016 om 10:55 schreef Василий Ангапов <
>>> anga...@gmail.com>:
>>> >
>>> >
>>> > Hello,
>>> >
>>> > Is there any way to copy rgw bucket index to another Ceph node to
>>> > lower the downtime of RGW? For now I have  a huge bucket with 200
>>> > million files and its backfilling is blocking RGW completely for an
>>> > hour and a half even with 10G network.
>>> >
>>>
>>> No, not really. What you really want is the bucket sharding feature.
>>>
>>> So what you can do is enable the sharding, create a NEW bucket and copy
>>> over the objects.
>>>
>>> Afterwards you can remove the old bucket.
>>>
>>> Wido
>>>
>>> > Thanks!
>>> > ___
>>> > ceph-users mailing list
>>> > ceph-users@lists.ceph.com
>>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>>
>>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rgw bucket index manual copy

2016-09-21 Thread Gregory Farnum
On Wednesday, September 21, 2016, Ben Hines  wrote:

> Yes, 200 million is way too big for a single ceph RGW bucket. We
> encountered this problem early on and sharded our buckets into 20 buckets,
> each which have the sharded bucket index with 20 shards.
>
> Unfortunately, enabling the sharded RGW index requires recreating the
> bucket and all objects.
>
> The fact that ceph uses ceph itself for the bucket indexes makes RGW less
> reliable in our experience. Instead of depending on one object you're
> depending on two, with the index and the object itself. If the cluster has
> any issues with the index the fact that it blocks access to the object
> itself is very frustrating. If we could retrieve / put objects into RGW
> without hitting the index at all we would - we don't need to list our
> buckets.
>

I don't know the details or which release it went into, but indexless
buckets are now a thing -- check the release notes or search the lists! :)
-Greg



>
> -Ben
>
> On Tue, Sep 20, 2016 at 1:57 AM, Wido den Hollander  > wrote:
>
>>
>> > Op 20 september 2016 om 10:55 schreef Василий Ангапов <
>> anga...@gmail.com >:
>> >
>> >
>> > Hello,
>> >
>> > Is there any way to copy rgw bucket index to another Ceph node to
>> > lower the downtime of RGW? For now I have  a huge bucket with 200
>> > million files and its backfilling is blocking RGW completely for an
>> > hour and a half even with 10G network.
>> >
>>
>> No, not really. What you really want is the bucket sharding feature.
>>
>> So what you can do is enable the sharding, create a NEW bucket and copy
>> over the objects.
>>
>> Afterwards you can remove the old bucket.
>>
>> Wido
>>
>> > Thanks!
>> > ___
>> > ceph-users mailing list
>> > ceph-users@lists.ceph.com
>> 
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> 
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rgw bucket index manual copy

2016-09-21 Thread Ben Hines
Yes, 200 million is way too big for a single ceph RGW bucket. We
encountered this problem early on and sharded our buckets into 20 buckets,
each of which has a sharded bucket index with 20 shards.

Unfortunately, enabling the sharded RGW index requires recreating the
bucket and all objects.

The fact that ceph uses ceph itself for the bucket indexes makes RGW less
reliable in our experience. Instead of depending on one object you're
depending on two, with the index and the object itself. If the cluster has
any issues with the index the fact that it blocks access to the object
itself is very frustrating. If we could retrieve / put objects into RGW
without hitting the index at all we would - we don't need to list our
buckets.

-Ben

On Tue, Sep 20, 2016 at 1:57 AM, Wido den Hollander  wrote:

>
> > Op 20 september 2016 om 10:55 schreef Василий Ангапов  >:
> >
> >
> > Hello,
> >
> > Is there any way to copy rgw bucket index to another Ceph node to
> > lower the downtime of RGW? For now I have  a huge bucket with 200
> > million files and its backfilling is blocking RGW completely for an
> > hour and a half even with 10G network.
> >
>
> No, not really. What you really want is the bucket sharding feature.
>
> So what you can do is enable the sharding, create a NEW bucket and copy
> over the objects.
>
> Afterwards you can remove the old bucket.
>
> Wido
>
> > Thanks!
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] crash of osd using cephfs jewel 10.2.2, and corruption

2016-09-21 Thread Peter Maloney
It seems the trigger for the problem is this:
> 24.9141130d 1000527. [write 0~242] snapc 1=[] ondisk+write
> e320)
>-40> 2016-09-20 20:38:02.007942 708f67bbd700  0
> filestore(/var/lib/ceph/osd/ceph-0) write couldn't open
> 24.32_head/#24:4d11884b:::1000504.:head#: (24) Too many
> open files
>-39> 2016-09-20 20:38:02.007759 708f673ae700  0
> filestore(/var/lib/ceph/osd/ceph-0) write couldn't open

(and the compressed log is 23MB... do you really want it still?)

So I can understand the osd has little choice but to give up, but
corruption is not okay.

So I set the open files limit to 1 soft 13000 hard, and it seems
fine now. (the default was only 1024.) Grsecurity kernels actually
enforce these things, so maybe that's why nobody else noticed this
corruption problem.
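
(For reference, that's just the usual /etc/security/limits.conf knob -- roughly
the lines below, with whichever user the osds actually run as and values sized
to taste; user and numbers here are only illustrative -- or a LimitNOFILE=
override if the osds are started by systemd. The new limit only applies to
processes started after it's in place.

ceph soft nofile 32768
ceph hard nofile 32768
)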

BTW since the symptom is similar, let me mention another bug I found in
hammer 0.94.9... I was playing around with cephfs and `ceph osd
blacklist add ...` and blacklisted a client that wrote some files, and
then whether I kill -9, umount -l, unblacklist right away, etc. it will
show non-corrupt files on the client until cache flush (umount or sysctl
vm.drop_caches=3), and then the files I wrote will be all nulls, even
though no osds crashed. I couldn't reproduce that in jewel 10.2.2
though. So that's 2 ways to cause this... so it should be solved other
than just the limits.conf adjustment.


Peter


On 09/21/16 13:02, Samuel Just wrote:
> Looks like the OSD didn't like an error return it got from the
> underlying fs.  Can you reproduce with
>
> debug filestore = 20
> debug osd = 20
> debug ms = 1
>
> on the osd and post the whole log?
> -Sam
>
> On Wed, Sep 21, 2016 at 12:10 AM, Peter Maloney
>  wrote:
>> Hi,
>>
>> I created a one disk osd with data and separate journal on the same lvm
>> volume group just for test, one mon, one mds on my desktop.
>>
>> I managed to crash the osd just by mounting cephfs and doing cp -a of
>> the linux-stable git tree into it. It crashed after copying 2.1G which
>> only covers some of the .git dir and none of the rest. And then when I
>> killed ceph-mds and restarted the osd and mds, ceph -s said something
>> about the pgs being stuck or unclean or something, and the computer
>> froze. :/ After booting again, everything is fine, and the problem was
>> reproducable the same way...just copying the files again.[but after
>> writing this mail, I can't seem to cause it as easily again... copying
>> again works, but sha1sum doesn't, even if I drop caches]
>>
>> Also reading seems to do the same.
>>
>> And then I tried adding a 2nd osd (also from vlm, with osd and journal
>> on same volume group). And that seemed to stop the crashing, but not
>> sure about corruption.I guess the corruption was on the cephfs but RAM
>> had good copies or something, so rebooting, etc. is what made the
>> corruption appear? (I tried to reproduce, but couldn't...didn't try
>> killing daemons)
>>

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] radosgw bucket name performance

2016-09-21 Thread Félix Barbeira
Hi,

According to the Amazon S3 documentation, it is advised to insert a bit of
random chars in the bucket name in order to gain performance. This is
related to how Amazon stores key names. It looks like they store an index of
object key names in each region.

http://docs.aws.amazon.com/AmazonS3/latest/dev/request-rate-perf-considerations.html#workloads-with-mix-request-types

My question is: is this also a good practice in a ceph cluster where all
the nodes are in the same datacenter? Is the name of the bucket relevant in
ceph for gaining more performance? I think it's not, because all the data
is spread across the placement groups on all the osd nodes, no matter what
bucket name it got. Can anyone confirm this?

Thanks in advance.

-- 
Félix Barbeira.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Ceph Rust Librados

2016-09-21 Thread Chris Jones
Ceph-rust for librados has been released. It's an API interface in Rust for
all of librados that's a thin layer above the C APIs. There are low-level
direct access and higher level Rust helpers that make working directly with
librados simple.

The official repo is:
https://github.com/ceph/ceph-rust

The Rust Crate is:
ceph-rust

Rust is a systems programming language that gives you the speed and
low-level access of C but with the benefits of a higher level language. The
main benefits of Rust are:
1. Speed
2. Prevents segfaults
3. Guarantees thread safety
4. Strong typing
5. Compiled

You can find out more at: https://www.rust-lang.org

Contributions are encouraged and welcomed.

This is the base for a number of larger Ceph related projects. Updates to
the library will be frequent.

Also, there will be new Ceph tools coming soon and you can use the
following for RGW/S3 access from Rust: (Supports V2 and V4 signatures)
Crate: aws-sdk-rust - https://github.com/lambdastackio/aws-sdk-rust

Thanks,
Chris Jones
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Stat speed for objects in ceph

2016-09-21 Thread Wido den Hollander

> Op 21 september 2016 om 17:23 schreef Iain Buclaw :
> 
> 
> On 20 September 2016 at 19:27, Gregory Farnum  wrote:
> > In librados getting a stat is basically equivalent to reading a small
> > object; there's not an index or anything so FileStore needs to descend its
> > folder hierarchy. If looking at metadata for all the objects in the system
> > efficiently is important you'll want to layer an index in somewhere.
> > -Greg
> >
> 
> Yeah, that's not particularly good to hear.  Is this slowness also
> inherent in list_nobjects too?  It looks like I can iterate all
> objects at a rate no faster than 25K per second.  No chance at
> speeding this up either by having two or more instances starting at
> different pg offsets.
> 

RADOS has no index of objects. Everything is done using calculation. So when 
listing objects you basically have to go to each primary OSD for all PGs in a 
pool and ask them what objects are in the pool/PG.

> For this particular operation, it's only looking for orphaned objects.
> This wouldn't be needed if a mechanism for TTLs existed and set on all
> objects.  But that would mean finding out how RGW gets away with it,
> and I assume with another very large index and actively keeping track
> of all set destruction times.
> 

RGW indeed keeps an index, but that limits the number of objects you can
store in a single bucket. Yes, bucket sharding helps, but the limit for an RGW
bucket is still way lower than for a RADOS pool.

Wido

> -- 
> Iain Buclaw
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Stat speed for objects in ceph

2016-09-21 Thread Haomai Wang
BTW, why do you need to iterate over so many objects? I think the goal
should be achievable in other ways.

On Wed, Sep 21, 2016 at 11:23 PM, Iain Buclaw  wrote:
> On 20 September 2016 at 19:27, Gregory Farnum  wrote:
>> In librados getting a stat is basically equivalent to reading a small
>> object; there's not an index or anything so FileStore needs to descend its
>> folder hierarchy. If looking at metadata for all the objects in the system
>> efficiently is important you'll want to layer an index in somewhere.
>> -Greg
>>
>
> Yeah, that's not particularly good to hear.  Is this slowness also
> inherent in list_nobjects too?  It looks like I can iterate all
> objects at a rate no faster than 25K per second.  No chance at
> speeding this up either by having two or more instances starting at
> different pg offsets.
>
> For this particular operation, it's only looking for orphaned objects.
> This wouldn't be needed if a mechanism for TTLs existed and set on all
> objects.  But that would mean finding out how RGW gets away with it,
> and I assume with another very large index and actively keeping track
> of all set destruction times.
>
> --
> Iain Buclaw
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Stat speed for objects in ceph

2016-09-21 Thread Iain Buclaw
On 20 September 2016 at 19:27, Gregory Farnum  wrote:
> In librados getting a stat is basically equivalent to reading a small
> object; there's not an index or anything so FileStore needs to descend its
> folder hierarchy. If looking at metadata for all the objects in the system
> efficiently is important you'll want to layer an index in somewhere.
> -Greg
>

Yeah, that's not particularly good to hear.  Is this slowness also
inherent in list_nobjects too?  It looks like I can iterate all
objects at a rate no faster than 25K per second.  No chance at
speeding this up either by having two or more instances starting at
different pg offsets.

For this particular operation, it's only looking for orphaned objects.
This wouldn't be needed if a mechanism for TTLs existed and set on all
objects.  But that would mean finding out how RGW gets away with it,
and I assume with another very large index and actively keeping track
of all set destruction times.

-- 
Iain Buclaw
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Faulting MDS clients, HEALTH_OK

2016-09-21 Thread Heller, Chris
Perhaps related, I was watching the active mds with debug_mds set to 5/5, when 
I saw this in the log:

2016-09-21 15:13:26.067698 7fbaec248700  0 -- 192.168.1.196:6802/13581 >> 
192.168.1.238:0/3488321578 pipe(0x55db000 sd=49 :6802 s=2 pgs=2 cs=1 l=0 
c=0x5631ce0).fault with nothing to send, going to standby
2016-09-21 15:13:26.067717 7fbaf64ea700  0 -- 192.168.1.196:6802/13581 >> 
192.168.1.214:0/3252234463 pipe(0x54d1000 sd=76 :6802 s=2 pgs=2 cs=1 l=0 
c=0x237e8420).fault with nothing to send, going to standby
2016-09-21 15:13:26.067725 7fbb0098e700  0 -- 192.168.1.196:6802/13581 >> 
192.168.1.204:0/2963585795 pipe(0x3bf1000 sd=55 :6802 s=2 pgs=2 cs=1 l=0 
c=0x15c29020).fault with noth
ing to send, going to standby
2016-09-21 15:13:26.067743 7fbb026ab700  0 -- 192.168.1.196:6802/13581 >> 
192.168.1.192:0/4235516229 pipe(0x562b000 sd=83 :6802 s=2 pgs=2 cs=1 l=0 
c=0x237e91e0).fault, server,
going to standby
2016-09-21 15:13:26.067749 7fbae840a700  0 -- 192.168.1.196:6802/13581 >> 
192.168.1.214:0/329045 pipe(0x2a38a000 sd=74 :6802 s=2 pgs=2 cs=1 l=0 
c=0x13b6c160).fault with not
hing to send, going to standby
2016-09-21 15:13:26.067783 7fbadb239700  0 -- 192.168.1.196:6802/13581 >> 
192.168.1.141:0/229472938 pipe(0x268d2000 sd=87 :6802 s=2 pgs=2 cs=1 l=0 
c=0x28e24f20).fault with noth
ing to send, going to standby
2016-09-21 15:13:26.067803 7fbafe66b700  0 -- 192.168.1.196:6802/13581 >> 
192.168.1.193:0/2637929639 pipe(0x29582000 sd=80 :6802 s=2 pgs=2 cs=1 l=0 
c=0x237e9760).fault with not
hing to send, going to standby
2016-09-21 15:13:26.067876 7fbb01a9f700  0 -- 192.168.1.196:6802/13581 >> 
192.168.1.228:0/581679898 pipe(0x2384f000 sd=103 :6802 s=2 pgs=2 cs=1 l=0 
c=0x2f92f5a0).fault with not
hing to send, going to standby
2016-09-21 15:13:26.067886 7fbb01ca1700  0 -- 192.168.1.196:6802/13581 >> 
192.168.1.145:0/586636299 pipe(0x25806000 sd=101 :6802 s=2 pgs=2 cs=1 l=0 
c=0x2f92cc60).fault with not
hing to send, going to standby
2016-09-21 15:13:26.067865 7fbaf43c9700  0 -- 192.168.1.196:6802/13581 >> 
192.168.1.234:0/3131612847 pipe(0x2fbe5000 sd=120 :6802 s=2 pgs=2 cs=1 l=0 
c=0x37c902c0).fault with no
thing to send, going to standby
2016-09-21 15:13:26.067910 7fbaf4ed4700  0 -- 192.168.1.196:6802/13581 >> 
192.168.1.236:0/650394434 pipe(0x2fbe sd=116 :6802 s=2 pgs=2 cs=1 l=0 
c=0x56a5440).fault with noth
ing to send, going to standby
2016-09-21 15:13:26.067911 7fbb01196700  0 -- 192.168.1.196:6802/13581 >> 
192.168.1.149:0/821983967 pipe(0x1420b000 sd=104 :6802 s=2 pgs=2 cs=1 l=0 
c=0x2f92cf20).fault with not
hing to send, going to standby
2016-09-21 15:13:26.068076 7fbafc64b700  0 -- 192.168.1.196:6802/13581 >> 
192.168.1.190:0/1817596579 pipe(0x36829000 sd=124 :6802 s=2 pgs=2 cs=1 l=0 
c=0x31f7a100).fault with no
thing to send, going to standby
2016-09-21 15:13:26.067717 7fbaf64ea700  0 -- 192.168.1.196:6802/13581 >> 
192.168.1.214:0/3252234463 pipe(0x54d1000 sd=76 :6802 s=2 pgs=2 cs=1 l=0 
c=0x237e8420).fault w[0/9326]ing to send, going to standby
2016-09-21 15:13:26.067725 7fbb0098e700  0 -- 192.168.1.196:6802/13581 >> 
192.168.1.204:0/2963585795 pipe(0x3bf1000 sd=55 :6802 s=2 pgs=2 cs=1 l=0 
c=0x15c29020).fault with noth
ing to send, going to standby
2016-09-21 15:13:26.067743 7fbb026ab700  0 -- 192.168.1.196:6802/13581 >> 
192.168.1.192:0/4235516229 pipe(0x562b000 sd=83 :6802 s=2 pgs=2 cs=1 l=0 
c=0x237e91e0).fault, server,
going to standby
2016-09-21 15:13:26.067749 7fbae840a700  0 -- 192.168.1.196:6802/13581 >> 
192.168.1.214:0/329045 pipe(0x2a38a000 sd=74 :6802 s=2 pgs=2 cs=1 l=0 
c=0x13b6c160).fault with not
hing to send, going to standby
2016-09-21 15:13:26.067783 7fbadb239700  0 -- 192.168.1.196:6802/13581 >> 
192.168.1.141:0/229472938 pipe(0x268d2000 sd=87 :6802 s=2 pgs=2 cs=1 l=0 
c=0x28e24f20).fault with noth
ing to send, going to standby
2016-09-21 15:13:26.067803 7fbafe66b700  0 -- 192.168.1.196:6802/13581 >> 
192.168.1.193:0/2637929639 pipe(0x29582000 sd=80 :6802 s=2 pgs=2 cs=1 l=0 
c=0x237e9760).fault with not
hing to send, going to standby
2016-09-21 15:13:26.067876 7fbb01a9f700  0 -- 192.168.1.196:6802/13581 >> 
192.168.1.228:0/581679898 pipe(0x2384f000 sd=103 :6802 s=2 pgs=2 cs=1 l=0 
c=0x2f92f5a0).fault with nothing to send, going to standby
2016-09-21 15:13:26.067886 7fbb01ca1700  0 -- 192.168.1.196:6802/13581 >> 
192.168.1.145:0/586636299 pipe(0x25806000 sd=101 :6802 s=2 pgs=2 cs=1 l=0 
c=0x2f92cc60).fault with nothing to send, going to standby
2016-09-21 15:13:26.067865 7fbaf43c9700  0 -- 192.168.1.196:6802/13581 >> 
192.168.1.234:0/3131612847 pipe(0x2fbe5000 sd=120 :6802 s=2 pgs=2 cs=1 l=0 
c=0x37c902c0).fault with nothing to send, going to standby
2016-09-21 15:13:26.067910 7fbaf4ed4700  0 -- 192.168.1.196:6802/13581 >> 
192.168.1.236:0/650394434 pipe(0x2fbe sd=116 :6802 s=2 pgs=2 cs=1 l=0 
c=0x56a5440).fault with nothing to send, going to standby
2016-09-21 15:13:26.067911 7fbb01196700  0 -- 192.168.1.196:6802/13581 >> 
192.168.1.149:0/821983967 

[ceph-users] Object lost

2016-09-21 Thread Fran Barrera
Hello,

I have a Ceph Jewel cluster with 4 osds and only one monitor integrated
with Openstack Mitaka.

Two OSDs were down; with a service restart one of them was recovered. The
cluster began to recover and was OK. Finally the disk of the other OSD was
corrupted, and the solution was to format the disk and recreate the OSD.

Now I have the cluster OK, but the problem now is with some of the images
stored in Ceph.

$ rbd list -p images|grep 07e54256-d123-4e61-a23a-7f8008340751
07e54256-d123-4e61-a23a-7f8008340751

$ rbd export -p images 07e54256-d123-4e61-a23a-7f8008340751 /tmp/image.img
2016-09-21 17:07:00.889379 7f51f9520700 -1 librbd::image::OpenRequest:
failed to retreive immutable metadata: (2) No such file or directory
rbd: error opening image 07e54256-d123-4e61-a23a-7f8008340751: (2) No such
file or directory

Ceph can list the image but nothing more, for example an export. So
Openstack cannot retrieve this image. I tried repairing the pg but it appears ok.
Is there any solution for this?

Kind Regards,
Fran.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Stat speed for objects in ceph

2016-09-21 Thread Iain Buclaw
On 21 September 2016 at 02:57, Haomai Wang  wrote:
> On Wed, Sep 21, 2016 at 2:41 AM, Wido den Hollander  wrote:
>>
>>> Op 20 september 2016 om 20:30 schreef Haomai Wang :
>>>
>>>
>>> On Wed, Sep 21, 2016 at 2:26 AM, Wido den Hollander  wrote:
>>> >
>>> >> Op 20 september 2016 om 19:27 schreef Gregory Farnum 
>>> >> :
>>> >>
>>> >>
>>> >> In librados getting a stat is basically equivalent to reading a small
>>> >> object; there's not an index or anything so FileStore needs to descend 
>>> >> its
>>> >> folder hierarchy. If looking at metadata for all the objects in the 
>>> >> system
>>> >> efficiently is important you'll want to layer an index in somewhere.
>>> >> -Greg
>>> >>
>>> >
>>> > Should we expect a improvement here with BlueStore vs FileStore? That 
>>> > would basically be a RocksDB lookup on the OSD, right?
>>>
>>> Yes, bluestore will be much better since it has indexed on Onode(like
>>> inode) in rocksdb. Although it's fast enough, it also cost some on
>>> construct object, if you only want to check object existence, we may
>>> need a more lightweight interface
>>>
>>
>> It's rados_stat() which would be called, that is the way to check if a 
>> object exists. If I remember the BlueStore architecture correctly it would 
>> be a lookup in RocksDB with all the information in there.
>
> Exactly, but compared to database query, this lookup is still heavy.
> Each onode construct need to get lots of keys and do inline construct.
> Of course, it's a cheaper one in all rados interfaces.
>

From some preliminary tests, I've noted that BlueStore is far
quicker doing millions of random small-file IOs compared to FileStore.
But this is only with around 1/25th of the data we are holding.

So having an index pool is the only way to get faster lookup speeds?
I don't think having one really fits my use case; with billions of
objects being held, I don't think maintaining such an index would be
any quicker than what rados_stat() is capable of achieving already.

In any case, these clients maintain and validate the data that's
stored, so they would inherently assume that any index is wrong.

--
Iain Buclaw
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Fwd: Error

2016-09-21 Thread Rens Vermeulen
Dear ceph-users,

I'm preparing a Ceph cluster on my debian machines. I successfully walked
through the Preflight installation guide on the website.
Now I'm stuck at the STORAGE CLUSTER QUICK START, when I enter the
following command:

ceph-deploy mon create-initial


I get the following errors:


[cephsrv1][ERROR ] "ceph auth get-or-create for keytype admin returned 1

[ceph_deploy.gatherkeys][ERROR ] Failed to connect to host:cephsrv1
[ceph_deploy.gatherkeys][INFO  ] Destroy temp directory /tmp/tmpK5yQHi
[ceph_deploy][ERROR ] RuntimeError: Failed to connect any mon

I already changed line 78 from allow * --> allow in the gatherkeys.py file
under /usr/lib/python2.7/dist-packages/ceph_deploy/ (on the admin-node).

source: http://tracker.ceph.com/issues/16443

Best regards,

Thomas
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Faulting MDS clients, HEALTH_OK

2016-09-21 Thread Heller, Chris
I’ll see if I can capture the output the next time this issue arises, but in
general the output looks as if nothing is wrong. No OSDs are down, a ‘ceph
health detail’ results in HEALTH_OK, and the mds server is in the up:active state;
in general it’s as if nothing is wrong server side (at least from the summary).

-Chris

On 9/21/16, 10:46 AM, "Gregory Farnum"  wrote:

On Wed, Sep 21, 2016 at 6:30 AM, Heller, Chris  wrote:
> I’m running a production 0.94.7 Ceph cluster, and have been seeing a
> periodic issue arise where in all my MDS clients will become stuck, and 
the
> fix so far has been to restart the active MDS (sometimes I need to restart
> the subsequent active MDS as well).
>
>
>
> These clients are using the cephfs-hadoop API, so there is no kernel 
client,
> or fuse api involved. When I see clients get stuck, there are messages
> printed to stderr like the following:
>
>
>
> 2016-09-21 10:31:12.285030 7fea4c7fb700  0 – 192.168.1.241:0/1606648601 >>
> 192.168.1.195:6801/1674 pipe(0x7feaa0a1e0f0 sd=206 :0 s=1 pgs=0 cs=0 l=0
> c=0x7feaa0a0c500).fault
>
>
>
> I’m at somewhat of a loss on where to begin debugging this issue, and 
wanted
> to ping the list for ideas.

What's the full output of "ceph -s" when this happens? Have you looked
at the MDS' admin socket's ops-in-flight, and that of the clients?

http://docs.ceph.com/docs/master/cephfs/troubleshooting/ may help some as 
well.

>
>
>
> I managed to dump the mds cache during one of the stalled moments, which
> hopefully is a useful starting point:
>
>
>
> e51bed37327a676e9974d740a13e173f11d1a11fdba5fbcf963b62023b06d7e8
> mdscachedump.txt.gz (https://filetea.me/t1sz3XPHxEVThOk8tvVTK5Bsg)
>
>
>
>
>
> -Chris
>
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Faulting MDS clients, HEALTH_OK

2016-09-21 Thread Gregory Farnum
On Wed, Sep 21, 2016 at 6:30 AM, Heller, Chris  wrote:
> I’m running a production 0.94.7 Ceph cluster, and have been seeing a
> periodic issue arise where in all my MDS clients will become stuck, and the
> fix so far has been to restart the active MDS (sometimes I need to restart
> the subsequent active MDS as well).
>
>
>
> These clients are using the cephfs-hadoop API, so there is no kernel client,
> or fuse api involved. When I see clients get stuck, there are messages
> printed to stderr like the following:
>
>
>
> 2016-09-21 10:31:12.285030 7fea4c7fb700  0 – 192.168.1.241:0/1606648601 >>
> 192.168.1.195:6801/1674 pipe(0x7feaa0a1e0f0 sd=206 :0 s=1 pgs=0 cs=0 l=0
> c=0x7feaa0a0c500).fault
>
>
>
> I’m at somewhat of a loss on where to begin debugging this issue, and wanted
> to ping the list for ideas.

What's the full output of "ceph -s" when this happens? Have you looked
at the MDS' admin socket's ops-in-flight, and that of the clients?

http://docs.ceph.com/docs/master/cephfs/troubleshooting/ may help some as well.

>
>
>
> I managed to dump the mds cache during one of the stalled moments, which
> hopefully is a useful starting point:
>
>
>
> e51bed37327a676e9974d740a13e173f11d1a11fdba5fbcf963b62023b06d7e8
> mdscachedump.txt.gz (https://filetea.me/t1sz3XPHxEVThOk8tvVTK5Bsg)
>
>
>
>
>
> -Chris
>
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Help on RGW NFS function

2016-09-21 Thread Matt Benjamin
Hi,

- Original Message -
> From: "yiming xie" 
> To: ceph-users@lists.ceph.com
> Sent: Wednesday, September 21, 2016 3:53:35 AM
> Subject: [ceph-users] Help on RGW NFS function
> 
> Hi,
> I have some question about rgw nfs.
> 
> ceph release notes: You can now access radosgw buckets via NFS
> (experimental).
> In addition to the sentence, ceph documents does not do any explanation
> I don't understand the experimental implications.
> 
> 1. RGW nfs functional integrity of it? If nfs function is not complete, which
> features missing?

NFSv4 only initially (but NFS3 support was just added on master).  The I/O model is
simplified.  Objects in RGW cannot be mutated in place, and the NFS client
always overwrites.  Clients are currently expected to write sequentially from
offset 0 -- on Linux, you should mount with -o sync.
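
(E.g. roughly:

mount -t nfs -o nfsvers=4,sync <ganesha-host>:/ /mnt/rgw

-- host, export path and the exact option list are placeholders and depend on
how your nfs-ganesha export is configured; the important part is sync.)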

The RGW/S3 namespace is an emulation of a posix one using substring search, so
we impose some limitations.  One is that you cannot move directories.  There will
likely be published limits on bucket/object listing.

Some bugfixes are still in backport to Jewel.  That release supports NFSv4 and 
not NFS3.

> 2. How stable is the RGW nfs?

Some features are still being backported to Jewel.  I've submitted one
important bugfix on master this week.  We are aiming for "general usability"
over the next 1-2 months (NFSv4).

> 3. RGW nfs latest version can be used in a production environment yet?

If you're conservative, it's probably not "ready."  Now would be a good time to 
experiment with the feature and see whether it is potentially useful to you.

Matt

> 
> Please reply to my question as soon as possible. Very grateful, thank you!
> 
> plato.xie
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 

-- 
Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-707-0660
fax.  734-769-8938
cel.  734-216-5309
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Faulting MDS clients, HEALTH_OK

2016-09-21 Thread Heller, Chris
I’m running a production 0.94.7 Ceph cluster, and have been seeing a periodic
issue arise wherein all my MDS clients will become stuck, and the fix so far
has been to restart the active MDS (sometimes I need to restart the subsequent
active MDS as well).

These clients are using the cephfs-hadoop API, so there is no kernel client, or 
fuse api involved. When I see clients get stuck, there are messages printed to 
stderr like the following:

2016-09-21 10:31:12.285030 7fea4c7fb700  0 -- 192.168.1.241:0/1606648601 >> 
192.168.1.195:6801/1674 pipe(0x7feaa0a1e0f0 sd=206 :0 s=1 pgs=0 cs=0 l=0 
c=0x7feaa0a0c500).fault

I’m at somewhat of a loss on where to begin debugging this issue, and wanted to 
ping the list for ideas.

I managed to dump the mds cache during one of the stalled moments, which 
hopefully is a useful starting point:

e51bed37327a676e9974d740a13e173f11d1a11fdba5fbcf963b62023b06d7e8  
mdscachedump.txt.gz (https://filetea.me/t1sz3XPHxEVThOk8tvVTK5Bsg)


-Chris

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] cache tier on rgw index pool

2016-09-21 Thread Samuel Just
I seriously doubt that it's ever going to be a winning strategy to let
rgw index objects go to a cold tier.  Some practical problems:
1) We don't track omap size (the leveldb entries for an object)
because it would turn writes into rmw's -- so they always show up as 0
size.  Thus, the target_max_bytes param is going to be useless.
2) You can't store omap objects on an ec pool at all, so if the base
pool is an ec pool, nothing will ever be demoted.
3) We always promote whole objects.

As to point 2., I'm guessing that Greg meant that OSDs don't care
about each other's leveldb instances *directly* since leveldb itself
is behind two layers of interfaces (one osd might have bluestore using
rocksdb, while the other might have filestore with some other
key-value db entirely).  Of course, replication -- certainly including
the omap entries -- still happens, but at the object level rather than
at the key-value db level.
-Sam

On Wed, Sep 21, 2016 at 5:43 AM, Abhishek Varshney
 wrote:
> Hi,
>
> I am evaluating on setting up a cache tier for the rgw index pool and
> have a few questions regarding that. The rgw index pool is different
> as it completely stores the data in leveldb. The 'rados df' command on
> the existing index pool shows size in KB as 0 on a 1 PB cluster with
> 500 million objects running ceph 0.94.2.
>
> Seeking clarifications on the following points:
>
> 1. How are the cache tier parameters like target_max_bytes,
> cache_target_dirty_ratio and cache_target_full_ratio honoured given
> the size of index pool is shown as 0 and how does flush/eviction take
> place in this case? Is there any specific reason why the omap data is
> not reflected in the size, as Sage mentions it here [1]
>
> 2. I found a mail archive in ceph-devel where Greg mentions that
> "there's no cross-OSD LevelDB replication or communication" [2]. In
> that case,  how does ceph handle re-balancing of leveldb instance data
> in case of node failure?
>
> 3. Are there any surprises that can be expected on deploying a cache
> tier for rgw index pool ?
>
> [1] http://www.spinics.net/lists/ceph-devel/msg28635.html
> [2] http://www.spinics.net/lists/ceph-devel/msg24990.html
>
> Thanks
> Abhishek Varshney
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Same pg scrubbed over and over (Jewel)

2016-09-21 Thread Samuel Just
Ah, same question then.  If we can get logging on the primary for one
of those pgs, it should be fairly obvious.
-Sam

On Wed, Sep 21, 2016 at 4:08 AM, Pavan Rallabhandi
 wrote:
> We find this as well in our fresh built Jewel clusters, and seems to happen 
> only with a handful of PGs from couple of pools.
>
> Thanks!
>
> On 9/21/16, 3:14 PM, "ceph-users on behalf of Tobias Böhm" 
>  wrote:
>
> Hi,
>
> there is an open bug in the tracker: http://tracker.ceph.com/issues/16474
>
> It also suggests restarting OSDs as a workaround. We faced the same issue 
> after increasing the number of PGs in our cluster and restarting OSDs solved 
> it as well.
>
> Tobias
>
> > Am 21.09.2016 um 11:26 schrieb Dan van der Ster :
> >
> > There was a thread about this a few days ago:
> > 
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-September/012857.html
> > And the OP found a workaround.
> > Looks like a bug though... (by default PGs scrub at most once per day).
> >
> > -- dan
> >
> >
> >
> > On Tue, Sep 20, 2016 at 10:43 PM, Martin Bureau  
> wrote:
> >> Hello,
> >>
> >>
> >> I noticed that the same pg gets scrubbed repeatedly on our new Jewel
> >> cluster:
> >>
> >>
> >> Here's an excerpt from log:
> >>
> >>
> >> 2016-09-20 20:36:31.236123 osd.12 10.1.82.82:6820/14316 150514 : 
> cluster
> >> [INF] 25.3f scrub ok
> >> 2016-09-20 20:36:32.232918 osd.12 10.1.82.82:6820/14316 150515 : 
> cluster
> >> [INF] 25.3f scrub starts
> >> 2016-09-20 20:36:32.236876 osd.12 10.1.82.82:6820/14316 150516 : 
> cluster
> >> [INF] 25.3f scrub ok
> >> 2016-09-20 20:36:33.233268 osd.12 10.1.82.82:6820/14316 150517 : 
> cluster
> >> [INF] 25.3f deep-scrub starts
> >> 2016-09-20 20:36:33.242258 osd.12 10.1.82.82:6820/14316 150518 : 
> cluster
> >> [INF] 25.3f deep-scrub ok
> >> 2016-09-20 20:36:36.233604 osd.12 10.1.82.82:6820/14316 150519 : 
> cluster
> >> [INF] 25.3f scrub starts
> >> 2016-09-20 20:36:36.237221 osd.12 10.1.82.82:6820/14316 150520 : 
> cluster
> >> [INF] 25.3f scrub ok
> >> 2016-09-20 20:36:41.234490 osd.12 10.1.82.82:6820/14316 150521 : 
> cluster
> >> [INF] 25.3f deep-scrub starts
> >> 2016-09-20 20:36:41.243720 osd.12 10.1.82.82:6820/14316 150522 : 
> cluster
> >> [INF] 25.3f deep-scrub ok
> >> 2016-09-20 20:36:45.235128 osd.12 10.1.82.82:6820/14316 150523 : 
> cluster
> >> [INF] 25.3f deep-scrub starts
> >> 2016-09-20 20:36:45.352589 osd.12 10.1.82.82:6820/14316 150524 : 
> cluster
> >> [INF] 25.3f deep-scrub ok
> >> 2016-09-20 20:36:47.235310 osd.12 10.1.82.82:6820/14316 150525 : 
> cluster
> >> [INF] 25.3f scrub starts
> >> 2016-09-20 20:36:47.239348 osd.12 10.1.82.82:6820/14316 150526 : 
> cluster
> >> [INF] 25.3f scrub ok
> >> 2016-09-20 20:36:49.235538 osd.12 10.1.82.82:6820/14316 150527 : 
> cluster
> >> [INF] 25.3f deep-scrub starts
> >> 2016-09-20 20:36:49.243121 osd.12 10.1.82.82:6820/14316 150528 : 
> cluster
> >> [INF] 25.3f deep-scrub ok
> >> 2016-09-20 20:36:51.235956 osd.12 10.1.82.82:6820/14316 150529 : 
> cluster
> >> [INF] 25.3f deep-scrub starts
> >> 2016-09-20 20:36:51.244201 osd.12 10.1.82.82:6820/14316 150530 : 
> cluster
> >> [INF] 25.3f deep-scrub ok
> >> 2016-09-20 20:36:52.236076 osd.12 10.1.82.82:6820/14316 150531 : 
> cluster
> >> [INF] 25.3f scrub starts
> >> 2016-09-20 20:36:52.239376 osd.12 10.1.82.82:6820/14316 150532 : 
> cluster
> >> [INF] 25.3f scrub ok
> >> 2016-09-20 20:36:56.236740 osd.12 10.1.82.82:6820/14316 150533 : 
> cluster
> >> [INF] 25.3f scrub starts
> >>
> >>
> >> How can I troubleshoot / resolve this ?
> >>
> >>
> >> Regards,
> >>
> >> Martin
> >>
> >>
> >>
> >> ___
> >> ceph-users mailing list
> >> ceph-users@lists.ceph.com
> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >>
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] swiftclient call radosgw, it always response 401 Unauthorized

2016-09-21 Thread Radoslaw Zarzynski
Hi,

Responded inline.

On Wed, Sep 21, 2016 at 4:54 AM, Brian Chang-Chien
 wrote:
>
>
> [global]
> ...
> debug rgw = 20
> [client.radosgw.gateway]
> host = brianceph
> rgw keystone url = http://10.62.13.253:35357
> rgw keystone admin token = 7bb8e26cbc714c47a26ffec3d96f246f
> rgw keystone accepted roles = admin, swiftuser, user, _member_, Member
> rgw keystone token cache size = 500
> rgw keystone revocation interval = 60
> rgw keystone make new tenants = true
> rgw s3 auth use keystone = true
> nss db path = /var/ceph/nss

The debug_rgw=20 has been put into the global section.
I bet that's the sole reason why this particular RadosGW
instance sees it.

> and I still some config problem
>
> Q3 : when i edit /etc/ceph/ceph.conf , if my hostname is brianceph
> the radosgw term in ceph.conf should be  [client.radosgw.gateway] or 
> [client.radosgw.brianceph]
> which one is correct?
>
> PS: when i create radosgw, i call th cmd " ceph-deploy rgw create brianceph"

Most likely your current section naming is wrong. I haven't
poked at ceph-deploy too much, but I would say it should
be [client.rgw.brianceph] or [client.radosgw.brianceph].
I don't have any cluster alive right now to disambiguate, sorry.

> Q4: when i finish to edit ceph.conf, i need restart radosgw service or 
> restart ceph service
>  in this case, i use ceph jewel, so which cmd need to call " systemctl 
> restart ceph-radosgw.target " or  "systemctl ceph.target"

Take a look on that:
http://docs.ceph.com/docs/jewel/install/install-ceph-gateway/

> Q5: when i use ceph-deploy new brianceph, ceph will generate a ceph.cof, what 
> kind edit ceph.conf to create rgw is prefer
>
> Method1: i direct edit ceph.conf from ceph geerated, and use ceph-deploy 
> --overwrite-conf rgw create brianceph
>
> Method2(i used in the case) : first i call ceph-deploy rgw create brianceph, 
> and then  edit ceph.conf in /etc/ceph/ folder , then call systemctl restart 
> ceph-radosgw.target
>
>
> two methods i find some different issues,
> in Method1, the radosgw item of the ceph.conf in /etc/ceph/  like "rgw 
> keystone url" will convert rgw_keystone_url ,

Spaces and underscores in configurables' names are treated
in the same way. No difference.

Regards,
Radoslaw Zarzynski
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Same pg scrubbed over and over (Jewel)

2016-09-21 Thread Pavan Rallabhandi
We find this as well in our freshly built Jewel clusters, and it seems to happen
only with a handful of PGs from a couple of pools.

Thanks!

On 9/21/16, 3:14 PM, "ceph-users on behalf of Tobias Böhm" 
 wrote:

Hi,

there is an open bug in the tracker: http://tracker.ceph.com/issues/16474

It also suggests restarting OSDs as a workaround. We faced the same issue 
after increasing the number of PGs in our cluster and restarting OSDs solved it 
as well.

Tobias

> Am 21.09.2016 um 11:26 schrieb Dan van der Ster :
> 
> There was a thread about this a few days ago:
> 
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-September/012857.html
> And the OP found a workaround.
> Looks like a bug though... (by default PGs scrub at most once per day).
> 
> -- dan
> 
> 
> 
> On Tue, Sep 20, 2016 at 10:43 PM, Martin Bureau  
wrote:
>> Hello,
>> 
>> 
>> I noticed that the same pg gets scrubbed repeatedly on our new Jewel
>> cluster:
>> 
>> 
>> Here's an excerpt from log:
>> 
>> 
>> 2016-09-20 20:36:31.236123 osd.12 10.1.82.82:6820/14316 150514 : cluster
>> [INF] 25.3f scrub ok
>> 2016-09-20 20:36:32.232918 osd.12 10.1.82.82:6820/14316 150515 : cluster
>> [INF] 25.3f scrub starts
>> 2016-09-20 20:36:32.236876 osd.12 10.1.82.82:6820/14316 150516 : cluster
>> [INF] 25.3f scrub ok
>> 2016-09-20 20:36:33.233268 osd.12 10.1.82.82:6820/14316 150517 : cluster
>> [INF] 25.3f deep-scrub starts
>> 2016-09-20 20:36:33.242258 osd.12 10.1.82.82:6820/14316 150518 : cluster
>> [INF] 25.3f deep-scrub ok
>> 2016-09-20 20:36:36.233604 osd.12 10.1.82.82:6820/14316 150519 : cluster
>> [INF] 25.3f scrub starts
>> 2016-09-20 20:36:36.237221 osd.12 10.1.82.82:6820/14316 150520 : cluster
>> [INF] 25.3f scrub ok
>> 2016-09-20 20:36:41.234490 osd.12 10.1.82.82:6820/14316 150521 : cluster
>> [INF] 25.3f deep-scrub starts
>> 2016-09-20 20:36:41.243720 osd.12 10.1.82.82:6820/14316 150522 : cluster
>> [INF] 25.3f deep-scrub ok
>> 2016-09-20 20:36:45.235128 osd.12 10.1.82.82:6820/14316 150523 : cluster
>> [INF] 25.3f deep-scrub starts
>> 2016-09-20 20:36:45.352589 osd.12 10.1.82.82:6820/14316 150524 : cluster
>> [INF] 25.3f deep-scrub ok
>> 2016-09-20 20:36:47.235310 osd.12 10.1.82.82:6820/14316 150525 : cluster
>> [INF] 25.3f scrub starts
>> 2016-09-20 20:36:47.239348 osd.12 10.1.82.82:6820/14316 150526 : cluster
>> [INF] 25.3f scrub ok
>> 2016-09-20 20:36:49.235538 osd.12 10.1.82.82:6820/14316 150527 : cluster
>> [INF] 25.3f deep-scrub starts
>> 2016-09-20 20:36:49.243121 osd.12 10.1.82.82:6820/14316 150528 : cluster
>> [INF] 25.3f deep-scrub ok
>> 2016-09-20 20:36:51.235956 osd.12 10.1.82.82:6820/14316 150529 : cluster
>> [INF] 25.3f deep-scrub starts
>> 2016-09-20 20:36:51.244201 osd.12 10.1.82.82:6820/14316 150530 : cluster
>> [INF] 25.3f deep-scrub ok
>> 2016-09-20 20:36:52.236076 osd.12 10.1.82.82:6820/14316 150531 : cluster
>> [INF] 25.3f scrub starts
>> 2016-09-20 20:36:52.239376 osd.12 10.1.82.82:6820/14316 150532 : cluster
>> [INF] 25.3f scrub ok
>> 2016-09-20 20:36:56.236740 osd.12 10.1.82.82:6820/14316 150533 : cluster
>> [INF] 25.3f scrub starts
>> 
>> 
>> How can I troubleshoot / resolve this ?
>> 
>> 
>> Regards,
>> 
>> Martin
>> 
>> 
>> 
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] crash of osd using cephfs jewel 10.2.2, and corruption

2016-09-21 Thread Samuel Just
Looks like the OSD didn't like an error return it got from the
underlying fs.  Can you reproduce with

debug filestore = 20
debug osd = 20
debug ms = 1

on the osd and post the whole log?
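
(Putting those in the [osd] section of that node's ceph.conf and restarting the
osd is the simplest way to get them applied before you reproduce.)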
-Sam

On Wed, Sep 21, 2016 at 12:10 AM, Peter Maloney
 wrote:
> Hi,
>
> I created a one disk osd with data and separate journal on the same lvm
> volume group just for test, one mon, one mds on my desktop.
>
> I managed to crash the osd just by mounting cephfs and doing cp -a of
> the linux-stable git tree into it. It crashed after copying 2.1G which
> only covers some of the .git dir and none of the rest. And then when I
> killed ceph-mds and restarted the osd and mds, ceph -s said something
> about the pgs being stuck or unclean or something, and the computer
> froze. :/ After booting again, everything is fine, and the problem was
> reproducable the same way...just copying the files again.[but after
> writing this mail, I can't seem to cause it as easily again... copying
> again works, but sha1sum doesn't, even if I drop caches]
>
> Also reading seems to do the same.
>
> And then I tried adding a 2nd osd (also from vlm, with osd and journal
> on same volume group). And that seemed to stop the crashing, but not
> sure about corruption.I guess the corruption was on the cephfs but RAM
> had good copies or something, so rebooting, etc. is what made the
> corruption appear? (I tried to reproduce, but couldn't...didn't try
> killing daemons)
>
>> root@client:/mnt/test # ls -l
>> total 447
>> drwx-- 1 root root  4 2016-09-20 20:37 1/
>> drwx-- 1 root root  4 2016-09-20 20:37 2/
>> drwx-- 1 root root  4 2016-09-20 20:37 linux-stable/
>> -rw-r--r-- 1 root root 457480 2016-09-20 21:38 sums.txt
>> root@client:/mnt/test # (cd linux-stable/; sha1sum -c --quiet
>> ../sums.txt )
> (osd crashed before that finished ... and then impressively, starting
> the osd again made the command finish gracefully... and then tried rsync
> to finish copying and 6 or so restarts later it finished with just the 1
> rsync run)
>
> And then the checksums didn't match... (corruption)
>> root@client:/mnt/test # (cd linux-stable/; sha1sum -c --quiet
>> ../sums.txt )
>> ./.git/objects/e6/635671beff26a417c02d50adeefa2a6897a9dd: FAILED
>> ./.git/objects/e6/d58d90213a4a283d428988a398281663dd68e4: FAILED
>> ./.git/objects/81/281381965b21d3c23b2f877e214c4af65d6fa4: FAILED
>> ./.git/objects/4c/f549c4a9b23638ab49cc0f8b47c395b1fc8ede: FAILED
>> sha1sum: WARNING: 4 computed checksums did NOT match
>
>> root@client:/mnt/test # hexdump -C
>> linux-stable/.git/objects/e6/635671beff26a417c02d50adeefa2a6897a9dd
>>   00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
>> ||
>> *
>> 0dd0  00 00 00  |...|
>> 0dd3
>> peter@peter:~/projects $ hexdump -C
>> linux-stable/.git/objects/e6/635671beff26a417c02d50adeefa2a6897a9dd | head
>>   78 01 bd 5b 7d 6f db c6  19 df bf d4 a7 38 20 40
>> |x..[}o...8 @|
>> 0010  27 0b 8e ec 14 2b 06 24  5d 90 34 75 5c 63 4e 6c
>> |'+.$].4u\cNl|
>> 0020  d8 f1 82 62 19 08 9a 3a  59 ac 29 52 25 29 bb 5e
>> |...b...:Y.)R%).^|
>> 0030  9a ef be df f3 dc 1d 79  c7 77 39 c1 84 c0 12 79
>> |...y.w9y|
>> 0040  77 cf fb 3b 99 eb 38 bd  16 cf 7e 38 fc e1 f0 2f
>> |w..;..8...~8.../|
>> 0050  07 b3 89 98 89 fc 21 5f  e6 f3 95 78 2a 16 72 19
>> |..!_...x*.r.|
>> 0060  25 51 11 a5 49 2e 96 69  26 8a 95 c4 bd bb 28 c4
>> |%Q..I..i&.(.|
>> 0070  57 16 dd c9 4c 2c a3 58  62 7f 21 d7 38 49 87 df
>> |W...L,.Xb.!.8I..|
>> 0080  a4 9b 87 2c ba 59 15 62  1a ee 89 ef 0f 0f 9f ed
>> |...,.Y.b|
>> 0090  e3 cf f7 e2 3c 28 b2 28  bc 15 ef d2 70 25 e3 d6
>> |<(.(p%..|
> and then copying that and testing checksums again has even more failures
>
>> root@client:/mnt/test # cp -a linux-stable 3
>> root@client:/mnt/test # (cd 3/; sha1sum -c --quiet ../sums.txt )
>> ./net/iucv/iucv.c: FAILED
>> ./net/kcm/kcmsock.c: FAILED
>> ./net/irda/ircomm/ircomm_event.c: FAILED
>> ./net/irda/ircomm/ircomm_tty_attach.c: FAILED
>> ./net/llc/Makefile: FAILED
>> ./net/llc/Kconfig: FAILED
>> ./net/llc/af_llc.c: FAILED
>> ./net/lapb/lapb_timer.c: FAILED
>> ./net/lapb/lapb_subr.c: FAILED
>> ./net/lapb/lapb_iface.c: FAILED
>> ./net/lapb/Makefile: FAILED
>> ./net/lapb/lapb_in.c: FAILED
>> ./net/lapb/lapb_out.c: FAILED
>> ./net/l2tp/l2tp_eth.c: FAILED
>> ./net/l2tp/Kconfig: FAILED
>> ./net/l2tp/l2tp_core.h: FAILED
>> ./.git/objects/e6/635671beff26a417c02d50adeefa2a6897a9dd: FAILED
>> ./.git/objects/e6/d58d90213a4a283d428988a398281663dd68e4: FAILED
>> ./.git/objects/81/281381965b21d3c23b2f877e214c4af65d6fa4: FAILED
>> ./.git/objects/4c/f549c4a9b23638ab49cc0f8b47c395b1fc8ede: FAILED
>> sha1sum: WARNING: 20 computed checksums did NOT match
>> root@client:/mnt/test # hexdump -C 3/net/iucv/iucv.c
>>   00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
>> ||
>> *
>> d420  

Re: [ceph-users] Same pg scrubbed over and over (Jewel)

2016-09-21 Thread Samuel Just
Can you reproduce with logging on the primary for that pg?

debug osd = 20
debug filestore = 20
debug ms = 1

Since restarting the osd may be a workaround, can you inject the debug
values without restarting the daemon?
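
For the log excerpt below that would be something like the following -- substitute
whichever osd is primary for the affected pg in your cluster:

ceph tell osd.12 injectargs '--debug_osd 20 --debug_filestore 20 --debug_ms 1'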
-Sam

On Wed, Sep 21, 2016 at 2:44 AM, Tobias Böhm  wrote:
> Hi,
>
> there is an open bug in the tracker: http://tracker.ceph.com/issues/16474
>
> It also suggests restarting OSDs as a workaround. We faced the same issue 
> after increasing the number of PGs in our cluster and restarting OSDs solved 
> it as well.
>
> Tobias
>
>> Am 21.09.2016 um 11:26 schrieb Dan van der Ster :
>>
>> There was a thread about this a few days ago:
>> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-September/012857.html
>> And the OP found a workaround.
>> Looks like a bug though... (by default PGs scrub at most once per day).
>>
>> -- dan
>>
>>
>>
>> On Tue, Sep 20, 2016 at 10:43 PM, Martin Bureau  wrote:
>>> Hello,
>>>
>>>
>>> I noticed that the same pg gets scrubbed repeatedly on our new Jewel
>>> cluster:
>>>
>>>
>>> Here's an excerpt from log:
>>>
>>>
>>> 2016-09-20 20:36:31.236123 osd.12 10.1.82.82:6820/14316 150514 : cluster
>>> [INF] 25.3f scrub ok
>>> 2016-09-20 20:36:32.232918 osd.12 10.1.82.82:6820/14316 150515 : cluster
>>> [INF] 25.3f scrub starts
>>> 2016-09-20 20:36:32.236876 osd.12 10.1.82.82:6820/14316 150516 : cluster
>>> [INF] 25.3f scrub ok
>>> 2016-09-20 20:36:33.233268 osd.12 10.1.82.82:6820/14316 150517 : cluster
>>> [INF] 25.3f deep-scrub starts
>>> 2016-09-20 20:36:33.242258 osd.12 10.1.82.82:6820/14316 150518 : cluster
>>> [INF] 25.3f deep-scrub ok
>>> 2016-09-20 20:36:36.233604 osd.12 10.1.82.82:6820/14316 150519 : cluster
>>> [INF] 25.3f scrub starts
>>> 2016-09-20 20:36:36.237221 osd.12 10.1.82.82:6820/14316 150520 : cluster
>>> [INF] 25.3f scrub ok
>>> 2016-09-20 20:36:41.234490 osd.12 10.1.82.82:6820/14316 150521 : cluster
>>> [INF] 25.3f deep-scrub starts
>>> 2016-09-20 20:36:41.243720 osd.12 10.1.82.82:6820/14316 150522 : cluster
>>> [INF] 25.3f deep-scrub ok
>>> 2016-09-20 20:36:45.235128 osd.12 10.1.82.82:6820/14316 150523 : cluster
>>> [INF] 25.3f deep-scrub starts
>>> 2016-09-20 20:36:45.352589 osd.12 10.1.82.82:6820/14316 150524 : cluster
>>> [INF] 25.3f deep-scrub ok
>>> 2016-09-20 20:36:47.235310 osd.12 10.1.82.82:6820/14316 150525 : cluster
>>> [INF] 25.3f scrub starts
>>> 2016-09-20 20:36:47.239348 osd.12 10.1.82.82:6820/14316 150526 : cluster
>>> [INF] 25.3f scrub ok
>>> 2016-09-20 20:36:49.235538 osd.12 10.1.82.82:6820/14316 150527 : cluster
>>> [INF] 25.3f deep-scrub starts
>>> 2016-09-20 20:36:49.243121 osd.12 10.1.82.82:6820/14316 150528 : cluster
>>> [INF] 25.3f deep-scrub ok
>>> 2016-09-20 20:36:51.235956 osd.12 10.1.82.82:6820/14316 150529 : cluster
>>> [INF] 25.3f deep-scrub starts
>>> 2016-09-20 20:36:51.244201 osd.12 10.1.82.82:6820/14316 150530 : cluster
>>> [INF] 25.3f deep-scrub ok
>>> 2016-09-20 20:36:52.236076 osd.12 10.1.82.82:6820/14316 150531 : cluster
>>> [INF] 25.3f scrub starts
>>> 2016-09-20 20:36:52.239376 osd.12 10.1.82.82:6820/14316 150532 : cluster
>>> [INF] 25.3f scrub ok
>>> 2016-09-20 20:36:56.236740 osd.12 10.1.82.82:6820/14316 150533 : cluster
>>> [INF] 25.3f scrub starts
>>>
>>>
>>> How can I troubleshoot / resolve this ?
>>>
>>>
>>> Regards,
>>>
>>> Martin
>>>
>>>
>>>
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] swiftclient call radosgw, it always response 401 Unauthorized

2016-09-21 Thread Brian Chang-Chien
Hi Radoslaw Zarzynski

I also tested the RADOS gateway on Ceph Hammer 0.94.7.

The result is still a 401 response, so I am posting ceph.conf and radosgw.log
below.

ceph.conf


When I call the swift command, radosgw.log shows:


2016-09-21 17:47:12.141783 7fc7f4bf5700 10
content=MIIBzgYJKoZIhvcNAQcCoIIBvzCCAbsCAQExDTALBglghkgBZQMEAgEwHgYJKoZIhvcNAQcBoBEED3sicmV2b2tlZCI6IFtdfTGCAYUwggGBAgEBMFwwVzELMAkGA1UEBhMCVVMxDjAMBgNVBAgMBVVuc2V0MQ4wDAYDVQQHDAVVbnNldDEOMAwGA1UECgwFVW5zZXQxGDAWBgNVBAMMD3d3dy5leGFtcGxlLmNvbQIBATALBglghkgBZQMEAgEwDQYJKoZIhvcNAQEBBQAEggEAi6EwxSG2h4A9iatlWY258MpqV4NQIRzEUEJIEuEGtlkf9AGjMi9bvXgEsJdf3WXQMPSQOdQAiISo9dWI0cLz90C9gA5buseORLpF5PtdhSK29EtHCqBqlgAssS1ZvPAa3tl4QsqG6GGXpzYPHv8Ilkq5uuxWP3XxgJiBw3EBtgkI93CDidqK9dd49DoPKJDnKINAP2ku6lNwI24kC+KNc7ehPmbUH6vXGhgsGeBh/b49u0nUFzpshy5ehu3kF01vIvuo4FT9tMTmxQ4rn+LrdjLizgdAl2B/Qu/JlJ1J0vN/mFfbb/ClKerlTUJQOJqYbvZfPOw70ZCn2pRZCbyj9Q==
2016-09-21 17:47:12.142766 7fc7f4bf5700 10 ceph_decode_cms: decoded:
{"revoked": []}
2016-09-21 17:47:18.870391 7fc7f77fe700  2
RGWDataChangesLog::ChangesRenewThread: start
2016-09-21 17:47:40.870466 7fc7f77fe700  2
RGWDataChangesLog::ChangesRenewThread: start
2016-09-21 17:47:42.142875 7fc7f4bf5700  2 keystone revoke thread: start
2016-09-21 17:47:42.142925 7fc7f4bf5700 20 sending request to
http://10.62.14.192:35357/v2.0/tokens/revoked
2016-09-21 17:47:42.172758 7fc7f4bf5700 10 request returned {"signed":
"-BEGIN
CMS-\nMIIBzgYJKoZIhvcNAQcCoIIBvzCCAbsCAQExDTALBglghkgBZQMEAgEwHgYJKoZI\nhvcNAQcBoBEED3sicmV2b2tlZCI6IFtdfTGCAYUwggGBAgEBMFwwVzELMAkGA1UE\nBhMCVVMxDjAMBgNVBAgMBVVuc2V0MQ4wDAYDVQQHDAVVbnNldDEOMAwGA1UECgwF\nVW5zZXQxGDAWBgNVBAMMD3d3dy5leGFtcGxlLmNvbQIBATALBglghkgBZQMEAgEw\nDQYJKoZIhvcNAQEBBQAEggEAi6EwxSG2h4A9iatlWY258MpqV4NQIRzEUEJIEuEG\ntlkf9AGjMi9bvXgEsJdf3WXQMPSQOdQAiISo9dWI0cLz90C9gA5buseORLpF5Ptd\nhSK29EtHCqBqlgAssS1ZvPAa3tl4QsqG6GGXpzYPHv8Ilkq5uuxWP3XxgJiBw3EB\ntgkI93CDidqK9dd49DoPKJDnKINAP2ku6lNwI24kC+KNc7ehPmbUH6vXGhgsGeBh\n/b49u0nUFzpshy5ehu3kF01vIvuo4FT9tMTmxQ4rn+LrdjLizgdAl2B/Qu/JlJ1J\n0vN/mFfbb/ClKerlTUJQOJqYbvZfPOw70ZCn2pRZCbyj9Q==\n-END
CMS-\n"}
2016-09-21 17:47:42.172825 7fc7f4bf5700 10 signed=-BEGIN CMS-
MIIBzgYJKoZIhvcNAQcCoIIBvzCCAbsCAQExDTALBglghkgBZQMEAgEwHgYJKoZI
hvcNAQcBoBEED3sicmV2b2tlZCI6IFtdfTGCAYUwggGBAgEBMFwwVzELMAkGA1UE
BhMCVVMxDjAMBgNVBAgMBVVuc2V0MQ4wDAYDVQQHDAVVbnNldDEOMAwGA1UECgwF
VW5zZXQxGDAWBgNVBAMMD3d3dy5leGFtcGxlLmNvbQIBATALBglghkgBZQMEAgEw
DQYJKoZIhvcNAQEBBQAEggEAi6EwxSG2h4A9iatlWY258MpqV4NQIRzEUEJIEuEG
tlkf9AGjMi9bvXgEsJdf3WXQMPSQOdQAiISo9dWI0cLz90C9gA5buseORLpF5Ptd
hSK29EtHCqBqlgAssS1ZvPAa3tl4QsqG6GGXpzYPHv8Ilkq5uuxWP3XxgJiBw3EB
tgkI93CDidqK9dd49DoPKJDnKINAP2ku6lNwI24kC+KNc7ehPmbUH6vXGhgsGeBh
/b49u0nUFzpshy5ehu3kF01vIvuo4FT9tMTmxQ4rn+LrdjLizgdAl2B/Qu/JlJ1J
0vN/mFfbb/ClKerlTUJQOJqYbvZfPOw70ZCn2pRZCbyj9Q==
-END CMS-
2016-09-21 17:47:42.172871 7fc7f4bf5700 10
content=MIIBzgYJKoZIhvcNAQcCoIIBvzCCAbsCAQExDTALBglghkgBZQMEAgEwHgYJKoZIhvcNAQcBoBEED3sicmV2b2tlZCI6IFtdfTGCAYUwggGBAgEBMFwwVzELMAkGA1UEBhMCVVMxDjAMBgNVBAgMBVVuc2V0MQ4wDAYDVQQHDAVVbnNldDEOMAwGA1UECgwFVW5zZXQxGDAWBgNVBAMMD3d3dy5leGFtcGxlLmNvbQIBATALBglghkgBZQMEAgEwDQYJKoZIhvcNAQEBBQAEggEAi6EwxSG2h4A9iatlWY258MpqV4NQIRzEUEJIEuEGtlkf9AGjMi9bvXgEsJdf3WXQMPSQOdQAiISo9dWI0cLz90C9gA5buseORLpF5PtdhSK29EtHCqBqlgAssS1ZvPAa3tl4QsqG6GGXpzYPHv8Ilkq5uuxWP3XxgJiBw3EBtgkI93CDidqK9dd49DoPKJDnKINAP2ku6lNwI24kC+KNc7ehPmbUH6vXGhgsGeBh/b49u0nUFzpshy5ehu3kF01vIvuo4FT9tMTmxQ4rn+LrdjLizgdAl2B/Qu/JlJ1J0vN/mFfbb/ClKerlTUJQOJqYbvZfPOw70ZCn2pRZCbyj9Q==
2016-09-21 17:47:42.173848 7fc7f4bf5700 10 ceph_decode_cms: decoded:
{"revoked": []}
2016-09-21 17:48:02.870540 7fc7f77fe700  2
RGWDataChangesLog::ChangesRenewThread: start
2016-09-21 17:48:12.173957 7fc7f4bf5700  2 keystone revoke thread: start
2016-09-21 17:48:12.174004 7fc7f4bf5700 20 sending request to
http://10.62.14.192:35357/v2.0/tokens/revoked
2016-09-21 17:48:12.204581 7fc7f4bf5700 10 request returned {"signed":
"-BEGIN
CMS-\nMIIBzgYJKoZIhvcNAQcCoIIBvzCCAbsCAQExDTALBglghkgBZQMEAgEwHgYJKoZI\nhvcNAQcBoBEED3sicmV2b2tlZCI6IFtdfTGCAYUwggGBAgEBMFwwVzELMAkGA1UE\nBhMCVVMxDjAMBgNVBAgMBVVuc2V0MQ4wDAYDVQQHDAVVbnNldDEOMAwGA1UECgwF\nVW5zZXQxGDAWBgNVBAMMD3d3dy5leGFtcGxlLmNvbQIBATALBglghkgBZQMEAgEw\nDQYJKoZIhvcNAQEBBQAEggEAi6EwxSG2h4A9iatlWY258MpqV4NQIRzEUEJIEuEG\ntlkf9AGjMi9bvXgEsJdf3WXQMPSQOdQAiISo9dWI0cLz90C9gA5buseORLpF5Ptd\nhSK29EtHCqBqlgAssS1ZvPAa3tl4QsqG6GGXpzYPHv8Ilkq5uuxWP3XxgJiBw3EB\ntgkI93CDidqK9dd49DoPKJDnKINAP2ku6lNwI24kC+KNc7ehPmbUH6vXGhgsGeBh\n/b49u0nUFzpshy5ehu3kF01vIvuo4FT9tMTmxQ4rn+LrdjLizgdAl2B/Qu/JlJ1J\n0vN/mFfbb/ClKerlTUJQOJqYbvZfPOw70ZCn2pRZCbyj9Q==\n-END
CMS-\n"}
2016-09-21 17:48:12.204647 7fc7f4bf5700 10 signed=-BEGIN CMS-
MIIBzgYJKoZIhvcNAQcCoIIBvzCCAbsCAQExDTALBglghkgBZQMEAgEwHgYJKoZI
hvcNAQcBoBEED3sicmV2b2tlZCI6IFtdfTGCAYUwggGBAgEBMFwwVzELMAkGA1UE
BhMCVVMxDjAMBgNVBAgMBVVuc2V0MQ4wDAYDVQQHDAVVbnNldDEOMAwGA1UECgwF
VW5zZXQxGDAWBgNVBAMMD3d3dy5leGFtcGxlLmNvbQIBATALBglghkgBZQMEAgEw
DQYJKoZIhvcNAQEBBQAEggEAi6EwxSG2h4A9iatlWY258MpqV4NQIRzEUEJIEuEG

Re: [ceph-users] Same pg scrubbed over and over (Jewel)

2016-09-21 Thread Tobias Böhm
Hi,

there is an open bug in the tracker: http://tracker.ceph.com/issues/16474

It also suggests restarting OSDs as a workaround. We faced the same issue after 
increasing the number of PGs in our cluster and restarting OSDs solved it as 
well.
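
If you want to try the same workaround, restarting the primary OSD of the
affected pg should be enough. On a systemd-based Jewel install that would be
something like the following (osd.12 is just taken from your log excerpt,
adjust to the actual primary):

  ceph pg map 25.3f               # shows the up/acting set for the pg
  systemctl restart ceph-osd@12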

Tobias

> Am 21.09.2016 um 11:26 schrieb Dan van der Ster :
> 
> There was a thread about this a few days ago:
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-September/012857.html
> And the OP found a workaround.
> Looks like a bug though... (by default PGs scrub at most once per day).
> 
> -- dan
> 
> 
> 
> On Tue, Sep 20, 2016 at 10:43 PM, Martin Bureau  wrote:
>> Hello,
>> 
>> 
>> I noticed that the same pg gets scrubbed repeatedly on our new Jewel
>> cluster:
>> 
>> 
>> Here's an excerpt from log:
>> 
>> 
>> 2016-09-20 20:36:31.236123 osd.12 10.1.82.82:6820/14316 150514 : cluster
>> [INF] 25.3f scrub ok
>> 2016-09-20 20:36:32.232918 osd.12 10.1.82.82:6820/14316 150515 : cluster
>> [INF] 25.3f scrub starts
>> 2016-09-20 20:36:32.236876 osd.12 10.1.82.82:6820/14316 150516 : cluster
>> [INF] 25.3f scrub ok
>> 2016-09-20 20:36:33.233268 osd.12 10.1.82.82:6820/14316 150517 : cluster
>> [INF] 25.3f deep-scrub starts
>> 2016-09-20 20:36:33.242258 osd.12 10.1.82.82:6820/14316 150518 : cluster
>> [INF] 25.3f deep-scrub ok
>> 2016-09-20 20:36:36.233604 osd.12 10.1.82.82:6820/14316 150519 : cluster
>> [INF] 25.3f scrub starts
>> 2016-09-20 20:36:36.237221 osd.12 10.1.82.82:6820/14316 150520 : cluster
>> [INF] 25.3f scrub ok
>> 2016-09-20 20:36:41.234490 osd.12 10.1.82.82:6820/14316 150521 : cluster
>> [INF] 25.3f deep-scrub starts
>> 2016-09-20 20:36:41.243720 osd.12 10.1.82.82:6820/14316 150522 : cluster
>> [INF] 25.3f deep-scrub ok
>> 2016-09-20 20:36:45.235128 osd.12 10.1.82.82:6820/14316 150523 : cluster
>> [INF] 25.3f deep-scrub starts
>> 2016-09-20 20:36:45.352589 osd.12 10.1.82.82:6820/14316 150524 : cluster
>> [INF] 25.3f deep-scrub ok
>> 2016-09-20 20:36:47.235310 osd.12 10.1.82.82:6820/14316 150525 : cluster
>> [INF] 25.3f scrub starts
>> 2016-09-20 20:36:47.239348 osd.12 10.1.82.82:6820/14316 150526 : cluster
>> [INF] 25.3f scrub ok
>> 2016-09-20 20:36:49.235538 osd.12 10.1.82.82:6820/14316 150527 : cluster
>> [INF] 25.3f deep-scrub starts
>> 2016-09-20 20:36:49.243121 osd.12 10.1.82.82:6820/14316 150528 : cluster
>> [INF] 25.3f deep-scrub ok
>> 2016-09-20 20:36:51.235956 osd.12 10.1.82.82:6820/14316 150529 : cluster
>> [INF] 25.3f deep-scrub starts
>> 2016-09-20 20:36:51.244201 osd.12 10.1.82.82:6820/14316 150530 : cluster
>> [INF] 25.3f deep-scrub ok
>> 2016-09-20 20:36:52.236076 osd.12 10.1.82.82:6820/14316 150531 : cluster
>> [INF] 25.3f scrub starts
>> 2016-09-20 20:36:52.239376 osd.12 10.1.82.82:6820/14316 150532 : cluster
>> [INF] 25.3f scrub ok
>> 2016-09-20 20:36:56.236740 osd.12 10.1.82.82:6820/14316 150533 : cluster
>> [INF] 25.3f scrub starts
>> 
>> 
>> How can I troubleshoot / resolve this ?
>> 
>> 
>> Regards,
>> 
>> Martin
>> 
>> 
>> 
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Same pg scrubbed over and over (Jewel)

2016-09-21 Thread Dan van der Ster
There was a thread about this a few days ago:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-September/012857.html
And the OP found a workaround.
Looks like a bug though... (by default PGs scrub at most once per day).
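
(You can check which intervals that osd is actually running with via its
admin socket, e.g.:

  ceph daemon osd.12 config show | grep scrub

osd_scrub_min_interval defaults to one day, and osd_scrub_max_interval /
osd_deep_scrub_interval to one week, so back-to-back scrubs of the same pg
shouldn't happen with stock settings.)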

-- dan



On Tue, Sep 20, 2016 at 10:43 PM, Martin Bureau  wrote:
> Hello,
>
>
> I noticed that the same pg gets scrubbed repeatedly on our new Jewel
> cluster:
>
>
> Here's an excerpt from log:
>
>
> 2016-09-20 20:36:31.236123 osd.12 10.1.82.82:6820/14316 150514 : cluster
> [INF] 25.3f scrub ok
> 2016-09-20 20:36:32.232918 osd.12 10.1.82.82:6820/14316 150515 : cluster
> [INF] 25.3f scrub starts
> 2016-09-20 20:36:32.236876 osd.12 10.1.82.82:6820/14316 150516 : cluster
> [INF] 25.3f scrub ok
> 2016-09-20 20:36:33.233268 osd.12 10.1.82.82:6820/14316 150517 : cluster
> [INF] 25.3f deep-scrub starts
> 2016-09-20 20:36:33.242258 osd.12 10.1.82.82:6820/14316 150518 : cluster
> [INF] 25.3f deep-scrub ok
> 2016-09-20 20:36:36.233604 osd.12 10.1.82.82:6820/14316 150519 : cluster
> [INF] 25.3f scrub starts
> 2016-09-20 20:36:36.237221 osd.12 10.1.82.82:6820/14316 150520 : cluster
> [INF] 25.3f scrub ok
> 2016-09-20 20:36:41.234490 osd.12 10.1.82.82:6820/14316 150521 : cluster
> [INF] 25.3f deep-scrub starts
> 2016-09-20 20:36:41.243720 osd.12 10.1.82.82:6820/14316 150522 : cluster
> [INF] 25.3f deep-scrub ok
> 2016-09-20 20:36:45.235128 osd.12 10.1.82.82:6820/14316 150523 : cluster
> [INF] 25.3f deep-scrub starts
> 2016-09-20 20:36:45.352589 osd.12 10.1.82.82:6820/14316 150524 : cluster
> [INF] 25.3f deep-scrub ok
> 2016-09-20 20:36:47.235310 osd.12 10.1.82.82:6820/14316 150525 : cluster
> [INF] 25.3f scrub starts
> 2016-09-20 20:36:47.239348 osd.12 10.1.82.82:6820/14316 150526 : cluster
> [INF] 25.3f scrub ok
> 2016-09-20 20:36:49.235538 osd.12 10.1.82.82:6820/14316 150527 : cluster
> [INF] 25.3f deep-scrub starts
> 2016-09-20 20:36:49.243121 osd.12 10.1.82.82:6820/14316 150528 : cluster
> [INF] 25.3f deep-scrub ok
> 2016-09-20 20:36:51.235956 osd.12 10.1.82.82:6820/14316 150529 : cluster
> [INF] 25.3f deep-scrub starts
> 2016-09-20 20:36:51.244201 osd.12 10.1.82.82:6820/14316 150530 : cluster
> [INF] 25.3f deep-scrub ok
> 2016-09-20 20:36:52.236076 osd.12 10.1.82.82:6820/14316 150531 : cluster
> [INF] 25.3f scrub starts
> 2016-09-20 20:36:52.239376 osd.12 10.1.82.82:6820/14316 150532 : cluster
> [INF] 25.3f scrub ok
> 2016-09-20 20:36:56.236740 osd.12 10.1.82.82:6820/14316 150533 : cluster
> [INF] 25.3f scrub starts
>
>
> How can I troubleshoot / resolve this ?
>
>
> Regards,
>
> Martin
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Help on RGW NFS function

2016-09-21 Thread yiming xie
Hi,
I have some question about rgw nfs.

The Ceph release notes say: "You can now access radosgw buckets via NFS
(experimental)." Beyond that sentence, the Ceph documentation gives no further
explanation, and I don't understand what "experimental" implies here.

1. Is the RGW NFS functionality complete? If it is not complete, which
features are missing?
2. How stable is RGW NFS?
3. Can the latest version of RGW NFS be used in a production environment yet?

Please reply to my questions as soon as possible. Very grateful, thank you!


plato.xie
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] ceph pg stuck creating

2016-09-21 Thread Yuriy Karpel
Hi,

We have a Ceph cluster. To speed it up, we added SSD cards; I gathered the
SSDs into a separate pool and defined a dedicated CRUSH rule for them. But
I'm getting the following error:

[ceph@ceph-adm test]$ ceph -s
cluster a73f8be5-0fa6-4a82-aa9b-22121246398e
 health HEALTH_WARN
64 pgs stuck inactive
64 pgs stuck unclean
 monmap e1: 3 mons at {srv-osd01=10.30.15.20:6789/0,
srv-osd02=10.30.15.21:6789/0,srv-osd03=10.30.15.22:6789/0}
election epoch 76, quorum 0,1,2 srv-osd01,srv-osd02,srv-osd03
 osdmap e9500: 135 osds: 135 up, 135 in
flags sortbitwise
  pgmap v16447007: 4168 pgs, 3 pools, 5829 GB data, 1488 kobjects
17469 GB used, 270 TB / 287 TB avail
4104 active+clean
  64 creating
  client io 179 kB/s rd, 8781 kB/s wr, 177 op/s

If I change the ruleset value from 0 to 2, everything works.
Could this be due to the removal of osd.52?
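
For reference, I switch the ruleset on the pool with something like the
following ("ssdpool" here is a placeholder for the actual pool name):

  ceph osd crush rule ls
  ceph osd pool set ssdpool crush_ruleset 2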

ceph.conf
[global]
fsid = a73f8be5-0fa6-4a82-aa9b-22121246398e
mon_initial_members = srv-osd01, srv-osd02, srv-osd03
mon_host = 10.30.15.20,10.30.15.21,10.30.15.22
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
filestore_xattr_use_omap = true
public network = 10.30.15.0/24
cluster network = 10.40.10.0/24
osd_crush_update_on_start = false


My crushrule map:

[ceph@ceph-adm test]$ cat map.decompile
# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable straw_calc_version 1

# devices
device 0 osd.0
device 1 osd.1
device 2 osd.2
device 3 osd.3
device 4 osd.4
device 5 osd.5
device 6 osd.6
device 7 osd.7
device 8 osd.8
device 9 osd.9
device 10 osd.10
device 11 osd.11
device 12 osd.12
device 13 osd.13
device 14 osd.14
device 15 osd.15
device 16 osd.16
device 17 osd.17
device 18 osd.18
device 19 osd.19
device 20 osd.20
device 21 osd.21
device 22 osd.22
device 23 osd.23
device 24 osd.24
device 25 osd.25
device 26 osd.26
device 27 osd.27
device 28 osd.28
device 29 osd.29
device 30 osd.30
device 31 osd.31
device 32 osd.32
device 33 osd.33
device 34 osd.34
device 35 osd.35
device 36 osd.36
device 37 osd.37
device 38 osd.38
device 39 osd.39
device 40 osd.40
device 41 osd.41
device 42 osd.42
device 43 osd.43
device 44 osd.44
device 45 osd.45
device 46 osd.46
device 47 osd.47
device 48 osd.48
device 49 osd.49
device 50 osd.50
device 51 osd.51
device 52 device52
device 53 osd.53
device 54 osd.54
device 55 osd.55
device 56 osd.56
device 57 osd.57
device 58 osd.58
device 59 osd.59
device 60 osd.60
device 61 osd.61
device 62 osd.62
device 63 osd.63
device 64 osd.64
device 65 osd.65
device 66 osd.66
device 67 osd.67
device 68 osd.68
device 69 osd.69
device 70 osd.70
device 71 osd.71
device 72 osd.72
device 73 osd.73
device 74 osd.74
device 75 osd.75
device 76 osd.76
device 77 osd.77
device 78 osd.78
device 79 osd.79
device 80 osd.80
device 81 osd.81
device 82 osd.82
device 83 osd.83
device 84 osd.84
device 85 osd.85
device 86 osd.86
device 87 osd.87
device 88 osd.88
device 89 osd.89
device 90 osd.90
device 91 osd.91
device 92 osd.92
device 93 osd.93
device 94 osd.94
device 95 osd.95
device 96 osd.96
device 97 osd.97
device 98 osd.98
device 99 osd.99
device 100 osd.100
device 101 osd.101
device 102 osd.102
device 103 osd.103
device 104 osd.104
device 105 osd.105
device 106 osd.106
device 107 osd.107
device 108 osd.108
device 109 osd.109
device 110 osd.110
device 111 osd.111
device 112 osd.112
device 113 osd.113
device 114 osd.114
device 115 osd.115
device 116 osd.116
device 117 osd.117
device 118 osd.118
device 119 osd.119
device 120 osd.120
device 121 osd.121
device 122 osd.122
device 123 osd.123
device 124 osd.124
device 125 osd.125
device 126 osd.126
device 127 osd.127
device 128 osd.128
device 129 osd.129
device 130 osd.130
device 131 osd.131
device 132 osd.132
device 133 osd.133
device 134 osd.134
device 135 osd.135

# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 region
type 10 root

# buckets
host srv-osd01-ssd {
id -2   # do not change unnecessarily
# weight 0.854
alg straw
hash 0  # rjenkins1
item osd.0 weight 0.427
item osd.1 weight 0.427
}
host srv-osd02-ssd {
id -3   # do not change unnecessarily
# weight 0.854
alg straw
hash 0  # rjenkins1
item osd.2 weight 0.427
item osd.3 weight 0.427
}
host srv-osd03-ssd {
id -4   # do not change unnecessarily
# weight 0.854
alg straw
hash 0  # rjenkins1
item osd.4 weight 0.427
item osd.5 weight 0.427
}
host srv-osd04-ssd {
id -5   # do not change unnecessarily
# weight 0.854
alg straw
hash 0  # rjenkins1
item osd.6 weight 0.427
item osd.7 weight 0.427
}
host srv-osd05-ssd {
id -6   # do 

[ceph-users] crash of osd using cephfs jewel 10.2.2, and corruption

2016-09-21 Thread Peter Maloney
Hi,

I created a one-disk osd with the data and a separate journal on the same lvm
volume group, just for testing, plus one mon and one mds on my desktop.

I managed to crash the osd just by mounting cephfs and doing cp -a of
the linux-stable git tree into it. It crashed after copying 2.1G, which
only covers some of the .git dir and none of the rest. And then when I
killed ceph-mds and restarted the osd and mds, ceph -s said something
about the pgs being stuck or unclean, and the computer froze. :/ After
booting again, everything was fine, and the problem was reproducible the
same way... just copying the files again. [But after writing this mail, I
can't seem to cause it as easily again... copying again works, but sha1sum
doesn't, even if I drop caches.]

Also reading seems to do the same.

And then I tried adding a 2nd osd (also from lvm, with osd and journal on the
same volume group). That seemed to stop the crashing, but I'm not sure about
the corruption. I guess the corruption was on the cephfs but RAM had good
copies or something, so rebooting, etc. is what made the corruption appear?
(I tried to reproduce it, but couldn't... didn't try killing daemons.)
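
Roughly what I am doing to trigger it, in case someone wants to try (the mon
address and secret file path here are placeholders for my local setup):

  mount -t ceph 127.0.0.1:/ /mnt/test -o name=admin,secretfile=/etc/ceph/admin.secret
  (cd ~/projects/linux-stable && find . -type f -exec sha1sum {} +) > /mnt/test/sums.txt
  cp -a ~/projects/linux-stable /mnt/test/
  (cd /mnt/test/linux-stable && sha1sum -c --quiet ../sums.txt)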

> root@client:/mnt/test # ls -l
> total 447
> drwx-- 1 root root  4 2016-09-20 20:37 1/
> drwx-- 1 root root  4 2016-09-20 20:37 2/
> drwx-- 1 root root  4 2016-09-20 20:37 linux-stable/
> -rw-r--r-- 1 root root 457480 2016-09-20 21:38 sums.txt
> root@client:/mnt/test # (cd linux-stable/; sha1sum -c --quiet
> ../sums.txt )
(The osd crashed before that finished... and then, impressively, starting the
osd again made the command finish gracefully. I then tried rsync to finish
copying, and after 6 or so restarts it finished with just the one rsync run.)

And then the checksums didn't match... (corruption)
> root@client:/mnt/test # (cd linux-stable/; sha1sum -c --quiet
> ../sums.txt )
> ./.git/objects/e6/635671beff26a417c02d50adeefa2a6897a9dd: FAILED
> ./.git/objects/e6/d58d90213a4a283d428988a398281663dd68e4: FAILED
> ./.git/objects/81/281381965b21d3c23b2f877e214c4af65d6fa4: FAILED
> ./.git/objects/4c/f549c4a9b23638ab49cc0f8b47c395b1fc8ede: FAILED
> sha1sum: WARNING: 4 computed checksums did NOT match

> root@client:/mnt/test # hexdump -C
> linux-stable/.git/objects/e6/635671beff26a417c02d50adeefa2a6897a9dd
> 00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
> *
> 00000dd0  00 00 00                                          |...|
> 00000dd3
> peter@peter:~/projects $ hexdump -C
> linux-stable/.git/objects/e6/635671beff26a417c02d50adeefa2a6897a9dd | head
>   78 01 bd 5b 7d 6f db c6  19 df bf d4 a7 38 20 40 
> |x..[}o...8 @|
> 0010  27 0b 8e ec 14 2b 06 24  5d 90 34 75 5c 63 4e 6c 
> |'+.$].4u\cNl|
> 0020  d8 f1 82 62 19 08 9a 3a  59 ac 29 52 25 29 bb 5e 
> |...b...:Y.)R%).^|
> 0030  9a ef be df f3 dc 1d 79  c7 77 39 c1 84 c0 12 79 
> |...y.w9y|
> 0040  77 cf fb 3b 99 eb 38 bd  16 cf 7e 38 fc e1 f0 2f 
> |w..;..8...~8.../|
> 0050  07 b3 89 98 89 fc 21 5f  e6 f3 95 78 2a 16 72 19 
> |..!_...x*.r.|
> 0060  25 51 11 a5 49 2e 96 69  26 8a 95 c4 bd bb 28 c4 
> |%Q..I..i&.(.|
> 0070  57 16 dd c9 4c 2c a3 58  62 7f 21 d7 38 49 87 df 
> |W...L,.Xb.!.8I..|
> 0080  a4 9b 87 2c ba 59 15 62  1a ee 89 ef 0f 0f 9f ed 
> |...,.Y.b|
> 0090  e3 cf f7 e2 3c 28 b2 28  bc 15 ef d2 70 25 e3 d6 
> |<(.(p%..|
And then copying that directory and testing the checksums again gives even
more failures:

> root@client:/mnt/test # cp -a linux-stable 3
> root@client:/mnt/test # (cd 3/; sha1sum -c --quiet ../sums.txt )
> ./net/iucv/iucv.c: FAILED
> ./net/kcm/kcmsock.c: FAILED
> ./net/irda/ircomm/ircomm_event.c: FAILED
> ./net/irda/ircomm/ircomm_tty_attach.c: FAILED
> ./net/llc/Makefile: FAILED
> ./net/llc/Kconfig: FAILED
> ./net/llc/af_llc.c: FAILED
> ./net/lapb/lapb_timer.c: FAILED
> ./net/lapb/lapb_subr.c: FAILED
> ./net/lapb/lapb_iface.c: FAILED
> ./net/lapb/Makefile: FAILED
> ./net/lapb/lapb_in.c: FAILED
> ./net/lapb/lapb_out.c: FAILED
> ./net/l2tp/l2tp_eth.c: FAILED
> ./net/l2tp/Kconfig: FAILED
> ./net/l2tp/l2tp_core.h: FAILED
> ./.git/objects/e6/635671beff26a417c02d50adeefa2a6897a9dd: FAILED
> ./.git/objects/e6/d58d90213a4a283d428988a398281663dd68e4: FAILED
> ./.git/objects/81/281381965b21d3c23b2f877e214c4af65d6fa4: FAILED
> ./.git/objects/4c/f549c4a9b23638ab49cc0f8b47c395b1fc8ede: FAILED
> sha1sum: WARNING: 20 computed checksums did NOT match
> root@client:/mnt/test # hexdump -C 3/net/iucv/iucv.c
> 00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
> *
> 0000d420  00 00 00 00 00 00 00 00                           |........|
> 0000d428
> root@client:/mnt/test # hexdump -C 3/net/kcm/kcmsock.c
> 00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
> *
> 0000d010  00 00 00 00 00 00 00                              |.......|
> 0000d017
> root@client:/mnt/test # hexdump -C 3/net/lapb/Makefile
>   00 00 00 00 00 00 00 00  00 00 00