Re: [ceph-users] Moderator?

2017-08-23 Thread Dan Mick
On 08/23/2017 07:29 AM, Eric Renfro wrote:
> I sent a message almost 2 days ago, with pasted logs. Since then, it’s 
> been in the moderator’s queue and still not approved (or even declined). 
> 
> Is anyone actually checking that? ;)
> 
> Eric Renfro

As you guessed, no, not really.

I found the message.  It's too large.  There are a lot of users on this
mailing list.  Please cut down its size and try again.

-- 
Dan Mick
Red Hat, Inc.
Ceph docs: http://ceph.com/docs
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] [SSD NVM FOR JOURNAL] Performance issues

2017-08-23 Thread Christian Balzer

Hello,

On Wed, 23 Aug 2017 09:11:18 -0300 Guilherme Steinmüller wrote:

> Hello!
> 
> I recently installed INTEL SSD 400GB 750 SERIES PCIE 3.0 X4 in 3 of my OSD
> nodes.
> 
Well, you know what's coming now, don't you?

That's a consumer device, with 70GB writes per day endurance.
Unless you're running an essentially read-only cluster, you're throwing away
money. 

> First of all, here is a schema describing how my cluster is laid out:
> 
> [image: inline image 1]
> 
> [image: inline image 2]
> 
> I primarily use my ceph as a backend for OpenStack nova, glance, swift and
> cinder. My crushmap is configured to have rulesets for SAS disks, SATA
> disks and another ruleset that resides on the HPE nodes, also using SATA disks.
> 
> Before installing the new journal in the HPE nodes, I was using one of the
> disks that today are OSDs (osd.35, osd.34 and osd.33). After upgrading the
> journal, I noticed that a dd command writing 1 GB blocks in OpenStack nova
> instances doubled the throughput, but the value I expected was actually 400%
> or 500%, since that is roughly the throughput we see on the Dell nodes that
> host another nova pool.
> 
Apples, oranges and bananas. 
You're comparing different HW (and no, I'm not going to look this up)
which may or may not have vastly different capabilities (like HW cache),
RAM and (unlikely relevant) CPU. 
Your NVMe may also be plugged into a different, insufficient PCIe slot for
all we know.
You're also using very different HDDs, which definitely will be a factor.

But most importantly you're comparing 2 pools of vastly different OSD
count, no wonder a pool with 15 OSDs is faster in sequential writes than
one with 9. 

> Here is a demonstration of the scenario and the difference in performance
> between Dell nodes and HPE nodes:
> 
> 
> 
> Scenario:
> 
> 
>-Using pools to store instance disks for OpenStack
> 
> 
>- Pool nova in "ruleset SAS" placed on c4-osd201, c4-osd202 and
>c4-osd203 with 5 osds per hosts
> 
SAS
> 
>- Pool nova_hpedl180 in "ruleset NOVA_HPEDL180" placed on c4-osd204,
>c4-osd205, c4-osd206 with 3 osds per hosts
> 
SATA
> 
>- Every OSD has one partition of 35GB in a INTEL SSD 400GB 750
>SERIES PCIE 3.0 X4
> 
Overkill, but since your NVMe will die shortly anyway...

With large sequential tests, the journal will have nearly NO impact on the
result, even if tuned to that effect.

> 
>- Internal link for cluster and public network of 10Gbps
> 
> 
>- Deployment via ceph-ansible. Same configuration define in ansible
>for every host on cluster
> 
> 
> 
> *Instance on pool nova in ruleset SAS:*
> 
> 
># dd if=/dev/zero of=/mnt/bench bs=1G count=1 oflag=direct
>1+0 records in
>1+0 records out
>1073741824 bytes (1.1 GB, 1.0 GiB) copied, 2.56255 s, 419 MB/s
> 
This is a very small test for what you're trying to determine and not
going to be very representative. 
If for example there _is_ a HW cache of 2GB on the Dell nodes, it would
fit nicely in there.

> 
> *Instance on pool nova in ruleset NOVA_HPEDL180:*
> 
>  #  dd if=/dev/zero of=/mnt/bench bs=1G count=1 oflag=direct
>  1+0 records in
>  1+0 records out
>  1073741824 bytes (1.1 GB, 1.0 GiB) copied, 11.8243 s, 90.8 MB/s
> 
> 
> I made some FIO benchmarks as suggested by Sebastien (
> https://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-
> test-if-your-ssd-is-suitable-as-a-journal-device/ ) and the command with 1
> job returned about 180MB/s of throughput on the recently installed nodes
> (HPE nodes). I also ran some hdparm benchmarks on all SSDs and everything seems
> normal.
> 
I'd consider a 180MB/s result from a device that supposedly does 900MB/s a
fail, but then again those tests above do NOT reflect journal usage
reality; they are more of a hint as to whether something is totally broken or not.

> 
> I can't see what is causing this difference in throughput, since the network
> is not a problem and I think that CPU and memory are not crucial: I was
> monitoring the cluster with the atop command and I didn't notice any saturation
> of resources. My only thought is that I have less workload on the nova_hpedl180
> pool in the HPE nodes and fewer disks per node, and this can influence the
> throughput of the journal.
>
How busy are your NVMe journals during that test on the Dells and HPs
respectively?
Same for the HDDs.

Again, run longer, larger tests to get something that will actually
register, also atop with shorter intervals.
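
For reference, a longer sync-write run along the lines of the blog post linked
above could look like this; a sketch only, with the device path as a placeholder
(fio writes to it directly, so never point it at a partition that is in use):

    # Sync 4k writes, the access pattern a filestore journal actually sees:
    fio --filename=/dev/nvme0n1p5 --direct=1 --sync=1 --rw=write --bs=4k \
        --numjobs=1 --iodepth=1 --runtime=300 --time_based --group_reporting \
        --name=journal-test
    # Repeat with --numjobs=5 and --numjobs=10, and watch the NVMe and the HDDs
    # with "iostat -x 1" (or atop at 1-2 second intervals) while it runs.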

Christian
> 
> Any clue about what is missing or what is happening?
> 
> Thanks in advance.


-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Rakuten Communications
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] NVMe + SSD + HDD RBD Replicas with Bluestore...

2017-08-23 Thread Mark Nelson



On 08/23/2017 07:17 PM, Mark Nelson wrote:



On 08/23/2017 06:18 PM, Xavier Trilla wrote:

Oh man, what do you know!... I'm quite amazed. I've been reviewing
more documentation about min_replica_size and it seems like it doesn't
work as I thought (although I remember specifically reading it
somewhere some years ago :/ ).

And, as all replicas need to be written before primary OSD informs the
client about the write being completed, we cannot have the third
replica on HDDs, no way. It would kill latency.

Well, we'll just keep adding NVMs to our cluster (I mean, S4500 and
P4500 price difference is negligible) and we'll decrease the primary
affinity weight for SATA SSDs, just to be sure we get the most out of
NVMe.

BTW, does anybody have any experience so far with erasure coding and
rbd? A 2/3 profile would really save space on SSDs, but I'm worried
about the extra calculations needed and how it will affect
performance... Well, maybe I'll check into it, and I'll start a new
thread :)


There's a decent chance you'll get higher performance with something
like EC 6+2 vs 3X replication for large writes due simply to having less
data to write (we see somewhere between 2x and 3x rep performance in the
lab for 4MB writes to RBD). Small random writes will almost certainly be
slower due to increased latency.  Reads in general will be slower as
well.  With replication the read comes entirely from the primary but in
EC you have to fetch chunks from the secondaries and reconstruct the
object before sending it back to the client.

So basically compared to 3X rep you'll likely gain some performance on
large writes, lose some performance on large reads, and lose more
performance on small writes/reads (dependent on cpu speed and various
other factors).


I should follow up and mention though that you gain space vs 3X as well, 
so it's very much a question of what trade-offs you are willing to make.
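
If you do experiment with it, the Luminous-era setup is roughly the following
(a sketch; pool names, PG counts and the k/m values are examples, and RBD still
needs a small replicated pool for its metadata):

    # EC profile and data pool; overwrites must be enabled for RBD on EC pools
    # (requires BlueStore OSDs).
    ceph osd erasure-code-profile set rbd_ec k=6 m=2 crush-failure-domain=host
    ceph osd pool create rbd_ec_data 256 256 erasure rbd_ec
    ceph osd pool set rbd_ec_data allow_ec_overwrites true
    # Image metadata lives in the replicated 'rbd' pool (assumed to exist),
    # the image data goes to the EC pool:
    rbd create --size 100G --data-pool rbd_ec_data rbd/testimage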




Mark



Anyway, thanks for the info!
Xavier.

-Original Message-
From: Christian Balzer [mailto:ch...@gol.com]
Sent: Tuesday, 22 August 2017 2:40
To: ceph-users@lists.ceph.com
CC: Xavier Trilla 
Subject: Re: [ceph-users] NVMe + SSD + HDD RBD Replicas with Bluestore...


Hello,


Firstly, what David said.

On Mon, 21 Aug 2017 20:25:07 + Xavier Trilla wrote:


Hi,

I'm working on improving the costs of our current ceph cluster. We
currently keep 3x replicas, all of them on SSDs (that cluster hosts
several hundred VMs' RBD disks) and lately I've been wondering if the
following setup would make sense, in order to improve cost /
performance.



Have you done a full analysis of your current cluster, as in
utilization of your SSDs (IOPS), CPU, etc with
atop/iostat/collectd/grafana?
During peak utilization times?

If so, you should have a decent enough idea of what level IOPS you
need and can design from there.


The ideal would be to move PG primaries to high performance nodes
using NVMe, keep secondary replica in SSDs and move the third replica
to HDDs.

Most probably the hardware will be:

1st Replica: Intel P4500 NVMe (2TB)
2nd Replica: Intel S3520 SATA SSD (1.6TB)

Unless you have:
a) a lot of these and/or
b) very little writes
what David said.

Aside from that whole replica idea not working as you think.


3rd Replica: WD Gold hard drives (2 TB) (I'm considering either the 1TB or
2TB model, as I want to have as many spindles as possible)

Also, hosts running OSDs would have a quite different HW configuration
(In our experience NVMe need crazy CPU power in order to get the best
out of them)


Correct, one might run into that with pure NVMe/SSD nodes.


I know the NVMe and SATA SSD replicas will work, no problem about
that (we'll just adjust the primary affinity and crushmap in order to
have the desired data layout + primary OSDs); what I'm worried about is
the HDD replica.

Also the pool will have min_size 1 (Would love to use min_size 2, but
it would kill latency times) so, even if we have to do some
maintenance in the NVMe nodes, writes to HDDs will be always "lazy".

Before bluestore (we are planning to move to luminous most probably
by the end of the year or beginning 2018, once it is released and
tested properly) I would just use  SSD/NVMe journals for the HDDs.
So, all writes would go to the SSD journal, and then moved to the
HDD. But now, with Bluestore I don't think that's an option anymore.


Bluestore bits are still a bit of dark magic in terms of concise and
complete documentation, but the essentials have been mentioned here
before.

Essentially, if you can get the needed IOPS with SSD/NVMe journals and
HDDs, Bluestore won't be worse than that, if done correctly.

With Bluestore, use NVMe for the WAL (small space, high
IOPS/data), SSDs for the actual RocksDB and the (surprise, surprise!)
journal for small writes (large space, nobody knows for sure how large
is large enough) and finally the HDDs.

If you're trying to optimize costs, decent SSDs (good luck finding any
with Intel 37xx and 36xx basical

Re: [ceph-users] NVMe + SSD + HDD RBD Replicas with Bluestore...

2017-08-23 Thread Mark Nelson



On 08/23/2017 06:18 PM, Xavier Trilla wrote:

Oh man, what do you know!... I'm quite amazed. I've been reviewing more 
documentation about min_replica_size and it seems like it doesn't work as I 
thought (although I remember specifically reading it somewhere some years ago 
:/ ).

And, as all replicas need to be written before primary OSD informs the client 
about the write being completed, we cannot have the third replica on HDDs, no 
way. It would kill latency.

Well, we'll just keep adding NVMs to our cluster (I mean, S4500 and P4500 price 
difference is negligible) and we'll decrease the primary affinity weight for 
SATA SSDs, just to be sure we get the most out of NVMe.

BTW, does anybody have any experience so far with erasure coding and rbd? A 2/3 
profile would really save space on SSDs, but I'm worried about the extra 
calculations needed and how it will affect performance... Well, maybe I'll 
check into it, and I'll start a new thread :)


There's a decent chance you'll get higher performance with something 
like EC 6+2 vs 3X replication for large writes due simply to having less 
data to write (we see somewhere between 2x and 3x rep performance in the 
lab for 4MB writes to RBD). Small random writes will almost certainly be 
slower due to increased latency.  Reads in general will be slower as 
well.  With replication the read comes entirely from the primary but in 
EC you have to fetch chunks from the secondaries and reconstruct the 
object before sending it back to the client.


So basically compared to 3X rep you'll likely gain some performance on 
large writes, lose some performance on large reads, and lose more 
performance on small writes/reads (dependent on cpu speed and various 
other factors).


Mark



Anyway, thanks for the info!
Xavier.

-Original Message-
From: Christian Balzer [mailto:ch...@gol.com]
Sent: Tuesday, 22 August 2017 2:40
To: ceph-users@lists.ceph.com
CC: Xavier Trilla 
Subject: Re: [ceph-users] NVMe + SSD + HDD RBD Replicas with Bluestore...


Hello,


Firstly, what David said.

On Mon, 21 Aug 2017 20:25:07 + Xavier Trilla wrote:


Hi,

I'm working on improving the costs of our current ceph cluster. We currently 
keep 3x replicas, all of them on SSDs (that cluster hosts several hundred VMs' 
RBD disks) and lately I've been wondering if the following setup would make 
sense, in order to improve cost / performance.



Have you done a full analysis of your current cluster, as in utilization of 
your SSDs (IOPS), CPU, etc with atop/iostat/collectd/grafana?
During peak utilization times?

If so, you should have a decent enough idea of what level IOPS you need and can 
design from there.


The ideal would be to move PG primaries to high performance nodes using NVMe, 
keep secondary replica in SSDs and move the third replica to HDDs.

Most probably the hardware will be:

1st Replica: Intel P4500 NVMe (2TB)
2nd Replica: Intel S3520 SATA SSD (1.6TB)

Unless you have:
a) a lot of these and/or
b) very little writes
what David said.

Aside from that whole replica idea not working as you think.


3rd Replica: WD Gold hard drives (2 TB) (I'm considering either the 1TB or
2TB model, as I want to have as many spindles as possible)

Also, hosts running OSDs would have a quite different HW configuration
(In our experience NVMe need crazy CPU power in order to get the best
out of them)


Correct, one might run into that with pure NVMe/SSD nodes.


I know the NVMe and SATA SSD replicas will work, no problem about that (we'll 
just adjust the primary affinity and crushmap in order to have the desired data 
layout + primary OSDs); what I'm worried about is the HDD replica.

Also the pool will have min_size 1 (Would love to use min_size 2, but it would kill 
latency times) so, even if we have to do some maintenance in the NVMe nodes, writes to 
HDDs will be always "lazy".

Before bluestore (we are planning to move to luminous most probably by the end 
of the year or beginning 2018, once it is released and tested properly) I would 
just use  SSD/NVMe journals for the HDDs. So, all writes would go to the SSD 
journal, and then moved to the HDD. But now, with Bluestore I don't think 
that's an option anymore.


Bluestore bits are still a bit of dark magic in terms of concise and complete 
documentation, but the essentials have been mentioned here before.

Essentially, if you can get the needed IOPS with SSD/NVMe journals and HDDs, 
Bluestore won't be worse than that, if done correctly.

With Bluestore, use NVMe for the WAL (small space, high IOPS/data), SSDs 
for the actual RocksDB and the (surprise, surprise!) journal for small writes 
(large space, nobody knows for sure how large is large enough) and finally the 
HDDs.

If you're trying to optimize costs, a decent SSD to hold both the WAL and DB 
should do the trick (good luck finding any, with the Intel 37xx and 36xx 
basically unavailable; maybe the S or P 4600).

Christian


What I'm worried is how would affect to the NVMe prim

Re: [ceph-users] NVMe + SSD + HDD RBD Replicas with Bluestore...

2017-08-23 Thread Xavier Trilla
Oh man, what do you know!... I'm quite amazed. I've been reviewing more 
documentation about min_replica_size and it seems like it doesn't work as I 
thought (although I remember specifically reading it somewhere some years ago 
:/ ).

And, as all replicas need to be written before primary OSD informs the client 
about the write being completed, we cannot have the third replica on HDDs, no 
way. It would kill latency.

Well, we'll just keep adding NVMs to our cluster (I mean, S4500 and P4500 price 
difference is negligible) and we'll decrease the primary affinity weight for 
SATA SSDs, just to be sure we get the most out of NVMe.
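
(For reference, that adjustment is done per OSD. A minimal sketch, with osd.12
standing in for one of the SATA SSD OSDs; 1 is the default weight, 0 removes
the OSD from primary duty, and pre-Luminous monitors need
"mon osd allow primary affinity = true" for this to take effect:)

    ceph osd primary-affinity osd.12 0.5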

BTW, does anybody have any experience so far with erasure coding and rbd? A 2/3 
profile would really save space on SSDs, but I'm worried about the extra 
calculations needed and how it will affect performance... Well, maybe I'll 
check into it, and I'll start a new thread :)

Anyway, thanks for the info!
Xavier.

-Original Message-
From: Christian Balzer [mailto:ch...@gol.com] 
Sent: Tuesday, 22 August 2017 2:40
To: ceph-users@lists.ceph.com
CC: Xavier Trilla 
Subject: Re: [ceph-users] NVMe + SSD + HDD RBD Replicas with Bluestore...


Hello,


Firstly, what David said.

On Mon, 21 Aug 2017 20:25:07 + Xavier Trilla wrote:

> Hi,
> 
> I'm working on improving the costs of our current ceph cluster. We currently 
> keep 3x replicas, all of them on SSDs (that cluster hosts several hundred 
> VMs' RBD disks) and lately I've been wondering if the following setup would 
> make sense, in order to improve cost / performance.
> 

Have you done a full analysis of your current cluster, as in utilization of 
your SSDs (IOPS), CPU, etc with atop/iostat/collectd/grafana?
During peak utilization times?

If so, you should have a decent enough idea of what level IOPS you need and can 
design from there.

> The ideal would be to move PG primaries to high performance nodes using NVMe, 
> keep secondary replica in SSDs and move the third replica to HDDs.
> 
> Most probably the hardware will be:
> 
> 1st Replica: Intel P4500 NVMe (2TB)
> 2nd Replica: Intel S3520 SATA SSD (1.6TB)
Unless you have:
a) a lot of these and/or
b) very little writes
what David said. 

Aside from that whole replica idea not working as you think.

> 3rd Replica: WD Gold hard drives (2 TB) (I'm considering either the 1TB or 
> 2TB model, as I want to have as many spindles as possible)
> 
> Also, hosts running OSDs would have a quite different HW configuration 
> (In our experience NVMe need crazy CPU power in order to get the best 
> out of them)
> 
Correct, one might run into that with pure NVMe/SSD nodes.

> I know the NVMe and SATA SSD replicas will work, no problem about that (we'll 
> just adjust the primary affinity and crushmap in order to have the desired 
> data layout + primary OSDs); what I'm worried about is the HDD replica.
> 
> Also the pool will have min_size 1 (Would love to use min_size 2, but it 
> would kill latency times) so, even if we have to do some maintenance in the 
> NVMe nodes, writes to HDDs will be always "lazy".
> 
> Before bluestore (we are planning to move to luminous most probably by the 
> end of the year or beginning 2018, once it is released and tested properly) I 
> would just use  SSD/NVMe journals for the HDDs. So, all writes would go to 
> the SSD journal, and then moved to the HDD. But now, with Bluestore I don't 
> think that's an option anymore.
>
Bluestore bits are still a bit of dark magic in terms of concise and complete 
documentation, but the essentials have been mentioned here before.

Essentially, if you can get the needed IOPS with SSD/NVMe journals and HDDs, 
Bluestore won't be worse than that, if done correctly.

With Bluestore, use NVMe for the WAL (small space, high IOPS/data), SSDs 
for the actual RocksDB and the (surprise, surprise!) journal for small writes 
(large space, nobody knows for sure how large is large enough) and finally the 
HDDs. 

If you're trying to optimize costs, a decent SSD to hold both the WAL and DB 
should do the trick (good luck finding any, with the Intel 37xx and 36xx 
basically unavailable; maybe the S or P 4600).
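
For reference, a sketch of that layout with the Luminous-era ceph-disk (device
names here are placeholders, and ceph-ansible has equivalent settings for the
same split):

    # HDD for data, an SSD partition for the RocksDB DB, an NVMe partition for the WAL:
    ceph-disk prepare --bluestore /dev/sdf \
        --block.db /dev/sdb1 \
        --block.wal /dev/nvme0n1p1
    ceph-disk activate /dev/sdf1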

Christian
 
> What I'm worried is how would affect to the NVMe primary OSDs having a quite 
> slow third replica. WD Gold hard drives seem quite decent (For a SATA drive) 
> but obviously performance is nowhere near to SSDs or NVMe.
> 
> So, what do you think? Does anybody have some opinions or experience he would 
> like to share?
> 
> Thanks!
> Xavier.
> 
> 
> 


-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Rakuten Communications
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph Tech Talk Cancelled

2017-08-23 Thread Leonardo Vaz
On Thu, Aug 24, 2017 at 12:04:37AM +0800, Xuehan Xu wrote:
> Hi, Leonardo
> 
> Will there be a link for September's CDM in
> http://tracker.ceph.com/projects/ceph/wiki/Planning?

Yes, I must post the reminder next week.

> And when will the video recording of August's CDM be posted on YouTube?

It's on the to-do list, Xuehan. ;-)

Kindest regards,

Leo

> 
> Thank you:-)
> 
> On 23 August 2017 at 23:04, Leonardo Vaz  wrote:
> > Hey Cephers,
> >
> > Sorry for the short notice, but the Ceph Tech Talk for August (scheduled
> > for today) has been cancelled.
> >
> > Kindest regards,
> >
> > Leo
> >
> > --
> > Leonardo Vaz
> > Ceph Community Manager
> > Open Source and Standards Team
> > --
> > To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> > the body of a message to majord...@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html

-- 
Leonardo Vaz
Ceph Community Manager
Open Source and Standards Team
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Anybody gotten boto3 and ceph RGW working?

2017-08-23 Thread Bryan Banister
That was the problem, thanks again,
-Bryan
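
(For anyone landing here with the same TypeError on Python 3.6: upgrading past
the botocore release mentioned below is all it took; a sketch assuming
pip-managed packages:)

    pip3 install --upgrade 'botocore>=1.4.87' boto3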

From: Bryan Banister
Sent: Wednesday, August 23, 2017 9:06 AM
To: Bryan Banister ; Abhishek Lekshmanan 
; ceph-users@lists.ceph.com
Subject: RE: [ceph-users] Anybody gotten boto3 and ceph RGW working?

Looks like I found the problem:  
https://github.com/snowflakedb/snowflake-connector-python/issues/1

I’ll try the fixed version of botocore 1.4.87+,
-Bryan

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Bryan 
Banister
Sent: Wednesday, August 23, 2017 9:01 AM
To: Abhishek Lekshmanan <abhis...@suse.com>; 
ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Anybody gotten boto3 and ceph RGW working?

Note: External Email


Here is the error I get:



# python3 boto3_test.py

Traceback (most recent call last):

  File "boto3_test.py", line 15, in 

for bucket in s3.list_buckets():

  File 
"/jump/software/rhel7/python36_botocore-1.4.85/lib/python3.6/site-packages/botocore/client.py",
 line 251, in _api_call

return self._make_api_call(operation_name, kwargs)

  File 
"/jump/software/rhel7/python36_botocore-1.4.85/lib/python3.6/site-packages/botocore/client.py",
 line 526, in _make_api_call

operation_model, request_dict)

  File 
"/jump/software/rhel7/python36_botocore-1.4.85/lib/python3.6/site-packages/botocore/endpoint.py",
 line 141, in make_request

return self._send_request(request_dict, operation_model)

  File 
"/jump/software/rhel7/python36_botocore-1.4.85/lib/python3.6/site-packages/botocore/endpoint.py",
 line 170, in _send_request

success_response, exception):

  File 
"/jump/software/rhel7/python36_botocore-1.4.85/lib/python3.6/site-packages/botocore/endpoint.py",
 line 249, in _needs_retry

caught_exception=caught_exception, request_dict=request_dict)

  File 
"/jump/software/rhel7/python36_botocore-1.4.85/lib/python3.6/site-packages/botocore/hooks.py",
 line 227, in emit

return self._emit(event_name, kwargs)

  File 
"/jump/software/rhel7/python36_botocore-1.4.85/lib/python3.6/site-packages/botocore/hooks.py",
 line 210, in _emit

response = handler(**kwargs)

  File 
"/jump/software/rhel7/python36_botocore-1.4.85/lib/python3.6/site-packages/botocore/retryhandler.py",
 line 183, in __call__

if self._checker(attempts, response, caught_exception):

  File 
"/jump/software/rhel7/python36_botocore-1.4.85/lib/python3.6/site-packages/botocore/retryhandler.py",
 line 251, in __call__

caught_exception)

  File 
"/jump/software/rhel7/python36_botocore-1.4.85/lib/python3.6/site-packages/botocore/retryhandler.py",
 line 269, in _should_retry

return self._checker(attempt_number, response, caught_exception)

  File 
"/jump/software/rhel7/python36_botocore-1.4.85/lib/python3.6/site-packages/botocore/retryhandler.py",
 line 317, in __call__

caught_exception)

  File 
"/jump/software/rhel7/python36_botocore-1.4.85/lib/python3.6/site-packages/botocore/retryhandler.py",
 line 223, in __call__

attempt_number, caught_exception)

  File 
"/jump/software/rhel7/python36_botocore-1.4.85/lib/python3.6/site-packages/botocore/retryhandler.py",
 line 359, in _check_caught_exception

raise caught_exception

  File 
"/jump/software/rhel7/python36_botocore-1.4.85/lib/python3.6/site-packages/botocore/endpoint.py",
 line 204, in _get_response

proxies=self.proxies, timeout=self.timeout)

  File 
"/jump/software/rhel7/python36_botocore-1.4.85/lib/python3.6/site-packages/botocore/vendored/requests/sessions.py",
 line 573, in send

r = adapter.send(request, **kwargs)

  File 
"/jump/software/rhel7/python36_botocore-1.4.85/lib/python3.6/site-packages/botocore/vendored/requests/adapters.py",
 line 370, in send

timeout=timeout

  File 
"/jump/software/rhel7/python36_botocore-1.4.85/lib/python3.6/site-packages/botocore/vendored/requests/packages/urllib3/connectionpool.py",
 line 544, in urlopen

body=body, headers=headers)

  File 
"/jump/software/rhel7/python36_botocore-1.4.85/lib/python3.6/site-packages/botocore/vendored/requests/packages/urllib3/connectionpool.py",
 line 349, in _make_request

conn.request(method, url, **httplib_request_kw)

  File "/jump/software/rhel7/Python-3.6.0/lib/python3.6/http/client.py", line 
1239, in request

self._send_request(method, url, body, headers, encode_chunked)

TypeError: _send_request() takes 5 positional arguments but 6 were given



Here is the simple code:

import boto3



access_key = ""

secret_key = ""

gateway = "http://carf-ceph-osd15";



s3 = boto3.client('s3', 'us-east-1',

aws_access_key_id=access_key,

aws_secret_access_key=secret_key,

endpoint_url=gateway,

use_ssl=False

   )

# config=boto3.session.Config(signature_version='s3v2')



for bucket in s3.list_buckets():

for key in bucket.objects.all():

print(key.key)



Thanks in advance for any help!!

-Bryan



-Original Message-
From: 

Re: [ceph-users] PG reported as inconsistent in status, but no inconsistencies visible to rados

2017-08-23 Thread Edward R Huyer
Actually, this looks very much like my issue, so I'll add to that:  
http://tracker.ceph.com/issues/21040
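
(For anyone wanting to reproduce the fsck output discussed below: the tool runs
against a stopped OSD; the OSD id and path here are examples.)

    systemctl stop ceph-osd@63
    ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-63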

-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Edward 
R Huyer
Sent: Wednesday, August 23, 2017 11:10 AM
To: Brad Hubbard 
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] PG reported as inconsistent in status, but no 
inconsistencies visible to rados

Forgot to send to the list with the first reply.

I'm honestly not exactly sure when it happened.  I hadn't looked at ceph status 
in several days prior to discovering the issue and submitting to the mailing 
list.  I've seen one or two inconsistent pg issues randomly crop up in the 
month or so since these nodes were spun up, but nothing I couldn't resolve.

There was an issue with one of the Proxmox VE nodes that store VM data in the 
ceph cluster.  A network driver issue that caused the NIC to be disabled.  That 
was a week or two ago, and has since been resolved.  While the problematic PG 
is in the pool used by Proxmox, I wouldn't expect the above problem would be 
able to cause store-level corruption on the OSDs.

Other than that, nothing of interest has happened that I'm aware of, though I 
don't yet have good monitoring on these nodes.

I'll put something in the tracker later today.

Thank you for your help.

-Original Message-
From: Brad Hubbard [mailto:bhubb...@redhat.com]
Sent: Wednesday, August 23, 2017 4:44 AM
To: Edward R Huyer 
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] PG reported as inconsistent in status, but no 
inconsistencies visible to rados

On Wed, Aug 23, 2017 at 12:47 AM, Edward R Huyer  wrote:
> Neat, hadn't seen that command before.  Here's the fsck log from the 
> primary OSD:  https://pastebin.com/nZ0H5ag3
>
> Looks like the OSD's bluestore "filesystem" itself has some underlying 
> errors, though I'm not sure what to do about them.

Hmmm... Can you tell us any more about how/when this happened?

Any corresponding event at all? Any interesting log entries around the same 
time?

Could you also open a tracker for this (or let me know and I can open one for 
you)? That way we can continue the investigation there.

>
> -Original Message-
> From: Brad Hubbard [mailto:bhubb...@redhat.com]
> Sent: Monday, August 21, 2017 7:05 PM
> To: Edward R Huyer 
> Cc: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] PG reported as inconsistent in status, but 
> no inconsistencies visible to rados
>
> Could you provide the output of 'ceph-bluestore-tool fsck' for one of these 
> OSDs?
>
> On Tue, Aug 22, 2017 at 2:53 AM, Edward R Huyer  wrote:
>> This is an odd one.  My cluster is reporting an inconsistent pg in 
>> ceph status and ceph health detail.  However, rados 
>> list-inconsistent-obj and rados list-inconsistent-snapset both report 
>> no inconsistencies.  Scrubbing the pg results in these errors in the osd 
>> logs:
>>
>>
>>
>> OSD 63 (primary):
>>
>> 2017-08-21 12:41:03.580068 7f0b36629700 -1
>> bluestore(/var/lib/ceph/osd/ceph-63) _verify_csum bad crc32c/0x1000 
>> checksum at blob offset 0x0, got 0x6b6b9184, expected 0x6706be76, 
>> device location [0x23f39d~1000], logical extent 0x0~1000, object 
>> #9:55bf7cc6:::rbd_data.33992ae8944a.200f:e#
>>
>> 2017-08-21 12:41:03.961945 7f0b36629700 -1 log_channel(cluster) log [ERR] :
>> 9.aa soid 9:55bf7cc6:::rbd_data.33992ae8944a.200f:e:
>> failed to pick suitable object info
>>
>> 2017-08-21 12:41:15.357484 7f0b36629700 -1 log_channel(cluster) log [ERR] :
>> 9.aa deep-scrub 3 errors
>>
>>
>>
>> OSD 50:
>>
>> 2017-08-21 12:41:03.592918 7f264be6d700 -1
>> bluestore(/var/lib/ceph/osd/ceph-50) _verify_csum bad crc32c/0x1000 
>> checksum at blob offset 0x0, got 0x64a1e2b1, expected 0x6706be76, 
>> device location [0x341883~1000], logical extent 0x0~1000, object 
>> #9:55bf7cc6:::rbd_data.33992ae8944a.200f:e#
>>
>>
>>
>> OSD 46:
>>
>> 2017-08-21 12:41:03.531394 7fb396b1f700 -1
>> bluestore(/var/lib/ceph/osd/ceph-46) _verify_csum bad crc32c/0x1000 
>> checksum at blob offset 0x0, got 0x7aa05c01, expected 0x6706be76, 
>> device location [0x1d6e1e~1000], logical extent 0x0~1000, object 
>> #9:55bf7cc6:::rbd_data.33992ae8944a.200f:e#
>>
>>
>>
>> This is on Ceph 12.1.4 (previously 12.1.1).
>>
>>
>>
>> Thoughts?
>>
>>
>>
>> -
>>
>> Edward Huyer
>>
>> School of Interactive Games and Media
>>
>> Rochester Institute of Technology
>>
>> Golisano 70-2373
>>
>> 152 Lomb Memorial Drive
>>
>> Rochester, NY 14623
>>
>> 585-475-6651
>>
>> erh...@rit.edu
>>
>>
>>
>> Obligatory Legalese:
>>
>> The information transmitted, including attachments, is intended only 
>> for the
>> person(s) or entity to which it is addressed and may contain 
>> confidential and/or privileged material. Any review, retransmission, 
>> dissemination or other use of, or taking of any action in reliance 
>> upon this information by persons or entities other than the intended 
>> reci

[ceph-users] Problems recovering MDS

2017-08-23 Thread Eric Renfro
So, I was running Ceph 10.2.9 on the servers, with 10.2.6 clients (I think; 
whatever is in CentOS’s Jewel-SIG repo).

I had an issue where the MDS cluster stopped working and wasn’t responding to 
cache pressure, so I restarted the MDSs, and they then failed to replay the 
journal. 

Long story short, I managed to get things sort of working, I upgraded to 
Luminous 12.1.4rc because it had the more developed cephfs-data-scan tools with 
scan_links (Jewel did not). Even though things are mostly working, there is 
obviously still some corruption in links and metadata, as I’m getting logs of 
them.

What I need to know is: how can I fix this so that I clear all the data 
corruption? I’ve gone through the steps in the disaster recovery documentation. 
I’m doing a last-ditch attempt to re-order how I do things just a little by 
running scan_frags, then scan_extents and scan_inodes, hoping that it can 
repair some of the damage.
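
For reference, the documented scan passes look roughly like this (a sketch:
run with the MDS daemons stopped and the filesystem taken offline, "cephfs_data"
is a placeholder for the data pool, the scan_frags pass mentioned above runs
before these, and the scans can be parallelized across workers as per the
disaster recovery docs):

    cephfs-data-scan scan_extents cephfs_data
    cephfs-data-scan scan_inodes cephfs_data
    cephfs-data-scan scan_links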

At the very least what I want, since nothing important seems to be 
corrupted/damaged, is to repair or delete the damaged links/references, and 
clear up all that so things run reliably again.

Eric Renfro
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] OSD doesn't always start at boot

2017-08-23 Thread Sean Purdy
On Wed, 23 Aug 2017, David Turner said:
> This isn't a solution to fix them not starting at boot time, but a fix to
> not having to reboot the node again.  `ceph-disk activate-all` should go
> through and start up the rest of your osds without another reboot.

Thanks, will try next time.

Sean
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] PG reported as inconsistent in status, but no inconsistencies visible to rados

2017-08-23 Thread Edward R Huyer
Forgot to send to the list with the first reply.

I'm honestly not exactly sure when it happened.  I hadn't looked at ceph status 
in several days prior to discovering the issue and submitting to the mailing 
list.  I've seen one or two inconsistent pg issues randomly crop up in the 
month or so since these nodes were spun up, but nothing I couldn't resolve.

There was an issue with one of the Proxmox VE nodes that store VM data in the 
ceph cluster.  A network driver issue that caused the NIC to be disabled.  That 
was a week or two ago, and has since been resolved.  While the problematic PG 
is in the pool used by Proxmox, I wouldn't expect the above problem would be 
able to cause store-level corruption on the OSDs.

Other than that, nothing of interest has happened that I'm aware of, though I 
don't yet have good monitoring on these nodes.

I'll put something in the tracker later today.

Thank you for your help.

-Original Message-
From: Brad Hubbard [mailto:bhubb...@redhat.com] 
Sent: Wednesday, August 23, 2017 4:44 AM
To: Edward R Huyer 
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] PG reported as inconsistent in status, but no 
inconsistencies visible to rados

On Wed, Aug 23, 2017 at 12:47 AM, Edward R Huyer  wrote:
> Neat, hadn't seen that command before.  Here's the fsck log from the 
> primary OSD:  https://pastebin.com/nZ0H5ag3
>
> Looks like the OSD's bluestore "filesystem" itself has some underlying 
> errors, though I'm not sure what to do about them.

Hmmm... Can you tell us any more about how/when this happened?

Any corresponding event at all? Any interesting log entries around the same 
time?

Could you also open a tracker for this (or let me know and I can open one for 
you)? That way we can continue the investigation there.

>
> -Original Message-
> From: Brad Hubbard [mailto:bhubb...@redhat.com]
> Sent: Monday, August 21, 2017 7:05 PM
> To: Edward R Huyer 
> Cc: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] PG reported as inconsistent in status, but 
> no inconsistencies visible to rados
>
> Could you provide the output of 'ceph-bluestore-tool fsck' for one of these 
> OSDs?
>
> On Tue, Aug 22, 2017 at 2:53 AM, Edward R Huyer  wrote:
>> This is an odd one.  My cluster is reporting an inconsistent pg in 
>> ceph status and ceph health detail.  However, rados 
>> list-inconsistent-obj and rados list-inconsistent-snapset both report 
>> no inconsistencies.  Scrubbing the pg results in these errors in the osd 
>> logs:
>>
>>
>>
>> OSD 63 (primary):
>>
>> 2017-08-21 12:41:03.580068 7f0b36629700 -1
>> bluestore(/var/lib/ceph/osd/ceph-63) _verify_csum bad crc32c/0x1000 
>> checksum at blob offset 0x0, got 0x6b6b9184, expected 0x6706be76, 
>> device location [0x23f39d~1000], logical extent 0x0~1000, object 
>> #9:55bf7cc6:::rbd_data.33992ae8944a.200f:e#
>>
>> 2017-08-21 12:41:03.961945 7f0b36629700 -1 log_channel(cluster) log [ERR] :
>> 9.aa soid 9:55bf7cc6:::rbd_data.33992ae8944a.200f:e:
>> failed to pick suitable object info
>>
>> 2017-08-21 12:41:15.357484 7f0b36629700 -1 log_channel(cluster) log [ERR] :
>> 9.aa deep-scrub 3 errors
>>
>>
>>
>> OSD 50:
>>
>> 2017-08-21 12:41:03.592918 7f264be6d700 -1
>> bluestore(/var/lib/ceph/osd/ceph-50) _verify_csum bad crc32c/0x1000 
>> checksum at blob offset 0x0, got 0x64a1e2b1, expected 0x6706be76, 
>> device location [0x341883~1000], logical extent 0x0~1000, object 
>> #9:55bf7cc6:::rbd_data.33992ae8944a.200f:e#
>>
>>
>>
>> OSD 46:
>>
>> 2017-08-21 12:41:03.531394 7fb396b1f700 -1
>> bluestore(/var/lib/ceph/osd/ceph-46) _verify_csum bad crc32c/0x1000 
>> checksum at blob offset 0x0, got 0x7aa05c01, expected 0x6706be76, 
>> device location [0x1d6e1e~1000], logical extent 0x0~1000, object 
>> #9:55bf7cc6:::rbd_data.33992ae8944a.200f:e#
>>
>>
>>
>> This is on Ceph 12.1.4 (previously 12.1.1).
>>
>>
>>
>> Thoughts?
>>
>>
>>
>> -
>>
>> Edward Huyer
>>
>> School of Interactive Games and Media
>>
>> Rochester Institute of Technology
>>
>> Golisano 70-2373
>>
>> 152 Lomb Memorial Drive
>>
>> Rochester, NY 14623
>>
>> 585-475-6651
>>
>> erh...@rit.edu
>>
>>
>>
>> Obligatory Legalese:
>>
>> The information transmitted, including attachments, is intended only 
>> for the
>> person(s) or entity to which it is addressed and may contain 
>> confidential and/or privileged material. Any review, retransmission, 
>> dissemination or other use of, or taking of any action in reliance 
>> upon this information by persons or entities other than the intended 
>> recipient is prohibited. If you received this in error, please 
>> contact the sender and destroy any copies of this information.
>>
>>
>>
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
>
>
> --
> Cheers,
> Brad



--
Cheers,
Brad
___
ceph-users mailing

[ceph-users] Ceph Tech Talk Cancelled

2017-08-23 Thread Leonardo Vaz
Hey Cephers,

Sorry for the short notice, but the Ceph Tech Talk for August (scheduled
for today) has been cancelled.

Kindest regards,

Leo

-- 
Leonardo Vaz
Ceph Community Manager
Open Source and Standards Team
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Moderator?

2017-08-23 Thread Eric Renfro
I sent a message almost 2 days ago, with pasted logs. Since then, it’s been 
in the moderator’s queue and still not approved (or even declined). 

Is anyone actually checking that? ;)

Eric Renfro
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Cephfs user path permissions luminous

2017-08-23 Thread John Spray
On Wed, Aug 23, 2017 at 3:13 PM, Marc Roos  wrote:
>
>
> ceph fs authorize cephfs client.bla /bla rw
>
> Will generate a user with these permissions
>
> [client.bla]
> caps mds = "allow rw path=/bla"
> caps mon = "allow r"
> caps osd = "allow rw pool=fs_data"
>
> With those permissions I cannot mount, I get a permission denied, until
> I change the permissions to eg. These:
>
> caps mds = "allow r, allow rw path=/bla"
> caps mon = "allow r"
> caps osd = "allow rwx pool=fs_meta,allow rwx pool=fs_data"
>
> Are these the minimum required permissions for mounting? I guess this
> should also be updated for ceph fs authorize?

I'm guessing you're using an older kernel client -- the older client
always tries to read the / inode even if it is mounting a subpath, so
it needed that "allow r" workaround.  I don't think there's a neat way to
accommodate that with the new "fs authorize" hotness, but this
probably deserves a little warning box in the documentation.
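
For reference, the workaround caps from the original mail can be applied to the
existing key with something like this (client name, path and pool names as in
the mail above):

    ceph auth caps client.bla \
        mds 'allow r, allow rw path=/bla' \
        mon 'allow r' \
        osd 'allow rwx pool=fs_meta,allow rwx pool=fs_data'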

John

>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Cephfs user path permissions luminous

2017-08-23 Thread Marc Roos


ceph fs authorize cephfs client.bla /bla rw

Will generate a user with these permissions 

[client.bla]
caps mds = "allow rw path=/bla"
caps mon = "allow r"
caps osd = "allow rw pool=fs_data"

With those permissions I cannot mount, I get a permission denied, until 
I change the permissions to eg. These:

caps mds = "allow r, allow rw path=/bla"
caps mon = "allow r"
caps osd = "allow rwx pool=fs_meta,allow rwx pool=fs_data"

Are these the minimum required permissions for mounting? I guess this 
should also be updated for ceph fs authorize?

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Anybody gotten boto3 and ceph RGW working?

2017-08-23 Thread Bryan Banister
Looks like I found the problem:  
https://github.com/snowflakedb/snowflake-connector-python/issues/1

I’ll try the fixed version of botocore 1.4.87+,
-Bryan

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Bryan 
Banister
Sent: Wednesday, August 23, 2017 9:01 AM
To: Abhishek Lekshmanan ; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Anybody gotten boto3 and ceph RGW working?

Note: External Email


Here is the error I get:



# python3 boto3_test.py

Traceback (most recent call last):

  File "boto3_test.py", line 15, in 

for bucket in s3.list_buckets():

  File 
"/jump/software/rhel7/python36_botocore-1.4.85/lib/python3.6/site-packages/botocore/client.py",
 line 251, in _api_call

return self._make_api_call(operation_name, kwargs)

  File 
"/jump/software/rhel7/python36_botocore-1.4.85/lib/python3.6/site-packages/botocore/client.py",
 line 526, in _make_api_call

operation_model, request_dict)

  File 
"/jump/software/rhel7/python36_botocore-1.4.85/lib/python3.6/site-packages/botocore/endpoint.py",
 line 141, in make_request

return self._send_request(request_dict, operation_model)

  File 
"/jump/software/rhel7/python36_botocore-1.4.85/lib/python3.6/site-packages/botocore/endpoint.py",
 line 170, in _send_request

success_response, exception):

  File 
"/jump/software/rhel7/python36_botocore-1.4.85/lib/python3.6/site-packages/botocore/endpoint.py",
 line 249, in _needs_retry

caught_exception=caught_exception, request_dict=request_dict)

  File 
"/jump/software/rhel7/python36_botocore-1.4.85/lib/python3.6/site-packages/botocore/hooks.py",
 line 227, in emit

return self._emit(event_name, kwargs)

  File 
"/jump/software/rhel7/python36_botocore-1.4.85/lib/python3.6/site-packages/botocore/hooks.py",
 line 210, in _emit

response = handler(**kwargs)

  File 
"/jump/software/rhel7/python36_botocore-1.4.85/lib/python3.6/site-packages/botocore/retryhandler.py",
 line 183, in __call__

if self._checker(attempts, response, caught_exception):

  File 
"/jump/software/rhel7/python36_botocore-1.4.85/lib/python3.6/site-packages/botocore/retryhandler.py",
 line 251, in __call__

caught_exception)

  File 
"/jump/software/rhel7/python36_botocore-1.4.85/lib/python3.6/site-packages/botocore/retryhandler.py",
 line 269, in _should_retry

return self._checker(attempt_number, response, caught_exception)

  File 
"/jump/software/rhel7/python36_botocore-1.4.85/lib/python3.6/site-packages/botocore/retryhandler.py",
 line 317, in __call__

caught_exception)

  File 
"/jump/software/rhel7/python36_botocore-1.4.85/lib/python3.6/site-packages/botocore/retryhandler.py",
 line 223, in __call__

attempt_number, caught_exception)

  File 
"/jump/software/rhel7/python36_botocore-1.4.85/lib/python3.6/site-packages/botocore/retryhandler.py",
 line 359, in _check_caught_exception

raise caught_exception

  File 
"/jump/software/rhel7/python36_botocore-1.4.85/lib/python3.6/site-packages/botocore/endpoint.py",
 line 204, in _get_response

proxies=self.proxies, timeout=self.timeout)

  File 
"/jump/software/rhel7/python36_botocore-1.4.85/lib/python3.6/site-packages/botocore/vendored/requests/sessions.py",
 line 573, in send

r = adapter.send(request, **kwargs)

  File 
"/jump/software/rhel7/python36_botocore-1.4.85/lib/python3.6/site-packages/botocore/vendored/requests/adapters.py",
 line 370, in send

timeout=timeout

  File 
"/jump/software/rhel7/python36_botocore-1.4.85/lib/python3.6/site-packages/botocore/vendored/requests/packages/urllib3/connectionpool.py",
 line 544, in urlopen

body=body, headers=headers)

  File 
"/jump/software/rhel7/python36_botocore-1.4.85/lib/python3.6/site-packages/botocore/vendored/requests/packages/urllib3/connectionpool.py",
 line 349, in _make_request

conn.request(method, url, **httplib_request_kw)

  File "/jump/software/rhel7/Python-3.6.0/lib/python3.6/http/client.py", line 
1239, in request

self._send_request(method, url, body, headers, encode_chunked)

TypeError: _send_request() takes 5 positional arguments but 6 were given



Here is the simple code:

import boto3



access_key = ""

secret_key = ""

gateway = "http://carf-ceph-osd15";



s3 = boto3.client('s3', 'us-east-1',

aws_access_key_id=access_key,

aws_secret_access_key=secret_key,

endpoint_url=gateway,

use_ssl=False

   )

# config=boto3.session.Config(signature_version='s3v2')



for bucket in s3.list_buckets():

for key in bucket.objects.all():

print(key.key)



Thanks in advance for any help!!

-Bryan



-Original Message-
From: Abhishek Lekshmanan [mailto:abhis...@suse.com]
Sent: Wednesday, August 23, 2017 4:07 AM
To: Bryan Banister <bbanis...@jumptrading.com>; 
ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Anybody gotten boto3 and ceph RGW working?



Note: External Email


Re: [ceph-users] Anybody gotten boto3 and ceph RGW working?

2017-08-23 Thread Bryan Banister
Here is the error I get:



# python3 boto3_test.py

Traceback (most recent call last):

  File "boto3_test.py", line 15, in 

for bucket in s3.list_buckets():

  File 
"/jump/software/rhel7/python36_botocore-1.4.85/lib/python3.6/site-packages/botocore/client.py",
 line 251, in _api_call

return self._make_api_call(operation_name, kwargs)

  File 
"/jump/software/rhel7/python36_botocore-1.4.85/lib/python3.6/site-packages/botocore/client.py",
 line 526, in _make_api_call

operation_model, request_dict)

  File 
"/jump/software/rhel7/python36_botocore-1.4.85/lib/python3.6/site-packages/botocore/endpoint.py",
 line 141, in make_request

return self._send_request(request_dict, operation_model)

  File 
"/jump/software/rhel7/python36_botocore-1.4.85/lib/python3.6/site-packages/botocore/endpoint.py",
 line 170, in _send_request

success_response, exception):

  File 
"/jump/software/rhel7/python36_botocore-1.4.85/lib/python3.6/site-packages/botocore/endpoint.py",
 line 249, in _needs_retry

caught_exception=caught_exception, request_dict=request_dict)

  File 
"/jump/software/rhel7/python36_botocore-1.4.85/lib/python3.6/site-packages/botocore/hooks.py",
 line 227, in emit

return self._emit(event_name, kwargs)

  File 
"/jump/software/rhel7/python36_botocore-1.4.85/lib/python3.6/site-packages/botocore/hooks.py",
 line 210, in _emit

response = handler(**kwargs)

  File 
"/jump/software/rhel7/python36_botocore-1.4.85/lib/python3.6/site-packages/botocore/retryhandler.py",
 line 183, in __call__

if self._checker(attempts, response, caught_exception):

  File 
"/jump/software/rhel7/python36_botocore-1.4.85/lib/python3.6/site-packages/botocore/retryhandler.py",
 line 251, in __call__

caught_exception)

  File 
"/jump/software/rhel7/python36_botocore-1.4.85/lib/python3.6/site-packages/botocore/retryhandler.py",
 line 269, in _should_retry

return self._checker(attempt_number, response, caught_exception)

  File 
"/jump/software/rhel7/python36_botocore-1.4.85/lib/python3.6/site-packages/botocore/retryhandler.py",
 line 317, in __call__

caught_exception)

  File 
"/jump/software/rhel7/python36_botocore-1.4.85/lib/python3.6/site-packages/botocore/retryhandler.py",
 line 223, in __call__

attempt_number, caught_exception)

  File 
"/jump/software/rhel7/python36_botocore-1.4.85/lib/python3.6/site-packages/botocore/retryhandler.py",
 line 359, in _check_caught_exception

raise caught_exception

  File 
"/jump/software/rhel7/python36_botocore-1.4.85/lib/python3.6/site-packages/botocore/endpoint.py",
 line 204, in _get_response

proxies=self.proxies, timeout=self.timeout)

  File 
"/jump/software/rhel7/python36_botocore-1.4.85/lib/python3.6/site-packages/botocore/vendored/requests/sessions.py",
 line 573, in send

r = adapter.send(request, **kwargs)

  File 
"/jump/software/rhel7/python36_botocore-1.4.85/lib/python3.6/site-packages/botocore/vendored/requests/adapters.py",
 line 370, in send

timeout=timeout

  File 
"/jump/software/rhel7/python36_botocore-1.4.85/lib/python3.6/site-packages/botocore/vendored/requests/packages/urllib3/connectionpool.py",
 line 544, in urlopen

body=body, headers=headers)

  File 
"/jump/software/rhel7/python36_botocore-1.4.85/lib/python3.6/site-packages/botocore/vendored/requests/packages/urllib3/connectionpool.py",
 line 349, in _make_request

conn.request(method, url, **httplib_request_kw)

  File "/jump/software/rhel7/Python-3.6.0/lib/python3.6/http/client.py", line 
1239, in request

self._send_request(method, url, body, headers, encode_chunked)

TypeError: _send_request() takes 5 positional arguments but 6 were given



Here is the simple code:

import boto3



access_key = ""

secret_key = ""

gateway = "http://carf-ceph-osd15";



s3 = boto3.client('s3', 'us-east-1',

aws_access_key_id=access_key,

aws_secret_access_key=secret_key,

endpoint_url=gateway,

use_ssl=False

   )

# config=boto3.session.Config(signature_version='s3v2')



for bucket in s3.list_buckets():

for key in bucket.objects.all():

print(key.key)



Thanks in advance for any help!!

-Bryan



-Original Message-
From: Abhishek Lekshmanan [mailto:abhis...@suse.com]
Sent: Wednesday, August 23, 2017 4:07 AM
To: Bryan Banister ; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Anybody gotten boto3 and ceph RGW working?



Note: External Email

-



Bryan Banister <bbanis...@jumptrading.com> writes:



> Hello,

>

> I have the boto python API working with our ceph cluster but haven't figured 
> out a way to get boto3 to communicate yet to our RGWs.  Anybody have a simple 
> example?



I just use the client interface as described in

http://boto3.readthedocs.io/en/latest/reference/services/s3.html



so something like::



s3 = boto3.client('s3','us-east-1', endpoint_url='http://',

   aws_access_key_i

Re: [ceph-users] OSD doesn't always start at boot

2017-08-23 Thread David Turner
This isn't a solution to fix them not starting at boot time, but a fix to
not having to reboot the node again.  `ceph-disk activate-all` should go
through and start up the rest of your osds without another reboot.
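
A sketch of that, plus a way to re-trigger the udev rules without rebooting
(assumes the stock ceph-disk/udev activation path):

    # Start any prepared-but-unactivated OSDs in place:
    ceph-disk activate-all
    # Or ask udev to replay its "add" events for block devices:
    udevadm trigger --subsystem-match=block --action=add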

On Wed, Aug 23, 2017 at 9:36 AM Sean Purdy  wrote:

> Hi,
>
> Luminous 12.1.1
>
> I've had a couple of servers where at cold boot time, one or two of the
> OSDs haven't mounted/been detected.  Or been partially detected.  These are
> luminous Bluestore OSDs.  Often a warm boot fixes it, but I'd rather not
> have to reboot the node again.
>
> Sometimes /var/lib/ceph/osd/ceph-NN is empty - i.e. not mounted.  And
> sometimes /var/lib/ceph/osd/ceph-NN is mounted, but the
> /var/lib/ceph/osd/ceph-NN/block symlink is pointing to a /dev/mapper UUID
> path that doesn't exist.  Those partitions have to be mounted before
> "systemctl start ceph-osd@NN.service" will work.
>
> What happens at disk detect and mount time?  Is there a timeout somewhere
> I can extend?
>
> How can I tell udev to have another go at mounting the disks?
>
> If it's in the docs and I've missed it, apologies.
>
>
> Thanks in advance,
>
> Sean Purdy
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-fuse hanging on df with ceph luminous >= 12.1.3

2017-08-23 Thread John Spray
On Tue, Aug 22, 2017 at 1:37 AM, Alessandro De Salvo
 wrote:
> Hi,
>
> when trying to use df on a ceph-fuse mounted cephfs filesystem with ceph
> luminous >= 12.1.3 I'm having hangs with the following kind of messages in
> the logs:
>
>
> 2017-08-22 02:20:51.094704 7f80addb7700  0 client.174216 ms_handle_reset on
> 192.168.0.10:6789/0
>
>
> The logs are only showing this type of messages and nothing more useful. The
> only possible way to resume the operations is to kill ceph-fuse and remount.
> Only df is hanging though, while file operations, like copy/rm/ls are
> working as expected.
>
> This behavior is only shown for ceph >= 12.1.3, while for example ceph-fuse
> on 12.1.2 works.
>
> Anyone has seen the same problems? Any help is highly appreciated.

I am seeing this problem too after updating to 12.1.3.  Opened
http://tracker.ceph.com/issues/21078

John

>
> Thanks,
>
>
>  Alessandro
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] OSD doesn't always start at boot

2017-08-23 Thread Sean Purdy
Hi,

Luminous 12.1.1

I've had a couple of servers where at cold boot time, one or two of the OSDs 
haven't mounted/been detected.  Or been partially detected.  These are luminous 
Bluestore OSDs.  Often a warm boot fixes it, but I'd rather not have to reboot 
the node again.

Sometimes /var/lib/ceph/osd/ceph-NN is empty - i.e. not mounted.  And sometimes 
/var/lib/ceph/osd/ceph-NN is mounted, but the /var/lib/ceph/osd/ceph-NN/block 
symlink is pointing to a /dev/mapper UUID path that doesn't exist.  Those 
partitions have to be mounted before "systemctl start ceph-osd@NN.service" will 
work.

What happens at disk detect and mount time?  Is there a timeout somewhere I can 
extend?

How can I tell udev to have another go at mounting the disks?

If it's in the docs and I've missed it, apologies.


Thanks in advance,

Sean Purdy
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] [SSD NVM FOR JOURNAL] Performance issues

2017-08-23 Thread Guilherme Steinmüller
Hello!

I recently installed INTEL SSD 400GB 750 SERIES PCIE 3.0 X4 in 3 of my OSD
nodes.

First of all, here is a schema describing how my cluster is laid out:

[image: inline image 1]

[image: inline image 2]

I primarily use my ceph as a backend for OpenStack nova, glance, swift and
cinder. My crushmap is configured to have rulesets for SAS disks, SATA
disks and another ruleset that resides on the HPE nodes, also using SATA disks.

Before installing the new journal in the HPE nodes, I was using one of the
disks that today are OSDs (osd.35, osd.34 and osd.33). After upgrading the
journal, I noticed that a dd command writing 1 GB blocks in OpenStack nova
instances doubled the throughput, but the value I expected was actually 400%
or 500%, since that is roughly the throughput we see on the Dell nodes that
host another nova pool.

Here is a demonstration of the scenario and the difference in performance
between Dell nodes and HPE nodes:



Scenario:


   - Using pools to store instance disks for OpenStack


   - Pool nova in "ruleset SAS" placed on c4-osd201, c4-osd202 and
   c4-osd203 with 5 osds per hosts


   - Pool nova_hpedl180 in "ruleset NOVA_HPEDL180" placed on c4-osd204,
   c4-osd205, c4-osd206 with 3 osds per hosts


   - Every OSD has one partition of 35GB in a INTEL SSD 400GB 750
   SERIES PCIE 3.0 X4


   - Internal link for cluster and public network of 10Gbps


   - Deployment via ceph-ansible. Same configuration define in ansible
   for every host on cluster



*Instance on pool nova in ruleset SAS:*


   # dd if=/dev/zero of=/mnt/bench bs=1G count=1 oflag=direct
   1+0 records in
   1+0 records out
   1073741824 bytes (1.1 GB, 1.0 GiB) copied, 2.56255 s, 419 MB/s


*Instance on pool nova in ruleset NOVA_HPEDL180:*

 #  dd if=/dev/zero of=/mnt/bench bs=1G count=1 oflag=direct
 1+0 records in
 1+0 records out
 1073741824 bytes (1.1 GB, 1.0 GiB) copied, 11.8243 s, 90.8 MB/s


I made some FIO benchmarks as suggested by Sebastien (
https://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-
test-if-your-ssd-is-suitable-as-a-journal-device/ ) and the command with 1
job returned about 180MB/s of throughput on the recently installed nodes
(HPE nodes). I also ran some hdparm benchmarks on all SSDs and everything seems
normal.


I can't see what is causing this difference in throughput, since the network
is not a problem and I think that CPU and memory are not crucial: I was
monitoring the cluster with the atop command and I didn't notice any saturation
of resources. My only thought is that I have less workload on the nova_hpedl180
pool in the HPE nodes and fewer disks per node, and this can influence the
throughput of the journal.


Any clue about what is missing or what is happening?

Thanks in advance.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph cluster with SSDs

2017-08-23 Thread Christian Balzer
On Wed, 23 Aug 2017 16:48:12 +0530 M Ranga Swami Reddy wrote:

> On Mon, Aug 21, 2017 at 5:37 PM, Christian Balzer  wrote:
> > On Mon, 21 Aug 2017 17:13:10 +0530 M Ranga Swami Reddy wrote:
> >  
> >> Thank you.
> >> Here I have NVMes from Intel. but as the support of these NVMes not
> >> there from Intel, we decided not to use these NVMes as a journal.  
> >
> > You again fail to provide with specific model numbers...  
> 
> NVMe - Intel DC P3608 - 1.6TB

3 DWPD, so you could put this in front (as journal) of 30 or so of those
Samsungs and it would still last longer.

Christian

> 
> Thanks
> Swami
> 
> > No support from Intel suggests that these may be consumer models again.
> >
> > Samsung also makes DC grade SSDs and NVMEs, as Adrian pointed out.
> >  
> >> Btw, if we split this SSD with multiple OSD (for ex: 1 SSD with 4 or 2
> >> OSDs), is  this help any performance numbers?
> >>  
> > Of course not, if anything it will make it worse due to the overhead
> > outside the SSD itself.
> >
> > Christian
> >  
> >> On Sun, Aug 20, 2017 at 9:33 AM, Christian Balzer  wrote:  
> >> >
> >> > Hello,
> >> >
> >> > On Sat, 19 Aug 2017 23:22:11 +0530 M Ranga Swami Reddy wrote:
> >> >  
> >> >> SSD make details : SSD 850 EVO 2.5" SATA III 4TB Memory & Storage -
> >> >> MZ-75E4T0B/AM | Samsung
> >> >>  
> >> > And there's your answer.
> >> >
> >> > A bit of googling in the archives here would have shown you that these 
> >> > are
> >> > TOTALLY unsuitable for use with Ceph.
> >> > Not only because of the horrid speed when used with/for Ceph journaling
> >> > (direct/sync I/O) but also their abysmal endurance of 0.04 DWPD over 5
> >> > years.
> >> > Or in other words 160GB/day, which after the Ceph journal double writes
> >> > and FS journals, other overhead and write amplification in general
> >> > probably means less than an effective 40GB/day.
> >> >
> >> > In contrast the lowest endurance DC grade SSDs tend to be 0.3 DWPD and
> >> > more commonly 1 DWPD.
> >> > And I'm not buying anything below 3 DWPD for use with Ceph.
> >> >
> >> > Your only chance to improve the speed here is to take the journals off
> >> > them and put them onto fast and durable enough NVMes like the Intel DC P
> >> > 3700 or at worst 3600 types.
> >> >
> >> > That still leaves you with their crappy endurance, only twice as high 
> >> > than
> >> > before with the journals offloaded.
> >> >
> >> > Christian
> >> >  
> >> >> On Sat, Aug 19, 2017 at 10:44 PM, M Ranga Swami Reddy
> >> >>  wrote:  
> >> >> > Yes, Its in production and used the pg count as per the pg calcuator 
> >> >> > @ ceph.com.
> >> >> >
> >> >> > On Fri, Aug 18, 2017 at 3:30 AM, Mehmet  wrote:  
> >> >> >> Which ssds are used? Are they in production? If so how is your PG 
> >> >> >> Count?
> >> >> >>
> >> >> >> Am 17. August 2017 20:04:25 MESZ schrieb M Ranga Swami Reddy
> >> >> >> :  
> >> >> >>>
> >> >> >>> Hello,
> >> >> >>> I am using the Ceph cluster with HDDs and SSDs. Created separate 
> >> >> >>> pool for
> >> >> >>> each.
> >> >> >>> Now, when I ran the "ceph osd bench", HDD's OSDs show around 500 
> >> >> >>> MB/s
> >> >> >>> and SSD's OSD show around 280MB/s.
> >> >> >>>
> >> >> >>> Ideally, what I expected was - SSD's OSDs should be at-least 40% 
> >> >> >>> high
> >> >> >>> as compared with HDD's OSD bench.
> >> >> >>>
> >> >> >>> Did I miss anything here? Any hint is appreciated.
> >> >> >>>
> >> >> >>> Thanks
> >> >> >>> Swami
> >> >> >>> 
> >> >> >>>
> >> >> >>> ceph-users mailing list
> >> >> >>> ceph-users@lists.ceph.com
> >> >> >>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com  
> >> >> >>
> >> >> >>
> >> >> >> ___
> >> >> >> ceph-users mailing list
> >> >> >> ceph-users@lists.ceph.com
> >> >> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >> >> >>  
> >> >> ___
> >> >> ceph-users mailing list
> >> >> ceph-users@lists.ceph.com
> >> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >> >>  
> >> >
> >> >
> >> > --
> >> > Christian Balzer        Network/Systems Engineer
> >> > ch...@gol.com   Rakuten Communications  
> >>  
> >
> >
> > --
> > Christian Balzer        Network/Systems Engineer
> > ch...@gol.com   Rakuten Communications  
> 


-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Rakuten Communications
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph cluster with SSDs

2017-08-23 Thread M Ranga Swami Reddy
On Mon, Aug 21, 2017 at 5:37 PM, Christian Balzer  wrote:
> On Mon, 21 Aug 2017 17:13:10 +0530 M Ranga Swami Reddy wrote:
>
>> Thank you.
>> Here I have NVMes from Intel. but as the support of these NVMes not
>> there from Intel, we decided not to use these NVMes as a journal.
>
> You again fail to provide with specific model numbers...

NVMe - Intel DC P3608 - 1.6TB


Thanks
Swami

> No support from Intel suggests that these may be consumer models again.
>
> Samsung also makes DC grade SSDs and NVMEs, as Adrian pointed out.
>
>> Btw, if we split this SSD with multiple OSD (for ex: 1 SSD with 4 or 2
>> OSDs), is  this help any performance numbers?
>>
> Of course not, if anything it will make it worse due to the overhead
> outside the SSD itself.
>
> Christian
>
>> On Sun, Aug 20, 2017 at 9:33 AM, Christian Balzer  wrote:
>> >
>> > Hello,
>> >
>> > On Sat, 19 Aug 2017 23:22:11 +0530 M Ranga Swami Reddy wrote:
>> >
>> >> SSD make details : SSD 850 EVO 2.5" SATA III 4TB Memory & Storage -
>> >> MZ-75E4T0B/AM | Samsung
>> >>
>> > And there's your answer.
>> >
>> > A bit of googling in the archives here would have shown you that these are
>> > TOTALLY unsuitable for use with Ceph.
>> > Not only because of the horrid speed when used with/for Ceph journaling
>> > (direct/sync I/O) but also their abysmal endurance of 0.04 DWPD over 5
>> > years.
>> > Or in other words 160GB/day, which after the Ceph journal double writes
>> > and FS journals, other overhead and write amplification in general
>> > probably means less than an effective 40GB/day.
>> >
>> > In contrast the lowest endurance DC grade SSDs tend to be 0.3 DWPD and
>> > more commonly 1 DWPD.
>> > And I'm not buying anything below 3 DWPD for use with Ceph.
>> >
>> > Your only chance to improve the speed here is to take the journals off
>> > them and put them onto fast and durable enough NVMes like the Intel DC P
>> > 3700 or at worst 3600 types.
>> >
>> > That still leaves you with their crappy endurance, only twice as high than
>> > before with the journals offloaded.
>> >
>> > Christian
>> >
>> >> On Sat, Aug 19, 2017 at 10:44 PM, M Ranga Swami Reddy
>> >>  wrote:
>> >> > Yes, Its in production and used the pg count as per the pg calcuator @ 
>> >> > ceph.com.
>> >> >
>> >> > On Fri, Aug 18, 2017 at 3:30 AM, Mehmet  wrote:
>> >> >> Which ssds are used? Are they in production? If so how is your PG 
>> >> >> Count?
>> >> >>
>> >> >> Am 17. August 2017 20:04:25 MESZ schrieb M Ranga Swami Reddy
>> >> >> :
>> >> >>>
>> >> >>> Hello,
>> >> >>> I am using the Ceph cluster with HDDs and SSDs. Created separate pool 
>> >> >>> for
>> >> >>> each.
>> >> >>> Now, when I ran the "ceph osd bench", HDD's OSDs show around 500 MB/s
>> >> >>> and SSD's OSD show around 280MB/s.
>> >> >>>
>> >> >>> Ideally, what I expected was - SSD's OSDs should be at-least 40% high
>> >> >>> as compared with HDD's OSD bench.
>> >> >>>
>> >> >>> Did I miss anything here? Any hint is appreciated.
>> >> >>>
>> >> >>> Thanks
>> >> >>> Swami
>> >> >>> 
>> >> >>>
>> >> >>> ceph-users mailing list
>> >> >>> ceph-users@lists.ceph.com
>> >> >>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >> >>
>> >> >>
>> >> >> ___
>> >> >> ceph-users mailing list
>> >> >> ceph-users@lists.ceph.com
>> >> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >> >>
>> >> ___
>> >> ceph-users mailing list
>> >> ceph-users@lists.ceph.com
>> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >>
>> >
>> >
>> > --
>> > Christian Balzer        Network/Systems Engineer
>> > ch...@gol.com   Rakuten Communications
>>
>
>
> --
> Christian Balzer        Network/Systems Engineer
> ch...@gol.com   Rakuten Communications
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Blocked requests problem

2017-08-23 Thread Ramazan Terzi
Finally problem solved.

First, I set the noscrub, nodeep-scrub, norebalance, nobackfill, norecover, noup
and nodown flags. Then I restarted the OSD which had the problem.
When the OSD daemon started, blocked requests increased (up to 100) and some
misplaced PGs appeared. Then I unset the flags in the order noup, nodown,
norecover, nobackfill, norebalance.
In a little while, all misplaced PGs were repaired. Then I unset the noscrub and
nodeep-scrub flags. And finally: HEALTH_OK.
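
For the record, as a shell sketch the sequence was roughly this (the OSD id is
just an example; adjust it to the OSD that is misbehaving):

   for f in noscrub nodeep-scrub norebalance nobackfill norecover noup nodown; do
       ceph osd set $f
   done
   systemctl restart ceph-osd@31.service
   # once the OSD is back up, unset in the order described above
   for f in noup nodown norecover nobackfill norebalance; do
       ceph osd unset $f
   done
   # after the misplaced PGs have recovered
   ceph osd unset noscrub
   ceph osd unset nodeep-scrub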

Thanks for your help,
Ramazan

> On 22 Aug 2017, at 20:46, Ranjan Ghosh  wrote:
> 
> Hm. That's quite weird. On our cluster, when I set "noscrub", "nodeep-scrub", 
> scrubbing will always stop pretty quickly (a few minutes). I wonder why this 
> doesn't happen on your cluster. When exactly did you set the flag? Perhaps it 
> just needs some more time... Or there might be a disk problem why the 
> scrubbing never finishes. Perhaps it's really a good idea, just like you 
> proposed, to shut down the corresponding OSDs. But that's just my thoughts. 
> Perhaps some Ceph pro can shed some light on the possible reasons, why a 
> scrubbing might get stuck and how to resolve this.
> 
> 
> Am 22.08.2017 um 18:58 schrieb Ramazan Terzi:
>> Hi Ranjan,
>> 
>> Thanks for your reply. I did set the noscrub and nodeep-scrub flags. But the active 
>> scrubbing operation isn't working properly. The scrubbing operation is always stuck on the 
>> same pg (20.1e).
>> 
>> $ ceph pg dump | grep scrub
>> dumped all in format plain
>> pg_stat  objects mip degrmispunf bytes   log disklog 
>> state   state_stamp v   reportedup  up_primary  
>> acting  acting_primary  last_scrub  scrub_stamp last_deep_scrub 
>> deep_scrub_stamp
>> 20.1e25189   0   0   0   0   98359116362 3048
>> 3048active+clean+scrubbing  2017-08-21 04:55:13.354379  
>> 6930'2393   6930:20949058   [29,31,3]   29  [29,31,3]   29   
>>6712'22950171   2017-08-20 04:46:59.208792  6712'22950171   
>> 2017-08-20 04:46:59.208792
>> 
>> 
>> $ ceph -s
>> cluster 
>>  health HEALTH_WARN
>> 33 requests are blocked > 32 sec
>> noscrub,nodeep-scrub flag(s) set
>>  monmap e9: 3 mons at 
>> {ceph-mon01=**:6789/0,ceph-mon02=**:6789/0,ceph-mon03=**:6789/0}
>> election epoch 84, quorum 0,1,2 ceph-mon01,ceph-mon02,ceph-mon03
>>  osdmap e6930: 36 osds: 36 up, 36 in
>> flags noscrub,nodeep-scrub,sortbitwise,require_jewel_osds
>>   pgmap v17667617: 1408 pgs, 5 pools, 24779 GB data, 6494 kobjects
>> 70497 GB used, 127 TB / 196 TB avail
>> 1407 active+clean
>>1 active+clean+scrubbing
>> 
>> 
>> Thanks,
>> Ramazan
>> 
>> 
>>> On 22 Aug 2017, at 18:52, Ranjan Ghosh  wrote:
>>> 
>>> Hi Ramazan,
>>> 
>>> I'm no Ceph expert, but what I can say from my experience using Ceph is:
>>> 
>>> 1) During "Scrubbing", Ceph can be extremely slow. This is probably where 
>>> your "blocked requests" are coming from. BTW: Perhaps you can even find out 
>>> which processes are currently blocking with: ps aux | grep "D". You might 
>>> even want to kill some of those and/or shutdown services in order to 
>>> relieve some stress from the machine until it recovers.
>>> 
>>> 2) I usually have the following in my ceph.conf. This lets the scrubbing 
>>> only run between midnight and 6 AM (hopefully the time of least demand; 
>>> adjust as necessary)  - and with the lowest priority.
>>> 
>>> #Reduce impact of scrub.
>>> osd_disk_thread_ioprio_priority = 7
>>> osd_disk_thread_ioprio_class = "idle"
>>> osd_scrub_end_hour = 6
>>> 
>>> 3) The Scrubbing begin and end hour will always work. The low priority 
>>> mode, however, works (AFAIK!) only with CFQ I/O Scheduler. Show your 
>>> current scheduler like this (replace sda with your device):
>>> 
>>> cat /sys/block/sda/queue/scheduler
>>> 
>>> You can also echo to this file to set a different scheduler.
>>> 
>>> 
>>> With these settings you can perhaps alleviate the problem so far, that the 
>>> scrubbing runs over many nights until it finished. Again, AFAIK, it doesnt 
>>> have to finish in one night. It will continue the next night and so on.
>>> 
>>> The Ceph experts say scrubbing is important. Don't know why, but I just 
>>> believe them. They've built this complex stuff after all :-)
>>> 
>>> Thus, you can use "noscrub"/"nodeepscrub" to quickly get a hung server back 
>>> to work, but you should not let it run like this forever and a day.
>>> 
>>> Hope this helps at least a bit.
>>> 
>>> BR,
>>> 
>>> Ranjan
>>> 
>>> 
>>> Am 22.08.2017 um 15:20 schrieb Ramazan Terzi:
 Hello,
 
 I have a Ceph Cluster with specifications below:
 3 x Monitor node
 6 x Storage Node (6 disk per Storage Node, 6TB SATA Disks, all disks have 
 SSD journals)
 Distributed public and private networks. All NICs are 10Gbit/s
 osd pool default size = 3
 osd pool default min size = 2
 
 Ceph v

Re: [ceph-users] cluster unavailable for 20 mins when downed server was reintroduced

2017-08-23 Thread Sean Purdy
On Tue, 15 Aug 2017, Sean Purdy said:
> Luminous 12.1.1 rc1
> 
> Hi,
> 
> 
> I have a three node cluster with 6 OSD and 1 mon per node.
> 
> I had to turn off one node for rack reasons.  While the node was down, the 
> cluster was still running and accepting files via radosgw.  However, when I 
> turned the machine back on, radosgw uploads stopped working and things like 
> "ceph status" starting timed out.  It took 20 minutes for "ceph status" to be 
> OK.  

Well I've figured out why "ceph status" was hanging (and possibly radosgw).  It 
seems that the ceph utility looks at ceph.conf to find a monitor to connect to (or 
at least that's what strace implied), but our ceph.conf only had one monitor 
out of three actually listed in the file.  And that was the node I turned off.  
Updating mon_initial_members and mon_host with the other two monitors worked.
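
For anyone else hitting this, the relevant bit of ceph.conf now looks roughly
like the sketch below (monitor names and addresses here are placeholders, not
our real ones):

   [global]
   mon_initial_members = ceph-mon01, ceph-mon02, ceph-mon03
   mon_host = 192.168.1.11, 192.168.1.12, 192.168.1.13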

TBF, 
https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/1.3/html/administration_guide/managing_cluster_size
 does mention you should add your second and third monitors here.  But I hadn't 
read that, and elsewhere I read that on boot the monitors will discover other 
monitors, so I thought you didn't need to list them all.  e.g. 
http://docs.ceph.com/docs/master/rados/operations/add-or-rm-mons/#changing-a-monitor-s-ip-address
 (which also says clients use ceph.conf to find monitors - I missed that part).

Anyway, I'll do a few more tests with a better ceph.conf


Sean Purdy
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Blocked requests problem

2017-08-23 Thread Manuel Lausch
Hi,

Sometimes we have the same issue on our 10.2.9 cluster (24 nodes with 60
OSDs each).

I think there is some race condition or something like that
which results in this state. The blocked requests start exactly at
the time the PG begins to scrub.

You can try the following. The OSD will automatically recover and the
blocked requests will disappear.

ceph osd down 31 


In my opinion this is a bug, but I have not investigated it so far. Maybe
a developer can say something about this issue.


Regards,
Manuel


Am Tue, 22 Aug 2017 16:20:14 +0300
schrieb Ramazan Terzi :

> Hello,
> 
> I have a Ceph Cluster with specifications below:
> 3 x Monitor node
> 6 x Storage Node (6 disk per Storage Node, 6TB SATA Disks, all disks
> have SSD journals) Distributed public and private networks. All NICs
> are 10Gbit/s osd pool default size = 3
> osd pool default min size = 2
> 
> Ceph version is Jewel 10.2.6.
> 
> My cluster is active and a lot of virtual machines running on it
> (Linux and Windows VM's, database clusters, web servers etc).
> 
> During normal use, cluster slowly went into a state of blocked
> requests. Blocked requests periodically incrementing. All OSD's seems
> healthy. Benchmark, iowait, network tests, all of them succeed.
> 
> Yerterday, 08:00:
> $ ceph health detail
> HEALTH_WARN 3 requests are blocked > 32 sec; 3 osds have slow requests
> 1 ops are blocked > 134218 sec on osd.31
> 1 ops are blocked > 134218 sec on osd.3
> 1 ops are blocked > 8388.61 sec on osd.29
> 3 osds have slow requests
> 
> Todat, 16:05:
> $ ceph health detail
> HEALTH_WARN 32 requests are blocked > 32 sec; 3 osds have slow
> requests 1 ops are blocked > 134218 sec on osd.31
> 1 ops are blocked > 134218 sec on osd.3
> 16 ops are blocked > 134218 sec on osd.29
> 11 ops are blocked > 67108.9 sec on osd.29
> 2 ops are blocked > 16777.2 sec on osd.29
> 1 ops are blocked > 8388.61 sec on osd.29
> 3 osds have slow requests
> 
> $ ceph pg dump | grep scrub
> dumped all in format plain
> pg_stat   objects mip degrmisp
> unf   bytes   log disklog state
> state_stamp   v   reportedup
> up_primaryacting  acting_primary
> last_scrubscrub_stamp last_deep_scrub
> deep_scrub_stamp 20.1e25183   0   0   0
> 0 98332537930 30663066
> active+clean+scrubbing2017-08-21 04:55:13.354379
> 6930'23908781 6930:20905696   [29,31,3]   29
> [29,31,3] 29  6712'22950171   2017-08-20
> 04:46:59.208792   6712'22950171   2017-08-20 04:46:59.208792
> 
> Active scrub does not finish (about 24 hours). I did not restart any
> OSD meanwhile. I'm thinking set noscrub, noscrub-deep, norebalance,
> nobackfill, and norecover flags and restart 3,29,31th OSDs. Is this
> solve my problem? Or anyone has suggestion about this problem?
> 
> Thanks,
> Ramazan
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
Manuel Lausch

Systemadministrator
Cloud Services

1&1 Mail & Media Development & Technology GmbH | Brauerstraße 48 |
76135 Karlsruhe | Germany Phone: +49 721 91374-1847
E-Mail: manuel.lau...@1und1.de | Web: www.1und1.de

Amtsgericht Montabaur, HRB 5452

Geschäftsführer: Thomas Ludwig, Jan Oetjen


Member of United Internet

This e-mail may contain confidential and/or privileged information. If
you are not the intended recipient of this e-mail, you are hereby
notified that saving, distribution or use of the content of this e-mail
in any way is prohibited. If you have received this e-mail in error,
please notify the sender and delete the e-mail.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] OSDs in EC pool flapping

2017-08-23 Thread george.vasilakakos
No, nothing like that.

The cluster is in the process of having more OSDs added and, while that was 
ongoing, one was removed because the underlying disk was throwing up a bunch of 
read errors.
Shortly after, the first three OSDs in this PG started crashing with error 
messages about corrupted EC shards. We seemed to be running into 
http://tracker.ceph.com/issues/18624 so we moved on to 11.2.1 which essentially 
means they now fail with a different error message. Our problem looks a bit 
like this: http://tracker.ceph.com/issues/18162

For a bit more context here's two more events going backwards in the dump:


-3> 2017-08-22 17:42:09.443216 7fa2e283d700  0 osd.1290 pg_epoch: 73324 
pg[1.138s0( v 73085'430014 (62760'421568,73
085'430014] local-les=73323 n=22919 ec=764 les/c/f 73323/72881/0 
73321/73322/73322) [1290,927,672,456,177,1094,194,1513
,236,302,1326]/[1290,927,672,456,177,1094,194,2147483647,236,302,1326] r=0 
lpr=73322 pi=72880-73321/179 rops=1 bft=1513
(7) crt=73085'430014 lcod 0'0 mlcod 0'0 
active+undersized+degraded+remapped+backfilling] failed_push 1:1c959fdd:::datad
isk%2frucio%2fmc16_13TeV%2f41%2f30%2fAOD.11927271._003020.pool.root.1.:head
 from shard 177(4), reps on 
 unfound? 0
-2> 2017-08-22 17:42:09.443299 7fa2e283d700  5 -- op tracker -- seq: 490, 
time: 2017-08-22 17:42:09.443297, event: 
done, op: MOSDECSubOpReadReply(1.138s0 73324 ECSubReadReply(tid=5, 
attrs_read=0))

No amount of taking OSDs out or restarting them fixes it. At this point the
first 3 have been marked out by Ceph: they flapped enough that systemd gave up
trying to restart them, they stayed down long enough, and
mon_osd_down_out_interval expired. Now the pg map looks like this:

# ceph pg map 1.138
osdmap e73599 pg 1.138 (1.138) -> up 
[111,1325,437,456,177,1094,194,1513,236,302,1326] acting 
[2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,1326]

Looking at #18162, it describes a lot of what we're seeing in our production
system (which is experiencing a service outage because of this), but the fact
that the issue is marked as minor severity and hasn't had any updates in two
months is disconcerting.

As for deep scrubbing, it sounds like it could possibly work in a general
corruption situation, but not with a PG stuck in down+remapped and its first 3
OSDs crashing out after 5 minutes of operation.


Thanks, 

George



From: Paweł Woszuk [pwos...@man.poznan.pl]
Sent: 22 August 2017 19:19
To: ceph-users@lists.ceph.com; Vasilakakos, George (STFC,RAL,SC)
Subject: Re: [ceph-users] OSDs in EC pool flapping

Have you experienced huge memory consumption by the flapping OSD daemons? The
restarts could be triggered by running out of memory (OOM killer).

If yes, this could be connected with an OSD device error (bad blocks?), but we've
experienced something similar on Jewel, not the Kraken release. The solution was to
find the PG that causes the error, set it to deep scrub manually, and restart the
PG's primary OSD.

Hope that helps, or at least leads to some solution.
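
In shell terms, something along these lines (using the PG and primary OSD from
the output above purely as an example):

   ceph pg deep-scrub 1.138
   systemctl restart ceph-osd@1290.service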



On 22 August 2017 18:39:47 CEST, george.vasilaka...@stfc.ac.uk wrote:

Hey folks,


I'm staring at a problem that I have found no solution for and which is causing 
major issues.
We've had a PG go down with the first 3 OSDs all crashing and coming back only 
to crash again with the following error in their logs:

-1> 2017-08-22 17:27:50.961633 7f4af4057700 -1 osd.1290 pg_epoch: 72946 
pg[1.138s0( v 72946'430011 (62760'421568,72
946'430011] local-les=72945 n=22918 ec=764 les/c/f 72945/72881/0 
72942/72944/72944) [1290,927,672,456,177,1094,194,1513
,236,302,1326]/[1290,927,672,456,177,1094,194,2147483647,236,302,1326] r=0 
lpr=72944 pi=72880-72943/24 bft=1513(7) crt=
72946'430011 lcod 72889'430010 mlcod 72889'430010 
active+undersized+degraded+remapped+backfilling] recover_replicas: ob
ject added to missing set for backfill, but is not in recovering, error!
 0> 2017-08-22 17:27:50.965861 7f4af4057700 -1 *** Caught signal (Aborted) 
**
 in thread 7f4af4057700 thread_name:tp_osd_tp

This has been going on over the weekend when we saw a different error message 
before upgrading from 11.2.0 to 11.2.1.
The pool is running EC 8+3.

The OSDs crash with that error only to be restarted by systemd and fail again
in exactly the same way. Eventually systemd gives up, the mon_osd_down_out_interval
expires and the PG just stays down+remapped while others recover and go
active+clean.

Can anybody help with this type of problem?


Best regards,

George Vasilakakos

ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com





Paweł Woszuk
PCSS, Poznańskie Centrum Superkomputerowo-Sieciowe
ul. Jana Pawła II nr 10, 61-139 Poznań
Polska


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Anybody gotten boto3 and ceph RGW working?

2017-08-23 Thread Abhishek Lekshmanan
Bryan Banister  writes:

> Hello,
>
> I have the boto python API working with our ceph cluster but haven't figured 
> out a way to get boto3 to communicate yet to our RGWs.  Anybody have a simple 
> example?

 I just use the client interface as described in
 http://boto3.readthedocs.io/en/latest/reference/services/s3.html

 so something like::

 s3 = boto3.client('s3','us-east-1', endpoint_url='http://',
   aws_access_key_id = 'access',
   aws_secret_access_key = 'secret')

 s3.create_bucket(Bucket='foobar')
 s3.put_object(Bucket='foobar',Key='foo',Body='foo')

-- 
Abhishek Lekshmanan
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton,
HRB 21284 (AG Nürnberg)
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] PG reported as inconsistent in status, but no inconsistencies visible to rados

2017-08-23 Thread Brad Hubbard
On Wed, Aug 23, 2017 at 12:47 AM, Edward R Huyer  wrote:
> Neat, hadn't seen that command before.  Here's the fsck log from the primary 
> OSD:  https://pastebin.com/nZ0H5ag3
>
> Looks like the OSD's bluestore "filesystem" itself has some underlying 
> errors, though I'm not sure what to do about them.

Hmmm... Can you tell us any more about how/when this happened?

Any corresponding event at all? Any interesting log entries around the
same time?

Could you also open a tracker for this (or let me know and I can open
one for you)? That way we can continue the investigation there.

>
> -Original Message-
> From: Brad Hubbard [mailto:bhubb...@redhat.com]
> Sent: Monday, August 21, 2017 7:05 PM
> To: Edward R Huyer 
> Cc: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] PG reported as inconsistent in status, but no 
> inconsistencies visible to rados
>
> Could you provide the output of 'ceph-bluestore-tool fsck' for one of these 
> OSDs?
>
> On Tue, Aug 22, 2017 at 2:53 AM, Edward R Huyer  wrote:
>> This is an odd one.  My cluster is reporting an inconsistent pg in
>> ceph status and ceph health detail.  However, rados
>> list-inconsistent-obj and rados list-inconsistent-snapset both report
>> no inconsistencies.  Scrubbing the pg results in these errors in the osd 
>> logs:
>>
>>
>>
>> OSD 63 (primary):
>>
>> 2017-08-21 12:41:03.580068 7f0b36629700 -1
>> bluestore(/var/lib/ceph/osd/ceph-63) _verify_csum bad crc32c/0x1000
>> checksum at blob offset 0x0, got 0x6b6b9184, expected 0x6706be76,
>> device location [0x23f39d~1000], logical extent 0x0~1000, object
>> #9:55bf7cc6:::rbd_data.33992ae8944a.200f:e#
>>
>> 2017-08-21 12:41:03.961945 7f0b36629700 -1 log_channel(cluster) log [ERR] :
>> 9.aa soid 9:55bf7cc6:::rbd_data.33992ae8944a.200f:e:
>> failed to pick suitable object info
>>
>> 2017-08-21 12:41:15.357484 7f0b36629700 -1 log_channel(cluster) log [ERR] :
>> 9.aa deep-scrub 3 errors
>>
>>
>>
>> OSD 50:
>>
>> 2017-08-21 12:41:03.592918 7f264be6d700 -1
>> bluestore(/var/lib/ceph/osd/ceph-50) _verify_csum bad crc32c/0x1000
>> checksum at blob offset 0x0, got 0x64a1e2b1, expected 0x6706be76,
>> device location [0x341883~1000], logical extent 0x0~1000, object
>> #9:55bf7cc6:::rbd_data.33992ae8944a.200f:e#
>>
>>
>>
>> OSD 46:
>>
>> 2017-08-21 12:41:03.531394 7fb396b1f700 -1
>> bluestore(/var/lib/ceph/osd/ceph-46) _verify_csum bad crc32c/0x1000
>> checksum at blob offset 0x0, got 0x7aa05c01, expected 0x6706be76,
>> device location [0x1d6e1e~1000], logical extent 0x0~1000, object
>> #9:55bf7cc6:::rbd_data.33992ae8944a.200f:e#
>>
>>
>>
>> This is on Ceph 12.1.4 (previously 12.1.1).
>>
>>
>>
>> Thoughts?
>>
>>
>>
>> -
>>
>> Edward Huyer
>>
>> School of Interactive Games and Media
>>
>> Rochester Institute of Technology
>>
>> Golisano 70-2373
>>
>> 152 Lomb Memorial Drive
>>
>> Rochester, NY 14623
>>
>> 585-475-6651
>>
>> erh...@rit.edu
>>
>>
>>
>> Obligatory Legalese:
>>
>> The information transmitted, including attachments, is intended only
>> for the
>> person(s) or entity to which it is addressed and may contain
>> confidential and/or privileged material. Any review, retransmission,
>> dissemination or other use of, or taking of any action in reliance
>> upon this information by persons or entities other than the intended
>> recipient is prohibited. If you received this in error, please contact
>> the sender and destroy any copies of this information.
>>
>>
>>
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
>
>
> --
> Cheers,
> Brad



-- 
Cheers,
Brad
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] State of play for RDMA on Luminous

2017-08-23 Thread Florian Haas
Hello everyone,

I'm trying to get a handle on the current state of the async messenger's
RDMA transport in Luminous, and I've noticed that the information
available is a little bit sparse (I've found
https://community.mellanox.com/docs/DOC-2693 and
https://community.mellanox.com/docs/DOC-2721, which are a great start
but don't look very complete). So I'm kicking off this thread that might
hopefully bring interested parties and developers together.

Could someone in the know please confirm that the following assumptions
of mine are accurate:

- RDMA support for the async messenger is available in Luminous.

- You enable it globally by setting ms_type to "async+rdma", and by
setting appropriate values for the various ms_async_rdma* options (most
importantly, ms_async_rdma_device_name); a minimal config sketch follows
this list.

- You can also set RDMA messaging just for the public or cluster
network, via ms_public_type and ms_cluster_type.

- Users have to make a global async+rdma vs. async+posix decision on
either network. For example, if either ms_type or ms_public_type is
configured to async+rdma on cluster nodes, then a client configured with
ms_type = async+posix can't communicate.
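
To make that concrete, here's a minimal sketch of the configuration I have in
mind (option names as in the Mellanox docs linked above; the device name is a
placeholder, and splitting transports per network is just for illustration):

   [global]
   ms_type = async+rdma
   ms_async_rdma_device_name = mlx5_0
   # alternatively, limit RDMA to the cluster network:
   # ms_cluster_type = async+rdma
   # ms_public_type = async+posix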

Based on those assumptions, I have the following questions:

- What is the current state of RDMA support in kernel libceph? In other
words, is there currently a way to map RBDs, or mount CephFS, if a Ceph
cluster uses RDMA messaging?

- In case there is no such support in the kernel yet: What's the current
status of RDMA support (and testing) with regard to
  * libcephfs?
  * the Samba Ceph VFS?
  * nfs-ganesha?
  * tcmu-runner?

- In summary, if a user wants to access their Ceph cluster via a POSIX
filesystem or via iSCSI, is enabling the RDMA-enabled async messenger in
the public network an option? Or would they have to continue running on
TCP/IP (possibly on IPoIB if they already have InfiniBand hardware)
until the client libraries catch up?

- And more broadly, if a user wants to use the performance benefits of
RDMA, but not all of their potential Ceph clients have InfiniBand HCAs,
what are their options? RoCE?

Thanks very much in advance for everyone's insight!

Cheers,
Florian


-- 
Please feel free to verify my identity:
https://keybase.io/fghaas



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Cache tier unevictable objects

2017-08-23 Thread Eugen Block

Hi, thanks for your quick response!


Do I take it from this that your cache tier is only on one node?
If so upgrade the "Risky" up there to "Channeling Murphy".


The two SSDs are on two different nodes, but since we just started
using cache tiering, we decided to use a pool size of 2; we know it's not
recommended.



If not, and your min_size is 1 as it should be for a size 2 pool, nothing
bad should happen.


The size and min_size were 2; I have changed min_size to 1.


Penultimately, google is EVIL but helps you find answers:
http://tracker.ceph.com/issues/12659


I had already seen this; it describes what we are seeing in our
cluster. Even though the cache_mode is set to "forward", I still see
new objects written to it if I spawn a new instance. At least this
led to a better understanding: the rbd_header objects seem to represent
the clones of a snapshot when a new instance is spawned, which is helpful.
We plan to upgrade our cluster soon, but first we need to get rid of
this cache pool. We'll continue to analyze the cache pool, but if you
have any more helpful insights, we would appreciate it!
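
For completeness, the sequence we have in mind for retiring the tier is roughly
the standard one from the docs (pool names as in our setup; newer releases may
also require --yes-i-really-mean-it when switching to forward mode):

   ceph osd tier cache-mode images-cache forward
   rados -p images-cache cache-flush-evict-all
   ceph osd tier remove-overlay images
   ceph osd tier remove images images-cache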


Regards,
Eugen

Quoting Christian Balzer :


On Tue, 22 Aug 2017 09:54:34 + Eugen Block wrote:


Hi list,

we have a productive Hammer cluster for our OpenStack cloud and
recently a colleague added a cache tier consisting of 2 SSDs and also
a pool size of 2, we're still experimenting with this topic.


Risky, but I guess you know that.


Now we have some hardware maintenance to do and need to shutdown
nodes, one at a time of course. So we tried to flush/evict the cache
pool and disable it to prevent data loss, we also set the cache-mode
to "forward". Most of the objects have been evicted successfully, but
there are still 39 objects left, and it's impossible to evict them.
I'm not sure how to make sure if we can just delete the cache pool
without data loss, we want to set up the cache-pool from scratch.


Do I take it from this that your cache tier is only on one node?
If so upgrade the "Risky" up there to "Channeling Murphy".

If not, and your min_size is 1 as it should be for a size 2 pool, nothing
bad should happen.

Penultimately, google is EVIL but helps you find answers:
http://tracker.ceph.com/issues/12659

Christian


# rados -p images-cache ls
rbd_header.210f542ae8944a
volume-ce17068e-a36d-4d9b-9779-3af473aba033.rbd
rbd_header.50ec372eb141f2
931f9a1e-2022-4571-909e-6c3f5f8c3ae8_disk.rbd
rbd_header.59dd32ae8944a
...

There are only 3 types of objects in the cache-pool:
   - rbd_header
   - volume-XXX.rbd (obviously cinder related)
   - XXX_disk (nova disks)

All rbd_header objects have a size of 0 if I run a "stat" command on
them, the rest has a size of 112. If I compare the objects with the
respective object in the cold-storage, they are identical:

Object rbd_header.1128db1b5d2111:
images-cache/rbd_header.1128db1b5d2111 mtime 2017-08-21
15:55:26.00, size 0
   images/rbd_header.1128db1b5d2111 mtime 2017-08-21
15:55:26.00, size 0

Object volume-fd07dd66-8a82-431c-99cf-9bfc3076af30.rbd:
images-cache/volume-fd07dd66-8a82-431c-99cf-9bfc3076af30.rbd mtime
2017-08-21 15:55:26.00, size 112
   images/volume-fd07dd66-8a82-431c-99cf-9bfc3076af30.rbd mtime
2017-08-21 15:55:26.00, size 112

Object 2dcb9d7d-3a4f-49a4-8792-b4b74f5b60e5_disk.rbd:
images-cache/2dcb9d7d-3a4f-49a4-8792-b4b74f5b60e5_disk.rbd mtime
2017-08-21 15:55:25.00, size 112
   images/2dcb9d7d-3a4f-49a4-8792-b4b74f5b60e5_disk.rbd mtime
2017-08-21 15:55:25.00, size 112

Some of them have an rbd_lock, some of them have a watcher, some don't
have any of that but they still can't be evicted:

# rados -p images-cache lock list rbd_header.2207c92ae8944a
{"objname":"rbd_header.2207c92ae8944a","locks":[]}
# rados -p images-cache listwatchers rbd_header.2207c92ae8944a
#
# rados -p images-cache cache-evict rbd_header.2207c92ae8944a
error from cache-evict rbd_header.2207c92ae8944a: (16) Device or  
resource busy


Then I also tried to shutdown an instance that uses some of the
volumes listed in the cache pool, but the objects didn't change at
all, the total number was also still 39. For the rbd_header objects I
don't even know how to identify their "owner", is there a way?

Has anyone a hint what else I could check or is it reasonable to
assume that the objects are really the same and there would be no data
loss in case we deleted that pool?
We appreciate any help!

Regards,
Eugen




--
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Rakuten Communications




--
Eugen Block voice   : +49-40-559 51 75
NDE Netzdesign und -entwicklung AG  fax : +49-40-559 51 77
Postfach 61 03 15
D-22423 Hamburg e-mail  : ebl...@nde.ag

Vorsitzende des Aufsichtsrates: Angelika Mozdzen
  Sitz und Registergericht: Hamburg, HRB 90934
  Vorstand: Jens-U. Mozdzen
   USt-IdNr. DE 814 013 983

_