[ceph-users] RBD Image Features not working on Ubuntu 16.04 + Jewel 10.2.3.

2016-12-02 Thread Rakesh Parkiti
Hi All,


I. Firstly, as per my understanding, are the RBD image features (exclusive-lock, 
object-map, fast-diff, deep-flatten, journaling) not yet ready in the Ceph 
Jewel release?

II. The only image feature working for me is "layering".

III. I am trying to configure rbd-mirroring on two different clusters, which 
have the same cluster name, "ceph".

 --- Here I have observed two problems:

 a) The initial command "ceph-deploy --cluster tom new tom1" works fine on 
Ubuntu 16.04, whereas creating the initial monitor fails (Error: Admin Socket 
Error).




b) Whereas on CentOS 7, it straight away says:

[user@local1 local]$ ceph-deploy --cluster tom new tom1
[ceph_deploy.conf][DEBUG ] found configuration file at: 
/home/user/.cephdeploy.conf
[ceph_deploy.cli][INFO  ] Invoked (1.5.36): /usr/bin/ceph-deploy --cluster 
active new local1
:
:
[ceph_deploy.new][ERROR ] custom cluster names are not supported on sysvinit 
hosts
[ceph_deploy][ERROR ] ClusterNameError: host local1 does not support custom 
cluster names

Note: this is expected behavior as per the Red Hat errata forum.

Questions:-

1. To configure RBD mirroring for images, the required RBD image features are 
"exclusive-lock" and "journaling", as these two features are mandatory (see the 
sketch after this list).
2. Do RBD image features work with older Ceph versions like Hammer?
3. Is any operating-system-specific kernel required to work with these RBD 
image features?
4. Is RBD mirroring production ready? If yes, can anyone share working 
configuration steps?
5. How do I change the cluster name from the default "ceph"?
I did not see any official document with proper steps for changing the cluster 
name; I only found the procedure at this link: 
http://docs.ceph.com/docs/jewel/rados/deployment/ceph-deploy-new/ . If I am 
wrong, please direct me to the proper link.
6. Can we configure rbd-mirroring with the default "ceph" cluster name on two 
different clusters? If yes, how do we isolate which is primary and which is 
secondary?
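
For reference, this is roughly the image-level setup I have in mind for 
question 1 (the peer and cluster names below are placeholders, not from a 
working setup):

rbd create --size 1G --image-feature exclusive-lock,journaling rbd/mirror-img
rbd mirror pool enable rbd image          # per-image mirroring mode on the pool
rbd mirror image enable rbd/mirror-img    # turn on mirroring for this image
rbd mirror pool peer add rbd client.mirror@remote   # placeholder peer spec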

-- Below is the output when trying to create an RBD image with these features 
(exclusive-lock, object-map, fast-diff, deep-flatten, journaling).


Steps Information:



user@tom1:~$ uname -a
Linux tom1 4.4.0-31-generic #50-Ubuntu SMP Wed Jul 13 00:07:12 UTC 2016 x86_64 
x86_64 x86_64 GNU/Linux

user@tom1:~$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:Ubuntu 16.04.1 LTS
Release:16.04
Codename:   xenial


user@tom1:~$ ceph -v
ceph version 10.2.3 (ecc23778eb545d8dd55e2e4735b53cc93f92e65b)


user@tom1:~$ ceph -s
cluster c7c91460-3cd6-4183-9ebb-8880fb15865f
 health HEALTH_OK
 monmap e1: 3 mons at 
{tom1=10.1.24.93:6789/0,tom2=10.1.24.94:6789/0,tom3=10.1.24.95:6789/0}
election epoch 4, quorum 0,1,2 tom1,tom2,tom3
 osdmap e51: 9 osds: 9 up, 9 in
flags sortbitwise
  pgmap v255: 128 pgs, 1 pools, 114 bytes data, 5 objects
322 MB used, 134 GB / 134 GB avail
 128 active+clean


user@tom1:~$ rados lspools

rbd

user@tom1:~$ rbd create --image rbd/img1 --size 1G
user@tom1:~$ rbd --image rbd/img1 info
rbd image 'img1':
size 1024 MB in 256 objects
order 22 (4096 kB objects)
block_name_prefix: rbd_data.105a2ae8944a
format: 2
features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
flags:

user@tom1:~$ rbd feature enable rbd/img1 journaling
user@tom1:~$ rbd --image rbd/img1 info
rbd image 'img1':
size 1024 MB in 256 objects
order 22 (4096 kB objects)
block_name_prefix: rbd_data.105a2ae8944a
format: 2
features: layering, exclusive-lock, object-map, fast-diff, 
deep-flatten, journaling
flags:
journal: 105a2ae8944a
mirroring state: disabled

user@tom1:~$ sudo rbd map --image rbd/img1
rbd: sysfs write failed
RBD image feature set mismatch. You can disable features unsupported by the 
kernel with "rbd feature disable".
In some cases useful info is found in syslog - try "dmesg | tail" or so.
rbd: map failed: (6) No such device or address
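
As a possible workaround (assuming the stock 4.4 kernel client only supports 
the layering feature, and disabling dependent features before the ones they 
rely on), something like this should let the image map:

rbd feature disable rbd/img1 journaling fast-diff object-map deep-flatten exclusive-lock
sudo rbd map --image rbd/img1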


II. Working with RBD feature - "layering" only

user@tom1:~$ rbd create --image rbd/img2 --size 1G --image-feature layering
user@tom1:~$ rbd --image rbd/img2 info
rbd image 'img2':
size 1024 MB in 256 objects
order 22 (4096 kB objects)
block_name_prefix: rbd_data.105f238e1f29
format: 2
features: layering
flags:

user@tom1:~$ sudo rbd map --image rbd/img2
/dev/rbd0

user@tom1:~$ rbd showmapped
id pool image snap device
0  rbd  img2  -/dev/rbd0

Can someone help by answering these questions? It would be a great help. Thanks!

Thanks
Rakesh Parkiti




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph QoS user stories

2016-12-02 Thread Federico Lucifredi
Hi Sage,

 The primary QoS issue we see with OpenStack users is wanting to
guarantee minimum IOPS to each Cinder-mounted RBD volume as a way to
guarantee the health of well-mannered workloads against badly-behaving
ones.

 As an OpenStack Administrator, I want to guarantee a minimum number
of IOPS to each Cinder volume to prevent any tenant from interfering
with another.

  The number of IOPS may vary per volume, but in many cases a
"standard" and "high" number would probably suffice. The guarantee is
more important than the granularity.

  This is something impacting users at today's Ceph performance level.

  Looking to the future, once Bluestore becomes the default, there
will also be latency requirements from the crowd that wants to run
databases with RBD backends — both low latency and low jitter in that
latency. But rather than applying that to all volumes, it will apply only
to select volumes backing RDBMSs. Well, at least in the case of a
general-purpose cluster.


 My hunch is that enterprise users who want hard QoS guarantees will
accept that a capacity-planning exercise is necessary: software can
only allocate existing capacity, not create more. Community users may
instead place more value on some "fairness" in distributing existing
resources. Just a hunch at this point.

 Best -F

_
-- "You must try until your brain hurts —Elon Musk
(Federico L. Lucifredi) - federico at redhat.com - GnuPG 0x4A73884C

On Fri, Dec 2, 2016 at 2:01 PM, Sage Weil  wrote:
>
> Hi all,
>
> We're working on getting infrastructure into RADOS to allow for proper
> distributed quality-of-service guarantees.  The work is based on the
> mclock paper published in OSDI'10
>
> https://www.usenix.org/legacy/event/osdi10/tech/full_papers/Gulati.pdf
>
> There are a few ways this can be applied:
>
>  - We can use mclock simply as a better way to prioritize background
> activity (scrub, snap trimming, recovery, rebalancing) against client IO.
>  - We can use d-mclock to set QoS parameters (e.g., min IOPS or
> proportional priority/weight) on RADOS pools
>  - We can use d-mclock to set QoS parameters (e.g., min IOPS) for
> individual clients.
>
> Once the rados capabilities are in place, there will be a significant
> amount of effort needed to get all of the APIs in place to configure and
> set policy.  In order to make sure we build something that makes sense,
> I'd like to collect a set of user stories that we'd like to support so
> that we can make sure we capture everything (or at least the important
> things).
>
> Please add any use-cases that are important to you to this pad:
>
> http://pad.ceph.com/p/qos-user-stories
>
> or as a follow-up to this email.
>
> mClock works in terms of a minimum allocation (of IOPS or bandwidth; they
> are sort of reduced into a single unit of work), a maximum (i.e. simple
> cap), and a proportional weighting (to allocate any additional capacity
> after the minimum allocations are satisfied).  It's somewhat flexible in
> terms of how we apply it to specific clients, classes of clients, or types
> of work (e.g., recovery).  How we put it all together really depends on
> what kinds of things we need to accomplish (e.g., do we need to support a
> guaranteed level of service shared across a specific set of N different
> clients, or only individual clients?).
>
> Thanks!
> sage
>
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph QoS user stories

2016-12-02 Thread Nick Fisk
> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Sage 
> Weil
> Sent: 02 December 2016 19:02
> To: ceph-de...@vger.kernel.org; ceph-us...@ceph.com
> Subject: [ceph-users] Ceph QoS user stories
> 
> Hi all,
> 
> We're working on getting infrastructure into RADOS to allow for proper
> distributed quality-of-service guarantees.  The work is based on
> the mclock paper published in OSDI'10
> 
>   https://www.usenix.org/legacy/event/osdi10/tech/full_papers/Gulati.pdf
> 
> There are a few ways this can be applied:
> 
>  - We can use mclock simply as a better way to prioritize background activity
> (scrub, snap trimming, recovery, rebalancing) against client IO.
>  - We can use d-mclock to set QoS parameters (e.g., min IOPS or proportional
> priority/weight) on RADOS pools
>  - We can use d-mclock to set QoS parameters (e.g., min IOPS) for individual
> clients.
> 
> Once the rados capabilities are in place, there will be a significant amount
> of effort needed to get all of the APIs in place to configure
> and set policy.  In order to make sure we build something that makes sense,
> I'd like to collect a set of user stories that we'd like to
> support so that we can make sure we capture everything (or at least the
> important things).
> 
> Please add any use-cases that are important to you to this pad:
> 
>   http://pad.ceph.com/p/qos-user-stories
> 
> or as a follow-up to this email.
> 
> mClock works in terms of a minimum allocation (of IOPS or bandwidth; they are
> sort of reduced into a single unit of work), a maximum
> (i.e. simple cap), and a proportional weighting (to allocate any additional
> capacity after the minimum allocations are satisfied).  It's
> somewhat flexible in terms of how we apply it to specific clients, classes of
> clients, or types of work (e.g., recovery).  How we put it all
> together really depends on what kinds of things we need to accomplish (e.g.,
> do we need to support a guaranteed level of service
> shared across a specific set of N different clients, or only individual
> clients?).
> 
> Thanks!
> sage
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

Hi Sage,

You mention IOPS and bandwidth, but would this be applicable to latency as well?
Some client operations (buffered IO) can hit several hundred IOPS with terrible
latency if the queue depth is high enough, when the intended requirement might
have been a more responsive application.

Would it be possible to apply some sort of shares system to the minimum
allocation? I.e., in the event that not all allocations can be met, will it
gracefully try to balance the available resources, or will it completely starve
some clients? Maybe a partial loss of the cluster has caused a performance
drop, or a user has set read latency to 1ms on a disk-based cluster. Is this a
tuneable parameter, deadline vs. shares, etc.?

I can think of a number of scenarios where QOS may help and how it might be 
applied. Hope they are of some use.

1. Min IOPS/bandwidth/latency for an important VM. Probably settable on a per-RBD
basis. Could maybe have an inheritable default from the RADOS pool, or be
customised to allow offering bronze/silver/gold service levels.

2. Max IOPS/bandwidth to limit noisy clients, but with the option of
over-allocation if free resources are available.

3. Min bandwidth for streaming to tape. Again set per RBD or RBD snapshot.
Would help filter out the impact of clients emptying their buffered writes, as
small drops in performance massively affect continuous streaming to tape.

4. Ability to QoS either reads or writes. E.g. SQL DBs will benefit from fast,
consistent sync write latency, but actual write throughput is fairly small and
coalesces well. Being able to make sure all writes jump to the front of the
queue would ensure good performance.

5. If size < min_size, I want recovery to take very high priority, as ops might
be blocked.

6. There probably needs to be some sort of reporting to go along with this, to
be able to see which targets are being missed/met. I guess this needs some sort
of "ceph top" or "rbd top" before it can be implemented?

7. Currently an RBD with a snapshot can overload a cluster if you do lots of
small random writes to the parent. COW causes massive write amplification. If
QoS were set on the parent, how would these COW writes be taken into account?

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Migrate OSD Journal to SSD

2016-12-02 Thread Warren Wang - ISD
I’ve actually had to migrate every single journal in many clusters from one 
(horrible) SSD model to a better SSD. It went smoothly. You’ll also need to 
update your /var/lib/ceph/osd/ceph-*/journal_uuid file. 

Honestly, the only challenging part was mapping and automating the back and 
forth conversion from /dev/sd* to the uuid for the corresponding osd.  I would 
share the script, but it was at my previous employer.
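
Roughly, the per-OSD sketch looks like the following (the OSD id, partition
path and UUID are placeholders; noout was set around the whole operation):

ceph osd set noout
service ceph stop osd.12
ceph-osd -i 12 --flush-journal
rm /var/lib/ceph/osd/ceph-12/journal
ln -s /dev/disk/by-partuuid/PLACEHOLDER-UUID /var/lib/ceph/osd/ceph-12/journal
echo PLACEHOLDER-UUID > /var/lib/ceph/osd/ceph-12/journal_uuid
ceph-osd -i 12 --mkjournal
service ceph start osd.12
ceph osd unset noout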

Warren Wang
Walmart ✻

On 12/1/16, 7:26 PM, "ceph-users on behalf of Christian Balzer" 
 wrote:

On Thu, 1 Dec 2016 18:06:38 -0600 Reed Dier wrote:

> Apologies if this has been asked dozens of times before, but most answers 
are from pre-Jewel days, and want to double check that the methodology still 
holds.
> 
It does.

> Currently have 16 OSD’s across 8 machines with on-disk journals, created 
using ceph-deploy.
> 
> These machines have NVMe storage (Intel P3600 series) for the system 
volume, and am thinking about carving out a partition for SSD journals for the 
OSD’s. The drives don’t make tons of use of the local storage, so should have 
plenty of io overhead to support the OSD journaling, as well as the P3600 
should have the endurance to handle the added write wear.
>
Slight disconnect there, money for a NVMe (which size?) and on disk
journals? ^_-
 
> From what I’ve read, you need a partition per OSD journal, so with the 
probability of a third (and final) OSD being added to each node, I should 
create 3 partitions, each ~8GB in size (is this a good value? 8TB OSD’s, is the 
journal size based on size of data or number of objects, or something else?).
> 
Journal size is unrelated to the OSD per se, with default parameters and
HDDs for OSDs a size of 10GB would be more than adequate, the default of
5GB would do as well.

> So:
> {create partitions}
> set noout
> service ceph stop osd.$i
> ceph-osd -i osd.$i —flush-journal
> rm -f rm -f /var/lib/ceph/osd//journal
Typo and there should be no need for -f. ^_^

> ln -s  /var/lib/ceph/osd//journal /dev/
Even though in your case with a single(?) NVMe there is little chance for
confusion, ALWAYS reference to devices by their UUID or similar, I prefer
the ID:
---
lrwxrwxrwx   1 root root44 May 21  2015 journal -> 
/dev/disk/by-id/wwn-0x55cd2e404b73d570-part4
---


> ceph-osd -i osd.$i -mkjournal
> service ceph start osd.$i
> ceph osd unset noout
> 
> Does this logic appear to hold up?
> 
Yup.

Christian

> Appreciate the help.
> 
> Thanks,
> 
> Reed

-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Global OnLine Japan/Rakuten Communications
http://www.gol.com/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



This email and any files transmitted with it are confidential and intended 
solely for the individual or entity to whom they are addressed. If you have 
received this email in error destroy it immediately. *** Walmart Confidential 
***
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Announcing: Embedded Ceph and Rook

2016-12-02 Thread Bassam Tabbara
Hi Dan,

Is there anyplace you explain in more detail about why this design is
attractive?  I'm having a hard time imagining why applications would
want to try to embed the cluster.

Take a look at https://github.com/rook/rook for a small explanation of how we 
use embedded Ceph.

Thanks!
Bassam

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Ceph and rrdtool

2016-12-02 Thread Steve Jankowski
Anyone using rrdtool with Ceph via rados or cephfs ?


If so, how many rrd files and how many rrd file updates per minute.


We have a large population of rrd files that's growing beyond a single machine. 
 We're already using SSD and rrdcached with great success, but it's not enough 
for the growth that's coming.


A distributed file store would tick a lot of check boxes, but it needs to 
survive the high volume small write IOPS produced by rrdtool.


Thanks,

Steve
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Announcing: Embedded Ceph and Rook

2016-12-02 Thread Dan Mick
On 11/30/2016 03:46 PM, Bassam Tabbara wrote:
> Hello Cephers,
> 
> I wanted to let you know about a new library that is now available in
> master. It's called “libcephd” and it enables the embedding of Ceph
> daemons like MON and OSD (and soon MDS and RGW) into other applications.
> Using libcephd it's possible to create new applications that closely
> integrate Ceph storage without bringing in the full distribution of Ceph
> and its dependencies. For example, you can build storage application
> that runs the Ceph daemons on limited distributions like CoreOS natively
> or along side a hypervisor for hyperconverged scenarios. The goal is to
> enable a broader ecosystem of solutions built around Ceph and reduce
> some of the friction for adopting Ceph today. See
> http://pad.ceph.com/p/embedded-ceph for the blueprint.
> 
> We (Quantum) are using embedded Ceph in a new open-source project called
> Rook (https://github.com/rook/rook and https://rook.io). Rook integrates
> embedded Ceph in a deployment that is targeting cloud-native applications.
> 
> Please feel free to respond with feedback. Also if you’re in the Seattle
> area next week stop by for a meetup on embedded Ceph and its use in Rook
> https://www.meetup.com/Pacific-Northwest-Ceph-Meetup/events/235632106/
> 
> Thanks!
> Bassam
> 


Is there anyplace you explain in more detail about why this design is
attractive?  I'm having a hard time imagining why applications would
want to try to embed the cluster.

-- 
Dan Mick
Red Hat, Inc.
Ceph docs: http://ceph.com/docs
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Ceph QoS user stories

2016-12-02 Thread Sage Weil
Hi all,

We're working on getting infrastructure into RADOS to allow for proper 
distributed quality-of-service guarantees.  The work is based on the 
mclock paper published in OSDI'10

https://www.usenix.org/legacy/event/osdi10/tech/full_papers/Gulati.pdf

There are a few ways this can be applied:

 - We can use mclock simply as a better way to prioritize background 
activity (scrub, snap trimming, recovery, rebalancing) against client IO.
 - We can use d-mclock to set QoS parameters (e.g., min IOPS or 
proportional priority/weight) on RADOS pools
 - We can use d-mclock to set QoS parameters (e.g., min IOPS) for 
individual clients.

Once the rados capabilities are in place, there will be a significant 
amount of effort needed to get all of the APIs in place to configure and 
set policy.  In order to make sure we build something that makes sense, 
I'd like to collect a set of user stories that we'd like to support so 
that we can make sure we capture everything (or at least the important 
things).

Please add any use-cases that are important to you to this pad:

http://pad.ceph.com/p/qos-user-stories

or as a follow-up to this email.

mClock works in terms of a minimum allocation (of IOPS or bandwidth; they 
are sort of reduced into a single unit of work), a maximum (i.e. simple 
cap), and a proportional weighting (to allocate any additional capacity 
after the minimum allocations are satisfied).  It's somewhat flexible in 
terms of how we apply it to specific clients, classes of clients, or types 
of work (e.g., recovery).  How we put it all together really depends on 
what kinds of things we need to accomplish (e.g., do we need to support a 
guaranteed level of service shared across a specific set of N different 
clients, or only individual clients?).

Thanks!
sage

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] stalls caused by scrub on jewel

2016-12-02 Thread Dan Jakubiec

> On Dec 2, 2016, at 10:48, Sage Weil  wrote:
> 
> On Fri, 2 Dec 2016, Dan Jakubiec wrote:
>> For what it's worth... this sounds like the condition we hit when we 
>> re-enabled scrub on our 16 OSDs (after 6 to 8 weeks of noscrub).  They 
>> flapped for about 30 minutes as most of the OSDs randomly hit suicide 
>> timeouts here and there.
>> 
>> This settled down after about an hour and the OSDs stopped dying.  We 
>> have since left scrub enabled for about 4 days and have only seen three 
>> small spurts of OSD flapping since then (which quickly resolved 
>> themselves).
> 
> Yeah.  I think what's happening is that with a cold cache it is slow 
> enough to suicide, but with a warm cache it manages to complete (although 
> I bet it's still stalling other client IO for perhaps multiple seconds).  
> I would leave noscrub set for now.

Ah... thanks for the suggestion!  We are indeed working through some jerky 
performance issues.  Perhaps this is a layer of that onion, thank you.

-- Dan

> 
> sage
> 
> 
> 
> 
>> 
>> -- Dan
>> 
>>> On Dec 1, 2016, at 14:38, Frédéric Nass  
>>> wrote:
>>> 
>>> Hi Yoann,
>>> 
>>> Thank you for your input. I was just told by RH support that it’s gonna 
>>> make it to RHCS 2.0 (10.2.3). Thank you guys for the fix !
>>> 
>>> We thought about increasing the number of PGs just after changing the 
>>> merge/split threshold values but this would have led to a _lot_ of data 
>>> movements (1.2 billion of XFS files) over weeks, without any possibility to 
>>> scrub / deep-scrub to ensure data consistency. Still as soon as we get the 
>>> fix, we will increase the number of PGs.
>>> 
>>> Regards,
>>> 
>>> Frederic.
>>> 
>>> 
>>> 
 Le 1 déc. 2016 à 16:47, Yoann Moulin  a écrit :
 
 Hello,
 
> We're impacted by this bug (case 01725311). Our cluster is running RHCS 
> 2.0 and is no more capable to scrub neither deep-scrub.
> 
> [1] http://tracker.ceph.com/issues/17859
> [2] https://bugzilla.redhat.com/show_bug.cgi?id=1394007
> [3] https://github.com/ceph/ceph/pull/11898
> 
> I'm worried we'll have to live with a cluster that can't scrub/deep-scrub 
> until March 2017 (ETA for RHCS 2.2 running Jewel 10.2.4).
> 
> Can we have this fix any sooner ?
 
 As far as I know about that bug, it appears if you have big PGs, a 
 workaround could be increasing the pg_num of the pool that has the biggest 
 PGs.
 
 -- 
 Yoann Moulin
 EPFL IC-IT
>>> 
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] stalls caused by scrub on jewel

2016-12-02 Thread Sage Weil
On Fri, 2 Dec 2016, Dan Jakubiec wrote:
> For what it's worth... this sounds like the condition we hit when we 
> re-enabled scrub on our 16 OSDs (after 6 to 8 weeks of noscrub).  They 
> flapped for about 30 minutes as most of the OSDs randomly hit suicide 
> timeouts here and there.
> 
> This settled down after about an hour and the OSDs stopped dying.  We 
> have since left scrub enabled for about 4 days and have only seen three 
> small spurts of OSD flapping since then (which quickly resolved 
> themselves).

Yeah.  I think what's happening is that with a cold cache it is slow 
enough to suicide, but with a warm cache it manages to complete (although 
I bet it's still stalling other client IO for perhaps multiple seconds).  
I would leave noscrub set for now.
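
(For anyone following along, the flags are toggled cluster-wide with:

ceph osd set noscrub
ceph osd set nodeep-scrub

and "ceph osd unset noscrub" / "ceph osd unset nodeep-scrub" to re-enable 
scrubbing later.)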

sage




> 
> -- Dan
> 
> > On Dec 1, 2016, at 14:38, Frédéric Nass  
> > wrote:
> > 
> > Hi Yoann,
> > 
> > Thank you for your input. I was just told by RH support that it’s gonna 
> > make it to RHCS 2.0 (10.2.3). Thank you guys for the fix !
> > 
> > We thought about increasing the number of PGs just after changing the 
> > merge/split threshold values but this would have led to a _lot_ of data 
> > movements (1.2 billion of XFS files) over weeks, without any possibility to 
> > scrub / deep-scrub to ensure data consistency. Still as soon as we get the 
> > fix, we will increase the number of PGs.
> > 
> > Regards,
> > 
> > Frederic.
> > 
> > 
> > 
> >> Le 1 déc. 2016 à 16:47, Yoann Moulin  a écrit :
> >> 
> >> Hello,
> >> 
> >>> We're impacted by this bug (case 01725311). Our cluster is running RHCS 
> >>> 2.0 and is no more capable to scrub neither deep-scrub.
> >>> 
> >>> [1] http://tracker.ceph.com/issues/17859
> >>> [2] https://bugzilla.redhat.com/show_bug.cgi?id=1394007
> >>> [3] https://github.com/ceph/ceph/pull/11898
> >>> 
> >>> I'm worried we'll have to live with a cluster that can't scrub/deep-scrub 
> >>> until March 2017 (ETA for RHCS 2.2 running Jewel 10.2.4).
> >>> 
> >>> Can we have this fix any sooner ?
> >> 
> >> As far as I know about that bug, it appears if you have big PGs, a 
> >> workaround could be increasing the pg_num of the pool that has the biggest 
> >> PGs.
> >> 
> >> -- 
> >> Yoann Moulin
> >> EPFL IC-IT
> > 
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
> ___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] stalls caused by scrub on jewel

2016-12-02 Thread Dan Jakubiec
For what it's worth... this sounds like the condition we hit when we re-enabled 
scrub on our 16 OSDs (after 6 to 8 weeks of noscrub).  They flapped for about 
30 minutes as most of the OSDs randomly hit suicide timeouts here and there.

This settled down after about an hour and the OSDs stopped dying.  We have 
since left scrub enabled for about 4 days and have only seen three small spurts 
of OSD flapping since then (which quickly resolved themselves).

-- Dan

> On Dec 1, 2016, at 14:38, Frédéric Nass  
> wrote:
> 
> Hi Yoann,
> 
> Thank you for your input. I was just told by RH support that it’s gonna make 
> it to RHCS 2.0 (10.2.3). Thank you guys for the fix !
> 
> We thought about increasing the number of PGs just after changing the 
> merge/split threshold values but this would have led to a _lot_ of data 
> movements (1.2 billion of XFS files) over weeks, without any possibility to 
> scrub / deep-scrub to ensure data consistency. Still as soon as we get the 
> fix, we will increase the number of PGs.
> 
> Regards,
> 
> Frederic.
> 
> 
> 
>> Le 1 déc. 2016 à 16:47, Yoann Moulin  a écrit :
>> 
>> Hello,
>> 
>>> We're impacted by this bug (case 01725311). Our cluster is running RHCS 2.0 
>>> and is no more capable to scrub neither deep-scrub.
>>> 
>>> [1] http://tracker.ceph.com/issues/17859
>>> [2] https://bugzilla.redhat.com/show_bug.cgi?id=1394007
>>> [3] https://github.com/ceph/ceph/pull/11898
>>> 
>>> I'm worried we'll have to live with a cluster that can't scrub/deep-scrub 
>>> until March 2017 (ETA for RHCS 2.2 running Jewel 10.2.4).
>>> 
>>> Can we have this fix any sooner ?
>> 
>> As far as I know about that bug, it appears if you have big PGs, a 
>> workaround could be increasing the pg_num of the pool that has the biggest 
>> PGs.
>> 
>> -- 
>> Yoann Moulin
>> EPFL IC-IT
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rgw: how to prevent rgw user from creating a new bucket?

2016-12-02 Thread Yehuda Sadeh-Weinraub
On Fri, Dec 2, 2016 at 3:18 AM, Yang Joseph  wrote:
> Hello,
>
> I would like to only allow the user to read objects in an already existing
> bucket, and not allow the user to create new buckets. I supposed executing
> the following command would do it:
>
> $ radosgw-admin metadata put user:test3 < ...
>   ...
> "caps": [
> {
> "type": "buckets",
> "perm": "read"
> }
>
> But why can user test3 still create new buckets after I have set its caps to
> "buckets=read"?
>


Because this cap is unrelated. IIRC, starting with Jewel you can do:

$ radosgw-admin user modify --uid=test3 --max-buckets=-1

Yehuda
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Migrate OSD Journal to SSD

2016-12-02 Thread Reed Dier

> On Dec 1, 2016, at 6:26 PM, Christian Balzer  wrote:
> 
> On Thu, 1 Dec 2016 18:06:38 -0600 Reed Dier wrote:
> 
>> Apologies if this has been asked dozens of times before, but most answers 
>> are from pre-Jewel days, and want to double check that the methodology still 
>> holds.
>> 
> It does.
> 
>> Currently have 16 OSD’s across 8 machines with on-disk journals, created 
>> using ceph-deploy.
>> 
>> These machines have NVMe storage (Intel P3600 series) for the system volume, 
>> and am thinking about carving out a partition for SSD journals for the 
>> OSD’s. The drives don’t make tons of use of the local storage, so should 
>> have plenty of io overhead to support the OSD journaling, as well as the 
>> P3600 should have the endurance to handle the added write wear.
>> 
> Slight disconnect there, money for a NVMe (which size?) and on disk
> journals? ^_-

NVMe was already in place before the ceph project began. 400GB P3600, with 
~275GB available space after swap partition.

>> From what I’ve read, you need a partition per OSD journal, so with the 
>> probability of a third (and final) OSD being added to each node, I should 
>> create 3 partitions, each ~8GB in size (is this a good value? 8TB OSD’s, is 
>> the journal size based on size of data or number of objects, or something 
>> else?).
>> 
> Journal size is unrelated to the OSD per se, with default parameters and
> HDDs for OSDs a size of 10GB would be more than adequate, the default of
> 5GB would do as well.

I was under the impression that it was agnostic to either metric, but figured I 
should ask while I had the chance.

>> So:
>> {create partitions}
>> set noout
>> service ceph stop osd.$i
>> ceph-osd -i osd.$i —flush-journal
>> rm -f rm -f /var/lib/ceph/osd//journal
> Typo and there should be no need for -f. ^_^
> 
>> ln -s  /var/lib/ceph/osd//journal /dev/
> Even though in your case with a single(?) NVMe there is little chance for
> confusion, ALWAYS reference to devices by their UUID or similar, I prefer
> the ID:
> ---
> lrwxrwxrwx   1 root root44 May 21  2015 journal -> 
> /dev/disk/by-id/wwn-0x55cd2e404b73d570-part4
> —

Correct, would reference by UUID.
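
For the record, the stable device names can be listed with something like:

ls -l /dev/disk/by-id/ | grep nvme
ls -l /dev/disk/by-partuuid/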

Thanks again for the sanity check.

Reed

> 
>> ceph-osd -i osd.$i -mkjournal
>> service ceph start osd.$i
>> ceph osd unset noout
>> 
>> Does this logic appear to hold up?
>> 
> Yup.
> 
> Christian
> 
>> Appreciate the help.
>> 
>> Thanks,
>> 
>> Reed
> 
> -- 
> Christian Balzer        Network/Systems Engineer
> ch...@gol.com Global OnLine Japan/Rakuten Communications
> http://www.gol.com/

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] node and its OSDs down...

2016-12-02 Thread David Turner
If you want to reweight only once when you have a failed disk that is being 
balanced off of, set the crush weight for that osd to 0.0.  Then when you fully 
remove the disk from the cluster it will not do any additional backfilling.  
Any change to the crush map will likely move data around, even if you're 
removing an already "removed" osd.
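
A rough sketch of that sequence, using osd.227 from the tree below as an example:

ceph osd crush reweight osd.227 0.0    # drain it once; wait for backfill to finish
ceph osd out 227
ceph osd crush remove osd.227          # no further data movement expected now
ceph auth del osd.227
ceph osd rm 227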



David Turner | Cloud Operations Engineer | StorageCraft Technology Corporation
380 Data Drive Suite 300 | Draper | Utah | 84020
Office: 801.871.2760 | Mobile: 385.224.2943



If you are not the intended recipient of this message or received it 
erroneously, please notify the sender and delete it, together with any 
attachments, and be advised that any dissemination or copying of this message 
is prohibited.




From: M Ranga Swami Reddy [swamire...@gmail.com]
Sent: Thursday, December 01, 2016 11:45 PM
To: David Turner
Cc: ceph-users
Subject: Re: [ceph-users] node and its OSDs down...

Hi David - Yep, I did the "ceph osd crush remove osd.", which started the 
recovery.
My worry is: why is Ceph doing recovery if an OSD is already down and no 
longer in the cluster? That means ceph has already copied the down OSD's 
objects to other OSDs. Here is the ceph osd tree output:
===
227  0.91  osd.227  down  0
250  0.91  osd.250  down  0
===

So, to avoid the recovery/rebalance, can I set the weight of the OSD (which was 
in the down state)? But does this weight setting also lead to rebalance activity?

Thanks
Swami


On Thu, Dec 1, 2016 at 8:07 PM, David Turner 
mailto:david.tur...@storagecraft.com>> wrote:

I assume you also did ceph osd crush remove osd..  When you removed the osd 
that was down/out and balanced off of, you changed the weight of the host that 
it was on which triggers additional backfilling to balance the crush map.



David Turner | Cloud Operations Engineer | StorageCraft Technology Corporation
380 Data Drive Suite 300 | Draper | Utah | 84020
Office: 801.871.2760 | Mobile: 385.224.2943



If you are not the intended recipient of this message or received it 
erroneously, please notify the sender and delete it, together with any 
attachments, and be advised that any dissemination or copying of this message 
is prohibited.




From: ceph-users 
[ceph-users-boun...@lists.ceph.com] 
on behalf of M Ranga Swami Reddy 
[swamire...@gmail.com]
Sent: Thursday, December 01, 2016 3:03 AM
To: ceph-users
Subject: [ceph-users] node and its OSDs down...

Hello,
One of my ceph nodes with 20 OSDs went down... After a couple of hours, ceph 
health is in the OK state.

Now, I tried to remove those OSDs, which were in the down state, from the ceph 
cluster using "ceph osd remove osd.",
and then the ceph cluster started rebalancing... which is strange, because those 
OSDs have been down for a long time and health is also OK.
My question: why did recovery or rebalance start when I removed the OSD (which 
was down)?

Thanks
Swami

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to create two isolated rgw services in one ceph cluster?

2016-12-02 Thread Abhishek L

piglei writes:

> Hi, I am a ceph newbie. I want to create two isolated rgw services in a
> single ceph cluster, with these requirements:
>
> * The two radosgw instances will have different hosts, such as radosgw-x.site.com
> and radosgw-y.site.com. Files uploaded to rgw-x cannot be accessed via rgw-y.
> * Isolated bucket and user namespaces are not necessary, because I could
> prepend a term to bucket names and user names, like "x-bucket" or "y-bucket"
>
> At first I thought region and zone might be the solution, but after a little
> more research, I found that regions and zones are for different geo locations;
> they share the same metadata (buckets and users) and objects instead of
> keeping isolated copies.
>
> After that I noticed ceph's multi-tenancy feature, available since the jewel
> release, which is probably what I'm looking for; here is my solution using
> multi-tenancy:
>
> * Use two tenants called x and y; each rgw service matches one tenant.
> * Limit incoming requests to the rgw in its own tenant, which means you can only
> retrieve resources belonging to buckets "x:bucket" when calling
> radosgw-x.site.com. This can be achieved by some custom nginx rules.
>
> Is this the right approach, or should I just use two different clusters
> instead? Looking forward to your advice.
>

Since jewel, you can also consider looking into realms which sort of
provide for isolated namespaces within a zone or zonegroup.
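
Very roughly, the moving parts look something like this (all names are
placeholders, and each radosgw instance is then pointed at its own realm/zone
via rgw_realm / rgw_zonegroup / rgw_zone in its ceph.conf section):

radosgw-admin realm create --rgw-realm=realm-x --default
radosgw-admin zonegroup create --rgw-zonegroup=zg-x --rgw-realm=realm-x --master --default
radosgw-admin zone create --rgw-zonegroup=zg-x --rgw-zone=zone-x --master --default
radosgw-admin period update --commit
# ...and the same again for a realm-y / zg-y / zone-y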

--
Abhishek
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] How to create two isolated rgw services in one ceph cluster?

2016-12-02 Thread piglei
Hi, I am a ceph newbie. I want to create two isolated rgw services in a
single ceph cluster, with these requirements:

   - The two radosgw instances will have different hosts, such as radosgw-x.site.com
   and radosgw-y.site.com. *Files uploaded to rgw-x cannot be accessed via rgw-y.*
   - Isolated bucket and user namespaces are not necessary, because I could
   prepend a term to bucket names and user names, like "x-bucket" or "y-bucket"

At first I thought region and zone might be the solution, but after a little
more research, I found that regions and zones are for different geo
locations; they share the same metadata (buckets and users) and objects
instead of keeping isolated copies.

After that I noticed ceph's multi-tenancy feature, available since the jewel
release, which is probably what I'm looking for; here is my solution using
multi-tenancy:

   - Use two tenants called x and y; each rgw service matches one tenant.
   - Limit incoming requests to the rgw in its own tenant, which means you can
   only retrieve resources belonging to buckets "x:bucket" when calling
   radosgw-x.site.com. This can be achieved by some custom nginx rules.

Is this the right approach, or should I just use two different clusters
instead? Looking forward to your advice.

Thank you!
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rbd_default_features

2016-12-02 Thread Ilya Dryomov
On Thu, Dec 1, 2016 at 10:31 PM, Florent B  wrote:
> Hi,
>
> On 12/01/2016 10:26 PM, Tomas Kukral wrote:
>>
>> I wasn't successful trying to find table with indexes of features ...
>> does anybody know?
>
> In sources :
> https://github.com/ceph/ceph/blob/master/src/include/rbd/features.h

There is a ticket for it with a nice description [1], but it looks like
it hasn't made it to the upstream docs...

"The features can be specified via the command-line when creating images
or the default features can be specified in the Ceph config file via
'rbd_default_features = '

- Layering: Layering enables you to use cloning
  Config numeric value: 1
  CLI value: layering
- Striping v2: Striping spreads data across multiple objects. Striping
  helps with parallelism for sequential read/write workloads.
  Config numeric value: 2
  CLI value: striping
- Exclusive locking: When enabled, it requires a client to get a lock
  on an object before making a write. Exclusive lock should only be
  enabled when a single client is accessing an image at the same time.
  Config numeric value: 4
  CLI value: exclusive-lock
- Object map: Object map support depends on exclusive lock support.
  Block devices are thin provisioned -- meaning, they only store data
  that actually exists. Object map support helps track which objects
  actually exist (have data stored on a drive). Enabling object map
  support speeds up I/O operations for cloning; importing and exporting
  a sparsely populated image; and deleting.
  Config numeric value: 8
  CLI value: object-map
- Fast-diff: Fast-diff support depends on object map support and
  exclusive lock support. It adds another property to the object map,
  which makes it much faster to generate diffs between snapshots of an
  image, and the actual data usage of a snapshot much faster.
  Config numeric value: 16
  CLI value: fast-diff
- Deep-flatten: Deep-flatten makes rbd flatten work on all the
  snapshots of an image, in addition to the image itself. Without it,
  snapshots of an image will still rely on the parent, so the parent
  will not be delete-able until the snapshots are deleted. Deep-flatten
  makes a parent independent of its clones, even if they have
  snapshots.
  Config numeric value: 32
  CLI value: deep-flatten
- Journaling: Journaling support depends on exclusive lock support.
  Journaling records all modifications to an image in the order they
  occur. RBD mirroring utilizes the journal to replicate a crash
  consistent image to a remote cluster.
  Config numeric value: 64
  CLI value: journaling"

[1] http://tracker.ceph.com/issues/15000
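
As a quick illustration of the numeric form (my example, not from the tracker):
layering (1) + exclusive-lock (4) + object-map (8) = 13, so the equivalent of
`--image-feature layering,exclusive-lock,object-map` as a ceph.conf default
would be roughly:

[client]
rbd default features = 13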

Thanks,

Ilya
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Is there a setting on Ceph that we can use to fix the minimum read size?

2016-12-02 Thread Thomas Bennett
Hi Steve and Kate,

Thanks again for the great suggestions.

Increasing the allocsize did not help us in the situation relating to my
current testing (poor read performance). However, allocsize is a great
parameter for overall performance tuning and I intend to use it. :)

After discussion with colleagues and reading an article on the Ubuntu drive I/O
scheduler, I decided to try out the cfq I/O scheduler - Ubuntu now defaults to
deadline.

This made a significant difference - it actually doubled the overall read
performance.

I suggest anyone using Ubuntu 14.04 or higher and high-density OSD nodes
(we have 48 OSDs per OSD node) might like to test out cfq. It's also a
pretty easy test to perform :) and can be done on the fly.
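
For anyone who wants to try it, the scheduler can be checked and switched per
device on the fly (sda below is a placeholder for each OSD's data disk):

cat /sys/block/sda/queue/scheduler
echo cfq | sudo tee /sys/block/sda/queue/scheduler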

Cheers,
Tom

On Wed, Nov 30, 2016 at 5:50 PM, Steve Taylor  wrote:

> We’re using Ubuntu 14.04 on x86_64. We just added ‘osd mount options xfs =
> rw,noatime,inode64,allocsize=1m’ to the [osd] section of our ceph.conf so
> XFS allocates 1M blocks for new files. That only affected new files, so
> manual defragmentation was still necessary to clean up older data, but once
> that was done everything got better and stayed better.
>
>
>
> You can use the xfs_db command to check fragmentation on an XFS volume and
> xfs_fsr to perform a defragmentation. The defragmentation can run on a
> mounted filesystem too, so you don’t even have to rely on Ceph to avoid
> downtime. I probably wouldn’t run it everywhere at once though for
> performance reasons. A single OSD at a time would be ideal, but that’s a
> matter of preference.
>
>
>
> *From:* ceph-users [mailto:ceph-users-boun...@lists.ceph.com] *On Behalf
> Of *Thomas Bennett
> *Sent:* Wednesday, November 30, 2016 5:58 AM
>
> *Cc:* ceph-users@lists.ceph.com
> *Subject:* Re: [ceph-users] Is there a setting on Ceph that we can use to
> fix the minimum read size?
>
>
>
> Hi Kate and Steve,
>
>
>
> Thanks for the replies. Always good to hear back from a community :)
>
>
>
> I'm using Linux on x86_64 architecture and the block size is limited to
> the page size which is 4k. So it looks like I'm hitting hard limits in any
> changes. to increase the block size.
>
>
>
> I found this out by running the following command:
>
>
>
> $ mkfs.xfs -f -b size=8192 /dev/sda1
>
>
>
> $ mount -v /dev/sda1 /tmp/disk/
>
> mount: Function not implemented #huh???
>
>
>
> Checking out the man page:
>
>
>
> $ man mkfs.xfs
>
>  -b block_size_options
>
>   ... XFS  on  Linux  currently  only  supports pagesize or smaller
> blocks.
>
>
>
> I'm hesitant to implement btrfs as its still experimental and ext4 seems
> to have the same current limitation.
>
>
>
> Our current approach is to exclude the hard drive that we're getting the
> poor read rates from our procurement process, but it would still be nice to
> find out how much control we have over how ceph-osd  daemons read from the
> drives. I may attempts a strace on an osd daemon as we read to see what the
> actual read request size is being asked to the kernel.
>
>
>
> Cheers,
>
> Tom
>
>
>
>
>
> On Tue, Nov 29, 2016 at 11:53 PM, Steve Taylor <
> steve.tay...@storagecraft.com> wrote:
>
> We configured XFS on our OSDs to use 1M blocks (our use case is RBDs with
> 1M blocks) due to massive fragmentation in our filestores a while back. We
> were having to defrag all the time and cluster performance was noticeably
> degraded. We also create and delete lots of RBD snapshots on a daily basis,
> so that likely contributed to the fragmentation as well. It’s been MUCH
> better since we switched XFS to use 1M allocations. Virtually no
> fragmentation and performance is consistently good.
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] New to ceph - error running create-initial

2016-12-02 Thread Oleg Kolosov
Hi
Thank you for your answer. Unfortunately, neither workaround solved the
issue:
1) Pushing the admin key onto the mon:
ubuntu@ip-172-31-38-183:~/my-cluster$ ceph-deploy --username ubuntu admin
mon1
[ceph_deploy.conf][DEBUG ] found configuration file at:
/home/ubuntu/.cephdeploy.conf
[ceph_deploy.cli][INFO  ] Invoked (1.5.36): /usr/local/bin/ceph-deploy
--username ubuntu admin mon1
[ceph_deploy.cli][INFO  ] ceph-deploy options:
[ceph_deploy.cli][INFO  ]  username  : ubuntu
[ceph_deploy.cli][INFO  ]  verbose   : False
[ceph_deploy.cli][INFO  ]  overwrite_conf: False
[ceph_deploy.cli][INFO  ]  quiet : False
[ceph_deploy.cli][INFO  ]  cd_conf   :

[ceph_deploy.cli][INFO  ]  cluster   : ceph
[ceph_deploy.cli][INFO  ]  client: ['mon1']
[ceph_deploy.cli][INFO  ]  func  : 
[ceph_deploy.cli][INFO  ]  ceph_conf : None
[ceph_deploy.cli][INFO  ]  default_release   : False
[ceph_deploy][ERROR ] RuntimeError: ceph.client.admin.keyring not found

2) Using the jewel build on a fresh instance - the same issue occurred:

sudo /usr/bin/ceph --connect-timeout=25 --cluster=ceph --name mon.
--keyring=/var/lib/ceph/mon/ceph-mon1/keyring auth get client.admin


2016-12-02 12:18:46.890561 7f2b50d27700  1 librados: starting msgr at :/0
2016-12-02 12:18:46.890704 7f2b50d27700  1 librados: starting objecter
2016-12-02 12:18:46.890856 7f2b50d27700  1 librados: setting wanted keys
2016-12-02 12:18:46.890909 7f2b50d27700  1 librados: calling monclient init
Traceback (most recent call last):
  File "/usr/bin/ceph", line 948, in 
retval = main()
  File "/usr/bin/ceph", line 852, in main
prefix='get_command_descriptions')
  File "/usr/lib/python2.7/dist-packages/ceph_argparse.py", line 1300, in
json_command
raise RuntimeError('"{0}": exception {1}'.format(argdict, e))
RuntimeError: "None": exception "['{"prefix":
"get_command_descriptions"}']": exception You cannot perform that operation
on a Rados object in state configuring.



Thanks,
Oleg

On Tue, Nov 29, 2016 at 10:05 PM, Vasu Kulkarni  wrote:

> If you are using 'master' build there is an issue
>
> workaround 1)
> before mon create-initial just run 'ceph-deploy admin mon-node' to push
> the admin key on mon nodes and then rerun mon create-initial
>
> 2) or use jewel build which is stable and if you dont need latest master
> ceph-deploy install --stable=jewel node1 node2
>
> On Tue, Nov 29, 2016 at 11:45 AM, Oleg Kolosov  wrote:
>
>> Hi
>> I've recently started working with ceph for a university project I have.
>> I'm working on Amazon EC2 servers.
>>
>> I've used 4 instances: one is admin/mon + 3 OSDs.
>> Right from the start I've encountered a problem. When running the
>> following command:
>> ceph-deploy --username ubuntu mon create-initial
>>
>> I've got the following errors:
>>
>> [mon1][INFO  ] Running command: sudo /usr/bin/ceph --connect-timeout=25
>> --cluster=ceph --admin-daemon=/var/run/ceph/ceph-mon.mon1.asok mon_status
>> [mon1][INFO  ] Running command: sudo /usr/bin/ceph --connect-timeout=25
>> --cluster=ceph --name mon. --keyring=/var/lib/ceph/mon/ceph-mon1/keyring
>> auth get client.admi
>>
>> [mon1][ERROR ] "ceph auth get-or-create for keytype admin returned 1
>> [mon1][DEBUG ] Traceback (most recent call last):
>> [mon1][DEBUG ]   File "/usr/bin/ceph", line 948, in 
>> [mon1][DEBUG ] retval = main()
>> [mon1][DEBUG ]   File "/usr/bin/ceph", line 852, in main
>> [mon1][DEBUG ] prefix='get_command_descriptions')
>> [mon1][DEBUG ]   File "/usr/lib/python2.7/dist-packages/ceph_argparse.py",
>> line 1300, in json_command
>> [mon1][DEBUG ] raise RuntimeError('"{0}": exception
>> {1}'.format(argdict, e))
>> [mon1][DEBUG ] RuntimeError: "None": exception "['{"prefix":
>> "get_command_descriptions"}']": exception You cannot perform that
>> operation on a Rados object in state configuring.
>>
>>
>>
>> I've also tried to use "ceph-deploy install", but got the same error when
>> did "gatherkeys".
>>
>> I'd appreciate your help.
>>
>> Thanks,
>> Oleg
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] renaming ceph server names

2016-12-02 Thread Peter Maloney
On 12/02/16 12:33, Peter Maloney wrote:
> # last section on the other mons (using the file produced on
> the first)
> # repeat on each monitor node
> ceph-mon --cluster newname -i newhostname --inject-monmap
> /tmp/monmap

Correction: do that on all mons.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] renaming ceph server names

2016-12-02 Thread Peter Maloney
I did something like this the other day on a test cluster... can't
guarantee the same results, but it worked for me. I don't see an
official procedure documented anywhere. I didn't have mds or radosgw. (I
also renamed the cluster at the same time... I omitted those steps)

assuming services are stopped, and assuming your cluster is named "ceph"
(the default):

things to change:
/etc/ceph/ceph.conf (deploy everywhere)
change hostnames here
rename dirs (repeat on each mon)
/var/lib/ceph/mon/ceph-oldhostname ->
/var/lib/ceph/mon/ceph-newhostname
also check mds, etc.
monmap (mon nodes)
# first section on just one mon
# here newhostname matches the dir name
/var/lib/ceph/mon/ceph-newhostname
ceph-mon --cluster ceph -i oldhostname --extract-monmap /tmp/monmap
monmaptool --print /tmp/monmap

# repeat for each
monmaptool --rm oldname1 /tmp/monmap
   
# repeat for each
monmaptool --add newname1 ipgoeshere:6789 /tmp/monmap
   
monmaptool --print /tmp/monmap
   
# last section on the other mons (using the file produced on the
first)
# repeat on each monitor node
ceph-mon --cluster newname -i newhostname --inject-monmap
/tmp/monmap

In theory something should be done about renaming the auth keys ...
ceph auth ..

but I didn't do that, and don't see any auth keys..mine has some
bootstrap ones. I don't know if that's standard or not.
If you had to do that, maybe copying them first, then removing after is
best. Or run the cluster with cephx disabled temporarily to fix it.

Then start mons only
   
This is if you renamed some osd host names, and requires running mons:
# output to compare to later
ceph osd tree

# I didn't do this step...but I think this ought to be right
ceph osd crush rename-bucket oldname newname
   
#verify it looks right now
ceph osd tree

# if it looks wrong, like let's say now you have extra hosts
leftover (which might happen if you start osds before renaming)... use
rename-bucket or rm
# ceph osd crush rm "oldname"

Then start osds.

If you have mds servers you renamed, there's auth for that...rename
those clients probably. That means the /var/lib/ceph/mds/... dirs, and
maybe the client name inside the keyring there. I don't know this step.
And no idea about radosgw.

Test on a test cluster first. And I have no idea if it will result in
data movement. You may want to prepare by `ceph osd set norecover`, and
set maxbackfills, etc. too.


On 12/02/16 12:08, Andrei Mikhailovsky wrote:
> *BUMP*
> 
>
> *From: *"andrei" 
> *To: *"ceph-users" 
> *Sent: *Tuesday, 29 November, 2016 12:46:05
> *Subject: *[ceph-users] renaming ceph server names
>
> Hello.
>
> As a part of the infrastructure change we are planning to rename
> the servers running ceph-osd, ceph-mon and radosgw services. The
> IP addresses will be the same, it's only the server names which
> will need to change.
>
> I would like to find out the steps required to perform these
> changes? Would it be as simple as changing the /etc/hostname,
> /etc/hosts files and changing the radosgw info in
> /etc/ceph/ceph.conf and performing a server reboot? Would all the
> ceph services start okay after the name change? If not, what are
> the proper steps in changing the hostnames? We have a very small
> cluster (3 physical servers running ceph-mon/osd and two of
> theservers are running radosgw service), so we can't really remove
> the servers from the cluster.
>
> Many thanks for your help and ideas
>
> Andrei
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


-- 


Peter Maloney
Brockmann Consult
Max-Planck-Str. 2
21502 Geesthacht
Germany
Tel: +49 4152 889 300
Fax: +49 4152 889 333
E-mail: peter.malo...@brockmann-consult.de
Internet: http://www.brockmann-consult.de


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] rgw: how to prevent rgw user from creating a new bucket?

2016-12-02 Thread Yang Joseph

Hello,

I would like to only allow the user to read objects in an already existing 
bucket, and not allow the user to create new buckets. I supposed executing the 
following command would do it:

$ radosgw-admin metadata put user:test3 < ...
  ...
"caps": [
{
"type": "buckets",
"perm": "read"
}

But why can user test3 still create new buckets after I have set its caps 
to "buckets=read"?


thx,

Yang Honggang

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] renaming ceph server names

2016-12-02 Thread Andrei Mikhailovsky
*BUMP* 

> From: "andrei" 
> To: "ceph-users" 
> Sent: Tuesday, 29 November, 2016 12:46:05
> Subject: [ceph-users] renaming ceph server names

> Hello.

> As a part of the infrastructure change we are planning to rename the servers
> running ceph-osd, ceph-mon and radosgw services. The IP addresses will be the
> same, it's only the server names which will need to change.

> I would like to find out the steps required to perform these changes? Would it
> be as simple as changing the /etc/hostname, /etc/hosts files and changing the
> radosgw info in /etc/ceph/ceph.conf and performing a server reboot? Would all
> the ceph services start okay after the name change? If not, what are the 
> proper
> steps in changing the hostnames? We have a very small cluster (3 physical
> servers running ceph-mon/osd and two of theservers are running radosgw
> service), so we can't really remove the servers from the cluster.

> Many thanks for your help and ideas

> Andrei

> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Sandisk SSDs

2016-12-02 Thread Matteo Dacrema
Hi All,

Has anyone ever used or tested the SanDisk CloudSpeed Eco II 1.92TB with Ceph?
I know they are rated at 0.6 DWPD, which with the journal on the same device 
becomes only 0.3 DWPD, meaning about 560GB of data per day over 5 years.
What I need to know about is the performance side.

Thanks
Matteo



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] mds reconnect timeout

2016-12-02 Thread Xusangdi
Hi John,

In our environment we want to deploy the MDS and the cephfs client on the same 
node (users actually use cifs/nfs to access the ceph storage). However, it takes 
a long time to recover if the node with the active MDS fails, and a large part 
of that time is the new MDS waiting for all clients to reconnect.
The `mds_reconnect_timeout` is set to 45s by default, but in our experimental 
cluster the reconnection with a normal client is usually quite fast (<100ms), 
so IMHO we might configure it to a much smaller value like 5s, but I'm not sure 
whether that would be safe. Any suggestions? Is there any concern I missed that 
motivates such a large default value?
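
What I have in mind is simply something like the following in ceph.conf, if it 
is considered safe:

[mds]
mds reconnect timeout = 5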

Regards,
--- Sandy
-
This e-mail and its attachments contain confidential information from H3C, 
which is
intended only for the person or entity whose address is listed above. Any use 
of the
information contained herein in any way (including, but not limited to, total 
or partial
disclosure, reproduction, or dissemination) by persons other than the intended
recipient(s) is prohibited. If you receive this e-mail in error, please notify 
the sender
by phone or email immediately and delete it!
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] radosgw leaked orphan objects

2016-12-02 Thread Marius Vaitiekunas
Hi Cephers,

I would like to ask more about this bug:
https://bugzilla.redhat.com/show_bug.cgi?id=1254398

On our backup cluster we've run a search for leaked objects:
# radosgw-admin orphans find --pool=.rgw.buckets --job-id=bck1

The result is 131288. Before running radosgw-admin orphans finish, I would
like to know other cephers' experience. Has anybody tried to delete leaked
objects? How did it go?

Maybe Yehuda, as the author, could give us some confidence about this script,
because our production cluster has 35TB of data which is probably leaked.
We've counted the usage in all of our buckets and compared it to the rgw
buckets pool usage. The pool is 60TB in size and all the buckets take only
25TB. We would like to get these 35TB back :)

How safe is it to run the leaked-object deletion? Any horror stories?

-- 
Marius Vaitiekūnas
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com