Re: [ceph-users] Micro Ceph summit during the OpenStack summit

2014-10-13 Thread Jonathan D. Proulx

There's also a ceph-related session proposed for the 'Ops meetup'
track.  The track itself has several rooms over two days, though the
schedule isn't finalized yet.

I believe there's still space for more working groups if anyone
wants to set up an ops-focused ceph working group in addition to the
dev stuff mentioned.

https://etherpad.openstack.org/p/PAR-ops-meetup

-Jon


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Deep Scrub distribution

2018-03-05 Thread Jonathan D. Proulx
Hi All,

I've recently noticed my deep scrubs are EXTREMELY poorly
distributed.  They are staying within the 18->06 local-time start/stop
window, but are not spread over enough days, nor well distributed
over the range of days they do cover.

root@ceph-mon0:~# for date in `ceph pg dump | awk '/active/{print $20}'`; do 
date +%D -d $date; done | sort | uniq -c
dumped all
  1 03/01/18
  6 03/03/18
   8358 03/04/18
   1875 03/05/18
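The same tally can be taken with a single awk, avoiding one `date` fork
per PG (a sketch; it assumes LAST_DEEP_SCRUB is still column 20, as in
the loop above, but the column position shifts between Ceph releases, so
check it against the `ceph pg dump` header row first):

```shell
# Bucket PGs by deep-scrub date (ISO form) in one pass.
summarize_scrub_dates() {
    awk '/active/ { print substr($20, 1, 10) }' | sort | uniq -c
}
# usage: ceph pg dump 2>/dev/null | summarize_scrub_dates
```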

So very nearly all 10240 PGs scrubbed last night/this morning.  I've
been kicking this around for a while, since I noticed poor distribution
over a 7 day range when I was really pretty sure I'd changed that from
the 7d default to 28d.

Tried kicking it out to 42 days about a week ago with:

ceph tell osd.* injectargs '--osd_deep_scrub_interval 3628800'


There were many errors suggesting it could not reread the change and I'd
need to restart the OSDs, but 'ceph daemon osd.0 config show |grep
osd_deep_scrub_interval' showed the right value, so I let it roll for a
week; the scrubs did not spread out.

So Friday I set that value in ceph.conf and did rolling restarts of
all OSDs, then double-checked the running value on all daemons.
Checking Sunday, the nightly deep scrubs (based on the LAST_DEEP_SCRUB
voodoo above) showed near enough 1/42nd of PGs had been scrubbed
Saturday night that I thought this was working.

This morning I checked again and got the results above.

I would expect after changing to a 42d scrub cycle I'd see approx 1/42
of the PGs deep scrub each night until there was a roughly even
distribution over the past 42 days.

So which thing is broken my config or my expectations?

-Jon



Re: [ceph-users] Hybrid pool speed (SSD + SATA HDD)

2018-03-14 Thread Jonathan D. Proulx
On Wed, Mar 14, 2018 at 09:50:12PM +0100, mart.v wrote:

:   But from what I understood so far, during the writing process the
:   client communicates also only with the primary OSDs but it will wait
:   until all data are written on all replicas. This is my main concern.
:   Does this mean that writing speed (and IOps) will be limited by the
:   slowest HDD?

Yes that is true.

So if you only (or mostly) care about read performance your plan may
be advantageous, but if write performance is important you will be
sad.

Using SSD WAL+DB in front of the spinning disk Data for bluestore (or
SSD journals + HDD filesystems for filestore) can help juice write
performance to an extent. To what extent is an important but AFAIK
unanswered question.
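One hedged way to put a number on "to what extent" for your own hardware
is a single-threaded small-write bench against each kind of pool (pool
names below are placeholders):

```shell
# Compare sync small-write latency: pure-HDD pool vs. SSD-journaled pool.
# --no-cleanup leaves the bench objects so `rados cleanup` can remove them.
rados bench -p hdd-pool 30 write -t 1 -b 4096 --no-cleanup
rados cleanup -p hdd-pool
rados bench -p hybrid-pool 30 write -t 1 -b 4096 --no-cleanup
rados cleanup -p hybrid-pool
```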

-Jon


Re: [ceph-users] Poor read performance.

2018-04-26 Thread Jonathan D. Proulx
On Wed, Apr 25, 2018 at 10:58:43PM +, Blair Bethwaite wrote:
:Hi Jon,
:
:On 25 April 2018 at 21:20, Jonathan Proulx  wrote:
:>
:> here's a snap of 24hr graph form one server (others are similar in
:> general shape):
:>
:> 
https://snapshot.raintank.io/dashboard/snapshot/gB3FDPl7uRGWmL17NHNBCuWKGsXdiqlt
:
:That's what, a median IOPs of about 80? Pretty high for spinning disk.
:I'd guess you're seeing write-choking. You might be able to improve
:things a bit by upping your librbd cache size (though obviously that
:would only have an effect on new or reset instances), also perhaps
:double check your block queue scheduler max_sectors_kb inside a guest
:and make sure you're not splitting up all writes into 512 byte chunks.
:But does kinda look like you need more hardware, and fast.

Those block queue scheduler tips *might* help me squeeze a bit more
till next budget starts July 1...
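For reference, a sketch of the guest-side check Blair suggests (the
device name in the commented write is a placeholder, and the setting is
neither persistent across reboots nor guaranteed helpful on every
workload):

```shell
# Print each block device's current max_sectors_kb (the largest I/O the
# queue will pass down without splitting).
show_max_sectors() {
    for q in /sys/block/*/queue/max_sectors_kb; do
        [ -r "$q" ] || continue
        dev=${q#/sys/block/}
        printf '%s\t%s KB\n' "${dev%%/*}" "$(cat "$q")"
    done
}
show_max_sectors
# To raise a low value (takes effect immediately, does not survive reboot):
#   echo 1024 > /sys/block/vda/queue/max_sectors_kb
```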

Seeing yesterday that I have 75% more VMs running than I thought does
change my perspective a bit and makes the "no, we're really just
crushed" analysis more plausible!

Thanks,
-Jon




Re: [ceph-users] journal or cache tier on SSDs ?

2016-05-11 Thread Jonathan D. Proulx
On Tue, May 10, 2016 at 10:40:08AM +0200, Yoann Moulin wrote:

:RadowGW (S3 and maybe swift for hadoop/spark) will be the main usage. Most of
:the access will be in read only mode. Write access will only be done by the
:admin to update the datasets.

No one seems to have pointed this out, but if your write workload isn't
performance sensitive there's no point in using SSD for journals.

Whether you can/should repurpose them as a cache tier is another issue. I
don't have any experience with that so cannot comment.

But I think you should not use them as journals because each SSD
becomes a single point of failure for multiple OSDs. I'm using
mirrored 3600-series SSDs for journaling, but they're the same
generation and subject to identical write loads, so I'm suspicious
about whether this is useful or just twice as expensive.

There's also additional complexity in deployment and management when you
split off the journals, just because it's a more complex system.  This
part isn't too bad and can mostly be automated away, but if you don't
need the performance, why pay for it?

I too work in an academic research lab, so if you need to keep the
donor happy, by all means decide whichever way works out better for the
system.  Leaving them as journals if a cache tier doesn't fit isn't
likely to cause much harm so long as you're replicating your data and
can survive an SSD loss, but you should do that anyway to survive a
spinning disk loss or storage node loss.

But if I were you my choice would be between caching and moving them
to a non-ceph use.

-Jon


[ceph-users] radosgw hammer -> jewel upgrade (default zone & region config)

2016-05-20 Thread Jonathan D. Proulx
Hi All,

I saw the previous thread on this related to
http://tracker.ceph.com/issues/15597

and Yehuda's fix script
https://raw.githubusercontent.com/yehudasa/ceph/wip-fix-default-zone/src/fix-zone

Running this seems to have landed me in a weird state.

I can create and get new buckets and objects but I've "lost" all my
old buckets.  I'm fairly confident the "lost" data is in the
.rgw.buckets pool but my current zone is set to use .rgw.buckets_


 
root@ceph-mon0:~# radosgw-admin zone get
{
"id": "default",
"name": "default",
"domain_root": ".rgw_",
"control_pool": ".rgw.control_",
"gc_pool": ".rgw.gc_",
"log_pool": ".log_",
"intent_log_pool": ".intent-log_",
"usage_log_pool": ".usage_",
"user_keys_pool": ".users_",
"user_email_pool": ".users.email_",
"user_swift_pool": ".users.swift_",
"user_uid_pool": ".users.uid_",
"system_key": {
"access_key": "",
"secret_key": ""
},
"placement_pools": [
{
"key": "default-placement",
"val": {
"index_pool": ".rgw.buckets.index_",
"data_pool": ".rgw.buckets_",
"data_extra_pool": ".rgw.buckets.extra_",
"index_type": 0
}
}
],
"metadata_heap": "default.rgw.meta",
"realm_id": "a935d12f-14b7-4bf8-a24f-596d5ddd81be"
}


root@ceph-mon0:~# ceph osd pool ls |grep rgw|sort
default.rgw.meta
.rgw
.rgw_
.rgw.buckets
.rgw.buckets_
.rgw.buckets.index
.rgw.buckets.index_
.rgw.control
.rgw.control_
.rgw.gc
.rgw.gc_
.rgw.root
.rgw.root.backup

Should I just adjust the zone to use the pools without trailing
slashes?  I'm a bit lost.  The last output I could see from running the
script didn't seem to indicate any errors (though I lost it to the
scrollback buffer before I noticed the issue).

Tail of output from running script:
https://raw.githubusercontent.com/yehudasa/ceph/wip-fix-default-zone/src/fix-zone

+ radosgw-admin zone set --rgw-zone=default
zone id default{
"id": "default",
"name": "default",
"domain_root": ".rgw_",
"control_pool": ".rgw.control_",
"gc_pool": ".rgw.gc_",
"log_pool": ".log_",
"intent_log_pool": ".intent-log_",
"usage_log_pool": ".usage_",
"user_keys_pool": ".users_",
"user_email_pool": ".users.email_",
"user_swift_pool": ".users.swift_",
"user_uid_pool": ".users.uid_",
"system_key": {
"access_key": "",
"secret_key": ""
},
"placement_pools": [
{
"key": "default-placement",
"val": {
"index_pool": ".rgw.buckets.index_",
"data_pool": ".rgw.buckets_",
"data_extra_pool": ".rgw.buckets.extra_",
"index_type": 0
}
}
],
"metadata_heap": "default.rgw.meta",
"realm_id": "a935d12f-14b7-4bf8-a24f-596d5ddd81be"
}
+ radosgw-admin zonegroup default --rgw-zonegroup=default
+ radosgw-admin zone default --rgw-zone=default
root@ceph-mon0:~# radosgw-admin region get --rgw-zonegroup=default
{
"id": "default",
"name": "default",
"api_name": "",
"is_master": "true",
"endpoints": [],
"hostnames": [],
"hostnames_s3website": [],
"master_zone": "default",
"zones": [
{
"id": "default",
"name": "default",
"endpoints": [],
"log_meta": "false",
"log_data": "false",
"bucket_index_max_shards": 0,
"read_only": "false"}
],
"placement_targets": [
{
"name": "default-placement",
"tags": []
}
],
"default_placement": "default-placement",
"realm_id": "a935d12f-14b7-4bf8-a24f-596d5ddd81be"}

root@ceph-mon0:~# ceph -v
ceph version 10.2.1 (3a66dd4f30852819c1bdaa8ec23c795d4ad77269)

Thanks,
-Jon


Re: [ceph-users] radosgw hammer -> jewel upgrade (default zone & region config)

2016-05-20 Thread Jonathan D. Proulx
On Fri, May 20, 2016 at 09:21:58AM -0700, Yehuda Sadeh-Weinraub wrote:
:On Fri, May 20, 2016 at 9:03 AM, Jonathan D. Proulx  wrote:
:> Hi All,
:>
:> I saw the previous thread on this related to
:> http://tracker.ceph.com/issues/15597
:>
:> and Yehuda's fix script
:> 
https://raw.githubusercontent.com/yehudasa/ceph/wip-fix-default-zone/src/fix-zone
:>
:> Running this seems to have landed me in a weird state.
:>
:> I can create and get new buckets and objects but I've "lost" all my
:> old buckets.  I'm fairly confident the "lost" data is in the
:> .rgw.buckets pool but my current zone is set to use .rgw.buckets_



:> Should I just adjust the zone to use the pools without trailing
:> slashes?  I'm a bit lost.  the last I could see from running the
:
:Yes. The trailing slashes were needed when upgrading for 10.2.0, as
:there was another bug, and I needed to add these to compensate for it.
:I should update the script now to reflect that fix. You should just
:update the json and set the zone appropriately.
:
:Yehuda


That did the trick (though obviously we both meant trailing
underscores '_')
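For the archive, the fix amounts to something like this sketch (zone
and file names as in the thread; review the edited JSON by hand before
setting it back):

```shell
# Dump the current zone config, fix the pool names, and write it back.
radosgw-admin zone get --rgw-zone=default > zone.json
# Edit zone.json by hand, dropping the trailing underscores, e.g.:
#   ".rgw.buckets_"       -> ".rgw.buckets"
#   ".rgw.buckets.index_" -> ".rgw.buckets.index"
radosgw-admin zone set --rgw-zone=default --infile zone.json
```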

Thanks,
-Jon


Re: [ceph-users] Ceph and Openstack

2016-06-14 Thread Jonathan D. Proulx
On Tue, Jun 14, 2016 at 02:15:45PM +0200, Fran Barrera wrote:
:Hi all,
:
:I have a problem integration Glance with Ceph.
:
:Openstack Mitaka
:Ceph Jewel
:
:I've following the Ceph doc (
:http://docs.ceph.com/docs/jewel/rbd/rbd-openstack/) but when I try to list
:or create images, I have an error "Unable to establish connection to
:http://IP:9292/v2/images";, and in the debug mode I can see this:

This suggests that the Glance API service isn't running properly
and probably isn't related to the rbd backend.

You should be able to connect to the glance API endpoint even if the
ceph config is wrong (though you'd probably get 'internal server
errors' if the storage backend isn't set up correctly).

In either case you'll probably get better response on the openstack
lists, but my suggestion would be to try the regular file backend to
verify your glance setup is working, then switch to the rbd backend.
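A quick way to test that, independent of any storage backend (IP is the
placeholder from the error message above):

```shell
# If this fails too, the problem is the glance-api service or the
# network path to it, not the rbd backend.
curl -s http://IP:9292/versions
# To rule the backend out entirely, set in glance-api.conf:
#   [glance_store]
#   stores = file,http
#   default_store = file
# then restart glance-api and retry the image list.
```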

-Jon

:
:2016-06-14 14:02:54.634 2256 DEBUG glance_store.capabilities [-] Store
:glance_store._drivers.rbd.Store doesn't support updating dynamic storage
:capabilities. Please overwrite 'update_capabilities' method of the store to
:implement updating logics if needed. update_capabilities
:/usr/lib/python2.7/dist-packages/glance_store/capabilities.py:98
:
:I've also tried to remove the database and populate again but the same
:error.
:Cinder with Ceph works correctly.
:
:Any suggestions?
:
:Thanks,
:Fran.

:___
:ceph-users mailing list
:ceph-users@lists.ceph.com
:http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




Re: [ceph-users] Ceph and Openstack

2016-06-14 Thread Jonathan D. Proulx
On Tue, Jun 14, 2016 at 05:48:11PM +0200, Iban Cabrillo wrote:
:Hi Jon,
:   Which is the hypervisor used for your Openstack deployment? We have lots
:of troubles with xen until latest libvirt ( in libvirt < 1.3.2 package, RDB
:driver was not supported )

we're using kvm (Ubuntu 14.04, libvirt 1.2.12 )

-Jon

:
:Regards, I
:
:2016-06-14 17:38 GMT+02:00 Jonathan D. Proulx :
:
:> On Tue, Jun 14, 2016 at 02:15:45PM +0200, Fran Barrera wrote:
:> :Hi all,
:> :
:> :I have a problem integration Glance with Ceph.
:> :
:> :Openstack Mitaka
:> :Ceph Jewel
:> :
:> :I've following the Ceph doc (
:> :http://docs.ceph.com/docs/jewel/rbd/rbd-openstack/) but when I try to
:> list
:> :or create images, I have an error "Unable to establish connection to
:> :http://IP:9292/v2/images";, and in the debug mode I can see this:
:>
:> This suggests that the Glance API service isn't running properly
:> and probably isn't related to the rbd backend.
:>
:> You should be able to connect to the glance API endpoint even if the
:> ceph config is wrong (though you'd probably get 'internal server
:> errors' if the storage backend isn't set up correctly).
:>
:> In either case you'll probably get better response on the openstack
:> lists, but my suggestion would be to try the regular file backend to
:> verify your glance setup is working, then switch to the rbd backend.
:>
:> -Jon
:>
:> :
:> :2016-06-14 14:02:54.634 2256 DEBUG glance_store.capabilities [-] Store
:> :glance_store._drivers.rbd.Store doesn't support updating dynamic storage
:> :capabilities. Please overwrite 'update_capabilities' method of the store
:> to
:> :implement updating logics if needed. update_capabilities
:> :/usr/lib/python2.7/dist-packages/glance_store/capabilities.py:98
:> :
:> :I've also tried to remove the database and populate again but the same
:> :error.
:> :Cinder with Ceph works correctly.
:> :
:> :Any suggestions?
:> :
:> :Thanks,
:> :Fran.
:>
:> :___
:> :ceph-users mailing list
:> :ceph-users@lists.ceph.com
:> :http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
:>
:>
:> --
:> ___
:> ceph-users mailing list
:> ceph-users@lists.ceph.com
:> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
:>
:
:
:
:-- 
:
:Iban Cabrillo Bartolome
:Instituto de Fisica de Cantabria (IFCA)
:Santander, Spain
:Tel: +34942200969
:PGP PUBLIC KEY:
:http://pgp.mit.edu/pks/lookup?op=get&search=0xD9DF0B3D6C8C08AC
:
:Bertrand Russell:
:*"El problema con el mundo es que los estúpidos están seguros de todo y los
:inteligentes están llenos de dudas*"



Re: [ceph-users] network architecture questions

2018-09-18 Thread Jonathan D. Proulx
On Tue, Sep 18, 2018 at 12:33:21PM -0700, solarflow99 wrote:
:Hi, anyone able to answer these few questions?

I'm not using CephFS but for RBD (my primary use case) clients also
access OSDs directly.

I use separate cluster and public networks mainly so replication
bandwidth and client bandwidth don't compete. Though I wouldn't call
this necessary.
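For reference, that split is just two lines in ceph.conf (the subnets
below are placeholders for your own):

```ini
[global]
public network  = 192.168.1.0/24   # client <-> mon/osd traffic
cluster network = 192.168.2.0/24   # osd <-> osd replication/recovery
```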

-Jon

:
:
:On Mon, Sep 17, 2018 at 4:13 PM solarflow99  wrote:
:
:> Hi, I read through the various documentation and had a few questions:
:>
:> - From what I understand cephFS clients reach the OSDs directly, does the
:> cluster network need to be opened up as a public network?
:>
:> - Is it still necessary to have a public and cluster network when the
:> using cephFS since the clients all reach the OSD's directly?
:>
:> - Simplest way to do HA on the mons for providing NFS, etc?
:>

:___
:ceph-users mailing list
:ceph-users@lists.ceph.com
:http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




Re: [ceph-users] hardware heterogeneous in same pool

2018-10-03 Thread Jonathan D. Proulx
On Wed, Oct 03, 2018 at 07:09:30PM -0300, Bruno Carvalho wrote:
:Hi Cephers, I would like to know how you are growing the cluster.
:
:Using dissimilar hardware in the same pool or creating a pool for each
:different hardware group.
:
:What problem would I have many problems using different hardware (CPU,
:memory, disk) in the same pool?

I've been growing with new hardware in old pools.

Due to the way RBD gets smeared across the disks, your performance is
almost always bottlenecked by the slowest storage location.

If you're just adding slightly newer slightly faster hardware this is
OK as most of the performance gain in that case is from spreading
wider not so much the individual drive performance.

But if you are adding a faster technology like going from
spinning disk to ssd you do want to think about how to transition.

I recently added SSD to a previously all-HDD cluster (well, HDD data
with SSD WAL/DB).  For this I did fiddle with crush rules.  First I made
the existing rules require HDD-class devices, which should have been a
no-op in my mind but actually moved 90% of my data.  The folks at CERN
made a similar discovery before me and even (I think) worked out a way
to avoid it; see
http://lists.ceph.com/pipermail/ceph-large-ceph.com/2018-June/000113.html

After that I made new rules that took one SSD and two HDDs for each
replica set (in addition to spreading across racks or servers or
whatever), and after applying the new rule to the pools I use for Nova
ephemeral storage and Cinder volumes I set the SSD OSDs to have high
"primary affinity" and the HDDs to have low "primary affinity".

In the end this means the SSDs serve reads and writes, while writes to
the HDD replicas are buffered by the SSD WAL, so both reads and writes
are relatively fast (we'd previously been suffering on reads due to
IO load).
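The primary-affinity step looks roughly like this (OSD ids are
placeholders, and pre-Luminous clusters also need
'mon osd allow primary affinity = true' before the weights take effect):

```shell
# Prefer the SSD OSDs as primaries so they serve reads...
for id in 100 101 102; do
    ceph osd primary-affinity "osd.$id" 1.0
done
# ...and demote the HDD OSDs so they mostly hold replicas.
for id in 0 1 2; do
    ceph osd primary-affinity "osd.$id" 0.1
done
```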

I left Glance images on HDD only, as those don't require much
performance in my world; same with RGW object storage, though for some
that may be performance sensitive.

The plan forward is more SSD to replace HDD, probably by first
getting enough to transition ephemeral drives, then a set to move
block storage, then the rest over the next year or two.

The mixed SSD/HDD was a big win for us though so we're happy with that
for now.

scale matters with this so we have:
245 OSDs in 12 servers
627 TiB RAW storage (267 TiB used)
19.44 M objects


hope that helps,
-Jon