Re: [ceph-users] Nautilus:14.2.2 Legacy BlueStore stats reporting detected

2019-07-21 Thread nokia ceph
Thank you  Paul Emmerich
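For anyone else hitting this warning after the 14.2.2 upgrade, it can either be
muted or cleared for good by repairing each OSD.  A rough sketch (the OSD id is
a placeholder, and the repair must be run while that OSD is stopped):

  # mute the warning cluster-wide at runtime
  ceph config set global bluestore_warn_on_legacy_statfs false

  # or convert an OSD to the new statfs accounting so the warning goes away
  systemctl stop ceph-osd@<id>
  ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-<id>
  systemctl start ceph-osd@<id>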

On Fri, Jul 19, 2019 at 5:22 PM Paul Emmerich 
wrote:

> bluestore warn on legacy statfs = false
>
> --
> Paul Emmerich
>
> Looking for help with your Ceph cluster? Contact us at https://croit.io
>
> croit GmbH
> Freseniusstr. 31h
> 81247 München
> www.croit.io
> Tel: +49 89 1896585 90
>
>
> On Fri, Jul 19, 2019 at 1:35 PM nokia ceph 
> wrote:
>
>> Hi Team,
>>
>> After upgrading our cluster from 14.2.1 to 14.2.2, the cluster moved to a
>> warning state with the following error:
>>
>> cn1.chn6m1c1ru1c1.cdn ~# ceph status
>>   cluster:
>> id: e9afb5f3-4acf-421a-8ae6-caaf328ef888
>> health: HEALTH_WARN
>> Legacy BlueStore stats reporting detected on 335 OSD(s)
>>
>>   services:
>> mon: 5 daemons, quorum cn1,cn2,cn3,cn4,cn5 (age 114m)
>> mgr: cn4(active, since 2h), standbys: cn3, cn1, cn2, cn5
>> osd: 335 osds: 335 up (since 112m), 335 in
>>
>>   data:
>> pools:   1 pools, 8192 pgs
>> objects: 129.01M objects, 849 TiB
>> usage:   1.1 PiB used, 749 TiB / 1.8 PiB avail
>> pgs: 8146 active+clean
>>  46   active+clean+scrubbing
>>
>> Checked the bug list and found that this issue was reported as solved,
>> however it still exists:
>>
>> https://github.com/ceph/ceph/pull/28563
>>
>> How to disable this warning?
>>
>> Thanks,
>> Muthu
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Changing the release cadence

2019-07-21 Thread Brent Kennedy
12 months sounds good to me; I like the idea of March as well, since we plan
on doing upgrades in June/July each year.  That gives it time to be discussed
and marinate before we decide to upgrade.

-Brent

-Original Message-
From: ceph-users  On Behalf Of Sage Weil
Sent: Wednesday, June 5, 2019 11:58 AM
To: ceph-us...@ceph.com; ceph-de...@vger.kernel.org; d...@ceph.io
Subject: [ceph-users] Changing the release cadence

Hi everyone,

Since luminous, we have had the following release cadence and policy:
 - release every 9 months
 - maintain backports for the last two releases
 - enable upgrades to move either 1 or 2 releases ahead
   (e.g., luminous -> mimic or nautilus; mimic -> nautilus or octopus; ...)

This has mostly worked out well, except that the mimic release received less
attention than we wanted because multiple downstream Ceph
products (from Red Hat and SUSE) decided to base their next release on
nautilus.  Even though upstream every release is an "LTS" release, as a
practical matter mimic got less attention than luminous or nautilus.

We've had several requests/proposals to shift to a 12 month cadence. This
has several advantages:

 - Stable/conservative clusters only have to be upgraded every 2 years
   (instead of every 18 months)
 - Yearly releases are more likely to intersect with downstream
   distribution releases (e.g., Debian).  In the past there have been
   problems where the Ceph releases included in consecutive releases of a 
   distro weren't easily upgradeable.
 - Vendors that make downstream Ceph distributions/products tend to
   release yearly.  Aligning with those vendors means they are more likely 
   to productize *every* Ceph release.  This will help make every Ceph 
   release an "LTS" release (not just in name but also in terms of 
   maintenance attention).

So far the balance of opinion seems to favor a shift to a 12 month cycle[1],
especially among developers, so it seems pretty likely we'll make that
shift.  (If you do have strong concerns about such a move, now is the time
to raise them.)

That brings us to an important decision: what time of year should we
release?  Once we pick the timing, we'll be releasing at that time *every
year* for each release (barring another schedule shift, which we want to
avoid), so let's choose carefully!

A few options:

 - November: If we release Octopus 9 months from the Nautilus release
   (planned for Feb, released in Mar) then we'd target this November.  We 
   could shift to a 12 month cadence after that.
 - February: That's 12 months from the Nautilus target.
 - March: That's 12 months from when Nautilus was *actually* released.

November is nice in the sense that we'd wrap things up before the holidays.
It's less good in that users may not be inclined to install the new release
when many developers will be less available in December.

February kind of sucked in that the scramble to get the last few things done
happened during the holidays.  OTOH, we should be doing what we can to avoid
such scrambles, so that might not be something we should factor in.  March
may be a bit more balanced, with a solid 3 months before the release when
people are productive, and 3 months after it, before they disappear on
holiday, to address any post-release issues.

People tend to be somewhat less available over the summer months due to
holidays etc, so an early or late summer release might also be less than
ideal.

Thoughts?  If we can narrow it down to a few options maybe we could do a
poll to gauge user preferences.

Thanks!
sage


[1] https://twitter.com/larsmb/status/1130010208971952129

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Observation of bluestore db/wal performance

2019-07-21 Thread Anthony D'Atri
This may be somewhat controversial, so I’ll try to tread lightly.

Might we infer that your OSDs are on spinners?  And at 500 GB it would seem 
likely that they and the servers are old?  Please share hardware details and OS.

Having suffered an “enterprise” dogfood deployment in which I had to attempt to 
support thousands of RBD clients on spinners with colo journals (and a serious 
design flaw that some of you are familiar with), my knee-jerk thought is that 
they are antithetical to “heavy use of block storage”.  I understand though 
that in an education setting you may not have choices.

How highly utilized are your OSD drives?  Depending on your workload you 
*might* benefit from more PGs.  But since you describe your OSDs as being 500GB 
on average, I have to ask:  do their sizes vary considerably?  If so, larger 
OSDs are going to have more PGs (and thus receive more workload) than smaller.  
“ceph osd df” will show the number of PGs on each.  If you do have a 
significant disparity of drive sizes, careful enabling and tweaking of primary 
affinity can have measurable results in read performance.

Is the number of PGs a power of 2?  If not, some of your PGs will be much 
larger than others.  Do you have OSD fillage reasonably well balanced?  If 
“ceph osd df” shows a wide variance, this can also hamper performance as the 
workload will not be spread evenly.
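
If you want to check both in one go, something like the following works on
Luminous and later (OSD ids and weights below are only placeholders):

  # PGs and fillage per OSD; look for wide variance in the PGS and %USE columns
  ceph osd df tree

  # even out the PG distribution with the balancer module
  # (upmap mode needs require-min-compat-client set to luminous or newer)
  ceph mgr module enable balancer
  ceph balancer mode upmap
  ceph balancer on

  # optionally steer primaries away from a small or slow OSD
  ceph osd primary-affinity osd.12 0.5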

With all due respect to those who have tighter constraints than I enjoy in
my current corporate setting, heavy RBD usage on spinners can be sisyphean.
Granted I’ve never run with a cache tier myself, or with separate WAL/DB 
devices.  In a corporate setting the additional cost of SSD OSDs can easily be 
balanced by reduced administrative hassle and improved user experience.  If that isn’t 
an option for you anytime soon, then by all means I’d stick with the cache 
tier, and maybe with Luminous indefinitely.  


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Iscsi in the nautilus Dashboard

2019-07-21 Thread Brent Kennedy
I have a test cluster running centos 7.6 setup with two iscsi gateways ( per
the requirement ).  I have the dashboard setup in nautilus ( 14.2.2 ) and I
added the iscsi gateways via the command.  Both show down and when I go to
the dashboard it states:

 

" Unsupported `ceph-iscsi` config version. Expected 9 but found 8.  "

 

Both iscsi gateways were set up from scratch since the latest and greatest
packages required for ceph iscsi install are not available in the centos
repositories.  Is 3.0 not considered config version 9?  (Did I do something
wrong?)  Why is it detected as version 8 when the installed package is 3.0?
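
As far as I understand, the config version ultimately comes from the
gateway.conf object that ceph-iscsi keeps in RADOS, so it can be inspected
directly.  A diagnostic sketch (the object lives in the rbd pool by default,
but that may differ in your setup):

  # dump the ceph-iscsi configuration object and look for its version field
  rados -p rbd get gateway.conf - | python -m json.tool | grep -i version

  # confirm which packages the gateways are actually running
  rpm -qa | grep -iE 'ceph-iscsi|tcmu-runner|targetcli'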

 

I am also wondering: the package versions listed as required in the nautilus
docs (http://docs.ceph.com/docs/nautilus/rbd/iscsi-target-cli/) state x.x.x
or NEWER, but when I try to add a gateway, gwcli complains about the
tcmu-runner and targetcli versions and I have to use the skipchecks=true
option when adding them.
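
For reference, the shape of the command this refers to, with a hypothetical
IQN, gateway hostname and IP:

  /iscsi-targets> create iqn.2019-07.com.example:backup1
  /iscsi-target...example:backup1/gateways> create ceph-gw-1 192.168.1.101 skipchecks=true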

 

Another thing came up (I might be doing it wrong as well):

Added a disk, then added the client, then tried to add the auth using the
auth command and it states: "Failed to update the client's auth: Invalid
password"

 

Actual output:

/iscsi-target...erpix:backup1> auth username=test password=test

CMD: ../hosts/ auth *

username=test, password=test, mutual_username=None, mutual_password=None

CMD: ../hosts/ auth *

auth to be set to username='test', password='test', mutual_username='None',
mutual_password='None' for 'iqn.2019-07.com.somgthing:backup1'

Failed to update the client's auth: Invalid username
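
For comparison, gwcli's CHAP validation generally rejects short credentials
(the username is typically expected to be 8 to 64 characters and the password
12 to 16 characters), so a retry with longer, purely hypothetical values would
look something like:

  /iscsi-target...erpix:backup1> auth username=backupclient1 password=backuppass123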

 

Did I miss something in the setup doc?

 

Installed packages:

rtslib:  wget
https://github.com/open-iscsi/rtslib-fb/archive/v2.1.fb69.tar.gz

target-cli: wget
https://github.com/open-iscsi/targetcli-fb/archive/v2.1.fb49.tar.gz 

tcmu-runner: wget
https://github.com/open-iscsi/tcmu-runner/archive/v1.4.1.tar.gz 

ceph-iscsi: wget https://github.com/ceph/ceph-iscsi/archive/3.0.tar.gz 

configshell: wget
https://github.com/open-iscsi/configshell-fb/archive/v1.1.fb25.tar.gz 

 

Other bits I installed as part of this:

yum install epel-release python-pip python-devel -y

yum groupinstall "Development Tools" -y

python -m pip install --upgrade pip setuptools wheel

pip install netifaces cryptography flask

 

 

Any help or pointers would be greatly appreciated!

 

-Brent

 

Existing Clusters:

Test: Nautilus 14.2.2 with 3 osd servers, 1 mon/man, 1 gateway, 2 iscsi
gateways ( all virtual on nvme )

US Production(HDD): Nautilus 14.2.1 with 11 osd servers, 3 mons, 4 gateways
behind haproxy LB

UK Production(HDD): Luminous 12.2.11 with 25 osd servers, 3 mons/man, 3
gateways behind haproxy LB

US Production(SSD): Luminous 12.2.11 with 6 osd servers, 3 mons/man, 3
gateways behind haproxy LB

 

 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Failed to get omap key when mirroring of image is enabled

2019-07-21 Thread Ajitha Robert
I have an RBD mirroring setup with primary and secondary clusters as peers,
and I have a pool enabled in image mode.  In this pool I created an RBD image
with journaling enabled.

But whenever I enable mirroring on the image, I get errors in osd.log that I
cannot trace.  Please guide me in solving this.  I think it initially worked
fine, but these errors started appearing after a ceph process restart.
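
For context, the steps on the primary roughly follow the standard image-mode
sequence; a sketch with placeholder pool/image names (the peer had already
been added on both sides):

  rbd mirror pool enable mypool image
  rbd create mypool/myimage --size 10G --image-feature layering,exclusive-lock,journaling
  rbd mirror image enable mypool/myimage
  rbd mirror image status mypool/myimage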


Secondary.osd.0.log

2019-07-22 05:36:17.371771 7ffbaa0e9700  0 
/build/ceph-12.2.12/src/cls/journal/cls_journal.cc:61: failed to get omap
key: client_a5c76849-ba16-480a-a96b-ebfdb7f6ac65
2019-07-22 05:36:17.388552 7ffbaa0e9700  0 
/build/ceph-12.2.12/src/cls/journal/cls_journal.cc:472: active object set
earlier than minimum: 0 < 1
2019-07-22 05:36:17.413102 7ffbaa0e9700  0 
/build/ceph-12.2.12/src/cls/journal/cls_journal.cc:61: failed to get omap
key: order
2019-07-22 05:36:23.341490 7ffbab8ec700  0 
/build/ceph-12.2.12/src/cls/rbd/cls_rbd.cc:4125: error retrieving image id
for global id '9e36b9f8-238e-4a54-a055-19b19447855e': (2) No such file or
directory


primary-osd.0.log

2019-07-22 05:16:49.287769 7fae12db1700  0 log_channel(cluster) log [DBG] :
1.b deep-scrub ok
2019-07-22 05:16:54.078698 7fae125b0700  0 log_channel(cluster) log [DBG] :
1.1b scrub starts
2019-07-22 05:16:54.293839 7fae125b0700  0 log_channel(cluster) log [DBG] :
1.1b scrub ok
2019-07-22 05:17:04.055277 7fae12db1700  0 
/build/ceph-12.2.12/src/cls/journal/cls_journal.cc:472: active object set
earlier than minimum: 0 < 1

2019-07-22 05:33:21.540986 7fae135b2700  0 
/build/ceph-12.2.12/src/cls/journal/cls_journal.cc:472: active object set
earlier than minimum: 0 < 1
2019-07-22 05:35:27.447820 7fae12db1700  0 
/build/ceph-12.2.12/src/cls/rbd/cls_rbd.cc:4125: error retrieving image id
for global id '8a61f694-f650-4ba1-b768-c5e7629ad2e0': (2) No such file or
directory


-- 


Regards, Ajitha R
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Observation of bluestore db/wal performance

2019-07-21 Thread Mark Nelson
FWIW, the DB and WAL don't really do the same thing that the cache tier 
does.  The WAL is similar to filestore's journal, and the DB is 
primarily for storing metadata (onodes, blobs, extents, and OMAP data).  
Offloading these things to an SSD will definitely help, but you won't 
see the same kind of behavior that you would see with cache tiering 
(especially if the workload is small enough to fit entirely in the cache 
tier).
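
For context, offloading here just means pointing the DB (and optionally the
WAL) at a faster device when the OSD is created; a minimal sketch with
hypothetical device paths:

  # DB on an NVMe partition; the WAL is colocated with the DB unless given its own device
  ceph-volume lvm create --bluestore --data /dev/sdb --block.db /dev/nvme0n1p1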



IMHO the biggest performance consideration with cache tiering is when 
your workload doesn't fit entirely in the cache and you are evicting 
large quantities of data over the network.  Depending on a variety of 
factors this can be pretty slow (and in fact can be slower than not 
using a cache tier at all!).  If your workload fits entirely within the 
cache tier though, it's almost certainly going to be faster than 
bluestore without a cache tier.
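
If you do stick with a cache tier, the knobs that control flushing and
eviction are the target and dirty limits on the cache pool; a minimal sketch
with hypothetical pool names and sizes:

  ceph osd tier add rbd-data rbd-cache
  ceph osd tier cache-mode rbd-cache writeback
  ceph osd tier set-overlay rbd-data rbd-cache
  ceph osd pool set rbd-cache hit_set_type bloom
  ceph osd pool set rbd-cache target_max_bytes 1099511627776   # 1 TiB
  ceph osd pool set rbd-cache cache_target_dirty_ratio 0.4
  ceph osd pool set rbd-cache cache_target_full_ratio 0.8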



Mark


On 7/21/19 9:39 AM, Shawn Iverson wrote:
Just wanted to post an observation here.  Perhaps someone with 
resources to perform some performance tests is interested in comparing 
or has some insight into why I observed this.


Background:

12 node ceph cluster
3-way replicated by chassis group
3 chassis groups
4 nodes per chassis
running Luminous (up to date)
heavy use of block storage for kvm virtual machines (proxmox)
some cephfs usage (<10%)
~100 OSDs
~100 pgs/osd
500GB average OSD capacity

I recently attempted to do away with my ssd cache tier on Luminous and 
replace it with bluestore with db/wal on ssd as this seemed to be a 
better practice, or so I thought.


Sadly, after 2 weeks of rebuilding OSDs and placing the db/wal on 
ssd, I was sorely disappointed with performance. My cluster performed 
poorly.  It seemed that the db/wal on ssd did not boost performance as 
I was used to having.  I used 60gb for the size.  Unfortunately, I did 
not have enough ssd capacity to make it any larger for my OSDs


Despite the words of caution on the Ceph docs in regard to replicated 
base tier and replicated cache-tier, I returned to cache tiering.


Performance has returned to expectations.

It would be interesting if someone had the spare iron and resources to 
benchmark bluestore OSDs with SSD db/wal against cache tiering and 
provide some statistics.


--
Shawn Iverson, CETL
Director of Technology
Rush County Schools
765-932-3901 option 7
ivers...@rushville.k12.in.us 


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] which tool to use for benchmarking rgw s3, yscb or cosbench

2019-07-21 Thread Mark Nelson

Hi Wei Zhao,


I've used ycsb for mongodb on rbd testing before.  It worked fine and 
was pretty straightforward to run.  The only real concern I had was that 
many of the default workloads used a zipfian distribution for reads.  
This basically meant reads were entirely coming from cache and didn't 
really test the storage system at all.  I ended up creating some of my own
profiles so that we could test both the default zipfian read setup and a
random read distribution as well.  I hadn't heard of, nor have I used, the
YCSB S3 tests, but I would be very interested in giving them a try.  Cosbench
can be a bit heavy if you only need to run a couple of
simple tests and have other tools for the test orchestration and data 
visualization.
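
Overriding the read distribution is just a property on the core workload, so a
run against the S3 binding can look roughly like this (endpoint and bucket
settings omitted; the property names are standard YCSB ones):

  # load phase, then a 95/5 read/update run using uniform instead of zipfian reads
  ./bin/ycsb load s3 -P workloads/workloada -p recordcount=1000000
  ./bin/ycsb run s3 -P workloads/workloada -p operationcount=1000000 \
      -p readproportion=0.95 -p updateproportion=0.05 -p requestdistribution=uniform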



Mark


On 7/21/19 10:51 AM, Wei Zhao wrote:

Hi:
   I found cosbench is a very convenient tool for benchmarking rgw.  But
when I read papers, I found the YCSB tool,
https://github.com/brianfrankcooper/YCSB/tree/master/s3 .  It seems
that this is intended for testing cloud services, and seems like the right
tool for our service.  Has anyone tried this tool?  How does it compare to
cosbench?
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] which tool to use for benchmarking rgw s3, yscb or cosbench

2019-07-21 Thread Wei Zhao
Hi:
  I found cosbench is a very convenient tool for benchmarking rgw.  But
when I read papers, I found the YCSB tool,
https://github.com/brianfrankcooper/YCSB/tree/master/s3 .  It seems
that this is intended for testing cloud services, and seems like the right
tool for our service.  Has anyone tried this tool?  How does it compare to
cosbench?
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Observation of bluestore db/wal performance

2019-07-21 Thread Shawn Iverson
Just wanted to post an observation here.  Perhaps someone with resources to
perform some performance tests is interested in comparing or has some
insight into why I observed this.

Background:

12 node ceph cluster
3-way replicated by chassis group
3 chassis groups
4 nodes per chassis
running Luminous (up to date)
heavy use of block storage for kvm virtual machines (proxmox)
some cephfs usage (<10%)
~100 OSDs
~100 pgs/osd
500GB average OSD capacity

I recently attempted to do away with my ssd cache tier on Luminous and
replace it with bluestore with db/wal on ssd as this seemed to be a better
practice, or so I thought.

Sadly, after 2 weeks of rebuilding OSDs and placing the db/wal on ssd, I
was sorely disappointed with performance.  My cluster performed poorly.  It
seemed that the db/wal on ssd did not boost performance as I was used to
having.  I used 60gb for the size.  Unfortunately, I did not have enough
ssd capacity to make it any larger for my OSDs.
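
One thing worth checking with 60gb DB partitions is whether the DB was
spilling over onto the spinning data device; a quick sketch, with the OSD id
as a placeholder:

  # bluefs counters show how much DB data landed on the slow device
  ceph daemon osd.0 perf dump bluefs | grep -E 'db_used_bytes|slow_used_bytes'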

Despite the words of caution on the Ceph docs in regard to replicated base
tier and replicated cache-tier, I returned to cache tiering.

Performance has returned to expectations.

It would be interesting if someone had the spare iron and resources to
benchmark bluestore OSDs with SSD db/wal against cache tiering and provide
some statistics.

-- 
Shawn Iverson, CETL
Director of Technology
Rush County Schools
765-932-3901 option 7
ivers...@rushville.k12.in.us

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] memory usage of: radosgw-admin bucket rm

2019-07-21 Thread Paul Emmerich
For mailing list archive readers in the future:

On Tue, Jul 9, 2019 at 1:22 PM Paul Emmerich  wrote:

> Try to add "--inconsistent-index" (caution: will obviously leave your
> bucket in a broken state during the deletion, so don't try to use the
> bucket)
>

this was bad advice: as long as https://tracker.ceph.com/issues/40700 is not
fixed, don't do that.



>
> You can also speed up the deletion with "--max-concurrent-ios" (default
> 32). The documentation claims that "--max-concurrent-ios" only applies to
> other operations, but that's wrong; it is used for most bucket operations,
> including deletion.
>

this, however, is a good idea to speed up deletion of large buckets.

Try to combine the deletion command with timeout or something similar so you
don't keep running into OOM kills that affect other services
(or use cgroups to limit its RAM).
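
A sketch of what that can look like (the bucket name and limits are
hypothetical; on older systemd, MemoryLimit= takes the place of MemoryMax=):

  # stop after 6 hours and simply re-run the command later to resume
  timeout 6h radosgw-admin bucket rm --bucket=big-bucket --purge-objects --max-concurrent-ios=128

  # or cap the process memory with a transient cgroup scope
  systemd-run --scope -p MemoryMax=8G \
      radosgw-admin bucket rm --bucket=big-bucket --purge-objects --max-concurrent-ios=128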


Paul


>
>
> Paul
>
> --
> Paul Emmerich
>
> Looking for help with your Ceph cluster? Contact us at https://croit.io
>
> croit GmbH
> Freseniusstr. 31h
> 81247 München
> www.croit.io
> Tel: +49 89 1896585 90
>
>
> On Tue, Jul 9, 2019 at 1:11 PM Harald Staub 
> wrote:
>
>> Currently removing a bucket with a lot of objects:
>> radosgw-admin bucket rm --bucket=$BUCKET --bypass-gc --purge-objects
>>
>> This process was killed by the out-of-memory killer. Then looking at the
>> graphs, we see a continuous increase of memory usage for this process,
>> about +24 GB per day. Removal rate is about 3 M objects per day.
>>
>> It is not the fastest hardware, and this index pool is still without
>> SSDs. The bucket is sharded, 1024 shards. We are on Nautilus 14.2.1, now
>> about 500 OSDs.
>>
>> So with this bucket with 60 M objects, we would need about 480 GB of RAM
>> to come through. Or is there a workaround? Should I open a tracker issue?
>>
>> The killed remove command can just be called again, but it will be
>> killed again before it finishes. Also, it has to run some time until it
>> continues to actually remove objects. This "wait time" is also
>> increasing. Last time, after about 16 M objects already removed, the
>> wait time was nearly 9 hours. Also during this time, there is a memory
>> ramp, but not so steep.
>>
>> BTW it feels strange that the removal of objects is slower (about 3
>> times) than adding objects.
>>
>>   Harry
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com