[ceph-users] Trouble removing MDS daemons | Luminous

2017-11-08 Thread Geoffrey Rhodes
Good day,

Firstly I'd like to acknowledge that I consider myself a Ceph noob.

OS: Ubuntu 16.04.3 LTS
Ceph version: 12.2.1

I'm running a small six node POC cluster with three MDS daemons. (One on
each node,  node1, node2 and node3)
I've also configured three ceph file systems fsys1, fsys2 and fsys3.

I'd like to remove two of the file systems (fsys2 and fsys3) and at least
one if not both of the MDS daemons.
I was able to fail MDS on node3 using command "sudo ceph mds fail node3"
followed by "sudo ceph mds rmfailed 0 --yes-i-really-mean-it".
Then I removed the file system using command "sudo ceph fs rm fsys3
--yes-i-really-mean-it".

Running command "sudo ceph fs status" confirms that fsys3 is now failed and
that the MDS daemon on node3 has become a standby MDS.
I've tried combinations of "ceph mds (fail, deactivate, rm, rmfailed)" but I
can't seem to remove the standby daemon.
After rebooting node3 and running "sudo ceph fs status", fsys3 is no longer a
listed file system and node3 is still a standby MDS.

I've searched for details on this topic but what I have found has not
helped me.
Could anybody assist with the correct steps for removing MDS daemons and
ceph file systems on nodes?
It would be useful to be able to know how to completely remove all ceph
file systems and MDS daemons should I have no further use for them in a
cluster.

Kind regards
Geoffrey Rhodes
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Trouble removing MDS daemons | Luminous

2017-11-08 Thread John Spray
On Wed, Nov 8, 2017 at 10:39 AM, Geoffrey Rhodes  wrote:
> Good day,
>
> Firstly I'd like to acknowledge that I consider myself a Ceph noob.
>
> OS: Ubuntu 16.04.3 LTS
> Ceph version: 12.2.1
>
> I'm running a small six node POC cluster with three MDS daemons. (One on
> each node,  node1, node2 and node3)
> I've also configured three ceph file systems fsys1, fsys2 and fsys3.
>
> I'd like to remove two of the file systems (fsys2 and fsys3) and at least
> one if not both of the MDS daemons.
> I was able to fail MDS on node3 using command "sudo ceph mds fail node3"
> followed by "sudo ceph mds rmfailed 0 --yes-i-really-mean-it".

When you want to remove a filesystem, you only need to do this:
ceph fs set <fs name> cluster_down true   # prevent any MDSs rejoining the filesystem
ceph mds fail <mds>                       # for each MDS that was in the filesystem
ceph fs rm <fs name>                      # should work because the MDSs are now no longer involved in this filesystem

> Then I removed the file system using command "sudo ceph fs rm fsys3
> --yes-i-really-mean-it".
>
> Running command "sudo ceph fs status" confirms that fsys3 is now failed and
> that the MDS daemon on node3 has become a standby MDS.
> I've tried combinations of "ceph mds (fail, deactivate, rm, rmfailed)" but I
> can't seem to remove the standby daemon.

You don't need to explicitly remove standby daemons using the ceph
CLI: you can just get rid of the daemon itself on the server where
it's running.  Ceph will just forget about the standby when it stops
receiving messages from that daemon.
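
For example, on the node hosting the standby, something along these lines
should do it (a sketch, assuming systemd-managed daemons and an MDS id of
"node3" - the id and paths may differ in your deployment):

sudo systemctl stop ceph-mds@node3         # stop the standby daemon
sudo systemctl disable ceph-mds@node3      # keep it from starting again at boot
sudo ceph auth del mds.node3               # optional: drop its cephx key
sudo rm -rf /var/lib/ceph/mds/ceph-node3   # optional: remove its data directory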

John

> After rebooting node3 and running command "sudo ceph fs status" -  fsys3 is
> no longer a listed file system and node3 is still standby MDS.
>
> I've searched for details on this topic but what I have found has not helped
> me.
> Could anybody assist with the correct steps for removing MDS daemons and
> ceph file systems on nodes?
> It would be useful to be able to know how to completely remove all ceph file
> systems and MDS daemons should I have no further use for them in a cluster.
>
> Kind regards
> Geoffrey Rhodes
>
>
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] bluestore - wal,db on faster devices?

2017-11-08 Thread Wolfgang Lendl
Hello,

it's clear to me that there's a performance gain from putting the journal on
a fast device (ssd,nvme) when using the filestore backend.
it's not clear when it comes to bluestore - are there any resources,
performance tests, etc. out there showing how a fast wal/db device impacts
performance?


br
wolfgang

-- 
Wolfgang Lendl
IT Systems & Communications
Medizinische Universität Wien
Spitalgasse 23 / BT 88 /Ebene 00
A-1090 Wien
Tel: +43 1 40160-21231
Fax: +43 1 40160-921200

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] OSD heartbeat problem

2017-11-08 Thread Monis Monther
Good Day,

Today we had a problem with lots of OSDs being marked as down due to
heartbeat failures between the OSDs.

Specifically the following is seen in the OSD logs prior to the heartbeat
no_reply errors

monclient: _check_auth_rotating possible clock skew, rotating keys expired
way too early

Can anyone shed some light on what the above log message means?

Our monitors are properly synced with NTP and show NO skew problems in the
logs

NOTE: we restarted the OSD and all went back to healthy and normal, I just
want to understand the messages in the logs to find the cause of the problem

Ceph luminous 12.2.0

-- 
Best Regards
Monis
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Issue with "renamed" mon, crashing

2017-11-08 Thread Kamila Součková
Hi,

I am not sure if this is the same issue as we had recently, but it looks a
bit like it -- we also had a Luminous mon crashing right after syncing was
done.

Turns out that the current release has a bug which causes the mon to crash
if it cannot find a mgr daemon. This should be fixed in the upcoming
release.

In our case we "solved" it by moving the active mgr to the mon's host. (I
am not sure how to activate a specific mgr, but it appears that the
mgrs get activated in FIFO order -- so just keep killing and re-starting
the active one until a mgr on the mon's host is active).
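
There may also be a more direct way - a sketch of what I'd try (the mgr name
is a placeholder, check "ceph mgr dump" for the active one):

ceph mgr dump | grep active_name     # find the currently active mgr
ceph mgr fail <active mgr name>      # force a failover; repeat until the desired standby takes over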

Hope this helps!

Kamila

On Mon, Nov 6, 2017 at 12:44 PM Anders Olausson  wrote:

> Hi,
>
>
>
> I recently (yesterday) upgraded to Luminous (12.2.1) running on Ubuntu
> 14.04.5 LTS.
>
> Upgrade went fine, no issues at all.
>
> However when I was about to use ceph-deploy to configure some new disks it
> failed.
>
> After some investigation I figured out that it didn’t like that my mons
> was named ceph03mon on the host ceph03 for example, ceph-deploy gatherkeys
> ceph03 failed.
>
> So I decided to rename my mons. I started with removing one of them:
>
>
>
> # stop ceph-mon id=ceph03mon
>
> # ceph mon remove ceph03mon
>
> # cd /var/lib/ceph/mon/
>
> # mv ceph-ceph03mon disabled-ceph-ceph03mon
>
>
>
> Created the new one:
>
>
>
> # mkdir tmp
>
> # mkdir ceph-ceph03
>
> # ceph auth get mon. -o tmp/keyring
>
> # ceph mon getmap -o tmp/monmap
>
> # ceph-mon -i ceph03 --mkfs --monmap tmp/monmap --keyring tmp/keyring
>
> # chown -R ceph:ceph ceph-ceph03
>
> # ceph-mon -i ceph03 --public-addr 10.10.1.23:6789
>
> # start ceph-mon id=ceph03
>
>
>
> Starts OK, quorum is established, when it gets the command “ceph osd pool
> stat” for example, or “ceph auth list” it crashes.
>
>
>
> Complete log can be found at:
> http://files.spacedump.se/ceph03-monerror-20171106-01.txt
>
> Used below settings for logging in ceph.conf at the time:
>
>
>
> [mon]
>
>debug mon = 20
>
>debug paxos = 20
>
>debug auth = 20
>
>
>
> I have now rolled back to the old monitor, it works as it should, on the
> same box etc. But it’s the one upgraded from Hammer -> Jewel -> Luminous.
>
>
>
> Any idea what the issue could be?
>
> Thanks.
>
>
>
> Best regards
>
>   Anders Olausson
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] High osd cpu usage

2017-11-08 Thread Alon Avrahami
Hello Guys

We have a fresh 'luminous' (12.2.0)
(32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc) cluster,
installed using ceph-ansible.

The cluster contains 6 nodes (Intel server board S2600WTTR), Mem - 64G,
CPU -> Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz, 32 cores.
Each server has 16 * 1.6TB Dell SSD drives (SSDSC2BB016T7R), for a total
of 96 osds and 3 mons.

The main usage  is rbd's for our  OpenStack environment ( Okata )

We're at the beginning of our production tests and it looks like the osds
are too busy, although we don't generate many iops at this stage (almost
nothing).
All ceph-osds are using ~50% CPU and I can't figure out why they are so
busy:

top - 07:41:55 up 49 days,  2:54,  2 users,  load average: 6.85, 6.40, 6.37

Tasks: 518 total,   1 running, 517 sleeping,   0 stopped,   0 zombie
%Cpu(s): 14.8 us,  4.3 sy,  0.0 ni, 80.3 id,  0.0 wa,  0.0 hi,  0.6 si,  0.0 st
KiB Mem : 65853584 total, 23953788 free, 40342680 used,  1557116 buff/cache
KiB Swap:  3997692 total,  3997692 free,        0 used. 18020584 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
36713 ceph      20   0 3869588 2.826g  28896 S  47.2  4.5   6079:20 ceph-osd
53981 ceph      20   0 3998732 2.666g  28628 S  45.8  4.2   5939:28 ceph-osd
55879 ceph      20   0 3707004 2.286g  28844 S  44.2  3.6   5854:29 ceph-osd
46026 ceph      20   0 3631136 1.930g  29100 S  43.2  3.1   6008:50 ceph-osd
39021 ceph      20   0 4091452 2.698g  28936 S  42.9  4.3   5687:39 ceph-osd
47210 ceph      20   0 3598572 1.871g  29092 S  42.9  3.0   5759:19 ceph-osd
52763 ceph      20   0 3843216 2.410g  28896 S  42.2  3.8   5540:11 ceph-osd
49317 ceph      20   0 3794760 2.142g  28932 S  41.5  3.4   5872:24 ceph-osd
42653 ceph      20   0 3915476 2.489g  28840 S  41.2  4.0   5605:13 ceph-osd
41560 ceph      20   0 3460900 1.801g  28660 S  38.5  2.9   5128:01 ceph-osd
50675 ceph      20   0 3590288 1.827g  28840 S  37.9  2.9   5196:58 ceph-osd
37897 ceph      20   0 4034180 2.814g  29000 S  34.9  4.5   4789:10 ceph-osd
50237 ceph      20   0 3379780 1.930g  28892 S  34.6  3.1   4846:36 ceph-osd
48608 ceph      20   0 3893684 2.721g  28880 S  33.9  4.3   4752:43 ceph-osd
40323 ceph      20   0 4227864 2.959g  28800 S  33.6  4.7   4712:36 ceph-osd
44638 ceph      20   0 3656780 2.437g  28896 S  33.2  3.9   4793:58 ceph-osd
61639 ceph      20   0  527512 114300  20988 S   2.7  0.2   2722:03 ceph-mgr
31586 ceph      20   0  765672 304140  21816 S   0.7  0.5 409:06.09 ceph-mon
   68 root      20   0       0      0      0 S   0.3  0.0   3:09.69 ksoftirqd/12

strace  doesn't show anything suspicious

root@ecprdbcph10-opens:~# strace -p 36713
strace: Process 36713 attached
futex(0x563343c56764, FUTEX_WAIT_PRIVATE, 1, NUL

Ceph logs don't reveal anything?
Is this "normal" behavior in Luminous?
Looking out in older threads I can only find a thread about time gaps which
is not our case
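
In case it helps anyone looking at similar symptoms, a few generic checks
(a sketch - the PID is one of the ceph-osd processes from the top output
above, and the osd id is a placeholder for an osd hosted on that node):

top -H -p 36713                       # per-thread CPU breakdown for one OSD process
ceph daemon osd.34 perf dump          # internal perf counters via the admin socket
ceph daemon osd.34 dump_historic_ops  # recent slow/expensive ops, if any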

Thanks,
Alon
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] bluestore - wal,db on faster devices?

2017-11-08 Thread Mark Nelson

Hi Wolfgang,

In bluestore the WAL serves sort of a similar purpose to filestore's 
journal, but bluestore isn't dependent on it for guaranteeing durability 
of large writes.  With bluestore you can often get higher large-write 
throughput than with filestore when using HDD-only or flash-only OSDs.


Bluestore also stores allocation, object, and cluster metadata in the 
DB.  That, in combination with the way bluestore stores objects, 
dramatically improves behavior during certain workloads.  A big one is 
creating millions of small objects as quickly as possible.  In 
filestore, PG splitting has a huge impact on performance and tail 
latency.  Bluestore is much better just on HDD, and putting the DB and 
WAL on flash makes it better still since metadata no longer is a bottleneck.
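
For reference, the placement is chosen at OSD creation time - a sketch with
ceph-disk (device names are placeholders; the WAL lives inside the DB by
default, so a separate --block.wal is only needed if it should go on yet
another device):

ceph-disk prepare --bluestore --block.db /dev/nvme0n1 /dev/sdb   # DB (and WAL within it) on NVMe, data on HDD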


Bluestore does have a couple of shortcomings vs filestore currently. 
The allocator is not as good as XFS's and can fragment more over time. 
There is no server-side readahead so small sequential read performance 
is very dependent on client-side readahead.  There's still a number of 
optimizations to various things ranging from threading and locking in 
the shardedopwq to pglog and dup_ops that potentially could improve 
performance.


I have a blog post that we've been working on that explores some of 
these things but I'm still waiting on review before I publish it.


Mark

On 11/08/2017 05:53 AM, Wolfgang Lendl wrote:

Hello,

it's clear to me getting a performance gain from putting the journal on
a fast device (ssd,nvme) when using filestore backend.
it's not when it comes to bluestore - are there any resources,
performance test, etc. out there how a fast wal,db device impacts
performance?


br
wolfgang


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] VM Data corruption shortly after Luminous Upgrade

2017-11-08 Thread James Forde
Title probably should have read "Ceph Data corruption shortly after Luminous 
Upgrade"

Problem seems to have been sorted out. Still not sure what caused the original
problem, other than upgrade latency or mgr errors.
After I resolved the boot problem I attempted to reproduce the error, but was
unsuccessful, which is good. HEALTH_OK

Anyway, for future users running into Windows "Unmountable Boot Volume", or
CentOS7 booting to emergency mode, HERE IS THE SOLUTION.

Get the rbd image size, increase it by 1GB and restart the VM. That's it. All
VMs booted right up after increasing the rbd image by 1024MB. Takes just a
couple of seconds.


rbd info vmtest
rbd image 'vmtest':
        size 20480 MB

rbd resize --image vmtest --size 21504


rbd info vmtest
rbd image 'vmtest':
        size 21504 MB


Good luck

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] VM Data corruption shortly after Luminous Upgrade

2017-11-08 Thread Jason Dillaman
Are your QEMU VMs using a different CephX user than client.admin? If so,
can you double-check your caps to ensure that the QEMU user can blacklist?
See step 6 in the upgrade instructions [1]. The fact that "rbd resize"
fixed something hints that your VMs had hard-crashed with the exclusive
lock left in the locked position and QEMU wasn't able to break the lock
when the VMs were restarted.

[1]
http://docs.ceph.com/docs/master/release-notes/#upgrade-from-jewel-or-kraken
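
For reference, the caps change from that step looks roughly like this (a
sketch - the client name and pool are placeholders, substitute your own):

ceph auth caps client.libvirt mon 'profile rbd' osd 'profile rbd pool=vms'
ceph auth get client.libvirt     # verify the resulting caps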


On Wed, Nov 8, 2017 at 10:29 AM, James Forde  wrote:

> Title probably should have read “Ceph Data corruption shortly after
> Luminous Upgrade”
>
>
>
> Problem seems to have been sorted out. Still not sure why original problem
> other than Upgrade latency?, or mgr errors?
>
> After I resolved the boot problem I attempted to reproduce error, but was
> unsuccessful which is good. HEALTH_OK
>
>
>
> Anyway, to future users running into Windows "Unmountable Boot Volume", or
> CentOS7 boot to emergency mode, HERE IS SOLUTION.
>
>
>
> Get rbd image size and increase by 1GB and restart VM. That’s it. All VM’s
> booted right up after increasing rbd image by 1024MB. Takes just a couple
> of seconds.
>
>
>
>
>
> Rbd info vmtest
>
> Rbd image ‘vmtest’:
>
> Size 20480 MB
>
>
>
> Rbd resize –image vmtest –size 21504
>
>
>
>
>
> Rbd info vmtest
>
> Rbd image ‘vmtest’:
>
> Size 21504 MB
>
>
>
>
>
> Good luck
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>


-- 
Jason
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] FAILED assert(p.same_interval_since) and unusable cluster

2017-11-08 Thread Jon Light
Thanks for the instructions Michael, I was able to successfully get the
patch, build, and install.

Unfortunately I'm now seeing "osd/PG.cc: 5381: FAILED
assert(info.history.same_interval_since != 0)". Then the OSD crashes.

On Sat, Nov 4, 2017 at 5:51 AM, Michael  wrote:

> Jon Light wrote:
>
> I followed the instructions in the Github repo for cloning and setting up
> the build environment, checked out the 12.2.0 tag, modified OSD.cc with the
> fix, and then tried to build with dpkg-buildpackage. I got the following
> error:
> "ceph/src/kv/RocksDBStore.cc:593:22: error: ‘perf_context’ is not a
> member of ‘rocksdb’"
> I guess some changes have been made to RocksDB since 12.2.0?
>
> Am I going about this the right way? Should I just simply recompile the
> OSD binary with the fix and then copy it to the nodes in my cluster? What's
> the best way to get this fix applied to my current installation?
>
> Thanks
>
> It's probably only an indirect help because you might just have the issue
> that you aren't using 12.2.1, but the way in which I got the patch applied
> on Ubuntu's bionic 12.2.1 is this: apt-get source ceph, wget patch file
> from github, cd to the ceph sources, quilt import <the patch file>,
> quilt push, pdebuild and then install the ceph-osd .deb .
> At least roughly - you still may have to perform related configuration
> tasks like enabling deb-src entries in the apt sources file, setting up the
> standard pbuilderrc for bionic, and such.
>
> As you might see on the bug tracker, the patch did apparently avoid the
> immediate error for me, but Ceph then ran into another error.
>
> - Michael
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Recovery operations and ioprio options

2017-11-08 Thread Захаров Алексей
Hello,
Today we use ceph jewel with:
  osd disk thread ioprio class=idle
  osd disk thread ioprio priority=7
and "nodeep-scrub" flag is set.

We want to change scheduler from CFQ to deadline, so these options will lose 
effect.
I've tried to find out what operations are performed in "disk thread". What I 
found is that only scrubbing and snap-trimming operations are performed in 
"disk thread".

Do these options affect recovery operations?
Are there any other operations in "disk thread", except scrubbing and 
snap-trimming?

-- 
Regards,
Aleksei Zakharov
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] bluestore - wal,db on faster devices?

2017-11-08 Thread Wolfgang Lendl
Hi Mark,

thanks for your reply!
I'm a big fan of keeping things simple - this means that there has to be
a very good reason to put the WAL and DB on a separate device otherwise
I'll keep it collocated (and simpler).

as far as I understood - putting the WAL,DB on a faster (than hdd)
device makes more sense in cephfs and rgw environments (more metadata) -
and less sense in rbd environments - correct?

br
wolfgang

On 11/08/2017 02:21 PM, Mark Nelson wrote:
> Hi Wolfgang,
>
> In bluestore the WAL serves sort of a similar purpose to filestore's
> journal, but bluestore isn't dependent on it for guaranteeing
> durability of large writes.  With bluestore you can often get higher
> large-write throughput than with filestore when using HDD-only or
> flash-only OSDs.
>
> Bluestore also stores allocation, object, and cluster metadata in the
> DB.  That, in combination with the way bluestore stores objects,
> dramatically improves behavior during certain workloads.  A big one is
> creating millions of small objects as quickly as possible.  In
> filestore, PG splitting has a huge impact on performance and tail
> latency.  Bluestore is much better just on HDD, and putting the DB and
> WAL on flash makes it better still since metadata no longer is a
> bottleneck.
>
> Bluestore does have a couple of shortcomings vs filestore currently.
> The allocator is not as good as XFS's and can fragment more over time.
> There is no server-side readahead so small sequential read performance
> is very dependent on client-side readahead.  There's still a number of
> optimizations to various things ranging from threading and locking in
> the shardedopwq to pglog and dup_ops that potentially could improve
> performance.
>
> I have a blog post that we've been working on that explores some of
> these things but I'm still waiting on review before I publish it.
>
> Mark
>
> On 11/08/2017 05:53 AM, Wolfgang Lendl wrote:
>> Hello,
>>
>> it's clear to me getting a performance gain from putting the journal on
>> a fast device (ssd,nvme) when using filestore backend.
>> it's not when it comes to bluestore - are there any resources,
>> performance test, etc. out there how a fast wal,db device impacts
>> performance?
>>
>>
>> br
>> wolfgang
>>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Issue with "renamed" mon, crashing

2017-11-08 Thread Anders Olausson
Hi Kamila,

Thank you for your response.

I think we solved it yesterday.
I simply removed the mon again and this time I also removed all references to 
it in ceph.conf (had some remnants there).
After that I ran ceph-deploy, and it hasn't crashed again so far.

So in this case it was most likely some leftovers from the old mon in the
config that fscked up things. (I don't get why, though, but it works now that I
removed all traces of it first and then recreated it. Before that I had removed
and recreated it a bunch of times as well, but with some leftovers in
ceph.conf, and that was when it didn't work.)

//Anders

From: Kamila Součková [mailto:kam...@ksp.sk]
Sent: 8 November 2017 13:43
To: Anders Olausson 
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Issue with "renamed" mon, crashing

Hi,

I am not sure if this is the same issue as we had recently, but it looks a bit 
like it -- we also had a Luminous mon crashing right after syncing was done.

Turns out that the current release has a bug which causes the mon to crash if 
it cannot find a mgr daemon. This should be fixed in the upcoming release.

In our case we "solved" it by moving the active mgr to the mon's host. (I am 
not sure how to activate a specific mgr, but it appears that the mgrs get 
activated in FIFO order -- so just keep killing and re-starting the active one 
until a mgr on the mon's host is active).

Hope this helps!

Kamila

On Mon, Nov 6, 2017 at 12:44 PM Anders Olausson <and...@spacedump.se> wrote:
Hi,

I recently (yesterday) upgraded to Luminous (12.2.1) running on Ubuntu 14.04.5 
LTS.
Upgrade went fine, no issues at all.
However when I was about to use ceph-deploy to configure some new disks it 
failed.
After some investigation I figured out that it didn’t like that my mons was 
named ceph03mon on the host ceph03 for example, ceph-deploy gatherkeys ceph03 
failed.
So I decided to rename my mons. I started with removing one of them:

# stop ceph-mon id=ceph03mon
# ceph mon remove ceph03mon
# cd /var/lib/ceph/mon/
# mv ceph-ceph03mon disabled-ceph-ceph03mon

Created the new one:

# mkdir tmp
# mkdir ceph-ceph03
# ceph auth get mon. -o tmp/keyring
# ceph mon getmap -o tmp/monmap
# ceph-mon -i ceph03 --mkfs --monmap tmp/monmap --keyring tmp/keyring
# chown -R ceph:ceph ceph-ceph03
# ceph-mon -i ceph03 --public-addr 10.10.1.23:6789
# start ceph-mon id=ceph03

Starts OK, quorum is established, when it gets the command “ceph osd pool stat” 
for example, or “ceph auth list” it crashes.

Complete log can be found at: 
http://files.spacedump.se/ceph03-monerror-20171106-01.txt
Used below settings for logging in ceph.conf at the time:

[mon]
   debug mon = 20
   debug paxos = 20
   debug auth = 20

I have now rolled back to the old monitor, it works as it should, on the same 
box etc. But it’s the one upgraded from Hammer -> Jewel -> Luminous.

Any idea what the issue could be?
Thanks.

Best regards
  Anders Olausson
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] bluestore - wal,db on faster devices?

2017-11-08 Thread Mark Nelson

Hi Wolfgang,

You've got the right idea.  RBD is probably going to benefit less since 
you have a small number of large objects and little extra OMAP data. 
Having the allocation and object metadata on flash certainly shouldn't 
hurt, and you should still have less overhead for small (<64k) writes. 
With RGW however you also have to worry about bucket index updates 
during writes and that's a big potential bottleneck that you don't need 
to worry about with RBD.


Mark

On 11/08/2017 01:01 PM, Wolfgang Lendl wrote:

Hi Mark,

thanks for your reply!
I'm a big fan of keeping things simple - this means that there has to be
a very good reason to put the WAL and DB on a separate device otherwise
I'll keep it collocated (and simpler).

as far as I understood - putting the WAL,DB on a faster (than hdd)
device makes more sense in cephfs and rgw environments (more metadata) -
and less sense in rbd environments - correct?

br
wolfgang

On 11/08/2017 02:21 PM, Mark Nelson wrote:

Hi Wolfgang,

In bluestore the WAL serves sort of a similar purpose to filestore's
journal, but bluestore isn't dependent on it for guaranteeing
durability of large writes.  With bluestore you can often get higher
large-write throughput than with filestore when using HDD-only or
flash-only OSDs.

Bluestore also stores allocation, object, and cluster metadata in the
DB.  That, in combination with the way bluestore stores objects,
dramatically improves behavior during certain workloads.  A big one is
creating millions of small objects as quickly as possible.  In
filestore, PG splitting has a huge impact on performance and tail
latency.  Bluestore is much better just on HDD, and putting the DB and
WAL on flash makes it better still since metadata no longer is a
bottleneck.

Bluestore does have a couple of shortcomings vs filestore currently.
The allocator is not as good as XFS's and can fragment more over time.
There is no server-side readahead so small sequential read performance
is very dependent on client-side readahead.  There's still a number of
optimizations to various things ranging from threading and locking in
the shardedopwq to pglog and dup_ops that potentially could improve
performance.

I have a blog post that we've been working on that explores some of
these things but I'm still waiting on review before I publish it.

Mark

On 11/08/2017 05:53 AM, Wolfgang Lendl wrote:

Hello,

it's clear to me getting a performance gain from putting the journal on
a fast device (ssd,nvme) when using filestore backend.
it's not when it comes to bluestore - are there any resources,
performance test, etc. out there how a fast wal,db device impacts
performance?


br
wolfgang


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Libvirt hosts freeze after ceph osd+mon problem

2017-11-08 Thread Jan Pekař - Imatic

You were right, it was frozen at the virtual machine level.
The panic kernel parameter worked, so the server resumed with a reboot.

But there was no panic displayed on the VNC console even though I was logged in.

The main problem is that a combination of a MON and an OSD failing silently at
once causes a much longer recovery from that state.


In my case

approx at 18:38:11 I paused MON+OSD
at 18:38:17 I have first heartbeat_check: no reply from
at 18:38:30 I have libceph: mon1 [X]:6789 session lost, hunting for new mon
at 18:38:30 I have libceph: mon2 [X]:6789 session established
at 18:39:05 imatic-hydra01 kernel: [2384345.121219] libceph: osd6 down

So it took 54 seconds in my case to resume IO and recover. Is that
normal and expected?
I think the long time is because the MON hunt ran during the OSD error
and another monitor won the election, so after that the timeouts for kicking
the OSD out start running from the very beginning.


When considering timeouts, everybody must account for the MON recovery timeout
+ OSD recovery timeout as the worst-case scenario for an IO outage. Even if
they are hosted on different machines, they can fail at the same time.


Do you have any recommendation for reliable heartbeat and other settings 
for virtual machines with ext4, xfs and NTFS to be safe?


Thank you
With regards
Jan Pekar




On 7.11.2017 00:30, Jason Dillaman wrote:
If you could install the debug packages and get a gdb backtrace from all 
threads it would be helpful. librbd doesn't utilize any QEMU threads so 
even if librbd was deadlocked, the worst case that I would expect would 
be your guest OS complaining about hung kernel tasks related to disk IO 
(since the disk wouldn't be responding).


On Mon, Nov 6, 2017 at 6:02 PM, Jan Pekař - Imatic wrote:


Hi,

I'm using debian stretch with ceph 12.2.1-1~bpo80+1 and qemu
1:2.8+dfsg-6+deb9u3
I'm running 3 nodes with 3 monitors and 8 osds on my nodes, all on IPV6.

When I tested the cluster, I detected strange and severe problem.
On first node I'm running qemu hosts with librados disk connection
to the cluster and all 3 monitors mentioned in connection.
On second node I stopped mon and osd with command

kill -STOP MONPID OSDPID

Within one minute all my qemu hosts on first node freeze, so they
even don't respond to ping. On VNC screen there is no error (disk or
kernel panic), they just hung forever with no console response. Even
starting MON and OSD on stopped host doesn't make them running.
Destroying the qemu domain and running again is the only solution.

This happens even if virtual machine has all primary OSD on other
OSDs from that I have stopped - so it is not writing primary to the
stopped OSD.

If I stop only OSD and MON keep running, or I stop only MON and OSD
keep running everything looks OK.

When I stop MON and OSD, I can see in log  osd.0 1300
heartbeat_check: no reply from ... as usual when OSD fails. During
this are virtuals still running, but after that they all stop.

What should I send you to debug this problem? Without fixing that,
ceph is not reliable to me.

Thank you
With regards
Jan Pekar
Imatic
___
ceph-users mailing list
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com





--
Jason


--

Ing. Jan Pekař
jan.pe...@imatic.cz | +420603811737

Imatic | Jagellonská 14 | Praha 3 | 130 00
http://www.imatic.cz

--
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Issues with dynamic bucket indexing resharding and tenants

2017-11-08 Thread Orit Wasserman
On Wed, Nov 8, 2017 at 9:45 PM, Mark Schouten  wrote:
> I see you fixed this (with a rather trivial patch :)), great!
>
:)

> I am wondering though, should I be able to remove the invalid entry using
> this patch too?
>
It should work.

> Regards,
>
> Mark
>
>
> On 5 Nov 2017, at 07:33, Orit Wasserman  wrote:
>
> Hi Mark,
>
> On Fri, Oct 20, 2017 at 4:26 PM, Mark Schouten  wrote:
>
> Hi,
>
> I see issues with resharding. rgw logging shows the following:
> 2017-10-20 15:17:30.018807 7fa1b219a700 -1 ERROR: failed to get entry from
> reshard log, oid=reshard.13 tenant= bucket=qnapnas
>
> radosgw-admin shows me there is one bucket in the queue to do resharding
> for:
> radosgw-admin reshard list
> [
>{
>"time": "2017-10-20 12:37:28.575096Z",
>"tenant": "DB0339",
>"bucket_name": "qnapnas",
>"bucket_id": "1c19a332-7ffc-4472-b852-ec4a143785cc.19675875.3",
>"new_instance_id": "",
>"old_num_shards": 1,
>"new_num_shards": 4
>}
> ]
>
> But, the tenant field in the logging entry is emtpy, which makes me expect
> that the tenant part is partially implemented.
>
> Also, I can add "DB0339/qnapnas" to the list:
> radosgw-admin reshard add --bucket DB0339/qnapnas --num-shards 4
>
> But not like this:
> radosgw-admin reshard add --bucket qnapnas --tenant DB0339 --num-shards 4
> ERROR: --tenant is set, but there's no user ID
>
>
> Please advise.
>
>
> Looks like a bug.
> Can you open a tracker issue for this ?
>
> Thanks,
> Orit
>
>
> Met vriendelijke groeten,
>
> --
> Kerio Operator in de Cloud? https://www.kerioindecloud.nl/
> Mark Schouten | Tuxis Internet Engineering
> KvK: 61527076 | http://www.tuxis.nl/
> T: 0318 200208 | i...@tuxis.nl
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Bluestore osd_max_backfills

2017-11-08 Thread Scottix
When I add in the next HDD I'll try the method again and see if I just
needed to wait longer.
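
If it happens again, the admin socket can confirm whether the injected value
actually took effect - a sketch, run on the host carrying the osd in question:

ceph daemon osd.34 config get osd_max_backfills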

On Tue, Nov 7, 2017 at 11:19 PM Wido den Hollander  wrote:

>
> > Op 7 november 2017 om 22:54 schreef Scottix :
> >
> >
> > Hey,
> > I recently updated to luminous and started deploying bluestore osd
> nodes. I
> > normally set osd_max_backfills = 1 and then ramp up as time progresses.
> >
> > Although with bluestore it seems like I wasn't able to do this on the fly
> > like I used to with XFS.
> >
> > ceph tell osd.* injectargs '--osd-max-backfills 5'
> >
> > osd.34: osd_max_backfills = '5'
> > osd.35: osd_max_backfills = '5' rocksdb_separate_wal_dir = 'false' (not
> > observed, change may require restart)
> > osd.36: osd_max_backfills = '5'
> > osd.37: osd_max_backfills = '5'
> >
> > As I incorporate more bluestore osds not being able to control this is
> > going to drastically affect recovery speed and with the default as 1, on
> a
> > big rebalance, I would be afraid restarting a bunch of osd.
> >
>
> Are you sure the backfills are really not increasing? If you re-run the
> command, what does it output?
>
> I've seen this as well, but the backfills seemed to increase anyway.
>
> Wido
>
> > Any advice in how to control this better?
> >
> > Thanks,
> > Scott
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Erasure pool

2017-11-08 Thread Marc Roos
 
Can anyone advise on an erasure pool config to store 

- files between 500MB and 8GB, total 8TB
- just for archiving, not much reading (few files a week)
- hdd pool
- now 3 node cluster (4th coming)
- would like to save on storage space

I was thinking of a profile with jerasure k=3 m=2, but maybe LRC
is better? Or wait for the 4th node and choose k=4 m=2?
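
For reference, creating a profile and pool looks roughly like this (a sketch -
names and PG count are illustrative; note that with crush-failure-domain=host
a k=3 m=2 profile wants 5 hosts, so on fewer nodes you'd have to lower k/m or
relax the failure domain):

ceph osd erasure-code-profile set archive32 k=3 m=2 crush-failure-domain=host
ceph osd pool create ec-archive 64 64 erasure archive32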


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] bluestore - wal,db on faster devices?

2017-11-08 Thread Nick Fisk
> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
> Mark Nelson
> Sent: 08 November 2017 19:46
> To: Wolfgang Lendl 
> Cc: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] bluestore - wal,db on faster devices?
> 
> Hi Wolfgang,
> 
> You've got the right idea.  RBD is probably going to benefit less since
you
> have a small number of large objects and little extra OMAP data.
> Having the allocation and object metadata on flash certainly shouldn't
hurt,
> and you should still have less overhead for small (<64k) writes.
> With RGW however you also have to worry about bucket index updates
> during writes and that's a big potential bottleneck that you don't need to
> worry about with RBD.

If you are running anything which is sensitive to sync write latency, like
databases. You will see a big performance improvement in using WAL on SSD.
As Mark says, small writes will get ack'd once written to SSD. ~10-200us vs
1-2ms difference. It will also batch lots of these small writes
together and write them to disk in bigger chunks much more effectively. If
you want to run active workloads on RBD and want them to match enterprise
storage array with BBWC type performance, I would say DB and WAL on SSD is a
requirement.

> 
> Mark
> 
> On 11/08/2017 01:01 PM, Wolfgang Lendl wrote:
> > Hi Mark,
> >
> > thanks for your reply!
> > I'm a big fan of keeping things simple - this means that there has to
> > be a very good reason to put the WAL and DB on a separate device
> > otherwise I'll keep it collocated (and simpler).
> >
> > as far as I understood - putting the WAL,DB on a faster (than hdd)
> > device makes more sense in cephfs and rgw environments (more
> metadata)
> > - and less sense in rbd environments - correct?
> >
> > br
> > wolfgang
> >
> > On 11/08/2017 02:21 PM, Mark Nelson wrote:
> >> Hi Wolfgang,
> >>
> >> In bluestore the WAL serves sort of a similar purpose to filestore's
> >> journal, but bluestore isn't dependent on it for guaranteeing
> >> durability of large writes.  With bluestore you can often get higher
> >> large-write throughput than with filestore when using HDD-only or
> >> flash-only OSDs.
> >>
> >> Bluestore also stores allocation, object, and cluster metadata in the
> >> DB.  That, in combination with the way bluestore stores objects,
> >> dramatically improves behavior during certain workloads.  A big one
> >> is creating millions of small objects as quickly as possible.  In
> >> filestore, PG splitting has a huge impact on performance and tail
> >> latency.  Bluestore is much better just on HDD, and putting the DB
> >> and WAL on flash makes it better still since metadata no longer is a
> >> bottleneck.
> >>
> >> Bluestore does have a couple of shortcomings vs filestore currently.
> >> The allocator is not as good as XFS's and can fragment more over time.
> >> There is no server-side readahead so small sequential read
> >> performance is very dependent on client-side readahead.  There's
> >> still a number of optimizations to various things ranging from
> >> threading and locking in the shardedopwq to pglog and dup_ops that
> >> potentially could improve performance.
> >>
> >> I have a blog post that we've been working on that explores some of
> >> these things but I'm still waiting on review before I publish it.
> >>
> >> Mark
> >>
> >> On 11/08/2017 05:53 AM, Wolfgang Lendl wrote:
> >>> Hello,
> >>>
> >>> it's clear to me getting a performance gain from putting the journal
> >>> on a fast device (ssd,nvme) when using filestore backend.
> >>> it's not when it comes to bluestore - are there any resources,
> >>> performance test, etc. out there how a fast wal,db device impacts
> >>> performance?
> >>>
> >>>
> >>> br
> >>> wolfgang
> >>>
> >> ___
> >> ceph-users mailing list
> >> ceph-users@lists.ceph.com
> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Recovery operations and ioprio options

2017-11-08 Thread Nick Fisk
> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
> Захаров Алексей
> Sent: 08 November 2017 16:21
> To: ceph-users@lists.ceph.com
> Subject: [ceph-users] Recovery operations and ioprio options
> 
> Hello,
> Today we use ceph jewel with:
>   osd disk thread ioprio class=idle
>   osd disk thread ioprio priority=7
> and "nodeep-scrub" flag is set.
> 
> We want to change scheduler from CFQ to deadline, so these options will
> lose effect.
> I've tried to find out what operations are performed in "disk thread".
What I
> found is that only scrubbing and snap-trimming operations are performed in
> "disk thread".

In jewel those operations are now in the main OSD thread and setting the
ioprio's will have no effect. Use the scrub and snap trim sleep options to
throttle them.
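
A sketch of what that looks like (values are illustrative only - tune to taste):

osd_scrub_sleep = 0.1          # in ceph.conf under [osd]
osd_snap_trim_sleep = 0.1
ceph tell osd.* injectargs '--osd_scrub_sleep 0.1 --osd_snap_trim_sleep 0.1'   # or at runtime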

> 
> Do these options affect recovery operations?
> Are there any other operations in "disk thread", except scrubbing and
snap-
> trimming?
> 
> --
> Regards,
> Aleksei Zakharov
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Inconsistent PG won't repair

2017-11-08 Thread Richard Bade
For anyone that encounters this in the future, I was able to resolve
the issue by finding the three osd's that the object is on. One by one
I stop the osd, flushed the journal and used the objectstore tool to
remove the data (sudo ceph-objectstore-tool --data-path
/var/lib/ceph/osd/ceph-19 --journal-path
/dev/disk/by-partlabel/journal19 --pool tier3-rbd-3X
rbd_data.19cdf512ae8944a.0001bb56 remove). Then I started the
osd again and let it recover before moving on to the next osd.
After the object was deleted from all three osd's I ran a scrub on the
PG (ceph pg scrub 3.f05). Once the scrub was finished the
inconsistency went away.
Note, the object in question was empty (size of zero bytes) before I
started this process. I emptied the object by moving the rbd image to
another pool.
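
Put together, the per-OSD sequence was roughly the following (a sketch -
assumes systemd-managed OSDs, so substitute your init system, osd ids and
journal paths as appropriate):

sudo systemctl stop ceph-osd@19
sudo ceph-osd -i 19 --flush-journal
sudo ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-19 \
    --journal-path /dev/disk/by-partlabel/journal19 \
    --pool tier3-rbd-3X rbd_data.19cdf512ae8944a.0001bb56 remove
sudo systemctl start ceph-osd@19
# wait for recovery before moving to the next OSD; once all three are done:
ceph pg scrub 3.f05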

Rich

On 24 October 2017 at 14:34, Richard Bade  wrote:
> What I'm thinking about trying is using the ceph-objectstore-tool to
> remove the offending clone metadata. From the help the syntax is this:
> ceph-objectstore-tool ... <object> remove-clone-metadata <cloneid>
> i.e. something like for my object and expected clone from the log message
> ceph-objectstore-tool rbd_data.19cdf512ae8944a.0001bb56
> remove-clone-metadata 148d2
> Anyone had experience with this? I'm not 100% sure if this will
> resolve the issue or cause much the same situation (since it's already
> expecting a clone that's not there currently).
>
> Rich
>
> On 21 October 2017 at 14:13, Brad Hubbard  wrote:
>> On Sat, Oct 21, 2017 at 1:59 AM, Richard Bade  wrote:
>>> Hi Lincoln,
>>> Yes the object is 0-bytes on all OSD's. Has the same filesystem
>>> date/time too. Before I removed the rbd image (migrated disk to
>>> different pool) it was 4MB on all the OSD's and md5 checksum was the
>>> same on all so it seems that only metadata is inconsistent.
>>> Thanks for your suggestion, I just looked into this as I thought maybe
>>> I can delete the object (since it's empty anyway). But I just get file
>>> not found:
>>> ~$ rados stat rbd_data.19cdf512ae8944a.0001bb56 --pool=tier3-rbd-3X
>>>  error stat-ing
>>> tier3-rbd-3X/rbd_data.19cdf512ae8944a.0001bb56: (2) No such
>>> file or directory
>>
>> Maybe try downing the osds involved?
>>
>>>
>>> Regards,
>>> Rich
>>>
>>> On 21 October 2017 at 04:32, Lincoln Bryant  wrote:
 Hi Rich,

 Is the object inconsistent and 0-bytes on all OSDs?

 We ran into a similar issue on Jewel, where an object was empty across the 
 board but had inconsistent metadata. Ultimately it was resolved by doing a 
 "rados get" and then a "rados put" on the object. *However* that was a 
 last ditch effort after I couldn't get any other repair option to work, 
 and I have no idea if that will cause any issues down the road :)

 --Lincoln

> On Oct 20, 2017, at 10:16 AM, Richard Bade  wrote:
>
> Hi Everyone,
> In our cluster running 0.94.10 we had a pg pop up as inconsistent
> during scrub. Previously when this has happened running ceph pg repair
> [pg_num] has resolved the problem. This time the repair runs but it
> remains inconsistent.
> ~$ ceph health detail
> HEALTH_ERR 1 pgs inconsistent; 2 scrub errors; noout flag(s) set
> pg 3.f05 is active+clean+inconsistent, acting [171,23,131]
> 1 scrub errors
>
> The error in the logs is:
> cstor01 ceph-mon: osd.171 10.233.202.21:6816/12694 45 : deep-scrub
> 3.f05 3/68ab5f05/rbd_data.19cdf512ae8944a.0001bb56/snapdir
> expected clone 3/68ab5f05/rbd_data.19cdf512ae8944a.0001bb56/148d2
>
> Now, I've tried several things to resolve this. I've tried stopping
> each of the osd's in turn and running a repair. I've located the rbd
> image and removed it to empty out the object. The object is now zero
> bytes but still inconsistent. I've tried stopping each osd, removing
> the object and starting the osd again. It correctly identifies the
> object as missing and repair works to fix this but it still remains
> inconsistent.
> I've run out of ideas.
> The object is now zero bytes:
> ~$ find /var/lib/ceph/osd/ceph-23/current/3.f05_head/ -name
> "*19cdf512ae8944a.0001bb56*" -ls
> 537598582  0 -rw-r--r--   1 root root0 Oct 21
> 03:54 
> /var/lib/ceph/osd/ceph-23/current/3.f05_head/DIR_5/DIR_0/DIR_F/DIR_5/DIR_B/rbd\\udata.19cdf512ae8944a.0001bb56__snapdir_68AB5F05__3
>
> How can I resolve this? Is there some way to remove the empty object
> completely? I saw reference to ceph-objectstore-tool which has some
> options to remove-clone-metadata but I don't know how to use this.
> Will using this to remove the mentioned 148d2 expected clone resolve
> this? Or would this do the opposite as it would seem that it can't
> find that clone?
> Documentation on this tool is sparse.
>
> Any help here would be appreciated.
>
> Regards,
> Rich

Re: [ceph-users] Blog post: storage server power consumption

2017-11-08 Thread Nick Fisk
Also look at the new WD 10TB Red's if you want very low use archive storage.
Because they spin at 5400, they only use 2.8W at idle.

> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
> Jack
> Sent: 06 November 2017 22:31
> To: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] Blog post: storage server power consumption
> 
> Online does that on C14 (https://www.online.net/en/c14)
> 
> IIRC, 52 spining disks per RU, with only 2 disks usable at a time There is
some
> custom hardware, though, and it is really design for cold storage (as an
IO
> must wait for an idle slot, power-on the device, do the IO, power-off the
> device and release the slot) They use 1GB as a block size
> 
> I do not think this will work anyhow with Ceph
> 
> On 06/11/2017 23:12, Simon Leinen wrote:
> > The last paragraph contains a challenge to developers: Can we save
> > more power in "cold storage" applications by turning off idle disks?
> > Crazy idea, or did anyone already try this?
> >
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Disconnect a client Hypervisor

2017-11-08 Thread Karun Josy
Hi,

Do you think there is a way for ceph to disconnect an HV client from a
cluster?

We want to prevent the possibility that two HVs are running the same VM.
When an HV crashes, we have to make sure that when the
VMs are started on a new HV, the disk is not still open on the crashed HV.


I can see 'eviction' in filesystem:
http://docs.ceph.com/docs/master/cephfs/eviction/

But we are implementing RBD in erasure coded profile.
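
For RBD the closest equivalents I'm aware of are image locks and OSD
blacklisting - a sketch (pool, image and address are placeholders):

rbd lock list vms/vm-disk-1                # check whether the crashed HV still holds a lock
ceph osd blacklist add 192.0.2.10:0/0      # fence all client sessions from that HV
ceph osd blacklist ls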


Karun
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] bluestore - wal,db on faster devices?

2017-11-08 Thread Mark Nelson



On 11/08/2017 03:16 PM, Nick Fisk wrote:

-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
Mark Nelson
Sent: 08 November 2017 19:46
To: Wolfgang Lendl 
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] bluestore - wal,db on faster devices?

Hi Wolfgang,

You've got the right idea.  RBD is probably going to benefit less since

you

have a small number of large objects and little extra OMAP data.
Having the allocation and object metadata on flash certainly shouldn't

hurt,

and you should still have less overhead for small (<64k) writes.
With RGW however you also have to worry about bucket index updates
during writes and that's a big potential bottleneck that you don't need to
worry about with RBD.


If you are running anything which is sensitive to sync write latency, like
databases. You will see a big performance improvement in using WAL on SSD.
As Mark says, small writes will get ack'd once written to SSD. ~10-200us vs
1-2us difference. It will also batch lots of these small writes
together and write them to disk in bigger chunks much more effectively. If
you want to run active workloads on RBD and want them to match enterprise
storage array with BBWC type performance, I would say DB and WAL on SSD is a
requirement.


Hi Nick,

You've done more investigation in this area than most I think.  Once you 
get to the point under continuous load where RocksDB is compacting, do 
you see better than a 2X gain?


Mark





Mark

On 11/08/2017 01:01 PM, Wolfgang Lendl wrote:

Hi Mark,

thanks for your reply!
I'm a big fan of keeping things simple - this means that there has to
be a very good reason to put the WAL and DB on a separate device
otherwise I'll keep it collocated (and simpler).

as far as I understood - putting the WAL,DB on a faster (than hdd)
device makes more sense in cephfs and rgw environments (more

metadata)

- and less sense in rbd environments - correct?

br
wolfgang

On 11/08/2017 02:21 PM, Mark Nelson wrote:

Hi Wolfgang,

In bluestore the WAL serves sort of a similar purpose to filestore's
journal, but bluestore isn't dependent on it for guaranteeing
durability of large writes.  With bluestore you can often get higher
large-write throughput than with filestore when using HDD-only or
flash-only OSDs.

Bluestore also stores allocation, object, and cluster metadata in the
DB.  That, in combination with the way bluestore stores objects,
dramatically improves behavior during certain workloads.  A big one
is creating millions of small objects as quickly as possible.  In
filestore, PG splitting has a huge impact on performance and tail
latency.  Bluestore is much better just on HDD, and putting the DB
and WAL on flash makes it better still since metadata no longer is a
bottleneck.

Bluestore does have a couple of shortcomings vs filestore currently.
The allocator is not as good as XFS's and can fragment more over time.
There is no server-side readahead so small sequential read
performance is very dependent on client-side readahead.  There's
still a number of optimizations to various things ranging from
threading and locking in the shardedopwq to pglog and dup_ops that
potentially could improve performance.

I have a blog post that we've been working on that explores some of
these things but I'm still waiting on review before I publish it.

Mark

On 11/08/2017 05:53 AM, Wolfgang Lendl wrote:

Hello,

it's clear to me getting a performance gain from putting the journal
on a fast device (ssd,nvme) when using filestore backend.
it's not when it comes to bluestore - are there any resources,
performance test, etc. out there how a fast wal,db device impacts
performance?


br
wolfgang


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] who is using nfs-ganesha and cephfs?

2017-11-08 Thread Sage Weil
Who is running nfs-ganesha's FSAL to export CephFS?  What has your 
experience been?

(We are working on building proper testing and support for this into 
Mimic, but the ganesha FSAL has been around for years.)
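
(For anyone who hasn't seen it, a minimal export block for that FSAL looks
roughly like the following - a sketch, IDs and paths are illustrative only:)

EXPORT {
    Export_Id = 1;
    Path = "/";
    Pseudo = "/cephfs";
    Access_Type = RW;
    FSAL {
        Name = CEPH;
    }
}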

Thanks!
sage

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] who is using nfs-ganesha and cephfs?

2017-11-08 Thread Marc Roos
 

I am, in a test environment: CentOS 7, on a luminous osd node, with binaries
from
download.ceph.com::ceph/nfs-ganesha/rpm-V2.5-stable/luminous/x86_64/

Having these:
Nov  6 17:41:34 c01 kernel: ganesha.nfsd[31113]: segfault at 0 ip 
7fa80a151a43 sp 7fa755ffa2f0 error 4 in 
libdbus-1.so.3.7.4[7fa80a12b000+46000]
Nov  6 17:41:34 c01 kernel: ganesha.nfsd[31113]: segfault at 0 ip 
7fa80a151a43 sp 7fa755ffa2f0 error 4 in 
libdbus-1.so.3.7.4[7fa80a12b000+46000]
Nov  6 17:42:16 c01 kernel: ganesha.nfsd[6839]: segfault at 8 ip 
7fc97a5d3f98 sp 7fc8c6ffc2f8 error 6 in 
libdbus-1.so.3.7.4[7fc97a5ac000+46000]
Nov  6 17:42:16 c01 kernel: ganesha.nfsd[6839]: segfault at 8 ip 
7fc97a5d3f98 sp 7fc8c6ffc2f8 error 6 in 
libdbus-1.so.3.7.4[7fc97a5ac000+46000]
Nov  6 17:47:47 c01 kernel: ganesha.nfsd[7662]: segfault at 4 ip 
7f15e2afc060 sp 7f152effc388 error 6 in 
libdbus-1.so.3.7.4[7f15e2ad6000+46000]
Nov  6 17:47:47 c01 kernel: ganesha.nfsd[7662]: segfault at 4 ip 
7f15e2afc060 sp 7f152effc388 error 6 in 
libdbus-1.so.3.7.4[7f15e2ad6000+46000]
Nov  6 17:52:25 c01 kernel: ganesha.nfsd[14415]: segfault at 88 ip 
7f9258eed453 sp 7f91a9ff2348 error 4 in 
libdbus-1.so.3.7.4[7f9258eda000+46000]
Nov  6 17:52:25 c01 kernel: ganesha.nfsd[14415]: segfault at 88 ip 
7f9258eed453 sp 7f91a9ff2348 error 4 in 
libdbus-1.so.3.7.4[7f9258eda000+46000]


And reported this
https://github.com/nfs-ganesha/nfs-ganesha/issues/215



-Original Message-
From: Sage Weil [mailto:sw...@redhat.com] 
Sent: woensdag 8 november 2017 22:42
To: ceph-us...@ceph.com; ceph-de...@vger.kernel.org
Subject: [ceph-users] who is using nfs-ganesha and cephfs?

Who is running nfs-ganesha's FSAL to export CephFS?  What has your 
experience been?

(We are working on building proper testing and support for this into 
Mimic, but the ganesha FSAL has been around for years.)

Thanks!
sage

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] bluestore - wal,db on faster devices?

2017-11-08 Thread Nick Fisk
> -Original Message-
> From: Mark Nelson [mailto:mnel...@redhat.com]
> Sent: 08 November 2017 21:42
> To: n...@fisk.me.uk; 'Wolfgang Lendl' 
> Cc: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] bluestore - wal,db on faster devices?
> 
> 
> 
> On 11/08/2017 03:16 PM, Nick Fisk wrote:
> >> -Original Message-
> >> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf
> >> Of Mark Nelson
> >> Sent: 08 November 2017 19:46
> >> To: Wolfgang Lendl 
> >> Cc: ceph-users@lists.ceph.com
> >> Subject: Re: [ceph-users] bluestore - wal,db on faster devices?
> >>
> >> Hi Wolfgang,
> >>
> >> You've got the right idea.  RBD is probably going to benefit less
> >> since
> > you
> >> have a small number of large objects and little extra OMAP data.
> >> Having the allocation and object metadata on flash certainly
> >> shouldn't
> > hurt,
> >> and you should still have less overhead for small (<64k) writes.
> >> With RGW however you also have to worry about bucket index updates
> >> during writes and that's a big potential bottleneck that you don't
> >> need to worry about with RBD.
> >
> > If you are running anything which is sensitive to sync write latency,
> > like databases. You will see a big performance improvement in using WAL
> on SSD.
> > As Mark says, small writes will get ack'd once written to SSD.
> > ~10-200us vs 1-2us difference. It will also batch lots of
> > these small writes together and write them to disk in bigger chunks
> > much more effectively. If you want to run active workloads on RBD and
> > want them to match enterprise storage array with BBWC type
> > performance, I would say DB and WAL on SSD is a requirement.
> 
> Hi Nick,
> 
> You've done more investigation in this area than most I think.  Once you get
> to the point under continuous load where RocksDB is compacting, do you see
> better than a 2X gain?
> 
> Mark

Hi Mark,

I've not really been testing it in a way where all the OSD's would be under 
100% load for a long period of time. It's been more of a real world user facing 
test where IO comes and goes in short bursts and spikes. I've been busy in other 
areas for the last few months and so have sort of missed out on all the 
official Luminous/bluestore goodness. I hope to get round to doing some more 
testing towards the end of the year though. Once I do, I will look into the 
compaction and see what impact it might be having.

> 
> >
> >>
> >> Mark
> >>
> >> On 11/08/2017 01:01 PM, Wolfgang Lendl wrote:
> >>> Hi Mark,
> >>>
> >>> thanks for your reply!
> >>> I'm a big fan of keeping things simple - this means that there has
> >>> to be a very good reason to put the WAL and DB on a separate device
> >>> otherwise I'll keep it collocated (and simpler).
> >>>
> >>> as far as I understood - putting the WAL,DB on a faster (than hdd)
> >>> device makes more sense in cephfs and rgw environments (more
> >> metadata)
> >>> - and less sense in rbd environments - correct?
> >>>
> >>> br
> >>> wolfgang
> >>>
> >>> On 11/08/2017 02:21 PM, Mark Nelson wrote:
>  Hi Wolfgang,
> 
>  In bluestore the WAL serves sort of a similar purpose to
>  filestore's journal, but bluestore isn't dependent on it for
>  guaranteeing durability of large writes.  With bluestore you can
>  often get higher large-write throughput than with filestore when
>  using HDD-only or flash-only OSDs.
> 
>  Bluestore also stores allocation, object, and cluster metadata in
>  the DB.  That, in combination with the way bluestore stores
>  objects, dramatically improves behavior during certain workloads.
>  A big one is creating millions of small objects as quickly as
>  possible.  In filestore, PG splitting has a huge impact on
>  performance and tail latency.  Bluestore is much better just on
>  HDD, and putting the DB and WAL on flash makes it better still
>  since metadata no longer is a bottleneck.
> 
>  Bluestore does have a couple of shortcomings vs filestore currently.
>  The allocator is not as good as XFS's and can fragment more over time.
>  There is no server-side readahead so small sequential read
>  performance is very dependent on client-side readahead.  There's
>  still a number of optimizations to various things ranging from
>  threading and locking in the shardedopwq to pglog and dup_ops that
>  potentially could improve performance.
> 
>  I have a blog post that we've been working on that explores some of
>  these things but I'm still waiting on review before I publish it.
> 
>  Mark
> 
>  On 11/08/2017 05:53 AM, Wolfgang Lendl wrote:
> > Hello,
> >
> > it's clear to me getting a performance gain from putting the
> > journal on a fast device (ssd,nvme) when using filestore backend.
> > it's not when it comes to bluestore - are there any resources,
> > performance test, etc. out there how a fast wal,db device impacts
> > perform

[ceph-users] recreate ceph-deploy node

2017-11-08 Thread James Forde
On my cluster I have a ceph-deploy node that is not a mon or osd. This is my 
bench system, and I want to recreate the ceph-deploy node to simulate a 
failure. I cannot find this outlined anywhere, so I thought I would ask.

Basically follow Preflight 
http://docs.ceph.com/docs/master/start/quick-start-preflight/#ceph-deploy-setup

Install OS
Setup User with passwordless sudo privileges
Configure ssh and copy to each node
Update repositories
Install Ceph-Deploy
Open Ports etc
Make Directory and cd to directory

This is where I get confused. If I were creating a new cluster I would run 
"ceph-deploy new node1", but this cluster is not new.
Do I just get the ceph.conf and ceph.mon.keyring and move them to the directory 
I just created?
Do I gather keys?
Then run ceph-deploy install?
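
In other words, I'm guessing the sequence from the new directory is something 
like this (untested, and the hostnames are placeholders):

ceph-deploy config pull <existing-mon>
ceph-deploy gatherkeys <existing-mon>
ceph-deploy admin <this-admin-node>

rather than running "ceph-deploy new" again.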

Thanks in advance.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] VM Data corruption shortly after Luminous Upgrade

2017-11-08 Thread James Forde
Wow, thanks for the heads-up Jason. That explains a lot. I followed the 
instructions here http://ceph.com/releases/v12-2-0-luminous-released/ which 
apparently left out that step. I have now executed that command.

Is there a new master list of the CLI commands?

From: Jason Dillaman [mailto:jdill...@redhat.com]
Sent: Wednesday, November 8, 2017 9:53 AM
To: James Forde 
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] VM Data corruption shortly after Luminous Upgrade

Are your QEMU VMs using a different CephX user than client.admin? If so, can 
you double-check your caps to ensure that the QEMU user can blacklist? See step 
6 in the upgrade instructions [1]. The fact that "rbd resize" fixed something 
hints that your VMs had hard-crashed with the exclusive lock left in the locked 
position and QEMU wasn't able to break the lock when the VMs were restarted.

[1] http://docs.ceph.com/docs/master/release-notes/#upgrade-from-jewel-or-kraken
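
For reference, the caps change from step 6 looks roughly like the following 
(substitute your own client ID and keep whatever OSD caps the user already has):

ceph auth caps client.<ID> mon 'allow r, allow command "osd blacklist"' osd '<existing OSD caps for user>'

You can check the existing caps first with "ceph auth get client.<ID>".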

On Wed, Nov 8, 2017 at 10:29 AM, James Forde wrote:
Title probably should have read "Ceph Data corruption shortly after Luminous 
Upgrade".

The problem seems to have been sorted out. I'm still not sure what caused the 
original problem, other than upgrade latency or mgr errors.
After I resolved the boot problem I attempted to reproduce the error, but was 
unsuccessful, which is good. HEALTH_OK.

Anyway, for future users running into the Windows "Unmountable Boot Volume" 
error or CentOS 7 booting into emergency mode, HERE IS THE SOLUTION.

Get the rbd image size, increase it by 1 GB, and restart the VM. That's it. All 
VMs booted right up after increasing the rbd image by 1024 MB. It takes just a 
couple of seconds.


rbd info vmtest
rbd image 'vmtest':
size 20480 MB

rbd resize --image vmtest --size 21504


rbd info vmtest
rbd image 'vmtest':
size 21504 MB


Good luck


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



--
Jason
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] who is using nfs-ganesha and cephfs?

2017-11-08 Thread Lincoln Bryant
Hi Sage,

We have been running the Ganesha FSAL for a while (as far back as Hammer / 
Ganesha 2.2.0), primarily for uid/gid squashing.
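
For reference, our export blocks are roughly along these lines (simplified and 
from memory, so treat the details as approximate):

EXPORT {
    Export_Id = 1;
    Path = "/";
    Pseudo = "/cephfs";
    Access_Type = RW;
    Squash = All_Squash;
    Anonymous_Uid = 1000;
    Anonymous_Gid = 1000;
    FSAL {
        Name = CEPH;
    }
}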

Things are basically OK for our application, but we've seen the following 
weirdness*:
- Sometimes there are duplicated entries when directories are listed. 
Same filename, same inode, just shows up twice in 'ls'.
- There can be a considerable latency between new files added to CephFS 
and those files becoming visible on our NFS clients. I understand this might be 
related to dentry caching. 
- Occasionally, the Ganesha FSAL seems to max out at 100,000 caps 
claimed which don't get released until the MDS is restarted.
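
We spot the runaway caps by looking at the Ganesha client's session on the MDS 
host, roughly like this (exact field names may differ by release):

ceph daemon mds.<name> session ls | grep num_caps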

*note: these issues are with Ganesha 2.2.0 and Hammer/Jewel, and have perhaps 
since been fixed upstream. 

(We've recently updated to Luminous / Ganesha 2.5.2, and will be happy to 
complain if any issues show up :))

Cheers,
Lincoln

> On Nov 8, 2017, at 3:41 PM, Sage Weil  wrote:
> 
> Who is running nfs-ganesha's FSAL to export CephFS?  What has your 
> experience been?
> 
> (We are working on building proper testing and support for this into 
> Mimic, but the ganesha FSAL has been around for years.)
> 
> Thanks!
> sage
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Fwd: Luminous RadosGW issue

2017-11-08 Thread Sam Huracan
Hi Cephers,

I'm testing RadosGW on the Luminous version. I've already installed it on a
separate host; the service is running, but RadosGW does not pick up any of my
configuration in ceph.conf.

My Config:
[client.radosgw.gateway]
host = radosgw
keyring = /etc/ceph/ceph.client.radosgw.keyring
rgw socket path = /var/run/ceph/ceph.radosgw.gateway.fastcgi.sock
log file = /var/log/radosgw/client.radosgw.gateway.log
rgw dns name = radosgw.demo.com
rgw print continue = false


When I show config of radosgw socket:
[root@radosgw ~]# ceph --admin-daemon
/var/run/ceph/ceph-client.rgw.radosgw.asok
config show | grep dns
"mon_dns_srv_name": "",
"*rgw_dns_name": "",*
"rgw_dns_s3website_name": "",

rgw_dns_name is empty, hence the S3 API is unable to access the Ceph Object Storage.


Has anyone else run into this issue?

The ceph version I'm using is ceph-radosgw-12.2.1-0.el7.x86_64

Thanks in advance
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Fwd: Luminous RadosGW issue

2017-11-08 Thread Hans van den Bogert
Are you sure you deployed it with the client.radosgw.gateway name as
well? Try to redeploy the RGW and make sure the name you give it
corresponds to the name you give in the ceph.conf. Also, do not forget
to push the ceph.conf to the RGW machine.
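
For example, judging by the admin socket name (ceph-client.rgw.radosgw.asok), 
the daemon is probably running as client.rgw.radosgw rather than 
client.radosgw.gateway, so the section in ceph.conf would have to match. 
Something along these lines (just a sketch, adjust to the actual ID):

[client.rgw.radosgw]
host = radosgw
rgw dns name = radosgw.demo.com
log file = /var/log/radosgw/client.rgw.radosgw.log

You can double-check which ID the daemon really uses with "ps aux | grep 
radosgw" (look at the --name argument) or from the systemd unit name.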

On Wed, Nov 8, 2017 at 11:44 PM, Sam Huracan  wrote:
>
>
> Hi Cephers,
>
> I'm testing RadosGW in Luminous version.  I've already installed done in 
> separate host, service is running but RadosGW did not accept any my 
> configuration in ceph.conf.
>
> My Config:
> [client.radosgw.gateway]
> host = radosgw
> keyring = /etc/ceph/ceph.client.radosgw.keyring
> rgw socket path = /var/run/ceph/ceph.radosgw.gateway.fastcgi.sock
> log file = /var/log/radosgw/client.radosgw.gateway.log
> rgw dns name = radosgw.demo.com
> rgw print continue = false
>
>
> When I show config of radosgw socket:
> [root@radosgw ~]# ceph --admin-daemon 
> /var/run/ceph/ceph-client.rgw.radosgw.asok config show | grep dns
> "mon_dns_srv_name": "",
> "rgw_dns_name": "",
> "rgw_dns_s3website_name": "",
>
> rgw_dns_name is empty, hence S3 API is unable to access Ceph Object Storage.
>
>
> Do anyone meet this issue?
>
> My ceph version I'm  using is ceph-radosgw-12.2.1-0.el7.x86_64
>
> Thanks in advance
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Fwd: Luminous RadosGW issue

2017-11-08 Thread Sam Huracan
@Hans: Yes, I tried to redeploy the RGW and made sure client.radosgw.gateway is
the same in ceph.conf.
Everything goes well: the radosgw service is running and port 7480 is open, but
none of my radosgw settings in ceph.conf take effect; rgw_dns_name is still
empty, and the log file keeps its default value.

[root@radosgw system]# ceph --admin-daemon
/var/run/ceph/ceph-client.rgw.radosgw.asok config show | grep log_file
"log_file": "/var/log/ceph/ceph-client.rgw.radosgw.log",


[root@radosgw system]# cat /etc/ceph/ceph.client.radosgw.keyring
[client.radosgw.gateway]
key = AQCsywNaqQdDHxAAC24O8CJ0A9Gn6qeiPalEYg==
caps mon = "allow rwx"
caps osd = "allow rwx"


2017-11-09 6:11 GMT+07:00 Hans van den Bogert :

> Are you sure you deployed it with the client.radosgw.gateway name as
> well? Try to redeploy the RGW and make sure the name you give it
> corresponds to the name you give in the ceph.conf. Also, do not forget
> to push the ceph.conf to the RGW machine.
>
> On Wed, Nov 8, 2017 at 11:44 PM, Sam Huracan 
> wrote:
> >
> >
> > Hi Cephers,
> >
> > I'm testing RadosGW in Luminous version.  I've already installed done in
> separate host, service is running but RadosGW did not accept any my
> configuration in ceph.conf.
> >
> > My Config:
> > [client.radosgw.gateway]
> > host = radosgw
> > keyring = /etc/ceph/ceph.client.radosgw.keyring
> > rgw socket path = /var/run/ceph/ceph.radosgw.gateway.fastcgi.sock
> > log file = /var/log/radosgw/client.radosgw.gateway.log
> > rgw dns name = radosgw.demo.com
> > rgw print continue = false
> >
> >
> > When I show config of radosgw socket:
> > [root@radosgw ~]# ceph --admin-daemon 
> > /var/run/ceph/ceph-client.rgw.radosgw.asok
> config show | grep dns
> > "mon_dns_srv_name": "",
> > "rgw_dns_name": "",
> > "rgw_dns_s3website_name": "",
> >
> > rgw_dns_name is empty, hence S3 API is unable to access Ceph Object
> Storage.
> >
> >
> > Do anyone meet this issue?
> >
> > My ceph version I'm  using is ceph-radosgw-12.2.1-0.el7.x86_64
> >
> > Thanks in advance
> >
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Fwd: Luminous RadosGW issue

2017-11-08 Thread Sam Huracan
I checked the ceph pools; the cluster has these pools:

[ceph-deploy@ceph1 cluster-ceph]$ ceph osd lspools
2 rbd,3 .rgw.root,4 default.rgw.control,5 default.rgw.meta,6
default.rgw.log,



2017-11-09 11:25 GMT+07:00 Sam Huracan :

> @Hans: Yes, I tried to redeploy RGW, and ensure client.radosgw.gateway is
> the same in ceph.conf.
> Everything go well, service radosgw running, port 7480 is opened, but all
> my config of radosgw in ceph.conf can't be set, rgw_dns_name is still
> empty, and log file keeps default value.
>
> [root@radosgw system]# ceph --admin-daemon 
> /var/run/ceph/ceph-client.rgw.radosgw.asok
> config show | grep log_file
> "log_file": "/var/log/ceph/ceph-client.rgw.radosgw.log",
>
>
> [root@radosgw system]# cat /etc/ceph/ceph.client.radosgw.keyring
> [client.radosgw.gateway]
> key = AQCsywNaqQdDHxAAC24O8CJ0A9Gn6qeiPalEYg==
> caps mon = "allow rwx"
> caps osd = "allow rwx"
>
>
> 2017-11-09 6:11 GMT+07:00 Hans van den Bogert :
>
>> Are you sure you deployed it with the client.radosgw.gateway name as
>> well? Try to redeploy the RGW and make sure the name you give it
>> corresponds to the name you give in the ceph.conf. Also, do not forget
>> to push the ceph.conf to the RGW machine.
>>
>> On Wed, Nov 8, 2017 at 11:44 PM, Sam Huracan 
>> wrote:
>> >
>> >
>> > Hi Cephers,
>> >
>> > I'm testing RadosGW in Luminous version.  I've already installed done
>> in separate host, service is running but RadosGW did not accept any my
>> configuration in ceph.conf.
>> >
>> > My Config:
>> > [client.radosgw.gateway]
>> > host = radosgw
>> > keyring = /etc/ceph/ceph.client.radosgw.keyring
>> > rgw socket path = /var/run/ceph/ceph.radosgw.gateway.fastcgi.sock
>> > log file = /var/log/radosgw/client.radosgw.gateway.log
>> > rgw dns name = radosgw.demo.com
>> > rgw print continue = false
>> >
>> >
>> > When I show config of radosgw socket:
>> > [root@radosgw ~]# ceph --admin-daemon 
>> > /var/run/ceph/ceph-client.rgw.radosgw.asok
>> config show | grep dns
>> > "mon_dns_srv_name": "",
>> > "rgw_dns_name": "",
>> > "rgw_dns_s3website_name": "",
>> >
>> > rgw_dns_name is empty, hence S3 API is unable to access Ceph Object
>> Storage.
>> >
>> >
>> > Do anyone meet this issue?
>> >
>> > My ceph version I'm  using is ceph-radosgw-12.2.1-0.el7.x86_64
>> >
>> > Thanks in advance
>> >
>> >
>> > ___
>> > ceph-users mailing list
>> > ceph-users@lists.ceph.com
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >
>>
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] who is using nfs-ganesha and cephfs?

2017-11-08 Thread Wido den Hollander

> Op 8 november 2017 om 22:41 schreef Sage Weil :
> 
> 
> Who is running nfs-ganesha's FSAL to export CephFS?  What has your 
> experience been?
> 

A customer of mine is doing this. They are running Ubuntu, and my experience is 
that just getting Ganesha compiled is sometimes already a pain.

When it runs it runs just fine. I don't hear a lot of complaints from their 
side with Ganesha or NFS not working.

Wido

> (We are working on building proper testing and support for this into 
> Mimic, but the ganesha FSAL has been around for years.)
> 
> Thanks!
> sage
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] High osd cpu usage

2017-11-08 Thread Vy Nguyen Tan
Hello,

I don't think this is normal behavior in Luminous. I'm testing 3 nodes; each
node has 3 x 1TB HDD, 1 SSD for WAL + DB, an E5-2620 v3, 32GB of RAM, and a
10Gbps NIC.

I use fio for I/O performance measurements. When I run "fio --randrepeat=1
--ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=test
--bs=4k --iodepth=64 --size=4G --readwrite=randrw --rwmixread=75" I get the
%CPU for each ceph-osd as shown below:

   2452 ceph  20   0 2667088 1.813g  15724 S  22.8  5.8  34:41.02
/usr/bin/ceph-osd -f --cluster ceph --id 1 --setuser ceph --setgroup ceph
   2178 ceph  20   0 2872152 2.005g  15916 S  22.2  6.4  43:22.80
/usr/bin/ceph-osd -f --cluster ceph --id 0 --setuser ceph --setgroup ceph
   1820 ceph  20   0 2713428 1.865g  15064 S  13.2  5.9  34:19.56
/usr/bin/ceph-osd -f --cluster ceph --id 2 --setuser ceph --setgroup ceph

Are you using bluestore? How many IOPS and how much disk throughput do you get
with your cluster?
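
If you want to see where the CPU time is actually going on one of those busy
OSDs, something like this might help (assuming perf is installed; 36713 is just
the example PID taken from your top output):

sudo perf top -g -p 36713

The OSD's internal counters ("ceph daemon osd.<id> perf dump") may also give a
hint.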


Regards,

On Wed, Nov 8, 2017 at 8:13 PM, Alon Avrahami 
wrote:

> Hello Guys
>
> We  have a fresh 'luminous'  (  12.2.0 ) 
> (32ce2a3ae5239ee33d6150705cdb24d43bab910c)
> luminous (rc)   ( installed using ceph-ansible )
>
> the cluster contains 6 *  Intel  server board  S2600WTTR  (  96 osds and
> 3 mons )
>
> We have 6 nodes  ( Intel server board  S2600WTTR ) , Mem - 64G , CPU
> -> Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz , 32 cores .
> Each server   has 16 * 1.6TB  Dell SSD drives ( SSDSC2BB016T7R )  , total
> of 96 osds , 3 mons
>
> The main usage  is rbd's for our  OpenStack environment ( Okata )
>
> We're at the beginning of our production tests and it looks like the
> OSDs are too busy, although we don't generate many IOPS at this stage
> (almost nothing).
> All ceph-osds are using ~50% CPU and I can't figure out why they are so
> busy:
>
> top - 07:41:55 up 49 days,  2:54,  2 users,  load average: 6.85, 6.40, 6.37
>
> Tasks: 518 total,   1 running, 517 sleeping,   0 stopped,   0 zombie
> %Cpu(s): 14.8 us,  4.3 sy,  0.0 ni, 80.3 id,  0.0 wa,  0.0 hi,  0.6 si,
> 0.0 st
> KiB Mem : 65853584 total, 23953788 free, 40342680 used,  1557116 buff/cache
> KiB Swap:  3997692 total,  3997692 free,0 used. 18020584 avail Mem
>
> PID USER  PR  NIVIRTRESSHR S  %CPU %MEM TIME+
> COMMAND
>   36713 ceph  20   0 3869588 2.826g  28896 S  47.2  4.5   6079:20
> ceph-osd
>   53981 ceph  20   0 3998732 2.666g  28628 S  45.8  4.2   5939:28
> ceph-osd
>   55879 ceph  20   0 3707004 2.286g  28844 S  44.2  3.6   5854:29
> ceph-osd
>   46026 ceph  20   0 3631136 1.930g  29100 S  43.2  3.1   6008:50
> ceph-osd
>   39021 ceph  20   0 4091452 2.698g  28936 S  42.9  4.3   5687:39
> ceph-osd
>   47210 ceph  20   0 3598572 1.871g  29092 S  42.9  3.0   5759:19
> ceph-osd
>   52763 ceph  20   0 3843216 2.410g  28896 S  42.2  3.8   5540:11
> ceph-osd
>   49317 ceph  20   0 3794760 2.142g  28932 S  41.5  3.4   5872:24
> ceph-osd
>   42653 ceph  20   0 3915476 2.489g  28840 S  41.2  4.0   5605:13
> ceph-osd
>   41560 ceph  20   0 3460900 1.801g  28660 S  38.5  2.9   5128:01
> ceph-osd
>   50675 ceph  20   0 3590288 1.827g  28840 S  37.9  2.9   5196:58
> ceph-osd
>   37897 ceph  20   0 4034180 2.814g  29000 S  34.9  4.5   4789:10
> ceph-osd
>   50237 ceph  20   0 3379780 1.930g  28892 S  34.6  3.1   4846:36
> ceph-osd
>   48608 ceph  20   0 3893684 2.721g  28880 S  33.9  4.3   4752:43
> ceph-osd
>   40323 ceph  20   0 4227864 2.959g  28800 S  33.6  4.7   4712:36
> ceph-osd
>   44638 ceph  20   0 3656780 2.437g  28896 S  33.2  3.9   4793:58
> ceph-osd
>   61639 ceph  20   0  527512 114300  20988 S   2.7  0.2   2722:03
> ceph-mgr
>   31586 ceph  20   0  765672 304140  21816 S   0.7  0.5 409:06.09
> ceph-mon
>  68 root  20   0   0  0  0 S   0.3  0.0   3:09.69
> ksoftirqd/12
>
> strace  doesn't show anything suspicious
>
> root@ecprdbcph10-opens:~# strace -p 36713
> strace: Process 36713 attached
> futex(0x563343c56764, FUTEX_WAIT_PRIVATE, 1, NUL
>
> Ceph logs don't reveal anything?
> Is this "normal" behavior in Luminous?
> Looking at older threads I can only find one about time gaps,
> which is not our case.
>
> Thanks,
> Alon
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com