Re: [ceph-users] Encryption questions

2019-01-24 Thread Gregory Farnum
On Fri, Jan 11, 2019 at 11:24 AM Sergio A. de Carvalho Jr. <
scarvalh...@gmail.com> wrote:

> Thanks for the answers, guys!
>
> Am I right to assume msgr2 (http://docs.ceph.com/docs/mimic/dev/msgr2/)
> will provide encryption between Ceph daemons as well as between clients and
> daemons?
>
> Does anybody know if it will be available in Nautilus?
>

That’s the intention; people are scrambling a bit to get it in soon enough
to validate before the release.
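For what it's worth, the current plan (subject to change until it actually lands in
Nautilus) is that the on-the-wire encryption gets selected per connection class via
ceph.conf, roughly along these lines -- treat the option names and values below as a
sketch of the design rather than a final interface:

[global]
# advertise/accept the new msgr2 protocol (port 3300) alongside legacy v1
ms_bind_msgr2 = true
# "secure" encrypts traffic, "crc" keeps today's integrity-only behaviour
ms_cluster_mode = secure
ms_service_mode = secure
ms_client_mode = secure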


>
> On Fri, Jan 11, 2019 at 8:10 AM Tobias Florek  wrote:
>
>> Hi,
>>
>> as others pointed out, traffic in ceph is unencrypted (internal traffic
>> as well as client traffic).  I usually advise to set up IPSec or
>> nowadays wireguard connections between all hosts.  That takes care of
>> any traffic going over the wire, including ceph.
>>
>> Cheers,
>>  Tobias Florek


Re: [ceph-users] cephfs kernel client instability

2019-01-24 Thread Martin Palma
Hi Ilya,

thank you for the clarification. After setting
"osd_map_messages_max" to 10, the I/O errors and the MDS error
"MDS_CLIENT_LATE_RELEASE" are gone.

The messages of "mon session lost, hunting for new mon" didn't go
away... could it be that this is related to
https://tracker.ceph.com/issues/23537

Best,
Martin

On Thu, Jan 24, 2019 at 10:16 PM Ilya Dryomov  wrote:
>
> On Thu, Jan 24, 2019 at 6:21 PM Andras Pataki
>  wrote:
> >
> > Hi Ilya,
> >
> > Thanks for the clarification - very helpful.
> > I've lowered osd_map_messages_max to 10, and this resolves the issue
> > about the kernel being unhappy about large messages when the OSDMap
> > changes.  One comment here though: you mentioned that Luminous uses 40
> > as the default, which is indeed the case.  The documentation for
> > Luminous (and master), however, says that the default is 100.
>
> Looks like that page hasn't been kept up to date.  I'll fix that
> section.
>
> >
> > One other follow-up question on the kernel client about something I've
> > been seeing while testing.  Does the kernel client clean up when the MDS
> > asks due to cache pressure?  On a machine I ran something that touches a
> > lot of files, so the kernel client accumulated over 4 million caps.
> > Many hours after all the activity finished (i.e. many hours after
> > anything accesses ceph on that node) the kernel client still holds
> > millions of caps, and the MDS periodically complains about clients not
> > responding to cache pressure.  How is this supposed to be handled?
> > Obviously asking the kernel to drop caches via /proc/sys/vm/drop_caches
> > does a very thorough cleanup, but something in the middle would be better.
>
> The kernel client sitting on way too many caps for way too long is
> a long standing issue.  Adding Zheng who has recently been doing some
> work to facilitate cap releases and put a limit on the overall cap
> count.
>
> Thanks,
>
> Ilya


Re: [ceph-users] krbd reboot hung

2019-01-24 Thread Gregory Farnum
Looks like your network deactivated before the rbd volume was unmounted.
This is a known issue without a good programmatic workaround and you’ll
need to adjust your configuration.
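The usual approach, as far as I'm aware, is to let the rbdmap systemd unit own the
mapping and mark the mount as a network filesystem so it is unmounted before the
network goes down -- a rough sketch only (image name, credentials and mountpoint
below are examples, not taken from your setup):

# /etc/ceph/rbdmap
rbd/testimage  id=admin,keyring=/etc/ceph/ceph.client.admin.keyring

# /etc/fstab
/dev/rbd/rbd/testimage  /root/test  ext4  defaults,noatime,_netdev  0 0

systemctl enable rbdmap.service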
On Tue, Jan 22, 2019 at 9:17 AM Gao, Wenjun  wrote:

> I’m using krbd to map an rbd device to a VM. It appears that when the device is
> mounted, rebooting the OS will hang for more than 7 minutes; in the bare-metal case, it
> can be more than 15 minutes. Even using the latest kernel, 5.0.0, the problem
> still occurs.
>
> Here are the console logs with the 4.15.18 kernel and the Mimic rbd client; the reboot
> seems to be stuck in the rbd unmount operation.
>
> *[  OK  ] Stopped Update UTMP about System Boot/Shutdown.*
>
> *[  OK  ] Stopped Create Volatile Files and Directories.*
>
> *[  OK  ] Stopped target Local File Systems.*
>
> * Unmounting /run/user/110281572...*
>
> * Unmounting /var/tmp...*
>
> * Unmounting /root/test...*
>
> * Unmounting /run/user/78402...*
>
> * Unmounting Configuration File System...*
>
> *[  OK  ] Stopped Configure read-only root support.*
>
> *[  OK  ] Unmounted /var/tmp.*
>
> *[  OK  ] Unmounted /run/user/78402.*
>
> *[  OK  ] Unmounted /run/user/110281572.*
>
> *[  OK  ] Stopped target Swap.*
>
> *[  OK  ] Unmounted Configuration File System.*
>
> *[  189.919062] libceph: mon4 XX.XX.XX.XX:6789 session lost, hunting for
> new mon*
>
> *[  189.950085] libceph: connect XX.XX.XX.XX:6789 error -101*
>
> *[  189.950764] libceph: mon3 XX.XX.XX.XX:6789 connect error*
>
> *[  190.687090] libceph: connect XX.XX.XX.XX:6789 error -101*
>
> *[  190.694197] libceph: mon3 XX.XX.XX.XX:6789 connect error*
>
> *[  191.711080] libceph: connect XX.XX.XX.XX:6789 error -101*
>
> *[  191.745254] libceph: mon3 XX.XX.XX.XX:6789 connect error*
>
> *[  193.695065] libceph: connect XX.XX.XX.XX:6789 error -101*
>
> *[  193.727694] libceph: mon3 XX.XX.XX.XX:6789 connect error*
>
> *[  197.087076] libceph: connect XX.XX.XX.XX:6789 error -101*
>
> *[  197.121077] libceph: mon4 XX.XX.XX.XX:6789 connect error*
>
> *[  197.663082] libceph: connect XX.XX.XX.XX:6789 error -101*
>
> *[  197.680671] libceph: mon4 XX.XX.XX.XX:6789 connect error*
>
> *[  198.687122] libceph: connect XX.XX.XX.XX:6789 error -101*
>
> *[  198.719253] libceph: mon4 XX.XX.XX.XX:6789 connect error*
>
> *[  200.671136] libceph: connect XX.XX.XX.XX:6789 error -101*
>
> *[  200.702717] libceph: mon4 XX.XX.XX.XX:6789 connect error*
>
> *[  204.703115] libceph: connect XX.XX.XX.XX:6789 error -101*
>
> *[  204.736586] libceph: mon4 XX.XX.XX.XX:6789 connect error*
>
> *[  209.887141] libceph: connect XX.XX.XX.XX:6789 error -101*
>
> *[  209.918721] libceph: mon0 XX.XX.XX.XX:6789 connect error*
>
> *[  210.719078] libceph: connect XX.XX.XX.XX:6789 error -101*
>
> *[  210.750378] libceph: mon0 XX.XX.XX.XX:6789 connect error*
>
> *[  211.679118] libceph: connect XX.XX.XX.XX:6789 error -101*
>
> *[  211.712246] libceph: mon0 XX.XX.XX.XX:6789 connect error*
>
> *[  213.663116] libceph: connect XX.XX.XX.XX:6789 error -101*
>
> *[  213.696943] libceph: mon0 XX.XX.XX.XX:6789 connect error*
>
> *[  217.695062] libceph: connect XX.XX.XX.XX:6789 error -101*
>
> *[  217.728511] libceph: mon0 XX.XX.XX.XX:6789 connect error*
>
> *[  225.759109] libceph: connect XX.XX.XX.XX:6789 error -101*
>
> *[  225.775869] libceph: mon0 XX.XX.XX.XX:6789 connect error*
>
> *[  233.951062] libceph: connect XX.XX.XX.XX:6789 error -101*
>
> *[  233.951997] libceph: mon3 XX.XX.XX.XX:6789 connect error*
>
> *[  234.719114] libceph: connect XX.XX.XX.XX:6789 error -101*
>
> *[  234.720083] libceph: mon3 XX.XX.XX.XX:6789 connect error*
>
> *[  235.679112] libceph: connect XX.XX.XX.XX:6789 error -101*
>
> *[  235.680060] libceph: mon3 XX.XX.XX.XX:6789 connect error*
>
> *[  237.663088] libceph: connect XX.XX.XX.XX:6789 error -101*
>
> *[  237.664121] libceph: mon3 XX.XX.XX.XX:6789 connect error*
>
> *[  241.695082] libceph: connect XX.XX.XX.XX:6789 error -101*
>
> *[  241.696500] libceph: mon3 XX.XX.XX.XX:6789 connect error*
>
> *[  249.823095] libceph: connect XX.XX.XX.XX:6789 error -101*
>
> *[  249.824101] libceph: mon3 XX.XX.XX.XX:6789 connect error*
>
> *[  264.671119] libceph: connect XX.XX.XX.XX:6789 error -101*
>
> *[  264.672102] libceph: mon0 XX.XX.XX.XX:6789 connect error*
>
> *[  265.695109] libceph: connect XX.XX.XX.XX:6789 error -101*
>
> *[  265.696106] libceph: mon0 XX.XX.XX.XX:6789 connect error*
>
> *[  266.719145] libceph: connect XX.XX.XX.XX:6789 error -101*
>
> *[  266.720204] libceph: mon0 XX.XX.XX.XX:6789 connect error*
>
> *[  268.703121] libceph: connect XX.XX.XX.XX:6789 error -101*
>
> *[  268.704110] libceph: mon0 XX.XX.XX.XX:6789 connect error*
>
> *[  272.671115] libceph: connect XX.XX.XX.XX:6789 error -101*
>
> *[  272.672159] libceph: mon0 XX.XX.XX.XX:6789 connect error*
>
> *[  281.055087] libceph: connect XX.XX.XX.XX:6789 error -101*
>
> *[  281.056577] libceph: mon0 XX.XX.XX.XX:6789 connect error*
>
> *[  294.879098] libceph: connect XX.XX.XX.XX:6789 error -101*
>
> *[  294.88

Re: [ceph-users] backfill_toofull while OSDs are not full

2019-01-24 Thread Gregory Farnum
This doesn’t look familiar to me. Is the cluster still doing recovery so we
can at least expect them to make progress when the “out” OSDs get removed
from the set?
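(A quick way to watch whether they are at least progressing, sketched from memory --
the PG id below is just a placeholder:)

ceph pg dump pgs_brief | grep backfill_toofull
ceph pg 1.23 query | less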
On Tue, Jan 22, 2019 at 2:44 PM Wido den Hollander  wrote:

> Hi,
>
> I've got a couple of PGs which are stuck in backfill_toofull, but none
> of them are actually full.
>
>   "up": [
> 999,
> 1900,
> 145
>   ],
>   "acting": [
> 701,
> 1146,
> 1880
>   ],
>   "backfill_targets": [
> "145",
> "999",
> "1900"
>   ],
>   "acting_recovery_backfill": [
> "145",
> "701",
> "999",
> "1146",
> "1880",
> "1900"
>   ],
>
> I checked all these OSDs, but they are all <75% utilization.
>
> full_ratio 0.95
> backfillfull_ratio 0.9
> nearfull_ratio 0.9
>
> So I started checking all the PGs and I've noticed that each of these
> PGs has one OSD in the 'acting_recovery_backfill' which is marked as out.
>
> In this case osd.1880 is marked as out and thus its capacity is shown
> as zero.
>
> [ceph@ceph-mgr ~]$ ceph osd df|grep 1880
> 1880   hdd 4.545990 0 B  0 B  0 B 00  27
> [ceph@ceph-mgr ~]$
>
> This is on a Mimic 13.2.4 cluster. Is this expected or is this an unknown
> side-effect of one of the OSDs being marked as out?
>
> Thanks,
>
> Wido


Re: [ceph-users] create osd failed due to cephx authentication

2019-01-24 Thread Zhenshi Zhou
Hi,

I added the --no-mon-config option to the command because the error message
suggested I could try it, and the command then executed successfully. Now
I have added the OSDs to the cluster and everything seems well.

I'm wondering whether this option has any effect on the OSD or not?

Thanks

Marc Roos wrote on Friday, January 25, 2019 at 4:16 AM:

>
>
> ceph osd create
>
> ceph osd rm osd.15
>
> sudo -u ceph mkdir /var/lib/ceph/osd/ceph-15
>
> ceph-disk prepare --bluestore --zap-disk /dev/sdc (bluestore)
>
> blkid /dev/sdb1
> echo "UUID=a300d511-8874-4655-b296-acf489d5cbc8
> /var/lib/ceph/osd/ceph-15 xfs defaults   0 0" >> /etc/fstab
> mount /var/lib/ceph/osd/ceph-15
> chown ceph.ceph -R /var/lib/ceph/osd/ceph-15
>
>
> sudo -u ceph ceph-osd -i 15 --mkfs --mkkey --osd-uuid
> sudo -u ceph ceph auth add osd.15 osd 'allow *' mon 'allow profile osd'
> mgr 'allow profile osd' -i /var/lib/ceph/osd/ceph-15/keyring
>
> ceph osd create
>
> sudo -u ceph ceph osd crush add osd.15 0.4 host=c04
>
> systemctl start ceph-osd@15
> systemctl enable ceph-osd@15
>
>
>
>
>
> -Original Message-
> From: Zhenshi Zhou [mailto:deader...@gmail.com]
> Sent: 24 January 2019 10:32
> To: ceph-users
> Subject: [ceph-users] create osd failed due to cephx authentication
>
> Hi,
>
> I'm installing a new ceph cluster manually. I get errors when I create
> osd:
>
> # ceph-osd -i 0 --mkfs --mkkey
> 2019-01-24 17:07:44.045 7f45f497b1c0 -1 auth: unable to find a keyring
> on /var/lib/ceph/osd/ceph-0/keyring: (2) No such file or directory
> 2019-01-24 17:07:44.045 7f45f497b1c0 -1 monclient: ERROR: missing
> keyring, cannot use cephx for authentication
>
> Some informations are provided, did I miss anything?
>
> # cat /etc/ceph/ceph.conf:
> [global]
> ...
> [osd.0]
> host = ceph-osd1
> osd data = /var/lib/ceph/osd/ceph-0
> bluestore block path = /dev/disk/by-partlabel/bluestore_block_0
> bluestore block db path = /dev/disk/by-partlabel/bluestore_block_db_0
> bluestore block wal path = /dev/disk/by-partlabel/bluestore_block_wal_0
>
> # ls /var/lib/ceph/osd/ceph-0
> type
>
> # cat /var/lib/ceph/osd/ceph-0/type
> bluestore
>
> # ceph -s
>   cluster:
> id: 7712ab7e-3c38-44b3-96d3-4e1de9da0ff6
> health: HEALTH_OK
>
>   services:
> mon: 3 daemons, quorum ceph-mon1,ceph-mon2,ceph-mon3
> mgr: ceph-mon1(active), standbys: ceph-mon2, ceph-mon3
> osd: 1 osds: 0 up, 0 in
>
>   data:
> pools:   0 pools, 0 pgs
> objects: 0  objects, 0 B
> usage:   0 B used, 0 B / 0 B avail
> pgs:
>
> # ceph --version
> ceph version 13.2.4 (b10be4d44915a4d78a8e06aa31919e74927b142e) mimic
> (stable)
>
>
>
>


Re: [ceph-users] Commercial support

2019-01-24 Thread Joachim Kraftmayer

Hi Ketil,

We also offer independent Ceph consulting and have been operating
production clusters for more than 4 years and up to 2,500 OSDs.

You can meet many of us in person at the next Cephalocon in Barcelona.
(https://ceph.com/cephalocon/barcelona-2019/)


Regards, Joachim

Clyso GmbH

Homepage: https://www.clyso.com

On 24.01.2019 at 15:23, Matthew Vernon wrote:

Hi,

On 23/01/2019 22:28, Ketil Froyn wrote:

How is the commercial support for Ceph? More specifically, I was  
recently pointed in the direction of the very interesting combination 
of CephFS, Samba and ctdb. Is anyone familiar with companies that 
provide commercial support for in-house solutions like this?


To add to the answers you've already had:

Ubuntu also offer Ceph & Swift support:
https://www.ubuntu.com/support/plans-and-pricing#storage

Croit offer their own managed Ceph product, but do also offer 
support/consulting for Ceph installs, I think:

https://croit.io/

There are some smaller consultancies, too, including 42on which is run 
by Wido den Hollander who you will have seen posting here:

https://www.42on.com/

Regards,

Matthew
disclaimer: I have no commercial relationship to any of the above




Re: [ceph-users] Salvage CEPHFS after lost PG

2019-01-24 Thread Rik
Thanks Marc,

When I next have physical access to the cluster, I’ll add some more OSDs. Would 
that cause the hanging though?

No takers on the bluestore salvage?

thanks,
rik.

> On 20 Jan 2019, at 20:36, Marc Roos  wrote:
> 
> 
> If you have a backfillfull condition, no PGs will be able to migrate.
> It is better to just add hard drives, because at least one of your OSDs is
> too full.
> 
> I know you can set the backfillfull ratios with commands like these
> ceph tell osd.* injectargs '--mon_osd_full_ratio=0.97'
> ceph tell osd.* injectargs '--mon_osd_backfillfull_ratio=0.95'
> 
> ceph tell osd.* injectargs '--mon_osd_full_ratio=0.95'
> ceph tell osd.* injectargs '--mon_osd_backfillfull_ratio=0.90'
> 
> Or maybe decrease the weight of the full OSD; check the OSDs with 'ceph
> osd status' and make sure your nodes have an even distribution of the
> storage.
> 
> 
> -Original Message-
> From: Rik [mailto:r...@kawaja.net] 
> Sent: zondag 20 januari 2019 8:47
> To: ceph-users@lists.ceph.com
> Subject: [ceph-users] Salvage CEPHFS after lost PG
> 
> Hi all,
> 
> 
> 
> 
> I'm looking for some suggestions on how to do something inappropriate. 
> 
> 
> 
> 
> In a nutshell, I've lost the WAL/DB for three bluestore OSDs on a small 
> cluster and, as a result of those three OSDs going offline, I've lost a 
> placement group (7.a7). How I achieved this feat is an embarrassing 
> mistake, which I don't think has bearing on my question.
> 
> 
> 
> 
> The OSDs were created a few months ago with ceph-deploy:
> 
> /usr/local/bin/ceph-deploy --overwrite-conf osd create --bluestore 
> --data /dev/vdc1 --block-db /dev/vdf1 ceph-a
> 
> 
> 
> 
> With the 3 OSDs out, I'm sitting at OSD_BACKFILLFULL.
> 
> 
> 
> 
> First, the PG 7.a7 belongs to the data pool, rather than the metadata 
> pool and if I run "cephfs-data-scan pg_files / 7.a7" then I get a list 
> of 4149 files/objects but then it hangs. I don't understand why this 
> would hang if it's only the data pool which is impacted (since pg_files 
> only operates on the metadata pool?).
> 
> 
> 
> 
> The ceph-log shows:
> 
> cluster [WRN] slow request 30.894832 seconds old, received at 2019-01-20 
> 18:00:12.555398: client_request(client.25017730:21
> 
> 8006 lookup #0x10001c8ce15/01 2019-01-20 18:00:12.550421 
> caller_uid=0, caller_gid=0{}) currently failed to rdlock, waiting
> 
> 
> 
> 
> Is the hang perhaps related to the OSD_BACKFILLFULL? If so, I could add 
> some completely new OSDs to fix that problem. I have held off doing that 
> for now as that will trigger a whole lot of data movement which might be 
> unnecessary.
> 
> 
> 
> 
> Or is the hang indeed related to the missing PG?
> 
> 
> 
> 
> Second, if I try to copy files out of the CEPHFS filesystem, I get a few 
> hundred files and then it too hangs. None of the files I’m attempting 
> to copy are listed in the pg_files output (although since the pg_files 
> hangs, perhaps it hadn't got to those files yet). Again, should I not be 
> able to access files which are not associated with a missing data 
> pool PG?
> 
> 
> 
> 
> Lastly, I want to know if there is some way to recreate the WAL/DB while 
> leaving the OSD data intact and/or fool one of the OSDs into thinking 
> everything is OK, allowing it to serve up the data it has in the missing 
> PG.
> 
> 
> 
> 
> From reading the mailing list and documentation, I know that this is not 
> a "safe" operation:
> 
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-October/021713.html
> 
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-January/024268.html
> 
> 
> 
> 
> However, my current status indicates an unusable CEPHFS and limited 
> access to the data. I'd like to get as much data off it as possible and 
> then I expect to have to recreate it. With a combination of the backups 
> I have and what I can salvage from the cluster, I should hopefully have 
> most of what I need.
> 
> 
> 
> 
> I know what I *should* have done, but now I'm at this point, I know I'm 
> asking for something which would never be required on a properly-run 
> cluster.
> 
> 
> 
> 
> If it really is not possible to get the (possibly corrupt) PG back 
> again, can I get the cluster back so the remainder of the files are 
> accessible?
> 
> 
> 
> 
> Currently running mimic 13.2.4 on all nodes.
> 
> 
> 
> 
> Status:
> 
> $ ceph health detail - 
> https://gist.github.com/kawaja/f59d231179b3186748eca19aae26bcd4
> 
> $ ceph fs get main - 
> https://gist.github.com/kawaja/a7ab0b285d53dee6a950a4310be4fa5a
> 
> 
> 
> 
> Any advice on where I could go from here would be greatly appreciated.
> 
> 
> 
> 
> thanks,
> 
> rik.
> 
> 


[ceph-users] Creating bootstrap keys

2019-01-24 Thread Randall Smith
Greetings,

I have a ceph cluster that I've been running since the argonaut release.
I've been upgrading it over the years and now it's mostly on mimic. A
number of the tools (eg. ceph-volume) require the bootstrap keys that are
assumed to be in /var/lib/ceph/bootstrap-*/. Because of the history of this
cluster, I do not have these keys. What is the correct way to create them?
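(My best guess so far is something along these lines -- a sketch based on the
standard bootstrap cap profiles, untested against a cluster with this much history:)

mkdir -p /var/lib/ceph/bootstrap-osd
ceph auth get-or-create client.bootstrap-osd mon 'allow profile bootstrap-osd' \
    -o /var/lib/ceph/bootstrap-osd/ceph.keyring
# repeat for bootstrap-mds / bootstrap-rgw / bootstrap-rbd with the matching profiles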

Thanks

-- 
Randall Smith
Computing Services
Adams State University
http://www.adams.edu/
719-587-7741


Re: [ceph-users] Radosgw s3 subuser permissions

2019-01-24 Thread Marc Roos


This should do it sort of.

{
  "Id": "Policy1548367105316",
  "Version": "2012-10-17",
  "Statement": [
{
  "Sid": "Stmt1548367099807",
  "Effect": "Allow",
  "Action": "s3:ListBucket",
  "Principal": { "AWS": "arn:aws:iam::Company:user/testuser" },
  "Resource": "arn:aws:s3:::archive"
},
{
  "Sid": "Stmt1548369229354",
  "Effect": "Allow",
  "Action": [
"s3:GetObject",
"s3:PutObject",
"s3:ListBucket"
  ],
  "Principal": { "AWS": "arn:aws:iam::Company:user/testuser" },
  "Resource": "arn:aws:s3:::archive/folder2/*"
}
  ]
} 
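Applying it should then be something like this (untested; the endpoint and policy
file name are placeholders):

s3cmd setpolicy policy.json s3://archive
# or, with the aws cli pointed at the RGW endpoint:
aws --endpoint-url http://rgw.example.com s3api put-bucket-policy \
    --bucket archive --policy file://policy.json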





-Original Message-
From: Matt Benjamin [mailto:mbenj...@redhat.com] 
Sent: 24 January 2019 21:36
To: Marc Roos
Cc: ceph-users
Subject: Re: [ceph-users] Radosgw s3 subuser permissions

Hi Marc,

I'm not actually certain whether the traditional ACLs permit any 
solution for that, but I believe with bucket policy, you can achieve 
precise control within and across tenants, for any set of desired 
resources (buckets).

Matt

On Thu, Jan 24, 2019 at 3:18 PM Marc Roos  
wrote:
>
>
> It is correct that it is NOT possible for s3 subusers to have 
> different permissions on folders created by the parent account?
> Thus the --access=[ read | write | readwrite | full ] is for 
> everything the parent has created, and it is not possible to change 
> that for specific folders/buckets?
>
> radosgw-admin subuser create --uid='Company$archive' 
> --subuser=testuser
> --key-type=s3
>
> Thus if archive created this bucket/folder structure.
> └── bucket
> ├── folder1
> ├── folder2
> └── folder3
> └── folder4
>
> It is not possible to allow testuser to only write in folder2?
>
>



-- 

Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309




Re: [ceph-users] Commercial support

2019-01-24 Thread Martin Verges
Hello Ketil,

we as croit offer commercial support for Ceph as well as our own Ceph based
unified storage solution.

--
Martin Verges
Managing director

Mobile: +49 174 9335695
E-Mail: martin.ver...@croit.io
Chat: https://t.me/MartinVerges

croit GmbH, Freseniusstr. 31h, 81247 Munich
CEO: Martin Verges - VAT-ID: DE310638492
Com. register: Amtsgericht Munich HRB 231263

Web: https://croit.io
YouTube: https://goo.gl/PGE1Bx


On Wed, Jan 23, 2019 at 11:29 PM, Ketil Froyn wrote:

> Hi,
>
> How is the commercial support for Ceph? More specifically, I was  recently
> pointed in the direction of the very interesting combination of CephFS,
> Samba and ctdb. Is anyone familiar with companies that provide commercial
> support for in-house solutions like this?
>
> Regards, Ketil


[ceph-users] cephfs kernel client hung after eviction

2019-01-24 Thread Tim Bishop
Hi,

I have a cephfs kernel client (Ubuntu 18.04 4.15.0-34-generic) that's
completely hung after the client was evicted by the MDS.

The client logged:

Jan 24 17:31:46 client kernel: [10733559.309496] libceph: FULL or reached pool 
quota
Jan 24 17:32:26 client kernel: [10733599.232213] libceph: mon0 n.n.n.n:6789 
session lost, hunting for new mon

And the MDS logged:

2019-01-24 17:36:38.859 7f3ac7844700  0 log_channel(cluster) log [WRN] : 
evicting unresponsive client client:cephfs-client (86527773), after 300.081 
seconds

Looking in mdsc shows:

% head /sys/kernel/debug/ceph/[id].client86527773/mdsc
20  mds0getattr  #103d042
21  mds0getattr  #103d042
22  mds0getattr  #103d042
23  mds0getattr  #103d042
24  mds0getattr  #103d042
25  mds0getattr  #103d042
26  mds0getattr  #103d042
27  mds0getattr  #103d042
28  mds0getattr  #103d042
29  mds0getattr  #103d042

But osdc hangs when I try to access it.

I've tried umount -f but it hangs too. umount -l hides the problem (df
no longer hangs), but any processes that were trying to access the mount
are still blocked.

I've also tried switching back and forth to standby MDSs in case that
unblocked something. There are no current OSD blacklist entries either.
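(For anyone wanting to check the same things, the commands are roughly -- the MDS
name is whatever your active daemon is called:)

ceph osd blacklist ls                  # evicted clients usually show up here
ceph tell mds.<name> client ls         # sessions the MDS still knows about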

It looks like rebooting is the only option, but that's somewhat of a
pain to do. There's lots of people using this machine :-(

Any ideas?

Tim.

-- 
Tim Bishop
http://www.bishnet.net/tim/
PGP Key: 0x6C226B37FDF38D55



Re: [ceph-users] migrate ceph-disk to ceph-volume fails with dmcrypt

2019-01-24 Thread mlausch




On 24.01.19 at 22:34, Alfredo Deza wrote:


I have both, plain and luks.
At the moment I played around with the plain dmcrypt OSDs and run into
this problem. I didn't test the luks crypted OSDs.


There is support in the JSON file to define the type of encryption with the key:

 encryption_type

If this is undefined it will default to 'plain'. So that tells me that
we may indeed have a problem here.


Ah yes, I set this in my JSON:
"encryption_type": "plain"

But as far as I can see, if this key is missing, plain is the default. So this
should be OK.




I'm not sure
what might be needed here, but I do recall having some trouble trying
to understand what ceph-disk was doing. That is
capture in this comment
https://github.com/ceph/ceph/blob/v12.2.10/src/ceph-volume/ceph_volume/devices/simple/activate.py#L150-L155
> Do you think that might be related?


The comment confuses me a bit.
As far as I read the code, the base64-encoded key is retrieved from the
monitor, decoded, and then passed via stdin to the cryptsetup command. At
first I thought this was all OK, but after some investigation of what
cryptsetup is doing, I think the following option should be passed to
cryptsetup as well:

--hash plain

ceph-disk uses a local keyfile which is *not* base64 encoded.
The way ceph-disk passes the keyfile to cryptsetup, cryptsetup will not
hash the key with some default algorithm.
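To make the difference concrete, roughly (a plain-mode example with placeholder
device and mapping names, not the exact ceph-volume invocation):

# ceph-disk style: key is read from a file and used as-is
cryptsetup --key-file /etc/ceph/dmcrypt-keys/foobar open --type plain /dev/sdX1 osd-data

# ceph-volume style: key arrives on stdin; adding --hash plain keeps cryptsetup
# from hashing it first (which is what made the resulting volume key differ)
cat /etc/ceph/dmcrypt-keys/foobar | cryptsetup --key-file - --hash plain \
    open --type plain /dev/sdX1 osd-data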







Re: [ceph-users] migrate ceph-disk to ceph-volume fails with dmcrypt

2019-01-24 Thread Alfredo Deza
On Thu, Jan 24, 2019 at 4:13 PM mlausch  wrote:
>
>
>
> Am 24.01.19 um 22:02 schrieb Alfredo Deza:
> >>
> >> Ok with a new empty journal the OSD will not start. I have now rescued
> >> the data with dd and the recrypt it with a other key and copied the
> >> data back. This worked so far
> >>
> >> Now I encoded the key with base64 and put it to the key-value store.
> >> Also created the neccessary authkeys. Creating the json File by hand
> >> was quiet easy.
> >>
> >> But now there is one problem.
> >> ceph-disk opens the crypt like
> >> cryptsetup --key-file /etc/ceph/dmcrypt-keys/foobar ...
> >> ceph-volume pipes the key via stdin like this
> >> cat foobar | cryptsetup --key-file - ...
> >>
> >> The big problem. if the key is given via stdin cryptsetup hashes this
> >> key per default with some hash. Only if I set --hash plain it works. I
> >> think this is a bug in ceph-volume.
> >>
> >> Can someone confirm this?
> >
> > Ah right, this is when it was supported to have keys in a file.
> >
> > What type of keys do you have: LUKS or plain?
>
> I have both, plain and luks.
> At the moment I played around with the plain dmcrypt OSDs and run into
> this problem. I didn't test the luks crypted OSDs.

There is support in the JSON file to define the type of encryption with the key:

encryption_type

If this is undefined it will default to 'plain'. So that tells me that
we may indeed have a problem here. I'm not sure
what might be needed here, but I do recall having some trouble trying
to understand what ceph-disk was doing. That is
capture in this comment
https://github.com/ceph/ceph/blob/v12.2.10/src/ceph-volume/ceph_volume/devices/simple/activate.py#L150-L155

Do you think that might be related?
>
>


Re: [ceph-users] cephfs kernel client instability

2019-01-24 Thread Ilya Dryomov
On Thu, Jan 24, 2019 at 6:21 PM Andras Pataki
 wrote:
>
> Hi Ilya,
>
> Thanks for the clarification - very helpful.
> I've lowered osd_map_messages_max to 10, and this resolves the issue
> about the kernel being unhappy about large messages when the OSDMap
> changes.  One comment here though: you mentioned that Luminous uses 40
> as the default, which is indeed the case.  The documentation for
> Luminous (and master), however, says that the default is 100.

Looks like that page hasn't been kept up to date.  I'll fix that
section.

>
> One other follow-up question on the kernel client about something I've
> been seeing while testing.  Does the kernel client clean up when the MDS
> asks due to cache pressure?  On a machine I ran something that touches a
> lot of files, so the kernel client accumulated over 4 million caps.
> Many hours after all the activity finished (i.e. many hours after
> anything accesses ceph on that node) the kernel client still holds
> millions of caps, and the MDS periodically complains about clients not
> responding to cache pressure.  How is this supposed to be handled?
> Obviously asking the kernel to drop caches via /proc/sys/vm/drop_caches
> does a very thorough cleanup, but something in the middle would be better.

The kernel client sitting on way too many caps for way too long is
a long standing issue.  Adding Zheng who has recently been doing some
work to facilitate cap releases and put a limit on the overall cap
count.

Thanks,

Ilya


Re: [ceph-users] migrate ceph-disk to ceph-volume fails with dmcrypt

2019-01-24 Thread mlausch




On 24.01.19 at 22:02, Alfredo Deza wrote:


Ok with a new empty journal the OSD will not start. I have now rescued
the data with dd and the recrypt it with a other key and copied the
data back. This worked so far

Now I encoded the key with base64 and put it to the key-value store.
Also created the neccessary authkeys. Creating the json File by hand
was quiet easy.

But now there is one problem.
ceph-disk opens the crypt like
cryptsetup --key-file /etc/ceph/dmcrypt-keys/foobar ...
ceph-volume pipes the key via stdin like this
cat foobar | cryptsetup --key-file - ...

The big problem. if the key is given via stdin cryptsetup hashes this
key per default with some hash. Only if I set --hash plain it works. I
think this is a bug in ceph-volume.

Can someone confirm this?


Ah right, this is when it was supported to have keys in a file.

What type of keys do you have: LUKS or plain?


I have both, plain and luks.
At the moment I played around with the plain dmcrypt OSDs and run into 
this problem. I didn't test the luks crypted OSDs.





Re: [ceph-users] cephfs kernel client instability

2019-01-24 Thread Ilya Dryomov
On Thu, Jan 24, 2019 at 8:16 PM Martin Palma  wrote:
>
> We are experiencing the same issues on clients with CephFS mounted
> using the kernel client and 4.x kernels.
>
> The problem  shows up when we add new OSDs, on reboots after
> installing patches and when changing the weight.
>
> Here the logs of a misbehaving client;
>
> [6242967.890611] libceph: mon4 10.8.55.203:6789 session established
> [6242968.010242] libceph: osd534 10.7.55.23:6814 io error
> [6242968.259616] libceph: mon1 10.7.55.202:6789 io error
> [6242968.259658] libceph: mon1 10.7.55.202:6789 session lost, hunting
> for new mon
> [6242968.359031] libceph: mon4 10.8.55.203:6789 session established
> [6242968.622692] libceph: osd534 10.7.55.23:6814 io error
> [6242968.692274] libceph: mon4 10.8.55.203:6789 io error
> [6242968.692337] libceph: mon4 10.8.55.203:6789 session lost, hunting
> for new mon
> [6242968.694216] libceph: mon0 10.7.55.201:6789 session established
> [6242969.099862] libceph: mon0 10.7.55.201:6789 io error
> [6242969.099888] libceph: mon0 10.7.55.201:6789 session lost, hunting
> for new mon
> [6242969.224565] libceph: osd534 10.7.55.23:6814 io error
>
> Additional to the MON io error we also got some OSD io errors.

This isn't surprising -- the kernel client can receive osdmaps from
both monitors and OSDs.

>
> Moreover when the error occurs several clients causes a
> "MDS_CLIENT_LATE_RELEASE" error on the MDS server.
>
> We are currently running on Luminous 12.2.10 and have around 580 OSDs
> and 5 monitor nodes. The cluster is running on CentOS 7.6.
>
> The ‘osd_map_message_max’ setting is set to the default value of 40.
> But we are still getting these errors.

My advise is the same: set it to 20 or even 10.  The problem is this
setting is in terms of the number of osdmaps instead of the size of the
resulting message.  I've filed

  http://tracker.ceph.com/issues/38040

Thanks,

Ilya


Re: [ceph-users] cephfs kernel client instability

2019-01-24 Thread Andras Pataki

Hi Ilya,

Thanks for the clarification - very helpful.
I've lowered osd_map_messages_max to 10, and this resolves the issue 
about the kernel being unhappy about large messages when the OSDMap 
changes.  One comment here though: you mentioned that Luminous uses 40 
as the default, which is indeed the case.  The documentation for 
Luminous (and master), however, says that the default is 100.


One other follow-up question on the kernel client about something I've 
been seeing while testing.  Does the kernel client clean up when the MDS 
asks due to cache pressure?  On a machine I ran something that touches a 
lot of files, so the kernel client accumulated over 4 million caps.  
Many hours after all the activity finished (i.e. many hours after 
anything accesses ceph on that node) the kernel client still holds 
millions of caps, and the MDS periodically complains about clients not 
responding to cache pressure.  How is this supposed to be handled?  
Obviously asking the kernel to drop caches via /proc/sys/vm/drop_caches 
does a very thorough cleanup, but something in the middle would be better.
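(For the record, the cap counts can be watched on both sides -- the debugfs path and
the MDS daemon name below will differ per system:)

# on the client: caps held by this kernel mount
cat /sys/kernel/debug/ceph/*/caps

# on the MDS host: per-session cap counts
ceph daemon mds.<name> session ls | grep num_caps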


Andras


On 1/16/19 1:45 PM, Ilya Dryomov wrote:

On Wed, Jan 16, 2019 at 7:12 PM Andras Pataki
 wrote:

Hi Ilya/Kjetil,

I've done some debugging and tcpdump-ing to see what the interaction
between the kernel client and the mon looks like.  Indeed -
CEPH_MSG_MAX_FRONT defined as 16Mb seems low for the default mon
messages for our cluster (with osd_mon_messages_max at 100).  We have
about 3500 osd's, and the kernel advertises itself as older than

This is too big, especially for a fairly large cluster such as yours.
The default was reduced to 40 in luminous.  Given about 3500 OSDs, you
might want to set it to 20 or even 10.


Luminous, so it gets full map updates.  The FRONT message size on the
wire I saw was over 24Mb.  I'll try setting osd_mon_messages_max to 30
and do some more testing, but from the debugging it definitely seems
like the issue.

Is the kernel driver really not up to date to be considered at least a
Luminous client by the mon (i.e. it has some feature really missing)?  I
looked at the bits, and the MON seems to want is bit 59 in ceph features
shared by FS_BTIME, FS_CHANGE_ATTR, MSG_ADDR2.  Can the kernel client be
used when setting require-min-compat to luminous (either with the 4.19.x
kernel or the Redhat/Centos 7.6 kernel)?  Some background here would be
helpful.

Yes, the kernel client is missing support for that feature bit, however
4.13+ and RHEL 7.5+ _can_ be used with require-min-compat-client set to
luminous.  See

http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-May/027002.html

Thanks,

 Ilya



Re: [ceph-users] migrate ceph-disk to ceph-volume fails with dmcrypt

2019-01-24 Thread Alfredo Deza
On Thu, Jan 24, 2019 at 3:17 PM Manuel Lausch  wrote:
>
>
>
> On Wed, 23 Jan 2019 16:32:08 +0100
> Manuel Lausch  wrote:
>
>
> > >
> > > The key api for encryption is *very* odd and a lot of its quirks are
> > > undocumented. For example, ceph-volume is stuck supporting naming
> > > files and keys 'lockbox'
> > > (for backwards compatibility) but there is no real lockbox anymore.
> > > Another quirk is that when storing the secret in the monitor, it is
> > > done using the following convention:
> > >
> > > dm-crypt/osd/{OSD FSID}/luks
> > >
> > > The 'luks' part there doesn't indicate anything about the type of
> > > encryption (!!) so regardless of the type of encryption (luks or
> > > plain) the key would still go there.
> > >
> > > If you manage to get the keys into the monitors you still wouldn't
> > > be able to scan OSDs to produce the JSON files, but you would be
> > > able to create the JSON file with the
> > > metadata that ceph-volume needs to run the OSD.
> >
> > I think it is not that problem to create the json files by myself.
> > Moving the Keys to the monitors and creating appropriate auth-keys
> > should be more or less easy as well.
> >
> > The problem I see is, that there are individual keys for the journal
> > and data partition while the new process useses only one key for both
> > partitions.
> >
> > maybe I can recreate the journal partition with the other key. But is
> > this possible? Are there important data ramaining on the journal after
> > clean stopping the OSD which I cannot throw away without trashing the
> > whole OSD?
> >
>
> Ok with a new empty journal the OSD will not start. I have now rescued
> the data with dd and the recrypt it with a other key and copied the
> data back. This worked so far
>
> Now I encoded the key with base64 and put it to the key-value store.
> Also created the neccessary authkeys. Creating the json File by hand
> was quiet easy.
>
> But now there is one problem.
> ceph-disk opens the crypt like
> cryptsetup --key-file /etc/ceph/dmcrypt-keys/foobar ...
> ceph-volume pipes the key via stdin like this
> cat foobar | cryptsetup --key-file - ...
>
> The big problem. if the key is given via stdin cryptsetup hashes this
> key per default with some hash. Only if I set --hash plain it works. I
> think this is a bug in ceph-volume.
>
> Can someone confirm this?

Ah right, this is when it was supported to have keys in a file.

What type of keys do you have: LUKS or plain?
>
> there is the related code I mean in ceph-volume
> https://github.com/ceph/ceph/blob/v12.2.10/src/ceph-volume/ceph_volume/util/encryption.py#L59
>
> Regards
> Manuel


Re: [ceph-users] ceph-users Digest, Vol 72, Issue 20

2019-01-24 Thread Charles Tassell
I tried setting noout and that did provide a bit better result.  
Basically I could stop the OSD on the inactive server and everything 
still worked (after a 2-3 second pause) but then when I rebooted the 
inactive server everything hung again until it came back online and 
resynced with the cluster.  This is what I saw in ceph -s:


    cluster eb2003cf-b16d-4551-adb7-892469447f89
 health HEALTH_WARN
    128 pgs degraded
    124 pgs stuck unclean
    128 pgs undersized
    recovery 805252/1610504 objects degraded (50.000%)
    mds cluster is degraded
    1/2 in osds are down
    noout flag(s) set
 monmap e1: 3 mons at 
{FILE1=10.1.1.201:6789/0,FILE2=10.1.1.202:6789/0,MON1=10.1.1.90:6789/0}

    election epoch 216, quorum 0,1,2 FILE1,FILE2,MON1
  fsmap e796: 1/1/1 up {0=FILE2=up:rejoin}
 osdmap e360: 2 osds: 1 up, 2 in; 128 remapped pgs
    flags noout,sortbitwise,require_jewel_osds
  pgmap v7056802: 128 pgs, 3 pools, 164 GB data, 786 kobjects
    349 GB used, 550 GB / 899 GB avail
    805252/1610504 objects degraded (50.000%)
 128 active+undersized+degraded
  client io 1379 B/s rd, 1 op/s rd, 0 op/s wr

These are the commands I ran and the results:
ceph osd set noout
systemctl stop ceph-mds@FILE2.service
# Everything still works on the clients...
systemctl stop ceph-osd@0.service # This was on FILE2 while FILE1 was the 
active fsmap

# Fails over quickly, can still read content on the clients..
# Rebooted FILE2
# File access on the clients locked up until FILE2 rejoined


This is on Ubuntu 16 with kernel 4.4.0-141, so I'm not sure if that 
qualifies for David's warning about old kernels...


Is there a command or a logfile I can look at that will better help to 
diagnose this issue?  Is three servers (with only 2 OSDs) enough to run 
a HA cluster on ceph, or does it just die when it doesn't have 3 active 
servers for a quorum? Would installing MDS and MON on a 4th box (but 
sticking with 2 OSDs) be what's required to resolve this?  I really 
don't want to do that, but if I have to I guess I can look into finding 
another box.



On 2019-01-21 5:01 p.m., ceph-users-requ...@lists.ceph.com wrote:

Message: 14
Date: Mon, 21 Jan 2019 10:05:15 +0100
From: Robert Sander
To:ceph-users@lists.ceph.com
Subject: Re: [ceph-users] How To Properly Failover a HA Setup
Message-ID:<587dac75-96bc-8719-ee62-38e71491c...@heinlein-support.de>
Content-Type: text/plain; charset="utf-8"

On 21.01.19 09:22, Charles Tassell wrote:


Hello Everyone,

I've got a 3 node Jewel cluster setup, and I think I'm missing
something. When I want to take one of my nodes down for maintenance
(kernel upgrades or the like) all of my clients (running the kernel
module for the cephfs filesystem) hang for a couple of minutes before
the redundant servers kick in.


Have you set the noout flag before doing cluster maintenance?

ceph osd set noout

and afterwards

ceph osd unset noout

Regards
-- Robert Sander Heinlein Support GmbH Schwedter Str. 8/9b, 10119 Berlin



Re: [ceph-users] Migrating to a dedicated cluster network

2019-01-24 Thread Paul Emmerich
Split networks is rarely worth it. One fast network is usually better.
And since you mentioned having only two interfaces: one bond is way
better than two independent interfaces.

IPv4/6 dual stack setups will be supported in Nautilus, you currently
have to use either IPv4 or IPv6.

Jumbo frames: often mentioned but usually not worth it.
(Yes, I know that this is somewhat controversial and increasing the MTU is
often a standard trick for performance tuning, but I have yet to see
a benchmark that actually shows a significant performance
improvement. Some quick tests show that I can save around 5-10% CPU
load on a system doing ~50 Gbit/s of IO traffic, which is almost
nothing given the total system load.)
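For completeness, the split itself (if you do want it) is just a ceph.conf change
plus daemon restarts, roughly like the sketch below -- the subnets are placeholders:

[global]
public network  = 192.0.2.0/24
cluster network = 198.51.100.0/24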



Paul

-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90

On Wed, Jan 23, 2019 at 11:41 AM Jan Kasprzak  wrote:
>
> Jakub Jaszewski wrote:
> : Hi Yenya,
> :
> : Can I ask how your cluster looks and  why you want to do the network
> : splitting?
>
> Jakub,
>
> we have deployed the Ceph cluster originally as a proof of concept for
> a private cloud. We run OpenNebula and Ceph on about 30 old servers
> with old HDDs (2 OSDs per host), all connected via 1 Gbit ethernet
> with 10Gbit backbone. Since then our private cloud got pretty popular
> among our users, so we are planning to upgrade it to a smaller amount
> of modern servers. The new servers have two 10GbE interfaces, so the primary
> reasoning behind it is "why not use them both when we already have them".
> Of course, interface teaming/bonding is another option.
>
> Currently I see the network being saturated only when doing a live
> migration of a VM between the physical hosts, and then during a Ceph
> cluster rebalance.
>
> So, I don't think moving to a dedicated cluster network is a necessity for us.
>
> Anyway, does anybody use the cluster network with larger MTU (jumbo frames)?
>
> : We used to set up 9-12 OSD nodes (12-16 HDDs each) clusters using 2x10Gb
> : for access and 2x10Gb for cluster network, however, I don't see the reasons
> : to not use just one network for next cluster setup.
>
>
> -Yenya
>
> : On Wed, 23 Jan 2019 at 10:40, Jan Kasprzak wrote:
> :
> : > Hello, Ceph users,
> : >
> : > is it possible to migrate already deployed Ceph cluster, which uses
> : > public network only, to a split public/dedicated networks? If so,
> : > can this be done without service disruption? I have now got a new
> : > hardware which makes this possible, but I am not sure how to do it.
> : >
> : > Another question is whether the cluster network can be done
> : > solely on top of IPv6 link-local addresses without any public address
> : > prefix.
> : >
> : > When deploying this cluster (Ceph Firefly, IIRC), I had problems
> : > with mixed IPv4/IPv6 addressing, and ended up with ms_bind_ipv6 = false
> : > in my Ceph conf.
> : >
> : > Thanks,
> : >
> : > -Yenya
> : >
> : > --
> : > | Jan "Yenya" Kasprzak 
> : > |
> : > | http://www.fi.muni.cz/~kas/ GPG: 4096R/A45477D5
> : > |
> : >  This is the world we live in: the way to deal with computers is to google
> : >  the symptoms, and hope that you don't have to watch a video. --P. Zaitcev
>
> --
> | Jan "Yenya" Kasprzak  |
> | http://www.fi.muni.cz/~kas/ GPG: 4096R/A45477D5 |
>  This is the world we live in: the way to deal with computers is to google
>  the symptoms, and hope that you don't have to watch a video. --P. Zaitcev


Re: [ceph-users] Configure libvirt to 'see' already created snapshots of a vm rbd image

2019-01-24 Thread ceph
Hmmm... if I am not wrong, this information has to be put into the config
files by you... there isn't a mechanism which extracts this via rbd snap ls ...

On 7 January 2019 at 13:16:36 CET, Marc Roos wrote:
>
>
>How do you configure libvirt so it sees the snapshots already created
>on 
>the rbd image it is using for the vm?
>
>I have already a vm running connected to the rbd pool via 
>protocol='rbd', and rbd snap ls is showing for snapshots.
>
>
>
>
>


Re: [ceph-users] Radosgw s3 subuser permissions

2019-01-24 Thread Matt Benjamin
Hi Marc,

I'm not actually certain whether the traditional ACLs permit any
solution for that, but I believe with bucket policy, you can achieve
precise control within and across tenants, for any set of desired
resources (buckets).

Matt

On Thu, Jan 24, 2019 at 3:18 PM Marc Roos  wrote:
>
>
> It is correct that it is NOT possible for s3 subusers to have different
> permissions on folders created by the parent account?
> Thus the --access=[ read | write | readwrite | full ] is for everything
> the parent has created, and it is not possible to change that for
> specific folders/buckets?
>
> radosgw-admin subuser create --uid='Company$archive' --subuser=testuser
> --key-type=s3
>
> Thus if archive created this bucket/folder structure.
> └── bucket
> ├── folder1
> ├── folder2
> └── folder3
> └── folder4
>
> It is not possible to allow testuser to only write in folder2?
>
>



-- 

Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309


[ceph-users] Performance issue due to tuned

2019-01-24 Thread Mohamad Gebai
Hi all,

I want to share a performance issue I just encountered on a test cluster
of mine, specifically related to tuned. I started by setting the
"throughput-performance" tuned profile on my OSD nodes and ran some
benchmarks. I then applied that same profile to my client node, which
intuitively sounds like a reasonable thing to do (I do want to tweak my
client to maximize throughput if that's possible). Long story short, I
found out that one of the tweaks made by the "throughput-performance"
profile is to increase

kernel.sched_wakeup_granularity_ns = 15000000

which reduces the maximum throughput I'm able to get from 1080 MB/s to
1060 MB/s (-2.8%). The default value for sched_wakeup_granularity_ns
depends on the distro, on my system the default is 7.5ms. More info
about the benchmark:

- The benchmark tool is 'rados bench'
- The cluster has about 10 nodes with older hardware
- The client node has only 4 CPUs, the OSD nodes have 16 CPUs and 5 OSDs
each
- The throughput difference is always reproducible
- This was a read workload so that there is less volatility in the results
- I had all the data in BlueStore's cache on the OSD nodes so that
accessing the HDDs wouldn't skew the results
- I was looking at the difference of throughput once the benchmark
reaches its permanent regime, during which the throughput is very stable
(not surprising for a sequential read workload served from memory)

I have a theory which explains the reason for this reduced throughput.
The sched_wakeup_granularity_ns setting sets the minimum time a process
runs on a CPU before it can get preempted, so it looks like there might
be too much of a delay for rados bench's threads to get scheduled on-cpu
(higher latency from the moment a thread is woken up and goes in the CPU
runqueue to the time it is scheduled in and starts running) which
effectively results in a lower throughput overall.
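(If you want to check this on your own client nodes, the knobs can be inspected and
temporarily reverted like so -- 7500000 ns is just my distro's default of 7.5 ms:)

tuned-adm active                                        # which profile is applied
sysctl kernel.sched_wakeup_granularity_ns               # current value
sysctl -w kernel.sched_wakeup_granularity_ns=7500000    # revert for a quick test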

We can measure that latency using 'perf sched timehist':

           time    cpu  task name                    wait time  sch delay  run time
                        [tid/pid]                       (msec)     (msec)    (msec)
--------------- ------  ---------------------------  ---------  ---------  --------
 3279952.180957 [0002]  msgr-worker-1[50098/50094]       0.154      0.021     0.135

it is shown in the 5th column (sch delay). If we look at the average of
'sch delay' for a lower throughput run, we get:

$> perf sched timehist -i perf.data.slow | egrep 'msgr|rados' | awk '{
total += $5; count++ } END { print total/count }'
0.0243015

And for a higher throughput run:

$> perf sched timehist -i perf.data.fast | egrep 'msgr|rados' | awk '{
total += $5; count++ } END { print total/count }'
0.00401659

There is on average about a 0.02 ms (20 µs) longer "wakeup-to-sched-in" delay with
the throughput-performance profile enabled on the client, due to the
sched_wakeup_granularity_ns setting. The fact that there are few CPUs on
that node doesn't help. If I set the number of concurrent IOs to 1, I
get the same throughput for both values of sched_wakeup_granularity,
because there is (almost) always an available CPU, which means that
rados bench's threads don't have to wait as long to get scheduled in and
start consuming data.

On the other hand, increasing sched_wakeup_granularity_ns on the OSD
nodes doesn't reduce the throughput because there are more CPUs than
there are OSDs, and the wakeup-to-sched delay is "diluted" by the
latency of reading/writing/moving data around.

I'm curious to know if this theory makes sense, and if other people have
encountered similar situations (with tuned or otherwise).

Mohamad



Re: [ceph-users] logging of cluster status (Jewel vs Luminous and later)

2019-01-24 Thread Neha Ojha
Hi Matthew,

Some of the logging was intentionally removed because it used to
clutter up the logs. However, we are bringing some of the useful
stuff back and have a tracker ticket
https://tracker.ceph.com/issues/37886 open for it.

Thanks,
Neha


On Thu, Jan 24, 2019 at 12:13 PM Stefan Kooman  wrote:
>
> Quoting Matthew Vernon (m...@sanger.ac.uk):
> > Hi,
> >
> > On our Jewel clusters, the mons keep a log of the cluster status e.g.
> >
> > 2019-01-24 14:00:00.028457 7f7a17bef700  0 log_channel(cluster) log [INF] :
> > HEALTH_OK
> > 2019-01-24 14:00:00.646719 7f7a46423700  0 log_channel(cluster) log [INF] :
> > pgmap v66631404: 173696 pgs: 10 active+clean+scrubbing+deep, 173686
> > active+clean; 2271 TB data, 6819 TB used, 9875 TB / 16695 TB avail; 1313
> > MB/s rd, 236 MB/s wr, 12921 op/s
> >
> > This is sometimes useful after a problem, to see when things started going
> > wrong (which can be helpful for incident response and analysis) and so on.
> > There doesn't appear to be any such logging in Luminous, either by mons or
> > mgrs. What am I missing?
>
> Our mons keep a log in /var/log/ceph/ceph.log (running luminous 12.2.8).
> Is that log present on your systems?
>
> Gr. Stefan
>
> --
> | BIT BV  http://www.bit.nl/Kamer van Koophandel 09090351
> | GPG: 0xD14839C6   +31 318 648 688 / i...@bit.nl


[ceph-users] Radosgw s3 subuser permissions

2019-01-24 Thread Marc Roos

Is it correct that it is NOT possible for s3 subusers to have different 
permissions on folders created by the parent account?
Thus the --access=[ read | write | readwrite | full ] applies to everything 
the parent has created, and it is not possible to change that for 
specific folders/buckets?

radosgw-admin subuser create --uid='Company$archive' --subuser=testuser 
--key-type=s3

Thus if archive created this bucket/folder structure. 
└── bucket
├── folder1
├── folder2
└── folder3
└── folder4

It is not possible to allow testuser to only write in folder2?




Re: [ceph-users] migrate ceph-disk to ceph-volume fails with dmcrypt

2019-01-24 Thread Manuel Lausch



On Wed, 23 Jan 2019 16:32:08 +0100
Manuel Lausch  wrote:


> > 
> > The key api for encryption is *very* odd and a lot of its quirks are
> > undocumented. For example, ceph-volume is stuck supporting naming
> > files and keys 'lockbox'
> > (for backwards compatibility) but there is no real lockbox anymore.
> > Another quirk is that when storing the secret in the monitor, it is
> > done using the following convention:
> > 
> > dm-crypt/osd/{OSD FSID}/luks
> > 
> > The 'luks' part there doesn't indicate anything about the type of
> > encryption (!!) so regardless of the type of encryption (luks or
> > plain) the key would still go there.
> > 
> > If you manage to get the keys into the monitors you still wouldn't
> > be able to scan OSDs to produce the JSON files, but you would be
> > able to create the JSON file with the
> > metadata that ceph-volume needs to run the OSD.  
> 
> I think it is not that problem to create the json files by myself.
> Moving the Keys to the monitors and creating appropriate auth-keys
> should be more or less easy as well.
> 
> The problem I see is, that there are individual keys for the journal
> and data partition while the new process useses only one key for both
> partitions. 
> 
> maybe I can recreate the journal partition with the other key. But is
> this possible? Are there important data ramaining on the journal after
> clean stopping the OSD which I cannot throw away without trashing the
> whole OSD?
> 

Ok, with a new empty journal the OSD will not start. I have now rescued
the data with dd, re-encrypted it with another key, and copied the
data back. This worked so far.

Now I encoded the key with base64 and put it into the key-value store.
I also created the necessary auth keys. Creating the JSON file by hand
was quite easy.

But now there is one problem.
ceph-disk opens the crypt like
cryptsetup --key-file /etc/ceph/dmcrypt-keys/foobar ...
ceph-volume pipes the key via stdin like this
cat foobar | cryptsetup --key-file - ...

The big problem: if the key is given via stdin, cryptsetup hashes this
key by default with some hash. Only if I set --hash plain does it work. I
think this is a bug in ceph-volume.

Can someone confirm this?

there is the related code I mean in ceph-volume
https://github.com/ceph/ceph/blob/v12.2.10/src/ceph-volume/ceph_volume/util/encryption.py#L59

Regards
Manuel


Re: [ceph-users] create osd failed due to cephx authentication

2019-01-24 Thread Marc Roos
 

ceph osd create

ceph osd rm osd.15

sudo -u ceph mkdir /var/lib/ceph/osd/ceph-15

ceph-disk prepare --bluestore --zap-disk /dev/sdc (bluestore)

blkid /dev/sdb1
echo "UUID=a300d511-8874-4655-b296-acf489d5cbc8 
/var/lib/ceph/osd/ceph-15 xfs defaults   0 0" >> /etc/fstab
mount /var/lib/ceph/osd/ceph-15
chown ceph.ceph -R /var/lib/ceph/osd/ceph-15


sudo -u ceph ceph-osd -i 15 --mkfs --mkkey --osd-uuid
sudo -u ceph ceph auth add osd.15 osd 'allow *' mon 'allow profile osd' 
mgr 'allow profile osd' -i /var/lib/ceph/osd/ceph-15/keyring 

ceph osd create

sudo -u ceph ceph osd crush add osd.15 0.4 host=c04

systemctl start ceph-osd@15
systemctl enable ceph-osd@15





-Original Message-
From: Zhenshi Zhou [mailto:deader...@gmail.com] 
Sent: 24 January 2019 10:32
To: ceph-users
Subject: [ceph-users] create osd failed due to cephx authentication

Hi,

I'm installing a new Ceph cluster manually. I get errors when I create 
an OSD:

# ceph-osd -i 0 --mkfs --mkkey
2019-01-24 17:07:44.045 7f45f497b1c0 -1 auth: unable to find a keyring 
on /var/lib/ceph/osd/ceph-0/keyring: (2) No such file or directory
2019-01-24 17:07:44.045 7f45f497b1c0 -1 monclient: ERROR: missing 
keyring, cannot use cephx for authentication

Some information is provided below; did I miss anything?

# cat /etc/ceph/ceph.conf:
[global]
...
[osd.0]
host = ceph-osd1
osd data = /var/lib/ceph/osd/ceph-0
bluestore block path = /dev/disk/by-partlabel/bluestore_block_0
bluestore block db path = /dev/disk/by-partlabel/bluestore_block_db_0
bluestore block wal path = /dev/disk/by-partlabel/bluestore_block_wal_0

# ls /var/lib/ceph/osd/ceph-0
type

# cat /var/lib/ceph/osd/ceph-0/type
bluestore

# ceph -s
  cluster:
id: 7712ab7e-3c38-44b3-96d3-4e1de9da0ff6
health: HEALTH_OK

  services:
mon: 3 daemons, quorum ceph-mon1,ceph-mon2,ceph-mon3
mgr: ceph-mon1(active), standbys: ceph-mon2, ceph-mon3
osd: 1 osds: 0 up, 0 in

  data:
pools:   0 pools, 0 pgs
objects: 0  objects, 0 B
usage:   0 B used, 0 B / 0 B avail
pgs:

# ceph --version
ceph version 13.2.4 (b10be4d44915a4d78a8e06aa31919e74927b142e) mimic 
(stable)



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] logging of cluster status (Jewel vs Luminous and later)

2019-01-24 Thread Stefan Kooman
Quoting Matthew Vernon (m...@sanger.ac.uk):
> Hi,
> 
> On our Jewel clusters, the mons keep a log of the cluster status e.g.
> 
> 2019-01-24 14:00:00.028457 7f7a17bef700  0 log_channel(cluster) log [INF] :
> HEALTH_OK
> 2019-01-24 14:00:00.646719 7f7a46423700  0 log_channel(cluster) log [INF] :
> pgmap v66631404: 173696 pgs: 10 active+clean+scrubbing+deep, 173686
> active+clean; 2271 TB data, 6819 TB used, 9875 TB / 16695 TB avail; 1313
> MB/s rd, 236 MB/s wr, 12921 op/s
> 
> This is sometimes useful after a problem, to see when things started going
> wrong (which can be helpful for incident response and analysis) and so on.
> There doesn't appear to be any such logging in Luminous, either by mons or
> mgrs. What am I missing?

Our mons keep a log in /var/log/ceph/ceph.log (running luminous 12.2.8).
Is that log present on your systems?
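
If it is not, the options to look at are (from memory, so please
double-check) mon_cluster_log_to_file, mon_cluster_log_file and
mon_cluster_log_file_level, e.g. in ceph.conf:

[mon]
mon cluster log to file = true
mon cluster log file = /var/log/ceph/$cluster.log
mon cluster log file level = info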

Gr. Stefan

-- 
| BIT BV  http://www.bit.nl/Kamer van Koophandel 09090351
| GPG: 0xD14839C6   +31 318 648 688 / i...@bit.nl
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] cephfs kernel client instability

2019-01-24 Thread Martin Palma
We are experiencing the same issues on clients with CephFS mounted
using the kernel client and 4.x kernels.

The problem shows up when we add new OSDs, on reboots after installing
patches, and when changing OSD weights.

Here are the logs of a misbehaving client:

[6242967.890611] libceph: mon4 10.8.55.203:6789 session established
[6242968.010242] libceph: osd534 10.7.55.23:6814 io error
[6242968.259616] libceph: mon1 10.7.55.202:6789 io error
[6242968.259658] libceph: mon1 10.7.55.202:6789 session lost, hunting
for new mon
[6242968.359031] libceph: mon4 10.8.55.203:6789 session established
[6242968.622692] libceph: osd534 10.7.55.23:6814 io error
[6242968.692274] libceph: mon4 10.8.55.203:6789 io error
[6242968.692337] libceph: mon4 10.8.55.203:6789 session lost, hunting
for new mon
[6242968.694216] libceph: mon0 10.7.55.201:6789 session established
[6242969.099862] libceph: mon0 10.7.55.201:6789 io error
[6242969.099888] libceph: mon0 10.7.55.201:6789 session lost, hunting
for new mon
[6242969.224565] libceph: osd534 10.7.55.23:6814 io error

Additional to the MON io error we also got some OSD io errors.

Moreover, when the error occurs, several clients cause an
"MDS_CLIENT_LATE_RELEASE" error on the MDS server.

We are currently running on Luminous 12.2.10 and have around 580 OSDs
and 5 monitor nodes. The cluster is running on CentOS 7.6.

The ‘osd_map_message_max’ setting is set to the default value of 40.
But we are still getting these errors.
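
If lowering it further is the way to go, I assume it could be tested at
runtime with something like this (placeholders, untested):

ceph tell mon.<id> injectargs '--osd_map_message_max 20'
ceph tell osd.\* injectargs '--osd_map_message_max 20'

and made persistent in ceph.conf under [global] with
"osd map message max = 20".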

Best,
Martin


On Wed, Jan 16, 2019 at 7:46 PM Ilya Dryomov  wrote:
>
> On Wed, Jan 16, 2019 at 7:12 PM Andras Pataki
>  wrote:
> >
> > Hi Ilya/Kjetil,
> >
> > I've done some debugging and tcpdump-ing to see what the interaction
> > between the kernel client and the mon looks like.  Indeed -
> > CEPH_MSG_MAX_FRONT defined as 16Mb seems low for the default mon
> > messages for our cluster (with osd_mon_messages_max at 100).  We have
> > about 3500 osd's, and the kernel advertises itself as older than
>
> This is too big, especially for a fairly large cluster such as yours.
> The default was reduced to 40 in luminous.  Given about 3500 OSDs, you
> might want to set it to 20 or even 10.
>
> > Luminous, so it gets full map updates.  The FRONT message size on the
> > wire I saw was over 24Mb.  I'll try setting osd_mon_messages_max to 30
> > and do some more testing, but from the debugging it definitely seems
> > like the issue.
> >
> > Is the kernel driver really not up to date to be considered at least a
> > Luminous client by the mon (i.e. it has some feature really missing)?  I
> > looked at the bits, and the MON seems to want is bit 59 in ceph features
> > shared by FS_BTIME, FS_CHANGE_ATTR, MSG_ADDR2.  Can the kernel client be
> > used when setting require-min-compat to luminous (either with the 4.19.x
> > kernel or the Redhat/Centos 7.6 kernel)?  Some background here would be
> > helpful.
>
> Yes, the kernel client is missing support for that feature bit, however
> 4.13+ and RHEL 7.5+ _can_ be used with require-min-compat-client set to
> luminous.  See
>
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-May/027002.html
>
> Thanks,
>
> Ilya
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] create osd failed due to cephx authentication

2019-01-24 Thread Zhenshi Zhou
Hi,

I'm installing a new Ceph cluster manually. I get errors when I create an OSD:

# ceph-osd -i 0 --mkfs --mkkey
2019-01-24 17:07:44.045 7f45f497b1c0 -1 auth: unable to find a keyring on
/var/lib/ceph/osd/ceph-0/keyring: (2) No such file or directory
2019-01-24 17:07:44.045 7f45f497b1c0 -1 monclient: ERROR: missing keyring,
cannot use cephx for authentication

Some information is provided below; did I miss anything?

# cat /etc/ceph/ceph.conf:
[global]
...
[osd.0]
host = ceph-osd1
osd data = /var/lib/ceph/osd/ceph-0
bluestore block path = /dev/disk/by-partlabel/bluestore_block_0
bluestore block db path = /dev/disk/by-partlabel/bluestore_block_db_0
bluestore block wal path = /dev/disk/by-partlabel/bluestore_block_wal_0

# ls /var/lib/ceph/osd/ceph-0
type

# cat /var/lib/ceph/osd/ceph-0/type
bluestore

# ceph -s
  cluster:
id: 7712ab7e-3c38-44b3-96d3-4e1de9da0ff6
health: HEALTH_OK

  services:
mon: 3 daemons, quorum ceph-mon1,ceph-mon2,ceph-mon3
mgr: ceph-mon1(active), standbys: ceph-mon2, ceph-mon3
osd: 1 osds: 0 up, 0 in

  data:
pools:   0 pools, 0 pgs
objects: 0  objects, 0 B
usage:   0 B used, 0 B / 0 B avail
pgs:

# ceph --version
ceph version 13.2.4 (b10be4d44915a4d78a8e06aa31919e74927b142e) mimic
(stable)
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Commercial support

2019-01-24 Thread Matthew Vernon

Hi,

On 23/01/2019 22:28, Ketil Froyn wrote:

How is the commercial support for Ceph? More specifically, I was  
recently pointed in the direction of the very interesting combination of 
CephFS, Samba and ctdb. Is anyone familiar with companies that provide 
commercial support for in-house solutions like this?


To add to the answers you've already had:

Ubuntu also offer Ceph & Swift support:
https://www.ubuntu.com/support/plans-and-pricing#storage

Croit offer their own managed Ceph product, but do also offer 
support/consulting for Ceph installs, I think:

https://croit.io/

There are some smaller consultancies, too, including 42on which is run 
by Wido den Hollander who you will have seen posting here:

https://www.42on.com/

Regards,

Matthew
disclaimer: I have no commercial relationship to any of the above


--
The Wellcome Sanger Institute is operated by Genome Research 
Limited, a charity registered in England with number 1021457 and a 
company registered in England with number 2742969, whose registered 
office is 215 Euston Road, London, NW1 2BE. 
___

ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] logging of cluster status (Jewel vs Luminous and later)

2019-01-24 Thread Matthew Vernon

Hi,

On our Jewel clusters, the mons keep a log of the cluster status e.g.

2019-01-24 14:00:00.028457 7f7a17bef700  0 log_channel(cluster) log 
[INF] : HEALTH_OK
2019-01-24 14:00:00.646719 7f7a46423700  0 log_channel(cluster) log 
[INF] : pgmap v66631404: 173696 pgs: 10 active+clean+scrubbing+deep, 
173686 active+clean; 2271 TB data, 6819 TB used, 9875 TB / 16695 TB 
avail; 1313 MB/s rd, 236 MB/s wr, 12921 op/s


This is sometimes useful after a problem, to see when things started 
going wrong (which can be helpful for incident response and analysis) 
and so on. There doesn't appear to be any such logging in Luminous, 
either by mons or mgrs. What am I missing?


Thanks,

Matthew


--
The Wellcome Sanger Institute is operated by Genome Research 
Limited, a charity registered in England with number 1021457 and a 
company registered in England with number 2742969, whose registered 
office is 215 Euston Road, London, NW1 2BE. 
___

ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com