Re: [ceph-users] Cannot attach volumes
Hi Karan, I have checked the cinder logs but could not find anything suspicious. Thanks Kumar

From: Karan Singh [mailto:karan.si...@csc.fi]
Sent: Tuesday, June 10, 2014 3:14 PM
To: Gnan Kumar, Yalla
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Cannot attach volumes

Hi Kumar

Clock skew is just a warning and should not be related to this problem. But it is pretty easy to fix this warning, either by setting up NTP on all Ceph cluster nodes or by adding "mon clock drift warn backoff = <seconds>" to ceph.conf (do not do the latter in production).

As for your second problem, check the cinder volume logs and scheduler logs; you will likely find something there. If not, try increasing cinder's debug level and look for clues.

- Karan Singh -

On 10 Jun 2014, at 09:53, yalla.gnan.ku...@accenture.com wrote:

Hi All,

I have a four-node Ceph cluster and a separate three-node OpenStack setup, with Ceph integrated into OpenStack. Whenever I try to create a volume for an OpenStack VM with Ceph as the storage backend, the creation process runs forever in the Horizon dashboard and never completes. Likewise, attaching a Ceph volume to a VM freezes and never completes. To investigate, I ran 'ceph -s' on the Ceph nodes. The cluster health is in the warning state: clock skew was detected on two of the nodes. Is this time-synchronization issue the reason behind the VM freezing while attaching volumes to VMs?

Thanks Kumar
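For reference, a minimal sketch of the two fixes Karan mentions. The option name comes from the thread; the 10-second value and the NTP package commands are illustrative assumptions:

    # Proper fix: keep clocks in sync on every monitor node
    sudo apt-get install ntp && sudo service ntp restart

    # Warning suppression only (not for production): in the [mon]
    # section of ceph.conf, or injected at runtime:
    #   mon clock drift warn backoff = 10
    ceph tell mon.* injectargs '--mon-clock-drift-warn-backoff 10'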
Re: [ceph-users] PG Selection Criteria for Deep-Scrub
Hi Greg, This tracker issue is relevant: http://tracker.ceph.com/issues/7288 Cheers, Dan

On 11 Jun 2014, at 00:30, Gregory Farnum g...@inktank.com wrote:

Hey Mike, has your manual scheduling resolved this? I think I saw another similar-sounding report, so a feature request to improve scrub scheduling would be welcome. :)
-Greg

On Tue, May 20, 2014 at 5:46 PM, Mike Dawson mike.daw...@cloudapt.com wrote:

I tend to set it whenever I don't want to be bothered by storage performance woes (nights I value sleep, etc). This cluster is bounded by relentless small writes (it has a couple dozen rbd volumes backing video surveillance DVRs). Some of the software we run is completely unaffected, whereas other software falls apart during periods of deep-scrubs. I theorize it has to do with the individual software's attitude about flushing to disk / buffering.
- Mike

On 5/20/2014 8:31 PM, Aaron Ten Clay wrote:

For what it's worth, version 0.79 has different headers, and the awk command needs $19 instead of $20. But here is the output I have on a small cluster that I recently rebuilt:

$ ceph pg dump all | grep active | awk '{ print $19}' | sort -k1 | uniq -c
dumped all in format plain
      1 2014-05-15
      2 2014-05-17
     19 2014-05-18
    193 2014-05-19
    105 2014-05-20

I have set noscrub and nodeep-scrub, as well as noout and nodown, off and on while I performed various maintenance, but that hasn't (apparently) impeded the regular schedule. With what frequency are you setting the nodeep-scrub flag?
-Aaron

On Tue, May 20, 2014 at 5:21 PM, Mike Dawson mike.daw...@cloudapt.com wrote:

Today I noticed that deep-scrub is consistently missing some of my Placement Groups, leaving me with the following distribution of PGs and the last day they were successfully deep-scrubbed.

# ceph pg dump all | grep active | awk '{ print $20}' | sort -k1 | uniq -c
      5 2013-11-06
    221 2013-11-20
      1 2014-02-17
     25 2014-02-19
     60 2014-02-20
      4 2014-03-06
      3 2014-04-03
      6 2014-04-04
      6 2014-04-05
     13 2014-04-06
      4 2014-04-08
      3 2014-04-10
      2 2014-04-11
     50 2014-04-12
     28 2014-04-13
     14 2014-04-14
      3 2014-04-15
     78 2014-04-16
     44 2014-04-17
      8 2014-04-18
      1 2014-04-20
     16 2014-05-02
     69 2014-05-04
    140 2014-05-05
    569 2014-05-06
   9231 2014-05-07
    103 2014-05-08
    514 2014-05-09
   1593 2014-05-10
    393 2014-05-16
   2563 2014-05-17
   1283 2014-05-18
   1640 2014-05-19
   1979 2014-05-20

I have been running the default osd deep scrub interval of once per week, but have disabled deep-scrub on several occasions in an attempt to avoid the associated degraded cluster performance I have written about before. To get the PGs longest in need of a deep-scrub started, I set the nodeep-scrub flag, and wrote a script to manually kick off deep-scrubs according to age (a sketch follows below). It is processing as expected.

Do you consider this a feature request or a bug? Perhaps the code that schedules PGs to deep-scrub could be improved to prioritize PGs that have needed a deep-scrub the longest.

Thanks, Mike Dawson
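A minimal sketch of the kind of age-based kick-off script Mike describes, under the assumptions that nodeep-scrub is set and that the last-deep-scrub timestamp is in column $20 (pre-0.79 headers, as Aaron notes; use $19 on 0.79). The batch size and pacing are illustrative:

    #!/bin/bash
    # Deep-scrub the 20 PGs whose last deep-scrub is oldest,
    # pacing the requests to limit the impact on client IO.
    ceph pg dump all 2>/dev/null | grep active | sort -k20 | head -n 20 | \
    awk '{print $1}' | while read pgid; do
        ceph pg deep-scrub "$pgid"
        sleep 60
    done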
Re: [ceph-users] How to avoid deep-scrubbing performance hit?
On 10 Jun 2014, at 11:59, Dan Van Der Ster daniel.vanders...@cern.ch wrote:

One idea I had was to check the behaviour under different disk IO schedulers, trying to exploit thread IO priorities with cfq. So I have a question for the developers about using ionice or ioprio_set to lower the IO priorities of the threads responsible for scrubbing:

- Are there dedicated threads always used for scrubbing only, and never for client IOs? If so, can an admin identify the thread IDs so he can ionice those?
- If, on the other hand, a disk/op thread switches between scrubbing and client IO responsibilities, could Ceph use ioprio_set to change the IO priorities on the fly?

I just submitted a feature request for this: http://tracker.ceph.com/issues/8580

Cheers, Dan
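For anyone who wants to experiment along these lines, a hedged sketch of the cfq side of Dan's idea. The thread-ID step is exactly the open question he raises, since Ceph does not currently label its scrub threads; <tid> is a placeholder:

    # ionice only has an effect under the cfq scheduler:
    echo cfq > /sys/block/sdb/queue/scheduler
    # If a scrub thread's TID could be identified, it could be
    # demoted to the idle IO class:
    ionice -c 3 -p <tid>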
[ceph-users] failed when activate the OSD
After I created 1 mon and prepared 2 osds, I checked and found that the fsids of the three were the same. But when I ran

    ceph-deploy osd activate node2:/var/local/osd0 node3:/var/local/osd1

the error output was as follows:

[node2][WARNIN] ceph-disk: Error: No cluster conf found in /etc/ceph with fsid 3e68a2b5-cbf3-4149-9462-b89e2a40236e

It is strange that the fsid in the output differs from that of the three nodes, and if I modify the fsid on the three nodes, another error happens:

[node2][WARNIN] 2014-06-11 01:39:17.738451 b63cfb40 0 librados: client.bootstrap-osd authentication error (1) Operation not permitted

What should I do?
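A few hedged checks that usually narrow this kind of mismatch down (paths are the defaults implied by the commands above; adjust for your layout):

    grep fsid /etc/ceph/ceph.conf        # fsid the node's conf expects
    ceph fsid                            # fsid the running cluster reports
    cat /var/local/osd0/ceph_fsid        # fsid stamped on the prepared OSD (on node2)

The second error, the client.bootstrap-osd authentication failure, typically means the bootstrap-osd keyring on the node no longer matches the (modified) cluster; re-running ceph-deploy gatherkeys and redistributing the bootstrap-osd keyring is the usual remedy.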
Re: [ceph-users] Is it still unsafe to map a RBD device on an OSD server?
On 06/11/2014 08:20 AM, Sebastien Han wrote:

Thanks for your answers.

I have had that for an apt-cache for more than a year now and never had an issue. Of course, your question is not about having a krbd device backing an OSD of the same cluster ;-)
[ceph-users] Monitor down
I have a four-node Ceph storage cluster. ceph -s shows one monitor as down. How do I start it, and on which server do I have to start it?

root@cephadmin:/home/oss# ceph -w
    cluster 9acd33d7-759b-45f4-b48f-a4682fd6c674
     health HEALTH_WARN 1 mons down, quorum 0,1 cephnode1,cephnode2
     monmap e3: 3 mons at {cephnode1=10.211.203.237:6789/0,cephnode2=10.211.203.238:6789/0,cephnode3=10.211.203.239:6789/0}, election epoch 844, quorum 0,1 cephnode1,cephnode2
     mdsmap e225: 1/1/1 up {0=cephnode1=up:active}
     osdmap e297: 3 osds: 3 up, 3 in
      pgmap v214969: 448 pgs, 5 pools, 9495 bytes data, 30 objects
            21881 MB used, 51663 MB / 77501 MB avail
                 448 active+clean

Thanks Kumar
Re: [ceph-users] Is CRUSH used on reading ?
On 06/11/2014 12:51 PM, Florent B wrote:

Hi, I would like to know if Ceph uses the CRUSH algorithm when a read operation occurs, for example to select the nearest OSD storing the requested object.

CRUSH is used when reading, since it's THE algorithm inside Ceph to determine data placement. CRUSH doesn't support reading from the nearest object: it will always read from the primary OSD for a PG, but you can influence the primary affinity.

Thank you :)

-- Wido den Hollander, 42on B.V.
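A hedged illustration of the primary-affinity knob Wido refers to. The OSD id and weight are placeholders, and on firefly the monitors must first allow it (mon osd allow primary affinity = true):

    # Make osd.3 less likely to be selected as primary (range 0.0-1.0):
    ceph osd primary-affinity osd.3 0.5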
Re: [ceph-users] Monitor down
On 06/11/2014 01:23 PM, yalla.gnan.ku...@accenture.com wrote:

I have a four-node Ceph storage cluster. ceph -s shows one monitor as down. How do I start it, and on which server?

It's cephnode3 which is down. Log in and do:

$ start ceph-mon-all

<snip quoted ceph -w output; see the original message above>

-- Wido den Hollander, 42on B.V.
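If the monitor still does not rejoin after that, a couple of hedged follow-up checks (the mon id comes from the monmap in the quoted output):

    start ceph-mon id=cephnode3              # start just the one monitor (Upstart)
    ceph mon stat                            # confirm it is back in quorum
    ceph daemon mon.cephnode3 mon_status     # run locally on cephnode3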
[ceph-users] Unable to remove mds
Hi All,

I have a four-node Ceph cluster. The metadata service is showing as degraded in health. How do I remove the MDS service from Ceph?

root@cephadmin:/home/oss# ceph -s
    cluster 9acd33d7-759b-45f4-b48f-a4682fd6c674
     health HEALTH_WARN mds cluster is degraded
     monmap e3: 3 mons at {cephnode1=10.211.203.237:6789/0,cephnode2=10.211.203.238:6789/0,cephnode3=10.211.203.239:6789/0}, election epoch 874, quorum 0,1,2 cephnode1,cephnode2,cephnode3
     mdsmap e227: 1/1/1 up {0=cephnode1=up:replay}
     osdmap e299: 3 osds: 3 up, 3 in
      pgmap v214988: 448 pgs, 5 pools, 9495 bytes data, 30 objects
            22693 MB used, 50851 MB / 77501 MB avail
                 448 active+clean

Thanks Kumar
Re: [ceph-users] Can we map OSDs from different hosts (servers) to a Pool in Ceph
Hi, we have a similar setup where we have SSDs and HDDs in the same hosts. Our very basic crushmap is configured as follows:

# ceph osd tree
# id    weight  type name                       up/down reweight
-6      3       root ssd
3       1               osd.3                   up      1
4       1               osd.4                   up      1
5       1               osd.5                   up      1
-5      3       root platters
0       1               osd.0                   up      1
1       1               osd.1                   up      1
2       1               osd.2                   up      1
-1      3       root default
-2      1               host chgva-srv-stor-001
0       1                       osd.0           up      1
3       1                       osd.3           up      1
-3      1               host chgva-srv-stor-002
1       1                       osd.1           up      1
4       1                       osd.4           up      1
-4      1               host chgva-srv-stor-003
2       1                       osd.2           up      1
5       1                       osd.5           up      1

We do not seem to have problems with this setup, but I'm not sure if it's good practice to have elements appearing multiple times in different branches. On the other hand, I see no way to follow the physical hierarchy of a datacenter for pools, since a pool can be spread among servers/racks/rooms... Can someone confirm this crushmap is any good for our configuration? Thanks in advance.

BR Davide

On Mon, Mar 3, 2014 at 12:48 PM, Wido den Hollander w...@42on.com wrote:

On 03/03/2014 12:45 PM, Vikrant Verma wrote:

Hi All, Is it possible to map OSDs from different hosts (servers) to a pool in a Ceph cluster? In the CRUSH map we can add a bucket mentioning the host details (hostname and its weight). Is it possible to configure a bucket which contains OSDs from different hosts?

I think it's possible. But you can always try it and afterwards run crushtool with tests:

$ crushtool -i mycrushmap --test --rule 0 --num-rep 3 --show-statistics

That will run some tests on your compiled crushmap.

If possible please let me know how to configure it.

Regards, Vikrant
Re: [ceph-users] I have PGs that I can't deep-scrub
Hi Craig,

It's hard to say what is going wrong with that level of logs. Can you reproduce with debug ms = 1 and debug osd = 20? There were a few things fixed in scrub between emperor and firefly. Are you planning on upgrading soon?

sage

On Tue, 10 Jun 2014, Craig Lewis wrote:

Every time I deep-scrub one PG, all of the OSDs responsible get kicked out of the cluster. I've deep-scrubbed this PG 4 times now, and it fails the same way every time. OSD logs are linked at the bottom. What can I do to get this deep-scrub to complete cleanly?

This is the first time I've deep-scrubbed these PGs since Sage helped me recover from some OSD problems (http://t53277.file-systems-ceph-development.file-systemstalk.info/70-osd-are-down-and-not-coming-up-t53277.html). I can trigger the issue easily in this cluster, but have not been able to re-create it in other clusters.

The PG stats for this PG say that last_deep_scrub and deep_scrub_stamp are 48009'1904117 and 2014-05-21 07:28:01.315996 respectively. This PG is owned by OSDs [11,0]. This is a secondary cluster, so I stopped all external I/O on it. I set nodeep-scrub, and restarted both OSDs with:

    debug osd = 5/5
    debug filestore = 5/5
    debug journal = 1
    debug monc = 20/20

then I ran a deep-scrub on this PG.

2014-06-10 10:47:50.881783 mon.0 [INF] pgmap v8832020: 2560 pgs: 2555 active+clean, 5 active+clean+scrubbing; 27701 GB data, 56218 GB used, 77870 GB / 130 TB avail
2014-06-10 10:47:54.039829 mon.0 [INF] pgmap v8832021: 2560 pgs: 2554 active+clean, 5 active+clean+scrubbing, 1 active+clean+scrubbing+deep; 27701 GB data, 56218 GB used, 77870 GB / 130 TB avail

At 10:49:09, I see ceph-osd for both 11 and 0 spike to 100% CPU (100.3% +/- 1.0%). Prior to this, they were both using ~30% CPU. It might have started a few seconds sooner; I'm watching top. I forgot to watch iostat until 10:56. At this point, both OSDs are reading. iostat reports that they're both doing ~100 transactions/sec, reading ~1 MiBps, 0 writes.

At 11:01:26, iostat reports that both OSDs are no longer consuming any disk I/O. They both go for 30 seconds with 0 transactions and 0 kiB read/write. There are small bumps of 2 transactions/sec for one second, then it's back to 0.

At 11:02:41, the primary OSD gets kicked out by the monitors:

2014-06-10 11:02:41.168443 mon.0 [INF] pgmap v8832125: 2560 pgs: 2555 active+clean, 4 active+clean+scrubbing, 1 active+clean+scrubbing+deep; 27701 GB data, 56218 GB used, 77870 GB / 130 TB avail; 1996 B/s rd, 2 op/s
2014-06-10 11:02:57.801047 mon.0 [INF] osd.11 marked down after no pg stats for 903.825187 seconds
2014-06-10 11:02:57.823115 mon.0 [INF] osdmap e58834: 36 osds: 35 up, 36 in

Both ceph-osd processes (11 and 0) continue to use 100% CPU (same range). At ~11:10, I see that osd.11 has resumed reading from disk at the original levels (~100 tps, ~1 MiBps read, 0 MiBps write). Since it's down, but doing something, I let it run. Both osd.11 and osd.0 repeat this pattern: reading for a while at ~1 MiBps, then nothing. The duty cycle seems about 50%, with a 20 minute period, but I haven't timed anything. CPU usage remains at 100%, regardless of whether IO is happening or not.
At 12:24:15, osd.11 rejoins the cluster:

2014-06-10 12:24:15.294646 mon.0 [INF] osd.11 10.193.0.7:6804/7100 boot
2014-06-10 12:24:15.294725 mon.0 [INF] osdmap e58838: 36 osds: 35 up, 36 in
2014-06-10 12:24:15.343869 mon.0 [INF] pgmap v8832827: 2560 pgs: 1 stale+active+clean+scrubbing+deep, 2266 active+clean, 5 stale+active+clean, 287 active+degraded, 1 active+clean+scrubbing; 27701 GB data, 56218 GB used, 77870 GB / 130 TB avail; 15650 B/s rd, 18 op/s; 3617854/61758142 objects degraded (5.858%)

osd.0's CPU usage drops back to normal when osd.11 rejoins the cluster. The PG stats have not changed: last_deep_scrub and deep_scrub_stamp are still 48009'1904117 and 2014-05-21 07:28:01.315996 respectively. This time, osd.0 did not get kicked out by the monitors. In previous attempts, osd.0 was kicked out 5-10 minutes after osd.11. When that happens, osd.0 rejoins the cluster after osd.11.

I have several more PGs exhibiting the same behavior: at least 3 that I know of, and many more that I haven't attempted to deep-scrub.

ceph -v: ceph version 0.72.2 (a913ded2ff138aefb8cb84d347d72164099cfd60)
ceph.conf: https://cd.centraldesktop.com/p/eAAADvxuAHJRUk4
ceph-osd.11.log (5.7 MiB): https://cd.centraldesktop.com/p/eAAADvxyABPwaeM
ceph-osd.0.log (6.3 MiB): https://cd.centraldesktop.com/p/eAAADvx0ADWEGng
ceph pg 40.11e query: https://cd.centraldesktop.com/p/eAAADvxvAAylTW0
(the pg query was collected at 13:24, after the above events)

Things that probably don't matter: the OSD partitions were created using ceph-disk-prepare --dmcrypt.
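Since gathering the higher debug levels Sage asks for otherwise means editing ceph.conf and restarting, a hedged shortcut using runtime injection (the OSD ids and the PG id are taken from Craig's report above):

    ceph tell osd.11 injectargs '--debug-ms 1 --debug-osd 20'
    ceph tell osd.0 injectargs '--debug-ms 1 --debug-osd 20'
    ceph pg deep-scrub 40.11e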
[ceph-users] ceph-deploy - problem creating an osd
Hi,

ceph-deploy-1.5.3 can make trouble if a reboot is done between preparation and activation of an OSD. The OSD disk was /dev/sdb at this time; the OSD itself should go to sdb1 and the journal to sdb2, formatted to btrfs. I prepared an OSD:

root@bd-a:/etc/ceph# ceph-deploy -v --overwrite-conf osd --fs-type btrfs prepare bd-1:/dev/sdb1:/dev/sdb2
[ceph_deploy.conf][DEBUG ] found configuration file at: /root/.cephdeploy.conf
[ceph_deploy.cli][INFO ] Invoked (1.5.3): /usr/bin/ceph-deploy -v --overwrite-conf osd --fs-type btrfs prepare bd-1:/dev/sdb1:/dev/sdb2
[ceph_deploy.osd][DEBUG ] Preparing cluster ceph disks bd-1:/dev/sdb1:/dev/sdb2
[bd-1][DEBUG ] connected to host: bd-1
[bd-1][DEBUG ] detect platform information from remote host
[bd-1][DEBUG ] detect machine type
[ceph_deploy.osd][INFO ] Distro info: Ubuntu 14.04 trusty
[ceph_deploy.osd][DEBUG ] Deploying osd to bd-1
[bd-1][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
[bd-1][INFO ] Running command: udevadm trigger --subsystem-match=block --action=add
[ceph_deploy.osd][DEBUG ] Preparing host bd-1 disk /dev/sdb1 journal /dev/sdb2 activate False
[bd-1][INFO ] Running command: ceph-disk-prepare --fs-type btrfs --cluster ceph -- /dev/sdb1 /dev/sdb2
[bd-1][DEBUG ]
[bd-1][DEBUG ] WARNING! - Btrfs v3.12 IS EXPERIMENTAL
[bd-1][DEBUG ] WARNING! - see http://btrfs.wiki.kernel.org before using
[bd-1][DEBUG ]
[bd-1][DEBUG ] fs created label (null) on /dev/sdb1
[bd-1][DEBUG ] nodesize 32768 leafsize 32768 sectorsize 4096 size 19.99TiB
[bd-1][DEBUG ] Btrfs v3.12
[bd-1][WARNIN] WARNING:ceph-disk:OSD will not be hot-swappable if journal is not the same device as the osd data
[bd-1][WARNIN] Turning ON incompat feature 'extref': increased hardlink limit per file to 65536
[bd-1][WARNIN] Error: Partition(s) 1 on /dev/sdb1 have been written, but we have been unable to inform the kernel of the change, probably because it/they are in use. As a result, the old partition(s) will remain in use. You should reboot now before making further changes.
[bd-1][INFO ] checking OSD status...
[bd-1][INFO ] Running command: ceph --cluster=ceph osd stat --format=json
[ceph_deploy.osd][DEBUG ] Host bd-1 is now ready for osd use.
Unhandled exception in thread started by sys.excepthook is missing lost sys.stderr

ceph-deploy told me to do a reboot, so I did. After the reboot the OSD disk changed from sdb to sda. This is a known problem of Linux (Ubuntu).

root@bd-a:/etc/ceph# ceph-deploy -v osd activate bd-1:/dev/sda1:/dev/sda2
[ceph_deploy.conf][DEBUG ] found configuration file at: /root/.cephdeploy.conf
[ceph_deploy.cli][INFO ] Invoked (1.5.3): /usr/bin/ceph-deploy -v osd activate bd-1:/dev/sda1:/dev/sda2
[ceph_deploy.osd][DEBUG ] Activating cluster ceph disks bd-1:/dev/sda1:/dev/sda2
[bd-1][DEBUG ] connected to host: bd-1
[bd-1][DEBUG ] detect platform information from remote host
[bd-1][DEBUG ] detect machine type
[ceph_deploy.osd][INFO ] Distro info: Ubuntu 14.04 trusty
[ceph_deploy.osd][DEBUG ] activating host bd-1 disk /dev/sda1
[ceph_deploy.osd][DEBUG ] will use init type: upstart
[bd-1][INFO ] Running command: ceph-disk-activate --mark-init upstart --mount /dev/sda1
[bd-1][WARNIN] got monmap epoch 1
[bd-1][WARNIN] HDIO_DRIVE_CMD(identify) failed: Invalid argument
[bd-1][WARNIN] 2014-06-10 11:45:07.222697 7f5c111af800 -1 journal check: ondisk fsid c8ce6ee2-f21b-4ba3-a20e-649224244b9a doesn't match expected fcaaf66f-b7b7-4702-83a4-54832b7131fa, invalid (someone else's?) journal
[bd-1][WARNIN] HDIO_DRIVE_CMD(identify) failed: Invalid argument
[bd-1][WARNIN] HDIO_DRIVE_CMD(identify) failed: Invalid argument
[bd-1][WARNIN] HDIO_DRIVE_CMD(identify) failed: Invalid argument
[bd-1][WARNIN] 2014-06-10 11:45:08.125384 7f5c111af800 -1 filestore(/var/lib/ceph/tmp/mnt.LryOxo) could not find 23c2fcde/osd_superblock/0//-1 in index: (2) No such file or directory
[bd-1][WARNIN] 2014-06-10 11:45:08.320327 7f5c111af800 -1 created object store /var/lib/ceph/tmp/mnt.LryOxo journal /var/lib/ceph/tmp/mnt.LryOxo/journal for osd.4 fsid 08066b4a-3f36-4e3f-bd1e-15c006a09057
[bd-1][WARNIN] 2014-06-10 11:45:08.320367 7f5c111af800 -1 auth: error reading file: /var/lib/ceph/tmp/mnt.LryOxo/keyring: can't open /var/lib/ceph/tmp/mnt.LryOxo/keyring: (2) No such file or directory
[bd-1][WARNIN] 2014-06-10 11:45:08.320419 7f5c111af800 -1 created new key in keyring /var/lib/ceph/tmp/mnt.LryOxo/keyring
[bd-1][WARNIN] added key for osd.4
[bd-1][INFO ] checking OSD status...
[bd-1][INFO ] Running command: ceph --cluster=ceph osd stat --format=json
[bd-1][WARNIN] there are 2 OSDs down
[bd-1][WARNIN] there are 2 OSDs out

root@bd-a:/etc/ceph# ceph -s
    cluster 08066b4a-3f36-4e3f-bd1e-15c006a09057
     health HEALTH_WARN 679 pgs degraded; 992 pgs stuck unclean; recovery 19/60 objects degraded (31.667%); clock skew detected on mon.bd-1
     monmap e1: 3 mons at
[ceph-users] pid_max value?
Hi, what is the recommended value for /proc/sys/kernel/pid_max? Is 32768 enough for a Ceph cluster with 4 nodes (40 1T OSDs on each node)? My Ceph node has already run into a "create thread fail" problem in the osd log whose root cause is pid_max.

Wei Cao (Buddy)
Re: [ceph-users] pid_max value?
Hello,

The values we use are as follows:

# sysctl -p
net.ipv4.ip_local_port_range = 1024 65535
net.core.netdev_max_backlog = 3
net.core.somaxconn = 16384
net.ipv4.tcp_max_syn_backlog = 252144
net.ipv4.tcp_max_tw_buckets = 36
net.ipv4.tcp_fin_timeout = 3
net.ipv4.tcp_max_orphans = 262144
net.ipv4.tcp_synack_retries = 2
net.ipv4.tcp_syn_retries = 2
net.core.rmem_max = 8388608
net.core.wmem_max = 8388608
net.core.rmem_default = 65536
net.core.wmem_default = 65536
net.ipv4.tcp_rmem = 4096 87380 8388608
net.ipv4.tcp_wmem = 4096 65536 8388608
net.ipv4.tcp_mem = 8388608 8388608 8388608
net.ipv4.route.flush = 1
kernel.pid_max = 4194303

The timeouts don't really make sense without tw reuse/recycling, but we found increasing the max and letting the old ones hang gives better performance. somaxconn was the most important value we had to increase: with 3 mons, 3 storage nodes, 3 VM hypervisors, 16 VMs and 48 OSDs we started running into major problems with servers dying left and right. Most of those values are lifted from some OpenStack python script IIRC; please let us know if you find a more efficient/stable configuration, however we're quite happy with this one.

Regards,
Maciej Bonin
Systems Engineer | M247 Limited

On 11 June 2014 15:00, Cao, Buddy wrote:

<snip quoted question; see the previous message>
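For just the pid_max piece of this, a minimal hedged sketch. The value is the one Maciej quotes; kernel.pid_max caps thread IDs as well as process IDs, which is what the OSD "create thread fail" errors exhaust:

    # Apply now:
    sysctl -w kernel.pid_max=4194303
    # Persist across reboots:
    echo 'kernel.pid_max = 4194303' >> /etc/sysctl.conf
    sysctl -p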
Re: [ceph-users] ceph-deploy - problem creating an osd
On Wed, Jun 11, 2014 at 9:29 AM, Markus Goldberg goldb...@uni-hildesheim.de wrote:

<snip quoted prepare log; see Markus's original message above>

This is actually not ceph-deploy asking you for a reboot but the stderr captured from the remote node (bd-1 in your case). ceph-deploy will log output from remote nodes and will preface the logs with the hostname when the output happens remotely. stderr will be used as WARNING level and stdout as DEBUG. So in your case this line is output from ceph-disk-prepare/btrfs:

[bd-1][WARNIN] Error: Partition(s) 1 on /dev/sdb1 have been written, but we have been unable to inform the kernel of the change, probably because it/they are in use. As a result, the old partition(s) will remain in use. You should reboot now before making further changes.

Have you tried 'create' instead of 'prepare' and 'activate'?
<snip remainder of quoted activate log; see Markus's original message above>
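One hedged way to sidestep the sda/sdb renaming entirely is to address the disk by a persistent path; ceph-disk generally resolves these like any other block device path. The by-id name below is a placeholder; list the directory on the node to find yours:

    ls -l /dev/disk/by-id/
    ceph-deploy osd create bd-1:/dev/disk/by-id/ata-EXAMPLE-DISK-part1:/dev/disk/by-id/ata-EXAMPLE-DISK-part2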
Re: [ceph-users] pid_max value?
Thanks, Bonin. Do you have 48 OSDs in total, or 48 OSDs on each storage node? Do you think kernel.pid_max = 4194303 is reasonable, given that it is a large increase over the default OS setting?

Wei Cao (Buddy)

-----Original Message-----
From: Maciej Bonin [mailto:maciej.bo...@m247.com]
Sent: Wednesday, June 11, 2014 10:07 PM

<snip quoted sysctl settings; see the previous message>
Re: [ceph-users] MDS crash dump ?
On Wednesday, June 11, 2014, Florent B flor...@coppint.com wrote:

Hi everyone,

Sometimes my MDS crashes... sometimes after a few hours, sometimes after a few days. I know I could enable debugging and so on to get more information. But if it crashes after a few days, it generates gigabytes of debugging data that are not related to the crash. Is it possible to get just a crash dump when the MDS is crashing, to see what's wrong?

You should be getting a backtrace regardless of what debugging levels are enabled, so I assume you mean having it dump out prior log lines when that happens. And indeed you can. Normally you specify something like

    debug mds = 10

and that dumps out the log. You can instead specify two values, separated by a slash, and the daemon will take the time to generate all the log lines at the second value but only dump to disk the first value:

    debug mds = 0/10

That will put nothing in the log, but will generate debug output at level 10 in a memory ring buffer, and dump it on a crash. You can do this with any debug setting.
-Greg

Thank you.
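As a concrete (hedged) illustration of Greg's suggestion, in ceph.conf on the MDS host:

    [mds]
        # quiet during normal operation, but keep level-10 MDS
        # messages in memory and dump them if the daemon crashes
        debug mds = 0/10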
[ceph-users] umount gets stuck when umounting a cloned rbd image
Hello,

I address you with this issue. I noticed it with Ceph 0.72.2 on Ubuntu 13.10 and with 0.80.1 on Ubuntu 14.04. Here is what I do:

1) I create, and format to ext4 or xfs, an rbd image of 4 TB. The image has --order 25 and --image-format 2.
2) I create a snapshot of that rbd image.
3) I protect that snapshot.
4) I create a clone image of that initial rbd image using the protected snapshot as reference.
5) I insert the line in /etc/ceph/rbdmap, map the new image, and mount it on my Ceph client server.

Until here all is fine, cool and dandy.

6) I umount /dev/rbd1, which is the previously mounted rbd clone image, and umount gets stuck.

On the client server, with the umount stuck, I have this message in /var/log/syslog:

Jun 11 12:26:10 tesla kernel: [63365.178657] libceph: osd8 20.10.10.105:6803 socket error on read

As it seems the problem is somehow related to osd8 on my 20.10.10.105 Ceph node, I went there to get more information. In /var/log/ceph-osd.8.log this message is coming in endlessly:

2014-06-11 12:31:51.692031 7fa26085c700 0 -- 20.10.10.105:6805/23321 20.10.10.12:0/2563935849 pipe(0x9dd6780 sd=231 :6805 s=0 pgs=0 cs=0 l=0 c=0x7ed6840).accept peer addr is really 20.10.10.12:0/2563935849 (socket is 20.10.10.12:33056/0)

Can anyone help me solve this issue?

--
Alphe Salas
I.T engineer
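For reproduction purposes, a hedged reconstruction of steps 1-4 as rbd commands. The pool and image names are placeholders; --size is in MB, so 4194304 = 4 TB:

    rbd create --image-format 2 --order 25 --size 4194304 rbd/base
    rbd snap create rbd/base@snap1
    rbd snap protect rbd/base@snap1
    rbd clone rbd/base@snap1 rbd/clone1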
Re: [ceph-users] Unable to remove mds
On Wed, Jun 11, 2014 at 4:56 AM, yalla.gnan.ku...@accenture.com wrote:

Hi All, I have a four node ceph cluster. The metadata service is showing as degraded in health. How to remove the mds service from ceph?

Unfortunately you can't remove it entirely right now, but if you create a new filesystem using the newfs command, and don't turn on an MDS daemon after that, it won't report a health error.
-Greg
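A hedged sketch of that workaround. Pool ids 1 and 0 are the default metadata and data pools; check yours with ceph osd dump | grep ^pool, and note this resets the filesystem's metadata:

    # Stop the ceph-mds daemon on cephnode1 first, then:
    ceph mds newfs 1 0 --yes-i-really-mean-it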
Re: [ceph-users] Can we map OSDs from different hosts (servers) to a Pool in Ceph
On Wed, Jun 11, 2014 at 5:18 AM, Davide Fanciola dfanci...@gmail.com wrote:

<snip quoted crushmap and ceph osd tree output; see the original message above>

If you accidentally use the default node anywhere, you'll get data scattered across both classes of device. If you try and use both the platters and ssd nodes within a single CRUSH rule, you might end up with copies of data on the same host (reducing your data resiliency). Otherwise this is just fine.
-Greg
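To make that concrete, a hedged sketch of rules that keep the two classes separate; the rule names and ruleset numbers are illustrative, and the map can be checked with crushtool as Wido showed earlier in the thread:

    rule ssd {
            ruleset 1
            type replicated
            min_size 1
            max_size 10
            step take ssd
            step chooseleaf firstn 0 type osd
            step emit
    }
    rule platters {
            ruleset 2
            type replicated
            min_size 1
            max_size 10
            step take platters
            step chooseleaf firstn 0 type osd
            step emit
    }

Because the ssd and platters roots contain no host level, chooseleaf ... type osd is the only workable choice here, which is exactly why Greg warns that replicas can land on the same host.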
Re: [ceph-users] pid_max value?
We have not experienced any downsides to this approach performance- or stability-wise. If you prefer, you can experiment with the values, but I see no real advantage in doing so.

Regards,
Maciej Bonin
Systems Engineer | M247 Limited

-----Original Message-----
From: Cao, Buddy [mailto:buddy@intel.com]
Sent: 11 June 2014 17:00

<snip quoted messages; see earlier in the thread>
[ceph-users] Moving Ceph cluster to different network segment
We need to move our Ceph cluster to a different network segment for interconnectivity between mon and osd; does anybody have a procedure for how that can be done? Note that the hostname references will change: a host originally referenced as cephnode1 will be cephnode1-n in the new segment.

Thanks, Fred
[ceph-users] tiering : hit_set_count hit_set_period memory usage ?
Hi, I'm reading the tiering doc here: http://ceph.com/docs/firefly/dev/cache-pool/

"The hit_set_count and hit_set_period define how much time each HitSet should cover, and how many such HitSets to store. Binning accesses over time allows Ceph to independently determine whether an object was accessed at least once and whether it was accessed more than once over some time period ('age' vs 'temperature'). Note that the longer the period and the higher the count the more RAM will be consumed by the ceph-osd process. In particular, when the agent is active to flush or evict cache objects, all hit_set_count HitSets are loaded into RAM."

About how much memory are we talking here? Is there a formula? (nr objects x ?) I'm looking at a hit_set_period like 12h or 24h.
Re: [ceph-users] tiering : hit_set_count hit_set_period memory usage ?
On Wed, Jun 11, 2014 at 12:44 PM, Alexandre DERUMIER aderum...@odiso.com wrote:

<snip quoted question; see the previous message>

We haven't really quantified that yet. In particular, it's going to depend on how many objects are accessed within a period; the OSD sizes them based on the previous access count and the false positive probability that you give it.
-Greg
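For reference, a hedged example of the knobs under discussion, tuned toward the 12h period Alexandre mentions. The pool name, count, and false-positive probability are placeholders; the period is in seconds:

    ceph osd pool set hot-pool hit_set_type bloom
    ceph osd pool set hot-pool hit_set_period 43200    # 12 hours
    ceph osd pool set hot-pool hit_set_count 4
    ceph osd pool set hot-pool hit_set_fpp 0.05        # bloom filter false-positive probability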
[ceph-users] HEALTH_WARN pool has too few pgs
Hi, I am seeing the following warning on one of my test clusters:

# ceph health detail
HEALTH_WARN pool Ray has too few pgs
pool Ray objects per pg (24) is more than 12 times cluster average (2)

This is a reported issue and is set to Won't Fix at: http://tracker.ceph.com/issues/8103

My test cluster has a mix of test data, and the pool showing the warning is used for RBD images.

# ceph df detail
GLOBAL:
    SIZE     AVAIL    RAW USED    %RAW USED    OBJECTS
    1009G    513G     496G        49.14        33396
POOLS:
    NAME                  ID    CATEGORY    USED      %USED    OBJECTS    DIRTY    READ      WRITE
    data                  0     -           0         0        0          0        0         0
    metadata              1     -           0         0        0          0        0         0
    rbd                   2     -           0         0        0          0        0         0
    iscsi                 3     -           847M      0.08     241        211      11839k    10655k
    cinder                4     -           305M      0.03     53         2        51579     31584
    glance                5     -           65653M    6.35     8222       7        512k      10405
    .users.swift          7     -           0         0        0          0        0         4
    .rgw.root             8     -           1045      0        4          4        23        5
    .rgw.control          9     -           0         0        8          8        0         0
    .rgw                  10    -           252       0        2          2        3         11
    .rgw.gc               11    -           0         0        32         32       4958      3328
    .users.uid            12    -           575       0        3          3        70        23
    .users                13    -           9         0        1          1        0         9
    .users.email          14    -           0         0        0          0        0         0
    .rgw.buckets          15    -           0         0        0          0        0         0
    .rgw.buckets.index    16    -           0         0        1          1        1         1
    Ray                   17    -           99290M    9.61     24829      24829    0         0

It would be nice if we could turn off this message.

Eric
[ceph-users] For ceph firefly, which version kernel client should be used?
Dear Sir,

In our test, we use Ceph firefly to build a cluster. On a node with kernel 3.10.xx, if we use the kernel client to mount CephFS, the 'ls' command sometimes does not list all the files. With ceph-fuse 0.80.x, so far it seems to work well. I guess that kernel 3.10.xx is too old, so the kernel client does not work well. If that is right, which version of the kernel should we use?

Thanks, Baogang
[ceph-users] Ceph pgs stuck inactive since forever
I installed Ceph and then ran ceph health; it gives me the following output:

HEALTH_WARN 384 pgs incomplete; 384 pgs stuck inactive; 384 pgs stuck unclean; 2 near full osd(s)

This is the output for a single pg when I use ceph health detail:

pg 2.2 is incomplete, acting [0] (reducing pool rbd min_size from 2 may help; search ceph.com/docs for 'incomplete')

and a similar line comes up for all the pgs. This is the output of ceph -s:

    cluster 89cbb30c-023b-4f8b-ac14-abc78fb6b07a
     health HEALTH_WARN 384 pgs incomplete; 384 pgs stuck inactive; 384 pgs stuck unclean; 2 near full osd(s)
     monmap e1: 1 mons at {a=100.112.12.28:6789/0}, election epoch 2, quorum 0 a
     osdmap e5: 2 osds: 2 up, 2 in
      pgmap v64: 384 pgs, 3 pools, 0 bytes data, 0 objects
            111 GB used, 8346 MB / 125 GB avail
                 384 incomplete
Re: [ceph-users] Problem installing ceph from package manager / ceph repositories
On 06/09/2014 03:08 PM, Karan Singh wrote:

1. When installing Ceph using the package manager and the Ceph repositories, the package manager, i.e. yum, does not respect the ceph.repo file and takes the Ceph packages directly from EPEL.

Option 1: install yum-plugin-priorities, add priority = X to ceph.repo. X should be less than EPEL's priority; the default is, I believe, 99.
Option 2: add exclude = ceph_package(s) to epel.repo.

--
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
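A hedged sketch of option 1; the priority value just needs to be lower (numerically smaller) than EPEL's, and 1 is shown for emphasis:

    yum install yum-plugin-priorities
    # then in /etc/yum.repos.d/ceph.repo, under each [section]:
    #   priority=1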
Re: [ceph-users] HEALTH_WARN pool has too few pgs
Hi Eric,

Increase the number of PGs in your pool:

Step 1: ceph osd pool set poolname pg_num newvalue
Step 2: ceph osd pool set poolname pgp_num newvalue

You can check the number of PGs in your pool with ceph osd dump | grep ^pool

See documentation: http://ceph.com/docs/master/rados/operations/pools/

JC

On Jun 11, 2014, at 12:59, Eric Eastman eri...@aol.com wrote:

<snip quoted message and ceph df output; see the previous message>
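A hedged worked example using the numbers from Eric's cluster: the Ray pool holds 24829 objects against a cluster average of about 2 objects per PG, so raising its PG count pulls its objects-per-PG ratio back toward the average. The value 512 is illustrative; pick a power of two sized to the pool's share of data:

    ceph osd dump | grep ^pool            # note Ray's current pg_num
    ceph osd pool set Ray pg_num 512
    ceph osd pool set Ray pgp_num 512     # after PG creation completes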
Re: [ceph-users] Problem installing ceph from package manager / ceph repositories
Hi Dimitri,

It was already resolved; the moderator took a long time to approve my email getting posted to the mailing list. Thanks for your solution.

- Karan -

On 12 Jun 2014, at 00:02, Dimitri Maziuk dmaz...@bmrb.wisc.edu wrote:

<snip quoted message; see the previous message>
Re: [ceph-users] I have PGs that I can't deep-scrub
New logs, with debug ms = 1, debug osd = 20. In this timeline, I started the deep-scrub at 11:04:00; Ceph started deep-scrubbing at 11:04:03. osd.11 started consuming 100% CPU around 11:07; same for osd.0. CPU usage is all user; iowait is 0.10%. There is more variance in the CPU usage now, ranging between 98.5% and 101.2%. This time, I didn't see any major IO, read or write.

osd.11 was marked down at 11:22:00:
2014-06-11 11:22:00.820118 mon.0 [INF] osd.11 marked down after no pg stats for 902.656777 seconds

osd.0 was marked down at 11:36:00:
2014-06-11 11:36:00.890869 mon.0 [INF] osd.0 marked down after no pg stats for 902.498894 seconds

ceph.conf: https://cd.centraldesktop.com/p/eAAADwbcABIDZuE
ceph-osd.0.log.gz (140 MiB, 18 MiB compressed): https://cd.centraldesktop.com/p/eAAADwbdAHnmhFQ
ceph-osd.11.log.gz (131 MiB, 17 MiB compressed): https://cd.centraldesktop.com/p/eAAADwbeAEUR9AI
ceph pg 40.11e query: https://cd.centraldesktop.com/p/eAAADwbfAEJcwvc

On Wed, Jun 11, 2014 at 5:42 AM, Sage Weil s...@inktank.com wrote:

Hi Craig, It's hard to say what is going wrong with that level of logs. Can you reproduce with debug ms = 1 and debug osd = 20? There were a few things fixed in scrub between emperor and firefly. Are you planning on upgrading soon?
sage

On Tue, 10 Jun 2014, Craig Lewis wrote:

<snip requoted report; see the earlier message in this thread>
Re: [ceph-users] Ceph pgs stuck inactive since forever
I'll update the docs to incorporate the term incomplete. I believe this is due to an inability to complete backfilling: your cluster is nearly full. You indicated that you installed Ceph. Did you store data in the cluster? Your usage indicates that you have used 111 GB of 125 GB, so you only have about 8 GB left. Did it ever get to an active+clean state?

On Wed, Jun 11, 2014 at 6:08 AM, akhil.labudubar...@ril.com wrote:
I installed ceph, and when I run ceph health it gives me the following output:
HEALTH_WARN 384 pgs incomplete; 384 pgs stuck inactive; 384 pgs stuck unclean; 2 near full osd(s)
This is the output for a single pg when I use ceph health detail:
pg 2.2 is incomplete, acting [0] (reducing pool rbd min_size from 2 may help; search ceph.com/docs for 'incomplete')
and a similar line comes up for all the pgs. This is the output of ceph -s:
cluster 89cbb30c-023b-4f8b-ac14-abc78fb6b07a
 health HEALTH_WARN 384 pgs incomplete; 384 pgs stuck inactive; 384 pgs stuck unclean; 2 near full osd(s)
 monmap e1: 1 mons at {a=100.112.12.28:6789/0}, election epoch 2, quorum 0 a
 osdmap e5: 2 osds: 2 up, 2 in
 pgmap v64: 384 pgs, 3 pools, 0 bytes data, 0 objects
111 GB used, 8346 MB / 125 GB avail
384 incomplete

--
John Wilkins
Senior Technical Writer
Inktank
john.wilk...@inktank.com
(415) 425-9599
http://inktank.com
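For reference, the hint in that health detail output translates into commands like the following. This is only a sketch of the diagnostic steps, using the pool name (rbd) and PG id (2.2) from the output above; whether min_size is actually the culprit here, rather than the near-full OSDs, is not certain:

$ ceph health detail                  # lists each stuck/incomplete PG with a hint
$ ceph pg dump_stuck inactive         # show the PGs that never went active
$ ceph pg 2.2 query                   # peering details for one incomplete PG
$ ceph osd pool set rbd min_size 1    # the hint from health detail; revert once healthy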
Re: [ceph-users] Swift API Authentication Failure
(resending also to list) Right. So basically the swift subuser wasn't created correctly. I created issue #8587. Can you try creating a second subuser, and see if it's created correctly the second time?

On Wed, Jun 11, 2014 at 2:03 PM, David Curtiss dcurtiss_c...@dcurtiss.com wrote:
Hmm. Using that method, the subuser object appears to be an empty string. First, note that I skipped the Create Pools step (http://ceph.com/docs/master/radosgw/config/#create-pools) because it says "If the user you created has permissions, the gateway will create the pools automatically." And indeed, the .users.swift pool is there:
$ rados lspools
data
metadata
rbd
.rgw.root
.rgw.control
.rgw
.rgw.gc
.users.uid
.users.email
.users
.users.swift
But the only entry in that pool is an empty string:
$ rados ls -p .users.swift
<blank line>
And that is indeed a blank line (as opposed to 0 lines), because there is 1 object in that pool:
$ rados df
pool name     category  KB  objects  clones  degraded  unfound  rd  rd KB  wr  wr KB
...
.users.swift  -         1   1        0       0         0        0   0      1   1
For comparison, the 'df' line for the .users pool lists 2 objects, which are as follows:
$ rados ls -p .users
4U5H60BMDL7OSI5ZBL8P
F7HZCI4SL12KVVSJ9UVZ
- David

On Tue, Jun 10, 2014 at 11:49 PM, Yehuda Sadeh yeh...@inktank.com wrote:
Can you verify that the subuser object actually exists? Try doing:
$ rados ls -p .users.swift
(unless you have non-default pools set)
Yehuda

On Tue, Jun 10, 2014 at 6:44 PM, David Curtiss dcurtiss_c...@dcurtiss.com wrote:
No good. In fact, for some reason when I tried to load up my cluster VMs today, I couldn't get them to work (something to do with a pipe fault), so I recreated my VMs nearly from scratch, to no avail. Here are the commands I used to create the user and subuser:
radosgw-admin user create --uid=hive_cache --display-name="Hive Cache" --email=pds.supp...@ni.com
radosgw-admin subuser create --uid=hive_cache --subuser=hive_cache:swift --access=full
radosgw-admin key create --subuser=hive_cache:swift --key-type=swift --secret=QFAMEDSJP5DEKJO0DDXY
- David

On Mon, Jun 9, 2014 at 11:14 PM, Yehuda Sadeh yeh...@inktank.com wrote:
It seems that the subuser object was not created for some reason. Can you try recreating it?
Yehuda

On Sun, Jun 8, 2014 at 5:50 PM, David Curtiss dcurtiss_c...@dcurtiss.com wrote:
Here's the log: http://pastebin.com/bRt9kw9C
Thanks, David

On Fri, Jun 6, 2014 at 10:58 PM, Yehuda Sadeh yeh...@inktank.com wrote:
On Wed, Jun 4, 2014 at 12:00 PM, David Curtiss dcurtiss_c...@dcurtiss.com wrote:
Over the last two days, I set up Ceph on a set of Ubuntu 12.04 VMs (my first time working with Ceph), and it seems to be working fine (I have HEALTH_OK, and can create a test document via the rados command-line tool), but I can't authenticate with the Swift API. I followed the quickstart guides to get ceph and radosgw installed. (Listed here, if you want to check my work: http://pastebin.com/nfPWCn9P ) Visiting the root of the web server shows the ListAllMyBucketsResult XML, as expected, but trying to authenticate always gives me 403 Forbidden errors. Here's the output of radosgw-admin user info --uid=hive_cache: http://pastebin.com/vwwbyd4c And here's my curl invocation: http://pastebin.com/EfQ8nw8a Any ideas on what might be wrong?
Not sure. Can you try reproducing it with 'debug rgw = 20' and 'debug ms = 1' on rgw and provide the log?
Thanks, Yehuda
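A sketch of Yehuda's suggestion, assuming the same uid as above; the subuser name swift2 and the secret are placeholders, not values from this thread:

$ radosgw-admin subuser create --uid=hive_cache --subuser=hive_cache:swift2 --access=full
$ radosgw-admin key create --subuser=hive_cache:swift2 --key-type=swift --secret=EXAMPLESECRETKEY
$ rados ls -p .users.swift    # a correctly created subuser should show up as 'hive_cache:swift2'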
[ceph-users] Current OS kernel recommendations
This http://ceph.com/docs/master/start/os-recommendations/ appears to be a bit out of date (it only goes to Ceph 0.72). Presumably Ubuntu Trusty should now be on that list in some form, e.g., for Firefly?
-- Cheers, ~Blairo
Re: [ceph-users] For ceph firefly, which version kernel client should be used?
On Mon, Jun 9, 2014 at 3:49 PM, Liu Baogang liubaog...@gmail.com wrote:
Dear Sir, In our test, we use Ceph Firefly to build a cluster. On a node with kernel 3.10.xx, if we use the kernel client to mount CephFS, the 'ls' command sometimes does not list all of the files. Using ceph-fuse 0.80.x, so far it seems to work well. I guess that kernel 3.10.xx is too old, so the kernel client does not work well. If that is right, which kernel version should we use?

3.14

Thanks, Baogang
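For anyone comparing the two clients mentioned above, the mounts look roughly like this. The monitor address, secret file path, and mount point are placeholders for illustration:

$ uname -r    # check the running kernel before using the kernel client
# kernel client:
$ sudo mount -t ceph 192.168.0.1:6789:/ /mnt/cephfs -o name=admin,secretfile=/etc/ceph/admin.secret
# FUSE client (avoids kernel-version concerns, at some performance cost):
$ sudo ceph-fuse -m 192.168.0.1:6789 /mnt/cephfs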
Re: [ceph-users] PG Selection Criteria for Deep-Scrub
The code checks the PG with the oldest scrub_stamp/deep_scrub_stamp to see whether the osd_scrub_min_interval/osd_deep_scrub_interval time has elapsed, so the output you are showing, with those very old scrub stamps, shouldn't happen under default settings. As soon as deep-scrub is re-enabled, the 5 PGs with the oldest stamp should be the first to get run.

A PG needs to be active+clean to be scrubbed. If any weren't active+clean, then even a manual scrub would do nothing.

Now that I'm looking at the code, I see that your symptom is possible if the values of osd_scrub_min_interval or osd_scrub_max_interval are larger than your osd_deep_scrub_interval. Should osd_scrub_min_interval be greater than osd_deep_scrub_interval, there won't be a deep scrub until osd_scrub_min_interval has elapsed. If an OSD is under load and osd_scrub_max_interval is greater than osd_deep_scrub_interval, there won't be a deep scrub until osd_scrub_max_interval has elapsed. Please check the 3 interval config values, and verify that your PGs are active+clean just to be sure.
David

On May 20, 2014, at 5:21 PM, Mike Dawson mike.daw...@cloudapt.com wrote:
Today I noticed that deep-scrub is consistently missing some of my Placement Groups, leaving me with the following distribution of PGs and the last day they were successfully deep-scrubbed. ...
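A quick way to check the three interval values David mentions is the OSD admin socket; a sketch, assuming the default socket path and osd.0 as an example id:

$ ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok config show | grep -E 'scrub.*interval'
# expect to see osd_scrub_min_interval, osd_scrub_max_interval, and
# osd_deep_scrub_interval; the deep interval should not be smaller than the others
$ ceph pg dump_stuck unclean    # confirm the PGs in question are active+clean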
Re: [ceph-users] Swift API Authentication Failure
Success! You nailed it. Thanks, Yehuda. I can successfully use the second subuser. Given this success, I also tried the following:
$ rados -p .users.swift get '' tmp
$ rados -p .users.swift put hive_cache:swift tmp
$ rados -p .users.swift rm ''
$ rados -p .users.swift ls
hive_cache:swift2
hive_cache:swift
So everything looked good, as far as I can tell, but I still can't authenticate with the first subuser. (But at least the second one still works.)
- David

On Wed, Jun 11, 2014 at 5:38 PM, Yehuda Sadeh yeh...@inktank.com wrote:
(resending also to list) Right. So basically the swift subuser wasn't created correctly. I created issue #8587. Can you try creating a second subuser, and see if it's created correctly the second time? ...
Re: [ceph-users] tiering : hit_set_count hit_set_period memory usage ?
We haven't really quantified that yet. In particular, it's going to depend on how many objects are accessed within a period; the OSD sizes them based on the previous access count and the false positive probability that you give it.

Ok, thanks Greg. Another question: the doc describes how objects go from the cache tier to the base tier, but how does it work from the base tier to the cache tier (cache-mode writeback)? Does any read on the base tier promote the object into the cache tier, or are there also statistics on the base tier? (I ask because I have cold data, but I have full backup jobs running each week, reading all of this cold data.)

----- Original Message -----
From: Gregory Farnum g...@inktank.com
To: Alexandre DERUMIER aderum...@odiso.com
Cc: ceph-users ceph-users@lists.ceph.com
Sent: Wednesday, June 11, 2014 21:56:29
Subject: Re: [ceph-users] tiering : hit_set_count hit_set_period memory usage ?

On Wed, Jun 11, 2014 at 12:44 PM, Alexandre DERUMIER aderum...@odiso.com wrote:
Hi, I'm reading the tiering doc here: http://ceph.com/docs/firefly/dev/cache-pool/
"The hit_set_count and hit_set_period define how much time each HitSet should cover, and how many such HitSets to store. Binning accesses over time allows Ceph to independently determine whether an object was accessed at least once and whether it was accessed more than once over some time period ("age" vs "temperature"). Note that the longer the period and the higher the count the more RAM will be consumed by the ceph-osd process. In particular, when the agent is active to flush or evict cache objects, all hit_set_count HitSets are loaded into RAM."
About how much memory are we talking here? Any formula? (nr objects x ?)

We haven't really quantified that yet. In particular, it's going to depend on how many objects are accessed within a period; the OSD sizes them based on the previous access count and the false positive probability that you give it.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
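For reference, the knobs discussed here are set per cache pool; a minimal sketch, assuming a cache pool named cachepool (the values are illustrative, not recommendations):

$ ceph osd pool set cachepool hit_set_type bloom    # bloom-filter HitSets
$ ceph osd pool set cachepool hit_set_count 8       # keep 8 HitSets...
$ ceph osd pool set cachepool hit_set_period 3600   # ...each covering one hour
$ ceph osd pool set cachepool hit_set_fpp 0.05      # the false positive probability Greg mentions, which drives HitSet size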