Re: [ceph-users] Does anyone success deploy FEDERATED GATEWAYS
Hi Peter, thanks for your reply. We plan to test multi-site data replication, but we encountered a problem. All users and metadata were replicated OK, while the data failed. radosgw-agent always responded "the state is error".

> Message: 22
> Date: Thu, 17 Apr 2014 12:03:06 +0100
> From: Peter
> To: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] Does anyone success deploy FEDERATED GATEWAYS
> Message-ID: <534fb4ea.10...@tchpc.tcd.ie>
> Content-Type: text/plain; charset="iso-8859-1"; Format="flowed"
>
> I am currently testing this functionality. What is your issue?
>
> On 04/17/2014 07:32 AM, maoqi1982 wrote:
>> Hi list
>> I followed http://ceph.com/docs/master/radosgw/federated-config/ to
>> test the multi-geography function, but failed. Does anyone successfully
>> deploy FEDERATED GATEWAYS? Is the function in Ceph OK or not? If anyone
>> has deployed it successfully, please give me some help.
>>
>> Thanks.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Access denied error
Hi Yehuda,

With the same keys I am able to access the buckets through Cyberduck, but when I use the same keys with the Admin Ops API it throws the access denied error. I have assigned all the permissions to this user, but I still get the same access denied error... http://ceph.com/docs/master/radosgw/s3/authentication/ How can I sign the message appropriately?

On Thu, Apr 17, 2014 at 12:10 AM, Yehuda Sadeh wrote:
> On Tue, Apr 15, 2014 at 11:33 PM, Punit Dambiwal wrote:
>> Hi,
>>
>> Still I am getting the same error when I run the following:
>>
>> curl -i 'http://xxx.xlinux.com/admin/usage?format=json' -X GET -H
>> 'Authorization: AWS YHFQ4D8BM835BCGERHTN:kXpM0XB9UjOadexDu2ZoP8s4nKjuoL0iIZhE\/+Gv' -H
>> 'Host: xxx.xlinux.com' -H 'Content-Length: 0'
>
> Where did you come up with this authorization field? You need to sign
> the message appropriately.
>
> Yehuda
>
>> HTTP/1.1 403 Forbidden
>> Date: Wed, 16 Apr 2014 06:26:45 GMT
>> Server: Apache/2.2.22 (Ubuntu)
>> Accept-Ranges: bytes
>> Content-Length: 23
>> Content-Type: application/json
>>
>> {"Code":"AccessDenied"}
>>
>> Can anybody help me to resolve this issue..
>>
>> Thanks,
>> punit
>>
>> On Mon, Apr 14, 2014 at 11:55 AM, Punit Dambiwal wrote:
>>> Hi,
>>>
>>> I am trying to list out all users using the Ceph S3 API and PHP.
>>> These are the lines of code which I used:
>>>
>>> $url = "http://.Xlinux.com/admin/user?format=json";
>>> $ch = curl_init($url);
>>> curl_setopt($ch, CURLOPT_CUSTOMREQUEST, "GET");
>>> curl_setopt($ch, CURLOPT_USERPWD, "P8K3750Z3PP5MGUKQYBL:CB+Ioydr1XsmQF\/gQmE\/X3YsDjtDbxLZzByaU9t\/");
>>> curl_setopt($ch, CURLOPT_HTTPHEADER, array("Authorization: AWS P8K3750Z3PP5MGUKQYBL:CB+Ioydr1XsmQF\/gQmE\/X3YsDjtDbxLZzByaU9t\/"));
>>> curl_setopt($ch, CURLOPT_HEADER, 0);
>>> curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
>>> curl_setopt($ch, CURLOPT_BINARYTRANSFER, 1);
>>> $response = curl_exec($ch);
>>> $output = json_encode($response);
>>> print_r($output);
>>>
>>> I am getting an error: access denied as output. Could you please check it?
>>>
>>> I have also tried the same using the curl command:
>>>
>>> curl -i 'http://XXX.Xlinux.com/admin/usage?format=json' -X GET -H
>>> 'Authorization: AWS P8K3750Z3PP5MGUKQYBL:CB+Ioydr1XsmQF\/gQmE\/X3YsDjtDbxLZzByaU9t\/' -H
>>> 'Host: XXX.Xlinux.com' -H 'Content-Length: 0'
>>>
>>> HTTP/1.1 403 Forbidden
>>> Date: Fri, 11 Apr 2014 10:08:20 GMT
>>> Server: Apache/2.2.22 (Ubuntu)
>>> Accept-Ranges: bytes
>>> Content-Length: 23
>>> Content-Type: application/json
>>>
>>> {"Code":"AccessDenied"}
>>>
>>> Can anybody let me know if anything is wrong in the above?
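To make Yehuda's point concrete: the value after the colon in the Authorization header must be an HMAC-SHA1 signature over a canonical string-to-sign, computed with the secret key — not the secret key itself. A minimal sketch of S3-style (AWS v2) signing as described at http://ceph.com/docs/master/radosgw/s3/authentication/; the keys and hostname below are placeholders:

```shell
# Sign a bodyless GET the S3 (AWS v2) way. String-to-sign is:
#   VERB \n Content-MD5 \n Content-Type \n Date \n CanonicalizedResource
# with empty Content-MD5 and Content-Type here.
access_key="P8K3750Z3PP5MGUKQYBL"
secret_key="placeholder-secret"
resource="/admin/usage"
http_date=$(date -Ru | sed 's/+0000/GMT/')
string_to_sign=$(printf 'GET\n\n\n%s\n%s' "$http_date" "$resource")
signature=$(printf '%s' "$string_to_sign" | openssl dgst -sha1 -hmac "$secret_key" -binary | base64)
echo "Date: $http_date"
echo "Authorization: AWS ${access_key}:${signature}"
# Then send both headers, e.g.:
# curl -i "http://xxx.xlinux.com${resource}?format=json" \
#      -H "Date: $http_date" -H "Authorization: AWS ${access_key}:${signature}"
```

Note the Date value you sign must be the Date header you send, and a correct signature still 403s unless the user has the matching admin caps (e.g. usage=read).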
Re: [ceph-users] SSDs: cache pool/tier versus node-local block cache
On Fri, 18 Apr 2014 11:34:15 +1000 Blair Bethwaite wrote:

>> Message: 20
>> Date: Thu, 17 Apr 2014 17:45:39 +0900
>> From: Christian Balzer
>> To: "ceph-users@lists.ceph.com"
>> Subject: Re: [ceph-users] SSDs: cache pool/tier versus node-local block cache
>> Message-ID: <20140417174539.6c713...@batzmaru.gol.ad.jp>
>> Content-Type: text/plain; charset=US-ASCII
>>
>> On Thu, 17 Apr 2014 12:58:55 +1000 Blair Bethwaite wrote:
>>
>>> Hi Kyle,
>>>
>>> Thanks for the response. Further comments/queries...
>>>
>>>> Message: 42
>>>> Date: Wed, 16 Apr 2014 06:53:41 -0700
>>>> From: Kyle Bader
>>>> Cc: ceph-users
>>>> Subject: Re: [ceph-users] SSDs: cache pool/tier versus node-local block cache
>>>> Message-ID: i...@mail.gmail.com>
>>>> Content-Type: text/plain; charset=UTF-8
>>>>
>>>>>> Obviously the ssds could be used as journal devices, but I'm not
>>>>>> really convinced whether this is worthwhile when all nodes have 1GB of
>>>>>> hardware writeback cache (writes to journal and data areas on the same
>>>>>> spindle have time to coalesce in the cache and minimise seek time
>>>>>> hurt). Any advice on this?
>>>>
>>>> All writes need to be written to the journal before being written
>>>> to the data volume, so it's going to impact your overall throughput
>>>> and cause seeking; a hardware cache will only help with the latter
>>>> (unless you use btrfs).
>>
>> Indeed. Also a 1GB cache having to serve 12 spindles isn't as
>> impressive anymore when it comes down to per-disk cache (assuming more
>> or less uniform activity).
>> That hardware cache will also be used for reads (I've seen controllers
>> that allow you to influence the read/write cache usage ratio, but none
>> where you could disable caching reads outright).
>>
>> Which leads me to another point: your journal SSDs will be hanging off
>> that same controller as the OSD HDDs, meaning that they will compete
>> for hardware cache space that would be much better used for the HDDs
>> (again, I'm unaware of any controller that allows disabling the cache
>> for individual disks).
>>
>> That's why for my current first production cluster, as well as any
>> future ones, I am planning to separate the SSDs from the OSDs whenever
>> possible.
>
> So the PERC 710p, whilst not having the native JBOD mode of the
> underlying LSI 2208 chipset, does allow per-virtual-disk cache and
> read-ahead mode settings. It also supports "Cut-Through IO" (CTIO),
> apparently enabled when the virtual-disk is set to no read-ahead and
> write-through caching. So my draft plan is that for our hardware we'll
> have 12x single-RAID0 virtual-disks, and the 3 SSDs will be set for CTIO.

Ah, I've seen similar stuff with LSI 2108s, but not the CTIO bit.
What tends to be annoying about these single-drive RAID0 virtual disks
is that the real drive is shielded from the OS, and with a cluster of
your size SMART data can and will be immensely helpful.

>>> Right, good point. So back-of-envelope calculations for throughput
>>> scenarios based on our hardware, just saying 150MB/s r/w for the
>>> spindles and 450/350MB/s r/w for the ssds, and pretending no
>>> controller bottlenecks etc:
>>>
>>> 1 OSD node (without ssd journals, hence divide by 2):
>>> 9 * 150 / 2 = 675MB/s write throughput
>>
>> Which is, even though extremely optimistic, quite below your network
>> bandwidth.
>
> Indeed (I'd say wildly optimistic, but for the sake of argument one has
> to have some sort of number/s).
>
>>> 1 OSD node (with ssd journals):
>>> min(9 * 150, 3 * 350) = 1050MB/s write throughput
>>>
>>> Aggregates for 12 OSDs: ~8GB/s versus 12.5GB/s
>>
>> You get to divide those aggregate numbers by your replication factor,
>> and if you value your data that is 3.
>
> Can anyone point to the reasoning/background behind the shift to
> favouring a 3x replication factor? When we started out it seemed that 2x
> was the recommendation, and that's what we're running with at present.

When I first looked at Ceph the default was 2, and everybody here and at
Inktank recommended 3. I think the default was/is going to be changed to
3 as well.

> Current use case is RBD volumes for working data and we're looking at
> integrating a cold-storage option for long-term durability of those, so
> our replication is mainly about availability. I assume 3x replication is
> more relevant for radosgw? There was an interesting discussion a while
> back about calculating data-loss probabilities under certain conditions
> but it didn't seem to have a definitive end...

You're probably thinking about the thread called "Failure probability
with largish deployments" that I started last year. You might want to
revisit that thread, the reliability modeling software by Inktank was
coming up with decent enough numbers f
Re: [ceph-users] SSDs: cache pool/tier versus node-local block cache
> Message: 20
> Date: Thu, 17 Apr 2014 17:45:39 +0900
> From: Christian Balzer
> To: "ceph-users@lists.ceph.com"
> Subject: Re: [ceph-users] SSDs: cache pool/tier versus node-local block cache
> Message-ID: <20140417174539.6c713...@batzmaru.gol.ad.jp>
> Content-Type: text/plain; charset=US-ASCII
>
> On Thu, 17 Apr 2014 12:58:55 +1000 Blair Bethwaite wrote:
>
>> Hi Kyle,
>>
>> Thanks for the response. Further comments/queries...
>>
>>> Message: 42
>>> Date: Wed, 16 Apr 2014 06:53:41 -0700
>>> From: Kyle Bader
>>> Cc: ceph-users
>>> Subject: Re: [ceph-users] SSDs: cache pool/tier versus node-local block cache
>>> Message-ID: i...@mail.gmail.com>
>>> Content-Type: text/plain; charset=UTF-8
>>>
>>>>> Obviously the ssds could be used as journal devices, but I'm not
>>>>> really convinced whether this is worthwhile when all nodes have 1GB of
>>>>> hardware writeback cache (writes to journal and data areas on the same
>>>>> spindle have time to coalesce in the cache and minimise seek time
>>>>> hurt). Any advice on this?
>>>
>>> All writes need to be written to the journal before being written to
>>> the data volume, so it's going to impact your overall throughput and
>>> cause seeking; a hardware cache will only help with the latter (unless
>>> you use btrfs).
>
> Indeed. Also a 1GB cache having to serve 12 spindles isn't as impressive
> anymore when it comes down to per-disk cache (assuming more or less
> uniform activity).
> That hardware cache will also be used for reads (I've seen controllers
> that allow you to influence the read/write cache usage ratio, but none
> where you could disable caching reads outright).
>
> Which leads me to another point: your journal SSDs will be hanging off
> that same controller as the OSD HDDs, meaning that they will compete for
> hardware cache space that would be much better used for the HDDs (again,
> I'm unaware of any controller that allows disabling the cache for
> individual disks).
>
> That's why for my current first production cluster, as well as any
> future ones, I am planning to separate the SSDs from the OSDs whenever
> possible.

So the PERC 710p, whilst not having the native JBOD mode of the
underlying LSI 2208 chipset, does allow per-virtual-disk cache and
read-ahead mode settings. It also supports "Cut-Through IO" (CTIO),
apparently enabled when the virtual-disk is set to no read-ahead and
write-through caching. So my draft plan is that for our hardware we'll
have 12x single-RAID0 virtual-disks, and the 3 SSDs will be set for CTIO.

>> Right, good point. So back-of-envelope calculations for throughput
>> scenarios based on our hardware, just saying 150MB/s r/w for the
>> spindles and 450/350MB/s r/w for the ssds, and pretending no controller
>> bottlenecks etc:
>>
>> 1 OSD node (without ssd journals, hence divide by 2):
>> 9 * 150 / 2 = 675MB/s write throughput
>
> Which is, even though extremely optimistic, quite below your network
> bandwidth.

Indeed (I'd say wildly optimistic, but for the sake of argument one has
to have some sort of number/s).

>> 1 OSD node (with ssd journals):
>> min(9 * 150, 3 * 350) = 1050MB/s write throughput
>>
>> Aggregates for 12 OSDs: ~8GB/s versus 12.5GB/s
>
> You get to divide those aggregate numbers by your replication factor,
> and if you value your data that is 3.

Can anyone point to the reasoning/background behind the shift to
favouring a 3x replication factor? When we started out it seemed that 2x
was the recommendation, and that's what we're running with at present.
Current use case is RBD volumes for working data and we're looking at
integrating a cold-storage option for long-term durability of those, so
our replication is mainly about availability. I assume 3x replication is
more relevant for radosgw? There was an interesting discussion a while
back about calculating data-loss probabilities under certain conditions
but it didn't seem to have a definitive end...

> That replication will also eat into your network bandwidth, making a
> dedicated cluster network for replication potentially quite attractive.
> But since in your case the disk bandwidth per node is pretty close to
> the network bandwidth of 10GE, using the dual ports for a resilient
> public network might be a better approach.

Plan is to use L2 MSTP. So we have multiple VLANs, e.g., client-access
and storage-private. They're bonded in active/passive configuration with
each active on a different port and the VLANs having independent root
bridges. In port/cable/switch failure-mode all VLANs get squished over
the same port.

>> So the general naive case seems like a no-brainer, we should use SSD
>> journals. But then we don't require even 8GB/s most of the time...
>
> Well, first and foremost people here seem to be obsessed with
> throughput; everybody clamors about that and the rbd bench doesn't help
> either.
>
> Unless
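The back-of-envelope throughput arithmetic quoted in this thread (9 spindles at 150MB/s, 3 SSD journals at 350MB/s write) is easy to keep as a small script:

```shell
# Rough per-node write throughput. Without SSD journals every write hits
# the same spindle twice (journal + data), halving effective throughput;
# with SSD journals the node is limited by whichever side saturates first.
spindles=9; spindle_mb=150
ssds=3;     ssd_mb=350
no_ssd=$(( spindles * spindle_mb / 2 ))
raw=$(( spindles * spindle_mb ))
ssd_cap=$(( ssds * ssd_mb ))
with_ssd=$(( raw < ssd_cap ? raw : ssd_cap ))    # min() of the two limits
echo "without ssd journals: ${no_ssd} MB/s"      # 675 MB/s
echo "with ssd journals:    ${with_ssd} MB/s"    # 1050 MB/s
echo "client-visible at 3x: $(( with_ssd / 3 )) MB/s"
```

Multiplying by 12 nodes gives the ~8GB/s versus ~12.5GB/s aggregates quoted above, before dividing by the replication factor as Christian notes.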
Re: [ceph-users] force_create_pg not working
Well, this is embarrassing. After working on this for a week, it finally created last night. The only thing that changed in the past 2 days was that I ran "ceph osd unset noscrub" and "ceph osd unset nodeep-scrub". I had disabled both scrubs in the hope that backfilling would finish faster. I only had the default logging level, and the OSD logs don't show anything interesting.

Looking at ceph.log when the incomplete goes away... it's complicated. 4 of my 16 OSDs were kicked out for being unresponsive. They stayed down and out for about 4 hours, until I restarted them. At some point, the incomplete went away, without ever going into creating.

I've had a lot of problems with OSDs being marked down and out in this cluster. It started with some extreme OSD slowness caused by kswapd problems. I think I have that under control now. But during that time, the OSDs thrashed themselves so hard, some of them aren't stable anymore. I still have two OSDs marked out in this cluster. If I mark them in, as soon as they start backfilling, they start using 100% CPU, and the other OSDs complain that they're not responding to heartbeats. So far, the 14 OSDs that are IN are remapping fine. If remapping completes, I plan to zap those two OSDs and re-add them.

On 4/16/14 22:10, Gregory Farnum wrote:
> Do you have any logging running on those OSDs? I'm going to need to get
> somebody else to look at this, but if we could check the probe messages
> being sent that might be helpful.
> -Greg
> Software Engineer #42 @ http://inktank.com | http://ceph.com
>
> On Tue, Apr 15, 2014 at 4:36 PM, Craig Lewis wrote:
>> http://pastebin.com/ti1VYqfr
>>
>> I assume the problem is at the very end:
>>
>> "probing_osds": [0, 2, 3, 4, 11, 13],
>> "down_osds_we_would_probe": [],
>> "peering_blocked_by": []},
>>
>> OSDs 3, 4, and 11 have been UP and IN for hours. OSDs 0, 2, and 13 have
>> been UP and IN since the problems started, but they never complete
>> probing.
Craig Lewis
Senior Systems Engineer
Office +1.714.602.1309
Email cle...@centraldesktop.com
Central Desktop. Work together in ways you never thought possible. Connect with us Website | Twitter | Facebook | LinkedIn | Blog

On 4/15/14 16:07, Gregory Farnum wrote:
> What are the results of "ceph pg 11.483 query"?
> -Greg
> Software Engineer #42 @ http://inktank.com | http://ceph.com
>
> On Tue, Apr 15, 2014 at 4:01 PM, Craig Lewis wrote:
>> I have 1 incomplete PG. The data is gone, but I can upload it again. I
>> just need to make the cluster start working so I can upload it.
>>
>> I've read a bunch of mailing list posts, and found ceph pg
>> force_create_pg. Except, it doesn't work. I run:
>>
>> root@ceph1c:/var/lib/ceph/osd# ceph pg force_create_pg 11.483
>> pg 11.483 now creating, ok
>>
>> The incomplete PG switches to creating. It sits in creating for a
>> while, then flips back to incomplete:
>>
>> 2014-04-15 15:06:11.876535 mon.0 [INF] pgmap v5719605: 2592 pgs: 2586 active+clean, 1 incomplete, 5 active+clean+scrubbing+deep; 15086 GB data, 27736 GB used, 28127 GB / 55864 GB avail
>> 2014-04-15 15:06:13.899681 mon.0 [INF] pgmap v5719606: 2592 pgs: 1 creating, 2586 active+clean, 5 active+clean+scrubbing+deep; 15086 GB data, 27736 GB used, 28127 GB / 55864 GB avail
>> 2014-04-15 15:06:14.965676 mon.0 [INF] pgmap v5719607: 2592 pgs: 1 creating, 2586 active+clean, 5 active+clean+scrubbing+deep; 15086 GB data, 27736 GB used, 28127 GB / 55864 GB avail
>> 2014-04-15 15:06:15.995570 mon.0 [INF] pgmap v5719608: 2592 pgs: 1 creating, 2586 active+clean, 5 active+clean+scrubbing+deep; 15086 GB data, 27736 GB used, 28127 GB / 55864 GB avail
>> 2014-04-15 15:06:17.019972 mon.0 [INF] pgmap v5719609: 2592 pgs: 1 creating, 2586 active+clean, 5 active+clean+scrubbing+deep; 15086 GB data, 27736 GB used, 28127 GB / 55864 GB avail
>> 2014-04-15 15:06:18.048487 mon.0 [INF] pgmap v5719610: 2592 pgs: 1 creating, 2586 active+clean, 5 active+clean+scrubbing+deep; 15086 GB data, 27736 GB used, 28127 GB / 55864 GB avail
>> 2014-04-15 15:06:19.093757 mon.0 [INF] pgmap v5719611: 2592 pgs: 2586 active+clean, 1 incomplete, 5 active+clean+scrubbing+deep; 15086 GB data, 27736 GB used, 28127 GB / 55864 GB avail
>>
>> I'm on:
>> root@ceph0c:~# ceph -v
>> ceph version 0.72.2 (a913ded2ff138aefb8cb84d347d72164099cfd60)
>> root@ceph0c:~# uname -a
>> Linux ceph0c 3.5.0-46-generic #70~precise1-Ubuntu SMP Thu Jan 9 23:55:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
>> root@ceph0c:~# cat /etc/lsb-release
>> DISTRIB_ID=Ubuntu
>> DISTRIB_RELEASE=12.04
>> DISTRIB_CODENAME=precise
>> DISTRIB_DESCRIPTION="Ubuntu 12.04.4 LTS"
>>
>> --
>> Craig Lewis
>> Senior Systems Engineer
>> Office +1.714.602.1309
>> Email cle...@centraldesktop.com
>> Central Desktop. Work together in ways you never thought possible. Connect with us Website | Twitter | Facebook | LinkedIn | Blog
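For anyone hitting the same thing, the checks involved are all stock CLI (PG id taken from the thread; whether unsetting the scrub flags was the actual fix is, as Craig says, unclear):

```shell
ceph osd dump | grep flags     # see whether noscrub/nodeep-scrub are set
ceph osd unset noscrub         # re-enable regular scrubbing
ceph osd unset nodeep-scrub    # re-enable deep scrubbing
ceph pg 11.483 query           # watch probing_osds / peering_blocked_by
ceph health detail             # lists stuck/incomplete PGs with reasons
```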
Re: [ceph-users] SSDs: cache pool/tier versus node-local block cache
>> >> I think the timing should work that we'll be deploying with Firefly and >> >> so >> >> have Ceph cache pool tiering as an option, but I'm also evaluating >> >> Bcache >> >> versus Tier to act as node-local block cache device. Does anybody have >> >> real >> >> or anecdotal evidence about which approach has better performance? >> > New idea that is dependent on failure behaviour of the cache tier... >> >> The problem with this type of configuration is it ties a VM to a >> specific hypervisor, in theory it should be faster because you don't >> have network latency from round trips to the cache tier, resulting in >> higher iops. Large sequential workloads may achieve higher throughput >> by parallelizing across many OSDs in a cache tier, whereas local flash >> would be limited to single device throughput. > > Ah, I was ambiguous. When I said node-local I meant OSD-local. So I'm really > looking at: > 2-copy write-back object ssd cache-pool > versus > OSD write-back ssd block-cache > versus > 1-copy write-around object cache-pool & ssd journal Ceph cache pools allow you to scale the size of the cache pool independent of the underlying storage and avoids constraints about disk:ssd ratios (for flashcache, bcache, etc). Local block caches should have lower latency than a cache tier for a cache miss, due to the extra hop(s) across the network. I would lean towards using Ceph's cache tiers for the scaling independence. > This is undoubtedly true for a write-back cache-tier. But in the scenario > I'm suggesting, a write-around cache, that needn't be bad news - if a > cache-tier OSD is lost the cache simply just got smaller and some cached > objects were unceremoniously flushed. The next read on those objects should > just miss and bring them into the now smaller cache. > > The thing I'm trying to avoid with the above is double read-caching of > objects (so as to get more aggregate read cache). 
I assume the standard > wisdom with write-back cache-tiering is that the backing data pool shouldn't > bother with ssd journals? Currently, all cache tiers need to be durable - regardless of cache mode. As such, cache tiers should be erasure coded or N+1 replicated (I'd recommend N+2 or 3x replica). Ceph could potentially do what you described in the future, it just doesn't yet. -- Kyle ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
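For reference, attaching the kind of write-back cache tier discussed above uses commands along these lines in Firefly (pool names and thresholds are placeholders; check the cache-tiering docs for your release):

```shell
ceph osd tier add rbd rbd-cache                   # attach cache pool to base pool
ceph osd tier cache-mode rbd-cache writeback      # per Kyle's note, this tier must be durable
ceph osd tier set-overlay rbd rbd-cache           # route client I/O through the cache
ceph osd pool set rbd-cache hit_set_type bloom    # needed so the tier can track hits
ceph osd pool set rbd-cache target_max_bytes 1099511627776   # e.g. 1 TB cache
ceph osd pool set rbd-cache cache_target_dirty_ratio 0.4     # start flushing at 40% dirty
```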
Re: [ceph-users] Troubles MDS
On Thu, Apr 17, 2014 at 12:45 AM, Georg Höllrigl wrote: > Hello Greg, > > I've searched - but don't see any backtraces... I've tried to get some more > info out of the logs. I really hope, there is something interesting in it: > > It all started two days ago with an authentication error: > > 2014-04-14 21:08:55.929396 7fd93d53f700 1 mds.0.0 standby_replay_restart > (as standby) > 2014-04-14 21:09:07.989547 7fd93b62e700 1 mds.0.0 replay_done (as standby) > 2014-04-14 21:09:08.989647 7fd93d53f700 1 mds.0.0 standby_replay_restart > (as standby) > 2014-04-14 21:09:10.633786 7fd93b62e700 1 mds.0.0 replay_done (as standby) > 2014-04-14 21:09:11.633886 7fd93d53f700 1 mds.0.0 standby_replay_restart > (as standby) > 2014-04-14 21:09:17.995105 7fd93f644700 0 mds.0.0 handle_mds_beacon no > longer laggy > 2014-04-14 21:09:39.798798 7fd93f644700 0 monclient: hunting for new mon > 2014-04-14 21:09:39.955078 7fd93f644700 1 mds.-1.-1 handle_mds_map i > (10.0.1.107:6800/16503) dne in the mdsmap, respawning myself > 2014-04-14 21:09:39.955094 7fd93f644700 1 mds.-1.-1 respawn > 2014-04-14 21:09:39.955106 7fd93f644700 1 mds.-1.-1 e: '/usr/bin/ceph-mds' > 2014-04-14 21:09:39.955109 7fd93f644700 1 mds.-1.-1 0: '/usr/bin/ceph-mds' > 2014-04-14 21:09:39.955110 7fd93f644700 1 mds.-1.-1 1: '-i' > 2014-04-14 21:09:39.955112 7fd93f644700 1 mds.-1.-1 2: 'ceph-m-02' > 2014-04-14 21:09:39.955113 7fd93f644700 1 mds.-1.-1 3: '--pid-file' > 2014-04-14 21:09:39.955114 7fd93f644700 1 mds.-1.-1 4: > '/var/run/ceph/mds.ceph-m-02.pid' > 2014-04-14 21:09:39.955116 7fd93f644700 1 mds.-1.-1 5: '-c' > 2014-04-14 21:09:39.955117 7fd93f644700 1 mds.-1.-1 6: > '/etc/ceph/ceph.conf' > 2014-04-14 21:09:39.979138 7fd93f644700 1 mds.-1.-1 cwd / > 2014-04-14 19:09:40.922683 7f8ba9973780 0 ceph version 0.72.2 > (a913ded2ff138aefb8cb84d347d72164099cfd60), process ceph-mds, pid 16505 > 2014-04-14 19:09:40.975024 7f8ba9973780 -1 mds.-1.0 ERROR: failed to > authenticate: (1) Operation not permitted > 2014-04-14 
19:09:40.975070 7f8ba9973780 1 mds.-1.0 suicide. wanted > down:dne, now up:boot Well, as it says, this is a failure to authenticate with the cluster. It could conceivably be a result of a huge clock skew, or maybe the keyring wasn't accessible for some reason... > > That was fixed with restarting mds (+ the whole server). > > 2014-04-15 07:07:15.948650 7f9fdec0d780 0 ceph version 0.72.2 > (a913ded2ff138aefb8cb84d347d72164099cfd60), process ceph-mds, pid 506 > 2014-04-15 07:07:15.954386 7f9fdec0d780 -1 mds.-1.0 ERROR: failed to > authenticate: (1) Operation not permitted > 2014-04-15 07:07:15.954422 7f9fdec0d780 1 mds.-1.0 suicide. wanted > down:dne, now up:boot > 2014-04-15 07:15:49.177861 7fe8a1d60780 0 ceph version 0.72.2 > (a913ded2ff138aefb8cb84d347d72164099cfd60), process ceph-mds, pid 26401 > 2014-04-15 07:15:49.184027 7fe8a1d60780 -1 mds.-1.0 ERROR: failed to > authenticate: (1) Operation not permitted > 2014-04-15 07:15:49.184046 7fe8a1d60780 1 mds.-1.0 suicide. wanted > down:dne, now up:boot > 2014-04-15 07:17:32.598031 7fab123e6780 0 ceph version 0.72.2 > (a913ded2ff138aefb8cb84d347d72164099cfd60), process ceph-mds, pid 30531 > 2014-04-15 07:17:32.604560 7fab123e6780 -1 mds.-1.0 ERROR: failed to > authenticate: (1) Operation not permitted > 2014-04-15 07:17:32.604592 7fab123e6780 1 mds.-1.0 suicide. wanted > down:dne, now up:boot > 2014-04-15 07:21:56.099203 7fd37b951780 0 ceph version 0.72.2 > (a913ded2ff138aefb8cb84d347d72164099cfd60), process ceph-mds, pid 11335 > 2014-04-15 07:21:56.105229 7fd37b951780 -1 mds.-1.0 ERROR: failed to > authenticate: (1) Operation not permitted > 2014-04-15 07:21:56.105254 7fd37b951780 1 mds.-1.0 suicide. 
wanted > down:dne, now up:boot > 2014-04-15 07:22:09.345800 7f23392ef780 0 ceph version 0.72.2 > (a913ded2ff138aefb8cb84d347d72164099cfd60), process ceph-mds, pid 11461 > 2014-04-15 07:22:09.390001 7f23392ef780 -1 mds.-1.0 ERROR: failed to > authenticate: (1) Operation not permitted > 2014-04-15 07:22:09.391087 7f23392ef780 1 mds.-1.0 suicide. wanted > down:dne, now up:boot > 2014-04-15 07:28:01.762191 7fab6d14b780 0 ceph version 0.72.2 > (a913ded2ff138aefb8cb84d347d72164099cfd60), process ceph-mds, pid 28263 > 2014-04-15 07:28:01.779485 7fab6d14b780 -1 mds.-1.0 ERROR: failed to > authenticate: (1) Operation not permitted > 2014-04-15 07:28:01.779507 7fab6d14b780 1 mds.-1.0 suicide. wanted > down:dne, now up:boot > 2014-04-15 07:35:49.065110 7fe4f6b0d780 0 ceph version 0.72.2 > (a913ded2ff138aefb8cb84d347d72164099cfd60), process ceph-mds, pid 1233 > 2014-04-15 07:35:52.191856 7fe4f6b05700 0 -- 10.0.1.107:6800/1233 >> > 10.0.1.108:6789/0 pipe(0x2f9f500 sd=8 :0 s=1 pgs=0 cs=0 l=1 > c=0x2f81580).fault > 2014-04-15 07:35:56.352553 7fe4f1b96700 1 mds.-1.0 handle_mds_map standby > 2014-04-15 07:35:56.419499 7fe4f1b96700 1 mds.0.0 handle_mds_map i am now > mds.7905854.0replaying mds.0.0 > 2014-04-15 07:35:56.419507 7fe4f1b96700 1 mds.0.0 handle_mds_map state > c
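Two quick checks for the "failed to authenticate" suggestions Greg makes above (clock skew, inaccessible keyring), sketched for the mds.ceph-m-02 daemon in these logs; the keyring path assumes the default layout and may differ:

```shell
# 1. Clock skew: cephx tickets are time-sensitive, so compare this
#    host's clock against its NTP peers / the monitor hosts.
ntpq -p
# 2. Keyring: confirm the mds key exists in the cluster and is
#    readable on disk by the daemon.
ceph auth get mds.ceph-m-02
ls -l /var/lib/ceph/mds/ceph-ceph-m-02/keyring
```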
Re: [ceph-users] question on harvesting freed space
So in the meantime, are there any common workarounds? I'm assuming that monitoring the image used/total size ratio and, if it's greater than some tolerance, creating a new image and moving the filesystem content over is an effective, if crude, approach. I'm not clear on how to measure the amount of storage an image uses at the RBD level. Probably because I don't understand the info output:

$ sudo rbd --id nova info somecontainer
rbd image 'somecontainer':
        size 1024 GB in 262144 objects
        order 22 (4096 kB objects)
        block_name_prefix: rb.0.176f3.238e1f29
        format: 1

Are there others? I assume snapshotting images doesn't help here, since RBD still wouldn't be able to distinguish what's in use and what's not. Thoughts?

~jpr

On 04/17/2014 01:38 AM, Wido den Hollander wrote:
> On 04/17/2014 02:39 AM, Somnath Roy wrote:
>> It seems Discard support for kernel rbd is targeted for v0.80..
>>
>> http://tracker.ceph.com/issues/190
>
> True, but it will obviously take time before this hits the upstream
> kernels and goes into distributions.
>
> For RHEL 7 it might be that the krbd module from the Ceph extra repo
> might work. For Ubuntu it's waiting for newer kernels to be backported
> to the LTS releases.
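On measuring used space at the RBD level: one common workaround is to sum the allocated extents reported by rbd diff (a sketch using the image from the example above; note it counts RADOS-allocated extents, so blocks freed inside the guest filesystem but never discarded still show as used):

```shell
# With no snapshot argument, 'rbd diff' lists every allocated extent of
# the image; column 2 is the extent length in bytes.
rbd --id nova diff somecontainer | awk '{used += $2} END {print used/1024/1024 " MB used"}'
```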
Re: [ceph-users] RBD as a hot spare
On 17 Apr 2014, at 16:41, Wido den Hollander wrote:
> On 04/17/2014 02:37 PM, Pavel V. Kaygorodov wrote:
>> Hi!
>>
>> How do you think, is it a good idea to add an RBD block device as a hot
>> spare drive to a Linux software RAID?
>
> Well, it could work, but why? What is the total setup going to be?
>
> RAID over a couple of physical disks with RBD as hotspare?

Yes. If it works, it can reduce the overall cost of the disk subsystem, because I can use all physical drives for the RAID on each host and add RBD hot spares, which (due to thin provisioning) will not consume any real space until the hot spares become active.

Pavel.
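For what it's worth, wiring this up with krbd is only a few commands; a sketch, where the image name, size, and device paths are hypothetical:

```shell
# Thin-provisioned image sized like the physical RAID members; it
# consumes no real space until md starts rebuilding onto it.
rbd create spare0 --size 2000000      # size in MB
rbd map spare0                        # exposes the image as e.g. /dev/rbd0
mdadm --add /dev/md0 /dev/rbd0        # on a healthy array this becomes a hot spare
```

One thing worth testing before relying on it: a rebuild onto the RBD spare is bounded by network throughput rather than local disk speed.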
Re: [ceph-users] RBD as a hot spare
On 04/17/2014 02:37 PM, Pavel V. Kaygorodov wrote: Hi! How do you think, is it a good idea, to add RBD block device as a hot spare drive to a linux software raid? Well, it could work, but why? What is the total setup going to be? RAID over a couple of physical disks with RBD as hotspare? Pavel. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- Wido den Hollander Ceph consultant and trainer 42on B.V. Phone: +31 (0)20 700 9902 Skype: contact42on ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] RBD as a hot spare
Hi! How do you think, is it a good idea, to add RBD block device as a hot spare drive to a linux software raid? Pavel. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] RBD write access patterns and atime
Thanks Dan!

Thanks, Mike Dawson

On 4/17/2014 4:06 AM, Dan van der Ster wrote:
> Mike Dawson wrote:
>> Dan,
>> Could you describe how you harvested and analyzed this data? Even
>> better, could you share the code?
>> Cheers,
>> Mike
>
> First enable debug_filestore=10, then you'll see logs like this for writes:
>
> 2014-04-17 09:40:34.466749 7fb39df16700 10 filestore(/var/lib/ceph/osd/osd.0) write 4.206_head/57186206/rbd_data.1f7ccd36575a0ed.1620/head//4 651264~4096 = 4096
>
> and this for reads:
>
> 2014-04-17 09:46:10.449577 7fb392427700 10 filestore(/var/lib/ceph/osd/osd.0) FileStore::read 4.fe9_head/f7281fe9/rbd_data.10bb48f705289c0.6a24/head//4 1994752~4096/4096
>
> The last number is the size of the write/read. Then run this:
> https://github.com/cernceph/ceph-scripts/blob/master/tools/rbd-io-stats.pl
>
> Cheers, Dan
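Dan's recipe boils down to pulling the length out of each "offset~length" field in the filestore write lines. A minimal awk sketch of the size-tallying idea (the linked Perl script is the full version):

```shell
# Count how many filestore writes occurred at each I/O size, from an OSD
# log written with debug_filestore=10.
histo() {
  awk '/filestore\(.*\) write / {
         n = split($0, parts, "~")      # "offset~length = length" -> split on "~"
         split(parts[n], tail, " ")     # tail[1] is the write length in bytes
         sizes[tail[1]]++
       }
       END { for (s in sizes) print s, sizes[s] }'
}
# e.g.: histo < /var/log/ceph/ceph-osd.0.log
# Against the sample line from the mail:
printf '%s\n' '2014-04-17 09:40:34.466749 7fb39df16700 10 filestore(/var/lib/ceph/osd/osd.0) write 4.206_head/57186206/rbd_data.1f7ccd36575a0ed.1620/head//4 651264~4096 = 4096' | histo
# prints: 4096 1
```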
Re: [ceph-users] Does anyone success deploy FEDERATED GATEWAYS
I am currently testing this functionality. What is your issue?

On 04/17/2014 07:32 AM, maoqi1982 wrote:
> Hi list
> I followed http://ceph.com/docs/master/radosgw/federated-config/ to test
> the multi-geography function, but failed. Does anyone successfully
> deploy FEDERATED GATEWAYS? Is the function in Ceph OK or not? If anyone
> has deployed it successfully, please give me some help.
>
> Thanks.

--
Systems and Storage Engineer, Digital Repository of Ireland (DRI)
High Performance & Research Computing, IS Services
Lloyd Building, Trinity College Dublin, Dublin 2, Ireland.
http://www.tchpc.tcd.ie/ | ptier...@tchpc.tcd.ie
Tel: +353-1-896-4466
Re: [ceph-users] Troubles MDS
On Thu, Apr 17, 2014 at 4:10 PM, Georg Höllrigl wrote: > Whatever happened - It fixed itself!? > > When restarting, I got ~165k log messages like: > 2014-04-17 07:30:14.856421 7fc50b991700 0 log [WRN] : ino 1f24fe0 > 2014-04-17 07:30:14.856422 7fc50b991700 0 log [WRN] : ino 1f24fe1 > 2014-04-17 07:30:14.856423 7fc50b991700 0 log [WRN] : ino 1f24fe2 > 2014-04-17 07:30:14.856424 7fc50b991700 0 log [WRN] : ino 1f24fe3 > 2014-04-17 07:30:14.856427 7fc50b991700 0 log [WRN] : ino 1f24fe4 > 2014-04-17 07:30:14.856428 7fc50b991700 0 log [WRN] : ino 1f24fe5 > > And the clients recovered!? > > I would be really interested in what happened! > > Georg > > > On 17.04.2014 09:45, Georg Höllrigl wrote: >> >> Hello Greg, >> >> I've searched - but don't see any backtraces... I've tried to get some >> more info out of the logs. I really hope, there is something interesting >> in it: >> >> It all started two days ago with an authentication error: >> >> 2014-04-14 21:08:55.929396 7fd93d53f700 1 mds.0.0 >> standby_replay_restart (as standby) >> 2014-04-14 21:09:07.989547 7fd93b62e700 1 mds.0.0 replay_done (as standby) >> 2014-04-14 21:09:08.989647 7fd93d53f700 1 mds.0.0 >> standby_replay_restart (as standby) >> 2014-04-14 21:09:10.633786 7fd93b62e700 1 mds.0.0 replay_done (as standby) >> 2014-04-14 21:09:11.633886 7fd93d53f700 1 mds.0.0 >> standby_replay_restart (as standby) >> 2014-04-14 21:09:17.995105 7fd93f644700 0 mds.0.0 handle_mds_beacon no >> longer laggy >> 2014-04-14 21:09:39.798798 7fd93f644700 0 monclient: hunting for new mon >> 2014-04-14 21:09:39.955078 7fd93f644700 1 mds.-1.-1 handle_mds_map i >> (10.0.1.107:6800/16503) dne in the mdsmap, respawning myself >> 2014-04-14 21:09:39.955094 7fd93f644700 1 mds.-1.-1 respawn >> 2014-04-14 21:09:39.955106 7fd93f644700 1 mds.-1.-1 e: >> '/usr/bin/ceph-mds' >> 2014-04-14 21:09:39.955109 7fd93f644700 1 mds.-1.-1 0: >> '/usr/bin/ceph-mds' >> 2014-04-14 21:09:39.955110 7fd93f644700 1 mds.-1.-1 1: '-i' >> 2014-04-14
21:09:39.955112 7fd93f644700 1 mds.-1.-1 2: 'ceph-m-02' >> 2014-04-14 21:09:39.955113 7fd93f644700 1 mds.-1.-1 3: '--pid-file' >> 2014-04-14 21:09:39.955114 7fd93f644700 1 mds.-1.-1 4: >> '/var/run/ceph/mds.ceph-m-02.pid' >> 2014-04-14 21:09:39.955116 7fd93f644700 1 mds.-1.-1 5: '-c' >> 2014-04-14 21:09:39.955117 7fd93f644700 1 mds.-1.-1 6: >> '/etc/ceph/ceph.conf' >> 2014-04-14 21:09:39.979138 7fd93f644700 1 mds.-1.-1 cwd / >> 2014-04-14 19:09:40.922683 7f8ba9973780 0 ceph version 0.72.2 >> (a913ded2ff138aefb8cb84d347d72164099cfd60), process ceph-mds, pid 16505 >> 2014-04-14 19:09:40.975024 7f8ba9973780 -1 mds.-1.0 ERROR: failed to >> authenticate: (1) Operation not permitted >> 2014-04-14 19:09:40.975070 7f8ba9973780 1 mds.-1.0 suicide. wanted >> down:dne, now up:boot >> >> That was fixed with restarting mds (+ the whole server). >> >> 2014-04-15 07:07:15.948650 7f9fdec0d780 0 ceph version 0.72.2 >> (a913ded2ff138aefb8cb84d347d72164099cfd60), process ceph-mds, pid 506 >> 2014-04-15 07:07:15.954386 7f9fdec0d780 -1 mds.-1.0 ERROR: failed to >> authenticate: (1) Operation not permitted >> 2014-04-15 07:07:15.954422 7f9fdec0d780 1 mds.-1.0 suicide. wanted >> down:dne, now up:boot >> 2014-04-15 07:15:49.177861 7fe8a1d60780 0 ceph version 0.72.2 >> (a913ded2ff138aefb8cb84d347d72164099cfd60), process ceph-mds, pid 26401 >> 2014-04-15 07:15:49.184027 7fe8a1d60780 -1 mds.-1.0 ERROR: failed to >> authenticate: (1) Operation not permitted >> 2014-04-15 07:15:49.184046 7fe8a1d60780 1 mds.-1.0 suicide. wanted >> down:dne, now up:boot >> 2014-04-15 07:17:32.598031 7fab123e6780 0 ceph version 0.72.2 >> (a913ded2ff138aefb8cb84d347d72164099cfd60), process ceph-mds, pid 30531 >> 2014-04-15 07:17:32.604560 7fab123e6780 -1 mds.-1.0 ERROR: failed to >> authenticate: (1) Operation not permitted >> 2014-04-15 07:17:32.604592 7fab123e6780 1 mds.-1.0 suicide. 
wanted >> down:dne, now up:boot >> 2014-04-15 07:21:56.099203 7fd37b951780 0 ceph version 0.72.2 >> (a913ded2ff138aefb8cb84d347d72164099cfd60), process ceph-mds, pid 11335 >> 2014-04-15 07:21:56.105229 7fd37b951780 -1 mds.-1.0 ERROR: failed to >> authenticate: (1) Operation not permitted >> 2014-04-15 07:21:56.105254 7fd37b951780 1 mds.-1.0 suicide. wanted >> down:dne, now up:boot >> 2014-04-15 07:22:09.345800 7f23392ef780 0 ceph version 0.72.2 >> (a913ded2ff138aefb8cb84d347d72164099cfd60), process ceph-mds, pid 11461 >> 2014-04-15 07:22:09.390001 7f23392ef780 -1 mds.-1.0 ERROR: failed to >> authenticate: (1) Operation not permitted >> 2014-04-15 07:22:09.391087 7f23392ef780 1 mds.-1.0 suicide. wanted >> down:dne, now up:boot >> 2014-04-15 07:28:01.762191 7fab6d14b780 0 ceph version 0.72.2 >> (a913ded2ff138aefb8cb84d347d72164099cfd60), process ceph-mds, pid 28263 >> 2014-04-15 07:28:01.779485 7fab6d14b780 -1 mds.-1.0 ERROR: failed to >> authenticate: (1) Operation not permitted >> 2014-04-15 07:28:01.779507 7fab6d14b780 1 mds.-1.0 suicide. want
Re: [ceph-users] SSDs: cache pool/tier versus node-local block cache
On Thu, 17 Apr 2014 12:58:55 +1000 Blair Bethwaite wrote:

> Hi Kyle,
>
> Thanks for the response. Further comments/queries...
>
> > Message: 42
> > Date: Wed, 16 Apr 2014 06:53:41 -0700
> > From: Kyle Bader
> > Cc: ceph-users
> > Subject: Re: [ceph-users] SSDs: cache pool/tier versus node-local block cache
> > Message-ID:
> > i...@mail.gmail.com>
> > Content-Type: text/plain; charset=UTF-8
> >
> > >> Obviously the ssds could be used as journal devices, but I'm not
> > >> really convinced whether this is worthwhile when all nodes have 1GB
> > >> of hardware writeback cache (writes to journal and data areas on the
> > >> same spindle have time to coalesce in the cache and minimise seek
> > >> time hurt). Any advice on this?
> >
> > All writes need to be written to the journal before being written to
> > the data volume, so it's going to impact your overall throughput and
> > cause seeking; a hardware cache will only help with the latter (unless
> > you use btrfs).

Indeed. Also, a 1GB cache having to serve 12 spindles isn't as impressive anymore once it comes down to the per-disk cache (assuming more or less uniform activity).

That hardware cache will also be used for reads (I've seen controllers that allow you to influence the read/write cache usage ratio, but none where you could disable caching reads outright).

Which leads me to another point: your journal SSDs will be hanging off the same controller as the OSD HDDs, meaning they will compete for hardware cache space that would be much better used for the HDDs (again, I'm unaware of any controller that allows disabling the cache for individual disks). That's why for my current first production cluster, as well as any future ones, I am planning to separate the SSDs from the OSDs whenever possible.

> Right, good point.
> So back of envelope calculations for throughput scenarios based on our
> hardware, just saying 150MB/s r/w for the spindles and 450/350MB/s r/w
> for the ssds, and pretending no controller bottlenecks etc:
>
> 1 OSD node (without ssd journals, hence divide by 2):
> 9 * 150 / 2 = 675MB/s write throughput

Which is, even though extremely optimistic, quite below your network bandwidth.

> 1 OSD node (with ssd journals):
> min(9 * 150, 3 * 350) = 1050MB/s write throughput
>
> Aggregates for 12 OSDs: ~8GB/s versus 12.5GB/s

You get to divide those aggregate numbers by your replication factor, and if you value your data that is 3. That replication will also eat into your network bandwidth, making a dedicated cluster network for replication potentially quite attractive. But since in your case the disk bandwidth per node is pretty close to the network bandwidth of 10GE, using the dual ports for a resilient public network might be a better approach.

> So the general naive case seems like a no-brainer, we should use SSD
> journals. But then we don't require even 8GB/s most of the time...

Well, first and foremost, people here seem to be obsessed with throughput; everybody clamors about it, and the rbd bench doesn't help either. Unless you have a very special use case of basically writing or reading a few large sequential files, you will run out of IOPS long before you run out of raw bandwidth.

And that's where caching/coalescing all along the way, from the RBD cache for the VMs and the SSD journals to the hardware cache of your controller, comes in. These will all allow you to have peak performance far above the sustainable IOPS of your backing HDDs, for some time at least.
In your case the sustained rate for the cluster you outlined would be something like this (assuming 100 IOPS for those NL drives):

100 (IOPS) x 9 (disks) x 12 (hosts) / 3 (replication ratio) = 3600 IOPS

However, that's ignoring all the other caches, in particular the controller HW cache, which can raise the sustainable level quite a bit.

Regards,

Christian

--
Christian Balzer        Network/Systems Engineer
ch...@gol.com           Global OnLine Japan/Fusion Communications
http://www.gol.com/
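The back-of-envelope arithmetic in this thread can be captured in a few lines. Note that every figure here is one of the thread's stated assumptions, not a measurement: 150 MB/s per spindle, 350 MB/s sustained write per journal SSD, 9 OSD disks and 3 SSDs per node, 12 nodes, 3x replication, and 100 IOPS per NL drive.

```python
# Back-of-envelope Ceph write-throughput model using the thread's numbers.
def node_write_mb_s(spindles=9, spindle_mb_s=150, ssd_journals=0, ssd_mb_s=350):
    """Peak write throughput of one OSD node, ignoring controller limits."""
    if ssd_journals == 0:
        # Journal and data share each spindle: every write hits the disk twice.
        return spindles * spindle_mb_s / 2
    # With SSD journals, whichever side saturates first is the bottleneck.
    return min(spindles * spindle_mb_s, ssd_journals * ssd_mb_s)

def cluster_client_mb_s(nodes=12, replication=3, **kw):
    """Aggregate write throughput as seen by clients, after replication."""
    return nodes * node_write_mb_s(**kw) / replication

def cluster_iops(iops_per_disk=100, disks=9, nodes=12, replication=3):
    """Sustained small-write IOPS once all caches are exhausted."""
    return iops_per_disk * disks * nodes // replication

print(node_write_mb_s())                        # 675.0 MB/s, no SSD journals
print(node_write_mb_s(ssd_journals=3))          # 1050 MB/s, with SSD journals
print(cluster_client_mb_s(ssd_journals=3))      # 4200.0 MB/s client-visible
print(cluster_iops())                           # 3600 sustained IOPS
```

This reproduces the 675 and 1050 MB/s per-node figures and the 3600 IOPS estimate above; as Christian notes, the IOPS number ignores the various caches that raise the short-term ceiling.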
Re: [ceph-users] Troubles MDS
Whatever happened - It fixed itself!? When restarting, I got ~ 165k log messages like: 2014-04-17 07:30:14.856421 7fc50b991700 0 log [WRN] : ino 1f24fe0 2014-04-17 07:30:14.856422 7fc50b991700 0 log [WRN] : ino 1f24fe1 2014-04-17 07:30:14.856423 7fc50b991700 0 log [WRN] : ino 1f24fe2 2014-04-17 07:30:14.856424 7fc50b991700 0 log [WRN] : ino 1f24fe3 2014-04-17 07:30:14.856427 7fc50b991700 0 log [WRN] : ino 1f24fe4 2014-04-17 07:30:14.856428 7fc50b991700 0 log [WRN] : ino 1f24fe5 And the clients recovered!? I would be really interested, what happend! Georg On 17.04.2014 09:45, Georg Höllrigl wrote: Hello Greg, I've searched - but don't see any backtraces... I've tried to get some more info out of the logs. I really hope, there is something interesting in it: It all started two days ago with an authentication error: 2014-04-14 21:08:55.929396 7fd93d53f700 1 mds.0.0 standby_replay_restart (as standby) 2014-04-14 21:09:07.989547 7fd93b62e700 1 mds.0.0 replay_done (as standby) 2014-04-14 21:09:08.989647 7fd93d53f700 1 mds.0.0 standby_replay_restart (as standby) 2014-04-14 21:09:10.633786 7fd93b62e700 1 mds.0.0 replay_done (as standby) 2014-04-14 21:09:11.633886 7fd93d53f700 1 mds.0.0 standby_replay_restart (as standby) 2014-04-14 21:09:17.995105 7fd93f644700 0 mds.0.0 handle_mds_beacon no longer laggy 2014-04-14 21:09:39.798798 7fd93f644700 0 monclient: hunting for new mon 2014-04-14 21:09:39.955078 7fd93f644700 1 mds.-1.-1 handle_mds_map i (10.0.1.107:6800/16503) dne in the mdsmap, respawning myself 2014-04-14 21:09:39.955094 7fd93f644700 1 mds.-1.-1 respawn 2014-04-14 21:09:39.955106 7fd93f644700 1 mds.-1.-1 e: '/usr/bin/ceph-mds' 2014-04-14 21:09:39.955109 7fd93f644700 1 mds.-1.-1 0: '/usr/bin/ceph-mds' 2014-04-14 21:09:39.955110 7fd93f644700 1 mds.-1.-1 1: '-i' 2014-04-14 21:09:39.955112 7fd93f644700 1 mds.-1.-1 2: 'ceph-m-02' 2014-04-14 21:09:39.955113 7fd93f644700 1 mds.-1.-1 3: '--pid-file' 2014-04-14 21:09:39.955114 7fd93f644700 1 mds.-1.-1 4: 
'/var/run/ceph/mds.ceph-m-02.pid' 2014-04-14 21:09:39.955116 7fd93f644700 1 mds.-1.-1 5: '-c' 2014-04-14 21:09:39.955117 7fd93f644700 1 mds.-1.-1 6: '/etc/ceph/ceph.conf' 2014-04-14 21:09:39.979138 7fd93f644700 1 mds.-1.-1 cwd / 2014-04-14 19:09:40.922683 7f8ba9973780 0 ceph version 0.72.2 (a913ded2ff138aefb8cb84d347d72164099cfd60), process ceph-mds, pid 16505 2014-04-14 19:09:40.975024 7f8ba9973780 -1 mds.-1.0 ERROR: failed to authenticate: (1) Operation not permitted 2014-04-14 19:09:40.975070 7f8ba9973780 1 mds.-1.0 suicide. wanted down:dne, now up:boot That was fixed with restarting mds (+ the whole server). 2014-04-15 07:07:15.948650 7f9fdec0d780 0 ceph version 0.72.2 (a913ded2ff138aefb8cb84d347d72164099cfd60), process ceph-mds, pid 506 2014-04-15 07:07:15.954386 7f9fdec0d780 -1 mds.-1.0 ERROR: failed to authenticate: (1) Operation not permitted 2014-04-15 07:07:15.954422 7f9fdec0d780 1 mds.-1.0 suicide. wanted down:dne, now up:boot 2014-04-15 07:15:49.177861 7fe8a1d60780 0 ceph version 0.72.2 (a913ded2ff138aefb8cb84d347d72164099cfd60), process ceph-mds, pid 26401 2014-04-15 07:15:49.184027 7fe8a1d60780 -1 mds.-1.0 ERROR: failed to authenticate: (1) Operation not permitted 2014-04-15 07:15:49.184046 7fe8a1d60780 1 mds.-1.0 suicide. wanted down:dne, now up:boot 2014-04-15 07:17:32.598031 7fab123e6780 0 ceph version 0.72.2 (a913ded2ff138aefb8cb84d347d72164099cfd60), process ceph-mds, pid 30531 2014-04-15 07:17:32.604560 7fab123e6780 -1 mds.-1.0 ERROR: failed to authenticate: (1) Operation not permitted 2014-04-15 07:17:32.604592 7fab123e6780 1 mds.-1.0 suicide. wanted down:dne, now up:boot 2014-04-15 07:21:56.099203 7fd37b951780 0 ceph version 0.72.2 (a913ded2ff138aefb8cb84d347d72164099cfd60), process ceph-mds, pid 11335 2014-04-15 07:21:56.105229 7fd37b951780 -1 mds.-1.0 ERROR: failed to authenticate: (1) Operation not permitted 2014-04-15 07:21:56.105254 7fd37b951780 1 mds.-1.0 suicide. 
wanted down:dne, now up:boot 2014-04-15 07:22:09.345800 7f23392ef780 0 ceph version 0.72.2 (a913ded2ff138aefb8cb84d347d72164099cfd60), process ceph-mds, pid 11461 2014-04-15 07:22:09.390001 7f23392ef780 -1 mds.-1.0 ERROR: failed to authenticate: (1) Operation not permitted 2014-04-15 07:22:09.391087 7f23392ef780 1 mds.-1.0 suicide. wanted down:dne, now up:boot 2014-04-15 07:28:01.762191 7fab6d14b780 0 ceph version 0.72.2 (a913ded2ff138aefb8cb84d347d72164099cfd60), process ceph-mds, pid 28263 2014-04-15 07:28:01.779485 7fab6d14b780 -1 mds.-1.0 ERROR: failed to authenticate: (1) Operation not permitted 2014-04-15 07:28:01.779507 7fab6d14b780 1 mds.-1.0 suicide. wanted down:dne, now up:boot 2014-04-15 07:35:49.065110 7fe4f6b0d780 0 ceph version 0.72.2 (a913ded2ff138aefb8cb84d347d72164099cfd60), process ceph-mds, pid 1233 2014-04-15 07:35:52.191856 7fe4f6b05700 0 -- 10.0.1.107:6800/1233 >> 10.0.1.108:6789/0 pipe(0x2f9f500 sd=8 :0 s=1 pgs=0 cs=0 l=1 c=0x2f81580).fault 2014-04-15 07:35:5
Re: [ceph-users] RBD write access patterns and atime
Christian Balzer wrote:

>> I'm trying to understand that distribution, and the best explanation
>> I've come up with is that these are ext4/xfs metadata updates,
>> probably atime updates. Based on that theory, I'm going to test
>> noatime on a few VMs and see if I notice a change in the distribution.
>
> That strikes me as odd, as since kernel 2.6.30 the default option for
> mounts is relatime, which should have an effect quite close to that of
> a strict noatime.

That's a good point, which I hadn't realized. I confirmed that relatime is used by default on our RHEL6 client VMs, so it probably isn't the file accesses leading to many small writes. Any other theories?

Cheers, Dan

--
Dan van der Ster || Data & Storage Services || CERN IT Department
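Confirming the atime mode of a guest's mounts comes down to reading the option list in /proc/mounts (the fourth field). A small illustrative sketch, with made-up sample lines rather than data from this thread:

```python
# Classify the atime behaviour of mounts from /proc/mounts-style lines.
# Fields are: device, mountpoint, fstype, comma-separated options, ...
def atime_mode(options):
    opts = options.split(",")
    if "noatime" in opts:
        return "noatime"
    if "relatime" in opts:
        return "relatime"       # kernel default since 2.6.30
    return "strictatime"        # atime updated on every access

def scan_mounts(text):
    result = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 4:
            result[fields[1]] = atime_mode(fields[3])
    return result

# Illustrative sample input; on a live VM read open("/proc/mounts") instead.
sample = ("/dev/vda1 / ext4 rw,relatime,data=ordered 0 0\n"
          "/dev/vdb1 /data xfs rw,noatime,attr2 0 0")
print(scan_mounts(sample))  # {'/': 'relatime', '/data': 'noatime'}
```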
Re: [ceph-users] RBD write access patterns and atime
Mike Dawson wrote:

> Dan,
>
> Could you describe how you harvested and analyzed this data? Even
> better, could you share the code?
>
> Cheers, Mike

First enable debug_filestore=10, then you'll see logs like this:

2014-04-17 09:40:34.466749 7fb39df16700 10 filestore(/var/lib/ceph/osd/osd.0) write 4.206_head/57186206/rbd_data.1f7ccd36575a0ed.1620/head//4 651264~4096 = 4096

and this for reads:

2014-04-17 09:46:10.449577 7fb392427700 10 filestore(/var/lib/ceph/osd/osd.0) FileStore::read 4.fe9_head/f7281fe9/rbd_data.10bb48f705289c0.6a24/head//4 1994752~4096/4096

The last number is the size of the write/read. Then run this:
https://github.com/cernceph/ceph-scripts/blob/master/tools/rbd-io-stats.pl

Cheers, Dan

--
Dan van der Ster || Data & Storage Services || CERN IT Department
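The linked rbd-io-stats.pl does the real aggregation; purely as an illustration, a minimal Python equivalent keyed to the two sample lines above might look like this. The regexes are an assumption based on those samples and may need adjusting for other Ceph versions or log formats.

```python
import re
from collections import Counter

# Pull the I/O size out of "debug filestore = 10" lines and histogram it.
# Writes log "offset~length = rc"; reads log "offset~length/length".
WRITE_RE = re.compile(r"\bwrite\b.*\s(\d+)~(\d+)\s*=")
READ_RE = re.compile(r"FileStore::read\b.*\s(\d+)~(\d+)/")

def histogram(lines):
    writes, reads = Counter(), Counter()
    for line in lines:
        m = WRITE_RE.search(line)
        if m:
            writes[int(m.group(2))] += 1   # count by write size in bytes
            continue
        m = READ_RE.search(line)
        if m:
            reads[int(m.group(2))] += 1    # count by read size in bytes
    return writes, reads

# The two sample log lines quoted above:
log = [
    "2014-04-17 09:40:34.466749 7fb39df16700 10 filestore(/var/lib/ceph/osd/osd.0) "
    "write 4.206_head/57186206/rbd_data.1f7ccd36575a0ed.1620/head//4 651264~4096 = 4096",
    "2014-04-17 09:46:10.449577 7fb392427700 10 filestore(/var/lib/ceph/osd/osd.0) "
    "FileStore::read 4.fe9_head/f7281fe9/rbd_data.10bb48f705289c0.6a24/head//4 1994752~4096/4096",
]
writes, reads = histogram(log)
print(writes)  # Counter({4096: 1})
print(reads)   # Counter({4096: 1})
```

Fed a whole OSD log, the two Counters give the write- and read-size distributions Dan described.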
Re: [ceph-users] Troubles MDS
Hello Greg, I've searched - but don't see any backtraces... I've tried to get some more info out of the logs. I really hope, there is something interesting in it: It all started two days ago with an authentication error: 2014-04-14 21:08:55.929396 7fd93d53f700 1 mds.0.0 standby_replay_restart (as standby) 2014-04-14 21:09:07.989547 7fd93b62e700 1 mds.0.0 replay_done (as standby) 2014-04-14 21:09:08.989647 7fd93d53f700 1 mds.0.0 standby_replay_restart (as standby) 2014-04-14 21:09:10.633786 7fd93b62e700 1 mds.0.0 replay_done (as standby) 2014-04-14 21:09:11.633886 7fd93d53f700 1 mds.0.0 standby_replay_restart (as standby) 2014-04-14 21:09:17.995105 7fd93f644700 0 mds.0.0 handle_mds_beacon no longer laggy 2014-04-14 21:09:39.798798 7fd93f644700 0 monclient: hunting for new mon 2014-04-14 21:09:39.955078 7fd93f644700 1 mds.-1.-1 handle_mds_map i (10.0.1.107:6800/16503) dne in the mdsmap, respawning myself 2014-04-14 21:09:39.955094 7fd93f644700 1 mds.-1.-1 respawn 2014-04-14 21:09:39.955106 7fd93f644700 1 mds.-1.-1 e: '/usr/bin/ceph-mds' 2014-04-14 21:09:39.955109 7fd93f644700 1 mds.-1.-1 0: '/usr/bin/ceph-mds' 2014-04-14 21:09:39.955110 7fd93f644700 1 mds.-1.-1 1: '-i' 2014-04-14 21:09:39.955112 7fd93f644700 1 mds.-1.-1 2: 'ceph-m-02' 2014-04-14 21:09:39.955113 7fd93f644700 1 mds.-1.-1 3: '--pid-file' 2014-04-14 21:09:39.955114 7fd93f644700 1 mds.-1.-1 4: '/var/run/ceph/mds.ceph-m-02.pid' 2014-04-14 21:09:39.955116 7fd93f644700 1 mds.-1.-1 5: '-c' 2014-04-14 21:09:39.955117 7fd93f644700 1 mds.-1.-1 6: '/etc/ceph/ceph.conf' 2014-04-14 21:09:39.979138 7fd93f644700 1 mds.-1.-1 cwd / 2014-04-14 19:09:40.922683 7f8ba9973780 0 ceph version 0.72.2 (a913ded2ff138aefb8cb84d347d72164099cfd60), process ceph-mds, pid 16505 2014-04-14 19:09:40.975024 7f8ba9973780 -1 mds.-1.0 ERROR: failed to authenticate: (1) Operation not permitted 2014-04-14 19:09:40.975070 7f8ba9973780 1 mds.-1.0 suicide. wanted down:dne, now up:boot That was fixed with restarting mds (+ the whole server). 
2014-04-15 07:07:15.948650 7f9fdec0d780 0 ceph version 0.72.2 (a913ded2ff138aefb8cb84d347d72164099cfd60), process ceph-mds, pid 506 2014-04-15 07:07:15.954386 7f9fdec0d780 -1 mds.-1.0 ERROR: failed to authenticate: (1) Operation not permitted 2014-04-15 07:07:15.954422 7f9fdec0d780 1 mds.-1.0 suicide. wanted down:dne, now up:boot 2014-04-15 07:15:49.177861 7fe8a1d60780 0 ceph version 0.72.2 (a913ded2ff138aefb8cb84d347d72164099cfd60), process ceph-mds, pid 26401 2014-04-15 07:15:49.184027 7fe8a1d60780 -1 mds.-1.0 ERROR: failed to authenticate: (1) Operation not permitted 2014-04-15 07:15:49.184046 7fe8a1d60780 1 mds.-1.0 suicide. wanted down:dne, now up:boot 2014-04-15 07:17:32.598031 7fab123e6780 0 ceph version 0.72.2 (a913ded2ff138aefb8cb84d347d72164099cfd60), process ceph-mds, pid 30531 2014-04-15 07:17:32.604560 7fab123e6780 -1 mds.-1.0 ERROR: failed to authenticate: (1) Operation not permitted 2014-04-15 07:17:32.604592 7fab123e6780 1 mds.-1.0 suicide. wanted down:dne, now up:boot 2014-04-15 07:21:56.099203 7fd37b951780 0 ceph version 0.72.2 (a913ded2ff138aefb8cb84d347d72164099cfd60), process ceph-mds, pid 11335 2014-04-15 07:21:56.105229 7fd37b951780 -1 mds.-1.0 ERROR: failed to authenticate: (1) Operation not permitted 2014-04-15 07:21:56.105254 7fd37b951780 1 mds.-1.0 suicide. wanted down:dne, now up:boot 2014-04-15 07:22:09.345800 7f23392ef780 0 ceph version 0.72.2 (a913ded2ff138aefb8cb84d347d72164099cfd60), process ceph-mds, pid 11461 2014-04-15 07:22:09.390001 7f23392ef780 -1 mds.-1.0 ERROR: failed to authenticate: (1) Operation not permitted 2014-04-15 07:22:09.391087 7f23392ef780 1 mds.-1.0 suicide. wanted down:dne, now up:boot 2014-04-15 07:28:01.762191 7fab6d14b780 0 ceph version 0.72.2 (a913ded2ff138aefb8cb84d347d72164099cfd60), process ceph-mds, pid 28263 2014-04-15 07:28:01.779485 7fab6d14b780 -1 mds.-1.0 ERROR: failed to authenticate: (1) Operation not permitted 2014-04-15 07:28:01.779507 7fab6d14b780 1 mds.-1.0 suicide. 
wanted down:dne, now up:boot 2014-04-15 07:35:49.065110 7fe4f6b0d780 0 ceph version 0.72.2 (a913ded2ff138aefb8cb84d347d72164099cfd60), process ceph-mds, pid 1233 2014-04-15 07:35:52.191856 7fe4f6b05700 0 -- 10.0.1.107:6800/1233 >> 10.0.1.108:6789/0 pipe(0x2f9f500 sd=8 :0 s=1 pgs=0 cs=0 l=1 c=0x2f81580).fault 2014-04-15 07:35:56.352553 7fe4f1b96700 1 mds.-1.0 handle_mds_map standby 2014-04-15 07:35:56.419499 7fe4f1b96700 1 mds.0.0 handle_mds_map i am now mds.7905854.0replaying mds.0.0 2014-04-15 07:35:56.419507 7fe4f1b96700 1 mds.0.0 handle_mds_map state change up:standby --> up:standby-replay 2014-04-15 07:35:56.419512 7fe4f1b96700 1 mds.0.0 replay_start 2014-04-15 07:35:56.425229 7fe4f1b96700 1 mds.0.0 recovery set is 2014-04-15 07:35:56.425241 7fe4f1b96700 1 mds.0.0 need osdmap epoch 1391, have 3696 2014-04-15 07:35:56.497573 7fe4f1b96700 0 mds.0.cache creating system inode with ino:100 2014-04-15 07:35
Re: [ceph-users] RBD write access patterns and atime
Hi,

Gregory Farnum wrote:

> I forget which clients you're using — is rbd caching enabled?

Yes, the clients are qemu-kvm-rhev with the latest librbd from dumpling, and rbd cache = true.

Cheers, Dan

--
Dan van der Ster || Data & Storage Services || CERN IT Department