Re: [ceph-users] Does anyone success deploy FEDERATED GATEWAYS

2014-04-17 Thread maoqi1982
Hi Peter
thanks for your reply.

We planned to test multi-site data replication, but we encountered a
problem.

All users and metadata were replicated OK, while the data replication failed. radosgw-agent
always responded "the state is error".




>Message: 22
>Date: Thu, 17 Apr 2014 12:03:06 +0100
>From: Peter 
>To: ceph-users@lists.ceph.com
>Subject: Re: [ceph-users] Does anyone success deploy FEDERATED
>   GATEWAYS
>Message-ID: <534fb4ea.10...@tchpc.tcd.ie>
>Content-Type: text/plain; charset="iso-8859-1"; Format="flowed"
>
>I am currently testing this functionality. What is your issue?
>
>
>On 04/17/2014 07:32 AM, maoqi1982 wrote:
>> Hi list
>> I followed http://ceph.com/docs/master/radosgw/federated-config/ to 
>> test the multi-geography function and it failed. Has anyone successfully 
>> deployed FEDERATED GATEWAYS? Is the function in Ceph OK or not? If anyone 
>> has deployed it successfully, please give me some help.
>>
>> thanks.


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Access denied error

2014-04-17 Thread Punit Dambiwal
Hi Yehuda,

With the same keys I am able to access the buckets through Cyberduck, but
when I use the same keys with the Admin Ops API it throws the access denied
error. I have assigned all the permissions to this user, but I still get the
same access denied error...

http://ceph.com/docs/master/radosgw/s3/authentication/

How can I sign the message appropriately?
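
For reference, the document linked above signs each request with the S3 v2 scheme: build a canonical string from the verb, the (empty) Content-MD5 and Content-Type, the Date header and the canonicalized resource, HMAC-SHA1 it with the secret key and base64-encode the result. A minimal PHP sketch of that flow follows; the host name, access key and secret key are placeholders, and the radosgw user still needs the matching admin caps (e.g. usage=read) for the call to succeed:

<?php
// Minimal sketch of S3 v2 signing for a radosgw admin op request,
// following http://ceph.com/docs/master/radosgw/s3/authentication/.
// Host, access key and secret key below are placeholders.
$host      = 'xxx.xlinux.com';
$accessKey = 'YOUR_ACCESS_KEY';
$secretKey = 'YOUR_SECRET_KEY';
$resource  = '/admin/usage';   // canonicalized resource, query string excluded
$date      = gmdate('D, d M Y H:i:s \G\M\T');

// StringToSign = VERB \n Content-MD5 \n Content-Type \n Date \n CanonicalizedResource
$stringToSign = "GET\n\n\n{$date}\n{$resource}";
$signature    = base64_encode(hash_hmac('sha1', $stringToSign, $secretKey, true));

$ch = curl_init("http://{$host}/admin/usage?format=json");
curl_setopt($ch, CURLOPT_HTTPHEADER, array(
    "Date: {$date}",
    "Authorization: AWS {$accessKey}:{$signature}",
));
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
echo curl_exec($ch);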




On Thu, Apr 17, 2014 at 12:10 AM, Yehuda Sadeh  wrote:

> On Tue, Apr 15, 2014 at 11:33 PM, Punit Dambiwal 
> wrote:
> > Hi,
> >
> > I am still getting the same error when I run the following:
> >
> > --
> > curl -i 'http://xxx.xlinux.com/admin/usage?format=json' -X GET -H
> > 'Authorization: AWS
> > YHFQ4D8BM835BCGERHTN:kXpM0XB9UjOadexDu2ZoP8s4nKjuoL0iIZhE\/+Gv' -H 'Host:
>
>
> Where did you come up with this authorization field? You need to sign
> the message appropriately.
>
> Yehuda
>
> > xxx.xlinux.com' -H 'Content-Length: 0'
> > HTTP/1.1 403 Forbidden
> > Date: Wed, 16 Apr 2014 06:26:45 GMT
> >
> > Server: Apache/2.2.22 (Ubuntu)
> > Accept-Ranges: bytes
> > Content-Length: 23
> > Content-Type: application/json
> >
> > {"Code":"AccessDenied"}
> > ---
> >
> > Can anybody help me to resolve this issue?
> >
> > Thanks,
> > punit
> >
> >
> > On Mon, Apr 14, 2014 at 11:55 AM, Punit Dambiwal 
> wrote:
> >>
> >> Hi,
> >>
> >> I am trying to list out all users using the Ceph S3 API and PHP. These
> >> are the lines of code which I used:
> >>
> >>
> >>
> >>
> >>
> --
> >>
> >>
> >>
> >>
> >>
> >> $url = "http://.Xlinux.com/admin/user?format=json";
> >>
> >> $ch = curl_init ($url);
> >>
> >> curl_setopt($ch, CURLOPT_CUSTOMREQUEST, "GET"); curl_setopt($ch,
> >> CURLOPT_USERPWD,
> >> "P8K3750Z3PP5MGUKQYBL:CB+Ioydr1XsmQF\/gQmE\/X3YsDjtDbxLZzByaU9t\/");
> >>
> >> curl_setopt($ch, CURLOPT_HTTPHEADER, array("Authorization: AWS
> >> P8K3750Z3PP5MGUKQYBL:CB+Ioydr1XsmQF\/gQmE\/X3YsDjtDbxLZzByaU9t\/"));
> >>
> >> curl_setopt($ch, CURLOPT_HEADER, 0);
> >>
> >> curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); curl_setopt($ch,
> >> CURLOPT_BINARYTRANSFER,1); $response = curl_exec($ch); $output =
> >> json_encode($response);
> >>
> >>
> >>
> >> print_r($output);
> >>
> >>
> >>
> >>
> >>
> --
> >>
> >>
> >>
> >>
> >>
> >> I am getting an "access denied" error as output. Could you please check
> >> it?
> >>
> >>
> >>
> >> I have also tried the same using the curl command like
> >>
> >>
> >>
> >> curl -i 'http://XXX.Xlinux.com/admin/usage?format=json' -X GET -H
> >>
> >> 'Authorization: AWS
> >>
> >> P8K3750Z3PP5MGUKQYBL:CB+Ioydr1XsmQF\/gQmE\/X3YsDjtDbxLZzByaU9t\/' -H
> >>
> >> 'Host: XXX.Xlinux.com' -H 'Content-Length: 0'
> >>
> >>
> >>
> >> HTTP/1.1 403 Forbidden
> >>
> >> Date: Fri, 11 Apr 2014 10:08:20 GMT
> >>
> >> Server: Apache/2.2.22 (Ubuntu)
> >>
> >> Accept-Ranges: bytes
> >>
> >> Content-Length: 23
> >>
> >> Content-Type: application/json
> >>
> >>
> >>
> >> {"Code":"AccessDenied"}
> >>
> >>
> >>
> >> Can anybody let me know if anything is wrong in the above?
> >>
> >>
> >>
> >>
> >>
> >
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] SSDs: cache pool/tier versus node-local block cache

2014-04-17 Thread Christian Balzer
On Fri, 18 Apr 2014 11:34:15 +1000 Blair Bethwaite wrote:

> > Message: 20
> > Date: Thu, 17 Apr 2014 17:45:39 +0900
> > From: Christian Balzer 
> > To: "ceph-users@lists.ceph.com" 
> > Subject: Re: [ceph-users] SSDs: cache pool/tier versus node-local
> > block cache
> > Message-ID: <20140417174539.6c713...@batzmaru.gol.ad.jp>
> > Content-Type: text/plain; charset=US-ASCII
> >
> > On Thu, 17 Apr 2014 12:58:55 +1000 Blair Bethwaite wrote:
> >
> > > Hi Kyle,
> > >
> > > Thanks for the response. Further comments/queries...
> > >
> > > > Message: 42
> > > > Date: Wed, 16 Apr 2014 06:53:41 -0700
> > > > From: Kyle Bader 
> > > > Cc: ceph-users 
> > > > Subject: Re: [ceph-users] SSDs: cache pool/tier versus node-local
> > > > block cache
> > > > Message-ID:
> > > >  > > i...@mail.gmail.com>
> > > > Content-Type: text/plain; charset=UTF-8
> > > >
> > > > >> Obviously the ssds could be used as journal devices, but I'm not
> > > > >> really convinced whether this is worthwhile when all nodes have
> > > > >> 1GB of
> > > hardware
> > > > >> writeback cache (writes to journal and data areas on the same
> > > > >> spindle
> > > have
> > > > >> time to coalesce in the cache and minimise seek time hurt). Any
> > > > >> advice
> > > on
> > > > >> this?
> > > >
> > > > All writes need to be written to the journal before being written
> > > > to the data volume so it's going to impact your overall throughput
> > > > and cause seeking, a hardware cache will only help with the latter
> > > > (unless you use btrfs).
> > >
> >
> > Indeed. Also a 1GB cache having to serve 12 spindles isn't as
> > impressive anymore when it comes down to per disk cache (assuming more
> > or less uniform activity).
> > That hardware cache also will be used for reads (I've seen controllers
> > that allow you to influence the read/write cache usage ratio, but none
> > where you could disable caching reads right away).
> >
> > Which leads me to another point, your journal SSDs will be hanging off
> > that same controller as the OSD HDDs.
> > Meaning that they will compete for hardware cache space that would be
> > much better used for the HDDs (again, I'm unaware of any controller
> > that allows to disable caching for individual disks).
> >
> > That's why for my current first production cluster as well as any
> > future ones I am planning to separate the SSDs from the OSDs whenever
> > possible.
> 
> So the PERC 710p, whilst not having the native JBOD mode of the
> underlying LSI 2208 chipset, does allow per- virtual-disk cache and
> read-ahead mode settings. It also does support "Cut-Through IO" (CTIO),
> apparently enabled when the virtual-disk is set to no read-ahead and
> write-through caching. So my draft plan is that for our hardware we'll
> have 12x single-RAID0 virtual-disks, the 3 ssds will be set for CTIO.
> 
Ah, I've seen similar stuff with LSI 2108s, but not the CTIO bit.
What tends to be annoying about these single drive RAID0 virtual disks is
that the real drive is shielded from the OS. And with a cluster of your
size SMART data can and will be immensely helpful.

> > > Right, good point. So back of envelope calculations for throughput
> > > scenarios based on our hardware, just saying 150MB/s r/w for the
> spindles
> > > and 450/350MB/s r/w for the ssds, and pretending no controller
> > > bottlenecks etc:
> > >
> > > 1 OSD node (without ssd journals, hence divide by 2):
> > > 9 * 150 / 2 = 675MB/s write throughput
> > >
> > Which is, even though extremely optimistic, quite below your network
> > bandwidth.
> 
> Indeed (I'd say wildly optimistic, but for the sake of argument one has
> to have some sort of number/s).
> 
> > > 1 OSD node (with ssd journals):
> > > min(9 * 150, 3 * 350) = 1050MB/s write throughput
> > >
> > > Aggregates for 12 OSDs: ~8GB/s versus 12.5GB/s
> > >
> > You get to divide those aggregate numbers by your replication factor
> > and if you value your data that is 3.
> 
> Can anyone point to the reasoning/background behind the shift to
> favouring a 3x replication factor? When we started out it seemed that 2x
> was the recommendation, and that's what we're running with at present.
>
When I first looked at Ceph the default was 2 and everybody here and at
Inktank recommended 3. I think the default was/is going to be changed to 3
as well.

> Current use case is RBD volumes for working data and we're looking at
> integrating a cold-storage option for long-term durability of those, so
> our replication is mainly about availability. I assume 3x replication is
> more relevant for radosgw? There was an interesting discussion a while
> back about calculating data-loss probabilities under certain conditions
> but it didn't seem to have a definitive end...
> 
You're probably thinking about the thread called 
"Failure probability with largish deployments" that I started last year.

You might want to revisit that thread, the reliability modeling software
by Inktank was coming up with decent enough numbers f

Re: [ceph-users] SSDs: cache pool/tier versus node-local block cache

2014-04-17 Thread Blair Bethwaite
> Message: 20
> Date: Thu, 17 Apr 2014 17:45:39 +0900
> From: Christian Balzer 
> To: "ceph-users@lists.ceph.com" 
> Subject: Re: [ceph-users] SSDs: cache pool/tier versus node-local
> block cache
> Message-ID: <20140417174539.6c713...@batzmaru.gol.ad.jp>
> Content-Type: text/plain; charset=US-ASCII
>
> On Thu, 17 Apr 2014 12:58:55 +1000 Blair Bethwaite wrote:
>
> > Hi Kyle,
> >
> > Thanks for the response. Further comments/queries...
> >
> > > Message: 42
> > > Date: Wed, 16 Apr 2014 06:53:41 -0700
> > > From: Kyle Bader 
> > > Cc: ceph-users 
> > > Subject: Re: [ceph-users] SSDs: cache pool/tier versus node-local
> > > block cache
> > > Message-ID:
> > >  > i...@mail.gmail.com>
> > > Content-Type: text/plain; charset=UTF-8
> > >
> > > >> Obviously the ssds could be used as journal devices, but I'm not
> > > >> really convinced whether this is worthwhile when all nodes have 1GB
> > > >> of
> > hardware
> > > >> writeback cache (writes to journal and data areas on the same
> > > >> spindle
> > have
> > > >> time to coalesce in the cache and minimise seek time hurt). Any
> > > >> advice
> > on
> > > >> this?
> > >
> > > All writes need to be written to the journal before being written to
> > > the data volume so it's going to impact your overall throughput and
> > > cause seeking, a hardware cache will only help with the latter (unless
> > > you use btrfs).
> >
>
> Indeed. Also a 1GB cache having to serve 12 spindles isn't as impressive
> anymore when it comes down to per disk cache (assuming more or less
> uniform activity).
> That hardware cache also will be used for reads (I've seen controllers
> that allow you to influence the read/write cache usage ratio, but none
> where you could disable caching reads right away).
>
> Which leads me to another point, your journal SSDs will be hanging off that
> same controller as the OSD HDDs.
> Meaning that they will compete for hardware cache space that would be much
> better used for the HDDs (again, I'm unaware of any controller that allows
> to disable caching for individual disks).
>
> That's why for my current first production cluster as well as any future
> ones I am planning to separate the SSDs from the OSDs whenever possible.

So the PERC 710p, whilst not having the native JBOD mode of the underlying
LSI 2208 chipset, does allow per- virtual-disk cache and read-ahead mode
settings. It also does support "Cut-Through IO" (CTIO), apparently enabled
when the virtual-disk is set to no read-ahead and write-through caching. So
my draft plan is that for our hardware we'll have 12x single-RAID0
virtual-disks, the 3 ssds will be set for CTIO.

> > Right, good point. So back of envelope calculations for throughput
> > scenarios based on our hardware, just saying 150MB/s r/w for the
spindles
> > and 450/350MB/s r/w for the ssds, and pretending no controller
> > bottlenecks etc:
> >
> > 1 OSD node (without ssd journals, hence divide by 2):
> > 9 * 150 / 2 = 675MB/s write throughput
> >
> Which is, even though extremely optimistic, quite below your network
> bandwidth.

Indeed (I'd say wildly optimistic, but for the sake of argument one has to
have some sort of number/s).

> > 1 OSD node (with ssd journals):
> > min(9 * 150, 3 * 350) = 1050MB/s write throughput
> >
> > Aggregates for 12 OSDs: ~8GB/s versus 12.5GB/s
> >
> You get to divide those aggregate numbers by your replication factor and
> if you value your data that is 3.

Can anyone point to the reasoning/background behind the shift to favouring
a 3x replication factor? When we started out it seemed that 2x was the
recommendation, and that's what we're running with at present. Current use
case is RBD volumes for working data and we're looking at integrating a
cold-storage option for long-term durability of those, so our replication
is mainly about availability. I assume 3x replication is more relevant for
radosgw? There was an interesting discussion a while back about calculating
data-loss probabilities under certain conditions but it didn't seem to have
a definitive end...

> That replication will also eat into your network bandwidth, making a
> dedicated cluster network for replication potentially quite attractive.
> But since in your case the disk bandwidth per node is pretty close to the
> network bandwidth of 10GE, using the dual ports for a resilient public
> network might be a better approach.

Plan is to use L2 MSTP. So we have multiple VLANs, e.g., client-access and
storage-private. They're bonded in active/passive configuration with each
active on a different port and the VLANs having independent root bridges.
In port/cable/switch failure-mode all VLANs get squished over the same port.

> > So the general naive case seems like a no-brainer, we should use SSD
> > journals. But then we don't require even 8GB/s most of the time...
> >
> Well, first and foremost people here seem to be obsessed with throughput,
> everybody clamors about that and the rbd bench doesn't help either.
>
> Unless 

Re: [ceph-users] force_create_pg not working

2014-04-17 Thread Craig Lewis

Well, this is embarrassing.

After working on this for a week, it finally created last night. The 
only thing that changed in the past 2 days was that I ran ceph osd unset 
noscrub and ceph osd unset nodeep-scrub. I had disabled both scrubs in 
the hope that backfilling would finish faster.



I only had the default logging level, and the OSD logs don't show 
anything interesting.



Looking at ceph.log when the incomplete goes away... it's complicated.  
4 of my 16 OSDs were kicked out for being unresponsive.  They stayed 
down and out for about 4 hours, until I restarted them.  At some point, 
the incomplete went away, without ever going into creating.



I've had a lot of problems with OSDs being marked down and out in 
this cluster.  It started with some extreme OSD slowness caused by 
kswapd problems.  I think I have that under control now.  But during 
that time, the OSDs thrashed themselves so hard, some of them aren't 
stable anymore.  I still have two OSDs marked out in this cluster.  If I 
mark them in, as soon as they start backfilling, they start using 100% 
CPU, and the other OSDs complain that they're not responding to 
heartbeats.  So far, the 14 OSDs that are IN are remapping fine.  If 
remapping completes, I plan to zap those two OSDs and re-add them.






On 4/16/14 22:10 , Gregory Farnum wrote:

Do you have any logging running on those OSDs? I'm going to need to
get somebody else to look at this, but if we could check the probe
messages being sent that might be helpful.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com


On Tue, Apr 15, 2014 at 4:36 PM, Craig Lewis  wrote:

http://pastebin.com/ti1VYqfr

I assume the problem is at the very end:
   "probing_osds": [
 0,
 2,
 3,
 4,
 11,
 13],
   "down_osds_we_would_probe": [],
   "peering_blocked_by": []},


OSDs 3, 4, and 11 have been UP and IN for hours.  OSDs 0, 2, and 13 have
been UP and IN since the problems started, but they never complete probing.




Craig Lewis
Senior Systems Engineer
Office +1.714.602.1309
Email cle...@centraldesktop.com

Central Desktop. Work together in ways you never thought possible.
Connect with us   Website  |  Twitter  |  Facebook  |  LinkedIn  |  Blog

On 4/15/14 16:07 , Gregory Farnum wrote:

What are the results of "ceph pg 11.483 query"?
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com


On Tue, Apr 15, 2014 at 4:01 PM, Craig Lewis 
wrote:

I have 1 incomplete PG.  The data is gone, but I can upload it again.  I
just need to make the cluster start working so I can upload it.

I've read a bunch of mailing list posts, and found ceph pg force_create_pg.
Except, it doesn't work.

I run:
root@ceph1c:/var/lib/ceph/osd# ceph pg force_create_pg 11.483
pg 11.483 now creating, ok

The incomplete PG switches to creating.  It sits in creating for a while,
then flips back to incomplete:
2014-04-15 15:06:11.876535 mon.0 [INF] pgmap v5719605: 2592 pgs: 2586
active+clean, 1 incomplete, 5 active+clean+scrubbing+deep; 15086 GB data,
27736 GB used, 28127 GB / 55864 GB avail
2014-04-15 15:06:13.899681 mon.0 [INF] pgmap v5719606: 2592 pgs: 1 creating,
2586 active+clean, 5 active+clean+scrubbing+deep; 15086 GB data, 27736 GB
used, 28127 GB / 55864 GB avail
2014-04-15 15:06:14.965676 mon.0 [INF] pgmap v5719607: 2592 pgs: 1 creating,
2586 active+clean, 5 active+clean+scrubbing+deep; 15086 GB data, 27736 GB
used, 28127 GB / 55864 GB avail
2014-04-15 15:06:15.995570 mon.0 [INF] pgmap v5719608: 2592 pgs: 1 creating,
2586 active+clean, 5 active+clean+scrubbing+deep; 15086 GB data, 27736 GB
used, 28127 GB / 55864 GB avail
2014-04-15 15:06:17.019972 mon.0 [INF] pgmap v5719609: 2592 pgs: 1 creating,
2586 active+clean, 5 active+clean+scrubbing+deep; 15086 GB data, 27736 GB
used, 28127 GB / 55864 GB avail
2014-04-15 15:06:18.048487 mon.0 [INF] pgmap v5719610: 2592 pgs: 1 creating,
2586 active+clean, 5 active+clean+scrubbing+deep; 15086 GB data, 27736 GB
used, 28127 GB / 55864 GB avail
2014-04-15 15:06:19.093757 mon.0 [INF] pgmap v5719611: 2592 pgs: 2586
active+clean, 1 incomplete, 5 active+clean+scrubbing+deep; 15086 GB data,
27736 GB used, 28127 GB / 55864 GB avail

I'm on:
root@ceph0c:~# ceph -v
ceph version 0.72.2 (a913ded2ff138aefb8cb84d347d72164099cfd60)

root@ceph0c:~# uname -a
Linux ceph0c 3.5.0-46-generic #70~precise1-Ubuntu SMP Thu Jan 9 23:55:12 UTC
2014 x86_64 x86_64 x86_64 GNU/Linux

root@ceph0c:~# cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=12.04
DISTRIB_CODENAME=precise
DISTRIB_DESCRIPTION="Ubuntu 12.04.4 LTS"

--

Craig Lewis
Senior Systems Engineer
Office +1.714.602.1309
Email cle...@centraldesktop.com

Central Desktop. Work together in ways you never thought possible.
Connect with us   Website  |  Twitter  |  Facebook  |  LinkedIn  |  Blog


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

Re: [ceph-users] SSDs: cache pool/tier versus node-local block cache

2014-04-17 Thread Kyle Bader
>> >> I think the timing should work that we'll be deploying with Firefly and
>> >> so
>> >> have Ceph cache pool tiering as an option, but I'm also evaluating
>> >> Bcache
>> >> versus Tier to act as node-local block cache device. Does anybody have
>> >> real
>> >> or anecdotal evidence about which approach has better performance?
>> > New idea that is dependent on failure behaviour of the cache tier...
>>
>> The problem with this type of configuration is it ties a VM to a
>> specific hypervisor, in theory it should be faster because you don't
>> have network latency from round trips to the cache tier, resulting in
>> higher iops. Large sequential workloads may achieve higher throughput
>> by parallelizing across many OSDs in a cache tier, whereas local flash
>> would be limited to single device throughput.
>
> Ah, I was ambiguous. When I said node-local I meant OSD-local. So I'm really
> looking at:
> 2-copy write-back object ssd cache-pool
> versus
> OSD write-back ssd block-cache
> versus
> 1-copy write-around object cache-pool & ssd journal

Ceph cache pools allow you to scale the size of the cache pool
independent of the underlying storage and avoids constraints about
disk:ssd ratios (for flashcache, bcache, etc). Local block caches
should have lower latency than a cache tier for a cache miss, due to
the extra hop(s) across the network. I would lean towards using Ceph's
cache tiers for the scaling independence.

> This is undoubtedly true for a write-back cache-tier. But in the scenario
> I'm suggesting, a write-around cache, that needn't be bad news - if a
> cache-tier OSD is lost the cache simply just got smaller and some cached
> objects were unceremoniously flushed. The next read on those objects should
> just miss and bring them into the now smaller cache.
>
> The thing I'm trying to avoid with the above is double read-caching of
> objects (so as to get more aggregate read cache). I assume the standard
> wisdom with write-back cache-tiering is that the backing data pool shouldn't
> bother with ssd journals?

Currently, all cache tiers need to be durable - regardless of cache
mode. As such, cache tiers should be erasure coded or N+1 replicated
(I'd recommend N+2 or 3x replica). Ceph could potentially do what you
described in the future, it just doesn't yet.

-- 

Kyle
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Troubles MDS

2014-04-17 Thread Gregory Farnum
On Thu, Apr 17, 2014 at 12:45 AM, Georg Höllrigl
 wrote:
> Hello Greg,
>
> I've searched - but don't see any backtraces... I've tried to get some more
> info out of the logs. I really hope, there is something interesting in it:
>
> It all started two days ago with an authentication error:
>
> 2014-04-14 21:08:55.929396 7fd93d53f700  1 mds.0.0 standby_replay_restart
> (as standby)
> 2014-04-14 21:09:07.989547 7fd93b62e700  1 mds.0.0 replay_done (as standby)
> 2014-04-14 21:09:08.989647 7fd93d53f700  1 mds.0.0 standby_replay_restart
> (as standby)
> 2014-04-14 21:09:10.633786 7fd93b62e700  1 mds.0.0 replay_done (as standby)
> 2014-04-14 21:09:11.633886 7fd93d53f700  1 mds.0.0 standby_replay_restart
> (as standby)
> 2014-04-14 21:09:17.995105 7fd93f644700  0 mds.0.0 handle_mds_beacon no
> longer laggy
> 2014-04-14 21:09:39.798798 7fd93f644700  0 monclient: hunting for new mon
> 2014-04-14 21:09:39.955078 7fd93f644700  1 mds.-1.-1 handle_mds_map i
> (10.0.1.107:6800/16503) dne in the mdsmap, respawning myself
> 2014-04-14 21:09:39.955094 7fd93f644700  1 mds.-1.-1 respawn
> 2014-04-14 21:09:39.955106 7fd93f644700  1 mds.-1.-1  e: '/usr/bin/ceph-mds'
> 2014-04-14 21:09:39.955109 7fd93f644700  1 mds.-1.-1  0: '/usr/bin/ceph-mds'
> 2014-04-14 21:09:39.955110 7fd93f644700  1 mds.-1.-1  1: '-i'
> 2014-04-14 21:09:39.955112 7fd93f644700  1 mds.-1.-1  2: 'ceph-m-02'
> 2014-04-14 21:09:39.955113 7fd93f644700  1 mds.-1.-1  3: '--pid-file'
> 2014-04-14 21:09:39.955114 7fd93f644700  1 mds.-1.-1  4:
> '/var/run/ceph/mds.ceph-m-02.pid'
> 2014-04-14 21:09:39.955116 7fd93f644700  1 mds.-1.-1  5: '-c'
> 2014-04-14 21:09:39.955117 7fd93f644700  1 mds.-1.-1  6:
> '/etc/ceph/ceph.conf'
> 2014-04-14 21:09:39.979138 7fd93f644700  1 mds.-1.-1  cwd /
> 2014-04-14 19:09:40.922683 7f8ba9973780  0 ceph version 0.72.2
> (a913ded2ff138aefb8cb84d347d72164099cfd60), process ceph-mds, pid 16505
> 2014-04-14 19:09:40.975024 7f8ba9973780 -1 mds.-1.0 ERROR: failed to
> authenticate: (1) Operation not permitted
> 2014-04-14 19:09:40.975070 7f8ba9973780  1 mds.-1.0 suicide.  wanted
> down:dne, now up:boot

Well, as it says, this is a failure to authenticate with the cluster.
It could conceivably be a result of a huge clock skew, or maybe the
keyring wasn't accessible for some reason...

>
> That was fixed with restarting mds (+ the whole server).
>
> 2014-04-15 07:07:15.948650 7f9fdec0d780  0 ceph version 0.72.2
> (a913ded2ff138aefb8cb84d347d72164099cfd60), process ceph-mds, pid 506
> 2014-04-15 07:07:15.954386 7f9fdec0d780 -1 mds.-1.0 ERROR: failed to
> authenticate: (1) Operation not permitted
> 2014-04-15 07:07:15.954422 7f9fdec0d780  1 mds.-1.0 suicide.  wanted
> down:dne, now up:boot
> 2014-04-15 07:15:49.177861 7fe8a1d60780  0 ceph version 0.72.2
> (a913ded2ff138aefb8cb84d347d72164099cfd60), process ceph-mds, pid 26401
> 2014-04-15 07:15:49.184027 7fe8a1d60780 -1 mds.-1.0 ERROR: failed to
> authenticate: (1) Operation not permitted
> 2014-04-15 07:15:49.184046 7fe8a1d60780  1 mds.-1.0 suicide.  wanted
> down:dne, now up:boot
> 2014-04-15 07:17:32.598031 7fab123e6780  0 ceph version 0.72.2
> (a913ded2ff138aefb8cb84d347d72164099cfd60), process ceph-mds, pid 30531
> 2014-04-15 07:17:32.604560 7fab123e6780 -1 mds.-1.0 ERROR: failed to
> authenticate: (1) Operation not permitted
> 2014-04-15 07:17:32.604592 7fab123e6780  1 mds.-1.0 suicide.  wanted
> down:dne, now up:boot
> 2014-04-15 07:21:56.099203 7fd37b951780  0 ceph version 0.72.2
> (a913ded2ff138aefb8cb84d347d72164099cfd60), process ceph-mds, pid 11335
> 2014-04-15 07:21:56.105229 7fd37b951780 -1 mds.-1.0 ERROR: failed to
> authenticate: (1) Operation not permitted
> 2014-04-15 07:21:56.105254 7fd37b951780  1 mds.-1.0 suicide.  wanted
> down:dne, now up:boot
> 2014-04-15 07:22:09.345800 7f23392ef780  0 ceph version 0.72.2
> (a913ded2ff138aefb8cb84d347d72164099cfd60), process ceph-mds, pid 11461
> 2014-04-15 07:22:09.390001 7f23392ef780 -1 mds.-1.0 ERROR: failed to
> authenticate: (1) Operation not permitted
> 2014-04-15 07:22:09.391087 7f23392ef780  1 mds.-1.0 suicide.  wanted
> down:dne, now up:boot
> 2014-04-15 07:28:01.762191 7fab6d14b780  0 ceph version 0.72.2
> (a913ded2ff138aefb8cb84d347d72164099cfd60), process ceph-mds, pid 28263
> 2014-04-15 07:28:01.779485 7fab6d14b780 -1 mds.-1.0 ERROR: failed to
> authenticate: (1) Operation not permitted
> 2014-04-15 07:28:01.779507 7fab6d14b780  1 mds.-1.0 suicide.  wanted
> down:dne, now up:boot
> 2014-04-15 07:35:49.065110 7fe4f6b0d780  0 ceph version 0.72.2
> (a913ded2ff138aefb8cb84d347d72164099cfd60), process ceph-mds, pid 1233
> 2014-04-15 07:35:52.191856 7fe4f6b05700  0 -- 10.0.1.107:6800/1233 >>
> 10.0.1.108:6789/0 pipe(0x2f9f500 sd=8 :0 s=1 pgs=0 cs=0 l=1
> c=0x2f81580).fault
> 2014-04-15 07:35:56.352553 7fe4f1b96700  1 mds.-1.0 handle_mds_map standby
> 2014-04-15 07:35:56.419499 7fe4f1b96700  1 mds.0.0 handle_mds_map i am now
> mds.7905854.0replaying mds.0.0
> 2014-04-15 07:35:56.419507 7fe4f1b96700  1 mds.0.0 handle_mds_map state
> c

Re: [ceph-users] question on harvesting freed space

2014-04-17 Thread John-Paul Robinson
So in the meantime, are there any common workarounds?

I'm assuming that monitoring the image's used/size ratio and, if it's greater
than some tolerance, creating a new image and moving the file system content
over is an effective, if crude, approach.  I'm not clear on how to measure the
amount of storage an image uses at the RBD level, probably because I
don't understand the info output:

$ sudo rbd --id nova info somecontainer
rbd image 'somecontainer':
size 1024 GB in 262144 objects
order 22 (4096 kB objects)
block_name_prefix: rb.0.176f3.238e1f29
format: 1

Are there others?
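
One crude way to estimate it, assuming the format shown above: count the rados objects carrying the image's block_name_prefix and multiply by the object size (order 22 means 4 MB objects). That only gives an upper bound, since partially written objects count in full. A rough PHP sketch, with the pool name as a placeholder (listing a large pool this way can be slow):

<?php
// Rough estimate of the space a format 1 RBD image actually occupies:
// count its data objects and multiply by the object size. Pool name is a
// placeholder; the prefix comes from `rbd info`; result is an upper bound.
$pool        = 'volumes';
$prefix      = 'rb.0.176f3.238e1f29';
$objectBytes = 4 * 1024 * 1024;        // order 22 => 4096 kB objects

exec('rados -p ' . escapeshellarg($pool) . ' ls', $objects);

$count = 0;
foreach ($objects as $name) {
    if (strpos($name, $prefix . '.') === 0) {
        $count++;
    }
}

printf("%d data objects, at most %.1f GB allocated\n",
       $count, $count * $objectBytes / pow(1024, 3));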

I assume snapshotting images doesn't help here since RBD still wouldn't
be able to distinguish what's in use and what's not.

Thoughts?

~jpr

On 04/17/2014 01:38 AM, Wido den Hollander wrote:
> On 04/17/2014 02:39 AM, Somnath Roy wrote:
>> It seems discard support for kernel rbd is targeted for v0.80.
>>
>> http://tracker.ceph.com/issues/190
>>
> 
> True, but it will obviously take time before this hits the upstream
> kernels and goes into distributions.
> 
> For RHEL 7 the krbd module from the Ceph extra repo might work. For
> Ubuntu it's waiting for newer kernels to be backported to the LTS releases.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RBD as a hot spare

2014-04-17 Thread Pavel V. Kaygorodov

On 17 Apr 2014, at 16:41, Wido den Hollander wrote:

> On 04/17/2014 02:37 PM, Pavel V. Kaygorodov wrote:
>> Hi!
>> 
>> What do you think, is it a good idea to add an RBD block device as a hot spare
>> drive to a Linux software RAID?
>> 
> 
> Well, it could work, but why? What is the total setup going to be?
> 
> RAID over a couple of physical disks with RBD as hotspare?

Yes. If it works, it can reduce the overall cost of the disk subsystem, because I can
use all physical drives for RAID on each host and add RBD hot spares, which
(due to thin provisioning) will not consume any real space until the hot spares
become active.

Pavel.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RBD as a hot spare

2014-04-17 Thread Wido den Hollander

On 04/17/2014 02:37 PM, Pavel V. Kaygorodov wrote:

Hi!

What do you think, is it a good idea to add an RBD block device as a hot spare
drive to a Linux software RAID?



Well, it could work, but why? What is the total setup going to be?

RAID over a couple of physical disks with RBD as hotspare?


Pavel.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




--
Wido den Hollander
Ceph consultant and trainer
42on B.V.

Phone: +31 (0)20 700 9902
Skype: contact42on
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] RBD as a hot spare

2014-04-17 Thread Pavel V. Kaygorodov
Hi!

What do you think, is it a good idea to add an RBD block device as a hot spare
drive to a Linux software RAID?

Pavel.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RBD write access patterns and atime

2014-04-17 Thread Mike Dawson

Thanks Dan!

Thanks,
Mike Dawson


On 4/17/2014 4:06 AM, Dan van der Ster wrote:

Mike Dawson wrote:

Dan,

Could you describe how you harvested and analyzed this data? Even
better, could you share the code?

Cheers,
Mike

First enable debug_filestore=10, then you'll see logs like this:

2014-04-17 09:40:34.466749 7fb39df16700 10
filestore(/var/lib/ceph/osd/osd.0) write
4.206_head/57186206/rbd_data.1f7ccd36575a0ed.1620/head//4
651264~4096 = 4096

and this for reads:

2014-04-17 09:46:10.449577 7fb392427700 10
filestore(/var/lib/ceph/osd/osd.0) FileStore::read
4.fe9_head/f7281fe9/rbd_data.10bb48f705289c0.6a24/head//4
1994752~4096/4096

The last num is the size of the write/read.

Then run this:
https://github.com/cernceph/ceph-scripts/blob/master/tools/rbd-io-stats.pl

Cheers, Dan


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Does anyone success deploy FEDERATED GATEWAYS

2014-04-17 Thread Peter

I am currently testing this functionality. What is your issue?


On 04/17/2014 07:32 AM, maoqi1982 wrote:

Hi list
 I followed http://ceph.com/docs/master/radosgw/federated-config/ to 
test the multi-geography function and it failed. Has anyone successfully 
deployed FEDERATED GATEWAYS? Is the function in Ceph OK or not? If anyone 
has deployed it successfully, please give me some help.


thanks.




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



--
Systems and Storage Engineer, Digital Repository of Ireland (DRI)
High Performance & Research Computing, IS Services
Lloyd Building, Trinity College Dublin, Dublin 2, Ireland.
http://www.tchpc.tcd.ie/ | ptier...@tchpc.tcd.ie
Tel: +353-1-896-4466

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Troubles MDS

2014-04-17 Thread Yan, Zheng
On Thu, Apr 17, 2014 at 4:10 PM, Georg Höllrigl
 wrote:
> Whatever happened - It fixed itself!?
>
> When restarting, I got ~ 165k log messages like:
> 2014-04-17 07:30:14.856421 7fc50b991700  0 log [WRN] :  ino 1f24fe0
> 2014-04-17 07:30:14.856422 7fc50b991700  0 log [WRN] :  ino 1f24fe1
> 2014-04-17 07:30:14.856423 7fc50b991700  0 log [WRN] :  ino 1f24fe2
> 2014-04-17 07:30:14.856424 7fc50b991700  0 log [WRN] :  ino 1f24fe3
> 2014-04-17 07:30:14.856427 7fc50b991700  0 log [WRN] :  ino 1f24fe4
> 2014-04-17 07:30:14.856428 7fc50b991700  0 log [WRN] :  ino 1f24fe5
>
> And the clients recovered!?
>
> I would be really interested in what happened!
>
> Georg
>
>
> On 17.04.2014 09:45, Georg Höllrigl wrote:
>>
>> Hello Greg,
>>
>> I've searched - but don't see any backtraces... I've tried to get some
>> more info out of the logs. I really hope, there is something interesting
>> in it:
>>
>> It all started two days ago with an authentication error:
>>
>> 2014-04-14 21:08:55.929396 7fd93d53f700  1 mds.0.0
>> standby_replay_restart (as standby)
>> 2014-04-14 21:09:07.989547 7fd93b62e700  1 mds.0.0 replay_done (as
>> standby)
>> 2014-04-14 21:09:08.989647 7fd93d53f700  1 mds.0.0
>> standby_replay_restart (as standby)
>> 2014-04-14 21:09:10.633786 7fd93b62e700  1 mds.0.0 replay_done (as
>> standby)
>> 2014-04-14 21:09:11.633886 7fd93d53f700  1 mds.0.0
>> standby_replay_restart (as standby)
>> 2014-04-14 21:09:17.995105 7fd93f644700  0 mds.0.0 handle_mds_beacon no
>> longer laggy
>> 2014-04-14 21:09:39.798798 7fd93f644700  0 monclient: hunting for new mon
>> 2014-04-14 21:09:39.955078 7fd93f644700  1 mds.-1.-1 handle_mds_map i
>> (10.0.1.107:6800/16503) dne in the mdsmap, respawning myself
>> 2014-04-14 21:09:39.955094 7fd93f644700  1 mds.-1.-1 respawn
>> 2014-04-14 21:09:39.955106 7fd93f644700  1 mds.-1.-1  e:
>> '/usr/bin/ceph-mds'
>> 2014-04-14 21:09:39.955109 7fd93f644700  1 mds.-1.-1  0:
>> '/usr/bin/ceph-mds'
>> 2014-04-14 21:09:39.955110 7fd93f644700  1 mds.-1.-1  1: '-i'
>> 2014-04-14 21:09:39.955112 7fd93f644700  1 mds.-1.-1  2: 'ceph-m-02'
>> 2014-04-14 21:09:39.955113 7fd93f644700  1 mds.-1.-1  3: '--pid-file'
>> 2014-04-14 21:09:39.955114 7fd93f644700  1 mds.-1.-1  4:
>> '/var/run/ceph/mds.ceph-m-02.pid'
>> 2014-04-14 21:09:39.955116 7fd93f644700  1 mds.-1.-1  5: '-c'
>> 2014-04-14 21:09:39.955117 7fd93f644700  1 mds.-1.-1  6:
>> '/etc/ceph/ceph.conf'
>> 2014-04-14 21:09:39.979138 7fd93f644700  1 mds.-1.-1  cwd /
>> 2014-04-14 19:09:40.922683 7f8ba9973780  0 ceph version 0.72.2
>> (a913ded2ff138aefb8cb84d347d72164099cfd60), process ceph-mds, pid 16505
>> 2014-04-14 19:09:40.975024 7f8ba9973780 -1 mds.-1.0 ERROR: failed to
>> authenticate: (1) Operation not permitted
>> 2014-04-14 19:09:40.975070 7f8ba9973780  1 mds.-1.0 suicide.  wanted
>> down:dne, now up:boot
>>
>> That was fixed with restarting mds (+ the whole server).
>>
>> 2014-04-15 07:07:15.948650 7f9fdec0d780  0 ceph version 0.72.2
>> (a913ded2ff138aefb8cb84d347d72164099cfd60), process ceph-mds, pid 506
>> 2014-04-15 07:07:15.954386 7f9fdec0d780 -1 mds.-1.0 ERROR: failed to
>> authenticate: (1) Operation not permitted
>> 2014-04-15 07:07:15.954422 7f9fdec0d780  1 mds.-1.0 suicide.  wanted
>> down:dne, now up:boot
>> 2014-04-15 07:15:49.177861 7fe8a1d60780  0 ceph version 0.72.2
>> (a913ded2ff138aefb8cb84d347d72164099cfd60), process ceph-mds, pid 26401
>> 2014-04-15 07:15:49.184027 7fe8a1d60780 -1 mds.-1.0 ERROR: failed to
>> authenticate: (1) Operation not permitted
>> 2014-04-15 07:15:49.184046 7fe8a1d60780  1 mds.-1.0 suicide.  wanted
>> down:dne, now up:boot
>> 2014-04-15 07:17:32.598031 7fab123e6780  0 ceph version 0.72.2
>> (a913ded2ff138aefb8cb84d347d72164099cfd60), process ceph-mds, pid 30531
>> 2014-04-15 07:17:32.604560 7fab123e6780 -1 mds.-1.0 ERROR: failed to
>> authenticate: (1) Operation not permitted
>> 2014-04-15 07:17:32.604592 7fab123e6780  1 mds.-1.0 suicide.  wanted
>> down:dne, now up:boot
>> 2014-04-15 07:21:56.099203 7fd37b951780  0 ceph version 0.72.2
>> (a913ded2ff138aefb8cb84d347d72164099cfd60), process ceph-mds, pid 11335
>> 2014-04-15 07:21:56.105229 7fd37b951780 -1 mds.-1.0 ERROR: failed to
>> authenticate: (1) Operation not permitted
>> 2014-04-15 07:21:56.105254 7fd37b951780  1 mds.-1.0 suicide.  wanted
>> down:dne, now up:boot
>> 2014-04-15 07:22:09.345800 7f23392ef780  0 ceph version 0.72.2
>> (a913ded2ff138aefb8cb84d347d72164099cfd60), process ceph-mds, pid 11461
>> 2014-04-15 07:22:09.390001 7f23392ef780 -1 mds.-1.0 ERROR: failed to
>> authenticate: (1) Operation not permitted
>> 2014-04-15 07:22:09.391087 7f23392ef780  1 mds.-1.0 suicide.  wanted
>> down:dne, now up:boot
>> 2014-04-15 07:28:01.762191 7fab6d14b780  0 ceph version 0.72.2
>> (a913ded2ff138aefb8cb84d347d72164099cfd60), process ceph-mds, pid 28263
>> 2014-04-15 07:28:01.779485 7fab6d14b780 -1 mds.-1.0 ERROR: failed to
>> authenticate: (1) Operation not permitted
>> 2014-04-15 07:28:01.779507 7fab6d14b780  1 mds.-1.0 suicide.  want

Re: [ceph-users] SSDs: cache pool/tier versus node-local block cache

2014-04-17 Thread Christian Balzer
On Thu, 17 Apr 2014 12:58:55 +1000 Blair Bethwaite wrote:

> Hi Kyle,
> 
> Thanks for the response. Further comments/queries...
> 
> > Message: 42
> > Date: Wed, 16 Apr 2014 06:53:41 -0700
> > From: Kyle Bader 
> > Cc: ceph-users 
> > Subject: Re: [ceph-users] SSDs: cache pool/tier versus node-local
> > block cache
> > Message-ID:
> >  i...@mail.gmail.com>
> > Content-Type: text/plain; charset=UTF-8
> >
> > >> Obviously the ssds could be used as journal devices, but I'm not
> > >> really convinced whether this is worthwhile when all nodes have 1GB
> > >> of
> hardware
> > >> writeback cache (writes to journal and data areas on the same
> > >> spindle
> have
> > >> time to coalesce in the cache and minimise seek time hurt). Any
> > >> advice
> on
> > >> this?
> >
> > All writes need to be written to the journal before being written to
> > the data volume so it's going to impact your overall throughput and
> > cause seeking, a hardware cache will only help with the latter (unless
> > you use btrfs).
> 

Indeed. Also a 1GB cache having to serve 12 spindles isn't as impressive
anymore when it comes down to per disk cache (assuming more or less
uniform activity). 
That hardware cache also will be used for reads (I've seen controllers
that allow you to influence the read/write cache usage ratio, but none
where you could disable caching reads right away).

Which leads me to another point, your journal SSDs will be hanging off that
same controller as the OSD HDDs.
Meaning that they will compete for hardware cache space that would be much
better used for the HDDs (again, I'm unaware of any controller that allows
to disable caching for individual disks).

That's why for my current first production cluster as well as any future
ones I am planning to separate the SSDs from the OSDs whenever possible.

> Right, good point. So back of envelope calculations for throughput
> scenarios based on our hardware, just saying 150MB/s r/w for the spindles
> and 450/350MB/s r/w for the ssds, and pretending no controller
> bottlenecks etc:
> 
> 1 OSD node (without ssd journals, hence divide by 2):
> 9 * 150 / 2 = 675MB/s write throughput
> 
Which is, even though extremely optimistic, quite below your network
bandwidth.

> 1 OSD node (with ssd journals):
> min(9 * 150, 3 * 350) = 1050MB/s write throughput
> 
> Aggregates for 12 OSDs: ~8GB/s versus 12.5GB/s
> 
You get to divide those aggregate numbers by your replication factor and
if you value your data that is 3. 

That replication will also eat into your network bandwidth, making a
dedicated cluster network for replication potentially quite attractive.
But since in your case the disk bandwidth per node is pretty close to the
network bandwidth of 10GE, using the dual ports for a resilient public
network might be a better approach.

> So the general naive case seems like a no-brainer, we should use SSD
> journals. But then we don't require even 8GB/s most of the time...
> 
Well, first and foremost people here seem to be obsessed with throughput,
everybody clamors about that and the rbd bench doesn't help either.

Unless you have a very special use case of basically writing or reading a
few large sequential files, you will run out of IOPS long before you run
out of raw bandwidth. 

And that's where caching/coalescing all along the way, from the RBD cache for
the VMs and SSD journals to the hardware cache of your controller, comes in.
These will all allow you to have peak performance far over the sustainable
IOPS of your backing HDDs, for some time at least.

In your case that sustained rate for the cluster you outlined would be
something (assuming 100 IOPS for those NL drives) like this:

 100(IOPS) x 9(disk) x 12(hosts) / 3(replication ratio) = 3600 IOPS

However that's ignoring all the other caches, in particular the controller
HW cache, which can raise the sustainable level quite a bit.
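
Put as a small sketch, using only the assumed per-device figures from this thread (none of them measured):

<?php
// Back-of-envelope write throughput and sustained IOPS, using the assumed
// per-device figures from this thread (not measurements).
$hosts        = 12;
$disksPerHost = 9;
$ssdsPerHost  = 3;
$diskWriteMBs = 150;    // assumed NL spindle
$ssdWriteMBs  = 350;    // assumed journal SSD
$diskIops     = 100;    // assumed NL spindle
$replicas     = 3;

// Journals on the spindles: every write hits journal and data, so halve it.
$nodeNoSsd = $disksPerHost * $diskWriteMBs / 2;
// Journals on SSDs: limited by whichever side saturates first.
$nodeSsd   = min($disksPerHost * $diskWriteMBs, $ssdsPerHost * $ssdWriteMBs);

// Client-visible aggregates shrink by the replication factor.
printf("per node, journals on spindles: %d MB/s\n", $nodeNoSsd);
printf("per node, journals on SSDs:     %d MB/s\n", $nodeSsd);
printf("cluster writes, no SSD journals:   %.1f GB/s\n", $hosts * $nodeNoSsd / $replicas / 1000);
printf("cluster writes, with SSD journals: %.1f GB/s\n", $hosts * $nodeSsd / $replicas / 1000);
printf("sustained IOPS: %d\n", $diskIops * $disksPerHost * $hosts / $replicas);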

Regards,

Christian
-- 
Christian BalzerNetwork/Systems Engineer
ch...@gol.com   Global OnLine Japan/Fusion Communications
http://www.gol.com/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Troubles MDS

2014-04-17 Thread Georg Höllrigl

Whatever happened - It fixed itself!?

When restarting, I got ~ 165k log messages like:
2014-04-17 07:30:14.856421 7fc50b991700  0 log [WRN] :  ino 1f24fe0
2014-04-17 07:30:14.856422 7fc50b991700  0 log [WRN] :  ino 1f24fe1
2014-04-17 07:30:14.856423 7fc50b991700  0 log [WRN] :  ino 1f24fe2
2014-04-17 07:30:14.856424 7fc50b991700  0 log [WRN] :  ino 1f24fe3
2014-04-17 07:30:14.856427 7fc50b991700  0 log [WRN] :  ino 1f24fe4
2014-04-17 07:30:14.856428 7fc50b991700  0 log [WRN] :  ino 1f24fe5

And the clients recovered!?

I would be really interested in what happened!

Georg

On 17.04.2014 09:45, Georg Höllrigl wrote:

Hello Greg,

I've searched - but don't see any backtraces... I've tried to get some
more info out of the logs. I really hope, there is something interesting
in it:

It all started two days ago with an authentication error:

2014-04-14 21:08:55.929396 7fd93d53f700  1 mds.0.0
standby_replay_restart (as standby)
2014-04-14 21:09:07.989547 7fd93b62e700  1 mds.0.0 replay_done (as standby)
2014-04-14 21:09:08.989647 7fd93d53f700  1 mds.0.0
standby_replay_restart (as standby)
2014-04-14 21:09:10.633786 7fd93b62e700  1 mds.0.0 replay_done (as standby)
2014-04-14 21:09:11.633886 7fd93d53f700  1 mds.0.0
standby_replay_restart (as standby)
2014-04-14 21:09:17.995105 7fd93f644700  0 mds.0.0 handle_mds_beacon no
longer laggy
2014-04-14 21:09:39.798798 7fd93f644700  0 monclient: hunting for new mon
2014-04-14 21:09:39.955078 7fd93f644700  1 mds.-1.-1 handle_mds_map i
(10.0.1.107:6800/16503) dne in the mdsmap, respawning myself
2014-04-14 21:09:39.955094 7fd93f644700  1 mds.-1.-1 respawn
2014-04-14 21:09:39.955106 7fd93f644700  1 mds.-1.-1  e:
'/usr/bin/ceph-mds'
2014-04-14 21:09:39.955109 7fd93f644700  1 mds.-1.-1  0:
'/usr/bin/ceph-mds'
2014-04-14 21:09:39.955110 7fd93f644700  1 mds.-1.-1  1: '-i'
2014-04-14 21:09:39.955112 7fd93f644700  1 mds.-1.-1  2: 'ceph-m-02'
2014-04-14 21:09:39.955113 7fd93f644700  1 mds.-1.-1  3: '--pid-file'
2014-04-14 21:09:39.955114 7fd93f644700  1 mds.-1.-1  4:
'/var/run/ceph/mds.ceph-m-02.pid'
2014-04-14 21:09:39.955116 7fd93f644700  1 mds.-1.-1  5: '-c'
2014-04-14 21:09:39.955117 7fd93f644700  1 mds.-1.-1  6:
'/etc/ceph/ceph.conf'
2014-04-14 21:09:39.979138 7fd93f644700  1 mds.-1.-1  cwd /
2014-04-14 19:09:40.922683 7f8ba9973780  0 ceph version 0.72.2
(a913ded2ff138aefb8cb84d347d72164099cfd60), process ceph-mds, pid 16505
2014-04-14 19:09:40.975024 7f8ba9973780 -1 mds.-1.0 ERROR: failed to
authenticate: (1) Operation not permitted
2014-04-14 19:09:40.975070 7f8ba9973780  1 mds.-1.0 suicide.  wanted
down:dne, now up:boot

That was fixed with restarting mds (+ the whole server).

2014-04-15 07:07:15.948650 7f9fdec0d780  0 ceph version 0.72.2
(a913ded2ff138aefb8cb84d347d72164099cfd60), process ceph-mds, pid 506
2014-04-15 07:07:15.954386 7f9fdec0d780 -1 mds.-1.0 ERROR: failed to
authenticate: (1) Operation not permitted
2014-04-15 07:07:15.954422 7f9fdec0d780  1 mds.-1.0 suicide.  wanted
down:dne, now up:boot
2014-04-15 07:15:49.177861 7fe8a1d60780  0 ceph version 0.72.2
(a913ded2ff138aefb8cb84d347d72164099cfd60), process ceph-mds, pid 26401
2014-04-15 07:15:49.184027 7fe8a1d60780 -1 mds.-1.0 ERROR: failed to
authenticate: (1) Operation not permitted
2014-04-15 07:15:49.184046 7fe8a1d60780  1 mds.-1.0 suicide.  wanted
down:dne, now up:boot
2014-04-15 07:17:32.598031 7fab123e6780  0 ceph version 0.72.2
(a913ded2ff138aefb8cb84d347d72164099cfd60), process ceph-mds, pid 30531
2014-04-15 07:17:32.604560 7fab123e6780 -1 mds.-1.0 ERROR: failed to
authenticate: (1) Operation not permitted
2014-04-15 07:17:32.604592 7fab123e6780  1 mds.-1.0 suicide.  wanted
down:dne, now up:boot
2014-04-15 07:21:56.099203 7fd37b951780  0 ceph version 0.72.2
(a913ded2ff138aefb8cb84d347d72164099cfd60), process ceph-mds, pid 11335
2014-04-15 07:21:56.105229 7fd37b951780 -1 mds.-1.0 ERROR: failed to
authenticate: (1) Operation not permitted
2014-04-15 07:21:56.105254 7fd37b951780  1 mds.-1.0 suicide.  wanted
down:dne, now up:boot
2014-04-15 07:22:09.345800 7f23392ef780  0 ceph version 0.72.2
(a913ded2ff138aefb8cb84d347d72164099cfd60), process ceph-mds, pid 11461
2014-04-15 07:22:09.390001 7f23392ef780 -1 mds.-1.0 ERROR: failed to
authenticate: (1) Operation not permitted
2014-04-15 07:22:09.391087 7f23392ef780  1 mds.-1.0 suicide.  wanted
down:dne, now up:boot
2014-04-15 07:28:01.762191 7fab6d14b780  0 ceph version 0.72.2
(a913ded2ff138aefb8cb84d347d72164099cfd60), process ceph-mds, pid 28263
2014-04-15 07:28:01.779485 7fab6d14b780 -1 mds.-1.0 ERROR: failed to
authenticate: (1) Operation not permitted
2014-04-15 07:28:01.779507 7fab6d14b780  1 mds.-1.0 suicide.  wanted
down:dne, now up:boot
2014-04-15 07:35:49.065110 7fe4f6b0d780  0 ceph version 0.72.2
(a913ded2ff138aefb8cb84d347d72164099cfd60), process ceph-mds, pid 1233
2014-04-15 07:35:52.191856 7fe4f6b05700  0 -- 10.0.1.107:6800/1233 >>
10.0.1.108:6789/0 pipe(0x2f9f500 sd=8 :0 s=1 pgs=0 cs=0 l=1
c=0x2f81580).fault
2014-04-15 07:35:5

Re: [ceph-users] RBD write access patterns and atime

2014-04-17 Thread Dan van der Ster

Christian Balzer wrote:

>  I'm trying to understand that distribution, and the best explanation
>  I've come up with is that these are ext4/xfs metadata updates,
>  probably atime updates. Based on that theory, I'm going to test
>  noatime on a few VMs and see if I notice a change in the distribution.
>

That strikes me as odd, as since kernel 2.6.30 the default option for mounts is 
relatime, which should have an effect quite close to that of a strict noatime.
That's a good point, which I hadn't realized. I confirmed that relatime 
is used by default on our RHEL6 client VMs, so it probably isn't the 
file accesses leading to many small writes. Any other theories?


Cheers, Dan

-- Dan van der Ster || Data & Storage Services || CERN IT Department --
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RBD write access patterns and atime

2014-04-17 Thread Dan van der Ster

Mike Dawson wrote:

Dan,

Could you describe how you harvested and analyzed this data? Even 
better, could you share the code?


Cheers,
Mike

First enable debug_filestore=10, then you'll see logs like this:

2014-04-17 09:40:34.466749 7fb39df16700 10 
filestore(/var/lib/ceph/osd/osd.0) write 
4.206_head/57186206/rbd_data.1f7ccd36575a0ed.1620/head//4 
651264~4096 = 4096


and this for reads:

2014-04-17 09:46:10.449577 7fb392427700 10 
filestore(/var/lib/ceph/osd/osd.0) FileStore::read 
4.fe9_head/f7281fe9/rbd_data.10bb48f705289c0.6a24/head//4 
1994752~4096/4096


The last num is the size of the write/read.

Then run this: 
https://github.com/cernceph/ceph-scripts/blob/master/tools/rbd-io-stats.pl
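
The Perl script above does the full per-image breakdown; a stripped-down sketch of the same idea, just a histogram of write sizes from one OSD log (the log path is a placeholder), would be:

<?php
// Stripped-down version of the idea behind rbd-io-stats.pl: a histogram of
// FileStore write sizes from one OSD log with debug_filestore = 10.
// The log path is a placeholder.
$log  = fopen('/var/log/ceph/ceph-osd.0.log', 'r');
$hist = array();

while (($line = fgets($log)) !== false) {
    // matches: "... filestore(...) write <pg>/<object> <offset>~<length> = <ret>"
    if (preg_match('/filestore\(.*\) write .* \d+~(\d+) = /', $line, $m)) {
        $size = (int) $m[1];
        $hist[$size] = isset($hist[$size]) ? $hist[$size] + 1 : 1;
    }
}
fclose($log);

ksort($hist);
foreach ($hist as $size => $count) {
    printf("%8d bytes: %d writes\n", $size, $count);
}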


Cheers, Dan

--
-- Dan van der Ster || Data & Storage Services || CERN IT Department --
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Troubles MDS

2014-04-17 Thread Georg Höllrigl

Hello Greg,

I've searched - but don't see any backtraces... I've tried to get some 
more info out of the logs. I really hope, there is something interesting 
in it:


It all started two days ago with an authentication error:

2014-04-14 21:08:55.929396 7fd93d53f700  1 mds.0.0 
standby_replay_restart (as standby)

2014-04-14 21:09:07.989547 7fd93b62e700  1 mds.0.0 replay_done (as standby)
2014-04-14 21:09:08.989647 7fd93d53f700  1 mds.0.0 
standby_replay_restart (as standby)

2014-04-14 21:09:10.633786 7fd93b62e700  1 mds.0.0 replay_done (as standby)
2014-04-14 21:09:11.633886 7fd93d53f700  1 mds.0.0 
standby_replay_restart (as standby)
2014-04-14 21:09:17.995105 7fd93f644700  0 mds.0.0 handle_mds_beacon no 
longer laggy

2014-04-14 21:09:39.798798 7fd93f644700  0 monclient: hunting for new mon
2014-04-14 21:09:39.955078 7fd93f644700  1 mds.-1.-1 handle_mds_map i 
(10.0.1.107:6800/16503) dne in the mdsmap, respawning myself

2014-04-14 21:09:39.955094 7fd93f644700  1 mds.-1.-1 respawn
2014-04-14 21:09:39.955106 7fd93f644700  1 mds.-1.-1  e: '/usr/bin/ceph-mds'
2014-04-14 21:09:39.955109 7fd93f644700  1 mds.-1.-1  0: '/usr/bin/ceph-mds'
2014-04-14 21:09:39.955110 7fd93f644700  1 mds.-1.-1  1: '-i'
2014-04-14 21:09:39.955112 7fd93f644700  1 mds.-1.-1  2: 'ceph-m-02'
2014-04-14 21:09:39.955113 7fd93f644700  1 mds.-1.-1  3: '--pid-file'
2014-04-14 21:09:39.955114 7fd93f644700  1 mds.-1.-1  4: 
'/var/run/ceph/mds.ceph-m-02.pid'

2014-04-14 21:09:39.955116 7fd93f644700  1 mds.-1.-1  5: '-c'
2014-04-14 21:09:39.955117 7fd93f644700  1 mds.-1.-1  6: 
'/etc/ceph/ceph.conf'

2014-04-14 21:09:39.979138 7fd93f644700  1 mds.-1.-1  cwd /
2014-04-14 19:09:40.922683 7f8ba9973780  0 ceph version 0.72.2 
(a913ded2ff138aefb8cb84d347d72164099cfd60), process ceph-mds, pid 16505
2014-04-14 19:09:40.975024 7f8ba9973780 -1 mds.-1.0 ERROR: failed to 
authenticate: (1) Operation not permitted
2014-04-14 19:09:40.975070 7f8ba9973780  1 mds.-1.0 suicide.  wanted 
down:dne, now up:boot


That was fixed with restarting mds (+ the whole server).

2014-04-15 07:07:15.948650 7f9fdec0d780  0 ceph version 0.72.2 
(a913ded2ff138aefb8cb84d347d72164099cfd60), process ceph-mds, pid 506
2014-04-15 07:07:15.954386 7f9fdec0d780 -1 mds.-1.0 ERROR: failed to 
authenticate: (1) Operation not permitted
2014-04-15 07:07:15.954422 7f9fdec0d780  1 mds.-1.0 suicide.  wanted 
down:dne, now up:boot
2014-04-15 07:15:49.177861 7fe8a1d60780  0 ceph version 0.72.2 
(a913ded2ff138aefb8cb84d347d72164099cfd60), process ceph-mds, pid 26401
2014-04-15 07:15:49.184027 7fe8a1d60780 -1 mds.-1.0 ERROR: failed to 
authenticate: (1) Operation not permitted
2014-04-15 07:15:49.184046 7fe8a1d60780  1 mds.-1.0 suicide.  wanted 
down:dne, now up:boot
2014-04-15 07:17:32.598031 7fab123e6780  0 ceph version 0.72.2 
(a913ded2ff138aefb8cb84d347d72164099cfd60), process ceph-mds, pid 30531
2014-04-15 07:17:32.604560 7fab123e6780 -1 mds.-1.0 ERROR: failed to 
authenticate: (1) Operation not permitted
2014-04-15 07:17:32.604592 7fab123e6780  1 mds.-1.0 suicide.  wanted 
down:dne, now up:boot
2014-04-15 07:21:56.099203 7fd37b951780  0 ceph version 0.72.2 
(a913ded2ff138aefb8cb84d347d72164099cfd60), process ceph-mds, pid 11335
2014-04-15 07:21:56.105229 7fd37b951780 -1 mds.-1.0 ERROR: failed to 
authenticate: (1) Operation not permitted
2014-04-15 07:21:56.105254 7fd37b951780  1 mds.-1.0 suicide.  wanted 
down:dne, now up:boot
2014-04-15 07:22:09.345800 7f23392ef780  0 ceph version 0.72.2 
(a913ded2ff138aefb8cb84d347d72164099cfd60), process ceph-mds, pid 11461
2014-04-15 07:22:09.390001 7f23392ef780 -1 mds.-1.0 ERROR: failed to 
authenticate: (1) Operation not permitted
2014-04-15 07:22:09.391087 7f23392ef780  1 mds.-1.0 suicide.  wanted 
down:dne, now up:boot
2014-04-15 07:28:01.762191 7fab6d14b780  0 ceph version 0.72.2 
(a913ded2ff138aefb8cb84d347d72164099cfd60), process ceph-mds, pid 28263
2014-04-15 07:28:01.779485 7fab6d14b780 -1 mds.-1.0 ERROR: failed to 
authenticate: (1) Operation not permitted
2014-04-15 07:28:01.779507 7fab6d14b780  1 mds.-1.0 suicide.  wanted 
down:dne, now up:boot
2014-04-15 07:35:49.065110 7fe4f6b0d780  0 ceph version 0.72.2 
(a913ded2ff138aefb8cb84d347d72164099cfd60), process ceph-mds, pid 1233
2014-04-15 07:35:52.191856 7fe4f6b05700  0 -- 10.0.1.107:6800/1233 >> 
10.0.1.108:6789/0 pipe(0x2f9f500 sd=8 :0 s=1 pgs=0 cs=0 l=1 
c=0x2f81580).fault

2014-04-15 07:35:56.352553 7fe4f1b96700  1 mds.-1.0 handle_mds_map standby
2014-04-15 07:35:56.419499 7fe4f1b96700  1 mds.0.0 handle_mds_map i am 
now mds.7905854.0replaying mds.0.0
2014-04-15 07:35:56.419507 7fe4f1b96700  1 mds.0.0 handle_mds_map state 
change up:standby --> up:standby-replay

2014-04-15 07:35:56.419512 7fe4f1b96700  1 mds.0.0 replay_start
2014-04-15 07:35:56.425229 7fe4f1b96700  1 mds.0.0  recovery set is
2014-04-15 07:35:56.425241 7fe4f1b96700  1 mds.0.0  need osdmap epoch 
1391, have 3696
2014-04-15 07:35:56.497573 7fe4f1b96700  0 mds.0.cache creating system 
inode with ino:100
2014-04-15 07:35

Re: [ceph-users] RBD write access patterns and atime

2014-04-17 Thread Dan van der Ster

Hi,

Gregory Farnum wrote:

I forget which clients you're using — is rbd caching enabled?
Yes, the clients are qemu-kvm-rhev with latest librbd from dumpling and 
rbd cache = true.

Cheers, Dan

-- Dan van der Ster || Data & Storage Services || CERN IT Department --
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com