Re: [ceph-users] pg stuck in peering while power failure

2017-01-10 Thread Craig Chi
Hi Sam,

Thank you for your precise inspection.

I reviewed the log from that time and discovered that the cluster marked an OSD
as failed just after I shut the first unit down. So, as you said, the pg couldn't
finish peering because the second unit was then shut off suddenly.

I much appreciate your advice, but my aim is to keep the cluster working while 2
storage nodes are down. The unexpected OSD failure was logged as follows, right
at the time I shut the first unit down:

2017-01-10 12:30:07.905562 mon.1 172.20.1.3:6789/0 28484 : cluster [INF]
osd.153 172.20.3.2:6810/26796 failed (2 reporters from different host after
20.072026 >= grace 20.00)

But that OSD was not actually dead; more likely it was just slow to respond to
heartbeats. I think increasing osd_heartbeat_grace may mitigate the issue.
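
For reference, a minimal sketch of how that could be applied, assuming the grace
is raised cluster-wide (the value 30 is purely illustrative, not a recommendation):

  # in ceph.conf
  [global]
      osd heartbeat grace = 30

  # runtime injection; the grace is honoured by both the OSDs and the mons, so
  # inject on both (wildcard targets may need to be spelled out per daemon on
  # older releases):
  ceph tell osd.* injectargs '--osd-heartbeat-grace 30'
  ceph tell mon.* injectargs '--osd-heartbeat-grace 30'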

Sincerely,
Craig Chi

On 2017-01-11 00:08, Samuel Just wrote:
> { "name": "Started\/Primary\/Peering", "enter_time": "2017-01-10 
> 13:43:34.933074", "past_intervals": [ { "first": 75858, "last": 75860, 
> "maybe_went_rw": 1, "up": [ 345, 622, 685, 183, 792, 2147483647, 2147483647, 
> 401, 516 ], "acting": [ 345, 622, 685, 183, 792, 2147483647, 2147483647, 401, 
> 516 ], "primary": 345, "up_primary":345 }, Between 75858 and 75860, 345, 622, 
> 685, 183, 792, 2147483647, 2147483647, 401, 516 was the acting set. The 
> current acting set 345, 622, 685, 183, 2147483647, 2147483647, 153, 401, 516 
> needs *all 7* of the osds from epochs 75858 through 75860 to ensure that it 
> has any writes completed during that time. You can make transient situations 
> like that less of a problem by setting min_size to 8 (though it'll prevent 
> writes with 2 failures until backfill completes). A possible enhancement for 
> an EC pool would be to gather the infos from those osds anyway and use that 
> rule outwrites (if they actually happened, you'd still be stuck). -Sam On 
> Tue, Jan 10, 20
 17 at 5:

36 AM, Craig Chiwrote:>Hi List,>>I am testing the 
stability of my Ceph cluster with power failure.>>I brutally powered off 2 Ceph 
units with each 90 OSDs on it while the client>I/O was continuing.>>Since then, 
some of the pgs of my cluster stucked in peering>>pgmap v3261136: 17408 pgs, 4 
pools, 176 TB data, 5082 kobjects>236 TB used, 5652 TB / 5889 TB 
avail>8563455/38919024 objects degraded (22.003%)>13526 
active+undersized+degraded>3769 active+clean>104 down+remapped+peering>9 
down+peering>>I queried the peering pg (all on EC pool with 7+2) and got 
blocked>information (full query: http://pastebin.com/pRkaMG2h 
)>>"probing_osds": 
[>"153(6)",>"183(3)",>"345(0)",>"401(7)",>"516(8)",>"622(1)",>"685(2)">],>"blocked":
 "peering is blocked due to down osds",>"down_osds_we_would_probe": 
[>792>],>"peering_blocked_by": [>{>"osd": 792,>"current_lost_at": 0,>"comment": 
"starting or marking this osd lost may let us>proceed">}>]>>>osd.792 is exactly 
on one of the unit
 s I powe

red off. And I think the I/O>associated with this pg is paused too.>>I have 
checked the troubleshooting page on Ceph website 
(>http://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-pg/>), 
it says that start the OSD or mark it lost can make the procedure>continue.>>I 
am sure that my cluster was healthy before the power outage occurred. I 
am>wondering if the power outage really happens in production environment, 
will>it also freeze my client I/O if I don't do anything? Since I just lost 
2>redundancies (I have erasure code with 7+2), I think it should still 
serve>normal functionality.>>Or if I am doing something wrong? Please give me 
some suggestions, thanks.>>Sincerely,>Craig 
Chi>>___>ceph-users mailing 
list>ceph-users@lists.ceph.com>http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com>
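
As an aside on Sam's min_size suggestion above: for a 7+2 erasure-coded pool it
maps to a single pool setting. A hedged sketch, assuming the pool is named
"ecpool" (substitute your own pool name):

  ceph osd pool set ecpool min_size 8

With min_size 8 a PG keeps serving I/O after one failure, but after a second
failure it stops accepting writes until backfill restores an eighth shard,
which is exactly the trade-off Sam describes.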


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Crushmap (tunables) flapping on cluster

2017-01-10 Thread Breunig, Steve (KASRL)
Yes, that was the problem, thx.


From: Stillwell, Bryan J
Sent: Tuesday, 10 January 2017 18:06:10
To: Breunig, Steve (KASRL); ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Crushmap (tunables) flapping on cluster

On 1/10/17, 2:56 AM, "ceph-users on behalf of Breunig, Steve (KASRL)"
 wrote:

>Hi list,
>
>
>I'm running a cluster which is currently in migration from hammer to
>jewel.
>
>
>Currently I have the problem that the tunables are flapping and mapping an
>rbd image is not working.
>
>
>It is flapping between:
>
>
>{
>"choose_local_tries": 0,
>"choose_local_fallback_tries": 0,
>"choose_total_tries": 50,
>"chooseleaf_descend_once": 1,
>"chooseleaf_vary_r": 1,
>"chooseleaf_stable": 0,
>"straw_calc_version": 1,
>"allowed_bucket_algs": 54,
>"profile": "hammer",
>"optimal_tunables": 0,
>"legacy_tunables": 0,
>"minimum_required_version": "hammer",
>"require_feature_tunables": 1,
>"require_feature_tunables2": 1,
>"has_v2_rules": 0,
>"require_feature_tunables3": 1,
>"has_v3_rules": 0,
>"has_v4_buckets": 1,
>"require_feature_tunables5": 0,
>"has_v5_rules": 0
>}
>
>
>and
>
>
>{
>"choose_local_tries": 0,
>"choose_local_fallback_tries": 0,
>"choose_total_tries": 50,
>"chooseleaf_descend_once": 1,
>"chooseleaf_vary_r": 1,
>"straw_calc_version": 1,
>"allowed_bucket_algs": 54,
>"profile": "hammer",
>"optimal_tunables": 0,
>"legacy_tunables": 0,
>"require_feature_tunables": 1,
>"require_feature_tunables2": 1,
>"require_feature_tunables3": 1,
>"has_v2_rules": 0,
>"has_v3_rules": 0,
>"has_v4_buckets": 1
>}
>
>
>Did someone have that problem too?
>How can it be solved?

Have you upgraded all the mon nodes?  My guess is that when you're running
'ceph osd crush show-tunables' it's sometimes being reported from a hammer
mon node and sometimes from a jewel mon node.

You can run 'ceph tell mon.* version' to verify they're all running the
same version.
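
A hedged sketch of how to check that hypothesis (the mon addresses are
assumptions; substitute your own):

  ceph tell mon.* version
  # then ask each monitor individually and compare what it reports:
  ceph -m 10.0.0.1:6789 osd crush show-tunables
  ceph -m 10.0.0.2:6789 osd crush show-tunables
  ceph -m 10.0.0.3:6789 osd crush show-tunables

If hammer and jewel mons render the tunables differently, the outputs should
disagree only on the fields a hammer mon does not know about (such as
chooseleaf_stable and require_feature_tunables5 in the first dump above).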

When you say the map is failing, are you using the kernel rbd driver?  If
so you might need to upgrade your kernel to support the new features in
jewel.
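
If upgrading the kernel is not an option, the usual workaround is to disable
the image features the kernel client does not understand. A hedged sketch,
assuming an image called "myimage" in the default rbd pool:

  rbd info rbd/myimage        # shows which features are enabled
  rbd feature disable rbd/myimage deep-flatten fast-diff object-map exclusive-lock

Older kernel clients generally only cope with layering, so anything beyond that
is a candidate for disabling (or for creating images with a reduced feature set
via rbd_default_features).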

Bryan

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Review of Ceph on ZFS - or how not to deploy Ceph for RBD + OpenStack

2017-01-10 Thread Adrian Saul

I would concur, having spent a lot of time on ZFS on Solaris.

A ZIL will reduce the fragmentation problem a lot (because intent logging no
longer goes into the filesystem itself, which is what fragments the block
allocations) and write response will be a lot better.  I would use different
devices for L2ARC and ZIL: the ZIL needs to be small and fast for writes (and
mirrored; we have used some HGST 16G devices which are designed as ZILs, pricey
but highly recommended), while the L2ARC just needs to be faster for reads than
your data disks, so most SSDs would be fine for this.

A 14-disk RAIDZ2 is also going to be very poor for writes, especially with SATA:
you are effectively only getting one disk's worth of IOPS for writes, as each
write needs to hit all disks.  Without a separate ZIL you are also losing write
IOPS to in-pool intent logging and metadata operations.
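
For reference, a minimal sketch of adding a mirrored SLOG (ZIL device) and an
L2ARC device to an existing pool; the pool name "tank" and the device paths are
assumptions:

  zpool add tank log mirror /dev/disk/by-id/nvme-zil-a /dev/disk/by-id/nvme-zil-b
  zpool add tank cache /dev/disk/by-id/ssd-l2arc

The log vdev is mirrored because losing an unmirrored SLOG together with a crash
can lose the last few seconds of synchronous writes; the cache vdev needs no
redundancy, since L2ARC contents can always be re-read from the data disks.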



> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
> Patrick Donnelly
> Sent: Wednesday, 11 January 2017 5:24 PM
> To: Kevin Olbrich
> Cc: Ceph Users
> Subject: Re: [ceph-users] Review of Ceph on ZFS - or how not to deploy Ceph
> for RBD + OpenStack
>
> Hello Kevin,
>
> On Tue, Jan 10, 2017 at 4:21 PM, Kevin Olbrich  wrote:
> > 5x Ceph node equipped with 32GB RAM, Intel i5, Intel DC P3700 NVMe
> > journal,
>
> Is the "journal" used as a ZIL?
>
> > We experienced a lot of io blocks (X requests blocked > 32 sec) when a
> > lot of data is changed in cloned RBDs (disk imported via OpenStack
> > Glance, cloned during instance creation by Cinder).
> > If the disk was cloned some months ago and large software updates are
> > applied (a lot of small files) combined with a lot of syncs, we often
> > had a node hit suicide timeout.
> > Most likely this is a problem with op thread count, as it is easy to
> > block threads with RAIDZ2 (RAID6) if many small operations are written
> > to disk (again, COW is not optimal here).
> > When recovery took place (0.020% degraded) the cluster performance was
> > very bad - remote service VMs (Windows) were unusable. Recovery itself
> > was using
> > 70 - 200 mb/s which was okay.
>
> I would think having an SSD ZIL here would make a very large difference.
> Probably a ZIL may have a much larger performance impact than an L2ARC
> device. [You may even partition it and have both but I'm not sure if that's
> normally recommended.]
>
> Thanks for your writeup!
>
> --
> Patrick Donnelly
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Review of Ceph on ZFS - or how not to deploy Ceph for RBD + OpenStack

2017-01-10 Thread Patrick Donnelly
Hello Kevin,

On Tue, Jan 10, 2017 at 4:21 PM, Kevin Olbrich  wrote:
> 5x Ceph node equipped with 32GB RAM, Intel i5, Intel DC P3700 NVMe journal,

Is the "journal" used as a ZIL?

> We experienced a lot of io blocks (X requests blocked > 32 sec) when a lot
> of data is changed in cloned RBDs (disk imported via OpenStack Glance,
> cloned during instance creation by Cinder).
> If the disk was cloned some months ago and large software updates are
> applied (a lot of small files) combined with a lot of syncs, we often had a
> node hit suicide timeout.
> Most likely this is a problem with op thread count, as it is easy to block
> threads with RAIDZ2 (RAID6) if many small operations are written to disk
> (again, COW is not optimal here).
> When recovery took place (0.020% degraded) the cluster performance was very
> bad - remote service VMs (Windows) were unusable. Recovery itself was using
> 70 - 200 mb/s which was okay.

I would think having an SSD ZIL here would make a very large
difference. A ZIL would probably have a much larger performance impact
than an L2ARC device. [You may even partition one device and have both, but I'm
not sure if that's normally recommended.]

Thanks for your writeup!

-- 
Patrick Donnelly
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] PGs stuck active+remapped and osds lose data?!

2017-01-10 Thread Shinobu Kinjo
Yeah, Sam is correct. I had not looked at the crushmap, but I should have
noticed what the trouble was by looking at `ceph osd tree`. That's my
bad; sorry for that.

Again please refer to:

http://www.anchor.com.au/blog/2013/02/pulling-apart-cephs-crush-algorithm/

Regards,


On Wed, Jan 11, 2017 at 1:50 AM, Samuel Just  wrote:
> Shinobu isn't correct, you have 9/9 osds up and running.  up does not
> equal acting because crush is having trouble fulfilling the weights in
> your crushmap and the acting set is being padded out with an extra osd
> which happens to have the data to keep you up to the right number of
> replicas.  Please refer back to Brad's post.
> -Sam
>
> On Mon, Jan 9, 2017 at 11:08 PM, Marcus Müller  
> wrote:
>> Ok, i understand but how can I debug why they are not running as they 
>> should? For me I thought everything is fine because ceph -s said they are up 
>> and running.
>>
>> I would think of a problem with the crush map.
>>
>>> Am 10.01.2017 um 08:06 schrieb Shinobu Kinjo :
>>>
>>> e.g.,
>>> OSD7 / 3 / 0 are in the same acting set. They should be up, if they
>>> are properly running.
>>>
>>> # 9.7
>>> 
   "up": [
   7,
   3
   ],
   "acting": [
   7,
   3,
   0
   ],
>>> 
>>>
>>> Here is an example:
>>>
>>>  "up": [
>>>1,
>>>0,
>>>2
>>>  ],
>>>  "acting": [
>>>1,
>>>0,
>>>2
>>>   ],
>>>
>>> Regards,
>>>
>>>
>>> On Tue, Jan 10, 2017 at 3:52 PM, Marcus Müller  
>>> wrote:
>
> That's not perfectly correct.
>
> OSD.0/1/2 seem to be down.


 Sorry but where do you see this? I think this indicates that they are up:  
  osdmap e3114: 9 osds: 9 up, 9 in; 4 remapped pgs?


> Am 10.01.2017 um 07:50 schrieb Shinobu Kinjo :
>
> On Tue, Jan 10, 2017 at 3:44 PM, Marcus Müller  
> wrote:
>> All osds are currently up:
>>
>>health HEALTH_WARN
>>   4 pgs stuck unclean
>>   recovery 4482/58798254 objects degraded (0.008%)
>>   recovery 420522/58798254 objects misplaced (0.715%)
>>   noscrub,nodeep-scrub flag(s) set
>>monmap e9: 5 mons at
>> {ceph1=192.168.10.3:6789/0,ceph2=192.168.10.4:6789/0,ceph3=192.168.10.5:6789/0,ceph4=192.168.60.6:6789/0,ceph5=192.168.60.11:6789/0}
>>   election epoch 478, quorum 0,1,2,3,4
>> ceph1,ceph2,ceph3,ceph4,ceph5
>>osdmap e3114: 9 osds: 9 up, 9 in; 4 remapped pgs
>>   flags noscrub,nodeep-scrub
>> pgmap v9981077: 320 pgs, 3 pools, 4837 GB data, 19140 kobjects
>>   15070 GB used, 40801 GB / 55872 GB avail
>>   4482/58798254 objects degraded (0.008%)
>>   420522/58798254 objects misplaced (0.715%)
>>316 active+clean
>>  4 active+remapped
>> client io 56601 B/s rd, 45619 B/s wr, 0 op/s
>>
>> This did not chance for two days or so.
>>
>>
>> By the way, my ceph osd df now looks like this:
>>
>> ID WEIGHT  REWEIGHT SIZE   USEAVAIL  %USE  VAR
>> 0 1.28899  1.0  3724G  1699G  2024G 45.63 1.69
>> 1 1.57899  1.0  3724G  1708G  2015G 45.87 1.70
>> 2 1.68900  1.0  3724G  1695G  2028G 45.54 1.69
>> 3 6.78499  1.0  7450G  1241G  6208G 16.67 0.62
>> 4 8.3  1.0  7450G  1228G  6221G 16.49 0.61
>> 5 9.51500  1.0  7450G  1239G  6210G 16.64 0.62
>> 6 7.66499  1.0  7450G  1265G  6184G 16.99 0.63
>> 7 9.75499  1.0  7450G  2497G  4952G 33.52 1.24
>> 8 9.32999  1.0  7450G  2495G  4954G 33.49 1.24
>> TOTAL 55872G 15071G 40801G 26.97
>> MIN/MAX VAR: 0.61/1.70  STDDEV: 13.16
>>
>> As you can see, now osd2 also went down to 45% Use and „lost“ data. But I
>> also think this is no problem and ceph just clears everything up after
>> backfilling.
>>
>>
>> Am 10.01.2017 um 07:29 schrieb Shinobu Kinjo :
>>
>> Looking at ``ceph -s`` you originally provided, all OSDs are up.
>>
>> osdmap e3114: 9 osds: 9 up, 9 in; 4 remapped pgs
>>
>>
>> But looking at ``pg query``, OSD.0 / 1 are not up. Are they something
>
> That's not perfectly correct.
>
> OSD.0/1/2 seem to be down.
>
>> like related to ?:
>>
>> Ceph1, ceph2 and ceph3 are vms on one physical host
>>
>>
>> Are those OSDs running on vm instances?
>>
>> # 9.7
>> 
>>
>> "state": "active+remapped",
>> "snap_trimq": "[]",
>> "epoch": 3114,
>> "up": [
>> 7,
>> 3
>> ],
>> "acting": [
>> 7,
>> 3,
>> 0
>> ],
>>
>> 
>>
>> # 7.84
>> 
>>
>> "state": "active+remapped",
>> "snap_trimq": "[]",
>> "epoch": 3114,
>> "up": [
>> 4,
>>

Re: [ceph-users] Analysing ceph performance with SSD journal, 10gbe NIC and 2 replicas -Hammer release

2017-01-10 Thread Willem Jan Withagen
On 10-1-2017 20:35, Lionel Bouton wrote:
> Hi,

I usually don't top-post, but this time it is just to agree wholeheartedly with
what you wrote. And you have once again given more arguments as to why.

Using SSDs that don't work right is a certain recipe for losing data.

--WjW

> Le 10/01/2017 à 19:32, Brian Andrus a écrit :
>> [...]
>>
>>
>> I think the main point I'm trying to address is - as long as the
>> backing OSD isn't egregiously handling large amounts of writes and it
>> has a good journal in front of it (that properly handles O_DSYNC [not
>> D_SYNC as Sebastien's article states]), it is unlikely inconsistencies
>> will occur upon a crash and subsequent restart.
> 
> I don't see how you can guess if it is "unlikely". If you need SSDs you
> are probably handling relatively large amounts of accesses (so large
> amounts of writes aren't unlikely) or you would have used cheap 7200rpm
> or even slower drives.
> 
> Remember that in the default configuration, if you have any 3 OSDs
> failing at the same time, you have chances of losing data. For <30 OSDs
> and size=3 this is highly probable as there are only a few thousands
> combinations of 3 OSDs possible (and you usually have typically a
> thousand or 2 of pgs picking OSDs in a more or less random pattern).
> 
> With SSDs not handling write barriers properly I wouldn't bet on
> recovering the filesystems of all OSDs properly given a cluster-wide
> power loss shutting down all the SSDs at the same time... In fact as the
> hardware will lie about the stored data, the filesystem might not even
> detect the crash properly and might apply its own journal on outdated
> data leading to unexpected results.
> So losing data is a possibility and testing for it is almost impossible
> (you'll have to reproduce all the different access patterns your Ceph
> cluster could experience at the time of a power loss and trigger the
> power losses in each case).
> 
>>
>> Therefore - while not ideal to rely on journals to maintain consistency,
> 
> Ceph journals aren't designed for maintaining the filestore consistency.
> They *might* restrict the access patterns to the filesystems in such a
> way that running fsck on them after a "let's throw away committed data"
> crash might have better chances of restoring enough data but if it's the
> case it's only an happy coincidence (and you will have to run these
> fscks *manually* as the filesystem can't detect inconsistencies by itself).
> 
>> that is what they are there for.
> 
> No. They are here for Ceph internal consistency, not the filesystem
> backing the filestore consistency. Ceph relies both on journals and
> filesystems able to maintain internal consistency and supporting syncfs
> to maintain consistency, if the journal or the filesystem fails the OSD
> is damaged. If 3 OSDs are damaged at the same time on a size=3 pool you
> enter "probable data loss" territory.
> 
>> There is a situation where "consumer-grade" SSDs could be used as
>> OSDs. While not ideal, it can and has been done before, and may be
>> preferable to tossing out $500k of SSDs (Seen it firsthand!)
> 
> For these I'd like to know :
> - which SSD models were used ?
> - how long did the SSDs survive (some consumer SSDs not only lie to the
> system about write completions but they usually don't handle large
> amounts of write nearly as well as DC models) ?
> - how many cluster-wide power losses did the cluster survive ?
> - what were the access patterns on the cluster during the power losses ?
> 
> If for a model not guaranteed for sync writes there hasn't been dozens
> of power losses on clusters under large loads without any problem
> detected in the week following (thing deep-scrub), using them is playing
> Russian roulette with your data.
> 
> AFAIK there have only been reports of data losses and/or heavy
> maintenance later when people tried to use consumer SSDs (admittedly
> mainly for journals). I've yet to spot long-running robust clusters
> built with consumer SSDs.
> 
> Lionel

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Review of Ceph on ZFS - or how not to deploy Ceph for RBD + OpenStack

2017-01-10 Thread Lindsay Mathieson

On 11/01/2017 7:21 AM, Kevin Olbrich wrote:


Read-Cache using normal Samsung PRO SSDs works very well


How did you implement the cache and measure the results?

A ZFS SSD cache will perform very badly with VM hosting and/or
distributed filesystems; the random nature of the I/O and the ARC cache
essentially render it useless. I never saw better than a 6% hit rate with
L2ARC.
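
For anyone who wants to check their own hit rate, a hedged sketch using the
standard ZFS on Linux counters (the pool name "tank" is an assumption):

  # raw L2ARC hit/miss counters exposed by the kernel module
  grep -E 'l2_(hits|misses)' /proc/spl/kstat/zfs/arcstats
  # per-vdev I/O, including cache and log devices, sampled every 5 seconds
  zpool iostat -v tank 5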



Also, if used as journals or SSD tiers, Samsung PRO drives have shocking write
performance.


ZFS is probably not optimal for Ceph, but regardless of the underlying 
file system, with a 5 Node, 2G, Replica 3 setup you are going to see 
pretty bad write performance.


POOMA U, but I believe that linked clones, especially old ones, are
going to be pretty slow.


--
Lindsay Mathieson

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Review of Ceph on ZFS - or how not to deploy Ceph for RBD + OpenStack

2017-01-10 Thread Kevin Olbrich
Dear Ceph-users,

just to make sure nobody makes the same mistake, I would like to share my
experience with Ceph on ZFS in our test lab.
ZFS is a copy-on-write filesystem and is, IMHO, suitable where data
resilience has high priority.
I work for a mid-sized datacenter in Germany and we set up a cluster using
Ceph hammer -> infernalis -> jewel 10.2.3 (upgrades during 24/7 usage).
We initially chose ZFS for its great cache (ARC) and thought it would
be a great idea to use it instead of XFS (or EXT4, back when that was supported).
Before that we were using ZFS for backup-storage JBODs with good results
(performance is great!).

We then assumed that ZFS would be a good choice for distributed / high
availability scenarios.
Since the end of 2015 I have been running OpenStack Liberty / Mitaka on top of
this cluster, and our use case was all sorts of VMs (20/80 split Win / Linux).
We have been running this cluster setup for over a year now.

Details:

   - 80x Disks (56x 500GB SATA via FC, 24x 1TB SATA via SAS) JBOD
   - All nodes (OpenStack and Ceph) on CentOS 7
   - Everything Kernel 3.10.0-x, switched to 4.4.30+ (elrepo) while upgrade
   to jewel
   - ZFSonLinux latest
   - 5x Ceph node equipped with 32GB RAM, Intel i5, Intel DC P3700 NVMe
   journal, Emulex Fiber PCIe for JBOD
   - 2x 1GBit bond per node with balance-alb (balancing using different
   MAC addresses during ARP) on two switches
   - 2x HP 2920 using 20G interconnect, then switched to 2x HP Comware 5130
   using IRF-stack with 20G interconnect
   - Nodes had RAIDZ2 (RAID6) configuration for 14x 500GB disks (= 1 OSD
   node) and the 24 disk JBOD had 4x RAIDZ2 (RAID6) using 6 disks each (= 4
   OSD node, only 2 in production).
   - 90x VMs in total at the time we ended our evaluation
   - 6 OSDs in total
   - pgnum 128 x 4 pools, 512 PGs total, size 2 and min_size 1
   - OSD filled 30 - 40%, low fragmentation

We were not using 10GBit NICs as our VM traffic would not exceed 2x GBit
per node in normal operation as we expected a lot of 4k blocks from Windows
Remote Services (known as "terminal server").

Pros:

   - Survived two outages without a single lost object (just had to do "pg
   repair num" on 4 PGs)
   KVM VMs were frozen and OS started to reset SCSI bus until cluster was
   back online - no broken databases (we were running MySQL, MSSQL and
   Exchange)
   - Read-Cache using normal Samsung PRO SSDs works very well
   - Together with multipathd optimal redundancy and performance
   - Deep-scrub is not needed, as ZFS can scrub itself in RAIDZ1 and RAIDZ2,
   backed by checksums

Cons:

   - Performance goes lower and lower with ongoing usage (we added the 1TB
   disks JBOD to accommodate this issue) but lately hit it again.
   - Disks spin at 100% all the time in the 14x 500G JBODs, 30% at the
   SAS-JBOD - mostly related to COW
   - Even a little bit of fragmentation results in slow downs
   - If Deep-Scrub is enabled, IO stucks very often
   - The noout flag needs to be set to stop a recovery storm (which is bad, as
   recovery of a single 500GB OSD is fine while 6 TB takes a very long time);
   see the sketch after this list
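
A minimal sketch of the noout handling mentioned in the last item, for planned
maintenance on a node:

  ceph osd set noout      # keep OSDs "in" so no backfill storm starts
  # ... reboot / service the node ...
  ceph osd unset noout    # re-enable normal down -> out handling afterwards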


We moved from Hammer in 2015 to Infernalis in early 2016 and to Jewel in
Oct 2016. During upgrade to Jewel, we moved to the elrepo.org kernel-lt
package and upgraded from kernel 3.10.0 to 4.4.30+.
The migration from Infernalis to Jewel was noticeable: most VMs were running a
lot faster, but we also had a large increase in stuck requests. I am not
sure, but I did not notice any on Infernalis.

We experienced a lot of io blocks (X requests blocked > 32 sec) when a lot
of data is changed in cloned RBDs (disk imported via OpenStack Glance,
cloned during instance creation by Cinder).
If the disk was cloned some months ago and large software updates are
applied (a lot of small files) combined with a lot of syncs, we often had a
node hit suicide timeout.
Most likely this is a problem with op thread count, as it is easy to block
threads with RAIDZ2 (RAID6) if many small operations are written to disk
(again, COW is not optimal here).
When recovery took place (0.020% degraded) the cluster performance was very
bad - remote service VMs (Windows) were unusable. Recovery itself was using
70 - 200 mb/s which was okay.

Reads did not cause any problems. We made a lot of backups of the running
VMs during the day and performance in other VMs was slightly lowered -
nothing we really worried about.
All in all, read performance was okay while write performance was awful as
soon as the filestore flush kicked in (= a few seconds into downloading
something via GBit to the VM).
Scrub and deep-scrub needed to be disabled to maintain "normal operation" -
this is the worst point about this setup.

In data resilience terms we were very satisfied. We had one node crashing
regularly with Infernalis (we never found the reason after 3 days) before we
upgraded to Jewel, and no data was corrupted when this happened (especially
MS Exchange did not complain!).
After we upgraded to Jewel, it did not crash again. In all cases, VMs were
fully functional.

Re: [ceph-users] Ceph cache tier removal.

2017-01-10 Thread Nick Fisk
> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
> Daznis
> Sent: 09 January 2017 12:54
> To: ceph-users 
> Subject: [ceph-users] Ceph cache tier removal.
> 
> Hello,
> 
> 
> I'm running a preliminary test of cache tier removal on a live cluster, before
> I try to do that on a production one. I'm trying to avoid downtime, but from
> what I noticed it's either impossible or I'm doing something wrong. My cluster
> is running CentOS 7.2 and ceph 0.94.9.
> 
> Example 1:
> I'm setting the cache layer to forward:
>   1. ceph osd tier cache-mode test-cache forward
> Then flushing the cache:
>   1. rados -p test-cache cache-flush-evict-all
> Then I'm getting stuck with some objects that can't be removed:
>
> rbd_header.29c3cdb2ae8944a
> failed to evict /rbd_header.29c3cdb2ae8944a: (16) Device or resource busy
> rbd_header.28c96316763845e
> failed to evict /rbd_header.28c96316763845e: (16) Device or resource busy
> error from cache-flush-evict-all: (1) Operation not permitted
> 

These are probably the objects which have watchers attached. The current evict
logic seems to be unable to evict these, hence the error. I'm not sure if
anything can be done to work around this other than what you have tried,
i.e. stopping the VM, which will remove the watcher.
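
A hedged sketch of how to confirm that theory, using the object names from the
error output above (the pool name comes from the original post):

  rados -p test-cache listwatchers rbd_header.29c3cdb2ae8944a

If a client address shows up, that RBD image is still open somewhere, and the
object will keep refusing to be evicted until that watcher goes away.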

> I found a workaround for this. You can bypass these errors by running
>   1. ceph osd tier remove-overlay test-pool
>   2. turning off the VM's that are using them.
> 
> For the second option: I can boot the VMs normally after recreating a new
> overlay/cache tier. At this point everything is working fine, but I'm trying
> to avoid downtime, as it takes almost 8h to start and check that everything
> is in optimal condition.
> 
> Now for the first part: I can remove the overlay and flush the cache layer,
> and the VMs are running fine with it removed. Issues start after I have
> re-added the cache layer to the cold pool and try to write/read from the
> disk. For no apparent reason the VMs just freeze, and you need to force
> stop/start all VMs to get them working again.

Which pool are the VMs being pointed at, base or cache? I'm wondering if it's
something to do with the pool id changing.

> 
> From what I have read about it, all objects should leave the cache tier and
> you shouldn't have to "force" remove the tier while it still holds objects.
> 
> Now onto the questions:
>
>    1. Is it normal for VPSs to freeze while adding a cache layer/tier?
>    2. Do VMs need to be offline to remove the caching layer?
>    3. I have read somewhere that snapshots might interfere with cache tier
>       clean up. Is it true?
>    4. Are there some other ways to remove the caching tier on a live system?
> 
> 
> Regards,
> 
> 
> Darius
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Write back cache removal

2017-01-10 Thread Nick Fisk
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Stuart 
Harland
Sent: 10 January 2017 11:58
To: Wido den Hollander 
Cc: ceph new ; n...@fisk.me.uk
Subject: Re: [ceph-users] Write back cache removal

 

Yes Wido, you are correct. There is an RBD pool in the cluster, but it is not
currently running with a cache attached. The pool I'm trying to manage here is
only used by librados to write objects directly to the pool, as opposed to any
of the other niceties that Ceph provides.

 

Specifically I ran:

 

`ceph osd tier cache-mode  forward`

 

which returned `Error EPERM: 'forward' is not a well-supported cache mode and 
may corrupt your data.  pass --yes-i-really-mean-it to force.`

 

Currently we are running 10.2.5. I suspect that it’s fine in our use case, 
however given the sparsity of the documentation I didn’t like to assume 
anything.

 

 

Regards

 

Stuart

 

 

Yep, sorry, I got this post mixed up with the one from Daznis yesterday, who was
using RBDs. I think that warning was introduced after some bugs were found that
corrupted some users' data when switching frequently between writeback and
forward modes. As it is a very rarely used mode, and so wasn't worth the testing
effort, I believe the decision was taken to just implement the warning. If you
are using it as part of removing a cache tier and you have already flushed the
tier, then I believe it should be fine to use.
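
For context, a hedged sketch of the usual removal sequence once you accept the
warning; the pool names "hot-cache" and "cold-pool" are placeholders for the
elided names in the command above:

  ceph osd tier cache-mode hot-cache forward --yes-i-really-mean-it
  rados -p hot-cache cache-flush-evict-all
  ceph osd tier remove-overlay cold-pool
  ceph osd tier remove cold-pool hot-cache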

 

Another way would probably be to set the minimum promotion thresholds higher
than your hit set count; this abuses the tiering logic a little, but it should
also stop anything getting promoted into your cache tier.
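
A minimal sketch of that approach on Jewel, again assuming the cache pool is
called "hot-cache" (any value larger than the pool's hit_set_count will do):

  ceph osd pool set hot-cache min_read_recency_for_promote 999
  ceph osd pool set hot-cache min_write_recency_for_promote 999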

 

 

 

 

On 10 Jan 2017, at 09:52, Wido den Hollander  > wrote:

 


On 10 January 2017 at 9:52, Nick Fisk wrote:





-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Wido 
den Hollander
Sent: 10 January 2017 07:54
To: ceph new  >; 
Stuart Harland  >
Subject: Re: [ceph-users] Write back cache removal





On 9 January 2017 at 13:02, Stuart Harland wrote:


Hi,

We've been operating a ceph storage system storing files using librados (using
a replicated pool on rust disks). We implemented a cache over the top of this
with SSDs; however, we now want to turn this off.

The documentation suggests setting the cache mode to forward before draining
the pool, however the ceph management controller spits out an error about this
saying that it is unsupported and hence dangerous.



 


What version of Ceph are you running?

And can you paste the exact command and the output?

Wido


Hi Wido,

I think this has been discussed before and looks like it might be a current 
limitation. Not sure if it's on anybody's radar to fix.

https://www.mail-archive.com/ceph-users@lists.ceph.com/msg24472.html


Might be, but afaik they are using their own application which writes to RADOS 
using librados, not RBD.

Is that correct Stuart?

Wido




Nick








The thing is, I cannot really locate any documentation as to why it's considered
unsupported and under what conditions it is expected to fail: I have read a
passing comment about EC pools having data corruption, but we are using
replicated pools.

Is this something that is safe to do?

Otherwise I have noted the read proxy mode of cache tiers, which is documented
as a mechanism to transition from writeback to disabled; however, the
documentation is even sparser on this than on forward mode. Would this be a
better approach if there is some unsupported behaviour in the forward mode
cache option?

Any thoughts would be appreciated - we really cannot afford to corrupt the data,
and I really do not want to have to do some manual software-based eviction on
this data.

regards

Stuart


− Stuart Harland:
Infrastructure Engineer
Email: s.harl...@livelinktechnology.net 
  




LiveLink Technology Ltd
McCormack House
56A East Street
Havant
PO9 1BS


Re: [ceph-users] PGs stuck active+remapped and osds lose data?!

2017-01-10 Thread Marcus Müller
Hi Sam,

another idea: I have two HDDs here that I already wanted to add to ceph5, which
would require a new crush map. Could this problem be solved by doing that?


> Am 10.01.2017 um 17:50 schrieb Samuel Just :
> 
> Shinobu isn't correct, you have 9/9 osds up and running.  up does not
> equal acting because crush is having trouble fulfilling the weights in
> your crushmap and the acting set is being padded out with an extra osd
> which happens to have the data to keep you up to the right number of
> replicas.  Please refer back to Brad's post.
> -Sam
> 
> On Mon, Jan 9, 2017 at 11:08 PM, Marcus Müller  
> wrote:
>> Ok, i understand but how can I debug why they are not running as they 
>> should? For me I thought everything is fine because ceph -s said they are up 
>> and running.
>> 
>> I would think of a problem with the crush map.
>> 
>>> Am 10.01.2017 um 08:06 schrieb Shinobu Kinjo :
>>> 
>>> e.g.,
>>> OSD7 / 3 / 0 are in the same acting set. They should be up, if they
>>> are properly running.
>>> 
>>> # 9.7
>>> 
  "up": [
  7,
  3
  ],
  "acting": [
  7,
  3,
  0
  ],
>>> 
>>> 
>>> Here is an example:
>>> 
>>> "up": [
>>>   1,
>>>   0,
>>>   2
>>> ],
>>> "acting": [
>>>   1,
>>>   0,
>>>   2
>>>  ],
>>> 
>>> Regards,
>>> 
>>> 
>>> On Tue, Jan 10, 2017 at 3:52 PM, Marcus Müller  
>>> wrote:
> 
> That's not perfectly correct.
> 
> OSD.0/1/2 seem to be down.
 
 
 Sorry but where do you see this? I think this indicates that they are up:  
  osdmap e3114: 9 osds: 9 up, 9 in; 4 remapped pgs?
 
 
> Am 10.01.2017 um 07:50 schrieb Shinobu Kinjo :
> 
> On Tue, Jan 10, 2017 at 3:44 PM, Marcus Müller  
> wrote:
>> All osds are currently up:
>> 
>>   health HEALTH_WARN
>>  4 pgs stuck unclean
>>  recovery 4482/58798254 objects degraded (0.008%)
>>  recovery 420522/58798254 objects misplaced (0.715%)
>>  noscrub,nodeep-scrub flag(s) set
>>   monmap e9: 5 mons at
>> {ceph1=192.168.10.3:6789/0,ceph2=192.168.10.4:6789/0,ceph3=192.168.10.5:6789/0,ceph4=192.168.60.6:6789/0,ceph5=192.168.60.11:6789/0}
>>  election epoch 478, quorum 0,1,2,3,4
>> ceph1,ceph2,ceph3,ceph4,ceph5
>>   osdmap e3114: 9 osds: 9 up, 9 in; 4 remapped pgs
>>  flags noscrub,nodeep-scrub
>>pgmap v9981077: 320 pgs, 3 pools, 4837 GB data, 19140 kobjects
>>  15070 GB used, 40801 GB / 55872 GB avail
>>  4482/58798254 objects degraded (0.008%)
>>  420522/58798254 objects misplaced (0.715%)
>>   316 active+clean
>> 4 active+remapped
>> client io 56601 B/s rd, 45619 B/s wr, 0 op/s
>> 
>> This did not chance for two days or so.
>> 
>> 
>> By the way, my ceph osd df now looks like this:
>> 
>> ID WEIGHT  REWEIGHT SIZE   USEAVAIL  %USE  VAR
>> 0 1.28899  1.0  3724G  1699G  2024G 45.63 1.69
>> 1 1.57899  1.0  3724G  1708G  2015G 45.87 1.70
>> 2 1.68900  1.0  3724G  1695G  2028G 45.54 1.69
>> 3 6.78499  1.0  7450G  1241G  6208G 16.67 0.62
>> 4 8.3  1.0  7450G  1228G  6221G 16.49 0.61
>> 5 9.51500  1.0  7450G  1239G  6210G 16.64 0.62
>> 6 7.66499  1.0  7450G  1265G  6184G 16.99 0.63
>> 7 9.75499  1.0  7450G  2497G  4952G 33.52 1.24
>> 8 9.32999  1.0  7450G  2495G  4954G 33.49 1.24
>>TOTAL 55872G 15071G 40801G 26.97
>> MIN/MAX VAR: 0.61/1.70  STDDEV: 13.16
>> 
>> As you can see, now osd2 also went down to 45% Use and „lost“ data. But I
>> also think this is no problem and ceph just clears everything up after
>> backfilling.
>> 
>> 
>> Am 10.01.2017 um 07:29 schrieb Shinobu Kinjo :
>> 
>> Looking at ``ceph -s`` you originally provided, all OSDs are up.
>> 
>> osdmap e3114: 9 osds: 9 up, 9 in; 4 remapped pgs
>> 
>> 
>> But looking at ``pg query``, OSD.0 / 1 are not up. Are they something
> 
> That's not perfectly correct.
> 
> OSD.0/1/2 seem to be down.
> 
>> like related to ?:
>> 
>> Ceph1, ceph2 and ceph3 are vms on one physical host
>> 
>> 
>> Are those OSDs running on vm instances?
>> 
>> # 9.7
>> 
>> 
>> "state": "active+remapped",
>> "snap_trimq": "[]",
>> "epoch": 3114,
>> "up": [
>>7,
>>3
>> ],
>> "acting": [
>>7,
>>3,
>>0
>> ],
>> 
>> 
>> 
>> # 7.84
>> 
>> 
>> "state": "active+remapped",
>> "snap_trimq": "[]",
>> "epoch": 3114,
>> "up": [
>>4,
>>8
>> ],
>> "acting": [
>>4,
>>8,
>>1
>> ],
>> 
>> 
>> 

Re: [ceph-users] Analysing ceph performance with SSD journal, 10gbe NIC and 2 replicas -Hammer release

2017-01-10 Thread Lionel Bouton
Hi,

Le 10/01/2017 à 19:32, Brian Andrus a écrit :
> [...]
>
>
> I think the main point I'm trying to address is - as long as the
> backing OSD isn't egregiously handling large amounts of writes and it
> has a good journal in front of it (that properly handles O_DSYNC [not
> D_SYNC as Sebastien's article states]), it is unlikely inconsistencies
> will occur upon a crash and subsequent restart.

I don't see how you can guess if it is "unlikely". If you need SSDs you
are probably handling relatively large amounts of accesses (so large
amounts of writes aren't unlikely) or you would have used cheap 7200rpm
or even slower drives.

Remember that in the default configuration, if you have any 3 OSDs
failing at the same time, you have chances of losing data. For <30 OSDs
and size=3 this is highly probable as there are only a few thousands
combinations of 3 OSDs possible (and you usually have typically a
thousand or 2 of pgs picking OSDs in a more or less random pattern).

With SSDs not handling write barriers properly I wouldn't bet on
recovering the filesystems of all OSDs properly given a cluster-wide
power loss shutting down all the SSDs at the same time... In fact as the
hardware will lie about the stored data, the filesystem might not even
detect the crash properly and might apply its own journal on outdated
data leading to unexpected results.
So losing data is a possibility and testing for it is almost impossible
(you'll have to reproduce all the different access patterns your Ceph
cluster could experience at the time of a power loss and trigger the
power losses in each case).
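
What can at least be measured up front is whether a given SSD sustains honest
O_DSYNC writes at a usable rate, using the kind of fio test from the article
referenced above. A hedged sketch; the device path is an assumption and the
test will overwrite whatever is on that device:

  fio --name=journal-test --filename=/dev/sdX --direct=1 --sync=1 \
      --rw=write --bs=4k --numjobs=1 --iodepth=1 --runtime=60 --time_based

Consumer drives that fake sync writes tend to post suspiciously high numbers
here and then misbehave on power loss, so treat this as a screening test, not
a proof of safety.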

>
> Therefore - while not ideal to rely on journals to maintain consistency,

Ceph journals aren't designed for maintaining the filestore consistency.
They *might* restrict the access patterns to the filesystems in such a
way that running fsck on them after a "let's throw away committed data"
crash might have better chances of restoring enough data, but if that's the
case it's only a happy coincidence (and you will have to run these
fscks *manually*, as the filesystem can't detect inconsistencies by itself).

> that is what they are there for.

No. They are here for Ceph internal consistency, not the filesystem
backing the filestore consistency. Ceph relies both on journals and
filesystems able to maintain internal consistency and supporting syncfs
to maintain consistency, if the journal or the filesystem fails the OSD
is damaged. If 3 OSDs are damaged at the same time on a size=3 pool you
enter "probable data loss" territory.

> There is a situation where "consumer-grade" SSDs could be used as
> OSDs. While not ideal, it can and has been done before, and may be
> preferable to tossing out $500k of SSDs (Seen it firsthand!)

For these I'd like to know :
- which SSD models were used ?
- how long did the SSDs survive (some consumer SSDs not only lie to the
system about write completions but they usually don't handle large
amounts of write nearly as well as DC models) ?
- how many cluster-wide power losses did the cluster survive ?
- what were the access patterns on the cluster during the power losses ?

If, for a model not guaranteed for sync writes, there haven't been dozens
of power losses on clusters under large loads without any problem
detected in the week following (think deep-scrub), using them is playing
Russian roulette with your data.

AFAIK there have only been reports of data losses and/or heavy
maintenance later when people tried to use consumer SSDs (admittedly
mainly for journals). I've yet to spot long-running robust clusters
built with consumer SSDs.

Lionel
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] High CPU usage by ceph-mgr on idle Ceph cluster

2017-01-10 Thread Alfredo Deza
On Tue, Jan 10, 2017 at 12:59 PM, Samuel Just  wrote:
> Mm, maybe the tag didn't get pushed.  Alfredo, is there supposed to be
> a v11.1.1 tag?

Yep. You can see there is one here: https://github.com/ceph/ceph/releases

Specifically: https://github.com/ceph/ceph/releases/tag/v11.1.1 which
points to 
https://github.com/ceph/ceph/commit/87597971b371d7f497d7eabad3545d72d18dd755


> -Sam
>
> On Tue, Jan 10, 2017 at 9:57 AM, Stillwell, Bryan J
>  wrote:
>> That's strange, I installed that version using packages from here:
>>
>> http://download.ceph.com/debian-kraken/pool/main/c/ceph/
>>
>>
>> Bryan
>>
>> On 1/10/17, 10:51 AM, "Samuel Just"  wrote:
>>
>>>Can you push that branch somewhere?  I don't have a v11.1.1 or that sha1.
>>>-Sam
>>>
>>>On Tue, Jan 10, 2017 at 9:41 AM, Stillwell, Bryan J
>>> wrote:
 This is from:

 ceph version 11.1.1 (87597971b371d7f497d7eabad3545d72d18dd755)

 On 1/10/17, 10:23 AM, "Samuel Just"  wrote:

>What ceph sha1 is that?  Does it include
>6c3d015c6854a12cda40673848813d968ff6afae which fixed the messenger
>spin?
>-Sam
>
>On Tue, Jan 10, 2017 at 9:00 AM, Stillwell, Bryan J
> wrote:
>> On 1/10/17, 5:35 AM, "John Spray"  wrote:
>>
>>>On Mon, Jan 9, 2017 at 11:46 PM, Stillwell, Bryan J
>>> wrote:
 Last week I decided to play around with Kraken (11.1.1-1xenial) on a
 single node, two OSD cluster, and after a while I noticed that the new
 ceph-mgr daemon is frequently using a lot of the CPU:

 17519 ceph  20   0  850044 168104208 S 102.7  4.3   1278:27 ceph-mgr

 Restarting it with 'systemctl restart ceph-mgr*' seems to get its CPU
 usage down to < 1%, but after a while it climbs back up to > 100%.  Has
 anyone else seen this?
>>>
>>>Definitely worth investigating, could you set "debug mgr = 20" on the
>>>daemon to see if it's obviously spinning in a particular place?
>>
>> I've injected that option into the ceph-mgr process, and now I'm just
>> waiting for it to go out of control again.
>>
>> However, I've noticed quite a few messages like this in the logs
>>already:
>>
>> 2017-01-10 09:56:07.441678 7f70f4562700  0 -- 172.24.88.207:6800/4104 >> 172.24.88.207:0/4168225878 conn(0x563c7e0bc000 :6800 s=STATE_OPEN pgs=2 cs=1 l=0).fault initiating reconnect
>> 2017-01-10 09:56:07.442044 7f70f4562700  0 -- 172.24.88.207:6800/4104 >> 172.24.88.207:0/4168225878 conn(0x563c7dfea800 :6800 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg accept connect_seq 0 vs existing csq=2 existing_state=STATE_CONNECTING
>> 2017-01-10 09:56:07.442067 7f70f4562700  0 -- 172.24.88.207:6800/4104 >> 172.24.88.207:0/4168225878 conn(0x563c7dfea800 :6800 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg accept peer reset, then tried to connect to us, replacing
>> 2017-01-10 09:56:07.443026 7f70f4562700  0 -- 172.24.88.207:6800/4104 >> 172.24.88.207:0/4168225878 conn(0x563c7e0bc000 :6800 s=STATE_ACCEPTING_WAIT_CONNECT_MSG pgs=2 cs=0 l=0).fault with nothing to send and in the half accept state just closed
>>
>>
>> What's weird about that is that this is a single node cluster with
>> ceph-mgr, ceph-mon, and the ceph-osd processes all running on the same
>> host.  So none of the communication should be leaving the node.
>>
>> Bryan
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] PGs stuck active+remapped and osds lose data?!

2017-01-10 Thread Marcus Müller
Ok, thanks. Then I will change the tunables.

As far as I can see, this would already help me: ceph osd crush tunables bobtail

Even though we run ceph hammer, this would work according to the documentation,
am I right?

And: I'm using librados for our clients (hammer too); could this change create
problems (even on older kernels)?
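
A minimal sketch of the steps involved, for reference (expect data movement
when the profile changes, so plan for a rebalance):

  ceph osd crush show-tunables        # record what the cluster reports now
  ceph osd crush tunables bobtail
  ceph -w                             # watch the rebalance this triggers

Old clients and kernels matter here mainly through the feature bits the new
profile requires, which is why checking the documentation against your oldest
client, as you are doing, is the right instinct.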


> Am 10.01.2017 um 17:50 schrieb Samuel Just :
> 
> Shinobu isn't correct, you have 9/9 osds up and running.  up does not
> equal acting because crush is having trouble fulfilling the weights in
> your crushmap and the acting set is being padded out with an extra osd
> which happens to have the data to keep you up to the right number of
> replicas.  Please refer back to Brad's post.
> -Sam
> 
>> On Mon, Jan 9, 2017 at 11:08 PM, Marcus Müller  
>> wrote:
>> Ok, i understand but how can I debug why they are not running as they 
>> should? For me I thought everything is fine because ceph -s said they are up 
>> and running.
>> 
>> I would think of a problem with the crush map.
>> 
>>> Am 10.01.2017 um 08:06 schrieb Shinobu Kinjo :
>>> 
>>> e.g.,
>>> OSD7 / 3 / 0 are in the same acting set. They should be up, if they
>>> are properly running.
>>> 
>>> # 9.7
>>> 
 "up": [
 7,
 3
 ],
 "acting": [
 7,
 3,
 0
 ],
>>> 
>>> 
>>> Here is an example:
>>> 
>>> "up": [
>>>  1,
>>>  0,
>>>  2
>>> ],
>>> "acting": [
>>>  1,
>>>  0,
>>>  2
>>> ],
>>> 
>>> Regards,
>>> 
>>> 
>>> On Tue, Jan 10, 2017 at 3:52 PM, Marcus Müller  
>>> wrote:
> 
> That's not perfectly correct.
> 
> OSD.0/1/2 seem to be down.
 
 
 Sorry but where do you see this? I think this indicates that they are up:  
  osdmap e3114: 9 osds: 9 up, 9 in; 4 remapped pgs?
 
 
> Am 10.01.2017 um 07:50 schrieb Shinobu Kinjo :
> 
> On Tue, Jan 10, 2017 at 3:44 PM, Marcus Müller  
> wrote:
>> All osds are currently up:
>> 
>>  health HEALTH_WARN
>> 4 pgs stuck unclean
>> recovery 4482/58798254 objects degraded (0.008%)
>> recovery 420522/58798254 objects misplaced (0.715%)
>> noscrub,nodeep-scrub flag(s) set
>>  monmap e9: 5 mons at
>> {ceph1=192.168.10.3:6789/0,ceph2=192.168.10.4:6789/0,ceph3=192.168.10.5:6789/0,ceph4=192.168.60.6:6789/0,ceph5=192.168.60.11:6789/0}
>> election epoch 478, quorum 0,1,2,3,4
>> ceph1,ceph2,ceph3,ceph4,ceph5
>>  osdmap e3114: 9 osds: 9 up, 9 in; 4 remapped pgs
>> flags noscrub,nodeep-scrub
>>   pgmap v9981077: 320 pgs, 3 pools, 4837 GB data, 19140 kobjects
>> 15070 GB used, 40801 GB / 55872 GB avail
>> 4482/58798254 objects degraded (0.008%)
>> 420522/58798254 objects misplaced (0.715%)
>>  316 active+clean
>>4 active+remapped
>> client io 56601 B/s rd, 45619 B/s wr, 0 op/s
>> 
>> This did not chance for two days or so.
>> 
>> 
>> By the way, my ceph osd df now looks like this:
>> 
>> ID WEIGHT  REWEIGHT SIZE   USEAVAIL  %USE  VAR
>> 0 1.28899  1.0  3724G  1699G  2024G 45.63 1.69
>> 1 1.57899  1.0  3724G  1708G  2015G 45.87 1.70
>> 2 1.68900  1.0  3724G  1695G  2028G 45.54 1.69
>> 3 6.78499  1.0  7450G  1241G  6208G 16.67 0.62
>> 4 8.3  1.0  7450G  1228G  6221G 16.49 0.61
>> 5 9.51500  1.0  7450G  1239G  6210G 16.64 0.62
>> 6 7.66499  1.0  7450G  1265G  6184G 16.99 0.63
>> 7 9.75499  1.0  7450G  2497G  4952G 33.52 1.24
>> 8 9.32999  1.0  7450G  2495G  4954G 33.49 1.24
>>   TOTAL 55872G 15071G 40801G 26.97
>> MIN/MAX VAR: 0.61/1.70  STDDEV: 13.16
>> 
>> As you can see, now osd2 also went down to 45% Use and „lost“ data. But I
>> also think this is no problem and ceph just clears everything up after
>> backfilling.
>> 
>> 
>> Am 10.01.2017 um 07:29 schrieb Shinobu Kinjo :
>> 
>> Looking at ``ceph -s`` you originally provided, all OSDs are up.
>> 
>> osdmap e3114: 9 osds: 9 up, 9 in; 4 remapped pgs
>> 
>> 
>> But looking at ``pg query``, OSD.0 / 1 are not up. Are they something
> 
> That's not perfectly correct.
> 
> OSD.0/1/2 seem to be down.
> 
>> like related to ?:
>> 
>> Ceph1, ceph2 and ceph3 are vms on one physical host
>> 
>> 
>> Are those OSDs running on vm instances?
>> 
>> # 9.7
>> 
>> 
>> "state": "active+remapped",
>> "snap_trimq": "[]",
>> "epoch": 3114,
>> "up": [
>>   7,
>>   3
>> ],
>> "acting": [
>>   7,
>>   3,
>>   0
>> ],
>> 
>> 
>> 
>> # 7.84
>> 
>> 
>> "state": "active+remapped",
>> "snap_trimq": "[]",
>> "epoch": 3114,
>> "up": [

Re: [ceph-users] Failing to Activate new OSD ceph-deploy

2017-01-10 Thread David Turner
Removing the setuser_match_path resolved this. It seems like an oversight that
this setting, which allows people to keep running OSDs as root, prevents them
from adding storage.
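
A minimal sketch of the workaround described above; the host and device names
are assumptions, and the option goes back in afterwards if you still need the
OSDs to run as root:

  # on the OSD host, temporarily comment out in /etc/ceph/ceph.conf:
  #   setuser match path = /var/lib/ceph/$type/$cluster-$id
  ceph-deploy osd activate osdhost1:/dev/sdb1:/dev/nvme0n1p1
  # then restore the option once the new OSD is up and in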



David Turner | Cloud Operations Engineer | StorageCraft Technology Corporation
380 Data Drive Suite 300 | Draper | Utah | 84020
Office: 801.871.2760 | Mobile: 385.224.2943







From: Scottix [scot...@gmail.com]
Sent: Tuesday, January 10, 2017 11:52 AM
To: David Turner; ceph-users
Subject: Re: [ceph-users] Failing to Activate new OSD ceph-deploy

I think I got it to work by removing setuser_match_path = 
/var/lib/ceph/$type/$cluster-$id from the host machine.
I think I did do a reboot it was a while ago so don't remember exactly.
Then running ceph-deploy activate

--Scott

On Tue, Jan 10, 2017 at 10:16 AM David Turner 
> wrote:

Did you ever figure out why the /var/lib/ceph/osd/ceph-22 folder was not being
created automatically?  We are having this issue while testing adding storage
to a ceph cluster upgraded to jewel.  Like you, manually creating the directory
and setting its permissions will allow us to activate the osd, and it comes up
and in without issue.



David Turner | Cloud Operations Engineer | 
StorageCraft Technology Corporation
380 Data Drive Suite 300 | Draper | Utah | 84020
Office: 801.871.2760 | Mobile: 
385.224.2943







From: ceph-users 
[ceph-users-boun...@lists.ceph.com] 
on behalf of Scottix [scot...@gmail.com]
Sent: Thursday, July 07, 2016 5:01 PM
To: ceph-users
Subject: Re: [ceph-users] Failing to Activate new OSD ceph-deploy

I played with it enough to make it work.

Basically i created the directory it was going to put the data in
mkdir /var/lib/ceph/osd/ceph-22

Then I ran ceph-deploy activate which then did a little bit more into putting 
it in the cluster but it still didn't start because of permissions with the 
journal.

Some of the permissions were set to ceph:ceph I tried the new permissions but 
it failed to start, and after reading a mailing list a reboot may have fixed 
that.
Anyway I ran chown -R root:root ceph-22 and after that is started.

I still need to fix permissions but I am happy I got it in atleast.

--Scott



On Thu, Jul 7, 2016 at 2:54 PM Scottix 
> wrote:
Hey,
This is the first time I have had a problem with ceph-deploy

I have attached the log but I can't seem to activate the osd.

I am running
ceph version 10.2.0 (3a9fba20ec743699b69bd0181dd6c54dc01c64b9)

I did upgrade from Infernalis->Jewel
I haven't changed ceph ownership but I do have the config option
setuser_match_path = /var/lib/ceph/$type/$cluster-$id

Any help would be appreciated,
Scott
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Failing to Activate new OSD ceph-deploy

2017-01-10 Thread Scottix
My guess is that ceph-deploy doesn't know how to handle that setting. I just
remove it on the host machine to add the disk, then put it back so the other
OSDs will boot as root.

--Scottie

On Tue, Jan 10, 2017 at 11:02 AM David Turner 
wrote:

> Removing the setuser_match_path resolved this.  This seems like an
> oversight in this setting that allows people to run osds as root that
> prevents them from adding storage.
>
> --
>
>  David Turner | Cloud Operations Engineer | 
> StorageCraft
> Technology Corporation 
> 380 Data Drive Suite 300 | Draper | Utah | 84020
> Office: 801.871.2760 <(801)%20871-2760> | Mobile: 385.224.2943
> <(385)%20224-2943>
>
>
> --
> *From:* Scottix [scot...@gmail.com]
> *Sent:* Tuesday, January 10, 2017 11:52 AM
> *To:* David Turner; ceph-users
>
> *Subject:* Re: [ceph-users] Failing to Activate new OSD ceph-deploy
> I think I got it to work by removing setuser_match_path =
> /var/lib/ceph/$type/$cluster-$id from the host machine.
> I think I did do a reboot it was a while ago so don't remember exactly.
> Then running ceph-deploy activate
>
> --Scott
>
> On Tue, Jan 10, 2017 at 10:16 AM David Turner <
> david.tur...@storagecraft.com> wrote:
>
> Did you ever fitgure out why the /var/lib/ceph/osd/ceph-22 folder was not
> being created automatically?  We are having this issue while testing adding
> storage to an upgraded to jewel ceph cluster.  Like you manually creating
> the directory and setting the permissions for the directory will allow us
> to activate the osd and it comes up and in without issue.
>
> --
>
>  David Turner | Cloud Operations Engineer | 
> StorageCraft
> Technology Corporation 
> 380 Data Drive Suite 300 | Draper | Utah | 84020
> Office: 801.871.2760 <(801)%20871-2760> | Mobile: 385.224.2943
> <(385)%20224-2943>
>
>
> --
> *From:* ceph-users [ceph-users-boun...@lists.ceph.com] on behalf of
> Scottix [scot...@gmail.com]
> *Sent:* Thursday, July 07, 2016 5:01 PM
> *To:* ceph-users
> *Subject:* Re: [ceph-users] Failing to Activate new OSD ceph-deploy
>
> I played with it enough to make it work.
>
> Basically I created the directory it was going to put the data in:
> mkdir /var/lib/ceph/osd/ceph-22
>
> Then I ran ceph-deploy activate, which did a little bit more toward
> putting it in the cluster, but it still didn't start because of permissions
> on the journal.
>
> Some of the permissions were set to ceph:ceph; I tried the new permissions
> but it failed to start, and after reading a mailing list post a reboot may have
> fixed that.
> Anyway I ran chown -R root:root ceph-22 and after that it started.
>
> I still need to fix permissions, but I am happy I got it in at least.
>
> --Scott
>
>
>
> On Thu, Jul 7, 2016 at 2:54 PM Scottix  wrote:
>
> Hey,
> This is the first time I have had a problem with ceph-deploy
>
> I have attached the log but I can't seem to activate the osd.
>
> I am running
> ceph version 10.2.0 (3a9fba20ec743699b69bd0181dd6c54dc01c64b9)
>
> I did upgrade from Infernalis->Jewel
> I haven't changed ceph ownership but I do have the config option
> setuser_match_path = /var/lib/ceph/$type/$cluster-$id
>
> Any help would be appreciated,
> Scott
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Failing to Activate new OSD ceph-deploy

2017-01-10 Thread Scottix
I think I got it to work by removing setuser_match_path =
/var/lib/ceph/$type/$cluster-$id from the host machine.
I think I did do a reboot; it was a while ago, so I don't remember exactly.
Then running ceph-deploy activate

--Scott

On Tue, Jan 10, 2017 at 10:16 AM David Turner 
wrote:

Did you ever figure out why the /var/lib/ceph/osd/ceph-22 folder was not
being created automatically?  We are having this issue while testing adding
storage to a ceph cluster upgraded to jewel.  Like you, manually creating
the directory and setting its permissions allows us to activate the osd,
and it comes up and in without issue.

--

 David Turner | Cloud Operations Engineer |
StorageCraft
Technology Corporation 
380 Data Drive Suite 300 | Draper | Utah | 84020
Office: 801.871.2760 | Mobile: 385.224.2943

--

If you are not the intended recipient of this message or received it
erroneously, please notify the sender and delete it, together with any
attachments, and be advised that any dissemination or copying of this
message is prohibited.

--

--
*From:* ceph-users [ceph-users-boun...@lists.ceph.com] on behalf of Scottix
[scot...@gmail.com]
*Sent:* Thursday, July 07, 2016 5:01 PM
*To:* ceph-users
*Subject:* Re: [ceph-users] Failing to Activate new OSD ceph-deploy

I played with it enough to make it work.

Basically I created the directory it was going to put the data in:
mkdir /var/lib/ceph/osd/ceph-22

Then I ran ceph-deploy activate, which did a little bit more toward
putting it in the cluster, but it still didn't start because of permissions
on the journal.

Some of the permissions were set to ceph:ceph; I tried the new permissions
but it failed to start, and after reading a mailing list post a reboot may have
fixed that.
Anyway I ran chown -R root:root ceph-22 and after that it started.

I still need to fix permissions, but I am happy I got it in at least.

--Scott



On Thu, Jul 7, 2016 at 2:54 PM Scottix  wrote:

Hey,
This is the first time I have had a problem with ceph-deploy

I have attached the log but I can't seem to activate the osd.

I am running
ceph version 10.2.0 (3a9fba20ec743699b69bd0181dd6c54dc01c64b9)

I did upgrade from Infernalis->Jewel
I haven't changed ceph ownership but I do have the config option
setuser_match_path = /var/lib/ceph/$type/$cluster-$id

Any help would be appreciated,
Scott
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Analysing ceph performance with SSD journal, 10gbe NIC and 2 replicas -Hammer release

2017-01-10 Thread Brian Andrus
On Mon, Jan 9, 2017 at 3:33 PM, Willem Jan Withagen  wrote:

> On 9-1-2017 23:58, Brian Andrus wrote:
> > Sorry for spam... I meant D_SYNC.
>
> That term does not turn up anything in Google...
> So I would expect it has to be O_DSYNC.
> (https://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-
> test-if-your-ssd-is-suitable-as-a-journal-device/)
>
> Now you tell me there are SSDs that take correct action with
> O_SYNC but not with O_DSYNC... That makes no sense to me. It is a
> typical trade-off in the OS: more speed versus a slightly less
> consistent FS.
>
> Either a device actually writes its data persistently (either to silicon
> cells, or keeps it in RAM backed by a supercapacitor), or it does not.
> Anything else I cannot think of. Maybe my EE background is sort of in
> the way here. And I know that it is rather hard to write correct SSD
> firmware; I have seen lots of firmware upgrades to fix serious
> corner cases.
>
> The second thing is how hard a drive lies when being told that
> the requested write is synchronised, and OK is only returned when data is
> in stable storage and cannot be lost.
>
> If there is a possibility that a sync write to a drive is not
> persistent, then that is a serious breach of the sync write contract.
> There will always be situations where such drives will lose data.
> And if the data is no longer in the journal, because the writing process
> thinks the data is on stable storage and has deleted it from the
> journal, then that data is permanently lost.
>
> Now you have a second chance (even a third) with Ceph, because data is
> stored multiple times. And you can go to another OSD and try to get it
> back.
>
> --WjW
>

I'm not disagreeing per se.


I think the main point I'm trying to address is - as long as the backing
OSD isn't egregiously handling large amounts of writes and it has a good
journal in front of it (that properly handles O_DSYNC [not D_SYNC as
Sebastien's article states]), it is unlikely inconsistencies will occur
upon a crash and subsequent restart.

Therefore - while it is not ideal to rely on journals to maintain consistency,
that is what they are there for. There are situations where
"consumer-grade" SSDs can be used as OSDs. While not ideal, it can be and
has been done before, and may be preferable to tossing out $500k of SSDs
(seen it firsthand!).
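For anyone who wants to check their own drives, a rough sketch of the kind of
fio run Sebastien's article describes (the flags here are illustrative,
/dev/sdX is a placeholder, and the test will overwrite data on that device):

  # single-job 4k synchronous writes; a journal-worthy SSD sustains high IOPS
  # here, while a drive that only performs with its cache enabled collapses
  fio --filename=/dev/sdX --direct=1 --sync=1 --rw=write --bs=4k \
      --numjobs=1 --iodepth=1 --runtime=60 --time_based \
      --group_reporting --name=journal-test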



>
> >
> > On Mon, Jan 9, 2017 at 2:56 PM, Brian Andrus  > > wrote:
> >
> > Hi Willem, the SSDs are probably fine for backing OSDs, it's the
> > O_DSYNC writes they tend to lie about.
> >
> > They may have a failure rate higher than enterprise-grade SSDs, but
> > are otherwise suitable for use as OSDs if journals are placed
> elsewhere.
> >
> > On Mon, Jan 9, 2017 at 2:39 PM, Willem Jan Withagen  > > wrote:
> >
> > On 9-1-2017 18:46, Oliver Humpage wrote:
> > >
> > >> Why would you still be using journals when running fully OSDs
> on
> > >> SSDs?
> > >
> > > In our case, we use cheaper large SSDs for the data (Samsung
> 850 Pro
> > > 2TB), whose performance is excellent in the cluster, but as
> has been
> > > pointed out in this thread can lose data if power is suddenly
> > > removed.
> > >
> > > We therefore put journals onto SM863 SSDs (1 journal SSD per 3
> OSD
> > > SSDs), which are enterprise quality and have power outage
> protection.
> > > This seems to balance speed, capacity, reliability and budget
> fairly
> > > well.
> >
> > This would make me feel very uncomfortable.
> >
> > So you have a reliable journal, so up to there things do work:
> >   Once in the journal your data is safe.
> >
> > But then you async transfer the data to disk. And that is an SSD
> > that
> > lies to you? It will tell you that the data is written. But if
> > you pull
> > the power, then it turns out that the data is not really stored.
> >
> > And then the only way to get the data consistent again, is to
> > (deep)scrub.
> >
> > Not a very appealing outlook??
> >
> > --WjW
> >
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com 
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > 
> >
> >
> >
> >
> > --
> > Brian Andrus
> > Cloud Systems Engineer
> > DreamHost, LLC
> >
> >
> >
> >
> > --
> > Brian Andrus
> > Cloud Systems Engineer
> > DreamHost, LLC
>
>


-- 
Brian Andrus
Cloud Systems Engineer
DreamHost, LLC
___
ceph-users mailing list
ceph-users@lists.ceph.com

Re: [ceph-users] Failing to Activate new OSD ceph-deploy

2017-01-10 Thread David Turner
Did you ever figure out why the /var/lib/ceph/osd/ceph-22 folder was not being
created automatically?  We are having this issue while testing adding storage
to a ceph cluster upgraded to jewel.  Like you, manually creating the directory
and setting its permissions allows us to activate the osd, and it comes up and
in without issue.



David Turner | Cloud Operations Engineer | StorageCraft Technology
Corporation
380 Data Drive Suite 300 | Draper | Utah | 84020
Office: 801.871.2760 | Mobile: 385.224.2943



If you are not the intended recipient of this message or received it 
erroneously, please notify the sender and delete it, together with any 
attachments, and be advised that any dissemination or copying of this message 
is prohibited.




From: ceph-users [ceph-users-boun...@lists.ceph.com] on behalf of Scottix 
[scot...@gmail.com]
Sent: Thursday, July 07, 2016 5:01 PM
To: ceph-users
Subject: Re: [ceph-users] Failing to Activate new OSD ceph-deploy

I played with it enough to make it work.

Basically I created the directory it was going to put the data in:
mkdir /var/lib/ceph/osd/ceph-22

Then I ran ceph-deploy activate, which did a little bit more toward putting
it in the cluster, but it still didn't start because of permissions on the
journal.

Some of the permissions were set to ceph:ceph; I tried the new permissions but
it failed to start, and after reading a mailing list post a reboot may have fixed
that.
Anyway I ran chown -R root:root ceph-22 and after that it started.

I still need to fix permissions, but I am happy I got it in at least.

--Scott



On Thu, Jul 7, 2016 at 2:54 PM Scottix 
> wrote:
Hey,
This is the first time I have had a problem with ceph-deploy

I have attached the log but I can't seem to activate the osd.

I am running
ceph version 10.2.0 (3a9fba20ec743699b69bd0181dd6c54dc01c64b9)

I did upgrade from Infernalis->Jewel
I haven't changed ceph ownership but I do have the config option
setuser_match_path = /var/lib/ceph/$type/$cluster-$id

Any help would be appreciated,
Scott
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] High CPU usage by ceph-mgr on idle Ceph cluster

2017-01-10 Thread Samuel Just
Mm, maybe the tag didn't get pushed.  Alfredo, is there supposed to be
a v11.1.1 tag?
-Sam

On Tue, Jan 10, 2017 at 9:57 AM, Stillwell, Bryan J
 wrote:
> That's strange, I installed that version using packages from here:
>
> http://download.ceph.com/debian-kraken/pool/main/c/ceph/
>
>
> Bryan
>
> On 1/10/17, 10:51 AM, "Samuel Just"  wrote:
>
>>Can you push that branch somewhere?  I don't have a v11.1.1 or that sha1.
>>-Sam
>>
>>On Tue, Jan 10, 2017 at 9:41 AM, Stillwell, Bryan J
>> wrote:
>>> This is from:
>>>
>>> ceph version 11.1.1 (87597971b371d7f497d7eabad3545d72d18dd755)
>>>
>>> On 1/10/17, 10:23 AM, "Samuel Just"  wrote:
>>>
What ceph sha1 is that?  Does it include
6c3d015c6854a12cda40673848813d968ff6afae which fixed the messenger
spin?
-Sam

On Tue, Jan 10, 2017 at 9:00 AM, Stillwell, Bryan J
 wrote:
> On 1/10/17, 5:35 AM, "John Spray"  wrote:
>
>>On Mon, Jan 9, 2017 at 11:46 PM, Stillwell, Bryan J
>> wrote:
>>> Last week I decided to play around with Kraken (11.1.1-1xenial) on a
>>> single node, two OSD cluster, and after a while I noticed that the
>>>new
>>> ceph-mgr daemon is frequently using a lot of the CPU:
>>>
>>> 17519 ceph  20   0  850044 168104208 S 102.7  4.3   1278:27
>>> ceph-mgr
>>>
>>> Restarting it with 'systemctl restart ceph-mgr*' seems to get its
>>>CPU
>>> usage down to < 1%, but after a while it climbs back up to > 100%.
>>>Has
>>> anyone else seen this?
>>
>>Definitely worth investigating, could you set "debug mgr = 20" on the
>>daemon to see if it's obviously spinning in a particular place?
>
> I've injected that option to the ceph-mgr process, and now I'm just
> waiting for it to go out of control again.
>
> However, I've noticed quite a few messages like this in the logs
>already:
>
> 2017-01-10 09:56:07.441678 7f70f4562700  0 -- 172.24.88.207:6800/4104
>>>
> 172.24.88.207:0/4168225878 conn(0x563c7e0bc000 :6800 s=STATE_OPEN
>pgs=2
> cs=1 l=0).fault initiating reconnect
> 2017-01-10 09:56:07.442044 7f70f4562700  0 -- 172.24.88.207:6800/4104
>>>
> 172.24.88.207:0/4168225878 conn(0x563c7dfea800 :6800
> s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0
>l=0).handle_connect_msg
> accept connect_seq 0 vs existing csq=2 existing_state=STATE_CONNECTING
> 2017-01-10 09:56:07.442067 7f70f4562700  0 -- 172.24.88.207:6800/4104
>>>
> 172.24.88.207:0/4168225878 conn(0x563c7dfea800 :6800
> s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0
>l=0).handle_connect_msg
> accept peer reset, then tried to connect to us, replacing
> 2017-01-10 09:56:07.443026 7f70f4562700  0 -- 172.24.88.207:6800/4104
>>>
> 172.24.88.207:0/4168225878 conn(0x563c7e0bc000 :6800
> s=STATE_ACCEPTING_WAIT_CONNECT_MSG pgs=2 cs=0 l=0).fault with nothing
>to
> send and in the half  accept state just closed
>
>
> What's weird about that is that this is a single node cluster with
> ceph-mgr, ceph-mon, and the ceph-osd processes all running on the same
> host.  So none of the communication should be leaving the node.
>
> Bryan
>
> E-MAIL CONFIDENTIALITY NOTICE:
> The contents of this e-mail message and any attachments are intended
>solely for the addressee(s) and may contain confidential and/or legally
>privileged information. If you are not the intended recipient of this
>message or if this message has been addressed to you in error, please
>immediately alert the sender by reply e-mail and then delete this
>message and any attachments. If you are not the intended recipient, you
>are notified that any use, dissemination, distribution, copying, or
>storage of this message or any attachment is strictly prohibited.
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>>> E-MAIL CONFIDENTIALITY NOTICE:
>>> The contents of this e-mail message and any attachments are intended
>>>solely for the addressee(s) and may contain confidential and/or legally
>>>privileged information. If you are not the intended recipient of this
>>>message or if this message has been addressed to you in error, please
>>>immediately alert the sender by reply e-mail and then delete this
>>>message and any attachments. If you are not the intended recipient, you
>>>are notified that any use, dissemination, distribution, copying, or
>>>storage of this message or any attachment is strictly prohibited.
>>>
>
> E-MAIL CONFIDENTIALITY NOTICE:
> The contents of this e-mail message and any attachments are intended solely 
> for the addressee(s) and may contain 

Re: [ceph-users] High CPU usage by ceph-mgr on idle Ceph cluster

2017-01-10 Thread Stillwell, Bryan J
That's strange, I installed that version using packages from here:

http://download.ceph.com/debian-kraken/pool/main/c/ceph/


Bryan

On 1/10/17, 10:51 AM, "Samuel Just"  wrote:

>Can you push that branch somewhere?  I don't have a v11.1.1 or that sha1.
>-Sam
>
>On Tue, Jan 10, 2017 at 9:41 AM, Stillwell, Bryan J
> wrote:
>> This is from:
>>
>> ceph version 11.1.1 (87597971b371d7f497d7eabad3545d72d18dd755)
>>
>> On 1/10/17, 10:23 AM, "Samuel Just"  wrote:
>>
>>>What ceph sha1 is that?  Does it include
>>>6c3d015c6854a12cda40673848813d968ff6afae which fixed the messenger
>>>spin?
>>>-Sam
>>>
>>>On Tue, Jan 10, 2017 at 9:00 AM, Stillwell, Bryan J
>>> wrote:
 On 1/10/17, 5:35 AM, "John Spray"  wrote:

>On Mon, Jan 9, 2017 at 11:46 PM, Stillwell, Bryan J
> wrote:
>> Last week I decided to play around with Kraken (11.1.1-1xenial) on a
>> single node, two OSD cluster, and after a while I noticed that the
>>new
>> ceph-mgr daemon is frequently using a lot of the CPU:
>>
>> 17519 ceph  20   0  850044 168104208 S 102.7  4.3   1278:27
>> ceph-mgr
>>
>> Restarting it with 'systemctl restart ceph-mgr*' seems to get its
>>CPU
>> usage down to < 1%, but after a while it climbs back up to > 100%.
>>Has
>> anyone else seen this?
>
>Definitely worth investigating, could you set "debug mgr = 20" on the
>daemon to see if it's obviously spinning in a particular place?

 I've injected that option to the ceph-mgr process, and now I'm just
 waiting for it to go out of control again.

 However, I've noticed quite a few messages like this in the logs
already:

 2017-01-10 09:56:07.441678 7f70f4562700  0 -- 172.24.88.207:6800/4104
>>
 172.24.88.207:0/4168225878 conn(0x563c7e0bc000 :6800 s=STATE_OPEN
pgs=2
 cs=1 l=0).fault initiating reconnect
 2017-01-10 09:56:07.442044 7f70f4562700  0 -- 172.24.88.207:6800/4104
>>
 172.24.88.207:0/4168225878 conn(0x563c7dfea800 :6800
 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0
l=0).handle_connect_msg
 accept connect_seq 0 vs existing csq=2 existing_state=STATE_CONNECTING
 2017-01-10 09:56:07.442067 7f70f4562700  0 -- 172.24.88.207:6800/4104
>>
 172.24.88.207:0/4168225878 conn(0x563c7dfea800 :6800
 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0
l=0).handle_connect_msg
 accept peer reset, then tried to connect to us, replacing
 2017-01-10 09:56:07.443026 7f70f4562700  0 -- 172.24.88.207:6800/4104
>>
 172.24.88.207:0/4168225878 conn(0x563c7e0bc000 :6800
 s=STATE_ACCEPTING_WAIT_CONNECT_MSG pgs=2 cs=0 l=0).fault with nothing
to
 send and in the half  accept state just closed


 What's weird about that is that this is a single node cluster with
 ceph-mgr, ceph-mon, and the ceph-osd processes all running on the same
 host.  So none of the communication should be leaving the node.

 Bryan

 E-MAIL CONFIDENTIALITY NOTICE:
 The contents of this e-mail message and any attachments are intended
solely for the addressee(s) and may contain confidential and/or legally
privileged information. If you are not the intended recipient of this
message or if this message has been addressed to you in error, please
immediately alert the sender by reply e-mail and then delete this
message and any attachments. If you are not the intended recipient, you
are notified that any use, dissemination, distribution, copying, or
storage of this message or any attachment is strictly prohibited.

 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>> E-MAIL CONFIDENTIALITY NOTICE:
>> The contents of this e-mail message and any attachments are intended
>>solely for the addressee(s) and may contain confidential and/or legally
>>privileged information. If you are not the intended recipient of this
>>message or if this message has been addressed to you in error, please
>>immediately alert the sender by reply e-mail and then delete this
>>message and any attachments. If you are not the intended recipient, you
>>are notified that any use, dissemination, distribution, copying, or
>>storage of this message or any attachment is strictly prohibited.
>>

E-MAIL CONFIDENTIALITY NOTICE: 
The contents of this e-mail message and any attachments are intended solely for 
the addressee(s) and may contain confidential and/or legally privileged 
information. If you are not the intended recipient of this message or if this 
message has been addressed to you in error, please immediately alert the sender 
by reply e-mail and then delete this message and any attachments. If you are 
not the intended recipient, 

Re: [ceph-users] High CPU usage by ceph-mgr on idle Ceph cluster

2017-01-10 Thread Samuel Just
Can you push that branch somewhere?  I don't have a v11.1.1 or that sha1.
-Sam

On Tue, Jan 10, 2017 at 9:41 AM, Stillwell, Bryan J
 wrote:
> This is from:
>
> ceph version 11.1.1 (87597971b371d7f497d7eabad3545d72d18dd755)
>
> On 1/10/17, 10:23 AM, "Samuel Just"  wrote:
>
>>What ceph sha1 is that?  Does it include
>>6c3d015c6854a12cda40673848813d968ff6afae which fixed the messenger
>>spin?
>>-Sam
>>
>>On Tue, Jan 10, 2017 at 9:00 AM, Stillwell, Bryan J
>> wrote:
>>> On 1/10/17, 5:35 AM, "John Spray"  wrote:
>>>
On Mon, Jan 9, 2017 at 11:46 PM, Stillwell, Bryan J
 wrote:
> Last week I decided to play around with Kraken (11.1.1-1xenial) on a
> single node, two OSD cluster, and after a while I noticed that the new
> ceph-mgr daemon is frequently using a lot of the CPU:
>
> 17519 ceph  20   0  850044 168104208 S 102.7  4.3   1278:27
> ceph-mgr
>
> Restarting it with 'systemctl restart ceph-mgr*' seems to get its CPU
> usage down to < 1%, but after a while it climbs back up to > 100%.
>Has
> anyone else seen this?

Definitely worth investigating, could you set "debug mgr = 20" on the
daemon to see if it's obviously spinning in a particular place?
>>>
>>> I've injected that option to the ceph-mgr process, and now I'm just
>>> waiting for it to go out of control again.
>>>
>>> However, I've noticed quite a few messages like this in the logs
>>>already:
>>>
>>> 2017-01-10 09:56:07.441678 7f70f4562700  0 -- 172.24.88.207:6800/4104 >>
>>> 172.24.88.207:0/4168225878 conn(0x563c7e0bc000 :6800 s=STATE_OPEN pgs=2
>>> cs=1 l=0).fault initiating reconnect
>>> 2017-01-10 09:56:07.442044 7f70f4562700  0 -- 172.24.88.207:6800/4104 >>
>>> 172.24.88.207:0/4168225878 conn(0x563c7dfea800 :6800
>>> s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0
>>>l=0).handle_connect_msg
>>> accept connect_seq 0 vs existing csq=2 existing_state=STATE_CONNECTING
>>> 2017-01-10 09:56:07.442067 7f70f4562700  0 -- 172.24.88.207:6800/4104 >>
>>> 172.24.88.207:0/4168225878 conn(0x563c7dfea800 :6800
>>> s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0
>>>l=0).handle_connect_msg
>>> accept peer reset, then tried to connect to us, replacing
>>> 2017-01-10 09:56:07.443026 7f70f4562700  0 -- 172.24.88.207:6800/4104 >>
>>> 172.24.88.207:0/4168225878 conn(0x563c7e0bc000 :6800
>>> s=STATE_ACCEPTING_WAIT_CONNECT_MSG pgs=2 cs=0 l=0).fault with nothing to
>>> send and in the half  accept state just closed
>>>
>>>
>>> What's weird about that is that this is a single node cluster with
>>> ceph-mgr, ceph-mon, and the ceph-osd processes all running on the same
>>> host.  So none of the communication should be leaving the node.
>>>
>>> Bryan
>>>
>>> E-MAIL CONFIDENTIALITY NOTICE:
>>> The contents of this e-mail message and any attachments are intended
>>>solely for the addressee(s) and may contain confidential and/or legally
>>>privileged information. If you are not the intended recipient of this
>>>message or if this message has been addressed to you in error, please
>>>immediately alert the sender by reply e-mail and then delete this
>>>message and any attachments. If you are not the intended recipient, you
>>>are notified that any use, dissemination, distribution, copying, or
>>>storage of this message or any attachment is strictly prohibited.
>>>
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
> E-MAIL CONFIDENTIALITY NOTICE:
> The contents of this e-mail message and any attachments are intended solely 
> for the addressee(s) and may contain confidential and/or legally privileged 
> information. If you are not the intended recipient of this message or if this 
> message has been addressed to you in error, please immediately alert the 
> sender by reply e-mail and then delete this message and any attachments. If 
> you are not the intended recipient, you are notified that any use, 
> dissemination, distribution, copying, or storage of this message or any 
> attachment is strictly prohibited.
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] High CPU usage by ceph-mgr on idle Ceph cluster

2017-01-10 Thread Stillwell, Bryan J
This is from:

ceph version 11.1.1 (87597971b371d7f497d7eabad3545d72d18dd755)

On 1/10/17, 10:23 AM, "Samuel Just"  wrote:

>What ceph sha1 is that?  Does it include
>6c3d015c6854a12cda40673848813d968ff6afae which fixed the messenger
>spin?
>-Sam
>
>On Tue, Jan 10, 2017 at 9:00 AM, Stillwell, Bryan J
> wrote:
>> On 1/10/17, 5:35 AM, "John Spray"  wrote:
>>
>>>On Mon, Jan 9, 2017 at 11:46 PM, Stillwell, Bryan J
>>> wrote:
 Last week I decided to play around with Kraken (11.1.1-1xenial) on a
 single node, two OSD cluster, and after a while I noticed that the new
 ceph-mgr daemon is frequently using a lot of the CPU:

 17519 ceph  20   0  850044 168104208 S 102.7  4.3   1278:27
 ceph-mgr

 Restarting it with 'systemctl restart ceph-mgr*' seems to get its CPU
 usage down to < 1%, but after a while it climbs back up to > 100%.
Has
 anyone else seen this?
>>>
>>>Definitely worth investigating, could you set "debug mgr = 20" on the
>>>daemon to see if it's obviously spinning in a particular place?
>>
>> I've injected that option to the ceph-mgr process, and now I'm just
>> waiting for it to go out of control again.
>>
>> However, I've noticed quite a few messages like this in the logs
>>already:
>>
>> 2017-01-10 09:56:07.441678 7f70f4562700  0 -- 172.24.88.207:6800/4104 >>
>> 172.24.88.207:0/4168225878 conn(0x563c7e0bc000 :6800 s=STATE_OPEN pgs=2
>> cs=1 l=0).fault initiating reconnect
>> 2017-01-10 09:56:07.442044 7f70f4562700  0 -- 172.24.88.207:6800/4104 >>
>> 172.24.88.207:0/4168225878 conn(0x563c7dfea800 :6800
>> s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0
>>l=0).handle_connect_msg
>> accept connect_seq 0 vs existing csq=2 existing_state=STATE_CONNECTING
>> 2017-01-10 09:56:07.442067 7f70f4562700  0 -- 172.24.88.207:6800/4104 >>
>> 172.24.88.207:0/4168225878 conn(0x563c7dfea800 :6800
>> s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0
>>l=0).handle_connect_msg
>> accept peer reset, then tried to connect to us, replacing
>> 2017-01-10 09:56:07.443026 7f70f4562700  0 -- 172.24.88.207:6800/4104 >>
>> 172.24.88.207:0/4168225878 conn(0x563c7e0bc000 :6800
>> s=STATE_ACCEPTING_WAIT_CONNECT_MSG pgs=2 cs=0 l=0).fault with nothing to
>> send and in the half  accept state just closed
>>
>>
>> What's weird about that is that this is a single node cluster with
>> ceph-mgr, ceph-mon, and the ceph-osd processes all running on the same
>> host.  So none of the communication should be leaving the node.
>>
>> Bryan
>>
>> E-MAIL CONFIDENTIALITY NOTICE:
>> The contents of this e-mail message and any attachments are intended
>>solely for the addressee(s) and may contain confidential and/or legally
>>privileged information. If you are not the intended recipient of this
>>message or if this message has been addressed to you in error, please
>>immediately alert the sender by reply e-mail and then delete this
>>message and any attachments. If you are not the intended recipient, you
>>are notified that any use, dissemination, distribution, copying, or
>>storage of this message or any attachment is strictly prohibited.
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

E-MAIL CONFIDENTIALITY NOTICE: 
The contents of this e-mail message and any attachments are intended solely for 
the addressee(s) and may contain confidential and/or legally privileged 
information. If you are not the intended recipient of this message or if this 
message has been addressed to you in error, please immediately alert the sender 
by reply e-mail and then delete this message and any attachments. If you are 
not the intended recipient, you are notified that any use, dissemination, 
distribution, copying, or storage of this message or any attachment is strictly 
prohibited.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] High CPU usage by ceph-mgr on idle Ceph cluster

2017-01-10 Thread Samuel Just
What ceph sha1 is that?  Does it include
6c3d015c6854a12cda40673848813d968ff6afae which fixed the messenger
spin?
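(If you have a checkout of the ceph source, one way to check is something along
these lines, with your build's sha1 substituted in:

  # exits 0 if the messenger fix is an ancestor of the build you installed
  git merge-base --is-ancestor 6c3d015c6854a12cda40673848813d968ff6afae <build-sha1> \
      && echo "fix included" || echo "fix missing"
)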
-Sam

On Tue, Jan 10, 2017 at 9:00 AM, Stillwell, Bryan J
 wrote:
> On 1/10/17, 5:35 AM, "John Spray"  wrote:
>
>>On Mon, Jan 9, 2017 at 11:46 PM, Stillwell, Bryan J
>> wrote:
>>> Last week I decided to play around with Kraken (11.1.1-1xenial) on a
>>> single node, two OSD cluster, and after a while I noticed that the new
>>> ceph-mgr daemon is frequently using a lot of the CPU:
>>>
>>> 17519 ceph  20   0  850044 168104208 S 102.7  4.3   1278:27
>>> ceph-mgr
>>>
>>> Restarting it with 'systemctl restart ceph-mgr*' seems to get its CPU
>>> usage down to < 1%, but after a while it climbs back up to > 100%.  Has
>>> anyone else seen this?
>>
>>Definitely worth investigating, could you set "debug mgr = 20" on the
>>daemon to see if it's obviously spinning in a particular place?
>
> I've injected that option to the ceph-mgr process, and now I'm just
> waiting for it to go out of control again.
>
> However, I've noticed quite a few messages like this in the logs already:
>
> 2017-01-10 09:56:07.441678 7f70f4562700  0 -- 172.24.88.207:6800/4104 >>
> 172.24.88.207:0/4168225878 conn(0x563c7e0bc000 :6800 s=STATE_OPEN pgs=2
> cs=1 l=0).fault initiating reconnect
> 2017-01-10 09:56:07.442044 7f70f4562700  0 -- 172.24.88.207:6800/4104 >>
> 172.24.88.207:0/4168225878 conn(0x563c7dfea800 :6800
> s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg
> accept connect_seq 0 vs existing csq=2 existing_state=STATE_CONNECTING
> 2017-01-10 09:56:07.442067 7f70f4562700  0 -- 172.24.88.207:6800/4104 >>
> 172.24.88.207:0/4168225878 conn(0x563c7dfea800 :6800
> s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg
> accept peer reset, then tried to connect to us, replacing
> 2017-01-10 09:56:07.443026 7f70f4562700  0 -- 172.24.88.207:6800/4104 >>
> 172.24.88.207:0/4168225878 conn(0x563c7e0bc000 :6800
> s=STATE_ACCEPTING_WAIT_CONNECT_MSG pgs=2 cs=0 l=0).fault with nothing to
> send and in the half  accept state just closed
>
>
> What's weird about that is that this is a single node cluster with
> ceph-mgr, ceph-mon, and the ceph-osd processes all running on the same
> host.  So none of the communication should be leaving the node.
>
> Bryan
>
> E-MAIL CONFIDENTIALITY NOTICE:
> The contents of this e-mail message and any attachments are intended solely 
> for the addressee(s) and may contain confidential and/or legally privileged 
> information. If you are not the intended recipient of this message or if this 
> message has been addressed to you in error, please immediately alert the 
> sender by reply e-mail and then delete this message and any attachments. If 
> you are not the intended recipient, you are notified that any use, 
> dissemination, distribution, copying, or storage of this message or any 
> attachment is strictly prohibited.
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Your company listed as a user / contributor on ceph.com

2017-01-10 Thread Patrick McGarry
Hey cephers,

Now that we're getting ready to launch the new ceph.com site, I'd like
to open it up to anyone that would like to have their company logo
listed as either a "ceph user" or "ceph contributor" with a hyperlink
to your site.

In order to do this I'll need you to send me a logo that is at least
300x300px and in the correct greyscale format:

Ceph User - #8E8E8E (or something very similar, can also be multi-hue
if you need it)

Ceph contributor - #697176


If you would like to see examples, the staging site is still live and
available for review:

http://stage.ceph.com


Thanks!

-- 

Best Regards,

Patrick McGarry
Director Ceph Community || Red Hat
http://ceph.com  ||  http://community.redhat.com
@scuttlemonkey || @ceph
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Crushmap (tunables) flapping on cluster

2017-01-10 Thread Stillwell, Bryan J
On 1/10/17, 2:56 AM, "ceph-users on behalf of Breunig, Steve (KASRL)"
 wrote:

>Hi list,
>
>
>I'm running a cluster which is currently in migration from hammer to
>jewel.
>
>
>Actually I have the problem that the tunables are flapping and mapping an
>rbd image is not working.
>
>
>It is flapping between:
>
>
>{
>"choose_local_tries": 0,
>"choose_local_fallback_tries": 0,
>"choose_total_tries": 50,
>"chooseleaf_descend_once": 1,
>"chooseleaf_vary_r": 1,
>"chooseleaf_stable": 0,
>"straw_calc_version": 1,
>"allowed_bucket_algs": 54,
>"profile": "hammer",
>"optimal_tunables": 0,
>"legacy_tunables": 0,
>"minimum_required_version": "hammer",
>"require_feature_tunables": 1,
>"require_feature_tunables2": 1,
>"has_v2_rules": 0,
>"require_feature_tunables3": 1,
>"has_v3_rules": 0,
>"has_v4_buckets": 1,
>"require_feature_tunables5": 0,
>"has_v5_rules": 0
>}
>
>
>and
>
>
>{
>"choose_local_tries": 0,
>"choose_local_fallback_tries": 0,
>"choose_total_tries": 50,
>"chooseleaf_descend_once": 1,
>"chooseleaf_vary_r": 1,
>"straw_calc_version": 1,
>"allowed_bucket_algs": 54,
>"profile": "hammer",
>"optimal_tunables": 0,
>"legacy_tunables": 0,
>"require_feature_tunables": 1,
>"require_feature_tunables2": 1,
>"require_feature_tunables3": 1,
>"has_v2_rules": 0,
>"has_v3_rules": 0,
>"has_v4_buckets": 1
>}
>
>
>Did someone have that problem too?
>How can it be solved?

Have you upgraded all the mon nodes?  My guess is that when you're running
'ceph osd crush show-tunables' it's sometimes being reported from a hammer
mon node and sometimes from a jewel mon node.

You can run 'ceph tell mon.* version' to verify they're all running the
same version.
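If they do all report jewel and the tunables still flap, a sketch of pinning the
profile explicitly (be aware this can trigger data movement, so read up on it
before running it):

  ceph tell mon.* version          # confirm every mon is on the same release
  ceph osd crush tunables hammer   # pin the crush tunables profile
  ceph osd crush show-tunables     # should now report a stable profile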

When you say the map is failing, are you using the kernel rbd driver?  If
so you might need to upgrade your kernel to support the new features in
jewel.

Bryan

E-MAIL CONFIDENTIALITY NOTICE: 
The contents of this e-mail message and any attachments are intended solely for 
the addressee(s) and may contain confidential and/or legally privileged 
information. If you are not the intended recipient of this message or if this 
message has been addressed to you in error, please immediately alert the sender 
by reply e-mail and then delete this message and any attachments. If you are 
not the intended recipient, you are notified that any use, dissemination, 
distribution, copying, or storage of this message or any attachment is strictly 
prohibited.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] High CPU usage by ceph-mgr on idle Ceph cluster

2017-01-10 Thread Stillwell, Bryan J
On 1/10/17, 5:35 AM, "John Spray"  wrote:

>On Mon, Jan 9, 2017 at 11:46 PM, Stillwell, Bryan J
> wrote:
>> Last week I decided to play around with Kraken (11.1.1-1xenial) on a
>> single node, two OSD cluster, and after a while I noticed that the new
>> ceph-mgr daemon is frequently using a lot of the CPU:
>>
>> 17519 ceph  20   0  850044 168104208 S 102.7  4.3   1278:27
>> ceph-mgr
>>
>> Restarting it with 'systemctl restart ceph-mgr*' seems to get its CPU
>> usage down to < 1%, but after a while it climbs back up to > 100%.  Has
>> anyone else seen this?
>
>Definitely worth investigating, could you set "debug mgr = 20" on the
>daemon to see if it's obviously spinning in a particular place?

I've injected that option to the ceph-mgr process, and now I'm just
waiting for it to go out of control again.
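(For anyone following along, one way to set that option looks roughly like this;
the daemon id is a placeholder and the exact admin-socket setup may differ on
your install:

  # bump ceph-mgr debugging at runtime via its admin socket
  ceph daemon mgr.<id> config set debug_mgr 20

  # or persist it in ceph.conf under [mgr] as "debug mgr = 20" and restart
)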

However, I've noticed quite a few messages like this in the logs already:

2017-01-10 09:56:07.441678 7f70f4562700  0 -- 172.24.88.207:6800/4104 >>
172.24.88.207:0/4168225878 conn(0x563c7e0bc000 :6800 s=STATE_OPEN pgs=2
cs=1 l=0).fault initiating reconnect
2017-01-10 09:56:07.442044 7f70f4562700  0 -- 172.24.88.207:6800/4104 >>
172.24.88.207:0/4168225878 conn(0x563c7dfea800 :6800
s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg
accept connect_seq 0 vs existing csq=2 existing_state=STATE_CONNECTING
2017-01-10 09:56:07.442067 7f70f4562700  0 -- 172.24.88.207:6800/4104 >>
172.24.88.207:0/4168225878 conn(0x563c7dfea800 :6800
s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg
accept peer reset, then tried to connect to us, replacing
2017-01-10 09:56:07.443026 7f70f4562700  0 -- 172.24.88.207:6800/4104 >>
172.24.88.207:0/4168225878 conn(0x563c7e0bc000 :6800
s=STATE_ACCEPTING_WAIT_CONNECT_MSG pgs=2 cs=0 l=0).fault with nothing to
send and in the half  accept state just closed


What's weird about that is that this is a single node cluster with
ceph-mgr, ceph-mon, and the ceph-osd processes all running on the same
host.  So none of the communication should be leaving the node.

Bryan

E-MAIL CONFIDENTIALITY NOTICE: 
The contents of this e-mail message and any attachments are intended solely for 
the addressee(s) and may contain confidential and/or legally privileged 
information. If you are not the intended recipient of this message or if this 
message has been addressed to you in error, please immediately alert the sender 
by reply e-mail and then delete this message and any attachments. If you are 
not the intended recipient, you are notified that any use, dissemination, 
distribution, copying, or storage of this message or any attachment is strictly 
prohibited.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rgw swift api long term support

2017-01-10 Thread Yehuda Sadeh-Weinraub
On Tue, Jan 10, 2017 at 1:35 AM, Marius Vaitiekunas
 wrote:
> Hi,
>
> I would like to ask the ceph developers if there is any chance that swift api
> support for rgw is going to be dropped in the future (like in 5 years).
>
> Why am I asking? :)
>
> We were happy openstack glance users on the ceph s3 api until openstack decided
> to drop glance s3 support. So, we need to switch our image backend. The swift
> api on ceph looks like quite a good solution.
>

I'm not aware of any plans (current or future) to drop swift api.

Yehuda
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] PGs stuck active+remapped and osds lose data?!

2017-01-10 Thread Samuel Just
Shinobu isn't correct, you have 9/9 osds up and running.  up does not
equal acting because crush is having trouble fulfilling the weights in
your crushmap and the acting set is being padded out with an extra osd
which happens to have the data to keep you up to the right number of
replicas.  Please refer back to Brad's post.
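A quick way to see what I mean is to compare the up and acting columns for the
remapped pgs, e.g. (a sketch):

  # list the pgs that are stuck unclean
  ceph pg dump_stuck unclean

  # show state, up set and acting set side by side for every pg
  ceph pg dump pgs_brief | grep remapped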
-Sam

On Mon, Jan 9, 2017 at 11:08 PM, Marcus Müller  wrote:
> Ok, I understand, but how can I debug why they are not running as they should?
> I thought everything was fine because ceph -s said they are up and
> running.
>
> I would think of a problem with the crush map.
>
>> On 10.01.2017 at 08:06, Shinobu Kinjo wrote:
>>
>> e.g.,
>> OSD7 / 3 / 0 are in the same acting set. They should be up, if they
>> are properly running.
>>
>> # 9.7
>> 
>>>   "up": [
>>>   7,
>>>   3
>>>   ],
>>>   "acting": [
>>>   7,
>>>   3,
>>>   0
>>>   ],
>> 
>>
>> Here is an example:
>>
>>  "up": [
>>1,
>>0,
>>2
>>  ],
>>  "acting": [
>>1,
>>0,
>>2
>>   ],
>>
>> Regards,
>>
>>
>> On Tue, Jan 10, 2017 at 3:52 PM, Marcus Müller  
>> wrote:

 That's not perfectly correct.

 OSD.0/1/2 seem to be down.
>>>
>>>
>>> Sorry but where do you see this? I think this indicates that they are up:   
>>> osdmap e3114: 9 osds: 9 up, 9 in; 4 remapped pgs?
>>>
>>>
 On 10.01.2017 at 07:50, Shinobu Kinjo wrote:

 On Tue, Jan 10, 2017 at 3:44 PM, Marcus Müller  
 wrote:
> All osds are currently up:
>
>health HEALTH_WARN
>   4 pgs stuck unclean
>   recovery 4482/58798254 objects degraded (0.008%)
>   recovery 420522/58798254 objects misplaced (0.715%)
>   noscrub,nodeep-scrub flag(s) set
>monmap e9: 5 mons at
> {ceph1=192.168.10.3:6789/0,ceph2=192.168.10.4:6789/0,ceph3=192.168.10.5:6789/0,ceph4=192.168.60.6:6789/0,ceph5=192.168.60.11:6789/0}
>   election epoch 478, quorum 0,1,2,3,4
> ceph1,ceph2,ceph3,ceph4,ceph5
>osdmap e3114: 9 osds: 9 up, 9 in; 4 remapped pgs
>   flags noscrub,nodeep-scrub
> pgmap v9981077: 320 pgs, 3 pools, 4837 GB data, 19140 kobjects
>   15070 GB used, 40801 GB / 55872 GB avail
>   4482/58798254 objects degraded (0.008%)
>   420522/58798254 objects misplaced (0.715%)
>316 active+clean
>  4 active+remapped
> client io 56601 B/s rd, 45619 B/s wr, 0 op/s
>
> This did not chance for two days or so.
>
>
> By the way, my ceph osd df now looks like this:
>
> ID WEIGHT  REWEIGHT SIZE   USE    AVAIL  %USE  VAR
> 0 1.28899  1.0  3724G  1699G  2024G 45.63 1.69
> 1 1.57899  1.0  3724G  1708G  2015G 45.87 1.70
> 2 1.68900  1.0  3724G  1695G  2028G 45.54 1.69
> 3 6.78499  1.0  7450G  1241G  6208G 16.67 0.62
> 4 8.3  1.0  7450G  1228G  6221G 16.49 0.61
> 5 9.51500  1.0  7450G  1239G  6210G 16.64 0.62
> 6 7.66499  1.0  7450G  1265G  6184G 16.99 0.63
> 7 9.75499  1.0  7450G  2497G  4952G 33.52 1.24
> 8 9.32999  1.0  7450G  2495G  4954G 33.49 1.24
> TOTAL 55872G 15071G 40801G 26.97
> MIN/MAX VAR: 0.61/1.70  STDDEV: 13.16
>
> As you can see, now osd2 also went down to 45% Use and „lost“ data. But I
> also think this is no problem and ceph just clears everything up after
> backfilling.
>
>
> On 10.01.2017 at 07:29, Shinobu Kinjo wrote:
>
> Looking at ``ceph -s`` you originally provided, all OSDs are up.
>
> osdmap e3114: 9 osds: 9 up, 9 in; 4 remapped pgs
>
>
> But looking at ``pg query``, OSD.0 / 1 are not up. Are they something

 That's not perfectly correct.

 OSD.0/1/2 seem to be down.

> like related to ?:
>
> Ceph1, ceph2 and ceph3 are vms on one physical host
>
>
> Are those OSDs running on vm instances?
>
> # 9.7
> 
>
> "state": "active+remapped",
> "snap_trimq": "[]",
> "epoch": 3114,
> "up": [
> 7,
> 3
> ],
> "acting": [
> 7,
> 3,
> 0
> ],
>
> 
>
> # 7.84
> 
>
> "state": "active+remapped",
> "snap_trimq": "[]",
> "epoch": 3114,
> "up": [
> 4,
> 8
> ],
> "acting": [
> 4,
> 8,
> 1
> ],
>
> 
>
> # 8.1b
> 
>
> "state": "active+remapped",
> "snap_trimq": "[]",
> "epoch": 3114,
> "up": [
> 4,
> 7
> ],
> "acting": [
> 4,
> 7,
> 2
> ],
>
> 
>
> # 7.7a
> 
>
> "state": "active+remapped",
> "snap_trimq": "[]",
> "epoch": 3114,
> "up": [
> 7,
> 4
> ],
> "acting": [

Re: [ceph-users] pg stuck in peering while power failure

2017-01-10 Thread Samuel Just
{
"name": "Started\/Primary\/Peering",
"enter_time": "2017-01-10 13:43:34.933074",
"past_intervals": [
{
"first": 75858,
"last": 75860,
"maybe_went_rw": 1,
"up": [
345,
622,
685,
183,
792,
2147483647,
2147483647,
401,
516
],
"acting": [
345,
622,
685,
183,
792,
2147483647,
2147483647,
401,
516
],
"primary": 345,
"up_primary": 345
},

Between 75858 and 75860,

345,
622,
685,
183,
792,
2147483647,
2147483647,
401,
516

was the acting set.  The current acting set

345,
622,
685,
183,
2147483647,
2147483647,
153,
401,
516

needs *all 7* of the osds from epochs 75858 through 75860 to ensure
that it has any writes completed during that time.  You can make
transient situations like that less of a problem by setting min_size
to 8 (though it'll prevent writes with 2 failures until backfill
completes).  A possible enhancement for an EC pool would be to gather
the infos from those osds anyway and use that to rule out writes (if they
actually happened, you'd still be stuck).
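A sketch of that min_size change (the pool name is a placeholder; think through
what a second failure then means for availability before applying it):

  ceph osd pool get <ec-pool> min_size     # check the current value
  ceph osd pool set <ec-pool> min_size 8   # require 8 of 9 shards before accepting writes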
-Sam

On Tue, Jan 10, 2017 at 5:36 AM, Craig Chi  wrote:
> Hi List,
>
> I am testing the stability of my Ceph cluster with power failure.
>
> I brutally powered off 2 Ceph units, each with 90 OSDs, while the client
> I/O was continuing.
>
> Since then, some of the pgs of my cluster have been stuck in peering
>
>   pgmap v3261136: 17408 pgs, 4 pools, 176 TB data, 5082 kobjects
> 236 TB used, 5652 TB / 5889 TB avail
> 8563455/38919024 objects degraded (22.003%)
>13526 active+undersized+degraded
> 3769 active+clean
>  104 down+remapped+peering
>9 down+peering
>
> I queried the peering pg (all on EC pool with 7+2) and got blocked
> information (full query: http://pastebin.com/pRkaMG2h )
>
> "probing_osds": [
> "153(6)",
> "183(3)",
> "345(0)",
> "401(7)",
> "516(8)",
> "622(1)",
> "685(2)"
> ],
> "blocked": "peering is blocked due to down osds",
> "down_osds_we_would_probe": [
> 792
> ],
> "peering_blocked_by": [
> {
> "osd": 792,
> "current_lost_at": 0,
> "comment": "starting or marking this osd lost may let us
> proceed"
> }
> ]
>
>
> osd.792 is exactly on one of the units I powered off. And I think the I/O
> associated with this pg is paused too.
>
> I have checked the troubleshooting page on the Ceph website (
> http://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-pg/
> ); it says that starting the OSD or marking it lost can make the procedure
> continue.
>
> I am sure that my cluster was healthy before the power outage occurred. I am
> wondering: if a power outage like this really happens in a production environment,
> will it also freeze my client I/O if I don't do anything? Since I just lost 2
> redundancies (I have erasure code with 7+2), I think it should still provide
> normal functionality.
>
> Or if I am doing something wrong? Please give me some suggestions, thanks.
>
> Sincerely,
> Craig Chi
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] pg stuck in peering while power failure

2017-01-10 Thread Craig Chi
Hi List,

I am testing the stability of my Ceph cluster with power failure.

I brutally powered off 2 Ceph units, each with 90 OSDs, while the client
I/O was continuing.

Since then, some of the pgs of my cluster have been stuck in peering

pgmap v3261136: 17408 pgs, 4 pools, 176 TB data, 5082 kobjects
236 TB used, 5652 TB / 5889 TB avail
8563455/38919024 objects degraded (22.003%)
13526 active+undersized+degraded
3769 active+clean
104 down+remapped+peering
9 down+peering

I queried the peering pg (all on EC pool with 7+2) and got blocked information 
(full query:http://pastebin.com/pRkaMG2h)

"probing_osds": [
"153(6)",
"183(3)",
"345(0)",
"401(7)",
"516(8)",
"622(1)",
"685(2)"
],
"blocked": "peering is blocked due to down osds",
"down_osds_we_would_probe": [
792
],
"peering_blocked_by": [
{
"osd": 792,
"current_lost_at": 0,
"comment": "starting or marking this osd lost may let us proceed"
}
]


osd.792 is exactly on one of the units I powered off. And I think the I/O 
associated with this pg is paused too.

I have checked the troubleshooting page on the Ceph website
(http://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-pg/);
it says that starting the OSD or marking it lost can make the procedure continue.
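(For reference, the two options the troubleshooting page describes would look
something like the commands below on a systemd install; I have not run the
second one, since marking an OSD lost can throw away the writes it holds:

  # option 1: bring the down OSD back so peering can complete
  systemctl start ceph-osd@792

  # option 2 (destructive): tell the cluster this OSD's data is gone for good
  ceph osd lost 792 --yes-i-really-mean-it
)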

I am sure that my cluster was healthy before the power outage occurred. I am
wondering: if a power outage like this really happens in a production environment,
will it also freeze my client I/O if I don't do anything? Since I just lost 2
redundancies (I have erasure code with 7+2), I think it should still provide
normal functionality.

Or if I am doing something wrong? Please give me some suggestions, thanks.

Sincerely,
Craig Chi
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] High CPU usage by ceph-mgr on idle Ceph cluster

2017-01-10 Thread John Spray
On Mon, Jan 9, 2017 at 11:46 PM, Stillwell, Bryan J
 wrote:
> Last week I decided to play around with Kraken (11.1.1-1xenial) on a
> single node, two OSD cluster, and after a while I noticed that the new
> ceph-mgr daemon is frequently using a lot of the CPU:
>
> 17519 ceph  20   0  850044 168104208 S 102.7  4.3   1278:27
> ceph-mgr
>
> Restarting it with 'systemctl restart ceph-mgr*' seems to get its CPU
> usage down to < 1%, but after a while it climbs back up to > 100%.  Has
> anyone else seen this?

Definitely worth investigating, could you set "debug mgr = 20" on the
daemon to see if it's obviously spinning in a particular place?

Thanks,
John

>
> Bryan
>
> E-MAIL CONFIDENTIALITY NOTICE:
> The contents of this e-mail message and any attachments are intended solely 
> for the addressee(s) and may contain confidential and/or legally privileged 
> information. If you are not the intended recipient of this message or if this 
> message has been addressed to you in error, please immediately alert the 
> sender by reply e-mail and then delete this message and any attachments. If 
> you are not the intended recipient, you are notified that any use, 
> dissemination, distribution, copying, or storage of this message or any 
> attachment is strictly prohibited.
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Write back cache removal

2017-01-10 Thread Stuart Harland
Yes Wido, you are correct. There is an RBD pool in the cluster, but it is not
currently running with a cache attached. The pool I'm trying to manage here is
only used by librados to write objects directly to the pool, as opposed to any
of the other niceties that ceph provides.

Specifically I ran:

`ceph osd tier cache-mode  forward`

which returned `Error EPERM: 'forward' is not a well-supported cache mode and 
may corrupt your data.  pass --yes-i-really-mean-it to force.`

Currently we are running 10.2.5. I suspect that it's fine in our use case;
however, given the sparsity of the documentation, I didn't like to assume
anything.
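For completeness, the sequence I am looking at is roughly the one from the
cache-tiering docs (pool names are placeholders, the override flag is exactly
what the error message asks for, and this is a sketch rather than something I
have run yet):

  # stop new writes landing in the cache, using the override the error asks for
  ceph osd tier cache-mode <cache-pool> forward --yes-i-really-mean-it

  # flush and evict everything still sitting in the cache pool
  rados -p <cache-pool> cache-flush-evict-all

  # detach the cache tier once it is empty
  ceph osd tier remove-overlay <base-pool>
  ceph osd tier remove <base-pool> <cache-pool>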


Regards

Stuart





> On 10 Jan 2017, at 09:52, Wido den Hollander  wrote:
> 
>> 
>> On 10 January 2017 at 9:52, Nick Fisk wrote:
>> 
>> 
>>> -Original Message-
>>> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
>>> Wido den Hollander
>>> Sent: 10 January 2017 07:54
>>> To: ceph new ; Stuart Harland 
>>> 
>>> Subject: Re: [ceph-users] Write back cache removal
>>> 
>>> 
 On 9 January 2017 at 13:02, Stuart Harland
 wrote:
 
 
 Hi,
 
 We’ve been operating a ceph storage system storing files using librados 
 (using a replicated pool on rust disks). We implemented a
>>> cache over the top of this with SSDs, however we now want to turn this off.
 
 The documentation suggests setting the cache mode to forward before 
 draining the pool, however the ceph management
>>> controller spits out an error about this saying that it is unsupported and 
>>> hence dangerous.
 
>>> 
>>> What version of Ceph are you running?
>>> 
>>> And can you paste the exact command and the output?
>>> 
>>> Wido
>> 
>> Hi Wido,
>> 
>> I think this has been discussed before and looks like it might be a current 
>> limitation. Not sure if it's on anybody's radar to fix.
>> 
>> https://www.mail-archive.com/ceph-users@lists.ceph.com/msg24472.html 
>> 
>> 
> 
> Might be, but afaik they are using their own application which writes to 
> RADOS using librados, not RBD.
> 
> Is that correct Stuart?
> 
> Wido
> 
>> Nick
>> 
>>> 
 The thing is I cannot really locate any documentation as to why it’s 
 considered unsupported and under what conditions it is expected
>>> to fail: I have read a passing comment about EC pools having data 
>>> corruption, but we are using replicated pools.
 
 Is this something that is safe to do?
 
 Otherwise I have noted the read proxy mode of cache tiers which is 
 documented as a mechanism to transition from write back to
>>> disabled, however the documentation is even sparser on this than forward 
>>> mode. Would this be a better approach if there is some
>>> unsupported behaviour in the forward mode cache option?
 
 Any thoughts would be appreciated - we really cannot afford to corrupt the 
 data, and I really do not want to have to do some
>>> manual software based eviction on this data.
 
 regards
 
 Stuart
 
 
 − Stuart Harland:
 Infrastructure Engineer
 Email: s.harl...@livelinktechnology.net 
 
 
 
 
 LiveLink Technology Ltd
 McCormack House
 56A East Street
 Havant
 PO9 1BS
 
 IMPORTANT: The information transmitted in this e-mail is intended only for 
 the person or entity to whom it is addressed and may
>>> contain confidential and/or privileged information. If you are not the 
>>> intended recipient of this message, please do not read, copy, use
>>> or disclose this communication and notify the sender immediately. Any 
>>> review, retransmission, dissemination or other use of, or
>>> taking any action in reliance upon this information by persons or entities 
>>> other than the intended recipient is prohibited. Any views or
>>> opinions presented in this e-mail are solely those of the author and do not 
>>> necessarily represent those of LiveLink. This e-mail
>>> message has been checked for the presence of computer viruses. However, 
>>> LiveLink is not able to accept liability for any damage
>>> caused by this e-mail.
 
 
 
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com 
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
>>> 
___
ceph-users mailing list
ceph-users@lists.ceph.com

Re: [ceph-users] Write back cache removal

2017-01-10 Thread jiajia zhong
It's fixed since v0.94.6, http://ceph.com/releases/v0-94-6-hammer-released/


   - fs: CephFS restriction on removing cache tiers is overly strict
     (issue#11504, pr#6402, John Spray)


but you have to make sure your release is patched.

2017-01-10 16:52 GMT+08:00 Nick Fisk :

> > -Original Message-
> > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf
> Of Wido den Hollander
> > Sent: 10 January 2017 07:54
> > To: ceph new ; Stuart Harland <
> s.harl...@livelinktechnology.net>
> > Subject: Re: [ceph-users] Write back cache removal
> >
> >
> > > On 9 January 2017 at 13:02, Stuart Harland <
> s.harl...@livelinktechnology.net> wrote:
> > >
> > >
> > > Hi,
> > >
> > > We’ve been operating a ceph storage system storing files using
> librados (using a replicated pool on rust disks). We implemented a
> > cache over the top of this with SSDs, however we now want to turn this
> off.
> > >
> > > The documentation suggests setting the cache mode to forward before
> draining the pool, however the ceph management
> > controller spits out an error about this saying that it is unsupported
> and hence dangerous.
> > >
> >
> > What version of Ceph are you running?
> >
> > And can you paste the exact command and the output?
> >
> > Wido
>
> Hi Wido,
>
> I think this has been discussed before and looks like it might be a
> current limitation. Not sure if it's on anybody's radar to fix.
>
> https://www.mail-archive.com/ceph-users@lists.ceph.com/msg24472.html
>
> Nick
>
> >
> > > The thing is I cannot really locate any documentation as to why it’s
> considered unsupported and under what conditions it is expected
> > to fail: I have read a passing comment about EC pools having data
> corruption, but we are using replicated pools.
> > >
> > > Is this something that is safe to do?
> > >
> > > Otherwise I have noted the read proxy mode of cache tiers which is
> documented as a mechanism to transition from write back to
> > disabled, however the documentation is even sparser on this than forward
> mode. Would this be a better approach if there is some
> > unsupported behaviour in the forward mode cache option?
> > >
> > > Any thoughts would be appreciated - we really cannot afford to corrupt
> the data, and I really do not want to have to do some
> > manual software based eviction on this data.
> > >
> > > regards
> > >
> > > Stuart
> > >
> > >
> > >  − Stuart Harland:
> > > Infrastructure Engineer
> > > Email: s.harl...@livelinktechnology.net  livelinktechnology.net>
> > >
> > >
> > >
> > > LiveLink Technology Ltd
> > > McCormack House
> > > 56A East Street
> > > Havant
> > > PO9 1BS
> > >


[ceph-users] Crushmap (tunables) flapping on cluster

2017-01-10 Thread Breunig, Steve (KASRL)
Hi list,


I'm running a cluster which is currently in migration from hammer to jewel.


Currently I have the problem that the tunables are flapping, and mapping an RBD
image is not working.


It is flapping between:

{
"choose_local_tries": 0,
"choose_local_fallback_tries": 0,
"choose_total_tries": 50,
"chooseleaf_descend_once": 1,
"chooseleaf_vary_r": 1,
"chooseleaf_stable": 0,
"straw_calc_version": 1,
"allowed_bucket_algs": 54,
"profile": "hammer",
"optimal_tunables": 0,
"legacy_tunables": 0,
"minimum_required_version": "hammer",
"require_feature_tunables": 1,
"require_feature_tunables2": 1,
"has_v2_rules": 0,
"require_feature_tunables3": 1,
"has_v3_rules": 0,
"has_v4_buckets": 1,
"require_feature_tunables5": 0,
"has_v5_rules": 0
}

and

{
"choose_local_tries": 0,
"choose_local_fallback_tries": 0,
"choose_total_tries": 50,
"chooseleaf_descend_once": 1,
"chooseleaf_vary_r": 1,
"straw_calc_version": 1,
"allowed_bucket_algs": 54,
"profile": "hammer",
"optimal_tunables": 0,
"legacy_tunables": 0,
"require_feature_tunables": 1,
"require_feature_tunables2": 1,
"require_feature_tunables3": 1,
"has_v2_rules": 0,
"has_v3_rules": 0,
"has_v4_buckets": 1
}


Has anyone else run into this problem?

How can it be solved?


Regards,

Steve
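
The two dumps above differ only in the jewel-era keys (chooseleaf_stable, require_feature_tunables5, has_v5_rules, minimum_required_version), which would be consistent with the map being reported by daemons on different versions during the migration; that is a guess, not a diagnosis. Below is a small sketch for watching when and how the reported tunables change, assuming the ceph CLI and an admin keyring are available on the host:

import json
import subprocess
import time

def show_tunables():
    # Same data as the dumps above, fetched as JSON.
    out = subprocess.check_output(
        ["ceph", "osd", "crush", "show-tunables", "-f", "json"])
    return json.loads(out)

previous = show_tunables()
while True:
    time.sleep(10)
    current = show_tunables()
    if current != previous:
        changed = {k: (previous.get(k), current.get(k))
                   for k in set(previous) | set(current)
                   if previous.get(k) != current.get(k)}
        print("tunables changed: %s" % (changed,))
        previous = current

Once every mon and OSD is confirmed to be on jewel, the profile can be set explicitly with "ceph osd crush tunables hammer", which at least makes the intended values unambiguous (note that rewriting the crushmap can trigger data movement).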





Re: [ceph-users] Write back cache removal

2017-01-10 Thread Wido den Hollander

> On 10 January 2017 at 9:52, Nick Fisk wrote:
> 
> 
> > -Original Message-
> > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
> > Wido den Hollander
> > Sent: 10 January 2017 07:54
> > To: ceph new ; Stuart Harland 
> > 
> > Subject: Re: [ceph-users] Write back cache removal
> > 
> > 
> > > On 9 January 2017 at 13:02, Stuart Harland wrote:
> > >
> > >
> > > Hi,
> > >
> > > We’ve been operating a ceph storage system storing files using librados 
> > > (using a replicated pool on rust disks). We implemented a
> > cache over the top of this with SSDs, however we now want to turn this off.
> > >
> > > The documentation suggests setting the cache mode to forward before 
> > > draining the pool, however the ceph management
> > controller spits out an error about this saying that it is unsupported and 
> > hence dangerous.
> > >
> > 
> > What version of Ceph are you running?
> > 
> > And can you paste the exact command and the output?
> > 
> > Wido
> 
> Hi Wido,
> 
> I think this has been discussed before and looks like it might be a current 
> limitation. Not sure if it's on anybody's radar to fix.
> 
> https://www.mail-archive.com/ceph-users@lists.ceph.com/msg24472.html
> 

Might be, but afaik they are using their own application which writes to RADOS 
using librados, not RBD.

Is that correct Stuart?

Wido

> Nick
> 
> > 
> > > The thing is I cannot really locate any documentation as to why it’s 
> > > considered unsupported and under what conditions it is expected
> > to fail: I have read a passing comment about EC pools having data 
> > corruption, but we are using replicated pools.
> > >
> > > Is this something that is safe to do?
> > >
> > > Otherwise I have noted the read proxy mode of cache tiers which is 
> > > documented as a mechanism to transition from write back to
> > disabled, however the documentation is even sparser on this than forward 
> > mode. Would this be a better approach if there is some
> > unsupported behaviour in the forward mode cache option?
> > >
> > > Any thoughts would be appreciated - we really cannot afford to corrupt 
> > > the data, and I really do not want to have to do some
> > manual software based eviction on this data.
> > >
> > > regards
> > >
> > > Stuart
> > >
> > >
> > >  − Stuart Harland:
> > > Infrastructure Engineer
> > > Email: s.harl...@livelinktechnology.net 
> > > 
> > >
> > >
> > >
> > > LiveLink Technology Ltd
> > > McCormack House
> > > 56A East Street
> > > Havant
> > > PO9 1BS
> > >


[ceph-users] rgw swift api long term support

2017-01-10 Thread Marius Vaitiekunas
Hi,

I would like to ask the Ceph developers if there is any chance that Swift API
support for RGW is going to be dropped in the future (say, within 5 years).

Why am I asking? :)

We were happy OpenStack Glance users on the Ceph S3 API until OpenStack decided
to drop Glance's S3 support. So we need to switch our image backend, and the
Swift API on Ceph looks like quite a good solution.

Thanks in advance!

-- 
Marius Vaitiekūnas
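
For what it is worth, RGW's Swift API can be exercised with the standard python-swiftclient, which (as far as I know) is also what Glance's swift store driver uses underneath, so the switch is mostly a matter of credentials and endpoint. A minimal sketch with placeholder values; the Swift subuser would be created on the RGW side with something like "radosgw-admin subuser create --uid=glance --subuser=glance:swift --access=full" followed by "radosgw-admin key create --subuser=glance:swift --key-type=swift --gen-secret":

from swiftclient.client import Connection

# Placeholder endpoint (default civetweb port) and subuser credentials.
conn = Connection(
    authurl="http://rgw.example.com:7480/auth/v1.0",
    user="glance:swift",
    key="SECRET_SWIFT_KEY",
    auth_version="1")

# Create a container, upload a test object, then list what is there.
conn.put_container("glance-images")
conn.put_object("glance-images", "test-object",
                contents=b"hello from the rgw swift api",
                content_type="application/octet-stream")
headers, objects = conn.get_container("glance-images")
print([obj["name"] for obj in objects])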


Re: [ceph-users] Write back cache removal

2017-01-10 Thread Nick Fisk
> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Wido 
> den Hollander
> Sent: 10 January 2017 07:54
> To: ceph new ; Stuart Harland 
> 
> Subject: Re: [ceph-users] Write back cache removal
> 
> 
> > On 9 January 2017 at 13:02, Stuart Harland wrote:
> >
> >
> > Hi,
> >
> > We’ve been operating a ceph storage system storing files using librados 
> > (using a replicated pool on rust disks). We implemented a
> cache over the top of this with SSDs, however we now want to turn this off.
> >
> > The documentation suggests setting the cache mode to forward before 
> > draining the pool, however the ceph management
> controller spits out an error about this saying that it is unsupported and 
> hence dangerous.
> >
> 
> What version of Ceph are you running?
> 
> And can you paste the exact command and the output?
> 
> Wido

Hi Wido,

I think this has been discussed before and looks like it might be a current 
limitation. Not sure if it's on anybody's radar to fix.

https://www.mail-archive.com/ceph-users@lists.ceph.com/msg24472.html

Nick

> 
> > The thing is I cannot really locate any documentation as to why it’s 
> > considered unsupported and under what conditions it is expected
> to fail: I have read a passing comment about EC pools having data corruption, 
> but we are using replicated pools.
> >
> > Is this something that is safe to do?
> >
> > Otherwise I have noted the read proxy mode of cache tiers which is 
> > documented as a mechanism to transition from write back to
> disabled, however the documentation is even sparser on this than forward 
> mode. Would this be a better approach if there is some
> unsupported behaviour in the forward mode cache option?
> >
> > Any thoughts would be appreciated - we really cannot afford to corrupt the 
> > data, and I really do not want to have to do some
> manual software based eviction on this data.
> >
> > regards
> >
> > Stuart
> >
> >
> >  − Stuart Harland:
> > Infrastructure Engineer
> > Email: s.harl...@livelinktechnology.net 
> > 
> >
> >
> >
> > LiveLink Technology Ltd
> > McCormack House
> > 56A East Street
> > Havant
> > PO9 1BS
> >