Re: [ceph-users] Software Raid 1 for system disks on storage nodes (not for OSD disks)

2015-09-21 Thread Max A. Krasilnikov
Hello!

On Sat, Sep 19, 2015 at 07:03:35AM +0200, martin wrote:

> Thanks all for the suggestions.

> Our storage nodes have plenty of RAM and their only purpose is to host the
> OSD daemons, so we will not create a swap partition on provisioning.

As an option, you can set up a swap file on demand; it is easy to deploy.
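
For example, here is a minimal sketch of creating and enabling a 4 GB swap file; the path and size are arbitrary assumptions, not from the original message:

  # create and enable a 4 GB swap file (path and size are just examples)
  dd if=/dev/zero of=/swapfile bs=1M count=4096
  chmod 600 /swapfile
  mkswap /swapfile
  swapon /swapfile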

> For the OS disk we will then use a software raid 1 to handle eventually
> disk failures. For provisioning the hosts we use kickstart and then Ansible
> to install an prepare the hosts to be ready to for ceph-deploy.

I don't think RAID 1 is needed for Ceph data, because Ceph already keeps copies
of the data distributed over hosts. Think of it as RAID 1 across hosts, which is
more reliable: even a server crash will not destroy your data, provided your
setup is correct :)

-- 
WBR, Max A. Krasilnikov
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] multi-datacenter crush map

2015-09-21 Thread Wouter De Borger
Thank you for your answer! We will use size=4 and min_size=2, which should
do the trick.
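
For reference, setting those values on an existing pool is one command each; a minimal sketch (the pool name "rbd" is just a placeholder):

  ceph osd pool set rbd size 4
  ceph osd pool set rbd min_size 2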

For the monitor issue, we have a third datacenter (with higher latency, but
that shouldn't be a problem for the monitors)

We had also considered the locality issue. Our WAN round trip latency is
1.5 ms (now) and we should get a dedicated light path in the near future
(<0.1 ms).
So we hope to get acceptable latency without additional tweaking.
The plan B is to make two pools, with different weights for the different
DC's. VM's in DC 1 will get high weight for DC 1, VM's in DC 2 will get
high weight for DC 2.

Thanks,
Wouter

On Sat, Sep 19, 2015 at 9:31 PM, Robert LeBlanc 
wrote:

> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA256
>
> You will want size=4 min_size=2 if you want to keep I/O going if a DC
> fails and ensure some data integrity. Data checksumming (I think is
> being added) would provide much stronger data integrity checking in a
> two copy situation as you would be able to tell which of the two
> copies is the good copy instead of needing a third to break the tie.
>
> However, you have yet another problem on your hands. The way monitors
> works makes this tricky. If you have one monitor in one DC and two in
> the other, if the two monitor DC burns down, the surviving cluster
> stops working too because there isn't more than 50% of the monitors
> available. Putting two monitors in each DC only causes both to stop
> working if one goes down (you need three to make a quorum). It has
> been suggested that putting the odd monitor in the cloud (or other
> off-site location to both DCs) could be an option, but latency could
> cause problems. The cloud monitor would complete the quorum with
> whichever DC survives.
>
> Also remember that there is no data locality awareness in Ceph at the
> moment. This could mean that the primary for a PG is in the other DC.
> So your client has to contact the primary in the other DC, then that
> OSD contacts one OSD in its DC and two in the other and has to get
> confirmation that the write is acknowledged then ack the write to the
> client. For a write you will be between 2 x ( LAN latency + WAN
> latency ) and 2 x ( LAN latency + 2 x WAN latency ). Additionally your
> reads will be between 2 x LAN latency and 2 x WAN latency. Then there
> is write amplification so you need to make sure you have a lot more
> WAN bandwidth than you think you need.
>
> I think the large majority of us are eagerly waiting for the RBD
> replication feature or some sort of lag behind OSD for situations like
> this.
> 
> Robert LeBlanc
> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
>
>
> On Sat, Sep 19, 2015 at 12:54 PM, Wouter De Borger 
> wrote:
> > Ok, so if I understand correctly, for replication level 3 or 4 I would
> have
> > to use the rule
> >
> > rule replicated_ruleset {
> >
> > ruleset 0
> > type replicated
> > min_size 1
> > max_size 10
> > step take root
> > step choose firstn 2 type datacenter
> > step chooseleaf firstn 2 type host
> > step emit
> > }
> >
> > The question I have now is: how will it behave when a DC goes down?
> > (Assuming catastrophic failure, the thing burns down)
> >
> > For example, if I set replication to 3, min_rep to 3.
> > Then, if a DC goes down, crush will only return 2 PG's, so everything
> will
> > hang  (same for 4/4 and 4/3)
> >
> > If I set replication to 3, min_rep to 2, it could occur that all data of a
> > PG is in one DC (degraded mode). If this DC goes down, the PG will hang.
> > As far as I know, degraded PGs will still accept writes, so data loss is
> > possible. (same for 4/2)
> >
> >
> >
> > I can't seem to find a way around this. What am I missing?
> >
> >
> > Wouter
> >
> >
> >
> >
> > On Fri, Sep 18, 2015 at 10:10 PM, Gregory Farnum 
> wrote:
> >>
> >> On Fri, Sep 18, 2015 at 4:57 AM, Wouter De Borger  >
> >> wrote:
> >> > Hi all,
> >> >
> >> > I have found on the mailing list that it should be possible to have a
> >> > multi
> >> > datacenter setup, if latency is low enough.
> 

[ceph-users] move/upgrade from straw to straw2

2015-09-21 Thread Stefan Priebe - Profihost AG
Hi,

how can I upgrade / move from straw to straw2? I checked the docs but I
was unable to find any upgrade information.

Greets,
Stefan

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Software Raid 1 for system disks on storage nodes (not for OSD disks)

2015-09-21 Thread Vickey Singh
On Fri, Sep 18, 2015 at 6:33 PM, Robert LeBlanc 
wrote:

> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA256
>
> Depends on how easy it is to rebuild an OS from scratch. If you have
> something like Puppet or Chef that configure a node completely for
> you, it may not be too much of a pain to forgo the RAID. We run our
> OSD nodes from a single SATADOM and use Puppet for configuration. We
> also don't use swap (not very effective on SATADOM), but have enough
> RAM that we feel comfortable enough with that decision.
>
> If you use ceph-disk or ceph-deploy to configure the OSDs, then they
> should automatically come back up when you lay down the new OS and set
> up the necessary ceph config items (ceph.conf and the OSD bootstrap
> keys).
>

Hello sir

This sounds really interesting. Could you please elaborate: after reinstalling
the OS and installing the Ceph packages, how does Ceph detect the OSDs that were
hosted earlier on this node?

I am using ceph-deploy to provision Ceph. What changes do I need to make after
reinstalling the OS of an OSD node so that it detects my OSD daemons? Please
help me understand this step by step.

Thanks in advance.

Vickey



> - 
> Robert LeBlanc
> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
>
>
> On Fri, Sep 18, 2015 at 9:06 AM, Martin Palma  wrote:
> > Hi,
> >
> > Is it a good idea to use a software raid for the system disk (Operating
> > System) on a Ceph storage node? I mean only for the OS not for the OSD
> > disks.
> >
> > And what about a swap partition? Is that needed?
> >
> > Best,
> > Martin
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] how to get a mount list?

2015-09-21 Thread John Spray
I'm assuming you mean from the server: you can list the clients of an
MDS by SSHing to the server where it's running and running "ceph daemon
mds.<name> session ls".  This has been in releases since Giant, IIRC.
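
For example (assuming an MDS daemon named "storage08", which is just a placeholder), run on the MDS host:

  # list the sessions (mounted clients) known to this MDS
  ceph daemon mds.storage08 session ls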

Cheers,
John

On Mon, Sep 21, 2015 at 4:24 AM, domain0  wrote:
> hi ,
> if use cephfs, how to get a client mount list?
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] move/upgrade from straw to straw2

2015-09-21 Thread Wido den Hollander


On 21-09-15 11:06, Stefan Priebe - Profihost AG wrote:
> Hi,
> 
> how can i upgrade / move from straw to straw2? I checked the docs but i
> was unable to find upgrade informations?
> 

First make sure that all clients are running librados 0.9, and keep in
mind that any running VMs or processes have to be restarted to pick up the
new library.

Afterwards, download the crushmap and change 'alg straw' to 'alg straw2'.

You can also change 'straw_calc_version' to 2 in the CRUSHMap.

Inject the newly compiled version and you're done.
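
A minimal sketch of that workflow (file names are arbitrary):

  # export and decompile the current CRUSH map
  ceph osd getcrushmap -o crushmap.bin
  crushtool -d crushmap.bin -o crushmap.txt
  # edit crushmap.txt and change 'alg straw' to 'alg straw2' for each bucket,
  # then recompile and inject it
  crushtool -c crushmap.txt -o crushmap.new
  ceph osd setcrushmap -i crushmap.new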

> Greets,
> Stefan
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] move/upgrade from straw to straw2

2015-09-21 Thread Dan van der Ster
On Mon, Sep 21, 2015 at 12:11 PM, Wido den Hollander  wrote:
> You can also change 'straw_calc_version' to 2 in the CRUSHMap.

AFAIK straw_calc_version = 1 is the optimal. straw_calc_version = 2 is
not defined. See src/crush/builder.c

Cheers, Dan
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] snapshot failed after enable cache tier

2015-09-21 Thread Xiangyu (Raijin, BP&IT Dept)
OpenStack Kilo uses Ceph as the backend storage (Nova, Cinder and Glance). After
enabling a cache tier for the Glance pool, taking a snapshot of an instance fails
(it seems to generate the snapshot and then delete it automatically soon after).

Is a cache tier not suitable for Glance?

Best Regards!
*
Raijin Xiang (向毓)
Computing and Storage Dept
Huawei Technologies Co., Ltd.
Mobile: +86 186 2032 2562
Mail: xiang...@huawei.com
Bldg1-B, Cloud Park, Huancheng Road, Bantian Str., Longgang District, 518129
Shenzhen, P. R. China
***

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] change ruleset with data

2015-09-21 Thread Xiangyu (Raijin, BP&IT Dept)
I have a Ceph cluster with two rulesets; each ruleset selects different OSDs. Can
I change the ruleset for a pool directly while it is online (there is data on
the pool)?

Best Regards!
*
Raijin Xiang (向毓)
Computing and Storage Dept
Huawei Technologies Co., Ltd.
Mobile: +86 186 2032 2562
Mail: xiang...@huawei.com
Bldg1-B, Cloud Park, Huancheng Road, Bantian Str., Longgang District, 518129
Shenzhen, P. R. China
***

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] move/upgrade from straw to straw2

2015-09-21 Thread Wido den Hollander


On 21-09-15 13:18, Dan van der Ster wrote:
> On Mon, Sep 21, 2015 at 12:11 PM, Wido den Hollander  wrote:
>> You can also change 'straw_calc_version' to 2 in the CRUSHMap.
> 
> AFAIK straw_calc_version = 1 is the optimal. straw_calc_version = 2 is
> not defined. See src/crush/builder.c
> 

Hmm, I could almost swear that I've done that on a setup. But if you say
so :)


@Stefan, change 'straw' to 'straw2' in your CRUSHMap.

Wido

> Cheers, Dan
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] snapshot failed after enable cache tier

2015-09-21 Thread Andrija Panic
Hi,

depending on the cache mode etc. - from what we have also experienced (using
CloudStack) - Ceph snapshot functionality simply stops working in some
cache configurations.
This meant we were also unable to deploy new VMs (the base/gold snapshot is
created on Ceph, and the new data disk is a child of that snapshot, etc.).
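
For what it's worth, the cache mode of a tier can be inspected and changed with commands along these lines (the pool name "glance-cache" is a placeholder, and whether a different mode helps depends on the bug involved):

  # show cache-related settings of the pools
  ceph osd dump | grep cache
  # change the cache mode of the cache pool
  ceph osd tier cache-mode glance-cache writeback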

Inktank:
https://download.inktank.com/docs/ICE%201.2%20-%20Cache%20and%20Erasure%20Coding%20FAQ.pdf

Mail-list:
https://www.mail-archive.com/ceph-users@lists.ceph.com/msg18338.html


On 21 September 2015 at 13:36, Xiangyu (Raijin, BP&IT Dept) <
xiang...@huawei.com> wrote:

> Openstack Kilo use ceph as the backend storage (nova,cinder and
> glance),after enable cache tier for glance pool, take snapshot for instance
> failed (it seems generate the snapshot then delete it automatically soon)
>
>
>
> If cache tier not suitable for glance ?
>
>
>
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>


-- 

Andrija Panić
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] mds not starting ?

2015-09-21 Thread Frank, Petric (Petric)
Hello,

I'm facing a problem where the MDS does not seem to start.

I started the MDS in debug mode ("ceph-mds -f -i storage08 --debug_mds 10"), which
outputs the following in the log:

-- cut -
2015-09-21 14:12:14.313534 7ff47983d780  0 ceph version 0.94.3 
(95cefea9fd9ab740263bf8bb4796fd864d9afe2b), process ceph-mds, pid 24787
starting mds.storage08 at :/0
2015-09-21 14:12:14.316062 7ff47983d780 10 mds.-1.0 168 MDSCacheObject
2015-09-21 14:12:14.316074 7ff47983d780 10 mds.-1.0 2408CInode
2015-09-21 14:12:14.316075 7ff47983d780 10 mds.-1.0 16   elist<>::item   *7=112
2015-09-21 14:12:14.316077 7ff47983d780 10 mds.-1.0 480  inode_t
2015-09-21 14:12:14.316079 7ff47983d780 10 mds.-1.0 48nest_info_t
2015-09-21 14:12:14.316081 7ff47983d780 10 mds.-1.0 32frag_info_t
2015-09-21 14:12:14.316082 7ff47983d780 10 mds.-1.0 40   SimpleLock   *5=200
2015-09-21 14:12:14.316083 7ff47983d780 10 mds.-1.0 48   ScatterLock  *3=144
2015-09-21 14:12:14.316085 7ff47983d780 10 mds.-1.0 480 CDentry
2015-09-21 14:12:14.316086 7ff47983d780 10 mds.-1.0 16   elist<>::item
2015-09-21 14:12:14.316096 7ff47983d780 10 mds.-1.0 40   SimpleLock
2015-09-21 14:12:14.316097 7ff47983d780 10 mds.-1.0 952 CDir
2015-09-21 14:12:14.316098 7ff47983d780 10 mds.-1.0 16   elist<>::item   *2=32
2015-09-21 14:12:14.316099 7ff47983d780 10 mds.-1.0 176  fnode_t
2015-09-21 14:12:14.316100 7ff47983d780 10 mds.-1.0 48nest_info_t *2
2015-09-21 14:12:14.316101 7ff47983d780 10 mds.-1.0 32frag_info_t *2
2015-09-21 14:12:14.316103 7ff47983d780 10 mds.-1.0 264 Capability
2015-09-21 14:12:14.316104 7ff47983d780 10 mds.-1.0 32   xlist<>::item   *2=64
2015-09-21 14:12:14.316665 7ff47983d780 -1 mds.-1.0 log_to_monitors 
{default=true}
2015-09-21 14:12:14.320840 7ff4740c8700  7 mds.-1.server handle_osd_map: full = 
0 epoch = 20
2015-09-21 14:12:14.320984 7ff47983d780 10 mds.beacon.storage08 _send up:boot 
seq 1
2015-09-21 14:12:14.321060 7ff47983d780 10 mds.-1.0 create_logger
2015-09-21 14:12:14.321234 7ff4740c8700  5 mds.-1.0 handle_mds_map epoch 1 from 
mon.1
2015-09-21 14:12:14.321256 7ff4740c8700 10 mds.-1.0  my compat 
compat={},rocompat={},incompat={1=base v0.20,2=client writeable 
ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses 
versioned encoding,6=dirfrag is stored in omap,7=mds uses inline data,8=no 
anchor table}
2015-09-21 14:12:14.321264 7ff4740c8700 10 mds.-1.0  mdsmap compat 
compat={},rocompat={},incompat={}
2015-09-21 14:12:14.321267 7ff4740c8700 10 mds.-1.-1 map says i am 
192.168.0.178:6802/24787 mds.-1.-1 state down:dne
2015-09-21 14:12:14.321272 7ff4740c8700 10 mds.-1.-1 not in map yet
2015-09-21 14:12:14.321305 7ff4740c8700  7 mds.-1.server handle_osd_map: full = 
0 epoch = 20
2015-09-21 14:12:14.321443 7ff4740c8700  5 mds.-1.-1 handle_mds_map epoch 1 
from mon.1
2015-09-21 14:12:14.321447 7ff4740c8700  5 mds.-1.-1  old map epoch 1 <= 1, 
discarding
2015-09-21 14:12:18.321061 7ff4707c0700 10 mds.beacon.storage08 _send up:boot 
seq 2
2015-09-21 14:12:19.321093 7ff470fc1700 10 MDSInternalContextBase::complete: 
N3MDS10C_MDS_TickE
2015-09-21 14:12:22.321119 7ff4707c0700 10 mds.beacon.storage08 _send up:boot 
seq 3
2015-09-21 14:12:24.321169 7ff470fc1700 10 MDSInternalContextBase::complete: 
N3MDS10C_MDS_TickE
...
-- cut -

"cheph -s" shows:

cluster 982924a3-32e7-401f-9975-018bb697d717
 health HEALTH_OK
 monmap e1: 3 mons at 
{0=192.168.0.176:6789/0,1=192.168.0.177:6789/0,2=192.168.0.178:6789/0}
election epoch 6, quorum 0,1,2 0,1,2
 osdmap e20: 3 osds: 3 up, 3 in
  pgmap v39: 64 pgs, 1 pools, 0 bytes data, 0 objects
15541 MB used, 388 GB / 403 GB avail
  64 active+clean

As you can see, the MONs and OSDs seem to be happy, but I'm missing the "mdsmap"
entry here. Trying to verify with the command "ceph mds stat" gives:

  e1: 0/0/0 up

The section of ceph.conf regarding mds reads:
  [mds]
mds data = /var/lib/ceph/mds/ceph-$id
keyring = /var/lib/ceph/mds/ceph-$id/keyring
  [mds.storage08]
host = storage08
mds addr = 192.168.0.178


Configuration:
  3 Hosts
  Gentoo Linux (kernel 4.0.5)
  Ceph 0.94.3
  All have MON and OSD
  One has MDS additionally.


Any idea how to get the MDS running?


Kind regards
  Petric

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] move/upgrade from straw to straw2

2015-09-21 Thread Stefan Priebe - Profihost AG

Am 21.09.2015 um 13:47 schrieb Wido den Hollander:
> 
> 
> On 21-09-15 13:18, Dan van der Ster wrote:
>> On Mon, Sep 21, 2015 at 12:11 PM, Wido den Hollander  wrote:
>>> You can also change 'straw_calc_version' to 2 in the CRUSHMap.
>>
>> AFAIK straw_calc_version = 1 is the optimal. straw_calc_version = 2 is
>> not defined. See src/crush/builder.c
>>
> 
> Hmm, I could almost swear that I've done that on a setup. But if you say
> so :)
> 
> 
> @Stefan, change 'straw' to 'straw2' in your CRUSHMap.

THX

> 
> Wido
> 
>> Cheers, Dan
>>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] mds not starting ?

2015-09-21 Thread John Spray
Follow the instructions here to set up a filesystem:
http://docs.ceph.com/docs/master/cephfs/createfs/

It looks like you haven't done "ceph fs new".
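
For reference, a minimal sketch of those steps (pool names and PG counts are arbitrary placeholders; see the linked docs for details):

  ceph osd pool create cephfs_data 64
  ceph osd pool create cephfs_metadata 64
  ceph fs new cephfs cephfs_metadata cephfs_data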

Cheers,
John

On Mon, Sep 21, 2015 at 1:34 PM, Frank, Petric (Petric)
 wrote:
> Hello,
>
> i'm facing a problem that mds seems not to start.
>
> I started mds in debug mode "ceph-mds -f -i storage08 --debug_mds 10" which 
> outputs in the log:
>
> -- cut -
> 2015-09-21 14:12:14.313534 7ff47983d780  0 ceph version 0.94.3 
> (95cefea9fd9ab740263bf8bb4796fd864d9afe2b), process ceph-mds, pid 24787
> starting mds.storage08 at :/0
> 2015-09-21 14:12:14.316062 7ff47983d780 10 mds.-1.0 168 MDSCacheObject
> 2015-09-21 14:12:14.316074 7ff47983d780 10 mds.-1.0 2408CInode
> 2015-09-21 14:12:14.316075 7ff47983d780 10 mds.-1.0 16   elist<>::item   
> *7=112
> 2015-09-21 14:12:14.316077 7ff47983d780 10 mds.-1.0 480  inode_t
> 2015-09-21 14:12:14.316079 7ff47983d780 10 mds.-1.0 48nest_info_t
> 2015-09-21 14:12:14.316081 7ff47983d780 10 mds.-1.0 32frag_info_t
> 2015-09-21 14:12:14.316082 7ff47983d780 10 mds.-1.0 40   SimpleLock   *5=200
> 2015-09-21 14:12:14.316083 7ff47983d780 10 mds.-1.0 48   ScatterLock  *3=144
> 2015-09-21 14:12:14.316085 7ff47983d780 10 mds.-1.0 480 CDentry
> 2015-09-21 14:12:14.316086 7ff47983d780 10 mds.-1.0 16   elist<>::item
> 2015-09-21 14:12:14.316096 7ff47983d780 10 mds.-1.0 40   SimpleLock
> 2015-09-21 14:12:14.316097 7ff47983d780 10 mds.-1.0 952 CDir
> 2015-09-21 14:12:14.316098 7ff47983d780 10 mds.-1.0 16   elist<>::item   *2=32
> 2015-09-21 14:12:14.316099 7ff47983d780 10 mds.-1.0 176  fnode_t
> 2015-09-21 14:12:14.316100 7ff47983d780 10 mds.-1.0 48nest_info_t *2
> 2015-09-21 14:12:14.316101 7ff47983d780 10 mds.-1.0 32frag_info_t *2
> 2015-09-21 14:12:14.316103 7ff47983d780 10 mds.-1.0 264 Capability
> 2015-09-21 14:12:14.316104 7ff47983d780 10 mds.-1.0 32   xlist<>::item   *2=64
> 2015-09-21 14:12:14.316665 7ff47983d780 -1 mds.-1.0 log_to_monitors 
> {default=true}
> 2015-09-21 14:12:14.320840 7ff4740c8700  7 mds.-1.server handle_osd_map: full 
> = 0 epoch = 20
> 2015-09-21 14:12:14.320984 7ff47983d780 10 mds.beacon.storage08 _send up:boot 
> seq 1
> 2015-09-21 14:12:14.321060 7ff47983d780 10 mds.-1.0 create_logger
> 2015-09-21 14:12:14.321234 7ff4740c8700  5 mds.-1.0 handle_mds_map epoch 1 
> from mon.1
> 2015-09-21 14:12:14.321256 7ff4740c8700 10 mds.-1.0  my compat 
> compat={},rocompat={},incompat={1=base v0.20,2=client writeable 
> ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds 
> uses versioned encoding,6=dirfrag is stored in omap,7=mds uses inline 
> data,8=no anchor table}
> 2015-09-21 14:12:14.321264 7ff4740c8700 10 mds.-1.0  mdsmap compat 
> compat={},rocompat={},incompat={}
> 2015-09-21 14:12:14.321267 7ff4740c8700 10 mds.-1.-1 map says i am 
> 192.168.0.178:6802/24787 mds.-1.-1 state down:dne
> 2015-09-21 14:12:14.321272 7ff4740c8700 10 mds.-1.-1 not in map yet
> 2015-09-21 14:12:14.321305 7ff4740c8700  7 mds.-1.server handle_osd_map: full 
> = 0 epoch = 20
> 2015-09-21 14:12:14.321443 7ff4740c8700  5 mds.-1.-1 handle_mds_map epoch 1 
> from mon.1
> 2015-09-21 14:12:14.321447 7ff4740c8700  5 mds.-1.-1  old map epoch 1 <= 1, 
> discarding
> 2015-09-21 14:12:18.321061 7ff4707c0700 10 mds.beacon.storage08 _send up:boot 
> seq 2
> 2015-09-21 14:12:19.321093 7ff470fc1700 10 MDSInternalContextBase::complete: 
> N3MDS10C_MDS_TickE
> 2015-09-21 14:12:22.321119 7ff4707c0700 10 mds.beacon.storage08 _send up:boot 
> seq 3
> 2015-09-21 14:12:24.321169 7ff470fc1700 10 MDSInternalContextBase::complete: 
> N3MDS10C_MDS_TickE
> ...
> -- cut -
>
> "cheph -s" shows:
>
> cluster 982924a3-32e7-401f-9975-018bb697d717
>  health HEALTH_OK
>  monmap e1: 3 mons at 
> {0=192.168.0.176:6789/0,1=192.168.0.177:6789/0,2=192.168.0.178:6789/0}
> election epoch 6, quorum 0,1,2 0,1,2
>  osdmap e20: 3 osds: 3 up, 3 in
>   pgmap v39: 64 pgs, 1 pools, 0 bytes data, 0 objects
> 15541 MB used, 388 GB / 403 GB avail
>   64 active+clean
>
> As you see MONs and OSDs seem to be happy.
> I'm missing the "mdsmap" entry here. Try to verify with the command "ceph mds 
> stat" gives:
>
>   e1: 0/0/0 up
>
> The section of ceph.conf regarding mds reads:
>   [mds]
> mds data = /var/lib/ceph/mds/ceph-$id
> keyring = /var/lib/ceph/mds/ceph-$id/keyring
>   [mds.storage08]
> host = storage08
> mds addr = 192.168.0.178
>
>
> Configuration:
>   3 Hosts
>   Gentoo Linux (kernel 4.0.5)
>   Ceph 0.94.3
>   All have MON and OSD
>   One has MDS additionally.
>
>
> Any idea for on how the MDS to get running ?
>
>
> Kind regards
>   Petric
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

Re: [ceph-users] Important security noticed regarding release signing key

2015-09-21 Thread SCHAER Frederic
Hi,

Forgive the question if the answer is obvious... It's been more than "an hour 
or so" and eu.ceph.com apparently still hasn't been re-signed or at least what 
I checked wasn't :

# rpm -qp --qf '%{RSAHEADER:pgpsig}' 
http://eu.ceph.com/rpm-hammer/el7/x86_64/ceph-0.94.3-0.el7.centos.x86_64.rpm
RSA/SHA1, Wed 26 Aug 2015 09:57:17 PM CEST, Key ID 7ebfdd5d17ed316d

Should this repository/mirror be discarded and should we (in EU) switch to 
download.ceph.com ?

Thanks && regards


-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Sage Weil
Sent: Thursday, September 17, 2015 18:30
To: ceph-annou...@ceph.com; ceph-de...@vger.kernel.org; ceph-us...@ceph.com; ceph-maintain...@ceph.com
Subject: [ceph-users] Important security noticed regarding release signing key

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Last week, Red Hat investigated an intrusion on the sites of both the Ceph 
community project (ceph.com) and Inktank (download.inktank.com), which 
were hosted on a computer system outside of Red Hat infrastructure.

Ceph.com provided Ceph community versions downloads signed with a Ceph 
signing key (id 7EBFDD5D17ED316D). Download.inktank.comprovided releases 
of the Red Hat Ceph product for Ubuntu and CentOS operating systems signed 
with an Inktank signing key (id 5438C7019DCEEEAD). While the investigation 
into the intrusion is ongoing, our initial focus was on the integrity of 
the software and distribution channel for both sites.

To date, our investigation has not discovered any compromised code or 
binaries available for download on these sites. However, we cannot fully 
rule out the possibility that some compromised code or binaries were 
available for download at some point in the past. Further, we can no 
longer trust the integrity of the Ceph signing key, and therefore have 
created a new signing key (id E84AC2C0460F3994) for verifying downloads. 
This new key is committed to the ceph.git repository and is 
also available from

https://git.ceph.com/release.asc

The new key should look like:

pub   4096R/460F3994 2015-09-15
uid  Ceph.com (release key) 

All future release git tags will be signed with this new key.

This intrusion did not affect other Ceph sites such as download.ceph.com 
(which contained some older Ceph downloads) or git.ceph.com (which mirrors 
various source repositories), and is not known to have affected any other 
Ceph community infrastructure.  There is no evidence that build system or 
the Ceph github source repository were compromised.

New hosts for ceph.com and download.ceph.com have been created and the 
sites have been rebuilt.  All content available on download.ceph.com as 
been verified, and all ceph.com URLs for package locations now redirect 
there.  There is still some content missing from download.ceph.com that 
will appear later today: source tarballs will be regenerated from git, and 
older release packages are being resigned with the new release key DNS 
changes are still propogating so you may not see the new versions of the 
ceph.com and download.ceph.com sites for another hour or so.

The download.inktank.com host has been retired and affected Red Hat 
customers have been notified, further information is available at 
https://securityblog.redhat.com/2015/09/17/.

Users of Ceph packages should take action as a precautionary measure to 
download the newly-signed versions.  Please see the instructions below.

The Ceph community would like to thank Kai Fabian for initially alerting 
us to this issue.

Any questions can be directed to the email discussion lists or the #ceph 
IRC channel on irc.oftc.net.

Thank you!
sage

- -

The following steps should be performed on all nodes with Ceph software 
installed.

Replace APT keys (Debian, Ubuntu)

sudo apt-key del 17ED316D
curl https://git.ceph.com/release.asc | sudo apt-key add -

Replace RPM keys (Fedora, CentOS, SUSE, etc.)

sudo rpm -e --allmatches gpg-pubkey-17ed316d-4fb96ee8
sudo rpm --import 'https://git.ceph.com/release.asc'

Reinstalling packages (Fedora, CentOS, SUSE, etc.)

sudo yum clean metadata
sudo yum reinstall -y $(repoquery --disablerepo= --enablerepo=ceph \
--queryformat='%{NAME}' list '*')

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] mds not starting ?

2015-09-21 Thread Frank, Petric (Petric)
Hello John,

That was the info I missed (both creating the pools and the fs). It works now.

Thank you very much.

Kind regards
  Petric

> -Original Message-
> From: John Spray [mailto:jsp...@redhat.com]
> Sent: Montag, 21. September 2015 14:41
> To: Frank, Petric (Petric)
> Cc: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] mds not starting ?
> 
> Follow the instructions here to set up a filesystem:
> http://docs.ceph.com/docs/master/cephfs/createfs/
> 
> It looks like you haven't done "ceph fs new".
> 
> Cheers,
> John
> 
> On Mon, Sep 21, 2015 at 1:34 PM, Frank, Petric (Petric)
>  wrote:
> > Hello,
> >
> > i'm facing a problem that mds seems not to start.
> >
> > I started mds in debug mode "ceph-mds -f -i storage08 --debug_mds 10"
> which outputs in the log:
> >
> > -- cut -
> > 2015-09-21 14:12:14.313534 7ff47983d780  0 ceph version 0.94.3
> > (95cefea9fd9ab740263bf8bb4796fd864d9afe2b), process ceph-mds, pid
> > 24787 starting mds.storage08 at :/0
> > 2015-09-21 14:12:14.316062 7ff47983d780 10 mds.-1.0 168
> MDSCacheObject
> > 2015-09-21 14:12:14.316074 7ff47983d780 10 mds.-1.0 2408
> CInode
> > 2015-09-21 14:12:14.316075 7ff47983d780 10 mds.-1.0 16
> elist<>::item   *7=112
> > 2015-09-21 14:12:14.316077 7ff47983d780 10 mds.-1.0 480  inode_t
> > 2015-09-21 14:12:14.316079 7ff47983d780 10 mds.-1.0 48nest_info_t
> > 2015-09-21 14:12:14.316081 7ff47983d780 10 mds.-1.0 32frag_info_t
> > 2015-09-21 14:12:14.316082 7ff47983d780 10 mds.-1.0 40   SimpleLock
> *5=200
> > 2015-09-21 14:12:14.316083 7ff47983d780 10 mds.-1.0 48   ScatterLock
> *3=144
> > 2015-09-21 14:12:14.316085 7ff47983d780 10 mds.-1.0 480 CDentry
> > 2015-09-21 14:12:14.316086 7ff47983d780 10 mds.-1.0 16
> elist<>::item
> > 2015-09-21 14:12:14.316096 7ff47983d780 10 mds.-1.0 40   SimpleLock
> > 2015-09-21 14:12:14.316097 7ff47983d780 10 mds.-1.0 952 CDir
> > 2015-09-21 14:12:14.316098 7ff47983d780 10 mds.-1.0 16
> elist<>::item   *2=32
> > 2015-09-21 14:12:14.316099 7ff47983d780 10 mds.-1.0 176  fnode_t
> > 2015-09-21 14:12:14.316100 7ff47983d780 10 mds.-1.0 48nest_info_t
> *2
> > 2015-09-21 14:12:14.316101 7ff47983d780 10 mds.-1.0 32frag_info_t
> *2
> > 2015-09-21 14:12:14.316103 7ff47983d780 10 mds.-1.0 264 Capability
> > 2015-09-21 14:12:14.316104 7ff47983d780 10 mds.-1.0 32
> xlist<>::item   *2=64
> > 2015-09-21 14:12:14.316665 7ff47983d780 -1 mds.-1.0 log_to_monitors
> > {default=true}
> > 2015-09-21 14:12:14.320840 7ff4740c8700  7 mds.-1.server
> > handle_osd_map: full = 0 epoch = 20
> > 2015-09-21 14:12:14.320984 7ff47983d780 10 mds.beacon.storage08 _send
> > up:boot seq 1
> > 2015-09-21 14:12:14.321060 7ff47983d780 10 mds.-1.0 create_logger
> > 2015-09-21 14:12:14.321234 7ff4740c8700  5 mds.-1.0 handle_mds_map
> epoch 1 from mon.1
> > 2015-09-21 14:12:14.321256 7ff4740c8700 10 mds.-1.0  my compat
> compat={},rocompat={},incompat={1=base v0.20,2=client writeable
> ranges,3=default file layouts on dirs,4=dir inode in separate
> object,5=mds uses versioned encoding,6=dirfrag is stored in omap,7=mds
> uses inline data,8=no anchor table}
> > 2015-09-21 14:12:14.321264 7ff4740c8700 10 mds.-1.0  mdsmap compat
> > compat={},rocompat={},incompat={}
> > 2015-09-21 14:12:14.321267 7ff4740c8700 10 mds.-1.-1 map says i am
> > 192.168.0.178:6802/24787 mds.-1.-1 state down:dne
> > 2015-09-21 14:12:14.321272 7ff4740c8700 10 mds.-1.-1 not in map yet
> > 2015-09-21 14:12:14.321305 7ff4740c8700  7 mds.-1.server
> > handle_osd_map: full = 0 epoch = 20
> > 2015-09-21 14:12:14.321443 7ff4740c8700  5 mds.-1.-1 handle_mds_map
> > epoch 1 from mon.1
> > 2015-09-21 14:12:14.321447 7ff4740c8700  5 mds.-1.-1  old map epoch 1
> > <= 1, discarding
> > 2015-09-21 14:12:18.321061 7ff4707c0700 10 mds.beacon.storage08 _send
> > up:boot seq 2
> > 2015-09-21 14:12:19.321093 7ff470fc1700 10
> > MDSInternalContextBase::complete: N3MDS10C_MDS_TickE
> > 2015-09-21 14:12:22.321119 7ff4707c0700 10 mds.beacon.storage08 _send
> > up:boot seq 3
> > 2015-09-21 14:12:24.321169 7ff470fc1700 10
> > MDSInternalContextBase::complete: N3MDS10C_MDS_TickE ...
> > -- cut -
> >
> > "cheph -s" shows:
> >
> > cluster 982924a3-32e7-401f-9975-018bb697d717
> >  health HEALTH_OK
> >  monmap e1: 3 mons at
> {0=192.168.0.176:6789/0,1=192.168.0.177:6789/0,2=192.168.0.178:6789/0}
> > election epoch 6, quorum 0,1,2 0,1,2
> >  osdmap e20: 3 osds: 3 up, 3 in
> >   pgmap v39: 64 pgs, 1 pools, 0 bytes data, 0 objects
> > 15541 MB used, 388 GB / 403 GB avail
> >   64 active+clean
> >
> > As you see MONs and OSDs seem to be happy.
> > I'm missing the "mdsmap" entry here. Try to verify with the command
> "ceph mds stat" gives:
> >
> >   e1: 0/0/0 up
> >
> > The section of ceph.conf regarding mds reads:
> >   [mds]
> > mds data = /var/lib/ceph/mds/ceph-$id
> > keyring = /var/lib/ceph/mds/ceph-$id/keyring
> >   [mds

[ceph-users] mds0: Client client008 failing to respond to capability release

2015-09-21 Thread Kenneth Waegeman

Hi all!

A quick question:
We are syncing data over CephFS, and we are seeing messages in our
output like:


mds0: Client client008 failing to respond to capability release

What does this mean? I can't find information about this anywhere else.

We are running ceph 9.0.3

On earlier versions, we often saw messages like 'failing to respond to
cache pressure'; is this related?


Thanks!

Kenneth
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Important security noticed regarding release signing key

2015-09-21 Thread Wido den Hollander


On 21-09-15 15:05, SCHAER Frederic wrote:
> Hi,
> 
> Forgive the question if the answer is obvious... It's been more than "an hour 
> or so" and eu.ceph.com apparently still hasn't been re-signed or at least 
> what I checked wasn't :
> 
> # rpm -qp --qf '%{RSAHEADER:pgpsig}' 
> http://eu.ceph.com/rpm-hammer/el7/x86_64/ceph-0.94.3-0.el7.centos.x86_64.rpm
> RSA/SHA1, Wed 26 Aug 2015 09:57:17 PM CEST, Key ID 7ebfdd5d17ed316d
> 
> Should this repository/mirror be discarded and should we (in EU) switch to 
> download.ceph.com ?

I fixed eu.ceph.com by putting a Varnish HTTP cache in between, which now
links to ceph.com.

You can still use eu.ceph.com and should be able to do so.

eu.ceph.com caches all traffic, so that should be much snappier than
downloading everything from download.ceph.com directly.

Wido

> 
> Thanks && regards
> 
> 
> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Sage Weil
> Sent: Thursday, September 17, 2015 18:30
> To: ceph-annou...@ceph.com; ceph-de...@vger.kernel.org; ceph-us...@ceph.com; ceph-maintain...@ceph.com
> Subject: [ceph-users] Important security noticed regarding release signing key
> 
> Last week, Red Hat investigated an intrusion on the sites of both the Ceph 
> community project (ceph.com) and Inktank (download.inktank.com), which 
> were hosted on a computer system outside of Red Hat infrastructure.
> 
> Ceph.com provided Ceph community versions downloads signed with a Ceph 
> signing key (id 7EBFDD5D17ED316D). Download.inktank.comprovided releases 
> of the Red Hat Ceph product for Ubuntu and CentOS operating systems signed 
> with an Inktank signing key (id 5438C7019DCEEEAD). While the investigation 
> into the intrusion is ongoing, our initial focus was on the integrity of 
> the software and distribution channel for both sites.
> 
> To date, our investigation has not discovered any compromised code or 
> binaries available for download on these sites. However, we cannot fully 
> rule out the possibility that some compromised code or binaries were 
> available for download at some point in the past. Further, we can no 
> longer trust the integrity of the Ceph signing key, and therefore have 
> created a new signing key (id E84AC2C0460F3994) for verifying downloads. 
> This new key is committed to the ceph.git repository and is 
> also available from
> 
>   https://git.ceph.com/release.asc
> 
> The new key should look like:
> 
> pub   4096R/460F3994 2015-09-15
> uid  Ceph.com (release key) 
> 
> All future release git tags will be signed with this new key.
> 
> This intrusion did not affect other Ceph sites such as download.ceph.com 
> (which contained some older Ceph downloads) or git.ceph.com (which mirrors 
> various source repositories), and is not known to have affected any other 
> Ceph community infrastructure.  There is no evidence that build system or 
> the Ceph github source repository were compromised.
> 
> New hosts for ceph.com and download.ceph.com have been created and the 
> sites have been rebuilt.  All content available on download.ceph.com as 
> been verified, and all ceph.com URLs for package locations now redirect 
> there.  There is still some content missing from download.ceph.com that 
> will appear later today: source tarballs will be regenerated from git, and 
> older release packages are being resigned with the new release key DNS 
> changes are still propogating so you may not see the new versions of the 
> ceph.com and download.ceph.com sites for another hour or so.
> 
> The download.inktank.com host has been retired and affected Red Hat 
> customers have been notified, further information is available at 
> https://securityblog.redhat.com/2015/09/17/.
> 
> Users of Ceph packages should take action as a precautionary measure to 
> download the newly-signed versions.  Please see the instructions below.
> 
> The Ceph community would like to thank Kai Fabian for initially alerting 
> us to this issue.
> 
> Any questions can be directed to the email discussion lists or the #ceph 
> IRC channel on irc.oftc.net.
> 
> Thank you!
> sage
> 
> -
> 
> The following steps should be performed on all nodes with Ceph software 
> installed.
> 
> Replace APT keys (Debian, Ubuntu)
> 
>   sudo apt-key del 17ED316D
>   curl https://git.ceph.com/release.asc | sudo apt-key add -
> 
> Replace RPM keys (Fedora, CentOS, SUSE, etc.)
> 
>   sudo rpm -e --allmatches gpg-pubkey-17ed316d-4fb96ee8
>   sudo rpm --import 'https://git.ceph.com/release.asc'
> 
> Reinstalling packages (Fedora, CentOS, SUSE, etc.)
> 
>   sudo yum clean metadata
>   sudo yum reinstall -y $(repoquery --disablerepo= --enablerepo=ceph \
>   --queryformat='%{NAME}' list '*')
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> __

Re: [ceph-users] Important security noticed regarding release signing key

2015-09-21 Thread Dan van der Ster
On Mon, Sep 21, 2015 at 3:50 PM, Wido den Hollander  wrote:
>
>
> On 21-09-15 15:05, SCHAER Frederic wrote:
>> Hi,
>>
>> Forgive the question if the answer is obvious... It's been more than "an 
>> hour or so" and eu.ceph.com apparently still hasn't been re-signed or at 
>> least what I checked wasn't :
>>
>> # rpm -qp --qf '%{RSAHEADER:pgpsig}' 
>> http://eu.ceph.com/rpm-hammer/el7/x86_64/ceph-0.94.3-0.el7.centos.x86_64.rpm
>> RSA/SHA1, Wed 26 Aug 2015 09:57:17 PM CEST, Key ID 7ebfdd5d17ed316d
>>
>> Should this repository/mirror be discarded and should we (in EU) switch to 
>> download.ceph.com ?
>
> I fixed eu.ceph.com by putting a Varnish HTTP cache in between which now
> links to ceph.com
>
> You can still use eu.ceph.com and should be able to do so.
>
> eu.ceph.com caches all traffic so that should be much snappier then
> downloading everything from download.ceph.com directly.
>

Thanks Wido for running this mirror/cache!

It's a pity though now that rsync'ing eu.ceph.com won't work. (And I
guess you should close eu.ceph.com's rsync doors now because it still
serves up the old packages).

Cheers, Dan
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] EU Ceph mirror changes

2015-09-21 Thread Wido den Hollander
Hi,

Since the security notice regarding ceph.com, the mirroring system has been
broken. This meant that eu.ceph.com didn't serve new packages, since the whole
download system changed.

I didn't have much time to fix this, but today I resolved it by
installing Varnish [0] on eu.ceph.com

The VCL which is being used on eu.ceph.com can be found on my Github
gist page [1].

All URLs to eu.ceph.com should still work and data is served from the EU.

rsync is however no longer available since all data is stored in memory
and is downloaded from 'download.ceph.com' when not available locally.

It's not an ideal mirroring system right now, but it still works. If you
have multiple machines downloading the same package, the first request
for an RPM / DEB might be a cache miss, which requires a download from the
US, but afterwards the other requests should be served from cache.

The VCL [1] I use can also be used locally on your own mirror. Feel free
to use it. I'll try to keep the one on Github as up to date as possible.

Wido

[0]: http://www.varnish-cache.org/
[1]: https://gist.github.com/wido/40ffb92ea99842c2666b
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Important security noticed regarding release signing key

2015-09-21 Thread Wido den Hollander


On 21-09-15 15:57, Dan van der Ster wrote:
> On Mon, Sep 21, 2015 at 3:50 PM, Wido den Hollander  wrote:
>>
>>
>> On 21-09-15 15:05, SCHAER Frederic wrote:
>>> Hi,
>>>
>>> Forgive the question if the answer is obvious... It's been more than "an 
>>> hour or so" and eu.ceph.com apparently still hasn't been re-signed or at 
>>> least what I checked wasn't :
>>>
>>> # rpm -qp --qf '%{RSAHEADER:pgpsig}' 
>>> http://eu.ceph.com/rpm-hammer/el7/x86_64/ceph-0.94.3-0.el7.centos.x86_64.rpm
>>> RSA/SHA1, Wed 26 Aug 2015 09:57:17 PM CEST, Key ID 7ebfdd5d17ed316d
>>>
>>> Should this repository/mirror be discarded and should we (in EU) switch to 
>>> download.ceph.com ?
>>
>> I fixed eu.ceph.com by putting a Varnish HTTP cache in between which now
>> links to ceph.com
>>
>> You can still use eu.ceph.com and should be able to do so.
>>
>> eu.ceph.com caches all traffic so that should be much snappier then
>> downloading everything from download.ceph.com directly.
>>
> 
> Thanks Wido for running this mirror/cache!
> 

You're welcome!

> It's a pity though now that rsync'ing eu.ceph.com won't work. (And I
> guess you should close eu.ceph.com's rsync doors now because it still
> serves up the old packages).

I just did. It's only serving HTTP now. The active data should all be in
cache now, so downloads should be fast.

I personally love Varnish, great reverse HTTP proxy.

Wido

> 
> Cheers, Dan
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] multi-datacenter crush map

2015-09-21 Thread Robert LeBlanc
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256




On Mon, Sep 21, 2015 at 3:02 AM, Wouter De Borger  wrote:
> Thank you for your answer! We will use size=4 and min_size=2, which should
> do the trick.
>
> For the monitor issue, we have a third datacenter (with higher latency, but
> that shouldn't be a problem for the monitors)

Just be sure that the IP address in the third DC is higher than the
other two. Ceph chooses the lowest IP address as the primary monitor.
Having the monitor with the highest latency as your primary is sure to
give you grief.

> We had also considered the locality issue. Our WAN round trip latency is 1.5
> ms (now) and we should get a dedicated light path in the near future (<0.1
> ms).
> So we hope to get acceptable latency without additional tweaking.
> The plan B is to make two pools, with different weights for the different
> DC's. VM's in DC 1 will get high weight for DC 1, VM's in DC 2 will get high
> weight for DC 2.

You don't want to mess with weights, or else you will get stuck PGs
because there isn't enough storage at the remote DC to hold all your
data, the local DC fills up faster, or CRUSH can't figure out how to
distribute balanced data in an unbalanced way. You will probably want
to look at primary affinity instead. This post talks about SSDs, but the
same principle applies:
http://www.sebastien-han.fr/blog/2015/08/06/ceph-get-the-best-of-your-ssd-with-primary-affinity/
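
For example, a sketch of lowering the primary affinity of an OSD in the remote DC so it is less likely to be chosen as primary (the OSD id and value are placeholders; this assumes the monitors permit it via the mon_osd_allow_primary_affinity option):

  # make osd.12 less likely to be selected as the primary for its PGs
  ceph osd primary-affinity osd.12 0.25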

- 
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] change ruleset with data

2015-09-21 Thread Robert LeBlanc
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

I think you will be OK, but you should double check on a test cluster.
You should be able to revert the rulesets if the data isn't found.
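
If it helps, the switch itself is a single command along these lines (the pool name and ruleset id are placeholders); expect data movement while PGs are remapped to the new ruleset:

  # point the pool at CRUSH ruleset 1
  ceph osd pool set mypool crush_ruleset 1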

- 
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1

On Mon, Sep 21, 2015 at 5:42 AM, Xiangyu (Raijin, BP&IT Dept)  wrote:
I have a ceph cluster set with two ruleset, each ruleset will select
different osd, if can change ruleset for a pool online drectly(there
have data exist on the pool) ?



___
ceph-users mailing list
ceph-us...@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] snapshot failed after enable cache tier

2015-09-21 Thread Robert LeBlanc
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

If I recall correctly, there were a bug or two found with cache tiers
and snapshots that have since been fixed. I hope the fixes are being
backported to Hammer. I don't know if they exactly address your issue.

- 
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1

On Mon, Sep 21, 2015 at 6:09 AM, Andrija Panic  wrote:
Hi,

depending on cache mode etc - from what we have also experienced
(using CloudStack) - CEPH snapshot functionality simply stops working
in some cache configuration.
This means, we were also unable to deploy new VMs (base-gold snapshot
is created on CEPH and new data disk which is child of snapshot etc).

Inktank:https://download.inktank.com/docs/ICE%201.2%20-%20Cache%20and%20Erasure%20Coding%20FAQ.pdf

Mail-list:https://www.mail-archive.com/ceph-users@lists.ceph.com/msg18338.html


On 21 September 2015 at 13:36, Xiangyu (Raijin, BP&IT Dept)  wrote:
Openstack Kilo use ceph as the backend storage (nova,cinder and
glance),after enable cache tier for glance pool, take snapshot for
instance failed (it seems generate the snapshot then delete it
automatically soon)

If cache tier not suitable for glance ?



___
ceph-users mailing list
ceph-us...@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




- --

Andrija Panić

___
ceph-users mailing list
ceph-us...@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Important security noticed regarding release signing key

2015-09-21 Thread SCHAER Frederic
-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Wido den Hollander
Sent: Monday, September 21, 2015 15:50
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Important security noticed regarding release signing key



On 21-09-15 15:05, SCHAER Frederic wrote:
> Hi,
> 
> Forgive the question if the answer is obvious... It's been more than "an hour 
> or so" and eu.ceph.com apparently still hasn't been re-signed or at least 
> what I checked wasn't :
> 
> # rpm -qp --qf '%{RSAHEADER:pgpsig}' 
> http://eu.ceph.com/rpm-hammer/el7/x86_64/ceph-0.94.3-0.el7.centos.x86_64.rpm
> RSA/SHA1, Wed 26 Aug 2015 09:57:17 PM CEST, Key ID 7ebfdd5d17ed316d
> 
> Should this repository/mirror be discarded and should we (in EU) switch to 
> download.ceph.com ?

I fixed eu.ceph.com by putting a Varnish HTTP cache in between which now
links to ceph.com

You can still use eu.ceph.com and should be able to do so.

eu.ceph.com caches all traffic, so that should be much snappier than
downloading everything from download.ceph.com directly.

Wido

[>- FS : -<] Many thanks for your quick reply and quick reaction !

Frederic 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] mds0: Client client008 failing to respond to capability release

2015-09-21 Thread John Spray
On Mon, Sep 21, 2015 at 2:33 PM, Kenneth Waegeman
 wrote:
> Hi all!
>
> A quick question:
> We are syncing data over cephfs , and we are seeing messages in our output
> like:
>
> mds0: Client client008 failing to respond to capability release
>
> What does this mean? I don't find information about this somewhere else.

It means the MDS thinks that client is exhibiting a buggy behaviour in
failing to respond to requests to release resources.

> We are running ceph 9.0.3
>
> On earlier versions, we often saw messages like 'failing to respond to cache
> pressure',  is this related?

They're two different health checks that both indicate potential
problems with the clients.

What version of ceph-fuse or kernel client is in use?
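
For reference, the client versions can be checked with something like:

  ceph-fuse --version   # for ceph-fuse clients
  uname -r              # kernel version, for kernel clients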

John
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Software Raid 1 for system disks on storage nodes (not for OSD disks)

2015-09-21 Thread Robert LeBlanc
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Ceph-disk (which ceph-deploy uses) partitions OSDs with GPT so that
they can be started automatically by udev and can reference journal
partitions by unique identifiers. The data needed to start the
OSD (the auth key, fsid, etc.) is stored on the OSD file system. This
allows you to move OSD disks between nodes in a Ceph cluster. As long
as you don't reformat the OSD drives (and any journals), then if you
reimage the host, install Ceph, copy the ceph.conf and the OSD
bootstrap keys, then it will act as if you have just moved the disks
between servers. We have reformatted the OS of one of the nodes last
week and the OSDs survived and rejoined the cluster after Puppet laid
down the configuration.

If you have journals on SSD and you try to move a spindle, then you
have some more work to do. You either have to move the SSD as well
(and any other spindles that have journals on it), or flush the
journal and create a new one on the destination host.
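
A rough sketch of the SSD-journal case when moving a single spindle (the OSD id is a placeholder; the OSD must be stopped first):

  # on the old host, with osd.7 stopped: flush its journal
  ceph-osd -i 7 --flush-journal
  # move the data disk, then on the new host create a fresh journal
  ceph-osd -i 7 --mkjournal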

If you are using dm-crypt, you also need to save the encryption key as
it is not on the OSD FS for obvious reasons.
- 
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1


On Mon, Sep 21, 2015 at 3:18 AM, Vickey Singh  wrote:
>
>
> On Fri, Sep 18, 2015 at 6:33 PM, Robert LeBlanc
> wrote:
>>
>> -BEGIN PGP SIGNED MESSAGE-
>> Hash: SHA256
>>
>> Depends on how easy it is to rebuild an OS from scratch. If you have
>> something like Puppet or Chef that configure a node completely for
>> you, it may not be too much of a pain to forgo the RAID. We run our
>> OSD nodes from a single SATADOM and use Puppet for configuration. We
>> also don't use swap (not very effective on SATADOM), but have enough
>> RAM that we feel comfortable enough with that decision.
>>
>> If you use ceph-disk or ceph-deploy to configure the OSDs, then they
>> should automatically come back up when you lay down the new OS and set
>> up the necessary ceph config items (ceph.conf and the OSD bootstrap
>> keys).
>
>
> Hello sir
>
> This sounds really interesting , could you please elaborate how after
> reinstalling OS and installing Ceph packages, how does Ceph detects OSD's
> that were hosted earlier on this node.
>
> I am using ceph-deploy to provision ceph , now what all changes i need to do
> after reinstalling OS of a OSD node. So that it should detect my OSD
> daemons. Please help me to know this step by step.
>
> Thanks in advance.
>
> Vickey
>
>
>>
>> - 
>> Robert LeBlanc
>> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
>>
>>
>> On Fri, Sep 18, 2015 at 9:06 AM, Martin Palma  wrote:
>> > Hi,
>> >
>> > Is it a good idea to use a software raid for the system disk (Operating
>> > System) on a Ceph storage node? I mean only for the OS not for the OSD
>> > disks.
>> >
>> > And what about a swap partition? Is that needed?
>> >
>> > Best,
>> > Martin
>> >
>> > ___
>> > ceph-users mailing list
>> > ceph-users@lists.ceph.com
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >
>>
>> -BEGIN PGP SIGNATURE-
>> Version: Mailvelope v1.1.0
>> Comment: https://www.mailvelope.com
>>
>> wsFcBAEBCAAQBQJV/C7UCRDmVDuy+mK58QAAoTMQAMZBv4/lphmntC23b9/l
>> JWUPjZfbXUtNgnfMvWcVyTSXsTtM5mY/4/iSZ4ZfCQ4YyqWWMpSlocHONHFz
>> nFTtGupqV3vPCo4X8bl58/iv4J0H2iWUr2klk7jtTj+e+JjyWDo25l8V2ofP
>> edt5g7qcMAwiWYrrpjxQBK4AFNiPJKSMxrzK1Mgic15nwX0OJu0DDNS5twzZ
>> s8Y+UfS80+hZvyBTUGhsO8pkYoJQvYRGgyqYtCdxA+m1T8lWVe8SC0eLWOXy
>> xoyGR7dqcvEXQadrqfmU618eNpNEECPoHeIkeCqpTohrUVsyRcfSGAtfM0YY
>> Ixf2SCaDMAaRwvXGJUf5OP/3HHWps0m4YyLBOddPZ5XZb1utZiclh26KuOyw
>> QdGkP7uoYEMO0v40dcsIbOVhtgTdX+HrpEGuqEtNEGe194sS1nluw+49aLxe
>> eozHSRGq3GmRm/q3bR5f2p+WXwKqmdDRFhqII8H11bb5F7etU2PBo1JA2bTW
>> hUFqu6+ST8eI34OeC7LbC9Txfw/iUhL62kiCm+gj8Rg+m+TZ7a1HEaVc8uyq
>> Jw1+5hIgyTWFvKdIiW65k++8w9my6kUIsY8RT8p08DTSPzxuwGtHr7UJJ629
>> K/tlpGdQTRf7PXgmea6sSodnmaF5HRIUdU0nhQpRRxjX/V+PENI8Qq45KyfX
>> BovV
>> =Gzvl
>> -END PGP SIGNATURE-
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>

-BEGIN PGP SIGNATURE-
Version: Mailvelope v1.1.0
Comment: https://www.mailvelope.com

wsFcBAEBCAAQBQJWACdUCRDmVDuy+mK58QAAMFsP/3CdU8PvnbWYpT0uur5/
iUCEixJL3ChKY2RSODraR1FIUqFDannAzw/njafy21ZhzfY6YKtWfajE2qiC
vNr1Y7bkZ711lhn8XRE4zYTBAzOZWkGEfE9DV2eYaEXB2ZbktJcJkEW8ykOG
ynyDkKCkilROhyQTdiIUEx7IH12OJFUB3jjXIKdWtiMizZxDf/CnsXw6RwpY
k9diX0FnZ0tcDprKc5GkT7ZNHLPNPHr39cG+0fwsmjA/bgzoh2GL4TytGON6
f6wqBR9Msb+qIGtSLd2AytJYhtwQoaMLAvmJc/uoplGbArU4yQwan5XIsMww
jp5jK9z6uEBPHvMTOQjP8dp1eZoNE+PMrRssDsW28nLhr9TPohbOpjPyNVs7
/bpoEgZmmkKq3w8TWbkErS7O2ibNHSdKCjq0JISe0Qg5s58mcCLmiNj3OxC2
qogTDOVy9yP0VPKVdLa4XXNXYs5LMYHl6+3d6mtBWI01dVVIjO30Yc0mfvi4
pfuI1u5Fz8geIjcHcz/ruz51RgyEOeL2eEQsb0s/M7w6jPDhRQrZ42JS69Mg
0o48HXkxJRjy2kBd+h6sXd2wXgQwKTsse6Fy+cKU5xCuGYuT/RS6SxjKadqp
E4Oswi

Re: [ceph-users] debian repositories path change?

2015-09-21 Thread Ken Dreyer
On Sat, Sep 19, 2015 at 7:54 PM, Lindsay Mathieson
 wrote:
> I'm getting:
>
>   W: GPG error: http://download.ceph.com wheezy Release: The following
> signatures couldn't be verified because the public key is not available:
> NO_PUBKEY E84AC2C0460F3994
>
>
> Trying to update from there

Hi Lindsay, did you add the new release key?

As described at
http://ceph.com/releases/important-security-notice-regarding-signing-key-and-binary-downloads-of-ceph/

  sudo apt-key del 17ED316D
  curl https://git.ceph.com/release.asc | sudo apt-key add -
  sudo apt-get update

- Ken
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] lttng duplicate registration problem when using librados2 and libradosstriper

2015-09-21 Thread Jason Dillaman
This is usually indicative of the same tracepoint event being included by both 
a static and dynamic library.  See the following thread regarding this issue 
within Ceph when LTTng-ust was first integrated [1].  Since I don't have any 
insight into your application, are you somehow linking against Ceph static 
libraries?

[1] http://comments.gmane.org/gmane.comp.file-systems.ceph.devel/20353
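A rough way to check for an embedded static copy (purely a sketch;
"yourprog" is a placeholder and the symbol names vary between LTTng-UST
versions, so treat any hits as a hint rather than proof):

  ldd ./yourprog | grep -Ei 'rados|lttng'     # shared libraries actually pulled in
  nm -C ./yourprog | grep -i tracepoint       # probe symbols compiled into the binary itself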

- Original Message -
> From: "Nick Fisk" 
> To: "Paul Mansfield" , 
> ceph-users@lists.ceph.com
> Sent: Saturday, September 19, 2015 3:10:02 AM
> Subject: Re: [ceph-users] lttng duplicate registration problem when using 
> librados2 and libradosstriper
> 
> Hi Paul,
> 
> I hit the same problem here (see last post):
> 
> https://groups.google.com/forum/#!topic/bareos-users/mEzJ7IbDxvA
> 
> If I ever get to the bottom of it, I will let you know. Sorry I can't be of
> any more help.
> 
> Nick
> 
> > -Original Message-
> > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
> > Paul Mansfield
> > Sent: 18 September 2015 17:16
> > To: ceph-users@lists.ceph.com
> > Subject: [ceph-users] lttng duplicate registration problem when using
> > librados2 and libradosstriper
> > 
> > Hello,
> > thanks for your attention.
> > 
> > I have started using rados striper library, calling the functions from a C
> > program.
> > 
> > As soon as I add libradosstriper to the linking process, I get this error
> > when the program runs, even though I am not calling any functions from the
> > rados striper library (I commented them out).
> > 
> > LTTng-UST: Error (-17) while registering tracepoint probe. Duplicate
> > registration of tracepoint probes having the same name is not allowed.
> > /bin/sh: line 1: 61001 Aborted (core dumped) ./$test
> > 
> > 
> > I had been using lttng in my program but removed it to ensure it wasn't
> > causing the problem.
> > 
> > I have tried running the program using gdb but the calls to initialise
> > lttng occur before main() is called and so I cannot add a break point to
> > see what is happening.
> > 
> > 
> > thanks
> > Paul
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
> 
> 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] mds0: Client client008 failing to respond to capability release

2015-09-21 Thread Kenneth Waegeman



On 21/09/15 16:32, John Spray wrote:
> On Mon, Sep 21, 2015 at 2:33 PM, Kenneth Waegeman
>  wrote:
>> Hi all!
>>
>> A quick question:
>> We are syncing data over cephfs , and we are seeing messages in our output
>> like:
>>
>> mds0: Client client008 failing to respond to capability release
>>
>> What does this mean? I don't find information about this somewhere else.
>
> It means the MDS thinks that client is exhibiting a buggy behaviour in
> failing to respond to requests to release resources.
>
>> We are running ceph 9.0.3
>>
>> On earlier versions, we often saw messages like 'failing to respond to cache
>> pressure',  is this related?
>
> They're two different health checks that both indicate potential
> problems with the clients.
>
> What version of ceph-fuse or kernel client is in use?

ceph-fuse is used, also 9.0.3

> John


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Uneven data distribution across OSDs

2015-09-21 Thread Andras Pataki
Hi ceph users,

I am using CephFS for file storage and I have noticed that the data gets 
distributed very unevenly across OSDs.

  *   I have about 90 OSDs across 8 hosts, and 4096 PGs for the cephfs_data 
pool with 2 replicas, which is in line with the total PG recommendation of 
"Total PGs = (OSDs * 100) / pool_size" from the docs.
  *   CephFS distributes the data pretty much evenly across the PGs as shown by 
'ceph pg dump'
  *   However - the number of PGs assigned to various OSDs (per weight 
unit/terabyte) varies quite a lot.  The fullest OSD has as many as 44 PGs per 
terabyte (weight unit), while the emptier ones have as few as 19 or 20.
  *   Even if I consider the total number of PGs for all pools per OSD, the 
number varies similarly wildly (as with the cephfs_data pool only).

As a result, when the whole CephFS file system is at 60% full, some of the OSDs 
already reach the 95% full condition, and no more data can be written to the 
system.
Is there any way to force a more even distribution of PGs to OSDs?  I am using 
the default crush map, with two levels (root/host).  Can any changes to the 
crush map help?  I would really like to get higher disk utilization than 60% 
without 1 of 90 disks filling up so early.

Thanks,

Andras

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Uneven data distribution across OSDs

2015-09-21 Thread Michael Hackett
Hello Andras,

Some initial observations and questions: 

The total PG recommendation for this cluster would actually be 8192 PGs per the 
formula. 

Total PG's = (90 * 100) / 2 = 4500 

Next power of 2 = 8192. 

The result should be rounded up to the nearest power of two. Rounding up is 
optional, but recommended for CRUSH to evenly balance the number of objects 
among placement groups.
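To make that concrete (the pool name below assumes the data really lives
in cephfs_data; also note that raising pg_num/pgp_num moves a lot of
data, so increase it in steps on a loaded cluster):

  # (90 OSDs * 100) / 2 replicas = 4500 -> next power of two = 8192
  ceph osd pool set cephfs_data pg_num 8192
  ceph osd pool set cephfs_data pgp_num 8192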

How many data pools are being used for storing objects?

'ceph osd dump |grep pool'

Also how are these 90 OSD's laid out across the 8 hosts and is there any 
discrepancy between disk sizes and weight?

'ceph osd tree'

Also what are you using for CRUSH tunables and what Ceph release?

'ceph osd crush show-tunables'
'ceph -v'

Thanks,

- Original Message -
From: "Andras Pataki" 
To: ceph-users@lists.ceph.com
Sent: Monday, September 21, 2015 2:00:29 PM
Subject: [ceph-users] Uneven data distribution across OSDs

Hi ceph users, 

I am using CephFS for file storage and I have noticed that the data gets 
distributed very unevenly across OSDs. 


* I have about 90 OSDs across 8 hosts, and 4096 PGs for the cephfs_data 
pool with 2 replicas, which is in line with the total PG recommendation if 
“Total PGs = (OSDs * 100) / pool_size” from the docs. 
* CephFS distributes the data pretty much evenly across the PGs as shown by 
‘ceph pg dump’ 
* However – the number of PGs assigned to various OSDs (per weight 
unit/terabyte) varies quite a lot. The fullest OSD has as many as 44 PGs per 
terabyte (weight unit), while the emptier ones have as few as 19 or 20. 
* Even if I consider the total number of PGs for all pools per OSD, the 
number varies similarly wildly (as with the cephfs_data pool only). 
As a result, when the whole CephFS file system is at 60% full, some of the OSDs 
already reach the 95% full condition, and no more data can be written to the 
system. 
Is there any way to force a more even distribution of PGs to OSDs? I am using 
the default crush map, with two levels (root/host). Can any changes to the 
crush map help? I would really like to be get higher disk utilization than 60% 
without 1 of 90 disks filling up so early. 

Thanks, 

Andras 


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

-- 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Uneven data distribution across OSDs

2015-09-21 Thread Andras Pataki
Hi Michael,

I could certainly double the total PG count, and it probably will
reduce the discrepancies somewhat, but I wonder if it would be all that
different.  I could of course be very wrong.

ceph osd dump |grep pool output:


pool 0 'rbd' replicated size 2 min_size 1 crush_ruleset 0 object_hash
rjenkins pg_num 64 pgp_num 64 last_change 1 flags hashpspool stripe_width 0
pool 1 'data' replicated size 2 min_size 1 crush_ruleset 0 object_hash
rjenkins pg_num 512 pgp_num 512 last_change 3347 flags hashpspool
stripe_width 0
pool 2 'cephfs_data' replicated size 2 min_size 1 crush_ruleset 0
object_hash rjenkins pg_num 4096 pgp_num 4096 last_change 4288 flags
hashpspool crash_replay_interval 45 stripe_width 0
pool 3 'cephfs_metadata' replicated size 2 min_size 1 crush_ruleset 0
object_hash rjenkins pg_num 512 pgp_num 512 last_change 3349 flags
hashpspool stripe_width 0

Only pool 2 has a significant amount of data in it (99.9% of the data is
there) (from ceph df):

POOLS:
NAME            ID USED   %USED MAX AVAIL OBJECTS
rbd             0       0     0    18234G        0
data            1       0     0    18234G        0
cephfs_data     2  92083G 29.70    18234G 24155210
cephfs_metadata 3  60839k     0    18234G    36448

As for disk sizes, yes, there are discrepancies: we have 1TB, 2TB and 6TB
disks on various hosts (7 hosts, not 8 as I said before).  There are two
exceptions (osd.84, whose weight I reduced because it filled up, and osd.57,
which is a 5TB partition of a 6TB disk).  All others are just the three disk
sizes.  The weights were set automatically at installation.  The OSD tree:


 -1 295.55994 root default
 -6  21.84000 host scda002
  0   0.90999 osd.0   up  1.0  1.0
 10   0.90999 osd.10  up  1.0  1.0
 11   0.90999 osd.11  up  1.0  1.0
 12   0.90999 osd.12  up  1.0  1.0
 13   0.90999 osd.13  up  1.0  1.0
 14   0.90999 osd.14  up  1.0  1.0
 15   0.90999 osd.15  up  1.0  1.0
 16   0.90999 osd.16  up  1.0  1.0
 32   1.81999 osd.32  up  1.0  1.0
 33   1.81999 osd.33  up  1.0  1.0
 34   1.81999 osd.34  up  1.0  1.0
 35   1.81999 osd.35  up  1.0  1.0
 36   1.81999 osd.36  up  1.0  1.0
 37   1.81999 osd.37  up  1.0  1.0
 38   1.81999 osd.38  up  1.0  1.0
 39   1.81999 osd.39  up  1.0  1.0
 -3  29.01999 host scda006
 84   0.81000 osd.84  up  1.0  1.0
 85   0.90999 osd.85  up  1.0  1.0
 86   0.90999 osd.86  up  1.0  1.0
 87   0.90999 osd.87  up  1.0  1.0
 88   0.90999 osd.88  up  1.0  1.0
 89   0.90999 osd.89  up  1.0  1.0
 90   0.90999 osd.90  up  1.0  1.0
 91   0.90999 osd.91  up  1.0  1.0
  9   0.90999 osd.9   up  1.0  1.0
 17   0.90999 osd.17  up  1.0  1.0
 18   0.90999 osd.18  up  1.0  1.0
 19   0.90999 osd.19  up  1.0  1.0
 20   0.90999 osd.20  up  1.0  1.0
 21   0.90999 osd.21  up  1.0  1.0
 22   0.90999 osd.22  up  1.0  1.0
 23   0.90999 osd.23  up  1.0  1.0
 49   1.81999 osd.49  up  1.0  1.0
 50   1.81999 osd.50  up  1.0  1.0
 51   1.81999 osd.51  up  1.0  1.0
 52   1.81999 osd.52  up  1.0  1.0
 53   1.81999 osd.53  up  1.0  1.0
 54   1.81999 osd.54  up  1.0  1.0
 55   1.81999 osd.55  up  1.0  1.0
 56   1.81999 osd.56  up  1.0  1.0
 -2  70.98000 host scda005
 79   5.45999 osd.79  up  1.0  1.0
 80   5.45999 osd.80  up  1.0  1.0
 81   5.45999 osd.81  up  1.0  1.0
 82   5.45999 osd.82  up  1.0  1.0
 83   5.45999 osd.83  

[ceph-users] Client Local SSD Caching

2015-09-21 Thread Lazuardi Nasution
Hi,

I'm looking for a recommended client-local SSD caching strategy on OpenStack
Compute nodes where the backend storage is a Ceph cluster. The goal is to
reduce Compute-to-Storage traffic. I am not aware that librbd supports
local SSD caching. Besides, I'm not sure whether block-level SSD caching of a
locally mounted RBD volume (VM images/blocks as local files) will be good
enough when more than one Compute node accesses the same RBD volume. Any ideas?

Best regards,
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] CephFS Fuse Issue

2015-09-21 Thread Scottix
I was rsyncing files to ceph from an older machine and I ran into a
ceph-fuse crash.

OpenSUSE 12.1, 3.1.10-1.29-desktop
ceph-fuse 0.94.3

The rsync was running for about 48 hours then crashed somewhere along the
way.

I added the log and can run more if you like. I am not sure how to
reproduce it easily except by running it again, which will take a while
to show any results.


ceph.log.bz2
Description: application/bzip
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Receiving "failed to parse date for auth header"

2015-09-21 Thread Jens Hadlich

Am 04.09.2015 um 11:42 schrieb Ramon Marco Navarro:

Good day everyone!

I'm having a problem using aws-java-sdk to connect to Ceph using
radosgw. I am reading a " NOTICE: failed to parse date for auth header"
message in the logs. HTTP_DATE is "Fri, 04 Sep 2015 09:25:33 +00:00",
which is I think a valid rfc 1123 date...

Here's a link to the related lines in the log file:
https://gist.github.com/ramonmaruko/96e841167eda907f768b

Thank you for any help in advance!


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



I had the same problem today. The date has to be RFC 822 (no colons in the 
timezone offset), but the pattern ("EEE, dd MMM yyyy HH:mm:ss z") at 
com.amazonaws.util.DateUtils#L55 has a "z" which can give "GMT" but also 
"+00:00". I haven't yet figured out why I always get "+00:00" with my 
recent project but not with others (same Java 8, same SDK version) ...


I created a ticket and provided a fix: 
https://github.com/aws/aws-sdk-java/issues/514 because to my 
understanding it should always be GMT.


Cheers,
Jens


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Potential OSD deadlock?

2015-09-21 Thread Gregory Farnum
So it sounds like you've got two different things here:
1) You get a lot of slow operations that show up as warnings.

2) Rarely, you get blocked op warnings that don't seem to go away
until the cluster state changes somehow.

(2) is the interesting one. Since you say the cluster is under heavy
load, I presume (1) is just you overloading your servers and getting
some hot spots that take time to clear up.

There have been bugs in the past where slow op warnings weren't
getting removed when they should have. I don't *think* any are in
.94.3 but could be wrong. Have you observed these from the other
direction, where a client has blocked operations?
If you want to go through the logs yourself, you should try and find
all the lines about one of the operations which seems to be blocked.
They aren't the most readable but if you grep for the operation ID
(client.4267090.0:3510311) and then once you're in the right area look
for what the threads processing it are doing you should get some idea
of where things are going wrong you can share.
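For example (just a sketch; the op ID is the one quoted above, and the log
path is wherever you keep the OSD log):

  grep 'client.4267090.0:3510311' /var/log/ceph/ceph-osd.112.log
  # note the thread ids (the hex field right after the timestamp) on those
  # lines, then pull up everything those threads were doing around that time:
  grep '<thread id from the lines above>' /var/log/ceph/ceph-osd.112.log | less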
-Greg

On Sun, Sep 20, 2015 at 10:43 PM, Robert LeBlanc  wrote:
> We set the logging on an OSD that had problems pretty frequently, but
> cleared up in less than 30 seconds. The logs are at
> http://162.144.87.113/files/ceph-osd.112.log.xz and are uncompressed
> at 8.6GB. Some of the messages we were seeing in ceph -w are:
>
> 2015-09-20 20:55:44.029041 osd.112 [WRN] 10 slow requests, 10 included
> below; oldest blocked for > 30.132696 secs
> 2015-09-20 20:55:44.029047 osd.112 [WRN] slow request 30.132696
> seconds old, received at 2015-09-20 20:55:13.896286:
> osd_op(client.3289538.0:62497509
> rbd_data.29b9ae3f960770.0200 [stat,set-alloc-hint
> object_size 8388608 write_size 8388608,write 2588672~4096] 17.118f0c67
> ack+ondisk+write+known_if_redirected e57590) currently reached_pg
> 2015-09-20 20:55:44.029051 osd.112 [WRN] slow request 30.132619
> seconds old, received at 2015-09-20 20:55:13.896363:
> osd_op(client.3289538.0:62497510
> rbd_data.29b9ae3f960770.0200 [stat,set-alloc-hint
> object_size 8388608 write_size 8388608,write 2908160~12288]
> 17.118f0c67 ack+ondisk+write+known_if_redirected e57590) currently
> waiting for rw locks
> 2015-09-20 20:55:44.029054 osd.112 [WRN] slow request 30.132520
> seconds old, received at 2015-09-20 20:55:13.896462:
> osd_op(client.3289538.0:62497511
> rbd_data.29b9ae3f960770.0200 [stat,set-alloc-hint
> object_size 8388608 write_size 8388608,write 2949120~4096] 17.118f0c67
> ack+ondisk+write+known_if_redirected e57590) currently waiting for rw
> locks
> 2015-09-20 20:55:44.029058 osd.112 [WRN] slow request 30.132415
> seconds old, received at 2015-09-20 20:55:13.896567:
> osd_op(client.3289538.0:62497512
> rbd_data.29b9ae3f960770.0200 [stat,set-alloc-hint
> object_size 8388608 write_size 8388608,write 2957312~4096] 17.118f0c67
> ack+ondisk+write+known_if_redirected e57590) currently waiting for rw
> locks
> 2015-09-20 20:55:44.029061 osd.112 [WRN] slow request 30.132302
> seconds old, received at 2015-09-20 20:55:13.896680:
> osd_op(client.3289538.0:62497513
> rbd_data.29b9ae3f960770.0200 [stat,set-alloc-hint
> object_size 8388608 write_size 8388608,write 2998272~4096] 17.118f0c67
> ack+ondisk+write+known_if_redirected e57590) currently waiting for rw
> locks
> 2015-09-20 20:55:45.029290 osd.112 [WRN] 9 slow requests, 5 included
> below; oldest blocked for > 31.132843 secs
> 2015-09-20 20:55:45.029298 osd.112 [WRN] slow request 31.132447
> seconds old, received at 2015-09-20 20:55:13.896759:
> osd_op(client.3289538.0:62497514
> rbd_data.29b9ae3f960770.0200 [stat,set-alloc-hint
> object_size 8388608 write_size 8388608,write 3035136~4096] 17.118f0c67
> ack+ondisk+write+known_if_redirected e57590) currently waiting for rw
> locks
> 2015-09-20 20:55:45.029303 osd.112 [WRN] slow request 31.132362
> seconds old, received at 2015-09-20 20:55:13.896845:
> osd_op(client.3289538.0:62497515
> rbd_data.29b9ae3f960770.0200 [stat,set-alloc-hint
> object_size 8388608 write_size 8388608,write 3047424~4096] 17.118f0c67
> ack+ondisk+write+known_if_redirected e57590) currently waiting for rw
> locks
> 2015-09-20 20:55:45.029309 osd.112 [WRN] slow request 31.132276
> seconds old, received at 2015-09-20 20:55:13.896931:
> osd_op(client.3289538.0:62497516
> rbd_data.29b9ae3f960770.0200 [stat,set-alloc-hint
> object_size 8388608 write_size 8388608,write 3072000~4096] 17.118f0c67
> ack+ondisk+write+known_if_redirected e57590) currently waiting for rw
> locks
> 2015-09-20 20:55:45.029315 osd.112 [WRN] slow request 31.132199
> seconds old, received at 2015-09-20 20:55:13.897008:
> osd_op(client.3289538.0:62497517
> rbd_data.29b9ae3f960770.0200 [stat,set-alloc-hint
> object_size 8388608 write_size 8388608,write 3211264~4096] 17.118f0c67
> ack+ondisk+write+known_if_redirected e57590) currently waiting for rw
> locks
> 2015-09-20 20:55:45.029326 osd.112 [WRN] sl

Re: [ceph-users] Potential OSD deadlock?

2015-09-21 Thread Robert LeBlanc
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

In my lab cluster I can saturate the disks and I'm not seeing any of
the blocked I/Os from the Ceph side, although the client shows that
I/O stops for a while. I'm not convinced that it is load related.

I was looking through the logs using the technique you described as
well as looking for the associated PG. There is a lot of data to go
through and it is taking me some time.

We are rolling some of the backports for 0.94.4 into a build, one for
the PG split problem, and 5 others that might help. One that I'm
really hopeful about is http://tracker.ceph.com/issues/12843, but I'm
not sure the messages we are seeing are exactly related. We are
planning to roll the new binaries tomorrow night. I'll update this
thread after the new code has been rolled.

Thanks,
-BEGIN PGP SIGNATURE-
Version: Mailvelope v1.1.0
Comment: https://www.mailvelope.com

wsFcBAEBCAAQBQJWAIE9CRDmVDuy+mK58QAAyS8P/21+0Y+QhsByqgu/bTiS
3dG6hNMyElXFyuWXievqqvyvaak7Y/nkVhC+oII1glujWFRRTL+61K4Qq8oo
abFBtFVSRkkQpg0BCuHH0LsbXwyK7bmiSTZted2/XzZfJdcuQcDCVXZ0K3En
LLWn0PvDj7OBnLexAAKAMF91a8gCnjuKq3AJnEYxQBeI/Fv58cpfERAiYa+W
Fl6jBKPboJr8sgbQ87k6hu4aLuHGepliFJlUO3XPTvuD4WQ6Ak1HAD+KtmXd
i8GYOZK9ukMQs8YavO8GqVAiZvUcuIGHVf502fP0v+7SR/s/9OY6Loo00/kK
QdG0+mgV0o60AZ4r/setlsd7Uo3l9u4ra9n3D2RUtSJZRvcBK2HweeMiit4u
FgA5dcx0lRFd6IluxZstgZlQiyxggIWHUgoQYFashtNWu/bl8bXn+gzK0GxO
mWZqaeKBMauBWwLADIX1Q+VYBSvZWqFCfKGUawQ4bRnyz7zlHXQANlL1t7iF
/QakoriydMW3l2WPftk4kDt4egFGhxxrCRZfA0TnVNx1DOLE9vRBKXKgTr0j
miB0Ca9v9DQzVnTWhPCTfb8UdEHzozMTMEv30V3nskafPolsRJmjO04C1K7e
61R+cawG02J0RQqFMMNj3X2Gnbp/CC6JzUpQ5JPvNrvO34lcTYBWkdfwtolg
9ExB
=hAcJ
-END PGP SIGNATURE-

Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1


On Mon, Sep 21, 2015 at 4:00 PM, Gregory Farnum  wrote:
> So it sounds like you've got two different things here:
> 1) You get a lot of slow operations that show up as warnings.
>
> 2) Rarely, you get blocked op warnings that don't seem to go away
> until the cluster state changes somehow.
>
> (2) is the interesting one. Since you say the cluster is under heavy
> load, I presume (1) is just you overloading your servers and getting
> some hot spots that take time to clear up.
>
> There have been bugs in the past where slow op warnings weren't
> getting removed when they should have. I don't *think* any are in
> .94.3 but could be wrong. Have you observed these from the other
> direction, where a client has blocked operations?
> If you want to go through the logs yourself, you should try and find
> all the lines about one of the operations which seems to be blocked.
> They aren't the most readable but if you grep for the operation ID
> (client.4267090.0:3510311) and then once you're in the right area look
> for what the threads processing it are doing you should get some idea
> of where things are going wrong you can share.
> -Greg
>
> On Sun, Sep 20, 2015 at 10:43 PM, Robert LeBlanc  wrote:
>> We set the logging on an OSD that had problems pretty frequently, but
>> cleared up in less than 30 seconds. The logs are at
>> http://162.144.87.113/files/ceph-osd.112.log.xz and are uncompressed
>> at 8.6GB. Some of the messages we were seeing in ceph -w are:
>>
>> 2015-09-20 20:55:44.029041 osd.112 [WRN] 10 slow requests, 10 included
>> below; oldest blocked for > 30.132696 secs
>> 2015-09-20 20:55:44.029047 osd.112 [WRN] slow request 30.132696
>> seconds old, received at 2015-09-20 20:55:13.896286:
>> osd_op(client.3289538.0:62497509
>> rbd_data.29b9ae3f960770.0200 [stat,set-alloc-hint
>> object_size 8388608 write_size 8388608,write 2588672~4096] 17.118f0c67
>> ack+ondisk+write+known_if_redirected e57590) currently reached_pg
>> 2015-09-20 20:55:44.029051 osd.112 [WRN] slow request 30.132619
>> seconds old, received at 2015-09-20 20:55:13.896363:
>> osd_op(client.3289538.0:62497510
>> rbd_data.29b9ae3f960770.0200 [stat,set-alloc-hint
>> object_size 8388608 write_size 8388608,write 2908160~12288]
>> 17.118f0c67 ack+ondisk+write+known_if_redirected e57590) currently
>> waiting for rw locks
>> 2015-09-20 20:55:44.029054 osd.112 [WRN] slow request 30.132520
>> seconds old, received at 2015-09-20 20:55:13.896462:
>> osd_op(client.3289538.0:62497511
>> rbd_data.29b9ae3f960770.0200 [stat,set-alloc-hint
>> object_size 8388608 write_size 8388608,write 2949120~4096] 17.118f0c67
>> ack+ondisk+write+known_if_redirected e57590) currently waiting for rw
>> locks
>> 2015-09-20 20:55:44.029058 osd.112 [WRN] slow request 30.132415
>> seconds old, received at 2015-09-20 20:55:13.896567:
>> osd_op(client.3289538.0:62497512
>> rbd_data.29b9ae3f960770.0200 [stat,set-alloc-hint
>> object_size 8388608 write_size 8388608,write 2957312~4096] 17.118f0c67
>> ack+ondisk+write+known_if_redirected e57590) currently waiting for rw
>> locks
>> 2015-09-20 20:55:44.029061 osd.112 [WRN] slow request 30.132302
>> seconds old, received at 2015-09-20 20:55

Re: [ceph-users] multi-datacenter crush map

2015-09-21 Thread Gregory Farnum
On Mon, Sep 21, 2015 at 7:07 AM, Robert LeBlanc  wrote:
> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA256
>
>
>
>
> On Mon, Sep 21, 2015 at 3:02 AM, Wouter De Borger  wrote:
>> Thank you for your answer! We will use size=4 and min_size=2, which should
>> do the trick.
>>
>> For the monitor issue, we have a third datacenter (with higher latency, but
>> that shouldn't be a problem for the monitors)
>
> Just be sure that the IP address in the third DC is higher than the
> other two. Ceph chooses the lowest IP address as the primary monitor.
> Having the monitor with the highest latency as your primary is sure to
> give you grief.
>
>> We had also considered the locality issue. Our WAN round trip latency is 1.5
>> ms (now) and we should get a dedicated light path in the near future (<0.1
>> ms).
>> So we hope to get acceptable latency without additional tweaking.
>> The plan B is to make two pools, with different weights for the different
>> DC's. VM's in DC 1 will get high weight for DC 1, VM's in DC 2 will get high
>> weight for DC 2.
>
> You don't want to mess with weights or else you will get stuck PGs
> because there isn't enough storage at the remote DC to hold all your
> data or the local DC fills up faster or CRUSH can't figure out how to
> distribute balanced data in an unbalanced way. You will probably want
> to look at primary affinity. This talks about SSD, but the same
> principle applies
> http://www.sebastien-han.fr/blog/2015/08/06/ceph-get-the-best-of-your-ssd-with-primary-affinity/

Keep in mind that primary affinity is per-OSD and applies to all
pools. If you have separate pools for each DC, you can construct
custom CRUSH rules for each one that always picks one or two OSDs out
of the "local" DC and then one or two out of the secondary.
Of course if you lose the primary DC you might want to edit the CRUSH
rules — things should keep working but I'd expect the CRUSH
calculations to start taking longer as it repeatedly tries and fails
to get the OSDs out of the first list. ;)
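Purely as an illustration (the bucket names dc1/dc2 are made up and this
assumes size=4; run it through crushtool and test before injecting
anything), such a rule could look like:

  rule dc1_primary {
          ruleset 3
          type replicated
          min_size 2
          max_size 4
          step take dc1
          step chooseleaf firstn 2 type host
          step emit
          step take dc2
          step chooseleaf firstn 2 type host
          step emit
  }

The mirror-image rule for the DC 2 pool would just swap the two "step
take" buckets.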

But if you can get a <.1ms link I wouldn't expect you to need to worry about it.
-Greg

>
> - 
> Robert LeBlanc
> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
> -BEGIN PGP SIGNATURE-
> Version: Mailvelope v1.1.0
> Comment: https://www.mailvelope.com
>
> wsFcBAEBCAAQBQJWAA80CRDmVDuy+mK58QAAK2UQAIadviUNo+hybghYO2de
> 0Ayqy1SQ9hPjpwJUTWt0VKIgIoDMMbp1+LVd6EiPRNXPPwalDZen8z1ARW5F
> o0g0nxMkJIJWxY2U3ch4s3oE85Hl5NqyPZIdaLfqQv7PXfFsRTM7rZDMRuLn
> X9PWcxzZ3lvXj6WA/jiN42zXlhTlhLhtKV153poCtd/iCkytJUEPZF/c+Pr+
> /mIVVbbtzPDh5XC/Il/KEI1W+fZzPG27EL6N2YoskYWTEpUr0auBH7jTLm8B
> kk4HKZToBsakP/j/LEhrh+MMW/ZNwuUaigWe351YQrkmlUBXxKVtnlgfJpDL
> KIGvVDICRMwe6e3ubU7umsiFVchYPBlXFN3OoRmx3UfPp3WlfKong+PG9X8+
> JoukMyj630jmaDy7KPH5sKv+LUaLpWlWv6271bVUDlSMjPNHqVMlk/HsOGit
> QYaWQQygQ+D/YedLP66xz+O6A8qfmrYfWBZZDdG9C49erw6Q6Ihzp6u+Hj1O
> zt96JsmBjHcq27AfK1jSbZDGOHchL78/Ok3O4O01tC00HwRvQLy/ofppOzgn
> KHnuLMs/9fZpSqh5sZ1BxWSgKBnYzc3npFloaOZLoVPT4Dv+hM8kL9lF0mHm
> 3j1D4ZmysVaSU+zVw0WsWaNVQyrrPn+i9ivT6U/3VHiaS2Nz+BQTQeq0oMqQ
> awOM
> =sl/W
> -END PGP SIGNATURE-
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Potential OSD deadlock?

2015-09-21 Thread Gregory Farnum
On Mon, Sep 21, 2015 at 3:14 PM, Robert LeBlanc  wrote:
> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA256
>
> In my lab cluster I can saturate the disks and I'm not seeing any of
> the blocked I/Os from the Ceph side, although the client shows that
> I/O stops for a while. I'm not convinced that it is load related.
>
> I was looking through the logs using the technique you described as
> well as looking for the associated PG. There is a lot of data to go
> through and it is taking me some time.
>
> We are rolling some of the backports for 0.94.4 into a build, one for
> the PG split problem, and 5 others that might help. One that I'm
> really hopeful about is http://tracker.ceph.com/issues/12843, but I'm
> not sure the messages we are seeing are exactly related. We are
> planning to roll the new binaries tomorrow night. I'll update this
> thread after the new code has been rolled.

Ah, yep, I didn't realize we still had any of those in hammer. That
bug is indeed a good bet for what you're seeing.
-Greg

>
> Thanks,
> -BEGIN PGP SIGNATURE-
> Version: Mailvelope v1.1.0
> Comment: https://www.mailvelope.com
>
> wsFcBAEBCAAQBQJWAIE9CRDmVDuy+mK58QAAyS8P/21+0Y+QhsByqgu/bTiS
> 3dG6hNMyElXFyuWXievqqvyvaak7Y/nkVhC+oII1glujWFRRTL+61K4Qq8oo
> abFBtFVSRkkQpg0BCuHH0LsbXwyK7bmiSTZted2/XzZfJdcuQcDCVXZ0K3En
> LLWn0PvDj7OBnLexAAKAMF91a8gCnjuKq3AJnEYxQBeI/Fv58cpfERAiYa+W
> Fl6jBKPboJr8sgbQ87k6hu4aLuHGepliFJlUO3XPTvuD4WQ6Ak1HAD+KtmXd
> i8GYOZK9ukMQs8YavO8GqVAiZvUcuIGHVf502fP0v+7SR/s/9OY6Loo00/kK
> QdG0+mgV0o60AZ4r/setlsd7Uo3l9u4ra9n3D2RUtSJZRvcBK2HweeMiit4u
> FgA5dcx0lRFd6IluxZstgZlQiyxggIWHUgoQYFashtNWu/bl8bXn+gzK0GxO
> mWZqaeKBMauBWwLADIX1Q+VYBSvZWqFCfKGUawQ4bRnyz7zlHXQANlL1t7iF
> /QakoriydMW3l2WPftk4kDt4egFGhxxrCRZfA0TnVNx1DOLE9vRBKXKgTr0j
> miB0Ca9v9DQzVnTWhPCTfb8UdEHzozMTMEv30V3nskafPolsRJmjO04C1K7e
> 61R+cawG02J0RQqFMMNj3X2Gnbp/CC6JzUpQ5JPvNrvO34lcTYBWkdfwtolg
> 9ExB
> =hAcJ
> -END PGP SIGNATURE-
> 
> Robert LeBlanc
> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
>
>
> On Mon, Sep 21, 2015 at 4:00 PM, Gregory Farnum  wrote:
>> So it sounds like you've got two different things here:
>> 1) You get a lot of slow operations that show up as warnings.
>>
>> 2) Rarely, you get blocked op warnings that don't seem to go away
>> until the cluster state changes somehow.
>>
>> (2) is the interesting one. Since you say the cluster is under heavy
>> load, I presume (1) is just you overloading your servers and getting
>> some hot spots that take time to clear up.
>>
>> There have been bugs in the past where slow op warnings weren't
>> getting removed when they should have. I don't *think* any are in
>> .94.3 but could be wrong. Have you observed these from the other
>> direction, where a client has blocked operations?
>> If you want to go through the logs yourself, you should try and find
>> all the lines about one of the operations which seems to be blocked.
>> They aren't the most readable but if you grep for the operation ID
>> (client.4267090.0:3510311) and then once you're in the right area look
>> for what the threads processing it are doing you should get some idea
>> of where things are going wrong you can share.
>> -Greg
>>
>> On Sun, Sep 20, 2015 at 10:43 PM, Robert LeBlanc  
>> wrote:
>>> We set the logging on an OSD that had problems pretty frequently, but
>>> cleared up in less than 30 seconds. The logs are at
>>> http://162.144.87.113/files/ceph-osd.112.log.xz and are uncompressed
>>> at 8.6GB. Some of the messages we were seeing in ceph -w are:
>>>
>>> 2015-09-20 20:55:44.029041 osd.112 [WRN] 10 slow requests, 10 included
>>> below; oldest blocked for > 30.132696 secs
>>> 2015-09-20 20:55:44.029047 osd.112 [WRN] slow request 30.132696
>>> seconds old, received at 2015-09-20 20:55:13.896286:
>>> osd_op(client.3289538.0:62497509
>>> rbd_data.29b9ae3f960770.0200 [stat,set-alloc-hint
>>> object_size 8388608 write_size 8388608,write 2588672~4096] 17.118f0c67
>>> ack+ondisk+write+known_if_redirected e57590) currently reached_pg
>>> 2015-09-20 20:55:44.029051 osd.112 [WRN] slow request 30.132619
>>> seconds old, received at 2015-09-20 20:55:13.896363:
>>> osd_op(client.3289538.0:62497510
>>> rbd_data.29b9ae3f960770.0200 [stat,set-alloc-hint
>>> object_size 8388608 write_size 8388608,write 2908160~12288]
>>> 17.118f0c67 ack+ondisk+write+known_if_redirected e57590) currently
>>> waiting for rw locks
>>> 2015-09-20 20:55:44.029054 osd.112 [WRN] slow request 30.132520
>>> seconds old, received at 2015-09-20 20:55:13.896462:
>>> osd_op(client.3289538.0:62497511
>>> rbd_data.29b9ae3f960770.0200 [stat,set-alloc-hint
>>> object_size 8388608 write_size 8388608,write 2949120~4096] 17.118f0c67
>>> ack+ondisk+write+known_if_redirected e57590) currently waiting for rw
>>> locks
>>> 2015-09-20 20:55:44.029058 osd.112 [WRN] slow request 30.132415
>>> seconds old, received at 2015-09-20 20:55:13.896567:
>>> osd_op(client.3289538.0:62497512
>>> rbd_data.

Re: [ceph-users] CephFS Fuse Issue

2015-09-21 Thread Gregory Farnum
Do you have a core file from the crash? If you do and can find out
which pointers are invalid that would help...I think "cct" must be the
broken one, but maybe it's just the Inode* or something.
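If a core does show up, roughly (a sketch; the binary path and core
location depend on your distro and core_pattern settings):

  gdb /usr/bin/ceph-fuse /path/to/core
  (gdb) bt            # locate the crashing frame
  (gdb) frame N       # N = the ceph-fuse frame from the backtrace
  (gdb) print cct     # then inspect the other pointers in that frame the same way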
-Greg

On Mon, Sep 21, 2015 at 2:03 PM, Scottix  wrote:
> I was rsyncing files to ceph from an older machine and I ran into a
> ceph-fuse crash.
>
> OpenSUSE 12.1, 3.1.10-1.29-desktop
> ceph-fuse 0.94.3
>
> The rsync was running for about 48 hours then crashed somewhere along the
> way.
>
> I added the log, and can run more if you like, I am not sure how to
> reproduce it easily except run it again, which will take a while to see with
> any results.
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] debian repositories path change?

2015-09-21 Thread David Clarke
On 19/09/15 03:28, Sage Weil wrote:
> On Fri, 18 Sep 2015, Alfredo Deza wrote:
>> The new locations are in:
>>
>>
>> http://packages.ceph.com/
>>
>> For debian this would be:
>>
>> http://packages.ceph.com/debian-{release}
> 
> Make that download.ceph.com .. the packages url was temporary while we got 
> the new site ready and will go away shortly!
> 
> (Also, HTTPS is enabled now.)

Prior to the repository reshuffle eu.ceph.com supported rsync, which was
a convenient way for us to maintain our own local (in New Zealand)
mirror.  Is there an intention to re-enable this on either
download.ceph.com, or the EU equivalent?


-- 
David Clarke




signature.asc
Description: OpenPGP digital signature
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS Fuse Issue

2015-09-21 Thread Scottix
I didn't get the core dump.

I set it up now and I'll try to see if I can get it to crash again.

On Mon, Sep 21, 2015 at 3:40 PM Gregory Farnum  wrote:

> Do you have a core file from the crash? If you do and can find out
> which pointers are invalid that would help...I think "cct" must be the
> broken one, but maybe it's just the Inode* or something.
> -Greg
>
> On Mon, Sep 21, 2015 at 2:03 PM, Scottix  wrote:
> > I was rsyncing files to ceph from an older machine and I ran into a
> > ceph-fuse crash.
> >
> > OpenSUSE 12.1, 3.1.10-1.29-desktop
> > ceph-fuse 0.94.3
> >
> > The rsync was running for about 48 hours then crashed somewhere along the
> > way.
> >
> > I added the log, and can run more if you like, I am not sure how to
> > reproduce it easily except run it again, which will take a while to see
> with
> > any results.
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Potential OSD deadlock?

2015-09-21 Thread Robert LeBlanc
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

I'm starting to wonder if this has to do with some OSDs getting full
or the 0.94.3 code. Earlier this afternoon, I cleared out my test
cluster so there were no pools. I created a new rbd pool and started
filling it with six 1TB fio jobs at replication 3, with 6 spindles over
six servers. It was running 0.94.2 at the time. After several hours of
writes, we had the new patched 0.94.3 binaries ready for testing, so I
rolled the update on the test cluster while the fio jobs were running.
There were a few blocked I/Os as the services were restarted (nothing
I'm concerned about). Now that the OSDs are about 60% full, the
blocked I/O is becoming very frequent even with the backports. The
write bandwidth was consistently 200 MB/s until this point; now it
fluctuates between 200 MB/s and 75 MB/s, mostly around 100 MB/s. Our
production cluster uses XFS on the OSDs; this test cluster is EXT4.

I'll see if I can go back to 0.94.2 and fill the cluster up again.
Going back to 0.94.2 and 0.94.0 still shows the issue (although I didn't
refill the cluster; I didn't delete what was already there). I'm
building the latest hammer-backports now to see if it resolves the
issue.
-BEGIN PGP SIGNATURE-
Version: Mailvelope v1.1.0
Comment: https://www.mailvelope.com

wsFcBAEBCAAQBQJWAPioCRDmVDuy+mK58QAAOIwP/3D86CWYlgozKBNlsuIv
AT30S7ZrqDZmxygaJQ9PZgSyQlgQuXpDLL4CnVtbUNd+dgz91i7CVecVGj3h
/jrFwrH063yPD1r3nMmSdc2GTTIahH1JhvzpWqcP9pkmuGHoYlWqteYnosfn
ptOjJI57AFw/goxcJLUExLfdp+L/3GkHNoMMKtJXZX7OIEWdkMj1f9jBGEK6
tJ3AGbbpL6eZGB/KFDObHwCEjfwouTkRk0wNh0luDAU9QlBokmcKS134Ht2C
kRtggOMlXxOKaQiXKZHZL7TUEgvlwldpS01rgDLnNOn3AHZMiAoaC2noFDDS
48ZnbkJgdqpMX2nMFcbwh4zdWOmRRcFqNXuA/t4m0UrZwRCWlSwcVPxDqbHr
00kjDMFtlbov1NWfDXfcMF32qSdsfVaDAwjCmMct1IEn3EXYKYeYA8GUePia
+A9FvUezeYSELWxk59Hirk69A39wNsA40lrMbFzIOkp8CLLuKiHSKs8dTFtJ
CaIPMwZDElcKJDKXPEMu260/GIcJmERUZXPayIQp2Attgx3/gvDpU3crWN7C
49dqnPOVqm6+f+ciUBVwIgQ7Xbbqom+yc1jxlvmpMW1C5iu9vjH/mvO42N/c
e+R0/SgCJnDQU4tYppYadA8vKA/e9JyjMfBlbTW0urxHQlkNqohFY9G+edLW
Zkxf
=kYQ2
-END PGP SIGNATURE-

Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1


On Mon, Sep 21, 2015 at 4:33 PM, Gregory Farnum  wrote:
> On Mon, Sep 21, 2015 at 3:14 PM, Robert LeBlanc  wrote:
>> -BEGIN PGP SIGNED MESSAGE-
>> Hash: SHA256
>>
>> In my lab cluster I can saturate the disks and I'm not seeing any of
>> the blocked I/Os from the Ceph side, although the client shows that
>> I/O stops for a while. I'm not convinced that it is load related.
>>
>> I was looking through the logs using the technique you described as
>> well as looking for the associated PG. There is a lot of data to go
>> through and it is taking me some time.
>>
>> We are rolling some of the backports for 0.94.4 into a build, one for
>> the PG split problem, and 5 others that might help. One that I'm
>> really hopeful about is http://tracker.ceph.com/issues/12843, but I'm
>> not sure the messages we are seeing are exactly related. We are
>> planning to roll the new binaries tomorrow night. I'll update this
>> thread after the new code has been rolled.
>
> Ah, yep, I didn't realize we still had any of those in hammer. That
> bug is indeed a good bet for what you're seeing.
> -Greg
>
>>
>> Thanks,
>> -BEGIN PGP SIGNATURE-
>> Version: Mailvelope v1.1.0
>> Comment: https://www.mailvelope.com
>>
>> wsFcBAEBCAAQBQJWAIE9CRDmVDuy+mK58QAAyS8P/21+0Y+QhsByqgu/bTiS
>> 3dG6hNMyElXFyuWXievqqvyvaak7Y/nkVhC+oII1glujWFRRTL+61K4Qq8oo
>> abFBtFVSRkkQpg0BCuHH0LsbXwyK7bmiSTZted2/XzZfJdcuQcDCVXZ0K3En
>> LLWn0PvDj7OBnLexAAKAMF91a8gCnjuKq3AJnEYxQBeI/Fv58cpfERAiYa+W
>> Fl6jBKPboJr8sgbQ87k6hu4aLuHGepliFJlUO3XPTvuD4WQ6Ak1HAD+KtmXd
>> i8GYOZK9ukMQs8YavO8GqVAiZvUcuIGHVf502fP0v+7SR/s/9OY6Loo00/kK
>> QdG0+mgV0o60AZ4r/setlsd7Uo3l9u4ra9n3D2RUtSJZRvcBK2HweeMiit4u
>> FgA5dcx0lRFd6IluxZstgZlQiyxggIWHUgoQYFashtNWu/bl8bXn+gzK0GxO
>> mWZqaeKBMauBWwLADIX1Q+VYBSvZWqFCfKGUawQ4bRnyz7zlHXQANlL1t7iF
>> /QakoriydMW3l2WPftk4kDt4egFGhxxrCRZfA0TnVNx1DOLE9vRBKXKgTr0j
>> miB0Ca9v9DQzVnTWhPCTfb8UdEHzozMTMEv30V3nskafPolsRJmjO04C1K7e
>> 61R+cawG02J0RQqFMMNj3X2Gnbp/CC6JzUpQ5JPvNrvO34lcTYBWkdfwtolg
>> 9ExB
>> =hAcJ
>> -END PGP SIGNATURE-
>> 
>> Robert LeBlanc
>> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
>>
>>
>> On Mon, Sep 21, 2015 at 4:00 PM, Gregory Farnum  wrote:
>>> So it sounds like you've got two different things here:
>>> 1) You get a lot of slow operations that show up as warnings.
>>>
>>> 2) Rarely, you get blocked op warnings that don't seem to go away
>>> until the cluster state changes somehow.
>>>
>>> (2) is the interesting one. Since you say the cluster is under heavy
>>> load, I presume (1) is just you overloading your servers and getting
>>> some hot spots that take time to clear up.
>>>
>>> There have been bugs in the past where slow op warnings weren't
>>> getting removed when they should have. I don't *think* any are in
>>> .94.3