Re: [ceph-users] does anyone know what xfsaild and kworker are?they make osd disk busy. produce 100-200iops per osd disk?

2015-12-02 Thread flisky
It works. However, I think the root cause is due to the xfs_buf missing?

trace-cmd record -e xfs\*
trace-cmd report > xfs.txt
awk '{print $4}' xfs.txt |sort -n |uniq -c|sort -n|tail -n 20

  14468 xfs_file_splice_write:
  16562 xfs_buf_find:
  19597 xfs_buf_read:
  19634 xfs_buf_get:
  21943 xfs_get_blocks_alloc:
  23265 xfs_perag_put:
  26327 xfs_perag_get:
  27853 xfs_ail_locked:
  39252 xfs_buf_iorequest:
  40187 xfs_ail_delete:
  41590 xfs_buf_ioerror:
  42523 xfs_buf_hold:
  44659 xfs_buf_trylock:
  47986 xfs_ail_flushing:
  50793 xfs_ilock_nowait:
  57585 xfs_ilock:
  58293 xfs_buf_unlock:
  79977 xfs_buf_iodone:
 104165 xfs_buf_rele:
 108383 xfs_iunlock:

Could you please give me another hint? :) Thanks!

On 2015年12月02日 05:14, Somnath Roy wrote:
> Sure..The following settings helped me minimizing the effect a bit for the PR 
> https://github.com/ceph/ceph/pull/6670
> 
> 
>sysctl -w fs.xfs.xfssyncd_centisecs=72
>sysctl -w fs.xfs.xfsbufd_centisecs=3000
>sysctl -w fs.xfs.age_buffer_centisecs=72
> 
> But, for existing Ceph write path you may need to tweak this..
> 
> Thanks & Regards
> Somnath
> 
> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
> flisky
> Sent: Tuesday, December 01, 2015 11:04 AM
> To: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] does anyone know what xfsaild and kworker are?they 
> make osd disk busy. produce 100-200iops per osd disk?
> 
> On 2015年12月02日 01:31, Somnath Roy wrote:
>> This is xfs metadata sync process...when it is waking up and there are lot 
>> of data to sync it will throttle all the process accessing the drive...There 
>> are some xfs settings to control the behavior, but you can't stop that
> May I ask how to tune the xfs settings? Thanks!
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] radosgw in 0.94.5 leaking memory?

2015-12-02 Thread Dan van der Ster
Hi,

We've had increased user activity on our radosgw boxes the past two
days and are finding that the radosgw is growing quickly in used
memory. Most of our gateways are VMs with 4GB of memory and these are
getting OOM-killed after ~30 mins of high user load. We added a few
physical gateways with 64GB of ram and overnight those have grown from
zero to more than 8GB, and are still growing.

I'm not a valgrind expert, but I've been running one of the daemons like this:

  valgrind --leak-check=full /bin/radosgw -n client.radosgw.cephrgw -f

but it's not reporting any leaks, even though the memory usage is
climbing for that process.

Anyone seen something similar? Any tips for tracking this down? My
next (random) step will be to disable the rgw_cache and see if that
helps.
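
Since leak-check only reports memory that is truly lost, a massif run might
at least show where the still-reachable memory accumulates -- a rough sketch
using the same daemon arguments as above, not something I have verified here:

  valgrind --tool=massif /bin/radosgw -n client.radosgw.cephrgw -f
  # after stopping the daemon, inspect the snapshot it wrote:
  ms_print massif.out.<pid> | less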

Cheers, Dan
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Removing OSD - double rebalance?

2015-12-02 Thread Andy Allan
On 30 November 2015 at 09:34, Burkhard Linke
 wrote:
> On 11/30/2015 10:08 AM, Carsten Schmitt wrote:

>> But after entering the last command, the cluster starts rebalancing again.
>>
>> And that I don't understand: Shouldn't be one rebalancing process enough
>> or am I missing something?
>
> Removing the OSD changes the weight for the host, thus a second rebalance is
> necessary.
>
> The best practice to remove an OSD involves changing the crush weight to 0.0
> as first step.

I found this out the hard way too. It's unfortunate that the
documentation is, in my mind, not helpful on the order of commands to
run.

http://docs.ceph.com/docs/master/rados/operations/add-or-rm-osds/#removing-osds-manual

Is there any good reason why the documentation recommends this
double-rebalance approach? Or conversely, any reason not to change the
documentation so that rebalances only happen once?

Thanks,
Andy
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



Re: [ceph-users] Removing OSD - double rebalance?

2015-12-02 Thread Jan Schermer
1) if you have the original drive that works and just want to replace it then 
you can just "dd" it over to the new drive and then extend the partition if the 
new one is larger, this avoids double backfilling in this case
2) if the old drive is dead you should "out" it and at the same time add a new 
drive

If you reweight the drive then you shuffle all data on it to the rest of the 
drives on that host (with default crush at least), so you need to have free 
space to do that safely.
Also, Ceph is not smart enough to only backfill the data to the new drive locally
(even though it could), and the "hashing" algorithm doesn't really guarantee
that no other data moves when you switch drives like that.
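
For what it's worth, option 1 above might look roughly like the sketch below --
the device names, the OSD id and the partition-grow step are all placeholders,
not a tested procedure:

  ceph osd set noout                         # keep CRUSH from rebalancing
  stop ceph-osd id=<n>                       # upstart syntax; adjust for your init
  dd if=/dev/sdX of=/dev/sdY bs=4M conv=noerror,sync
  # grow the data partition on /dev/sdY (parted/sgdisk), remount it, then:
  xfs_growfs /var/lib/ceph/osd/ceph-<n>
  start ceph-osd id=<n>
  ceph osd unset noout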

TL;DR - if you can, deal with the additional load

Jan

> On 02 Dec 2015, at 11:59, Andy Allan  wrote:
> 
> On 30 November 2015 at 09:34, Burkhard Linke
>  wrote:
>> On 11/30/2015 10:08 AM, Carsten Schmitt wrote:
> 
>>> But after entering the last command, the cluster starts rebalancing again.
>>> 
>>> And that I don't understand: Shouldn't be one rebalancing process enough
>>> or am I missing something?
>> 
>> Removing the OSD changes the weight for the host, thus a second rebalance is
>> necessary.
>> 
>> The best practice to remove an OSD involves changing the crush weight to 0.0
>> as first step.
> 
> I found this out the hard way too. It's unfortunate that the
> documentation is, in my mind, not helpful on the order of commands to
> run.
> 
> http://docs.ceph.com/docs/master/rados/operations/add-or-rm-osds/#removing-osds-manual
> 
> Is there any good reason why the documentation recommends this
> double-rebalance approach? Or conversely, any reason not to change the
> documentation so that rebalances only happen once?
> 
> Thanks,
> Andy
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] RBD: Missing 1800000000 when map block device

2015-12-02 Thread MinhTien MinhTien
Hi all,

When I map a block device from a client running kernel
3.10.93-1.el6.elrepo.x86_64, I get this error:

libceph: mon0 ip:6789 feature set mismatch, my 4a042a42 < server's
184a042a42, *missing 18*
libceph: mon0 ip:6789 socket error on read

I used ceph version 0.87.2

I tried to find the problem on this page:
http://cephnotes.ksperis.com/blog/2014/01/21/feature-set-mismatch-error-on-ceph-kernel-client
but I did not see information about my error.

Please help me,

Thank you,
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] infernalis on centos 7

2015-12-02 Thread Dan Nica
Hi guys,

I am trying to set up the cluster, but it fails for me when setting up the mons
with the error below:

[ceph_deploy.conf][DEBUG ] found configuration file at: 
/home/ceph/.cephdeploy.conf
[ceph_deploy.cli][INFO  ] Invoked (1.5.25): /usr/bin/ceph-deploy mon create 
cmon01 cmon02 cmon03
[ceph_deploy.mon][DEBUG ] Deploying mon, cluster ceph hosts cmon01 cmon02 cmon03
[ceph_deploy.mon][DEBUG ] detecting platform for host cmon01 ...
[cmon01][DEBUG ] connection detected need for sudo
[cmon01][DEBUG ] connected to host: cmon01
[cmon01][DEBUG ] detect platform information from remote host
[cmon01][DEBUG ] detect machine type
[ceph_deploy.mon][INFO  ] distro info: CentOS Linux 7.1.1503 Core
[cmon01][DEBUG ] determining if provided host has same hostname in remote
[cmon01][DEBUG ] get remote short hostname
[cmon01][DEBUG ] deploying mon to cmon01
[cmon01][DEBUG ] get remote short hostname
[cmon01][DEBUG ] remote hostname: cmon01
[cmon01][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
[cmon01][DEBUG ] create the mon path if it does not exist
[cmon01][DEBUG ] checking for done path: /var/lib/ceph/mon/ceph-cmon01/done
[cmon01][DEBUG ] create a done file to avoid re-doing the mon deployment
[cmon01][DEBUG ] create the init path if it does not exist
[cmon01][DEBUG ] locating the `service` executable...
[cmon01][INFO  ] Running command: sudo /usr/sbin/service ceph -c 
/etc/ceph/ceph.conf start mon.cmon01
[cmon01][WARNIN] The service command supports only basic LSB actions (start, 
stop, restart, try-restart, reload, force-reload, status). For other actions, 
please try to use systemctl.
[cmon01][ERROR ] RuntimeError: command returned non-zero exit status: 2
[ceph_deploy.mon][ERROR ] Failed to execute command: /usr/sbin/service ceph -c 
/etc/ceph/ceph.conf start mon.cmon01


Can someone advise?

Thanks,
Dan
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Removing OSD - double rebalance?

2015-12-02 Thread Dan van der Ster
Here's something that I didn't see mentioned in this thread yet: the
set of PGs mapped to an OSD is a function of the ID of that OSD. So,
if you replace a drive but don't reuse the same OSD ID for the
replacement, you'll have more PG movement than if you kept the ID.
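
If you want to see the effect before touching anything, you can experiment
offline on a copy of the CRUSH map -- a sketch, assuming crushtool is
installed on the admin node:

  ceph osd getcrushmap -o crush.bin
  crushtool -i crush.bin --test --num-rep 3 --show-mappings > before.txt
  crushtool -d crush.bin -o crush.txt
  # renumber the device in crush.txt, recompile and compare:
  #   crushtool -c crush.txt -o crush-new.bin
  #   crushtool -i crush-new.bin --test --num-rep 3 --show-mappings > after.txt
  #   diff before.txt after.txt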

-- dan

On Wed, Dec 2, 2015 at 12:10 PM, Jan Schermer  wrote:
> 1) if you have the original drive that works and just want to replace it then 
> you can just "dd" it over to the new drive and then extend the partition if 
> the new one is larger, this avoids double backfilling in this case
> 2) if the old drive is dead you should "out" it and at the same time add a 
> new drive
>
> If you reweight the drive then you shuffle all data on it to the rest of the 
> drives on that host (with default crush at least), so you need to have free 
> space to do that safely.
> Also, ceph is not that smart to only backfill the data to the new drive 
> locally (even though it could) and the "hashing" algorithm doesn't really 
> guarantee that no other data moves when you switch drives like that.
>
> TL;DR - if you can, deal with the additional load
>
> Jan
>
>> On 02 Dec 2015, at 11:59, Andy Allan  wrote:
>>
>> On 30 November 2015 at 09:34, Burkhard Linke
>>  wrote:
>>> On 11/30/2015 10:08 AM, Carsten Schmitt wrote:
>>
 But after entering the last command, the cluster starts rebalancing again.

 And that I don't understand: Shouldn't be one rebalancing process enough
 or am I missing something?
>>>
>>> Removing the OSD changes the weight for the host, thus a second rebalance is
>>> necessary.
>>>
>>> The best practice to remove an OSD involves changing the crush weight to 0.0
>>> as first step.
>>
>> I found this out the hard way too. It's unfortunate that the
>> documentation is, in my mind, not helpful on the order of commands to
>> run.
>>
>> http://docs.ceph.com/docs/master/rados/operations/add-or-rm-osds/#removing-osds-manual
>>
>> Is there any good reason why the documentation recommends this
>> double-rebalance approach? Or conversely, any reason not to change the
>> documentation so that rebalances only happen once?
>>
>> Thanks,
>> Andy
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Removing OSD - double rebalance?

2015-12-02 Thread Andy Allan
On 2 December 2015 at 11:10, Jan Schermer  wrote:
> 1) if you have the original drive that works and just want to replace it then 
> you can just "dd" it over to the new drive and then extend the partition if 
> the new one is larger, this avoids double backfilling in this case
> 2) if the old drive is dead you should "out" it and at the same time add a 
> new drive

Hi Jan - we're talking about the case when we simply want to remove an
OSD (e.g. downsizing or rearranging the cluster). Obviously there are
other situations but I'd like to discuss the documentation for
removing an OSD, not fixing failures.

Thanks,
Andy
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Removing OSD - double rebalance?

2015-12-02 Thread Carsten Schmitt

Hi all,

On 12/02/2015 12:10 PM, Jan Schermer wrote:

1) if you have the original drive that works and just want to replace it then you can 
just "dd" it over to the new drive and then extend the partition if the new one 
is larger, this avoids double backfilling in this case
2) if the old drive is dead you should "out" it and at the same time add a new 
drive

If you reweight the drive then you shuffle all data on it to the rest of the 
drives on that host (with default crush at least), so you need to have free 
space to do that safely.
Also, ceph is not that smart to only backfill the data to the new drive locally (even 
though it could) and the "hashing" algorithm doesn't really guarantee that no 
other data moves when you switch drives like that.

TL;DR - if you can, deal with the additional load


well, that is unfortunately not an option for me: 46 OSDs and 32 GB of RAM in
one server is not a smart setup :-) (Yes, this was discussed earlier in
other threads, but I couldn't resist).


If everything runs smoothly it's great, but if only the tiniest thing is 
off, then you can experience horrendous domino effects of dying daemons.


A small addendum in the manual might be helpful for others.

And thanks to all who answered; it helped me a lot and saved me a lot of
time.


Cheers,
Carsten




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RBD: Missing 1800000000 when map block device

2015-12-02 Thread Ilya Dryomov
On Wed, Dec 2, 2015 at 12:15 PM, MinhTien MinhTien
 wrote:
> Hi all,
>
> When I map block device in client kernel 3.10.93-1.el6.elrepo.x86_64, i get
> error:
>
> libceph: mon0 ip:6789 feature set mismatch, my 4a042a42 < server's
> 184a042a42, missing 18
> libceph: mon0 ip:6789 socket error on read
>
> I used ceph version 0.87.2
>
> I tried to find the problem in this page:
> http://cephnotes.ksperis.com/blog/2014/01/21/feature-set-mismatch-error-on-ceph-kernel-client
> but I not saw infomation for my error.
>
> Please help me,

That's OSD_CACHEPOOL and CRUSH_V2 - you must have tiering enabled on
one of your pools (not necessarily the pool you are trying to map an
image out of).  To make it work with 3.10 you'll need to make sure
*none* of the pools have tiering enabled and make sure your crushmap
doesn't have any indep or SET_* steps.
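
A quick way to check both, assuming standard tooling on the admin node:

  ceph osd dump | grep -E 'tier_of|cache_mode'      # any cache tiers configured?
  ceph osd getcrushmap -o /tmp/cm
  crushtool -d /tmp/cm -o /tmp/cm.txt
  grep -E 'indep|set_choose' /tmp/cm.txt            # CRUSH_V2-style steps?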

The alternative is upgrading your kernel to at least 3.14.

Thanks,

Ilya
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] infernalis on centos 7

2015-12-02 Thread Adrien Gillard
I'm guessing that your ceph-deploy version is not up to date and you are
using one that does not support systemd.
Be sure to update to the latest ceph-deploy available in the ceph noarch
repo (1.5.28) and you should be fine.
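
Something like this should confirm it, assuming the ceph noarch repo is
already configured on the admin node:

  sudo yum update ceph-deploy
  ceph-deploy --version       # should report 1.5.28 or newer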

Adrien

On Wed, Dec 2, 2015 at 1:08 PM, Dan Nica 
wrote:

> Hi guys,
>
>
>
> I try to setup the cluster but it seems to fail for me on setting up the
> mons with the error
>
> bellow:
>
>
>
> [ceph_deploy.conf][DEBUG ] found configuration file at:
> /home/ceph/.cephdeploy.conf
>
> [ceph_deploy.cli][INFO  ] Invoked (1.5.25): /usr/bin/ceph-deploy mon
> create cmon01 cmon02 cmon03
>
> [ceph_deploy.mon][DEBUG ] Deploying mon, cluster ceph hosts cmon01 cmon02
> cmon03
>
> [ceph_deploy.mon][DEBUG ] detecting platform for host cmon01 ...
>
> [cmon01][DEBUG ] connection detected need for sudo
>
> [cmon01][DEBUG ] connected to host: cmon01
>
> [cmon01][DEBUG ] detect platform information from remote host
>
> [cmon01][DEBUG ] detect machine type
>
> [ceph_deploy.mon][INFO  ] distro info: CentOS Linux 7.1.1503 Core
>
> [cmon01][DEBUG ] determining if provided host has same hostname in remote
>
> [cmon01][DEBUG ] get remote short hostname
>
> [cmon01][DEBUG ] deploying mon to cmon01
>
> [cmon01][DEBUG ] get remote short hostname
>
> [cmon01][DEBUG ] remote hostname: cmon01
>
> [cmon01][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
>
> [cmon01][DEBUG ] create the mon path if it does not exist
>
> [cmon01][DEBUG ] checking for done path: /var/lib/ceph/mon/ceph-cmon01/done
>
> [cmon01][DEBUG ] create a done file to avoid re-doing the mon deployment
>
> [cmon01][DEBUG ] create the init path if it does not exist
>
> [cmon01][DEBUG ] locating the `service` executable...
>
> [cmon01][INFO  ] Running command: sudo /usr/sbin/service ceph -c
> /etc/ceph/ceph.conf start mon.cmon01
>
> [cmon01][WARNIN] The service command supports only basic LSB actions
> (start, stop, restart, try-restart, reload, force-reload, status). For
> other actions, please try to use systemctl.
>
> [cmon01][ERROR ] RuntimeError: command returned non-zero exit status: 2
>
> [ceph_deploy.mon][ERROR ] Failed to execute command: /usr/sbin/service
> ceph -c /etc/ceph/ceph.conf start mon.cmon01
>
>
>
>
>
> Can someone advice ?
>
>
>
> Thanks,
>
> Dan
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Infernalis for Debian 8 armhf

2015-12-02 Thread Alfredo Deza
On Tue, Dec 1, 2015 at 11:58 PM, Swapnil Jain  wrote:
>
> Hi,
>
> Any plans to release Infernalis Debian 8 binary packages for armhf. As I only 
> see it for amd64.

This would be pretty simple to do, but we don't have any ARM boxes
around and nothing is immediately available for us to set one up.

>
>
>
> —
>
> Swapnil Jain
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] F21 pkgs for Ceph Hammer release ?

2015-12-02 Thread Alfredo Deza
On Tue, Dec 1, 2015 at 4:59 AM, Deepak Shetty  wrote:
> Hi,
>  Does anybody how/where I can get the F21 repo for ceph hammer release ?
>
> In download.ceph.com/rpm-hammer/ I only see F20 dir, not F21

Right, we haven't built FC binaries for a while; the one node we had
for FC was an FC20 box that was unable to build newer (at that time)
releases of Ceph, so it was dropped.

At some point we might be able to start releasing FC22 binaries, but
those will not be immediately available for hammer until we get a new
release.

>
> F21 distro repo only carries firefly release, but I want to install Ceph
> Hammer, hence the Q
>
> thanx,
> deepak
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] does anyone know what xfsaild and kworker are?they make osd disk busy. produce 100-200iops per osd disk?

2015-12-02 Thread flisky
Ignore my last reply. I read the thread [Re: XFS
Syncd](http://oss.sgi.com/archives/xfs/2015-06/msg00111.html) and
found that might be okay.

The xfs_ail_push calls are almost all for INODE items rather than BUF
(1579 vs 99). Our Ceph cluster is dedicated to the S3 service, and the
writes are small. So where do so many inode changes come from? How can
I decrease them?

Thanks in advance!

==
Mount Options:
rw,noatime,seclabel,swalloc,attr2,largeio,nobarrier,inode64,logbsize=256k,noquota

==
XFS Info:
meta-data=/dev/sdb1  isize=2048   agcount=4,
agsize=182979519 blks
 =   sectsz=512   attr=2, projid32bit=1
 =   crc=0finobt=0
data =   bsize=4096   blocks=731918075, imaxpct=5
 =   sunit=0  swidth=0 blks
naming   =version 2  bsize=4096   ascii-ci=0 ftype=0
log  =internal   bsize=4096   blocks=357381, version=2
 =   sectsz=512   sunit=0 blks, lazy-count=1
realtime =none   extsz=4096   blocks=0, rtextents=0




On 2015年12月02日 16:20, flisky wrote:
> It works. However, I think the root case is due to the xfs_buf missing?
> 
> trace-cmd record -e xfs\*
> trace-cmd report > xfs.txt
> awk '{print $4}' xfs2.txt |sort -n |uniq -c|sort -n|tail -n 20
> 
>14468 xfs_file_splice_write:
>16562 xfs_buf_find:
>19597 xfs_buf_read:
>19634 xfs_buf_get:
>21943 xfs_get_blocks_alloc:
>23265 xfs_perag_put:
>26327 xfs_perag_get:
>27853 xfs_ail_locked:
>39252 xfs_buf_iorequest:
>40187 xfs_ail_delete:
>41590 xfs_buf_ioerror:
>42523 xfs_buf_hold:
>44659 xfs_buf_trylock:
>47986 xfs_ail_flushing:
>50793 xfs_ilock_nowait:
>57585 xfs_ilock:
>58293 xfs_buf_unlock:
>79977 xfs_buf_iodone:
>   104165 xfs_buf_rele:
>   108383 xfs_iunlock:
> 
> Could you please give me another hint? :) Thanks!
> 
> On 2015年12月02日 05:14, Somnath Roy wrote:
>> Sure..The following settings helped me minimizing the effect a bit for the 
>> PR https://github.com/ceph/ceph/pull/6670
>>
>>
>> sysctl -w fs.xfs.xfssyncd_centisecs=72
>> sysctl -w fs.xfs.xfsbufd_centisecs=3000
>> sysctl -w fs.xfs.age_buffer_centisecs=72
>>
>> But, for existing Ceph write path you may need to tweak this..
>>
>> Thanks & Regards
>> Somnath
>>
>> -Original Message-
>> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
>> flisky
>> Sent: Tuesday, December 01, 2015 11:04 AM
>> To: ceph-users@lists.ceph.com
>> Subject: Re: [ceph-users] does anyone know what xfsaild and kworker are?they 
>> make osd disk busy. produce 100-200iops per osd disk?
>>
>> On 2015年12月02日 01:31, Somnath Roy wrote:
>>> This is xfs metadata sync process...when it is waking up and there are lot 
>>> of data to sync it will throttle all the process accessing the 
>>> drive...There are some xfs settings to control the behavior, but you can't 
>>> stop that
>> May I ask how to tune the xfs settings? Thanks!
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Infernalis for Debian 8 armhf

2015-12-02 Thread ceph new
I have a cluster of Pis I can use to build an ARM version on. Who do I need
to talk to about uploading it to the Ceph repos?

On Wed, Dec 2, 2015 at 9:01 AM, Alfredo Deza  wrote:

> On Tue, Dec 1, 2015 at 11:58 PM, Swapnil Jain  wrote:
> >
> > Hi,
> >
> > Any plans to release Infernalis Debian 8 binary packages for armhf. As I
> only see it for amd64.
>
> This would be pretty simple to do but we don't have any ARM boxes
> around and nothing is immediately available for us
> to setup any.
>
> >
> >
> >
> > —
> >
> > Swapnil Jain
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] infernalis osd activation on centos 7

2015-12-02 Thread Dan Nica
Hi guys,

After managing to get the mons up, I am stuck at activating the osds with the 
error below

[ceph_deploy.conf][DEBUG ] found configuration file at: 
/home/ceph/.cephdeploy.conf
[ceph_deploy.cli][INFO  ] Invoked (1.5.28): /usr/bin/ceph-deploy disk activate 
osd01:sdb1:sdb2
[ceph_deploy.cli][INFO  ] ceph-deploy options:
[ceph_deploy.cli][INFO  ]  username  : None
[ceph_deploy.cli][INFO  ]  verbose   : False
[ceph_deploy.cli][INFO  ]  overwrite_conf: False
[ceph_deploy.cli][INFO  ]  subcommand: activate
[ceph_deploy.cli][INFO  ]  quiet : False
[ceph_deploy.cli][INFO  ]  cd_conf   : 

[ceph_deploy.cli][INFO  ]  cluster   : ceph
[ceph_deploy.cli][INFO  ]  func  : 
[ceph_deploy.cli][INFO  ]  ceph_conf : None
[ceph_deploy.cli][INFO  ]  default_release   : False
[ceph_deploy.cli][INFO  ]  disk  : [('osd01', 
'/dev/sdb1', '/dev/sdb2')]
[ceph_deploy.osd][DEBUG ] Activating cluster ceph disks 
osd01:/dev/sdb1:/dev/sdb2
[osd01][DEBUG ] connection detected need for sudo
[osd01][DEBUG ] connected to host: osd01
[osd01][DEBUG ] detect platform information from remote host
[osd01][DEBUG ] detect machine type
[osd01][DEBUG ] find the location of an executable
[ceph_deploy.osd][INFO  ] Distro info: CentOS Linux 7.1.1503 Core
[ceph_deploy.osd][DEBUG ] activating host osd01 disk /dev/sdb1
[ceph_deploy.osd][DEBUG ] will use init type: systemd
[osd01][INFO  ] Running command: sudo ceph-disk -v activate --mark-init systemd 
--mount /dev/sdb1
[osd01][WARNIN] DEBUG:ceph-disk:get_dm_uuid /dev/sdb1 uuid path is 
/sys/dev/block/8:17/dm/uuid
[osd01][WARNIN] DEBUG:ceph-disk:get_dm_uuid /dev/sdb1 uuid path is 
/sys/dev/block/8:17/dm/uuid
[osd01][WARNIN] INFO:ceph-disk:Running command: /sbin/sgdisk -i 1 /dev/sdb
[osd01][WARNIN] INFO:ceph-disk:Running command: /sbin/blkid -p -s TYPE -ovalue 
-- /dev/sdb1
[osd01][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-conf 
--cluster=ceph --name=osd. --lookup osd_mount_options_xfs
[osd01][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-conf 
--cluster=ceph --name=osd. --lookup osd_fs_mount_options_xfs
[osd01][WARNIN] DEBUG:ceph-disk:Mounting /dev/sdb1 on 
/var/lib/ceph/tmp/mnt.Ng38c4 with options noatime,inode64
[osd01][WARNIN] INFO:ceph-disk:Running command: /usr/bin/mount -t xfs -o 
noatime,inode64 -- /dev/sdb1 /var/lib/ceph/tmp/mnt.Ng38c4
[osd01][WARNIN] INFO:ceph-disk:Running command: /sbin/restorecon 
/var/lib/ceph/tmp/mnt.Ng38c4
[osd01][WARNIN] DEBUG:ceph-disk:Cluster uuid is 
0c36d242-92a9-4331-b48d-ce07b628750a
[osd01][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-osd 
--cluster=ceph --show-config-value=fsid
[osd01][WARNIN] ERROR:ceph-disk:Failed to activate
[osd01][WARNIN] DEBUG:ceph-disk:Unmounting /var/lib/ceph/tmp/mnt.Ng38c4
[osd01][WARNIN] INFO:ceph-disk:Running command: /bin/umount -- 
/var/lib/ceph/tmp/mnt.Ng38c4
[osd01][WARNIN] Traceback (most recent call last):
[osd01][WARNIN]   File "/sbin/ceph-disk", line 3576, in 
[osd01][WARNIN] main(sys.argv[1:])
[osd01][WARNIN]   File "/sbin/ceph-disk", line 3530, in main
[osd01][WARNIN] args.func(args)
[osd01][WARNIN]   File "/sbin/ceph-disk", line 2424, in main_activate
[osd01][WARNIN] dmcrypt_key_dir=args.dmcrypt_key_dir,
[osd01][WARNIN]   File "/sbin/ceph-disk", line 2197, in mount_activate
[osd01][WARNIN] (osd_id, cluster) = activate(path, activate_key_template, 
init)
[osd01][WARNIN]   File "/sbin/ceph-disk", line 2331, in activate
[osd01][WARNIN] raise Error('No cluster conf found in ' + SYSCONFDIR + ' 
with fsid %s' % ceph_fsid)
[osd01][WARNIN] __main__.Error: Error: No cluster conf found in /etc/ceph with 
fsid 0c36d242-92a9-4331-b48d-ce07b628750a
[osd01][ERROR ] RuntimeError: command returned non-zero exit status: 1
[ceph_deploy][ERROR ] RuntimeError: Failed to execute command: ceph-disk -v 
activate --mark-init systemd --mount /dev/sdb1

Why do I get "no cluster conf found"?

[ceph@osd01 ~]$ ll /etc/ceph/
total 12
-rw--- 1 ceph ceph  63 Dec  2 10:30 ceph.client.admin.keyring
-rw-r--r-- 1 ceph ceph 270 Dec  2 10:31 ceph.conf
-rwxr-xr-x 1 ceph ceph  92 Nov 10 07:06 rbdmap
-rw--- 1 ceph ceph   0 Dec  2 10:30 tmp0jJPo4

[ceph@osd01 ~]$ cat /etc/ceph/ceph.conf
[global]
fsid = 0e906cd0-81f1-412c-a3aa-3866192a2de7
mon_initial_members = cmon01, cmon02, cmon03
mon_host = 10.8.250.249,10.8.250.248,10.8.250.247
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
filestore_xattr_use_omap = true

Why is it looking for a different fsid than the one in ceph.conf?
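
One possible explanation (an assumption, not something I can verify from here):
the disk may have been prepared while an earlier, since-recreated cluster with
fsid 0c36d242-... existed, and ceph-disk is reading the ceph_fsid file stored
on the data partition. A rough way to check, and to rebuild the disk if that is
the case (this destroys anything on sdb):

  mount /dev/sdb1 /mnt && cat /mnt/ceph_fsid && umount /mnt
  # if it shows the old fsid:
  #   ceph-deploy disk zap osd01:sdb
  #   ceph-deploy osd prepare osd01:sdb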

Thanks,
Dan
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Ceph Sizing

2015-12-02 Thread Sam Huracan
Hi,
I'm building a storage structure for OpenStack cloud System, input:
- 700 VM
- 150 IOPS per VM
- 20 Storage per VM (boot volume)
- Some VM run database (SQL or MySQL)

I want to ask about a sizing plan for Ceph to satisfy the IOPS requirement. I
list some factors to consider:
- Amount of OSD (SAS Disk)
- Amount of Journal (SSD)
- Amount of OSD Servers
- Amount of MON Server
- Network
- Replica ( default is 3)

I will divide into 3 pools with 3 disk types: SSD, SAS 15k and SAS 10k.
Should I use all 3 disk types in one server, or build dedicated servers for
every pool? Example: 3 15k servers for Pool-1, 3 10k servers for Pool-2.

Could you help me with a formula to calculate the minimum number of devices
needed for the above input?

Thanks and regards.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph Sizing

2015-12-02 Thread Nick Fisk
You've left out an important factor: cost. Otherwise I would just say buy
enough SSD to cover the capacity.

> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
> Sam Huracan
> Sent: 02 December 2015 15:46
> To: ceph-us...@ceph.com
> Subject: [ceph-users] Ceph Sizing
> 
> Hi,
> I'm building a storage structure for OpenStack cloud System, input:
> - 700 VM
> - 150 IOPS per VM
> - 20 Storage per VM (boot volume)
> - Some VM run database (SQL or MySQL)
> 
> I want to ask a sizing plan for Ceph to satisfy the IOPS requirement, I list
> some factors considered:
> - Amount of OSD (SAS Disk)
> - Amount of Journal (SSD)
> - Amount of OSD Servers
> - Amount of MON Server
> - Network
> - Replica ( default is 3)
> 
> I will divide to 3 pool with 3 Disk types: SSD, SAS 15k and SAS 10k
> Should I use all 3 disk types in one server or build dedicated servers for 
> every
> pool? Example: 3 15k servers for Pool-1, 3 10k Servers for Pool-2.
> 
> Could you help me a formula to calculate the minimum devices needed for
> above input.
> 
> Thanks and regards.






___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph Sizing

2015-12-02 Thread Srinivasula Maram
One more factor we need to consider here is the IO size (block size) behind the
required IOPS; based on this we can calculate the bandwidth and design the solution.
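
Purely as a back-of-envelope sketch -- the read/write split, the write
amplification and the per-disk IOPS below are assumptions to be replaced with
your own numbers:

  CLIENT_IOPS=$((700 * 150))            # 105000 front-end IOPS
  READS=$((CLIENT_IOPS * 70 / 100))     # assume a 70/30 read/write mix
  WRITES=$((CLIENT_IOPS * 30 / 100))
  BACKEND=$((READS + WRITES * 3 * 2))   # 3x replication, journal + data write
  echo "backend IOPS needed: $BACKEND"
  echo "10k SAS spindles at ~150 IOPS each: $(( (BACKEND + 149) / 150 ))"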

Thanks
Srinivas

-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Nick 
Fisk
Sent: Wednesday, December 02, 2015 9:28 PM
To: 'Sam Huracan'; ceph-us...@ceph.com
Subject: Re: [ceph-users] Ceph Sizing

You've left out an important factor: cost. Otherwise I would just say buy
enough SSD to cover the capacity.

> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf 
> Of Sam Huracan
> Sent: 02 December 2015 15:46
> To: ceph-us...@ceph.com
> Subject: [ceph-users] Ceph Sizing
> 
> Hi,
> I'm building a storage structure for OpenStack cloud System, input:
> - 700 VM
> - 150 IOPS per VM
> - 20 Storage per VM (boot volume)
> - Some VM run database (SQL or MySQL)
> 
> I want to ask a sizing plan for Ceph to satisfy the IOPS requirement, 
> I list some factors considered:
> - Amount of OSD (SAS Disk)
> - Amount of Journal (SSD)
> - Amount of OSD Servers
> - Amount of MON Server
> - Network
> - Replica ( default is 3)
> 
> I will divide to 3 pool with 3 Disk types: SSD, SAS 15k and SAS 10k 
> Should I use all 3 disk types in one server or build dedicated servers 
> for every pool? Example: 3 15k servers for Pool-1, 3 10k Servers for Pool-2.
> 
> Could you help me a formula to calculate the minimum devices needed 
> for above input.
> 
> Thanks and regards.






___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] how to mount a bootable VM image file?

2015-12-02 Thread Judd Maltin
I'm using OpenStack to create VMs.  They're KVM VMs, and I can see all the
authentication information I need in the process tree.  I want to mount
this bootable image on the hypervisor node to access its filesystem and fix
a file I messed up in /etc/ so I can get the VM to boot.

[root@ceph mnt]# mount -t ceph
192.168.170.53:6789:/volumes/d02ef718-bb44-4316-9e93-5979396921da_disk
/mnt/image -o 'name=volumes,secret=AQDG7fBVqH3/LxAA8pQ0IF5LKQzAPYKTv8SvfQ=='
mount: 192.168.170.53:6789:/volumes/d02ef718-bb44-4316-9e93-5979396921da_disk:
can't read superblock

How can I find and use the partition inside this raw, bootable file image?

Thanks folks,
-judd

-- 
Judd Maltin
T: 917-882-1270
Of Life immense in passion, pulse, and power,  Cheerful—for freest action
form’d, under the laws divine,  The Modern Man I sing. -Walt Whitman
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] radosgw in 0.94.5 leaking memory?

2015-12-02 Thread Dan van der Ster
On Wed, Dec 2, 2015 at 11:09 AM, Dan van der Ster  wrote:
> Hi,
>
> We've had increased user activity on our radosgw boxes the past two
> days and are finding that the radosgw is growing quickly in used
> memory. Most of our gateways are VMs with 4GB of memory and these are
> getting OOM-killed after ~30 mins of high user load. We added a few
> physical gateways with 64GB of ram and overnight those have grown from
> zero to more than 8GB, and are still growing.
>
> I'm not a valgrind expert, but I've been running one of the daemons like this:
>
>   valgrind --leak-check=full /bin/radosgw -n client.radosgw.cephrgw -f
>
> but it's not reporting any leaks, even though the memory usage is
> climbing for that process.
>
> Anyone seen something similar? Any tips for tracking this down? My
> next (random) step will be to disable the rgw_cache and see if that
> helps.

Neither changing the lru cache size nor disabling the rgw_cache
completely seems to make a difference.

We're now checking if the keystone s3 integration feature could be to
blame -- we just enabled that a couple of days ago and it seems to
correlate.

-- dan
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] how to mount a bootable VM image file?

2015-12-02 Thread Gregory Farnum
On Wednesday, December 2, 2015, Judd Maltin  wrote:

> I'm using OpenStack to create VMs.  They're KVM VMs, and I can see all the
> authentication information I need on the process tree.  I want to mount
> this bootable image on the hypervizor node to access its filesystem and fix
> a file I messed up in /etc/ so I can get the VM to boot.
>
> [root@ceph mnt]# mount -t ceph 
> 192.168.170.53:6789:/volumes/d02ef718-bb44-4316-9e93-5979396921da_disk
> /mnt/image -o 'name=volumes,secret=AQDG7fBVqH3/LxAA8pQ0IF5LKQzAPYKTv8SvfQ=='
> mount: 192.168.170.53:6789:/volumes/d02ef718-bb44-4316-9e93-5979396921da_disk:
> can't read superblock
>
> How can I find and use the partition inside this raw, bootable file image?
>

You've probably created it using features that the kernel client you have
installed doesn't understand. You'd need to either use a newer kernel or
(more likely) just hook it up to a VM with QEMU.
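
One way to do that without the kernel client at all is qemu-nbd, which goes
through librbd -- a hedged sketch, with the pool/image and cephx user as
placeholders, and only with the VM shut off:

  modprobe nbd max_part=16
  qemu-nbd --connect=/dev/nbd0 rbd:<pool>/<uuid>_disk:id=volumes
  mount /dev/nbd0p1 /mnt/image          # fix /etc/..., then:
  umount /mnt/image
  qemu-nbd --disconnect /dev/nbd0
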
-Greg




> Thanks folks,
> -judd
>
> --
> Judd Maltin
> T: 917-882-1270
> Of Life immense in passion, pulse, and power,  Cheerful—for freest action
> form’d, under the laws divine,  The Modern Man I sing. -Walt Whitman
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] how to mount a bootable VM image file?

2015-12-02 Thread Jan Schermer
There's a pretty cool thing called libguestfs, and a tool called guestfish.

http://libguestfs.org 

I've never used it (just stumbled on it recently) but it should do exactly what 
you need :-) And it supports RBD.

Jan


> On 02 Dec 2015, at 18:07, Gregory Farnum  wrote:
> 
> On Wednesday, December 2, 2015, Judd Maltin  > wrote:
> I'm using OpenStack to create VMs.  They're KVM VMs, and I can see all the 
> authentication information I need on the process tree.  I want to mount this 
> bootable image on the hypervizor node to access its filesystem and fix a file 
> I messed up in /etc/ so I can get the VM to boot.
> 
> [root@ceph mnt]# mount -t ceph 
> 192.168.170.53:6789:/volumes/d02ef718-bb44-4316-9e93-5979396921da_disk 
> /mnt/image -o 'name=volumes,secret=AQDG7fBVqH3/LxAA8pQ0IF5LKQzAPYKTv8SvfQ=='
> mount: 
> 192.168.170.53:6789:/volumes/d02ef718-bb44-4316-9e93-5979396921da_disk: can't 
> read superblock
> 
> How can I find and use the partition inside this raw, bootable file image?
> 
> You've probably created it using features that the kernel client you have 
> installed doesn't understand. You'd need to either use a newer kernel or 
> (more likely) just hook it up to a VM with QEMU.
> -Greg
> 
> 
>  
> Thanks folks,
> -judd
> 
> -- 
> Judd Maltin
> T: 917-882-1270
> Of Life immense in passion, pulse, and power,   <>
> Cheerful—for freest action form’d, under the laws divine,   <>
> The Modern Man I sing. -Walt Whitman
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Infernalis for Debian 8 armhf

2015-12-02 Thread Swapnil Jain
If you can point me to some documention, I can do that.

—
Swapnil Jain

> On 02-Dec-2015, at 7:31 pm, Alfredo Deza  wrote:
> 
> On Tue, Dec 1, 2015 at 11:58 PM, Swapnil Jain  wrote:
>> 
>> Hi,
>> 
>> Any plans to release Infernalis Debian 8 binary packages for armhf. As I 
>> only see it for amd64.
> 
> This would be pretty simple to do but we don't have any ARM boxes
> around and nothing is immediately available for us
> to setup any.
> 
>> 
>> 
>> 
>> —
>> 
>> Swapnil Jain
>> 
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> 



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Flapping OSDs, Large meta directories in OSDs

2015-12-02 Thread Gregory Farnum
On Tue, Dec 1, 2015 at 10:02 AM, Tom Christensen  wrote:
> Another thing that we don't quite grasp is that when we see slow requests
> now they almost always, probably 95% have the "known_if_redirected" state
> set.  What does this state mean?  Does it indicate we have OSD maps that are
> lagging and the cluster isn't really in sync?  Could this be the cause of
> our growing osdmaps?

This is just a flag set on operations by new clients to let the OSD
perform more effectively — you don't need to worry about it.

I'm not sure why you're getting a bunch of client blacklist
operations, but each one will generate a new OSDMap (if nothing else
prompts one), yes.
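
If you want to watch whether old maps are being trimmed, something like this
on the node hosting the OSD might help (osd.0 is just an example id):

  ceph daemon osd.0 status | egrep 'oldest_map|newest_map'
  du -sh /var/lib/ceph/osd/ceph-0/current/meta
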
-Greg

>
> -Tom
>
>
> On Tue, Dec 1, 2015 at 2:35 AM, HEWLETT, Paul (Paul)
>  wrote:
>>
>> I believe that ‘filestore xattr use omap’ is no longer used in Ceph – can
>> anybody confirm this?
>> I could not find any usage in the Ceph source code except that the value
>> is set in some of the test software…
>>
>> Paul
>>
>>
>> From: ceph-users  on behalf of Tom
>> Christensen 
>> Date: Monday, 30 November 2015 at 23:20
>> To: "ceph-users@lists.ceph.com" 
>> Subject: Re: [ceph-users] Flapping OSDs, Large meta directories in OSDs
>>
>> What counts as ancient?  Concurrent to our hammer upgrade we went from
>> 3.16->3.19 on ubuntu 14.04.  We are looking to revert to the 3.16 kernel
>> we'd been running because we're also seeing an intermittent (its happened
>> twice in 2 weeks) massive load spike that completely hangs the osd node
>> (we're talking about load averages that hit 20k+ before the box becomes
>> completely unresponsive).  We saw a similar behavior on a 3.13 kernel, which
>> resolved by moving to the 3.16 kernel we had before.  I'll try to catch one
>> with debug_ms=1 and see if I can see it we're hitting a similar hang.
>>
>> To your comment about omap, we do have filestore xattr use omap = true in
>> our conf... which we believe was placed there by ceph-deploy (which we used
>> to deploy this cluster).  We are on xfs, but we do take tons of RBD
>> snapshots.  If either of these use cases will cause lots of osd map size
>> then, we may just be exceeding the limits of the number of rbd snapshots
>> ceph can handle (we take about 4-5000/day, 1 per RBD in the cluster)
>>
>> An interesting note, we had an OSD flap earlier this morning, and when it
>> did, immediately after it came back I checked its meta directory size with
>> du -sh, this returned immediately, and showed a size of 107GB.  The fact
>> that it returned immediately indicated to me that something had just
>> recently read through that whole directory and it was all cached in the FS
>> cache.  Normally a du -sh on the meta directory takes a good 5 minutes to
>> return.  Anyway, since it dropped this morning its meta directory size
>> continues to shrink and is down to 93GB.  So it feels like something happens
>> that makes the OSD read all its historical maps which results in the OSD
>> hanging cause there are a ton of them, and then it wakes up and realizes it
>> can delete a bunch of them...
>>
>> On Mon, Nov 30, 2015 at 2:11 PM, Dan van der Ster 
>> wrote:
>>>
>>> The trick with debugging heartbeat problems is to grep back through the
>>> log to find the last thing the affected thread was doing, e.g. is
>>> 0x7f5affe72700 stuck in messaging, writing to the disk, reading through the
>>> omap, etc..
>>>
>>> I agree this doesn't look to be network related, but if you want to rule
>>> it out you should use debug_ms=1.
>>>
>>> Last week we upgraded a 1200 osd cluster from firefly to 0.94.5 and
>>> similarly started getting slow requests. To make a long story short, our
>>> issue turned out to be sendmsg blocking (very rarely), probably due to an
>>> ancient el6 kernel (these osd servers had ~800 days' uptime). The signature
>>> of this was 900s of slow requests, then an ms log showing "initiating
>>> reconnect". Until we got the kernel upgraded everywhere, we used a
>>> workaround of ms tcp read timeout = 60.
>>> So, check your kernels, and upgrade if they're ancient. Latest el6
>>> kernels work for us.
>>>
>>> Otherwise, those huge osd leveldb's don't look right. (Unless you're
>>> using tons and tons of omap...) And it kinda reminds me of the other problem
>>> we hit after the hammer upgrade, namely the return of the ever growing mon
>>> leveldb issue. The solution was to recreate the mons one by one. Perhaps
>>> you've hit something similar with the OSDs. debug_osd=10 might be good
>>> enough to see what the osd is doing, maybe you need debug_filestore=10 also.
>>> If that doesn't show the problem, bump those up to 20.
>>>
>>> Good luck,
>>>
>>> Dan
>>>
>>> On 30 Nov 2015 20:56, "Tom Christensen"  wrote:
>>> >
>>> > We recently upgraded to 0.94.3 from firefly and now for the last week
>>> > have had intermittent slow requests and flapping OSDs.  We have been 
>>> > unable
>>> > to nail down the cause, but its feeling like it may be related to our
>>> > osdmaps not getting deleted properly

[ceph-users] OSD crash, unable to restart

2015-12-02 Thread Major Csaba

Hi,

I have a small cluster (5 nodes, 20 OSDs) where an OSD crashed. There is
no other sign of problems and no kernel messages, so the disks seem to
be OK.


I tried to restart the OSD but the process stops almost immediately with 
the same logs.


Version is 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43) on an 
ubuntu 14.04 with kernel 3.13.0-68-generic.


See the logs from the relevant OSD below.
How can I fix this, or what other info is needed to find the issue?

Thanks,
Csaba

2015-12-02 12:24:02.897795 7f1dcd6d5700  0 log_channel(cluster) log 
[INF] : 1.1cd deep-scrub starts
2015-12-02 12:24:55.524671 7f1dcd6d5700  0 log_channel(cluster) log 
[INF] : 1.1cd deep-scrub ok
2015-12-02 13:12:03.726819 7f1dcd6d5700  0 log_channel(cluster) log 
[INF] : 1.1d1 deep-scrub starts
2015-12-02 13:12:54.071754 7f1dcd6d5700  0 log_channel(cluster) log 
[INF] : 1.1d1 deep-scrub ok
2015-12-02 14:00:03.679075 7f1dcd6d5700  0 log_channel(cluster) log 
[INF] : 1.1d5 deep-scrub starts
2015-12-02 14:01:01.918174 7f1dcd6d5700  0 log_channel(cluster) log 
[INF] : 1.1d5 deep-scrub ok
2015-12-02 14:02:00.766516 7f1dcd6d5700  0 log_channel(cluster) log 
[INF] : 1.15d scrub starts
2015-12-02 14:02:03.769314 7f1dcd6d5700  0 log_channel(cluster) log 
[INF] : 1.15d scrub ok
2015-12-02 14:14:39.957244 7f1dcd6d5700  0 log_channel(cluster) log 
[INF] : 0.1e5 scrub starts
2015-12-02 14:14:39.980339 7f1dcd6d5700  0 log_channel(cluster) log 
[INF] : 0.1e5 scrub ok
2015-12-02 15:39:49.182272 7f1dcd6d5700  0 log_channel(cluster) log 
[INF] : 0.93 scrub starts
2015-12-02 15:39:49.475440 7f1dcd6d5700  0 log_channel(cluster) log 
[INF] : 0.93 scrub ok
2015-12-02 15:39:50.183518 7f1dcd6d5700  0 log_channel(cluster) log 
[INF] : 0.124 scrub starts
2015-12-02 15:39:50.460812 7f1dcd6d5700  0 log_channel(cluster) log 
[INF] : 0.124 scrub ok
2015-12-02 15:39:52.184466 7f1dcd6d5700  0 log_channel(cluster) log 
[INF] : 0.15d scrub starts
2015-12-02 15:39:52.209681 7f1dcd6d5700  0 log_channel(cluster) log 
[INF] : 0.15d scrub ok
2015-12-02 15:39:59.187012 7f1dcd6d5700  0 log_channel(cluster) log 
[INF] : 0.184 scrub starts
2015-12-02 15:39:59.216332 7f1dcd6d5700  0 log_channel(cluster) log 
[INF] : 0.184 scrub ok
2015-12-02 15:40:01.188834 7f1dcd6d5700  0 log_channel(cluster) log 
[INF] : 0.4 scrub starts
2015-12-02 15:40:01.225628 7f1dcd6d5700  0 log_channel(cluster) log 
[INF] : 0.4 scrub ok
2015-12-02 16:54:17.266052 7f1dcd6d5700  0 log_channel(cluster) log 
[INF] : 0.96 scrub starts
2015-12-02 16:54:17.286040 7f1dcd6d5700  0 log_channel(cluster) log 
[INF] : 0.96 scrub ok
2015-12-02 17:08:33.788969 7f1de2a92700  0 
filestore(/var/lib/ceph/osd/ceph-7)  error (1) Operation not permitted 
not handled on operation 0x15778000 (35890348.0.0, or op 0, counting from 0)
2015-12-02 17:08:33.788977 7f1de2a92700  0 
filestore(/var/lib/ceph/osd/ceph-7) unexpected error code
2015-12-02 17:08:33.788978 7f1de2a92700  0 
filestore(/var/lib/ceph/osd/ceph-7)  transaction dump:

{
"ops": [
{
"op_num": 0,
"op_name": "omap_setkeys",
"collection": "1.136_head",
"oid": "136\/\/head\/\/1",
"attr_lens": {
"004386.01801497": 176
}
},
{
"op_num": 1,
"op_name": "omap_setkeys",
"collection": "1.136_head",
"oid": "136\/\/head\/\/1",
"attr_lens": {
"_epoch": 4,
"_info": 745
}
},
{
"op_num": 2,
"op_name": "omap_setkeys",
"collection": "1.136_head",
"oid": "136\/\/head\/\/1",
"attr_lens": {
"004386.01801497": 176,
"can_rollback_to": 12,
"rollback_info_trimmed_to": 12
}
},
{
"op_num": 3,
"op_name": "op_setallochint",
"collection": "1.136_head",
"oid": "9afc9936\/rb.0.2f6a4.238e1f29.1a74\/head\/\/1",
"expected_object_size": "4194304",
"expected_write_size": "4194304"
},
{
"op_num": 4,
"op_name": "write",
"collection": "1.136_head",
"oid": "9afc9936\/rb.0.2f6a4.238e1f29.1a74\/head\/\/1",
"length": 4096,
"offset": 274432,
"bufferlist length": 4096
},
{
"op_num": 5,
"op_name": "setattr",
"collection": "1.136_head",
"oid": "9afc9936\/rb.0.2f6a4.238e1f29.1a74\/head\/\/1",
"name": "_",
"length": 267
},
{
"op_num": 6,
"op_name": "setattr",
"collection": "1.136_head",
"oid": "9afc9936\/rb.0.2f6a4.238e1f29.1a74\/head\/\/1",
"name": "snapset",
"length": 31
}
]
}

2015-12-02 17:08:33.793090 7f1de2a92700 -1 os/F

Re: [ceph-users] OSD crash, unable to restart

2015-12-02 Thread Gregory Farnum
On Wed, Dec 2, 2015 at 10:54 AM, Major Csaba  wrote:
> Hi,
>
> I have a small cluster(5 nodes, 20OSDs), where an OSD crashed. There is no
> any other signal of problems. No kernel message, so the disks seem to be OK.
>
> I tried to restart the OSD but the process stops almost immediately with the
> same logs.
>
> Version is 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43) on an ubuntu
> 14.04 with kernel 3.13.0-68-generic.
>
> See the logs from the relevant OSD below.
> How can I fix this or what other info needed to find the issue?
>
> Thanks,
> Csaba
>
> 2015-12-02 12:24:02.897795 7f1dcd6d5700  0 log_channel(cluster) log [INF] :
> 1.1cd deep-scrub starts
> 2015-12-02 12:24:55.524671 7f1dcd6d5700  0 log_channel(cluster) log [INF] :
> 1.1cd deep-scrub ok
> 2015-12-02 13:12:03.726819 7f1dcd6d5700  0 log_channel(cluster) log [INF] :
> 1.1d1 deep-scrub starts
> 2015-12-02 13:12:54.071754 7f1dcd6d5700  0 log_channel(cluster) log [INF] :
> 1.1d1 deep-scrub ok
> 2015-12-02 14:00:03.679075 7f1dcd6d5700  0 log_channel(cluster) log [INF] :
> 1.1d5 deep-scrub starts
> 2015-12-02 14:01:01.918174 7f1dcd6d5700  0 log_channel(cluster) log [INF] :
> 1.1d5 deep-scrub ok
> 2015-12-02 14:02:00.766516 7f1dcd6d5700  0 log_channel(cluster) log [INF] :
> 1.15d scrub starts
> 2015-12-02 14:02:03.769314 7f1dcd6d5700  0 log_channel(cluster) log [INF] :
> 1.15d scrub ok
> 2015-12-02 14:14:39.957244 7f1dcd6d5700  0 log_channel(cluster) log [INF] :
> 0.1e5 scrub starts
> 2015-12-02 14:14:39.980339 7f1dcd6d5700  0 log_channel(cluster) log [INF] :
> 0.1e5 scrub ok
> 2015-12-02 15:39:49.182272 7f1dcd6d5700  0 log_channel(cluster) log [INF] :
> 0.93 scrub starts
> 2015-12-02 15:39:49.475440 7f1dcd6d5700  0 log_channel(cluster) log [INF] :
> 0.93 scrub ok
> 2015-12-02 15:39:50.183518 7f1dcd6d5700  0 log_channel(cluster) log [INF] :
> 0.124 scrub starts
> 2015-12-02 15:39:50.460812 7f1dcd6d5700  0 log_channel(cluster) log [INF] :
> 0.124 scrub ok
> 2015-12-02 15:39:52.184466 7f1dcd6d5700  0 log_channel(cluster) log [INF] :
> 0.15d scrub starts
> 2015-12-02 15:39:52.209681 7f1dcd6d5700  0 log_channel(cluster) log [INF] :
> 0.15d scrub ok
> 2015-12-02 15:39:59.187012 7f1dcd6d5700  0 log_channel(cluster) log [INF] :
> 0.184 scrub starts
> 2015-12-02 15:39:59.216332 7f1dcd6d5700  0 log_channel(cluster) log [INF] :
> 0.184 scrub ok
> 2015-12-02 15:40:01.188834 7f1dcd6d5700  0 log_channel(cluster) log [INF] :
> 0.4 scrub starts
> 2015-12-02 15:40:01.225628 7f1dcd6d5700  0 log_channel(cluster) log [INF] :
> 0.4 scrub ok
> 2015-12-02 16:54:17.266052 7f1dcd6d5700  0 log_channel(cluster) log [INF] :
> 0.96 scrub starts
> 2015-12-02 16:54:17.286040 7f1dcd6d5700  0 log_channel(cluster) log [INF] :
> 0.96 scrub ok
> 2015-12-02 17:08:33.788969 7f1de2a92700  0
> filestore(/var/lib/ceph/osd/ceph-7)  error (1) Operation not permitted not
> handled on operation 0x15778000 (35890348.0.0, or op 0, counting from 0)

Okay, so you're getting EPERM on op 0. I think the naive reading of it
as that "omap_setkeys" operation right below is the correct one. So
for some reason the OSD isn't being allowed to do that.

Is the OSD's leveldb store accessible and working properly? Does the
process have write access to it?
This one looks pretty weird to me but I haven't spent much time
debugging disk access issues. *shrug*
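
A few quick, non-destructive checks on the node might narrow it down (paths
taken from the log above):

  ls -ld /var/lib/ceph/osd/ceph-7/current/omap     # ownership/permissions
  df -h /var/lib/ceph/osd/ceph-7                   # not out of space?
  dmesg | tail -n 50                               # any I/O or fs errors?
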
-Greg

> 2015-12-02 17:08:33.788977 7f1de2a92700  0
> filestore(/var/lib/ceph/osd/ceph-7) unexpected error code
> 2015-12-02 17:08:33.788978 7f1de2a92700  0
> filestore(/var/lib/ceph/osd/ceph-7)  transaction dump:
> {
> "ops": [
> {
> "op_num": 0,
> "op_name": "omap_setkeys",
> "collection": "1.136_head",
> "oid": "136\/\/head\/\/1",
> "attr_lens": {
> "004386.01801497": 176
> }
> },
> {
> "op_num": 1,
> "op_name": "omap_setkeys",
> "collection": "1.136_head",
> "oid": "136\/\/head\/\/1",
> "attr_lens": {
> "_epoch": 4,
> "_info": 745
> }
> },
> {
> "op_num": 2,
> "op_name": "omap_setkeys",
> "collection": "1.136_head",
> "oid": "136\/\/head\/\/1",
> "attr_lens": {
> "004386.01801497": 176,
> "can_rollback_to": 12,
> "rollback_info_trimmed_to": 12
> }
> },
> {
> "op_num": 3,
> "op_name": "op_setallochint",
> "collection": "1.136_head",
> "oid": "9afc9936\/rb.0.2f6a4.238e1f29.1a74\/head\/\/1",
> "expected_object_size": "4194304",
> "expected_write_size": "4194304"
> },
> {
> "op_num": 4,
> "op_name": "write",
> "collection": "1.136_head",
> "oid": "9afc9936\/rb.0.2f6a4.238e1f29.1a74\/head\/\/1",
> "

Re: [ceph-users] OSD crash, unable to restart

2015-12-02 Thread Gregory Farnum
On Wed, Dec 2, 2015 at 11:11 AM, Major Csaba  wrote:
> Hi,
> [ sorry, I accidentaly left out the list address ]
>
> This is the content of the LOG file in the directory
> /var/lib/ceph/osd/ceph-7/current/omap:
> 2015/12/02-18:48:12.241386 7f805fc27900 Recovering log #26281
> 2015/12/02-18:48:12.242455 7f805fc27900 Level-0 table #26283: started
> 2015/12/02-18:48:12.274615 7f805fc27900 Level-0 table #26283: 32841 bytes OK
> 2015/12/02-18:48:12.352606 7f805fc27900 Delete type=2 #26282
> 2015/12/02-18:48:12.353143 7f805fc27900 Delete type=2 #26284
> 2015/12/02-18:48:12.353657 7f805fc27900 Delete type=2 #26285
> 2015/12/02-18:48:12.354145 7f805fc27900 Delete type=3 #26279
> 2015/12/02-18:48:12.354183 7f805fc27900 Delete type=0 #26281
> 2015/12/02-18:48:12.354343 7f80541a4700 Compacting 14@0 + 5@1 files
> 2015/12/02-18:48:12.497244 7f80541a4700 Generated table #26285: 116539 keys,
> 2134818 bytes
> 2015/12/02-18:48:12.653044 7f80541a4700 Generated table #26286: 173429 keys,
> 2130210 bytes
> 2015/12/02-18:48:12.819800 7f80541a4700 Generated table #26287: 172490 keys,
> 2130006 bytes
> 2015/12/02-18:48:12.894106 7f80541a4700 compacted to: files[ 14 5 8 0 0 0 0
> ]
> 2015/12/02-18:48:12.894115 7f80541a4700 Compaction error: Corruption:
> corrupted compressed block contents
>
> I don't know if it's relevant, but seems wrong.

Uh, yeah. Seems something has gone terribly wrong in or under leveldb.
I'm not sure if there's any good way to repair that, or if you just
need to figure out how it died (to prevent future recurrences) and
rebuild the OSD. :/
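
If it does come to rebuilding, the usual hammer-era removal looks roughly
like the sketch below -- check it against the docs for your release before
running anything:

  ceph osd out 7                 # wait for backfill to finish
  stop ceph-osd id=7             # upstart syntax on Ubuntu 14.04
  ceph osd crush remove osd.7
  ceph auth del osd.7
  ceph osd rm 7
  # then wipe and re-prepare the disk, e.g. with ceph-disk prepare
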
-Greg


>
> Regards,
> Csaba
>
> On 12/02/2015 07:54 PM, Major Csaba wrote:
>
> Hi,
>
> I have a small cluster(5 nodes, 20OSDs), where an OSD crashed. There is no
> any other signal of problems. No kernel message, so the disks seem to be OK.
>
> I tried to restart the OSD but the process stops almost immediately with the
> same logs.
>
> Version is 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43) on an ubuntu
> 14.04 with kernel 3.13.0-68-generic.
>
> See the logs from the relevant OSD below.
> How can I fix this or what other info needed to find the issue?
>
> Thanks,
> Csaba
>
> 2015-12-02 12:24:02.897795 7f1dcd6d5700  0 log_channel(cluster) log [INF] :
> 1.1cd deep-scrub starts
> 2015-12-02 12:24:55.524671 7f1dcd6d5700  0 log_channel(cluster) log [INF] :
> 1.1cd deep-scrub ok
> 2015-12-02 13:12:03.726819 7f1dcd6d5700  0 log_channel(cluster) log [INF] :
> 1.1d1 deep-scrub starts
> 2015-12-02 13:12:54.071754 7f1dcd6d5700  0 log_channel(cluster) log [INF] :
> 1.1d1 deep-scrub ok
> 2015-12-02 14:00:03.679075 7f1dcd6d5700  0 log_channel(cluster) log [INF] :
> 1.1d5 deep-scrub starts
> 2015-12-02 14:01:01.918174 7f1dcd6d5700  0 log_channel(cluster) log [INF] :
> 1.1d5 deep-scrub ok
> 2015-12-02 14:02:00.766516 7f1dcd6d5700  0 log_channel(cluster) log [INF] :
> 1.15d scrub starts
> 2015-12-02 14:02:03.769314 7f1dcd6d5700  0 log_channel(cluster) log [INF] :
> 1.15d scrub ok
> 2015-12-02 14:14:39.957244 7f1dcd6d5700  0 log_channel(cluster) log [INF] :
> 0.1e5 scrub starts
> 2015-12-02 14:14:39.980339 7f1dcd6d5700  0 log_channel(cluster) log [INF] :
> 0.1e5 scrub ok
> 2015-12-02 15:39:49.182272 7f1dcd6d5700  0 log_channel(cluster) log [INF] :
> 0.93 scrub starts
> 2015-12-02 15:39:49.475440 7f1dcd6d5700  0 log_channel(cluster) log [INF] :
> 0.93 scrub ok
> 2015-12-02 15:39:50.183518 7f1dcd6d5700  0 log_channel(cluster) log [INF] :
> 0.124 scrub starts
> 2015-12-02 15:39:50.460812 7f1dcd6d5700  0 log_channel(cluster) log [INF] :
> 0.124 scrub ok
> 2015-12-02 15:39:52.184466 7f1dcd6d5700  0 log_channel(cluster) log [INF] :
> 0.15d scrub starts
> 2015-12-02 15:39:52.209681 7f1dcd6d5700  0 log_channel(cluster) log [INF] :
> 0.15d scrub ok
> 2015-12-02 15:39:59.187012 7f1dcd6d5700  0 log_channel(cluster) log [INF] :
> 0.184 scrub starts
> 2015-12-02 15:39:59.216332 7f1dcd6d5700  0 log_channel(cluster) log [INF] :
> 0.184 scrub ok
> 2015-12-02 15:40:01.188834 7f1dcd6d5700  0 log_channel(cluster) log [INF] :
> 0.4 scrub starts
> 2015-12-02 15:40:01.225628 7f1dcd6d5700  0 log_channel(cluster) log [INF] :
> 0.4 scrub ok
> 2015-12-02 16:54:17.266052 7f1dcd6d5700  0 log_channel(cluster) log [INF] :
> 0.96 scrub starts
> 2015-12-02 16:54:17.286040 7f1dcd6d5700  0 log_channel(cluster) log [INF] :
> 0.96 scrub ok
> 2015-12-02 17:08:33.788969 7f1de2a92700  0
> filestore(/var/lib/ceph/osd/ceph-7)  error (1) Operation not permitted not
> handled on operation 0x15778000 (35890348.0.0, or op 0, counting from 0)
> 2015-12-02 17:08:33.788977 7f1de2a92700  0
> filestore(/var/lib/ceph/osd/ceph-7) unexpected error code
> 2015-12-02 17:08:33.788978 7f1de2a92700  0
> filestore(/var/lib/ceph/osd/ceph-7)  transaction dump:
> {
> "ops": [
> {
> "op_num": 0,
> "op_name": "omap_setkeys",
> "collection": "1.136_head",
> "oid": "136\/\/head\/\/1",
> "attr_lens": {
> "004386.01801497": 176
> }
> 

Re: [ceph-users] OSD crash, unable to restart

2015-12-02 Thread Major Csaba

Hi,

On 12/02/2015 08:12 PM, Gregory Farnum wrote:

On Wed, Dec 2, 2015 at 11:11 AM, Major Csaba  wrote:

Hi,
[ sorry, I accidentaly left out the list address ]

This is the content of the LOG file in the directory
/var/lib/ceph/osd/ceph-7/current/omap:
2015/12/02-18:48:12.241386 7f805fc27900 Recovering log #26281
2015/12/02-18:48:12.242455 7f805fc27900 Level-0 table #26283: started
2015/12/02-18:48:12.274615 7f805fc27900 Level-0 table #26283: 32841 bytes OK
2015/12/02-18:48:12.352606 7f805fc27900 Delete type=2 #26282
2015/12/02-18:48:12.353143 7f805fc27900 Delete type=2 #26284
2015/12/02-18:48:12.353657 7f805fc27900 Delete type=2 #26285
2015/12/02-18:48:12.354145 7f805fc27900 Delete type=3 #26279
2015/12/02-18:48:12.354183 7f805fc27900 Delete type=0 #26281
2015/12/02-18:48:12.354343 7f80541a4700 Compacting 14@0 + 5@1 files
2015/12/02-18:48:12.497244 7f80541a4700 Generated table #26285: 116539 keys,
2134818 bytes
2015/12/02-18:48:12.653044 7f80541a4700 Generated table #26286: 173429 keys,
2130210 bytes
2015/12/02-18:48:12.819800 7f80541a4700 Generated table #26287: 172490 keys,
2130006 bytes
2015/12/02-18:48:12.894106 7f80541a4700 compacted to: files[ 14 5 8 0 0 0 0
]
2015/12/02-18:48:12.894115 7f80541a4700 Compaction error: Corruption:
corrupted compressed block contents

I don't know if it's relevant, but seems wrong.

Uh, yeah. Seems something has gone terribly wrong in or under leveldb.
I'm not sure if there's any good way to repair that, or if you just
need to figure out how it died (to prevent future recurrences) and
rebuild the OSD. :/
-Greg

We had a RAM error on this node earlier; we replaced the hardware but kept
the disks with their content. So I can imagine the data was corrupted earlier
by the bad RAM module. I'll rebuild this OSD.


Thanks for your help.

Regards,
Csaba
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] does anyone know what xfsaild and kworker are?they make osd disk busy. produce 100-200iops per osd disk?

2015-12-02 Thread Somnath Roy
I think each write will create 2 objects (a 512 KB head object + the rest of the
contents) if your object size > 512KB. It also writes some xattrs on top of what
the OSD is writing. Don't take my word blindly as I am not fully familiar
with RGW :-)
This will touch a significant number of inodes, I guess..
But I think the effect will be much more severe in the RBD partial random
write case.
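
If you want to see that on an existing setup, listing the RGW data pool for one
uploaded S3 object shows the head object plus the extra pieces, and listxattr
shows what gets layered on top. The pool name below assumes a default RGW layout
and the placeholders are just that, placeholders:

rados -p .rgw.buckets ls | grep <bucket_marker>
rados -p .rgw.buckets listxattr <one_of_the_listed_objects>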

Thanks & Regards
Somnath

-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of flisky
Sent: Wednesday, December 02, 2015 6:39 AM
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] does anyone know what xfsaild and kworker are?they 
make osd disk busy. produce 100-200iops per osd disk?

Ignore my last reply. I read the thread [Re: XFS
Syncd]("http://oss.sgi.com/archives/xfs/2015-06/msg00111.html") and found that
this might be okay.

The xfs_ail_push calls are almost all for INODE rather than BUF items (1579 vs 99).
Our Ceph cluster is dedicated to the S3 service, and the writes are small.
So where do so many inode changes come from? How can I decrease them?

Thanks in advance!
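
(For what it's worth, a per-task breakdown of the same report can help show where
the inode lock traffic comes from; field positions are as in the awk one-liner
from the earlier message, where $4 is the event name and $1 the task:)

# count xfs_ilock events per task, busiest tasks last
awk '$4 == "xfs_ilock:"' xfs.txt | awk '{print $1}' | sort | uniq -c | sort -n | tail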

==
Mount Options:
rw,noatime,seclabel,swalloc,attr2,largeio,nobarrier,inode64,logbsize=256k,noquota

==
XFS Info:
meta-data=/dev/sdb1  isize=2048   agcount=4,
agsize=182979519 blks
 =   sectsz=512   attr=2, projid32bit=1
 =   crc=0finobt=0
data =   bsize=4096   blocks=731918075, imaxpct=5
 =   sunit=0  swidth=0 blks
naming   =version 2  bsize=4096   ascii-ci=0 ftype=0
log  =internal   bsize=4096   blocks=357381, version=2
 =   sectsz=512   sunit=0 blks, lazy-count=1
realtime =none   extsz=4096   blocks=0, rtextents=0




On 2015年12月02日 16:20, flisky wrote:
> It works. However, I think the root case is due to the xfs_buf missing?
> 
> trace-cmd record -e xfs\*
> trace-cmd report > xfs.txt
> awk '{print $4}' xfs2.txt |sort -n |uniq -c|sort -n|tail -n 20
> 
>14468 xfs_file_splice_write:
>16562 xfs_buf_find:
>19597 xfs_buf_read:
>19634 xfs_buf_get:
>21943 xfs_get_blocks_alloc:
>23265 xfs_perag_put:
>26327 xfs_perag_get:
>27853 xfs_ail_locked:
>39252 xfs_buf_iorequest:
>40187 xfs_ail_delete:
>41590 xfs_buf_ioerror:
>42523 xfs_buf_hold:
>44659 xfs_buf_trylock:
>47986 xfs_ail_flushing:
>50793 xfs_ilock_nowait:
>57585 xfs_ilock:
>58293 xfs_buf_unlock:
>79977 xfs_buf_iodone:
>   104165 xfs_buf_rele:
>   108383 xfs_iunlock:
> 
> Could you please give me another hint? :) Thanks!
> 
> On 2015年12月02日 05:14, Somnath Roy wrote:
>> Sure..The following settings helped me minimizing the effect a bit 
>> for the PR https://github.com/ceph/ceph/pull/6670
>>
>>
>> sysctl -w fs.xfs.xfssyncd_centisecs=72
>> sysctl -w fs.xfs.xfsbufd_centisecs=3000
>> sysctl -w fs.xfs.age_buffer_centisecs=72
>>
>> But, for existing Ceph write path you may need to tweak this..
>>
>> Thanks & Regards
>> Somnath
>>
>> -Original Message-
>> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf 
>> Of flisky
>> Sent: Tuesday, December 01, 2015 11:04 AM
>> To: ceph-users@lists.ceph.com
>> Subject: Re: [ceph-users] does anyone know what xfsaild and kworker are?they 
>> make osd disk busy. produce 100-200iops per osd disk?
>>
>> On 2015年12月02日 01:31, Somnath Roy wrote:
>>> This is xfs metadata sync process...when it is waking up and there 
>>> are lot of data to sync it will throttle all the process accessing 
>>> the drive...There are some xfs settings to control the behavior, but 
>>> you can't stop that
>> May I ask how to tune the xfs settings? Thanks!
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Infernalis for Debian 8 armhf

2015-12-02 Thread ceph new
For now the process for me was:
git clone https://github.com/ceph/ceph.git
git checkout infernalis
cd ceph
apt-get install debhelper autoconf automake autotools-dev libbz2-dev cmake
default-jdk gdisk javahelper junit4 libaio-dev libatomic-ops-dev
libbabeltrace-ctf-dev libbabeltrace-dev libblkid-dev libboost-dev
libboost-program-options-dev libboost-system-dev libboost-thread-dev
libboost-regex-dev libboost-random-dev libcurl4-gnutls-dev libedit-dev
libfcgi-dev libfuse-dev libkeyutils-dev libleveldb-dev libnss3-dev
libsnappy-dev liblttng-ust-dev libtool libudev-dev libxml2-dev python-nose
python-sphinx python-virtualenv uuid-runtime xfslibs-dev xfsprogs
xmlstarlet libtcmalloc-minimal4 libgoogle-perftools-dev libgoogle-perftools4
./install-deps.sh
dpkg-buildpackage -j3

I got OOM-killed, so I will add swap space and run it again.
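
For reference, a quick way to add temporary swap for the build (the size is
arbitrary); dropping the build parallelism also helps:

# create and enable a 4 GB swap file
dd if=/dev/zero of=/swapfile bs=1M count=4096
chmod 600 /swapfile
mkswap /swapfile
swapon /swapfile
# retry the build with less parallelism
dpkg-buildpackage -j1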

On Wed, Dec 2, 2015 at 12:30 PM, Swapnil Jain  wrote:

> If you can point me to some documentation, I can do that.
>
> —
> *Swapnil Jain*
>
> On 02-Dec-2015, at 7:31 pm, Alfredo Deza  wrote:
>
> On Tue, Dec 1, 2015 at 11:58 PM, Swapnil Jain  wrote:
>
>
> Hi,
>
> Any plans to release Infernalis Debian 8 binary packages for armhf. As I
> only see it for amd64.
>
>
> This would be pretty simple to do but we don't have any ARM boxes
> around and nothing is immediately available for us
> to setup any.
>
>
>
>
> —
>
> Swapnil Jain
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] systemctl enable ceph-mon fails in ceph-deploy create initial (no such service)

2015-12-02 Thread Gruher, Joseph R
Hey folks.  Running RHEL7.1 with stock 3.10.0 kernel and trying to deploy 
Infernalis.  Haven't done this since Firefly but I used to know what I was 
doing.  My problem is "ceph-deploy new" and "ceph-deploy install" seem to go 
well but "ceph-deploy mon create-initial" reliably fails when starting the 
ceph-mon service.  I attached a full log of the deploy attempt and have pasted 
a sample of the problem below.  Problem seems to be that the ceph-mon service 
it wants to start doesn't actually exist on the target system.  Any ideas?  
Thanks!

[root@bdcr151 ceph]# ceph-deploy --overwrite-conf mon create-initial
[ceph_deploy.conf][DEBUG ] found configuration file at: /root/.cephdeploy.conf
[ceph_deploy.cli][INFO  ] Invoked (1.5.28): /usr/bin/ceph-deploy 
--overwrite-conf mon create-initial
[ceph_deploy.cli][INFO  ] ceph-deploy options:
[ceph_deploy.cli][INFO  ]  username  : None
[ceph_deploy.cli][INFO  ]  verbose   : False
[ceph_deploy.cli][INFO  ]  overwrite_conf: True
[ceph_deploy.cli][INFO  ]  subcommand: create-initial
[ceph_deploy.cli][INFO  ]  quiet : False
[ceph_deploy.cli][INFO  ]  cd_conf   : 

[ceph_deploy.cli][INFO  ]  cluster   : ceph
[ceph_deploy.cli][INFO  ]  func  : 
[ceph_deploy.cli][INFO  ]  ceph_conf : None
[ceph_deploy.cli][INFO  ]  default_release   : False
[ceph_deploy.cli][INFO  ]  keyrings  : None
[ceph_deploy.mon][DEBUG ] Deploying mon, cluster ceph hosts bdcr151 bdcr153 
bdcr155
[ceph_deploy.mon][DEBUG ] detecting platform for host bdcr151 ...
[bdcr151][DEBUG ] connected to host: bdcr151
[bdcr151][DEBUG ] detect platform information from remote host
[bdcr151][DEBUG ] detect machine type
[bdcr151][DEBUG ] find the location of an executable
[ceph_deploy.mon][INFO  ] distro info: Red Hat Enterprise Linux Server 7.1 Maipo
[bdcr151][DEBUG ] determining if provided host has same hostname in remote
[bdcr151][DEBUG ] get remote short hostname
[bdcr151][DEBUG ] deploying mon to bdcr151
[bdcr151][DEBUG ] get remote short hostname
[bdcr151][DEBUG ] remote hostname: bdcr151
[bdcr151][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
[bdcr151][DEBUG ] create the mon path if it does not exist
[bdcr151][DEBUG ] checking for done path: /var/lib/ceph/mon/ceph-bdcr151/done
[bdcr151][DEBUG ] create a done file to avoid re-doing the mon deployment
[bdcr151][DEBUG ] create the init path if it does not exist
[bdcr151][INFO  ] Running command: systemctl enable ceph.target
[bdcr151][INFO  ] Running command: systemctl enable ceph-mon@bdcr151
[bdcr151][WARNIN] Failed to issue method call: No such file or directory
[bdcr151][ERROR ] RuntimeError: command returned non-zero exit status: 1
[ceph_deploy.mon][ERROR ] Failed to execute command: systemctl enable 
ceph-mon@bdcr151
[ceph_deploy.mon][DEBUG ] detecting platform for host bdcr153 ...


ceph-deployment.log
Description: ceph-deployment.log
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] New cluster performance analysis

2015-12-02 Thread Adrien Gillard
Hi everyone,



I am currently testing our new cluster and I would like some feedback on
the numbers I am getting.



For the hardware :

7 x OSD : 2 x Intel 2640v3 (8x2.6GHz), 64GB RAM, 2x10Gbits LACP for public
net., 2x10Gbits LACP for cluster net., MTU 9000

1 x MON : 2 x Intel 2630L (6x2GHz), 32GB RAM and Intel DC SSD, 2x10Gbits
LACP for public net., MTU 9000

2 x MON : VMs (8 cores, 8GB RAM), backed by SSD



Journals are 20GB partitions on SSD



The system is CentOS 7.1 with stock kernel (3.10.0-229.20.1.el7.x86_64). No
particular system optimizations.



Ceph is Infernalis from Ceph repository  : ceph version 9.2.0
(bb2ecea240f3a1d525bcb35670cb07bd1f0ca299)



[cephadm@cph-adm-01  ~/scripts]$ ceph -s

cluster 259f65a3-d6c8-4c90-a9c2-71d4c3c55cce

 health HEALTH_OK

 monmap e1: 3 mons at
{clb-cph-frpar1-mon-02=x.x.x.2:6789/0,clb-cph-frpar2-mon-01=x.x.x.1:6789/0,clb-cph-frpar2-mon-03=x.x.x.3:6789/0}

election epoch 62, quorum 0,1,2
clb-cph-frpar2-mon-01,clb-cph-frpar1-mon-02,clb-cph-frpar2-mon-03

 osdmap e844: 84 osds: 84 up, 84 in

flags sortbitwise

  pgmap v111655: 3136 pgs, 3 pools, 3166 GB data, 19220 kobjects

8308 GB used, 297 TB / 305 TB avail

3136 active+clean



My ceph.conf :



[global]

fsid = 259f65a3-d6c8-4c90-a9c2-71d4c3c55cce

mon_initial_members = clb-cph-frpar2-mon-01, clb-cph-frpar1-mon-02,
clb-cph-frpar2-mon-03

mon_host = x.x.x.1,x.x.x.2,x.x.x.3

auth_cluster_required = cephx

auth_service_required = cephx

auth_client_required = cephx

filestore_xattr_use_omap = true

public network = 10.25.25.0/24

cluster network = 10.25.26.0/24

debug_lockdep = 0/0

debug_context = 0/0

debug_crush = 0/0

debug_buffer = 0/0

debug_timer = 0/0

debug_filer = 0/0

debug_objecter = 0/0

debug_rados = 0/0

debug_rbd = 0/0

debug_journaler = 0/0

debug_objectcatcher = 0/0

debug_client = 0/0

debug_osd = 0/0

debug_optracker = 0/0

debug_objclass = 0/0

debug_filestore = 0/0

debug_journal = 0/0

debug_ms = 0/0

debug_monc = 0/0

debug_tp = 0/0

debug_auth = 0/0

debug_finisher = 0/0

debug_heartbeatmap = 0/0

debug_perfcounter = 0/0

debug_asok = 0/0

debug_throttle = 0/0

debug_mon = 0/0

debug_paxos = 0/0

debug_rgw = 0/0



[osd]

osd journal size = 0

osd mount options xfs = "rw,noatime,inode64,logbufs=8,logbsize=256k"

filestore min sync interval = 5

filestore max sync interval = 15

filestore queue max ops = 2048

filestore queue max bytes = 1048576000

filestore queue committing max ops = 4096

filestore queue committing max bytes = 1048576000

filestore op thread = 32

filestore journal writeahead = true

filestore merge threshold = 40

filestore split multiple = 8



journal max write bytes = 1048576000

journal max write entries = 4096

journal queue max ops = 8092

journal queue max bytes = 1048576000



osd max write size = 512

osd op threads = 16

osd disk threads = 2

osd op num threads per shard = 3

osd op num shards = 10

osd map cache size = 1024

osd max backfills = 1

osd recovery max active = 2



I have set up 2 pools : one for cache with 3x replication in front of an EC
pool. At the moment I am only interested in the cache pool, so no
promotions/flushes/evictions happen.

(I know, I am using the same set of OSD for hot and cold data, but in my
use case they should not be used at the same time.)



I am accessing the cluster via RBD volumes mapped with the kernel module on
CentOS 7.1. These volumes are formatted in XFS on the clients.



The journal SSDs seem to perform quite well according to the results of
Sebastien Han’s benchmark suggestion (they are Sandisk) :

write: io=22336MB, bw=381194KB/s, iops=95298, runt= 60001msec (this is for
numjob=10)
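
(For reference, the test in question is essentially a 4k O_DSYNC sequential write
against the raw journal device, something along these lines; device name is just
an example:)

fio --filename=/dev/sdX --direct=1 --sync=1 --rw=write --bs=4k \
    --numjobs=10 --iodepth=1 --runtime=60 --time_based --group_reporting --name=journal-test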



Here are the rados bench tests :



rados bench -p rbdcache 120 write -b 4K -t 32 --no-cleanup


Total time run: 121.410763

Total writes made:  65357

Write size: 4096

Bandwidth (MB/sec): 2.1

Stddev Bandwidth:   0.597

Max bandwidth (MB/sec): 3.89

Min bandwidth (MB/sec): 0.00781

Average IOPS:   538

Stddev IOPS:152

Max IOPS:   995

Min IOPS:   2

Average Latency:0.0594

Stddev Latency: 0.18

Max latency:2.82

Min latency:0.00494



And the results of the fio test with the following parameters :



[global]

size=8G

runtime=300

ioengine=libaio

invalidate=1

direct=1

sync=1

fsync=1

numjobs=32

rw=randwrite

name=4k-32-1-randwrite-libaio

blocksize=4K

iodepth=1

directory=/mnt/rbd

group_reporting=1


4k-32-1-randwrite-libaio: (groupid=0, jobs=32): err= 0: pid=20442: Wed Dec
 2 21:38:30 2015

  write: io=992.11MB, bw=3389.3KB/s, iops=847, runt=300011msec

slat (usec): min=5, max=4726, avg=40.32, stdev=41.28

clat (msec): min=2, max=2208, avg=19.35, stdev=74.34

 lat (msec): min=2, max=2208, avg=19.39, stdev=74.34

clat percentiles (msec):

 |  1.00th=[3],  5.00th=[4], 10.00th=[4], 20.00th=[4]

Re: [ceph-users] New cluster performance analysis

2015-12-02 Thread Jan Schermer
> Let's take IOPS, assuming the spinners can do 50 (4k) synced sustained IOPS 
> (I hope they can do more ^^), we should be around 50x84/3 = 1400 IOPS, which 
> is far from rados bench (538) and fio (847). And surprisingly fio numbers are 
> greater than rados.
> 

I think the missing factor here is filesystem journal overhead - that would
explain the strange numbers you are seeing and the low performance in rados
bench - every filesystem metadata operation has to do at least one (synced)
OP to the journal, and that's not only file creation but also file growth (or
filling the holes). And that's on the OSD as well as on the client filesystem
side(!).


To do a proper benchmark, fill the RBD-mounted filesystem completely with data
first and then try again with fio on a preallocated file (and don't
enable discard if that's supported).
Better yet, run fio on the block device itself, but write it over with dd
if=/dev/zero first.
I think you'll get quite different numbers then.
Of course whether that's representative of what your usage pattern might be is
another story.
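
As a concrete sketch of the second suggestion (destructive: it overwrites the
RBD, and /dev/rbd0 is just an assumed device name):

# preallocate every block on the image first
dd if=/dev/zero of=/dev/rbd0 bs=4M oflag=direct
# then benchmark the raw block device
fio --filename=/dev/rbd0 --direct=1 --sync=1 --rw=randwrite --bs=4k \
    --numjobs=32 --iodepth=1 --runtime=300 --group_reporting --name=raw-4k-randwrite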

Can you tell us what workload should be running on this and what the
expectations were?
Can you see something maxed out while the benchmark is running? (CPU or drives?)
Have you tried switching schedulers on the drives?

Jan

> On 02 Dec 2015, at 22:33, Adrien Gillard  wrote:
> 
> Hi everyone,
> 
>  
> I am currently testing our new cluster and I would like some feedback on the 
> numbers I am getting.
> 
>  
> For the hardware :
> 
> 7 x OSD : 2 x Intel 2640v3 (8x2.6GHz), 64B RAM, 2x10Gbits LACP for public 
> net., 2x10Gbits LACP for cluster net., MTU 9000
> 
> 1 x MON : 2 x Intel 2630L (6x2GHz), 32GB RAM and Intel DC SSD, 2x10Gbits LACP 
> for public net., MTU 9000
> 
> 2 x MON : VMs (8 cores, 8GB RAM), backed by SSD
> 
>  
> Journals are 20GB partitions on SSD
> 
>  
> The system is CentOS 7.1 with stock kernel (3.10.0-229.20.1.el7.x86_64). No 
> particular system optimizations.
> 
>  
> Ceph is Infernalis from Ceph repository  : ceph version 9.2.0 
> (bb2ecea240f3a1d525bcb35670cb07bd1f0ca299)
> 
>  
> [cephadm@cph-adm-01  ~/scripts]$ ceph -s
> 
> cluster 259f65a3-d6c8-4c90-a9c2-71d4c3c55cce
> 
>  health HEALTH_OK
> 
>  monmap e1: 3 mons at 
> {clb-cph-frpar1-mon-02=x.x.x.2:6789/0,clb-cph-frpar2-mon-01=x.x.x.1:6789/0,clb-cph-frpar2-mon-03=x.x.x.3:6789/0}
> 
> election epoch 62, quorum 0,1,2 
> clb-cph-frpar2-mon-01,clb-cph-frpar1-mon-02,clb-cph-frpar2-mon-03
> 
>  osdmap e844: 84 osds: 84 up, 84 in
> 
> flags sortbitwise
> 
>   pgmap v111655: 3136 pgs, 3 pools, 3166 GB data, 19220 kobjects
> 
> 8308 GB used, 297 TB / 305 TB avail
> 
> 3136 active+clean
> 
>  
> My ceph.conf :
> 
>  
> [global]
> 
> fsid = 259f65a3-d6c8-4c90-a9c2-71d4c3c55cce
> 
> mon_initial_members = clb-cph-frpar2-mon-01, clb-cph-frpar1-mon-02, 
> clb-cph-frpar2-mon-03
> 
> mon_host = x.x.x.1,x.x.x.2,x.x.x.3
> 
> auth_cluster_required = cephx
> 
> auth_service_required = cephx
> 
> auth_client_required = cephx
> 
> filestore_xattr_use_omap = true
> 
> public network = 10.25.25.0/24 
> cluster network = 10.25.26.0/24 
> debug_lockdep = 0/0
> 
> debug_context = 0/0
> 
> debug_crush = 0/0
> 
> debug_buffer = 0/0
> 
> debug_timer = 0/0
> 
> debug_filer = 0/0
> 
> debug_objecter = 0/0
> 
> debug_rados = 0/0
> 
> debug_rbd = 0/0
> 
> debug_journaler = 0/0
> 
> debug_objectcatcher = 0/0
> 
> debug_client = 0/0
> 
> debug_osd = 0/0
> 
> debug_optracker = 0/0
> 
> debug_objclass = 0/0
> 
> debug_filestore = 0/0
> 
> debug_journal = 0/0
> 
> debug_ms = 0/0
> 
> debug_monc = 0/0
> 
> debug_tp = 0/0
> 
> debug_auth = 0/0
> 
> debug_finisher = 0/0
> 
> debug_heartbeatmap = 0/0
> 
> debug_perfcounter = 0/0
> 
> debug_asok = 0/0
> 
> debug_throttle = 0/0
> 
> debug_mon = 0/0
> 
> debug_paxos = 0/0
> 
> debug_rgw = 0/0
> 
>  
> [osd]
> 
> osd journal size = 0
> 
> osd mount options xfs = "rw,noatime,inode64,logbufs=8,logbsize=256k"
> 
> filestore min sync interval = 5
> 
> filestore max sync interval = 15
> 
> filestore queue max ops = 2048
> 
> filestore queue max bytes = 1048576000
> 
> filestore queue committing max ops = 4096
> 
> filestore queue committing max bytes = 1048576000
> 
> filestore op thread = 32
> 
> filestore journal writeahead = true
> 
> filestore merge threshold = 40
> 
> filestore split multiple = 8
> 
>  
> journal max write bytes = 1048576000
> 
> journal max write entries = 4096
> 
> journal queue max ops = 8092
> 
> journal queue max bytes = 1048576000
> 
>  
> osd max write size = 512
> 
> osd op threads = 16
> 
> osd disk threads = 2
> 
> osd op num threads per shard = 3
> 
> osd op num shards = 10
> 
> osd map cache size = 1024
> 
> osd max backfills = 1
> 
> osd recovery max active = 2
> 
>  
> I have set up 2 pools : one for cache with 3x replication in front of an EC 
> pool. At the moment I am only interested in the cache pool, so no 

Re: [ceph-users] infernalis osd activation on centos 7

2015-12-02 Thread Brad Hubbard
- Original Message - 

> From: "Dan Nica" 
> To: ceph-us...@ceph.com
> Sent: Thursday, 3 December, 2015 1:39:16 AM
> Subject: [ceph-users] infernalis osd activation on centos 7

> Hi guys,

> After managing to get the mons up, I am stuck at activating the osds with the
> error below

> [ceph_deploy.conf][DEBUG ] found configuration file at:
> /home/ceph/.cephdeploy.conf

What is the fsid in this file?

0c36d242-92a9-4331-b48d-ce07b628750a or 0e906cd0-81f1-412c-a3aa-3866192a2de7 ?

Check that your config in /home/ceph/ matches your running config.
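
A quick way to compare them, and to see which fsid the prepared OSD partition was
actually stamped with (the device name below is only an example):

grep fsid /home/ceph/ceph.conf /etc/ceph/ceph.conf
# the prepared data partition carries a ceph_fsid file
mount /dev/sdb1 /mnt && cat /mnt/ceph_fsid; umount /mnt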

Cheers,
Brad


> [osd01][WARNIN] __main__.Error: Error: No cluster conf found in /etc/ceph
> with fsid 0c36d242-92a9-4331-b48d-ce07b628750a
> [osd01][ERROR ] RuntimeError: command returned non-zero exit status: 1
> [ceph_deploy][ERROR ] RuntimeError: Failed to execute command: ceph-disk -v
> activate --mark-init systemd --mount /dev/sdb1

> Why do I get no cluster conf ?

> [ceph@osd01 ~]$ ll /etc/ceph/
> total 12
> -rw--- 1 ceph ceph 63 Dec 2 10:30 ceph.client.admin.keyring
> -rw-r--r-- 1 ceph ceph 270 Dec 2 10:31 ceph.conf
> -rwxr-xr-x 1 ceph ceph 92 Nov 10 07:06 rbdmap
> -rw--- 1 ceph ceph 0 Dec 2 10:30 tmp0jJPo4

> [ceph@osd01 ~]$ cat /etc/ceph/ceph.conf
> [global]
> fsid = 0e906cd0-81f1-412c-a3aa-3866192a2de7
> mon_initial_members = cmon01, cmon02, cmon03
> mon_host = 10.8.250.249,10.8.250.248,10.8.250.247
> auth_cluster_required = cephx
> auth_service_required = cephx
> auth_client_required = cephx
> filestore_xattr_use_omap = true

> why is it looking for other fsid than in the ceph.conf ?

> Thanks,
> Dan

> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph Sizing

2015-12-02 Thread Sam Huracan
IO size is 4 KB, and I need a minimum, cost-optimized sizing.
I intend to use SuperMicro devices:
http://www.supermicro.com/solutions/storage_Ceph.cfm

What do you think?
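
To put rough numbers on the IOPS side (the 70/30 write/read split below is only
an assumption, adjust it to your workload):

  front-end IOPS           : 700 VMs x 150             = 105,000
  back-end writes (3x rep) : 105,000 x 0.70 x 3        = ~220,500
  back-end reads           : 105,000 x 0.30            = ~31,500
  15k SAS at ~175 IOPS     : (220,500 + 31,500) / 175  = ~1,440 spindles

Without SSD journals the filestore double write roughly doubles the write load on
the spinners again, which is why moving the hot data to SSD (or at least putting
journals on SSD) usually ends up cheaper for an IOPS-bound design.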

2015-12-02 23:17 GMT+07:00 Srinivasula Maram 
:

> One more factor we need to consider here is IO size(block size) to get
> required IOPS, based on this we can calculate the bandwidth and design the
> solution.
>
> Thanks
> Srinivas
>
> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
> Nick Fisk
> Sent: Wednesday, December 02, 2015 9:28 PM
> To: 'Sam Huracan'; ceph-us...@ceph.com
> Subject: Re: [ceph-users] Ceph Sizing
>
> You've left out an important factor... cost. Otherwise I would just say
> buy enough SSD to cover the capacity.
>
> > -Original Message-
> > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf
> > Of Sam Huracan
> > Sent: 02 December 2015 15:46
> > To: ceph-us...@ceph.com
> > Subject: [ceph-users] Ceph Sizing
> >
> > Hi,
> > I'm building a storage structure for OpenStack cloud System, input:
> > - 700 VM
> > - 150 IOPS per VM
> > - 20 Storage per VM (boot volume)
> > - Some VM run database (SQL or MySQL)
> >
> > I want to ask a sizing plan for Ceph to satisfy the IOPS requirement,
> > I list some factors considered:
> > - Amount of OSD (SAS Disk)
> > - Amount of Journal (SSD)
> > - Amount of OSD Servers
> > - Amount of MON Server
> > - Network
> > - Replica ( default is 3)
> >
> > I will divide to 3 pool with 3 Disk types: SSD, SAS 15k and SAS 10k
> > Should I use all 3 disk types in one server or build dedicated servers
> > for every pool? Example: 3 15k servers for Pool-1, 3 10k Servers for
> Pool-2.
> >
> > Could you help me a formula to calculate the minimum devices needed
> > for above input.
> >
> > Thanks and regards.
>
>
>
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Mon quorum fails

2015-12-02 Thread Sam Huracan
Hi,

My mon quorum includes 3 nodes. If 2 of the nodes fail unexpectedly, how can I
recover the system from the 1 node left?

Thanks and regards.
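
For reference, the usual recovery path is to rewrite the monmap on the surviving
monitor so it no longer expects the dead peers. Roughly (monitor IDs are
placeholders; stop the daemon and back up its data directory first):

systemctl stop ceph-mon@<survivor>
ceph-mon -i <survivor> --extract-monmap /tmp/monmap
monmaptool /tmp/monmap --rm <dead-mon-1> --rm <dead-mon-2>
ceph-mon -i <survivor> --inject-monmap /tmp/monmap
systemctl start ceph-mon@<survivor>

Once the single monitor forms a quorum of one, new monitors can be added back the
normal way.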
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Ceph osd on btrfs maintenance/optimization

2015-12-02 Thread Timofey Titovets
Hi list,
I created a small tool for maintenance/optimization of btrfs-based OSD stores:
https://github.com/Nefelim4ag/ceph-btrfs-butler
Maybe it can be useful for somebody.

For now the script can find rarely accessed objects on disk and, based on
this information, can (see the sketch of the underlying commands below):
1. Defrag objs
2. Compress objs
3. Dedup objs (duperemove needed)
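
The underlying per-object operations are presumably along these lines (paths are
illustrative, and the compression algorithm passed to -c is just an example):

# 1. defragment an object file
btrfs filesystem defragment -v /var/lib/ceph/osd/ceph-N/current/<pg>/<object>
# 2. rewrite it compressed while defragmenting
btrfs filesystem defragment -v -czlib /var/lib/ceph/osd/ceph-N/current/<pg>/<object>
# 3. dedup a set of objects via extent-same
duperemove -dhr /var/lib/ceph/osd/ceph-N/current/<pg>/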

P.S. It's designed for btrfs, but that doesn't mean I can't rename it
and add hooks for ext4/xfs-based stores.

Feel free to ping me if you need support for other filesystems.
-- 
Have a nice day,
Timofey.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] ceph-disk activate Permission denied problems

2015-12-02 Thread Goncalo Borges

Dear Cephers...

1./ I am currently deploying infernalis 9.2.0 on CentOS 7.

2./ I am not using ceph-deploy (I prefer not to) and instead follow the
short-form OSD installation procedure described here:


   http://docs.ceph.com/docs/infernalis/install/manual-deployment/#short-form

3./ In hammer these instructions worked fine, but in infernalis 9.2.0,
with the introduction of the ceph user running the ceph daemons,
they no longer seem to work properly.


4./ Here is the behaviour:

4.1/ ceph-disk prepare runs fine:

   # ceph-disk prepare --cluster ceph --cluster-uuid
   a9431bc6-3ee1-4b0a-8d21-0ad883a4d2ed --fs-type xfs /dev/sdh /dev/sdc3
   WARNING:ceph-disk:OSD will not be hot-swappable if journal is not
   the same device as the osd data
   WARNING:ceph-disk:Journal /dev/sdc3 was not prepared with ceph-disk.
   Symlinking directly.
   Creating new GPT entries.
   The operation has completed successfully.
   meta-data=/dev/sdh1  isize=2048   agcount=4,
   agsize=183107519 blks
 =   sectsz=512   attr=2, projid32bit=1
 =   crc=0finobt=0
   data =   bsize=4096   blocks=732430075,
   imaxpct=5
 =   sunit=0  swidth=0 blks
   naming   =version 2  bsize=4096   ascii-ci=0 ftype=0
   log  =internal log   bsize=4096   blocks=357631, version=2
 =   sectsz=512   sunit=0 blks,
   lazy-count=1
   realtime =none   extsz=4096   blocks=0, rtextents=0
   The operation has completed successfully.

4.2/ ceph-disk activate fails with permission problems. It seems some
temporary directory is made by root and is then not accessible to the
ceph user.


   # ceph-disk activate /dev/sdh1
   got monmap epoch 1
   2015-12-03 01:56:41.406742 7fb050fa9900 -1
   filestore(/var/lib/ceph/tmp/mnt.dsi7zI) mkjournal error creating
   journal on /var/lib/ceph/tmp/mnt.dsi7zI/journal: (13) Permission denied
   2015-12-03 01:56:41.406777 7fb050fa9900 -1 OSD::mkfs:
   ObjectStore::mkfs failed with error -13
   2015-12-03 01:56:41.406843 7fb050fa9900 -1  ** ERROR: error creating
   empty object store in /var/lib/ceph/tmp/mnt.dsi7zI: (13) Permission
   denied
   ERROR:ceph-disk:Failed to activate
   Traceback (most recent call last):
  File "/usr/sbin/ceph-disk", line 3576, in 
main(sys.argv[1:])
  File "/usr/sbin/ceph-disk", line 3532, in main
main_catch(args.func, args)
  File "/usr/sbin/ceph-disk", line 3554, in main_catch
func(args)
  File "/usr/sbin/ceph-disk", line 2424, in main_activate
dmcrypt_key_dir=args.dmcrypt_key_dir,
  File "/usr/sbin/ceph-disk", line 2197, in mount_activate
(osd_id, cluster) = activate(path, activate_key_template, init)
  File "/usr/sbin/ceph-disk", line 2360, in activate
keyring=keyring,
  File "/usr/sbin/ceph-disk", line 1950, in mkfs
'--setgroup', get_ceph_user(),
  File "/usr/sbin/ceph-disk", line 349, in command_check_call
return subprocess.check_call(arguments)
  File "/usr/lib64/python2.7/subprocess.py", line 542, in check_call
raise CalledProcessError(retcode, cmd)
   subprocess.CalledProcessError: Command '['/usr/bin/ceph-osd',
   '--cluster', 'ceph', '--mkfs', '--mkkey', '-i', '4', '--monmap',
   '/var/lib/ceph/tmp/mnt.dsi7zI/activate.monmap', '--osd-data',
   '/var/lib/ceph/tmp/mnt.dsi7zI', '--osd-journal',
   '/var/lib/ceph/tmp/mnt.dsi7zI/journal', '--osd-uuid',
   '995e436f-dc6f-42ca-bbbf-3ef9f0bb4952', '--keyring',
   '/var/lib/ceph/tmp/mnt.dsi7zI/keyring', '--setuser', 'ceph',
   '--setgroup', 'ceph']' returned non-zero exit status 1

4.3/ If I run the ceph-disk activate command as the ceph user, it
complains it needs root permissions to mount:



   # su ceph -s /bin/bash --session-command="ceph-disk activate /dev/sdh1"
   Problem opening /dev/sdh for reading! Error is 13.
   You must run this program as root or use sudo!
   mount: only root can use "--options" option
   ceph-disk: Mounting filesystem failed: Command '['/usr/bin/mount',
   '-t', 'xfs', '-o', 'noatime,inode64', '--', '/dev/sdh1',
   '/var/lib/ceph/tmp/mnt.0bUl5q']' returned non-zero exit status 1

5./ So there is an inconsistency in this set of instructions.

Is there a workaround or patch that solves this?

Cheers
Goncalo

--
Goncalo Borges
Research Computing
ARC Centre of Excellence for Particle Physics at the Terascale
School of Physics A28 | University of Sydney, NSW  2006
T: +61 2 93511937

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] How long will the logs be kept?

2015-12-02 Thread Wukongming
Hi ,All
Does anyone know how long (how many days) the logs.gz files (mon/osd/mds)
are kept before being flushed?

-
wukongming ID: 12019
Tel:0571-86760239
Dept:2014 UIS2 OneStor

-
This e-mail and its attachments contain confidential information from H3C, 
which is
intended only for the person or entity whose address is listed above. Any use 
of the
information contained herein in any way (including, but not limited to, total 
or partial
disclosure, reproduction, or dissemination) by persons other than the intended
recipient(s) is prohibited. If you receive this e-mail in error, please notify 
the sender
by phone or email immediately and delete it!
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How long will the logs be kept?

2015-12-02 Thread huang jun
It will rotate every week by default; you can see the logrotate file at
/etc/ceph/logrotate.d/ceph.
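
The exact stanza shipped varies between releases, but it is plain logrotate
config; an illustrative example (bump 'rotate' to keep more history) looks like:

/var/log/ceph/*.log {
    weekly
    rotate 7
    compress
    missingok
    notifempty
}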

2015-12-03 12:37 GMT+08:00 Wukongming :
> Hi ,All
> Is there anyone who knows How long or how many days will the logs.gz 
> (mon/osd/mds)be kept, maybe before flushed?
>
> -
> wukongming ID: 12019
> Tel:0571-86760239
> Dept:2014 UIS2 OneStor
>
> -
> This e-mail and its attachments contain confidential information from H3C, 
> which is
> intended only for the person or entity whose address is listed above. Any use 
> of the
> information contained herein in any way (including, but not limited to, total 
> or partial
> disclosure, reproduction, or dissemination) by persons other than the intended
> recipient(s) is prohibited. If you receive this e-mail in error, please 
> notify the sender
> by phone or email immediately and delete it!



-- 
thanks
huangjun
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-disk activate Permission denied problems

2015-12-02 Thread Adrien Gillard
You should check that the owner of your ceph partitions (both journal and
data) is 'ceph', otherwise the ceph user won't be able to access them.

You can simply do : chown ceph:disk /dev/sdc3

If this solves your issue, you should set the GPT type GUID [1] of the partitions
with a tool like sgdisk to make this persistent across reboots.
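
For example, something along these lines tags partition 3 of /dev/sdc with the
journal type code that the ceph udev rules key on (double-check the GUID against
the list in [1] before applying):

sgdisk --typecode=3:45b0969e-9b03-4f30-b4c6-b4b80ceff106 /dev/sdc
partprobe /dev/sdc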

I think only your journal is affected as ceph-disk does not prepare the
partition (WARNING:ceph-disk:Journal /dev/sdc3 was not prepared with
ceph-disk. Symlinking directly)


[1] https://en.wikipedia.org/wiki/GUID_Partition_Table#Partition_type_GUIDs

On Thu, Dec 3, 2015 at 3:03 AM, Goncalo Borges  wrote:

> Dear Cephers...
>
> 1./ I am currently deploying infernalis 9.2.0 in centos7.
>
> 2./ I am not using ceph-deploy (I prefer not to use it) and I prefer to
> use the short form osd installation procedure described here:
>
> http://docs.ceph.com/docs/infernalis/install/manual-deployment/#short-form
>
> 3./ In hammer this instructions worked fine but in infernalis 9.2.0, which
> the implementation of the ceph user running the ceph daemons, those
> instructions do not seem to work properly anymore.
>
> 4./ Here is the behaviour:
>
> 4.1/ ceph-disk prepare runs fine:
>
> # ceph-disk prepare --cluster ceph --cluster-uuid
> a9431bc6-3ee1-4b0a-8d21-0ad883a4d2ed --fs-type xfs /dev/sdh /dev/sdc3
> WARNING:ceph-disk:OSD will not be hot-swappable if journal is not the same
> device as the osd data
> WARNING:ceph-disk:Journal /dev/sdc3 was not prepared with ceph-disk.
> Symlinking directly.
> Creating new GPT entries.
> The operation has completed successfully.
> meta-data=/dev/sdh1  isize=2048   agcount=4, agsize=183107519
> blks
>  =   sectsz=512   attr=2, projid32bit=1
>  =   crc=0finobt=0
> data =   bsize=4096   blocks=732430075, imaxpct=5
>  =   sunit=0  swidth=0 blks
> naming   =version 2  bsize=4096   ascii-ci=0 ftype=0
> log  =internal log   bsize=4096   blocks=357631, version=2
>  =   sectsz=512   sunit=0 blks, lazy-count=1
> realtime =none   extsz=4096   blocks=0, rtextents=0
> The operation has completed successfully.
>
> 4.2/ ceph-disk activate fails with permission problems. It seems some
> temporary directory us made by root which is then not accessible to ceph
> user.
>
> # ceph-disk activate /dev/sdh1
> got monmap epoch 1
> 2015-12-03 01:56:41.406742 7fb050fa9900 -1
> filestore(/var/lib/ceph/tmp/mnt.dsi7zI) mkjournal error creating journal on
> /var/lib/ceph/tmp/mnt.dsi7zI/journal: (13) Permission denied
> 2015-12-03 01:56:41.406777 7fb050fa9900 -1 OSD::mkfs: ObjectStore::mkfs
> failed with error -13
> 2015-12-03 01:56:41.406843 7fb050fa9900 -1  ** ERROR: error creating empty
> object store in /var/lib/ceph/tmp/mnt.dsi7zI: (13) Permission denied
> ERROR:ceph-disk:Failed to activate
> Traceback (most recent call last):
>   File "/usr/sbin/ceph-disk", line 3576, in 
> main(sys.argv[1:])
>   File "/usr/sbin/ceph-disk", line 3532, in main
> main_catch(args.func, args)
>   File "/usr/sbin/ceph-disk", line 3554, in main_catch
> func(args)
>   File "/usr/sbin/ceph-disk", line 2424, in main_activate
> dmcrypt_key_dir=args.dmcrypt_key_dir,
>   File "/usr/sbin/ceph-disk", line 2197, in mount_activate
> (osd_id, cluster) = activate(path, activate_key_template, init)
>   File "/usr/sbin/ceph-disk", line 2360, in activate
> keyring=keyring,
>   File "/usr/sbin/ceph-disk", line 1950, in mkfs
> '--setgroup', get_ceph_user(),
>   File "/usr/sbin/ceph-disk", line 349, in command_check_call
> return subprocess.check_call(arguments)
>   File "/usr/lib64/python2.7/subprocess.py", line 542, in check_call
> raise CalledProcessError(retcode, cmd)
> subprocess.CalledProcessError: Command '['/usr/bin/ceph-osd', '--cluster',
> 'ceph', '--mkfs', '--mkkey', '-i', '4', '--monmap',
> '/var/lib/ceph/tmp/mnt.dsi7zI/activate.monmap', '--osd-data',
> '/var/lib/ceph/tmp/mnt.dsi7zI', '--osd-journal',
> '/var/lib/ceph/tmp/mnt.dsi7zI/journal', '--osd-uuid',
> '995e436f-dc6f-42ca-bbbf-3ef9f0bb4952', '--keyring',
> '/var/lib/ceph/tmp/mnt.dsi7zI/keyring', '--setuser', 'ceph', '--setgroup',
> 'ceph']' returned non-zero exit status 1
>
> 4.3 If I run the the ceph activate command as a ceph user, it
> complains it need root permissions to mount
>
>
> # su ceph -s /bin/bash --session-command="ceph-disk activate /dev/sdh1"
> Problem opening /dev/sdh for reading! Error is 13.
> You must run this program as root or use sudo!
> mount: only root can use "--options" option
> ceph-disk: Mounting filesystem failed: Command '['/usr/bin/mount', '-t',
> 'xfs', '-o', 'noatime,inode64', '--', '/dev/sdh1',
> '/var/lib/ceph/tmp/mnt.0bUl5q']' returned non-zero exit status 1
>
> 5./ So there is an incoherence on this set of instructions.
>
> Is there a way to solve / patch that solves this?
>
> Cheers
> Go