Re: [ceph-users] Help needed ! cluster unstable after upgrade from Hammer to Jewel

2016-11-16 Thread Udo Lembke
Hi,


On 16.11.2016 19:01, Vincent Godin wrote:
> Hello,
>
> We now have a full cluster (Mon, OSD & Clients) in jewel 10.2.2
> (initial was hammer 0.94.5) but we have still some big problems on our
> production environment :
>
>   * some ceph filesystem are not mounted at startup and we have to
> mount them with the "/bin/sh -c 'flock /var/lock/ceph-disk
> /usr/sbin/ceph-disk --verbose --log-stdout trigger --syn /dev/vdX1'"
>
vdX1?? This sounds like you are running Ceph inside a virtualized system?

Udo
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] how possible is that ceph cluster crash

2016-11-16 Thread Goncalo Borges
Olá Pedro...

These are extremely generic questions and, therefore, hard to answer. Nick did 
a good job of defining the risks.

In our case, we have been running a Ceph/CephFS system in production for over a 
year, and before that we spent about a year getting to understand Ceph.

Ceph is incredibly good at dealing with hardware failures, so it is a 
powerful tool if you are using commodity hardware. If your disks fail, or even 
if a fraction of your hosts fail, it is able to cope and recover properly 
(up to a point) provided you have proper crush rules in place (the default 
ones do a good job of that) and free space available. To be on the safe side:
- decouple mons from OSD servers
- check the RAM requirements for your OSD servers (they depend on the number of 
OSDs in each server)
- have at least 3 mons in a production system
- use 3x replication
There is a good page on hardware requirements in the Ceph docs.
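
A minimal sketch of how to check those points (assuming an admin node with a
client.admin keyring; the pool name is a placeholder):

# ceph mon stat                      # should show at least 3 mons in quorum
# ceph osd pool get <pool> size      # replica count, 3 is the safe choice
# ceph osd pool get <pool> min_size  # 2 is a sane value with size 3
# ceph osd pool set <pool> size 3    # only if you need to raise it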

However, the devil is in the details. Ceph is a complex system still under 
constant development. Wrong configurations can lead to performance problems. 
If your network is not reliable, that can lead to flapping OSDs, which in 
turn can lead to problems in your PGs. When your OSDs start to become full 
(a single full OSD freezes all I/O to the cluster), many problems may start to 
appear. Finally, there are bugs. Their number is not huge, and there is a real 
effort from the developers and from the community to address them in a 
fast and reliable way. However, it is sometimes difficult to diagnose what 
is wrong because of the many layers involved. It is not infrequent 
that we have to go and look at the source code to figure out (when possible) 
what may be happening. So I would say that there is a learning curve that I 
and others are still going through.
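
As a concrete example of the "full OSD" point, this is roughly what we keep an
eye on (a sketch; assuming the default nearfull/full ratios of 0.85/0.95):

# ceph df            # per-pool and global usage
# ceph osd df        # per-OSD utilisation, spots unbalanced or nearly full OSDs
# ceph health detail # shows nearfull/full warnings before I/O freezes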

Abraço
Gonçalo






From: ceph-users [ceph-users-boun...@lists.ceph.com] on behalf of Pedro Benites 
[pbeni...@litholaser.com]
Sent: 17 November 2016 04:50
To: ceph-users@lists.ceph.com
Subject: [ceph-users] how possible is that ceph cluster crash

Hi,

I have a Ceph cluster with 50 TB across 15 OSDs. It has been working fine for
one year and I would like to grow it and migrate all my old storage,
about 100 TB, to Ceph, but I have a doubt. How likely is it that the
cluster fails and everything goes very wrong? How reliable is Ceph? What is
the risk of losing my data? Is it necessary to back up my data?

Regards.
Pedro.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph cluster having blocke requests very frequently

2016-11-16 Thread Nick Fisk
Hi,

 

Yes, I can’t think of anything else at this stage. Could you maybe repost some 
dump_historic_ops output now that you have turned off snapshots? I wonder if 
it might reveal anything.
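
Something like this, run on the OSDs that report blocked ops (a sketch, the osd
id is a placeholder):

# ceph daemon osd.<id> dump_historic_ops   # recent slow ops with per-event timings
# ceph daemon osd.<id> dump_ops_in_flight  # ops currently blocked

The per-event timestamps (e.g. "waiting for subops from ...") usually point at
which OSD in the set is the slow one.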

 

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Thomas 
Danan
Sent: 16 November 2016 17:38
To: n...@fisk.me.uk; 'Peter Maloney' 
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] ceph cluster having blocke requests very frequently

 

Hi Nick,

 

We have deleted all Snapshots and observed the system for several hours.

From what I see this did not help to reduce the blocked ops and IO freezes on 
the Ceph client side.

 

We have also tried to increase a little bit the PGs (by 8 than 128) because 
this is something we should do and we wanted to see how the cluster was 
behaving.

During recovery, the number of blocked ops and associated duration increased 
significantly. Also the number of impacted OSDs was much more important. 

 

Don’t really know what to conclude from all of this …

 

Again we have checked Disk / network / and everything seems fine …

 

Thomas

From: Nick Fisk [mailto:n...@fisk.me.uk] 
Sent: mercredi 16 novembre 2016 14:01
To: Thomas Danan; 'Peter Maloney'
Cc: ceph-users@lists.ceph.com
Subject: RE: [ceph-users] ceph cluster having blocke requests very frequently

 

The snapshot works by using Copy On Write. If you dirty even a 4kb section of a 
4MB object in the primary RBD, that entire 4MB object then needs to be read and 
then written into the snapshot RBD.

 

From: Thomas Danan [mailto:thomas.da...@mycom-osi.com] 
Sent: 16 November 2016 12:58
To: Thomas Danan; n...@fisk.me.uk; 'Peter Maloney'
Cc: ceph-users@lists.ceph.com  
Subject: RE: [ceph-users] ceph cluster having blocke requests very frequently

 

Hi Nick,

 

Actually I was wondering: is there any difference between a snapshot and a simple 
RBD image?

With a simple RBD image, when doing a random IO we are asking the Ceph cluster to 
update one or several 4MB objects, no?

So snapshotting is multiplying the load by 2 but not more, am I wrong?

 

Thomas

 

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Thomas 
Danan
Sent: mercredi 16 novembre 2016 13:52
To: n...@fisk.me.uk  ; 'Peter Maloney'
Cc: ceph-users@lists.ceph.com  
Subject: Re: [ceph-users] ceph cluster having blocke requests very frequently

 

Hi Nick,

 

Yes our application is doing small Random IO and I did not realize that the 
snapshotting feature could so much degrade performances in that case.

 

We have just deactivated it and deleted all snapshots. Will notify you if it 
drastically reduce the blocked ops and consequently the IO freeze on client 
side.

 

Thanks

 

Thomas

 

From: Nick Fisk [mailto:n...@fisk.me.uk] 
Sent: mercredi 16 novembre 2016 13:25
To: Thomas Danan; 'Peter Maloney'
Cc: ceph-users@lists.ceph.com  
Subject: RE: [ceph-users] ceph cluster having blocke requests very frequently

 

 

 

From: ceph-users [  
mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Thomas Danan
Sent: 15 November 2016 21:14
To: Peter Maloney <  
peter.malo...@brockmann-consult.de>
Cc:   ceph-users@lists.ceph.com
Subject: Re: [ceph-users] ceph cluster having blocke requests very frequently

 

Very interesting ...

 

Any idea why optimal tunable would help here ?  on our cluster we have 500TB of 
data, I am a bit concerned about changing it without taking lot of precautions 
. ...

I am curious to know how much time it takes you to change tunable, size of your 
cluster and observed impacts on client IO ...

 

Yes, we do have daily RBD snapshots from 16 different Ceph RBD clients. 
Snapshotting the RBD image is quite immediate, while we are seeing the issue 
continuously during the day...

 

Just to point out that when you take a snapshot any writes to the original RBD 
will mean that the full 4MB object is copied into the snapshot. If you have a 
lot of small random IO going on the original RBD this can lead to massive write 
amplification across the cluster and may cause issues such as what you describe.

 

Also be aware that deleting large snapshots also puts significant strain on the 
OSD’s as they try and delete hundreds of thousands of objects.

 

 

Will check all of this tomorrow . ..

 

Thanks again

 

Thomas

 

 

 

Sent from my Samsung device



 Original message 
From: Peter Maloney <  
peter.malo...@brockmann-consult.de> 
Date: 11/15/16 21:27 (GMT+01:00) 
To: Thomas Danan < 

Re: [ceph-users] how possible is that ceph cluster crash

2016-11-16 Thread Nick Fisk


> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
> Pedro Benites
> Sent: 16 November 2016 17:51
> To: ceph-users@lists.ceph.com
> Subject: [ceph-users] how possible is that ceph cluster crash
> 
> Hi,
> 
> I have a ceph cluster with 50 TB, with 15 osds, it is working fine for one 
> year and I would like to grow it and migrate all my old
storage,
> about 100 TB to ceph, but I have a doubt. How possible is that the cluster 
> fail and everything went very bad? 

Everything is possible, I think there are 3 main risks

1) Hardware failure
I would say Ceph is probably one of the safest options in regards to hardware 
failures, certainly if you start using 4TB+ disks.

2) Config Errors
This is an easy one to think you are safe from, but I would say most outages 
and data loss incidents I have seen on the mailing lists have been due to poor 
hardware choices or configuration options such as size=2, min_size=1, or 
enabling things like nobarrier.
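
(A quick way to audit a running cluster for these, as a rough sketch:)

# ceph osd dump | grep 'replicated size'  # look for size 2 / min_size 1 pools
# grep nobarrier /proc/mounts             # barriers disabled on any OSD mount?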

3) Ceph Bugs
Probably the rarest, but potentially the scariest as you have less control. 
They do happen and it's something to be aware of.

> How reliable is ceph?
> What is the risk of losing my data? Is it necessary to back up my data?

Yes, always back up your data, no matter what solution you use. Just as RAID != 
backup, Ceph isn't a backup either.

> 
> Regards.
> Pedro.
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] how to list deleted objects in snapshot

2016-11-16 Thread Gregory Farnum
On Wed, Nov 16, 2016 at 5:13 AM, Jan Krcmar  wrote:
> hi,
>
> i've got found problem/feature in pool snapshots
>
> when i delete some object from pool which was previously snapshotted,
> i cannot list the object name in the snapshot anymore.
>
> steps to reproduce
>
> # ceph -v
> ceph version 10.2.3 (ecc23778eb545d8dd55e2e4735b53cc93f92e65b)
> # rados -p test ls
> stats
> # rados -p test mksnap now
> # rados -p test -s now ls
> selected snap 3 'now'
> stats
> # rados -p test rm stats
> # rados -p test -s now ls
> selected snap 3 'now'
> # rados -p test -s now stat stats
> selected snap 3 'now'
> test/stats mtime 2016-11-16 14:07:14.00, size 329
> # rados -p test stat stats
>  error stat-ing test/stats: (2) No such file or directory
>
> is this rados feature or bug?

The rados tool does not apply the pool snapshot "SnapContext" when
doing listings. I *think* if it did, you would get the listing you
desire, but I'm not certain and it might be much more complicated. (If
it's just about using the correct SnapContext, it would be a pretty
small patch!)
It does apply the correct SnapContext on many other operations, though; 
did you try specifying "-s now" when doing the stat command?
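
For what it's worth, a couple of related commands that I think do look at the
snapshot state, reusing your pool/object names:

# rados -p test lssnap           # list the pool's snapshots
# rados -p test listsnaps stats  # show which snaps/clones exist for the object
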
-Greg

>
> thanks
> jan
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Help needed ! cluster unstable after upgrade from Hammer to Jewel

2016-11-16 Thread Nick Fisk
 

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
Vincent Godin
Sent: 16 November 2016 18:02
To: ceph-users 
Subject: [ceph-users] Help needed ! cluster unstable after upgrade from Hammer 
to Jewel

 

Hello,

We now have a full cluster (Mon, OSD & Clients) in jewel 10.2.2 (initial was 
hammer 0.94.5) but we have still some big problems on our production 
environment :

*   some ceph filesystem are not mounted at startup and we have to mount 
them with the "/bin/sh -c 'flock /var/lock/ceph-disk /usr/sbin/ceph-disk 
--verbose --log-stdout trigger --syn /dev/vdX1'"
*   some OSD start but are in timeout as soon as they start for a pretty 
long time (more than 5 mn)

*   016-11-15 01:46:26.625945 7f79db91e800  0 osd.32 191438 done with init, 
starting boot process
2016-11-15 01:47:28.344996 7f79d61f7700  1 heartbeat_map is_healthy 
'FileStore::op_tp thread 0x7f79c5c91700' had timed out after 60
2016-11-15 01:47:33.345098 7f79d61f7700  1 heartbeat_map is_healthy 
'FileStore::op_tp thread 0x7f79c5c91700' had timed out after 60
...

*   these OSD take very long time to stop

*   we just loosed one OSD and the cluster is unable to stabilize and some 
OSDs go Up and Down. The cluster is in ERR state and can not serve production 
environment

*   we are in jewel 10.2.2 on CentOS 7.2 kernel 3.10.0-327.36.3.el7.x86_64

Help will be apreciate !

Vincent

Can you see anything that might indicate why the OSDs are taking a long time 
to start up? I.e. any errors in the kernel log, or do the disks look like they 
are working very hard when the OSD tries to start?

Also a quick google of “heartbeat_map is_healthy 'FileStore::op_tp thread” 
brings up several past threads, it might be worth seeing if any of them had a 
solution.
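
For example (just a sketch, using osd.32 and the default log path from your
excerpt):

# dmesg | tail -n 50                    # disk/controller errors around OSD start
# iostat -x 1                           # are the OSD disks pegged while booting?
# ceph daemon osd.32 dump_historic_ops  # what the OSD was stuck on once it is up
# grep -i 'timed out' /var/log/ceph/ceph-osd.32.log | tail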

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Can we drop ubuntu 14.04 (trusty) for kraken and lumninous?

2016-11-16 Thread Sage Weil
On Wed, 16 Nov 2016, Yuri Weinstein wrote:
> Sage,
> 
> We had discussed xenial support in sepia today and right now jobs
> asking for it from smithi and mira nodes will fail because there are
> no bare-metals provisioned for it.
> 
> The question is - how do we split nodes between 14.04, 16.04 and centos ?

Yeah.  I thought we'd already split the bare metal boxes between trusty 
and xenial.. sorry!

s

> 
> Thx
> YuriW
> 
> On Mon, Nov 14, 2016 at 6:24 AM, Sage Weil  wrote:
> > On Fri, 11 Nov 2016, Sage Weil wrote:
> >> Currently the distros we use for upstream testing are
> >>
> >>  centos 7.x
> >>  ubuntu 16.04 (xenial)
> >>  ubuntu 14.04 (trusty)
> >>
> >> We also do some basic testing for Debian 8 and Fedora (some old version).
> >>
> >> Jewel was the first release that had native systemd and full xenial
> >> support, so it's helpful to have both 14.04 and 16.04 supported to provide
> >> an upgrade path.  But I think we can safely drop 14.04 now for kraken and
> >> luminous.  Our options are
> >>
> >> 1) keep testing on xenial and trusty, and keep building packages for both
> >
> > Sounds like we'll keep trusty around for a while.  Thanks, everyone!
> >
> > sage
> > --
> > To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> > the body of a message to majord...@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Can we drop ubuntu 14.04 (trusty) for kraken and lumninous?

2016-11-16 Thread Yuri Weinstein
Sage,

We had discussed xenial support in sepia today and right now jobs
asking for it from smithi and mira nodes will fail because there are
no bare-metals provisioned for it.

The question is - how do we split nodes between 14.04, 16.04 and centos ?

Thx
YuriW

On Mon, Nov 14, 2016 at 6:24 AM, Sage Weil  wrote:
> On Fri, 11 Nov 2016, Sage Weil wrote:
>> Currently the distros we use for upstream testing are
>>
>>  centos 7.x
>>  ubuntu 16.04 (xenial)
>>  ubuntu 14.04 (trusty)
>>
>> We also do some basic testing for Debian 8 and Fedora (some old version).
>>
>> Jewel was the first release that had native systemd and full xenial
>> support, so it's helpful to have both 14.04 and 16.04 supported to provide
>> an upgrade path.  But I think we can safely drop 14.04 now for kraken and
>> luminous.  Our options are
>>
>> 1) keep testing on xenial and trusty, and keep building packages for both
>
> Sounds like we'll keep trusty around for a while.  Thanks, everyone!
>
> sage
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Help needed ! cluster unstable after upgrade from Hammer to Jewel

2016-11-16 Thread Vincent Godin
Hello,

We now have a full cluster (Mon, OSD & Clients) on jewel 10.2.2 (initial
was hammer 0.94.5) but we still have some big problems on our production
environment :

   - some Ceph filesystems are not mounted at startup and we have to mount
   them with "/bin/sh -c 'flock /var/lock/ceph-disk /usr/sbin/ceph-disk
   --verbose --log-stdout trigger --syn /dev/vdX1'"

   - some OSDs start but stay in timeout for a pretty long time (more than
   5 min) as soon as they start
  - 2016-11-15 01:46:26.625945 7f79db91e800  0 osd.32 191438 done with
  init, starting boot process
  2016-11-15 01:47:28.344996 7f79d61f7700  1 heartbeat_map is_healthy
  'FileStore::op_tp thread 0x7f79c5c91700' had timed out after 60
  2016-11-15 01:47:33.345098 7f79d61f7700  1 heartbeat_map is_healthy
  'FileStore::op_tp thread 0x7f79c5c91700' had timed out after 60
  ...

  - these OSDs take a very long time to stop


   - we just lost one OSD and the cluster is unable to stabilize; some
   OSDs go up and down. The cluster is in ERR state and cannot serve the
   production environment


   - we are on jewel 10.2.2 on CentOS 7.2 kernel 3.10.0-327.36.3.el7.x86_64

Help will be appreciated!

Vincent
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] how possible is that ceph cluster crash

2016-11-16 Thread Pedro Benites

Hi,

I have a Ceph cluster with 50 TB across 15 OSDs. It has been working fine for 
one year and I would like to grow it and migrate all my old storage, 
about 100 TB, to Ceph, but I have a doubt. How likely is it that the 
cluster fails and everything goes very wrong? How reliable is Ceph? What is 
the risk of losing my data? Is it necessary to back up my data?


Regards.
Pedro.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph cluster having blocke requests very frequently

2016-11-16 Thread Thomas Danan
Hi Nick,

We have deleted all Snapshots and observed the system for several hours.
From what I see this did not help to reduce the blocked ops and IO freezes on 
the Ceph client side.

We have also tried to increase the PG count a little (by 8, then 128) because 
this is something we need to do anyway and we wanted to see how the cluster 
behaved.
During recovery, the number of blocked ops and their duration increased 
significantly. Also, the number of impacted OSDs was much higher.

Don’t really know what to conclude from all of this …

Again we have checked disk / network / and everything seems fine …

Thomas
From: Nick Fisk [mailto:n...@fisk.me.uk]
Sent: mercredi 16 novembre 2016 14:01
To: Thomas Danan; 'Peter Maloney'
Cc: ceph-users@lists.ceph.com
Subject: RE: [ceph-users] ceph cluster having blocke requests very frequently

The snapshot works by using Copy On Write. If you dirty even a 4kb section of a 
4MB object in the primary RBD, that entire 4MB object then needs to be read and 
then written into the snapshot RBD.

From: Thomas Danan [mailto:thomas.da...@mycom-osi.com]
Sent: 16 November 2016 12:58
To: Thomas Danan; n...@fisk.me.uk; 'Peter Maloney'
Cc: ceph-users@lists.ceph.com
Subject: RE: [ceph-users] ceph cluster having blocke requests very frequently

Hi Nick,

Actually I was wondering, is there any difference between Snapshot or simple 
RBD image ?
With simple RBD image when doing a random IO, we are asking Ceph cluster to 
update one or several 4MB objects no ?
So Snapshotting is multiplying the load by 2 but not more, Am I wrong ?

Thomas

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Thomas 
Danan
Sent: mercredi 16 novembre 2016 13:52
To: n...@fisk.me.uk; 'Peter Maloney'
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] ceph cluster having blocke requests very frequently

Hi Nick,

Yes our application is doing small Random IO and I did not realize that the 
snapshotting feature could so much degrade performances in that case.

We have just deactivated it and deleted all snapshots. Will notify you if it 
drastically reduce the blocked ops and consequently the IO freeze on client 
side.

Thanks

Thomas

From: Nick Fisk [mailto:n...@fisk.me.uk]
Sent: mercredi 16 novembre 2016 13:25
To: Thomas Danan; 'Peter Maloney'
Cc: ceph-users@lists.ceph.com
Subject: RE: [ceph-users] ceph cluster having blocke requests very frequently



From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Thomas 
Danan
Sent: 15 November 2016 21:14
To: Peter Maloney 
>
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] ceph cluster having blocke requests very frequently

Very interesting ...

Any idea why optimal tunable would help here ?  on our cluster we have 500TB of 
data, I am a bit concerned about changing it without taking lot of precautions 
. ...
I am curious to know how much time it takes you to change tunable, size of your 
cluster and observed impacts on client IO ...

Yes We do have daily rbd snapshot from 16 different ceph RBD clients. 
Snapshoting the RBD image is quite immediate while we are seing the issue 
continuously during the day...

Just to point out that when you take a snapshot any writes to the original RBD 
will mean that the full 4MB object is copied into the snapshot. If you have a 
lot of small random IO going on the original RBD this can lead to massive write 
amplification across the cluster and may cause issues such as what you describe.

Also be aware that deleting large snapshots also puts significant strain on the 
OSD’s as they try and delete hundreds of thousands of objects.


Will check all of this tomorrow . ..

Thanks again

Thomas



Sent from my Samsung device


 Original message 
From: Peter Maloney
Date: 11/15/16 21:27 (GMT+01:00)
To: Thomas Danan
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] ceph cluster having blocke requests very frequently
On 11/15/16 14:05, Thomas Danan wrote:
> Hi Peter,
>
> Ceph cluster version is 0.94.5 and we are running with Firefly tunables and 
> also we have 10KPGs instead of the 30K / 40K we should have.
> The linux kernel version is 3.10.0-327.36.1.el7.x86_64 with RHEL 7.2
>
> On our side we havethe following settings:
> mon_osd_adjust_heartbeat_grace = false
> mon_osd_adjust_down_out_interval = false
> mon_osd_min_down_reporters = 5
> mon_osd_min_down_reports = 10
>
> explaining why 

Re: [ceph-users] nfs-ganesha and rados gateway, Cannot find supported RGW runtime. Disabling RGW fsal build

2016-11-16 Thread Ken Dreyer
On Fri, Nov 4, 2016 at 2:14 AM, 于 姜  wrote:
> ceph version 10.2.3
> ubuntu 14.04 server
> nfs-ganesha 2.4.1
> ntirpc 1.4.3
>
> cmake -DUSE_FSAL_RGW=ON ../src/
>
> -- Found rgw libraries: /usr/lib
> -- Could NOT find RGW: Found unsuitable version ".", but required is at
> least "1.1" (found /usr)
> CMake Warning at CMakeLists.txt:571 (message):
> Cannot find supported RGW runtime. Disabling RGW fsal build
>
> Hello everyone, will nfs-ganesha build against the ceph 10.2.3 version?

Unfortunately nfs-ganesha 2.4 will not build with vanilla Ceph
v10.2.3. You probably need some or all of the patches here:
https://github.com/ceph/ceph/pull/11335 (or more?)

I think this is fixed in Ceph v10.2.4.

- Ken
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] cephfs mds failing to respond to capability release

2016-11-16 Thread Webert de Souza Lima
Hello John.

I'm sorry for the lack of information at the first post.
The same version is in use for servers and clients.

About the workload, it varies.
On one server it's about *5 files created/written and then fully read per
second*.
On the other server it's about *5 to 6 times that number*, so a lot more,
but the problem does not escalate at the same proportion.

*~# ceph -v*
ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)

*~#dpkg -l | grep ceph*
ii  ceph-fuse10.2.2-1trusty
amd64FUSE-based client for the Ceph distributed file system

Some things are worth mentioning:
The service(1) that creates the file sends an async request to another
service(2) that reads it.
The service (1) that creates the file also deletes it when its client closes
the connection, so it may do so while the other service (2) is still trying to
read it. I'm not sure what would happen in that case.



On Wed, Nov 16, 2016 at 1:42 PM John Spray  wrote:

> On Wed, Nov 16, 2016 at 3:15 PM, Webert de Souza Lima
>  wrote:
> > hi,
> >
> > I have many clusters running cephfs, and in the last 45 days or so, 2 of
> > them started giving me the following message in ceph health:
> > mds0: Client dc1-mx02-fe02:guest failing to respond to capability release
> >
> > When this happens, cephfs stops responding. It will only get back after I
> > restart the failing mds.
> >
> > Algo, I get the following logs from ceph.log
> > https://paste.debian.net/896236/
> >
> > There was no change made that I can relate to this and I can't figure out
> > what is happening.
>
> I have the usual questions: what ceph versions, what clients etc
> (http://docs.ceph.com/docs/jewel/cephfs/early-adopters/#reporting-issues)
>
> Clients failing to respond to capability release are either buggy (old
> kernels?) or it's also possible that you have a workload that is
> holding an excessive number of files open.
>
> Cheers,
> John
>
>
>
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] cephfs mds failing to respond to capability release

2016-11-16 Thread Webert de Souza Lima
I'm sorry, by server, I meant cluster.
On one cluster the rate of files created and read is about 5 per second.
On another cluster it's from 25 to 30 files created and read per second.

On Wed, Nov 16, 2016 at 2:03 PM Webert de Souza Lima 
wrote:

> Hello John.
>
> I'm sorry for the lack of information at the first post.
> The same version is in use for servers and clients.
>
> About the workload, it varies.
> On one server it's about *5 files created/written and then fully read per
> second*.
> On the other server it's about *5 to 6 times that number*, so a lot more,
> but the problem does not escalate at the same proportion.
>
> *~# ceph -v*
> ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
>
> *~#dpkg -l | grep ceph*
> ii  ceph-fuse10.2.2-1trusty
> amd64FUSE-based client for the Ceph distributed file system
>
> Some things are worth mentioning:
> The service(1) that creates the file sends an async request to another
> service(2) that reads it.
> The service(1) that creates the file also deletes it when its client
> closes the connection, so it can do so while the other service(2) is
> trying to read it. i'm not sure what would happen here.
>
>
>
> On Wed, Nov 16, 2016 at 1:42 PM John Spray  wrote:
>
> On Wed, Nov 16, 2016 at 3:15 PM, Webert de Souza Lima
>  wrote:
> > hi,
> >
> > I have many clusters running cephfs, and in the last 45 days or so, 2 of
> > them started giving me the following message in ceph health:
> > mds0: Client dc1-mx02-fe02:guest failing to respond to capability release
> >
> > When this happens, cephfs stops responding. It will only get back after I
> > restart the failing mds.
> >
> > Algo, I get the following logs from ceph.log
> > https://paste.debian.net/896236/
> >
> > There was no change made that I can relate to this and I can't figure out
> > what is happening.
>
> I have the usual questions: what ceph versions, what clients etc
> (http://docs.ceph.com/docs/jewel/cephfs/early-adopters/#reporting-issues)
>
> Clients failing to respond to capability release are either buggy (old
> kernels?) or it's also possible that you have a workload that is
> holding an excessive number of files open.
>
> Cheers,
> John
>
>
>
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] cephfs mds failing to respond to capability release

2016-11-16 Thread John Spray
On Wed, Nov 16, 2016 at 3:15 PM, Webert de Souza Lima
 wrote:
> hi,
>
> I have many clusters running cephfs, and in the last 45 days or so, 2 of
> them started giving me the following message in ceph health:
> mds0: Client dc1-mx02-fe02:guest failing to respond to capability release
>
> When this happens, cephfs stops responding. It will only get back after I
> restart the failing mds.
>
> Algo, I get the following logs from ceph.log
> https://paste.debian.net/896236/
>
> There was no change made that I can relate to this and I can't figure out
> what is happening.

I have the usual questions: what ceph versions, what clients etc
(http://docs.ceph.com/docs/jewel/cephfs/early-adopters/#reporting-issues)

Clients failing to respond to capability release are either buggy (old
kernels?) or it's also possible that you have a workload that is
holding an excessive number of files open.
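
If you want to see who is actually holding caps, something like this against
the active MDS may help (a sketch; the mds name is a placeholder and field
names differ a bit between versions):

# ceph daemon mds.<name> session ls   # per-client sessions, look at num_caps
# ceph daemon mds.<name> perf dump    # cache and caps counters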

Cheers,
John



> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] cephfs mds failing to respond to capability release

2016-11-16 Thread Webert de Souza Lima
hi,

I have many clusters running cephfs, and in the last 45 days or so, 2 of
them started giving me the following message in *ceph health:*
*mds0: Client dc1-mx02-fe02:guest failing to respond to capability release*

When this happens, cephfs stops responding. It will only get back
after I *restart
the failing mds*.

Also, I get the following logs from *ceph.log*
https://paste.debian.net/896236/

There was no change made that I can relate to this and I can't figure out
what is happening.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Antw: Re: hammer on xenial

2016-11-16 Thread Steffen Weißgerber
Hello,

now I did an

'apt-get autoremove ceph-common',

 reinstalled hammer

 'apt-get install --reinstall ceph=0.94.9-1xenial'

re-added the systemctl unit files from backup after changing the
ExecStart entry to a command string without the --setuser and --setgroup
options that the hammer daemon binaries do not have,

enabled the unit files for ceph-mon.service and ceph-osd@.service

systemctl enable ...

and started the mon successfully. While the mon is up there's a clock
skew detected for it that can't be removed with a mon restart.

Wondering about this because ntpd is running with the same config
as before the upgrade, with iburst entries for the local time server and
the rest of the cluster nodes.
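
For reference, the only edit to the jewel unit file was dropping the two
options the hammer binaries don't know; a sketch of the ceph-osd@.service
ExecStart line (assuming the stock jewel file as the starting point):

ExecStart=/usr/bin/ceph-osd -f --cluster ${CLUSTER} --id %i --setuser ceph --setgroup ceph
ExecStart=/usr/bin/ceph-osd -f --cluster ${CLUSTER} --id %i

(first line: jewel original, second line: hammer-compatible)

For the clock skew I'm checking with:

~# ntpq -p                         # peer offsets on the mon host
~# ceph health detail | grep skew  # which mon the skew is reported against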


Regards

Steffen


>>> Robert Sander  16.11.2016 10:23 >>>
On 16.11.2016 09:05, Steffen Weißgerber wrote:
> Hello,
> 
> we started upgrading ubuntu on our ceph nodes to Xenial and had to
see that during
> the upgrade ceph automatically was upgraded from hammer to jewel
also.
> 
> Because we don't want to upgrade ceph and the OS at the same time we
deinstalled
> the ceph jewel components reactivated
/etc/apt/sources.list.d/ceph.list with
> 
> deb http://ceph.com/debian-hammer/ xenial main
> 
> and pinned the ceph relaese to install in
/etc/apt/preferences/ceph.pref

After that process you may still have the Ubuntu trusty packages for
Ceph Hammer installed.

Do an "apt-get install --reinstall ceph.*" on your node after the
Upgrade. This should pull the Ubuntu xenial packages and install them.

Regards
-- 
Robert Sander
Heinlein Support GmbH
Schwedter Str. 8/9b, 10119 Berlin

http://www.heinlein-support.de 

Tel: 030 / 405051-43
Fax: 030 / 405051-19

Zwangsangaben lt. *35a GmbHG:
HRB 93818 B / Amtsgericht Berlin-Charlottenburg,
Geschäftsführer: Peer Heinlein -- Sitz: Berlin


-- 
Klinik-Service Neubrandenburg GmbH
Allendestr. 30, 17036 Neubrandenburg
Amtsgericht Neubrandenburg, HRB 2457
Geschaeftsfuehrerin: Gudrun Kappich
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] how to list deleted objects in snapshot

2016-11-16 Thread Jan Krcmar
hi,

I've found a problem/feature in pool snapshots:

when I delete an object from a pool which was previously snapshotted,
I cannot list the object name in the snapshot anymore.

steps to reproduce

# ceph -v
ceph version 10.2.3 (ecc23778eb545d8dd55e2e4735b53cc93f92e65b)
# rados -p test ls
stats
# rados -p test mksnap now
# rados -p test -s now ls
selected snap 3 'now'
stats
# rados -p test rm stats
# rados -p test -s now ls
selected snap 3 'now'
# rados -p test -s now stat stats
selected snap 3 'now'
test/stats mtime 2016-11-16 14:07:14.00, size 329
# rados -p test stat stats
 error stat-ing test/stats: (2) No such file or directory

is this a rados feature or a bug?

thanks
jan
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Antw: Re: hammer on xenial

2016-11-16 Thread Steffen Weißgerber



>>> Robert Sander  schrieb am Mittwoch,
16. November
2016 um 10:23:
> On 16.11.2016 09:05, Steffen Weißgerber wrote:
>> Hello,
>> 

Hello,

>> we started upgrading ubuntu on our ceph nodes to Xenial and had to
see that 
> during
>> the upgrade ceph automatically was upgraded from hammer to jewel
also.
>> 
>> Because we don't want to upgrade ceph and the OS at the same time we

> deinstalled
>> the ceph jewel components reactivated
/etc/apt/sources.list.d/ceph.list with
>> 
>> deb http://ceph.com/debian-hammer/ xenial main
>> 
>> and pinned the ceph relaese to install in
/etc/apt/preferences/ceph.pref
> 
> After that process you may still have the Ubuntu trusty packages for
> Ceph Hammer installed.
> 

Now after I repeated the procedure the installed ceph version is

~# ceph -v
ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)

When enabling the source list entry for xenial hammer and doing an
apt-get update the right package is also available for install:

~# apt-cache madison ceph
  ceph | 10.2.2-0ubuntu0.16.04.2 | http://archive.ubuntu.com/ubuntu
xenial-updates/main amd64 Packages
  ceph | 10.1.2-0ubuntu1 | http://archive.ubuntu.com/ubuntu
xenial/main amd64 Packages
  ceph | 0.94.9-1xenial | http://ceph.com/debian-hammer xenial/main
amd64 Packages
  ceph | 10.1.2-0ubuntu1 | http://archive.ubuntu.com/ubuntu
xenial/main Sources
  ceph | 10.2.2-0ubuntu0.16.04.2 | http://archive.ubuntu.com/ubuntu
xenial-updates/main Sources

> Do an "apt-get install --reinstall ceph.*" on your node after the
> Upgrade. This should pull the Ubuntu xenial packages and install
them.
> 

but the reinstall does not downgrade the package, although it's pinned
to the old version:

~# apt-get install --reinstall ceph
 
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following packages were automatically installed and are no longer
required:
  libclass-accessor-perl libio-string-perl libsub-name-perl
libtimedate-perl
Use 'apt autoremove' to remove them.
0 upgraded, 0 newly installed, 1 reinstalled, 0 to remove and 0 not
upgraded.
Need to get 0 B/12.7 MB of archives.
After this operation, 0 B of additional disk space will be used.
N: Ignoring file '50unattended-upgrades.ucf-dist' in directory
'/etc/apt/apt.conf.d/' as it has an invalid filename extension.
(Reading database ... 71226 files and directories currently
installed.)
Preparing to unpack
.../ceph_10.2.2-0ubuntu0.16.04.2_amd64.deb ...
Unpacking ceph (10.2.2-0ubuntu0.16.04.2) over
(10.2.2-0ubuntu0.16.04.2) ...
Processing triggers for systemd (229-4ubuntu12) ...
Processing triggers for ureadahead (0.100.0-19) ...
Processing triggers for man-db (2.7.5-1) ...
Setting up ceph (10.2.2-0ubuntu0.16.04.2) ...
N: Ignoring file '50unattended-upgrades.ucf-dist' in directory
'/etc/apt/apt.conf.d/' as it has an invalid filename extension.

Now I find the systemd service files in /lib/systemd/system/

~# ls /lib/systemd/system/*ceph*
/lib/systemd/system/ceph-create-keys.service  
/lib/systemd/system/ceph-mon.service
/lib/systemd/system/ceph-create-keys@.service 
/lib/systemd/system/ceph-mon@.service
/lib/systemd/system/ceph-disk@.service
/lib/systemd/system/ceph-osd@.service
/lib/systemd/system/ceph-mds.service  
/lib/systemd/system/ceph.target
/lib/systemd/system/ceph-mds@.service

which I think will be removed when ceph is uninstalled.

An apt-get install --reinstall ceph=0.94.9-1xenial results in an
unresolved dependencies
error:

...
The following packages have unmet dependencies:
 ceph : Depends: ceph-common (>= 0.94.2-2) but it is not going to be
installed
...

With the options --allow-downgrades or --force-yes it's the same
behavior.

Hmmm.
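
(For what it's worth, a sketch of what I would try next; assuming the hammer
repo carries matching versions of the dependent packages, and the package list
below is probably not complete:)

/etc/apt/preferences.d/ceph.pref:
Package: ceph ceph-common librados2 librbd1 python-rados python-rbd
Pin: version 0.94.9-1xenial
Pin-Priority: 1001

~# apt-get update
~# apt-get install --allow-downgrades ceph=0.94.9-1xenial \
     ceph-common=0.94.9-1xenial librados2=0.94.9-1xenial librbd1=0.94.9-1xenial

A pin priority above 1000 is what lets apt downgrade, and giving ceph-common
and the libraries explicit versions in the same command should avoid the
unmet-dependency error above.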

Regards

Steffen

> Regards
> -- 
> Robert Sander
> Heinlein Support GmbH
> Schwedter Str. 8/9b, 10119 Berlin
> 
> http://www.heinlein-support.de 
> 
> Tel: 030 / 405051-43
> Fax: 030 / 405051-19
> 
> Zwangsangaben lt. *35a GmbHG:
> HRB 93818 B / Amtsgericht Berlin-Charlottenburg,
> Geschäftsführer: Peer Heinlein -- Sitz: Berlin
-- 
Klinik-Service Neubrandenburg GmbH
Allendestr. 30, 17036 Neubrandenburg
Amtsgericht Neubrandenburg, HRB 2457
Geschaeftsfuehrerin: Gudrun Kappich
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph cluster having blocke requests very frequently

2016-11-16 Thread Nick Fisk
The snapshot works by using Copy On Write. If you dirty even a 4kb section of a 
4MB object in the primary RBD, that entire 4MB object then needs to be read and 
then written into the snapshot RBD.
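
As a rough worked example (illustrative numbers; the image spec is a
placeholder, check your own with the command below):

# rbd info <pool>/<image> | grep order
        order 22 (4096 kB objects)

With 4MB objects, the first 4kb random write that touches an object after a
snapshot forces a read and re-write of the whole 4MB object, i.e. up to
~1000x amplification for that IO; later writes to the same object are cheap
again until the next snapshot is taken.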

 

From: Thomas Danan [mailto:thomas.da...@mycom-osi.com] 
Sent: 16 November 2016 12:58
To: Thomas Danan ; n...@fisk.me.uk; 'Peter Maloney' 

Cc: ceph-users@lists.ceph.com
Subject: RE: [ceph-users] ceph cluster having blocke requests very frequently

 

Hi Nick,

 

Actually I was wondering, is there any difference between Snapshot or simple 
RBD image ?

With simple RBD image when doing a random IO, we are asking Ceph cluster to 
update one or several 4MB objects no ?

So Snapshotting is multiplying the load by 2 but not more, Am I wrong ?

 

Thomas

 

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Thomas 
Danan
Sent: mercredi 16 novembre 2016 13:52
To: n...@fisk.me.uk; 'Peter Maloney'
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] ceph cluster having blocke requests very frequently

 

Hi Nick,

 

Yes our application is doing small Random IO and I did not realize that the 
snapshotting feature could so much degrade performances in that case.

 

We have just deactivated it and deleted all snapshots. Will notify you if it 
drastically reduce the blocked ops and consequently the IO freeze on client 
side.

 

Thanks

 

Thomas

 

From: Nick Fisk [mailto:n...@fisk.me.uk] 
Sent: mercredi 16 novembre 2016 13:25
To: Thomas Danan; 'Peter Maloney'
Cc: ceph-users@lists.ceph.com  
Subject: RE: [ceph-users] ceph cluster having blocke requests very frequently

 

 

 

From: ceph-users [  
mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Thomas Danan
Sent: 15 November 2016 21:14
To: Peter Maloney <  
peter.malo...@brockmann-consult.de>
Cc:   ceph-users@lists.ceph.com
Subject: Re: [ceph-users] ceph cluster having blocke requests very frequently

 

Very interesting ...

 

Any idea why optimal tunable would help here ?  on our cluster we have 500TB of 
data, I am a bit concerned about changing it without taking lot of precautions 
. ...

I am curious to know how much time it takes you to change tunable, size of your 
cluster and observed impacts on client IO ...

 

Yes We do have daily rbd snapshot from 16 different ceph RBD clients. 
Snapshoting the RBD image is quite immediate while we are seing the issue 
continuously during the day...

 

Just to point out that when you take a snapshot any writes to the original RBD 
will mean that the full 4MB object is copied into the snapshot. If you have a 
lot of small random IO going on the original RBD this can lead to massive write 
amplification across the cluster and may cause issues such as what you describe.

 

Also be aware that deleting large snapshots also puts significant strain on the 
OSD’s as they try and delete hundreds of thousands of objects.

 

 

Will check all of this tomorrow . ..

 

Thanks again

 

Thomas

 

 

 

Sent from my Samsung device



 Original message 
From: Peter Maloney <  
peter.malo...@brockmann-consult.de> 
Date: 11/15/16 21:27 (GMT+01:00) 
To: Thomas Danan <  
thomas.da...@mycom-osi.com> 
Cc:   ceph-users@lists.ceph.com 
Subject: Re: [ceph-users] ceph cluster having blocke requests very frequently 

On 11/15/16 14:05, Thomas Danan wrote:
> Hi Peter,
>
> Ceph cluster version is 0.94.5 and we are running with Firefly tunables and 
> also we have 10KPGs instead of the 30K / 40K we should have.
> The linux kernel version is 3.10.0-327.36.1.el7.x86_64 with RHEL 7.2
>
> On our side we havethe following settings:
> mon_osd_adjust_heartbeat_grace = false
> mon_osd_adjust_down_out_interval = false
> mon_osd_min_down_reporters = 5
> mon_osd_min_down_reports = 10
>
> explaining why the OSDs are not flapping but still they are behaving wrongly 
> and generate the slow requests I am describing.
>
> The osd_op_complaint_time is with the default value (30 sec), not sure I want 
> to change it base on your experience
I wasn't saying you should set the complaint time to 5, just saying
that's why I have complaints logged with such low block times.
> Thomas

And now I'm testing this:
osd recovery sleep = 0.5
osd snap trim sleep = 0.5

(or fiddling with it as low as 0.1 to make it rebalance faster)

While also changing tunables to optimal (which will rebalance 75% of the
objects)
Which has very good results so far (a few <14s blocks right at the
start, and none since, over an hour ago).

And I'm somehow hoping that will fix my rbd export-diff issue too... but
it at least appears to fix the rebalance causing blocks.

Do you use rbd snapshots? I think that 

Re: [ceph-users] ceph cluster having blocke requests very frequently

2016-11-16 Thread Thomas Danan
Hi Nick,

Yes, our application is doing small random IO and I did not realize that the 
snapshotting feature could degrade performance so much in that case.

We have just deactivated it and deleted all snapshots. Will let you know if it 
drastically reduces the blocked ops and, consequently, the IO freezes on the 
client side.

Thanks

Thomas

From: Nick Fisk [mailto:n...@fisk.me.uk]
Sent: mercredi 16 novembre 2016 13:25
To: Thomas Danan; 'Peter Maloney'
Cc: ceph-users@lists.ceph.com
Subject: RE: [ceph-users] ceph cluster having blocke requests very frequently



From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Thomas 
Danan
Sent: 15 November 2016 21:14
To: Peter Maloney 
>
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] ceph cluster having blocke requests very frequently

Very interesting ...

Any idea why optimal tunable would help here ?  on our cluster we have 500TB of 
data, I am a bit concerned about changing it without taking lot of precautions 
. ...
I am curious to know how much time it takes you to change tunable, size of your 
cluster and observed impacts on client IO ...
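
(For reference, a minimal sketch of the commands involved; with 500TB of data
I would throttle recovery first and test outside production:)

# ceph osd crush show-tunables    # current profile
# ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1'
# ceph osd crush tunables optimal # this is what triggers the large rebalance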

Yes We do have daily rbd snapshot from 16 different ceph RBD clients. 
Snapshoting the RBD image is quite immediate while we are seing the issue 
continuously during the day...

Just to point out that when you take a snapshot any writes to the original RBD 
will mean that the full 4MB object is copied into the snapshot. If you have a 
lot of small random IO going on the original RBD this can lead to massive write 
amplification across the cluster and may cause issues such as what you describe.

Also be aware that deleting large snapshots also puts significant strain on the 
OSD's as they try and delete hundreds of thousands of objects.


Will check all of this tomorrow . ..

Thanks again

Thomas



Sent from my Samsung device


 Original message 
From: Peter Maloney 
>
Date: 11/15/16 21:27 (GMT+01:00)
To: Thomas Danan >
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] ceph cluster having blocke requests very frequently
On 11/15/16 14:05, Thomas Danan wrote:
> Hi Peter,
>
> Ceph cluster version is 0.94.5 and we are running with Firefly tunables and 
> also we have 10KPGs instead of the 30K / 40K we should have.
> The linux kernel version is 3.10.0-327.36.1.el7.x86_64 with RHEL 7.2
>
> On our side we havethe following settings:
> mon_osd_adjust_heartbeat_grace = false
> mon_osd_adjust_down_out_interval = false
> mon_osd_min_down_reporters = 5
> mon_osd_min_down_reports = 10
>
> explaining why the OSDs are not flapping but still they are behaving wrongly 
> and generate the slow requests I am describing.
>
> The osd_op_complaint_time is with the default value (30 sec), not sure I want 
> to change it base on your experience
I wasn't saying you should set the complaint time to 5, just saying
that's why I have complaints logged with such low block times.
> Thomas

And now I'm testing this:
osd recovery sleep = 0.5
osd snap trim sleep = 0.5

(or fiddling with it as low as 0.1 to make it rebalance faster)

While also changing tunables to optimal (which will rebalance 75% of the
objects)
Which has very good results so far (a few <14s blocks right at the
start, and none since, over an hour ago).

And I'm somehow hoping that will fix my rbd export-diff issue too... but
it at least appears to fix the rebalance causing blocks.

Do you use rbd snapshots? I think that may be causing my issues, based
on things like:

> "description": "osd_op(client.692201.0:20455419 4.1b5a5bc1
> rbd_data.94a08238e1f29.617b [] snapc 918d=[918d]
> ack+ondisk+write+known_if_redirected e40036)",
> "initiated_at": "2016-11-15 20:57:48.313432",
> "age": 409.634862,
> "duration": 3.377347,
> ...
> {
> "time": "2016-11-15 20:57:48.313767",
> "event": "waiting for subops from 0,1,8,22"
> },
> ...
> {
> "time": "2016-11-15 20:57:51.688530",
> "event": "sub_op_applied_rec from 22"
> },


Which says "snapc" in there (CoW?), and I think shows that just one osd
is delayed a few seconds and the rest are really fast, like you said.
(and not sure why I see 4 osds here when I have size 3... node1 osd 0
and 1, and node3 osd 8 and 22)

or some (shorter I think) have description like:
> osd_repop(client.426591.0:203051290 4.1f9
> 4:9fe4c001:::rbd_data.4cf92238e1f29.14ef:head v 40047'2531604)




Re: [ceph-users] Best practices for use ceph cluster and directories with many! Entries

2016-11-16 Thread John Spray
On Wed, Nov 16, 2016 at 12:18 PM, Burkhard Linke
 wrote:
> Hi,
>
>
> On 11/16/2016 11:17 AM, John Spray wrote:
>>
>> On Wed, Nov 16, 2016 at 1:16 AM, Patrick Donnelly 
>> wrote:
>>>
>>> On Tue, Nov 15, 2016 at 8:40 AM, Hauke Homburg 
>>> wrote:

 In the last weeks we enabled dir fragmentation for testing. The result
 is that we sometimes get error messages from rsync about unlink and
 no space left on device.
>>>
>>> Enabling directory fragmentation would not cause the unlink and ENOSPC
>>> errors. Failure to unlink is caused by the stray directories on the
>>> MDS growing too large. The only current solution is to wait for the
>>> MDS to eventually purge the stray directory entries. Retry the unlink
>>> as necessary. [The other workaround is to increase
>>> mds_bal_fragment_size_max [1] which is not recommended.]
>>>
>>> Directory fragmentation is not yet considered stable so beware
>>> potential issues including data loss. However, fragmentation will
>>> allow your directories to grow to unbounded size. This includes the
>>> stray directories which would permit unlink to avoid this issue.
>>
>> The last part isn't quite right, we currently don't fragment
>> strays[1].  Unfortunately anyone who uses directory fragmentation to
>> create a super-big directory could still have issues when unlinking
>> it.  However, there are 10x stray directories and removed items are
>> spread between them, so you should be able to handle deleting a
>> directory 10x the limit on the size of a stray dir.
>
> Just out of curiosity:
>
> It is possible to increase the number of stray directories?

Nope, it's a compiled-in constant.
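
You can at least watch how far behind the purge is; a sketch (counter names
vary a bit between versions, and the mds name is a placeholder):

# ceph daemon mds.<name> perf dump | grep strays   # num_strays, num_strays_purging
# ceph daemon mds.<name> config set mds_bal_fragment_size_max 200000

The second command is the workaround mentioned above (not recommended); the
value is only an example.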

John

> Regards,
> Burkhard
>
> --
> Dr. rer. nat. Burkhard Linke
> Bioinformatics and Systems Biology
> Justus-Liebig-University Giessen
> 35392 Giessen, Germany
> Phone: (+49) (0)641 9935810
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Best practices for use ceph cluster and directories with many! Entries

2016-11-16 Thread Burkhard Linke

Hi,


On 11/16/2016 11:17 AM, John Spray wrote:

On Wed, Nov 16, 2016 at 1:16 AM, Patrick Donnelly  wrote:

On Tue, Nov 15, 2016 at 8:40 AM, Hauke Homburg  wrote:

In the last weeks we enabled for testing the dir fragmentation. The Resultat
is that we have sometimes error messages with rsync with unlink and no-space
left on device.

Enabling directory fragmentation would not cause the unlink and ENOSPC
errors. Failure to unlink is caused by the stray directories on the
MDS growing too large. The only current solution is to wait for the
MDS to eventually purge the stray directory entries. Retry the unlink
as necessary. [The other workaround is to increase
mds_bal_fragment_size_max [1] which is not recommended.]

Directory fragmentation is not yet considered stable so beware
potential issues including data loss. However, fragmentation will
allow your directories to grow to unbounded size. This includes the
stray directories which would permit unlink to avoid this issue.

The last part isn't quite right, we currently don't fragment
strays[1].  Unfortunately anyone who uses directory fragmentation to
create a super-big directory could still have issues when unlinking
it.  However, there are 10x stray directories and removed items are
spread between them, so you should be able to handle deleting a
directory 10x the limit on the size of a stray dir.

Just out of curiosity:

Is it possible to increase the number of stray directories?

Regards,
Burkhard

--
Dr. rer. nat. Burkhard Linke
Bioinformatics and Systems Biology
Justus-Liebig-University Giessen
35392 Giessen, Germany
Phone: (+49) (0)641 9935810

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph cluster having blocke requests very frequently

2016-11-16 Thread Nick Fisk
 

 

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Thomas 
Danan
Sent: 15 November 2016 21:14
To: Peter Maloney 
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] ceph cluster having blocke requests very frequently

 

Very interesting ...

 

Any idea why optimal tunable would help here ?  on our cluster we have 500TB of 
data, I am a bit concerned about changing it without
taking lot of precautions . ...

I am curious to know how much time it takes you to change tunable, size of your 
cluster and observed impacts on client IO ...

 

Yes We do have daily rbd snapshot from 16 different ceph RBD clients. 
Snapshoting the RBD image is quite immediate while we are
seing the issue continuously during the day...

 

Just to point out that when you take a snapshot any writes to the original RBD 
will mean that the full 4MB object is copied into the
snapshot. If you have a lot of small random IO going on the original RBD this 
can lead to massive write amplification across the
cluster and may cause issues such as what you describe.

 

Also be aware that deleting large snapshots puts significant strain on the 
OSDs as they try to delete hundreds of thousands of objects.
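
If you do need to delete big snapshots, the trim work can be throttled a bit;
a sketch (osd_snap_trim_sleep is the knob Peter mentions further down, the
value is just an example):

# ceph tell osd.* injectargs '--osd_snap_trim_sleep 0.1'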

 

 

Will check all of this tomorrow . ..

 

Thanks again

 

Thomas

 

 

 

Sent from my Samsung device



 Original message 
From: Peter Maloney
Date: 11/15/16 21:27 (GMT+01:00)
To: Thomas Danan
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] ceph cluster having blocke requests very frequently 

On 11/15/16 14:05, Thomas Danan wrote:
> Hi Peter,
>
> Ceph cluster version is 0.94.5 and we are running with Firefly tunables and 
> also we have 10KPGs instead of the 30K / 40K we should
have.
> The linux kernel version is 3.10.0-327.36.1.el7.x86_64 with RHEL 7.2
>
> On our side we havethe following settings:
> mon_osd_adjust_heartbeat_grace = false
> mon_osd_adjust_down_out_interval = false
> mon_osd_min_down_reporters = 5
> mon_osd_min_down_reports = 10
>
> explaining why the OSDs are not flapping but still they are behaving wrongly 
> and generate the slow requests I am describing.
>
> The osd_op_complaint_time is with the default value (30 sec), not sure I want 
> to change it base on your experience
I wasn't saying you should set the complaint time to 5, just saying
that's why I have complaints logged with such low block times.
> Thomas

And now I'm testing this:
osd recovery sleep = 0.5
osd snap trim sleep = 0.5

(or fiddling with it as low as 0.1 to make it rebalance faster)

While also changing tunables to optimal (which will rebalance 75% of the
objects)
Which has very good results so far (a few <14s blocks right at the
start, and none since, over an hour ago).

And I'm somehow hoping that will fix my rbd export-diff issue too... but
it at least appears to fix the rebalance causing blocks.

Do you use rbd snapshots? I think that may be causing my issues, based
on things like:

> "description": "osd_op(client.692201.0:20455419 4.1b5a5bc1
> rbd_data.94a08238e1f29.617b [] snapc 918d=[918d]
> ack+ondisk+write+known_if_redirected e40036)",
> "initiated_at": "2016-11-15 20:57:48.313432",
> "age": 409.634862,
> "duration": 3.377347,
> ...
> {
> "time": "2016-11-15 20:57:48.313767",
> "event": "waiting for subops from 0,1,8,22"
> },
> ...
> {
> "time": "2016-11-15 20:57:51.688530",
> "event": "sub_op_applied_rec from 22"
> },


Which says "snapc" in there (CoW?), and I think shows that just one osd
is delayed a few seconds and the rest are really fast, like you said.
(and not sure why I see 4 osds here when I have size 3... node1 osd 0
and 1, and node3 osd 8 and 22)

or some (shorter I think) have description like:
> osd_repop(client.426591.0:203051290 4.1f9
> 4:9fe4c001:::rbd_data.4cf92238e1f29.14ef:head v 40047'2531604)



 

  _  


This electronic message contains information from Mycom which may be privileged 
or confidential. The information is intended to be
for the use of the individual(s) or entity named above. If you are not the 
intended recipient, be aware that any disclosure,
copying, distribution or any other use of the contents of this information is 
prohibited. If you have received this electronic
message in error, please notify us by post or telephone (to the numbers or 
correspondence address above) or by email (at the email
address above) immediately.

___
ceph-users mailing list
ceph-users@lists.ceph.com

Re: [ceph-users] Using Node JS with Ceph Hammer

2016-11-16 Thread fridifree
I didn't find documentation about how to do that on-premise, rather than for
AWS itself.

On Nov 16, 2016 13:20, "Haomai Wang"  wrote:

> On Wed, Nov 16, 2016 at 7:19 PM, fridifree  wrote:
> > Hi,
> > Thanks
> > This is for rados, not for s3 with nodejs
> > If someone can send examples how to do that I will appreciate it
>
> oh, if you refer to s3, you can get nodejs support from aws doc
>
> >
> > Thank you
> >
> >
> > On Nov 16, 2016 13:07, "Haomai Wang"  wrote:
> >>
> >> https://www.npmjs.com/package/rados
> >>
> >> On Wed, Nov 16, 2016 at 6:29 PM, fridifree  wrote:
> >> > Hi Everyone,
> >> >
> >> > Someone knows how to nodeJS with Ceph S3(Radosgw)
> >> > I succeed to do that on python using boto, I don't find any examples
> >> > about
> >> > how to this on Nodejs.
> >> > If someone can share with me examples I would be happy
> >> >
> >> > Thanks
> >> >
> >> > ___
> >> > ceph-users mailing list
> >> > ceph-users@lists.ceph.com
> >> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >> >
> >
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Using Node JS with Ceph Hammer

2016-11-16 Thread Haomai Wang
On Wed, Nov 16, 2016 at 7:19 PM, fridifree  wrote:
> Hi,
> Thanks
> This is for rados, not for s3 with nodejs
> If someone can send examples how to do that I will appreciate it

Oh, if you are referring to S3, you can find Node.js support in the AWS SDK docs.

>
> Thank you
>
>
> On Nov 16, 2016 13:07, "Haomai Wang"  wrote:
>>
>> https://www.npmjs.com/package/rados
>>
>> On Wed, Nov 16, 2016 at 6:29 PM, fridifree  wrote:
>> > Hi Everyone,
>> >
>> > Someone knows how to nodeJS with Ceph S3(Radosgw)
>> > I succeed to do that on python using boto, I don't find any examples
>> > about
>> > how to this on Nodejs.
>> > If someone can share with me examples I would be happy
>> >
>> > Thanks
>> >
>> > ___
>> > ceph-users mailing list
>> > ceph-users@lists.ceph.com
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Using Node JS with Ceph Hammer

2016-11-16 Thread fridifree
Hi,
Thanks
This is for rados, not for S3 with Node.js.
If someone can send examples of how to do that, I would appreciate it.

Thank you

On Nov 16, 2016 13:07, "Haomai Wang"  wrote:

> https://www.npmjs.com/package/rados
>
> On Wed, Nov 16, 2016 at 6:29 PM, fridifree  wrote:
> > Hi Everyone,
> >
> > Someone knows how to nodeJS with Ceph S3(Radosgw)
> > I succeed to do that on python using boto, I don't find any examples
> about
> > how to this on Nodejs.
> > If someone can share with me examples I would be happy
> >
> > Thanks
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Using Node JS with Ceph Hammer

2016-11-16 Thread Haomai Wang
https://www.npmjs.com/package/rados

On Wed, Nov 16, 2016 at 6:29 PM, fridifree  wrote:
> Hi Everyone,
>
> Someone knows how to nodeJS with Ceph S3(Radosgw)
> I succeed to do that on python using boto, I don't find any examples about
> how to this on Nodejs.
> If someone can share with me examples I would be happy
>
> Thanks
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Antw: Re: hammer on xenial

2016-11-16 Thread 钟佳佳
Since you have the Ceph packages installed, you could comment out the ceph line in your apt sources list.
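
For example (a sketch only; on this setup the ceph repo entry lives in
/etc/apt/sources.list.d/ceph.list rather than sources.list itself):

# comment out the ceph.com entry so apt stops offering Jewel packages
sed -i 's/^deb/# deb/' /etc/apt/sources.list.d/ceph.list
apt-get update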
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Antw: Re: hammer on xenial

2016-11-16 Thread Steffen Weißgerber





>>> Robert Sander  wrote on Wednesday, 16 November 2016 at 10:23:
> On 16.11.2016 09:05, Steffen Weißgerber wrote:
>> Hello,
>> 

Hello,

>> we started upgrading ubuntu on our ceph nodes to Xenial and had to
see that 
> during
>> the upgrade ceph automatically was upgraded from hammer to jewel
also.
>> 
>> Because we don't want to upgrade ceph and the OS at the same time we

> deinstalled
>> the ceph jewel components reactivated
/etc/apt/sources.list.d/ceph.list with
>> 
>> deb http://ceph.com/debian-hammer/ xenial main
>> 
>> and pinned the ceph relaese to install in
/etc/apt/preferences/ceph.pref
> 
> After that process you may still have the Ubuntu trusty packages for
> Ceph Hammer installed.

Hmm, not really. The 'ceph -v' returned 10.x after the system upgrade.

> 
> Do an "apt-get install --reinstall ceph.*" on your node after the
> Upgrade. This should pull the Ubuntu xenial packages and install
them.
> 

I'll try this. Thank you.


Regards

Steffen

> Regards
> -- 
> Robert Sander
> Heinlein Support GmbH
> Schwedter Str. 8/9b, 10119 Berlin
> 
> http://www.heinlein-support.de 
> 
> Tel: 030 / 405051-43
> Fax: 030 / 405051-19
> 
> Zwangsangaben lt. §35a GmbHG:
> HRB 93818 B / Amtsgericht Berlin-Charlottenburg,
> Geschäftsführer: Peer Heinlein -- Sitz: Berlin

-- 
Klinik-Service Neubrandenburg GmbH
Allendestr. 30, 17036 Neubrandenburg
Amtsgericht Neubrandenburg, HRB 2457
Geschaeftsfuehrerin: Gudrun Kappich
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Antw: Re: hammer on xenial

2016-11-16 Thread Steffen Weißgerber
Hi,

after doing 'apt-mark hold ceph' the upgrade failed.
It seems to be due to some kind of fetch failure:

...
OK   http://archive.ubuntu.com trusty-backports/universe amd64 Packages
Err  http://ceph.com xenial/main Translation-en
OK   http://archive.ubuntu.com trusty-backports/multiverse amd64 Packages
OK   http://archive.ubuntu.com trusty-backports/main i386 Packages
OK   http://archive.ubuntu.com trusty-backports/restricted i386 Packages
Err  http://ceph.com xenial/main Translation-de
OK   http://archive.ubuntu.com trusty-backports/universe i386 Packages
OK   http://archive.ubuntu.com trusty-backports/multiverse i386 Packages
Err  http://ceph.com trusty/main Translation-en
OK   http://archive.ubuntu.com trusty-backports/main Translation-en
OK   http://archive.ubuntu.com trusty-backports/multiverse Translation-en
OK   http://archive.ubuntu.com trusty-backports/restricted Translation-en
Err  http://ceph.com trusty/main Translation-de
...

Seems not to work.

Regards

Steffen


>>> "钟佳佳"  16.11.2016 09:32 >>>
hi :
you can google apt-mark
apt-mark hold PACKAGENAME
 
 
-- Original --
From:  "Steffen Weißgerbe";
Date:  Wed, Nov 16, 2016 04:05 PM
To:  "CEPH list"; 

Subject:  [ceph-users] hammer on xenial

 
Hello,

we started upgrading ubuntu on our ceph nodes to Xenial and had to see
that during
the upgrade ceph automatically was upgraded from hammer to jewel also.

Because we don't want to upgrade ceph and the OS at the same time we
deinstalled
the ceph jewel components reactivated /etc/apt/sources.list.d/ceph.list
with

deb http://ceph.com/debian-hammer/ xenial main

and pinned the ceph relaese to install in
/etc/apt/preferences/ceph.pref

Package: *
Pin: version 0.94*
Pin: origin ceph.com
Pin-Priority: 999

Now after restarting the node the ceph daemons are not active and can't
be started
by /etc/init.d/ceph.

It seems that this is caused by the missing systemd unit files for the
mon and osd's
in /lib/systemd/system/.

What would be the right way to fix this?
Maybe we could use the target and service files from
https://github.com/ceph/ceph/tree/master/systemd 
but we don't know how to use it manually.

And is there a way to upgrade Ubuntu with avoiding the ceph upgrade?

Thanks in advance.

Steffen



-- 
Klinik-Service Neubrandenburg GmbH
Allendestr. 30, 17036 Neubrandenburg
Amtsgericht Neubrandenburg, HRB 2457
Geschaeftsfuehrerin: Gudrun Kappich
___
ceph-users mailing list
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
-- 
Klinik-Service Neubrandenburg GmbH
Allendestr. 30, 17036 Neubrandenburg
Amtsgericht Neubrandenburg, HRB 2457
Geschaeftsfuehrerin: Gudrun Kappich
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Using Node JS with Ceph Hammer

2016-11-16 Thread fridifree
Hi Everyone,

Does someone know how to use Node.js with Ceph S3 (Radosgw)?
I succeeded in doing this in Python using boto, but I can't find any examples of
how to do it in Node.js.
If someone can share examples with me I would be happy.

Thanks
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS - Couple of questions

2016-11-16 Thread John Spray
On Wed, Nov 16, 2016 at 8:55 AM, James Wilkins
 wrote:
> Hello,
>
>
>
> Hoping to pick any users brains in relation to production CephFS deployments
> as we’re preparing to deploy CephFS to replace Gluster for our container
> based storage needs.
>
>
>
> (Target OS is Centos7  for both servers/clients & latest jewel release)
>
>
>
> o) Based on our performance testing we’re seeing the kernel client by far
> out-performs the fuse client – older mailing list posts from 2014 suggest
> this is expected, is the recommendation still to use the kernel client?

The kernel client does usually beat the fuse client in benchmarks, but
the practical difference depends on how data/metadata heavy your
workload is, and how much your workload concentrates through a single
client vs. having multiple less-loaded clients.  Many everyday
workloads would not notice the difference.

In general I recommend that you use the fuse client unless its
performance becomes an issue for you, in which case you go down the
road of working out whether you are comfortable with using a recent
enough kernel to have the latest cephfs fixes (or switching to a
distro that has backports in its stable kernel).
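
For reference, a minimal sketch of mounting with each client, assuming a monitor
at mon1:6789, the client.admin credentials under /etc/ceph, and /mnt/cephfs as
the mount point (all placeholders):

# kernel client - needs a reasonably recent kernel for the latest cephfs fixes
sudo mount -t ceph mon1:6789:/ /mnt/cephfs -o name=admin,secretfile=/etc/ceph/admin.secret

# fuse client - picks up /etc/ceph/ceph.conf and the keyring by default
sudo ceph-fuse -m mon1:6789 /mnt/cephfs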

John


>
>
> o) Ref: http://docs.ceph.com/docs/master/cephfs/experimental-features/ lists
> multiple MDS as experimental – I’m assuming this refers to multiple active
> MDS and having one active / X standby is a valid/stable configuration?  (We
> haven’t noticed any issues during testing – just wanting to be sure).
>
>
>
> Cheers,
>
>
>
>
>
>
>
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] hammer on xenial

2016-11-16 Thread Robert Sander
On 16.11.2016 09:05, Steffen Weißgerber wrote:
> Hello,
> 
> we started upgrading ubuntu on our ceph nodes to Xenial and had to see that 
> during
> the upgrade ceph automatically was upgraded from hammer to jewel also.
> 
> Because we don't want to upgrade ceph and the OS at the same time we 
> deinstalled
> the ceph jewel components reactivated /etc/apt/sources.list.d/ceph.list with
> 
> deb http://ceph.com/debian-hammer/ xenial main
> 
> and pinned the ceph relaese to install in /etc/apt/preferences/ceph.pref

After that process you may still have the Ubuntu trusty packages for
Ceph Hammer installed.

Do an "apt-get install --reinstall ceph.*" on your node after the
upgrade. This should pull the Ubuntu xenial packages and install them.
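
Something along these lines, as a sketch only (the package list is illustrative
and depends on what the node actually has installed):

# hold the Hammer packages before running the release upgrade
apt-mark hold ceph ceph-common librados2 librbd1
# after the OS upgrade, pull the xenial builds of the pinned Hammer release
apt-get update
apt-get install --reinstall ceph ceph-common
# confirm the candidate version is 0.94.x coming from the ceph.com xenial repo
apt-cache policy ceph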

Regards
-- 
Robert Sander
Heinlein Support GmbH
Schwedter Str. 8/9b, 10119 Berlin

http://www.heinlein-support.de

Tel: 030 / 405051-43
Fax: 030 / 405051-19

Zwangsangaben lt. §35a GmbHG:
HRB 93818 B / Amtsgericht Berlin-Charlottenburg,
Geschäftsführer: Peer Heinlein -- Sitz: Berlin



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Antw: Re: hammer on xenial

2016-11-16 Thread Steffen Weißgerber
Hi,

looks good.

Because I've made an image of the node's system disk I can revert to
the state before the upgrade and restart the whole process.

Thank you.

Steffen


>>> "钟佳佳"  16.11.2016 09:32 >>>
hi :
you can google apt-mark
apt-mark hold PACKAGENAME
 
 
-- Original --
From:  "Steffen Weißgerbe";
Date:  Wed, Nov 16, 2016 04:05 PM
To:  "CEPH list"; 

Subject:  [ceph-users] hammer on xenial

 
Hello,

we started upgrading ubuntu on our ceph nodes to Xenial and had to see
that during
the upgrade ceph automatically was upgraded from hammer to jewel also.

Because we don't want to upgrade ceph and the OS at the same time we
deinstalled
the ceph jewel components reactivated /etc/apt/sources.list.d/ceph.list
with

deb http://ceph.com/debian-hammer/ xenial main

and pinned the ceph relaese to install in
/etc/apt/preferences/ceph.pref

Package: *
Pin: version 0.94*
Pin: origin ceph.com
Pin-Priority: 999

Now after restarting the node the ceph daemons are not active and can't
be started
by /etc/init.d/ceph.

It seems that this is caused by the missing systemd unit files for the
mon and osd's
in /lib/systemd/system/.

What would be the right way to fix this?
Maybe we could use the target and service files from
https://github.com/ceph/ceph/tree/master/systemd 
but we don't know how to use it manually.

And is there a way to upgrade Ubuntu with avoiding the ceph upgrade?

Thanks in advance.

Steffen



-- 
Klinik-Service Neubrandenburg GmbH
Allendestr. 30, 17036 Neubrandenburg
Amtsgericht Neubrandenburg, HRB 2457
Geschaeftsfuehrerin: Gudrun Kappich
___
ceph-users mailing list
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

-- 
Klinik-Service Neubrandenburg GmbH
Allendestr. 30, 17036 Neubrandenburg
Amtsgericht Neubrandenburg, HRB 2457
Geschaeftsfuehrerin: Gudrun Kappich
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Fwd: iSCSI Lun issue after MON Out Of Memory

2016-11-16 Thread Nick Fisk
I assume you mean you only had 1 mon and it crashed, so effectively the iSCSI 
suddenly went offline?

 

I suspect that the NTFS volume has somehow become corrupted; are there any errors
in the event log?

 

You may be able to use some disk recovery tools to try to fix the FS. Maybe
also try mapping the RBD on a Linux host and using the Linux NTFS tools to check
and mount the volume?
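
If it comes to that, a rough sketch from a Linux host (the pool and image names
below are made up; if possible, work on a snapshot or copy rather than the
original image):

# map the image; rbd map prints the device it creates, typically /dev/rbd0
rbd map rbd/win-lun2
# check/repair the NTFS metadata (ntfsfix ships with ntfs-3g / ntfsprogs)
ntfsfix /dev/rbd0
# then try a read-only mount to see whether the data is reachable
mount -t ntfs-3g -o ro /dev/rbd0 /mnt/recover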

 

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Daleep 
Singh Bais
Sent: 16 November 2016 03:41
To: ceph-users 
Subject: [ceph-users] Fwd: iSCSI Lun issue after MON Out Of Memory

 

Dear All,

Any suggestion in this regard will be helpful.

Thanks,
Daleep Singh Bais



 Forwarded Message 
Subject: iSCSI Lun issue after MON Out Of Memory
Date: Tue, 15 Nov 2016 11:58:07 +0530
From: Daleep Singh Bais
To: ceph-users

 

Hello friends,
 
I had RBD images mapped to a Windows client through iSCSI; however, the
MON went OOM for some unknown reason. After rebooting the MON, I am able
to mount one of the images/iSCSI LUNs back on the client, but the second image,
when mapped, shows up as unallocated on the Windows client. I have data on that
image, hence I cannot reformat the LUN.
 
Please suggest.
 
I am able to see the objects when I do 'rados ls' for that pool with the image id.
 
Thanks,
 
Daleep Singh Bais
 
 
 

 

 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] hammer on xenial

2016-11-16 Thread 钟佳佳
hi :
you can google apt-mark
apt-mark hold PACKAGENAME
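
For example (a sketch; the package names are illustrative, hold whatever Ceph
packages the node actually has installed):

apt-mark hold ceph ceph-common ceph-mds librados2 librbd1
apt-mark showhold   # confirm which packages are now held back from upgrades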
 
 
-- Original --
From:  "Steffen Weißgerbe";
Date:  Wed, Nov 16, 2016 04:05 PM
To:  "CEPH list"; 

Subject:  [ceph-users] hammer on xenial

 
Hello,

we started upgrading ubuntu on our ceph nodes to Xenial and had to see that 
during
the upgrade ceph automatically was upgraded from hammer to jewel also.

Because we don't want to upgrade ceph and the OS at the same time we deinstalled
the ceph jewel components reactivated /etc/apt/sources.list.d/ceph.list with

deb http://ceph.com/debian-hammer/ xenial main

and pinned the ceph relaese to install in /etc/apt/preferences/ceph.pref

Package: *
Pin: version 0.94*
Pin: origin ceph.com
Pin-Priority: 999

Now after restarting the node the ceph daemons are not active and can't be 
started
by /etc/init.d/ceph.

It seems that this is caused by the missing systemd unit files for the mon and 
osd's
in /lib/systemd/system/.

What would be the right way to fix this?
Maybe we could use the target and service files from
https://github.com/ceph/ceph/tree/master/systemd 
but we don't know how to use it manually.

And is there a way to upgrade Ubuntu with avoiding the ceph upgrade?

Thanks in advance.

Steffen



-- 
Klinik-Service Neubrandenburg GmbH
Allendestr. 30, 17036 Neubrandenburg
Amtsgericht Neubrandenburg, HRB 2457
Geschaeftsfuehrerin: Gudrun Kappich
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] hammer on xenial

2016-11-16 Thread Steffen Weißgerber
Hello,

we started upgrading Ubuntu on our Ceph nodes to Xenial and noticed that during
the upgrade Ceph was automatically upgraded from Hammer to Jewel as well.

Because we don't want to upgrade Ceph and the OS at the same time, we uninstalled
the Ceph Jewel components and reactivated /etc/apt/sources.list.d/ceph.list with

deb http://ceph.com/debian-hammer/ xenial main

and pinned the Ceph release to install in /etc/apt/preferences/ceph.pref:

Package: *
Pin: version 0.94*
Pin: origin ceph.com
Pin-Priority: 999

Now, after restarting the node, the Ceph daemons are not active and can't be
started by /etc/init.d/ceph.

It seems that this is caused by the missing systemd unit files for the mon and
OSDs in /lib/systemd/system/.

What would be the right way to fix this?
Maybe we could use the target and service files from
https://github.com/ceph/ceph/tree/master/systemd
but we don't know how to use them manually.

And is there a way to upgrade Ubuntu while avoiding the Ceph upgrade?

Thanks in advance.

Steffen



-- 
Klinik-Service Neubrandenburg GmbH
Allendestr. 30, 17036 Neubrandenburg
Amtsgericht Neubrandenburg, HRB 2457
Geschaeftsfuehrerin: Gudrun Kappich
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com