Re: [ceph-users] osd_recovery_max_chunk value
Hi Christian, Thank you for your help. Ceph version is 12.2.2. So is this value bad? Do you have any suggestions? So to reduce the max chunk, I assume I can choose something like 7 << 20, i.e. 7340032? Karun Josy On Tue, Feb 6, 2018 at 1:15 PM, Christian Balzer wrote: > On Tue, 6 Feb 2018 13:01:12 +0530 Karun Josy wrote: > > > Hello, > > > > We are seeing slow requests while the recovery process is going on. > > > > I am trying to slow down the recovery process. I set > osd_recovery_max_active > > and osd_recovery_sleep as below: > > -- > > ceph tell osd.* injectargs '--osd_recovery_max_active 1' > > ceph tell osd.* injectargs '--osd_recovery_sleep .1' > > -- > What version of Ceph? In some versions "sleep" values will make things _worse_! > Would be nice if that was documented in like, the documentation... > > > > > But I am confused with the osd_recovery_max_chunk. Currently, it shows > > 8388608. > > > > # ceph daemon osd.4 config get osd_recovery_max_chunk > > { > > "osd_recovery_max_chunk": "8388608" > > > > > > In the Ceph documentation, it shows > > > > --- > > osd recovery max chunk > > Description: The maximum size of a recovered chunk of data to push. > > Type: 64-bit Unsigned Integer > > Default: 8 << 20 > > > > > > I am confused. Can anyone let me know what value I have to give > > to reduce this parameter? > > > This is what you get when programmers write docs. > > The above is a left-shift operation, see for example: > http://bit-calculator.com/bit-shift-calculator > > Now if shrinking that value is beneficial for reducing recovery load, > that's for you to find out. > > Christian > > > > > > > Karun Josy > > > -- > Christian Balzer    Network/Systems Engineer > ch...@gol.com    Rakuten Communications > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
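For reference, the "8 << 20" default is a plain left shift: 8 * 2^20 = 8388608 bytes, i.e. 8 MiB. The arithmetic can be checked in any shell, and a smaller value can be injected the same way as the other recovery settings in this thread; the 7 MiB figure below is only the value proposed above, not a tested recommendation:
--
$ echo $((8 << 20))   # the documented default, 8 MiB in bytes
8388608
$ echo $((7 << 20))   # the proposed smaller chunk, 7 MiB
7340032
$ ceph tell osd.* injectargs '--osd_recovery_max_chunk 7340032'
--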
[ceph-users] osd_recovery_max_chunk value
Hello, We are seeing slow requests while the recovery process is going on. I am trying to slow down the recovery process. I set osd_recovery_max_active and osd_recovery_sleep as below: -- ceph tell osd.* injectargs '--osd_recovery_max_active 1' ceph tell osd.* injectargs '--osd_recovery_sleep .1' -- But I am confused with the osd_recovery_max_chunk. Currently, it shows 8388608. # ceph daemon osd.4 config get osd_recovery_max_chunk { "osd_recovery_max_chunk": "8388608" In the Ceph documentation, it shows --- osd recovery max chunk Description: The maximum size of a recovered chunk of data to push. Type: 64-bit Unsigned Integer Default: 8 << 20 I am confused. Can anyone let me know what value I have to give to reduce this parameter? Karun Josy ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] High RAM usage in OSD servers
Can it be this bug: http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-October/021676.html In most of the OSDs, buffer_anon is high: }, "buffer_anon": { "items": 268443, "bytes": 1421912265 Karun Josy On Sun, Feb 4, 2018 at 7:03 AM, Karun Josy wrote: > And I can see this in the error log: > > Feb 2 16:41:28 ceph-las1-a4-osd kernel: bstore_kv_sync: page allocation > stalls for 14188ms, order:0, mode:0x14280ca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), > nodemask=(null) > > > Karun Josy > > On Sun, Feb 4, 2018 at 6:19 AM, Karun Josy wrote: > >> Hi, >> >> We are using an EC profile in our cluster. >> We are seeing very high RAM usage in 1 OSD server. >> Sometimes free memory goes too low and the server hangs. We have to restart the >> daemons, which frees up the memory, but in a very short time it gets used up again. >> >> Memory usage of daemons from the problem server >> - >> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ >> COMMAND >> >> 16918 ceph 20 0 15.780g 0.013t 7928 S 28.2 21.9 67:29.09 >> ceph-osd >> 18568 ceph 20 0 25.833g 0.023t 26096 S 24.9 36.8 9:15.58 >> ceph-osd >> 22630 ceph 20 0 12.520g 0.011t 26660 S 22.3 18.3 5:49.03 >> ceph-osd >> 2796 ceph 20 0 11.091g 9.851g 8900 S 13.6 15.7 25:17.68 >> ceph-osd >> >> >> Memory usage from another server: >> --- >> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ >> COMMAND >> 11649 ceph 20 0 12.788g 9.563g 25068 S 107.6 7.6 12285:54 >> ceph-osd >> 18295 ceph 20 0 11.028g 6.069g 26212 S 54.0 4.8 2122:18 >> ceph-osd >> 30974 ceph 20 0 13.860g 0.010t 24956 S 46.4 8.1 10984:47 >> ceph-osd >> >> >> We are using EC profile 5/3. And there are 2 failed disks in the cluster >> on other nodes (I have marked them down, but not out), so I cannot turn >> this node off as it will force some PGs into an incomplete state. >> >> Any help would be really appreciated. >> >> >> Karun Josy >> > > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
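For anyone debugging similar symptoms: the mempool figures quoted above come from the OSD admin socket, and on BlueStore the cache can be capped. The 1 GiB value below is purely illustrative, not a sizing recommendation, and the injected change may require an OSD restart to fully take effect:
--
# dump per-OSD memory pool accounting (buffer_anon, bluestore_cache_*, etc.)
ceph daemon osd.4 dump_mempools
# cap the BlueStore cache; value is in bytes
ceph tell osd.* injectargs '--bluestore_cache_size 1073741824'
--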
Re: [ceph-users] High RAM usage in OSD servers
And I can see this in the error log: Feb 2 16:41:28 ceph-las1-a4-osd kernel: bstore_kv_sync: page allocation stalls for 14188ms, order:0, mode:0x14280ca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), nodemask=(null) Karun Josy On Sun, Feb 4, 2018 at 6:19 AM, Karun Josy wrote: > Hi, > > We are using an EC profile in our cluster. > We are seeing very high RAM usage in 1 OSD server. > Sometimes free memory goes too low and the server hangs. We have to restart the daemons, > which frees up the memory, but in a very short time it gets used up again. > > Memory usage of daemons from the problem server > - > PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND > > 16918 ceph 20 0 15.780g 0.013t 7928 S 28.2 21.9 67:29.09 > ceph-osd > 18568 ceph 20 0 25.833g 0.023t 26096 S 24.9 36.8 9:15.58 > ceph-osd > 22630 ceph 20 0 12.520g 0.011t 26660 S 22.3 18.3 5:49.03 > ceph-osd > 2796 ceph 20 0 11.091g 9.851g 8900 S 13.6 15.7 25:17.68 > ceph-osd > > > Memory usage from another server: > --- > PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND > 11649 ceph 20 0 12.788g 9.563g 25068 S 107.6 7.6 12285:54 > ceph-osd > 18295 ceph 20 0 11.028g 6.069g 26212 S 54.0 4.8 2122:18 > ceph-osd > 30974 ceph 20 0 13.860g 0.010t 24956 S 46.4 8.1 10984:47 > ceph-osd > > > We are using EC profile 5/3. And there are 2 failed disks in the cluster > on other nodes (I have marked them down, but not out), so I cannot turn > this node off as it will force some PGs into an incomplete state. > > Any help would be really appreciated. > > Karun Josy > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] High RAM usage in OSD servers
Hi, We are using an EC profile in our cluster. We are seeing very high RAM usage in 1 OSD server. Sometimes free memory goes too low and the server hangs. We have to restart the daemons, which frees up the memory, but in a very short time it gets used up again. Memory usage of daemons from the problem server - PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 16918 ceph 20 0 15.780g 0.013t 7928 S 28.2 21.9 67:29.09 ceph-osd 18568 ceph 20 0 25.833g 0.023t 26096 S 24.9 36.8 9:15.58 ceph-osd 22630 ceph 20 0 12.520g 0.011t 26660 S 22.3 18.3 5:49.03 ceph-osd 2796 ceph 20 0 11.091g 9.851g 8900 S 13.6 15.7 25:17.68 ceph-osd Memory usage from another server: --- PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 11649 ceph 20 0 12.788g 9.563g 25068 S 107.6 7.6 12285:54 ceph-osd 18295 ceph 20 0 11.028g 6.069g 26212 S 54.0 4.8 2122:18 ceph-osd 30974 ceph 20 0 13.860g 0.010t 24956 S 46.4 8.1 10984:47 ceph-osd We are using EC profile 5/3. And there are 2 failed disks in the cluster on other nodes (I have marked them down, but not out), so I cannot turn this node off as it will force some PGs into an incomplete state. Any help would be really appreciated. Karun Josy ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Snapshot trimming
Hi Jason, >> Was the base RBD pool used only for data-pool associated images Yes, it is only used for storing the metadata of ecpool. We use 2 pools for erasure coding: ecpool (the erasure-coded data pool) and vm (a replicated pool to store metadata). Karun Josy On Tue, Jan 30, 2018 at 8:00 PM, Jason Dillaman wrote: > Unfortunately, any snapshots created prior to 12.2.2 against a separate > data pool were incorrectly associated to the base image pool instead of the > data pool. Was the base RBD pool used only for data-pool associated images > (i.e. all the snapshots that exist within the pool can be safely deleted)? > > On Mon, Jan 29, 2018 at 11:50 AM, Karun Josy wrote: > >> >> The problem we are experiencing is described here: >> >> https://bugzilla.redhat.com/show_bug.cgi?id=1497332 >> >> However, we are running 12.2.2. >> >> Across our 6 ceph clusters, this one with the problem was first version >> 12.2.0, then upgraded to .1 and then to .2. >> >> The other 5 ceph installations started as version 12.2.1 and then updated >> to .2. >> >> Karun Josy >> >> On Mon, Jan 29, 2018 at 7:01 PM, Karun Josy wrote: >> >>> Thank you for your response. >>> >>> We don't think there is an issue with the cluster being behind snap >>> trimming. We just don't think snaptrim is occurring at all. >>> >>> We have 6 individual ceph clusters. When we delete old snapshots for >>> clients, we can see space being made available. In this particular one >>> however, with 300 virtual machines, 28TBs of data (this is our largest >>> ceph), I can delete hundreds of snapshots, and not a single gigabyte >>> becomes available after doing that. >>> >>> In our other 5, smaller Ceph clusters, we can see hundreds of gigabytes >>> becoming available again after doing massive deletions of snapshots. >>> >>> The Luminous GUI also never shows "snaptrimming" occurring in the EC >>> pool. While the other 5 Luminous clusters, their GUI will show >>> snaptrimming occurring for the EC pool. Within minutes we can see the >>> additional space becoming available. >>> >>> This isn't an issue of the trimming queue being behind schedule. The system >>> shows there is no trimming scheduled in the queue, ever. >>> >>> However, when using ceph du on particular virtual machines, we can see >>> that snapshots we delete are indeed no longer listed in ceph du's output. >>> >>> So, they seem to be deleting. But the space is not being reclaimed. >>> >>> All clusters are same hardware. Some have more disks and servers than >>> others. The only major difference is that this particular Ceph with this >>> problem, it had the noscrub and nodeep-scrub flags set for many weeks. >>> >>> >>> Karun Josy >>> >>> On Mon, Jan 29, 2018 at 6:27 PM, David Turner >>> wrote: >>> >>>> I don't know why you keep asking the same question about snap trimming. >>>> You haven't shown any evidence that your cluster is behind on that. Have >>>> you looked into fstrim inside of your VMs? >>>> >>>> On Mon, Jan 29, 2018, 4:30 AM Karun Josy wrote: >>>> >>>>> fast-diff map is not enabled for RBD images. >>>>> Can it be a reason for trimming not happening? >>>>> >>>>> Karun Josy >>>>> >>>>> On Sat, Jan 27, 2018 at 10:19 PM, Karun Josy >>>>> wrote: >>>>> >>>>>> Hi David, >>>>>> >>>>>> Thank you for your reply! I really appreciate it. >>>>>> >>>>>> The images are in pool id 55. It is an erasure coded pool.
>>>>>> >>>>>> --- >>>>>> $ echo $(( $(ceph pg 55.58 query | grep snap_trimq | cut -d[ -f2 | >>>>>> cut -d] -f1 | tr ',' '\n' | wc -l) - 1 )) >>>>>> 0 >>>>>> $ echo $(( $(ceph pg 55.a query | grep snap_trimq | cut -d[ -f2 | >>>>>> cut -d] -f1 | tr ',' '\n' | wc -l) - 1 )) >>>>>> 0 >>>>>> $ echo $(( $(ceph pg 55.65 query | grep snap_trimq | cut -d[ -f2 | >>>>>> cut -d] -f1 | tr ',' '\n' | wc -l) - 1 )) >>>>>> 0 >>>>>> -- >>>>>> >>>>>> Current snap_trim_sleep value is default. >>>>>> "osd_snap_trim_sleep": "0.00".
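Rather than spot-checking individual PGs with the one-liner quoted above, every PG in the pool can be swept in a loop; this sketch assumes the first column of ceph pg ls-by-pool output (after the header row) is the PG id:
--
# print only PGs that still have entries queued for snap trimming
for pg in $(ceph pg ls-by-pool ecpool | awk 'NR>1 {print $1}'); do
  n=$(( $(ceph pg $pg query | grep snap_trimq | cut -d[ -f2 | cut -d] -f1 | tr ',' '\n' | wc -l) - 1 ))
  [ "$n" -gt 0 ] && echo "$pg: $n"
done
--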
Re: [ceph-users] lease_timeout
Thank you for looking into it. Yes, I believe it is the same issue as reported in the bug. Sorry I was not specific. - The Health section is not updated. - The Activity values under the Pools section (right side) get stuck: they show old data and are not updated. However, the Cluster log section gets updated correctly. Karun Josy On Tue, Jan 30, 2018 at 1:35 AM, John Spray wrote: > On Mon, Jan 29, 2018 at 6:58 PM, Gregory Farnum > wrote: > > The lease timeout means this (peon) monitor hasn't heard from the leader > > monitor in too long; its read lease on the system state has expired. So > it > > calls a new election since that means the leader is down or misbehaving. > Do > > the other monitors have a similar problem at this stage? > > > > The manager freezing until you restart it is a separate bug, but I'm not > > sure what the dashboard/mgr people will want to see there. John? > > There is a bug where the mgr will stop getting updates from the mon in > some situations (http://tracker.ceph.com/issues/22142), which is fixed > in master but not backported to luminous yet. > > However, I don't know what "gets stuck" means in this context. Karun, > can you be more specific? Is it rendering but old data? Is the page > not loading at all? > > John > > > -Greg > > > > On Sun, Jan 28, 2018 at 9:11 AM Karun Josy wrote: > >> > >> The issue is still continuing. Has anyone else noticed it? > >> > >> > >> When this happens, the Ceph Dashboard GUI gets stuck and we have to > >> restart the manager daemon to make it work again > >> > >> Karun Josy > >> > >> On Wed, Jan 17, 2018 at 6:16 AM, Karun Josy > wrote: > >>> > >>> Hello, > >>> > >>> In one of our cluster setups, there are frequent monitor elections > >>> happening. > >>> In the logs of one of the monitors, there is a "lease_timeout" message > >>> before that happens. Can anyone help me to figure it out?
> >>> (When this happens, the Ceph Dashboard GUI gets stuck and we have to > >>> restart the manager daemon to make it work again) > >>> > >>> Ceph version : Luminous 12.2.2 > >>> > >>> Log : > >>> = > >>> > >>> 2018-01-16 16:33:08.001937 7f0cfbaad700 4 rocksdb: > >>> [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_ > 64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/ > centos7/MACHINE_SIZE/huge/release/12.2.2/rpm/el7/BUILD/ > ceph-12.2.2/src/rocksdb/db/compaction_job.cc:1173] > >>> [default] [JOB 885] Compacted 1@0 + 1@1 files to L1 => 20046585 bytes > >>> 2018-01-16 16:33:08.015891 7f0cfbaad700 4 rocksdb: (Original Log Time > >>> 2018/01/16-16:33:08.015826) > >>> [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_ > 64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/ > centos7/MACHINE_SIZE/huge/release/12.2.2/rpm/el7/BUILD/ > ceph-12.2.2/src/rocksdb/db/compaction_job.cc:621] > >>> [default] compacted to: base level 1 max bytes base 268435456 files[0 > 1 0 0 > >>> 0 0 0] max score 0.07, MB/sec: 32.7 rd, 30.9 wr, level 1, files in(1, > 1) > >>> out(1) MB in(1.3, 18.9) out(19.1), read-write-amplify(31.0) > >>> write-amplify(15.1) OK, records in: 4305, records dropped: 515 > >>> > >>> 2018-01-16 16:33:08.015897 7f0cfbaad700 4 rocksdb: (Original Log Time > >>> 2018/01/16-16:33:08.015840) EVENT_LOG_v1 {"time_micros": > 1516149188015833, > >>> "job": 885, "event": "compaction_finished", "compaction_time_micros": > >>> 647876, "output_level": 1, "num_output_files": 1, "total_output_size": > >>> 20046585, "num_input_records": 4305, "num_output_records": 3790, > >>> "num_subcompactions": 1, "num_single_delete_mismatches": 0, > >>> "num_single_delete_fallthrough": 0, "lsm_state": [0, 1, 0, 0, 0, 0, > 0]} > >>> 2018-01-16 16:33:08.016131 7f0cfbaad700 4 rocksdb: EVENT_LOG_v1 > >>> {"time_micros": 1516149188016128, "job": 885, "event": > >>> "table_file_deletion", "file_number": 2419} > >>> 2018-01-16 16:33:08.018147 7f0cfbaad700 4 rocksdb: EVENT_LOG_v1 > >>> {"time_micros": 1516149188018146, "job": 885, "event": > >>> "table_file_deletion", "file_numb
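Until the fix for http://tracker.ceph.com/issues/22142 lands in a Luminous release, the workaround described in this thread amounts to checking mon quorum and bouncing the stuck manager; the mgr unit name below is the systemd default and the instance id is a placeholder:
--
# confirm which mons are in quorum and who the current leader is
ceph quorum_status -f json-pretty
# restart the stuck manager daemon (replace mgr-id with your mgr instance)
systemctl restart ceph-mgr@mgr-id
--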
Re: [ceph-users] Snapshot trimming
The problem we are experiencing is described here: https://bugzilla.redhat.com/show_bug.cgi?id=1497332 However, we are running 12.2.2. Across our 6 ceph clusters, this one with the problem was first version 12.2.0, then upgraded to .1 and then to .2. The other 5 ceph installations started as version 12.2.1 and then updated to .2. Karun Josy On Mon, Jan 29, 2018 at 7:01 PM, Karun Josy wrote: > Thank you for your response. > > We don't think there is an issue with the cluster being behind snap > trimming. We just don't think snaptrim is occurring at all. > > We have 6 individual ceph clusters. When we delete old snapshots for > clients, we can see space being made available. In this particular one > however, with 300 virtual machines, 28TBs of data (this is our largest > ceph), I can delete hundreds of snapshots, and not a single gigabyte > becomes available after doing that. > > In our other 5, smaller Ceph clusters, we can see hundreds of gigabytes > becoming available again after doing massive deletions of snapshots. > > The Luminous gui also never shows "snaptrimming" occurring in the EC > pool. While the other 5 Luminous clusters, their GUI will show > snaptrimming occurring for the EC pool. Within minutes we can see the > additional space becoming available. > > This isn't an issue of the trimming queue behind schedule. The system > shows there is no trimming scheduled in the queue, ever. > > However, when using ceph du on particular virtual machines, we can see > that snapshots we delete are indeed no longer listed in ceph du's output. > > So, they seem to be deleting. But the space is not being reclaimed. > > All clusters are same hardware. Some have more disks and servers than > others. The only major difference is that this particular Ceph with this > problem, it had the noscrub and nodeep-scrub flags set for many weeks. > > > Karun Josy > > On Mon, Jan 29, 2018 at 6:27 PM, David Turner > wrote: > >> I don't know why you keep asking the same question about snap trimming. >> You haven't shown any evidence that your cluster is behind on that. Have >> you looked into fstrim inside of your VMs? >> >> On Mon, Jan 29, 2018, 4:30 AM Karun Josy wrote: >> >>> fast-diff map is not enabled for RBD images. >>> Can it be a reason for Trimming not happening ? >>> >>> Karun Josy >>> >>> On Sat, Jan 27, 2018 at 10:19 PM, Karun Josy >>> wrote: >>> >>>> Hi David, >>>> >>>> Thank you for your reply! I really appreciate it. >>>> >>>> The images are in pool id 55. It is an erasure coded pool. >>>> >>>> --- >>>> $ echo $(( $(ceph pg 55.58 query | grep snap_trimq | cut -d[ -f2 | cut >>>> -d] -f1 | tr ',' '\n' | wc -l) - 1 )) >>>> 0 >>>> $ echo $(( $(ceph pg 55.a query | grep snap_trimq | cut -d[ -f2 | cut >>>> -d] -f1 | tr ',' '\n' | wc -l) - 1 )) >>>> 0 >>>> $ echo $(( $(ceph pg 55.65 query | grep snap_trimq | cut -d[ -f2 | cut >>>> -d] -f1 | tr ',' '\n' | wc -l) - 1 )) >>>> 0 >>>> -- >>>> >>>> Current snap_trim_sleep value is default. >>>> "osd_snap_trim_sleep": "0.00". I assume it means there is no delay. >>>> (Can't find any documentation related to it) >>>> Will changing its value initiate snaptrimming, like >>>> ceph tell osd.* injectargs '--osd_snap_trim_sleep 0.05' >>>> >>>> Also, we are using an rbd user with the below profile. It is used while >>>> deleting snapshots >>>> --- >>>> caps: [mon] profile rbd >>>> caps: [osd] profile rbd pool=ecpool, profile rbd pool=vm, >>>> profile rbd-read-only pool=templates >>>> --- >>>> >>>> Can it be a reason ? 
>>>> >>>> Also, can you let me know which all logs to check while deleting >>>> snapshots to see if it is snaptrimming ? >>>> I am sorry I feel like pestering you too much. >>>> But in mailing lists, I can see you have dealt with similar issues with >>>> Snapshots >>>> So I think you can help me figure this mess out. >>>> >>>> >>>> Karun Josy >>>> >>>> On Sat, Jan 27, 2018 at 7:15 PM, David Turner >>>> wrote: >>>> >>>>> Prove* a positive >>>>> >>>>> On Sat,
Re: [ceph-users] Snapshot trimming
Thank you for your response. We don't think there is an issue with the cluster being behind snap trimming. We just don't think snaptrim is occurring at all. We have 6 individual ceph clusters. When we delete old snapshots for clients, we can see space being made available. In this particular one however, with 300 virtual machines, 28TBs of data (this is our largest ceph), I can delete hundreds of snapshots, and not a single gigabyte becomes available after doing that. In our other 5, smaller Ceph clusters, we can see hundreds of gigabytes becoming available again after doing massive deletions of snapshots. The Luminous gui also never shows "snaptrimming" occurring in the EC pool. While the other 5 Luminous clusters, their GUI will show snaptrimming occurring for the EC pool. Within minutes we can see the additional space becoming available. This isn't an issue of the trimming queue behind schedule. The system shows there is no trimming scheduled in the queue, ever. However, when using ceph du on particular virtual machines, we can see that snapshots we delete are indeed no longer listed in ceph du's output. So, they seem to be deleting. But the space is not being reclaimed. All clusters are same hardware. Some have more disks and servers than others. The only major difference is that this particular Ceph with this problem, it had the noscrub and nodeep-scrub flags set for many weeks. Karun Josy On Mon, Jan 29, 2018 at 6:27 PM, David Turner wrote: > I don't know why you keep asking the same question about snap trimming. > You haven't shown any evidence that your cluster is behind on that. Have > you looked into fstrim inside of your VMs? > > On Mon, Jan 29, 2018, 4:30 AM Karun Josy wrote: > >> fast-diff map is not enabled for RBD images. >> Can it be a reason for Trimming not happening ? >> >> Karun Josy >> >> On Sat, Jan 27, 2018 at 10:19 PM, Karun Josy >> wrote: >> >>> Hi David, >>> >>> Thank you for your reply! I really appreciate it. >>> >>> The images are in pool id 55. It is an erasure coded pool. >>> >>> --- >>> $ echo $(( $(ceph pg 55.58 query | grep snap_trimq | cut -d[ -f2 | cut >>> -d] -f1 | tr ',' '\n' | wc -l) - 1 )) >>> 0 >>> $ echo $(( $(ceph pg 55.a query | grep snap_trimq | cut -d[ -f2 | cut >>> -d] -f1 | tr ',' '\n' | wc -l) - 1 )) >>> 0 >>> $ echo $(( $(ceph pg 55.65 query | grep snap_trimq | cut -d[ -f2 | cut >>> -d] -f1 | tr ',' '\n' | wc -l) - 1 )) >>> 0 >>> -- >>> >>> Current snap_trim_sleep value is default. >>> "osd_snap_trim_sleep": "0.00". I assume it means there is no delay. >>> (Can't find any documentation related to it) >>> Will changing its value initiate snaptrimming, like >>> ceph tell osd.* injectargs '--osd_snap_trim_sleep 0.05' >>> >>> Also, we are using an rbd user with the below profile. It is used while >>> deleting snapshots >>> --- >>> caps: [mon] profile rbd >>> caps: [osd] profile rbd pool=ecpool, profile rbd pool=vm, >>> profile rbd-read-only pool=templates >>> --- >>> >>> Can it be a reason ? >>> >>> Also, can you let me know which all logs to check while deleting >>> snapshots to see if it is snaptrimming ? >>> I am sorry I feel like pestering you too much. >>> But in mailing lists, I can see you have dealt with similar issues with >>> Snapshots >>> So I think you can help me figure this mess out. 
>>> >>> >>> Karun Josy >>> >>> On Sat, Jan 27, 2018 at 7:15 PM, David Turner >>> wrote: >>> >>>> Prove* a positive >>>> >>>> On Sat, Jan 27, 2018, 8:45 AM David Turner >>>> wrote: >>>> >>>>> Unless you have things in your snap_trimq, your problem isn't snap >>>>> trimming. That is currently how you can check snap trimming and you say >>>>> you're caught up. >>>>> >>>>> Are you certain that you are querying the correct pool for the images >>>>> you are snapshotting. You showed that you tested 4 different pools. You >>>>> should only need to check the pool with the images you are dealing with. >>>>> >>>>> You can inversely price a positive by changing your snap_trim settings >>>>> to not do any cleanup and see if the appropriate PGs have anything in >>>>>
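On the fstrim point raised above: space freed inside a guest filesystem is only returned to the RBD image if discard/TRIM reaches it, which requires the virtual disk to be attached with discard support enabled. A quick check from inside a VM:
--
# trim all mounted filesystems that support discard, with per-fs output
fstrim -av
--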
Re: [ceph-users] Snapshot trimming
fast-diff map is not enabled for RBD images. Can it be a reason for Trimming not happening ? Karun Josy On Sat, Jan 27, 2018 at 10:19 PM, Karun Josy wrote: > Hi David, > > Thank you for your reply! I really appreciate it. > > The images are in pool id 55. It is an erasure coded pool. > > --- > $ echo $(( $(ceph pg 55.58 query | grep snap_trimq | cut -d[ -f2 | cut > -d] -f1 | tr ',' '\n' | wc -l) - 1 )) > 0 > $ echo $(( $(ceph pg 55.a query | grep snap_trimq | cut -d[ -f2 | cut -d] > -f1 | tr ',' '\n' | wc -l) - 1 )) > 0 > $ echo $(( $(ceph pg 55.65 query | grep snap_trimq | cut -d[ -f2 | cut > -d] -f1 | tr ',' '\n' | wc -l) - 1 )) > 0 > -- > > Current snap_trim_sleep value is default. > "osd_snap_trim_sleep": "0.00". I assume it means there is no delay. > (Can't find any documentation related to it) > Will changing its value initiate snaptrimming, like > ceph tell osd.* injectargs '--osd_snap_trim_sleep 0.05' > > Also, we are using an rbd user with the below profile. It is used while > deleting snapshots > --- > caps: [mon] profile rbd > caps: [osd] profile rbd pool=ecpool, profile rbd pool=vm, profile > rbd-read-only pool=templates > --- > > Can it be a reason ? > > Also, can you let me know which all logs to check while deleting snapshots > to see if it is snaptrimming ? > I am sorry I feel like pestering you too much. > But in mailing lists, I can see you have dealt with similar issues with > Snapshots > So I think you can help me figure this mess out. > > > Karun Josy > > On Sat, Jan 27, 2018 at 7:15 PM, David Turner > wrote: > >> Prove* a positive >> >> On Sat, Jan 27, 2018, 8:45 AM David Turner wrote: >> >>> Unless you have things in your snap_trimq, your problem isn't snap >>> trimming. That is currently how you can check snap trimming and you say >>> you're caught up. >>> >>> Are you certain that you are querying the correct pool for the images >>> you are snapshotting. You showed that you tested 4 different pools. You >>> should only need to check the pool with the images you are dealing with. >>> >>> You can inversely price a positive by changing your snap_trim settings >>> to not do any cleanup and see if the appropriate PGs have anything in their >>> q. >>> >>> On Sat, Jan 27, 2018, 12:06 AM Karun Josy wrote: >>> >>>> Is scrubbing and deep scrubbing necessary for Snaptrim operation to >>>> happen ? >>>> >>>> Karun Josy >>>> >>>> On Fri, Jan 26, 2018 at 9:29 PM, Karun Josy >>>> wrote: >>>> >>>>> Thank you for your quick response! >>>>> >>>>> I used the command to fetch the snap_trimq from many pgs, however it >>>>> seems they don't have any in queue ? 
>>>>> >>>>> For eg : >>>>> >>>>> $ echo $(( $(ceph pg 55.4a query | grep snap_trimq | cut -d[ -f2 | >>>>> cut -d] -f1 | tr ',' '\n' | wc -l) - 1 )) >>>>> 0 >>>>> $ echo $(( $(ceph pg 55.5a query | grep snap_trimq | cut -d[ -f2 | >>>>> cut -d] -f1 | tr ',' '\n' | wc -l) - 1 )) >>>>> 0 >>>>> $ echo $(( $(ceph pg 55.88 query | grep snap_trimq | cut -d[ -f2 | >>>>> cut -d] -f1 | tr ',' '\n' | wc -l) - 1 )) >>>>> 0 >>>>> $ echo $(( $(ceph pg 55.55 query | grep snap_trimq | cut -d[ -f2 | >>>>> cut -d] -f1 | tr ',' '\n' | wc -l) - 1 )) >>>>> 0 >>>>> $ echo $(( $(ceph pg 54.a query | grep snap_trimq | cut -d[ -f2 | cut >>>>> -d] -f1 | tr ',' '\n' | wc -l) - 1 )) >>>>> 0 >>>>> $ echo $(( $(ceph pg 34.1d query | grep snap_trimq | cut -d[ -f2 | >>>>> cut -d] -f1 | tr ',' '\n' | wc -l) - 1 )) >>>>> 0 >>>>> $ echo $(( $(ceph pg 1.3f query | grep snap_trimq | cut -d[ -f2 | cut >>>>> -d] -f1 | tr ',' '\n' | wc -l) - 1 )) >>>>> 0 >>>>> = >>>>> >>>>> >>>>> While going through the PG query, I find that these PGs have no value >>>>> in purged_snaps section too. >>>>> For eg : >>>>>
Re: [ceph-users] POOL_NEARFULL
In the Luminous version, we have to use the osd set-*-ratio commands: -- ceph osd set-backfillfull-ratio .89 ceph osd set-nearfull-ratio .84 ceph osd set-full-ratio .96 -- Karun Josy On Thu, Dec 21, 2017 at 4:29 PM, Konstantin Shalygin wrote: > Update your ceph.conf file > > This also does not help. I created a ticket: http://tracker.ceph.com/issues/22520 > > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
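The effective values can be verified afterwards from the OSD map; in Luminous they are printed near the top of the dump:
--
ceph osd dump | grep -E 'full_ratio|backfillfull_ratio|nearfull_ratio'
--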
Re: [ceph-users] Limit deep scrub
Hi, I used these settings and there are no more slow requests in the cluster. - ceph tell osd.* injectargs '--osd_scrub_sleep 0.1' ceph tell osd.* injectargs '--osd_scrub_load_threshold 0.3' ceph tell osd.* injectargs '--osd_scrub_chunk_max 6' -- Yes, scrubbing is slower now, but there has been no OSD flapping and no slow requests! Thanks for all your help! Karun Josy On Sun, Jan 28, 2018 at 9:25 PM, David Turner wrote: > Use a get with the second syntax to see the currently running config. > > On Sun, Jan 28, 2018, 3:41 AM Karun Josy wrote: > >> Hello, >> >> Sorry for bringing this up again. >> >> What is the proper way to adjust the scrub settings? >> Can I use injectargs? >> --- >> ceph tell osd.* injectargs '--osd_scrub_sleep .1' >> --- >> >> Or do I have to use set manually on each OSD daemon? >> --- >> ceph daemon osd.21 set osd_scrub_sleep .1 >> >> >> While using both, it shows (not observed, change may require restart). >> So is it not set? >> >> >> Karun Josy >> >> On Mon, Jan 15, 2018 at 7:16 AM, shadow_lin wrote: >> >>> hi, >>> you can try adjusting osd_scrub_chunk_min, osd_scrub_chunk_max and >>> osd_scrub_sleep. >>> >>> >>> osd scrub sleep >>> >>> Description: Time to sleep before scrubbing next group of chunks. >>> Increasing this value will slow down whole scrub operation while client >>> operations will be less impacted. >>> Type: Float >>> Default: 0 >>> >>> osd scrub chunk min >>> >>> Description: The minimal number of object store chunks to scrub during >>> single operation. Ceph blocks writes to single chunk during scrub. >>> Type: 32-bit Integer >>> Default: 5 >>> >>> >>> 2018-01-15 >>> -- >>> lin.yunfan >>> -- >>> >>> *From:* Karun Josy >>> *Sent:* 2018-01-15 06:53 >>> *Subject:* [ceph-users] Limit deep scrub >>> *To:* "ceph-users" >>> *Cc:* >>> >>> Hello, >>> >>> It appears that the cluster is having many slow requests while it is >>> scrubbing and deep scrubbing. Also sometimes we can see osds flapping. >>> >>> So we have put the flags : noscrub,nodeep-scrub >>> >>> When we unset it, 5 PGs start to scrub. >>> Is there a way to limit it to one at a time?
>>> >>> # ceph daemon osd.35 config show | grep scrub >>> "mds_max_scrub_ops_in_progress": "5", >>> "mon_scrub_inject_crc_mismatch": "0.00", >>> "mon_scrub_inject_missing_keys": "0.00", >>> "mon_scrub_interval": "86400", >>> "mon_scrub_max_keys": "100", >>> "mon_scrub_timeout": "300", >>> "mon_warn_not_deep_scrubbed": "0", >>> "mon_warn_not_scrubbed": "0", >>> "osd_debug_scrub_chance_rewrite_digest": "0", >>> "osd_deep_scrub_interval": "604800.00", >>> "osd_deep_scrub_randomize_ratio": "0.15", >>> "osd_deep_scrub_stride": "524288", >>> "osd_deep_scrub_update_digest_min_age": "7200", >>> "osd_max_scrubs": "1", >>> "osd_op_queue_mclock_scrub_lim": "0.001000", >>> "osd_op_queue_mclock_scrub_res": "0.00", >>> "osd_op_queue_mclock_scrub_wgt": "1.00", >>> "osd_requested_scrub_priority": "120", >>> "osd_scrub_auto_repair": "false", >>> "osd_scrub_auto_repair_num_errors": "5", >>> "osd_scrub_backoff_ratio": "0.66", >>> "osd_scrub_begin_hour": "0", >>> "osd_scrub_chunk_max": "25", >>> "osd_scrub_chunk_min": "5", >>> "osd_scrub_cost": "52428800", >>> "osd_scrub_during_recovery": "false", >>> "osd_scrub_end_hour": "24", >>> "osd_scrub_interval_randomize_ratio": "0.50", >>> "osd_scrub_invalid_stats": "true", >>> "osd_scrub_load_threshold": "0.50", >>> "osd_scrub_max_interval": "604800.00", >>> "osd_scrub_min_interval": "86400.00", >>> "osd_scrub_priority": "5", >>> "osd_scrub_sleep": "0.00", >>> >>> >>> Karun >>> >>> >> ___ >> ceph-users mailing list >> ceph-users@lists.ceph.com >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com >> > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
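Note that injectargs changes are runtime-only; to make the scrub settings from this thread survive OSD restarts, the equivalent lines can go into ceph.conf on each OSD host (the values shown are the ones used above):
--
[osd]
osd scrub sleep = 0.1
osd scrub load threshold = 0.3
osd scrub chunk max = 6
--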
Re: [ceph-users] lease_timeout
Still the issue is continuing. Any one else has noticed it ? When this happens, the Ceph Dashboard GUI gets stuck and we have to restart the manager daemon to make it work again Karun Josy On Wed, Jan 17, 2018 at 6:16 AM, Karun Josy wrote: > Hello, > > In one of our cluster set up, there is frequent monitor elections > happening. > In the logs of one of the monitor, there is "lease_timeout" message before > that happens. Can anyone help me to figure it out ? > (When this happens, the Ceph Dashboard GUI gets stuck and we have to > restart the manager daemon to make it work again) > > Ceph version : Luminous 12.2.2 > > Log : > = > > 2018-01-16 16:33:08.001937 7f0cfbaad700 4 rocksdb: > [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_ > 64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/ > centos7/MACHINE_SIZE/huge/release/12.2.2/rpm/el7/BUILD/ > ceph-12.2.2/src/rocksdb/db/compaction_job.cc:1173] [default] [JOB 885] > Compacted 1@0 + 1@1 files to L1 => 20046585 bytes > 2018-01-16 16:33:08.015891 7f0cfbaad700 4 rocksdb: (Original Log Time > 2018/01/16-16:33:08.015826) [/home/jenkins-build/build/ > workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/ > AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/ > release/12.2.2/rpm/el7/BUILD/ceph-12.2.2/src/rocksdb/db/compaction_job.cc:621] > [default] compacted to: base level 1 max bytes base 268435456 files[0 1 0 0 > 0 0 0] max score 0.07, MB/sec: 32.7 rd, 30.9 wr, level 1, files in(1, 1) > out(1) MB in(1.3, 18.9) out(19.1), read-write-amplify(31.0) > write-amplify(15.1) OK, records in: 4305, records dropped: 515 > > 2018-01-16 16:33:08.015897 7f0cfbaad700 4 rocksdb: (Original Log Time > 2018/01/16-16:33:08.015840) EVENT_LOG_v1 {"time_micros": 1516149188015833, > "job": 885, "event": "compaction_finished", "compaction_time_micros": > 647876, "output_level": 1, "num_output_files": 1, "total_output_size": > 20046585, "num_input_records": 4305, "num_output_records": 3790, > "num_subcompactions": 1, "num_single_delete_mismatches": 0, > "num_single_delete_fallthrough": 0, "lsm_state": [0, 1, 0, 0, 0, 0, 0]} > 2018-01-16 16:33:08.016131 7f0cfbaad700 4 rocksdb: EVENT_LOG_v1 > {"time_micros": 1516149188016128, "job": 885, "event": > "table_file_deletion", "file_number": 2419} > 2018-01-16 16:33:08.018147 7f0cfbaad700 4 rocksdb: EVENT_LOG_v1 > {"time_micros": 1516149188018146, "job": 885, "event": > "table_file_deletion", "file_number": 2417} > 2018-01-16 16:33:11.051010 7f0d042be700 0 > mon.ceph-mon3@2(peon).data_health(436) > update_stats avail 84% total 20918 MB, used 2179 MB, avail 17653 MB > 2018-01-16 16:33:17.269954 7f0d042be700 1 mon.ceph-mon3@2(peon).paxos(paxos > active c 84337..84838) lease_timeout -- calling new election > 2018-01-16 16:33:17.291096 7f0d01ab9700 0 log_channel(cluster) log [INF] > : mon.ceph-sgp-mon3 calling new monitor election > 2018-01-16 16:33:17.291182 7f0d01ab9700 1 > mon.ceph-mon3@2(electing).elector(436) > init, last seen epoch 436 > 2018-01-16 16:33:20.834853 7f0d01ab9700 1 mon.ceph-mon3@2(peon).log > v23189 check_sub sending message to client.65755 10.255.0.95:0/2603001850 > with 8 entries (version 23189) > > > > Karun > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Limit deep scrub
Hello, Sorry for bringing this up again. What is the proper way to adjust the scrub settings? Can I use injectargs? --- ceph tell osd.* injectargs '--osd_scrub_sleep .1' --- Or do I have to use set manually on each OSD daemon? --- ceph daemon osd.21 set osd_scrub_sleep .1 While using both, it shows (not observed, change may require restart). So is it not set? Karun Josy On Mon, Jan 15, 2018 at 7:16 AM, shadow_lin wrote: > hi, > you can try adjusting osd_scrub_chunk_min, osd_scrub_chunk_max and > osd_scrub_sleep. > > > osd scrub sleep > > Description: Time to sleep before scrubbing next group of chunks. > Increasing this value will slow down whole scrub operation while client > operations will be less impacted. > Type: Float > Default: 0 > > osd scrub chunk min > > Description: The minimal number of object store chunks to scrub during > single operation. Ceph blocks writes to single chunk during scrub. > Type: 32-bit Integer > Default: 5 > > > 2018-01-15 > -- > lin.yunfan > -- > > *From:* Karun Josy > *Sent:* 2018-01-15 06:53 > *Subject:* [ceph-users] Limit deep scrub > *To:* "ceph-users" > *Cc:* > > Hello, > > It appears that the cluster is having many slow requests while it is > scrubbing and deep scrubbing. Also sometimes we can see osds flapping. > > So we have put the flags : noscrub,nodeep-scrub > > When we unset it, 5 PGs start to scrub. > Is there a way to limit it to one at a time? > > # ceph daemon osd.35 config show | grep scrub > "mds_max_scrub_ops_in_progress": "5", > "mon_scrub_inject_crc_mismatch": "0.00", > "mon_scrub_inject_missing_keys": "0.00", > "mon_scrub_interval": "86400", > "mon_scrub_max_keys": "100", > "mon_scrub_timeout": "300", > "mon_warn_not_deep_scrubbed": "0", > "mon_warn_not_scrubbed": "0", > "osd_debug_scrub_chance_rewrite_digest": "0", > "osd_deep_scrub_interval": "604800.00", > "osd_deep_scrub_randomize_ratio": "0.15", > "osd_deep_scrub_stride": "524288", > "osd_deep_scrub_update_digest_min_age": "7200", > "osd_max_scrubs": "1", > "osd_op_queue_mclock_scrub_lim": "0.001000", > "osd_op_queue_mclock_scrub_res": "0.00", > "osd_op_queue_mclock_scrub_wgt": "1.00", > "osd_requested_scrub_priority": "120", > "osd_scrub_auto_repair": "false", > "osd_scrub_auto_repair_num_errors": "5", > "osd_scrub_backoff_ratio": "0.66", > "osd_scrub_begin_hour": "0", > "osd_scrub_chunk_max": "25", > "osd_scrub_chunk_min": "5", > "osd_scrub_cost": "52428800", > "osd_scrub_during_recovery": "false", > "osd_scrub_end_hour": "24", > "osd_scrub_interval_randomize_ratio": "0.50", > "osd_scrub_invalid_stats": "true", > "osd_scrub_load_threshold": "0.50", > "osd_scrub_max_interval": "604800.00", > "osd_scrub_min_interval": "86400.00", > "osd_scrub_priority": "5", > "osd_scrub_sleep": "0.00", > > > Karun > > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Snapshot trimming
Hi David, Thank you for your reply! I really appreciate it. The images are in pool id 55. It is an erasure coded pool. --- $ echo $(( $(ceph pg 55.58 query | grep snap_trimq | cut -d[ -f2 | cut -d] -f1 | tr ',' '\n' | wc -l) - 1 )) 0 $ echo $(( $(ceph pg 55.a query | grep snap_trimq | cut -d[ -f2 | cut -d] -f1 | tr ',' '\n' | wc -l) - 1 )) 0 $ echo $(( $(ceph pg 55.65 query | grep snap_trimq | cut -d[ -f2 | cut -d] -f1 | tr ',' '\n' | wc -l) - 1 )) 0 -- Current snap_trim_sleep value is default. "osd_snap_trim_sleep": "0.00". I assume it means there is no delay. (Can't find any documentation related to it) Will changing its value initiate snaptrimming, like ceph tell osd.* injectargs '--osd_snap_trim_sleep 0.05' Also, we are using an rbd user with the below profile. It is used while deleting snapshots --- caps: [mon] profile rbd caps: [osd] profile rbd pool=ecpool, profile rbd pool=vm, profile rbd-read-only pool=templates --- Can it be a reason ? Also, can you let me know which all logs to check while deleting snapshots to see if it is snaptrimming ? I am sorry I feel like pestering you too much. But in mailing lists, I can see you have dealt with similar issues with Snapshots So I think you can help me figure this mess out. Karun Josy On Sat, Jan 27, 2018 at 7:15 PM, David Turner wrote: > Prove* a positive > > On Sat, Jan 27, 2018, 8:45 AM David Turner wrote: > >> Unless you have things in your snap_trimq, your problem isn't snap >> trimming. That is currently how you can check snap trimming and you say >> you're caught up. >> >> Are you certain that you are querying the correct pool for the images you >> are snapshotting. You showed that you tested 4 different pools. You should >> only need to check the pool with the images you are dealing with. >> >> You can inversely price a positive by changing your snap_trim settings to >> not do any cleanup and see if the appropriate PGs have anything in their q. >> >> On Sat, Jan 27, 2018, 12:06 AM Karun Josy wrote: >> >>> Is scrubbing and deep scrubbing necessary for Snaptrim operation to >>> happen ? >>> >>> Karun Josy >>> >>> On Fri, Jan 26, 2018 at 9:29 PM, Karun Josy >>> wrote: >>> >>>> Thank you for your quick response! >>>> >>>> I used the command to fetch the snap_trimq from many pgs, however it >>>> seems they don't have any in queue ? >>>> >>>> For eg : >>>> >>>> $ echo $(( $(ceph pg 55.4a query | grep snap_trimq | cut -d[ -f2 | cut >>>> -d] -f1 | tr ',' '\n' | wc -l) - 1 )) >>>> 0 >>>> $ echo $(( $(ceph pg 55.5a query | grep snap_trimq | cut -d[ -f2 | cut >>>> -d] -f1 | tr ',' '\n' | wc -l) - 1 )) >>>> 0 >>>> $ echo $(( $(ceph pg 55.88 query | grep snap_trimq | cut -d[ -f2 | cut >>>> -d] -f1 | tr ',' '\n' | wc -l) - 1 )) >>>> 0 >>>> $ echo $(( $(ceph pg 55.55 query | grep snap_trimq | cut -d[ -f2 | cut >>>> -d] -f1 | tr ',' '\n' | wc -l) - 1 )) >>>> 0 >>>> $ echo $(( $(ceph pg 54.a query | grep snap_trimq | cut -d[ -f2 | cut >>>> -d] -f1 | tr ',' '\n' | wc -l) - 1 )) >>>> 0 >>>> $ echo $(( $(ceph pg 34.1d query | grep snap_trimq | cut -d[ -f2 | cut >>>> -d] -f1 | tr ',' '\n' | wc -l) - 1 )) >>>> 0 >>>> $ echo $(( $(ceph pg 1.3f query | grep snap_trimq | cut -d[ -f2 | cut >>>> -d] -f1 | tr ',' '\n' | wc -l) - 1 )) >>>> 0 >>>> = >>>> >>>> >>>> While going through the PG query, I find that these PGs have no value >>>> in purged_snaps section too. 
>>>> For eg : >>>> ceph pg 55.80 query >>>> -- >>>> --- >>>> --- >>>> { >>>> "peer": "83(3)", >>>> "pgid": "55.80s3", >>>> "last_update": "43360'15121927", >>>> "last_complete": "43345'15073146", >>>> "log_tail": "43335'15064480", >>>> "last_user_version": 15066124, >>>> "last_backfill": "MAX", >>>> "last_backfill_bitwise": 1,
Re: [ceph-users] Snapshot trimming
Is scrubbing and deep scrubbing necessary for Snaptrim operation to happen ? Karun Josy On Fri, Jan 26, 2018 at 9:29 PM, Karun Josy wrote: > Thank you for your quick response! > > I used the command to fetch the snap_trimq from many pgs, however it seems > they don't have any in queue ? > > For eg : > > $ echo $(( $(ceph pg 55.4a query | grep snap_trimq | cut -d[ -f2 | cut > -d] -f1 | tr ',' '\n' | wc -l) - 1 )) > 0 > $ echo $(( $(ceph pg 55.5a query | grep snap_trimq | cut -d[ -f2 | cut > -d] -f1 | tr ',' '\n' | wc -l) - 1 )) > 0 > $ echo $(( $(ceph pg 55.88 query | grep snap_trimq | cut -d[ -f2 | cut > -d] -f1 | tr ',' '\n' | wc -l) - 1 )) > 0 > $ echo $(( $(ceph pg 55.55 query | grep snap_trimq | cut -d[ -f2 | cut > -d] -f1 | tr ',' '\n' | wc -l) - 1 )) > 0 > $ echo $(( $(ceph pg 54.a query | grep snap_trimq | cut -d[ -f2 | cut -d] > -f1 | tr ',' '\n' | wc -l) - 1 )) > 0 > $ echo $(( $(ceph pg 34.1d query | grep snap_trimq | cut -d[ -f2 | cut > -d] -f1 | tr ',' '\n' | wc -l) - 1 )) > 0 > $ echo $(( $(ceph pg 1.3f query | grep snap_trimq | cut -d[ -f2 | cut -d] > -f1 | tr ',' '\n' | wc -l) - 1 )) > 0 > = > > > While going through the PG query, I find that these PGs have no value in > purged_snaps section too. > For eg : > ceph pg 55.80 query > -- > --- > --- > { > "peer": "83(3)", > "pgid": "55.80s3", > "last_update": "43360'15121927", > "last_complete": "43345'15073146", > "log_tail": "43335'15064480", > "last_user_version": 15066124, > "last_backfill": "MAX", > "last_backfill_bitwise": 1, > "purged_snaps": [], > "history": { > "epoch_created": 5950, > "epoch_pool_created": 5950, > "last_epoch_started": 43339, > "last_interval_started": 43338, > "last_epoch_clean": 43340, > "last_interval_clean": 43338, > "last_epoch_split": 0, > "last_epoch_marked_full": 42032, > "same_up_since": 43338, > "same_interval_since": 43338, > "same_primary_since": 43276, > "last_scrub": "35299'13072533", > "last_scrub_stamp": "2018-01-18 14:01:19.557972", > "last_deep_scrub": "31372'12176860", > "last_deep_scrub_stamp": "2018-01-15 12:21:17.025305", > "last_clean_scrub_stamp": "2018-01-18 14:01:19.557972" > }, > > Not sure if it is related. > > The cluster is not open to any new clients. However we see a steady growth > of space usage every day. > And worst case scenario, it might grow faster than we can add more space, > which will be dangerous. > > Any help is really appreciated. > > Karun Josy > > On Fri, Jan 26, 2018 at 8:23 PM, David Turner > wrote: > >> "snap_trimq": "[]", >> >> That is exactly what you're looking for to see how many objects a PG >> still had that need to be cleaned up. I think something like this should >> give you the number of objects in the snap_trimq for a PG. >> >> echo $(( $(ceph pg $pg query | grep snap_trimq | cut -d[ -f2 | cut -d] >> -f1 | tr ',' '\n' | wc -l) - 1 )) >> >> Note, I'm not at a computer and topping this from my phone so it's not >> pretty and I know of a few ways to do that better, but that should work all >> the same. >> >> For your needs a visual inspection of several PGs should be sufficient to >> see if there is anything in the snap_trimq to begin with. >> >> On Fri, Jan 26, 2018, 9:18 AM Karun Josy wrote: >> >>> Hi David, >>> >>> Thank you for the response. To be honest, I am afraid it is going to be >>> a issue in our cluster. >>> It seems snaptrim has not been going on for sometime now , maybe because >>> we were expanding the cluster adding nodes for the past few weeks. >>> >>> I would be really glad if you can
[ceph-users] Snapshot trimming
Hi, We have set the noscrub and nodeep-scrub flags on a Ceph cluster. When we are deleting snapshots, we are not seeing any change in used space. I understand that Ceph OSDs delete data asynchronously, so deleting a snapshot doesn’t free up the disk space immediately. But we are not seeing any change for some time. What can be the possible reason? Any suggestions would be really helpful, as the cluster size seems to be growing each day even though snapshots are deleted. Karun ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Full Ratio
Thank you! Ceph version is 12.2. Also, can you let me know the format to set osd_backfill_full_ratio? Is it " ceph osd set -backfillfull-ratio .89 "? Karun Josy On Thu, Jan 25, 2018 at 1:29 AM, Jean-Charles Lopez wrote: > Hi, > > if you are using an older Ceph version note that > mon_osd_nearfull_ratio and mon_osd_full_ratio must be set in the config > file on the MON hosts first and then the MONs restarted one after the > other. > > If using a recent version there are the commands ceph osd set-full-ratio and > ceph osd set-nearfull-ratio > > Regards > JC > > > On Jan 24, 2018, at 11:07, Karun Josy wrote: > > > > Hi, > > > > I am trying to increase the full ratio of OSDs in a cluster. > > While adding a new node, one of the new disks got backfilled to more than > 95% and the cluster froze. So I am trying to avoid it from happening again. > > > > > > I tried the pg set command but it is not working: > > $ ceph pg set_nearfull_ratio 0.88 > > Error ENOTSUP: this command is obsolete > > > > I had increased the full ratio in the OSDs using injectargs initially, but it > didn't work: when the disk reached 95% it showed OSD full status. > > > > $ ceph tell osd.* injectargs '--mon_osd_full_ratio 0.97' > > osd.0: mon_osd_full_ratio = '0.97' (not observed, change may require > restart) > > osd.1: mon_osd_full_ratio = '0.97' (not observed, change may require > restart) > > > > > > > > How can I set the full ratio to more than 95%? > > > > Karun > > ___ > > ceph-users mailing list > > ceph-users@lists.ceph.com > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
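For the record, the Luminous subcommands are single hyphenated words with no space after "set":
--
ceph osd set-nearfull-ratio 0.84
ceph osd set-backfillfull-ratio 0.89
ceph osd set-full-ratio 0.96
--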
[ceph-users] Full Ratio
Hi, I am trying to increase the full ratio of OSDs in a cluster. While adding a new node, one of the new disks got backfilled to more than 95% and the cluster froze. So I am trying to avoid it from happening again. I tried the pg set command but it is not working: $ ceph pg set_nearfull_ratio 0.88 Error ENOTSUP: this command is obsolete I had increased the full ratio in the OSDs using injectargs initially, but it didn't work: when the disk reached 95% it showed OSD full status. $ ceph tell osd.* injectargs '--mon_osd_full_ratio 0.97' osd.0: mon_osd_full_ratio = '0.97' (not observed, change may require restart) osd.1: mon_osd_full_ratio = '0.97' (not observed, change may require restart) How can I set the full ratio to more than 95%? Karun ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] PG inactive, peering
Hi, We added a new host to the cluster and it was rebalancing. One PG became "inactive, peering" for a very long time, which created a lot of slow requests and poor performance for the whole cluster. When I queried that PG, it showed this: "recovery_state": [ { "name": "Started/Primary/Peering/GetMissing", "enter_time": "2018-01-22 18:40:04.777654", "peer_missing_requested": [ { "osd": "77(7)", So I assumed it was stuck getting information from osd.77 and so I marked osd.77 down. The status of the PG changed to "active+undersized+degraded" and the PG became active again. Does anyone know why this happened? If I start osd.77 again, the PG goes back to the inactive, peering state. Is it because osd.77 is bad? Or will the same happen when the PG tries to peer again with another disk? Any help is really appreciated. Karun ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
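For reference, the inspection and workaround described above correspond to roughly the following commands; the PG id is a placeholder since it is not named in the message:
--
# see what the peering PG is blocked on
ceph pg <pgid> query | grep -A 10 recovery_state
# temporarily mark the suspect OSD down, as was done here
ceph osd down 77
# if the disk really is failing, take it out so data rebalances off it
ceph osd out 77
--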
[ceph-users] lease_timeout
Hello, In one of our cluster set up, there is frequent monitor elections happening. In the logs of one of the monitor, there is "lease_timeout" message before that happens. Can anyone help me to figure it out ? (When this happens, the Ceph Dashboard GUI gets stuck and we have to restart the manager daemon to make it work again) Ceph version : Luminous 12.2.2 Log : = 2018-01-16 16:33:08.001937 7f0cfbaad700 4 rocksdb: [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.2/rpm/el7/BUILD/ceph-12.2.2/src/rocksdb/db/compaction_job.cc:1173] [default] [JOB 885] Compacted 1@0 + 1@1 files to L1 => 20046585 bytes 2018-01-16 16:33:08.015891 7f0cfbaad700 4 rocksdb: (Original Log Time 2018/01/16-16:33:08.015826) [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.2/rpm/el7/BUILD/ceph-12.2.2/src/rocksdb/db/compaction_job.cc:621] [default] compacted to: base level 1 max bytes base 268435456 files[0 1 0 0 0 0 0] max score 0.07, MB/sec: 32.7 rd, 30.9 wr, level 1, files in(1, 1) out(1) MB in(1.3, 18.9) out(19.1), read-write-amplify(31.0) write-amplify(15.1) OK, records in: 4305, records dropped: 515 2018-01-16 16:33:08.015897 7f0cfbaad700 4 rocksdb: (Original Log Time 2018/01/16-16:33:08.015840) EVENT_LOG_v1 {"time_micros": 1516149188015833, "job": 885, "event": "compaction_finished", "compaction_time_micros": 647876, "output_level": 1, "num_output_files": 1, "total_output_size": 20046585, "num_input_records": 4305, "num_output_records": 3790, "num_subcompactions": 1, "num_single_delete_mismatches": 0, "num_single_delete_fallthrough": 0, "lsm_state": [0, 1, 0, 0, 0, 0, 0]} 2018-01-16 16:33:08.016131 7f0cfbaad700 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1516149188016128, "job": 885, "event": "table_file_deletion", "file_number": 2419} 2018-01-16 16:33:08.018147 7f0cfbaad700 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1516149188018146, "job": 885, "event": "table_file_deletion", "file_number": 2417} 2018-01-16 16:33:11.051010 7f0d042be700 0 mon.ceph-mon3@2(peon).data_health(436) update_stats avail 84% total 20918 MB, used 2179 MB, avail 17653 MB 2018-01-16 16:33:17.269954 7f0d042be700 1 mon.ceph-mon3@2(peon).paxos(paxos active c 84337..84838) lease_timeout -- calling new election 2018-01-16 16:33:17.291096 7f0d01ab9700 0 log_channel(cluster) log [INF] : mon.ceph-sgp-mon3 calling new monitor election 2018-01-16 16:33:17.291182 7f0d01ab9700 1 mon.ceph-mon3@2(electing).elector(436) init, last seen epoch 436 2018-01-16 16:33:20.834853 7f0d01ab9700 1 mon.ceph-mon3@2(peon).log v23189 check_sub sending message to client.65755 10.255.0.95:0/2603001850 with 8 entries (version 23189) Karun ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] Limit deep scrub
Hello, It appears that cluster is having many slow requests while it is scrubbing and deep scrubbing. Also sometimes we can see osds flapping. So we have put the flags : noscrub,nodeep-scrub When we unset it, 5 PGs start to scrub. Is there a way to limit it to one at a time? # ceph daemon osd.35 config show | grep scrub "mds_max_scrub_ops_in_progress": "5", "mon_scrub_inject_crc_mismatch": "0.00", "mon_scrub_inject_missing_keys": "0.00", "mon_scrub_interval": "86400", "mon_scrub_max_keys": "100", "mon_scrub_timeout": "300", "mon_warn_not_deep_scrubbed": "0", "mon_warn_not_scrubbed": "0", "osd_debug_scrub_chance_rewrite_digest": "0", "osd_deep_scrub_interval": "604800.00", "osd_deep_scrub_randomize_ratio": "0.15", "osd_deep_scrub_stride": "524288", "osd_deep_scrub_update_digest_min_age": "7200", "osd_max_scrubs": "1", "osd_op_queue_mclock_scrub_lim": "0.001000", "osd_op_queue_mclock_scrub_res": "0.00", "osd_op_queue_mclock_scrub_wgt": "1.00", "osd_requested_scrub_priority": "120", "osd_scrub_auto_repair": "false", "osd_scrub_auto_repair_num_errors": "5", "osd_scrub_backoff_ratio": "0.66", "osd_scrub_begin_hour": "0", "osd_scrub_chunk_max": "25", "osd_scrub_chunk_min": "5", "osd_scrub_cost": "52428800", "osd_scrub_during_recovery": "false", "osd_scrub_end_hour": "24", "osd_scrub_interval_randomize_ratio": "0.50", "osd_scrub_invalid_stats": "true", "osd_scrub_load_threshold": "0.50", "osd_scrub_max_interval": "604800.00", "osd_scrub_min_interval": "86400.00", "osd_scrub_priority": "5", "osd_scrub_sleep": "0.00", Karun ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] rbd: map failed
Hello, We have a user "testuser" with the below permissions: $ ceph auth get client.testuser exported keyring for client.testuser [client.testuser] key = == caps mon = "profile rbd" caps osd = "profile rbd pool=ecpool, profile rbd pool=cv, profile rbd-read-only pool=templates" But when we try to map an image in the pool 'templates' we get the below error: -- # rbd map templates/centos.7-4.x86-64.2017 --id testuser rbd: sysfs write failed In some cases useful info is found in syslog - try "dmesg | tail". rbd: map failed: (1) Operation not permitted Is it because that user has only read permission on the templates pool? Karun Josy ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
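If the intent is only to read from the templates pool, mapping the image read-only may work with those caps; this is a guess based on the rbd-read-only profile, untested here:
--
rbd map templates/centos.7-4.x86-64.2017 --id testuser --read-only
--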
Re: [ceph-users] How to evict a client in rbd
It happens randomly. Karun Josy On Wed, Jan 3, 2018 at 7:07 AM, Jason Dillaman wrote: > I tried to reproduce this for over an hour today using the specified > versions w/o any success. Is this something that you can repeat > on-demand or was this a one-time occurance? > > On Sat, Dec 23, 2017 at 3:48 PM, Karun Josy wrote: > > Hello, > > > > The image is not mapped. > > > > # ceph --version > > ceph version 12.2.1 luminous (stable) > > # uname -r > > 4.14.0-1.el7.elrepo.x86_64 > > > > > > Karun Josy > > > > On Sat, Dec 23, 2017 at 6:51 PM, Jason Dillaman > wrote: > >> > >> What Ceph and what kernel version are you using? Are you positive that > >> the image has been unmapped from 10.255.0.17? > >> > >> On Fri, Dec 22, 2017 at 7:14 PM, Karun Josy > wrote: > >> > Hello, > >> > > >> > I am unable to delete this abandoned image.Rbd info shows a watcher ip > >> > Image is not mapped > >> > Image has no snapshots > >> > > >> > > >> > rbd status cvm/image --id clientuser > >> > Watchers: > >> > watcher=10.255.0.17:0/3495340192 client.390908 > >> > cookie=18446462598732841114 > >> > > >> > How can I evict or black list a watcher client so that image can be > >> > deleted > >> > http://docs.ceph.com/docs/master/cephfs/eviction/ > >> > I see this is possible in Cephfs > >> > > >> > > >> > > >> > Karun > >> > > >> > ___ > >> > ceph-users mailing list > >> > ceph-users@lists.ceph.com > >> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > >> > > >> > >> > >> > >> -- > >> Jason > > > > > > > > -- > Jason > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
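For reference, a stale watcher can be blocked at the OSD level using the address printed by rbd status; the blacklist entry expires after one hour by default:
--
# blacklist the stale client, then verify and retry the image removal
ceph osd blacklist add 10.255.0.17:0/3495340192
ceph osd blacklist ls
--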
Re: [ceph-users] Increasing PG number
https://access.redhat.com/solutions/2457321 It says it is a very intensive process and can affect cluster performance. Our version is Luminous 12.2.2. We are using an erasure coding profile for a pool 'ecpool' with k=5 and m=3. The current PG number is 256 and it has about 20 TB of data. Should I increase it gradually, or set pg_num to 512 in one step? Karun Josy On Tue, Jan 2, 2018 at 9:26 PM, Hans van den Bogert wrote: > Please refer to standard documentation as much as possible, > > http://docs.ceph.com/docs/jewel/rados/operations/placement-groups/#set-the-number-of-placement-groups > > Sebastien Han's post is also incomplete, since you also need to change 'pgp_num' > as well. > > Regards, > > Hans > > On Jan 2, 2018, at 4:41 PM, Vladimir Prokofev wrote: > > Increased number of PGs in multiple pools in a production cluster on > 12.2.2 recently - zero issues. > CEPH claims that increasing pg_num and pgp_num are safe operations, which > are essential for its ability to scale, and this sounds pretty reasonable > to me. [1] > > > [1] https://www.sebastien-han.fr/blog/2013/03/12/ceph-change-pg-number-on-the-fly/ > > 2018-01-02 18:21 GMT+03:00 Karun Josy : > >> Hi, >> >> The initial PG count was not properly planned while setting up the cluster, >> so now there are fewer than 50 PGs per OSD. >> >> What are the best practices to increase the PG number of a pool? >> We have replicated pools as well as EC pools. >> >> Or is it better to create a new pool with a higher PG number? >> >> >> Karun >> >> ___ >> ceph-users mailing list >> ceph-users@lists.ceph.com >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com >> >> > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
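A gradual bump on the pool from this thread would look like the following; pgp_num has to follow pg_num at each step, and the intermediate value of 320 is only an illustrative stage between 256 and 512:
--
ceph osd pool set ecpool pg_num 320
ceph osd pool set ecpool pgp_num 320
# wait for the cluster to settle (ceph -s) before the next step
ceph osd pool set ecpool pg_num 512
ceph osd pool set ecpool pgp_num 512
--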
[ceph-users] Increasing PG number
Hi, The initial PG count was not properly planned while setting up the cluster, so now there are fewer than 50 PGs per OSD. What are the best practices to increase the PG number of a pool ? We have replicated pools as well as EC pools. Or is it better to create a new pool with a higher PG number? Karun ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] PG active+clean+remapped status
Hi, We added some more osds to the cluster and it was fixed. Karun Josy On Tue, Jan 2, 2018 at 6:21 AM, 한승진 wrote: > Are all odsd are same version? > I recently experienced similar situation. > > I upgraded all osds to exact same version and reset of pool configuration > like below > > ceph osd pool set min_size 5 > > I have 5+2 erasure code the important thing is not the number of min_size > but re-configuration I think. > I hope this help you. > > 2017. 12. 19. 오전 5:25에 "Karun Josy" 님이 작성: > > I think what happened is this : >> >> http://docs.ceph.com/docs/master/rados/operations/add-or-rm-osds/ >> >> >> Note >> >> >> Sometimes, typically in a “small” cluster with few hosts (for instance >> with a small testing cluster), the fact to take out the OSD can spawn a >> CRUSH corner case where some PGs remain stuck in the active+remapped >> state >> >> Its a small cluster with unequal number of osds and one of the OSD disk >> failed and I had taken it out. >> I have already purged it, so I cannot use the reweight option mentioned >> in that link. >> >> >> So any other workarounds ? >> Will adding more disks will clear it ? >> >> Karun Josy >> >> On Mon, Dec 18, 2017 at 9:06 AM, David Turner >> wrote: >> >>> Maybe try outing the disk that should have a copy of the PG, but >>> doesn't. Then mark it back in. It might check that it has everything >>> properly and pull a copy of the data it's missing. I dunno. >>> >>> On Sun, Dec 17, 2017, 10:00 PM Karun Josy wrote: >>> >>>> Tried restarting all osds. Still no luck. >>>> >>>> Will adding a new disk to any of the server forces a rebalance and fix >>>> it? >>>> >>>> Karun Josy >>>> >>>> On Sun, Dec 17, 2017 at 12:22 PM, Cary wrote: >>>> >>>>> Karun, >>>>> >>>>> Could you paste in the output from "ceph health detail"? Which OSD >>>>> was just added? >>>>> >>>>> Cary >>>>> -Dynamic >>>>> >>>>> On Sun, Dec 17, 2017 at 4:59 AM, Karun Josy >>>>> wrote: >>>>> > Any help would be appreciated! >>>>> > >>>>> > Karun Josy >>>>> > >>>>> > On Sat, Dec 16, 2017 at 11:04 PM, Karun Josy >>>>> wrote: >>>>> >> >>>>> >> Hi, >>>>> >> >>>>> >> Repair didnt fix the issue. >>>>> >> >>>>> >> In the pg dump details, I notice this None. Seems pg is missing >>>>> from one >>>>> >> of the OSD >>>>> >> >>>>> >> [0,2,NONE,4,12,10,5,1] >>>>> >> [0,2,1,4,12,10,5,1] >>>>> >> >>>>> >> There is no way Ceph corrects this automatically ? I have to edit/ >>>>> >> troubleshoot it manually ? >>>>> >> >>>>> >> Karun >>>>> >> >>>>> >> On Sat, Dec 16, 2017 at 10:44 PM, Cary >>>>> wrote: >>>>> >>> >>>>> >>> Karun, >>>>> >>> >>>>> >>> Running ceph pg repair should not cause any problems. It may not >>>>> fix >>>>> >>> the issue though. If that does not help, there is more information >>>>> at >>>>> >>> the link below. >>>>> >>> http://ceph.com/geen-categorie/ceph-manually-repair-object/ >>>>> >>> >>>>> >>> I recommend not rebooting, or restarting while Ceph is repairing or >>>>> >>> recovering. If possible, wait until the cluster is in a healthy >>>>> state >>>>> >>> first. >>>>> >>> >>>>> >>> Cary >>>>> >>> -Dynamic >>>>> >>> >>>>> >>> On Sat, Dec 16, 2017 at 2:05 PM, Karun Josy >>>>> wrote: >>>>> >>> > Hi Cary, >>>>> >>> > >>>>> >>> > No, I didnt try to repair it. >>>>> >>> > I am comparatively new in ceph. Is it okay to try to repair it ? >>>>> >>> > Or should I take any precautions while doing it ? >&g
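For reference, when one slot of an EC PG shows NONE as above, these read-only commands show why CRUSH could not fill the slot (3.4 was the PG in this thread):

ceph pg map 3.4
ceph pg 3.4 query

The up set is what CRUSH wants; the acting set is what currently serves I/O. With 8 hosts, a k=5 m=3 profile and failure domain host, every PG needs 8 distinct hosts, so CRUSH has almost no room to retry when its first choice is rejected — which is why adding OSDs, as done here, cleared it.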
Re: [ceph-users] Cache tiering on Erasure coded pools
Hello David, Thank you! We set up 2 pools to use EC with RBD: one ecpool and one normal replicated pool. However, would it still be advantageous to add a replicated cache tier in front of an EC one, even though it is not required anymore? I would still assume that replication would be less intensive than EC computing? Karun Josy On Wed, Dec 27, 2017 at 3:42 AM, David Turner wrote: > Please use the version of the docs for your installed version of ceph. > Note the Jewel in your URL and the Luminous in mine. In Luminous you no > longer need a cache tier to use EC with RBDs. > > http://docs.ceph.com/docs/luminous/rados/operations/cache-tiering/ > > On Tue, Dec 26, 2017, 4:21 PM Karun Josy wrote: > >> Hi, >> >> We are using erasure coded pools in a Ceph cluster for RBD images. >> Ceph version is 12.2.2 Luminous. >> >> - >> http://docs.ceph.com/docs/jewel/rados/operations/cache-tiering/ >> - >> >> Here it says we can use cache tiering in front of EC pools. >> To use erasure coding with RBD we have a replicated pool to store metadata >> and an ecpool as the data pool. >> >> Is it possible to set up cache tiering since there is already a replicated >> pool that is being used ? >> >> Karun Josy >> ___ >> ceph-users mailing list >> ceph-users@lists.ceph.com >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com >> > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
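For completeness, a minimal sketch of the Luminous-native setup David refers to, assuming BlueStore OSDs, the replicated pool 'cvm' for metadata and 'ecpool' for data:

ceph osd pool set ecpool allow_ec_overwrites true
rbd create cvm/image1 --size 100G --data-pool ecpool

The image header and metadata then live in the replicated pool while the data objects go to the EC pool, so a cache tier is no longer needed just to run RBD on EC. Whether a replicated cache tier would still help performance is workload-dependent and best measured.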
[ceph-users] Cache tiering on Erasure coded pools
Hi, We are using erasure coded pools in a Ceph cluster for RBD images. Ceph version is 12.2.2 Luminous. - http://docs.ceph.com/docs/jewel/rados/operations/cache-tiering/ - Here it says we can use cache tiering in front of EC pools. To use erasure coding with RBD we have a replicated pool to store metadata and an ecpool as the data pool. Is it possible to set up cache tiering since there is already a replicated pool that is being used ? Karun Josy ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] How to evict a client in rbd
Any help is really appreciated. Karun Josy On Sun, Dec 24, 2017 at 2:18 AM, Karun Josy wrote: > Hello, > > The image is not mapped. > > # ceph --version > ceph version 12.2.1 luminous (stable) > # uname -r > 4.14.0-1.el7.elrepo.x86_64 > > > Karun Josy > > On Sat, Dec 23, 2017 at 6:51 PM, Jason Dillaman > wrote: > >> What Ceph and what kernel version are you using? Are you positive that >> the image has been unmapped from 10.255.0.17? >> >> On Fri, Dec 22, 2017 at 7:14 PM, Karun Josy wrote: >> > Hello, >> > >> > I am unable to delete this abandoned image.Rbd info shows a watcher ip >> > Image is not mapped >> > Image has no snapshots >> > >> > >> > rbd status cvm/image --id clientuser >> > Watchers: >> > watcher=10.255.0.17:0/3495340192 client.390908 >> > cookie=18446462598732841114 >> > >> > How can I evict or black list a watcher client so that image can be >> deleted >> > http://docs.ceph.com/docs/master/cephfs/eviction/ >> > I see this is possible in Cephfs >> > >> > >> > >> > Karun >> > >> > ___ >> > ceph-users mailing list >> > ceph-users@lists.ceph.com >> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com >> > >> >> >> >> -- >> Jason >> > > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] How to evict a client in rbd
Hello, The image is not mapped. # ceph --version ceph version 12.2.1 luminous (stable) # uname -r 4.14.0-1.el7.elrepo.x86_64 Karun Josy On Sat, Dec 23, 2017 at 6:51 PM, Jason Dillaman wrote: > What Ceph and what kernel version are you using? Are you positive that > the image has been unmapped from 10.255.0.17? > > On Fri, Dec 22, 2017 at 7:14 PM, Karun Josy wrote: > > Hello, > > > > I am unable to delete this abandoned image.Rbd info shows a watcher ip > > Image is not mapped > > Image has no snapshots > > > > > > rbd status cvm/image --id clientuser > > Watchers: > > watcher=10.255.0.17:0/3495340192 client.390908 > > cookie=18446462598732841114 > > > > How can I evict or black list a watcher client so that image can be > deleted > > http://docs.ceph.com/docs/master/cephfs/eviction/ > > I see this is possible in Cephfs > > > > > > > > Karun > > > > ___ > > ceph-users mailing list > > ceph-users@lists.ceph.com > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > > > > > > -- > Jason > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] How to evict a client in rbd
Hello, I am unable to delete this abandoned image. Rbd info shows a watcher IP. The image is not mapped and has no snapshots. rbd status cvm/image --id clientuser Watchers: watcher=10.255.0.17:0/3495340192 client.390908 cookie=18446462598732841114 How can I evict or blacklist a watcher client so that the image can be deleted? http://docs.ceph.com/docs/master/cephfs/eviction/ I see this is possible in CephFS. Karun ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Proper way of removing osds
Thank you! Karun Josy On Thu, Dec 21, 2017 at 3:51 PM, Konstantin Shalygin wrote: > Is this the correct way to removes OSDs, or am I doing something wrong ? >> > Generic way for maintenance (e.g. disk replace) is rebalance by change osd > weight: > > > ceph osd crush reweight osdid 0 > > cluster migrate data "from this osd" > > > When HEALTH_OK you can safe remove this OSD: > > ceph osd out osd_id > systemctl stop ceph-osd@osd_id > ceph osd crush remove osd_id > ceph auth del osd_id > ceph osd rm osd_id > > > > k > > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
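Applied to a hypothetical osd.5, the sequence looks like this; the initial reweight drains the OSD while it is still up and serving, so the later removal steps cause little extra movement:

ceph osd crush reweight osd.5 0
# wait until ceph -s reports HEALTH_OK, then:
ceph osd out 5
systemctl stop ceph-osd@5
ceph osd crush remove osd.5
ceph auth del osd.5
ceph osd rm 5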
[ceph-users] Proper way of removing osds
Hi, This is how I remove an OSD from the cluster - Take it out: ceph osd out osdid Wait for the rebalancing to finish - Mark it down: ceph osd down osdid Then purge it: ceph osd purge osdid --yes-i-really-mean-it While purging I can see another rebalance occurring. Is this the correct way to remove OSDs, or am I doing something wrong ? Karun ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] POOL_NEARFULL
Hi , That makes sense. How can I adjust the osd nearfull ratio ? I tried this, however it didnt change. $ ceph tell mon.* injectargs "--mon_osd_nearfull_ratio .86" mon.mon-a1: injectargs:mon_osd_nearfull_ratio = '0.86' (not observed, change may require restart) mon.mon-a2: injectargs:mon_osd_nearfull_ratio = '0.86' (not observed, change may require restart) mon.mon-a3: injectargs:mon_osd_nearfull_ratio = '0.86' (not observed, change may require restart) Karun Josy On Tue, Dec 19, 2017 at 10:05 PM, Jean-Charles Lopez wrote: > OK so it’s telling you that the near full OSD holds PGs for these three > pools. > > JC > > On Dec 19, 2017, at 08:05, Karun Josy wrote: > > No, I haven't. > > Interestingly, the POOL_NEARFULL flag is shown only when there is OSD_NEARFULL > flag. > I have recently upgraded to Luminous 12.2.2, haven't seen this flag in > 12.2.1 > > > > Karun Josy > > On Tue, Dec 19, 2017 at 9:27 PM, Jean-Charles Lopez > wrote: > >> Hi >> >> did you set quotas on these pools? >> >> See this page for explanation of most error messages: >> http://docs.ceph.com/docs/master/rados/operations/ >> health-checks/#pool-near-full >> >> JC >> >> On Dec 19, 2017, at 01:48, Karun Josy wrote: >> >> Hello, >> >> In one of our clusters, health is showing these warnings : >> - >> OSD_NEARFULL 1 nearfull osd(s) >> osd.22 is near full >> POOL_NEARFULL 3 pool(s) nearfull >> pool 'templates' is nearfull >> pool 'cvm' is nearfull >> pool 'ecpool' is nearfull >> >> >> One osd is above 85% used, which I know caused the OSD_Nearfull flag. >> But what does pool(s) nearfull mean ? >> And how can I correct it ? >> >> ]$ ceph df >> GLOBAL: >> SIZE AVAIL RAW USED %RAW USED >> 31742G 11147G 20594G 64.88 >> POOLS: >> NAMEID USED %USED MAX AVAIL OBJECTS >> templates 5196G 23.28 645G 50202 >> cvm 66528 0 1076G 770 >> ecpool 7 10260G 83.56 2018G 3004031 >> >> >> >> Karun >> ___ >> ceph-users mailing list >> ceph-users@lists.ceph.com >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com >> >> >> > > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
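The injectargs route does not take effect here because, since Luminous, the full/nearfull ratios live in the OSDMap rather than in the monitor configuration; mon_osd_nearfull_ratio is only consulted when the cluster is first created. A sketch of the map-level command, assuming 0.86 is the intended threshold:

ceph osd set-nearfull-ratio 0.86
ceph osd dump | grep ratio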
Re: [ceph-users] POOL_NEARFULL
No, I haven't. Interestingly, the POOL_NEARFULL flag is shown only when there is OSD_NEARFULL flag. I have recently upgraded to Luminous 12.2.2, haven't seen this flag in 12.2.1 Karun Josy On Tue, Dec 19, 2017 at 9:27 PM, Jean-Charles Lopez wrote: > Hi > > did you set quotas on these pools? > > See this page for explanation of most error messages: http://docs.ceph. > com/docs/master/rados/operations/health-checks/#pool-near-full > > JC > > On Dec 19, 2017, at 01:48, Karun Josy wrote: > > Hello, > > In one of our clusters, health is showing these warnings : > - > OSD_NEARFULL 1 nearfull osd(s) > osd.22 is near full > POOL_NEARFULL 3 pool(s) nearfull > pool 'templates' is nearfull > pool 'cvm' is nearfull > pool 'ecpool' is nearfull > > > One osd is above 85% used, which I know caused the OSD_Nearfull flag. > But what does pool(s) nearfull mean ? > And how can I correct it ? > > ]$ ceph df > GLOBAL: > SIZE AVAIL RAW USED %RAW USED > 31742G 11147G 20594G 64.88 > POOLS: > NAMEID USED %USED MAX AVAIL OBJECTS > templates 5196G 23.28 645G 50202 > cvm 66528 0 1076G 770 > ecpool 7 10260G 83.56 2018G 3004031 > > > > Karun > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > > > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] POOL_NEARFULL
Hello, In one of our clusters, health is showing these warnings : - OSD_NEARFULL 1 nearfull osd(s) osd.22 is near full POOL_NEARFULL 3 pool(s) nearfull pool 'templates' is nearfull pool 'cvm' is nearfull pool 'ecpool' is nearfull One osd is above 85% used, which I know caused the OSD_Nearfull flag. But what does pool(s) nearfull mean ? And how can I correct it ? ]$ ceph df GLOBAL: SIZE AVAIL RAW USED %RAW USED 31742G 11147G 20594G 64.88 POOLS: NAMEID USED %USED MAX AVAIL OBJECTS templates 5196G 23.28 645G 50202 cvm 66528 0 1076G 770 ecpool 7 10260G 83.56 2018G 3004031 Karun ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] PG active+clean+remapped status
I think what happened is this : http://docs.ceph.com/docs/master/rados/operations/add-or-rm-osds/ Note Sometimes, typically in a “small” cluster with few hosts (for instance with a small testing cluster), the fact to take out the OSD can spawn a CRUSH corner case where some PGs remain stuck in the active+remapped state Its a small cluster with unequal number of osds and one of the OSD disk failed and I had taken it out. I have already purged it, so I cannot use the reweight option mentioned in that link. So any other workarounds ? Will adding more disks will clear it ? Karun Josy On Mon, Dec 18, 2017 at 9:06 AM, David Turner wrote: > Maybe try outing the disk that should have a copy of the PG, but doesn't. > Then mark it back in. It might check that it has everything properly and > pull a copy of the data it's missing. I dunno. > > On Sun, Dec 17, 2017, 10:00 PM Karun Josy wrote: > >> Tried restarting all osds. Still no luck. >> >> Will adding a new disk to any of the server forces a rebalance and fix it? >> >> Karun Josy >> >> On Sun, Dec 17, 2017 at 12:22 PM, Cary wrote: >> >>> Karun, >>> >>> Could you paste in the output from "ceph health detail"? Which OSD >>> was just added? >>> >>> Cary >>> -Dynamic >>> >>> On Sun, Dec 17, 2017 at 4:59 AM, Karun Josy >>> wrote: >>> > Any help would be appreciated! >>> > >>> > Karun Josy >>> > >>> > On Sat, Dec 16, 2017 at 11:04 PM, Karun Josy >>> wrote: >>> >> >>> >> Hi, >>> >> >>> >> Repair didnt fix the issue. >>> >> >>> >> In the pg dump details, I notice this None. Seems pg is missing from >>> one >>> >> of the OSD >>> >> >>> >> [0,2,NONE,4,12,10,5,1] >>> >> [0,2,1,4,12,10,5,1] >>> >> >>> >> There is no way Ceph corrects this automatically ? I have to edit/ >>> >> troubleshoot it manually ? >>> >> >>> >> Karun >>> >> >>> >> On Sat, Dec 16, 2017 at 10:44 PM, Cary >>> wrote: >>> >>> >>> >>> Karun, >>> >>> >>> >>> Running ceph pg repair should not cause any problems. It may not fix >>> >>> the issue though. If that does not help, there is more information at >>> >>> the link below. >>> >>> http://ceph.com/geen-categorie/ceph-manually-repair-object/ >>> >>> >>> >>> I recommend not rebooting, or restarting while Ceph is repairing or >>> >>> recovering. If possible, wait until the cluster is in a healthy state >>> >>> first. >>> >>> >>> >>> Cary >>> >>> -Dynamic >>> >>> >>> >>> On Sat, Dec 16, 2017 at 2:05 PM, Karun Josy >>> wrote: >>> >>> > Hi Cary, >>> >>> > >>> >>> > No, I didnt try to repair it. >>> >>> > I am comparatively new in ceph. Is it okay to try to repair it ? >>> >>> > Or should I take any precautions while doing it ? >>> >>> > >>> >>> > Karun Josy >>> >>> > >>> >>> > On Sat, Dec 16, 2017 at 2:08 PM, Cary >>> wrote: >>> >>> >> >>> >>> >> Karun, >>> >>> >> >>> >>> >> Did you attempt a "ceph pg repair "? Replace with >>> the pg >>> >>> >> ID that needs repaired, 3.4. >>> >>> >> >>> >>> >> Cary >>> >>> >> -D123 >>> >>> >> >>> >>> >> On Sat, Dec 16, 2017 at 8:24 AM, Karun Josy >> > >>> >>> >> wrote: >>> >>> >> > Hello, >>> >>> >> > >>> >>> >> > I added 1 disk to the cluster and after rebalancing, it shows 1 >>> PG >>> >>> >> > is in >>> >>> >> > remapped state. How can I correct it ? >>> >>> >> > >>> >>> >> > (I had to restart some osds during the rebalancing as there were >>> >>> >> > some >>> >>> >> > slow >>> >>> >> > requests) >>> >>> >> &
Re: [ceph-users] PG active+clean+remapped status
Tried restarting all osds. Still no luck. Will adding a new disk to any of the server forces a rebalance and fix it? Karun Josy On Sun, Dec 17, 2017 at 12:22 PM, Cary wrote: > Karun, > > Could you paste in the output from "ceph health detail"? Which OSD > was just added? > > Cary > -Dynamic > > On Sun, Dec 17, 2017 at 4:59 AM, Karun Josy wrote: > > Any help would be appreciated! > > > > Karun Josy > > > > On Sat, Dec 16, 2017 at 11:04 PM, Karun Josy > wrote: > >> > >> Hi, > >> > >> Repair didnt fix the issue. > >> > >> In the pg dump details, I notice this None. Seems pg is missing from one > >> of the OSD > >> > >> [0,2,NONE,4,12,10,5,1] > >> [0,2,1,4,12,10,5,1] > >> > >> There is no way Ceph corrects this automatically ? I have to edit/ > >> troubleshoot it manually ? > >> > >> Karun > >> > >> On Sat, Dec 16, 2017 at 10:44 PM, Cary wrote: > >>> > >>> Karun, > >>> > >>> Running ceph pg repair should not cause any problems. It may not fix > >>> the issue though. If that does not help, there is more information at > >>> the link below. > >>> http://ceph.com/geen-categorie/ceph-manually-repair-object/ > >>> > >>> I recommend not rebooting, or restarting while Ceph is repairing or > >>> recovering. If possible, wait until the cluster is in a healthy state > >>> first. > >>> > >>> Cary > >>> -Dynamic > >>> > >>> On Sat, Dec 16, 2017 at 2:05 PM, Karun Josy > wrote: > >>> > Hi Cary, > >>> > > >>> > No, I didnt try to repair it. > >>> > I am comparatively new in ceph. Is it okay to try to repair it ? > >>> > Or should I take any precautions while doing it ? > >>> > > >>> > Karun Josy > >>> > > >>> > On Sat, Dec 16, 2017 at 2:08 PM, Cary > wrote: > >>> >> > >>> >> Karun, > >>> >> > >>> >> Did you attempt a "ceph pg repair "? Replace with the > pg > >>> >> ID that needs repaired, 3.4. > >>> >> > >>> >> Cary > >>> >> -D123 > >>> >> > >>> >> On Sat, Dec 16, 2017 at 8:24 AM, Karun Josy > >>> >> wrote: > >>> >> > Hello, > >>> >> > > >>> >> > I added 1 disk to the cluster and after rebalancing, it shows 1 PG > >>> >> > is in > >>> >> > remapped state. How can I correct it ? > >>> >> > > >>> >> > (I had to restart some osds during the rebalancing as there were > >>> >> > some > >>> >> > slow > >>> >> > requests) > >>> >> > > >>> >> > $ ceph pg dump | grep remapped > >>> >> > dumped all > >>> >> > 3.4 981 00 0 0 > >>> >> > 2655009792 > >>> >> > 1535 1535 active+clean+remapped 2017-12-15 22:07:21.663964 > >>> >> > 2824'785115 > >>> >> > 2824:2297888 [0,2,NONE,4,12,10,5,1] 0 > [0,2,1,4,12,10,5,1] > >>> >> > 0 2288'767367 2017-12-14 11:00:15.576741 417'518549 > 2017-12-08 > >>> >> > 03:56:14.006982 > >>> >> > > >>> >> > That PG belongs to an erasure pool with k=5, m =3 profile, failure > >>> >> > domain is > >>> >> > host. > >>> >> > > >>> >> > === > >>> >> > > >>> >> > $ ceph osd tree > >>> >> > ID CLASS WEIGHT TYPE NAMESTATUS REWEIGHT > PRI-AFF > >>> >> > -1 16.94565 root default > >>> >> > -32.73788 host ceph-a1 > >>> >> > 0 ssd 1.86469 osd.0up 1.0 > 1.0 > >>> >> > 14 ssd 0.87320 osd.14 up 1.0 > 1.0 > >>> >> > -52.73788 host ceph-a2 > >>> >> > 1 ssd 1.86469 osd.1up 1.0 > 1.0 > >>> >> > 15 ssd 0.87320
Re: [ceph-users] Adding new host
Hi David, Thank you for your response. Failure domain for ec profile is 'host'. So I guess it is okay to add a node and activate 5 disks at a time ? $ ceph osd erasure-code-profile get profile5by3 crush-device-class= crush-failure-domain=host crush-root=default jerasure-per-chunk-alignment=false k=5 m=3 plugin=jerasure technique=reed_sol_van w=8 Karun Josy On Sun, Dec 17, 2017 at 11:26 PM, David Turner wrote: > I like to avoid adding disks from more than 1 failure domain at a time in > case some of the new disks are bad. In your example of only adding 1 new > node, I would say that adding all of the disks at the same time is the > better way to do it. > > Adding only 1 disk in the new node at a time would actually be worse for > the balance of the cluster as it would only have 1 disk while the rest have > all 5 or more. > > The EC profile shouldn't play into account as you already have enough > hosts to fulfill it. > > On Sun, Dec 17, 2017, 11:57 AM Karun Josy wrote: > >> Hi, >> >> We have a live cluster with 8 OSD nodes all having 5-6 disks each. >> >> We would like to add a new host and expand the cluster. >> >> We have 4 pools >> - 3 replicated pools with replication factor 5 and 3 >> - 1 erasure coded pool with k=5, m=3 >> >> So my concern is, is there any precautions that are needed to add the new >> host since the ec profile is 5+3. >> >> And can we add multiple disks at the same time in the new host ? Or >> should it be 1 at a time ? >> >> >> >> Karun >> ___ >> ceph-users mailing list >> ceph-users@lists.ceph.com >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com >> > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
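One thing worth doing before activating several disks at once is capping backfill so client I/O stays responsive while the new host fills; a conservative sketch:

ceph tell osd.* injectargs '--osd_max_backfills 1'
ceph tell osd.* injectargs '--osd_recovery_max_active 1'

The values can be raised again once the rebalance is mostly done.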
[ceph-users] Adding new host
Hi, We have a live cluster with 8 OSD nodes, each having 5-6 disks. We would like to add a new host and expand the cluster. We have 4 pools - 3 replicated pools with replication factors 5 and 3 - 1 erasure coded pool with k=5, m=3 So my concern is: are there any precautions needed when adding the new host, since the EC profile is 5+3? And can we add multiple disks at the same time in the new host, or should it be 1 at a time ? Karun ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] PG active+clean+remapped status
Any help would be appreciated! Karun Josy On Sat, Dec 16, 2017 at 11:04 PM, Karun Josy wrote: > Hi, > > Repair didnt fix the issue. > > In the pg dump details, I notice this None. Seems pg is missing from one > of the OSD > > [0,2,NONE,4,12,10,5,1] > [0,2,1,4,12,10,5,1] > > There is no way Ceph corrects this automatically ? I have to edit/ > troubleshoot it manually ? > > Karun > > On Sat, Dec 16, 2017 at 10:44 PM, Cary wrote: > >> Karun, >> >> Running ceph pg repair should not cause any problems. It may not fix >> the issue though. If that does not help, there is more information at >> the link below. >> http://ceph.com/geen-categorie/ceph-manually-repair-object/ >> >> I recommend not rebooting, or restarting while Ceph is repairing or >> recovering. If possible, wait until the cluster is in a healthy state >> first. >> >> Cary >> -Dynamic >> >> On Sat, Dec 16, 2017 at 2:05 PM, Karun Josy wrote: >> > Hi Cary, >> > >> > No, I didnt try to repair it. >> > I am comparatively new in ceph. Is it okay to try to repair it ? >> > Or should I take any precautions while doing it ? >> > >> > Karun Josy >> > >> > On Sat, Dec 16, 2017 at 2:08 PM, Cary wrote: >> >> >> >> Karun, >> >> >> >> Did you attempt a "ceph pg repair "? Replace with the pg >> >> ID that needs repaired, 3.4. >> >> >> >> Cary >> >> -D123 >> >> >> >> On Sat, Dec 16, 2017 at 8:24 AM, Karun Josy >> wrote: >> >> > Hello, >> >> > >> >> > I added 1 disk to the cluster and after rebalancing, it shows 1 PG >> is in >> >> > remapped state. How can I correct it ? >> >> > >> >> > (I had to restart some osds during the rebalancing as there were some >> >> > slow >> >> > requests) >> >> > >> >> > $ ceph pg dump | grep remapped >> >> > dumped all >> >> > 3.4 981 00 0 0 >> 2655009792 >> >> > 1535 1535 active+clean+remapped 2017-12-15 22:07:21.663964 >> >> > 2824'785115 >> >> > 2824:2297888 [0,2,NONE,4,12,10,5,1] 0 [0,2,1,4,12,10,5,1] >> >> > 0 2288'767367 2017-12-14 11:00:15.576741 417'518549 2017-12-08 >> >> > 03:56:14.006982 >> >> > >> >> > That PG belongs to an erasure pool with k=5, m =3 profile, failure >> >> > domain is >> >> > host. >> >> > >> >> > === >> >> > >> >> > $ ceph osd tree >> >> > ID CLASS WEIGHT TYPE NAMESTATUS REWEIGHT PRI-AFF >> >> > -1 16.94565 root default >> >> > -32.73788 host ceph-a1 >> >> > 0 ssd 1.86469 osd.0up 1.0 1.0 >> >> > 14 ssd 0.87320 osd.14 up 1.0 1.0 >> >> > -52.73788 host ceph-a2 >> >> > 1 ssd 1.86469 osd.1up 1.0 1.0 >> >> > 15 ssd 0.87320 osd.15 up 1.0 1.0 >> >> > -71.86469 host ceph-a3 >> >> > 2 ssd 1.86469 osd.2up 1.0 1.0 >> >> > -91.74640 host ceph-a4 >> >> > 3 ssd 0.87320 osd.3up 1.0 1.0 >> >> > 4 ssd 0.87320 osd.4up 1.0 1.0 >> >> > -111.74640 host ceph-a5 >> >> > 5 ssd 0.87320 osd.5up 1.0 1.0 >> >> > 6 ssd 0.87320 osd.6up 1.0 1.0 >> >> > -131.74640 host ceph-a6 >> >> > 7 ssd 0.87320 osd.7up 1.0 1.0 >> >> > 8 ssd 0.87320 osd.8up 1.0 1.0 >> >> > -151.74640 host ceph-a7 >> >> > 9 ssd 0.87320 osd.9up 1.0 1.0 >> >> > 10 ssd 0.87320 osd.10 up 1.0 1.0 >> >> > -172.61960 host ceph-a8 >> >> > 11 ssd 0.87320 osd.11 up 1.0 1.0 >> >> > 12 ssd 0.87320 osd.12 up 1.0 1.0 >> >> > 13 ssd 0.87320 osd.13 up 1.0 1.0 >> >> > >> >> > >> >> > >> >> > Karun >> >> > >> >> > ___ >> >> > ceph-users mailing list >> >> > ceph-users@lists.ceph.com >> >> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com >> >> > >> > >> > >> > > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] PG active+clean+remapped status
Hi, Repair didnt fix the issue. In the pg dump details, I notice this None. Seems pg is missing from one of the OSD [0,2,NONE,4,12,10,5,1] [0,2,1,4,12,10,5,1] There is no way Ceph corrects this automatically ? I have to edit/ troubleshoot it manually ? Karun On Sat, Dec 16, 2017 at 10:44 PM, Cary wrote: > Karun, > > Running ceph pg repair should not cause any problems. It may not fix > the issue though. If that does not help, there is more information at > the link below. > http://ceph.com/geen-categorie/ceph-manually-repair-object/ > > I recommend not rebooting, or restarting while Ceph is repairing or > recovering. If possible, wait until the cluster is in a healthy state > first. > > Cary > -Dynamic > > On Sat, Dec 16, 2017 at 2:05 PM, Karun Josy wrote: > > Hi Cary, > > > > No, I didnt try to repair it. > > I am comparatively new in ceph. Is it okay to try to repair it ? > > Or should I take any precautions while doing it ? > > > > Karun Josy > > > > On Sat, Dec 16, 2017 at 2:08 PM, Cary wrote: > >> > >> Karun, > >> > >> Did you attempt a "ceph pg repair "? Replace with the pg > >> ID that needs repaired, 3.4. > >> > >> Cary > >> -D123 > >> > >> On Sat, Dec 16, 2017 at 8:24 AM, Karun Josy > wrote: > >> > Hello, > >> > > >> > I added 1 disk to the cluster and after rebalancing, it shows 1 PG is > in > >> > remapped state. How can I correct it ? > >> > > >> > (I had to restart some osds during the rebalancing as there were some > >> > slow > >> > requests) > >> > > >> > $ ceph pg dump | grep remapped > >> > dumped all > >> > 3.4 981 00 0 0 > 2655009792 > >> > 1535 1535 active+clean+remapped 2017-12-15 22:07:21.663964 > >> > 2824'785115 > >> > 2824:2297888 [0,2,NONE,4,12,10,5,1] 0 [0,2,1,4,12,10,5,1] > >> > 0 2288'767367 2017-12-14 11:00:15.576741 417'518549 2017-12-08 > >> > 03:56:14.006982 > >> > > >> > That PG belongs to an erasure pool with k=5, m =3 profile, failure > >> > domain is > >> > host. > >> > > >> > === > >> > > >> > $ ceph osd tree > >> > ID CLASS WEIGHT TYPE NAMESTATUS REWEIGHT PRI-AFF > >> > -1 16.94565 root default > >> > -32.73788 host ceph-a1 > >> > 0 ssd 1.86469 osd.0up 1.0 1.0 > >> > 14 ssd 0.87320 osd.14 up 1.0 1.0 > >> > -52.73788 host ceph-a2 > >> > 1 ssd 1.86469 osd.1up 1.0 1.0 > >> > 15 ssd 0.87320 osd.15 up 1.0 1.0 > >> > -71.86469 host ceph-a3 > >> > 2 ssd 1.86469 osd.2up 1.0 1.0 > >> > -91.74640 host ceph-a4 > >> > 3 ssd 0.87320 osd.3up 1.0 1.0 > >> > 4 ssd 0.87320 osd.4up 1.0 1.0 > >> > -111.74640 host ceph-a5 > >> > 5 ssd 0.87320 osd.5up 1.0 1.0 > >> > 6 ssd 0.87320 osd.6up 1.0 1.0 > >> > -131.74640 host ceph-a6 > >> > 7 ssd 0.87320 osd.7up 1.0 1.0 > >> > 8 ssd 0.87320 osd.8up 1.0 1.0 > >> > -151.74640 host ceph-a7 > >> > 9 ssd 0.87320 osd.9up 1.0 1.0 > >> > 10 ssd 0.87320 osd.10 up 1.0 1.0 > >> > -172.61960 host ceph-a8 > >> > 11 ssd 0.87320 osd.11 up 1.0 1.0 > >> > 12 ssd 0.87320 osd.12 up 1.0 1.0 > >> > 13 ssd 0.87320 osd.13 up 1.0 1.0 > >> > > >> > > >> > > >> > Karun > >> > > >> > ___ > >> > ceph-users mailing list > >> > ceph-users@lists.ceph.com > >> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > >> > > > > > > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] PG active+clean+remapped status
Hi Cary, No, I didnt try to repair it. I am comparatively new in ceph. Is it okay to try to repair it ? Or should I take any precautions while doing it ? Karun Josy On Sat, Dec 16, 2017 at 2:08 PM, Cary wrote: > Karun, > > Did you attempt a "ceph pg repair "? Replace with the pg > ID that needs repaired, 3.4. > > Cary > -D123 > > On Sat, Dec 16, 2017 at 8:24 AM, Karun Josy wrote: > > Hello, > > > > I added 1 disk to the cluster and after rebalancing, it shows 1 PG is in > > remapped state. How can I correct it ? > > > > (I had to restart some osds during the rebalancing as there were some > slow > > requests) > > > > $ ceph pg dump | grep remapped > > dumped all > > 3.4 981 00 0 0 2655009792 > > 1535 1535 active+clean+remapped 2017-12-15 22:07:21.663964 > 2824'785115 > > 2824:2297888 [0,2,NONE,4,12,10,5,1] 0 [0,2,1,4,12,10,5,1] > > 0 2288'767367 2017-12-14 11:00:15.576741 417'518549 2017-12-08 > > 03:56:14.006982 > > > > That PG belongs to an erasure pool with k=5, m =3 profile, failure > domain is > > host. > > > > === > > > > $ ceph osd tree > > ID CLASS WEIGHT TYPE NAMESTATUS REWEIGHT PRI-AFF > > -1 16.94565 root default > > -32.73788 host ceph-a1 > > 0 ssd 1.86469 osd.0up 1.0 1.0 > > 14 ssd 0.87320 osd.14 up 1.0 1.0 > > -52.73788 host ceph-a2 > > 1 ssd 1.86469 osd.1up 1.0 1.0 > > 15 ssd 0.87320 osd.15 up 1.0 1.0 > > -71.86469 host ceph-a3 > > 2 ssd 1.86469 osd.2up 1.0 1.0 > > -91.74640 host ceph-a4 > > 3 ssd 0.87320 osd.3up 1.0 1.0 > > 4 ssd 0.87320 osd.4up 1.0 1.0 > > -111.74640 host ceph-a5 > > 5 ssd 0.87320 osd.5up 1.0 1.0 > > 6 ssd 0.87320 osd.6up 1.0 1.0 > > -131.74640 host ceph-a6 > > 7 ssd 0.87320 osd.7up 1.0 1.0 > > 8 ssd 0.87320 osd.8up 1.0 1.0 > > -151.74640 host ceph-a7 > > 9 ssd 0.87320 osd.9up 1.0 1.0 > > 10 ssd 0.87320 osd.10 up 1.0 1.0 > > -172.61960 host ceph-a8 > > 11 ssd 0.87320 osd.11 up 1.0 1.0 > > 12 ssd 0.87320 osd.12 up 1.0 1.0 > > 13 ssd 0.87320 osd.13 up 1.0 1.0 > > > > > > > > Karun > > > > ___ > > ceph-users mailing list > > ceph-users@lists.ceph.com > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > > > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] PG active+clean+remapped status
Hello, I added 1 disk to the cluster and after rebalancing, it shows 1 PG is in remapped state. How can I correct it ? (I had to restart some osds during the rebalancing as there were some slow requests) $ ceph pg dump | grep remapped dumped all 3.4 981 00 0 0 2655009792 1535 1535 active+clean+remapped 2017-12-15 22:07:21.663964 2824'785115 2824:2297888 [0,2,NONE,4,12,10,5,1] 0 [0,2,1,4,12,10,5,1] 0 2288'767367 2017-12-14 11:00:15.576741 417'518549 2017-12-08 03:56:14.006982 That PG belongs to an erasure pool with k=5, m =3 profile, failure domain is host. === $ ceph osd tree ID CLASS WEIGHT TYPE NAMESTATUS REWEIGHT PRI-AFF -1 16.94565 root default -32.73788 host ceph-a1 0 ssd 1.86469 osd.0up 1.0 1.0 14 ssd 0.87320 osd.14 up 1.0 1.0 -52.73788 host ceph-a2 1 ssd 1.86469 osd.1up 1.0 1.0 15 ssd 0.87320 osd.15 up 1.0 1.0 -71.86469 host ceph-a3 2 ssd 1.86469 osd.2up 1.0 1.0 -91.74640 host ceph-a4 3 ssd 0.87320 osd.3up 1.0 1.0 4 ssd 0.87320 osd.4up 1.0 1.0 -111.74640 host ceph-a5 5 ssd 0.87320 osd.5up 1.0 1.0 6 ssd 0.87320 osd.6up 1.0 1.0 -131.74640 host ceph-a6 7 ssd 0.87320 osd.7up 1.0 1.0 8 ssd 0.87320 osd.8up 1.0 1.0 -151.74640 host ceph-a7 9 ssd 0.87320 osd.9up 1.0 1.0 10 ssd 0.87320 osd.10 up 1.0 1.0 -172.61960 host ceph-a8 11 ssd 0.87320 osd.11 up 1.0 1.0 12 ssd 0.87320 osd.12 up 1.0 1.0 13 ssd 0.87320 osd.13 up 1.0 1.0 Karun ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Health Error : Request Stuck
Hi Nick, Finally, was able to correct the issue! We found that there were many slow requests in ceph health detail. And found that some osds were slowing the cluster down. Initially the cluster was unusable when there were 10 PGs with "activating+remapped" status and slow requests. Slow requests were mainly on 2 osds. And we restarted osd daemons one by one, which cleared the block requests. And that made the cluster reusable. However, there were 4 PGs still in inactive state. So I took down one of the osd with slow requests for some time, and allowed the cluster to rebalance. And it worked! To be honest, not exactly sure its the correct way. P.S : I had upgraded to Luminous 12.2.2 yesterday. Karun Josy On Wed, Dec 13, 2017 at 4:31 PM, Nick Fisk wrote: > Hi Karun, > > > > I too am experiencing something very similar with a PG stuck in > activating+remapped state after re-introducing a OSD back into the cluster > as Bluestore. Although this new OSD is not the one listed against the PG’s > stuck activating. I also see the same thing as you where the up set is > different to the acting set. > > > > Can I just ask what ceph version you are running and the output of ceph > osd tree? > > > > *From:* ceph-users [mailto:ceph-users-boun...@lists.ceph.com] *On Behalf > Of *Karun Josy > *Sent:* 13 December 2017 07:06 > *To:* ceph-users > *Subject:* Re: [ceph-users] Health Error : Request Stuck > > > > Cluster is unusable because of inactive PGs. How can we correct it? > > > > = > > ceph pg dump_stuck inactive > > ok > > PG_STAT STATE UP UP_PRIMARY ACTING > ACTING_PRIMARY > > 1.4bactivating+remapped [5,2,0,13,1] 5 [5,2,13,1,4] > 5 > > 1.35activating+remapped [2,7,0,1,12] 2 [2,7,1,12,9] > 2 > > 1.12activating+remapped [1,3,5,0,7] 1 [1,3,5,7,2] > 1 > > 1.4eactivating+remapped [1,3,0,9,2] 1 [1,3,0,9,5] > 1 > > 2.3bactivating+remapped [13,1,0] 13 [13,1,2] >13 > > 1.19activating+remapped [2,13,8,9,0] 2 [2,13,8,9,1] > 2 > > 1.1eactivating+remapped [2,3,1,10,0] 2 [2,3,1,10,5] > 2 > > 2.29activating+remapped [1,0,13] 1 [1,8,11] > 1 > > 1.6factivating+remapped [8,2,0,4,13] 8 [8,2,4,13,1] > 8 > > 1.74activating+remapped [7,13,2,0,4] 7 [7,13,2,4,1] > 7 > > > > > Karun Josy > > > > On Wed, Dec 13, 2017 at 8:27 AM, Karun Josy wrote: > > Hello, > > > > We added a new disk to the cluster and while rebalancing we are getting > error warnings. > > > > = > > Overall status: HEALTH_ERR > > REQUEST_SLOW: 1824 slow requests are blocked > 32 sec > > REQUEST_STUCK: 1022 stuck requests are blocked > 4096 sec > > == > > > > The load in the servers seems to be very low. > > > > How can I correct it? > > > > > > Karun > > > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
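For anyone debugging the same symptom, the admin socket shows exactly what a suspect OSD is blocked on; a sketch for a hypothetical osd.12:

ceph health detail | grep -i slow
ceph daemon osd.12 dump_ops_in_flight
ceph daemon osd.12 dump_historic_ops

The historic ops dump carries per-event timestamps for each op, which makes it easier to tell whether the delay is in the local disk, the network, or a peer OSD.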
Re: [ceph-users] Health Error : Request Stuck
Cluster is unusable because of inactive PGs. How can we correct it? = ceph pg dump_stuck inactive ok PG_STAT STATE UP UP_PRIMARY ACTING ACTING_PRIMARY 1.4bactivating+remapped [5,2,0,13,1] 5 [5,2,13,1,4] 5 1.35activating+remapped [2,7,0,1,12] 2 [2,7,1,12,9] 2 1.12activating+remapped [1,3,5,0,7] 1 [1,3,5,7,2] 1 1.4eactivating+remapped [1,3,0,9,2] 1 [1,3,0,9,5] 1 2.3bactivating+remapped [13,1,0] 13 [13,1,2] 13 1.19activating+remapped [2,13,8,9,0] 2 [2,13,8,9,1] 2 1.1eactivating+remapped [2,3,1,10,0] 2 [2,3,1,10,5] 2 2.29activating+remapped [1,0,13] 1 [1,8,11] 1 1.6factivating+remapped [8,2,0,4,13] 8 [8,2,4,13,1] 8 1.74activating+remapped [7,13,2,0,4] 7 [7,13,2,4,1] 7 Karun Josy On Wed, Dec 13, 2017 at 8:27 AM, Karun Josy wrote: > Hello, > > We added a new disk to the cluster and while rebalancing we are getting > error warnings. > > = > Overall status: HEALTH_ERR > REQUEST_SLOW: 1824 slow requests are blocked > 32 sec > REQUEST_STUCK: 1022 stuck requests are blocked > 4096 sec > == > > The load in the servers seems to be very low. > > How can I correct it? > > > Karun > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] Health Error : Request Stuck
Hello, We added a new disk to the cluster, and while it rebalances we are getting health errors. = Overall status: HEALTH_ERR REQUEST_SLOW: 1824 slow requests are blocked > 32 sec REQUEST_STUCK: 1022 stuck requests are blocked > 4096 sec == The load on the servers seems to be very low. How can I correct it? Karun ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] HEALTH_ERR : PG_DEGRADED_FULL
Hi Lars, Sean, Thank you for your response. The cluster health is ok now! :) Karun Josy On Thu, Dec 7, 2017 at 3:35 PM, Sean Redmond wrote: > Can you share - ceph osd tree / crushmap and `ceph health detail` via > pastebin? > > Is recovery stuck or it is on going? > > On 7 Dec 2017 07:06, "Karun Josy" wrote: > >> Hello, >> >> I am seeing health error in our production cluster. >> >> health: HEALTH_ERR >> 1105420/11038158 objects misplaced (10.015%) >> Degraded data redundancy: 2046/11038158 objects degraded >> (0.019%), 102 pgs unclean, 2 pgs degraded >> Degraded data redundancy (low space): 4 pgs backfill_toofull >> >> The cluster space was running out. >> So I was in the process of adding a disk. >> Since I got this error, we deleted some of the data to create more space. >> >> >> This is the current usage, after clearing some space, earlier 3 disks >> were at 85%. >> >> >> $ ceph osd df >> ID CLASS WEIGHT REWEIGHT SIZE USE AVAIL %USE VAR PGS >> 0 ssd 1.86469 1.0 1909G 851G 1058G 44.59 0.78 265 >> 16 ssd 0.87320 1.0 894G 361G 532G 40.43 0.71 112 >> 1 ssd 0.87320 1.0 894G 586G 307G 65.57 1.15 163 >> 2 ssd 0.87320 1.0 894G 490G 403G 54.84 0.96 145 >> 17 ssd 0.87320 1.0 894G 163G 731G 18.24 0.32 58 >> 3 ssd 0.87320 1.0 894G 616G 277G 68.98 1.21 176 >> 4 ssd 0.87320 1.0 894G 593G 300G 66.42 1.17 179 >> 5 ssd 0.87320 1.0 894G 419G 474G 46.89 0.82 130 >> 6 ssd 0.87320 1.0 894G 422G 472G 47.21 0.83 129 >> 7 ssd 0.87320 1.0 894G 397G 496G 44.50 0.78 115 >> 8 ssd 0.87320 1.0 894G 656G 237G 73.44 1.29 184 >> 9 ssd 0.87320 1.0 894G 560G 333G 62.72 1.10 170 >> 10 ssd 0.87320 1.0 894G 623G 270G 69.78 1.22 183 >> 11 ssd 0.87320 1.0 894G 586G 307G 65.57 1.15 172 >> 12 ssd 0.87320 1.0 894G 610G 283G 68.29 1.20 172 >> 13 ssd 0.87320 1.0 894G 597G 296G 66.87 1.17 180 >> 14 ssd 0.87320 1.0 894G 597G 296G 66.79 1.17 168 >> 15 ssd 0.87320 1.0 894G 610G 283G 68.32 1.20 179 >> TOTAL 17110G 9746G 7363G 56.97 >> >> How to fix this? Please help! >> >> Karun >> >> ___ >> ceph-users mailing list >> ceph-users@lists.ceph.com >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com >> >> ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
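When one OSD runs much hotter than the rest, reweighting is the usual fix short of adding capacity; a sketch that previews the change before applying it:

ceph osd test-reweight-by-utilization 120
ceph osd reweight-by-utilization 120

The argument is a utilization threshold as a percentage of the cluster average; only OSDs above it get their reweight lowered. A single OSD can also be adjusted directly, e.g. ceph osd reweight 22 0.9.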
[ceph-users] HEALTH_ERR : PG_DEGRADED_FULL
Hello, I am seeing health error in our production cluster. health: HEALTH_ERR 1105420/11038158 objects misplaced (10.015%) Degraded data redundancy: 2046/11038158 objects degraded (0.019%), 102 pgs unclean, 2 pgs degraded Degraded data redundancy (low space): 4 pgs backfill_toofull The cluster space was running out. So I was in the process of adding a disk. Since I got this error, we deleted some of the data to create more space. This is the current usage, after clearing some space, earlier 3 disks were at 85%. $ ceph osd df ID CLASS WEIGHT REWEIGHT SIZE USE AVAIL %USE VAR PGS 0 ssd 1.86469 1.0 1909G 851G 1058G 44.59 0.78 265 16 ssd 0.87320 1.0 894G 361G 532G 40.43 0.71 112 1 ssd 0.87320 1.0 894G 586G 307G 65.57 1.15 163 2 ssd 0.87320 1.0 894G 490G 403G 54.84 0.96 145 17 ssd 0.87320 1.0 894G 163G 731G 18.24 0.32 58 3 ssd 0.87320 1.0 894G 616G 277G 68.98 1.21 176 4 ssd 0.87320 1.0 894G 593G 300G 66.42 1.17 179 5 ssd 0.87320 1.0 894G 419G 474G 46.89 0.82 130 6 ssd 0.87320 1.0 894G 422G 472G 47.21 0.83 129 7 ssd 0.87320 1.0 894G 397G 496G 44.50 0.78 115 8 ssd 0.87320 1.0 894G 656G 237G 73.44 1.29 184 9 ssd 0.87320 1.0 894G 560G 333G 62.72 1.10 170 10 ssd 0.87320 1.0 894G 623G 270G 69.78 1.22 183 11 ssd 0.87320 1.0 894G 586G 307G 65.57 1.15 172 12 ssd 0.87320 1.0 894G 610G 283G 68.29 1.20 172 13 ssd 0.87320 1.0 894G 597G 296G 66.87 1.17 180 14 ssd 0.87320 1.0 894G 597G 296G 66.79 1.17 168 15 ssd 0.87320 1.0 894G 610G 283G 68.32 1.20 179 TOTAL 17110G 9746G 7363G 56.97 How to fix this? Please help! Karun ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Adding multiple OSD
Thank you for detailed explanation! Got one another doubt, This is the total space available in the cluster : TOTAL : 23490G Use : 10170G Avail : 13320G But ecpool shows max avail as just 3 TB. What am I missing ? == $ ceph df GLOBAL: SIZE AVAIL RAW USED %RAW USED 23490G 13338G 10151G 43.22 POOLS: NAMEID USED %USED MAX AVAIL OBJECTS ostemplates 1 162G 2.79 1134G 42084 imagepool 34 122G 2.11 1891G 34196 cvm154 8058 0 1891G 950 ecpool1 55 4246G 42.77 3546G 1232590 $ ceph osd df ID CLASS WEIGHT REWEIGHT SIZE USEAVAIL %USE VAR PGS 0 ssd 1.86469 1.0 1909G 625G 1284G 32.76 0.76 201 1 ssd 1.86469 1.0 1909G 691G 1217G 36.23 0.84 208 2 ssd 0.87320 1.0 894G 587G 306G 65.67 1.52 156 11 ssd 0.87320 1.0 894G 631G 262G 70.68 1.63 186 3 ssd 0.87320 1.0 894G 605G 288G 67.73 1.56 165 14 ssd 0.87320 1.0 894G 635G 258G 71.07 1.64 177 4 ssd 0.87320 1.0 894G 419G 474G 46.93 1.08 127 15 ssd 0.87320 1.0 894G 373G 521G 41.73 0.96 114 16 ssd 0.87320 1.0 894G 492G 401G 55.10 1.27 149 5 ssd 0.87320 1.0 894G 288G 605G 32.25 0.74 87 6 ssd 0.87320 1.0 894G 342G 551G 38.28 0.88 102 7 ssd 0.87320 1.0 894G 300G 593G 33.61 0.78 93 22 ssd 0.87320 1.0 894G 343G 550G 38.43 0.89 104 8 ssd 0.87320 1.0 894G 267G 626G 29.90 0.69 77 9 ssd 0.87320 1.0 894G 376G 518G 42.06 0.97 118 10 ssd 0.87320 1.0 894G 322G 571G 36.12 0.83 102 19 ssd 0.87320 1.0 894G 339G 554G 37.95 0.88 109 12 ssd 0.87320 1.0 894G 360G 534G 40.26 0.93 112 13 ssd 0.87320 1.0 894G 404G 489G 45.21 1.04 120 20 ssd 0.87320 1.0 894G 342G 551G 38.29 0.88 103 23 ssd 0.87320 1.0 894G 148G 745G 16.65 0.38 61 17 ssd 0.87320 1.0 894G 423G 470G 47.34 1.09 117 18 ssd 0.87320 1.0 894G 403G 490G 45.18 1.04 120 21 ssd 0.87320 1.0 894G 444G 450G 49.67 1.15 130 TOTAL 23490G 10170G 13320G 43.30 Karun Josy On Tue, Dec 5, 2017 at 4:42 AM, Karun Josy wrote: > Thank you for detailed explanation! > > Got one another doubt, > > This is the total space available in the cluster : > > TOTAL 23490G > Use 10170G > Avail : 13320G > > > But ecpool shows max avail as just 3 TB. > > > > Karun Josy > > On Tue, Dec 5, 2017 at 1:06 AM, David Turner > wrote: > >> No, I would only add disks to 1 failure domain at a time. So in your >> situation where you're adding 2 more disks to each node, I would recommend >> adding the 2 disks into 1 node at a time. Your failure domain is the >> crush-failure-domain=host. So you can lose a host and only lose 1 copy of >> the data. If all of your pools are using the k=5 m=3 profile, then I would >> say it's fine to add the disks into 2 nodes at a time. If you have any >> replica pools for RGW metadata or anything, then I would stick with the 1 >> host at a time. >> >> On Mon, Dec 4, 2017 at 2:29 PM Karun Josy wrote: >> >>> Thanks for your reply! >>> >>> I am using erasure coded profile with k=5, m=3 settings >>> >>> $ ceph osd erasure-code-profile get profile5by3 >>> crush-device-class= >>> crush-failure-domain=host >>> crush-root=default >>> jerasure-per-chunk-alignment=false >>> k=5 >>> m=3 >>> plugin=jerasure >>> technique=reed_sol_van >>> w=8 >>> >>> >>> Cluster has 8 nodes, with 3 disks each. We are planning to add 2 more on >>> each nodes. >>> >>> If I understand correctly, then I can add 3 disks at once right , >>> assuming 3 disks can fail at a time as per the ec code profile. >>> >>> Karun Josy >>> >>> On Tue, Dec 5, 2017 at 12:06 AM, David Turner >>> wrote: >>> >>>> Depending on how well you burn-in/test your new disks, I like to only >>>> add 1 failure domain of disks at a time in case you have bad disks that >>>> you're adding. 
If you are confident that your disks aren't likely to fail >>>> during the backfilling, then you can go with more. I just added 8 servers >>>> (16 OSDs each) to a cluster with 15 servers (16 OSDs each) all at the same >>>> time, but we spent 2 weeks testing the hardware before adding the new nodes >>>> to the cluster. >>>> >>>> If you add 1 failure domain at a time, then any DoA disks in the new nodes >>>> will only be able to fail with 1 copy of your data instead of across >>>> multiple nodes.
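On the MAX AVAIL question above: ceph df does not report raw free space per pool. For an EC pool the figure is scaled by k/(k+m) — 5/8 usable for this profile — and it is projected from the fullest OSDs the pool maps to, not from the cluster-wide average. As a rough worked example under those assumptions, the reported 3546G usable corresponds to about 3546 x 8/5 ≈ 5674G of raw space, well below the 13320G total avail, because several OSDs are already around 70% used and the projection assumes the current imbalance persists.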
Re: [ceph-users] Adding multiple OSD
Thank you for detailed explanation! Got one another doubt, This is the total space available in the cluster : TOTAL 23490G Use 10170G Avail : 13320G But ecpool shows max avail as just 3 TB. Karun Josy On Tue, Dec 5, 2017 at 1:06 AM, David Turner wrote: > No, I would only add disks to 1 failure domain at a time. So in your > situation where you're adding 2 more disks to each node, I would recommend > adding the 2 disks into 1 node at a time. Your failure domain is the > crush-failure-domain=host. So you can lose a host and only lose 1 copy of > the data. If all of your pools are using the k=5 m=3 profile, then I would > say it's fine to add the disks into 2 nodes at a time. If you have any > replica pools for RGW metadata or anything, then I would stick with the 1 > host at a time. > > On Mon, Dec 4, 2017 at 2:29 PM Karun Josy wrote: > >> Thanks for your reply! >> >> I am using erasure coded profile with k=5, m=3 settings >> >> $ ceph osd erasure-code-profile get profile5by3 >> crush-device-class= >> crush-failure-domain=host >> crush-root=default >> jerasure-per-chunk-alignment=false >> k=5 >> m=3 >> plugin=jerasure >> technique=reed_sol_van >> w=8 >> >> >> Cluster has 8 nodes, with 3 disks each. We are planning to add 2 more on >> each nodes. >> >> If I understand correctly, then I can add 3 disks at once right , >> assuming 3 disks can fail at a time as per the ec code profile. >> >> Karun Josy >> >> On Tue, Dec 5, 2017 at 12:06 AM, David Turner >> wrote: >> >>> Depending on how well you burn-in/test your new disks, I like to only >>> add 1 failure domain of disks at a time in case you have bad disks that >>> you're adding. If you are confident that your disks aren't likely to fail >>> during the backfilling, then you can go with more. I just added 8 servers >>> (16 OSDs each) to a cluster with 15 servers (16 OSDs each) all at the same >>> time, but we spent 2 weeks testing the hardware before adding the new nodes >>> to the cluster. >>> >>> If you add 1 failure domain at a time, then any DoA disks in the new >>> nodes will only be able to fail with 1 copy of your data instead of across >>> multiple nodes. >>> >>> On Mon, Dec 4, 2017 at 12:54 PM Karun Josy wrote: >>> >>>> Hi, >>>> >>>> Is it recommended to add OSD disks one by one or can I add couple of >>>> disks at a time ? >>>> >>>> Current cluster size is about 4 TB. >>>> >>>> >>>> >>>> Karun >>>> ___ >>>> ceph-users mailing list >>>> ceph-users@lists.ceph.com >>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com >>>> >>> >> ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Adding multiple OSD
Thanks for your reply! I am using erasure coded profile with k=5, m=3 settings $ ceph osd erasure-code-profile get profile5by3 crush-device-class= crush-failure-domain=host crush-root=default jerasure-per-chunk-alignment=false k=5 m=3 plugin=jerasure technique=reed_sol_van w=8 Cluster has 8 nodes, with 3 disks each. We are planning to add 2 more on each nodes. If I understand correctly, then I can add 3 disks at once right , assuming 3 disks can fail at a time as per the ec code profile. Karun Josy On Tue, Dec 5, 2017 at 12:06 AM, David Turner wrote: > Depending on how well you burn-in/test your new disks, I like to only add > 1 failure domain of disks at a time in case you have bad disks that you're > adding. If you are confident that your disks aren't likely to fail during > the backfilling, then you can go with more. I just added 8 servers (16 > OSDs each) to a cluster with 15 servers (16 OSDs each) all at the same > time, but we spent 2 weeks testing the hardware before adding the new nodes > to the cluster. > > If you add 1 failure domain at a time, then any DoA disks in the new nodes > will only be able to fail with 1 copy of your data instead of across > multiple nodes. > > On Mon, Dec 4, 2017 at 12:54 PM Karun Josy wrote: > >> Hi, >> >> Is it recommended to add OSD disks one by one or can I add couple of >> disks at a time ? >> >> Current cluster size is about 4 TB. >> >> >> >> Karun >> ___ >> ceph-users mailing list >> ceph-users@lists.ceph.com >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com >> > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] Adding multiple OSD
Hi, Is it recommended to add OSD disks one by one, or can I add a couple of disks at a time ? The current cluster size is about 4 TB. Karun ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] OSD down ( rocksdb: submit_transaction error: Corruption: block checksum mismatch)
Hi, One OSD in the cluster is down. Tried to restart the service, but its still failing. I can see the below error in log file. Can this be a hardware issue ? - -9> 2017-11-23 09:47:37.768969 7f368686a700 3 rocksdb: [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.1/rpm/el7/BUILD/ceph-12.2.1/src/rocksdb/db/db_impl_compaction_flush.cc:1591] Compaction error: Corruption: block checksum mismatch -8> 2017-11-23 09:47:37.768980 7f368686a700 4 rocksdb: (Original Log Time 2017/11/23-09:47:37.768936) [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.1/rpm/el7/BUILD/ceph-12.2.1/src/rocksdb/db/compaction_job.cc:621] [default] compacted to: base level 1 max bytes base 268435456 files[11 1 0 0 0 0 0] max score 0.00, MB/sec: 2.3 rd, 2.0 wr, level 1, files in(11, 1) out(1) MB in(0.1, 7.8) out(7.0), read-write-amplify(202.0) write-amplify(94.6) Corruption: block checksum mismatch, records in: 42 -7> 2017-11-23 09:47:37.768984 7f368686a700 4 rocksdb: (Original Log Time 2017/11/23-09:47:37.768963) EVENT_LOG_v1 {"time_micros": 1511459257768950, "job": 3, "event": "compaction_finished", "compaction_time_micros": 3667366, "output_level": 1, "num_output_files": 1, "total_output_size": 7317366, "num_input_records": 38738, "num_output_records": 37539, "num_subcompactions": 1, "num_single_delete_mismatches": 0, "num_single_delete_fallthrough": 0, "lsm_state": [11, 1, 0, 0, 0, 0, 0]} -6> 2017-11-23 09:47:37.768988 7f368686a700 2 rocksdb: [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.1/rpm/el7/BUILD/ceph-12.2.1/src/rocksdb/db/db_impl_compaction_flush.cc:1275] Waiting after background compaction error: Corruption: block checksum mismatch, Accumulated background error counts: 1 -5> 2017-11-23 09:47:38.245022 7f369a708d00 5 osd.6 pg_epoch: 324 pg[3.98s5(unlocked)] enter Initial -4> 2017-11-23 09:47:38.245256 7f369a708d00 5 osd.6 pg_epoch: 324 pg[3.98s5( empty local-lis/les=323/324 n=0 ec=69/69 lis/c 323/323 les/c/f 324/324/0 323/323/69) [2,11,7,1,0,6,9,3] r=5 lpr=0 crt=0'0 unknown NOTIFY] exit Initial 0.000235 0 0.00 -3> 2017-11-23 09:47:38.245275 7f369a708d00 5 osd.6 pg_epoch: 324 pg[3.98s5( empty local-lis/les=323/324 n=0 ec=69/69 lis/c 323/323 les/c/f 324/324/0 323/323/69) [2,11,7,1,0,6,9,3] r=5 lpr=0 crt=0'0 unknown NOTIFY] enter Reset -2> 2017-11-23 09:47:38.245288 7f369a708d00 5 write_log_and_missing with: dirty_to: 0'0, dirty_from: 4294967295'18446744073709551615, writeout_from: 4294967295'18446744073709551615, trimmed: , trimmed_dups: , clear_divergent_priors: 0 -1> 2017-11-23 09:47:38.245355 7f368806d700 -1 rocksdb: submit_transaction error: Corruption: block checksum mismatch code = 2 Rocksdb transaction: Put( Prefix = M key = 0x052c'.can_rollback_to' Value size = 12) Put( Prefix = M key = 0x052c'.rollback_info_trimmed_to' Value size = 12) Put( Prefix = O key = 0x858003190021213dfffe'o' Value size = 29) 0> 2017-11-23 09:47:38.247357 7f368806d700 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.1/rpm/el7/BUILD/ceph-12.2.1/src/os/bluestore/BlueStore.cc: In function 'void BlueStore::_kv_sync_thread()' thread 7f368806d700 time 2017-11-23 09:47:38.245386 
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.1/rpm/el7/BUILD/ceph-12.2.1/src/os/bluestore/BlueStore.cc: 8453: FAILED assert(r == 0) Karun ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
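A block checksum mismatch surfacing from RocksDB usually means BlueStore read back different bytes than it wrote, so the underlying device is the first suspect. A sketch of the usual checks, assuming the OSD sits on a hypothetical /dev/sdX (substitute the real device):

dmesg | egrep -i 'sdX|ata|medium|i/o error'
smartctl -a /dev/sdX

If the media checks out clean, the remaining option is generally to destroy and redeploy the OSD and let the cluster backfill it.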
[ceph-users] Admin server
Hi, Just a minor doubt :) We have a cluster with 1 admin server, 3 monitors, and 8 OSD nodes. The admin server is used to deploy the cluster. What if the admin server permanently fails? Will it affect the cluster ? Karun ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] How to set osd_max_backfills in Luminous
Thanks! Karun Josy On Wed, Nov 22, 2017 at 5:44 AM, Jean-Charles Lopez wrote: > Hi, > > to check a current value use the following command on the machine where > the OSD you want to check is running > > ceph daemon osd.{id} config show | grep {parameter} > Or > ceph daemon osd.{id} config get {parameter} > > What you are seeing is actually a known glitch where you are being told it > has no effect when in fact it does. See capture below > [root@luminous ceph-deploy]# ceph daemon osd.0 config get > osd_max_backfills > { > "osd_max_backfills": "1" > } > [root@luminous ceph-deploy]# ceph tell osd.* injectargs > '--osd_max_backfills 2' > osd.0: osd_max_backfills = '2' rocksdb_separate_wal_dir = 'false' (not > observed, change may require restart) > osd.1: osd_max_backfills = '2' rocksdb_separate_wal_dir = 'false' (not > observed, change may require restart) > osd.2: osd_max_backfills = '2' rocksdb_separate_wal_dir = 'false' (not > observed, change may require restart) > [root@luminous ceph-deploy]# ceph daemon osd.0 config get > osd_max_backfills > { > "osd_max_backfills": "2" > } > > Regards > JC > > On Nov 21, 2017, at 15:17, Karun Josy wrote: > > Hello, > > We added couple of OSDs to the cluster and the recovery is taking much > time. > > So I tried to increase the osd_max_backfills value dynamically. But its > saying the change may need restart. > > $ ceph tell osd.* injectargs '--osd-max-backfills 5' > osd.0: osd_max_backfills = '5' osd_objectstore = 'bluestore' (not > observed, change may require restart) rocksdb_separate_wal_dir = 'false' > (not observed, change may require restart) > > > = > > The value seems to be not changed too. > > [cephuser@ceph-las-admin-a1 home]$ ceph -n osd.0 --show-config | grep > osd_max_backfills > osd_max_backfills = 1 > > Do I have to really restart all the OSD daemons ? > > > > Karun > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > > > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] How to set osd_max_backfills in Luminous
Hello,

We added a couple of OSDs to the cluster and the recovery is taking a long time. So I tried to increase the osd_max_backfills value dynamically, but it says the change may require a restart.

$ ceph tell osd.* injectargs '--osd-max-backfills 5'
osd.0: osd_max_backfills = '5' osd_objectstore = 'bluestore' (not observed, change may require restart) rocksdb_separate_wal_dir = 'false' (not observed, change may require restart)

=

The value does not seem to have changed either.

[cephuser@ceph-las-admin-a1 home]$ ceph -n osd.0 --show-config | grep osd_max_backfills
osd_max_backfills = 1

Do I really have to restart all the OSD daemons?

Karun
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Reuse pool id
Any suggestions?

Karun Josy

On Mon, Nov 13, 2017 at 10:06 PM, Karun Josy wrote:
> Hi,
>
> Is there anyway we can change or reuse pool id ?
> I had created and deleted lot of test pools. So the IDs kind of look like this now:
>
> ---
> $ ceph osd lspools
> 34 imagepool,37 cvmpool,40 testecpool,41 ecpool1,
> ---
>
> Can I change it to 0,1,2,3 etc ?
>
> Karun

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Incorrect pool usage statistics
Help?! There seem to be many objects still present in the pool:

-
$ rados df
POOL_NAME USED   OBJECTS CLONES COPIES  MISSING_ON_PRIMARY UNFOUND DEGRADED RD_OPS    RD    WR_OPS    WR
vm        886    105     0      315     0                  0       0        943399    1301M 39539     30889M
ecpool    403G   388652  316701 2720564 0                  0       0        156972536 1081G 203383441 4074G
imagepool 89014M 22485   0      67455   0                  0       0        7856029   708G  13140767  602G
template  115G   29848   43     149240  0                  0       0        66138389  2955G 1123900   539G

Karun Josy

On Tue, Nov 14, 2017 at 4:16 AM, Karun Josy wrote:
> Hello,
>
> Recently, I deleted all the disks from an erasure pool 'ecpool'.
> The pool is empty. However the space usage shows around 400GB.
> What might be wrong?
>
> $ rbd ls -l ecpool
> $
> $ ceph df
> GLOBAL:
>     SIZE   AVAIL  RAW USED %RAW USED
>     19019G 16796G 2223G    11.69
> POOLS:
>     NAME      ID USED   %USED MAX AVAIL OBJECTS
>     template  1  227G   1.59  2810G     58549
>     vm        21 0      0     4684G     2
>     ecpool    33 403G   2.79  10038G    388652
>     imagepool 34 90430M 0.62  4684G     22789
>
> Karun Josy

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
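For the archives: when a pool that should be empty still reports objects, sampling what is actually left usually identifies the culprit. A small sketch, assuming the pool is named ecpool:

--
# sample the leftover object names
rados -p ecpool ls | head -20

# count objects by name prefix (rbd_data, rbd_header, ...)
rados -p ecpool ls | cut -d. -f1 | sort | uniq -c | sort -rn
--

The large CLONES count in the rados df output above hints that snapshot clones may still be pinning the space even though the images themselves were deleted.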
[ceph-users] Incorrect pool usage statistics
Hello,

Recently, I deleted all the disks from an erasure pool 'ecpool'. The pool is empty. However the space usage shows around 400GB. What might be wrong?

$ rbd ls -l ecpool
$
$ ceph df
GLOBAL:
    SIZE   AVAIL  RAW USED %RAW USED
    19019G 16796G 2223G    11.69
POOLS:
    NAME      ID USED   %USED MAX AVAIL OBJECTS
    template  1  227G   1.59  2810G     58549
    vm        21 0      0     4684G     2
    ecpool    33 403G   2.79  10038G    388652
    imagepool 34 90430M 0.62  4684G     22789

Karun Josy
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] Reuse pool id
Hi,

Is there any way we can change or reuse a pool id? I had created and deleted a lot of test pools, so the IDs now look like this:

---
$ ceph osd lspools
34 imagepool,37 cvmpool,40 testecpool,41 ecpool1,
---

Can I change them to 0, 1, 2, 3, etc.?

Karun
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] Disconnect a client Hypervisor
Hi,

Do you think there is a way for Ceph to disconnect a hypervisor (HV) client from a cluster? We want to prevent the possibility of two HVs running the same VM. When an HV crashes, we have to make sure that when its VMs are started on a new HV, the disks are not still open on the crashed HV.

I can see 'eviction' for the filesystem: http://docs.ceph.com/docs/master/cephfs/eviction/
But we are implementing RBD on an erasure coded profile.

Karun
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
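For RBD, the usual fencing approach is the exclusive-lock image feature combined with client blacklisting: with exclusive-lock enabled only one client can write to an image at a time, and blacklisting a dead client's address keeps it from ever reacquiring the lock. A sketch of what fencing a crashed hypervisor might look like (pool name, image name, and address are placeholders):

--
# see which client currently holds the lock on the image
rbd lock list vmpool/vm-disk-1

# fence the crashed hypervisor's client address so it can no longer write
ceph osd blacklist add 10.0.0.50:0/3271659458

# the stale lock can then be removed and the image attached elsewhere
rbd lock remove vmpool/vm-disk-1 <lock-id> <locker>
--

The <lock-id> and <locker> values come from the "rbd lock list" output. This works the same whether the image data lives in a replicated or an erasure coded pool.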
[ceph-users] OSD daemons active in nodes after removal
Hello everyone! :)

I have an interesting problem. For a few weeks, we've been testing Luminous in a cluster made up of 8 servers and about 20 SSD disks almost evenly distributed. It is running erasure coding.

Yesterday, we decided to bring the cluster down to a minimum of 8 servers and 1 disk per server. So, we went ahead and removed the additional disks from the Ceph cluster, by executing commands like this from the admin server:

---
$ ceph osd out osd.20
osd.20 is already out.
$ ceph osd down osd.20
marked down osd.20.
$ ceph osd purge osd.20 --yes-i-really-mean-it
Error EBUSY: osd.20 is not `down`.
---

So I logged in to the host it resides on and killed it:

systemctl stop ceph-osd@26

$ ceph osd purge osd.20 --yes-i-really-mean-it
purged osd.20

We waited for the cluster to be healthy once again, and I physically removed the disks (hot swap, connected to an LSI 3008 controller). A few minutes after that, I needed to turn off one of the OSD servers to swap out a piece of hardware inside. So, I issued:

ceph osd set noout

And proceeded to turn off that 1 OSD server. But the interesting thing happened then. Once that 1 server came back up, the cluster all of a sudden showed that out of the 8 nodes, only 2 were up!

8 (2 up, 5 in)

Even more interesting is that it seems Ceph, on each OSD server, still thinks the missing disks are there! When I start Ceph on each OSD server with "systemctl start ceph-osd.target", /var/log/ceph gets filled with logs for disks that are not supposed to exist anymore. The contents of the logs show something like:

# cat /var/log/ceph/ceph-osd.7.log
2017-10-20 08:45:16.389432 7f8ee6e36d00 0 set uid:gid to 167:167 (ceph:ceph)
2017-10-20 08:45:16.389449 7f8ee6e36d00 0 ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable), process (unknown), pid 2591
2017-10-20 08:45:16.389639 7f8ee6e36d00 -1 ** ERROR: unable to open OSD superblock on /var/lib/ceph/osd/ceph-7: (2) No such file or directory
2017-10-20 08:45:36.639439 7fb389277d00 0 set uid:gid to 167:167 (ceph:ceph)

The actual Ceph cluster sees only 8 disks, as you can see here:

$ ceph osd tree
ID  CLASS WEIGHT  TYPE NAME                 STATUS REWEIGHT PRI-AFF
 -1       7.97388 root default
 -3       1.86469     host ceph-las1-a1-osd
  1   ssd 1.86469         osd.1             down         0 1.0
 -5       0.87320     host ceph-las1-a2-osd
  2   ssd 0.87320         osd.2             down         0 1.0
 -7       0.87320     host ceph-las1-a3-osd
  4   ssd 0.87320         osd.4             down   1.0    1.0
 -9       0.87320     host ceph-las1-a4-osd
  8   ssd 0.87320         osd.8             up     1.0    1.0
-11       0.87320     host ceph-las1-a5-osd
 12   ssd 0.87320         osd.12            down   1.0    1.0
-13       0.87320     host ceph-las1-a6-osd
 17   ssd 0.87320         osd.17            up     1.0    1.0
-15       0.87320     host ceph-las1-a7-osd
 21   ssd 0.87320         osd.21            down   1.0    1.0
-17       0.87000     host ceph-las1-a8-osd
 28   ssd 0.87000         osd.28            down         0 1.0

Linux, on the OSD servers, also seems to think the disks are in:

# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sde2  976M 183M  727M  21% /boot
/dev/sdd1   97M 5.4M   92M   6% /var/lib/ceph/osd/ceph-7
/dev/sdc1   97M 5.4M   92M   6% /var/lib/ceph/osd/ceph-6
/dev/sda1   97M 5.4M   92M   6% /var/lib/ceph/osd/ceph-4
/dev/sdb1   97M 5.4M   92M   6% /var/lib/ceph/osd/ceph-5
tmpfs      6.3G    0  6.3G   0% /run/user/0

It should show only one disk, not 4. I tried to issue the removal commands again, this time on the OSD server itself:

$ ceph osd out osd.X
osd.X does not exist.
$ ceph osd purge osd.X --yes-i-really-mean-it
osd.X does not exist

Yet, if I again issue "systemctl start ceph-osd.target", /var/log/ceph again shows logs for a disk that does not exist (to make sure, I deleted all logs prior).

So, it seems, somewhere, Ceph on the OSD servers still thinks there should be more disks? The Ceph cluster is unusable though. We've tried everything to bring it back again. But as Dr. Bones would say, it's dead, Jim.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
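The startup attempts for nonexistent OSDs usually come from host-side leftovers: "ceph osd purge" cleans up the cluster maps, but the systemd units and the mounted data directories on each OSD host survive it, and ceph-osd.target keeps restarting them. A host-side cleanup sketch for an already-purged osd.N (N is a placeholder):

--
systemctl stop ceph-osd@N          # stop any lingering daemon
systemctl disable ceph-osd@N       # keep ceph-osd.target from starting it again
umount /var/lib/ceph/osd/ceph-N    # release the stale mount
rm -rf /var/lib/ceph/osd/ceph-N    # remove the now-empty mount point
--

That should silence the phantom OSD logs; the "2 up" count is a separate problem worth checking against monitor and network connectivity after the reboot.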
Re: [ceph-users] Erasure code profile
Thank you for your reply. I am finding it confusing to understand the profile structure.

Consider a cluster of 8 OSD servers with 3 disks on each server, and a profile setting of k=5, m=3 and ruleset-failure-domain=host:

Encoding rate: r = k / n, where n = k + m, so r = 5 / 8 = 0.625
Storage required: 1 / r = 1 / 0.625 = 1.6 times the original data

Is this correct? And more importantly, will the profile work without failure? As far as I understand it can tolerate the failure of 3 OSDs and 1 host, am I right?

I can't find much information from this link:
http://docs.ceph.com/docs/master/rados/operations/erasure-code-profile/
Is there a better article that I can refer to?

Karun Josy

On Tue, Oct 24, 2017 at 1:23 AM, David Turner wrote:
> This can be changed to a failure domain of OSD in which case it could satisfy the criteria. The problem with a failure domain of OSD, is that all of your data could reside on a single host and you could lose access to your data after restarting a single host.
>
> On Mon, Oct 23, 2017 at 3:23 PM LOPEZ Jean-Charles wrote:
>
>> Hi,
>>
>> the default failure domain if not specified on the CLI at the moment you create your EC profile is set to HOST. So you need 14 OSDs spread across 14 different nodes by default. And you only have 8 different nodes.
>>
>> Regards
>> JC
>>
>> On 23 Oct 2017, at 21:13, Karun Josy wrote:
>>
>> Thank you for the reply.
>>
>> There are 8 OSD nodes with 23 OSDs in total. (However, they are not distributed equally on all nodes)
>>
>> So it satisfies that criteria, right?
>>
>> Karun Josy
>>
>> On Tue, Oct 24, 2017 at 12:30 AM, LOPEZ Jean-Charles wrote:
>>
>>> Hi,
>>>
>>> yes you need as many OSDs that k+m is equal to. In your example you need a minimum of 14 OSDs for each PG to become active+clean.
>>>
>>> Regards
>>> JC
>>>
>>> On 23 Oct 2017, at 20:29, Karun Josy wrote:
>>>
>>> Hi,
>>>
>>> While creating a pool with erasure code profile k=10, m=4, I get PG status as
>>> "200 creating+incomplete"
>>>
>>> While creating pool with profile k=5, m=3 it works fine.
>>>
>>> Cluster has 8 OSDs with total 23 disks.
>>>
>>> Is there any requirements for setting the first profile ?
>>>
>>> Karun
>>> ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>> ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
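For anyone following along, creating and inspecting such a profile looks roughly like this; the profile and pool names are placeholders, and note that Luminous calls the option crush-failure-domain (older releases used ruleset-failure-domain):

--
ceph osd erasure-code-profile set ec53 k=5 m=3 crush-failure-domain=host
ceph osd erasure-code-profile get ec53
ceph osd pool create ecpool53 128 128 erasure ec53
--

With failure domain host and 8 hosts, each PG places one chunk per host, so the data remains readable as long as any k=5 of the 8 chunks survive, i.e. through the loss of up to m=3 whole hosts (and all the OSDs on them).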
Re: [ceph-users] Erasure code profile
Thank you for the reply.

There are 8 OSD nodes with 23 OSDs in total. (However, they are not distributed equally on all nodes)

So it satisfies that criteria, right?

Karun Josy

On Tue, Oct 24, 2017 at 12:30 AM, LOPEZ Jean-Charles wrote:
> Hi,
>
> yes you need as many OSDs that k+m is equal to. In your example you need a minimum of 14 OSDs for each PG to become active+clean.
>
> Regards
> JC
>
> On 23 Oct 2017, at 20:29, Karun Josy wrote:
>
> Hi,
>
> While creating a pool with erasure code profile k=10, m=4, I get PG status as
> "200 creating+incomplete"
>
> While creating pool with profile k=5, m=3 it works fine.
>
> Cluster has 8 OSDs with total 23 disks.
>
> Is there any requirements for setting the first profile ?
>
> Karun
> ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] Erasure code profile
Hi,

While creating a pool with erasure code profile k=10, m=4, I get the PG status
"200 creating+incomplete"

While creating a pool with profile k=5, m=3, it works fine.

The cluster has 8 OSD nodes with 23 disks in total.

Are there any requirements for using the first profile?

Karun
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
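When an EC pool comes up creating+incomplete, a quick way to confirm that CRUSH simply cannot place k+m chunks (rather than something subtler) is, roughly (<profile> is a placeholder):

--
ceph health detail | head                      # lists the incomplete PGs
ceph pg dump_stuck inactive | head             # shows which PGs never went active
ceph osd erasure-code-profile get <profile>    # confirm k, m and the failure domain
--

With a host failure domain, k+m must not exceed the number of hosts: k=10, m=4 needs 14 hosts, while k=5, m=3 fits within 8.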