[ceph-users] Re: OSD not starting

2023-11-05 Thread Amudhan P
Hi Alex,

Thank you very much. Yes, it was a time sync issue; after fixing the time sync
the OSD service started.
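
For anyone hitting the same "unable to obtain rotating service keys" message,
a minimal check sequence is sketched below; it assumes chrony is the time
daemon and that osd.26 runs under cephadm (the fsid in the unit name is a
placeholder).

ceph time-sync-status                            # monitor view of clock skew
chronyc tracking                                 # is this node actually synced?
chronyc makestep                                 # step the clock now if it is far off
systemctl restart ceph-<fsid>@osd.26.service     # restart the OSD once clocks agree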

regards,
Amudhan

On Sat, Nov 4, 2023 at 9:07 PM Alex Gorbachev 
wrote:

> Hi  Amudhan,
>
> Have you checked the time sync?  This could be an issue:
>
> https://tracker.ceph.com/issues/17170
> --
> Alex Gorbachev
> Intelligent Systems Services Inc.
> http://www.iss-integration.com
> https://www.linkedin.com/in/alex-gorbachev-iss/
>
>
>
> On Sat, Nov 4, 2023 at 11:22 AM Amudhan P  wrote:
>
>> Hi,
>>
>> One of the servers in the Ceph cluster shut down abruptly due to a power
>> failure. After restarting, the OSDs are not coming up, and the Ceph health
>> check shows them as down.
>> The OSD status shows "osd.26 18865 unable to obtain rotating service
>> keys; retrying".
>> Every 30 seconds it just logs this message, and it is the same for all
>> OSDs on the node.
>>
>> Nov 04 20:03:05 strg-node-03 bash[34287]: debug
>> 2023-11-04T14:33:05.089+ 7f1f5693c080 -1 osd.26 18865 unable to obtain
>> rotating service keys; retrying
>> Nov 04 20:03:35 strg-node-03 bash[34287]: debug
>> 2023-11-04T14:33:35.090+ 7f1f5693c080 -1 osd.26 18865 unable to obtain
>> rotating service keys; retrying
>>
>> This is ceph version 16.2.7 (dd0603118f56ab514f133c8d2e3adfc983942503) pacific
>> (stable) on Debian 11 bullseye, a cephadm-based installation.
>>
>> I tried searching for the error message but couldn't find anything useful.
>>
>> How do I fix this issue?
>>
>> regards,
>> Amudhan
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] OSD not starting

2023-11-04 Thread Amudhan P
Hi,

One of the servers in the Ceph cluster shut down abruptly due to a power
failure. After restarting, the OSDs are not coming up, and the Ceph health
check shows them as down.
The OSD status shows "osd.26 18865 unable to obtain rotating service
keys; retrying".
Every 30 seconds it just logs this message, and it is the same for all
OSDs on the node.

Nov 04 20:03:05 strg-node-03 bash[34287]: debug
2023-11-04T14:33:05.089+ 7f1f5693c080 -1 osd.26 18865 unable to obtain
rotating service keys; retrying
Nov 04 20:03:35 strg-node-03 bash[34287]: debug
2023-11-04T14:33:35.090+ 7f1f5693c080 -1 osd.26 18865 unable to obtain
rotating service keys; retrying

This is ceph version 16.2.7 (dd0603118f56ab514f133c8d2e3adfc983942503) pacific
(stable) on Debian 11 bullseye, a cephadm-based installation.

I tried searching for the error message but couldn't find anything useful.

How do I fix this issue?

regards,
Amudhan
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph failing to write data - MDSs read only

2023-01-02 Thread Amudhan P
Hi Kotresh,

The issue is fixed for now; I followed the steps below.

I unmounted the kernel client and restarted the MDS service, which brought
the MDS back to normal. Even after this, the "1 MDSs behind on trimming"
warning didn't clear, but after about 20-30 minutes the trimming issue
resolved on its own, and the ceph status is healthy now.

I didn't modify any settings related to the MDS cache; they are at their
default settings.
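
For anyone debugging the same warnings, the usual inspection commands are
sketched below ("mds.<name>" is a placeholder for the active MDS daemon name):

ceph health detail                       # which client id is holding caps / behind on flush
ceph fs status
ceph tell mds.<name> session ls          # per-client session and caps details
ceph tell mds.<name> dump_ops_in_flight  # slow or stuck requests on the MDS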


On Mon, Jan 2, 2023 at 10:54 AM Kotresh Hiremath Ravishankar <
khire...@redhat.com> wrote:

> The MDS requests the clients to release caps to trim caches when there is
> cache pressure or it
> might proactively request the client to release caps in some cases. But
> the client is failing to release the
> caps soon enough in your case.
>
> A few questions:
>
> 1. Have you tuned MDS cache configurations? If so please share.
> 2. Is this kernel client or fuse client?
> 3. Could you please share 'session ls' output?
> 4. Also share the MDS/Client logs.
>
> Sometimes dropping the caches (echo 3 > /proc/sys/vm/drop_caches if it's a
> kclient) or unmounting and remounting
> the problematic client could fix the issue, if that is acceptable.
>
> Thanks and Regards,
> Kotresh H R
>
> On Thu, Dec 29, 2022 at 4:35 PM Amudhan P  wrote:
>
>> Hi,
>>
>> I am suddenly facing an issue with my Ceph cluster (ceph version 16.2.6).
>> I couldn't find any solution for the issue below.
>> Any suggestions?
>>
>>
>> health: HEALTH_WARN
>> 1 clients failing to respond to capability release
>> 1 clients failing to advance oldest client/flush tid
>> 1 MDSs are read only
>> 1 MDSs report slow requests
>> 1 MDSs behind on trimming
>>
>>   services:
>> mon: 3 daemons, quorum strg-node1,strg-node2,strg-node3 (age 9w)
>> mgr: strg-node1.ivkfid(active, since 9w), standbys: strg-node2.unyimy
>> mds: 1/1 daemons up, 1 standby
>> osd: 32 osds: 32 up (since 9w), 32 in (since 5M)
>>
>>   data:
>> volumes: 1/1 healthy
>> pools:   3 pools, 321 pgs
>> objects: 13.19M objects, 45 TiB
>> usage:   90 TiB used, 85 TiB / 175 TiB avail
>> pgs: 319 active+clean
>>  2   active+clean+scrubbing+deep
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>>
>>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] ceph failing to write data - MDSs read only

2022-12-29 Thread Amudhan P
Hi,

I am suddenly facing an issue with my Ceph cluster (ceph version 16.2.6).
I couldn't find any solution for the issue below.
Any suggestions?


health: HEALTH_WARN
1 clients failing to respond to capability release
1 clients failing to advance oldest client/flush tid
1 MDSs are read only
1 MDSs report slow requests
1 MDSs behind on trimming

  services:
mon: 3 daemons, quorum strg-node1,strg-node2,strg-node3 (age 9w)
mgr: strg-node1.ivkfid(active, since 9w), standbys: strg-node2.unyimy
mds: 1/1 daemons up, 1 standby
osd: 32 osds: 32 up (since 9w), 32 in (since 5M)

  data:
volumes: 1/1 healthy
pools:   3 pools, 321 pgs
objects: 13.19M objects, 45 TiB
usage:   90 TiB used, 85 TiB / 175 TiB avail
pgs: 319 active+clean
 2   active+clean+scrubbing+deep
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] ceph mgr alert mail using tls

2021-09-18 Thread Amudhan P
Hi,
I am trying to configure Ceph (version 15.2.3) mgr alert email using an
Office 365 account, and I get the error below.

[WRN] ALERTS_SMTP_ERROR: unable to send alert email
[SSL: WRONG_VERSION_NUMBER] wrong version number (_ssl.c:1056)

I configured the SMTP server with port 587.
I have followed the documentation on the Ceph site; is there any other
configuration needed for TLS?
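
For context, the knobs involved are the mgr alerts module options, roughly as
sketched below (host, user and addresses are placeholders). The
WRONG_VERSION_NUMBER error usually means an implicit-TLS (SMTPS) connection
was opened against a STARTTLS port such as 587, so the smtp_ssl/port
combination has to match what the relay expects; if the module version cannot
do STARTTLS, an implicit-TLS port or a local relay is the common fallback.

ceph config set mgr mgr/alerts/smtp_host smtp.office365.com
ceph config set mgr mgr/alerts/smtp_port 587
ceph config set mgr mgr/alerts/smtp_ssl false        # 587 expects STARTTLS, not TLS-on-connect
ceph config set mgr mgr/alerts/smtp_user alerts@example.com
ceph config set mgr mgr/alerts/smtp_password 'app-password'
ceph config set mgr mgr/alerts/smtp_sender alerts@example.com
ceph config set mgr mgr/alerts/smtp_destination ops@example.com
ceph alerts send                                     # send a test alert immediately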


regards
Amudhan
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: osds crash and restart in octopus

2021-09-03 Thread Amudhan P
I also have a similar problem: in my case the OSDs start and then stop after
a few minutes, and there is not much in the log.

I have filed a bug report and am waiting for a reply to confirm whether it's
a bug or a configuration issue.
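
For anyone filing a similar report, the built-in crash module collects the
evidence to attach (the crash id below is a placeholder):

ceph crash ls-new               # recent, unarchived crashes
ceph crash info <crash-id>      # full backtrace and daemon metadata for one crash
ceph crash archive-all          # clears the "daemons have recently crashed" warning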







On Fri, Sep 3, 2021 at 5:21 PM mahnoosh shahidi 
wrote:

> We still have this problem. Does anybody have any ideas about this?
>
> On Mon, Aug 23, 2021 at 9:53 AM mahnoosh shahidi 
> wrote:
>
> > Hi everyone,
> >
> > We have a problem with octopus 15.2.12. osds randomly crash and restart
> > with the following traceback log.
> >
> > -8> 2021-08-20T15:01:03.165+0430 7f2d10fd7700 10 monclient:
> > handle_auth_request added challenge on 0x55a3fc654400
> > -7> 2021-08-20T15:01:03.201+0430 7f2d02960700  2 osd.202 1145364
> > ms_handle_reset con 0x55a548087000 session 0x55a4be8a4940
> > -6> 2021-08-20T15:01:03.209+0430 7f2d02960700  2 osd.202 1145364
> > ms_handle_reset con 0x55a52aab2800 session 0x55a4497dd0c0
> > -5> 2021-08-20T15:01:03.213+0430 7f2d02960700  2 osd.202 1145364
> > ms_handle_reset con 0x55a548084800 session 0x55a3fca0f860
> > -4> 2021-08-20T15:01:03.217+0430 7f2d02960700  2 osd.202 1145364
> > ms_handle_reset con 0x55a3c5e50800 session 0x55a51c1b7680
> > -3> 2021-08-20T15:01:03.217+0430 7f2d02960700  2 osd.202 1145364
> > ms_handle_reset con 0x55a3c5e52000 session 0x55a4055932a0
> > -2> 2021-08-20T15:01:03.225+0430 7f2d02960700  2 osd.202 1145364
> > ms_handle_reset con 0x55a4b835f800 session 0x55a51c1b90c0
> > -1> 2021-08-20T15:01:03.225+0430 7f2d107d6700 10 monclient:
> > handle_auth_request added challenge on 0x55a3c5e52000
> >  0> 2021-08-20T15:01:03.233+0430 7f2d0ffd5700 -1 *** Caught signal
> > (Segmentation fault) **
> >  in thread 7f2d0ffd5700 thread_name:msgr-worker-2
> >
> >  ceph version 15.2.12 (ce065eabfa5ce81323b009786bdf5bb03127cbe1) octopus
> > (stable)
> >  1: (()+0x12980) [0x7f2d144b0980]
> >  2: (AsyncConnection::_stop()+0x9c) [0x55a37bf56cdc]
> >  3: (ProtocolV2::stop()+0x8b) [0x55a37bf8016b]
> >  4: (ProtocolV2::_fault()+0x6b) [0x55a37bf8030b]
> >  5: (ProtocolV2::handle_read_frame_preamble_main(std::unique_ptr<ceph::buffer::v15_2_0::ptr_node,
> > ceph::buffer::v15_2_0::ptr_node::disposer>&&, int)+0x1d1) [0x55a37bf97d51]
> >  6: (ProtocolV2::run_continuation(Ct&)+0x34) [0x55a37bf80e64]
> >  7: (AsyncConnection::process()+0x5fc) [0x55a37bf59e0c]
> >  8: (EventCenter::process_events(unsigned int,
> > std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> >*)+0x7dd)
> > [0x55a37bda9a2d]
> >  9: (()+0x11d45a8) [0x55a37bdaf5a8]
> >  10: (()+0xbd6df) [0x7f2d13b886df]
> >  11: (()+0x76db) [0x7f2d144a56db]
> >  12: (clone()+0x3f) [0x7f2d1324571f]
> >  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed
> > to interpret this.
> >
> > Our cluster has 220 HDDs and 200 SSDs. We have separate NVMe devices for DB
> > use on the HDD OSDs, and the bucket indexes also have separate SSDs.
> > Does anybody have any idea what the problem could be?
> >
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: OSD stop and fails

2021-08-30 Thread Amudhan P
Gregory,

I have raised a ticket already.
https://tracker.ceph.com/issues/52445

Amudhan

On Tue, Aug 31, 2021 at 12:00 AM Gregory Farnum  wrote:

> Hmm, this ceph_assert hasn't shown up in my email before. It looks
> like there may be a soft-state bug in Octopus. Can you file a ticket
> at tracker.ceph.com with the backtrace and OSD log file? We can direct
> that to the RADOS team to check out.
> -Greg
>
> On Sat, Aug 28, 2021 at 7:13 AM Amudhan P  wrote:
> >
> > Hi,
> >
> > I am having a peculiar problem with my Ceph Octopus cluster. Two weeks ago
> > I had an issue that started with too many scrub errors; later, random OSDs
> > stopped, which led to corrupt PGs and missing replicas. Since it's a testing
> > cluster, I wanted to understand the issue.
> > I tried to recover the PGs but it didn't help. When I set norecover,
> > norebalance and nodown, the OSD services run without stopping.
> >
> > I have gone through the steps in the Ceph OSD troubleshooting guide, but
> > nothing helps or leads to finding the issue.
> >
> > I have mailed earlier but couldn't get any solution.
> >
> > Any help would be appreciated to find out the issue.
> >
> > *error msg in one of  the OSD which failed.*
> >
> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/15.2.7/rpm/el8/BUILD/
> > ceph-15.2.7/src/osd/OSD.cc: 9521: FAILED ceph_assert(started <=
> > reserved_pushes)
> >
> >  ceph version 15.2.7 (88e41c6c49beb18add4fdb6b4326ca466d931db8) octopus
> > (stable)
> >  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> > const*)+0x158) [0x55fcb6621dbe]
> >  2: (()+0x504fd8) [0x55fcb6621fd8]
> >  3: (OSD::do_recovery(PG*, unsigned int, unsigned long,
> > ThreadPool::TPHandle&)+0x5f5) [0x55fcb6704c25]
> >  4: (ceph::osd::scheduler::PGRecovery::run(OSD*, OSDShard*,
> > boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x1d) [0x55fcb6960a3d]
> >  5: (OSD::ShardedOpWQ::_process(unsigned int,
> > ceph::heartbeat_handle_d*)+0x12ef) [0x55fcb67224df]
> >  6: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x5c4)
> > [0x55fcb6d5b224]
> >  7: (ShardedThreadPool::WorkThreadSharded::entry()+0x14) [0x55fcb6d5de84]
> >  8: (()+0x82de) [0x7f04c1b1c2de]
> >  9: (clone()+0x43) [0x7f04c0853e83]
> >
> >  0> 2021-08-28T13:53:37.444+ 7f04a128d700 -1 *** Caught signal
> > (Aborted) **
> >  in thread 7f04a128d700 thread_name:tp_osd_tp
> >
> >  ceph version 15.2.7 (88e41c6c49beb18add4fdb6b4326ca466d931db8) octopus
> > (stable)
> >  1: (()+0x12dd0) [0x7f04c1b26dd0]
> >  2: (gsignal()+0x10f) [0x7f04c078f70f]
> >  3: (abort()+0x127) [0x7f04c0779b25]
> >  4: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> > const*)+0x1a9) [0x55fcb6621e0f]
> >  5: (()+0x504fd8) [0x55fcb6621fd8]
> >  6: (OSD::do_recovery(PG*, unsigned int, unsigned long,
> > ThreadPool::TPHandle&)+0x5f5) [0x55fcb6704c25]
> >  7: (ceph::osd::scheduler::PGRecovery::run(OSD*, OSDShard*,
> > boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x1d) [0x55fcb6960a3d]
> >  8: (OSD::ShardedOpWQ::_process(unsigned int,
> > ceph::heartbeat_handle_d*)+0x12ef) [0x55fcb67224df]
> >  9: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x5c4)
> > [0x55fcb6d5b224]
> >  10: (ShardedThreadPool::WorkThreadSharded::entry()+0x14)
> [0x55fcb6d5de84]
> >  11: (()+0x82de) [0x7f04c1b1c2de]
> >  12: (clone()+0x43) [0x7f04c0853e83]
> >  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed
> > to interpret this.
> >
> >
> > Thanks
> > Amudhan
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
> >
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] OSD stop and fails

2021-08-28 Thread Amudhan P
Hi,

I am having a peculiar problem with my Ceph Octopus cluster. Two weeks ago I
had an issue that started with too many scrub errors; later, random OSDs
stopped, which led to corrupt PGs and missing replicas. Since it's a testing
cluster, I wanted to understand the issue.
I tried to recover the PGs but it didn't help. When I set norecover,
norebalance and nodown, the OSD services run without stopping.

I have gone through the steps in the Ceph OSD troubleshooting guide, but
nothing helps or leads to finding the issue.

I have mailed earlier but couldn't get any solution.

Any help would be appreciated to find out the issue.

*error msg in one of  the OSD which failed.*
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/15.2.7/rpm/el8/BUILD/
ceph-15.2.7/src/osd/OSD.cc: 9521: FAILED ceph_assert(started <=
reserved_pushes)

 ceph version 15.2.7 (88e41c6c49beb18add4fdb6b4326ca466d931db8) octopus
(stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x158) [0x55fcb6621dbe]
 2: (()+0x504fd8) [0x55fcb6621fd8]
 3: (OSD::do_recovery(PG*, unsigned int, unsigned long,
ThreadPool::TPHandle&)+0x5f5) [0x55fcb6704c25]
 4: (ceph::osd::scheduler::PGRecovery::run(OSD*, OSDShard*,
boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x1d) [0x55fcb6960a3d]
 5: (OSD::ShardedOpWQ::_process(unsigned int,
ceph::heartbeat_handle_d*)+0x12ef) [0x55fcb67224df]
 6: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x5c4)
[0x55fcb6d5b224]
 7: (ShardedThreadPool::WorkThreadSharded::entry()+0x14) [0x55fcb6d5de84]
 8: (()+0x82de) [0x7f04c1b1c2de]
 9: (clone()+0x43) [0x7f04c0853e83]

 0> 2021-08-28T13:53:37.444+ 7f04a128d700 -1 *** Caught signal
(Aborted) **
 in thread 7f04a128d700 thread_name:tp_osd_tp

 ceph version 15.2.7 (88e41c6c49beb18add4fdb6b4326ca466d931db8) octopus
(stable)
 1: (()+0x12dd0) [0x7f04c1b26dd0]
 2: (gsignal()+0x10f) [0x7f04c078f70f]
 3: (abort()+0x127) [0x7f04c0779b25]
 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x1a9) [0x55fcb6621e0f]
 5: (()+0x504fd8) [0x55fcb6621fd8]
 6: (OSD::do_recovery(PG*, unsigned int, unsigned long,
ThreadPool::TPHandle&)+0x5f5) [0x55fcb6704c25]
 7: (ceph::osd::scheduler::PGRecovery::run(OSD*, OSDShard*,
boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x1d) [0x55fcb6960a3d]
 8: (OSD::ShardedOpWQ::_process(unsigned int,
ceph::heartbeat_handle_d*)+0x12ef) [0x55fcb67224df]
 9: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x5c4)
[0x55fcb6d5b224]
 10: (ShardedThreadPool::WorkThreadSharded::entry()+0x14) [0x55fcb6d5de84]
 11: (()+0x82de) [0x7f04c1b1c2de]
 12: (clone()+0x43) [0x7f04c0853e83]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed
to interpret this.


Thanks
Amudhan
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Recovery stuck and Multiple PG fails

2021-08-14 Thread Amudhan P
Suresh,

The problem is that some of my OSD services are not stable; they crash
continuously.

I have attached the OSD log lines from around the failure, which were already
captured in debug mode.

Let me know if you need more details.
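
For completeness, raising OSD debug output to capture such a failure usually
looks like the sketch below (osd.7 matches the attached log; the fsid in the
journalctl unit is a placeholder):

ceph config set osd.7 debug_osd 20/20            # verbose OSD-level logging
ceph config set osd.7 debug_bluestore 10/10
ceph tell osd.7 config set debug_ms 1            # runtime-only, lost on restart
journalctl -u ceph-<fsid>@osd.7.service -f       # follow the daemon log on the host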

On Sat, Aug 14, 2021 at 8:10 PM Suresh Rama  wrote:

> Amudhan,
>
> Have you looked at the logs, and did you try enabling debug to see why the
> OSDs are marked down? There should be some reason, right? Just focus on the
> MON, and take one node/OSD with debug enabled to see what is happening.
> https://docs.ceph.com/en/latest/cephadm/operations/.
>
> Thanks,
> Suresh
>
> On Sat, Aug 14, 2021, 9:53 AM Amudhan P  wrote:
>
>> Hi,
>> I am stuck with a Ceph cluster with multiple PG errors: multiple OSDs
>> stopped, and starting the OSDs manually again didn't help; the OSD services
>> stop again. There is no issue with the HDDs for sure, but for some reason
>> the OSDs stop.
>>
>> I am running ceph version 15.2.5 in podman containers.
>>
>> How do I recover these PG failures?
>>
>> Can someone help me recover this, or point out where to look further?
>>
>> pgs: 0.360% pgs not active
>>  124186/5082364 objects degraded (2.443%)
>>  29899/5082364 objects misplaced (0.588%)
>>  670 active+clean
>>  69  active+undersized+remapped
>>  26  active+undersized+degraded+remapped+backfill_wait
>>  16  active+undersized+remapped+backfill_wait
>>  15  active+undersized+degraded+remapped
>>  13  active+clean+remapped
>>  9   active+recovery_wait+degraded
>>  4   active+remapped+backfill_wait
>>  3   stale+down
>>  3   active+undersized+remapped+inconsistent
>>  2   active+recovery_wait+degraded+remapped
>>  1   active+recovering+degraded+remapped
>>  1   active+clean+remapped+inconsistent
>>  1   active+recovering+degraded
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>>
>
Aug 14 20:25:32 node1 bash[29321]: debug-16> 2021-08-14T14:55:32.139+ 
7f2097869700 10 monclient: handle_auth_request added challenge on 0x5564eccdb400
Aug 14 20:25:32 node1 bash[29321]: debug-15> 2021-08-14T14:55:32.139+ 
7f207afc0700  5 osd.7 pg_epoch: 7180 pg[2.cd( v 6838'194480 
(6838'187296,6838'194480] local-lis/les=7171/7172 n=5007 ec=226/226 
lis/c=7176/6927 les/c/f=7177/6928/0 sis=7180) [7,34]/[7,47] r=0 lpr=7180 
pi=[6927,7180)/1 crt=6838'194480 lcod 0'0 mlcod 0'0 remapped+peering mbc={}] 
exit Started/Primary/Peering/GetInfo 0.486478 5 0.000268
Aug 14 20:25:32 node1 bash[29321]: debug-14> 2021-08-14T14:55:32.139+ 
7f207afc0700  5 osd.7 pg_epoch: 7180 pg[2.cd( v 6838'194480 
(6838'187296,6838'194480] local-lis/les=7171/7172 n=5007 ec=226/226 
lis/c=7176/6927 les/c/f=7177/6928/0 sis=7180) [7,34]/[7,47] r=0 lpr=7180 
pi=[6927,7180)/1 crt=6838'194480 lcod 0'0 mlcod 0'0 remapped+peering mbc={}] 
enter Started/Primary/Peering/GetLog
Aug 14 20:25:32 node1 bash[29321]: debug-13> 2021-08-14T14:55:32.139+ 
7f2083fd2700  3 osd.7 7180 handle_osd_map epochs [7180,7180], i have 7180, src 
has [5697,7180]
Aug 14 20:25:32 node1 bash[29321]: debug-12> 2021-08-14T14:55:32.143+ 
7f207afc0700  5 osd.7 pg_epoch: 7180 pg[2.cd( v 6838'194480 
(6838'187296,6838'194480] local-lis/les=7176/7177 n=5007 ec=226/226 
lis/c=7176/6927 les/c/f=7177/6928/0 sis=7180) [7,34]/[7,47] backfill=[34] r=0 
lpr=7180 pi=[6927,7180)/1 crt=6838'194480 lcod 0'0 mlcod 0'0 remapped+peering 
mbc={}] exit Started/Primary/Peering/GetLog 0.004066 2 0.000112
Aug 14 20:25:32 node1 bash[29321]: debug-11> 2021-08-14T14:55:32.143+ 
7f207afc0700  5 osd.7 pg_epoch: 7180 pg[2.cd( v 6838'194480 
(6838'187296,6838'194480] local-lis/les=7176/7177 n=5007 ec=226/226 
lis/c=7176/6927 les/c/f=7177/6928/0 sis=7180) [7,34]/[7,47] backfill=[34] r=0 
lpr=7180 pi=[6927,7180)/1 crt=6838'194480 lcod 0'0 mlcod 0'0 remapped+peering 
mbc={}] enter Started/Primary/Peering/GetMissing
Aug 14 20:25:32 node1 bash[29321]: debug-10> 2021-08-14T14:55:32.143+ 
7f207afc0700  5 osd.7 pg_epoch: 7180 pg[2.cd( v 6838'194480 
(6838'187296,6838'194480] local-lis/les=7176/7177 n=5007 ec=226/226 
lis/c=7176/6927 les/c/f=7177/6928/0 sis=7180) [7,34]/[7,47] backfill=[34] r=0 
lpr=7180 pi=[6927,7180)/1 crt=6838'194480 lcod 0'0 mlcod 0'0 remapped+peering 
mbc={}] exit Started/Primary/Peering/GetMissing 0.57 0 0.00
Aug 14 20:25:32 node1 bash[29321]: debug -9> 2021-08-14T14:55:32.143+ 
7f207afc0700  5 osd.7 pg_epoch: 7180 pg[2.cd( v 6838'194480 
(6838'187296,6838'194480] local-lis/les=7176/7177 n=5007 ec=226/226 
lis/c=71

[ceph-users] Recovery stuck and Multiple PG fails

2021-08-14 Thread Amudhan P
Hi,
I am stuck with a Ceph cluster with multiple PG errors: multiple OSDs
stopped, and starting the OSDs manually again didn't help; the OSD services
stop again. There is no issue with the HDDs for sure, but for some reason the
OSDs stop.

I am running ceph version 15.2.5 in podman containers.

How do I recover these PG failures?

Can someone help me recover this, or point out where to look further?

pgs: 0.360% pgs not active
 124186/5082364 objects degraded (2.443%)
 29899/5082364 objects misplaced (0.588%)
 670 active+clean
 69  active+undersized+remapped
 26  active+undersized+degraded+remapped+backfill_wait
 16  active+undersized+remapped+backfill_wait
 15  active+undersized+degraded+remapped
 13  active+clean+remapped
 9   active+recovery_wait+degraded
 4   active+remapped+backfill_wait
 3   stale+down
 3   active+undersized+remapped+inconsistent
 2   active+recovery_wait+degraded+remapped
 1   active+recovering+degraded+remapped
 1   active+clean+remapped+inconsistent
 1   active+recovering+degraded
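
A sketch of the triage commands that usually apply to a state mix like the one
above (2.cd stands in for any problem PG id):

ceph health detail              # names the worst PGs and the OSDs they are waiting for
ceph pg dump_stuck unclean      # PGs stuck outside active+clean
ceph pg ls stale                # the stale+down PGs and their last acting sets
ceph pg 2.cd query              # peering state and which OSDs block recovery
ceph osd df tree                # confirm the OSDs named above are really up and in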
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph osd continously fails

2021-08-11 Thread Amudhan P
ph::osd::scheduler::PGRecovery::run(OSD*, OSDShard*,
boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x1d) [0x563b46b74
Aug 11 16:55:48 bash[27152]:  8: (OSD::ShardedOpWQ::_process(unsigned int,
ceph::heartbeat_handle_d*)+0x12ef) [0x563b469364df]
Aug 11 16:55:48 bash[27152]:  9:
(ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x5c4)
[0x563b46f6f224]
Aug 11 16:55:48 bash[27152]:  10:
(ShardedThreadPool::WorkThreadSharded::entry()+0x14) [0x563b46f71e84]
Aug 11 16:55:48 bash[27152]:  11: (()+0x82de) [0x7fbf528952de]
Aug 11 16:55:48 bash[27152]:  12: (clone()+0x43) [0x7fbf515cce83]
Aug 11 16:55:48 bash[27152]:  NOTE: a copy of the executable, or `objdump
-rdS <executable>` is needed to interpret this.
Aug 11 16:55:48 bash[27152]: debug -1> 2021-08-11T11:25:48.045+
7fbf3f9f4700 10 monclient: tick
Aug 11 16:55:48 bash[27152]: debug  0> 2021-08-11T11:25:48.045+
7fbf3f9f4700 10 monclient: _check_auth_rotating have uptodate secrets (they
expire af
Aug 11 16:55:48 bash[27152]: --- logging levels ---
Aug 11 16:55:48 bash[27152]:0/ 5 none
Aug 11 16:55:48 bash[27152]:0/ 1 lockdep
Aug 11 16:55:48 bash[27152]:0/ 1 context
Aug 11 16:55:48 bash[27152]:1/ 1 crush
Aug 11 16:55:48 bash[27152]:1/ 5 mds
Aug 11 16:55:48 bash[27152]:1/ 5 mds_balancer
Aug 11 16:55:48 bash[27152]:1/ 5 mds_locker
Aug 11 16:55:48 bash[27152]:1/ 5 mds_log
Aug 11 16:55:48 bash[27152]: --- pthread ID / name mapping for recent
threads ---
Aug 11 16:55:48 bash[27152]:   7fbf30002700 / osd_srv_heartbt
Aug 11 16:55:48 bash[27152]:   7fbf30803700 / tp_osd_tp
Aug 11 16:55:48 bash[27152]:   7fbf31004700 / tp_osd_tp
Aug 11 16:55:48 bash[27152]:   7fbf31805700 / tp_osd_tp
Aug 11 16:55:48 bash[27152]:   7fbf32006700 / tp_osd_tp
Aug 11 16:55:48 bash[27152]:   7fbf32807700 / tp_osd_tp
Aug 11 16:55:48 bash[27152]:   7fbf39815700 / ms_dispatch
Aug 11 16:55:48 bash[27152]:   7fbf3a817700 / ms_dispatch
Aug 11 16:55:48 bash[27152]:   7fbf3b819700 / ms_dispatch
Aug 11 16:55:48 bash[27152]:   7fbf3c81b700 / rocksdb:dump_st
Aug 11 16:55:48 bash[27152]:   7fbf3d617700 / fn_anonymous
Aug 11 16:55:48 bash[27152]:   7fbf3e619700 / cfin
Aug 11 16:55:48 bash[27152]:   7fbf3f9f4700 / safe_timer
Aug 11 16:55:48 bash[27152]:   7fbf409f6700 / ms_dispatch
Aug 11 16:55:48 bash[27152]:   7fbf43623700 / bstore_mempool
Aug 11 16:55:48 bash[27152]:   7fbf48833700 / fn_anonymous
Aug 11 16:55:48 bash[27152]:   7fbf4a036700 / safe_timer
Aug 11 16:55:48 bash[27152]:   7fbf4b8a9700 / safe_timer
Aug 11 16:55:48 bash[27152]:   7fbf4c0aa700 / signal_handler
Aug 11 16:55:48 bash[27152]:   7fbf4d0ac700 / admin_socket
Aug 11 16:55:48 bash[27152]:   7fbf4d8ad700 / service
Aug 11 16:55:48 bash[27152]:   7fbf4e0ae700 / msgr-worker-2
Aug 11 16:55:48 bash[27152]:   7fbf4e8af700 / msgr-worker-1
Aug 11 16:55:48 bash[27152]:   7fbf4f0b0700 / msgr-worker-0
Aug 11 16:55:48 bash[27152]:   7fbf54b2cf40 / ceph-osd
Aug 11 16:55:48 bash[27152]:   max_recent 1
Aug 11 16:55:48 bash[27152]:   max_new 1000
Aug 11 16:55:48 bash[27152]:   log_file
/var/lib/ceph/crash/2021-08-11T11:25:47.930411Z_a06defcc-19c6-41df-a37d-c071166cdcf3/log
Aug 11 16:55:48 bash[27152]: --- end dump of recent events ---
Aug 11 16:55:48 bash[27152]: reraise_fatal: default handler for signal 6
didn't terminate the process?

On Wed, Aug 11, 2021 at 5:53 PM Amudhan P  wrote:

> Hi,
> I am using ceph version 15.2.7 in a 4-node cluster. My OSDs keep stopping,
> and even if I start them again they stop after some time. I couldn't find
> anything in the logs.
> I have set norecover and nobackfill; as soon as I unset norecover, the OSDs
> start to fail.
>
>  cluster:
> id: b6437922-3edf-11eb-adc2-0cc47a5ec98a
> health: HEALTH_ERR
> 1/6307061 objects unfound (0.000%)
> noout,nobackfill,norebalance,norecover,noscrub,nodeep-scrub
> flag(s) set
> 19 osds down
> 62477 scrub errors
> Reduced data availability: 75 pgs inactive, 12 pgs down, 57
> pgs peering, 90 pgs stale
> Possible data damage: 1 pg recovery_unfound, 7 pgs inconsistent
> Degraded data redundancy: 3090660/12617416 objects degraded
> (24.495%), 394 pgs degraded, 399 pgs undersized
> 5 pgs not deep-scrubbed in time
> 127 daemons have recently crashed
>
>   data:
> pools:   4 pools, 833 pgs
> objects: 6.31M objects, 23 TiB
> usage:   47 TiB used, 244 TiB / 291 TiB avail
> pgs: 9.004% pgs not active
>  3090660/12617416 objects degraded (24.495%)
>  315034/12617416 objects misplaced (2.497%)
>  1/6307061 objects unfound (0.000%)
>  368 active+undersized+degraded
>  299 active+clean
>  56  stale+peering
>  24  stale+active+clean
>  15  active+recovery_wait
>  12  active+unde

[ceph-users] ceph osd continously fails

2021-08-11 Thread Amudhan P
Hi,
I am using ceph version 15.2.7 in a 4-node cluster. My OSDs keep stopping,
and even if I start them again they stop after some time. I couldn't find
anything in the logs.
I have set norecover and nobackfill; as soon as I unset norecover, the OSDs
start to fail.

 cluster:
id: b6437922-3edf-11eb-adc2-0cc47a5ec98a
health: HEALTH_ERR
1/6307061 objects unfound (0.000%)
noout,nobackfill,norebalance,norecover,noscrub,nodeep-scrub
flag(s) set
19 osds down
62477 scrub errors
Reduced data availability: 75 pgs inactive, 12 pgs down, 57 pgs
peering, 90 pgs stale
Possible data damage: 1 pg recovery_unfound, 7 pgs inconsistent
Degraded data redundancy: 3090660/12617416 objects degraded
(24.495%), 394 pgs degraded, 399 pgs undersized
5 pgs not deep-scrubbed in time
127 daemons have recently crashed

  data:
pools:   4 pools, 833 pgs
objects: 6.31M objects, 23 TiB
usage:   47 TiB used, 244 TiB / 291 TiB avail
pgs: 9.004% pgs not active
 3090660/12617416 objects degraded (24.495%)
 315034/12617416 objects misplaced (2.497%)
 1/6307061 objects unfound (0.000%)
 368 active+undersized+degraded
 299 active+clean
 56  stale+peering
 24  stale+active+clean
 15  active+recovery_wait
 12  active+undersized+remapped
 11  active+undersized+degraded+remapped+backfill_wait
 11  down
 7   active+recovery_wait+degraded
 7   active+clean+remapped
 5   active+clean+remapped+inconsistent
 5   stale+activating+undersized
 4   active+recovering+degraded
 2   stale+active+recovery_wait+degraded
 1   active+recovery_unfound+undersized+degraded+remapped
 1   stale+remapped+peering
 1   stale+activating
 1   stale+down
 1   active+remapped+backfill_wait
 1   active+undersized+remapped+inconsistent
 1
active+undersized+degraded+remapped+inconsistent+backfill_wait


What needs to be done to recover this?
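
For the scrub-error part specifically, a typical starting point is sketched
below (pool name and PG id are placeholders); repairs should only be run once
the flapping OSDs are stable, since repair needs the acting set online.

ceph health detail | grep -i inconsistent                 # which PGs carry the scrub errors
rados list-inconsistent-pg <pool-name>                    # inconsistent PGs in one pool
rados list-inconsistent-obj <pgid> --format=json-pretty   # which replicas/shards differ
ceph pg repair <pgid>                                     # ask the primary to repair that PG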
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Not able to read file from ceph kernel mount

2020-11-12 Thread Amudhan P
Hi,

This issue is fixed now after setting the cluster network on the OSDs only.
The mount works perfectly fine.

"ceph config set osd cluster_network 10.100.4.0/24"

regards
Amudhan
On Sat, Nov 7, 2020 at 10:09 PM Amudhan P  wrote:

> Hi,
>
> At last, the problem is fixed for now by adding a cluster-network IP to the
> client's second interface.
>
> But it looks weird: why does the client want to communicate over the cluster IP?
>
> Does anyone have an idea why we need to provide the cluster IP to a client
> mounting through the kernel?
>
> Initially, when the cluster was set up, it had only a public network. Later I
> added the cluster network, and it was working fine until the entire cluster
> was restarted.
>
> regards
> Amudhan P
>
> On Fri, Nov 6, 2020 at 12:02 AM Amudhan P  wrote:
>
>> Hi,
>> I am trying to read file from my ceph kernel mount and file read stays in
>> bytes for very long and I am getting below error msg in dmesg.
>>
>> [  167.591095] ceph: loaded (mds proto 32)
>> [  167.600010] libceph: mon0 10.0.103.1:6789 session established
>> [  167.601167] libceph: client144519 fsid f8bc7682-0d11-11eb-a332-
>> 0cc47a5ec98a
>> [  272.132787] libceph: osd1 10.0.104.1:6891 socket closed (con state
>> CONNECTING)
>>
>> Ceph cluster status is healthy no error It was working fine until before
>> my entire cluster was down.
>>
>> Using Ceph octopus in debian.
>>
>> Regards
>> Amudhan P
>>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Cephfs Kernel client not working properly without ceph cluster IP

2020-11-12 Thread Amudhan P
Hi Eugen,

The issue looks fixed now: my kernel client mount works fine without a
cluster IP.

I have re-run "ceph config set osd cluster_network 10.100.4.0/24" and
restarted all services. Earlier it had been run as "ceph config set global
cluster_network 10.100.4.0/24".

I have run the commands you asked for; the output below is from after
applying all the changes described above.
# ceph config get mon cluster_network
output :
# ceph config get mon public_network
output : 10.100.3.0/24

I am still testing this further to confirm the cause while experimenting with
my ceph cluster.

On Wed, Nov 11, 2020 at 2:14 PM Eugen Block  wrote:

> > Do you find any issue in the below commands I have used to set cluster IP
> > in cluster.
>
> Yes I do:
>
> > ### adding public IP for ceph cluster ###
> > ceph config set global cluster_network 10.100.4.0/24
>
> I'm still not convinced that your setup is as you want it to be.
> Can you share your actual config?
>
> ceph config get mon cluster_network
> ceph config get mon public_network
>
>
>
> Zitat von Amudhan P :
>
> > Hi Eugen,
> >
> > I have only added my Public IP and relevant hostname to hosts file.
> >
> > Do you find any issue in the below commands I have used to set cluster IP
> > in cluster.
> >
> > ### adding public IP for ceph cluster ###
> > ceph config set global cluster_network 10.100.4.0/24
> >
> > ceph orch daemon reconfig mon.host1
> > ceph orch daemon reconfig mon.host2
> > ceph orch daemon reconfig mon.host3
> > ceph orch daemon reconfig osd.1
> > ceph orch daemon reconfig osd.2
> > ceph orch daemon reconfig osd.3
> >
> > restarting all daemons.
> >
> > regards
> > Amudhan
> >
> > On Tue, Nov 10, 2020 at 7:42 PM Eugen Block  wrote:
> >
> >> Could it be possible that you have some misconfiguration in the name
> >> resolution and IP mapping? I've never heard or experienced that a
> >> client requires a cluster address, that would make the whole concept
> >> of separate networks obsolete which is hard to believe, to be honest.
> >> I would recommend to double-check your setup.
> >>
> >>
> >> Zitat von Amudhan P :
> >>
> >> > Hi Nathan,
> >> >
> >> > Kernel client should be using only the public IP of the cluster to
> >> > communicate with OSD's.
> >> >
> >> > But here it requires both IP's for mount to work properly.
> >> >
> >> > regards
> >> > Amudhan
> >> >
> >> >
> >> >
> >> > On Mon, Nov 9, 2020 at 9:51 PM Nathan Fish 
> wrote:
> >> >
> >> >> It sounds like your client is able to reach the mon but not the OSD?
> >> >> It needs to be able to reach all mons and all OSDs.
> >> >>
> >> >> On Sun, Nov 8, 2020 at 4:29 AM Amudhan P 
> wrote:
> >> >> >
> >> >> > Hi,
> >> >> >
> >> >> > I have mounted my cephfs (ceph octopus) thru kernel client in
> Debian.
> >> >> > I get following error in "dmesg" when I try to read any file from
> my
> >> >> mount.
> >> >> > "[  236.429897] libceph: osd1 10.100.4.1:6891 socket closed (con
> >> state
> >> >> > CONNECTING)"
> >> >> >
> >> >> > I use public IP (10.100.3.1) and cluster IP (10.100.4.1) in my ceph
> >> >> > cluster. I think public IP is enough to mount the share and work
> on it
> >> >> but
> >> >> > in my case, it needs me to assign public IP also to the client to
> work
> >> >> > properly.
> >> >> >
> >> >> > Does anyone have experience in this?
> >> >> >
> >> >> > I have earlier also mailed the ceph-user group but I didn't get any
> >> >> > response. So sending again not sure my mail went through.
> >> >> >
> >> >> > regards
> >> >> > Amudhan
> >> >> > ___
> >> >> > ceph-users mailing list -- ceph-users@ceph.io
> >> >> > To unsubscribe send an email to ceph-users-le...@ceph.io
> >> >>
> >> > ___
> >> > ceph-users mailing list -- ceph-users@ceph.io
> >> > To unsubscribe send an email to ceph-users-le...@ceph.io
> >>
> >>
> >> ___
> >> ceph-users mailing list -- ceph-users@ceph.io
> >> To unsubscribe send an email to ceph-users-le...@ceph.io
> >>
>
>
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Cephfs Kernel client not working properly without ceph cluster IP

2020-11-10 Thread Amudhan P
Hi Eugen,

I have only added my Public IP and relevant hostname to hosts file.

Do you find any issue in the below commands I have used to set cluster IP
in cluster.

### adding public IP for ceph cluster ###
ceph config set global cluster_network 10.100.4.0/24

ceph orch daemon reconfig mon.host1
ceph orch daemon reconfig mon.host2
ceph orch daemon reconfig mon.host3
ceph orch daemon reconfig osd.1
ceph orch daemon reconfig osd.2
ceph orch daemon reconfig osd.3

restarting all daemons.

regards
Amudhan

On Tue, Nov 10, 2020 at 7:42 PM Eugen Block  wrote:

> Could it be possible that you have some misconfiguration in the name
> resolution and IP mapping? I've never heard or experienced that a
> client requires a cluster address, that would make the whole concept
> of separate networks obsolete which is hard to believe, to be honest.
> I would recommend to double-check your setup.
>
>
> Zitat von Amudhan P :
>
> > Hi Nathan,
> >
> > Kernel client should be using only the public IP of the cluster to
> > communicate with OSD's.
> >
> > But here it requires both IP's for mount to work properly.
> >
> > regards
> > Amudhan
> >
> >
> >
> > On Mon, Nov 9, 2020 at 9:51 PM Nathan Fish  wrote:
> >
> >> It sounds like your client is able to reach the mon but not the OSD?
> >> It needs to be able to reach all mons and all OSDs.
> >>
> >> On Sun, Nov 8, 2020 at 4:29 AM Amudhan P  wrote:
> >> >
> >> > Hi,
> >> >
> >> > I have mounted my cephfs (ceph octopus) thru kernel client in Debian.
> >> > I get following error in "dmesg" when I try to read any file from my
> >> mount.
> >> > "[  236.429897] libceph: osd1 10.100.4.1:6891 socket closed (con
> state
> >> > CONNECTING)"
> >> >
> >> > I use public IP (10.100.3.1) and cluster IP (10.100.4.1) in my ceph
> >> > cluster. I think public IP is enough to mount the share and work on it
> >> but
> >> > in my case, it needs me to assign public IP also to the client to work
> >> > properly.
> >> >
> >> > Does anyone have experience in this?
> >> >
> >> > I have earlier also mailed the ceph-user group but I didn't get any
> >> > response. So sending again not sure my mail went through.
> >> >
> >> > regards
> >> > Amudhan
> >> > ___
> >> > ceph-users mailing list -- ceph-users@ceph.io
> >> > To unsubscribe send an email to ceph-users-le...@ceph.io
> >>
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Cephfs Kernel client not working properly without ceph cluster IP

2020-11-10 Thread Amudhan P
Hi Janne,

My OSDs have both the public IP and the cluster IP configured. The monitor
nodes and OSD nodes are co-located.

regards
Amudhan P

On Tue, Nov 10, 2020 at 4:45 PM Janne Johansson  wrote:

>
>
> Den tis 10 nov. 2020 kl 11:13 skrev Amudhan P :
>
>> Hi Nathan,
>>
>> Kernel client should be using only the public IP of the cluster to
>> communicate with OSD's.
>>
>
> "ip of the cluster" is a bit weird way to state it.
>
> A mounting client needs only to talk to ips in the public range yes, but
> OSDs always need to have an ip in the public range too.
> The private range is only for OSD<->OSD traffic and can be in the private
> network, meaning an OSD which uses both private and public ranges needs two
> ips, one in each range.
>
>
>
>> But here it requires both IP's for mount to work properly.
>>
>> regards
>> Amudhan
>>
>>
>>
>> On Mon, Nov 9, 2020 at 9:51 PM Nathan Fish  wrote:
>>
>> > It sounds like your client is able to reach the mon but not the OSD?
>> > It needs to be able to reach all mons and all OSDs.
>> >
>> > On Sun, Nov 8, 2020 at 4:29 AM Amudhan P  wrote:
>> > >
>> > > Hi,
>> > >
>> > > I have mounted my cephfs (ceph octopus) thru kernel client in Debian.
>> > > I get following error in "dmesg" when I try to read any file from my
>> > mount.
>> > > "[  236.429897] libceph: osd1 10.100.4.1:6891 socket closed (con
>> state
>> > > CONNECTING)"
>> > >
>> > > I use public IP (10.100.3.1) and cluster IP (10.100.4.1) in my ceph
>> > > cluster. I think public IP is enough to mount the share and work on it
>> > but
>> > > in my case, it needs me to assign public IP also to the client to work
>> > > properly.
>> > >
>> > > Does anyone have experience in this?
>> > >
>> > > I have earlier also mailed the ceph-user group but I didn't get any
>> > > response. So sending again not sure my mail went through.
>> > >
>> > > regards
>> > > Amudhan
>> > > ___
>> > > ceph-users mailing list -- ceph-users@ceph.io
>> > > To unsubscribe send an email to ceph-users-le...@ceph.io
>> >
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>>
>
>
> --
> May the most significant bit of your life be positive.
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Cephfs Kernel client not working properly without ceph cluster IP

2020-11-10 Thread Amudhan P
Hi Nathan,

The kernel client should be using only the public IPs of the cluster to
communicate with the OSDs.

But here it requires both IPs for the mount to work properly.

regards
Amudhan



On Mon, Nov 9, 2020 at 9:51 PM Nathan Fish  wrote:

> It sounds like your client is able to reach the mon but not the OSD?
> It needs to be able to reach all mons and all OSDs.
>
> On Sun, Nov 8, 2020 at 4:29 AM Amudhan P  wrote:
> >
> > Hi,
> >
> > I have mounted my cephfs (ceph octopus) thru kernel client in Debian.
> > I get following error in "dmesg" when I try to read any file from my
> mount.
> > "[  236.429897] libceph: osd1 10.100.4.1:6891 socket closed (con state
> > CONNECTING)"
> >
> > I use public IP (10.100.3.1) and cluster IP (10.100.4.1) in my ceph
> > cluster. I think public IP is enough to mount the share and work on it
> but
> > in my case, it needs me to assign public IP also to the client to work
> > properly.
> >
> > Does anyone have experience in this?
> >
> > I have earlier also mailed the ceph-user group but I didn't get any
> > response. So sending again not sure my mail went through.
> >
> > regards
> > Amudhan
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Cephfs Kernel client not working properly without ceph cluster IP

2020-11-10 Thread Amudhan P
Hi Eugen,

Yes, you're right: other than the OSDs, nothing requires the cluster IP.

But in my case, I don't know what went wrong; my kernel client requires the
cluster IP for the mount to work properly.

About my setup:
the cluster was initially bootstrapped with the public network only, and the
cluster network was added later with the steps below.

### adding public IP for ceph cluster ###
ceph config set global cluster_network 10.100.4.0/24

ceph orch daemon reconfig mon.host1
ceph orch daemon reconfig mon.host2
ceph orch daemon reconfig mon.host3
ceph orch daemon reconfig osd.1
ceph orch daemon reconfig osd.2
ceph orch daemon reconfig osd.3

restarting all daemons.
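
For comparison, a conventional two-network layout with these example ranges
would be declared roughly as below; the client should then never need an
address in 10.100.4.0/24.

ceph config set global public_network 10.100.3.0/24
ceph config set osd cluster_network 10.100.4.0/24    # the cluster network only matters to OSDs
ceph config get mon public_network
ceph config get osd cluster_network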

regards
Amudhan P

On Mon, Nov 9, 2020 at 9:49 PM Eugen Block  wrote:

> Clients don't need the cluster IP because that's only for OSD <--> OSD
> replication, no client traffic. But of course to be able to
> communicate with Ceph the clients need a public IP, how else would
> they contact the MON? Or did I misunderstand your setup?
>
>
> Zitat von Amudhan P :
>
> > Hi,
> >
> > I have mounted my cephfs (ceph octopus) thru kernel client in Debian.
> > I get following error in "dmesg" when I try to read any file from my
> mount.
> > "[  236.429897] libceph: osd1 10.100.4.1:6891 socket closed (con state
> > CONNECTING)"
> >
> > I use public IP (10.100.3.1) and cluster IP (10.100.4.1) in my ceph
> > cluster. I think public IP is enough to mount the share and work on it
> but
> > in my case, it needs me to assign public IP also to the client to work
> > properly.
> >
> > Does anyone have experience in this?
> >
> > I have earlier also mailed the ceph-user group but I didn't get any
> > response. So sending again not sure my mail went through.
> >
> > regards
> > Amudhan
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: pg xyz is stuck undersized for long time

2020-11-08 Thread Amudhan P
Hi Frank,

You said only one OSD is down, but the ceph status shows more than 20 OSDs
down.

Regards,
Amudhan

On Sun 8 Nov, 2020, 12:13 AM Frank Schilder,  wrote:

> Hi all,
>
> I moved the crush location of 8 OSDs and rebalancing went on happily
> (misplaced objects only). Today, osd.1 crashed, restarted and rejoined the
> cluster. However, it seems not to re-join some PGs it was a member of. I
> have now undersized PGs for no real reason I would believe:
>
> PG_DEGRADED Degraded data redundancy: 52173/2268789087 objects degraded
> (0.002%), 2 pgs degraded, 7 pgs undersized
> pg 11.52 is stuck undersized for 663.929664, current state
> active+undersized+remapped+backfilling, last acting
> [237,60,2147483647,74,233,232,292,86]
>
> The up and acting sets are:
>
> "up": [
> 237,
> 2,
> 74,
> 289,
> 233,
> 232,
> 292,
> 86
> ],
> "acting": [
> 237,
> 60,
> 2147483647,
> 74,
> 233,
> 232,
> 292,
> 86
> ],
>
> How can I get the PG to complete peering and osd.1 to join? I have an
> unreasonable number of degraded objects where the missing part is on this
> OSD.
>
> For completeness, here the cluster status:
>
> # ceph status
>   cluster:
> id: ...
> health: HEALTH_ERR
> noout,norebalance flag(s) set
> 1 large omap objects
> 35815902/2268938858 objects misplaced (1.579%)
> Degraded data redundancy: 46122/2268938858 objects degraded
> (0.002%), 2 pgs degraded, 7 pgs undersized
> Degraded data redundancy (low space): 28 pgs backfill_toofull
>
>   services:
> mon: 3 daemons, quorum ceph-01,ceph-02,ceph-03
> mgr: ceph-01(active), standbys: ceph-03, ceph-02
> mds: con-fs2-1/1/1 up  {0=ceph-08=up:active}, 1 up:standby-replay
> osd: 299 osds: 275 up, 275 in; 301 remapped pgs
>  flags noout,norebalance
>
>   data:
> pools:   11 pools, 3215 pgs
> objects: 268.8 M objects, 675 TiB
> usage:   854 TiB used, 1.1 PiB / 1.9 PiB avail
> pgs: 46122/2268938858 objects degraded (0.002%)
>  35815902/2268938858 objects misplaced (1.579%)
>  2907 active+clean
>  219  active+remapped+backfill_wait
>  47   active+remapped+backfilling
>  28   active+remapped+backfill_wait+backfill_toofull
>  6active+clean+scrubbing+deep
>  5active+undersized+remapped+backfilling
>  2active+undersized+degraded+remapped+backfilling
>  1active+clean+scrubbing
>
>   io:
> client:   13 MiB/s rd, 196 MiB/s wr, 2.82 kop/s rd, 1.81 kop/s wr
> recovery: 57 MiB/s, 14 objects/s
>
> Thanks and best regards,
> =
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Cephfs Kernel client not working properly without ceph cluster IP

2020-11-08 Thread Amudhan P
Hi,

I have mounted my cephfs (Ceph Octopus) through the kernel client on Debian.
I get the following error in dmesg when I try to read any file from my mount:
"[  236.429897] libceph: osd1 10.100.4.1:6891 socket closed (con state
CONNECTING)"

I use a public network (10.100.3.1) and a cluster network (10.100.4.1) in my
ceph cluster. I think the public IP should be enough to mount the share and
work on it, but in my case the client also needs an address in the cluster
network to work properly.
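
For reference, a plain kernel mount only ever names the public MON address;
something like the sketch below (mount point and secret file are placeholders)
should be all a client needs once the OSDs also advertise public addresses.

mount -t ceph 10.100.3.1:6789:/ /mnt/cephfs -o name=admin,secretfile=/etc/ceph/admin.secret
dmesg | tail     # afterwards, watch for "socket closed" errors against 10.100.4.x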

Does anyone have experience with this?

I mailed the ceph-users group about this earlier but didn't get any response,
so I am sending it again; I am not sure whether my mail went through.

regards
Amudhan
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Not able to read file from ceph kernel mount

2020-11-07 Thread Amudhan P
Hi,

At last, the problem is fixed for now by adding a cluster-network IP to the
client's second interface.

But it looks weird: why does the client want to communicate over the cluster IP?

Does anyone have an idea why we need to provide the cluster IP to a client
mounting through the kernel?

Initially, when the cluster was set up, it had only a public network. Later I
added the cluster network, and it was working fine until the entire cluster
was restarted.

regards
Amudhan P

On Fri, Nov 6, 2020 at 12:02 AM Amudhan P  wrote:
>
>> Hi,
>> I am trying to read file from my ceph kernel mount and file read stays in
>> bytes for very long and I am getting below error msg in dmesg.
>>
>> [  167.591095] ceph: loaded (mds proto 32)
>> [  167.600010] libceph: mon0 10.0.103.1:6789 session established
>> [  167.601167] libceph: client144519 fsid f8bc7682-0d11-11eb-a332-
>> 0cc47a5ec98a
>> [  272.132787] libceph: osd1 10.0.104.1:6891 socket closed (con state
>> CONNECTING)
>>
>> Ceph cluster status is healthy no error It was working fine until before
>> my entire cluster was down.
>>
>> Using Ceph octopus in debian.
>>
>> Regards
>> Amudhan P
>>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Not able to read file from ceph kernel mount

2020-11-07 Thread Amudhan P
Hi,

At last, the problem is fixed for now by adding a cluster-network IP to the
client's second interface.

But it looks weird: why does the client want to communicate over the cluster IP?

Does anyone have an idea why we need to provide the cluster IP to a client
mounting through the kernel?

Initially, when the cluster was set up, it had only a public network. Later I
added the cluster network, and it was working fine until the entire cluster
was restarted.

regards
Amudhan P

On Fri, Nov 6, 2020 at 12:02 AM Amudhan P  wrote:

> Hi,
> I am trying to read file from my ceph kernel mount and file read stays in
> bytes for very long and I am getting below error msg in dmesg.
>
> [  167.591095] ceph: loaded (mds proto 32)
> [  167.600010] libceph: mon0 10.0.103.1:6789 session established
> [  167.601167] libceph: client144519 fsid f8bc7682-0d11-11eb-a332-
> 0cc47a5ec98a
> [  272.132787] libceph: osd1 10.0.104.1:6891 socket closed (con state
> CONNECTING)
>
> Ceph cluster status is healthy no error It was working fine until before
> my entire cluster was down.
>
> Using Ceph octopus in debian.
>
> Regards
> Amudhan P
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Not able to read file from ceph kernel mount

2020-11-05 Thread Amudhan P
Hi,
I am trying to read a file from my ceph kernel mount; the read rate stays at
a few bytes for a very long time, and I get the error message below in dmesg.

[  167.591095] ceph: loaded (mds proto 32)
[  167.600010] libceph: mon0 10.0.103.1:6789 session established
[  167.601167] libceph: client144519 fsid f8bc7682-0d11-11eb-a332-
0cc47a5ec98a
[  272.132787] libceph: osd1 10.0.104.1:6891 socket closed (con state
CONNECTING)

The Ceph cluster status is healthy with no errors. It was working fine until
my entire cluster went down.

Using Ceph Octopus on Debian.

Regards
Amudhan P
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Fwd: File read are not completing and IO shows in bytes able to not reading from cephfs

2020-11-04 Thread Amudhan P
An update on the issue of file reads dropping to a few bytes with an error:

When new files are copied to the mount it works fine, and reading them back
also works with no issue.
But reading old or existing files still hits the same issue, with the error
message below on the client.
"libceph: osd1 10.0.104.1:6891 socket closed (con state CONNECTING)"


-- Forwarded message -----
From: Amudhan P 
Date: Wed, Nov 4, 2020 at 6:24 PM
Subject: File read are not completing and IO shows in bytes able to not
reading from cephfs
To: ceph-users 


Hi,

In my test Ceph Octopus cluster I was trying to simulate a failure case:
while a client had cephfs mounted through the kernel client and was doing
reads and writes, I shut down the entire cluster with the OSD flags nodown,
noout, nobackfill and norecover set.

The cluster is 4 nodes with 3 mons, 2 mgrs, 2 MDSs and 48 OSDs.
Public IP range: 10.0.103.0 and cluster IP range: 10.0.104.0.

Writes and reads stalled; after some time the cluster was brought back up and
healthy. But when reading a file through the kernel mount, the read starts at
above 100 MB/s and then suddenly drops to bytes per second and stays there
for a long time.
The only error message I could see on the client machine:

[  167.591095] ceph: loaded (mds proto 32)
[  167.600010] libceph: mon0 10.0.103.1:6789 session established
[  167.601167] libceph: client144519 fsid
f8bc7682-0d11-11eb-a332-0cc47a5ec98a
[  272.132787] libceph: osd1 10.0.104.1:6891 socket closed (con state
CONNECTING)

What went wrong? Why is this happening?

regards
Amudhan P
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] File read are not completing and IO shows in bytes able to not reading from cephfs

2020-11-04 Thread Amudhan P
Hi,

In my test Ceph Octopus cluster I was trying to simulate a failure case:
while a client had cephfs mounted through the kernel client and was doing
reads and writes, I shut down the entire cluster with the OSD flags nodown,
noout, nobackfill and norecover set.
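
Setting and clearing those flags corresponds roughly to the commands below
(a sketch; unset them again once the cluster is back and healthy):

ceph osd set nodown
ceph osd set noout
ceph osd set nobackfill
ceph osd set norecover
ceph osd unset norecover      # reverse order once everything is back up
ceph osd unset nobackfill
ceph osd unset noout
ceph osd unset nodown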

The cluster is 4 nodes with 3 mons, 2 mgrs, 2 MDSs and 48 OSDs.
Public IP range: 10.0.103.0 and cluster IP range: 10.0.104.0.

Writes and reads stalled; after some time the cluster was brought back up and
healthy. But when reading a file through the kernel mount, the read starts at
above 100 MB/s and then suddenly drops to bytes per second and stays there
for a long time.
The only error message I could see on the client machine:

[  167.591095] ceph: loaded (mds proto 32)
[  167.600010] libceph: mon0 10.0.103.1:6789 session established
[  167.601167] libceph: client144519 fsid
f8bc7682-0d11-11eb-a332-0cc47a5ec98a
[  272.132787] libceph: osd1 10.0.104.1:6891 socket closed (con state
CONNECTING)

What went wrong? Why is this happening?

regards
Amudhan P
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph not showing full capacity

2020-10-26 Thread Amudhan P
Hi,

>>  Your first mail shows 67T (instead of 62)

I had just given an approximate number; the first number given is the right
one.

I have deleted all pools and created a fresh test pool with a pg_num of 128,
and now it shows a full size of 248 TB.

Output from "ceph df":
--- RAW STORAGE ---
CLASS  SIZE AVAILUSED RAW USED  %RAW USED
hdd262 TiB  262 TiB  3.9 GiB52 GiB   0.02
TOTAL  262 TiB  262 TiB  3.9 GiB52 GiB   0.02

--- POOLS ---
POOL   ID  STORED  OBJECTS  USED  %USED  MAX AVAIL
pool3   8 0 B0   0 B  0124 TiB

So, the PG count is not the cause of the smaller reported size.

I am trying other options as well to see what caused this issue.
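
The reply below suggests fixing the imbalance first; a sketch of that (the
pool name and target pg_num are examples only):

ceph osd df tree                        # look at the %USE and PG spread across OSDs
ceph osd pool set pool3 pg_num 256      # more PGs give finer-grained placement
ceph balancer mode upmap
ceph balancer on
ceph balancer status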


On Mon, Oct 26, 2020 at 8:20 PM 胡 玮文  wrote:

>
> On 26 Oct 2020, at 22:30, Amudhan P  wrote:
>
> 
> Hi Jane,
>
> I agree with you and I was trying to say disk which has more PG will fill
> up quick.
>
> But, My question even though RAW disk space is 262 TB, pool 2 replica max
> storage is showing only 132 TB in the dashboard and when mounting the pool
> using cephfs it's showing 62 TB, I could understand that due to replica
> it's showing half of the space.
>
>
> Your first mail shows 67T (instead of 62)
>
> why it's not showing the entire RAW disk space as available space?
> Number of PG per pool play any vital role in showing available space?
>
>
> I might be wrong, but I think the size of mounted cephfs is calculated by
> “used + available”. It is not directly related to raw disk space. You have
> an imbalance issue, so you have less available space, as explained previously.
> So the total size is less than expected.
>
> Maybe you should try to correct the imbalance first, and see if the
> available space and size go up. Increase pg_num, run the balancer, etc.
>
> On Mon, Oct 26, 2020 at 12:37 PM Janne Johansson 
> wrote:
>
>>
>>
>> Den sön 25 okt. 2020 kl 15:18 skrev Amudhan P :
>>
>>> Hi,
>>>
>>> For my quick understanding How PG's are responsible for allowing space
>>> allocation to a pool?
>>>
>>
>> An object's name will decide which PG (from the list of PGs in the pool)
>> it will end
>> up on, so if you have very few PGs, the hashed/pseudorandom placement will
>> be unbalanced at times. As an example, if you have only 8 PGs and write
>> 9 large objects, then at least one (but probably two or three) PGs will
>> receive two
>> or more of those 9, and some will receive none just on pure statistics.
>> If you have 100 PGs, the chance of one getting two out of those nine
>> objects
>> is much smaller. Overall, with all pools accounted for, one should aim
>> for something
>> like 100 PGs per OSD, but you also need to count the replication factor
>> for each pool
>> so if you have replication = 3 and a pool gets 128 PGs, it will place
>> 3*128 PGs
>> out on various OSDs according to the crush rules.
>>
>> PGs don't have a size, but will grow as needed, and since the next object
>> to
>> be written can end up anywhere (depending on the hashed result) ceph df
>> must
>> always tell you the worst case when listing how much data this pool has
>> "left".
>> It will always be the OSD with least space left that limits the pool.
>>
>>
>>> My understanding that PG's basically helps in object placement when the
>>> number of PG's for a OSD's is high there is a high possibility that PG
>>> gets
>>> lot more data than other PG's.
>>
>>
>> This statement seems incorrect to me.
>>
>>
>>> At this situation, we can use the balance
>>> between OSD's.
>>> But, I can't understand the logic of how does it restrict space to a
>>> pool?
>>>
>>
>>
>> --
>> May the most significant bit of your life be positive.
>>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph not showing full capacity

2020-10-26 Thread Amudhan P
Hi Janne,

I agree with you; what I was trying to say is that a disk with more PGs
will fill up faster.

But my question is: even though the raw disk space is 262 TB, the pool with
2 replicas shows a maximum storage of only 132 TB in the dashboard, and when
mounting the pool using cephfs it shows 62 TB. I can understand that, due to
replication, it shows half of the space.

Why is it not showing the entire raw disk space as available space?
Does the number of PGs per pool play any vital role in the available space shown?

On Mon, Oct 26, 2020 at 12:37 PM Janne Johansson 
wrote:

>
>
> Den sön 25 okt. 2020 kl 15:18 skrev Amudhan P :
>
>> Hi,
>>
>> For my quick understanding How PG's are responsible for allowing space
>> allocation to a pool?
>>
>
> An objects name will decide which PG (from the list of PGs in the pool) it
> will end
> up on, so if you have very few PGs, the hashed/pseudorandom placement will
> be unbalanced at times. As an example, if you have only 8 PGs and write
> 9 large objects, then at least one (but probably two or three) PGs will
> receive two
> or more of those 9, and some will receive none just on pure statistics.
> If you have 100 PGs, the chance of one getting two out of those nine
> objects
> is much smaller. Overall, with all pools accounted for, one should aim for
> something
> like 100 PGs per OSD, but you also need to count the replication factor
> for each pool
> so if you have replication = 3 and a pool gets 128 PGs, it will place
> 3*128 PGs
> out on various OSDs according to the crush rules.
>
> PGs don't have a size, but will grow as needed, and since the next object
> to
> be written can end up anywhere (depending on the hashed result) ceph df
> must
> always tell you the worst case when listing how much data this pool has
> "left".
> It will always be the OSD with least space left that limits the pool.
>
>
>> My understanding that PG's basically helps in object placement when the
>> number of PG's for a OSD's is high there is a high possibility that PG
>> gets
>> lot more data than other PG's.
>
>
> This statement seems incorrect to me.
>
>
>> At this situation, we can use the balance
>> between OSD's.
>> But, I can't understand the logic of how does it restrict space to a pool?
>>
>
>
> --
> May the most significant bit of your life be positive.
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph not showing full capacity

2020-10-25 Thread Amudhan P
Hi,

For my quick understanding: how are PGs responsible for the space allocated
to a pool?

My understanding is that PGs basically help with object placement; when the
number of PGs per OSD is high, there is a high possibility that one PG gets
a lot more data than the others. In that situation, we can balance
between OSDs.

But I can't understand the logic of how this restricts the space available to a pool?


On Sun, Oct 25, 2020 at 5:55 PM 胡 玮文  wrote:

> Hi,
>
> In ceph, when you create an object, it cannot go any OSD as it fits. An
> object is mapped to a placement group using a hash algorithm. Then
> placement groups are mapped to OSDs. See [1] for details. So, if any of
> your OSD goes full, write operations cannot be guaranteed success. Once you
> correct the unbalance, you should see more available space.
>
> Also, you only have 289 placement groups, which I think is too few for
> your 48 OSDs [2]. If you have more placement groups, the unbalance issue
> will be far less severe.
>
> [1]: https://docs.ceph.com/en/latest/architecture/#mapping-pgs-to-osds
> [2]: https://docs.ceph.com/en/latest/rados/operations/placement-groups/
>
> > 在 2020年10月25日,18:24,Amudhan P  写道:
> >
> > Hi Stefan,
> >
> > I have started balancer but what I don't understand is there are enough
> > free space in other disks.
> >
> > Why it's not showing those in available space?
> > How to reclaim the free space?
> >
> >> On Sun 25 Oct, 2020, 2:27 PM Stefan Kooman,  wrote:
> >>> On 2020-10-25 05:33, Amudhan P wrote:
> >>> Yes, There is a unbalance in PG's assigned to OSD's.
> >>> `ceph osd df` output snip
> >>> ID  CLASS  WEIGHT   REWEIGHT  SIZE RAW USE  DATA OMAP META
> >>> AVAIL%USE   VAR   PGS  STATUS
> >>> 0hdd  5.45799   1.0  5.5 TiB  3.6 TiB  3.6 TiB  9.7 MiB   4.6
> >>> GiB  1.9 TiB  65.94  1.31   13  up
> >>> 1hdd  5.45799   1.0  5.5 TiB  1.0 TiB  1.0 TiB  4.4 MiB   1.3
> >>> GiB  4.4 TiB  18.87  0.389  up
> >>> 2hdd  5.45799   1.0  5.5 TiB  1.5 TiB  1.5 TiB  4.0 MiB   1.9
> >>> GiB  3.9 TiB  28.30  0.56   10  up
> >>> 3hdd  5.45799   1.0  5.5 TiB  2.1 TiB  2.1 TiB  7.7 MiB   2.7
> >>> GiB  3.4 TiB  37.70  0.75   12  up
> >>> 4hdd  5.45799   1.0  5.5 TiB  4.1 TiB  4.1 TiB  5.8 MiB   5.2
> >>> GiB  1.3 TiB  75.27  1.50   20  up
> >>> 5hdd  5.45799   1.0  5.5 TiB  5.1 TiB  5.1 TiB  5.9 MiB   6.7
> >>> GiB  317 GiB  94.32  1.88   18  up
> >>> 6hdd  5.45799   1.0  5.5 TiB  1.5 TiB  1.5 TiB  5.2 MiB   2.0
> >>> GiB  3.9 TiB  28.32  0.569  up
> >>> MIN/MAX VAR: 0.19/1.88  STDDEV: 22.13
> >> ceph balancer mode upmap
> >> ceph balancer on
> >> The balancer should start balancing and this should result in way more
> >> space available. Good to know that ceph df is based on the disk that is
> >> most full.
> >> There is all sorts of tuning available for the balancer, although I
> >> can't find it in the documentation. Ceph docu better project is working
> >> on that. See [1] for information. You can look up the python code to see
> >> what variables you can tune: /usr/share/ceph/mgr/balancer/module.py
> >> ceph config set mgr/balancer/begin_weekday 1
> >> ceph config set mgr/balancer/end_weekday 5
> >> ceph config set mgr mgr/balancer/begin_time 1000
> >> ceph config set mgr mgr/balancer/end_time 1700
> >> ^^ to restrict the balancer running only on weekdays (monday to friday)
> >> from 10:00 - 17:00 h.
> >> Gr. Stefan
> >> [1]:
> >> https://docs.ceph.com/en/latest/rados/operations/balancer/#balancer
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph not showing full capacity

2020-10-25 Thread Amudhan P
Hi Stefan,

I have started the balancer, but what I don't understand is that there is
enough free space on the other disks.

Why is that not shown as available space?
How do I reclaim the free space?
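
For completeness, I am watching the balancer with the standard commands
(nothing cluster-specific here):

ceph balancer status          # shows the mode and whether a plan is being executed
ceph osd df                   # %USE / VAR should converge as PGs move
ceph df                       # MAX AVAIL should grow once the fullest OSD is relieved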

On Sun 25 Oct, 2020, 2:27 PM Stefan Kooman,  wrote:

> On 2020-10-25 05:33, Amudhan P wrote:
> > Yes, There is a unbalance in PG's assigned to OSD's.
> > `ceph osd df` output snip
> > ID  CLASS  WEIGHT   REWEIGHT  SIZE RAW USE  DATA OMAP META
> >AVAIL%USE   VAR   PGS  STATUS
> >  0hdd  5.45799   1.0  5.5 TiB  3.6 TiB  3.6 TiB  9.7 MiB   4.6
> > GiB  1.9 TiB  65.94  1.31   13  up
> >  1hdd  5.45799   1.0  5.5 TiB  1.0 TiB  1.0 TiB  4.4 MiB   1.3
> > GiB  4.4 TiB  18.87  0.389  up
> >  2hdd  5.45799   1.0  5.5 TiB  1.5 TiB  1.5 TiB  4.0 MiB   1.9
> > GiB  3.9 TiB  28.30  0.56   10  up
> >  3hdd  5.45799   1.0  5.5 TiB  2.1 TiB  2.1 TiB  7.7 MiB   2.7
> > GiB  3.4 TiB  37.70  0.75   12  up
> >  4hdd  5.45799   1.0  5.5 TiB  4.1 TiB  4.1 TiB  5.8 MiB   5.2
> > GiB  1.3 TiB  75.27  1.50   20  up
> >  5hdd  5.45799   1.0  5.5 TiB  5.1 TiB  5.1 TiB  5.9 MiB   6.7
> > GiB  317 GiB  94.32  1.88   18  up
> >  6hdd  5.45799   1.0  5.5 TiB  1.5 TiB  1.5 TiB  5.2 MiB   2.0
> > GiB  3.9 TiB  28.32  0.569  up
> >
> > MIN/MAX VAR: 0.19/1.88  STDDEV: 22.13
>
> ceph balancer mode upmap
> ceph balancer on
>
> The balancer should start balancing and this should result in way more
> space available. Good to know that ceph df is based on the disk that is
> most full.
>
> There is all sorts of tuning available for the balancer, although I
> can't find it in the documentation. Ceph docu better project is working
> on that. See [1] for information. You can look up the python code to see
> what variables you can tune: /usr/share/ceph/mgr/balancer/module.py
>
> ceph config set mgr/balancer/begin_weekday 1
> ceph config set mgr/balancer/end_weekday 5
> ceph config set mgr mgr/balancer/begin_time 1000
> ceph config set mgr mgr/balancer/end_time 1700
>
> ^^ to restrict the balancer running only on weekdays (monday to friday)
> from 10:00 - 17:00 h.
>
> Gr. Stefan
>
> [1]: https://docs.ceph.com/en/latest/rados/operations/balancer/#balancer
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph not showing full capacity

2020-10-24 Thread Amudhan P
Yes, there is an imbalance in the PGs assigned to the OSDs.
`ceph osd df` output snippet:
ID  CLASS  WEIGHT   REWEIGHT  SIZE RAW USE  DATA OMAP META
 AVAIL%USE   VAR   PGS  STATUS
 0hdd  5.45799   1.0  5.5 TiB  3.6 TiB  3.6 TiB  9.7 MiB   4.6 GiB
 1.9 TiB  65.94  1.31   13  up
 1hdd  5.45799   1.0  5.5 TiB  1.0 TiB  1.0 TiB  4.4 MiB   1.3 GiB
 4.4 TiB  18.87  0.389  up
 2hdd  5.45799   1.0  5.5 TiB  1.5 TiB  1.5 TiB  4.0 MiB   1.9 GiB
 3.9 TiB  28.30  0.56   10  up
 3hdd  5.45799   1.0  5.5 TiB  2.1 TiB  2.1 TiB  7.7 MiB   2.7 GiB
 3.4 TiB  37.70  0.75   12  up
 4hdd  5.45799   1.0  5.5 TiB  4.1 TiB  4.1 TiB  5.8 MiB   5.2 GiB
 1.3 TiB  75.27  1.50   20  up
 5hdd  5.45799   1.0  5.5 TiB  5.1 TiB  5.1 TiB  5.9 MiB   6.7 GiB
 317 GiB  94.32  1.88   18  up
 6hdd  5.45799   1.0  5.5 TiB  1.5 TiB  1.5 TiB  5.2 MiB   2.0 GiB
 3.9 TiB  28.32  0.569  up

MIN/MAX VAR: 0.19/1.88  STDDEV: 22.13

On Sun, Oct 25, 2020 at 12:08 AM Stefan Kooman  wrote:

> On 2020-10-24 14:53, Amudhan P wrote:
> > Hi,
> >
> > I have created a test Ceph cluster with Ceph Octopus using cephadm.
> >
> > Cluster total RAW disk capacity is 262 TB but it's allowing to use of
> only
> > 132TB.
> > I have not set quota for any of the pool. what could be the issue?
>
> Unbalance? What does ceph osd df show? How large is the standard deviation?
>
> Gr. Stefan
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph not showing full capacity

2020-10-24 Thread Amudhan P
Hi Nathan,

Attached is the crushmap output.

Let me know if you find anything odd.

On Sat, Oct 24, 2020 at 6:47 PM Nathan Fish  wrote:

> Can you post your crush map? Perhaps some OSDs are in the wrong place.
>
> On Sat, Oct 24, 2020 at 8:51 AM Amudhan P  wrote:
> >
> > Hi,
> >
> > I have created a test Ceph cluster with Ceph Octopus using cephadm.
> >
> > Cluster total RAW disk capacity is 262 TB but it's allowing to use of
> only
> > 132TB.
> > I have not set quota for any of the pool. what could be the issue?
> >
> > Output from :-
> > ceph -s
> >   cluster:
> > id: f8bc7682-0d11-11eb-a332-0cc47a5ec98a
> > health: HEALTH_WARN
> > clock skew detected on mon.strg-node3, mon.strg-node2
> > 2 backfillfull osd(s)
> > 4 pool(s) backfillfull
> > 1 pools have too few placement groups
> >
> >   services:
> > mon: 3 daemons, quorum strg-node1,strg-node3,strg-node2 (age 7m)
> > mgr: strg-node3.jtacbn(active, since 7m), standbys: strg-node1.gtlvyv
> > mds: cephfs-strg:1 {0=cephfs-strg.strg-node1.lhmeea=up:active} 1
> > up:standby
> > osd: 48 osds: 48 up (since 7m), 48 in (since 5d)
> >
> >   task status:
> > scrub status:
> > mds.cephfs-strg.strg-node1.lhmeea: idle
> >
> >   data:
> > pools:   4 pools, 289 pgs
> > objects: 17.29M objects, 66 TiB
> > usage:   132 TiB used, 130 TiB / 262 TiB avail
> > pgs: 288 active+clean
> >  1   active+clean+scrubbing+deep
> >
> > mounted volume shows
> > node1:/ 67T   66T  910G  99% /mnt/cephfs
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
{
"devices": [
{
"id": 0,
"name": "osd.0",
"class": "hdd"
},
{
"id": 1,
"name": "osd.1",
"class": "hdd"
},
{
"id": 2,
"name": "osd.2",
"class": "hdd"
},
{
"id": 3,
"name": "osd.3",
"class": "hdd"
},
{
"id": 4,
"name": "osd.4",
"class": "hdd"
},
{
"id": 5,
"name": "osd.5",
"class": "hdd"
},
{
"id": 6,
"name": "osd.6",
"class": "hdd"
},
{
"id": 7,
"name": "osd.7",
"class": "hdd"
},
{
"id": 8,
"name": "osd.8",
"class": "hdd"
},
{
"id": 9,
"name": "osd.9",
"class": "hdd"
},
{
"id": 10,
"name": "osd.10",
"class": "hdd"
},
{
"id": 11,
"name": "osd.11",
"class": "hdd"
},
{
"id": 12,
"name": "osd.12",
"class": "hdd"
},
{
"id": 13,
"name": "osd.13",
"class": "hdd"
},
{
"id": 14,
"name": "osd.14",
"class": "hdd"
},
{
"id": 15,
"name": "osd.15",
"class": "hdd"
},
{
"id": 16,
"name": "osd.16",
"class": "hdd"
},
{
"id": 17,
"name": "osd.17",
"class": "hdd"
},
{
"id": 18,
"name": "osd.18",
 

[ceph-users] Ceph not showing full capacity

2020-10-24 Thread Amudhan P
Hi,

I have created a test Ceph cluster with Ceph Octopus using cephadm.

The cluster's total raw disk capacity is 262 TB, but it is only allowing
132 TB to be used.
I have not set a quota on any of the pools. What could be the issue?

Output from :-
ceph -s
  cluster:
id: f8bc7682-0d11-11eb-a332-0cc47a5ec98a
health: HEALTH_WARN
clock skew detected on mon.strg-node3, mon.strg-node2
2 backfillfull osd(s)
4 pool(s) backfillfull
1 pools have too few placement groups

  services:
mon: 3 daemons, quorum strg-node1,strg-node3,strg-node2 (age 7m)
mgr: strg-node3.jtacbn(active, since 7m), standbys: strg-node1.gtlvyv
mds: cephfs-strg:1 {0=cephfs-strg.strg-node1.lhmeea=up:active} 1
up:standby
osd: 48 osds: 48 up (since 7m), 48 in (since 5d)

  task status:
scrub status:
mds.cephfs-strg.strg-node1.lhmeea: idle

  data:
pools:   4 pools, 289 pgs
objects: 17.29M objects, 66 TiB
usage:   132 TiB used, 130 TiB / 262 TiB avail
pgs: 288 active+clean
 1   active+clean+scrubbing+deep

mounted volume shows
node1:/ 67T   66T  910G  99% /mnt/cephfs
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph Octopus

2020-10-23 Thread Amudhan P
Hi Eugen,

The ceph config output shows the network address that was set.

I had not restarted the containers directly; I was trying the command `ceph
orch restart osd.46`, which I think was the problem. After running `ceph
orch daemon restart osd.46` the change now shows up in the dashboard.
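
To summarise the sequence that worked for me (the subnet below is only an
example, and the two daemon commands need to be repeated for each OSD):

ceph config set global cluster_network 192.168.1.0/24   # example subnet
ceph config get mon cluster_network                     # verify the setting
ceph orch daemon reconfig osd.46                        # regenerate the daemon's config
ceph orch daemon restart osd.46                         # restart so the new cluster address is used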

Thanks.


On Fri, Oct 23, 2020 at 6:14 PM Eugen Block  wrote:

> Did you restart the OSD containers? Does ceph config show your changes?
>
> ceph config get mon cluster_network
> ceph config get mon public_network
>
>
>
> Zitat von Amudhan P :
>
> > Hi Eugen,
> >
> > I did the same step specified but OSD is not updated cluster address.
> >
> >
> > On Tue, Oct 20, 2020 at 2:52 PM Eugen Block  wrote:
> >
> >> > I wonder if this would be impactful, even if  `nodown` were set.
> >> > When a given OSD latches onto
> >> > the new replication network, I would expect it to want to use it for
> >> > heartbeats — but when
> >> > its heartbeat peers aren’t using the replication network yet, they
> >> > won’t be reachable.
> >>
> >> I also expected at least some sort of impact, I just tested it in a
> >> virtual lab environment. But besides the temporary "down" OSDs during
> >> container restart the cluster was always responsive (although there's
> >> no client traffic). I didn't even set "nodown". But all OSDs now have
> >> a new backend address and the cluster seems to be happy.
> >>
> >> Regards,
> >> Eugen
> >>
> >>
> >> Zitat von Anthony D'Atri :
> >>
> >> > I wonder if this would be impactful, even if  `nodown` were set.
> >> > When a given OSD latches onto
> >> > the new replication network, I would expect it to want to use it for
> >> > heartbeats — but when
> >> > its heartbeat peers aren’t using the replication network yet, they
> >> > won’t be reachable.
> >> >
> >> > Unless something has changed since I tried this with Luminous.
> >> >
> >> >> On Oct 20, 2020, at 12:47 AM, Eugen Block  wrote:
> >> >>
> >> >> Hi,
> >> >>
> >> >> a quick search [1] shows this:
> >> >>
> >> >> ---snip---
> >> >> # set new config
> >> >> ceph config set global cluster_network 192.168.1.0/24
> >> >>
> >> >> # let orchestrator reconfigure the daemons
> >> >> ceph orch daemon reconfig mon.host1
> >> >> ceph orch daemon reconfig mon.host2
> >> >> ceph orch daemon reconfig mon.host3
> >> >> ceph orch daemon reconfig osd.1
> >> >> ceph orch daemon reconfig osd.2
> >> >> ceph orch daemon reconfig osd.3
> >> >> ---snip---
> >> >>
> >> >> I haven't tried it myself though.
> >> >>
> >> >> Regards,
> >> >> Eugen
> >> >>
> >> >> [1]
> >> >>
> >>
> https://stackoverflow.com/questions/61763230/configure-a-cluster-network-with-cephadm
> >> >>
> >> >>
> >> >> Zitat von Amudhan P :
> >> >>
> >> >>> Hi,
> >> >>>
> >> >>> I have installed Ceph Octopus cluster using cephadm with a single
> >> network
> >> >>> now I want to add a second network and configure it as a cluster
> >> address.
> >> >>>
> >> >>> How do I configure ceph to use second Network as cluster network?.
> >> >>>
> >> >>> Amudhan
> >> >>> ___
> >> >>> ceph-users mailing list -- ceph-users@ceph.io
> >> >>> To unsubscribe send an email to ceph-users-le...@ceph.io
> >> >>
> >> >>
> >> >> ___
> >> >> ceph-users mailing list -- ceph-users@ceph.io
> >> >> To unsubscribe send an email to ceph-users-le...@ceph.io
> >>
> >>
> >> ___
> >> ceph-users mailing list -- ceph-users@ceph.io
> >> To unsubscribe send an email to ceph-users-le...@ceph.io
> >>
>
>
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph Octopus

2020-10-23 Thread Amudhan P
Hi Eugen,

I did the same steps as specified, but the OSDs are not updated with the cluster address.


On Tue, Oct 20, 2020 at 2:52 PM Eugen Block  wrote:

> > I wonder if this would be impactful, even if  `nodown` were set.
> > When a given OSD latches onto
> > the new replication network, I would expect it to want to use it for
> > heartbeats — but when
> > its heartbeat peers aren’t using the replication network yet, they
> > won’t be reachable.
>
> I also expected at least some sort of impact, I just tested it in a
> virtual lab environment. But besides the temporary "down" OSDs during
> container restart the cluster was always responsive (although there's
> no client traffic). I didn't even set "nodown". But all OSDs now have
> a new backend address and the cluster seems to be happy.
>
> Regards,
> Eugen
>
>
> Zitat von Anthony D'Atri :
>
> > I wonder if this would be impactful, even if  `nodown` were set.
> > When a given OSD latches onto
> > the new replication network, I would expect it to want to use it for
> > heartbeats — but when
> > its heartbeat peers aren’t using the replication network yet, they
> > won’t be reachable.
> >
> > Unless something has changed since I tried this with Luminous.
> >
> >> On Oct 20, 2020, at 12:47 AM, Eugen Block  wrote:
> >>
> >> Hi,
> >>
> >> a quick search [1] shows this:
> >>
> >> ---snip---
> >> # set new config
> >> ceph config set global cluster_network 192.168.1.0/24
> >>
> >> # let orchestrator reconfigure the daemons
> >> ceph orch daemon reconfig mon.host1
> >> ceph orch daemon reconfig mon.host2
> >> ceph orch daemon reconfig mon.host3
> >> ceph orch daemon reconfig osd.1
> >> ceph orch daemon reconfig osd.2
> >> ceph orch daemon reconfig osd.3
> >> ---snip---
> >>
> >> I haven't tried it myself though.
> >>
> >> Regards,
> >> Eugen
> >>
> >> [1]
> >>
> https://stackoverflow.com/questions/61763230/configure-a-cluster-network-with-cephadm
> >>
> >>
> >> Zitat von Amudhan P :
> >>
> >>> Hi,
> >>>
> >>> I have installed Ceph Octopus cluster using cephadm with a single
> network
> >>> now I want to add a second network and configure it as a cluster
> address.
> >>>
> >>> How do I configure ceph to use second Network as cluster network?.
> >>>
> >>> Amudhan
> >>> ___
> >>> ceph-users mailing list -- ceph-users@ceph.io
> >>> To unsubscribe send an email to ceph-users-le...@ceph.io
> >>
> >>
> >> ___
> >> ceph-users mailing list -- ceph-users@ceph.io
> >> To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Ceph Octopus

2020-10-19 Thread Amudhan P
Hi,

I have installed a Ceph Octopus cluster using cephadm with a single network;
now I want to add a second network and configure it as the cluster address.

How do I configure Ceph to use the second network as the cluster network?

Amudhan
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Cephdeploy support

2020-10-10 Thread Amudhan P
Hi,

Will future releases of Ceph support ceph-deploy, or will cephadm be the
only choice?

Thanks,
Amudhan
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: cephadm not working with non-root user

2020-08-20 Thread Amudhan P
Hi,

Has any of you used the cephadm bootstrap command with a non-root user?


On Wed, Aug 19, 2020 at 11:30 AM Amudhan P  wrote:

> Hi,
>
> I am trying to install ceph 'octopus' using cephadm. In bootstrap
> command, I have specified a non-root user account as ssh-user.
> cephadm bootstrap --mon-ip xx.xxx.xx.xx --ssh-user non-rootuser
>
> when bootstrap about to complete it threw an error stating.
>
> """"
> INFO:cephadm:Non-zero exit code 2 from /usr/bin/podman run --rm --net=host
> --ipc=host -e CONTAINER_IMAGE=docker.io/ceph/ceph:v15 -e NODE_NAME=node1
> -   v
> /var/log/ceph/ae4ed114-e145-11ea-9c1f-0025900a8ebe:/var/log/ceph:z -v
> /tmp/ceph-tmpm22k9j9w:/etc/ceph/ceph.client.admin.keyring:z -v
> /tmp/ceph-tmpe   1ltigk8:/etc/ceph/ceph.conf:z --entrypoint
> /usr/bin/ceph docker.io/ceph/ceph:v15 orch host add node1
> INFO:cephadm:/usr/bin/ceph:stderr Error ENOENT: Failed to connect to node1
> (node1).
> INFO:cephadm:/usr/bin/ceph:stderr Check that the host is reachable and
> accepts connections using the cephadm SSH key
> INFO:cephadm:/usr/bin/ceph:stderr
> INFO:cephadm:/usr/bin/ceph:stderr you may want to run:
> INFO:cephadm:/usr/bin/ceph:stderr > ceph cephadm get-ssh-config >
> ssh_config
> INFO:cephadm:/usr/bin/ceph:stderr > ceph config-key get
> mgr/cephadm/ssh_identity_key > key
> INFO:cephadm:/usr/bin/ceph:stderr > ssh -F ssh_config -i key root@node1
> """""
> In the above steps, it's trying to connect as root to the node and when I
> downloaded ssh_config file it was also specified as 'root' inside. so, I
> modified the config file and uploaded but same to ceph but still ssh to
> node1 is not working.
>
> To confirm if I have used the right command been used during bootstrap. I
> have tried the below command.
>
> " ceph config-key dump mgr/cephadm/ssh_user"
> {
> "mgr/cephadm/ssh_user": "non-rootuser"
> }
>
> and the output shows the user I have used during bootstrap  "non-rootuser"
>
> but at the same time when I run cmd " ceph cephadm get-user " the output
> still shows 'root' as the user.
>
> Why the change is not affecting? do anyone faced a similar issue in
> bootstrap?
>
> Is there any way to avoid using container with cephadm?
>
> regards
> Amudhan
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] cephadm not working with non-root user

2020-08-18 Thread Amudhan P
Hi,

I am trying to install Ceph 'octopus' using cephadm. In the bootstrap
command, I have specified a non-root user account as the ssh-user:
cephadm bootstrap --mon-ip xx.xxx.xx.xx --ssh-user non-rootuser

When the bootstrap was about to complete, it threw an error stating:


INFO:cephadm:Non-zero exit code 2 from /usr/bin/podman run --rm --net=host
--ipc=host -e CONTAINER_IMAGE=docker.io/ceph/ceph:v15 -e NODE_NAME=node1 -
  v
/var/log/ceph/ae4ed114-e145-11ea-9c1f-0025900a8ebe:/var/log/ceph:z -v
/tmp/ceph-tmpm22k9j9w:/etc/ceph/ceph.client.admin.keyring:z -v
/tmp/ceph-tmpe   1ltigk8:/etc/ceph/ceph.conf:z --entrypoint
/usr/bin/ceph docker.io/ceph/ceph:v15 orch host add node1
INFO:cephadm:/usr/bin/ceph:stderr Error ENOENT: Failed to connect to node1
(node1).
INFO:cephadm:/usr/bin/ceph:stderr Check that the host is reachable and
accepts connections using the cephadm SSH key
INFO:cephadm:/usr/bin/ceph:stderr
INFO:cephadm:/usr/bin/ceph:stderr you may want to run:
INFO:cephadm:/usr/bin/ceph:stderr > ceph cephadm get-ssh-config > ssh_config
INFO:cephadm:/usr/bin/ceph:stderr > ceph config-key get
mgr/cephadm/ssh_identity_key > key
INFO:cephadm:/usr/bin/ceph:stderr > ssh -F ssh_config -i key root@node1
"
In the above steps, cephadm is trying to connect to the node as root, and
when I downloaded the ssh_config file it also had 'root' specified inside.
So I modified the config file and uploaded it back to ceph, but ssh to
node1 is still not working.

To confirm that I had used the right option during bootstrap, I tried the
command below.

" ceph config-key dump mgr/cephadm/ssh_user"
{
"mgr/cephadm/ssh_user": "non-rootuser"
}

The output shows the user I used during bootstrap: "non-rootuser".

But at the same time, when I run `ceph cephadm get-user`, the output
still shows 'root' as the user.

Why is the change not taking effect? Has anyone faced a similar issue during
bootstrap?

Is there any way to avoid using containers with cephadm?
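
In the meantime, the workaround I intend to try is roughly the following
(it assumes the `ceph cephadm set-user` subcommand is available in this
release and that the non-root user has passwordless sudo on every node):

ceph cephadm set-user non-rootuser             # assumed subcommand
ceph cephadm get-pub-key > ceph.pub
ssh-copy-id -f -i ceph.pub non-rootuser@node1
ceph orch host add node1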

regards
Amudhan
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph latest install

2020-06-13 Thread Amudhan P
https://computingforgeeks.com/install-ceph-storage-cluster-on-ubuntu-linux-servers/


On Sat, Jun 13, 2020 at 2:31 PM masud parvez 
wrote:

> Could anyone give me the latest version ceph install guide for ubuntu 20.04
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Ceph deployment and Managing suite

2020-06-13 Thread Amudhan P
Hi,

I am looking for a software suite to deploy Ceph storage nodes and gateway
servers (SMB & NFS), and also a dashboard showing the entire cluster status,
individual node health, disk identification or maintenance activity, and
network utilization.
In short, a simple, user-manageable dashboard.

Please suggest any paid or community-based solution you have been using or
would recommend to others.

regards
Amudhan
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Upload speed slow for 7MB file cephfs+Samba

2020-06-12 Thread Amudhan P
Hi,

I have a 4-node Ceph Octopus cluster, with 12 disks per node, configured
with cephfs (replica 2) and exposed via Samba to Windows clients over 10G.

When a user copies a folder containing thousands of 7 MB files from a
Windows 10 client, we get a speed of only 40 MB/s.
The client and Ceph nodes are all connected at 10G. In the same setup,
copying a 1 GB file from the Windows client to Samba gets 90 MB/s.

Is there any kernel or network tuning that needs to be done?
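
For context, the share is defined with a plain smb.conf section roughly
like the one below (assuming a kernel cephfs mount re-exported through
smbd); the aio options are ones I am considering for the many-small-files
case, not a verified tuning:

[cephfs]
    path = /mnt/cephfs
    read only = no
    # async IO hints sometimes suggested for bulk copies of small files
    aio read size = 1
    aio write size = 1
    use sendfile = yes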

Any suggestions?

regards
Amudhan P
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Octopus: orchestrator not working correctly with nfs

2020-06-11 Thread Amudhan P
Hi,

I have not worked with the orchestrator, but I remember reading somewhere
that deploying NFS through it is not supported.

Refer to the cephadm documentation; for NFS you have to configure NFS-Ganesha yourself.

You can manage NFS through the dashboard, but for that you need an initial
configuration in the dashboard, and NFS-Ganesha has to be pointed at it.
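
A rough sketch of that initial configuration (command name as of
Nautilus/Octopus; treat it as an assumption and check `ceph dashboard -h`):

ceph osd pool create nfs-ganesha 8                               # pool for Ganesha config objects (example name)
ceph dashboard set-ganesha-clusters-rados-pool-namespace nfs-ganesha

Ganesha itself then has to be set up to read and store its exports in that
same RADOS pool (via rados:// %url includes in its config).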

Regards
Amudhan

On Thu 11 Jun, 2020, 11:40 AM Simon Sutter,  wrote:

> Hello,
>
>
> Did I not provide enough information, or simply nobody knows how to solve
> the problem?
> Should I write to the ceph tracker or does this just produce unnecessary
> overhead?
>
>
> Thanks in advance,
>
> Simon
>
> 
> Von: Simon Sutter 
> Gesendet: Montag, 8. Juni 2020 10:56:00
> An: ceph-users@ceph.io
> Betreff: [ceph-users] Octopus: orchestrator not working correctly with nfs
>
> Hello
>
>
> I know that nfs on octopus is still a bit under development.
>
> I'm trying to deploy nfs daemons and have some issues with the
> orchestartor.
>
> For the other daemons, for example monitors, I can issue the command "ceph
> orch apply mon 3"
>
> This will tell the orchestrator to deploy or remove monitor daemons until
> there are three of them.
>
> The command does not work with nfs, and now the orchestrator is a bit
> missconfigured...
>
> And with missconfigured I mean, that I have now a nfs daemon on node 1 and
> the orchestrator wants to create another one on node 1 but with wrong
> settings (it fails).
> Also a "ceph orch apply nfs –unconfigured" does not work, so I can't
> manually manage the nfs containers.
>
> Is there a manual way to tell ceph orch, to not create or remove nfs
> daemons? then I would be able to set them up manually.
> Or a manual way of configuring the orchestrator so it does the right thing.
>
>
> Thanks in advance
>
> Simon
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
> hosttech GmbH | Simon Sutter
> hosttech.ch
>
> WE LOVE TO HOST YOU.
>
> create your own website!
> more information & online-demo: www.website-creator.ch<
> http://www.website-creator.ch http://www.website-creator.ch>>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph dashboard inventory page not listing osds

2020-06-10 Thread Amudhan P
Eugen,
The issue was not because of the page display setting; it was due to the
`test_orchestrator` module, which I had enabled on the modules page. I have
now disabled it and am trying to install ceph-mgr-cephadm to support orchestration.

Now, when I try to enable cephadm on the modules page in order to view the
inventory (identify devices), I get the error message below.
```
`ceph mgr module enable cephadm`
Error ENOENT: module 'cephadm' reports that it cannot run on the active
manager daemon: loading remoto library:No module named 'remoto' (pass
--force to force enablement)
```
I have run `pip3 install remoto` on the system running the mgr daemon.

My octopus cluster was installed using ceph-deploy initially.
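
For a ceph-deploy (package-based) installation, my plan is roughly the
following; the package name is the Debian/Ubuntu one and the restart unit
may differ on other distros:

apt install ceph-mgr-cephadm          # on the node running the active mgr
systemctl restart ceph-mgr.target     # restart the local mgr so it picks up the module
ceph mgr module enable cephadm
ceph orch set backend cephadm
ceph orch status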

regards
Amudhan

On Wed, Jun 10, 2020 at 1:29 PM Ni-Feng Chang  wrote:

> Hi,
>
> Do you mind creating an issue on https://tracker.ceph.com/projects/mgr ?
> It's easier to attach screenshot or logs on the issue tracker.
>
> Screenshots might be helpful:
> - Cluster >> Inventory page, include the device that should have an OSD ID.
> - Cluster >> OSDs page, click the OSD that not displayed in inventory
> page. On `Metadata` tab, what's the value of `devices` in the table.
>
> Regards,
> --
> Kiefer Chang (Ni-Feng Chang)
>
>
>
>
> On 2020/6/7, 11:03 PM, "Amudhan P"  wrote:
>
> Hi,
>
> I am using Ceph octopus in a small cluster.
>
> I have enabled ceph dashboard and when I go to inventory page I could see
> OSD's running in mgr node only not listing other OSD in other 3 nodes.
>
> I don't see any issue in the log.
>
> How do I list other OSD'S
>
> Regards
> Amudhan P
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Ceph dashboard inventory page not listing osds

2020-06-07 Thread Amudhan P
Hi,

I am using Ceph octopus in a small cluster.

I have enabled the Ceph dashboard, and when I go to the inventory page I can
only see the OSDs running on the mgr node; it does not list the OSDs on the
other 3 nodes.

I don't see any issue in the log.

How do I list the other OSDs?

Regards
Amudhan P
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph Nautius not working after setting MTU 9000

2020-05-24 Thread Amudhan P
I didn't make any changes, but it has now started working with jumbo frames.

On Sun, May 24, 2020 at 1:04 PM Khodayar Doustar 
wrote:

> So this is your problem, it has nothing to do with Ceph. Just fix the
> network or rollback all changes.
>
> On Sun, May 24, 2020 at 9:05 AM Amudhan P  wrote:
>
>> No, ping with MTU size 9000 didn't work.
>>
>> On Sun, May 24, 2020 at 12:26 PM Khodayar Doustar 
>> wrote:
>>
>> > Does your ping work or not?
>> >
>> >
>> > On Sun, May 24, 2020 at 6:53 AM Amudhan P  wrote:
>> >
>> >> Yes, I have set setting on the switch side also.
>> >>
>> >> On Sat 23 May, 2020, 6:47 PM Khodayar Doustar, 
>> >> wrote:
>> >>
>> >>> Problem should be with network. When you change MTU it should be
>> changed
>> >>> all over the network, any single hup on your network should speak and
>> >>> accept 9000 MTU packets. you can check it on your hosts with
>> "ifconfig"
>> >>> command and there is also equivalent commands for other
>> network/security
>> >>> devices.
>> >>>
>> >>> If you have just one node which it not correctly configured for MTU
>> 9000
>> >>> it wouldn't work.
>> >>>
>> >>> On Sat, May 23, 2020 at 2:30 PM si...@turka.nl 
>> wrote:
>> >>>
>> >>>> Can the servers/nodes ping eachother using large packet sizes? I
>> guess
>> >>>> not.
>> >>>>
>> >>>> Sinan Polat
>> >>>>
>> >>>> > Op 23 mei 2020 om 14:21 heeft Amudhan P  het
>> >>>> volgende geschreven:
>> >>>> >
>> >>>> > In OSD logs "heartbeat_check: no reply from OSD"
>> >>>> >
>> >>>> >> On Sat, May 23, 2020 at 5:44 PM Amudhan P 
>> >>>> wrote:
>> >>>> >>
>> >>>> >> Hi,
>> >>>> >>
>> >>>> >> I have set Network switch with MTU size 9000 and also in my
>> netplan
>> >>>> >> configuration.
>> >>>> >>
>> >>>> >> What else needs to be checked?
>> >>>> >>
>> >>>> >>
>> >>>> >>> On Sat, May 23, 2020 at 3:39 PM Wido den Hollander <
>> w...@42on.com>
>> >>>> wrote:
>> >>>> >>>
>> >>>> >>>
>> >>>> >>>
>> >>>> >>>> On 5/23/20 12:02 PM, Amudhan P wrote:
>> >>>> >>>> Hi,
>> >>>> >>>>
>> >>>> >>>> I am using ceph Nautilus in Ubuntu 18.04 working fine wit MTU
>> size
>> >>>> 1500
>> >>>> >>>> (default) recently i tried to update MTU size to 9000.
>> >>>> >>>> After setting Jumbo frame running ceph -s is timing out.
>> >>>> >>>
>> >>>> >>> Ceph can run just fine with an MTU of 9000. But there is probably
>> >>>> >>> something else wrong on the network which is causing this.
>> >>>> >>>
>> >>>> >>> Check the Jumbo Frames settings on all the switches as well to
>> make
>> >>>> sure
>> >>>> >>> they forward all the packets.
>> >>>> >>>
>> >>>> >>> This is definitely not a Ceph issue.
>> >>>> >>>
>> >>>> >>> Wido
>> >>>> >>>
>> >>>> >>>>
>> >>>> >>>> regards
>> >>>> >>>> Amudhan P
>> >>>> >>>> ___
>> >>>> >>>> ceph-users mailing list -- ceph-users@ceph.io
>> >>>> >>>> To unsubscribe send an email to ceph-users-le...@ceph.io
>> >>>> >>>>
>> >>>> >>> ___
>> >>>> >>> ceph-users mailing list -- ceph-users@ceph.io
>> >>>> >>> To unsubscribe send an email to ceph-users-le...@ceph.io
>> >>>> >>>
>> >>>> >>
>> >>>> > ___
>> >>>> > ceph-users mailing list -- ceph-users@ceph.io
>> >>>> > To unsubscribe send an email to ceph-users-le...@ceph.io
>> >>>>
>> >>>> ___
>> >>>> ceph-users mailing list -- ceph-users@ceph.io
>> >>>> To unsubscribe send an email to ceph-users-le...@ceph.io
>> >>>>
>> >>>
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Cephfs IO halt on Node failure

2020-05-24 Thread Amudhan P
Sorry for the late reply.
I have pasted the crush map at the URL below: https://pastebin.com/ASPpY2VB
Below is my `ceph osd tree` output. This issue occurs only when I use file
layouts.

ID CLASS WEIGHTTYPE NAME  STATUS REWEIGHT PRI-AFF
-1   327.48047 root default
-3   109.16016 host strgsrv01
 0   hdd   5.45799 osd.0  up  1.0 1.0
 2   hdd   5.45799 osd.2  up  1.0 1.0
 3   hdd   5.45799 osd.3  up  1.0 1.0
 4   hdd   5.45799 osd.4  up  1.0 1.0
 5   hdd   5.45799 osd.5  up  1.0 1.0
 6   hdd   5.45799 osd.6  up  1.0 1.0
 7   hdd   5.45799 osd.7  up  1.0 1.0
19   hdd   5.45799 osd.19 up  1.0 1.0
20   hdd   5.45799 osd.20 up  1.0 1.0
21   hdd   5.45799 osd.21 up  1.0 1.0
22   hdd   5.45799 osd.22 up  1.0 1.0
23   hdd   5.45799 osd.23 up  1.0 1.0
-5   109.16016 host strgsrv02
 1   hdd   5.45799 osd.1  up  1.0 1.0
 8   hdd   5.45799 osd.8  up  1.0 1.0
 9   hdd   5.45799 osd.9  up  1.0 1.0
10   hdd   5.45799 osd.10 up  1.0 1.0
11   hdd   5.45799 osd.11 up  1.0 1.0
12   hdd   5.45799 osd.12 up  1.0 1.0
24   hdd   5.45799 osd.24 up  1.0 1.0
25   hdd   5.45799 osd.25 up  1.0 1.0
26   hdd   5.45799 osd.26 up  1.0 1.0
27   hdd   5.45799 osd.27 up  1.0 1.0
28   hdd   5.45799 osd.28 up  1.0 1.0
29   hdd   5.45799 osd.29 up  1.0 1.0
-7   109.16016 host strgsrv03
13   hdd   5.45799 osd.13 up  1.0 1.0
14   hdd   5.45799 osd.14 up  1.0 1.0
15   hdd   5.45799 osd.15 up  1.0 1.0
16   hdd   5.45799 osd.16 up  1.0 1.0
17   hdd   5.45799 osd.17 up  1.0 1.0
18   hdd   5.45799 osd.18 up  1.0 1.0
30   hdd   5.45799 osd.30 up  1.0 1.0
31   hdd   5.45799 osd.31 up  1.0 1.0
32   hdd   5.45799 osd.32 up  1.0 1.0
33   hdd   5.45799 osd.33 up  1.0 1.0
34   hdd   5.45799 osd.34 up  1.0 1.0
35   hdd   5.45799 osd.35 up  1.0 1.0

On Tue, May 19, 2020 at 12:16 PM Eugen Block  wrote:

> Was that a typo and you mean you changed min_size to 1? I/O paus with
> min_size 1 and size 2 is unexpected, can you share more details like
> your crushmap and your osd tree?
>
>
> Zitat von Amudhan P :
>
> > Behaviour is same even after setting min_size 2.
> >
> > On Mon 18 May, 2020, 12:34 PM Eugen Block,  wrote:
> >
> >> If your pool has a min_size 2 and size 2 (always a bad idea) it will
> >> pause IO in case of a failure until the recovery has finished. So the
> >> described behaviour is expected.
> >>
> >>
> >> Zitat von Amudhan P :
> >>
> >> > Hi,
> >> >
> >> > Crush rule is "replicated" and min_size 2 actually. I am trying to
> test
> >> > multiple volume configs in a single filesystem
> >> > using file layout.
> >> >
> >> > I have created metadata pool with rep 3 (min_size2 and replicated
> crush
> >> > rule) and data pool with rep 3  (min_size2 and replicated crush rule).
> >> and
> >> > also  I have created multiple (replica 2, ec2-1 & ec4-2) pools and
> added
> >> to
> >> > the filesystem.
> >> >
> >> > Using file layout I have set different data pool to a different
> folders.
> >> so
> >> > I can test different configs in the same filesystem. all data pools
> >> > min_size set to handle single node failure.
> >> >
> >> > Single node failure is handled properly when only having metadata pool
> >> and
> >> > one data pool (rep3).
> >> >
> >> > After adding additional data pool to fs, single node failure scenario
> is
> >> > not working.
> >> >
> >> > regards
> >> > Amudhan P
> >> >
> >> > On Sun, May 17, 2020 at 1:29 AM Eugen Block  wrote:
> >> >
> >> >> What’s your pool configuration wrt min_size and crush rules?
> >> >>
> >> >>
> >> >> Zitat von Amudhan P :
> >> >>
> >> >> > Hi,
> >&

[ceph-users] Re: Ceph Nautius not working after setting MTU 9000

2020-05-24 Thread Amudhan P
It's a Dell S4048T-ON switch using 10G Ethernet.

On Sat, May 23, 2020 at 11:05 PM apely agamakou  wrote:

> Hi,
>
> Please check you MTU limit at the switch level, cand check other
> ressources with icmp ping.
> Try to add 14Byte for ethernet header at your switch level mean an MTU of
> 9014 ? are you using juniper ???
>
> Exemple : ping -D -s 9 other_ip
>
>
>
> Le sam. 23 mai 2020 à 15:18, Khodayar Doustar  a
> écrit :
>
>> Problem should be with network. When you change MTU it should be changed
>> all over the network, any single hup on your network should speak and
>> accept 9000 MTU packets. you can check it on your hosts with "ifconfig"
>> command and there is also equivalent commands for other network/security
>> devices.
>>
>> If you have just one node which it not correctly configured for MTU 9000
>> it
>> wouldn't work.
>>
>> On Sat, May 23, 2020 at 2:30 PM si...@turka.nl  wrote:
>>
>> > Can the servers/nodes ping eachother using large packet sizes? I guess
>> not.
>> >
>> > Sinan Polat
>> >
>> > > Op 23 mei 2020 om 14:21 heeft Amudhan P  het
>> > volgende geschreven:
>> > >
>> > > In OSD logs "heartbeat_check: no reply from OSD"
>> > >
>> > >> On Sat, May 23, 2020 at 5:44 PM Amudhan P 
>> wrote:
>> > >>
>> > >> Hi,
>> > >>
>> > >> I have set Network switch with MTU size 9000 and also in my netplan
>> > >> configuration.
>> > >>
>> > >> What else needs to be checked?
>> > >>
>> > >>
>> > >>> On Sat, May 23, 2020 at 3:39 PM Wido den Hollander 
>> > wrote:
>> > >>>
>> > >>>
>> > >>>
>> > >>>> On 5/23/20 12:02 PM, Amudhan P wrote:
>> > >>>> Hi,
>> > >>>>
>> > >>>> I am using ceph Nautilus in Ubuntu 18.04 working fine wit MTU size
>> > 1500
>> > >>>> (default) recently i tried to update MTU size to 9000.
>> > >>>> After setting Jumbo frame running ceph -s is timing out.
>> > >>>
>> > >>> Ceph can run just fine with an MTU of 9000. But there is probably
>> > >>> something else wrong on the network which is causing this.
>> > >>>
>> > >>> Check the Jumbo Frames settings on all the switches as well to make
>> > sure
>> > >>> they forward all the packets.
>> > >>>
>> > >>> This is definitely not a Ceph issue.
>> > >>>
>> > >>> Wido
>> > >>>
>> > >>>>
>> > >>>> regards
>> > >>>> Amudhan P
>> > >>>> ___
>> > >>>> ceph-users mailing list -- ceph-users@ceph.io
>> > >>>> To unsubscribe send an email to ceph-users-le...@ceph.io
>> > >>>>
>> > >>> ___
>> > >>> ceph-users mailing list -- ceph-users@ceph.io
>> > >>> To unsubscribe send an email to ceph-users-le...@ceph.io
>> > >>>
>> > >>
>> > > ___
>> > > ceph-users mailing list -- ceph-users@ceph.io
>> > > To unsubscribe send an email to ceph-users-le...@ceph.io
>> >
>> > ___
>> > ceph-users mailing list -- ceph-users@ceph.io
>> > To unsubscribe send an email to ceph-users-le...@ceph.io
>> >
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph Nautius not working after setting MTU 9000

2020-05-24 Thread Amudhan P
No, ping with MTU size 9000 didn't work.

On Sun, May 24, 2020 at 12:26 PM Khodayar Doustar 
wrote:

> Does your ping work or not?
>
>
> On Sun, May 24, 2020 at 6:53 AM Amudhan P  wrote:
>
>> Yes, I have set setting on the switch side also.
>>
>> On Sat 23 May, 2020, 6:47 PM Khodayar Doustar, 
>> wrote:
>>
>>> Problem should be with network. When you change MTU it should be changed
>>> all over the network, any single hup on your network should speak and
>>> accept 9000 MTU packets. you can check it on your hosts with "ifconfig"
>>> command and there is also equivalent commands for other network/security
>>> devices.
>>>
>>> If you have just one node which it not correctly configured for MTU 9000
>>> it wouldn't work.
>>>
>>> On Sat, May 23, 2020 at 2:30 PM si...@turka.nl  wrote:
>>>
>>>> Can the servers/nodes ping eachother using large packet sizes? I guess
>>>> not.
>>>>
>>>> Sinan Polat
>>>>
>>>> > Op 23 mei 2020 om 14:21 heeft Amudhan P  het
>>>> volgende geschreven:
>>>> >
>>>> > In OSD logs "heartbeat_check: no reply from OSD"
>>>> >
>>>> >> On Sat, May 23, 2020 at 5:44 PM Amudhan P 
>>>> wrote:
>>>> >>
>>>> >> Hi,
>>>> >>
>>>> >> I have set Network switch with MTU size 9000 and also in my netplan
>>>> >> configuration.
>>>> >>
>>>> >> What else needs to be checked?
>>>> >>
>>>> >>
>>>> >>> On Sat, May 23, 2020 at 3:39 PM Wido den Hollander 
>>>> wrote:
>>>> >>>
>>>> >>>
>>>> >>>
>>>> >>>> On 5/23/20 12:02 PM, Amudhan P wrote:
>>>> >>>> Hi,
>>>> >>>>
>>>> >>>> I am using ceph Nautilus in Ubuntu 18.04 working fine wit MTU size
>>>> 1500
>>>> >>>> (default) recently i tried to update MTU size to 9000.
>>>> >>>> After setting Jumbo frame running ceph -s is timing out.
>>>> >>>
>>>> >>> Ceph can run just fine with an MTU of 9000. But there is probably
>>>> >>> something else wrong on the network which is causing this.
>>>> >>>
>>>> >>> Check the Jumbo Frames settings on all the switches as well to make
>>>> sure
>>>> >>> they forward all the packets.
>>>> >>>
>>>> >>> This is definitely not a Ceph issue.
>>>> >>>
>>>> >>> Wido
>>>> >>>
>>>> >>>>
>>>> >>>> regards
>>>> >>>> Amudhan P
>>>> >>>> ___
>>>> >>>> ceph-users mailing list -- ceph-users@ceph.io
>>>> >>>> To unsubscribe send an email to ceph-users-le...@ceph.io
>>>> >>>>
>>>> >>> ___
>>>> >>> ceph-users mailing list -- ceph-users@ceph.io
>>>> >>> To unsubscribe send an email to ceph-users-le...@ceph.io
>>>> >>>
>>>> >>
>>>> > ___
>>>> > ceph-users mailing list -- ceph-users@ceph.io
>>>> > To unsubscribe send an email to ceph-users-le...@ceph.io
>>>>
>>>> ___
>>>> ceph-users mailing list -- ceph-users@ceph.io
>>>> To unsubscribe send an email to ceph-users-le...@ceph.io
>>>>
>>>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph Nautius not working after setting MTU 9000

2020-05-23 Thread Amudhan P
Yes, I have applied the setting on the switch side as well.

On Sat 23 May, 2020, 6:47 PM Khodayar Doustar,  wrote:

> Problem should be with network. When you change MTU it should be changed
> all over the network, any single hup on your network should speak and
> accept 9000 MTU packets. you can check it on your hosts with "ifconfig"
> command and there is also equivalent commands for other network/security
> devices.
>
> If you have just one node which it not correctly configured for MTU 9000
> it wouldn't work.
>
> On Sat, May 23, 2020 at 2:30 PM si...@turka.nl  wrote:
>
>> Can the servers/nodes ping eachother using large packet sizes? I guess
>> not.
>>
>> Sinan Polat
>>
>> > Op 23 mei 2020 om 14:21 heeft Amudhan P  het
>> volgende geschreven:
>> >
>> > In OSD logs "heartbeat_check: no reply from OSD"
>> >
>> >> On Sat, May 23, 2020 at 5:44 PM Amudhan P  wrote:
>> >>
>> >> Hi,
>> >>
>> >> I have set Network switch with MTU size 9000 and also in my netplan
>> >> configuration.
>> >>
>> >> What else needs to be checked?
>> >>
>> >>
>> >>> On Sat, May 23, 2020 at 3:39 PM Wido den Hollander 
>> wrote:
>> >>>
>> >>>
>> >>>
>> >>>> On 5/23/20 12:02 PM, Amudhan P wrote:
>> >>>> Hi,
>> >>>>
>> >>>> I am using ceph Nautilus in Ubuntu 18.04 working fine wit MTU size
>> 1500
>> >>>> (default) recently i tried to update MTU size to 9000.
>> >>>> After setting Jumbo frame running ceph -s is timing out.
>> >>>
>> >>> Ceph can run just fine with an MTU of 9000. But there is probably
>> >>> something else wrong on the network which is causing this.
>> >>>
>> >>> Check the Jumbo Frames settings on all the switches as well to make
>> sure
>> >>> they forward all the packets.
>> >>>
>> >>> This is definitely not a Ceph issue.
>> >>>
>> >>> Wido
>> >>>
>> >>>>
>> >>>> regards
>> >>>> Amudhan P
>> >>>> ___
>> >>>> ceph-users mailing list -- ceph-users@ceph.io
>> >>>> To unsubscribe send an email to ceph-users-le...@ceph.io
>> >>>>
>> >>> ___
>> >>> ceph-users mailing list -- ceph-users@ceph.io
>> >>> To unsubscribe send an email to ceph-users-le...@ceph.io
>> >>>
>> >>
>> > ___
>> > ceph-users mailing list -- ceph-users@ceph.io
>> > To unsubscribe send an email to ceph-users-le...@ceph.io
>>
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph Nautius not working after setting MTU 9000

2020-05-23 Thread Amudhan P
In the OSD logs: "heartbeat_check: no reply from OSD"

On Sat, May 23, 2020 at 5:44 PM Amudhan P  wrote:

> Hi,
>
> I have set Network switch with MTU size 9000 and also in my netplan
> configuration.
>
> What else needs to be checked?
>
>
> On Sat, May 23, 2020 at 3:39 PM Wido den Hollander  wrote:
>
>>
>>
>> On 5/23/20 12:02 PM, Amudhan P wrote:
>> > Hi,
>> >
>> > I am using ceph Nautilus in Ubuntu 18.04 working fine wit MTU size 1500
>> > (default) recently i tried to update MTU size to 9000.
>> > After setting Jumbo frame running ceph -s is timing out.
>>
>> Ceph can run just fine with an MTU of 9000. But there is probably
>> something else wrong on the network which is causing this.
>>
>> Check the Jumbo Frames settings on all the switches as well to make sure
>> they forward all the packets.
>>
>> This is definitely not a Ceph issue.
>>
>> Wido
>>
>> >
>> > regards
>> > Amudhan P
>> > ___
>> > ceph-users mailing list -- ceph-users@ceph.io
>> > To unsubscribe send an email to ceph-users-le...@ceph.io
>> >
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph Nautius not working after setting MTU 9000

2020-05-23 Thread Amudhan P
Hi,

I have set the network switch to an MTU of 9000 and also set it in my
netplan configuration.

What else needs to be checked?
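
For the record, this is how I am verifying the MTU end to end (the interface
name and peer address are examples; 8972 = 9000 minus 28 bytes of IP/ICMP headers):

ip link show dev eth0 | grep mtu      # example interface name
ping -M do -s 8972 10.0.0.2           # example peer address; -M do forbids fragmentation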


On Sat, May 23, 2020 at 3:39 PM Wido den Hollander  wrote:

>
>
> On 5/23/20 12:02 PM, Amudhan P wrote:
> > Hi,
> >
> > I am using ceph Nautilus in Ubuntu 18.04 working fine wit MTU size 1500
> > (default) recently i tried to update MTU size to 9000.
> > After setting Jumbo frame running ceph -s is timing out.
>
> Ceph can run just fine with an MTU of 9000. But there is probably
> something else wrong on the network which is causing this.
>
> Check the Jumbo Frames settings on all the switches as well to make sure
> they forward all the packets.
>
> This is definitely not a Ceph issue.
>
> Wido
>
> >
> > regards
> > Amudhan P
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
> >
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Ceph Nautius not working after setting MTU 9000

2020-05-23 Thread Amudhan P
Hi,

I am using Ceph Nautilus on Ubuntu 18.04, which works fine with the default
MTU of 1500; recently I tried to update the MTU to 9000.
After setting jumbo frames, running `ceph -s` times out.

regards
Amudhan P
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Cephfs IO halt on Node failure

2020-05-19 Thread Amudhan P
The behaviour is the same even after setting min_size 2.

On Mon 18 May, 2020, 12:34 PM Eugen Block,  wrote:

> If your pool has a min_size 2 and size 2 (always a bad idea) it will
> pause IO in case of a failure until the recovery has finished. So the
> described behaviour is expected.
>
>
> Zitat von Amudhan P :
>
> > Hi,
> >
> > Crush rule is "replicated" and min_size 2 actually. I am trying to test
> > multiple volume configs in a single filesystem
> > using file layout.
> >
> > I have created metadata pool with rep 3 (min_size2 and replicated crush
> > rule) and data pool with rep 3  (min_size2 and replicated crush rule).
> and
> > also  I have created multiple (replica 2, ec2-1 & ec4-2) pools and added
> to
> > the filesystem.
> >
> > Using file layout I have set different data pool to a different folders.
> so
> > I can test different configs in the same filesystem. all data pools
> > min_size set to handle single node failure.
> >
> > Single node failure is handled properly when only having metadata pool
> and
> > one data pool (rep3).
> >
> > After adding additional data pool to fs, single node failure scenario is
> > not working.
> >
> > regards
> > Amudhan P
> >
> > On Sun, May 17, 2020 at 1:29 AM Eugen Block  wrote:
> >
> >> What’s your pool configuration wrt min_size and crush rules?
> >>
> >>
> >> Zitat von Amudhan P :
> >>
> >> > Hi,
> >> >
> >> > I am using ceph Nautilus cluster with below configuration.
> >> >
> >> > 3 node's (Ubuntu 18.04) each has 12 OSD's, and mds, mon and mgr are
> >> running
> >> > in shared mode.
> >> >
> >> > The client mounted through ceph kernel client.
> >> >
> >> > I was trying to emulate a node failure when a write and read were
> going
> >> on
> >> > (replica2) pool.
> >> >
> >> > I was expecting read and write continue after a small pause due to a
> Node
> >> > failure but it halts and never resumes until the failed node is up.
> >> >
> >> > I remember I tested the same scenario before in ceph mimic where it
> >> > continued IO after a small pause.
> >> >
> >> > regards
> >> > Amudhan P
> >> > ___
> >> > ceph-users mailing list -- ceph-users@ceph.io
> >> > To unsubscribe send an email to ceph-users-le...@ceph.io
> >>
> >>
> >>
> >>
>
>
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Cephfs IO halt on Node failure

2020-05-17 Thread Amudhan P
Hi,

The crush rule is "replicated" and min_size is 2, actually. I am trying to
test multiple volume configurations in a single filesystem
using file layouts.

I have created a metadata pool with rep 3 (min_size 2 and a replicated crush
rule) and a data pool with rep 3 (min_size 2 and a replicated crush rule).
I have also created multiple additional pools (replica 2, ec2-1 & ec4-2) and
added them to the filesystem.

Using file layouts, I have assigned a different data pool to each of several
folders, so I can test different configurations in the same filesystem; all
data pools have min_size set to handle a single node failure.
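
For reference, the extra pools and per-directory layouts were set along
these lines (filesystem, pool and directory names are examples):

ceph osd pool set ec4-2 allow_ec_overwrites true   # required before an EC pool can back cephfs
ceph fs add_data_pool cephfs ec4-2
setfattr -n ceph.dir.layout.pool -v ec4-2 /mnt/cephfs/ec42-dir
getfattr -n ceph.dir.layout /mnt/cephfs/ec42-dir   # confirm the layout took effect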

A single node failure is handled properly when there is only the metadata
pool and one data pool (rep 3).

After adding the additional data pools to the fs, the single-node-failure
scenario no longer works.

regards
Amudhan P

On Sun, May 17, 2020 at 1:29 AM Eugen Block  wrote:

> What’s your pool configuration wrt min_size and crush rules?
>
>
> Zitat von Amudhan P :
>
> > Hi,
> >
> > I am using ceph Nautilus cluster with below configuration.
> >
> > 3 node's (Ubuntu 18.04) each has 12 OSD's, and mds, mon and mgr are
> running
> > in shared mode.
> >
> > The client mounted through ceph kernel client.
> >
> > I was trying to emulate a node failure when a write and read were going
> on
> > (replica2) pool.
> >
> > I was expecting read and write continue after a small pause due to a Node
> > failure but it halts and never resumes until the failed node is up.
> >
> > I remember I tested the same scenario before in ceph mimic where it
> > continued IO after a small pause.
> >
> > regards
> > Amudhan P
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Cephfs IO halt on Node failure

2020-05-16 Thread Amudhan P
Hi,

I am using a Ceph Nautilus cluster with the configuration below.

3 nodes (Ubuntu 18.04), each with 12 OSDs; mds, mon and mgr are running in
shared mode on the same nodes.

The client is mounted through the ceph kernel client.

I was trying to emulate a node failure while a write and a read were going
on against a replica-2 pool.

I was expecting reads and writes to continue after a small pause due to the
node failure, but IO halts and never resumes until the failed node is back up.

I remember testing the same scenario before on Ceph Mimic, where IO
continued after a small pause.

regards
Amudhan P
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Cephfs - NFS Ganesha

2020-05-16 Thread Amudhan P
I tried adding it to ganesha.conf, but it didn't work out.

The default "ganesha-ceph.conf" file that comes with the "ganesha-ceph"
installation is working fine.
I will try again using the conf file provided in the nfs-ganesha GitHub repository.
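
For the next attempt, a minimal FSAL_CEPH export placed in the single
/etc/ganesha/ganesha.conf (as Daniel points out below) would look roughly
like this; it assumes Ganesha can read /etc/ceph/ceph.conf and a client
keyring on that host:

cat > /etc/ganesha/ganesha.conf <<'EOF'
EXPORT {
    Export_ID = 1;
    Path = "/";                 # path inside cephfs
    Pseudo = "/cephfs";         # NFSv4 pseudo path the clients mount
    Access_Type = RW;
    FSAL {
        Name = CEPH;
    }
}
EOF
systemctl restart nfs-ganesha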

On Fri, May 15, 2020 at 6:30 PM Daniel Gryniewicz  wrote:

> It sounds like you're putting the FSAL_CEPH config in another file in
> /etc/ganesha.  Ganesha only loads one file: /etc/ganesha/ganesha.conf -
> other files need to be included in that file with the %include command.
> For a simple config like yours, just use the single
> /etc/ganesha/ganesha.conf file.
>
> Daniel
>
> On 5/15/20 4:59 AM, Amudhan P wrote:
> > Hi Rafael,
> >
> > I have used config you have provided but still i am not able mount nfs. I
> > don't see any error in log msg
> >
> > Output from ganesha.log
> > ---
> > 15/05/2020 08:50:43 : epoch 5ebe57e3 : strgcntrl :
> ganesha.nfsd-8732[main]
> > main :MAIN :EVENT :ganesha.nfsd Starting: Ganesha Version 2.6.0
> > 15/05/2020 08:50:43 : epoch 5ebe57e3 : strgcntrl :
> ganesha.nfsd-8738[main]
> > nfs_set_param_from_conf :NFS STARTUP :EVENT :Configuration file
> > successfully parsed
> > 15/05/2020 08:50:43 : epoch 5ebe57e3 : strgcntrl :
> ganesha.nfsd-8738[main]
> > init_server_pkgs :NFS STARTUP :EVENT :Initializing ID Mapper.
> > 15/05/2020 08:50:43 : epoch 5ebe57e3 : strgcntrl :
> ganesha.nfsd-8738[main]
> > init_server_pkgs :NFS STARTUP :EVENT :ID Mapper successfully initialized.
> > 15/05/2020 08:50:43 : epoch 5ebe57e3 : strgcntrl :
> ganesha.nfsd-8738[main]
> > lower_my_caps :NFS STARTUP :EVENT :CAP_SYS_RESOURCE was successfully
> > removed for proper quota
> >   management in FSAL
> > 15/05/2020 08:50:43 : epoch 5ebe57e3 : strgcntrl :
> ganesha.nfsd-8738[main]
> > lower_my_caps :NFS STARTUP :EVENT :currenty set capabilities are: =
> > cap_chown,cap_dac_overrid
> >
> e,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_
> >
> raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_time,cap_sys_tty
> >
> _config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap+ep
> > 15/05/2020 08:50:43 : epoch 5ebe57e3 : strgcntrl :
> ganesha.nfsd-8738[main]
> > nfs_start_grace :STATE :EVENT :NFS Server Now IN GRACE, duration 90
> > 15/05/2020 08:50:43 : epoch 5ebe57e3 : strgcntrl :
> ganesha.nfsd-8738[main]
> > nfs_Init_svc :DISP :CRIT :Cannot acquire credentials for principal nfs
> > 15/05/2020 08:50:43 : epoch 5ebe57e3 : strgcntrl :
> ganesha.nfsd-8738[main]
> > nfs_Init_admin_thread :NFS CB :EVENT :Admin thread initialized
> > 15/05/2020 08:50:43 : epoch 5ebe57e3 : strgcntrl :
> ganesha.nfsd-8738[main]
> > nfs_rpc_cb_init_ccache :NFS STARTUP :EVENT :Callback creds directory
> > (/var/run/ganesha) alrea
> > dy exists
> > 15/05/2020 08:50:43 : epoch 5ebe57e3 : strgcntrl :
> ganesha.nfsd-8738[main]
> > nfs_rpc_cb_init_ccache :NFS STARTUP :WARN
> > :gssd_refresh_krb5_machine_credential failed (-1765
> > 328160:0)
> > 15/05/2020 08:50:43 : epoch 5ebe57e3 : strgcntrl :
> ganesha.nfsd-8738[main]
> > nfs_Start_threads :THREAD :EVENT :Starting delayed executor.
> > 15/05/2020 08:50:43 : epoch 5ebe57e3 : strgcntrl :
> ganesha.nfsd-8738[main]
> > nfs_Start_threads :THREAD :EVENT :9P/TCP dispatcher thread was started
> > successfully
> > 15/05/2020 08:50:43 : epoch 5ebe57e3 : strgcntrl :
> > ganesha.nfsd-8738[_9p_disp] _9p_dispatcher_thread :9P DISP :EVENT :9P
> > dispatcher started
> > 15/05/2020 08:50:43 : epoch 5ebe57e3 : strgcntrl :
> ganesha.nfsd-8738[main]
> > nfs_Start_threads :THREAD :EVENT :gsh_dbusthread was started successfully
> > 15/05/2020 08:50:43 : epoch 5ebe57e3 : strgcntrl :
> ganesha.nfsd-8738[main]
> > nfs_Start_threads :THREAD :EVENT :admin thread was started successfully
> > 15/05/2020 08:50:43 : epoch 5ebe57e3 : strgcntrl :
> ganesha.nfsd-8738[main]
> > nfs_Start_threads :THREAD :EVENT :reaper thread was started successfully
> > 15/05/2020 08:50:43 : epoch 5ebe57e3 : strgcntrl :
> ganesha.nfsd-8738[main]
> > nfs_Start_threads :THREAD :EVENT :General fridge was started successfully
> > 15/05/2020 08:50:43 : epoch 5ebe57e3 : strgcntrl :
> ganesha.nfsd-8738[main]
> > nfs_start :NFS STARTUP :EVENT
> > :-
> > 15/05/2020 08:50:43 : epoch 5ebe57e3 : strgcntrl :
> ganesha.nfsd-8738[main]
>

[ceph-users] Re: Cephfs - NFS Ganesha

2020-05-15 Thread Amudhan P
Hi Rafael,

I have used the config you provided but I am still not able to mount NFS. I
don't see any error in the log messages.

Output from ganesha.log
---
15/05/2020 08:50:43 : epoch 5ebe57e3 : strgcntrl : ganesha.nfsd-8732[main]
main :MAIN :EVENT :ganesha.nfsd Starting: Ganesha Version 2.6.0
15/05/2020 08:50:43 : epoch 5ebe57e3 : strgcntrl : ganesha.nfsd-8738[main]
nfs_set_param_from_conf :NFS STARTUP :EVENT :Configuration file
successfully parsed
15/05/2020 08:50:43 : epoch 5ebe57e3 : strgcntrl : ganesha.nfsd-8738[main]
init_server_pkgs :NFS STARTUP :EVENT :Initializing ID Mapper.
15/05/2020 08:50:43 : epoch 5ebe57e3 : strgcntrl : ganesha.nfsd-8738[main]
init_server_pkgs :NFS STARTUP :EVENT :ID Mapper successfully initialized.
15/05/2020 08:50:43 : epoch 5ebe57e3 : strgcntrl : ganesha.nfsd-8738[main]
lower_my_caps :NFS STARTUP :EVENT :CAP_SYS_RESOURCE was successfully
removed for proper quota
 management in FSAL
15/05/2020 08:50:43 : epoch 5ebe57e3 : strgcntrl : ganesha.nfsd-8738[main]
lower_my_caps :NFS STARTUP :EVENT :currenty set capabilities are: =
cap_chown,cap_dac_overrid
e,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_
raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_time,cap_sys_tty
_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap+ep
15/05/2020 08:50:43 : epoch 5ebe57e3 : strgcntrl : ganesha.nfsd-8738[main]
nfs_start_grace :STATE :EVENT :NFS Server Now IN GRACE, duration 90
15/05/2020 08:50:43 : epoch 5ebe57e3 : strgcntrl : ganesha.nfsd-8738[main]
nfs_Init_svc :DISP :CRIT :Cannot acquire credentials for principal nfs
15/05/2020 08:50:43 : epoch 5ebe57e3 : strgcntrl : ganesha.nfsd-8738[main]
nfs_Init_admin_thread :NFS CB :EVENT :Admin thread initialized
15/05/2020 08:50:43 : epoch 5ebe57e3 : strgcntrl : ganesha.nfsd-8738[main]
nfs_rpc_cb_init_ccache :NFS STARTUP :EVENT :Callback creds directory
(/var/run/ganesha) alrea
dy exists
15/05/2020 08:50:43 : epoch 5ebe57e3 : strgcntrl : ganesha.nfsd-8738[main]
nfs_rpc_cb_init_ccache :NFS STARTUP :WARN
:gssd_refresh_krb5_machine_credential failed (-1765
328160:0)
15/05/2020 08:50:43 : epoch 5ebe57e3 : strgcntrl : ganesha.nfsd-8738[main]
nfs_Start_threads :THREAD :EVENT :Starting delayed executor.
15/05/2020 08:50:43 : epoch 5ebe57e3 : strgcntrl : ganesha.nfsd-8738[main]
nfs_Start_threads :THREAD :EVENT :9P/TCP dispatcher thread was started
successfully
15/05/2020 08:50:43 : epoch 5ebe57e3 : strgcntrl :
ganesha.nfsd-8738[_9p_disp] _9p_dispatcher_thread :9P DISP :EVENT :9P
dispatcher started
15/05/2020 08:50:43 : epoch 5ebe57e3 : strgcntrl : ganesha.nfsd-8738[main]
nfs_Start_threads :THREAD :EVENT :gsh_dbusthread was started successfully
15/05/2020 08:50:43 : epoch 5ebe57e3 : strgcntrl : ganesha.nfsd-8738[main]
nfs_Start_threads :THREAD :EVENT :admin thread was started successfully
15/05/2020 08:50:43 : epoch 5ebe57e3 : strgcntrl : ganesha.nfsd-8738[main]
nfs_Start_threads :THREAD :EVENT :reaper thread was started successfully
15/05/2020 08:50:43 : epoch 5ebe57e3 : strgcntrl : ganesha.nfsd-8738[main]
nfs_Start_threads :THREAD :EVENT :General fridge was started successfully
15/05/2020 08:50:43 : epoch 5ebe57e3 : strgcntrl : ganesha.nfsd-8738[main]
nfs_start :NFS STARTUP :EVENT
:-
15/05/2020 08:50:43 : epoch 5ebe57e3 : strgcntrl : ganesha.nfsd-8738[main]
nfs_start :NFS STARTUP :EVENT : NFS SERVER INITIALIZED
15/05/2020 08:50:43 : epoch 5ebe57e3 : strgcntrl : ganesha.nfsd-8738[main]
nfs_start :NFS STARTUP :EVENT
:-
15/05/2020 08:52:13 : epoch 5ebe57e3 : strgcntrl :
ganesha.nfsd-8738[reaper] nfs_lift_grace_locked :STATE :EVENT :NFS Server
Now NOT IN GRACE

Regards
Amudhan P

On Fri, May 15, 2020 at 1:01 PM Rafael Lopez 
wrote:

> Hello Amudhan,
>
> The only ceph specific thing required in the ganesha config is to add the
> FSAL block to your export, everything else is standard ganesha config as
> far as I know. eg: this would export the root dir of your cephfs as
> nfs-server:/cephfs
> EXPORT
> {
> Export_ID = 100;
> Path = /;
> Pseudo = /cephfs;
> FSAL {
> Name = CEPH;
> User_Id = cephfs_cephx_user;
> }
> CLIENT {
> Clients =  1.2.3.4;
> Access_type = RW;
> }
> }
>
> This will rely on ceph config in /etc/ceph/ceph.conf containing typical
> cluster client connection info (cluster id, mon addresses etc).
> You also have to have the cephx user specified configured for cephfs
> access, including the keyring file in
> /etc/ceph/ceph.client.cephfs_cephx_user.keyring.
>
> Your cephx user co
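(For reference, creating such a cephx user and keyring could look like the
sketch below; the caps and the data pool name are assumptions and should be
matched to the actual cluster:)

  ceph auth get-or-create client.cephfs_cephx_user \
      mon 'allow r' mds 'allow rw' osd 'allow rw pool=cephfs_data' \
      -o /etc/ceph/ceph.client.cephfs_cephx_user.keyring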

[ceph-users] Cephfs - NFS Ganesha

2020-05-15 Thread Amudhan P
Hi,

I am trying to set up NFS Ganesha on Ceph Nautilus.

On an Ubuntu 18.04 system I have installed the nfs-ganesha (v2.6) and
nfs-ganesha-ceph packages and followed the steps in
https://docs.ceph.com/docs/nautilus/cephfs/nfs/, but I am not able to
export my cephfs volume. There is no error message from nfs-ganesha, and I
also doubt whether it is loading the nfs-ganesha-ceph config file from the
"/etc/ganesha" folder.

From the same system I am able to mount through the ceph kernel client
without any issue.

How do I make this work?

regards
Amudhan
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Cluster network and public network

2020-05-14 Thread Amudhan P
Will EC-based writes benefit from separate public and cluster networks?
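(For context, the split between the two networks normally comes from
ceph.conf entries along these lines; the subnets are placeholders:)

  [global]
      public_network  = 192.168.1.0/24   # client, mon and mds traffic
      cluster_network = 192.168.2.0/24   # replication/recovery traffic between OSDs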


On Thu, May 14, 2020 at 1:39 PM lin yunfan  wrote:

> That is correct.I didn't explain it clearly. I said that is because in
> some write only scenario  the public network and cluster network will
> all be saturated the same time.
> linyunfan
>
> Janne Johansson wrote on Thursday, May 14, 2020 at 3:42 PM:
> >
> > On Thu, May 14, 2020 at 08:42, lin yunfan wrote:
> >>
> >> Besides the recoverry  scenario , in a write only scenario the cluster
> >> network will use the almost the same bandwith as public network.
> >
> >
> > That would depend on the replication factor. If it is high, I would
> assume every MB from the client network would make (repl-factor - 1) times
> the data on the private network to send replication requests to the other
> OSD hosts with the same amount of data.
> >
> > --
> > May the most significant bit of your life be positive.
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Memory usage of OSD

2020-05-13 Thread Amudhan P
For Ceph releases before Nautilus, osd_memory_target changes only take
effect after restarting the OSD service.

I had a similar issue in Mimic and did the same in my test setup.

Before restarting the OSD service, make sure you set the osd nodown and osd
noout flags so that the restart doesn't trigger OSD down handling and recovery.
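(A rough sketch of that procedure; the memory value and the OSD id are only
examples:)

  ceph osd set noout
  ceph osd set nodown
  # pre-Nautilus: put the new value in ceph.conf under [osd], e.g.
  #   osd_memory_target = 2147483648
  systemctl restart ceph-osd@1    # repeat per OSD, one at a time
  ceph osd unset nodown
  ceph osd unset noout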

On Wed 13 May, 2020, 10:03 PM Mark Nelson,  wrote:

> Coincidentally Adam on our core team just reported this morning that he
> saw extremely high bluestore_cache_other memory usage while running
> compression performance tests as well.  That may indicate we have a
> memory leak related to the compression code.  I doubt setting the
> memory_target to 3GiB will help in the long run as that will just
> attempt to compensate by decreasing the other caches until nothing else
> can be shrunk.  Adam said he's planning to investigate so hopefully we
> will know more soon.
>
>
> Mark
>
>
>
> On 5/13/20 10:52 AM, Rafał Wądołowski wrote:
> > Mark,
> > Unfortunetly I closed terminal with mempool. But there was a lot of
> > bytes used by bluestore_cache_other. That was the highest value (about
> > 85%). The onode cache takes about 10%. PGlog and osdmaps was okey, low
> > values. I saw some ideas that maybe compression_mode force in pool can
> > make a mess.
> > One more thing, we are running stupid allocator. Right now I am
> > decrease the osd_memory_target to 3GiB and will wait if ram problem
> > occurs.
> >
> >
> >
> > Regards,
> >
> > */Rafał Wądołowski/*
> >
> > 
> > *From:* Mark Nelson 
> > *Sent:* Wednesday, May 13, 2020 3:30 PM
> > *To:* ceph-users@ceph.io 
> > *Subject:* [ceph-users] Re: Memory usage of OSD
> > On 5/13/20 12:43 AM, Rafał Wądołowski wrote:
> > > Hi,
> > > I noticed strange situation in one of our clusters. The OSD deamons
> > are taking too much RAM.
> > > We are running 12.2.12 and have default configuration of
> > osd_memory_target (4GiB).
> > > Heap dump shows:
> > >
> > > osd.2969 dumping heap profile now.
> > > 
> > > MALLOC: 6381526944 ( 6085.9 MiB) Bytes in use by application
> > > MALLOC: +0 (0.0 MiB) Bytes in page heap freelist
> > > MALLOC: +173373288 (  165.3 MiB) Bytes in central cache freelist
> > > MALLOC: + 17163520 (   16.4 MiB) Bytes in transfer cache freelist
> > > MALLOC: + 95339512 (   90.9 MiB) Bytes in thread cache freelists
> > > MALLOC: + 28995744 (   27.7 MiB) Bytes in malloc metadata
> > > MALLOC:   
> > > MALLOC: =   6696399008 ( 6386.2 MiB) Actual memory used (physical +
> > swap)
> > > MALLOC: +218267648 (  208.2 MiB) Bytes released to OS (aka
> unmapped)
> > > MALLOC:   
> > > MALLOC: =   691456 ( 6594.3 MiB) Virtual address space used
> > > MALLOC:
> > > MALLOC: 408276  Spans in use
> > > MALLOC: 75  Thread heaps in use
> > > MALLOC:   8192  Tcmalloc page size
> > > 
> > > Call ReleaseFreeMemory() to release freelist memory to the OS (via
> > madvise()).
> > > Bytes released to the OS take up virtual address space but no
> > physical memory.
> > >
> > > IMO "Bytes in use by application" should be less than
> > osd_memory_target. Am I correct?
> > > I checked heap dump with google-pprof and got following results.
> > > Total: 149.4 MB
> > >  60.5  40.5%  40.5% 60.5  40.5%
> > rocksdb::UncompressBlockContentsForCompressionType
> > >  34.2  22.9%  63.4% 34.2  22.9%
> > ceph::buffer::create_aligned_in_mempool
> > >  11.9   7.9%  71.3% 12.1   8.1%
> > std::_Rb_tree::_M_emplace_hint_unique
> > >  10.7   7.1%  78.5% 71.2  47.7% rocksdb::ReadBlockContents
> > >
> > > Does it mean that most of RAM is used by rocksdb?
> >
> >
> > It looks like your heap dump is only accounting for 149.4MB of the
> > memory so probably not representative across the whole ~6.5G. Instead
> > could you try dumping the mempools via "ceph daemon osd.2969
> > dump_mempools"?
> >
> >
> > >
> > > How can I take a deeper look into memory usage ?
> >
> >
> > Beyond looking at the mempools, you can see the bluestore cache
> > allocation information by either enabling debug bluestore and debug
> > priority_cache_manager 5, or potentially looking at the PCM perf
> > counters (I'm not sure if those were in 14.2.12 though). Between the
> > heap data, mempool data, and priority cache records, it should become
> > clearer what's going on.
> >
> >
> > Mark
> >
> >
> > >
> > >
> > > Regards,
> > >
> > > Rafał Wądołowski
> > >
> > >
> > >
> > > ___
> > > ceph-users mailing list -- ceph-users@ceph.io
> > > To unsubscribe send an email to ceph-users-le...@ceph.io
> > >
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
> 

[ceph-users] Read speed low in cephfs volume exposed as samba share using vfs_ceph

2020-05-12 Thread Amudhan P
Hi,

I am running a small 3 node Ceph Nautilus 14.2.8 cluster on Ubuntu 18.04.

I am testing the cluster by exposing a cephfs volume as a Samba v4 share
for users to access from Windows later on.
Samba version 4.7.6-Ubuntu and mount.cifs version 6.8.

When I tested from the ceph kernel mount, dd write speed was 600 MB/s and
md5sum read speed was 300 - 400 MB/s.

I have exposed the same volume in Samba using "vfs_ceph" and mounted it
through CIFS on another Ubuntu 18.04 client.
Now, when I perform a dd write I get a speed of 600 MB/s, but the md5sum
read speed of the file is only 65 MB/s.

I get a different result when I read the same file using smbclient: a speed
of 101 MB/s.

Why this difference, and what could be the issue?
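(For anyone wanting to reproduce the setup, a minimal vfs_ceph share
definition looks roughly like the sketch below; the share name and cephx
user id are assumptions:)

  [cephshare]
      path = /
      vfs objects = ceph
      ceph:config_file = /etc/ceph/ceph.conf
      ceph:user_id = samba
      read only = no
      kernel share modes = no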
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Cifs slow read speed

2020-05-12 Thread Amudhan P
Hi,

I am running a small Ceph Nautilus cluster on Ubuntu 18.04.

I am testing the cluster by exposing a cephfs volume as a Samba v4 share
for users to access from Windows.

When I test from the ceph kernel mount, dd write speed is 600 MB/s and
md5sum read speed is 700 - 800 MB/s.

I have exposed the same volume in Samba using "vfs_ceph" and mounted it
through CIFS on another Ubuntu 18.04 client.
Now, when I perform a dd write I get a speed of 600 MB/s, but the md5sum
read speed of the file is only 65 MB/s.

What could be the problem? Has anyone faced a similar issue?
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: New 3 node Ceph cluster

2020-03-15 Thread Amudhan P
What if I add another 16GB of RAM, for a planned capacity of not more than
150TB?

On Sun 15 Mar, 2020, 7:26 PM Martin Verges,  wrote:

> This is too little memory. We have already seen MDS with well over 50 GB
> Ram requirements.
>
> --
> Martin Verges
> Managing director
>
> Mobile: +49 174 9335695
> E-Mail: martin.ver...@croit.io
> Chat: https://t.me/MartinVerges
>
> croit GmbH, Freseniusstr. 31h, 81247 Munich
> CEO: Martin Verges - VAT-ID: DE310638492
> Com. register: Amtsgericht Munich HRB 231263
>
> Web: https://croit.io
> YouTube: https://goo.gl/PGE1Bx
>
>
> On Sun, Mar 15, 2020 at 14:34, Amudhan P wrote:
>
>> Thank you, All for your suggestions and ideas.
>>
>> what is your view on using MON, MGR, MDS and cephfs client or samba-ceph
>> vfs in a single machine (10 core xeon CPU with 16GB RAM and SSD disk)?.
>>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: New 3 node Ceph cluster

2020-03-15 Thread Amudhan P
Thank you all for your suggestions and ideas.

What is your view on running MON, MGR, MDS and the cephfs client (or samba
ceph vfs) on a single machine (10-core Xeon CPU with 16GB RAM and an SSD disk)?

On Sun, Mar 15, 2020 at 3:58 PM Dr. Marco Savoca 
wrote:

> Hi Jesper,
>
>
>
> can you state your suggestion more precisely? I have a similar setup and
> I’m also interested.
>
>
>
> If i understand you right, you suggest to create an RBD image for data
> that attachs to a VM with installed samba Server.
>
>
>
> But what would be the „best“ way to connect? Kernel module mapping or
> iSCSI targets.
>
>
>
> Another possibilty would be to create an RBD Image containing data and
> samba and use it with QEMU.
>
>
>
> Regards
>
>
>
> Marco Savoca
>
>
>
> *From: *jes...@krogh.cc
> *Sent: *Saturday, March 14, 2020 09:15
> *To: *Amudhan P 
> *Cc: *ceph-users 
> *Subject: *[ceph-users] Re: New 3 node Ceph cluster
>
>
>
> Hi.
>
>
>
> Unless there is plans for going to Petabyte scale with it - then I really
>
> dont see the benefits of getting CephFS involved over just an RBD image
>
> with a VM running standard samba on top.
>
>
>
> More performant and less complexity to handle - zero gains (by my book)
>
>
>
> Jesper
>
>
>
> > Hi,
>
> >
>
> > I am planning to create a new 3 node ceph storage cluster.
>
> >
>
> > I will be using Cephfs + with samba for max 10 clients for upload and
>
> > download.
>
> >
>
> > Storage Node HW is Intel Xeon E5v2 8 core single Proc, 32GB RAM and 10Gb
>
> > Nic 2 nos., 6TB SATA  HDD 24 Nos. each node, OS separate SSD disk.
>
> >
>
> > Earlier I have tested orchestration using ceph-deploy in the test setup.
>
> > now, is there any other alternative to ceph-deploy?
>
> >
>
> > Can I restrict folder access to the user using cephfs + vfs samba or
>
> > should
>
> > I use ceph client + samba?
>
> >
>
> > Ubuntu or Centos?
>
> >
>
> > Any block size consideration for object size, metadata when using cephfs?
>
> >
>
> > Idea or suggestion from existing users. I am also going to start to
>
> > explore
>
> > all the above.
>
> >
>
> > regards
>
> > Amudhan
>
> > ___
>
> > ceph-users mailing list -- ceph-users@ceph.io
>
> > To unsubscribe send an email to ceph-users-le...@ceph.io
>
> >
>
>
>
> ___
>
> ceph-users mailing list -- ceph-users@ceph.io
>
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] New 3 node Ceph cluster

2020-03-14 Thread Amudhan P
Hi,

I am planning to create a new 3 node ceph storage cluster.

I will be using CephFS with Samba for a maximum of 10 clients doing uploads
and downloads.

Storage node HW is a single-socket Intel Xeon E5 v2 8-core CPU, 32GB RAM,
2 x 10Gb NICs and 24 x 6TB SATA HDDs per node, with the OS on a separate SSD.

Earlier I tested orchestration using ceph-deploy in the test setup.
Now, is there any other alternative to ceph-deploy?

Can I restrict folder access per user using cephfs + samba vfs, or should
I use the ceph client + samba?

Ubuntu or CentOS?

Any block size considerations for object size and metadata when using cephfs?

Ideas or suggestions from existing users are welcome; I am also going to
start exploring all of the above.

regards
Amudhan
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: mds log showing msg with HANGUP

2019-10-22 Thread Amudhan P
Ok, thanks.

On Mon, Oct 21, 2019 at 8:28 AM Konstantin Shalygin  wrote:

> On 10/18/19 8:43 PM, Amudhan P wrote:
> > I am getting below error msg in ceph nautilus cluster, do I need to
> > worry about this?
> >
> > Oct 14 06:25:02 mon01 ceph-mds[35067]: 2019-10-14 06:25:02.209
> > 7f55a4c48700 -1 received  signal: Hangup from killall -q -1 ceph-mon
> > ceph-mgr ceph-mds ceph-osd ceph-fuse
> > Oct 14 06:25:02 mon01 ceph-mds[35067]: 2019-10-14 06:25:02.253
> > 7f55a4c48700 -1 received  signal: Hangup from  (PID: 244322) UID: 0
> > Oct 15 06:25:01 mon01 ceph-mds[35067]: 2019-10-15 06:25:01.988
> > 7f55a4c48700 -1 received  signal: Hangup from killall -q -1 ceph-mon
> > ceph-mgr ceph-mds ceph-osd ceph-fuse
> > Oct 15 06:25:02 mon01 ceph-mds[35067]: 2019-10-15 06:25:02.040
> > 7f55a4c48700 -1 received  signal: Hangup from pkill -1 -x
> > ceph-mon|ceph-mgr|ceph-mds|ceph-osd|ceph-fuse|r
> > Oct 16 06:25:02 mon01 ceph-mds[35067]: 2019-10-16 06:25:02.646
> > 7f55a4c48700 -1 received  signal: Hangup from killall -q -1 ceph-mon
> > ceph-mgr ceph-mds ceph-osd ceph-fuse
> > Oct 16 06:25:02 mon01 ceph-mds[35067]: 2019-10-16 06:25:02.678
> > 7f55a4c48700 -1 received  signal: Hangup from  (PID: 305528) UID: 0
> > Oct 17 06:25:02 mon01 ceph-mds[35067]: 2019-10-17 06:25:02.337
> > 7f55a4c48700 -1 received  signal: Hangup from killall -q -1 ceph-mon
> > ceph-mgr ceph-mds ceph-osd ceph-fuse
> > Oct 17 06:25:02 mon01 ceph-mds[35067]: 2019-10-17 06:25:02.381
> > 7f55a4c48700 -1 received  signal: Hangup from  (PID: 374957) UID: 0
> > Oct 18 06:25:01 mon01 ceph-mds[35067]: 2019-10-18 06:25:01.947
> > 7f55a4c48700 -1 received  signal: Hangup from killall -q -1 ceph-mon
> > ceph-mgr ceph-mds ceph-osd ceph-fuse
> > Oct 18 06:25:02 mon01 ceph-mds[35067]: 2019-10-18 06:25:02.015
> > 7f55a4c48700 -1 Fail to open '/proc/436318/cmdline' error = (2) No
> > such file or directory
> > Oct 18 06:25:02 mon01 ceph-mds[35067]: 2019-10-18 06:25:02.015
> > 7f55a4c48700 -1 received  signal: Hangup from  (PID: 436318)
> > UID: 0
>
>
> Is just a logrotate cron task, don't worry about it.
>
>
>
> k
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: How to reduce or control memory usage during recovery?

2019-09-24 Thread Amudhan P
Memory usage was high even when backfills was set to "1".

On Mon, Sep 23, 2019 at 8:54 PM Robert LeBlanc  wrote:

> On Fri, Sep 20, 2019 at 5:41 AM Amudhan P  wrote:
> > I have already set "mon osd memory target to 1Gb" and I have set
> max-backfill from 1 to 8.
>
> Reducing the number of backfills should reduce the amount of memory,
> especially for EC pools.
>
> 
> Robert LeBlanc
> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Host failure trigger " Cannot allocate memory"

2019-09-10 Thread Amudhan P
It's a test cluster; each node has a single OSD and 4GB RAM.

On Tue, Sep 10, 2019 at 3:42 PM Ashley Merrick 
wrote:

> What's specs ate the machines?
>
> Recovery work will use more memory the general clean operation and looks
> like your maxing out the available memory on the machines during CEPH
> trying to recover.
>
>
>
>  On Tue, 10 Sep 2019 18:10:50 +0800 * amudha...@gmail.com
>  * wrote 
>
> I have also found below error in dmesg.
>
> [332884.028810] systemd-journald[6240]: Failed to parse kernel command
> line, ignoring: Cannot allocate memory
> [332885.054147] systemd-journald[6240]: Out of memory.
> [332894.844765] systemd[1]: systemd-journald.service: Main process exited,
> code=exited, status=1/FAILURE
> [332897.199736] systemd[1]: systemd-journald.service: Failed with result
> 'exit-code'.
> [332906.503076] systemd[1]: Failed to start Journal Service.
> [332937.909198] systemd[1]: ceph-crash.service: Main process exited,
> code=exited, status=1/FAILURE
> [332939.308341] systemd[1]: ceph-crash.service: Failed with result
> 'exit-code'.
> [332949.545907] systemd[1]: systemd-journald.service: Service has no
> hold-off time, scheduling restart.
> [332949.546631] systemd[1]: systemd-journald.service: Scheduled restart
> job, restart counter is at 7.
> [332949.546781] systemd[1]: Stopped Journal Service.
> [332949.566402] systemd[1]: Starting Journal Service...
> [332950.190332] systemd[1]: ceph-osd@1.service: Main process exited,
> code=killed, status=6/ABRT
> [332950.190477] systemd[1]: ceph-osd@1.service: Failed with result
> 'signal'.
> [332950.842297] systemd-journald[6249]: File
> /var/log/journal/8f2559099bf54865adc95e5340d04447/system.journal corrupted
> or uncleanly shut down, renaming and replacing.
> [332951.019531] systemd[1]: Started Journal Service.
>
> On Tue, Sep 10, 2019 at 3:04 PM Amudhan P  wrote:
>
> Hi,
>
> I am using ceph version 13.2.6 (mimic) on test setup trying with cephfs.
>
> My current setup:
> 3 nodes, 1 node contain two bricks and other 2 nodes contain single brick
> each.
>
> Volume is a 3 replica, I am trying to simulate node failure.
>
> I powered down one host and started getting msg in other systems when
> running any command
> "-bash: fork: Cannot allocate memory" and system not responding to
> commands.
>
> what could be the reason for this?
> at this stage, I could able to read some of the data stored in the volume
> and some just waiting for IO.
>
> output from "sudo ceph -s"
>   cluster:
> id: 7c138e13-7b98-4309-b591-d4091a1742b4
> health: HEALTH_WARN
> 1 osds down
> 2 hosts (3 osds) down
> Degraded data redundancy: 5313488/7970232 objects degraded
> (66.667%), 64 pgs degraded
>
>   services:
> mon: 1 daemons, quorum mon01
> mgr: mon01(active)
> mds: cephfs-tst-1/1/1 up  {0=mon01=up:active}
> osd: 4 osds: 1 up, 2 in
>
>   data:
> pools:   2 pools, 64 pgs
> objects: 2.66 M objects, 206 GiB
> usage:   421 GiB used, 3.2 TiB / 3.6 TiB avail
> pgs: 5313488/7970232 objects degraded (66.667%)
>  64 active+undersized+degraded
>
>   io:
> client:   79 MiB/s rd, 24 op/s rd, 0 op/s wr
>
> output from : sudo ceph osd df
> ID CLASS WEIGHT  REWEIGHT SIZEUSE AVAIL   %USE  VAR  PGS
>  0   hdd 1.819400 0 B 0 B 0 B 00   0
>  3   hdd 1.819400 0 B 0 B 0 B 00   0
>  1   hdd 1.81940  1.0 1.8 TiB 211 GiB 1.6 TiB 11.34 1.00   0
>  2   hdd 1.81940  1.0 1.8 TiB 210 GiB 1.6 TiB 11.28 1.00  64
> TOTAL 3.6 TiB 421 GiB 3.2 TiB 11.31
> MIN/MAX VAR: 1.00/1.00  STDDEV: 0.03
>
> regards
> Amudhan
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Host failure trigger " Cannot allocate memory"

2019-09-10 Thread Amudhan P
I have also found below error in dmesg.

[332884.028810] systemd-journald[6240]: Failed to parse kernel command
line, ignoring: Cannot allocate memory
[332885.054147] systemd-journald[6240]: Out of memory.
[332894.844765] systemd[1]: systemd-journald.service: Main process exited,
code=exited, status=1/FAILURE
[332897.199736] systemd[1]: systemd-journald.service: Failed with result
'exit-code'.
[332906.503076] systemd[1]: Failed to start Journal Service.
[332937.909198] systemd[1]: ceph-crash.service: Main process exited,
code=exited, status=1/FAILURE
[332939.308341] systemd[1]: ceph-crash.service: Failed with result
'exit-code'.
[332949.545907] systemd[1]: systemd-journald.service: Service has no
hold-off time, scheduling restart.
[332949.546631] systemd[1]: systemd-journald.service: Scheduled restart
job, restart counter is at 7.
[332949.546781] systemd[1]: Stopped Journal Service.
[332949.566402] systemd[1]: Starting Journal Service...
[332950.190332] systemd[1]: ceph-osd@1.service: Main process exited,
code=killed, status=6/ABRT
[332950.190477] systemd[1]: ceph-osd@1.service: Failed with result 'signal'.
[332950.842297] systemd-journald[6249]: File
/var/log/journal/8f2559099bf54865adc95e5340d04447/system.journal corrupted
or uncleanly shut down, renaming and replacing.
[332951.019531] systemd[1]: Started Journal Service.

On Tue, Sep 10, 2019 at 3:04 PM Amudhan P  wrote:

> Hi,
>
> I am using ceph version 13.2.6 (mimic) on test setup trying with cephfs.
>
> My current setup:
> 3 nodes, 1 node contain two bricks and other 2 nodes contain single brick
> each.
>
> Volume is a 3 replica, I am trying to simulate node failure.
>
> I powered down one host and started getting msg in other systems when
> running any command
> "-bash: fork: Cannot allocate memory" and system not responding to
> commands.
>
> what could be the reason for this?
> at this stage, I could able to read some of the data stored in the volume
> and some just waiting for IO.
>
> output from "sudo ceph -s"
>   cluster:
> id: 7c138e13-7b98-4309-b591-d4091a1742b4
> health: HEALTH_WARN
> 1 osds down
> 2 hosts (3 osds) down
> Degraded data redundancy: 5313488/7970232 objects degraded
> (66.667%), 64 pgs degraded
>
>   services:
> mon: 1 daemons, quorum mon01
> mgr: mon01(active)
> mds: cephfs-tst-1/1/1 up  {0=mon01=up:active}
> osd: 4 osds: 1 up, 2 in
>
>   data:
> pools:   2 pools, 64 pgs
> objects: 2.66 M objects, 206 GiB
> usage:   421 GiB used, 3.2 TiB / 3.6 TiB avail
> pgs: 5313488/7970232 objects degraded (66.667%)
>  64 active+undersized+degraded
>
>   io:
> client:   79 MiB/s rd, 24 op/s rd, 0 op/s wr
>
> output from : sudo ceph osd df
> ID CLASS WEIGHT  REWEIGHT SIZEUSE AVAIL   %USE  VAR  PGS
>  0   hdd 1.819400 0 B 0 B 0 B 00   0
>  3   hdd 1.819400 0 B 0 B 0 B 00   0
>  1   hdd 1.81940  1.0 1.8 TiB 211 GiB 1.6 TiB 11.34 1.00   0
>  2   hdd 1.81940  1.0 1.8 TiB 210 GiB 1.6 TiB 11.28 1.00  64
> TOTAL 3.6 TiB 421 GiB 3.2 TiB 11.31
> MIN/MAX VAR: 1.00/1.00  STDDEV: 0.03
>
> regards
> Amudhan
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Host failure trigger " Cannot allocate memory"

2019-09-10 Thread Amudhan P
Hi,

I am using ceph version 13.2.6 (Mimic) on a test setup, trying out cephfs.

My current setup:
3 nodes; 1 node contains two bricks and the other 2 nodes contain a single
brick each.

The volume is a 3-replica volume, and I am trying to simulate a node failure.

I powered down one host and started getting the message
"-bash: fork: Cannot allocate memory" on the other systems when running any
command, and the systems stopped responding to commands.

What could be the reason for this?
At this stage, I could still read some of the data stored in the volume,
while other reads just waited on IO.

output from "sudo ceph -s"
  cluster:
id: 7c138e13-7b98-4309-b591-d4091a1742b4
health: HEALTH_WARN
1 osds down
2 hosts (3 osds) down
Degraded data redundancy: 5313488/7970232 objects degraded
(66.667%), 64 pgs degraded

  services:
mon: 1 daemons, quorum mon01
mgr: mon01(active)
mds: cephfs-tst-1/1/1 up  {0=mon01=up:active}
osd: 4 osds: 1 up, 2 in

  data:
pools:   2 pools, 64 pgs
objects: 2.66 M objects, 206 GiB
usage:   421 GiB used, 3.2 TiB / 3.6 TiB avail
pgs: 5313488/7970232 objects degraded (66.667%)
 64 active+undersized+degraded

  io:
client:   79 MiB/s rd, 24 op/s rd, 0 op/s wr

Output from "sudo ceph osd df":
ID CLASS WEIGHT  REWEIGHT SIZEUSE AVAIL   %USE  VAR  PGS
 0   hdd 1.819400 0 B 0 B 0 B 00   0
 3   hdd 1.819400 0 B 0 B 0 B 00   0
 1   hdd 1.81940  1.0 1.8 TiB 211 GiB 1.6 TiB 11.34 1.00   0
 2   hdd 1.81940  1.0 1.8 TiB 210 GiB 1.6 TiB 11.28 1.00  64
TOTAL 3.6 TiB 421 GiB 3.2 TiB 11.31
MIN/MAX VAR: 1.00/1.00  STDDEV: 0.03

regards
Amudhan
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] ceph cluster warning after adding disk to cluster

2019-09-04 Thread Amudhan P
Hi,

I am using ceph version 13.2.6 (Mimic) on a test setup, trying out cephfs.
My ceph health status is showing a warning.

My current setup:
3 OSD nodes, each with a single disk. Recently I added one more disk to one
of the nodes and the ceph cluster status started showing a warning.
I can see the progress, but it has been more than 12 hours and it is still
moving objects.

How do I increase the speed of moving objects?

output from "ceph -s"

  cluster:
id: 7c138e13-7b98-4309-b591-d4091a1742b4
health: HEALTH_WARN
834820/7943361 objects misplaced (10.510%)

  services:
mon: 1 daemons, quorum mon01
mgr: mon01(active)
mds: cephfs-tst-1/1/1 up  {0=mon01=up:active}
osd: 4 osds: 4 up, 4 in; 12 remapped pgs

  data:
pools:   2 pools, 64 pgs
objects: 2.65 M objects, 178 GiB
usage:   548 GiB used, 6.7 TiB / 7.3 TiB avail
pgs: 834820/7943361 objects misplaced (10.510%)
 52 active+clean
 11 active+remapped+backfill_wait
 1  active+remapped+backfilling

  io:
recovery: 0 B/s, 6 objects/s

output from "ceph osd df "

ID CLASS WEIGHT  REWEIGHT SIZEUSE AVAIL   %USE VAR  PGS
 0   hdd 1.81940  1.0 1.8 TiB  88 GiB 1.7 TiB 4.71 0.64  40
 3   hdd 1.81940  1.0 1.8 TiB  96 GiB 1.7 TiB 5.15 0.70  24
 1   hdd 1.81940  1.0 1.8 TiB 182 GiB 1.6 TiB 9.79 1.33  64
 2   hdd 1.81940  1.0 1.8 TiB 182 GiB 1.6 TiB 9.79 1.33  64
TOTAL 7.3 TiB 548 GiB 6.7 TiB 7.36
MIN/MAX VAR: 0.64/1.33  STDDEV: 2.43
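(One common way to speed up the data movement above, at the cost of client
IO, is to raise the backfill and recovery throttles; the values below are
only examples:)

  ceph tell 'osd.*' injectargs '--osd-max-backfills 4 --osd-recovery-max-active 4'
  # once the rebalance has finished, drop back to the defaults
  ceph tell 'osd.*' injectargs '--osd-max-backfills 1 --osd-recovery-max-active 3'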

regards
Amudhan P
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: help

2019-08-30 Thread Amudhan P
My cluster health status went to warning mode only after running mkdir for
1000s of folders with multiple subdirectories. If this made an OSD crash,
does it really take that long to heal empty directories?

On Fri, Aug 30, 2019 at 3:12 PM Janne Johansson  wrote:

> On Fri, Aug 30, 2019 at 10:49, Amudhan P wrote:
>
>> After leaving 12 hours time now cluster status is healthy, but why did it
>> take such a long time for backfill?
>> How do I fine-tune? if in case of same kind error pop-out again.
>>
>> The backfilling is taking a while because max_backfills = 1 and you only
>>> have 3 OSD's total so the backfilling per PG has to have for the previous
>>> PG backfill to complete.
>>>
>>>
> That setting is the main tuning, EXCEPT it will be at the expense of
> client traffic, so you can allow a large(r) amount of parallel recoveries
> and backfills, but of course it will be more noticeable for your client IO
> if you do.
>
> Lastly, getting backfill MB/s up is "best" done by having a huge amount of
> OSD hosts, and fast OSD drives and let the cluster work in parallel, as
> opposed to having 3 drives only because you will see no parallelism on that
> setup (if you have size=3 all OSDs are always involved in every single PG
> to recover) and you will just see overhead compare to what disk-read and
> disk-write would give on a single drive.
>
> --
> May the most significant bit of your life be positive.
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: help

2019-08-30 Thread Amudhan P
After leaving it for 12 hours, the cluster status is now healthy, but why
did it take such a long time to backfill?

How do I fine-tune this, in case the same kind of error pops up again?
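(For reference, Caspar's advice below about running three mons could be done
roughly as sketched here, assuming the cluster was deployed with ceph-deploy;
the hostnames are taken from the osd tree:)

  ceph-deploy mon add test-node2
  ceph-deploy mon add test-node3
  # verify that all mons have joined the quorum
  ceph quorum_status --format json-pretty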


On Thu, Aug 29, 2019 at 6:52 PM Caspar Smit  wrote:

> Hi,
>
> This output doesn't show anything 'wrong' with the cluster. It's just
> still recovering (backfilling) from what seems like one of your OSD's
> crashed and restarted.
> The backfilling is taking a while because max_backfills = 1 and you only
> have 3 OSD's total so the backfilling per PG has to have for the previous
> PG backfill to complete.
>
> The real concern is not the current state of the cluster but how you end
> up in this state. Probably the script overloaded the OSD's.
>
> I also advise you to add a monitor to your other 2 nodes as well (running
> 3 mons total). Running 1 mon is not advised.
>
> Furthermore, just let the backfilling complete and HEALTH_OK will return
> eventually if nothing goes wrong in between.
>
> Met vriendelijke groet,
>
> Caspar Smit
> Systemengineer
> SuperNAS
> Dorsvlegelstraat 13
> 1445 PA Purmerend
>
> t: (+31) 299 410 414
> e: caspars...@supernas.eu
> w: www.supernas.eu
>
>
> On Thu, Aug 29, 2019 at 14:35, Amudhan P wrote:
>
>> output from "ceph -s "
>>
>>   cluster:
>> id: 7c138e13-7b98-4309-b591-d4091a1742b4
>> health: HEALTH_WARN
>> Degraded data redundancy: 1141587/7723191 objects degraded
>> (14.781%), 15 pgs degraded, 16 pgs undersized
>>
>>   services:
>> mon: 1 daemons, quorum mon01
>> mgr: mon01(active)
>> mds: cephfs-tst-1/1/1 up  {0=mon01=up:active}
>> osd: 3 osds: 3 up, 3 in; 16 remapped pgs
>>
>>   data:
>> pools:   2 pools, 64 pgs
>> objects: 2.57 M objects, 59 GiB
>> usage:   190 GiB used, 5.3 TiB / 5.5 TiB avail
>> pgs: 1141587/7723191 objects degraded (14.781%)
>>  48 active+clean
>>  15 active+undersized+degraded+remapped+backfill_wait
>>  1  active+undersized+remapped+backfilling
>>
>>   io:
>> recovery: 0 B/s, 10 objects/s
>>
>> output from  "ceph osd tree"
>> ID CLASS WEIGHT  TYPE NAME   STATUS REWEIGHT PRI-AFF
>> -1   5.45819 root default
>> -3   1.81940 host test-node1
>>  0   hdd 1.81940 osd.0   up  1.0 1.0
>> -5   1.81940 host test-node2
>>  1   hdd 1.81940 osd.1   up  1.0 1.0
>> -7   1.81940 host test-node3
>>  2   hdd 1.81940 osd.2   up  1.0 1.0
>>
>> failure domain not configured yet, setup is 3 OSD node each with a single
>> disk, 1 node with mon running.
>> the cluster was healthy until I run a script for creating multiple
>> folders.
>>
>> regards
>> Amudhan
>>
>> On Thu, Aug 29, 2019 at 5:33 PM Heðin Ejdesgaard Møller 
>> wrote:
>>
>>> In adition to ceph -s, could you provide the output of
>>> ceph osd tree
>>> and specify what your failure domain is ?
>>>
>>> /Heðin
>>>
>>>
>>> On hós, 2019-08-29 at 13:55 +0200, Janne Johansson wrote:
>>> >
>>> >
>>> > Den tors 29 aug. 2019 kl 13:50 skrev Amudhan P :
>>> > > Hi,
>>> > >
>>> > > I am using ceph version 13.2.6 (mimic) on test setup trying with
>>> > > cephfs.
>>> > > my ceph health status showing warning .
>>> > >
>>> > > "ceph health"
>>> > > HEALTH_WARN Degraded data redundancy: 1197023/7723191 objects
>>> > > degraded (15.499%)
>>> > >
>>> > > "ceph health detail"
>>> > > HEALTH_WARN Degraded data redundancy: 1197128/7723191 objects
>>> > > degraded (15.500%)
>>> > > PG_DEGRADED Degraded data redundancy: 1197128/7723191 objects
>>> > > degraded (15.500%)
>>> > > pg 2.0 is stuck undersized for 1076.454929, current state
>>> > > active+undersized+
>>> > > pg 2.2 is stuck undersized for 1076.456639, current state
>>> > > active+undersized+
>>> > >
>>> >
>>> > How does "ceph -s" look?
>>> > It should have more info on what else is wrong.
>>> >
>>> > --
>>> > May the most significant bit of your life be positive.
>>> > ___
>>> > ceph-users mailing list -- ceph-users@ceph.io
>>> > To unsubscribe send an email to ceph-users-le...@ceph.io
>>>
>>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] modifying "osd_memory_target"

2019-08-29 Thread Amudhan P
Hi,

How do I change "osd_memory_target" from the ceph command line?

regards
Amudhan
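(A hedged sketch of the usual ways to set it; the 2 GiB value is only an
example, and pre-Nautilus releases may still need an OSD restart for the
change to take effect:)

  # persistent, via the mon config database (Mimic and later)
  ceph config set osd osd_memory_target 2147483648
  # or at runtime on the running OSDs
  ceph tell 'osd.*' injectargs '--osd_memory_target=2147483648'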
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: help

2019-08-29 Thread Amudhan P
output from "ceph osd pool ls detail"

pool 1 'cephfs_data' replicated size 3 min_size 2 crush_rule 0 object_hash
rjenkins pg_num 32 pgp_num 32 last_change 74 lfor 0/64 flags hashpspool
stripe_width 0 application cephfs

pool 2 'cephfs_metadata' replicated size 3 min_size 2 crush_rule 0
object_hash rjenkins pg_num 32 pgp_num 32 last_change 75 lfor 0/67 flags
hashpspool stripe_width 0 application cephfs

On Thu, Aug 29, 2019 at 6:13 PM Heðin Ejdesgaard Møller 
wrote:

> What's the output of
> ceph osd pool ls detail
>
>
> On Thu, 2019-08-29 at 18:06 +0530, Amudhan P wrote:
> > output from "ceph -s "
> >
> >   cluster:
> > id: 7c138e13-7b98-4309-b591-d4091a1742b4
> > health: HEALTH_WARN
> > Degraded data redundancy: 1141587/7723191 objects
> > degraded (14.781%), 15 pgs degraded, 16 pgs undersized
> >
> >   services:
> > mon: 1 daemons, quorum mon01
> > mgr: mon01(active)
> > mds: cephfs-tst-1/1/1 up  {0=mon01=up:active}
> > osd: 3 osds: 3 up, 3 in; 16 remapped pgs
> >
> >   data:
> > pools:   2 pools, 64 pgs
> > objects: 2.57 M objects, 59 GiB
> > usage:   190 GiB used, 5.3 TiB / 5.5 TiB avail
> > pgs: 1141587/7723191 objects degraded (14.781%)
> >  48 active+clean
> >  15 active+undersized+degraded+remapped+backfill_wait
> >  1  active+undersized+remapped+backfilling
> >
> >   io:
> > recovery: 0 B/s, 10 objects/s
> >
> > output from  "ceph osd tree"
> > ID CLASS WEIGHT  TYPE NAME   STATUS REWEIGHT PRI-AFF
> > -1   5.45819 root default
> > -3   1.81940 host test-node1
> >  0   hdd 1.81940 osd.0   up  1.0 1.0
> > -5   1.81940 host test-node2
> >  1   hdd 1.81940 osd.1   up  1.0 1.0
> > -7   1.81940 host test-node3
> >  2   hdd 1.81940 osd.2   up  1.0 1.0
> >
> > failure domain not configured yet, setup is 3 OSD node each with a
> > single disk, 1 node with mon running.
> > the cluster was healthy until I run a script for creating multiple
> > folders.
> >
> > regards
> > Amudhan
> >
> > On Thu, Aug 29, 2019 at 5:33 PM Heðin Ejdesgaard Møller <
> > h...@synack.fo> wrote:
> > > In adition to ceph -s, could you provide the output of
> > > ceph osd tree
> > > and specify what your failure domain is ?
> > >
> > > /Heðin
> > >
> > >
> > > On hós, 2019-08-29 at 13:55 +0200, Janne Johansson wrote:
> > > >
> > > >
> > > > Den tors 29 aug. 2019 kl 13:50 skrev Amudhan P <
> > > amudha...@gmail.com>:
> > > > > Hi,
> > > > >
> > > > > I am using ceph version 13.2.6 (mimic) on test setup trying
> > > with
> > > > > cephfs.
> > > > > my ceph health status showing warning .
> > > > >
> > > > > "ceph health"
> > > > > HEALTH_WARN Degraded data redundancy: 1197023/7723191 objects
> > > > > degraded (15.499%)
> > > > >
> > > > > "ceph health detail"
> > > > > HEALTH_WARN Degraded data redundancy: 1197128/7723191 objects
> > > > > degraded (15.500%)
> > > > > PG_DEGRADED Degraded data redundancy: 1197128/7723191 objects
> > > > > degraded (15.500%)
> > > > > pg 2.0 is stuck undersized for 1076.454929, current state
> > > > > active+undersized+
> > > > > pg 2.2 is stuck undersized for 1076.456639, current state
> > > > > active+undersized+
> > > > >
> > > >
> > > > How does "ceph -s" look?
> > > > It should have more info on what else is wrong.
> > > >
> > > > --
> > > > May the most significant bit of your life be positive.
> > > > ___
> > > > ceph-users mailing list -- ceph-users@ceph.io
> > > > To unsubscribe send an email to ceph-users-le...@ceph.io
> > >
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: help

2019-08-29 Thread Amudhan P
output from "ceph -s "

  cluster:
id: 7c138e13-7b98-4309-b591-d4091a1742b4
health: HEALTH_WARN
Degraded data redundancy: 1141587/7723191 objects degraded
(14.781%), 15 pgs degraded, 16 pgs undersized

  services:
mon: 1 daemons, quorum mon01
mgr: mon01(active)
mds: cephfs-tst-1/1/1 up  {0=mon01=up:active}
osd: 3 osds: 3 up, 3 in; 16 remapped pgs

  data:
pools:   2 pools, 64 pgs
objects: 2.57 M objects, 59 GiB
usage:   190 GiB used, 5.3 TiB / 5.5 TiB avail
pgs: 1141587/7723191 objects degraded (14.781%)
 48 active+clean
 15 active+undersized+degraded+remapped+backfill_wait
 1  active+undersized+remapped+backfilling

  io:
recovery: 0 B/s, 10 objects/s

output from  "ceph osd tree"
ID CLASS WEIGHT  TYPE NAME   STATUS REWEIGHT PRI-AFF
-1   5.45819 root default
-3   1.81940 host test-node1
 0   hdd 1.81940 osd.0   up  1.0 1.0
-5   1.81940 host test-node2
 1   hdd 1.81940 osd.1   up  1.0 1.0
-7   1.81940 host test-node3
 2   hdd 1.81940 osd.2   up  1.0 1.0

The failure domain is not configured yet; the setup is 3 OSD nodes, each
with a single disk, and 1 node with a mon running.
The cluster was healthy until I ran a script that creates multiple folders.

regards
Amudhan

On Thu, Aug 29, 2019 at 5:33 PM Heðin Ejdesgaard Møller 
wrote:

> In adition to ceph -s, could you provide the output of
> ceph osd tree
> and specify what your failure domain is ?
>
> /Heðin
>
>
> On hós, 2019-08-29 at 13:55 +0200, Janne Johansson wrote:
> >
> >
> > On Thu, Aug 29, 2019 at 13:50, Amudhan P wrote:
> > > Hi,
> > >
> > > I am using ceph version 13.2.6 (mimic) on test setup trying with
> > > cephfs.
> > > my ceph health status showing warning .
> > >
> > > "ceph health"
> > > HEALTH_WARN Degraded data redundancy: 1197023/7723191 objects
> > > degraded (15.499%)
> > >
> > > "ceph health detail"
> > > HEALTH_WARN Degraded data redundancy: 1197128/7723191 objects
> > > degraded (15.500%)
> > > PG_DEGRADED Degraded data redundancy: 1197128/7723191 objects
> > > degraded (15.500%)
> > > pg 2.0 is stuck undersized for 1076.454929, current state
> > > active+undersized+
> > > pg 2.2 is stuck undersized for 1076.456639, current state
> > > active+undersized+
> > >
> >
> > How does "ceph -s" look?
> > It should have more info on what else is wrong.
> >
> > --
> > May the most significant bit of your life be positive.
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] help

2019-08-29 Thread Amudhan P
Hi,

I am using ceph version 13.2.6 (Mimic) on a test setup, trying out cephfs.
My ceph health status is showing a warning.

"ceph health"
HEALTH_WARN Degraded data redundancy: 1197023/7723191 objects degraded
(15.499%)

"ceph health detail"
HEALTH_WARN Degraded data redundancy: 1197128/7723191 objects degraded
(15.500%)
PG_DEGRADED Degraded data redundancy: 1197128/7723191 objects degraded
(15.500%)
pg 2.0 is stuck undersized for 1076.454929, current state
active+undersized+
pg 2.2 is stuck undersized for 1076.456639, current state
active+undersized+
pg 2.3 is stuck undersized for 1076.456113, current state
active+undersized+
pg 2.7 is stuck undersized for 1076.456342, current state
active+undersized+
pg 2.8 is stuck undersized for 1076.455920, current state
active+undersized+
pg 2.a is stuck undersized for 1076.486412, current state
active+undersized+
pg 2.b is stuck undersized for 1076.485975, current state
active+undersized+
pg 2.f is stuck undersized for 1076.486953, current state
active+undersized+
pg 2.10 is stuck undersized for 1076.486763, current state
active+undersized
pg 2.12 is stuck undersized for 1076.486539, current state
active+undersized
pg 2.13 is stuck undersized for 1075.419199, current state
active+undersized
pg 2.17 is stuck undersized for 1076.455424, current state
active+undersized
pg 2.18 is stuck undersized for 1075.419639, current state
active+undersized
pg 2.1a is stuck undersized for 1076.455966, current state
active+undersized
pg 2.1b is stuck undersized for 1076.486677, current state
active+undersized
pg 2.1f is stuck undersized for 1076.455572, current state
active+undersized

How do I bring the health status back to OK?

regards
Amudhan
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io