Re: [ceph-users] strange osd beacon

2019-06-15 Thread huang jun
OSDs send a beacon every 300s; it is used to let the mon know that the
OSD is alive. In some cases the OSD has no peers, e.g. when no pools
have been created, so the pg list in the beacon is empty.
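
A quick way to double-check on a live cluster (a rough sketch in plain
shell, assuming default admin socket paths; osd.1092 is the id from
your paste below):

  # interval at which the osd sends its beacon (defaults to 300s)
  ceph daemon osd.1092 config get osd_beacon_report_interval

  # the beacon shows up as an in-flight op on the monitor, which is
  # where a dump like the one below comes from
  ceph daemon mon.$(hostname -s) ops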

On Fri, Jun 14, 2019 at 12:53 PM Rafał Wądołowski wrote:
>
> Hi,
>
> Is it normal for an osd beacon to have no pgs, like the one below? This
> drive contains data, but I cannot get the OSD to run.
>
> Ceph v.12.2.4
>
>
>  {
> "description": "osd_beacon(pgs [] lec 857158 v869771)",
> "initiated_at": "2019-06-14 06:39:37.972795",
> "age": 189.310037,
> "duration": 189.453167,
> "type_data": {
> "events": [
> {
> "time": "2019-06-14 06:39:37.972795",
> "event": "initiated"
> },
> {
> "time": "2019-06-14 06:39:37.972954",
> "event": "mon:_ms_dispatch"
> },
> {
> "time": "2019-06-14 06:39:37.972956",
> "event": "mon:dispatch_op"
> },
> {
> "time": "2019-06-14 06:39:37.972956",
> "event": "psvc:dispatch"
> },
> {
> "time": "2019-06-14 06:39:37.972976",
> "event": "osdmap:preprocess_query"
> },
> {
> "time": "2019-06-14 06:39:37.972978",
> "event": "osdmap:preprocess_beacon"
> },
> {
> "time": "2019-06-14 06:39:37.972982",
> "event": "forward_request_leader"
> },
> {
> "time": "2019-06-14 06:39:37.973064",
> "event": "forwarded"
> }
> ],
> "info": {
> "seq": 22378,
> "src_is_mon": false,
> "source": "osd.1092 10.11.2.33:6842/159188",
> "forwarded_to_leader": true
> }
> }
> }
>
>
> Best Regards,
>
> Rafał Wądołowski
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] problem with degraded PG

2019-06-15 Thread huang jun
can you show us the output of 'ceph osd dump' and 'ceph health detail'?

On Fri, Jun 14, 2019 at 8:02 PM Luk wrote:
>
> Hello,
>
> All kudos go to friends from Wroclaw, PL :)
>
> It was as simple as a typo...
>
> The osd was added to the crushmap twice by the commands below (these
> commands were run over a week ago - there was no problem then; it showed
> up after replacing another osd, osd.7):
>
> ceph osd crush add osd.112 0.00 root=hdd
> ceph osd crush move osd.112 0.00 root=hdd rack=rack-a host=stor-a02
> ceph osd crush add osd.112 0.00 host=stor-a02
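>
> A quick way to spot this kind of duplicate (a rough sketch; 'ceph osd
> crush ls' assumes a Luminous-era CLI, and the bucket name is the one
> from the tree below):
>
> ceph osd find 112                   # crush_location the cluster reports for the osd
> ceph osd crush ls stor-a02          # items directly under the intended host bucket
> ceph osd tree | grep -w 'osd\.112'  # a duplicate shows up as two lines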
>
> and the ceph osd tree was like this:
> [root@ceph-mon-01 ~]# ceph osd tree
> ID   CLASS WEIGHT    TYPE NAME              STATUS REWEIGHT PRI-AFF
> -100       200.27496 root hdd
> -101        67.64999     rack rack-a
>   -2        33.82500         host stor-a01
>    0   hdd   7.27499             osd.0          up      1.0     1.0
>    6   hdd   7.27499             osd.6          up      1.0     1.0
>   12   hdd   7.27499             osd.12         up      1.0     1.0
>  108   hdd   4.0                 osd.108        up      1.0     1.0
>  109   hdd   4.0                 osd.109        up      1.0     1.0
>  110   hdd   4.0                 osd.110        up      1.0     1.0
>   -7        33.82500         host stor-a02
>    5   hdd   7.27499             osd.5          up      1.0     1.0
>    9   hdd   7.27499             osd.9          up      1.0     1.0
>   15   hdd   7.27499             osd.15         up      1.0     1.0
>  111   hdd   4.0                 osd.111        up      1.0     1.0
>  112   hdd   4.0                 osd.112        up      1.0     1.0
>  113   hdd   4.0                 osd.113        up      1.0     1.0
> -102        60.97498     rack rack-b
>   -3        27.14998         host stor-b01
>    1   hdd   7.27499             osd.1          up      1.0     1.0
>    7   hdd   0.5                 osd.7          up      1.0     1.0
>   13   hdd   7.27499             osd.13         up      1.0     1.0
>  114   hdd   4.0                 osd.114        up      1.0     1.0
>  115   hdd   4.0                 osd.115        up      1.0     1.0
>  116   hdd   4.0                 osd.116        up      1.0     1.0
>   -4        33.82500         host stor-b02
>    2   hdd   7.27499             osd.2          up      1.0     1.0
>   10   hdd   7.27499             osd.10         up      1.0     1.0
>   16   hdd   7.27499             osd.16         up      1.0     1.0
>  117   hdd   4.0                 osd.117        up      1.0     1.0
>  118   hdd   4.0                 osd.118        up      1.0     1.0
>  119   hdd   4.0                 osd.119        up      1.0     1.0
> -103        67.64999     rack rack-c
>   -6        33.82500         host stor-c01
>    4   hdd   7.27499             osd.4          up      1.0     1.0
>    8   hdd   7.27499             osd.8          up      1.0     1.0
>   14   hdd   7.27499             osd.14         up      1.0     1.0
>  120   hdd   4.0                 osd.120        up      1.0     1.0
>  121   hdd   4.0                 osd.121        up      1.0     1.0
>  122   hdd   4.0                 osd.122        up      1.0     1.0
>   -5        33.82500         host stor-c02
>    3   hdd   7.27499             osd.3          up      1.0     1.0
>   11   hdd   7.27499             osd.11         up      1.0     1.0
>   17   hdd   7.27499             osd.17         up      1.0     1.0
>  123   hdd   4.0                 osd.123        up      1.0     1.0
>  124   hdd   4.0                 osd.124        up      1.0     1.0
>  125   hdd   4.0                 osd.125        up      1.0     1.0
>  112   hdd   4.0                 osd.112        up      1.0     1.0
>
>  [cut]
>
>  after editing the crushmap and removing osd.112 from root, ceph started
>  to recover and is healthy now :)
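>
>  For the archives, this kind of edit is usually a crushmap round trip (a
>  sketch only, not necessarily the exact commands used here; double-check
>  the bucket and item names before injecting anything):
>
>  ceph osd getcrushmap -o crushmap.bin
>  crushtool -d crushmap.bin -o crushmap.txt
>  # delete the duplicate "item osd.112 ..." entry from the root bucket
>  crushtool -c crushmap.txt -o crushmap.new
>  ceph osd setcrushmap -i crushmap.new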
>
>  Regards
>  Lukasz
>
>
> > Here is the ceph osd tree; the first post also contains ceph osd df tree:
>
> > https://pastebin.com/Vs75gpwZ
>
>
>
> >> Ahh I was thinking of chooseleaf_vary_r, which you already have.
> >> So probably not related to tunables. What is your `ceph osd tree` ?
>
> >> By the way, 12.2.9 has an unrelated bug (details
> >> http://tracker.ceph.com/issues/36686)
> >> AFAIU you will just need to update to v12.2.11 or v12.2.12 for that fix.
>
> >> -- Dan
>
> >> On Fri, Jun 14, 2019 at 11:29 AM Luk  wrote:
> >>>
> >>> Hi,
> >>>
> >>> here is the output:
> >>>
> >>> ceph osd crush show-tunables
> >>> {
> >>> "choose_local_tries": 0,
> >>> "choose_local_fallback_tries": 0,
> >>> "choose_total_tries": 100,
> >>> "chooseleaf_descend_once": 1,
> >>> "chooseleaf_vary_r": 1,
> >>> "chooseleaf_stable": 0,
> >>> "straw_calc_version": 1,
> >>> "allowed_bucket_algs": 22,
> >>> "profile": "unknown",
> >>> "optimal_tunables": 0,
> >>> "legacy_tunables": 0,
> >>> "minimum_required_version": 

Re: [ceph-users] HEALTH_WARN - 3 modules have failed dependencies

2019-06-15 Thread Harry G. Coin

Ubuntu ceph dashboard failure/regression still exists as of today.

root@nocsupport2:~# uname -a
Linux nocsupport2 5.0.0-16-generic #17-Ubuntu SMP Wed May 15 10:52:21 
UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

root@nocsupport2:~# date
Sat 15 Jun 2019 03:03:52 PM CDT
root@nocsupport2:~# ceph -s
cluster:
id: x
health: HEALTH_WARN
...

Module 'dashboard' has failed dependency: Interpreter change detected - 
this module can only be loaded into one interpreter per process.


...
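
Until the packaging issue is sorted out, one blunt workaround (a sketch
only, not verified against this exact setup, and it obviously trades the
dashboard away for a clean health status) is to disable whichever enabled
modules the warning names, since - per the docs quoted further down - the
check is only applied to enabled modules:

  ceph mgr module disable dashboard   # repeat for restful/status or whatever the warning names
  ceph health                         # the MGR_MODULE_DEPENDENCY warning should then clear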


On 5/1/19 11:07 AM, Ranjan Ghosh wrote:


Ah, after researching some more I think I got hit by this bug:

https://github.com/ceph/ceph/pull/25585

At least that's exactly what I see in the logs: "Interpreter change 
detected - this module can only be loaded into one interpreter per 
process."


Ceph modules don't seem to work at all with the newest Ubuntu version. 
Only one module can be loaded. Sad :-(


Hope this will be fixed soon...


On 4/30/19 9:18 PM, Ranjan Ghosh wrote:


Hi my beloved Ceph list,

After an upgrade from Ubuntu Cosmic to Ubuntu Disco (with the Ceph 
packages accordingly updated from 13.2.2 to 13.2.4), I now get this when 
I enter "ceph health":


HEALTH_WARN 3 modules have failed dependencies

"ceph mgr module ls" only reports those 3 modules enabled:

"enabled_modules": [
    "dashboard",
    "restful",
    "status"
    ],
...

Then I found this page here:

docs.ceph.com/docs/master/rados/operations/health-checks

Under "MGR_MODULE_DEPENDENCY" it says:

"An enabled manager module is failing its dependency check. This 
health check should come with an explanatory message from the module 
about the problem."


What is "this health check"? If the page talks about "ceph health" or 
"ceph -s" then, no, there is no explanatory message there on what's 
wrong.
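
For what it's worth, the check in question is called MGR_MODULE_DEPENDENCY,
and when it does carry a per-module message it should appear under "ceph
health detail", roughly like this (the format is guessed from the check name
and the error string further down, so treat it as an illustration rather
than verbatim output):

  ceph health detail
  HEALTH_WARN 3 modules have failed dependencies
  MGR_MODULE_DEPENDENCY 3 modules have failed dependencies
      Module 'dashboard' has failed dependency: Interpreter change detected -
      this module can only be loaded into one interpreter per process.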


Furthermore, it says:

"This health check is only applied to enabled modules. If a module is 
not enabled, you can see whether it is reporting dependency issues in 
the output of ceph module ls."


The command "ceph module ls", however, doesn't exist. If "ceph mgr 
module ls" is really meant, then I get this:


{
    "enabled_modules": [
    "dashboard",
    "restful",
    "status"
    ],
    "disabled_modules": [
    {
    "name": "balancer",
    "can_run": true,
    "error_string": ""
    },
    {
    "name": "hello",
    "can_run": false,
    "error_string": "Interpreter change detected - this 
module can only be loaded into one interpreter per process."

    },
    {
    "name": "influx",
    "can_run": false,
    "error_string": "Interpreter change detected - this 
module can only be loaded into one interpreter per process."

    },
    {
    "name": "iostat",
    "can_run": false,
    "error_string": "Interpreter change detected - this 
module can only be loaded into one interpreter per process."

    },
    {
    "name": "localpool",
    "can_run": false,
    "error_string": "Interpreter change detected - this 
module can only be loaded into one interpreter per process."

    },
    {
    "name": "prometheus",
    "can_run": false,
    "error_string": "Interpreter change detected - this 
module can only be loaded into one interpreter per process."

    },
    {
    "name": "selftest",
    "can_run": false,
    "error_string": "Interpreter change detected - this 
module can only be loaded into one interpreter per process."

    },
    {
    "name": "smart",
    "can_run": false,
    "error_string": "Interpreter change detected - this 
module can only be loaded into one interpreter per process."

    },
    {
    "name": "telegraf",
    "can_run": false,
    "error_string": "Interpreter change detected - this 
module can only be loaded into one interpreter per process."

    },
    {
    "name": "telemetry",
    "can_run": false,
    "error_string": "Interpreter change detected - this 
module can only be loaded into one interpreter per process."

    },
    {
    "name": "zabbix",
    "can_run": false,
    "error_string": "Interpreter change detected - this 
module can only be loaded into one interpreter per process."

    }
    ]
}

Usually the Ceph documentation is great, very detailed and helpful. 
But I can find nothing on how to resolve this problem. Any help is 
much appreciated.


Thank you / Best regards

Ranjan







___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

[ceph-users] RGW Blocking Behaviour on Inactive / Incomplete PG

2019-06-15 Thread Romit Misra
Hi,
 I wanted to understand the behaviour of RGW threads that are blocked on
requests for a PG which is currently in an INACTIVE state:

1. As long as the PG is inactive, the requests stay blocked.
2. Could the RGW threads use an event-based model: if a PG is inactive, put
the current request into a blocked queue and move on, similar to how nginx
handles requests?
3. Could the RGW threads time out if a request stays blocked beyond a
certain threshold?
4. Was blocking the RGW threads on an inactive PG a deliberate design
choice, or just the way this model happened to be implemented?
5. Are there any serialisation issues that could arise if an async model is
used?

The above questions are based on observations on Hammer; the motivation is
to increase the availability of the web-stack service when a small
percentage of PGs is down for an extended amount of time.
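
On the timeout question (3), there is a librados-level knob that can act as
a blunt workaround - a sketch only, and I have not verified its behaviour on
Hammer: "rados osd op timeout" (and "rados mon op timeout") make a blocked
RADOS operation return ETIMEDOUT instead of waiting forever, which releases
the RGW worker thread so it can fail the request. Note that it applies to
every op from that client, not only those hitting inactive PGs. The section
name and values below are illustrative:

  # ceph.conf on the radosgw host
  [client.rgw.gateway1]
      rados osd op timeout = 30    # seconds before an OSD op gives up
      rados mon op timeout = 30    # seconds before a mon op gives up
      rgw thread pool size = 512   # more workers so a few blocked ones hurt less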

Thanks
Romit
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com