All that looks fine.

There must be some state where the cluster is known to calamari but it is 
failing to actually show it.

If you have time to debug, I would love to see the logs at debug level.
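
If it helps, here is roughly where to look. This is only a sketch that assumes a default package install, so exact paths and config keys may differ on your build:

sudo tail -f /var/log/calamari/cthulhu.log /var/log/calamari/calamari.log
# for debug-level output, set log_level = DEBUG in /etc/calamari/calamari.conf
# (assuming your build has that setting) and then restart the supervised services:
sudo supervisorctl restart all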

If you don’t, we could try cleaning out calamari’s state:
sudo supervisorctl shutdown
sudo service httpd stop
sudo calamari-ctl clear --yes-i-am-sure
sudo calamari-ctl initialize

then 
sudo service supervisord start
sudo service httpd start
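
While that is coming back up you could also watch the salt event bus on essperf3 to confirm the heartbeats are arriving at all. A rough sketch, assuming your salt version ships the state.event runner; the exact event tags vary by version:

sudo salt-run state.event pretty=True
# the minions should post periodic ceph/server and ceph/cluster/<fsid> style events;
# cthulhu reads this same bus, so if nothing shows up calamari has nothing to work with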

see what the API and UI say then.
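
For the API part something like this is enough. The credentials below are hypothetical, and the REST API normally wants an authenticated session, so if basic auth is not enabled on your build log in through the UI first and reuse that session cookie:

sudo supervisorctl status
# everything calamari runs under supervisor should show RUNNING
curl -s -u admin:admin http://essperf3/api/v2/cluster
# an empty list means cthulhu still has not registered the cluster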

regards,
Gregory 
> On May 12, 2015, at 5:18 PM, Bruce McFarland 
> <bruce.mcfarl...@taec.toshiba.com> wrote:
> 
> Master was ess68 and now it's essperf3. 
> 
> On all cluster nodes the following files now have 'master: essperf3'
> /etc/salt/minion 
> /etc/salt/minion/calamari.conf 
> /etc/diamond/diamond.conf
> 
> The 'salt \* ceph.get_heartbeats' is being run on essperf3 - here's a 'salt \* 
> test.ping' from essperf3 Calamari Master to the cluster. I've also included a 
> quick cluster sanity test with the output of ceph -s and ceph osd tree. And 
> for your reading pleasure the output of 'salt octeon109 ceph.get_heartbeats' 
> since I suspect there might be a missing field in the monitor response. 
> 
> root@essperf3:/etc/ceph# salt \* test.ping
> octeon108:
>    True
> octeon114:
>    True
> octeon111:
>    True
> octeon101:
>    True
> octeon106:
>    True
> octeon109:
>    True
> octeon118:
>    True
> root@essperf3:/etc/ceph# ceph osd tree
> # id  weight  type name       up/down reweight
> -1    7       root default
> -4    1               host octeon108
> 0     1                       osd.0   up      1       
> -2    1               host octeon111
> 1     1                       osd.1   up      1       
> -5    1               host octeon115
> 2     1                       osd.2   DNE             
> -6    1               host octeon118
> 3     1                       osd.3   up      1       
> -7    1               host octeon114
> 4     1                       osd.4   up      1       
> -8    1               host octeon106
> 5     1                       osd.5   up      1       
> -9    1               host octeon101
> 6     1                       osd.6   up      1       
> root@essperf3:/etc/ceph# ceph -s 
>    cluster 868bfacc-e492-11e4-89fa-000fb711110c
>     health HEALTH_OK
>     monmap e1: 1 mons at {octeon109=209.243.160.70:6789/0}, election epoch 1, 
> quorum 0 octeon109
>     osdmap e80: 6 osds: 6 up, 6 in
>      pgmap v26765: 728 pgs, 2 pools, 20070 MB data, 15003 objects
>            60604 MB used, 2734 GB / 2793 GB avail
>                 728 active+clean
> root@essperf3:/etc/ceph#
> 
> root@essperf3:/etc/ceph# salt octeon109 ceph.get_heartbeats
> octeon109:
>    ----------
>    - boot_time:
>        1430784431
>    - ceph_version:
>        0.80.8-0.el6
>    - services:
>        ----------
>        ceph-mon.octeon109:
>            ----------
>            cluster:
>                ceph
>            fsid:
>                868bfacc-e492-11e4-89fa-000fb711110c
>            id:
>                octeon109
>            status:
>                ----------
>                election_epoch:
>                    1
>                extra_probe_peers:
>                monmap:
>                    ----------
>                    created:
>                        2015-04-16 23:50:52.412686
>                    epoch:
>                        1
>                    fsid:
>                        868bfacc-e492-11e4-89fa-000fb711110c
>                    modified:
>                        2015-04-16 23:50:52.412686
>                    mons:
>                        ----------
>                        - addr:
>                            209.243.160.70:6789/0
>                        - name:
>                            octeon109
>                        - rank:
>                            0
>                name:
>                    octeon109
>                outside_quorum:
>                quorum:
>                    - 0
>                rank:
>                    0
>                state:
>                    leader
>                sync_provider:
>            type:
>                mon
>            version:
>                0.86
>    ----------
>    - 868bfacc-e492-11e4-89fa-000fb711110c:
>        ----------
>        fsid:
>            868bfacc-e492-11e4-89fa-000fb711110c
>        name:
>            ceph
>        versions:
>            ----------
>            config:
>                87f175c60e5c7ec06c263c556056fbcb
>            health:
>                a907d0ec395713369b4843381ec31bc2
>            mds_map:
>                1
>            mon_map:
>                1
>            mon_status:
>                1
>            osd_map:
>                80
>            pg_summary:
>                7e29d7cc93cfced8f3f146cc78f5682f
> root@essperf3:/etc/ceph#
> 
> 
> 
>> -----Original Message-----
>> From: Gregory Meno [mailto:gm...@redhat.com]
>> Sent: Tuesday, May 12, 2015 5:03 PM
>> To: Bruce McFarland
>> Cc: ceph-calam...@lists.ceph.com; ceph-us...@ceph.com; ceph-devel
>> (ceph-de...@vger.kernel.org)
>> Subject: Re: [ceph-calamari] Does anyone understand Calamari??
>> 
>> Bruce,
>> 
>> It is great to hear that salt is reporting status from all the nodes in the
>> cluster.
>> 
>> Let me see if I understand your question:
>> 
>> You want to know what conditions cause us to recognize a working cluster?
>> 
>> see
>> https://github.com/ceph/calamari/blob/master/cthulhu/cthulhu/manager/manager.py#L135
>> 
>> https://github.com/ceph/calamari/blob/master/cthulhu/cthulhu/manager/manager.py#L349
>> 
>> and
>> 
>> https://github.com/ceph/calamari/blob/master/cthulhu/cthulhu/manager/cluster_monitor.py
>> 
>> 
>> Let’s check whether you need to be digging into that level of detail:
>> 
>> You switched to a new instance of calamari and it is not recognizing the
>> cluster.
>> 
>> You want to know what you are overlooking? Would you please clarify with
>> some hostnames?
>> 
>> i.e. let’s say that your old calamari node was called calamariA and that your
>> new node is calamariB
>> 
>> From which node are you running the get_heartbeats?
>> 
>> What is the master setting in the minion config files out on the nodes of the
>> cluster? If things are set up correctly it would look like this:
>> 
>> [root@node1 shadow_man]# cat /etc/salt/minion.d/calamari.conf
>> master: calamariB
>> 
>> 
>> If that is the case, the thing I would check is whether the
>> http://calamariB/api/v2/cluster endpoint is reporting anything.
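>> 
>> Something like this, say. The credentials are hypothetical and the API normally
>> wants an authenticated session, so basic auth here is an assumption about your
>> setup:
>> 
>> curl -s -u admin:admin http://calamariB/api/v2/cluster
>> curl -s -u admin:admin http://calamariB/api/v2/server
>> 
>> If /api/v2/server lists your nodes but /api/v2/cluster comes back empty, the
>> heartbeats are arriving and it is only the cluster detection that is stuck.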
>> 
>> hope this helps,
>> Gregory
>> 
>>> On May 12, 2015, at 4:34 PM, Bruce McFarland
>>> <bruce.mcfarl...@taec.toshiba.com> wrote:
>>> 
>>> Increasing the audience since ceph-calamari is not responsive. What salt
>>> event/info does the Calamari Master expect to see from the ceph-mon to
>>> determine there is a working cluster? I had to change servers hosting the
>>> calamari master and can’t get the new machine to recognize the cluster.
>>> The ‘salt \* ceph.get_heartbeats’ returns monmap, fsid, ver, epoch, etc. for
>>> the monitor and all of the osd’s. Can anyone point me to docs or code that
>>> might enlighten me to what I’m overlooking? Thanks.
>>> _______________________________________________
>>> ceph-calamari mailing list
>>> ceph-calam...@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-calamari-ceph.com
> 

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
