[ceph-users] Re: Dashboard's website hangs during loading, no errors

2021-11-18 Thread Kai Börnert

Hi,

Do you have more nodes than deployed mgrs, and are you using cephadm?

If so, it might be that the node you are connecting to no longer has an
instance of the mgr running, and you are only getting some leftovers from
the browser cache.


At least this was happening in my test cluster, but I was always able to
find a node with the mgr running by just trying each of them in turn.


Greetings,

Kai

On 11/19/21 00:03, Zach Heise (SSCC) wrote:


Hello!

Our test cluster is a few months old. It was initially set up from
scratch with Pacific and has since had two small patches applied: 16.2.5
and then, a couple of weeks ago, 16.2.6. The issue I'm describing has
been present since the beginning.


We have an active and standby mgr daemon, and the dashboard module is
installed with SSL turned on. Self-signed certificates only, not
trusted by browsers, but I always just click "okay" through Chrome's and
Firefox's warnings about that.


I have noticed that every 2-3 days, in the morning when I start work,
our ceph dashboard page does not respond in the browser. It works fine
throughout the day, but it seems that after some unknown number of hours
without anyone accessing it (I'm the only one using the dashboard now
since it's just a test), something goes wrong with the dashboard module
or the mgr daemon. When I try to load the ceph dashboard site (or refresh
it when it's already loaded), the browser just shows the "throbber": no
content on the page ever appears, and there are no errors. None of the
pages behind the buttons load content, nor do they time out and show a
404. For example, Block\Images or Cluster\Hosts in the left sidebar will
load, but show empty. And the throbber never stops.


Confirmed that this happens in all browsers too.

I can easily fix it with ceph mgr module disable dashboard, waiting
10 seconds, and then ceph mgr module enable dashboard. This makes it
start working again until the next time I go a few days without using
the dashboard, at which point I need to repeat the same process.
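
A minimal sketch of scripting the workaround described above, assuming the ceph CLI is on the PATH of the host where it runs. The function name restart_dashboard and its parameters are hypothetical; only the two ceph commands and the 10-second pause come from the message.

```python
import subprocess
import time


def restart_dashboard(run=subprocess.run, sleep=time.sleep, wait_seconds=10):
    """Disable the dashboard mgr module, wait, then enable it again."""
    # Both commands are the ones quoted in the message above.
    run(["ceph", "mgr", "module", "disable", "dashboard"], check=True)
    # Pause between disable and enable, matching the 10 seconds described.
    sleep(wait_seconds)
    run(["ceph", "mgr", "module", "enable", "dashboard"], check=True)
```

Calling restart_dashboard() with no arguments runs the real commands. The run and sleep parameters exist only so the sequence can be checked without a cluster.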


Any ideas as to what could be causing this? I have already turned on
debug mode. When I'm in this hanging state, I check the cephadm logs
with cephadm logs --name mgr.ceph01.fblojp -- -f but there's nothing
obvious (to my untrained eyes at least). When the dashboard is
functional, I can see my own navigation around the dashboard in the
logs, so I know that logging is working:


Nov 01 15:46:32 ceph01.domain conmon[5814]: debug 
2021-11-01T20:46:32.601+ 7f7cbb42e700  0 [dashboard INFO request] 
[10.130.50.252:52267] [GET] [200] [0.013s] [admin] [1.0K] /api/summary


I already confirmed that the same thing happens regardless of whether
I'm using the default ports, http://ceph01.domain:8080 or
https://ceph01.domain:8443 (although, as mentioned, I usually use
self-signed SSL).


The dashboard is currently in this hanging state, so I am happy to try
to get logs.


Thanks,

-Zach


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



[ceph-users] Re: Dashboard's website hangs during loading, no errors

2021-11-19 Thread Ernesto Puerta
Hi Zach,

Thanks for the thorough description. We haven't noticed this issue so far
and have some long-running clusters, but let's try to debug it:

   - First of all, as Kai suggested, let's ensure we're hitting the active
   manager address (there's a redirection mechanism, but let's ensure it
   anyway): a "ceph mgr services" should give you the active Dashboard URL.
   - After that, my suggestion is to open the browser's Dev Tools
   (built into both Chrome and Firefox) and visit the Network tab. In
   there, you should be able to see a few network requests on hard reload
   (remember to keep CTRL+SHIFT pressed while clicking on the reload icon).
   You should see a few HTML, CSS and JS assets downloading.
   - Let's try to perform a "curl" from the CLI: "curl -k
   https://<host>:<port>". That should return the index HTML file.
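
As a rough sketch of the first step, the JSON printed by "ceph mgr services" can be parsed to extract the host and port to use for the browser and curl checks. The function name dashboard_endpoint is hypothetical, and the address in the usage note is a made-up documentation address, not one from this cluster.

```python
import json
from urllib.parse import urlsplit


def dashboard_endpoint(services_json):
    """Return (scheme, host, port) of the dashboard URL, or None if absent."""
    services = json.loads(services_json)
    url = services.get("dashboard")
    if url is None:
        # The dashboard module is not publishing a URL on the active mgr.
        return None
    parts = urlsplit(url)
    return parts.scheme, parts.hostname, parts.port
```

For example, dashboard_endpoint('{"dashboard": "https://192.0.2.10:8443/"}') yields the scheme, host and port of that URL. If the host differs from the one typed into the browser, the request may be going to a node that is not running the active mgr.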

Are you using a reverse proxy/cache that might be interfering with this?

Kind Regards,
Ernesto




[ceph-users] Re: Dashboard's website hangs during loading, no errors

2021-11-19 Thread Zach Heise (SSCC)

  
Thanks for writing, Ernesto.

1. Output of ceph mgr services:

   ceph mgr services
   {
       "dashboard": "https://144.92.190.200:8443/",
       "prometheus": "http://144.92.190.200:9283/"
   }

2. Network tab in dev tools: doing a reload just results in a GET ->
   DOMAIN: ceph01.ssc.wisc.edu:8443, file /
   - Nothing else comes up as the throbber throbs.
   - No assets are listed as being downloaded.

3. Similar result with curl: curl -k https://144.92.190.200:8443 just
   results in a blinking cursor. No errors, just hanging. If I try any
   other random port, curl (as expected) says "connection refused" and
   quits instantly.
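
The difference between a hanging port and a refused one, as observed with curl above, can be checked with a timeout so that the probe never blocks indefinitely. This is only a sketch: the function name probe, its parameters and the returned labels are hypothetical, and certificate verification is skipped in the same spirit as curl -k.

```python
import socket
import ssl


def probe(host, port, use_tls=True, timeout=5.0):
    """Return 'refused', 'hang', 'closed', or 'responding' for host:port."""
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except ConnectionRefusedError:
        return "refused"  # nothing is listening on this port
    except socket.timeout:
        return "hang"  # the TCP connect itself never completed
    try:
        if use_tls:
            # Skip certificate verification, since the cert is self-signed.
            ctx = ssl.create_default_context()
            ctx.check_hostname = False
            ctx.verify_mode = ssl.CERT_NONE
            sock = ctx.wrap_socket(sock, server_hostname=host)
        request = (b"GET / HTTP/1.1\r\nHost: " + host.encode()
                   + b"\r\nConnection: close\r\n\r\n")
        sock.sendall(request)
        first_byte = sock.recv(1)
    except socket.timeout:
        return "hang"  # connected, but no reply within the timeout
    finally:
        sock.close()
    return "responding" if first_byte else "closed"
```

A result of "hang" on the dashboard port together with "refused" on an unused port would match the behaviour described in item 3.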

Zach
  
  
  


[ceph-users] Re: Dashboard's website hangs during loading, no errors

2021-11-19 Thread Ernesto Puerta
Hi Zach,

I remember the CherryPy webserver (Cheroot 8.5.1) had a hellish
deadlock-like issue not that long ago, but that was already fixed in
8.5.2.

Could you please run the same curl command with the "-v" flag to get a
verbose output?

You can compare that with a sample output of a freezing Cherrypy server at
this tracker: https://tracker.ceph.com/issues/48973

BTW we also managed to speed up reproduction by using a benchmark tool like
Apache benchmark. You can find ready-to-use reproducer code here:
https://bugzilla.redhat.com/show_bug.cgi?id=1920461#c2
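
For illustration only, a small load generator in the spirit of a benchmark tool might look like the sketch below. This is not the reproducer from the Bugzilla comment linked above; the names fetch_status and hammer and all default values are hypothetical. It sends many concurrent GET requests with a timeout and tallies the outcomes, so a server that stops answering shows up as timeout-type errors instead of status codes.

```python
import ssl
import urllib.request
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def fetch_status(url, timeout=5.0):
    """Return the HTTP status as a string, or the exception class name."""
    # Skip certificate verification, as curl -k does for self-signed certs.
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    try:
        with urllib.request.urlopen(url, timeout=timeout, context=ctx) as resp:
            return str(resp.status)
    except Exception as exc:
        return type(exc).__name__


def hammer(url, total=200, concurrency=20, timeout=5.0):
    """Issue `total` GET requests, `concurrency` at a time, and count outcomes."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = pool.map(lambda _: fetch_status(url, timeout), range(total))
        return Counter(results)
```

Running hammer against the dashboard URL of a test cluster and watching for the count of non-200 outcomes to grow is one way to notice the point at which the server stops responding.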

Kind Regards,
Ernesto



[ceph-users] Re: Dashboard's website hangs during loading, no errors

2021-11-19 Thread Zach Heise (SSCC)

  
Spot on, Ernesto - my output looks basically identical:
curl -kv https://144.92.190.200:8443
* Rebuilt URL to: https://144.92.190.200:8443/
*   Trying 144.92.190.200...
* TCP_NODELAY set
* Connected to 144.92.190.200 (144.92.190.200) port 8443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: /etc/pki/tls/certs/ca-bundle.crt
  CApath: none
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.3 (OUT), TLS handshake, [no content] (0):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN, server did not agree to a protocol
* Server certificate:
*  subject: C=US; O=SSCC; CN=ceph01.ads.ssc.wisc.edu
*  start date: Sep 22 20:12:31 2021 GMT
*  expire date: Sep 22 20:12:31 2023 GMT
*  issuer: DC=edu; DC=wisc; DC=ssc; DC=ads; CN=SSCC CA
*  SSL certificate verify result: unable to get local issuer certificate (20), continuing anyway.
* TLSv1.3 (OUT), TLS app data, [no content] (0):
> GET / HTTP/1.1
> Host: 144.92.190.200:8443
> User-Agent: curl/7.61.1
> Accept: */*
> 
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
  


Zach
  
  



[ceph-users] Re: Dashboard's website hangs during loading, no errors

2021-11-26 Thread Zach Heise (SSCC)

Good afternoon Kai, I think I missed this email originally when you sent it.

Given how reliably this issue happens, I think it is unlikely to be
caused by the mgr daemon going down.


Ernesto - at this point, are there any other debug logs I can provide
that would give more detail? This isn't a mission-critical thing for me
since I can restart the dashboard module so easily, and I'd feel much
better if it turned out to be the same problem that the CherryPy
webserver had a few months ago.


Should I go ahead and open up a new ceph bug at tracker.ceph.com if one 
does not exist?


Zach




[ceph-users] Re: Dashboard's website hangs during loading, no errors

2021-11-29 Thread Ernesto Puerta
Hi Zach,

I just checked the Cheroot repo and there's no newer version (latest is
8.5.2), but that version should be in both 16.2.5 and 16.2.6
(python3-cheroot-8.5.2-1.el8.noarch), so I don't get why this started
happening with .6. Are you running vanilla Ceph containers (
quay.ceph.io/ceph-ci/ceph) or a custom build?

At this point, it'd be useful to follow up on a Ceph tracker issue, so
could you please open one?

Thanks.

Kind Regards,
Ernesto

