[ceph-users] Re: Issue upgrading 17.2.0 to 17.2.5

2023-03-07 Thread aellahib
Your advice about setting the container images manually did lead me to check 
the cephadm config to see what the other nodes were set to, and I did see 
"stop" and "17.2.5" set for certain nodes and OSDs. As soon as I pointed all 
of them at the right image, my logs started showing real data and I can 
deploy and configure nodes again.

Thank you very much for your help! I will attempt the upgrade again soon.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Issue upgrading 17.2.0 to 17.2.5

2023-03-07 Thread aellahib
Hey David, yes it's me. Thank you for your help, by the way.
I was waiting on my acceptance to the Ceph tracker website. It seems I am in, 
so I will submit a report soon, but I haven't been able to reproduce the 
issue, so I am not sure I can provide relevant info for it.
I already ran that orch upgrade stop command multiple times; the new value I 
am getting is not a "stop" image but rather "17.2.5" with some additional 
fields, as I posted above. Very strange.


[ceph-users] Re: Issue upgrading 17.2.0 to 17.2.5

2023-03-07 Thread Adam King
> Current cluster status says healthy but I cannot deploy new daemons, the
> mgr information isn't refreshing (5 days old info) under hosts and services,
> but the main dashboard is accurate, like ceph -s.
> ceph -s will show accurate information, but things like ceph orch ps
> --daemon-type mgr will say that I have 5 MGRs running, which is inaccurate,
> nor will it let me remove them manually, as it says they're not found.
Can you try a mgr failover (ceph mgr fail), wait ~5 minutes, and then see
what actually gets refreshed (as in, check the REFRESHED column in "ceph
orch ps" and "ceph orch device ls")? Typically, when it's "stuck" like this
and not refreshing, there is an issue blocking the refresh on one specific
host, so it would be good to see whether most hosts refresh and the refresh
fails only on a specific host (or hosts).
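In command form, that check is roughly (a sketch; the 5-minute wait is approximate):

```shell
ceph mgr fail          # force failover to a standby mgr
sleep 300              # give the new active mgr ~5 minutes to refresh
ceph orch ps           # check the REFRESHED column per daemon
ceph orch device ls    # check the REFRESHED column per host/device
```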

> osd.11  basic  container_image  stop
>
> osd.47  basic  container_image  17.2.5  *
>
> osd.49  basic  container_image  17.2.5  *

That looks bad. It might be worth trying just a "ceph config set osd
container_image
quay.io/ceph/ceph@sha256:2b73ccc9816e0a1ee1dfbe21ba9a8cc085210f1220f597b5050ebfcac4bdd346"
to get all the OSD config options onto a valid image. With those options as
they are, cephadm will try to use the image "stop" or "17.2.5" when
redeploying or upgrading those OSDs.
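A sketch of that cleanup, assuming the per-daemon overrides seen in the dump (osd.11, osd.47, osd.49) should simply be dropped in favor of the section-level setting; ceph config rm removes an override:

```shell
# Point the osd section at a known-good image (digest taken from the working entries)
ceph config set osd container_image \
  quay.io/ceph/ceph@sha256:2b73ccc9816e0a1ee1dfbe21ba9a8cc085210f1220f597b5050ebfcac4bdd346

# Drop the per-daemon overrides that were set to "stop" / "17.2.5";
# per-daemon values take precedence over the section-level one
for d in osd.11 osd.47 osd.49; do
    ceph config rm "$d" container_image
done

# Verify nothing bogus remains
ceph config dump | grep container_image
```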

On Tue, Mar 7, 2023 at 11:40 AM  wrote:

> Hello, at this point I've tried to upgrade a few times, so I believe the
> command is long gone. On another forum someone was alluding that I had
> accidentally set the image to "stop" instead of running a proper upgrade
> stop command, but I couldn't find anything like that on the hosts I ran
> commands from, though I wouldn't be surprised if I accidentally pasted and
> then wrote additional commands onto it.
>
> The failing OSD was interesting: ceph didn't report it as a stray daemon,
> but I noticed it was showing as a daemon and not as an actual OSD for
> storage in ceph, so I attempted to remove it, and it would eventually come
> back.
>
> It had upgraded all the managers and mons to 17.2.5. Some OSDs had upgraded
> as well.
> Current cluster status says healthy but I cannot deploy new daemons, the
> mgr information isn't refreshing (5 days old info) under hosts and services,
> but the main dashboard is accurate, like ceph -s.
> ceph -s will show accurate information, but things like ceph orch ps
> --daemon-type mgr will say that I have 5 MGRs running, which is inaccurate,
> nor will it let me remove them manually, as it says they're not found.
>
> ERROR: Failed command: /usr/bin/docker pull 17.2.5
> 2023-03-06T09:26:55.925386-0700 mgr.mgr.idvkbw [DBG] serve loop sleep
> 2023-03-06T09:26:55.925507-0700 mgr.mgr.idvkbw [DBG] Sleeping for 60
> seconds
> 2023-03-06T09:27:55.925847-0700 mgr.mgr.idvkbw [DBG] serve loop wake
> 2023-03-06T09:27:55.925959-0700 mgr.mgr.idvkbw [DBG] serve loop start
> 2023-03-06T09:27:55.929849-0700 mgr.mgr.idvkbw [DBG] mon_command: 'config
> dump' -> 0 in 0.004s
> 2023-03-06T09:27:55.931625-0700 mgr.mgr.idvkbw [DBG] _run_cephadm :
> command = pull
> 2023-03-06T09:27:55.932025-0700 mgr.mgr.idvkbw [DBG] _run_cephadm : args =
> []
> 2023-03-06T09:27:55.932469-0700 mgr.mgr.idvkbw [DBG] args: --image 17.2.5
> --no-container-init pull
> 2023-03-06T09:27:55.932925-0700 mgr.mgr.idvkbw [DBG] Running command:
> which python3
> 2023-03-06T09:27:55.968793-0700 mgr.mgr.idvkbw [DBG] Running command:
> /usr/bin/python3
> /var/lib/ceph/5058e342-dac7-11ec-ada3-01065e90228d/cephadm.059bfc99f5cf36ed881f2494b104711faf4cbf5fc86a9594423cc105cafd9b4e
> --image 17.2.5 --no-container-init pull
> 2023-03-06T09:27:57.278932-0700 mgr.mgr.idvkbw [DBG] code: 1
> 2023-03-06T09:27:57.279045-0700 mgr.mgr.idvkbw [DBG] err: Pulling
> container image 17.2.5...
> Non-zero exit code 1 from /usr/bin/docker pull 17.2.5
> /usr/bin/docker: stdout Using default tag: latest
> /usr/bin/docker: stderr Error response from daemon: pull access denied for
> 17.2.5, repository does not exist or may require 'docker login': denied:
> requested access to the resource is denied
> ERROR: Failed command: /usr/bin/docker pull 17.2.5
>
> 2023-03-06T09:27:57.280517-0700 mgr.mgr.idvkbw [DBG] serve loop
>
> I had stopped the upgrade before, so it's at:
> neteng@mon:~$ ceph orch upgrade status
> {
> "target_image": null,
> "in_progress": false,
> "which": "",
> "services_complete": [],
> "progress": null,
> "message": "",
> "is_paused": false
> }

[ceph-users] Re: Issue upgrading 17.2.0 to 17.2.5

2023-03-07 Thread aellahib
This is the output:

{
"target_image": null,
"in_progress": false,
"which": "",
"services_complete": [],
"progress": null,
"message": "",
"is_paused": false
}


ceph config dump | grep image
global  basic  container_image  quay.io/ceph/ceph@sha256:12a0a4f43413fd97a14a3d47a3451b2d2df50020835bb93db666209f3f77617a  *
mon     basic  container_image  quay.io/ceph/ceph@sha256:2b73ccc9816e0a1ee1dfbe21ba9a8cc085210f1220f597b5050ebfcac4bdd346  *
mgr     basic  container_image  quay.io/ceph/ceph@sha256:2b73ccc9816e0a1ee1dfbe21ba9a8cc085210f1220f597b5050ebfcac4bdd346  *
osd.0   basic  container_image  quay.io/ceph/ceph@sha256:2b73ccc9816e0a1ee1dfbe21ba9a8cc085210f1220f597b5050ebfcac4bdd346  *
osd.1   basic  container_image  quay.io/ceph/ceph@sha256:2b73ccc9816e0a1ee1dfbe21ba9a8cc085210f1220f597b5050ebfcac4bdd346  *
osd.11  basic  container_image  stop  *
osd.16  basic  container_image  quay.io/ceph/ceph@sha256:2b73ccc9816e0a1ee1dfbe21ba9a8cc085210f1220f597b5050ebfcac4bdd346  *
osd.17  basic  container_image  quay.io/ceph/ceph@sha256:2b73ccc9816e0a1ee1dfbe21ba9a8cc085210f1220f597b5050ebfcac4bdd346  *
osd.2   basic  container_image  quay.io/ceph/ceph@sha256:2b73ccc9816e0a1ee1dfbe21ba9a8cc085210f1220f597b5050ebfcac4bdd346  *
osd.25  basic  container_image  quay.io/ceph/ceph@sha256:2b73ccc9816e0a1ee1dfbe21ba9a8cc085210f1220f597b5050ebfcac4bdd346  *
osd.3   basic  container_image  quay.io/ceph/ceph@sha256:2b73ccc9816e0a1ee1dfbe21ba9a8cc085210f1220f597b5050ebfcac4bdd346  *
osd.34  basic  container_image  quay.io/ceph/ceph@sha256:2b73ccc9816e0a1ee1dfbe21ba9a8cc085210f1220f597b5050ebfcac4bdd346  *
osd.35  basic  container_image  quay.io/ceph/ceph@sha256:2b73ccc9816e0a1ee1dfbe21ba9a8cc085210f1220f597b5050ebfcac4bdd346  *
osd.37  basic  container_image  quay.io/ceph/ceph@sha256:2b73ccc9816e0a1ee1dfbe21ba9a8cc085210f1220f597b5050ebfcac4bdd346  *
osd.38  basic  container_image  quay.io/ceph/ceph@sha256:2b73ccc9816e0a1ee1dfbe21ba9a8cc085210f1220f597b5050ebfcac4bdd346  *
osd.39  basic  container_image  quay.io/ceph/ceph@sha256:2b73ccc9816e0a1ee1dfbe21ba9a8cc085210f1220f597b5050ebfcac4bdd346  *
osd.40  basic  container_image  quay.io/ceph/ceph@sha256:2b73ccc9816e0a1ee1dfbe21ba9a8cc085210f1220f597b5050ebfcac4bdd346  *
osd.42  basic  container_image  quay.io/ceph/ceph@sha256:2b73ccc9816e0a1ee1dfbe21ba9a8cc085210f1220f597b5050ebfcac4bdd346  *
osd.43  basic  container_image  quay.io/ceph/ceph@sha256:2b73ccc9816e0a1ee1dfbe21ba9a8cc085210f1220f597b5050ebfcac4bdd346  *
osd.44  basic  container_image  quay.io/ceph/ceph@sha256:2b73ccc9816e0a1ee1dfbe21ba9a8cc085210f1220f597b5050ebfcac4bdd346  *
osd.45  basic  container_image  quay.io/ceph/ceph@sha256:2b73ccc9816e0a1ee1dfbe21ba9a8cc085210f1220f597b5050ebfcac4bdd346  *
osd.46  basic  container_image  quay.io/ceph/ceph@sha256:2b73ccc9816e0a1ee1dfbe21ba9a8cc085210f1220f597b5050ebfcac4bdd346  *
osd.47  basic  container_image  17.2.5  *
osd.49  basic  container_image
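With a dump this long, the bad entries are easy to miss; a small filter can flag any container_image value that is not a registry reference. This is a sketch against illustrative sample lines (the flag_bad_images helper name and the shortened digest are made up; the logic mirrors the dump format above):

```shell
# Print WHO and VALUE for container_image entries whose value has no "/",
# i.e. cannot be a registry reference like quay.io/ceph/ceph@sha256:...
flag_bad_images() {
    awk '$3 == "container_image" && $4 !~ /\// {print $1, $4}'
}

# Illustrative sample, shortened from the dump above:
printf '%s\n' \
  'osd.0   basic container_image quay.io/ceph/ceph@sha256:2b73ccc98 *' \
  'osd.11  basic container_image stop *' \
  'osd.47  basic container_image 17.2.5 *' |
flag_bad_images
# Prints:
# osd.11 stop
# osd.47 17.2.5
```

On a live cluster the sample lines would be replaced by `ceph config dump | flag_bad_images`.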

[ceph-users] Re: Issue upgrading 17.2.0 to 17.2.5

2023-03-07 Thread aellahib
Hello, at this point I've tried to upgrade a few times, so I believe the 
command is long gone. On another forum someone was alluding that I had 
accidentally set the image to "stop" instead of running a proper upgrade 
stop command, but I couldn't find anything like that on the hosts I ran 
commands from, though I wouldn't be surprised if I accidentally pasted and 
then wrote additional commands onto it.

The failing OSD was interesting: ceph didn't report it as a stray daemon, but 
I noticed it was showing as a daemon and not as an actual OSD for storage in 
ceph, so I attempted to remove it, and it would eventually come back.

It had upgraded all the managers and mons to 17.2.5. Some OSDs had upgraded 
as well.
Current cluster status says healthy but I cannot deploy new daemons, the mgr 
information isn't refreshing (5 days old info) under hosts and services, but 
the main dashboard is accurate, like ceph -s.
ceph -s will show accurate information, but things like ceph orch ps 
--daemon-type mgr will say that I have 5 MGRs running, which is inaccurate, 
nor will it let me remove them manually, as it says they're not found.

ERROR: Failed command: /usr/bin/docker pull 17.2.5
2023-03-06T09:26:55.925386-0700 mgr.mgr.idvkbw [DBG] serve loop sleep
2023-03-06T09:26:55.925507-0700 mgr.mgr.idvkbw [DBG] Sleeping for 60 seconds
2023-03-06T09:27:55.925847-0700 mgr.mgr.idvkbw [DBG] serve loop wake
2023-03-06T09:27:55.925959-0700 mgr.mgr.idvkbw [DBG] serve loop start
2023-03-06T09:27:55.929849-0700 mgr.mgr.idvkbw [DBG] mon_command: 'config dump' 
-> 0 in 0.004s
2023-03-06T09:27:55.931625-0700 mgr.mgr.idvkbw [DBG] _run_cephadm : command = 
pull
2023-03-06T09:27:55.932025-0700 mgr.mgr.idvkbw [DBG] _run_cephadm : args = []
2023-03-06T09:27:55.932469-0700 mgr.mgr.idvkbw [DBG] args: --image 17.2.5 
--no-container-init pull
2023-03-06T09:27:55.932925-0700 mgr.mgr.idvkbw [DBG] Running command: which 
python3
2023-03-06T09:27:55.968793-0700 mgr.mgr.idvkbw [DBG] Running command: 
/usr/bin/python3 
/var/lib/ceph/5058e342-dac7-11ec-ada3-01065e90228d/cephadm.059bfc99f5cf36ed881f2494b104711faf4cbf5fc86a9594423cc105cafd9b4e
 --image 17.2.5 --no-container-init pull
2023-03-06T09:27:57.278932-0700 mgr.mgr.idvkbw [DBG] code: 1
2023-03-06T09:27:57.279045-0700 mgr.mgr.idvkbw [DBG] err: Pulling container 
image 17.2.5...
Non-zero exit code 1 from /usr/bin/docker pull 17.2.5
/usr/bin/docker: stdout Using default tag: latest
/usr/bin/docker: stderr Error response from daemon: pull access denied for 
17.2.5, repository does not exist or may require 'docker login': denied: 
requested access to the resource is denied
ERROR: Failed command: /usr/bin/docker pull 17.2.5

2023-03-06T09:27:57.280517-0700 mgr.mgr.idvkbw [DBG] serve loop

I had stopped the upgrade before, so it's at:
neteng@mon:~$ ceph orch upgrade status
{
"target_image": null,
"in_progress": false,
"which": "",
"services_complete": [],
"progress": null,
"message": "",
"is_paused": false
}


[ceph-users] Re: Issue upgrading 17.2.0 to 17.2.5

2023-03-06 Thread David Orman
I've seen what appears to be the same post on Reddit previously, and attempted 
to assist. My suspicion is that a "stop" command was passed to ceph orch 
upgrade in an attempt to stop it, but with the --image flag preceding it, 
setting the image to "stop". I asked the user to do an actual upgrade stop, 
then re-attempt specifying a different image, and the user indicated the 
"stop" image pull attempts continued. That part didn't seem right, so I 
suggested a bug report.

https://www.reddit.com/r/ceph/comments/11g3rze/anyone_having_pull_issues_with_ceph_images/

@OP - are you the same poster as the above, or do you just have the same 
problem? If there are multiple users with this, it would indicate something 
larger than just a misplaced option/flag/command. If it is you - could you 
link to the bug report?

Just to make sure, you've issued:

"ceph orch upgrade stop"

Then performed another "ceph orch upgrade start" specifying a --ceph-version or 
--image?
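For reference, that sequence as commands (a sketch; the 17.2.5 target is the one from this thread, and either --ceph-version or --image can specify it):

```shell
# Clear whatever upgrade target cephadm currently holds
ceph orch upgrade stop

# Start again with an explicit, valid target
ceph orch upgrade start --ceph-version 17.2.5
# ...or pin an exact image instead:
# ceph orch upgrade start --image quay.io/ceph/ceph:v17.2.5

# Confirm the target took
ceph orch upgrade status
```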

I'll also echo Adam's request for a "ceph config dump | grep image". It sounds 
like it's still set to "stop", but I'd have expected the above to initiate an 
upgrade to the correct image. If not, continuing with the bug report would be 
helpful so it can be fixed.

David

On Mon, Mar 6, 2023, at 15:02, Adam King wrote:
> Can I see the output of `ceph orch upgrade status` and `ceph config dump |
> grep image`? The "Pulling container image stop" implies somehow (as Eugen
> pointed out) that cephadm thinks the image to pull is named "stop" which
> means it is likely set as either the image to upgrade to or as one of the
> config options.
>
> On Sat, Mar 4, 2023 at 2:06 AM  wrote:
>
>> I initially ran the upgrade fine but it failed at around 40/100 on an OSD,
>> so after waiting for a long time I thought I'd try restarting it and then
>> restarting the upgrade.
>> I am stuck with the debug error below. I have tested docker pull from
>> other servers and they don't fail for the ceph images, but on ceph it does.
>> If I even try to redeploy or add or remove mon daemons, for example, it
>> comes up with the same error related to the images.
>>
>> The error that ceph is giving me is:
>> 2023-03-02T07:22:45.063976-0700 mgr.mgr-node.idvkbw [DBG] _run_cephadm :
>> args = []
>> 2023-03-02T07:22:45.070342-0700 mgr.mgr-node.idvkbw [DBG] args: --image
>> stop --no-container-init pull
>> 2023-03-02T07:22:45.081086-0700 mgr.mgr-node.idvkbw [DBG] Running command:
>> which python3
>> 2023-03-02T07:22:45.180052-0700 mgr.mgr-node.idvkbw [DBG] Running command:
>> /usr/bin/python3
>> /var/lib/ceph/5058e342-dac7-11ec-ada3-01065e90228d/cephadm.059bfc99f5cf36ed881f2494b104711faf4cbf5fc86a9594423cc105cafd9b4e
>> --image stop --no-container-init pull
>> 2023-03-02T07:22:46.500561-0700 mgr.mgr-node.idvkbw [DBG] code: 1
>> 2023-03-02T07:22:46.500787-0700 mgr.mgr-node.idvkbw [DBG] err: Pulling
>> container image stop...
>> Non-zero exit code 1 from /usr/bin/docker pull stop
>> /usr/bin/docker: stdout Using default tag: latest
>> /usr/bin/docker: stderr Error response from daemon: pull access denied for
>> stop, repository does not exist or may require 'docker login': denied:
>> requested access to the resource is denied
>> ERROR: Failed command: /usr/bin/docker pull stop


[ceph-users] Re: Issue upgrading 17.2.0 to 17.2.5

2023-03-06 Thread Adam King
Can I see the output of `ceph orch upgrade status` and `ceph config dump |
grep image`? The "Pulling container image stop" implies somehow (as Eugen
pointed out) that cephadm thinks the image to pull is named "stop" which
means it is likely set as either the image to upgrade to or as one of the
config options.

On Sat, Mar 4, 2023 at 2:06 AM  wrote:

> I initially ran the upgrade fine but it failed at around 40/100 on an OSD,
> so after waiting for a long time I thought I'd try restarting it and then
> restarting the upgrade.
> I am stuck with the debug error below. I have tested docker pull from
> other servers and they don't fail for the ceph images, but on ceph it does.
> If I even try to redeploy or add or remove mon daemons, for example, it
> comes up with the same error related to the images.
>
> The error that ceph is giving me is:
> 2023-03-02T07:22:45.063976-0700 mgr.mgr-node.idvkbw [DBG] _run_cephadm :
> args = []
> 2023-03-02T07:22:45.070342-0700 mgr.mgr-node.idvkbw [DBG] args: --image
> stop --no-container-init pull
> 2023-03-02T07:22:45.081086-0700 mgr.mgr-node.idvkbw [DBG] Running command:
> which python3
> 2023-03-02T07:22:45.180052-0700 mgr.mgr-node.idvkbw [DBG] Running command:
> /usr/bin/python3
> /var/lib/ceph/5058e342-dac7-11ec-ada3-01065e90228d/cephadm.059bfc99f5cf36ed881f2494b104711faf4cbf5fc86a9594423cc105cafd9b4e
> --image stop --no-container-init pull
> 2023-03-02T07:22:46.500561-0700 mgr.mgr-node.idvkbw [DBG] code: 1
> 2023-03-02T07:22:46.500787-0700 mgr.mgr-node.idvkbw [DBG] err: Pulling
> container image stop...
> Non-zero exit code 1 from /usr/bin/docker pull stop
> /usr/bin/docker: stdout Using default tag: latest
> /usr/bin/docker: stderr Error response from daemon: pull access denied for
> stop, repository does not exist or may require 'docker login': denied:
> requested access to the resource is denied
> ERROR: Failed command: /usr/bin/docker pull stop


[ceph-users] Re: Issue upgrading 17.2.0 to 17.2.5

2023-03-05 Thread Eugen Block

Hi,

can you paste the exact command of your upgrade attempt? It looks like 
"stop" is supposed to be the image name. An upgrade usually starts with the 
MGRs, then the MONs and then the OSDs; does 'ceph versions' reflect that 
some of the OSDs were upgraded successfully? Do you have logs from the 
failing OSDs? For example, the cephadm.log on the host where an OSD upgrade 
failed, and the log of the MGR active at the time, could help figure this 
out.

Also, what's the current ceph status? And please add 'ceph orch upgrade status'.

Regards
Eugen

Zitat von aella...@gmail.com:

I initially ran the upgrade fine but it failed at around 40/100 on an OSD, 
so after waiting for a long time I thought I'd try restarting it and then 
restarting the upgrade.
I am stuck with the debug error below. I have tested docker pull from other 
servers and they don't fail for the ceph images, but on ceph it does. If I 
even try to redeploy or add or remove mon daemons, for example, it comes up 
with the same error related to the images.


The error that ceph is giving me is:
2023-03-02T07:22:45.063976-0700 mgr.mgr-node.idvkbw [DBG] _run_cephadm : args = []
2023-03-02T07:22:45.070342-0700 mgr.mgr-node.idvkbw [DBG] args: --image stop --no-container-init pull
2023-03-02T07:22:45.081086-0700 mgr.mgr-node.idvkbw [DBG] Running command: which python3
2023-03-02T07:22:45.180052-0700 mgr.mgr-node.idvkbw [DBG] Running command: /usr/bin/python3 /var/lib/ceph/5058e342-dac7-11ec-ada3-01065e90228d/cephadm.059bfc99f5cf36ed881f2494b104711faf4cbf5fc86a9594423cc105cafd9b4e --image stop --no-container-init pull
2023-03-02T07:22:46.500561-0700 mgr.mgr-node.idvkbw [DBG] code: 1
2023-03-02T07:22:46.500787-0700 mgr.mgr-node.idvkbw [DBG] err: Pulling container image stop...
Non-zero exit code 1 from /usr/bin/docker pull stop
/usr/bin/docker: stdout Using default tag: latest
/usr/bin/docker: stderr Error response from daemon: pull access denied for stop, repository does not exist or may require 'docker login': denied: requested access to the resource is denied
ERROR: Failed command: /usr/bin/docker pull stop


