[ceph-users] Re: Issue upgrading 17.2.0 to 17.2.5
Your advice about setting the container images manually led me to check the cephadm config to see what the other nodes were set to, and I did see "stop" and "17.2.5" set for certain nodes and OSDs. As soon as I pointed all of them at the right image, my logs started showing real data and I can deploy and configure nodes again. Thank you very much for your help! I will attempt the upgrade again soon.

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
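For anyone hitting the same thing, the cleanup amounts to removing the bogus per-daemon container_image overrides and pointing the OSD-wide setting at a valid image. A sketch based on this thread's config dump (the OSD IDs and digest are the ones shown later in the thread; substitute whatever "ceph config dump | grep image" shows on your cluster):

```shell
# Point the osd-wide setting at a valid image (digest taken from this
# cluster's mon/mgr entries in the config dump below).
ceph config set osd container_image \
  quay.io/ceph/ceph@sha256:2b73ccc9816e0a1ee1dfbe21ba9a8cc085210f1220f597b5050ebfcac4bdd346

# Drop the per-daemon overrides that were set to "stop" / "17.2.5".
for osd in osd.11 osd.47 osd.49; do
  ceph config rm "$osd" container_image
done

# Verify nothing still points at a non-image value.
ceph config dump | grep container_image
```

These commands need a live cluster with admin keyring access, so they are shown here only as a sketch of the fix described above.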
[ceph-users] Re: Issue upgrading 17.2.0 to 17.2.5
Hey David, yes, it's me. Thank you for your help, by the way. I was waiting on my acceptance to the Ceph tracker website; it seems that has come through, so I will submit a report soon, but I haven't been able to reproduce the issue, so I'm not sure I can provide relevant info for it. I already ran that orch upgrade stop command multiple times; the return I'm getting now is not a "stop" image but rather "17.2.5" with some additional fields, as I posted above. Very strange.
[ceph-users] Re: Issue upgrading 17.2.0 to 17.2.5
> Current cluster status says healthy, but I cannot deploy new daemons. The
> mgr information isn't refreshing (5-day-old info) under hosts and services,
> but the main dashboard is accurate, as is ceph -s.
> ceph -s will show accurate information, but things like ceph orch ps
> --daemon-type mgr will say that I have 5 MGRs running, which is inaccurate,
> nor will it let me remove them manually, as it says they're not found.

Can you try a mgr failover (ceph mgr fail), wait ~5 minutes and then see what actually gets refreshed (as in, check the REFRESHED column in "ceph orch ps" and "ceph orch device ls")? Typically when it's having issues like this where it's "stuck" and not refreshing, there is an issue blocking the refresh on one specific host, so it would be good to see whether most hosts refresh and there are only specific host(s) where the refresh doesn't occur.

> osd.11  basic  container_image  stop
> osd.47  basic  container_image  17.2.5  *
> osd.49  basic  container_image  17.2.5  *

That looks bad. Might be worth trying just a

  ceph config set osd container_image quay.io/ceph/ceph@sha256:2b73ccc9816e0a1ee1dfbe21ba9a8cc085210f1220f597b5050ebfcac4bdd346

to get all the osd config options onto a valid image. With those options set, it will try to use the image "stop" or "17.2.5" when redeploying or upgrading those OSDs.

On Tue, Mar 7, 2023 at 11:40 AM wrote:

> Hello, at this point I've tried to upgrade a few times, so I believe the
> command is long gone. On another forum someone was alluding that I
> accidentally set the image to "stop" instead of running a proper upgrade
> stop command, but I couldn't find anything like that on the hosts I ran
> commands from, though I wouldn't be surprised if I accidentally pasted and
> then wrote additional commands onto it.
>
> The failing OSD was interesting: ceph didn't report it as a stray daemon,
> but I noticed it was showing as a daemon and not as an actual OSD for
> storage in ceph, so I attempted to remove it and it would eventually come
> back.
> It had upgraded all the managers and mons to 17.2.5. Some OSDs had upgraded
> as well.
> Current cluster status says healthy, but I cannot deploy new daemons. The
> mgr information isn't refreshing (5-day-old info) under hosts and services,
> but the main dashboard is accurate, as is ceph -s.
> ceph -s will show accurate information, but things like ceph orch ps
> --daemon-type mgr will say that I have 5 MGRs running, which is inaccurate,
> nor will it let me remove them manually, as it says they're not found.
>
> ERROR: Failed command: /usr/bin/docker pull 17.2.5
> 2023-03-06T09:26:55.925386-0700 mgr.mgr.idvkbw [DBG] serve loop sleep
> 2023-03-06T09:26:55.925507-0700 mgr.mgr.idvkbw [DBG] Sleeping for 60 seconds
> 2023-03-06T09:27:55.925847-0700 mgr.mgr.idvkbw [DBG] serve loop wake
> 2023-03-06T09:27:55.925959-0700 mgr.mgr.idvkbw [DBG] serve loop start
> 2023-03-06T09:27:55.929849-0700 mgr.mgr.idvkbw [DBG] mon_command: 'config dump' -> 0 in 0.004s
> 2023-03-06T09:27:55.931625-0700 mgr.mgr.idvkbw [DBG] _run_cephadm : command = pull
> 2023-03-06T09:27:55.932025-0700 mgr.mgr.idvkbw [DBG] _run_cephadm : args = []
> 2023-03-06T09:27:55.932469-0700 mgr.mgr.idvkbw [DBG] args: --image 17.2.5 --no-container-init pull
> 2023-03-06T09:27:55.932925-0700 mgr.mgr.idvkbw [DBG] Running command: which python3
> 2023-03-06T09:27:55.968793-0700 mgr.mgr.idvkbw [DBG] Running command: /usr/bin/python3 /var/lib/ceph/5058e342-dac7-11ec-ada3-01065e90228d/cephadm.059bfc99f5cf36ed881f2494b104711faf4cbf5fc86a9594423cc105cafd9b4e --image 17.2.5 --no-container-init pull
> 2023-03-06T09:27:57.278932-0700 mgr.mgr.idvkbw [DBG] code: 1
> 2023-03-06T09:27:57.279045-0700 mgr.mgr.idvkbw [DBG] err: Pulling container image 17.2.5...
> Non-zero exit code 1 from /usr/bin/docker pull 17.2.5
> /usr/bin/docker: stdout Using default tag: latest
> /usr/bin/docker: stderr Error response from daemon: pull access denied for 17.2.5, repository does not exist or may require 'docker login': denied: requested access to the resource is denied
> ERROR: Failed command: /usr/bin/docker pull 17.2.5
>
> 2023-03-06T09:27:57.280517-0700 mgr.mgr.idvkbw [DBG] serve loop
>
> I had stopped the upgrade before, so it's at:
>
> neteng@mon:~$ ceph orch upgrade status
> {
>     "target_image": null,
>     "in_progress": false,
>     "which": "",
>     "services_complete": [],
>     "progress": null,
>     "message": "",
>     "is_paused": false
> }
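The mgr failover check suggested above, as a minimal sketch (the 5-minute wait mirrors the "~5 minutes" suggested in the thread):

```shell
# Fail over to a standby mgr, then give the new active mgr a few
# minutes to re-run its inventory refresh.
ceph mgr fail
sleep 300

# If most hosts show a recent REFRESHED value and only one or two are
# stale, the stuck refresh is likely blocked on those specific hosts.
ceph orch ps
ceph orch device ls
```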
[ceph-users] Re: Issue upgrading 17.2.0 to 17.2.5
This is the output:

{
    "target_image": null,
    "in_progress": false,
    "which": "",
    "services_complete": [],
    "progress": null,
    "message": "",
    "is_paused": false
}

grep image:

global  basic  container_image  quay.io/ceph/ceph@sha256:12a0a4f43413fd97a14a3d47a3451b2d2df50020835bb93db666209f3f77617a  *
mon     basic  container_image  quay.io/ceph/ceph@sha256:2b73ccc9816e0a1ee1dfbe21ba9a8cc085210f1220f597b5050ebfcac4bdd346  *
mgr     basic  container_image  quay.io/ceph/ceph@sha256:2b73ccc9816e0a1ee1dfbe21ba9a8cc085210f1220f597b5050ebfcac4bdd346  *
osd.0   basic  container_image  quay.io/ceph/ceph@sha256:2b73ccc9816e0a1ee1dfbe21ba9a8cc085210f1220f597b5050ebfcac4bdd346  *
osd.1   basic  container_image  quay.io/ceph/ceph@sha256:2b73ccc9816e0a1ee1dfbe21ba9a8cc085210f1220f597b5050ebfcac4bdd346  *
osd.11  basic  container_image  stop
osd.16  basic  container_image  quay.io/ceph/ceph@sha256:2b73ccc9816e0a1ee1dfbe21ba9a8cc085210f1220f597b5050ebfcac4bdd346  *
osd.17  basic  container_image  quay.io/ceph/ceph@sha256:2b73ccc9816e0a1ee1dfbe21ba9a8cc085210f1220f597b5050ebfcac4bdd346  *
osd.2   basic  container_image  quay.io/ceph/ceph@sha256:2b73ccc9816e0a1ee1dfbe21ba9a8cc085210f1220f597b5050ebfcac4bdd346  *
osd.25  basic  container_image  quay.io/ceph/ceph@sha256:2b73ccc9816e0a1ee1dfbe21ba9a8cc085210f1220f597b5050ebfcac4bdd346  *
osd.3   basic  container_image  quay.io/ceph/ceph@sha256:2b73ccc9816e0a1ee1dfbe21ba9a8cc085210f1220f597b5050ebfcac4bdd346  *
osd.34  basic  container_image  quay.io/ceph/ceph@sha256:2b73ccc9816e0a1ee1dfbe21ba9a8cc085210f1220f597b5050ebfcac4bdd346  *
osd.35  basic  container_image  quay.io/ceph/ceph@sha256:2b73ccc9816e0a1ee1dfbe21ba9a8cc085210f1220f597b5050ebfcac4bdd346  *
osd.37  basic  container_image  quay.io/ceph/ceph@sha256:2b73ccc9816e0a1ee1dfbe21ba9a8cc085210f1220f597b5050ebfcac4bdd346  *
osd.38  basic  container_image  quay.io/ceph/ceph@sha256:2b73ccc9816e0a1ee1dfbe21ba9a8cc085210f1220f597b5050ebfcac4bdd346  *
osd.39  basic  container_image  quay.io/ceph/ceph@sha256:2b73ccc9816e0a1ee1dfbe21ba9a8cc085210f1220f597b5050ebfcac4bdd346  *
osd.40  basic  container_image  quay.io/ceph/ceph@sha256:2b73ccc9816e0a1ee1dfbe21ba9a8cc085210f1220f597b5050ebfcac4bdd346  *
osd.42  basic  container_image  quay.io/ceph/ceph@sha256:2b73ccc9816e0a1ee1dfbe21ba9a8cc085210f1220f597b5050ebfcac4bdd346  *
osd.43  basic  container_image  quay.io/ceph/ceph@sha256:2b73ccc9816e0a1ee1dfbe21ba9a8cc085210f1220f597b5050ebfcac4bdd346  *
osd.44  basic  container_image  quay.io/ceph/ceph@sha256:2b73ccc9816e0a1ee1dfbe21ba9a8cc085210f1220f597b5050ebfcac4bdd346  *
osd.45  basic  container_image  quay.io/ceph/ceph@sha256:2b73ccc9816e0a1ee1dfbe21ba9a8cc085210f1220f597b5050ebfcac4bdd346  *
osd.46  basic  container_image  quay.io/ceph/ceph@sha256:2b73ccc9816e0a1ee1dfbe21ba9a8cc085210f1220f597b5050ebfcac4bdd346  *
osd.47  basic  container_image  17.2.5  *
osd.49  basic  container_image
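A quick programmatic way to spot the bad entries in a dump like the one above: any container_image value that is not an actual image reference ("stop", a bare "17.2.5") will break pulls. A small sketch assuming the whitespace-separated column layout of `ceph config dump` (the sample digest below is shortened for readability; `find_bad_images` is a hypothetical helper name):

```python
# Flag container_image settings whose value is not an image reference.
# Values like "stop" or "17.2.5" are what broke the docker pulls here.
def find_bad_images(config_dump: str) -> list[tuple[str, str]]:
    bad = []
    for line in config_dump.splitlines():
        fields = line.split()
        # Expected columns: WHO  SECTION  NAME  VALUE  [RO-marker]
        if len(fields) >= 4 and fields[2] == "container_image":
            who, value = fields[0], fields[3]
            # A real reference contains a registry path and/or a digest.
            if "@sha256:" not in value and "/" not in value:
                bad.append((who, value))
    return bad

sample = """\
global basic container_image quay.io/ceph/ceph@sha256:12a0a4f43413 *
osd.11 basic container_image stop
osd.47 basic container_image 17.2.5 *
"""
print(find_bad_images(sample))  # → [('osd.11', 'stop'), ('osd.47', '17.2.5')]
```

Each `(who, value)` pair it returns is a candidate for `ceph config rm <who> container_image`.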
[ceph-users] Re: Issue upgrading 17.2.0 to 17.2.5
Hello, at this point I've tried to upgrade a few times, so I believe the command is long gone. On another forum someone was alluding that I accidentally set the image to "stop" instead of running a proper upgrade stop command, but I couldn't find anything like that on the hosts I ran commands from, though I wouldn't be surprised if I accidentally pasted and then wrote additional commands onto it.

The failing OSD was interesting: ceph didn't report it as a stray daemon, but I noticed it was showing as a daemon and not as an actual OSD for storage in ceph, so I attempted to remove it and it would eventually come back.

It had upgraded all the managers and mons to 17.2.5. Some OSDs had upgraded as well.

Current cluster status says healthy, but I cannot deploy new daemons. The mgr information isn't refreshing (5-day-old info) under hosts and services, but the main dashboard is accurate, as is ceph -s. ceph -s will show accurate information, but things like ceph orch ps --daemon-type mgr will say that I have 5 MGRs running, which is inaccurate, nor will it let me remove them manually, as it says they're not found.

ERROR: Failed command: /usr/bin/docker pull 17.2.5
2023-03-06T09:26:55.925386-0700 mgr.mgr.idvkbw [DBG] serve loop sleep
2023-03-06T09:26:55.925507-0700 mgr.mgr.idvkbw [DBG] Sleeping for 60 seconds
2023-03-06T09:27:55.925847-0700 mgr.mgr.idvkbw [DBG] serve loop wake
2023-03-06T09:27:55.925959-0700 mgr.mgr.idvkbw [DBG] serve loop start
2023-03-06T09:27:55.929849-0700 mgr.mgr.idvkbw [DBG] mon_command: 'config dump' -> 0 in 0.004s
2023-03-06T09:27:55.931625-0700 mgr.mgr.idvkbw [DBG] _run_cephadm : command = pull
2023-03-06T09:27:55.932025-0700 mgr.mgr.idvkbw [DBG] _run_cephadm : args = []
2023-03-06T09:27:55.932469-0700 mgr.mgr.idvkbw [DBG] args: --image 17.2.5 --no-container-init pull
2023-03-06T09:27:55.932925-0700 mgr.mgr.idvkbw [DBG] Running command: which python3
2023-03-06T09:27:55.968793-0700 mgr.mgr.idvkbw [DBG] Running command: /usr/bin/python3 /var/lib/ceph/5058e342-dac7-11ec-ada3-01065e90228d/cephadm.059bfc99f5cf36ed881f2494b104711faf4cbf5fc86a9594423cc105cafd9b4e --image 17.2.5 --no-container-init pull
2023-03-06T09:27:57.278932-0700 mgr.mgr.idvkbw [DBG] code: 1
2023-03-06T09:27:57.279045-0700 mgr.mgr.idvkbw [DBG] err: Pulling container image 17.2.5...
Non-zero exit code 1 from /usr/bin/docker pull 17.2.5
/usr/bin/docker: stdout Using default tag: latest
/usr/bin/docker: stderr Error response from daemon: pull access denied for 17.2.5, repository does not exist or may require 'docker login': denied: requested access to the resource is denied
ERROR: Failed command: /usr/bin/docker pull 17.2.5
2023-03-06T09:27:57.280517-0700 mgr.mgr.idvkbw [DBG] serve loop

I had stopped the upgrade before, so it's at:

neteng@mon:~$ ceph orch upgrade status
{
    "target_image": null,
    "in_progress": false,
    "which": "",
    "services_complete": [],
    "progress": null,
    "message": "",
    "is_paused": false
}
[ceph-users] Re: Issue upgrading 17.2.0 to 17.2.5
I've seen what appears to be the same post on Reddit previously, and attempted to assist. My suspicion is that a "stop" command was passed to ceph orch upgrade in an attempt to stop it, but with the --image flag preceding it, setting the image to "stop". I asked the user to do an actual upgrade stop, then re-attempt specifying a different image, and the user indicated the "stop" image pull attempt continued. That part didn't seem right, which is why I suggested a bug report.

https://www.reddit.com/r/ceph/comments/11g3rze/anyone_having_pull_issues_with_ceph_images/

@OP - are you the same poster as the above, or do you just have the same problem? If there are multiple users with this, it would indicate something larger than just a misplaced option/flag/command. If it is you - could you link to the bug report?

Just to make sure, you've issued:

  ceph orch upgrade stop

then performed another "ceph orch upgrade start" specifying a --ceph-version or --image?

I'll also echo Adam's request for a "ceph config dump | grep image". It sounds like the image is still set to "stop", but I'd have expected the above to initiate an upgrade to the correct image. If not, the bug report would be helpful to continue so it could be fixed.

David

On Mon, Mar 6, 2023, at 15:02, Adam King wrote:

> Can I see the output of `ceph orch upgrade status` and `ceph config dump |
> grep image`? The "Pulling container image stop" implies somehow (as Eugen
> pointed out) that cephadm thinks the image to pull is named "stop", which
> means it is likely set either as the image to upgrade to or as one of the
> config options.
>
> On Sat, Mar 4, 2023 at 2:06 AM wrote:
>
>> I initially ran the upgrade fine, but it failed at around 40/100 on an
>> OSD, so after waiting for a long time I thought I'd try restarting it and
>> then restarting the upgrade.
>> I am stuck with the debug error below. I have tested docker pull from
>> other servers and they don't fail for the ceph images, but on ceph it
>> does.
>> If I even try to redeploy, add, or remove mon daemons, for example, it
>> comes up with the same error related to the images.
>>
>> The error that ceph is giving me is:
>> 2023-03-02T07:22:45.063976-0700 mgr.mgr-node.idvkbw [DBG] _run_cephadm : args = []
>> 2023-03-02T07:22:45.070342-0700 mgr.mgr-node.idvkbw [DBG] args: --image stop --no-container-init pull
>> 2023-03-02T07:22:45.081086-0700 mgr.mgr-node.idvkbw [DBG] Running command: which python3
>> 2023-03-02T07:22:45.180052-0700 mgr.mgr-node.idvkbw [DBG] Running command: /usr/bin/python3 /var/lib/ceph/5058e342-dac7-11ec-ada3-01065e90228d/cephadm.059bfc99f5cf36ed881f2494b104711faf4cbf5fc86a9594423cc105cafd9b4e --image stop --no-container-init pull
>> 2023-03-02T07:22:46.500561-0700 mgr.mgr-node.idvkbw [DBG] code: 1
>> 2023-03-02T07:22:46.500787-0700 mgr.mgr-node.idvkbw [DBG] err: Pulling container image stop...
>> Non-zero exit code 1 from /usr/bin/docker pull stop
>> /usr/bin/docker: stdout Using default tag: latest
>> /usr/bin/docker: stderr Error response from daemon: pull access denied for stop, repository does not exist or may require 'docker login': denied: requested access to the resource is denied
>> ERROR: Failed command: /usr/bin/docker pull stop
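The stop-then-restart sequence David describes, sketched as commands (the v17.2.5 tag is an assumption based on the thread's upgrade target; a digest or --ceph-version works equally well):

```shell
# Clear any stuck or misconfigured upgrade state first.
ceph orch upgrade stop

# Then restart the upgrade against an explicit, valid image...
ceph orch upgrade start --image quay.io/ceph/ceph:v17.2.5
# ...or, equivalently, by version:
#   ceph orch upgrade start --ceph-version 17.2.5

# Confirm target_image is a real image reference, not "stop".
ceph orch upgrade status
```

The suspected original mistake would have looked like `ceph orch upgrade start --image stop`, which makes cephadm treat the literal string "stop" as the image name.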
[ceph-users] Re: Issue upgrading 17.2.0 to 17.2.5
Can I see the output of `ceph orch upgrade status` and `ceph config dump | grep image`? The "Pulling container image stop" implies somehow (as Eugen pointed out) that cephadm thinks the image to pull is named "stop", which means it is likely set either as the image to upgrade to or as one of the config options.

On Sat, Mar 4, 2023 at 2:06 AM wrote:

> I initially ran the upgrade fine, but it failed at around 40/100 on an
> OSD, so after waiting for a long time I thought I'd try restarting it and
> then restarting the upgrade.
> I am stuck with the debug error below. I have tested docker pull from
> other servers and they don't fail for the ceph images, but on ceph it
> does.
> If I even try to redeploy, add, or remove mon daemons, for example, it
> comes up with the same error related to the images.
>
> The error that ceph is giving me is:
> 2023-03-02T07:22:45.063976-0700 mgr.mgr-node.idvkbw [DBG] _run_cephadm : args = []
> 2023-03-02T07:22:45.070342-0700 mgr.mgr-node.idvkbw [DBG] args: --image stop --no-container-init pull
> 2023-03-02T07:22:45.081086-0700 mgr.mgr-node.idvkbw [DBG] Running command: which python3
> 2023-03-02T07:22:45.180052-0700 mgr.mgr-node.idvkbw [DBG] Running command: /usr/bin/python3 /var/lib/ceph/5058e342-dac7-11ec-ada3-01065e90228d/cephadm.059bfc99f5cf36ed881f2494b104711faf4cbf5fc86a9594423cc105cafd9b4e --image stop --no-container-init pull
> 2023-03-02T07:22:46.500561-0700 mgr.mgr-node.idvkbw [DBG] code: 1
> 2023-03-02T07:22:46.500787-0700 mgr.mgr-node.idvkbw [DBG] err: Pulling container image stop...
> Non-zero exit code 1 from /usr/bin/docker pull stop
> /usr/bin/docker: stdout Using default tag: latest
> /usr/bin/docker: stderr Error response from daemon: pull access denied for stop, repository does not exist or may require 'docker login': denied: requested access to the resource is denied
> ERROR: Failed command: /usr/bin/docker pull stop
[ceph-users] Re: Issue upgrading 17.2.0 to 17.2.5
Hi,

can you paste the exact command of your upgrade attempt? It looks like "stop" is supposed to be the image name? An upgrade usually starts with the MGRs, then the MONs and then the OSDs; does 'ceph versions' reflect that some of the OSDs were upgraded successfully? Do you have logs from the failing OSDs? For example, the cephadm.log on the host where an OSD upgrade failed, and the active MGR's log at that time, could help figure this out. Also, what's the current ceph status? And please add 'ceph orch upgrade status' as well.

Regards
Eugen

Zitat von aella...@gmail.com:

> I initially ran the upgrade fine, but it failed at around 40/100 on an
> OSD, so after waiting for a long time I thought I'd try restarting it and
> then restarting the upgrade.
> I am stuck with the debug error below. I have tested docker pull from
> other servers and they don't fail for the ceph images, but on ceph it
> does.
> If I even try to redeploy, add, or remove mon daemons, for example, it
> comes up with the same error related to the images.
>
> The error that ceph is giving me is:
> 2023-03-02T07:22:45.063976-0700 mgr.mgr-node.idvkbw [DBG] _run_cephadm : args = []
> 2023-03-02T07:22:45.070342-0700 mgr.mgr-node.idvkbw [DBG] args: --image stop --no-container-init pull
> 2023-03-02T07:22:45.081086-0700 mgr.mgr-node.idvkbw [DBG] Running command: which python3
> 2023-03-02T07:22:45.180052-0700 mgr.mgr-node.idvkbw [DBG] Running command: /usr/bin/python3 /var/lib/ceph/5058e342-dac7-11ec-ada3-01065e90228d/cephadm.059bfc99f5cf36ed881f2494b104711faf4cbf5fc86a9594423cc105cafd9b4e --image stop --no-container-init pull
> 2023-03-02T07:22:46.500561-0700 mgr.mgr-node.idvkbw [DBG] code: 1
> 2023-03-02T07:22:46.500787-0700 mgr.mgr-node.idvkbw [DBG] err: Pulling container image stop...
> Non-zero exit code 1 from /usr/bin/docker pull stop
> /usr/bin/docker: stdout Using default tag: latest
> /usr/bin/docker: stderr Error response from daemon: pull access denied for stop, repository does not exist or may require 'docker login': denied: requested access to the resource is denied
> ERROR: Failed command: /usr/bin/docker pull stop
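The diagnostics Eugen asks for can be gathered roughly like this (the fsid in the path is taken from the cephadm log paths earlier in the thread; cephadm clusters log to journald by default, so the file paths only exist if file logging is enabled):

```shell
ceph -s                    # current cluster status
ceph versions              # which daemons run which version
ceph orch upgrade status   # current upgrade target/state

# cephadm's per-host log, on the host where the OSD upgrade failed:
less /var/log/ceph/cephadm.log

# Active mgr log, if file logging is enabled:
less /var/log/ceph/5058e342-dac7-11ec-ada3-01065e90228d/ceph-mgr.*.log
# Otherwise pull it from journald (unit name varies with fsid/daemon name):
journalctl -u 'ceph-*@mgr.*'
```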