[ceph-users] Re: Problems adding a new host via orchestration. (solved)

2024-05-29 Thread mcpherson
Hi everyone.

I had the very same problem, and I believe I've figured out what is happening.  
Many admins advise using "ufw limit ssh" to help protect your system against 
brute-force password guessing.  Well, "ceph orch host add" makes multiple ssh 
connections very quickly and triggers the ufw limit.  I switched to "ufw allow 
ssh" and everything works perfectly.
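
For anyone wondering why the limit trips so easily: the ufw man page describes "limit" as denying a source address that has attempted 6 or more connections in the last 30 seconds. Below is a rough Python model of that rule. It is only a sketch, not how ufw is actually implemented (the real thing is done with firewall rules), and the function name and the burst timings are made up for illustration.

```python
from collections import deque

WINDOW_SECONDS = 30  # window length as described in the ufw man page
MAX_ATTEMPTS = 6     # the 6th attempt inside the window is refused


def simulate_limit(attempt_times):
    """Return one True (allowed) / False (denied) flag per attempt time.

    Simplified model: every attempt is recorded, including denied ones,
    and an attempt is denied once it is the 6th (or later) in the window.
    """
    recent = deque()
    decisions = []
    for now in attempt_times:
        # Forget attempts that have aged out of the window.
        while recent and now - recent[0] >= WINDOW_SECONDS:
            recent.popleft()
        recent.append(now)
        decisions.append(len(recent) < MAX_ATTEMPTS)
    return decisions


if __name__ == "__main__":
    burst = [i * 0.5 for i in range(8)]   # hypothetical: 8 connections in 4 seconds
    spaced = [i * 10 for i in range(8)]   # hypothetical: one connection every 10 seconds
    print(simulate_limit(burst))
    print(simulate_limit(spaced))
```

In this model a burst of eight connections within a few seconds gets its sixth and later attempts refused, while the same eight connections spread ten seconds apart all pass, which matches what I saw with "ceph orch host add".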

Mike
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Problems adding a new host via orchestration. (solved)

2024-02-10 Thread Eugen Block

Oh really? That seems rather trivial but alright. :-D

Quoting Gary Molenkamp:


Just wanted to follow up on this to say that it is now working.

After reviewing the configuration of the new host many times, I did
a hard restart of the active mgr container.

The command to add the new host proceeded without error.

Thanks everyone.
Gary



[ceph-users] Re: Problems adding a new host via orchestration. (solved)

2024-02-09 Thread Gary Molenkamp

Just wanted to follow up on this to say that it is now working.

After reviewing the configuration of the new host many times, I did a
hard restart of the active mgr container.

The command to add the new host proceeded without error.

Thanks everyone.
Gary




[ceph-users] Re: Problems adding a new host via orchestration.

2024-02-07 Thread Eugen Block
I still don't have an explanation or other ideas, but I was able to
add a Rocky Linux 9 host to my existing Quincy cluster based on
openSUSE (I don't have Pacific in this environment) quickly and easily.
It is a fresh Rocky install; I only added the cephadm and podman packages,
copied the ceph.pub key over and ran 'ceph orch host add ...'
successfully.


[ceph-users] Re: Problems adding a new host via orchestration.

2024-02-06 Thread Tim Holloway
Just FYI, I've seen this on CentOS systems as well, and I'm not even
sure that it was just for Ceph. Maybe some stuff like Ansible.

I THINK you can safely ignore that message or alternatively that it's
such an easy fix that senility has already driven it from my mind.

Tim


[ceph-users] Re: Problems adding a new host via orchestration.

2024-02-06 Thread Gary Molenkamp
I confirmed selinux is disabled on all existing and new hosts. Likewise, 
python3.6 is installed on all as well.  (3.9.16 on RL8, 3.9.18 on RL9).


I am running 16.2.12 on all containers, so it may be worth updating to 
16.2.14 to ensure I'm on the latest Pacific release.


Gary



[ceph-users] Re: Problems adding a new host via orchestration.

2024-02-05 Thread Curt
I don't use Rocky, so this is a stab in the dark and probably not the issue,
but could SELinux be blocking the process?  Really long shot, but is python3
in the standard location? That is, if you run python3 --version as your ceph
user, does it return a version?

Probably not much help, but figured I'd throw it out there.
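
If it helps, the python3 check could be scripted roughly like this. It is only a sketch: find_interpreter and interpreter_version are made-up helper names, and the lookup only sees the PATH of whichever user runs it, so it would have to be run as the ceph SSH user on the new host to mean anything.

```python
import shutil
import subprocess


def find_interpreter(name="python3"):
    """Return the full path of `name` on this user's PATH, or None if absent."""
    return shutil.which(name)


def interpreter_version(path):
    """Run `<path> --version` and return whatever it prints, as text."""
    result = subprocess.run(
        [path, "--version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,   # some old interpreters print the version on stderr
        universal_newlines=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    path = find_interpreter()
    if path is None:
        print("python3 not found on PATH")
    else:
        print(path, "->", interpreter_version(path))
```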


[ceph-users] Re: Problems adding a new host via orchestration.

2024-02-05 Thread Gary Molenkamp
I have verified the server's expected hostname (with `hostname`) matches 
the hostname I am trying to use.

Just to be sure, I also ran:
    cephadm check-host --expect-hostname 
and it returns:
    Hostname "" matches what is expected.

On the current admin server where I am trying to add the host, the host
is reachable, and the short name even resolves to the proper IP via the DNS
search order.
Likewise, on the server where the mgr is running, I am able to confirm
reachability and DNS resolution for the new server as well.


I thought this might be a DNS/name resolution issue as well, but I don't
see any errors in my setup with respect to host naming.
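
For reference, the name resolution check described above can be approximated in a few lines of Python. This is a sketch under assumptions: resolved_addresses and resolves_to are hypothetical helper names, and the check only reflects the resolver configuration of the machine it runs on, so it would need to be run on both the admin server and the mgr host.

```python
import socket


def resolved_addresses(name):
    """Return the set of IP addresses the local resolver gives for `name`."""
    infos = socket.getaddrinfo(name, None)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address.
    return {info[4][0] for info in infos}


def resolves_to(name, expected_ip):
    """True if `name` resolves (hosts file, DNS search order, ...) to `expected_ip`."""
    try:
        return expected_ip in resolved_addresses(name)
    except socket.gaierror:
        # The name could not be resolved at all.
        return False


if __name__ == "__main__":
    print("local hostname:", socket.gethostname())
```

A call such as resolves_to(short_name, "172.31.102.41"), with short_name standing in for the new host's short name, would then confirm that the name and the address used in the "ceph orch host add" command agree.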


Thanks
Gary


On 2024-02-03 06:46, Eugen Block wrote:

Hi,

I found this blog post [1] which reports the same error message. It 
seems a bit misleading because it appears to be about DNS. Can you check


cephadm check-host --expect-hostname 

Or is that what you already tried? It's not entirely clear how you 
checked the hostname.


Regards,
Eugen

[1] 
https://blog.mousetech.com/ceph-distributed-file-system-for-the-enterprise/ceph-bogus-error-cannot-allocate-memory/


Quoting Gary Molenkamp:

Happy Friday all.  I was hoping someone could point me in the right 
direction or clarify any limitations that could be impacting an issue 
I am having.


I'm struggling to add a new set of hosts to my ceph cluster using 
cephadm and orchestration.  When trying to add a host:

    "ceph orch host add  172.31.102.41 --labels _admin"
returns:
    "Error EINVAL: Can't communicate with remote host 
`172.31.102.41`, possibly because python3 is not installed there: 
[Errno 12] Cannot allocate memory"


I've verified that the ceph ssh key works to the remote host, the host's
name matches that returned from `hostname`, python3 is installed, and
"/usr/sbin/cephadm prepare-host" on the new hosts returns "host is
ok". In addition, the cluster ssh key works between hosts and the
existing hosts are able to ssh in using the ceph key.


The existing ceph cluster is Pacific release using docker based 
containerization on RockyLinux8 base OS.  The new hosts are 
RockyLinux9 based, with the cephadm being installed from Quincy release:

        ./cephadm add-repo --release quincy
        ./cephadm install
I did try installing cephadm from the Pacific release by changing the 
repo to el8,  but that did not work either.


Is there a limitation in mixing RL8 and RL9 container hosts under
Pacific?  Does this same limitation exist under Quincy? Is there a
Python version dependency?
The reason for RL9 on the new hosts is to stage upgrading the OS
for the cluster.  I did this under Octopus when moving from CentOS 7 to
RL8.


Thanks and I appreciate any feedback/pointers.
Gary


I've added the log trace here in case that helps (from `ceph log last 
cephadm`)




2024-02-02T14:22:32.610048+ mgr.storage01.oonvfl (mgr.441023307) 
4957871 : cephadm [ERR] Can't communicate with remote host 
`172.31.102.41`, possibly because python3 is not installed there: 
[Errno 12] Cannot allocate memory

Traceback (most recent call last):
  File "/usr/share/ceph/mgr/cephadm/serve.py", line 1524, in 
_remote_connection

    conn, connr = self.mgr._get_connection(addr)
  File "/usr/share/ceph/mgr/cephadm/module.py", line 1370, in 
_get_connection

    sudo=True if self.ssh_user != 'root' else False)
  File "/lib/python3.6/site-packages/remoto/backends/__init__.py", 
line 35, in __init__

    self.gateway = self._make_gateway(hostname)
  File "/lib/python3.6/site-packages/remoto/backends/__init__.py", 
line 46, in _make_gateway

    self._make_connection_string(hostname)
  File "/lib/python3.6/site-packages/execnet/multi.py", line 133, in 
makegateway

    io = gateway_io.create_io(spec, execmodel=self.execmodel)
  File "/lib/python3.6/site-packages/execnet/gateway_io.py", line 
121, in create_io

    io = Popen2IOMaster(args, execmodel)
  File "/lib/python3.6/site-packages/execnet/gateway_io.py", line 21, 
in __init__

    self.popen = p = execmodel.PopenPiped(args)
  File "/lib/python3.6/site-packages/execnet/gateway_base.py", line 
184, in PopenPiped

    return self.subprocess.Popen(args, stdout=PIPE, stdin=PIPE)
  File "/lib64/python3.6/subprocess.py", line 729, in __init__
    restore_signals, start_new_session)
  File "/lib64/python3.6/subprocess.py", line 1295, in _execute_child
    restore_signals, start_new_session, preexec_fn)
OSError: [Errno 12] Cannot allocate memory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/share/ceph/mgr/cephadm/serve.py", line 1528, in 
_remote_connection

    raise execnet.gateway_bootstrap.HostNotFound(msg)
execnet.gateway_bootstrap.HostNotFound: Can't communicate with remote 
host `172.31.102.41`, possibly because python3 is not installed 
there: [Errno 12] Cannot allocate memory


The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File 

[ceph-users] Re: Problems adding a new host via orchestration.

2024-02-03 Thread Eugen Block

Hi,

I found this blog post [1], which reports the same error message. The  
message seems a bit misleading, because the actual problem there appears  
to be DNS. Can you check:


cephadm check-host --expect-hostname 

Or is that what you already tried? It's not entirely clear how you  
checked the hostname.
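
As a rough illustration of what such a hostname check boils down to, here is a minimal Python sketch. It assumes the check is a case-insensitive comparison between the name the host reports for itself and the name you expect. The helper names are hypothetical and this is not cephadm's own code.

```python
import socket


def hostname_matches(expected, actual=None):
    """Compare the expected name with the local hostname, ignoring case.

    actual defaults to socket.gethostname(); it is a parameter only so the
    comparison can be exercised without changing the machine's hostname.
    """
    if actual is None:
        actual = socket.gethostname()
    return actual.lower() == expected.lower()


def short_name(name):
    """Strip the domain part, e.g. 'node1.example.com' -> 'node1'."""
    return name.split(".", 1)[0]
```

One thing a check like this can expose is a bare name on one side and a fully qualified name on the other: `hostname_matches("node1", actual="node1.example.com")` is False even though both refer to the same machine.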


Regards,
Eugen

[1]  
https://blog.mousetech.com/ceph-distributed-file-system-for-the-enterprise/ceph-bogus-error-cannot-allocate-memory/


Quoting Gary Molenkamp:

Happy Friday all.  I was hoping someone could point me in the right  
direction or clarify any limitations that could be impacting an  
issue I am having.


I'm struggling to add a new set of hosts to my ceph cluster using  
cephadm and orchestration.  When trying to add a host:

    "ceph orch host add  172.31.102.41 --labels _admin"
returns:
    "Error EINVAL: Can't communicate with remote host  
`172.31.102.41`, possibly because python3 is not installed there:  
[Errno 12] Cannot allocate memory"
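
A note on reading this message: the log trace further down shows the OSError being raised from subprocess.Popen on the mgr side, while starting a local child process, and then being wrapped in a HostNotFound whose text talks about the remote host. The sketch below imitates that wrapping. The `HostNotFound` class, `spawn_ssh` and `connect` are stand-ins written for this example, not the cephadm or execnet code.

```python
import errno
import os


class HostNotFound(Exception):
    """Stand-in for the exception type named in the trace (hypothetical)."""


def spawn_ssh():
    """Simulate a local process spawn that fails with ENOMEM."""
    raise OSError(errno.ENOMEM, os.strerror(errno.ENOMEM))


def connect(addr):
    """Wrap a local OSError in a message that points at the remote host."""
    try:
        spawn_ssh()
    except OSError as e:
        msg = (f"Can't communicate with remote host `{addr}`, possibly "
               f"because python3 is not installed there: {e}")
        # Raising inside the except block keeps the OSError as __context__,
        # which produces "During handling of the above exception, ..."
        raise HostNotFound(msg)
```

The point of the sketch is that the "python3 is not installed" hint is attached to any OSError at that step, so errno 12 (ENOMEM) here describes a failure on the side that spawns the connection, not necessarily anything about the new host.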


I've verified that the ceph ssh key works to the remote host, that the  
host's name matches the output of `hostname`, that python3 is installed,  
and that "/usr/sbin/cephadm prepare-host" on the new hosts returns "host  
is ok". In addition, the cluster ssh key works between hosts, and  
the existing hosts are able to ssh in using the ceph key.


The existing ceph cluster runs the Pacific release, using Docker-based  
containerization on a RockyLinux8 base OS. The new hosts are  
RockyLinux9 based, with cephadm installed from the Quincy release:

        ./cephadm add-repo --release quincy
        ./cephadm install
I did try installing cephadm from the Pacific release by changing  
the repo to el8,  but that did not work either.


Is there a limitation in mixing RL8 and RL9 container hosts under  
Pacific? Does this same limitation exist under Quincy? Is there a  
python version dependency?
The reason for RL9 on the new hosts is to stage upgrading the OSes  
for the cluster. I did this under Octopus when moving from CentOS 7  
to RL8.


Thanks and I appreciate any feedback/pointers.
Gary


I've added the log trace here in case that helps (from `ceph log  
last cephadm`)




2024-02-02T14:22:32.610048+ mgr.storage01.oonvfl (mgr.441023307) 4957871 : cephadm [ERR] Can't communicate with remote host `172.31.102.41`, possibly because python3 is not installed there: [Errno 12] Cannot allocate memory

Traceback (most recent call last):
  File "/usr/share/ceph/mgr/cephadm/serve.py", line 1524, in _remote_connection
    conn, connr = self.mgr._get_connection(addr)
  File "/usr/share/ceph/mgr/cephadm/module.py", line 1370, in _get_connection
    sudo=True if self.ssh_user != 'root' else False)
  File "/lib/python3.6/site-packages/remoto/backends/__init__.py", line 35, in __init__
    self.gateway = self._make_gateway(hostname)
  File "/lib/python3.6/site-packages/remoto/backends/__init__.py", line 46, in _make_gateway
    self._make_connection_string(hostname)
  File "/lib/python3.6/site-packages/execnet/multi.py", line 133, in makegateway
    io = gateway_io.create_io(spec, execmodel=self.execmodel)
  File "/lib/python3.6/site-packages/execnet/gateway_io.py", line 121, in create_io
    io = Popen2IOMaster(args, execmodel)
  File "/lib/python3.6/site-packages/execnet/gateway_io.py", line 21, in __init__
    self.popen = p = execmodel.PopenPiped(args)
  File "/lib/python3.6/site-packages/execnet/gateway_base.py", line 184, in PopenPiped
    return self.subprocess.Popen(args, stdout=PIPE, stdin=PIPE)
  File "/lib64/python3.6/subprocess.py", line 729, in __init__
    restore_signals, start_new_session)
  File "/lib64/python3.6/subprocess.py", line 1295, in _execute_child
    restore_signals, start_new_session, preexec_fn)
OSError: [Errno 12] Cannot allocate memory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/share/ceph/mgr/cephadm/serve.py", line 1528, in _remote_connection
    raise execnet.gateway_bootstrap.HostNotFound(msg)
execnet.gateway_bootstrap.HostNotFound: Can't communicate with remote host `172.31.102.41`, possibly because python3 is not installed there: [Errno 12] Cannot allocate memory


The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 125, in wrapper
    return OrchResult(f(*args, **kwargs))
  File "/usr/share/ceph/mgr/cephadm/module.py", line 2709, in apply
    results.append(self._apply(spec))
  File "/usr/share/ceph/mgr/cephadm/module.py", line 2574, in _apply
    return self._add_host(cast(HostSpec, spec))
  File "/usr/share/ceph/mgr/cephadm/module.py", line 1517, in _add_host
    ip_addr = self._check_valid_addr(spec.hostname, spec.addr)
  File "/usr/share/ceph/mgr/cephadm/module.py", line 1498, in _check_valid_addr
    error_ok=True, no_fsid=True)
  File "/usr/share/ceph/mgr/cephadm/serve.py", line 1326, in _run_cephadm
    with