Since you took the trouble to provide such detailed input and to help us help you (unlike what some other people tend to do), I think you deserve a decent answer (and some explanation).
The last question first - even though you don't specify/have dedicated Storage traffic, there will be an additional interface inside the SSVM, connected to the same Management network (not to the old Storage network - if you still see the old storage network, restart your mgmt server and destroy the SSVM; a new one should be created, with proper interfaces inside it).
Bond naming issues:
- rename your "bond-services" to something industry-standard like "bond0" or similar - CloudStack extracts "child" interfaces from cloudbr1 IF you specify a VLAN for a network that ACS should create - so your "bond-services", while fancy (and it's unclear to me WHY you named it in that weird way - smiley here), is NOT something CloudStack will recognize, and this is the reason it fails (it even says so in that error message)
- there is no reason NOT to have that dedicated storage network - feel free to bring it back - you have the same issue there as for the public traffic - rename "bond-storage" to e.g. "bond1" and you will be good to go - since you are NOT using tagging, ACS will just plug the VM's vNIC into cloudbr2 (or whatever bridge name you use for it).
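As an illustration, a minimal sketch of that layout in Debian-style ifupdown (assuming ifenslave and bridge-utils are installed; I took the two slave NICs from your diagram, and the bond mode is a placeholder - adapt both to your hardware and switch config):

```
# /etc/network/interfaces fragment - illustrative only
auto bond0
iface bond0 inet manual
    bond-slaves enp65s0f0 enp65s0f1
    bond-mode 802.3ad            # or whatever mode your switch expects

auto cloudbr1
iface cloudbr1 inet manual
    bridge_ports bond0           # "bond0" is a name ACS will recognize
    bridge_stp off
```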
Now some explanation (even though your deduction capabilities certainly made you draw some conclusions from what I wrote above ^^^):
- When you specify a VLAN id for some network in CloudStack, CloudStack will look for the device name that is specified as the "Traffic label" for that traffic (and you have none??? for your Public traffic - while it should be set to the name of the bridge device, "cloudbr1") - and then it will provision a VLAN interface and create a new bridge. I.e. for a Public network with VLAN id 48, it will extract "bond0" from "cloudbr1", create the bond0.48 VLAN interface, AND create a brand new bridge (with a funny name) on top of this bond0.48 interface, and plug the Public vNICs into this new bridge....
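Roughly, what the agent does on the host is equivalent to the following sketch (names are illustrative, and the exact generated bridge name differs between CloudStack versions - treat this as a mental model, not the agent's actual code):

```shell
#!/bin/sh
# Sketch of the "tagged network" path: derive names from the child
# interface + VLAN id, then build the VLAN interface and its bridge.

vlan_ifname() { printf '%s.%s' "$1" "$2"; }    # bond0 + 48 -> bond0.48
vlan_brname() { printf 'br%s-%s' "$1" "$2"; }  # bond0 + 48 -> brbond0-48

create_vlan_bridge() {   # run as root on the hypervisor
    pif="$1"; vlan="$2"
    vif=$(vlan_ifname "$pif" "$vlan")
    br=$(vlan_brname "$pif" "$vlan")
    ip link add link "$pif" name "$vif" type vlan id "$vlan"
    ip link add "$br" type bridge
    ip link set "$vif" master "$br"
    ip link set "$vif" up
    ip link set "$br" up
}

# create_vlan_bridge bond0 48   # what ACS effectively does for VLAN 48
```

The failing "bond-services.48" in your log is exactly this derivation, applied to a child interface name the agent could not work with.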
- When you do NOT specify a VLAN id for some network in CloudStack (i.e. your storage network doesn't use a VLAN ID in CloudStack, and your switch ports are in access vlan 96) - you need to have a bridge (i.e. cloudbr2) with the bondYYY child interface (instead of that "bond-storage" fancy but unrecognized child interface name) - and then ACS will NOT extract the child interface (nor do anything I explained in the previous paragraph/bullet point) - it will just bluntly "stick" all the vNICs into that cloudbr2, and hope you have a proper physical/child interface also added to cloudbr2 that will carry the traffic down the line... (Purely FYI - you could also e.g. use trunking on Linux if you want to: have e.g. a "bondXXX.96" VLAN interface manually configured, add it to the bridge, while still NOT defining any VLAN in CloudStack for that Storage network - and ACS will just stick the vNIC into this bridge.)
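That FYI variant, as a one-off sketch (illustrative names matching the examples above - bond1, cloudbr2, VLAN 96; persist it in your distro's network config afterwards, or it is lost on reboot):

```shell
#!/bin/sh
# Tag VLAN 96 on the host instead of in CloudStack: a manually created
# bond1.96 VLAN interface carries the storage traffic into cloudbr2.
setup_storage_trunk() {   # run as root on the hypervisor
    ip link add link bond1 name bond1.96 type vlan id 96
    ip link set bond1.96 master cloudbr2
    ip link set bond1.96 up
}

# setup_storage_trunk   # then leave the VLAN field empty in CloudStack
```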
Public traffic/network - is the network that all systemVMs (SSVM, CPVM and all VRs) are connected to - this network is "public" as in "external" to the other CloudStack internal or Guest networks - this is the network to which the "north" interface is connected. It does NOT have to be non-RFC 1918 - it can be any private IP range from your company's internal network (one that will eventually route traffic to the internet, IF you want your ACS to be able to download stuff/templates from the internet - otherwise it does NOT have to route to the internet, if you are running a private cloud and do NOT want external access to your ACS - well, to the external ("public") interfaces/IPs of the SSVM, CPVM and VRs). But if you are running a public cloud, then you want to provide non-RFC 1918, i.e. really publicly routable, IP addresses/ranges for the Public network. ACS will assign 1 IP to the SSVM, 1 IP to the CPVM, and many IPs to the many VRs you create.
A thing that I briefly touched on somewhere upstairs ^^^ - for each traffic type you have defined, you need to define a traffic label. My deduction capabilities make me believe you are using KVM, so you need to set your KVM traffic label for all your network traffic (the traffic label, in your case = the exact name of the bridge as visible in Linux). I recall there are some new UI issues when it comes to labels, so go to <MGMT-IP>:8080/client/legacy and check your traffic labels there - and set them there; the UI in 4.15.0.0 doesn't allow you to update/set them after the zone is created, but the old UI will allow you to do it.
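Alternatively, the same can be done via the API - a sketch with CloudMonkey (the uppercase names are placeholders for ids you get from the list calls first):

```shell
cmk list physicalnetworks                                      # find PHYSICAL_NETWORK_ID
cmk list traffictypes physicalnetworkid=$PHYSICAL_NETWORK_ID   # find TRAFFIC_TYPE_ID
cmk update traffictype id=$TRAFFIC_TYPE_ID kvmnetworklabel=cloudbr1
```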
Not sure why I spent 30 minutes of my life on this, but there you go - hope you got everything from my email - let me know if anything is unclear!
Cheers,
On Wed, 16 Jun 2021 at 19:15, Joshua Schaeffer <jschaef...@harmonywave.com> wrote:
So Suresh's advice has pushed me in the right direction. The VM was up but the agent state was down. I was able to connect to the VM in order to continue investigating, and the VM is having network issues connecting to both my load balancer and my secondary storage server. I don't think I'm understanding how the public network portion is supposed to work in my zone and could use some clarification. First let me explain my network setup. On my compute nodes, ideally, I want to use 3 NICs:
1. A management NIC for management traffic. I was using cloudbr0 for this. cloudbr0 is a bridge I created that is connected to an access port on my switch. No vlan tagging is required to use this network (it uses VLAN 20).
2. A cloud NIC for both public and guest traffic. I was using cloudbr1 for this. cloudbr1 is a bridge I created that is connected to a trunk port on my switch. Public traffic uses VLAN 48 and guest traffic should use VLANs 400 - 656. As the port is trunked I have to use vlan tagging for any traffic over this NIC.
3. A storage NIC for storage traffic. I use a bond called "bond-storage" for this. bond-storage is connected to an access port on my switch. No vlan tagging is required to use this network (it uses VLAN 96).
For now I've removed the storage NIC from the configuration to simplify my troubleshooting, so I should only be working with cloudbr0 and cloudbr1.
To me the public network is a *non-RFC 1918* address that should be assigned to tenant VMs for external internet access. Why do system VMs need/get a public IP address? Can't they access all the internal CloudStack servers using the pod's management network?
So the first problem I'm seeing is that whenever I tell CloudStack to tag VLAN 48 for public traffic it uses the underlying bond under cloudbr1 and not the bridge. I don't know where it is even getting this name, as I never provided it to CloudStack.
Here is how I have it configured:
https://drive.google.com/file/d/10PxLdp6e46_GW7oPFJwB3sQQxnvwUhvH/view?usp=sharing
Here is the message in the management logs:
2021-06-16 16:00:40,454 INFO [c.c.v.VirtualMachineManagerImpl] (Work-Job-Executor-13:ctx-0f39d8e2 job-4/job-68 ctx-a4f832c5) (logid:eb82035c) Unable to start VM on Host[-2-Routing] due to Failed to create vnet 48: Error: argument "bond-services.48" is wrong: "name" not a valid ifnameCannot find device "bond-services.48"Failed to create vlan 48 on pif: bond-services.
This ultimately results in an error and the system VM never even starts. If I remove the vlan tag from the configuration (https://drive.google.com/file/d/11tF6YIHm9xDZvQkvphi1xvHCX_X9rDz1/view?usp=sharing) then the VM starts and gets a public IP, but without a tagged NIC it can't actually connect to the network. This is from inside the system VM:
root@s-9-VM:~# ip --brief addr
lo UNKNOWN 127.0.0.1/8
eth0 UP 169.254.91.216/16
eth1 UP 10.2.21.72/22
eth2 UP 192.41.41.162/25
eth3 UP 10.2.99.15/22
root@s-9-VM:~# ping 192.41.41.129
PING 192.41.41.129 (192.41.41.129): 56 data bytes
92 bytes from s-9-VM (192.41.41.162): Destination Host Unreachable
92 bytes from s-9-VM (192.41.41.162): Destination Host Unreachable
92 bytes from s-9-VM (192.41.41.162): Destination Host Unreachable
92 bytes from s-9-VM (192.41.41.162): Destination Host Unreachable
^C--- 192.41.41.129 ping statistics ---
5 packets transmitted, 0 packets received, 100% packet loss
Obviously if the network isn't functioning then it can't connect to my storage server and the agent never starts. How do I set up my public network so that it tags the packets going over cloudbr1? Also, can I not have a public IP address for system VMs, or is this required?
I have some other issues as well, like the fact that it is creating a storage NIC on the system VMs even though I deleted my storage network from the zone, but I can tackle one problem at a time. If anyone is curious, or it helps to visualize my network, here is a little ASCII diagram of how I have the compute node's networking set up. Hopefully it comes across the mailing list correctly and not all mangled:
enp3s0f0 (eth)  enp3s0f1 (eth)  enp65s0f0 (eth)  enp65s0f1 (eth)  enp71s0 (eth)  enp72s0 (eth)
      |               |               +--------+--------+              +-------+------+
      |               |                        |                               |
      |               |              bond-services (bond)                      |
      |               |                        |                               |
cloudbr0 (bridge)    N/A              cloudbr1 (bridge)              bond-storage (bond)
VLAN 20 (access)              VLAN 48, 400 - 656 (trunk)              VLAN 96 (access)
On 6/16/21 9:38 AM, Andrija Panic wrote:
> " There is no secondary storage VM for downloading template to image store
> LXC_SEC_STOR1 "
>
> So the next step is to investigate why there is no SSVM (can hosts access
> the secondary storage NFS, can they access the Primary Storage, etc - those
> tests you can do manually) - and as Suresh advised - once it's up, is it
> all green (Connected / Up state).
>
> Best,
>
I appreciate everyone's help.
--
Thanks,
Joshua Schaeffer
--
Andrija Panić