Re: Openshift origin - nodes in vlans
Hi Lukasz, happy to read that you found the root cause and fixed the issue. I have seen a good number of issues, sometimes also related to DNS caching, but one happening during the provisioning process is a new one for me. Always interesting. Best Regards, Frédéric

On Wed, Jun 21, 2017 at 7:37 PM, Łukasz Strzelec wrote:
> Hi Frédéric :)
>
> I found out what the root cause of that behaviour was. You would not believe it. But first things first:
>
> It is true that our OSO implementation is, let's say, "unusual". When scaling up our cluster, we use FQDNs. We also have a self-service portal for provisioning new hosts. Our customer ordered 10 Atomic hosts in a dedicated VLAN and decided to attach them to the OSO cluster. Before doing this, it was decided to change their DNS names.
>
> And here is where the story starts :) The DNS zone was refreshed after 45 minutes, but the host from which we were executing the Ansible playbooks had cached the old IP addresses.
>
> So what happened: all Atomic hosts had been properly configured, but the entries in the OpenShift configuration contained the wrong IP addresses. That is why the cluster was working at the network layer and all nodes were reported as "ready", while inside the cluster the configuration was messed up.
>
> Your link was very helpful. Thanks to it, I found the wrong configuration:
>
> # oc get hostsubnet
> NAME                   HOST                   HOST IP           SUBNET
> rh71-os1.example.com   rh71-os1.example.com   192.168.122.46    10.1.1.0/24
> rh71-os2.example.com   rh71-os2.example.com   192.168.122.18    10.1.2.0/24
> rh71-os3.example.com   rh71-os3.example.com   192.168.122.202   10.1.0.0/24
>
> and at first glance I noticed the wrong IP addresses.
>
> I re-ran the playbook and everything is working like a charm. Thanks a lot for your help.
> Best regards :)
>
> 2017-06-21 10:12 GMT+02:00 Frederic Giloux:
>> Hi Lukasz,
>>
>> if you don't have connectivity at the service level, it is likely that the iptables rules have not been configured on your new node. You can validate that with iptables -L -n. Compare the result on your new node and on one in the other VLAN. If this is confirmed, the master may not be able to connect to the kubelet on the new node (port TCP 10250, as per my previous email).
>> Another thing that could have gone wrong is the population of the OVS table. In that case restarting the node would reinitialise it.
>> Another point: the traffic between pods communicating through a service should go through the SDN, which means your network team should only see SDN packets between nodes at a firewall between VLANs, and not traffic to your service IP range.
>> This resource should also be of help: https://docs.openshift.com/container-platform/3.5/admin_guide/sdn_troubleshooting.html#debugging-a-service
>>
>> I hope this helps.
>>
>> Regards,
>>
>> Frédéric
>>
>> On Wed, Jun 21, 2017 at 9:27 AM, Łukasz Strzelec <lukasz.strze...@gmail.com> wrote:
>>> Hello :)
>>>
>>> Thanks for the quick reply.
>>>
>>> I did; I mean, the mentioned port had been opened. All nodes are visible to each other, and oc get nodes shows the "ready" state.
>>>
>>> But pushing to the registry, or simply testing connectivity to endpoint or service IPs, shows "no route to host".
>>>
>>> Do you know how to test this properly?
>>>
>>> The network guy is telling me that he sees some denies from VLAN_B to the 172.30.0.0/16 network. He also assured me that traffic on port 4780 is allowed.
>>>
>>> I did some tests once again:
>>>
>>> I tried to deploy the example Ruby application, and it stops at pushing into the registry :(
>>>
>>> Also, when I deploy a simple pod (hello-openshift) and then expose the service, I cannot reach the website. I'm seeing the default route page with info that the application doesn't exist.
>>>
>>> Please see the logs below:
>>>
>>> Fetching gem metadata from https://rubygems.org/...
>>> Fetching version metadata from https://rubygems.org/..
>>> Warning: the running version of Bundler is older than the version that created the lockfile. We suggest you upgrade to the latest version of Bundler by running `gem install bundler`.
>>> Installing puma 3.4.0 with native extensions
>>> Installing rack 1.6.4
>>> Using bundler 1.10.6
>>> Bundle complete! 2 Gemfile dependencies, 3 gems now installed.
>>> Gems in the groups development and test were not installed.
>>> Bundled gems are installed into ./bundle.
>>> ---> Cleaning up unused ruby gems ...
>>> Warning: the running version of Bundler is older than the version that created the lockfile. We suggest you upgrade to the latest version of Bundler by running `gem install bundler`.
>>> Pushing image 172.30.123.59:5000/testshared/d:latest ...
>>> Registry server Address:
>>> Registry server User Name: serviceaccount
>>> Registry server
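The failure mode described above (a stale DNS cache on the Ansible host leading to wrong HOST IPs in `oc get hostsubnet`) can be checked for mechanically. A minimal sketch, assuming Bash; the sample data is the hostsubnet output from this thread, and `resolve` is a hypothetical stand-in for a fresh DNS lookup (e.g. `dig +short`), not a real command:

```shell
#!/usr/bin/env bash
# Compare each node's registered HOST IP against a fresh DNS answer.
# Sample data copied from the `oc get hostsubnet` output in this thread;
# on a live cluster feed it `oc get hostsubnet --no-headers` instead.
hostsubnets='rh71-os1.example.com rh71-os1.example.com 192.168.122.46 10.1.1.0/24
rh71-os2.example.com rh71-os2.example.com 192.168.122.18 10.1.2.0/24
rh71-os3.example.com rh71-os3.example.com 192.168.122.202 10.1.0.0/24'

# Hypothetical resolver stub; replace with: dig +short "$1" | tail -n1
resolve() {
    case "$1" in
        rh71-os1.example.com) echo 192.168.122.46 ;;
        rh71-os2.example.com) echo 192.168.122.99 ;;  # pretend the zone moved
        rh71-os3.example.com) echo 192.168.122.202 ;;
    esac
}

check_hostsubnets() {
    while read -r name host ip subnet; do
        fresh=$(resolve "$host")
        if [ -n "$fresh" ] && [ "$fresh" != "$ip" ]; then
            echo "STALE: $name registered $ip, DNS now says $fresh"
        else
            echo "OK: $name $ip"
        fi
    done
}

check_hostsubnets <<< "$hostsubnets"
```

Running something like this from the Ansible host before scaling up would have caught the 45-minute zone refresh lag before the playbook wrote the stale addresses into the cluster configuration.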
Re: oc whoami bug?
It doesn't give you an anonymous token. It gives you the current token held by oc, which the server may or may not consider valid. On Wed, Jun 21, 2017 at 1:47 PM, Ben Pareeswrote: > > > On Wed, Jun 21, 2017 at 12:30 PM, Clayton Coleman > wrote: > >> If your script looks like: >> >> $ oc get service foo --token "$(oc whoami -t)" >> >> and whoami -t fails you're going to get something you didn't expect as >> output. >> > > if it succeeds and gives you an anonymous token you're also going to get > something you didn't expect as output. Namely a denial on the get that > appears to make no sense. (I don't know what you'll get if the oc whoami > -t failed with an error, but probably at least something that might point > you towards the token being malformed which might lead you to run oc whoami > -t to see what it's returning. Getting a permission denied is going to > lead you to go check if the user you think you are, has permissions) > > > > >> >> >> >> On Wed, Jun 21, 2017 at 9:38 AM, Ben Parees wrote: >> >>> >>> >>> On Wed, Jun 21, 2017 at 9:31 AM, Clayton Coleman >>> wrote: >>> The reason today it does not do that so you can use it in scripting effectively. It's expected you're using that immediately in another command which would display that error. >>> >>> why would "oc whoami -t" returning an error in this case prevent using >>> it in scripting effectively? it would just mean the script would fail one >>> command earlier (before the bad token was used). Seems like that would be >>> the more useful behavior in terms of understanding what failed in the >>> script, too. 
>>> >>> >>> >>> On Jun 21, 2017, at 7:49 AM, Philippe Lafoucrière < philippe.lafoucri...@tech-angels.com> wrote: Just to be clear, my point is: if `oc whoami` returns "error: You must be logged in to the server (the server has asked for the client to provide credentials)", `oc whoami -t` should return the same if the session has timed out ;) ___ users mailing list users@lists.openshift.redhat.com http://lists.openshift.redhat.com/openshiftmm/listinfo/users ___ users mailing list users@lists.openshift.redhat.com http://lists.openshift.redhat.com/openshiftmm/listinfo/users >>> >>> >>> -- >>> Ben Parees | OpenShift >>> >>> >> > > > -- > Ben Parees | OpenShift > > > ___ > users mailing list > users@lists.openshift.redhat.com > http://lists.openshift.redhat.com/openshiftmm/listinfo/users > > ___ users mailing list users@lists.openshift.redhat.com http://lists.openshift.redhat.com/openshiftmm/listinfo/users
Re: oc whoami bug?
On Wed, Jun 21, 2017 at 12:30 PM, Clayton Colemanwrote: > If your script looks like: > > $ oc get service foo --token "$(oc whoami -t)" > > and whoami -t fails you're going to get something you didn't expect as > output. > if it succeeds and gives you an anonymous token you're also going to get something you didn't expect as output. Namely a denial on the get that appears to make no sense. (I don't know what you'll get if the oc whoami -t failed with an error, but probably at least something that might point you towards the token being malformed which might lead you to run oc whoami -t to see what it's returning. Getting a permission denied is going to lead you to go check if the user you think you are, has permissions) > > > > On Wed, Jun 21, 2017 at 9:38 AM, Ben Parees wrote: > >> >> >> On Wed, Jun 21, 2017 at 9:31 AM, Clayton Coleman >> wrote: >> >>> The reason today it does not do that so you can use it in scripting >>> effectively. It's expected you're using that immediately in another >>> command which would display that error. >>> >> >> why would "oc whoami -t" returning an error in this case prevent using it >> in scripting effectively? it would just mean the script would fail one >> command earlier (before the bad token was used). Seems like that would be >> the more useful behavior in terms of understanding what failed in the >> script, too. 
>> >> >> >> >>> >>> On Jun 21, 2017, at 7:49 AM, Philippe Lafoucrière < >>> philippe.lafoucri...@tech-angels.com> wrote: >>> >>> Just to be clear, my point is: if `oc whoami` returns "error: You must >>> be logged in to the server (the server has asked for the client to provide >>> credentials)", `oc whoami -t` should return the same if the session has >>> timed out ;) >>> >>> ___ >>> users mailing list >>> users@lists.openshift.redhat.com >>> http://lists.openshift.redhat.com/openshiftmm/listinfo/users >>> >>> >>> ___ >>> users mailing list >>> users@lists.openshift.redhat.com >>> http://lists.openshift.redhat.com/openshiftmm/listinfo/users >>> >>> >> >> >> -- >> Ben Parees | OpenShift >> >> > -- Ben Parees | OpenShift ___ users mailing list users@lists.openshift.redhat.com http://lists.openshift.redhat.com/openshiftmm/listinfo/users
Re: Openshift origin - nodes in vlans
Hi Frédéric :)

I found out what the root cause of that behaviour was. You would not believe it. But first things first:

It is true that our OSO implementation is, let's say, "unusual". When scaling up our cluster, we use FQDNs. We also have a self-service portal for provisioning new hosts. Our customer ordered 10 Atomic hosts in a dedicated VLAN and decided to attach them to the OSO cluster. Before doing this, it was decided to change their DNS names.

And here is where the story starts :) The DNS zone was refreshed after 45 minutes, but the host from which we were executing the Ansible playbooks had cached the old IP addresses.

So what happened: all Atomic hosts had been properly configured, but the entries in the OpenShift configuration contained the wrong IP addresses. That is why the cluster was working at the network layer and all nodes were reported as "ready", while inside the cluster the configuration was messed up.

Your link was very helpful. Thanks to it, I found the wrong configuration:

# oc get hostsubnet
NAME                   HOST                   HOST IP           SUBNET
rh71-os1.example.com   rh71-os1.example.com   192.168.122.46    10.1.1.0/24
rh71-os2.example.com   rh71-os2.example.com   192.168.122.18    10.1.2.0/24
rh71-os3.example.com   rh71-os3.example.com   192.168.122.202   10.1.0.0/24

and at first glance I noticed the wrong IP addresses.

I re-ran the playbook and everything is working like a charm. Thanks a lot for your help.

Best regards :)

2017-06-21 10:12 GMT+02:00 Frederic Giloux:
> Hi Lukasz,
>
> if you don't have connectivity at the service level, it is likely that the iptables rules have not been configured on your new node. You can validate that with iptables -L -n. Compare the result on your new node and on one in the other VLAN. If this is confirmed, the master may not be able to connect to the kubelet on the new node (port TCP 10250, as per my previous email).
> Another thing that could have gone wrong is the population of the OVS table.
> In that case restarting the node would reinitialise it.
> Another point: the traffic between pods communicating through a service should go through the SDN, which means your network team should only see SDN packets between nodes at a firewall between VLANs, and not traffic to your service IP range.
> This resource should also be of help: https://docs.openshift.com/container-platform/3.5/admin_guide/sdn_troubleshooting.html#debugging-a-service
>
> I hope this helps.
>
> Regards,
>
> Frédéric
>
> On Wed, Jun 21, 2017 at 9:27 AM, Łukasz Strzelec <lukasz.strze...@gmail.com> wrote:
>> Hello :)
>>
>> Thanks for the quick reply.
>>
>> I did; I mean, the mentioned port had been opened. All nodes are visible to each other, and oc get nodes shows the "ready" state.
>>
>> But pushing to the registry, or simply testing connectivity to endpoint or service IPs, shows "no route to host".
>>
>> Do you know how to test this properly?
>>
>> The network guy is telling me that he sees some denies from VLAN_B to the 172.30.0.0/16 network. He also assured me that traffic on port 4780 is allowed.
>>
>> I did some tests once again:
>>
>> I tried to deploy the example Ruby application, and it stops at pushing into the registry :(
>>
>> Also, when I deploy a simple pod (hello-openshift) and then expose the service, I cannot reach the website. I'm seeing the default route page with info that the application doesn't exist.
>>
>> Please see the logs below:
>>
>> Fetching gem metadata from https://rubygems.org/...
>> Fetching version metadata from https://rubygems.org/..
>> Warning: the running version of Bundler is older than the version that created the lockfile. We suggest you upgrade to the latest version of Bundler by running `gem install bundler`.
>> Installing puma 3.4.0 with native extensions
>> Installing rack 1.6.4
>> Using bundler 1.10.6
>> Bundle complete! 2 Gemfile dependencies, 3 gems now installed.
>> Gems in the groups development and test were not installed.
>> Bundled gems are installed into ./bundle.
>> ---> Cleaning up unused ruby gems ...
>> Warning: the running version of Bundler is older than the version that created the lockfile. We suggest you upgrade to the latest version of Bundler by running `gem install bundler`.
>> Pushing image 172.30.123.59:5000/testshared/d:latest ...
>> Registry server Address:
>> Registry server User Name: serviceaccount
>> Registry server Email: serviceacco...@example.org
>> Registry server Password: <>
>> error: build error: Failed to push image: Put http://172.30.123.59:5000/v1/repositories/testshared/d/: dial tcp 172.30.123.59:5000: getsockopt: no route to host
>>
>> 2017-06-21 7:31 GMT+02:00 Frederic Giloux:
>>> Hi Lukasz,
>>>
>>> this is not an unusual setup. You will need:
>>> - the SDN port: 4789 UDP (both directions: masters/nodes to nodes)
>>> - the kubelet port: 10250 TCP
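The iptables comparison Frédéric suggests can be done mechanically: dump the rules from a working node and the new node, then diff. A sketch in Bash; the two rule dumps below are fabricated samples standing in for `iptables -L -n` output (on real hosts you would capture them over ssh), and the chain name is only illustrative:

```shell
# Diff rule dumps from a working node (VLAN A) and the new node (VLAN B);
# a VXLAN (4789) accept rule missing on the new node shows up as a '<' line.
cat > /tmp/node_vlan_a.rules <<'EOF'
Chain OS_FIREWALL_ALLOW (1 references)
ACCEPT udp -- 0.0.0.0/0 0.0.0.0/0 udp dpt:4789
ACCEPT tcp -- 0.0.0.0/0 0.0.0.0/0 tcp dpt:10250
EOF
cat > /tmp/node_vlan_b.rules <<'EOF'
Chain OS_FIREWALL_ALLOW (1 references)
ACCEPT tcp -- 0.0.0.0/0 0.0.0.0/0 tcp dpt:10250
EOF

# On live hosts: ssh node-a 'iptables -L -n' > /tmp/node_vlan_a.rules, etc.
missing=$(diff /tmp/node_vlan_a.rules /tmp/node_vlan_b.rules | grep '^<' || true)
echo "rules present on node A but not node B:"
echo "$missing"
```

With the sample dumps, the diff surfaces the missing 4789/udp rule, which matches the SDN symptom discussed in this thread.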
Re: oc whoami bug?
If you're looking for a "am I authenticated" script element, generally I would recommend doing: $ oc get user/~ -o name --token "$(oc whoami -t)" On Wed, Jun 21, 2017 at 12:30 PM, Clayton Colemanwrote: > If your script looks like: > > $ oc get service foo --token "$(oc whoami -t)" > > and whoami -t fails you're going to get something you didn't expect as > output. > > > > On Wed, Jun 21, 2017 at 9:38 AM, Ben Parees wrote: > >> >> >> On Wed, Jun 21, 2017 at 9:31 AM, Clayton Coleman >> wrote: >> >>> The reason today it does not do that so you can use it in scripting >>> effectively. It's expected you're using that immediately in another >>> command which would display that error. >>> >> >> why would "oc whoami -t" returning an error in this case prevent using it >> in scripting effectively? it would just mean the script would fail one >> command earlier (before the bad token was used). Seems like that would be >> the more useful behavior in terms of understanding what failed in the >> script, too. >> >> >> >> >>> >>> On Jun 21, 2017, at 7:49 AM, Philippe Lafoucrière < >>> philippe.lafoucri...@tech-angels.com> wrote: >>> >>> Just to be clear, my point is: if `oc whoami` returns "error: You must >>> be logged in to the server (the server has asked for the client to provide >>> credentials)", `oc whoami -t` should return the same if the session has >>> timed out ;) >>> >>> ___ >>> users mailing list >>> users@lists.openshift.redhat.com >>> http://lists.openshift.redhat.com/openshiftmm/listinfo/users >>> >>> >>> ___ >>> users mailing list >>> users@lists.openshift.redhat.com >>> http://lists.openshift.redhat.com/openshiftmm/listinfo/users >>> >>> >> >> >> -- >> Ben Parees | OpenShift >> >> > ___ users mailing list users@lists.openshift.redhat.com http://lists.openshift.redhat.com/openshiftmm/listinfo/users
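For scripts, the check Clayton suggests can be wrapped so that a failed or empty token aborts early instead of being passed along to the next command. A small sketch, assuming Bash; `require_token` and the `OC` override are my own names for illustration, not part of `oc`:

```shell
# Fail fast if no usable token can be obtained, rather than handing an
# empty or bad token to the next command and getting a confusing denial.
OC=${OC:-oc}   # command to invoke; overridable so the sketch can be exercised offline

require_token() {
    local token
    token=$("$OC" whoami -t 2>/dev/null) || { echo "error: not logged in" >&2; return 1; }
    [ -n "$token" ] || { echo "error: empty token" >&2; return 1; }
    printf '%s\n' "$token"
}

# Intended usage:
#   token=$(require_token) || exit 1
#   oc get user/~ -o name --token "$token"
```

This keeps the scripting-friendly behaviour of `oc whoami -t` while surfacing the "session timed out" case at the point where the token is fetched, which is what the thread is asking for.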
Re: oc whoami bug?
If your script looks like: $ oc get service foo --token "$(oc whoami -t)" and whoami -t fails you're going to get something you didn't expect as output. On Wed, Jun 21, 2017 at 9:38 AM, Ben Pareeswrote: > > > On Wed, Jun 21, 2017 at 9:31 AM, Clayton Coleman > wrote: > >> The reason today it does not do that so you can use it in scripting >> effectively. It's expected you're using that immediately in another >> command which would display that error. >> > > why would "oc whoami -t" returning an error in this case prevent using it > in scripting effectively? it would just mean the script would fail one > command earlier (before the bad token was used). Seems like that would be > the more useful behavior in terms of understanding what failed in the > script, too. > > > > >> >> On Jun 21, 2017, at 7:49 AM, Philippe Lafoucrière < >> philippe.lafoucri...@tech-angels.com> wrote: >> >> Just to be clear, my point is: if `oc whoami` returns "error: You must >> be logged in to the server (the server has asked for the client to provide >> credentials)", `oc whoami -t` should return the same if the session has >> timed out ;) >> >> ___ >> users mailing list >> users@lists.openshift.redhat.com >> http://lists.openshift.redhat.com/openshiftmm/listinfo/users >> >> >> ___ >> users mailing list >> users@lists.openshift.redhat.com >> http://lists.openshift.redhat.com/openshiftmm/listinfo/users >> >> > > > -- > Ben Parees | OpenShift > > ___ users mailing list users@lists.openshift.redhat.com http://lists.openshift.redhat.com/openshiftmm/listinfo/users
Re: CephFS => ls: cannot open directory /cephfs/: Permission denied
2017-06-21 16:25 GMT+02:00 Stéphane Klein:
> I don't see where my permission error is.

Maybe it's this error: http://tracker.ceph.com/issues/13231 ?

I have tried this:

# setfattr -n security.selinux -v system_u:object_r:nfs_t:s0 /var/lib/origin/openshift.local.volumes/pods/4cc61dfa-5692-11e7-aef3-005056b1755a/volumes/kubernetes.io~cephfs/pv-ceph-prod-rbx-fs1
setfattr: /var/lib/origin/openshift.local.volumes/pods/4cc61dfa-5692-11e7-aef3-005056b1755a/volumes/kubernetes.io~cephfs/pv-ceph-prod-rbx-fs1: Operation not supported

I don't know if that is the correct syntax.

Kernel version is: 3.10.0-514.16.1.el7.x86_64

# ceph -v
ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43)

# atomic host status
State: idle
Deployments:
● centos-atomic-host:centos-atomic-host/7/x86_64/standard
       Version: 7.20170428 (2017-05-09 16:53:51)
        Commit: 67c8af37c5d05bb3b377ec1bd3c127f98664d6f7a78bf2089fcfb02784d12fbd
        OSName: centos-atomic-host
  GPGSignature: 1 signature
                Signature made Tue 09 May 2017 05:43:07 PM EDT using RSA key ID F17E745691BA8335
                Good signature from "CentOS Atomic SIG <secur...@centos.org>"

Best regards,
Stéphane
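If the CephFS client really does reject SELinux xattrs (as the `setfattr: Operation not supported` error and the tracker issue above suggest), one generic SELinux alternative is to label the whole mount with a `context=` mount option instead of per-file xattrs. A sketch in Bash that only checks whether the mount line carries such an option; the sample line is the one from earlier in this thread, and whether a given kernel CephFS client honours `-o context=...` is an assumption to verify, not something this thread confirms:

```shell
# Check whether a cephfs mount was made with an explicit SELinux context=
# option; without it, per-file labels cannot be set on a filesystem that
# rejects security xattrs.
mount_line='137.74.203.82:6789,172.29.20.31:6789,172.29.20.32:6789:/data1/ on /var/lib/origin/openshift.local.volumes/pods/4cc61dfa-5692-11e7-aef3-005056b1755a/volumes/kubernetes.io~cephfs/pv-ceph-prod-rbx-fs1 type ceph (rw,relatime,name=admin,acl)'

has_context_option() {
    case "$1" in
        *context=*) echo yes ;;
        *) echo no ;;
    esac
}

has_context_option "$mount_line"
```

On a live node the input would come from `mount | grep ceph`; here the sample line reports `no`, i.e. files keep whatever default label the kernel assigns, which is consistent with the "Permission denied" seen inside the container.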
Re: oc whoami bug?
Hi Philippe Lafoucrière,

on Wednesday, 21 June 2017 at 13:48 was written:

> Just to be clear, my point is: if `oc whoami` returns "error: You must be logged in to the server (the server has asked for the client to provide credentials)", `oc whoami -t` should return the same if the session has timed out ;)

+1, or some error code, e.g. -1 or something similar

--
Best Regards
Aleks
CephFS => ls: cannot open directory /cephfs/: Permission denied
Hi,

I have one CephFS cluster. This is my PV:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-ceph-prod-rbx-fs1
  labels:
    storage-type: ceph-fs
    ceph-cluster: ceph-prod-rbx
spec:
  accessModes:
  - ReadWriteMany
  capacity:
    storage: 100Mi
  cephfs:
    monitors:
    - 137.74.203.82:6789
    - 172.29.20.31:6789
    - 172.29.20.32:6789
    pool: rbd
    user: admin
    path: /data1/
    secretRef:
      name: ceph-secret
    readOnly: false
  persistentVolumeReclaimPolicy: Retain

After the container started, the CephFS volume is mounted successfully on the OpenShift node. On the OpenShift node host:

# mount | grep "ceph"
137.74.203.82:6789,172.29.20.31:6789,172.29.20.32:6789:/data1/ on /var/lib/origin/openshift.local.volumes/pods/0f4bb6ef-568b-11e7-aef3-005056b1755a/volumes/kubernetes.io~cephfs/pv-ceph-prod-rbx-fs1 type ceph (rw,relatime,name=admin,secret=,acl)

# ls /var/lib/origin/openshift.local.volumes/pods/0f4bb6ef-568b-11e7-aef3-005056b1755a/volumes/kubernetes.io~cephfs/pv-ceph-prod-rbx-fs1 -lha
total 0
drwxrwxrwx 1 root root  1 Jun 21 09:58 .
drwxr-x---. 3 root root 33 Jun 21 10:08 ..
drwxr-xr-x 1 root root  0 Jun 21 09:58 foo

Here, I can write to the CephFS volume.
In the container, I have this error:

$ oc rsh test-cephfs-3-v5ggn bash
root@test-cephfs-3-v5ggn:/# ls /cephfs/ -lha
ls: cannot open directory /cephfs/: Permission denied

This is the Docker mount information:

"Mounts": [
    {
        "Source": "/var/lib/origin/openshift.local.volumes/pods/0f4bb6ef-568b-11e7-aef3-005056b1755a/volumes/kubernetes.io~cephfs/pv-ceph-prod-rbx-fs1",
        "Destination": "/cephfs",
        "Mode": "",
        "RW": true,
        "Propagation": "rprivate"
    }
]

I have created this SCC:

apiVersion: v1
kind: List
metadata: {}
items:
- apiVersion: v1
  kind: SecurityContextConstraints
  metadata:
    name: test-cephfs
  priority: 1
  requiredDropCapabilities: null
  readOnlyRootFilesystem: false
  runAsUser:
    type: RunAsAny
  seLinuxContext:
    type: MustRunAs
  supplementalGroups:
    type: RunAsAny
  fsGroup:
    type: MustRunAs
  users:
  - system:serviceaccount:test-cephfs:default
  volumes:
  - cephFS
  - configMap
  - emptyDir
  - nfs
  - persistentVolumeClaim
  - rbd
  - secret
  allowHostDirVolumePlugin: false
  allowHostIPC: false
  allowHostNetwork: false
  allowHostPID: false
  allowHostPorts: false
  allowPrivilegedContainer: false
  allowedCapabilities: null

I don't see where my permission error is.

Best regards,
Stéphane

--
Stéphane Klein
blog: http://stephane-klein.info
cv: http://cv.stephane-klein.info
Twitter: http://twitter.com/klein_stephane
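One thing worth ruling out before suspecting the SCC: the `ls -lha` output above shows the mount root as `drwxrwxrwx`, i.e. world-writable, so plain POSIX permission bits are unlikely to be the blocker. A throwaway sketch in Bash (sample line copied from the message) making that check explicit:

```shell
# Parse the mode string of the mount root; if "others" already have rwx,
# the denial is probably enforced elsewhere (e.g. SELinux), not by mode bits.
line='drwxrwxrwx 1 root root 1 Jun 21 09:58 .'
mode=${line%% *}          # first field: drwxrwxrwx
others=${mode:7:3}        # permission triplet for "others"
if [ "$others" = "rwx" ]; then
    echo "world-writable: look beyond POSIX permission bits"
else
    echo "others have: $others"
fi
```

Since the mode bits grant everything, the next place to look is the SELinux label of the mounted files versus the label the container process runs with (`ls -Z` on the node, `ps -Z` in the container), which is where the follow-up message in this thread ends up.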
Re: oc whoami bug?
On Wed, Jun 21, 2017 at 9:31 AM, Clayton Colemanwrote: > The reason today it does not do that so you can use it in scripting > effectively. It's expected you're using that immediately in another > command which would display that error. > why would "oc whoami -t" returning an error in this case prevent using it in scripting effectively? it would just mean the script would fail one command earlier (before the bad token was used). Seems like that would be the more useful behavior in terms of understanding what failed in the script, too. > > On Jun 21, 2017, at 7:49 AM, Philippe Lafoucrière < > philippe.lafoucri...@tech-angels.com> wrote: > > Just to be clear, my point is: if `oc whoami` returns "error: You must be > logged in to the server (the server has asked for the client to provide > credentials)", `oc whoami -t` should return the same if the session has > timed out ;) > > ___ > users mailing list > users@lists.openshift.redhat.com > http://lists.openshift.redhat.com/openshiftmm/listinfo/users > > > ___ > users mailing list > users@lists.openshift.redhat.com > http://lists.openshift.redhat.com/openshiftmm/listinfo/users > > -- Ben Parees | OpenShift ___ users mailing list users@lists.openshift.redhat.com http://lists.openshift.redhat.com/openshiftmm/listinfo/users
Re: oc whoami bug?
The reason today it does not do that is so you can use it in scripting effectively. It's expected that you're using it immediately in another command, which would display that error.

On Jun 21, 2017, at 7:49 AM, Philippe Lafoucrière <philippe.lafoucri...@tech-angels.com> wrote:

Just to be clear, my point is: if `oc whoami` returns "error: You must be logged in to the server (the server has asked for the client to provide credentials)", `oc whoami -t` should return the same if the session has timed out ;)
Re: oc whoami bug?
Just to be clear, my point is: if `oc whoami` returns "error: You must be logged in to the server (the server has asked for the client to provide credentials)", `oc whoami -t` should return the same if the session has timed out ;) ___ users mailing list users@lists.openshift.redhat.com http://lists.openshift.redhat.com/openshiftmm/listinfo/users
Re: Openshift origin - nodes in vlans
Hi Lukasz,

if you don't have connectivity at the service level, it is likely that the iptables rules have not been configured on your new node. You can validate that with iptables -L -n. Compare the result on your new node and on one in the other VLAN. If this is confirmed, the master may not be able to connect to the kubelet on the new node (port TCP 10250, as per my previous email). Another thing that could have gone wrong is the population of the OVS table. In that case restarting the node would reinitialise it. Another point: the traffic between pods communicating through a service should go through the SDN, which means your network team should only see SDN packets between nodes at a firewall between VLANs, and not traffic to your service IP range. This resource should also be of help: https://docs.openshift.com/container-platform/3.5/admin_guide/sdn_troubleshooting.html#debugging-a-service

I hope this helps.

Regards,

Frédéric

On Wed, Jun 21, 2017 at 9:27 AM, Łukasz Strzelec wrote:
> Hello :)
>
> Thanks for the quick reply.
>
> I did; I mean, the mentioned port had been opened. All nodes are visible to each other, and oc get nodes shows the "ready" state.
>
> But pushing to the registry, or simply testing connectivity to endpoint or service IPs, shows "no route to host".
>
> Do you know how to test this properly?
>
> The network guy is telling me that he sees some denies from VLAN_B to the 172.30.0.0/16 network. He also assured me that traffic on port 4780 is allowed.
>
> I did some tests once again:
>
> I tried to deploy the example Ruby application, and it stops at pushing into the registry :(
>
> Also, when I deploy a simple pod (hello-openshift) and then expose the service, I cannot reach the website. I'm seeing the default route page with info that the application doesn't exist.
>
> Please see the logs below:
>
> Fetching gem metadata from https://rubygems.org/...
> Fetching version metadata from https://rubygems.org/..
> Warning: the running version of Bundler is older than the version that created the lockfile. We suggest you upgrade to the latest version of Bundler by running `gem install bundler`.
> Installing puma 3.4.0 with native extensions
> Installing rack 1.6.4
> Using bundler 1.10.6
> Bundle complete! 2 Gemfile dependencies, 3 gems now installed.
> Gems in the groups development and test were not installed.
> Bundled gems are installed into ./bundle.
> ---> Cleaning up unused ruby gems ...
> Warning: the running version of Bundler is older than the version that created the lockfile. We suggest you upgrade to the latest version of Bundler by running `gem install bundler`.
> Pushing image 172.30.123.59:5000/testshared/d:latest ...
> Registry server Address:
> Registry server User Name: serviceaccount
> Registry server Email: serviceacco...@example.org
> Registry server Password: <>
> error: build error: Failed to push image: Put http://172.30.123.59:5000/v1/repositories/testshared/d/: dial tcp 172.30.123.59:5000: getsockopt: no route to host
>
> 2017-06-21 7:31 GMT+02:00 Frederic Giloux:
>> Hi Lukasz,
>>
>> this is not an unusual setup. You will need:
>> - the SDN port: 4789 UDP (both directions: masters/nodes to nodes)
>> - the kubelet port: 10250 TCP (masters to nodes)
>> - the DNS port: 8053 TCP/UDP (nodes to masters)
>> If you can't reach VLAN_B pods from VLAN_A, the issue is probably with the SDN port. Mind that it is using UDP.
>>
>> Regards,
>>
>> Frédéric
>>
>> On Wed, Jun 21, 2017 at 4:13 AM, Łukasz Strzelec <lukasz.strze...@gmail.com> wrote:
>>> Hello,
>>>
>>> I have to install OSO with dedicated HW nodes for one of my customers.
>>>
>>> The current cluster is placed in a VLAN (for the sake of this question) called VLAN_A.
>>>
>>> The customer's nodes have to be placed in another VLAN: VLAN_B.
>>>
>>> Now the question: what ports and routes do I have to set up to get this to work?
>>> The assumption is that traffic between VLANs is filtered by default.
>>>
>>> Now, what I already did:
>>>
>>> I had opened the ports in accordance with the documentation, then scaled up the cluster (Ansible playbook).
>>>
>>> At first sight, everything was working fine. Nodes were ready. I can deploy a simple pod (e.g. hello-openshift), but I can't reach the service. During the S2I process, pushing into the registry ends with the information "no route to host". I've checked this out, and for nodes placed in VLAN_A (the same one as the registry and router) everything works fine. The problem is in the traffic between VLANs A <-> B. I can't reach any service IP of pods deployed on the newly added nodes. Thus, traffic between pods over the service subnet is not allowed. The question is: what should I open? The whole 172.30.0.0/16 between those two VLANs, or dedicated rules to/from the registry, router, metrics and so on?
>>>
>>> --
>>> Ł.S.
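The TCP ports from Frédéric's list can be spot-checked with a plain Bash probe (UDP 4789 carries the VXLAN traffic and cannot be probed this way; a packet capture is more useful there). A sketch with placeholder host names; only the port numbers come from the thread:

```shell
# Probe a TCP port using Bash's /dev/tcp pseudo-device.
check_tcp() {  # usage: check_tcp <host> <port>
    if timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null; then
        echo "tcp/$2 open on $1"
    else
        echo "tcp/$2 unreachable on $1"
    fi
}

# Intended usage, from a master towards a node in the other VLAN
# (NEW_NODE and MASTER are placeholders):
#   check_tcp "$NEW_NODE" 10250   # kubelet, masters -> nodes
# and from a node towards a master:
#   check_tcp "$MASTER" 8053      # DNS, nodes -> masters
# For UDP 4789 (SDN/VXLAN), capture instead:
#   tcpdump -ni any udp port 4789
```

A clean TCP result combined with no VXLAN packets in the capture would point at exactly the UDP-filtering failure mode this thread ends up diagnosing.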