Hi Matthew,

Our Helm support is a recent addition, and came from another external contributor. See the Pull Request at https://github.com/Metaswitch/clearwater-docker/pull/85 for the details :)

As it stands at the moment, the chart is good enough for deploying and re-creating a full standard deployment through Helm, but I don't believe it handles as many of the complexities of upgrading a Clearwater deployment as it potentially could.
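(As an illustration of why the chart itself can stay so small: most of the content lives in the chart's templates, and values.yaml only needs to carry the knobs those templates reference. The fragment below is a hypothetical sketch, not the actual file from the PR; it just shows how the image.path and image.tag values quoted later in this thread would typically be consumed.)
```
# Hypothetical templates/bono-depl.yaml fragment (a sketch, not the real file
# from clearwater-docker PR #85), showing how a Helm template would consume
# the image.path and image.tag entries defined in values.yaml.
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: bono
spec:
  replicas: 1
  template:
    metadata:
      labels:
        service: bono
    spec:
      containers:
      - name: bono
        # Rendered at install time, e.g. to "myregistry/bono:latest".
        image: "{{ .Values.image.path }}/bono:{{ .Values.image.tag }}"
```
Rendering can be checked without a cluster via `helm template` or `helm install --dry-run --debug`, which is usually the quickest way to see what such a minimal chart actually expands to.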
We haven't yet done any significant work in setting up Helm charts, or in integrating with them in a more detailed manner, so if that's something you're interested in as well, we'd love to work with you to get some more enhancements in, especially if you have other expert contacts who know more in this area.

(I'm removing some of the thread in the email below, to keep us below the list limits. The online archives will keep all the info though.)

Cheers,
Adam

From: Davis, Matthew [mailto:matthew.davi...@team.telstra.com]
Sent: 25 May 2018 08:30
To: Adam Lindley <adam.lind...@metaswitch.com>; clearwater@lists.projectclearwater.org
Subject: RE: [Project Clearwater] Issues with clearwater-docker homestead and homestead-prov under Kubernetes

Hi Adam,

Thanks for that. I haven't had a chance to try your latest suggestion or do more work on AKS yet.

I just have a quick question. Should the Helm charts be mostly empty? I spoke to someone at Microsoft and he was surprised at how short Chart.yaml is.

Chart.yaml:
```
apiVersion: v1
description: A Helm chart for Clearwater
name: clearwater
version: 0.1.0
```

values.yaml:
```
# Default values for clearwater.
# This is a YAML-formatted file.
# Declare variables to be passed into your templates.
image:
  path: {{IMAGE_PATH}}
  tag: {{IMAGE_TAG}}
```

Thanks,
Matthew Davis
Telstra Graduate Engineer
CTO | Cloud SDN NFV

From: Adam Lindley [mailto:adam.lind...@metaswitch.com]
Sent: Friday, 25 May 2018 3:05 AM
To: Davis, Matthew <matthew.davi...@team.telstra.com>; clearwater@lists.projectclearwater.org
Cc: Richard Whitehouse (projectclearwater.org) <richard.whiteho...@projectclearwater.org>; kie...@aptira.com
Subject: RE: [Project Clearwater] Issues with clearwater-docker homestead and homestead-prov under Kubernetes

Hey Matthew,

Sorry, I think I was a bit unclear, and I've also noticed I missed out one step in my previous message. I'll try to clarify a bit here.

> you're having issues connecting to Bono because the service/pod is not exposed

By this, I meant that your Bono service is not exposed outside the Kubernetes cluster. Other pods would have been able to reach it just fine, but your test setup was trying to reach into it from outside the host, and there was no configuration in place mapping it through to allow that to happen (e.g. a nodePort or a LoadBalancer-type configuration).

The step I missed out was that we also want to change the Bono service port on the inside of the pod, to match the one we set up on the host. This is because Bono will record-route itself using the PUBLIC_IP and the pcscf port, so if a SIP client tried to respond using this route, it would by default be sent back to port 5060. Because we can't open that up as a nodePort on the Kubernetes host, this would cause issues in the SIP flows.

As for why we are specifically interested in port 5060: this is the external P-CSCF port, i.e. the interface we want to send all our mainline SIP flows through. The others are used for WebRTC and restund, neither of which we are particularly interested in at the moment, and neither of which is critical for getting calls through.

So, I think we just want to make a couple of tweaks to the bono-svc.yaml file; I'm going to copy the bono yaml files in at the bottom. Put simply, we want the values of `port` and `nodePort` under the `- name: "5060"` section to match.
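Concretely (this is just the relevant fragment of the full bono-svc.yaml copied below), the 5060 stanza ends up with the two values equal:
```
  ports:
  - name: "5060"
    # The service port inside the cluster and the nodePort on the Kubernetes
    # host are kept identical, so the port that Bono record-routes (built from
    # PUBLIC_IP and the pcscf setting) is the same one reachable from outside.
    port: 32060
    nodePort: 32060
```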
With the configuration as in my yaml files, and changing the `--pcscf` option in the bono pod, I see my live tests passing under the following command:

`rake test[default.svc.cw-k8s.test] PROXY=<k8s-host-IP> PROXY_PORT=32060 ELLIS=<k8s-host-IP>:30080`

Hopefully you see the same. From the Bono logs you've got below, I think the issue is simply that the above misconfiguration meant traffic couldn't reach the bono service correctly. I think this should be the last step in this set of hurdles; hopefully you'll see tests and calls working :)

Unfortunately, I don't have much experience with weave networking, so I can't give much guidance on how to open the pods up more to the wider network to make this of more use. And if in your testing you are seeing no packets hit any of the pods, that's going to be the first thing we need to debug, as we've probably missed a different piece of network configuration somewhere.

On the other thread, have you made any further progress with the Azure Kubernetes Service setup? I would still be very interested to see that up and working too.

Hope this helps.

Cheers,
Adam

bono-svc.yaml:
```
apiVersion: v1
kind: Service
metadata:
  name: bono
spec:
  type: NodePort
  ports:
  - name: "3478"
    port: 3478
  - name: "5060"
    port: 32060
    nodePort: 32060
  - name: "5062"
    port: 5062
  selector:
    service: bono
```

bono-depl.yaml:
```
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: bono
spec:
  replicas: 1
  selector:
    matchLabels:
      service: bono
  template:
    metadata:
      labels:
        service: bono
        snmp: enabled
    spec:
      containers:
      - image:
        imagePullPolicy: Always
        name: bono
        ports:
        - containerPort: 22
        - containerPort: 3478
        - containerPort: 5060
        - containerPort: 5062
        - containerPort: 5060
          protocol: "UDP"
        - containerPort: 5062
          protocol: "UDP"
        envFrom:
        - configMapRef:
            name: env-vars
        env:
        - name: MY_POD_IP
          valueFrom:
            fieldRef:
              fieldPath: status.podIP
        - name: PUBLIC_IP
          value: 10.230.16.1
        livenessProbe:
          exec:
            command: ["/bin/bash", "/usr/share/kubernetes/liveness.sh", "3478 5062"]
          initialDelaySeconds: 30
        readinessProbe:
          exec:
            command: ["/bin/bash", "/usr/share/kubernetes/liveness.sh", "3478 5062"]
        volumeMounts:
        - name: bonologs
          mountPath: /var/log/bono
      - image: busybox
        name: tailer
        command: [ "tail", "-F", "/var/log/bono/bono_current.txt" ]
        volumeMounts:
        - name: bonologs
          mountPath: /var/log/bono
      volumes:
      - name: bonologs
        emptyDir: {}
      restartPolicy: Always
```

From: Davis, Matthew [mailto:matthew.davi...@team.telstra.com]
Sent: 24 May 2018 06:20
To: Adam Lindley <adam.lind...@metaswitch.com>; clearwater@lists.projectclearwater.org
Cc: Richard Whitehouse (projectclearwater.org) <richard.whiteho...@projectclearwater.org>; kie...@aptira.com
Subject: RE: [Project Clearwater] Issues with clearwater-docker homestead and homestead-prov under Kubernetes

Hi Adam,

Thanks for that.

> you're having issues connecting to Bono because the service/pod is not exposed

What do you mean by that? I've run `kubectl apply -f bono-svc.yaml`. The service is exposed, isn't it?

$ kubectl get services
NAME   TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)                      AGE
bono   ClusterIP   None         <none>        3478/TCP,5060/TCP,5062/TCP   1d

or after the pcscf changes:

NAME   TYPE       CLUSTER-IP       EXTERNAL-IP   PORT(S)                                        AGE
bono   NodePort   10.108.162.246   <none>        3478:30078/TCP,5060:30060/TCP,5062:30062/TCP   41m

I've cc'ed in my colleague (Kieran) who set up the cluster for me. He said it's using Weave Net.
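(For context on the two listings above: `ClusterIP None` is what a headless service looks like, reachable only from other pods inside the cluster, while the NodePort variant additionally maps each port onto the Kubernetes host. The sketch below is an assumption about the manifests involved rather than a copy from the repo; it is only meant to illustrate the difference being discussed.)
```
# Hypothetical before/after of the bono service spec (an assumption, not taken
# verbatim from clearwater-docker), matching the two listings above.
#
# Headless service: no cluster IP and no host port, so it is only reachable
# from inside the cluster. "kubectl get services" shows "ClusterIP None".
spec:
  clusterIP: None
  ports:
  - name: "5060"
    port: 5060
---
# NodePort service: each port is also opened on the Kubernetes host
# (5060 -> 30060), shown as "5060:30060/TCP" in the listing.
spec:
  type: NodePort
  ports:
  - name: "5060"
    port: 5060
    nodePort: 30060
```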
Contrary to what I thought, nothing will be externally routable. So that's an issue.

After adding the nodePorts I can view Ellis in a browser (http://10.3.1.76:30080/login.html), and `nc 10.3.1.76 30080 -v` shows that I can connect to the Kubernetes cluster on port 30080. `nc 10.3.1.76 30060 -v` says connection refused; when that happens I see a packet in tcpdump on the bono pod. I see nothing in tcpdump on bono on the relevant ports when I run the rake tests.

I'm curious about the ports for Bono. You keep mentioning port 5060. Is that the only port bono uses? What about 3478 and 5062?

I tried your suggestion for the pcscf thing. It didn't work. The stdio for the rake test is below. How can I figure out whether the 403 error is for ellis or bono? The last line of the rake test output says: "Error logs, including Call-IDs of failed calls, are in the 'logfiles' directory", but the logfiles directory is empty. Where can I find more verbose logs?

I ran tcpdump inside the bono and ellis pods during the test. If I filter by source IP address I see nothing, so I have to filter by port. The tcpdump command I'm using in bono is `tcpdump -a -vnni any port 5060 or port 5062 or port 3478 or port 30078 or port 30060 or port 30062`. Literally all ports are now open on the firewall.

Here is my bono-svc.yaml file:
```
apiVersion: v1
kind: Service
metadata:
  name: bono
spec:
  type: NodePort
  ports:
  - name: "3478"
    port: 3478
    nodePort: 30078
  - name: "5060"
    port: 5060
    nodePort: 30060
  - name: "5062"
    port: 5062
    nodePort: 30062
  selector:
    service: bono
```

Here is my bono-depl.yaml file:
```
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: bono
spec:
  replicas: 1
  selector:
    matchLabels:
      service: bono
  template:
    metadata:
      labels:
        service: bono
        snmp: enabled
    spec:
      containers:
      - image: "mlda065/bono:latest"
        imagePullPolicy: Always
        name: bono
        ports:
        - containerPort: 22
        - containerPort: 3478
        - containerPort: 5060
        - containerPort: 5062
        - containerPort: 5060
          protocol: "UDP"
        - containerPort: 5062
          protocol: "UDP"
        envFrom:
        - configMapRef:
            name: env-vars
        env:
        - name: MY_POD_IP
          valueFrom:
            fieldRef:
              fieldPath: status.podIP
        - name: PUBLIC_IP
          value: 10.3.1.76 #:6443
        volumeMounts:
        - name: bonologs
          mountPath: /var/log/bono
      - image: busybox
        name: tailer
        command: [ "tail", "-F", "/var/log/bono/bono_current.txt" ]
        volumeMounts:
        - name: bonologs
          mountPath: /var/log/bono
      volumes:
      - name: bonologs
        emptyDir: {}
      imagePullSecrets:
      - name: myregistrykey
      restartPolicy: Always
```

Here is my env-vars:
```
$ kubectl describe configmap env-vars
Name:         env-vars
Namespace:    default
Labels:       <none>
Annotations:  <none>

Data
====
ZONE:
----
default.svc.cluster.local
Events:  <none>
```

Here are my services once deployed:
```
$ kubectl get service
NAME             TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                                        AGE
astaire          ClusterIP   None             <none>        11311/TCP                                      22m
bono             NodePort    10.108.162.246   <none>        3478:30078/TCP,5060:30060/TCP,5062:30062/TCP   22m
cassandra        ClusterIP   None             <none>        7001/TCP,7000/TCP,9042/TCP,9160/TCP            22m
chronos          ClusterIP   None             <none>        7253/TCP                                       22m
ellis            NodePort    10.97.76.168     <none>        80:30080/TCP                                   22m
etcd             ClusterIP   None             <none>        2379/TCP,2380/TCP,4001/TCP                     22m
homer            ClusterIP   None             <none>        7888/TCP                                       22m
homestead        ClusterIP   None             <none>        8888/TCP                                       22m
homestead-prov   ClusterIP   None             <none>        8889/TCP                                       22m
kubernetes       ClusterIP   10.96.0.1        <none>        443/TCP                                        2d
ralf             ClusterIP   None             <none>        10888/TCP                                      22m
sprout           ClusterIP   None             <none>        5052/TCP,5054/TCP                              22m
```
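(For reference, the env-vars ConfigMap shown via `kubectl describe` above would correspond to a manifest roughly like the following; this is a sketch reconstructed from that output, not a file quoted in the thread. The bono deployment pulls these keys into the container environment through its `envFrom`/`configMapRef` reference.)
```
# Hypothetical manifest equivalent to the "kubectl describe configmap env-vars"
# output above (only the ZONE key is taken from that output).
apiVersion: v1
kind: ConfigMap
metadata:
  name: env-vars
  namespace: default
data:
  ZONE: default.svc.cluster.local
```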
When I restarted the bono service there were a couple of warnings:
```
Defaulting container name to bono.
Use 'kubectl describe pod/bono-6dfc579b5-bdgzj -n default' to see all of the containers in this pod.
 * Restarting Bono SIP Edge Proxy bono
/etc/init.d/bono: line 63: ulimit: open files: cannot modify limit: Operation not permitted
/etc/init.d/bono: line 64: ulimit: open files: cannot modify limit: Invalid argument
23-05-2018 06:21:52.570 UTC [7f2b793ec7c0] Status utils.cpp:651: Switching to daemon mode
   ...done.
```

Here's the result of the test:
```
$ rake test[default.svc.cw-k8s.test] PROXY=10.3.1.76 PROXY_PORT=30060 SIGNUP_CODE='secret' ELLIS=10.3.1.76:30080 TESTS="Basic call - mainline"
Basic Call - Mainline (TCP) - Failed
  RestClient::Forbidden thrown:
   - 403 Forbidden
   - /home/ubuntu/.rvm/gems/ruby-1.9.3-p551/gems/rest-client-1.8.0/lib/restclient/abstract_response.rb:74:in `return!'
   - /home/ubuntu/.rvm/gems/ruby-1.9.3-p551/gems/rest-client-1.8.0/lib/restclient/request.rb:495:in `process_result'
   - /home/ubuntu/.rvm/gems/ruby-1.9.3-p551/gems/rest-client-1.8.0/lib/restclient/request.rb:421:in `block in transmit'
   - /usr/lib/ruby/2.3.0/net/http.rb:853:in `start'
   - /home/ubuntu/.rvm/gems/ruby-1.9.3-p551/gems/rest-client-1.8.0/lib/restclient/request.rb:413:in `transmit'
   - /home/ubuntu/.rvm/gems/ruby-1.9.3-p551/gems/rest-client-1.8.0/lib/restclient/request.rb:176:in `execute'
   - /home/ubuntu/.rvm/gems/ruby-1.9.3-p551/gems/rest-client-1.8.0/lib/restclient/request.rb:41:in `execute'
   - /home/ubuntu/.rvm/gems/ruby-1.9.3-p551/gems/rest-client-1.8.0/lib/restclient.rb:69:in `post'
```

Here is /var/log/bono/bono_current.txt in the bono pod:
```
24-05-2018 05:17:22.847 UTC [7f2b6b7fe700] Status alarm.cpp:244: Reraising all alarms with a known state
24-05-2018 05:17:22.847 UTC [7f2b6b7fe700] Status alarm.cpp:37: sprout issued 1012.3 alarm
24-05-2018 05:17:22.847 UTC [7f2b6b7fe700] Status alarm.cpp:37: sprout issued 1013.3 alarm
24-05-2018 05:17:33.337 UTC [7f2b4b7ee700] Status sip_connection_pool.cpp:428: Recycle TCP connection slot 11
24-05-2018 05:17:52.847 UTC [7f2b6b7fe700] Status alarm.cpp:244: Reraising all alarms with a known state
24-05-2018 05:17:52.847 UTC [7f2b6b7fe700] Status alarm.cpp:37: sprout issued 1012.3 alarm
24-05-2018 05:17:52.847 UTC [7f2b6b7fe700] Status alarm.cpp:37: sprout issued 1013.3 alarm
24-05-2018 05:18:21.346 UTC [7f2b4b7ee700] Status sip_connection_pool.cpp:428: Recycle TCP connection slot 36
24-05-2018 05:18:22.847 UTC [7f2b6b7fe700] Status alarm.cpp:244: Reraising all alarms with a known state
24-05-2018 05:18:22.847 UTC [7f2b6b7fe700] Status alarm.cpp:37: sprout issued 1012.3 alarm
24-05-2018 05:18:22.847 UTC [7f2b6b7fe700] Status alarm.cpp:37: sprout issued 1013.3 alarm
24-05-2018 05:18:52.847 UTC [7f2b6b7fe700] Status alarm.cpp:244: Reraising all alarms with a known state
24-05-2018 05:18:52.847 UTC [7f2b6b7fe700] Status alarm.cpp:37: sprout issued 1012.3 alarm
24-05-2018 05:18:52.847 UTC [7f2b6b7fe700] Status alarm.cpp:37: sprout issued 1013.3 alarm
24-05-2018 05:18:57.350 UTC [7f2b4b7ee700] Status sip_connection_pool.cpp:428: Recycle TCP connection slot 35
```

Thanks,
Matthew Davis
Telstra Graduate Engineer
CTO | Cloud SDN NFV
_______________________________________________
Clearwater mailing list
Clearwater@lists.projectclearwater.org
http://lists.projectclearwater.org/mailman/listinfo/clearwater_lists.projectclearwater.org