Re: [onap-discuss] facing problem with portal app

2018-07-19 Thread Borislav Glozman
Hi,

The AAF problem is usually related to nfs (/dockerdata-nfs) not working between 
the nodes.
Please check that the nfs is working.
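A quick way to check this on each node (a minimal sketch - /dockerdata-nfs is the default OOM share path, adjust if yours differs):

```shell
# Check that the OOM NFS share is mounted and writable on this node
NFS_DIR="${NFS_DIR:-/dockerdata-nfs}"
if mount | grep -q "$NFS_DIR"; then echo "mounted"; else echo "NOT mounted"; fi
# Write a probe file; it should then be visible from every other node
probe="$NFS_DIR/nfs-probe-$(hostname)"
if touch "$probe" 2>/dev/null; then echo "writable"; rm -f "$probe"; else echo "NOT writable"; fi
```

Run the touch from one node and ls the file from another to confirm the share really is common to the cluster.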

Thanks,
Borislav Glozman
O:+972.9.776.1988
M:+972.52.2835726
Amdocs, a Platinum member of ONAP

From: Michael O'Brien
Sent: Thursday, July 19, 2018 5:49 PM
To: Vidhu Shekhar Pandey - ERS, HCL Tech ; Borislav 
Glozman ; Mike Elliott 
Cc: onap-discuss@lists.onap.org
Subject: RE: facing problem with portal app

Hi,
   For the aaf pod - this looks like one of the known-issue pods - we need a reference page for these, which teams like AAF can keep up to date (along with checking the status of the CI/CD reference servers).
   For your portal issue - I don't remember if it was your team or another we were working with - but only the Rancher reference implementation comes with a default LoadBalancer service. If you are using an alternative Kubernetes setup based on kubectl, make sure you set up your own native LoadBalancer, or switch to using Rancher.
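   For reference, a native LoadBalancer service for the portal could be sketched roughly as below - the name, port, and selector are assumptions, so take the real values from the portal chart; on a cluster with no cloud LB provider, type: NodePort is the usual fallback:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: portal-app        # assumed name - check the portal chart
  namespace: onap
spec:
  type: LoadBalancer      # use NodePort instead if no LB provider is available
  selector:
    app: portal-app       # assumed selector
  ports:
    - name: http
      port: 8989          # assumed port - check the chart's values
      targetPort: 8989
```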
   Thank you
/michael

From: Vidhu Shekhar Pandey - ERS, HCL Tech <vidhu.pan...@hcl.com>
Sent: Thursday, July 19, 2018 10:08 AM
To: Michael O'Brien <frank.obr...@amdocs.com>; Borislav Glozman <borislav.gloz...@amdocs.com>; Mike Elliott <mike.elli...@amdocs.com>
Cc: onap-discuss@lists.onap.org
Subject: RE: facing problem with portal app

Hi Mike/Borislav,

As discussed in yesterday's meeting I am sharing the configuration I have set up 
in my lab. I have one Rancher node and a cluster of 4 Kubernetes nodes for the 
Beijing release. The Rancher VM is 12 vCPU, 15GB RAM, 80GB disk. Each k8s VM node 
is 12 vCPUs, 40GB RAM, 160GB disk. I am setting this up in OpenStack 
following the steps at 
http://onap.readthedocs.io/en/latest/submodules/oom.git/docs/oom_setup_kubernetes_rancher.html.

I had shared this with Michael O'Brien earlier and got the setup verified as in 
the below mail. But even after trying 2-3 attempts some of the pods did not 
come up. These are intermittent and the failed pods vary each time. In recent 
run I had problem with:

onap-aaf-cm (CrashLoopBackOff)
onap-dbc-pg-0 (readiness probe failure)
onap-dbc-pg-1 (this pod used to come up earlier but not this time)
onap-dmaap-bus-controller (CrashLoopBackOff)
portal-app (remains in Init state every time due to a failed init container)

Just wanted to know if having 6 k8s nodes with 32 GB RAM each, as suggested by 
Borislav in recent posts, would improve the chances?
Michael had indicated that there are Docker image downloading problems which 
can lead to pod syncing issues. What Internet speed is recommended for pulling 
the images; could fluctuating bandwidth be a problem?

Thanks,
Vidhu

From: Michael O'Brien [mailto:frank.obr...@amdocs.com]
Sent: 05 July 2018 04:57
To: Vidhu Shekhar Pandey - ERS, HCL Tech <vidhu.pan...@hcl.com>
Cc: onap-discuss@lists.onap.org
Subject: RE: facing problem with portal app

Adding ONAP community for reference and input
Your setup looks fine - docker downloads will be about 40G per VM, the master will 
only run the rancher/kubernetes system, and ONAP will mostly come up in 96G but 
will expand past 120G if you provide a large enough cluster - you are running 
160G so you are OK there as well.

Deploying ONAP on a clean set of VMs will be problematic mostly because of the 
docker downloads issue - until we retrofit the preload script to run off the 
manifest.  The 2nd time you deploy it will come up faster for that day, as your 
4 docker caches (one per VM) are filled.  This is less of a problem on some cloud 
providers, and if you run behind your own local nexus3 proxy.

The NFS share is recommended but not required until pods start getting 
rescheduled across cluster VMs.
The system does have dependency tracking in the form of readiness checks - but 
the number of retries, their durations, and start times are unfortunately 
finite - to be fine-tuned as we go.  Therefore you are subject to a somewhat 
random start order for now, as we have not yet allocated cpu/ram resources where 
they are required low in the dependency tree - except for DaemonSets.  For the 
past 2 weeks portal has also been failing for the CD system when it runs every 
4 hours, along with clamp, appc, nbi, oof, policy, sdc (db init container 
related retries), sdnc, so, and intermittently dcae, aaf, and a couple of aai 
failures on kibana.onap.info:5601

I have not looked into all of these failures, a couple were docker image tag 
version flips that were required, a couple were removed images from nexus3 that 
were fixed.


Re: [onap-discuss] facing problem with portal app

2018-07-04 Thread Michael O'Brien
Adding ONAP community for reference and input
Your setup looks fine - docker downloads will be about 40G per VM, the master will 
only run the rancher/kubernetes system, and ONAP will mostly come up in 96G but 
will expand past 120G if you provide a large enough cluster - you are running 
160G so you are OK there as well.

Deploying ONAP on a clean set of VMs will be problematic mostly because of the 
docker downloads issue - until we retrofit the preload script to run off the 
manifest.  The 2nd time you deploy it will come up faster for that day, as your 
4 docker caches (one per VM) are filled.  This is less of a problem on some cloud 
providers, and if you run behind your own local nexus3 proxy.
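For the local-proxy option, a minimal sketch is to point each node's Docker daemon at a registry mirror; the nexus3 URL below is an assumption - substitute your own proxy:

```shell
# Generate a daemon.json that points Docker at a local registry mirror.
# Copy it to /etc/docker/daemon.json on each node, then restart docker.
MIRROR="${MIRROR:-http://nexus3.example.local:10001}"   # assumed URL - use your own proxy
printf '{\n  "registry-mirrors": ["%s"]\n}\n' "$MIRROR" > daemon.json
cat daemon.json
```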

The NFS share is recommended but not required until pods start getting 
rescheduled across cluster VMs.
The system does have dependency tracking in the form of readiness checks - but 
the number of retries, their durations, and start times are unfortunately 
finite - to be fine-tuned as we go.  Therefore you are subject to a somewhat 
random start order for now, as we have not yet allocated cpu/ram resources where 
they are required low in the dependency tree - except for DaemonSets.  For the 
past 2 weeks portal has also been failing for the CD system when it runs every 
4 hours, along with clamp, appc, nbi, oof, policy, sdc (db init container 
related retries), sdnc, so, and intermittently dcae, aaf, and a couple of aai 
failures on kibana.onap.info:5601

I have not looked into all of these failures, a couple were docker image tag 
version flips that were required, a couple were removed images from nexus3 that 
were fixed.

My highest healthcheck was 40/43 on June 20th on a clean cluster - where there 
were only issues with clamp, sdc and sdnc - most of them timing-related, not 
particular to these apps - they were just unlucky to be starved of resources.  
This was on a 4-node cluster (64 cores/256G RAM total), each node 
16 cores/64G RAM/120G SSD/20Gbps network, with an EFS/NFS share.

As you can see, failed containers do not necessarily match failed healthchecks.  
Health can still fail on 1/1 or 2/2 running containers if they are still 
initializing - which is good - or not fail when the container is not part of 
the healthcheck - which is sometimes by design because the container is an 
optional one.
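A minimal sketch of that kind of check - flag any pod whose ready count is short or whose status is not Running (the column layout assumes plain "kubectl get pods -n onap --no-headers" output):

```shell
# Print name and status of pods that are not fully up.
# Reads "NAME READY STATUS RESTARTS AGE" lines on stdin.
report_down() {
  awk '{ split($2, r, "/"); if (r[1] != r[2] || $3 != "Running") print $1, $3 }'
}
# Usage: kubectl get pods -n onap --no-headers | report_down
```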

http://jenkins.onap.info/job/oom-cd-master/3185/console
22:41:11 report on non-running containers
22:41:12 down-aai=1
22:41:15 down-sdc=1
22:41:18 down-clamp=1
22:41:20 pending containers=3
22:41:20 onap  onap-aai-champ-68ff644d85-mnf79   0/1  Running           0   2h
22:41:20 onap  onap-clamp-7d69d4cdd7-vlkkk       1/2  CrashLoopBackOff  31  2h
22:41:20 onap  onap-sdc-be-8447b4d544-7tn5w      1/2  Running           0   2h


23:02:46 Basic SDC Health Check                                        | FAIL |
23:02:46 500 != 200
23:02:46 ------------------------------------------------------------------
23:02:46 Basic SDNC Health Check                                       | FAIL |
23:02:46 Resolving variable '${resp.json()['output']['response-code']}' failed: JSONDecodeError: Expecting value: line 1 column 1 (char 0)
23:02:34 Basic CLAMP Health Check                                      | FAIL |
23:02:34 Test timeout 1 minute exceeded.



We will be having the next OOM meeting on Wednesday - we can discuss there as well.

Thank you
/michael



From: Vidhu Shekhar Pandey - ERS, HCL Tech 
Sent: Tuesday, July 3, 2018 2:27 PM
To: Michael O'Brien 
Subject: facing problem with portal app

Hi Michael,

I am trying to install ONAP OOM Beijing. I have set up one Rancher VM and a 
cluster of 4 Kubernetes nodes. The Rancher VM is 12 vCPU, 15GB RAM, 80GB disk. 
Each k8s node is 12 vCPUs, 40GB RAM, 160GB disk. I am setting this up in 
OpenStack following the steps at 
http://onap.readthedocs.io/en/latest/submodules/oom.git/docs/oom_setup_kubernetes_rancher.html.

I tried the cd.sh script earlier many times but each time most of the pods did 
not come up in a running state and were stuck. So I resorted to bringing the pods 
up one by one (in smaller groups) using "helm install local/onap -n onap 
--namespace onap -f values.yaml" and then adding more components using "helm 
upgrade onap local/onap -f values.yaml". This way I am able to get almost all 
pods running except a few. One is the portal app pod, which is stuck in Init 
state waiting forever. Is this due to some dependency on other pods? Is there a 
sequence I should follow while bringing the pods up? Is there any dependency 
diagram of components for the Beijing release?
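The incremental approach described above can be sketched as follows - the --set keys are assumptions (the real component names come from the top-level values.yaml), and the commands are printed rather than executed so they can be checked first:

```shell
# Assemble the helm (v2) commands for a staged bring-up: install a subset
# first, then enable more components with an in-place upgrade.
COMPONENTS_OFF="--set dcaegen2.enabled=false --set aaf.enabled=false"   # assumed keys
echo "helm install local/onap -n onap --namespace onap -f values.yaml $COMPONENTS_OFF"
echo "helm upgrade onap local/onap -f values.yaml --set dcaegen2.enabled=true"
```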

Earlier I also faced problems getting the policy nexus (sonatype) pod up, but 
after the recent bug fixes in the liveness and readiness time delays I have it 
working now.

Would really appreciate your suggestions.

Thanks,
Vidhu