2019-11-25 05:10:08 UTC - chetanm: Which ContainerFactory are you using? From the response code it appears to be some error case. Does the controller report any healthy invokers on its `/invokers` endpoint?
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1574658608431200?thread_ts=1574633042.430600&cid=C3TPCAQG1
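A minimal sketch of checking that endpoint, assuming the standard `owdev` release in the `openwhisk` namespace (pod name and port are assumptions, and depending on the deployment the endpoint may require the system credentials):
```
# port-forward to the controller pod and query its /invokers endpoint
kubectl -n openwhisk port-forward owdev-controller-0 8080:8080 &
curl -s http://localhost:8080/invokers
```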
----
2019-11-25 05:32:04 UTC - Ali Tariq: I am using the defaults; looking at values.yaml, the impl is `kubernetes`. I looked at the logs from the controller - it looks like the invoker is overloaded with requests (`[WebActionsApi] No invokers available [marker:controller_loadbalancer_error:20:2]`).
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1574659924431400?thread_ts=1574633042.430600&cid=C3TPCAQG1
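One hedged way to pull that log line yourself, assuming the standard `owdev` release and namespace names:
```
# search the controller logs for the load balancer error marker quoted above
kubectl -n openwhisk logs owdev-controller-0 | grep controller_loadbalancer_error
```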
----
2019-11-25 05:36:08 UTC - Ali Tariq: What I don't understand is, shouldn't it simply queue up the extra requests and continue servicing them, instead of sending a `down for maintenance` response? ... Plus, from my custom logging, I know it's not servicing any requests right now - but for some reason it's overloaded.
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1574660168431900?thread_ts=1574633042.430600&cid=C3TPCAQG1
----
2019-11-25 05:36:57 UTC - chetanm: It's not overloaded; rather, it looks like none of the invokers are found to be healthy. Need to check the invoker logs to see if it is able to send health pings.
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1574660217432100?thread_ts=1574633042.430600&cid=C3TPCAQG1
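A hedged sketch for that check, again assuming the `owdev` pod naming (the grep terms are only illustrative):
```
# look for startup errors and the periodic health pings the invoker publishes to Kafka
kubectl -n openwhisk logs owdev-invoker-0 | grep -iE "ping|kafka|error"
```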
----
2019-11-25 05:43:35 UTC - Ali Tariq: I don't see any new log updates in the invoker for any new request I send (the request does give back `down for maintenance` on the client side though).
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1574660615432300?thread_ts=1574633042.430600&cid=C3TPCAQG1
----
2019-11-25 05:48:08 UTC - chetanm: Yeah, it looks like it's currently tricky to debug this case and determine why the invokers are not being considered healthy.
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1574660888432800?thread_ts=1574633042.430600&cid=C3TPCAQG1
----
2019-11-25 05:49:07 UTC - chetanm: Check the controller-side logs containing sid_`invokerHealth`.
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1574660947433000?thread_ts=1574633042.430600&cid=C3TPCAQG1
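For example (a sketch, with the same assumed pod naming):
```
# filter the controller logs down to the invoker-health transaction id
kubectl -n openwhisk logs owdev-controller-0 | grep sid_invokerHealth
```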
----
2019-11-25 05:49:35 UTC - chetanm: That may give some clue. We need to have some better way to debug through this situation.
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1574660975433200?thread_ts=1574633042.430600&cid=C3TPCAQG1
----
2019-11-25 06:36:38 UTC - Ali Tariq: Okay, I kept checking the invoker logs until the invoker became available again ... it turns out the error is `java.lang.OutOfMemoryError: Java heap space`. Why would that be the case?
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1574663798433400?thread_ts=1574633042.430600&cid=C3TPCAQG1
----
2019-11-25 06:38:29 UTC - chetanm: How big are the responses from the actions?
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1574663909433800?thread_ts=1574633042.430600&cid=C3TPCAQG1
----
2019-11-25 06:39:04 UTC - Ali Tariq: The responses are just strings.
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1574663944434000?thread_ts=1574633042.430600&cid=C3TPCAQG1
----
2019-11-25 06:39:21 UTC - Ali Tariq: About 20 chars.
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1574663961434200?thread_ts=1574633042.430600&cid=C3TPCAQG1
----
2019-11-25 06:40:43 UTC - chetanm: Try increasing `invoker.jvmHeapMB`, which currently defaults to 512 MB.
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1574664043434400?thread_ts=1574633042.430600&cid=C3TPCAQG1
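A sketch of one way to apply that, assuming a Helm-managed `owdev` release deployed from the chart in this repo (chart path and release name are assumptions):
```
# bump the invoker JVM heap from the 512 MB default to 2 GB
helm upgrade owdev ./helm/openwhisk -n openwhisk --reuse-values \
  --set invoker.jvmHeapMB=2048
```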
----
2019-11-25 06:40:56 UTC - Ali Tariq: In the actions ... I connect to a remote server and send logging information, then simply return a finished string response.
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1574664056434600?thread_ts=1574633042.430600&cid=C3TPCAQG1
----
2019-11-25 06:41:40 UTC - Ali Tariq: Yeah ... but without knowing the issue, how much should I increase it to?
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1574664100434800?thread_ts=1574633042.430600&cid=C3TPCAQG1
----
2019-11-25 06:44:41 UTC - chetanm: Given that you increased the per-invoker concurrent container handling, the resource requirements would increase. The defaults are there more for basic development settings.
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1574664281435000?thread_ts=1574633042.430600&cid=C3TPCAQG1
----
2019-11-25 06:45:40 UTC - chetanm: Try setting it to 2048.
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1574664340435200?thread_ts=1574633042.430600&cid=C3TPCAQG1
----
2019-11-25 06:46:02 UTC - chetanm: That should give it more space to work with, and those are the defaults used for various test runs.
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1574664362435400?thread_ts=1574633042.430600&cid=C3TPCAQG1
----
2019-11-25 07:15:47 UTC - Ali Tariq: Changing jvmHeapMB to `2048` didn't solve the problem, although I did not see any heap exceptions in the invoker logs this time. The attached snippet shows the Unhealthy transitions of the `invoker` from the controller's logs (sid_invokerHealth). It just states the state transitions; how can I find details on the cause of these transitions?
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1574666147435800?thread_ts=1574633042.430600&cid=C3TPCAQG1
----
2019-11-25 07:16:37 UTC - chetanm: Are all the invoker pods up?
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1574666197436300?thread_ts=1574633042.430600&cid=C3TPCAQG1
----
2019-11-25 07:18:46 UTC - Ali Tariq: They are up when I send a new burst of requests - I can see hundreds of new invokers running.
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1574666326436500?thread_ts=1574633042.430600&cid=C3TPCAQG1
----
2019-11-25 07:20:24 UTC - Ali Tariq: Not right now, because the invoker is no longer down - it only happens after I send in a new burst; it will service some chunk of those requests and then go down. After some time (5-10 minutes), it's up again.
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1574666424436700?thread_ts=1574633042.430600&cid=C3TPCAQG1
----
2019-11-25 07:22:57 UTC - Ali Tariq: I just sent a burst of 800 requests; I can see 924 invoker pods in the deployment. And the invoker is down again.
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1574666577436900?thread_ts=1574633042.430600&cid=C3TPCAQG1
----
2019-11-25 07:24:46 UTC - chetanm: That many invoker pods seems odd. Those should be action pods. Invokers are by default configured to be 1 for `KubernetesContainerFactory` mode: <https://github.com/apache/openwhisk-deploy-kube/blob/master/helm/openwhisk/values.yaml#L275>
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1574666686437100?thread_ts=1574633042.430600&cid=C3TPCAQG1
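A quick hedged way to see the difference, assuming the `owdev` naming: the single real invoker is `owdev-invoker-0`, while the many other matching pods are the per-action containers the `KubernetesContainerFactory` spins up.
```
# list the invoker pod alongside the action pods it created
kubectl -n openwhisk get pods | grep invoker
```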
----
2019-11-25 07:28:30 UTC - Ali Tariq: I believe they are action pods like you said (some are shown in the snippet). This is the main invoker pod (not shown in the snippet): `owdev-invoker-0   1/1   Running   0   38m`.
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1574666910437300?thread_ts=1574633042.430600&cid=C3TPCAQG1
----
2019-11-25 12:59:39 UTC - volo: Hey guys, is OpenWhisk installable on AWS ECS and Docker Swarm?
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1574686779438200
----
2019-11-25 14:44:48 UTC - volo: One more question - is there some management console for OpenWhisk?
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1574693088438700
----
2019-11-25 16:07:25 UTC - Rodric Rabbah: There are dashboards you can look at for the ops metrics. For example: <https://user-images.githubusercontent.com/736614/69112434-0dd79c80-0a35-11ea-9749-761dedc95877.png>
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1574698045439000
----
2019-11-25 16:07:33 UTC - Rodric Rabbah: Can you clarify what you mean by management console?
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1574698053439500
----
2019-11-25 18:18:29 UTC - Dave Grove: Currently the nginx certificate is hard-coded in templates/nginx-secret.yaml as the value associated with the `tls.crt` key. You are right that the current value is expired (it actually expired on Oct 1, 2019). I've been meaning to fix this for a while (I opened the issue on Oct 1, 2018, the last time I had to regenerate an expired cert). I did make some progress on this last week and hope to be able to finish it off relatively soon. <https://github.com/apache/openwhisk-deploy-kube/issues/305>
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1574705909439700?thread_ts=1574608357.430300&cid=C3TPCAQG1
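In the meantime, one hedged way to confirm the expiry of whatever certificate is currently deployed (the secret name assumes the standard `owdev` release and may differ):
```
# decode the tls.crt from the nginx secret and print its expiry date
kubectl -n openwhisk get secret owdev-nginx -o jsonpath='{.data.tls\.crt}' \
  | base64 -d | openssl x509 -noout -enddate
```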
----
2019-11-25 18:26:37 UTC - Dave Grove: A couple of things. You probably need to increase the number of invokers by changing line 275 in values.yaml, as Chetan suggested. Pumping the log processing and input/output of 1000 concurrent actions through a single invoker is probably too much for it to handle.
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1574706397439900?thread_ts=1574633042.430600&cid=C3TPCAQG1
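A sketch of what that line controls, expressed as a Helm override (the exact key path may differ between chart versions, so check values.yaml around line 275 first):
```
# run several invokers instead of the single default one
helm upgrade owdev ./helm/openwhisk -n openwhisk --reuse-values \
  --set invoker.containerFactory.kubernetes.replicaCount=3
```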
----
2019-11-25 18:30:53 UTC - Dave Grove: Also, setting `containerpool.userMemory` to a value of 281600m implies you have about 55GB of RAM available on each worker node to use for user actions. If you don't actually have that memory (or at least close to that much), you are going to get into all sorts of resource-related problems. The invoker will try to ask Kubernetes to create more containers, and Kubernetes will either refuse (because the resources aren't there) or create them anyway, in which case your worker node will thrash or start randomly OOM-killing containers because it doesn't have the resources to actually run them.
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1574706653440100?thread_ts=1574633042.430600&cid=C3TPCAQG1
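A rough back-of-the-envelope check of what that setting implies, assuming the default 256 MB memory limit per action container:
```
# userMemory / per-action memory = concurrent containers the pool will schedule
echo $(( 281600 / 256 ))   # 1100 containers with userMemory=281600m
echo $(( 16384 / 256 ))    # 64 containers if userMemory were sized to a 16 GB budget
```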
----
