falkzoll commented on PR #5442:
URL: https://github.com/apache/openwhisk/pull/5442#issuecomment-1778874198

   Hi @joni-jones, maybe you or someone else in this PR can help us understand an issue in the https://github.com/apache/openwhisk-runtime-nodejs runtime. Currently, the scheduled OpenWhisk nodejs runtime builds fail (https://github.com/apache/openwhisk-runtime-nodejs/actions). They fail with a timeout (after 30s) in a test case that checks the concurrent invocation capability of this runtime, i.e. the number of concurrent invokes/runs a single action container can handle (for nodejs:18 and nodejs:20).
   ```
   runtime.actionContainers.NodeJs18ConcurrentTests > action-nodejs-v18 should allow running activations concurrently FAILED
       java.util.concurrent.TimeoutException
           at org.apache.openwhisk.core.containerpool.AkkaContainerClient$.$anonfun$executeRequest$1(AkkaContainerClient.scala:252)
           at scala.util.Success.$anonfun$map$1(Try.scala:255)
           at scala.util.Success.map(Try.scala:213)
           at scala.concurrent.Future.$anonfun$map$1(Future.scala:292)
           at scala.concurrent.impl.Promise.liftedTree1$1(Promise.scala:33)
           at scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:33)
           at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64)
           at java.base/java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1426)
           at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290)
           at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1020)
           at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1656)
           at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1594)
           at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:183)
   ```
   
   The same happens for nodejs:20. Currently, only the nodejs runtimes support this kind of concurrency; the other runtimes only support `__OW_ALLOW_CONCURRENT=false` and can handle only one invoke/run per action container at a time.
   
   To run its tests, the GitHub action of the nodejs runtime clones the apache/openwhisk repository and uses its latest master as the base (https://github.com/apache/openwhisk-runtime-nodejs/blob/master/.github/workflows/ci.yaml).
   
   The failing test case sends 128 (https://github.com/apache/openwhisk-runtime-nodejs/blob/c60a6676375d85878c658412162004848c19f965/tests/src/test/scala/runtime/actionContainers/NodeJsConcurrentTests.scala#L32) parallel invoke/run requests to a single nodejs runtime action container. The Scala test uses the AkkaContainerClient to open these 128 parallel connections (https://github.com/apache/openwhisk-runtime-nodejs/blob/c60a6676375d85878c658412162004848c19f965/tests/src/test/scala/runtime/actionContainers/NodeJsConcurrentTests.scala#L24).
   The nodejs test action invoked inside the action container (https://github.com/apache/openwhisk-runtime-nodejs/blob/c60a6676375d85878c658412162004848c19f965/tests/src/test/scala/runtime/actionContainers/NodeJsConcurrentTests.scala#L41) is coded to wait for 128 incoming invokes before it starts to answer all of them (it uses global variables for this). This means the first run request to this action is held open and not answered until the 128th run request has reached the action code in this container. This way, the test can be sure that the runtime can handle this number of concurrent action invokes being open at the same time. In the GitHub action, this test usually takes far less than 5 seconds, while its timeout is 30s.
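To illustrate the synchronization the test relies on, here is a minimal sketch of a barrier-style Node.js action. It is a hypothetical reconstruction of the idea described above, not the actual test action linked from `NodeJsConcurrentTests.scala`; the names `EXPECTED`, `arrived` and `waiting` are made up for illustration. It assumes the runtime runs concurrent activations in one Node.js process (presumably with `__OW_ALLOW_CONCURRENT=true`), so module-level variables are shared between them.

```javascript
// Hypothetical sketch of a "barrier" action. NOT the actual test action from
// NodeJsConcurrentTests.scala, only an illustration of the same idea.
// Module-level (global) state is shared by all activations that run
// concurrently in the same Node.js process.
const EXPECTED = 128; // number of concurrent invokes to wait for
let arrived = 0;      // activations that have reached the action code so far
const waiting = [];   // resolve functions of the still-pending activations

function main(params) {
  return new Promise((resolve) => {
    arrived += 1;
    waiting.push(resolve);
    if (arrived === EXPECTED) {
      // The last expected activation has arrived: answer all of them at once.
      waiting.forEach((done, i) => done({ activation: i + 1, total: arrived }));
    }
    // Otherwise this activation stays open until the barrier is full.
  });
}
```

Every activation before the 128th stays pending, so if fewer than 128 requests ever reach the container, none of them is answered and the client can only run into its timeout.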
   
   Debugging it locally showed that, with the current implementation in this PR, the test no longer seems to reach the required 128 parallel connections. After a certain number of open connections is reached (far fewer than 128), no further connections seem to be opened, or they arrive very, very slowly. As a result, the pending invokes in the action container do not return a response in time and the test case fails after 30s.
   Reverting to the previous commit (0c27a650ab6073e131e5c74002465e93cf4d8621) resolves the issue, and the concurrency test completes successfully within the usual few seconds for nodejs:18 and nodejs:20.
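The following simulation sketches why a cap on open connections turns this test into a timeout rather than a slow pass. It assumes, purely as a hypothesis consistent with the observation above, that the client keeps at most `maxOpen` requests in flight; it does not model `AkkaContainerClient` or the actual changes in this PR. `makeBarrier`, `withTimeout` and `runTest` are hypothetical helpers.

```javascript
// Hypothetical simulation (assumption, not AkkaContainerClient): a client that
// keeps at most `maxOpen` requests open sends `total` requests to a barrier
// that only answers once `total` requests are pending at the same time.
function makeBarrier(expected) {
  const waiting = [];
  return () =>
    new Promise((resolve) => {
      waiting.push(resolve);
      // Answer every pending request only when `expected` of them are open.
      if (waiting.length === expected) waiting.forEach((done) => done("ok"));
    });
}

// Resolve with "timeout" if `promise` has not settled within `ms`.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((resolve) => {
    timer = setTimeout(() => resolve("timeout"), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Send `total` requests with at most `maxOpen` of them in flight at once.
async function runTest(total, maxOpen, timeoutMs) {
  const invoke = makeBarrier(total);
  let next = 0;
  const results = [];
  async function worker() {
    while (next < total) {
      next += 1;
      const result = await withTimeout(invoke(), timeoutMs);
      results.push(result);
      if (result === "timeout") return; // like the test: give up on a timeout
    }
  }
  const workers = Array.from({ length: Math.min(maxOpen, total) }, () => worker());
  await Promise.all(workers);
  return results.includes("timeout") ? "timeout" : "passed";
}
```

With `runTest(128, 128, 30000)` all requests are answered as soon as the 128th arrives and it resolves to `"passed"`. With any `maxOpen` below 128, e.g. `runTest(128, 16, 30000)`, the barrier never fills, every open request waits the full 30s, and it resolves to `"timeout"`, which mirrors the symptom in the failing CI runs.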
   
   Looking at the changes in this PR, it does not look like the nodejs runtime tests need to be adapted. Still, with your broader Akka background: do we need to modify something in the nodejs runtime tests to work with this latest apache/openwhisk core?
   Any hints are welcome :-).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
