gkousiouris opened a new issue, #5359: URL: https://github.com/apache/openwhisk/issues/5359
<!-- We use the issue tracker for bugs and feature requests. For general questions and discussion please use https://openwhisk.apache.org/slack.html or https://openwhisk.apache.org/community.html instead. Do NOT share passwords, credentials or other confidential information. Before creating a new issue, please check if there is one already open that fits the defect you are reporting. If you open an issue and realize later it is a duplicate of a pre-existing open issue, please close yours and add a comment to the other. Issues can be created for either defects or enhancement requests. If you are a committer than please add the labels "bug" or "feature". If you are not a committer please make clear in the comments which one it is, so that committers can add these labels later. If you are reporting a defect, please edit the issue description to include the information shown below. If you are reporting an enhancement request, please include information on what you are trying to achieve and why that enhancement would help you. For more information about reporting issues, see https://github.com/apache/openwhisk/blob/master/CONTRIBUTING.md#raising-issues Use the commands below to provide key information from your environment: You do not have to include this information if this is a feature request. --> Hi, we have come across a reporting issue/bug(?). So in the case of activations of a working custom docker image, occasionally there are some errors (reported as developer errors) especially with the indication of failure during initialization (no available POST /init endpoint). This happens mainly in cold starts. ## Environment details: Openwhisk deployment on top of kubernetes and Openshift, running on AWS VMs ## Steps to reproduce the issue: 1. Create a docker action and register it (in our case a docker action based on the Node-RED server) 2. Test it so that it works ok (with proper INIT, RUN endpoints etc.) 3. Launch activations with the same input, normally finalizing with success, however occasionally some errors (developer error type) occur, especially in cold starts that happen after some inactivity time ## Provide the expected results and outputs: The action should finalize with success, example: ``` { "namespace": "guest", "name": "Inference_3fff08e8-3617-4eb4-b118-4229f30514e3.json", "version": "0.0.6", "subject": "guest", "activationId": "be308f36967d4613b08f36967df6138b", "start": 1658257753690, "end": 1658257762621, "duration": 8931, "statusCode": 0, "response": { "status": "success", "statusCode": 0, "success": true, "result": { }, "logs": [], "annotations": [ { "key": "path", "value": "guest/Inference_3fff08e8-3617-4eb4-b118-4229f30514e3.json" }, { "key": "waitTime", "value": 6 }, { "key": "kind", "value": "blackbox" }, { "key": "timeout", "value": false }, { "key": "limits", "value": { "concurrency": 1, "logs": 10, "memory": 512, "timeout": 300000 } } ], "publish": false } ``` ## Provide the actual results and outputs: The server has started ok inside the action container as seen in the logs. Given that it happens in cold starts, maybe there is some form of timeout that is violated for fetching the image? Does Openwhisk try to /init before the server inside the container is up? If yes then this is masked behind a misleading error message (developer error) or potentially there should be some retry from Openwhisk side? ``` { "namespace": "guest", "name": "Inference_3fff08e8-3617-4eb4-b118-4229f30514e3.json", "version": "0.0.6", "subject": "guest", "activationId": "1040685ebb1e444980685ebb1e64495a", "start": 1658257763885, "end": 1658257771584, "duration": 7699, "statusCode": 0, "response": { "status": "action developer error", "statusCode": 0, "success": false, "result": { "error": "The action failed during initialization: <!DOCTYPE html>\n<html lang=\"en\">\n<head>\n<meta charset=\"utf-8\">\n<title>Error</title>\n</head>\n<body>\n<pre>Cannot POST /init</pre>\n</body>\n</html>\n" } }, "logs": [ "2022-07-19T19:09:23.899349481Z stdout: ", "2022-07-19T19:09:23.899349481Z stdout: > node-red-docker@2.2.2 start /usr/src/node-red", "2022-07-19T19:09:23.899349481Z stdout: > node $NODE_OPTIONS node_modules/node-red/red.js $FLOWS \"--userDir\" \"/data\"", "2022-07-19T19:09:23.899349481Z stdout: ", "2022-07-19T19:09:26.301015945Z stdout: 19 Jul 19:09:26 - [info] ", "2022-07-19T19:09:26.301015945Z stdout: ", "2022-07-19T19:09:26.301015945Z stdout: Welcome to Node-RED", "2022-07-19T19:09:26.301015945Z stdout: ===================", "2022-07-19T19:09:26.301015945Z stdout: ", "2022-07-19T19:09:26.302106314Z stdout: 19 Jul 19:09:26 - [info] Node-RED version: v2.2.2", "2022-07-19T19:09:26.302956696Z stdout: 19 Jul 19:09:26 - [info] Node.js version: v14.19.1", "2022-07-19T19:09:26.303139937Z stdout: 19 Jul 19:09:26 - [info] Linux 5.14.14-200.fc34.x86_64 x64 LE", "2022-07-19T19:09:27.305314368Z stdout: 19 Jul 19:09:27 - [info] Loading palette nodes", "2022-07-19T19:09:31.219388687Z stdout: 19 Jul 19:09:31 - [info] Settings file : /data/settings.js", "2022-07-19T19:09:31.220267452Z stdout: 19 Jul 19:09:31 - [info] Context store : 'default' [module=memory]", "2022-07-19T19:09:31.221039641Z stdout: 19 Jul 19:09:31 - [info] User directory : /data", "2022-07-19T19:09:31.221229505Z stdout: 19 Jul 19:09:31 - [warn] Projects disabled : editorTheme.projects.enabled=false", "2022-07-19T19:09:31.239239234Z stdout: 19 Jul 19:09:31 - [info] Flows file : /data/flows.json", "2022-07-19T19:09:31.340091186Z stdout: 19 Jul 19:09:31 - [info] Server now running at http://127.0.0.1:8080/", "2022-07-19T19:09:31.617204409Z stdout: 19 Jul 19:09:31 - [info] Starting flows", "2022-07-19T19:09:31.709196934Z stdout: 19 Jul 19:09:31 - [info] Started flows", "2022-07-19T19:09:31.584796934Z stderr: The action did not initialize or run as expected. Log data might be missing." ], "annotations": [ { "key": "path", "value": "guest/Inference_3fff08e8-3617-4eb4-b118-4229f30514e3.json" }, { "key": "waitTime", "value": 4199 }, { "key": "kind", "value": "blackbox" }, { "key": "timeout", "value": false }, { "key": "limits", "value": { "concurrency": 1, "logs": 10, "memory": 512, "timeout": 300000 } }, { "key": "initTime", "value": 7699 } ], "publish": false } ``` ## Additional information you deem important: The issue seems to happen always in the first cold start case after a long time of inactivity. So maybe this is triggered by longer delays needed for fetching more container layers from the registry. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@openwhisk.apache.org.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org