Ryan van Huuksloot created FLINK-34991:
------------------------------------------
Summary: Flink Operator ClassPath Race Condition Bug
Key: FLINK-34991
URL: https://issues.apache.org/jira/browse/FLINK-34991
Project: Flink
Issue Type: Bug
Components: Kubernetes Operator
Affects Versions: 1.7.2
Environment: Standard Flink Operator with a FlinkDeployment. To recreate, remove a critical SQL connector library from the bundled jar.
Reporter: Ryan van Huuksloot

Hello,

I believe we've found a bug in the Job Managers managed by the Kubernetes Operator. There appears to be a race condition, or an incorrect conditional, where the operator checks for High Availability metadata instead of checking whether the Job Manager failed because of a class-loading issue.

*Example:*

When deploying a SQL Flink job, the Job Managers start in a failed state. Describing the FlinkDeployment returns the error message:

{code:java}
RestoreFailed ... HA metadata not available to restore from last state. It is possible that the job has finished or terminally failed, or the configmaps have been deleted.{code}

But upon further investigation, the actual error was that the class loading of the Job Manager wasn't correct. This log appeared in the Job Manager:

{code:java}
"Could not find any factory for identifier 'kafka' that implements 'org.apache.flink.table.factories.DynamicTableFactory' in the classpath.\n\nAvailable factory identifiers are:\n\nblackhole\ndatagen\nfilesystem\nprint","name":"org.apache.flink.table.api.ValidationException","extendedStackTrace":"org.apache.flink.table.api.ValidationException: Could not find any factory for identifier 'kafka' that implements 'org.apache.flink.table.factories.DynamicTableFactory' in the classpath.\n\nAvailable factory identifiers are:\n\nblackhole\ndatagen\nfilesystem\nprint\n\"{code}

There is also corresponding logging in the operator:

{code:java}
...
Cannot discover a connector using option: 'connector'='kafka'
	at org.apache.flink.table.factories.FactoryUtil.enrichNoMatchingConnectorError(FactoryUtil.java:798)
	at org.apache.flink.table.factories.FactoryUtil.discoverTableFactory(FactoryUtil.java:772)
	at org.apache.flink.table.factories.FactoryUtil.createDynamicTableSource(FactoryUtil.java:215)
	... 52 more
Caused by: org.apache.flink.table.api.ValidationException: Could not find any factory for identifier 'kafka' that implements 'org.apache.flink.table.factories.DynamicTableFactory' in the classpath ....{code}

I think the operator should surface this error in the CRD status, since the HA error is not the root cause.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
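For reproduction context, the kind of table definition that drives this factory lookup looks roughly like the sketch below. The table name, topic, and connector options are illustrative, not taken from the affected job; the relevant part is {{'connector' = 'kafka'}}, which makes the planner search the classpath for a matching DynamicTableFactory and fail with the ValidationException above when the Kafka connector jar is absent from the bundled artifact.

{code:sql}
-- Hypothetical Flink SQL table for reproduction; names are illustrative.
-- 'connector' = 'kafka' triggers the DynamicTableFactory discovery that
-- fails when the Kafka SQL connector is missing from the bundled jar.
CREATE TABLE orders (
    order_id STRING,
    amount   DOUBLE
) WITH (
    'connector' = 'kafka',
    'topic' = 'orders',
    'properties.bootstrap.servers' = 'kafka:9092',
    'format' = 'json'
);{code}

Rebuilding the job jar without the Kafka SQL connector and submitting a job that reads such a table should reproduce the failed Job Manager and, in turn, the misleading HA-metadata error on the FlinkDeployment.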