Rui Fan created FLINK-34655:
-------------------------------

             Summary: Autoscaler doesn't work for flink 1.15
                 Key: FLINK-34655
                 URL: https://issues.apache.org/jira/browse/FLINK-34655
             Project: Flink
          Issue Type: Bug
          Components: Autoscaler
            Reporter: Rui Fan
            Assignee: Rui Fan
             Fix For: 1.8.0


flink-kubernetes-operator is committed to supporting the latest 4 Flink minor 
versions, and the autoscaler is part of flink-kubernetes-operator. Currently, the 
latest 4 Flink minor versions are 1.15, 1.16, 1.17 and 1.18.

But the autoscaler doesn't work for Flink 1.15.

h2. Root cause: 

* FLINK-28310 added some properties in IOMetricsInfo in flink-1.16
* IOMetricsInfo is a part of JobDetailsInfo
* JobDetailsInfo is necessary for autoscaler [1]
* Flink's RestClient doesn't tolerate any missing property when deserializing 
the JSON

That means a RestClient from a version newer than 1.15 cannot fetch 
JobDetailsInfo for 1.15 jobs.

h2. How to fix it properly?

The Flink side should support ignoring unknown properties.

FLINK-33268 already does that. But when I tried running the autoscaler with a 
flink-1.15 job, it still didn't work, because the properties added to 
IOMetricsInfo are primitive types, so deserialization still fails when they are 
missing from the 1.15 response.

DeserializationFeature.FAIL_ON_NULL_FOR_PRIMITIVES should be disabled as well. 
(Not sure whether it should be a separate FLIP or it can be a part of FLIP-401 
[2].)
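
To illustrate the point above, here is a minimal sketch using plain Jackson 
rather than Flink's shaded copy. The Metrics class and its "read-bytes" / 
"added-later" property names are hypothetical stand-ins for IOMetricsInfo, and 
the two mappers only approximate what RestMapperUtils configures; they are not 
copied from Flink. It assumes jackson-databind is on the classpath.

```java
import com.fasterxml.jackson.annotation.JsonCreator;
import com.fasterxml.jackson.annotation.JsonProperty;
import com.fasterxml.jackson.databind.DeserializationFeature;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.io.IOException;

public class PrimitiveToleranceSketch {

    /** Hypothetical stand-in for IOMetricsInfo: one old field, one added in a later release. */
    public static final class Metrics {
        private final long readBytes;
        private final long addedLater;

        @JsonCreator
        public Metrics(
                @JsonProperty("read-bytes") long readBytes,
                @JsonProperty("added-later") long addedLater) {
            this.readBytes = readBytes;
            this.addedLater = addedLater;
        }

        public long readBytes() {
            return readBytes;
        }

        public long addedLater() {
            return addedLater;
        }
    }

    /** Assumed approximation of a mapper that only tolerates unknown properties. */
    public static ObjectMapper unknownTolerantMapper() {
        return new ObjectMapper()
                .enable(DeserializationFeature.FAIL_ON_NULL_FOR_PRIMITIVES)
                .disable(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES);
    }

    /** Same mapper, but missing primitive properties fall back to their default value. */
    public static ObjectMapper missingTolerantMapper() {
        return unknownTolerantMapper()
                .copy()
                .disable(DeserializationFeature.FAIL_ON_NULL_FOR_PRIMITIVES);
    }

    /** Returns true if the mapper can read a payload that lacks "added-later". */
    public static boolean canReadOldPayload(ObjectMapper mapper) {
        // Simulates a response from an older server that does not know the new field.
        String oldPayload = "{\"read-bytes\": 42}";
        try {
            Metrics m = mapper.readValue(oldPayload, Metrics.class);
            return m.readBytes() == 42 && m.addedLater() == 0;
        } catch (IOException e) {
            // Jackson reports the missing primitive as a mapping error.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("unknown-tolerant only: " + canReadOldPayload(unknownTolerantMapper()));
        System.out.println("also null-tolerant:    " + canReadOldPayload(missingTolerantMapper()));
    }
}
```

In this sketch only the second mapper is expected to read the old payload, 
which matches what I observed with the flink-1.15 job.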


h2. How to fix it in the short term?

1. Copy the latest RestMapperUtils and RestClient from the master branch (which 
includes FLINK-33268) into the flink-autoscaler module. (The copied classes will 
be loaded first.)
2. Disable DeserializationFeature.FAIL_ON_NULL_FOR_PRIMITIVES in 
RestMapperUtils#flexibleObjectMapper in the copied class.

With these 2 steps, flink-1.15 works well with the autoscaler. (I tried it 
locally.)


Once DeserializationFeature.FAIL_ON_NULL_FOR_PRIMITIVES is disabled in 
RestMapperUtils#flexibleObjectMapper on the Flink side and the corresponding 
code is released, flink-kubernetes-operator can remove these 2 copied 
classes.

[1] 
https://github.com/apache/flink-kubernetes-operator/blob/ede1a610b3375d31a2e82287eec67ace70c4c8df/flink-autoscaler/src/main/java/org/apache/flink/autoscaler/ScalingMetricCollector.java#L109
[2] 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-401%3A+REST+API+JSON+response+deserialization+unknown+field+tolerance



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
