Koen De Groote created KAFKA-7013:
-------------------------------------
Summary: Do Infer analysis and investigate results
Key: KAFKA-7013
URL: https://issues.apache.org/jira/browse/KAFKA-7013
Project: Kafka
Issue Type: Bug
Reporter: Koen De Groote
Infer is a tool by Facebook that performs checks on code in order to find
potential flaws.
Specifically, it attemps to track down null dereferences, thread safety
violations and resource leaks.
Site: [http://fbinfer.com|http://fbinfer.com/]
Github: [https://github.com/facebook/infer]
The tool supports java. I made a docker container for the kafka code and the
infer tool and ran Infer's analysis on the `gradle clean compileJava` process.
Results:
Summary of the reports
THREAD_SAFETY_VIOLATION: 274
RESOURCE_LEAK: 263
NULL_DEREFERENCE: 151
Do take into account: the analysis is not perfect. Most resources leaks
reported are simply because the end of a scope was reached and the resource
wasn't closed. But there are times you don't actually want to close them.
For instance:
[https://github.com/apache/kafka/blob/2.0/streams/src/main/java/org/apache/kafka/streams/processor/internals/DefaultKafkaClientSupplier.java]
All those byte serializers/deserializers are reported as resource leaks. While
I don't know this code, a quick gander tells me it seems like these resources
should not be closed and the end of the scope.
688 issues is a lot and probably worth going through, even if it is only to
conclude that all the reported instances are false positives.
Another example, actual infer output this time:
{noformat}
687.
clients/src/main/java/org/apache/kafka/common/record/CompressionRatioEstimator.java:52:
error: THREAD_SAFETY_VIOLATION
Read/Write race. Non-private method `float
CompressionRatioEstimator.updateEstimation(String,CompressionType,float)` reads
without synchronization from `compressionRatioForTopic.[_]`. Potentially races
with write in method `CompressionRatioEstimator.updateEstimation(...)`.
{noformat}
While technically correct in that the synchronization only happens on write and
not on read, one can wonder if adding it would negatively affect performance
and if the "correct" ratio would differ so much from the incorrect that it
actually warrants adding the synchronization on read.
The results should definitely be taken with a grain of salt.
Reason for not uploading the file with the bugs listed in them: because that
file only shows the last step in the code that would potentially trigger the
problem. Infer has a secondary command `infer-explore` which lets you pick one
of the issues it found and then it outputs the exact code path that leads to
the problem, something showing surprising cases.
Because of the scope of this and my relative inexperience with this codebase, I
don't really know how to properly fill in this ticket, so I'll be leaving the
ticket as is.
Final word: looking into stuff like this is work I usually do outside of
working hours. Unless someone is really really interested, I'd ask people not
to take time away from what would be their regular tickets. Especially since
eventual PR(s) will most likely have a discussion of whether or not a
particular change should actually happen or not.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)