Github user uce commented on the pull request: https://github.com/apache/flink/pull/729#issuecomment-107996871 Hey Timo, I've tested this locally so far and it's working smoothly! Let's write a blog post about this very soon. :-) As soon as I have access to more machines, I will test it on a cluster. The stuff denoted with [RB] should be fixed in any case before merging imo. The rest we can also do afterwards. **USER FACING** 1. Disable the analysis by default. [RB] 2. This is cumbersome, but we should go for Annotation-based exclusions instead of package based (see inline comments). [RB] 3. Currently manual annotations trump automatic ones, which is a good thing, because people won't have unexpected results. Could you add a test for this? 4. I would give the analyzer util the call location based function name (String callLocation = Utils.getCallLocationName()). I think that will have better output than just `Function 'Job$2' has been analyzed with the following result: ...` 4. I would rename `UdfAnalyisMode` (see inline comments). 5. The hints are currently logged on a singe line w/o a whitespace after each hint, like this: ``` Function modifies static fields. This can lead to unexpected behaviour during runtime.Function returns 'null' values. This can lead to errors during runtime.A need for forwarded fields annotations could not be found. ``` 6. Some analysis like a wrong tuple index access or returning null lead fail the program before submitting it, which is very nice. I would actually like to log these problems on a log level WARN (instead of INFO) when only hints are enabled. 7. When you have optimize enabled and run a filter, which changes the input, the error you get is this. You can't tell what's wrong. ``` Exception in thread "main" org.apache.flink.api.java.sca.UdfErrorException: UDF contains obvious errors. at org.apache.flink.api.java.sca.UdfAnalyzer.analyze(UdfAnalyzer.java:300) ... ``` For a wrong tuple index access, it is correct: ``` Exception in thread "main" org.apache.flink.api.java.sca.UdfErrorException: UDF contains obvious errors. at org.apache.flink.api.java.sca.UdfAnalyzer.analyze(UdfAnalyzer.java:300) ... Caused by: org.apache.flink.api.java.sca.UdfErrorException: Function contains tuple accesses with invalid indexes. This can lead to errors during runtime. at org.apache.flink.api.java.sca.UdfAnalyzer.addHintOrThrowException(UdfAnalyzer.java:413) ... ``` I'm not sure if the thrown Execption should be an `UdfErrorException` or an `UdfAnalysisException`? 8. When optimizing is enabled, I would still print the inferred forwarded fields. 9. Regarding the result messages: personally, I think the hints/messages could be more consise, e.g. for example skip the `(should be kept to a minimum)` in the number of object creations msg or just say `Forwarded fields: none` instead of `A need for forwarded fields annotations could not be found.` **INTERNALS** 1. I think the internals could use more comments. It's easy to get the general idea, but it would be nice to also get some high-level comments about the ASM-related stuff. I didn't have time to dig deep into it. --- Will report back after cluster tests.
--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---