[ 
https://issues.apache.org/jira/browse/FLINK-1319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14561838#comment-14561838
 ] 

ASF GitHub Bot commented on FLINK-1319:
---------------------------------------

Github user uce commented on the pull request:

    https://github.com/apache/flink/pull/729#issuecomment-106093530
  
    This is a very cool! Thanks for all the effort you put into this. I think 
this will be a great addition to the project. :-)
    
    ---
    
    I've skimmed over the code and checked it out. Some initial feedback:
    - **Naming UDF**. The feedback of committers giving talks about Flink some 
time ago was that the name "UDF" was sometimes confusing. @rmetzger, can you 
confirm this? We might take this into account and rename the `UdfAnalysisMode` 
to something else, for example just `CodeAnalysisMode`.
    - **Default mode**: the default mode is currently set to hinting. Running 
the WordCount example, it didn't work out of the box though. After some 
debugging I found this in the `UdfAnalyzer`, which prevented the analyis:
      ```java
    if (internalUdfClassName.startsWith(EXCLUDED_CLASSPATH)
            && 
!internalUdfClassName.startsWith("org/apache/flink/api/java/sca")) {
            return false;
    }
    ```
      After commenting the second condition out, it worked fine. For a 
WordCount with a reducer, which only touches the second field of the Tuple2, it 
worked like a charm. The produced output looks like this:
    
      ```
    23:23:49,207 INFO  org.apache.flink.api.java.operators.UdfOperatorUtils     
     - Function 'org.apache.flink.examples.java.wordcount.WordCount$Tokenizer' 
has been analyzed with the following result:
    Number of object creations (should be kept to a minimum): 1 in method / 36 
transitively
    A need for forwarded fields annotations could not be found.
    
    23:23:49,217 INFO  org.apache.flink.api.java.operators.UdfOperatorUtils     
     - Function 'org.apache.flink.examples.java.wordcount.WordCount$1' has been 
analyzed with the following result:
    Number of object creations (should be kept to a minimum): 1 in method / 1 
transitively
    You could use the following annotation: @ForwardedFields("f0->f0;")
    
    23:23:49,243 INFO  org.apache.flink.api.java.operators.UdfOperatorUtils     
     - Function 'org.apache.flink.api.java.Utils$CollectHelper' has been 
analyzed with the following result:
    Number of object creations (should be kept to a minimum): 0 in method / 6 
transitively
    A need for forwarded fields annotations could not be found.
    ```
      Very nice to see this working. This is great news for the optimizer. :-)
    
      What do the transitive object creations refer to exactly? I'm wondering 
how a user could influence them? Reusing a result object in the WordCount sum 
reducer is correctly detected as 0 creations in the method as well.
    
      I was wondering whether it could be possible to configure code analysis 
on a per-function level. For example, all library related functions should not 
print the hints imo.
    
    ---
    
    I very much like this. Regarding merging this before the release: I would 
vote to only merge it if we disable it by default. This will essentially affect 
every program written in the upcoming Flink release and merging it without 
proper review, testing, and exposure seems rushed to me. But I wouldn't veto 
it, if everyone wants it in asap.
    
    I think this feature alone would warrant a new 0.10 release soon after 0.9.


> Add static code analysis for UDFs
> ---------------------------------
>
>                 Key: FLINK-1319
>                 URL: https://issues.apache.org/jira/browse/FLINK-1319
>             Project: Flink
>          Issue Type: New Feature
>          Components: Java API, Scala API
>            Reporter: Stephan Ewen
>            Assignee: Timo Walther
>            Priority: Minor
>
> Flink's Optimizer takes information that tells it for UDFs which fields of 
> the input elements are accessed, modified, or frwarded/copied. This 
> information frequently helps to reuse partitionings, sorts, etc. It may speed 
> up programs significantly, as it can frequently eliminate sorts and shuffles, 
> which are costly.
> Right now, users can add lightweight annotations to UDFs to provide this 
> information (such as adding {{@ConstandFields("0->3, 1, 2->1")}}.
> We worked with static code analysis of UDFs before, to determine this 
> information automatically. This is an incredible feature, as it "magically" 
> makes programs faster.
> For record-at-a-time operations (Map, Reduce, FlatMap, Join, Cross), this 
> works surprisingly well in many cases. We used the "Soot" toolkit for the 
> static code analysis. Unfortunately, Soot is LGPL licensed and thus we did 
> not include any of the code so far.
> I propose to add this functionality to Flink, in the form of a drop-in 
> addition, to work around the LGPL incompatibility with ALS 2.0. Users could 
> simply download a special "flink-code-analysis.jar" and drop it into the 
> "lib" folder to enable this functionality. We may even add a script to 
> "tools" that downloads that library automatically into the lib folder. This 
> should be legally fine, since we do not redistribute LGPL code and only 
> dynamically link it (the incompatibility with ASL 2.0 is mainly in the 
> patentability, if I remember correctly).
> Prior work on this has been done by [~aljoscha] and [~skunert], which could 
> provide a code base to start with.
> *Appendix*
> Hompage to Soot static analysis toolkit: http://www.sable.mcgill.ca/soot/
> Papers on static analysis and for optimization: 
> http://stratosphere.eu/assets/papers/EnablingOperatorReorderingSCA_12.pdf and 
> http://stratosphere.eu/assets/papers/openingTheBlackBoxes_12.pdf
> Quick introduction to the Optimizer: 
> http://stratosphere.eu/assets/papers/2014-VLDBJ_Stratosphere_Overview.pdf 
> (Section 6)
> Optimizer for Iterations: 
> http://stratosphere.eu/assets/papers/spinningFastIterativeDataFlows_12.pdf 
> (Sections 4.3 and 5.3)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to