Stephan Ewen created FLINK-1319:
-----------------------------------
Summary: Add static code analysis for UDFs
Key: FLINK-1319
URL: https://issues.apache.org/jira/browse/FLINK-1319
Project: Flink
Issue Type: New Feature
Components: Java API, Scala API
Reporter: Stephan Ewen
Priority: Minor
Flink's Optimizer takes information that tells it for UDFs which fields of the
input elements are accessed, modified, or frwarded/copied. This information
frequently helps to reuse partitionings, sorts, etc. It may speed up programs
significantly, as it can frequently eliminate sorts and shuffles, which are
costly.
Right now, users can add lightweight annotations to UDFs to provide this
information (such as adding {{@ConstandFields("0->3, 1, 2->1")}}.
We worked with static code analysis of UDFs before, to determine this
information automatically. This is an incredible feature, as it "magically"
makes programs faster.
For record-at-a-time operations (Map, Reduce, FlatMap, Join, Cross), this works
surprisingly well in many cases. We used the "Soot" toolkit for the static code
analysis. Unfortunately, Soot is LGPL licensed and thus we did not include any
of the code so far.
I propose to add this functionality to Flink, in the form of a drop-in
addition, to work around the LGPL incompatibility with ALS 2.0. Users could
simply download a special "flink-code-analysis.jar" and drop it into the "lib"
folder to enable this functionality. We may even add a script to "tools" that
downloads that library automatically into the lib folder. This should be
legally fine, since we do not redistribute LGPL code and only dynamically link
it (the incompatibility with ASL 2.0 is mainly in the patentability, if I
remember correctly).
Prior work on this has been done by [~aljoscha] and [~skunert], which could
provide a code base to start with.
*Appendix*
Hompage to Soot static analysis toolkit: http://www.sable.mcgill.ca/soot/
Papers on static analysis and for optimization:
http://stratosphere.eu/assets/papers/EnablingOperatorReorderingSCA_12.pdf and
http://stratosphere.eu/assets/papers/openingTheBlackBoxes_12.pdf
Quick introduction to the Optimizer:
http://stratosphere.eu/assets/papers/2014-VLDBJ_Stratosphere_Overview.pdf
(Section 6)
Optimizer for Iterations:
http://stratosphere.eu/assets/papers/spinningFastIterativeDataFlows_12.pdf
(Sections 4.3 and 5.3)
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)