[ https://issues.apache.org/jira/browse/SPARK-13431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15159649#comment-15159649 ]
Kazuaki Ishizaki commented on SPARK-13431:
------------------------------------------

I identified why this problem occurs only with Maven. The Maven shade plugin increases the bytecode length of a method. The increase occurs because the shade plugin rewrites the Java bytecode when it rebuilds the constant pool.

Here is the output of ``javap -c SparkSqlParser_ExpressionParser.class`` before applying the shade plugin. The static initializer ``static{}`` uses the ``ldc`` instruction to access the constant pool at offsets 13, 18, and 23. Each ``ldc`` consumes only two bytes. As a result, the bytecode length of this method is *less than 65536*.

{code}
public class org.apache.spark.sql.catalyst.parser.SparkSqlParser_ExpressionParser extends org.antlr.runtime.Parser {
  ...
  static {};
    Code:
       0: bipush        70
       2: anewarray     #1035    // class java/lang/String
       5: dup
       6: iconst_0
       7: ldc_w         #1036    // String ...
      10: aastore
      11: dup
      12: iconst_1
      13: ldc           #127     // String
      15: aastore
      16: dup
      17: iconst_2
      18: ldc           #127     // String
      20: aastore
      21: dup
      22: iconst_3
      23: ldc           #127     // String
      25: aastore
      ...
   59900: return
}
}
{code}

After applying the shade plugin, the static initializer ``static{}`` uses the ``ldc_w`` instruction to access the constant pool at offsets 13, 19, and 25. Each ``ldc_w`` consumes three bytes. As a result, the bytecode length of this method is *more than 65535*.

{code}
  static {};
    Code:
       0: bipush        70
       2: anewarray     #2965    // class java/lang/String
       5: dup
       6: iconst_0
       7: ldc_w         #5240    // String ...
      10: aastore
      11: dup
      12: iconst_1
      13: ldc_w         #2924    // String
      16: aastore
      17: dup
      18: iconst_2
      19: ldc_w         #2924    // String
      22: aastore
      23: dup
      24: iconst_3
      25: ldc_w         #2924    // String
      28: aastore
      ...
   65533: lconst_0
   65534: lastore
      ...
}
}
{code}

The shade plugin seems to rebuild the constant pool, based on [this comment|http://svn.apache.org/viewvc/maven/plugins/tags/maven-shade-plugin-2.4.3/src/main/java/org/apache/maven/plugins/shade/DefaultShader.java?view=markup#l417].
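The size difference between the two dumps can be reproduced with a little arithmetic. The following is only a rough sketch, not part of Spark or the shade plugin: the class and method names are hypothetical, and the calculation assumes that every extra byte comes from an ``ldc`` that was widened to ``ldc_w``. It relies on two facts from the JVM specification: ``ldc`` carries a one-byte constant pool index and ``ldc_w`` a two-byte index, and the code of a single method must be shorter than 65536 bytes.

```java
// Hypothetical helper for illustration only; names are not from Spark or the shade plugin.
public class LdcSizeSketch {
    // The JVM requires the code length of one method to be less than 65536 bytes.
    static final int MAX_CODE_LENGTH = 65535;

    // ldc = opcode + 1-byte index (2 bytes); ldc_w = opcode + 2-byte index (3 bytes).
    static int loadConstantSize(int constantPoolIndex) {
        return constantPoolIndex <= 0xFF ? 2 : 3;
    }

    // Size of one "dup; iconst_N; ldc|ldc_w #index; aastore" group from the dumps.
    static int arrayStoreSize(int constantPoolIndex) {
        int dup = 1;
        int iconst = 1;
        int aastore = 1;
        return dup + iconst + loadConstantSize(constantPoolIndex) + aastore;
    }

    // Smallest growth in bytes that pushes a method of the given length over the limit.
    static int growthNeededToOverflow(int currentCodeLength) {
        return MAX_CODE_LENGTH - currentCodeLength + 1;
    }

    public static void main(String[] args) {
        System.out.println(arrayStoreSize(127));   // 5: offsets 11..15 before shading
        System.out.println(arrayStoreSize(2924));  // 6: offsets 11..16 after shading

        // "59900: return" is a one-byte instruction, so the unshaded method is 59901 bytes.
        int unshadedLength = 59900 + 1;
        System.out.println(growthNeededToOverflow(unshadedLength)); // 5635
    }
}
```

Under that assumption, roughly 5635 widened loads would be enough to push the unshaded initializer past the limit, since each one adds a single byte.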
Because the class defines many String constants, the constant pool has many entries, and rebuilding it can move an entry to a higher index. ``ldc`` takes only a one-byte index, so an entry whose index exceeds 255 must be loaded with ``ldc_w`` instead. As a result, the bytecode of the method becomes longer.

What should we do as a next step?
* Can we avoid this rebuild with an option?
* Can we create a pull request for the shade plugin to avoid this?
* Can we use another plugin?
* Can we split ExpressionParser.g into smaller files?
* Other solutions?

> Maven build fails due to: Method code too large! in Catalyst
> ------------------------------------------------------------
>
>                 Key: SPARK-13431
>                 URL: https://issues.apache.org/jira/browse/SPARK-13431
>             Project: Spark
>          Issue Type: Bug
>          Components: Build
>    Affects Versions: 2.0.0
>            Reporter: Stavros Kontopoulos
>            Priority: Blocker
>
> Cannot build the project when running the normal build commands, e.g.:
> {code}
> build/mvn -Phadoop-2.6 -Dhadoop.version=2.6.0 clean package
> ./make-distribution.sh --name test --tgz -Phadoop-2.6
> {code}
> Integration builds are also failing:
> https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-maven-hadoop-2.6/229/console
> https://ci.typesafe.com/job/mit-docker-test-zk-ref/12/console
> It looks like this is the commit that introduced the issue:
> https://github.com/apache/spark/commit/7925071280bfa1570435bde3e93492eaf2167d56