[jira] [Commented] (SPARK-23986) CompileException when using too many avg aggregation after joining
[ https://issues.apache.org/jira/browse/SPARK-23986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16439915#comment-16439915 ] Michel Davit commented on SPARK-23986: -- Thx [~mgaido]. I didn't have time to set up the environment to submit the pull request this weekend :) > CompileException when using too many avg aggregation after joining > -- > > Key: SPARK-23986 > URL: https://issues.apache.org/jira/browse/SPARK-23986 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.0 >Reporter: Michel Davit >Priority: Major > Attachments: spark-generated.java > > > Considering the following code: > {code:java} > val df1: DataFrame = sparkSession.sparkContext > .makeRDD(Seq((0, 1, 2, 3, 4, 5, 6))) > .toDF("key", "col1", "col2", "col3", "col4", "col5", "col6") > val df2: DataFrame = sparkSession.sparkContext > .makeRDD(Seq((0, "val1", "val2"))) > .toDF("key", "dummy1", "dummy2") > val agg = df1 > .join(df2, df1("key") === df2("key"), "leftouter") > .groupBy(df1("key")) > .agg( > avg("col2").as("avg2"), > avg("col3").as("avg3"), > avg("col4").as("avg4"), > avg("col1").as("avg1"), > avg("col5").as("avg5"), > avg("col6").as("avg6") > ) > val head = agg.take(1) > {code} > This logs the following exception: > {code:java} > ERROR CodeGenerator: failed to compile: > org.codehaus.commons.compiler.CompileException: File 'generated.java', Line > 467, Column 28: Redefinition of parameter "agg_expr_11" > {code} > I am not a Spark expert, but after investigation I realized that the > generated {{doConsume}} method is responsible for the exception. > Indeed, {{avg}} calls > {{org.apache.spark.sql.execution.CodegenSupport.constructDoConsumeFunction}} several times: > the first time with the 'avg' Expr and a second time for the base aggregation > Expr (count and sum). 
> The problem comes from the generation of parameters in CodeGenerator: > {code:java} > /** >* Returns a term name that is unique within this instance of a > `CodegenContext`. >*/ > def freshName(name: String): String = synchronized { > val fullName = if (freshNamePrefix == "") { > name > } else { > s"${freshNamePrefix}_$name" > } > if (freshNameIds.contains(fullName)) { > val id = freshNameIds(fullName) > freshNameIds(fullName) = id + 1 > s"$fullName$id" > } else { > freshNameIds += fullName -> 1 > fullName > } > } > {code} > The {{freshNameIds}} map already contains {{agg_expr_[1..6]}} from the 1st call. > The second call is made with {{agg_expr_[1..12]}} and generates the > following names: > {{agg_expr_[11|21|31|41|51|61|7|8|9|10|11|12]}}. We then have a parameter name > conflict in the generated code: {{agg_expr_11}}. > Appending the 'id' directly in {{s"$fullName$id"}} to generate a unique term name is the > source of the conflict. Simply inserting an underscore may solve this issue: > {{s"${fullName}_$id"}} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
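The collision described above can be reproduced outside Spark with a short Python simulation of the {{freshName}} string logic (a sketch only, not Spark's actual Scala code): because the numeric id is glued directly onto a name that itself ends in a digit, the reused {{agg_expr_1}} and the genuinely fresh {{agg_expr_11}} render as the same string.

```python
def fresh_name(ids, name, prefix="agg"):
    # Mirrors the freshName logic quoted above: prefix the name,
    # then append a bare counter when the name was already issued.
    full_name = f"{prefix}_{name}" if prefix else name
    if full_name in ids:
        i = ids[full_name]
        ids[full_name] = i + 1
        return f"{full_name}{i}"  # id glued on with no separator
    ids[full_name] = 1
    return full_name

ids = {}
# 1st call (the avg expressions): registers agg_expr_1 .. agg_expr_6
first = [fresh_name(ids, f"expr_{i}") for i in range(1, 7)]
# 2nd call (the base sum/count expressions): asks for expr_1 .. expr_12
second = [fresh_name(ids, f"expr_{i}") for i in range(1, 13)]

# Reused expr_1 becomes "agg_expr_11", the same string as fresh expr_11:
assert second[0] == "agg_expr_11" and second[10] == "agg_expr_11"
```

With the underscore variant suggested in the report, the reused expr_1 would render as agg_expr_1_1, which no plain agg_expr_<n> request can produce, so names of this shape no longer clash.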
[jira] [Updated] (SPARK-23986) CompileException when using too many avg aggregation after joining
[ https://issues.apache.org/jira/browse/SPARK-23986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michel Davit updated SPARK-23986: - Description: (revised description as quoted above)
[jira] [Updated] (SPARK-23986) CompileException when using too many avg aggregation after joining
[ https://issues.apache.org/jira/browse/SPARK-23986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michel Davit updated SPARK-23986: - Description: (revised description as quoted above)
[jira] [Commented] (SPARK-23986) CompileException when using too many avg aggregation after joining
[ https://issues.apache.org/jira/browse/SPARK-23986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16439318#comment-16439318 ] Michel Davit commented on SPARK-23986: -- I tested on Spark v2.3.0. I attached the generated code: [^spark-generated.java]. Here is the faulty line (467): {code:java} private void agg_doConsume1(int agg_expr_01, double agg_expr_11, boolean agg_exprIsNull_1, long agg_expr_21, boolean agg_exprIsNull_2, double agg_expr_31, boolean agg_exprIsNull_3, long agg_expr_41, boolean agg_exprIsNull_4, double agg_expr_51, boolean agg_exprIsNull_5, long agg_expr_61, boolean agg_exprIsNull_6, double agg_expr_7, boolean agg_exprIsNull_7, long agg_expr_8, boolean agg_exprIsNull_8, double agg_expr_9, boolean agg_exprIsNull_9, long agg_expr_10, boolean agg_exprIsNull_10, double agg_expr_11, boolean agg_exprIsNull_11, long agg_expr_12, boolean agg_exprIsNull_12) throws java.io.IOException {code} One clarification: the code does not throw; it just logs an error. I also checked the computed average values, and everything seems correct. 
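The duplicate in that signature is easy to confirm mechanically. The Python check below (a sketch; the shortened signature drops the {{boolean agg_exprIsNull_*}} flags for readability) scans the parameter list the way the Janino compiler effectively does and finds exactly one redefined name:

```python
import re
from collections import Counter

# Shortened copy of the attached agg_doConsume1 signature (null flags omitted).
signature = (
    "private void agg_doConsume1(int agg_expr_01, double agg_expr_11, "
    "long agg_expr_21, double agg_expr_31, long agg_expr_41, "
    "double agg_expr_51, long agg_expr_61, double agg_expr_7, "
    "long agg_expr_8, double agg_expr_9, long agg_expr_10, "
    "double agg_expr_11, long agg_expr_12)"
)

params = re.findall(r"agg_expr_\d+", signature)
redefined = sorted(name for name, count in Counter(params).items() if count > 1)
assert redefined == ["agg_expr_11"]  # the name rejected at line 467
```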
[jira] [Updated] (SPARK-23986) CompileException when using too many avg aggregation after joining
[ https://issues.apache.org/jira/browse/SPARK-23986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michel Davit updated SPARK-23986: - Attachment: spark-generated.java
[jira] [Updated] (SPARK-23986) CompileException when using too many avg aggregation after joining
[ https://issues.apache.org/jira/browse/SPARK-23986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michel Davit updated SPARK-23986: - Priority: Major (was: Minor)
[jira] [Created] (SPARK-23986) CompileException when using too many avg aggregation after joining
Michel Davit created SPARK-23986: Summary: CompileException when using too many avg aggregation after joining Key: SPARK-23986 URL: https://issues.apache.org/jira/browse/SPARK-23986 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.3.0 Reporter: Michel Davit (description as quoted above)