[GitHub] spark pull request: [SPARK-8357] [SQL] Memory leakage on unsafe ag...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/6810 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8357] [SQL] Memory leakage on unsafe ag...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/6810#issuecomment-123187284 I'm working on a patch to enable Unsafe mode by default and ran into this problem as well, so I've opened #7560 to bring this patch up to date and merge it. I'll still credit you with the primary patch authorship. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8357] [SQL] Memory leakage on unsafe ag...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/6810#issuecomment-118639190 Now that you've moved the second bug to #7225, do you mind cleaning this PR up to reflect only the first bug? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8357] [SQL] Memory leakage on unsafe ag...
Github user navis commented on the pull request: https://github.com/apache/spark/pull/6810#issuecomment-118584139 Thanks, @JoshRosen. Actually, it's two bugs which is * memory leak on empty input * CCE in some cases (codeGen=false && (groupbyException.isEmpty || unsafe = false)) I've booked second bug to SPARK-8826(#7225). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8357] [SQL] Memory leakage on unsafe ag...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/6810#issuecomment-117362141 As I understand it, there are two separate bugs here: - A memory leak fix. - A bug related to mis-handling of empty inputs. If this is the case, can we fix these separately? Combining them into the same PR / test code makes them harder to understand. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8357] [SQL] Memory leakage on unsafe ag...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6810#issuecomment-116950964 Build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8357] [SQL] Memory leakage on unsafe ag...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6810#issuecomment-116950828 [Test build #36072 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/36072/console) for PR 6810 at commit [`c5419b3`](https://github.com/apache/spark/commit/c5419b37f2ece4842d8c2bd7463a50b61245fbcf). * This patch **fails Spark unit tests**. * This patch **does not merge cleanly**. * This patch adds the following public classes _(experimental)_: * `class InterpretedProjection(expressions: Seq[Expression], mutableRow: Boolean = false)` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8357] [SQL] Memory leakage on unsafe ag...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6810#issuecomment-116920493 [Test build #36072 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/36072/consoleFull) for PR 6810 at commit [`c5419b3`](https://github.com/apache/spark/commit/c5419b37f2ece4842d8c2bd7463a50b61245fbcf). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8357] [SQL] Memory leakage on unsafe ag...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6810#issuecomment-116918598 Build started. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8357] [SQL] Memory leakage on unsafe ag...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6810#issuecomment-116918513 Build triggered. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8357] [SQL] Memory leakage on unsafe ag...
Github user navis commented on a diff in the pull request: https://github.com/apache/spark/pull/6810#discussion_r33535200 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala --- @@ -153,13 +153,14 @@ abstract class SparkPlan extends QueryPlan[SparkPlan] with Logging with Serializ } protected def newProjection( - expressions: Seq[Expression], inputSchema: Seq[Attribute]): Projection = { + expressions: Seq[Expression], + inputSchema: Seq[Attribute], mutableRow: Boolean = false): Projection = { --- End diff -- done. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8357] [SQL] Memory leakage on unsafe ag...
Github user navis commented on a diff in the pull request: https://github.com/apache/spark/pull/6810#discussion_r33535187 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/AggregateSuite.scala --- @@ -0,0 +1,59 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution + +import org.apache.spark.sql.SQLConf +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.test.TestSQLContext +import org.apache.spark.sql.types.DataTypes._ + +class AggregateSuite extends SparkPlanTest { + + test("SPARK-8357 Memory leakage on unsafe aggregation path with empty input") { + +val input0 = Seq.empty[(String, Int, Double)] +val input1 = Seq(("Hello", 4, 2.0)) + +// hack : current default parallelism of test local backend is two --- End diff -- In the case that null buffer should be forwarded as described above, it should be handled in single task to make single row. But the parallelism for test is fixed as two("local[2]") and not easy to change the value. So expected output is two rows, instead of one. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8357] [SQL] Memory leakage on unsafe ag...
Github user navis commented on a diff in the pull request: https://github.com/apache/spark/pull/6810#discussion_r33533948 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/AggregateSuite.scala --- @@ -0,0 +1,59 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution + +import org.apache.spark.sql.SQLConf +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.test.TestSQLContext +import org.apache.spark.sql.types.DataTypes._ + +class AggregateSuite extends SparkPlanTest { + + test("SPARK-8357 Memory leakage on unsafe aggregation path with empty input") { + +val input0 = Seq.empty[(String, Int, Double)] +val input1 = Seq(("Hello", 4, 2.0)) + +// hack : current default parallelism of test local backend is two +val x0 = Seq(Tuple1(0L), Tuple1(0L)) +val y0 = Seq.empty[Tuple1[Long]] + +val x1 = Seq(Tuple1(0L), Tuple1(1L)) +val y1 = Seq(Tuple1(1L)) + +val codegenDefault = TestSQLContext.getConf(SQLConf.CODEGEN_ENABLED) +try { + for ((input, x, y) <- Seq((input0, x0, y0), (input1, x1, y1))) { +val df = input.toDF("a", "b", "c") +val colB = df.col("b").expr +val colC = df.col("c").expr +val aggrExpr = Alias(Count(Cast(colC, LongType)), "Count")() + +for ((codegen, unsafe) <- Seq((false, false), (true, false), (true, true)); + partial <- Seq(false, true); groupExpr <- Seq(colB :: Nil, Seq.empty)) { --- End diff -- nit = fixed It seemed related to the question below : "(partial || groupingExpressions.nonEmpty)". Handling some test fails, I've recognized that even with empty input iterator, if the group-by operator is for global(groupingExpression.isEmpty) and final(partial=false), we still need to make a null row. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8357] [SQL] Memory leakage on unsafe ag...
Github user navis commented on a diff in the pull request: https://github.com/apache/spark/pull/6810#discussion_r33533643 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/AggregateSuite.scala --- @@ -0,0 +1,59 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution + +import org.apache.spark.sql.SQLConf +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.test.TestSQLContext +import org.apache.spark.sql.types.DataTypes._ + +class AggregateSuite extends SparkPlanTest { + + test("SPARK-8357 Memory leakage on unsafe aggregation path with empty input") { + +val input0 = Seq.empty[(String, Int, Double)] +val input1 = Seq(("Hello", 4, 2.0)) + +// hack : current default parallelism of test local backend is two +val x0 = Seq(Tuple1(0L), Tuple1(0L)) +val y0 = Seq.empty[Tuple1[Long]] + +val x1 = Seq(Tuple1(0L), Tuple1(1L)) +val y1 = Seq(Tuple1(1L)) + +val codegenDefault = TestSQLContext.getConf(SQLConf.CODEGEN_ENABLED) +try { + for ((input, x, y) <- Seq((input0, x0, y0), (input1, x1, y1))) { +val df = input.toDF("a", "b", "c") +val colB = df.col("b").expr +val colC = df.col("c").expr +val aggrExpr = Alias(Count(Cast(colC, LongType)), "Count")() + +for ((codegen, unsafe) <- Seq((false, false), (true, false), (true, true)); --- End diff -- Just wanted to see all is well. I'll remove the condition. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8357] [SQL] Memory leakage on unsafe ag...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/6810#discussion_r33524986 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/GeneratedAggregate.scala --- @@ -270,7 +270,10 @@ case class GeneratedAggregate( val joinedRow = new JoinedRow3 - if (groupingExpressions.isEmpty) { + if (!iter.hasNext && (partial || groupingExpressions.nonEmpty)) { --- End diff -- Can you explain the `(partial || groupingExpressions.nonEmpty)` here? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8357] [SQL] Memory leakage on unsafe ag...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/6810#discussion_r33524575 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/AggregateSuite.scala --- @@ -0,0 +1,59 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution + +import org.apache.spark.sql.SQLConf +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.test.TestSQLContext +import org.apache.spark.sql.types.DataTypes._ + +class AggregateSuite extends SparkPlanTest { + + test("SPARK-8357 Memory leakage on unsafe aggregation path with empty input") { + +val input0 = Seq.empty[(String, Int, Double)] +val input1 = Seq(("Hello", 4, 2.0)) + +// hack : current default parallelism of test local backend is two +val x0 = Seq(Tuple1(0L), Tuple1(0L)) +val y0 = Seq.empty[Tuple1[Long]] + +val x1 = Seq(Tuple1(0L), Tuple1(1L)) +val y1 = Seq(Tuple1(1L)) + +val codegenDefault = TestSQLContext.getConf(SQLConf.CODEGEN_ENABLED) +try { + for ((input, x, y) <- Seq((input0, x0, y0), (input1, x1, y1))) { +val df = input.toDF("a", "b", "c") +val colB = df.col("b").expr +val colC = df.col("c").expr +val aggrExpr = Alias(Count(Cast(colC, LongType)), "Count")() + +for ((codegen, unsafe) <- Seq((false, false), (true, false), (true, true)); + partial <- Seq(false, true); groupExpr <- Seq(colB :: Nil, Seq.empty)) { --- End diff -- AFAIK we also don't need to test `partial` here, since that only has an effect on the `requiredChildDistribution`, which would matter if we were running this through the physical plan optimizer but which shouldn't matter here because we're explicitly constructing the plan physical plan by hand. Have I overlooked something here, though? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8357] [SQL] Memory leakage on unsafe ag...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/6810#discussion_r33523617 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala --- @@ -153,13 +153,14 @@ abstract class SparkPlan extends QueryPlan[SparkPlan] with Logging with Serializ } protected def newProjection( - expressions: Seq[Expression], inputSchema: Seq[Attribute]): Projection = { + expressions: Seq[Expression], + inputSchema: Seq[Attribute], mutableRow: Boolean = false): Projection = { --- End diff -- Style nit: `mutableRow` needs to go on its own line (one parameter per line). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8357] [SQL] Memory leakage on unsafe ag...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/6810#discussion_r33523568 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/AggregateSuite.scala --- @@ -0,0 +1,59 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution + +import org.apache.spark.sql.SQLConf +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.test.TestSQLContext +import org.apache.spark.sql.types.DataTypes._ + +class AggregateSuite extends SparkPlanTest { + + test("SPARK-8357 Memory leakage on unsafe aggregation path with empty input") { + +val input0 = Seq.empty[(String, Int, Double)] +val input1 = Seq(("Hello", 4, 2.0)) + +// hack : current default parallelism of test local backend is two --- End diff -- Can you elaborate a bit on this comment? Why does the parallelism of the test backend matter here? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8357] [SQL] Memory leakage on unsafe ag...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/6810#discussion_r33523518 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/AggregateSuite.scala --- @@ -0,0 +1,59 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution + +import org.apache.spark.sql.SQLConf +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.test.TestSQLContext +import org.apache.spark.sql.types.DataTypes._ + +class AggregateSuite extends SparkPlanTest { + + test("SPARK-8357 Memory leakage on unsafe aggregation path with empty input") { + +val input0 = Seq.empty[(String, Int, Double)] +val input1 = Seq(("Hello", 4, 2.0)) + +// hack : current default parallelism of test local backend is two +val x0 = Seq(Tuple1(0L), Tuple1(0L)) +val y0 = Seq.empty[Tuple1[Long]] + +val x1 = Seq(Tuple1(0L), Tuple1(1L)) +val y1 = Seq(Tuple1(1L)) + +val codegenDefault = TestSQLContext.getConf(SQLConf.CODEGEN_ENABLED) +try { + for ((input, x, y) <- Seq((input0, x0, y0), (input1, x1, y1))) { +val df = input.toDF("a", "b", "c") +val colB = df.col("b").expr +val colC = df.col("c").expr +val aggrExpr = Alias(Count(Cast(colC, LongType)), "Count")() + +for ((codegen, unsafe) <- Seq((false, false), (true, false), (true, true)); + partial <- Seq(false, true); groupExpr <- Seq(colB :: Nil, Seq.empty)) { --- End diff -- Style nit: put each nested loop on its own line (e.g. one <- per line, since that's easier to read). Also, I think `Seq(Seq(colB), Seq.empty)` is a bit more readable. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8357] [SQL] Memory leakage on unsafe ag...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/6810#discussion_r33523492 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/AggregateSuite.scala --- @@ -0,0 +1,59 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution + +import org.apache.spark.sql.SQLConf +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.test.TestSQLContext +import org.apache.spark.sql.types.DataTypes._ + +class AggregateSuite extends SparkPlanTest { + + test("SPARK-8357 Memory leakage on unsafe aggregation path with empty input") { + +val input0 = Seq.empty[(String, Int, Double)] +val input1 = Seq(("Hello", 4, 2.0)) + +// hack : current default parallelism of test local backend is two +val x0 = Seq(Tuple1(0L), Tuple1(0L)) +val y0 = Seq.empty[Tuple1[Long]] + +val x1 = Seq(Tuple1(0L), Tuple1(1L)) +val y1 = Seq(Tuple1(1L)) + +val codegenDefault = TestSQLContext.getConf(SQLConf.CODEGEN_ENABLED) +try { + for ((input, x, y) <- Seq((input0, x0, y0), (input1, x1, y1))) { +val df = input.toDF("a", "b", "c") +val colB = df.col("b").expr +val colC = df.col("c").expr +val aggrExpr = Alias(Count(Cast(colC, LongType)), "Count")() + +for ((codegen, unsafe) <- Seq((false, false), (true, false), (true, true)); --- End diff -- AFAIK we only need to test the `(true, true)` case, since this bug only occurred when using the unsafe aggregation path? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8357] [SQL] Memory leakage on unsafe ag...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6810#issuecomment-115647116 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8357] [SQL] Memory leakage on unsafe ag...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6810#issuecomment-115647008 [Test build #35850 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35850/console) for PR 6810 at commit [`19907a5`](https://github.com/apache/spark/commit/19907a5984936bb6a15a92b98a5d418576624fe3). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class StreamingLinearAlgorithm(object):` * `class StreamingLogisticRegressionWithSGD(StreamingLinearAlgorithm):` * `class LinearDataGenerator(object):` * `class InterpretedProjection(expressions: Seq[Expression], mutableRow: Boolean = false)` * `case class CountFunction(expr: Expression, base: AggregateExpression) extends AggregateFunction ` * `case class CountDistinctFunction(` * `case class ApproxCountDistinctPartitionFunction(` * `case class ApproxCountDistinctMergeFunction(` * `case class Sum(child: Expression) extends PartialAggregate with trees.UnaryNode[Expression] ` * `case class CombineSum(child: Expression) extends AggregateExpression ` * `case class SumDistinct(child: Expression)` * `case class CombineSetsAndSum(inputSet: Expression, base: Expression) extends AggregateExpression ` * `case class CombineSetsAndSumFunction(` * `case class First(child: Expression) extends PartialAggregate with trees.UnaryNode[Expression] ` * `case class Last(child: Expression) extends PartialAggregate with trees.UnaryNode[Expression] ` * `case class Sha2(left: Expression, right: Expression)` * `case class BroadcastHint(child: LogicalPlan) extends UnaryNode ` * `case class PrecisionInfo(precision: Int, scale: Int) ` * `case class TakeOrderedAndProject(` * `s"Using output committer class $` * `logInfo(s"Using user defined output committer class $` * ` s"Using output committer class $` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8357] [SQL] Memory leakage on unsafe ag...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6810#issuecomment-115591371 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8357] [SQL] Memory leakage on unsafe ag...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6810#issuecomment-115591322 [Test build #35849 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35849/console) for PR 6810 at commit [`c4b1f32`](https://github.com/apache/spark/commit/c4b1f32dbd0f890c989ee8b6e2038320a8a2e1b4). * This patch **fails MiMa tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class InterpretedProjection(expressions: Seq[Expression], mutableRow: Boolean = false)` * `case class Sha2(left: Expression, right: Expression)` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8357] [SQL] Memory leakage on unsafe ag...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6810#issuecomment-115588321 [Test build #35850 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35850/consoleFull) for PR 6810 at commit [`19907a5`](https://github.com/apache/spark/commit/19907a5984936bb6a15a92b98a5d418576624fe3). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8357] [SQL] Memory leakage on unsafe ag...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6810#issuecomment-115587446 Merged build triggered. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8357] [SQL] Memory leakage on unsafe ag...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6810#issuecomment-115587563 Merged build started. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8357] [SQL] Memory leakage on unsafe ag...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6810#issuecomment-115586038 [Test build #35849 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35849/consoleFull) for PR 6810 at commit [`c4b1f32`](https://github.com/apache/spark/commit/c4b1f32dbd0f890c989ee8b6e2038320a8a2e1b4). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8357] [SQL] Memory leakage on unsafe ag...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6810#issuecomment-115585527 Merged build started. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8357] [SQL] Memory leakage on unsafe ag...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6810#issuecomment-115585420 Merged build triggered. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8357] [SQL] Memory leakage on unsafe ag...
Github user navis commented on a diff in the pull request: https://github.com/apache/spark/pull/6810#discussion_r5692 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Projection.scala --- @@ -22,9 +22,11 @@ package org.apache.spark.sql.catalyst.expressions * @param expressions a sequence of expressions that determine the value of each column of the *output row. */ -class InterpretedProjection(expressions: Seq[Expression]) extends Projection { - def this(expressions: Seq[Expression], inputSchema: Seq[Attribute]) = -this(expressions.map(BindReferences.bindReference(_, inputSchema))) +class InterpretedProjection(expressions: Seq[Expression], mutableRow: Boolean = false) + extends Projection { + def this(expressions: Seq[Expression], + inputSchema: Seq[Attribute], mutableRow: Boolean = false) = --- End diff -- Sound like there should another test. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8357] [SQL] Memory leakage on unsafe ag...
Github user navis commented on a diff in the pull request: https://github.com/apache/spark/pull/6810#discussion_r5666 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Projection.scala --- @@ -22,9 +22,11 @@ package org.apache.spark.sql.catalyst.expressions * @param expressions a sequence of expressions that determine the value of each column of the *output row. */ -class InterpretedProjection(expressions: Seq[Expression]) extends Projection { - def this(expressions: Seq[Expression], inputSchema: Seq[Attribute]) = -this(expressions.map(BindReferences.bindReference(_, inputSchema))) +class InterpretedProjection(expressions: Seq[Expression], mutableRow: Boolean = false) + extends Projection { + def this(expressions: Seq[Expression], + inputSchema: Seq[Attribute], mutableRow: Boolean = false) = --- End diff -- Sorry for late reply. It happened when I changed empty input in AggregateSuite test to something like ("Hello", 4, 2.0). In GeneratedAggreate, whenever the grouping expression is empty or unsafeEnabled is not applied for some reason spark tries to case GenericRow to MutableRow. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8357] [SQL] Memory leakage on unsafe ag...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/6810#discussion_r33059743 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Projection.scala --- @@ -22,9 +22,11 @@ package org.apache.spark.sql.catalyst.expressions * @param expressions a sequence of expressions that determine the value of each column of the *output row. */ -class InterpretedProjection(expressions: Seq[Expression]) extends Projection { - def this(expressions: Seq[Expression], inputSchema: Seq[Attribute]) = -this(expressions.map(BindReferences.bindReference(_, inputSchema))) +class InterpretedProjection(expressions: Seq[Expression], mutableRow: Boolean = false) + extends Projection { + def this(expressions: Seq[Expression], + inputSchema: Seq[Attribute], mutableRow: Boolean = false) = --- End diff -- I see that your comment mentioned that this is avoiding a potential ClassCastException. Did you actually run into that exception somewhere? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8357] [SQL] Memory leakage on unsafe ag...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/6810#discussion_r33059434 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Projection.scala --- @@ -22,9 +22,11 @@ package org.apache.spark.sql.catalyst.expressions * @param expressions a sequence of expressions that determine the value of each column of the *output row. */ -class InterpretedProjection(expressions: Seq[Expression]) extends Projection { - def this(expressions: Seq[Expression], inputSchema: Seq[Attribute]) = -this(expressions.map(BindReferences.bindReference(_, inputSchema))) +class InterpretedProjection(expressions: Seq[Expression], mutableRow: Boolean = false) + extends Projection { + def this(expressions: Seq[Expression], + inputSchema: Seq[Attribute], mutableRow: Boolean = false) = --- End diff -- Could you explain this `mutableRow` change? If we do need to keep it, the style here isn't quite right; see https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide for how to properly wrap long parameter lists. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8357] [SQL] Memory leakage on unsafe ag...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6810#issuecomment-114422982 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8357] [SQL] Memory leakage on unsafe ag...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6810#issuecomment-114422960 [Test build #35532 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35532/console) for PR 6810 at commit [`f1859be`](https://github.com/apache/spark/commit/f1859bee95bbd688e42e8d1953acc6fd121eb487). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class InterpretedProjection(expressions: Seq[Expression], mutableRow: Boolean = false)` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8357] [SQL] Memory leakage on unsafe ag...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6810#issuecomment-114395415 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8357] [SQL] Memory leakage on unsafe ag...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6810#issuecomment-114394069 [Test build #35532 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35532/consoleFull) for PR 6810 at commit [`f1859be`](https://github.com/apache/spark/commit/f1859bee95bbd688e42e8d1953acc6fd121eb487). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8357] [SQL] Memory leakage on unsafe ag...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6810#issuecomment-114393939 Merged build triggered. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8357] [SQL] Memory leakage on unsafe ag...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6810#issuecomment-114393960 Merged build started. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8357] [SQL] Memory leakage on unsafe ag...
Github user navis commented on a diff in the pull request: https://github.com/apache/spark/pull/6810#discussion_r33016547 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetTableOperations.scala --- @@ -49,7 +49,7 @@ import org.apache.spark.sql.SQLConf import org.apache.spark.sql.catalyst.expressions.{Attribute, Expression, InternalRow, _} import org.apache.spark.sql.execution.{LeafNode, SparkPlan, UnaryNode} import org.apache.spark.sql.types.StructType -import org.apache.spark.{Logging, TaskContext} --- End diff -- Seemed mistakenly included. Will be removed in next update. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8357] [SQL] Memory leakage on unsafe ag...
Github user navis commented on a diff in the pull request: https://github.com/apache/spark/pull/6810#discussion_r33016397 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/AggregateSuite.scala --- @@ -0,0 +1,42 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution + +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.test.TestSQLContext +import org.apache.spark.sql.types.DataTypes._ + +class AggregateSuite extends SparkPlanTest { + + test("SPARK-8357 Memory leakage on unsafe aggregation path with empty input") { + +val df = Seq.empty[(String, Int, Double)].toDF("a", "b", "c") + +val groupExpr = df.col("b").expr +val aggrExpr = Alias(Count(Cast(groupExpr, LongType)), "Count")() + +for ((codegen, unsafe) <- Seq((false, false), (true, false), (true, true)); + partial <- Seq(false, true)) { + TestSQLContext.conf.setConfString("spark.sql.codegen", String.valueOf(codegen)) --- End diff -- Done. Thanks. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8357] [SQL] Memory leakage on unsafe ag...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6810#issuecomment-114393003 Merged build started. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8357] [SQL] Memory leakage on unsafe ag...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6810#issuecomment-114392946 Merged build triggered. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8357] [SQL] Memory leakage on unsafe ag...
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/6810#discussion_r33015113 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/AggregateSuite.scala --- @@ -0,0 +1,42 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution + +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.test.TestSQLContext +import org.apache.spark.sql.types.DataTypes._ + +class AggregateSuite extends SparkPlanTest { + + test("SPARK-8357 Memory leakage on unsafe aggregation path with empty input") { + +val df = Seq.empty[(String, Int, Double)].toDF("a", "b", "c") + +val groupExpr = df.col("b").expr +val aggrExpr = Alias(Count(Cast(groupExpr, LongType)), "Count")() + +for ((codegen, unsafe) <- Seq((false, false), (true, false), (true, true)); + partial <- Seq(false, true)) { + TestSQLContext.conf.setConfString("spark.sql.codegen", String.valueOf(codegen)) --- End diff -- Could you use the new SparkConf API? This line will be `TestSQLContext.setConf(SQLConf.CODEGEN_ENABLED, codegen)`. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8357] [SQL] Memory leakage on unsafe ag...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/6810#issuecomment-114335130 Also, if we do introduce a more general way of handling this sort of state cleanup as part of the test runner, then I suspect we'll want to both remove a lot of the old try-finally code and will have to fix any bugs due to tests that implicitly relied on the old behavior. Since that has the potential to grow to cover a lot more changes than this bugfix, I'd like to tackle it separately. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8357] [SQL] Memory leakage on unsafe ag...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/6810#issuecomment-114334861 I appreciate you trying to add a more general save / restore / backup mechanism for SqlConf state, but I'd rather not introduce that change as part of a bugfix. I think that we might want to be able to introduce / test / backport that test infra change independent of the change here. Do you mind just mimicking the simple try-finally save/restore that's used by other SQL tests and rolling back the wider test infra changes? Happy to include them, just not here. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8357] [SQL] Memory leakage on unsafe ag...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6810#issuecomment-114333710 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8357] [SQL] Memory leakage on unsafe ag...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6810#issuecomment-114333653 [Test build #35507 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35507/console) for PR 6810 at commit [`d20f8f9`](https://github.com/apache/spark/commit/d20f8f9ee075e94541f33760b9f45db01d8ff1d2). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8357] [SQL] Memory leakage on unsafe ag...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6810#issuecomment-114328832 [Test build #35507 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35507/consoleFull) for PR 6810 at commit [`d20f8f9`](https://github.com/apache/spark/commit/d20f8f9ee075e94541f33760b9f45db01d8ff1d2). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8357] [SQL] Memory leakage on unsafe ag...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6810#issuecomment-114327877 Merged build triggered. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8357] [SQL] Memory leakage on unsafe ag...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6810#issuecomment-114327952 Merged build started. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8357] [SQL] Memory leakage on unsafe ag...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6810#issuecomment-114325568 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8357] [SQL] Memory leakage on unsafe ag...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6810#issuecomment-114325567 [Test build #35504 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35504/console) for PR 6810 at commit [`eb86c67`](https://github.com/apache/spark/commit/eb86c67265f283e56408ca4b30ba3a5dddffdd86). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8357] [SQL] Memory leakage on unsafe ag...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6810#issuecomment-114325431 [Test build #35504 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35504/consoleFull) for PR 6810 at commit [`eb86c67`](https://github.com/apache/spark/commit/eb86c67265f283e56408ca4b30ba3a5dddffdd86). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8357] [SQL] Memory leakage on unsafe ag...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6810#issuecomment-114325291 Merged build started. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8357] [SQL] Memory leakage on unsafe ag...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6810#issuecomment-114325255 Merged build triggered. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8357] [SQL] Memory leakage on unsafe ag...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/6810#discussion_r32998574 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/AggregateSuite.scala --- @@ -0,0 +1,43 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution + +import org.apache.spark.SparkEnv +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.test.TestSQLContext +import org.apache.spark.sql.types.DataTypes._ + +class AggregateSuite extends SparkPlanTest { + + test("SPARK-8357 Memory leakage on unsafe aggregation path with empty input") { + +val df = Seq.empty[(String, Int, Double)].toDF("a", "b", "c") + +val groupExpr = df.col("b").expr +val aggrExpr = Alias(Count(Cast(groupExpr, LongType)), "Count")() + +SparkEnv.get.conf.set("spark.unsafe.exceptionOnMemoryLeak", "true") --- End diff -- Ah, I can see how this would be an issue for IntelliJ. If we do want to modify the conf, though, we should do it elsewhere (when creating the SparkContext that's used to create the TestSQLContext). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8357] [SQL] Memory leakage on unsafe ag...
Github user navis commented on a diff in the pull request: https://github.com/apache/spark/pull/6810#discussion_r32998276 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/AggregateSuite.scala --- @@ -0,0 +1,43 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution + +import org.apache.spark.SparkEnv +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.test.TestSQLContext +import org.apache.spark.sql.types.DataTypes._ + +class AggregateSuite extends SparkPlanTest { + + test("SPARK-8357 Memory leakage on unsafe aggregation path with empty input") { + +val df = Seq.empty[(String, Int, Double)].toDF("a", "b", "c") + +val groupExpr = df.col("b").expr +val aggrExpr = Alias(Count(Cast(groupExpr, LongType)), "Count")() + +SparkEnv.get.conf.set("spark.unsafe.exceptionOnMemoryLeak", "true") +for (codegen <- Seq(false, true); partial <- Seq(false, true); unsafe <- Seq(false, true)) { --- End diff -- Ok, I'll do that. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8357] [SQL] Memory leakage on unsafe ag...
Github user navis commented on a diff in the pull request: https://github.com/apache/spark/pull/6810#discussion_r32998221 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/AggregateSuite.scala --- @@ -0,0 +1,43 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution + +import org.apache.spark.SparkEnv +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.test.TestSQLContext +import org.apache.spark.sql.types.DataTypes._ + +class AggregateSuite extends SparkPlanTest { + + test("SPARK-8357 Memory leakage on unsafe aggregation path with empty input") { + +val df = Seq.empty[(String, Int, Double)].toDF("a", "b", "c") + +val groupExpr = df.col("b").expr +val aggrExpr = Alias(Count(Cast(groupExpr, LongType)), "Count")() + +SparkEnv.get.conf.set("spark.unsafe.exceptionOnMemoryLeak", "true") --- End diff -- Yes, I know that. Maybe I'm wrong but when running the test in IDE(intelliJ), the policy seemed not applied. I'll remove this line, anyway. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8357] [SQL] Memory leakage on unsafe ag...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/6810#discussion_r32984080 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/AggregateSuite.scala --- @@ -0,0 +1,43 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution + +import org.apache.spark.SparkEnv +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.test.TestSQLContext +import org.apache.spark.sql.types.DataTypes._ + +class AggregateSuite extends SparkPlanTest { + + test("SPARK-8357 Memory leakage on unsafe aggregation path with empty input") { + +val df = Seq.empty[(String, Int, Double)].toDF("a", "b", "c") + +val groupExpr = df.col("b").expr +val aggrExpr = Alias(Count(Cast(groupExpr, LongType)), "Count")() + +SparkEnv.get.conf.set("spark.unsafe.exceptionOnMemoryLeak", "true") +for (codegen <- Seq(false, true); partial <- Seq(false, true); unsafe <- Seq(false, true)) { --- End diff -- Actually, some of these cross-product options don't make sense: if codegen is off, then the planner won't use GeneratedAggregate. I think that we should change this test to store the old configuration values, then enable unsafe, run the regression test, then switch back to the old value. We should use a try-finally block to ensure that the reset of the settings is done properly. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8357] [SQL] Memory leakage on unsafe ag...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/6810#discussion_r32970075 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/AggregateSuite.scala --- @@ -0,0 +1,43 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution + +import org.apache.spark.SparkEnv +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.test.TestSQLContext +import org.apache.spark.sql.types.DataTypes._ + +class AggregateSuite extends SparkPlanTest { + + test("SPARK-8357 Memory leakage on unsafe aggregation path with empty input") { + +val df = Seq.empty[(String, Int, Double)].toDF("a", "b", "c") + +val groupExpr = df.col("b").expr +val aggrExpr = Alias(Count(Cast(groupExpr, LongType)), "Count")() + +SparkEnv.get.conf.set("spark.unsafe.exceptionOnMemoryLeak", "true") +for (codegen <- Seq(false, true); partial <- Seq(false, true); unsafe <- Seq(false, true)) { --- End diff -- I don't think that this is necessarily a safe way to implement this kind of configuration cross-product logic in SQL because this test's modifications to the global `TestSqlContext` won't be cleaned up at the end of this test. Ping @mambrus, is there a right way to handle this sort of cleanup / rollback of configuration changes? In Spark core, we have a similar problem with system properties and use a test suite mixin to automatically snapshot the configuration before the tests and roll back to it after they've run. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8357] [SQL] Memory leakage on unsafe ag...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/6810#discussion_r32969602 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/AggregateSuite.scala --- @@ -0,0 +1,43 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution + +import org.apache.spark.SparkEnv +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.test.TestSQLContext +import org.apache.spark.sql.types.DataTypes._ + +class AggregateSuite extends SparkPlanTest { + + test("SPARK-8357 Memory leakage on unsafe aggregation path with empty input") { + +val df = Seq.empty[(String, Int, Double)].toDF("a", "b", "c") + +val groupExpr = df.col("b").expr +val aggrExpr = Alias(Count(Cast(groupExpr, LongType)), "Count")() + +SparkEnv.get.conf.set("spark.unsafe.exceptionOnMemoryLeak", "true") --- End diff -- This option is automatically enabled when running tests in Maven or SBT, so we don't need to put this here. Additionally, mutating the SparkConf after it's already been used to create a SparkContext is not always safe, so I think that we should remove this line. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8357] [SQL] Memory leakage on unsafe ag...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6810#issuecomment-114154895 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8357] [SQL] Memory leakage on unsafe ag...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6810#issuecomment-114154839 [Test build #35450 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35450/console) for PR 6810 at commit [`0a5bf17`](https://github.com/apache/spark/commit/0a5bf170a1ed89344f2a57bd891154c17a6c44f8). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class Module(object):` * `class PCAModel(JavaVectorTransformer):` * `class PCA(object):` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8357] [SQL] Memory leakage on unsafe ag...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6810#issuecomment-114108709 [Test build #35450 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35450/consoleFull) for PR 6810 at commit [`0a5bf17`](https://github.com/apache/spark/commit/0a5bf170a1ed89344f2a57bd891154c17a6c44f8). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8357] [SQL] Memory leakage on unsafe ag...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6810#issuecomment-114107943 Merged build triggered. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8357] [SQL] Memory leakage on unsafe ag...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6810#issuecomment-114108019 Merged build started. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8357] [SQL] Memory leakage on unsafe ag...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6810#issuecomment-114057184 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8357] [SQL] Memory leakage on unsafe ag...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6810#issuecomment-114057166 [Test build #35440 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35440/console) for PR 6810 at commit [`36ecd1f`](https://github.com/apache/spark/commit/36ecd1f4d9ef5d49be4f4826adf0778c70b82395). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8357] [SQL] Memory leakage on unsafe ag...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6810#issuecomment-114053435 [Test build #35440 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35440/consoleFull) for PR 6810 at commit [`36ecd1f`](https://github.com/apache/spark/commit/36ecd1f4d9ef5d49be4f4826adf0778c70b82395). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8357] [SQL] Memory leakage on unsafe ag...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6810#issuecomment-114052812 Merged build started. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8357] [SQL] Memory leakage on unsafe ag...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6810#issuecomment-114038705 Merged build triggered. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8357] [SQL] Memory leakage on unsafe ag...
Github user navis commented on the pull request: https://github.com/apache/spark/pull/6810#issuecomment-114038518 @JoshRosen Done. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8357] [SQL] Memory leakage on unsafe ag...
Github user navis commented on the pull request: https://github.com/apache/spark/pull/6810#issuecomment-114017606 @JoshRosen Sure, I'll review your patch first. Thanks. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8357] [SQL] Memory leakage on unsafe ag...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/6810#issuecomment-113321764 Now that we've merged #6885, it should be possible to write a regression test for this. Take a look and let me know if you're interested in doing that. If not, I can merge this as-is. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8357] [SQL] Memory leakage on unsafe ag...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6810#issuecomment-111859376 [Test build #34889 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/34889/console) for PR 6810 at commit [`993c289`](https://github.com/apache/spark/commit/993c289afbee58b9cb82d7c828d8658c25baab91). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8357] [SQL] Memory leakage on unsafe ag...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6810#issuecomment-111859431 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8357] [SQL] Memory leakage on unsafe ag...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6810#issuecomment-111844186 [Test build #34889 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/34889/consoleFull) for PR 6810 at commit [`993c289`](https://github.com/apache/spark/commit/993c289afbee58b9cb82d7c828d8658c25baab91). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8357] [SQL] Memory leakage on unsafe ag...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/6810#issuecomment-111844007 I'm actually working on some tooling that will make it very easy to write regressions tests for these kinds of issues (basically, a way to instantiate individual operators and run them with canned inputs, which is much easier than trying to rig a high-level query that triggers the code path under test). This should not block on that, though, but I'll ping this thread later today with an example of how we could write such a test. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8357] [SQL] Memory leakage on unsafe ag...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6810#issuecomment-111843956 Merged build started. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8357] [SQL] Memory leakage on unsafe ag...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6810#issuecomment-111843953 Merged build triggered. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8357] [SQL] Memory leakage on unsafe ag...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/6810#issuecomment-111843931 Jenkins, this is ok to test. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8357] [SQL] Memory leakage on unsafe ag...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/6810#issuecomment-111777483 I guess there's not a super easy way to add a regression test for this since it's kind of hard to just instantiate the operator and run it with some empty input. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8357] [SQL] Memory leakage on unsafe ag...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/6810#discussion_r32375944 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/GeneratedAggregate.scala --- @@ -270,7 +270,9 @@ case class GeneratedAggregate( val joinedRow = new JoinedRow3 - if (groupingExpressions.isEmpty) { + if (!iter.hasNext) { +Iterator[InternalRow]() --- End diff -- Can we add a comment here to explain why we're doing this? Just needs to mention the memory leak issue. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8357] [SQL] Memory leakage on unsafe ag...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6810#issuecomment-111776267 Can one of the admins verify this patch? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8357] [SQL] Memory leakage on unsafe ag...
GitHub user navis opened a pull request: https://github.com/apache/spark/pull/6810 [SPARK-8357] [SQL] Memory leakage on unsafe aggregation path with empty input Currently, unsafe-based hash is released on 'next' call but if input is empty, it would not be called ever. You can merge this pull request into a Git repository by running: $ git pull https://github.com/navis/spark SPARK-8357 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/6810.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #6810 commit b67b8aad2f4ebdcbfeba173bfaf626d73e4f2355 Author: navis.ryu Date: 2015-06-14T03:10:03Z [SPARK-8357] [SQL] Memory leakage on unsafe aggregation path with empty input --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org