[spark] branch branch-3.0 updated: [SPARK-32906][SQL] Struct field names should not change after normalizing floats
This is an automated email from the ASF dual-hosted git repository.

viirya pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new 2d55de5  [SPARK-32906][SQL] Struct field names should not change after normalizing floats

2d55de5 is described below

commit 2d55de5db06ff522a8e7fded3760c3f65c8bbf93
Author: Takeshi Yamamuro
AuthorDate: Thu Sep 17 22:07:47 2020 -0700

    [SPARK-32906][SQL] Struct field names should not change after normalizing floats

    ### What changes were proposed in this pull request?

    This PR intends to fix a minor bug when normalizing floats for struct types;

    ```
    scala> import org.apache.spark.sql.execution.aggregate.HashAggregateExec
    scala> val df = Seq(Tuple1(Tuple1(-0.0d)), Tuple1(Tuple1(0.0d))).toDF("k")
    scala> val agg = df.distinct()
    scala> agg.explain()
    == Physical Plan ==
    *(2) HashAggregate(keys=[k#40], functions=[])
    +- Exchange hashpartitioning(k#40, 200), true, [id=#62]
       +- *(1) HashAggregate(keys=[knownfloatingpointnormalized(if (isnull(k#40)) null else named_struct(col1, knownfloatingpointnormalized(normalizenanandzero(k#40._1)))) AS k#40], functions=[])
          +- *(1) LocalTableScan [k#40]

    scala> val aggOutput = agg.queryExecution.sparkPlan.collect { case a: HashAggregateExec => a.output.head }
    scala> aggOutput.foreach { attr => println(attr.prettyJson) }
    ### Final Aggregate ###
    [ {
      "class" : "org.apache.spark.sql.catalyst.expressions.AttributeReference",
      "num-children" : 0,
      "name" : "k",
      "dataType" : {
        "type" : "struct",
        "fields" : [ {
          "name" : "_1",
                    ^^^
          "type" : "double",
          "nullable" : false,
          "metadata" : { }
        } ]
      },
      "nullable" : true,
      "metadata" : { },
      "exprId" : {
        "product-class" : "org.apache.spark.sql.catalyst.expressions.ExprId",
        "id" : 40,
        "jvmId" : "a824e83f-933e-4b85-a1ff-577b5a0e2366"
      },
      "qualifier" : [ ]
    } ]

    ### Partial Aggregate ###
    [ {
      "class" : "org.apache.spark.sql.catalyst.expressions.AttributeReference",
      "num-children" : 0,
      "name" : "k",
      "dataType" : {
        "type" : "struct",
        "fields" : [ {
          "name" : "col1",
          "type" : "double",
          "nullable" : true,
          "metadata" : { }
        } ]
      },
      "nullable" : true,
      "metadata" : { },
      "exprId" : {
        "product-class" : "org.apache.spark.sql.catalyst.expressions.ExprId",
        "id" : 40,
        "jvmId" : "a824e83f-933e-4b85-a1ff-577b5a0e2366"
      },
      "qualifier" : [ ]
    } ]
    ```

    ### Why are the changes needed?

    bugfix.

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    Added tests.

    Closes #29780 from maropu/FixBugInNormalizedFloatingNumbers.
    Authored-by: Takeshi Yamamuro
    Signed-off-by: Liang-Chi Hsieh
    (cherry picked from commit b49aaa33e13814a448be51a7e65a29cb515b8248)
    Signed-off-by: Liang-Chi Hsieh
---
 .../spark/sql/catalyst/optimizer/NormalizeFloatingNumbers.scala   | 6 +++---
 .../test/scala/org/apache/spark/sql/DataFrameAggregateSuite.scala | 8 ++++++++
 2 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NormalizeFloatingNumbers.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NormalizeFloatingNumbers.scala
index 8d5dbc7..f0cf671 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NormalizeFloatingNumbers.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NormalizeFloatingNumbers.scala
@@ -120,10 +120,10 @@ object NormalizeFloatingNumbers extends Rule[LogicalPlan] {
       KnownFloatingPointNormalized(NormalizeNaNAndZero(expr))
 
     case _ if expr.dataType.isInstanceOf[StructType] =>
-      val fields = expr.dataType.asInstanceOf[StructType].fields.indices.map { i =>
-        normalize(GetStructField(expr, i))
+      val fields = expr.dataType.asInstanceOf[StructType].fieldNames.zipWithIndex.map {
+        case (name, i) => Seq(Literal(name), normalize(GetStructField(expr, i)))
       }
-      val struct = CreateStruct(fields)
+      val struct = CreateNamedStruct(fields.flatten.toSeq)
       KnownFloatingPointNormalized(If(IsNull(expr), Literal(null, struct.dataType), struct))
 
     case _ if expr.dataType.isInstanceOf[ArrayType] =>

diff --git a/sql/core/src/test/scala/org/apache/spark/sql/DataFrameAggregateSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/DataFrameAggregateSuite.scala
index 54327b3..2cb7790 100644
---
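For readers who want to rerun the check from the commit message, the repro condenses into one spark-shell session. This is a sketch under the usual shell assumptions (an active `SparkSession` bound to `spark`, with `spark.implicits._` available); it compares the struct field name reported by the partial and final HashAggregate nodes, which is exactly where the `col1` rename used to show up.

```scala
// Sketch of the commit message's repro, for spark-shell (assumes `spark`
// and `spark.implicits._` are in scope, as they are in the shell).
import org.apache.spark.sql.execution.aggregate.HashAggregateExec
import spark.implicits._

val df = Seq(Tuple1(Tuple1(-0.0d)), Tuple1(Tuple1(0.0d))).toDF("k")

// distinct() groups on the struct key, which forces float normalization
// so that -0.0 and 0.0 land in the same group.
val agg = df.distinct()

// Pull the grouping attribute out of both HashAggregate nodes
// (partial and final) and compare the struct field names they report.
val fieldNames = agg.queryExecution.sparkPlan.collect {
  case a: HashAggregateExec => a.output.head.dataType.simpleString
}

// Before the fix this printed struct<_1:double> for one node and
// struct<col1:double> for the other; after the fix both report _1.
fieldNames.foreach(println)
```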
[spark] branch master updated (75dd864 -> b49aaa3)
This is an automated email from the ASF dual-hosted git repository.

viirya pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 75dd864  [SPARK-32908][SQL] Fix target error calculation in `percentile_approx()`
     add b49aaa3  [SPARK-32906][SQL] Struct field names should not change after normalizing floats

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/catalyst/optimizer/NormalizeFloatingNumbers.scala   | 6 +++---
 .../test/scala/org/apache/spark/sql/DataFrameAggregateSuite.scala | 8 ++++++++
 2 files changed, 11 insertions(+), 3 deletions(-)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-2.4 updated (2fa68a6 -> eed7c62)
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 2fa68a6  [SPARK-32900][CORE] Allow UnsafeExternalSorter to spill when there are nulls
     add eed7c62  [SPARK-32908][SQL][2.4] Fix target error calculation in `percentile_approx()`

No new revisions were added by this update.

Summary of changes:
 .../sql/catalyst/util/QuantileSummaries.scala |   2 +-
 .../test-data/percentile_approx-input.csv.bz2 | Bin 0 -> 124614 bytes
 .../sql/ApproximatePercentileQuerySuite.scala |  21 ++++++++++++++++++++-
 3 files changed, 21 insertions(+), 2 deletions(-)
 create mode 100644 sql/core/src/test/resources/test-data/percentile_approx-input.csv.bz2

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-32908][SQL] Fix target error calculation in `percentile_approx()`
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new 5581a92  [SPARK-32908][SQL] Fix target error calculation in `percentile_approx()`

5581a92 is described below

commit 5581a92bb8b1d96278796e6c18812c5a2931db11
Author: Max Gekk
AuthorDate: Fri Sep 18 10:47:06 2020 +0900

    [SPARK-32908][SQL] Fix target error calculation in `percentile_approx()`

    ### What changes were proposed in this pull request?

    1. Change the target error calculation according to the paper [Space-Efficient Online Computation of Quantile Summaries](http://infolab.stanford.edu/~datar/courses/cs361a/papers/quantiles.pdf). It says that the error `e = max(gi, deltai)/2` (see page 59). Also this has a clear explanation: [ε-approximate quantiles](http://www.mathcs.emory.edu/~cheung/Courses/584/Syllabus/08-Quantile/Greenwald.html#proofprop1).
    2. Added a test to check different accuracies.
    3. Added an input CSV file `percentile_approx-input.csv.bz2` to the resource folder `sql/catalyst/src/main/resources` for the test.

    ### Why are the changes needed?

    To fix incorrect percentile calculation, see an example in SPARK-32908.

    ### Does this PR introduce _any_ user-facing change?

    Yes

    ### How was this patch tested?

    - By running existing tests in `QuantileSummariesSuite` and in `ApproximatePercentileQuerySuite`.
    - Added new test `SPARK-32908: maximum target error in percentile_approx` to `ApproximatePercentileQuerySuite`.

    Closes #29784 from MaxGekk/fix-percentile_approx-2.

    Authored-by: Max Gekk
    Signed-off-by: HyukjinKwon
    (cherry picked from commit 75dd86400c3c2348a4139586fbbead840512b909)
    Signed-off-by: HyukjinKwon
---
 .../sql/catalyst/util/QuantileSummaries.scala |   2 +-
 .../test-data/percentile_approx-input.csv.bz2 | Bin 0 -> 124614 bytes
 .../sql/ApproximatePercentileQuerySuite.scala |  21 ++++++++++++++++++++-
 3 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/QuantileSummaries.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/QuantileSummaries.scala
index 3a0490d..e07aa0f 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/QuantileSummaries.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/QuantileSummaries.scala
@@ -254,7 +254,7 @@ class QuantileSummaries(
 
     // Target rank
     val rank = math.ceil(quantile * count).toLong
-    val targetError = relativeError * count
+    val targetError = sampled.map(s => s.delta + s.g).max / 2
     // Minimum rank at current sample
     var minRank = 0L
     var i = 0
diff --git a/sql/core/src/test/resources/test-data/percentile_approx-input.csv.bz2 b/sql/core/src/test/resources/test-data/percentile_approx-input.csv.bz2
new file mode 100644
index 0000000..f85e289
Binary files /dev/null and b/sql/core/src/test/resources/test-data/percentile_approx-input.csv.bz2 differ
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/ApproximatePercentileQuerySuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/ApproximatePercentileQuerySuite.scala
index 2b4abed..4991e39 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/ApproximatePercentileQuerySuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/ApproximatePercentileQuerySuite.scala
@@ -150,7 +150,7 @@ class ApproximatePercentileQuerySuite extends QueryTest with SharedSparkSession
       (1 to 1000).toDF("col").createOrReplaceTempView(table)
       checkAnswer(
         spark.sql(s"SELECT percentile_approx(col, array(0.25 + 0.25D), 200 + 800) FROM $table"),
-        Row(Seq(499))
+        Row(Seq(500))
       )
     }
   }
@@ -296,4 +296,23 @@ class ApproximatePercentileQuerySuite extends QueryTest with SharedSparkSession
       buffer.quantileSummaries
     assert(buffer.isCompressed)
   }
+
+  test("SPARK-32908: maximum target error in percentile_approx") {
+    withTempView(table) {
+      spark.read
+        .schema("col int")
+        .csv(testFile("test-data/percentile_approx-input.csv.bz2"))
+        .repartition(1)
+        .createOrReplaceTempView(table)
+      checkAnswer(
+        spark.sql(
+          s"""SELECT
+             | percentile_approx(col, 0.77, 1000),
+             | percentile_approx(col, 0.77, 1),
+             | percentile_approx(col, 0.77, 10),
+             | percentile_approx(col, 0.77, 100)
+             |FROM $table""".stripMargin),
+        Row(18, 17, 17, 17))
+    }
+  }
 }

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (9d6221b -> 75dd864)
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 9d6221b  [SPARK-18409][ML][FOLLOWUP] LSH approxNearestNeighbors optimization 2
     add 75dd864  [SPARK-32908][SQL] Fix target error calculation in `percentile_approx()`

No new revisions were added by this update.

Summary of changes:
 .../sql/catalyst/util/QuantileSummaries.scala |   2 +-
 .../test-data/percentile_approx-input.csv.bz2 | Bin 0 -> 124614 bytes
 .../sql/ApproximatePercentileQuerySuite.scala |  21 ++++++++++++++++++++-
 3 files changed, 21 insertions(+), 2 deletions(-)
 create mode 100644 sql/core/src/test/resources/test-data/percentile_approx-input.csv.bz2

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (68e0d5f -> 9d6221b)
This is an automated email from the ASF dual-hosted git repository.

ruifengz pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from 68e0d5f  [SPARK-32902][SQL] Logging plan changes for AQE
 add 9d6221b  [SPARK-18409][ML][FOLLOWUP] LSH approxNearestNeighbors optimization 2

No new revisions were added by this update.

Summary of changes:
 .../scala/org/apache/spark/ml/feature/LSH.scala | 28 ++++++++++++++--------------
 1 file changed, 14 insertions(+), 14 deletions(-)
[spark] branch master updated (4ced588 -> 68e0d5f)
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from 4ced588  [SPARK-32635][SQL] Fix foldable propagation
 add 68e0d5f  [SPARK-32902][SQL] Logging plan changes for AQE

No new revisions were added by this update.

Summary of changes:
 .../execution/adaptive/AdaptiveSparkPlanExec.scala | 45 +++++++++++++++------
 .../adaptive/AdaptiveQueryExecSuite.scala          | 20 ++++++++++
 2 files changed, 56 insertions(+), 9 deletions(-)
[spark] branch branch-3.0 updated: [SPARK-32635][SQL] Fix foldable propagation
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new ecc2f5d  [SPARK-32635][SQL] Fix foldable propagation
ecc2f5d is described below

commit ecc2f5d9e227b62f418d65708f516ffe8e690f96
Author: Peter Toth
AuthorDate: Fri Sep 18 08:17:23 2020 +0900

    [SPARK-32635][SQL] Fix foldable propagation

    ### What changes were proposed in this pull request?
    This PR rewrites the `FoldablePropagation` rule to replace attribute references in a node with foldables coming only from the node's children.

    Before this PR, in the case of this example (with `spark.sql.optimizer.excludedRules=org.apache.spark.sql.catalyst.optimizer.ConvertToLocalRelation` set):
    ```scala
    val a = Seq("1").toDF("col1").withColumn("col2", lit("1"))
    val b = Seq("2").toDF("col1").withColumn("col2", lit("2"))
    val aub = a.union(b)
    val c = aub.filter($"col1" === "2").cache()
    val d = Seq("2").toDF("col4")
    val r = d.join(aub, $"col2" === $"col4").select("col4")
    val l = c.select("col2")
    val df = l.join(r, $"col2" === $"col4", "LeftOuter")
    df.show()
    ```
    foldable propagation happens incorrectly (plan before the rule on the left, after on the right; `!` marks the wrongly rewritten line):
    ```
    Join LeftOuter, (col2#6 = col4#34)                                    Join LeftOuter, (col2#6 = col4#34)
    !:- Project [col2#6]                                                  :- Project [1 AS col2#6]
     :  +- InMemoryRelation [col1#4, col2#6], StorageLevel(disk, memory, deserialized, 1 replicas)   :  +- InMemoryRelation [col1#4, col2#6], StorageLevel(disk, memory, deserialized, 1 replicas)
     :     +- Union                                                       :     +- Union
     :        :- *(1) Project [value#1 AS col1#4, 1 AS col2#6]            :        :- *(1) Project [value#1 AS col1#4, 1 AS col2#6]
     :        :  +- *(1) Filter (isnotnull(value#1) AND (value#1 = 2))    :        :  +- *(1) Filter (isnotnull(value#1) AND (value#1 = 2))
     :        :     +- *(1) LocalTableScan [value#1]                      :        :     +- *(1) LocalTableScan [value#1]
     :        +- *(2) Project [value#10 AS col1#13, 2 AS col2#15]         :        +- *(2) Project [value#10 AS col1#13, 2 AS col2#15]
     :           +- *(2) Filter (isnotnull(value#10) AND (value#10 = 2))  :           +- *(2) Filter (isnotnull(value#10) AND (value#10 = 2))
     :              +- *(2) LocalTableScan [value#10]                     :              +- *(2) LocalTableScan [value#10]
     +- Project [col4#34]                                                 +- Project [col4#34]
        +- Join Inner, (col2#6 = col4#34)                                    +- Join Inner, (col2#6 = col4#34)
           :- Project [value#31 AS col4#34]                                     :- Project [value#31 AS col4#34]
           :  +- LocalRelation [value#31]                                       :  +- LocalRelation [value#31]
           +- Project [col2#6]                                                  +- Project [col2#6]
              +- Union false, false                                                +- Union false, false
                 :- Project [1 AS col2#6]                                             :- Project [1 AS col2#6]
                 :  +- LocalRelation [value#1]                                        :  +- LocalRelation [value#1]
                 +- Project [2 AS col2#15]                                            +- Project [2 AS col2#15]
                    +- LocalRelation [value#10]                                          +- LocalRelation [value#10]
    ```
    and so the result is wrong:
    ```
    +----+----+
    |col2|col4|
    +----+----+
    |   1|null|
    +----+----+
    ```
    After this PR foldable propagation will not happen incorrectly and the result is correct:
    ```
    +----+----+
    |col2|col4|
    +----+----+
    |   2|   2|
    +----+----+
    ```

    ### Why are the changes needed?
    To fix a correctness issue.

    ### Does this PR introduce _any_ user-facing change?
    Yes, fixes a correctness issue.

    ### How was this patch tested?
    Existing and new UTs.

    Closes #29771 from peter-toth/SPARK-32635-fix-foldable-propagation.

    Authored-by: Peter Toth
    Signed-off-by: Takeshi Yamamuro
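To make the corrected rule's core idea concrete, here is a toy Scala sketch with invented, simplified types (not Catalyst's `Expression`/`LogicalPlan` API): a foldable alias may replace an attribute reference only when the node's own children produce it, and a Union keeps an alias foldable only when every branch produces the same literal.

```scala
// Toy illustration of child-scoped foldable propagation; all types here
// are hypothetical stand-ins for Catalyst's, kept deliberately minimal.
object FoldablePropagationSketch {
  sealed trait Expr
  final case class Attr(name: String) extends Expr
  final case class Lit(value: Int) extends Expr
  final case class Alias(child: Expr, name: String) extends Expr

  sealed trait Plan { def children: Seq[Plan] }
  final case class Scan(table: String) extends Plan { def children: Seq[Plan] = Nil }
  final case class Project(exprs: Seq[Expr], child: Plan) extends Plan {
    def children: Seq[Plan] = Seq(child)
  }
  final case class Union(branches: Seq[Plan]) extends Plan {
    def children: Seq[Plan] = branches
  }

  // Literal aliases that a single plan is known to produce.
  def producedFoldables(plan: Plan): Map[String, Lit] = plan match {
    case Project(exprs, _) =>
      exprs.collect { case Alias(l: Lit, name) => name -> l }.toMap
    case Union(branches) =>
      // An alias survives a Union as foldable only if every branch
      // produces the same literal for it.
      branches.map(producedFoldables).reduceOption { (a, b) =>
        a.filter { case (name, lit) => b.get(name).contains(lit) }
      }.getOrElse(Map.empty)
    case _ => Map.empty
  }

  // Rewrite a node's expressions using foldables from its children only.
  def propagateFromChildren(exprs: Seq[Expr], node: Plan): Seq[Expr] = {
    val foldables = node.children
      .map(producedFoldables)
      .foldLeft(Map.empty[String, Lit])(_ ++ _)
    def rewrite(e: Expr): Expr = e match {
      case Attr(name)     => foldables.getOrElse(name, e)
      case Alias(c, name) => Alias(rewrite(c), name)
      case other          => other
    }
    exprs.map(rewrite)
  }
}
```

In the example above, only one Union branch produced `1 AS col2#6`, so under this child-scoped rule the outer `Project [col2#6]` keeps its attribute reference instead of being folded to the literal 1.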
[spark] branch master updated (ea3b979 -> 4ced588)
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from ea3b979  [SPARK-32889][SQL] orc table column name supports special characters
 add 4ced588  [SPARK-32635][SQL] Fix foldable propagation

No new revisions were added by this update.

Summary of changes:
 .../sql/catalyst/expressions/AttributeMap.scala    |   2 +
 .../sql/catalyst/expressions/AttributeMap.scala    |   2 +
 .../spark/sql/catalyst/optimizer/expressions.scala | 121 ++++++++++-------
 .../org/apache/spark/sql/DataFrameSuite.scala      |  12 ++++++++++++
 4 files changed, 88 insertions(+), 49 deletions(-)
[spark] branch master updated (5817c58 -> ea3b979)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from 5817c58  [SPARK-32909][SQL] Pass all `sql/hive-thriftserver` module UTs in Scala 2.13
 add ea3b979  [SPARK-32889][SQL] orc table column name supports special characters

No new revisions were added by this update.

Summary of changes:
 .../execution/datasources/orc/OrcFileFormat.scala |  2 +-
 .../spark/sql/FileBasedDataSourceSuite.scala      | 14 ++++++++++++++
 .../spark/sql/hive/execution/SQLQuerySuite.scala  | 74 ++++++++++++-------
 3 files changed, 64 insertions(+), 26 deletions(-)
[spark] branch master updated (a8442c2 -> 5817c58)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from a8442c2  [SPARK-32926][TESTS] Add Scala 2.13 build test in GitHub Action
 add 5817c58  [SPARK-32909][SQL] Pass all `sql/hive-thriftserver` module UTs in Scala 2.13

No new revisions were added by this update.

Summary of changes:
 sql/core/src/main/scala/org/apache/spark/sql/execution/HiveResult.scala | 2 +-
 .../apache/spark/sql/hive/HiveMetastoreLazyInitializationSuite.scala    | 2 ++
 2 files changed, 3 insertions(+), 1 deletion(-)
[spark] branch master updated (88e87bc -> a8442c2)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from 88e87bc  [SPARK-32887][DOC] Correct the typo for SHOW TABLE
 add a8442c2  [SPARK-32926][TESTS] Add Scala 2.13 build test in GitHub Action

No new revisions were added by this update.

Summary of changes:
 .github/workflows/build_and_test.yml | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)
[spark] branch branch-3.0 updated: [SPARK-32738][CORE][3.0] Should reduce the number of active threads if fatal error happens in `Inbox.process`
This is an automated email from the ASF dual-hosted git repository.

mridulm80 pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new 17a5195  [SPARK-32738][CORE][3.0] Should reduce the number of active threads if fatal error happens in `Inbox.process`
17a5195 is described below

commit 17a5195dead75a4dffd150f3f69fa4983e05561b
Author: Zhenhua Wang
AuthorDate: Thu Sep 17 12:22:35 2020 -0500

    [SPARK-32738][CORE][3.0] Should reduce the number of active threads if fatal error happens in `Inbox.process`

    This is a backport of [pr#29580](https://github.com/apache/spark/pull/29580) to branch-3.0.

    ### What changes were proposed in this pull request?
    Processing for `ThreadSafeRpcEndpoint` is controlled by `numActiveThreads` in `Inbox`. Currently, if a fatal error happens during `Inbox.process`, `numActiveThreads` is not reduced. Other threads then cannot process messages in that inbox, which causes the endpoint to "hang". For other types of endpoints, we should also keep `numActiveThreads` correct.

    This problem is more serious in earlier Spark 2.x versions, since the driver, executor, and block manager endpoints are all thread-safe endpoints.

    To fix this, we should reduce the number of active threads if a fatal error happens in `Inbox.process`.

    ### Why are the changes needed?
    `numActiveThreads` is not correct when a fatal error happens and will cause the problem described above.

    ### Does this PR introduce _any_ user-facing change?
    No.

    ### How was this patch tested?
    Added a new test.

    Closes #29763 from wzhfy/deal_with_fatal_error_3.0.

    Authored-by: Zhenhua Wang
    Signed-off-by: Mridul Muralidharan
---
 .../scala/org/apache/spark/rpc/netty/Inbox.scala   | 20 ++++++++++++++++++++
 .../org/apache/spark/rpc/netty/InboxSuite.scala    | 13 +++++++++++++
 2 files changed, 33 insertions(+)

diff --git a/core/src/main/scala/org/apache/spark/rpc/netty/Inbox.scala b/core/src/main/scala/org/apache/spark/rpc/netty/Inbox.scala
index 2ed03f7..472401b 100644
--- a/core/src/main/scala/org/apache/spark/rpc/netty/Inbox.scala
+++ b/core/src/main/scala/org/apache/spark/rpc/netty/Inbox.scala
@@ -200,6 +200,16 @@ private[netty] class Inbox(val endpointName: String, val endpoint: RpcEndpoint)
    * Calls action closure, and calls the endpoint's onError function in the case of exceptions.
    */
   private def safelyCall(endpoint: RpcEndpoint)(action: => Unit): Unit = {
+    def dealWithFatalError(fatal: Throwable): Unit = {
+      inbox.synchronized {
+        assert(numActiveThreads > 0, "The number of active threads should be positive.")
+        // Should reduce the number of active threads before throw the error.
+        numActiveThreads -= 1
+      }
+      logError(s"An error happened while processing message in the inbox for $endpointName", fatal)
+      throw fatal
+    }
+
     try action catch {
       case NonFatal(e) =>
         try endpoint.onError(e) catch {
@@ -209,8 +219,18 @@ private[netty] class Inbox(val endpointName: String, val endpoint: RpcEndpoint)
           } else {
             logError("Ignoring error", ee)
           }
+        case fatal: Throwable =>
+          dealWithFatalError(fatal)
         }
+      case fatal: Throwable =>
+        dealWithFatalError(fatal)
     }
   }

+  // exposed only for testing
+  def getNumActiveThreads: Int = {
+    inbox.synchronized {
+      inbox.numActiveThreads
+    }
+  }
 }
diff --git a/core/src/test/scala/org/apache/spark/rpc/netty/InboxSuite.scala b/core/src/test/scala/org/apache/spark/rpc/netty/InboxSuite.scala
index c74c728..8b1c602 100644
--- a/core/src/test/scala/org/apache/spark/rpc/netty/InboxSuite.scala
+++ b/core/src/test/scala/org/apache/spark/rpc/netty/InboxSuite.scala
@@ -136,4 +136,17 @@ class InboxSuite extends SparkFunSuite {
     endpoint.verifySingleOnNetworkErrorMessage(cause, remoteAddress)
   }
+
+  test("SPARK-32738: should reduce the number of active threads when fatal error happens") {
+    val endpoint = mock(classOf[TestRpcEndpoint])
+    when(endpoint.receive).thenThrow(new OutOfMemoryError())
+
+    val dispatcher = mock(classOf[Dispatcher])
+    val inbox = new Inbox("name", endpoint)
+    inbox.post(OneWayMessage(null, "hi"))
+    intercept[OutOfMemoryError] {
+      inbox.process(dispatcher)
+    }
+    assert(inbox.getNumActiveThreads == 0)
+  }
 }
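The shape of this fix generalizes beyond the RPC inbox. Below is a distilled Scala sketch of the pattern, using hypothetical class and method names rather than Spark's `Inbox` API: counter bookkeeping taken under a lock must be rolled back before a fatal `Throwable` such as `OutOfMemoryError` propagates, otherwise the slot leaks and no thread can ever process that inbox again.

```scala
import scala.util.control.NonFatal

// Hypothetical, simplified stand-in for Spark's Inbox bookkeeping.
final class BookkeepingSketch {
  private var numActiveThreads = 0

  def getNumActiveThreads: Int = synchronized(numActiveThreads)

  def process(action: () => Unit): Unit = {
    synchronized { numActiveThreads += 1 }

    // Release the slot before rethrowing, mirroring dealWithFatalError above.
    def dealWithFatalError(fatal: Throwable): Nothing = {
      synchronized {
        assert(numActiveThreads > 0, "The number of active threads should be positive.")
        numActiveThreads -= 1
      }
      throw fatal
    }

    try action() catch {
      case NonFatal(e) =>
        // Non-fatal errors are handled in place (Spark forwards them to the
        // endpoint's onError); the handler itself may still blow up with a
        // fatal error, which must also release the slot.
        try println(s"handled non-fatal error: $e") catch {
          case fatal: Throwable => dealWithFatalError(fatal)
        }
      case fatal: Throwable => dealWithFatalError(fatal)
    }
    // Normal completion: release the slot.
    synchronized { numActiveThreads -= 1 }
  }
}
```

The added `InboxSuite` test exercises exactly this path: `receive` throws an `OutOfMemoryError`, `process` rethrows it, and the suite asserts the active-thread count has returned to zero.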
[spark] branch master updated (482a79a -> 88e87bc)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from 482a79a  [SPARK-24994][SQL][FOLLOW-UP] Handle foldable, timezone and cleanup
 add 88e87bc  [SPARK-32887][DOC] Correct the typo for SHOW TABLE

No new revisions were added by this update.

Summary of changes:
 docs/sql-ref-syntax-aux-show-table.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
[spark] branch branch-3.0 updated: [SPARK-32887][DOC] Correct the typo for SHOW TABLE
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new b3b6f38  [SPARK-32887][DOC] Correct the typo for SHOW TABLE
b3b6f38 is described below

commit b3b6f381108b770c38753ebdb55e209361674008
Author: Udbhav30
AuthorDate: Thu Sep 17 09:25:17 2020 -0700

    [SPARK-32887][DOC] Correct the typo for SHOW TABLE

    ### What changes were proposed in this pull request?
    Correct the typos in the SHOW TABLE document.

    ### Why are the changes needed?
    The examples in the current SHOW TABLE document fail with parse errors, so they are misleading to users.

    ### Does this PR introduce _any_ user-facing change?
    Yes, the SHOW TABLE document is corrected.

    ### How was this patch tested?
    N/A

    Closes #29758 from Udbhav30/showtable.

    Authored-by: Udbhav30
    Signed-off-by: Dongjoon Hyun
    (cherry picked from commit 88e87bc8ebfa5aa1a8cc8928672749517ae0c41f)
    Signed-off-by: Dongjoon Hyun
---
 docs/sql-ref-syntax-aux-show-table.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/sql-ref-syntax-aux-show-table.md b/docs/sql-ref-syntax-aux-show-table.md
index 0ce0a3e..3314402 100644
--- a/docs/sql-ref-syntax-aux-show-table.md
+++ b/docs/sql-ref-syntax-aux-show-table.md
@@ -97,7 +97,7 @@ SHOW TABLE EXTENDED LIKE 'employee';
 +--------+---------+-----------+--------------------------------+

 -- showing the multiple table details with pattern matching
-SHOW TABLE EXTENDED LIKE `employe*`;
+SHOW TABLE EXTENDED LIKE 'employe*';
 +--------+---------+-----------+--------------------------------+
 |database|tableName|isTemporary|          information           |
 +--------+---------+-----------+--------------------------------+
@@ -146,7 +146,7 @@ SHOW TABLE EXTENDED LIKE `employe*`;
 +--------+---------+-----------+--------------------------------+

 -- show partition file system details
-SHOW TABLE EXTENDED IN default LIKE `employee` PARTITION (`grade=1`);
+SHOW TABLE EXTENDED IN default LIKE 'employee' PARTITION (grade=1);
 +--------+---------+-----------+--------------------------------+
 |database|tableName|isTemporary|          information           |
 +--------+---------+-----------+--------------------------------+
@@ -169,7 +169,7 @@ SHOW TABLE EXTENDED IN default LIKE `employee` PARTITION (`grade=1`);
 +--------+---------+-----------+--------------------------------+

 -- show partition file system details with regex fails as shown below
-SHOW TABLE EXTENDED IN default LIKE `empl*` PARTITION (`grade=1`);
+SHOW TABLE EXTENDED IN default LIKE 'empl*' PARTITION (grade=1);
 Error: Error running query: org.apache.spark.sql.catalyst.analysis.NoSuchTableException:
 Table or view 'emplo*' not found in database 'default'; (state=,code=0)
 ```
[spark] branch master updated (482a79a -> 88e87bc)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 482a79a [SPARK-24994][SQL][FOLLOW-UP] Handle foldable, timezone and cleanup add 88e87bc [SPARK-32887][DOC] Correct the typo for SHOW TABLE No new revisions were added by this update. Summary of changes: docs/sql-ref-syntax-aux-show-table.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-32887][DOC] Correct the typo for SHOW TABLE
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new b3b6f38 [SPARK-32887][DOC] Correct the typo for SHOW TABLE b3b6f38 is described below commit b3b6f381108b770c38753ebdb55e209361674008 Author: Udbhav30 AuthorDate: Thu Sep 17 09:25:17 2020 -0700 [SPARK-32887][DOC] Correct the typo for SHOW TABLE ### What changes were proposed in this pull request? Correct the typo in Show Table document ### Why are the changes needed? Current Document of Show Table returns in parse error, so it is misleading to users ### Does this PR introduce _any_ user-facing change? Yes, the document of show table is corrected now ### How was this patch tested? NA Closes #29758 from Udbhav30/showtable. Authored-by: Udbhav30 Signed-off-by: Dongjoon Hyun (cherry picked from commit 88e87bc8ebfa5aa1a8cc8928672749517ae0c41f) Signed-off-by: Dongjoon Hyun --- docs/sql-ref-syntax-aux-show-table.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/docs/sql-ref-syntax-aux-show-table.md b/docs/sql-ref-syntax-aux-show-table.md index 0ce0a3e..3314402 100644 --- a/docs/sql-ref-syntax-aux-show-table.md +++ b/docs/sql-ref-syntax-aux-show-table.md @@ -97,7 +97,7 @@ SHOW TABLE EXTENDED LIKE 'employee'; ++-+---+--+ -- showing the multiple table details with pattern matching -SHOW TABLE EXTENDED LIKE `employe*`; +SHOW TABLE EXTENDED LIKE 'employe*'; ++-+---+--+ |database|tableName|isTemporary| information | ++-+---+--+ @@ -146,7 +146,7 @@ SHOW TABLE EXTENDED LIKE `employe*`; ++-+--+---+ -- show partition file system details -SHOW TABLE EXTENDED IN default LIKE `employee` PARTITION (`grade=1`); +SHOW TABLE EXTENDED IN default LIKE 'employee' PARTITION (grade=1); ++-+---+--+ |database|tableName|isTemporary| information | ++-+---+--+ @@ -169,7 +169,7 @@ SHOW TABLE EXTENDED IN default LIKE `employee` PARTITION (`grade=1`); ++-+---+--+ -- show partition file system details with regex fails as shown below -SHOW TABLE EXTENDED IN default LIKE `empl*` PARTITION (`grade=1`); +SHOW TABLE EXTENDED IN default LIKE 'empl*' PARTITION (grade=1); Error: Error running query: org.apache.spark.sql.catalyst.analysis.NoSuchTableException: Table or view 'emplo*' not found in database 'default'; (state=,code=0) ``` - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (482a79a -> 88e87bc)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 482a79a [SPARK-24994][SQL][FOLLOW-UP] Handle foldable, timezone and cleanup add 88e87bc [SPARK-32887][DOC] Correct the typo for SHOW TABLE No new revisions were added by this update. Summary of changes: docs/sql-ref-syntax-aux-show-table.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-32887][DOC] Correct the typo for SHOW TABLE
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new b3b6f38 [SPARK-32887][DOC] Correct the typo for SHOW TABLE b3b6f38 is described below commit b3b6f381108b770c38753ebdb55e209361674008 Author: Udbhav30 AuthorDate: Thu Sep 17 09:25:17 2020 -0700 [SPARK-32887][DOC] Correct the typo for SHOW TABLE ### What changes were proposed in this pull request? Correct the typo in Show Table document ### Why are the changes needed? Current Document of Show Table returns in parse error, so it is misleading to users ### Does this PR introduce _any_ user-facing change? Yes, the document of show table is corrected now ### How was this patch tested? NA Closes #29758 from Udbhav30/showtable. Authored-by: Udbhav30 Signed-off-by: Dongjoon Hyun (cherry picked from commit 88e87bc8ebfa5aa1a8cc8928672749517ae0c41f) Signed-off-by: Dongjoon Hyun --- docs/sql-ref-syntax-aux-show-table.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/docs/sql-ref-syntax-aux-show-table.md b/docs/sql-ref-syntax-aux-show-table.md index 0ce0a3e..3314402 100644 --- a/docs/sql-ref-syntax-aux-show-table.md +++ b/docs/sql-ref-syntax-aux-show-table.md @@ -97,7 +97,7 @@ SHOW TABLE EXTENDED LIKE 'employee'; ++-+---+--+ -- showing the multiple table details with pattern matching -SHOW TABLE EXTENDED LIKE `employe*`; +SHOW TABLE EXTENDED LIKE 'employe*'; ++-+---+--+ |database|tableName|isTemporary| information | ++-+---+--+ @@ -146,7 +146,7 @@ SHOW TABLE EXTENDED LIKE `employe*`; ++-+--+---+ -- show partition file system details -SHOW TABLE EXTENDED IN default LIKE `employee` PARTITION (`grade=1`); +SHOW TABLE EXTENDED IN default LIKE 'employee' PARTITION (grade=1); ++-+---+--+ |database|tableName|isTemporary| information | ++-+---+--+ @@ -169,7 +169,7 @@ SHOW TABLE EXTENDED IN default LIKE `employee` PARTITION (`grade=1`); ++-+---+--+ -- show partition file system details with regex fails as shown below -SHOW TABLE EXTENDED IN default LIKE `empl*` PARTITION (`grade=1`); +SHOW TABLE EXTENDED IN default LIKE 'empl*' PARTITION (grade=1); Error: Error running query: org.apache.spark.sql.catalyst.analysis.NoSuchTableException: Table or view 'emplo*' not found in database 'default'; (state=,code=0) ``` - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (482a79a -> 88e87bc)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 482a79a [SPARK-24994][SQL][FOLLOW-UP] Handle foldable, timezone and cleanup add 88e87bc [SPARK-32887][DOC] Correct the typo for SHOW TABLE No new revisions were added by this update. Summary of changes: docs/sql-ref-syntax-aux-show-table.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-32887][DOC] Correct the typo for SHOW TABLE
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new b3b6f38 [SPARK-32887][DOC] Correct the typo for SHOW TABLE b3b6f38 is described below commit b3b6f381108b770c38753ebdb55e209361674008 Author: Udbhav30 AuthorDate: Thu Sep 17 09:25:17 2020 -0700 [SPARK-32887][DOC] Correct the typo for SHOW TABLE ### What changes were proposed in this pull request? Correct the typo in Show Table document ### Why are the changes needed? Current Document of Show Table returns in parse error, so it is misleading to users ### Does this PR introduce _any_ user-facing change? Yes, the document of show table is corrected now ### How was this patch tested? NA Closes #29758 from Udbhav30/showtable. Authored-by: Udbhav30 Signed-off-by: Dongjoon Hyun (cherry picked from commit 88e87bc8ebfa5aa1a8cc8928672749517ae0c41f) Signed-off-by: Dongjoon Hyun --- docs/sql-ref-syntax-aux-show-table.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/docs/sql-ref-syntax-aux-show-table.md b/docs/sql-ref-syntax-aux-show-table.md index 0ce0a3e..3314402 100644 --- a/docs/sql-ref-syntax-aux-show-table.md +++ b/docs/sql-ref-syntax-aux-show-table.md @@ -97,7 +97,7 @@ SHOW TABLE EXTENDED LIKE 'employee'; ++-+---+--+ -- showing the multiple table details with pattern matching -SHOW TABLE EXTENDED LIKE `employe*`; +SHOW TABLE EXTENDED LIKE 'employe*'; ++-+---+--+ |database|tableName|isTemporary| information | ++-+---+--+ @@ -146,7 +146,7 @@ SHOW TABLE EXTENDED LIKE `employe*`; ++-+--+---+ -- show partition file system details -SHOW TABLE EXTENDED IN default LIKE `employee` PARTITION (`grade=1`); +SHOW TABLE EXTENDED IN default LIKE 'employee' PARTITION (grade=1); ++-+---+--+ |database|tableName|isTemporary| information | ++-+---+--+ @@ -169,7 +169,7 @@ SHOW TABLE EXTENDED IN default LIKE `employee` PARTITION (`grade=1`); ++-+---+--+ -- show partition file system details with regex fails as shown below -SHOW TABLE EXTENDED IN default LIKE `empl*` PARTITION (`grade=1`); +SHOW TABLE EXTENDED IN default LIKE 'empl*' PARTITION (grade=1); Error: Error running query: org.apache.spark.sql.catalyst.analysis.NoSuchTableException: Table or view 'emplo*' not found in database 'default'; (state=,code=0) ``` - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (a54a6a0 -> 482a79a)

This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git.

 from a54a6a0 [SPARK-32287][CORE] Fix flaky o.a.s.ExecutorAllocationManagerSuite on GithubActions
 add  482a79a [SPARK-24994][SQL][FOLLOW-UP] Handle foldable, timezone and cleanup

No new revisions were added by this update.

Summary of changes:
 .../sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala  |  7 +--
 .../catalyst/optimizer/UnwrapCastInBinaryComparisonSuite.scala | 10 ++
 .../scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala  |  5 ++---
 3 files changed, 13 insertions(+), 9 deletions(-)
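The rule touched by this follow-up rewrites binary comparisons so that a safe cast on the column side can be removed. A rough illustration from a Spark shell (the exact optimized-plan text is not quoted from the patch and may vary by version):

```scala
// Illustrative only: UnwrapCastInBinaryComparison is expected to rewrite
// `CAST(i AS BIGINT) > 1` into a comparison in i's own type, e.g. `i > 1`,
// since the BIGINT literal 1 fits in INT; the cast then no longer blocks
// filter pushdown.
val df = spark.range(10).selectExpr("CAST(id AS INT) AS i")
df.where("CAST(i AS BIGINT) > 1").explain(true)
```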
[spark] branch master updated (e5e54a3 -> a54a6a0)

This is an automated email from the ASF dual-hosted git repository. wenchen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git.

 from e5e54a3 [SPARK-32900][CORE] Allow UnsafeExternalSorter to spill when there are nulls
 add  a54a6a0 [SPARK-32287][CORE] Fix flaky o.a.s.ExecutorAllocationManagerSuite on GithubActions

No new revisions were added by this update.

Summary of changes:
 .../scala/org/apache/spark/ExecutorAllocationManager.scala  | 13 ++---
 .../main/scala/org/apache/spark/internal/config/Tests.scala | 10 +-
 .../org/apache/spark/ExecutorAllocationManagerSuite.scala   |  9 +
 3 files changed, 16 insertions(+), 16 deletions(-)
[spark] branch branch-2.4 updated: [SPARK-32900][CORE] Allow UnsafeExternalSorter to spill when there are nulls

This is an automated email from the ASF dual-hosted git repository. hvanhovell pushed a commit to branch branch-2.4 in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-2.4 by this push:
     new 2fa68a6 [SPARK-32900][CORE] Allow UnsafeExternalSorter to spill when there are nulls

commit 2fa68a669cc83521c7257d844202790933ae9771
Author: Tom van Bussel
AuthorDate: Thu Sep 17 12:35:40 2020 +0200

    [SPARK-32900][CORE] Allow UnsafeExternalSorter to spill when there are nulls

    ### What changes were proposed in this pull request?
    This PR changes how `UnsafeExternalSorter.SpillableIterator` checks whether it has already spilled: it now checks whether `inMemSorter` is null. It also allows it to spill `UnsafeSorterIterator`s other than `UnsafeInMemorySorter.SortedIterator`.

    ### Why are the changes needed?
    Before this PR, `UnsafeExternalSorter.SpillableIterator` could not spill when the input contains NULLs and radix sorting is used. Spark decided whether the iterator had not yet spilled by checking whether `upstream` was an instance of `UnsafeInMemorySorter.SortedIterator`. When radix sorting is used and the input contains NULLs, however, `upstream` is an instance of `UnsafeExternalSorter.ChainedIterator` instead, and Spark will assume [...]

    ### Does this PR introduce _any_ user-facing change?
    No.

    ### How was this patch tested?
    A test was added to `UnsafeExternalSorterSuite` (and therefore also to `UnsafeExternalSorterRadixSortSuite`). I manually confirmed that the test failed in `UnsafeExternalSorterRadixSortSuite` without this patch.

    Closes #29772 from tomvanbussel/SPARK-32900.

    Authored-by: Tom van Bussel
    Signed-off-by: herman
    (cherry picked from commit e5e54a3614ffd2a9150921e84e5b813d5cbf285a)
    Signed-off-by: herman
---
 .../unsafe/sort/UnsafeExternalSorter.java      | 69 +-
 .../unsafe/sort/UnsafeInMemorySorter.java      |  1 +
 .../unsafe/sort/UnsafeSorterIterator.java      |  2 +
 .../unsafe/sort/UnsafeSorterSpillMerger.java   |  5 ++
 .../unsafe/sort/UnsafeSorterSpillReader.java   |  5 ++
 .../unsafe/sort/UnsafeExternalSorterSuite.java | 33 +++
 6 files changed, 88 insertions(+), 27 deletions(-)

diff --git a/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java b/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java
index a6a2076..f720ccd 100644
--- a/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java
+++ b/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java
@@ -505,11 +505,15 @@ public final class UnsafeExternalSorter extends MemoryConsumer {
    */
   class SpillableIterator extends UnsafeSorterIterator {
     private UnsafeSorterIterator upstream;
-    private UnsafeSorterIterator nextUpstream = null;
     private MemoryBlock lastPage = null;
     private boolean loaded = false;
     private int numRecords = 0;

+    private Object currentBaseObject;
+    private long currentBaseOffset;
+    private int currentRecordLength;
+    private long currentKeyPrefix;
+
     SpillableIterator(UnsafeSorterIterator inMemIterator) {
       this.upstream = inMemIterator;
       this.numRecords = inMemIterator.getNumRecords();
@@ -520,23 +524,26 @@ public final class UnsafeExternalSorter extends MemoryConsumer {
       return numRecords;
     }

+    @Override
+    public long getCurrentPageNumber() {
+      throw new UnsupportedOperationException();
+    }
+
     public long spill() throws IOException {
       synchronized (this) {
-        if (!(upstream instanceof UnsafeInMemorySorter.SortedIterator && nextUpstream == null
-          && numRecords > 0)) {
+        if (inMemSorter == null || numRecords <= 0) {
           return 0L;
         }

-        UnsafeInMemorySorter.SortedIterator inMemIterator =
-          ((UnsafeInMemorySorter.SortedIterator) upstream).clone();
+        long currentPageNumber = upstream.getCurrentPageNumber();

-        ShuffleWriteMetrics writeMetrics = new ShuffleWriteMetrics();
+        ShuffleWriteMetrics writeMetrics = new ShuffleWriteMetrics();
         // Iterate over the records that have not been returned and spill them.
         final UnsafeSorterSpillWriter spillWriter =
           new UnsafeSorterSpillWriter(blockManager, fileBufferSizeBytes, writeMetrics, numRecords);
-        spillIterator(inMemIterator, spillWriter);
+        spillIterator(upstream, spillWriter);
         spillWriters.add(spillWriter);
-        nextUpstream = spillWriter.getReader(serializerManager);
+        upstream = spillWriter.getReader(serializerManager);

         long released = 0L;
         synchronized
[spark] branch branch-3.0 updated: [SPARK-32900][CORE] Allow UnsafeExternalSorter to spill when there are nulls

This is an automated email from the ASF dual-hosted git repository. hvanhovell pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new 2e94d9a [SPARK-32900][CORE] Allow UnsafeExternalSorter to spill when there are nulls

commit 2e94d9af6e0d0c06eaba650dc70f07cc5b2bd791
Author: Tom van Bussel
AuthorDate: Thu Sep 17 12:35:40 2020 +0200

    [SPARK-32900][CORE] Allow UnsafeExternalSorter to spill when there are nulls

    ### What changes were proposed in this pull request?
    This PR changes how `UnsafeExternalSorter.SpillableIterator` checks whether it has already spilled: it now checks whether `inMemSorter` is null. It also allows it to spill `UnsafeSorterIterator`s other than `UnsafeInMemorySorter.SortedIterator`.

    ### Why are the changes needed?
    Before this PR, `UnsafeExternalSorter.SpillableIterator` could not spill when the input contains NULLs and radix sorting is used. Spark decided whether the iterator had not yet spilled by checking whether `upstream` was an instance of `UnsafeInMemorySorter.SortedIterator`. When radix sorting is used and the input contains NULLs, however, `upstream` is an instance of `UnsafeExternalSorter.ChainedIterator` instead, and Spark will assume [...]

    ### Does this PR introduce _any_ user-facing change?
    No.

    ### How was this patch tested?
    A test was added to `UnsafeExternalSorterSuite` (and therefore also to `UnsafeExternalSorterRadixSortSuite`). I manually confirmed that the test failed in `UnsafeExternalSorterRadixSortSuite` without this patch.

    Closes #29772 from tomvanbussel/SPARK-32900.

    Authored-by: Tom van Bussel
    Signed-off-by: herman
    (cherry picked from commit e5e54a3614ffd2a9150921e84e5b813d5cbf285a)
    Signed-off-by: herman
---
 .../unsafe/sort/UnsafeExternalSorter.java      | 69 +-
 .../unsafe/sort/UnsafeInMemorySorter.java      |  1 +
 .../unsafe/sort/UnsafeSorterIterator.java      |  2 +
 .../unsafe/sort/UnsafeSorterSpillMerger.java   |  5 ++
 .../unsafe/sort/UnsafeSorterSpillReader.java   |  5 ++
 .../unsafe/sort/UnsafeExternalSorterSuite.java | 33 +++
 6 files changed, 88 insertions(+), 27 deletions(-)

diff --git a/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java b/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java
index 55e4e60..71b9a5b 100644
--- a/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java
+++ b/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java
@@ -501,11 +501,15 @@ public final class UnsafeExternalSorter extends MemoryConsumer {
    */
   class SpillableIterator extends UnsafeSorterIterator {
     private UnsafeSorterIterator upstream;
-    private UnsafeSorterIterator nextUpstream = null;
     private MemoryBlock lastPage = null;
     private boolean loaded = false;
     private int numRecords = 0;

+    private Object currentBaseObject;
+    private long currentBaseOffset;
+    private int currentRecordLength;
+    private long currentKeyPrefix;
+
     SpillableIterator(UnsafeSorterIterator inMemIterator) {
       this.upstream = inMemIterator;
       this.numRecords = inMemIterator.getNumRecords();
@@ -516,23 +520,26 @@ public final class UnsafeExternalSorter extends MemoryConsumer {
       return numRecords;
     }

+    @Override
+    public long getCurrentPageNumber() {
+      throw new UnsupportedOperationException();
+    }
+
     public long spill() throws IOException {
       synchronized (this) {
-        if (!(upstream instanceof UnsafeInMemorySorter.SortedIterator && nextUpstream == null
-          && numRecords > 0)) {
+        if (inMemSorter == null || numRecords <= 0) {
           return 0L;
         }

-        UnsafeInMemorySorter.SortedIterator inMemIterator =
-          ((UnsafeInMemorySorter.SortedIterator) upstream).clone();
+        long currentPageNumber = upstream.getCurrentPageNumber();

-        ShuffleWriteMetrics writeMetrics = new ShuffleWriteMetrics();
+        ShuffleWriteMetrics writeMetrics = new ShuffleWriteMetrics();
         // Iterate over the records that have not been returned and spill them.
         final UnsafeSorterSpillWriter spillWriter =
           new UnsafeSorterSpillWriter(blockManager, fileBufferSizeBytes, writeMetrics, numRecords);
-        spillIterator(inMemIterator, spillWriter);
+        spillIterator(upstream, spillWriter);
         spillWriters.add(spillWriter);
-        nextUpstream = spillWriter.getReader(serializerManager);
+        upstream = spillWriter.getReader(serializerManager);

         long released = 0L;
         synchronized
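The essence of the fix is the spill-eligibility check. The sketch below uses hypothetical stand-in types, not Spark's real classes, to contrast the old type-based check with the new null-based one described in the commit message:

```scala
// Hypothetical stand-ins for the real sorter classes, for illustration only.
sealed trait UnsafeSorterIterator
final class SortedIterator extends UnsafeSorterIterator   // plain in-memory sort output
final class ChainedIterator extends UnsafeSorterIterator  // produced when NULLs meet radix sort

class SpillableIterator(var inMemSorter: AnyRef,
                        var upstream: UnsafeSorterIterator,
                        var numRecords: Int) {
  // Before SPARK-32900: gated on the concrete iterator type, so a
  // ChainedIterator upstream could never spill.
  def canSpillBefore: Boolean =
    upstream.isInstanceOf[SortedIterator] && numRecords > 0

  // After SPARK-32900: gated on whether the in-memory sorter still holds
  // records, independent of the iterator's concrete type.
  def canSpillAfter: Boolean =
    inMemSorter != null && numRecords > 0
}
```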
[spark] branch master updated (92b75dc -> e5e54a3)

This is an automated email from the ASF dual-hosted git repository. hvanhovell pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git.

 from 92b75dc [SPARK-32508][SQL] Disallow empty part col values in partition spec before static partition writing
 add  e5e54a3 [SPARK-32900][CORE] Allow UnsafeExternalSorter to spill when there are nulls

No new revisions were added by this update.

Summary of changes:
 .../unsafe/sort/UnsafeExternalSorter.java      | 69 +-
 .../unsafe/sort/UnsafeInMemorySorter.java      |  1 +
 .../unsafe/sort/UnsafeSorterIterator.java      |  2 +
 .../unsafe/sort/UnsafeSorterSpillMerger.java   |  5 ++
 .../unsafe/sort/UnsafeSorterSpillReader.java   |  5 ++
 .../unsafe/sort/UnsafeExternalSorterSuite.java | 33 +++
 6 files changed, 88 insertions(+), 27 deletions(-)
[spark] branch branch-2.4 updated: [SPARK-32900][CORE] Allow UnsafeExternalSorter to spill when there are nulls
This is an automated email from the ASF dual-hosted git repository. hvanhovell pushed a commit to branch branch-2.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-2.4 by this push: new 2fa68a6 [SPARK-32900][CORE] Allow UnsafeExternalSorter to spill when there are nulls 2fa68a6 is described below commit 2fa68a669cc83521c7257d844202790933ae9771 Author: Tom van Bussel AuthorDate: Thu Sep 17 12:35:40 2020 +0200 [SPARK-32900][CORE] Allow UnsafeExternalSorter to spill when there are nulls ### What changes were proposed in this pull request? This PR changes the way `UnsafeExternalSorter.SpillableIterator` checks whether it has spilled already, by checking whether `inMemSorter` is null. It also allows it to spill other `UnsafeSorterIterator`s than `UnsafeInMemorySorter.SortedIterator`. ### Why are the changes needed? Before this PR `UnsafeExternalSorter.SpillableIterator` could not spill when there are NULLs in the input and radix sorting is used. Currently, Spark determines whether UnsafeExternalSorter.SpillableIterator has not spilled yet by checking whether `upstream` is an instance of `UnsafeInMemorySorter.SortedIterator`. When radix sorting is used and there are NULLs in the input however, `upstream` will be an instance of `UnsafeExternalSorter.ChainedIterator` instead, and Spark will assume [...] ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? A test was added to `UnsafeExternalSorterSuite` (and therefore also to `UnsafeExternalSorterRadixSortSuite`). I manually confirmed that the test failed in `UnsafeExternalSorterRadixSortSuite` without this patch. Closes #29772 from tomvanbussel/SPARK-32900. Authored-by: Tom van Bussel Signed-off-by: herman (cherry picked from commit e5e54a3614ffd2a9150921e84e5b813d5cbf285a) Signed-off-by: herman --- .../unsafe/sort/UnsafeExternalSorter.java | 69 +- .../unsafe/sort/UnsafeInMemorySorter.java | 1 + .../unsafe/sort/UnsafeSorterIterator.java | 2 + .../unsafe/sort/UnsafeSorterSpillMerger.java | 5 ++ .../unsafe/sort/UnsafeSorterSpillReader.java | 5 ++ .../unsafe/sort/UnsafeExternalSorterSuite.java | 33 +++ 6 files changed, 88 insertions(+), 27 deletions(-) diff --git a/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java b/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java index a6a2076..f720ccd 100644 --- a/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java +++ b/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java @@ -505,11 +505,15 @@ public final class UnsafeExternalSorter extends MemoryConsumer { */ class SpillableIterator extends UnsafeSorterIterator { private UnsafeSorterIterator upstream; -private UnsafeSorterIterator nextUpstream = null; private MemoryBlock lastPage = null; private boolean loaded = false; private int numRecords = 0; +private Object currentBaseObject; +private long currentBaseOffset; +private int currentRecordLength; +private long currentKeyPrefix; + SpillableIterator(UnsafeSorterIterator inMemIterator) { this.upstream = inMemIterator; this.numRecords = inMemIterator.getNumRecords(); @@ -520,23 +524,26 @@ public final class UnsafeExternalSorter extends MemoryConsumer { return numRecords; } +@Override +public long getCurrentPageNumber() { + throw new UnsupportedOperationException(); +} + public long spill() throws IOException { synchronized (this) { -if (!(upstream instanceof 
UnsafeInMemorySorter.SortedIterator && nextUpstream == null - && numRecords > 0)) { +if (inMemSorter == null || numRecords <= 0) { return 0L; } -UnsafeInMemorySorter.SortedIterator inMemIterator = - ((UnsafeInMemorySorter.SortedIterator) upstream).clone(); +long currentPageNumber = upstream.getCurrentPageNumber(); - ShuffleWriteMetrics writeMetrics = new ShuffleWriteMetrics(); +ShuffleWriteMetrics writeMetrics = new ShuffleWriteMetrics(); // Iterate over the records that have not been returned and spill them. final UnsafeSorterSpillWriter spillWriter = new UnsafeSorterSpillWriter(blockManager, fileBufferSizeBytes, writeMetrics, numRecords); -spillIterator(inMemIterator, spillWriter); +spillIterator(upstream, spillWriter); spillWriters.add(spillWriter); -nextUpstream = spillWriter.getReader(serializerManager); +upstream = spillWriter.getReader(serializerManager); long released = 0L; synchronized
[spark] branch branch-3.0 updated: [SPARK-32900][CORE] Allow UnsafeExternalSorter to spill when there are nulls
This is an automated email from the ASF dual-hosted git repository. hvanhovell pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new 2e94d9a [SPARK-32900][CORE] Allow UnsafeExternalSorter to spill when there are nulls 2e94d9a is described below commit 2e94d9af6e0d0c06eaba650dc70f07cc5b2bd791 Author: Tom van Bussel AuthorDate: Thu Sep 17 12:35:40 2020 +0200 [SPARK-32900][CORE] Allow UnsafeExternalSorter to spill when there are nulls ### What changes were proposed in this pull request? This PR changes the way `UnsafeExternalSorter.SpillableIterator` checks whether it has spilled already, by checking whether `inMemSorter` is null. It also allows it to spill other `UnsafeSorterIterator`s than `UnsafeInMemorySorter.SortedIterator`. ### Why are the changes needed? Before this PR `UnsafeExternalSorter.SpillableIterator` could not spill when there are NULLs in the input and radix sorting is used. Currently, Spark determines whether UnsafeExternalSorter.SpillableIterator has not spilled yet by checking whether `upstream` is an instance of `UnsafeInMemorySorter.SortedIterator`. When radix sorting is used and there are NULLs in the input however, `upstream` will be an instance of `UnsafeExternalSorter.ChainedIterator` instead, and Spark will assume [...] ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? A test was added to `UnsafeExternalSorterSuite` (and therefore also to `UnsafeExternalSorterRadixSortSuite`). I manually confirmed that the test failed in `UnsafeExternalSorterRadixSortSuite` without this patch. Closes #29772 from tomvanbussel/SPARK-32900. Authored-by: Tom van Bussel Signed-off-by: herman (cherry picked from commit e5e54a3614ffd2a9150921e84e5b813d5cbf285a) Signed-off-by: herman --- .../unsafe/sort/UnsafeExternalSorter.java | 69 +- .../unsafe/sort/UnsafeInMemorySorter.java | 1 + .../unsafe/sort/UnsafeSorterIterator.java | 2 + .../unsafe/sort/UnsafeSorterSpillMerger.java | 5 ++ .../unsafe/sort/UnsafeSorterSpillReader.java | 5 ++ .../unsafe/sort/UnsafeExternalSorterSuite.java | 33 +++ 6 files changed, 88 insertions(+), 27 deletions(-) diff --git a/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java b/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java index 55e4e60..71b9a5b 100644 --- a/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java +++ b/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java @@ -501,11 +501,15 @@ public final class UnsafeExternalSorter extends MemoryConsumer { */ class SpillableIterator extends UnsafeSorterIterator { private UnsafeSorterIterator upstream; -private UnsafeSorterIterator nextUpstream = null; private MemoryBlock lastPage = null; private boolean loaded = false; private int numRecords = 0; +private Object currentBaseObject; +private long currentBaseOffset; +private int currentRecordLength; +private long currentKeyPrefix; + SpillableIterator(UnsafeSorterIterator inMemIterator) { this.upstream = inMemIterator; this.numRecords = inMemIterator.getNumRecords(); @@ -516,23 +520,26 @@ public final class UnsafeExternalSorter extends MemoryConsumer { return numRecords; } +@Override +public long getCurrentPageNumber() { + throw new UnsupportedOperationException(); +} + public long spill() throws IOException { synchronized (this) { -if (!(upstream instanceof 
UnsafeInMemorySorter.SortedIterator && nextUpstream == null - && numRecords > 0)) { +if (inMemSorter == null || numRecords <= 0) { return 0L; } -UnsafeInMemorySorter.SortedIterator inMemIterator = - ((UnsafeInMemorySorter.SortedIterator) upstream).clone(); +long currentPageNumber = upstream.getCurrentPageNumber(); - ShuffleWriteMetrics writeMetrics = new ShuffleWriteMetrics(); +ShuffleWriteMetrics writeMetrics = new ShuffleWriteMetrics(); // Iterate over the records that have not been returned and spill them. final UnsafeSorterSpillWriter spillWriter = new UnsafeSorterSpillWriter(blockManager, fileBufferSizeBytes, writeMetrics, numRecords); -spillIterator(inMemIterator, spillWriter); +spillIterator(upstream, spillWriter); spillWriters.add(spillWriter); -nextUpstream = spillWriter.getReader(serializerManager); +upstream = spillWriter.getReader(serializerManager); long released = 0L; synchronized
[spark] branch master updated (92b75dc -> e5e54a3)
This is an automated email from the ASF dual-hosted git repository. hvanhovell pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 92b75dc [SPARK-32508][SQL] Disallow empty part col values in partition spec before static partition writing add e5e54a3 [SPARK-32900][CORE] Allow UnsafeExternalSorter to spill when there are nulls No new revisions were added by this update. Summary of changes: .../unsafe/sort/UnsafeExternalSorter.java | 69 +- .../unsafe/sort/UnsafeInMemorySorter.java | 1 + .../unsafe/sort/UnsafeSorterIterator.java | 2 + .../unsafe/sort/UnsafeSorterSpillMerger.java | 5 ++ .../unsafe/sort/UnsafeSorterSpillReader.java | 5 ++ .../unsafe/sort/UnsafeExternalSorterSuite.java | 33 +++ 6 files changed, 88 insertions(+), 27 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-2.4 updated: [SPARK-32900][CORE] Allow UnsafeExternalSorter to spill when there are nulls
This is an automated email from the ASF dual-hosted git repository. hvanhovell pushed a commit to branch branch-2.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-2.4 by this push: new 2fa68a6 [SPARK-32900][CORE] Allow UnsafeExternalSorter to spill when there are nulls 2fa68a6 is described below commit 2fa68a669cc83521c7257d844202790933ae9771 Author: Tom van Bussel AuthorDate: Thu Sep 17 12:35:40 2020 +0200 [SPARK-32900][CORE] Allow UnsafeExternalSorter to spill when there are nulls ### What changes were proposed in this pull request? This PR changes the way `UnsafeExternalSorter.SpillableIterator` checks whether it has spilled already, by checking whether `inMemSorter` is null. It also allows it to spill other `UnsafeSorterIterator`s than `UnsafeInMemorySorter.SortedIterator`. ### Why are the changes needed? Before this PR `UnsafeExternalSorter.SpillableIterator` could not spill when there are NULLs in the input and radix sorting is used. Currently, Spark determines whether UnsafeExternalSorter.SpillableIterator has not spilled yet by checking whether `upstream` is an instance of `UnsafeInMemorySorter.SortedIterator`. When radix sorting is used and there are NULLs in the input however, `upstream` will be an instance of `UnsafeExternalSorter.ChainedIterator` instead, and Spark will assume [...] ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? A test was added to `UnsafeExternalSorterSuite` (and therefore also to `UnsafeExternalSorterRadixSortSuite`). I manually confirmed that the test failed in `UnsafeExternalSorterRadixSortSuite` without this patch. Closes #29772 from tomvanbussel/SPARK-32900. Authored-by: Tom van Bussel Signed-off-by: herman (cherry picked from commit e5e54a3614ffd2a9150921e84e5b813d5cbf285a) Signed-off-by: herman --- .../unsafe/sort/UnsafeExternalSorter.java | 69 +- .../unsafe/sort/UnsafeInMemorySorter.java | 1 + .../unsafe/sort/UnsafeSorterIterator.java | 2 + .../unsafe/sort/UnsafeSorterSpillMerger.java | 5 ++ .../unsafe/sort/UnsafeSorterSpillReader.java | 5 ++ .../unsafe/sort/UnsafeExternalSorterSuite.java | 33 +++ 6 files changed, 88 insertions(+), 27 deletions(-) diff --git a/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java b/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java index a6a2076..f720ccd 100644 --- a/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java +++ b/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java @@ -505,11 +505,15 @@ public final class UnsafeExternalSorter extends MemoryConsumer { */ class SpillableIterator extends UnsafeSorterIterator { private UnsafeSorterIterator upstream; -private UnsafeSorterIterator nextUpstream = null; private MemoryBlock lastPage = null; private boolean loaded = false; private int numRecords = 0; +private Object currentBaseObject; +private long currentBaseOffset; +private int currentRecordLength; +private long currentKeyPrefix; + SpillableIterator(UnsafeSorterIterator inMemIterator) { this.upstream = inMemIterator; this.numRecords = inMemIterator.getNumRecords(); @@ -520,23 +524,26 @@ public final class UnsafeExternalSorter extends MemoryConsumer { return numRecords; } +@Override +public long getCurrentPageNumber() { + throw new UnsupportedOperationException(); +} + public long spill() throws IOException { synchronized (this) { -if (!(upstream instanceof 
UnsafeInMemorySorter.SortedIterator && nextUpstream == null - && numRecords > 0)) { +if (inMemSorter == null || numRecords <= 0) { return 0L; } -UnsafeInMemorySorter.SortedIterator inMemIterator = - ((UnsafeInMemorySorter.SortedIterator) upstream).clone(); +long currentPageNumber = upstream.getCurrentPageNumber(); - ShuffleWriteMetrics writeMetrics = new ShuffleWriteMetrics(); +ShuffleWriteMetrics writeMetrics = new ShuffleWriteMetrics(); // Iterate over the records that have not been returned and spill them. final UnsafeSorterSpillWriter spillWriter = new UnsafeSorterSpillWriter(blockManager, fileBufferSizeBytes, writeMetrics, numRecords); -spillIterator(inMemIterator, spillWriter); +spillIterator(upstream, spillWriter); spillWriters.add(spillWriter); -nextUpstream = spillWriter.getReader(serializerManager); +upstream = spillWriter.getReader(serializerManager); long released = 0L; synchronized
[spark] branch branch-3.0 updated: [SPARK-32900][CORE] Allow UnsafeExternalSorter to spill when there are nulls
This is an automated email from the ASF dual-hosted git repository. hvanhovell pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new 2e94d9a [SPARK-32900][CORE] Allow UnsafeExternalSorter to spill when there are nulls 2e94d9a is described below commit 2e94d9af6e0d0c06eaba650dc70f07cc5b2bd791 Author: Tom van Bussel AuthorDate: Thu Sep 17 12:35:40 2020 +0200 [SPARK-32900][CORE] Allow UnsafeExternalSorter to spill when there are nulls ### What changes were proposed in this pull request? This PR changes the way `UnsafeExternalSorter.SpillableIterator` checks whether it has spilled already, by checking whether `inMemSorter` is null. It also allows it to spill other `UnsafeSorterIterator`s than `UnsafeInMemorySorter.SortedIterator`. ### Why are the changes needed? Before this PR `UnsafeExternalSorter.SpillableIterator` could not spill when there are NULLs in the input and radix sorting is used. Currently, Spark determines whether UnsafeExternalSorter.SpillableIterator has not spilled yet by checking whether `upstream` is an instance of `UnsafeInMemorySorter.SortedIterator`. When radix sorting is used and there are NULLs in the input however, `upstream` will be an instance of `UnsafeExternalSorter.ChainedIterator` instead, and Spark will assume [...] ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? A test was added to `UnsafeExternalSorterSuite` (and therefore also to `UnsafeExternalSorterRadixSortSuite`). I manually confirmed that the test failed in `UnsafeExternalSorterRadixSortSuite` without this patch. Closes #29772 from tomvanbussel/SPARK-32900. Authored-by: Tom van Bussel Signed-off-by: herman (cherry picked from commit e5e54a3614ffd2a9150921e84e5b813d5cbf285a) Signed-off-by: herman --- .../unsafe/sort/UnsafeExternalSorter.java | 69 +- .../unsafe/sort/UnsafeInMemorySorter.java | 1 + .../unsafe/sort/UnsafeSorterIterator.java | 2 + .../unsafe/sort/UnsafeSorterSpillMerger.java | 5 ++ .../unsafe/sort/UnsafeSorterSpillReader.java | 5 ++ .../unsafe/sort/UnsafeExternalSorterSuite.java | 33 +++ 6 files changed, 88 insertions(+), 27 deletions(-) diff --git a/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java b/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java index 55e4e60..71b9a5b 100644 --- a/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java +++ b/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java @@ -501,11 +501,15 @@ public final class UnsafeExternalSorter extends MemoryConsumer { */ class SpillableIterator extends UnsafeSorterIterator { private UnsafeSorterIterator upstream; -private UnsafeSorterIterator nextUpstream = null; private MemoryBlock lastPage = null; private boolean loaded = false; private int numRecords = 0; +private Object currentBaseObject; +private long currentBaseOffset; +private int currentRecordLength; +private long currentKeyPrefix; + SpillableIterator(UnsafeSorterIterator inMemIterator) { this.upstream = inMemIterator; this.numRecords = inMemIterator.getNumRecords(); @@ -516,23 +520,26 @@ public final class UnsafeExternalSorter extends MemoryConsumer { return numRecords; } +@Override +public long getCurrentPageNumber() { + throw new UnsupportedOperationException(); +} + public long spill() throws IOException { synchronized (this) { -if (!(upstream instanceof 
UnsafeInMemorySorter.SortedIterator && nextUpstream == null - && numRecords > 0)) { +if (inMemSorter == null || numRecords <= 0) { return 0L; } -UnsafeInMemorySorter.SortedIterator inMemIterator = - ((UnsafeInMemorySorter.SortedIterator) upstream).clone(); +long currentPageNumber = upstream.getCurrentPageNumber(); - ShuffleWriteMetrics writeMetrics = new ShuffleWriteMetrics(); +ShuffleWriteMetrics writeMetrics = new ShuffleWriteMetrics(); // Iterate over the records that have not been returned and spill them. final UnsafeSorterSpillWriter spillWriter = new UnsafeSorterSpillWriter(blockManager, fileBufferSizeBytes, writeMetrics, numRecords); -spillIterator(inMemIterator, spillWriter); +spillIterator(upstream, spillWriter); spillWriters.add(spillWriter); -nextUpstream = spillWriter.getReader(serializerManager); +upstream = spillWriter.getReader(serializerManager); long released = 0L; synchronized
[spark] branch master updated (92b75dc -> e5e54a3)
This is an automated email from the ASF dual-hosted git repository. hvanhovell pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 92b75dc [SPARK-32508][SQL] Disallow empty part col values in partition spec before static partition writing add e5e54a3 [SPARK-32900][CORE] Allow UnsafeExternalSorter to spill when there are nulls No new revisions were added by this update. Summary of changes: .../unsafe/sort/UnsafeExternalSorter.java | 69 +- .../unsafe/sort/UnsafeInMemorySorter.java | 1 + .../unsafe/sort/UnsafeSorterIterator.java | 2 + .../unsafe/sort/UnsafeSorterSpillMerger.java | 5 ++ .../unsafe/sort/UnsafeSorterSpillReader.java | 5 ++ .../unsafe/sort/UnsafeExternalSorterSuite.java | 33 +++ 6 files changed, 88 insertions(+), 27 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-2.4 updated: [SPARK-32900][CORE] Allow UnsafeExternalSorter to spill when there are nulls
This is an automated email from the ASF dual-hosted git repository. hvanhovell pushed a commit to branch branch-2.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-2.4 by this push: new 2fa68a6 [SPARK-32900][CORE] Allow UnsafeExternalSorter to spill when there are nulls 2fa68a6 is described below commit 2fa68a669cc83521c7257d844202790933ae9771 Author: Tom van Bussel AuthorDate: Thu Sep 17 12:35:40 2020 +0200 [SPARK-32900][CORE] Allow UnsafeExternalSorter to spill when there are nulls ### What changes were proposed in this pull request? This PR changes the way `UnsafeExternalSorter.SpillableIterator` checks whether it has spilled already, by checking whether `inMemSorter` is null. It also allows it to spill other `UnsafeSorterIterator`s than `UnsafeInMemorySorter.SortedIterator`. ### Why are the changes needed? Before this PR `UnsafeExternalSorter.SpillableIterator` could not spill when there are NULLs in the input and radix sorting is used. Currently, Spark determines whether UnsafeExternalSorter.SpillableIterator has not spilled yet by checking whether `upstream` is an instance of `UnsafeInMemorySorter.SortedIterator`. When radix sorting is used and there are NULLs in the input however, `upstream` will be an instance of `UnsafeExternalSorter.ChainedIterator` instead, and Spark will assume [...] ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? A test was added to `UnsafeExternalSorterSuite` (and therefore also to `UnsafeExternalSorterRadixSortSuite`). I manually confirmed that the test failed in `UnsafeExternalSorterRadixSortSuite` without this patch. Closes #29772 from tomvanbussel/SPARK-32900. Authored-by: Tom van Bussel Signed-off-by: herman (cherry picked from commit e5e54a3614ffd2a9150921e84e5b813d5cbf285a) Signed-off-by: herman --- .../unsafe/sort/UnsafeExternalSorter.java | 69 +- .../unsafe/sort/UnsafeInMemorySorter.java | 1 + .../unsafe/sort/UnsafeSorterIterator.java | 2 + .../unsafe/sort/UnsafeSorterSpillMerger.java | 5 ++ .../unsafe/sort/UnsafeSorterSpillReader.java | 5 ++ .../unsafe/sort/UnsafeExternalSorterSuite.java | 33 +++ 6 files changed, 88 insertions(+), 27 deletions(-) diff --git a/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java b/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java index a6a2076..f720ccd 100644 --- a/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java +++ b/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java @@ -505,11 +505,15 @@ public final class UnsafeExternalSorter extends MemoryConsumer { */ class SpillableIterator extends UnsafeSorterIterator { private UnsafeSorterIterator upstream; -private UnsafeSorterIterator nextUpstream = null; private MemoryBlock lastPage = null; private boolean loaded = false; private int numRecords = 0; +private Object currentBaseObject; +private long currentBaseOffset; +private int currentRecordLength; +private long currentKeyPrefix; + SpillableIterator(UnsafeSorterIterator inMemIterator) { this.upstream = inMemIterator; this.numRecords = inMemIterator.getNumRecords(); @@ -520,23 +524,26 @@ public final class UnsafeExternalSorter extends MemoryConsumer { return numRecords; } +@Override +public long getCurrentPageNumber() { + throw new UnsupportedOperationException(); +} + public long spill() throws IOException { synchronized (this) { -if (!(upstream instanceof 
UnsafeInMemorySorter.SortedIterator && nextUpstream == null - && numRecords > 0)) { +if (inMemSorter == null || numRecords <= 0) { return 0L; } -UnsafeInMemorySorter.SortedIterator inMemIterator = - ((UnsafeInMemorySorter.SortedIterator) upstream).clone(); +long currentPageNumber = upstream.getCurrentPageNumber(); - ShuffleWriteMetrics writeMetrics = new ShuffleWriteMetrics(); +ShuffleWriteMetrics writeMetrics = new ShuffleWriteMetrics(); // Iterate over the records that have not been returned and spill them. final UnsafeSorterSpillWriter spillWriter = new UnsafeSorterSpillWriter(blockManager, fileBufferSizeBytes, writeMetrics, numRecords); -spillIterator(inMemIterator, spillWriter); +spillIterator(upstream, spillWriter); spillWriters.add(spillWriter); -nextUpstream = spillWriter.getReader(serializerManager); +upstream = spillWriter.getReader(serializerManager); long released = 0L; synchronized
[spark] branch branch-3.0 updated: [SPARK-32900][CORE] Allow UnsafeExternalSorter to spill when there are nulls
This is an automated email from the ASF dual-hosted git repository.

hvanhovell pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new 2e94d9a  [SPARK-32900][CORE] Allow UnsafeExternalSorter to spill when there are nulls
2e94d9a is described below

commit 2e94d9af6e0d0c06eaba650dc70f07cc5b2bd791
Author: Tom van Bussel
AuthorDate: Thu Sep 17 12:35:40 2020 +0200

    [SPARK-32900][CORE] Allow UnsafeExternalSorter to spill when there are nulls

    ### What changes were proposed in this pull request?

    This PR changes the way `UnsafeExternalSorter.SpillableIterator` checks whether it has spilled already, by checking whether `inMemSorter` is null. It also allows it to spill other `UnsafeSorterIterator`s than `UnsafeInMemorySorter.SortedIterator`.

    ### Why are the changes needed?

    Before this PR `UnsafeExternalSorter.SpillableIterator` could not spill when there are NULLs in the input and radix sorting is used. Currently, Spark determines whether `UnsafeExternalSorter.SpillableIterator` has not spilled yet by checking whether `upstream` is an instance of `UnsafeInMemorySorter.SortedIterator`. When radix sorting is used and there are NULLs in the input however, `upstream` will be an instance of `UnsafeExternalSorter.ChainedIterator` instead, and Spark will assume [...]

    ### Does this PR introduce _any_ user-facing change?

    No

    ### How was this patch tested?

    A test was added to `UnsafeExternalSorterSuite` (and therefore also to `UnsafeExternalSorterRadixSortSuite`). I manually confirmed that the test failed in `UnsafeExternalSorterRadixSortSuite` without this patch.

    Closes #29772 from tomvanbussel/SPARK-32900.

    Authored-by: Tom van Bussel
    Signed-off-by: herman
    (cherry picked from commit e5e54a3614ffd2a9150921e84e5b813d5cbf285a)
    Signed-off-by: herman
---
 .../unsafe/sort/UnsafeExternalSorter.java      | 69 +-
 .../unsafe/sort/UnsafeInMemorySorter.java      |  1 +
 .../unsafe/sort/UnsafeSorterIterator.java      |  2 +
 .../unsafe/sort/UnsafeSorterSpillMerger.java   |  5 ++
 .../unsafe/sort/UnsafeSorterSpillReader.java   |  5 ++
 .../unsafe/sort/UnsafeExternalSorterSuite.java | 33 +++
 6 files changed, 88 insertions(+), 27 deletions(-)

diff --git a/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java b/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java
index 55e4e60..71b9a5b 100644
--- a/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java
+++ b/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java
@@ -501,11 +501,15 @@ public final class UnsafeExternalSorter extends MemoryConsumer {
    */
   class SpillableIterator extends UnsafeSorterIterator {
     private UnsafeSorterIterator upstream;
-    private UnsafeSorterIterator nextUpstream = null;
     private MemoryBlock lastPage = null;
     private boolean loaded = false;
     private int numRecords = 0;

+    private Object currentBaseObject;
+    private long currentBaseOffset;
+    private int currentRecordLength;
+    private long currentKeyPrefix;
+
     SpillableIterator(UnsafeSorterIterator inMemIterator) {
       this.upstream = inMemIterator;
       this.numRecords = inMemIterator.getNumRecords();
@@ -516,23 +520,26 @@ public final class UnsafeExternalSorter extends MemoryConsumer {
       return numRecords;
     }

+    @Override
+    public long getCurrentPageNumber() {
+      throw new UnsupportedOperationException();
+    }
+
     public long spill() throws IOException {
       synchronized (this) {
-        if (!(upstream instanceof UnsafeInMemorySorter.SortedIterator && nextUpstream == null
-          && numRecords > 0)) {
+        if (inMemSorter == null || numRecords <= 0) {
           return 0L;
         }

-        UnsafeInMemorySorter.SortedIterator inMemIterator =
-          ((UnsafeInMemorySorter.SortedIterator) upstream).clone();
+        long currentPageNumber = upstream.getCurrentPageNumber();

-       ShuffleWriteMetrics writeMetrics = new ShuffleWriteMetrics();
+        ShuffleWriteMetrics writeMetrics = new ShuffleWriteMetrics();
         // Iterate over the records that have not been returned and spill them.
         final UnsafeSorterSpillWriter spillWriter =
           new UnsafeSorterSpillWriter(blockManager, fileBufferSizeBytes, writeMetrics, numRecords);
-        spillIterator(inMemIterator, spillWriter);
+        spillIterator(upstream, spillWriter);
         spillWriters.add(spillWriter);
-        nextUpstream = spillWriter.getReader(serializerManager);
+        upstream = spillWriter.getReader(serializerManager);

         long released = 0L;
         synchronized [...]
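The heart of the fix above is replacing a fragile type test (`upstream instanceof UnsafeInMemorySorter.SortedIterator`) with an explicit state check (`inMemSorter == null`). The sketch below shows that pattern in isolation; `SpillableLongIterator`, its fields, and the byte accounting are invented stand-ins for the real Spark classes, not the actual implementation:

```scala
// Minimal sketch of the spill-state pattern (names invented; not the real
// Spark classes). "Have we spilled already?" is answered by an explicit
// reference nulled out on spill, never by the concrete type of `upstream`.
final class SpillableLongIterator(records: Array[Long]) extends Iterator[Long] {
  // Stands in for inMemSorter: null means the in-memory data is gone.
  private var inMemory: Array[Long] = records
  // May be replaced by any iterator type (chained, spilled, ...) over time.
  private var upstream: Iterator[Long] = records.iterator

  /** Spill the records not yet returned; report the bytes released. */
  def spill(): Long = synchronized {
    if (inMemory == null) {
      0L // already spilled; a second spill is a no-op
    } else {
      // In Spark this writes through an UnsafeSorterSpillWriter; here we
      // just copy the remaining records to simulate the on-disk run.
      val spilled = upstream.toArray
      upstream = spilled.iterator         // future reads come from the "spill"
      val released = inMemory.length * 8L // pretend each record held 8 bytes
      inMemory = null                     // flip the explicit spill marker
      released
    }
  }

  override def hasNext: Boolean = upstream.hasNext
  override def next(): Long = upstream.next()
}

// Usage: spilling mid-iteration does not disturb the records still to come.
val it = new SpillableLongIterator(Array(1L, 2L, 3L))
it.next()   // 1
it.spill()  // frees the in-memory copy, redirects upstream to the spilled run
it.spill()  // 0L: the null check short-circuits
it.toList   // List(2, 3)
```

Tracking spill state explicitly keeps the check correct no matter which concrete iterator type ends up upstream, which is exactly what the NULL-handling `ChainedIterator` broke in the instanceof-based version.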
[spark] branch master updated (92b75dc -> e5e54a3)
This is an automated email from the ASF dual-hosted git repository.

hvanhovell pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

 from 92b75dc  [SPARK-32508][SQL] Disallow empty part col values in partition spec before static partition writing
  add e5e54a3  [SPARK-32900][CORE] Allow UnsafeExternalSorter to spill when there are nulls

No new revisions were added by this update.

Summary of changes:
 .../unsafe/sort/UnsafeExternalSorter.java      | 69 +-
 .../unsafe/sort/UnsafeInMemorySorter.java      |  1 +
 .../unsafe/sort/UnsafeSorterIterator.java      |  2 +
 .../unsafe/sort/UnsafeSorterSpillMerger.java   |  5 ++
 .../unsafe/sort/UnsafeSorterSpillReader.java   |  5 ++
 .../unsafe/sort/UnsafeExternalSorterSuite.java | 33 +++
 6 files changed, 88 insertions(+), 27 deletions(-)
[spark] branch master updated (bd38e0b -> 92b75dc)
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

 from bd38e0b  [SPARK-32903][SQL] GeneratePredicate should be able to eliminate common sub-expressions
  add 92b75dc  [SPARK-32508][SQL] Disallow empty part col values in partition spec before static partition writing

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/execution/datasources/rules.scala    | 22 ++
 .../org/apache/spark/sql/sources/InsertSuite.scala | 22 ++
 .../org/apache/spark/sql/hive/InsertSuite.scala    | 22 ++
 3 files changed, 62 insertions(+), 4 deletions(-)
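For readers skimming the summary: SPARK-32508 tightens analysis of static partition specs so that an empty string is rejected before anything reaches the write path. A rough illustration of the user-visible effect, with a made-up table and column (the exact error message may differ):

```scala
// Hypothetical repro of the SPARK-32508 check (table/column names invented).
spark.sql("CREATE TABLE pt (c1 INT) PARTITIONED BY (p1 STRING) USING parquet")

// The empty static partition value is now rejected during analysis rather
// than flowing into the static-partition write path:
spark.sql("INSERT INTO pt PARTITION (p1 = '') SELECT 1")
// => throws org.apache.spark.sql.AnalysisException complaining that the
//    partition spec contains an empty partition column value
```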