[spark] branch branch-3.0 updated: [SPARK-32906][SQL] Struct field names should not change after normalizing floats

2020-09-17 Thread viirya
This is an automated email from the ASF dual-hosted git repository.

viirya pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 2d55de5  [SPARK-32906][SQL] Struct field names should not change after normalizing floats
2d55de5 is described below

commit 2d55de5db06ff522a8e7fded3760c3f65c8bbf93
Author: Takeshi Yamamuro 
AuthorDate: Thu Sep 17 22:07:47 2020 -0700

[SPARK-32906][SQL] Struct field names should not change after normalizing floats

### What changes were proposed in this pull request?

This PR intends to fix a minor bug when normalizing floats for struct types: normalization rebuilt the struct without its original field names, so the names changed underneath the query.
```
scala> import org.apache.spark.sql.execution.aggregate.HashAggregateExec
scala> val df = Seq(Tuple1(Tuple1(-0.0d)), Tuple1(Tuple1(0.0d))).toDF("k")
scala> val agg = df.distinct()
scala> agg.explain()
== Physical Plan ==
*(2) HashAggregate(keys=[k#40], functions=[])
+- Exchange hashpartitioning(k#40, 200), true, [id=#62]
   +- *(1) HashAggregate(keys=[knownfloatingpointnormalized(if (isnull(k#40)) null else named_struct(col1, knownfloatingpointnormalized(normalizenanandzero(k#40._1)))) AS k#40], functions=[])
      +- *(1) LocalTableScan [k#40]

scala> val aggOutput = agg.queryExecution.sparkPlan.collect { case a: HashAggregateExec => a.output.head }
scala> aggOutput.foreach { attr => println(attr.prettyJson) }
### Final Aggregate ###
[ {
  "class" : "org.apache.spark.sql.catalyst.expressions.AttributeReference",
  "num-children" : 0,
  "name" : "k",
  "dataType" : {
"type" : "struct",
"fields" : [ {
  "name" : "_1",
^^^
  "type" : "double",
  "nullable" : false,
  "metadata" : { }
} ]
  },
  "nullable" : true,
  "metadata" : { },
  "exprId" : {
"product-class" : "org.apache.spark.sql.catalyst.expressions.ExprId",
"id" : 40,
"jvmId" : "a824e83f-933e-4b85-a1ff-577b5a0e2366"
  },
  "qualifier" : [ ]
} ]

### Partial Aggregate ###
[ {
  "class" : "org.apache.spark.sql.catalyst.expressions.AttributeReference",
  "num-children" : 0,
  "name" : "k",
  "dataType" : {
"type" : "struct",
"fields" : [ {
  "name" : "col1",

  "type" : "double",
  "nullable" : true,
  "metadata" : { }
} ]
  },
  "nullable" : true,
  "metadata" : { },
  "exprId" : {
"product-class" : "org.apache.spark.sql.catalyst.expressions.ExprId",
"id" : 40,
"jvmId" : "a824e83f-933e-4b85-a1ff-577b5a0e2366"
  },
  "qualifier" : [ ]
} ]
```
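For context on why floats are normalized before grouping at all: `-0.0d` and `0.0d` compare equal but carry different bit patterns, so hash-based aggregation must canonicalize them before hashing. A minimal standalone sketch (plain Scala, no Spark dependency; `normalizeDouble` is an illustrative helper, not Spark's actual `NormalizeNaNAndZero`):

```
object NormalizeSketch {
  // Illustrative stand-in for float normalization (NaN handling omitted):
  // maps -0.0 to 0.0 and leaves every other value alone.
  def normalizeDouble(d: Double): Double =
    if (d == 0.0d) 0.0d else d

  def main(args: Array[String]): Unit = {
    val bits = java.lang.Double.doubleToLongBits _
    println(-0.0d == 0.0d)                               // true: equal as values
    println(bits(-0.0d) == bits(0.0d))                   // false: different bits, so different hashes
    println(bits(normalizeDouble(-0.0d)) == bits(0.0d))  // true after normalizing
  }
}
```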

### Why are the changes needed?

Bugfix: without this change, float normalization silently renamed struct fields (in the example above, `_1` became `col1`).

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Added tests.

Closes #29780 from maropu/FixBugInNormalizedFloatingNumbers.

Authored-by: Takeshi Yamamuro 
Signed-off-by: Liang-Chi Hsieh 
(cherry picked from commit b49aaa33e13814a448be51a7e65a29cb515b8248)
Signed-off-by: Liang-Chi Hsieh 
---
 .../spark/sql/catalyst/optimizer/NormalizeFloatingNumbers.scala   | 6 +++---
 .../test/scala/org/apache/spark/sql/DataFrameAggregateSuite.scala | 8 
 2 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NormalizeFloatingNumbers.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NormalizeFloatingNumbers.scala
index 8d5dbc7..f0cf671 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NormalizeFloatingNumbers.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NormalizeFloatingNumbers.scala
@@ -120,10 +120,10 @@ object NormalizeFloatingNumbers extends Rule[LogicalPlan] {
       KnownFloatingPointNormalized(NormalizeNaNAndZero(expr))
 
     case _ if expr.dataType.isInstanceOf[StructType] =>
-      val fields = expr.dataType.asInstanceOf[StructType].fields.indices.map { i =>
-        normalize(GetStructField(expr, i))
+      val fields = expr.dataType.asInstanceOf[StructType].fieldNames.zipWithIndex.map {
+        case (name, i) => Seq(Literal(name), normalize(GetStructField(expr, i)))
       }
-      val struct = CreateStruct(fields)
+      val struct = CreateNamedStruct(fields.flatten.toSeq)
       KnownFloatingPointNormalized(If(IsNull(expr), Literal(null, struct.dataType), struct))
 
     case _ if expr.dataType.isInstanceOf[ArrayType] =>
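
The root cause of the renaming is visible in the Catalyst helpers used above: `CreateStruct` invents positional field names, while `CreateNamedStruct` takes alternating name/value children, which is why the fix interleaves `Literal(name)` with each normalized field. A hedged REPL sketch against the Spark 3.x Catalyst API:

```
import org.apache.spark.sql.catalyst.expressions.{CreateNamedStruct, CreateStruct, Literal}

// CreateStruct falls back to positional names (col1, col2, ...) for
// unnamed children -- exactly the renaming seen in the bug report...
val unnamed = CreateStruct(Seq(Literal(1.0d)))
println(unnamed.dataType.simpleString)  // struct<col1:double>

// ...while CreateNamedStruct keeps the names it is given.
val named = CreateNamedStruct(Seq(Literal("_1"), Literal(1.0d)))
println(named.dataType.simpleString)    // struct<_1:double>
```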
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/DataFrameAggregateSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/DataFrameAggregateSuite.scala
index 54327b3..2cb7790 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/DataFrameAggregateSuite.scala

[spark] branch master updated (75dd864 -> b49aaa3)

2020-09-17 Thread viirya
This is an automated email from the ASF dual-hosted git repository.

viirya pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 75dd864  [SPARK-32908][SQL] Fix target error calculation in `percentile_approx()`
 add b49aaa3  [SPARK-32906][SQL] Struct field names should not change after normalizing floats

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/catalyst/optimizer/NormalizeFloatingNumbers.scala   | 6 +++---
 .../test/scala/org/apache/spark/sql/DataFrameAggregateSuite.scala | 8 
 2 files changed, 11 insertions(+), 3 deletions(-)





[spark] branch branch-2.4 updated (2fa68a6 -> eed7c62)

2020-09-17 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 2fa68a6  [SPARK-32900][CORE] Allow UnsafeExternalSorter to spill when there are nulls
 add eed7c62  [SPARK-32908][SQL][2.4] Fix target error calculation in `percentile_approx()`

No new revisions were added by this update.

Summary of changes:
 .../sql/catalyst/util/QuantileSummaries.scala  |   2 +-
 .../test-data/percentile_approx-input.csv.bz2  | Bin 0 -> 124614 bytes
 .../sql/ApproximatePercentileQuerySuite.scala  |  21 -
 3 files changed, 21 insertions(+), 2 deletions(-)
 create mode 100644 sql/core/src/test/resources/test-data/percentile_approx-input.csv.bz2





[spark] branch branch-3.0 updated: [SPARK-32908][SQL] Fix target error calculation in `percentile_approx()`

2020-09-17 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 5581a92  [SPARK-32908][SQL] Fix target error calculation in `percentile_approx()`
5581a92 is described below

commit 5581a92bb8b1d96278796e6c18812c5a2931db11
Author: Max Gekk 
AuthorDate: Fri Sep 18 10:47:06 2020 +0900

[SPARK-32908][SQL] Fix target error calculation in `percentile_approx()`

### What changes were proposed in this pull request?
1. Change the target error calculation according to the paper [Space-Efficient Online Computation of Quantile Summaries](http://infolab.stanford.edu/~datar/courses/cs361a/papers/quantiles.pdf). It says that the error is `e = max(g_i, delta_i)/2` (see page 59); a clear explanation is also given in [ε-approximate quantiles](http://www.mathcs.emory.edu/~cheung/Courses/584/Syllabus/08-Quantile/Greenwald.html#proofprop1). A standalone sketch of this bound follows the list below.
2. Added a test to check different accuracies.
3. Added an input CSV file `percentile_approx-input.csv.bz2` to the test resource folder `sql/core/src/test/resources/test-data`.
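
For reference, a minimal standalone sketch of the corrected bound (plain Scala; `Stats` is a hypothetical stand-in for the sampled tuples of Spark's `QuantileSummaries`, not its real API). A Greenwald-Khanna summary answers any rank query within `max_i (g_i + delta_i) / 2`, which is exactly the slack the patch below allows when scanning for a covering sample:

```
// Hypothetical stand-in for one sampled tuple (value, g, delta) of a
// Greenwald-Khanna quantile summary.
final case class Stats(value: Double, g: Long, delta: Long)

// Worst-case rank error of the summary: max over samples of (g + delta),
// halved -- the corrected targetError from the patch below.
def targetError(sampled: Seq[Stats]): Long =
  sampled.map(s => s.g + s.delta).max / 2

// Example: samples with g + delta of 2, 4, and 3 give an error bound of 2.
println(targetError(Seq(Stats(1.0, 2, 0), Stats(5.0, 1, 3), Stats(9.0, 2, 1))))
```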

### Why are the changes needed?
To fix incorrect percentile calculation, see an example in SPARK-32908.

### Does this PR introduce _any_ user-facing change?
Yes

### How was this patch tested?
- By running existing tests in `QuantileSummariesSuite` and in `ApproximatePercentileQuerySuite`.
- Added new test `SPARK-32908: maximum target error in percentile_approx` to `ApproximatePercentileQuerySuite`.

Closes #29784 from MaxGekk/fix-percentile_approx-2.

Authored-by: Max Gekk 
Signed-off-by: HyukjinKwon 
(cherry picked from commit 75dd86400c3c2348a4139586fbbead840512b909)
Signed-off-by: HyukjinKwon 
---
 .../sql/catalyst/util/QuantileSummaries.scala  |   2 +-
 .../test-data/percentile_approx-input.csv.bz2  | Bin 0 -> 124614 bytes
 .../sql/ApproximatePercentileQuerySuite.scala  |  21 -
 3 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/QuantileSummaries.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/QuantileSummaries.scala
index 3a0490d..e07aa0f 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/QuantileSummaries.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/QuantileSummaries.scala
@@ -254,7 +254,7 @@ class QuantileSummaries(
 
     // Target rank
    val rank = math.ceil(quantile * count).toLong
-    val targetError = relativeError * count
+    val targetError = sampled.map(s => s.delta + s.g).max / 2
     // Minimum rank at current sample
     var minRank = 0L
     var i = 0
diff --git a/sql/core/src/test/resources/test-data/percentile_approx-input.csv.bz2 b/sql/core/src/test/resources/test-data/percentile_approx-input.csv.bz2
new file mode 100644
index 0000000..f85e289
Binary files /dev/null and b/sql/core/src/test/resources/test-data/percentile_approx-input.csv.bz2 differ
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/ApproximatePercentileQuerySuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/ApproximatePercentileQuerySuite.scala
index 2b4abed..4991e39 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/ApproximatePercentileQuerySuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/ApproximatePercentileQuerySuite.scala
@@ -150,7 +150,7 @@ class ApproximatePercentileQuerySuite extends QueryTest with SharedSparkSession
       (1 to 1000).toDF("col").createOrReplaceTempView(table)
       checkAnswer(
         spark.sql(s"SELECT percentile_approx(col, array(0.25 + 0.25D), 200 + 800) FROM $table"),
-        Row(Seq(499))
+        Row(Seq(500))
       )
     }
   }
@@ -296,4 +296,23 @@ class ApproximatePercentileQuerySuite extends QueryTest with SharedSparkSession
       buffer.quantileSummaries
     assert(buffer.isCompressed)
   }
+
+  test("SPARK-32908: maximum target error in percentile_approx") {
+    withTempView(table) {
+      spark.read
+        .schema("col int")
+        .csv(testFile("test-data/percentile_approx-input.csv.bz2"))
+        .repartition(1)
+        .createOrReplaceTempView(table)
+      checkAnswer(
+        spark.sql(
+          s"""SELECT
+             |  percentile_approx(col, 0.77, 1000),
+             |  percentile_approx(col, 0.77, 1),
+             |  percentile_approx(col, 0.77, 10),
+             |  percentile_approx(col, 0.77, 100)
+             |FROM $table""".stripMargin),
+        Row(18, 17, 17, 17))
+    }
+  }
 }
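
As a quick sanity check outside the test suite, a hypothetical `spark-shell` session (assumes a running `SparkSession` named `spark`) should now return the exact median of a uniform 1..1000 column at a generous accuracy, matching the `Row(Seq(500))` assertion in the first hunk above:

```
// With the corrected target error, percentile_approx at accuracy 1000
// converges to the exact median of 1..1000.
val df = spark.range(1, 1001).toDF("col")
df.selectExpr("percentile_approx(col, 0.5, 1000)").show()
// expected single cell: 500
```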





[spark] branch master updated (9d6221b -> 75dd864)

2020-09-17 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 9d6221b  [SPARK-18409][ML][FOLLOWUP] LSH approxNearestNeighbors optimization 2
 add 75dd864  [SPARK-32908][SQL] Fix target error calculation in `percentile_approx()`

No new revisions were added by this update.

Summary of changes:
 .../sql/catalyst/util/QuantileSummaries.scala  |   2 +-
 .../test-data/percentile_approx-input.csv.bz2  | Bin 0 -> 124614 bytes
 .../sql/ApproximatePercentileQuerySuite.scala  |  21 -
 3 files changed, 21 insertions(+), 2 deletions(-)
 create mode 100644 sql/core/src/test/resources/test-data/percentile_approx-input.csv.bz2





[spark] branch branch-2.4 updated (2fa68a6 -> eed7c62)

2020-09-17 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 2fa68a6  [SPARK-32900][CORE] Allow UnsafeExternalSorter to spill when 
there are nulls
 add eed7c62  [SPARK-32908][SQL][2.4] Fix target error calculation in 
`percentile_approx()`

No new revisions were added by this update.

Summary of changes:
 .../sql/catalyst/util/QuantileSummaries.scala  |   2 +-
 .../test-data/percentile_approx-input.csv.bz2  | Bin 0 -> 124614 bytes
 .../sql/ApproximatePercentileQuerySuite.scala  |  21 -
 3 files changed, 21 insertions(+), 2 deletions(-)
 create mode 100644 
sql/core/src/test/resources/test-data/percentile_approx-input.csv.bz2


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated: [SPARK-32908][SQL] Fix target error calculation in `percentile_approx()`

2020-09-17 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 5581a92  [SPARK-32908][SQL] Fix target error calculation in 
`percentile_approx()`
5581a92 is described below

commit 5581a92bb8b1d96278796e6c18812c5a2931db11
Author: Max Gekk 
AuthorDate: Fri Sep 18 10:47:06 2020 +0900

[SPARK-32908][SQL] Fix target error calculation in `percentile_approx()`

### What changes were proposed in this pull request?
1. Change the target error calculation according to the paper 
[Space-Efficient Online Computation of Quantile 
Summaries](http://infolab.stanford.edu/~datar/courses/cs361a/papers/quantiles.pdf).
 It says that the error `e = max(gi, deltai)/2` (see the page 59). Also this 
has clear explanation [ε-approximate 
quantiles](http://www.mathcs.emory.edu/~cheung/Courses/584/Syllabus/08-Quantile/Greenwald.html#proofprop1).
2. Added a test to check different accuracies.
3. Added an input CSV file `percentile_approx-input.csv.bz2` to the 
resource folder `sql/catalyst/src/main/resources` for the test.

### Why are the changes needed?
To fix incorrect percentile calculation, see an example in SPARK-32908.

### Does this PR introduce _any_ user-facing change?
Yes

### How was this patch tested?
- By running existing tests in `QuantileSummariesSuite` and in 
`ApproximatePercentileQuerySuite`.
- Added new test `SPARK-32908: maximum target error in percentile_approx` 
to `ApproximatePercentileQuerySuite`.

Closes #29784 from MaxGekk/fix-percentile_approx-2.

Authored-by: Max Gekk 
Signed-off-by: HyukjinKwon 
(cherry picked from commit 75dd86400c3c2348a4139586fbbead840512b909)
Signed-off-by: HyukjinKwon 
---
 .../sql/catalyst/util/QuantileSummaries.scala  |   2 +-
 .../test-data/percentile_approx-input.csv.bz2  | Bin 0 -> 124614 bytes
 .../sql/ApproximatePercentileQuerySuite.scala  |  21 -
 3 files changed, 21 insertions(+), 2 deletions(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/QuantileSummaries.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/QuantileSummaries.scala
index 3a0490d..e07aa0f 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/QuantileSummaries.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/QuantileSummaries.scala
@@ -254,7 +254,7 @@ class QuantileSummaries(
 
 // Target rank
 val rank = math.ceil(quantile * count).toLong
-val targetError = relativeError * count
+val targetError = sampled.map(s => s.delta + s.g).max / 2
 // Minimum rank at current sample
 var minRank = 0L
 var i = 0
diff --git 
a/sql/core/src/test/resources/test-data/percentile_approx-input.csv.bz2 
b/sql/core/src/test/resources/test-data/percentile_approx-input.csv.bz2
new file mode 100644
index 000..f85e289
Binary files /dev/null and 
b/sql/core/src/test/resources/test-data/percentile_approx-input.csv.bz2 differ
diff --git 
a/sql/core/src/test/scala/org/apache/spark/sql/ApproximatePercentileQuerySuite.scala
 
b/sql/core/src/test/scala/org/apache/spark/sql/ApproximatePercentileQuerySuite.scala
index 2b4abed..4991e39 100644
--- 
a/sql/core/src/test/scala/org/apache/spark/sql/ApproximatePercentileQuerySuite.scala
+++ 
b/sql/core/src/test/scala/org/apache/spark/sql/ApproximatePercentileQuerySuite.scala
@@ -150,7 +150,7 @@ class ApproximatePercentileQuerySuite extends QueryTest 
with SharedSparkSession
   (1 to 1000).toDF("col").createOrReplaceTempView(table)
   checkAnswer(
 spark.sql(s"SELECT percentile_approx(col, array(0.25 + 0.25D), 200 + 
800) FROM $table"),
-Row(Seq(499))
+Row(Seq(500))
   )
 }
   }
@@ -296,4 +296,23 @@ class ApproximatePercentileQuerySuite extends QueryTest 
with SharedSparkSession
 buffer.quantileSummaries
 assert(buffer.isCompressed)
   }
+
+  test("SPARK-32908: maximum target error in percentile_approx") {
+withTempView(table) {
+  spark.read
+.schema("col int")
+.csv(testFile("test-data/percentile_approx-input.csv.bz2"))
+.repartition(1)
+.createOrReplaceTempView(table)
+  checkAnswer(
+spark.sql(
+  s"""SELECT
+ |  percentile_approx(col, 0.77, 1000),
+ |  percentile_approx(col, 0.77, 1),
+ |  percentile_approx(col, 0.77, 10),
+ |  percentile_approx(col, 0.77, 100)
+ |FROM $table""".stripMargin),
+Row(18, 17, 17, 17))
+}
+  }
 }


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (9d6221b -> 75dd864)

2020-09-17 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 9d6221b  [SPARK-18409][ML][FOLLOWUP] LSH approxNearestNeighbors 
optimization 2
 add 75dd864  [SPARK-32908][SQL] Fix target error calculation in 
`percentile_approx()`

No new revisions were added by this update.

Summary of changes:
 .../sql/catalyst/util/QuantileSummaries.scala  |   2 +-
 .../test-data/percentile_approx-input.csv.bz2  | Bin 0 -> 124614 bytes
 .../sql/ApproximatePercentileQuerySuite.scala  |  21 -
 3 files changed, 21 insertions(+), 2 deletions(-)
 create mode 100644 
sql/core/src/test/resources/test-data/percentile_approx-input.csv.bz2


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-2.4 updated (2fa68a6 -> eed7c62)

2020-09-17 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 2fa68a6  [SPARK-32900][CORE] Allow UnsafeExternalSorter to spill when 
there are nulls
 add eed7c62  [SPARK-32908][SQL][2.4] Fix target error calculation in 
`percentile_approx()`

No new revisions were added by this update.

Summary of changes:
 .../sql/catalyst/util/QuantileSummaries.scala  |   2 +-
 .../test-data/percentile_approx-input.csv.bz2  | Bin 0 -> 124614 bytes
 .../sql/ApproximatePercentileQuerySuite.scala  |  21 -
 3 files changed, 21 insertions(+), 2 deletions(-)
 create mode 100644 
sql/core/src/test/resources/test-data/percentile_approx-input.csv.bz2


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated: [SPARK-32908][SQL] Fix target error calculation in `percentile_approx()`

2020-09-17 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 5581a92  [SPARK-32908][SQL] Fix target error calculation in 
`percentile_approx()`
5581a92 is described below

commit 5581a92bb8b1d96278796e6c18812c5a2931db11
Author: Max Gekk 
AuthorDate: Fri Sep 18 10:47:06 2020 +0900

[SPARK-32908][SQL] Fix target error calculation in `percentile_approx()`

### What changes were proposed in this pull request?
1. Change the target error calculation according to the paper 
[Space-Efficient Online Computation of Quantile 
Summaries](http://infolab.stanford.edu/~datar/courses/cs361a/papers/quantiles.pdf).
 It says that the error `e = max(gi, deltai)/2` (see the page 59). Also this 
has clear explanation [ε-approximate 
quantiles](http://www.mathcs.emory.edu/~cheung/Courses/584/Syllabus/08-Quantile/Greenwald.html#proofprop1).
2. Added a test to check different accuracies.
3. Added an input CSV file `percentile_approx-input.csv.bz2` to the 
resource folder `sql/catalyst/src/main/resources` for the test.

### Why are the changes needed?
To fix incorrect percentile calculation, see an example in SPARK-32908.

### Does this PR introduce _any_ user-facing change?
Yes

### How was this patch tested?
- By running existing tests in `QuantileSummariesSuite` and in 
`ApproximatePercentileQuerySuite`.
- Added new test `SPARK-32908: maximum target error in percentile_approx` 
to `ApproximatePercentileQuerySuite`.

Closes #29784 from MaxGekk/fix-percentile_approx-2.

Authored-by: Max Gekk 
Signed-off-by: HyukjinKwon 
(cherry picked from commit 75dd86400c3c2348a4139586fbbead840512b909)
Signed-off-by: HyukjinKwon 
---
 .../sql/catalyst/util/QuantileSummaries.scala  |   2 +-
 .../test-data/percentile_approx-input.csv.bz2  | Bin 0 -> 124614 bytes
 .../sql/ApproximatePercentileQuerySuite.scala  |  21 -
 3 files changed, 21 insertions(+), 2 deletions(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/QuantileSummaries.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/QuantileSummaries.scala
index 3a0490d..e07aa0f 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/QuantileSummaries.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/QuantileSummaries.scala
@@ -254,7 +254,7 @@ class QuantileSummaries(
 
 // Target rank
 val rank = math.ceil(quantile * count).toLong
-val targetError = relativeError * count
+val targetError = sampled.map(s => s.delta + s.g).max / 2
 // Minimum rank at current sample
 var minRank = 0L
 var i = 0
diff --git 
a/sql/core/src/test/resources/test-data/percentile_approx-input.csv.bz2 
b/sql/core/src/test/resources/test-data/percentile_approx-input.csv.bz2
new file mode 100644
index 000..f85e289
Binary files /dev/null and 
b/sql/core/src/test/resources/test-data/percentile_approx-input.csv.bz2 differ
diff --git 
a/sql/core/src/test/scala/org/apache/spark/sql/ApproximatePercentileQuerySuite.scala
 
b/sql/core/src/test/scala/org/apache/spark/sql/ApproximatePercentileQuerySuite.scala
index 2b4abed..4991e39 100644
--- 
a/sql/core/src/test/scala/org/apache/spark/sql/ApproximatePercentileQuerySuite.scala
+++ 
b/sql/core/src/test/scala/org/apache/spark/sql/ApproximatePercentileQuerySuite.scala
@@ -150,7 +150,7 @@ class ApproximatePercentileQuerySuite extends QueryTest 
with SharedSparkSession
   (1 to 1000).toDF("col").createOrReplaceTempView(table)
   checkAnswer(
 spark.sql(s"SELECT percentile_approx(col, array(0.25 + 0.25D), 200 + 
800) FROM $table"),
-Row(Seq(499))
+Row(Seq(500))
   )
 }
   }
@@ -296,4 +296,23 @@ class ApproximatePercentileQuerySuite extends QueryTest 
with SharedSparkSession
 buffer.quantileSummaries
 assert(buffer.isCompressed)
   }
+
+  test("SPARK-32908: maximum target error in percentile_approx") {
+withTempView(table) {
+  spark.read
+.schema("col int")
+.csv(testFile("test-data/percentile_approx-input.csv.bz2"))
+.repartition(1)
+.createOrReplaceTempView(table)
+  checkAnswer(
+spark.sql(
+  s"""SELECT
+ |  percentile_approx(col, 0.77, 1000),
+ |  percentile_approx(col, 0.77, 1),
+ |  percentile_approx(col, 0.77, 10),
+ |  percentile_approx(col, 0.77, 100)
+ |FROM $table""".stripMargin),
+Row(18, 17, 17, 17))
+}
+  }
 }


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (9d6221b -> 75dd864)

2020-09-17 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 9d6221b  [SPARK-18409][ML][FOLLOWUP] LSH approxNearestNeighbors 
optimization 2
 add 75dd864  [SPARK-32908][SQL] Fix target error calculation in 
`percentile_approx()`

No new revisions were added by this update.

Summary of changes:
 .../sql/catalyst/util/QuantileSummaries.scala  |   2 +-
 .../test-data/percentile_approx-input.csv.bz2  | Bin 0 -> 124614 bytes
 .../sql/ApproximatePercentileQuerySuite.scala  |  21 -
 3 files changed, 21 insertions(+), 2 deletions(-)
 create mode 100644 
sql/core/src/test/resources/test-data/percentile_approx-input.csv.bz2


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-2.4 updated (2fa68a6 -> eed7c62)

2020-09-17 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 2fa68a6  [SPARK-32900][CORE] Allow UnsafeExternalSorter to spill when 
there are nulls
 add eed7c62  [SPARK-32908][SQL][2.4] Fix target error calculation in 
`percentile_approx()`

No new revisions were added by this update.

Summary of changes:
 .../sql/catalyst/util/QuantileSummaries.scala  |   2 +-
 .../test-data/percentile_approx-input.csv.bz2  | Bin 0 -> 124614 bytes
 .../sql/ApproximatePercentileQuerySuite.scala  |  21 -
 3 files changed, 21 insertions(+), 2 deletions(-)
 create mode 100644 
sql/core/src/test/resources/test-data/percentile_approx-input.csv.bz2


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated: [SPARK-32908][SQL] Fix target error calculation in `percentile_approx()`

2020-09-17 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 5581a92  [SPARK-32908][SQL] Fix target error calculation in 
`percentile_approx()`
5581a92 is described below

commit 5581a92bb8b1d96278796e6c18812c5a2931db11
Author: Max Gekk 
AuthorDate: Fri Sep 18 10:47:06 2020 +0900

[SPARK-32908][SQL] Fix target error calculation in `percentile_approx()`

### What changes were proposed in this pull request?
1. Change the target error calculation according to the paper 
[Space-Efficient Online Computation of Quantile 
Summaries](http://infolab.stanford.edu/~datar/courses/cs361a/papers/quantiles.pdf).
 It says that the error `e = max(gi, deltai)/2` (see the page 59). Also this 
has clear explanation [ε-approximate 
quantiles](http://www.mathcs.emory.edu/~cheung/Courses/584/Syllabus/08-Quantile/Greenwald.html#proofprop1).
2. Added a test to check different accuracies.
3. Added an input CSV file `percentile_approx-input.csv.bz2` to the 
resource folder `sql/catalyst/src/main/resources` for the test.

### Why are the changes needed?
To fix incorrect percentile calculation, see an example in SPARK-32908.

### Does this PR introduce _any_ user-facing change?
Yes

### How was this patch tested?
- By running existing tests in `QuantileSummariesSuite` and in 
`ApproximatePercentileQuerySuite`.
- Added new test `SPARK-32908: maximum target error in percentile_approx` 
to `ApproximatePercentileQuerySuite`.

Closes #29784 from MaxGekk/fix-percentile_approx-2.

Authored-by: Max Gekk 
Signed-off-by: HyukjinKwon 
(cherry picked from commit 75dd86400c3c2348a4139586fbbead840512b909)
Signed-off-by: HyukjinKwon 
---
 .../sql/catalyst/util/QuantileSummaries.scala  |   2 +-
 .../test-data/percentile_approx-input.csv.bz2  | Bin 0 -> 124614 bytes
 .../sql/ApproximatePercentileQuerySuite.scala  |  21 -
 3 files changed, 21 insertions(+), 2 deletions(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/QuantileSummaries.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/QuantileSummaries.scala
index 3a0490d..e07aa0f 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/QuantileSummaries.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/QuantileSummaries.scala
@@ -254,7 +254,7 @@ class QuantileSummaries(
 
 // Target rank
 val rank = math.ceil(quantile * count).toLong
-val targetError = relativeError * count
+val targetError = sampled.map(s => s.delta + s.g).max / 2
 // Minimum rank at current sample
 var minRank = 0L
 var i = 0
diff --git 
a/sql/core/src/test/resources/test-data/percentile_approx-input.csv.bz2 
b/sql/core/src/test/resources/test-data/percentile_approx-input.csv.bz2
new file mode 100644
index 000..f85e289
Binary files /dev/null and 
b/sql/core/src/test/resources/test-data/percentile_approx-input.csv.bz2 differ
diff --git 
a/sql/core/src/test/scala/org/apache/spark/sql/ApproximatePercentileQuerySuite.scala
 
b/sql/core/src/test/scala/org/apache/spark/sql/ApproximatePercentileQuerySuite.scala
index 2b4abed..4991e39 100644
--- 
a/sql/core/src/test/scala/org/apache/spark/sql/ApproximatePercentileQuerySuite.scala
+++ 
b/sql/core/src/test/scala/org/apache/spark/sql/ApproximatePercentileQuerySuite.scala
@@ -150,7 +150,7 @@ class ApproximatePercentileQuerySuite extends QueryTest with SharedSparkSession
   (1 to 1000).toDF("col").createOrReplaceTempView(table)
   checkAnswer(
 spark.sql(s"SELECT percentile_approx(col, array(0.25 + 0.25D), 200 + 
800) FROM $table"),
-Row(Seq(499))
+Row(Seq(500))
   )
 }
   }
@@ -296,4 +296,23 @@ class ApproximatePercentileQuerySuite extends QueryTest with SharedSparkSession
 buffer.quantileSummaries
 assert(buffer.isCompressed)
   }
+
+  test("SPARK-32908: maximum target error in percentile_approx") {
+withTempView(table) {
+  spark.read
+.schema("col int")
+.csv(testFile("test-data/percentile_approx-input.csv.bz2"))
+.repartition(1)
+.createOrReplaceTempView(table)
+  checkAnswer(
+spark.sql(
+  s"""SELECT
+ |  percentile_approx(col, 0.77, 1000),
+ |  percentile_approx(col, 0.77, 1),
+ |  percentile_approx(col, 0.77, 10),
+ |  percentile_approx(col, 0.77, 100)
+ |FROM $table""".stripMargin),
+Row(18, 17, 17, 17))
+}
+  }
 }


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (9d6221b -> 75dd864)

2020-09-17 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 9d6221b  [SPARK-18409][ML][FOLLOWUP] LSH approxNearestNeighbors optimization 2
 add 75dd864  [SPARK-32908][SQL] Fix target error calculation in `percentile_approx()`

No new revisions were added by this update.

Summary of changes:
 .../sql/catalyst/util/QuantileSummaries.scala  |   2 +-
 .../test-data/percentile_approx-input.csv.bz2  | Bin 0 -> 124614 bytes
 .../sql/ApproximatePercentileQuerySuite.scala  |  21 -
 3 files changed, 21 insertions(+), 2 deletions(-)
 create mode 100644 
sql/core/src/test/resources/test-data/percentile_approx-input.csv.bz2


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (68e0d5f -> 9d6221b)

2020-09-17 Thread ruifengz
This is an automated email from the ASF dual-hosted git repository.

ruifengz pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 68e0d5f  [SPARK-32902][SQL] Logging plan changes for AQE
 add 9d6221b  [SPARK-18409][ML][FOLLOWUP] LSH approxNearestNeighbors optimization 2

No new revisions were added by this update.

Summary of changes:
 .../scala/org/apache/spark/ml/feature/LSH.scala| 28 +++---
 1 file changed, 14 insertions(+), 14 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (4ced588 -> 68e0d5f)

2020-09-17 Thread yamamuro
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 4ced588  [SPARK-32635][SQL] Fix foldable propagation
 add 68e0d5f  [SPARK-32902][SQL] Logging plan changes for AQE

No new revisions were added by this update.

Summary of changes:
 .../execution/adaptive/AdaptiveSparkPlanExec.scala | 45 +-
 .../adaptive/AdaptiveQueryExecSuite.scala  | 20 ++
 2 files changed, 56 insertions(+), 9 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated: [SPARK-32635][SQL] Fix foldable propagation

2020-09-17 Thread yamamuro
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new ecc2f5d  [SPARK-32635][SQL] Fix foldable propagation
ecc2f5d is described below

commit ecc2f5d9e227b62f418d65708f516ffe8e690f96
Author: Peter Toth 
AuthorDate: Fri Sep 18 08:17:23 2020 +0900

[SPARK-32635][SQL] Fix foldable propagation

### What changes were proposed in this pull request?
This PR rewrites the `FoldablePropagation` rule to replace attribute references 
in a node only with foldables coming from the node's own children; a toy sketch 
of the principle follows this paragraph.
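
Conceptually, an attribute reference may be replaced by a literal only when 
every plan path below the node produces that same literal. A self-contained toy 
model of that check (assumed simplified classes, not Spark's Catalyst API):

```scala
// Toy plan model: each node reports which of its output columns are provably
// constant (foldable), and with what literal value.
sealed trait Plan { def foldables: Map[String, Int] }

final case class Scan(output: Seq[String]) extends Plan {
  def foldables: Map[String, Int] = Map.empty // nothing is provably constant
}

final case class Project(aliases: Map[String, Int], child: Plan) extends Plan {
  // Only literals this Project itself introduces may be propagated upward.
  def foldables: Map[String, Int] = aliases
}

final case class Union(children: Seq[Plan]) extends Plan {
  // A column is foldable above a Union only if every branch folds it to the
  // same literal; taking one branch's literal for all rows is exactly the
  // mistake visible in the buggy plan below.
  def foldables: Map[String, Int] =
    children.map(_.foldables).reduceLeft { (a, b) =>
      a.filter { case (k, v) => b.get(k).contains(v) }
    }
}
```

In the example that follows, `col2` folds to `1` in one union branch and `2` in 
the other, so no literal may be propagated above the union or through the 
cached relation built on top of it.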

Before this PR, in the case of this example (with 
`spark.sql.optimizer.excludedRules=org.apache.spark.sql.catalyst.optimizer.ConvertToLocalRelation` set):
```scala
val a = Seq("1").toDF("col1").withColumn("col2", lit("1"))
val b = Seq("2").toDF("col1").withColumn("col2", lit("2"))
val aub = a.union(b)
val c = aub.filter($"col1" === "2").cache()
val d = Seq("2").toDF("col4")
val r = d.join(aub, $"col2" === $"col4").select("col4")
val l = c.select("col2")
val df = l.join(r, $"col2" === $"col4", "LeftOuter")
df.show()
```
foldable propagation happens incorrectly. Both columns of the logged side-by-side 
plan change are identical except the line marked `!`, where the rule wrongly 
rewrites `Project [col2#6]` on top of the cached relation into `Project [1 AS col2#6]`:
```
 Join LeftOuter, (col2#6 = col4#34)
!:- Project [col2#6]   =>   :- Project [1 AS col2#6]
 :  +- InMemoryRelation [col1#4, col2#6], StorageLevel(disk, memory, deserialized, 1 replicas)
 :        +- Union
 :           :- *(1) Project [value#1 AS col1#4, 1 AS col2#6]
 :           :  +- *(1) Filter (isnotnull(value#1) AND (value#1 = 2))
 :           :     +- *(1) LocalTableScan [value#1]
 :           +- *(2) Project [value#10 AS col1#13, 2 AS col2#15]
 :              +- *(2) Filter (isnotnull(value#10) AND (value#10 = 2))
 :                 +- *(2) LocalTableScan [value#10]
 +- Project [col4#34]
    +- Join Inner, (col2#6 = col4#34)
       :- Project [value#31 AS col4#34]
       :  +- LocalRelation [value#31]
       +- Project [col2#6]
          +- Union false, false
             :- Project [1 AS col2#6]
             :  +- LocalRelation [value#1]
             +- Project [2 AS col2#15]
                +- LocalRelation [value#10]
```
and so the result is wrong:
```
+++
|col2|col4|
+++
|   1|null|
+++
```

After this PR, foldable propagation no longer happens incorrectly and the 
result is correct:
```
+++
|col2|col4|
+++
|   2|   2|
+++
```

### Why are the changes needed?
To fix a correctness issue.

### Does this PR introduce _any_ user-facing change?
Yes, fixes a correctness issue.

### How was this patch tested?
Existing and new UTs.

Closes #29771 from peter-toth/SPARK-32635-fix-foldable-propagation.

Authored-by: Peter Toth 
Signed-off-by: Takeshi Yamamuro 

[spark] branch master updated (ea3b979 -> 4ced588)

2020-09-17 Thread yamamuro
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from ea3b979  [SPARK-32889][SQL] orc table column name supports special characters
 add 4ced588  [SPARK-32635][SQL] Fix foldable propagation

No new revisions were added by this update.

Summary of changes:
 .../sql/catalyst/expressions/AttributeMap.scala|   2 +
 .../sql/catalyst/expressions/AttributeMap.scala|   2 +
 .../spark/sql/catalyst/optimizer/expressions.scala | 121 -
 .../org/apache/spark/sql/DataFrameSuite.scala  |  12 ++
 4 files changed, 88 insertions(+), 49 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-32635][SQL] Fix foldable propagation

2020-09-17 Thread yamamuro
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 4ced588  [SPARK-32635][SQL] Fix foldable propagation
4ced588 is described below

commit 4ced58862c707aa916f7a55d15c3887c94c9b210
Author: Peter Toth 
AuthorDate: Fri Sep 18 08:17:23 2020 +0900

[SPARK-32635][SQL] Fix foldable propagation

### What changes were proposed in this pull request?
This PR rewrites the `FoldablePropagation` rule to replace attribute references 
in a node only with foldables coming from the node's own children.

Before this PR, in the case of this example (with 
`spark.sql.optimizer.excludedRules=org.apache.spark.sql.catalyst.optimizer.ConvertToLocalRelation` set):
```scala
val a = Seq("1").toDF("col1").withColumn("col2", lit("1"))
val b = Seq("2").toDF("col1").withColumn("col2", lit("2"))
val aub = a.union(b)
val c = aub.filter($"col1" === "2").cache()
val d = Seq("2").toDF("col4")
val r = d.join(aub, $"col2" === $"col4").select("col4")
val l = c.select("col2")
val df = l.join(r, $"col2" === $"col4", "LeftOuter")
df.show()
```
foldable propagation happens incorrectly. Both columns of the logged side-by-side 
plan change are identical except the line marked `!`, where the rule wrongly 
rewrites `Project [col2#6]` on top of the cached relation into `Project [1 AS col2#6]`:
```
 Join LeftOuter, (col2#6 = col4#34)
!:- Project [col2#6]   =>   :- Project [1 AS col2#6]
 :  +- InMemoryRelation [col1#4, col2#6], StorageLevel(disk, memory, deserialized, 1 replicas)
 :        +- Union
 :           :- *(1) Project [value#1 AS col1#4, 1 AS col2#6]
 :           :  +- *(1) Filter (isnotnull(value#1) AND (value#1 = 2))
 :           :     +- *(1) LocalTableScan [value#1]
 :           +- *(2) Project [value#10 AS col1#13, 2 AS col2#15]
 :              +- *(2) Filter (isnotnull(value#10) AND (value#10 = 2))
 :                 +- *(2) LocalTableScan [value#10]
 +- Project [col4#34]
    +- Join Inner, (col2#6 = col4#34)
       :- Project [value#31 AS col4#34]
       :  +- LocalRelation [value#31]
       +- Project [col2#6]
          +- Union false, false
             :- Project [1 AS col2#6]
             :  +- LocalRelation [value#1]
             +- Project [2 AS col2#15]
                +- LocalRelation [value#10]
```
and so the result is wrong:
```
+++
|col2|col4|
+++
|   1|null|
+++
```

After this PR, foldable propagation no longer happens incorrectly and the 
result is correct:
```
+++
|col2|col4|
+++
|   2|   2|
+++
```

### Why are the changes needed?
To fix a correctness issue.

### Does this PR introduce _any_ user-facing change?
Yes, fixes a correctness issue.

### How was this patch tested?
Existing and new UTs.

Closes #29771 from peter-toth/SPARK-32635-fix-foldable-propagation.

Authored-by: Peter Toth 
Signed-off-by: Takeshi Yamamuro 
---
 

[spark] branch master updated (5817c58 -> ea3b979)

2020-09-17 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 5817c58  [SPARK-32909][SQL] Pass all `sql/hive-thriftserver` module UTs in Scala 2.13
 add ea3b979  [SPARK-32889][SQL] orc table column name supports special characters

No new revisions were added by this update.

Summary of changes:
 .../execution/datasources/orc/OrcFileFormat.scala  |  2 +-
 .../spark/sql/FileBasedDataSourceSuite.scala   | 14 
 .../spark/sql/hive/execution/SQLQuerySuite.scala   | 74 ++
 3 files changed, 64 insertions(+), 26 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (a8442c2 -> 5817c58)

2020-09-17 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from a8442c2  [SPARK-32926][TESTS] Add Scala 2.13 build test in GitHub Action
 add 5817c58  [SPARK-32909][SQL] Pass all `sql/hive-thriftserver` module UTs in Scala 2.13

No new revisions were added by this update.

Summary of changes:
 sql/core/src/main/scala/org/apache/spark/sql/execution/HiveResult.scala | 2 +-
 .../apache/spark/sql/hive/HiveMetastoreLazyInitializationSuite.scala| 2 ++
 2 files changed, 3 insertions(+), 1 deletion(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (88e87bc -> a8442c2)

2020-09-17 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 88e87bc  [SPARK-32887][DOC] Correct the typo for SHOW TABLE
 add a8442c2  [SPARK-32926][TESTS] Add Scala 2.13 build test in GitHub Action

No new revisions were added by this update.

Summary of changes:
 .github/workflows/build_and_test.yml | 26 ++
 1 file changed, 26 insertions(+)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated: [SPARK-32738][CORE][3.0] Should reduce the number of active threads if fatal error happens in `Inbox.process`

2020-09-17 Thread mridulm80
This is an automated email from the ASF dual-hosted git repository.

mridulm80 pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 17a5195  [SPARK-32738][CORE][3.0] Should reduce the number of active threads if fatal error happens in `Inbox.process`
17a5195 is described below

commit 17a5195dead75a4dffd150f3f69fa4983e05561b
Author: Zhenhua Wang 
AuthorDate: Thu Sep 17 12:22:35 2020 -0500

[SPARK-32738][CORE][3.0] Should reduce the number of active threads if fatal error happens in `Inbox.process`

This is a backport for [pr#29580](https://github.com/apache/spark/pull/29580) to branch 3.0.

### What changes were proposed in this pull request?

Processing for `ThreadSafeRpcEndpoint` is controlled by `numActiveThreads` in 
`Inbox`. Currently, if a fatal error happens during `Inbox.process`, 
`numActiveThreads` is not reduced, so other threads cannot process messages in 
that inbox, which causes the endpoint to "hang". For other types of endpoints, 
we should also keep `numActiveThreads` correct.

This problem is more serious in previous Spark 2.x versions, since the driver, 
executor, and block manager endpoints are all thread-safe endpoints.

To fix this, we should reduce the number of active threads if a fatal error 
happens in `Inbox.process`, as the sketch below illustrates.
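
As an illustration, a minimal self-contained sketch (a toy stand-in, not 
Spark's actual `Inbox` class) of the invariant the patch restores:

```scala
import scala.util.control.NonFatal

// Toy message loop: numActiveThreads must be decremented on every exit path of
// process(), including fatal errors (e.g. OutOfMemoryError) that the
// NonFatal(_) extractor deliberately does not match.
class ToyInbox {
  private var numActiveThreads = 0

  def process(action: => Unit): Unit = {
    synchronized { numActiveThreads += 1 }
    try action
    catch {
      case NonFatal(_) => () // handled; falls through to the decrement below
      case fatal: Throwable =>
        // The fix: restore the counter before rethrowing, so other threads
        // can still drain this inbox afterwards.
        synchronized { numActiveThreads -= 1 }
        throw fatal
    }
    synchronized { numActiveThreads -= 1 }
  }

  def activeThreads: Int = synchronized(numActiveThreads)
}
```

After a fatal error escapes `process`, `activeThreads` is back to 0, which is 
what the new `InboxSuite` test asserts against the real implementation.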

### Why are the changes needed?

`numActiveThreads` is not correct when fatal error happens and will cause 
the described problem.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Add a new test.

Closes #29763 from wzhfy/deal_with_fatal_error_3.0.

Authored-by: Zhenhua Wang 
Signed-off-by: Mridul Muralidharan gmail.com>
---
 .../scala/org/apache/spark/rpc/netty/Inbox.scala | 20 
 .../org/apache/spark/rpc/netty/InboxSuite.scala  | 13 +
 2 files changed, 33 insertions(+)

diff --git a/core/src/main/scala/org/apache/spark/rpc/netty/Inbox.scala b/core/src/main/scala/org/apache/spark/rpc/netty/Inbox.scala
index 2ed03f7..472401b 100644
--- a/core/src/main/scala/org/apache/spark/rpc/netty/Inbox.scala
+++ b/core/src/main/scala/org/apache/spark/rpc/netty/Inbox.scala
@@ -200,6 +200,16 @@ private[netty] class Inbox(val endpointName: String, val endpoint: RpcEndpoint)
* Calls action closure, and calls the endpoint's onError function in the case of exceptions.
*/
   private def safelyCall(endpoint: RpcEndpoint)(action: => Unit): Unit = {
+def dealWithFatalError(fatal: Throwable): Unit = {
+  inbox.synchronized {
+assert(numActiveThreads > 0, "The number of active threads should be positive.")
+// Should reduce the number of active threads before throw the error.
+numActiveThreads -= 1
+  }
+  logError(s"An error happened while processing message in the inbox for 
$endpointName", fatal)
+  throw fatal
+}
+
 try action catch {
   case NonFatal(e) =>
 try endpoint.onError(e) catch {
@@ -209,8 +219,18 @@ private[netty] class Inbox(val endpointName: String, val endpoint: RpcEndpoint)
 } else {
   logError("Ignoring error", ee)
 }
+  case fatal: Throwable =>
+dealWithFatalError(fatal)
 }
+  case fatal: Throwable =>
+dealWithFatalError(fatal)
 }
   }
 
+  // exposed only for testing
+  def getNumActiveThreads: Int = {
+inbox.synchronized {
+  inbox.numActiveThreads
+}
+  }
 }
diff --git a/core/src/test/scala/org/apache/spark/rpc/netty/InboxSuite.scala b/core/src/test/scala/org/apache/spark/rpc/netty/InboxSuite.scala
index c74c728..8b1c602 100644
--- a/core/src/test/scala/org/apache/spark/rpc/netty/InboxSuite.scala
+++ b/core/src/test/scala/org/apache/spark/rpc/netty/InboxSuite.scala
@@ -136,4 +136,17 @@ class InboxSuite extends SparkFunSuite {
 
 endpoint.verifySingleOnNetworkErrorMessage(cause, remoteAddress)
   }
+
+  test("SPARK-32738: should reduce the number of active threads when fatal 
error happens") {
+val endpoint = mock(classOf[TestRpcEndpoint])
+when(endpoint.receive).thenThrow(new OutOfMemoryError())
+
+val dispatcher = mock(classOf[Dispatcher])
+val inbox = new Inbox("name", endpoint)
+inbox.post(OneWayMessage(null, "hi"))
+intercept[OutOfMemoryError] {
+  inbox.process(dispatcher)
+}
+assert(inbox.getNumActiveThreads == 0)
+  }
 }


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (482a79a -> 88e87bc)

2020-09-17 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 482a79a  [SPARK-24994][SQL][FOLLOW-UP] Handle foldable, timezone and cleanup
 add 88e87bc  [SPARK-32887][DOC] Correct the typo for SHOW TABLE

No new revisions were added by this update.

Summary of changes:
 docs/sql-ref-syntax-aux-show-table.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated: [SPARK-32887][DOC] Correct the typo for SHOW TABLE

2020-09-17 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new b3b6f38  [SPARK-32887][DOC] Correct the typo for SHOW TABLE
b3b6f38 is described below

commit b3b6f381108b770c38753ebdb55e209361674008
Author: Udbhav30 
AuthorDate: Thu Sep 17 09:25:17 2020 -0700

[SPARK-32887][DOC] Correct the typo for SHOW TABLE

### What changes were proposed in this pull request?
Correct the typo in the Show Table document

### Why are the changes needed?
The current Show Table document's examples result in a parse error, so they are 
misleading to users (see the snippet below).
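
For instance (illustrative only; in Spark SQL, backquotes delimit identifiers, 
so a backquoted pattern does not parse as the string literal the statement expects):

```scala
// Assumes a running SparkSession `spark` and an existing `employee` table.
spark.sql("SHOW TABLE EXTENDED LIKE 'employe*'").show(truncate = false) // works
// spark.sql("SHOW TABLE EXTENDED LIKE `employe*`")  // fails with a parse error
```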

### Does this PR introduce _any_ user-facing change?
Yes, the document of show table is corrected now

### How was this patch tested?
NA

Closes #29758 from Udbhav30/showtable.

Authored-by: Udbhav30 
Signed-off-by: Dongjoon Hyun 
(cherry picked from commit 88e87bc8ebfa5aa1a8cc8928672749517ae0c41f)
Signed-off-by: Dongjoon Hyun 
---
 docs/sql-ref-syntax-aux-show-table.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/sql-ref-syntax-aux-show-table.md b/docs/sql-ref-syntax-aux-show-table.md
index 0ce0a3e..3314402 100644
--- a/docs/sql-ref-syntax-aux-show-table.md
+++ b/docs/sql-ref-syntax-aux-show-table.md
@@ -97,7 +97,7 @@ SHOW TABLE EXTENDED LIKE 'employee';
 
++-+---+--+
 
 -- showing the multiple table details with pattern matching
-SHOW TABLE EXTENDED  LIKE `employe*`;
+SHOW TABLE EXTENDED  LIKE 'employe*';
 
++-+---+--+
 |database|tableName|isTemporary| information   
   |
 
++-+---+--+
@@ -146,7 +146,7 @@ SHOW TABLE EXTENDED  LIKE `employe*`;
 
++-+--+---+
   
 -- show partition file system details
-SHOW TABLE EXTENDED  IN default LIKE `employee` PARTITION (`grade=1`);
+SHOW TABLE EXTENDED  IN default LIKE 'employee' PARTITION (grade=1);
 
++-+---+--+
 |database|tableName|isTemporary| information   
   |
 
++-+---+--+
@@ -169,7 +169,7 @@ SHOW TABLE EXTENDED  IN default LIKE `employee` PARTITION (`grade=1`);
 
++-+---+--+
 
 -- show partition file system details with regex fails as shown below
-SHOW TABLE EXTENDED  IN default LIKE `empl*` PARTITION (`grade=1`);
+SHOW TABLE EXTENDED  IN default LIKE 'empl*' PARTITION (grade=1);
 Error: Error running query: 
org.apache.spark.sql.catalyst.analysis.NoSuchTableException:
  Table or view 'emplo*' not found in database 'default'; (state=,code=0)
 ```


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (482a79a -> 88e87bc)

2020-09-17 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 482a79a  [SPARK-24994][SQL][FOLLOW-UP] Handle foldable, timezone and 
cleanup
 add 88e87bc  [SPARK-32887][DOC] Correct the typo for SHOW TABLE

No new revisions were added by this update.

Summary of changes:
 docs/sql-ref-syntax-aux-show-table.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated: [SPARK-32887][DOC] Correct the typo for SHOW TABLE

2020-09-17 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new b3b6f38  [SPARK-32887][DOC] Correct the typo for SHOW TABLE
b3b6f38 is described below

commit b3b6f381108b770c38753ebdb55e209361674008
Author: Udbhav30 
AuthorDate: Thu Sep 17 09:25:17 2020 -0700

[SPARK-32887][DOC] Correct the typo for SHOW TABLE

### What changes were proposed in this pull request?
Correct the typo in Show Table document

### Why are the changes needed?
Current Document of Show Table returns in parse error, so it is misleading 
to users

### Does this PR introduce _any_ user-facing change?
Yes, the document of show table is corrected now

### How was this patch tested?
NA

Closes #29758 from Udbhav30/showtable.

Authored-by: Udbhav30 
Signed-off-by: Dongjoon Hyun 
(cherry picked from commit 88e87bc8ebfa5aa1a8cc8928672749517ae0c41f)
Signed-off-by: Dongjoon Hyun 
---
 docs/sql-ref-syntax-aux-show-table.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/sql-ref-syntax-aux-show-table.md b/docs/sql-ref-syntax-aux-show-table.md
index 0ce0a3e..3314402 100644
--- a/docs/sql-ref-syntax-aux-show-table.md
+++ b/docs/sql-ref-syntax-aux-show-table.md
@@ -97,7 +97,7 @@ SHOW TABLE EXTENDED LIKE 'employee';
 
 +--------+---------+-----------+------------------------------+
 
 -- showing the multiple table details with pattern matching
-SHOW TABLE EXTENDED  LIKE `employe*`;
+SHOW TABLE EXTENDED  LIKE 'employe*';
 
 +--------+---------+-----------+------------------------------+
 |database|tableName|isTemporary|information                   |
 +--------+---------+-----------+------------------------------+
@@ -146,7 +146,7 @@ SHOW TABLE EXTENDED  LIKE `employe*`;
 
 +--------------------+---------+----------+--------------------+
 
 -- show partition file system details
-SHOW TABLE EXTENDED  IN default LIKE `employee` PARTITION (`grade=1`);
+SHOW TABLE EXTENDED  IN default LIKE 'employee' PARTITION (grade=1);
 
 +--------+---------+-----------+------------------------------+
 |database|tableName|isTemporary|information                   |
 +--------+---------+-----------+------------------------------+
@@ -169,7 +169,7 @@ SHOW TABLE EXTENDED  IN default LIKE `employee` PARTITION (`grade=1`);
 
 +--------+---------+-----------+------------------------------+
 
 -- show partition file system details with regex fails as shown below
-SHOW TABLE EXTENDED  IN default LIKE `empl*` PARTITION (`grade=1`);
+SHOW TABLE EXTENDED  IN default LIKE 'empl*' PARTITION (grade=1);
 Error: Error running query: org.apache.spark.sql.catalyst.analysis.NoSuchTableException:
   Table or view 'emplo*' not found in database 'default'; (state=,code=0)
 ```


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (a54a6a0 -> 482a79a)

2020-09-17 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from a54a6a0  [SPARK-32287][CORE] Fix flaky 
o.a.s.ExecutorAllocationManagerSuite on GithubActions
 add 482a79a  [SPARK-24994][SQL][FOLLOW-UP] Handle foldable, timezone and 
cleanup

No new revisions were added by this update.

Summary of changes:
 .../sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala  |  7 +--
 .../catalyst/optimizer/UnwrapCastInBinaryComparisonSuite.scala | 10 ++
 .../scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala  |  5 ++---
 3 files changed, 13 insertions(+), 9 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (e5e54a3 -> a54a6a0)

2020-09-17 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from e5e54a3  [SPARK-32900][CORE] Allow UnsafeExternalSorter to spill when 
there are nulls
 add a54a6a0  [SPARK-32287][CORE] Fix flaky 
o.a.s.ExecutorAllocationManagerSuite on GithubActions

No new revisions were added by this update.

Summary of changes:
 .../scala/org/apache/spark/ExecutorAllocationManager.scala  | 13 ++---
 .../main/scala/org/apache/spark/internal/config/Tests.scala | 10 +-
 .../org/apache/spark/ExecutorAllocationManagerSuite.scala   |  9 +
 3 files changed, 16 insertions(+), 16 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-2.4 updated: [SPARK-32900][CORE] Allow UnsafeExternalSorter to spill when there are nulls

2020-09-17 Thread hvanhovell
This is an automated email from the ASF dual-hosted git repository.

hvanhovell pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-2.4 by this push:
 new 2fa68a6  [SPARK-32900][CORE] Allow UnsafeExternalSorter to spill when 
there are nulls
2fa68a6 is described below

commit 2fa68a669cc83521c7257d844202790933ae9771
Author: Tom van Bussel 
AuthorDate: Thu Sep 17 12:35:40 2020 +0200

[SPARK-32900][CORE] Allow UnsafeExternalSorter to spill when there are nulls

### What changes were proposed in this pull request?

This PR changes the way `UnsafeExternalSorter.SpillableIterator` checks 
whether it has spilled already, by checking whether `inMemSorter` is null. It 
also allows it to spill other `UnsafeSorterIterator`s than 
`UnsafeInMemorySorter.SortedIterator`.

### Why are the changes needed?

Before this PR `UnsafeExternalSorter.SpillableIterator` could not spill 
when there are NULLs in the input and radix sorting is used. Currently, Spark 
determines whether UnsafeExternalSorter.SpillableIterator has not spilled yet 
by checking whether `upstream` is an instance of 
`UnsafeInMemorySorter.SortedIterator`. When radix sorting is used and there are 
NULLs in the input however, `upstream` will be an instance of 
`UnsafeExternalSorter.ChainedIterator` instead, and Spark will assume  [...]
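
To make the failure mode concrete, here is a minimal, self-contained Scala sketch (toy types with illustrative names, not the actual Spark classes) of why a type-based "may we spill?" check breaks once the iterator is wrapped, while a state-based check does not:

```
// Toy model only -- SortedIter stands in for UnsafeInMemorySorter.SortedIterator,
// ChainedIter for UnsafeExternalSorter.ChainedIterator (NULL keys + radix sort).
trait RecIter
final class SortedIter extends RecIter
final class ChainedIter(inner: RecIter) extends RecIter

object SpillCheck {
  // Non-null while records are only held in memory; nulled out after a spill.
  var inMemSorter: AnyRef = new Object

  // Old-style check: only recognizes the bare in-memory iterator.
  def canSpillTypeBased(upstream: RecIter): Boolean = upstream.isInstanceOf[SortedIter]

  // New-style check: keyed on sorter state, independent of the iterator type.
  def canSpillStateBased(): Boolean = inMemSorter != null

  def main(args: Array[String]): Unit = {
    val wrapped: RecIter = new ChainedIter(new SortedIter)
    println(canSpillTypeBased(wrapped)) // false -- wrongly refuses to spill
    println(canSpillStateBased())       // true  -- records are still in memory
  }
}
```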

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

A test was added to `UnsafeExternalSorterSuite` (and therefore also to 
`UnsafeExternalSorterRadixSortSuite`). I manually confirmed that the test 
failed in `UnsafeExternalSorterRadixSortSuite` without this patch.

Closes #29772 from tomvanbussel/SPARK-32900.

Authored-by: Tom van Bussel 
Signed-off-by: herman 
(cherry picked from commit e5e54a3614ffd2a9150921e84e5b813d5cbf285a)
Signed-off-by: herman 
---
 .../unsafe/sort/UnsafeExternalSorter.java  | 69 +-
 .../unsafe/sort/UnsafeInMemorySorter.java  |  1 +
 .../unsafe/sort/UnsafeSorterIterator.java  |  2 +
 .../unsafe/sort/UnsafeSorterSpillMerger.java   |  5 ++
 .../unsafe/sort/UnsafeSorterSpillReader.java   |  5 ++
 .../unsafe/sort/UnsafeExternalSorterSuite.java | 33 +++
 6 files changed, 88 insertions(+), 27 deletions(-)

diff --git a/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java b/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java
index a6a2076..f720ccd 100644
--- a/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java
+++ b/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java
@@ -505,11 +505,15 @@ public final class UnsafeExternalSorter extends MemoryConsumer {
    */
   class SpillableIterator extends UnsafeSorterIterator {
     private UnsafeSorterIterator upstream;
-    private UnsafeSorterIterator nextUpstream = null;
     private MemoryBlock lastPage = null;
     private boolean loaded = false;
     private int numRecords = 0;
 
+    private Object currentBaseObject;
+    private long currentBaseOffset;
+    private int currentRecordLength;
+    private long currentKeyPrefix;
+
     SpillableIterator(UnsafeSorterIterator inMemIterator) {
       this.upstream = inMemIterator;
       this.numRecords = inMemIterator.getNumRecords();
@@ -520,23 +524,26 @@ public final class UnsafeExternalSorter extends MemoryConsumer {
       return numRecords;
     }
 
+    @Override
+    public long getCurrentPageNumber() {
+      throw new UnsupportedOperationException();
+    }
+
     public long spill() throws IOException {
       synchronized (this) {
-        if (!(upstream instanceof UnsafeInMemorySorter.SortedIterator && nextUpstream == null
-          && numRecords > 0)) {
+        if (inMemSorter == null || numRecords <= 0) {
           return 0L;
         }
 
-        UnsafeInMemorySorter.SortedIterator inMemIterator =
-          ((UnsafeInMemorySorter.SortedIterator) upstream).clone();
+        long currentPageNumber = upstream.getCurrentPageNumber();
 
-       ShuffleWriteMetrics writeMetrics = new ShuffleWriteMetrics();
+        ShuffleWriteMetrics writeMetrics = new ShuffleWriteMetrics();
         // Iterate over the records that have not been returned and spill them.
         final UnsafeSorterSpillWriter spillWriter =
           new UnsafeSorterSpillWriter(blockManager, fileBufferSizeBytes, writeMetrics, numRecords);
-        spillIterator(inMemIterator, spillWriter);
+        spillIterator(upstream, spillWriter);
         spillWriters.add(spillWriter);
-        nextUpstream = spillWriter.getReader(serializerManager);
+        upstream = spillWriter.getReader(serializerManager);
 
         long released = 0L;
         synchronized

[spark] branch branch-3.0 updated: [SPARK-32900][CORE] Allow UnsafeExternalSorter to spill when there are nulls

2020-09-17 Thread hvanhovell
This is an automated email from the ASF dual-hosted git repository.

hvanhovell pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 2e94d9a  [SPARK-32900][CORE] Allow UnsafeExternalSorter to spill when 
there are nulls
2e94d9a is described below

commit 2e94d9af6e0d0c06eaba650dc70f07cc5b2bd791
Author: Tom van Bussel 
AuthorDate: Thu Sep 17 12:35:40 2020 +0200

[SPARK-32900][CORE] Allow UnsafeExternalSorter to spill when there are nulls

### What changes were proposed in this pull request?

This PR changes the way `UnsafeExternalSorter.SpillableIterator` checks 
whether it has spilled already, by checking whether `inMemSorter` is null. It 
also allows it to spill other `UnsafeSorterIterator`s than 
`UnsafeInMemorySorter.SortedIterator`.

### Why are the changes needed?

Before this PR `UnsafeExternalSorter.SpillableIterator` could not spill 
when there are NULLs in the input and radix sorting is used. Currently, Spark 
determines whether UnsafeExternalSorter.SpillableIterator has not spilled yet 
by checking whether `upstream` is an instance of 
`UnsafeInMemorySorter.SortedIterator`. When radix sorting is used and there are 
NULLs in the input however, `upstream` will be an instance of 
`UnsafeExternalSorter.ChainedIterator` instead, and Spark will assume  [...]

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

A test was added to `UnsafeExternalSorterSuite` (and therefore also to 
`UnsafeExternalSorterRadixSortSuite`). I manually confirmed that the test 
failed in `UnsafeExternalSorterRadixSortSuite` without this patch.

Closes #29772 from tomvanbussel/SPARK-32900.

Authored-by: Tom van Bussel 
Signed-off-by: herman 
(cherry picked from commit e5e54a3614ffd2a9150921e84e5b813d5cbf285a)
Signed-off-by: herman 
---
 .../unsafe/sort/UnsafeExternalSorter.java  | 69 +-
 .../unsafe/sort/UnsafeInMemorySorter.java  |  1 +
 .../unsafe/sort/UnsafeSorterIterator.java  |  2 +
 .../unsafe/sort/UnsafeSorterSpillMerger.java   |  5 ++
 .../unsafe/sort/UnsafeSorterSpillReader.java   |  5 ++
 .../unsafe/sort/UnsafeExternalSorterSuite.java | 33 +++
 6 files changed, 88 insertions(+), 27 deletions(-)

diff --git a/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java b/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java
index 55e4e60..71b9a5b 100644
--- a/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java
+++ b/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java
@@ -501,11 +501,15 @@ public final class UnsafeExternalSorter extends MemoryConsumer {
    */
   class SpillableIterator extends UnsafeSorterIterator {
     private UnsafeSorterIterator upstream;
-    private UnsafeSorterIterator nextUpstream = null;
     private MemoryBlock lastPage = null;
     private boolean loaded = false;
     private int numRecords = 0;
 
+    private Object currentBaseObject;
+    private long currentBaseOffset;
+    private int currentRecordLength;
+    private long currentKeyPrefix;
+
     SpillableIterator(UnsafeSorterIterator inMemIterator) {
       this.upstream = inMemIterator;
       this.numRecords = inMemIterator.getNumRecords();
@@ -516,23 +520,26 @@ public final class UnsafeExternalSorter extends MemoryConsumer {
       return numRecords;
     }
 
+    @Override
+    public long getCurrentPageNumber() {
+      throw new UnsupportedOperationException();
+    }
+
     public long spill() throws IOException {
       synchronized (this) {
-        if (!(upstream instanceof UnsafeInMemorySorter.SortedIterator && nextUpstream == null
-          && numRecords > 0)) {
+        if (inMemSorter == null || numRecords <= 0) {
           return 0L;
         }
 
-        UnsafeInMemorySorter.SortedIterator inMemIterator =
-          ((UnsafeInMemorySorter.SortedIterator) upstream).clone();
+        long currentPageNumber = upstream.getCurrentPageNumber();
 
-       ShuffleWriteMetrics writeMetrics = new ShuffleWriteMetrics();
+        ShuffleWriteMetrics writeMetrics = new ShuffleWriteMetrics();
         // Iterate over the records that have not been returned and spill them.
         final UnsafeSorterSpillWriter spillWriter =
           new UnsafeSorterSpillWriter(blockManager, fileBufferSizeBytes, writeMetrics, numRecords);
-        spillIterator(inMemIterator, spillWriter);
+        spillIterator(upstream, spillWriter);
         spillWriters.add(spillWriter);
-        nextUpstream = spillWriter.getReader(serializerManager);
+        upstream = spillWriter.getReader(serializerManager);
 
         long released = 0L;
         synchronized

[spark] branch master updated (92b75dc -> e5e54a3)

2020-09-17 Thread hvanhovell
This is an automated email from the ASF dual-hosted git repository.

hvanhovell pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 92b75dc  [SPARK-32508][SQL] Disallow empty part col values in 
partition spec before static partition writing
 add e5e54a3  [SPARK-32900][CORE] Allow UnsafeExternalSorter to spill when 
there are nulls

No new revisions were added by this update.

Summary of changes:
 .../unsafe/sort/UnsafeExternalSorter.java  | 69 +-
 .../unsafe/sort/UnsafeInMemorySorter.java  |  1 +
 .../unsafe/sort/UnsafeSorterIterator.java  |  2 +
 .../unsafe/sort/UnsafeSorterSpillMerger.java   |  5 ++
 .../unsafe/sort/UnsafeSorterSpillReader.java   |  5 ++
 .../unsafe/sort/UnsafeExternalSorterSuite.java | 33 +++
 6 files changed, 88 insertions(+), 27 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-2.4 updated: [SPARK-32900][CORE] Allow UnsafeExternalSorter to spill when there are nulls

2020-09-17 Thread hvanhovell
This is an automated email from the ASF dual-hosted git repository.

hvanhovell pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-2.4 by this push:
 new 2fa68a6  [SPARK-32900][CORE] Allow UnsafeExternalSorter to spill when 
there are nulls
2fa68a6 is described below

commit 2fa68a669cc83521c7257d844202790933ae9771
Author: Tom van Bussel 
AuthorDate: Thu Sep 17 12:35:40 2020 +0200

[SPARK-32900][CORE] Allow UnsafeExternalSorter to spill when there are nulls

### What changes were proposed in this pull request?

This PR changes the way `UnsafeExternalSorter.SpillableIterator` checks 
whether it has spilled already, by checking whether `inMemSorter` is null. It 
also allows it to spill other `UnsafeSorterIterator`s than 
`UnsafeInMemorySorter.SortedIterator`.

### Why are the changes needed?

Before this PR `UnsafeExternalSorter.SpillableIterator` could not spill 
when there are NULLs in the input and radix sorting is used. Currently, Spark 
determines whether UnsafeExternalSorter.SpillableIterator has not spilled yet 
by checking whether `upstream` is an instance of 
`UnsafeInMemorySorter.SortedIterator`. When radix sorting is used and there are 
NULLs in the input however, `upstream` will be an instance of 
`UnsafeExternalSorter.ChainedIterator` instead, and Spark will assume  [...]

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

A test was added to `UnsafeExternalSorterSuite` (and therefore also to 
`UnsafeExternalSorterRadixSortSuite`). I manually confirmed that the test 
failed in `UnsafeExternalSorterRadixSortSuite` without this patch.

Closes #29772 from tomvanbussel/SPARK-32900.

Authored-by: Tom van Bussel 
Signed-off-by: herman 
(cherry picked from commit e5e54a3614ffd2a9150921e84e5b813d5cbf285a)
Signed-off-by: herman 
---
 .../unsafe/sort/UnsafeExternalSorter.java  | 69 +-
 .../unsafe/sort/UnsafeInMemorySorter.java  |  1 +
 .../unsafe/sort/UnsafeSorterIterator.java  |  2 +
 .../unsafe/sort/UnsafeSorterSpillMerger.java   |  5 ++
 .../unsafe/sort/UnsafeSorterSpillReader.java   |  5 ++
 .../unsafe/sort/UnsafeExternalSorterSuite.java | 33 +++
 6 files changed, 88 insertions(+), 27 deletions(-)

diff --git 
a/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java
 
b/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java
index a6a2076..f720ccd 100644
--- 
a/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java
+++ 
b/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java
@@ -505,11 +505,15 @@ public final class UnsafeExternalSorter extends 
MemoryConsumer {
*/
   class SpillableIterator extends UnsafeSorterIterator {
 private UnsafeSorterIterator upstream;
-private UnsafeSorterIterator nextUpstream = null;
 private MemoryBlock lastPage = null;
 private boolean loaded = false;
 private int numRecords = 0;
 
+private Object currentBaseObject;
+private long currentBaseOffset;
+private int currentRecordLength;
+private long currentKeyPrefix;
+
 SpillableIterator(UnsafeSorterIterator inMemIterator) {
   this.upstream = inMemIterator;
   this.numRecords = inMemIterator.getNumRecords();
@@ -520,23 +524,26 @@ public final class UnsafeExternalSorter extends 
MemoryConsumer {
   return numRecords;
 }
 
+@Override
+public long getCurrentPageNumber() {
+  throw new UnsupportedOperationException();
+}
+
 public long spill() throws IOException {
   synchronized (this) {
-if (!(upstream instanceof UnsafeInMemorySorter.SortedIterator && 
nextUpstream == null
-  && numRecords > 0)) {
+if (inMemSorter == null || numRecords <= 0) {
   return 0L;
 }
 
-UnsafeInMemorySorter.SortedIterator inMemIterator =
-  ((UnsafeInMemorySorter.SortedIterator) upstream).clone();
+long currentPageNumber = upstream.getCurrentPageNumber();
 
-   ShuffleWriteMetrics writeMetrics = new ShuffleWriteMetrics();
+ShuffleWriteMetrics writeMetrics = new ShuffleWriteMetrics();
 // Iterate over the records that have not been returned and spill them.
 final UnsafeSorterSpillWriter spillWriter =
   new UnsafeSorterSpillWriter(blockManager, fileBufferSizeBytes, 
writeMetrics, numRecords);
-spillIterator(inMemIterator, spillWriter);
+spillIterator(upstream, spillWriter);
 spillWriters.add(spillWriter);
-nextUpstream = spillWriter.getReader(serializerManager);
+upstream = spillWriter.getReader(serializerManager);
 
 long released = 0L;
 synchronized 

[spark] branch branch-3.0 updated: [SPARK-32900][CORE] Allow UnsafeExternalSorter to spill when there are nulls

2020-09-17 Thread hvanhovell
This is an automated email from the ASF dual-hosted git repository.

hvanhovell pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 2e94d9a  [SPARK-32900][CORE] Allow UnsafeExternalSorter to spill when 
there are nulls
2e94d9a is described below

commit 2e94d9af6e0d0c06eaba650dc70f07cc5b2bd791
Author: Tom van Bussel 
AuthorDate: Thu Sep 17 12:35:40 2020 +0200

[SPARK-32900][CORE] Allow UnsafeExternalSorter to spill when there are nulls

### What changes were proposed in this pull request?

This PR changes the way `UnsafeExternalSorter.SpillableIterator` checks 
whether it has spilled already, by checking whether `inMemSorter` is null. It 
also allows it to spill other `UnsafeSorterIterator`s than 
`UnsafeInMemorySorter.SortedIterator`.

### Why are the changes needed?

Before this PR `UnsafeExternalSorter.SpillableIterator` could not spill 
when there are NULLs in the input and radix sorting is used. Currently, Spark 
determines whether UnsafeExternalSorter.SpillableIterator has not spilled yet 
by checking whether `upstream` is an instance of 
`UnsafeInMemorySorter.SortedIterator`. When radix sorting is used and there are 
NULLs in the input however, `upstream` will be an instance of 
`UnsafeExternalSorter.ChainedIterator` instead, and Spark will assume  [...]

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

A test was added to `UnsafeExternalSorterSuite` (and therefore also to 
`UnsafeExternalSorterRadixSortSuite`). I manually confirmed that the test 
failed in `UnsafeExternalSorterRadixSortSuite` without this patch.

Closes #29772 from tomvanbussel/SPARK-32900.

Authored-by: Tom van Bussel 
Signed-off-by: herman 
(cherry picked from commit e5e54a3614ffd2a9150921e84e5b813d5cbf285a)
Signed-off-by: herman 
---
 .../unsafe/sort/UnsafeExternalSorter.java  | 69 +-
 .../unsafe/sort/UnsafeInMemorySorter.java  |  1 +
 .../unsafe/sort/UnsafeSorterIterator.java  |  2 +
 .../unsafe/sort/UnsafeSorterSpillMerger.java   |  5 ++
 .../unsafe/sort/UnsafeSorterSpillReader.java   |  5 ++
 .../unsafe/sort/UnsafeExternalSorterSuite.java | 33 +++
 6 files changed, 88 insertions(+), 27 deletions(-)

diff --git 
a/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java
 
b/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java
index 55e4e60..71b9a5b 100644
--- 
a/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java
+++ 
b/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java
@@ -501,11 +501,15 @@ public final class UnsafeExternalSorter extends 
MemoryConsumer {
*/
   class SpillableIterator extends UnsafeSorterIterator {
 private UnsafeSorterIterator upstream;
-private UnsafeSorterIterator nextUpstream = null;
 private MemoryBlock lastPage = null;
 private boolean loaded = false;
 private int numRecords = 0;
 
+private Object currentBaseObject;
+private long currentBaseOffset;
+private int currentRecordLength;
+private long currentKeyPrefix;
+
 SpillableIterator(UnsafeSorterIterator inMemIterator) {
   this.upstream = inMemIterator;
   this.numRecords = inMemIterator.getNumRecords();
@@ -516,23 +520,26 @@ public final class UnsafeExternalSorter extends 
MemoryConsumer {
   return numRecords;
 }
 
+@Override
+public long getCurrentPageNumber() {
+  throw new UnsupportedOperationException();
+}
+
 public long spill() throws IOException {
   synchronized (this) {
-if (!(upstream instanceof UnsafeInMemorySorter.SortedIterator && 
nextUpstream == null
-  && numRecords > 0)) {
+if (inMemSorter == null || numRecords <= 0) {
   return 0L;
 }
 
-UnsafeInMemorySorter.SortedIterator inMemIterator =
-  ((UnsafeInMemorySorter.SortedIterator) upstream).clone();
+long currentPageNumber = upstream.getCurrentPageNumber();
 
-   ShuffleWriteMetrics writeMetrics = new ShuffleWriteMetrics();
+ShuffleWriteMetrics writeMetrics = new ShuffleWriteMetrics();
 // Iterate over the records that have not been returned and spill them.
 final UnsafeSorterSpillWriter spillWriter =
   new UnsafeSorterSpillWriter(blockManager, fileBufferSizeBytes, 
writeMetrics, numRecords);
-spillIterator(inMemIterator, spillWriter);
+spillIterator(upstream, spillWriter);
 spillWriters.add(spillWriter);
-nextUpstream = spillWriter.getReader(serializerManager);
+upstream = spillWriter.getReader(serializerManager);
 
 long released = 0L;
 synchronized 

[spark] branch master updated (92b75dc -> e5e54a3)

2020-09-17 Thread hvanhovell
This is an automated email from the ASF dual-hosted git repository.

hvanhovell pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 92b75dc  [SPARK-32508][SQL] Disallow empty part col values in 
partition spec before static partition writing
 add e5e54a3  [SPARK-32900][CORE] Allow UnsafeExternalSorter to spill when 
there are nulls

No new revisions were added by this update.

Summary of changes:
 .../unsafe/sort/UnsafeExternalSorter.java  | 69 +-
 .../unsafe/sort/UnsafeInMemorySorter.java  |  1 +
 .../unsafe/sort/UnsafeSorterIterator.java  |  2 +
 .../unsafe/sort/UnsafeSorterSpillMerger.java   |  5 ++
 .../unsafe/sort/UnsafeSorterSpillReader.java   |  5 ++
 .../unsafe/sort/UnsafeExternalSorterSuite.java | 33 +++
 6 files changed, 88 insertions(+), 27 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-2.4 updated: [SPARK-32900][CORE] Allow UnsafeExternalSorter to spill when there are nulls

2020-09-17 Thread hvanhovell
This is an automated email from the ASF dual-hosted git repository.

hvanhovell pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-2.4 by this push:
 new 2fa68a6  [SPARK-32900][CORE] Allow UnsafeExternalSorter to spill when 
there are nulls
2fa68a6 is described below

commit 2fa68a669cc83521c7257d844202790933ae9771
Author: Tom van Bussel 
AuthorDate: Thu Sep 17 12:35:40 2020 +0200

[SPARK-32900][CORE] Allow UnsafeExternalSorter to spill when there are nulls

### What changes were proposed in this pull request?

This PR changes the way `UnsafeExternalSorter.SpillableIterator` checks 
whether it has spilled already, by checking whether `inMemSorter` is null. It 
also allows it to spill other `UnsafeSorterIterator`s than 
`UnsafeInMemorySorter.SortedIterator`.

### Why are the changes needed?

Before this PR `UnsafeExternalSorter.SpillableIterator` could not spill 
when there are NULLs in the input and radix sorting is used. Currently, Spark 
determines whether UnsafeExternalSorter.SpillableIterator has not spilled yet 
by checking whether `upstream` is an instance of 
`UnsafeInMemorySorter.SortedIterator`. When radix sorting is used and there are 
NULLs in the input however, `upstream` will be an instance of 
`UnsafeExternalSorter.ChainedIterator` instead, and Spark will assume  [...]

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

A test was added to `UnsafeExternalSorterSuite` (and therefore also to 
`UnsafeExternalSorterRadixSortSuite`). I manually confirmed that the test 
failed in `UnsafeExternalSorterRadixSortSuite` without this patch.

Closes #29772 from tomvanbussel/SPARK-32900.

Authored-by: Tom van Bussel 
Signed-off-by: herman 
(cherry picked from commit e5e54a3614ffd2a9150921e84e5b813d5cbf285a)
Signed-off-by: herman 
---
 .../unsafe/sort/UnsafeExternalSorter.java  | 69 +-
 .../unsafe/sort/UnsafeInMemorySorter.java  |  1 +
 .../unsafe/sort/UnsafeSorterIterator.java  |  2 +
 .../unsafe/sort/UnsafeSorterSpillMerger.java   |  5 ++
 .../unsafe/sort/UnsafeSorterSpillReader.java   |  5 ++
 .../unsafe/sort/UnsafeExternalSorterSuite.java | 33 +++
 6 files changed, 88 insertions(+), 27 deletions(-)

diff --git 
a/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java
 
b/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java
index a6a2076..f720ccd 100644
--- 
a/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java
+++ 
b/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java
@@ -505,11 +505,15 @@ public final class UnsafeExternalSorter extends 
MemoryConsumer {
*/
   class SpillableIterator extends UnsafeSorterIterator {
 private UnsafeSorterIterator upstream;
-private UnsafeSorterIterator nextUpstream = null;
 private MemoryBlock lastPage = null;
 private boolean loaded = false;
 private int numRecords = 0;
 
+private Object currentBaseObject;
+private long currentBaseOffset;
+private int currentRecordLength;
+private long currentKeyPrefix;
+
 SpillableIterator(UnsafeSorterIterator inMemIterator) {
   this.upstream = inMemIterator;
   this.numRecords = inMemIterator.getNumRecords();
@@ -520,23 +524,26 @@ public final class UnsafeExternalSorter extends 
MemoryConsumer {
   return numRecords;
 }
 
+@Override
+public long getCurrentPageNumber() {
+  throw new UnsupportedOperationException();
+}
+
 public long spill() throws IOException {
   synchronized (this) {
-if (!(upstream instanceof UnsafeInMemorySorter.SortedIterator && 
nextUpstream == null
-  && numRecords > 0)) {
+if (inMemSorter == null || numRecords <= 0) {
   return 0L;
 }
 
-UnsafeInMemorySorter.SortedIterator inMemIterator =
-  ((UnsafeInMemorySorter.SortedIterator) upstream).clone();
+long currentPageNumber = upstream.getCurrentPageNumber();
 
-   ShuffleWriteMetrics writeMetrics = new ShuffleWriteMetrics();
+ShuffleWriteMetrics writeMetrics = new ShuffleWriteMetrics();
 // Iterate over the records that have not been returned and spill them.
 final UnsafeSorterSpillWriter spillWriter =
   new UnsafeSorterSpillWriter(blockManager, fileBufferSizeBytes, 
writeMetrics, numRecords);
-spillIterator(inMemIterator, spillWriter);
+spillIterator(upstream, spillWriter);
 spillWriters.add(spillWriter);
-nextUpstream = spillWriter.getReader(serializerManager);
+upstream = spillWriter.getReader(serializerManager);
 
 long released = 0L;
 synchronized 

[spark] branch branch-3.0 updated: [SPARK-32900][CORE] Allow UnsafeExternalSorter to spill when there are nulls

2020-09-17 Thread hvanhovell
This is an automated email from the ASF dual-hosted git repository.

hvanhovell pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 2e94d9a  [SPARK-32900][CORE] Allow UnsafeExternalSorter to spill when 
there are nulls
2e94d9a is described below

commit 2e94d9af6e0d0c06eaba650dc70f07cc5b2bd791
Author: Tom van Bussel 
AuthorDate: Thu Sep 17 12:35:40 2020 +0200

[SPARK-32900][CORE] Allow UnsafeExternalSorter to spill when there are nulls

### What changes were proposed in this pull request?

This PR changes the way `UnsafeExternalSorter.SpillableIterator` checks 
whether it has spilled already, by checking whether `inMemSorter` is null. It 
also allows it to spill other `UnsafeSorterIterator`s than 
`UnsafeInMemorySorter.SortedIterator`.

### Why are the changes needed?

Before this PR `UnsafeExternalSorter.SpillableIterator` could not spill 
when there are NULLs in the input and radix sorting is used. Currently, Spark 
determines whether UnsafeExternalSorter.SpillableIterator has not spilled yet 
by checking whether `upstream` is an instance of 
`UnsafeInMemorySorter.SortedIterator`. When radix sorting is used and there are 
NULLs in the input however, `upstream` will be an instance of 
`UnsafeExternalSorter.ChainedIterator` instead, and Spark will assume  [...]

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

A test was added to `UnsafeExternalSorterSuite` (and therefore also to 
`UnsafeExternalSorterRadixSortSuite`). I manually confirmed that the test 
failed in `UnsafeExternalSorterRadixSortSuite` without this patch.

Closes #29772 from tomvanbussel/SPARK-32900.

Authored-by: Tom van Bussel 
Signed-off-by: herman 
(cherry picked from commit e5e54a3614ffd2a9150921e84e5b813d5cbf285a)
Signed-off-by: herman 
---
 .../unsafe/sort/UnsafeExternalSorter.java  | 69 +-
 .../unsafe/sort/UnsafeInMemorySorter.java  |  1 +
 .../unsafe/sort/UnsafeSorterIterator.java  |  2 +
 .../unsafe/sort/UnsafeSorterSpillMerger.java   |  5 ++
 .../unsafe/sort/UnsafeSorterSpillReader.java   |  5 ++
 .../unsafe/sort/UnsafeExternalSorterSuite.java | 33 +++
 6 files changed, 88 insertions(+), 27 deletions(-)

diff --git a/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java b/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java
index 55e4e60..71b9a5b 100644
--- a/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java
+++ b/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java
@@ -501,11 +501,15 @@ public final class UnsafeExternalSorter extends MemoryConsumer {
*/
   class SpillableIterator extends UnsafeSorterIterator {
 private UnsafeSorterIterator upstream;
-private UnsafeSorterIterator nextUpstream = null;
 private MemoryBlock lastPage = null;
 private boolean loaded = false;
 private int numRecords = 0;
 
+private Object currentBaseObject;
+private long currentBaseOffset;
+private int currentRecordLength;
+private long currentKeyPrefix;
+
 SpillableIterator(UnsafeSorterIterator inMemIterator) {
   this.upstream = inMemIterator;
   this.numRecords = inMemIterator.getNumRecords();
@@ -516,23 +520,26 @@ public final class UnsafeExternalSorter extends MemoryConsumer {
   return numRecords;
 }
 
+@Override
+public long getCurrentPageNumber() {
+  throw new UnsupportedOperationException();
+}
+
 public long spill() throws IOException {
   synchronized (this) {
-if (!(upstream instanceof UnsafeInMemorySorter.SortedIterator && nextUpstream == null
-  && numRecords > 0)) {
+if (inMemSorter == null || numRecords <= 0) {
   return 0L;
 }
 
-UnsafeInMemorySorter.SortedIterator inMemIterator =
-  ((UnsafeInMemorySorter.SortedIterator) upstream).clone();
+long currentPageNumber = upstream.getCurrentPageNumber();
 
-   ShuffleWriteMetrics writeMetrics = new ShuffleWriteMetrics();
+ShuffleWriteMetrics writeMetrics = new ShuffleWriteMetrics();
 // Iterate over the records that have not been returned and spill them.
 final UnsafeSorterSpillWriter spillWriter =
  new UnsafeSorterSpillWriter(blockManager, fileBufferSizeBytes, writeMetrics, numRecords);
-spillIterator(inMemIterator, spillWriter);
+spillIterator(upstream, spillWriter);
 spillWriters.add(spillWriter);
-nextUpstream = spillWriter.getReader(serializerManager);
+upstream = spillWriter.getReader(serializerManager);
 
 long released = 0L;
 synchronized 
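
(The archived hunk is cut off above. A reconstructed sketch of the block it 
continues into, paraphrased rather than quoted from this mail: after pointing 
`upstream` at the spill reader, the iterator frees every page except the one 
the caller may still be reading, then releases `inMemSorter`, which is exactly 
the state the new `inMemSorter == null` guard keys on.)

long released = 0L;
synchronized (UnsafeExternalSorter.this) {
  // Free all pages except the one still referenced by the current record;
  // currentPageNumber was captured from upstream.getCurrentPageNumber() above.
  for (MemoryBlock page : allocatedPages) {
    if (!loaded || page.pageNumber != currentPageNumber) {
      released += page.size();
      freePage(page);
    }
  }
  allocatedPages.clear();
  // ... the in-memory sorter is freed and nulled out here as well ...
}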

[spark] branch master updated (92b75dc -> e5e54a3)

2020-09-17 Thread hvanhovell
This is an automated email from the ASF dual-hosted git repository.

hvanhovell pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 92b75dc  [SPARK-32508][SQL] Disallow empty part col values in 
partition spec before static partition writing
 add e5e54a3  [SPARK-32900][CORE] Allow UnsafeExternalSorter to spill when 
there are nulls

No new revisions were added by this update.

Summary of changes:
 .../unsafe/sort/UnsafeExternalSorter.java  | 69 +-
 .../unsafe/sort/UnsafeInMemorySorter.java  |  1 +
 .../unsafe/sort/UnsafeSorterIterator.java  |  2 +
 .../unsafe/sort/UnsafeSorterSpillMerger.java   |  5 ++
 .../unsafe/sort/UnsafeSorterSpillReader.java   |  5 ++
 .../unsafe/sort/UnsafeExternalSorterSuite.java | 33 +++
 6 files changed, 88 insertions(+), 27 deletions(-)
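
From the diffstat (UnsafeSorterIterator.java gains two lines) and the 
overrides visible in the hunks elsewhere in this thread, the iterator base 
class presumably picks up a new abstract accessor along these lines (a sketch, 
not the verbatim change):

public abstract class UnsafeSorterIterator {
  // ... existing contract: getNumRecords(), hasNext(), loadNext(), etc. ...

  // Lets SpillableIterator ask its upstream which memory page the current
  // record lives in, so that page can be kept alive while spilling.
  public abstract long getCurrentPageNumber();
}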


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-2.4 updated: [SPARK-32900][CORE] Allow UnsafeExternalSorter to spill when there are nulls

2020-09-17 Thread hvanhovell
This is an automated email from the ASF dual-hosted git repository.

hvanhovell pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-2.4 by this push:
 new 2fa68a6  [SPARK-32900][CORE] Allow UnsafeExternalSorter to spill when 
there are nulls
2fa68a6 is described below

commit 2fa68a669cc83521c7257d844202790933ae9771
Author: Tom van Bussel 
AuthorDate: Thu Sep 17 12:35:40 2020 +0200

[SPARK-32900][CORE] Allow UnsafeExternalSorter to spill when there are nulls

### What changes were proposed in this pull request?

This PR changes how `UnsafeExternalSorter.SpillableIterator` decides whether it 
has already spilled: it now checks whether `inMemSorter` is null. It also 
allows spilling of `UnsafeSorterIterator`s other than 
`UnsafeInMemorySorter.SortedIterator`.

### Why are the changes needed?

Before this PR, `UnsafeExternalSorter.SpillableIterator` could not spill 
when the input contained NULLs and radix sorting was used. Spark determined 
whether `UnsafeExternalSorter.SpillableIterator` had not spilled yet 
by checking whether `upstream` was an instance of 
`UnsafeInMemorySorter.SortedIterator`. When radix sorting is used and there are 
NULLs in the input, however, `upstream` is an instance of 
`UnsafeExternalSorter.ChainedIterator` instead, and Spark will assume  [...]

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

A test was added to `UnsafeExternalSorterSuite` (and therefore also to 
`UnsafeExternalSorterRadixSortSuite`). I manually confirmed that the test 
failed in `UnsafeExternalSorterRadixSortSuite` without this patch.

Closes #29772 from tomvanbussel/SPARK-32900.

Authored-by: Tom van Bussel 
Signed-off-by: herman 
(cherry picked from commit e5e54a3614ffd2a9150921e84e5b813d5cbf285a)
Signed-off-by: herman 
---
 .../unsafe/sort/UnsafeExternalSorter.java  | 69 +-
 .../unsafe/sort/UnsafeInMemorySorter.java  |  1 +
 .../unsafe/sort/UnsafeSorterIterator.java  |  2 +
 .../unsafe/sort/UnsafeSorterSpillMerger.java   |  5 ++
 .../unsafe/sort/UnsafeSorterSpillReader.java   |  5 ++
 .../unsafe/sort/UnsafeExternalSorterSuite.java | 33 +++
 6 files changed, 88 insertions(+), 27 deletions(-)

diff --git a/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java b/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java
index a6a2076..f720ccd 100644
--- a/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java
+++ b/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java
@@ -505,11 +505,15 @@ public final class UnsafeExternalSorter extends MemoryConsumer {
*/
   class SpillableIterator extends UnsafeSorterIterator {
 private UnsafeSorterIterator upstream;
-private UnsafeSorterIterator nextUpstream = null;
 private MemoryBlock lastPage = null;
 private boolean loaded = false;
 private int numRecords = 0;
 
+private Object currentBaseObject;
+private long currentBaseOffset;
+private int currentRecordLength;
+private long currentKeyPrefix;
+
 SpillableIterator(UnsafeSorterIterator inMemIterator) {
   this.upstream = inMemIterator;
   this.numRecords = inMemIterator.getNumRecords();
@@ -520,23 +524,26 @@ public final class UnsafeExternalSorter extends MemoryConsumer {
   return numRecords;
 }
 
+@Override
+public long getCurrentPageNumber() {
+  throw new UnsupportedOperationException();
+}
+
 public long spill() throws IOException {
   synchronized (this) {
-if (!(upstream instanceof UnsafeInMemorySorter.SortedIterator && nextUpstream == null
-  && numRecords > 0)) {
+if (inMemSorter == null || numRecords <= 0) {
   return 0L;
 }
 
-UnsafeInMemorySorter.SortedIterator inMemIterator =
-  ((UnsafeInMemorySorter.SortedIterator) upstream).clone();
+long currentPageNumber = upstream.getCurrentPageNumber();
 
-   ShuffleWriteMetrics writeMetrics = new ShuffleWriteMetrics();
+ShuffleWriteMetrics writeMetrics = new ShuffleWriteMetrics();
 // Iterate over the records that have not been returned and spill them.
 final UnsafeSorterSpillWriter spillWriter =
  new UnsafeSorterSpillWriter(blockManager, fileBufferSizeBytes, writeMetrics, numRecords);
-spillIterator(inMemIterator, spillWriter);
+spillIterator(upstream, spillWriter);
 spillWriters.add(spillWriter);
-nextUpstream = spillWriter.getReader(serializerManager);
+upstream = spillWriter.getReader(serializerManager);
 
 long released = 0L;
 synchronized 

[spark] branch master updated (bd38e0b -> 92b75dc)

2020-09-17 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from bd38e0b  [SPARK-32903][SQL] GeneratePredicate should be able to 
eliminate common sub-expressions
 add 92b75dc  [SPARK-32508][SQL] Disallow empty part col values in 
partition spec before static partition writing

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/execution/datasources/rules.scala| 22 ++
 .../org/apache/spark/sql/sources/InsertSuite.scala | 22 ++
 .../org/apache/spark/sql/hive/InsertSuite.scala| 22 ++
 3 files changed, 62 insertions(+), 4 deletions(-)
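
A hypothetical illustration of the behavior change (the table name and session 
setup are invented for this note; only the rejection of an empty static 
partition column value comes from the commit):

import org.apache.spark.sql.SparkSession;

public class EmptyPartitionValueDemo {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder().appName("demo").getOrCreate();
    spark.sql("CREATE TABLE logs (msg STRING) USING parquet PARTITIONED BY (dt STRING)");
    // After SPARK-32508 this is rejected during analysis rather than writing
    // to a partition whose column value is the empty string:
    spark.sql("INSERT OVERWRITE TABLE logs PARTITION (dt = '') SELECT 'hello'");
  }
}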


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org


