[spark] branch branch-2.4 updated: [SPARK-27485][BRANCH-2.4] EnsureRequirements.reorder should handle duplicate expressions gracefully

2019-07-16 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-2.4 by this push:
 new 198f2f3  [SPARK-27485][BRANCH-2.4] EnsureRequirements.reorder should 
handle duplicate expressions gracefully
198f2f3 is described below

commit 198f2f331b79fc0a1d3afcf4d999c0dd4ad7a818
Author: herman 
AuthorDate: Tue Jul 16 18:01:15 2019 -0700

[SPARK-27485][BRANCH-2.4] EnsureRequirements.reorder should handle 
duplicate expressions gracefully

Backport of 421d9d56efd447d31787e77316ce0eafb5fe45a5

## What changes were proposed in this pull request?
When reordering joins, EnsureRequirements only checks whether all the join keys
are present in the partitioning expression seq. This is problematic when the
join keys and the partitioning expressions both contain duplicates but not the
same number of duplicates for each expression, e.g. `Seq(a, a, b)` vs `Seq(a,
b, b)`. This fails with an index lookup failure in the `reorder` function.

This PR fixes this by removing the equality checking logic from the
`reorderJoinKeys` function, and by doing the multiset equality check in the
`reorder` function while building the reordered key sequences.
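
As a rough illustration of that strategy (an editor's sketch using plain strings in
place of Catalyst expressions, not the patch itself): each key maps to the multiset
of positions it occupies in the current order and one position is consumed per
occurrence, so mismatched multisets such as `Seq(a, a, b)` vs `Seq(a, b, b)` fall
back to the original key order instead of failing.

```scala
import scala.collection.mutable

object ReorderSketch {
  // Simplified stand-in for EnsureRequirements.reorder; K plays the role of a canonicalized expression.
  def reorder[K](
      leftKeys: IndexedSeq[K],
      rightKeys: IndexedSeq[K],
      expectedOrder: Seq[K],
      currentOrder: Seq[K]): (Seq[K], Seq[K]) = {
    if (expectedOrder.size != currentOrder.size) return (leftKeys, rightKeys)
    // Map each key to the positions it holds in the current key sequence.
    val keyToIndexes = mutable.Map.empty[K, mutable.BitSet]
    currentOrder.zipWithIndex.foreach { case (k, i) =>
      keyToIndexes.getOrElseUpdate(k, mutable.BitSet.empty) += i
    }
    val left = mutable.ArrayBuffer.empty[K]
    val right = mutable.ArrayBuffer.empty[K]
    expectedOrder.foreach { k =>
      keyToIndexes.get(k) match {
        case Some(indexes) if indexes.nonEmpty =>
          val i = indexes.head // smallest position not yet used for this key
          indexes.remove(i)
          left += leftKeys(i)
          right += rightKeys(i)
        case _ =>
          // Key not found, or all of its positions already used: multisets differ, keep original order.
          return (leftKeys, rightKeys)
      }
    }
    (left.toSeq, right.toSeq)
  }

  def main(args: Array[String]): Unit = {
    // Mismatched multisets: falls back gracefully instead of throwing.
    println(reorder(IndexedSeq("l1", "l2", "l3"), IndexedSeq("r1", "r2", "r3"),
      expectedOrder = Seq("a", "b", "b"), currentOrder = Seq("a", "a", "b")))
  }
}
```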

## How was this patch tested?
Added a unit test to the `PlannerSuite` and added an integration test to 
`JoinSuite`

Closes #25174 from hvanhovell/SPARK-27485-2.4.

Authored-by: herman 
Signed-off-by: Dongjoon Hyun 
---
 .../execution/exchange/EnsureRequirements.scala| 72 --
 .../scala/org/apache/spark/sql/JoinSuite.scala | 20 ++
 .../apache/spark/sql/execution/PlannerSuite.scala  | 26 
 3 files changed, 86 insertions(+), 32 deletions(-)

diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala
index d2d5011..bdb9a31 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala
@@ -24,8 +24,7 @@ import org.apache.spark.sql.catalyst.expressions._
 import org.apache.spark.sql.catalyst.plans.physical._
 import org.apache.spark.sql.catalyst.rules.Rule
 import org.apache.spark.sql.execution._
-import org.apache.spark.sql.execution.joins.{BroadcastHashJoinExec, 
ShuffledHashJoinExec,
-  SortMergeJoinExec}
+import org.apache.spark.sql.execution.joins.{ShuffledHashJoinExec, 
SortMergeJoinExec}
 import org.apache.spark.sql.internal.SQLConf
 
 /**
@@ -221,25 +220,41 @@ case class EnsureRequirements(conf: SQLConf) extends 
Rule[SparkPlan] {
   }
 
   private def reorder(
-  leftKeys: Seq[Expression],
-  rightKeys: Seq[Expression],
+  leftKeys: IndexedSeq[Expression],
+  rightKeys: IndexedSeq[Expression],
   expectedOrderOfKeys: Seq[Expression],
   currentOrderOfKeys: Seq[Expression]): (Seq[Expression], Seq[Expression]) 
= {
-val leftKeysBuffer = ArrayBuffer[Expression]()
-val rightKeysBuffer = ArrayBuffer[Expression]()
-val pickedIndexes = mutable.Set[Int]()
-val keysAndIndexes = currentOrderOfKeys.zipWithIndex
+if (expectedOrderOfKeys.size != currentOrderOfKeys.size) {
+  return (leftKeys, rightKeys)
+}
+
+// Build a lookup between an expression and the positions its holds in the 
current key seq.
+val keyToIndexMap = mutable.Map.empty[Expression, mutable.BitSet]
+currentOrderOfKeys.zipWithIndex.foreach {
+  case (key, index) =>
+keyToIndexMap.getOrElseUpdate(key.canonicalized, 
mutable.BitSet.empty).add(index)
+}
+
+// Reorder the keys.
+val leftKeysBuffer = new ArrayBuffer[Expression](leftKeys.size)
+val rightKeysBuffer = new ArrayBuffer[Expression](rightKeys.size)
+val iterator = expectedOrderOfKeys.iterator
+while (iterator.hasNext) {
+  // Lookup the current index of this key.
+  keyToIndexMap.get(iterator.next().canonicalized) match {
+case Some(indices) if indices.nonEmpty =>
+  // Take the first available index from the map.
+  val index = indices.firstKey
+  indices.remove(index)
 
-expectedOrderOfKeys.foreach(expression => {
-  val index = keysAndIndexes.find { case (e, idx) =>
-// As we may have the same key used many times, we need to filter out 
its occurrence we
-// have already used.
-e.semanticEquals(expression) && !pickedIndexes.contains(idx)
-  }.map(_._2).get
-  pickedIndexes += index
-  leftKeysBuffer.append(leftKeys(index))
-  rightKeysBuffer.append(rightKeys(index))
-})
+  // Add the keys for that index to the reordered keys.
+  leftKeysBuffer += leftKeys(index)
+  rightKeysBuffer += rightKeys(index)
+case _ =>
+  

[spark] branch master updated: [SPARK-27963][CORE] Allow dynamic allocation without a shuffle service.

2019-07-16 Thread vanzin
This is an automated email from the ASF dual-hosted git repository.

vanzin pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 2ddeff9  [SPARK-27963][CORE] Allow dynamic allocation without a 
shuffle service.
2ddeff9 is described below

commit 2ddeff97d7329942a98ef363991eeabc3fa71a76
Author: Marcelo Vanzin 
AuthorDate: Tue Jul 16 16:37:38 2019 -0700

[SPARK-27963][CORE] Allow dynamic allocation without a shuffle service.

This change adds a new option that enables dynamic allocation without
the need for a shuffle service. This mode works by tracking which stages
generate shuffle files, and keeping executors that generate data for those
shuffles alive while the jobs that use them are active.

A separate timeout is also added for shuffle data, so that executors holding
shuffle data are subject to their own idle timeout before being removed. This
allows the shuffle data to be kept around in case it is needed by some new job,
or lets users be more aggressive in timing out executors that don't have
shuffle data in active use.

The code also hooks up to the context cleaner so that garbage-collected
shuffles are detected and the respective executors are not held unnecessarily.

Testing done with added unit tests, and also with TPC-DS workloads on
YARN without a shuffle service.
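
For reference, a minimal configuration sketch of this mode. The two
shuffle-tracking/shuffle-timeout key names below are the editor's assumptions,
inferred from the `DYN_ALLOCATION_SHUFFLE_TRACKING` constant and the
docs/configuration.md change in this diff; the other keys are long-standing ones.

```scala
import org.apache.spark.SparkConf

object DynAllocWithoutShuffleService {
  // Keys marked "assumed" are not confirmed by this email; check docs/configuration.md in this commit.
  def conf(): SparkConf = new SparkConf()
    .set("spark.dynamicAllocation.enabled", "true")
    .set("spark.shuffle.service.enabled", "false")                   // run without the external shuffle service
    .set("spark.dynamicAllocation.shuffleTracking.enabled", "true")  // assumed: the new shuffle-tracking switch
    .set("spark.dynamicAllocation.shuffleTimeout", "30min")          // assumed: idle timeout for executors holding shuffle data
}
```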

Closes #24817 from vanzin/SPARK-27963.

Authored-by: Marcelo Vanzin 
Signed-off-by: Marcelo Vanzin 
---
 .../apache/spark/ExecutorAllocationManager.scala   |  16 +-
 .../main/scala/org/apache/spark/SparkContext.scala |  20 +-
 .../org/apache/spark/internal/config/package.scala |  11 +
 .../org/apache/spark/scheduler/StageInfo.scala |  10 +-
 .../spark/scheduler/dynalloc/ExecutorMonitor.scala | 225 -
 .../spark/ExecutorAllocationManagerSuite.scala |   2 +-
 .../scheduler/dynalloc/ExecutorMonitorSuite.scala  | 132 +++-
 docs/configuration.md  |  20 ++
 8 files changed, 407 insertions(+), 29 deletions(-)

diff --git 
a/core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala 
b/core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala
index bceb26c..5114cf7 100644
--- a/core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala
+++ b/core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala
@@ -94,6 +94,7 @@ private[spark] class ExecutorAllocationManager(
 client: ExecutorAllocationClient,
 listenerBus: LiveListenerBus,
 conf: SparkConf,
+cleaner: Option[ContextCleaner] = None,
 clock: Clock = new SystemClock())
   extends Logging {
 
@@ -148,7 +149,7 @@ private[spark] class ExecutorAllocationManager(
   // Listener for Spark events that impact the allocation policy
   val listener = new ExecutorAllocationListener
 
-  val executorMonitor = new ExecutorMonitor(conf, client, clock)
+  val executorMonitor = new ExecutorMonitor(conf, client, listenerBus, clock)
 
   // Executor that handles the scheduling task.
   private val executor =
@@ -194,11 +195,13 @@ private[spark] class ExecutorAllocationManager(
   throw new SparkException(
 s"s${DYN_ALLOCATION_SUSTAINED_SCHEDULER_BACKLOG_TIMEOUT.key} must be > 
0!")
 }
-// Require external shuffle service for dynamic allocation
-// Otherwise, we may lose shuffle files when killing executors
-if (!conf.get(config.SHUFFLE_SERVICE_ENABLED) && !testing) {
-  throw new SparkException("Dynamic allocation of executors requires the 
external " +
-"shuffle service. You may enable this through 
spark.shuffle.service.enabled.")
+if (!conf.get(config.SHUFFLE_SERVICE_ENABLED)) {
+  if (conf.get(config.DYN_ALLOCATION_SHUFFLE_TRACKING)) {
+logWarning("Dynamic allocation without a shuffle service is an 
experimental feature.")
+  } else if (!testing) {
+throw new SparkException("Dynamic allocation of executors requires the 
external " +
+  "shuffle service. You may enable this through 
spark.shuffle.service.enabled.")
+  }
 }
 
 if (executorAllocationRatio > 1.0 || executorAllocationRatio <= 0.0) {
@@ -214,6 +217,7 @@ private[spark] class ExecutorAllocationManager(
   def start(): Unit = {
 listenerBus.addToManagementQueue(listener)
 listenerBus.addToManagementQueue(executorMonitor)
+cleaner.foreach(_.attachListener(executorMonitor))
 
 val scheduleTask = new Runnable() {
   override def run(): Unit = {
diff --git a/core/src/main/scala/org/apache/spark/SparkContext.scala 
b/core/src/main/scala/org/apache/spark/SparkContext.scala
index f289c17..75182b0 100644
--- a/core/src/main/scala/org/apache/spark/SparkContext.scala
+++ b/core/src/main/scala/org/apache/spark/SparkContext.scala
@@ -553,14 +553,22 @@ class SparkContext(config: SparkConf) extends 

[spark] branch master updated: [SPARK-18299][SQL] Allow more aggregations on KeyValueGroupedDataset

2019-07-16 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 1134fae  [SPARK-18299][SQL] Allow more aggregations on 
KeyValueGroupedDataset
1134fae is described below

commit 1134faecf4fe2cc1bf7c3670f5f8f2b9d0c6f2e7
Author: nooberfsh 
AuthorDate: Tue Jul 16 16:35:04 2019 -0700

[SPARK-18299][SQL] Allow more aggregations on KeyValueGroupedDataset

## What changes were proposed in this pull request?

Add 4 additional `agg` overloads to `KeyValueGroupedDataset`.
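
A small usage sketch of one of the new arities (five typed columns), in the style
of the `DatasetSuite` test added below; the particular aggregate columns are the
editor's choice, not the test's.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object FiveColumnTypedAgg {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("typed-agg").getOrCreate()
    import spark.implicits._

    val ds = Seq(("a", 10), ("a", 20), ("b", 1), ("b", 2), ("c", 1)).toDS()
    // agg now accepts five (and up to eight) typed columns per key.
    ds.groupByKey(_._1).agg(
      sum("_2").as[Long],
      avg("_2").as[Double],
      max("_2").as[Int],
      min("_2").as[Int],
      count("_2").as[Long]
    ).show()

    spark.stop()
  }
}
```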

## How was this patch tested?

New test in DatasetSuite for typed aggregation

Closes #24993 from nooberfsh/sqlagg.

Authored-by: nooberfsh 
Signed-off-by: Dongjoon Hyun 
---
 .../apache/spark/sql/KeyValueGroupedDataset.scala  | 65 ++
 .../scala/org/apache/spark/sql/DatasetSuite.scala  | 64 +
 2 files changed, 129 insertions(+)

diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/KeyValueGroupedDataset.scala 
b/sql/core/src/main/scala/org/apache/spark/sql/KeyValueGroupedDataset.scala
index a3cbea9..0da52d4 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/KeyValueGroupedDataset.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/KeyValueGroupedDataset.scala
@@ -521,6 +521,71 @@ class KeyValueGroupedDataset[K, V] private[sql](
 aggUntyped(col1, col2, col3, col4).asInstanceOf[Dataset[(K, U1, U2, U3, 
U4)]]
 
   /**
+   * Computes the given aggregations, returning a [[Dataset]] of tuples for 
each unique key
+   * and the result of computing these aggregations over all elements in the 
group.
+   *
+   * @since 3.0.0
+   */
+  def agg[U1, U2, U3, U4, U5](
+  col1: TypedColumn[V, U1],
+  col2: TypedColumn[V, U2],
+  col3: TypedColumn[V, U3],
+  col4: TypedColumn[V, U4],
+  col5: TypedColumn[V, U5]): Dataset[(K, U1, U2, U3, U4, U5)] =
+aggUntyped(col1, col2, col3, col4, col5).asInstanceOf[Dataset[(K, U1, U2, 
U3, U4, U5)]]
+
+  /**
+   * Computes the given aggregations, returning a [[Dataset]] of tuples for 
each unique key
+   * and the result of computing these aggregations over all elements in the 
group.
+   *
+   * @since 3.0.0
+   */
+  def agg[U1, U2, U3, U4, U5, U6](
+  col1: TypedColumn[V, U1],
+  col2: TypedColumn[V, U2],
+  col3: TypedColumn[V, U3],
+  col4: TypedColumn[V, U4],
+  col5: TypedColumn[V, U5],
+  col6: TypedColumn[V, U6]): Dataset[(K, U1, U2, U3, U4, U5, U6)] =
+aggUntyped(col1, col2, col3, col4, col5, col6)
+  .asInstanceOf[Dataset[(K, U1, U2, U3, U4, U5, U6)]]
+
+  /**
+   * Computes the given aggregations, returning a [[Dataset]] of tuples for 
each unique key
+   * and the result of computing these aggregations over all elements in the 
group.
+   *
+   * @since 3.0.0
+   */
+  def agg[U1, U2, U3, U4, U5, U6, U7](
+  col1: TypedColumn[V, U1],
+  col2: TypedColumn[V, U2],
+  col3: TypedColumn[V, U3],
+  col4: TypedColumn[V, U4],
+  col5: TypedColumn[V, U5],
+  col6: TypedColumn[V, U6],
+  col7: TypedColumn[V, U7]): Dataset[(K, U1, U2, U3, U4, U5, U6, U7)] =
+aggUntyped(col1, col2, col3, col4, col5, col6, col7)
+  .asInstanceOf[Dataset[(K, U1, U2, U3, U4, U5, U6, U7)]]
+
+  /**
+   * Computes the given aggregations, returning a [[Dataset]] of tuples for 
each unique key
+   * and the result of computing these aggregations over all elements in the 
group.
+   *
+   * @since 3.0.0
+   */
+  def agg[U1, U2, U3, U4, U5, U6, U7, U8](
+  col1: TypedColumn[V, U1],
+  col2: TypedColumn[V, U2],
+  col3: TypedColumn[V, U3],
+  col4: TypedColumn[V, U4],
+  col5: TypedColumn[V, U5],
+  col6: TypedColumn[V, U6],
+  col7: TypedColumn[V, U7],
+  col8: TypedColumn[V, U8]): Dataset[(K, U1, U2, U3, U4, U5, U6, U7, U8)] =
+aggUntyped(col1, col2, col3, col4, col5, col6, col7, col8)
+  .asInstanceOf[Dataset[(K, U1, U2, U3, U4, U5, U6, U7, U8)]]
+
+  /**
* Returns a [[Dataset]] that contains a tuple with each key and the number 
of items present
* for that key.
*
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala 
b/sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala
index 4b08a4b..ff61431 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala
@@ -603,6 +603,70 @@ class DatasetSuite extends QueryTest with SharedSQLContext 
{
   ("a", 30L, 32L, 2L, 15.0), ("b", 3L, 5L, 2L, 1.5), ("c", 1L, 2L, 1L, 
1.0))
   }
 
+  test("typed aggregation: expr, expr, expr, expr, expr") {
+val ds = Seq(("a", 10), ("a", 20), ("b", 1), ("b", 2), ("c", 1)).toDS()
+
+checkDatasetUnorderly(
+  ds.groupByKey(_._1).agg(
+sum("_2").as[Long],
+sum($"_2" + 1).as[Long],
+

[spark] branch master updated: [SPARK-27959][YARN] Change YARN resource configs to use .amount

2019-07-16 Thread vanzin
This is an automated email from the ASF dual-hosted git repository.

vanzin pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 43d68cd  [SPARK-27959][YARN] Change YARN resource configs to use 
.amount
43d68cd is described below

commit 43d68cd4ff84530c3d597f07352984225ab1db7c
Author: Thomas Graves 
AuthorDate: Tue Jul 16 10:56:07 2019 -0700

[SPARK-27959][YARN] Change YARN resource configs to use .amount

## What changes were proposed in this pull request?

We are adding generic resource support into Spark, where a suffix denotes the
amount of the resource, so that other configs can be supported later.

Spark on YARN already added configs to request resources via
spark.yarn.{executor/driver/am}.resource=<value>, where the <value> holds the
amount and unit together. We should change those configs to have a `.amount`
suffix on them to match the Spark configs and to allow future configs to be
added more easily. YARN itself already supports tags and attributes, so if we
want the user to be able to pass those from Spark at some point, having a
suffix makes sense. it wou [...]
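
The docs change below uses GPUs as the running example; a minimal sketch with the
new `.amount` suffix, using the example keys from that docs change:

```scala
import org.apache.spark.SparkConf

object YarnGpuResourceRequest {
  // Example keys come from the docs/running-on-yarn.md hunk in this commit (YARN 3.0+ only).
  def conf(): SparkConf = new SparkConf()
    .set("spark.yarn.driver.resource.yarn.io/gpu.amount", "1")    // driver, cluster mode
    .set("spark.yarn.executor.resource.yarn.io/gpu.amount", "2")  // per executor
}
```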

## How was this patch tested?

Tested via unit tests and manually on a yarn 3.x cluster with GPU resources 
configured on.

Closes #24989 from tgravescs/SPARK-27959-yarn-resourceconfigs.

Authored-by: Thomas Graves 
Signed-off-by: Marcelo Vanzin 
---
 docs/running-on-yarn.md| 14 ++--
 .../org/apache/spark/deploy/yarn/Client.scala  | 14 ++--
 .../spark/deploy/yarn/ResourceRequestHelper.scala  | 46 -
 .../apache/spark/deploy/yarn/YarnAllocator.scala   |  5 +-
 .../spark/deploy/yarn/YarnSparkHadoopUtil.scala| 18 -
 .../org/apache/spark/deploy/yarn/ClientSuite.scala | 19 +++---
 .../deploy/yarn/ResourceRequestHelperSuite.scala   | 77 +-
 .../spark/deploy/yarn/YarnAllocatorSuite.scala | 13 ++--
 8 files changed, 137 insertions(+), 69 deletions(-)

diff --git a/docs/running-on-yarn.md b/docs/running-on-yarn.md
index dc93e9c..9d9b253 100644
--- a/docs/running-on-yarn.md
+++ b/docs/running-on-yarn.md
@@ -142,20 +142,20 @@ To use a custom metrics.properties for the application 
master and executors, upd
   
 
 
-  spark.yarn.am.resource.{resource-type}
+  spark.yarn.am.resource.{resource-type}.amount
   (none)
   
 Amount of resource to use for the YARN Application Master in client mode.
-In cluster mode, use 
spark.yarn.driver.resource.resource-type instead.
+In cluster mode, use 
spark.yarn.driver.resource.resource-type.amount instead.
 Please note that this feature can be used only with YARN 3.0+
 For reference, see YARN Resource Model documentation: 
https://hadoop.apache.org/docs/r3.0.1/hadoop-yarn/hadoop-yarn-site/ResourceModel.html
 
 Example: 
-To request GPU resources from YARN, use: 
spark.yarn.am.resource.yarn.io/gpu
+To request GPU resources from YARN, use: 
spark.yarn.am.resource.yarn.io/gpu.amount
   
 
 
-  spark.yarn.driver.resource.{resource-type}
+  spark.yarn.driver.resource.{resource-type}.amount
   (none)
   
 Amount of resource to use for the YARN Application Master in cluster mode.
@@ -163,11 +163,11 @@ To use a custom metrics.properties for the application 
master and executors, upd
 For reference, see YARN Resource Model documentation: 
https://hadoop.apache.org/docs/r3.0.1/hadoop-yarn/hadoop-yarn-site/ResourceModel.html
 
 Example: 
-To request GPU resources from YARN, use: 
spark.yarn.driver.resource.yarn.io/gpu
+To request GPU resources from YARN, use: 
spark.yarn.driver.resource.yarn.io/gpu.amount
   
 
 
-  spark.yarn.executor.resource.{resource-type}
+  spark.yarn.executor.resource.{resource-type}.amount
   (none)
  
  Amount of resource to use per executor process.
@@ -175,7 +175,7 @@ To use a custom metrics.properties for the application 
master and executors, upd
  For reference, see YARN Resource Model documentation: 
https://hadoop.apache.org/docs/r3.0.1/hadoop-yarn/hadoop-yarn-site/ResourceModel.html
  
  Example: 
- To request GPU resources from YARN, use: 
spark.yarn.executor.resource.yarn.io/gpu
+ To request GPU resources from YARN, use: 
spark.yarn.executor.resource.yarn.io/gpu.amount
  
 
 
diff --git 
a/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
 
b/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
index 5b361d1..651e706 100644
--- 
a/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
+++ 
b/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
@@ -51,7 +51,7 @@ import org.apache.spark.{SecurityManager, SparkConf, 
SparkException}
 import org.apache.spark.api.python.PythonUtils
 import org.apache.spark.deploy.{SparkApplication, SparkHadoopUtil}
 

[spark] branch master updated: [SPARK-28343][FOLLOW-UP][SQL][TEST] Enable spark.sql.function.preferIntegralDivision for PostgreSQL testing

2019-07-16 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 71882f1  [SPARK-28343][FOLLOW-UP][SQL][TEST] Enable 
spark.sql.function.preferIntegralDivision for PostgreSQL testing
71882f1 is described below

commit 71882f119e72934a02d4c177f0d52c785e2df79f
Author: Yuming Wang 
AuthorDate: Tue Jul 16 08:46:01 2019 -0700

[SPARK-28343][FOLLOW-UP][SQL][TEST] Enable 
spark.sql.function.preferIntegralDivision for PostgreSQL testing

## What changes were proposed in this pull request?

This PR enables `spark.sql.function.preferIntegralDivision` for PostgreSQL 
testing.

## How was this patch tested?

N/A

Closes #25170 from wangyum/SPARK-28343-2.

Authored-by: Yuming Wang 
Signed-off-by: Dongjoon Hyun 
---
 .../test/resources/sql-tests/inputs/pgSQL/int2.sql |  6 +-
 .../test/resources/sql-tests/inputs/pgSQL/int4.sql |  1 -
 .../test/resources/sql-tests/inputs/pgSQL/int8.sql |  1 -
 .../resources/sql-tests/results/pgSQL/case.sql.out | 18 ++---
 .../resources/sql-tests/results/pgSQL/int2.sql.out |  4 +-
 .../resources/sql-tests/results/pgSQL/int4.sql.out | 32 -
 .../resources/sql-tests/results/pgSQL/int8.sql.out | 78 +++---
 .../sql-tests/results/udf/pgSQL/udf-case.sql.out   |  8 +--
 .../org/apache/spark/sql/SQLQueryTestSuite.scala   |  1 +
 9 files changed, 73 insertions(+), 76 deletions(-)

diff --git a/sql/core/src/test/resources/sql-tests/inputs/pgSQL/int2.sql 
b/sql/core/src/test/resources/sql-tests/inputs/pgSQL/int2.sql
index 61f350d..f64ec5d 100644
--- a/sql/core/src/test/resources/sql-tests/inputs/pgSQL/int2.sql
+++ b/sql/core/src/test/resources/sql-tests/inputs/pgSQL/int2.sql
@@ -88,11 +88,9 @@ WHERE f1 > -32767;
 
 SELECT '' AS five, i.f1, i.f1 - int('2') AS x FROM INT2_TBL i;
 
--- PostgreSQL `/` is the same with Spark `div` since SPARK-2659.
-SELECT '' AS five, i.f1, i.f1 div smallint('2') AS x FROM INT2_TBL i;
+SELECT '' AS five, i.f1, i.f1 / smallint('2') AS x FROM INT2_TBL i;
 
--- PostgreSQL `/` is the same with Spark `div` since SPARK-2659.
-SELECT '' AS five, i.f1, i.f1 div int('2') AS x FROM INT2_TBL i;
+SELECT '' AS five, i.f1, i.f1 / int('2') AS x FROM INT2_TBL i;
 
 -- corner cases
 SELECT string(shiftleft(smallint(-1), 15));
diff --git a/sql/core/src/test/resources/sql-tests/inputs/pgSQL/int4.sql 
b/sql/core/src/test/resources/sql-tests/inputs/pgSQL/int4.sql
index 675636e..86432a8 100644
--- a/sql/core/src/test/resources/sql-tests/inputs/pgSQL/int4.sql
+++ b/sql/core/src/test/resources/sql-tests/inputs/pgSQL/int4.sql
@@ -135,7 +135,6 @@ SELECT int('1000') < int('999') AS `false`;
 
 SELECT 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 AS ten;
 
--- [SPARK-2659] HiveQL: Division operator should always perform fractional 
division
 SELECT 2 + 2 / 2 AS three;
 
 SELECT (2 + 2) / 2 AS two;
diff --git a/sql/core/src/test/resources/sql-tests/inputs/pgSQL/int8.sql 
b/sql/core/src/test/resources/sql-tests/inputs/pgSQL/int8.sql
index 32ac877..d29bf3b 100644
--- a/sql/core/src/test/resources/sql-tests/inputs/pgSQL/int8.sql
+++ b/sql/core/src/test/resources/sql-tests/inputs/pgSQL/int8.sql
@@ -85,7 +85,6 @@ SELECT 37 - q1 AS minus4 FROM INT8_TBL;
 SELECT '' AS five, 2 * q1 AS `twice int4` FROM INT8_TBL;
 SELECT '' AS five, q1 * 2 AS `twice int4` FROM INT8_TBL;
 
--- [SPARK-2659] HiveQL: Division operator should always perform fractional 
division
 -- int8 op int4
 SELECT q1 + int(42) AS `8plus4`, q1 - int(42) AS `8minus4`, q1 * int(42) AS 
`8mul4`, q1 / int(42) AS `8div4` FROM INT8_TBL;
 -- int4 op int8
diff --git a/sql/core/src/test/resources/sql-tests/results/pgSQL/case.sql.out 
b/sql/core/src/test/resources/sql-tests/results/pgSQL/case.sql.out
index 9b20b31..f95adcd 100644
--- a/sql/core/src/test/resources/sql-tests/results/pgSQL/case.sql.out
+++ b/sql/core/src/test/resources/sql-tests/results/pgSQL/case.sql.out
@@ -176,28 +176,28 @@ struct
 -- !query 18
 SELECT CASE WHEN 1=0 THEN 1/0 WHEN 1=1 THEN 1 ELSE 2/0 END
 -- !query 18 schema
-struct
+struct
 -- !query 18 output
-1.0
+1
 
 
 -- !query 19
 SELECT CASE 1 WHEN 0 THEN 1/0 WHEN 1 THEN 1 ELSE 2/0 END
 -- !query 19 schema
-struct
+struct
 -- !query 19 output
-1.0
+1
 
 
 -- !query 20
 SELECT CASE WHEN i > 100 THEN 1/0 ELSE 0 END FROM case_tbl
 -- !query 20 schema
-struct<CASE WHEN (i > 100) THEN (CAST(1 AS DOUBLE) / CAST(0 AS DOUBLE)) ELSE CAST(0 AS DOUBLE) END:double>
+struct<CASE WHEN (i > 100) THEN (1 div 0) ELSE 0 END:int>
 -- !query 20 output
-0.0
-0.0
-0.0
-0.0
+0
+0
+0
+0
 
 
 -- !query 21
diff --git a/sql/core/src/test/resources/sql-tests/results/pgSQL/int2.sql.out 
b/sql/core/src/test/resources/sql-tests/results/pgSQL/int2.sql.out
index 6b9246f..7a7ce5f 100644
--- a/sql/core/src/test/resources/sql-tests/results/pgSQL/int2.sql.out
+++ b/sql/core/src/test/resources/sql-tests/results/pgSQL/int2.sql.out
@@ -266,7 

[spark] branch branch-2.4 updated: Revert "[SPARK-27485] EnsureRequirements.reorder should handle duplicate expressions gracefully"

2019-07-16 Thread lixiao
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-2.4 by this push:
 new 63898cb  Revert "[SPARK-27485] EnsureRequirements.reorder should 
handle duplicate expressions gracefully"
63898cb is described below

commit 63898cbc1db46be8bcbb46d21fe01340bc883520
Author: gatorsmile 
AuthorDate: Tue Jul 16 06:57:14 2019 -0700

Revert "[SPARK-27485] EnsureRequirements.reorder should handle duplicate 
expressions gracefully"

This reverts commit 72f547d4a960ba0ba9cace53a0a5553eca1b4dd6.
---
 .../execution/exchange/EnsureRequirements.scala| 72 ++
 .../scala/org/apache/spark/sql/JoinSuite.scala | 20 --
 .../apache/spark/sql/execution/PlannerSuite.scala  | 26 
 3 files changed, 32 insertions(+), 86 deletions(-)

diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala
index bdb9a31..d2d5011 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala
@@ -24,7 +24,8 @@ import org.apache.spark.sql.catalyst.expressions._
 import org.apache.spark.sql.catalyst.plans.physical._
 import org.apache.spark.sql.catalyst.rules.Rule
 import org.apache.spark.sql.execution._
-import org.apache.spark.sql.execution.joins.{ShuffledHashJoinExec, 
SortMergeJoinExec}
+import org.apache.spark.sql.execution.joins.{BroadcastHashJoinExec, 
ShuffledHashJoinExec,
+  SortMergeJoinExec}
 import org.apache.spark.sql.internal.SQLConf
 
 /**
@@ -220,41 +221,25 @@ case class EnsureRequirements(conf: SQLConf) extends 
Rule[SparkPlan] {
   }
 
   private def reorder(
-  leftKeys: IndexedSeq[Expression],
-  rightKeys: IndexedSeq[Expression],
+  leftKeys: Seq[Expression],
+  rightKeys: Seq[Expression],
   expectedOrderOfKeys: Seq[Expression],
   currentOrderOfKeys: Seq[Expression]): (Seq[Expression], Seq[Expression]) 
= {
-if (expectedOrderOfKeys.size != currentOrderOfKeys.size) {
-  return (leftKeys, rightKeys)
-}
-
-// Build a lookup between an expression and the positions its holds in the 
current key seq.
-val keyToIndexMap = mutable.Map.empty[Expression, mutable.BitSet]
-currentOrderOfKeys.zipWithIndex.foreach {
-  case (key, index) =>
-keyToIndexMap.getOrElseUpdate(key.canonicalized, 
mutable.BitSet.empty).add(index)
-}
-
-// Reorder the keys.
-val leftKeysBuffer = new ArrayBuffer[Expression](leftKeys.size)
-val rightKeysBuffer = new ArrayBuffer[Expression](rightKeys.size)
-val iterator = expectedOrderOfKeys.iterator
-while (iterator.hasNext) {
-  // Lookup the current index of this key.
-  keyToIndexMap.get(iterator.next().canonicalized) match {
-case Some(indices) if indices.nonEmpty =>
-  // Take the first available index from the map.
-  val index = indices.firstKey
-  indices.remove(index)
+val leftKeysBuffer = ArrayBuffer[Expression]()
+val rightKeysBuffer = ArrayBuffer[Expression]()
+val pickedIndexes = mutable.Set[Int]()
+val keysAndIndexes = currentOrderOfKeys.zipWithIndex
 
-  // Add the keys for that index to the reordered keys.
-  leftKeysBuffer += leftKeys(index)
-  rightKeysBuffer += rightKeys(index)
-case _ =>
-  // The expression cannot be found, or we have exhausted all indices 
for that expression.
-  return (leftKeys, rightKeys)
-  }
-}
+expectedOrderOfKeys.foreach(expression => {
+  val index = keysAndIndexes.find { case (e, idx) =>
+// As we may have the same key used many times, we need to filter out 
its occurrence we
+// have already used.
+e.semanticEquals(expression) && !pickedIndexes.contains(idx)
+  }.map(_._2).get
+  pickedIndexes += index
+  leftKeysBuffer.append(leftKeys(index))
+  rightKeysBuffer.append(rightKeys(index))
+})
 (leftKeysBuffer, rightKeysBuffer)
   }
 
@@ -264,13 +249,20 @@ case class EnsureRequirements(conf: SQLConf) extends 
Rule[SparkPlan] {
   leftPartitioning: Partitioning,
   rightPartitioning: Partitioning): (Seq[Expression], Seq[Expression]) = {
 if (leftKeys.forall(_.deterministic) && rightKeys.forall(_.deterministic)) 
{
-  (leftPartitioning, rightPartitioning) match {
-case (HashPartitioning(leftExpressions, _), _) =>
-  reorder(leftKeys.toIndexedSeq, rightKeys.toIndexedSeq, 
leftExpressions, leftKeys)
-case (_, HashPartitioning(rightExpressions, _)) =>
-  reorder(leftKeys.toIndexedSeq, rightKeys.toIndexedSeq, 
rightExpressions, rightKeys)
-case _ =>
-  (leftKeys, rightKeys)
+

[spark] branch master updated (113f62d -> 282a12d)

2019-07-16 Thread srowen
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 113f62d  [SPARK-27485][FOLLOWUP] Do not reduce the number of 
partitions for repartition in adaptive execution - fix compilation
 add 282a12d  [SPARK-27944][ML] Unify the behavior of checking empty output 
column names

No new revisions were added by this update.

Summary of changes:
 .../main/scala/org/apache/spark/ml/Predictor.scala |  4 +--
 .../spark/ml/classification/Classifier.scala   |  6 ++---
 .../apache/spark/ml/classification/OneVsRest.scala | 18 ++---
 .../classification/ProbabilisticClassifier.scala   |  4 +--
 .../spark/ml/clustering/GaussianMixture.scala  | 30 +-
 .../scala/org/apache/spark/ml/clustering/LDA.scala | 15 +++
 .../ml/regression/AFTSurvivalRegression.scala  | 27 ++-
 .../ml/regression/DecisionTreeRegressor.scala  | 26 ++-
 .../regression/GeneralizedLinearRegression.scala   | 25 +-
 9 files changed, 103 insertions(+), 52 deletions(-)





[spark] branch branch-2.4 updated (72f547d -> 3f5a114)

2019-07-16 Thread srowen
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a change to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 72f547d  [SPARK-27485] EnsureRequirements.reorder should handle 
duplicate expressions gracefully
 add 3f5a114  [SPARK-28247][SS][BRANCH-2.4] Fix flaky test "query without 
test harness" on ContinuousSuite

No new revisions were added by this update.

Summary of changes:
 .../sql/streaming/continuous/ContinuousSuite.scala | 63 --
 1 file changed, 47 insertions(+), 16 deletions(-)





[spark] branch branch-2.4 updated: [SPARK-27485] EnsureRequirements.reorder should handle duplicate expressions gracefully

2019-07-16 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-2.4 by this push:
 new 72f547d  [SPARK-27485] EnsureRequirements.reorder should handle 
duplicate expressions gracefully
72f547d is described below

commit 72f547d4a960ba0ba9cace53a0a5553eca1b4dd6
Author: herman 
AuthorDate: Tue Jul 16 17:09:52 2019 +0800

[SPARK-27485] EnsureRequirements.reorder should handle duplicate 
expressions gracefully

## What changes were proposed in this pull request?
When reordering joins, EnsureRequirements only checks whether all the join keys
are present in the partitioning expression seq. This is problematic when the
join keys and the partitioning expressions both contain duplicates but not the
same number of duplicates for each expression, e.g. `Seq(a, a, b)` vs `Seq(a,
b, b)`. This fails with an index lookup failure in the `reorder` function.

This PR fixes this by removing the equality checking logic from the
`reorderJoinKeys` function, and by doing the multiset equality check in the
`reorder` function while building the reordered key sequences.

## How was this patch tested?
Added a unit test to the `PlannerSuite` and added an integration test to 
`JoinSuite`
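
A hedged sketch (an editor's illustration, not the JoinSuite test added here) of the
query shape the description refers to: the left side is hash-partitioned on
`Seq(a, b, b)` while the left join keys are `Seq(a, a, b)`, i.e. the same expressions
with different multiplicities. Whether the old failure reproduces depends on the
planner choosing a shuffle-based join, hence the broadcast threshold is disabled.

```scala
import org.apache.spark.sql.SparkSession

object DuplicateJoinKeyShape {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[2]").appName("dup-join-keys").getOrCreate()
    import spark.implicits._
    // Force a shuffle-based join so the key-reordering code path is exercised.
    spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "-1")

    // Left partitioning expressions: Seq(a, b, b); left join keys below: Seq(a, a, b).
    val left = Seq((1, 2), (2, 3)).toDF("a", "b").repartition($"a", $"b", $"b")
    val right = Seq((1, 1, 2), (2, 2, 3)).toDF("x", "y", "z")
    left.join(right, $"a" === $"x" && $"a" === $"y" && $"b" === $"z").show()

    spark.stop()
  }
}
```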

Closes #25167 from hvanhovell/SPARK-27485.

Authored-by: herman 
Signed-off-by: Wenchen Fan 
---
 .../execution/exchange/EnsureRequirements.scala| 72 --
 .../scala/org/apache/spark/sql/JoinSuite.scala | 20 ++
 .../apache/spark/sql/execution/PlannerSuite.scala  | 26 
 3 files changed, 86 insertions(+), 32 deletions(-)

diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala
index d2d5011..bdb9a31 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala
@@ -24,8 +24,7 @@ import org.apache.spark.sql.catalyst.expressions._
 import org.apache.spark.sql.catalyst.plans.physical._
 import org.apache.spark.sql.catalyst.rules.Rule
 import org.apache.spark.sql.execution._
-import org.apache.spark.sql.execution.joins.{BroadcastHashJoinExec, 
ShuffledHashJoinExec,
-  SortMergeJoinExec}
+import org.apache.spark.sql.execution.joins.{ShuffledHashJoinExec, 
SortMergeJoinExec}
 import org.apache.spark.sql.internal.SQLConf
 
 /**
@@ -221,25 +220,41 @@ case class EnsureRequirements(conf: SQLConf) extends 
Rule[SparkPlan] {
   }
 
   private def reorder(
-  leftKeys: Seq[Expression],
-  rightKeys: Seq[Expression],
+  leftKeys: IndexedSeq[Expression],
+  rightKeys: IndexedSeq[Expression],
   expectedOrderOfKeys: Seq[Expression],
   currentOrderOfKeys: Seq[Expression]): (Seq[Expression], Seq[Expression]) 
= {
-val leftKeysBuffer = ArrayBuffer[Expression]()
-val rightKeysBuffer = ArrayBuffer[Expression]()
-val pickedIndexes = mutable.Set[Int]()
-val keysAndIndexes = currentOrderOfKeys.zipWithIndex
+if (expectedOrderOfKeys.size != currentOrderOfKeys.size) {
+  return (leftKeys, rightKeys)
+}
+
+// Build a lookup between an expression and the positions its holds in the 
current key seq.
+val keyToIndexMap = mutable.Map.empty[Expression, mutable.BitSet]
+currentOrderOfKeys.zipWithIndex.foreach {
+  case (key, index) =>
+keyToIndexMap.getOrElseUpdate(key.canonicalized, 
mutable.BitSet.empty).add(index)
+}
+
+// Reorder the keys.
+val leftKeysBuffer = new ArrayBuffer[Expression](leftKeys.size)
+val rightKeysBuffer = new ArrayBuffer[Expression](rightKeys.size)
+val iterator = expectedOrderOfKeys.iterator
+while (iterator.hasNext) {
+  // Lookup the current index of this key.
+  keyToIndexMap.get(iterator.next().canonicalized) match {
+case Some(indices) if indices.nonEmpty =>
+  // Take the first available index from the map.
+  val index = indices.firstKey
+  indices.remove(index)
 
-expectedOrderOfKeys.foreach(expression => {
-  val index = keysAndIndexes.find { case (e, idx) =>
-// As we may have the same key used many times, we need to filter out 
its occurrence we
-// have already used.
-e.semanticEquals(expression) && !pickedIndexes.contains(idx)
-  }.map(_._2).get
-  pickedIndexes += index
-  leftKeysBuffer.append(leftKeys(index))
-  rightKeysBuffer.append(rightKeys(index))
-})
+  // Add the keys for that index to the reordered keys.
+  leftKeysBuffer += leftKeys(index)
+  rightKeysBuffer += rightKeys(index)
+case _ =>
+  // The expression cannot be found, or we have exhausted all indices 
for that expression.
+   

[spark] branch master updated (f74ad3d -> 113f62d)

2019-07-16 Thread hvanhovell
This is an automated email from the ASF dual-hosted git repository.

hvanhovell pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from f74ad3d  [SPARK-28129][SQL][TEST] Port float8.sql
 add 113f62d  [SPARK-27485][FOLLOWUP] Do not reduce the number of 
partitions for repartition in adaptive execution - fix compilation

No new revisions were added by this update.

Summary of changes:
 .../src/test/scala/org/apache/spark/sql/execution/PlannerSuite.scala  | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)





[spark] branch master updated: [SPARK-28129][SQL][TEST] Port float8.sql

2019-07-16 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new f74ad3d  [SPARK-28129][SQL][TEST] Port float8.sql
f74ad3d is described below

commit f74ad3d7004e833b6dbc07d6281407ab89ef2d32
Author: Yuming Wang 
AuthorDate: Tue Jul 16 19:31:20 2019 +0900

[SPARK-28129][SQL][TEST] Port float8.sql

## What changes were proposed in this pull request?

This PR is to port float8.sql from PostgreSQL regression tests. 
https://github.com/postgres/postgres/blob/REL_12_BETA2/src/test/regress/sql/float8.sql

The expected results can be found in the link: 
https://github.com/postgres/postgres/blob/REL_12_BETA2/src/test/regress/expected/float8.out

When porting the test cases, found six PostgreSQL specific features that do 
not exist in Spark SQL:
[SPARK-28060](https://issues.apache.org/jira/browse/SPARK-28060): Double 
type can not accept some special inputs
[SPARK-28027](https://issues.apache.org/jira/browse/SPARK-28027): Spark SQL 
does not support prefix operator `` and `|/`
[SPARK-28061](https://issues.apache.org/jira/browse/SPARK-28061): Support 
for converting float to binary format
[SPARK-23906](https://issues.apache.org/jira/browse/SPARK-23906): Support 
Truncate number
[SPARK-28134](https://issues.apache.org/jira/browse/SPARK-28134): Missing 
Trigonometric Functions

Also, found two bugs:
[SPARK-28024](https://issues.apache.org/jira/browse/SPARK-28024): Incorrect 
value when out of range
[SPARK-28135](https://issues.apache.org/jira/browse/SPARK-28135): 
ceil/ceiling/floor/power returns incorrect values

Also, found four inconsistent behaviors:
[SPARK-27923](https://issues.apache.org/jira/browse/SPARK-27923): Spark SQL 
insert bad inputs to NULL
[SPARK-28028](https://issues.apache.org/jira/browse/SPARK-28028): Cast 
numeric to integral type need round
[SPARK-27923](https://issues.apache.org/jira/browse/SPARK-27923): Spark SQL 
returns NULL when dividing by zero
[SPARK-28007](https://issues.apache.org/jira/browse/SPARK-28007):  Caret 
operator (^) means bitwise XOR in Spark/Hive and exponentiation in Postgres

## How was this patch tested?

N/A

Closes #24931 from wangyum/SPARK-28129.

Authored-by: Yuming Wang 
Signed-off-by: HyukjinKwon 
---
 .../resources/sql-tests/inputs/pgSQL/float8.sql| 500 
 .../sql-tests/results/pgSQL/float8.sql.out | 839 +
 2 files changed, 1339 insertions(+)

diff --git a/sql/core/src/test/resources/sql-tests/inputs/pgSQL/float8.sql 
b/sql/core/src/test/resources/sql-tests/inputs/pgSQL/float8.sql
new file mode 100644
index 000..6f8e3b5
--- /dev/null
+++ b/sql/core/src/test/resources/sql-tests/inputs/pgSQL/float8.sql
@@ -0,0 +1,500 @@
+--
+-- Portions Copyright (c) 1996-2019, PostgreSQL Global Development Group
+--
+--
+-- FLOAT8
+-- 
https://github.com/postgres/postgres/blob/REL_12_BETA2/src/test/regress/sql/float8.sql
+
+CREATE TABLE FLOAT8_TBL(f1 double) USING parquet;
+
+INSERT INTO FLOAT8_TBL VALUES ('0.0   ');
+INSERT INTO FLOAT8_TBL VALUES ('1004.30  ');
+INSERT INTO FLOAT8_TBL VALUES ('   -34.84');
+INSERT INTO FLOAT8_TBL VALUES ('1.2345678901234e+200');
+INSERT INTO FLOAT8_TBL VALUES ('1.2345678901234e-200');
+
+-- [SPARK-28024] Incorrect numeric values when out of range
+-- test for underflow and overflow handling
+SELECT double('10e400');
+SELECT double('-10e400');
+SELECT double('10e-400');
+SELECT double('-10e-400');
+
+-- [SPARK-28061] Support for converting float to binary format
+-- test smallest normalized input
+-- SELECT float8send('2.2250738585072014E-308'::float8);
+
+-- [SPARK-27923] Spark SQL insert there bad inputs to NULL
+-- bad input
+-- INSERT INTO FLOAT8_TBL VALUES ('');
+-- INSERT INTO FLOAT8_TBL VALUES (' ');
+-- INSERT INTO FLOAT8_TBL VALUES ('xyz');
+-- INSERT INTO FLOAT8_TBL VALUES ('5.0.0');
+-- INSERT INTO FLOAT8_TBL VALUES ('5 . 0');
+-- INSERT INTO FLOAT8_TBL VALUES ('5.   0');
+-- INSERT INTO FLOAT8_TBL VALUES ('- 3');
+-- INSERT INTO FLOAT8_TBL VALUES ('123   5');
+
+-- special inputs
+SELECT double('NaN');
+-- [SPARK-28060] Double type can not accept some special inputs
+SELECT double('nan');
+SELECT double('   NAN  ');
+SELECT double('infinity');
+SELECT double('  -INFINiTY   ');
+-- [SPARK-27923] Spark SQL insert there bad special inputs to NULL
+-- bad special inputs
+SELECT double('N A N');
+SELECT double('NaN x');
+SELECT double(' INFINITYx');
+
+SELECT double('Infinity') + 100.0;
+-- [SPARK-27768] Infinity, -Infinity, NaN should be recognized in a case 
insensitive manner
+SELECT double('Infinity') / double('Infinity');
+SELECT double('NaN') / double('NaN');
+-- [SPARK-28315] Decimal can not accept NaN as input
+SELECT double(decimal('nan'));
+

[spark] branch master updated (421d9d5 -> d1a1376)

2019-07-16 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 421d9d5  [SPARK-27485] EnsureRequirements.reorder should handle 
duplicate expressions gracefully
 add d1a1376  [SPARK-28356][SQL] Do not reduce the number of partitions for 
repartition in adaptive execution

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/execution/SparkStrategies.scala   |  6 --
 .../adaptive/ReduceNumShufflePartitions.scala   | 21 +++--
 .../sql/execution/exchange/EnsureRequirements.scala |  4 ++--
 .../execution/exchange/ShuffleExchangeExec.scala|  3 ++-
 .../scala/org/apache/spark/sql/DatasetSuite.scala   |  2 +-
 .../execution/ReduceNumShufflePartitionsSuite.scala | 15 +--
 6 files changed, 25 insertions(+), 26 deletions(-)





[spark] branch master updated (9a7f01d -> 421d9d5)

2019-07-16 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 9a7f01d  [SPARK-28201][SQL][TEST][FOLLOWUP] Fix Integration test suite 
according to the new exception message
 add 421d9d5  [SPARK-27485] EnsureRequirements.reorder should handle 
duplicate expressions gracefully

No new revisions were added by this update.

Summary of changes:
 .../execution/exchange/EnsureRequirements.scala| 74 --
 .../scala/org/apache/spark/sql/JoinSuite.scala | 20 ++
 .../apache/spark/sql/execution/PlannerSuite.scala  | 26 
 3 files changed, 87 insertions(+), 33 deletions(-)





[spark] branch master updated (6926849 -> 9a7f01d)

2019-07-16 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 6926849  [SPARK-28395][SQL] Division operator support integral division
 add 9a7f01d  [SPARK-28201][SQL][TEST][FOLLOWUP] Fix Integration test suite 
according to the new exception message

No new revisions were added by this update.

Summary of changes:
 .../test/scala/org/apache/spark/sql/jdbc/OracleIntegrationSuite.scala | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)





[spark] branch master updated: [SPARK-28395][SQL] Division operator support integral division

2019-07-16 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 6926849  [SPARK-28395][SQL] Division operator support integral division
6926849 is described below

commit 69268492471137dd7a3da54c218026c3b1fa7db3
Author: Yuming Wang 
AuthorDate: Tue Jul 16 15:43:15 2019 +0800

[SPARK-28395][SQL] Division operator support integral division

## What changes were proposed in this pull request?

PostgreSQL, Teradata, SQL Server, DB2 and Presto perform integral division 
with the `/` operator.
But Oracle, Vertica, Hive, MySQL and MariaDB perform fractional division 
with the `/` operator.

This PR adds a flag (`spark.sql.function.preferIntegralDivision`) to control
whether to use integral division with the `/` operator.
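
Before the cross-database examples below, a minimal Spark-side sketch of the flag's
effect (an editor's illustration; it assumes the flag can be toggled on a running
session):

```scala
import org.apache.spark.sql.SparkSession

object PreferIntegralDivisionDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("integral-div").getOrCreate()

    // Default: `/` performs fractional division, as in Hive/MySQL.
    spark.sql("SELECT CAST(10 AS INT) / CAST(3 AS INT)").show()   // ~3.3333

    // With the flag on: `/` on integral operands performs integral division, as in PostgreSQL.
    spark.conf.set("spark.sql.function.preferIntegralDivision", "true")
    spark.sql("SELECT CAST(10 AS INT) / CAST(3 AS INT)").show()   // 3

    spark.stop()
  }
}
```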

Examples:

**PostgreSQL**:
```sql
postgres=# select substr(version(), 0, 16), cast(10 as int) / cast(3 as 
int), cast(10.1 as float8) / cast(3 as int), cast(10 as int) / cast(3.1 as 
float8), cast(10.1 as float8)/cast(3.1 as float8);
 substr  | ?column? | ?column? |?column? | 
?column?

-+--+--+-+--
 PostgreSQL 11.3 |3 | 3.37 | 3.2258064516129 | 
3.25806451612903
(1 row)
```
**SQL Server**:
```sql
1> select cast(10 as int) / cast(3 as int), cast(10.1 as float) / cast(3 as 
int), cast(10 as int) / cast(3.1 as float), cast(10.1 as float)/cast(3.1 as 
float);
2> go

---   

  3   3.36673.225806451612903
3.258064516129032

(1 rows affected)
```
**DB2**:
```sql
[db2inst12f3c821d36b7 ~]$ db2 "select cast(10 as int) / cast(3 as int), 
cast(10.1 as double) / cast(3 as int), cast(10 as int) / cast(3.1 as double), 
cast(10.1 as double)/cast(3.1 as double) from table 
(sysproc.env_get_inst_info())"

1   234
---   

  3   +3.37E+000   +3.22580645161290E+000   
+3.25806451612903E+000

  1 record(s) selected.
```
**Presto**:
```sql
presto> select cast(10 as int) / cast(3 as int), cast(10.1 as double) / 
cast(3 as int), cast(10 as int) / cast(3.1 as double), cast(10.1 as 
double)/cast(3.1 as double);
 _col0 |   _col1|   _col2   |   _col3
---++---+---
 3 | 3.3667 | 3.225806451612903 | 3.258064516129032
(1 row)
```
**Teradata**:

![image](https://user-images.githubusercontent.com/5399861/61200701-e97d5380-a714-11e9-9a1d-57fd99d38c8d.png)

**Oracle**:
```sql
SQL> select 10 / 3 from dual;

  10/3
--
3.
```
**Vertica**
```sql
dbadmin=> select version(), cast(10 as int) / cast(3 as int), cast(10.1 as 
float8) / cast(3 as int), cast(10 as int) / cast(3.1 as float8), cast(10.1 as 
float8)/cast(3.1 as float8);
  version   |   ?column?   | ?column?   
  |?column? | ?column?

+--+--+-+--
 Vertica Analytic Database v9.1.1-0 | 3.33 | 
3.37 | 3.2258064516129 | 3.25806451612903
(1 row)
```
**Hive**:
```sql
hive> select cast(10 as int) / cast(3 as int), cast(10.1 as double) / 
cast(3 as int), cast(10 as int) / cast(3.1 as double), cast(10.1 as 
double)/cast(3.1 as double);
OK
3.3335  3.3667  3.225806451612903   
3.258064516129032
Time taken: 0.143 seconds, Fetched: 1 row(s)
```
**MariaDB**:
```sql
MariaDB [(none)]> select version(), cast(10 as int) / cast(3 as int), 
cast(10.1 as double) / cast(3 as int), cast(10 as int) / cast(3.1 as double), 
cast(10.1 as double)/cast(3.1 as double);

+--+--+---+---+--+
| version()| cast(10 as int) / cast(3 as int) | 
cast(10.1 as double) / cast(3 as int) | cast(10 as int) / cast(3.1 as double) | 
cast(10.1 as double)/cast(3.1 as double) |

+--+--+---+---+--+
| 10.4.6-MariaDB-1:10.4.6+maria~bionic | 

[spark] branch master updated (b94fa97 -> be4a552)

2019-07-16 Thread jshao
This is an automated email from the ASF dual-hosted git repository.

jshao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from b94fa97  [SPARK-28345][SQL][PYTHON] PythonUDF predicate should be able 
to pushdown to join
 add be4a552  [SPARK-28106][SQL] When Spark SQL use  "add jar" ,  before 
add to SparkContext, check  jar path exist first.

No new revisions were added by this update.

Summary of changes:
 .../main/scala/org/apache/spark/SparkContext.scala | 34 ++
 .../scala/org/apache/spark/SparkContextSuite.scala | 11 +++
 2 files changed, 40 insertions(+), 5 deletions(-)





[spark] branch master updated (8e26d4d -> b94fa97)

2019-07-16 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 8e26d4d  [SPARK-28408][SQL][TEST] Restrict test values for DateType, 
TimestampType and CalendarIntervalType
 add b94fa97  [SPARK-28345][SQL][PYTHON] PythonUDF predicate should be able 
to pushdown to join

No new revisions were added by this update.

Summary of changes:
 .../sql/catalyst/expressions/predicates.scala  |  4 +++
 .../catalyst/optimizer/FilterPushdownSuite.scala   | 34 --
 .../scala/org/apache/spark/sql/JoinSuite.scala | 25 +++-
 3 files changed, 59 insertions(+), 4 deletions(-)

