spark git commit: [SPARK-22274][PYTHON][SQL] User-defined aggregation functions with pandas udf (full shuffle)
Repository: spark
Updated Branches: refs/heads/master 51eb75026 -> b2ce17b4c

[SPARK-22274][PYTHON][SQL] User-defined aggregation functions with pandas udf (full shuffle)

## What changes were proposed in this pull request?

Add support for using pandas UDFs with groupby().agg(). This PR introduces a new type of pandas UDF - the group aggregate pandas UDF. This type of UDF defines a transformation of multiple pandas Series -> a scalar value. Group aggregate pandas UDFs can be used with groupby().agg(). Note that group aggregate pandas UDFs don't support partial aggregation, i.e., a full shuffle is required. This PR doesn't support group aggregate pandas UDFs that return ArrayType, StructType or MapType. Support for these types is left for a future PR.

## How was this patch tested?

GroupbyAggPandasUDFTests

Author: Li Jin

Closes #19872 from icexelloss/SPARK-22274-groupby-agg.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/b2ce17b4
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/b2ce17b4
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/b2ce17b4
Branch: refs/heads/master
Commit: b2ce17b4c9fea58140a57ca1846b2689b15c0d61
Parents: 51eb750
Author: Li Jin
Authored: Tue Jan 23 14:11:30 2018 +0900
Committer: Takuya UESHIN
Committed: Tue Jan 23 14:11:30 2018 +0900
--
.../apache/spark/api/python/PythonRunner.scala | 2 +
python/pyspark/rdd.py | 1 +
python/pyspark/sql/functions.py | 36 +-
python/pyspark/sql/group.py | 33 +-
python/pyspark/sql/tests.py | 486 ++-
python/pyspark/sql/udf.py | 13 +-
python/pyspark/worker.py | 22 +-
.../sql/catalyst/analysis/CheckAnalysis.scala | 14 +-
.../sql/catalyst/expressions/PythonUDF.scala | 64 +++
.../spark/sql/catalyst/planning/patterns.scala | 12 +-
.../spark/sql/RelationalGroupedDataset.scala | 1 -
.../spark/sql/execution/SparkStrategies.scala | 29 +-
.../python/AggregateInPandasExec.scala | 155 ++
.../execution/python/ExtractPythonUDFs.scala | 16 +-
.../spark/sql/execution/python/PythonUDF.scala | 41 --
.../python/UserDefinedPythonFunction.scala | 2 +-
16 files changed, 829 insertions(+), 98 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/b2ce17b4/core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala
--
diff --git a/core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala b/core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala index 1ec0e71..29148a7 100644 --- a/core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala +++ b/core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala @@ -39,12 +39,14 @@ private[spark] object PythonEvalType { val SQL_PANDAS_SCALAR_UDF = 200 val SQL_PANDAS_GROUP_MAP_UDF = 201 + val SQL_PANDAS_GROUP_AGG_UDF = 202 def toString(pythonEvalType: Int): String = pythonEvalType match { case NON_UDF => "NON_UDF" case SQL_BATCHED_UDF => "SQL_BATCHED_UDF" case SQL_PANDAS_SCALAR_UDF => "SQL_PANDAS_SCALAR_UDF" case SQL_PANDAS_GROUP_MAP_UDF => "SQL_PANDAS_GROUP_MAP_UDF" +case SQL_PANDAS_GROUP_AGG_UDF => "SQL_PANDAS_GROUP_AGG_UDF" } }

http://git-wip-us.apache.org/repos/asf/spark/blob/b2ce17b4/python/pyspark/rdd.py
--
diff --git a/python/pyspark/rdd.py b/python/pyspark/rdd.py index 1b39155..6b018c3 100644 --- a/python/pyspark/rdd.py +++ b/python/pyspark/rdd.py @@ -70,6 +70,7 @@ class PythonEvalType(object): SQL_PANDAS_SCALAR_UDF = 200 SQL_PANDAS_GROUP_MAP_UDF = 201 +SQL_PANDAS_GROUP_AGG_UDF = 202 def portable_hash(x):

http://git-wip-us.apache.org/repos/asf/spark/blob/b2ce17b4/python/pyspark/sql/functions.py
--
diff --git a/python/pyspark/sql/functions.py b/python/pyspark/sql/functions.py index 961b326..a291c9b 100644 --- a/python/pyspark/sql/functions.py +++ b/python/pyspark/sql/functions.py @@ -2089,6 +2089,8 @@ class PandasUDFType(object): GROUP_MAP = PythonEvalType.SQL_PANDAS_GROUP_MAP_UDF +GROUP_AGG = PythonEvalType.SQL_PANDAS_GROUP_AGG_UDF + @since(1.3) def udf(f=None, returnType=StringType()): @@ -2159,7 +2161,7 @@ def pandas_udf(f=None, returnType=None, functionType=None): 1. SCALAR A scalar UDF defines a transformation: One or more `pandas.Series` -> A `pandas.Series`. - The returnType should be a primitive data type, e.g., `DoubleType()`. + The returnType should be a
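For context, here is a minimal sketch of how the new UDF type is used, based on the PR description and the `GROUP_AGG` name introduced by this commit (the `spark` session, data, and column names are illustrative, not from the patch):

```python
from pyspark.sql.functions import pandas_udf, PandasUDFType

df = spark.createDataFrame(
    [(1, 1.0), (1, 2.0), (2, 3.0), (2, 5.0), (2, 10.0)], ("id", "v"))

# A group aggregate pandas UDF: receives one or more pandas.Series for a
# group and returns a single scalar. There is no partial aggregation, so
# all rows of a group are first shuffled to the same worker.
@pandas_udf("double", PandasUDFType.GROUP_AGG)
def mean_udf(v):
    return v.mean()

df.groupby("id").agg(mean_udf(df["v"])).show()
```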
svn commit: r24370 - in /dev/spark/2.3.1-SNAPSHOT-2018_01_22_18_01-7241556-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s
Author: pwendell Date: Tue Jan 23 02:15:25 2018 New Revision: 24370 Log: Apache Spark 2.3.1-SNAPSHOT-2018_01_22_18_01-7241556 docs [This commit notification would consist of 1442 parts, which exceeds the limit of 50 ones, so it was shortened to the summary.] - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
svn commit: r24368 - in /dev/spark/2.4.0-SNAPSHOT-2018_01_22_16_01-51eb750-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s
Author: pwendell Date: Tue Jan 23 00:14:52 2018 New Revision: 24368 Log: Apache Spark 2.4.0-SNAPSHOT-2018_01_22_16_01-51eb750 docs [This commit notification would consist of 1442 parts, which exceeds the limit of 50 ones, so it was shortened to the summary.] - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-22389][SQL] data source v2 partitioning reporting interface
Repository: spark
Updated Branches: refs/heads/master 76b8b840d -> 51eb75026

[SPARK-22389][SQL] data source v2 partitioning reporting interface

## What changes were proposed in this pull request?

This adds a new interface that allows a data source to report its partitioning and avoid a shuffle on the Spark side. The design closely follows the internal distribution/partitioning framework: Spark defines a `Distribution` interface and several concrete implementations, and asks the data source to report a `Partitioning`; the `Partitioning` tells Spark whether it can satisfy a given `Distribution`.

## How was this patch tested?

A new test.

Author: Wenchen Fan

Closes #20201 from cloud-fan/partition-reporting.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/51eb7502
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/51eb7502
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/51eb7502
Branch: refs/heads/master
Commit: 51eb750263dd710434ddb60311571fa3dcec66eb
Parents: 76b8b84
Author: Wenchen Fan
Authored: Mon Jan 22 15:21:09 2018 -0800
Committer: gatorsmile
Committed: Mon Jan 22 15:21:09 2018 -0800
--
.../catalyst/plans/physical/partitioning.scala | 2 +-
.../v2/reader/ClusteredDistribution.java | 38 +++
.../sql/sources/v2/reader/Distribution.java | 39 +++
.../sql/sources/v2/reader/Partitioning.java | 46 
.../v2/reader/SupportsReportPartitioning.java | 33 ++
.../datasources/v2/DataSourcePartitioning.scala | 56 ++
.../datasources/v2/DataSourceV2ScanExec.scala | 9 ++
.../v2/JavaPartitionAwareDataSource.java | 110 +++
.../sql/sources/v2/DataSourceV2Suite.scala | 79 +
9 files changed, 411 insertions(+), 1 deletion(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/51eb7502/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/partitioning.scala
--
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/partitioning.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/partitioning.scala index 0189bd7..4d9a992 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/partitioning.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/partitioning.scala @@ -153,7 +153,7 @@ case class BroadcastDistribution(mode: BroadcastMode) extends Distribution { * 1. number of partitions. * 2. if it can satisfy a given distribution. */ -sealed trait Partitioning { +trait Partitioning { /** Returns the number of partitions that the data is split across */ val numPartitions: Int

http://git-wip-us.apache.org/repos/asf/spark/blob/51eb7502/sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/ClusteredDistribution.java
--
diff --git a/sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/ClusteredDistribution.java b/sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/ClusteredDistribution.java new file mode 100644 index 000..7346500 --- /dev/null +++ b/sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/ClusteredDistribution.java @@ -0,0 +1,38 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License.
You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.sources.v2.reader; + +import org.apache.spark.annotation.InterfaceStability; + +/** + * A concrete implementation of {@link Distribution}. Represents a distribution where records that + * share the same values for the {@link #clusteredColumns} will be produced by the same + * {@link ReadTask}. + */ +@InterfaceStability.Evolving +public class ClusteredDistribution implements Distribution { + + /** + * The names of the clustered columns. Note that they are order insensitive. + */ + public final String[] clusteredColumns; + + public
spark git commit: [SPARK-22389][SQL] data source v2 partitioning reporting interface
Repository: spark
Updated Branches: refs/heads/branch-2.3 566ef93a6 -> 7241556d8

[SPARK-22389][SQL] data source v2 partitioning reporting interface

## What changes were proposed in this pull request?

This adds a new interface that allows a data source to report its partitioning and avoid a shuffle on the Spark side. The design closely follows the internal distribution/partitioning framework: Spark defines a `Distribution` interface and several concrete implementations, and asks the data source to report a `Partitioning`; the `Partitioning` tells Spark whether it can satisfy a given `Distribution`.

## How was this patch tested?

A new test.

Author: Wenchen Fan

Closes #20201 from cloud-fan/partition-reporting.

(cherry picked from commit 51eb750263dd710434ddb60311571fa3dcec66eb)
Signed-off-by: gatorsmile

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/7241556d
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/7241556d
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/7241556d
Branch: refs/heads/branch-2.3
Commit: 7241556d8b550e22eed2341287812ea373dc1cb2
Parents: 566ef93
Author: Wenchen Fan
Authored: Mon Jan 22 15:21:09 2018 -0800
Committer: gatorsmile
Committed: Mon Jan 22 15:21:19 2018 -0800
--
.../catalyst/plans/physical/partitioning.scala | 2 +-
.../v2/reader/ClusteredDistribution.java | 38 +++
.../sql/sources/v2/reader/Distribution.java | 39 +++
.../sql/sources/v2/reader/Partitioning.java | 46 
.../v2/reader/SupportsReportPartitioning.java | 33 ++
.../datasources/v2/DataSourcePartitioning.scala | 56 ++
.../datasources/v2/DataSourceV2ScanExec.scala | 9 ++
.../v2/JavaPartitionAwareDataSource.java | 110 +++
.../sql/sources/v2/DataSourceV2Suite.scala | 79 +
9 files changed, 411 insertions(+), 1 deletion(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/7241556d/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/partitioning.scala
--
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/partitioning.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/partitioning.scala index 0189bd7..4d9a992 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/partitioning.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/partitioning.scala @@ -153,7 +153,7 @@ case class BroadcastDistribution(mode: BroadcastMode) extends Distribution { * 1. number of partitions. * 2. if it can satisfy a given distribution. */ -sealed trait Partitioning { +trait Partitioning { /** Returns the number of partitions that the data is split across */ val numPartitions: Int

http://git-wip-us.apache.org/repos/asf/spark/blob/7241556d/sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/ClusteredDistribution.java
--
diff --git a/sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/ClusteredDistribution.java b/sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/ClusteredDistribution.java new file mode 100644 index 000..7346500 --- /dev/null +++ b/sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/ClusteredDistribution.java @@ -0,0 +1,38 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.sources.v2.reader; + +import org.apache.spark.annotation.InterfaceStability; + +/** + * A concrete implementation of {@link Distribution}. Represents a distribution where records that + * share the same values for the {@link #clusteredColumns} will be produced by the same + * {@link ReadTask}. + */ +@InterfaceStability.Evolving +public class ClusteredDistribution implements Distribution { + + /** + * The names
svn commit: r24366 - in /dev/spark/2.3.1-SNAPSHOT-2018_01_22_14_01-566ef93-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s
Author: pwendell Date: Mon Jan 22 22:14:49 2018 New Revision: 24366 Log: Apache Spark 2.3.1-SNAPSHOT-2018_01_22_14_01-566ef93 docs [This commit notification would consist of 1441 parts, which exceeds the limit of 50 ones, so it was shortened to the summary.] - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
svn commit: r24365 - /dev/spark/KEYS
Author: sameerag Date: Mon Jan 22 21:25:48 2018 New Revision: 24365 Log: Update KEYS Modified: dev/spark/KEYS Modified: dev/spark/KEYS == --- dev/spark/KEYS (original) +++ dev/spark/KEYS Mon Jan 22 21:25:48 2018 @@ -403,40 +403,61 @@ dcqbOYBLINwxIMZA6N9qCGrST4DfqbAzGSvZ08oe =et2/ -END PGP PUBLIC KEY BLOCK- -pub rsa2048/A1CEDBA8AD0C022A 2018-01-11 [SC] - FA757B8D64ABBC21FC02BC1CA1CEDBA8AD0C022A -uid [ultimate] Sameer Agarwal-sub rsa2048/5B0E7FAD797FCBE2 2018-01-11 [E] +pub rsa4096 2018-01-17 [SC] + F2C64242EC1BEC69EA8FBE35DCE4BFD807461E96 +uid [ultimate] Sameer Agarwal (CODE SIGNING KEY) +sub rsa4096 2018-01-17 [E] -BEGIN PGP PUBLIC KEY BLOCK- -mQENBFpX9XgBCADGZb9Jywy7gJuoyzX3+8JA7kPnc6Ah/mTbCemzkq+NkrMQ+eXP -D6IyHH+ktCp8rG0KEZph3BwQ9m/9YpvGpyUjEAl7miWvnYQCoBfhoMdoM+/9R77G -yaUgV1z85n0rI7+EUmstitb1Q1qu6FJgO0r/YOBImEqD0VID+vuDVEmjg9DPX2K/ -fADhKHvQDbR5car8Oh9lXEdxn6oRdQif9spkX26P75Oa7oLbK5s1PQm/z2Wn0q6/ -9tsh+HNCKU4oNTboTXiuNEI4S3ypjb5zsSL2PMmxw+eSV859lBuL/THRN1xe3+3h -EK6Ma3UThtNcHpOHx+YJmiWahic9NHvO58jHABEBAAG0JFNhbWVlciBBZ2Fyd2Fs -IDxzYW1lZXJhZ0BhcGFjaGUub3JnPokBTgQTAQgAOBYhBPp1e41kq7wh/AK8HKHO -26itDAIqBQJaV/V4AhsDBQsJCAcCBhUKCQgLAgQWAgMBAh4BAheAAAoJEKHO26it -DAIqIZYH/AoMHZ27lfK1XfQqEujmz5KSWsSVImgMh/t7F61D9sIvnoiMkrhP9/RG -R/LJA8bIEIBR906Lto4fcuDboUhNYlGpOsJGSTQeEnGpuonNzNpOssFXYfxrGSRe -M062/9GwvOer7MthhLbNYSzah6lYnijHe67a5woL3mLEnJj0a8vc0DH0jxpe0d/8 -f0VVQnWe+oZOiFx/Gp+RLfqtnMQ+FrPlGu7WFDseXd9NtMzEVQpoQoBbJ29nBvAU -4AXjuBZa0dR7cZr4u8C+QMkJOBPEQcyBHYv0/MOT3ggABuLTSdJcGsj7NdCxkSZ2 -NTjjgi+OzLqsdU4srniy8vVDuaIqBhi5AQ0EWlf1eAEIAMk/n66XAoetLEyBHOO7 -wZJNnnCssuGOFh4+xLelOeB4Tx4fKeU9wWGUPaqHbyQJbYxEmVPH0Rq/VTfRYgGl -XuJXgi7f0A/Q0bhxc5A3DRMl5ifnT6Ame9yOUq9BFoH/VG7qO/GVQ7yRrp+cmj5h -kTSMUxYrzvHWzozxj9/P1bE5EGGsDjaHkA9t3RuzzV/mKjwpyCep72IxMbmRMfPM -vD/KaKfNryvyEBmqQpdvJXXremfs3warmvhkYnSpkIeUrRjt32jMO4MHzzC74w+J -/Cn4+0A/YuvFfU0YnjySRNMqpgT2EFA802QI+Mwj2D6fat8oKhnVvBAY+wHal1c2 -m/UAEQEAAYkBNgQYAQgAIBYhBPp1e41kq7wh/AK8HKHO26itDAIqBQJaV/V4AhsM -AAoJEKHO26itDAIqMi4IAJ1dyai2f03R1AgzI+W5enp8989vf5KVxwDPv4tJX87o -sAOSNYmPRXBbj2Hr2N+A+656vx3KkIIozuwuVSDbVDdDnxS6dUqvmA07qtKRXWEO -da8taStwiaetbCJQkLOr1kyrL6XgL+t5E1jMcDmZxF2Owu4NSaEVERtkovY89V4m -Ku0fEiDWr/6SWUcPnyPGpwZKccShDGl8JuwM/uRO5HKLeAJp93poqWeOtnpw1Xpw -RiLNdJXDBol1/+xtV2O3CzX0i4o6Z/hhderuJc/v57LlP/PnOVkGG4/mZA8G/kSC -jUFFi/fz1oSCMpcpdSOAhCs4oRFv2POgXTCLkpOJNSU= -=Oc/a +mQINBFpftRMBEADEsiDSnSg7EBdFoWdRhVrjePjsYyEq4Sxt61vkkwhrH/pZ8r07 +4kVSZV0hdc+7PLa27X400re6OgULDtQ7c3F1hcrcl72VLNo7iE5FcQITSRvXXsf0 +Lb6eHmkUjCrZW8FF5WLdr/XA/aC2YpuXYszCWH3f7It9864M8OjzKznGfR/Q+9kd +jq2l2d1gLhdMnBwOjxMlyDvU3N3wr1bGNf/s7QAltv5V3yNTPvH9I+iy9FbTuseE +vnMo3KnopEivmF0yqz2qlN3joVg7yAcMPWG92lRQzkUAkrQXcPvcsEvu22kipcOQ +SQQMcMQZFQh8E/dLzp4+DA2bRcshHnM5bWG9NZNMnXKRmcJrHmjJDstEN7LR+zwt +cRj9d0RwCFtS7M9YUX4eCc9Dqgtgg31GVNUZdUcZ1/OHqv+NJUOSZipoKJmAfcBN +OyEGhlWOGidd/3xJtK1GUtTd9iLqjcbcxHapeTOS3kNdXbAwuvX1ADkQ+CTYw5cd +jx2CAEKsBCz1r++/sApRPLIWSRBaGoF2HgGv89/33R66EVSmNhGkS3g6W6ICqrdY +vwhK92NJpapQFwhzk4U3ZrcRwXXktv7PlMFywuSXNbOT7XwkrGOUYqzzi7esV4uF +TDllNmwuVG7q3K7cvGDn69mbgYH8vULzEfuZQYhT9zYPaRePKaILqWLf6wARAQAB +tDdTYW1lZXIgQWdhcndhbCAoQ09ERSBTSUdOSU5HIEtFWSkgPHNhbWVlcmFnQGFw +YWNoZS5vcmc+iQJOBBMBCAA4FiEE8sZCQuwb7Gnqj7413OS/2AdGHpYFAlpftRMC +GwMFCwkIBwIGFQoJCAsCBBYCAwECHgECF4AACgkQ3OS/2AdGHpYqtg/+IrcrH66c +8A6+LurGr0ZDxQzI3Ka016UOkruLGI4oitqyzgJ/j6quGTxLNEcBToeh8IUqQDN0 +VriV9iPntIUarf9b6Yx6aCxSvBwls9k9PMZqWVu0oIAecWGvvniGooxJlrelpp0M +PJaEPHswH80d8rBDGjktBOrQIq8bak7jLomsFK1zGH6pPkAL9GYo4XK2Ik5OiRs3 +H8bJA/FS4sx17GR0IBWumBvYXtHvAmvfwIEeGtcE+cPj/S438N+fwuXI82c6EGIH 
+ubFM7uqylbZMlmDgdKkG6YmEQMqK0Ka84iLzUOzqFyOj/aTrKj9GKLc8bBVLU1DP +/PfMmJQDiETJGwwcKhRm9tYYH1DiMhWp5j1jyhOKIEKGUVJ8IxgpAkFURyOQaA4e +5rnPoC65Pp1JzTKXWqmjDm7MRgcP77WqWis7SDgMq56/tdCbjZ2WzyfBQCUlfKU3 +7Iax5qKtdoczZRYhdZGzT8d2pMvQVu9zGuwhiPU/nwFybY1haneZhWpXTKbJkNpc +Gzi2gE7pqXasjA+fn40tuMa4WZlrlvNhTONatcfVuNv1hGS/G+UJjhJzOo40AX2w +2TCmaj4jiwiqByc4QZKM/iGfVCN6GlOI3+1O1KzybqoQG2Tg/ug5unmAvc23ZYw7 +uu+BnBSTsCODqQG8fPRiDlYRdZtDyQQC8M25Ag0EWl+1EwEQAJ82cuI/R4StkgBX +zn7loZmSRZUx08EgsB8vq0s1h8g/pLdBN1h22sj9dnfcW4tFUxIKiwpLK84/Rlj7 +o2W8ZynpaKzR6pelV6Cb3+SMgtWe6DQnKaBRKJ3hzdcdA7Fp6aIjuzMsakOEOx3V +wmtHkCn5MgN/xQBAB3T65thTOFryYqcmEoKWkd5FegJwG4sjHCCARPjgv8ucY/Vs +6lZ0cxOB6qMO0jxH+FSMCZ4xmy7gpvQSs7D0/aj73kJ0Xv1sPZYxacf+P9MnF8jr +mI7jKODvtKNbffRzIK/c2YCcYHvb0PtkLN8hhsmtXcmm4ezQwqA1QZWJhtI7oiCX +A7AYrDKqsLPY4sgzeIzVmz35P/Y0baFp6Qt2eiHQ58I3Eu2+PG6x897So5j6obKi +FEfprFKOewjefPmt+yNxhXITXUAuw57uXR7PeIcIb6bynZjyUcK+Rr8+vfI1JPaS +ZVFaUn6KNFueK/bxDo4dzHMdj4gF9kGE+hPNRGepO7ba90QeaZSA6Bk3EUhovu8H +eMmN/ZsdgMwIHOO3JZ9aWV7wkak7df6qbNVGDhp/QycBAm6J/iG2xYfncYp9nyw8 +UAkrht5EMAdG14Qm3Vq9GGihUsthl2ehPeD37d2/pitTMfnf2Ac6TieHbye0JgL0 +wC3WvL7cLXGmvtIRfXzNd4oDmjGtABEBAAGJAjYEGAEIACAWIQTyxkJC7BvsaeqP +vjXc5L/YB0YelgUCWl+1EwIbDAAKCRDc5L/YB0YelrVgEACjcrAN9bY+Kv8eNcn0
svn commit: r24364 - in /dev/spark/v2.3.0-rc2-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _site/api/java/org/apache/spark
Author: sameerag Date: Mon Jan 22 20:30:45 2018 New Revision: 24364 Log: Apache Spark v2.3.0-rc2 docs [This commit notification would consist of 1444 parts, which exceeds the limit of 50 ones, so it was shortened to the summary.] - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
svn commit: r24363 - in /dev/spark/2.4.0-SNAPSHOT-2018_01_22_12_01-76b8b84-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s
Author: pwendell Date: Mon Jan 22 20:17:30 2018 New Revision: 24363 Log: Apache Spark 2.4.0-SNAPSHOT-2018_01_22_12_01-76b8b84 docs [This commit notification would consist of 1441 parts, which exceeds the limit of 50 ones, so it was shortened to the summary.] - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [MINOR] Typo fixes
Repository: spark
Updated Branches: refs/heads/master 446948af1 -> 76b8b840d

[MINOR] Typo fixes

## What changes were proposed in this pull request?

Typo fixes

## How was this patch tested?

Local build / Doc-only changes

Author: Jacek Laskowski

Closes #20344 from jaceklaskowski/typo-fixes.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/76b8b840
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/76b8b840
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/76b8b840
Branch: refs/heads/master
Commit: 76b8b840ddc951ee6203f9cccd2c2b9671c1b5e8
Parents: 446948a
Author: Jacek Laskowski
Authored: Mon Jan 22 13:55:14 2018 -0600
Committer: Sean Owen
Committed: Mon Jan 22 13:55:14 2018 -0600
--
core/src/main/scala/org/apache/spark/SparkContext.scala | 2 +-
.../apache/spark/sql/kafka010/KafkaSourceProvider.scala | 4 ++--
.../org/apache/spark/sql/kafka010/KafkaWriteTask.scala | 2 +-
.../java/org/apache/spark/sql/streaming/OutputMode.java | 2 +-
.../apache/spark/sql/catalyst/analysis/Analyzer.scala | 8 
.../apache/spark/sql/catalyst/analysis/unresolved.scala | 2 +-
.../sql/catalyst/expressions/aggregate/interfaces.scala | 12 +---
.../sql/catalyst/plans/logical/LogicalPlanVisitor.scala | 2 +-
.../logical/statsEstimation/BasicStatsPlanVisitor.scala | 2 +-
.../SizeInBytesOnlyStatsPlanVisitor.scala | 4 ++--
.../scala/org/apache/spark/sql/internal/SQLConf.scala | 2 +-
.../org/apache/spark/sql/catalyst/plans/PlanTest.scala | 2 +-
.../scala/org/apache/spark/sql/DataFrameWriter.scala | 2 +-
.../org/apache/spark/sql/execution/SparkSqlParser.scala | 2 +-
.../spark/sql/execution/WholeStageCodegenExec.scala | 2 +-
.../apache/spark/sql/execution/command/SetCommand.scala | 4 ++--
.../apache/spark/sql/execution/datasources/rules.scala | 2 +-
.../spark/sql/execution/streaming/HDFSMetadataLog.scala | 2 +-
.../spark/sql/execution/streaming/OffsetSeq.scala | 2 +-
.../spark/sql/execution/streaming/OffsetSeqLog.scala | 2 +-
.../sql/execution/streaming/StreamingQueryWrapper.scala | 2 +-
.../sql/execution/streaming/state/StateStore.scala | 2 +-
.../apache/spark/sql/execution/ui/ExecutionPage.scala | 2 +-
.../spark/sql/expressions/UserDefinedFunction.scala | 4 ++--
.../spark/sql/internal/BaseSessionStateBuilder.scala | 4 ++--
.../apache/spark/sql/streaming/DataStreamReader.scala | 6 +++---
.../sql-tests/results/columnresolution-negative.sql.out | 2 +-
.../sql-tests/results/columnresolution-views.sql.out | 2 +-
.../sql-tests/results/columnresolution.sql.out | 6 +++---
.../test/scala/org/apache/spark/sql/SQLQuerySuite.scala | 4 ++--
.../org/apache/spark/sql/execution/SQLViewSuite.scala | 2 +-
.../org/apache/spark/sql/hive/HiveExternalCatalog.scala | 4 ++--
32 files changed, 50 insertions(+), 52 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/76b8b840/core/src/main/scala/org/apache/spark/SparkContext.scala
--
diff --git a/core/src/main/scala/org/apache/spark/SparkContext.scala b/core/src/main/scala/org/apache/spark/SparkContext.scala index 31f3cb9..3828d4f 100644 --- a/core/src/main/scala/org/apache/spark/SparkContext.scala +++ b/core/src/main/scala/org/apache/spark/SparkContext.scala @@ -2276,7 +2276,7 @@ class SparkContext(config: SparkConf) extends Logging { } /** - * Clean a closure to make it ready to be serialized and send to tasks + * Clean a closure to make it ready to be serialized and sent to tasks * (removes unreferenced variables in $outer's, updates REPL variables) * If checkSerializable is set, clean will also proactively *
check to see if f is serializable and throw a SparkException http://git-wip-us.apache.org/repos/asf/spark/blob/76b8b840/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceProvider.scala -- diff --git a/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceProvider.scala b/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceProvider.scala index 3914370..62a998f 100644 --- a/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceProvider.scala +++ b/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceProvider.scala @@ -307,7 +307,7 @@ private[kafka010] class KafkaSourceProvider extends DataSourceRegister if (caseInsensitiveParams.contains(s"kafka.${ConsumerConfig.GROUP_ID_CONFIG}")) { throw new IllegalArgumentException( s"Kafka option
spark git commit: [MINOR] Typo fixes
Repository: spark
Updated Branches: refs/heads/branch-2.3 6facc7fb2 -> 566ef93a6

[MINOR] Typo fixes

## What changes were proposed in this pull request?

Typo fixes

## How was this patch tested?

Local build / Doc-only changes

Author: Jacek Laskowski

Closes #20344 from jaceklaskowski/typo-fixes.

(cherry picked from commit 76b8b840ddc951ee6203f9cccd2c2b9671c1b5e8)
Signed-off-by: Sean Owen

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/566ef93a
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/566ef93a
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/566ef93a
Branch: refs/heads/branch-2.3
Commit: 566ef93a672aea1803d6977883204780c2f6982d
Parents: 6facc7f
Author: Jacek Laskowski
Authored: Mon Jan 22 13:55:14 2018 -0600
Committer: Sean Owen
Committed: Mon Jan 22 13:55:22 2018 -0600
--
core/src/main/scala/org/apache/spark/SparkContext.scala | 2 +-
.../apache/spark/sql/kafka010/KafkaSourceProvider.scala | 4 ++--
.../org/apache/spark/sql/kafka010/KafkaWriteTask.scala | 2 +-
.../java/org/apache/spark/sql/streaming/OutputMode.java | 2 +-
.../apache/spark/sql/catalyst/analysis/Analyzer.scala | 8 
.../apache/spark/sql/catalyst/analysis/unresolved.scala | 2 +-
.../sql/catalyst/expressions/aggregate/interfaces.scala | 12 +---
.../sql/catalyst/plans/logical/LogicalPlanVisitor.scala | 2 +-
.../logical/statsEstimation/BasicStatsPlanVisitor.scala | 2 +-
.../SizeInBytesOnlyStatsPlanVisitor.scala | 4 ++--
.../scala/org/apache/spark/sql/internal/SQLConf.scala | 2 +-
.../org/apache/spark/sql/catalyst/plans/PlanTest.scala | 2 +-
.../scala/org/apache/spark/sql/DataFrameWriter.scala | 2 +-
.../org/apache/spark/sql/execution/SparkSqlParser.scala | 2 +-
.../spark/sql/execution/WholeStageCodegenExec.scala | 2 +-
.../apache/spark/sql/execution/command/SetCommand.scala | 4 ++--
.../apache/spark/sql/execution/datasources/rules.scala | 2 +-
.../spark/sql/execution/streaming/HDFSMetadataLog.scala | 2 +-
.../spark/sql/execution/streaming/OffsetSeq.scala | 2 +-
.../spark/sql/execution/streaming/OffsetSeqLog.scala | 2 +-
.../sql/execution/streaming/StreamingQueryWrapper.scala | 2 +-
.../sql/execution/streaming/state/StateStore.scala | 2 +-
.../apache/spark/sql/execution/ui/ExecutionPage.scala | 2 +-
.../spark/sql/expressions/UserDefinedFunction.scala | 4 ++--
.../spark/sql/internal/BaseSessionStateBuilder.scala | 4 ++--
.../apache/spark/sql/streaming/DataStreamReader.scala | 6 +++---
.../sql-tests/results/columnresolution-negative.sql.out | 2 +-
.../sql-tests/results/columnresolution-views.sql.out | 2 +-
.../sql-tests/results/columnresolution.sql.out | 6 +++---
.../test/scala/org/apache/spark/sql/SQLQuerySuite.scala | 4 ++--
.../org/apache/spark/sql/execution/SQLViewSuite.scala | 2 +-
.../org/apache/spark/sql/hive/HiveExternalCatalog.scala | 4 ++--
32 files changed, 50 insertions(+), 52 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/566ef93a/core/src/main/scala/org/apache/spark/SparkContext.scala
--
diff --git a/core/src/main/scala/org/apache/spark/SparkContext.scala b/core/src/main/scala/org/apache/spark/SparkContext.scala index 31f3cb9..3828d4f 100644 --- a/core/src/main/scala/org/apache/spark/SparkContext.scala +++ b/core/src/main/scala/org/apache/spark/SparkContext.scala @@ -2276,7 +2276,7 @@ class SparkContext(config: SparkConf) extends Logging { } /** - * Clean a closure to make it ready to be serialized and send to tasks + * Clean a closure to make it ready to be serialized and sent to tasks * (removes unreferenced variables
in $outer's, updates REPL variables) * If checkSerializable is set, clean will also proactively * check to see if f is serializable and throw a SparkException http://git-wip-us.apache.org/repos/asf/spark/blob/566ef93a/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceProvider.scala -- diff --git a/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceProvider.scala b/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceProvider.scala index 3914370..62a998f 100644 --- a/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceProvider.scala +++ b/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceProvider.scala @@ -307,7 +307,7 @@ private[kafka010] class KafkaSourceProvider extends DataSourceRegister if
svn commit: r24362 - /dev/spark/v2.3.0-rc2-bin/
Author: sameerag Date: Mon Jan 22 19:45:22 2018 New Revision: 24362 Log: Apache Spark v2.3.0-rc2 Added: dev/spark/v2.3.0-rc2-bin/ dev/spark/v2.3.0-rc2-bin/SparkR_2.3.0.tar.gz (with props) dev/spark/v2.3.0-rc2-bin/SparkR_2.3.0.tar.gz.asc dev/spark/v2.3.0-rc2-bin/SparkR_2.3.0.tar.gz.md5 dev/spark/v2.3.0-rc2-bin/SparkR_2.3.0.tar.gz.sha512 dev/spark/v2.3.0-rc2-bin/pyspark-2.3.0.tar.gz (with props) dev/spark/v2.3.0-rc2-bin/pyspark-2.3.0.tar.gz.asc dev/spark/v2.3.0-rc2-bin/pyspark-2.3.0.tar.gz.md5 dev/spark/v2.3.0-rc2-bin/pyspark-2.3.0.tar.gz.sha512 dev/spark/v2.3.0-rc2-bin/spark-2.3.0-bin-hadoop2.6.tgz (with props) dev/spark/v2.3.0-rc2-bin/spark-2.3.0-bin-hadoop2.6.tgz.asc dev/spark/v2.3.0-rc2-bin/spark-2.3.0-bin-hadoop2.6.tgz.md5 dev/spark/v2.3.0-rc2-bin/spark-2.3.0-bin-hadoop2.6.tgz.sha512 dev/spark/v2.3.0-rc2-bin/spark-2.3.0-bin-hadoop2.7.tgz (with props) dev/spark/v2.3.0-rc2-bin/spark-2.3.0-bin-hadoop2.7.tgz.asc dev/spark/v2.3.0-rc2-bin/spark-2.3.0-bin-hadoop2.7.tgz.md5 dev/spark/v2.3.0-rc2-bin/spark-2.3.0-bin-hadoop2.7.tgz.sha512 dev/spark/v2.3.0-rc2-bin/spark-2.3.0-bin-without-hadoop.tgz (with props) dev/spark/v2.3.0-rc2-bin/spark-2.3.0-bin-without-hadoop.tgz.asc dev/spark/v2.3.0-rc2-bin/spark-2.3.0-bin-without-hadoop.tgz.md5 dev/spark/v2.3.0-rc2-bin/spark-2.3.0-bin-without-hadoop.tgz.sha512 dev/spark/v2.3.0-rc2-bin/spark-2.3.0.tgz (with props) dev/spark/v2.3.0-rc2-bin/spark-2.3.0.tgz.asc dev/spark/v2.3.0-rc2-bin/spark-2.3.0.tgz.md5 dev/spark/v2.3.0-rc2-bin/spark-2.3.0.tgz.sha512 dev/spark/v2.3.0-rc2-bin/spark-parent_2.11.iml Added: dev/spark/v2.3.0-rc2-bin/SparkR_2.3.0.tar.gz == Binary file - no diff available. Propchange: dev/spark/v2.3.0-rc2-bin/SparkR_2.3.0.tar.gz -- svn:mime-type = application/octet-stream Added: dev/spark/v2.3.0-rc2-bin/SparkR_2.3.0.tar.gz.asc == --- dev/spark/v2.3.0-rc2-bin/SparkR_2.3.0.tar.gz.asc (added) +++ dev/spark/v2.3.0-rc2-bin/SparkR_2.3.0.tar.gz.asc Mon Jan 22 19:45:22 2018 @@ -0,0 +1,16 @@ +-BEGIN PGP SIGNATURE- + +iQIzBAABCAAdFiEE8sZCQuwb7Gnqj7413OS/2AdGHpYFAlpmPsoACgkQ3OS/2AdG +Hpb5gg/+P0jEiAZi7FRqfRiVW2O2qBe/Oj24CgwM3wbdxD9OMaywQkWmzAMaFSBJ +Pqkam/lxL3oy1GE+bQI8gMkfZIwneJK6fJwyCo5zqqLwZO+eDCDc1BWqEYn2sAvR +xVdOFE5RZ3qahOjH1JPnIsrUQT3aWfVBMMWTJLm+cEUhQ4yTmiABH2nqlqiFdRM4 +Cvw6r7wRo/bvPhnyc9Ly+Cu0UnBZFdV/qHdNqaJD/CoJPpuPEyuEv4Y0QN42MgC4 +RUY3YwaRerBS3wxEbO+zUVgnWZR7KlBQZVy40YjzLRhIjgo4KkiqX6hWIaPL+TlU +mTRWFvIQEZh/b7gZkCitLoO/t2iHvf2TvJqXFeWpieCDgXghmWdSVdg5UYREcxcY +gY86E8qfyPxnKquJHlBu/qExESjEzrvfaPgZcY9aQFrLaS9zBzRIr51Evz6dBiD5 +0UcgiQW98cZgDJqgwMqfTNosYB9GEEWlB7llLROy/iWZ9JEpZYNYk52JQieW7gWM +kUodYkoTOuquBE93TZiFRXEr9Er+ACofESh7kdm+MgPvFlLSYdCeaknf8+JB2Q+M +aASarUslmgOehCGU5cqRgBXEdvm7PDuLyzNfYOT6onmbMCm6QU/wygCy3DQTR+cp +75kTNlVqAISMQCC7S/3+8DSZhZffugnqnb6mmxa4uOqSsljczws= +=Is9J +-END PGP SIGNATURE- Added: dev/spark/v2.3.0-rc2-bin/SparkR_2.3.0.tar.gz.md5 == --- dev/spark/v2.3.0-rc2-bin/SparkR_2.3.0.tar.gz.md5 (added) +++ dev/spark/v2.3.0-rc2-bin/SparkR_2.3.0.tar.gz.md5 Mon Jan 22 19:45:22 2018 @@ -0,0 +1 @@ +SparkR_2.3.0.tar.gz: 58 7E C4 A4 7E 60 B1 AC F1 FB 81 96 F7 7E BD A0 Added: dev/spark/v2.3.0-rc2-bin/SparkR_2.3.0.tar.gz.sha512 == --- dev/spark/v2.3.0-rc2-bin/SparkR_2.3.0.tar.gz.sha512 (added) +++ dev/spark/v2.3.0-rc2-bin/SparkR_2.3.0.tar.gz.sha512 Mon Jan 22 19:45:22 2018 @@ -0,0 +1,3 @@ +SparkR_2.3.0.tar.gz: 86A461C9 84324BB0 DC525774 2D4CCCB8 F0F16495 3C147E25 + 3040DBE3 D2FFBE31 C1596FEB C1905139 92AAF623 C296E3DD + 7599955F DFC55EE1 BCF5691A 6FB02759 Added: dev/spark/v2.3.0-rc2-bin/pyspark-2.3.0.tar.gz == Binary file - no diff available. 
Propchange: dev/spark/v2.3.0-rc2-bin/pyspark-2.3.0.tar.gz -- svn:mime-type = application/octet-stream Added: dev/spark/v2.3.0-rc2-bin/pyspark-2.3.0.tar.gz.asc == --- dev/spark/v2.3.0-rc2-bin/pyspark-2.3.0.tar.gz.asc (added) +++ dev/spark/v2.3.0-rc2-bin/pyspark-2.3.0.tar.gz.asc Mon Jan 22 19:45:22 2018 @@ -0,0 +1,16 @@ +-BEGIN PGP SIGNATURE- + +iQIzBAABCAAdFiEE8sZCQuwb7Gnqj7413OS/2AdGHpYFAlpmPkkACgkQ3OS/2AdG +HpbGZBAAjfAgbQuI1ye/5BBDT5Zd65kT78FD4/E6l6Idu0r4DRVywrUyjp90Vc+3 ++g9/cLDF5faWq23KyWSYpkO9rOL96sx0z65KV+spdaSRwNk7z4NOfyvzHyxzHSoy +723l9coFwG5zD96PzmI2mTfOSrfrXyKs1nn/j8QBSDhkGxNhCEGMhUKYgYICJ34Q
[1/2] spark git commit: Preparing Spark release v2.3.0-rc2
Repository: spark
Updated Branches: refs/heads/branch-2.3 4e75b0cb4 -> 6facc7fb2

Preparing Spark release v2.3.0-rc2

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/489ecb0e
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/489ecb0e
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/489ecb0e
Branch: refs/heads/branch-2.3
Commit: 489ecb0ef23e5d9b705e5e5bae4fa3d871bdac91
Parents: 4e75b0c
Author: Sameer Agarwal
Authored: Mon Jan 22 10:49:08 2018 -0800
Committer: Sameer Agarwal
Committed: Mon Jan 22 10:49:08 2018 -0800
--
R/pkg/DESCRIPTION | 2 +-
assembly/pom.xml | 2 +-
common/kvstore/pom.xml | 2 +-
common/network-common/pom.xml | 2 +-
common/network-shuffle/pom.xml | 2 +-
common/network-yarn/pom.xml | 2 +-
common/sketch/pom.xml | 2 +-
common/tags/pom.xml | 2 +-
common/unsafe/pom.xml | 2 +-
core/pom.xml | 2 +-
docs/_config.yml | 4 ++--
examples/pom.xml | 2 +-
external/docker-integration-tests/pom.xml | 2 +-
external/flume-assembly/pom.xml | 2 +-
external/flume-sink/pom.xml | 2 +-
external/flume/pom.xml | 2 +-
external/kafka-0-10-assembly/pom.xml | 2 +-
external/kafka-0-10-sql/pom.xml | 2 +-
external/kafka-0-10/pom.xml | 2 +-
external/kafka-0-8-assembly/pom.xml | 2 +-
external/kafka-0-8/pom.xml | 2 +-
external/kinesis-asl-assembly/pom.xml | 2 +-
external/kinesis-asl/pom.xml | 2 +-
external/spark-ganglia-lgpl/pom.xml | 2 +-
graphx/pom.xml | 2 +-
hadoop-cloud/pom.xml | 2 +-
launcher/pom.xml | 2 +-
mllib-local/pom.xml | 2 +-
mllib/pom.xml | 2 +-
pom.xml | 2 +-
python/pyspark/version.py | 2 +-
repl/pom.xml | 2 +-
resource-managers/kubernetes/core/pom.xml | 2 +-
resource-managers/mesos/pom.xml | 2 +-
resource-managers/yarn/pom.xml | 2 +-
sql/catalyst/pom.xml | 2 +-
sql/core/pom.xml | 2 +-
sql/hive-thriftserver/pom.xml | 2 +-
sql/hive/pom.xml | 2 +-
streaming/pom.xml | 2 +-
tools/pom.xml | 2 +-
41 files changed, 42 insertions(+), 42 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/489ecb0e/R/pkg/DESCRIPTION
--
diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION index 29a8a00..6d46c31 100644 --- a/R/pkg/DESCRIPTION +++ b/R/pkg/DESCRIPTION @@ -1,6 +1,6 @@ Package: SparkR Type: Package -Version: 2.3.1 +Version: 2.3.0 Title: R Frontend for Apache Spark Description: Provides an R Frontend for Apache Spark.
Authors@R: c(person("Shivaram", "Venkataraman", role = c("aut", "cre"), http://git-wip-us.apache.org/repos/asf/spark/blob/489ecb0e/assembly/pom.xml -- diff --git a/assembly/pom.xml b/assembly/pom.xml index 5c5a8e9..2ca9ab6 100644 --- a/assembly/pom.xml +++ b/assembly/pom.xml @@ -21,7 +21,7 @@ org.apache.spark spark-parent_2.11 -2.3.1-SNAPSHOT +2.3.0 ../pom.xml http://git-wip-us.apache.org/repos/asf/spark/blob/489ecb0e/common/kvstore/pom.xml -- diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml index 2a625da..404c744 100644 --- a/common/kvstore/pom.xml +++ b/common/kvstore/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.11 -2.3.1-SNAPSHOT +2.3.0 ../../pom.xml http://git-wip-us.apache.org/repos/asf/spark/blob/489ecb0e/common/network-common/pom.xml -- diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml index adb1890..3c0b528 100644 --- a/common/network-common/pom.xml +++ b/common/network-common/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.11 -2.3.1-SNAPSHOT +2.3.0 ../../pom.xml http://git-wip-us.apache.org/repos/asf/spark/blob/489ecb0e/common/network-shuffle/pom.xml -- diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml index 4cdcfa2..fe3bcfd 100644 ---
[spark] Git Push Summary
Repository: spark Updated Tags: refs/tags/v2.3.0-rc2 [created] 489ecb0ef - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[2/2] spark git commit: Preparing development version 2.3.1-SNAPSHOT
Preparing development version 2.3.1-SNAPSHOT

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/6facc7fb
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/6facc7fb
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/6facc7fb
Branch: refs/heads/branch-2.3
Commit: 6facc7fb2333cc61409149e2f896bf84dd085fa3
Parents: 489ecb0
Author: Sameer Agarwal
Authored: Mon Jan 22 10:49:29 2018 -0800
Committer: Sameer Agarwal
Committed: Mon Jan 22 10:49:29 2018 -0800
--
R/pkg/DESCRIPTION | 2 +-
assembly/pom.xml | 2 +-
common/kvstore/pom.xml | 2 +-
common/network-common/pom.xml | 2 +-
common/network-shuffle/pom.xml | 2 +-
common/network-yarn/pom.xml | 2 +-
common/sketch/pom.xml | 2 +-
common/tags/pom.xml | 2 +-
common/unsafe/pom.xml | 2 +-
core/pom.xml | 2 +-
docs/_config.yml | 4 ++--
examples/pom.xml | 2 +-
external/docker-integration-tests/pom.xml | 2 +-
external/flume-assembly/pom.xml | 2 +-
external/flume-sink/pom.xml | 2 +-
external/flume/pom.xml | 2 +-
external/kafka-0-10-assembly/pom.xml | 2 +-
external/kafka-0-10-sql/pom.xml | 2 +-
external/kafka-0-10/pom.xml | 2 +-
external/kafka-0-8-assembly/pom.xml | 2 +-
external/kafka-0-8/pom.xml | 2 +-
external/kinesis-asl-assembly/pom.xml | 2 +-
external/kinesis-asl/pom.xml | 2 +-
external/spark-ganglia-lgpl/pom.xml | 2 +-
graphx/pom.xml | 2 +-
hadoop-cloud/pom.xml | 2 +-
launcher/pom.xml | 2 +-
mllib-local/pom.xml | 2 +-
mllib/pom.xml | 2 +-
pom.xml | 2 +-
python/pyspark/version.py | 2 +-
repl/pom.xml | 2 +-
resource-managers/kubernetes/core/pom.xml | 2 +-
resource-managers/mesos/pom.xml | 2 +-
resource-managers/yarn/pom.xml | 2 +-
sql/catalyst/pom.xml | 2 +-
sql/core/pom.xml | 2 +-
sql/hive-thriftserver/pom.xml | 2 +-
sql/hive/pom.xml | 2 +-
streaming/pom.xml | 2 +-
tools/pom.xml | 2 +-
41 files changed, 42 insertions(+), 42 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/6facc7fb/R/pkg/DESCRIPTION
--
diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION index 6d46c31..29a8a00 100644 --- a/R/pkg/DESCRIPTION +++ b/R/pkg/DESCRIPTION @@ -1,6 +1,6 @@ Package: SparkR Type: Package -Version: 2.3.0 +Version: 2.3.1 Title: R Frontend for Apache Spark Description: Provides an R Frontend for Apache Spark.
Authors@R: c(person("Shivaram", "Venkataraman", role = c("aut", "cre"), http://git-wip-us.apache.org/repos/asf/spark/blob/6facc7fb/assembly/pom.xml -- diff --git a/assembly/pom.xml b/assembly/pom.xml index 2ca9ab6..5c5a8e9 100644 --- a/assembly/pom.xml +++ b/assembly/pom.xml @@ -21,7 +21,7 @@ org.apache.spark spark-parent_2.11 -2.3.0 +2.3.1-SNAPSHOT ../pom.xml http://git-wip-us.apache.org/repos/asf/spark/blob/6facc7fb/common/kvstore/pom.xml -- diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml index 404c744..2a625da 100644 --- a/common/kvstore/pom.xml +++ b/common/kvstore/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.11 -2.3.0 +2.3.1-SNAPSHOT ../../pom.xml http://git-wip-us.apache.org/repos/asf/spark/blob/6facc7fb/common/network-common/pom.xml -- diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml index 3c0b528..adb1890 100644 --- a/common/network-common/pom.xml +++ b/common/network-common/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.11 -2.3.0 +2.3.1-SNAPSHOT ../../pom.xml http://git-wip-us.apache.org/repos/asf/spark/blob/6facc7fb/common/network-shuffle/pom.xml -- diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml index fe3bcfd..4cdcfa2 100644 --- a/common/network-shuffle/pom.xml +++ b/common/network-shuffle/pom.xml @@ -22,7 +22,7 @@
spark git commit: [SPARK-23121][CORE] Fix for ui becoming unaccessible for long running streaming apps
Repository: spark
Updated Branches: refs/heads/master 4327ccf28 -> 446948af1

[SPARK-23121][CORE] Fix for ui becoming unaccessible for long running streaming apps

## What changes were proposed in this pull request?

The allJobs and the job pages attempt to use stage attempt and DAG visualization from the store, but for long running jobs they are not guaranteed to be retained, leading to exceptions when these pages are rendered. To fix it, `store.lastStageAttempt(stageId)` and `store.operationGraphForJob(jobId)` are wrapped in `store.asOption` and default values are used if the info is missing.

## How was this patch tested?

Manual testing of the UI, also using the test command reported in SPARK-23121:

./bin/spark-submit --class org.apache.spark.examples.streaming.HdfsWordCount ./examples/jars/spark-examples_2.11-2.4.0-SNAPSHOT.jar /spark

Closes #20287

Author: Sandor Murakozi

Closes #20330 from smurakozi/SPARK-23121.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/446948af
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/446948af
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/446948af
Branch: refs/heads/master
Commit: 446948af1d8dbc080a26a6eec6f743d338f1d12b
Parents: 4327ccf
Author: Sandor Murakozi
Authored: Mon Jan 22 10:36:28 2018 -0800
Committer: Marcelo Vanzin
Committed: Mon Jan 22 10:36:28 2018 -0800
--
.../org/apache/spark/ui/jobs/AllJobsPage.scala | 24 +++-
.../org/apache/spark/ui/jobs/JobPage.scala | 10 ++--
.../org/apache/spark/ui/jobs/StagePage.scala | 9 +---
3 files changed, 27 insertions(+), 16 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/446948af/core/src/main/scala/org/apache/spark/ui/jobs/AllJobsPage.scala
--
diff --git a/core/src/main/scala/org/apache/spark/ui/jobs/AllJobsPage.scala b/core/src/main/scala/org/apache/spark/ui/jobs/AllJobsPage.scala index e3b72f1..2b0f4ac 100644 --- a/core/src/main/scala/org/apache/spark/ui/jobs/AllJobsPage.scala +++ b/core/src/main/scala/org/apache/spark/ui/jobs/AllJobsPage.scala @@ -36,6 +36,9 @@ import org.apache.spark.util.Utils /** Page showing list of all ongoing and recently finished jobs */ private[ui] class AllJobsPage(parent: JobsTab, store: AppStatusStore) extends WebUIPage("") { + + import ApiHelper._ + private val JOBS_LEGEND = val jobId = job.jobId val status = job.status - val jobDescription = store.lastStageAttempt(job.stageIds.max).description - val displayJobDescription = jobDescription -.map(UIUtils.makeDescription(_, "", plainText = true).text) -.getOrElse("") + val (_, lastStageDescription) = lastStageNameAndDescription(store, job) + val jobDescription = UIUtils.makeDescription(lastStageDescription, "", plainText = true).text + val submissionTime = job.submissionTime.get.getTime() val completionTime = job.completionTime.map(_.getTime()).getOrElse(System.currentTimeMillis()) val classNameByStatus = status match { @@ -80,7 +82,7 @@ private[ui] class AllJobsPage(parent: JobsTab, store: AppStatusStore) extends We // The timeline library treats contents as HTML, so we have to escape them. We need to add // extra layers of escaping in order to embed this in a Javascript string literal.
- val escapedDesc = Utility.escape(displayJobDescription) + val escapedDesc = Utility.escape(jobDescription) val jsEscapedDesc = StringEscapeUtils.escapeEcmaScript(escapedDesc) val jobEventJsonAsStr = s""" @@ -430,6 +432,8 @@ private[ui] class JobDataSource( sortColumn: String, desc: Boolean) extends PagedDataSource[JobTableRowData](pageSize) { + import ApiHelper._ + // Convert JobUIData to JobTableRowData which contains the final contents to show in the table // so that we can avoid creating duplicate contents during sorting the data private val data = jobs.map(jobRow).sorted(ordering(sortColumn, desc)) @@ -454,23 +458,21 @@ private[ui] class JobDataSource( val formattedDuration = duration.map(d => UIUtils.formatDuration(d)).getOrElse("Unknown") val submissionTime = jobData.submissionTime val formattedSubmissionTime = submissionTime.map(UIUtils.formatDate).getOrElse("Unknown") -val lastStageAttempt = store.lastStageAttempt(jobData.stageIds.max) -val lastStageDescription = lastStageAttempt.description.getOrElse("") +val (lastStageName, lastStageDescription) = lastStageNameAndDescription(store, jobData) -val formattedJobDescription = - UIUtils.makeDescription(lastStageDescription, basePath, plainText = false) +val
spark git commit: [SPARK-23121][CORE] Fix for ui becoming unaccessible for long running streaming apps
Repository: spark
Updated Branches: refs/heads/branch-2.3 d963ba031 -> 4e75b0cb4

[SPARK-23121][CORE] Fix for ui becoming unaccessible for long running streaming apps

## What changes were proposed in this pull request?

The allJobs and the job pages attempt to use stage attempt and DAG visualization from the store, but for long running jobs they are not guaranteed to be retained, leading to exceptions when these pages are rendered. To fix it, `store.lastStageAttempt(stageId)` and `store.operationGraphForJob(jobId)` are wrapped in `store.asOption` and default values are used if the info is missing.

## How was this patch tested?

Manual testing of the UI, also using the test command reported in SPARK-23121:

./bin/spark-submit --class org.apache.spark.examples.streaming.HdfsWordCount ./examples/jars/spark-examples_2.11-2.4.0-SNAPSHOT.jar /spark

Closes #20287

Author: Sandor Murakozi

Closes #20330 from smurakozi/SPARK-23121.

(cherry picked from commit 446948af1d8dbc080a26a6eec6f743d338f1d12b)
Signed-off-by: Marcelo Vanzin

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/4e75b0cb
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/4e75b0cb
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/4e75b0cb
Branch: refs/heads/branch-2.3
Commit: 4e75b0cb4b575d4799c02455eed286fe971c6c50
Parents: d963ba0
Author: Sandor Murakozi
Authored: Mon Jan 22 10:36:28 2018 -0800
Committer: Marcelo Vanzin
Committed: Mon Jan 22 10:36:39 2018 -0800
--
.../org/apache/spark/ui/jobs/AllJobsPage.scala | 24 +++-
.../org/apache/spark/ui/jobs/JobPage.scala | 10 ++--
.../org/apache/spark/ui/jobs/StagePage.scala | 9 +---
3 files changed, 27 insertions(+), 16 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/4e75b0cb/core/src/main/scala/org/apache/spark/ui/jobs/AllJobsPage.scala
--
diff --git a/core/src/main/scala/org/apache/spark/ui/jobs/AllJobsPage.scala b/core/src/main/scala/org/apache/spark/ui/jobs/AllJobsPage.scala index ff916bb..c2668a7 100644 --- a/core/src/main/scala/org/apache/spark/ui/jobs/AllJobsPage.scala +++ b/core/src/main/scala/org/apache/spark/ui/jobs/AllJobsPage.scala @@ -36,6 +36,9 @@ import org.apache.spark.util.Utils /** Page showing list of all ongoing and recently finished jobs */ private[ui] class AllJobsPage(parent: JobsTab, store: AppStatusStore) extends WebUIPage("") { + + import ApiHelper._ + private val JOBS_LEGEND = val jobId = job.jobId val status = job.status - val jobDescription = store.lastStageAttempt(job.stageIds.max).description - val displayJobDescription = jobDescription -.map(UIUtils.makeDescription(_, "", plainText = true).text) -.getOrElse("") + val (_, lastStageDescription) = lastStageNameAndDescription(store, job) + val jobDescription = UIUtils.makeDescription(lastStageDescription, "", plainText = true).text + val submissionTime = job.submissionTime.get.getTime() val completionTime = job.completionTime.map(_.getTime()).getOrElse(System.currentTimeMillis()) val classNameByStatus = status match { @@ -80,7 +82,7 @@ private[ui] class AllJobsPage(parent: JobsTab, store: AppStatusStore) extends We // The timeline library treats contents as HTML, so we have to escape them. We need to add // extra layers of escaping in order to embed this in a Javascript string literal.
- val escapedDesc = Utility.escape(displayJobDescription) + val escapedDesc = Utility.escape(jobDescription) val jsEscapedDesc = StringEscapeUtils.escapeEcmaScript(escapedDesc) val jobEventJsonAsStr = s""" @@ -403,6 +405,8 @@ private[ui] class JobDataSource( sortColumn: String, desc: Boolean) extends PagedDataSource[JobTableRowData](pageSize) { + import ApiHelper._ + // Convert JobUIData to JobTableRowData which contains the final contents to show in the table // so that we can avoid creating duplicate contents during sorting the data private val data = jobs.map(jobRow).sorted(ordering(sortColumn, desc)) @@ -427,23 +431,21 @@ private[ui] class JobDataSource( val formattedDuration = duration.map(d => UIUtils.formatDuration(d)).getOrElse("Unknown") val submissionTime = jobData.submissionTime val formattedSubmissionTime = submissionTime.map(UIUtils.formatDate).getOrElse("Unknown") -val lastStageAttempt = store.lastStageAttempt(jobData.stageIds.max) -val lastStageDescription = lastStageAttempt.description.getOrElse("") +val (lastStageName, lastStageDescription) = lastStageNameAndDescription(store, jobData) -
svn commit: r24360 - in /dev/spark/2.4.0-SNAPSHOT-2018_01_22_08_01-4327ccf-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s
Author: pwendell Date: Mon Jan 22 16:22:00 2018 New Revision: 24360 Log: Apache Spark 2.4.0-SNAPSHOT-2018_01_22_08_01-4327ccf docs [This commit notification would consist of 1441 parts, which exceeds the limit of 50 ones, so it was shortened to the summary.] - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-11630][CORE] ClosureCleaner moved from warning to debug
Repository: spark
Updated Branches: refs/heads/master 87ffe7add -> 4327ccf28

[SPARK-11630][CORE] ClosureCleaner moved from warning to debug

## What changes were proposed in this pull request?

ClosureCleaner moved from warning to debug

## How was this patch tested?

Existing tests

Author: Rekha Joshi
Author: rjoshi2

Closes #20337 from rekhajoshm/SPARK-11630-1.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/4327ccf2
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/4327ccf2
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/4327ccf2
Branch: refs/heads/master
Commit: 4327ccf289b5a0dc51f6294113d01af6eb52eea0
Parents: 87ffe7a
Author: Rekha Joshi
Authored: Mon Jan 22 08:36:17 2018 -0600
Committer: Sean Owen
Committed: Mon Jan 22 08:36:17 2018 -0600
--
core/src/main/scala/org/apache/spark/util/ClosureCleaner.scala | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/4327ccf2/core/src/main/scala/org/apache/spark/util/ClosureCleaner.scala
--
diff --git a/core/src/main/scala/org/apache/spark/util/ClosureCleaner.scala b/core/src/main/scala/org/apache/spark/util/ClosureCleaner.scala index 4061642..ad0c063 100644 --- a/core/src/main/scala/org/apache/spark/util/ClosureCleaner.scala +++ b/core/src/main/scala/org/apache/spark/util/ClosureCleaner.scala @@ -207,7 +207,7 @@ private[spark] object ClosureCleaner extends Logging { accessedFields: Map[Class[_], Set[String]]): Unit = { if (!isClosure(func.getClass)) { - logWarning("Expected a closure; got " + func.getClass.getName) + logDebug(s"Expected a closure; got ${func.getClass.getName}") return }

- To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
svn commit: r24354 - in /dev/spark/2.3.1-SNAPSHOT-2018_01_22_06_01-d963ba0-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s
Author: pwendell Date: Mon Jan 22 14:20:14 2018 New Revision: 24354 Log: Apache Spark 2.3.1-SNAPSHOT-2018_01_22_06_01-d963ba0 docs [This commit notification would consist of 1441 parts, which exceeds the limit of 50 ones, so it was shortened to the summary.] - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-7721][PYTHON][TESTS] Adds PySpark coverage generation script
Repository: spark
Updated Branches: refs/heads/master 5d680cae4 -> 87ffe7add

[SPARK-7721][PYTHON][TESTS] Adds PySpark coverage generation script

## What changes were proposed in this pull request?

Note that this PR was made based on the top of https://github.com/apache/spark/pull/20151. So, it almost leaves the main codes intact. This PR proposes to add a script for the preparation of automatic PySpark coverage generation. Currently, it's difficult to check the actual coverage in the case of PySpark. This script allows running tests the way we did via the `run-tests` script before. The usage is exactly the same as the `run-tests` script, as this basically wraps it.

This script and PR alone should also be useful. I was asked about how to run this before, and it seems some reviewers (including me) need this. It would also be useful to run it manually.

It usually requires a small diff in normal Python projects, but the PySpark case is a bit different because apparently we are unable to track the coverage after the worker is forked. So, here, I made a custom worker that forces the coverage, based on the top of https://github.com/apache/spark/pull/20151.

I made a simple demo. Please take a look - https://spark-test.github.io/pyspark-coverage-site.

To show the structure, this PR adds the files as below:

```
python
├── .coveragerc                     # Runtime configuration when we run the script.
├── run-tests-with-coverage         # The script that has coverage support and wraps the run-tests script.
└── test_coverage                   # Directories that have files required when running coverage.
    ├── conf
    │   └── spark-defaults.conf     # Having the configuration 'spark.python.daemon.module'.
    ├── coverage_daemon.py          # A daemon having a custom fix and wrapping our daemon.py
    └── sitecustomize.py            # Initiate coverage with COVERAGE_PROCESS_START
```

Note that this PR has a minor nit: [This scope](https://github.com/apache/spark/blob/04e44b37cc04f62fbf9e08c7076349e0a4d12ea8/python/pyspark/daemon.py#L148-L169) in `daemon.py` is not in the coverage results, as basically I am producing the coverage results in `worker.py` separately and then merging them. I believe it's not a big deal.

In a followup, I might have a site that has a single up-to-date PySpark coverage from the master branch as the fallback / default, or have a site that has multiple PySpark coverages, with the site link left on each pull request.

## How was this patch tested?

Manually tested. Usage is the same as the existing Python test script - `./python/run-tests`. For example,

```
sh run-tests-with-coverage --python-executables=python3 --modules=pyspark-sql
```

Running this will generate HTMLs under `./python/test_coverage/htmlcov`.

Console output example:

```
sh run-tests-with-coverage --python-executables=python3,python --modules=pyspark-core
Running PySpark tests. Output is in /.../spark/python/unit-tests.log
Will test against the following Python executables: ['python3', 'python']
Will test the following Python modules: ['pyspark-core']
Starting test(python): pyspark.tests
Starting test(python3): pyspark.tests
...
Tests passed in 231 seconds
Combining collected coverage data under /.../spark/python/test_coverage/coverage_data
Reporting the coverage data at /...spark/python/test_coverage/coverage_data/coverage
Name                    Stmts   Miss Branch BrPart  Cover
--
pyspark/__init__.py        41      0      8      2    96%
...
pyspark/profiler.py        74     11     22      5    83%
pyspark/rdd.py            871     40    303     32    93%
pyspark/rddsampler.py      68     10     32      2    82%
...
-- TOTAL 8521 3077 274819159% Generating HTML files for PySpark coverage under /.../spark/python/test_coverage/htmlcov ``` Author: hyukjinkwonCloses #20204 from HyukjinKwon/python-coverage. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/87ffe7ad Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/87ffe7ad Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/87ffe7ad Branch: refs/heads/master Commit: 87ffe7adddf517541aac0d1e8536b02ad8881606 Parents: 5d680ca Author: hyukjinkwon Authored: Mon Jan 22 22:12:50 2018 +0900 Committer: hyukjinkwon Committed: Mon Jan 22 22:12:50 2018 +0900 -- .gitignore| 2 + python/.coveragerc| 21 +++ python/run-tests-with-coverage| 69 ++ python/run-tests.py | 5 +- python/test_coverage/conf/spark-defaults.conf | 21 +++
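The coverage bootstrap above relies on coverage.py's standard subprocess hook: when the `COVERAGE_PROCESS_START` environment variable points at a config file, any Python process that imports `sitecustomize` at startup begins measuring, which is what lets the forked workers be tracked. A minimal sketch of such a `sitecustomize.py` (illustrative; the exact file contents in the PR may differ):

```python
# sitecustomize.py -- sketch of bootstrapping coverage in every Python
# process (driver, daemon, forked workers). Assumes coverage.py is installed
# and COVERAGE_PROCESS_START points at a .coveragerc file.
import coverage

# This is a no-op unless COVERAGE_PROCESS_START is set in the environment,
# so regular (non-coverage) runs are unaffected.
coverage.process_startup()
```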
spark git commit: [SPARK-23090][SQL] polish ColumnVector
Repository: spark
Updated Branches:
  refs/heads/branch-2.3 1069fad41 -> d963ba031

[SPARK-23090][SQL] polish ColumnVector

## What changes were proposed in this pull request?

Several improvements:
* provide a default implementation for the batch get methods
* rename `getChildColumn` to `getChild`, which is more concise
* remove `getStruct(int, int)`; it's only used to simplify the codegen, which is an internal thing, so we should not add a public API for this purpose

## How was this patch tested?

existing tests

Author: Wenchen Fan

Closes #20277 from cloud-fan/column-vector.

(cherry picked from commit 5d680cae486c77cdb12dbe9e043710e49e8d51e4)
Signed-off-by: Wenchen Fan

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/d963ba03
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/d963ba03
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/d963ba03

Branch: refs/heads/branch-2.3
Commit: d963ba031748711ec7847ad0b702911eb7319c63
Parents: 1069fad
Author: Wenchen Fan
Authored: Mon Jan 22 20:56:38 2018 +0800
Committer: Wenchen Fan
Committed: Mon Jan 22 20:56:57 2018 +0800

--
 .../expressions/codegen/CodeGenerator.scala     | 18 ++--
 .../datasources/orc/OrcColumnVector.java        | 65 +
 .../datasources/orc/OrcColumnarBatchReader.java | 23 ++---
 .../execution/vectorized/ColumnVectorUtils.java | 10 +-
 .../vectorized/MutableColumnarRow.java          |  4 +-
 .../vectorized/WritableColumnVector.java        | 10 +-
 .../spark/sql/vectorized/ArrowColumnVector.java | 99 +---
 .../spark/sql/vectorized/ColumnVector.java      | 79 +++-
 .../spark/sql/vectorized/ColumnarArray.java     |  4 +-
 .../spark/sql/vectorized/ColumnarRow.java       | 46 -
 .../spark/sql/execution/ColumnarBatchScan.scala |  2 +-
 .../aggregate/VectorizedHashMapGenerator.scala  |  4 +-
 .../sql/execution/arrow/ArrowWriterSuite.scala  | 14 +--
 .../vectorized/ArrowColumnVectorSuite.scala     | 12 +-
 .../vectorized/ColumnVectorSuite.scala          | 12 +-
 .../vectorized/ColumnarBatchBenchmark.scala     | 38
 .../vectorized/ColumnarBatchSuite.scala         | 20 ++--
 17 files changed, 164 insertions(+), 296 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/d963ba03/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
--
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
index 2c714c2..f96ed76 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
@@ -688,17 +688,13 @@ class CodegenContext {
   /**
    * Returns the specialized code to access a value from a column vector for a given `DataType`.
    */
-  def getValue(vector: String, rowId: String, dataType: DataType): String = {
-    val jt = javaType(dataType)
-    dataType match {
-      case _ if isPrimitiveType(jt) =>
-        s"$vector.get${primitiveTypeName(jt)}($rowId)"
-      case t: DecimalType =>
-        s"$vector.getDecimal($rowId, ${t.precision}, ${t.scale})"
-      case StringType =>
-        s"$vector.getUTF8String($rowId)"
-      case _ =>
-        throw new IllegalArgumentException(s"cannot generate code for unsupported type: $dataType")
+  def getValueFromVector(vector: String, dataType: DataType, rowId: String): String = {
+    if (dataType.isInstanceOf[StructType]) {
+      // `ColumnVector.getStruct` is different from `InternalRow.getStruct`, it only takes an
+      // `ordinal` parameter.
+      s"$vector.getStruct($rowId)"
+    } else {
+      getValue(vector, dataType, rowId)
     }
   }

http://git-wip-us.apache.org/repos/asf/spark/blob/d963ba03/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/orc/OrcColumnVector.java
--
diff --git a/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/orc/OrcColumnVector.java b/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/orc/OrcColumnVector.java
index b6e7922..aaf2a38 100644
--- a/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/orc/OrcColumnVector.java
+++ b/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/orc/OrcColumnVector.java
@@ -111,57 +111,21 @@ public class OrcColumnVector extends org.apache.spark.sql.vectorized.ColumnVecto
   }

   @Override
-
spark git commit: [SPARK-23090][SQL] polish ColumnVector
Repository: spark
Updated Branches:
  refs/heads/master 896e45af5 -> 5d680cae4

[SPARK-23090][SQL] polish ColumnVector

## What changes were proposed in this pull request?

Several improvements:
* provide a default implementation for the batch get methods
* rename `getChildColumn` to `getChild`, which is more concise
* remove `getStruct(int, int)`; it's only used to simplify the codegen, which is an internal thing, so we should not add a public API for this purpose

## How was this patch tested?

existing tests

Author: Wenchen Fan

Closes #20277 from cloud-fan/column-vector.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/5d680cae
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/5d680cae
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/5d680cae

Branch: refs/heads/master
Commit: 5d680cae486c77cdb12dbe9e043710e49e8d51e4
Parents: 896e45a
Author: Wenchen Fan
Authored: Mon Jan 22 20:56:38 2018 +0800
Committer: Wenchen Fan
Committed: Mon Jan 22 20:56:38 2018 +0800

--
 .../expressions/codegen/CodeGenerator.scala     | 18 ++--
 .../datasources/orc/OrcColumnVector.java        | 65 +
 .../datasources/orc/OrcColumnarBatchReader.java | 23 ++---
 .../execution/vectorized/ColumnVectorUtils.java | 10 +-
 .../vectorized/MutableColumnarRow.java          |  4 +-
 .../vectorized/WritableColumnVector.java        | 10 +-
 .../spark/sql/vectorized/ArrowColumnVector.java | 99 +---
 .../spark/sql/vectorized/ColumnVector.java      | 79 +++-
 .../spark/sql/vectorized/ColumnarArray.java     |  4 +-
 .../spark/sql/vectorized/ColumnarRow.java       | 46 -
 .../spark/sql/execution/ColumnarBatchScan.scala |  2 +-
 .../aggregate/VectorizedHashMapGenerator.scala  |  4 +-
 .../sql/execution/arrow/ArrowWriterSuite.scala  | 14 +--
 .../vectorized/ArrowColumnVectorSuite.scala     | 12 +-
 .../vectorized/ColumnVectorSuite.scala          | 12 +-
 .../vectorized/ColumnarBatchBenchmark.scala     | 38
 .../vectorized/ColumnarBatchSuite.scala         | 20 ++--
 17 files changed, 164 insertions(+), 296 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/5d680cae/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
--
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
index 2c714c2..f96ed76 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
@@ -688,17 +688,13 @@ class CodegenContext {
   /**
    * Returns the specialized code to access a value from a column vector for a given `DataType`.
    */
-  def getValue(vector: String, rowId: String, dataType: DataType): String = {
-    val jt = javaType(dataType)
-    dataType match {
-      case _ if isPrimitiveType(jt) =>
-        s"$vector.get${primitiveTypeName(jt)}($rowId)"
-      case t: DecimalType =>
-        s"$vector.getDecimal($rowId, ${t.precision}, ${t.scale})"
-      case StringType =>
-        s"$vector.getUTF8String($rowId)"
-      case _ =>
-        throw new IllegalArgumentException(s"cannot generate code for unsupported type: $dataType")
+  def getValueFromVector(vector: String, dataType: DataType, rowId: String): String = {
+    if (dataType.isInstanceOf[StructType]) {
+      // `ColumnVector.getStruct` is different from `InternalRow.getStruct`, it only takes an
+      // `ordinal` parameter.
+      s"$vector.getStruct($rowId)"
+    } else {
+      getValue(vector, dataType, rowId)
     }
   }

http://git-wip-us.apache.org/repos/asf/spark/blob/5d680cae/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/orc/OrcColumnVector.java
--
diff --git a/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/orc/OrcColumnVector.java b/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/orc/OrcColumnVector.java
index b6e7922..aaf2a38 100644
--- a/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/orc/OrcColumnVector.java
+++ b/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/orc/OrcColumnVector.java
@@ -111,57 +111,21 @@ public class OrcColumnVector extends org.apache.spark.sql.vectorized.ColumnVecto
   }

   @Override
-  public boolean[] getBooleans(int rowId, int count) {
-    boolean[] res = new boolean[count];
-    for (int i = 0; i < count;
spark git commit: [MINOR][SQL][TEST] Test case cleanups for recent PRs
Repository: spark
Updated Branches:
  refs/heads/master 78801881c -> 896e45af5

[MINOR][SQL][TEST] Test case cleanups for recent PRs

## What changes were proposed in this pull request?

Reverts the unneeded test case changes we made in SPARK-23000. Also fixes the test suites that do not call `super.afterAll()` in their local `afterAll()`; the `afterAll()` of `TestHiveSingleton` actually resets the environments.

## How was this patch tested?

N/A

Author: gatorsmile

Closes #20341 from gatorsmile/testRelated.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/896e45af
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/896e45af
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/896e45af

Branch: refs/heads/master
Commit: 896e45af5fea264683b1d7d20a1711f33908a06f
Parents: 7880188
Author: gatorsmile
Authored: Mon Jan 22 04:32:59 2018 -0800
Committer: gatorsmile
Committed: Mon Jan 22 04:32:59 2018 -0800

--
 .../apache/spark/sql/DataFrameJoinSuite.scala   | 21 ++--
 .../apache/spark/sql/hive/test/TestHive.scala   |  3 +-
 .../sql/hive/HiveMetastoreCatalogSuite.scala    | 26 +++
 .../sql/hive/execution/HiveUDAFSuite.scala      |  8 +++--
 .../sql/hive/execution/Hive_2_1_DDLSuite.scala  |  6 +++-
 .../execution/ObjectHashAggregateSuite.scala    |  6 +++-
 .../apache/spark/sql/hive/parquetSuites.scala   | 35
 7 files changed, 60 insertions(+), 45 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/896e45af/sql/core/src/test/scala/org/apache/spark/sql/DataFrameJoinSuite.scala
--
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/DataFrameJoinSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/DataFrameJoinSuite.scala
index 1656f29..0d9eeab 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/DataFrameJoinSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/DataFrameJoinSuite.scala
@@ -21,6 +21,7 @@ import org.apache.spark.sql.catalyst.plans.{Inner, LeftOuter, RightOuter}
 import org.apache.spark.sql.catalyst.plans.logical.Join
 import org.apache.spark.sql.execution.joins.BroadcastHashJoinExec
 import org.apache.spark.sql.functions._
+import org.apache.spark.sql.internal.SQLConf
 import org.apache.spark.sql.test.SharedSQLContext

 class DataFrameJoinSuite extends QueryTest with SharedSQLContext {
@@ -276,16 +277,14 @@ class DataFrameJoinSuite extends QueryTest with SharedSQLContext {

   test("SPARK-23087: don't throw Analysis Exception in CheckCartesianProduct when join condition " +
     "is false or null") {
-    val df = spark.range(10)
-    val dfNull = spark.range(10).select(lit(null).as("b"))
-    val planNull = df.join(dfNull, $"id" === $"b", "left").queryExecution.analyzed
-
-    spark.sessionState.executePlan(planNull).optimizedPlan
-
-    val dfOne = df.select(lit(1).as("a"))
-    val dfTwo = spark.range(10).select(lit(2).as("b"))
-    val planFalse = dfOne.join(dfTwo, $"a" === $"b", "left").queryExecution.analyzed
-
-    spark.sessionState.executePlan(planFalse).optimizedPlan
+    withSQLConf(SQLConf.CROSS_JOINS_ENABLED.key -> "false") {
+      val df = spark.range(10)
+      val dfNull = spark.range(10).select(lit(null).as("b"))
+      df.join(dfNull, $"id" === $"b", "left").queryExecution.optimizedPlan
+
+      val dfOne = df.select(lit(1).as("a"))
+      val dfTwo = spark.range(10).select(lit(2).as("b"))
+      dfOne.join(dfTwo, $"a" === $"b", "left").queryExecution.optimizedPlan
+    }
   }
 }

http://git-wip-us.apache.org/repos/asf/spark/blob/896e45af/sql/hive/src/main/scala/org/apache/spark/sql/hive/test/TestHive.scala
--
diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/test/TestHive.scala b/sql/hive/src/main/scala/org/apache/spark/sql/hive/test/TestHive.scala
index c84131f..7287e20 100644
--- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/test/TestHive.scala
+++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/test/TestHive.scala
@@ -492,8 +492,7 @@ private[hive] class TestHiveSparkSession(
   protected val originalUDFs: JavaSet[String] = FunctionRegistry.getFunctionNames

   /**
-   * Resets the test instance by deleting any tables that have been created.
-   * TODO: also clear out UDFs, views, etc.
+   * Resets the test instance by deleting any table, view, temp view, and UDF that have been created
    */
   def reset() {
     try {

http://git-wip-us.apache.org/repos/asf/spark/blob/896e45af/sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveMetastoreCatalogSuite.scala
--
diff --git
spark git commit: [MINOR][SQL][TEST] Test case cleanups for recent PRs
Repository: spark
Updated Branches:
  refs/heads/branch-2.3 d933fcea6 -> 1069fad41

[MINOR][SQL][TEST] Test case cleanups for recent PRs

## What changes were proposed in this pull request?

Reverts the unneeded test case changes we made in SPARK-23000. Also fixes the test suites that do not call `super.afterAll()` in their local `afterAll()`; the `afterAll()` of `TestHiveSingleton` actually resets the environments.

## How was this patch tested?

N/A

Author: gatorsmile

Closes #20341 from gatorsmile/testRelated.

(cherry picked from commit 896e45af5fea264683b1d7d20a1711f33908a06f)
Signed-off-by: gatorsmile

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/1069fad4
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/1069fad4
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/1069fad4

Branch: refs/heads/branch-2.3
Commit: 1069fad41fb6896fef4245e6ae6b5ba36115ad68
Parents: d933fce
Author: gatorsmile
Authored: Mon Jan 22 04:32:59 2018 -0800
Committer: gatorsmile
Committed: Mon Jan 22 04:33:07 2018 -0800

--
 .../apache/spark/sql/DataFrameJoinSuite.scala   | 21 ++--
 .../apache/spark/sql/hive/test/TestHive.scala   |  3 +-
 .../sql/hive/HiveMetastoreCatalogSuite.scala    | 26 +++
 .../sql/hive/execution/HiveUDAFSuite.scala      |  8 +++--
 .../sql/hive/execution/Hive_2_1_DDLSuite.scala  |  6 +++-
 .../execution/ObjectHashAggregateSuite.scala    |  6 +++-
 .../apache/spark/sql/hive/parquetSuites.scala   | 35
 7 files changed, 60 insertions(+), 45 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/1069fad4/sql/core/src/test/scala/org/apache/spark/sql/DataFrameJoinSuite.scala
--
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/DataFrameJoinSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/DataFrameJoinSuite.scala
index 1656f29..0d9eeab 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/DataFrameJoinSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/DataFrameJoinSuite.scala
@@ -21,6 +21,7 @@ import org.apache.spark.sql.catalyst.plans.{Inner, LeftOuter, RightOuter}
 import org.apache.spark.sql.catalyst.plans.logical.Join
 import org.apache.spark.sql.execution.joins.BroadcastHashJoinExec
 import org.apache.spark.sql.functions._
+import org.apache.spark.sql.internal.SQLConf
 import org.apache.spark.sql.test.SharedSQLContext

 class DataFrameJoinSuite extends QueryTest with SharedSQLContext {
@@ -276,16 +277,14 @@ class DataFrameJoinSuite extends QueryTest with SharedSQLContext {

   test("SPARK-23087: don't throw Analysis Exception in CheckCartesianProduct when join condition " +
     "is false or null") {
-    val df = spark.range(10)
-    val dfNull = spark.range(10).select(lit(null).as("b"))
-    val planNull = df.join(dfNull, $"id" === $"b", "left").queryExecution.analyzed
-
-    spark.sessionState.executePlan(planNull).optimizedPlan
-
-    val dfOne = df.select(lit(1).as("a"))
-    val dfTwo = spark.range(10).select(lit(2).as("b"))
-    val planFalse = dfOne.join(dfTwo, $"a" === $"b", "left").queryExecution.analyzed
-
-    spark.sessionState.executePlan(planFalse).optimizedPlan
+    withSQLConf(SQLConf.CROSS_JOINS_ENABLED.key -> "false") {
+      val df = spark.range(10)
+      val dfNull = spark.range(10).select(lit(null).as("b"))
+      df.join(dfNull, $"id" === $"b", "left").queryExecution.optimizedPlan
+
+      val dfOne = df.select(lit(1).as("a"))
+      val dfTwo = spark.range(10).select(lit(2).as("b"))
+      dfOne.join(dfTwo, $"a" === $"b", "left").queryExecution.optimizedPlan
+    }
   }
 }

http://git-wip-us.apache.org/repos/asf/spark/blob/1069fad4/sql/hive/src/main/scala/org/apache/spark/sql/hive/test/TestHive.scala
--
diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/test/TestHive.scala b/sql/hive/src/main/scala/org/apache/spark/sql/hive/test/TestHive.scala
index c84131f..7287e20 100644
--- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/test/TestHive.scala
+++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/test/TestHive.scala
@@ -492,8 +492,7 @@ private[hive] class TestHiveSparkSession(
   protected val originalUDFs: JavaSet[String] = FunctionRegistry.getFunctionNames

   /**
-   * Resets the test instance by deleting any tables that have been created.
-   * TODO: also clear out UDFs, views, etc.
+   * Resets the test instance by deleting any table, view, temp view, and UDF that have been created
    */
   def reset() {
     try {
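The `withSQLConf` helper used in the rewritten test above sets a SQL conf only for the scope of a block and restores the previous value afterwards. For readers who want the same pattern from PySpark, here is a hypothetical context-manager sketch (`with_sql_conf` is illustrative, not an existing PySpark API):

```python
from contextlib import contextmanager

@contextmanager
def with_sql_conf(spark, key, value):
    """Set a SQL conf for the duration of a block, then restore it --
    the same pattern as the Scala test helper withSQLConf."""
    old = spark.conf.get(key, None)  # None if the conf was not set
    spark.conf.set(key, value)
    try:
        yield
    finally:
        if old is None:
            spark.conf.unset(key)
        else:
            spark.conf.set(key, old)

# Usage, assuming an active SparkSession named `spark`:
# with with_sql_conf(spark, "spark.sql.crossJoin.enabled", "false"):
#     df1.join(df2, df1.a == df2.b, "left").explain()
```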
spark git commit: [SPARK-23170][SQL] Dump the statistics of effective runs of analyzer and optimizer rules
Repository: spark
Updated Branches:
  refs/heads/master 73281161f -> 78801881c

[SPARK-23170][SQL] Dump the statistics of effective runs of analyzer and optimizer rules

## What changes were proposed in this pull request?

Dump the statistics of effective runs of analyzer and optimizer rules.

## How was this patch tested?

Do a manual run of TPCDSQuerySuite

```
=== Metrics of Analyzer/Optimizer Rules ===
Total number of runs: 175899
Total time: 25.486559948 seconds

Rule                                                                            Effective Time / Total Time  Effective Runs / Total Runs

org.apache.spark.sql.catalyst.optimizer.ColumnPruning                           1603280450 / 2868461549      761 / 1877
org.apache.spark.sql.catalyst.analysis.Analyzer$CTESubstitution                 2045860009 / 2056602674      37 / 788
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveAggregateFunctions       440719059 / 1693110949       38 / 1982
org.apache.spark.sql.catalyst.optimizer.Optimizer$OptimizeSubqueries            1429834919 / 1446016225      39 / 285
org.apache.spark.sql.catalyst.optimizer.PruneFilters                            33273083 / 1389586938        3 / 1592
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences               821183615 / 128754           616 / 1982
org.apache.spark.sql.catalyst.optimizer.ReorderJoin                             775837028 / 866238225        132 / 1592
org.apache.spark.sql.catalyst.analysis.DecimalPrecision                         550683593 / 748854507        211 / 1982
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveSubquery                 513075345 / 634370596        49 / 1982
org.apache.spark.sql.catalyst.analysis.Analyzer$FixNullability                  33475731 / 606406532         12 / 742
org.apache.spark.sql.catalyst.analysis.TypeCoercion$ImplicitTypeCasts           193144298 / 545403925        86 / 1982
org.apache.spark.sql.catalyst.optimizer.BooleanSimplification                   18651497 / 495725004         7 / 1592
org.apache.spark.sql.catalyst.optimizer.PushPredicateThroughJoin                369257217 / 489934378        709 / 1592
org.apache.spark.sql.catalyst.optimizer.RemoveRedundantAliases                  3707000 / 468291609          9 / 1592
org.apache.spark.sql.catalyst.optimizer.InferFiltersFromConstraints             410155900 / 435254175        192 / 285
org.apache.spark.sql.execution.datasources.FindDataSourceTable                  348885539 / 371855866        233 / 1982
org.apache.spark.sql.catalyst.optimizer.NullPropagation                         11307645 / 307531225         26 / 1592
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions                120324545 / 304948785        294 / 1982
org.apache.spark.sql.catalyst.analysis.TypeCoercion$FunctionArgumentConversion  92323199 / 286695007         38 / 1982
org.apache.spark.sql.catalyst.optimizer.PushDownPredicate                       230084193 / 265845972        785 / 1592
org.apache.spark.sql.catalyst.analysis.TypeCoercion$PromoteStrings              45938401 / 265144009         40 / 1982
org.apache.spark.sql.catalyst.analysis.TypeCoercion$InConversion                14888776 / 261499450         1 / 1982
org.apache.spark.sql.catalyst.analysis.TypeCoercion$CaseWhenCoercion            113796384 / 244913861        29 / 1982
org.apache.spark.sql.catalyst.optimizer.ConstantFolding                         65008069 / 236548480         126 / 1592
org.apache.spark.sql.catalyst.analysis.Analyzer$ExtractGenerator                0 / 226338929                0 / 1982
org.apache.spark.sql.catalyst.analysis.ResolveTimeZone                          98134906 / 221323770         417 / 1982
org.apache.spark.sql.catalyst.optimizer.ReorderAssociativeOperator              0 / 208421703                0 / 1592
org.apache.spark.sql.catalyst.optimizer.OptimizeIn
spark git commit: [SPARK-23170][SQL] Dump the statistics of effective runs of analyzer and optimizer rules
Repository: spark
Updated Branches:
  refs/heads/branch-2.3 743b9173f -> d933fcea6

[SPARK-23170][SQL] Dump the statistics of effective runs of analyzer and optimizer rules

## What changes were proposed in this pull request?

Dump the statistics of effective runs of analyzer and optimizer rules.

## How was this patch tested?

Do a manual run of TPCDSQuerySuite

```
=== Metrics of Analyzer/Optimizer Rules ===
Total number of runs: 175899
Total time: 25.486559948 seconds

Rule                                                                            Effective Time / Total Time  Effective Runs / Total Runs

org.apache.spark.sql.catalyst.optimizer.ColumnPruning                           1603280450 / 2868461549      761 / 1877
org.apache.spark.sql.catalyst.analysis.Analyzer$CTESubstitution                 2045860009 / 2056602674      37 / 788
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveAggregateFunctions       440719059 / 1693110949       38 / 1982
org.apache.spark.sql.catalyst.optimizer.Optimizer$OptimizeSubqueries            1429834919 / 1446016225      39 / 285
org.apache.spark.sql.catalyst.optimizer.PruneFilters                            33273083 / 1389586938        3 / 1592
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences               821183615 / 128754           616 / 1982
org.apache.spark.sql.catalyst.optimizer.ReorderJoin                             775837028 / 866238225        132 / 1592
org.apache.spark.sql.catalyst.analysis.DecimalPrecision                         550683593 / 748854507        211 / 1982
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveSubquery                 513075345 / 634370596        49 / 1982
org.apache.spark.sql.catalyst.analysis.Analyzer$FixNullability                  33475731 / 606406532         12 / 742
org.apache.spark.sql.catalyst.analysis.TypeCoercion$ImplicitTypeCasts           193144298 / 545403925        86 / 1982
org.apache.spark.sql.catalyst.optimizer.BooleanSimplification                   18651497 / 495725004         7 / 1592
org.apache.spark.sql.catalyst.optimizer.PushPredicateThroughJoin                369257217 / 489934378        709 / 1592
org.apache.spark.sql.catalyst.optimizer.RemoveRedundantAliases                  3707000 / 468291609          9 / 1592
org.apache.spark.sql.catalyst.optimizer.InferFiltersFromConstraints             410155900 / 435254175        192 / 285
org.apache.spark.sql.execution.datasources.FindDataSourceTable                  348885539 / 371855866        233 / 1982
org.apache.spark.sql.catalyst.optimizer.NullPropagation                         11307645 / 307531225         26 / 1592
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions                120324545 / 304948785        294 / 1982
org.apache.spark.sql.catalyst.analysis.TypeCoercion$FunctionArgumentConversion  92323199 / 286695007         38 / 1982
org.apache.spark.sql.catalyst.optimizer.PushDownPredicate                       230084193 / 265845972        785 / 1592
org.apache.spark.sql.catalyst.analysis.TypeCoercion$PromoteStrings              45938401 / 265144009         40 / 1982
org.apache.spark.sql.catalyst.analysis.TypeCoercion$InConversion                14888776 / 261499450         1 / 1982
org.apache.spark.sql.catalyst.analysis.TypeCoercion$CaseWhenCoercion            113796384 / 244913861        29 / 1982
org.apache.spark.sql.catalyst.optimizer.ConstantFolding                         65008069 / 236548480         126 / 1592
org.apache.spark.sql.catalyst.analysis.Analyzer$ExtractGenerator                0 / 226338929                0 / 1982
org.apache.spark.sql.catalyst.analysis.ResolveTimeZone                          98134906 / 221323770         417 / 1982
org.apache.spark.sql.catalyst.optimizer.ReorderAssociativeOperator              0 / 208421703                0 / 1592
org.apache.spark.sql.catalyst.optimizer.OptimizeIn
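For intuition about what the dump measures: every rule application is timed and counted, and a run counts as "effective" only when the rule actually changed the plan. A hypothetical Python sketch of that bookkeeping follows (not Spark's actual `RuleExecutor` code):

```python
import time
from collections import defaultdict

# Per-rule accumulators: [effective_time, total_time, effective_runs, total_runs].
stats = defaultdict(lambda: [0, 0, 0, 0])

def apply_rule(name, rule, plan):
    start = time.perf_counter_ns()
    new_plan = rule(plan)
    elapsed = time.perf_counter_ns() - start
    s = stats[name]
    s[1] += elapsed   # total time always grows
    s[3] += 1         # total runs always grow
    if new_plan != plan:  # the rule rewrote the plan: an effective run
        s[0] += elapsed
        s[2] += 1
    return new_plan

def dump_stats():
    print("Rule\tEffective Time / Total Time\tEffective Runs / Total Runs")
    for name, (et, tt, er, tr) in sorted(stats.items(), key=lambda kv: -kv[1][1]):
        print(f"{name}\t{et} / {tt}\t{er} / {tr}")
```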
spark git commit: [SPARK-23122][PYSPARK][FOLLOW-UP] Update the docs for UDF Registration
Repository: spark
Updated Branches:
  refs/heads/branch-2.3 cf078a205 -> 743b9173f

[SPARK-23122][PYSPARK][FOLLOW-UP] Update the docs for UDF Registration

## What changes were proposed in this pull request?

This PR is to update the docs for UDF registration.

## How was this patch tested?

N/A

Author: gatorsmile

Closes #20348 from gatorsmile/testUpdateDoc.

(cherry picked from commit 73281161fc7fddd645c712986ec376ac2b1bd213)
Signed-off-by: gatorsmile

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/743b9173
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/743b9173
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/743b9173

Branch: refs/heads/branch-2.3
Commit: 743b9173f8feaed8e594961aa85d61fb3f8e5e70
Parents: cf078a2
Author: gatorsmile
Authored: Mon Jan 22 04:27:59 2018 -0800
Committer: gatorsmile
Committed: Mon Jan 22 04:28:08 2018 -0800

--
 python/pyspark/sql/udf.py | 12
 1 file changed, 8 insertions(+), 4 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/743b9173/python/pyspark/sql/udf.py
--
diff --git a/python/pyspark/sql/udf.py b/python/pyspark/sql/udf.py
index c77f19f8..134badb 100644
--- a/python/pyspark/sql/udf.py
+++ b/python/pyspark/sql/udf.py
@@ -199,8 +199,8 @@ class UDFRegistration(object):
     @ignore_unicode_prefix
     @since("1.3.1")
     def register(self, name, f, returnType=None):
-        """Registers a Python function (including lambda function) or a user-defined function
-        in SQL statements.
+        """Register a Python function (including lambda function) or a user-defined function
+        as a SQL function.

         :param name: name of the user-defined function in SQL statements.
         :param f: a Python function, or a user-defined function. The user-defined function can
@@ -210,6 +210,10 @@ class UDFRegistration(object):
             be either a :class:`pyspark.sql.types.DataType` object or a DDL-formatted type string.
         :return: a user-defined function.

+        To register a nondeterministic Python function, users need to first build
+        a nondeterministic user-defined function for the Python function and then register it
+        as a SQL function.
+
         `returnType` can be optionally specified when `f` is a Python function but not
         when `f` is a user-defined function. Please see below.

@@ -297,7 +301,7 @@ class UDFRegistration(object):
     @ignore_unicode_prefix
     @since(2.3)
     def registerJavaFunction(self, name, javaClassName, returnType=None):
-        """Register a Java user-defined function so it can be used in SQL statements.
+        """Register a Java user-defined function as a SQL function.

         In addition to a name and the function itself, the return type can be optionally specified.
         When the return type is not specified we would infer it via reflection.
@@ -334,7 +338,7 @@ class UDFRegistration(object):
     @ignore_unicode_prefix
     @since(2.3)
     def registerJavaUDAF(self, name, javaClassName):
-        """Register a Java user-defined aggregate function so it can be used in SQL statements.
+        """Register a Java user-defined aggregate function as a SQL function.

         :param name: name of the user-defined aggregate function
         :param javaClassName: fully qualified name of java class
spark git commit: [SPARK-23122][PYSPARK][FOLLOW-UP] Update the docs for UDF Registration
Repository: spark
Updated Branches:
  refs/heads/master 60175e959 -> 73281161f

[SPARK-23122][PYSPARK][FOLLOW-UP] Update the docs for UDF Registration

## What changes were proposed in this pull request?

This PR is to update the docs for UDF registration.

## How was this patch tested?

N/A

Author: gatorsmile

Closes #20348 from gatorsmile/testUpdateDoc.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/73281161
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/73281161
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/73281161

Branch: refs/heads/master
Commit: 73281161fc7fddd645c712986ec376ac2b1bd213
Parents: 60175e959
Author: gatorsmile
Authored: Mon Jan 22 04:27:59 2018 -0800
Committer: gatorsmile
Committed: Mon Jan 22 04:27:59 2018 -0800

--
 python/pyspark/sql/udf.py | 12
 1 file changed, 8 insertions(+), 4 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/73281161/python/pyspark/sql/udf.py
--
diff --git a/python/pyspark/sql/udf.py b/python/pyspark/sql/udf.py
index c77f19f8..134badb 100644
--- a/python/pyspark/sql/udf.py
+++ b/python/pyspark/sql/udf.py
@@ -199,8 +199,8 @@ class UDFRegistration(object):
     @ignore_unicode_prefix
     @since("1.3.1")
     def register(self, name, f, returnType=None):
-        """Registers a Python function (including lambda function) or a user-defined function
-        in SQL statements.
+        """Register a Python function (including lambda function) or a user-defined function
+        as a SQL function.

         :param name: name of the user-defined function in SQL statements.
         :param f: a Python function, or a user-defined function. The user-defined function can
@@ -210,6 +210,10 @@ class UDFRegistration(object):
             be either a :class:`pyspark.sql.types.DataType` object or a DDL-formatted type string.
         :return: a user-defined function.

+        To register a nondeterministic Python function, users need to first build
+        a nondeterministic user-defined function for the Python function and then register it
+        as a SQL function.
+
         `returnType` can be optionally specified when `f` is a Python function but not
         when `f` is a user-defined function. Please see below.

@@ -297,7 +301,7 @@ class UDFRegistration(object):
     @ignore_unicode_prefix
     @since(2.3)
     def registerJavaFunction(self, name, javaClassName, returnType=None):
-        """Register a Java user-defined function so it can be used in SQL statements.
+        """Register a Java user-defined function as a SQL function.

         In addition to a name and the function itself, the return type can be optionally specified.
         When the return type is not specified we would infer it via reflection.
@@ -334,7 +338,7 @@ class UDFRegistration(object):
     @ignore_unicode_prefix
     @since(2.3)
     def registerJavaUDAF(self, name, javaClassName):
-        """Register a Java user-defined aggregate function so it can be used in SQL statements.
+        """Register a Java user-defined aggregate function as a SQL function.

         :param name: name of the user-defined aggregate function
         :param javaClassName: fully qualified name of java class
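As a concrete example of the clarified wording, a nondeterministic user-defined function is built first and then registered as a SQL function (this sketch assumes an active `SparkSession` named `spark`):

```python
import random
from pyspark.sql.functions import udf

# Mark the Python function nondeterministic first, then register it by name.
random_udf = udf(lambda: random.randint(0, 100), "integer").asNondeterministic()
spark.udf.register("random_udf", random_udf)
spark.sql("SELECT random_udf()").show()
```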
spark git commit: [MINOR][DOC] Fix the path to the examples jar
Repository: spark
Updated Branches:
  refs/heads/branch-2.3 57c320a0d -> cf078a205

[MINOR][DOC] Fix the path to the examples jar

## What changes were proposed in this pull request?

The example jar file is now in the ./examples/jars directory of the Spark distribution.

Author: Arseniy Tashoyan

Closes #20349 from tashoyan/patch-1.

(cherry picked from commit 60175e959f275d2961798fbc5a9150dac9de51ff)
Signed-off-by: jerryshao

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/cf078a20
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/cf078a20
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/cf078a20

Branch: refs/heads/branch-2.3
Commit: cf078a205a14d8709e2c4a9d9f23f6efa20b4fe7
Parents: 57c320a
Author: Arseniy Tashoyan
Authored: Mon Jan 22 20:17:05 2018 +0800
Committer: jerryshao
Committed: Mon Jan 22 20:20:45 2018 +0800

--
 docs/running-on-yarn.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/cf078a20/docs/running-on-yarn.md
--
diff --git a/docs/running-on-yarn.md b/docs/running-on-yarn.md
index e4f5a0c..c010af3 100644
--- a/docs/running-on-yarn.md
+++ b/docs/running-on-yarn.md
@@ -35,7 +35,7 @@ For example:
         --executor-memory 2g \
         --executor-cores 1 \
         --queue thequeue \
-        lib/spark-examples*.jar \
+        examples/jars/spark-examples*.jar \
         10

 The above starts a YARN client program which starts the default Application Master. Then SparkPi will be run as a child thread of Application Master. The client will periodically poll the Application Master for status updates and display them in the console. The client will exit once your application has finished running. Refer to the "Debugging your Application" section below for how to see driver and executor logs.
spark git commit: [MINOR][DOC] Fix the path to the examples jar
Repository: spark
Updated Branches:
  refs/heads/master ec2289761 -> 60175e959

[MINOR][DOC] Fix the path to the examples jar

## What changes were proposed in this pull request?

The example jar file is now in the ./examples/jars directory of the Spark distribution.

Author: Arseniy Tashoyan

Closes #20349 from tashoyan/patch-1.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/60175e95
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/60175e95
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/60175e95

Branch: refs/heads/master
Commit: 60175e959f275d2961798fbc5a9150dac9de51ff
Parents: ec22897
Author: Arseniy Tashoyan
Authored: Mon Jan 22 20:17:05 2018 +0800
Committer: jerryshao
Committed: Mon Jan 22 20:17:05 2018 +0800

--
 docs/running-on-yarn.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/60175e95/docs/running-on-yarn.md
--
diff --git a/docs/running-on-yarn.md b/docs/running-on-yarn.md
index e4f5a0c..c010af3 100644
--- a/docs/running-on-yarn.md
+++ b/docs/running-on-yarn.md
@@ -35,7 +35,7 @@ For example:
        --executor-memory 2g \
        --executor-cores 1 \
        --queue thequeue \
-       lib/spark-examples*.jar \
+       examples/jars/spark-examples*.jar \
        10

 The above starts a YARN client program which starts the default Application Master. Then SparkPi will be run as a child thread of Application Master. The client will periodically poll the Application Master for status updates and display them in the console. The client will exit once your application has finished running. Refer to the "Debugging your Application" section below for how to see driver and executor logs.
svn commit: r24352 - in /dev/spark/2.3.1-SNAPSHOT-2018_01_22_02_01-57c320a-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s
Author: pwendell
Date: Mon Jan 22 10:15:16 2018
New Revision: 24352

Log:
Apache Spark 2.3.1-SNAPSHOT-2018_01_22_02_01-57c320a docs

[This commit notification would consist of 1441 parts, which exceeds the limit of 50 ones, so it was shortened to the summary.]
svn commit: r24350 - in /dev/spark/2.4.0-SNAPSHOT-2018_01_22_00_01-ec22897-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s
Author: pwendell
Date: Mon Jan 22 08:16:21 2018
New Revision: 24350

Log:
Apache Spark 2.4.0-SNAPSHOT-2018_01_22_00_01-ec22897 docs

[This commit notification would consist of 1441 parts, which exceeds the limit of 50 ones, so it was shortened to the summary.]