[GitHub] spark issue #19840: [SPARK-22640][PYSPARK][YARN]switch python exec on execut...

2017-12-03 Thread yaooqinn
Github user yaooqinn commented on the issue:

https://github.com/apache/spark/pull/19840
  
Tested with spark-2.2.0-bin-hadoop2.7, running the numpy-dependent example `examples/src/main/python/mllib/correlations_example.py`.
### case 1 
|key|value|
|---|---|
|**PYSPARK_DRIVER_PYTHON**|~/anaconda3/envs/py3/bin/python|
|deploy-mode|**client**|
|--archives |~/anaconda3/envs/py3.zip|
|spark.yarn.appMasterEnv.**PYSPARK_PYTHON**|py3.zip/py3/bin/python|
|spark.executorEnv.PYSPARK_PYTHON| py3.zip/py3/bin/python |
|failure|Exception: Python in worker has different version 2.7 than that in driver 3.6, PySpark cannot run with different minor versions. Please check environment variables PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON are correctly set.|

### case 2

|key|value|
|---|---|
|**PYSPARK_DRIVER_PYTHON**|~/anaconda3/envs/py3/bin/python|
|deploy-mode|**cluster**|
|--archives |~/anaconda3/envs/py3.zip|
|spark.yarn.appMasterEnv.**PYSPARK_PYTHON**|py3.zip/py3/bin/python|
|spark.executorEnv.PYSPARK_PYTHON| py3.zip/py3/bin/python |
|failure|java.io.IOException: Cannot run program "/home/hadoop/anaconda3/envs/py3/bin/python": error=2, No such file or directory at org.apache.spark.**deploy.PythonRunner**$.main(PythonRunner.scala:91)|

### case 3 & 4

|key|value|
|---|---|
|**PYSPARK_DRIVER_PYTHON**|~/anaconda3/envs/py3/bin/python|
|deploy-mode|**cluster** (case 3) / **client** (case 4)|
|--archives |~/anaconda3/envs/py3.zip|
|spark.yarn.appMasterEnv.**PYSPARK_DRIVER_PYTHON**|py3.zip/py3/bin/python|
|spark.executorEnv.PYSPARK_DRIVER_PYTHON|py3.zip/py3/bin/python|
|failure|Exception: Python in worker has different version 2.7 than that in driver 3.6, PySpark cannot run with different minor versions. Please check environment variables PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON are correctly set.|

### case 5 & 6

|key|value|
|---|---|
|**PYSPARK_PYTHON**|~/anaconda3/envs/py3/bin/python|
|deploy-mode|**client** (case 5) / **cluster** (case 6)|
|--archives |~/anaconda3/envs/py3.zip|
|spark.yarn.appMasterEnv.**PYSPARK_DRIVER_PYTHON**|py3.zip/py3/bin/python|
|spark.executorEnv.PYSPARK_DRIVER_PYTHON|py3.zip/py3/bin/python|
|failure|java.io.IOException: Cannot run program "/home/hadoop/anaconda3/envs/py3/bin/python": error=2, No such file or directory [**executor side PythonRunner**]|

### case 7
|key|value|
|---|---|
|**PYSPARK_PYTHON**|~/anaconda3/envs/py3/bin/python|
|deploy-mode|**cluster**|
|--archives |~/anaconda3/envs/py3.zip|
|spark.yarn.appMasterEnv.**PYSPARK_PYTHON**|py3.zip/py3/bin/python|
|spark.executorEnv.PYSPARK_DRIVER_PYTHON|py3.zip/py3/bin/python|
|**success**|--|

### case 8
|key|value|
|---|---|
|**PYSPARK_PYTHON**|~/anaconda3/envs/py3/bin/python|
|deploy-mode|**cluster**|
|--archives |~/anaconda3/envs/py3.zip|
|spark.yarn.appMasterEnv.**PYSPARK_PYTHON**|py3.zip/py3/bin/python|
|spark.executorEnv.PYSPARK_DRIVER_PYTHON|py3.zip/py3/bin/python|
|failure|java.io.IOException: Cannot run program "/home/hadoop/anaconda3/envs/py3/bin/python": error=2, No such file or directory [**executor side PythonRunner**]|


### case 9
|key|value|
|---|---|
|not setting ~~PYSPARK_[DRIVER]_PYTHON~~| |
|deploy-mode|**client**|
|--archives |~/anaconda3/envs/py3.zip|
|spark.yarn.appMasterEnv.**PYSPARK_PYTHON**|py3.zip/py3/bin/python|
|spark.executorEnv.PYSPARK_DRIVER_PYTHON|py3.zip/py3/bin/python|
|failure|ImportError: No module named numpy|

### case 10
|key|value|
|---|---|
|not setting ~~PYSPARK_[DRIVER]_PYTHON~~| |
|deploy-mode|**cluster**|
|--archives |~/anaconda3/envs/py3.zip|
|spark.yarn.appMasterEnv.**PYSPARK_PYTHON**|py3.zip/py3/bin/python|
|spark.executorEnv.PYSPARK_DRIVER_PYTHON|py3.zip/py3/bin/python|
|**success**| -- |


### my humble opinions
1. spark.executorEnv.PYSPARK_* has no effect on the executor-side pythonExec, which is determined by the driver.
2. if PYSPARK_PYTHON is specified, then **spark.yarn.appMasterEnv.** should be suffixed with **PYSPARK_PYTHON**, not ~~PYSPARK_DRIVER_PYTHON~~
3. specifying PYSPARK_DRIVER_PYTHON fails all the cases; this may be caused by https://github.com/yaooqinn/spark/blob/8ff5663fe9a32eae79c8ee6bc310409170a8da64/python/pyspark/context.py#L191 only dealing with PYSPARK_PYTHON
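
For reference, a minimal sketch of the combination that worked above (cases 7 and 10), expressed as the equivalent `SparkConf` setting; the archive is still shipped with `--archives ~/anaconda3/envs/py3.zip`, and the paths are this test's values, not Spark defaults:

```scala
import org.apache.spark.SparkConf

// Sketch only: mirrors the successful cases 7 and 10 above.
val conf = new SparkConf()
  // Python used by the YARN Application Master (the driver, in cluster mode);
  // per opinion 1, this is also what ends up deciding the executor-side pythonExec.
  .set("spark.yarn.appMasterEnv.PYSPARK_PYTHON", "py3.zip/py3/bin/python")
```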







---




[GitHub] spark issue #19868: [SPARK-22676] Avoid iterating all partition paths when s...

2017-12-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19868
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84408/
Test PASSed.


---




[GitHub] spark issue #19868: [SPARK-22676] Avoid iterating all partition paths when s...

2017-12-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19868
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19868: [SPARK-22676] Avoid iterating all partition paths when s...

2017-12-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19868
  
**[Test build #84408 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84408/testReport)** for PR 19868 at commit [`5767660`](https://github.com/apache/spark/commit/57676609faed4512291979a8d639e3be1ec80578).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #17770: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...

2017-12-03 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/17770
  
@cloud-fan Sure. Seems there is no option to reopen it as it was merged 
before. Should I create another PR for it?


---




[GitHub] spark issue #17770: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...

2017-12-03 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/17770
  
Hi @viirya , since it's close to Spark 2.3, would you like to reopen this 
PR? Thanks!


---




[GitHub] spark pull request #19871: [SPARK-20728][SQL] Make OrcFileFormat configurabl...

2017-12-03 Thread jiangxb1987
Github user jiangxb1987 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19871#discussion_r154558750
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala ---
@@ -568,8 +570,13 @@ object DataSource extends Logging {
         "org.apache.spark.Logging")
 
   /** Given a provider name, look up the data source class definition. */
-  def lookupDataSource(provider: String): Class[_] = {
-    val provider1 = backwardCompatibilityMap.getOrElse(provider, provider)
+  def lookupDataSource(sparkSession: SparkSession, provider: String): Class[_] = {
+    var provider1 = backwardCompatibilityMap.getOrElse(provider, provider)
+    if (Seq("orc", "org.apache.spark.sql.hive.orc.OrcFileFormat").contains(provider1.toLowerCase) &&
+        sparkSession.conf.get(SQLConf.ORC_ENABLED)) {
--- End diff --

Shouldn't we get the conf from `sessionState`?
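
Something like this, perhaps (a sketch; the branch body is truncated in the diff above):

```scala
// Hypothetical: read the flag via sessionState instead of the runtime conf API.
if (sparkSession.sessionState.conf.getConf(SQLConf.ORC_ENABLED)) {
  // ... switch provider1 to the sql/core OrcFileFormat
}
```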


---




[GitHub] spark pull request #19871: [SPARK-20728][SQL] Make OrcFileFormat configurabl...

2017-12-03 Thread jiangxb1987
Github user jiangxb1987 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19871#discussion_r154558153
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -363,6 +363,11 @@ object SQLConf {
     .checkValues(Set("none", "uncompressed", "snappy", "zlib", "lzo"))
     .createWithDefault("snappy")
 
+  val ORC_ENABLED = buildConf("spark.sql.orc.enabled")
+    .doc("When true, use OrcFileFormat in sql/core module instead of the one in sql/hive module.")
--- End diff --

The description should include the major differences between these two ORC versions.


---




[GitHub] spark issue #19869: [SPARK-22677][SQL] cleanup whole stage codegen for hash ...

2017-12-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19869
  
**[Test build #84410 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84410/testReport)** for PR 19869 at commit [`9b8ae3d`](https://github.com/apache/spark/commit/9b8ae3d6635c5ed0323bf088e20d0de55dd1c098).


---




[GitHub] spark pull request #19869: [SPARK-22677][SQL] cleanup whole stage codegen fo...

2017-12-03 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19869#discussion_r154556638
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala ---
@@ -672,48 +668,56 @@ case class HashAggregateExec(
 
     def outputFromRowBasedMap: String = {
       s"""
-       while ($iterTermForFastHashMap.next()) {
-         $numOutput.add(1);
-         UnsafeRow $keyTerm = (UnsafeRow) $iterTermForFastHashMap.getKey();
-         UnsafeRow $bufferTerm = (UnsafeRow) $iterTermForFastHashMap.getValue();
-         $outputFunc($keyTerm, $bufferTerm);
-
-         if (shouldStop()) return;
-       }
-       $fastHashMapTerm.close();
-     """
+        |while ($iterTermForFastHashMap.next()) {
+        |  UnsafeRow $keyTerm = (UnsafeRow) $iterTermForFastHashMap.getKey();
+        |  UnsafeRow $bufferTerm = (UnsafeRow) $iterTermForFastHashMap.getValue();
+        |  $outputFunc($keyTerm, $bufferTerm);
+        |
+        |  if (shouldStop()) return;
+        |}
+        |$fastHashMapTerm.close();
+      """.stripMargin
     }
 
     // Iterate over the aggregate rows and convert them from ColumnarRow to UnsafeRow
     def outputFromVectorizedMap: String = {
       val row = ctx.freshName("fastHashMapRow")
--- End diff --

ah good catch!


---




[GitHub] spark issue #18995: [SPARK-21787][SPARK-22672][SQL] Support for pushing down...

2017-12-03 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/18995
  
maybe have a PR to move the tests first?


---




[GitHub] spark pull request #19871: [SPARK-20728][SQL] Make OrcFileFormat configurabl...

2017-12-03 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19871#discussion_r154556299
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/sources/DDLSourceLoadSuite.scala ---
@@ -54,11 +55,17 @@ class DDLSourceLoadSuite extends DataSourceTest with SharedSQLContext {
       .load().schema == StructType(Seq(StructField("stringType", StringType, nullable = false
   }
 
-  test("should fail to load ORC without Hive Support") {
-    val e = intercept[AnalysisException] {
-      spark.read.format("orc").load()
+  test("should fail to load ORC only if spark.sql.orc.enabled=false and without Hive Support") {
--- End diff --

I think this test is replaced by 
https://github.com/apache/spark/pull/19871/files#diff-5a2e7f03d14856c8769fd3ddea8742bdR2788


---




[GitHub] spark pull request #19871: [SPARK-20728][SQL] Make OrcFileFormat configurabl...

2017-12-03 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19871#discussion_r154556243
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala ---
@@ -568,8 +570,13 @@ object DataSource extends Logging {
         "org.apache.spark.Logging")
 
   /** Given a provider name, look up the data source class definition. */
-  def lookupDataSource(provider: String): Class[_] = {
-    val provider1 = backwardCompatibilityMap.getOrElse(provider, provider)
+  def lookupDataSource(sparkSession: SparkSession, provider: String): Class[_] = {
+    var provider1 = backwardCompatibilityMap.getOrElse(provider, provider)
+    if (Seq("orc", "org.apache.spark.sql.hive.orc.OrcFileFormat").contains(provider1.toLowerCase) &&
--- End diff --

"org.apache.spark.sql.hive.orc.OrcFileFormat" should still point to the old 
implementation
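
That is, only the short name would be switched (a sketch; the `Locale.ROOT` usage and the sql/core class name are assumptions about the eventual shape, not this PR's exact code):

```scala
// Only "orc" is redirected by the flag; the fully qualified hive class
// name keeps resolving to the old implementation.
if (provider1.toLowerCase(Locale.ROOT) == "orc" && conf.getConf(SQLConf.ORC_ENABLED)) {
  provider1 = "org.apache.spark.sql.execution.datasources.orc.OrcFileFormat"
}
```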


---




[GitHub] spark pull request #19871: [SPARK-20728][SQL] Make OrcFileFormat configurabl...

2017-12-03 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19871#discussion_r154556215
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala ---
@@ -568,8 +570,13 @@ object DataSource extends Logging {
         "org.apache.spark.Logging")
 
   /** Given a provider name, look up the data source class definition. */
-  def lookupDataSource(provider: String): Class[_] = {
-    val provider1 = backwardCompatibilityMap.getOrElse(provider, provider)
+  def lookupDataSource(sparkSession: SparkSession, provider: String): Class[_] = {
--- End diff --

instead of passing the `SparkSession`, I think we only need `SQLConf`
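
That is, roughly (a sketch of the suggested signature only):

```scala
// The caller passes the session's SQLConf, so DataSource stays independent
// of SparkSession.
def lookupDataSource(provider: String, conf: SQLConf): Class[_] = {
  var provider1 = backwardCompatibilityMap.getOrElse(provider, provider)
  // ... same resolution logic, reading the ORC flag from `conf`
}
```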


---




[GitHub] spark pull request #19871: [SPARK-20728][SQL] Make OrcFileFormat configurabl...

2017-12-03 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19871#discussion_r154556132
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -363,6 +363,11 @@ object SQLConf {
     .checkValues(Set("none", "uncompressed", "snappy", "zlib", "lzo"))
     .createWithDefault("snappy")
 
+  val ORC_ENABLED = buildConf("spark.sql.orc.enabled")
--- End diff --

how about `spark.sql.orc.useNewVersion`? Also let's make it an internal 
config and enable it by default.
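
i.e. roughly (a sketch of the suggestion, not code from this PR; the val name is hypothetical):

```scala
// Internal config, enabled by default, per the suggestion above.
val ORC_USE_NEW_VERSION = buildConf("spark.sql.orc.useNewVersion")
  .internal()
  .doc("When true, use the OrcFileFormat in sql/core instead of the one in sql/hive.")
  .booleanConf
  .createWithDefault(true)
```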


---




[GitHub] spark issue #19840: [SPARK-22640][PYSPARK][YARN]switch python exec on execut...

2017-12-03 Thread yaooqinn
Github user yaooqinn commented on the issue:

https://github.com/apache/spark/pull/19840
  
@vanzin PYSPARK_DRIVER_PYTHON won't work because [context.py#L191](https://github.com/yaooqinn/spark/blob/8ff5663fe9a32eae79c8ee6bc310409170a8da64/python/pyspark/context.py#L191) doesn't deal with it


---




[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...

2017-12-03 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19717#discussion_r154554648
  
--- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/steps/DriverServiceBootstrapStep.scala ---
@@ -0,0 +1,103 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.k8s.submit.steps
+
+import scala.collection.JavaConverters._
+
+import io.fabric8.kubernetes.api.model.ServiceBuilder
+
+import org.apache.spark.SparkConf
+import org.apache.spark.deploy.k8s.Config._
+import org.apache.spark.deploy.k8s.Constants._
+import org.apache.spark.deploy.k8s.submit.KubernetesDriverSpec
+import org.apache.spark.internal.Logging
+import org.apache.spark.util.Clock
+
+/**
+ * Allows the driver to be reachable by executor pods through a headless service. The service's
+ * ports should correspond to the ports that the executor will reach the pod at for RPC.
+ */
+private[spark] class DriverServiceBootstrapStep(
+    kubernetesResourceNamePrefix: String,
+    driverLabels: Map[String, String],
+    submissionSparkConf: SparkConf,
+    clock: Clock) extends DriverConfigurationStep with Logging {
+  import DriverServiceBootstrapStep._
+
+  override def configureDriver(driverSpec: KubernetesDriverSpec): KubernetesDriverSpec = {
+    require(submissionSparkConf.getOption(DRIVER_BIND_ADDRESS_KEY).isEmpty,
+      s"$DRIVER_BIND_ADDRESS_KEY is not supported in Kubernetes mode, as the driver's bind " +
+      "address is managed and set to the driver pod's IP address.")
+    require(submissionSparkConf.getOption(DRIVER_HOST_KEY).isEmpty,
+      s"$DRIVER_HOST_KEY is not supported in Kubernetes mode, as the driver's hostname will be " +
+      "managed via a Kubernetes service.")
+
+    val preferredServiceName = s"$kubernetesResourceNamePrefix$DRIVER_SVC_POSTFIX"
+    val resolvedServiceName = if (preferredServiceName.length <= MAX_SERVICE_NAME_LENGTH) {
+      preferredServiceName
+    } else {
+      val randomServiceId = clock.getTimeMillis()
+      val shorterServiceName = s"spark-$randomServiceId$DRIVER_SVC_POSTFIX"
+      logWarning(s"Driver's hostname would preferably be $preferredServiceName, but this is " +
+        s"too long (must be <= $MAX_SERVICE_NAME_LENGTH characters). Falling back to use " +
+        s"$shorterServiceName as the driver service's name.")
+      shorterServiceName
+    }
+
+    val driverPort = submissionSparkConf.getInt("spark.driver.port", DEFAULT_DRIVER_PORT)
+    val driverBlockManagerPort = submissionSparkConf.getInt(
+      org.apache.spark.internal.config.DRIVER_BLOCK_MANAGER_PORT.key, DEFAULT_BLOCKMANAGER_PORT)
+    val driverService = new ServiceBuilder()
+      .withNewMetadata()
+        .withName(resolvedServiceName)
+        .endMetadata()
+      .withNewSpec()
+        .withClusterIP("None")
+        .withSelector(driverLabels.asJava)
+        .addNewPort()
+          .withName(DRIVER_PORT_NAME)
+          .withPort(driverPort)
+          .withNewTargetPort(driverPort)
+          .endPort()
+        .addNewPort()
+          .withName(BLOCK_MANAGER_PORT_NAME)
+          .withPort(driverBlockManagerPort)
+          .withNewTargetPort(driverBlockManagerPort)
+          .endPort()
+        .endSpec()
+      .build()
+
+    val namespace = submissionSparkConf.get(KUBERNETES_NAMESPACE)
+    val driverHostname = s"${driverService.getMetadata.getName}.$namespace.svc.cluster.local"
+    val resolvedSparkConf = driverSpec.driverSparkConf.clone()
+      .set(org.apache.spark.internal.config.DRIVER_HOST_ADDRESS, driverHostname)
--- End diff --

`DRIVER_HOST_KEY`?


---




[GitHub] spark issue #19871: [SPARK-20728][SQL] Make OrcFileFormat configurable betwe...

2017-12-03 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/19871
  
This is a second PR after #19651 .


---




[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...

2017-12-03 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19717#discussion_r154549074
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
@@ -251,6 +252,7 @@ object SparkSubmit extends CommandLineUtils with Logging {
         YARN
       case m if m.startsWith("spark") => STANDALONE
       case m if m.startsWith("mesos") => MESOS
+      case m if m.startsWith("k8s") => KUBERNETES
       case m if m.startsWith("local") => LOCAL
       case _ =>
         printErrorAndExit("Master must either be yarn or start with spark, mesos, local")
--- End diff --

`Master must either be yarn or start with spark, mesos, k8s, local`


---




[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...

2017-12-03 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19717#discussion_r154549172
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
@@ -296,6 +298,12 @@ object SparkSubmit extends CommandLineUtils with Logging {
       case (STANDALONE, CLUSTER) if args.isR =>
         printErrorAndExit("Cluster deploy mode is currently not supported for R " +
           "applications on standalone clusters.")
+      case (KUBERNETES, CLIENT) =>
+        printErrorAndExit("Client mode is currently not supported for Kubernetes.")
+      case (KUBERNETES, _) if args.isPython =>
+        printErrorAndExit("Python applications are currently not supported for Kubernetes.")
+      case (KUBERNETES, _) if args.isR =>
+        printErrorAndExit("R applications are currently not supported for Kubernetes.")
--- End diff --

nit: This doesn't affect the result, but logically I think this ordering is better:

```scala
case (KUBERNETES, _) if args.isPython =>
  printErrorAndExit("Python applications are currently not supported for Kubernetes.")
case (KUBERNETES, _) if args.isR =>
  printErrorAndExit("R applications are currently not supported for Kubernetes.")
case (KUBERNETES, CLIENT) =>
  printErrorAndExit("Client mode is currently not supported for Kubernetes.")
```


---




[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...

2017-12-03 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19717#discussion_r154549733
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala ---
@@ -590,6 +604,11 @@ private[deploy] class SparkSubmitArguments(args: Seq[String], env: Map[String, S
         |  the node running the Application Master via the Secure
         |  Distributed Cache, for renewing the login tickets and the
         |  delegation tokens periodically.
+        |
+        | Kubernetes only:
+        |  --kubernetes-namespace NS   The namespace in the Kubernetes cluster within which the
+        |                              application must be launched. The namespace must already
+        |                              exist in the cluster. (Default: default).
--- End diff --

There are some other messages that need to be updated too, e.g.:

    | Spark standalone or Mesos with cluster deploy mode only:
    |  --supervise                 If given, restarts the driver on failure.
    |  --kill SUBMISSION_ID        If given, kills the driver specified.
    |  --status SUBMISSION_ID      If given, requests the status of the driver specified.

From the above, k8s supports killing submissions and requesting submission statuses.

    | Spark standalone and Mesos only:
    |  --total-executor-cores NUM  Total cores for all executors.

k8s also supports the `totalExecutorCores` option.




---




[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...

2017-12-03 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19717#discussion_r154553669
  
--- Diff: resource-managers/kubernetes/docker/src/main/dockerfiles/executor/Dockerfile ---
@@ -0,0 +1,31 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+FROM spark-base
+
+# If this docker file is being used in the context of building your images from a Spark distribution, the docker build
+# command should be invoked from the top level directory of the Spark distribution. E.g.:
+# docker build -t spark-executor:latest -f dockerfiles/executor/Dockerfile .
+
+COPY examples /opt/spark/examples
+
+CMD SPARK_CLASSPATH="${SPARK_HOME}/jars/*" && \
+    env | grep SPARK_JAVA_OPT_ | sed 's/[^=]*=\(.*\)/\1/g' > /tmp/java_opts.txt && \
+    readarray -t SPARK_EXECUTOR_JAVA_OPTS < /tmp/java_opts.txt && \
+    if ! [ -z ${SPARK_MOUNTED_CLASSPATH+x} ]; then SPARK_CLASSPATH="$SPARK_MOUNTED_CLASSPATH:$SPARK_CLASSPATH"; fi && \
+    if ! [ -z ${SPARK_EXECUTOR_EXTRA_CLASSPATH+x} ]; then SPARK_CLASSPATH="$SPARK_EXECUTOR_EXTRA_CLASSPATH:$SPARK_CLASSPATH"; fi && \
+    ${JAVA_HOME}/bin/java "${SPARK_EXECUTOR_JAVA_OPTS[@]}" -Dspark.executor.port=$SPARK_EXECUTOR_PORT -Xms$SPARK_EXECUTOR_MEMORY -Xmx$SPARK_EXECUTOR_MEMORY -cp "$SPARK_CLASSPATH" org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url $SPARK_DRIVER_URL --executor-id $SPARK_EXECUTOR_ID --cores $SPARK_EXECUTOR_CORES --app-id $SPARK_APPLICATION_ID --hostname $SPARK_EXECUTOR_POD_IP
--- End diff --

For `SPARK_EXECUTOR_PORT`, I don't see the corresponding `ENV_EXECUTOR_PORT` being set in `createExecutorPod`. Is it missing?

ditto for `SPARK_MOUNTED_CLASSPATH`.


---




[GitHub] spark issue #19871: [SPARK-20728][SQL] Make OrcFileFormat configurable betwe...

2017-12-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19871
  
**[Test build #84409 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84409/testReport)** for PR 19871 at commit [`37e240c`](https://github.com/apache/spark/commit/37e240cf12ea7463c2e0ea56501b812e12745869).


---




[GitHub] spark pull request #19871: [SPARK-20728][SQL] Make OrcFileFormat configurabl...

2017-12-03 Thread dongjoon-hyun
GitHub user dongjoon-hyun opened a pull request:

https://github.com/apache/spark/pull/19871

[SPARK-20728][SQL] Make OrcFileFormat configurable between sql/hive and 
sql/core

## What changes were proposed in this pull request?

This PR aims to provide a configuration to choose the default `OrcFileFormat` from either the legacy `sql/hive` module or the new `sql/core` module.

For example, this configuration will affect the following operations.
```scala
spark.read.orc(...)
```
```sql
CREATE TABLE t
USING ORC
...
```
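
As a usage sketch (the config name is the one this PR proposes; reviewers elsewhere in this digest suggest renaming it):

```scala
// Opt into the sql/core ORC reader for this session, then read ORC data.
spark.conf.set("spark.sql.orc.enabled", "true")
val df = spark.read.orc("/path/to/data")
```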

## How was this patch tested?

Pass the Jenkins with new test suites.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dongjoon-hyun/spark spark-sql-orc-enabled

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19871.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19871


commit 37e240cf12ea7463c2e0ea56501b812e12745869
Author: Dongjoon Hyun 
Date:   2017-12-03T23:33:55Z

[SPARK-20728][SQL] Make ORCFileFormat configurable between sql/hive and 
sql/core




---




[GitHub] spark issue #19868: [SPARK-22676] Avoid iterating all partition paths when s...

2017-12-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19868
  
**[Test build #84408 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84408/testReport)** for PR 19868 at commit [`5767660`](https://github.com/apache/spark/commit/57676609faed4512291979a8d639e3be1ec80578).


---




[GitHub] spark issue #19855: [SPARK-22662] [SQL] Failed to prune columns after rewrit...

2017-12-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19855
  
**[Test build #84407 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84407/testReport)** for PR 19855 at commit [`f9e9d10`](https://github.com/apache/spark/commit/f9e9d10d4b581379c2af966a35362b77f35a7b6f).


---




[GitHub] spark issue #19868: [SPARK-22676] Avoid iterating all partition paths when s...

2017-12-03 Thread jinxing64
Github user jinxing64 commented on the issue:

https://github.com/apache/spark/pull/19868
  
Jenkins, retest this please.


---




[GitHub] spark issue #19870: [SPARK-22665][SQL] Avoid repartitioning with empty list ...

2017-12-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19870
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84406/
Test PASSed.


---




[GitHub] spark issue #19870: [SPARK-22665][SQL] Avoid repartitioning with empty list ...

2017-12-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19870
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19870: [SPARK-22665][SQL] Avoid repartitioning with empty list ...

2017-12-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19870
  
**[Test build #84406 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84406/testReport)** for PR 19870 at commit [`fdf10b3`](https://github.com/apache/spark/commit/fdf10b36cbc746e57cb0b8c2f79646db9dd06964).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #18995: [SPARK-21787][SPARK-22672][SQL] Support for pushing down...

2017-12-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18995
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84405/
Test PASSed.


---




[GitHub] spark issue #18995: [SPARK-21787][SPARK-22672][SQL] Support for pushing down...

2017-12-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18995
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #18995: [SPARK-21787][SPARK-22672][SQL] Support for pushing down...

2017-12-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18995
  
**[Test build #84405 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84405/testReport)** for PR 18995 at commit [`9f6d75b`](https://github.com/apache/spark/commit/9f6d75bece86233baeb277c95b1974ac207c0d32).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19756: [SPARK-22527][SQL] Reuse coordinated exchanges if possib...

2017-12-03 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/19756
  
ping @cloud-fan @hvanhovell 


---




[GitHub] spark issue #19813: [SPARK-22600][SQL] Fix 64kb limit for deeply nested expr...

2017-12-03 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/19813
  
ping @cloud-fan @kiszk Do you have more review on this? Thanks.


---




[GitHub] spark issue #19870: [SPARK-22665][SQL] Avoid repartitioning with empty list ...

2017-12-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19870
  
**[Test build #84406 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84406/testReport)** for PR 19870 at commit [`fdf10b3`](https://github.com/apache/spark/commit/fdf10b36cbc746e57cb0b8c2f79646db9dd06964).


---




[GitHub] spark pull request #19870: [SPARK-22665][SQL] Avoid repartitioning with empt...

2017-12-03 Thread mgaido91
GitHub user mgaido91 opened a pull request:

https://github.com/apache/spark/pull/19870

[SPARK-22665][SQL] Avoid repartitioning with empty list of expressions

## What changes were proposed in this pull request?

Repartitioning by an empty set of expressions is currently possible, even though it is a case that is not handled properly. Indeed, `HashExpression` contains a check that prevents running it on an empty set of expressions, but this check is not performed when repartitioning. Thus, the PR adds a check to prevent this invalid situation.
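
A minimal sketch of the kind of guard meant here (the exact location and message are assumptions, not the patch's code):

```scala
// Hypothetical guard in Dataset.repartition(numPartitions: Int, partitionExprs: Column*):
require(partitionExprs.nonEmpty,
  "At least one partition-by expression must be specified when repartitioning by expressions.")
```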

## How was this patch tested?

added UT

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mgaido91/spark SPARK-22665

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19870.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19870


commit fdf10b36cbc746e57cb0b8c2f79646db9dd06964
Author: Marco Gaido 
Date:   2017-12-03T19:40:01Z

[SPARK-22665][SQL] Avoid repartitioning with empty list of expressions




---




[GitHub] spark issue #18995: [SPARK-21787][SQL] Support for pushing down filters for ...

2017-12-03 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/18995
  
Hi, @cloud-fan .
This is the first followup after #19651 .
Could you review this PR, too?


---




[GitHub] spark pull request #18995: [SPARK-21787][SQL] Support for pushing down filte...

2017-12-03 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/18995#discussion_r154537895
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcTest.scala ---
@@ -15,20 +15,21 @@
  * limitations under the License.
  */
 
-package org.apache.spark.sql.hive.orc
+package org.apache.spark.sql.execution.datasources.orc
 
 import java.io.File
 
 import scala.reflect.ClassTag
 import scala.reflect.runtime.universe.TypeTag
 
 import org.apache.spark.sql._
-import org.apache.spark.sql.hive.test.TestHiveSingleton
 import org.apache.spark.sql.test.SQLTestUtils
 
-private[sql] trait OrcTest extends SQLTestUtils with TestHiveSingleton {
+abstract class OrcTest extends QueryTest with SQLTestUtils {
   import testImplicits._
 
+  protected def format: String
--- End diff --

By using fully qualified class names, we explicitly test the new OrcFileFormat in `sql/core` and the old OrcFileFormat in `sql/hive`, without depending on the SQL ORC switching conf.
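
For example, a concrete suite could pin the implementation like this (the suite name and the `SharedSQLContext` mixin are hypothetical here):

```scala
// Each concrete suite overrides `format` with a fully qualified class name,
// so it always tests one implementation regardless of the switching conf.
class CoreOrcQuerySuite extends OrcTest with SharedSQLContext {
  override protected def format: String =
    "org.apache.spark.sql.execution.datasources.orc.OrcFileFormat"
}
```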


---




[GitHub] spark pull request #18995: [SPARK-21787][SQL] Support for pushing down filte...

2017-12-03 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/18995#discussion_r154537824
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcFilterSuite.scala ---
@@ -0,0 +1,362 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.orc
+
+import java.nio.charset.StandardCharsets
+import java.sql.{Date, Timestamp}
+
+import scala.collection.JavaConverters._
+
+import org.apache.orc.storage.ql.io.sarg.{PredicateLeaf, SearchArgument}
--- End diff --

Please note that this suite uses the Apache ORC `SearchArgument` and `PredicateLeaf` classes, so we cannot share the test code for this suite.


---




[GitHub] spark pull request #18995: [SPARK-21787][SQL] Support for pushing down filte...

2017-12-03 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/18995#discussion_r154537798
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFilters.scala ---
@@ -82,8 +82,7 @@ private[orc] object OrcFilters {
    * Both CharType and VarcharType are cleaned at AstBuilder.
    */
   private def isSearchableType(dataType: DataType) = dataType match {
-    // TODO: SPARK-21787 Support for pushing down filters for DateType in ORC
-    case BinaryType | DateType => false
--- End diff --

This PR supports DateType predicate pushdown (PPD) in the new OrcFileFormat.
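
Presumably the match then keeps only BinaryType unsearchable (a sketch; the diff above shows only the removed lines):

```scala
private def isSearchableType(dataType: DataType) = dataType match {
  case BinaryType => false  // DateType is now pushed down, so only binary stays excluded
  // ... remaining cases unchanged
}
```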


---




[GitHub] spark issue #18995: [SPARK-21787][SQL] Support for pushing down filters for ...

2017-12-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18995
  
**[Test build #84405 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84405/testReport)** for PR 18995 at commit [`9f6d75b`](https://github.com/apache/spark/commit/9f6d75bece86233baeb277c95b1974ac207c0d32).


---




[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...

2017-12-03 Thread liyinan926
Github user liyinan926 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19717#discussion_r154537698
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala ---
@@ -304,7 +313,9 @@ private[deploy] class SparkSubmitArguments(args: Seq[String], env: Map[String, S
   }
 
   private def validateStatusRequestArguments(): Unit = {
-    if (!master.startsWith("spark://") && !master.startsWith("mesos://")) {
+    if (!master.startsWith("spark://")
+      && !master.startsWith("mesos://")
+      && !master.startsWith("k8s://")) {
       SparkSubmit.printErrorAndExit(
         "Requesting submission statuses is only supported in standalone or Mesos mode!")
--- End diff --

Done.


---




[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...

2017-12-03 Thread liyinan926
Github user liyinan926 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19717#discussion_r154537691
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala ---
@@ -294,7 +301,9 @@ private[deploy] class SparkSubmitArguments(args: Seq[String], env: Map[String, S
   }
 
   private def validateKillArguments(): Unit = {
-    if (!master.startsWith("spark://") && !master.startsWith("mesos://")) {
+    if (!master.startsWith("spark://")
+      && !master.startsWith("mesos://")
+      && !master.startsWith("k8s://")) {
       SparkSubmit.printErrorAndExit(
         "Killing submissions is only supported in standalone or Mesos mode!")
--- End diff --

Done.


---




[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...

2017-12-03 Thread liyinan926
Github user liyinan926 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19717#discussion_r154537632
  
--- Diff: resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/submit/steps/DependencyResolutionStepSuite.scala ---
@@ -0,0 +1,81 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.k8s.submit.steps
+
+import java.io.File
+
+import scala.collection.JavaConverters._
+
+import io.fabric8.kubernetes.api.model.{ContainerBuilder, HasMetadata, PodBuilder}
+
+import org.apache.spark.{SparkConf, SparkFunSuite}
+import org.apache.spark.deploy.k8s.Constants._
+import org.apache.spark.deploy.k8s.submit.KubernetesDriverSpec
+
+private[spark] class DependencyResolutionStepSuite extends SparkFunSuite {
--- End diff --

Done.


---




[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...

2017-12-03 Thread liyinan926
Github user liyinan926 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19717#discussion_r154537561
  
--- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala ---
@@ -119,5 +139,60 @@ private[spark] object Config extends Logging {
         "must be a positive integer")
       .createWithDefault(10)
 
+  val WAIT_FOR_APP_COMPLETION =
+    ConfigBuilder("spark.kubernetes.submission.waitAppCompletion")
+      .doc("In cluster mode, whether to wait for the application to finish before exiting the " +
+        "launcher process.")
+      .booleanConf
+      .createWithDefault(true)
+
+  val REPORT_INTERVAL =
+    ConfigBuilder("spark.kubernetes.report.interval")
+      .doc("Interval between reports of the current app status in cluster mode.")
+      .timeConf(TimeUnit.MILLISECONDS)
+      .createWithDefaultString("1s")
+
+  private[spark] val JARS_DOWNLOAD_LOCATION =
+    ConfigBuilder("spark.kubernetes.mountDependencies.jarsDownloadDir")
+      .doc("Location to download jars to in the driver and executors. When using" +
+        " spark-submit, this directory must be empty and will be mounted as an empty directory" +
+        " volume on the driver and executor pod.")
+      .stringConf
+      .createWithDefault("/var/spark-data/spark-jars")
--- End diff --

This is somewhat of an implementation detail, and we expect that users normally won't need to set it or even know about it, so having a default makes sense.


---




[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-12-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16578
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-12-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16578
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84404/
Test PASSed.


---




[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-12-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16578
  
**[Test build #84404 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84404/testReport)** for PR 16578 at commit [`981e53b`](https://github.com/apache/spark/commit/981e53bf6b4b23f790ff0bbd457f54f308441076).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class AggregateFieldExtractionPushdownSuite extends SchemaPruningTest `
  * `class JoinFieldExtractionPushdownSuite extends SchemaPruningTest `


---




[GitHub] spark pull request #19867: [SPARK-22675] [SQL] Deduplicate PropagateTypes in...

2017-12-03 Thread gatorsmile
Github user gatorsmile closed the pull request at:

https://github.com/apache/spark/pull/19867


---




[GitHub] spark issue #19867: [SPARK-22675] [SQL] Deduplicate PropagateTypes in TypeCo...

2017-12-03 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/19867
  
After rethinking it, this is not the right fix, so I will close it first. There are multiple issues in the existing `PropagateTypes`.


---




[GitHub] spark pull request #19869: [SPARK-22677][SQL] cleanup whole stage codegen fo...

2017-12-03 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/19869#discussion_r154534333
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala ---
@@ -672,48 +668,56 @@ case class HashAggregateExec(
 
     def outputFromRowBasedMap: String = {
       s"""
-       while ($iterTermForFastHashMap.next()) {
-         $numOutput.add(1);
-         UnsafeRow $keyTerm = (UnsafeRow) $iterTermForFastHashMap.getKey();
-         UnsafeRow $bufferTerm = (UnsafeRow) $iterTermForFastHashMap.getValue();
-         $outputFunc($keyTerm, $bufferTerm);
-
-         if (shouldStop()) return;
-       }
-       $fastHashMapTerm.close();
-     """
+        |while ($iterTermForFastHashMap.next()) {
+        |  UnsafeRow $keyTerm = (UnsafeRow) $iterTermForFastHashMap.getKey();
+        |  UnsafeRow $bufferTerm = (UnsafeRow) $iterTermForFastHashMap.getValue();
+        |  $outputFunc($keyTerm, $bufferTerm);
+        |
+        |  if (shouldStop()) return;
+        |}
+        |$fastHashMapTerm.close();
+      """.stripMargin
     }
 
     // Iterate over the aggregate rows and convert them from ColumnarRow to UnsafeRow
     def outputFromVectorizedMap: String = {
       val row = ctx.freshName("fastHashMapRow")

nit: Would it be better to fix the indentation of these three lines, too?


---




[GitHub] spark issue #19869: [SPARK-22677][SQL] cleanup whole stage codegen for hash ...

2017-12-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19869
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84403/
Test PASSed.


---




[GitHub] spark issue #19869: [SPARK-22677][SQL] cleanup whole stage codegen for hash ...

2017-12-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19869
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19869: [SPARK-22677][SQL] cleanup whole stage codegen for hash ...

2017-12-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19869
  
**[Test build #84403 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84403/testReport)** for PR 19869 at commit [`174f4ec`](https://github.com/apache/spark/commit/174f4ec2ea000de01da6f494db366dc3ff58ccc4).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-12-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16578
  
**[Test build #84404 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84404/testReport)** for PR 16578 at commit [`981e53b`](https://github.com/apache/spark/commit/981e53bf6b4b23f790ff0bbd457f54f308441076).


---




[GitHub] spark issue #19863: [SPARK-22672][TEST][SQL] Move OrcTest to `sql/core`

2017-12-03 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/19863
  
@gatorsmile . Since the main PR is merged, I'll include this in the others. Thanks!


---




[GitHub] spark pull request #19863: [SPARK-22672][TEST][SQL] Move OrcTest to `sql/cor...

2017-12-03 Thread dongjoon-hyun
Github user dongjoon-hyun closed the pull request at:

https://github.com/apache/spark/pull/19863


---




[GitHub] spark issue #16735: [SPARK-19228][SQL] Introduce tryParseDate method to proc...

2017-12-03 Thread sergey-rubtsov
Github user sergey-rubtsov commented on the issue:

https://github.com/apache/spark/pull/16735
  
I will try to complete it this month.


---




[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

2017-12-03 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/18953
  
This is resolved via https://github.com/apache/spark/pull/19651 .


---




[GitHub] spark pull request #18953: [SPARK-20682][SQL] Update ORC data source based o...

2017-12-03 Thread dongjoon-hyun
Github user dongjoon-hyun closed the pull request at:

https://github.com/apache/spark/pull/18953


---




[GitHub] spark pull request #19571: [SPARK-15474][SQL] Write and read back non-emtpy ...

2017-12-03 Thread dongjoon-hyun
Github user dongjoon-hyun closed the pull request at:

https://github.com/apache/spark/pull/19571


---




[GitHub] spark issue #19571: [SPARK-15474][SQL] Write and read back non-emtpy schema ...

2017-12-03 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/19571
  
This is resolved in https://github.com/apache/spark/pull/19651 .


---




[GitHub] spark issue #19651: [SPARK-20682][SPARK-15474][SPARK-21791] Add new ORCFileF...

2017-12-03 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/19651
  
Thank you so much for making ORC move forward, @cloud-fan ! 
Also, thank you, @HyukjinKwon , @gatorsmile , @viirya , @kiszk .


---




[GitHub] spark issue #19869: [SPARK-22677][SQL] cleanup whole stage codegen for hash ...

2017-12-03 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/19869
  
cc @juliuszsompolski @kiszk @viirya @maropu 


---




[GitHub] spark pull request #19869: [SPARK-22677][SQL] cleanup whole stage codegen fo...

2017-12-03 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19869#discussion_r154528578
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala ---
@@ -882,45 +851,65 @@ case class HashAggregateExec(
  |${evaluateVariables(unsafeRowBufferEvals)}
  |// update unsafe row buffer
  |${updateUnsafeRowBuffer.mkString("\n").trim}
-   """.stripMargin
+   """.stripMargin
 }
 
+val updateRowInHashMap: String = {
+  if (isFastHashMapEnabled) {
+ctx.INPUT_ROW = fastRowBuffer
+val boundUpdateExpr = updateExpr.map(BindReferences.bindReference(_, inputAttr))
+val subExprs = ctx.subexpressionEliminationForWholeStageCodegen(boundUpdateExpr)
+val effectiveCodes = subExprs.codes.mkString("\n")
+val fastRowEvals = ctx.withSubExprEliminationExprs(subExprs.states) {
+  boundUpdateExpr.map(_.genCode(ctx))
+}
+val updateFastRow = fastRowEvals.zipWithIndex.map { case (ev, i) =>
+  val dt = updateExpr(i).dataType
+  ctx.updateColumn(
+    fastRowBuffer, dt, i, ev, updateExpr(i).nullable, isVectorizedHashMapEnabled)
+}
+
+// If the fast hash map is on, we first generate code to update the row in
+// the fast hash map if the previous lookup hit it. Otherwise, update the
+// row in the regular hash map.
+s"""
+   |if ($fastRowBuffer != null) {
+   |  // common sub-expressions
+   |  $effectiveCodes
+   |  // evaluate aggregate function
+   |  ${evaluateVariables(fastRowEvals)}
+   |  // update fast row
+   |  ${updateFastRow.mkString("\n").trim}
+   |} else {
+   |  $updateRowInRegularHashMap
+   |}
+   """.stripMargin
+  } else {
+updateRowInRegularHashMap
--- End diff --

Previously we always declared the `fastRowBuffer` variable and generated the `if (fastRowBuffer != null)` check. Now we don't generate them if the fast hash map is not enabled.


---




[GitHub] spark pull request #19869: [SPARK-22677][SQL] cleanup whole stage codegen fo...

2017-12-03 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19869#discussion_r154528498
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala ---
@@ -784,86 +774,65 @@ case class HashAggregateExec(
   ("true", "true", "", "")
 }
 
-// We first generate code to probe and update the fast hash map. If the probe is
-// successful the corresponding fast row buffer will hold the mutable row
-val findOrInsertFastHashMap: Option[String] = {
+val findOrInsertRegularHashMap: String =
+  s"""
+ |// generate grouping key
+ |${unsafeRowKeyCode.code.trim}
+ |${hashEval.code.trim}
+ |if ($checkFallbackForBytesToBytesMap) {
+ |  // try to get the buffer from hash map
+ |  $unsafeRowBuffer =
+ |$hashMapTerm.getAggregationBufferFromUnsafeRow($unsafeRowKeys, ${hashEval.value});
+ |}
+ |// Can't allocate buffer from the hash map. Spill the map and fallback to sort-based
+ |// aggregation after processing all input rows.
+ |if ($unsafeRowBuffer == null) {
+ |  if ($sorterTerm == null) {
+ |    $sorterTerm = $hashMapTerm.destructAndCreateExternalSorter();
+ |  } else {
+ |    $sorterTerm.merge($hashMapTerm.destructAndCreateExternalSorter());
+ |  }
+ |  $resetCounter
+ |  // the hash map had been spilled; it should have enough memory now,
+ |  // try to allocate buffer again.
+ |  $unsafeRowBuffer = $hashMapTerm.getAggregationBufferFromUnsafeRow(
+ |    $unsafeRowKeys, ${hashEval.value});
+ |  if ($unsafeRowBuffer == null) {
+ |    // failed to allocate the first page
+ |    throw new OutOfMemoryError("No enough memory for aggregation");
+ |  }
+ |}
+ |  }
+ |}
+   """.stripMargin
+
+val findOrInsertHashMap: String = {
   if (isFastHashMapEnabled) {
-Option(
-  s"""
- |
- |if ($checkFallbackForGeneratedHashMap) {
- |  ${fastRowKeys.map(_.code).mkString("\n")}
- |  if (${fastRowKeys.map("!" + _.isNull).mkString(" && ")}) {
- |$fastRowBuffer = $fastHashMapTerm.findOrInsert(
- |${fastRowKeys.map(_.value).mkString(", ")});
- |  }
- |}
- """.stripMargin)
+// If the fast hash map is on, we first generate code to probe and update the fast hash map.
+// If the probe is successful, the corresponding fast row buffer will hold the mutable row.
+s"""
+   |if ($checkFallbackForGeneratedHashMap) {
--- End diff --

moved from 
https://github.com/apache/spark/pull/19869/files#diff-2eb948516b5beaeb746aadac27fbd5b4L794


---




[GitHub] spark issue #19869: [SPARK-22677][SQL] cleanup whole stage codegen for hash ...

2017-12-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19869
  
**[Test build #84403 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84403/testReport)** for PR 19869 at commit [`174f4ec`](https://github.com/apache/spark/commit/174f4ec2ea000de01da6f494db366dc3ff58ccc4).


---




[GitHub] spark pull request #19869: [SPARK-22677][SQL] cleanup whole stage codegen fo...

2017-12-03 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19869#discussion_r154528455
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala ---
@@ -784,86 +774,65 @@ case class HashAggregateExec(
   ("true", "true", "", "")
 }
 
-// We first generate code to probe and update the fast hash map. If the probe is
-// successful the corresponding fast row buffer will hold the mutable row
-val findOrInsertFastHashMap: Option[String] = {
+val findOrInsertRegularHashMap: String =
--- End diff --

moved from 
https://github.com/apache/spark/pull/19869/files#diff-2eb948516b5beaeb746aadac27fbd5b4L833


---




[GitHub] spark pull request #19869: [SPARK-22677][SQL] cleanup whole stage codegen fo...

2017-12-03 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19869#discussion_r154528431
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala ---
@@ -672,48 +668,56 @@ case class HashAggregateExec(
 
 def outputFromRowBasedMap: String = {
   s"""
-   while ($iterTermForFastHashMap.next()) {
- $numOutput.add(1);
- UnsafeRow $keyTerm = (UnsafeRow) $iterTermForFastHashMap.getKey();
- UnsafeRow $bufferTerm = (UnsafeRow) $iterTermForFastHashMap.getValue();
- $outputFunc($keyTerm, $bufferTerm);
-
- if (shouldStop()) return;
-   }
-   $fastHashMapTerm.close();
- """
+ |while ($iterTermForFastHashMap.next()) {
+ |  UnsafeRow $keyTerm = (UnsafeRow) $iterTermForFastHashMap.getKey();
+ |  UnsafeRow $bufferTerm = (UnsafeRow) $iterTermForFastHashMap.getValue();
+ |  $outputFunc($keyTerm, $bufferTerm);
+ |
+ |  if (shouldStop()) return;
+ |}
+ |$fastHashMapTerm.close();
+   """.stripMargin
 }
 
 // Iterate over the aggregate rows and convert them from ColumnarRow to UnsafeRow
 def outputFromVectorizedMap: String = {
 val row = ctx.freshName("fastHashMapRow")
 ctx.currentVars = null
 ctx.INPUT_ROW = row
-val generateKeyRow = GenerateUnsafeProjection.createCode(ctx,
-  groupingKeySchema.toAttributes.zipWithIndex
+  val generateKeyRow = GenerateUnsafeProjection.createCode(ctx,
+groupingKeySchema.toAttributes.zipWithIndex
   .map { case (attr, i) => BoundReference(i, attr.dataType, attr.nullable) }
-)
-val generateBufferRow = GenerateUnsafeProjection.createCode(ctx,
-  bufferSchema.toAttributes.zipWithIndex
-  .map { case (attr, i) =>
-BoundReference(groupingKeySchema.length + i, attr.dataType, attr.nullable) })
-s"""
-   | while ($iterTermForFastHashMap.hasNext()) {
-   |   $numOutput.add(1);
-   |   org.apache.spark.sql.execution.vectorized.ColumnarRow $row =
-   | (org.apache.spark.sql.execution.vectorized.ColumnarRow)
-   | $iterTermForFastHashMap.next();
-   |   ${generateKeyRow.code}
-   |   ${generateBufferRow.code}
-   |   $outputFunc(${generateKeyRow.value}, ${generateBufferRow.value});
-   |
-   |   if (shouldStop()) return;
-   | }
-   |
-   | $fastHashMapTerm.close();
- """.stripMargin
+  )
+  val generateBufferRow = GenerateUnsafeProjection.createCode(ctx,
+bufferSchema.toAttributes.zipWithIndex.map { case (attr, i) =>
+  BoundReference(groupingKeySchema.length + i, attr.dataType, attr.nullable)
+})
+  val columnarRowCls = classOf[ColumnarRow].getName
+  s"""
+ |while ($iterTermForFastHashMap.hasNext()) {
+ |  $columnarRowCls $row = ($columnarRowCls) $iterTermForFastHashMap.next();
+ |  ${generateKeyRow.code}
+ |  ${generateBufferRow.code}
+ |  $outputFunc(${generateKeyRow.value}, ${generateBufferRow.value});
+ |
+ |  if (shouldStop()) return;
+ |}
+ |
+ |$fastHashMapTerm.close();
+   """.stripMargin
 }
 
+def outputFromRegularHashMap: String = {
+  s"""
+ |while ($iterTerm.next()) {
--- End diff --

moved from 
https://github.com/apache/spark/pull/19869/files#diff-2eb948516b5beaeb746aadac27fbd5b4L731


---




[GitHub] spark pull request #19869: [SPARK-22677][SQL] cleanup whole stage codegen fo...

2017-12-03 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19869#discussion_r154528409
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala ---
@@ -672,48 +668,56 @@ case class HashAggregateExec(
 
 def outputFromRowBasedMap: String = {
   s"""
-   while ($iterTermForFastHashMap.next()) {
- $numOutput.add(1);
- UnsafeRow $keyTerm = (UnsafeRow) $iterTermForFastHashMap.getKey();
- UnsafeRow $bufferTerm = (UnsafeRow) $iterTermForFastHashMap.getValue();
- $outputFunc($keyTerm, $bufferTerm);
-
- if (shouldStop()) return;
-   }
-   $fastHashMapTerm.close();
- """
+ |while ($iterTermForFastHashMap.next()) {
+ |  UnsafeRow $keyTerm = (UnsafeRow) $iterTermForFastHashMap.getKey();
+ |  UnsafeRow $bufferTerm = (UnsafeRow) $iterTermForFastHashMap.getValue();
+ |  $outputFunc($keyTerm, $bufferTerm);
+ |
+ |  if (shouldStop()) return;
+ |}
+ |$fastHashMapTerm.close();
+   """.stripMargin
 }
 
 // Iterate over the aggregate rows and convert them from ColumnarRow to UnsafeRow
 def outputFromVectorizedMap: String = {
 val row = ctx.freshName("fastHashMapRow")
 ctx.currentVars = null
 ctx.INPUT_ROW = row
-val generateKeyRow = GenerateUnsafeProjection.createCode(ctx,
-  groupingKeySchema.toAttributes.zipWithIndex
--- End diff --

The indentation was wrong previously.


---




[GitHub] spark pull request #19869: [SPARK-22677][SQL] cleanup whole stage codegen fo...

2017-12-03 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19869#discussion_r154528386
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala ---
@@ -444,6 +444,7 @@ case class HashAggregateExec(
 val funcName = ctx.freshName("doAggregateWithKeysOutput")
 val keyTerm = ctx.freshName("keyTerm")
 val bufferTerm = ctx.freshName("bufferTerm")
+val numOutput = metricTerm(ctx, "numOutputRows")
--- End diff --

Update the `numOutputRows` metric in the result function instead of doing it separately for both the fast hash map and the regular hash map.


---




[GitHub] spark pull request #19869: [SPARK-22677][SQL] cleanup whole stage codegen fo...

2017-12-03 Thread cloud-fan
GitHub user cloud-fan opened a pull request:

https://github.com/apache/spark/pull/19869

[SPARK-22677][SQL] cleanup whole stage codegen for hash aggregate

## What changes were proposed in this pull request?

The `HashAggregateExec` whole stage codegen path is a little messy and hard 
to understand, this code cleans it up a little bit, especially for the fast 
hash map part.

## How was this patch tested?

existing tests

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cloud-fan/spark hash-agg

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19869.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19869


commit 174f4ec2ea000de01da6f494db366dc3ff58ccc4
Author: Wenchen Fan 
Date:   2017-11-24T12:43:37Z

cleanup whole stage codegen for hash aggregate




---




[GitHub] spark pull request #19831: [SPARK-22626][SQL] It deals with wrong Hive's sta...

2017-12-03 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/19831


---




[GitHub] spark issue #19831: [SPARK-22626][SQL] It deals with wrong Hive's statistics...

2017-12-03 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/19831
  
thanks, merging to master!


---




[GitHub] spark issue #19865: [SPARK-22668][SQL] Exclude global variables from argumen...

2017-12-03 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/19865
  
I am neutral on how to fix this problem in the current master. What I have been saying from the beginning is that this problem exists not only in #19811, but also in the current master.

I am happy to agree that we fix the invalid cases in the current master.


---




[GitHub] spark pull request #19860: [SPARK-22669][SQL] Avoid unnecessary function cal...

2017-12-03 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/19860


---




[GitHub] spark issue #19865: [SPARK-22668][SQL] Exclude global variables from argumen...

2017-12-03 Thread mgaido91
Github user mgaido91 commented on the issue:

https://github.com/apache/spark/pull/19865
  
@kiszk I think that in the cases you hit, this might also have been done on purpose, relying on the way Java behaves, i.e. the local variable shadows the global one and the global is simply not used there. It may well have been designed like that. With this change, instead, you are forcing a behavior which is not the expected one.

I totally agree with @cloud-fan that we should fix the problem where the variables are created, if there is any problem at all. We can also decide that such a situation should be avoided for clarity, and therefore change the places where this pattern shows up; I am neutral on that. But I disagree with creating a counterintuitive situation.


---




[GitHub] spark issue #19860: [SPARK-22669][SQL] Avoid unnecessary function calls in c...

2017-12-03 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/19860
  
LGTM, merging to master!


---




[GitHub] spark issue #19865: [SPARK-22668][SQL] Exclude global variables from argumen...

2017-12-03 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/19865
  
@cloud-fan I see. As I pointed out, there are several places that set a global variable in `ExprCode.value`, which is then passed to successor operations. Should we make the lifetime of such a global variable local to an operation?


---




[GitHub] spark issue #19865: [SPARK-22668][SQL] Exclude global variables from argumen...

2017-12-03 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/19865
  
I think for this case, shouldn't we fix it so that we don't pass a global variable into `splitExpressions`?


---




[GitHub] spark issue #19865: [SPARK-22668][SQL] Exclude global variables from argumen...

2017-12-03 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/19865
  
@mgaido91 @viirya  As you can see, we hit an assertion failure. Here is evidence that we do pass a global variable into the arguments of a split function.
In practice, we never guaranteed that a global variable is not passed.

A [value](github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala) was declared as a global variable. Then it is passed as `ExprCode.value`. Finally, `value` is passed as an argument in `CodegenContext.splitExpressions`. Fortunately, the `expressions` did not update the global variable, so it happened to work correctly.
In general, it is hard to ensure that the `expressions` perform no update. Of course, we do not want to rely on regular expressions to detect it.

As you said, how do we ensure that we do not pass a global variable?



```
**
File "/home/jenkins/workspace/SparkPullRequestBuilder/python/pyspark/ml/feature.py", line 1205, in __main__.MinHashLSH
Failed example:
...
Caused by: java.lang.AssertionError: assertion failed: smj_value16 in arguments should not be declared as a global variable
at scala.Predef$.assert(Predef.scala:170)
at org.apache.spark.sql.catalyst.expressions.codegen.CodegenContext.org$apache$spark$sql$catalyst$expressions$codegen$CodegenContext$$isDeclaredMutableState(CodeGenerator.scala:226)
at org.apache.spark.sql.catalyst.expressions.codegen.CodegenContext$$anonfun$9.apply(CodeGenerator.scala:854)
at org.apache.spark.sql.catalyst.expressions.codegen.CodegenContext$$anonfun$9.apply(CodeGenerator.scala:854)
at scala.collection.TraversableLike$$anonfun$filterImpl$1.apply(TraversableLike.scala:248)
at scala.collection.immutable.List.foreach(List.scala:381)
at scala.collection.TraversableLike$class.filterImpl(TraversableLike.scala:247)
at scala.collection.TraversableLike$class.filter(TraversableLike.scala:259)
at scala.collection.AbstractTraversable.filter(Traversable.scala:104)
at org.apache.spark.sql.catalyst.expressions.codegen.CodegenContext.splitExpressions(CodeGenerator.scala:853)
at org.apache.spark.sql.catalyst.expressions.HashExpression.genHashForStruct(hash.scala:395)
at org.apache.spark.sql.catalyst.expressions.HashExpression.computeHashWithTailRec(hash.scala:421)
at org.apache.spark.sql.catalyst.expressions.HashExpression.computeHash(hash.scala:429)
at org.apache.spark.sql.catalyst.expressions.HashExpression$$anonfun$1.apply(hash.scala:276)
at org.apache.spark.sql.catalyst.expressions.HashExpression$$anonfun$1.apply(hash.scala:273)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.AbstractTraversable.map(Traversable.scala:104)
at org.apache.spark.sql.catalyst.expressions.HashExpression.doGenCode(hash.scala:273)
at org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$genCode$2.apply(Expression.scala:107)
at org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$genCode$2.apply(Expression.scala:104)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.catalyst.expressions.Expression.genCode(Expression.scala:104)
at org.apache.spark.sql.execution.aggregate.HashAggregateExec.doConsumeWithKeys(HashAggregateExec.scala:772)
at org.apache.spark.sql.execution.aggregate.HashAggregateExec.doConsume(HashAggregateExec.scala:173)
at org.apache.spark.sql.execution.CodegenSupport$class.consume(WholeStageCodegenExec.scala:162)
at org.apache.spark.sql.execution.ProjectExec.consume(basicPhysicalOperators.scala:35)
at org.apache.spark.sql.execution.ProjectExec.doConsume(basicPhysicalOperators.scala:65)
at org.apache.spark.sql.execution.CodegenSupport$class.consume(WholeStageCodegenExec.scala:162)
at org.apache.spark.sql.execution.joins.SortMergeJoinExec.consume(SortMergeJoinExec.scala:36)
at org.apache.spark.sql.execution.joins.SortMergeJoinExec.doProduce(SortMergeJoinExec.scala:626)
at ...
```

---

[GitHub] spark issue #19865: [SPARK-22668][SQL] Exclude global variables from argumen...

2017-12-03 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/19865
  
Making a variable global needs to be done manually (call `ctx.addMutableState`), and splitting the code into methods also needs to be done manually (call `ctx.splitExpressions`). If we hit a problem here, it's probably due to misuse. Can we just check the codebase, find these invalid cases and fix them? We could also add documentation to `ctx.splitExpressions` saying that global variables should not be passed as parameters.
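
A sketch of what that documentation note might look like (the wording is illustrative, and the signature below is simplified, not the exact one in `CodegenContext`):

```
/**
 * Splits generated expression code into smaller methods.
 *
 * Note: `arguments` must not contain variables that are declared as global
 * mutable state. A parameter with the same name would shadow the global
 * inside the split method, so any update to it would be silently lost.
 */
def splitExpressions(
    expressions: Seq[String],
    funcName: String,
    arguments: Seq[(String, String)]): String = ???
```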


---




[GitHub] spark pull request #19651: [SPARK-20682][SPARK-15474][SPARK-21791] Add new O...

2017-12-03 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/19651


---




[GitHub] spark issue #19651: [SPARK-20682][SPARK-15474][SPARK-21791] Add new ORCFileF...

2017-12-03 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/19651
  
thanks, merging to master!

followups:
1. add a config to use new orc by default
2. move orc test to sql core
3. columnar orc reader


---




[GitHub] spark issue #19865: [SPARK-22668][SQL] Exclude global variables from argumen...

2017-12-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19865
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #19865: [SPARK-22668][SQL] Exclude global variables from argumen...

2017-12-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19865
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84402/
Test FAILed.


---




[GitHub] spark issue #19865: [SPARK-22668][SQL] Exclude global variables from argumen...

2017-12-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19865
  
**[Test build #84402 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84402/testReport)** for PR 19865 at commit [`f63cd3a`](https://github.com/apache/spark/commit/f63cd3a37f828f86aa2a991bd611813a155645d5).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #19824: [SPARK][STREAMING] Invoke onBatchCompletion() onl...

2017-12-03 Thread victor-wong
Github user victor-wong closed the pull request at:

https://github.com/apache/spark/pull/19824


---




[GitHub] spark issue #19824: [SPARK][STREAMING] Invoke onBatchCompletion() only when ...

2017-12-03 Thread victor-wong
Github user victor-wong commented on the issue:

https://github.com/apache/spark/pull/19824
  
@CodingCat 
Thank you:)


---




[GitHub] spark issue #19865: [SPARK-22668][SQL] Exclude global variables from argumen...

2017-12-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19865
  
**[Test build #84402 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84402/testReport)** for PR 19865 at commit [`f63cd3a`](https://github.com/apache/spark/commit/f63cd3a37f828f86aa2a991bd611813a155645d5).


---




[GitHub] spark issue #19865: [SPARK-22668][SQL] Exclude global variables from argumen...

2017-12-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19865
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84401/
Test FAILed.


---




[GitHub] spark issue #19865: [SPARK-22668][SQL] Exclude global variables from argumen...

2017-12-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19865
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #19865: [SPARK-22668][SQL] Exclude global variables from argumen...

2017-12-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19865
  
**[Test build #84401 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84401/testReport)** for PR 19865 at commit [`ad86986`](https://github.com/apache/spark/commit/ad869866f4aa28eceaf30a1fa09f3d5930a726ad).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19867: [SPARK-22675] [SQL] Deduplicate PropagateTypes in TypeCo...

2017-12-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19867
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84400/
Test PASSed.


---




[GitHub] spark issue #19867: [SPARK-22675] [SQL] Deduplicate PropagateTypes in TypeCo...

2017-12-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19867
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19867: [SPARK-22675] [SQL] Deduplicate PropagateTypes in TypeCo...

2017-12-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19867
  
**[Test build #84400 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84400/testReport)** for PR 19867 at commit [`fc3a24e`](https://github.com/apache/spark/commit/fc3a24e039c905fdd263d2ece2088ce152887fc7).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19865: [SPARK-22668][SQL] Exclude global variables from argumen...

2017-12-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19865
  
**[Test build #84401 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84401/testReport)** for PR 19865 at commit [`ad86986`](https://github.com/apache/spark/commit/ad869866f4aa28eceaf30a1fa09f3d5930a726ad).


---




[GitHub] spark issue #19860: [SPARK-22669][SQL] Avoid unnecessary function calls in c...

2017-12-03 Thread mgaido91
Github user mgaido91 commented on the issue:

https://github.com/apache/spark/pull/19860
  
@kiszk @viirya I ran the following performance test:

```
val a = (1 to 10).map(x => 1).toDS
val filtered = a.where($"value".isin((1 to 10): _*))
(1 to 20).map(x => time(filtered.count)).sum / 20 // time is a simple helper that measures execution time
```

Before the PR, the average execution time over the 20 trials is 3.428 s, while after the PR it is 3.121 s (on OSX, 2.8 GHz Intel Core i7). This means about a 10% improvement of the overall performance in this case.
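
For reference, a minimal sketch of the `time` helper assumed above (its exact definition is not shown in the thread):

```
def time[T](f: => T): Double = {
  val start = System.nanoTime()
  f                                  // evaluate the by-name argument once
  (System.nanoTime() - start) / 1e9  // elapsed wall-clock seconds
}
```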


---




[GitHub] spark issue #19868: [SPARK-22676] Avoid iterating all partition paths when s...

2017-12-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19868
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84399/
Test FAILed.


---



