[GitHub] spark issue #20779: [SPARK-23598][SQL] Make methods in BufferedRowIterator p...

2018-03-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20779
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88212/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20779: [SPARK-23598][SQL] Make methods in BufferedRowIterator p...

2018-03-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20779
  
**[Test build #88212 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88212/testReport)**
 for PR 20779 at commit 
[`8fb5df0`](https://github.com/apache/spark/commit/8fb5df0f76a6773594bb7e695036f3fdf0063c6a).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #20799: [SPARK-23635][YARN] AM env variable should not ov...

2018-03-13 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/20799#discussion_r174294428
  
--- Diff: 
resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ExecutorRunnable.scala
 ---
@@ -247,6 +241,18 @@ private[yarn] class ExecutorRunnable(
 
 System.getenv().asScala.filterKeys(_.startsWith("SPARK"))
   .foreach { case (k, v) => env(k) = v }
+
+sparkConf.getExecutorEnv.foreach { case (key, value) =>
+  if (key == Environment.CLASSPATH.name()) {
--- End diff --

Ah, I see, sorry, I missed that. So I guess we are now just stomping on 
whatever is in the system env path, whereas before we were stomping on the 
specified executorEnv with the system env.  


---




[GitHub] spark issue #20816: SPARK-21479 Outer join filter pushdown in null supplying...

2018-03-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20816
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1494/
Test PASSed.


---




[GitHub] spark issue #20816: SPARK-21479 Outer join filter pushdown in null supplying...

2018-03-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20816
  
Merged build finished. Test PASSed.


---




[GitHub] spark pull request #20796: [SPARK-23649][SQL] Prevent crashes on schema infe...

2018-03-13 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20796#discussion_r174293523
  
--- Diff: sql/core/src/test/resources/test-data/utf8xFF.csv ---
@@ -0,0 +1,3 @@
+channel,code
+United,123
+ABGUN�,456
--- End diff --

how did you create this file?


---




[GitHub] spark issue #20816: SPARK-21479 Outer join filter pushdown in null supplying...

2018-03-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20816
  
**[Test build #88215 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88215/testReport)**
 for PR 20816 at commit 
[`ac17976`](https://github.com/apache/spark/commit/ac17976fd2b024039ee6cd848b864d2d052ec573).


---




[GitHub] spark pull request #20816: SPARK-21479 Outer join filter pushdown in null su...

2018-03-13 Thread maryannxue
GitHub user maryannxue opened a pull request:

https://github.com/apache/spark/pull/20816

SPARK-21479 Outer join filter pushdown in null supplying table when 
condition is on one of the joined columns

## What changes were proposed in this pull request?

Added `TransitPredicateInOuterJoin` optimization rule that transits 
constraints from the preserved side of an outer join to the null-supplying 
side. The constraints of the join operator will remain unchanged.
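
To illustrate the idea on plain Scala collections, here is a minimal sketch. It is not the `TransitPredicateInOuterJoin` rule itself; `leftOuterJoin`, the sample data, and the predicate are made up for illustration. For a left outer join on an equality key, a filter on the preserved side's key can also be applied to the null-supplying side without changing the result:

```scala
object OuterJoinPushdownSketch {
  // Left outer join on an Int key. The right side is null-supplying:
  // an unmatched left row is paired with None.
  def leftOuterJoin(left: Seq[Int], right: Seq[Int]): Seq[(Int, Option[Int])] =
    left.flatMap { l =>
      val matches = right.filter(_ == l)
      if (matches.isEmpty) Seq((l, Option.empty[Int]))
      else matches.map(r => (l, Option(r)))
    }

  val left: Seq[Int] = Seq(1, 4, 7, 9)
  val right: Seq[Int] = Seq(4, 7, 7, 8)
  val pred: Int => Boolean = _ > 5

  // Predicate applied to the preserved (left) side only.
  val baseline: Seq[(Int, Option[Int])] =
    leftOuterJoin(left.filter(pred), right)

  // Same predicate also applied to the null-supplying (right) side.
  // This is safe because matching rows must have equal keys.
  val pushedDown: Seq[(Int, Option[Int])] =
    leftOuterJoin(left.filter(pred), right.filter(pred))
}
```

Both variants produce the same rows; the second one simply scans fewer right-side rows before the join.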

## How was this patch tested?

Added 3 tests in `InferFiltersFromConstraintsSuite`.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/maryannxue/spark spark-21479

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20816.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20816


commit ac17976fd2b024039ee6cd848b864d2d052ec573
Author: maryannxue 
Date:   2018-03-13T21:05:37Z

SPARK-21479 Outer join filter pushdown in null supplying table when 
condition is on one of the joined columns




---




[GitHub] spark pull request #20806: [SPARK-23661][SQL] Implement treeAggregate on Dat...

2018-03-13 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20806#discussion_r174277864
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -1658,6 +1659,43 @@ class Dataset[T] private[sql](
   def groupByKey[K](func: MapFunction[T, K], encoder: Encoder[K]): KeyValueGroupedDataset[K, T] =
     groupByKey(func.call(_))(encoder)
 
+
+  /**
+   * Aggregates the elements of this Dataset in a multi-level tree pattern.
+   *
+   * @param depth suggested depth of the tree (default: 2)
+   */
+  private[spark] def treeAggregate[U : Encoder : ClassTag](zeroValue: U)(
+      seqOp: (U, T) => U,
+      combOp: (U, U) => U,
+      depth: Int = 2): U = {
+    require(depth >= 1, s"Depth must be greater than or equal to 1 but got $depth.")
+    val sparkContext = sparkSession.sparkContext
+    val copiedZeroValue = Utils.clone(zeroValue, sparkContext.env.closureSerializer.newInstance())
+    if (rdd.partitions.length == 0) {
+      copiedZeroValue
+    } else {
+      val aggregatePartition =
+        (it: Iterator[T]) => it.aggregate(zeroValue)(seqOp, combOp)
+      var partiallyAggregated: Dataset[U] = mapPartitions(it => Iterator(aggregatePartition(it)))
--- End diff --

Why can't we call `rdd.treeAggregate` directly?


---




[GitHub] spark pull request #20806: [SPARK-23661][SQL] Implement treeAggregate on Dat...

2018-03-13 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20806#discussion_r174276969
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -1658,6 +1659,43 @@ class Dataset[T] private[sql](
   def groupByKey[K](func: MapFunction[T, K], encoder: Encoder[K]): KeyValueGroupedDataset[K, T] =
     groupByKey(func.call(_))(encoder)
 
+
+  /**
+   * Aggregates the elements of this Dataset in a multi-level tree pattern.
+   *
+   * @param depth suggested depth of the tree (default: 2)
+   */
+  private[spark] def treeAggregate[U : Encoder : ClassTag](zeroValue: U)(
+      seqOp: (U, T) => U,
+      combOp: (U, U) => U,
+      depth: Int = 2): U = {
+    require(depth >= 1, s"Depth must be greater than or equal to 1 but got $depth.")
--- End diff --

why would depth 1 make sense?
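
For context on the depth question, here is a rough sketch of tree aggregation over plain Scala collections. It is only an illustration under my own assumptions, not the code in this PR or in `RDD.treeAggregate`: partitions are modelled as a `Seq[Seq[T]]`, and the fan-in `scale` is one possible choice. With `depth = 1` the loop body never runs, so all partial results are combined in a single step, i.e. a flat aggregate.

```scala
object TreeAggregateSketch {
  /** Folds each "partition", then combines partial results level by level. */
  def treeAggregate[T, U](partitions: Seq[Seq[T]], zero: U, depth: Int = 2)(
      seqOp: (U, T) => U,
      combOp: (U, U) => U): U = {
    require(depth >= 1, s"Depth must be greater than or equal to 1 but got $depth.")
    if (partitions.isEmpty) {
      zero
    } else {
      // Level 0: every partition is folded independently.
      // Assumes `zero` is immutable, since it is shared by all folds.
      var partials: Seq[U] = partitions.map(_.foldLeft(zero)(seqOp))
      // Fan-in chosen so that about `depth` levels reduce everything to one value.
      val scale =
        math.max(math.ceil(math.pow(partials.size.toDouble, 1.0 / depth)).toInt, 2)
      // Intermediate levels: combine groups of `scale` partial results.
      var level = 1
      while (level < depth && partials.size > scale) {
        partials = partials.grouped(scale).map(_.reduce(combOp)).toVector
        level += 1
      }
      // Final level: combine whatever is left in one step.
      partials.reduce(combOp)
    }
  }
}
```

The result does not depend on `depth` as long as `combOp` is associative; only the shape of the combine steps changes.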


---




[GitHub] spark issue #20686: [SPARK-22915][MLlib] Streaming tests for spark.ml.featur...

2018-03-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20686
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88214/
Test PASSed.


---




[GitHub] spark issue #20686: [SPARK-22915][MLlib] Streaming tests for spark.ml.featur...

2018-03-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20686
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20686: [SPARK-22915][MLlib] Streaming tests for spark.ml.featur...

2018-03-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20686
  
**[Test build #88214 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88214/testReport)**
 for PR 20686 at commit 
[`bf713b5`](https://github.com/apache/spark/commit/bf713b5366e1b42bd5e52f0366ca24944f509721).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #20799: [SPARK-23635][YARN] AM env variable should not ov...

2018-03-13 Thread mridulm
Github user mridulm commented on a diff in the pull request:

https://github.com/apache/spark/pull/20799#discussion_r174273641
  
--- Diff: 
resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ExecutorRunnable.scala
 ---
@@ -247,6 +241,18 @@ private[yarn] class ExecutorRunnable(
 
 System.getenv().asScala.filterKeys(_.startsWith("SPARK"))
   .foreach { case (k, v) => env(k) = v }
+
+    sparkConf.getExecutorEnv.foreach { case (key, value) =>
+      if (key == Environment.CLASSPATH.name()) {
+        // If the key of env variable is CLASSPATH, we assume it is a path and append it.
+        // This is kept for backward compatibility and consistency with hadoop
+        YarnSparkHadoopUtil.addPathToEnvironment(env, key, value)
+      } else {
+        // For other env variables, simply overwrite the value.
+        env(key) = value
+      }
+    }
--- End diff --

@jerryshao I think there is a potential issue with this change: it allows 
users to (incorrectly) specify SPARK_LOG_URL_STDERR and SPARK_LOG_URL_STDOUT, 
which should be generated by the driver. See the section "// Add log urls" 
above this code snippet.

Note that this is an existing bug in the code: if the same variables had 
been present in the driver env, they would have overridden the generated 
values.
It would be good to fix this issue as part of this change as well.

The solution would be to move the '// Add log urls' block below this 
current block.
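
A rough sketch of the ordering I have in mind, using plain maps rather than the real `ExecutorRunnable` code. `buildEnv` and `appendPath` are hypothetical names; `appendPath` only stands in for `YarnSparkHadoopUtil.addPathToEnvironment`, and the URL values are placeholders:

```scala
import scala.collection.mutable

object EnvOrderingSketch {
  // Hypothetical stand-in for YarnSparkHadoopUtil.addPathToEnvironment:
  // appends to an existing path-like value instead of replacing it.
  def appendPath(env: mutable.Map[String, String], key: String, value: String): Unit = {
    val sep = java.io.File.pathSeparator
    env(key) = env.get(key).map(_ + sep + value).getOrElse(value)
  }

  def buildEnv(
      base: Map[String, String],
      executorEnv: Seq[(String, String)],
      generatedLogUrls: Map[String, String]): Map[String, String] = {
    val env = mutable.HashMap[String, String]()
    env ++= base
    // User-specified executor env: CLASSPATH is appended, everything else overwrites.
    executorEnv.foreach { case (key, value) =>
      if (key == "CLASSPATH") appendPath(env, key, value) else env(key) = value
    }
    // Generated log URLs are applied last, so user-supplied values cannot shadow them.
    generatedLogUrls.foreach { case (k, v) => env(k) = v }
    env.toMap
  }
}
```

With this ordering, a user-supplied SPARK_LOG_URL_STDOUT is simply overwritten by the generated one.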


---




[GitHub] spark pull request #20799: [SPARK-23635][YARN] AM env variable should not ov...

2018-03-13 Thread mridulm
Github user mridulm commented on a diff in the pull request:

https://github.com/apache/spark/pull/20799#discussion_r174272808
  
--- Diff: 
resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ExecutorRunnable.scala
 ---
@@ -247,6 +241,18 @@ private[yarn] class ExecutorRunnable(
 
 System.getenv().asScala.filterKeys(_.startsWith("SPARK"))
   .foreach { case (k, v) => env(k) = v }
+
+sparkConf.getExecutorEnv.foreach { case (key, value) =>
+  if (key == Environment.CLASSPATH.name()) {
--- End diff --

@tgravescs In the existing code, in `prepareEnvironment`, "env" is populated 
only with `Environment.CLASSPATH`. Hence LD_LIBRARY_PATH does not apply to this 
specific change.


---




[GitHub] spark issue #20686: [SPARK-22915][MLlib] Streaming tests for spark.ml.featur...

2018-03-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20686
  
**[Test build #88214 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88214/testReport)**
 for PR 20686 at commit 
[`bf713b5`](https://github.com/apache/spark/commit/bf713b5366e1b42bd5e52f0366ca24944f509721).


---




[GitHub] spark pull request #20805: [SPARK-21479][SQL] Outer join filter pushdown in ...

2018-03-13 Thread maryannxue
Github user maryannxue closed the pull request at:

https://github.com/apache/spark/pull/20805


---




[GitHub] spark pull request #20686: [SPARK-22915][MLlib] Streaming tests for spark.ml...

2018-03-13 Thread attilapiros
Github user attilapiros commented on a diff in the pull request:

https://github.com/apache/spark/pull/20686#discussion_r174247908
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/feature/VectorSlicerSuite.scala ---
@@ -84,26 +84,29 @@ class VectorSlicerSuite extends SparkFunSuite with MLlibTestSparkContext with De
 
     val vectorSlicer = new VectorSlicer().setInputCol("features").setOutputCol("result")
 
-    def validateResults(df: DataFrame): Unit = {
-      df.select("result", "expected").collect().foreach { case Row(vec1: Vector, vec2: Vector) =>
+    def validateResults(rows: Seq[Row]): Unit = {
+      rows.foreach { case Row(vec1: Vector, vec2: Vector) =>
         assert(vec1 === vec2)
       }
-      val resultMetadata = AttributeGroup.fromStructField(df.schema("result"))
-      val expectedMetadata = AttributeGroup.fromStructField(df.schema("expected"))
+      val resultMetadata = AttributeGroup.fromStructField(rows.head.schema("result"))
+      val expectedMetadata = AttributeGroup.fromStructField(rows.head.schema("expected"))
       assert(resultMetadata.numAttributes === expectedMetadata.numAttributes)
       resultMetadata.attributes.get.zip(expectedMetadata.attributes.get).foreach { case (a, b) =>
         assert(a === b)
       }
     }
 
     vectorSlicer.setIndices(Array(1, 4)).setNames(Array.empty)
-    validateResults(vectorSlicer.transform(df))
+    testTransformerByGlobalCheckFunc[(Vector, Vector)](df, vectorSlicer, "result", "expected")(
--- End diff --

The reason I have chosen the global check function is the checks for the 
attributes:

```
  val resultMetadata = AttributeGroup.fromStructField(rows.head.schema("result"))
  val expectedMetadata = AttributeGroup.fromStructField(rows.head.schema("expected"))
  assert(resultMetadata.numAttributes === expectedMetadata.numAttributes)
  resultMetadata.attributes.get.zip(expectedMetadata.attributes.get).foreach { case (a, b) =>
    assert(a === b)
  }
```
This part is not row-based; it checks the result set as a whole.


---




[GitHub] spark issue #19876: [ML][SPARK-11171][SPARK-11239] Add PMML export to Spark ...

2018-03-13 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/19876
  
March 15th is soon, any thoughts @MLnick @jkbradley @sethah?


---




[GitHub] spark pull request #20686: [SPARK-22915][MLlib] Streaming tests for spark.ml...

2018-03-13 Thread attilapiros
Github user attilapiros commented on a diff in the pull request:

https://github.com/apache/spark/pull/20686#discussion_r174245361
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/feature/VectorAssemblerSuite.scala ---
@@ -58,14 +57,16 @@ class VectorAssemblerSuite
 assert(v2.isInstanceOf[DenseVector])
   }
 
-  test("VectorAssembler") {
+  ignore("VectorAssembler") {
--- End diff --

ok


---




[GitHub] spark issue #20812: [SPARK-23669] Executors fetch jars and name the jars wit...

2018-03-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20812
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20812: [SPARK-23669] Executors fetch jars and name the jars wit...

2018-03-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20812
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88208/
Test PASSed.


---




[GitHub] spark issue #20812: [SPARK-23669] Executors fetch jars and name the jars wit...

2018-03-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20812
  
**[Test build #88208 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88208/testReport)**
 for PR 20812 at commit 
[`f78c273`](https://github.com/apache/spark/commit/f78c273c6132f9cc226668590273836950c39b74).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20815: [SPARK-23658][LAUNCHER] InProcessAppHandle uses the wron...

2018-03-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20815
  
**[Test build #88213 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88213/testReport)**
 for PR 20815 at commit 
[`1518a5a`](https://github.com/apache/spark/commit/1518a5af591b2254e947a60e0ec107551f2155a4).


---




[GitHub] spark issue #20815: [SPARK-23658][LAUNCHER] InProcessAppHandle uses the wron...

2018-03-13 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/20815
  
ok to test


---




[GitHub] spark pull request #20702: [SPARK-23547][SQL]Cleanup the .pipeout file when ...

2018-03-13 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/20702


---




[GitHub] spark issue #20811: [SPARK-23668][K8S] Add config option for passing through...

2018-03-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20811
  
Kubernetes integration test status success
URL: 
https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/1476/



---




[GitHub] spark issue #20815: [SPARK-23658][LAUNCHER] InProcessAppHandle uses the wron...

2018-03-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20815
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #20702: [SPARK-23547][SQL]Cleanup the .pipeout file when the Hiv...

2018-03-13 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20702
  
thanks, merging to master!


---




[GitHub] spark issue #20803: [SPARK-23653][SQL] Show sql statement in spark SQL UI

2018-03-13 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20803
  
So this patch duplicates the SQL text info from the jobs page on the SQL 
query page. I think it's good and more user-friendly, but we need to make sure 
the underlying implementation reuses the code, to avoid problems like missing 
the `--hivevar`.


---




[GitHub] spark issue #20779: [SPARK-23598][SQL] Make methods in BufferedRowIterator p...

2018-03-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20779
  
**[Test build #88212 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88212/testReport)**
 for PR 20779 at commit 
[`8fb5df0`](https://github.com/apache/spark/commit/8fb5df0f76a6773594bb7e695036f3fdf0063c6a).


---




[GitHub] spark issue #20779: [SPARK-23598][SQL] Make methods in BufferedRowIterator p...

2018-03-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20779
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1493/
Test PASSed.


---




[GitHub] spark issue #20815: [SPARK-23658][LAUNCHER] InProcessAppHandle uses the wron...

2018-03-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20815
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20779: [SPARK-23598][SQL] Make methods in BufferedRowIterator p...

2018-03-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20779
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20811: [SPARK-23668][K8S] Add config option for passing through...

2018-03-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20811
  
Kubernetes integration test starting
URL: 
https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/1476/



---




[GitHub] spark issue #20811: [SPARK-23668][K8S] Add config option for passing through...

2018-03-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20811
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20811: [SPARK-23668][K8S] Add config option for passing through...

2018-03-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20811
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1492/
Test PASSed.


---




[GitHub] spark pull request #20815: [SPARK-23658][LAUNCHER] InProcessAppHandle uses t...

2018-03-13 Thread sahilTakiar
GitHub user sahilTakiar opened a pull request:

https://github.com/apache/spark/pull/20815

[SPARK-23658][LAUNCHER] InProcessAppHandle uses the wrong class in getLogger

## What changes were proposed in this pull request?

Changed `Logger` in `InProcessAppHandle` to use `InProcessAppHandle` 
instead of `ChildProcAppHandle`
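
As a minimal illustration of this kind of copy-paste mistake (a sketch only; the two classes below are hypothetical stand-ins, not the launcher classes, and it assumes `java.util.logging`), the logger should be named after the class that owns it:

```scala
import java.util.logging.Logger

// Hypothetical stand-ins for the two handle classes named above.
class ChildProcHandleSketch {
  val log: Logger = Logger.getLogger(classOf[ChildProcHandleSketch].getName)
}

class InProcessHandleSketch {
  // Named after the enclosing class, not the sibling it was copied from,
  // so log records are attributed to the right component.
  val log: Logger = Logger.getLogger(classOf[InProcessHandleSketch].getName)
}
```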

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sahilTakiar/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20815.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20815


commit 1518a5af591b2254e947a60e0ec107551f2155a4
Author: Sahil Takiar 
Date:   2018-03-13T18:24:20Z

[SPARK-23658][LAUNCHER] InProcessAppHandle uses the wrong class in getLogger




---




[GitHub] spark pull request #20811: [SPARK-23668][K8S] Add config option for passing ...

2018-03-13 Thread liyinan926
Github user liyinan926 commented on a diff in the pull request:

https://github.com/apache/spark/pull/20811#discussion_r174236490
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodFactory.scala
 ---
@@ -108,6 +109,8 @@ private[spark] class ExecutorPodFactory(
   nodeToLocalTaskCount: Map[String, Int]): Pod = {
 val name = s"$executorPodNamePrefix-exec-$executorId"
 
+    val imagePullSecrets = imagePullSecret.map(new LocalObjectReference(_)).toList
--- End diff --

Given the same code is used to configure both the driver and executor pods, 
it can be extracted out into a utility method in `KubernetesUtil`.
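
Something along these lines, as a sketch only. The method name `toImagePullSecrets` and the enclosing object are my own placeholders, and the case class merely stands in for fabric8's `LocalObjectReference` so the snippet compiles on its own:

```scala
object KubernetesUtilSketch {
  // Hypothetical stand-in for io.fabric8.kubernetes.api.model.LocalObjectReference.
  final case class LocalObjectReference(name: String)

  /** Shared by the driver and executor pod builders so the conversion lives in one place. */
  def toImagePullSecrets(imagePullSecret: Option[String]): List[LocalObjectReference] =
    imagePullSecret.map(LocalObjectReference(_)).toList
}
```

Both call sites would then reduce to a single call on the configured option.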


---




[GitHub] spark pull request #20811: [SPARK-23668][K8S] Add config option for passing ...

2018-03-13 Thread liyinan926
Github user liyinan926 commented on a diff in the pull request:

https://github.com/apache/spark/pull/20811#discussion_r174235585
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/steps/BasicDriverConfigurationStep.scala
 ---
@@ -17,9 +17,7 @@
 package org.apache.spark.deploy.k8s.submit.steps
 
 import scala.collection.JavaConverters._
-
-import io.fabric8.kubernetes.api.model.{ContainerBuilder, EnvVarBuilder, EnvVarSourceBuilder, PodBuilder, QuantityBuilder}
-
+import io.fabric8.kubernetes.api.model._
--- End diff --

There should be an empty line between third-party imports and 
`org.apache.spark.*` imports.


---




[GitHub] spark pull request #20811: [SPARK-23668][K8S] Add config option for passing ...

2018-03-13 Thread liyinan926
Github user liyinan926 commented on a diff in the pull request:

https://github.com/apache/spark/pull/20811#discussion_r174235357
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala
 ---
@@ -54,6 +54,12 @@ private[spark] object Config extends Logging {
   .checkValues(Set("Always", "Never", "IfNotPresent"))
   .createWithDefault("IfNotPresent")
 
+  val IMAGE_PULL_SECRET =
+ConfigBuilder("spark.kubernetes.imagePullSecret")
+      .doc("Specifies the Kubernetes image secret used to access private image registry.")
--- End diff --

The first `image` can be removed.


---




[GitHub] spark issue #20433: [SPARK-23264][SQL] Support interval values without INTER...

2018-03-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20433
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88209/
Test PASSed.


---




[GitHub] spark issue #20433: [SPARK-23264][SQL] Support interval values without INTER...

2018-03-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20433
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20433: [SPARK-23264][SQL] Support interval values without INTER...

2018-03-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20433
  
**[Test build #88209 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88209/testReport)**
 for PR 20433 at commit 
[`f6210a2`](https://github.com/apache/spark/commit/f6210a2029129d38c15aeeb309c2b001d83757a6).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20811: [SPARK-23668][K8S] Add config option for passing through...

2018-03-13 Thread foxish
Github user foxish commented on the issue:

https://github.com/apache/spark/pull/20811
  
cc/ @mccheah @liyinan926 @vanzin 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20814: [SPARK-23671][core] Fix condition to enable the SHS thre...

2018-03-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20814
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20814: [SPARK-23671][core] Fix condition to enable the SHS thre...

2018-03-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20814
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1491/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20811: [SPARK-23668][K8S] Add config option for passing through...

2018-03-13 Thread foxish
Github user foxish commented on the issue:

https://github.com/apache/spark/pull/20811
  
ok to test


---




[GitHub] spark issue #20814: [SPARK-23671][core] Fix condition to enable the SHS thre...

2018-03-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20814
  
**[Test build #88211 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88211/testReport)**
 for PR 20814 at commit 
[`a897277`](https://github.com/apache/spark/commit/a89727753820aa0cbbe9bea4b2066c89b9ecfb4d).


---




[GitHub] spark pull request #20814: [SPARK-23671][core] Fix condition to enable the S...

2018-03-13 Thread vanzin
GitHub user vanzin opened a pull request:

https://github.com/apache/spark/pull/20814

[SPARK-23671][core] Fix condition to enable the SHS thread pool.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vanzin/spark SPARK-23671

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20814.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20814


commit a89727753820aa0cbbe9bea4b2066c89b9ecfb4d
Author: Marcelo Vanzin 
Date:   2018-03-13T18:04:45Z

[SPARK-23671][core] Fix condition to enable the SHS thread pool.




---




[GitHub] spark pull request #20790: [SPARK-23642][DOCS] AccumulatorV2 subclass isZero...

2018-03-13 Thread smallory
Github user smallory commented on a diff in the pull request:

https://github.com/apache/spark/pull/20790#discussion_r174225748
  
--- Diff: core/src/main/scala/org/apache/spark/util/AccumulatorV2.scala ---
@@ -290,7 +290,8 @@ class LongAccumulator extends AccumulatorV2[jl.Long, jl.Long] {
   private var _count = 0L
 
   /**
-   * Adds v to the accumulator, i.e. increment sum by v and count by 1.
+   * Returns false if this accumulator has had any values added to it or the sum is non-zero.
+   *
--- End diff --

The current documentation for AccumulatorV2.isZero would be misleading for 
the behaviour shown when values have been added to the accumulator, but the sum 
is zero. This still would return false, even though it is a non-count 
accumulator. I don't believe that any of the implementations in this file 
actually behave exactly as described by AccumulatorV2.isZero.
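
To make the case above concrete, here is a minimal sketch in Python. It is a toy model, not Spark's implementation: the class name and the rule "zero only when both sum and count are untouched" are assumptions chosen to match the behaviour described in this comment.

```python
class LongAccumulatorSketch:
    """Toy model of a sum-and-count accumulator (hypothetical, not Spark code)."""

    def __init__(self):
        self._sum = 0
        self._count = 0

    def add(self, v):
        # Increment the sum by v and the count by 1.
        self._sum += v
        self._count += 1

    def is_zero(self):
        # Assumed rule: zero only if nothing was ever added.
        return self._sum == 0 and self._count == 0


acc = LongAccumulatorSketch()
acc.add(5)
acc.add(-5)
# The sum is back to 0, yet the accumulator no longer reports itself as zero.
print(acc._sum, acc.is_zero())
```

Under that rule, an accumulator whose values cancel out is still "non-zero", which is the mismatch with the base-class documentation being pointed out.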


---




[GitHub] spark issue #20790: [SPARK-23642][DOCS] AccumulatorV2 subclass isZero scalad...

2018-03-13 Thread smallory
Github user smallory commented on the issue:

https://github.com/apache/spark/pull/20790
  
Thanks for the pointer on the title convention. The way the contributing 
doc distinguishes code and documentation changes left me a bit puzzled as to 
what actually applied to this change.


---




[GitHub] spark issue #20813: [SPARK-23670][SQL] Fix memory leak on SparkPlanGraphWrap...

2018-03-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20813
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20813: [SPARK-23670][SQL] Fix memory leak on SparkPlanGraphWrap...

2018-03-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20813
  
**[Test build #88210 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88210/testReport)**
 for PR 20813 at commit 
[`f866701`](https://github.com/apache/spark/commit/f866701b322c9ddf2fddc49d162fc9bc8d83bcdb).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20813: [SPARK-23670][SQL] Fix memory leak on SparkPlanGraphWrap...

2018-03-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20813
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88210/
Test FAILed.


---




[GitHub] spark pull request #20788: [SPARK-23647][PYTHON][SQL] Adds more types for hi...

2018-03-13 Thread DylanGuedes
Github user DylanGuedes commented on a diff in the pull request:

https://github.com/apache/spark/pull/20788#discussion_r174208501
  
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -437,10 +437,11 @@ def hint(self, name, *parameters):
         if not isinstance(name, str):
             raise TypeError("name should be provided as str, got {0}".format(type(name)))
 
+        allowed_types = (basestring, list, float, int)
--- End diff --

It looks like Scala can handle unicode, so basestring looks correct. What 
do you think?
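
For reference, a rough sketch of how such a check could behave on both Python 2 and Python 3. The helper name `check_hint_parameters`, the error message, and the `basestring` fallback are assumptions for illustration, not the code in this PR.

```python
try:
    string_types = (basestring,)  # Python 2: covers both str and unicode
except NameError:
    string_types = (str,)  # Python 3: str is already unicode

ALLOWED_TYPES = string_types + (list, float, int)


def check_hint_parameters(parameters):
    """Raise TypeError on the first parameter whose type is not allowed."""
    for p in parameters:
        if not isinstance(p, ALLOWED_TYPES):
            raise TypeError(
                "all parameters should be in {0}, got {1} of type {2}".format(
                    ALLOWED_TYPES, p, type(p)))


# Strings (including unicode literals), numbers and lists pass silently.
check_hint_parameters(["t1", u"t2", 1.5, 10, ["a", "b"]])
```

Using `basestring` where it exists accepts unicode names on Python 2, while the fallback keeps the same check working on Python 3.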


---




[GitHub] spark issue #19951: [SPARK-22760][CORE][YARN] When sc.stop() is called, set ...

2018-03-13 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/19951
  
Since the target of the fix is silencing a misleading exception, handling 
that exception as I suggested before would be a feasible solution. But anything 
more complicated than that is overkill.


---




[GitHub] spark pull request #20742: [SPARK-23572][docs] Bring "security.md" up to dat...

2018-03-13 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20742#discussion_r174203564
  
--- Diff: docs/running-on-yarn.md ---
@@ -2,6 +2,8 @@
 layout: global
 title: Running Spark on YARN
 ---
+* This will become a table of contents (this text will be scraped).
--- End diff --

This text will be replaced with the TOC.


---




[GitHub] spark issue #20813: [SPARK-23670][SQL] Fix memory leak on SparkPlanGraphWrap...

2018-03-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20813
  
**[Test build #88210 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88210/testReport)**
 for PR 20813 at commit 
[`f866701`](https://github.com/apache/spark/commit/f866701b322c9ddf2fddc49d162fc9bc8d83bcdb).


---




[GitHub] spark issue #20813: [SPARK-23670][SQL] Fix memory leak on SparkPlanGraphWrap...

2018-03-13 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/20813
  
ok to test


---




[GitHub] spark issue #20659: [DO-NOT-MERGE] Try to update Hive to 2.3.2

2018-03-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20659
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20659: [DO-NOT-MERGE] Try to update Hive to 2.3.2

2018-03-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20659
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88206/
Test FAILed.


---




[GitHub] spark issue #20659: [DO-NOT-MERGE] Try to update Hive to 2.3.2

2018-03-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20659
  
**[Test build #88206 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88206/testReport)**
 for PR 20659 at commit 
[`b35daa0`](https://github.com/apache/spark/commit/b35daa0593af1204e3b2833c30ec0374e8c2b530).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20249: [SPARK-23057][SPARK-19235][SQL] SET LOCATION should chan...

2018-03-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20249
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88205/
Test PASSed.


---




[GitHub] spark issue #20249: [SPARK-23057][SPARK-19235][SQL] SET LOCATION should chan...

2018-03-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20249
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20249: [SPARK-23057][SPARK-19235][SQL] SET LOCATION should chan...

2018-03-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20249
  
**[Test build #88205 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88205/testReport)**
 for PR 20249 at commit 
[`910d4d0`](https://github.com/apache/spark/commit/910d4d08e6743bd29358453ae977a10c30d36774).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20813: [SPARK-23670][SQL] Fix memory leak on SparkPlanGraphWrap...

2018-03-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20813
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #20813: [SPARK-23670][SQL] Fix memory leak on SparkPlanGraphWrap...

2018-03-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20813
  
Can one of the admins verify this patch?


---




[GitHub] spark pull request #20813: [SPARK-23670][SQL] Fix memory leak on SparkPlanGr...

2018-03-13 Thread myroslavlisniak
GitHub user myroslavlisniak opened a pull request:

https://github.com/apache/spark/pull/20813

[SPARK-23670][SQL] Fix memory leak on SparkPlanGraphWrapper

## What changes were proposed in this pull request?
Clean up SparkPlanGraphWrapper objects from InMemoryStore together with 
cleaning up SQLExecutionUIData
## How was this patch tested?
The existing unit test was extended to also check the SparkPlanGraphWrapper object 
count.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/myroslavlisniak/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20813.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20813


commit f866701b322c9ddf2fddc49d162fc9bc8d83bcdb
Author: myroslavlisniak 
Date:   2018-03-13T11:22:07Z

fix memory leak on SparkPlanGraphWrapper




---




[GitHub] spark issue #20433: [SPARK-23264][SQL] Support interval values without INTER...

2018-03-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20433
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1490/
Test PASSed.


---




[GitHub] spark issue #20433: [SPARK-23264][SQL] Support interval values without INTER...

2018-03-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20433
  
**[Test build #88209 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88209/testReport)**
 for PR 20433 at commit 
[`f6210a2`](https://github.com/apache/spark/commit/f6210a2029129d38c15aeeb309c2b001d83757a6).


---




[GitHub] spark issue #20433: [SPARK-23264][SQL] Support interval values without INTER...

2018-03-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20433
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20812: [SPARK-23669] Executors fetch jars and name the jars wit...

2018-03-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20812
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1489/
Test PASSed.


---




[GitHub] spark issue #20812: [SPARK-23669] Executors fetch jars and name the jars wit...

2018-03-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20812
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20812: [SPARK-23669] Executors fetch jars and name the jars wit...

2018-03-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20812
  
**[Test build #88208 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88208/testReport)**
 for PR 20812 at commit 
[`f78c273`](https://github.com/apache/spark/commit/f78c273c6132f9cc226668590273836950c39b74).


---




[GitHub] spark issue #20812: [SPARK-23669] Executors fetch jars and name the jars wit...

2018-03-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20812
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20812: [SPARK-23669] Executors fetch jars and name the jars wit...

2018-03-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20812
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1488/
Test PASSed.


---




[GitHub] spark pull request #20812: [SPARK-23669] Executors fetch jars and name the j...

2018-03-13 Thread jinxing64
GitHub user jinxing64 opened a pull request:

https://github.com/apache/spark/pull/20812

[SPARK-23669] Executors fetch jars and name the jars with md5 prefix

## What changes were proposed in this pull request?

In our cluster there are lots of UDF jars, and some of them have the same 
filename but different paths, for example:
```
hdfs://A/B/udf.jar  -> udfA
hdfs://C/D/udf.jar  -> udfB
```
When a user uses udfA and udfB in the same SQL query, the executor fetches both 
`hdfs://A/B/udf.jar` and `hdfs://C/D/udf.jar` to the local directory, where the 
identical filenames conflict. 

Can we add a config to fetch jars and save them under a filename with an MD5 
prefix, so that there is no conflict?
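
The naming idea can be sketched in a few lines of Python. This is only an illustration: the helper `local_jar_name` is hypothetical, and hashing the full URI (rather than, say, the file contents) is an assumption, since the description does not say what the MD5 is computed over.

```python
import hashlib
import posixpath


def local_jar_name(uri):
    """Return '<md5 of full URI>-<basename>' so equal basenames stay distinct."""
    digest = hashlib.md5(uri.encode("utf-8")).hexdigest()
    return "{0}-{1}".format(digest, posixpath.basename(uri))


name_a = local_jar_name("hdfs://A/B/udf.jar")
name_b = local_jar_name("hdfs://C/D/udf.jar")
# Both names end in "udf.jar" but differ in their 32-character prefix.
print(name_a != name_b)
```

Keeping the original basename as a suffix leaves the local files recognisable, while the prefix removes the collision.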

## How was this patch tested?
 UT 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jinxing64/spark SPARK-23669

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20812.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20812


commit 5791edb4d325f24be63485032bf01125cc2aa28b
Author: jinxing 
Date:   2018-03-13T14:15:56Z

[SPARK-23669] Executors fetch jars and name the jars with md5 prefix




---




[GitHub] spark pull request #20780: [MINOR] [SQL] [TEST] Create table using `dataSour...

2018-03-13 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/20780


---




[GitHub] spark issue #20780: [MINOR] [SQL] [TEST] Create table using `dataSourceName`...

2018-03-13 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/20780
  
Merged to master.


---




[GitHub] spark issue #20810: [SPARK-20114][ML] spark.ml parity for sequential pattern...

2018-03-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20810
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88207/
Test PASSed.


---




[GitHub] spark issue #20810: [SPARK-20114][ML] spark.ml parity for sequential pattern...

2018-03-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20810
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20810: [SPARK-20114][ML] spark.ml parity for sequential pattern...

2018-03-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20810
  
**[Test build #88207 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88207/testReport)**
 for PR 20810 at commit 
[`c1c5338`](https://github.com/apache/spark/commit/c1c5338c5698bb4fa87151fd8ba5cc986e1e1466).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #20804: [SPARK-23656][Test] Perform assertions in XXH64Su...

2018-03-13 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/20804


---




[GitHub] spark issue #20811: [SPARK-23668][K8S] Add config option for passing through...

2018-03-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20811
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #19675: [SPARK-14540][BUILD] Support Scala 2.12 closures and Jav...

2018-03-13 Thread ekrich
Github user ekrich commented on the issue:

https://github.com/apache/spark/pull/19675
  
I just posted the info on https://gitter.im/scala/contributors but there is 
also scala/center and scala/scala or the forum at 
https://contributors.scala-lang.org/ . Maybe Lightbend too.


---




[GitHub] spark issue #20804: [SPARK-23656][Test] Perform assertions in XXH64Suite.tes...

2018-03-13 Thread hvanhovell
Github user hvanhovell commented on the issue:

https://github.com/apache/spark/pull/20804
  
LGTM - merging to master. Thanks!


---




[GitHub] spark issue #20742: [SPARK-23572][docs] Bring "security.md" up to date.

2018-03-13 Thread tgravescs
Github user tgravescs commented on the issue:

https://github.com/apache/spark/pull/20742
  
+1


---




[GitHub] spark issue #20811: [SPARK-23668][K8S] Add config option for passing through...

2018-03-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20811
  
Can one of the admins verify this patch?


---




[GitHub] spark pull request #20811: [SPARK-23668][K8S] Add config option for passing ...

2018-03-13 Thread andrusha
GitHub user andrusha opened a pull request:

https://github.com/apache/spark/pull/20811

[SPARK-23668][K8S] Add config option for passing through k8s 
Pod.spec.imagePullSecrets

## What changes were proposed in this pull request?

Pass through the `imagePullSecrets` option to the k8s pod to allow 
users to access private image registries.

See 
https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry/

## How was this patch tested?

Unit tests + manual testing.

Manual testing procedure:
1. Have a private image registry.
2. Spark-submit an application with no `spark.kubernetes.imagePullSecret` set. 
Run `kubectl describe pod ...` and see the error message: 
```
Error syncing pod, skipping: failed to "StartContainer" for 
"spark-kubernetes-driver" with ErrImagePull: "rpc error: code = 2 desc = Error: 
Status 400 trying to pull repository rtdp/hyperconvergence: \"{\\n  
\\\"errors\\\" : [ {\\n\\\"status\\\" : 400,\\n\\\"message\\\" : 
\\\"Unsupported docker v1 repository request for '...'\\\"\\n  } ]\\n}\""
```
3. Create a secret with `kubectl create secret docker-registry ...`.
4. Spark-submit with `spark.kubernetes.imagePullSecret` set to the new 
secret. See that the deployment is successful.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/andrusha/spark spark-23668-image-pull-secrets

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20811.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20811


commit dc2c1852a5056a023de64855d1f3b1ce5fd050b9
Author: Andrew Korzhuev 
Date:   2018-03-13T14:05:58Z

Add config option for passing through k8s Pod.spec.imagePullSecrets

This will allow users to access images from private registries.




---




[GitHub] spark issue #19675: [SPARK-14540][BUILD] Support Scala 2.12 closures and Jav...

2018-03-13 Thread fedeoasi
Github user fedeoasi commented on the issue:

https://github.com/apache/spark/pull/19675
  
Ideally we would get someone from Scala or Scala Center to pick this up. 
Does anyone know how to get in touch with them?


---




[GitHub] spark pull request #19881: [SPARK-22683][CORE] Add tasksPerExecutorSlot para...

2018-03-13 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/19881#discussion_r174130728
  
--- Diff: docs/configuration.md ---
@@ -1795,6 +1796,19 @@ Apart from these, the following properties are also 
available, and may be useful
 Lower bound for the number of executors if dynamic allocation is 
enabled.
   
 
+
+  spark.dynamicAllocation.fullParallelismDivisor
+  1
+  
+By default, the dynamic allocation will request enough executors to 
maximize the 
+parallelism according to the number of tasks to process. While this 
minimizes the 
+latency of the job, with small tasks this setting wastes a lot of 
resources due to
+executor allocation overhead, as some executor might not even do any 
work.
+This setting allows to set a divisor that will be used to reduce the 
number of
+executors w.r.t. full parallelism
--- End diff --

add period at end of parallelism


---




[GitHub] spark pull request #19881: [SPARK-22683][CORE] Add tasksPerExecutorSlot para...

2018-03-13 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/19881#discussion_r174126562
  
--- Diff: docs/configuration.md ---
@@ -1795,6 +1796,19 @@ Apart from these, the following properties are also 
available, and may be useful
 Lower bound for the number of executors if dynamic allocation is 
enabled.
   
 
+
+  spark.dynamicAllocation.fullParallelismDivisor
+  1
+  
+By default, the dynamic allocation will request enough executors to 
maximize the 
+parallelism according to the number of tasks to process. While this 
minimizes the 
+latency of the job, with small tasks this setting wastes a lot of 
resources due to
+executor allocation overhead, as some executor might not even do any 
work.
+This setting allows to set a divisor that will be used to reduce the 
number of
+executors w.r.t. full parallelism
+Defaults to 1.0
--- End diff --

I think we should define that maxExecutors trumps this setting.  

If I have 10000 tasks and a divisor of 2, I would expect 5000 executors, but if max 
executors is 1000, that is all I get. 

We should add a test for this interaction as well.
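
A small sketch of the interaction I have in mind, in Python for brevity. The function `target_executors` and its parameters are hypothetical, not the allocation manager's code, and it assumes the example above means 10,000 tasks with one task slot per executor.

```python
import math


def target_executors(pending_tasks, tasks_per_executor, divisor, max_executors):
    """Executors to request: full parallelism divided by `divisor`, then capped."""
    full_parallelism = math.ceil(pending_tasks / tasks_per_executor)
    reduced = math.ceil(full_parallelism / divisor)
    # maxExecutors always wins over the divisor.
    return min(reduced, max_executors)


print(target_executors(10000, 1, 2.0, 1000000))  # 5000: cap not reached
print(target_executors(10000, 1, 2.0, 1000))     # 1000: the cap wins
```

A test for the real setting would presumably assert the same two cases: the divisor alone, and the divisor combined with a lower maxExecutors.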


---




[GitHub] spark pull request #19881: [SPARK-22683][CORE] Add tasksPerExecutorSlot para...

2018-03-13 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/19881#discussion_r174145101
  
--- Diff: docs/configuration.md ---
@@ -1795,6 +1796,19 @@ Apart from these, the following properties are also 
available, and may be useful
 Lower bound for the number of executors if dynamic allocation is 
enabled.
   
 
+
+  spark.dynamicAllocation.fullParallelismDivisor
--- End diff --

Naming configs is really hard; there are lots of different opinions on it, and in 
the end someone is going to be confused. I need to think about this some more. 
I see the reason to use Parallelism here rather than maxExecutors 
(maxExecutorsDivisor could be confusing if people think it applies to the 
maxExecutors config), but I also think parallelism would be confused with the 
parallelism in spark.default.parallelism: it's not defining the number of tasks 
but the number of executors to allocate based on the parallelism. Another one I 
thought of is executorAllocationDivisor. I'll think about it some more and get 
back.



---




[GitHub] spark pull request #19881: [SPARK-22683][CORE] Add tasksPerExecutorSlot para...

2018-03-13 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/19881#discussion_r174125381
  
--- Diff: docs/configuration.md ---
@@ -1795,6 +1796,19 @@ Apart from these, the following properties are also 
available, and may be useful
 Lower bound for the number of executors if dynamic allocation is 
enabled.
   
 
+
+  spark.dynamicAllocation.fullParallelismDivisor
+  1
+  
+By default, the dynamic allocation will request enough executors to 
maximize the 
+parallelism according to the number of tasks to process. While this 
minimizes the 
+latency of the job, with small tasks this setting wastes a lot of 
resources due to
--- End diff --

can waste.




---




[GitHub] spark issue #19675: [SPARK-14540][BUILD] Support Scala 2.12 closures and Jav...

2018-03-13 Thread ekrich
Github user ekrich commented on the issue:

https://github.com/apache/spark/pull/19675
  
Thank you for the clarification.


---




[GitHub] spark issue #19675: [SPARK-14540][BUILD] Support Scala 2.12 closures and Jav...

2018-03-13 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/19675
  
I am not working on this and am not aware of anyone working on it. Yes, it 
will only be resolved when someone picks up the last piece of work and finishes 
it. It should be down to handling how closures are serialized as lambdas now; 
all the other updates should be in place.


---




[GitHub] spark pull request #20742: [SPARK-23572][docs] Bring "security.md" up to dat...

2018-03-13 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/20742#discussion_r174133114
  
--- Diff: docs/running-on-yarn.md ---
@@ -2,6 +2,8 @@
 layout: global
 title: Running Spark on YARN
 ---
+* This will become a table of contents (this text will be scraped).
--- End diff --

Maybe I misread this: is it supposed to be "this text will be scraped" 
(meaning looked for in order to fill in the TOC), or was it supposed to be 
"scrapped" (meaning thrown away)?


---




[GitHub] spark issue #20692: [SPARK-23531][SQL] Show attribute type in explain

2018-03-13 Thread mgaido91
Github user mgaido91 commented on the issue:

https://github.com/apache/spark/pull/20692
  
sure, @gatorsmile, no problem. Thanks.


---




[GitHub] spark issue #19675: [SPARK-14540][BUILD] Support Scala 2.12 closures and Jav...

2018-03-13 Thread ekrich
Github user ekrich commented on the issue:

https://github.com/apache/spark/pull/19675
  
@srowen Do you need expertise to get the last issues fixed? It would 
sure be nice to see this completed and released. Java 8/Scala 2.12 was a hard 
upgrade for Scala too, and it has taken the point releases to solve some 
problems needed for Spark. Scala 2.12.0 was released on 3 Nov 2016 and 2.13 is 
on the horizon, so it would be great to see Spark support 2.12 very soon.


---



