[GitHub] spark issue #19657: [SPARK-22344][SPARKR] clean up install dir if running te...

2017-11-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19657
  
**[Test build #83544 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83544/testReport)**
 for PR 19657 at commit 
[`ca5349b`](https://github.com/apache/spark/commit/ca5349bfc0dae03c2402b104e51c78a841541b09).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19657: [SPARK-22344][SPARKR] clean up install dir if running te...

2017-11-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19657
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83544/
Test PASSed.


---




[GitHub] spark pull request #19683: [SPARK-21657][SQL] optimize explode quadratic mem...

2017-11-07 Thread uzadude
GitHub user uzadude opened a pull request:

https://github.com/apache/spark/pull/19683

[SPARK-21657][SQL] optimize explode quadratic memory consumption

## What changes were proposed in this pull request?

The issue has been raised in two Jira tickets: 
[SPARK-21657](https://issues.apache.org/jira/browse/SPARK-21657) and 
[SPARK-16998](https://issues.apache.org/jira/browse/SPARK-16998). Basically, 
collection generators like explode/inline create many rows from each input 
row, and currently each generated row also contains the column from which it 
was created. For example, if one row holds a 10k-element array, that array 
gets copied 10k times, once into each generated row, which results in 
quadratic memory consumption. However, it is a common case that the original 
column gets projected out after the explode, so we can avoid duplicating it.
In this solution we propose to identify this situation in the optimizer and 
turn on a flag for omitting the original column in the generation process; a 
sketch of the pattern appears below.
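
As an illustration, the access pattern in question looks roughly like this 
(hypothetical column names, using the public Dataset API):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.explode

val spark = SparkSession.builder().master("local[*]").appName("explode-sketch").getOrCreate()
import spark.implicits._

// One row carrying a 10k-element array column.
val df = Seq((1, Seq.fill(10000)(42))).toDF("id", "arr")

// `arr` is projected out right after the explode, so copying it into each of
// the 10k generated rows is pure overhead; this is the case the new optimizer
// rule detects and avoids.
val exploded = df.select($"id", explode($"arr").as("elem"))
exploded.show(5)
```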

## How was this patch tested?

1. We added a benchmark test to MiscBenchmark that shows a 16x improvement in 
runtime.
2. We ran some of the other tests in MiscBenchmark and they show ~15% 
improvements.
3. We ran this code on a specific case from our production data, with rows 
containing arrays of size ~200k, and it reduced the runtime from 6 hours to 
3 minutes.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/uzadude/spark optimize_explode

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19683.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19683


commit ce7c3694a99584348957dc756234bb667466be4e
Author: oraviv 
Date:   2017-11-07T11:34:21Z

[SPARK-21657][SQL] optimize explode quadratic memory consumption




---




[GitHub] spark issue #19657: [SPARK-22344][SPARKR] clean up install dir if running te...

2017-11-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19657
  
Merged build finished. Test PASSed.


---




[GitHub] spark pull request #19479: [SPARK-17074] [SQL] Generate equi-height histogra...

2017-11-07 Thread wzhfy
Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/19479#discussion_r149344559
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/Statistics.scala ---
@@ -275,6 +317,118 @@ object ColumnStat extends Logging {
       avgLen = row.getLong(4),
       maxLen = row.getLong(5)
     )
+    if (row.isNullAt(6)) {
+      cs
+    } else {
+      val ndvs = row.getArray(6).toLongArray()
+      assert(percentiles.get.length == ndvs.length + 1)
+      val endpoints = percentiles.get.map(_.toString.toDouble)
+      // Construct equi-height histogram
+      val buckets = ndvs.zipWithIndex.map { case (ndv, i) =>
+        EquiHeightBucket(endpoints(i), endpoints(i + 1), ndv)
+      }
+      val nonNullRows = rowCount - cs.nullCount
+      val ehHistogram = EquiHeightHistogram(nonNullRows.toDouble / ndvs.length, buckets)
+      cs.copy(histogram = Some(ehHistogram))
+    }
+  }
+
+}
+
+/**
+ * There are a few types of histograms in state-of-the-art estimation methods. E.g. equi-width
+ * histogram, equi-height histogram, frequency histogram (value-frequency pairs) and hybrid
+ * histogram, etc.
+ * Currently in Spark, we support equi-height histogram since it is good at handling skewed
+ * distributions, and also provides reasonable accuracy in other cases.
+ * We can add other histograms in the future, which will make estimation logic more complicated.
+ * This is because we will have to deal with computation between different types of histograms in
+ * some cases, e.g. for join columns.
+ */
+trait Histogram
+
+/**
+ * Equi-height histogram represents column value distribution by a sequence of buckets. Each bucket
+ * has a value range and contains approximately the same number of rows.
+ * @param height number of rows in each bucket
+ * @param ehBuckets equi-height histogram buckets
+ */
+case class EquiHeightHistogram(height: Double, ehBuckets: Array[EquiHeightBucket])
+  extends Histogram {
+
+  // Only for histogram equality test.
+  override def equals(other: Any): Boolean = other match {
+    case otherEHH: EquiHeightHistogram =>
+      height == otherEHH.height && ehBuckets.sameElements(otherEHH.ehBuckets)
+    case _ => false
+  }
+
+  override def hashCode(): Int = super.hashCode()
+}
+
+/**
+ * A bucket in an equi-height histogram. We use double type for lower/higher bound for simplicity.
+ * @param lo lower bound of the value range in this bucket
+ * @param hi higher bound of the value range in this bucket
+ * @param ndv approximate number of distinct values in this bucket
+ */
+case class EquiHeightBucket(lo: Double, hi: Double, ndv: Long)
+
+object HistogramSerializer {
+  // A flag to indicate the type of histogram
+  val EQUI_HEIGHT_HISTOGRAM_TYPE: Byte = 1
+
+  /**
+   * Serializes a given histogram to a string. For advanced statistics like histograms, sketches,
+   * etc, we don't provide readability for their serialized formats in metastore (as
+   * string-to-string table properties). This is because it's hard or unnatural for these
+   * statistics to be human readable. For example, a histogram would probably be split into
+   * multiple key-value properties, instead of a single, self-described property. And for
+   * count-min-sketch, it's essentially unnatural to make it a readable string.
+   */
+  final def serialize(histogram: Histogram): String = histogram match {
+    case h: EquiHeightHistogram =>
+      // type + numBuckets + height + numBuckets * (lo + hi + ndv)
--- End diff --

Thanks for the advice. The default number of buckets is 254. Tests showed 
that after compression, the serialized length is reduced by more than 50%.
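
For concreteness, a rough sketch of such a binary layout plus compression 
(field order and codec here are illustrative assumptions, not the actual 
implementation):

```scala
import java.io.ByteArrayOutputStream
import java.nio.ByteBuffer
import java.util.Base64
import java.util.zip.Deflater

// Layout: type (1B) + numBuckets (4B) + height (8B) + numBuckets * (lo + hi + ndv).
def serializeSketch(height: Double, buckets: Array[(Double, Double, Long)]): String = {
  val buf = ByteBuffer.allocate(1 + 4 + 8 + buckets.length * (8 + 8 + 8))
  buf.put(1.toByte)          // histogram type flag (equi-height)
  buf.putInt(buckets.length)
  buf.putDouble(height)
  buckets.foreach { case (lo, hi, ndv) =>
    buf.putDouble(lo); buf.putDouble(hi); buf.putLong(ndv)
  }
  // Compress before storing the histogram as a single string table property.
  val deflater = new Deflater()
  deflater.setInput(buf.array())
  deflater.finish()
  val out = new ByteArrayOutputStream()
  val chunk = new Array[Byte](1024)
  while (!deflater.finished()) out.write(chunk, 0, deflater.deflate(chunk))
  Base64.getEncoder.encodeToString(out.toByteArray)
}
```

With the default 254 buckets, the raw payload under this layout is 
1 + 4 + 8 + 254 * 24 = 6109 bytes, which is where the >50% reduction after 
compression matters.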


---




[GitHub] spark issue #19683: [SPARK-21657][SQL] optimize explode quadratic memory con...

2017-11-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19683
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #19667: [SPARK-21127][SQL][followup] fix a config name typo

2017-11-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19667
  
**[Test build #83537 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83537/testReport)**
 for PR 19667 at commit 
[`55b949e`](https://github.com/apache/spark/commit/55b949e0e039ea981aeb69d6c3699c829071e368).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19667: [SPARK-21127][SQL][followup] fix a config name typo

2017-11-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19667
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19667: [SPARK-21127][SQL][followup] fix a config name typo

2017-11-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19667
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83537/
Test PASSed.


---




[GitHub] spark issue #19674: [DO-NOT-MERGE] Investigate test failures related with SP...

2017-11-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19674
  
**[Test build #83546 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83546/testReport)**
 for PR 19674 at commit 
[`d03f768`](https://github.com/apache/spark/commit/d03f7689d531a1421af2f90b6c7735f0184d8b76).


---




[GitHub] spark issue #19208: [SPARK-21087] [ML] CrossValidator, TrainValidationSplit ...

2017-11-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19208
  
**[Test build #83534 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83534/testReport)**
 for PR 19208 at commit 
[`654e4d5`](https://github.com/apache/spark/commit/654e4d580889dcd2fcf7c0bea2060349190faaac).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19208: [SPARK-21087] [ML] CrossValidator, TrainValidationSplit ...

2017-11-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19208
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19208: [SPARK-21087] [ML] CrossValidator, TrainValidationSplit ...

2017-11-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19208
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83534/
Test PASSed.


---




[GitHub] spark issue #19674: [DO-NOT-MERGE] Investigate test failures related with SP...

2017-11-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19674
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #19674: [DO-NOT-MERGE] Investigate test failures related with SP...

2017-11-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19674
  
**[Test build #83546 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83546/testReport)**
 for PR 19674 at commit 
[`d03f768`](https://github.com/apache/spark/commit/d03f7689d531a1421af2f90b6c7735f0184d8b76).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19674: [DO-NOT-MERGE] Investigate test failures related with SP...

2017-11-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19674
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83546/
Test FAILed.


---




[GitHub] spark issue #19674: [DO-NOT-MERGE] Investigate test failures related with SP...

2017-11-07 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/19674
  
@ueshin, now I get what you meant by DST. I think you roughly knew about this 
problem, but let me describe it in more detail given my debugging.

It looks like the problem is in `mktime`, which appears to be 
platform-dependent (judging from the code and the docs).

I made a minimised reproducer:

```python
import time
import os
from datetime import datetime
os.environ["TZ"] = "America/Los_Angeles"
time.tzset()
time.mktime(datetime(2100, 4, 4, 4, 4, 4).timetuple())
```

On my local machine, it prints:

```
4110523444.0
```

On Ubuntu 14.04, it prints:

```
4110519844.0
```

On Jenkins, it prints:

```
4110519844.0
```
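
The two results differ by exactly 3600 seconds, i.e. one hour of DST offset. 
As a cross-check, the same wall-clock time can be computed with `java.time`, 
which applies the IANA rules deterministically (a sketch, not part of the 
test code):

```scala
import java.time.{LocalDateTime, ZoneId}

// 2100-04-04 04:04:04 falls inside DST in America/Los_Angeles (UTC-07:00),
// so the DST-aware answer should match the Ubuntu/Jenkins value 4110519844L;
// the other result is off by exactly one hour (DST not applied).
val epoch = LocalDateTime.of(2100, 4, 4, 4, 4, 4)
  .atZone(ZoneId.of("America/Los_Angeles"))
  .toEpochSecond
```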

I am not sure this is easily fixable within Spark, as it looks dependent on 
the Python implementation and/or the underlying C library, to the best of my 
knowledge and from reading some docs.

Could you maybe avoid this time within DST for now in your PR? I currently 
don't have a good idea for a simple and surgical fix.


---




[GitHub] spark issue #19649: [SPARK-22405][SQL] Add more ExternalCatalogEvent

2017-11-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19649
  
**[Test build #83538 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83538/testReport)**
 for PR 19649 at commit 
[`c79c314`](https://github.com/apache/spark/commit/c79c31452d285c3b36b8adda55b9daccd6cea0d4).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class AlterTablePreEvent(`
  * `case class AlterTableEvent(`


---




[GitHub] spark issue #19649: [SPARK-22405][SQL] Add more ExternalCatalogEvent

2017-11-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19649
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83538/
Test PASSed.


---




[GitHub] spark issue #19649: [SPARK-22405][SQL] Add more ExternalCatalogEvent

2017-11-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19649
  
Merged build finished. Test PASSed.


---




[GitHub] spark pull request #19631: [SPARK-22372][core, yarn] Make cluster submission...

2017-11-07 Thread ArtRand
Github user ArtRand commented on a diff in the pull request:

https://github.com/apache/spark/pull/19631#discussion_r149361297
  
--- Diff: core/src/main/scala/org/apache/spark/SecurityManager.scala ---
@@ -542,7 +496,55 @@ private[spark] class SecurityManager(
    * Gets the secret key.
    * @return the secret key as a String if authentication is enabled, otherwise returns null
    */
-  def getSecretKey(): String = secretKey
+  def getSecretKey(): String = {
+    if (isAuthenticationEnabled) {
+      Option(sparkConf.getenv(ENV_AUTH_SECRET))
+        .orElse(sparkConf.getOption(SPARK_AUTH_SECRET_CONF))
+        .getOrElse {
+          throw new IllegalArgumentException(
+            s"A secret key must be specified via the $SPARK_AUTH_SECRET_CONF config")
+        }
+    } else {
+      null
+    }
+  }
+
+  /**
+   * Initialize the configuration object held by this class for authentication.
+   *
+   * If authentication is disabled, do nothing.
+   *
+   * In YARN mode, generate a secret key and store it in the configuration object, setting it up to
+   * also be propagated to executors using an env variable.
+   *
+   * In other modes, assert that the auth secret is set in the configuration.
+   */
+  def initializeAuth(): Unit = {
+    if (!sparkConf.get(NETWORK_AUTH_ENABLED)) {
+      return
+    }
+
+    if (sparkConf.get(SparkLauncher.SPARK_MASTER, null) != "yarn") {
+      require(sparkConf.contains(SPARK_AUTH_SECRET_CONF),
+        s"A secret key must be specified via the $SPARK_AUTH_SECRET_CONF config.")
+      return
+    }
+
+    // In YARN, force creation of a new secret if this is client mode. This ensures each
--- End diff --

Is there a reason this _has_ to be unique to YARN? Will this solve the 
problem (currently in Mesos) where the executors 
[bootstrap](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala#L193) 
without security (unless you "bake" the secret and secret config into the 
container image)?


---




[GitHub] spark pull request #19684: [SPARK-22461][ML] Move Spark ML model summaries i...

2017-11-07 Thread mgaido91
GitHub user mgaido91 opened a pull request:

https://github.com/apache/spark/pull/19684

[SPARK-22461][ML] Move Spark ML model summaries into a dedicated package

## What changes were proposed in this pull request?

Added a common abstraction (the `Summary` trait) for all the summaries in 
ML and moved them all to a new `summary` package.

## How was this patch tested?

Existing UTs and manual execution of the changed examples

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mgaido91/spark SPARK-22461

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19684.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19684


commit a663b7bbcb1591cd123b95478bb9771ed3674135
Author: Marco Gaido 
Date:   2017-11-07T11:58:27Z

[SPARK-22461][ML] Move Spark ML model summaries into a dedicated package




---




[GitHub] spark issue #19680: [SPARK-22641][ML] Refactor Spark ML model summaries

2017-11-07 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/19680
  
Oh, this is tagged to the wrong JIRA. Should be SPARK-22461


---




[GitHub] spark issue #19684: [SPARK-22461][ML] Move Spark ML model summaries into a d...

2017-11-07 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/19684
  
This duplicates https://github.com/apache/spark/pull/19680 @mgaido91 


---




[GitHub] spark pull request #19684: [SPARK-22461][ML] Move Spark ML model summaries i...

2017-11-07 Thread mgaido91
Github user mgaido91 closed the pull request at:

https://github.com/apache/spark/pull/19684


---




[GitHub] spark issue #19684: [SPARK-22461][ML] Move Spark ML model summaries into a d...

2017-11-07 Thread mgaido91
Github user mgaido91 commented on the issue:

https://github.com/apache/spark/pull/19684
  
@srowen, sorry, I hadn't seen it. Thanks.


---




[GitHub] spark issue #19479: [SPARK-17074] [SQL] Generate equi-height histogram in co...

2017-11-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19479
  
**[Test build #83547 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83547/testReport)**
 for PR 19479 at commit 
[`fa338dd`](https://github.com/apache/spark/commit/fa338ddcb655f6e421b1be35fdd8dcd5cd866df0).


---




[GitHub] spark pull request #19640: [SPARK-16986][CORE][WEB-UI] Support configure his...

2017-11-07 Thread jiangxb1987
Github user jiangxb1987 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19640#discussion_r149365816
  
--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -2742,6 +2742,11 @@ private[spark] object Utils extends Logging {
     }
   }

+  def getTimeZone: TimeZone = {
+    val sparkConf = new SparkConf(false).loadFromSystemProperties(true)
--- End diff --

Can we make `SparkConf` an input parameter instead of creating a new instance 
on every function call? A rough sketch of the suggested shape is below.
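
For illustration (the config lookup is a placeholder key, not the one used by 
the PR):

```scala
import java.util.TimeZone
import org.apache.spark.SparkConf

// Take the conf as a parameter instead of rebuilding it from system
// properties on each call; "spark.history.ui.timeZone" is a placeholder key.
def getTimeZone(conf: SparkConf): TimeZone = {
  conf.getOption("spark.history.ui.timeZone")
    .map(id => TimeZone.getTimeZone(id))
    .getOrElse(TimeZone.getDefault)
}
```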


---




[GitHub] spark pull request #19390: [SPARK-18935][MESOS] Fix dynamic reservations on ...

2017-11-07 Thread skonto
Github user skonto commented on a diff in the pull request:

https://github.com/apache/spark/pull/19390#discussion_r149367812
  
--- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerUtils.scala ---
@@ -451,15 +465,20 @@ trait MesosSchedulerUtils extends Logging {
   }

   /** Creates a mesos resource for a specific port number. */
-  private def createResourcesFromPorts(portsAndRoles: List[(Long, String)]) : List[Resource] = {
-    portsAndRoles.flatMap{ case (port, role) =>
-      createMesosPortResource(List((port, port)), Some(role))}
+  private def createResourcesFromPorts(
+      portsAndResourcesInfo: List[(Long, (String, AllocationInfo, Option[ReservationInfo]))])
+    : List[Resource] = {
+    portsAndResourcesInfo.flatMap{ case (port, rInfo) =>
+      createMesosPortResource(List((port, port)), Option(rInfo._1),
--- End diff --

fixed


---




[GitHub] spark issue #19390: [SPARK-18935][MESOS] Fix dynamic reservations on mesos

2017-11-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19390
  
**[Test build #83548 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83548/testReport)**
 for PR 19390 at commit 
[`dbf7875`](https://github.com/apache/spark/commit/dbf787593d72afe16cadd9731a1dd1cba608451c).


---




[GitHub] spark issue #19651: [SPARK-20682][SPARK-15474][SPARK-21791] Add new ORCFileF...

2017-11-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19651
  
**[Test build #83543 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83543/testReport)**
 for PR 19651 at commit 
[`f644c6a`](https://github.com/apache/spark/commit/f644c6a88b4f24376c67028d0e927a2ee49fedbe).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19651: [SPARK-20682][SPARK-15474][SPARK-21791] Add new ORCFileF...

2017-11-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19651
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19651: [SPARK-20682][SPARK-15474][SPARK-21791] Add new ORCFileF...

2017-11-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19651
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83543/
Test PASSed.


---




[GitHub] spark pull request #19631: [SPARK-22372][core, yarn] Make cluster submission...

2017-11-07 Thread ArtRand
Github user ArtRand commented on a diff in the pull request:

https://github.com/apache/spark/pull/19631#discussion_r149375835
  
--- Diff: core/src/main/scala/org/apache/spark/SecurityManager.scala ---
@@ -542,7 +496,55 @@ private[spark] class SecurityManager(
    * Gets the secret key.
    * @return the secret key as a String if authentication is enabled, otherwise returns null
    */
-  def getSecretKey(): String = secretKey
+  def getSecretKey(): String = {
+    if (isAuthenticationEnabled) {
+      Option(sparkConf.getenv(ENV_AUTH_SECRET))
+        .orElse(sparkConf.getOption(SPARK_AUTH_SECRET_CONF))
+        .getOrElse {
+          throw new IllegalArgumentException(
+            s"A secret key must be specified via the $SPARK_AUTH_SECRET_CONF config")
+        }
+    } else {
+      null
+    }
+  }
+
+  /**
+   * Initialize the configuration object held by this class for authentication.
+   *
+   * If authentication is disabled, do nothing.
+   *
+   * In YARN mode, generate a secret key and store it in the configuration object, setting it up to
+   * also be propagated to executors using an env variable.
+   *
+   * In other modes, assert that the auth secret is set in the configuration.
+   */
+  def initializeAuth(): Unit = {
+    if (!sparkConf.get(NETWORK_AUTH_ENABLED)) {
+      return
+    }
+
+    if (sparkConf.get(SparkLauncher.SPARK_MASTER, null) != "yarn") {
+      require(sparkConf.contains(SPARK_AUTH_SECRET_CONF),
+        s"A secret key must be specified via the $SPARK_AUTH_SECRET_CONF config.")
+      return
+    }
+
+    // In YARN, force creation of a new secret if this is client mode. This ensures each
--- End diff --

Is there a reason this _has_ to be unique to YARN? Will this solve the 
problem (currently in Mesos) where the executors 
[bootstrap](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala#L193) 
without security (unless you 
"[bake](https://github.com/apache/spark/blob/e1960c3d6f380b0dfbba6ee5d8ac6da4bc29a698/core/src/main/scala/org/apache/spark/SparkConf.scala#L482)" 
the secret and secret config into the container image)? It looks like 
propagating the env var is only handled in the YARN case?


---




[GitHub] spark pull request #19631: [SPARK-22372][core, yarn] Make cluster submission...

2017-11-07 Thread ArtRand
Github user ArtRand commented on a diff in the pull request:

https://github.com/apache/spark/pull/19631#discussion_r149375879
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
@@ -365,22 +370,21 @@ object SparkSubmit extends CommandLineUtils with Logging {

     // This security manager will not need an auth secret, but set a dummy value in case
     // spark.authenticate is enabled, otherwise an exception is thrown.
--- End diff --

this comment is no longer true?


---




[GitHub] spark pull request #19631: [SPARK-22372][core, yarn] Make cluster submission...

2017-11-07 Thread ArtRand
Github user ArtRand commented on a diff in the pull request:

https://github.com/apache/spark/pull/19631#discussion_r149375998
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/rest/RestSubmissionClient.scala ---
@@ -398,9 +399,20 @@ private[spark] object RestSubmissionClient {
   val PROTOCOL_VERSION = "v1"

   /**
-   * Submit an application, assuming Spark parameters are specified through the given config.
-   * This is abstracted to its own method for testing purposes.
+   * Filter non-spark environment variables from any environment.
    */
+  private[rest] def filterSystemEnvironment(env: Map[String, String]): Map[String, String] = {
+    env.filterKeys { k =>
+      // SPARK_HOME is filtered out because it is usually wrong on the remote machine (SPARK-12345)
+      (k.startsWith("SPARK_") && k != "SPARK_ENV_LOADED" && k != "SPARK_HOME") ||
+        k.startsWith("MESOS_")
--- End diff --

Will this break Mesos when using [the mesos 
bundle](https://github.com/apache/spark/blob/master/conf/spark-env.sh.template#L33)?


---




[GitHub] spark issue #19674: [DO-NOT-MERGE] Investigate test failures related with SP...

2017-11-07 Thread ueshin
Github user ueshin commented on the issue:

https://github.com/apache/spark/pull/19674
  
@HyukjinKwon Awesome! Thanks for the investigation!
I didn't realize the "platform-dependent" behavior was this divergent.
I'll update my PR to avoid the time within DST. Thanks!


---




[GitHub] spark pull request #19285: [SPARK-22068][CORE]Reduce the duplicate code betw...

2017-11-07 Thread jiangxb1987
Github user jiangxb1987 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19285#discussion_r149379088
  
--- Diff: core/src/main/scala/org/apache/spark/storage/memory/MemoryStore.scala ---
@@ -261,37 +259,97 @@ private[spark] class MemoryStore(
           // If this task attempt already owns more unroll memory than is necessary to store the
           // block, then release the extra memory that will not be used.
           val excessUnrollMemory = unrollMemoryUsedByThisBlock - size
-          releaseUnrollMemoryForThisTask(MemoryMode.ON_HEAP, excessUnrollMemory)
+          releaseUnrollMemoryForThisTask(memoryMode, excessUnrollMemory)
           transferUnrollToStorage(size)
           true
         }
       }
+
       if (enoughStorageMemory) {
         entries.synchronized {
-          entries.put(blockId, entry)
+          entries.put(blockId, createMemoryEntry())
         }
         logInfo("Block %s stored as values in memory (estimated size %s, free %s)".format(
           blockId, Utils.bytesToString(size), Utils.bytesToString(maxMemory - blocksMemoryUsed)))
         Right(size)
       } else {
         assert(currentUnrollMemoryForThisTask >= unrollMemoryUsedByThisBlock,
           "released too much unroll memory")
+        Left(unrollMemoryUsedByThisBlock)
+      }
+    } else {
+      Left(unrollMemoryUsedByThisBlock)
+    }
+  }
+
+  /**
+   * Attempt to put the given block in memory store as values.
+   *
+   * It's possible that the iterator is too large to materialize and store in memory. To avoid
+   * OOM exceptions, this method will gradually unroll the iterator while periodically checking
+   * whether there is enough free memory. If the block is successfully materialized, then the
+   * temporary unroll memory used during the materialization is "transferred" to storage memory,
+   * so we won't acquire more memory than is actually needed to store the block.
+   *
+   * @return in case of success, the estimated size of the stored data. In case of failure, return
+   *         an iterator containing the values of the block. The returned iterator will be backed
+   *         by the combination of the partially-unrolled block and the remaining elements of the
+   *         original input iterator. The caller must either fully consume this iterator or call
+   *         `close()` on it in order to free the storage memory consumed by the partially-unrolled
+   *         block.
+   */
+  private[storage] def putIteratorAsValues[T](
+      blockId: BlockId,
+      values: Iterator[T],
+      classTag: ClassTag[T]): Either[PartiallyUnrolledIterator[T], Long] = {
+
+    // Underlying vector for unrolling the block
+    var vector = new SizeTrackingVector[T]()(classTag)
+    var arrayValues: Array[T] = null
+    var preciseSize: Long = -1
+
+    def storeValue(value: T): Unit = {
+      vector += value
+    }
+
+    def estimateSize(precise: Boolean): Long = {
+      if (precise) {
+        // We only need the precise size after all values are unrolled.
+        arrayValues = vector.toArray
+        preciseSize = SizeEstimator.estimate(arrayValues)
+        vector = null
--- End diff --

It looks scary to set `vector` to null inside the function `estimateSize`.


---




[GitHub] spark pull request #19285: [SPARK-22068][CORE]Reduce the duplicate code betw...

2017-11-07 Thread jiangxb1987
Github user jiangxb1987 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19285#discussion_r149379817
  
--- Diff: core/src/main/scala/org/apache/spark/storage/memory/MemoryStore.scala ---
@@ -261,37 +259,97 @@ private[spark] class MemoryStore(
           // If this task attempt already owns more unroll memory than is necessary to store the
           // block, then release the extra memory that will not be used.
           val excessUnrollMemory = unrollMemoryUsedByThisBlock - size
-          releaseUnrollMemoryForThisTask(MemoryMode.ON_HEAP, excessUnrollMemory)
+          releaseUnrollMemoryForThisTask(memoryMode, excessUnrollMemory)
           transferUnrollToStorage(size)
           true
         }
       }
+
       if (enoughStorageMemory) {
         entries.synchronized {
-          entries.put(blockId, entry)
+          entries.put(blockId, createMemoryEntry())
         }
         logInfo("Block %s stored as values in memory (estimated size %s, free %s)".format(
           blockId, Utils.bytesToString(size), Utils.bytesToString(maxMemory - blocksMemoryUsed)))
         Right(size)
       } else {
         assert(currentUnrollMemoryForThisTask >= unrollMemoryUsedByThisBlock,
           "released too much unroll memory")
+        Left(unrollMemoryUsedByThisBlock)
+      }
+    } else {
+      Left(unrollMemoryUsedByThisBlock)
+    }
+  }
+
+  /**
+   * Attempt to put the given block in memory store as values.
+   *
+   * It's possible that the iterator is too large to materialize and store in memory. To avoid
+   * OOM exceptions, this method will gradually unroll the iterator while periodically checking
+   * whether there is enough free memory. If the block is successfully materialized, then the
+   * temporary unroll memory used during the materialization is "transferred" to storage memory,
+   * so we won't acquire more memory than is actually needed to store the block.
+   *
+   * @return in case of success, the estimated size of the stored data. In case of failure, return
+   *         an iterator containing the values of the block. The returned iterator will be backed
+   *         by the combination of the partially-unrolled block and the remaining elements of the
+   *         original input iterator. The caller must either fully consume this iterator or call
+   *         `close()` on it in order to free the storage memory consumed by the partially-unrolled
+   *         block.
+   */
+  private[storage] def putIteratorAsValues[T](
+      blockId: BlockId,
+      values: Iterator[T],
+      classTag: ClassTag[T]): Either[PartiallyUnrolledIterator[T], Long] = {
+
+    // Underlying vector for unrolling the block
+    var vector = new SizeTrackingVector[T]()(classTag)
+    var arrayValues: Array[T] = null
+    var preciseSize: Long = -1
+
+    def storeValue(value: T): Unit = {
+      vector += value
+    }
+
+    def estimateSize(precise: Boolean): Long = {
+      if (precise) {
+        // We only need the precise size after all values are unrolled.
+        arrayValues = vector.toArray
+        preciseSize = SizeEstimator.estimate(arrayValues)
+        vector = null
+        preciseSize
+      } else {
+        vector.estimateSize()
+      }
+    }
+
+    def createMemoryEntry(): MemoryEntry[T] = {
+      // We successfully unrolled the entirety of this block
+      assert(arrayValues != null, "arrayValue shouldn't be null!")
+      assert(preciseSize != -1, "preciseSize shouldn't be -1")
+      val entry = new DeserializedMemoryEntry[T](arrayValues, preciseSize, classTag)
--- End diff --

Why do we need to create the val `entry`?


---




[GitHub] spark pull request #19285: [SPARK-22068][CORE]Reduce the duplicate code betw...

2017-11-07 Thread jiangxb1987
Github user jiangxb1987 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19285#discussion_r149379715
  
--- Diff: core/src/main/scala/org/apache/spark/storage/memory/MemoryStore.scala ---
@@ -261,37 +259,97 @@ private[spark] class MemoryStore(
           // If this task attempt already owns more unroll memory than is necessary to store the
           // block, then release the extra memory that will not be used.
           val excessUnrollMemory = unrollMemoryUsedByThisBlock - size
-          releaseUnrollMemoryForThisTask(MemoryMode.ON_HEAP, excessUnrollMemory)
+          releaseUnrollMemoryForThisTask(memoryMode, excessUnrollMemory)
           transferUnrollToStorage(size)
           true
         }
       }
+
       if (enoughStorageMemory) {
         entries.synchronized {
-          entries.put(blockId, entry)
+          entries.put(blockId, createMemoryEntry())
         }
         logInfo("Block %s stored as values in memory (estimated size %s, free %s)".format(
           blockId, Utils.bytesToString(size), Utils.bytesToString(maxMemory - blocksMemoryUsed)))
         Right(size)
       } else {
         assert(currentUnrollMemoryForThisTask >= unrollMemoryUsedByThisBlock,
           "released too much unroll memory")
+        Left(unrollMemoryUsedByThisBlock)
+      }
+    } else {
+      Left(unrollMemoryUsedByThisBlock)
+    }
+  }
+
+  /**
+   * Attempt to put the given block in memory store as values.
+   *
+   * It's possible that the iterator is too large to materialize and store in memory. To avoid
+   * OOM exceptions, this method will gradually unroll the iterator while periodically checking
+   * whether there is enough free memory. If the block is successfully materialized, then the
+   * temporary unroll memory used during the materialization is "transferred" to storage memory,
+   * so we won't acquire more memory than is actually needed to store the block.
+   *
+   * @return in case of success, the estimated size of the stored data. In case of failure, return
+   *         an iterator containing the values of the block. The returned iterator will be backed
+   *         by the combination of the partially-unrolled block and the remaining elements of the
+   *         original input iterator. The caller must either fully consume this iterator or call
+   *         `close()` on it in order to free the storage memory consumed by the partially-unrolled
+   *         block.
+   */
+  private[storage] def putIteratorAsValues[T](
+      blockId: BlockId,
+      values: Iterator[T],
+      classTag: ClassTag[T]): Either[PartiallyUnrolledIterator[T], Long] = {
+
+    // Underlying vector for unrolling the block
+    var vector = new SizeTrackingVector[T]()(classTag)
+    var arrayValues: Array[T] = null
+    var preciseSize: Long = -1
+
+    def storeValue(value: T): Unit = {
+      vector += value
+    }
+
+    def estimateSize(precise: Boolean): Long = {
+      if (precise) {
+        // We only need the precise size after all values are unrolled.
+        arrayValues = vector.toArray
+        preciseSize = SizeEstimator.estimate(arrayValues)
+        vector = null
+        preciseSize
+      } else {
+        vector.estimateSize()
+      }
+    }
+
+    def createMemoryEntry(): MemoryEntry[T] = {
+      // We successfully unrolled the entirety of this block
+      assert(arrayValues != null, "arrayValue shouldn't be null!")
+      assert(preciseSize != -1, "preciseSize shouldn't be -1")
--- End diff --

Under which condition would `preciseSize` be `-1`?


---




[GitHub] spark pull request #19285: [SPARK-22068][CORE]Reduce the duplicate code betw...

2017-11-07 Thread jiangxb1987
Github user jiangxb1987 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19285#discussion_r149374760
  
--- Diff: core/src/main/scala/org/apache/spark/storage/memory/MemoryStore.scala ---
@@ -162,26 +162,29 @@ private[spark] class MemoryStore(
   }

   /**
-   * Attempt to put the given block in memory store as values.
+   * Attempt to put the given block in memory store as values or bytes.
    *
    * It's possible that the iterator is too large to materialize and store in memory. To avoid
    * OOM exceptions, this method will gradually unroll the iterator while periodically checking
    * whether there is enough free memory. If the block is successfully materialized, then the
    * temporary unroll memory used during the materialization is "transferred" to storage memory,
    * so we won't acquire more memory than is actually needed to store the block.
    *
-   * @return in case of success, the estimated size of the stored data. In case of failure, return
-   *         an iterator containing the values of the block. The returned iterator will be backed
-   *         by the combination of the partially-unrolled block and the remaining elements of the
-   *         original input iterator. The caller must either fully consume this iterator or call
-   *         `close()` on it in order to free the storage memory consumed by the partially-unrolled
-   *         block.
+   * @param memoryMode The values saved mode.
--- End diff --

nit: also add param descriptions for `blockId`, `values` and `classTag`.


---




[GitHub] spark issue #19674: [DO-NOT-MERGE] Investigate test failures related with SP...

2017-11-07 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/19674
  
Let me leave this closed!


---




[GitHub] spark pull request #19674: [DO-NOT-MERGE] Investigate test failures related ...

2017-11-07 Thread HyukjinKwon
Github user HyukjinKwon closed the pull request at:

https://github.com/apache/spark/pull/19674


---




[GitHub] spark pull request #17436: [SPARK-20101][SQL] Use OffHeapColumnVector when "...

2017-11-07 Thread jiangxb1987
Github user jiangxb1987 commented on a diff in the pull request:

https://github.com/apache/spark/pull/17436#discussion_r149382131
  
--- Diff: core/src/test/scala/org/apache/spark/memory/StaticMemoryManagerSuite.scala ---
@@ -43,12 +43,13 @@ class StaticMemoryManagerSuite extends MemoryManagerSuite {

   override protected def createMemoryManager(
       maxOnHeapExecutionMemory: Long,
-      maxOffHeapExecutionMemory: Long): StaticMemoryManager = {
+      maxOffHeapExecutionMemory: Long = 1000): StaticMemoryManager = {
     new StaticMemoryManager(
       conf.clone
         .set("spark.memory.fraction", "1")
         .set("spark.testing.memory", maxOnHeapExecutionMemory.toString)
-        .set("spark.memory.offHeap.size", maxOffHeapExecutionMemory.toString),
+        .set("spark.memory.offHeap.size",
--- End diff --

How about moving the configuration to `internal/config`?


---




[GitHub] spark pull request #17436: [SPARK-20101][SQL] Use OffHeapColumnVector when "...

2017-11-07 Thread jiangxb1987
Github user jiangxb1987 commented on a diff in the pull request:

https://github.com/apache/spark/pull/17436#discussion_r149382213
  
--- Diff: core/src/test/scala/org/apache/spark/memory/StaticMemoryManagerSuite.scala ---
@@ -43,12 +43,13 @@ class StaticMemoryManagerSuite extends MemoryManagerSuite {

   override protected def createMemoryManager(
       maxOnHeapExecutionMemory: Long,
-      maxOffHeapExecutionMemory: Long): StaticMemoryManager = {
+      maxOffHeapExecutionMemory: Long = 1000): StaticMemoryManager = {
--- End diff --

Please update this line.


---




[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...

2017-11-07 Thread pralabhkumar
Github user pralabhkumar commented on the issue:

https://github.com/apache/spark/pull/18118
  
Ping @MLnick @jkbradley. @sethah has given LGTM.


---




[GitHub] spark issue #19479: [SPARK-17074] [SQL] Generate equi-height histogram in co...

2017-11-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19479
  
**[Test build #83545 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83545/testReport)**
 for PR 19479 at commit 
[`804b375`](https://github.com/apache/spark/commit/804b37565b2f5d61edd492d415475a59afec41f5).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class EquiHeightHistogram(height: Double, buckets: 
Array[EquiHeightBucket]) `


---




[GitHub] spark issue #19479: [SPARK-17074] [SQL] Generate equi-height histogram in co...

2017-11-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19479
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83545/
Test PASSed.


---




[GitHub] spark issue #19479: [SPARK-17074] [SQL] Generate equi-height histogram in co...

2017-11-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19479
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19672: [SPARK-22456][SQL] Add support for dayofweek function

2017-11-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19672
  
**[Test build #83549 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83549/testReport)**
 for PR 19672 at commit 
[`edbe583`](https://github.com/apache/spark/commit/edbe583583a066a03df5b97a3067412f1fa51887).


---




[GitHub] spark issue #19651: [SPARK-20682][SPARK-15474][SPARK-21791] Add new ORCFileF...

2017-11-07 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/19651
  
The PR is updated according to your advice. Thank you again, @cloud-fan!


---




[GitHub] spark pull request #17436: [SPARK-20101][SQL] Use OffHeapColumnVector when "...

2017-11-07 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/17436#discussion_r149415162
  
--- Diff: core/src/test/scala/org/apache/spark/memory/StaticMemoryManagerSuite.scala ---
@@ -43,12 +43,13 @@ class StaticMemoryManagerSuite extends MemoryManagerSuite {

   override protected def createMemoryManager(
       maxOnHeapExecutionMemory: Long,
-      maxOffHeapExecutionMemory: Long): StaticMemoryManager = {
+      maxOffHeapExecutionMemory: Long = 1000): StaticMemoryManager = {
     new StaticMemoryManager(
       conf.clone
         .set("spark.memory.fraction", "1")
         .set("spark.testing.memory", maxOnHeapExecutionMemory.toString)
-        .set("spark.memory.offHeap.size", maxOffHeapExecutionMemory.toString),
+        .set("spark.memory.offHeap.size",
--- End diff --

I see. I will move `"spark.memory.offHeap.size"` to 
`org.apache.spark.internal`.


---




[GitHub] spark issue #19250: [SPARK-12297] Table timezone correction for Timestamps

2017-11-07 Thread zivanfi
Github user zivanfi commented on the issue:

https://github.com/apache/spark/pull/19250
  
The interoperability issue is that Impala follows timezone-agnostic 
timestamp semantics as mandated by the SQL standard, while SparkSQL follows 
UTC-normalized semantics instead (which is not SQL-compliant). Please see the 
[design doc](https://docs.google.com/document/d/1XmyVjr3eOJiNFjVeSnmjIU60Hq-XiZB03pgi3r1razM/edit) 
for details.

The long-term solution is indeed using separate SQL types for different 
timestamp semantics, as you suggest. However, until we get there we would 
like to have a short-term solution that fixes timestamps that are _already 
written_.


---




[GitHub] spark pull request #17436: [SPARK-20101][SQL] Use OffHeapColumnVector when "...

2017-11-07 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/17436#discussion_r149418567
  
--- Diff: core/src/test/scala/org/apache/spark/memory/StaticMemoryManagerSuite.scala ---
@@ -43,12 +43,13 @@ class StaticMemoryManagerSuite extends MemoryManagerSuite {

   override protected def createMemoryManager(
       maxOnHeapExecutionMemory: Long,
-      maxOffHeapExecutionMemory: Long): StaticMemoryManager = {
+      maxOffHeapExecutionMemory: Long = 1000): StaticMemoryManager = {
     new StaticMemoryManager(
       conf.clone
         .set("spark.memory.fraction", "1")
         .set("spark.testing.memory", maxOnHeapExecutionMemory.toString)
-        .set("spark.memory.offHeap.size", maxOffHeapExecutionMemory.toString),
+        .set("spark.memory.offHeap.size",
--- End diff --

Adding `= 1000` is intentional and necessary.
If the original line is used, `spark.memory.offHeap.size` is zero even when 
`spark.memory.offHeap.enabled == true`. This is because the call sites do not 
pass the second argument, so the default value (= 0) is used.
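
A self-contained illustration of the default-argument behavior (hypothetical 
names, just to show the point):

```scala
// Without a non-zero default, callers that omit the second argument get 0
// for the off-heap size, silently disabling off-heap allocation in tests.
def createMM(onHeap: Long, offHeap: Long = 1000): (Long, Long) = (onHeap, offHeap)

createMM(4096)        // (4096, 1000): the default applies
createMM(4096, 2048)  // (4096, 2048): an explicit value wins
```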


---




[GitHub] spark issue #19479: [SPARK-17074] [SQL] Generate equi-height histogram in co...

2017-11-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19479
  
**[Test build #83547 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83547/testReport)**
 for PR 19479 at commit 
[`fa338dd`](https://github.com/apache/spark/commit/fa338ddcb655f6e421b1be35fdd8dcd5cd866df0).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19479: [SPARK-17074] [SQL] Generate equi-height histogram in co...

2017-11-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19479
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19479: [SPARK-17074] [SQL] Generate equi-height histogram in co...

2017-11-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19479
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83547/
Test PASSed.


---




[GitHub] spark issue #19640: [SPARK-16986][CORE][WEB-UI] Support configure history se...

2017-11-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19640
  
**[Test build #83550 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83550/testReport)**
 for PR 19640 at commit 
[`5291d16`](https://github.com/apache/spark/commit/5291d1641c1a5dcbacd7579c11430ea791acfd56).


---




[GitHub] spark pull request #19685: [SPARK-19759][ML] not using blas in ALSModel.pred...

2017-11-07 Thread mgaido91
GitHub user mgaido91 opened a pull request:

https://github.com/apache/spark/pull/19685

[SPARK-19759][ML] not using blas in ALSModel.predict for optimization

## What changes were proposed in this pull request?

In `ALSModel.predict` we currently use the `blas.sdot` function to perform a 
dot product on two `Seq`s. It turns out that this is not the most efficient way.

I used the following code to compare the implementations:

```
def time[R](block: => R): Unit = {
  val t0 = System.nanoTime()
  block
  val t1 = System.nanoTime()
  println("Elapsed time: " + (t1 - t0) + "ns")
}

val r = new scala.util.Random(100)
val input = (1 to 50).map(_ => (1 to 100).map(_ => r.nextFloat).toSeq)

def f(a: Seq[Float], b: Seq[Float]): Float = {
  var r = 0.0f
  for (i <- 0 until a.length) {
    r += a(i) * b(i)
  }
  r
}

import com.github.fommil.netlib.BLAS.{getInstance => blas}
val b = (1 to 100).map(_ => r.nextFloat).toSeq

time { input.foreach(a => blas.sdot(100, a.toArray, 1, b.toArray, 1)) }
// on average it takes 2968718815 ns
time { input.foreach(a => f(a, b)) }
// on average it takes 515510185 ns
```

Thus this PR proposes the old-style for loop implementation for performance 
reasons.

## How was this patch tested?

existing UTs


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mgaido91/spark SPARK-19759

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19685.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19685


commit 8b0add68ddf68939c0f5ac19836f9b3b9cc58432
Author: Marco Gaido 
Date:   2017-11-07T16:29:23Z

[SPARK-19759][ML] not using blas in ALSModel.predict for optimization




---




[GitHub] spark issue #19685: [SPARK-19759][ML] not using blas in ALSModel.predict for...

2017-11-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19685
  
**[Test build #83551 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83551/testReport)**
 for PR 19685 at commit 
[`8b0add6`](https://github.com/apache/spark/commit/8b0add68ddf68939c0f5ac19836f9b3b9cc58432).


---




[GitHub] spark pull request #19657: [SPARK-22344][SPARKR] clean up install dir if run...

2017-11-07 Thread shivaram
Github user shivaram commented on a diff in the pull request:

https://github.com/apache/spark/pull/19657#discussion_r149436164
  
--- Diff: R/pkg/tests/fulltests/test_utils.R ---
@@ -236,4 +236,23 @@ test_that("basenameSansExtFromUrl", {
   expect_equal(basenameSansExtFromUrl(z), "spark-2.1.0--hive")
 })
 
+test_that("getOne", {
+  dummy <- getOne(".dummyValue", envir = new.env(), ifnotfound = FALSE)
+  expect_equal(dummy, FALSE)
+})
+
+test_that("traverseParentDirs", {
+  if (is_windows()) {
+    dirs <- traverseParentDirs("c:\\Users\\user\\AppData\\Local\\Apache\\Spark\\Cache\\spark2.2", 3)
+    expect <- c("c:\\Users\\user\\AppData\\Local\\Apache\\Spark\\Cache\\spark2.2",
+                "c:\\Users\\user\\AppData\\Local\\Apache\\Spark\\Cache",
+                "c:\\Users\\user\\AppData\\Local\\Apache\\Spark",
+                "c:\\Users\\user\\AppData\\Local\\Apache")
+  } else {
+    dirs <- traverseParentDirs("/Users/user/Library/Caches/spark/spark2.2", 1)
--- End diff --

Can we also test the Linux one (`/home/user/.cache`)? Just want to make sure we don't miss hidden files / directories.


---




[GitHub] spark issue #19657: [SPARK-22344][SPARKR] clean up install dir if running te...

2017-11-07 Thread shivaram
Github user shivaram commented on the issue:

https://github.com/apache/spark/pull/19657
  
Thanks @felixcheung -- the AppVeyor test seems to have failed with the following error:
```
1. Failure: traverseParentDirs (@test_utils.R#255) ------------------------
`dirs` not equal to `expect`.
3/4 mismatches
x[2]: "c:/Users/user/AppData/Local/Apache/Spark/Cache"
y[2]: "c:\\Users\\user\\AppData\\Local\\Apache\\Spark\\Cache"
x[3]: "c:/Users/user/AppData/Local/Apache/Spark"
y[3]: "c:\\Users\\user\\AppData\\Local\\Apache\\Spark"
x[4]: "c:/Users/user/AppData/Local/Apache"
y[4]: "c:\\Users\\user\\AppData\\Local\\Apache"
```

@HyukjinKwon could you also take a quick look at this PR ?


---




[GitHub] spark issue #19390: [SPARK-18935][MESOS] Fix dynamic reservations on mesos

2017-11-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19390
  
**[Test build #83548 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83548/testReport)**
 for PR 19390 at commit 
[`dbf7875`](https://github.com/apache/spark/commit/dbf787593d72afe16cadd9731a1dd1cba608451c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19390: [SPARK-18935][MESOS] Fix dynamic reservations on mesos

2017-11-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19390
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83548/
Test PASSed.


---




[GitHub] spark issue #19390: [SPARK-18935][MESOS] Fix dynamic reservations on mesos

2017-11-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19390
  
Merged build finished. Test PASSed.


---




[GitHub] spark pull request #19682: [SPARK-22464] [SQL] No pushdown for Hive metastor...

2017-11-07 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/19682#discussion_r149445286
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala ---
@@ -592,6 +592,19 @@ private[client] class Shim_v0_13 extends Shim_v0_12 {
     }
   }
 
+
+  /**
+   * An extractor that matches all binary comparison operators except null-safe equality.
+   *
+   * Null-safe equality is not supported by Hive metastore partition predicate pushdown.
+   */
+  object OperatorsInMetastorePartitionFPD {
--- End diff --

Done.


---




[GitHub] spark issue #19682: [SPARK-22464] [SQL] No pushdown for Hive metastore parti...

2017-11-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19682
  
**[Test build #83552 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83552/testReport)**
 for PR 19682 at commit 
[`bae7494`](https://github.com/apache/spark/commit/bae74941c28a3efb0b713e97b5fb57af7be654eb).


---




[GitHub] spark issue #19657: [SPARK-22344][SPARKR] clean up install dir if running te...

2017-11-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19657
  
**[Test build #83553 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83553/testReport)**
 for PR 19657 at commit 
[`f2aa5b7`](https://github.com/apache/spark/commit/f2aa5b7e12ed36e7b56610e695615260643f952f).


---




[GitHub] spark issue #19667: [SPARK-21127][SQL][followup] fix a config name typo

2017-11-07 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/19667
  
LGTM

Thanks! Merged to master


---




[GitHub] spark issue #19679: [SPARK-20647][core] Port StorageTab to the new UI backen...

2017-11-07 Thread squito
Github user squito commented on the issue:

https://github.com/apache/spark/pull/19679
  
I figured out what is going on with StreamBlocks now, both before and after this change, so I'm OK with it.


---




[GitHub] spark pull request #19667: [SPARK-21127][SQL][followup] fix a config name ty...

2017-11-07 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/19667


---




[GitHub] spark issue #19674: [DO-NOT-MERGE] Investigate test failures related with SP...

2017-11-07 Thread BryanCutler
Github user BryanCutler commented on the issue:

https://github.com/apache/spark/pull/19674
  
Thanks for investigating this, @HyukjinKwon! Regarding the Python 2.7 versions, I thought all workers were updated to use pandas 0.19.2 and pyarrow 0.4.1, not just for py3. I'll bring this up when we start updating Arrow.


---




[GitHub] spark pull request #19657: [SPARK-22344][SPARKR] clean up install dir if run...

2017-11-07 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/19657#discussion_r149448556
  
--- Diff: R/pkg/tests/fulltests/test_utils.R ---
@@ -236,4 +236,23 @@ test_that("basenameSansExtFromUrl", {
   expect_equal(basenameSansExtFromUrl(z), "spark-2.1.0--hive")
 })
 
+test_that("getOne", {
+  dummy <- getOne(".dummyValue", envir = new.env(), ifnotfound = FALSE)
+  expect_equal(dummy, FALSE)
+})
+
+test_that("traverseParentDirs", {
+  if (is_windows()) {
+    dirs <- traverseParentDirs("c:\\Users\\user\\AppData\\Local\\Apache\\Spark\\Cache\\spark2.2", 3)
+    expect <- c("c:\\Users\\user\\AppData\\Local\\Apache\\Spark\\Cache\\spark2.2",
+                "c:\\Users\\user\\AppData\\Local\\Apache\\Spark\\Cache",
+                "c:\\Users\\user\\AppData\\Local\\Apache\\Spark",
+                "c:\\Users\\user\\AppData\\Local\\Apache")
+  } else {
+    dirs <- traverseParentDirs("/Users/user/Library/Caches/spark/spark2.2", 1)
--- End diff --

Sure, though hopefully the implementation is not platform dependent; otherwise we would need to test Linux as well as OS X. (It also doesn't check whether the path is valid/present.)


---




[GitHub] spark issue #19657: [SPARK-22344][SPARKR] clean up install dir if running te...

2017-11-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19657
  
**[Test build #83554 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83554/testReport)**
 for PR 19657 at commit 
[`f21a90b`](https://github.com/apache/spark/commit/f21a90bef2a08c9d4cfdcc6588fb2da64679b4ec).


---




[GitHub] spark pull request #19631: [SPARK-22372][core, yarn] Make cluster submission...

2017-11-07 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/19631#discussion_r149451263
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/rest/RestSubmissionClient.scala ---
@@ -398,9 +399,20 @@ private[spark] object RestSubmissionClient {
   val PROTOCOL_VERSION = "v1"
 
   /**
-   * Submit an application, assuming Spark parameters are specified through the given config.
-   * This is abstracted to its own method for testing purposes.
+   * Filter non-spark environment variables from any environment.
    */
+  private[rest] def filterSystemEnvironment(env: Map[String, String]): Map[String, String] = {
+    env.filterKeys { k =>
+      // SPARK_HOME is filtered out because it is usually wrong on the remote machine (SPARK-12345)
+      (k.startsWith("SPARK_") && k != "SPARK_ENV_LOADED" && k != "SPARK_HOME") ||
+        k.startsWith("MESOS_")
--- End diff --

I don't know, but this should be the exact same behavior as before, no? I 
didn't really change this code.
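
For readers following the thread, a self-contained illustration of what the quoted filter keeps and drops (the sample environment map is hypothetical):

```
// Mirrors the predicate in the quoted diff: keep SPARK_* vars (minus the two
// exceptions) and MESOS_* vars; drop everything else.
val env = Map(
  "SPARK_SCALA_VERSION" -> "2.11",   // kept: generic SPARK_* variable
  "SPARK_HOME" -> "/opt/spark",      // dropped: usually wrong on the remote machine
  "SPARK_ENV_LOADED" -> "1",         // dropped: excluded so spark-env.sh loads again remotely
  "MESOS_EXECUTOR_ID" -> "exec-1",   // kept: MESOS_* passes the filter
  "PATH" -> "/usr/bin")              // dropped: neither SPARK_* nor MESOS_*

val filtered = env.filterKeys { k =>
  (k.startsWith("SPARK_") && k != "SPARK_ENV_LOADED" && k != "SPARK_HOME") ||
    k.startsWith("MESOS_")
}
// filtered.keySet == Set("SPARK_SCALA_VERSION", "MESOS_EXECUTOR_ID")
```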


---




[GitHub] spark pull request #19631: [SPARK-22372][core, yarn] Make cluster submission...

2017-11-07 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/19631#discussion_r149451751
  
--- Diff: core/src/main/scala/org/apache/spark/SecurityManager.scala ---
@@ -542,7 +496,55 @@ private[spark] class SecurityManager(
    * Gets the secret key.
    * @return the secret key as a String if authentication is enabled, otherwise returns null
    */
-  def getSecretKey(): String = secretKey
+  def getSecretKey(): String = {
+    if (isAuthenticationEnabled) {
+      Option(sparkConf.getenv(ENV_AUTH_SECRET))
+        .orElse(sparkConf.getOption(SPARK_AUTH_SECRET_CONF))
+        .getOrElse {
+          throw new IllegalArgumentException(
+            s"A secret key must be specified via the $SPARK_AUTH_SECRET_CONF config")
+        }
+    } else {
+      null
+    }
+  }
+
+  /**
+   * Initialize the configuration object held by this class for authentication.
+   *
+   * If authentication is disabled, do nothing.
+   *
+   * In YARN mode, generate a secret key and store it in the configuration object, setting it up to
+   * also be propagated to executors using an env variable.
+   *
+   * In other modes, assert that the auth secret is set in the configuration.
+   */
+  def initializeAuth(): Unit = {
+    if (!sparkConf.get(NETWORK_AUTH_ENABLED)) {
+      return
+    }
+
+    if (sparkConf.get(SparkLauncher.SPARK_MASTER, null) != "yarn") {
+      require(sparkConf.contains(SPARK_AUTH_SECRET_CONF),
+        s"A secret key must be specified via the $SPARK_AUTH_SECRET_CONF config.")
+      return
+    }
+
+    // In YARN, force creation of a new secret if this is client mode. This ensures each
--- End diff --

This behaves the same way as before for non-YARN. Standalone and Mesos have always used hardcoded secrets in the config to authenticate executors to the driver, and the driver to the master (in the case of standalone).

You can see the code I'm changing in this class, where for non-YARN it 
would throw an error if the secret was not set. If changing that behavior is 
desired for Mesos, then it should be done in a separate change.
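
For context, a minimal sketch of the non-YARN requirement described above, assuming the long-standing `spark.authenticate` / `spark.authenticate.secret` config keys (the secret value is a placeholder):

```
import org.apache.spark.SparkConf

// Standalone/Mesos: the same pre-shared secret must be configured on the
// driver, the master, and the executors, or initializeAuth() fails fast.
val conf = new SparkConf()
  .set("spark.authenticate", "true")
  .set("spark.authenticate.secret", "<shared-secret>")
```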


---




[GitHub] spark issue #19631: [SPARK-22372][core, yarn] Make cluster submission use Sp...

2017-11-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19631
  
**[Test build #83555 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83555/testReport)**
 for PR 19631 at commit 
[`cb6c8ff`](https://github.com/apache/spark/commit/cb6c8ffe9eb04e05b8cd982e123ae962bd2e7373).


---




[GitHub] spark issue #19685: [SPARK-19759][ML] not using blas in ALSModel.predict for...

2017-11-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19685
  
**[Test build #83551 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83551/testReport)**
 for PR 19685 at commit 
[`8b0add6`](https://github.com/apache/spark/commit/8b0add68ddf68939c0f5ac19836f9b3b9cc58432).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19685: [SPARK-19759][ML] not using blas in ALSModel.predict for...

2017-11-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19685
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19685: [SPARK-19759][ML] not using blas in ALSModel.predict for...

2017-11-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19685
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83551/
Test PASSed.


---




[GitHub] spark issue #19678: [SPARK-20646][core] Port executors page to new UI backen...

2017-11-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19678
  
**[Test build #83556 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83556/testReport)**
 for PR 19678 at commit 
[`e4c427a`](https://github.com/apache/spark/commit/e4c427afebfa6ce2a234076eda4e4e426c4006dc).


---




[GitHub] spark pull request #19631: [SPARK-22372][core, yarn] Make cluster submission...

2017-11-07 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/19631#discussion_r149454492
  
--- Diff: core/src/main/scala/org/apache/spark/SecurityManager.scala ---
@@ -542,7 +496,55 @@ private[spark] class SecurityManager(
    * Gets the secret key.
    * @return the secret key as a String if authentication is enabled, otherwise returns null
    */
-  def getSecretKey(): String = secretKey
+  def getSecretKey(): String = {
+    if (isAuthenticationEnabled) {
+      Option(sparkConf.getenv(ENV_AUTH_SECRET))
+        .orElse(sparkConf.getOption(SPARK_AUTH_SECRET_CONF))
+        .getOrElse {
+          throw new IllegalArgumentException(
+            s"A secret key must be specified via the $SPARK_AUTH_SECRET_CONF config")
+        }
+    } else {
+      null
+    }
+  }
+
+  /**
+   * Initialize the configuration object held by this class for authentication.
+   *
+   * If authentication is disabled, do nothing.
+   *
+   * In YARN mode, generate a secret key and store it in the configuration object, setting it up to
+   * also be propagated to executors using an env variable.
+   *
+   * In other modes, assert that the auth secret is set in the configuration.
+   */
+  def initializeAuth(): Unit = {
+    if (!sparkConf.get(NETWORK_AUTH_ENABLED)) {
+      return
+    }
+
+    if (sparkConf.get(SparkLauncher.SPARK_MASTER, null) != "yarn") {
+      require(sparkConf.contains(SPARK_AUTH_SECRET_CONF),
+        s"A secret key must be specified via the $SPARK_AUTH_SECRET_CONF config.")
+      return
+    }
+
+    // In YARN, force creation of a new secret if this is client mode. This ensures each
--- End diff --

BTW, standalone at least propagates the secret using an env var; the issue is that standalone needs the same secret everywhere, including the part where the driver authenticates with the master. Mesos just inherited that.


---




[GitHub] spark pull request #19631: [SPARK-22372][core, yarn] Make cluster submission...

2017-11-07 Thread ArtRand
Github user ArtRand commented on a diff in the pull request:

https://github.com/apache/spark/pull/19631#discussion_r149455493
  
--- Diff: core/src/main/scala/org/apache/spark/SecurityManager.scala ---
@@ -542,7 +496,55 @@ private[spark] class SecurityManager(
    * Gets the secret key.
    * @return the secret key as a String if authentication is enabled, otherwise returns null
    */
-  def getSecretKey(): String = secretKey
+  def getSecretKey(): String = {
+    if (isAuthenticationEnabled) {
+      Option(sparkConf.getenv(ENV_AUTH_SECRET))
+        .orElse(sparkConf.getOption(SPARK_AUTH_SECRET_CONF))
+        .getOrElse {
+          throw new IllegalArgumentException(
+            s"A secret key must be specified via the $SPARK_AUTH_SECRET_CONF config")
+        }
+    } else {
+      null
+    }
+  }
+
+  /**
+   * Initialize the configuration object held by this class for authentication.
+   *
+   * If authentication is disabled, do nothing.
+   *
+   * In YARN mode, generate a secret key and store it in the configuration object, setting it up to
+   * also be propagated to executors using an env variable.
+   *
+   * In other modes, assert that the auth secret is set in the configuration.
+   */
+  def initializeAuth(): Unit = {
+    if (!sparkConf.get(NETWORK_AUTH_ENABLED)) {
+      return
+    }
+
+    if (sparkConf.get(SparkLauncher.SPARK_MASTER, null) != "yarn") {
+      require(sparkConf.contains(SPARK_AUTH_SECRET_CONF),
+        s"A secret key must be specified via the $SPARK_AUTH_SECRET_CONF config.")
+      return
+    }
+
+    // In YARN, force creation of a new secret if this is client mode. This ensures each
--- End diff --

Yes. I guess I'm wondering whether, with your change, this will work in all cases, not just YARN. Perhaps obviously, I'm looking into changing this for Mesos.


---




[GitHub] spark issue #19672: [SPARK-22456][SQL] Add support for dayofweek function

2017-11-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19672
  
**[Test build #83549 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83549/testReport)**
 for PR 19672 at commit 
[`edbe583`](https://github.com/apache/spark/commit/edbe583583a066a03df5b97a3067412f1fa51887).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19672: [SPARK-22456][SQL] Add support for dayofweek function

2017-11-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19672
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19672: [SPARK-22456][SQL] Add support for dayofweek function

2017-11-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19672
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83549/
Test PASSed.


---




[GitHub] spark pull request #19631: [SPARK-22372][core, yarn] Make cluster submission...

2017-11-07 Thread ArtRand
Github user ArtRand commented on a diff in the pull request:

https://github.com/apache/spark/pull/19631#discussion_r149456772
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/rest/RestSubmissionClient.scala ---
@@ -398,9 +399,20 @@ private[spark] object RestSubmissionClient {
   val PROTOCOL_VERSION = "v1"
 
   /**
-   * Submit an application, assuming Spark parameters are specified through the given config.
-   * This is abstracted to its own method for testing purposes.
+   * Filter non-spark environment variables from any environment.
    */
+  private[rest] def filterSystemEnvironment(env: Map[String, String]): Map[String, String] = {
+    env.filterKeys { k =>
+      // SPARK_HOME is filtered out because it is usually wrong on the remote machine (SPARK-12345)
+      (k.startsWith("SPARK_") && k != "SPARK_ENV_LOADED" && k != "SPARK_HOME") ||
+        k.startsWith("MESOS_")
--- End diff --

Yes, I apologize; you're correct. I think this is actually to filter out things like `MESOS_EXECUTOR_ID` and `MESOS_FRAMEWORK_ID`.


---




[GitHub] spark pull request #19631: [SPARK-22372][core, yarn] Make cluster submission...

2017-11-07 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/19631#discussion_r149459107
  
--- Diff: core/src/main/scala/org/apache/spark/SecurityManager.scala ---
@@ -542,7 +496,55 @@ private[spark] class SecurityManager(
    * Gets the secret key.
    * @return the secret key as a String if authentication is enabled, otherwise returns null
    */
-  def getSecretKey(): String = secretKey
+  def getSecretKey(): String = {
+    if (isAuthenticationEnabled) {
+      Option(sparkConf.getenv(ENV_AUTH_SECRET))
+        .orElse(sparkConf.getOption(SPARK_AUTH_SECRET_CONF))
+        .getOrElse {
+          throw new IllegalArgumentException(
+            s"A secret key must be specified via the $SPARK_AUTH_SECRET_CONF config")
+        }
+    } else {
+      null
+    }
+  }
+
+  /**
+   * Initialize the configuration object held by this class for authentication.
+   *
+   * If authentication is disabled, do nothing.
+   *
+   * In YARN mode, generate a secret key and store it in the configuration object, setting it up to
+   * also be propagated to executors using an env variable.
+   *
+   * In other modes, assert that the auth secret is set in the configuration.
+   */
+  def initializeAuth(): Unit = {
+    if (!sparkConf.get(NETWORK_AUTH_ENABLED)) {
+      return
+    }
+
+    if (sparkConf.get(SparkLauncher.SPARK_MASTER, null) != "yarn") {
+      require(sparkConf.contains(SPARK_AUTH_SECRET_CONF),
+        s"A secret key must be specified via the $SPARK_AUTH_SECRET_CONF config.")
+      return
+    }
+
+    // In YARN, force creation of a new secret if this is client mode. This ensures each
--- End diff --

I'm not changing the previous behavior for non-YARN, so my change shouldn't make it easier or harder to make things work for other cluster managers.

Whether it will work depends on how the auth secret is used in those cases.


---




[GitHub] spark issue #17436: [SPARK-20101][SQL] Use OffHeapColumnVector when "spark.m...

2017-11-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17436
  
**[Test build #83557 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83557/testReport)**
 for PR 17436 at commit 
[`5fe810e`](https://github.com/apache/spark/commit/5fe810ee88165650415b01711708f9dc1dcdf251).


---




[GitHub] spark issue #17436: [SPARK-20101][SQL] Use OffHeapColumnVector when "spark.m...

2017-11-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17436
  
**[Test build #83557 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83557/testReport)**
 for PR 17436 at commit 
[`5fe810e`](https://github.com/apache/spark/commit/5fe810ee88165650415b01711708f9dc1dcdf251).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #17436: [SPARK-20101][SQL] Use OffHeapColumnVector when "spark.m...

2017-11-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17436
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83557/
Test FAILed.


---




[GitHub] spark issue #17436: [SPARK-20101][SQL] Use OffHeapColumnVector when "spark.m...

2017-11-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17436
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #19683: [SPARK-21657][SQL] optimize explode quadratic memory con...

2017-11-07 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/19683
  
ok to test


---




[GitHub] spark issue #19657: [SPARK-22344][SPARKR] clean up install dir if running te...

2017-11-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19657
  
**[Test build #83553 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83553/testReport)**
 for PR 19657 at commit 
[`f2aa5b7`](https://github.com/apache/spark/commit/f2aa5b7e12ed36e7b56610e695615260643f952f).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19657: [SPARK-22344][SPARKR] clean up install dir if running te...

2017-11-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19657
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19657: [SPARK-22344][SPARKR] clean up install dir if running te...

2017-11-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19657
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83553/
Test PASSed.


---




[GitHub] spark issue #19683: [SPARK-21657][SQL] optimize explode quadratic memory con...

2017-11-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19683
  
**[Test build #83558 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83558/testReport)**
 for PR 19683 at commit 
[`ce7c369`](https://github.com/apache/spark/commit/ce7c3694a99584348957dc756234bb667466be4e).


---




[GitHub] spark issue #19683: [SPARK-21657][SQL] optimize explode quadratic memory con...

2017-11-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19683
  
**[Test build #83558 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83558/testReport)**
 for PR 19683 at commit 
[`ce7c369`](https://github.com/apache/spark/commit/ce7c3694a99584348957dc756234bb667466be4e).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19683: [SPARK-21657][SQL] optimize explode quadratic memory con...

2017-11-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19683
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #19683: [SPARK-21657][SQL] optimize explode quadratic memory con...

2017-11-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19683
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83558/
Test FAILed.


---



