[GitHub] spark issue #19657: [SPARK-22344][SPARKR] clean up install dir if running te...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19657 **[Test build #83544 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83544/testReport)** for PR 19657 at commit [`ca5349b`](https://github.com/apache/spark/commit/ca5349bfc0dae03c2402b104e51c78a841541b09). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19657: [SPARK-22344][SPARKR] clean up install dir if running te...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19657 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83544/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19683: [SPARK-21657][SQL] optimize explode quadratic mem...
GitHub user uzadude opened a pull request: https://github.com/apache/spark/pull/19683 [SPARK-21657][SQL] optimize explode quadratic memory consumption

## What changes were proposed in this pull request?

The issue has been raised in two Jira tickets: [SPARK-21657](https://issues.apache.org/jira/browse/SPARK-21657), [SPARK-16998](https://issues.apache.org/jira/browse/SPARK-16998). Basically, what happens is that collection generators like explode/inline create many rows from each input row, and currently each exploded row also contains the column it was generated from. For example, if one row holds a 10k-element array, that array gets copied into each of the 10k generated rows, which results in quadratic memory consumption. However, it is a common case that the original column gets projected out after the explode, so we can avoid duplicating it. In this solution we propose to identify this situation in the optimizer and turn on a flag for omitting the original column in the generation process.

## How was this patch tested?

1. We added a benchmark test to MiscBenchmark that shows a 16x improvement in runtime.
2. We ran some of the other tests in MiscBenchmark and they show 15% improvements.
3. We ran this code on a specific case from our production data with rows containing arrays of size ~200k and it reduced the runtime from 6 hours to 3 minutes.

You can merge this pull request into a Git repository by running: $ git pull https://github.com/uzadude/spark optimize_explode Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/19683.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #19683 commit ce7c3694a99584348957dc756234bb667466be4e Author: oraviv Date: 2017-11-07T11:34:21Z [SPARK-21657][SQL] optimize explode quadratic memory consumption --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
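For readers skimming the archive, here is a minimal, hypothetical sketch of the usage pattern the PR targets: the exploded column is projected away immediately after `explode`, so copying it into every generated row is wasted memory. The column names and array size below are made up for illustration and are not from the PR.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.explode

// Sketch of the explode-then-drop pattern discussed above (illustrative only).
object ExplodeSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("explode-sketch").getOrCreate()
    import spark.implicits._

    // One row holding a large array; after explode, only the id and the
    // individual elements are kept, so the original array column is unused.
    val df = Seq((1, Seq.fill(10000)("x"))).toDF("id", "items")
    val exploded = df.select($"id", explode($"items").as("item"))

    exploded.show(5)
    spark.stop()
  }
}
```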
[GitHub] spark issue #19657: [SPARK-22344][SPARKR] clean up install dir if running te...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19657 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19479: [SPARK-17074] [SQL] Generate equi-height histogra...
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/19479#discussion_r149344559 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/Statistics.scala --- @@ -275,6 +317,118 @@ object ColumnStat extends Logging { avgLen = row.getLong(4), maxLen = row.getLong(5) ) +if (row.isNullAt(6)) { + cs +} else { + val ndvs = row.getArray(6).toLongArray() + assert(percentiles.get.length == ndvs.length + 1) + val endpoints = percentiles.get.map(_.toString.toDouble) + // Construct equi-height histogram + val buckets = ndvs.zipWithIndex.map { case (ndv, i) => +EquiHeightBucket(endpoints(i), endpoints(i + 1), ndv) + } + val nonNullRows = rowCount - cs.nullCount + val ehHistogram = EquiHeightHistogram(nonNullRows.toDouble / ndvs.length, buckets) + cs.copy(histogram = Some(ehHistogram)) +} + } + +} + +/** + * There are a few types of histograms in state-of-the-art estimation methods. E.g. equi-width + * histogram, equi-height histogram, frequency histogram (value-frequency pairs) and hybrid + * histogram, etc. + * Currently in Spark, we support equi-height histogram since it is good at handling skew + * distribution, and also provides reasonable accuracy in other cases. + * We can add other histograms in the future, which will make estimation logic more complicated. + * This is because we will have to deal with computation between different types of histograms in + * some cases, e.g. for join columns. + */ +trait Histogram + +/** + * Equi-height histogram represents column value distribution by a sequence of buckets. Each bucket + * has a value range and contains approximately the same number of rows. + * @param height number of rows in each bucket + * @param ehBuckets equi-height histogram buckets + */ +case class EquiHeightHistogram(height: Double, ehBuckets: Array[EquiHeightBucket]) + extends Histogram { + + // Only for histogram equality test. + override def equals(other: Any): Boolean = other match { +case otherEHH: EquiHeightHistogram => + height == otherEHH.height && ehBuckets.sameElements(otherEHH.ehBuckets) +case _ => false + } + + override def hashCode(): Int = super.hashCode() +} + +/** + * A bucket in an equi-height histogram. We use double type for lower/higher bound for simplicity. + * @param lo lower bound of the value range in this bucket + * @param hi higher bound of the value range in this bucket + * @param ndv approximate number of distinct values in this bucket + */ +case class EquiHeightBucket(lo: Double, hi: Double, ndv: Long) + +object HistogramSerializer { + // A flag to indicate the type of histogram + val EQUI_HEIGHT_HISTOGRAM_TYPE: Byte = 1 + + /** + * Serializes a given histogram to a string. For advanced statistics like histograms, sketches, + * etc, we don't provide readability for their serialized formats in metastore (as + * string-to-string table properties). This is because it's hard or unnatural for these + * statistics to be human readable. For example, histogram is probably split into multiple + * key-value properties, instead of a single, self-described property. And for + * count-min-sketch, it's essentially unnatural to make it a readable string. + */ + final def serialize(histogram: Histogram): String = histogram match { +case h: EquiHeightHistogram => + // type + numBuckets + height + numBuckets * (lo + hi + ndv) --- End diff -- Thanks for the advice. The default number of buckets is 254. Tests showed that after compression, the serialized length is reduced by more than 50%. 
--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
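To make the serialization layout under discussion concrete, here is a rough, hedged sketch (not the code from the PR) of writing an equi-height histogram as "type + numBuckets + height + numBuckets * (lo + hi + ndv)"; the actual change also compresses and encodes the bytes before storing them as a table property, which is what the reduction by more than 50% refers to.

```scala
import java.nio.ByteBuffer

// Hedged sketch of the flat layout discussed in the review thread; the case
// classes mirror the ones quoted in the diff, but this is not the PR's code.
case class EquiHeightBucket(lo: Double, hi: Double, ndv: Long)
case class EquiHeightHistogram(height: Double, ehBuckets: Array[EquiHeightBucket])

object HistogramSerializerSketch {
  val EQUI_HEIGHT_HISTOGRAM_TYPE: Byte = 1

  def serialize(h: EquiHeightHistogram): Array[Byte] = {
    // 1 byte type + 4 bytes bucket count + 8 bytes height + 24 bytes per bucket
    val buf = ByteBuffer.allocate(1 + 4 + 8 + h.ehBuckets.length * 24)
    buf.put(EQUI_HEIGHT_HISTOGRAM_TYPE)
    buf.putInt(h.ehBuckets.length)
    buf.putDouble(h.height)
    h.ehBuckets.foreach { b =>
      buf.putDouble(b.lo)
      buf.putDouble(b.hi)
      buf.putLong(b.ndv)
    }
    buf.array()
  }
}
```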
[GitHub] spark issue #19683: [SPARK-21657][SQL] optimize explode quadratic memory con...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19683 Can one of the admins verify this patch? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19667: [SPARK-21127][SQL][followup] fix a config name typo
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19667 **[Test build #83537 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83537/testReport)** for PR 19667 at commit [`55b949e`](https://github.com/apache/spark/commit/55b949e0e039ea981aeb69d6c3699c829071e368). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19667: [SPARK-21127][SQL][followup] fix a config name typo
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19667 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19667: [SPARK-21127][SQL][followup] fix a config name typo
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19667 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83537/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19674: [DO-NOT-MERGE] Investigate test failures related with SP...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19674 **[Test build #83546 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83546/testReport)** for PR 19674 at commit [`d03f768`](https://github.com/apache/spark/commit/d03f7689d531a1421af2f90b6c7735f0184d8b76). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19208: [SPARK-21087] [ML] CrossValidator, TrainValidationSplit ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19208 **[Test build #83534 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83534/testReport)** for PR 19208 at commit [`654e4d5`](https://github.com/apache/spark/commit/654e4d580889dcd2fcf7c0bea2060349190faaac). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19208: [SPARK-21087] [ML] CrossValidator, TrainValidationSplit ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19208 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19208: [SPARK-21087] [ML] CrossValidator, TrainValidationSplit ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19208 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83534/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19674: [DO-NOT-MERGE] Investigate test failures related with SP...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19674 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19674: [DO-NOT-MERGE] Investigate test failures related with SP...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19674 **[Test build #83546 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83546/testReport)** for PR 19674 at commit [`d03f768`](https://github.com/apache/spark/commit/d03f7689d531a1421af2f90b6c7735f0184d8b76). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19674: [DO-NOT-MERGE] Investigate test failures related with SP...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19674 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83546/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19674: [DO-NOT-MERGE] Investigate test failures related with SP...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19674 @ueshin, now I am getting what you meant by DST. I think you roughly knew this problem, but let me describe it more given my debugging. It looks like the problem is in `mktime`, and it sounds like `mktime` is platform-dependent (roughly judging from the code and docs). I made a minimised reproducer:

```python
import time
import os
from datetime import datetime

os.environ["TZ"] = "America/Los_Angeles"
time.tzset()
time.mktime(datetime(2100, 4, 4, 4, 4, 4).timetuple())
```

On my local machine, it prints:

```
4110523444.0
```

On Ubuntu 14.04:

```
4110519844.0
```

On Jenkins, it prints:

```
4110519844.0
```

I am not sure if this is easily fixable within Spark, as it looks dependent on the Python implementation and/or the underlying C library, to the best of my knowledge and from my reading of some docs. Could you maybe avoid this time within DST for now in your PR? I currently don't see a simple and surgical fix. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19649: [SPARK-22405][SQL] Add more ExternalCatalogEvent
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19649 **[Test build #83538 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83538/testReport)** for PR 19649 at commit [`c79c314`](https://github.com/apache/spark/commit/c79c31452d285c3b36b8adda55b9daccd6cea0d4). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class AlterTablePreEvent(` * `case class AlterTableEvent(` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19649: [SPARK-22405][SQL] Add more ExternalCatalogEvent
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19649 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83538/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19649: [SPARK-22405][SQL] Add more ExternalCatalogEvent
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19649 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19631: [SPARK-22372][core, yarn] Make cluster submission...
Github user ArtRand commented on a diff in the pull request: https://github.com/apache/spark/pull/19631#discussion_r149361297 --- Diff: core/src/main/scala/org/apache/spark/SecurityManager.scala --- @@ -542,7 +496,55 @@ private[spark] class SecurityManager( * Gets the secret key. * @return the secret key as a String if authentication is enabled, otherwise returns null */ - def getSecretKey(): String = secretKey + def getSecretKey(): String = { +if (isAuthenticationEnabled) { + Option(sparkConf.getenv(ENV_AUTH_SECRET)) +.orElse(sparkConf.getOption(SPARK_AUTH_SECRET_CONF)) +.getOrElse { + throw new IllegalArgumentException( +s"A secret key must be specified via the $SPARK_AUTH_SECRET_CONF config") +} +} else { + null +} + } + + /** + * Initialize the configuration object held by this class for authentication. + * + * If authentication is disabled, do nothing. + * + * In YARN mode, generate a secret key and store it in the configuration object, setting it up to + * also be propagated to executors using an env variable. + * + * In other modes, assert that the auth secret is set in the configuration. + */ + def initializeAuth(): Unit = { +if (!sparkConf.get(NETWORK_AUTH_ENABLED)) { + return +} + +if (sparkConf.get(SparkLauncher.SPARK_MASTER, null) != "yarn") { + require(sparkConf.contains(SPARK_AUTH_SECRET_CONF), +s"A secret key must be specified via the $SPARK_AUTH_SECRET_CONF config.") + return +} + +// In YARN, force creation of a new secret if this is client mode. This ensures each --- End diff -- Is there a reason this _has_ to be unique to YARN? Will this solve the problem (in Mesos currently) where when the Executors [bootstrap](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala#L193) they do so without security (unless you "bake" the secret and secret config into the container image)? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19684: [SPARK-22461][ML] Move Spark ML model summaries i...
GitHub user mgaido91 opened a pull request: https://github.com/apache/spark/pull/19684 [SPARK-22461][ML] Move Spark ML model summaries into a dedicated package ## What changes were proposed in this pull request? Added a common abstraction (the `Summary` trait) for all the summaries in ML and moved them all to a new `summary` package. ## How was this patch tested? Existing UTs and manual execution of the changed examples You can merge this pull request into a Git repository by running: $ git pull https://github.com/mgaido91/spark SPARK-22461 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/19684.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #19684 commit a663b7bbcb1591cd123b95478bb9771ed3674135 Author: Marco Gaido Date: 2017-11-07T11:58:27Z [SPARK-22461][ML] Move Spark ML model summaries into a dedicated package --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19680: [SPARK-22641][ML] Refactor Spark ML model summaries
Github user srowen commented on the issue: https://github.com/apache/spark/pull/19680 Oh, this is tagged to the wrong JIRA. Should be SPARK-22461 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19684: [SPARK-22461][ML] Move Spark ML model summaries into a d...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/19684 This duplicates https://github.com/apache/spark/pull/19680 @mgaido91 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19684: [SPARK-22461][ML] Move Spark ML model summaries i...
Github user mgaido91 closed the pull request at: https://github.com/apache/spark/pull/19684 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19684: [SPARK-22461][ML] Move Spark ML model summaries into a d...
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/19684 @srowen, sorry, I hadn't seen it. Thanks. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19479: [SPARK-17074] [SQL] Generate equi-height histogram in co...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19479 **[Test build #83547 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83547/testReport)** for PR 19479 at commit [`fa338dd`](https://github.com/apache/spark/commit/fa338ddcb655f6e421b1be35fdd8dcd5cd866df0). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19640: [SPARK-16986][CORE][WEB-UI] Support configure his...
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/19640#discussion_r149365816 --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala --- @@ -2742,6 +2742,11 @@ private[spark] object Utils extends Logging { } } + def getTimeZone: TimeZone = { +val sparkConf = new SparkConf(false).loadFromSystemProperties(true) --- End diff -- Can we make SparkConf an input param instead of creating a new instance for every function call? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
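A tiny, hypothetical sketch of the refactor being suggested, i.e. passing the `SparkConf` in rather than building a new one on every call. The object name and config key below are only illustrative, not the ones used in the PR.

```scala
import java.util.TimeZone
import org.apache.spark.SparkConf

// Sketch of the reviewer's suggestion: accept the SparkConf as a parameter
// so callers reuse an existing instance instead of reloading system properties.
object TimeZoneUtilSketch {
  def getTimeZone(conf: SparkConf): TimeZone = {
    conf.getOption("spark.history.ui.timeZone")   // illustrative key name
      .map(id => TimeZone.getTimeZone(id))
      .getOrElse(TimeZone.getDefault)
  }
}
```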
[GitHub] spark pull request #19390: [SPARK-18935][MESOS] Fix dynamic reservations on ...
Github user skonto commented on a diff in the pull request: https://github.com/apache/spark/pull/19390#discussion_r149367812 --- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerUtils.scala --- @@ -451,15 +465,20 @@ trait MesosSchedulerUtils extends Logging { } /** Creates a mesos resource for a specific port number. */ - private def createResourcesFromPorts(portsAndRoles: List[(Long, String)]) : List[Resource] = { -portsAndRoles.flatMap{ case (port, role) => - createMesosPortResource(List((port, port)), Some(role))} + private def createResourcesFromPorts( + portsAndResourcesInfo: List[(Long, (String, AllocationInfo, Option[ReservationInfo]))]) +: List[Resource] = { +portsAndResourcesInfo.flatMap{ case (port, rInfo) => + createMesosPortResource(List((port, port)), Option(rInfo._1), --- End diff -- fixed --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19390: [SPARK-18935][MESOS] Fix dynamic reservations on mesos
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19390 **[Test build #83548 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83548/testReport)** for PR 19390 at commit [`dbf7875`](https://github.com/apache/spark/commit/dbf787593d72afe16cadd9731a1dd1cba608451c). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19651: [SPARK-20682][SPARK-15474][SPARK-21791] Add new ORCFileF...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19651 **[Test build #83543 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83543/testReport)** for PR 19651 at commit [`f644c6a`](https://github.com/apache/spark/commit/f644c6a88b4f24376c67028d0e927a2ee49fedbe). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19651: [SPARK-20682][SPARK-15474][SPARK-21791] Add new ORCFileF...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19651 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19651: [SPARK-20682][SPARK-15474][SPARK-21791] Add new ORCFileF...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19651 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83543/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19631: [SPARK-22372][core, yarn] Make cluster submission...
Github user ArtRand commented on a diff in the pull request: https://github.com/apache/spark/pull/19631#discussion_r149375835 --- Diff: core/src/main/scala/org/apache/spark/SecurityManager.scala --- @@ -542,7 +496,55 @@ private[spark] class SecurityManager( * Gets the secret key. * @return the secret key as a String if authentication is enabled, otherwise returns null */ - def getSecretKey(): String = secretKey + def getSecretKey(): String = { +if (isAuthenticationEnabled) { + Option(sparkConf.getenv(ENV_AUTH_SECRET)) +.orElse(sparkConf.getOption(SPARK_AUTH_SECRET_CONF)) +.getOrElse { + throw new IllegalArgumentException( +s"A secret key must be specified via the $SPARK_AUTH_SECRET_CONF config") +} +} else { + null +} + } + + /** + * Initialize the configuration object held by this class for authentication. + * + * If authentication is disabled, do nothing. + * + * In YARN mode, generate a secret key and store it in the configuration object, setting it up to + * also be propagated to executors using an env variable. + * + * In other modes, assert that the auth secret is set in the configuration. + */ + def initializeAuth(): Unit = { +if (!sparkConf.get(NETWORK_AUTH_ENABLED)) { + return +} + +if (sparkConf.get(SparkLauncher.SPARK_MASTER, null) != "yarn") { + require(sparkConf.contains(SPARK_AUTH_SECRET_CONF), +s"A secret key must be specified via the $SPARK_AUTH_SECRET_CONF config.") + return +} + +// In YARN, force creation of a new secret if this is client mode. This ensures each --- End diff -- Is there a reason this _has_ to be unique to YARN? Will this solve the problem (in Mesos currently) where when the Executors [bootstrap](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala#L193) they do so without security (unless you "[bake](https://github.com/apache/spark/blob/e1960c3d6f380b0dfbba6ee5d8ac6da4bc29a698/core/src/main/scala/org/apache/spark/SparkConf.scala#L482)" the secret and secret config into the container image)? Looks like propagating the envvar is only handled in the YARN case? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19631: [SPARK-22372][core, yarn] Make cluster submission...
Github user ArtRand commented on a diff in the pull request: https://github.com/apache/spark/pull/19631#discussion_r149375879 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala --- @@ -365,22 +370,21 @@ object SparkSubmit extends CommandLineUtils with Logging { // This security manager will not need an auth secret, but set a dummy value in case // spark.authenticate is enabled, otherwise an exception is thrown. --- End diff -- this comment is no longer true? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19631: [SPARK-22372][core, yarn] Make cluster submission...
Github user ArtRand commented on a diff in the pull request: https://github.com/apache/spark/pull/19631#discussion_r149375998 --- Diff: core/src/main/scala/org/apache/spark/deploy/rest/RestSubmissionClient.scala --- @@ -398,9 +399,20 @@ private[spark] object RestSubmissionClient { val PROTOCOL_VERSION = "v1" /** - * Submit an application, assuming Spark parameters are specified through the given config. - * This is abstracted to its own method for testing purposes. + * Filter non-spark environment variables from any environment. */ + private[rest] def filterSystemEnvironment(env: Map[String, String]): Map[String, String] = { +env.filterKeys { k => + // SPARK_HOME is filtered out because it is usually wrong on the remote machine (SPARK-12345) + (k.startsWith("SPARK_") && k != "SPARK_ENV_LOADED" && k != "SPARK_HOME") || +k.startsWith("MESOS_") --- End diff -- Might this break Mesos when using [the mesos bundle](https://github.com/apache/spark/blob/master/conf/spark-env.sh.template#L33)? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19674: [DO-NOT-MERGE] Investigate test failures related with SP...
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/19674 @HyukjinKwon Awesome! Thanks for the investigation! I didn't think "platform-dependent" would be quite this dependent. I'll update my PR to avoid the time within DST. Thanks! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19285: [SPARK-22068][CORE]Reduce the duplicate code betw...
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/19285#discussion_r149379088 --- Diff: core/src/main/scala/org/apache/spark/storage/memory/MemoryStore.scala --- @@ -261,37 +259,97 @@ private[spark] class MemoryStore( // If this task attempt already owns more unroll memory than is necessary to store the // block, then release the extra memory that will not be used. val excessUnrollMemory = unrollMemoryUsedByThisBlock - size - releaseUnrollMemoryForThisTask(MemoryMode.ON_HEAP, excessUnrollMemory) + releaseUnrollMemoryForThisTask(memoryMode, excessUnrollMemory) transferUnrollToStorage(size) true } } + if (enoughStorageMemory) { entries.synchronized { - entries.put(blockId, entry) + entries.put(blockId, createMemoryEntry()) } logInfo("Block %s stored as values in memory (estimated size %s, free %s)".format( blockId, Utils.bytesToString(size), Utils.bytesToString(maxMemory - blocksMemoryUsed))) Right(size) } else { assert(currentUnrollMemoryForThisTask >= unrollMemoryUsedByThisBlock, "released too much unroll memory") +Left(unrollMemoryUsedByThisBlock) + } +} else { + Left(unrollMemoryUsedByThisBlock) +} + } + + /** + * Attempt to put the given block in memory store as values. + * + * It's possible that the iterator is too large to materialize and store in memory. To avoid + * OOM exceptions, this method will gradually unroll the iterator while periodically checking + * whether there is enough free memory. If the block is successfully materialized, then the + * temporary unroll memory used during the materialization is "transferred" to storage memory, + * so we won't acquire more memory than is actually needed to store the block. + * + * @return in case of success, the estimated size of the stored data. In case of failure, return + * an iterator containing the values of the block. The returned iterator will be backed + * by the combination of the partially-unrolled block and the remaining elements of the + * original input iterator. The caller must either fully consume this iterator or call + * `close()` on it in order to free the storage memory consumed by the partially-unrolled + * block. + */ + private[storage] def putIteratorAsValues[T]( + blockId: BlockId, + values: Iterator[T], + classTag: ClassTag[T]): Either[PartiallyUnrolledIterator[T], Long] = { + +// Underlying vector for unrolling the block +var vector = new SizeTrackingVector[T]()(classTag) +var arrayValues: Array[T] = null +var preciseSize: Long = -1 + +def storeValue(value: T): Unit = { + vector += value +} + +def estimateSize(precise: Boolean): Long = { + if (precise) { +// We only call need the precise size after all values unrolled. +arrayValues = vector.toArray +preciseSize = SizeEstimator.estimate(arrayValues) +vector = null --- End diff -- It looks scary to put vector to null in the function `estimateSize`. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19285: [SPARK-22068][CORE]Reduce the duplicate code betw...
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/19285#discussion_r149379817 --- Diff: core/src/main/scala/org/apache/spark/storage/memory/MemoryStore.scala --- @@ -261,37 +259,97 @@ private[spark] class MemoryStore( // If this task attempt already owns more unroll memory than is necessary to store the // block, then release the extra memory that will not be used. val excessUnrollMemory = unrollMemoryUsedByThisBlock - size - releaseUnrollMemoryForThisTask(MemoryMode.ON_HEAP, excessUnrollMemory) + releaseUnrollMemoryForThisTask(memoryMode, excessUnrollMemory) transferUnrollToStorage(size) true } } + if (enoughStorageMemory) { entries.synchronized { - entries.put(blockId, entry) + entries.put(blockId, createMemoryEntry()) } logInfo("Block %s stored as values in memory (estimated size %s, free %s)".format( blockId, Utils.bytesToString(size), Utils.bytesToString(maxMemory - blocksMemoryUsed))) Right(size) } else { assert(currentUnrollMemoryForThisTask >= unrollMemoryUsedByThisBlock, "released too much unroll memory") +Left(unrollMemoryUsedByThisBlock) + } +} else { + Left(unrollMemoryUsedByThisBlock) +} + } + + /** + * Attempt to put the given block in memory store as values. + * + * It's possible that the iterator is too large to materialize and store in memory. To avoid + * OOM exceptions, this method will gradually unroll the iterator while periodically checking + * whether there is enough free memory. If the block is successfully materialized, then the + * temporary unroll memory used during the materialization is "transferred" to storage memory, + * so we won't acquire more memory than is actually needed to store the block. + * + * @return in case of success, the estimated size of the stored data. In case of failure, return + * an iterator containing the values of the block. The returned iterator will be backed + * by the combination of the partially-unrolled block and the remaining elements of the + * original input iterator. The caller must either fully consume this iterator or call + * `close()` on it in order to free the storage memory consumed by the partially-unrolled + * block. + */ + private[storage] def putIteratorAsValues[T]( + blockId: BlockId, + values: Iterator[T], + classTag: ClassTag[T]): Either[PartiallyUnrolledIterator[T], Long] = { + +// Underlying vector for unrolling the block +var vector = new SizeTrackingVector[T]()(classTag) +var arrayValues: Array[T] = null +var preciseSize: Long = -1 + +def storeValue(value: T): Unit = { + vector += value +} + +def estimateSize(precise: Boolean): Long = { + if (precise) { +// We only call need the precise size after all values unrolled. +arrayValues = vector.toArray +preciseSize = SizeEstimator.estimate(arrayValues) +vector = null +preciseSize + } else { +vector.estimateSize() + } +} + +def createMemoryEntry(): MemoryEntry[T] = { + // We successfully unrolled the entirety of this block + assert(arrayValues != null, "arrayValue shouldn't be null!") + assert(preciseSize != -1, "preciseSize shouldn't be -1") + val entry = new DeserializedMemoryEntry[T](arrayValues, preciseSize, classTag) --- End diff -- Why do we need to create the val `entry`? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19285: [SPARK-22068][CORE]Reduce the duplicate code betw...
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/19285#discussion_r149379715 --- Diff: core/src/main/scala/org/apache/spark/storage/memory/MemoryStore.scala --- @@ -261,37 +259,97 @@ private[spark] class MemoryStore( // If this task attempt already owns more unroll memory than is necessary to store the // block, then release the extra memory that will not be used. val excessUnrollMemory = unrollMemoryUsedByThisBlock - size - releaseUnrollMemoryForThisTask(MemoryMode.ON_HEAP, excessUnrollMemory) + releaseUnrollMemoryForThisTask(memoryMode, excessUnrollMemory) transferUnrollToStorage(size) true } } + if (enoughStorageMemory) { entries.synchronized { - entries.put(blockId, entry) + entries.put(blockId, createMemoryEntry()) } logInfo("Block %s stored as values in memory (estimated size %s, free %s)".format( blockId, Utils.bytesToString(size), Utils.bytesToString(maxMemory - blocksMemoryUsed))) Right(size) } else { assert(currentUnrollMemoryForThisTask >= unrollMemoryUsedByThisBlock, "released too much unroll memory") +Left(unrollMemoryUsedByThisBlock) + } +} else { + Left(unrollMemoryUsedByThisBlock) +} + } + + /** + * Attempt to put the given block in memory store as values. + * + * It's possible that the iterator is too large to materialize and store in memory. To avoid + * OOM exceptions, this method will gradually unroll the iterator while periodically checking + * whether there is enough free memory. If the block is successfully materialized, then the + * temporary unroll memory used during the materialization is "transferred" to storage memory, + * so we won't acquire more memory than is actually needed to store the block. + * + * @return in case of success, the estimated size of the stored data. In case of failure, return + * an iterator containing the values of the block. The returned iterator will be backed + * by the combination of the partially-unrolled block and the remaining elements of the + * original input iterator. The caller must either fully consume this iterator or call + * `close()` on it in order to free the storage memory consumed by the partially-unrolled + * block. + */ + private[storage] def putIteratorAsValues[T]( + blockId: BlockId, + values: Iterator[T], + classTag: ClassTag[T]): Either[PartiallyUnrolledIterator[T], Long] = { + +// Underlying vector for unrolling the block +var vector = new SizeTrackingVector[T]()(classTag) +var arrayValues: Array[T] = null +var preciseSize: Long = -1 + +def storeValue(value: T): Unit = { + vector += value +} + +def estimateSize(precise: Boolean): Long = { + if (precise) { +// We only call need the precise size after all values unrolled. +arrayValues = vector.toArray +preciseSize = SizeEstimator.estimate(arrayValues) +vector = null +preciseSize + } else { +vector.estimateSize() + } +} + +def createMemoryEntry(): MemoryEntry[T] = { + // We successfully unrolled the entirety of this block + assert(arrayValues != null, "arrayValue shouldn't be null!") + assert(preciseSize != -1, "preciseSize shouldn't be -1") --- End diff -- Under which condition would `preciseSize` be `-1`? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19285: [SPARK-22068][CORE]Reduce the duplicate code betw...
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/19285#discussion_r149374760 --- Diff: core/src/main/scala/org/apache/spark/storage/memory/MemoryStore.scala --- @@ -162,26 +162,29 @@ private[spark] class MemoryStore( } /** - * Attempt to put the given block in memory store as values. + * Attempt to put the given block in memory store as values or bytes. * * It's possible that the iterator is too large to materialize and store in memory. To avoid * OOM exceptions, this method will gradually unroll the iterator while periodically checking * whether there is enough free memory. If the block is successfully materialized, then the * temporary unroll memory used during the materialization is "transferred" to storage memory, * so we won't acquire more memory than is actually needed to store the block. * - * @return in case of success, the estimated size of the stored data. In case of failure, return - * an iterator containing the values of the block. The returned iterator will be backed - * by the combination of the partially-unrolled block and the remaining elements of the - * original input iterator. The caller must either fully consume this iterator or call - * `close()` on it in order to free the storage memory consumed by the partially-unrolled - * block. + * @param memoryMode The values saved mode. --- End diff -- nit: also add param descriptions for `blockId`, `values` and `classTag`. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19674: [DO-NOT-MERGE] Investigate test failures related with SP...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19674 Let me leave this closed! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19674: [DO-NOT-MERGE] Investigate test failures related ...
Github user HyukjinKwon closed the pull request at: https://github.com/apache/spark/pull/19674 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17436: [SPARK-20101][SQL] Use OffHeapColumnVector when "...
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/17436#discussion_r149382131 --- Diff: core/src/test/scala/org/apache/spark/memory/StaticMemoryManagerSuite.scala --- @@ -43,12 +43,13 @@ class StaticMemoryManagerSuite extends MemoryManagerSuite { override protected def createMemoryManager( maxOnHeapExecutionMemory: Long, - maxOffHeapExecutionMemory: Long): StaticMemoryManager = { + maxOffHeapExecutionMemory: Long = 1000): StaticMemoryManager = { new StaticMemoryManager( conf.clone .set("spark.memory.fraction", "1") .set("spark.testing.memory", maxOnHeapExecutionMemory.toString) -.set("spark.memory.offHeap.size", maxOffHeapExecutionMemory.toString), +.set("spark.memory.offHeap.size", --- End diff -- How about moving the configuration to `internal/config`? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17436: [SPARK-20101][SQL] Use OffHeapColumnVector when "...
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/17436#discussion_r149382213 --- Diff: core/src/test/scala/org/apache/spark/memory/StaticMemoryManagerSuite.scala --- @@ -43,12 +43,13 @@ class StaticMemoryManagerSuite extends MemoryManagerSuite { override protected def createMemoryManager( maxOnHeapExecutionMemory: Long, - maxOffHeapExecutionMemory: Long): StaticMemoryManager = { + maxOffHeapExecutionMemory: Long = 1000): StaticMemoryManager = { --- End diff -- Please update this line. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...
Github user pralabhkumar commented on the issue: https://github.com/apache/spark/pull/18118 Ping @MLnick @jkbradley. @sethah has given an LGTM. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19479: [SPARK-17074] [SQL] Generate equi-height histogram in co...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19479 **[Test build #83545 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83545/testReport)** for PR 19479 at commit [`804b375`](https://github.com/apache/spark/commit/804b37565b2f5d61edd492d415475a59afec41f5). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class EquiHeightHistogram(height: Double, buckets: Array[EquiHeightBucket]) ` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19479: [SPARK-17074] [SQL] Generate equi-height histogram in co...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19479 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83545/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19479: [SPARK-17074] [SQL] Generate equi-height histogram in co...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19479 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19672: [SPARK-22456][SQL] Add support for dayofweek function
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19672 **[Test build #83549 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83549/testReport)** for PR 19672 at commit [`edbe583`](https://github.com/apache/spark/commit/edbe583583a066a03df5b97a3067412f1fa51887). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19651: [SPARK-20682][SPARK-15474][SPARK-21791] Add new ORCFileF...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/19651 The PR is updated according to your advice. Thank you again, @cloud-fan ! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17436: [SPARK-20101][SQL] Use OffHeapColumnVector when "...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/17436#discussion_r149415162 --- Diff: core/src/test/scala/org/apache/spark/memory/StaticMemoryManagerSuite.scala --- @@ -43,12 +43,13 @@ class StaticMemoryManagerSuite extends MemoryManagerSuite { override protected def createMemoryManager( maxOnHeapExecutionMemory: Long, - maxOffHeapExecutionMemory: Long): StaticMemoryManager = { + maxOffHeapExecutionMemory: Long = 1000): StaticMemoryManager = { new StaticMemoryManager( conf.clone .set("spark.memory.fraction", "1") .set("spark.testing.memory", maxOnHeapExecutionMemory.toString) -.set("spark.memory.offHeap.size", maxOffHeapExecutionMemory.toString), +.set("spark.memory.offHeap.size", --- End diff -- I see. I will move `"spark.memory.offHeap.size"` to `org.apache.spark.internal`. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
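For context, constants in `org.apache.spark.internal.config` are declared with `ConfigBuilder`. Below is a rough, hedged sketch of what such an entry for `spark.memory.offHeap.size` could look like; the enclosing object name, doc text, and default are illustrative and not necessarily what was eventually merged.

```scala
package org.apache.spark.internal.config

import org.apache.spark.network.util.ByteUnit

// Rough sketch only: a typed constant for "spark.memory.offHeap.size" placed
// next to the other keys under internal/config.
object OffHeapConfigSketch {
  private[spark] val MEMORY_OFFHEAP_SIZE = ConfigBuilder("spark.memory.offHeap.size")
    .doc("Absolute amount of memory, in bytes, that can be used for off-heap allocation.")
    .bytesConf(ByteUnit.BYTE)
    .createWithDefault(0)
}
```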
[GitHub] spark issue #19250: [SPARK-12297] Table timezone correction for Timestamps
Github user zivanfi commented on the issue: https://github.com/apache/spark/pull/19250 The interoperability issue is that Impala follows timezone-agnostic timestamp semantics as mandated by the SQL standard, while SparkSQL follows UTC-normalized semantics instead (which is not SQL-compliant). Please see the [design doc](https://docs.google.com/document/d/1XmyVjr3eOJiNFjVeSnmjIU60Hq-XiZB03pgi3r1razM/edit) for details. The long-term solution is indeed using separate SQL types for the different timestamp semantics, as you suggest. However, until we get there we would like to have a short-term solution that fixes timestamps that are _already written_. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
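To illustrate the difference between the two semantics mentioned above: under Spark's UTC-normalized (instant-based) behaviour, the wall-clock value you read back shifts with the session time zone, whereas timezone-agnostic semantics would keep it fixed. A hedged sketch, assuming Spark 2.2+ where `spark.sql.session.timeZone` is available; it is not code from this PR.

```scala
import java.sql.Timestamp
import org.apache.spark.sql.SparkSession

// Sketch: the same stored timestamp rendered under two session time zones.
// Under UTC-normalized semantics the displayed wall-clock value changes;
// under SQL-standard timezone-agnostic semantics it would not.
object TimestampSemanticsSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("ts-sketch").getOrCreate()
    import spark.implicits._

    val df = Seq(Timestamp.valueOf("2017-11-07 12:00:00")).toDF("ts")

    spark.conf.set("spark.sql.session.timeZone", "UTC")
    df.selectExpr("cast(ts as string)").show(false)

    spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles")
    df.selectExpr("cast(ts as string)").show(false) // shifted relative to the line above

    spark.stop()
  }
}
```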
[GitHub] spark pull request #17436: [SPARK-20101][SQL] Use OffHeapColumnVector when "...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/17436#discussion_r149418567 --- Diff: core/src/test/scala/org/apache/spark/memory/StaticMemoryManagerSuite.scala --- @@ -43,12 +43,13 @@ class StaticMemoryManagerSuite extends MemoryManagerSuite { override protected def createMemoryManager( maxOnHeapExecutionMemory: Long, - maxOffHeapExecutionMemory: Long): StaticMemoryManager = { + maxOffHeapExecutionMemory: Long = 1000): StaticMemoryManager = { new StaticMemoryManager( conf.clone .set("spark.memory.fraction", "1") .set("spark.testing.memory", maxOnHeapExecutionMemory.toString) -.set("spark.memory.offHeap.size", maxOffHeapExecutionMemory.toString), +.set("spark.memory.offHeap.size", --- End diff -- Adding `= 1000` is intentional and necessary. If the original line is used, `spark.memory.offHeap.size` is zero even when `spark.memory.offHeap.enabled == true`. This is because the call sites do not pass the second argument and then the default value `(= 0)` is used. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19479: [SPARK-17074] [SQL] Generate equi-height histogram in co...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19479 **[Test build #83547 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83547/testReport)** for PR 19479 at commit [`fa338dd`](https://github.com/apache/spark/commit/fa338ddcb655f6e421b1be35fdd8dcd5cd866df0). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19479: [SPARK-17074] [SQL] Generate equi-height histogram in co...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19479 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19479: [SPARK-17074] [SQL] Generate equi-height histogram in co...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19479 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83547/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19640: [SPARK-16986][CORE][WEB-UI] Support configure history se...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19640 **[Test build #83550 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83550/testReport)** for PR 19640 at commit [`5291d16`](https://github.com/apache/spark/commit/5291d1641c1a5dcbacd7579c11430ea791acfd56). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19685: [SPARK-19759][ML] not using blas in ALSModel.pred...
GitHub user mgaido91 opened a pull request: https://github.com/apache/spark/pull/19685 [SPARK-19759][ML] not using blas in ALSModel.predict for optimization

## What changes were proposed in this pull request?

In `ALS.predict` currently we are using the `blas.sdot` function to perform a dot product on two `Seq`s. It turns out that this is not the most efficient way. I used the following code to compare the implementations:

```
def time[R](block: => R): Unit = {
  val t0 = System.nanoTime()
  block
  val t1 = System.nanoTime()
  println("Elapsed time: " + (t1 - t0) + "ns")
}

val r = new scala.util.Random(100)
val input = (1 to 50).map(_ => (1 to 100).map(_ => r.nextFloat).toSeq)

def f(a: Seq[Float], b: Seq[Float]): Float = {
  var r = 0.0f
  for (i <- 0 until a.length) {
    r += a(i) * b(i)
  }
  r
}

import com.github.fommil.netlib.BLAS.{getInstance => blas}
val b = (1 to 100).map(_ => r.nextFloat).toSeq

time { input.foreach(a => blas.sdot(100, a.toArray, 1, b.toArray, 1)) } // on average it takes 2968718815 ns
time { input.foreach(a => f(a, b)) } // on average it takes 515510185 ns
```

Thus this PR proposes the old-style for loop implementation for performance reasons.

## How was this patch tested?

Existing UTs.

You can merge this pull request into a Git repository by running: $ git pull https://github.com/mgaido91/spark SPARK-19759 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/19685.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #19685 commit 8b0add68ddf68939c0f5ac19836f9b3b9cc58432 Author: Marco Gaido Date: 2017-11-07T16:29:23Z [SPARK-19759][ML] not using blas in ALSModel.predict for optimization --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
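As a side note on the design choice being benchmarked above: a likely contributor to the measured gap is the per-call `toArray` copies and boxed `Seq` indexing rather than BLAS itself. Below is a hedged sketch of the same dot product written over plain arrays with a while loop; it is illustrative only and not claimed to be what the PR finally merged.

```scala
// Sketch: dot product over Array[Float] with a while loop, avoiding per-call
// toArray copies and boxed Seq indexing. Illustrative only.
def dot(a: Array[Float], b: Array[Float]): Float = {
  require(a.length == b.length, "vectors must have the same length")
  var sum = 0.0f
  var i = 0
  while (i < a.length) {
    sum += a(i) * b(i)
    i += 1
  }
  sum
}
```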
[GitHub] spark issue #19685: [SPARK-19759][ML] not using blas in ALSModel.predict for...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19685 **[Test build #83551 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83551/testReport)** for PR 19685 at commit [`8b0add6`](https://github.com/apache/spark/commit/8b0add68ddf68939c0f5ac19836f9b3b9cc58432). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19657: [SPARK-22344][SPARKR] clean up install dir if run...
Github user shivaram commented on a diff in the pull request: https://github.com/apache/spark/pull/19657#discussion_r149436164 --- Diff: R/pkg/tests/fulltests/test_utils.R --- @@ -236,4 +236,23 @@ test_that("basenameSansExtFromUrl", { expect_equal(basenameSansExtFromUrl(z), "spark-2.1.0--hive") }) +test_that("getOne", { + dummy <- getOne(".dummyValue", envir = new.env(), ifnotfound = FALSE) + expect_equal(dummy, FALSE) +}) + +test_that("traverseParentDirs", { + if (is_windows()) { +dirs <- traverseParentDirs("c:\\Users\\user\\AppData\\Local\\Apache\\Spark\\Cache\\spark2.2", 3) +expect <- c("c:\\Users\\user\\AppData\\Local\\Apache\\Spark\\Cache\\spark2.2", +"c:\\Users\\user\\AppData\\Local\\Apache\\Spark\\Cache", +"c:\\Users\\user\\AppData\\Local\\Apache\\Spark", +"c:\\Users\\user\\AppData\\Local\\Apache") + } else { +dirs <- traverseParentDirs("/Users/user/Library/Caches/spark/spark2.2", 1) --- End diff -- can we also test the linux one (`/home/user/.cache`) - Just want to make sure we will not miss hidden files / directories. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19657: [SPARK-22344][SPARKR] clean up install dir if running te...
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/19657 Thanks @felixcheung -- The Appveyor test seems to have failed with the following err ``` 1. Failure: traverseParentDirs (@test_utils.R#255) - `dirs` not equal to `expect`. 3/4 mismatches x[2]: "c:/Users/user/AppData/Local/Apache/Spark/Cache" y[2]: "c:\\Users\\user\\AppData\\Local\\Apache\\Spark\\Cache" x[3]: "c:/Users/user/AppData/Local/Apache/Spark" y[3]: "c:\\Users\\user\\AppData\\Local\\Apache\\Spark" x[4]: "c:/Users/user/AppData/Local/Apache" y[4]: "c:\\Users\\user\\AppData\\Local\\Apache" ``` @HyukjinKwon could you also take a quick look at this PR ? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19390: [SPARK-18935][MESOS] Fix dynamic reservations on mesos
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19390 **[Test build #83548 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83548/testReport)** for PR 19390 at commit [`dbf7875`](https://github.com/apache/spark/commit/dbf787593d72afe16cadd9731a1dd1cba608451c). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19390: [SPARK-18935][MESOS] Fix dynamic reservations on mesos
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19390 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83548/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19390: [SPARK-18935][MESOS] Fix dynamic reservations on mesos
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19390 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19682: [SPARK-22464] [SQL] No pushdown for Hive metastor...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19682#discussion_r149445286
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala ---
@@ -592,6 +592,19 @@ private[client] class Shim_v0_13 extends Shim_v0_12 {
     }
   }
+
+  /**
+   * An extractor that matches all binary comparison operators except null-safe equality.
+   *
+   * null-safe equality is not supported by Hive metastore partition predicate pushdown
+   */
+  object OperatorsInMetastorePartitionFPD {
--- End diff --
Done.
--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
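For readers following the diff above, the fragment below sketches what such an extractor might look like. The object name is taken from the diff, but the body is illustrative only (it assumes Catalyst's `Expression`, `BinaryComparison` and `EqualNullSafe` types) and is not the PR's final code:

```scala
import org.apache.spark.sql.catalyst.expressions.{BinaryComparison, EqualNullSafe, Expression}

// Matches =, <, <=, > and >= but deliberately rejects <=> (null-safe equality),
// because the Hive metastore cannot push down <=> predicates on partition columns.
object OperatorsInMetastorePartitionFPD {
  def unapply(expr: Expression): Option[(Expression, Expression)] = expr match {
    case _: EqualNullSafe => None
    case cmp: BinaryComparison => Some((cmp.left, cmp.right))
    case _ => None
  }
}
```

A predicate-conversion routine can then match `OperatorsInMetastorePartitionFPD(left, right)` and silently skip null-safe equality instead of emitting an unsupported metastore filter.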
[GitHub] spark issue #19682: [SPARK-22464] [SQL] No pushdown for Hive metastore parti...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19682 **[Test build #83552 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83552/testReport)** for PR 19682 at commit [`bae7494`](https://github.com/apache/spark/commit/bae74941c28a3efb0b713e97b5fb57af7be654eb). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19657: [SPARK-22344][SPARKR] clean up install dir if running te...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19657 **[Test build #83553 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83553/testReport)** for PR 19657 at commit [`f2aa5b7`](https://github.com/apache/spark/commit/f2aa5b7e12ed36e7b56610e695615260643f952f). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19667: [SPARK-21127][SQL][followup] fix a config name typo
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19667 LGTM. Thanks! Merged to master. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19679: [SPARK-20647][core] Port StorageTab to the new UI backen...
Github user squito commented on the issue: https://github.com/apache/spark/pull/19679 I figured out what is going on with StreamBlocks now, both before and after this change, so I'm OK with it. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19667: [SPARK-21127][SQL][followup] fix a config name ty...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/19667 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19674: [DO-NOT-MERGE] Investigate test failures related with SP...
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/19674 Thanks for investigating this @HyukjinKwon! Regarding the Python 2.7 versions, I thought all workers were updated to use pandas 0.19.2 and pyarrow 0.4.1, not just for py3. I'll bring this up when we start updating Arrow. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19657: [SPARK-22344][SPARKR] clean up install dir if run...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/19657#discussion_r149448556
--- Diff: R/pkg/tests/fulltests/test_utils.R ---
@@ -236,4 +236,23 @@ test_that("basenameSansExtFromUrl", {
   expect_equal(basenameSansExtFromUrl(z), "spark-2.1.0--hive")
 })
+test_that("getOne", {
+  dummy <- getOne(".dummyValue", envir = new.env(), ifnotfound = FALSE)
+  expect_equal(dummy, FALSE)
+})
+
+test_that("traverseParentDirs", {
+  if (is_windows()) {
+    dirs <- traverseParentDirs("c:\\Users\\user\\AppData\\Local\\Apache\\Spark\\Cache\\spark2.2", 3)
+    expect <- c("c:\\Users\\user\\AppData\\Local\\Apache\\Spark\\Cache\\spark2.2",
+                "c:\\Users\\user\\AppData\\Local\\Apache\\Spark\\Cache",
+                "c:\\Users\\user\\AppData\\Local\\Apache\\Spark",
+                "c:\\Users\\user\\AppData\\Local\\Apache")
+  } else {
+    dirs <- traverseParentDirs("/Users/user/Library/Caches/spark/spark2.2", 1)
--- End diff --
Sure, but hopefully the implementation is not platform-dependent; otherwise we will need to test Linux as well as OS X (and the test does not check whether the path is valid/present).
--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19657: [SPARK-22344][SPARKR] clean up install dir if running te...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19657 **[Test build #83554 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83554/testReport)** for PR 19657 at commit [`f21a90b`](https://github.com/apache/spark/commit/f21a90bef2a08c9d4cfdcc6588fb2da64679b4ec). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19631: [SPARK-22372][core, yarn] Make cluster submission...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/19631#discussion_r149451263
--- Diff: core/src/main/scala/org/apache/spark/deploy/rest/RestSubmissionClient.scala ---
@@ -398,9 +399,20 @@ private[spark] object RestSubmissionClient {
   val PROTOCOL_VERSION = "v1"
   /**
-   * Submit an application, assuming Spark parameters are specified through the given config.
-   * This is abstracted to its own method for testing purposes.
+   * Filter non-spark environment variables from any environment.
    */
+  private[rest] def filterSystemEnvironment(env: Map[String, String]): Map[String, String] = {
+    env.filterKeys { k =>
+      // SPARK_HOME is filtered out because it is usually wrong on the remote machine (SPARK-12345)
+      (k.startsWith("SPARK_") && k != "SPARK_ENV_LOADED" && k != "SPARK_HOME") ||
+        k.startsWith("MESOS_")
--- End diff --
I don't know, but this should be the exact same behavior as before, no? I didn't really change this code.
--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19631: [SPARK-22372][core, yarn] Make cluster submission...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/19631#discussion_r149451751
--- Diff: core/src/main/scala/org/apache/spark/SecurityManager.scala ---
@@ -542,7 +496,55 @@ private[spark] class SecurityManager(
    * Gets the secret key.
    * @return the secret key as a String if authentication is enabled, otherwise returns null
    */
-  def getSecretKey(): String = secretKey
+  def getSecretKey(): String = {
+    if (isAuthenticationEnabled) {
+      Option(sparkConf.getenv(ENV_AUTH_SECRET))
+        .orElse(sparkConf.getOption(SPARK_AUTH_SECRET_CONF))
+        .getOrElse {
+          throw new IllegalArgumentException(
+            s"A secret key must be specified via the $SPARK_AUTH_SECRET_CONF config")
+        }
+    } else {
+      null
+    }
+  }
+
+  /**
+   * Initialize the configuration object held by this class for authentication.
+   *
+   * If authentication is disabled, do nothing.
+   *
+   * In YARN mode, generate a secret key and store it in the configuration object, setting it up to
+   * also be propagated to executors using an env variable.
+   *
+   * In other modes, assert that the auth secret is set in the configuration.
+   */
+  def initializeAuth(): Unit = {
+    if (!sparkConf.get(NETWORK_AUTH_ENABLED)) {
+      return
+    }
+
+    if (sparkConf.get(SparkLauncher.SPARK_MASTER, null) != "yarn") {
+      require(sparkConf.contains(SPARK_AUTH_SECRET_CONF),
+        s"A secret key must be specified via the $SPARK_AUTH_SECRET_CONF config.")
+      return
+    }
+
+    // In YARN, force creation of a new secret if this is client mode. This ensures each
--- End diff --
This behaves the same way as before for non-YARN. Standalone and Mesos have always used hardcoded secrets in the config to authenticate executors to the driver and the driver to the master (in the case of standalone). You can see the code I'm changing in this class, where for non-YARN it would throw an error if the secret was not set. If changing that behavior is desired for Mesos, then it should be done in a separate change.
--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
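For readers unfamiliar with the pre-existing behavior described here, the properties involved are Spark's `spark.authenticate` and `spark.authenticate.secret` settings. The snippet below is only an illustrative sketch of how a standalone (or Mesos) application satisfies the `require` check in `initializeAuth()` above; the master URL, app name and secret value are placeholders:

```scala
import org.apache.spark.SparkConf

object SharedSecretConfSketch {
  // Outside of YARN, network auth requires an explicitly configured secret, and the
  // same value must be set for the master, the driver and every executor.
  val conf: SparkConf = new SparkConf()
    .setMaster("spark://master-host:7077")                     // placeholder standalone master URL
    .setAppName("auth-example")                                // placeholder app name
    .set("spark.authenticate", "true")                         // NETWORK_AUTH_ENABLED
    .set("spark.authenticate.secret", "shared-secret-value")   // SPARK_AUTH_SECRET_CONF
}
```

In YARN mode, by contrast, the patch generates the secret automatically and propagates it to executors via an environment variable, as the doc comment in the diff describes.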
[GitHub] spark issue #19631: [SPARK-22372][core, yarn] Make cluster submission use Sp...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19631 **[Test build #83555 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83555/testReport)** for PR 19631 at commit [`cb6c8ff`](https://github.com/apache/spark/commit/cb6c8ffe9eb04e05b8cd982e123ae962bd2e7373). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19685: [SPARK-19759][ML] not using blas in ALSModel.predict for...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19685 **[Test build #83551 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83551/testReport)** for PR 19685 at commit [`8b0add6`](https://github.com/apache/spark/commit/8b0add68ddf68939c0f5ac19836f9b3b9cc58432). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19685: [SPARK-19759][ML] not using blas in ALSModel.predict for...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19685 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19685: [SPARK-19759][ML] not using blas in ALSModel.predict for...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19685 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83551/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19678: [SPARK-20646][core] Port executors page to new UI backen...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19678 **[Test build #83556 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83556/testReport)** for PR 19678 at commit [`e4c427a`](https://github.com/apache/spark/commit/e4c427afebfa6ce2a234076eda4e4e426c4006dc). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19631: [SPARK-22372][core, yarn] Make cluster submission...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/19631#discussion_r149454492
--- Diff: core/src/main/scala/org/apache/spark/SecurityManager.scala ---
@@ -542,7 +496,55 @@ private[spark] class SecurityManager(
    * Gets the secret key.
    * @return the secret key as a String if authentication is enabled, otherwise returns null
    */
-  def getSecretKey(): String = secretKey
+  def getSecretKey(): String = {
+    if (isAuthenticationEnabled) {
+      Option(sparkConf.getenv(ENV_AUTH_SECRET))
+        .orElse(sparkConf.getOption(SPARK_AUTH_SECRET_CONF))
+        .getOrElse {
+          throw new IllegalArgumentException(
+            s"A secret key must be specified via the $SPARK_AUTH_SECRET_CONF config")
+        }
+    } else {
+      null
+    }
+  }
+
+  /**
+   * Initialize the configuration object held by this class for authentication.
+   *
+   * If authentication is disabled, do nothing.
+   *
+   * In YARN mode, generate a secret key and store it in the configuration object, setting it up to
+   * also be propagated to executors using an env variable.
+   *
+   * In other modes, assert that the auth secret is set in the configuration.
+   */
+  def initializeAuth(): Unit = {
+    if (!sparkConf.get(NETWORK_AUTH_ENABLED)) {
+      return
+    }
+
+    if (sparkConf.get(SparkLauncher.SPARK_MASTER, null) != "yarn") {
+      require(sparkConf.contains(SPARK_AUTH_SECRET_CONF),
+        s"A secret key must be specified via the $SPARK_AUTH_SECRET_CONF config.")
+      return
+    }
+
+    // In YARN, force creation of a new secret if this is client mode. This ensures each
--- End diff --
BTW, standalone at least propagates the secret using an env var; the issue is just that standalone needs the same secret everywhere, including the part where the driver authenticates with the master. Mesos just inherited that.
--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19631: [SPARK-22372][core, yarn] Make cluster submission...
Github user ArtRand commented on a diff in the pull request: https://github.com/apache/spark/pull/19631#discussion_r149455493
--- Diff: core/src/main/scala/org/apache/spark/SecurityManager.scala ---
@@ -542,7 +496,55 @@ private[spark] class SecurityManager(
    * Gets the secret key.
    * @return the secret key as a String if authentication is enabled, otherwise returns null
    */
-  def getSecretKey(): String = secretKey
+  def getSecretKey(): String = {
+    if (isAuthenticationEnabled) {
+      Option(sparkConf.getenv(ENV_AUTH_SECRET))
+        .orElse(sparkConf.getOption(SPARK_AUTH_SECRET_CONF))
+        .getOrElse {
+          throw new IllegalArgumentException(
+            s"A secret key must be specified via the $SPARK_AUTH_SECRET_CONF config")
+        }
+    } else {
+      null
+    }
+  }
+
+  /**
+   * Initialize the configuration object held by this class for authentication.
+   *
+   * If authentication is disabled, do nothing.
+   *
+   * In YARN mode, generate a secret key and store it in the configuration object, setting it up to
+   * also be propagated to executors using an env variable.
+   *
+   * In other modes, assert that the auth secret is set in the configuration.
+   */
+  def initializeAuth(): Unit = {
+    if (!sparkConf.get(NETWORK_AUTH_ENABLED)) {
+      return
+    }
+
+    if (sparkConf.get(SparkLauncher.SPARK_MASTER, null) != "yarn") {
+      require(sparkConf.contains(SPARK_AUTH_SECRET_CONF),
+        s"A secret key must be specified via the $SPARK_AUTH_SECRET_CONF config.")
+      return
+    }
+
+    // In YARN, force creation of a new secret if this is client mode. This ensures each
--- End diff --
Yes. I guess I'm wondering whether, with your change, this will now work in all cases, not just YARN? Perhaps obviously, I'm looking into changing this for Mesos.
--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19672: [SPARK-22456][SQL] Add support for dayofweek function
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19672 **[Test build #83549 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83549/testReport)** for PR 19672 at commit [`edbe583`](https://github.com/apache/spark/commit/edbe583583a066a03df5b97a3067412f1fa51887). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19672: [SPARK-22456][SQL] Add support for dayofweek function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19672 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19672: [SPARK-22456][SQL] Add support for dayofweek function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19672 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83549/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19631: [SPARK-22372][core, yarn] Make cluster submission...
Github user ArtRand commented on a diff in the pull request: https://github.com/apache/spark/pull/19631#discussion_r149456772
--- Diff: core/src/main/scala/org/apache/spark/deploy/rest/RestSubmissionClient.scala ---
@@ -398,9 +399,20 @@ private[spark] object RestSubmissionClient {
   val PROTOCOL_VERSION = "v1"
   /**
-   * Submit an application, assuming Spark parameters are specified through the given config.
-   * This is abstracted to its own method for testing purposes.
+   * Filter non-spark environment variables from any environment.
    */
+  private[rest] def filterSystemEnvironment(env: Map[String, String]): Map[String, String] = {
+    env.filterKeys { k =>
+      // SPARK_HOME is filtered out because it is usually wrong on the remote machine (SPARK-12345)
+      (k.startsWith("SPARK_") && k != "SPARK_ENV_LOADED" && k != "SPARK_HOME") ||
+        k.startsWith("MESOS_")
--- End diff --
Yes, I apologize; you're correct. I think this is actually to filter out things like `MESOS_EXECUTOR_ID` and `MESOS_FRAMEWORK_ID`.
--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
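To make the predicate from the diff concrete, here is a small, self-contained sketch (not part of the PR; the sample variable values are made up) that applies the same test to a hypothetical submitter environment, using `filter` rather than calling the private `filterSystemEnvironment` helper. Note that the predicate selects the entries to keep: SPARK_* variables (minus the two exceptions) and anything starting with MESOS_ are retained, everything else is dropped.

```scala
object EnvFilterSketch extends App {
  // Hypothetical environment of the machine running spark-submit.
  val sampleEnv = Map(
    "SPARK_CONF_DIR" -> "/opt/spark/conf",  // kept: SPARK_* and not an exception
    "SPARK_HOME" -> "/opt/spark",           // dropped: usually wrong on the remote machine
    "SPARK_ENV_LOADED" -> "1",              // dropped: explicit exception
    "MESOS_FRAMEWORK_ID" -> "fw-0001",      // kept: MESOS_* prefix
    "PATH" -> "/usr/bin"                    // dropped: neither SPARK_* nor MESOS_*
  )

  // Same predicate as in the diff above.
  val kept = sampleEnv.filter { case (k, _) =>
    (k.startsWith("SPARK_") && k != "SPARK_ENV_LOADED" && k != "SPARK_HOME") ||
      k.startsWith("MESOS_")
  }

  // Prints only the SPARK_CONF_DIR and MESOS_FRAMEWORK_ID entries.
  println(kept)
}
```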
[GitHub] spark pull request #19631: [SPARK-22372][core, yarn] Make cluster submission...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/19631#discussion_r149459107
--- Diff: core/src/main/scala/org/apache/spark/SecurityManager.scala ---
@@ -542,7 +496,55 @@ private[spark] class SecurityManager(
    * Gets the secret key.
    * @return the secret key as a String if authentication is enabled, otherwise returns null
    */
-  def getSecretKey(): String = secretKey
+  def getSecretKey(): String = {
+    if (isAuthenticationEnabled) {
+      Option(sparkConf.getenv(ENV_AUTH_SECRET))
+        .orElse(sparkConf.getOption(SPARK_AUTH_SECRET_CONF))
+        .getOrElse {
+          throw new IllegalArgumentException(
+            s"A secret key must be specified via the $SPARK_AUTH_SECRET_CONF config")
+        }
+    } else {
+      null
+    }
+  }
+
+  /**
+   * Initialize the configuration object held by this class for authentication.
+   *
+   * If authentication is disabled, do nothing.
+   *
+   * In YARN mode, generate a secret key and store it in the configuration object, setting it up to
+   * also be propagated to executors using an env variable.
+   *
+   * In other modes, assert that the auth secret is set in the configuration.
+   */
+  def initializeAuth(): Unit = {
+    if (!sparkConf.get(NETWORK_AUTH_ENABLED)) {
+      return
+    }
+
+    if (sparkConf.get(SparkLauncher.SPARK_MASTER, null) != "yarn") {
+      require(sparkConf.contains(SPARK_AUTH_SECRET_CONF),
+        s"A secret key must be specified via the $SPARK_AUTH_SECRET_CONF config.")
+      return
+    }
+
+    // In YARN, force creation of a new secret if this is client mode. This ensures each
--- End diff --
I'm not changing the previous behavior for non-YARN, so my change shouldn't make it easier or harder to make things work for other cluster managers. Whether it will work depends on how the auth secret is used in those cases.
--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17436: [SPARK-20101][SQL] Use OffHeapColumnVector when "spark.m...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17436 **[Test build #83557 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83557/testReport)** for PR 17436 at commit [`5fe810e`](https://github.com/apache/spark/commit/5fe810ee88165650415b01711708f9dc1dcdf251). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17436: [SPARK-20101][SQL] Use OffHeapColumnVector when "spark.m...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17436 **[Test build #83557 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83557/testReport)** for PR 17436 at commit [`5fe810e`](https://github.com/apache/spark/commit/5fe810ee88165650415b01711708f9dc1dcdf251). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17436: [SPARK-20101][SQL] Use OffHeapColumnVector when "spark.m...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17436 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83557/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17436: [SPARK-20101][SQL] Use OffHeapColumnVector when "spark.m...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17436 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19683: [SPARK-21657][SQL] optimize explode quadratic memory con...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19683 ok to test --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19657: [SPARK-22344][SPARKR] clean up install dir if running te...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19657 **[Test build #83553 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83553/testReport)** for PR 19657 at commit [`f2aa5b7`](https://github.com/apache/spark/commit/f2aa5b7e12ed36e7b56610e695615260643f952f). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19657: [SPARK-22344][SPARKR] clean up install dir if running te...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19657 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19657: [SPARK-22344][SPARKR] clean up install dir if running te...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19657 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83553/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19683: [SPARK-21657][SQL] optimize explode quadratic memory con...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19683 **[Test build #83558 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83558/testReport)** for PR 19683 at commit [`ce7c369`](https://github.com/apache/spark/commit/ce7c3694a99584348957dc756234bb667466be4e). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19683: [SPARK-21657][SQL] optimize explode quadratic memory con...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19683 **[Test build #83558 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83558/testReport)** for PR 19683 at commit [`ce7c369`](https://github.com/apache/spark/commit/ce7c3694a99584348957dc756234bb667466be4e). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19683: [SPARK-21657][SQL] optimize explode quadratic memory con...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19683 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19683: [SPARK-21657][SQL] optimize explode quadratic memory con...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19683 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83558/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org