[GitHub] spark issue #20779: [SPARK-23598][SQL] Make methods in BufferedRowIterator p...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20779 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88212/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20779: [SPARK-23598][SQL] Make methods in BufferedRowIterator p...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20779

**[Test build #88212 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88212/testReport)** for PR 20779 at commit [`8fb5df0`](https://github.com/apache/spark/commit/8fb5df0f76a6773594bb7e695036f3fdf0063c6a).

 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request #20799: [SPARK-23635][YARN] AM env variable should not ov...
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/20799#discussion_r174294428

--- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ExecutorRunnable.scala ---
@@ -247,6 +241,18 @@ private[yarn] class ExecutorRunnable(
     System.getenv().asScala.filterKeys(_.startsWith("SPARK"))
      .foreach { case (k, v) => env(k) = v }
+
+    sparkConf.getExecutorEnv.foreach { case (key, value) =>
+      if (key == Environment.CLASSPATH.name()) {
--- End diff --

Ah, I see, sorry I missed that. So I guess here we are just stomping on whatever is in the system env path now, whereas before we were stomping on the executorEnv specified with the system env.
[GitHub] spark issue #20816: SPARK-21479 Outer join filter pushdown in null supplying...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20816 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1494/ Test PASSed.
[GitHub] spark issue #20816: SPARK-21479 Outer join filter pushdown in null supplying...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20816 Merged build finished. Test PASSed.
[GitHub] spark pull request #20796: [SPARK-23649][SQL] Prevent crashes on schema infe...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20796#discussion_r174293523

--- Diff: sql/core/src/test/resources/test-data/utf8xFF.csv ---
@@ -0,0 +1,3 @@
+channel,code
+United,123
+ABGUN�,456
--- End diff --

how did you create this file?
[GitHub] spark issue #20816: SPARK-21479 Outer join filter pushdown in null supplying...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20816 **[Test build #88215 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88215/testReport)** for PR 20816 at commit [`ac17976`](https://github.com/apache/spark/commit/ac17976fd2b024039ee6cd848b864d2d052ec573).
[GitHub] spark pull request #20816: SPARK-21479 Outer join filter pushdown in null su...
GitHub user maryannxue opened a pull request: https://github.com/apache/spark/pull/20816

SPARK-21479 Outer join filter pushdown in null supplying table when condition is on one of the joined columns

## What changes were proposed in this pull request?

Added `TransitPredicateInOuterJoin` optimization rule that transits constraints from the preserved side of an outer join to the null-supplying side. The constraints of the join operator will remain unchanged.

## How was this patch tested?

Added 3 tests in `InferFiltersFromConstraintsSuite`.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/maryannxue/spark spark-21479

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20816.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #20816

commit ac17976fd2b024039ee6cd848b864d2d052ec573
Author: maryannxue
Date: 2018-03-13T21:05:37Z

    SPARK-21479 Outer join filter pushdown in null supplying table when condition is on one of the joined columns
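A hedged, Spark-free sketch of the idea behind the proposed rule (all names here are illustrative, not Catalyst's actual representation): when a join condition equates a column on the preserved side with one on the null-supplying side, a constraint known for the preserved column can be transited to the corresponding null-supplying column, so it can later be pushed below the join as a filter.

```scala
// Illustrative model: join-key equalities plus single-column constraints.
object TransitSketch {
  // One equality from the join condition, e.g. a.x = b.x in
  // `a LEFT OUTER JOIN b ON a.x = b.x`.
  final case class EqualJoinKey(preserved: String, nullSupplying: String)

  // A constraint on one column, e.g. Constraint("a.x", "> 5").
  final case class Constraint(column: String, description: String)

  // Transit each preserved-side constraint across the matching join key,
  // producing the same predicate on the null-supplying column.
  def transitConstraints(
      joinKeys: Seq[EqualJoinKey],
      preservedConstraints: Seq[Constraint]): Seq[Constraint] =
    for {
      key <- joinKeys
      c <- preservedConstraints if c.column == key.preserved
    } yield Constraint(key.nullSupplying, c.description)

  def main(args: Array[String]): Unit = {
    val pushed = transitConstraints(
      Seq(EqualJoinKey("a.x", "b.x")),
      Seq(Constraint("a.x", "> 5")))
    println(pushed) // the constraint now targets b.x
  }
}
```

The point of the PR, per its description, is exactly this transit step: the join operator's own constraints stay unchanged, only the null-supplying side gains a derivable filter.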
[GitHub] spark pull request #20806: [SPARK-23661][SQL] Implement treeAggregate on Dat...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20806#discussion_r174277864

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -1658,6 +1659,43 @@ class Dataset[T] private[sql](
   def groupByKey[K](func: MapFunction[T, K], encoder: Encoder[K]): KeyValueGroupedDataset[K, T] =
     groupByKey(func.call(_))(encoder)
+
+  /**
+   * Aggregates the elements of this Dataset in a multi-level tree pattern.
+   *
+   * @param depth suggested depth of the tree (default: 2)
+   */
+  private[spark] def treeAggregate[U : Encoder : ClassTag](zeroValue: U)(
+      seqOp: (U, T) => U,
+      combOp: (U, U) => U,
+      depth: Int = 2): U = {
+    require(depth >= 1, s"Depth must be greater than or equal to 1 but got $depth.")
+    val sparkContext = sparkSession.sparkContext
+    val copiedZeroValue = Utils.clone(zeroValue, sparkContext.env.closureSerializer.newInstance())
+    if (rdd.partitions.length == 0) {
+      copiedZeroValue
+    } else {
+      val aggregatePartition =
+        (it: Iterator[T]) => it.aggregate(zeroValue)(seqOp, combOp)
+      var partiallyAggregated: Dataset[U] = mapPartitions(it => Iterator(aggregatePartition(it)))
--- End diff --

Why can't we call `rdd.treeAggregate` directly?
[GitHub] spark pull request #20806: [SPARK-23661][SQL] Implement treeAggregate on Dat...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20806#discussion_r174276969

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -1658,6 +1659,43 @@ class Dataset[T] private[sql](
   def groupByKey[K](func: MapFunction[T, K], encoder: Encoder[K]): KeyValueGroupedDataset[K, T] =
     groupByKey(func.call(_))(encoder)
+
+  /**
+   * Aggregates the elements of this Dataset in a multi-level tree pattern.
+   *
+   * @param depth suggested depth of the tree (default: 2)
+   */
+  private[spark] def treeAggregate[U : Encoder : ClassTag](zeroValue: U)(
+      seqOp: (U, T) => U,
+      combOp: (U, U) => U,
+      depth: Int = 2): U = {
+    require(depth >= 1, s"Depth must be greater than or equal to 1 but got $depth.")
--- End diff --

why would depth 1 make sense?
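For readers following the review, the tree-aggregation pattern under discussion can be sketched without Spark at all (the names and the level-shrinking heuristic below are illustrative, not Spark's actual implementation): each "partition" is first reduced locally with `seqOp`, then the partial results are combined in groups with `combOp`, level by level, until one value remains.

```scala
// Spark-free sketch of multi-level tree aggregation over in-memory "partitions".
object TreeAggregateSketch {
  def treeAggregate[T, U](partitions: Seq[Seq[T]], zero: U)(
      seqOp: (U, T) => U,
      combOp: (U, U) => U,
      depth: Int = 2): U = {
    require(depth >= 1, s"Depth must be >= 1 but got $depth.")
    // Level 0: aggregate within each partition.
    var partial: Seq[U] = partitions.map(_.foldLeft(zero)(seqOp))
    // Pick a branching factor so roughly `depth` combine levels are needed.
    val scale = math.max(math.ceil(math.pow(partial.size, 1.0 / depth)).toInt, 2)
    // Higher levels: combine groups of partial results until one remains.
    while (partial.size > 1) {
      partial = partial.grouped(scale).map(_.reduce(combOp)).toSeq
    }
    partial.headOption.getOrElse(zero)
  }

  def main(args: Array[String]): Unit = {
    val data = (1 to 100).grouped(10).map(_.toSeq).toSeq // 10 "partitions"
    println(treeAggregate(data, 0)(_ + _, _ + _, depth = 2)) // 5050
  }
}
```

Note that with `depth = 1` this degenerates to a single combine level over all partial results, which is essentially a plain `aggregate`; that is the crux of the "why would depth 1 make sense?" question above.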
[GitHub] spark issue #20686: [SPARK-22915][MLlib] Streaming tests for spark.ml.featur...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20686 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88214/ Test PASSed.
[GitHub] spark issue #20686: [SPARK-22915][MLlib] Streaming tests for spark.ml.featur...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20686 Merged build finished. Test PASSed.
[GitHub] spark issue #20686: [SPARK-22915][MLlib] Streaming tests for spark.ml.featur...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20686

**[Test build #88214 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88214/testReport)** for PR 20686 at commit [`bf713b5`](https://github.com/apache/spark/commit/bf713b5366e1b42bd5e52f0366ca24944f509721).

 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request #20799: [SPARK-23635][YARN] AM env variable should not ov...
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/20799#discussion_r174273641

--- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ExecutorRunnable.scala ---
@@ -247,6 +241,18 @@ private[yarn] class ExecutorRunnable(
     System.getenv().asScala.filterKeys(_.startsWith("SPARK"))
      .foreach { case (k, v) => env(k) = v }
+
+    sparkConf.getExecutorEnv.foreach { case (key, value) =>
+      if (key == Environment.CLASSPATH.name()) {
+        // If the key of env variable is CLASSPATH, we assume it is a path and append it.
+        // This is kept for backward compatibility and consistency with hadoop
+        YarnSparkHadoopUtil.addPathToEnvironment(env, key, value)
+      } else {
+        // For other env variables, simply overwrite the value.
+        env(key) = value
+      }
+    }
--- End diff --

@jerryshao I think there is a potential issue with this change: it allows users to (incorrectly) specify SPARK_LOG_URL_STDERR and SPARK_LOG_URL_STDOUT, which should be generated by the driver (see the "// Add log urls" section above this code snippet). Note, this is an existing bug in the code regarding the same: if the same variables had been present in the driver env, they would have overridden the generated values. It would be good to fix this issue as well as part of this change; the solution would be to move the '// Add log urls' block below this current block.
[GitHub] spark pull request #20799: [SPARK-23635][YARN] AM env variable should not ov...
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/20799#discussion_r174272808

--- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ExecutorRunnable.scala ---
@@ -247,6 +241,18 @@ private[yarn] class ExecutorRunnable(
     System.getenv().asScala.filterKeys(_.startsWith("SPARK"))
      .foreach { case (k, v) => env(k) = v }
+
+    sparkConf.getExecutorEnv.foreach { case (key, value) =>
+      if (key == Environment.CLASSPATH.name()) {
--- End diff --

@tgravescs In the existing code, in `prepareEnvironment`, "env" is populated only with `Environment.CLASSPATH`. Hence LD_LIBRARY_PATH does not apply to this specific change.
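The append-vs-overwrite rule the diff implements can be reduced to a small sketch (the `mergeEnv` name and signature are hypothetical, not Spark's actual API): CLASSPATH is treated as a path list and appended to, while every other variable is simply overwritten.

```scala
// Simplified model of the env-merging rule discussed in the review.
object EnvMergeSketch {
  def mergeEnv(
      env: scala.collection.mutable.Map[String, String],
      key: String,
      value: String,
      sep: String = ":"): Unit = {
    if (key == "CLASSPATH") {
      // Path-like variable: append to any existing value instead of clobbering it.
      env(key) = env.get(key).map(_ + sep + value).getOrElse(value)
    } else {
      // Everything else: last writer wins.
      env(key) = value
    }
  }
}
```

The overwrite branch is what the review comments flag: variables the driver is supposed to generate (such as the executor log URLs) would be silently replaced if a user sets them in the executor env.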
[GitHub] spark issue #20686: [SPARK-22915][MLlib] Streaming tests for spark.ml.featur...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20686 **[Test build #88214 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88214/testReport)** for PR 20686 at commit [`bf713b5`](https://github.com/apache/spark/commit/bf713b5366e1b42bd5e52f0366ca24944f509721).
[GitHub] spark pull request #20805: [SPARK-21479][SQL] Outer join filter pushdown in ...
Github user maryannxue closed the pull request at: https://github.com/apache/spark/pull/20805
[GitHub] spark pull request #20686: [SPARK-22915][MLlib] Streaming tests for spark.ml...
Github user attilapiros commented on a diff in the pull request: https://github.com/apache/spark/pull/20686#discussion_r174247908

--- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/VectorSlicerSuite.scala ---
@@ -84,26 +84,29 @@ class VectorSlicerSuite extends SparkFunSuite with MLlibTestSparkContext with De
     val vectorSlicer = new VectorSlicer().setInputCol("features").setOutputCol("result")
-    def validateResults(df: DataFrame): Unit = {
-      df.select("result", "expected").collect().foreach { case Row(vec1: Vector, vec2: Vector) =>
+    def validateResults(rows: Seq[Row]): Unit = {
+      rows.foreach { case Row(vec1: Vector, vec2: Vector) =>
         assert(vec1 === vec2)
       }
-      val resultMetadata = AttributeGroup.fromStructField(df.schema("result"))
-      val expectedMetadata = AttributeGroup.fromStructField(df.schema("expected"))
+      val resultMetadata = AttributeGroup.fromStructField(rows.head.schema("result"))
+      val expectedMetadata = AttributeGroup.fromStructField(rows.head.schema("expected"))
       assert(resultMetadata.numAttributes === expectedMetadata.numAttributes)
       resultMetadata.attributes.get.zip(expectedMetadata.attributes.get).foreach { case (a, b) =>
         assert(a === b)
       }
     }
     vectorSlicer.setIndices(Array(1, 4)).setNames(Array.empty)
-    validateResults(vectorSlicer.transform(df))
+    testTransformerByGlobalCheckFunc[(Vector, Vector)](df, vectorSlicer, "result", "expected")(
--- End diff --

The reason I have chosen the global check function is the checks for the attributes:

```
val resultMetadata = AttributeGroup.fromStructField(rows.head.schema("result"))
val expectedMetadata = AttributeGroup.fromStructField(rows.head.schema("expected"))
assert(resultMetadata.numAttributes === expectedMetadata.numAttributes)
resultMetadata.attributes.get.zip(expectedMetadata.attributes.get).foreach { case (a, b) =>
  assert(a === b)
}
```

This part is not row-based but more like result-set based.
[GitHub] spark issue #19876: [ML][SPARK-11171][SPARK-11239] Add PMML export to Spark ...
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/19876 March 15th is soon, any thoughts @MLnick @jkbradley @sethah ?
[GitHub] spark pull request #20686: [SPARK-22915][MLlib] Streaming tests for spark.ml...
Github user attilapiros commented on a diff in the pull request: https://github.com/apache/spark/pull/20686#discussion_r174245361

--- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/VectorAssemblerSuite.scala ---
@@ -58,14 +57,16 @@ class VectorAssemblerSuite
     assert(v2.isInstanceOf[DenseVector])
   }

-  test("VectorAssembler") {
+  ignore("VectorAssembler") {
--- End diff --

ok
[GitHub] spark issue #20812: [SPARK-23669] Executors fetch jars and name the jars wit...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20812 Merged build finished. Test PASSed.
[GitHub] spark issue #20812: [SPARK-23669] Executors fetch jars and name the jars wit...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20812 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88208/ Test PASSed.
[GitHub] spark issue #20812: [SPARK-23669] Executors fetch jars and name the jars wit...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20812

**[Test build #88208 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88208/testReport)** for PR 20812 at commit [`f78c273`](https://github.com/apache/spark/commit/f78c273c6132f9cc226668590273836950c39b74).

 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #20815: [SPARK-23658][LAUNCHER] InProcessAppHandle uses the wron...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20815 **[Test build #88213 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88213/testReport)** for PR 20815 at commit [`1518a5a`](https://github.com/apache/spark/commit/1518a5af591b2254e947a60e0ec107551f2155a4).
[GitHub] spark issue #20815: [SPARK-23658][LAUNCHER] InProcessAppHandle uses the wron...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/20815 ok to test
[GitHub] spark pull request #20702: [SPARK-23547][SQL]Cleanup the .pipeout file when ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20702
[GitHub] spark issue #20811: [SPARK-23668][K8S] Add config option for passing through...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20811 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/1476/
[GitHub] spark issue #20815: [SPARK-23658][LAUNCHER] InProcessAppHandle uses the wron...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20815 Can one of the admins verify this patch?
[GitHub] spark issue #20702: [SPARK-23547][SQL]Cleanup the .pipeout file when the Hiv...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20702 thanks, merging to master!
[GitHub] spark issue #20803: [SPARK-23653][SQL] Show sql statement in spark SQL UI
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20803 So this patch duplicates the SQL text info from the jobs page onto the SQL query page. I think it's good and more user-friendly, but we need to make sure the underlying implementation reuses the code, to avoid problems like missing the `--hivevar`.
[GitHub] spark issue #20779: [SPARK-23598][SQL] Make methods in BufferedRowIterator p...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20779 **[Test build #88212 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88212/testReport)** for PR 20779 at commit [`8fb5df0`](https://github.com/apache/spark/commit/8fb5df0f76a6773594bb7e695036f3fdf0063c6a).
[GitHub] spark issue #20779: [SPARK-23598][SQL] Make methods in BufferedRowIterator p...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20779 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1493/ Test PASSed.
[GitHub] spark issue #20779: [SPARK-23598][SQL] Make methods in BufferedRowIterator p...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20779 Merged build finished. Test PASSed.
[GitHub] spark issue #20811: [SPARK-23668][K8S] Add config option for passing through...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20811 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/1476/
[GitHub] spark issue #20811: [SPARK-23668][K8S] Add config option for passing through...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20811 Merged build finished. Test PASSed.
[GitHub] spark issue #20811: [SPARK-23668][K8S] Add config option for passing through...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20811 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1492/ Test PASSed.
[GitHub] spark pull request #20815: [SPARK-23658][LAUNCHER] InProcessAppHandle uses t...
GitHub user sahilTakiar opened a pull request: https://github.com/apache/spark/pull/20815

[SPARK-23658][LAUNCHER] InProcessAppHandle uses the wrong class in getLogger

## What changes were proposed in this pull request?

Changed `Logger` in `InProcessAppHandle` to use `InProcessAppHandle` instead of `ChildProcAppHandle`

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sahilTakiar/spark master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20815.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #20815

commit 1518a5af591b2254e947a60e0ec107551f2155a4
Author: Sahil Takiar
Date: 2018-03-13T18:24:20Z

    [SPARK-23658][LAUNCHER] InProcessAppHandle uses the wrong class in getLogger
[GitHub] spark pull request #20811: [SPARK-23668][K8S] Add config option for passing ...
Github user liyinan926 commented on a diff in the pull request: https://github.com/apache/spark/pull/20811#discussion_r174236490

--- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodFactory.scala ---
@@ -108,6 +109,8 @@
       nodeToLocalTaskCount: Map[String, Int]): Pod = {
     val name = s"$executorPodNamePrefix-exec-$executorId"
+    val imagePullSecrets = imagePullSecret.map(new LocalObjectReference(_)).toList
--- End diff --

Given the same code is used to configure both the driver and executor pods, it can be extracted out into a utility method in `KubernetesUtil`.
[GitHub] spark pull request #20811: [SPARK-23668][K8S] Add config option for passing ...
Github user liyinan926 commented on a diff in the pull request: https://github.com/apache/spark/pull/20811#discussion_r174235585

--- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/steps/BasicDriverConfigurationStep.scala ---
@@ -17,9 +17,7 @@
 package org.apache.spark.deploy.k8s.submit.steps

 import scala.collection.JavaConverters._
-
-import io.fabric8.kubernetes.api.model.{ContainerBuilder, EnvVarBuilder, EnvVarSourceBuilder, PodBuilder, QuantityBuilder}
-
+import io.fabric8.kubernetes.api.model._
--- End diff --

There should be an empty line between third-party imports and `org.apache.spark.*` imports.
[GitHub] spark pull request #20811: [SPARK-23668][K8S] Add config option for passing ...
Github user liyinan926 commented on a diff in the pull request: https://github.com/apache/spark/pull/20811#discussion_r174235357

--- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala ---
@@ -54,6 +54,12 @@ private[spark] object Config extends Logging {
       .checkValues(Set("Always", "Never", "IfNotPresent"))
       .createWithDefault("IfNotPresent")
+
+  val IMAGE_PULL_SECRET =
+    ConfigBuilder("spark.kubernetes.imagePullSecret")
+      .doc("Specifies the Kubernetes image secret used to access private image registry.")
--- End diff --

The first `image` can be removed.
[GitHub] spark issue #20433: [SPARK-23264][SQL] Support interval values without INTER...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20433 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88209/ Test PASSed.
[GitHub] spark issue #20433: [SPARK-23264][SQL] Support interval values without INTER...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20433 Merged build finished. Test PASSed.
[GitHub] spark issue #20433: [SPARK-23264][SQL] Support interval values without INTER...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20433

**[Test build #88209 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88209/testReport)** for PR 20433 at commit [`f6210a2`](https://github.com/apache/spark/commit/f6210a2029129d38c15aeeb309c2b001d83757a6).

 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #20811: [SPARK-23668][K8S] Add config option for passing through...
Github user foxish commented on the issue: https://github.com/apache/spark/pull/20811 cc/ @mccheah @liyinan926 @vanzin
[GitHub] spark issue #20814: [SPARK-23671][core] Fix condition to enable the SHS thre...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20814 Merged build finished. Test PASSed.
[GitHub] spark issue #20814: [SPARK-23671][core] Fix condition to enable the SHS thre...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20814 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1491/ Test PASSed.
[GitHub] spark issue #20811: [SPARK-23668][K8S] Add config option for passing through...
Github user foxish commented on the issue: https://github.com/apache/spark/pull/20811 ok to test
[GitHub] spark issue #20814: [SPARK-23671][core] Fix condition to enable the SHS thre...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20814 **[Test build #88211 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88211/testReport)** for PR 20814 at commit [`a897277`](https://github.com/apache/spark/commit/a89727753820aa0cbbe9bea4b2066c89b9ecfb4d).
[GitHub] spark pull request #20814: [SPARK-23671][core] Fix condition to enable the S...
GitHub user vanzin opened a pull request: https://github.com/apache/spark/pull/20814

[SPARK-23671][core] Fix condition to enable the SHS thread pool.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/vanzin/spark SPARK-23671

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20814.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #20814

commit a89727753820aa0cbbe9bea4b2066c89b9ecfb4d
Author: Marcelo Vanzin
Date: 2018-03-13T18:04:45Z

    [SPARK-23671][core] Fix condition to enable the SHS thread pool.
[GitHub] spark pull request #20790: [SPARK-23642][DOCS] AccumulatorV2 subclass isZero...
Github user smallory commented on a diff in the pull request: https://github.com/apache/spark/pull/20790#discussion_r174225748

--- Diff: core/src/main/scala/org/apache/spark/util/AccumulatorV2.scala ---
@@ -290,7 +290,8 @@ class LongAccumulator extends AccumulatorV2[jl.Long, jl.Long] {
   private var _count = 0L

   /**
-   * Adds v to the accumulator, i.e. increment sum by v and count by 1.
+   * Returns false if this accumulator has had any values added to it or the sum is non-zero.
+   *
--- End diff --

The current documentation for AccumulatorV2.isZero is misleading about the behaviour when values have been added to the accumulator but the sum is zero: isZero would still return false, even though this is not a count accumulator. I don't believe that any of the implementations in this file actually behave exactly as described by AccumulatorV2.isZero.
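The subtlety the comment points at can be shown with a Spark-free model (illustrative names only, not Spark's actual classes): a Long accumulator tracks both a running sum and a count of added values, and `isZero` is defined as "sum == 0 && count == 0". Adding 1 and then -1 leaves the sum at zero, yet values have been added, so `isZero` is false.

```scala
// Minimal model of sum-and-count accumulator semantics.
class SumCountAccumulator {
  private var sum = 0L
  private var count = 0L

  // Adds v: increments the sum by v and the count by 1.
  def add(v: Long): Unit = { sum += v; count += 1 }

  // "Zero" means nothing has ever been added, not merely that the sum is 0.
  def isZero: Boolean = sum == 0L && count == 0L
}
```

Under this definition, documentation phrased purely in terms of the sum would indeed be misleading, which is the point being made in the review.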
[GitHub] spark issue #20790: [SPARK-23642][DOCS] AccumulatorV2 subclass isZero scalad...
Github user smallory commented on the issue: https://github.com/apache/spark/pull/20790 Thanks for the pointer on the title convention, the way the contributing doc distinguished code and documentation changes left me a bit puzzled as to what actually applied to this change.
[GitHub] spark issue #20813: [SPARK-23670][SQL] Fix memory leak on SparkPlanGraphWrap...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20813 Merged build finished. Test FAILed.
[GitHub] spark issue #20813: [SPARK-23670][SQL] Fix memory leak on SparkPlanGraphWrap...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20813 **[Test build #88210 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88210/testReport)** for PR 20813 at commit [`f866701`](https://github.com/apache/spark/commit/f866701b322c9ddf2fddc49d162fc9bc8d83bcdb). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20813: [SPARK-23670][SQL] Fix memory leak on SparkPlanGraphWrap...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20813 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88210/ Test FAILed.
[GitHub] spark pull request #20788: [SPARK-23647][PYTHON][SQL] Adds more types for hi...
Github user DylanGuedes commented on a diff in the pull request: https://github.com/apache/spark/pull/20788#discussion_r174208501 --- Diff: python/pyspark/sql/dataframe.py --- @@ -437,10 +437,11 @@ def hint(self, name, *parameters): if not isinstance(name, str): raise TypeError("name should be provided as str, got {0}".format(type(name))) +allowed_types = (basestring, list, float, int) --- End diff -- It looks like Scala can handle unicode, so basestring looks correct. What do you guys think?
[GitHub] spark issue #19951: [SPARK-22760][CORE][YARN] When sc.stop() is called, set ...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/19951 Since the target of the fix is silencing a misleading exception, handling that exception as I suggested before would be a feasible solution. But anything more complicated than that is overkill.
[GitHub] spark pull request #20742: [SPARK-23572][docs] Bring "security.md" up to dat...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/20742#discussion_r174203564 --- Diff: docs/running-on-yarn.md --- @@ -2,6 +2,8 @@ layout: global title: Running Spark on YARN --- +* This will become a table of contents (this text will be scraped). --- End diff -- This text will be replaced with the TOC.
[GitHub] spark issue #20813: [SPARK-23670][SQL] Fix memory leak on SparkPlanGraphWrap...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20813 **[Test build #88210 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88210/testReport)** for PR 20813 at commit [`f866701`](https://github.com/apache/spark/commit/f866701b322c9ddf2fddc49d162fc9bc8d83bcdb).
[GitHub] spark issue #20813: [SPARK-23670][SQL] Fix memory leak on SparkPlanGraphWrap...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/20813 ok to test
[GitHub] spark issue #20659: [DO-NOT-MERGE] Try to update Hive to 2.3.2
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20659 Merged build finished. Test FAILed.
[GitHub] spark issue #20659: [DO-NOT-MERGE] Try to update Hive to 2.3.2
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20659 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88206/ Test FAILed.
[GitHub] spark issue #20659: [DO-NOT-MERGE] Try to update Hive to 2.3.2
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20659 **[Test build #88206 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88206/testReport)** for PR 20659 at commit [`b35daa0`](https://github.com/apache/spark/commit/b35daa0593af1204e3b2833c30ec0374e8c2b530). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20249: [SPARK-23057][SPARK-19235][SQL] SET LOCATION should chan...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20249 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88205/ Test PASSed.
[GitHub] spark issue #20249: [SPARK-23057][SPARK-19235][SQL] SET LOCATION should chan...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20249 Merged build finished. Test PASSed.
[GitHub] spark issue #20249: [SPARK-23057][SPARK-19235][SQL] SET LOCATION should chan...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20249 **[Test build #88205 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88205/testReport)** for PR 20249 at commit [`910d4d0`](https://github.com/apache/spark/commit/910d4d08e6743bd29358453ae977a10c30d36774). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20813: [SPARK-23670][SQL] Fix memory leak on SparkPlanGraphWrap...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20813 Can one of the admins verify this patch?
[GitHub] spark pull request #20813: [SPARK-23670][SQL] Fix memory leak on SparkPlanGr...
GitHub user myroslavlisniak opened a pull request: https://github.com/apache/spark/pull/20813 [SPARK-23670][SQL] Fix memory leak on SparkPlanGraphWrapper ## What changes were proposed in this pull request? Clean up SparkPlanGraphWrapper objects from the InMemoryStore together with the corresponding SQLExecutionUIData. ## How was this patch tested? The existing unit test was extended to also check the SparkPlanGraphWrapper object count. You can merge this pull request into a Git repository by running: $ git pull https://github.com/myroslavlisniak/spark master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20813.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20813 commit f866701b322c9ddf2fddc49d162fc9bc8d83bcdb Author: myroslavlisniak Date: 2018-03-13T11:22:07Z fix memory leak on SparkPlanGraphWrapper
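The shape of a leak like this can be sketched with two maps standing in for the per-execution entries in the UI store (the names and layout below are illustrative assumptions, not Spark's actual classes): eviction previously removed the execution data but not its companion plan-graph entry.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of the leak pattern: per-execution data lives in two
// places, and eviction must remove both entries, not just one.
class UiStoreSketch {
    final Map<Long, Object> executionData = new HashMap<>();
    final Map<Long, Object> planGraphs = new HashMap<>();

    void record(long executionId, Object data, Object graph) {
        executionData.put(executionId, data);
        planGraphs.put(executionId, graph);
    }

    void evict(long executionId) {
        executionData.remove(executionId);
        planGraphs.remove(executionId); // the removal that was missing -> leak
    }
}
```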
[GitHub] spark issue #20433: [SPARK-23264][SQL] Support interval values without INTER...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20433 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1490/ Test PASSed.
[GitHub] spark issue #20433: [SPARK-23264][SQL] Support interval values without INTER...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20433 **[Test build #88209 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88209/testReport)** for PR 20433 at commit [`f6210a2`](https://github.com/apache/spark/commit/f6210a2029129d38c15aeeb309c2b001d83757a6).
[GitHub] spark issue #20433: [SPARK-23264][SQL] Support interval values without INTER...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20433 Merged build finished. Test PASSed.
[GitHub] spark issue #20812: [SPARK-23669] Executors fetch jars and name the jars wit...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20812 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1489/ Test PASSed.
[GitHub] spark issue #20812: [SPARK-23669] Executors fetch jars and name the jars wit...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20812 Merged build finished. Test PASSed.
[GitHub] spark issue #20812: [SPARK-23669] Executors fetch jars and name the jars wit...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20812 **[Test build #88208 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88208/testReport)** for PR 20812 at commit [`f78c273`](https://github.com/apache/spark/commit/f78c273c6132f9cc226668590273836950c39b74).
[GitHub] spark issue #20812: [SPARK-23669] Executors fetch jars and name the jars wit...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20812 Merged build finished. Test PASSed.
[GitHub] spark issue #20812: [SPARK-23669] Executors fetch jars and name the jars wit...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20812 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1488/ Test PASSed.
[GitHub] spark pull request #20812: [SPARK-23669] Executors fetch jars and name the j...
GitHub user jinxing64 opened a pull request: https://github.com/apache/spark/pull/20812 [SPARK-23669] Executors fetch jars and name the jars with md5 prefix ## What changes were proposed in this pull request? In our cluster there are lots of UDF jars, and some of them have the same filename but different paths, for example: ``` hdfs://A/B/udf.jar -> udfA hdfs://C/D/udf.jar -> udfB ``` When a user uses udfA and udfB in the same SQL statement, the executor fetches both `hdfs://A/B/udf.jar` and `hdfs://C/D/udf.jar` to local disk, and the identical filenames conflict. Can we add a config to fetch jars and save them under filenames with an MD5 prefix, so there is no conflict? ## How was this patch tested? UT You can merge this pull request into a Git repository by running: $ git pull https://github.com/jinxing64/spark SPARK-23669 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20812.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20812 commit 5791edb4d325f24be63485032bf01125cc2aa28b Author: jinxing Date: 2018-03-13T14:15:56Z [SPARK-23669] Executors fetch jars and name the jars with md5 prefix
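A hedged sketch of the naming scheme being proposed (the helper below is illustrative, not the PR's actual code): prefix the locally saved file with the MD5 of the full URI, so two jars that share a basename no longer collide.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Illustrative helper: derive a local filename as md5(uri) + "_" + basename,
// so "hdfs://A/B/udf.jar" and "hdfs://C/D/udf.jar" get distinct local names.
class JarNaming {
    static String prefixedName(String uri) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(uri.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            String base = uri.substring(uri.lastIndexOf('/') + 1);
            return hex + "_" + base;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 unavailable", e); // never on stock JVMs
        }
    }
}
```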
[GitHub] spark pull request #20780: [MINOR] [SQL] [TEST] Create table using `dataSour...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20780
[GitHub] spark issue #20780: [MINOR] [SQL] [TEST] Create table using `dataSourceName`...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20780 Merged to master.
[GitHub] spark issue #20810: [SPARK-20114][ML] spark.ml parity for sequential pattern...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20810 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88207/ Test PASSed.
[GitHub] spark issue #20810: [SPARK-20114][ML] spark.ml parity for sequential pattern...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20810 Merged build finished. Test PASSed.
[GitHub] spark issue #20810: [SPARK-20114][ML] spark.ml parity for sequential pattern...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20810 **[Test build #88207 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88207/testReport)** for PR 20810 at commit [`c1c5338`](https://github.com/apache/spark/commit/c1c5338c5698bb4fa87151fd8ba5cc986e1e1466). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #20804: [SPARK-23656][Test] Perform assertions in XXH64Su...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20804
[GitHub] spark issue #20811: [SPARK-23668][K8S] Add config option for passing through...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20811 Can one of the admins verify this patch?
[GitHub] spark issue #19675: [SPARK-14540][BUILD] Support Scala 2.12 closures and Jav...
Github user ekrich commented on the issue: https://github.com/apache/spark/pull/19675 I just posted the info on https://gitter.im/scala/contributors but there is also scala/center and scala/scala or the forum at https://contributors.scala-lang.org/ . Maybe Lightbend too.
[GitHub] spark issue #20804: [SPARK-23656][Test] Perform assertions in XXH64Suite.tes...
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/20804 LGTM - merging to master. Thanks!
[GitHub] spark issue #20742: [SPARK-23572][docs] Bring "security.md" up to date.
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/20742 +1
[GitHub] spark pull request #20811: [SPARK-23668][K8S] Add config option for passing ...
GitHub user andrusha opened a pull request: https://github.com/apache/spark/pull/20811 [SPARK-23668][K8S] Add config option for passing through k8s Pod.spec.imagePullSecrets ## What changes were proposed in this pull request? Pass through the `imagePullSecrets` option to the k8s pod in order to allow user to access private image registries. See https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry/ ## How was this patch tested? Unit tests + manual testing. Manual testing procedure: 1. Have private image registry. 2. Spark-submit application with no `spark.kubernetes.imagePullSecret` set. Do `kubectl describe pod ...`. See the error message: ``` Error syncing pod, skipping: failed to "StartContainer" for "spark-kubernetes-driver" with ErrImagePull: "rpc error: code = 2 desc = Error: Status 400 trying to pull repository rtdp/hyperconvergence: \"{\\n \\\"errors\\\" : [ {\\n\\\"status\\\" : 400,\\n\\\"message\\\" : \\\"Unsupported docker v1 repository request for '...'\\\"\\n } ]\\n}\"" ``` 3. Create secret `kubectl create secret docker-registry ...` 4. Spark-submit with `spark.kubernetes.imagePullSecret` set to the new secret. See that deployment was successful. You can merge this pull request into a Git repository by running: $ git pull https://github.com/andrusha/spark spark-23668-image-pull-secrets Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20811.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20811 commit dc2c1852a5056a023de64855d1f3b1ce5fd050b9 Author: Andrew Korzhuev Date: 2018-03-13T14:05:58Z Add config option for passing through k8s Pod.spec.imagePullSecrets This will allow users to access images from private registries.
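For reference, the Kubernetes pod field being populated is `spec.imagePullSecrets`; a pod spec using it looks roughly like this (the secret and image names are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: spark-driver-example
spec:
  containers:
  - name: spark-kubernetes-driver
    image: my-registry.example.com/spark:latest
  imagePullSecrets:
  - name: my-registry-secret   # created via `kubectl create secret docker-registry ...`
```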
[GitHub] spark issue #19675: [SPARK-14540][BUILD] Support Scala 2.12 closures and Jav...
Github user fedeoasi commented on the issue: https://github.com/apache/spark/pull/19675 Ideally we would get someone from Scala or the Scala Center to pick this up. Does anyone know how to get in touch with them?
[GitHub] spark pull request #19881: [SPARK-22683][CORE] Add tasksPerExecutorSlot para...
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/19881#discussion_r174130728 --- Diff: docs/configuration.md --- @@ -1795,6 +1796,19 @@ Apart from these, the following properties are also available, and may be useful Lower bound for the number of executors if dynamic allocation is enabled. + + spark.dynamicAllocation.fullParallelismDivisor + 1 + +By default, the dynamic allocation will request enough executors to maximize the +parallelism according to the number of tasks to process. While this minimizes the +latency of the job, with small tasks this setting wastes a lot of resources due to +executor allocation overhead, as some executor might not even do any work. +This setting allows to set a divisor that will be used to reduce the number of +executors w.r.t. full parallelism --- End diff -- add period at end of parallelism
[GitHub] spark pull request #19881: [SPARK-22683][CORE] Add tasksPerExecutorSlot para...
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/19881#discussion_r174126562 --- Diff: docs/configuration.md --- @@ -1795,6 +1796,19 @@ Apart from these, the following properties are also available, and may be useful Lower bound for the number of executors if dynamic allocation is enabled. + + spark.dynamicAllocation.fullParallelismDivisor + 1 + +By default, the dynamic allocation will request enough executors to maximize the +parallelism according to the number of tasks to process. While this minimizes the +latency of the job, with small tasks this setting wastes a lot of resources due to +executor allocation overhead, as some executor might not even do any work. +This setting allows to set a divisor that will be used to reduce the number of +executors w.r.t. full parallelism +Defaults to 1.0 --- End diff -- I think we should define that maxExecutors trumps this setting. If I have 10000 tasks and a divisor of 2, I would expect 5000 executors, but if max executors is 1000, that is all I get. We should add a test for this interaction as well.
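The interaction being asked about can be sketched as follows (these are assumed semantics of the proposed config for illustration, not Spark's actual allocation code): the divisor scales down the executor target derived from pending tasks, and the max-executors cap always wins.

```java
// Illustrative sketch of the proposed divisor's interaction with the
// executor cap: 10000 tasks with divisor 2 would target 5000 executors,
// but a cap of 1000 limits the result to 1000.
class AllocationSketch {
    static int targetExecutors(int pendingTasks, int tasksPerExecutor,
                               double divisor, int maxExecutors) {
        int full = (int) Math.ceil((double) pendingTasks / tasksPerExecutor);
        int scaled = (int) Math.ceil(full / divisor);
        return Math.min(scaled, maxExecutors); // the cap trumps the divisor
    }
}
```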
[GitHub] spark pull request #19881: [SPARK-22683][CORE] Add tasksPerExecutorSlot para...
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/19881#discussion_r174145101 --- Diff: docs/configuration.md --- @@ -1795,6 +1796,19 @@ Apart from these, the following properties are also available, and may be useful Lower bound for the number of executors if dynamic allocation is enabled. + + spark.dynamicAllocation.fullParallelismDivisor --- End diff -- Naming configs is really hard and lots of different opinions on it, and in the end someone is going to be confused; I need to think about this some more. I see the reason to use Parallelism here rather than maxExecutors (maxExecutorsDivisor could be confusing if people think it applies to the maxExecutors config), but I also think parallelism would be confused with the parallelism in spark.default.parallelism; it's not defining the number of tasks but the number of executors to allocate based on the parallelism. Another one I thought of is executorAllocationDivisor. I'll think about it some more and get back.
[GitHub] spark pull request #19881: [SPARK-22683][CORE] Add tasksPerExecutorSlot para...
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/19881#discussion_r174125381 --- Diff: docs/configuration.md --- @@ -1795,6 +1796,19 @@ Apart from these, the following properties are also available, and may be useful Lower bound for the number of executors if dynamic allocation is enabled. + + spark.dynamicAllocation.fullParallelismDivisor + 1 + +By default, the dynamic allocation will request enough executors to maximize the +parallelism according to the number of tasks to process. While this minimizes the +latency of the job, with small tasks this setting wastes a lot of resources due to --- End diff -- can waste.
[GitHub] spark issue #19675: [SPARK-14540][BUILD] Support Scala 2.12 closures and Jav...
Github user ekrich commented on the issue: https://github.com/apache/spark/pull/19675 Thank-you for the clarification.
[GitHub] spark issue #19675: [SPARK-14540][BUILD] Support Scala 2.12 closures and Jav...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/19675 I am not working on this and am not aware of anyone working on it. Yes, it will only be resolved when someone picks up the last piece of work and finishes it. It should be down to handling how closures are serialized as lambdas now; all the other updates should be in place.
[GitHub] spark pull request #20742: [SPARK-23572][docs] Bring "security.md" up to dat...
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/20742#discussion_r174133114 --- Diff: docs/running-on-yarn.md --- @@ -2,6 +2,8 @@ layout: global title: Running Spark on YARN --- +* This will become a table of contents (this text will be scraped). --- End diff -- maybe I misread this, is it supposed to be this text will be scraped (meaning looked for in order to fill in TOC) or was it supposed to be scrapped (meaning thrown away)?
[GitHub] spark issue #20692: [SPARK-23531][SQL] Show attribute type in explain
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/20692 sure, @gatorsmile, no problem. Thanks.
[GitHub] spark issue #19675: [SPARK-14540][BUILD] Support Scala 2.12 closures and Jav...
Github user ekrich commented on the issue: https://github.com/apache/spark/pull/19675 @srowen Are you needing expertise to get the last issues fixed? It would sure be nice to see this completed and released. Java 8/Scala 2.12 was a hard upgrade for Scala too and it has taken the point releases to solve some problems needed for Spark. Scala 2.12.0 was released on 3 Nov 2016 and 2.13 is on the horizon so it would be great to see Spark support 2.12 very soon.