[GitHub] spark issue #15338: [SPARK-11653][Deploy] Allow spark-daemon.sh to run in th...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15338 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15432: [SPARK-17854][SQL] rand/randn allows null as input seed
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/15432 @rxin yes, I just wanted to avoid changing a lot. Will try to fix it in that way (at least) to show how it actually looks.
[GitHub] spark pull request #15432: [SPARK-17854][SQL] rand/randn allows null as inpu...
GitHub user HyukjinKwon reopened a pull request: https://github.com/apache/spark/pull/15432

[SPARK-17854][SQL] rand/randn allows null as input seed

## What changes were proposed in this pull request?

This PR proposes `rand`/`randn` accept `null` as input. In this case, it treats the values as `0`. It seems MySQL also accepts this.

```sql
mysql> select rand(0);
+---------------------+
| rand(0)             |
+---------------------+
| 0.15522042769493574 |
+---------------------+
1 row in set (0.00 sec)

mysql> select rand(NULL);
+---------------------+
| rand(NULL)          |
+---------------------+
| 0.15522042769493574 |
+---------------------+
1 row in set (0.00 sec)
```

and so does Hive, according to [HIVE-14694](https://issues.apache.org/jira/browse/HIVE-14694).

## How was this patch tested?

Unit tests in `DataFrameSuite.scala` and `RandomSuite.scala`.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HyukjinKwon/spark SPARK-17854

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/15432.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #15432

commit 7fa7db22dd4f2ba88ab1f09e4b776003b3f62fdb
Author: hyukjinkwon
Date: 2016-10-11T09:21:18Z

    rand/randn allows null as input seed
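The proposed semantics — a null seed behaves exactly like seed 0 — can be sketched outside Spark with a plain-Python stand-in. The `rand_with_seed` helper below is hypothetical, written only to illustrate the null-to-zero coercion, and is not Spark's implementation:

```python
import random

def rand_with_seed(seed=None):
    """Stand-in for SQL rand(seed): a null (None) seed is coerced to 0,
    so rand(NULL) and rand(0) produce the same deterministic value."""
    effective_seed = 0 if seed is None else seed
    return random.Random(effective_seed).random()

# rand(NULL) agrees with rand(0), mirroring the MySQL/Hive behavior quoted above,
# while an explicit non-zero seed still gives an independent stream.
assert rand_with_seed(None) == rand_with_seed(0)
assert rand_with_seed(None) != rand_with_seed(1)
```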
[GitHub] spark issue #15338: [SPARK-11653][Deploy] Allow spark-daemon.sh to run in th...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15338 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66759/ Test FAILed.
[GitHub] spark issue #15421: [SPARK-17811] SparkR cannot parallelize data.frame with ...
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/15421 That's interesting. I patched your change to a clean checkout and simply tested against the example on the JIRA. It throws the above exception:

```scala
val obj = (sqlSerDe._1)(dis, dataType)
if (obj == null) {
  throw new IllegalArgumentException(s"Invalid type $dataType") // <= this line
} else {
  obj
}
```

I have no clue why it fails on my laptop. I can test on my own server (ubuntu) tonight.
[GitHub] spark issue #15421: [SPARK-17811] SparkR cannot parallelize data.frame with ...
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/15421 I suspect that it could be related to my R installation:

```
localhost:~ mwang$ R
R version 3.3.0 (2016-05-03) -- "Supposedly Educational"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-apple-darwin13.4.0 (64-bit)
```

But I am not sure yet.
[GitHub] spark pull request #15389: [SPARK-17817][PySpark] PySpark RDD Repartitioning...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/15389#discussion_r82907726

--- Diff: python/pyspark/rdd.py ---

```diff
@@ -2029,7 +2028,15 @@ def coalesce(self, numPartitions, shuffle=False):
         >>> sc.parallelize([1, 2, 3, 4, 5], 3).coalesce(1).glom().collect()
         [[1, 2, 3, 4, 5]]
         """
-        jrdd = self._jrdd.coalesce(numPartitions, shuffle)
+        if shuffle:
+            # In Scala's repartition code, we will distribute elements evenly across output
+            # partitions. However, the RDD from Python is serialized as a single binary data,
+            # so the distribution fails and produces highly skewed partitions. We need to
+            # convert it to a RDD of java object before repartitioning.
+            data_java_rdd = self._to_java_object_rdd().coalesce(numPartitions, shuffle)
```

--- End diff --

Hi @davies, actually it seems a simple benchmark was done in https://github.com/apache/spark/pull/15389#discussion_r82444378 If you worry, then, I'd like to proceed another benchmark with larger data and then will share when I have some time.
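The skew described in the diff comment can be illustrated without Spark: a shuffle-based repartition spreads whatever records it sees across partitions, so if the input is one opaque serialized blob rather than individual records, everything lands in a single partition. This is a plain-Python sketch of that effect; the `distribute` helper is a hypothetical stand-in, not Spark's code:

```python
def distribute(items, num_partitions):
    """Round-robin items across partitions, the way a shuffle-based
    repartition spreads the records it can see individually."""
    parts = [[] for _ in range(num_partitions)]
    for i, item in enumerate(items):
        parts[i % num_partitions].append(item)
    return parts

records = list(range(8))

# Individual records: evenly filled partitions.
even = distribute(records, 4)

# One opaque serialized blob standing in for the whole Python RDD:
# the shuffle sees a single record, producing the skew described above.
skewed = distribute([bytes(records)], 4)

assert [len(p) for p in even] == [2, 2, 2, 2]
assert [len(p) for p in skewed] == [1, 0, 0, 0]
```

Converting back to an RDD of individual (Java) objects before the shuffle, as the patch does, restores the "individual records" case.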
[GitHub] spark pull request #11336: [SPARK-9325][SPARK-R] collect() head() and show()...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/11336#discussion_r82907729 --- Diff: R/pkg/R/DataFrame.R --- @@ -1035,10 +1035,16 @@ setMethod("dim", c(count(x), ncol(x)) }) -#' Collects all the elements of a SparkDataFrame and coerces them into an R data.frame. +#' Download Spark datasets into R --- End diff -- I'm not sure this should say "datasets" - we don't have this term elsewhere
[GitHub] spark pull request #11336: [SPARK-9325][SPARK-R] collect() head() and show()...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/11336#discussion_r82907897 --- Diff: R/pkg/R/DataFrame.R --- @@ -1182,10 +1195,18 @@ setMethod("take", #' @export #' @examples #'\dontrun{ -#' sparkR.session() -#' path <- "path/to/file.json" -#' df <- read.json(path) -#' head(df) +#' # Initialize Spark context and SQL context +#' sc <- sparkR.init() +#' sqlContext <- sparkRSQL.init(sc) --- End diff -- ditto here
[GitHub] spark pull request #11336: [SPARK-9325][SPARK-R] collect() head() and show()...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/11336#discussion_r82907977 --- Diff: R/pkg/R/DataFrame.R --- @@ -1168,12 +1179,14 @@ setMethod("take", #' Head #' -#' Return the first \code{num} rows of a SparkDataFrame as a R data.frame. If \code{num} is not -#' specified, then head() returns the first 6 rows as with R data.frame. +#' Return the first elements of a dataset. If \code{x} is a SparkDataFrame, its first +#' rows will be returned as a data.frame. If the dataset is a \code{Column}, its first +#' elements will be returned as a vector. The number of elements to be returned +#' is given by parameter \code{num}. Default value for \code{num} is 6. #' -#' @param x a SparkDataFrame. -#' @param num the number of rows to return. Default is 6. -#' @return A data.frame. +#' @param x A SparkDataFrame or Column --- End diff -- for something like this the convention we have is to add the @param in generics.R - you can see other examples there
[GitHub] spark issue #14690: [SPARK-16980][SQL] Load only catalog table partition met...
Github user ericl commented on the issue: https://github.com/apache/spark/pull/14690 Btw I also made https://github.com/VideoAmp/spark-public/pull/2/files, to fix inputFiles.
[GitHub] spark pull request #11336: [SPARK-9325][SPARK-R] collect() head() and show()...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/11336#discussion_r82908439

--- Diff: R/pkg/R/column.R ---

```diff
@@ -32,35 +34,65 @@ setOldClass("jobj")
 #' @export
 #' @note Column since 1.4.0
 setClass("Column",
-         slots = list(jc = "jobj"))
+         slots = list(jc = "jobj", df = "SparkDataFrameOrNull"))

 #' A set of operations working with SparkDataFrame columns
 #' @rdname columnfunctions
 #' @name columnfunctions
 NULL

-setMethod("initialize", "Column", function(.Object, jc) {
+setMethod("initialize", "Column", function(.Object, jc, df) {
   .Object@jc <- jc
+
+  # Some Column objects don't have any referencing DataFrame. In such case, df will be NULL.
+  if (missing(df)) {
+    df <- NULL
+  }
+  .Object@df <- df
   .Object
 })

+setMethod("show", signature = "Column", definition = function(object) {
```

--- End diff --

+1, default to 6 for consistency?
[GitHub] spark issue #11336: [SPARK-9325][SPARK-R] collect() head() and show() for Co...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/11336 Merged build finished. Test FAILed.
[GitHub] spark issue #11336: [SPARK-9325][SPARK-R] collect() head() and show() for Co...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/11336 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66767/ Test FAILed.
[GitHub] spark issue #11336: [SPARK-9325][SPARK-R] collect() head() and show() for Co...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/11336 **[Test build #66767 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66767/consoleFull)** for PR 11336 at commit [`ed0abf2`](https://github.com/apache/spark/commit/ed0abf24d7f65ad2381f6d664ba23e440013c97a). * This patch **fails SparkR unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #11336: [SPARK-9325][SPARK-R] collect() head() and show()...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/11336#discussion_r82908731

--- Diff: R/pkg/R/functions.R ---

```diff
@@ -2836,7 +2845,11 @@ setMethod("lpad", signature(x = "Column", len = "numeric", pad = "character"),
 setMethod("rand", signature(seed = "missing"),
           function(seed) {
             jc <- callJStatic("org.apache.spark.sql.functions", "rand")
-            column(jc)
+
+            # By assigning a one-row data.frame, the result of this function can be collected
+            # returning a one-element Column
+            df <- as.DataFrame(sparkRSQL.init(), data.frame(0))
```

--- End diff --

I think this is why the test fails - do not use sparkRSQL.init()
[GitHub] spark pull request #11336: [SPARK-9325][SPARK-R] collect() head() and show()...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/11336#discussion_r82908825 --- Diff: R/pkg/R/functions.R --- @@ -2876,7 +2897,8 @@ setMethod("randn", signature(seed = "missing"), setMethod("randn", signature(seed = "numeric"), function(seed) { jc <- callJStatic("org.apache.spark.sql.functions", "randn", as.integer(seed)) -column(jc) +df <- as.DataFrame(sparkRSQL.init(), data.frame(0)) --- End diff -- ditto
[GitHub] spark pull request #11336: [SPARK-9325][SPARK-R] collect() head() and show()...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/11336#discussion_r82908799 --- Diff: R/pkg/R/functions.R --- @@ -2847,7 +2860,11 @@ setMethod("rand", signature(seed = "missing"), setMethod("rand", signature(seed = "numeric"), function(seed) { jc <- callJStatic("org.apache.spark.sql.functions", "rand", as.integer(seed)) -column(jc) + +# By assigning a one-row data.frame, the result of this function can be collected +# returning a one-element Column +df <- as.DataFrame(sparkRSQL.init(), data.frame(0)) --- End diff -- ditto
[GitHub] spark pull request #11336: [SPARK-9325][SPARK-R] collect() head() and show()...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/11336#discussion_r82908811 --- Diff: R/pkg/R/functions.R --- @@ -2865,7 +2882,11 @@ setMethod("rand", signature(seed = "numeric"), setMethod("randn", signature(seed = "missing"), function(seed) { jc <- callJStatic("org.apache.spark.sql.functions", "randn") -column(jc) + +# By assigning a one-row data.frame, the result of this function can be collected +# returning a one-element Column +df <- as.DataFrame(sparkRSQL.init(), data.frame(0)) --- End diff -- ditto
[GitHub] spark pull request #11336: [SPARK-9325][SPARK-R] collect() head() and show()...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/11336#discussion_r82908836 --- Diff: R/pkg/R/functions.R --- @@ -3026,7 +3048,11 @@ setMethod("translate", setMethod("unix_timestamp", signature(x = "missing", format = "missing"), function(x, format) { jc <- callJStatic("org.apache.spark.sql.functions", "unix_timestamp") -column(jc) + +# By assigning a one-row data.frame, the result of this function can be collected +# returning a one-element Column +df <- as.DataFrame(sparkRSQL.init(), data.frame(0)) --- End diff -- ditto
[GitHub] spark issue #15438: [SPARK-17845][SQL] More self-evident window function fra...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15438 **[Test build #66762 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66762/consoleFull)** for PR 15438 at commit [`1913d29`](https://github.com/apache/spark/commit/1913d29b36a408e8b583fc97045847369e31ff66). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #11336: [SPARK-9325][SPARK-R] collect() head() and show() for Co...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/11336 I know why tests fail - please see my comment.
[GitHub] spark issue #15438: [SPARK-17845][SQL] More self-evident window function fra...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15438 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66762/ Test PASSed.
[GitHub] spark issue #15438: [SPARK-17845][SQL] More self-evident window function fra...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15438 Merged build finished. Test PASSed.
[GitHub] spark pull request #15307: [SPARK-17731][SQL][STREAMING] Metrics for structu...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/15307#discussion_r82909681

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamMetrics.scala ---

```scala
// @@ -0,0 +1,244 @@ (new file)
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License. You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.spark.sql.execution.streaming

import java.{util => ju}

import scala.collection.mutable

import com.codahale.metrics.{Gauge, MetricRegistry}

import org.apache.spark.internal.Logging
import org.apache.spark.metrics.source.{Source => CodahaleSource}
import org.apache.spark.util.Clock

/**
 * Class that manages all the metrics related to a StreamingQuery. It does the following.
 * - Calculates metrics (rates, latencies, etc.) based on information reported by StreamExecution.
 * - Allows the current metric values to be queried
 * - Serves some of the metrics through Codahale/DropWizard metrics
 *
 * @param sources Unique set of sources in a query
 * @param triggerClock Clock used for triggering in StreamExecution
 * @param codahaleSourceName Root name for all the Codahale metrics
 */
class StreamMetrics(sources: Set[Source], triggerClock: Clock, codahaleSourceName: String)
  extends CodahaleSource with Logging {

  import StreamMetrics._

  // Trigger infos
  private val triggerStatus = new mutable.HashMap[String, String]
  private val sourceTriggerStatus = new mutable.HashMap[Source, mutable.HashMap[String, String]]

  // Rate estimators for sources and sinks
  private val inputRates = new mutable.HashMap[Source, RateCalculator]
  private val processingRates = new mutable.HashMap[Source, RateCalculator]

  // Number of input rows in the current trigger
  private val numInputRows = new mutable.HashMap[Source, Long]
  private var numOutputRows: Option[Long] = None
  private var currentTriggerStartTimestamp: Long = -1
  private var previousTriggerStartTimestamp: Long = -1
  private var latency: Option[Double] = None

  override val sourceName: String = codahaleSourceName
  override val metricRegistry: MetricRegistry = new MetricRegistry

  // === Initialization ===

  // Metric names should not have . in them, so that all the metrics of a query are identified
  // together in Ganglia as a single metric group
  registerGauge("inputRate-total", currentInputRate)
  registerGauge("processingRate-total", () => currentProcessingRate)
  registerGauge("latency", () => currentLatency().getOrElse(-1.0))

  sources.foreach { s =>
    inputRates.put(s, new RateCalculator)
    processingRates.put(s, new RateCalculator)
    sourceTriggerStatus.put(s, new mutable.HashMap[String, String])

    registerGauge(s"inputRate-${s.toString}", () => currentSourceInputRate(s))
    registerGauge(s"processingRate-${s.toString}", () => currentSourceProcessingRate(s))
  }

  // === Setter methods ===

  def reportTriggerStarted(triggerId: Long): Unit = synchronized {
    numInputRows.clear()
    numOutputRows = None
    triggerStatus.clear()
    sourceTriggerStatus.values.foreach(_.clear())

    reportTriggerStatus(TRIGGER_ID, triggerId)
    sources.foreach(s => reportSourceTriggerStatus(s, TRIGGER_ID, triggerId))
    reportTriggerStatus(ACTIVE, true)
    currentTriggerStartTimestamp = triggerClock.getTimeMillis()
    reportTriggerStatus(START_TIMESTAMP, currentTriggerStartTimestamp)
  }

  def reportTriggerStatus[T](key: String, value: T): Unit = synchronized {
    triggerStatus.put(key, value.toString)
  }

  def reportSourceTriggerStatus[T](source: Source, key: String, value: T): Unit = synchronized {
    sourceTriggerStatus(source).put(key, value.toString)
  }

  def reportNumInputRows(inputRows: Map[Source, Long]): Unit = synchroni
```
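The rate bookkeeping this class describes (rows and elapsed time accumulated across triggers, then divided to give rows per second) can be sketched in plain Python. This `RateCalculator` is a hypothetical reimplementation based only on the class description quoted above, not Spark's actual code:

```python
class RateCalculator:
    """Tracks a running rate: total units (e.g. input rows) per second."""

    def __init__(self):
        self.total_units = 0
        self.total_seconds = 0.0

    def update(self, units, elapsed_seconds):
        """Record one trigger's worth of work and elapsed time."""
        self.total_units += units
        self.total_seconds += elapsed_seconds

    @property
    def rate(self):
        # Report 0.0 before any time has elapsed rather than dividing by zero.
        if self.total_seconds == 0:
            return 0.0
        return self.total_units / self.total_seconds

calc = RateCalculator()
calc.update(units=500, elapsed_seconds=2.0)   # one trigger: 500 rows in 2 s
calc.update(units=1500, elapsed_seconds=2.0)  # next trigger: 1500 rows in 2 s
assert calc.rate == 500.0                     # 2000 rows / 4 s
```

A gauge registered per source would then simply expose `calc.rate`, matching the `inputRate-*` and `processingRate-*` gauges in the diff.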
[GitHub] spark pull request #15439: [SPARK-17880][DOC] The url linking to `Accumulato...
GitHub user sarutak opened a pull request: https://github.com/apache/spark/pull/15439

[SPARK-17880][DOC] The url linking to `AccumulatorV2` in the document is incorrect.

## What changes were proposed in this pull request?

In `programming-guide.md`, the url which links to `AccumulatorV2` says `api/scala/index.html#org.apache.spark.AccumulatorV2`, but `api/scala/index.html#org.apache.spark.util.AccumulatorV2` is correct.

## How was this patch tested?

Manual test.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sarutak/spark SPARK-17880

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/15439.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #15439

commit 623120685fcf3136007c450d5f282a5312bcce2f
Author: Kousuke Saruta
Date: 2016-10-11T23:22:59Z

    Fix the url to AccumulatorV2
[GitHub] spark issue #15421: [SPARK-17811] SparkR cannot parallelize data.frame with ...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/15421 It's possibly the R version - Jenkins is running 3.1.1 I think, the minimal supported version. AppVeyor is running 3.3.2 I believe, which matches closer to the one @wangmiao1981 has.
[GitHub] spark issue #15307: [SPARK-17731][SQL][STREAMING] Metrics for structured str...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15307 **[Test build #66768 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66768/consoleFull)** for PR 15307 at commit [`3d7c71a`](https://github.com/apache/spark/commit/3d7c71a24b3fbfe86fee074b9034db4b89eca2bb).
[GitHub] spark pull request #15307: [SPARK-17731][SQL][STREAMING] Metrics for structu...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/15307#discussion_r82909980 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala --- @@ -105,11 +105,21 @@ class StreamExecution( var lastExecution: QueryExecution = null @volatile - var streamDeathCause: StreamingQueryException = null + private var streamDeathCause: StreamingQueryException = null /* Get the call site in the caller thread; will pass this into the micro batch thread */ private val callSite = Utils.getCallSite() + /** Metrics for this query */ + private val streamMetrics = +new StreamMetrics(uniqueSources.toSet, triggerClock, s"StructuredStreaming.$name") --- End diff -- yeah. old data cannot update internal metrics. the final posted QueryTerminated event in the listener bus will have the final value of the metrics.
[GitHub] spark issue #15375: [SPARK-17790][SPARKR] Support for parallelizing R data.f...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/15375 This LGTM. Spark unit tests are failing?
[GitHub] spark pull request #15307: [SPARK-17731][SQL][STREAMING] Metrics for structu...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/15307#discussion_r82910187 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StatefulAggregate.scala --- @@ -56,7 +57,12 @@ case class StateStoreRestoreExec( child: SparkPlan) extends execution.UnaryExecNode with StatefulOperator { + override lazy val metrics = Map( +"numOutputRows" -> SQLMetrics.createMetric(sparkContext, "number of output rows")) + override protected def doExecute(): RDD[InternalRow] = { --- End diff -- `longMetrics("...")` forces `metrics` to be initialized.
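[Editor's sketch] The point above — that reading a metric forces the `lazy val metrics` map to be initialized — can be illustrated in Python, where `functools.cached_property` plays the role of Scala's `lazy val` (class and method names here are illustrative, not Spark's actual API):

```python
from functools import cached_property

class StateStoreRestoreSketch:
    """Illustrative analog of the Scala pattern in the diff above: the
    metrics map is not built at construction time, and touching it via
    long_metric() forces the one-time initialization."""

    def __init__(self):
        self.init_count = 0  # tracks how many times metrics were built

    @cached_property
    def metrics(self):
        # Runs only on first access; later reads reuse the cached dict.
        self.init_count += 1
        return {"numOutputRows": 0}

    def long_metric(self, name):
        # Accessing self.metrics here triggers the lazy initialization,
        # mirroring how longMetric("...") forces `metrics` in Scala.
        return self.metrics[name]

op = StateStoreRestoreSketch()
assert op.init_count == 0            # nothing initialized yet
op.long_metric("numOutputRows")
op.long_metric("numOutputRows")
assert op.init_count == 1            # initialized exactly once
```

Requires Python 3.8+ for `cached_property`; the behavior matches Scala's `lazy val` in being both deferred and computed at most once.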
[GitHub] spark pull request #15307: [SPARK-17731][SQL][STREAMING] Metrics for structu...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/15307#discussion_r82910246 --- Diff: python/pyspark/sql/streaming.py --- @@ -189,6 +189,282 @@ def resetTerminated(self): self._jsqm.resetTerminated() +class StreamingQueryStatus(object): +"""A class used to report information about the progress of a StreamingQuery. + +.. note:: Experimental + +.. versionadded:: 2.1 +""" + +def __init__(self, jsqs): +self._jsqs = jsqs + +def __str__(self): +""" +Pretty string of this query status. + +>>> print(sqs) +StreamingQueryStatus: +Query name: query +Query id: 1 +Status timestamp: 123 +Input rate: 1.0 rows/sec +Processing rate 2.0 rows/sec +Latency: 345.0 ms +Trigger status: +key: value +Source statuses [1 source]: +Source 1:MySource1 +Available offset: #0 +Input rate: 4.0 rows/sec +Processing rate: 5.0 rows/sec +Trigger status: +key: value +Sink status: MySink +Committed offsets: [#1, -] +""" +return self._jsqs.toString() + +@property +@ignore_unicode_prefix +@since(2.1) +def name(self): +""" +Name of the query. This name is unique across all active queries. + +>>> sqs.name +u'query' +""" +return self._jsqs.name() + +@property +@since(2.1) +def id(self): +""" +Id of the query. This id is unique across all queries that have been started in +the current process. + +>>> int(sqs.id) +1 +""" +return self._jsqs.id() + +@property +@since(2.1) +def timestamp(self): +""" +Timestamp (ms) of when this query was generated. + +>>> int(sqs.timestamp) +123 +""" +return self._jsqs.timestamp() + +@property +@since(2.1) +def inputRate(self): +""" +Current rate (rows/sec) at which data is being generated by all the sources. + +>>> sqs.inputRate +1.0 +""" +return self._jsqs.inputRate() + +@property +@since(2.1) +def processingRate(self): +""" +Current rate (rows/sec) at which the query is processing data from all the sources. 
+ +>>> sqs.processingRate +2.0 +""" +return self._jsqs.processingRate() + +@property +@since(2.1) +def latency(self): +""" +Current average latency between the data being available in source and the sink +writing the corresponding output. + +>>> sqs.latency +345.0 +""" +if (self._jsqs.latency().nonEmpty()): +return self._jsqs.latency().get() +else: +return None + +@property +@since(2.1) +def sourceStatuses(self): +""" +Current statuses of the sources. --- End diff -- Added
[GitHub] spark issue #15375: [SPARK-17790][SPARKR] Support for parallelizing R data.f...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/15375 It's interesting AppVeyor is not running for this PR even though there are R changes.
[GitHub] spark pull request #11336: [SPARK-9325][SPARK-R] collect() head() and show()...
Github user olarayej commented on a diff in the pull request: https://github.com/apache/spark/pull/11336#discussion_r82910308 --- Diff: R/pkg/R/functions.R --- @@ -2836,7 +2845,11 @@ setMethod("lpad", signature(x = "Column", len = "numeric", pad = "character"), setMethod("rand", signature(seed = "missing"), function(seed) { jc <- callJStatic("org.apache.spark.sql.functions", "rand") -column(jc) + +# By assigning a one-row data.frame, the result of this function can be collected +# returning a one-element Column +df <- as.DataFrame(sparkRSQL.init(), data.frame(0)) --- End diff -- See my comment from March 30 to illustrate why this is needed. I'll change sparkRSQL.init() to sparkR.session(). Thanks for catching this!
[GitHub] spark pull request #15307: [SPARK-17731][SQL][STREAMING] Metrics for structu...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/15307#discussion_r82910318 --- Diff: python/pyspark/sql/streaming.py --- @@ -189,6 +189,282 @@ def resetTerminated(self): self._jsqm.resetTerminated() +class StreamingQueryStatus(object): +"""A class used to report information about the progress of a StreamingQuery. + +.. note:: Experimental + +.. versionadded:: 2.1 +""" + +def __init__(self, jsqs): +self._jsqs = jsqs + +def __str__(self): +""" +Pretty string of this query status. + +>>> print(sqs) +StreamingQueryStatus: +Query name: query +Query id: 1 +Status timestamp: 123 +Input rate: 1.0 rows/sec +Processing rate 2.0 rows/sec +Latency: 345.0 ms +Trigger status: +key: value +Source statuses [1 source]: +Source 1:MySource1 +Available offset: #0 +Input rate: 4.0 rows/sec +Processing rate: 5.0 rows/sec +Trigger status: +key: value +Sink status: MySink +Committed offsets: [#1, -] +""" +return self._jsqs.toString() + +@property +@ignore_unicode_prefix +@since(2.1) +def name(self): +""" +Name of the query. This name is unique across all active queries. + +>>> sqs.name +u'query' +""" +return self._jsqs.name() + +@property +@since(2.1) +def id(self): +""" +Id of the query. This id is unique across all queries that have been started in +the current process. + +>>> int(sqs.id) +1 +""" +return self._jsqs.id() + +@property +@since(2.1) +def timestamp(self): +""" +Timestamp (ms) of when this query was generated. + +>>> int(sqs.timestamp) +123 +""" +return self._jsqs.timestamp() + +@property +@since(2.1) +def inputRate(self): +""" +Current rate (rows/sec) at which data is being generated by all the sources. + +>>> sqs.inputRate +1.0 +""" +return self._jsqs.inputRate() + +@property +@since(2.1) +def processingRate(self): +""" +Current rate (rows/sec) at which the query is processing data from all the sources. 
+ +>>> sqs.processingRate +2.0 +""" +return self._jsqs.processingRate() + +@property +@since(2.1) +def latency(self): +""" +Current average latency between the data being available in source and the sink +writing the corresponding output. + +>>> sqs.latency +345.0 +""" +if (self._jsqs.latency().nonEmpty()): +return self._jsqs.latency().get() +else: +return None + +@property +@since(2.1) +def sourceStatuses(self): +""" +Current statuses of the sources. + +>>> len(sqs.sourceStatuses) +1 +>>> sqs.sourceStatuses[0].description +u'MySource1' +""" +return [SourceStatus(ss) for ss in self._jsqs.sourceStatuses()] + +@property +@since(2.1) +def sinkStatus(self): +""" +Current status of the sink. + +>>> sqs.sinkStatus.description +u'MySink' +""" +return SinkStatus(self._jsqs.sinkStatus()) + +@property +@since(2.1) +def triggerStatus(self): +""" +Low-level detailed status of the last completed/currently active trigger. + +>>> sqs.triggerStatus +{u'key': u'value'} --- End diff -- I changed the test data to show a glimpse of the actual data that could be there
[GitHub] spark issue #15439: [SPARK-17880][DOC] The url linking to `AccumulatorV2` in...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15439 **[Test build #66769 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66769/consoleFull)** for PR 15439 at commit [`6231206`](https://github.com/apache/spark/commit/623120685fcf3136007c450d5f282a5312bcce2f).
[GitHub] spark issue #15338: [SPARK-11653][Deploy] Allow spark-daemon.sh to run in th...
Github user mikejihbe commented on the issue: https://github.com/apache/spark/pull/15338 Thanks for the review @srowen. Those changes are in.
[GitHub] spark issue #15307: [SPARK-17731][SQL][STREAMING] Metrics for structured str...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15307 **[Test build #66771 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66771/consoleFull)** for PR 15307 at commit [`8b4bce8`](https://github.com/apache/spark/commit/8b4bce8ff338aeb982beb6f93e79f09b718c46b6).
[GitHub] spark issue #14690: [SPARK-16980][SQL] Load only catalog table partition met...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14690 **[Test build #66772 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66772/consoleFull)** for PR 14690 at commit [`10e9e8a`](https://github.com/apache/spark/commit/10e9e8a08661aa53347bccfecbc88aad8e89adb8).
[GitHub] spark issue #10307: [SPARK-12334][SQL][PYSPARK] Support read from multiple i...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/10307 **[Test build #66773 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66773/consoleFull)** for PR 10307 at commit [`b9e6481`](https://github.com/apache/spark/commit/b9e64815890db81d8168e4aa350b939b9b83c94e).
[GitHub] spark issue #15338: [SPARK-11653][Deploy] Allow spark-daemon.sh to run in th...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15338 **[Test build #66770 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66770/consoleFull)** for PR 15338 at commit [`42c9874`](https://github.com/apache/spark/commit/42c9874ac35c124d6cfd93c272dda6e28b4ce9d3).
[GitHub] spark pull request #11336: [SPARK-9325][SPARK-R] collect() head() and show()...
Github user olarayej commented on a diff in the pull request: https://github.com/apache/spark/pull/11336#discussion_r82911244 --- Diff: R/pkg/R/DataFrame.R --- @@ -1035,10 +1035,16 @@ setMethod("dim", c(count(x), ncol(x)) }) -#' Collects all the elements of a SparkDataFrame and coerces them into an R data.frame. +#' Download Spark datasets into R --- End diff -- Sure. Thanks!
[GitHub] spark pull request #11336: [SPARK-9325][SPARK-R] collect() head() and show()...
Github user olarayej commented on a diff in the pull request: https://github.com/apache/spark/pull/11336#discussion_r82911261 --- Diff: R/pkg/R/DataFrame.R --- @@ -1049,11 +1055,16 @@ setMethod("dim", #' @export #' @examples #'\dontrun{ -#' sparkR.session() -#' path <- "path/to/file.json" -#' df <- read.json(path) -#' collected <- collect(df) -#' firstName <- collected[[1]]$name +#' # Initialize Spark context and SQL context +#' sc <- sparkR.init() +#' sqlContext <- sparkRSQL.init(sc) --- End diff -- Sure. Thanks!
[GitHub] spark pull request #11336: [SPARK-9325][SPARK-R] collect() head() and show()...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/11336#discussion_r82911304 --- Diff: R/pkg/R/functions.R --- @@ -2836,7 +2845,11 @@ setMethod("lpad", signature(x = "Column", len = "numeric", pad = "character"), setMethod("rand", signature(seed = "missing"), function(seed) { jc <- callJStatic("org.apache.spark.sql.functions", "rand") -column(jc) + +# By assigning a one-row data.frame, the result of this function can be collected +# returning a one-element Column +df <- as.DataFrame(sparkRSQL.init(), data.frame(0)) --- End diff -- actually, just change it to `as.DataFrame(data.frame(0))`
[GitHub] spark issue #12524: [SPARK-12524][Core]DagScheduler may submit a task set fo...
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/12524 @seayi any progress on this ? Would be good to add this in if consistently reproducible.
[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15148 **[Test build #66774 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66774/consoleFull)** for PR 15148 at commit [`19f6d89`](https://github.com/apache/spark/commit/19f6d8927f56f9e67a1d4f6d9a14722392469b5a).
[GitHub] spark issue #15439: [SPARK-17880][DOC] The url linking to `AccumulatorV2` in...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15439 **[Test build #66769 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66769/consoleFull)** for PR 15439 at commit [`6231206`](https://github.com/apache/spark/commit/623120685fcf3136007c450d5f282a5312bcce2f). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15439: [SPARK-17880][DOC] The url linking to `AccumulatorV2` in...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15439 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66769/ Test PASSed.
[GitHub] spark issue #15439: [SPARK-17880][DOC] The url linking to `AccumulatorV2` in...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15439 Merged build finished. Test PASSed.
[GitHub] spark pull request #15440: Fix hadoop.version in building-spark.md
GitHub user apivovarov opened a pull request: https://github.com/apache/spark/pull/15440 Fix hadoop.version in building-spark.md Couple of mvn build examples use `-Dhadoop.version=VERSION` instead of actual version number You can merge this pull request into a Git repository by running: $ git pull https://github.com/apivovarov/spark-1 patch-1 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/15440.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #15440 commit 0fa66f3410c3ae0c4f98d7f4ca4f2b0e53df0e44 Author: Alexander Pivovarov Date: 2016-10-11T23:48:18Z Fix hadoop.version in building-spark.md Couple mvn build examples use `-Dhadoop.version=VERSION` instead of actual version number
[GitHub] spark issue #15440: Fix hadoop.version in building-spark.md
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15440 Can one of the admins verify this patch?
[GitHub] spark issue #15375: [SPARK-17790][SPARKR] Support for parallelizing R data.f...
Github user falaki commented on the issue: https://github.com/apache/spark/pull/15375 Seems like a flaky test in `DirectKafkaStreamSuite`: ``` DirectKafkaStreamSuite: - pattern based subscription *** FAILED *** (1 minute, 41 seconds) ``` If jenkins listens to your commands, maybe we can have it retest this?
[GitHub] spark pull request #15441: [SPARK-4411] [Web UI] Add "kill" link for jobs in...
GitHub user ajbozarth opened a pull request: https://github.com/apache/spark/pull/15441 [SPARK-4411] [Web UI] Add "kill" link for jobs in the UI ## What changes were proposed in this pull request? Currently users can kill stages via the web ui but not jobs directly (jobs are killed if one of their stages is). I've added the ability to kill jobs via the web ui. This code change is based on #4823 by @lianhuiwang and updated to work with the latest code matching how stages are currently killed. In general I've copied the kill stage code warning and note comments and all. I also updated applicable tests and documentation. ## How was this patch tested? Manually tested and dev/run-tests ![screen shot 2016-10-11 at 4 49 43 pm](https://cloud.githubusercontent.com/assets/13952758/19292857/12f1b7c0-8fd4-11e6-8982-210249f7b697.png) You can merge this pull request into a Git repository by running: $ git pull https://github.com/ajbozarth/spark spark4411 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/15441.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #15441 commit af461ccce44e2792ea9356ccc2db6c84609511a0 Author: Lianhui Wang Date: 2015-02-28T03:24:46Z Add kill link for jobs in the UI commit 7f52874badfea314d019b0dc9097c54b8af2f654 Author: Lianhui Wang Date: 2015-02-28T05:23:22Z Update JobsTab.scala commit 7a6143a8d44620aec47a77ddc4f3242231924d3f Author: Lianhui Wang Date: 2015-03-01T06:26:14Z Merge branch 'master' of https://github.com/apache/spark into SPARK-4411 commit 584240affe2422e167b4d3ea87b5766623ed72f6 Author: Lianhui Wang Date: 2015-03-01T06:30:34Z address srowen’s comments commit 25fc0fd1fc574522ab08f23f6f61673960a1072a Author: Lianhui Wang Date: 2015-03-01T06:45:46Z address srowen’s comments commit ba168399f4ee4f59a2c0568b9e094b55747e97c0 Author: Lianhui Wang Date: 2015-03-24T07:26:43Z add : Unit return type 
commit a0eee0caa14824cefb99d178522f6ada2a305f4a Author: Lianhui Wang Date: 2015-03-25T01:43:40Z add an else case commit d0e208385482daac4a7bcaa4a90637cf88f66c77 Author: Alex Bozarth Date: 2016-10-11T20:39:32Z add kill jobs link. initial commit based on pr #4823 by @lianhuiwang commit f2519fc3903bb6b4c2e08a38d67a5b3df52dea49 Author: Alex Bozarth Date: 2016-10-11T21:41:55Z Fixed scalastyle commit 999f83a8b89e5fb89d5753b79346f8730656c0cd Author: Alex Bozarth Date: 2016-10-12T00:03:18Z Merge branch 'master' into spark4411
[GitHub] spark issue #15441: [SPARK-4411] [Web UI] Add "kill" link for jobs in the UI
Github user ajbozarth commented on the issue: https://github.com/apache/spark/pull/15441 @srowen @kayousterhout @tgravescs You have had input on the JIRA or previous PR, could you take a look?
[GitHub] spark issue #15408: [SPARK-17839][CORE] Use Nio's directbuffer instead of Bu...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15408 **[Test build #66765 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66765/consoleFull)** for PR 15408 at commit [`30173fa`](https://github.com/apache/spark/commit/30173facf79e03469291199807f84368a320e262). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15441: [SPARK-4411] [Web UI] Add "kill" link for jobs in the UI
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15441 **[Test build #66775 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66775/consoleFull)** for PR 15441 at commit [`999f83a`](https://github.com/apache/spark/commit/999f83a8b89e5fb89d5753b79346f8730656c0cd).
[GitHub] spark issue #15408: [SPARK-17839][CORE] Use Nio's directbuffer instead of Bu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15408 Merged build finished. Test FAILed.
[GitHub] spark issue #15408: [SPARK-17839][CORE] Use Nio's directbuffer instead of Bu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15408 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66765/ Test FAILed.
[GitHub] spark issue #15422: [SPARK-17850][Core]Add a flag to ignore corrupt files
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15422 **[Test build #66776 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66776/consoleFull)** for PR 15422 at commit [`ef88a64`](https://github.com/apache/spark/commit/ef88a64ac5e27e58f6f87bf0588ac1c3995be882).
[GitHub] spark issue #15295: [SPARK-17720][SQL] introduce static SQL conf
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15295 **[Test build #66777 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66777/consoleFull)** for PR 15295 at commit [`595b220`](https://github.com/apache/spark/commit/595b22097dba8716545cd405fa36448065ce779d).
[GitHub] spark issue #14690: [SPARK-16980][SQL] Load only catalog table partition met...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14690 **[Test build #66764 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66764/consoleFull)** for PR 14690 at commit [`175c268`](https://github.com/apache/spark/commit/175c2684eb515a1d0def8cf6a72011aa9a48625d). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14690: [SPARK-16980][SQL] Load only catalog table partition met...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14690 Merged build finished. Test PASSed.
[GitHub] spark issue #14690: [SPARK-16980][SQL] Load only catalog table partition met...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14690 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66764/ Test PASSed.
[GitHub] spark issue #15434: [SPARK-17873][SQL] ALTER TABLE RENAME TO should allow us...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15434 **[Test build #66778 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66778/consoleFull)** for PR 15434 at commit [`65c1885`](https://github.com/apache/spark/commit/65c1885818e4b712c2132e7e97e0b96ceb3f6dd7).
[GitHub] spark pull request #15436: [SPARK-17875] [BUILD] Remove unneeded direct depe...
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/15436#discussion_r82917622 --- Diff: dev/deps/spark-deps-hadoop-2.3 --- @@ -130,7 +130,6 @@ metrics-json-3.1.2.jar metrics-jvm-3.1.2.jar minlog-1.3.0.jar mx4j-3.0.2.jar -netty-3.8.0.Final.jar --- End diff -- I think netty 3 is used by hadoop-nfs: https://issues.apache.org/jira/browse/HADOOP-12415 However, I don't know why the patch for HADOOP-12415 also added netty 3 to `hadoop-hdfs`...
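If netty 3 had to stay off Spark's classpath even while a Hadoop module keeps pulling it in transitively, a Maven `<exclusion>` on that dependency would be the usual mechanism. A minimal sketch, not taken from Spark's actual pom; the `${hadoop.version}` property and the `io.netty:netty` coordinates for netty 3.x are assumptions here:

```xml
<!-- Sketch: exclude the transitive netty 3 artifact from hadoop-hdfs -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-hdfs</artifactId>
  <version>${hadoop.version}</version>
  <exclusions>
    <exclusion>
      <groupId>io.netty</groupId>
      <artifactId>netty</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```

Running `mvn dependency:tree` before and after would show which module actually brought the jar in, which is the question being debated above.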
[GitHub] spark issue #15436: [SPARK-17875] [BUILD] Remove unneeded direct dependence ...
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/15436 retest this please
[GitHub] spark issue #15436: [SPARK-17875] [BUILD] Remove unneeded direct dependence ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15436 **[Test build #66779 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66779/consoleFull)** for PR 15436 at commit [`a5c5c31`](https://github.com/apache/spark/commit/a5c5c3146e702a5c6ac8a86648f58f44d13a95f2).
[GitHub] spark issue #13440: [SPARK-15699] [ML] Implement a Chi-Squared test statisti...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13440 **[Test build #66766 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66766/consoleFull)** for PR 13440 at commit [`83f5e83`](https://github.com/apache/spark/commit/83f5e83fb87407bdd7dc8d740fba6fb30d1da3aa). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #13440: [SPARK-15699] [ML] Implement a Chi-Squared test statisti...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13440 Merged build finished. Test PASSed.
[GitHub] spark issue #13440: [SPARK-15699] [ML] Implement a Chi-Squared test statisti...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13440 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66766/ Test PASSed.
[GitHub] spark pull request #11336: [SPARK-9325][SPARK-R] collect() head() and show()...
Github user olarayej commented on a diff in the pull request: https://github.com/apache/spark/pull/11336#discussion_r82918200 --- Diff: R/pkg/R/column.R --- @@ -32,35 +34,57 @@ setOldClass("jobj") #' @export #' @note Column since 1.4.0 setClass("Column", - slots = list(jc = "jobj")) + slots = list(jc = "jobj", df = "SparkDataFrameOrNull")) #' A set of operations working with SparkDataFrame columns #' @rdname columnfunctions #' @name columnfunctions NULL - -setMethod("initialize", "Column", function(.Object, jc) { +setMethod("initialize", "Column", function(.Object, jc, df) { .Object@jc <- jc + + # Some Column objects don't have any referencing DataFrame. In such case, df will be NULL. + if (missing(df)) { +df <- NULL + } + .Object@df <- df .Object }) +setMethod("show", signature = "Column", definition = function(object) { --- End diff -- Sure
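The diff above gives each Column an optional back-reference to the DataFrame it came from, defaulting to NULL when no parent exists. A minimal sketch of that pattern in Python, purely illustrative (class and attribute names are assumptions, not SparkR's actual implementation):

```python
# Sketch of the "optional parent back-reference" pattern from the R diff:
# a Column remembers which DataFrame produced it, if any, so operations
# like head() can be dispatched without the caller re-supplying the frame.
class DataFrame:
    def __init__(self, rows):
        self.rows = rows  # list of dicts, standing in for a real table


class Column:
    def __init__(self, jc, df=None):
        # `jc` stands in for the JVM column handle; `df` may be absent
        # because some Columns (e.g. literals) have no referencing frame.
        self.jc = jc
        self.df = df

    def head(self, num=6):
        if self.df is None:
            raise ValueError("Column has no referencing DataFrame")
        return [row[self.jc] for row in self.df.rows[:num]]


df = DataFrame([{"x": i} for i in range(10)])
col = Column("x", df)
print(col.head())  # → [0, 1, 2, 3, 4, 5]
```

The design question debated in the PR is exactly this trade-off: carrying the parent pointer makes `head()`/`collect()` on a bare Column possible, at the cost of a larger Column object.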
[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15148 **[Test build #66774 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66774/consoleFull)** for PR 15148 at commit [`19f6d89`](https://github.com/apache/spark/commit/19f6d8927f56f9e67a1d4f6d9a14722392469b5a). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15148 Merged build finished. Test PASSed.
[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15148 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66774/ Test PASSed.
[GitHub] spark pull request #11336: [SPARK-9325][SPARK-R] collect() head() and show()...
Github user olarayej commented on a diff in the pull request: https://github.com/apache/spark/pull/11336#discussion_r82919122 --- Diff: R/pkg/R/DataFrame.R --- @@ -1168,12 +1179,14 @@ setMethod("take", #' Head #' -#' Return the first \code{num} rows of a SparkDataFrame as a R data.frame. If \code{num} is not -#' specified, then head() returns the first 6 rows as with R data.frame. +#' Return the first elements of a dataset. If \code{x} is a SparkDataFrame, its first +#' rows will be returned as a data.frame. If the dataset is a \code{Column}, its first +#' elements will be returned as a vector. The number of elements to be returned +#' is given by parameter \code{num}. Default value for \code{num} is 6. #' -#' @param x a SparkDataFrame. -#' @param num the number of rows to return. Default is 6. -#' @return A data.frame. +#' @param x A SparkDataFrame or Column --- End diff -- Not sure I follow here. Could you point to the specific example?
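The documentation change above makes `head()` polymorphic: the first rows of a DataFrame come back as a table, while the first elements of a Column come back as a flat vector, with `num` defaulting to 6 in both cases. A hedged sketch of that dispatch in Python (class names and return shapes are illustrative, not SparkR's):

```python
# Sketch of head(x, num = 6) dispatching on the type of x, as the
# revised R docs describe: table-shaped for a DataFrame, flat for a Column.
from functools import singledispatch


class SparkDataFrame:
    def __init__(self, rows):
        self.rows = rows  # list of row dicts


class Column:
    def __init__(self, values):
        self.values = values


@singledispatch
def head(x, num=6):
    raise TypeError(f"unsupported type: {type(x).__name__}")


@head.register
def _(x: SparkDataFrame, num=6):
    return x.rows[:num]      # analogous to an R data.frame of rows


@head.register
def _(x: Column, num=6):
    return x.values[:num]    # analogous to an R vector of elements
```

This mirrors how R's S4 generics resolve the same `head` call differently for the two classes.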
[GitHub] spark issue #14690: [SPARK-16980][SQL] Load only catalog table partition met...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14690 **[Test build #66772 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66772/consoleFull)** for PR 14690 at commit [`10e9e8a`](https://github.com/apache/spark/commit/10e9e8a08661aa53347bccfecbc88aad8e89adb8). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14690: [SPARK-16980][SQL] Load only catalog table partition met...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14690 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66772/ Test FAILed.
[GitHub] spark issue #14690: [SPARK-16980][SQL] Load only catalog table partition met...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14690 Merged build finished. Test FAILed.
[GitHub] spark issue #12933: [Spark-15155][Mesos] Optionally ignore default role reso...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/12933 **[Test build #66780 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66780/consoleFull)** for PR 12933 at commit [`838dc77`](https://github.com/apache/spark/commit/838dc77d5473e7a584efbd3ac223eba696a427f7).
[GitHub] spark issue #12933: [Spark-15155][Mesos] Optionally ignore default role reso...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/12933 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66780/ Test FAILed.
[GitHub] spark issue #12933: [Spark-15155][Mesos] Optionally ignore default role reso...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/12933 Merged build finished. Test FAILed.
[GitHub] spark issue #12933: [Spark-15155][Mesos] Optionally ignore default role reso...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/12933 **[Test build #66780 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66780/consoleFull)** for PR 12933 at commit [`838dc77`](https://github.com/apache/spark/commit/838dc77d5473e7a584efbd3ac223eba696a427f7). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #15442: [SPARK-17853][STREAMING][KAFKA][DOC] make it clea...
GitHub user koeninger opened a pull request: https://github.com/apache/spark/pull/15442 [SPARK-17853][STREAMING][KAFKA][DOC] make it clear that reusing group.id is bad ## What changes were proposed in this pull request? Documentation fix to make it clear that reusing group id for different streams is super duper bad, just like it is with the underlying Kafka consumer. ## How was this patch tested? I built jekyll doc and made sure it looked ok. You can merge this pull request into a Git repository by running: $ git pull https://github.com/koeninger/spark-1 SPARK-17853 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/15442.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #15442 commit c78c601b7c8af870085f31635ae8b374fb238332 Author: cody koeninger Date: 2016-10-12T01:18:35Z [SPARK-17853][DOC] make it clear that reusing group.id is bad
[GitHub] spark issue #15442: [SPARK-17853][STREAMING][KAFKA][DOC] make it clear that ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15442 **[Test build #66781 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66781/consoleFull)** for PR 15442 at commit [`c78c601`](https://github.com/apache/spark/commit/c78c601b7c8af870085f31635ae8b374fb238332).
[GitHub] spark pull request #15297: [WIP][SPARK-9862]Handling data skew
Github user witgo commented on a diff in the pull request: https://github.com/apache/spark/pull/15297#discussion_r82922339 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SkewShuffleRowRDD.scala --- @@ -0,0 +1,147 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.sql.execution + +import java.util.Arrays + +import scala.collection.mutable.ArrayBuffer + +import org.apache.spark._ +import org.apache.spark.rdd.RDD +import org.apache.spark.sql.catalyst.InternalRow + +class SkewCoalescedPartitioner( +val parent: Partitioner, +val partitionStartIndices: Array[(Int, Int)]) + extends Partitioner { + + @transient private lazy val parentPartitionMapping: Array[Int] = { +val n = parent.numPartitions +val result = new Array[Int](n) +for (i <- 0 until partitionStartIndices.length) { + val start = partitionStartIndices(i)._2 + val end = if (i < partitionStartIndices.length - 1) partitionStartIndices(i + 1)._2 else n + for (j <- start until end) { +result(j) = i + } +} +result + } + + override def numPartitions: Int = partitionStartIndices.length + + override def getPartition(key: Any): Int = { +parentPartitionMapping(parent.getPartition(key)) + } + + override def equals(other: Any): Boolean = other match { +case c: SkewCoalescedPartitioner => + c.parent == parent && +c.partitionStartIndices.zip(partitionStartIndices). + forall( r => r match { +case (x, y) => (x._1 == y._1 && x._2 == y._2) +}) +case _ => + false + } + + override def hashCode(): Int = 31 * parent.hashCode() + partitionStartIndices.hashCode() +} + + /** + * If mapIndex is -1, same as ShuffledRowRDDPartition. + * If mapIndex > -1, only read one mapper's block. + */ +private final class SkewShuffledRowRDDPartition( +val postShufflePartitionIndex: Int, +val mapIndex: Int, +val startPreShufflePartitionIndex: Int, +val endPreShufflePartitionIndex: Int) extends Partition { + override val index: Int = postShufflePartitionIndex + + override def hashCode(): Int = postShufflePartitionIndex + + override def equals(other: Any): Boolean = super.equals(other) +} + + /** + * Only used for skewed-data joins. In a join we need to fetch the same partition of + * the left output and the right output together, but when some partitions hold far more data than + * the others, the data is skewed and we need a specialized RDD to handle it. + * On the skewed side we don't produce a single post-shuffle partition, because one task + * processing that much data would be too slow; instead we produce one partition per map task, + * so each task reads only a single mapper's output. On the non-skewed side, in order to match + * the corresponding skewed partition, we produce the same partition once per map task + * (equivalent to broadcasting that partition). + * + * All other non-skewed partitions are handled as in ShuffledRowRDD. + */ +class SkewShuffleRowRDD( +var dependency1: ShuffleDependency[Int, InternalRow, InternalRow], +partitionStartIndices: Array[(Int, Int, Int)]) + extends ShuffledRowRDD ( dependency1, None) { + + private[this] val numPreShufflePartitions = dependency.partitioner.numPartitions + + override def getPartitions: Array[Partition] = { +val partitions = ArrayBuffer[Partition]() +var partitionIndex = -1 +for(i <- 0 until partitionStartIndices.length ) { --- End diff -- ` for(i <- 0 until partitionStartIndices.length )` -> ` for (i <- partitionStartIndices.indices) `
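The core of `SkewCoalescedPartitioner` in the diff above is `parentPartitionMapping`: each entry of `partitionStartIndices` marks where a coalesced post-shuffle partition begins in the parent's partition space, and every parent partition from that start up to the next entry's start maps to the same coalesced index. A small Python re-implementation sketch of just that loop (the `(mapIndex, start)` tuple shape follows the diff; everything else is illustrative):

```python
# Mirror of the Scala parentPartitionMapping logic: build an array where
# mapping[j] = index of the coalesced partition that parent partition j
# belongs to. partition_start_indices[i] = (map_index, start_parent_partition).
def build_parent_partition_mapping(num_parent_partitions, partition_start_indices):
    mapping = [0] * num_parent_partitions
    for i, (_map_index, start) in enumerate(partition_start_indices):
        # The range covered by coalesced partition i ends where the next
        # entry starts, or at the last parent partition for the final entry.
        end = (partition_start_indices[i + 1][1]
               if i < len(partition_start_indices) - 1
               else num_parent_partitions)
        for j in range(start, end):
            mapping[j] = i
    return mapping


# 6 parent partitions coalesced into 3: [0,1] -> 0, [2,3,4] -> 1, [5] -> 2
print(build_parent_partition_mapping(6, [(-1, 0), (-1, 2), (-1, 5)]))
# → [0, 0, 1, 1, 1, 2]
```

`getPartition(key)` then just looks up `mapping[parent.getPartition(key)]`, which is why the mapping can be built lazily once per partitioner.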
[GitHub] spark issue #15307: [SPARK-17731][SQL][STREAMING] Metrics for structured str...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15307 **[Test build #66768 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66768/consoleFull)** for PR 15307 at commit [`3d7c71a`](https://github.com/apache/spark/commit/3d7c71a24b3fbfe86fee074b9034db4b89eca2bb). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #15297: [WIP][SPARK-9862]Handling data skew
Github user witgo commented on a diff in the pull request: https://github.com/apache/spark/pull/15297#discussion_r82922520 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SkewShuffleRowRDD.scala --- @@ -0,0 +1,147 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution + +import java.util.Arrays + +import scala.collection.mutable.ArrayBuffer + +import org.apache.spark._ +import org.apache.spark.rdd.RDD +import org.apache.spark.sql.catalyst.InternalRow + +class SkewCoalescedPartitioner( +val parent: Partitioner, +val partitionStartIndices: Array[(Int, Int)]) + extends Partitioner { + + @transient private lazy val parentPartitionMapping: Array[Int] = { +val n = parent.numPartitions +val result = new Array[Int](n) +for (i <- 0 until partitionStartIndices.length) { --- End diff -- `for (i <- 0 until partitionStartIndices.length) ` ->`for (i <- partitionStartIndices.indices) ` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. 
[GitHub] spark issue #15307: [SPARK-17731][SQL][STREAMING] Metrics for structured str...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15307 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66768/ Test FAILed.
[GitHub] spark issue #15307: [SPARK-17731][SQL][STREAMING] Metrics for structured str...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15307 Merged build finished. Test FAILed.
[GitHub] spark pull request #15443: [SPARK-17881] [SQL] Aggregation function for gene...
GitHub user wzhfy opened a pull request: https://github.com/apache/spark/pull/15443 [SPARK-17881] [SQL] Aggregation function for generating string histograms

## What changes were proposed in this pull request? This agg function generates equi-width histograms in the form of Map(value: String, frequency: Long) for string type columns, with a maximum number of histogram bins. It returns an empty result if the NDV (number of distinct values) of the column exceeds the maximum number allowed.

## How was this patch tested? Added test cases.

You can merge this pull request into a Git repository by running: $ git pull https://github.com/wzhfy/spark stringHistogram Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/15443.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #15443 commit a843920983914de7efd21608b8f0e39c70b210d7 Author: wangzhenhua Date: 2016-10-12T01:02:37Z add agg function to generate string histogram
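The behavior described in the PR summary can be sketched as follows. This is a hypothetical, simplified illustration only (the object and method names are invented, and the real aggregate in the PR operates incrementally over Catalyst rows, not over an in-memory `Seq`): count frequencies per distinct string, and return an empty result once the number of distinct values exceeds the bin cap.

```scala
// Hypothetical sketch of an equi-width string histogram with an NDV cap,
// per the PR description: Map(value -> frequency), empty if NDV > maxBins.
object StringHistogramSketch {
  def histogram(values: Seq[String], maxBins: Int): Map[String, Long] = {
    // Count occurrences of each distinct string value.
    val counts: Map[String, Long] =
      values.groupBy(identity).map { case (v, vs) => v -> vs.size.toLong }
    // If the column has more distinct values than allowed bins,
    // give up and return an empty histogram, as the PR describes.
    if (counts.size > maxBins) Map.empty else counts
  }
}
```

For example, `histogram(Seq("a", "b", "a"), maxBins = 16)` yields `Map("a" -> 2, "b" -> 1)`, while the same data with `maxBins = 1` yields an empty map because the NDV (2) exceeds the cap.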
[GitHub] spark issue #15307: [SPARK-17731][SQL][STREAMING] Metrics for structured str...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15307 **[Test build #66771 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66771/consoleFull)** for PR 15307 at commit [`8b4bce8`](https://github.com/apache/spark/commit/8b4bce8ff338aeb982beb6f93e79f09b718c46b6). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15443: [SPARK-17881] [SQL] Aggregation function for generating ...
Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/15443 cc @cloud-fan @hvanhovell
[GitHub] spark issue #15307: [SPARK-17731][SQL][STREAMING] Metrics for structured str...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15307 Merged build finished. Test FAILed.
[GitHub] spark issue #15307: [SPARK-17731][SQL][STREAMING] Metrics for structured str...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15307 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66771/ Test FAILed.
[GitHub] spark issue #15443: [SPARK-17881] [SQL] Aggregation function for generating ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15443 **[Test build #66782 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66782/consoleFull)** for PR 15443 at commit [`a843920`](https://github.com/apache/spark/commit/a843920983914de7efd21608b8f0e39c70b210d7).
[GitHub] spark pull request #15441: [SPARK-4411] [Web UI] Add "kill" link for jobs in...
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/15441#discussion_r82923313

--- Diff: core/src/main/scala/org/apache/spark/ui/jobs/JobsTab.scala ---
@@ -35,4 +37,18 @@ private[ui] class JobsTab(parent: SparkUI) extends SparkUITab(parent, "jobs") {
   attachPage(new AllJobsPage(this))
   attachPage(new JobPage(this))
+
+  def handleKillRequest(request: HttpServletRequest): Unit = {
+    if (killEnabled && (parent.securityManager.checkModifyPermissions(request.getRemoteUser))) {
+      val killFlag = Option(request.getParameter("terminate")).getOrElse("false").toBoolean
+      val jobId = Option(request.getParameter("id")).getOrElse("-1").toInt
+      if (jobId >= 0 && killFlag && jobProgresslistener.activeJobs.contains(jobId)) {
+        sc.get.cancelJob(jobId)
+      }
--- End diff --

Creating an `Option` only to immediately `get` the value out of it is poor style, and unnecessary.

```scala
val jobId = Option(request.getParameter("id"))
jobId.foreach { id =>
  if (killFlag && jobProgresslistener.activeJobs.contains(id)) {
    sc.get.cancelJob(id)
  }
}
```
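The style point in the review above can be shown in a self-contained form. This is a hypothetical sketch (the stand-in values and `cancelJob` stub are invented, not the actual `JobsTab` code): contrast defaulting a missing parameter to a sentinel and range-checking it against handling the present case directly with `Option.foreach`.

```scala
// Illustration of the review comment: prefer acting inside Option.foreach
// over Option(...).getOrElse(sentinel) followed by a manual sentinel check.
object KillRequestStyle extends App {
  // Hypothetical stubs standing in for the servlet request and SparkContext.
  def cancelJob(id: Int): Unit = println(s"cancelling job $id")
  val rawId: String = "42"   // stands in for request.getParameter("id"), may be null
  val killFlag = true
  val activeJobs = Set(42)

  // Style the reviewer discourages: wrap in Option, immediately unwrap
  // with a sentinel default, then guard against the sentinel.
  val jobId = Option(rawId).getOrElse("-1").toInt
  if (jobId >= 0 && killFlag && activeJobs.contains(jobId)) cancelJob(jobId)

  // Suggested style: the body runs only when the parameter is present,
  // so no sentinel value or >= 0 check is needed.
  Option(rawId).map(_.toInt).foreach { id =>
    if (killFlag && activeJobs.contains(id)) cancelJob(id)
  }
}
```

Besides being shorter, the `foreach` form cannot accidentally treat the sentinel `-1` as a real job id, because the absent case is never converted into a value at all.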
[GitHub] spark issue #10307: [SPARK-12334][SQL][PYSPARK] Support read from multiple i...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/10307 **[Test build #66773 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66773/consoleFull)** for PR 10307 at commit [`b9e6481`](https://github.com/apache/spark/commit/b9e64815890db81d8168e4aa350b939b9b83c94e). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #10307: [SPARK-12334][SQL][PYSPARK] Support read from multiple i...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/10307 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66773/ Test PASSed.
[GitHub] spark issue #10307: [SPARK-12334][SQL][PYSPARK] Support read from multiple i...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/10307 Merged build finished. Test PASSed.