[GitHub] spark issue #15505: [SPARK-18890][CORE] Move task serialization from the Tas...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15505 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73520/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17080: [SPARK-19739][CORE] propagate S3 session token to cluster
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17080 Merged build finished. Test PASSed.
[GitHub] spark issue #17083: [SPARK-19750][UI][branch-2.1] Fix redirect issue from ht...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17083 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73498/ Test PASSed.
[GitHub] spark issue #17083: [SPARK-19750][UI][branch-2.1] Fix redirect issue from ht...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17083 Merged build finished. Test PASSed.
[GitHub] spark issue #15505: [SPARK-18890][CORE] Move task serialization from the Tas...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15505 **[Test build #73520 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73520/testReport)** for PR 15505 at commit [`1917b61`](https://github.com/apache/spark/commit/1917b616d8e33241ec763ac583b9e938873a1c7f). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17080: [SPARK-19739][CORE] propagate S3 session token to cluster
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17080 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73501/ Test PASSed.
[GitHub] spark issue #17083: [SPARK-19750][UI][branch-2.1] Fix redirect issue from ht...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17083 **[Test build #73498 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73498/testReport)** for PR 17083 at commit [`5408005`](https://github.com/apache/spark/commit/5408005912c1e369cbf3d77ea490b88f621ee047). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17080: [SPARK-19739][CORE] propagate S3 session token to cluster
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17080 **[Test build #73501 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73501/testReport)** for PR 17080 at commit [`0ae5aa7`](https://github.com/apache/spark/commit/0ae5aa73c70ae2f46a2d16087b5c55652d1e0282). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17031: [SPARK-19702][MESOS] Add suppress/revive support to the ...
Github user skonto commented on the issue: https://github.com/apache/spark/pull/17031 OK, like the Cassandra case, you mean?
[GitHub] spark issue #17086: [SPARK-18693][ML][MLLIB] ML Evaluators should use weight...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17086 Merged build finished. Test PASSed.
[GitHub] spark issue #16478: [SPARK-7768][SQL] Revise user defined types (UDT)
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16478 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73519/ Test PASSed.
[GitHub] spark issue #17086: [SPARK-18693][ML][MLLIB] ML Evaluators should use weight...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17086 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73529/ Test PASSed.
[GitHub] spark issue #16478: [SPARK-7768][SQL] Revise user defined types (UDT)
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16478 Merged build finished. Test PASSed.
[GitHub] spark issue #17086: [SPARK-18693][ML][MLLIB] ML Evaluators should use weight...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17086 **[Test build #73529 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73529/testReport)** for PR 17086 at commit [`cf6a5ab`](https://github.com/apache/spark/commit/cf6a5aba61716dcb11ef3ca7b1f3b803bf99ef33). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class MulticlassMetrics @Since("1.1.0") (predAndLabelsWithOptWeight: RDD[_])`
[GitHub] spark issue #16478: [SPARK-7768][SQL] Revise user defined types (UDT)
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16478 **[Test build #73519 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73519/testReport)** for PR 16478 at commit [`e6b01f0`](https://github.com/apache/spark/commit/e6b01f07947da06be2fc3114793d7793a0f7406a). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #17031: [SPARK-19702][MESOS] Add suppress/revive support ...
Github user skonto commented on a diff in the pull request: https://github.com/apache/spark/pull/17031#discussion_r103287098

--- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosFineGrainedSchedulerBackend.scala ---

```diff
@@ -24,6 +24,7 @@
 import scala.collection.JavaConverters._
 import scala.collection.mutable.{HashMap, HashSet}
```

--- End diff --

ok cool!
[GitHub] spark issue #16990: [SPARK-19660][CORE][SQL] Replace the configuration prope...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16990 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73511/ Test PASSed.
[GitHub] spark issue #16990: [SPARK-19660][CORE][SQL] Replace the configuration prope...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16990 Merged build finished. Test PASSed.
[GitHub] spark issue #16990: [SPARK-19660][CORE][SQL] Replace the configuration prope...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16990 **[Test build #73511 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73511/testReport)** for PR 16990 at commit [`21956db`](https://github.com/apache/spark/commit/21956db9ac1908807bbe7761c815980309c35ac8). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17031: [SPARK-19702][MESOS] Add suppress/revive support to the ...
Github user mgummelt commented on the issue: https://github.com/apache/spark/pull/17031 Given the concerns about the dispatcher being stuck in a suppressed state, I'm going to solve this a different way. I'm going to increase the default offer decline timeout to 120s and make it configurable, just like it is in the driver. This will make it so that the offer will be offered to other frameworks for 120s before circling back to the dispatcher, rather than the default 5s. I'll also keep the explicit revive calls when a new driver is submitted or an existing one fails, which immediately causes offers to be re-offered to the dispatcher. This removes the risk that the driver gets stuck in a suppressed state, because the dispatcher never suppresses itself.
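A minimal sketch of the mechanism described above, using the public Mesos Java scheduler API (not the actual PR code; the class name and default value here are illustrative assumptions):

```scala
import org.apache.mesos.Protos.{Filters, OfferID}
import org.apache.mesos.SchedulerDriver

// Sketch: decline unused offers with a configurable refuse timeout, and
// revive explicitly when new work arrives, so the framework never has to
// suppress itself.
class OfferDeclinePolicy(driver: SchedulerDriver, refuseSeconds: Double = 120.0) {

  // Decline an offer we cannot use. Mesos will not re-offer these resources
  // to this framework for `refuseSeconds`, so in the meantime they circulate
  // to other frameworks instead of bouncing straight back.
  def declineUnused(offerId: OfferID): Unit = {
    val filters = Filters.newBuilder().setRefuseSeconds(refuseSeconds).build()
    driver.declineOffer(offerId, filters)
  }

  // Called when a new driver is submitted or a supervised driver must retry:
  // clears any outstanding decline filters so offers come back immediately
  // rather than waiting out the timeout.
  def onNewWork(): Unit = driver.reviveOffers()
}
```

The design trade-off is that a longer `refuse_seconds` reduces offer churn without the stuck-framework risk that `suppressOffers` carries, since a declined framework still receives offers once the filter expires.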
[GitHub] spark issue #14299: Ensure broadcasted variables are destroyed even in case ...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/14299 You should also file a bug and reference it from the PR title.
[GitHub] spark issue #17084: [SPARK-18693][ML][MLLIB] ML Evaluators should use weight...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17084 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73526/ Test PASSed.
[GitHub] spark issue #17084: [SPARK-18693][ML][MLLIB] ML Evaluators should use weight...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17084 **[Test build #73526 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73526/testReport)** for PR 17084 at commit [`98652cf`](https://github.com/apache/spark/commit/98652cfb6c92eed90deff61bc83ef66b9096df20). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class BinaryClassificationMetrics @Since("2.2.0") (`
[GitHub] spark issue #17084: [SPARK-18693][ML][MLLIB] ML Evaluators should use weight...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17084 Merged build finished. Test PASSed.
[GitHub] spark issue #17039: [SPARK-19710][SQL][TESTS] Fix ordering of rows in query ...
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/17039 How about a more pragmatic approach? I think relational algebra only guarantees ordering when an ORDER BY is the top-level operation. Why not just check for that and, if we find one, add all output columns to the ORDER BY?
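The suggestion above can be sketched with a toy plan ADT (illustrative only; the real implementation would use Catalyst's `Sort` and `LogicalPlan` classes, and the names here are assumptions):

```scala
// Sketch: if (and only if) the top-level operation of a query plan is a Sort,
// append the remaining output columns to its sort keys so that row order in
// the test result becomes fully deterministic.
sealed trait Plan { def output: Seq[String] }
case class Relation(output: Seq[String]) extends Plan
case class Sort(keys: Seq[String], child: Plan) extends Plan {
  def output: Seq[String] = child.output
}

def makeOrderDeterministic(plan: Plan): Plan = plan match {
  case Sort(keys, child) =>
    // Keep the user's keys first; add the rest of the output as tie-breakers.
    Sort(keys ++ child.output.filterNot(keys.contains), child)
  case other =>
    other // no top-level ORDER BY: the result order was never guaranteed
}
```

For example, `makeOrderDeterministic(Sort(Seq("c1"), Relation(Seq("c1", "c2", "c3"))))` would yield a sort on `c1, c2, c3`, leaving the user-specified prefix intact.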
[GitHub] spark issue #17001: [SPARK-19667][SQL]create table with hiveenabled in defau...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17001 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73510/ Test PASSed.
[GitHub] spark issue #17001: [SPARK-19667][SQL]create table with hiveenabled in defau...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17001 Merged build finished. Test PASSed.
[GitHub] spark issue #17001: [SPARK-19667][SQL]create table with hiveenabled in defau...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17001 **[Test build #73510 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73510/testReport)** for PR 17001 at commit [`13245e4`](https://github.com/apache/spark/commit/13245e4474115b41880224d43cd7b4b8613bd6ac). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17071: [SPARK-15615][SQL][BUILD][FOLLOW-UP] Replace deprecated ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17071 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73506/ Test PASSed.
[GitHub] spark issue #17071: [SPARK-15615][SQL][BUILD][FOLLOW-UP] Replace deprecated ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17071 Merged build finished. Test PASSed.
[GitHub] spark pull request #17031: [SPARK-19702][MESOS] Add suppress/revive support ...
Github user skonto commented on a diff in the pull request: https://github.com/apache/spark/pull/17031#discussion_r103281854

--- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala ---

```diff
@@ -582,141 +688,33 @@ private[spark] class MesosClusterScheduler(
     }
   }

-  override def resourceOffers(driver: SchedulerDriver, offers: JList[Offer]): Unit = {
-    logTrace(s"Received offers from Mesos: \n${offers.asScala.mkString("\n")}")
-    val tasks = new mutable.HashMap[OfferID, ArrayBuffer[TaskInfo]]()
-    val currentTime = new Date()
-
-    val currentOffers = offers.asScala.map {
-      o => new ResourceOffer(o.getId, o.getSlaveId, o.getResourcesList)
-    }.toList
-
-    stateLock.synchronized {
-      // We first schedule all the supervised drivers that are ready to retry.
-      // This list will be empty if none of the drivers are marked as supervise.
-      val driversToRetry = pendingRetryDrivers.filter { d =>
-        d.retryState.get.nextRetry.before(currentTime)
-      }
-
-      scheduleTasks(
-        copyBuffer(driversToRetry),
-        removeFromPendingRetryDrivers,
-        currentOffers,
-        tasks)
-
-      // Then we walk through the queued drivers and try to schedule them.
-      scheduleTasks(
-        copyBuffer(queuedDrivers),
-        removeFromQueuedDrivers,
-        currentOffers,
-        tasks)
-    }
-    tasks.foreach { case (offerId, taskInfos) =>
-      driver.launchTasks(Collections.singleton(offerId), taskInfos.asJava)
-    }
-
-    for (o <- currentOffers if !tasks.contains(o.offerId)) {
-      driver.declineOffer(o.offerId)
-    }
-  }
-
-  private def copyBuffer(
-      buffer: ArrayBuffer[MesosDriverDescription]): ArrayBuffer[MesosDriverDescription] = {
-    val newBuffer = new ArrayBuffer[MesosDriverDescription](buffer.size)
-    buffer.copyToBuffer(newBuffer)
-    newBuffer
-  }
-
-  def getSchedulerState(): MesosClusterSchedulerState = {
-    stateLock.synchronized {
-      new MesosClusterSchedulerState(
-        frameworkId,
-        masterInfo.map(m => s"http://${m.getIp}:${m.getPort}"),
-        copyBuffer(queuedDrivers),
-        launchedDrivers.values.map(_.copy()).toList,
-        finishedDrivers.map(_.copy()).toList,
-        copyBuffer(pendingRetryDrivers))
-    }
-  }
-
-  override def offerRescinded(driver: SchedulerDriver, offerId: OfferID): Unit = {}
-  override def disconnected(driver: SchedulerDriver): Unit = {}
-  override def reregistered(driver: SchedulerDriver, masterInfo: MasterInfo): Unit = {
-    logInfo(s"Framework re-registered with master ${masterInfo.getId}")
-  }
-  override def slaveLost(driver: SchedulerDriver, slaveId: SlaveID): Unit = {}
-  override def error(driver: SchedulerDriver, error: String): Unit = {
-    logError("Error received: " + error)
-    markErr()
-  }
+  private def createTaskInfo(desc: MesosDriverDescription, offer: ResourceOffer): TaskInfo = {
+    val taskId = TaskID.newBuilder().setValue(desc.submissionId).build()

-  /**
-   * Check if the task state is a recoverable state that we can relaunch the task.
-   * Task state like TASK_ERROR are not relaunchable state since it wasn't able
-   * to be validated by Mesos.
-   */
-  private def shouldRelaunch(state: MesosTaskState): Boolean = {
-    state == MesosTaskState.TASK_FAILED ||
-      state == MesosTaskState.TASK_LOST
-  }
+    val (remainingResources, cpuResourcesToUse) =
+      partitionResources(offer.resources, "cpus", desc.cores)
+    val (finalResources, memResourcesToUse) =
+      partitionResources(remainingResources.asJava, "mem", desc.mem)
+    offer.resources = finalResources.asJava

-  override def statusUpdate(driver: SchedulerDriver, status: TaskStatus): Unit = {
-    val taskId = status.getTaskId.getValue
-    stateLock.synchronized {
-      if (launchedDrivers.contains(taskId)) {
-        if (status.getReason == Reason.REASON_RECONCILIATION &&
-          !pendingRecover.contains(taskId)) {
-          // Task has already received update and no longer requires reconciliation.
-          return
-        }
-        val state = launchedDrivers(taskId)
-        // Check if the driver is supervise enabled and can be relaunched.
-        if (state.driverDescription.supervise && shouldRelaunch(status.getState)) {
-          removeFromLaunchedDrivers(taskId)
-          state.finishDate = Some(new Date())
-          val retryState: Option[MesosClusterRetryState] = state.driverDescription.retryState
-          val (retries, waitTimeSec) = retryState
```
[GitHub] spark issue #17071: [SPARK-15615][SQL][BUILD][FOLLOW-UP] Replace deprecated ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17071 **[Test build #73506 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73506/testReport)** for PR 17071 at commit [`6f35ee3`](https://github.com/apache/spark/commit/6f35ee3d07892743b318ea8dd23276e881873d2b). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16809: [SPARK-19463][SQL]refresh cache after the InsertIntoHado...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16809 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73516/ Test PASSed.
[GitHub] spark issue #16809: [SPARK-19463][SQL]refresh cache after the InsertIntoHado...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16809 Merged build finished. Test PASSed.
[GitHub] spark issue #16809: [SPARK-19463][SQL]refresh cache after the InsertIntoHado...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16809 **[Test build #73516 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73516/testReport)** for PR 16809 at commit [`f8ccc2f`](https://github.com/apache/spark/commit/f8ccc2fe54c29f69adc730a7078590540b1b4b5e). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17039: [SPARK-19710][SQL][TESTS] Fix ordering of rows in query ...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/17039 The major issue is that we do not know the original intent of the user's query. The query might purposely check whether the result set is sorted. Thus, the existing test suite design is conservative: it avoids adding any sort as long as the user specifies an ORDER BY clause. For example, ```SQL SELECT c1, c2, sum(c1) FROM tab1 GROUP BY c1, c2 ORDER BY c1, c2 ``` In the above example, although the ORDER BY clause does not contain all the output columns, the result set is always sorted. Thus, our test suite should not sort it.
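The conservative policy described above can be sketched as a small helper (hypothetical name and shape; the actual logic lives in Spark's SQL query test suite): only canonicalize the row order when the query itself does not specify one.

```python
def normalize_result(rows, query):
    # Hypothetical sketch of the test-suite policy described above:
    # only sort a query's result rows when the query has no ORDER BY
    # clause, since an ORDER BY may be exactly what the test intends
    # to check.
    if "order by" in query.lower():
        return rows        # preserve the query's own ordering
    return sorted(rows)    # ordering is unspecified, so canonicalize
```

A query whose ORDER BY covers all the grouping columns, as in the example, already yields a deterministic order and passes through untouched.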
[GitHub] spark pull request #17031: [SPARK-19702][MESOS] Add suppress/revive support ...
Github user skonto commented on a diff in the pull request: https://github.com/apache/spark/pull/17031#discussion_r103281366 --- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala --- @@ -737,13 +735,75 @@ private[spark] class MesosClusterScheduler( if (index != -1) { pendingRetryDrivers.remove(index) pendingRetryDriversState.expunge(id) + suppressOrRevive() true } else { false } } - def getQueuedDriversSize: Int = queuedDrivers.size - def getLaunchedDriversSize: Int = launchedDrivers.size - def getPendingRetryDriversSize: Int = pendingRetryDrivers.size + private def copyBuffer(buffer: ArrayBuffer[MesosDriverDescription]): + ArrayBuffer[MesosDriverDescription] = { +val newBuffer = new ArrayBuffer[MesosDriverDescription](buffer.size) +buffer.copyToBuffer(newBuffer) +newBuffer + } + + /** + * Check if the task state is a recoverable state that we can relaunch the task. + * Task state like TASK_ERROR are not relaunchable state since it wasn't able + * to be validated by Mesos. + */ + private def isFailure(state: MesosTaskState): Boolean = { +state == MesosTaskState.TASK_FAILED || + state == MesosTaskState.TASK_LOST + } + + private def shouldSuppress: Boolean = { +return queuedDrivers.isEmpty && pendingRetryDrivers.isEmpty --- End diff -- return is redundant.
[GitHub] spark issue #13326: [SPARK-15560] [Mesos] Queued/Supervise drivers waiting f...
Github user devaraj-kavali commented on the issue: https://github.com/apache/spark/pull/13326 @mgummelt /@tnachen, can you have a look into this?
[GitHub] spark issue #13143: [SPARK-15359] [Mesos] Mesos dispatcher should handle DRI...
Github user devaraj-kavali commented on the issue: https://github.com/apache/spark/pull/13143 @mgummelt /@tnachen, can you have a look into this?
[GitHub] spark pull request #17031: [SPARK-19702][MESOS] Add suppress/revive support ...
Github user skonto commented on a diff in the pull request: https://github.com/apache/spark/pull/17031#discussion_r103280797 --- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala --- @@ -737,13 +735,75 @@ private[spark] class MesosClusterScheduler( if (index != -1) { pendingRetryDrivers.remove(index) pendingRetryDriversState.expunge(id) + suppressOrRevive() true } else { false } } - def getQueuedDriversSize: Int = queuedDrivers.size - def getLaunchedDriversSize: Int = launchedDrivers.size - def getPendingRetryDriversSize: Int = pendingRetryDrivers.size + private def copyBuffer(buffer: ArrayBuffer[MesosDriverDescription]): + ArrayBuffer[MesosDriverDescription] = { +val newBuffer = new ArrayBuffer[MesosDriverDescription](buffer.size) +buffer.copyToBuffer(newBuffer) +newBuffer + } + + /** + * Check if the task state is a recoverable state that we can relaunch the task. + * Task state like TASK_ERROR are not relaunchable state since it wasn't able + * to be validated by Mesos. + */ + private def isFailure(state: MesosTaskState): Boolean = { +state == MesosTaskState.TASK_FAILED || + state == MesosTaskState.TASK_LOST + } + + private def shouldSuppress: Boolean = { +return queuedDrivers.isEmpty && pendingRetryDrivers.isEmpty + } + + private def suppressOrRevive(): Unit = { +if (shouldSuppress && !isSuppressed) { + logInfo("Suppressing Offers.") + driver.suppressOffers() + isSuppressed = true +} else if (!shouldSuppress && isSuppressed) { + logInfo("Reviving Offers.") + driver.reviveOffers() + isSuppressed = false +} + } + + /** + * Escape args for Unix-like shells, unless already quoted by the user. 
+ * Based on: http://www.gnu.org/software/bash/manual/html_node/Double-Quotes.html + * and http://www.grymoire.com/Unix/Quote.html + * + * @param value argument + * @return escaped argument + */ + private[scheduler] def shellEscape(value: String): String = { +val WrappedInQuotes = """^(".+"|'.+')$""".r +val ShellSpecialChars = (""".*([ '<>&|\?\*;!#\\(\)"$`]).*""").r --- End diff -- Parentheses are redundant.
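For readers unfamiliar with the escaping logic under review, here is a rough Python analogue (a sketch only; the real implementation is the Scala `shellEscape` in the diff above): leave arguments the user already quoted untouched, and double-quote anything containing shell-special characters, escaping the few characters that stay special inside double quotes.

```python
import re

# Rough Python analogue of the shellEscape logic in the diff above.
# Note: no capturing parentheses around the character class, per the
# review comment that they are redundant.
WRAPPED_IN_QUOTES = re.compile(r"""^(".+"|'.+')$""")
SHELL_SPECIAL_CHARS = re.compile(r""".*[ '<>&|?*;!#\\()"$`].*""")

def shell_escape(value: str) -> str:
    if WRAPPED_IN_QUOTES.match(value):
        return value  # user already quoted it; leave as-is
    if SHELL_SPECIAL_CHARS.match(value):
        # escape the characters that remain special inside double quotes
        escaped = (value.replace("\\", "\\\\")
                        .replace('"', '\\"')
                        .replace("$", "\\$")
                        .replace("`", "\\`"))
        return '"%s"' % escaped
    return value
```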
[GitHub] spark pull request #17031: [SPARK-19702][MESOS] Add suppress/revive support ...
Github user skonto commented on a diff in the pull request: https://github.com/apache/spark/pull/17031#discussion_r103280133 --- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/deploy/mesos/ui/MesosClusterPage.scala --- @@ -32,7 +32,7 @@ private[mesos] class MesosClusterPage(parent: MesosClusterUI) extends WebUIPage( private val historyServerURL = parent.conf.get(HISTORY_SERVER_URL) def render(request: HttpServletRequest): Seq[Node] = { -val state = parent.scheduler.getSchedulerState() +val state = parent.scheduler.getSchedulerState val driverHeader = Seq("Driver ID") val historyHeader = historyServerURL.map(url => Seq("History")).getOrElse(Nil) --- End diff -- Since you are refactoring the code s/url/_.
[GitHub] spark issue #16782: [SPARK-19348][PYTHON][WIP] PySpark keyword_only decorato...
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/16782 Also, using the `inspect` module it would be possible to check whether the wrapped function is a method, so we wouldn't need to just make that assumption.
[GitHub] spark issue #17039: [SPARK-19710][SQL][TESTS] Fix ordering of rows in query ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17039 Merged build finished. Test PASSed.
[GitHub] spark issue #17039: [SPARK-19710][SQL][TESTS] Fix ordering of rows in query ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17039 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73508/ Test PASSed.
[GitHub] spark issue #17079: [SPARK-19748][SQL]refresh function has a wrong order to ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17079 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73502/ Test PASSed.
[GitHub] spark issue #17079: [SPARK-19748][SQL]refresh function has a wrong order to ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17079 Merged build finished. Test PASSed.
[GitHub] spark issue #17039: [SPARK-19710][SQL][TESTS] Fix ordering of rows in query ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17039 **[Test build #73508 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73508/testReport)** for PR 17039 at commit [`4a4d7ad`](https://github.com/apache/spark/commit/4a4d7ad4b349e49dc4cb81235f796e360dd183f8). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16782: [SPARK-19348][PYTHON][WIP] PySpark keyword_only decorato...
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/16782 Thanks @jkbradley and @davies for reviewing. This fix still seems a little hacky to me and you could still possibly run into trouble if you call a nested wrapped function and don't consume the `_input_kwargs` right away. But it is the best solution I could think of without being overly complicated and it is a little better than it was before. If you guys give the go ahead, I can update the other uses in pyspark.ml and try to add a test also.
[GitHub] spark issue #17079: [SPARK-19748][SQL]refresh function has a wrong order to ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17079 **[Test build #73502 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73502/testReport)** for PR 17079 at commit [`fd3bb21`](https://github.com/apache/spark/commit/fd3bb21597809409e7f33796589c9178744063c5). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16476: [SPARK-19084][SQL] Implement expression field
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/16476 @gczsjdy : I had one comment in the past about `genIfElseStructure`, but after giving it more thought I was not able to think of a better way to do it. I am fine with the current version of the code you have.
[GitHub] spark pull request #17083: [SPARK-19750][UI][branch-2.1] Fix redirect issue ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/17083#discussion_r103278713 --- Diff: core/src/main/scala/org/apache/spark/ui/JettyUtils.scala --- @@ -378,7 +378,8 @@ private[spark] object JettyUtils extends Logging { server.getHandler().asInstanceOf[ContextHandlerCollection]) } - private def createRedirectHttpsHandler(securePort: Int, scheme: String): ContextHandler = { + private def createRedirectHttpsHandler( + httpsConnector: ServerConnector, scheme: String): ContextHandler = { --- End diff -- nit: one argument per line when using multiple lines. But instead of changing this, why not pass the correct port from the caller in the first place?
[GitHub] spark issue #17082: [SPARK-19749][SS] Name socket source with a meaningful n...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17082 Merged build finished. Test PASSed.
[GitHub] spark pull request #17083: [SPARK-19750][UI][branch-2.1] Fix redirect issue ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/17083#discussion_r103279006 --- Diff: core/src/test/scala/org/apache/spark/ui/UISuite.scala --- @@ -267,8 +267,11 @@ class UISuite extends SparkFunSuite { s"$scheme://localhost:$port/test1/root", s"$scheme://localhost:$port/test2/root") urls.foreach { url => - val rc = TestUtils.httpResponseCode(new URL(url)) - assert(rc === expected, s"Unexpected status $rc for $url") + val rc = TestUtils.httpResponseCodeAndURL(new URL(url)) --- End diff -- `val (rc, redirectUrl) = ...`
[GitHub] spark issue #17082: [SPARK-19749][SS] Name socket source with a meaningful n...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17082 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73499/ Test PASSed.
[GitHub] spark pull request #16784: [SPARK-19382][ML]:Test sparse vectors in LinearSV...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16784#discussion_r103278706 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/LinearSVCSuite.scala --- @@ -220,12 +246,13 @@ object LinearSVCSuite { "aggregationDepth" -> 3 ) -// Generate noisy input of the form Y = signum(x.dot(weights) + intercept + noise) + // Generate noisy input of the form Y = signum(x.dot(weights) + intercept + noise) def generateSVMInput( --- End diff -- This API is strange, where the caller expects numFeatures = weights.size, but really numFeatures = 10 * weights.size if isDense=false. Please update it to construct a random dense or sparse vector first (both of length weights.size) and then compute y to make the API more consistent.
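The restructuring suggested in the review above can be sketched as follows (in Python with hypothetical names; the real code is the Scala `generateSVMInput` in LinearSVCSuite): build the feature vector first, always of length `len(weights)`, dense or sparse, and only then compute the label from that same vector.

```python
import random

def generate_svm_point(weights, intercept, is_dense, rnd):
    # Build the feature vector first, always of length len(weights),
    # so numFeatures == weights.size regardless of density.
    x = [rnd.uniform(-1.0, 1.0) for _ in weights]
    if not is_dense:
        # zero out most entries to mimic a sparse vector of the same length
        x = [xi if rnd.random() < 0.1 else 0.0 for xi in x]
    # Label: y = signum(x . weights + intercept + noise), as in the suite
    margin = (sum(w * xi for w, xi in zip(weights, x))
              + intercept + 0.01 * rnd.gauss(0.0, 1.0))
    return (1.0 if margin > 0 else 0.0), x
```

This keeps the caller-visible contract (`numFeatures == weights.size`) consistent across the dense and sparse paths.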
[GitHub] spark pull request #16784: [SPARK-19382][ML]:Test sparse vectors in LinearSV...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16784#discussion_r103278715 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/LinearSVCSuite.scala --- @@ -234,7 +261,12 @@ object LinearSVCSuite { val yD = new BDV(xi).dot(weightsMat) + intercept + 0.01 * rnd.nextGaussian() if (yD > 0) 1.0 else 0.0 } -y.zip(x).map(p => LabeledPoint(p._1, Vectors.dense(p._2))) +val index = (0 to weights.length - 1).toArray --- End diff -- Move inside if-then to branch where it is used.
[GitHub] spark issue #17082: [SPARK-19749][SS] Name socket source with a meaningful n...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17082 **[Test build #73499 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73499/testReport)** for PR 17082 at commit [`68349fa`](https://github.com/apache/spark/commit/68349facee3b33fd5975e90c74c882f3d922). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #16784: [SPARK-19382][ML]:Test sparse vectors in LinearSV...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16784#discussion_r103278729 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/LinearSVCSuite.scala --- @@ -203,6 +227,8 @@ class LinearSVCSuite extends SparkFunSuite with MLlibTestSparkContext with Defau val svm = new LinearSVC() testEstimatorAndModelReadWrite(svm, smallBinaryDataset, LinearSVCSuite.allParamSettings, checkModelData) +testEstimatorAndModelReadWrite(svm, smallSparseBinaryDataset, LinearSVCSuite.allParamSettings, --- End diff -- No need for this. Once the model has been fit, its training data is irrelevant.
[GitHub] spark pull request #16782: [SPARK-19348][PYTHON][WIP] PySpark keyword_only d...
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/16782#discussion_r103277884 --- Diff: python/pyspark/__init__.py --- @@ -96,9 +96,11 @@ def keyword_only(func): """ @wraps(func) def wrapper(*args, **kwargs): +# NOTE - this assumes we are wrapping a method and args[0] will be 'self' if len(args) > 1: raise TypeError("Method %s forces keyword arguments." % func.__name__) wrapper._input_kwargs = kwargs --- End diff -- Yeah, that is what I was suggesting, only that removing it would require changes everywhere it is used in ml. So I just wanted to check with you guys first.
[GitHub] spark pull request #16782: [SPARK-19348][PYTHON][WIP] PySpark keyword_only d...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/16782#discussion_r103276349 --- Diff: python/pyspark/__init__.py --- @@ -96,9 +96,11 @@ def keyword_only(func): """ @wraps(func) def wrapper(*args, **kwargs): +# NOTE - this assumes we are wrapping a method and args[0] will be 'self' if len(args) > 1: raise TypeError("Method %s forces keyword arguments." % func.__name__) wrapper._input_kwargs = kwargs --- End diff -- If the assumption is correct, should we always use 'self' to hold the kwargs? (remove this line and update all the functions that use `keyword_only`)?
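A minimal sketch of the pattern under discussion (hypothetical class name; not the actual pyspark.ml code), with the kwargs stashed on the instance as suggested above rather than on the shared `wrapper` function:

```python
import functools

def keyword_only(func):
    """Reject positional arguments (beyond self) and record the kwargs."""
    @functools.wraps(func)
    def wrapper(self, *args, **kwargs):
        if args:
            raise TypeError("Method %s forces keyword arguments." % func.__name__)
        # Store on the instance, not on wrapper._input_kwargs, so calls on
        # different objects cannot clobber each other's kwargs.
        self._input_kwargs = kwargs
        return func(self, **kwargs)
    return wrapper

class Estimator:  # hypothetical stand-in for a pyspark.ml class
    @keyword_only
    def setParams(self, maxIter=10, tol=1e-6):
        return self._input_kwargs
```

Because `wrapper` is shared by every instance of the class, the original `wrapper._input_kwargs` attribute is effectively global state; keeping it on `self` scopes it to one object.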
[GitHub] spark pull request #17064: [SPARK-19736][SQL] refreshByPath should clear all...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17064#discussion_r103277557 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/CacheManager.scala --- @@ -168,15 +168,16 @@ class CacheManager extends Logging { (fs, path.makeQualified(fs.getUri, fs.getWorkingDirectory)) } -cachedData.foreach { - case data if data.plan.find(lookupAndRefresh(_, fs, qualifiedPath)).isDefined => -val dataIndex = cachedData.indexWhere(cd => data.plan.sameResult(cd.plan)) -if (dataIndex >= 0) { - data.cachedRepresentation.cachedColumnBuffers.unpersist(blocking = true) - cachedData.remove(dataIndex) -} - sparkSession.sharedState.cacheManager.cacheQuery(Dataset.ofRows(sparkSession, data.plan)) - case _ => // Do Nothing +cachedData.filter { --- End diff -- Why doesn't the previous one work?
[GitHub] spark issue #17077: [SPARK-16931][PYTHON][SQL] Add Python wrapper for bucket...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17077 Merged build finished. Test FAILed.
[GitHub] spark issue #17077: [SPARK-16931][PYTHON][SQL] Add Python wrapper for bucket...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17077 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73527/ Test FAILed.
[GitHub] spark issue #17077: [SPARK-16931][PYTHON][SQL] Add Python wrapper for bucket...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17077 **[Test build #73527 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73527/testReport)** for PR 17077 at commit [`18c709c`](https://github.com/apache/spark/commit/18c709c4bf77fc6db5530e00a9e5bba0e1ab0250). * This patch **fails MiMa tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17086: [SPARK-18693][ML][MLLIB] ML Evaluators should use weight...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17086 **[Test build #73529 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73529/testReport)** for PR 17086 at commit [`cf6a5ab`](https://github.com/apache/spark/commit/cf6a5aba61716dcb11ef3ca7b1f3b803bf99ef33).
[GitHub] spark issue #16557: [SPARK-18693][ML][MLLIB] ML Evaluators should use weight...
Github user imatiach-msft commented on the issue: https://github.com/apache/spark/pull/16557 I've created 3 PRs, located here: https://github.com/apache/spark/pull/17084 https://github.com/apache/spark/pull/17085 https://github.com/apache/spark/pull/17086
[GitHub] spark pull request #17086: [SPARK-18693][ML][MLLIB] ML Evaluators should use...
GitHub user imatiach-msft opened a pull request: https://github.com/apache/spark/pull/17086 [SPARK-18693][ML][MLLIB] ML Evaluators should use weight column - added weight column for multiclass classification evaluator ## What changes were proposed in this pull request? The evaluators BinaryClassificationEvaluator, RegressionEvaluator, and MulticlassClassificationEvaluator and the corresponding metrics classes BinaryClassificationMetrics, RegressionMetrics and MulticlassMetrics should use sample weight data. I've closed the PR: https://github.com/apache/spark/pull/16557 as recommended, in favor of creating three pull requests, one for each of the evaluators (binary/regression/multiclass), to make them easier to review/update. ## How was this patch tested? I added tests to the metrics class. You can merge this pull request into a Git repository by running: $ git pull https://github.com/imatiach-msft/spark ilmat/multiclass-evaluate Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17086.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #17086 commit cf6a5aba61716dcb11ef3ca7b1f3b803bf99ef33 Author: Ilya Matiach Date: 2017-02-27T18:28:08Z Added weight column for multiclass classification evaluator
[GitHub] spark pull request #17085: [SPARK-18693][ML][MLLIB] ML Evaluators should use...
GitHub user imatiach-msft opened a pull request: https://github.com/apache/spark/pull/17085 [SPARK-18693][ML][MLLIB] ML Evaluators should use weight column - added weight column for regression evaluator ## What changes were proposed in this pull request? The evaluators BinaryClassificationEvaluator, RegressionEvaluator, and MulticlassClassificationEvaluator and the corresponding metrics classes BinaryClassificationMetrics, RegressionMetrics and MulticlassMetrics should use sample weight data. I've closed the PR: https://github.com/apache/spark/pull/16557 as recommended, in favor of creating three pull requests, one for each of the evaluators (binary/regression/multiclass), to make them easier to review/update. The updates to the regression metrics were based on (and updated with new changes based on comments) https://issues.apache.org/jira/browse/SPARK-11520 ("RegressionMetrics should support instance weights"), but that pull request was closed as the changes were never checked in. ## How was this patch tested? I added tests to the metrics class. You can merge this pull request into a Git repository by running: $ git pull https://github.com/imatiach-msft/spark ilmat/regression-evaluate Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17085.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #17085 commit 48800eb91b27a713232ab27f05bb8cef92129852 Author: Ilya Matiach Date: 2017-02-27T18:20:44Z Added weight column for regression evaluator
[GitHub] spark pull request #17076: [SPARK-19745][ML] SVCAggregator captures coeffici...
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/17076#discussion_r103276904 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LinearSVC.scala --- @@ -440,19 +440,9 @@ private class LinearSVCAggregator( private val numFeatures: Int = bcFeaturesStd.value.length private val numFeaturesPlusIntercept: Int = if (fitIntercept) numFeatures + 1 else numFeatures - private val coefficients: Vector = bcCoefficients.value private var weightSum: Double = 0.0 private var lossSum: Double = 0.0 - require(numFeaturesPlusIntercept == coefficients.size, s"Dimension mismatch. Coefficients " + -s"length ${coefficients.size}, FeaturesStd length ${numFeatures}, fitIntercept: $fitIntercept") - - private val coefficientsArray = coefficients match { -case dv: DenseVector => dv.values -case _ => - throw new IllegalArgumentException( -s"coefficients only supports dense vector but got type ${coefficients.getClass}.") - } - private val gradientSumArray = Array.fill[Double](coefficientsArray.length)(0) + private lazy val gradientSumArray = new Array[Double](numFeaturesPlusIntercept) --- End diff -- Actually this question is slightly different than what I was referring to above. We don't use `@transient` here because we do need to serialize this when we send the gradient updates back to the driver. The reason for making it lazy is that we don't need to serialize the array of zeros: we can just initialize it on the workers and avoid the communication cost.
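The trade-off sethah describes for the Scala `lazy val` — don't pay to ship a zero-filled buffer to the workers, but do serialize it once it holds gradient updates for the driver — can be sketched with a lazily allocated buffer in Python's `pickle`. This is an illustrative analogy, not Spark's actual `LinearSVCAggregator`; all names here are hypothetical:

```python
import pickle

class Aggregator:
    """Hypothetical sketch: the gradient buffer is allocated lazily on first
    use, so a freshly constructed aggregator pickles without it (cheap to
    ship inside a task closure), while an aggregator that has accumulated
    updates pickles with its contents (needed when merging on the driver)."""

    def __init__(self, num_features: int):
        self.num_features = num_features
        self._gradient_sum = None  # not allocated until first add()

    @property
    def gradient_sum(self):
        # lazy initialization: the zero array is built where it is first used
        if self._gradient_sum is None:
            self._gradient_sum = [0.0] * self.num_features
        return self._gradient_sum

    def add(self, gradients):
        for i, g in enumerate(gradients):
            self.gradient_sum[i] += g

fresh = Aggregator(1000)           # buffer never touched, never allocated
used = Aggregator(1000)
used.add([1.0] * 1000)             # buffer materialized by real updates

# The unused buffer stays None, so the serialized form is much smaller.
print(len(pickle.dumps(fresh)) < len(pickle.dumps(used)))
```

The same effect motivates the `lazy val` in the diff: laziness (rather than `@transient`) keeps the round trip cheap in one direction only.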
[GitHub] spark issue #17085: [SPARK-18693][ML][MLLIB] ML Evaluators should use weight...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17085 **[Test build #73528 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73528/testReport)** for PR 17085 at commit [`48800eb`](https://github.com/apache/spark/commit/48800eb91b27a713232ab27f05bb8cef92129852).
[GitHub] spark issue #17075: [SPARK-19727][SQL] Fix for round function that modifies ...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/17075 I think we should fix `changePrecision` to return a new instance instead of updating itself.
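The fix cloud-fan suggests — return a fresh value rather than mutate the receiver, so that a round operation cannot corrupt a value shared by other expressions — can be sketched with Python's immutable `decimal.Decimal` (an illustrative sketch, not Spark's `Decimal` class; the function name is hypothetical):

```python
from decimal import Decimal, ROUND_HALF_UP

def change_precision(value: Decimal, scale: int) -> Decimal:
    # quantize returns a brand-new Decimal; `value` itself is untouched,
    # so callers holding a reference to it see no side effect.
    return value.quantize(Decimal(1).scaleb(-scale), rounding=ROUND_HALF_UP)

x = Decimal("3.14159")
rounded = change_precision(x, 2)
print(rounded)  # 3.14
print(x)        # 3.14159 -- the original operand is unchanged
```

With an in-place `changePrecision`, the second print would show the already-rounded value, which is exactly the class of bug SPARK-19727 describes.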
[GitHub] spark pull request #16290: [SPARK-18817] [SPARKR] [SQL] Set default warehous...
Github user shivaram commented on a diff in the pull request: https://github.com/apache/spark/pull/16290#discussion_r103276097 --- Diff: R/pkg/R/sparkR.R --- @@ -376,6 +377,12 @@ sparkR.session <- function( overrideEnvs(sparkConfigMap, paramMap) } + # NOTE(shivaram): Set default warehouse dir to tmpdir to meet CRAN requirements + # See SPARK-18817 for more details + if (!exists("spark.sql.default.warehouse.dir", envir = sparkConfigMap)) { --- End diff -- Ah I see - I will try to use `SessionState` and see if it can avoid having to create a new option
[GitHub] spark issue #16330: [SPARK-18817][SPARKR][SQL] change derby log output and m...
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/16330 It's a bit tricky to ask users for permission during installation (actually I'm not sure how we can create such an option?) -- I think a viable option could be to add a `logWarning` that shows where SparkSQL data is going to be stored, with a pointer to how the location can be changed. @felixcheung I think it's worth a shot to try the CRAN submission process with such a warning, and then revisit this if we still have a problem?
[GitHub] spark issue #17077: [SPARK-16931][PYTHON][SQL] Add Python wrapper for bucket...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17077 **[Test build #73527 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73527/testReport)** for PR 17077 at commit [`18c709c`](https://github.com/apache/spark/commit/18c709c4bf77fc6db5530e00a9e5bba0e1ab0250).
[GitHub] spark issue #17084: [SPARK-18693][ML][MLLIB] ML Evaluators should use weight...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17084 **[Test build #73526 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73526/testReport)** for PR 17084 at commit [`98652cf`](https://github.com/apache/spark/commit/98652cfb6c92eed90deff61bc83ef66b9096df20).
[GitHub] spark pull request #16842: [SPARK-19304] [Streaming] [Kinesis] fix kinesis s...
Github user brkyvz commented on a diff in the pull request: https://github.com/apache/spark/pull/16842#discussion_r103274162 --- Diff: external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisBackedBlockRDD.scala --- @@ -212,7 +214,7 @@ class KinesisSequenceRangeIterator( val getRecordsRequest = new GetRecordsRequest getRecordsRequest.setRequestCredentials(credentials) getRecordsRequest.setShardIterator(shardIterator) -getRecordsRequest.setLimit(recordCount) +getRecordsRequest.setLimit(Math.max(recordCount, this.maxGetRecordsLimit)) --- End diff -- this should be a `min` not a `max`
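The bug brkyvz flags in the diff above is worth spelling out: to cap a per-request record count at a configured service limit, the smaller of the two values must win; `max` does the opposite and would ask Kinesis for more than the limit allows. A minimal sketch (function and parameter names are illustrative, not the Spark or AWS API):

```python
def capped_limit(record_count: int, max_get_records_limit: int) -> int:
    # min() enforces the ceiling: never request more than the service allows,
    # but pass small requests through untouched. max() would break both cases.
    return min(record_count, max_get_records_limit)

print(capped_limit(50_000, 10_000))  # 10000 -- large request capped
print(capped_limit(500, 10_000))     # 500   -- small request unchanged
```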
[GitHub] spark issue #16910: [SPARK-19575][SQL]Reading from or writing to a hive serd...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16910 Merged build finished. Test PASSed.
[GitHub] spark issue #16910: [SPARK-19575][SQL]Reading from or writing to a hive serd...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16910 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73513/ Test PASSed.
[GitHub] spark pull request #17084: [SPARK-18693][ML][MLLIB] ML Evaluators should use...
GitHub user imatiach-msft opened a pull request: https://github.com/apache/spark/pull/17084 [SPARK-18693][ML][MLLIB] ML Evaluators should use weight column - added weight column for binary classification evaluator ## What changes were proposed in this pull request? The evaluators BinaryClassificationEvaluator, RegressionEvaluator, and MulticlassClassificationEvaluator and the corresponding metrics classes BinaryClassificationMetrics, RegressionMetrics and MulticlassMetrics should use sample weight data. I've closed the PR: https://github.com/apache/spark/pull/16557 as recommended, in favor of creating three pull requests, one for each of the evaluators (binary/regression/multiclass), to make them easier to review/update. ## How was this patch tested? I added tests to the metrics and evaluators classes. You can merge this pull request into a Git repository by running: $ git pull https://github.com/imatiach-msft/spark ilmat/binary-evalute Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17084.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #17084
[GitHub] spark issue #16910: [SPARK-19575][SQL]Reading from or writing to a hive serd...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16910 **[Test build #73513 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73513/testReport)** for PR 16910 at commit [`92d1067`](https://github.com/apache/spark/commit/92d10679b5a07b34f6d5cfdb8cd27279165c95e3). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #16557: [SPARK-18693][ML][MLLIB] ML Evaluators should use...
Github user imatiach-msft closed the pull request at: https://github.com/apache/spark/pull/16557
[GitHub] spark issue #16557: [SPARK-18693][ML][MLLIB] ML Evaluators should use weight...
Github user imatiach-msft commented on the issue: https://github.com/apache/spark/pull/16557 ok, I will close this and create three new PRs, one for each of the evaluators
[GitHub] spark issue #17052: [SPARK-19690][SS] Join a streaming DataFrame with a batc...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17052 Merged build finished. Test FAILed.
[GitHub] spark issue #17052: [SPARK-19690][SS] Join a streaming DataFrame with a batc...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17052 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73507/ Test FAILed.
[GitHub] spark issue #17052: [SPARK-19690][SS] Join a streaming DataFrame with a batc...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17052 **[Test build #73507 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73507/testReport)** for PR 17052 at commit [`38e3a14`](https://github.com/apache/spark/commit/38e3a14b609373f2fae21fcd70a14669cfc96aa1). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16557: [SPARK-18693][ML][MLLIB] ML Evaluators should use weight...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16557 **[Test build #73525 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73525/testReport)** for PR 16557 at commit [`a0fc4c3`](https://github.com/apache/spark/commit/a0fc4c3ddb9e9e62e78b4dff59e65d7ae4387054).
[GitHub] spark pull request #16971: [SPARK-19573][SQL] Make NaN/null handling consist...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16971#discussion_r103270963 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/stat/StatFunctions.scala --- @@ -91,7 +100,13 @@ object StatFunctions extends Logging { } val summaries = df.select(columns: _*).rdd.aggregate(emptySummaries)(apply, merge) -summaries.map { summary => probabilities.map(summary.query) } +summaries.map { summary => + try { +probabilities.map(summary.query) + } catch { +case e: SparkException => Seq.empty[Double] --- End diff -- Please do not use exception handling for this purpose. Instead, you can return None.
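The style point gatorsmile raises — signal "no answer" with an optional value rather than catching an exception and substituting an empty result — can be sketched in Python with `Optional` standing in for Scala's `Option`. This is a hedged illustration of the pattern, not Spark's `StatFunctions` code; the quantile logic here is a deliberately simplified stand-in:

```python
from typing import Optional, Sequence

def query_quantile(sorted_values: Sequence[float], p: float) -> Optional[float]:
    """Return an approximate p-quantile, or None when no answer exists.

    The caller gets an explicit Optional to inspect, instead of the callee
    raising an exception that is then swallowed for control flow.
    """
    if not sorted_values or not 0.0 <= p <= 1.0:
        return None  # explicit "no result" -- no exception-based control flow
    idx = min(int(p * len(sorted_values)), len(sorted_values) - 1)
    return sorted_values[idx]

summary = [1.0, 2.0, 3.0, 4.0]
results = [query_quantile(summary, p) for p in (0.25, 0.5)]
empty = query_quantile([], 0.5)
print(results, empty)
```

The caller can now distinguish "summary had no data" (`None`) from "quantile is zero", which the `catch`-and-return-empty version in the diff conflates.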
[GitHub] spark issue #16793: [SPARK-19454][PYTHON][SQL] DataFrame.replace improvement...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16793 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73524/ Test PASSed.
[GitHub] spark issue #16793: [SPARK-19454][PYTHON][SQL] DataFrame.replace improvement...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16793 **[Test build #73524 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73524/testReport)** for PR 16793 at commit [`17e6820`](https://github.com/apache/spark/commit/17e68205ef639893902c65c0394c8aa4406191be). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16793: [SPARK-19454][PYTHON][SQL] DataFrame.replace improvement...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16793 Merged build finished. Test PASSed.
[GitHub] spark issue #16929: [SPARK-19595][SQL] Support json array in from_json
Github user brkyvz commented on the issue: https://github.com/apache/spark/pull/16929 Thanks @HyukjinKwon, took a pass. My comments are mainly: 1. We don't need to support APIs for both `StructType` and `ArrayType`. I would rather just add an API for `DataType` and `require` that the `DataType` is either `StructType` or `ArrayType`. 2. If a user specifies the schema as an `Array` but one of the rows has a JSON object, we should still consider it an Array of records. No need to separate `Array support` and `Object support`.
[GitHub] spark issue #17077: [SPARK-16931][PYTHON][SQL] Add Python wrapper for bucket...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17077 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73523/ Test FAILed.
[GitHub] spark issue #16842: [SPARK-19304] [Streaming] [Kinesis] fix kinesis slow che...
Github user brkyvz commented on the issue: https://github.com/apache/spark/pull/16842 @Gauravshah Can you please comment on how much this PR improved your recovery time?
[GitHub] spark issue #17077: [SPARK-16931][PYTHON][SQL] Add Python wrapper for bucket...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17077 Merged build finished. Test FAILed.
[GitHub] spark issue #17077: [SPARK-16931][PYTHON][SQL] Add Python wrapper for bucket...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17077 **[Test build #73523 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73523/testReport)** for PR 17077 at commit [`9fde39f`](https://github.com/apache/spark/commit/9fde39fa2174e9e67d6045b890f8cc0fc76cd61b). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #16929: [SPARK-19595][SQL] Support json array in from_jso...
Github user brkyvz commented on a diff in the pull request: https://github.com/apache/spark/pull/16929#discussion_r103268734 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala --- @@ -39,7 +39,12 @@ private[sql] class SparkSQLJsonProcessingException(msg: String) extends RuntimeE */ class JacksonParser( schema: StructType, -options: JSONOptions) extends Logging { +options: JSONOptions, +arraySupport: Boolean = true, --- End diff -- as I commented above, I don't think we need this
[GitHub] spark pull request #16929: [SPARK-19595][SQL] Support json array in from_jso...
Github user brkyvz commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16929#discussion_r103268655

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala ---
    @@ -480,36 +480,79 @@ case class JsonTuple(children: Seq[Expression])
     }

     /**
    - * Converts an json input string to a [[StructType]] with the specified schema.
    + * Converts an json input string to a [[StructType]] or [[ArrayType]] with the specified schema.
      */
     case class JsonToStruct(
    -    schema: StructType,
    +    schema: DataType,
         options: Map[String, String],
         child: Expression,
         timeZoneId: Option[String] = None)
       extends UnaryExpression with TimeZoneAwareExpression with CodegenFallback with ExpectsInputTypes {

       override def nullable: Boolean = true

    -  def this(schema: StructType, options: Map[String, String], child: Expression) =
    +  def this(schema: DataType, options: Map[String, String], child: Expression) =
         this(schema, options, child, None)

    +  override def checkInputDataTypes(): TypeCheckResult = schema match {
    +    case _: StructType | ArrayType(_: StructType, _) =>
    +      super.checkInputDataTypes()
    +    case _ => TypeCheckResult.TypeCheckFailure(
    +      s"Input schema ${schema.simpleString} must be a struct or an array of structs.")
    +  }
    +
    +  @transient
    +  lazy val rowSchema = schema match {
    +    case st: StructType => st
    +    case ArrayType(st: StructType, _) => st
    +  }
    +
    +  // This converts parsed rows to the desired output by the given schema.
    +  @transient
    +  lazy val converter = schema match {
    +    case _: StructType =>
    +      // These are always produced from json objects by `objectSupport` in `JacksonParser`.
    +      (rows: Seq[InternalRow]) => rows.head
    +
    +    case ArrayType(_: StructType, _) =>
    +      // These are always produced from json arrays by `arraySupport` in `JacksonParser`.
    +      (rows: Seq[InternalRow]) => new GenericArrayData(rows)
    +  }
    +
       @transient
       lazy val parser =
         new JacksonParser(
    -      schema,
    -      new JSONOptions(options + ("mode" -> ParseModes.FAIL_FAST_MODE), timeZoneId.get))
    +      rowSchema,
    +      new JSONOptions(options + ("mode" -> ParseModes.FAIL_FAST_MODE), timeZoneId.get),
    +      objectSupport = schema.isInstanceOf[StructType],
    --- End diff --

    Do you think we need the `objectSupport` and `arraySupport`? I would rather not add them. If someone specifies an `ArrayType` but the row contains just an object, let's still return it as an `ArrayType`. I think users would appreciate this.
[GitHub] spark pull request #16929: [SPARK-19595][SQL] Support json array in from_jso...
Github user brkyvz commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16929#discussion_r103268156

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala ---
    @@ -480,36 +480,79 @@ case class JsonTuple(children: Seq[Expression])
     }

     /**
    - * Converts an json input string to a [[StructType]] with the specified schema.
    + * Converts an json input string to a [[StructType]] or [[ArrayType]] with the specified schema.
      */
     case class JsonToStruct(
    -    schema: StructType,
    +    schema: DataType,
         options: Map[String, String],
         child: Expression,
         timeZoneId: Option[String] = None)
       extends UnaryExpression with TimeZoneAwareExpression with CodegenFallback with ExpectsInputTypes {

       override def nullable: Boolean = true

    -  def this(schema: StructType, options: Map[String, String], child: Expression) =
    +  def this(schema: DataType, options: Map[String, String], child: Expression) =
         this(schema, options, child, None)

    +  override def checkInputDataTypes(): TypeCheckResult = schema match {
    --- End diff --

    why not just override: `override def inputTypes = new TypeCollection(ArrayType, StructType) :: Nil`
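The trade-off between the two validation styles under discussion can be modeled in plain Scala. All names below (`DType`, `checkExplicit`, `checkCoarse`, etc.) are hypothetical stand-ins, not Catalyst's API: the explicit pattern match mirrors the `checkInputDataTypes` override in the diff, while the coarse check mirrors what an `inputTypes`-based `TypeCollection(ArrayType, StructType)` would express. The difference is that the coarse check accepts any array type, so e.g. an array of ints would only fail later, at parse time.

```scala
// Hypothetical model of the two validation styles (not Catalyst's classes).
sealed trait DType
case object StructT extends DType
case object IntT extends DType
case class ArrayT(element: DType) extends DType

// Style 1: explicit match, as in the diff -- rejects arrays of non-structs
// up front, with a targeted error message.
def checkExplicit(s: DType): Either[String, Unit] = s match {
  case StructT | ArrayT(StructT) => Right(())
  case other => Left(s"Input schema $other must be a struct or an array of structs.")
}

// Style 2: coarse "struct or any array" check, as the inputTypes suggestion --
// shorter, but ArrayT(IntT) slips through to fail at runtime instead.
def checkCoarse(s: DType): Boolean = s match {
  case StructT | ArrayT(_) => true
  case _ => false
}

checkExplicit(ArrayT(IntT)).isLeft  // true: caught at analysis time
checkCoarse(ArrayT(IntT))           // true: accepted, deferred to parse time
```

Which style is preferable depends on whether a precise analysis-time error for non-struct array elements is worth the extra override.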
[GitHub] spark issue #17076: [SPARK-19745][ML] SVCAggregator captures coefficients in...
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17076

    Merged build finished. Test PASSed.
[GitHub] spark issue #17076: [SPARK-19745][ML] SVCAggregator captures coefficients in...
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17076

    Test PASSed. Refer to this link for build results (access rights to CI server needed):
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73505/

    Test PASSed.