[GitHub] spark pull request: [SPARK-4286] Add an external shuffle service t...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4990#issuecomment-94563238 [Test build #30600 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30600/consoleFull) for PR 4990 at commit [`07804ad`](https://github.com/apache/spark/commit/07804adebcc9dd94723e2cf2df8de2a025d97308). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes. * This patch does not change any dependencies. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7000] [ml] Refactor prediction and tree...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/5585#issuecomment-94566462 `prediction` sounds too general here, and I don't know what should go into this package. Many models can make predictions, but only tree nodes are under `prediction` now.
[GitHub] spark pull request: [SPARK-3454] [WIP] separate json endpoints for...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/4435#discussion_r28731362 --- Diff: core/src/main/java/org/apache/spark/status/api/EnumUtil.java --- @@ -0,0 +1,38 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.status.api; + +import com.google.common.base.Joiner; + +import java.util.Arrays; + +public class EnumUtil { + public static <E extends Enum<E>> E parseIgnoreCase(Class<E> clz, String str) { +E[] constants = clz.getEnumConstants(); +if (str == null) { + return null; +} +for (E e : constants) { + if (e.name().equalsIgnoreCase(str)) +return e; +} +throw new IllegalArgumentException( + String.format("Illegal type='%s'. Supported type values: %s", +str, Joiner.on(", ").join( + Arrays.asList(constants)))); --- End diff -- Is `Arrays.asList` needed? There's a `join(Object[])` method in Joiner.
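The review point can be shown with a small, dependency-free Java sketch of the same case-insensitive enum lookup. The sketch streams the constants array straight into the error message and does not wrap it with `Arrays.asList`. With Guava on the classpath, the equivalent is `Joiner.on(", ").join(constants)`, because `Joiner` has a `join(Object[])` overload. The class `EnumParsing` and the enum `StageStatus` are hypothetical names used only for illustration.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

// Hypothetical sketch of case-insensitive enum parsing, without Guava.
final class EnumParsing {
  private EnumParsing() {}

  /** Returns the constant whose name matches str ignoring case, or null if str is null. */
  static <E extends Enum<E>> E parseIgnoreCase(Class<E> clz, String str) {
    if (str == null) {
      return null; // same behavior as the reviewed code: a missing value is not an error
    }
    E[] constants = clz.getEnumConstants();
    for (E e : constants) {
      if (e.name().equalsIgnoreCase(str)) {
        return e;
      }
    }
    // The array is streamed directly, so no Arrays.asList wrapper is needed.
    String supported = Arrays.stream(constants)
        .map(Enum::name)
        .collect(Collectors.joining(", "));
    throw new IllegalArgumentException(
        String.format("Illegal type='%s'. Supported type values: %s", str, supported));
  }
}

// Hypothetical enum used only to exercise the helper.
enum StageStatus { ACTIVE, COMPLETE, PENDING, FAILED }
```

For example, `parseIgnoreCase(StageStatus.class, "active")` returns `StageStatus.ACTIVE`. An unknown name produces an error message that lists every supported value.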
[GitHub] spark pull request: [SPARK-3454] [WIP] separate json endpoints for...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/4435#discussion_r28731820 --- Diff: core/src/main/scala/org/apache/spark/deploy/history/HistoryServer.scala --- @@ -112,6 +118,10 @@ class HistoryServer( */ def initialize() { attachPage(new HistoryPage(this)) + --- End diff -- nit: delete
[GitHub] spark pull request: [SPARK-6418] Add simple per-stage visualizatio...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5547#issuecomment-94585789 [Test build #30610 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30610/consoleFull) for PR 5547 at commit [`7fac1eb`](https://github.com/apache/spark/commit/7fac1eb96c61cb23e020aa55c306f1b698e4196b).
[GitHub] spark pull request: [SPARK-6954] [YARN] Dynamic allocation: numExe...
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/5536#issuecomment-94593114 By the way, I plan to rewrite a large part of this logic in a way that is more intuitive, after which any test you write here may not apply anymore, so it might make sense to hold off on that for now. This code has grown so unmanageable that even I, the original author of this feature, need to spend a significant chunk of time trying to follow the logic and understand the root cause of the issue. For this reason I'm going to merge this as is into master and 1.3. Thanks @piaozhexiu @sryza @jerryshao.
[GitHub] spark pull request: [SPARK-7022][PySpark][ML] Add ML.Tuning.ParamG...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5601#issuecomment-94593306 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-5932][CORE] Use consistent naming for s...
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/5574#discussion_r28741957 --- Diff: core/src/main/scala/org/apache/spark/serializer/KryoSerializer.scala --- @@ -49,16 +49,16 @@ class KryoSerializer(conf: SparkConf) with Logging with Serializable { - private val bufferSizeMb = conf.getDouble("spark.kryoserializer.buffer.mb", 0.064) + private val bufferSizeMb = conf.getSizeAsKb("spark.kryoserializer.buffer", "64k").toDouble/1000.0d --- End diff -- I think this is incorrect. If the user specifies `0.064` as before, this will be interpreted as `0.064kb` even though they mean `0.064mb`.
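The unit problem can be shown with a minimal Java sketch. It uses a simplified parser, not SparkConf's real size-parsing code. The legacy `spark.kryoserializer.buffer.mb` value is a plain number in megabytes, while the new `spark.kryoserializer.buffer` key reads a bare number as kilobytes. A legacy value therefore has to be converted before it is mixed with the new key. The class name, the precedence rule (the new key wins) and the 1024 factor are assumptions of this sketch.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not SparkConf: resolves the Kryo buffer size in KB while
// preserving the legacy key's megabyte unit.
final class KryoBufferConfig {
  static final String NEW_KEY = "spark.kryoserializer.buffer";       // size string; bare numbers are KB
  static final String LEGACY_KEY = "spark.kryoserializer.buffer.mb"; // plain number in MB

  private KryoBufferConfig() {}

  /** Parses "64k", "2m", "1g" or a bare number (read as KB) into kilobytes. */
  static double parseKb(String value) {
    String v = value.trim().toLowerCase(Locale.ROOT);
    if (v.isEmpty()) {
      throw new IllegalArgumentException("Empty size string");
    }
    double factor = 1.0; // bare number => KB
    char unit = v.charAt(v.length() - 1);
    if (unit == 'k' || unit == 'm' || unit == 'g') {
      factor = unit == 'k' ? 1.0 : unit == 'm' ? 1024.0 : 1024.0 * 1024.0;
      v = v.substring(0, v.length() - 1);
    }
    return Double.parseDouble(v) * factor;
  }

  /** The new key wins if set; otherwise the legacy MB value is converted to KB. */
  static double bufferSizeKb(Map<String, String> conf) {
    String current = conf.get(NEW_KEY);
    if (current != null) {
      return parseKb(current);
    }
    String legacy = conf.get(LEGACY_KEY);
    if (legacy != null) {
      // 0.064 here means 0.064 MB; passing it to parseKb would yield 0.064 KB.
      return Double.parseDouble(legacy.trim()) * 1024.0;
    }
    return parseKb("64k");
  }
}
```

With only the legacy key set to `0.064`, this sketch yields about 65.5 KB. Passing the same string through `parseKb` would yield 0.064 KB, which is the misreading the comment describes.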
[GitHub] spark pull request: [SPARK-7022][PySpark][ML] Add ML.Tuning.ParamG...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5601#issuecomment-94604732 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30616/ Test FAILed.
[GitHub] spark pull request: [SPARK-7022][PySpark][ML] Add ML.Tuning.ParamG...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5601#issuecomment-94604729 [Test build #30616 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30616/consoleFull) for PR 5601 at commit [`8b8a6d2`](https://github.com/apache/spark/commit/8b8a6d26ed935efe91a2f36a08c9835c88605239). * This patch **fails Python style tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class ParamGridBuilder(object):` * This patch does not change any dependencies.
[GitHub] spark pull request: [Project Infra] SPARK-1684: Merge script shoul...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/5149#issuecomment-94604740 LGTM
[GitHub] spark pull request: [SPARK-7022][PySpark][ML] Add ML.Tuning.ParamG...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5601#issuecomment-94604611 [Test build #30616 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30616/consoleFull) for PR 5601 at commit [`8b8a6d2`](https://github.com/apache/spark/commit/8b8a6d26ed935efe91a2f36a08c9835c88605239).
[GitHub] spark pull request: [SPARK-6996][SQL] Support map types in java be...
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/5578#discussion_r28743975 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala --- @@ -1222,20 +1224,31 @@ class SQLContext(@transient val sparkContext: SparkContext) * Returns a Catalyst Schema for the given java bean class. */ protected def getSchema(beanClass: Class[_]): Seq[AttributeReference] = { -val (dataType, _) = inferDataType(beanClass) +val (dataType, _) = SQLContext.inferDataType(TypeToken.of(beanClass)) dataType.asInstanceOf[StructType].fields.map { f => AttributeReference(f.name, f.dataType, f.nullable)() } } +} + +object SQLContext { --- End diff -- As long as we are going to break this out into its own object, can we move it to its own file and maybe name it something like `JavaTypeInference`?
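To make clearer what a standalone `JavaTypeInference` object would contain, here is a hypothetical Java sketch of bean schema inference. It walks a bean's public getters and maps each return type, including generic `Map` and `Iterable` types, to a SQL-style type name. It is not Spark's implementation, which works with Guava's `TypeToken` and Catalyst `DataType`s. The class names, the type-name strings and the sorted field order are assumptions. Property names come from `Introspector.decapitalize`, which applies the JavaBeans naming rule.

```java
import java.beans.Introspector;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch, not Spark's JavaTypeInference: infers SQL-style type
// names from a Java bean's getters, including generic Map and Iterable types.
final class BeanTypeInference {
  private BeanTypeInference() {}

  static String inferType(Type t) {
    if (t instanceof ParameterizedType) {
      ParameterizedType p = (ParameterizedType) t;
      Class<?> raw = (Class<?>) p.getRawType();
      Type[] args = p.getActualTypeArguments();
      if (Map.class.isAssignableFrom(raw)) {
        return "map<" + inferType(args[0]) + "," + inferType(args[1]) + ">";
      }
      if (Iterable.class.isAssignableFrom(raw)) {
        return "array<" + inferType(args[0]) + ">";
      }
      throw new UnsupportedOperationException("Unsupported generic type: " + t);
    }
    if (!(t instanceof Class)) {
      throw new UnsupportedOperationException("Unsupported type: " + t);
    }
    Class<?> c = (Class<?>) t;
    if (c == int.class || c == Integer.class) return "int";
    if (c == long.class || c == Long.class) return "bigint";
    if (c == double.class || c == Double.class) return "double";
    if (c == boolean.class || c == Boolean.class) return "boolean";
    if (c == String.class) return "string";
    if (Map.class.isAssignableFrom(c) || Iterable.class.isAssignableFrom(c)) {
      // Without type arguments the key/element types cannot be inferred.
      throw new UnsupportedOperationException("Raw collection type needs type arguments: " + c);
    }
    return inferStruct(c); // any other class is treated as a nested bean (no cycle detection)
  }

  static String inferStruct(Class<?> beanClass) {
    Map<String, String> fields = new TreeMap<>(); // sorted by property name
    for (Method m : beanClass.getMethods()) {
      if (m.getParameterCount() != 0 || Modifier.isStatic(m.getModifiers())
          || m.getDeclaringClass() == Object.class || m.getReturnType() == void.class) {
        continue;
      }
      String name = m.getName();
      String prop;
      if (name.startsWith("get") && name.length() > 3) {
        prop = name.substring(3);
      } else if (name.startsWith("is") && name.length() > 2 && m.getReturnType() == boolean.class) {
        prop = name.substring(2);
      } else {
        continue;
      }
      fields.put(Introspector.decapitalize(prop), inferType(m.getGenericReturnType()));
    }
    StringBuilder sb = new StringBuilder("struct<");
    String sep = "";
    for (Map.Entry<String, String> e : fields.entrySet()) {
      sb.append(sep).append(e.getKey()).append(':').append(e.getValue());
      sep = ",";
    }
    return sb.append('>').toString();
  }
}

// Hypothetical example bean.
class ExampleBean {
  public String getName() { return "a"; }
  public int getAge() { return 1; }
  public boolean isActive() { return true; }
  public Map<String, Long> getScores() { return null; }
}
```

For `ExampleBean`, this sketch produces `struct<active:boolean,age:int,name:string,scores:map<string,bigint>>`.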
[GitHub] spark pull request: [SPARK-6996][SQL] Support map types in java be...
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/5578#discussion_r28743927 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala --- @@ -18,13 +18,15 @@ package org.apache.spark.sql import java.beans.Introspector -import java.util.Properties +import java.lang.{Iterable => JIterable} +import java.util.{Iterator => JIterator, Map => JMap, Properties} import scala.collection.JavaConversions._ import scala.collection.immutable -import scala.language.implicitConversions +import scala.language.{existentials, implicitConversions} import scala.reflect.runtime.universe.TypeTag +import com.google.common.reflect.TypeToken --- End diff -- Nit: blank line to separate imports
[GitHub] spark pull request: [SPARK-5342][YARN] Allow long running Spark ap...
Github user harishreedharan commented on a diff in the pull request: https://github.com/apache/spark/pull/4688#discussion_r28733644 --- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala --- @@ -122,7 +126,7 @@ class CoarseGrainedSchedulerBackend(scheduler: TaskSchedulerImpl, val rpcEnv: Rp } else { logInfo("Registered executor: " + executorRef + " with ID " + executorId) context.reply(RegisteredExecutor) - + latestTokens.foreach(x => context.reply(NewTokens(x))) --- End diff -- Isn't that the case with any of the messages being sent? I have no idea how Akka handles message delivery failure. I can try again, but there were so many test failures it was painful to fix.
[GitHub] spark pull request: [SPARK-5342][YARN] Allow long running Spark ap...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/4688#discussion_r28733812 --- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala --- @@ -122,7 +126,7 @@ class CoarseGrainedSchedulerBackend(scheduler: TaskSchedulerImpl, val rpcEnv: Rp } else { logInfo("Registered executor: " + executorRef + " with ID " + executorId) context.reply(RegisteredExecutor) - + latestTokens.foreach(x => context.reply(NewTokens(x))) --- End diff -- Yeah, but the executor won't do anything if it doesn't get a `RegisteredExecutor` reply. Your style was making things worse by adding this extra case where `RegisteredExecutor` could arrive but the next message could be lost. Anyway, the issue is mostly theoretical.
[GitHub] spark pull request: [SPARK-5342][YARN] Allow long running Spark ap...
Github user harishreedharan commented on a diff in the pull request: https://github.com/apache/spark/pull/4688#discussion_r28734183 --- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala --- @@ -122,7 +126,7 @@ class CoarseGrainedSchedulerBackend(scheduler: TaskSchedulerImpl, val rpcEnv: Rp } else { logInfo("Registered executor: " + executorRef + " with ID " + executorId) context.reply(RegisteredExecutor) - + latestTokens.foreach(x => context.reply(NewTokens(x))) --- End diff -- Yep, I understand that. But that issue exists for pretty much all messages sent within Spark - it is not clear whether we do anything at all to handle retries outside of what Akka does (if any). Anyway, I am going to try once more - if tests fail, I will just stick to 2 messages rather than risk random regressions.
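The alternative under discussion is to send the tokens inside the registration reply rather than as a second message. It can be sketched in Java as follows. The message and handler types are hypothetical stand-ins, not Spark's RPC classes, and tokens are modeled as an opaque `byte[]`. The sketch only shows that the executor either receives registration and tokens together or receives neither.

```java
import java.util.Optional;

// Hypothetical reply type carrying optional tokens; not Spark's RPC classes.
final class RegisteredExecutorWithTokens {
  final Optional<byte[]> latestTokens;

  RegisteredExecutorWithTokens(Optional<byte[]> latestTokens) {
    this.latestTokens = latestTokens;
  }
}

// Hypothetical executor-side state.
final class ExecutorState {
  boolean registered;
  byte[] tokens;

  void onRegistered(RegisteredExecutorWithTokens reply) {
    registered = true;
    // The tokens travel in the registration reply itself, so there is no state
    // in which registration succeeded but a follow-up token message was lost.
    reply.latestTokens.ifPresent(t -> tokens = t);
  }
}

// Hypothetical driver side: one reply per registration request.
final class DriverSide {
  private DriverSide() {}

  static RegisteredExecutorWithTokens register(Optional<byte[]> latestTokens) {
    return new RegisteredExecutorWithTokens(latestTokens);
  }
}
```

The trade-off is a slightly larger registration message in exchange for removing the partially registered state.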
[GitHub] spark pull request: [SPARK-6479][Block Manager]Create off-heap blo...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5430#issuecomment-94584504 [Test build #30609 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30609/consoleFull) for PR 5430 at commit [`d1c9921`](https://github.com/apache/spark/commit/d1c9921422de76e9f1d74d4450dd9dd889f22fb0).
[GitHub] spark pull request: [SQL] Checking data types when resolving types
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/4685#discussion_r28740529 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala --- @@ -290,16 +301,20 @@ case class GreaterThan(left: Expression, right: Expression) extends BinaryCompar case class GreaterThanOrEqual(left: Expression, right: Expression) extends BinaryComparison { def symbol = ">=" - lazy val ordering = { -if (left.dataType != right.dataType) { - throw new TreeNodeException(this, -s"Types do not match ${left.dataType} != ${right.dataType}") -} -left.dataType match { - case i: NativeType => i.ordering.asInstanceOf[Ordering[Any]] - case other => sys.error(s"Type $other does not support ordered operations") + override lazy val resolved = +left.resolved && right.resolved && +left.dataType == right.dataType && +(left.dataType.isInstanceOf[NativeType] || left.dataType.isInstanceOf[NullType]) + + val ordering = +if (resolved) { + left.dataType match { +case n: NativeType => n.ordering.asInstanceOf[Ordering[Any]] +case n: NullType => UnresolvedOrdering + } +} else { + UnresolvedOrdering } --- End diff -- Perhaps we could add helper methods to Expression for checking if things are numeric and for finding the correct ordering / numeric types. This pattern seems to be repeated quite a bit.
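Here is a rough Java sketch of the kind of shared helper suggested here, using Java classes as stand-ins for Catalyst data types. The ordering lookup and the numeric check live in one place, and each comparison expression asks the helper instead of repeating its own type match. `OrderingHelpers` and `GreaterThanOrEqualSketch` are hypothetical names, and the sketch does not model `NullType` or `UnresolvedOrdering`.

```java
import java.util.Comparator;
import java.util.Optional;

// Hypothetical helpers; Java classes stand in for Catalyst data types.
final class OrderingHelpers {
  private OrderingHelpers() {}

  static boolean isNumeric(Class<?> t) {
    return Number.class.isAssignableFrom(t);
  }

  /** An ordering exists only when both sides have the same, comparable type. */
  @SuppressWarnings("unchecked")
  static Optional<Comparator<Object>> orderingFor(Class<?> left, Class<?> right) {
    if (!left.equals(right) || !Comparable.class.isAssignableFrom(left)) {
      return Optional.empty();
    }
    Comparator<Object> natural = (a, b) -> ((Comparable<Object>) a).compareTo(b);
    return Optional.of(natural);
  }
}

// A comparison expression that reuses the shared helper instead of its own match.
final class GreaterThanOrEqualSketch {
  private final Class<?> leftType;
  private final Class<?> rightType;

  GreaterThanOrEqualSketch(Class<?> leftType, Class<?> rightType) {
    this.leftType = leftType;
    this.rightType = rightType;
  }

  boolean resolved() {
    return OrderingHelpers.orderingFor(leftType, rightType).isPresent();
  }

  boolean eval(Object left, Object right) {
    Comparator<Object> ordering = OrderingHelpers.orderingFor(leftType, rightType)
        .orElseThrow(() -> new IllegalStateException(
            "Types do not match or are not orderable: " + leftType + ", " + rightType));
    return ordering.compare(left, right) >= 0;
  }
}
```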
[GitHub] spark pull request: [SPARK-6954] [YARN] Dynamic allocation: numExe...
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/5536#issuecomment-94597105 Ok, let me know if you have any other comments.
[GitHub] spark pull request: [SPARK-6955] Perform port retries at NettyBloc...
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/5575#discussion_r28740515 --- Diff: core/src/main/scala/org/apache/spark/deploy/worker/StandaloneWorkerShuffleService.scala --- @@ -35,7 +35,10 @@ private[worker] class StandaloneWorkerShuffleService(sparkConf: SparkConf, securityManager: SecurityManager) extends Logging { - private val enabled = sparkConf.getBoolean("spark.shuffle.service.enabled", false) + // Check both if shuffle service is enabled, and that the worker should actually host the + // shuffle service in that case. (The latter is currently only used for testing.) + private val enabled = sparkConf.getBoolean("spark.shuffle.service.enabled", false) && +sparkConf.getBoolean("spark.worker.shouldHostShuffleServiceIfEnabled", true) --- End diff -- FYI this change will conflict with https://github.com/apache/spark/pull/4990, which makes this a general class that Mesos can also use. If you follow my suggestion of just disabling this for local cluster then this doesn't need to change.
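The combined check in the diff can be sketched in Java with a plain `Map` standing in for `SparkConf`. The `getBoolean` helper is a simplified assumption: it uses `Boolean.parseBoolean`, which reads unrecognized values as `false`. The class name is hypothetical. Only the two keys and their defaults come from the diff.

```java
import java.util.Map;

// Hypothetical sketch; a Map stands in for SparkConf.
final class ShuffleServiceFlags {
  private ShuffleServiceFlags() {}

  static boolean getBoolean(Map<String, String> conf, String key, boolean defaultValue) {
    String value = conf.get(key);
    return value == null ? defaultValue : Boolean.parseBoolean(value.trim());
  }

  /** The worker hosts the service only if it is enabled and hosting is not switched off. */
  static boolean shouldHostShuffleService(Map<String, String> conf) {
    return getBoolean(conf, "spark.shuffle.service.enabled", false)
        && getBoolean(conf, "spark.worker.shouldHostShuffleServiceIfEnabled", true);
  }
}
```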
[GitHub] spark pull request: [SPARK-5932][CORE] Use consistent naming for s...
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/5574#discussion_r28741752 --- Diff: core/src/main/scala/org/apache/spark/SparkConf.scala --- @@ -407,7 +474,22 @@ private[spark] object SparkConf extends Logging { "The spark.cache.class property is no longer being used! Specify storage levels using " + "the RDD.persist() method instead."), DeprecatedConfig("spark.yarn.user.classpath.first", "1.3", -"Please use spark.{driver,executor}.userClassPathFirst instead.")) +"Please use spark.{driver,executor}.userClassPathFirst instead."), + DeprecatedConfig("spark.reducer.maxMbInFlight", "1.4", +"Please use spark.reducer.maxSizeInFlight instead."), + DeprecatedConfig("spark.kryoserializer.buffer.mb", "1.4", +"Please use spark.kryoserializer.buffer instead."), + DeprecatedConfig("spark.kryoserializer.buffer.max.mb", "1.4", +"Please use spark.kryoserializer.buffer.max instead."), + DeprecatedConfig("spark.shuffle.file.buffer.kb", "1.4", +"Please use spark.shuffle.file.buffer instead."), + DeprecatedConfig("spark.executor.logs.rolling.size.maxBytes", "1.4", +"Please use spark.executor.logs.rolling.maxSize instead."), + DeprecatedConfig("spark.io.compression.snappy.block.size", "1.4", +"Please use spark.io.compression.snappy.blockSize instead."), + DeprecatedConfig("spark.io.compression.lz4.block.size", "1.4", +"Please use spark.io.compression.lz4.blockSize instead.")) --- End diff -- You don't need to put these both here and in `configsWithAlternative`. Even if you just put it there we will still print a warning.
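Here is a hypothetical Java sketch of the mechanism the comment relies on. When a key is looked up and only a deprecated alternative is set, the lookup returns the old value and records a deprecation warning itself. A separate deprecated-configs list is therefore not needed just to get the warning. The class, the warning text and the table contents (two key pairs taken from the review diff) are illustrative. Values are returned verbatim, with no unit conversion.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of a lookup with deprecated alternatives; not SparkConf.
final class AlternateConfigLookup {
  // New key -> deprecated alternatives (two pairs taken from the review diff).
  static final Map<String, List<String>> CONFIGS_WITH_ALTERNATIVE = Map.of(
      "spark.reducer.maxSizeInFlight", List.of("spark.reducer.maxMbInFlight"),
      "spark.kryoserializer.buffer", List.of("spark.kryoserializer.buffer.mb"));

  final List<String> warnings = new ArrayList<>();

  Optional<String> get(Map<String, String> conf, String key) {
    String direct = conf.get(key);
    if (direct != null) {
      return Optional.of(direct);
    }
    for (String alt : CONFIGS_WITH_ALTERNATIVE.getOrDefault(key, List.of())) {
      String value = conf.get(alt);
      if (value != null) {
        // The warning is produced by the alternatives table itself, so no
        // second "deprecated configs" list is needed to trigger it.
        warnings.add("'" + alt + "' is deprecated; please use '" + key + "' instead.");
        return Optional.of(value); // returned verbatim, without unit conversion
      }
    }
    return Optional.empty();
  }
}
```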
[GitHub] spark pull request: [SPARK-7022][PySpark][ML] Add ML.Tuning.ParamG...
Github user ash211 commented on the pull request: https://github.com/apache/spark/pull/5601#issuecomment-94604123 Jenkins this is ok to test
[GitHub] spark pull request: [SPARK-6418] Add simple per-stage visualizatio...
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/5547#discussion_r28743323 --- Diff: core/src/main/resources/org/apache/spark/ui/static/jobs-graph.js --- @@ -0,0 +1,118 @@ +function renderJobsGraphs(data) { + /* show visualization toggle */ --- End diff -- Can you use 2 spaces for indents throughout all JavaScript files, instead of tab characters?
[GitHub] spark pull request: [SPARK-3468][WebUI] Timeline-View feature
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/2342#discussion_r28726587 --- Diff: core/src/main/scala/org/apache/spark/ui/jobs/AllJobsPage.scala --- @@ -17,18 +17,137 @@ package org.apache.spark.ui.jobs -import scala.xml.{Node, NodeSeq} +import scala.xml.{Node, NodeSeq, Unparsed} +import java.util.Date import javax.servlet.http.HttpServletRequest -import org.apache.spark.ui.{WebUIPage, UIUtils} +import org.apache.spark.ui.{UIUtils, WebUIPage} import org.apache.spark.ui.jobs.UIData.JobUIData +import org.apache.spark.JobExecutionStatus /** Page showing list of all ongoing and recently finished jobs */ private[ui] class AllJobsPage(parent: JobsTab) extends WebUIPage() { private val startTime: Option[Long] = parent.sc.map(_.startTime) private val listener = parent.listener + private def applicationTimelineView(jobs: Seq[JobUIData], now: Long): Seq[Node] = { +val jobEventJsonAsStrSeq = jobs.flatMap { jobUIData => + val jobId = jobUIData.jobId + val status = jobUIData.status + val submissionTimeOpt = jobUIData.submissionTime + val completionTimeOpt = jobUIData.completionTime + + if (status == JobExecutionStatus.UNKNOWN || submissionTimeOpt.isEmpty || +completionTimeOpt.isEmpty && status != JobExecutionStatus.RUNNING) { +None + } + + val submissionTime = submissionTimeOpt.get + val completionTime = completionTimeOpt.getOrElse(now) + val classNameByStatus = status match { +case JobExecutionStatus.SUCCEEDED => "succeeded" +case JobExecutionStatus.FAILED => "failed" +case JobExecutionStatus.RUNNING => "running" + } + + val jobEventJsonAsStr = +s""" + |{ + | 'className': 'job application-timeline-object ${classNameByStatus}', --- End diff -- it might be nice to factor these out into a utility function: ``` def generateTimelineEventJSON( className: String, group: String, start: Date, end: Option[Date], title: String) ```
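Here is a Java sketch of the suggested utility. It assumes the same single-quoted JavaScript object-literal output as the diff; note that this is a JS literal with `new Date(...)`, not strict JSON. The sketch follows the proposed parameters and adds a `content` parameter, which the suggested signature leaves out but the object literal sets. String values are escaped for single-quoted JS strings. Omitting `end` when it is empty is a design choice of this sketch; the diff instead falls back to the current time.

```java
import java.util.Date;
import java.util.Optional;

// Hypothetical helper; output mirrors the JS object literal built in the diff.
final class TimelineEvents {
  private TimelineEvents() {}

  /** Wraps s in single quotes, escaping characters that would break a JS string. */
  static String jsQuote(String s) {
    StringBuilder sb = new StringBuilder("'");
    for (char c : s.toCharArray()) {
      switch (c) {
        case '\\': sb.append("\\\\"); break;
        case '\'': sb.append("\\'"); break;
        case '\n': sb.append("\\n"); break;
        default: sb.append(c);
      }
    }
    return sb.append('\'').toString();
  }

  static String generateTimelineEventJson(String className, String group, Date start,
      Optional<Date> end, String title, String content) {
    StringBuilder sb = new StringBuilder("{");
    sb.append("'className': ").append(jsQuote(className)).append(", ");
    sb.append("'group': ").append(jsQuote(group)).append(", ");
    sb.append("'start': new Date(").append(start.getTime()).append("), ");
    // The 'end' field is omitted entirely when no end date is known.
    end.ifPresent(e -> sb.append("'end': new Date(").append(e.getTime()).append("), "));
    sb.append("'content': ").append(jsQuote(content)).append(", ");
    sb.append("'title': ").append(jsQuote(title));
    return sb.append('}').toString();
  }
}
```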
[GitHub] spark pull request: [doc][streaming] Fixed broken link in mllib se...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5600#issuecomment-94562377 Can one of the admins verify this patch?
[GitHub] spark pull request: [doc][streaming] Fixed broken link in mllib se...
Github user BenFradet commented on the pull request: https://github.com/apache/spark/pull/5600#issuecomment-94562538 I did run it locally, for information.
[GitHub] spark pull request: [doc][streaming] Fixed broken link in mllib se...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/5600#issuecomment-94564233 Merged into master and branch-1.3. Thanks!
[GitHub] spark pull request: [SPARK-3468][WebUI] Timeline-View feature
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/2342#discussion_r28727537
--- Diff: core/src/main/scala/org/apache/spark/ui/jobs/AllJobsPage.scala ---
@@ -17,18 +17,137 @@ package org.apache.spark.ui.jobs
-import scala.xml.{Node, NodeSeq}
+import scala.xml.{Node, NodeSeq, Unparsed}
+import java.util.Date
 import javax.servlet.http.HttpServletRequest
-import org.apache.spark.ui.{WebUIPage, UIUtils}
+import org.apache.spark.ui.{UIUtils, WebUIPage}
 import org.apache.spark.ui.jobs.UIData.JobUIData
+import org.apache.spark.JobExecutionStatus
 /** Page showing list of all ongoing and recently finished jobs */
 private[ui] class AllJobsPage(parent: JobsTab) extends WebUIPage("") {
   private val startTime: Option[Long] = parent.sc.map(_.startTime)
   private val listener = parent.listener
+  private def applicationTimelineView(jobs: Seq[JobUIData], now: Long): Seq[Node] = {
+    val jobEventJsonAsStrSeq = jobs.flatMap { jobUIData =>
+      val jobId = jobUIData.jobId
+      val status = jobUIData.status
+      val submissionTimeOpt = jobUIData.submissionTime
+      val completionTimeOpt = jobUIData.completionTime
+
+      if (status == JobExecutionStatus.UNKNOWN || submissionTimeOpt.isEmpty ||
+        completionTimeOpt.isEmpty && status != JobExecutionStatus.RUNNING) {
+        None
+      }
+
+      val submissionTime = submissionTimeOpt.get
+      val completionTime = completionTimeOpt.getOrElse(now)
+      val classNameByStatus = status match {
+        case JobExecutionStatus.SUCCEEDED => "succeeded"
+        case JobExecutionStatus.FAILED => "failed"
+        case JobExecutionStatus.RUNNING => "running"
+      }
+
+      val jobEventJsonAsStr =
+        s"""
+           |{
+           |  'className': 'job application-timeline-object ${classNameByStatus}',
+           |  'group': 'jobs',
+           |  'start': new Date(${submissionTime}),
+           |  'end': new Date(${completionTime}),
+           |  'content': '<div class="application-timeline-content">Job ${jobId}</div>',
+           |  'title': 'Job ${jobId}\\nStatus: ${status}\\n' +
--- End diff --
Can this use the job description (then say the job ID in parentheses)? Also, can there be a link that takes you back to the job when you click on it? If you need to add anchor tags for each entry in the jobs table, that's fine.
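A minimal Java sketch of the title format requested above, under stated assumptions: the helper names (`timelineTitle`, `jobAnchorHref`) and the `#job-<id>` anchor format are hypothetical, not Spark code. Because the quoted diff embeds the title in a single-quoted JavaScript string, the sketch also escapes backslashes and single quotes in the description.

```java
import java.util.Optional;

class TimelineTitle {
    /** Escapes a value for use inside a single-quoted JavaScript string literal. */
    static String escapeJs(String s) {
        // Backslashes first, so the escapes added for quotes are not doubled again.
        return s.replace("\\", "\\\\").replace("'", "\\'");
    }

    /** Prefers the job description, with the job ID in parentheses; falls back to "Job <id>". */
    static String timelineTitle(int jobId, Optional<String> description) {
        return description
            .filter(d -> !d.isEmpty())
            .map(d -> escapeJs(d) + " (Job " + jobId + ")")
            .orElse("Job " + jobId);
    }

    /** Hypothetical anchor a timeline entry could link to in the jobs table. */
    static String jobAnchorHref(int jobId) {
        return "#job-" + jobId;
    }
}
```

For example, `timelineTitle(3, Optional.of("count at App.scala:12"))` returns `count at App.scala:12 (Job 3)`, while a job without a description falls back to a title such as `Job 4`.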
[GitHub] spark pull request: [doc][streaming] Fixed broken link in mllib se...
Github user BenFradet commented on the pull request: https://github.com/apache/spark/pull/5600#issuecomment-94564318
Glad I could help!
[GitHub] spark pull request: [doc][streaming] Fixed broken link in mllib se...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/5600
[GitHub] spark pull request: [SPARK-6985][streaming] Receiver maxRate over ...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/5559#issuecomment-94575083
Jenkins, add to whitelist
[GitHub] spark pull request: [SPARK-6985][streaming] Receiver maxRate over ...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/5559#issuecomment-94575102
Jenkins, retest this please
[GitHub] spark pull request: [SPARK-5337][Mesos][Standalone] respect spark....
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4129#issuecomment-94577204
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30603/
Test PASSed.
[GitHub] spark pull request: [SPARK-5337][Mesos][Standalone] respect spark....
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4129#issuecomment-94577188
[Test build #30603 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30603/consoleFull) for PR 4129 at commit [`c10f980`](https://github.com/apache/spark/commit/c10f980f437d326cab031b6927927c848cdec4cc).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `public class MapConfigProvider extends ConfigProvider`
* This patch does not change any dependencies.
[GitHub] spark pull request: [SPARK-3454] [WIP] separate json endpoints for...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/4435#discussion_r28732740
--- Diff: core/pom.xml ---
@@ -220,6 +220,21 @@
       <version>3.2.10</version>
     </dependency>
     <dependency>
+      <groupId>com.fasterxml.jackson.module</groupId>
+      <artifactId>jackson-module-scala_2.10</artifactId>
+      <version>2.3.1</version>
+    </dependency>
+    <dependency>
+      <groupId>com.sun.jersey</groupId>
+      <artifactId>jersey-server</artifactId>
+      <version>1.9</version>
+    </dependency>
+    <dependency>
+      <groupId>org.glassfish.jersey.media</groupId>
+      <artifactId>jersey-media-json-jackson</artifactId>
--- End diff --
In fact, is this needed at all? It seems to be a plugin for Jersey 2.x, which is not what you're using.
[GitHub] spark pull request: [SPARK-6065] [MLlib] Optimize word2vec.findSyn...
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/5467#issuecomment-94582162
It just occurred to me that we're converting from Float to Double. I'm not sure why Word2Vec historically used Float, but I'm now worried about switching, since it will double model sizes. (I'm sorry I didn't think about this earlier!) This PR should still be doable, but you would need to store an Array[Float] instead of the Matrix type. You would also need to use ```mllib.linalg.BLAS.nativeBLAS``` to make the BLAS calls. What do you think?
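To put the size concern in numbers: a vocabulary of 100,000 words with 100-dimensional vectors holds 10 million values, about 40 MB as `float` versus 80 MB as `double`. The Java sketch below assumes a flat, row-major `float[]` in place of a `Matrix` and ranks words by cosine similarity. The class and method names are hypothetical, and the plain inner loop only stands in for the single-precision BLAS call (for example `sgemv` through `BLAS.nativeBLAS`) that the comment suggests.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class FloatWordVectors {
    private final String[] words;
    private final float[] vectors; // row-major: word i occupies vectors[i * dim .. (i + 1) * dim)
    private final int dim;
    private final float[] norms;   // L2 norm of each word vector, precomputed once

    FloatWordVectors(String[] words, float[] vectors, int dim) {
        if (vectors.length != words.length * dim) {
            throw new IllegalArgumentException("vectors.length must equal words.length * dim");
        }
        this.words = words;
        this.vectors = vectors;
        this.dim = dim;
        this.norms = new float[words.length];
        for (int i = 0; i < words.length; i++) {
            double sq = 0.0;
            for (int j = 0; j < dim; j++) {
                float v = vectors[i * dim + j];
                sq += (double) v * v;
            }
            norms[i] = (float) Math.sqrt(sq);
        }
    }

    /** Returns up to `num` words ranked by cosine similarity to `query`, highest first. */
    List<String> findSynonyms(float[] query, int num) {
        double qSq = 0.0;
        for (float q : query) {
            qSq += (double) q * q;
        }
        double qNorm = Math.sqrt(qSq);
        double[] sims = new double[words.length];
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < words.length; i++) {
            // Plain loop standing in for one single-precision BLAS call over all rows.
            double dot = 0.0;
            for (int j = 0; j < dim; j++) {
                dot += (double) vectors[i * dim + j] * query[j];
            }
            double denom = norms[i] * qNorm;
            sims[i] = denom == 0.0 ? 0.0 : dot / denom; // zero vectors get similarity 0
            order.add(i);
        }
        order.sort(Comparator.comparingDouble((Integer i) -> sims[i]).reversed());
        List<String> result = new ArrayList<>();
        for (int k = 0; k < Math.min(num, order.size()); k++) {
            result.add(words[order.get(k)]);
        }
        return result;
    }
}
```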
[GitHub] spark pull request: [SPARK-3468][WebUI] Timeline-View feature
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/2342#issuecomment-94583906
Hi @sarutak - I did a pretty thorough pass on this and I have some feedback. First, some high-level points:
- The overall architecture. I'm +1 on using this timeline library; it seems to have a pretty nice interface and simplifies a lot. This also largely uses things we already have inside existing listeners, which is good.
- Using this vs. the approach in #5547. I think a good answer here is to use this vis.js library for the jobs page and then use a custom D3-based approach for the stage page, where we need to be careful about scalability to thousands of events (e.g. thousands of tasks). With that in mind, I'd propose removing the stage functionality from this patch for now and keeping only the other pages.
- What does it look like if you just display all stages instead of doing a job-level view? I wonder if that would be better. Then we could give stages in the same job the same color.
There are also some lower-level notes inline about code cleanliness, etc.
- It would be good to have a visual line indicating the start of the application.
- It would be nice if you could use ctrl+scroll to zoom, so that we could remove the scroll lock. Is this possible with the library?
- It would be nice if I could mouse over a job and have it highlight the corresponding job in the table below.
Ping @andrewor14 as well, in case I missed anything.
[GitHub] spark pull request: [SPARK-5932][CORE] Use consistent naming for s...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/5574#discussion_r28738733
--- Diff: network/common/src/main/java/org/apache/spark/network/util/JavaUtils.java ---
@@ -137,6 +137,16 @@ private static boolean isSymlink(File file) throws IOException {
       .put("d", TimeUnit.DAYS)
       .build();
+  private static ImmutableMap<String, ByteUnit> byteSuffixes =
--- End diff --
final
[GitHub] spark pull request: [SPARK-5932][CORE] Use consistent naming for s...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/5574#discussion_r28738683
--- Diff: network/common/src/main/java/org/apache/spark/network/util/JavaUtils.java ---
@@ -186,5 +196,80 @@ public static long timeStringAsMs(String str) {
   public static long timeStringAsSec(String str) {
     return parseTimeString(str, TimeUnit.SECONDS);
   }
+
+  /**
+   * Convert a passed byte string (e.g. 50b, 100kb, or 250mb) to a ByteUnit for
+   * internal use. If no suffix is provided a direct conversion of the provided default is
+   * attempted.
+   */
+  private static long parseByteString(String str, ByteUnit unit) {
+    String lower = str.toLowerCase().trim();
+    try {
+      String suffix;
+      long val;
+      Matcher m = Pattern.compile("([0-9]+)([a-z]+)?").matcher(lower);
+      if (m.matches()) {
+        val = Long.parseLong(m.group(1));
+        suffix = m.group(2);
+      } else {
+        throw new NumberFormatException("Failed to parse byte string: " + str);
+      }
+
+      // Check for invalid suffixes
+      if (suffix != null && !byteSuffixes.containsKey(suffix)) {
+        throw new NumberFormatException("Invalid suffix: \"" + suffix + "\"");
+      }
+
+      // If suffix is valid use that, otherwise none was provided and use the default passed
+      return new Double(
--- End diff --
Why `new Double`? Not only does it look unnecessary, but you're also reducing the range of exactly representable results (double = 53-bit significand vs. long = 64 bits).
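A minimal Java sketch of the same parsing done entirely in `long` arithmetic, which sidesteps the precision issue raised above: a `double` has a 53-bit significand, so 2^53 + 1 = 9007199254740993 cannot survive a round trip through `double`. The `ByteUnit` enum, its power-of-two multipliers, and the suffix table are hypothetical stand-ins for the PR's types; the table is `static final`, as the earlier review comment on `byteSuffixes` asks.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ByteStrings {
    /** Hypothetical byte units; each constant holds its size in bytes. */
    enum ByteUnit {
        BYTE(1L), KB(1L << 10), MB(1L << 20), GB(1L << 30), TB(1L << 40), PB(1L << 50);

        final long bytes;

        ByteUnit(long bytes) { this.bytes = bytes; }

        /** Converts `value` expressed in `from` units into this unit using integer math only. */
        long convertFrom(long value, ByteUnit from) {
            if (from.bytes >= bytes) {
                // Scaling up: multiplyExact throws ArithmeticException instead of wrapping around.
                return Math.multiplyExact(value, from.bytes / bytes);
            }
            return value / (bytes / from.bytes); // scaling down truncates toward zero
        }
    }

    private static final Map<String, ByteUnit> BYTE_SUFFIXES;
    static {
        Map<String, ByteUnit> m = new HashMap<>();
        m.put("b", ByteUnit.BYTE);
        m.put("k", ByteUnit.KB); m.put("kb", ByteUnit.KB);
        m.put("m", ByteUnit.MB); m.put("mb", ByteUnit.MB);
        m.put("g", ByteUnit.GB); m.put("gb", ByteUnit.GB);
        m.put("t", ByteUnit.TB); m.put("tb", ByteUnit.TB);
        m.put("p", ByteUnit.PB); m.put("pb", ByteUnit.PB);
        BYTE_SUFFIXES = Collections.unmodifiableMap(m);
    }

    private static final Pattern BYTE_STRING = Pattern.compile("([0-9]+)([a-z]+)?");

    /** Parses strings like "50b", "100kb" or "250" (no suffix means `unit`) into `unit`. */
    static long parseByteString(String str, ByteUnit unit) {
        String lower = str.toLowerCase(Locale.ROOT).trim();
        Matcher m = BYTE_STRING.matcher(lower);
        if (!m.matches()) {
            throw new NumberFormatException("Failed to parse byte string: " + str);
        }
        long val = Long.parseLong(m.group(1));
        String suffix = m.group(2);
        if (suffix != null && !BYTE_SUFFIXES.containsKey(suffix)) {
            throw new NumberFormatException("Invalid suffix: \"" + suffix + "\"");
        }
        ByteUnit from = suffix == null ? unit : BYTE_SUFFIXES.get(suffix);
        return unit.convertFrom(val, from);
    }
}
```

Scaling up uses `Math.multiplyExact`, so an overflow raises `ArithmeticException` instead of returning a rounded or wrapped value; scaling down truncates, so `parseByteString("500kb", ByteUnit.MB)` yields 0.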
[GitHub] spark pull request: [SPARK-6869][PySpark] Add pyspark archives pat...
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/5478#issuecomment-94594191
@Sephiroth-Lin would you mind closing this PR then?
[GitHub] spark pull request: [SPARK-6955] Perform port retries at NettyBloc...
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/5575#discussion_r28740458
--- Diff: core/src/test/scala/org/apache/spark/ExternalShuffleServiceSuite.scala ---
@@ -43,6 +43,9 @@ class ExternalShuffleServiceSuite extends ShuffleSuite with BeforeAndAfterAll {
     conf.set("spark.shuffle.manager", "sort")
     conf.set("spark.shuffle.service.enabled", "true")
     conf.set("spark.shuffle.service.port", server.getPort.toString)
+
+    // local-cluster mode starts a Worker which would start its own shuffle service without this:
+    conf.set("spark.worker.shouldHostShuffleServiceIfEnabled", "false")
--- End diff --
Do we actually ever want to start an external shuffle service in a local cluster? If not, I think it makes more sense to just set `spark.shuffle.service.enabled` to false in `LocalSparkCluster` (we already do this for the REST submission server for the Master).
[GitHub] spark pull request: [BUILD] Support building with SBT on encrypted...
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/5546#issuecomment-94599321
You are right that it will affect how the class is found, but if I understand correctly, we are shortening generated classes that will only be referenced internally. For example, a class that is created for a closure inside of a function, such as `StreamingJobProgressListener$$anonfun$lastReceivedBatchRecords$1$$anonfun$apply$5$$anonfun$apply$6.class`.
I think building on encrypted file systems is becoming more common. Most corporate laptops I have owned required it, and it was the default when I installed Ubuntu on my desktop.
Regarding doing it for Maven, we can. I was going for the minimal change that would make my dev workflow easier.
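For context on the failure mode: encrypted home directories can cap file name length (eCryptfs with encrypted file names is commonly cited at about 143 characters), and deeply nested closures produce class file names that exceed it. The Java sketch below is a hypothetical illustration of shortening such names deterministically by keeping a prefix and appending an MD5 digest of the full name. It is not the algorithm scalac or this PR uses, and the length limit is a parameter rather than an assumed constant.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class ClassFileNames {
    /**
     * Shortens `name` (without the ".class" extension) so that name + ".class" fits in
     * `maxFileNameLength` characters. Names that already fit are returned unchanged; longer
     * names keep a prefix and end with "$$" plus the hex MD5 of the full name, so the result
     * is deterministic and names sharing a long prefix still map to different files.
     */
    static String shorten(String name, int maxFileNameLength) {
        final String ext = ".class";
        if (name.length() + ext.length() <= maxFileNameLength) {
            return name;
        }
        String digest = md5Hex(name); // 32 hex characters
        String sep = "$$";
        int keep = maxFileNameLength - ext.length() - sep.length() - digest.length();
        if (keep < 1) {
            throw new IllegalArgumentException("maxFileNameLength too small: " + maxFileNameLength);
        }
        return name.substring(0, keep) + sep + digest;
    }

    private static String md5Hex(String s) {
        try {
            byte[] d = MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder(d.length * 2);
            for (byte b : d) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is required on every Java platform", e);
        }
    }
}
```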
[GitHub] spark pull request: [SQL][minor] make it more clear that we only n...
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/5588#issuecomment-94605715
ok to test
[GitHub] spark pull request: [SPARK-3468][WebUI] Timeline-View feature
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/2342#discussion_r28726698
--- Diff: core/src/main/scala/org/apache/spark/ui/jobs/AllJobsPage.scala ---
@@ -17,18 +17,137 @@ package org.apache.spark.ui.jobs
-import scala.xml.{Node, NodeSeq}
+import scala.xml.{Node, NodeSeq, Unparsed}
+import java.util.Date
 import javax.servlet.http.HttpServletRequest
-import org.apache.spark.ui.{WebUIPage, UIUtils}
+import org.apache.spark.ui.{UIUtils, WebUIPage}
 import org.apache.spark.ui.jobs.UIData.JobUIData
+import org.apache.spark.JobExecutionStatus
 /** Page showing list of all ongoing and recently finished jobs */
 private[ui] class AllJobsPage(parent: JobsTab) extends WebUIPage("") {
   private val startTime: Option[Long] = parent.sc.map(_.startTime)
   private val listener = parent.listener
+  private def applicationTimelineView(jobs: Seq[JobUIData], now: Long): Seq[Node] = {
+    val jobEventJsonAsStrSeq = jobs.flatMap { jobUIData =>
+      val jobId = jobUIData.jobId
+      val status = jobUIData.status
+      val submissionTimeOpt = jobUIData.submissionTime
+      val completionTimeOpt = jobUIData.completionTime
+
+      if (status == JobExecutionStatus.UNKNOWN || submissionTimeOpt.isEmpty ||
+        completionTimeOpt.isEmpty && status != JobExecutionStatus.RUNNING) {
+        None
+      }
+
+      val submissionTime = submissionTimeOpt.get
+      val completionTime = completionTimeOpt.getOrElse(now)
+      val classNameByStatus = status match {
+        case JobExecutionStatus.SUCCEEDED => "succeeded"
+        case JobExecutionStatus.FAILED => "failed"
+        case JobExecutionStatus.RUNNING => "running"
+      }
+
+      val jobEventJsonAsStr =
+        s"""
+           |{
+           |  'className': 'job application-timeline-object ${classNameByStatus}',
--- End diff --
Actually, on second thought, maybe it's more readable as-is.
[GitHub] spark pull request: [SPARK-3468][WebUI] Timeline-View feature
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/2342#discussion_r28727987
--- Diff: core/src/main/scala/org/apache/spark/ui/jobs/AllJobsPage.scala ---
@@ -17,18 +17,137 @@ package org.apache.spark.ui.jobs
-import scala.xml.{Node, NodeSeq}
+import scala.xml.{Node, NodeSeq, Unparsed}
+import java.util.Date
 import javax.servlet.http.HttpServletRequest
-import org.apache.spark.ui.{WebUIPage, UIUtils}
+import org.apache.spark.ui.{UIUtils, WebUIPage}
 import org.apache.spark.ui.jobs.UIData.JobUIData
+import org.apache.spark.JobExecutionStatus
 /** Page showing list of all ongoing and recently finished jobs */
 private[ui] class AllJobsPage(parent: JobsTab) extends WebUIPage("") {
   private val startTime: Option[Long] = parent.sc.map(_.startTime)
   private val listener = parent.listener
+  private def applicationTimelineView(jobs: Seq[JobUIData], now: Long): Seq[Node] = {
+    val jobEventJsonAsStrSeq = jobs.flatMap { jobUIData =>
+      val jobId = jobUIData.jobId
+      val status = jobUIData.status
+      val submissionTimeOpt = jobUIData.submissionTime
+      val completionTimeOpt = jobUIData.completionTime
+
+      if (status == JobExecutionStatus.UNKNOWN || submissionTimeOpt.isEmpty ||
+        completionTimeOpt.isEmpty && status != JobExecutionStatus.RUNNING) {
+        None
+      }
+
+      val submissionTime = submissionTimeOpt.get
+      val completionTime = completionTimeOpt.getOrElse(now)
+      val classNameByStatus = status match {
+        case JobExecutionStatus.SUCCEEDED => "succeeded"
+        case JobExecutionStatus.FAILED => "failed"
+        case JobExecutionStatus.RUNNING => "running"
+      }
+
+      val jobEventJsonAsStr =
+        s"""
+           |{
+           |  'className': 'job application-timeline-object ${classNameByStatus}',
+           |  'group': 'jobs',
+           |  'start': new Date(${submissionTime}),
+           |  'end': new Date(${completionTime}),
+           |  'content': '<div class="application-timeline-content">Job ${jobId}</div>',
+           |  'title': 'Job ${jobId}\\nStatus: ${status}\\n' +
+           |    'Submission Time: ${UIUtils.formatDate(new Date(submissionTime))}' +
+           |    '${
+                  if (status != JobExecutionStatus.RUNNING) {
+                    s"\\nCompletion Time: ${UIUtils.formatDate(new Date(completionTime))}"
+                  } else {
+                    ""
+                  }
+                }'
+           |}
+         """.stripMargin
+      Option(jobEventJsonAsStr)
--- End diff --
Or just wrap it in a `Seq` if you are expecting it to be a sequence.
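One detail worth noting in the quoted diff: in Scala, an `if` without an `else` used as a statement discards its value, so the bare `None` does not skip the job, and execution continues to `submissionTimeOpt.get`, which throws when the submission time is missing. A minimal Java sketch of the intended filter-then-emit shape, where a skipped job explicitly returns an empty stream from `flatMap`; the `Job` class and `Status` enum are hypothetical stand-ins for `JobUIData` and `JobExecutionStatus`.

```java
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class JobTimeline {
    enum Status { RUNNING, SUCCEEDED, FAILED, UNKNOWN }

    static final class Job {
        final int id;
        final Status status;
        final Optional<Long> submissionTime;
        final Optional<Long> completionTime;

        Job(int id, Status status, Optional<Long> submissionTime, Optional<Long> completionTime) {
            this.id = id;
            this.status = status;
            this.submissionTime = submissionTime;
            this.completionTime = completionTime;
        }
    }

    /** Emits one event string per drawable job; jobs that cannot be drawn yield no element. */
    static List<String> jobEvents(List<Job> jobs, long now) {
        return jobs.stream().flatMap(job -> {
            boolean skip = job.status == Status.UNKNOWN
                || !job.submissionTime.isPresent()
                || (!job.completionTime.isPresent() && job.status != Status.RUNNING);
            if (skip) {
                return Stream.<String>empty(); // explicit return: this job contributes nothing
            }
            long start = job.submissionTime.get();     // safe: presence checked above
            long end = job.completionTime.orElse(now); // running jobs extend to "now"
            return Stream.of("Job " + job.id + " [" + start + ", " + end + "] " + job.status);
        }).collect(Collectors.toList());
    }
}
```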
[GitHub] spark pull request: [SPARK-5342][YARN] Allow long running Spark ap...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/4688#discussion_r28728796
--- Diff: yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtilSuite.scala ---
@@ -27,7 +29,7 @@ import org.scalatest.{FunSuite, Matchers}
 import org.apache.hadoop.yarn.api.records.ApplicationAccessType
-import org.apache.spark.{Logging, SecurityManager, SparkConf}
+import org.apache.spark.{SparkException, Logging, SecurityManager, SparkConf}
--- End diff --
nit: order
[GitHub] spark pull request: [SPARK-3468][WebUI] Timeline-View feature
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/2342#discussion_r28729612
--- Diff: core/src/main/scala/org/apache/spark/ui/jobs/AllJobsPage.scala ---
@@ -17,18 +17,137 @@ package org.apache.spark.ui.jobs
-import scala.xml.{Node, NodeSeq}
+import scala.xml.{Node, NodeSeq, Unparsed}
+import java.util.Date
 import javax.servlet.http.HttpServletRequest
-import org.apache.spark.ui.{WebUIPage, UIUtils}
+import org.apache.spark.ui.{UIUtils, WebUIPage}
 import org.apache.spark.ui.jobs.UIData.JobUIData
+import org.apache.spark.JobExecutionStatus
 /** Page showing list of all ongoing and recently finished jobs */
 private[ui] class AllJobsPage(parent: JobsTab) extends WebUIPage("") {
   private val startTime: Option[Long] = parent.sc.map(_.startTime)
   private val listener = parent.listener
+  private def applicationTimelineView(jobs: Seq[JobUIData], now: Long): Seq[Node] = {
--- End diff --
Could you do a few things to make this cleaner and easier to follow?
1. Remove the `val listener` field of the `AllJobsPage` class, and instead just access `parent.listener` directly in the `render` function.
2. Refactor the timeline generation code to be more modular. I'd recommend the following functions:
```
// These can be constants at the top level
val EXECUTORS_LEGEND
val JOBS_LEGEND
def makeJobEvent(job: JobUIData): Seq[Node]
// TODO: Consider pushing this into JobProgressListener
case class ExecutorUIData(startTime: Long, status: String, finishTime: Option[Long], finishReason: Option[String])
def makeExecutorEvent(executor: ExecutorUIData): Seq[Node]
def makeTimeline(jobs: Seq[JobUIData], executors: Seq[ExecutorUIData]): Seq[Node]
```
Then in the `render` function you can access the listener and create all the necessary inputs to those functions.
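A rough Java sketch of the suggested decomposition, assuming vis.js-style items rendered as JavaScript object literal strings (the signatures above return `Seq[Node]` instead). `JobSpan`, `ExecutorUIData`, and the legend strings are hypothetical simplifications; the point is that each helper renders one kind of event and `makeTimeline` only assembles them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

class TimelineSketch {
    // Top-level constants, as the review suggests (hypothetical vis.js group definitions).
    static final String EXECUTORS_LEGEND = "{'id': 'executors', 'content': 'Executors'}";
    static final String JOBS_LEGEND = "{'id': 'jobs', 'content': 'Jobs'}";

    /** Hypothetical stand-in for JobUIData: only what the timeline needs. */
    static final class JobSpan {
        final int id; final long start; final long end;
        JobSpan(int id, long start, long end) { this.id = id; this.start = start; this.end = end; }
    }

    /** Hypothetical executor record, loosely mirroring the proposed case class. */
    static final class ExecutorUIData {
        final String id; final long startTime; final Optional<Long> finishTime;
        ExecutorUIData(String id, long startTime, Optional<Long> finishTime) {
            this.id = id; this.startTime = startTime; this.finishTime = finishTime;
        }
    }

    static String makeJobEvent(JobSpan job) {
        return "{'group': 'jobs', 'content': 'Job " + job.id + "', 'start': new Date(" + job.start
            + "), 'end': new Date(" + job.end + ")}";
    }

    /** One point event when the executor was added, plus one when it was removed, if it was. */
    static List<String> makeExecutorEvents(ExecutorUIData e) {
        List<String> events = new ArrayList<>();
        events.add(pointEvent("Executor " + e.id + " added", e.startTime));
        e.finishTime.ifPresent(t -> events.add(pointEvent("Executor " + e.id + " removed", t)));
        return events;
    }

    private static String pointEvent(String content, long time) {
        return "{'group': 'executors', 'content': '" + content + "', 'start': new Date(" + time + ")}";
    }

    /** Assembles groups and items; all rendering details live in the helpers above. */
    static String makeTimeline(List<JobSpan> jobs, List<ExecutorUIData> executors) {
        List<String> items = new ArrayList<>();
        for (ExecutorUIData e : executors) {
            items.addAll(makeExecutorEvents(e));
        }
        for (JobSpan j : jobs) {
            items.add(makeJobEvent(j));
        }
        return "var groups = [" + EXECUTORS_LEGEND + ", " + JOBS_LEGEND + "];\n"
            + "var items = [" + String.join(", ", items) + "];";
    }
}
```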
[GitHub] spark pull request: [SPARK-3454] [WIP] separate json endpoints for...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/4435#discussion_r28731264
--- Diff: core/src/main/java/org/apache/spark/status/api/EnumUtil.java ---
@@ -0,0 +1,38 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.status.api;
+
+import com.google.common.base.Joiner;
+
+import java.util.Arrays;
+
+public class EnumUtil {
+  public static <E extends Enum<E>> E parseIgnoreCase(Class<E> clz, String str) {
+    E[] constants = clz.getEnumConstants();
+    if (str == null) {
+      return null;
+    }
+    for (E e : constants) {
+      if (e.name().equalsIgnoreCase(str))
--- End diff --
nit: if { ... }
[GitHub] spark pull request: [SPARK-3454] [WIP] separate json endpoints for...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/4435#discussion_r28731238
--- Diff: core/src/main/java/org/apache/spark/status/api/EnumUtil.java ---
@@ -0,0 +1,38 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.status.api;
+
+import com.google.common.base.Joiner;
+
+import java.util.Arrays;
+
+public class EnumUtil {
--- End diff --
Also... if this is supposed to be a helper class, maybe it should be in a different package. If it's supposed to be used by this code only, maybe it could be package-private?
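Combining the review notes on `EnumUtil` (braces on the `if`, package-private visibility, and joining the constants directly instead of through `Arrays.asList`), a minimal sketch might look like this. It uses only the JDK (`Arrays.stream` with `Collectors.joining`) to stay self-contained; with Guava, `Joiner.on(", ").join(constants)` accepts the array directly.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

// Package-private: intended only for callers in the same package.
final class EnumUtil {
    private EnumUtil() {}

    /** Returns the constant of `clz` whose name matches `str` ignoring case, or null if `str` is null. */
    static <E extends Enum<E>> E parseIgnoreCase(Class<E> clz, String str) {
        if (str == null) {
            return null;
        }
        E[] constants = clz.getEnumConstants();
        for (E e : constants) {
            if (e.name().equalsIgnoreCase(str)) {
                return e;
            }
        }
        String supported = Arrays.stream(constants)
            .map(c -> c.name())
            .collect(Collectors.joining(", "));
        throw new IllegalArgumentException(
            String.format("Illegal type='%s'. Supported type values: %s", str, supported));
    }
}
```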
[GitHub] spark pull request: SPARK-1537 [WiP] Application Timeline Server i...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5423#issuecomment-94573736
[Test build #30605 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30605/consoleFull) for PR 5423 at commit [`0f0f66c`](https://github.com/apache/spark/commit/0f0f66c810a0ea3c647ddf70e73c2a79e932d2f7).
[GitHub] spark pull request: [SPARK-6985][streaming] Receiver maxRate over ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5559#issuecomment-94575454
[Test build #30606 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30606/consoleFull) for PR 5559 at commit [`d29d2e0`](https://github.com/apache/spark/commit/d29d2e060fe48e8a3f1e506bf2bf2cc13d99d751).
[GitHub] spark pull request: [SPARK-3454] [WIP] separate json endpoints for...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/4435#discussion_r28732087
--- Diff: core/src/main/scala/org/apache/spark/deploy/history/HistoryServer.scala ---
@@ -149,7 +159,13 @@ class HistoryServer(
    *
    * @return List of all known applications.
    */
-  def getApplicationList(): Iterable[ApplicationHistoryInfo] = provider.getListing()
+  def getApplicationList(): Iterable[ApplicationHistoryInfo] = {
+    provider.getListing()
+  }
+
+  def getApplicationInfoList: Seq[ApplicationInfo] = {
+    getApplicationList().map { ApplicationsListResource.appHistoryInfoToPublicAppInfo }.toSeq
--- End diff --
Is this transformation lazy? Otherwise it can cause memory usage to shoot up considerably when this method is called (which can happen from multiple concurrent user requests).
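Whether `map` is lazy here depends on the concrete collection behind the `Iterable`: on strict collections such as `List` or `ArrayBuffer` it builds the entire result immediately, while a view or an iterator defers the work. A minimal Java sketch of the lazy alternative, using a hypothetical `lazyMap` helper that converts each element only while it is being iterated, so concurrent requests do not each hold a fully converted copy up front.

```java
import java.util.Iterator;
import java.util.function.Function;

class LazyIterables {
    /** Returns a view of `source` that applies `f` to each element only when it is iterated. */
    static <A, B> Iterable<B> lazyMap(Iterable<A> source, Function<? super A, ? extends B> f) {
        // Iterable has a single abstract method, so a lambda can supply a fresh iterator per traversal.
        return () -> new Iterator<B>() {
            private final Iterator<A> it = source.iterator();

            @Override
            public boolean hasNext() {
                return it.hasNext();
            }

            @Override
            public B next() {
                return f.apply(it.next()); // conversion happens here, one element at a time
            }
        };
    }
}
```

The trade-off is that every traversal recomputes the conversion, since nothing is cached.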
[GitHub] spark pull request: [SPARK-7000] [ml] Refactor prediction and tree...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/5585#issuecomment-94580875
`ml.tree` and `ml.ensemble` look good. If we want to distinguish decision trees from tree elements used in hierarchical clustering, we can put them under separate packages, e.g., `ml.tree` and `ml.clustering.hierarchical`. It is not necessary to create common base classes if the subclasses are not expected to be called in a generic way. What do we want to put under `ml.prediction` besides `Predictor`?
[GitHub] spark pull request: [SPARK-7000] [ml] Refactor prediction and tree...
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/5585#issuecomment-94582904 I'm not sure what else would go under `ml.prediction`. I have, however, started to wonder whether evaluation metrics should sit under the relevant subpackage (to make it easier for users to match evaluators with models), in which case there might be an evaluation abstraction under `ml.prediction`.
[GitHub] spark pull request: [SPARK-3468][WebUI] Timeline-View feature
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/2342#discussion_r28735427 --- Diff: core/src/main/scala/org/apache/spark/ui/jobs/JobProgressListener.scala --- @@ -73,8 +73,15 @@ class JobProgressListener(conf: SparkConf) extends SparkListener with Logging { var numCompletedStages = 0 var numFailedStages = 0 + // Executors: + val addedExecutors = TreeSet[(Long, ExecutorId)]() --- End diff -- Rather than modifying the JobProgressListener, can you use the existing ExecutorsListener? Feel free to modify it if you need to.
[GitHub] spark pull request: SPARK-1537 [WiP] Application Timeline Server i...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5423#issuecomment-94593519 [Test build #30605 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30605/consoleFull) for PR 5423 at commit [`0f0f66c`](https://github.com/apache/spark/commit/0f0f66c810a0ea3c647ddf70e73c2a79e932d2f7). * This patch **passes all tests**. * This patch **does not merge cleanly**. * This patch adds the following public classes _(experimental)_: * `case class TimestampEvent(sparkEvent: SparkListenerEvent, time: Long, flush: Boolean = false) ` * `class YarnHistoryProvider(sparkConf: SparkConf)` * `trait YarnService extends Closeable ` * This patch **adds the following new dependencies:** * `commons-math3-3.1.1.jar` * `snappy-java-1.1.1.6.jar` * This patch **removes the following dependencies:** * `commons-math3-3.4.1.jar` * `snappy-java-1.1.1.7.jar`
[GitHub] spark pull request: SPARK-1537 [WiP] Application Timeline Server i...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5423#issuecomment-94593553 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30605/ Test PASSed.
[GitHub] spark pull request: [SPARK-6969][SQL] Refresh the cached table whe...
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/5583#discussion_r28740186 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/ddl.scala --- @@ -347,7 +347,23 @@ private[sql] case class RefreshTable(databaseName: String, tableName: String) extends RunnableCommand { override def run(sqlContext: SQLContext): Seq[Row] = { +// Refresh the given table's metadata first. sqlContext.catalog.refreshTable(databaseName, tableName) + +// If this table is cached as a InMemoryColumnarRelation, drop the original +// cached version and make the new version cached lazily. +val logicalPlan = sqlContext.catalog.lookupRelation(Seq(databaseName, tableName)) --- End diff -- Leave a TODO here to clean this up when uncacheTable supports databases correctly.
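The diff comments describe a two-step refresh. First the metadata is reloaded. Then, if the table was cached, the stale cached copy is dropped and the table is re-registered so that it is rebuilt lazily on next access. Below is a minimal Java sketch of that pattern using a hypothetical `TableCache` keyed by table name. It is not Spark's `CacheManager` or catalog API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical cache illustrating "uncache, then re-cache lazily" on refresh. */
final class TableCache<V> {
    /** A cached entry whose value is computed on first access only. */
    private static final class LazyEntry<T> {
        private final Supplier<T> loader;
        private T value;
        private boolean loaded;

        LazyEntry(Supplier<T> loader) {
            this.loader = loader;
        }

        T get() {
            if (!loaded) {
                value = loader.get();
                loaded = true;
            }
            return value;
        }

        boolean isLoaded() {
            return loaded;
        }
    }

    private final Map<String, LazyEntry<V>> entries = new HashMap<>();

    void cacheLazily(String table, Supplier<V> loader) {
        entries.put(table, new LazyEntry<>(loader));
    }

    boolean isCached(String table) {
        return entries.containsKey(table);
    }

    boolean isMaterialized(String table) {
        LazyEntry<V> e = entries.get(table);
        return e != null && e.isLoaded();
    }

    V get(String table) {
        LazyEntry<V> e = entries.get(table);
        return e == null ? null : e.get();
    }

    /** Only tables that were already cached are dropped and re-registered lazily. */
    void refresh(String table, Supplier<V> freshLoader) {
        if (entries.remove(table) != null) {
            cacheLazily(table, freshLoader);
        }
    }
}
```

Refreshing a table that was never cached leaves it uncached. This mirrors the "if this table is cached" condition in the diff comment.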
[GitHub] spark pull request: [SPARK-6479][Block Manager]Create off-heap blo...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5430#issuecomment-94598787 [Test build #30614 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30614/consoleFull) for PR 5430 at commit [`3ea0689`](https://github.com/apache/spark/commit/3ea068940d16ba0fe9576073080cbcc3d1a220d3).
[GitHub] spark pull request: [SPARK-5932][CORE] Use consistent naming for s...
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/5574#discussion_r28741874 --- Diff: core/src/main/scala/org/apache/spark/serializer/KryoSerializer.scala --- @@ -49,16 +49,16 @@ class KryoSerializer(conf: SparkConf) with Logging with Serializable { - private val bufferSizeMb = conf.getDouble("spark.kryoserializer.buffer.mb", 0.064) + private val bufferSizeMb = conf.getSizeAsKb("spark.kryoserializer.buffer", "64k").toDouble/1000.0d --- End diff -- Should this be 1024?
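The question comes down to simple arithmetic. Dividing 64 KB by 1000 reproduces the old literal default `0.064`. Dividing by 1024 gives 0.0625. The Java sketch below compares the two divisors. It adds one assumption that is not in the diff: that the MB value is later converted back to bytes using binary units (1024 × 1024). Under that assumption, only the 1024 divisor round-trips 64 KB to exactly 65,536 bytes.

```java
/** Illustrates the two candidate KB-to-MB divisors from the review question. */
final class SizeUnits {
    /** KB to MB using binary units (1 MB = 1024 KB). */
    static double kbToMbBinary(long kb) {
        return kb / 1024.0;
    }

    /** KB to MB using decimal units (1 MB = 1000 KB), as in the diff. */
    static double kbToMbDecimal(long kb) {
        return kb / 1000.0;
    }

    /** Assumed downstream conversion: MB back to bytes with binary units. */
    static long mbToBytesBinary(double mb) {
        return (long) (mb * 1024 * 1024);
    }
}
```

With these helpers, `mbToBytesBinary(kbToMbBinary(64))` is 65536, while `mbToBytesBinary(kbToMbDecimal(64))` truncates to 67108.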
[GitHub] spark pull request: [SPARK-5932][CORE] Use consistent naming for s...
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/5574#discussion_r28741842 --- Diff: core/src/main/scala/org/apache/spark/serializer/KryoSerializer.scala --- @@ -49,16 +49,16 @@ class KryoSerializer(conf: SparkConf) with Logging with Serializable { - private val bufferSizeMb = conf.getDouble("spark.kryoserializer.buffer.mb", 0.064) + private val bufferSizeMb = conf.getSizeAsKb("spark.kryoserializer.buffer", "64k").toDouble/1000.0d --- End diff -- Add spaces around `/`.
[GitHub] spark pull request: [SPARK-6014] [core] Revamp Spark shutdown hook...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5560#issuecomment-94562189 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30598/ Test PASSed.
[GitHub] spark pull request: [SPARK-3454] [WIP] separate json endpoints for...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/4435#discussion_r28731684 --- Diff: core/src/main/java/org/apache/spark/status/api/v1/TaskSorting.java --- @@ -0,0 +1,45 @@ +package org.apache.spark.status.api.v1;/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +import org.apache.spark.status.api.EnumUtil; + +import java.util.HashSet; +import java.util.Set; + +public enum TaskSorting { + ID, + IncreasingRuntime("runtime"), + DecreasingRuntime("-runtime"); + + final Set<String> alternateNames; + TaskSorting(String... names) { +alternateNames = new HashSet<String>(); +for (String n: names) { + alternateNames.add(n); +} + } + + public static TaskSorting fromString(String str) { +for (TaskSorting t: values()) { + if (t.alternateNames.contains(str.toLowerCase())) { --- End diff -- nit: you could hoist `str.toLowerCase()` out of the loop.
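Below is a sketch of the suggested nit. The enum is renamed `TaskSortingSketch` so it is not mistaken for the PR's class, and it lower-cases the input once, before the loop. It makes two assumptions that go beyond the diff. First, it uses `Locale.ROOT` for locale-independent lower-casing. Second, each constant also accepts its own lower-cased name; the diff is cut off before it shows how `ID` is matched.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

enum TaskSortingSketch {
    ID,
    IncreasingRuntime("runtime"),
    DecreasingRuntime("-runtime");

    private final Set<String> alternateNames;

    TaskSortingSketch(String... names) {
        alternateNames = new HashSet<>(Arrays.asList(names));
        // Assumption: every constant also matches its own name, lower-cased.
        alternateNames.add(name().toLowerCase(Locale.ROOT));
    }

    static TaskSortingSketch fromString(String str) {
        // Hoisted out of the loop: the input is lower-cased exactly once.
        String key = str.toLowerCase(Locale.ROOT);
        for (TaskSortingSketch t : values()) {
            if (t.alternateNames.contains(key)) {
                return t;
            }
        }
        throw new IllegalArgumentException("Unknown task sorting: " + str);
    }
}
```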
[GitHub] spark pull request: SPARK-1537 [WiP] Application Timeline Server i...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5423#issuecomment-94578270 [Test build #30607 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30607/consoleFull) for PR 5423 at commit [`1256532`](https://github.com/apache/spark/commit/1256532b65c27196afb5a40ea33151f8d269a05e).
[GitHub] spark pull request: SPARK-1537 [WiP] Application Timeline Server i...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5423#issuecomment-94583088 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30607/ Test FAILed.
[GitHub] spark pull request: [SPARK-6479][Block Manager]Create off-heap blo...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5430#issuecomment-94587069 [Test build #30609 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30609/consoleFull) for PR 5430 at commit [`d1c9921`](https://github.com/apache/spark/commit/d1c9921422de76e9f1d74d4450dd9dd889f22fb0). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. * This patch does not change any dependencies.
[GitHub] spark pull request: [SPARK-6479][Block Manager]Create off-heap blo...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5430#issuecomment-94587080 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30609/ Test FAILed.
[GitHub] spark pull request: [SPARK-5337][Mesos][Standalone] respect spark....
Github user CodingCat commented on the pull request: https://github.com/apache/spark/pull/4129#issuecomment-94589740 @andrewor14, I updated the patch. How does the current version look?
[GitHub] spark pull request: [SQL] SPARK-6981: Factor out SparkPlanner and ...
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/5556#issuecomment-94592384 ok to test
[GitHub] spark pull request: [SQL] There are three tests of sql are failed ...
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/5552#issuecomment-94592413 ok to test
[GitHub] spark pull request: [SPARK-6994] Allow to fetch field values by na...
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/5573#issuecomment-94592348 ok to test
[GitHub] spark pull request: [SPARK-6996][SQL] Support map types in java be...
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/5578#discussion_r28743714 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala --- @@ -1260,18 +1273,41 @@ class SQLContext(@transient val sparkContext: SparkContext) case c: Class[_] if c == classOf[java.sql.Date] => (DateType, true) case c: Class[_] if c == classOf[java.sql.Timestamp] => (TimestampType, true) - case c: Class[_] if c.isArray => -val (dataType, nullable) = inferDataType(c.getComponentType) + case _ if typeToken.isArray => +val (dataType, nullable) = inferDataType(typeToken.getComponentType) +(ArrayType(dataType, nullable), true) + + case _ if iterableType.isAssignableFrom(typeToken) => { --- End diff -- Spark style avoids `{ }` for `case` statements.
[GitHub] spark pull request: [SPARK-6418] Add simple per-stage visualizatio...
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/5547#discussion_r28743664 --- Diff: core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala --- @@ -234,6 +235,8 @@ private[ui] class StagePage(parent: StagesTab) extends WebUIPage("stage") { val deserializationTimes = validTasks.map { case TaskUIData(_, metrics, _) => metrics.get.executorDeserializeTime.toDouble } + graphData("Task Deserialization Time") = deserializationTimes.mkString(",") --- End diff -- Rather than creating your own string representation of a list, can you do the necessary conversion to pass proper JSON lists here?
[GitHub] spark pull request: [SPARK-6418] Add simple per-stage visualizatio...
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/5547#discussion_r28743686 --- Diff: core/src/main/resources/org/apache/spark/ui/static/jobs-graph.js --- @@ -0,0 +1,118 @@ +function renderJobsGraphs(data) { + /* show visualization toggle */ + $(".expand-visualization-arrow").toggleClass('arrow-closed'); + $(".expand-visualization-arrow").toggleClass('arrow-open'); + if ($(".expand-visualization-arrow").hasClass("arrow-closed")) { + $("#chartContainer").empty(); + return; + } + + /* no data to graph */ + if (!Object.keys(data).length) { + return; + } + + /* format data to a form readable by dimple.js */ + var tableData = []; + for (var k in data) { + var arr = (data[k]).split(","); --- End diff -- If you pass each of the inputs as lists, I think you won't need to do this.
[GitHub] spark pull request: Avoid warning message about invalid refuse_sec...
Github user MartinWeindel commented on the pull request: https://github.com/apache/spark/pull/5597#issuecomment-94562061 The value of 5 seconds is the Mesos default, which is used if the parameter is not set or an invalid value is given. So, at least with current versions of Mesos, nothing changes in the behavior. The parameter refuse_seconds configures how long Mesos should wait before offering resources again after the framework (here, the Spark scheduler backend) has refused them. If you set it to 0, Mesos will offer these resources again immediately with the next allocation (by default, after 1 second). This will cause slightly higher traffic between the scheduler backend and the Mesos master. Alternatively, this parameter could be made configurable in Spark, but I am not sure it is really worth the effort. In coarse-grained mode, resources are allocated at the start. Are there any circumstances other than a lost executor where refused resources will be used? On 20.04.2015 at 19:03, Sean Owen wrote: Sounds reasonable, since the value is reported to be invalid. The intent seemed to be to set this to unset or something. 5 seems to do something different, as it sets it to a concrete value. Knowing nothing about this, is there maybe a closer equivalent value like 0? Or is it really best to set this to a fixed value? Reply to this email directly or view it on GitHub: https://github.com/apache/spark/pull/5597#issuecomment-94510001.
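To make the refuse_seconds trade-off concrete, below is a deliberately simplified Java model of when declined resources come back. It is an assumption for illustration, not Mesos allocator code. In the model, declined resources are filtered for `refuseSec`, then re-offered at the first allocation tick that is both strictly after the decline and no earlier than the filter's expiry. The 1-second tick comes from the comment. The model ignores other frameworks, allocation ordering and every other allocator detail. `RefuseModel` and its methods are hypothetical names.

```java
/** Simplified, hypothetical model of re-offer timing after a decline (integer seconds). */
final class RefuseModel {
    /** First allocation tick strictly after the decline and at or after the filter expires. */
    static long nextOfferTime(long declineTimeSec, long refuseSec, long intervalSec) {
        // Next tick strictly after the decline.
        long nextTick = (declineTimeSec / intervalSec + 1) * intervalSec;
        // First tick at or after the refuse filter expires.
        long eligibleAt = declineTimeSec + refuseSec;
        long eligibleTick = ((eligibleAt + intervalSec - 1) / intervalSec) * intervalSec;
        return Math.max(nextTick, eligibleTick);
    }

    /** Offers seen in [0, windowSec] if the framework declines every offer, starting at t = 0. */
    static long offersInWindow(long windowSec, long refuseSec, long intervalSec) {
        long t = 0;
        long count = 0;
        while (true) {
            long next = nextOfferTime(t, refuseSec, intervalSec);
            if (next > windowSec) {
                return count;
            }
            count++;
            t = next;
        }
    }
}
```

Under these assumptions, a framework that keeps declining sees 60 offers per minute with `refuse_seconds = 0` and 12 with the default of 5. This is the extra scheduler-to-master traffic the comment describes.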
[GitHub] spark pull request: [SPARK-5342][YARN] Allow long running Spark ap...
Github user harishreedharan commented on a diff in the pull request: https://github.com/apache/spark/pull/4688#discussion_r28728653 --- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala --- @@ -122,7 +126,7 @@ class CoarseGrainedSchedulerBackend(scheduler: TaskSchedulerImpl, val rpcEnv: Rp } else { logInfo("Registered executor: " + executorRef + " with ID " + executorId) context.reply(RegisteredExecutor) - + latestTokens.foreach(x => context.reply(NewTokens(x))) --- End diff -- I actually tried doing that, but it failed a huge number of tests, so I thought it was perhaps not worth the risk. With the new RPC layer, I was not sure whether calling `send` from inside this method would break something somewhere. Really, yes, it is `send` we want to call here.
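The discussion hinges on the difference between answering an ask (`reply`) and pushing a one-way message (`send`). The sketch below is a generic illustration only. `CallContext` is hypothetical and is not Spark's `RpcCallContext` or `RpcEndpointRef`. It models a reply as completing a `CompletableFuture`, which shows why a second reply cannot deliver anything: `complete` returns `false` once the future is already done. One-way sends, by contrast, can be issued any number of times.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

/** Hypothetical request context: at most one reply per ask, plus a one-way channel. */
final class CallContext {
    final CompletableFuture<Object> response = new CompletableFuture<>();
    final List<Object> oneWayOutbox = new ArrayList<>();

    /** Answers the ask; returns false if a reply was already delivered. */
    boolean reply(Object msg) {
        return response.complete(msg);
    }

    /** Fire-and-forget message; may be called any number of times. */
    void send(Object msg) {
        oneWayOutbox.add(msg);
    }
}
```

In this model, the diff's pattern of one `reply(RegisteredExecutor)` followed by a `reply(NewTokens(...))` per token would lose every token message. Sending the tokens one-way after the reply would deliver all of them.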
[GitHub] spark pull request: [SPARK-5360] [SPARK-6606] Eliminate duplicate ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4145#issuecomment-94581077 [Test build #30604 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30604/consoleFull) for PR 4145 at commit [`85156c3`](https://github.com/apache/spark/commit/85156c33fe37901ad4059bc5b227b6cc99645c9e). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes. * This patch does not change any dependencies.
[GitHub] spark pull request: [SPARK-5360] [SPARK-6606] Eliminate duplicate ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4145#issuecomment-94581090 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30604/ Test PASSed.
[GitHub] spark pull request: [SPARK-6479][Block Manager]Create off-heap blo...
Github user zhzhan commented on the pull request: https://github.com/apache/spark/pull/5430#issuecomment-94583699 @rxin Could you help review the patch and let me know if you have any concerns?
[GitHub] spark pull request: [SPARK-6418] Add simple per-stage visualizatio...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5547#issuecomment-94585899 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30610/ Test FAILed.
[GitHub] spark pull request: [SPARK-6418] Add simple per-stage visualizatio...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5547#issuecomment-94585898 [Test build #30610 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30610/consoleFull) for PR 5547 at commit [`7fac1eb`](https://github.com/apache/spark/commit/7fac1eb96c61cb23e020aa55c306f1b698e4196b). * This patch **fails RAT tests**. * This patch merges cleanly. * This patch adds no public classes. * This patch does not change any dependencies.
[GitHub] spark pull request: [SPARK-6418] Add simple per-stage visualizatio...
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/5547#issuecomment-94585961 @pwendell and I talked about this and #2342 a little bit offline. Our feeling is that this is a more elegant representation of task times than #2342, especially when there are many tasks within a stage. One concern I have, however, is what happens when you zoom (does it currently support zooming?). It would make little sense to zoom without keeping the axes, but my impression is that implementing this is pretty hard since we're directly using d3. Bonus: it doesn't have to be part of this patch, but it would be really cool if there were a mode where we can align the breakdown of the task times along the vertical axis. Right now you can't really compare the serialization time of the first task with that of the last task, let alone track whether it has grown incrementally over time. Realistically, we will implement this separately, say for 1.5, but I imagine this bonus feature is going to be immensely useful.
[GitHub] spark pull request: [SPARK-3376] Add in-memory shuffle option.
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/5403#issuecomment-94588226 Since it's a pretty simple implementation, I'd be fine if it were merged in. But I think we should say clearly that it can be useful for benchmarking, etc., but isn't meant to be used in a production setting since it's not robust to OOM. /cc @rxin for his thoughts also.
[GitHub] spark pull request: [SPARK-6985][streaming] Receiver maxRate over ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5559#issuecomment-94590862 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30606/ Test PASSed.
[GitHub] spark pull request: [SPARK-6985][streaming] Receiver maxRate over ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5559#issuecomment-94590849 [Test build #30606 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30606/consoleFull) for PR 5559 at commit [`d29d2e0`](https://github.com/apache/spark/commit/d29d2e060fe48e8a3f1e506bf2bf2cc13d99d751). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes. * This patch does not change any dependencies.
[GitHub] spark pull request: [ML][SPARK-6529] Add Word2Vec transformer
Github user oefirouz commented on a diff in the pull request: https://github.com/apache/spark/pull/5596#discussion_r28739862 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala --- @@ -0,0 +1,238 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.ml.feature + +import org.apache.spark.annotation.AlphaComponent +import org.apache.spark.ml.{Estimator, Model} +import org.apache.spark.ml.param.{HasInputCol, ParamMap, Params, _} +import org.apache.spark.mllib.feature +import org.apache.spark.mllib.linalg.{Vector, VectorUDT} +import org.apache.spark.sql.{DataFrame, Row} +import org.apache.spark.sql.functions._ +import org.apache.spark.sql.types._ +import org.apache.spark.util.Utils + +/** + * Params for [[Word2Vec]] and [[Word2VecModel]]. + */ +private[feature] trait Word2VecParams extends Params + with HasInputCol with HasMaxIter with HasLearningRate { + + /** + * The dimension of the code that you want to transform from words. + */ + val vectorSize = new IntParam( +this, "vectorSize", "the dimension of codes after transforming from words", Some(100)) + + /** @group getParam */ + def getVectorSize: Int = get(vectorSize) + + /** + * Number of partitions for sentences of words.
+ */ + val numPartitions = new IntParam( +this, "numPartitions", "number of partitions for sentences of words", Some(1)) + + /** @group getParam */ + def getNumPartitions: Int = get(numPartitions) + + /** + * A random seed to random an initial vector. + */ + val seed = new LongParam( +this, "seed", "a random seed to random an initial vector", Some(Utils.random.nextLong())) + + /** @group getParam */ + def getSeed: Long = get(seed) + + /** + * The minimum count of words that can be kept in training set. --- End diff -- This wording is unclear; perhaps it would be easier to copy the comments from the implementation? For example: "The minimum number of times a token must appear to be included in the word2vec model's vocabulary" (https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala).
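To make the suggested wording concrete, here is a minimal Java sketch of what a minimum-count vocabulary filter does: it counts every occurrence of each token across all sentences and keeps only tokens seen at least `minCount` times. `VocabFilter` and `buildVocab` are hypothetical names used for illustration; this is not the MLlib implementation.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical illustration of a minimum-count vocabulary filter (not MLlib's Word2Vec code).
final class VocabFilter {
    private VocabFilter() {}

    /** Returns, in sorted order, every token that appears at least minCount times. */
    static SortedSet<String> buildVocab(List<List<String>> sentences, int minCount) {
        Map<String, Integer> counts = new HashMap<>();
        for (List<String> sentence : sentences) {
            for (String token : sentence) {
                counts.merge(token, 1, Integer::sum); // counts occurrences, not sentences
            }
        }
        SortedSet<String> vocab = new TreeSet<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() >= minCount) {
                vocab.add(entry.getKey());
            }
        }
        return vocab;
    }

    public static void main(String[] args) {
        List<List<String>> sentences = Arrays.asList(
            Arrays.asList("spark", "is", "fast"),
            Arrays.asList("spark", "is", "general"));
        System.out.println(buildVocab(sentences, 2)); // prints [is, spark]
    }
}
```

With `minCount = 2`, "fast" and "general" each appear once and are dropped from the vocabulary.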
[GitHub] spark pull request: [SPARK-6869][PySpark] Add pyspark archives pat...
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/5580#issuecomment-94594756 @lianhuiwang Two high-level comments. First, why not just put it on `--py-files`? If that works as expected, it will automatically add it to the executors' `PYTHONPATH` without you having to do it manually as you have done in `Client.scala`. Second, this still seems to require some action on the user's part: they must manually zip up all the Python archives themselves and put them on the `PYSPARK_ARCHIVES_PATH`. I would propose that we do this automatically for the user behind the scenes using Java's `ZipEntry` APIs.
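A minimal sketch of the automatic zipping step proposed above, using `java.util.zip.ZipOutputStream` and `ZipEntry`. It assumes `sourceDir` is the directory that contains the `pyspark` package, so entries come out as `pyspark/...` and the resulting archive can be put directly on `PYTHONPATH`. `PyArchiveZipper` and `zipDirectory` are hypothetical names, not Spark code.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper: zip a Python source tree so it can be shipped like a --py-files archive.
final class PyArchiveZipper {
    private PyArchiveZipper() {}

    /** Writes every regular file under sourceDir into zipFile, with paths relative to sourceDir. */
    static void zipDirectory(Path sourceDir, Path zipFile) throws IOException {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(sourceDir)) {
            files = walk.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
        try (ZipOutputStream zos = new ZipOutputStream(Files.newOutputStream(zipFile))) {
            for (Path file : files) {
                // Zip entry names always use '/' regardless of the host OS separator.
                String entryName = sourceDir.relativize(file).toString().replace(File.separatorChar, '/');
                zos.putNextEntry(new ZipEntry(entryName));
                Files.copy(file, zos);
                zos.closeEntry();
            }
        }
    }
}
```

Sorting the walked paths keeps the entry order deterministic, which makes the generated archive easier to compare across runs.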
[GitHub] spark pull request: [SPARK-5932][CORE] Use consistent naming for s...
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/5574#discussion_r28742073 --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala --- @@ -1037,21 +1037,52 @@ private[spark] object Utils extends Logging { } /** - * Convert a Java memory parameter passed to -Xmx (such as 300m or 1g) to a number of megabytes. + * Convert a passed byte string (e.g. 50b, 100k, or 250m) to bytes for + * internal use. + * + * If no suffix is provided, the passed number is assumed to be in bytes. + */ + def byteStringAsBytes(str: String): Long = { +JavaUtils.byteStringAsBytes(str) + } + + /** + * Convert a passed byte string (e.g. 50b, 100k, or 250m) to kibibytes for + * internal use. --- End diff -- nit: this could totally fit on the previous line
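For readers unfamiliar with the byte-string format in this diff, here is a hypothetical Java sketch of the parsing the doc comment describes: a number with an optional suffix, where a bare number means bytes. It assumes binary (1024-based) multipliers, consistent with the "kibibytes" wording, and only handles suffixes up to gigabytes. `ByteStrings` is an illustrative name; this is not the actual `JavaUtils` code.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of byte-string parsing; not Spark's JavaUtils implementation.
final class ByteStrings {
    private ByteStrings() {}

    // A non-negative integer, optional whitespace, then an optional alphabetic suffix.
    private static final Pattern FORMAT = Pattern.compile("(\\d+)\\s*([a-z]+)?");
    private static final Map<String, Long> MULTIPLIERS = new HashMap<>();

    static {
        // Binary (1024-based) units, matching the "kibibytes" wording in the doc comment.
        MULTIPLIERS.put("b", 1L);
        MULTIPLIERS.put("k", 1L << 10);
        MULTIPLIERS.put("kb", 1L << 10);
        MULTIPLIERS.put("m", 1L << 20);
        MULTIPLIERS.put("mb", 1L << 20);
        MULTIPLIERS.put("g", 1L << 30);
        MULTIPLIERS.put("gb", 1L << 30);
    }

    /** Parses strings such as "50b", "100k" or "250m"; a bare number is taken as bytes. */
    static long byteStringAsBytes(String str) {
        Matcher m = FORMAT.matcher(str.trim().toLowerCase(Locale.ROOT));
        if (!m.matches()) {
            throw new IllegalArgumentException("Invalid byte string: " + str);
        }
        long value = Long.parseLong(m.group(1));
        String suffix = m.group(2);
        if (suffix == null) {
            return value;
        }
        Long multiplier = MULTIPLIERS.get(suffix);
        if (multiplier == null) {
            throw new IllegalArgumentException("Unknown size suffix in: " + str);
        }
        return Math.multiplyExact(value, multiplier); // fail loudly on overflow
    }
}
```

Under these assumptions, `"250m"` parses to 250 × 2^20 = 262,144,000 bytes.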
[GitHub] spark pull request: [Project Infra] SPARK-1684: Merge script shoul...
Github user texasmichelle commented on the pull request: https://github.com/apache/spark/pull/5149#issuecomment-94603924 Prompt now disappears if the title is not modified - good call, @pwendell. Is everyone happy with the rules here? I uploaded a doc to the JIRA that shows what the before and after would be for existing PR titles (spark_pulls_before_after.txt).
[GitHub] spark pull request: [SPARK-6994] Allow to fetch field values by na...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5573#issuecomment-94607134 [Test build #30611 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30611/consoleFull) for PR 5573 at commit [`6145ae3`](https://github.com/apache/spark/commit/6145ae3fb30251e0affdda2358f2c95d1407d3aa). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes. * This patch does not change any dependencies.
[GitHub] spark pull request: [SPARK-6994] Allow to fetch field values by na...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5573#issuecomment-94607143 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30611/ Test PASSed.
[GitHub] spark pull request: [SPARK-6418] Add simple per-stage visualizatio...
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/5547#discussion_r28744020 --- Diff: core/src/main/resources/org/apache/spark/ui/static/jobs-graph.js --- @@ -0,0 +1,118 @@ +function renderJobsGraphs(data) { + /* show visualization toggle */ + $(".expand-visualization-arrow").toggleClass('arrow-closed'); + $(".expand-visualization-arrow").toggleClass('arrow-open'); + if ($(".expand-visualization-arrow").hasClass("arrow-closed")) { + $("#chartContainer").empty(); + return; + } + + /* no data to graph */ + if (!Object.keys(data).length) { + return; + } + + /* format data to a form readable by dimple.js */ + var tableData = []; + for (var k in data) { + var arr = (data[k]).split(","); + data[k] = arr; + } + var startTime = getMin(data["launchtime"]); + var numTasks = Math.min(1000, data[k].length); + + /*data update */ + data["launchtime"] = data["launchtime"].map(function (launchTime) {return launchTime-startTime;}); + var maxTime = 0; + for (i = 0; i < numTasks; i++) { + var time = 0; + for (var key in data) { + time += parseFloat(data[key][i]); --- End diff -- this might be pretty slow when there are thousands of tasks - if so, sending proper double types in JSON would be faster.
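A hedged sketch of the alternative suggested above: serialize per-task values on the server as a JSON array of numbers, so the browser's `JSON.parse` yields numbers directly instead of comma-joined strings that the page must `split` and `parseFloat`. `JsonNumbers.toJsonArray` is a hypothetical helper; a real change would more likely reuse whatever JSON library the UI code already depends on.

```java
import java.util.StringJoiner;

// Hypothetical helper: emit task metrics as JSON numbers instead of a comma-joined string.
final class JsonNumbers {
    private JsonNumbers() {}

    /** Renders values as a JSON array such as [1.5,2.0], which JSON.parse turns straight into numbers. */
    static String toJsonArray(double[] values) {
        StringJoiner joiner = new StringJoiner(",", "[", "]");
        for (double v : values) {
            if (Double.isNaN(v) || Double.isInfinite(v)) {
                // JSON has no literal for NaN or Infinity.
                throw new IllegalArgumentException("Not representable in JSON: " + v);
            }
            joiner.add(Double.toString(v));
        }
        return joiner.toString();
    }
}
```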
[GitHub] spark pull request: [SPARK-1406] Mllib pmml model export
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/3062#issuecomment-94561695 @selvinsource I sent you a PR at https://github.com/selvinsource/spark/pull/1 to update the code style.
[GitHub] spark pull request: [doc][streaming] Fixed broken link in mllib se...
GitHub user BenFradet opened a pull request: https://github.com/apache/spark/pull/5600 [doc][streaming] Fixed broken link in mllib section The commit message is pretty self-explanatory. You can merge this pull request into a Git repository by running: $ git pull https://github.com/BenFradet/spark master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/5600.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #5600 commit 108492dfa78a2b922f1acd2fb76a8b6c35158c93 Author: BenFradet benjamin.fra...@gmail.com Date: 2015-04-20T20:32:21Z [doc][streaming] Fixed broken link in mllib section
[GitHub] spark pull request: [SPARK-5342][YARN] Allow long running Spark ap...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/4688#discussion_r28728339 --- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala --- @@ -122,7 +126,7 @@ class CoarseGrainedSchedulerBackend(scheduler: TaskSchedulerImpl, val rpcEnv: Rp } else { logInfo("Registered executor: " + executorRef + " with ID " + executorId) context.reply(RegisteredExecutor) - + latestTokens.foreach(x => context.reply(NewTokens(x))) --- End diff -- So this is weird. You just replied in the line above, so replying here doesn't make sense. Perhaps you mean `send`? Also, why not stash the tokens in the `RegisteredExecutor` message? One less message to send!
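One way to read the suggestion above is to have the registration reply itself carry the latest tokens, so the driver replies exactly once per request. The Java sketch below assumes the tokens are already serialized to a `byte[]`; `ReplyContext`, `RegisteredExecutorReply` and `RegistrationHandler` are hypothetical stand-ins, not Spark's RPC classes.

```java
import java.util.Optional;

// Hypothetical stand-in for an RPC context that expects exactly one reply per request.
interface ReplyContext {
    void reply(Object message);
}

// Hypothetical registration reply that also carries the latest serialized tokens, if any.
final class RegisteredExecutorReply {
    private final byte[] tokens; // null when the driver has no tokens yet

    RegisteredExecutorReply(byte[] tokens) {
        this.tokens = tokens == null ? null : tokens.clone();
    }

    Optional<byte[]> tokens() {
        return Optional.ofNullable(tokens).map(byte[]::clone);
    }
}

final class RegistrationHandler {
    private RegistrationHandler() {}

    /** Replies once, bundling the tokens instead of sending a second message. */
    static void onRegister(ReplyContext context, byte[] latestTokens) {
        context.reply(new RegisteredExecutorReply(latestTokens));
    }
}
```

Defensive copies keep the reply immutable once it has been handed to the transport.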
[GitHub] spark pull request: [SPARK-7000] [ml] Refactor prediction and tree...
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/5585#issuecomment-94570451 Ok, so you'd vote for having separate subpackages for each type of classification/prediction abstraction? * ml.prediction.Predictor (once it is public) * ml.tree.* * ml.ensembles.* (once we add general boosting, bagging) * (There may be more which are not on the roadmap.)
[GitHub] spark pull request: [SPARK-3454] [WIP] separate json endpoints for...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/4435#discussion_r28731144 --- Diff: core/src/main/java/org/apache/spark/status/api/EnumUtil.java --- @@ -0,0 +1,38 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.status.api; + +import com.google.common.base.Joiner; + +import java.util.Arrays; + +public class EnumUtil { --- End diff -- Tag this as `@DeveloperApi`? Kinda sucks that Java doesn't have the `private[foo]` modifier. (I'm trying to avoid suggesting using Hadoop's `InterfaceAudience` annotation...)
[GitHub] spark pull request: [SPARK-3454] [WIP] separate json endpoints for...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/4435#discussion_r28731520 --- Diff: core/src/main/java/org/apache/spark/status/api/v1/StageStatus.java --- @@ -0,0 +1,31 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.status.api.v1; + +import org.apache.spark.status.api.EnumUtil; + +public enum StageStatus { + Active, --- End diff -- Any reason for these to not be `ALL_CAPS` like, e.g., `ApplicationStatus`?
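Switching to `ALL_CAPS` constants need not affect API users as long as lookups stay case-insensitive, which is what the `EnumUtil.parseIgnoreCase` helper in this PR is for. Below is a self-contained Java sketch of such a lookup with the generic signature written out in full. `EnumParsing` and `StageStatusSketch` are hypothetical names, and the constants are assumed for illustration, since the diff only shows `Active`.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

// Hypothetical sketch of a case-insensitive enum lookup like EnumUtil.parseIgnoreCase in this PR.
final class EnumParsing {
    private EnumParsing() {}

    /** Returns null for null input, the matching constant ignoring case, or throws for unknown names. */
    static <E extends Enum<E>> E parseIgnoreCase(Class<E> clz, String str) {
        if (str == null) {
            return null;
        }
        E[] constants = clz.getEnumConstants();
        for (E e : constants) {
            if (e.name().equalsIgnoreCase(str)) {
                return e;
            }
        }
        // List every valid name so the caller can see what was expected.
        String supported = Arrays.stream(constants).map(c -> c.name()).collect(Collectors.joining(", "));
        throw new IllegalArgumentException(
            String.format("Illegal type='%s'. Supported type values: %s", str, supported));
    }
}

// ALL_CAPS constants (assumed for illustration); "active" and "Active" still resolve to ACTIVE.
enum StageStatusSketch { ACTIVE, COMPLETE, FAILED, PENDING }
```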
[GitHub] spark pull request: [SPARK-3454] [WIP] separate json endpoints for...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/4435#discussion_r28731535 --- Diff: core/src/main/java/org/apache/spark/status/api/v1/TaskSorting.java --- @@ -0,0 +1,45 @@ +package org.apache.spark.status.api.v1;/* --- End diff -- Oops.