[GitHub] spark pull request: [SPARK-8590][SQL] add code gen for ExtractValu...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6982#issuecomment-116558337 Merged build triggered. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8590][SQL] add code gen for ExtractValu...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6982#issuecomment-116590951 Merged build started.
[GitHub] spark pull request: [SPARK-8694][WebUI]Defer executing drawTaskAss...
Github user sarutak commented on the pull request: https://github.com/apache/spark/pull/7071#issuecomment-116596423 I have never seen the situation you mentioned even when there are 1 tasks. The contents below the timeline are rendered immediately in my browsers (Chrome, Firefox). I wonder if this issue depends on the environment. @kayousterhout , @pwendell Have you ever seen this issue?
[GitHub] spark pull request: [SPARK-8374] [YARN] Job frequently hangs after ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7083#issuecomment-116610617 Can one of the admins verify this patch?
[GitHub] spark pull request: [Spark-8703] [ML] Add CountVectorizer as a ml ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/7084#issuecomment-116639479 [Test build #35982 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35982/consoleFull) for PR 7084 at commit [`809fb59`](https://github.com/apache/spark/commit/809fb5947728de16c9addd5a8e27a41371394ff9).
[GitHub] spark pull request: [SPARK-6263][MLLIB] Python MLlib API missing i...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5707#issuecomment-116645394 Merged build triggered.
[GitHub] spark pull request: [SPARK-8590][SQL] add code gen for ExtractValu...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6982#issuecomment-116652861 Merged build triggered.
[GitHub] spark pull request: [Spark-8703] [ML] Add CountVectorizer as a ml ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/7084#issuecomment-116652920 [Test build #35981 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35981/console) for PR 7084 at commit [`7c61fb3`](https://github.com/apache/spark/commit/7c61fb32801ed802b8792663b1769c9eddd1346e). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-8590][SQL] add code gen for ExtractValu...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/6982#discussion_r33461668 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala --- @@ -245,13 +254,26 @@ abstract class UnaryExpression extends Expression with trees.UnaryNode[Expressio ctx: CodeGenContext, ev: GeneratedExpressionCode, f: String => String): String = { +nullSafeCodeGen(ctx, ev, (result, eval) => { + s"$result = ${f(eval)};" +}) + } + + /** + * Called by unary expressions to generate a code block that returns null if its parent returns + * null, and if not null, uses `f` to generate the expression. + */ + protected def nullSafeCodeGen( + ctx: CodeGenContext, + ev: GeneratedExpressionCode, + f: (String, String) => String): String = { val eval = child.gen(ctx) -// reuse the previous isNull -ev.isNull = eval.isNull --- End diff -- I removed this because if `child` is a `Literal`, then `eval.isNull` is a literal boolean, and we can't change `ev.isNull` afterwards. It wasn't a problem before, as we never changed `ev.isNull` in `defineCodeGen`, but now we do, and we need to do it in `nullSafeCodeGen`.
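To illustrate the pattern being reviewed, here is a minimal, self-contained sketch of null-safe code generation. All names here (`GeneratedCode`, the Java-like output, the `int` value type) are illustrative stand-ins, not Spark's actual codegen API: the parent's value is assigned only when the child evaluated to non-null, and the parent tracks its own null flag rather than reusing the child's.

```scala
// Hypothetical sketch of null-safe codegen: emit Java-like source that guards
// the parent computation behind the child's null check. Because the parent
// writes its own "<result>IsNull" flag, it does not matter whether the child's
// isNull expression is a variable or a literal true/false (the Literal case
// discussed above).
case class GeneratedCode(isNull: String, value: String, code: String)

def nullSafeCodeGen(
    child: GeneratedCode,
    resultVar: String,
    f: String => String): String = {
  s"""${child.code}
     |boolean ${resultVar}IsNull = true;
     |int $resultVar = -1;
     |if (!${child.isNull}) {
     |  ${resultVar}IsNull = false;
     |  $resultVar = ${f(child.value)};
     |}""".stripMargin
}
```

For example, `nullSafeCodeGen(childCode, "result", v => s"-($v)")` emits a guarded negation of the child's value.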
[GitHub] spark pull request: [SPARK-8693][Project Infra]: profiles and goal...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/7085#issuecomment-116661837 [Test build #35986 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35986/consoleFull) for PR 7085 at commit [`c5575f1`](https://github.com/apache/spark/commit/c5575f1276032e878c7d7e680ccbf9eb527c2f68).
[GitHub] spark pull request: [SPARK-6263][MLLIB] Python MLlib API missing i...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5707#issuecomment-116668972 [Test build #35983 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35983/console) for PR 5707 at commit [`3fc27e7`](https://github.com/apache/spark/commit/3fc27e76efda90b575a714b1bf79495a553ddc86). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: SPARK-1537 [WiP] Application Timeline Server i...
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/5423#discussion_r33465613 --- Diff: core/src/main/scala/org/apache/spark/ui/JettyUtils.scala --- @@ -220,9 +220,9 @@ private[spark] object JettyUtils extends Logging { val pool = new QueuedThreadPool pool.setDaemon(true) server.setThreadPool(pool) - val errorHandler = new ErrorHandler() - errorHandler.setShowStacks(true) - server.addBean(errorHandler) + val errorHandler = new ErrorHandler(); --- End diff -- probably a copy/paste relic; will fix
[GitHub] spark pull request: [SPARK-6263][MLLIB] Python MLlib API missing i...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5707#issuecomment-116669107 Merged build finished. Test PASSed.
[GitHub] spark pull request: SPARK-1537 [WiP] Application Timeline Server i...
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/5423#discussion_r33466420 --- Diff: yarn/history/src/main/scala/org/apache/spark/deploy/history/yarn/YarnHistoryProvider.scala --- @@ -0,0 +1,1015 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License.
 + */ + +package org.apache.spark.deploy.history.yarn + +import java.io.FileNotFoundException +import java.net.URI +import java.util.Date +import java.util.concurrent.LinkedBlockingQueue +import java.util.concurrent.atomic.{AtomicLong, AtomicBoolean} +import java.util.zip.ZipOutputStream + +import scala.collection.JavaConversions._ + +import org.apache.hadoop.security.UserGroupInformation +import org.apache.hadoop.yarn.api.records.timeline.TimelineEntity +import org.apache.hadoop.yarn.conf.YarnConfiguration + +import org.apache.spark.deploy.SparkHadoopUtil +import org.apache.spark.deploy.history.yarn.YarnTimelineUtils._ +import org.apache.spark.deploy.history.yarn.rest.{JerseyBinding, TimelineQueryClient} +import org.apache.spark.deploy.history.{ApplicationHistoryInfo, ApplicationHistoryProvider, HistoryServer} +import org.apache.spark.scheduler.{ApplicationEventListener, SparkListenerBus} +import org.apache.spark.ui.SparkUI +import org.apache.spark.{SparkException, Logging, SecurityManager, SparkConf} + +/** + * A History provider which reads in the history from + * the YARN Timeline Service. + * + * The service is a remote HTTP service, so failure modes are + * different from simple file IO. + * + * 1. Application listings are asynchronous, and made on a schedule, though + * they can be forced (and the schedule disabled). + * 2. The results are cached and can be retrieved with [[getApplications()]]. + * 3. The most recent failure of any operation is stored, + * The [[getLastFailure()]] call will return the last exception + * or `None`. It is shared across threads so is primarily there for + * tests and basic diagnostics. + * 4. Listing the details of a single application in [[getAppUI()]] + * is synchronous and *not* cached. + * 5. The [[maybeCheckHealth()]] call performs a health check as the initial + * binding operation of this instance.
This call invokes [[TimelineQueryClient.healthCheck()]] + * for better diagnostics on binding failures - particularly configuration problems. + * 6. Every REST call, synchronous or asynchronous, will invoke [[maybeCheckHealth()]] until + * the health check eventually succeeds. + * <p> + * If the timeline is not enabled, the API calls used by the web UI + * downgrade gracefully (returning empty entries), rather than fail. + * + * + * @param sparkConf configuration of the provider + */ +private[spark] class YarnHistoryProvider(sparkConf: SparkConf) + extends ApplicationHistoryProvider with Logging { + + /** + * The configuration here is a YarnConfiguration built off the spark configuration + * supplied in the constructor; this operation ensures that `yarn-default.xml` + * and `yarn-site.xml` are pulled in. Options in the spark conf will override + * those in the -default and -site XML resources which are not marked as final. + */ + private val yarnConf = { +new YarnConfiguration(SparkHadoopUtil.get.newConfiguration(sparkConf)) + } + + /** + * UI ACL option + */ + private val uiAclsEnabled = sparkConf.getBoolean("spark.history.ui.acls.enable", false) + + private val detailedInfo = sparkConf.getBoolean(YarnHistoryProvider.OPTION_DETAILED_INFO, false) + private val NOT_STARTED = "Not Started" + + /* minimum interval between each check for event log updates */ + private val refreshInterval = sparkConf.getLong(YarnHistoryProvider.OPTION_MIN_REFRESH_INTERVAL, +YarnHistoryProvider.DEFAULT_MIN_REFRESH_INTERVAL_SECONDS) * 1000 + + /** + * Window limit in milliseconds + */ + private val
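The quoted provider reads its settings through typed getters with defaults and converts a seconds-valued refresh interval to milliseconds. A hedged, self-contained sketch of that lookup pattern follows; `MapConf` and the `min-refresh-interval` key are stand-ins for illustration (only `spark.history.ui.acls.enable` appears in the diff itself), not Spark's real `SparkConf`.

```scala
// Illustrative stand-in for SparkConf-style configuration access: typed
// getters with defaults over a string key/value map.
class MapConf(settings: Map[String, String]) {
  def getBoolean(key: String, default: Boolean): Boolean =
    settings.get(key).map(_.toBoolean).getOrElse(default)

  def getLong(key: String, default: Long): Long =
    settings.get(key).map(_.toLong).getOrElse(default)
}

val conf = new MapConf(Map("spark.history.yarn.min-refresh-interval" -> "30"))

// Missing key falls back to the supplied default.
val uiAclsEnabled = conf.getBoolean("spark.history.ui.acls.enable", false)

// Seconds -> milliseconds, mirroring the refreshInterval computation in the diff.
val refreshIntervalMs =
  conf.getLong("spark.history.yarn.min-refresh-interval", 60) * 1000
```

Keeping the unit conversion next to the lookup, as the diff does, avoids scattering `* 1000` factors through the rest of the provider.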
[GitHub] spark pull request: SPARK-1537 [WiP] Application Timeline Server i...
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/5423#discussion_r33465769 --- Diff: yarn/history/src/main/scala/org/apache/spark/deploy/history/yarn/YarnEventListener.scala --- @@ -0,0 +1,133 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.deploy.history.yarn + +import org.apache.spark.scheduler._ +import org.apache.spark.{Logging, SparkContext} + +private[spark] class YarnEventListener(sc: SparkContext, service: YarnHistoryService) + extends SparkListener with Logging { + + /** + * Called when a stage completes successfully or fails, with information on the completed stage. + */ + override def onStageCompleted(stageCompleted: SparkListenerStageCompleted) { +service.enqueue(new HandleSparkEvent(stageCompleted, now())) + } + + /** + * Source of current time; may be overridden in tests + * @return + */ + protected def now(): Long = { --- End diff -- Hadn't seen that - yes. I knew of Google Ticker, but I also know of the Google stopwatch incident, so am never in a rush to add any Guava dependencies to software.
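The `now()` method under discussion is the classic injectable-clock seam: production code defaults to system time, while tests override the method for deterministic timestamps. A minimal sketch, with hypothetical class names (the PR's actual listener wires events into a history service instead):

```scala
// Overridable time source, as in the YarnEventListener excerpt: now() defaults
// to wall-clock time and is a protected method purely so tests can substitute
// a fixed clock without any Guava Ticker dependency.
class EventStamper {
  protected def now(): Long = System.currentTimeMillis()

  // Pair an event name with the time it was observed.
  def stamp(event: String): (String, Long) = (event, now())
}

// Test double with a deterministic clock.
class FixedClockStamper(fixed: Long) extends EventStamper {
  override protected def now(): Long = fixed
}
```

A test can then assert on exact timestamps, e.g. `new FixedClockStamper(42L).stamp("stageCompleted")` yields `("stageCompleted", 42)`.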
[GitHub] spark pull request: [SPARK-6797][SPARKR] Add support for YARN clus...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6743#issuecomment-116674849 Build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-8702][WebUI]Avoid massive concating str...
Github user zsxwing commented on the pull request: https://github.com/apache/spark/pull/7082#issuecomment-116682495 Oh, I see. Could you file another one as a separate issue? Opened a JIRA for this issue: https://issues.apache.org/jira/browse/SPARK-8705
[GitHub] spark pull request: SPARK-1537 [WiP] Application Timeline Server i...
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/5423#discussion_r33467960 --- Diff: yarn/history/src/main/scala/org/apache/spark/deploy/history/yarn/YarnHistoryProvider.scala ---
[GitHub] spark pull request: [SPARK-8402][MLLIB] DP Means Clustering
Github user FlytxtRnD commented on the pull request: https://github.com/apache/spark/pull/6880#issuecomment-116645007 @mengxr Could you please share your comments on this PR?
[GitHub] spark pull request: [SPARK-6263][MLLIB] Python MLlib API missing i...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5707#issuecomment-116645468 Merged build started.
[GitHub] spark pull request: [SPARK-8702][WebUI]Avoid massive concating str...
Github user zsxwing commented on the pull request: https://github.com/apache/spark/pull/7082#issuecomment-116657416 retest this please
[GitHub] spark pull request: [SPARK-8702][WebUI]Avoid massive concating str...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/7082#issuecomment-116658293 [Test build #35985 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35985/consoleFull) for PR 7082 at commit [`b29231d`](https://github.com/apache/spark/commit/b29231d5fd1b5e1ff8bcc68d8dd8706dda99ee58).
[GitHub] spark pull request: [SPARK-8702][WebUI]Avoid massive concating str...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7082#issuecomment-116657944 Merged build started.
[GitHub] spark pull request: [SPARK-8704] [ML] [PySpark] Add additional met...
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/7086#issuecomment-116664808 This breaks `model.transform(doc).collect()` for `Word2Vec` but I do not understand why.
[GitHub] spark pull request: [SPARK-8405][WebUI] Show executor logs on Web ...
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/7033#issuecomment-116664781 Unfortunately YARN
[GitHub] spark pull request: SPARK-1537 [WiP] Application Timeline Server i...
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/5423#discussion_r33465661 --- Diff: docs/monitoring.md --- @@ -256,6 +256,157 @@ still required, though there is only one application available. Eg. to see the running app, you would go to `http://localhost:4040/api/v1/applications/[app-id]/jobs`. This is to keep the paths consistent in both modes. +## Hadoop YARN Timeline service history provider + +As well as the Filesystem History Provider, Spark can integrate with the Hadoop YARN +Application Timeline Service. This is a service which runs in a YARN cluster, recording +application- and YARN- published events to a database, retrieving them on request. + +Spark integrates with the timeline service by +1. Publishing events to the timeline service as applications execute. +2. Listing application histories published to the timeline service. +3. Retrieving the details of specific application histories. + +### Configuring the Timeline Service + +For details on configuring and starting the timeline service, consult the Hadoop documentation. + +From the perspective of Spark, the key requirements are +1. The YARN timeline service must be running. +1. Its URL is known, and configured in the `yarn-site.xml` configuration file. --- End diff -- Markdown takes either; here's [the rendered file](https://github.com/steveloughran/spark/blob/stevel/feature/SPARK-1537-ATS/docs/monitoring.md)
[GitHub] spark pull request: [SPARK-8702][WebUI]Avoid massive concating str...
Github user sarutak commented on the pull request: https://github.com/apache/spark/pull/7082#issuecomment-116669695 LGTM. After test #35985 pass, I'll merge this into `master`. Thanks @zsxwing for your contribution!
[GitHub] spark pull request: [SPARK-6797][SPARKR] Add support for YARN clus...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6743#issuecomment-116670483 Build started.
[GitHub] spark pull request: [SPARK-6797][SPARKR] Add support for YARN clus...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6743#issuecomment-116670346 Build triggered.
[GitHub] spark pull request: SPARK-6735:[YARN] Adding properties to disable...
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/5449#issuecomment-116689661 @twinkle-sachdeva do you have time to address Sandy's comments?
[GitHub] spark pull request: SPARK-8374] [YARN] Job frequently hangs after ...
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/7083#issuecomment-116693495 @xuchenCN can you add the first [ to the description before the JIRA number?
[GitHub] spark pull request: [SPARK-6797][SPARKR] Add support for YARN clus...
Github user sun-rui commented on the pull request: https://github.com/apache/spark/pull/6743#issuecomment-116698303 @tgravescs, I think the problem with shipping R itself is that the R executable is platform-specific. It may also require OS-specific installation before running R (not sure). PySpark does not ship Python itself either.
[GitHub] spark pull request: [SPARK-7013][ML][test]Add unit test for spark....
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/6665#discussion_r33459310 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/StandardScalerSuite.scala --- @@ -0,0 +1,108 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the License); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an AS IS BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.ml.feature + + +import org.apache.spark.SparkFunSuite +import org.apache.spark.mllib.linalg.{DenseVector, SparseVector, Vector, Vectors} +import org.apache.spark.mllib.util.MLlibTestSparkContext +import org.apache.spark.mllib.util.TestingUtils._ +import org.apache.spark.sql.{DataFrame, Row, SQLContext} + +class StandardScalerSuite extends SparkFunSuite with MLlibTestSparkContext { + + @transient var data: Array[Vector] = _ + @transient var resWithStd: Array[Vector] = _ + @transient var resWithMean: Array[Vector] = _ + @transient var resWithBoth: Array[Vector] = _ + + override def beforeAll(): Unit = { + super.beforeAll() + + data = Array( + Vectors.dense(-2.0, 2.3, 0.0), + Vectors.dense(0.0, -5.1, 1.0), + Vectors.dense(1.7, -0.6, 3.3) + ) + resWithMean = Array( + Vectors.dense(-1.9, 3.4333, -1.4333), + Vectors.dense(0.1, -3.9667, -0.4333), + Vectors.dense(1.8, 0.5333, 1.8667) + ) + resWithStd = Array( + Vectors.dense(-1.079898494312, 0.616834091415, 0.0), + Vectors.dense(0.0, -1.367762550529, 0.590968109266), + Vectors.dense(0.917913720165, -0.160913241239, 1.950194760579) + ) + resWithBoth = Array( + Vectors.dense(-1.0259035695965, 0.920781324866, -0.8470542899497), + Vectors.dense(0.0539949247156, -1.063815317078, -0.256086180682), + Vectors.dense(0.9719086448809, 0.143033992212, 1.103140470631) + ) + } + --- End diff -- checkParams?
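The expected vectors in the suite above follow the standardization formula z = (x - mean) / std, using the sample (n - 1) standard deviation that MLlib's StandardScaler computes; with `withMean` disabled only the division is applied, with `withStd` disabled only the mean is subtracted. A quick plain-Python sketch (not Spark code) that reproduces the `resWithBoth` values:

```python
import statistics

# Input rows from the test suite above
data = [(-2.0, 2.3, 0.0), (0.0, -5.1, 1.0), (1.7, -0.6, 3.3)]

cols = list(zip(*data))
means = [statistics.mean(c) for c in cols]
stds = [statistics.stdev(c) for c in cols]  # sample std dev (n - 1 denominator)

# withMean and withStd both enabled: z = (x - mean) / std per column
res_with_both = [tuple((x - m) / s for x, m, s in zip(row, means, stds))
                 for row in data]
print(res_with_both[0])  # close to (-1.0259035695965, 0.920781324866, -0.8470542899497)
```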
[GitHub] spark pull request: [CORE] [SPARK-6593] Provide option for HadoopR...
Github user tigerquoll commented on the pull request: https://github.com/apache/spark/pull/5250#issuecomment-116651383 Correct. Cheers, Dale. Date: Fri, 19 Jun 2015 06:01:26 -0700 From: notificati...@github.com To: sp...@noreply.github.com CC: tigerqu...@outlook.com Subject: Re: [spark] [CORE] [SPARK-6593] Provide option for HadoopRDD to skip corrupted files (#5250) I think this should be closed now in favor of #5368 right? — Reply to this email directly or view it on GitHub.
[GitHub] spark pull request: [SPARK-8590][SQL] add code gen for ExtractValu...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6982#issuecomment-116654128 [Test build #35984 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35984/consoleFull) for PR 6982 at commit [`17439fe`](https://github.com/apache/spark/commit/17439fe330333a494300cbb8437dc4b381a845c1).
[GitHub] spark pull request: [SPARK-8702][WebUI]Avoid massive concating str...
Github user sarutak commented on a diff in the pull request: https://github.com/apache/spark/pull/7082#discussion_r33462776 --- Diff: core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala --- @@ -572,55 +572,55 @@ private[ui] class StagePage(parent: StagesTab) extends WebUIPage(stage) { val attempt = taskInfo.attempt val timelineObject = s - { - 'className': 'task task-assignment-timeline-object', - 'group': '$executorId', - 'content': 'div class=task-assignment-timeline-content' + - 'data-toggle=tooltip data-placement=top' + - 'data-html=true data-container=body' + - 'data-title=${sTask + index + (attempt + attempt + )}br' + - 'Status: ${taskInfo.status}br' + - 'Launch Time: ${UIUtils.formatDate(new Date(launchTime))}' + - '${ + |{ + |'className': 'task task-assignment-timeline-object', + |'group': '$executorId', + |'content': 'div class=task-assignment-timeline-content + |data-toggle=tooltip data-placement=top + |data-html=true data-container=body + |data-title=${sTask + index + (attempt + attempt + )}br + |Status: ${taskInfo.status}br + |Launch Time: ${UIUtils.formatDate(new Date(launchTime))} + |${ if (!taskInfo.running) { sbrFinish Time: ${UIUtils.formatDate(new Date(finishTime))} } else { } - }' + - 'brScheduler Delay: $schedulerDelay ms' + - 'brTask Deserialization Time: ${UIUtils.formatDuration(deserializationTime)}' + - 'brShuffle Read Time: ${UIUtils.formatDuration(shuffleReadTime)}' + - 'brExecutor Computing Time: ${UIUtils.formatDuration(executorComputingTime)}' + - 'brShuffle Write Time: ${UIUtils.formatDuration(shuffleWriteTime)}' + - 'brResult Serialization Time: ${UIUtils.formatDuration(serializationTime)}' + - 'brGetting Result Time: ${UIUtils.formatDuration(gettingResultTime)}' + - 'svg class=task-assignment-timeline-duration-bar' + - 'rect class=scheduler-delay-proportion ' + - 'x=$schedulerDelayProportionPos% y=0px height=26px' + - 'width=$schedulerDelayProportion%/rect' + - 'rect class=deserialization-time-proportion '+ - 
'x=$deserializationTimeProportionPos% y=0px height=26px' + - 'width=$deserializationTimeProportion%/rect' + - 'rect class=shuffle-read-time-proportion ' + - 'x=$shuffleReadTimeProportionPos% y=0px height=26px' + - 'width=$shuffleReadTimeProportion%/rect' + - 'rect class=executor-runtime-proportion ' + - 'x=$executorRuntimeProportionPos% y=0px height=26px' + - 'width=$executorComputingTimeProportion%/rect' + - 'rect class=shuffle-write-time-proportion ' + - 'x=$shuffleWriteTimeProportionPos% y=0px height=26px' + - 'width=$shuffleWriteTimeProportion%/rect' + - 'rect class=serialization-time-proportion ' + - 'x=$serializationTimeProportionPos% y=0px height=26px' + - 'width=$serializationTimeProportion%/rect' + - 'rect class=getting-result-time-proportion ' + - 'x=$gettingResultTimeProportionPos% y=0px height=26px' + - 'width=$gettingResultTimeProportion%/rect/svg', - 'start': new Date($launchTime), - 'end': new Date($finishTime) - } - + } + |brScheduler Delay: $schedulerDelay ms + |brTask Deserialization Time: ${UIUtils.formatDuration(deserializationTime)} + |brShuffle Read Time: ${UIUtils.formatDuration(shuffleReadTime)} + |brExecutor Computing Time: ${UIUtils.formatDuration(executorComputingTime)} + |brShuffle Write Time: ${UIUtils.formatDuration(shuffleWriteTime)} + |brResult Serialization Time: ${UIUtils.formatDuration(serializationTime)} + |brGetting Result Time: ${UIUtils.formatDuration(gettingResultTime)} + |svg class=task-assignment-timeline-duration-bar + |rect
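The diff above (garbled in this archive, which stripped angle brackets and quote characters from the embedded HTML) replaces a long chain of `+`-concatenated string literals with a single multiline interpolated string using Scala's `stripMargin`, which trims everything up to the leading `|` on each line. Python has an analogous pattern, sketched below with `textwrap.dedent` playing the role of `stripMargin`; the names and values are illustrative, not the actual StagePage fields:

```python
import textwrap

executor_id, status = "exec-1", "RUNNING"

# Before: many small concatenations, each allocating an intermediate string
s = "{" + "'group': '" + executor_id + "'," + "'content': 'Status: " + status + "'" + "}"

# After: one multiline literal; dedent strips the common leading indentation,
# much as Scala's stripMargin strips everything up to the '|' on each line
s2 = textwrap.dedent(f"""\
    {{
    'group': '{executor_id}',
    'content': 'Status: {status}'
    }}""")
print(s2)
```

Beyond readability, collapsing the chain into one literal avoids building dozens of short-lived intermediate strings per task, which is the "massive concatenation" cost the PR title refers to.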
[GitHub] spark pull request: [SPARK-8693][Project Infra]: profiles and goal...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7085#issuecomment-116660962 Merged build triggered.
[GitHub] spark pull request: [SPARK-8590][SQL] add code gen for ExtractValu...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6982#issuecomment-116660971 Merged build triggered.
[GitHub] spark pull request: [SPARK-8704] [ML] [PySpark] Add additional met...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/7086#issuecomment-116662430 [Test build #35988 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35988/consoleFull) for PR 7086 at commit [`ac9397b`](https://github.com/apache/spark/commit/ac9397b3b6933ab2c394a7b1db52ea70164cfb2c).
[GitHub] spark pull request: [SPARK-8704] [ML] [PySpark] Add additional met...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7086#issuecomment-116662264 Merged build started.
[GitHub] spark pull request: [Spark-8703] [ML] Add CountVectorizer as a ml ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7084#issuecomment-116663360 Merged build finished. Test PASSed.
[GitHub] spark pull request: SPARK-1537 [WiP] Application Timeline Server i...
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/5423#discussion_r33466020 --- Diff: yarn/history/src/main/scala/org/apache/spark/deploy/history/yarn/YarnHistoryProvider.scala --- @@ -0,0 +1,1015 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the License); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an AS IS BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.deploy.history.yarn + +import java.io.FileNotFoundException +import java.net.URI +import java.util.Date +import java.util.concurrent.LinkedBlockingQueue +import java.util.concurrent.atomic.{AtomicLong, AtomicBoolean} +import java.util.zip.ZipOutputStream + +import scala.collection.JavaConversions._ + +import org.apache.hadoop.security.UserGroupInformation +import org.apache.hadoop.yarn.api.records.timeline.TimelineEntity +import org.apache.hadoop.yarn.conf.YarnConfiguration + +import org.apache.spark.deploy.SparkHadoopUtil +import org.apache.spark.deploy.history.yarn.YarnTimelineUtils._ +import org.apache.spark.deploy.history.yarn.rest.{JerseyBinding, TimelineQueryClient} +import org.apache.spark.deploy.history.{ApplicationHistoryInfo, ApplicationHistoryProvider, HistoryServer} +import org.apache.spark.scheduler.{ApplicationEventListener, SparkListenerBus} +import org.apache.spark.ui.SparkUI +import org.apache.spark.{SparkException, Logging, SecurityManager, SparkConf} + +/** + * A History provider which reads in the history from + * the YARN Timeline Service. + * + * The service is a remote HTTP service, so failure modes are + * different from simple file IO. + * + * 1. Application listings are asynchronous, and made on a schedule, though + * they can be forced (and the schedule disabled). + * 2. The results are cached and can be retrieved with [[getApplications()]]. + * 3. The most recent failure of any operation is stored, + * The [[getLastFailure()]] call will return the last exception + * or `None`. It is shared across threads so is primarily there for + * tests and basic diagnostics. + * 4. Listing the details of a single application in [[getAppUI()]] + * is synchronous and *not* cached. + * 5. the [[maybeCheckHealth()]] call performs a health check as the initial + * binding operation of this instance. 
This call invokes [[TimelineQueryClient.healthCheck()]] + * for better diagnostics on binding failures -particularly configuration problems. + * 6. Every REST call, synchronous or asynchronous, will invoke [[maybeCheckHealth()]] until + * the health check eventually succeeds. + * p + * If the timeline is not enabled, the API calls used by the web UI + * downgrade gracefully (returning empty entries), rather than fail. + * + * + * @param sparkConf configuration of the provider + */ +private[spark] class YarnHistoryProvider(sparkConf: SparkConf) + extends ApplicationHistoryProvider with Logging { + + /** + * The configuration here is a YarnConfiguration built off the spark configuration + * supplied in the constructor; this operation ensures that `yarn-default.xml` + * and `yarn-site.xml` are pulled in. Options in the spark conf will override + * those in the -default and -site XML resources which are not marked as final. + */ + private val yarnConf = { +new YarnConfiguration(SparkHadoopUtil.get.newConfiguration(sparkConf)) + } + + /** + * UI ACL option + */ + private val uiAclsEnabled = sparkConf.getBoolean(spark.history.ui.acls.enable, false) + + private val detailedInfo = sparkConf.getBoolean(YarnHistoryProvider.OPTION_DETAILED_INFO, false) + private val NOT_STARTED = Not Started + + /* minimum interval between each check for event log updates */ + private val refreshInterval = sparkConf.getLong(YarnHistoryProvider.OPTION_MIN_REFRESH_INTERVAL, +YarnHistoryProvider.DEFAULT_MIN_REFRESH_INTERVAL_SECONDS) * 1000 + + /** + * Window limit in milliseconds + */ + private val
[GitHub] spark pull request: [SPARK-6797][SPARKR] Add support for YARN clus...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6743#issuecomment-116673790 [Test build #35989 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35989/consoleFull) for PR 6743 at commit [`0925e2a`](https://github.com/apache/spark/commit/0925e2a4d7c9e9f67cd95d0444bc8d20b888db57).
[GitHub] spark pull request: [SPARK-8704] [ML] [PySpark] Add additional met...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7086#issuecomment-116680056 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-4449][Core]Specify port range in spark
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/5722#issuecomment-116688294 I haven't been keeping up with this. @vanzin @srowen do you have further comments on this?
[GitHub] spark pull request: [Spark-8703] [ML] Add CountVectorizer as a ml ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7084#issuecomment-116638087 Merged build triggered.
[GitHub] spark pull request: [Spark-8703] [ML] Add CountVectorizer as a ml ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7084#issuecomment-116638157 Merged build started.
[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7080#issuecomment-116638244 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/7080#issuecomment-116638099 [Test build #35976 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35976/console) for PR 7080 at commit [`588c75f`](https://github.com/apache/spark/commit/588c75f714372b6da4dd20fa7d006afe399fa8e2). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-8405][WebUI] Show executor logs on Web ...
Github user carsonwang commented on the pull request: https://github.com/apache/spark/pull/7033#issuecomment-116643570 After configuring yarn.log.server.url in yarn-site.xml, the log URL is redirected to the MR job history server to show the Spark log. This means the user has to have the MR job history server running to view Spark logs on the Web UI. Ideally, the aggregated logs should be shown by a YARN generic application history server or the Spark history server itself instead of the MR history server. Anyway, if this is fine for now, I'll close this.
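The redirect described above is driven by the `yarn.log.server.url` property in `yarn-site.xml`, which NodeManagers use to build log links once log aggregation has completed. A sketch of the configuration (host and port are placeholders; the standard MR JobHistory web UI port is 19888):

```xml
<property>
  <!-- Where NodeManagers redirect log requests after log aggregation completes;
       here it points at the MapReduce JobHistory server web UI. -->
  <name>yarn.log.server.url</name>
  <value>http://historyserver.example.com:19888/jobhistory/logs</value>
</property>
```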
[GitHub] spark pull request: SPARK-1537 [WiP] Application Timeline Server i...
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/5423#discussion_r33459858 --- Diff: yarn/history/src/main/scala/org/apache/spark/deploy/history/yarn/YarnHistoryProvider.scala --- @@ -0,0 +1,1015 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the License); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an AS IS BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.deploy.history.yarn + +import java.io.FileNotFoundException +import java.net.URI +import java.util.Date +import java.util.concurrent.LinkedBlockingQueue +import java.util.concurrent.atomic.{AtomicLong, AtomicBoolean} +import java.util.zip.ZipOutputStream + +import scala.collection.JavaConversions._ + +import org.apache.hadoop.security.UserGroupInformation +import org.apache.hadoop.yarn.api.records.timeline.TimelineEntity +import org.apache.hadoop.yarn.conf.YarnConfiguration + +import org.apache.spark.deploy.SparkHadoopUtil +import org.apache.spark.deploy.history.yarn.YarnTimelineUtils._ +import org.apache.spark.deploy.history.yarn.rest.{JerseyBinding, TimelineQueryClient} +import org.apache.spark.deploy.history.{ApplicationHistoryInfo, ApplicationHistoryProvider, HistoryServer} +import org.apache.spark.scheduler.{ApplicationEventListener, SparkListenerBus} +import org.apache.spark.ui.SparkUI +import org.apache.spark.{SparkException, Logging, SecurityManager, SparkConf} + +/** + * A History provider which reads in the history from + * the YARN Timeline Service. + * + * The service is a remote HTTP service, so failure modes are + * different from simple file IO. + * + * 1. Application listings are asynchronous, and made on a schedule, though + * they can be forced (and the schedule disabled). + * 2. The results are cached and can be retrieved with [[getApplications()]]. + * 3. The most recent failure of any operation is stored, + * The [[getLastFailure()]] call will return the last exception + * or `None`. It is shared across threads so is primarily there for + * tests and basic diagnostics. + * 4. Listing the details of a single application in [[getAppUI()]] + * is synchronous and *not* cached. + * 5. the [[maybeCheckHealth()]] call performs a health check as the initial + * binding operation of this instance. 
This call invokes [[TimelineQueryClient.healthCheck()]] + * for better diagnostics on binding failures -particularly configuration problems. + * 6. Every REST call, synchronous or asynchronous, will invoke [[maybeCheckHealth()]] until + * the health check eventually succeeds. + * p + * If the timeline is not enabled, the API calls used by the web UI + * downgrade gracefully (returning empty entries), rather than fail. + * + * + * @param sparkConf configuration of the provider + */ +private[spark] class YarnHistoryProvider(sparkConf: SparkConf) + extends ApplicationHistoryProvider with Logging { + + /** + * The configuration here is a YarnConfiguration built off the spark configuration + * supplied in the constructor; this operation ensures that `yarn-default.xml` + * and `yarn-site.xml` are pulled in. Options in the spark conf will override + * those in the -default and -site XML resources which are not marked as final. + */ + private val yarnConf = { +new YarnConfiguration(SparkHadoopUtil.get.newConfiguration(sparkConf)) + } + + /** + * UI ACL option + */ + private val uiAclsEnabled = sparkConf.getBoolean(spark.history.ui.acls.enable, false) + + private val detailedInfo = sparkConf.getBoolean(YarnHistoryProvider.OPTION_DETAILED_INFO, false) + private val NOT_STARTED = Not Started + + /* minimum interval between each check for event log updates */ + private val refreshInterval = sparkConf.getLong(YarnHistoryProvider.OPTION_MIN_REFRESH_INTERVAL, +YarnHistoryProvider.DEFAULT_MIN_REFRESH_INTERVAL_SECONDS) * 1000 + + /** + * Window limit in milliseconds + */ + private val
[GitHub] spark pull request: [SPARK-8693][Project Infra]: profiles and goal...
GitHub user brennonyork opened a pull request: https://github.com/apache/spark/pull/7085 [SPARK-8693][Project Infra]: profiles and goals are not printed in a nice way Hotfix to correct formatting errors of print statements within the dev and jenkins builds. Error looks like: ``` -Phadoop-1[info] Building Spark (w/Hive 0.13.1) using SBT with these arguments: -Dhadoop.version=1.0.4[info] Building Spark (w/Hive 0.13.1) using SBT with these arguments: -Pkinesis-asl[info] Building Spark (w/Hive 0.13.1) using SBT with these arguments: -Phive-thriftserver[info] Building Spark (w/Hive 0.13.1) using SBT with these arguments: -Phive[info] Building Spark (w/Hive 0.13.1) using SBT with these arguments: package[info] Building Spark (w/Hive 0.13.1) using SBT with these arguments: assembly/assembly[info] Building Spark (w/Hive 0.13.1) using SBT with these arguments: streaming-kafka-assembly/assembly ``` You can merge this pull request into a Git repository by running: $ git pull https://github.com/brennonyork/spark SPARK-8693 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/7085.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #7085 commit c5575f1276032e878c7d7e680ccbf9eb527c2f68 Author: Brennon York brennon.y...@capitalone.com Date: 2015-06-29T13:26:45Z added commas to end of print statements for proper printing
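The fix relies on Python 2 print-statement semantics: a trailing comma suppresses the newline (the Python 3 equivalent is `print(..., end=" ")`), so the "[info]" prefix appears once and the profiles/goals follow on the same line instead of each argument restarting the prefix. A rough sketch of the intended output shape (argument values taken from the error text above; this is not the actual dev/run-tests code):

```python
args = ["-Phadoop-1", "-Dhadoop.version=1.0.4", "-Pkinesis-asl", "package"]
prefix = "[info] Building Spark using SBT with these arguments:"

# Desired shape: the prefix emitted once, every argument on the same line.
# In Python 2 this is what trailing commas on print statements produce;
# in Python 3, print(arg, end=" ") in a loop has the same effect.
line = prefix + " " + " ".join(args)
print(line)
```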
[GitHub] spark pull request: [SPARK-8590][SQL] add code gen for ExtractValu...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6982#issuecomment-116661524 [Test build #35987 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35987/consoleFull) for PR 6982 at commit [`773d3a1`](https://github.com/apache/spark/commit/773d3a1f5ed5c403a22903800fba419d95ab821c).
[GitHub] spark pull request: [SPARK-8704] [ML] [PySpark] Add additional met...
GitHub user MechCoder opened a pull request: https://github.com/apache/spark/pull/7086 [SPARK-8704] [ML] [PySpark] Add additional methods to wrappers in ml.pyspark.feature Adds `std` and `mean` to StandardScalerModel; `getVectors` and `findSynonyms` to Word2VecModel; `setFeatures` and `getFeatures` to HashingTF. You can merge this pull request into a Git repository by running: $ git pull https://github.com/MechCoder/spark missing_model_methods Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/7086.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #7086 commit ac9397b3b6933ab2c394a7b1db52ea70164cfb2c Author: MechCoder manojkumarsivaraj...@gmail.com Date: 2015-06-29T13:27:38Z [SPARK-8704] [ML] [PySpark] Add additional methods to wrappers in ml.pyspark.feature
[GitHub] spark pull request: SPARK-1537 [WiP] Application Timeline Server i...
Github user steveloughran commented on the pull request: https://github.com/apache/spark/pull/5423#issuecomment-116667971 Thanks for sitting down to review it; it has grown to handle the end-to-end problem - auth, unreliable endpoints, etc. - where some complexity is unavoidable. 1. Note that most of the code is actually in the tests - a combination of mock, unit and functional tests; the latter contains things I hadn't seen in the Spark codebase yet, spinning up web services in the VM and working with them. So yes, they do add more code, though some of that could be factored up into base modules. I've kept things as isolated to one path as possible. 2. Splitting up the patch? Would that help? Without the history-server side of things, there's not much in the way of testing the publishing aspects - especially the big one: can the published events be unmarshalled and used to rebuild AppUI instances? If you want it split up for reviewing, I'll gladly do it, with the caveat that full test coverage comes when the history server joins in. 3. Style things, easily addressed - the usual subtleties of different projects' expectations. Regarding app attempts, this pull request has been up for review since before they came out; it's been chasing a moving target. I've also been keeping the code working against 1.3, the testing of which kept the scale and security coverage up. I hadn't done the app-attempt work yet because of the high rate of change there, and was reasonably confident that once checked in, another iteration would round it off. Now that work's done, I have the time to finish off these details - but it's still tracking something relatively unstable, so it needs to get in. How about, then: 1. I do all the style comments and tag test-only methods as `@VisibleForTesting`. If there are some other things you've not commented on, please highlight them. 2. I'll sync it up with the current payload of messages. Unless it really makes a fundamental difference in getting the patch reviewed, I'd like to keep things together on the grounds of testability. But if splitting it up is what it takes to get in, that's what I'll do.
[GitHub] spark pull request: [SPARK-8704] [ML] [PySpark] Add additional met...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/7086#issuecomment-116679844 [Test build #35988 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35988/console) for PR 7086 at commit [`ac9397b`](https://github.com/apache/spark/commit/ac9397b3b6933ab2c394a7b1db52ea70164cfb2c). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: SPARK-1537 [WiP] Application Timeline Server i...
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/5423#discussion_r33467225 --- Diff: yarn/history/src/main/scala/org/apache/spark/deploy/history/yarn/YarnHistoryProvider.scala --- @@ -0,0 +1,1015 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.deploy.history.yarn + +import java.io.FileNotFoundException +import java.net.URI +import java.util.Date +import java.util.concurrent.LinkedBlockingQueue +import java.util.concurrent.atomic.{AtomicLong, AtomicBoolean} +import java.util.zip.ZipOutputStream + +import scala.collection.JavaConversions._ + +import org.apache.hadoop.security.UserGroupInformation +import org.apache.hadoop.yarn.api.records.timeline.TimelineEntity +import org.apache.hadoop.yarn.conf.YarnConfiguration + +import org.apache.spark.deploy.SparkHadoopUtil +import org.apache.spark.deploy.history.yarn.YarnTimelineUtils._ +import org.apache.spark.deploy.history.yarn.rest.{JerseyBinding, TimelineQueryClient} +import org.apache.spark.deploy.history.{ApplicationHistoryInfo, ApplicationHistoryProvider, HistoryServer} +import org.apache.spark.scheduler.{ApplicationEventListener, SparkListenerBus} +import org.apache.spark.ui.SparkUI +import org.apache.spark.{SparkException, Logging, SecurityManager, SparkConf} + +/** + * A History provider which reads in the history from + * the YARN Timeline Service. + * + * The service is a remote HTTP service, so failure modes are + * different from simple file IO. + * + * 1. Application listings are asynchronous, and made on a schedule, though + * they can be forced (and the schedule disabled). + * 2. The results are cached and can be retrieved with [[getApplications()]]. + * 3. The most recent failure of any operation is stored, + * The [[getLastFailure()]] call will return the last exception + * or `None`. It is shared across threads so is primarily there for + * tests and basic diagnostics. + * 4. Listing the details of a single application in [[getAppUI()]] + * is synchronous and *not* cached. + * 5. the [[maybeCheckHealth()]] call performs a health check as the initial + * binding operation of this instance. 
This call invokes [[TimelineQueryClient.healthCheck()]] + * for better diagnostics on binding failures - particularly configuration problems. + * 6. Every REST call, synchronous or asynchronous, will invoke [[maybeCheckHealth()]] until + * the health check eventually succeeds. + * <p> + * If the timeline is not enabled, the API calls used by the web UI + * downgrade gracefully (returning empty entries), rather than fail. + * + * + * @param sparkConf configuration of the provider + */ +private[spark] class YarnHistoryProvider(sparkConf: SparkConf) + extends ApplicationHistoryProvider with Logging { + + /** + * The configuration here is a YarnConfiguration built off the spark configuration + * supplied in the constructor; this operation ensures that `yarn-default.xml` + * and `yarn-site.xml` are pulled in. Options in the spark conf will override + * those in the -default and -site XML resources which are not marked as final. + */ + private val yarnConf = { +new YarnConfiguration(SparkHadoopUtil.get.newConfiguration(sparkConf)) + } + + /** + * UI ACL option + */ + private val uiAclsEnabled = sparkConf.getBoolean("spark.history.ui.acls.enable", false) + + private val detailedInfo = sparkConf.getBoolean(YarnHistoryProvider.OPTION_DETAILED_INFO, false) + private val NOT_STARTED = "Not Started" + + /* minimum interval between each check for event log updates */ + private val refreshInterval = sparkConf.getLong(YarnHistoryProvider.OPTION_MIN_REFRESH_INTERVAL, +YarnHistoryProvider.DEFAULT_MIN_REFRESH_INTERVAL_SECONDS) * 1000 + + /** + * Window limit in milliseconds + */ + private val
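Point 6 of the scaladoc above — every REST call invokes `maybeCheckHealth()` until the health check eventually succeeds — is a once-latched gating pattern. A hypothetical Python sketch of that pattern (class and method names invented, not the actual YarnHistoryProvider code):

```python
import threading

class HealthGatedClient:
    """Run a health probe before real calls, retrying the probe on every
    call until one succeeds. Sketch of the gating pattern only."""

    def __init__(self, probe):
        self._probe = probe          # callable returning True when healthy
        self._healthy = False
        self._lock = threading.Lock()

    def maybe_check_health(self):
        # Cheap fast path once any probe has succeeded.
        if self._healthy:
            return True
        with self._lock:
            if not self._healthy and self._probe():
                self._healthy = True
        return self._healthy

    def rest_call(self, fn):
        # Every call re-attempts the health check until it has passed once.
        self.maybe_check_health()
        return fn()
```

Once a probe succeeds the flag latches, so subsequent calls skip the probe entirely — matching the "until the health check eventually succeeds" wording.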
[GitHub] spark pull request: [SPARK-8618] Obtain hbase token retries many t...
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/7007#discussion_r33467275 --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala --- @@ -1258,10 +1258,17 @@ object Client extends Logging { logDebug("Attempting to fetch HBase security token.") val hbaseConf = confCreate.invoke(null, conf) -val token = obtainToken.invoke(null, hbaseConf).asInstanceOf[Token[TokenIdentifier]] -credentials.addToken(token.getService, token) - -logInfo("Added HBase security token to credentials.") +val hbaseConfGet = (param: String) => Option(confClass + .getMethod("get", classOf[java.lang.String]) + .invoke(hbaseConf, param)) +val zkQuorum = hbaseConfGet("hbase.zookeeper.quorum") --- End diff -- Any particular reason to use the zookeeper.quorum conf here? I'm guessing you just chose one to try.
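The diff above looks up an HBase config value reflectively so that Spark need not compile against HBase at all. A hypothetical Python analogue of the same pattern — dynamically resolving a `get` method and returning nothing when it (or the value) is absent; the `FakeHBaseConf` class is invented for illustration:

```python
def reflective_get(conf, key):
    """Look up `key` via a dynamically resolved `get` method.

    Returns None when the method or the value is absent, mirroring the
    Option(...) wrapping in the Scala diff. Sketch only.
    """
    getter = getattr(conf, "get", None)
    if not callable(getter):
        return None
    return getter(key)

class FakeHBaseConf:
    """Stand-in for an HBase configuration object (hypothetical)."""
    def __init__(self, values):
        self._values = values
    def get(self, key):
        return self._values.get(key)
```

Callers can then branch on `None` the way the Scala code branches on the `Option`, without ever naming the HBase classes at compile time.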
[GitHub] spark pull request: [SPARK-6263][MLLIB] Python MLlib API missing i...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5707#issuecomment-116646719 [Test build #35983 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35983/consoleFull) for PR 5707 at commit [`3fc27e7`](https://github.com/apache/spark/commit/3fc27e76efda90b575a714b1bf79495a553ddc86).
[GitHub] spark pull request: [Spark-8703] [ML] Add CountVectorizer as a ml ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7084#issuecomment-116652988 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-8590][SQL] add code gen for ExtractValu...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6982#issuecomment-116652902 Merged build started.
[GitHub] spark pull request: [SPARK-8702][WebUI]Avoid massive concating str...
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/7082#discussion_r33462112 --- Diff: core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala --- @@ -572,55 +572,55 @@ private[ui] class StagePage(parent: StagesTab) extends WebUIPage(stage) { val attempt = taskInfo.attempt val timelineObject = s - { - 'className': 'task task-assignment-timeline-object', - 'group': '$executorId', - 'content': 'div class=task-assignment-timeline-content' + - 'data-toggle=tooltip data-placement=top' + - 'data-html=true data-container=body' + - 'data-title=${sTask + index + (attempt + attempt + )}br' + - 'Status: ${taskInfo.status}br' + - 'Launch Time: ${UIUtils.formatDate(new Date(launchTime))}' + - '${ + |{ + |'className': 'task task-assignment-timeline-object', + |'group': '$executorId', + |'content': 'div class=task-assignment-timeline-content + |data-toggle=tooltip data-placement=top + |data-html=true data-container=body + |data-title=${sTask + index + (attempt + attempt + )}br + |Status: ${taskInfo.status}br + |Launch Time: ${UIUtils.formatDate(new Date(launchTime))} + |${ if (!taskInfo.running) { sbrFinish Time: ${UIUtils.formatDate(new Date(finishTime))} } else { } - }' + - 'brScheduler Delay: $schedulerDelay ms' + - 'brTask Deserialization Time: ${UIUtils.formatDuration(deserializationTime)}' + - 'brShuffle Read Time: ${UIUtils.formatDuration(shuffleReadTime)}' + - 'brExecutor Computing Time: ${UIUtils.formatDuration(executorComputingTime)}' + - 'brShuffle Write Time: ${UIUtils.formatDuration(shuffleWriteTime)}' + - 'brResult Serialization Time: ${UIUtils.formatDuration(serializationTime)}' + - 'brGetting Result Time: ${UIUtils.formatDuration(gettingResultTime)}' + - 'svg class=task-assignment-timeline-duration-bar' + - 'rect class=scheduler-delay-proportion ' + - 'x=$schedulerDelayProportionPos% y=0px height=26px' + - 'width=$schedulerDelayProportion%/rect' + - 'rect class=deserialization-time-proportion '+ - 
'x=$deserializationTimeProportionPos% y=0px height=26px' + - 'width=$deserializationTimeProportion%/rect' + - 'rect class=shuffle-read-time-proportion ' + - 'x=$shuffleReadTimeProportionPos% y=0px height=26px' + - 'width=$shuffleReadTimeProportion%/rect' + - 'rect class=executor-runtime-proportion ' + - 'x=$executorRuntimeProportionPos% y=0px height=26px' + - 'width=$executorComputingTimeProportion%/rect' + - 'rect class=shuffle-write-time-proportion ' + - 'x=$shuffleWriteTimeProportionPos% y=0px height=26px' + - 'width=$shuffleWriteTimeProportion%/rect' + - 'rect class=serialization-time-proportion ' + - 'x=$serializationTimeProportionPos% y=0px height=26px' + - 'width=$serializationTimeProportion%/rect' + - 'rect class=getting-result-time-proportion ' + - 'x=$gettingResultTimeProportionPos% y=0px height=26px' + - 'width=$gettingResultTimeProportion%/rect/svg', - 'start': new Date($launchTime), - 'end': new Date($finishTime) - } - + } + |brScheduler Delay: $schedulerDelay ms + |brTask Deserialization Time: ${UIUtils.formatDuration(deserializationTime)} + |brShuffle Read Time: ${UIUtils.formatDuration(shuffleReadTime)} + |brExecutor Computing Time: ${UIUtils.formatDuration(executorComputingTime)} + |brShuffle Write Time: ${UIUtils.formatDuration(shuffleWriteTime)} + |brResult Serialization Time: ${UIUtils.formatDuration(serializationTime)} + |brGetting Result Time: ${UIUtils.formatDuration(gettingResultTime)} + |svg class=task-assignment-timeline-duration-bar + |rect
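The diff above replaces dozens of `+`-concatenated string fragments with one interpolated multi-line string. The same trade-off in a small Python sketch (helper names hypothetical; the real code builds an HTML tooltip in Scala): both forms produce identical output, but the template form avoids the chain of intermediate strings and is far easier to read.

```python
def tooltip_concat(status, delay_ms):
    # Repeated '+' builds a fresh intermediate string at each step.
    s = "<div>"
    s = s + "Status: " + status + "<br>"
    s = s + "Scheduler Delay: " + str(delay_ms) + " ms"
    s = s + "</div>"
    return s

def tooltip_template(status, delay_ms):
    # One interpolated template: a single pass, no intermediate copies.
    return (f"<div>Status: {status}<br>"
            f"Scheduler Delay: {delay_ms} ms</div>")
```

For a page rendering thousands of task tooltips, collapsing the concatenation chain is exactly the kind of win the PR title ("Avoid massive concating strings") is after.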
[GitHub] spark pull request: [SPARK-8702][WebUI]Avoid massive concating str...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/7082#issuecomment-116656454 [Test build #979 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/979/console) for PR 7082 at commit [`b29231d`](https://github.com/apache/spark/commit/b29231d5fd1b5e1ff8bcc68d8dd8706dda99ee58). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-8702][WebUI]Avoid massive concating str...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7082#issuecomment-116657924 Merged build triggered.
[GitHub] spark pull request: [SPARK-8702][WebUI]Avoid massive concating str...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7082#issuecomment-116659288 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-8702][WebUI]Avoid massive concating str...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/7082#issuecomment-116659244 [Test build #35980 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35980/console) for PR 7082 at commit [`b29231d`](https://github.com/apache/spark/commit/b29231d5fd1b5e1ff8bcc68d8dd8706dda99ee58). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-8693][Project Infra]: profiles and goal...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7085#issuecomment-116661002 Merged build started.
[GitHub] spark pull request: [SPARK-8590][SQL] add code gen for ExtractValu...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6982#issuecomment-116661050 Merged build started.
[GitHub] spark pull request: [SPARK-4666] Improve YarnAllocator's parsing o...
Github user ilganeli commented on the pull request: https://github.com/apache/spark/pull/3525#issuecomment-116662486 @srowen @JoshRosen I think this should be refactored to use the updates from #5574 but I don't think #5574 resolves this on its own because of the need to handle the min/max allocation - my 2c.
[GitHub] spark pull request: [Spark-8703] [ML] Add CountVectorizer as a ml ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/7084#issuecomment-116663308 [Test build #35982 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35982/console) for PR 7084 at commit [`809fb59`](https://github.com/apache/spark/commit/809fb5947728de16c9addd5a8e27a41371394ff9). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class CountVectorizer (override val uid: String, vocabulary: Array[String])`
[GitHub] spark pull request: [SPARK-6797][SPARKR] Add support for YARN clus...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6743#issuecomment-116674814 [Test build #35989 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35989/console) for PR 6743 at commit [`0925e2a`](https://github.com/apache/spark/commit/0925e2a4d7c9e9f67cd95d0444bc8d20b888db57). * This patch **fails Scala style tests**. * This patch **does not merge cleanly**. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-8618] Obtain hbase token retries many t...
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/7007#issuecomment-116670907 What is the use case for adding hbase jars but not the configuration?
[GitHub] spark pull request: [SPARK-8625] [Core] Propagate user exceptions ...
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/7014#discussion_r33468301 --- Diff: core/src/main/scala/org/apache/spark/TaskEndReason.scala --- @@ -97,11 +101,17 @@ case class ExceptionFailure( description: String, stackTrace: Array[StackTraceElement], fullStackTrace: String, -metrics: Option[TaskMetrics]) +metrics: Option[TaskMetrics], +exception: Option[Throwable] = None) --- End diff -- I think we're definitely out of luck for binary compatibility, but I think @pwendell just wanted to preserve source compatibility (i.e. users may need to recompile, but they won't need to change their code at all). However, I don't think that is possible either. You would need to have another method like `def unapply(ef: ExceptionFailure): Option[(String, String, Array[StackTraceElement], String, Option[TaskMetrics])]` -- i.e., exactly the same as the built-in unapply, but without the final `Option[Throwable]` in the return type. But that isn't legal overloading -- it has the same set of arguments as the built-in `unapply`, just a different return type. Is there another way around this I'm not seeing? I agree we shouldn't change things willy-nilly just because it's `@DeveloperApi`, but IMO this change is worth it.
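The destructuring-compatibility problem squito describes has a close Python analogue: adding a defaulted field keeps construction source-compatible, but positional unpacking — the analogue of the case class's built-in `unapply` — now sees a different arity. A sketch with hypothetical record shapes (these are not Spark's actual classes):

```python
from collections import namedtuple

# Original shape of a (hypothetical) failure record.
ExceptionFailureV1 = namedtuple("ExceptionFailure", ["description", "metrics"])

# New shape with an extra defaulted field: old constructor calls still work...
ExceptionFailureV2 = namedtuple(
    "ExceptionFailure", ["description", "metrics", "exception"],
    defaults=[None])  # default for the new `exception` field

ef = ExceptionFailureV2("boom", None)   # old-style two-argument call runs fine

# ...but two-element destructuring, the analogue of the old unapply, breaks:
try:
    description, metrics = ef
    arity_compatible = True
except ValueError:
    arity_compatible = False
```

Here `arity_compatible` ends up `False`: existing pattern-match sites must change even though construction sites do not, which is exactly the asymmetry the review comment is wrestling with.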
[GitHub] spark pull request: [SPARK-8657] [YARN] Simplify method addResourc...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/7053#issuecomment-116591124 [Test build #976 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/976/consoleFull) for PR 7053 at commit [`75b457e`](https://github.com/apache/spark/commit/75b457ecfebd379221f26f60277c0346081017e5).
[GitHub] spark pull request: [SPARK-7871][SQL]Improve the outputPartitionin...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6413#issuecomment-116591217 [Test build #35975 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35975/consoleFull) for PR 6413 at commit [`e59b4d4`](https://github.com/apache/spark/commit/e59b4d4c799e0487a1ec4557456ee031d0b30b13).
[GitHub] spark pull request: [SPARK-8702][WebUI]Avoid massive concating str...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/7082#issuecomment-116621564 [Test build #979 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/979/consoleFull) for PR 7082 at commit [`b29231d`](https://github.com/apache/spark/commit/b29231d5fd1b5e1ff8bcc68d8dd8706dda99ee58).
[GitHub] spark pull request: [SPARK-8702][WebUI]Avoid massive concating str...
Github user sarutak commented on the pull request: https://github.com/apache/spark/pull/7082#issuecomment-116630589 Oh, I see. Could you file another one as a separate issue? Thanks!
[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/7080#issuecomment-116591832 [Test build #35976 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35976/consoleFull) for PR 7080 at commit [`588c75f`](https://github.com/apache/spark/commit/588c75f714372b6da4dd20fa7d006afe399fa8e2).
[GitHub] spark pull request: [SPARK-8590][SQL] add code gen for ExtractValu...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6982#issuecomment-116591859 [Test build #35979 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35979/consoleFull) for PR 6982 at commit [`c085e60`](https://github.com/apache/spark/commit/c085e60c85ef940c80d9952e44431a9be2eca74b).
[GitHub] spark pull request: [SPARK-8701][Streaming][WebUI] Add input metad...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/7081#issuecomment-116621962 [Test build #35977 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35977/console) for PR 7081 at commit [`d496ae9`](https://github.com/apache/spark/commit/d496ae9eb042ee81a9b497cf4f5ebaf87bb38337).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-8701][Streaming][WebUI] Add input metad...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7081#issuecomment-116621984 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-8625] [Core] Propagate user exceptions ...
Github user tomwhite commented on a diff in the pull request: https://github.com/apache/spark/pull/7014#discussion_r33454836

--- Diff: core/src/main/scala/org/apache/spark/TaskEndReason.scala ---
@@ -97,11 +101,17 @@ case class ExceptionFailure(
     description: String,
     stackTrace: Array[StackTraceElement],
     fullStackTrace: String,
-    metrics: Option[TaskMetrics])
+    metrics: Option[TaskMetrics],
+    exception: Option[Throwable] = None)
--- End diff --

@pwendell You're right - it will break pattern matching on the class. My understanding is that adding an `unapply` method won't help, since existing compiled pattern matches won't use it (it would only help user code). Case classes don't play well with binary compatibility, it seems. To do this compatibly, we'd have to add another case class, say `ExceptionFailureWithCause`, and a trait that both it and `ExceptionFailure` extend with the common fields. Then everywhere that handles `ExceptionFailure` would also have to handle `ExceptionFailureWithCause`. Having said all that, this class is marked `@DeveloperApi`, so it's within the contract to change it. The `fullStackTrace` field was added last November, for example. I can understand the general reluctance to change code even if it is marked as being for developers only, but it's not clear whether the workaround to preserve binary compatibility here is worth the complexity it adds.
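The trait-based workaround discussed above could be sketched roughly as follows. All names besides `ExceptionFailure` are hypothetical, the fields are trimmed for brevity, and this is not the actual Spark API; it is only a minimal illustration of how a shared trait keeps old pattern matches compiling while a new case class carries the extra field.

```scala
// Shared trait exposing the common fields (hypothetical name).
sealed trait TaskFailure {
  def className: String
  def description: String
}

// Existing case class, left unchanged so code compiled against it keeps working.
case class ExceptionFailure(
    className: String,
    description: String) extends TaskFailure

// New case class carrying the extra field (hypothetical name).
case class ExceptionFailureWithCause(
    className: String,
    description: String,
    exception: Option[Throwable]) extends TaskFailure

// The cost of this approach: every handler must now match both shapes.
def render(f: TaskFailure): String = f match {
  case ExceptionFailure(cn, desc) =>
    s"$cn: $desc"
  case ExceptionFailureWithCause(cn, desc, cause) =>
    s"$cn: $desc (cause: ${cause.map(_.getMessage).getOrElse("unknown")})"
}
```

The duplication in `render` is exactly the complexity the comment above weighs against simply changing the `@DeveloperApi` class in place.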
[GitHub] spark pull request: [Spark-8703] [ML] Add CountVectorizer as a ml ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7084#issuecomment-116628270 Merged build started.
[GitHub] spark pull request: [Spark-8703] [ML] Add CountVectorizer as a ml ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/7084#issuecomment-116628338 [Test build #35981 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35981/consoleFull) for PR 7084 at commit [`7c61fb3`](https://github.com/apache/spark/commit/7c61fb32801ed802b8792663b1769c9eddd1346e).
[GitHub] spark pull request: [Spark-8703] [ML] Add CountVectorizer as a ml ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7084#issuecomment-116628258 Merged build triggered.
[GitHub] spark pull request: [SPARK-7871][SQL]Improve the outputPartitionin...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6413#issuecomment-116628777 [Test build #35975 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35975/console) for PR 6413 at commit [`e59b4d4`](https://github.com/apache/spark/commit/e59b4d4c799e0487a1ec4557456ee031d0b30b13).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `case class ClusteredDistribution(`
  * `sealed case class Partitioning(`
[GitHub] spark pull request: [SPARK-7871][SQL]Improve the outputPartitionin...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6413#issuecomment-116628800 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-8702][WebUI]Avoid massive concating str...
Github user sarutak commented on a diff in the pull request: https://github.com/apache/spark/pull/7082#discussion_r33457314

--- Diff: core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala ---
@@ -572,55 +572,55 @@ private[ui] class StagePage(parent: StagesTab) extends WebUIPage("stage") {
       val attempt = taskInfo.attempt
       val timelineObject =
         s"""
-           {
-             'className': 'task task-assignment-timeline-object',
-             'group': '$executorId',
-             'content': '<div class="task-assignment-timeline-content"' +
-               'data-toggle="tooltip" data-placement="top"' +
-               'data-html="true" data-container="body"' +
-               'data-title="${s"Task " + index + " (attempt " + attempt + ")"}<br>' +
-               'Status: ${taskInfo.status}<br>' +
-               'Launch Time: ${UIUtils.formatDate(new Date(launchTime))}' +
-               '${
+           |{
+             |'className': 'task task-assignment-timeline-object',
+             |'group': '$executorId',
+             |'content': '<div class="task-assignment-timeline-content"
+               |data-toggle="tooltip" data-placement="top"
+               |data-html="true" data-container="body"
+               |data-title="${s"Task " + index + " (attempt " + attempt + ")"}<br>
+               |Status: ${taskInfo.status}<br>
+               |Launch Time: ${UIUtils.formatDate(new Date(launchTime))}
+               |${
                   if (!taskInfo.running) {
                     s"""<br>Finish Time: ${UIUtils.formatDate(new Date(finishTime))}"""
                   } else {
                     ""
                   }
-               }' +
-               '<br>Scheduler Delay: $schedulerDelay ms' +
-               '<br>Task Deserialization Time: ${UIUtils.formatDuration(deserializationTime)}' +
-               '<br>Shuffle Read Time: ${UIUtils.formatDuration(shuffleReadTime)}' +
-               '<br>Executor Computing Time: ${UIUtils.formatDuration(executorComputingTime)}' +
-               '<br>Shuffle Write Time: ${UIUtils.formatDuration(shuffleWriteTime)}' +
-               '<br>Result Serialization Time: ${UIUtils.formatDuration(serializationTime)}' +
-               '<br>Getting Result Time: ${UIUtils.formatDuration(gettingResultTime)}' +
-               '<svg class="task-assignment-timeline-duration-bar">' +
-               '<rect class="scheduler-delay-proportion" ' +
-                 'x="$schedulerDelayProportionPos%" y="0px" height="26px"' +
-                 'width="$schedulerDelayProportion%"></rect>' +
-               '<rect class="deserialization-time-proportion" ' +
-                 'x="$deserializationTimeProportionPos%" y="0px" height="26px"' +
-                 'width="$deserializationTimeProportion%"></rect>' +
-               '<rect class="shuffle-read-time-proportion" ' +
-                 'x="$shuffleReadTimeProportionPos%" y="0px" height="26px"' +
-                 'width="$shuffleReadTimeProportion%"></rect>' +
-               '<rect class="executor-runtime-proportion" ' +
-                 'x="$executorRuntimeProportionPos%" y="0px" height="26px"' +
-                 'width="$executorComputingTimeProportion%"></rect>' +
-               '<rect class="shuffle-write-time-proportion" ' +
-                 'x="$shuffleWriteTimeProportionPos%" y="0px" height="26px"' +
-                 'width="$shuffleWriteTimeProportion%"></rect>' +
-               '<rect class="serialization-time-proportion" ' +
-                 'x="$serializationTimeProportionPos%" y="0px" height="26px"' +
-                 'width="$serializationTimeProportion%"></rect>' +
-               '<rect class="getting-result-time-proportion" ' +
-                 'x="$gettingResultTimeProportionPos%" y="0px" height="26px"' +
-                 'width="$gettingResultTimeProportion%"></rect></svg>',
-             'start': new Date($launchTime),
-             'end': new Date($finishTime)
-           }
-
+           }
+           |<br>Scheduler Delay: $schedulerDelay ms
+           |<br>Task Deserialization Time: ${UIUtils.formatDuration(deserializationTime)}
+           |<br>Shuffle Read Time: ${UIUtils.formatDuration(shuffleReadTime)}
+           |<br>Executor Computing Time: ${UIUtils.formatDuration(executorComputingTime)}
+           |<br>Shuffle Write Time: ${UIUtils.formatDuration(shuffleWriteTime)}
+           |<br>Result Serialization Time: ${UIUtils.formatDuration(serializationTime)}
+           |<br>Getting Result Time: ${UIUtils.formatDuration(gettingResultTime)}
+           |<svg class="task-assignment-timeline-duration-bar">
+           |<rect
[GitHub] spark pull request: [SPARK-8657] [YARN] Fail to upload conf archiv...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/7055#issuecomment-116630838 [Test build #977 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/977/consoleFull) for PR 7055 at commit [`cbae84e`](https://github.com/apache/spark/commit/cbae84e72cc8c1949e96b0d17c7ff38ca6da7281).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-7735] [pyspark] Raise Exception on non-...
Github user megatron-me-uk commented on a diff in the pull request: https://github.com/apache/spark/pull/6262#discussion_r33448014

--- Diff: python/pyspark/tests.py ---
@@ -874,6 +874,15 @@ def test_sortByKey_uses_all_partitions_not_only_first_and_last(self):
         for size in sizes:
             self.assertGreater(size, 0)

+    def test_pipe_functions(self):
+        data = ['1', '2', '3']
+        rdd = self.sc.parallelize(data)
+        with QuietTest(self.sc):
+            self.assertRaises(Py4JJavaError, rdd.pipe('cc').collect)
+        result = rdd.pipe('cat').collect()
--- End diff --

This issue can be worked around using `grep target; test $? -le 1`, although maybe not the best solution.
[GitHub] spark pull request: [SPARK-8702][WebUI]Avoid massive concating str...
Github user zsxwing commented on the pull request: https://github.com/apache/spark/pull/7082#issuecomment-116612731 @sarutak I also found another issue in the console. Because `System.currentTimeMillis()` is not accurate for tasks that only need several milliseconds, sometimes `totalExecutionTime` in `makeTimeline` will be 0. If `totalExecutionTime` is 0, the following error appears in the console. ![screen shot 2015-06-29 at 7 08 55 pm](https://cloud.githubusercontent.com/assets/1000778/8406776/5cd38e04-1e92-11e5-89f2-0c5134fe4b6b.png)
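The zero-total case described above could be guarded with a minimal sketch along these lines. The helper name `proportion` and its placement are hypothetical, not the actual `StagePage`/`makeTimeline` code; the point is only that dividing a task phase's duration by a zero total must be short-circuited before the value reaches the rendered SVG widths.

```scala
// Hypothetical guard: when the measured total execution time is zero
// (coarse System.currentTimeMillis() ticks), report a 0% proportion
// instead of producing Infinity/NaN from the division.
def proportion(part: Long, total: Long): Double =
  if (total <= 0) 0.0 else part.toDouble * 100 / total
```

With this guard, `proportion(schedulerDelay, totalExecutionTime)` yields 0.0 for a zero total rather than a NaN width that breaks the timeline rendering in the browser console.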
[GitHub] spark pull request: [SPARK-8701][Streaming][WebUI] Add input metad...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/7081#issuecomment-116592009 [Test build #35977 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35977/consoleFull) for PR 7081 at commit [`d496ae9`](https://github.com/apache/spark/commit/d496ae9eb042ee81a9b497cf4f5ebaf87bb38337).
[GitHub] spark pull request: [SPARK-8214][SQL]Add function hex
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6976#issuecomment-116591873 [Test build #35978 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35978/consoleFull) for PR 6976 at commit [`e218d1b`](https://github.com/apache/spark/commit/e218d1b8b6a92d4dd566bb3817b41c09c15b1614).
[GitHub] spark pull request: [SPARK-8374] [YARN] Job frequently hangs after ...
GitHub user xuchenCN opened a pull request: https://github.com/apache/spark/pull/7083 [SPARK-8374] [YARN] Job frequently hangs after YARN preemption

Issue description: [SPARK-8374](https://issues.apache.org/jira/browse/SPARK-8374). Applications starve because of YARN scheduler preemption; we should keep the number of running containers. We don't need to reason about the resources; that is the YARN scheduler's job.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/xuchenCN/spark SPARK-8374

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/7083.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #7083

commit f2667f274d1ee3d1c643f19221b9e83141f2137a
Author: xuchenCN chenxu198...@gmail.com
Date: 2015-06-29T10:53:04Z
[SPARK-8374] [YARN] Job frequently hangs after YARN preemption

commit 9555dd52c66efc5fa4b748ff9253309ac70b9c0a
Author: xuchenCN chenxu198...@gmail.com
Date: 2015-06-29T11:04:42Z
[SPARK-8374] [YARN] Job frequently hangs after YARN preemption
[GitHub] spark pull request: [SPARK-8702][WebUI]Avoid massive concating str...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/7082#issuecomment-116609513 [Test build #35980 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35980/consoleFull) for PR 7082 at commit [`b29231d`](https://github.com/apache/spark/commit/b29231d5fd1b5e1ff8bcc68d8dd8706dda99ee58).
[GitHub] spark pull request: [SPARK-8214][SQL]Add function hex
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6976#issuecomment-116629588 Merged build finished. Test PASSed.
[GitHub] spark pull request: SPARK-7889 Jobs progress of apps on complete p...
Github user steveloughran commented on the pull request: https://github.com/apache/spark/pull/6935#issuecomment-116592795 I like the Selenium test; it could be combined with the provider I wrote, which lets us programmatically create our own history, so we can add changes to look for.
[GitHub] spark pull request: [SPARK-8214][SQL]Add function hex
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6976#issuecomment-116629493 [Test build #35978 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35978/console) for PR 6976 at commit [`e218d1b`](https://github.com/apache/spark/commit/e218d1b8b6a92d4dd566bb3817b41c09c15b1614).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `case class Hex(child: Expression)`
  * `case class Hypot(left: Expression, right: Expression)`
[GitHub] spark pull request: [SPARK-8657] [YARN] Fail to upload conf archiv...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/7055#issuecomment-116591021 [Test build #977 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/977/consoleFull) for PR 7055 at commit [`cbae84e`](https://github.com/apache/spark/commit/cbae84e72cc8c1949e96b0d17c7ff38ca6da7281).
[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7080#issuecomment-116590952 Merged build started.