[GitHub] spark pull request: [SPARK-6972][SQL] Add Coalesce to DataFrame
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5545#issuecomment-93866737 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30442/ --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [BUILD] Support building with SBT on encrypted...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5546#issuecomment-93877539 [Test build #30452 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30452/consoleFull) for PR 5546 at commit [`031c602`](https://github.com/apache/spark/commit/031c6025113c064b6fc0b5895b1830f223f6cf55).
[GitHub] spark pull request: [SPARK-5623][GraphX] Replace an obsolete mapRe...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4402#issuecomment-93873823 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30450/
[GitHub] spark pull request: [SPARK-6963][CORE]Flaky test: o.a.s.ContextCle...
GitHub user witgo opened a pull request: https://github.com/apache/spark/pull/5548 [SPARK-6963][CORE]Flaky test: o.a.s.ContextCleanerSuite automatically cleanup checkpoint cc @andrewor14 You can merge this pull request into a Git repository by running: $ git pull https://github.com/witgo/spark SPARK-6963 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/5548.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #5548 commit b08b3c994902629b54808c27841335fb6ca2715d Author: GuoQiang Li wi...@qq.com Date: 2015-04-17T01:56:11Z Flaky test: o.a.s.ContextCleanerSuite automatically cleanup checkpoint
[GitHub] spark pull request: [SPARK-6368][SQL] Build a specialized serializ...
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/5497#discussion_r28563536

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLConf.scala ---
@@ -139,6 +141,8 @@ private[sql] class SQLConf extends Serializable {
    */
   private[spark] def codegenEnabled: Boolean = getConf(CODEGEN_ENABLED, "false").toBoolean
 
+  private[spark] def useSqlSerializer2: Boolean = getConf(USE_SQL_SERIALIZER2, "false").toBoolean
--- End diff --

Also, do we want to turn it on by default? It's easy to turn off if we find bugs.
[GitHub] spark pull request: [SPARK-6350][Mesos] Make mesosExecutorCores co...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5063#issuecomment-93882452 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30448/
[GitHub] spark pull request: [SPARK-6957] [SPARK-6958] [SQL] improve API co...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5544#issuecomment-93865283 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30440/
[GitHub] spark pull request: [SPARK-6065] [MLlib] Optimize word2vec.findSyn...
Github user MechCoder commented on a diff in the pull request: https://github.com/apache/spark/pull/5467#discussion_r28573221

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala ---
@@ -479,9 +492,16 @@ class Word2VecModel private[mllib] (
    */
   def findSynonyms(vector: Vector, num: Int): Array[(String, Double)] = {
     require(num > 0, "Number of similar words should be > 0")
-    // TODO: optimize top-k
-    val fVector = vector.toArray.map(_.toFloat)
-    model.mapValues(vec => cosineSimilarity(fVector, vec))
+
+    val numWords = wordVectors.numRows
+    val cosineVec = Vectors.zeros(numWords).asInstanceOf[DenseVector]
+    BLAS.gemv(1.0, wordVectors, vector.asInstanceOf[DenseVector], 0.0, cosineVec)
+
+    // Need not divide with the norm of the given vector since it is constant.
+    val updatedCosines = indexedModel.map { case (_, ind) =>
--- End diff --

Do you mean that when I do a `dict.map`, the ordering need not be the same as that of the dict?
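The patch under review replaces per-word cosine computations with a single matrix-vector multiply, and the diff's comment observes that dividing by the query vector's norm is unnecessary for ranking because that norm is constant across all words. Here is a dependency-free Python sketch of that idea; the function name and data layout are illustrative, not the patch's actual API:

```python
import math

def find_synonyms(word_vectors, vocab, query, num):
    """Rank `vocab` words by cosine similarity to `query`.

    `word_vectors` is a list of equal-length float lists, one row per
    word -- a stand-in for the model's word matrix. Computing all dot
    products in one sweep mirrors the single BLAS.gemv call in the
    diff. Division by the query norm is kept here for clarity, but as
    the diff notes it is a constant factor and does not change the
    ordering of the results.
    """
    query_norm = math.sqrt(sum(x * x for x in query))
    scored = []
    for word, row in zip(vocab, word_vectors):
        dot = sum(a * b for a, b in zip(row, query))
        row_norm = math.sqrt(sum(x * x for x in row))
        scored.append((word, dot / (row_norm * query_norm)))
    # A full sort; a real top-k would use a bounded heap instead.
    scored.sort(key=lambda t: -t[1])
    return scored[:num]
```

The single-sweep structure is what makes the BLAS formulation possible: all similarities become one matrix-vector product, after which only the top-k selection remains.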
[GitHub] spark pull request: [SPARK-6975][Yarn] Fix argument validation err...
GitHub user jerryshao opened a pull request: https://github.com/apache/spark/pull/5551 [SPARK-6975][Yarn] Fix argument validation error The `numExecutors` validation fails when dynamic allocation is enabled with the default configuration. Details can be seen in [SPARK-6975](https://issues.apache.org/jira/browse/SPARK-6975). @sryza, please help me review this; I'm not sure this is the correct approach. I think you changed this part previously :) You can merge this pull request into a Git repository by running: $ git pull https://github.com/jerryshao/apache-spark SPARK-6975 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/5551.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #5551 commit 77bdcbdc00522e76f9394c68d769f35c15af09a6 Author: jerryshao saisai.s...@intel.com Date: 2015-04-17T07:08:16Z Fix argument validation error
[GitHub] spark pull request: [SPARK-6065] [MLlib] Optimize word2vec.findSyn...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5467#issuecomment-93914210 [Test build #30464 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30464/consoleFull) for PR 5467 at commit [`64575b0`](https://github.com/apache/spark/commit/64575b0282b350facc93340fbf653b38b0121b1a).
[GitHub] spark pull request: [SPARK-6953] [PySpark] speed up python tests
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5427#issuecomment-93897975 [Test build #30458 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30458/consoleFull) for PR 5427 at commit [`2654bfd`](https://github.com/apache/spark/commit/2654bfda79da9d12c897bc144da2b2137a56c68c).
[GitHub] spark pull request: [Project Infra] SPARK-1684: Merge script shoul...
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/5149#discussion_r28569271

--- Diff: dev/merge_spark_pr.py ---
@@ -286,68 +281,137 @@ def resolve_jira_issues(title, merge_branches, comment):
         resolve_jira_issue(merge_branches, comment, jira_id)
 
-branches = get_json("%s/branches" % GITHUB_API_BASE)
-branch_names = filter(lambda x: x.startswith("branch-"), [x['name'] for x in branches])
-# Assumes branch names can be sorted lexicographically
-latest_branch = sorted(branch_names, reverse=True)[0]
-
-pr_num = raw_input("Which pull request would you like to merge? (e.g. 34): ")
-pr = get_json("%s/pulls/%s" % (GITHUB_API_BASE, pr_num))
-pr_events = get_json("%s/issues/%s/events" % (GITHUB_API_BASE, pr_num))
-
-url = pr["url"]
-title = pr["title"]
-body = pr["body"]
-target_ref = pr["base"]["ref"]
-user_login = pr["user"]["login"]
-base_ref = pr["head"]["ref"]
-pr_repo_desc = "%s/%s" % (user_login, base_ref)
-
-# Merged pull requests don't appear as merged in the GitHub API;
-# Instead, they're closed by asfgit.
-merge_commits = \
-    [e for e in pr_events if e["actor"]["login"] == "asfgit" and e["event"] == "closed"]
-
-if merge_commits:
-    merge_hash = merge_commits[0]["commit_id"]
-    message = get_json("%s/commits/%s" % (GITHUB_API_BASE, merge_hash))["commit"]["message"]
-
-    print "Pull request %s has already been merged, assuming you want to backport" % pr_num
-    commit_is_downloaded = run_cmd(['git', 'rev-parse', '--quiet', '--verify',
+def standardize_jira_ref(text):
+    """
+    Standardize the [MODULE] SPARK-XXXX prefix
+    Converts "[SPARK-XXX][mllib] Issue", "[MLLib] SPARK-XXX. Issue" or
+    "SPARK XXX [MLLIB]: Issue" to "[MLLIB] SPARK-XXX: Issue"
+
+    >>> standardize_jira_ref("[SPARK-5821] [SQL] ParquetRelation2 CTAS should check if delete is successful")
+    '[SQL] SPARK-5821: ParquetRelation2 CTAS should check if delete is successful'
+    >>> standardize_jira_ref("[SPARK-4123][Project Infra][WIP]: Show new dependencies added in pull requests")
+    '[PROJECT INFRA] [WIP] SPARK-4123: Show new dependencies added in pull requests'
+    >>> standardize_jira_ref("[MLlib] Spark 5954: Top by key")
+    '[MLLIB] SPARK-5954: Top by key'
+    """
+    # If the string is compliant, no need to process any further
+    if (re.search(r'\[[A-Z0-9_]+\] SPARK-[0-9]{3,5}: \S+', text)):
+        return text
+
+    # Extract JIRA ref(s):
+    jira_refs = deque()
+    pattern = re.compile(r'(SPARK[-\s]*[0-9]{3,5})', re.IGNORECASE)
+    while (pattern.search(text) is not None):
+        ref = pattern.search(text).groups()[0]
+        # Replace any whitespace with a dash & convert to uppercase
+        jira_refs.append(re.sub(r'\s+', '-', ref.upper()))
+        text = text.replace(ref, '')
+
+    # Extract spark component(s):
+    components = deque()
+    # Look for alphanumeric chars, spaces, and/or commas
+    pattern = re.compile(r'(\[[\w\s,]+\])', re.IGNORECASE)
+    while (pattern.search(text) is not None):
+        component = pattern.search(text).groups()[0]
+        # Convert to uppercase
+        components.append(component.upper())
+        text = text.replace(component, '')
+
+    # Cleanup remaining symbols:
+    pattern = re.compile(r'^\W+(.*)', re.IGNORECASE)
+    if (pattern.search(text) is not None):
+        text = pattern.search(text).groups()[0]
+
+    # Assemble full text (module(s), JIRA ref(s), remaining text)
+    component_text = ' '.join(components).strip()
+    jira_ref_text = ' '.join(jira_refs).strip()
+
+    if (len(jira_ref_text) < 1 and len(component_text) < 1):
+        clean_text = text.strip()
+    elif (len(jira_ref_text) < 1):
+        clean_text = component_text + ' ' + text.strip()
+    elif (len(component_text) < 1):
+        clean_text = jira_ref_text + ': ' + text.strip()
+    else:
+        clean_text = component_text + ' ' + jira_ref_text + ': ' + text.strip()
+
+    return clean_text
+
+def main():
+    os.chdir(SPARK_HOME)
--- End diff --

Just to be sure, since it's a bit tricky with the diff here - all of this is simply re-organization, correct?
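For readers who want to try the title-normalization logic outside the merge script, here is a self-contained Python sketch of the same approach. It follows the behavior implied by the doctests in the diff but simplifies the final assembly step, so treat it as an approximation rather than the PR's exact code:

```python
import re
from collections import deque

def standardize_jira_ref(text):
    """Normalize a PR title to '[MODULE] SPARK-XXXX: Issue'.

    Output format taken from the doctests in the diff; the assembly
    at the end is simplified relative to the patch.
    """
    # Extract JIRA ref(s), normalizing 'Spark 5954' -> 'SPARK-5954'.
    jira_refs = deque()
    jira_pattern = re.compile(r'(SPARK[-\s]*[0-9]{3,5})', re.IGNORECASE)
    while jira_pattern.search(text):
        ref = jira_pattern.search(text).groups()[0]
        jira_refs.append(re.sub(r'\s+', '-', ref.upper()))
        text = text.replace(ref, '')

    # Extract bracketed component tag(s) and uppercase them.
    components = deque()
    comp_pattern = re.compile(r'(\[[\w\s,]+\])')
    while comp_pattern.search(text):
        component = comp_pattern.search(text).groups()[0]
        components.append(component.upper())
        text = text.replace(component, '')

    # Strip leftover punctuation at the start of the remaining title.
    m = re.search(r'^\W+(.*)', text)
    if m:
        text = m.groups()[0]

    # Reassemble: component(s), then JIRA ref(s), then ': ' and title.
    prefix = ' '.join(p for p in
                      (' '.join(components).strip(),
                       ' '.join(jira_refs).strip()) if p)
    return (prefix + ': ' + text.strip()) if prefix else text.strip()
```

Each `while` loop terminates because the matched substring is removed from `text` on every iteration, the same invariant the diff relies on.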
[GitHub] spark pull request: [SPARK-6113] [ml] Stabilize DecisionTree API
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5530#issuecomment-93864879 [Test build #30439 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30439/consoleFull) for PR 5530 at commit [`6aae255`](https://github.com/apache/spark/commit/6aae25587cdcadc0e5d68078ca77d0cdee59e6e4).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `case class Params(`
  * `sealed abstract class Node extends Serializable`
  * `sealed trait Split extends Serializable`
  * `final class CategoricalSplit(`
  * `final class ContinuousSplit(override val featureIndex: Int, val threshold: Double) extends Split`
  * `trait DecisionTreeModel`
* This patch does not change any dependencies.
[GitHub] spark pull request: [SQL] SPARK-6548: Adding stddev to DataFrame f...
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/5357#issuecomment-93868139 /cc @rxin
[GitHub] spark pull request: [SPARK-6368][SQL] Build a specialized serializ...
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/5497#discussion_r28563376

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLConf.scala ---
@@ -139,6 +141,8 @@ private[sql] class SQLConf extends Serializable {
    */
   private[spark] def codegenEnabled: Boolean = getConf(CODEGEN_ENABLED, "false").toBoolean
 
+  private[spark] def useSqlSerializer2: Boolean = getConf(USE_SQL_SERIALIZER2, "false").toBoolean
--- End diff --

This seems pretty hard, as there is no standard interface to the serializer constructor. Perhaps we should document this and say it is experimental?
[GitHub] spark pull request: [SPARK-5213] [SQL] Pluggable SQL Parser Suppor...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4015#issuecomment-93900302 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30460/
[GitHub] spark pull request: [SPARK-3454] [WIP] separate json endpoints for...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/4435#issuecomment-93898160 I'd like to finish reviewing this, but I keep getting pre-empted by other work, so instead I'll leave a list of things that I would look at / check when reviewing this (to let other folks pick up and finish the review). This looks like it's in pretty good shape overall, though, so hopefully it won't be too much work to finish this. Here's what I'd look at in any final review passes:
- Has the visibility of new classes / methods / interfaces been restricted to the narrowest possible scope (i.e. are we unintentionally exposing internal functionality)? If something _has_ to be public but is not intended to be stable / available to users, we should add a documentation comment to explain this.
- Have accesses to listeners been properly synchronized?
- Are there any code style nits that we should clean up? I noticed a bunch of minor indentation problems, but don't really have time to comment individually.
- I'd take a look at how we handle timestamps in JSON, just to double-check that we're exposing them in an easy-to-consume format.
- Documentation-wise, are there any confusing parts of the code that need to be documented?
- Can we add a top-level Javadoc comment somewhere to explain our overall strategy for handling JSON compatibility, etc., and maybe a checklist / rules to follow when changing these classes? There's something similar to this in one of the JSONProtocol classes, which might be nice to model this on.

I'd also manually test this in a spark-shell.
[GitHub] spark pull request: [SPARK-5352][GraphX] Add getPartitionStrategy ...
GitHub user maropu opened a pull request: https://github.com/apache/spark/pull/5549 [SPARK-5352][GraphX] Add getPartitionStrategy in Graph

Graph remembers the partition strategy applied in partitionBy() and returns it via getPartitionStrategy(). This is useful in situations like the following:

    val g1 = GraphLoader.edgeListFile(sc, "graph.txt")
    val g2 = g1.partitionBy(EdgePartition2D, 2)
    // Modify (e.g., add, contract, ...) edges in g2
    val newEdges = ...
    // Re-build a new graph based on g2
    val g3 = Graph(g1.vertices, newEdges)
    // Partition edges in a similar way to g2
    val g4 = g3.partitionBy(g2.getPartitionStrategy, 2)

You can merge this pull request into a Git repository by running: $ git pull https://github.com/maropu/spark PartitionStrategyInGraph Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/5549.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #5549 commit 084ae5a80c96cb481c2b7d3f5aced99b09619057 Author: Takeshi YAMAMURO linguin@gmail.com Date: 2015-04-17T04:05:13Z Add getPartitionStrategy commit c46d126a044d089f70b1c38b3cdb4979b6ffe589 Author: Takeshi YAMAMURO linguin@gmail.com Date: 2015-04-17T04:54:38Z Add an new entry in MimaExlucdes.scala
[GitHub] spark pull request: [SPARK-6521][Core]executors in the same node r...
Github user maropu commented on the pull request: https://github.com/apache/spark/pull/5178#issuecomment-93917004 @viper-kun What's the status of this patch? If you don't make further updates, I'd like to brush up this patch.
[GitHub] spark pull request: [SPARK-5213] [SQL] Pluggable SQL Parser Suppor...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4015#issuecomment-93900300 [Test build #30460 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30460/consoleFull) for PR 4015 at commit [`12249a2`](https://github.com/apache/spark/commit/12249a2ea065effc00c8ad67a3d2f9eef5e8878b).
* This patch **fails to build**.
* This patch merges cleanly.
* This patch adds no public classes.
* This patch does not change any dependencies.
[GitHub] spark pull request: [SPARK-5352][GraphX] Add getPartitionStrategy ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5549#issuecomment-93893522 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-5338][MESOS] Add cluster mode support f...
Github user tnachen commented on a diff in the pull request: https://github.com/apache/spark/pull/5144#discussion_r28573550

--- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala ---
@@ -0,0 +1,614 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.scheduler.cluster.mesos
+
+import java.io.File
+import java.util.concurrent.locks.ReentrantLock
+import java.util.{Collections, Date, List => JList}
+
+import org.apache.mesos.Protos.Environment.Variable
+import org.apache.mesos.Protos.TaskStatus.Reason
+import org.apache.mesos.Protos.{TaskState => MesosTaskState, _}
+import org.apache.mesos.{Scheduler, SchedulerDriver}
+import org.apache.spark.deploy.mesos.MesosDriverDescription
+import org.apache.spark.deploy.rest.{CreateSubmissionResponse, KillSubmissionResponse, SubmissionStatusResponse}
+import org.apache.spark.metrics.MetricsSystem
+import org.apache.spark.util.Utils
+import org.apache.spark.{SecurityManager, SparkConf, SparkException, TaskState}
+
+import scala.collection.JavaConversions._
+import scala.collection.mutable
+import scala.collection.mutable.ArrayBuffer
+
+
+/**
+ * Tracks the current state of a Mesos Task that runs a Spark driver.
+ * @param submission Submitted driver description from
+ *                   [[org.apache.spark.deploy.rest.mesos.MesosRestServer]]
+ * @param taskId Mesos TaskID generated for the task
+ * @param slaveId Slave ID that the task is assigned to
+ * @param taskState The last known task status update.
+ * @param startDate The date the task was launched
+ */
+private[spark] class MesosClusterTaskState(
+    val submission: MesosDriverDescription,
+    val taskId: TaskID,
+    val slaveId: SlaveID,
+    var taskState: Option[TaskStatus],
+    var startDate: Date)
+  extends Serializable {
+
+  def copy(): MesosClusterTaskState = {
+    new MesosClusterTaskState(
+      submission, taskId, slaveId, taskState, startDate)
+  }
+}
+
+/**
+ * Tracks the retry state of a driver, which includes the next time it should be scheduled
+ * and necessary information to do exponential backoff.
+ * This class is not thread-safe, and we expect the caller to handle synchronizing state.
+ * @param lastFailureStatus Last Task status when it failed.
+ * @param retries Number of times it has retried.
+ * @param nextRetry Next retry time to be scheduled.
+ * @param waitTime The amount of time driver is scheduled to wait until next retry.
+ */
+private[spark] class RetryState(
+    val lastFailureStatus: TaskStatus,
+    val retries: Int,
+    val nextRetry: Date,
+    val waitTime: Int) extends Serializable {
+  def copy(): RetryState =
+    new RetryState(lastFailureStatus, retries, nextRetry, waitTime)
+}
+
+/**
+ * The full state of the cluster scheduler, currently being used for displaying
+ * information on the UI.
+ * @param frameworkId Mesos Framework id for the cluster scheduler.
+ * @param masterUrl The Mesos master url
+ * @param queuedDrivers All drivers queued to be launched
+ * @param launchedDrivers All launched or running drivers
+ * @param finishedDrivers All terminated drivers
+ * @param retryList All drivers pending to be retried
+ */
+private[spark] class MesosClusterSchedulerState(
+    val frameworkId: String,
+    val masterUrl: Option[String],
+    val queuedDrivers: Iterable[MesosDriverDescription],
+    val launchedDrivers: Iterable[MesosClusterTaskState],
+    val finishedDrivers: Iterable[MesosClusterTaskState],
+    val retryList: Iterable[MesosDriverDescription])
+
+/**
+ * A Mesos scheduler that is responsible for launching submitted Spark drivers in cluster mode
+ * as Mesos tasks in a Mesos cluster.
+ * All drivers are launched asynchronously by the framework, which will eventually be launched
+ * by one of the
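The `RetryState` scaladoc above names exponential backoff, but the quoted diff ends before the scheduling logic itself. The following Python sketch only illustrates the kind of bookkeeping such a class implies; the doubling rule, base wait, and field names here are assumptions, not the PR's code:

```python
from datetime import datetime, timedelta

class RetryState:
    """Retry bookkeeping for a failed driver: how many times it has
    failed, and when it may be scheduled again. The wait-doubling
    policy below is an assumed backoff rule for illustration."""

    def __init__(self, retries, wait_time_s, next_retry):
        self.retries = retries          # number of failures so far
        self.wait_time_s = wait_time_s  # current backoff interval
        self.next_retry = next_retry    # earliest next launch time

    def after_failure(self, now):
        # Each failure doubles the wait before the next launch attempt,
        # returning a new state rather than mutating in place.
        wait = self.wait_time_s * 2
        return RetryState(self.retries + 1, wait,
                          now + timedelta(seconds=wait))

# Usage: a driver that fails twice, starting from a 1s base wait.
state = RetryState(retries=0, wait_time_s=1, next_retry=None)
now = datetime(2015, 4, 17, 7, 0, 0)
state = state.after_failure(now).after_failure(now)
```

Returning a fresh object per failure matches the scaladoc's note that the class is not thread-safe and leaves synchronization to the caller.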
[GitHub] spark pull request: [SPARK-6957] [SPARK-6958] [SQL] improve API co...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/5544#discussion_r28570802

--- Diff: python/pyspark/sql/dataframe.py ---
@@ -999,6 +1017,13 @@ def _to_java_column(col):
     return jcol
 
+def _to_seq(sc, cols, converter=None):
--- End diff --

done
[GitHub] spark pull request: [SPARK-6966][SQL] Use correct ClassLoader for ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5543#issuecomment-93862506 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30438/
[GitHub] spark pull request: [SPARK-6899][SQL] Fix type mismatch when using...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/5517
[GitHub] spark pull request: [SPARK-6156][CORE]Not cache in memory again wh...
Github user suyanNone commented on the pull request: https://github.com/apache/spark/pull/4886#issuecomment-93885511 @srowen I forgot to update the description; it is already refined. If the program reaches `if (!putLevel.useMemory) {`, it means we are putting a disk-level block, or a MEMORY_AND_DISK-level block whose put into memory failed and is now being tried on disk (so `putLevel.useMemory` is false while the block's `level.useMemory` is true). I am not sure it is reasonable to put it twice in such a short time.
```
if (!putLevel.useMemory) {
  /*
   * This RDD is not to be cached in memory, so we can just pass the computed values as an
   * iterator directly to the BlockManager rather than first fully unrolling it in memory.
   */
  updatedBlocks ++= blockManager.putIterator(key, values, level, tellMaster = true, effectiveStorageLevel)
  blockManager.getLocal(key, !level.useMemory) match {
    case Some(v) => v.data.asInstanceOf[Iterator[T]]
```
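For illustration, a minimal sketch (using a hypothetical `StorageLevelSketch` type, not Spark's actual `StorageLevel`) of the two situations described in the comment above in which the effective put level has `useMemory == false`:

```scala
// Hypothetical sketch of the two cases discussed above where the effective
// put level has useMemory == false: a DISK_ONLY block, or a MEMORY_AND_DISK
// block whose in-memory unroll failed, so the retry keeps only the disk part.
case class StorageLevelSketch(useMemory: Boolean, useDisk: Boolean)

val DISK_ONLY = StorageLevelSketch(useMemory = false, useDisk = true)
val MEMORY_AND_DISK = StorageLevelSketch(useMemory = true, useDisk = true)

// After a failed unroll, drop the memory half of the requested level.
def afterUnrollFailure(requested: StorageLevelSketch): StorageLevelSketch =
  requested.copy(useMemory = false)
```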
[GitHub] spark pull request: [BUILD] Support building with SBT on encrypted...
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/5546#issuecomment-93872501 Talked with @pwendell offline, and it seems that since we don't publish with SBT this is pretty safe. He asked me to update the docs to make it clear why we don't do this for Maven (though if someone from the Scala side says this is safe, I'd argue for doing it there too). @srowen any further objections to merging this?
[GitHub] spark pull request: [SPARK-6869][PySpark] Add pyspark archives pat...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5478#issuecomment-93902319 [Test build #30462 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30462/consoleFull) for PR 5478 at commit [`547fd95`](https://github.com/apache/spark/commit/547fd957ba224c86cf828890562b2eafde2b8ecb).
[GitHub] spark pull request: [SPARK-6888][SQL] Export driver quirks
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5498#issuecomment-93909594 [Test build #30463 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30463/consoleFull) for PR 5498 at commit [`22d65ca`](https://github.com/apache/spark/commit/22d65cac9bb22a9cdda5019042acca0c66e46270). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-6845] [MLlib] [PySpark] Add isTranposed...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5455#issuecomment-93896772 [Test build #30457 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30457/consoleFull) for PR 5455 at commit [`151f3b6`](https://github.com/apache/spark/commit/151f3b67dbdd07462b00125c696d987a3cebb6ad).
[GitHub] spark pull request: [SPARK-6806] [SparkR] [Docs] Fill in SparkR ex...
Github user davies commented on the pull request: https://github.com/apache/spark/pull/5442#issuecomment-93897920 @shivaram Should we merge this or wait for API audit?
[GitHub] spark pull request: [SPARK-5338][MESOS] Add cluster mode support f...
Github user tnachen commented on a diff in the pull request: https://github.com/apache/spark/pull/5144#discussion_r28570218 --- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala --- @@ -0,0 +1,614 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.scheduler.cluster.mesos + +import java.io.File +import java.util.concurrent.locks.ReentrantLock +import java.util.{Collections, Date, List => JList} + +import org.apache.mesos.Protos.Environment.Variable +import org.apache.mesos.Protos.TaskStatus.Reason +import org.apache.mesos.Protos.{TaskState => MesosTaskState, _} +import org.apache.mesos.{Scheduler, SchedulerDriver} +import org.apache.spark.deploy.mesos.MesosDriverDescription +import org.apache.spark.deploy.rest.{CreateSubmissionResponse, KillSubmissionResponse, SubmissionStatusResponse} +import org.apache.spark.metrics.MetricsSystem +import org.apache.spark.util.Utils +import org.apache.spark.{SecurityManager, SparkConf, SparkException, TaskState} + +import scala.collection.JavaConversions._ +import scala.collection.mutable +import scala.collection.mutable.ArrayBuffer + + +/** + * Tracks the current state of a Mesos Task that runs a Spark driver. 
+ * @param submission Submitted driver description from + * [[org.apache.spark.deploy.rest.mesos.MesosRestServer]] + * @param taskId Mesos TaskID generated for the task + * @param slaveId Slave ID that the task is assigned to + * @param taskState The last known task status update. + * @param startDate The date the task was launched + */ +private[spark] class MesosClusterTaskState( +val submission: MesosDriverDescription, +val taskId: TaskID, +val slaveId: SlaveID, +var taskState: Option[TaskStatus], +var startDate: Date) + extends Serializable { + + def copy(): MesosClusterTaskState = { +new MesosClusterTaskState( + submission, taskId, slaveId, taskState, startDate) + } +} + +/** + * Tracks the retry state of a driver, which includes the next time it should be scheduled + * and necessary information to do exponential backoff. + * This class is not thread-safe, and we expect the caller to handle synchronizing state. + * @param lastFailureStatus Last Task status when it failed. + * @param retries Number of times it has retried. + * @param nextRetry Next retry time to be scheduled. + * @param waitTime The amount of time driver is scheduled to wait until next retry. + */ +private[spark] class RetryState( +val lastFailureStatus: TaskStatus, +val retries: Int, +val nextRetry: Date, +val waitTime: Int) extends Serializable { + def copy(): RetryState = +new RetryState(lastFailureStatus, retries, nextRetry, waitTime) +} + +/** + * The full state of the cluster scheduler, currently being used for displaying + * information on the UI. + * @param frameworkId Mesos Framework id for the cluster scheduler. 
+ * @param masterUrl The Mesos master url + * @param queuedDrivers All drivers queued to be launched + * @param launchedDrivers All launched or running drivers + * @param finishedDrivers All terminated drivers + * @param retryList All drivers pending to be retried + */ +private[spark] class MesosClusterSchedulerState( +val frameworkId: String, +val masterUrl: Option[String], +val queuedDrivers: Iterable[MesosDriverDescription], +val launchedDrivers: Iterable[MesosClusterTaskState], +val finishedDrivers: Iterable[MesosClusterTaskState], +val retryList: Iterable[MesosDriverDescription]) + +/** + * A Mesos scheduler that is responsible for launching submitted Spark drivers in cluster mode + * as Mesos tasks in a Mesos cluster. + * All drivers are launched asynchronously by the framework, which will eventually be launched + * by one of the
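The `RetryState` doc in the diff above mentions exponential backoff without showing the update rule; a hedged sketch (a hypothetical helper with a simplified `RetrySketch` type, not the scheduler's actual code) of how `retries`, `waitTime`, and `nextRetry` might evolve:

```scala
import java.util.Date

// Hypothetical sketch: doubles the wait on each failure, mirroring the
// exponential-backoff fields documented on RetryState above.
case class RetrySketch(retries: Int, waitTimeSec: Int, nextRetry: Date)

// Compute the next retry state from the previous one (None = first failure).
def nextRetrySketch(prev: Option[RetrySketch], now: Date): RetrySketch = {
  val waitSec = prev.map(_.waitTimeSec * 2).getOrElse(1) // 1s, 2s, 4s, ...
  RetrySketch(
    retries = prev.map(_.retries + 1).getOrElse(1),
    waitTimeSec = waitSec,
    nextRetry = new Date(now.getTime + waitSec * 1000L))
}
```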
[GitHub] spark pull request: [SPARK-5338][MESOS] Add cluster mode support f...
Github user tnachen commented on a diff in the pull request: https://github.com/apache/spark/pull/5144#discussion_r28573523 --- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala --- @@ -0,0 +1,614 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.scheduler.cluster.mesos + +import java.io.File +import java.util.concurrent.locks.ReentrantLock +import java.util.{Collections, Date, List => JList} + +import org.apache.mesos.Protos.Environment.Variable +import org.apache.mesos.Protos.TaskStatus.Reason +import org.apache.mesos.Protos.{TaskState => MesosTaskState, _} +import org.apache.mesos.{Scheduler, SchedulerDriver} +import org.apache.spark.deploy.mesos.MesosDriverDescription +import org.apache.spark.deploy.rest.{CreateSubmissionResponse, KillSubmissionResponse, SubmissionStatusResponse} +import org.apache.spark.metrics.MetricsSystem +import org.apache.spark.util.Utils +import org.apache.spark.{SecurityManager, SparkConf, SparkException, TaskState} + +import scala.collection.JavaConversions._ +import scala.collection.mutable +import scala.collection.mutable.ArrayBuffer + + +/** + * Tracks the current state of a Mesos Task that runs a Spark driver. 
+ * @param submission Submitted driver description from + * [[org.apache.spark.deploy.rest.mesos.MesosRestServer]] + * @param taskId Mesos TaskID generated for the task + * @param slaveId Slave ID that the task is assigned to + * @param taskState The last known task status update. + * @param startDate The date the task was launched + */ +private[spark] class MesosClusterTaskState( +val submission: MesosDriverDescription, +val taskId: TaskID, +val slaveId: SlaveID, +var taskState: Option[TaskStatus], +var startDate: Date) + extends Serializable { + + def copy(): MesosClusterTaskState = { +new MesosClusterTaskState( + submission, taskId, slaveId, taskState, startDate) + } +} + +/** + * Tracks the retry state of a driver, which includes the next time it should be scheduled + * and necessary information to do exponential backoff. + * This class is not thread-safe, and we expect the caller to handle synchronizing state. + * @param lastFailureStatus Last Task status when it failed. + * @param retries Number of times it has retried. + * @param nextRetry Next retry time to be scheduled. + * @param waitTime The amount of time driver is scheduled to wait until next retry. + */ +private[spark] class RetryState( +val lastFailureStatus: TaskStatus, +val retries: Int, +val nextRetry: Date, +val waitTime: Int) extends Serializable { + def copy(): RetryState = +new RetryState(lastFailureStatus, retries, nextRetry, waitTime) +} + +/** + * The full state of the cluster scheduler, currently being used for displaying + * information on the UI. + * @param frameworkId Mesos Framework id for the cluster scheduler. 
+ * @param masterUrl The Mesos master url + * @param queuedDrivers All drivers queued to be launched + * @param launchedDrivers All launched or running drivers + * @param finishedDrivers All terminated drivers + * @param retryList All drivers pending to be retried + */ +private[spark] class MesosClusterSchedulerState( +val frameworkId: String, +val masterUrl: Option[String], +val queuedDrivers: Iterable[MesosDriverDescription], +val launchedDrivers: Iterable[MesosClusterTaskState], +val finishedDrivers: Iterable[MesosClusterTaskState], +val retryList: Iterable[MesosDriverDescription]) + +/** + * A Mesos scheduler that is responsible for launching submitted Spark drivers in cluster mode + * as Mesos tasks in a Mesos cluster. + * All drivers are launched asynchronously by the framework, which will eventually be launched + * by one of the
[GitHub] spark pull request: [SPARK-5623][GraphX] Replace an obsolete mapRe...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4402#issuecomment-93892862 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30454/ Test PASSed.
[GitHub] spark pull request: [SPARK-5213] [SQL] Pluggable SQL Parser Suppor...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4015#issuecomment-93922737 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30461/ Test FAILed.
[GitHub] spark pull request: [SPARK-6845] [MLlib] [PySpark] Add isTranposed...
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/5455#issuecomment-93903354 @mengxr It would be really helpful if you could guide me on my two questions.
[GitHub] spark pull request: [SPARK-5213] [SQL] Pluggable SQL Parser Suppor...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4015#issuecomment-93899838 [Test build #30460 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30460/consoleFull) for PR 4015 at commit [`12249a2`](https://github.com/apache/spark/commit/12249a2ea065effc00c8ad67a3d2f9eef5e8878b).
[GitHub] spark pull request: [SPARK-6604][PySpark]Specify ip of python serv...
Github user Sephiroth-Lin commented on the pull request: https://github.com/apache/spark/pull/5256#issuecomment-93915251 Jenkins, retest this please.
[GitHub] spark pull request: [SPARK-6806] [SparkR] [Docs] Fill in SparkR ex...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/5442#discussion_r28571272 --- Diff: docs/programming-guide.md --- @@ -576,6 +660,34 @@ before the `reduce`, which would cause `lineLengths` to be saved in memory after </div> +<div data-lang="r" markdown="1"> + +To illustrate RDD basics, consider the simple program below: + +{% highlight r %} +lines <- textFile(sc, "data.txt") +lineLengths <- map(lines, length) +totalLength <- reduce(lineLengths, "+") +{% endhighlight %} + +The first line defines a base RDD from an external file. This dataset is not loaded in memory or +otherwise acted on: `lines` is merely a pointer to the file. +The second line defines `lineLengths` as the result of a `map` transformation. Again, `lineLengths` +is *not* immediately computed, due to laziness. +Finally, we run `reduce`, which is an action. At this point Spark breaks the computation into tasks +to run on separate machines, and each machine runs both its part of the map and a local reduction, +returning only its answer to the driver program. + +If we also wanted to use `lineLengths` again later, we could add: + +{% highlight r %} +persist(lineLengths) --- End diff -- Added a default value for `newLevel` of `persist`
[GitHub] spark pull request: [SPARK-5623][GraphX] Replace an obsolete mapRe...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4402#issuecomment-93892854 [Test build #30454 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30454/consoleFull) for PR 4402 at commit [`182b39b`](https://github.com/apache/spark/commit/182b39bb6818c168fbc23d07f653d4af0ced3cd8). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes. * This patch does not change any dependencies.
[GitHub] spark pull request: [SPARK-6973]modify total stages/tasks on the a...
GitHub user XuTingjun opened a pull request: https://github.com/apache/spark/pull/5550 [SPARK-6973]modify total stages/tasks on the allJobsPage Though totalStages = allStages - skippedStages is understandable, considering the problem in [SPARK-6973] I think totalStages = allStages is more reasonable. An item like 2/1 (2 failed) (1 skipped) already shows the skipped count, so it will still be understandable. You can merge this pull request into a Git repository by running: $ git pull https://github.com/XuTingjun/spark allJobsPage Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/5550.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #5550 commit 47525c6138597a01a6cd2408b95b0fdd4387e0c5 Author: Xu Tingjun xuting...@huawei.com Date: 2015-04-17T06:29:41Z modify total stages/tasks on the allJobsPage
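For illustration, a hedged sketch (a hypothetical `progressText` helper, not the actual `AllJobsPage` code) of rendering the progress cell with totalStages = allStages while still surfacing the failed and skipped counts:

```scala
// Hypothetical sketch: format "completed/total (n failed) (m skipped)",
// where total counts all stages rather than all minus skipped.
def progressText(completed: Int, allStages: Int, failed: Int, skipped: Int): String = {
  val annotations = Seq(
    if (failed > 0) s"($failed failed)" else "",
    if (skipped > 0) s"($skipped skipped)" else ""
  ).filter(_.nonEmpty)
  (s"$completed/$allStages" +: annotations).mkString(" ")
}
```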
[GitHub] spark pull request: [SPARK-5352][GraphX] Add getPartitionStrategy ...
Github user maropu commented on the pull request: https://github.com/apache/spark/pull/4138#issuecomment-93893277 Sorry, I closed this by mistake, so I re-made the PR: https://github.com/apache/spark/pull/5549
[GitHub] spark pull request: [SPARK-6845] [MLlib] [PySpark] Add isTranposed...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5455#issuecomment-93911463 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30457/ Test FAILed.
[GitHub] spark pull request: [SPARK-6957] [SPARK-6958] [SQL] improve API co...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5544#issuecomment-93896816 [Test build #30456 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30456/consoleFull) for PR 5544 at commit [`4944058`](https://github.com/apache/spark/commit/49440583911ccef250e96761de40d3d1605f28c9).
[GitHub] spark pull request: [SPARK-5213] [SQL] Pluggable SQL Parser Suppor...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4015#issuecomment-93901819 [Test build #30461 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30461/consoleFull) for PR 4015 at commit [`8117e14`](https://github.com/apache/spark/commit/8117e1438c1e771a16418ee655a7b0dbb891d1c9).
[GitHub] spark pull request: [SPARK-5213] [SQL] Pluggable SQL Parser Suppor...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4015#issuecomment-93934762 [Test build #30467 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30467/consoleFull) for PR 4015 at commit [`eb026cd`](https://github.com/apache/spark/commit/eb026cd589f6b5a75544ae6130f19dfc7903ea66).
[GitHub] spark pull request: [SPARK-6418] Add simple per-stage visualizatio...
Github user kayousterhout commented on the pull request: https://github.com/apache/spark/pull/5547#issuecomment-93864988 cc @andrewor14 @pwendell
[GitHub] spark pull request: [BUILD] Support building with SBT on encrypted...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5546#issuecomment-93887188 [Test build #30452 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30452/consoleFull) for PR 5546 at commit [`031c602`](https://github.com/apache/spark/commit/031c6025113c064b6fc0b5895b1830f223f6cf55). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes. * This patch does not change any dependencies.
[GitHub] spark pull request: [SPARK-5623][GraphX] Replace an obsolete mapRe...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4402#issuecomment-93873653 [Test build #30450 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30450/consoleFull) for PR 4402 at commit [`5810ff2`](https://github.com/apache/spark/commit/5810ff27aa16c183c4cb142f5c75d49f1e755e50).
[GitHub] spark pull request: [SPARK-6350][Mesos] Make mesosExecutorCores co...
Github user jongyoul commented on the pull request: https://github.com/apache/spark/pull/5063#issuecomment-93898567 @andrewor14 I've fixed the issues you raised. Please review and merge this.
[GitHub] spark pull request: [SPARK-3454] [WIP] separate json endpoints for...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/4435#issuecomment-93889612 Hey @squito it looks like the automated dependency checking isn't working so well for this PR. Can you do a diff and list all of the dependencies this is adding to or updating in Spark? Creating conflicts with user applications seems like a concern with this patch. Right now the patch shades the asm dependency, is there any reason to shade that one in particular and not others?
[GitHub] spark pull request: [SPARK-6807] [SparkR] Merge recent SparkR-pkg ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5436#issuecomment-93918020 [Test build #30465 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30465/consoleFull) for PR 5436 at commit [`c2b09be`](https://github.com/apache/spark/commit/c2b09be4a465a85ad4d362e9def8139e6b16a05f).
[GitHub] spark pull request: [SPARK-6888][SQL] Export driver quirks
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5498#issuecomment-93909599 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30463/ Test FAILed.
[GitHub] spark pull request: [SPARK-5623][GraphX] Replace an obsolete mapRe...
Github user maropu commented on the pull request: https://github.com/apache/spark/pull/4402#issuecomment-93892969 ok, fixed.
[GitHub] spark pull request: [SPARK-6845] [MLlib] [PySpark] Add isTranposed...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5455#issuecomment-93911425 [Test build #30457 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30457/consoleFull) for PR 5455 at commit [`151f3b6`](https://github.com/apache/spark/commit/151f3b67dbdd07462b00125c696d987a3cebb6ad). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. * This patch does not change any dependencies.
[GitHub] spark pull request: [SPARK-5352][GraphX] Add getPartitionStrategy ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4138#issuecomment-93902330 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30455/ Test FAILed.
[GitHub] spark pull request: [SPARK-6973]modify total stages/tasks on the a...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5550#issuecomment-93913097 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-5338][MESOS] Add cluster mode support f...
Github user tnachen commented on a diff in the pull request: https://github.com/apache/spark/pull/5144#discussion_r28570148 --- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala --- @@ -0,0 +1,614 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.scheduler.cluster.mesos + +import java.io.File +import java.util.concurrent.locks.ReentrantLock +import java.util.{Collections, Date, List => JList} + +import org.apache.mesos.Protos.Environment.Variable +import org.apache.mesos.Protos.TaskStatus.Reason +import org.apache.mesos.Protos.{TaskState => MesosTaskState, _} +import org.apache.mesos.{Scheduler, SchedulerDriver} +import org.apache.spark.deploy.mesos.MesosDriverDescription +import org.apache.spark.deploy.rest.{CreateSubmissionResponse, KillSubmissionResponse, SubmissionStatusResponse} +import org.apache.spark.metrics.MetricsSystem +import org.apache.spark.util.Utils +import org.apache.spark.{SecurityManager, SparkConf, SparkException, TaskState} + +import scala.collection.JavaConversions._ +import scala.collection.mutable +import scala.collection.mutable.ArrayBuffer + + +/** + * Tracks the current state of a Mesos Task that runs a Spark driver.
+ * @param submission Submitted driver description from + * [[org.apache.spark.deploy.rest.mesos.MesosRestServer]] + * @param taskId Mesos TaskID generated for the task + * @param slaveId Slave ID that the task is assigned to + * @param taskState The last known task status update. + * @param startDate The date the task was launched + */ +private[spark] class MesosClusterTaskState( +val submission: MesosDriverDescription, +val taskId: TaskID, +val slaveId: SlaveID, +var taskState: Option[TaskStatus], +var startDate: Date) + extends Serializable { + + def copy(): MesosClusterTaskState = { +new MesosClusterTaskState( + submission, taskId, slaveId, taskState, startDate) + } +} + +/** + * Tracks the retry state of a driver, which includes the next time it should be scheduled + * and necessary information to do exponential backoff. + * This class is not thread-safe, and we expect the caller to handle synchronizing state. + * @param lastFailureStatus Last Task status when it failed. + * @param retries Number of times it has retried. + * @param nextRetry Next retry time to be scheduled. + * @param waitTime The amount of time driver is scheduled to wait until next retry. + */ +private[spark] class RetryState( +val lastFailureStatus: TaskStatus, +val retries: Int, +val nextRetry: Date, +val waitTime: Int) extends Serializable { + def copy(): RetryState = +new RetryState(lastFailureStatus, retries, nextRetry, waitTime) +} + +/** + * The full state of the cluster scheduler, currently being used for displaying + * information on the UI. + * @param frameworkId Mesos Framework id for the cluster scheduler. 
+ * @param masterUrl The Mesos master url + * @param queuedDrivers All drivers queued to be launched + * @param launchedDrivers All launched or running drivers + * @param finishedDrivers All terminated drivers + * @param retryList All drivers pending to be retried + */ +private[spark] class MesosClusterSchedulerState( +val frameworkId: String, +val masterUrl: Option[String], +val queuedDrivers: Iterable[MesosDriverDescription], +val launchedDrivers: Iterable[MesosClusterTaskState], +val finishedDrivers: Iterable[MesosClusterTaskState], +val retryList: Iterable[MesosDriverDescription]) + +/** + * A Mesos scheduler that is responsible for launching submitted Spark drivers in cluster mode + * as Mesos tasks in a Mesos cluster. + * All drivers are launched asynchronously by the framework, which will eventually be launched + * by one of the
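The `RetryState` class quoted in the diff above carries just enough bookkeeping (`retries`, `nextRetry`, `waitTime`) to drive exponential backoff between driver resubmissions. A rough, hypothetical sketch of how such state could be advanced on each failure (the doubling policy, the one-second starting wait, and all names here are assumptions for illustration, not necessarily what the patch implements):

```scala
import java.util.Date

// Hypothetical backoff bookkeeping mirroring the RetryState fields in the diff.
// Assumption: the wait starts at 1 second and doubles on every failure.
final case class RetrySchedule(retries: Int, waitTimeSec: Int, nextRetry: Date)

object RetryBackoff {
  // Produce the next schedule after a failure observed at time `now`.
  def onFailure(prev: Option[RetrySchedule], now: Date): RetrySchedule = {
    val waitTimeSec = prev.map(_.waitTimeSec * 2).getOrElse(1)
    val retries = prev.map(_.retries + 1).getOrElse(1)
    RetrySchedule(retries, waitTimeSec, new Date(now.getTime + waitTimeSec * 1000L))
  }

  // A driver is eligible for resubmission once its nextRetry time has passed.
  def readyToRetry(state: RetrySchedule, now: Date): Boolean =
    !now.before(state.nextRetry)
}
```

Because the scheduler only needs the next retry time and the current wait, the state stays small and cheap to copy for UI display, which is what the `copy()` methods in the diff suggest.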
[GitHub] spark pull request: [SPARK-6198][SQL] Support select current_data...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5538#issuecomment-93899836 [Test build #30459 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30459/consoleFull) for PR 5538 at commit [`fad020e`](https://github.com/apache/spark/commit/fad020ebc9a1bd1a98a8c758d770d947205e89b1).
[GitHub] spark pull request: [SPARK-6604][PySpark]Specify ip of python serv...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5256#issuecomment-93917056 [Test build #688 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/688/consoleFull) for PR 5256 at commit [`7b3c633`](https://github.com/apache/spark/commit/7b3c6338db700ad6ba52b53d163dae69db6bd326).
[GitHub] spark pull request: [SPARK-6975][Yarn] Fix argument validation err...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5551#issuecomment-93928521 [Test build #30466 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30466/consoleFull) for PR 5551 at commit [`77bdcbd`](https://github.com/apache/spark/commit/77bdcbdc00522e76f9394c68d769f35c15af09a6).
[GitHub] spark pull request: [SPARK-6869][PySpark] Add pyspark archives pat...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5478#issuecomment-93930774 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30462/ Test FAILed.
[GitHub] spark pull request: [SPARK-6953] [PySpark] speed up python tests
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5427#issuecomment-93930521 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30458/ Test PASSed.
[GitHub] spark pull request: [Project Infra] SPARK-1684: Merge script shoul...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/5149#issuecomment-93891216 Hey @texasmichelle thanks for contributing this. It slipped off my radar, but it will be nice to get something like this in. One thing though: even though I originally intended the format to be SPARK-XXX, in practice pretty much every contributor now puts brackets around that part /cc @srowen. So it has now sort of become the de-facto standard! We should probably update this page to simply tell people to put brackets: https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark So I think what we really want now is to coerce the presence of brackets rather than remove them! If you look at some recent titles, a few of them have this problem. https://git-wip-us.apache.org/repos/asf?p=spark.git;a=shortlog Sorry for the delay in reviewing this; I can address any updates promptly in the next week. Maybe we can start with that pretty simple rule, and then we can expand in subsequent patches to do fancier stuff. The broad organization here looks good.
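A title check that coerces the presence of brackets around the JIRA id, as discussed here, can be sketched with a simple regex. This is a hypothetical illustration; the object and method names are invented, and the real merge script may normalize titles quite differently:

```scala
// Hypothetical title check: accept "[SPARK-1684] ..." style titles and flag
// bare "SPARK-1684: ..." ones so a merge script could ask for brackets.
object TitleCheck {
  private val BracketedJira = """^\[SPARK-\d+\].*""".r

  def hasBracketedJiraId(title: String): Boolean =
    BracketedJira.pattern.matcher(title).matches
}
```

A merge script could run this on the PR title and prompt the committer to rewrite it before merging, rather than silently stripping the brackets.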
[GitHub] spark pull request: [SPARK-6953] [PySpark] speed up python tests
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5427#issuecomment-93930478 [Test build #30458 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30458/consoleFull) for PR 5427 at commit [`2654bfd`](https://github.com/apache/spark/commit/2654bfda79da9d12c897bc144da2b2137a56c68c). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes. * This patch does not change any dependencies.
[GitHub] spark pull request: [SPARK-6869][PySpark] Add pyspark archives pat...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5478#issuecomment-93930705 [Test build #30462 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30462/consoleFull) for PR 5478 at commit [`547fd95`](https://github.com/apache/spark/commit/547fd957ba224c86cf828890562b2eafde2b8ecb). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. * This patch does not change any dependencies.
[GitHub] spark pull request: [SPARK-6957] [SPARK-6958] [SQL] improve API co...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5544#issuecomment-93931726 [Test build #30456 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30456/consoleFull) for PR 5544 at commit [`4944058`](https://github.com/apache/spark/commit/49440583911ccef250e96761de40d3d1605f28c9). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes. * This patch does not change any dependencies.
[GitHub] spark pull request: [SPARK-6957] [SPARK-6958] [SQL] improve API co...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5544#issuecomment-93931780 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30456/ Test PASSed.
[GitHub] spark pull request: [SPARK-6418] Add simple per-stage visualizatio...
Github user kayousterhout commented on the pull request: https://github.com/apache/spark/pull/5547#issuecomment-93865113 Jenkins, this is ok to test
[GitHub] spark pull request: [SPARK-5338][MESOS] Add cluster mode support f...
Github user tnachen commented on a diff in the pull request: https://github.com/apache/spark/pull/5144#discussion_r28573613 --- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala --- @@ -0,0 +1,614 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.scheduler.cluster.mesos + +import java.io.File +import java.util.concurrent.locks.ReentrantLock +import java.util.{Collections, Date, List => JList} + +import org.apache.mesos.Protos.Environment.Variable +import org.apache.mesos.Protos.TaskStatus.Reason +import org.apache.mesos.Protos.{TaskState => MesosTaskState, _} +import org.apache.mesos.{Scheduler, SchedulerDriver} +import org.apache.spark.deploy.mesos.MesosDriverDescription +import org.apache.spark.deploy.rest.{CreateSubmissionResponse, KillSubmissionResponse, SubmissionStatusResponse} +import org.apache.spark.metrics.MetricsSystem +import org.apache.spark.util.Utils +import org.apache.spark.{SecurityManager, SparkConf, SparkException, TaskState} + +import scala.collection.JavaConversions._ +import scala.collection.mutable +import scala.collection.mutable.ArrayBuffer + + +/** + * Tracks the current state of a Mesos Task that runs a Spark driver.
+ * @param submission Submitted driver description from + * [[org.apache.spark.deploy.rest.mesos.MesosRestServer]] + * @param taskId Mesos TaskID generated for the task + * @param slaveId Slave ID that the task is assigned to + * @param taskState The last known task status update. + * @param startDate The date the task was launched + */ +private[spark] class MesosClusterTaskState( +val submission: MesosDriverDescription, +val taskId: TaskID, +val slaveId: SlaveID, +var taskState: Option[TaskStatus], +var startDate: Date) + extends Serializable { + + def copy(): MesosClusterTaskState = { +new MesosClusterTaskState( + submission, taskId, slaveId, taskState, startDate) + } +} + +/** + * Tracks the retry state of a driver, which includes the next time it should be scheduled + * and necessary information to do exponential backoff. + * This class is not thread-safe, and we expect the caller to handle synchronizing state. + * @param lastFailureStatus Last Task status when it failed. + * @param retries Number of times it has retried. + * @param nextRetry Next retry time to be scheduled. + * @param waitTime The amount of time driver is scheduled to wait until next retry. + */ +private[spark] class RetryState( +val lastFailureStatus: TaskStatus, +val retries: Int, +val nextRetry: Date, +val waitTime: Int) extends Serializable { + def copy(): RetryState = +new RetryState(lastFailureStatus, retries, nextRetry, waitTime) +} + +/** + * The full state of the cluster scheduler, currently being used for displaying + * information on the UI. + * @param frameworkId Mesos Framework id for the cluster scheduler. 
+ * @param masterUrl The Mesos master url + * @param queuedDrivers All drivers queued to be launched + * @param launchedDrivers All launched or running drivers + * @param finishedDrivers All terminated drivers + * @param retryList All drivers pending to be retried + */ +private[spark] class MesosClusterSchedulerState( +val frameworkId: String, +val masterUrl: Option[String], +val queuedDrivers: Iterable[MesosDriverDescription], +val launchedDrivers: Iterable[MesosClusterTaskState], +val finishedDrivers: Iterable[MesosClusterTaskState], +val retryList: Iterable[MesosDriverDescription]) + +/** + * A Mesos scheduler that is responsible for launching submitted Spark drivers in cluster mode + * as Mesos tasks in a Mesos cluster. + * All drivers are launched asynchronously by the framework, which will eventually be launched + * by one of the
[GitHub] spark pull request: [SPARK-5352][GraphX] Add getPartitionStrategy ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4138#issuecomment-93902325 **[Test build #30455 timed out](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30455/consoleFull)** for PR 4138 at commit [`f72c058`](https://github.com/apache/spark/commit/f72c05811d89c08fe9f189e9866a1b7bce19d554) after a configured wait of `150m`.
[GitHub] spark pull request: [SPARK-6888][SQL] Export driver quirks
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5498#issuecomment-93909418 [Test build #30463 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30463/consoleFull) for PR 5498 at commit [`22d65ca`](https://github.com/apache/spark/commit/22d65cac9bb22a9cdda5019042acca0c66e46270). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-6807] [SparkR] Merge recent SparkR-pkg ...
Github user shivaram commented on the pull request: https://github.com/apache/spark/pull/5436#issuecomment-93917450 Jenkins, retest this please
[GitHub] spark pull request: [SPARK-5213] [SQL] Pluggable SQL Parser Suppor...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4015#issuecomment-93922706 [Test build #30461 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30461/consoleFull) for PR 4015 at commit [`8117e14`](https://github.com/apache/spark/commit/8117e1438c1e771a16418ee655a7b0dbb891d1c9). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `abstract class Dialect ` * This patch does not change any dependencies.
[GitHub] spark pull request: [SPARK-5563][mllib] online lda initial checkin
Github user hhbyyh commented on the pull request: https://github.com/apache/spark/pull/4419#issuecomment-93939245 @jkbradley Here's an update on the correctness test. I have tested the current PR against https://github.com/Blei-Lab/onlineldavb and the results are identical. I've uploaded the result and code to https://github.com/hhbyyh/LDACrossValidation. I made some changes to get rid of randomness, like initializing the matrix with fixed numbers from a file and replacing batch sampling with an even split.
[GitHub] spark pull request: [SPARK-6975][Yarn] Fix argument validation err...
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/5551#discussion_r28577855 --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/ClientArguments.scala --- @@ -103,9 +103,14 @@ private[spark] class ClientArguments(args: Array[String], sparkConf: SparkConf) * This is intended to be called only after the provided arguments have been parsed. */ private def validateArgs(): Unit = { -if (numExecutors <= 0) { +if (numExecutors < 0 || (!isDynamicAllocationEnabled && numExecutors == 0)) { throw new IllegalArgumentException( -"You must specify at least 1 executor!\n" + getUsageMessage()) +s""" + |Number of executors $numExecutors is not legal. + |If dynamic allocation is enable, number of executors should at least be 0. --- End diff -- OK, I will change the statement.
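The new condition in the diff above rejects a negative executor count unconditionally, and a zero count only when dynamic allocation is disabled. A standalone restatement of that predicate (the helper name is hypothetical; in the PR the check lives inside `validateArgs` and throws rather than returning a boolean):

```scala
// Restates the validation rule from the quoted diff as a pure predicate.
// Returns true when the requested executor count is acceptable:
// negative is never legal; zero is legal only with dynamic allocation on.
def numExecutorsValid(numExecutors: Int, dynamicAllocationEnabled: Boolean): Boolean =
  !(numExecutors < 0 || (!dynamicAllocationEnabled && numExecutors == 0))
```

Writing the rule this way makes the two failure modes explicit, which is also why the error message in the diff distinguishes the dynamic-allocation case.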
[GitHub] spark pull request: [SPARK-6973]modify total stages/tasks on the a...
Github user XuTingjun commented on the pull request: https://github.com/apache/spark/pull/5550#issuecomment-93946383 Yeah, that result can occur. But considering the bug described in the JIRA, I think this is more reasonable.
[GitHub] spark pull request: [SPARK-6955][NETWORK]Do not let Yarn Shuffle S...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5537#issuecomment-93949870 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30473/ Test FAILed.
[GitHub] spark pull request: [SPARK-6955][NETWORK]Do not let Yarn Shuffle S...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5537#issuecomment-93949861 [Test build #30473 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30473/consoleFull) for PR 5537 at commit [`962770c`](https://github.com/apache/spark/commit/962770c914a1a1928dccbf14a26df735ba4f77f3). * This patch **fails MiMa tests**. * This patch merges cleanly. * This patch adds no public classes. * This patch does not change any dependencies.
[GitHub] spark pull request: [SPARK-6635][SQL] DataFrame.withColumn should ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5541#issuecomment-93956221 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30469/ Test PASSed.
[GitHub] spark pull request: [SPARK-6635][SQL] DataFrame.withColumn should ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5541#issuecomment-93956208 [Test build #30469 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30469/consoleFull) for PR 5541 at commit [`b539c7b`](https://github.com/apache/spark/commit/b539c7b7aa55c095163d06bac525d1bb90c0b734). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes. * This patch does not change any dependencies.
[GitHub] spark pull request: [SPARK-6046] [core] Reorganize deprecated conf...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/5514
[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/4602#discussion_r28575468 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala --- @@ -107,6 +113,12 @@ trait CheckAnalysis { failAnalysis( s"unresolved operator ${operator.simpleString}") + case p @ Project(exprs, _) if containsMultipleGenerators(exprs) => + failAnalysis( + s"""Only a single table generating function is allowed in a SELECT clause, found: + | ${exprs.map(_.prettyString).mkString(",")}""".stripMargin) --- End diff -- Yeah, I added it in the unit test; see `HiveQuerySuite.scala`.
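The check quoted in the diff above rejects a SELECT list with more than one table-generating function. A minimal standalone sketch of that rule, using simplified stand-ins for the Catalyst `Expression`/`Generator` classes (these names and shapes are assumptions, not the real Catalyst API):

```scala
// Simplified stand-ins for Catalyst expressions (illustrative only).
sealed trait Expression
case class Literal(value: Any) extends Expression
case class Explode(child: Expression) extends Expression // stands in for a Generator

object CheckAnalysisSketch {
  private def isGenerator(e: Expression): Boolean = e.isInstanceOf[Explode]

  // Mirrors the intent of containsMultipleGenerators: the analysis fails
  // only when more than one generator appears in the same project list.
  def containsMultipleGenerators(exprs: Seq[Expression]): Boolean =
    exprs.count(isGenerator) > 1
}
```

With this sketch, `Seq(Explode(Literal(1)), Literal(2))` is allowed, while `Seq(Explode(Literal(1)), Explode(Literal(2)))` would trigger the `failAnalysis` call in the diff.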
[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/4602#discussion_r28575507 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -473,10 +473,47 @@ class Analyzer( */ object ImplicitGenerate extends Rule[LogicalPlan] { def apply(plan: LogicalPlan): LogicalPlan = plan transform { - case Project(Seq(Alias(g: Generator, _)), child) => - Generate(g, join = false, outer = false, None, child) + case Project(Seq(Alias(g: Generator, name)), child) => + Generate(g, join = false, outer = false, child, qualifier = None, name :: Nil, Nil) + case Project(Seq(MultiAlias(g: Generator, names)), child) => + Generate(g, join = false, outer = false, child, qualifier = None, names, Nil) } } + + object ResolveGenerate extends Rule[LogicalPlan] { + // Construct the output attributes for the generator. + // The output attribute names can be either specified or + // auto generated. + private def makeGeneratorOutput( + generator: Generator, + attributeNames: Seq[String], + qualifier: Option[String]): Array[Attribute] = { + val elementTypes = generator.elementTypes + + val raw = if (attributeNames.size == elementTypes.size) { --- End diff -- Hive does exactly the same as you listed.
[GitHub] spark pull request: [SPARK-6635][SQL] DataFrame.withColumn should ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5541#issuecomment-93939754 [Test build #30469 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30469/consoleFull) for PR 5541 at commit [`b539c7b`](https://github.com/apache/spark/commit/b539c7b7aa55c095163d06bac525d1bb90c0b734).
[GitHub] spark pull request: [SPARK-5338][MESOS] Add cluster mode support f...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5144#issuecomment-93942077 [Test build #30470 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30470/consoleFull) for PR 5144 at commit [`61e5dba`](https://github.com/apache/spark/commit/61e5dbabc0e4ef1c1bd80c838991e15bc1e40f4e). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes. * This patch does not change any dependencies.
[GitHub] spark pull request: [SPARK-5338][MESOS] Add cluster mode support f...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5144#issuecomment-93942082 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30470/ Test FAILed.
[GitHub] spark pull request: [SPARK-6975][Yarn] Fix argument validation err...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/5551#discussion_r28577520 --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/ClientArguments.scala --- @@ -103,9 +103,14 @@ private[spark] class ClientArguments(args: Array[String], sparkConf: SparkConf) * This is intended to be called only after the provided arguments have been parsed. */ private def validateArgs(): Unit = { - if (numExecutors <= 0) { + if (numExecutors < 0 || (!isDynamicAllocationEnabled && numExecutors == 0)) { throw new IllegalArgumentException( - "You must specify at least 1 executor!\n" + getUsageMessage()) + s""" + |Number of executors $numExecutors is not legal. + |If dynamic allocation is enable, number of executors should at least be 0. --- End diff -- enable -> enabled. I think this is simpler to state as "Number of executors was $numExecutors, but must be at least 1 (or 0 if dynamic executor allocation is enabled)."
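The validation rule under review allows zero executors only when dynamic allocation is on, and always rejects negative counts. A minimal sketch of that logic, using srowen's suggested wording; the object and method names here are illustrative, not Spark's actual code:

```scala
// Hypothetical standalone version of the executor-count validation discussed
// in the diff above. `isDynamicAllocationEnabled` follows the diff; the rest
// is an illustrative sketch.
object ClientArgumentsCheck {
  def validateNumExecutors(numExecutors: Int, isDynamicAllocationEnabled: Boolean): Unit = {
    // Negative is never legal; zero is legal only with dynamic allocation.
    if (numExecutors < 0 || (!isDynamicAllocationEnabled && numExecutors == 0)) {
      throw new IllegalArgumentException(
        s"Number of executors was $numExecutors, but must be at least 1 " +
          "(or 0 if dynamic executor allocation is enabled).")
    }
  }
}
```

For example, `validateNumExecutors(0, isDynamicAllocationEnabled = true)` passes, while `validateNumExecutors(0, isDynamicAllocationEnabled = false)` throws.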
[GitHub] spark pull request: [SPARK-6955][NETWORK]Do not let Yarn Shuffle S...
Github user SaintBacchus commented on the pull request: https://github.com/apache/spark/pull/5537#issuecomment-93947685 @andrewor14 `TransportServer#bindRightPort` will be used in the `Netty` network; in that case, having a retry mechanism is a better approach. @vanzin I have cloned the configuration, modified the javadoc, and also set no retry in `StandaloneWorkerShuffleService`. Please take a look.
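The port-retry idea mentioned above can be sketched as trying successive ports until a bind succeeds or the retry budget is exhausted. This is an assumption-laden illustration, not Spark's actual `TransportServer` implementation; all names and the retry policy here are hypothetical:

```scala
import java.net.{BindException, ServerSocket}

// Illustrative sketch: attempt to bind starting at `startPort`, trying up to
// `maxRetries` further ports before giving up.
object PortRetry {
  def startWithRetries(startPort: Int, maxRetries: Int)
                      (bind: Int => ServerSocket): (ServerSocket, Int) = {
    var lastError: Exception = null
    for (offset <- 0 to maxRetries) {
      val port = startPort + offset
      try {
        return (bind(port), port)
      } catch {
        case e: BindException => lastError = e // port taken, try the next one
      }
    }
    throw new BindException(
      s"Could not bind in ${maxRetries + 1} attempts starting at $startPort: $lastError")
  }
}
```

With `maxRetries = 0`, as the comment describes for `StandaloneWorkerShuffleService`, a failure on the requested port surfaces immediately instead of silently moving to another port.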
[GitHub] spark pull request: [SPARK-6807] [SparkR] Merge recent SparkR-pkg ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5436#issuecomment-93949224 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30465/ Test PASSed.
[GitHub] spark pull request: [SPARK-6604][PySpark]Specify ip of python serv...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5256#issuecomment-93949537 [Test build #30474 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30474/consoleFull) for PR 5256 at commit [`7b3c633`](https://github.com/apache/spark/commit/7b3c6338db700ad6ba52b53d163dae69db6bd326).
[GitHub] spark pull request: [SQL] There are three tests of sql are failed ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5552#issuecomment-93949334 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-6807] [SparkR] Merge recent SparkR-pkg ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5436#issuecomment-93949211 [Test build #30465 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30465/consoleFull) for PR 5436 at commit [`c2b09be`](https://github.com/apache/spark/commit/c2b09be4a465a85ad4d362e9def8139e6b16a05f). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes. * This patch does not change any dependencies.
[GitHub] spark pull request: [SPARK-6976][SQL] drop table if exists src p...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5553#issuecomment-93950877 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-6976][SQL] drop table if exists src p...
GitHub user DoingDone9 opened a pull request: https://github.com/apache/spark/pull/5553 [SPARK-6976][SQL] "drop table if exists src" prints ERROR info that should not be printed when src does not exist. If table src does not exist and the SQL `drop table if exists src` is run, then some ERROR info will be printed, like this: ``` 15/04/17 17:09:53 ERROR Hive: NoSuchObjectException(message:default.src table not found) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_table(HiveMetaStore.java:1560) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:601) at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:105) at $Proxy10.get_table(Unknown Source) ``` You can merge this pull request into a Git repository by running: $ git pull https://github.com/DoingDone9/spark drop_table_exists Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/5553.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #5553 commit c3f046f8de7c418d4aa7e74afea9968a8baf9231 Author: DoingDone9 799203...@qq.com Date: 2015-03-02T02:11:18Z Merge pull request #1 from apache/master merge lastest spark commit cb1852d14f62adbd194b1edda4ec639ba942a8ba Author: DoingDone9 799203...@qq.com Date: 2015-03-05T07:05:10Z Merge pull request #2 from apache/master merge lastest spark commit c87e8b6d8cb433376a7d14778915006c31f6c01c Author: DoingDone9 799203...@qq.com Date: 2015-03-10T07:46:12Z Merge pull request #3 from apache/master merge lastest spark commit 161cae3a29951d793ce721f9904888bd9529de72 Author: DoingDone9 799203...@qq.com Date: 2015-03-12T06:46:28Z Merge pull request #4 from apache/master merge lastest spark commit 
98b134f39ca57f11a5b761c7b9e5f8a7477bd069 Author: DoingDone9 799203...@qq.com Date: 2015-03-19T09:00:07Z Merge pull request #5 from apache/master merge lastest spark commit d00303b7af9436b9bd6d6d27d411a5c8a2e2294d Author: DoingDone9 799203...@qq.com Date: 2015-03-24T08:43:44Z Merge pull request #6 from apache/master merge lastest spark commit 802261c043f56bd5ebe9e46b15e33cdc7c212176 Author: DoingDone9 799203...@qq.com Date: 2015-03-26T02:21:24Z Merge pull request #7 from apache/master merge lastest spark commit 34b1a9a8a30f689b41fd52b8a10c08666c2ff2b5 Author: Zhongshuai Pei 799203...@qq.com Date: 2015-04-08T07:55:24Z Merge pull request #8 from apache/master merge lastest spark commit f61210c03f693a266969e06c52c23ccd1bfe3e1b Author: Zhongshuai Pei 799203...@qq.com Date: 2015-04-17T09:10:48Z Merge pull request #9 from apache/master merge lastest spark commit c783d02f5fdc44b894d4e8010d3c26c4cde7850c Author: Zhongshuai Pei 799203...@qq.com Date: 2015-04-17T09:13:44Z Update HiveMetastoreCatalog.scala
[GitHub] spark pull request: [SPARK-6065] [MLlib] Optimize word2vec.findSyn...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5467#issuecomment-93951771 [Test build #30477 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30477/consoleFull) for PR 5467 at commit [`da1642d`](https://github.com/apache/spark/commit/da1642deb67dde65bb55b08ae47bd5ce0d29d545).
[GitHub] spark pull request: [SPARK-5338][MESOS] Add cluster mode support f...
Github user tnachen commented on a diff in the pull request: https://github.com/apache/spark/pull/5144#discussion_r28575245 --- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala --- @@ -0,0 +1,614 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.scheduler.cluster.mesos + +import java.io.File +import java.util.concurrent.locks.ReentrantLock +import java.util.{Collections, Date, List => JList} + +import org.apache.mesos.Protos.Environment.Variable +import org.apache.mesos.Protos.TaskStatus.Reason +import org.apache.mesos.Protos.{TaskState => MesosTaskState, _} +import org.apache.mesos.{Scheduler, SchedulerDriver} +import org.apache.spark.deploy.mesos.MesosDriverDescription +import org.apache.spark.deploy.rest.{CreateSubmissionResponse, KillSubmissionResponse, SubmissionStatusResponse} +import org.apache.spark.metrics.MetricsSystem +import org.apache.spark.util.Utils +import org.apache.spark.{SecurityManager, SparkConf, SparkException, TaskState} + +import scala.collection.JavaConversions._ +import scala.collection.mutable +import scala.collection.mutable.ArrayBuffer + + +/** + * Tracks the current state of a Mesos Task that runs a Spark driver. 
+ * @param submission Submitted driver description from + * [[org.apache.spark.deploy.rest.mesos.MesosRestServer]] + * @param taskId Mesos TaskID generated for the task + * @param slaveId Slave ID that the task is assigned to + * @param taskState The last known task status update. + * @param startDate The date the task was launched + */ +private[spark] class MesosClusterTaskState( +val submission: MesosDriverDescription, +val taskId: TaskID, +val slaveId: SlaveID, +var taskState: Option[TaskStatus], +var startDate: Date) + extends Serializable { + + def copy(): MesosClusterTaskState = { +new MesosClusterTaskState( + submission, taskId, slaveId, taskState, startDate) + } +} + +/** + * Tracks the retry state of a driver, which includes the next time it should be scheduled + * and necessary information to do exponential backoff. + * This class is not thread-safe, and we expect the caller to handle synchronizing state. + * @param lastFailureStatus Last Task status when it failed. + * @param retries Number of times it has retried. + * @param nextRetry Next retry time to be scheduled. + * @param waitTime The amount of time driver is scheduled to wait until next retry. + */ +private[spark] class RetryState( +val lastFailureStatus: TaskStatus, +val retries: Int, +val nextRetry: Date, +val waitTime: Int) extends Serializable { + def copy(): RetryState = +new RetryState(lastFailureStatus, retries, nextRetry, waitTime) +} + +/** + * The full state of the cluster scheduler, currently being used for displaying + * information on the UI. + * @param frameworkId Mesos Framework id for the cluster scheduler. 
+ * @param masterUrl The Mesos master url + * @param queuedDrivers All drivers queued to be launched + * @param launchedDrivers All launched or running drivers + * @param finishedDrivers All terminated drivers + * @param retryList All drivers pending to be retried + */ +private[spark] class MesosClusterSchedulerState( +val frameworkId: String, +val masterUrl: Option[String], +val queuedDrivers: Iterable[MesosDriverDescription], +val launchedDrivers: Iterable[MesosClusterTaskState], +val finishedDrivers: Iterable[MesosClusterTaskState], +val retryList: Iterable[MesosDriverDescription]) + +/** + * A Mesos scheduler that is responsible for launching submitted Spark drivers in cluster mode + * as Mesos tasks in a Mesos cluster. + * All drivers are launched asynchronously by the framework, which will eventually be launched + * by one of the
[GitHub] spark pull request: [BUILD] Support building with SBT on encrypted...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/5546#issuecomment-93937847 Hm, doesn't the class file name affect how it's found? That's how the classloader finds the class. I also don't know of a specific instance where this created a problem, but the fact that it needs to be set means something about the output will change. What was the additional bit of info you mention that says this is safe? If it really is verifiably never going to change the linking result, then it doesn't matter, but it seems like it would, by changing file names. Is building on an encrypted file system common? Yes, the SBT build would only be for developers, though the inconsistency is a little worrying, so I think it best to do it in both places or neither.
[GitHub] spark pull request: [SPARK-6065] [MLlib] Optimize word2vec.findSyn...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5467#issuecomment-93942492 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30464/ Test PASSed.
[GitHub] spark pull request: [SPARK-6065] [MLlib] Optimize word2vec.findSyn...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5467#issuecomment-93942477 [Test build #30464 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30464/consoleFull) for PR 5467 at commit [`64575b0`](https://github.com/apache/spark/commit/64575b0282b350facc93340fbf653b38b0121b1a). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes. * This patch does not change any dependencies.