[GitHub] spark pull request: [SPARK-1112, 2156] (1.0 edition) Use correct a...
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/1172#discussion_r14053159

--- Diff: core/src/main/scala/org/apache/spark/executor/ExecutorBackend.scala ---
@@ -26,4 +26,7 @@ import org.apache.spark.TaskState.TaskState
  */
 private[spark] trait ExecutorBackend {
   def statusUpdate(taskId: Long, state: TaskState, data: ByteBuffer)
+
+  // Exists as a work around for SPARK-1112. This only exists in branch-1.x of Spark.
+  def akkaFrameSize(): Long = Long.MaxValue
--- End diff --

The `MesosExecutorBackend` sends results through Mesos, not Akka. The LocalBackend sends a message to an actor within the same actor system... which I assumed won't go over TCP.

--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request: SPARK-1316. Remove use of Commons IO
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/1173#issuecomment-46773549 Jenkins, retest this please.
[GitHub] spark pull request: SPARK-1316. Remove use of Commons IO
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1173#issuecomment-46773550 Merged build triggered.
[GitHub] spark pull request: SPARK-1316. Remove use of Commons IO
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1173#issuecomment-46773551 Merged build started.
[GitHub] spark pull request: SPARK-1416: PySpark support for SequenceFile a...
Github user MLnick commented on the pull request: https://github.com/apache/spark/pull/455#issuecomment-46773577

1.1 is not released yet. This PR is in master but not in 1.0 (it may be released in 1.0.1, or if not then 1.1). So you'll have to clone master and run sbt/sbt publish-local, which will publish the Maven and sbt artifacts to your local repos.

Sent from Mailbox

On Sun, Jun 22, 2014 at 1:22 AM, Russell Jurney notificati...@github.com wrote:

Thanks a ton! One thing - how can I pull spark core 1.1 from maven? [ERROR] Failed to execute goal on project avro: Could not resolve dependencies for project example:avro:jar:0.1: Could not find artifact org.apache.spark:spark-core_2.10:jar:1.1.0-SNAPSHOT in scala-tools.org ( http://scala-tools.org/repo-releases) - [Help 1]

On Fri, Jun 20, 2014 at 10:45 PM, MLnick notificati...@github.com wrote:

@rjurney https://github.com/rjurney this works for me (building Spark from current master): https://gist.github.com/MLnick/5864741781b9340cb211 if you run mvn package and then add that to SPARK_CLASSPATH and use it in the IPython console. However it seems to come through as only strings (not a dict). I verified that if I take only the string field and explicitly convert to string (i.e. Map[String, String]) then it works. I suspect then that Avro doesn't have the type information at all, so Pyrolite cannot pickle it. I guess you might have to do something more in depth in the AvroConverter to read the type info from the Avro schema and do a cast...

Reply to this email directly or view it on GitHub https://github.com/apache/spark/pull/455#issuecomment-46745394.

-- Russell Jurney twitter.com/rjurney russell.jur...@gmail.com datasyndrome.com

--- Reply to this email directly or view it on GitHub: https://github.com/apache/spark/pull/455#issuecomment-46767642 ---
[GitHub] spark pull request: [SPARK-1112, 2156] (1.0 edition) Use correct a...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/1172#issuecomment-46773627 Jenkins, retest this please.
[GitHub] spark pull request: [SPARK-1112, 2156] (1.0 edition) Use correct a...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1172#issuecomment-46773676 Merged build triggered.
[GitHub] spark pull request: [SPARK-1112, 2156] (1.0 edition) Use correct a...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1172#issuecomment-46773677 Merged build started.
[GitHub] spark pull request: SPARK-1996. Remove use of special Maven repo f...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/1170
[GitHub] spark pull request: SPARK-2034. KafkaInputDStream doesn't close re...
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/980#discussion_r14053246

--- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaInputDStream.scala ---
@@ -112,10 +114,14 @@ class KafkaReceiver[
     val topicMessageStreams = consumerConnector.createMessageStreams(
       topics, keyDecoder, valueDecoder)
-
-    // Start the messages handler for each partition
-    topicMessageStreams.values.foreach { streams =>
-      streams.foreach { stream => executorPool.submit(new MessageHandler(stream)) }
+    val executorPool = Executors.newFixedThreadPool(topics.values.sum)
--- End diff --

Minor - but to avoid a name collision with Spark's own `Executor` we usually try to call variables like this `threadPool`.
[GitHub] spark pull request: SPARK-2034. KafkaInputDStream doesn't close re...
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/980#discussion_r14053248

--- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaInputDStream.scala ---
@@ -112,10 +114,14 @@ class KafkaReceiver[
     val topicMessageStreams = consumerConnector.createMessageStreams(
       topics, keyDecoder, valueDecoder)
-
-    // Start the messages handler for each partition
-    topicMessageStreams.values.foreach { streams =>
-      streams.foreach { stream => executorPool.submit(new MessageHandler(stream)) }
+    val executorPool = Executors.newFixedThreadPool(topics.values.sum)
--- End diff --

I see that actually you didn't add this name, so nevermind!
[GitHub] spark pull request: SPARK-2034. KafkaInputDStream doesn't close re...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/980#issuecomment-46774049 LGTM pending tests. Jenkins, retest this please.
[GitHub] spark pull request: SPARK-1316. Remove use of Commons IO
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1173#issuecomment-46774097 Merged build finished. All automated tests passed.
[GitHub] spark pull request: SPARK-2034. KafkaInputDStream doesn't close re...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/980#issuecomment-46774099 Merged build triggered.
[GitHub] spark pull request: SPARK-1316. Remove use of Commons IO
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1173#issuecomment-46774098 All automated tests passed. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16006/
[GitHub] spark pull request: SPARK-2034. KafkaInputDStream doesn't close re...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/980#issuecomment-46774102 Merged build started.
[GitHub] spark pull request: SPARK-2231: dev/run-tests should include YARN ...
GitHub user pwendell opened a pull request: https://github.com/apache/spark/pull/1175

SPARK-2231: dev/run-tests should include YARN and use a recent Hadoop version

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/pwendell/spark test-hadoop-version

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/1175.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #1175

commit 9210ef400f7e907d9f67f396bbb7806558d61930
Author: Patrick Wendell pwend...@gmail.com
Date: 2014-06-22T06:56:02Z

SPARK-2231: dev/run-tests should include YARN and use a recent Hadoop version
[GitHub] spark pull request: SPARK-2231: dev/run-tests should include YARN ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1175#issuecomment-46774162 Merged build triggered.
[GitHub] spark pull request: [SPARK-1112, 2156] (1.0 edition) Use correct a...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1172#issuecomment-46774166 Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16007/
[GitHub] spark pull request: SPARK-2231: dev/run-tests should include YARN ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1175#issuecomment-46774164 Merged build started.
[GitHub] spark pull request: [SPARK-2141] Adding getPersistentRddIds and un...
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/1082#discussion_r14053327

--- Diff: core/src/main/scala/org/apache/spark/api/java/JavaSparkContext.scala ---
@@ -559,6 +559,19 @@ class JavaSparkContext(val sc: SparkContext) extends JavaSparkContextVarargsWork
   def getLocalProperty(key: String): String = sc.getLocalProperty(key)
 
   /**
+   * Get a set of RDD IDs that have marked themselves as persistent via cache() call.
+   * Note that this does not necessarily mean the caching or computation was successful.
+   */
+  def getPersistentRddIds(): java.util.Set[Int] =
+    setAsJavaSet(sc.getPersistentRDDs.keySet)
+
+  /**
+   * Unpersist an RDD from memory and/or disk storage
--- End diff --

Minor: needs to end with a `.`
[GitHub] spark pull request: [SPARK-2141] Adding getPersistentRddIds and un...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/1082#issuecomment-46774517 LGTM with a minor comment that can be addressed on merge. @rxin any further comments?
[GitHub] spark pull request: SPARK-2034. KafkaInputDStream doesn't close re...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/980#issuecomment-46774624 All automated tests passed. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16008/
[GitHub] spark pull request: SPARK-2231: dev/run-tests should include YARN ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1175#issuecomment-46774725 Merged build finished. All automated tests passed.
[GitHub] spark pull request: SPARK-2231: dev/run-tests should include YARN ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1175#issuecomment-46774727 All automated tests passed. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16009/
[GitHub] spark pull request: [SPARK-1112, 2156] (1.0 edition) Use correct a...
Github user aarondav commented on a diff in the pull request: https://github.com/apache/spark/pull/1172#discussion_r14053359

--- Diff: core/src/main/scala/org/apache/spark/executor/ExecutorBackend.scala ---
@@ -26,4 +26,7 @@ import org.apache.spark.TaskState.TaskState
  */
 private[spark] trait ExecutorBackend {
   def statusUpdate(taskId: Long, state: TaskState, data: ByteBuffer)
+
+  // Exists as a work around for SPARK-1112. This only exists in branch-1.x of Spark.
+  def akkaFrameSize(): Long = Long.MaxValue
--- End diff --

I see. So the only real change is that in certain cases the LocalBackend will no longer use the BlockManager for returning results. Sounds fine.
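The distinction discussed above — results small enough to ride inside the status-update message versus results that must detour through the BlockManager — comes down to a size check against the backend's frame limit. A minimal sketch of that dispatch in Python, with illustrative names and sizes (the real logic lives in Spark's Scala executor code):

```python
# Long.MaxValue sentinel: the default akkaFrameSize() for backends that
# don't send results over Akka (Mesos, local), so the limit never triggers.
LONG_MAX = 2**63 - 1

def choose_result_path(result_size, akka_frame_size):
    """Return how a task result travels back to the driver.

    Hypothetical helper illustrating the SPARK-1112 workaround: a result
    larger than the frame size is stored in the BlockManager and only a
    block id is sent; anything else is serialized into the message itself.
    """
    if result_size > akka_frame_size:
        return "indirect"  # store in BlockManager, send only a block id
    return "direct"        # fits inside the status-update message

# A 100 MB result over a 10 MB frame must go indirect...
path_large = choose_result_path(100 * 2**20, 10 * 2**20)
# ...but with the sentinel, the BlockManager path is never taken.
path_sentinel = choose_result_path(100 * 2**20, LONG_MAX)
```

With `akkaFrameSize()` defaulting to `Long.MaxValue`, the `indirect` branch is unreachable for those backends — which is exactly the behavior change aarondav notes for the LocalBackend.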
[GitHub] spark pull request: SPARK-2231: dev/run-tests should include YARN ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/1175
[GitHub] spark pull request: [SPARK-1112, 2156] (1.0 edition) Use correct a...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1172#issuecomment-46775109 Jenkins, retest this please.
[GitHub] spark pull request: SPARK-2034. KafkaInputDStream doesn't close re...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/980
[GitHub] spark pull request: [SPARK-1112, 2156] (1.0 edition) Use correct a...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1172#issuecomment-46775157 Merged build started.
[GitHub] spark pull request: [SPARK-1112, 2156] (1.0 edition) Use correct a...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1172#issuecomment-46775153 Merged build triggered.
[GitHub] spark pull request: [SPARK-1112, 2156] (1.0 edition) Use correct a...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1172#issuecomment-46775720 Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16010/
[GitHub] spark pull request: [SPARK-1112, 2156] (1.0 edition) Use correct a...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1172#issuecomment-46775719 Merged build finished.
[GitHub] spark pull request: SPARK-2229: FileAppender throws an IllegalArgume...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/1174#issuecomment-46778033 +1 I literally ran into this too 6 hours ago and had the same fix. It's from the change for SPARK-1940. I think it's a good idea that tests be run on Java 6 as a result; this is another of several issues that would have been caught by that.
[GitHub] spark pull request: [MLLIB] [SPARK-2222] Add multiclass evaluation...
Github user avulanov commented on the pull request: https://github.com/apache/spark/pull/1155#issuecomment-46778186 The micro-averaged precision and recall are equal for a multiclass classifier, because sum(fn_i) = sum(fp_i), i.e. they are both just the sum of all non-diagonal elements in the confusion matrix. F1-measure, as a harmonic mean of two equal numbers, also equals P and R. For more details please refer to the book Introduction to IR by Manning.
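The claim above — that micro-averaged precision, recall, and F1 coincide for a single-label multiclass classifier because the off-diagonal counts are simultaneously false positives and false negatives — can be checked numerically. The 3x3 confusion matrix below is made up for illustration:

```python
# Rows = true class, columns = predicted class; made-up counts.
confusion = [
    [50,  3,  2],
    [ 4, 40,  6],
    [ 1,  5, 60],
]
n = len(confusion)

# Diagonal cells are true positives; every off-diagonal cell counts once as a
# false positive (for its predicted class) and once as a false negative (for
# its true class), so the two totals are the same sum over the same cells.
tp = sum(confusion[i][i] for i in range(n))
fp = sum(confusion[r][c] for r in range(n) for c in range(n) if r != c)
fn = sum(confusion[r][c] for r in range(n) for c in range(n) if c != r)

micro_precision = tp / (tp + fp)
micro_recall = tp / (tp + fn)
micro_f1 = 2 * micro_precision * micro_recall / (micro_precision + micro_recall)
```

Since fp == fn, precision and recall share the same denominator and are identical, and the harmonic mean of two equal numbers is that number, so micro F1 equals both.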
[GitHub] spark pull request: SPARK-1316. Remove use of Commons IO
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1173#issuecomment-46778249 Merged build started.
[GitHub] spark pull request: SPARK-1316. Remove use of Commons IO
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1173#issuecomment-46778248 Merged build triggered.
[GitHub] spark pull request: SPARK-1316. Remove use of Commons IO
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1173#issuecomment-46779166 All automated tests passed. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16011/
[GitHub] spark pull request: SPARK-1316. Remove use of Commons IO
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1173#issuecomment-46779165 Merged build finished. All automated tests passed.
[GitHub] spark pull request: SPARK-1949. Servlet 2.5 vs 3.0 conflict in SBT...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/906#issuecomment-46781514 Ah OK, it did fail for me locally with `sbt clean assembly test`. Sorry, this did in fact have a problem. I think Akka does need the old Netty; the second commit was a change too far. The first commit is the one cleaning up the immediate issue. I dropped the second commit and rebased and all is well. Let's see what Jenkins makes of it.
[GitHub] spark pull request: SPARK-1949. Servlet 2.5 vs 3.0 conflict in SBT...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/906#issuecomment-46781581 Merged build started.
[GitHub] spark pull request: SPARK-1949. Servlet 2.5 vs 3.0 conflict in SBT...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/906#issuecomment-46782563 All automated tests passed. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16012/
[GitHub] spark pull request: SPARK-1949. Servlet 2.5 vs 3.0 conflict in SBT...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/906#issuecomment-46782562 Merged build finished. All automated tests passed.
[GitHub] spark pull request: [Spark 1199][WIP] Changed wrappers to not use ...
GitHub user ScrapCodes opened a pull request: https://github.com/apache/spark/pull/1176

[Spark 1199][WIP] Changed wrappers to not use vals and thus avoid the path-dependent types problem.

TODO: Write description. Basically it fails for one particular scenario and I am having a tough time debugging it :)

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ScrapCodes/spark-1 SPARK-1199/repl-case-class-fix

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/1176.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #1176

commit c2a6498bb40253da9195d01927caa2748919ad96
Author: Prashant Sharma prashan...@imaginea.com
Date: 2014-06-18T12:34:12Z

Back porting scala 2.11 SI-7747's changes on top of my patch.

commit fa7ffca15d0d6cd1c8e2a0064ba4f12f35d5f263
Author: Prashant Sharma prashan...@imaginea.com
Date: 2014-06-19T12:06:08Z

Added a convenience for debugging the generated wrappers as it exists in the scala 2.11 repl.
[GitHub] spark pull request: [Spark 1199][WIP] Changed wrappers to not use ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1176#issuecomment-46785360 Merged build triggered.
[GitHub] spark pull request: [SPARK-1199][WIP] Changed wrappers to not use ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1176#issuecomment-46785365 Merged build started.
[GitHub] spark pull request: [MLLIB] [SPARK-2222] Add multiclass evaluation...
Github user xiejuncs commented on the pull request: https://github.com/apache/spark/pull/1155#issuecomment-46786256 It makes sense. You are right: sum(fn_i) = sum(fp_i), so the recall and precision are the same. Thanks very much.
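The identity behind this exchange is easy to check numerically: in single-label multiclass classification, every misclassified sample is simultaneously one false negative (for its true class) and one false positive (for the predicted class), so the totals agree and micro-averaged precision equals micro-averaged recall. A minimal standalone sketch (plain Python, not the MLlib evaluator; all names are illustrative):

```python
from collections import Counter

def micro_precision_recall(y_true, y_pred):
    # Every misclassified sample counts once as a false positive for the
    # predicted class and once as a false negative for the true class,
    # so sum(fp_i) == sum(fn_i) always holds.
    tp = sum(t == p for t, p in zip(y_true, y_pred))
    fp_i = Counter(p for t, p in zip(y_true, y_pred) if t != p)
    fn_i = Counter(t for t, p in zip(y_true, y_pred) if t != p)
    assert sum(fp_i.values()) == sum(fn_i.values())
    precision = tp / (tp + sum(fp_i.values()))
    recall = tp / (tp + sum(fn_i.values()))
    return precision, recall

p, r = micro_precision_recall([0, 1, 2, 2, 1], [0, 2, 2, 1, 1])
assert p == r == 0.6  # identical, since the misclassification totals match
```

Note this only holds when each sample has exactly one true label and one prediction; in multi-label settings the totals can diverge.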
[GitHub] spark pull request: [SPARK-1199][WIP] Changed wrappers to not use ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1176#issuecomment-46786334 Merged build finished.
[GitHub] spark pull request: [SPARK-1199][WIP] Changed wrappers to not use ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1176#issuecomment-46786336 Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16013/
[GitHub] spark pull request: Update BasicOperationsSuite.scala
Github user baishuo commented on the pull request: https://github.com/apache/spark/pull/1084#issuecomment-46786699 let me do a check
[GitHub] spark pull request: SPARK-1416: PySpark support for SequenceFile a...
Github user rjurney commented on the pull request: https://github.com/apache/spark/pull/455#issuecomment-46789138 Thanks, master doesn't build for me. Is there a particular commit you recommend using?
[error]
[error] last tree to typer: Literal(Constant(org.apache.spark.sql.catalyst.types.PrimitiveType))
[error]               symbol: null
[error]    symbol definition: null
[error]                  tpe: Class(classOf[org.apache.spark.sql.catalyst.types.PrimitiveType])
[error]        symbol owners:
[error]       context owners: object TestSQLContext -> package test
[error]
[error] == Enclosing template or block ==
[error]
[error] Template( // val <local TestSQLContext>: <notype> in object TestSQLContext, tree.tpe=org.apache.spark.sql.test.TestSQLContext.type
[error]   org.apache.spark.sql.SQLContext // parents
[error]   ValDef(
[error]     private
[error]     _
[error]     <tpt>
[error]     <empty>
[error]   )
[error]   // 2 statements
[error]   DefDef( // private def readResolve(): Object in object TestSQLContext
[error]     method private synthetic
[error]     readResolve
[error]     []
[error]     List(Nil)
[error]     <tpt> // tree.tpe=Object
[error]     test.this.TestSQLContext // object TestSQLContext in package test, tree.tpe=org.apache.spark.sql.test.TestSQLContext.type
[error]   )
[error]   DefDef( // def <init>(): org.apache.spark.sql.test.TestSQLContext.type in object TestSQLContext
[error]     method
[error]     <init>
[error]     []
[error]     List(Nil)
[error]     <tpt> // tree.tpe=org.apache.spark.sql.test.TestSQLContext.type
[error]     Block( // tree.tpe=Unit
[error]       Apply( // def <init>(sparkContext: org.apache.spark.SparkContext): org.apache.spark.sql.SQLContext in class SQLContext, tree.tpe=org.apache.spark.sql.SQLContext
[error]         TestSQLContext.super.<init> // def <init>(sparkContext: org.apache.spark.SparkContext): org.apache.spark.sql.SQLContext in class SQLContext, tree.tpe=(sparkContext: org.apache.spark.SparkContext)org.apache.spark.sql.SQLContext
[error]         Apply( // def <init>(master: String,appName: String,conf: org.apache.spark.SparkConf): org.apache.spark.SparkContext in class SparkContext, tree.tpe=org.apache.spark.SparkContext
[error]           new org.apache.spark.SparkContext.<init> // def <init>(master: String,appName: String,conf: org.apache.spark.SparkConf): org.apache.spark.SparkContext in class SparkContext, tree.tpe=(master: String, appName: String, conf: org.apache.spark.SparkConf)org.apache.spark.SparkContext
[error]           // 3 arguments
[error]           local
[error]           TestSQLContext
[error]           Apply( // def <init>(): org.apache.spark.SparkConf in class SparkConf, tree.tpe=org.apache.spark.SparkConf
[error]             new org.apache.spark.SparkConf.<init> // def <init>(): org.apache.spark.SparkConf in class SparkConf, tree.tpe=()org.apache.spark.SparkConf
[error]             Nil
[error]           )
[error]         )
[error]       )
[error]       ()
[error]     )
[error]   )
[error] )
[error]
[error] == Expanded type of tree ==
[error]
[error] ConstantType(
[error]   value = Constant(org.apache.spark.sql.catalyst.types.PrimitiveType)
[error] )
[error]
[error] uncaught exception during compilation: java.lang.AssertionError
java.lang.AssertionError: assertion failed: List(object package$DebugNode, object package$DebugNode)
  at scala.reflect.internal.Symbols$Symbol.suchThat(Symbols.scala:1678)
  at scala.reflect.internal.Symbols$ClassSymbol.companionModule0(Symbols.scala:2988)
  at scala.reflect.internal.Symbols$ClassSymbol.companionModule(Symbols.scala:2991)
  at scala.tools.nsc.backend.jvm.GenASM$JPlainBuilder.genClass(GenASM.scala:1371)
  at scala.tools.nsc.backend.jvm.GenASM$AsmPhase.run(GenASM.scala:120)
  at scala.tools.nsc.Global$Run.compileUnitsInternal(Global.scala:1583)
  at scala.tools.nsc.Global$Run.compileUnits(Global.scala:1557)
  at scala.tools.nsc.Global$Run.compileSources(Global.scala:1553)
  at scala.tools.nsc.Global$Run.compile(Global.scala:1662)
  at xsbt.CachedCompiler0.run(CompilerInterface.scala:123)
  at xsbt.CachedCompiler0.run(CompilerInterface.scala:99)
  at xsbt.CompilerInterface.run(CompilerInterface.scala:27)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at
[GitHub] spark pull request: SPARK-1316. Remove use of Commons IO
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1173#issuecomment-46789151 Thanks. I've merged this in master.
[GitHub] spark pull request: SPARK-1316. Remove use of Commons IO
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/1173
[GitHub] spark pull request: [SPARK-2141] Adding getPersistentRddIds and un...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1082#issuecomment-46789198 Yup looks good to me.
[GitHub] spark pull request: [SPARK-1112, 2156] (1.0 edition) Use correct a...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/1172#issuecomment-46789575 I think these are failing because our tests assume that in local mode we enforce the frame size limit (which we actually don't need to). I'll make the appropriate adjustments in a bit.
[GitHub] spark pull request: add a materialize method to materialize Vertex...
GitHub user bxshi opened a pull request: https://github.com/apache/spark/pull/1177 add a materialize method to materialize VertexRDD by calling RDD's count It seems one cannot materialize a VertexRDD by simply calling its count method, which VertexRDD overrides, but calling RDD's count does materialize it. Is this designed so you can get the count without materializing the VertexRDD? If so, do you think it is necessary to add a materialize method to VertexRDD? By the way, is count() the cheapest way to materialize an RDD, or does it cost the same resources as other actions? Best, You can merge this pull request into a Git repository by running: $ git pull https://github.com/bxshi/spark materialize_vertexRDD Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/1177.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1177 commit 3be5d6a6f6285c6276d80210bf477c483c09c2f9 Author: bxshi baoxu@gmail.com Date: 2014-06-22T20:39:52Z add a materialize method to materialize VertexRDD by calling RDD's count method
[GitHub] spark pull request: add a materialize method to materialize Vertex...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1177#issuecomment-46792651 Can one of the admins verify this patch?
[GitHub] spark pull request: add a materialize method to materialize Vertex...
Github user bxshi commented on the pull request: https://github.com/apache/spark/pull/1177#issuecomment-46792759 Here's some simple code that reproduces the problem
```
val conf = new SparkConf().setAppName("HDTM")
  .setMaster("local[4]")
val sc = new SparkContext(conf)
sc.setCheckpointDir("./checkpoint")
val v = sc.parallelize(Seq[(VertexId, Long)]((0L, 0L), (1L, 1L), (2L, 2L)))
val e = sc.parallelize(Seq[Edge[Long]](Edge(0L, 1L, 0L), Edge(1L, 2L, 1L), Edge(2L, 0L, 2L)))
val g = Graph(v, e)
g.vertices.checkpoint()
g.edges.checkpoint()
g.vertices.count()
g.numEdges
println(s"${g.vertices.isCheckpointed} ${g.edges.isCheckpointed}")
g.vertices.materialize()
println(s"${g.vertices.isCheckpointed} ${g.edges.isCheckpointed}")
```
The first output is `false true` and after calling `materialize` the output is `true true`, which means the VertexRDD is correctly checkpointed.
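The repro above hinges on lazy evaluation: transformations only record work, and the first action (such as `count()`) actually runs it, which is why checkpointing does not take effect until something forces the RDD. A toy sketch of that behavior (plain Python, not Spark's API; `LazyRDD` is a made-up illustration):

```python
# Toy model of a lazily evaluated collection: map() only composes
# functions; count() is the action that actually computes and caches.
class LazyRDD:
    def __init__(self, data, fn=None):
        self._data = data
        self._fn = fn or (lambda x: x)
        self._materialized = None  # filled in by the first action

    def map(self, f):
        # Transformation: composes with the pending function, computes nothing.
        g = self._fn
        return LazyRDD(self._data, lambda x: f(g(x)))

    def count(self):
        # Action: forces evaluation and caches the computed elements.
        if self._materialized is None:
            self._materialized = [self._fn(x) for x in self._data]
        return len(self._materialized)

    @property
    def is_materialized(self):
        return self._materialized is not None

rdd = LazyRDD([1, 2, 3]).map(lambda x: x * 2)
assert not rdd.is_materialized  # nothing computed yet
rdd.count()                     # the action triggers the actual work
assert rdd.is_materialized
```

The subtlety in the PR is that VertexRDD overrides `count` to answer without doing the equivalent of the full evaluation here, so a separate forcing method is needed.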
[GitHub] spark pull request: [SPARK-1112, 2156] (1.0 edition) Use correct a...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/1172#issuecomment-46793729 Jenkins, retest this please.
[GitHub] spark pull request: [SPARK-1112, 2156] (1.0 edition) Use correct a...
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/1172#discussion_r14056893 --- Diff: core/src/main/scala/org/apache/spark/executor/ExecutorBackend.scala --- @@ -26,4 +26,7 @@ import org.apache.spark.TaskState.TaskState */ private[spark] trait ExecutorBackend { def statusUpdate(taskId: Long, state: TaskState, data: ByteBuffer) + + // Exists as a work around for SPARK-1112. This only exists in branch-1.x of Spark. + def akkaFrameSize(): Long = Long.MaxValue --- End diff -- That change actually alters the expectations of the unit tests, so I went ahead and just enforced the limit in the LocalBackend anyway.
[GitHub] spark pull request: [SPARK-1112, 2156] (1.0 edition) Use correct a...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1172#issuecomment-46793847 Merged build triggered.
[GitHub] spark pull request: [SPARK-1112, 2156] (1.0 edition) Use correct a...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1172#issuecomment-46793850 Merged build started.
[GitHub] spark pull request: [SPARK-1112, 2156] (1.0 edition) Use correct a...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1172#issuecomment-46794756 Merged build finished. All automated tests passed.
[GitHub] spark pull request: [SPARK-1112, 2156] (1.0 edition) Use correct a...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1172#issuecomment-46794757 All automated tests passed. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16014/
[GitHub] spark pull request: [SPARK-2124] Move aggregation into shuffle imp...
Github user jerryshao commented on the pull request: https://github.com/apache/spark/pull/1064#issuecomment-46798422 Hi Matei, thanks for your review, I will update the code soon.
[GitHub] spark pull request: SPARK-2229: FileAppender throw an llegalArgume...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1174#issuecomment-46799166 Thanks. I'm merging this in master. @pwendell - we probably want to run tests on JDK6 ... (if possible both in the build matrix)
[GitHub] spark pull request: SPARK-2229: FileAppender throw an llegalArgume...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/1174
[GitHub] spark pull request: [SPARK-2234][SQL]Spark SQL basicOperators add ...
Github user YanjieGao commented on the pull request: https://github.com/apache/spark/pull/1151#issuecomment-46800258 Hi marmbrus, I updated these files per your comment tips, but I think I may have made some mistakes in the code. Could you help me and give me some tips? I will continue to work on it and debug it to make it better. Thanks a lot!
[GitHub] spark pull request: [SQL][SPARK-2212]HashJoin(Shuffled)
Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/1147#issuecomment-46800787 Thank you all for the comments, I will change some of the code accordingly. This PR actually contains 2 relevant parts:
- Code refactor for Join
  - Removed `FilteredOperation` from patterns.scala, because the filters (WHERE condition and JOIN condition) have already been pushed down via `PushPredicateThroughJoin` in logical/Optimizer.scala. Discarding the combination of filters (where and join condition) makes the join pattern match cleaner and simpler.
  - Pattern matching order is actually very critical for the join operator selection in SparkStrategies.scala, hence I merged the 3 join strategies into 1.
  - The trait `BinaryJoinNode` can be utilized by `HashJoin` / `SortMergeJoin` (will implement soon) / `CartesianProduct` (inner join) / map-side join (Left/Inner/LeftSemi, assuming the right table is the build table) for all of the join types; and if we want to add code gen for the join condition, all we need to modify is the trait `BinaryJoinNode`.
- Add outer join support for HashJoin
  - With `BinaryJoinNode`, adding hash-based outer join support is easy.
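The hash-based outer join the last bullet refers to can be sketched in miniature: build a hash table on the build-side rows, probe it with the streamed side, and pad unmatched streamed rows with nulls. This is a toy illustration (plain Python, not the Spark SQL operator; all names are made up):

```python
from collections import defaultdict

def hash_left_outer_join(left, right, key_left, key_right):
    # Build phase: hash the build-side (right) rows by join key.
    table = defaultdict(list)
    for row in right:
        table[key_right(row)].append(row)
    # Probe phase: stream the left side, padding misses with None.
    out = []
    for row in left:
        matches = table.get(key_left(row))
        if matches:
            out.extend((row, m) for m in matches)
        else:
            out.append((row, None))  # outer-join padding for unmatched rows
    return out

left = [(1, "a"), (2, "b"), (3, "c")]
right = [(1, "x"), (1, "y")]
result = hash_left_outer_join(left, right, lambda r: r[0], lambda r: r[0])
# (1, "a") matches twice; (2, "b") and (3, "c") are padded with None
```

The same build/probe skeleton serves inner and left-semi joins by changing only what is emitted in the probe phase, which is the kind of sharing a common join trait enables.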
[GitHub] spark pull request: [SPARK-1112, 2156] (1.0 edition) Use correct a...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/1172#issuecomment-46801344 @aarondav mind taking a final pass and merging this?
[GitHub] spark pull request: [SPARK-1112, 2156] (1.0 edition) Use correct a...
Github user aarondav commented on the pull request: https://github.com/apache/spark/pull/1172#issuecomment-46801472 Absolutely. LGTM, merging into branch-1.0.
[GitHub] spark pull request: Compression should be a setting for individual...
Github user ScrapCodes closed the pull request at: https://github.com/apache/spark/pull/1091
[GitHub] spark pull request: Compression should be a setting for individual...
Github user ScrapCodes commented on the pull request: https://github.com/apache/spark/pull/1091#issuecomment-46801657 Thanks @rxin
[GitHub] spark pull request: SPARK-1937: fix issue with task locality
Github user lirui-intel commented on the pull request: https://github.com/apache/spark/pull/892#issuecomment-46801884 Sorry about the code style, and thanks @mateiz for pointing it out. I've updated the patch.
[GitHub] spark pull request: [SPARK-1112, 2156] (1.0 edition) Use correct a...
Github user aarondav commented on the pull request: https://github.com/apache/spark/pull/1172#issuecomment-46801936 You may have to close this manually, @pwendell, I'm not sure github will close it if it's not in master.
[GitHub] spark pull request: SPARK-1937: fix issue with task locality
Github user lirui-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/892#discussion_r14059200 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
@@ -181,16 +181,14 @@ private[spark] class TaskSetManager(
     var hadAliveLocations = false
     for (loc <- tasks(index).preferredLocations) {
       for (execId <- loc.executorId) {
-        if (sched.isExecutorAlive(execId)) {
-          addTo(pendingTasksForExecutor.getOrElseUpdate(execId, new ArrayBuffer))
-          hadAliveLocations = true
-        }
+        addTo(pendingTasksForExecutor.getOrElseUpdate(execId, new ArrayBuffer))
       }
       if (sched.hasExecutorsAliveOnHost(loc.host)) {
-        addTo(pendingTasksForHost.getOrElseUpdate(loc.host, new ArrayBuffer))
-        for (rack <- sched.getRackForHost(loc.host)) {
-          addTo(pendingTasksForRack.getOrElseUpdate(rack, new ArrayBuffer))
-        }
+        hadAliveLocations = true
       }
+      addTo(pendingTasksForHost.getOrElseUpdate(loc.host, new ArrayBuffer))
+      for (rack <- sched.getRackForHost(loc.host)) {
+        addTo(pendingTasksForRack.getOrElseUpdate(rack, new ArrayBuffer))
         hadAliveLocations = true
--- End diff -- Do you mean the TaskScheduler should provide something like hasHostOnRack, and we have to check that before setting hadAliveLocations to true?
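The data structures being reorganized in this diff follow a common scheduler pattern: each pending task index is registered under its preferred executor, host, and rack, so tasks can later be served at the best locality level available. A toy sketch of that registration (plain Python, not the real TaskSetManager; `rack_for_host` is an assumed lookup):

```python
from collections import defaultdict

class PendingTasks:
    """Register a task index under every locality level of its preferred
    locations, mirroring the executor/host/rack pending lists above."""

    def __init__(self, rack_for_host):
        self.for_executor = defaultdict(list)
        self.for_host = defaultdict(list)
        self.for_rack = defaultdict(list)
        self.rack_for_host = rack_for_host  # host -> rack mapping, assumed known

    def add_task(self, index, preferred_locations):
        # preferred_locations: list of (host, executor_id_or_None)
        for host, exec_id in preferred_locations:
            if exec_id is not None:
                self.for_executor[exec_id].append(index)  # PROCESS_LOCAL
            self.for_host[host].append(index)             # NODE_LOCAL
            rack = self.rack_for_host.get(host)
            if rack is not None:
                self.for_rack[rack].append(index)         # RACK_LOCAL

p = PendingTasks({"host1": "rackA", "host2": "rackA"})
p.add_task(0, [("host1", "exec-1")])
p.add_task(1, [("host2", None)])
assert p.for_executor["exec-1"] == [0]
assert p.for_rack["rackA"] == [0, 1]
```

The question in the thread is essentially whether registration should be gated on the location (executor, host, or rack) currently being alive, or done unconditionally with liveness checked at offer time.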
[GitHub] spark pull request: [SPARK-1112, 2156] (1.0 edition) Use correct a...
Github user pwendell closed the pull request at: https://github.com/apache/spark/pull/1172
[GitHub] spark pull request: [SPARK-1112, 2156] (1.0 edition) Use correct a...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/1172#issuecomment-46802157 Thanks, closed.
[GitHub] spark pull request: [WIP] [SQL] SPARK-1800 Add broadcast hash join...
Github user aarondav commented on a diff in the pull request: https://github.com/apache/spark/pull/1163#discussion_r14059227 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetRelation.scala ---
@@ -44,10 +49,21 @@ import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, LeafNode}
  * @param path The path to the Parquet file.
  */
 private[sql] case class ParquetRelation(
-    val path: String,
-    @transient val conf: Option[Configuration] = None) extends LeafNode with MultiInstanceRelation {
+    path: String,
+    @transient conf: Option[Configuration] = None)
+  extends LeafNode
+  with MultiInstanceRelation
+  with SizeEstimatableRelation[SQLContext] {
+  self: Product =>
+
+  def estimatedSize(context: SQLContext): Long = {
--- End diff -- Here we could probably estimate the size more accurately if we also had some semantic information, like which columns we wanted, as I believe Parquet stores stats for each column. Perhaps worthy of a TODO, this seems perfectly reasonable for now.
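The idea in this comment can be sketched with hypothetical numbers: columnar formats record a size per column chunk in their file metadata, so a scan that projects only some columns can be estimated by summing just those columns' recorded sizes rather than using the whole file size. Illustrative only (made-up stats, not the Parquet metadata API):

```python
# Assumed per-column compressed sizes, as a columnar file's footer
# might record them; the names and numbers are hypothetical.
column_chunk_bytes = {
    "user_id":   4_000_000,
    "timestamp": 6_000_000,
    "payload":  90_000_000,
}

def estimated_scan_size(projected_columns, stats):
    """Sum the recorded chunk sizes of just the projected columns,
    ignoring columns absent from the stats."""
    return sum(stats[c] for c in projected_columns if c in stats)

full = sum(column_chunk_bytes.values())
narrow = estimated_scan_size(["user_id", "timestamp"], column_chunk_bytes)
assert narrow == 10_000_000 and narrow < full  # projection-aware estimate is far smaller
```

A projection-aware estimate like this can flip a planner's decision, e.g. letting a wide table still qualify for a broadcast join when only its narrow columns are read.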
[GitHub] spark pull request: spark-ec2: quote command line args
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/1169#issuecomment-46803411 Thanks - I merged this into several maintenance branches and I also created this JIRA to track it: https://issues.apache.org/jira/browse/SPARK-2241
[GitHub] spark pull request: SPARK-2166 - Listing of instances to be termin...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/270#issuecomment-46804399 This was actually a pretty tough merge since we changed the spacing around a lot in `spark_ec2` recently. I went ahead and manually dealt with the merge. I also made two minor changes on merge.
[GitHub] spark pull request: SPARK-2166 - Listing of instances to be termin...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/270
[GitHub] spark pull request: SPARK-2150: Provide direct link to finished ap...
Github user rahulsinghaliitd commented on the pull request: https://github.com/apache/spark/pull/1094#issuecomment-46804664 @sryza thanks for the thumbs up. Although I wonder if the approach in https://github.com/apache/spark/pull/1112 is better for passing the UI address (certainly is much cleaner).
[GitHub] spark pull request: SPARK-2099. Report progress while task is runn...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/1056#issuecomment-46805087 Sure - it would be great to add a general heartbeat mechanism that is shared between this and the blockmanager.
[GitHub] spark pull request: [SPARK-2242] HOTFIX: Do not mask pyspark stder...
GitHub user andrewor14 opened a pull request: https://github.com/apache/spark/pull/1178 [SPARK-2242] HOTFIX: Do not mask pyspark stderr from output This reverts a change introduced in 3870248740d83b0292ccca88a494ce19783847f0 that masked stderr from surfacing to the `bin/pyspark` shell output. This is not a bug by itself, but if, for example, your `spark.master` is not specified correctly, your Spark jobs just hang without any output instead of indicating that they cannot connect to the master. That commit was not merged in branch-1.0, so this fix is for master only. You can merge this pull request into a Git repository by running: $ git pull https://github.com/andrewor14/spark fix-python Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/1178.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1178 commit 21c9d7c5af9d1647b496734dcd8fa3901bf8b19a Author: Andrew Or andrewo...@gmail.com Date: 2014-06-23T04:10:04Z Do not mask stderr from output
[GitHub] spark pull request: [SPARK-2242] HOTFIX: Do not mask pyspark stder...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1178#issuecomment-46805167 Merged build triggered.
[GitHub] spark pull request: [SPARK-2242] HOTFIX: Do not mask pyspark stder...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1178#issuecomment-46805169 Merged build started.
[GitHub] spark pull request: [SPARK-2234][SQL]Spark SQL basicOperators add ...
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/1151#discussion_r14060166

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala ---
@@ -369,6 +369,17 @@ class SQLQuerySuite extends QueryTest {
       (3, null)))
   }

+  test("subtract") {
+    checkAnswer(
+      sql("SELECT * FROM lowerCaseData SUBTRACT SELECT * FROM upperCaseData "),
+      (1, "a") ::
+      (2, "b") ::
+      (3, "c") ::
+      4, "d") :: Nil)
--- End diff --

Maybe you missed a '(' here
[GitHub] spark pull request: [SPARK-2234][SQL]Spark SQL basicOperators add ...
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/1151#discussion_r14060224

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/SqlParser.scala ---
@@ -119,6 +119,7 @@ class SqlParser extends StandardTokenParsers with PackratParsers {
   protected val UNCACHE = Keyword("UNCACHE")
   protected val UNION = Keyword("UNION")
   protected val WHERE = Keyword("WHERE")
+  protected val SUBTRACT = Keyword("SUBTRACT")
--- End diff --

I think we'd better use MINUS or EXCEPT instead of SUBTRACT
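Whatever the keyword is called, the operator under discussion computes a set difference: the distinct rows of the left relation that do not appear in the right relation, which is what standard SQL spells EXCEPT (Oracle spells it MINUS), hence the naming suggestion. A minimal sketch of those semantics on plain Scala collections; the `except` helper is hypothetical, for illustration only, and is not part of Spark SQL:

```scala
object ExceptDemo {
  // SQL EXCEPT semantics: keep each distinct row of the left relation
  // that does not occur anywhere in the right relation.
  def except[A](left: Seq[A], right: Seq[A]): Seq[A] = {
    val rightSet = right.toSet
    left.distinct.filterNot(rightSet.contains)
  }

  def main(args: Array[String]): Unit = {
    val lower = Seq((1, "a"), (2, "b"), (3, "c"), (4, "d"))
    val upper = Seq((2, "b"), (4, "d"))
    println(except(lower, upper)) // rows unique to the left relation
  }
}
```

Note that EXCEPT also deduplicates the left side, which is why the sketch calls `distinct`; EXCEPT ALL, which preserves duplicate counts, would need multiset bookkeeping instead.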
[GitHub] spark pull request: [SPARK-2242] HOTFIX: pyspark shell hangs on si...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1178#issuecomment-46806151 Merged build triggered.
[GitHub] spark pull request: [SPARK-2242] HOTFIX: pyspark shell hangs on si...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1178#issuecomment-46806156 Merged build started.
[GitHub] spark pull request: [SPARK-1946] Submit stage after (configured ra...
Github user li-zhihui commented on a diff in the pull request: https://github.com/apache/spark/pull/900#discussion_r14060589

--- Diff: yarn/common/src/main/scala/org/apache/spark/scheduler/cluster/YarnClusterSchedulerBackend.scala ---
@@ -0,0 +1,51 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.scheduler.cluster
+
+import org.apache.spark.{Logging, SparkContext}
+import org.apache.spark.deploy.yarn.ApplicationMasterArguments
+import org.apache.spark.scheduler.TaskSchedulerImpl
+
+import scala.collection.mutable.ArrayBuffer
+
+private[spark] class YarnClusterSchedulerBackend(
+    scheduler: TaskSchedulerImpl,
+    sc: SparkContext)
+  extends CoarseGrainedSchedulerBackend(scheduler, sc.env.actorSystem)
+  with Logging {
+
+  private[spark] def addArg(optionName: String, envVar: String, sysProp: String,
+      arrayBuf: ArrayBuffer[String]) {
+    if (System.getenv(envVar) != null) {
+      arrayBuf += (optionName, System.getenv(envVar))
+    } else if (sc.getConf.contains(sysProp)) {
+      arrayBuf += (optionName, sc.getConf.get(sysProp))
+    }
+  }
+
+  override def start() {
+    super.start()
+    val argsArrayBuf = new ArrayBuffer[String]()
+    List(("--num-executors", "SPARK_EXECUTOR_INSTANCES", "spark.executor.instances"),
+      ("--num-executors", "SPARK_WORKER_INSTANCES", "spark.worker.instances"))
+      .foreach { case (optName, envVar, sysProp) => addArg(optName, envVar, sysProp, argsArrayBuf) }
+    val args = new ApplicationMasterArguments(argsArrayBuf.toArray)
+    totalExecutors.set(args.numExecutors)
--- End diff --

@kayousterhout Here ApplicationMasterArguments is used to get the default value of numExecutors (it's 2, now).
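The `addArg` helper in the diff above encodes a two-level precedence: the environment variable wins, the Spark conf entry is used only as a fallback, and nothing is appended when neither is set. A runnable sketch of that precedence logic, with the environment and conf passed in as plain Maps so it runs without a SparkContext (an assumption for illustration; the real code reads `System.getenv` and `sc.getConf`):

```scala
import scala.collection.mutable.ArrayBuffer

object ArgFallbackDemo {
  // Same precedence as addArg above: env var first, conf entry second,
  // no-op when both are absent. Appends "--option value" pairs to buf.
  def addArg(
      optionName: String,
      envVar: String,
      sysProp: String,
      env: Map[String, String],
      conf: Map[String, String],
      buf: ArrayBuffer[String]): Unit = {
    env.get(envVar).orElse(conf.get(sysProp)).foreach { value =>
      buf ++= Seq(optionName, value)
    }
  }
}
```

`Option.orElse` collapses the if/else-if chain of the original into one expression while keeping the same lookup order.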
[GitHub] spark pull request: SPARK-1729. Make Flume pull data from source, ...
Github user harishreedharan commented on the pull request: https://github.com/apache/spark/pull/807#issuecomment-46806458 @tdas - Have you gotten a chance to take a look at this? Thanks!
[GitHub] spark pull request: [SPARK-2242] HOTFIX: pyspark shell hangs on si...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1178#issuecomment-46806569 Merged build finished. All automated tests passed.
[GitHub] spark pull request: [SPARK-2242] HOTFIX: pyspark shell hangs on si...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1178#issuecomment-46806570 All automated tests passed. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16015/
[GitHub] spark pull request: [SPARK-2234][SQL]Spark SQL basicOperators add ...
Github user YanjieGao commented on a diff in the pull request: https://github.com/apache/spark/pull/1151#discussion_r14060925

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala ---
@@ -369,6 +369,17 @@ class SQLQuerySuite extends QueryTest {
       (3, null)))
   }

+  test("subtract") {
+    checkAnswer(
+      sql("SELECT * FROM lowerCaseData SUBTRACT SELECT * FROM upperCaseData "),
+      (1, "a") ::
+      (2, "b") ::
+      (3, "c") ::
+      4, "d") :: Nil)
--- End diff --

Thanks, I have corrected it
[GitHub] spark pull request: [SPARK-2234][SQL]Spark SQL basicOperators add ...
Github user YanjieGao commented on a diff in the pull request: https://github.com/apache/spark/pull/1151#discussion_r14060932

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/SqlParser.scala ---
@@ -119,6 +119,7 @@ class SqlParser extends StandardTokenParsers with PackratParsers {
   protected val UNCACHE = Keyword("UNCACHE")
   protected val UNION = Keyword("UNION")
   protected val WHERE = Keyword("WHERE")
+  protected val SUBTRACT = Keyword("SUBTRACT")
--- End diff --

Thanks, I have corrected it
[GitHub] spark pull request: [SPARK-2242] HOTFIX: pyspark shell hangs on si...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1178#issuecomment-46807454 Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16016/
[GitHub] spark pull request: [SPARK-2242] HOTFIX: pyspark shell hangs on si...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1178#issuecomment-46807453 Merged build finished.
[GitHub] spark pull request: [SPARK-2242] HOTFIX: pyspark shell hangs on si...
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/1178#issuecomment-46807757 Jenkins test this please