[GitHub] spark pull request: [SPARK-4721][CORE] Improve logic while first t...
Github user suyanNone commented on the pull request: https://github.com/apache/spark/pull/3582#issuecomment-66031430

@JoshRosen I guess:

1. Two threads in the same executor
1.1 Two threads in the same executor; the executor has 4 cores, and the CPUs per task is 1.
    RDDC <-- other: TaskSet1
    RDDA.cache <-- RDDB <-- other: TaskSet2
    Thread A is given to TaskSet1 and thread B is given to TaskSet2, because A can't get any task, either because of locality or because A's tasks are all scheduled. It can then happen that threads A and B both deal with the same partition at the same time.

2. Two threads in different executors
2.2 blockA is replicated to ExecutorB while ExecutorB is just about to cache blockA after calling cacheManager.putInBlockManager.

--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
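The same-executor scenario in the comment above (two task threads reaching the same partition at the same time) can be illustrated with a small sketch. This is not Spark's CacheManager or BlockManager and not the approach taken in the pull request; `ToyBlockCache`, `SameBlockRace`, and the block ID `rdd_0_0` are hypothetical names chosen for illustration. It only shows one generic way to make sure a block is computed once when two threads request it concurrently, here by relying on `ConcurrentHashMap.computeIfAbsent`.

```scala
import java.util.concurrent.{Callable, ConcurrentHashMap, CountDownLatch, Executors}
import java.util.concurrent.atomic.AtomicInteger
import java.util.function.{Function => JFunction}

// Hypothetical toy cache; an assumption for illustration, not Spark code.
final class ToyBlockCache {
  private val blocks = new ConcurrentHashMap[String, Seq[Int]]()
  // Counts how many times a block was actually computed.
  val computeCount = new AtomicInteger(0)

  // computeIfAbsent applies the function at most once per key; a second thread
  // asking for the same key waits for the first computation and reuses its result.
  def getOrCompute(blockId: String)(compute: => Seq[Int]): Seq[Int] =
    blocks.computeIfAbsent(blockId, new JFunction[String, Seq[Int]] {
      def apply(id: String): Seq[Int] = {
        computeCount.incrementAndGet()
        compute
      }
    })
}

object SameBlockRace {
  // Starts two threads that request the same block at the same moment.
  // Returns what each thread read and how many computations happened.
  def run(): (Seq[Seq[Int]], Int) = {
    val cache = new ToyBlockCache
    val start = new CountDownLatch(1)
    val pool = Executors.newFixedThreadPool(2)
    try {
      val futures = Seq("A", "B").map { _ =>
        pool.submit(new Callable[Seq[Int]] {
          def call(): Seq[Int] = {
            start.await() // release both threads together
            cache.getOrCompute("rdd_0_0")(Seq(1, 2, 3))
          }
        })
      }
      start.countDown()
      (futures.map(_.get()), cache.computeCount.get())
    } finally pool.shutdown()
  }

  def main(args: Array[String]): Unit = {
    val (results, computed) = run()
    println(s"results = $results, computations = $computed")
  }
}
```

Whether the real code path needs such a guard depends on whether two writers can actually occur, which is the open question in this thread.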
[GitHub] spark pull request: [SPARK-4483][SQL]Optimization about reduce mem...
Github user tianyi commented on a diff in the pull request: https://github.com/apache/spark/pull/3375#discussion_r21435396

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashOuterJoin.scala ---
@@ -68,62 +68,59 @@ case class HashOuterJoin(
   @transient private[this] lazy val DUMMY_LIST = Seq[Row](null)
   @transient private[this] lazy val EMPTY_LIST = Seq.empty[Row]
+  @transient private[this] lazy val joinedRow = new JoinedRow()
--- End diff --

Hi @marmbrus, I have tested this code in the following way:

1. I added the following code in `leftOuterIterator`, just before the iterator is returned:
```
System.out.println("["+Thread.currentThread().getName+"]The hashcode of joinedRow = ["+ System.identityHashCode(joinedRow) +"]");
```
2. I started a thriftserver with the `-Dspark.master=local[4]` property to make sure there are 4 tasks running in the same JVM.
3. Then I ran a SQL query like this one (same table and data as in my benchmark test):
```
select count(1) from test_csv a left outer join dim_csv b on a.key = b.key;
```
4. I got the following logs:
```
[Executor task launch worker-0]The hashcode of joinedRow = [155362507]
[Executor task launch worker-1]The hashcode of joinedRow = [2032520348]
[Executor task launch worker-2]The hashcode of joinedRow = [1382003837]
[Executor task launch worker-3]The hashcode of joinedRow = [459410904]
[Executor task launch worker-2]The hashcode of joinedRow = [1382003837]
[Executor task launch worker-1]The hashcode of joinedRow = [2032520348]
[Executor task launch worker-0]The hashcode of joinedRow = [155362507]
[Executor task launch worker-1]The hashcode of joinedRow = [2032520348]
[Executor task launch worker-2]The hashcode of joinedRow = [1382003837]
[Executor task launch worker-3]The hashcode of joinedRow = [459410904]
```

I think this shows that the code should work fine with multiple tasks running in the same JVM.
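The check described above can be reproduced outside Spark with a small sketch. It assumes, as the log output suggests, that each task works on its own copy of the operator, so that a `lazy val` field is initialized once per copy. `Operator` and `LazyValPerInstance` are hypothetical names, not Spark classes, and the sketch compares references with `eq` rather than relying on `System.identityHashCode`, which is not guaranteed to be unique.

```scala
import java.util.concurrent.{Callable, Executors}

// Hypothetical stand-in for an operator that keeps a reusable mutable buffer.
final class Operator extends Serializable {
  // Initialized on first access, once per Operator instance.
  @transient private lazy val buffer = new StringBuilder
  def bufferRef: AnyRef = buffer
}

object LazyValPerInstance {
  // Runs `tasks` tasks, each with its own Operator, and returns the buffer
  // reference that each task observed on two successive accesses.
  def run(tasks: Int): Seq[(AnyRef, AnyRef)] = {
    val pool = Executors.newFixedThreadPool(tasks)
    try {
      val futures = (0 until tasks).map { _ =>
        pool.submit(new Callable[(AnyRef, AnyRef)] {
          def call(): (AnyRef, AnyRef) = {
            val op = new Operator // assumption: one operator copy per task
            (op.bufferRef, op.bufferRef)
          }
        })
      }
      futures.map(_.get())
    } finally pool.shutdown()
  }

  def main(args: Array[String]): Unit =
    run(4).foreach { case (ref, _) =>
      println(s"buffer identity = ${System.identityHashCode(ref)}")
    }
}
```

Within one task the two accesses return the same object, and different tasks never share one. If several tasks shared a single operator instance instead, they would share the buffer too, so the sketch says nothing about that case.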
[GitHub] spark pull request: [SPARK-4714][CORE]: Add checking info is null ...
Github user suyanNone commented on the pull request: https://github.com/apache/spark/pull/3574#issuecomment-65976983 @JoshRosen Yes, it will not cause any error if there is no check in removeBlock and dropOldBlocks. I opened this pull request because I think it is more reasonable semantically, and I have already removed the check in removeBlock and dropOldBlocks.
[GitHub] spark pull request: [SPARK-3802][BUILD] Scala version is wrong in ...
Github user nchammas commented on the pull request: https://github.com/apache/spark/pull/2661#issuecomment-65974844 @sarutak It looks like this code has now been [changed](https://github.com/apache/spark/blob/8817fc7fe8785d7b11138ca744f22f7e70f1f0a0/dev/audit-release/blank_sbt_build/build.sbt#L22) to read from an environment variable. Do we still need this PR?
[GitHub] spark pull request: [WIP] SPARK-2126: Move MapOutputTracker behind...
Github user nchammas commented on the pull request: https://github.com/apache/spark/pull/1240#issuecomment-65974769 Okie doke!
[GitHub] spark pull request: [WIP] SPARK-2126: Move MapOutputTracker behind...
Github user CodingCat closed the pull request at: https://github.com/apache/spark/pull/1240
[GitHub] spark pull request: [WIP] SPARK-2126: Move MapOutputTracker behind...
Github user CodingCat commented on the pull request: https://github.com/apache/spark/pull/1240#issuecomment-65974706 We can close it, as the relevant code has changed a lot.
[GitHub] spark pull request: [WIP] SPARK-2126: Move MapOutputTracker behind...
Github user nchammas commented on the pull request: https://github.com/apache/spark/pull/1240#issuecomment-65974642 @pwendell @mateiz What is the status of this PR? It's been 5 months since the last update. Should @CodingCat continue to work on this or close it and move on?
[GitHub] spark pull request: [SPARK-4620] Add unpersist in Graph and GraphI...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/3476
[GitHub] spark pull request: [SPARK-4620] Add unpersist in Graph and GraphI...
Github user ankurdave commented on the pull request: https://github.com/apache/spark/pull/3476#issuecomment-65972811 Thanks! Merged into master & branch-1.2.
[GitHub] spark pull request: [SPARK-4646] Replace Scala.util.Sorting.quickS...
Github user ankurdave commented on the pull request: https://github.com/apache/spark/pull/3507#issuecomment-65972597 Thanks! Merged into master & branch-1.2.
[GitHub] spark pull request: [SPARK-4646] Replace Scala.util.Sorting.quickS...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/3507
[GitHub] spark pull request: [SPARK-4721][CORE] Improve logic while first t...
Github user suyanNone commented on the pull request: https://github.com/apache/spark/pull/3582#issuecomment-65970969 @JoshRosen Okay, I will check whether multiple threads can be the writer. I wrote this code because, when I read the current code, I thought multiple threads could put the same block, so I started from that point to improve it. I will comment on that later. As for my code, the reading thread only waits once, whether the write succeeds or fails.
[GitHub] spark pull request: [SPARK-4769] [SQL] CTAS does not work when rea...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3336#issuecomment-65968143 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24218/ Test PASSed.
[GitHub] spark pull request: [SPARK-4769] [SQL] CTAS does not work when rea...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3336#issuecomment-65968142 [Test build #24218 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24218/consoleFull) for PR 3336 at commit [`3563142`](https://github.com/apache/spark/commit/356314220d095ef125ea4a925578179b1b33dde6). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4769] [SQL] CTAS does not work when rea...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3336#issuecomment-65967881 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24217/ Test PASSed.
[GitHub] spark pull request: [SPARK-4769] [SQL] CTAS does not work when rea...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3336#issuecomment-65967878 [Test build #24217 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24217/consoleFull) for PR 3336 at commit [`e215187`](https://github.com/apache/spark/commit/e21518724febd2bc9e761fa6502a6924fd6a22e6). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4769] [SQL] CTAS does not work when rea...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3336#issuecomment-65964919 [Test build #24218 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24218/consoleFull) for PR 3336 at commit [`3563142`](https://github.com/apache/spark/commit/356314220d095ef125ea4a925578179b1b33dde6). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4769] [SQL] CTAS does not work when rea...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3336#issuecomment-65964575 [Test build #24217 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24217/consoleFull) for PR 3336 at commit [`e215187`](https://github.com/apache/spark/commit/e21518724febd2bc9e761fa6502a6924fd6a22e6). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3639] [Streaming] [Kinesis] Allow users...
Github user cfregly commented on the pull request: https://github.com/apache/spark/pull/3092#issuecomment-65963749 @tdas the summary is here: https://issues.apache.org/jira/browse/SPARK-3640?focusedCommentId=14204334&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14204334 @aniketbhatnagar: is this still a valid Jira and PR? or should we close them? lemme know. thanks! -chris
[GitHub] spark pull request: [SPARK-4769] [SQL] CTAS does not work when rea...
Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/3336#issuecomment-65963725 @marmbrus Any comment on this?
[GitHub] spark pull request: SPARK-3996: Shade Jetty in Spark deliverables.
Github user mingyukim commented on the pull request: https://github.com/apache/spark/pull/3130#issuecomment-65959927 To add a more specific use case for using Spark without spark-submit: we have a REST server with a long-running SparkContext, which serves simple queries like rdd.take(100) or rdd.count(). This would be possible with spark-submit, but a lot harder to implement, because the spark-submit job would need to act like another REST server waiting for requests. I strongly believe that supporting job submission from an independent application will be crucial for the wide adoption of Spark in many other applications.
[GitHub] spark pull request: SPARK-3655 GroupByKeyAndSortValues
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3632#issuecomment-65958988 Can one of the admins verify this patch?
[GitHub] spark pull request: SPARK-3655 GroupByKeyAndSortValues
GitHub user koertkuipers opened a pull request: https://github.com/apache/spark/pull/3632

SPARK-3655 GroupByKeyAndSortValues

See https://issues.apache.org/jira/browse/SPARK-3655

This pullreq is based on the approach that uses repartitionAndSortWithinPartition, but only implements GroupByKeyAndSortValues and not foldLeft.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tresata/spark feat-group-by-key-and-sort-values

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3632.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3632

commit 7e3cde989ec93849d60988e6d9fae729ca0c46a4
Author: Koert Kuipers
Date: 2014-12-07T20:16:53Z
works but Iterables in signature are not right

commit 42075338a32c40e4b962b547dbc74aad89351207
Author: Koert Kuipers
Date: 2014-12-07T21:57:25Z
change groupByKeyAndSortValues to return RDD[(K, TraversableOnce[V]) instead of RDD[(K, Iterable[V]). i dont think the Iterable version can be implemented efficiently
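The approach mentioned in the pull request description (partition by key, sort within each partition, then group adjacent records) can be sketched on plain Scala collections. This is not the code from the pull request and does not use Spark; `GroupSortedSketch`, `partitionOf`, and `groupAdjacent` are hypothetical names, and the hash-based partitioning is an assumption made for illustration.

```scala
object GroupSortedSketch {
  // Non-negative partition index for a key (hash partitioning is an assumption).
  private def partitionOf[K](key: K, numPartitions: Int): Int = {
    val m = key.hashCode % numPartitions
    if (m < 0) m + numPartitions else m
  }

  // Walks a (key, value)-sorted sequence and cuts it wherever the key changes.
  // Because the input is already sorted, each group's values come out in order.
  private def groupAdjacent[K, V](sorted: Seq[(K, V)]): List[(K, List[V])] =
    sorted.foldRight(List.empty[(K, List[V])]) {
      case ((k, v), (k2, vs) :: rest) if k == k2 => (k, v :: vs) :: rest
      case ((k, v), acc)                         => (k, List(v)) :: acc
    }

  // Local imitation of "repartition, sort within partitions, group adjacent keys".
  // Returns, per partition index, the keys with their values in sorted order.
  def groupByKeyAndSortValues[K: Ordering, V: Ordering](
      records: Seq[(K, V)],
      numPartitions: Int): Map[Int, List[(K, List[V])]] =
    records
      .groupBy { case (k, _) => partitionOf(k, numPartitions) }
      .map { case (p, recs) => p -> groupAdjacent(recs.sorted) }

  def main(args: Array[String]): Unit = {
    val input = Seq("b" -> 2, "a" -> 3, "b" -> 1, "a" -> 1)
    groupByKeyAndSortValues(input, numPartitions = 2).foreach(println)
  }
}
```

The sketch materializes each group as a `List` for readability. On a sorted stream the groups could instead be handed out as single-pass iterators, which matches the second commit message's preference for `TraversableOnce` over `Iterable`.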
[GitHub] spark pull request: #2808 update kafka to version 0.8.2
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3631#issuecomment-65941807 Can one of the admins verify this patch?
[GitHub] spark pull request: #2808 update kafka to version 0.8.2
GitHub user helena opened a pull request: https://github.com/apache/spark/pull/3631 #2808 update kafka to version 0.8.2 #2808 update kafka to version 0.8.2. Kafka 0.8.2 is in beta still. You can merge this pull request into a Git repository by running: $ git pull https://github.com/helena/spark wip-2808-kafka-0.8.2-upgrade Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3631.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3631 commit e768164fca1c93ec0a99f7020e301368f798156c Author: Helena Edelson Date: 2014-12-07T15:50:44Z #2808 update kafka to version 0.8.2
[GitHub] spark pull request: [SPARK-4461][YARN] pass extra java options to ...
Github user WangTaoTheTonic commented on the pull request: https://github.com/apache/spark/pull/3409#issuecomment-65939827 If we use `spark.yarn.clientmode.am.extraJavaOptions`, then we should tell users that Application Master configurations containing `clientmode` are only active in yarn-client mode, and that the AM will use the corresponding items in yarn-cluster mode. If we use `spark.yarn.am.extraJavaOptions` instead, we have to tell users the same thing. So why don't we keep the name shorter?
[GitHub] spark pull request: SPARK-3779. yarn spark.yarn.applicationMaster....
Github user WangTaoTheTonic commented on the pull request: https://github.com/apache/spark/pull/3471#issuecomment-65938199 LGTM +1