[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3249#issuecomment-65758205 [Test build #24177 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24177/consoleFull) for PR 3249 at commit [`ea4e121`](https://github.com/apache/spark/commit/ea4e121bf7adc52a26583c795a748f794b838dfb). * This patch merges cleanly. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3249#issuecomment-65758275 [Test build #24177 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24177/consoleFull) for PR 3249 at commit [`ea4e121`](https://github.com/apache/spark/commit/ea4e121bf7adc52a26583c795a748f794b838dfb). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class SubqueryExpression(value: Expression, subquery: LogicalPlan) extends Expression `
[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3249#issuecomment-65758277 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24177/ Test FAILed.
[GitHub] spark pull request: [SPARK-4714][CORE]: Add checking info is null ...
Github user suyanNone commented on a diff in the pull request: https://github.com/apache/spark/pull/3574#discussion_r21359660
--- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala ---
@@ -1089,15 +1089,17 @@ private[spark] class BlockManager(
     val info = blockInfo.get(blockId).orNull
     if (info != null) {
       info.synchronized {
-        // Removals are idempotent in disk store and memory store. At worst, we get a warning.
-        val removedFromMemory = memoryStore.remove(blockId)
-        val removedFromDisk = diskStore.remove(blockId)
-        val removedFromTachyon = if (tachyonInitialized) tachyonStore.remove(blockId) else false
-        if (!removedFromMemory && !removedFromDisk && !removedFromTachyon) {
-          logWarning(s"Block $blockId could not be removed as it was not found in either " +
-            "the disk, memory, or tachyon store")
+        if (blockInfo.get(blockId).isEmpty) {
--- End diff --
1. Thread A enters removeBlock() and gets the info for blockId1.
2. Thread B enters dropFromMemory() and gets the info for the same blockId1.
Now both A and B try to enter info.synchronized. B wins, drops the block from memory, and since the block is memory-only it is also removed from blockInfo; then B releases the lock. A then acquires the lock, but the info is no longer in blockInfo. Have I missed or misunderstood something? Or is it the case that dropFromMemory and removeBlock can never happen at the same time?
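The interleaving described above can be sketched in plain Python (illustrative names only, not the actual BlockManager API): the thread that loses the race must re-check the registry after acquiring the per-block lock, which is exactly what the added `blockInfo.get(blockId).isEmpty` guard does.

```python
import threading

# Sketch of the race described above (hypothetical names): two threads look
# up the same block's info object, then serialize on its lock. The loser must
# re-check that the block is still registered, because the winner may have
# removed it while holding the lock.
block_info = {"block1": {"lock": threading.Lock()}}

def remove_block(block_id):
    info = block_info.get(block_id)
    if info is None:
        return "not found"
    with info["lock"]:
        # Re-check after acquiring the lock: another thread (e.g. a
        # dropFromMemory-style path) may have deleted the entry between
        # our lookup above and this point.
        if block_id not in block_info:
            return "removed concurrently"
        del block_info[block_id]
        return "removed"
```

The first caller wins and deletes the entry; any later caller observes the missing entry instead of operating on a stale `info` object.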
[GitHub] spark pull request: [SPARK-4005][CORE] handle message replies in r...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2853#issuecomment-65759456 [Test build #541 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/541/consoleFull) for PR 2853 at commit [`9b06f0a`](https://github.com/apache/spark/commit/9b06f0ae7e69fba6a87f56ee97ffad6fd0a20e4b). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3249#issuecomment-65760473 [Test build #24178 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24178/consoleFull) for PR 3249 at commit [`d62887e`](https://github.com/apache/spark/commit/d62887e7ed323b4de0bb7ff61b7bece21fd85d17). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4528][SQL] add comment support for Spar...
Github user tsingfu commented on the pull request: https://github.com/apache/spark/pull/3501#issuecomment-65760992 @marmbrus Do we need to do anything more? If this implementation is too complex, we could reduce the number of supported comment styles (supporting only -- as the comment marker) to keep the code simpler. What do you think?
[GitHub] spark pull request: [SPARK-4688] Have a single shared network time...
Github user varunsaxena commented on the pull request: https://github.com/apache/spark/pull/3562#issuecomment-65761296 @rxin , I will summarize the configuration defaults I have used. I put a value of 100 in the initial pull request with the intention of having a further discussion on appropriate defaults. There are two possible approaches. We can continue using the same defaults as earlier, meaning spark.network.timeout would effectively have different default values depending on context; or we can decide on a single fixed default. I think the latter should be done, but an appropriate value has to be chosen. 1. spark.core.connection.ack.wait.timeout - a default of 60s was used earlier. 2. spark.shuffle.io.connectionTimeout - a default of 120s was used earlier. 3. spark.akka.timeout - a default of 100s was used earlier. 4. spark.storage.blockManagerSlaveTimeoutMs - here the default was 3 times spark.executor.heartbeatInterval or 45s, whichever is higher. Based on these cases I think we can fix a default of 120s for spark.network.timeout. The only issue I can see is in case 4, but 120s should be a good enough upper cap.
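The precedence being discussed can be sketched as follows (a hypothetical helper, not the actual SparkConf API): each specific timeout wins if explicitly set, otherwise it falls back to the shared spark.network.timeout, which itself defaults to the proposed 120 seconds.

```python
# Sketch of the proposed fallback chain (names are illustrative, not Spark's
# actual API): specific key -> spark.network.timeout -> hard default of 120 s.
def resolve_timeout(conf, specific_key, shared_default=120):
    if specific_key in conf:
        return conf[specific_key]
    return conf.get("spark.network.timeout", shared_default)

conf = {"spark.network.timeout": 100}
# An unset specific timeout inherits the shared value of 100 s.
resolve_timeout(conf, "spark.shuffle.io.connectionTimeout")
```

With nothing set at all, every timeout would resolve to the single shared default, which is the "fixed default value" option described above.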
[GitHub] spark pull request: [spark-4691][shuffle]Code improvement for aggr...
Github user maji2014 commented on the pull request: https://github.com/apache/spark/pull/3553#issuecomment-65761712 @pwendell any idea about this title?
[GitHub] spark pull request: [SPARK-4688] Have a single shared network time...
Github user varunsaxena commented on a diff in the pull request: https://github.com/apache/spark/pull/3562#discussion_r21361036
--- Diff: docs/configuration.md ---
@@ -777,6 +777,16 @@ Apart from these, the following properties are also available, and may be useful
   </td>
 </tr>
 <tr>
+  <td><code>spark.network.timeout</code></td>
+  <td>100</td>
+  <td>
+    Default timeout for all network interactions, in seconds. This config will be used in
+    place of spark.core.connection.ack.wait.timeout, spark.akka.timeout,
--- End diff --
Ok...
[GitHub] spark pull request: SPARK-4762: Add support for tuples in 'where i...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3618#issuecomment-65764171 Can one of the admins verify this patch?
[GitHub] spark pull request: SPARK-4762: Add support for tuples in 'where i...
GitHub user saucam opened a pull request: https://github.com/apache/spark/pull/3618 SPARK-4762: Add support for tuples in 'where in' clause query Currently, in the WHERE ... IN clause the filter is applied only on a single column. We can enhance it to accept a filter on multiple columns. Current support is for queries like: Select * from table where c1 in (value1, value2, ... value n); This adds support for queries like: Select * from table where (c1, c2, ... cn) in ((value1, value2, ... value n), (value1', value2', ... value n')); Also added an optimized version of the tuple WHERE ... IN clause, where we create a hash set of the filter tuples for matching rows. This also requires a change in the Hive parser, since currently there is no support for multiple columns in the IN clause. You can merge this pull request into a Git repository by running: $ git pull https://github.com/saucam/spark tuple_where_clause Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3618.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3618 commit c877926c64c7c6f2048d31759f35446c9cec1cdc Author: Yash Datta yash.da...@guavus.com Date: 2014-12-05T08:55:29Z SPARK-4762: 1. Add support for tuples in 'where in' clause query 2. Also adds an optimized version of the same, which uses a hash set to filter rows
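The hash-set optimization described in the PR can be sketched in plain Python (an illustrative model, not the actual Catalyst implementation): build a set of the literal tuples once, then test each row's projected column tuple for membership in O(1).

```python
# Sketch of a tuple-IN filter (hypothetical helper, not Spark's code):
# build a hash set of the literal tuples once, then keep each row whose
# (c1, c2, ...) projection is a member of that set.
def tuple_in_filter(rows, columns, literal_tuples):
    allowed = set(literal_tuples)  # O(1) membership test per row
    return [row for row in rows
            if tuple(row[c] for c in columns) in allowed]

rows = [{"c1": 1, "c2": "a"}, {"c1": 2, "c2": "b"}]
# Models: SELECT * FROM table WHERE (c1, c2) IN ((1, 'a'), (3, 'c'))
tuple_in_filter(rows, ("c1", "c2"), [(1, "a"), (3, "c")])
```

This is why the hash-set variant matters: a naive implementation compares each row against every literal tuple, while the set lookup is independent of the number of tuples in the IN list.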
[GitHub] spark pull request: SPARK-4762: Add support for tuples in 'where i...
Github user saucam commented on the pull request: https://github.com/apache/spark/pull/3618#issuecomment-65764552 @pwendell this PR requires a change in the Hive parser, for which I created a PR against the Hive trunk here: https://github.com/apache/hive/pull/25. Can you please suggest whether I need to open this request against some other branch which is used for the Spark build?
[GitHub] spark pull request: [SPARK-2554][SQL] Supporting SumDistinct parti...
Github user ravipesala commented on the pull request: https://github.com/apache/spark/pull/3348#issuecomment-65766741 I have rebased with master. Please review.
[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3249#issuecomment-65767090 [Test build #24178 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24178/consoleFull) for PR 3249 at commit [`d62887e`](https://github.com/apache/spark/commit/d62887e7ed323b4de0bb7ff61b7bece21fd85d17). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class SubqueryExpression(value: Expression, subquery: LogicalPlan) extends Expression `
[GitHub] spark pull request: SPARK-4338. Ditch yarn-alpha.
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/3215#issuecomment-65767093 @tgravescs @andrewor14 do you feel comfortable merging this now that 1.2 is out the door?
[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3249#issuecomment-65767098 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24178/ Test PASSed.
[GitHub] spark pull request: SPARK-4762: Add support for tuples in 'where i...
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/3618#issuecomment-65767357 @saucam mind tagging this PR as [SQL]?
[GitHub] spark pull request: SPARK-4338. Ditch yarn-alpha.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3215#issuecomment-65767819 [Test build #24180 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24180/consoleFull) for PR 3215 at commit [`1c5ac08`](https://github.com/apache/spark/commit/1c5ac0889f387a27650b7f4dd37bb315f96dd201). * This patch merges cleanly.
[GitHub] spark pull request: SPARK-4338. Ditch yarn-alpha.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3215#issuecomment-65768623 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24179/ Test FAILed.
[GitHub] spark pull request: [SPARK-4763] All-pairs shortest paths algorith...
GitHub user ankurdave opened a pull request: https://github.com/apache/spark/pull/3619 [SPARK-4763] All-pairs shortest paths algorithm for GraphX Computes unweighted all-pairs shortest paths, returning an RDD containing the shortest-path distance between all pairs of reachable vertices. The algorithm is similar to distance-vector routing and runs in `O(|V|)` iterations. You can merge this pull request into a Git repository by running: $ git pull https://github.com/ankurdave/spark SPARK-4763-all-pairs-shortest-paths Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3619.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3619 commit b69cffbc831ae79d1f5b56713ff839f80ba0caaa Author: Ankur Dave ankurd...@gmail.com Date: 2014-12-05T10:01:34Z Checkpoint every 25 iterations in Pregel commit 0c0e18465b871d97ddf0b11d86c3462147d45fb7 Author: Ankur Dave ankurd...@gmail.com Date: 2014-12-05T10:00:49Z Add AllPairsShortestPaths and unit test
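The result the PR computes (unweighted shortest-path distances between all pairs of reachable vertices) can be sketched with one BFS per source vertex in plain Python; the GraphX version instead propagates distance vectors in O(|V|) Pregel-style iterations, but the output mapping is the same.

```python
from collections import deque

# Reference sketch (not the GraphX implementation): unweighted all-pairs
# shortest paths via one breadth-first search per source vertex. Returns
# dist[src][dst] = hop count for every dst reachable from src.
def all_pairs_shortest_paths(adj):
    dist = {}
    for src in adj:
        dist[src] = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist[src]:  # first visit is the shortest path
                    dist[src][v] = dist[src][u] + 1
                    queue.append(v)
    return dist

# Chain 1 -> 2 -> 3: vertex 3 is two hops from vertex 1.
all_pairs_shortest_paths({1: [2], 2: [3], 3: []})
```

Unreachable pairs are simply absent from the mapping, matching the "all pairs of reachable vertices" wording above.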
[GitHub] spark pull request: [SPARK-4763] All-pairs shortest paths algorith...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3619#issuecomment-65769987 [Test build #24181 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24181/consoleFull) for PR 3619 at commit [`0c0e184`](https://github.com/apache/spark/commit/0c0e18465b871d97ddf0b11d86c3462147d45fb7). * This patch merges cleanly.
[GitHub] spark pull request: SPARK-4338. Ditch yarn-alpha.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3215#issuecomment-65776868 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24180/ Test PASSed.
[GitHub] spark pull request: SPARK-4338. Ditch yarn-alpha.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3215#issuecomment-65776860 [Test build #24180 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24180/consoleFull) for PR 3215 at commit [`1c5ac08`](https://github.com/apache/spark/commit/1c5ac0889f387a27650b7f4dd37bb315f96dd201). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4763] All-pairs shortest paths algorith...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3619#issuecomment-65778564 [Test build #24181 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24181/consoleFull) for PR 3619 at commit [`0c0e184`](https://github.com/apache/spark/commit/0c0e18465b871d97ddf0b11d86c3462147d45fb7). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4763] All-pairs shortest paths algorith...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3619#issuecomment-65778569 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24181/ Test PASSed.
[GitHub] spark pull request: [SPARK-4714][CORE]: Add checking info is null ...
Github user suyanNone commented on the pull request: https://github.com/apache/spark/pull/3574#issuecomment-65779235 @JoshRosen Thread A enters removeBlock() and gets the info for blockId1. Thread B enters dropFromMemory() and gets the info for the same blockId1. Now both A and B try to enter info.synchronized. B wins, drops the block from memory, and since the block is memory-only it is also removed from blockInfo; then B releases the lock. A then acquires the lock, but the info is no longer in blockInfo. Have I missed or misunderstood something? Or is it the case that dropFromMemory and removeBlock can never happen at the same time?
[GitHub] spark pull request: [SPARK-4006] In long running contexts, we enco...
Github user tsliwowicz commented on the pull request: https://github.com/apache/spark/pull/2914#issuecomment-65783784 Seems like an issue with Jenkins
[GitHub] spark pull request: [SPARK-4006] Block Manager - Double Register C...
Github user tsliwowicz commented on the pull request: https://github.com/apache/spark/pull/2854#issuecomment-65783822 Seems like an issue with Jenkins
[GitHub] spark pull request: do you mean inadvertently?
GitHub user CrazyJvm opened a pull request: https://github.com/apache/spark/pull/3620 do you mean inadvertently? You can merge this pull request into a Git repository by running: $ git pull https://github.com/CrazyJvm/spark streaming-foreachRDD Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3620.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3620 commit b72886b6570be62ca4bcf1964c489a5f51d41394 Author: CrazyJvm crazy...@gmail.com Date: 2014-12-05T13:39:13Z do you mean inadvertently?
[GitHub] spark pull request: Streaming doc : do you mean inadvertently?
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3620#issuecomment-65791786 [Test build #24182 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24182/consoleFull) for PR 3620 at commit [`b72886b`](https://github.com/apache/spark/commit/b72886b6570be62ca4bcf1964c489a5f51d41394). * This patch merges cleanly.
[GitHub] spark pull request: Streaming doc : do you mean inadvertently?
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/3620#issuecomment-65791847 Correct, but this is really trivial.
[GitHub] spark pull request: [SPARK-4734][Streaming]limit the file Dstream ...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/3597#issuecomment-65792502 See my comments in https://issues.apache.org/jira/browse/SPARK-4734 as to why I don't think this is a good idea. In particular, this solution clearly has the potential to lose data and not process files, as well as changing the Spark Streaming semantics.
[GitHub] spark pull request: SPARK-3926 [CORE] Reopened: result of JavaRDD ...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/3587#discussion_r21372796 --- Diff: core/src/main/scala/org/apache/spark/api/java/JavaUtils.scala --- @@ -32,7 +33,65 @@ private[spark] object JavaUtils { def mapAsSerializableJavaMap[A, B](underlying: collection.Map[A, B]) = new SerializableMapWrapper(underlying) + // Implementation is copied from scala.collection.convert.Wrappers.MapWrapper, --- End diff -- Good question. It appears to be licensed just like the rest of the Scala code (http://www.scala-lang.org/license.html). Spark already integrates some Scala code and has the proper entries in `LICENSE` as a result. I can modify the text to clearly call out that part of `MapWrapper` was copied, for good measure.
[GitHub] spark pull request: [SPARK-3405] add subnet-id and vpc-id options ...
Github user brdw commented on the pull request: https://github.com/apache/spark/pull/2872#issuecomment-65801064 I'd love to see this as well. We have a strict VPC policy.
[GitHub] spark pull request: Streaming doc : do you mean inadvertently?
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3620#issuecomment-65802460 [Test build #24182 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24182/consoleFull) for PR 3620 at commit [`b72886b`](https://github.com/apache/spark/commit/b72886b6570be62ca4bcf1964c489a5f51d41394). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: Streaming doc : do you mean inadvertently?
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3620#issuecomment-65802466 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24182/ Test PASSed.
[GitHub] spark pull request: [SPARK-4761][SQL] Enables Kryo by default in S...
GitHub user liancheng opened a pull request: https://github.com/apache/spark/pull/3621 [SPARK-4761][SQL] Enables Kryo by default in Spark SQL Thrift server Enables Kryo and disables reference tracking by default in Spark SQL Thrift server. Configurations explicitly defined by users in `spark-defaults.conf` are respected. You can merge this pull request into a Git repository by running: $ git pull https://github.com/liancheng/spark kryo-by-default Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3621.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3621 commit 70c27756159d43bc5907965c3b7cfeff8e703c79 Author: Cheng Lian l...@databricks.com Date: 2014-12-05T15:02:54Z Enables Kryo by default in Spark SQL Thrift server
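The precedence described in the PR text (defaults applied only when the user has not set the key explicitly in `spark-defaults.conf`) can be sketched as follows. This is an illustrative Python sketch, not Spark's actual implementation; the two configuration key names are real Spark keys, but the helper function is hypothetical:

```python
# Apply Kryo-related defaults only for keys the user has not set.
# Real Spark config keys; hypothetical helper for illustration.
DEFAULTS = {
    "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
    "spark.kryo.referenceTracking": "false",
}

def with_kryo_defaults(user_conf):
    """Return a config where defaults fill in only the missing keys."""
    conf = dict(DEFAULTS)
    conf.update(user_conf)  # user-defined settings always win
    return conf

# A user override in spark-defaults.conf is respected:
conf = with_kryo_defaults({"spark.kryo.referenceTracking": "true"})
```

The key design point under review is that `update` runs last, so an explicit user setting is never clobbered by the new defaults.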
[GitHub] spark pull request: [SPARK-4763] All-pairs shortest paths algorith...
Github user witgo commented on a diff in the pull request: https://github.com/apache/spark/pull/3619#discussion_r21377222 --- Diff: graphx/src/main/scala/org/apache/spark/graphx/Pregel.scala --- @@ -139,6 +146,14 @@ object Pregel extends Logging { // get to send messages. We must cache messages so it can be materialized on the next line, // allowing us to uncache the previous iteration. messages = g.mapReduceTriplets(sendMsg, mergeMsg, Some((newVerts, activeDirection))).cache() + + if (checkpoint && i % checkpointFrequency == checkpointFrequency - 1) { +logInfo("Checkpointing in iteration " + i) +g.vertices.checkpoint() --- End diff -- Will this checkpoint method actually work? See: https://issues.apache.org/jira/browse/SPARK-3625
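The periodic-checkpoint condition in the diff above can be sketched in Python for illustration (the function name is hypothetical): with frequency `k`, a checkpoint fires on 0-indexed iterations `k-1`, `2k-1`, `3k-1`, ..., i.e. every k-th iteration.

```python
# Sketch of the condition `checkpoint && i % frequency == frequency - 1`
# from the Pregel diff; returns the iterations on which a checkpoint
# would be taken.
def checkpoint_iterations(num_iterations, frequency, enabled=True):
    return [i for i in range(num_iterations)
            if enabled and i % frequency == frequency - 1]
```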
[GitHub] spark pull request: [SPARK-4761][SQL] Enables Kryo by default in S...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3621#issuecomment-65804482 [Test build #24183 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24183/consoleFull) for PR 3621 at commit [`70c2775`](https://github.com/apache/spark/commit/70c27756159d43bc5907965c3b7cfeff8e703c79). * This patch merges cleanly.
[GitHub] spark pull request: SPARK-3926 [CORE] Reopened: result of JavaRDD ...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/3587#discussion_r21377699 --- Diff: core/src/main/scala/org/apache/spark/api/java/JavaUtils.scala --- @@ -32,7 +33,65 @@ private[spark] object JavaUtils { def mapAsSerializableJavaMap[A, B](underlying: collection.Map[A, B]) = new SerializableMapWrapper(underlying) + // Implementation is copied from scala.collection.convert.Wrappers.MapWrapper, + // but implements java.io.Serializable and adds a no-arg constructor class SerializableMapWrapper[A, B](underlying: collection.Map[A, B]) -extends MapWrapper(underlying) with java.io.Serializable +extends ju.AbstractMap[A, B] with java.io.Serializable { self => +// Add no-arg constructor just for serialization +def this() = this(null) --- End diff -- Hm, so it does. Maybe I misunderstood the original error. It's complaining about the _superclass_ (`MapWrapper`) not having a no-arg constructor? So copying the class works, since we no longer subclass `MapWrapper`, but the copy in `SerializableMapWrapper` need not define a no-arg constructor. OK, that line can be removed.
[GitHub] spark pull request: SPARK-3926 [CORE] Reopened: result of JavaRDD ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3587#issuecomment-65805992 [Test build #24184 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24184/consoleFull) for PR 3587 at commit [`8586bb9`](https://github.com/apache/spark/commit/8586bb9c72047e378366fa429d4e1e75a44c0d63). * This patch merges cleanly.
[GitHub] spark pull request: SPARK-3926 [CORE] Reopened: result of JavaRDD ...
Github user zsxwing commented on the pull request: https://github.com/apache/spark/pull/3587#issuecomment-65806925 LGTM
[GitHub] spark pull request: [SPARK-4461][YARN] pass extra java options to ...
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/3409#issuecomment-65812731 I'm personally not a fan of executorLauncher. Cluster mode also launches executors, and users shouldn't really have to know that executorLauncher = client mode. If you want to change the name, then we should be explicit that this is for YARN client mode only (spark.yarn.clientmode.am.extraJavaOptions or similar). Otherwise I don't see how you get around some sort of confusion. You can make it apply to both, but then you have to define precedence between it and driver.extraJavaOptions, and if users have both set and don't realize the precedence, they get unexpected behavior.
[GitHub] spark pull request: [SPARK-4761][SQL] Enables Kryo by default in S...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3621#issuecomment-65814668 [Test build #24183 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24183/consoleFull) for PR 3621 at commit [`70c2775`](https://github.com/apache/spark/commit/70c27756159d43bc5907965c3b7cfeff8e703c79). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4761][SQL] Enables Kryo by default in S...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3621#issuecomment-65814677 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24183/ Test PASSed.
[GitHub] spark pull request: [SPARK-3778] newAPIHadoopRDD doesn't properly ...
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/2676#discussion_r21382056 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -641,6 +641,7 @@ class SparkContext(config: SparkConf) extends Logging { kClass: Class[K], vClass: Class[V], conf: Configuration = hadoopConfiguration): RDD[(K, V)] = { +// mapreduce.Job (NewHadoopJob) merges any credentials for you. --- End diff -- sure, I can change the comment
[GitHub] spark pull request: [SPARK-3405] add subnet-id and vpc-id options ...
Github user changetip commented on the pull request: https://github.com/apache/spark/pull/2872#issuecomment-65814859 Hi mvj101, dreid93 sent you a Bitcoin tip worth 1 lunch (21,255 bits/$8.00), and I'm here to deliver it ➔ **[collect your tip at ChangeTip.com](https://www.changetip.com/collect/213620)**. **[Learn more about ChangeTip](https://www.changetip.com/tip-online/github)**
[GitHub] spark pull request: [SPARK-3778] newAPIHadoopRDD doesn't properly ...
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/2676#issuecomment-65814785 I was waiting for clarification from @pwendell on my question about his comment.
[GitHub] spark pull request: [SPARK-3405] add subnet-id and vpc-id options ...
Github user dreid93 commented on the pull request: https://github.com/apache/spark/pull/2872#issuecomment-65814791 I'll buy anyone willing to take care of this merge lunch via @ChangeTip :)
[GitHub] spark pull request: SPARK-3926 [CORE] Reopened: result of JavaRDD ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3587#issuecomment-65819459 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24184/ Test PASSed.
[GitHub] spark pull request: SPARK-3926 [CORE] Reopened: result of JavaRDD ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3587#issuecomment-65819448 [Test build #24184 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24184/consoleFull) for PR 3587 at commit [`8586bb9`](https://github.com/apache/spark/commit/8586bb9c72047e378366fa429d4e1e75a44c0d63). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3967] don’t redundantly overwrite exe...
Github user ryan-williams commented on a diff in the pull request: https://github.com/apache/spark/pull/2848#discussion_r21386174 --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala --- @@ -412,6 +403,48 @@ private[spark] object Utils extends Logging { } /** + * Download a file from @in to @tempFile, then move it to @destFile, checking whether @destFile --- End diff -- sure, will do.
[GitHub] spark pull request: [SPARK-3967] don’t redundantly overwrite exe...
Github user ryan-williams commented on a diff in the pull request: https://github.com/apache/spark/pull/2848#discussion_r21386186 --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala --- @@ -412,6 +403,48 @@ private[spark] object Utils extends Logging { } /** + * Download a file from @in to @tempFile, then move it to @destFile, checking whether @destFile + * already exists, is equal to the downloaded file, and can be overwritten. --- End diff -- ditto
[GitHub] spark pull request: [SPARK-3967] don’t redundantly overwrite exe...
Github user ryan-williams commented on a diff in the pull request: https://github.com/apache/spark/pull/2848#discussion_r21386332 --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala --- @@ -412,6 +403,48 @@ private[spark] object Utils extends Logging { } /** + * Download a file from @in to @tempFile, then move it to @destFile, checking whether @destFile + * already exists, is equal to the downloaded file, and can be overwritten. + */ + private def maybeMoveFile( +url: String, --- End diff -- Yes. The idea was to abstract out this duplicated code from ~4 other places in this file, all of which have a log message that includes the url, without changing any functionality.
[GitHub] spark pull request: [SPARK-3967] don’t redundantly overwrite exe...
Github user ryan-williams commented on a diff in the pull request: https://github.com/apache/spark/pull/2848#discussion_r21386398 --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala --- @@ -412,6 +403,48 @@ private[spark] object Utils extends Logging { } /** + * Download a file from @in to @tempFile, then move it to @destFile, checking whether @destFile + * already exists, is equal to the downloaded file, and can be overwritten. + */ + private def maybeMoveFile( +url: String, +sourceFile: File, +destFile: File, +fileOverwrite: Boolean): Unit = { + +var shouldCopy = true --- End diff -- geez, seems like I had a bad merge at some point and dropped an important code path in this function; e.g. the `Files.move` call is gone. will audit and fix
[GitHub] spark pull request: [SPARK-3967] don’t redundantly overwrite exe...
Github user ryan-williams commented on a diff in the pull request: https://github.com/apache/spark/pull/2848#discussion_r21386447 --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala --- @@ -412,6 +403,48 @@ private[spark] object Utils extends Logging { } /** + * Download a file from @in to @tempFile, then move it to @destFile, checking whether @destFile + * already exists, is equal to the downloaded file, and can be overwritten. + */ + private def maybeMoveFile( +url: String, +sourceFile: File, +destFile: File, +fileOverwrite: Boolean): Unit = { + +var shouldCopy = true +if (destFile.exists) { --- End diff -- yea, merge bug I think; looking
[GitHub] spark pull request: [SPARK-3967] don’t redundantly overwrite exe...
Github user ryan-williams commented on a diff in the pull request: https://github.com/apache/spark/pull/2848#discussion_r21386501 --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala --- @@ -412,6 +403,48 @@ private[spark] object Utils extends Logging { } /** + * Download a file from @in to @tempFile, then move it to @destFile, checking whether @destFile + * already exists, is equal to the downloaded file, and can be overwritten. + */ + private def maybeMoveFile( +url: String, +sourceFile: File, +destFile: File, +fileOverwrite: Boolean): Unit = { + +var shouldCopy = true +if (destFile.exists) { + if (!Files.equal(sourceFile, destFile)) { +if (fileOverwrite) { + destFile.delete() + logInfo(("File %s exists and does not match contents of %s, " + --- End diff -- sure, done
[GitHub] spark pull request: [SPARK-3967] don’t redundantly overwrite exe...
Github user ryan-williams commented on a diff in the pull request: https://github.com/apache/spark/pull/2848#discussion_r21386652 --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala --- @@ -412,6 +403,48 @@ private[spark] object Utils extends Logging { } /** + * Download a file from @in to @tempFile, then move it to @destFile, checking whether @destFile + * already exists, is equal to the downloaded file, and can be overwritten. + */ + private def maybeMoveFile( +url: String, +sourceFile: File, +destFile: File, +fileOverwrite: Boolean): Unit = { + +var shouldCopy = true +if (destFile.exists) { + if (!Files.equal(sourceFile, destFile)) { +if (fileOverwrite) { + destFile.delete() + logInfo(("File %s exists and does not match contents of %s, " + +"replacing it with %s").format(destFile, url, url)) --- End diff -- one would think so; this is how it was at [L459](https://github.com/apache/spark/pull/2848/files#diff-d239aee594001f8391676e1047a0381eL459), [L477](https://github.com/apache/spark/pull/2848/files#diff-d239aee594001f8391676e1047a0381eL477), and [L506](https://github.com/apache/spark/pull/2848/files#diff-d239aee594001f8391676e1047a0381eL506). Not obvious to me what the original intent was, but I'll try to figure it out and do something more sensible.
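The control flow being reviewed in `maybeMoveFile` can be sketched in Python for illustration (names mirror the Scala diff; this is not Spark's code): if the destination exists and differs, overwrite only when allowed; if it exists and matches, do nothing; otherwise move the downloaded file into place.

```python
# Illustrative sketch of the download/compare/overwrite flow.
import filecmp
import os
import shutil

def maybe_move_file(url, source_file, dest_file, file_overwrite):
    if os.path.exists(dest_file):
        if not filecmp.cmp(source_file, dest_file, shallow=False):
            if file_overwrite:
                # Existing file differs and we may replace it.
                os.remove(dest_file)
            else:
                raise RuntimeError("File %s exists and does not match "
                                   "contents of %s" % (dest_file, url))
        else:
            # Same contents: this file was copied previously, nothing to do.
            return
    # Destination is now absent: move the downloaded file into place.
    shutil.move(source_file, dest_file)
```

The point the reviewers caught in the PR is that the final move step had been dropped in a bad merge; the sketch keeps it explicit at the end.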
[GitHub] spark pull request: [SPARK-4461][YARN] pass extra java options to ...
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/3409#issuecomment-65826773 @tgravescs that makes sense. clientmode.am sounds good to me.
[GitHub] spark pull request: [SPARK-3967] don’t redundantly overwrite exe...
Github user ryan-williams commented on a diff in the pull request: https://github.com/apache/spark/pull/2848#discussion_r21386945 --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala --- @@ -412,6 +403,48 @@ private[spark] object Utils extends Logging { } /** + * Download a file from @in to @tempFile, then move it to @destFile, checking whether @destFile + * already exists, is equal to the downloaded file, and can be overwritten. + */ + private def maybeMoveFile( +url: String, +sourceFile: File, +destFile: File, +fileOverwrite: Boolean): Unit = { + +var shouldCopy = true +if (destFile.exists) { + if (!Files.equal(sourceFile, destFile)) { +if (fileOverwrite) { + destFile.delete() + logInfo(("File %s exists and does not match contents of %s, " + +"replacing it with %s").format(destFile, url, url)) +} else { + throw new SparkException( +"File " + destFile + " exists and does not match contents of " + url) +} + } else { +// Do nothing if the file contents are the same, i.e. this file has been copied +// previously. +logInfo(sourceFile.getAbsolutePath + " has been previously copied to " --- End diff -- done, though it went to 101 chars and I had to wrap it, which is not the clearest, but whatevs.
[GitHub] spark pull request: [SPARK-3967] don’t redundantly overwrite exe...
Github user ryan-williams commented on the pull request: https://github.com/apache/spark/pull/2848#issuecomment-65827581 Thanks for the review pass, @JoshRosen. As I mentioned in some of the comments, this was attempting to shoehorn 4 basically-identical blocks of code from elsewhere in this file into one common code path (it started with 1 or 2 and then grew to encompass the others as previous reviewers suggested that that was desirable). The PR has been mostly sitting around for many weeks and apparently one of the maintenance merges I did removed some key bits; I'll fix those and bump this again.
[GitHub] spark pull request: add foldLeftByKey to PairRDDFunctions for redu...
Github user koertkuipers commented on a diff in the pull request: https://github.com/apache/spark/pull/2963#discussion_r21387829 --- Diff: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala --- @@ -460,6 +461,63 @@ class PairRDDFunctions[K, V](self: RDD[(K, V)]) } /** + * Group the values for each key in the RDD and apply a binary operator to a start value and all + * ordered values for a key, going left to right. + * + * Note: this operation may be expensive, since there is no map-side combine, so all values are + * sent through the shuffle. + */ + def foldLeftByKey[U: ClassTag](valueOrdering: Ordering[V], zeroValue: U, --- End diff -- Does this make it harder for the user to provide an ordering other than the natural ordering?
[GitHub] spark pull request: add foldLeftByKey to PairRDDFunctions for redu...
Github user koertkuipers commented on the pull request: https://github.com/apache/spark/pull/2963#issuecomment-65828969 Hey @zsxwing, In a Scala Seq the order in which the values get processed in foldLeft is well defined. But can we make any assumptions at all about the ordering of the values in Spark if you do not sort them? And if not, is foldLeft without sorting still useful? If so, I guess we can make the sorting optional. Or rename this function to make it clear it sorts.
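The determinism question above can be illustrated with a plain-Python model (no Spark; `fold_left_by_key` is a hypothetical name for sketching the proposed semantics): a left fold over unsorted shuffle output can depend on arrival order, while sorting the values first makes the result order-independent.

```python
from collections import defaultdict

def fold_left_by_key(pairs, zero, op, sort_values=True):
    """Plain-Python model of the proposed foldLeftByKey semantics.

    With sort_values=True the result is deterministic regardless of the
    order in which (key, value) pairs arrive -- the point of requiring an
    Ordering. With sort_values=False, a non-commutative op can give a
    different answer for each arrival order.
    """
    grouped = defaultdict(list)
    for k, v in pairs:
        grouped[k].append(v)
    out = {}
    for k, values in grouped.items():
        if sort_values:
            values = sorted(values)
        acc = zero
        for v in values:
            acc = op(acc, v)  # left fold over the (possibly sorted) values
        out[k] = acc
    return out

concat = lambda acc, v: acc + str(v)   # non-commutative: order matters
a = fold_left_by_key([("a", 3), ("a", 1), ("a", 2)], "", concat)
b = fold_left_by_key([("a", 2), ("a", 3), ("a", 1)], "", concat)
# a == b == {"a": "123"}: sorting makes the left fold order-independent
```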
[GitHub] spark pull request: [SPARK-3967] don’t redundantly overwrite exe...
Github user ryan-williams commented on a diff in the pull request: https://github.com/apache/spark/pull/2848#discussion_r21389032 --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala --- @@ -412,6 +403,48 @@ private[spark] object Utils extends Logging { } /** + * Download a file from @in to @tempFile, then move it to @destFile, checking whether @destFile + * already exists, is equal to the downloaded file, and can be overwritten. + */ + private def maybeMoveFile( + url: String, + sourceFile: File, + destFile: File, + fileOverwrite: Boolean): Unit = { + + var shouldCopy = true + if (destFile.exists) { + if (!Files.equal(sourceFile, destFile)) { + if (fileOverwrite) { + destFile.delete() + logInfo(("File %s exists and does not match contents of %s, " + + "replacing it with %s").format(destFile, url, url)) --- End diff -- on reading it again, I think the original code (which predates me) made sense: it asserts that the file exists and does not match `url`, so we are replacing the thing that already exists with what is at `url`, because that is what we want it to match.
[GitHub] spark pull request: [SPARK-4761][SQL] Enables Kryo by default in S...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/3621#issuecomment-65832481 Awesome, thanks Cheng. This is great. I forgot we can still modify the SparkConf before we pass it to the SparkContext constructor.
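The point about modifying the conf before construction can be sketched as a config fragment (PySpark flavor; the app name is a placeholder): settings applied to a `SparkConf` take effect only if they are set before the `SparkContext` is constructed.

```python
from pyspark import SparkConf, SparkContext

# Config sketch: the conf is read when the SparkContext is built, so
# serializer settings must be applied before the constructor call.
conf = SparkConf().setAppName("kryo-demo")
conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")

sc = SparkContext(conf=conf)  # conf is snapshotted here; later conf.set() calls are not picked up
```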
[GitHub] spark pull request: [SPARK-4761][SQL] Enables Kryo by default in S...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/3621
[GitHub] spark pull request: [SPARK-4421] Wrong link in spark-standalone.ht...
Github user tsudukim closed the pull request at: https://github.com/apache/spark/pull/3280
[GitHub] spark pull request: [SPARK-4421] Wrong link in spark-standalone.ht...
Github user tsudukim commented on the pull request: https://github.com/apache/spark/pull/3280#issuecomment-65834528 Thank you! @JoshRosen
[GitHub] spark pull request: [SPARK-4756][SQL] FIX: sessionToActivePool gro...
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/3617#issuecomment-65834619 ok to test
[GitHub] spark pull request: [SPARK-3967] don’t redundantly overwrite exe...
Github user ryan-williams commented on the pull request: https://github.com/apache/spark/pull/2848#issuecomment-65834877 OK @JoshRosen, I fixed and cleaned things up. * the two overloaded `maybeMoveFile` signatures are more distinctly named (`downloadStreamAndMove` and `copyFile`) * each is called twice, and the former actually uses the latter, so I think there are some overall code-reuse wins, in addition to the "don't move/copy if the thing is already there" bug fix happening in one place. I'm open to writing some tests, but it's not obvious how to do so in a way that is meaningful and doesn't just duplicate the logic in the file itself. The fetch-file code path is not tested in the existing `UtilsSuite` afaict, and it doesn't feel worth adding an integration test around, imho. lmk what else you'd like. Thanks.
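The decision being consolidated here can be sketched in plain Python (function and variable names are mine, not the PR's): replace the destination only when it exists, differs from the source, and overwriting is allowed; skip the move entirely when an identical file is already in place.

```python
import filecmp
import os
import shutil
import tempfile

def maybe_move_file(source, dest, overwrite):
    """Sketch of the consolidated fetch-file decision: move `source` to
    `dest` unless an identical file is already there; replace a differing
    file only when `overwrite` is set."""
    if os.path.exists(dest):
        if filecmp.cmp(source, dest, shallow=False):
            return "kept"  # identical file already present: skip the move
        if not overwrite:
            raise IOError("%s exists and does not match; not overwriting" % dest)
        os.remove(dest)
    shutil.move(source, dest)
    return "moved"

d = tempfile.mkdtemp()
src, dst = os.path.join(d, "src"), os.path.join(d, "dst")
with open(src, "w") as f:
    f.write("payload")
with open(dst, "w") as f:
    f.write("payload")
result = maybe_move_file(src, dst, overwrite=False)
# result == "kept": an identical file was already there, nothing rewritten
```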
[GitHub] spark pull request: [SPARK-4756][SQL] FIX: sessionToActivePool gro...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3617#issuecomment-65835734 [Test build #24185 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24185/consoleFull) for PR 3617 at commit [`e070998`](https://github.com/apache/spark/commit/e070998b5d048334f9e059b485ec87c7d86fbac4). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4668] Fix some documentation typos.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3523#issuecomment-65836128 [Test build #24186 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24186/consoleFull) for PR 3523 at commit [`a4409be`](https://github.com/apache/spark/commit/a4409bed00c86cd33df298a6d116ed980c6b78a1). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4756][SQL] FIX: sessionToActivePool gro...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3617#issuecomment-65836379 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24185/ Test FAILed.
[GitHub] spark pull request: [SPARK-4756][SQL] FIX: sessionToActivePool gro...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3617#issuecomment-65836374 [Test build #24185 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24185/consoleFull) for PR 3617 at commit [`e070998`](https://github.com/apache/spark/commit/e070998b5d048334f9e059b485ec87c7d86fbac4). * This patch **fails to build**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4668] Fix some documentation typos.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3523#issuecomment-65837247 [Test build #24187 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24187/consoleFull) for PR 3523 at commit [`05a3113`](https://github.com/apache/spark/commit/05a31132bc15d98b9a831917007b34afd87e2c55). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-2188] Support sbt/sbt for Windows
Github user tsudukim commented on the pull request: https://github.com/apache/spark/pull/3591#issuecomment-65837669 I'm not sure which is better, but I lean toward not submitting this upstream. It would be a good idea if this were derived from the latest sbt script, but unfortunately it is derived from our sbt script, which is an old version. If we were to submit this upstream, we would have to update it to match the latest sbt, which means the updated Windows script would diverge from our current Linux sbt script and become harder to maintain.
[GitHub] spark pull request: [SPARK-4765] Make GC time always shown in UI.
GitHub user kayousterhout opened a pull request: https://github.com/apache/spark/pull/3622 [SPARK-4765] Make GC time always shown in UI. This commit removes the GC time for each task from the set of optional, additional metrics, and instead always shows it for each task. cc @pwendell You can merge this pull request into a Git repository by running: $ git pull https://github.com/kayousterhout/spark-1 gc_time Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3622.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3622 commit 4d1cf4b31689d7f60c9104516a36cea6ca4e69d3 Author: Kay Ousterhout kayousterh...@gmail.com Date: 2014-12-05T19:08:13Z [SPARK-4765] Make GC time always shown in UI. This commit removes the GC time for each task from the set of optional, additional metrics, and instead always shows it for each task.
[GitHub] spark pull request: [SPARK-4765] Make GC time always shown in UI.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3622#issuecomment-65840120 [Test build #24188 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24188/consoleFull) for PR 3622 at commit [`4d1cf4b`](https://github.com/apache/spark/commit/4d1cf4b31689d7f60c9104516a36cea6ca4e69d3). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4763] All-pairs shortest paths algorith...
Github user ankurdave commented on a diff in the pull request: https://github.com/apache/spark/pull/3619#discussion_r21393556 --- Diff: graphx/src/main/scala/org/apache/spark/graphx/Pregel.scala --- @@ -139,6 +146,14 @@ object Pregel extends Logging { // get to send messages. We must cache messages so it can be materialized on the next line, // allowing us to uncache the previous iteration. messages = g.mapReduceTriplets(sendMsg, mergeMsg, Some((newVerts, activeDirection))).cache() + + if (checkpoint && i % checkpointFrequency == checkpointFrequency - 1) { + logInfo("Checkpointing in iteration " + i) + g.vertices.checkpoint() --- End diff -- We merged a simpler fix for graph checkpointing in https://issues.apache.org/jira/browse/SPARK-4672. I think this fixes SPARK-3623 -- are the extra changes to Spark core in your PR necessary?
[GitHub] spark pull request: [SPARK-3625][SPARK-3623][GraphX] GraphX should...
Github user ankurdave commented on the pull request: https://github.com/apache/spark/pull/2631#issuecomment-65841397 Due to https://issues.apache.org/jira/browse/SPARK-4672, we now support checkpointing graphs (by checkpointing their constituent vertices and edges) with the same semantics as RDDs, meaning that you have to checkpoint an RDD before calling an action on it. This PR additionally adds support for checkpointing after calling actions, but that is not strictly necessary and involves a behavior change to Spark core. So I think we should close this PR.
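The every-N-iterations trigger in the diff above can be modeled in plain Python (no Spark; the function name is mine): with a frequency of 3, the guard fires on the last iteration of each window, i.e. iterations 2, 5, 8, and so on.

```python
def should_checkpoint(enabled, i, frequency):
    # Mirrors the guard in the quoted diff: checkpoint on the last
    # iteration of every window of `frequency` iterations, and only
    # when checkpointing is enabled at all.
    return enabled and i % frequency == frequency - 1

hits = [i for i in range(10) if should_checkpoint(True, i, 3)]
# hits == [2, 5, 8]: one checkpoint per window of three iterations
```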
[GitHub] spark pull request: Clear local copies of accumulators as soon as ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3570#issuecomment-65844299 [Test build #24189 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24189/consoleFull) for PR 3570 at commit [`537baad`](https://github.com/apache/spark/commit/537baad0379644537f21385f0cc1150b4af0b237). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4005][CORE] handle message replies in r...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/2853#issuecomment-65845790 LGTM. Since this is code-cleanup and not a bugfix, I'm only going to merge this into `master` (1.3.0). Thanks!
[GitHub] spark pull request: [SPARK-4005][CORE] handle message replies in r...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/2853
[GitHub] spark pull request: Add a Note on jsonFile having separate JSON ob...
Github user petervandenabeele commented on the pull request: https://github.com/apache/spark/pull/3517#issuecomment-65846515 More problematic (and sorry I had not seen that before) ... there already _is_ an example file named `people.txt` with a different format: ``` $ spark git:(pv-docs-note-on-jsonFile-format/01) cat examples/src/main/resources/people.txt Michael, 29 Andy, 30 Justin, 19 ``` In that case, I could rename the example jsonFile to `people.jsons`. It is a weird name, but it's _reasonably_ accurate (following the `xs` pattern from Scala, as it is like a list of JSON objects). I would then indeed also need to change the name in all other locations where a reference to `people.json` is made (confirming the list mentioned by @marmbrus): ``` spark git:(pv-docs-note-on-jsonFile-format/01) grep -r 'people\.json' * | grep -v Binary | grep -v _site examples/src/main/java/org/apache/spark/examples/sql/JavaSparkSQL.java: String path = "examples/src/main/resources/people.json"; examples/src/main/python/sql.py: path = os.path.join(os.environ['SPARK_HOME'], "examples/src/main/resources/people.json") ``` On a more fundamental note, from the outside, following the principle of least astonishment (POLA) I would have expected this function to require a standard, valid JSON file formatted as an array of hashes with an identical schema, e.g. ``` [ {"name": "Tom", "character": "cat"}, {"name": "Jerry", "character": "mouse"} ] ``` This would have allowed us to simply import data generated from any other language with `array.to_json`. I hear the proposal from @marmbrus to also improve the error message (that would also have helped us understand the issue more quickly), but I would suggest putting that in a separate JIRA issue (it needs some real programming and testing work). I look forward to directions on how to best fix at least the documentation to avoid this confusion for others. Thanks.
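The formatting pitfall discussed above can be shown in plain Python (no Spark): `jsonFile` expects one self-contained JSON object per line, so parsing line by line works, while a standard pretty-printed JSON array is a single document parsed as a whole. The Tom/Jerry records below reuse the example from the comment.

```python
import json

# One JSON object per line -- the layout jsonFile expects.
json_lines = (
    '{"name": "Tom", "character": "cat"}\n'
    '{"name": "Jerry", "character": "mouse"}\n'
)
records = [json.loads(line) for line in json_lines.splitlines()]

# A standard JSON array (the POLA expectation from the comment) is
# instead one document parsed in a single call.
json_array = (
    '[{"name": "Tom", "character": "cat"},'
    ' {"name": "Jerry", "character": "mouse"}]'
)
as_array = json.loads(json_array)

# Both layouts carry the same records; only the on-disk framing differs.
```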
[GitHub] spark pull request: SPARK-4767: Add support for launching in a spe...
GitHub user holdenk opened a pull request: https://github.com/apache/spark/pull/3623 SPARK-4767: Add support for launching in a specified placement group to spark_ec2 Placement groups are cool and all the cool kids are using them. Lets add support for them to spark_ec2.py because I'm lazy You can merge this pull request into a Git repository by running: $ git pull https://github.com/holdenk/spark SPARK-4767-add-support-for-launching-in-a-specified-placement-group-to-spark-ec2-scripts Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3623.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3623 commit 70ace25cf260d1b968f631a2adc0cfb8aeeffe08 Author: Holden Karau hol...@pigscanfly.ca Date: 2014-12-05T19:58:35Z Placement groups are cool and all the cool kids are using them. Lets add support for them to spark_ec2.py because I'm lazy
[GitHub] spark pull request: [SPARK-4668] Fix some documentation typos.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3523#issuecomment-65847289 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24186/ Test PASSed.
[GitHub] spark pull request: [SPARK-4668] Fix some documentation typos.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3523#issuecomment-65847280 [Test build #24186 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24186/consoleFull) for PR 3523 at commit [`a4409be`](https://github.com/apache/spark/commit/a4409bed00c86cd33df298a6d116ed980c6b78a1). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: SPARK-4767: Add support for launching in a spe...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3623#issuecomment-65847864 [Test build #24190 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24190/consoleFull) for PR 3623 at commit [`70ace25`](https://github.com/apache/spark/commit/70ace25cf260d1b968f631a2adc0cfb8aeeffe08). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4668] Fix some documentation typos.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3523#issuecomment-65848971 [Test build #24187 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24187/consoleFull) for PR 3523 at commit [`05a3113`](https://github.com/apache/spark/commit/05a31132bc15d98b9a831917007b34afd87e2c55). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4668] Fix some documentation typos.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3523#issuecomment-65848977 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24187/ Test PASSed.
[GitHub] spark pull request: [SPARK-4765] Make GC time always shown in UI.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3622#issuecomment-65851401 [Test build #24188 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24188/consoleFull) for PR 3622 at commit [`4d1cf4b`](https://github.com/apache/spark/commit/4d1cf4b31689d7f60c9104516a36cea6ca4e69d3). * This patch **fails MiMa tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4765] Make GC time always shown in UI.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3622#issuecomment-65851408 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24188/ Test FAILed.
[GitHub] spark pull request: [SPARK-4765] Make GC time always shown in UI.
Github user kayousterhout commented on the pull request: https://github.com/apache/spark/pull/3622#issuecomment-65852060 MiMa tests pass locally; I rebased this on master to see if that makes the tests pass.
[GitHub] spark pull request: [SPARK-4765] Make GC time always shown in UI.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3622#issuecomment-65852692 [Test build #24191 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24191/consoleFull) for PR 3622 at commit [`e71d893`](https://github.com/apache/spark/commit/e71d89356e66f8532498652c80da9f1a43b98f39). * This patch merges cleanly.
[GitHub] spark pull request: SPARK-4770. [DOC] [YARN] spark.scheduler.minRe...
GitHub user sryza opened a pull request: https://github.com/apache/spark/pull/3624 SPARK-4770. [DOC] [YARN] spark.scheduler.minRegisteredResourcesRatio documented default is incorrect for YARN You can merge this pull request into a Git repository by running: $ git pull https://github.com/sryza/spark sandy-spark-4770 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3624.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3624 commit bd81a3a9081b61e41ebdaba2c657ff66f80d946f Author: Sandy Ryza sa...@cloudera.com Date: 2014-12-05T20:54:37Z SPARK-4770. [DOC] [YARN] spark.scheduler.minRegisteredResourcesRatio documented default is incorrect for YARN
[GitHub] spark pull request: SPARK-4770. [DOC] [YARN] spark.scheduler.minRe...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3624#issuecomment-65854478

[Test build #24192 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24192/consoleFull) for PR 3624 at commit [`bd81a3a`](https://github.com/apache/spark/commit/bd81a3a9081b61e41ebdaba2c657ff66f80d946f).

* This patch merges cleanly.
[GitHub] spark pull request: Clear local copies of accumulators as soon as ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3570#issuecomment-65855809

Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24189/
[GitHub] spark pull request: Clear local copies of accumulators as soon as ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3570#issuecomment-65855799

[Test build #24189 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24189/consoleFull) for PR 3570 at commit [`537baad`](https://github.com/apache/spark/commit/537baad0379644537f21385f0cc1150b4af0b237).

* This patch **fails MiMa tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4740] [WIP] Create multiple concurrent ...
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/3625

[SPARK-4740] [WIP] Create multiple concurrent connections between two peer nodes in Netty. Still needs test cases added.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/rxin/spark SPARK-4740

Alternatively, you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3625.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3625

commit 4f216736024f48ed9fa3fca6ea5953495def8e51
Author: Reynold Xin r...@databricks.com
Date: 2014-12-05T21:20:58Z

[SPARK-4740] Create multiple concurrent connections between two peer nodes in Netty.