[GitHub] spark issue #16330: [SPARK-18817][SPARKR][SQL] change derby log output to te...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16330 The code changes are now very specific to R. Let me know if you still need me. : ) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17110: [SPARK-19635][ML] DataFrame-based API for chi squ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/17110
[GitHub] spark issue #17110: [SPARK-19635][ML] DataFrame-based API for chi square tes...
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/17110 OK, merging with master. Thanks @imatiach-msft and @thunterdb! @imatiach-msft I agree about sparse testing. This has all of the MLlib tests, but we should add more in the future.
[GitHub] spark issue #17326: [SPARK-19985][ML] Fixed copy method for some ML Models
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/17326 ping @jkbradley @MLnick
[GitHub] spark issue #16483: [SPARK-18847][GraphX] PageRank gives incorrect results f...
Github user aray commented on the issue: https://github.com/apache/spark/pull/16483 @thunterdb The extra step -- as implemented -- is only at the end as that gives the same result as doing it after every iteration but without the extra overhead.
[GitHub] spark issue #17326: [SPARK-19985][ML] Fixed copy method for some ML Models
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17326 Merged build finished. Test PASSed.
[GitHub] spark issue #17326: [SPARK-19985][ML] Fixed copy method for some ML Models
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17326 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74695/
[GitHub] spark pull request #15363: [SPARK-17791][SQL] Join reordering using star sch...
Github user ioana-delaney commented on a diff in the pull request: https://github.com/apache/spark/pull/15363#discussion_r106558961

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CostBasedJoinReorder.scala ---
@@ -51,6 +51,11 @@ case class CostBasedJoinReorder(conf: CatalystConf) extends Rule[LogicalPlan] wi
   def reorder(plan: LogicalPlan, output: AttributeSet): LogicalPlan = {
     val (items, conditions) = extractInnerJoins(plan)
+    // Find the star schema joins. Currently, it returns the star join with the largest
+    // fact table. In the future, it can return more than one star join (e.g. F1-D1-D2
+    // and F2-D3-D4).
+    val starJoinPlans = StarSchemaDetection(conf).findStarJoins(items, conditions.toSeq)
--- End diff --

@wzhfy Done. Thank you.
[GitHub] spark issue #17326: [SPARK-19985][ML] Fixed copy method for some ML Models
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17326 **[Test build #74695 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74695/testReport)** for PR 17326 at commit [`89bac8a`](https://github.com/apache/spark/commit/89bac8a209ca05bfa58e952a01ef664fe8ae2f65).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #17307: [SPARK-13369] Make number of consecutive fetch failures ...
Github user kayousterhout commented on the issue: https://github.com/apache/spark/pull/17307 @squito FYI I filed a JIRA for the 2nd of the two unit tests that failed in that run (looks like you'd already filed a JIRA for the first one)
[GitHub] spark issue #16781: [SPARK-12297][SQL] Hive compatibility for Parquet Timest...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16781 Merged build finished. Test PASSed.
[GitHub] spark issue #16781: [SPARK-12297][SQL] Hive compatibility for Parquet Timest...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16781 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74688/
[GitHub] spark issue #16781: [SPARK-12297][SQL] Hive compatibility for Parquet Timest...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16781 **[Test build #74688 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74688/testReport)** for PR 16781 at commit [`38e19cd`](https://github.com/apache/spark/commit/38e19cdd497992fb063cb39d1d65bde1622553e4).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...
Github user ericl commented on the issue: https://github.com/apache/spark/pull/17166 Made the change to improve the default reason, which now says "killed via SparkContext.killTaskAttempt".
[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17166 **[Test build #74697 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74697/testReport)** for PR 17166 at commit [`8f7ffb3`](https://github.com/apache/spark/commit/8f7ffb395cae9ae7aa24a14dcdb908aaee30b710).
[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/17166#discussion_r106555744

--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
@@ -710,7 +710,11 @@ private[spark] class TaskSetManager(
   logInfo(s"Killing attempt ${attemptInfo.attemptNumber} for task ${attemptInfo.id} " +
     s"in stage ${taskSet.id} (TID ${attemptInfo.taskId}) on ${attemptInfo.host} " +
     s"as the attempt ${info.attemptNumber} succeeded on ${info.host}")
-  sched.backend.killTask(attemptInfo.taskId, attemptInfo.executorId, true)
+  sched.backend.killTask(
+    attemptInfo.taskId,
+    attemptInfo.executorId,
+    interruptThread = true,
+    reason = "another attempt succeeded")
--- End diff --

Ok, let's leave this as-is -- it seems too complicated to have a longer and a shorter reason (and unlike the reason above, this one is per-task, so hard to summarize on the stage page).
[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...
Github user ericl commented on a diff in the pull request: https://github.com/apache/spark/pull/17166#discussion_r106555729

--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -2250,6 +2250,22 @@ class SparkContext(config: SparkConf) extends Logging {
   }

+  /**
+   * Kill and reschedule the given task attempt. Task ids can be obtained from the Spark UI
+   * or through SparkListener.onTaskStart.
+   *
+   * @param taskId the task ID to kill. This id uniquely identifies the task attempt.
+   * @param interruptThread whether to interrupt the thread running the task.
+   * @param reason the reason for killing the task, which should be a short string. If a task
+   *   is killed multiple times with different reasons, only one reason will be reported.
+   */
+  def killTaskAttempt(
+      taskId: Long,
+      interruptThread: Boolean = true,
+      reason: String = "cancelled"): Unit = {
--- End diff --

Done
[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/17166#discussion_r106555639

--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -2250,6 +2250,22 @@ class SparkContext(config: SparkConf) extends Logging {
   }

+  /**
+   * Kill and reschedule the given task attempt. Task ids can be obtained from the Spark UI
+   * or through SparkListener.onTaskStart.
+   *
+   * @param taskId the task ID to kill. This id uniquely identifies the task attempt.
+   * @param interruptThread whether to interrupt the thread running the task.
+   * @param reason the reason for killing the task, which should be a short string. If a task
+   *   is killed multiple times with different reasons, only one reason will be reported.
+   */
+  def killTaskAttempt(
+      taskId: Long,
+      interruptThread: Boolean = true,
+      reason: String = "cancelled"): Unit = {
--- End diff --

As discussed, how about "killed via SparkContext.killTaskAttempt" or similar?
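The shape of the API being reviewed above -- a kill method whose `reason` parameter defaults to a short, human-readable string -- can be sketched with a minimal stand-in. `StubContext` and its recording field are invented for illustration, not Spark's real `SparkContext`; only the parameter shape mirrors the diff, and the default string follows the wording the reviewers settled on.

```scala
// Minimal stand-in for the API shape under review. StubContext is
// hypothetical; it only records what a real scheduler backend would receive.
class StubContext {
  var lastKill: Option[(Long, Boolean, String)] = None

  def killTaskAttempt(
      taskId: Long,
      interruptThread: Boolean = true,
      reason: String = "killed via SparkContext.killTaskAttempt"): Unit = {
    // Record the arguments; a real implementation would forward them
    // to the scheduler backend's killTask.
    lastKill = Some((taskId, interruptThread, reason))
  }
}
```

Callers who pass no `reason` get the self-describing default, while internal call sites (like the speculative-attempt path in the diff above) can supply a per-task reason such as "another attempt succeeded".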
[GitHub] spark issue #17226: [SPARK-19882][SQL] Pivot with null as a distinct pivot v...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/17226 cc @cloud-fan and @mambrus, do you mind if I ask you to take a look here? I guess this is an important fix.
[GitHub] spark issue #17322: [SPARK-19987][SQL] Pass all filters into FileIndex
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17322 **[Test build #74696 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74696/testReport)** for PR 17322 at commit [`a667f80`](https://github.com/apache/spark/commit/a667f8083c3528a9a95ec130d1887a9f982c4af7).
[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning
Github user mallman commented on the issue: https://github.com/apache/spark/pull/16578 @viirya A month has gone by since my last update. I've added much more comprehensive coverage to the `SelectedFieldSuite`, however I haven't yet fixed the `SelectedField` extractor to pass all of the tests. All of the failures are related to handling path expressions including `GetArrayStructFields` extractors. There are many complicated cases, and they are proving quite a challenge to resolve comprehensively. I hope to spend some more time on this by the end of next week. I would love to push an update by then. After next week I will be away for two weeks. Cheers.
[GitHub] spark issue #17324: [SPARK-19969] [ML] Imputer doc and example
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17324 **[Test build #74690 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74690/testReport)** for PR 17324 at commit [`ac0683b`](https://github.com/apache/spark/commit/ac0683b6799e9d9090da9e2244b609c59717466b).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #17324: [SPARK-19969] [ML] Imputer doc and example
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17324 Merged build finished. Test PASSed.
[GitHub] spark issue #17324: [SPARK-19969] [ML] Imputer doc and example
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17324 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74690/
[GitHub] spark issue #17324: [SPARK-19969] [ML] Imputer doc and example
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17324 Merged build finished. Test PASSed.
[GitHub] spark issue #17296: [SPARK-19953][ML] Random Forest Models use parent UID wh...
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/17296 @MLnick, I found an existing `MLTestingUtils.checkCopy` that is used to check that the copied model uids match, and it can easily be extended to include the check needed here. I went through and added these checks to any ML suite that wasn't already using it, but that led to another issue that I felt should be covered in a separate PR at #17326. Can you take a look at that first and merge if ok? Then I'll update this and push the regression tests for it. Thanks!
[GitHub] spark issue #17324: [SPARK-19969] [ML] Imputer doc and example
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17324 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74689/
[GitHub] spark issue #17324: [SPARK-19969] [ML] Imputer doc and example
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17324 **[Test build #74689 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74689/testReport)** for PR 17324 at commit [`f2e7a69`](https://github.com/apache/spark/commit/f2e7a69badc9d4e0352fcfe09e8d18cdfe007d9e).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `public class JavaImputerExample `
[GitHub] spark issue #17315: [SPARK-19949][SQL] unify bad record handling in CSV and ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/17315 I support this idea. Let me try to take a close look by tomorrow to help.
[GitHub] spark issue #17326: [SPARK-19985][ML] Fixed copy method for some ML Models
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17326 **[Test build #74695 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74695/testReport)** for PR 17326 at commit [`89bac8a`](https://github.com/apache/spark/commit/89bac8a209ca05bfa58e952a01ef664fe8ae2f65).
[GitHub] spark issue #17320: [SPARK-19967][SQL] Add from_json in FunctionRegistry
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/17320 All the valid examples are using a single column. Could you also add a test case to verify the schema having multiple columns? Thanks!
[GitHub] spark pull request #16499: [SPARK-17204][CORE] Fix replicated off heap stora...
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/16499#discussion_r106552361

--- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala ---
@@ -1048,7 +1065,7 @@ private[spark] class BlockManager(
   try {
     replicate(blockId, bytesToReplicate, level, remoteClassTag)
   } finally {
-    bytesToReplicate.dispose()
+    bytesToReplicate.unmap()
--- End diff --

@cloud-fan I explored the approach of making the `MemoryStore` return a `ChunkedByteBuffer` that cannot be disposed, however I don't think there's a clean way to safely support that behavior. In essence, if the memory manager marks a buffer as indisposable when it returns it to the block manager, then that buffer cannot be evicted later. Adding additional code to handle this other behavior correctly was looking rather messy, and I abandoned the effort. At this point, I think that explicitly separating `unmap` and `dispose` methods is still the best way to resolve this issue.
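The `unmap`/`dispose` split argued for above can be illustrated with a toy model. `StubBuffer` is invented for illustration and is not Spark's `ChunkedByteBuffer`; it only captures the design point: `unmap` releases memory-mapped file storage, while `dispose` frees any off-heap allocation, so a store-managed (non-mapped) buffer survives an `unmap` and can still be evicted by the memory store later.

```scala
// Toy model of the distinction (hypothetical class, not Spark's API):
// unmap() frees only mmap-backed storage; dispose() frees unconditionally.
final class StubBuffer(val isMemoryMapped: Boolean) {
  var freed = false

  // Release the buffer only if it is backed by a memory-mapped file;
  // a no-op for buffers the memory store still manages.
  def unmap(): Unit = if (isMemoryMapped) freed = true

  // Free the underlying storage regardless of how it was allocated.
  def dispose(): Unit = freed = true
}
```

Under this split, the replication path in the diff can safely call `unmap()` in its `finally` block: a mapped temporary file is cleaned up, while a buffer owned by the memory store is left alone.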
[GitHub] spark pull request #17326: [SPARK-19985][ML] Fixed copy method for some ML M...
GitHub user BryanCutler opened a pull request: https://github.com/apache/spark/pull/17326 [SPARK-19985][ML] Fixed copy method for some ML Models

## What changes were proposed in this pull request?

Some ML Models were using `defaultCopy`, which expects a default constructor, and others were not setting the parent estimator. This change fixes these by creating a new instance of the model and explicitly setting values and parent.

## How was this patch tested?

Added `MLTestingUtils.checkCopy` to the offending models' tests to verify the copy is made and the parent is set.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/BryanCutler/spark ml-model-copy-error-SPARK-19985

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17326.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #17326

commit dc65b135e3d0110a951888baa7121e93613cda5f
Author: Bryan Cutler
Date: 2017-03-16T22:16:46Z
added regression testing for model copies

commit 89bac8a209ca05bfa58e952a01ef664fe8ae2f65
Author: Bryan Cutler
Date: 2017-03-16T22:59:10Z
fixed copy method for ML models throwing error
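The bug class this PR describes can be sketched in plain Scala. `SimpleModel` and its fields are invented for illustration (Spark's real models take a `ParamMap` and extend `Model[T]`); the sketch shows only the pattern: a model whose constructor takes learned state cannot be rebuilt by a reflective no-arg `defaultCopy`, so `copy` must construct the new instance explicitly and re-attach the parent.

```scala
// Hypothetical minimal model: `copy` rebuilds the instance with the same
// learned state and re-attaches the parent estimator, instead of relying on
// a reflective default-constructor copy (which this class cannot support,
// since its constructor requires the fitted coefficients).
class SimpleModel(val uid: String, val coefficients: Array[Double]) {
  var parent: AnyRef = _

  def setParent(p: AnyRef): this.type = { parent = p; this }

  def copy(): SimpleModel = {
    val copied = new SimpleModel(uid, coefficients.clone())
    copied.setParent(parent) // without this, the copy loses its parent
    copied
  }
}
```

A check in the spirit of `MLTestingUtils.checkCopy` then verifies that the copy is a distinct instance whose uid and parent match the original's.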
[GitHub] spark issue #17088: [SPARK-19753][CORE] Un-register all shuffle output on a ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17088 **[Test build #74694 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74694/testReport)** for PR 17088 at commit [`d4979e3`](https://github.com/apache/spark/commit/d4979e35137152db00c53ea0b9e82aaf41dad5b5).
[GitHub] spark issue #17322: [SPARK-19987][SQL] Pass all filters into FileIndex
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17322 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74687/
[GitHub] spark issue #17322: [SPARK-19987][SQL] Pass all filters into FileIndex
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17322 Merged build finished. Test FAILed.
[GitHub] spark issue #17322: [SPARK-19987][SQL] Pass all filters into FileIndex
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17322 **[Test build #74687 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74687/testReport)** for PR 17322 at commit [`87a8be5`](https://github.com/apache/spark/commit/87a8be533ca316ee586cc0f50c2b4aeeec1fb903). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15363: [SPARK-17791][SQL] Join reordering using star schema det...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15363 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74685/ Test PASSed.
[GitHub] spark issue #15363: [SPARK-17791][SQL] Join reordering using star schema det...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15363 Merged build finished. Test PASSed.
[GitHub] spark issue #15363: [SPARK-17791][SQL] Join reordering using star schema det...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15363 **[Test build #74685 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74685/testReport)** for PR 15363 at commit [`de16f53`](https://github.com/apache/spark/commit/de16f539450a518e7b72cc52bbc5eb489d8dae32). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #17320: [SPARK-19967][SQL] Add from_json in FunctionRegis...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/17320#discussion_r106550987 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/JsonFunctionsSuite.scala --- @@ -202,12 +202,12 @@ class JsonFunctionsSuite extends QueryTest with SharedSQLContext { val df1 = Seq(Tuple1(Tuple1(1))).toDF("a") checkAnswer( df1.selectExpr("to_json(a)"), - Row("""{"_1":1}""") :: Nil) + Row( """{"_1":1}""") :: Nil) --- End diff -- revert it back
[GitHub] spark pull request #17320: [SPARK-19967][SQL] Add from_json in FunctionRegis...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/17320#discussion_r106550986 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/JsonFunctionsSuite.scala --- @@ -202,12 +202,12 @@ class JsonFunctionsSuite extends QueryTest with SharedSQLContext { val df1 = Seq(Tuple1(Tuple1(1))).toDF("a") checkAnswer( df1.selectExpr("to_json(a)"), - Row("""{"_1":1}""") :: Nil) + Row( """{"_1":1}""") :: Nil) val df2 = Seq(Tuple1(Tuple1(java.sql.Timestamp.valueOf("2015-08-26 18:00:00.0")))).toDF("a") checkAnswer( df2.selectExpr("to_json(a, map('timestampFormat', 'dd/MM/yyyy HH:mm'))"), - Row("""{"_1":"26/08/2015 18:00"}""") :: Nil) + Row( """{"_1":"26/08/2015 18:00"}""") :: Nil) --- End diff -- revert it back
[GitHub] spark issue #16483: [SPARK-18847][GraphX] PageRank gives incorrect results f...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16483 **[Test build #74693 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74693/testReport)** for PR 16483 at commit [`ac5d0ce`](https://github.com/apache/spark/commit/ac5d0ce1bdf2cf1e570107613e55c7c24b58a638). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16483: [SPARK-18847][GraphX] PageRank gives incorrect results f...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16483 Merged build finished. Test PASSed.
[GitHub] spark issue #16483: [SPARK-18847][GraphX] PageRank gives incorrect results f...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16483 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74693/ Test PASSed.
[GitHub] spark pull request #17320: [SPARK-19967][SQL] Add from_json in FunctionRegis...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/17320#discussion_r106550372 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala --- @@ -634,7 +661,12 @@ case class StructToJson( override def inputTypes: Seq[AbstractDataType] = StructType :: Nil } -object StructToJson { +object JsonExprUtils { + + def validateSchemaLiteral(exp: Expression): StructType = exp match { +case Literal(s, StringType) => CatalystSqlParser.parseTableSchema(s.toString) +case e => throw new AnalysisException(s"Must be a string literal, but: $e") --- End diff -- How about? > Expected a string literal instead of $e
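The pattern match under review can be exercised with simplified stand-ins (the `Expression`, `Literal`, and `Column` classes below are illustrative sketches, not Catalyst's), using the reviewer's suggested error wording:

```scala
// Simplified stand-ins for Catalyst expression classes, for illustration only.
sealed trait Expression
case class Literal(value: Any) extends Expression
case class Column(name: String) extends Expression

// Accept only a string literal; otherwise fail with the suggested message.
// Real code would hand the string to CatalystSqlParser.parseTableSchema.
def validateSchemaLiteral(exp: Expression): String = exp match {
  case Literal(s: String) => s
  case e => throw new IllegalArgumentException(s"Expected a string literal instead of $e")
}
```

The point of the suggested wording is that the message names both what was expected and what was actually found.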
[GitHub] spark issue #16209: [WIP][SPARK-10849][SQL] Adds option to the JDBC data sou...
Github user sureshthalamati commented on the issue: https://github.com/apache/spark/pull/16209 @gatorsmile I like the DDL schema format approach. But the method `CatalystSqlParser.parseTableSchema(sql)` will work only if the user wants to specify a target database data type that also exists in Spark. For example, if a user wants to specify CLOB(200K), it will not work because that is not a valid data type in Spark. How about a simple comma-separated list, with the restriction that a comma cannot appear in a column name when using this option? I am guessing that would work in most scenarios. Any suggestions?
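The comma-separated alternative floated here could look like the following sketch. `parseColumnSpec` is a hypothetical helper, not an existing Spark API; it assumes commas never appear inside a column name or type:

```scala
// Hypothetical parser for a "name TYPE, name TYPE, ..." column spec.
// Assumes commas never appear inside a column name or type string
// (the restriction stated in the comment above).
def parseColumnSpec(spec: String): Seq[(String, String)] =
  spec.split(",").toSeq.map(_.trim).filter(_.nonEmpty).map { col =>
    val idx = col.indexOf(' ')
    require(idx > 0, s"Expected 'name TYPE', got: $col")
    (col.substring(0, idx), col.substring(idx + 1).trim)
  }
```

Because nothing is validated against Spark's type system, database-specific types such as CLOB(200K) pass through untouched.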
[GitHub] spark issue #16483: [SPARK-18847][GraphX] PageRank gives incorrect results f...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16483 **[Test build #74693 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74693/testReport)** for PR 16483 at commit [`ac5d0ce`](https://github.com/apache/spark/commit/ac5d0ce1bdf2cf1e570107613e55c7c24b58a638).
[GitHub] spark issue #17325: [SPARK-19803][CORE][TEST] Proactive replication test fai...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17325 Can one of the admins verify this patch?
[GitHub] spark issue #17307: [SPARK-13369] Make number of consecutive fetch failures ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17307 **[Test build #74692 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74692/testReport)** for PR 17307 at commit [`0f95c8b`](https://github.com/apache/spark/commit/0f95c8b1ad260abb1a64d9cbd25d09a1bafeb1d8).
[GitHub] spark pull request #17325: [SPARK-19803][CORE][TEST] Proactive replication t...
GitHub user shubhamchopra opened a pull request: https://github.com/apache/spark/pull/17325 [SPARK-19803][CORE][TEST] Proactive replication test failures ## What changes were proposed in this pull request? Executors cache a list of their peers that is refreshed by default every minute. The cached stale references were randomly being used for replication. Since those executors were removed from the master, they did not occur in the block locations as reported by the master. This was fixed by 1. Refreshing the peer cache in the block manager before trying to pro-actively replicate. This way the probability of replicating to a failed executor is eliminated. 2. Explicitly stopping the block manager in the tests. This shuts down the RPC endpoint used by the block manager. This way, even if a block manager tries to replicate using a stale reference, the replication logic should take care of refreshing the list of peers after failure. ## How was this patch tested? Tested manually You can merge this pull request into a Git repository by running: $ git pull https://github.com/shubhamchopra/spark SPARK-19803 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17325.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #17325 commit 22f9dbd6825939f93f8d32b3ec428f890d361d9f Author: Shubham Chopra Date: 2017-03-16T22:14:23Z Fixing an issue with executors using stale peer references to replicate.
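Fix (1) above, refreshing the peer list immediately before choosing a replication target, can be sketched as follows. `PeerCache` and `Replicator` are hypothetical stand-ins for the block manager's internals, not Spark code:

```scala
// Hypothetical stand-ins for block-manager internals; not Spark's classes.
class PeerCache(lookupLivePeers: () => Set[String]) {
  private var cached: Set[String] = Set.empty
  def peers: Set[String] = cached
  def refresh(): Unit = { cached = lookupLivePeers() }
}

class Replicator(cache: PeerCache) {
  // Key change from the PR: refresh the peer cache right before choosing
  // a replication target, so stale executor references are never picked.
  def replicateTo(blockId: String): Option[String] = {
    cache.refresh()
    cache.peers.headOption // simplified target selection
  }
}
```

Without the `refresh()` call, a peer removed by the master within the last refresh interval could still be chosen, which is exactly the flaky behavior the PR describes.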
[GitHub] spark issue #17307: [SPARK-13369] Make number of consecutive fetch failures ...
Github user squito commented on the issue: https://github.com/apache/spark/pull/17307 yeah, sorry I am looking, but keep getting distracted ... I'm sure these failures don't matter, but I can't merge this second anyhow, so let's just test again ...
[GitHub] spark issue #17307: [SPARK-13369] Make number of consecutive fetch failures ...
Github user squito commented on the issue: https://github.com/apache/spark/pull/17307 Jenkins, retest this please
[GitHub] spark issue #17319: [SPARK-19765][SPARK-18549][SPARK-19093][SPARK-19736][BAC...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/17319 cc @cloud-fan
[GitHub] spark pull request #16483: [SPARK-18847][GraphX] PageRank gives incorrect re...
Github user aray commented on a diff in the pull request: https://github.com/apache/spark/pull/16483#discussion_r106548090 --- Diff: graphx/src/test/scala/org/apache/spark/graphx/lib/PageRankSuite.scala --- @@ -68,26 +69,34 @@ class PageRankSuite extends SparkFunSuite with LocalSparkContext { val nVertices = 100 val starGraph = GraphGenerators.starGraph(sc, nVertices).cache() val resetProb = 0.15 + val tol = 0.0001 + val numIter = 2 val errorTol = 1.0e-5 - val staticRanks1 = starGraph.staticPageRank(numIter = 2, resetProb).vertices - val staticRanks2 = starGraph.staticPageRank(numIter = 3, resetProb).vertices.cache() + val staticRanks = starGraph.staticPageRank(numIter, resetProb).vertices.cache() + val staticRanks2 = starGraph.staticPageRank(numIter + 1, resetProb).vertices - // Static PageRank should only take 3 iterations to converge - val notMatching = staticRanks1.innerZipJoin(staticRanks2) { (vid, pr1, pr2) => + // Static PageRank should only take 2 iterations to converge --- End diff -- It didn't change, we're still comparing the output of the 2nd and 3rd iterations.
[GitHub] spark issue #17307: [SPARK-13369] Make number of consecutive fetch failures ...
Github user sitalkedia commented on the issue: https://github.com/apache/spark/pull/17307 The tests pass on my laptop, so it looks like we have more flaky tests?
[GitHub] spark issue #17088: [SPARK-19753][CORE] Un-register all shuffle output on a ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17088 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74691/ Test FAILed.
[GitHub] spark issue #17088: [SPARK-19753][CORE] Un-register all shuffle output on a ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17088 **[Test build #74691 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74691/testReport)** for PR 17088 at commit [`f96ec68`](https://github.com/apache/spark/commit/f96ec68d6922fe2108c5869fedf2d8aca373c6eb). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17088: [SPARK-19753][CORE] Un-register all shuffle output on a ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17088 Merged build finished. Test FAILed.
[GitHub] spark pull request #17088: [SPARK-19753][CORE] Un-register all shuffle outpu...
Github user sitalkedia commented on a diff in the pull request: https://github.com/apache/spark/pull/17088#discussion_r106546928 --- Diff: core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala --- @@ -394,6 +394,32 @@ class DAGSchedulerSuite extends SparkFunSuite with LocalSparkContext with Timeou assertDataStructuresEmpty() } + test("All shuffle files should on the slave should be cleaned up when slave lost") { +// reset the test context with the right shuffle service config +afterEach() +val conf = new SparkConf() +conf.set("spark.shuffle.service.enabled", "true") +init(conf) +runEvent(ExecutorAdded("exec-hostA1", "hostA")) +runEvent(ExecutorAdded("exec-hostA2", "hostA")) +runEvent(ExecutorAdded("exec-hostB", "hostB")) +val shuffleMapRdd = new MyRDD(sc, 3, Nil) +val shuffleDep = new ShuffleDependency(shuffleMapRdd, new HashPartitioner(1)) +val shuffleId = shuffleDep.shuffleId +val reduceRdd = new MyRDD(sc, 1, List(shuffleDep), tracker = mapOutputTracker) +submit(reduceRdd, Array(0)) +complete(taskSets(0), Seq( + (Success, makeMapStatus("hostA", 1)), + (Success, makeMapStatus("hostA", 1)), + (Success, makeMapStatus("hostB", 1)))) +scheduler.handleExecutorLost("exec-hostA1", fileLost = false, hostLost = true, Some("hostA")) +runEvent(ExecutorLost("exec-hostA1", SlaveLost("", true))) +val mapStatus = mapOutputTracker.mapStatuses.get(0).get.filter(_!= null) --- End diff -- Thanks for providing a better test case, I also modified it to include map output from multiple stages.
[GitHub] spark pull request #17088: [SPARK-19753][CORE] Un-register all shuffle outpu...
Github user sitalkedia commented on a diff in the pull request: https://github.com/apache/spark/pull/17088#discussion_r106546818 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -1365,19 +1369,27 @@ class DAGScheduler( */ private[scheduler] def handleExecutorLost( execId: String, - filesLost: Boolean, + fileLost: Boolean, + hostLost: Boolean = false, + maybeHost: Option[String] = None, --- End diff -- I agree the method is very hard to understand. I did some refactoring to make it clearer, let me know what you think.
[GitHub] spark issue #17088: [SPARK-19753][CORE] Un-register all shuffle output on a ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17088 **[Test build #74691 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74691/testReport)** for PR 17088 at commit [`f96ec68`](https://github.com/apache/spark/commit/f96ec68d6922fe2108c5869fedf2d8aca373c6eb).
[GitHub] spark pull request #17088: [SPARK-19753][CORE] Un-register all shuffle outpu...
Github user sitalkedia commented on a diff in the pull request: https://github.com/apache/spark/pull/17088#discussion_r106546591 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -1331,7 +1328,14 @@ class DAGScheduler( // TODO: mark the executor as failed only if there were lots of fetch failures on it if (bmAddress != null) { -handleExecutorLost(bmAddress.executorId, filesLost = true, Some(task.epoch)) +if (!env.blockManager.externalShuffleServiceEnabled) { --- End diff -- Ah, thanks for pointing that out, fixed.
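The branch being discussed, unregistering shuffle output for a whole host when an external shuffle service is serving its files, versus only for the failed executor otherwise, can be sketched with simplified stand-ins. `OutputTracker` and `handleFetchFailure` below are illustrative, not the DAGScheduler's actual code:

```scala
import scala.collection.mutable

// Illustrative stand-in for a map-output registry; not Spark's MapOutputTracker.
class OutputTracker {
  // (shuffleId, mapId) -> location encoded as "execId@host"
  val outputs = mutable.Map[(Int, Int), String]()
  def removeWhere(p: String => Boolean): Unit =
    outputs.filterInPlace((_, loc) => !p(loc))
}

// On a fetch failure: with an external shuffle service the files live on
// the host, so drop everything on that host; otherwise only the executor's.
def handleFetchFailure(
    tracker: OutputTracker,
    execId: String,
    host: String,
    externalShuffleService: Boolean): Unit = {
  if (externalShuffleService) tracker.removeWhere(_.endsWith("@" + host))
  else tracker.removeWhere(_.startsWith(execId + "@"))
}
```

This is the distinction the review thread is circling: the scope of unregistration depends on who owns the shuffle files.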
[GitHub] spark pull request #16483: [SPARK-18847][GraphX] PageRank gives incorrect re...
Github user aray commented on a diff in the pull request: https://github.com/apache/spark/pull/16483#discussion_r106546448 --- Diff: graphx/src/main/scala/org/apache/spark/graphx/lib/PageRank.scala --- @@ -322,13 +335,12 @@ object PageRank extends Logging { def personalizedVertexProgram(id: VertexId, attr: (Double, Double), msgSum: Double): (Double, Double) = { val (oldPR, lastDelta) = attr - var teleport = oldPR - val delta = if (src==id) resetProb else 0.0 - teleport = oldPR*delta - - val newPR = teleport + (1.0 - resetProb) * msgSum - val newDelta = if (lastDelta == Double.NegativeInfinity) newPR else newPR - oldPR - (newPR, newDelta) + val newPR = if (lastDelta == Double.NegativeInfinity) { --- End diff -- I'm guessing you mean the `if (src==id)` check? I'm honestly not sure what was going on with this code; it's just wrong. The results do not match up with igraph/networkx at all. Furthermore, the code is just nonsensical -- a definition of `var teleport = oldPR` that is then unconditionally set two lines down to `teleport = oldPR*delta` without being used prior. This revised implementation is much easier to follow and is now tested against 3 sets of reference values computed by igraph/networkx. Please let me know if you think I'm missing something.
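The intended behavior, teleport mass flowing only to the source vertex, can be written as a small pure function. This is a hedged sketch of the update rule under discussion, not Spark's exact `personalizedVertexProgram`:

```scala
// Personalized PageRank update: only the source vertex receives the
// teleport (reset) mass; every vertex keeps (1 - resetProb) of the
// incoming message sum. A sketch of the rule, not Spark's implementation.
def personalizedUpdate(
    id: Long,
    src: Long,
    msgSum: Double,
    resetProb: Double): Double = {
  val teleport = if (id == src) resetProb else 0.0
  teleport + (1.0 - resetProb) * msgSum
}
```

Contrast this with the removed code, where `teleport = oldPR * delta` mixed the old rank into the teleport term, which is why the results diverged from igraph/networkx.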
[GitHub] spark issue #17324: [SPARK-19969] [ML] Imputer doc and example
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17324 **[Test build #74690 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74690/testReport)** for PR 17324 at commit [`ac0683b`](https://github.com/apache/spark/commit/ac0683b6799e9d9090da9e2244b609c59717466b).
[GitHub] spark issue #17324: [SPARK-19969] [ML] Imputer doc and example
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17324 **[Test build #74689 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74689/testReport)** for PR 17324 at commit [`f2e7a69`](https://github.com/apache/spark/commit/f2e7a69badc9d4e0352fcfe09e8d18cdfe007d9e).
[GitHub] spark pull request #17324: [SPARK-19969] [ML] Imputer doc and example
GitHub user hhbyyh opened a pull request: https://github.com/apache/spark/pull/17324 [SPARK-19969] [ML] Imputer doc and example ## What changes were proposed in this pull request? Add docs and examples for spark.ml.feature.Imputer. Currently Scala and Java examples are included. A Python example will be added after https://github.com/apache/spark/pull/17316 ## How was this patch tested? local doc generation and example execution You can merge this pull request into a Git repository by running: $ git pull https://github.com/hhbyyh/spark imputerdoc Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17324.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #17324 commit f2e7a69badc9d4e0352fcfe09e8d18cdfe007d9e Author: Yuhao Yang Date: 2017-03-16T22:05:56Z imputer doc and example
[GitHub] spark pull request #17317: [SPARK-19329][SQL][BRANCH-2.1]Reading from or wri...
Github user windpiger closed the pull request at: https://github.com/apache/spark/pull/17317
[GitHub] spark pull request #17284: [DO_NOT_MERGE]Test PySpark Streaming tests
Github user zsxwing closed the pull request at: https://github.com/apache/spark/pull/17284
[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...
Github user kayousterhout commented on the issue: https://github.com/apache/spark/pull/17166 Thinking about this more, this seems like two separate changes (that should probably be separated): (1) Allowing cancellations to be injected via SparkContext. This seems like it should have its own JIRA, and is relatively few LOC (so should be easy to decouple). Those changes look fine and I think are good to merge as-is if you move them to a new PR. (2) Allowing reasons to be specified. This changes the API and changes many LOC. I'm skeptical of this change: I think this could be helpful if descriptive reasons are allowed (like the few I suggested in the comments), but if you restrict reasons to a few words so that they fit in the stage summary page, they don't seem very useful to a user. E.g., the default message of "cancelled" when sc.killTask is used seems pretty meaningless (and will require someone to read the code to understand -- at which point it seems like they might as well look in the logs instead of getting info from the UI). This doesn't seem useful enough to merit an API change, but maybe I'm missing something important here?
[GitHub] spark pull request #17191: [SPARK-14471][SQL] Aliases in SELECT could be use...
Github user nsyca commented on a diff in the pull request: https://github.com/apache/spark/pull/17191#discussion_r106540691

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -836,17 +836,29 @@ class Analyzer(
         Generate(newG.asInstanceOf[Generator], join, outer, qualifier, output, child)
       }
+      // If grouping keys have unresolved expressions, we need to replace them with resolved ones
+      // from the SELECT clause.
+      case agg @ Aggregate(groups, aggs, child)
+          if child.resolved && aggs.forall(_.resolved) && groups.exists(!_.resolved) =>
+        agg.copy(groupingExpressions = groups.map {
+          case u: UnresolvedAttribute =>
+            aggs.find(ne => resolver(ne.name, u.name)).map {
+              case alias @ Alias(e, _) => e
+              case e => e
+            }.getOrElse(u)
+          case e => e
+        })
--- End diff --

Nit: Should we place this Aggregate pattern next to the other Aggregate that deals with the wildcard character (*) above? Otherwise, LGTM.
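The rule under review lets an unresolved GROUP BY key resolve against a SELECT-list alias (so, e.g., `SELECT a + b AS k ... GROUP BY k` works). As a rough illustration of the substitution logic only, here is a small Spark-independent Python model; the names (`resolve_grouping_keys`, the dict-based alias map) are illustrative and are not Spark's Catalyst classes:

```python
# Simplified model of the alias-substitution step discussed above: grouping
# keys that did not resolve are looked up among the SELECT-list aliases and,
# on a match, replaced by the aliased expression. Illustrative sketch only;
# the real rule operates on Catalyst expression trees.

def resolve_grouping_keys(groups, select_aliases, resolver=None):
    """groups: list of grouping-key names; select_aliases: dict alias -> expression."""
    if resolver is None:
        # Case-insensitive comparison, mirroring a typical default resolver.
        resolver = lambda a, b: a.lower() == b.lower()

    resolved = []
    for key in groups:
        # Find a SELECT alias matching this unresolved key, if any.
        match = next((expr for alias, expr in select_aliases.items()
                      if resolver(alias, key)), None)
        # Keep the original key when no alias matches (it may resolve elsewhere).
        resolved.append(match if match is not None else key)
    return resolved

# SELECT a + b AS k, count(*) FROM t GROUP BY k
print(resolve_grouping_keys(["k"], {"k": "a + b"}))  # ['a + b']
```

The `.getOrElse(u)` in the diff corresponds to the fallback branch here: an unmatched key is passed through unchanged rather than rejected.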
[GitHub] spark issue #17323: [SPARK-19986][Tests]Make pyspark.streaming.tests.Checkpo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17323 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74686/ Test PASSed.
[GitHub] spark issue #17323: [SPARK-19986][Tests]Make pyspark.streaming.tests.Checkpo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17323 Merged build finished. Test PASSed.
[GitHub] spark issue #17323: [SPARK-19986][Tests]Make pyspark.streaming.tests.Checkpo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17323 **[Test build #74686 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74686/testReport)** for PR 17323 at commit [`6dddaab`](https://github.com/apache/spark/commit/6dddaab4715d3cce093c4a00f87a0c7cd1bda4db). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17323: [SPARK-19986][Tests]Make pyspark.streaming.tests.Checkpo...
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/17323 cc @tdas
[GitHub] spark issue #16781: [SPARK-12297][SQL] Hive compatibility for Parquet Timest...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16781 **[Test build #74688 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74688/testReport)** for PR 16781 at commit [`38e19cd`](https://github.com/apache/spark/commit/38e19cdd497992fb063cb39d1d65bde1622553e4).
[GitHub] spark issue #16483: [SPARK-18847][GraphX] PageRank gives incorrect results f...
Github user thunterdb commented on the issue: https://github.com/apache/spark/pull/16483 In addition, this introduces an extra reduction step at each iteration. I am fine with that since it is for correctness, but @jkbradley may want to comment as well.
[GitHub] spark issue #17323: [SPARK-19986][Tests]Make pyspark.streaming.tests.Checkpo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17323 **[Test build #74686 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74686/testReport)** for PR 17323 at commit [`6dddaab`](https://github.com/apache/spark/commit/6dddaab4715d3cce093c4a00f87a0c7cd1bda4db).
[GitHub] spark issue #17322: [SPARK-19987][SQL] Pass all filters into FileIndex
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17322 **[Test build #74687 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74687/testReport)** for PR 17322 at commit [`87a8be5`](https://github.com/apache/spark/commit/87a8be533ca316ee586cc0f50c2b4aeeec1fb903).
[GitHub] spark pull request #16483: [SPARK-18847][GraphX] PageRank gives incorrect re...
Github user thunterdb commented on a diff in the pull request: https://github.com/apache/spark/pull/16483#discussion_r106529377

--- Diff: graphx/src/main/scala/org/apache/spark/graphx/lib/PageRank.scala ---
@@ -353,9 +365,19 @@ object PageRank extends Logging {
       vertexProgram(id, attr, msgSum)
     }
-    Pregel(pagerankGraph, initialMessage, activeDirection = EdgeDirection.Out)(
+    val rankGraph = Pregel(pagerankGraph, initialMessage, activeDirection = EdgeDirection.Out)(
       vp, sendMessage, messageCombiner)
       .mapVertices((vid, attr) => attr._1)
-  } // end of deltaPageRank
+
+    // If the graph has sinks (vertices with no outgoing edges) the sum of ranks will not be correct
--- End diff --

This is the same code as above, please factor it into a function.
[GitHub] spark pull request #16483: [SPARK-18847][GraphX] PageRank gives incorrect re...
Github user thunterdb commented on a diff in the pull request: https://github.com/apache/spark/pull/16483#discussion_r106532078

--- Diff: graphx/src/test/scala/org/apache/spark/graphx/lib/PageRankSuite.scala ---
@@ -68,26 +69,34 @@ class PageRankSuite extends SparkFunSuite with LocalSparkContext {
     val nVertices = 100
     val starGraph = GraphGenerators.starGraph(sc, nVertices).cache()
     val resetProb = 0.15
+    val tol = 0.0001
+    val numIter = 2
     val errorTol = 1.0e-5
-    val staticRanks1 = starGraph.staticPageRank(numIter = 2, resetProb).vertices
-    val staticRanks2 = starGraph.staticPageRank(numIter = 3, resetProb).vertices.cache()
+    val staticRanks = starGraph.staticPageRank(numIter, resetProb).vertices.cache()
+    val staticRanks2 = starGraph.staticPageRank(numIter + 1, resetProb).vertices
-    // Static PageRank should only take 3 iterations to converge
-    val notMatching = staticRanks1.innerZipJoin(staticRanks2) { (vid, pr1, pr2) =>
+    // Static PageRank should only take 2 iterations to converge
--- End diff --

Why does it take only two iterations to converge now?
[GitHub] spark pull request #16483: [SPARK-18847][GraphX] PageRank gives incorrect re...
Github user thunterdb commented on a diff in the pull request: https://github.com/apache/spark/pull/16483#discussion_r106535595

--- Diff: graphx/src/main/scala/org/apache/spark/graphx/lib/PageRank.scala ---
@@ -322,13 +335,12 @@ object PageRank extends Logging {
     def personalizedVertexProgram(id: VertexId, attr: (Double, Double),
         msgSum: Double): (Double, Double) = {
       val (oldPR, lastDelta) = attr
-      var teleport = oldPR
-      val delta = if (src==id) resetProb else 0.0
-      teleport = oldPR*delta
-
-      val newPR = teleport + (1.0 - resetProb) * msgSum
-      val newDelta = if (lastDelta == Double.NegativeInfinity) newPR else newPR - oldPR
-      (newPR, newDelta)
+      val newPR = if (lastDelta == Double.NegativeInfinity) {
--- End diff --

My memory of the algorithm is a bit rusty. Why don't you need to check for self-loops here anymore?
[GitHub] spark pull request #17323: [SPARK-19986][Tests]Make pyspark.streaming.tests....
GitHub user zsxwing opened a pull request: https://github.com/apache/spark/pull/17323 [SPARK-19986][Tests]Make pyspark.streaming.tests.CheckpointTests more stable ## What changes were proposed in this pull request? Sometimes, CheckpointTests will hang on a busy machine because the streaming jobs are too slow and cannot catch up. I observed the scheduled delay kept increasing for dozens of seconds locally. This PR increases the batch interval from 0.5 seconds to 2 seconds to generate fewer Spark jobs. It should make `pyspark.streaming.tests.CheckpointTests` more stable. ## How was this patch tested? Jenkins You can merge this pull request into a Git repository by running: $ git pull https://github.com/zsxwing/spark SPARK-19986 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17323.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #17323 commit 6dddaab4715d3cce093c4a00f87a0c7cd1bda4db Author: Shixiong Zhu Date: 2017-03-16T21:13:43Z Fix test
[GitHub] spark pull request #17322: [SPARK-19987][SQL] Pass all filters into FileInde...
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/17322 [SPARK-19987][SQL] Pass all filters into FileIndex ## What changes were proposed in this pull request? This is a tiny refactoring to pass data filters also to the FileIndex, so FileIndex can have a more global view on predicates. ## How was this patch tested? The change should be covered by existing test cases. You can merge this pull request into a Git repository by running: $ git pull https://github.com/rxin/spark SPARK-19987 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17322.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #17322 commit 87a8be533ca316ee586cc0f50c2b4aeeec1fb903 Author: Reynold Xin Date: 2017-03-16T21:23:55Z [SPARK-19987][SQL] Pass all filters into FileIndex
[GitHub] spark issue #16905: [SPARK-19567][CORE][SCHEDULER] Support some Schedulable ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16905 **[Test build #3601 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3601/testReport)** for PR 16905 at commit [`479c01d`](https://github.com/apache/spark/commit/479c01d43de71d03b3276cdd59f12083e7da31c9). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class FakeSchedulerBackend extends SchedulerBackend `
[GitHub] spark issue #17320: [SPARK-19967][SQL] Add from_json in FunctionRegistry
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17320 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74682/ Test PASSed.
[GitHub] spark issue #17320: [SPARK-19967][SQL] Add from_json in FunctionRegistry
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17320 Merged build finished. Test PASSed.
[GitHub] spark issue #17320: [SPARK-19967][SQL] Add from_json in FunctionRegistry
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17320 **[Test build #74682 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74682/testReport)** for PR 17320 at commit [`f6472c7`](https://github.com/apache/spark/commit/f6472c79e5ff693f805e9a97452e5e2134f63d0c). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17307: [SPARK-13369] Make number of consecutive fetch failures ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17307 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74683/ Test FAILed.
[GitHub] spark issue #17307: [SPARK-13369] Make number of consecutive fetch failures ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17307 Merged build finished. Test FAILed.
[GitHub] spark issue #17307: [SPARK-13369] Make number of consecutive fetch failures ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17307 **[Test build #74683 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74683/testReport)** for PR 17307 at commit [`0f95c8b`](https://github.com/apache/spark/commit/0f95c8b1ad260abb1a64d9cbd25d09a1bafeb1d8). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15770: [SPARK-15784][ML]:Add Power Iteration Clustering to spar...
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/15770 @jkbradley can you take a look? Thanks!
[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17166 Merged build finished. Test PASSed.
[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17166 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74679/ Test PASSed.
[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17166 **[Test build #74679 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74679/testReport)** for PR 17166 at commit [`fda712d`](https://github.com/apache/spark/commit/fda712de614a4ab37d359b1c192415d1df894ab1). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #16483: [SPARK-18847][GraphX] PageRank gives incorrect re...
Github user thunterdb commented on a diff in the pull request: https://github.com/apache/spark/pull/16483#discussion_r106528007

--- Diff: graphx/src/main/scala/org/apache/spark/graphx/lib/PageRank.scala ---
@@ -162,7 +162,15 @@ object PageRank extends Logging {
       iteration += 1
     }
-    rankGraph
+    // If the graph has sinks (vertices with no outgoing edges) the sum of ranks will not be correct
--- End diff --

Put the name of the ticket as well.
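The comment added in this hunk concerns rank mass lost through sink vertices: a sink receives rank but never redistributes it, so the ranks no longer sum to the number of vertices, and the patch adds a normalization pass afterwards. As a rough, GraphX-independent illustration of that effect and fix, here is a plain-Python sketch (function names and the toy graph are illustrative, not Spark code):

```python
# Toy illustration of why sinks break the rank sum, and the extra
# normalization (reduction) step discussed in this review. Not GraphX code.

def pagerank(edges, n, reset_prob=0.15, iters=20):
    """edges: dict src -> list of dst; vertices are 0..n-1."""
    ranks = {v: 1.0 for v in range(n)}
    for _ in range(iters):
        incoming = {v: 0.0 for v in range(n)}
        for src, dsts in edges.items():
            for dst in dsts:
                incoming[dst] += ranks[src] / len(dsts)
        # Rank held by sinks is never sent anywhere, so sum(ranks) drifts below n.
        ranks = {v: reset_prob + (1 - reset_prob) * incoming[v] for v in range(n)}
    return ranks

def normalize(ranks):
    # The extra reduction step: rescale so the ranks again sum to n.
    total = sum(ranks.values())
    n = len(ranks)
    return {v: r * n / total for v, r in ranks.items()}

# Chain 0 -> 1 -> 2, where vertex 2 is a sink: mass leaks each iteration.
ranks = pagerank({0: [1], 1: [2]}, n=3)
assert sum(ranks.values()) < 3.0             # mass lost to the sink
fixed = normalize(ranks)
assert abs(sum(fixed.values()) - 3.0) < 1e-9  # normalization restores the sum
```

This also makes thunterdb's earlier cost remark concrete: the normalization requires one global sum over all vertex ranks per call, i.e. an extra reduction.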
[GitHub] spark issue #15363: [SPARK-17791][SQL] Join reordering using star schema det...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15363 **[Test build #74685 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74685/testReport)** for PR 15363 at commit [`de16f53`](https://github.com/apache/spark/commit/de16f539450a518e7b72cc52bbc5eb489d8dae32).
[GitHub] spark issue #17319: [SPARK-19765][SPARK-18549][SPARK-19093][SPARK-19736][BAC...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17319 Merged build finished. Test PASSed.
[GitHub] spark issue #17319: [SPARK-19765][SPARK-18549][SPARK-19093][SPARK-19736][BAC...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17319 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74678/ Test PASSed.
[GitHub] spark issue #17319: [SPARK-19765][SPARK-18549][SPARK-19093][SPARK-19736][BAC...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17319 **[Test build #74678 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74678/testReport)** for PR 17319 at commit [`11a8f31`](https://github.com/apache/spark/commit/11a8f31d5954c14eb8e546d001688f93357da676). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.