[GitHub] spark pull request: [WIP][SPARK-4251][SPARK-2352][MLLIB]Add RBM, A...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3222#issuecomment-65889555 [Test build #24209 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24209/consoleFull) for PR 3222 at commit [`5b2ef49`](https://github.com/apache/spark/commit/5b2ef49ab42a0cdcb309495dd47f8f436b139ed7). * This patch merges cleanly. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-3623][GraphX] GraphX should support the...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2631#issuecomment-65889881 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24208/ Test PASSed.
[GitHub] spark pull request: [SPARK-3623][GraphX] GraphX should support the...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2631#issuecomment-65889877 [Test build #24208 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24208/consoleFull) for PR 2631 at commit [`a70c500`](https://github.com/apache/spark/commit/a70c5001977b7ab0a10716f69190ed0a6a797d5d). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3623][GraphX] GraphX should support the...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/2631
[GitHub] spark pull request: [SPARK-3623][GraphX] GraphX should support the...
Github user ankurdave commented on the pull request: https://github.com/apache/spark/pull/2631#issuecomment-65890469 Thanks! Merged into master and branch-1.2.
[GitHub] spark pull request: [SPARK-4620] Add unpersist in Graph and GraphI...
Github user ankurdave commented on the pull request: https://github.com/apache/spark/pull/3476#issuecomment-65890609 ok to test
[GitHub] spark pull request: [SPARK-2374] Path pattern matching for GraphX
GitHub user ankurdave reopened a pull request: https://github.com/apache/spark/pull/1307

[SPARK-2374] Path pattern matching for GraphX

Based on a [request](http://apache-spark-user-list.1001560.n3.nabble.com/Graphx-traversal-and-merge-interesting-edges-td8788.html) on the mailing list, I wrote a simple implementation of path pattern matching for GraphX. It accepts patterns in the form of sequences of edge matchers, then iteratively propagates partial pattern matches to find all matching paths in the graph.

Though this is only an initial implementation and there are many opportunities for optimization, having this algorithm in the library expands the scope of GraphX beyond ML-like algorithms into graph traversal.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ankurdave/spark PatternMatching

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/1307.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1307

commit bc546a22066365122b2cbf4402946b88ee81de7b
Author: Ankur Dave ankurd...@gmail.com
Date: 2014-07-05T09:46:34Z
Add graphx.lib.PatternMatching

commit 9332a4927a065e2e5217a4256a5bc12a127ca97b
Author: Ankur Dave ankurd...@gmail.com
Date: 2014-07-05T10:16:20Z
Fix comment error
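The idea of propagating partial matches along a sequence of edge matchers can be sketched on a plain in-memory edge list. This is only an illustration under stated assumptions, not the code in the pull request and not the GraphX API: `Edge`, `EdgeMatcher`, and `findPaths` are hypothetical names, and matchers are modelled as simple predicates.

```scala
object PathMatchSketch {
  // Hypothetical minimal edge type; not GraphX's Edge class.
  final case class Edge(src: Long, dst: Long, attr: String)

  // Assumption: an "edge matcher" is modelled as a predicate on an edge.
  type EdgeMatcher = Edge => Boolean

  /** Returns every path whose i-th edge satisfies the i-th matcher. */
  def findPaths(edges: Seq[Edge], pattern: Seq[EdgeMatcher]): Seq[List[Edge]] = {
    if (pattern.isEmpty) Seq.empty
    else {
      // Seed the partial matches with all edges that satisfy the first matcher.
      // Paths are kept in reverse order so the latest edge is at the head.
      val seeds: Seq[List[Edge]] = edges.filter(pattern.head).map(e => List(e))

      // One round per remaining matcher: extend each partial match by every
      // edge that starts where the partial match currently ends.
      pattern.tail.foldLeft(seeds) { (partials, matcher) =>
        for {
          path <- partials
          next <- edges
          if next.src == path.head.dst && matcher(next)
        } yield next :: path
      }.map(_.reverse)
    }
  }
}
```

Each fold step plays the role of one propagation round; a distributed version would exchange the partial matches as messages between vertices instead of scanning a local list.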
[GitHub] spark pull request: [SPARK-4620] Add unpersist in Graph and GraphI...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3476#issuecomment-65890689 [Test build #24210 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24210/consoleFull) for PR 3476 at commit [`77a006a`](https://github.com/apache/spark/commit/77a006a77889a2f847dc0a6ad2e8e15e329b9137). * This patch **does not merge cleanly**.
[GitHub] spark pull request: [SPARK-2374] Path pattern matching for GraphX
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1307#issuecomment-65890780 [Test build #24211 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24211/consoleFull) for PR 1307 at commit [`9332a49`](https://github.com/apache/spark/commit/9332a4927a065e2e5217a4256a5bc12a127ca97b). * This patch merges cleanly.
[GitHub] spark pull request: [WIP][SPARK-4251][SPARK-2352][MLLIB]Add RBM, A...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3222#issuecomment-65891033 [Test build #24209 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24209/consoleFull) for PR 3222 at commit [`5b2ef49`](https://github.com/apache/spark/commit/5b2ef49ab42a0cdcb309495dd47f8f436b139ed7). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class DBN(val stackedRBM: StackedRBM, val nn: MLP)` * `class MLP(` * `class RBM(` * `class StackedRBM(val innerRBMs: Array[RBM])` * `case class MinstItem(label: Int, data: Array[Int]) ` * `class MinstDatasetReader(labelsFile: String, imagesFile: String)`
[GitHub] spark pull request: [WIP][SPARK-4251][SPARK-2352][MLLIB]Add RBM, A...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3222#issuecomment-65891034 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24209/ Test PASSed.
[GitHub] spark pull request: [SPARK-4721][CORE] Improve logic while first t...
Github user suyanNone commented on the pull request: https://github.com/apache/spark/pull/3582#issuecomment-65891778

Sorry for my poor comments and English. In summary:

1. We let the put threads run one at a time until one of them succeeds.
2. Multiple doGetLocal threads, and the single dropFromMemory thread, wait only once, whether the put succeeds or fails. If the wait in doGetLocal fails, it returns None. If the wait in dropFromMemory fails, it returns None.

There are three places that call info.waitForReady():

1. doGetLocal
2. dropFromMemory
3. doPut, when many threads try to put the same block

For 1, doGetLocal: I think it should wait just once (Wait1Condition, now renamed OtherCondition), whether the put succeeds or fails.

For 2: in practice we should never call dropFromMemory on a block that is not ready. The current code does have an info.waitForReady call in dropFromMemory, though, so for compatibility let it also wait only once (Wait1Condition) for the block put to succeed or fail. Also, if we find that a thread is doing dropFromMemory, we should cancel all put threads.

For 3: run the put threads one by one until one of them succeeds, or until a thread wants to drop the block from memory as described in 2. A put may fail many times, hence WaitNCondition (now named PutCondition).

All I want from WaitType (now renamed BlockWaitCondition) is the convenience of an enum for calling methods, plus a variable that records the number of threads waiting for that block to finish its put. Each block object has its own wait count, so I extend Enumeration.
[GitHub] spark pull request: [SPARK-4699][SQL] make caseSensitive configura...
Github user jackylk commented on the pull request: https://github.com/apache/spark/pull/3558#issuecomment-65892057 If we go with the second way, it will create a cyclic dependency between the spark-catalyst and spark-sql sub-projects, because SQLConf and SQLContext are in spark-sql while Analyzer is in spark-catalyst. I think the only drawback of the current way is that caseSensitive can be set only while initializing SQLContext and cannot be changed afterwards. If a client wants to use a case-insensitive analyzer, they need to create a new SQLContext, which I think is probably OK. I tested this way locally and it passes SQLQuerySuite. I do not know why the test case failed, as I cannot access the Jenkins test report now. Can anyone trigger Jenkins again? Thanks. Are there any more suggestions for a better solution?
[GitHub] spark pull request: [SPARK-2374] Path pattern matching for GraphX
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1307#issuecomment-65892529 [Test build #24211 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24211/consoleFull) for PR 1307 at commit [`9332a49`](https://github.com/apache/spark/commit/9332a4927a065e2e5217a4256a5bc12a127ca97b). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class EdgePattern[ED](attr: ED, matchDstFirst: Boolean = false) extends Serializable ` * `case class EdgeMatch[ED](srcId: VertexId, dstId: VertexId, attr: ED) extends Serializable ` * `case class Match[ED](path: List[EdgeMatch[ED]]) extends Serializable`
[GitHub] spark pull request: [SPARK-2374] Path pattern matching for GraphX
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1307#issuecomment-65892530 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24211/ Test PASSed.
[GitHub] spark pull request: [SPARK-4620] Add unpersist in Graph and GraphI...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3476#issuecomment-65892978 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24210/ Test PASSed.
[GitHub] spark pull request: [SPARK-4620] Add unpersist in Graph and GraphI...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3476#issuecomment-65892975 [Test build #24210 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24210/consoleFull) for PR 3476 at commit [`77a006a`](https://github.com/apache/spark/commit/77a006a77889a2f847dc0a6ad2e8e15e329b9137). * This patch **passes all tests**. * This patch **does not merge cleanly**. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4777][CORE] Some block memory after unr...
GitHub user suyanNone opened a pull request: https://github.com/apache/spark/pull/3629

[SPARK-4777][CORE] Some block memory after unrollSafely not count into used memory(memoryStore.entrys or unrollMemory)

Some memory is not counted as used by either memoryStore or unrollMemory. After thread A finishes unrollSafely, it releases its 40MB of unrollMemory (those 40MB can then be used by other threads). Thread A then waits for accountingLock so that it can tryToPut blockA (30MB). Until thread A gets accountingLock, the size of blockA is counted in neither unrollMemory nor memoryStore.currentMemory.

IIUC, freeMemory should have that block's memory subtracted. So this change keeps the released memory as pending and releases it in tryToPut, before ensureSpace.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/suyanNone/spark unroll-memory

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3629.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3629

commit 072e43d49226f1ae660d9b2ad53dc43ee78481e9
Author: hushan[胡珊] hus...@xiaomi.com
Date: 2014-12-05T02:56:20Z
Pending unroll memory for this block untill tryToPut
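The accounting gap described above can be illustrated with a small model. This is a hedged sketch, not Spark's MemoryStore: the `Accounting` class, its fields and methods, and the 100MB limit are assumptions made up for the example. Only the 40MB and 30MB figures come from the description.

```scala
object UnrollAccountingSketch {
  // Hypothetical, simplified accounting model; not Spark's MemoryStore.
  final class Accounting(val maxMemory: Long) {
    private var stored = 0L   // memory held by blocks that were already put
    private var unroll = 0L   // memory reserved while a block is being unrolled
    private var pending = 0L  // unroll memory parked until the put actually runs

    def freeMemory: Long = synchronized { maxMemory - stored - unroll - pending }

    def reserveUnroll(bytes: Long): Unit = synchronized { unroll += bytes }

    // Behaviour described as the problem: unroll memory is released as soon
    // as unrolling ends, before the block itself is accounted for.
    def releaseUnroll(bytes: Long): Unit = synchronized { unroll -= bytes }

    // Behaviour described as the fix: keep the memory accounted as pending.
    def moveUnrollToPending(bytes: Long): Unit = synchronized {
      unroll -= bytes
      pending += bytes
    }

    // Release the parked memory and account for the block in one locked step.
    def tryToPut(blockBytes: Long, pendingBytes: Long): Boolean = synchronized {
      pending -= pendingBytes
      if (freeMemory >= blockBytes) {
        stored += blockBytes
        true
      } else false
    }
  }
}
```

In this model, releasing the 40MB right after unrolling makes all 100MB look free while the 30MB block is still waiting to be put. Moving it to pending keeps the reported free memory at 60MB until the put runs, after which 70MB is free.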
[GitHub] spark pull request: [SPARK-4777][CORE] Some block memory after unr...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3629#issuecomment-65894783 Can one of the admins verify this patch?
[GitHub] spark pull request: [SQL] remove unnecessary import in spark-sql
GitHub user jackylk opened a pull request: https://github.com/apache/spark/pull/3630 [SQL] remove unnecessary import in spark-sql You can merge this pull request into a Git repository by running: $ git pull https://github.com/jackylk/spark remove Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3630.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3630 commit 150e7e0f4b0ec0eaa39736262d69c81d4ee83486 Author: Jacky Li jacky.li...@huawei.com Date: 2014-12-06T16:16:41Z remove unnecessary import
[GitHub] spark pull request: [SQL] remove unnecessary import in spark-sql
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3630#issuecomment-65903938 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-4691][Minor] Rewrite a few lines in shu...
Github user maji2014 commented on the pull request: https://github.com/apache/spark/pull/3553#issuecomment-65904987 NP, done for title change
[GitHub] spark pull request: [SPARK-4714][CORE]: Add checking info is null ...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/3574#issuecomment-65910884

Ah, I think I see your concern: let's say that we have a block and there are two threads that are racing to perform operations on it: to use your example, thread A wants to call `removeBlock()` and thread B wants to call `dropFromMemory()`. For this code to work correctly, we want it to operate correctly for all possible interleavings of those threads.

If we consider the case where _all_ of thread A's steps execute before _any_ of thread B's, then things work okay: thread A will have removed the entry from `blockInfo` before thread B runs, so `B` will see that `info == null` and log a warning that the block has already been removed. The same is true for B before A.

In another execution, though, both threads could find the `BlockInfo` instance in the `blockInfo` map but race on acquiring its lock (`info.synchronized`), so `info != null` will be true for both threads. I agree that this could be a problem, but it might not be if the operations performed in those threads are idempotent. Let's take a look and see if that's the case:

- `removeBlock`: all of the operations here are idempotent, so at worst we get a warning if we run this on a block that's removed by another racing thread.
- `dropOldBlocks`: similarly, this just consists of calls to `*Store.remove()`, which are idempotent.
- `dropFromMemory`: this case might actually be problematic, since I think that this method calls data store operations that don't handle missing blocks.

I'm going to look at this case in a little more detail, but I think that your fix for this might be a good idea.
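The difference between the idempotent and non-idempotent cases can be sketched with a toy registry. Everything here is hypothetical and greatly simplified: `Registry`, `Info`, `remove`, and `dropToDisk` are stand-ins invented for the example, not BlockManager code. The sketch assumes that the fix under discussion amounts to re-checking the map after the per-block lock is acquired.

```scala
import java.util.concurrent.ConcurrentHashMap

object RemoveRaceSketch {
  final class Info // stand-in for per-block metadata; used only as a lock

  final class Registry {
    private val infos = new ConcurrentHashMap[String, Info]()
    private val store = new ConcurrentHashMap[String, Array[Byte]]()

    def put(id: String, data: Array[Byte]): Unit = {
      val info = new Info
      info.synchronized {
        store.put(id, data)
        infos.put(id, info)
      }
    }

    // Idempotent: if a racing thread already removed the block, the store
    // removal simply finds nothing and this returns false.
    def remove(id: String): Boolean = {
      val info = infos.get(id)
      if (info == null) false
      else info.synchronized {
        val removed = store.remove(id) != null
        infos.remove(id)
        removed
      }
    }

    // Not idempotent: it needs the data to exist. Both racing threads may
    // have seen a non-null info, so the map is checked again under the lock.
    def dropToDisk(id: String): Option[Int] = {
      val info = infos.get(id)
      if (info == null) None
      else info.synchronized {
        if (!infos.containsKey(id)) None // removed while waiting for the lock
        else Option(store.get(id)).map(_.length)
      }
    }
  }
}
```

The second lookup inside `info.synchronized` is what distinguishes a block that was removed while the thread waited for the lock from one that is still present.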
[GitHub] spark pull request: [SPARK-4714][CORE]: Add checking info is null ...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/3574#discussion_r21418593

--- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala ---
@@ -1089,15 +1089,17 @@ private[spark] class BlockManager(
     val info = blockInfo.get(blockId).orNull
     if (info != null) {
       info.synchronized {
-        // Removals are idempotent in disk store and memory store. At worst, we get a warning.
-        val removedFromMemory = memoryStore.remove(blockId)
-        val removedFromDisk = diskStore.remove(blockId)
-        val removedFromTachyon = if (tachyonInitialized) tachyonStore.remove(blockId) else false
-        if (!removedFromMemory && !removedFromDisk && !removedFromTachyon) {
-          logWarning(s"Block $blockId could not be removed as it was not found in either " +
-            "the disk, memory, or tachyon store")
+        if (blockInfo.get(blockId).isEmpty) {
--- End diff --

I don't think that this extra check is necessary; check out my comment on the main pull request and see if you agree. Even if we did want to add a check here, I think we want to check for `if(blockInfo.get(blockId).nonEmpty)`, since this branch handles the case where blocks have _not_ been removed already.
[GitHub] spark pull request: [SPARK-4714][CORE]: Add checking info is null ...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/3574#discussion_r21418607

--- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala ---
@@ -1126,12 +1128,14 @@ private[spark] class BlockManager(
       val (id, info, time) = (entry.getKey, entry.getValue.value, entry.getValue.timestamp)
       if (time < cleanupTime && shouldDrop(id)) {
         info.synchronized {
-          val level = info.level
-          if (level.useMemory) { memoryStore.remove(id) }
-          if (level.useDisk) { diskStore.remove(id) }
-          if (level.useOffHeap) { tachyonStore.remove(id) }
-          iterator.remove()
-          logInfo(s"Dropped block $id")
+          if (blockInfo.get(id).isEmpty) {
--- End diff --

Similarly, I don't think that we strictly _need_ a check here since the `remove(id)` operations are idempotent. It might be nice to log a warning if the block has already been removed, but that might not be necessary since this is a background cleanup call.
[GitHub] spark pull request: [SPARK-4714][CORE]: Add checking info is null ...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/3574#discussion_r21418614

--- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala ---
@@ -1010,9 +1010,9 @@ private[spark] class BlockManager(
       info.synchronized {
         // required ? As of now, this will be invoked only for blocks which are ready
         // But in case this changes in future, adding for consistency sake.
-        if (!info.waitForReady()) {
+        if (blockInfo.get(blockId).isEmpty || !info.waitForReady()) {
--- End diff --

It might be nice to split this into an `if` and `else if` case so that we can log specific / accurate error messages in each of the cases.
[GitHub] spark pull request: [SPARK-4721][CORE] Improve logic while first t...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/3582#issuecomment-65911726

To clarify a bit further: I think that `BlockInfo.waitForReady()` is designed to allow callers to block until a block write has completed. If we have many threads (readers) waiting for a block to be written, then I think we should be okay because `notifyAll()` will wake all of those threads when the block becomes ready.

From your description, it sounds like you're worried about a multiple writer-threads case, where we have many threads attempting to write the same block and a failed write attempt from _one_ of them wakes up the waiting threads and notifies them of a failure even though there's another write in progress which might succeed. Is your goal to wait until _all_ of the pending writes have failed before notifying a reader that the write has failed and to wait for _one_ of them to succeed before notifying the reader that the write succeeded?

I'll have to dig into the BlockManager internals to see whether we can ever have multiple in-progress writes for the same block. Do you have an example of when this can happen? I'd be happy to look over the code and provide more feedback / suggestions, but I want to make sure that I understand the motivation and confirm that this is fixing an actual bug, since it seems like this adds a moderate amount of complexity.
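The single-writer, many-readers behaviour described above can be sketched with a plain JVM monitor. This is an illustrative stand-in, not Spark's `BlockInfo`: the class `BlockState` and its fields are assumptions made for the example, and only the general `wait()` / `notifyAll()` pattern is taken from the comment.

```scala
object WaitForReadySketch {
  // Hypothetical stand-in for per-block state; not Spark's BlockInfo.
  final class BlockState {
    private var pending = true // no outcome reported yet
    private var failed = false // outcome of the write, valid once !pending

    /** Blocks until a writer reports an outcome; returns true on success. */
    def waitForReady(): Boolean = synchronized {
      // Loop to guard against spurious wakeups.
      while (pending) wait()
      !failed
    }

    /** Called by the writer on success; wakes every waiting reader. */
    def markReady(): Unit = synchronized {
      pending = false
      notifyAll()
    }

    /** Called by the writer on failure; wakes every waiting reader. */
    def markFailure(): Unit = synchronized {
      pending = false
      failed = true
      notifyAll()
    }
  }
}
```

With a single writer this is enough. The multiple-writer concern is that the first `markFailure()` call in a sketch like this would report failure to every reader even if another writer were still in progress.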
[GitHub] spark pull request: [SQL] remove unnecessary import in spark-sql
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/3630#issuecomment-65911873 Jenkins, this is ok to test.
[GitHub] spark pull request: [SQL] remove unnecessary import in spark-sql
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3630#issuecomment-65911912 [Test build #24212 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24212/consoleFull) for PR 3630 at commit [`150e7e0`](https://github.com/apache/spark/commit/150e7e0f4b0ec0eaa39736262d69c81d4ee83486). * This patch merges cleanly.
[GitHub] spark pull request: SPARK-2624 add datanucleus jars to the contain...
Github user jimjh commented on the pull request: https://github.com/apache/spark/pull/3238#issuecomment-65913114 Yea I should have been more careful. I agree that we should figure out a proper solution.
[GitHub] spark pull request: [SQL] remove unnecessary import in spark-sql
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3630#issuecomment-65914284 [Test build #24212 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24212/consoleFull) for PR 3630 at commit [`150e7e0`](https://github.com/apache/spark/commit/150e7e0f4b0ec0eaa39736262d69c81d4ee83486). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SQL] remove unnecessary import in spark-sql
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3630#issuecomment-65914287 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24212/ Test PASSed.
[GitHub] spark pull request: Add mesos specific configurations into doc
Github user tnachen commented on the pull request: https://github.com/apache/spark/pull/3349#issuecomment-65921947 @ash211 I didn't know there was already a built-in one, so I updated this PR to use that instead. Please take a look, and if it looks good, please let me know who I should ping to push this.
[GitHub] spark pull request: Add mesos specific configurations into doc
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3349#issuecomment-65922015 [Test build #24213 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24213/consoleFull) for PR 3349 at commit [`737ef49`](https://github.com/apache/spark/commit/737ef4983d4e7f5221d49f132d4a17c9b999c71a). * This patch merges cleanly.
[GitHub] spark pull request: Add mesos specific configurations into doc
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3349#issuecomment-65923795 [Test build #24213 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24213/consoleFull) for PR 3349 at commit [`737ef49`](https://github.com/apache/spark/commit/737ef4983d4e7f5221d49f132d4a17c9b999c71a). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: Add mesos specific configurations into doc
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3349#issuecomment-65923798 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24213/ Test PASSed.
[GitHub] spark pull request: [SPARK-4469] [SQL] Move the SemanticAnalyzer f...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3336#issuecomment-65926034 [Test build #24214 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24214/consoleFull) for PR 3336 at commit [`b85b620`](https://github.com/apache/spark/commit/b85b6204736855f0380d675494329e85d8a3948a). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4769] [SQL] CTAS does not work when rea...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3336#issuecomment-65926083 [Test build #24215 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24215/consoleFull) for PR 3336 at commit [`5d58812`](https://github.com/apache/spark/commit/5d5881214ecc7223bfc6e2fef04a46a457eb66f5). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4769] [SQL] CTAS does not work when rea...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3336#issuecomment-65926374 [Test build #24216 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24216/consoleFull) for PR 3336 at commit [`4f97f14`](https://github.com/apache/spark/commit/4f97f144fa70be4ea13fe36dc22f868c455243f4). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4769] [SQL] CTAS does not work when rea...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3336#issuecomment-65927167 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24215/ Test PASSed.
[GitHub] spark pull request: [SPARK-4769] [SQL] CTAS does not work when rea...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3336#issuecomment-65927166 [Test build #24215 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24215/consoleFull) for PR 3336 at commit [`5d58812`](https://github.com/apache/spark/commit/5d5881214ecc7223bfc6e2fef04a46a457eb66f5). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4769] [SQL] CTAS does not work when rea...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3336#issuecomment-65927251 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24214/ Test PASSed.
[GitHub] spark pull request: [SPARK-4769] [SQL] CTAS does not work when rea...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3336#issuecomment-65927249 [Test build #24214 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24214/consoleFull) for PR 3336 at commit [`b85b620`](https://github.com/apache/spark/commit/b85b6204736855f0380d675494329e85d8a3948a). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4769] [SQL] CTAS does not work when rea...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3336#issuecomment-65927454 [Test build #24216 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24216/consoleFull) for PR 3336 at commit [`4f97f14`](https://github.com/apache/spark/commit/4f97f144fa70be4ea13fe36dc22f868c455243f4). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4769] [SQL] CTAS does not work when rea...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3336#issuecomment-65927457 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24216/ Test PASSed.