[GitHub] spark pull request: [SPARK-4714][CORE]: Add checking info is null ...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/3574#discussion_r21473307 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala --- @@ -1010,7 +1010,10 @@ private[spark] class BlockManager( info.synchronized { // required ? As of now, this will be invoked only for blocks which are ready // But in case this changes in future, adding for consistency sake. -if (!info.waitForReady()) { +if (blockInfo.get(blockId).isEmpty) { + logWarning(sBlock $blockId was already dropped.) + return None +} else if(!info.waitForReady()) { --- End diff -- Minor style nit: this needs a space after the `if` and before the open paren: `if (!info...`. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4714][CORE]: Add checking info is null ...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/3574#issuecomment-66162338 Jenkins, this is ok to test. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4714][CORE]: Add checking info is null ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3574#issuecomment-66163112 [Test build #24225 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24225/consoleFull) for PR 3574 at commit [`55fa4ba`](https://github.com/apache/spark/commit/55fa4ba1e41eb36b1c4f867efbdd35c9b8a4f131). * This patch merges cleanly. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4714][CORE]: Add checking info is null ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3574#issuecomment-66177706 [Test build #24225 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24225/consoleFull) for PR 3574 at commit [`55fa4ba`](https://github.com/apache/spark/commit/55fa4ba1e41eb36b1c4f867efbdd35c9b8a4f131). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4714][CORE]: Add checking info is null ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3574#issuecomment-66177717 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24225/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4714][CORE]: Add checking info is null ...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/3574#discussion_r21491375 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala --- @@ -1010,7 +1010,10 @@ private[spark] class BlockManager( info.synchronized { // required ? As of now, this will be invoked only for blocks which are ready --- End diff -- This comment actually refers to the `!info.waitForReady()` case, so I'd like to either move the comment or swap the order of these checks so that we check for `blockInfo.get(blockId).isEmpty` in the `else if` clause instead. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4714][CORE]: Add checking info is null ...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/3574#issuecomment-66199309 Left one minor code organization comment; aside from that, this looks good to me and should be ready to merge after you fix that up (I can do it if you don't have time, though; just let me know). There are a couple of edits that I'd like to make to the commit title / description before merging this, but I can do it myself on merge. Thanks for the careful analysis and for catching this issue! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4714][CORE]: Add checking info is null ...
Github user suyanNone commented on the pull request: https://github.com/apache/spark/pull/3574#issuecomment-65976983 @JoshRosen yes, it will not cause any error if there not have a check in removeBlock and dropOldBlocks I pull this request, because I think it will be more reasonable in semantic. and I already remove the check in removeBlock and dropOldBlocks --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4714][CORE]: Add checking info is null ...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/3574#issuecomment-65910884 Ah, I think I see your concern: let's say that we a block and there are two threads that are racing to perform operations on it: to use your example, thread A wants to call `removeBlock()` and thread B wants to call `dropFromMemory()`. For this code to work correctly, we want it to operate correctly for all possible interleavings of those threads If we consider the case where _all_ of thread A's steps execute before _any_ of thread B's, then things work okay: thread A will have removed the entry from `blockInfo` before thread B runs, so `B` will see that `info == null` and log a warning that the block has already been removed. The same is true for B before A. In another execution, though, both threads could find the `BlockInfo` instance in the `blockInfo` map but race on acquiring its lock (`info.synchronized`), so `info != null` will be true for both threads. I agree that this could be a problem, but it might not be if the operations performed in those threads are idempotent. Let's take a look and see if that's the case: - `removeBlock`: all of the operations here are idempotent, so at worst we get a warning if we run this on a block that's removed by another racing thread. - `dropOldBlocks`: similarly, this just consists of calls to `*Store.remove()`, which are idempotent. - `dropFromMemory`: this case might actually be problematic, since I think that this method calls data store operations that don't handle missing blocks. I'm going to look at this case in a little more detail, but I think that your fix for this might be a good idea. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4714][CORE]: Add checking info is null ...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/3574#discussion_r21418593 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala --- @@ -1089,15 +1089,17 @@ private[spark] class BlockManager( val info = blockInfo.get(blockId).orNull if (info != null) { info.synchronized { -// Removals are idempotent in disk store and memory store. At worst, we get a warning. -val removedFromMemory = memoryStore.remove(blockId) -val removedFromDisk = diskStore.remove(blockId) -val removedFromTachyon = if (tachyonInitialized) tachyonStore.remove(blockId) else false -if (!removedFromMemory !removedFromDisk !removedFromTachyon) { - logWarning(sBlock $blockId could not be removed as it was not found in either + -the disk, memory, or tachyon store) +if (blockInfo.get(blockId).isEmpty) { --- End diff -- I don't think that this extra check is necessary; check out my comment on the main pull request and see if you agree. Even if we did want to add a check here, I think we want to check for `if(blockInfo.get(blockId).nonEmpty)`, since this branch handles the case where blocks have _not_ been removed already. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4714][CORE]: Add checking info is null ...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/3574#discussion_r21418607 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala --- @@ -1126,12 +1128,14 @@ private[spark] class BlockManager( val (id, info, time) = (entry.getKey, entry.getValue.value, entry.getValue.timestamp) if (time cleanupTime shouldDrop(id)) { info.synchronized { - val level = info.level - if (level.useMemory) { memoryStore.remove(id) } - if (level.useDisk) { diskStore.remove(id) } - if (level.useOffHeap) { tachyonStore.remove(id) } - iterator.remove() - logInfo(sDropped block $id) + if (blockInfo.get(id).isEmpty) { --- End diff -- Similarly, I don't think that we strictly _need_ a check here since the `remove(id)` operations are idempotent. It might be nice to log a warning if the block has already been removed, but that might not be necessary since this is a background cleanup call. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4714][CORE]: Add checking info is null ...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/3574#discussion_r21418614 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala --- @@ -1010,9 +1010,9 @@ private[spark] class BlockManager( info.synchronized { // required ? As of now, this will be invoked only for blocks which are ready // But in case this changes in future, adding for consistency sake. -if (!info.waitForReady()) { +if (blockInfo.get(blockId).isEmpty || !info.waitForReady()) { --- End diff -- It might be nice to split this into an `if` and `else if` case so that we can log specific / accurate error messages in each of the cases. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4714][CORE]: Add checking info is null ...
Github user suyanNone commented on a diff in the pull request: https://github.com/apache/spark/pull/3574#discussion_r21359660 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala --- @@ -1089,15 +1089,17 @@ private[spark] class BlockManager( val info = blockInfo.get(blockId).orNull if (info != null) { info.synchronized { -// Removals are idempotent in disk store and memory store. At worst, we get a warning. -val removedFromMemory = memoryStore.remove(blockId) -val removedFromDisk = diskStore.remove(blockId) -val removedFromTachyon = if (tachyonInitialized) tachyonStore.remove(blockId) else false -if (!removedFromMemory !removedFromDisk !removedFromTachyon) { - logWarning(sBlock $blockId could not be removed as it was not found in either + -the disk, memory, or tachyon store) +if (blockInfo.get(blockId).isEmpty) { --- End diff -- 1. ThreadA into RemoveBlock(), and got info for blockId1 2. ThreadB into DropFromMemory(), and got info for blockId1 now Thread A, B all want got info.sychronized B got, and drop block(blockId) from memory, and this block is use memory only, so it remove from blockinfo. and then release the lock. then A got, but info is already not in blockinfo. Did I have miss sth or misunderstand sth? may dropForMemory and removeBlock can't happen at the same time? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4714][CORE]: Add checking info is null ...
Github user suyanNone commented on the pull request: https://github.com/apache/spark/pull/3574#issuecomment-65779235 @JoshRosen ThreadA into RemoveBlock(), and got info for blockId1 ThreadB into DropFromMemory(), and got info for blockId1 now Thread A, B all want got info.sychronized B got, and drop block(blockId) from memory, and this block is use memory only, so it remove from blockinfo. and then release the lock. then A got, but info is already not in blockinfo. Did I have miss sth or misunderstand sth? may dropForMemory and removeBlock can't happen at the same time? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4714][CORE]: Add checking info is null ...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/3574#discussion_r21352757 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala --- @@ -1089,15 +1089,17 @@ private[spark] class BlockManager( val info = blockInfo.get(blockId).orNull if (info != null) { info.synchronized { -// Removals are idempotent in disk store and memory store. At worst, we get a warning. -val removedFromMemory = memoryStore.remove(blockId) -val removedFromDisk = diskStore.remove(blockId) -val removedFromTachyon = if (tachyonInitialized) tachyonStore.remove(blockId) else false -if (!removedFromMemory !removedFromDisk !removedFromTachyon) { - logWarning(sBlock $blockId could not be removed as it was not found in either + -the disk, memory, or tachyon store) +if (blockInfo.get(blockId).isEmpty) { --- End diff -- Unless I'm missing something, this seems like the opposite of the condition we want here: if blockInfo doesn't contain an entry for `blockId`, then why would we want to remove that entry? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4714][CORE]: Add checking info is null ...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/3574#issuecomment-6571 Can you elaborate on the problem that this PR addresses? Have you observed an actual error that this fixes? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4714][CORE]: Add checking info is null ...
GitHub user suyanNone opened a pull request: https://github.com/apache/spark/pull/3574 [SPARK-4714][CORE]: Add checking info is null or not while having gotten info.syn [SPARK-4714][CORE]: Add checking info is null or not while having gotten info.syn in removeBlock()/ dropOldBlock()/ dropFromMemory() all have the same logic: 1. info = blockInfo.get(id) 2. if (info != null) 3. info.synchronized there may be a possibility that while one thread got info.lock while the previous thread already removed from blockinfo in info.lock. but one thing in current code, That not check info is null or not, while getting info.lock to remove block, will not cause any errors. You can merge this pull request into a Git repository by running: $ git pull https://github.com/suyanNone/spark refine-block-concurrency Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3574.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3574 commit e57e270906da88599cd240bff3d0fce6334bd954 Author: hushan[è¡ç] hus...@xiaomi.com Date: 2014-12-03T07:19:50Z add check info is already remove or not while having gotten info.syn --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4714][CORE]: Add checking info is null ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3574#issuecomment-65366168 Can one of the admins verify this patch? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org