[GitHub] spark pull request: [SPARK-4714][CORE]: Add checking info is null ...

2014-12-08 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/3574#discussion_r21473307
  
--- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala 
---
@@ -1010,7 +1010,10 @@ private[spark] class BlockManager(
   info.synchronized {
 // required ? As of now, this will be invoked only for blocks 
which are ready
 // But in case this changes in future, adding for consistency sake.
-if (!info.waitForReady()) {
+if (blockInfo.get(blockId).isEmpty) {
+  logWarning(sBlock $blockId was already dropped.)
+  return None
+} else if(!info.waitForReady()) {
--- End diff --

Minor style nit: this needs a space after the `if` and before the open 
paren: `if (!info...`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4714][CORE]: Add checking info is null ...

2014-12-08 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/3574#issuecomment-66162338
  
Jenkins, this is ok to test.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4714][CORE]: Add checking info is null ...

2014-12-08 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3574#issuecomment-66163112
  
  [Test build #24225 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24225/consoleFull)
 for   PR 3574 at commit 
[`55fa4ba`](https://github.com/apache/spark/commit/55fa4ba1e41eb36b1c4f867efbdd35c9b8a4f131).
 * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4714][CORE]: Add checking info is null ...

2014-12-08 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3574#issuecomment-66177706
  
  [Test build #24225 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24225/consoleFull)
 for   PR 3574 at commit 
[`55fa4ba`](https://github.com/apache/spark/commit/55fa4ba1e41eb36b1c4f867efbdd35c9b8a4f131).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4714][CORE]: Add checking info is null ...

2014-12-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3574#issuecomment-66177717
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24225/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4714][CORE]: Add checking info is null ...

2014-12-08 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/3574#discussion_r21491375
  
--- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala 
---
@@ -1010,7 +1010,10 @@ private[spark] class BlockManager(
   info.synchronized {
 // required ? As of now, this will be invoked only for blocks 
which are ready
--- End diff --

This comment actually refers to the `!info.waitForReady()` case, so I'd 
like to either move the comment or swap the order of these checks so that we 
check for `blockInfo.get(blockId).isEmpty` in the `else if` clause instead.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4714][CORE]: Add checking info is null ...

2014-12-08 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/3574#issuecomment-66199309
  
Left one minor code organization comment; aside from that, this looks good 
to me and should be ready to merge after you fix that up (I can do it if you 
don't have time, though; just let me know).

There are a couple of edits that I'd like to make to the commit title / 
description before merging this, but I can do it myself on merge.

Thanks for the careful analysis and for catching this issue!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4714][CORE]: Add checking info is null ...

2014-12-07 Thread suyanNone
Github user suyanNone commented on the pull request:

https://github.com/apache/spark/pull/3574#issuecomment-65976983
  
@JoshRosen 
yes, it will not cause any error if there not have a check in removeBlock 
and dropOldBlocks
I pull this request, because I think it will be more reasonable in semantic.
and I already remove the check in removeBlock and dropOldBlocks 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4714][CORE]: Add checking info is null ...

2014-12-06 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/3574#issuecomment-65910884
  
Ah, I think I see your concern: let's say that we a block and there are two 
threads that are racing to perform operations on it: to use your example, 
thread A wants to call `removeBlock()` and thread B wants to call 
`dropFromMemory()`.  For this code to work correctly, we want it to operate 
correctly for all possible interleavings of those threads

If we consider the case where _all_ of thread A's steps execute before 
_any_ of thread B's, then things work okay: thread A will have removed the 
entry from `blockInfo` before thread B runs, so `B` will see that `info == 
null` and log a warning that the block has already been removed.  The same is 
true for B before A.

In another execution, though, both threads could find the `BlockInfo` 
instance in the `blockInfo` map but race on acquiring its lock 
(`info.synchronized`), so `info != null` will be true for both threads.  I 
agree that this could be a problem, but it might not be if the operations 
performed in those threads are idempotent.  Let's take a look and see if that's 
the case:

 - `removeBlock`: all of the operations here are idempotent, so at worst we 
get a warning if we run this on a block that's removed by another racing thread.

- `dropOldBlocks`: similarly, this just consists of calls to 
`*Store.remove()`, which are idempotent.

- `dropFromMemory`: this case might actually be problematic, since I think 
that this method calls data store operations that don't handle missing blocks.  
I'm going to look at this case in a little more detail, but I think that your 
fix for this might be a good idea.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4714][CORE]: Add checking info is null ...

2014-12-06 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/3574#discussion_r21418593
  
--- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala 
---
@@ -1089,15 +1089,17 @@ private[spark] class BlockManager(
 val info = blockInfo.get(blockId).orNull
 if (info != null) {
   info.synchronized {
-// Removals are idempotent in disk store and memory store. At 
worst, we get a warning.
-val removedFromMemory = memoryStore.remove(blockId)
-val removedFromDisk = diskStore.remove(blockId)
-val removedFromTachyon = if (tachyonInitialized) 
tachyonStore.remove(blockId) else false
-if (!removedFromMemory  !removedFromDisk  !removedFromTachyon) 
{
-  logWarning(sBlock $blockId could not be removed as it was not 
found in either  +
-the disk, memory, or tachyon store)
+if (blockInfo.get(blockId).isEmpty) {
--- End diff --

I don't think that this extra check is necessary; check out my comment on 
the main pull request and see if you agree.

Even if we did want to add a check here, I think we want to check for 
`if(blockInfo.get(blockId).nonEmpty)`, since this branch handles the case where 
blocks have _not_ been removed already.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4714][CORE]: Add checking info is null ...

2014-12-06 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/3574#discussion_r21418607
  
--- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala 
---
@@ -1126,12 +1128,14 @@ private[spark] class BlockManager(
   val (id, info, time) = (entry.getKey, entry.getValue.value, 
entry.getValue.timestamp)
   if (time  cleanupTime  shouldDrop(id)) {
 info.synchronized {
-  val level = info.level
-  if (level.useMemory) { memoryStore.remove(id) }
-  if (level.useDisk) { diskStore.remove(id) }
-  if (level.useOffHeap) { tachyonStore.remove(id) }
-  iterator.remove()
-  logInfo(sDropped block $id)
+  if (blockInfo.get(id).isEmpty) {
--- End diff --

Similarly, I don't think that we strictly _need_ a check here since the 
`remove(id)` operations are idempotent.  It might be nice to log a warning if 
the block has already been removed, but that might not be necessary since this 
is a background cleanup call.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4714][CORE]: Add checking info is null ...

2014-12-06 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/3574#discussion_r21418614
  
--- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala 
---
@@ -1010,9 +1010,9 @@ private[spark] class BlockManager(
   info.synchronized {
 // required ? As of now, this will be invoked only for blocks 
which are ready
 // But in case this changes in future, adding for consistency sake.
-if (!info.waitForReady()) {
+if (blockInfo.get(blockId).isEmpty || !info.waitForReady()) {
--- End diff --

It might be nice to split this into an `if` and `else if` case so that we 
can log specific / accurate error messages in each of the cases.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4714][CORE]: Add checking info is null ...

2014-12-05 Thread suyanNone
Github user suyanNone commented on a diff in the pull request:

https://github.com/apache/spark/pull/3574#discussion_r21359660
  
--- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala 
---
@@ -1089,15 +1089,17 @@ private[spark] class BlockManager(
 val info = blockInfo.get(blockId).orNull
 if (info != null) {
   info.synchronized {
-// Removals are idempotent in disk store and memory store. At 
worst, we get a warning.
-val removedFromMemory = memoryStore.remove(blockId)
-val removedFromDisk = diskStore.remove(blockId)
-val removedFromTachyon = if (tachyonInitialized) 
tachyonStore.remove(blockId) else false
-if (!removedFromMemory  !removedFromDisk  !removedFromTachyon) 
{
-  logWarning(sBlock $blockId could not be removed as it was not 
found in either  +
-the disk, memory, or tachyon store)
+if (blockInfo.get(blockId).isEmpty) {
--- End diff --


1. ThreadA into RemoveBlock(), and got info for blockId1
2. ThreadB into DropFromMemory(), and got info for blockId1

now Thread A, B  all want got info.sychronized

B got, and drop block(blockId) from memory, and this block is use memory 
only, so it remove from blockinfo. and then release the lock.

then A got, but info is already not  in blockinfo.

Did I have miss sth or misunderstand sth? may dropForMemory and removeBlock 
can't happen at the same time?



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4714][CORE]: Add checking info is null ...

2014-12-05 Thread suyanNone
Github user suyanNone commented on the pull request:

https://github.com/apache/spark/pull/3574#issuecomment-65779235
  
@JoshRosen  
ThreadA into RemoveBlock(), and got info for blockId1
ThreadB into DropFromMemory(), and got info for blockId1
now Thread A, B all want got info.sychronized

B got, and drop block(blockId) from memory, and this block is use memory 
only, so it remove from blockinfo. and then release the lock.

then A got, but info is already not in blockinfo.

Did I have miss sth or misunderstand sth? may dropForMemory and removeBlock 
can't happen at the same time?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4714][CORE]: Add checking info is null ...

2014-12-04 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/3574#discussion_r21352757
  
--- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala 
---
@@ -1089,15 +1089,17 @@ private[spark] class BlockManager(
 val info = blockInfo.get(blockId).orNull
 if (info != null) {
   info.synchronized {
-// Removals are idempotent in disk store and memory store. At 
worst, we get a warning.
-val removedFromMemory = memoryStore.remove(blockId)
-val removedFromDisk = diskStore.remove(blockId)
-val removedFromTachyon = if (tachyonInitialized) 
tachyonStore.remove(blockId) else false
-if (!removedFromMemory  !removedFromDisk  !removedFromTachyon) 
{
-  logWarning(sBlock $blockId could not be removed as it was not 
found in either  +
-the disk, memory, or tachyon store)
+if (blockInfo.get(blockId).isEmpty) {
--- End diff --

Unless I'm missing something, this seems like the opposite of the condition 
we want here: if blockInfo doesn't contain an entry for `blockId`, then why 
would we want to remove that entry?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4714][CORE]: Add checking info is null ...

2014-12-04 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/3574#issuecomment-6571
  
Can you elaborate on the problem that this PR addresses?  Have you observed 
an actual error that this fixes?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4714][CORE]: Add checking info is null ...

2014-12-03 Thread suyanNone
GitHub user suyanNone opened a pull request:

https://github.com/apache/spark/pull/3574

[SPARK-4714][CORE]: Add checking info is null or not while having gotten 
info.syn

[SPARK-4714][CORE]: Add checking info is null or not while having gotten 
info.syn
in removeBlock()/ dropOldBlock()/ dropFromMemory()
all have the same logic:
1. info = blockInfo.get(id)
2. if (info != null)
3. info.synchronized
there may be a possibility that while one thread got info.lock while the 
previous thread already removed from blockinfo in info.lock.
but one thing in current code, That not check info is null or not, while 
getting info.lock to remove block, will not cause any errors.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/suyanNone/spark refine-block-concurrency

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/3574.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3574


commit e57e270906da88599cd240bff3d0fce6334bd954
Author: hushan[胡珊] hus...@xiaomi.com
Date:   2014-12-03T07:19:50Z

add check info is already remove or not while having gotten info.syn




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4714][CORE]: Add checking info is null ...

2014-12-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3574#issuecomment-65366168
  
Can one of the admins verify this patch?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org