[GitHub] spark pull request: [SPARK-4006] Block Manager - Double Register C...

2015-06-04 Thread tsliwowicz
Github user tsliwowicz commented on the pull request:

https://github.com/apache/spark/pull/2854#issuecomment-108803453
  
@pwendell  Done


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4006] Block Manager - Double Register C...

2015-06-04 Thread tsliwowicz
Github user tsliwowicz closed the pull request at:

https://github.com/apache/spark/pull/2854





[GitHub] spark pull request: [SPARK-6305] add log4j2 profile and prevent lo...

2015-03-15 Thread tsliwowicz
Github user tsliwowicz commented on the pull request:

https://github.com/apache/spark/pull/4998#issuecomment-80878911
  
@srowen We don't know of a way to run two log4j versions side by side; they
conflict on both slf4j and log4j classes. In any case, I believe it won't
create a bigger problem because it just adds a build option. How is that a
problem at all? For us, the log4j version is fixed for all drivers. The rest
of the organisation switched to log4j 2, and we would like to do the same for
Spark.





[GitHub] spark pull request: [SPARK-6305] add log4j2 profile and prevent lo...

2015-03-13 Thread tsliwowicz
Github user tsliwowicz commented on the pull request:

https://github.com/apache/spark/pull/4998#issuecomment-78868836
  
@srowen this is in essence the same as the ability to control the Hadoop
version only through classpath manipulation. Since there are many flavors, it's
being done by using different profiles. While Spark may choose to keep log4j
1.2 as the default, the only way to fully switch to log4j 2 is by changing the
classpath. Using the same logic that the Spark designers applied for Hadoop,
@liorchaga created the profile for log4j 2. Adding a build profile does not
complicate things because the default (that is, not specifying the profile)
means the build stays the same. Only those who need it will use the profile,
same as for the various Hadoop versions.





[GitHub] spark pull request: [SPARK-4006] In long running contexts, we enco...

2014-12-19 Thread tsliwowicz
Github user tsliwowicz closed the pull request at:

https://github.com/apache/spark/pull/2914





[GitHub] spark pull request: [SPARK-4006] In long running contexts, we enco...

2014-12-17 Thread tsliwowicz
Github user tsliwowicz commented on the pull request:

https://github.com/apache/spark/pull/2914#issuecomment-67451754
  
hurray :-)

On Thu, Dec 18, 2014 at 12:13 AM, andrewor14 notificati...@github.com
wrote:

 Finally. I'm merging this into branch-1.0 thanks for your patience
 @tsliwowicz https://github.com/tsliwowicz







[GitHub] spark pull request: [SPARK-4006] In long running contexts, we enco...

2014-12-10 Thread tsliwowicz
Github user tsliwowicz commented on the pull request:

https://github.com/apache/spark/pull/2914#issuecomment-66446290
  
No problem. Glad to help :-)

On Wed, Dec 10, 2014 at 4:44 AM, andrewor14 notificati...@github.com
wrote:

 Hey sorry @tsliwowicz https://github.com/tsliwowicz for using your PRs
 as the battleground in fixing our builds against older branches. There
 aren't a lot of PRs opened against older branches so these tests aren't
 run in this context very often. So far I think all of these test failures
 have nothing to do with your patch so there is no action needed on your
 side. On our side, we'll keep investigating why the tests are failing all
 the time.







[GitHub] spark pull request: [SPARK-4006] In long running contexts, we enco...

2014-12-05 Thread tsliwowicz
Github user tsliwowicz commented on the pull request:

https://github.com/apache/spark/pull/2914#issuecomment-65783784
  
Seems like an issue with Jenkins





[GitHub] spark pull request: [SPARK-4006] Block Manager - Double Register C...

2014-12-05 Thread tsliwowicz
Github user tsliwowicz commented on the pull request:

https://github.com/apache/spark/pull/2854#issuecomment-65783822
  
Seems like an issue with Jenkins





[GitHub] spark pull request: [SPARK-4006] In long running contexts, we enco...

2014-10-28 Thread tsliwowicz
Github user tsliwowicz commented on the pull request:

https://github.com/apache/spark/pull/2914#issuecomment-60739832
  
Hi @andrewor14 - can I help somehow? I see that the PRs were not yet merged 
into 0.9 and 1.0





[GitHub] spark pull request: [SPARK-4006] Block Manager - Double Register C...

2014-10-24 Thread tsliwowicz
Github user tsliwowicz commented on the pull request:

https://github.com/apache/spark/pull/2854#issuecomment-60389860
  
there seems to be some technical issue with the build. (not a real failure 
with the pull request itself)





[GitHub] spark pull request: [SPARK-4006] In long running contexts, we enco...

2014-10-24 Thread tsliwowicz
Github user tsliwowicz commented on the pull request:

https://github.com/apache/spark/pull/2914#issuecomment-60389844
  
there seems to be some technical issue with the build. (not a real failure 
with the pull request itself)





[GitHub] spark pull request: [SPARK-4006] In long running contexts, we enco...

2014-10-24 Thread tsliwowicz
Github user tsliwowicz commented on the pull request:

https://github.com/apache/spark/pull/2914#issuecomment-60410765
  
I was asked by @andrewor14 to open separate PRs because it does not merge 
cleanly. https://github.com/apache/spark/pull/2886 was approved and merged.





[GitHub] spark pull request: [SPARK-4006] In long running contexts, we enco...

2014-10-24 Thread tsliwowicz
Github user tsliwowicz commented on the pull request:

https://github.com/apache/spark/pull/2914#issuecomment-60411019
  
@srowen I don't have a login to Jenkins so someone else needs to restart 
the build. Is there a way to get a login? I would gladly do it.





[GitHub] spark pull request: [SPARK-4006] In long running contexts, we enco...

2014-10-24 Thread tsliwowicz
Github user tsliwowicz commented on the pull request:

https://github.com/apache/spark/pull/2914#issuecomment-60448957
  
@andrewor14  - thanks for your help!





[GitHub] spark pull request: [SPARK-4006] In long running contexts, we enco...

2014-10-23 Thread tsliwowicz
Github user tsliwowicz commented on the pull request:

https://github.com/apache/spark/pull/2886#issuecomment-60221362
  
@andrewor14  - thanks for the comments. I believe I fixed them all. Let me 
know!





[GitHub] spark pull request: [SPARK-4006] In long running contexts, we enco...

2014-10-23 Thread tsliwowicz
Github user tsliwowicz commented on the pull request:

https://github.com/apache/spark/pull/2886#issuecomment-60222733
  
the failure seems technical (not related to my fix), I think. Local maven 
build works fine for me.





[GitHub] spark pull request: [SPARK-4006] In long running contexts, we enco...

2014-10-23 Thread tsliwowicz
Github user tsliwowicz commented on the pull request:

https://github.com/apache/spark/pull/2886#issuecomment-60306506
  
will do. Can you also merge into the 0.9 branch? I will update the PR I 
already have for it. https://github.com/apache/spark/pull/2854





[GitHub] spark pull request: [SPARK-4006] In long running contexts, we enco...

2014-10-23 Thread tsliwowicz
GitHub user tsliwowicz opened a pull request:

https://github.com/apache/spark/pull/2914

[SPARK-4006] In long running contexts, we encountered the situation of
double register without a remove in between. The cause for that is unknown,
and assumed a temp network issue.

However, since the second register is with a BlockManagerId on a different 
port, blockManagerInfo.contains() returns false, while blockManagerIdByExecutor 
returns Some. This inconsistency is caught in a conditional statement that does 
System.exit(1), which is a huge robustness issue for us.

The fix - simply remove the old id from both maps during register when this 
happens. We are mimicking the behavior of expireDeadHosts(), by doing local 
cleanup of the maps before trying to add new ones.

Also - added some logging for register and unregister.

This is just like https://github.com/apache/spark/pull/2886 except it's on 
branch-1.0
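
The registration logic described above can be modelled in a few lines. This
is only an illustrative sketch in Python, not the Scala code from the pull
request: the class BlockManagerRegistry, its method names, and the dict stored
per block manager are hypothetical. Only the two map names
(blockManagerInfo, blockManagerIdByExecutor) and the idea of removing the old
id from both maps before adding the new one come from the description.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class BlockManagerId:
    """Hypothetical stand-in: identity includes the port, as in the description."""
    executor_id: str
    host: str
    port: int


class BlockManagerRegistry:
    """Toy model of the two maps named in the pull request description."""

    def __init__(self) -> None:
        # Models blockManagerInfo: id -> per-manager state (a plain dict here).
        self.block_manager_info: Dict[BlockManagerId, dict] = {}
        # Models blockManagerIdByExecutor: executor id -> currently known id.
        self.block_manager_id_by_executor: Dict[str, BlockManagerId] = {}

    def remove(self, old_id: BlockManagerId) -> None:
        # Local cleanup of both maps for a stale id.
        self.block_manager_info.pop(old_id, None)
        if self.block_manager_id_by_executor.get(old_id.executor_id) == old_id:
            del self.block_manager_id_by_executor[old_id.executor_id]

    def register(self, new_id: BlockManagerId) -> Optional[BlockManagerId]:
        """Register new_id and return the stale id that was evicted, if any."""
        evicted = None
        if new_id not in self.block_manager_info:
            old_id = self.block_manager_id_by_executor.get(new_id.executor_id)
            if old_id is not None:
                # Same executor, different id (for example a new port):
                # drop the old entry instead of terminating the process.
                self.remove(old_id)
                evicted = old_id
            self.block_manager_id_by_executor[new_id.executor_id] = new_id
            self.block_manager_info[new_id] = {"registered": True}
        return evicted
```

Registering the same executor twice on different ports then leaves both maps
pointing at the newer id only, which is the consistency the fix is after.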


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/taboola/spark branch-1.0-block-mgr-removal

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/2914.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2914


commit 1014493621016c596eb02eba4cf5228b0b834ef7
Author: Tal Sliwowicz ta...@taboola.com
Date:   2014-10-23T20:26:26Z

[SPARK-4006] In long running contexts, we encountered the situation of
double register without a remove in between. The cause for that is unknown,
and assumed a temp network issue.

However, since the second register is with a BlockManagerId on a different 
port, blockManagerInfo.contains() returns false, while blockManagerIdByExecutor 
returns Some. This inconsistency is caught in a conditional statement that does 
System.exit(1), which is a huge robustness issue for us.

The fix - simply remove the old id from both maps during register when this 
happens. We are mimicking the behavior of expireDeadHosts(), by doing local 
cleanup of the maps before trying to add new ones.

Also - added some logging for register and unregister.

This is just like https://github.com/apache/spark/pull/2854 except it's on 
master

Author: Tal Sliwowicz ta...@taboola.com

Closes #2886 from tsliwowicz/master-block-mgr-removal and squashes the 
following commits:

094d508 [Tal Sliwowicz] some more white space change undone
41a2217 [Tal Sliwowicz] some more whitspaces change undone
7bcfc3d [Tal Sliwowicz] whitspaces fix
df9d98f [Tal Sliwowicz] Code review comments fixed
f48bce9 [Tal Sliwowicz] In long running contexts, we encountered the 
situation of double register without a remove in between. The cause for that is 
unknown, and assumed a temp network issue.

(cherry picked from commit 6b485225271a3c616c4fa1231c20090a95c86f32)

Conflicts:

core/src/main/scala/org/apache/spark/storage/BlockManagerMasterActor.scala

(cherry picked from commit d122236252d63635df7a112d92e90a2654702fc4)

Conflicts:

core/src/main/scala/org/apache/spark/storage/BlockManagerMasterActor.scala







[GitHub] spark pull request: [SPARK-4006] In long running contexts, we enco...

2014-10-23 Thread tsliwowicz
GitHub user tsliwowicz opened a pull request:

https://github.com/apache/spark/pull/2915

[SPARK-4006] In long running contexts, we encountered the situation of
double register without a remove in between. The cause for that is unknown,
and assumed a temp network issue.

However, since the second register is with a BlockManagerId on a different 
port, blockManagerInfo.contains() returns false, while blockManagerIdByExecutor 
returns Some. This inconsistency is caught in a conditional statement that does 
System.exit(1), which is a huge robustness issue for us.

The fix - simply remove the old id from both maps during register when this 
happens. We are mimicking the behavior of expireDeadHosts(), by doing local 
cleanup of the maps before trying to add new ones.

Also - added some logging for register and unregister.

This is just like https://github.com/apache/spark/pull/2886 except it's on 
branch-1.1


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/taboola/spark branch-1.1-block-mgr-removal

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/2915.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2915


commit d122236252d63635df7a112d92e90a2654702fc4
Author: Tal Sliwowicz ta...@taboola.com
Date:   2014-10-23T20:13:41Z

[SPARK-4006] In long running contexts, we encountered the situation of
double register without a remove in between. The cause for that is unknown,
and assumed a temp network issue.

However, since the second register is with a BlockManagerId on a different 
port, blockManagerInfo.contains() returns false, while blockManagerIdByExecutor 
returns Some. This inconsistency is caught in a conditional statement that does 
System.exit(1), which is a huge robustness issue for us.

The fix - simply remove the old id from both maps during register when this 
happens. We are mimicking the behavior of expireDeadHosts(), by doing local 
cleanup of the maps before trying to add new ones.

Also - added some logging for register and unregister.

This is just like https://github.com/apache/spark/pull/2854 except it's on 
master

Author: Tal Sliwowicz ta...@taboola.com

Closes #2886 from tsliwowicz/master-block-mgr-removal and squashes the 
following commits:

094d508 [Tal Sliwowicz] some more white space change undone
41a2217 [Tal Sliwowicz] some more whitspaces change undone
7bcfc3d [Tal Sliwowicz] whitspaces fix
df9d98f [Tal Sliwowicz] Code review comments fixed
f48bce9 [Tal Sliwowicz] In long running contexts, we encountered the 
situation of double register without a remove in between. The cause for that is 
unknown, and assumed a temp network issue.

(cherry picked from commit 6b485225271a3c616c4fa1231c20090a95c86f32)

Conflicts:

core/src/main/scala/org/apache/spark/storage/BlockManagerMasterActor.scala







[GitHub] spark pull request: [SPARK-4006] In long running contexts, we enco...

2014-10-23 Thread tsliwowicz
Github user tsliwowicz commented on the pull request:

https://github.com/apache/spark/pull/2886#issuecomment-60312692
  
@andrewor14 I created PRs for 1.0 and 1.1 and updated the 0.9 PR - can you 
please review and merge if ok?





[GitHub] spark pull request: [SPARK-4006] Block Manager - Double Register C...

2014-10-21 Thread tsliwowicz
Github user tsliwowicz commented on the pull request:

https://github.com/apache/spark/pull/2854#issuecomment-59922468
  
@andrewor14 - thanks, and sure - I will address your comments and open a PR
against master.
However, regarding your logging comments, it really isn't that much. It adds a
few lines of logging per run, which is insignificant, and it makes it much
easier to track registration and removal of block managers when diagnosing
issues in production. If you still think it's too much I will leave them out.





[GitHub] spark pull request: [SPARK-4006] In long running contexts, we enco...

2014-10-21 Thread tsliwowicz
GitHub user tsliwowicz opened a pull request:

https://github.com/apache/spark/pull/2886

[SPARK-4006] In long running contexts, we encountered the situation of
double register without a remove in between. The cause for that is unknown,
and assumed a temp network issue.

However, since the second register is with a BlockManagerId on a different 
port, blockManagerInfo.contains() returns false, while blockManagerIdByExecutor 
returns Some. This inconsistency is caught in a conditional statement that does 
System.exit(1), which is a huge robustness issue for us.

The fix - simply remove the old id from both maps during register when this 
happens. We are mimicking the behavior of expireDeadHosts(), by doing local 
cleanup of the maps before trying to add new ones.

Also - added some logging for register and unregister.

This is just like https://github.com/apache/spark/pull/2854 except it's on 
master

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/taboola/spark master-block-mgr-removal

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/2886.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2886


commit f48bce9cc25fa2672ea36bd90e64854159de8ead
Author: Tal Sliwowicz ta...@taboola.com
Date:   2014-10-21T14:29:39Z

In long running contexts, we encountered the situation of double register 
without a remove in between. The cause for that is unknown, and assumed a temp 
network issue.

However, since the second register is with a BlockManagerId on a 
different port, blockManagerInfo.contains() returns false, while 
blockManagerIdByExecutor returns Some. This inconsistency is caught in a 
conditional statement that does System.exit(1), which is a huge robustness 
issue for us.

The fix - simply remove the old id from both maps during register when 
this happens. We are mimicking the behavior of expireDeadHosts(), by doing 
local cleanup of the maps before trying to add new ones.

Also - added some logging for register and unregister.







[GitHub] spark pull request: [SPARK-4006] Block Manager - Double Register C...

2014-10-21 Thread tsliwowicz
Github user tsliwowicz commented on the pull request:

https://github.com/apache/spark/pull/2854#issuecomment-60017882
  
Created another pull request - https://github.com/apache/spark/pull/2886 - 
this time on master and also fixed the comments above.





[GitHub] spark pull request: Block Manager - Double Register Crash

2014-10-20 Thread tsliwowicz
GitHub user tsliwowicz opened a pull request:

https://github.com/apache/spark/pull/2854

Block Manager - Double Register Crash

   In long running contexts, we encountered the situation of double 
register without a remove in between. The cause for that is unknown, and 
assumed a temp network issue.

However, since the second register is with a BlockManagerId on a 
different port, blockManagerInfo.contains() returns false, while 
blockManagerIdByExecutor returns Some. This inconsistency is caught in a 
conditional statement that does System.exit(1), which is a huge robustness 
issue for us.

The fix - simply remove the old id from both maps during register when 
this happens. We are mimicking the behavior of expireDeadHosts(), by doing 
local cleanup of the maps before trying to add new ones.

Also - added some logging for register and unregister.

https://issues.apache.org/jira/browse/SPARK-4006



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/taboola/spark branch-0.9.2-block-mgr-removal

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/2854.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2854


commit efd93f2026ddc427e84fa03e8a595ded2b1a81ce
Author: Tal Sliwowicz ta...@taboola.com
Date:   2014-10-12T08:35:20Z

In long running contexts, we encountered the situation of double register 
without a remove in between. The cause for that is unknown, and assumed a temp 
network issue.

However, since the second register is with a BlockManagerId on a different 
port, blockManagerInfo.contains() returns false, while blockManagerIdByExecutor 
returns Some. This inconsistency is caught in a conditional statement that does 
System.exit(1), which is a huge robustness issue for us.

The fix - simply remove the old id from both maps during register when this 
happens. We are mimicking the behavior of expireDeadHosts(), by doing local 
cleanup of the maps before trying to add new ones.

Also - added some logging for register and unregister.

commit 81d69f088e421b19e47495d06e8b187a0ec29075
Author: Tal Sliwowicz ta...@taboola.com
Date:   2014-10-12T08:41:53Z

fixed comment







[GitHub] spark pull request: mesos executor ids now consist of the slave id...

2014-10-20 Thread tsliwowicz
Github user tsliwowicz commented on the pull request:

https://github.com/apache/spark/pull/1358#issuecomment-59724452
  
@mateiz - @KashiErez and I went a different route. The killer issue was
that there is a System.exit(1) in BlockManagerMasterActor, which was a huge
robustness issue for us. @taboola we are running some pretty large clusters
(processing many terabytes of data per day) which do real-time calculations and
are mission critical. So we fixed the issue, and it has been running
successfully in our production for a while now.

I opened a new ticket - https://issues.apache.org/jira/browse/SPARK-4006
And a pull request - https://github.com/apache/spark/pull/2854

What do you think about our fix? 


