[GitHub] spark pull request #21527: [SPARK-24519] Make the threshold for highly compr...

2018-09-18 Thread hthuynh2
Github user hthuynh2 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21527#discussion_r218656012
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala ---
@@ -50,7 +50,9 @@ private[spark] sealed trait MapStatus {
 private[spark] object MapStatus {
 
   def apply(loc: BlockManagerId, uncompressedSizes: Array[Long]): MapStatus = {
-    if (uncompressedSizes.length > 2000) {
+    if (uncompressedSizes.length > Option(SparkEnv.get)
--- End diff --

How about creating a "static" val shuffleMinNumPartsToHighlyCompress for 
this? Please let me know if this is good for you so I can update it. Thanks.
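
For illustration, a minimal sketch of what that "static" val could look like (the fallback default and surrounding structure are assumptions based on this thread, not the merged code):

```scala
private[spark] object MapStatus {

  // Hypothetical: read the threshold once per JVM, falling back to the
  // default of 2000 when SparkEnv is not initialized (e.g. in tests).
  private lazy val shuffleMinNumPartsToHighlyCompress: Int =
    Option(SparkEnv.get)
      .map(_.conf.get(config.SHUFFLE_MIN_NUM_PARTS_TO_HIGHLY_COMPRESS))
      .getOrElse(2000)

  def apply(loc: BlockManagerId, uncompressedSizes: Array[Long]): MapStatus = {
    if (uncompressedSizes.length > shuffleMinNumPartsToHighlyCompress) {
      HighlyCompressedMapStatus(loc, uncompressedSizes)
    } else {
      new CompressedMapStatus(loc, uncompressedSizes)
    }
  }
}
```

One caveat of a lazy val: the threshold is then fixed for the lifetime of the JVM, which matters for tests that change the conf after the first shuffle.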


---




[GitHub] spark issue #21953: [SPARK-24992][Core] spark should randomize yarn local di...

2018-08-02 Thread hthuynh2
Github user hthuynh2 commented on the issue:

https://github.com/apache/spark/pull/21953
  
@tgravescs I updated it. Thanks.


---




[GitHub] spark issue #21936: [SPARK-24981][Core] ShutdownHook timeout causes job to f...

2018-08-02 Thread hthuynh2
Github user hthuynh2 commented on the issue:

https://github.com/apache/spark/pull/21936
  
@tgravescs I updated. Thanks.



---




[GitHub] spark issue #21953: [SPARK-24992][Core] spark should randomize yarn local di...

2018-08-01 Thread hthuynh2
Github user hthuynh2 commented on the issue:

https://github.com/apache/spark/pull/21953
  
@tgravescs Can you test this please? Thank you.


---




[GitHub] spark pull request #21953: [SPARK-24992][Core] spark should randomize yarn l...

2018-08-01 Thread hthuynh2
GitHub user hthuynh2 opened a pull request:

https://github.com/apache/spark/pull/21953

[SPARK-24992][Core] spark should randomize yarn local dir selection

**Description: 
[SPARK-24992](https://issues.apache.org/jira/browse/SPARK-24992)**
Utils.getLocalDir is used to get the path of a temporary directory. However, it always returns the same directory: the first element of the array localRootDirs. When running on YARN, this means we may always write to one disk, keeping it busy while the other disks sit idle. We should randomize the selection to spread out the load.

**What changes were proposed in this pull request?**
This PR randomizes the selection of the local directory inside Utils.getLocalDir. The change affects Utils.fetchFile, which relied on Utils.getLocalDir always returning the same directory for its file cache; therefore a new variable, cachedLocalDir, caches the first local directory obtained from Utils.getLocalDir. Also, when getting the configured local directories (inside Utils.getConfiguredLocalDirs) in YARN mode, the array of directories is shuffled before being returned, as sketched below.
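
A minimal sketch of the randomized selection described above, rendered as a standalone object (the real change lives in org.apache.spark.util.Utils; the getConfiguredLocalDirs stand-in here is an assumption):

```scala
import java.io.IOException
import scala.util.Random

import org.apache.spark.SparkConf

object LocalDirSelection {
  // Stand-in for Utils.getConfiguredLocalDirs: on YARN this would read the
  // container's local dirs, shuffled before being returned.
  private def getConfiguredLocalDirs(conf: SparkConf): Array[String] =
    Random.shuffle(conf.get("spark.local.dir", "/tmp").split(",").toSeq).toArray

  // Pick a random root instead of always returning the first element,
  // spreading the I/O load across all configured disks.
  def getLocalDir(conf: SparkConf): String = {
    val localDirs = getConfiguredLocalDirs(conf)
    if (localDirs.isEmpty) {
      throw new IOException("No usable local directories configured")
    }
    localDirs(Random.nextInt(localDirs.length))
  }

  // Utils.fetchFile relied on getLocalDir being stable, so the first
  // directory obtained is cached, as described above.
  @volatile private var cachedLocalDir: String = ""
  def getCachedLocalDir(conf: SparkConf): String = synchronized {
    if (cachedLocalDir.isEmpty) cachedLocalDir = getLocalDir(conf)
    cachedLocalDir
  }
}
```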

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/hthuynh2/spark SPARK_24992

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21953.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21953


commit 3986e75c3c000e7a7e7674be6837d663499f35f1
Author: Hieu Huynh <hieu.huynh@...>
Date:   2018-08-01T22:31:48Z

initial commit




---




[GitHub] spark pull request #21936: [SPARK-24981][Core] ShutdownHook timeout causes j...

2018-08-01 Thread hthuynh2
Github user hthuynh2 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21936#discussion_r206918869
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -571,7 +571,12 @@ class SparkContext(config: SparkConf) extends Logging {
 _shutdownHookRef = ShutdownHookManager.addShutdownHook(
   ShutdownHookManager.SPARK_CONTEXT_SHUTDOWN_PRIORITY) { () =>
   logInfo("Invoking stop() from shutdown hook")
-  stop()
+  try {
+    stop()
+  } catch {
+    case e: Throwable =>
+      logWarning("Ignoring Exception while stopping SparkContext. Exception: " + e)
--- End diff --

@felixcheung Thanks for the comments. I updated it. 


---




[GitHub] spark issue #21936: [SPARK-24981][Core] ShutdownHook timeout causes job to f...

2018-07-31 Thread hthuynh2
Github user hthuynh2 commented on the issue:

https://github.com/apache/spark/pull/21936
  
@tgravescs Can you please have a look at it? Thanks.


---




[GitHub] spark pull request #21936: [SPARK-24981][Core] ShutdownHook timeout causes j...

2018-07-31 Thread hthuynh2
GitHub user hthuynh2 opened a pull request:

https://github.com/apache/spark/pull/21936

[SPARK-24981][Core] ShutdownHook timeout causes job to fail when succeeded 
when SparkContext stop() not called by user program

**Description**
The issue is described in 
[SPARK-24981](https://issues.apache.org/jira/browse/SPARK-24981). 

**How does this PR fix the issue?**
This PR catches the exception that is thrown while SparkContext.stop() is running (when it is called by the ShutdownHookManager), as sketched below.
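
For context, the fix amounts to wrapping the stop() call inside the shutdown hook; a minimal sketch mirroring the diff quoted in this thread:

```scala
_shutdownHookRef = ShutdownHookManager.addShutdownHook(
  ShutdownHookManager.SPARK_CONTEXT_SHUTDOWN_PRIORITY) { () =>
  logInfo("Invoking stop() from shutdown hook")
  try {
    stop()
  } catch {
    // Swallow anything thrown while stopping (including the interrupt
    // raised when the hook times out) so a successful job is not
    // reported as failed.
    case e: Throwable =>
      logWarning("Ignoring Exception while stopping SparkContext. Exception: " + e)
  }
}
```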

**How was this patch tested?**
I manually tested it by adding a 60-second delay inside stop(). This makes the ShutdownHookManager interrupt the thread that is running stop(). The InterruptedException was caught and the job succeeded.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/hthuynh2/spark SPARK_24981

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21936.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21936


commit 101bf2b0488ab495dbdaf7b54d9a57d9827cb833
Author: Hieu Huynh <hieu.huynh@...>
Date:   2018-07-31T18:30:11Z

initial commit




---




[GitHub] spark issue #21653: [SPARK-13343] speculative tasks that didn't commit shoul...

2018-07-22 Thread hthuynh2
Github user hthuynh2 commented on the issue:

https://github.com/apache/spark/pull/21653
  
@tgravescs Can you please run the test again? Thank you.


---




[GitHub] spark issue #21653: [SPARK-13343] speculative tasks that didn't commit shoul...

2018-07-19 Thread hthuynh2
Github user hthuynh2 commented on the issue:

https://github.com/apache/spark/pull/21653
  
@tgravescs I updated it. Can you please have a look at it when you have time? Thank you.


---




[GitHub] spark pull request #21729: [SPARK-24755][Core] Executor loss can cause task ...

2018-07-10 Thread hthuynh2
Github user hthuynh2 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21729#discussion_r201412205
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
@@ -87,7 +87,7 @@ private[spark] class TaskSetManager(
   // Set the coresponding index of Boolean var when the task killed by 
other attempt tasks,
--- End diff --

I'll update it. Thanks.


---




[GitHub] spark pull request #21729: [SPARK-24755][Core] Executor loss can cause task ...

2018-07-10 Thread hthuynh2
Github user hthuynh2 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21729#discussion_r201411743
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
@@ -87,7 +87,7 @@ private[spark] class TaskSetManager(
   // Set the coresponding index of Boolean var when the task killed by 
other attempt tasks,
   // this happened while we set the `spark.speculation` to true. The task 
killed by others
   // should not resubmit while executor lost.
-  private val killedByOtherAttempt: Array[Boolean] = new Array[Boolean](numTasks)
+  private val killedByOtherAttempt = new HashSet[Long]
--- End diff --

I think we should use ArrayBuffer[Long] instead of Array[Long], because the number of elements grows as more attempts are killed.
Also, I think there is a downside to using an array-like data structure for this variable: lookup takes linear time, and that operation is used many times when we check whether a task needs to be resubmitted (inside the executorLost method of the TaskSetManager). This won't matter much if the array is small, but it is still something we might want to consider; see the comparison below.
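
For illustration, a self-contained comparison of the two lookups being discussed (taskId values are made up):

```scala
import scala.collection.mutable

val taskId = 42L

// As proposed in the diff: a set of the task IDs killed by another
// attempt. Membership checks in executorLost() are O(1) expected time.
val killedByOtherAttempt = new mutable.HashSet[Long]
killedByOtherAttempt += taskId
val fastLookup = killedByOtherAttempt.contains(taskId)   // hash lookup

// The ArrayBuffer alternative also grows as attempts are killed, but
// each contains() call scans the buffer linearly: O(n) per lookup.
val killedIds = mutable.ArrayBuffer[Long]()
killedIds += taskId
val slowLookup = killedIds.contains(taskId)              // linear scan
```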


---




[GitHub] spark pull request #21729: [SPARK-24755][Core] Executor loss can cause task ...

2018-07-10 Thread hthuynh2
Github user hthuynh2 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21729#discussion_r201372921
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
@@ -87,7 +87,7 @@ private[spark] class TaskSetManager(
   // Set the coresponding index of Boolean var when the task killed by 
other attempt tasks,
   // this happened while we set the `spark.speculation` to true. The task 
killed by others
   // should not resubmit while executor lost.
-  private val killedByOtherAttempt: Array[Boolean] = new Array[Boolean](numTasks)
+  private val killedByOtherAttempt = new HashSet[Long]
--- End diff --

Also, the comment "Set the corresponding index of Boolean var when the task 
killed ..." is not correct anymore. I'm sorry I forgot to update it. 


---




[GitHub] spark pull request #21729: [SPARK-24755][Core] Executor loss can cause task ...

2018-07-10 Thread hthuynh2
Github user hthuynh2 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21729#discussion_r201371752
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
@@ -87,7 +87,7 @@ private[spark] class TaskSetManager(
   // Set the coresponding index of Boolean var when the task killed by 
other attempt tasks,
   // this happened while we set the `spark.speculation` to true. The task 
killed by others
   // should not resubmit while executor lost.
-  private val killedByOtherAttempt: Array[Boolean] = new Array[Boolean](numTasks)
+  private val killedByOtherAttempt = new HashSet[Long]
--- End diff --

Hi @jiangxb1987, thanks for the comment, but I'm not sure I understand your suggestion correctly. Do you mean: private val killedByOtherAttempt = new Array[Long]?



---




[GitHub] spark issue #21729: [SPARK-24755][Core] Executor loss can cause task to not ...

2018-07-09 Thread hthuynh2
Github user hthuynh2 commented on the issue:

https://github.com/apache/spark/pull/21729
  
@xuanyuanking Thanks for the comments. I also thought about modifying the UT of SPARK-22074 instead of adding a new UT, but I was afraid it might cause confusion, since they are two different issues even though they are closely related. If you feel it is better to combine them, I can change it. Thanks.


---




[GitHub] spark issue #21729: SPARK-24755 Executor loss can cause task to not be resub...

2018-07-08 Thread hthuynh2
Github user hthuynh2 commented on the issue:

https://github.com/apache/spark/pull/21729
  
 cc @mridulm @xuanyuanking 


---




[GitHub] spark pull request #21729: SPARK-24755 Executor loss can cause task to not b...

2018-07-08 Thread hthuynh2
GitHub user hthuynh2 opened a pull request:

https://github.com/apache/spark/pull/21729

SPARK-24755 Executor loss can cause task to not be resubmitted

**Description**
As described in [SPARK-24755](https://issues.apache.org/jira/browse/SPARK-24755), when speculation is enabled, there is a scenario in which an executor loss can cause a task to not be resubmitted.
This patch changes the variable killedByOtherAttempt to keep track of the taskIds of tasks that were killed by another attempt. That way we still avoid resubmitting tasks killed by another attempt, while correctly resubmitting the successful attempts when an executor is lost; see the sketch below.
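
For illustration, a hedged sketch of how executorLost can consult the set when deciding what to resubmit (the surrounding field and helper names are assumptions based on this thread, not the merged diff):

```scala
// Inside TaskSetManager.executorLost (sketch): resubmit completed
// shuffle-map tasks that lived on the lost executor, unless they were
// tasks we killed because another attempt had already succeeded.
for ((tid, info) <- taskInfos if info.executorId == execId) {
  val index = info.index
  if (successful(index) && !killedByOtherAttempt.contains(tid)) {
    successful(index) = false
    addPendingTask(index)   // schedule the task to run again
  }
}
```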

**How was this patch tested?**
A UT is added, based on the UT written by @xuanyuanking, with modifications to simulate the scenario described in SPARK-24755.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/hthuynh2/spark SPARK_24755

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21729.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21729


commit 093e39cf76378821284ef7d771e819afb69930ae
Author: Hieu Huynh <hieu.huynh@...>
Date:   2018-07-08T18:20:26Z

SPARK-24755 Executor loss can cause task to not be resubmitted




---




[GitHub] spark issue #21653: [SPARK-13343] speculative tasks that didn't commit shoul...

2018-07-06 Thread hthuynh2
Github user hthuynh2 commented on the issue:

https://github.com/apache/spark/pull/21653
  
@squito Thanks for the suggestions. I updated it. Could you please have a 
look at it to see if there is anything else I need to change? Thanks.


---




[GitHub] spark issue #21653: [SPARK-13343] speculative tasks that didn't commit shoul...

2018-07-03 Thread hthuynh2
Github user hthuynh2 commented on the issue:

https://github.com/apache/spark/pull/21653
  
@jiangxb1987 yes, you are correct that it is actually ignored. I don't think it's worth adding a new TaskState, because that would require changes in many places while adding little benefit. Instead, I think we can add a message to the kill reason to differentiate it from a task that was actually killed, and to inform the user.


---




[GitHub] spark issue #21653: [SPARK-13343] speculative tasks that didn't commit shoul...

2018-07-02 Thread hthuynh2
Github user hthuynh2 commented on the issue:

https://github.com/apache/spark/pull/21653
  
I updated it. Thanks.


---




[GitHub] spark issue #21653: [SPARK-13343] speculative tasks that didn't commit shoul...

2018-06-27 Thread hthuynh2
Github user hthuynh2 commented on the issue:

https://github.com/apache/spark/pull/21653
  
cc @tgravescs 


---




[GitHub] spark pull request #21653: [SPARK-13343] speculative tasks that didn't commi...

2018-06-27 Thread hthuynh2
GitHub user hthuynh2 opened a pull request:

https://github.com/apache/spark/pull/21653

[SPARK-13343] speculative tasks that didn't commit shouldn't be marked as 
success

**Description**
Currently, speculative tasks that didn't commit can show up as successes or failures (depending on the timing of the commit). This is a bit confusing, because the task didn't really succeed in the sense that it didn't write anything.
I think these tasks should be marked as KILLED, or something that makes it more obvious to the user exactly what happened. If a task happens to hit the timing where it gets a commit-denied exception, it shows up as failed and counts against your task failures. It shouldn't count against task failures, since that failure really doesn't matter.
MapReduce handles these situations, so perhaps we can look there for a model.

(Screenshot: https://user-images.githubusercontent.com/15680678/42013170-99db48c2-7a61-11e8-8c7b-ef94c84e36ea.png)

**How can this issue happen?**
When both attempts of a task finish before the driver sends the command to kill one of them, both send the status update FINISHED to the driver. The driver calls TaskSchedulerImpl to handle one successful task at a time. When it handles the first successful task, it sends a command to kill the other copy of the task; however, because that task has already finished, the executor ignores the command. After handling the first attempt, the driver processes the second one, and although all actions on the result of that task are skipped, this copy of the task is still marked as SUCCESS. As a result, even though the issue does not affect the result of the job, it can confuse the user because both attempts appear to be successful.

**How does this PR fix the issue?**
The simple way to fix this issue is for the TaskSetManager, when handling a successful task, to check whether any other attempt has already succeeded. If so, it calls handleFailedTask with state == KILLED and reason == TaskKilled("another attempt succeeded") to treat this copy as having been killed; a sketch follows.
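
A hedged sketch of that check (field and helper names follow TaskSetManager's conventions but are assumptions, not the merged diff):

```scala
// Inside TaskSetManager (sketch; not the exact merged code).
def handleSuccessfulTask(tid: Long, result: DirectTaskResult[_]): Unit = {
  val info = taskInfos(tid)
  val index = info.index
  if (successful(index)) {
    // Another copy already succeeded and committed: record this attempt
    // as killed instead of reporting a second SUCCESS.
    logInfo(s"Ignoring task-finished event for task $tid because another " +
      "attempt already succeeded")
    handleFailedTask(tid, TaskState.KILLED,
      TaskKilled("another attempt succeeded"))
  } else {
    // ...existing success handling: mark successful(index), notify the
    // DAGScheduler, and ask the scheduler to kill other running attempts...
  }
}
```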

**How was this patch tested?**
I tested this manually by running applications that previously triggered the issue, a few times, and observed that the issue no longer occurs. Also, I added a unit test in TaskSetManagerSuite verifying that if handleSuccessfulTask is called to handle the status updates for two copies of a task, only the one handled first is marked as SUCCESS.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/hthuynh2/spark SPARK_13343

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21653.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21653


commit 8f7d98177816e11659cf79a2b28f96bd4b7173d5
Author: Hieu Huynh <hieu.huynh@...>
Date:   2018-06-28T04:19:14Z

Fixed issue and added unit test




---




[GitHub] spark issue #21527: [SPARK-24519] Make the threshold for highly compressed m...

2018-06-20 Thread hthuynh2
Github user hthuynh2 commented on the issue:

https://github.com/apache/spark/pull/21527
  
@vanzin @squito Thanks for the comments. I updated the PR; please have a look and let me know if anything needs to be changed. Thank you.


---




[GitHub] spark issue #21527: [SPARK-24519] MapStatus has 2000 hardcoded

2018-06-20 Thread hthuynh2
Github user hthuynh2 commented on the issue:

https://github.com/apache/spark/pull/21527
  
@tgravescs I updated it. Thanks.


---




[GitHub] spark issue #21527: [SPARK-24519] MapStatus has 2000 hardcoded

2018-06-19 Thread hthuynh2
Github user hthuynh2 commented on the issue:

https://github.com/apache/spark/pull/21527
  
I updated it. Thanks.


---




[GitHub] spark issue #21527: [SPARK-24519] MapStatus has 2000 hardcoded

2018-06-11 Thread hthuynh2
Github user hthuynh2 commented on the issue:

https://github.com/apache/spark/pull/21527
  
@tgravescs 


---




[GitHub] spark pull request #21527: Spark branch 1

2018-06-11 Thread hthuynh2
GitHub user hthuynh2 opened a pull request:

https://github.com/apache/spark/pull/21527

Spark branch 1

**Problem**
MapStatus uses a hardcoded value of 2000 partitions to determine whether it should use the highly compressed map status. We should make it configurable.

**What changes were proposed in this pull request?**
I made the hardcoded value mentioned above configurable under the name _SHUFFLE_MIN_NUM_PARTS_TO_HIGHLY_COMPRESS_, which defaults to 2000. Users can set it to the value they want via the property _spark.shuffle.minNumPartitionsToHighlyCompress_; a sketch of the declaration follows.
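
A hypothetical sketch of the config declaration, in the style of org.apache.spark.internal.config (the doc string and error message wording are assumptions):

```scala
private[spark] val SHUFFLE_MIN_NUM_PARTS_TO_HIGHLY_COMPRESS =
  ConfigBuilder("spark.shuffle.minNumPartitionsToHighlyCompress")
    .internal()
    .doc("Number of map output partitions above which MapStatus uses " +
      "HighlyCompressedMapStatus instead of CompressedMapStatus.")
    .intConf
    .checkValue(v => v > 0, "The value should be a positive integer.")
    .createWithDefault(2000)
```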

**How was this patch tested?**
I wrote a unit test to make sure that the default value is 2000 and that an _IllegalArgumentException_ is thrown if the user sets it to a non-positive value. The unit test also checks that the highly compressed map status is used when the number of partitions is greater than _SHUFFLE_MIN_NUM_PARTS_TO_HIGHLY_COMPRESS_.
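
A hedged sketch of the non-positive-value assertion (the test name and setup are assumptions; the real test would live alongside MapStatusSuite):

```scala
test("SPARK-24519: non-positive threshold should be rejected") {
  val conf = new SparkConf()
    .set("spark.shuffle.minNumPartitionsToHighlyCompress", "-1")
  // checkValue runs when the entry is read, so an invalid setting
  // surfaces as an IllegalArgumentException here.
  intercept[IllegalArgumentException] {
    conf.get(config.SHUFFLE_MIN_NUM_PARTS_TO_HIGHLY_COMPRESS)
  }
}
```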


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/hthuynh2/spark spark_branch_1

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21527.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21527


commit 93582bd1ce114368654ff896749c517d979ed23a
Author: Hieu Huynh <hieu.huynh@...>
Date:   2018-06-11T13:47:02Z

Change MapStatus hardcode value to configurable

commit d3f24b501c68f8ef22726d711a887268d02a9fc7
Author: Hieu Huynh <hieu.huynh@...>
Date:   2018-06-11T14:16:25Z

Fixed incorrect name




---
