[GitHub] spark issue #21658: [SPARK-24678][Spark-Streaming] Give priority in use of '...

2018-07-10 Thread sharkdtu
Github user sharkdtu commented on the issue: https://github.com/apache/spark/pull/21658 @jerryshao Yeah, I have verified it in our cluster, and the locality is 'PROCESS_LOCAL'. --- - To unsubscribe, e-mail: reviews

[GitHub] spark pull request #21658: [SPARK-24678][Spark-Streaming] Give priority in u...

2018-07-05 Thread sharkdtu
Github user sharkdtu commented on a diff in the pull request: https://github.com/apache/spark/pull/21658#discussion_r200310184 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala --- @@ -1569,7 +1569,7 @@ private[spark] object BlockManager

[GitHub] spark pull request #21658: [SPARK-24678][Spark-Streaming] Give priority in u...

2018-06-28 Thread sharkdtu
GitHub user sharkdtu opened a pull request: https://github.com/apache/spark/pull/21658 [SPARK-24678][Spark-Streaming] Give priority in use of 'PROCESS_LOCAL' for spark-streaming ## What changes were proposed in this pull request? Currently, `BlockRDD.getPreferredLocations

[GitHub] spark issue #20078: [SPARK-22900] [Spark-Streaming] Remove unnecessary restr...

2018-01-03 Thread sharkdtu
Github user sharkdtu commented on the issue: https://github.com/apache/spark/pull/20078 @felixcheung Have you ever thought about the initial num-executors? Actually, the default is 2 executors when you run Spark on YARN. How can you make sure that these 2 executors have enough cores

[GitHub] spark issue #20078: [SPARK-22900] [Spark-Streaming] Remove unnecessary restr...

2018-01-03 Thread sharkdtu
Github user sharkdtu commented on the issue: https://github.com/apache/spark/pull/20078 @jerryshao If this PR can fix bugs as you said, why not fix it? Otherwise, it should be marked as deprecated

[GitHub] spark issue #20078: [SPARK-22900] [Spark-Streaming] Remove unnecessary restr...

2018-01-01 Thread sharkdtu
Github user sharkdtu commented on the issue: https://github.com/apache/spark/pull/20078 @felixcheung If you submit Spark on YARN with `spark.streaming.dynamicAllocation.enabled=true`, `num-executors` cannot be set. So, at the beginning, there are only 2 (default value

[GitHub] spark issue #20078: [SPARK-22900] [Spark-Streaming] Remove unnecessary restr...

2018-01-01 Thread sharkdtu
Github user sharkdtu commented on the issue: https://github.com/apache/spark/pull/20078 @felixcheung At the beginning, if numReceivers > totalExecutorCores, there are no CPU cores for batch processing, and `ExecutorAllocationManager` can't listen to metrics of any batc

[GitHub] spark pull request #20078: [SPARK-22900] [Spark-Streaming] Remove unnecessar...

2017-12-25 Thread sharkdtu
GitHub user sharkdtu opened a pull request: https://github.com/apache/spark/pull/20078 [SPARK-22900] [Spark-Streaming] Remove unnecessary restrict for streaming dynamic allocation ## What changes were proposed in this pull request? When I set the conf

[GitHub] spark pull request #18352: [SPARK-21138] [YARN] Cannot delete staging dir wh...

2017-06-19 Thread sharkdtu
GitHub user sharkdtu opened a pull request: https://github.com/apache/spark/pull/18352 [SPARK-21138] [YARN] Cannot delete staging dir when the clusters of "spark.yarn.stagingDir" and "spark.hadoop.fs.defaultFS" are different ## What changes were proposed

[GitHub] spark pull request #17963: [SPARK-20722][CORE] Replay newer event log that h...

2017-06-19 Thread sharkdtu
Github user sharkdtu closed the pull request at: https://github.com/apache/spark/pull/17963 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature

[GitHub] spark issue #17963: [SPARK-20722][CORE] Replay newer event log that hasn't b...

2017-05-15 Thread sharkdtu
Github user sharkdtu commented on the issue: https://github.com/apache/spark/pull/17963 @ajbozarth Yes, this case is a big issue in my production cluster, which runs nearly 20,000 applications every day.

[GitHub] spark issue #17963: [SPARK-20722][CORE] Replay newer event log that hasn't b...

2017-05-15 Thread sharkdtu
Github user sharkdtu commented on the issue: https://github.com/apache/spark/pull/17963 @jerryshao Thanks, I agree. This PR may be a temporary fix before SPARK-18085

[GitHub] spark issue #17963: [SPARK-20722][CORE] Replay newer event log that hasn't b...

2017-05-15 Thread sharkdtu
Github user sharkdtu commented on the issue: https://github.com/apache/spark/pull/17963 @jerryshao The event log file will not be processed twice; you can review `FsHistoryProvider.checkForLogs` and `FsHistoryProvider.mergeApplicationListing`. In the next checking period, it will check

[GitHub] spark issue #17963: [SPARK-20722][CORE] Replay newer event log that hasn't b...

2017-05-13 Thread sharkdtu
Github user sharkdtu commented on the issue: https://github.com/apache/spark/pull/17963 cc @srowen @ajbozarth

[GitHub] spark pull request #16912: [SPARK-19576] [Core] Task attempt paths exist in ...

2017-05-13 Thread sharkdtu
Github user sharkdtu closed the pull request at: https://github.com/apache/spark/pull/16912

[GitHub] spark pull request #17963: [SPARK-20722][Core][History Server] Replay newer ...

2017-05-12 Thread sharkdtu
GitHub user sharkdtu opened a pull request: https://github.com/apache/spark/pull/17963 [SPARK-20722][Core][History Server] Replay newer event log that hasn't be replayed in advance for request ## What changes were proposed in this pull request? History server may replay

[GitHub] spark pull request #16912: [SPARK-19576] [Core] Task attempt paths exist in ...

2017-02-13 Thread sharkdtu
GitHub user sharkdtu opened a pull request: https://github.com/apache/spark/pull/16912 [SPARK-19576] [Core] Task attempt paths exist in output path after saveAsNewAPIHadoopFile completes with speculation enabled `writeShard` in `saveAsNewAPIHadoopDataset` always committed its tasks

[GitHub] spark pull request #16911: [SPARK-19576] [Core] Task attempt paths exist in ...

2017-02-13 Thread sharkdtu
Github user sharkdtu closed the pull request at: https://github.com/apache/spark/pull/16911

[GitHub] spark pull request #16911: [SPARK-19576] [Core] Task attempt paths exist in ...

2017-02-13 Thread sharkdtu
GitHub user sharkdtu opened a pull request: https://github.com/apache/spark/pull/16911 [SPARK-19576] [Core] Task attempt paths exist in output path after saveAsNewAPIHadoopFile completes with speculation enabled `writeShard` in `saveAsNewAPIHadoopDataset` always committed its tasks

[GitHub] spark pull request #16651: [SPARK-19298][Core] History server can't match Ma...

2017-02-13 Thread sharkdtu
Github user sharkdtu closed the pull request at: https://github.com/apache/spark/pull/16651

[GitHub] spark issue #16651: [SPARK-19298][Core] History server can't match Malformed...

2017-01-20 Thread sharkdtu
Github user sharkdtu commented on the issue: https://github.com/apache/spark/pull/16651 @srowen I think the logs were just for `MalformedInputException`; it doesn't matter that non-IOExceptions will be rethrown, because they will be caught by upper callers.

[GitHub] spark pull request #16651: [SPARK-19298][Core] History server can't match Ma...

2017-01-20 Thread sharkdtu
Github user sharkdtu commented on a diff in the pull request: https://github.com/apache/spark/pull/16651#discussion_r97037524 --- Diff: core/src/main/scala/org/apache/spark/scheduler/ReplayListenerBus.scala --- @@ -107,11 +107,11 @@ private[spark] class ReplayListenerBus extends

[GitHub] spark pull request #16651: [SPARK-19298][Core] History server can't match Ma...

2017-01-20 Thread sharkdtu
Github user sharkdtu commented on a diff in the pull request: https://github.com/apache/spark/pull/16651#discussion_r97034201 --- Diff: core/src/main/scala/org/apache/spark/scheduler/ReplayListenerBus.scala --- @@ -107,11 +107,11 @@ private[spark] class ReplayListenerBus extends

[GitHub] spark pull request #16651: [SPARK-19298][Core] History server can't match Ma...

2017-01-19 Thread sharkdtu
GitHub user sharkdtu opened a pull request: https://github.com/apache/spark/pull/16651 [SPARK-19298][Core] History server can't match MalformedInputException and prompt the detail logs while repalying eventlog History server can't match MalformedInputException and prompt the detail

[GitHub] spark pull request #14479: [SPARK-16873] [Core] Fix SpillReader NPE when spi...

2016-08-03 Thread sharkdtu
GitHub user sharkdtu opened a pull request: https://github.com/apache/spark/pull/14479 [SPARK-16873] [Core] Fix SpillReader NPE when spillFile has no data ## What changes were proposed in this pull request? SpillReader throws an NPE when spillFile has no data. See the following logs

[GitHub] spark pull request #14166: [MINOR][YARN] Fix code error in yarn-cluster unit...

2016-07-12 Thread sharkdtu
GitHub user sharkdtu opened a pull request: https://github.com/apache/spark/pull/14166 [MINOR][YARN] Fix code error in yarn-cluster unit test ## What changes were proposed in this pull request? Fix code error in yarn-cluster unit test. ## How was this patch

[GitHub] spark pull request #14088: [SPARK-16414] [YARN] Fix bugs for "Can not get us...

2016-07-11 Thread sharkdtu
Github user sharkdtu commented on a diff in the pull request: https://github.com/apache/spark/pull/14088#discussion_r70362297 --- Diff: yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala --- @@ -274,6 +288,37 @@ private object YarnClusterDriverWithFailure

[GitHub] spark pull request #14088: [SPARK-16414] [YARN] Fix bugs for "Can not get us...

2016-07-08 Thread sharkdtu
Github user sharkdtu commented on a diff in the pull request: https://github.com/apache/spark/pull/14088#discussion_r70076189 --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala --- @@ -743,6 +735,14 @@ object ApplicationMaster extends Logging

[GitHub] spark issue #14088: [SPARK-16414] [YARN] Fix bugs for "Can not get user conf...

2016-07-07 Thread sharkdtu
Github user sharkdtu commented on the issue: https://github.com/apache/spark/pull/14088 @tgravescs Fixed the description and style.

[GitHub] spark pull request #14088: Fix bugs for "Can not get user config when callin...

2016-07-07 Thread sharkdtu
GitHub user sharkdtu opened a pull request: https://github.com/apache/spark/pull/14088 Fix bugs for "Can not get user config when calling SparkHadoopUtil.get.conf in other places" ## What changes were proposed in this pull request? Fix bugs for "Can not

[GitHub] spark pull request: [Core] Remove unnecessary calculation of stage...

2016-05-15 Thread sharkdtu
GitHub user sharkdtu opened a pull request: https://github.com/apache/spark/pull/13123 [Core] Remove unnecessary calculation of stage's parents ## What changes were proposed in this pull request? Remove unnecessary calculation of stage's parents, because stage's parents