[GitHub] spark pull request #16632: [SPARK-19273]shuffle stage should retry when fetc...
Github user viper-kun closed the pull request at: https://github.com/apache/spark/pull/16632 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16632: [SPARK-19273]shuffle stage should retry when fetch shuff...
Github user viper-kun commented on the issue: https://github.com/apache/spark/pull/16632 ok, close it.
[GitHub] spark issue #16632: [SPARK-19273]shuffle stage should retry when fetch shuff...
Github user viper-kun commented on the issue: https://github.com/apache/spark/pull/16632 @srowen I have not tested it against the master branch. I will do it later.
[GitHub] spark pull request #16632: [SPARK-19273]shuffle stage should retry when fetc...
GitHub user viper-kun opened a pull request: https://github.com/apache/spark/pull/16632 [SPARK-19273]shuffle stage should retry when fetch shuffle fail ## What changes were proposed in this pull request? The shuffle stage should retry when fetching shuffle data fails. ## How was this patch tested? See https://issues.apache.org/jira/browse/SPARK-19273 You can merge this pull request into a Git repository by running: $ git pull https://github.com/viper-kun/spark unretry Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16632.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16632 commit f8d411e4c7332967914b930c120e6629a1a17684 Author: x00228947 Date: 2017-01-18T09:38:08Z shuffle stage not retry
[GitHub] spark issue #9113: [SPARK-11100][SQL]HiveThriftServer HA issue,HiveThriftSer...
Github user viper-kun commented on the issue: https://github.com/apache/spark/pull/9113 @rxin Is there any design for the replacement?
[GitHub] spark issue #9113: [SPARK-11100][SQL]HiveThriftServer HA issue,HiveThriftSer...
Github user viper-kun commented on the issue: https://github.com/apache/spark/pull/9113 @xiaowangyu What I mean is: how is the active Thrift server chosen? At random, or by Thrift server load?
[GitHub] spark pull request: [SPARK-11100][SQL]HiveThriftServer HA issue,Hi...
Github user viper-kun commented on the pull request: https://github.com/apache/spark/pull/9113#issuecomment-222619116 @xiaowangyu Can you tell me how this works? The Thrift server easily goes down when there are too many Beeline clients; it would help if Beeline could choose a less-loaded Thrift server.
[GitHub] spark pull request: [SPARK-14065]Increase probability of using cac...
Github user viper-kun closed the pull request at: https://github.com/apache/spark/pull/11886
[GitHub] spark pull request: [SPARK-14065]Increase probability of using cac...
Github user viper-kun commented on the pull request: https://github.com/apache/spark/pull/11886#issuecomment-218140398 @tgravescs OK, I'll close it.
[GitHub] spark pull request: [SPARK-4105][CORE] regenerate the shuffle file...
Github user viper-kun commented on the pull request: https://github.com/apache/spark/pull/12700#issuecomment-214948007 @jerryshao @srowen We hit this problem in Spark 1.4, 1.5, and 1.6, and only know that the shuffle file is corrupted. We can reproduce the problem by modifying the shuffle file, but we don't know the root cause. Any ideas about this problem?
[GitHub] spark pull request: [SPARK-13112]CoarsedExecutorBackend register t...
Github user viper-kun commented on the pull request: https://github.com/apache/spark/pull/12078#issuecomment-204322877 I don't think so. The order is wrong: it registers with the driver first, and only then starts the new Executor.
[GitHub] spark pull request: [SPARK-14065]Increase probability of using cac...
Github user viper-kun commented on the pull request: https://github.com/apache/spark/pull/11886#issuecomment-204319818 @srowen The logic looks the same to me. If necessary, I will move serializeMapStatus into the match ... None branch.
[GitHub] spark pull request: [SPARK-13112]CoarsedExecutorBackend register t...
GitHub user viper-kun opened a pull request: https://github.com/apache/spark/pull/12078 [SPARK-13112]CoarsedExecutorBackend register to driver should wait Executor was ready ## What changes were proposed in this pull request? When CoarseGrainedExecutorBackend receives RegisterExecutorResponse late, after LaunchTask, the problem occurs. ## How was this patch tested? Reproduced when the executor host's I/O was busy. You can merge this pull request into a Git repository by running: $ git pull https://github.com/viper-kun/spark patch-3 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/12078.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #12078 commit 1b046304313c7663015667ab9cc8fe4201d17eb2 Author: xukun Date: 2016-03-31T02:39:40Z CoarsedExecutorBackend register to driver should wait Executor was ready
[GitHub] spark pull request: [SPARK-14065]Increase probability of using cac...
Github user viper-kun commented on a diff in the pull request: https://github.com/apache/spark/pull/11886#discussion_r57102821 --- Diff: core/src/main/scala/org/apache/spark/MapOutputTracker.scala ---
```
@@ -443,13 +443,12 @@ private[spark] class MapOutputTrackerMaster(conf: SparkConf)
         statuses = mapStatuses.getOrElse(shuffleId, Array[MapStatus]())
         epochGotten = epoch
       }
-    }
-    // If we got here, we failed to find the serialized locations in the cache, so we pulled
-    // out a snapshot of the locations as "statuses"; let's serialize and return that
-    val bytes = MapOutputTracker.serializeMapStatuses(statuses)
-    logInfo("Size of output statuses for shuffle %d is %d bytes".format(shuffleId, bytes.length))
-    // Add them into the table only if the epoch hasn't changed while we were working
-    epochLock.synchronized {
+
+      // If we got here, we failed to find the serialized locations in the cache, so we pulled
```
--- End diff -- The problem description: executing the query listed in the JIRA (https://issues.apache.org/jira/browse/SPARK-14065), all tasks running on some executors are slow. The slow executors' logs show that the GetMapOutputStatuses RPC fails with an RpcTimeoutException:
```
Error sending message [message = GetMapOutputStatuses(1)] in 1 attempts | org.apache.spark.Logging$class.logWarning(Logging.scala:92)
org.apache.spark.rpc.RpcTimeoutException: Futures timed out after [120 seconds]. This timeout is controlled by spark.network.timeout
```
The driver logs show that serializing the map statuses is slow:
```
16/03/22 11:47:07 INFO [dispatcher-event-loop-36] MapOutputTrackerMaster: Size of output statuses for shuffle 1 is 10298063 bytes
16/03/22 11:47:14 INFO [dispatcher-event-loop-30] MapOutputTrackerMaster: Size of output statuses for shuffle 1 is 10298063 bytes
16/03/22 11:47:21 INFO [dispatcher-event-loop-3] MapOutputTrackerMaster: Size of output statuses for shuffle 1 is 10298063 bytes
16/03/22 11:47:27 INFO [dispatcher-event-loop-32] MapOutputTrackerMaster: Size of output statuses for shuffle 1 is 10298063 bytes
16/03/22 11:47:34 INFO [dispatcher-event-loop-31] MapOutputTrackerMaster: Size of output statuses for shuffle 1 is 10298063 bytes
16/03/22 11:47:41 INFO [dispatcher-event-loop-38] MapOutputTrackerMaster: Size of output statuses for shuffle 1 is 10298063 bytes
16/03/22 11:47:47 INFO [dispatcher-event-loop-4] MapOutputTrackerMaster: Size of output statuses for shuffle 1 is 10298063 bytes
16/03/22 11:47:54 INFO [dispatcher-event-loop-37] MapOutputTrackerMaster: Size of output statuses for shuffle 1 is 10298063 bytes
16/03/22 11:48:00 INFO [dispatcher-event-loop-28] MapOutputTrackerMaster: Size of output statuses for shuffle 1 is 10298063 bytes
```
When the reduce tasks start, all executors fetch the map statuses from the driver. Because MapOutputTracker.serializeMapStatuses(statuses) sits outside epochLock.synchronized {}, every thread performs the serialization itself. Inside serializeMapStatuses there is a synchronization on statuses, so the serializations run one at a time. Each serialization takes about 7 seconds; with 80 executors that totals 560 seconds, so some executors time out fetching the map statuses. This patch moves MapOutputTracker.serializeMapStatuses(statuses) inside epochLock.synchronized {}, which increases the probability of using the cached serialized status. I tested the listed SQL on a 30 TB TPC-DS dataset; the results show it is faster than before.
![image](https://cloud.githubusercontent.com/assets/6460155/13974069/9f448cc4-f0e3-11e5-9a5d-bc3cd8300db1.png)
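The locking change described above can be sketched in Python (a hypothetical simplification, not Spark's actual Scala code; `MapOutputCache`, `get_serialized`, and `_serialize` are illustrative names): serializing inside the lock means the first caller populates the cache, and every later caller for the same epoch reuses the cached bytes instead of re-serializing.

```python
import threading

class MapOutputCache:
    """Sketch: cache serialized map statuses so only one thread pays the
    serialization cost; later requests for the same epoch reuse the bytes."""

    def __init__(self):
        self._lock = threading.Lock()
        self._epoch = 0
        self._cached = {}  # shuffle_id -> (epoch, serialized bytes)

    def _serialize(self, statuses):
        # Stand-in for the expensive serializeMapStatuses call.
        return repr(statuses).encode()

    def get_serialized(self, shuffle_id, statuses):
        with self._lock:
            hit = self._cached.get(shuffle_id)
            if hit is not None and hit[0] == self._epoch:
                return hit[1]                 # cache hit: reuse the bytes
            data = self._serialize(statuses)  # serialize while holding the lock
            self._cached[shuffle_id] = (self._epoch, data)
            return data

cache = MapOutputCache()
first = cache.get_serialized(1, ["status-a", "status-b"])
second = cache.get_serialized(1, ["status-a", "status-b"])
print(first is second)  # True: the second call returns the cached object
```

With the serialization outside the lock (as in the pre-patch code), each of the 80 executors' requests would run `_serialize` itself; inside the lock, only the first does.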
[GitHub] spark pull request: Increase probability of using cached serialize...
GitHub user viper-kun opened a pull request: https://github.com/apache/spark/pull/11886 Increase probability of using cached serialized status You can merge this pull request into a Git repository by running: $ git pull https://github.com/viper-kun/spark patch-2 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/11886.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #11886 commit 32ad0d57becd216f1ad9e835ee6e4db527b3ab20 Author: xukun Date: 2016-03-22T12:44:34Z Increase probability of using cached serialized status
[GitHub] spark pull request: [SPARK-13652][Core]Copy ByteBuffer in sendRpcS...
Github user viper-kun commented on the pull request: https://github.com/apache/spark/pull/11499#issuecomment-192824479 @zsxwing It is ok.
[GitHub] spark pull request: remove unnecessary copy
Github user viper-kun closed the pull request at: https://github.com/apache/spark/pull/10040
[GitHub] spark pull request: remove unnecessary copy
Github user viper-kun commented on the pull request: https://github.com/apache/spark/pull/10040#issuecomment-160824966 OK, closing it.
[GitHub] spark pull request: remove unnecessary copy
GitHub user viper-kun opened a pull request: https://github.com/apache/spark/pull/10040 remove unnecessary copy You can merge this pull request into a Git repository by running: $ git pull https://github.com/viper-kun/spark patch-1 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/10040.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #10040 commit 7f665bb5735923eccc219ea0b03b8f914230c474 Author: xukun Date: 2015-11-30T12:45:50Z remove unnecessary copy
[GitHub] spark pull request: [SPARK-11225]Prevent generate empty file
Github user viper-kun closed the pull request at: https://github.com/apache/spark/pull/9191
[GitHub] spark pull request: [SPARK-11225]Prevent generate empty file
Github user viper-kun commented on the pull request: https://github.com/apache/spark/pull/9191#issuecomment-154050289 OK, closing it.
[GitHub] spark pull request: [SPARK-11225]Prevent generate empty file
Github user viper-kun commented on the pull request: https://github.com/apache/spark/pull/9191#issuecomment-151385025 @JoshRosen Is this OK? If it doesn't work, I will close this PR.
[GitHub] spark pull request: [SPARK-11225]Prevent generate empty file
Github user viper-kun commented on the pull request: https://github.com/apache/spark/pull/9191#issuecomment-150747910 @davies This test is flaky; please re-test it.
[GitHub] spark pull request: [SPARK-11225]Prevent generate empty file
Github user viper-kun commented on the pull request: https://github.com/apache/spark/pull/9191#issuecomment-150176502 Hi @davies, sorry, I do not understand Python. Can you help me fix it?
[GitHub] spark pull request: [SPARK-11225]Prevent generate empty file
Github user viper-kun commented on the pull request: https://github.com/apache/spark/pull/9191#issuecomment-150076597 @JoshRosen In my test environment, this error does not occur. Please retest it. Thanks!
[GitHub] spark pull request: [SPARK-11225]Prevent generate empty file
Github user viper-kun commented on the pull request: https://github.com/apache/spark/pull/9191#issuecomment-149825404 Thanks @JoshRosen. Sorry, I haven't run a performance test. As far as I know, it reduces the number of open files; when there are many empty files, it yields some benefit.
[GitHub] spark pull request: [SPARK-11225]Prevent generate empty file
GitHub user viper-kun opened a pull request: https://github.com/apache/spark/pull/9191 [SPARK-11225]Prevent generate empty file If no data is written into a bucket, an empty file is generated. So open() should be called only on the first write(key, value). You can merge this pull request into a Git repository by running: $ git pull https://github.com/viper-kun/spark apache Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/9191.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #9191 commit 9887ef1dbcd39ab796c77239d5bba0da5e9ba3ab Author: x00228947 Date: 2015-10-21T02:30:01Z remove invalid code
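The lazy-open idea above can be sketched as a writer that defers opening the underlying file until the first write (a hypothetical Python analogue, not Spark's Scala block writer; `LazyBlockWriter` and its methods are illustrative):

```python
import os
import tempfile

class LazyBlockWriter:
    """Sketch: open the output file only on the first write, so buckets
    that receive no records never leave an empty file on disk."""

    def __init__(self, path):
        self.path = path
        self._file = None

    def write(self, key, value):
        if self._file is None:            # first write: open lazily
            self._file = open(self.path, "w")
        self._file.write(f"{key}\t{value}\n")

    def close(self):
        if self._file is not None:
            self._file.close()

tmp = tempfile.mkdtemp()
empty = LazyBlockWriter(os.path.join(tmp, "bucket-0"))
used = LazyBlockWriter(os.path.join(tmp, "bucket-1"))
used.write("k", "v")
empty.close()
used.close()
print(os.path.exists(empty.path), os.path.exists(used.path))  # False True
```

An eager writer would call `open()` in the constructor and leave `bucket-0` behind as a zero-byte file; the lazy version never touches the filesystem for that bucket.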
[GitHub] spark pull request: correct buffer size
Github user viper-kun commented on the pull request: https://github.com/apache/spark/pull/8189#issuecomment-130988766 @liancheng @scwf is it OK?
[GitHub] spark pull request: correct buffer size
GitHub user viper-kun opened a pull request: https://github.com/apache/spark/pull/8189 correct buffer size No need to multiply by columnType.defaultSize here; we have already done that in the ColumnBuilder class: buffer = ByteBuffer.allocate(4 + size * columnType.defaultSize) You can merge this pull request into a Git repository by running: $ git pull https://github.com/viper-kun/spark errorSize Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/8189.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #8189 commit 6741f239052c72663b55b3d398db3b26d93ae2e3 Author: x00228947 Date: 2015-08-14T04:25:07Z recorrect buffer size
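The over-allocation is easy to see with concrete numbers (an illustrative sketch with made-up values, not the actual Spark code): if the buffer is already sized as `4 + size * defaultSize`, multiplying by `defaultSize` a second time inflates the allocation by a factor of `defaultSize`.

```python
def correct_buffer_size(size, default_size):
    # What ColumnBuilder already computes: a 4-byte header plus
    # `size` elements of `default_size` bytes each.
    return 4 + size * default_size

def buggy_buffer_size(size, default_size):
    # Multiplying by default_size again over-allocates roughly
    # default_size times too much memory.
    return 4 + (size * default_size) * default_size

size, default_size = 1000, 8   # e.g. 1000 LONG values of 8 bytes each
print(correct_buffer_size(size, default_size))  # 8004
print(buggy_buffer_size(size, default_size))    # 64004
```

For an 8-byte column type the redundant multiplication asks for roughly 8x the memory actually needed, which is why the patch removes it.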
[GitHub] spark pull request: [Spark-6924]Fix driver hangs in yarn-client mo...
Github user viper-kun closed the pull request at: https://github.com/apache/spark/pull/5523
[GitHub] spark pull request: [Spark-6924]Fix driver hangs in yarn-client mo...
Github user viper-kun commented on the pull request: https://github.com/apache/spark/pull/5523#issuecomment-95764614 ok. I will close it.
[GitHub] spark pull request: [SPARK-6521][Core]executors in the same node r...
Github user viper-kun commented on the pull request: https://github.com/apache/spark/pull/5178#issuecomment-94250246 @maropu I will update it.
[GitHub] spark pull request: [SPARK-6521][Core]executors in the same node r...
Github user viper-kun commented on a diff in the pull request: https://github.com/apache/spark/pull/5178#discussion_r28650737 --- Diff: core/src/main/scala/org/apache/spark/shuffle/FileShuffleBlockManager.scala ---
```
@@ -180,7 +180,8 @@ class FileShuffleBlockManager(conf: SparkConf)
     Some(segment.nioByteBuffer())
   }
-  override def getBlockData(blockId: ShuffleBlockId): ManagedBuffer = {
+  override def getBlockData(blockId: ShuffleBlockId,
+      blockManagerId: BlockManagerId = blockManager.blockManagerId): ManagedBuffer = {
```
--- End diff -- It does not support HashShuffle with consolidateShuffleFiles. To avoid confusion, it only supports SortShuffleManager.
[GitHub] spark pull request: [Spark-6924]Fix driver hangs in yarn-client mo...
Github user viper-kun commented on the pull request: https://github.com/apache/spark/pull/5523#issuecomment-93728497 During construction, it is normal not to hear from executors; only after construction have executors connected and started sending heartbeats to the driver, so only then can we tell whether there is a disconnection. SparkContext.isInited shows whether SparkContext construction has completed. >>> I don't think you can call stop() just because you didn't hear from executors recently. Is there any better way?
[GitHub] spark pull request: [Spark-6924]Fix driver hangs in yarn-client mo...
Github user viper-kun commented on a diff in the pull request: https://github.com/apache/spark/pull/5523#discussion_r28502410 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
```
@@ -75,6 +75,8 @@ import org.apache.spark.util._
  */
 class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationClient {
+  var isInited: Boolean = false
```
--- End diff -- >>> I also don't think that a lack of executor messages indicates a disconnection; it's not possible to distinguish from temporary loss of connectivity this way. All executors send heartbeats to the driver at a fixed rate. If, over a period of time, all executors have expired, I think there is a disconnection. Is there a better way to distinguish this from a temporary loss of connectivity? Could we check several times, and declare a disconnection only if all executors are still expired?
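The suggestion of re-checking a few times before treating expired executors as a real disconnection could be sketched like this (hypothetical Python, not Spark's actual HeartbeatReceiver API; the class name, `confirmations` threshold, and method names are illustrative). It also declares no disconnection while the executor map is empty, which sidesteps the empty-executorLastSeen case discussed in this thread:

```python
class HeartbeatMonitor:
    """Sketch: declare a disconnection only after all executors have been
    expired for several consecutive checks, tolerating temporary blips."""

    def __init__(self, timeout_s=120.0, confirmations=3):
        self.timeout_s = timeout_s
        self.confirmations = confirmations
        self.last_seen = {}   # executor_id -> last heartbeat timestamp
        self._strikes = 0

    def heartbeat(self, executor_id, now):
        self.last_seen[executor_id] = now
        self._strikes = 0     # any heartbeat resets the suspicion counter

    def check(self, now):
        """Return True only when every known executor has been silent
        for `confirmations` consecutive checks."""
        all_expired = bool(self.last_seen) and all(
            now - t > self.timeout_s for t in self.last_seen.values())
        self._strikes = self._strikes + 1 if all_expired else 0
        return self._strikes >= self.confirmations

m = HeartbeatMonitor(timeout_s=10, confirmations=3)
m.heartbeat("exec-1", now=0)
print(m.check(now=20), m.check(now=30), m.check(now=40))  # False False True
```

A single expired check (a temporary blip) never triggers a stop; only sustained silence across the configured number of checks does.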
[GitHub] spark pull request: [Spark-6924]Fix driver hangs in yarn-client mo...
Github user viper-kun commented on a diff in the pull request: https://github.com/apache/spark/pull/5523#discussion_r28501942 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -75,6 +75,8 @@ import org.apache.spark.util._ */ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationClient { + var isInited: Boolean = false --- End diff -- >>> This is going to conflict with the overhaul of the SparkContext constructor. I don't see why this depends on the constructor finishing since where you reference the SparkContext, it has been constructed. In the SparkContext constructor, when the HeartbeatReceiver is created, timeoutCheckingThread checks for and expires dead hosts. If executorLastSeen is empty, it executes sc.stop(), which then throws:
java.lang.NullPointerException
at org.apache.spark.SparkContext.stop(SparkContext.scala:1416)
at org.apache.spark.HeartbeatReceiver.org$apache$spark$HeartbeatReceiver$$expireDeadHosts(HeartbeatReceiver.scala:134)
at org.apache.spark.HeartbeatReceiver$$anonfun$receive$1.applyOrElse(HeartbeatReceiver.scala:92)
at org.apache.spark.rpc.akka.AkkaRpcEnv.org$apache$spark$rpc$akka$AkkaRpcEnv$$processMessage(AkkaRpcEnv.scala:176)
at org.apache.spark.rpc.akka.AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1$$anonfun$receiveWithLogging$1$$anonfun$applyOrElse$4.apply$mcV$sp(AkkaRpcEnv.scala:125)
at org.apache.spark.rpc.akka.AkkaRpcEnv.org$apache$spark$rpc$akka$AkkaRpcEnv$$safelyCall(AkkaRpcEnv.scala:196)
at org.apache.spark.rpc.akka.AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1$$anonfun$receiveWithLogging$1.applyOrElse(AkkaRpcEnv.scala:124)
at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
at org.apache.spark.util.ActorLogReceive$$anon$1.apply(ActorLogReceive.scala:53)
at org.apache.spark.util.ActorLogReceive$$anon$1.apply(ActorLogReceive.scala:42)
at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
at org.apache.spark.util.ActorLogReceive$$anon$1.applyOrElse(ActorLogReceive.scala:42)
at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
at org.apache.spark.rpc.akka.AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1.aroundReceive(AkkaRpcEnv.scala:91)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
at akka.actor.ActorCell.invoke(ActorCell.scala:487)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
at akka.dispatch.Mailbox.run(Mailbox.scala:220)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
15/04/16 15:12:16 INFO netty.NettyBlockTransferService: Server created on 53493
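One way to avoid this NPE is sketched below: guard the stop path behind an initialization flag so that an expiry check firing during construction becomes a no-op. This is a minimal, self-contained Java model with invented names (SafeContext, tryStop), not the actual Spark fix.

```java
// Hypothetical model: the flag plays the role of SparkContext.isInited
// from the diff above, and tryStop() is what the expiry check would call
// instead of invoking stop() directly.
class SafeContext {
    volatile boolean isInited = false;   // set true at the end of construction
    boolean stopped = false;

    void finishConstruction() { isInited = true; }

    /** Returns whether stop actually ran. */
    boolean tryStop() {
        if (!isInited) {
            return false;   // still constructing: skip stop() to avoid the NPE above
        }
        stopped = true;     // stand-in for the real sc.stop()
        return true;
    }
}
```

With this guard, the timeout-checking thread created inside the constructor can run its first rounds safely before the context is fully built.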
[GitHub] spark pull request: [Spark-6924]Fix driver hangs in yarn-client mo...
Github user viper-kun commented on the pull request: https://github.com/apache/spark/pull/5523#issuecomment-93615543 @srowen I updated the JIRA. Please review it. Thanks.
[GitHub] spark pull request: [Spark-6924]Fix client hangs in yarn-client mo...
GitHub user viper-kun opened a pull request: https://github.com/apache/spark/pull/5523 [Spark-6924]Fix client hangs in yarn-client mode when net is broken https://issues.apache.org/jira/browse/SPARK-6924 You can merge this pull request into a Git repository by running: $ git pull https://github.com/viper-kun/spark spark-6924 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/5523.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #5523 commit 9288f220a6b7e4ea0d938a2ec4fbfb201de1aa71 Author: xukun 00228947 Date: 2015-04-15T09:40:28Z fix client hands in yarn-client mode
[GitHub] spark pull request: [SPARK-6879][HistoryServer]check if app is com...
Github user viper-kun commented on the pull request: https://github.com/apache/spark/pull/5491#issuecomment-92578144 I think it is OK. The user must call sc.stop(); if they don't, some event logs just won't be deleted.
[GitHub] spark pull request: [SPARK-6479][Block Manager]Create off-heap blo...
Github user viper-kun commented on a diff in the pull request: https://github.com/apache/spark/pull/5430#discussion_r28030832 --- Diff: core/src/main/scala/org/apache/spark/storage/OffHeapStore.scala --- @@ -0,0 +1,142 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.storage + +import java.nio.ByteBuffer +import org.apache.spark.Logging +import org.apache.spark.util.Utils + +import scala.util.control.NonFatal + + +/** + * Stores BlockManager blocks on OffHeap. 
+ * We capture any potential exception from underlying implementation + * and return with the expected failure value + */ +private[spark] class OffHeapStore(blockManager: BlockManager, executorId: String) + extends BlockStore(blockManager: BlockManager) with Logging { + + lazy val offHeapManager: Option[OffHeapBlockManager] = +OffHeapBlockManager.create(blockManager, executorId) + + logInfo("OffHeap started") + + override def getSize(blockId: BlockId): Long = { +try { + offHeapManager.map(_.getSize(blockId)).getOrElse(0) +} catch { + case NonFatal(t) => logError(s"error in getSize from $blockId") +0 +} + } + + override def putBytes(blockId: BlockId, bytes: ByteBuffer, level: StorageLevel): PutResult = { +putIntoOffHeapStore(blockId, bytes, returnValues = true) + } + + override def putArray( + blockId: BlockId, + values: Array[Any], + level: StorageLevel, + returnValues: Boolean): PutResult = { +putIterator(blockId, values.toIterator, level, returnValues) + } + + override def putIterator( + blockId: BlockId, + values: Iterator[Any], + level: StorageLevel, + returnValues: Boolean): PutResult = { +logDebug(s"Attempting to write values for block $blockId") +val bytes = blockManager.dataSerialize(blockId, values) +putIntoOffHeapStore(blockId, bytes, returnValues) + } + + private def putIntoOffHeapStore( + blockId: BlockId, + bytes: ByteBuffer, + returnValues: Boolean): PutResult = { + + // So that we do not modify the input offsets ! --- End diff -- style: two stray blank lines here.
[GitHub] spark pull request: [SPARK-6479][Block Manager]Create off-heap blo...
Github user viper-kun commented on a diff in the pull request: https://github.com/apache/spark/pull/5430#discussion_r28030751 --- Diff: core/src/main/scala/org/apache/spark/storage/OffHeapBlockManager.scala --- @@ -0,0 +1,110 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.storage + +import java.nio.ByteBuffer +import org.apache.spark.Logging + +import scala.util.control.NonFatal + + +trait OffHeapBlockManager { + + /** + * desc for the implementation. + * + */ + def desc(): String = {"OffHeap"} + + /** + * initialize a concrete block manager implementation. + * + * @throws java.io.IOException when FS init failure. + */ + def init(blockManager: BlockManager, executorId: String) + + /** + * remove the cache from offheap + * + * @throws java.io.IOException when FS failure in removing file. + */ + def removeFile(blockId: BlockId): Boolean + + /** + * check the existence of the block cache + * + * @throws java.io.IOException when FS failure in checking the block existence. + */ + def fileExists(blockId: BlockId): Boolean + + /** + * save the cache to the offheap. + * + * @throws java.io.IOException when FS failure in put blocks. 
+ */ + def putBytes(blockId: BlockId, bytes: ByteBuffer) + + /** + * retrieve the cache from offheap + * + * @throws java.io.IOException when FS failure in get blocks. + */ + def getBytes(blockId: BlockId): Option[ByteBuffer] + + /** + * retrieve the size of the cache + * + * @throws java.io.IOException when FS failure in get block size. + */ + def getSize(blockId: BlockId): Long + + /** + * cleanup when shutdown + * + */ + def addShutdownHook() + + final def setup(blockManager: BlockManager, executorId: String): Unit = { +init(blockManager, executorId) +addShutdownHook() + } +} + +object OffHeapBlockManager extends Logging{ + val MAX_DIR_CREATION_ATTEMPTS = 10 + val subDirsPerDir = 64 + def create(blockManager: BlockManager, + executorId: String): Option[OffHeapBlockManager] = { + val sNames = blockManager.conf.getOption("spark.offHeapStore.blockManager") --- End diff -- I think a default value is necessary.
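The "default value" suggestion could look like the sketch below: read the same key as the diff above, but fall back when it is unset. The fallback class name here is purely hypothetical, not a real Spark default, and plain java.util.Properties stands in for SparkConf.

```java
import java.util.Properties;

// Sketch only: in the real code this would go through SparkConf, and the
// default implementation name would be whatever the project chose.
class OffHeapConf {
    // Hypothetical default, for illustration.
    static final String DEFAULT_IMPL = "org.apache.spark.storage.DefaultOffHeapBlockManager";

    static String blockManagerClass(Properties conf) {
        String name = conf.getProperty("spark.offHeapStore.blockManager");
        return (name != null && !name.isEmpty()) ? name : DEFAULT_IMPL;
    }
}
```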
[GitHub] spark pull request: [SPARK-6521][Core]executors in the same node r...
Github user viper-kun commented on the pull request: https://github.com/apache/spark/pull/5178#issuecomment-88064150 @andrewor14 Please review it.
[GitHub] spark pull request: [SPARK-6521][Core]executors in the same node r...
Github user viper-kun commented on the pull request: https://github.com/apache/spark/pull/5178#issuecomment-86849484 Hi @andrewor14, please retest it; the test build timed out.
[GitHub] spark pull request: [SPARK-6521][Core]executors in the same node r...
Github user viper-kun commented on a diff in the pull request: https://github.com/apache/spark/pull/5178#discussion_r27193887 --- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala --- @@ -181,68 +181,100 @@ final class ShuffleBlockFetcherIterator( // at most maxBytesInFlight in order to limit the amount of data in flight. val remoteRequests = new ArrayBuffer[FetchRequest] +val shuffleMgrName = SparkEnv.get.conf.get("spark.shuffle.manager", "sort") + // Tracks total number of blocks (including zero sized blocks) var totalBlocks = 0 -for ((address, blockInfos) <- blocksByAddress) { - totalBlocks += blockInfos.size - if (address.executorId == blockManager.blockManagerId.executorId) { -// Filter out zero-sized blocks -localBlocks ++= blockInfos.filter(_._2 != 0).map(_._1) -numBlocksToFetch += localBlocks.size - } else { -val iterator = blockInfos.iterator -var curRequestSize = 0L -var curBlocks = new ArrayBuffer[(BlockId, Long)] -while (iterator.hasNext) { - val (blockId, size) = iterator.next() - // Skip empty blocks - if (size > 0) { -curBlocks += ((blockId, size)) -remoteBlocks += blockId -numBlocksToFetch += 1 -curRequestSize += size - } else if (size < 0) { -throw new BlockException(blockId, "Negative block size " + size) - } - if (curRequestSize >= targetRequestSize) { -// Add this FetchRequest -remoteRequests += new FetchRequest(address, curBlocks) -curBlocks = new ArrayBuffer[(BlockId, Long)] -logDebug(s"Creating fetch request of $curRequestSize at $address") -curRequestSize = 0 - } +if(shuffleMgrName == "hash") { --- End diff -- What do you mean? How should the full path be specified?
[GitHub] spark pull request: [SPARK-6521][Core]executors in the same node r...
GitHub user viper-kun opened a pull request: https://github.com/apache/spark/pull/5178 [SPARK-6521][Core]executors in the same node read local shuffle file Previously, an executor read another executor's shuffle files on the same node over the network. This PR makes executors on the same node read shuffle files locally in sort-based shuffle, which reduces network transfer. You can merge this pull request into a Git repository by running: $ git pull https://github.com/viper-kun/spark readlocal Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/5178.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #5178 commit 28a96ec3fe2e9e6adadedddc6e74b2c1ef667b00 Author: xukun 00228947 Date: 2015-03-25T02:26:33Z read local commit e6d1e0b62e248714042a1f1d463f9ab26b018516 Author: xukun 00228947 Date: 2015-03-25T02:46:08Z read local commit 5b766ca3f02d2e9b213e39908aec9c6b5804b623 Author: xukun 00228947 Date: 2015-03-25T02:49:02Z fix it
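The split this PR describes can be sketched as: group shuffle blocks by whether their executor runs on this host, and fetch only the off-host ones over the network. This is illustrative Java with invented names, not the actual ShuffleBlockFetcherIterator change.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model: blockToHost maps a shuffle block id to the host of
// the executor that wrote it. Blocks on myHost are read from local disk;
// the rest become remote fetch requests.
class BlockSplitter {
    static Map<String, List<String>> split(Map<String, String> blockToHost, String myHost) {
        Map<String, List<String>> out = new HashMap<>();
        out.put("local", new ArrayList<>());
        out.put("remote", new ArrayList<>());
        for (Map.Entry<String, String> e : blockToHost.entrySet()) {
            String bucket = e.getValue().equals(myHost) ? "local" : "remote";
            out.get(bucket).add(e.getKey());
        }
        return out;
    }
}
```

The real change additionally has to locate the shuffle file on disk for same-host blocks; this sketch only shows the host-based partitioning step.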
[GitHub] spark pull request: [SPARK-3562]Periodic cleanup event logs
Github user viper-kun commented on the pull request: https://github.com/apache/spark/pull/4214#issuecomment-76109919 @andrewor14 thanks for checking. Please retest it; I cannot get the test log.
[GitHub] spark pull request: [SPARK-5661]function hasShutdownDeleteTachyonD...
Github user viper-kun commented on the pull request: https://github.com/apache/spark/pull/4418#issuecomment-73452955 @srowen it is a static function and is currently unused. I think we had better leave it in the Utils class for now.
[GitHub] spark pull request: [SPARK-5661]function hasShutdownDeleteTachyonD...
Github user viper-kun commented on the pull request: https://github.com/apache/spark/pull/4418#issuecomment-73341954 OK, I will create a JIRA.
[GitHub] spark pull request: Remove unused function
Github user viper-kun commented on the pull request: https://github.com/apache/spark/pull/4418#issuecomment-73190743 @srowen OK. If it is useful later, we should change it like this:

def hasShutdownDeleteTachyonDir(file: TachyonFile): Boolean = {
  val absolutePath = file.getPath()
  shutdownDeleteTachyonPaths.synchronized {
    shutdownDeleteTachyonPaths.contains(absolutePath)
  }
}
[GitHub] spark pull request: Remove unused function
Github user viper-kun commented on the pull request: https://github.com/apache/spark/pull/4418#issuecomment-73179289 cc @haoyuan
[GitHub] spark pull request: remove unused function
GitHub user viper-kun opened a pull request: https://github.com/apache/spark/pull/4418 remove unused function hasShutdownDeleteTachyonDir(file: TachyonFile) should use shutdownDeleteTachyonPaths (not shutdownDeletePaths) to determine whether it contains the file. To resolve this, delete the two unused functions. You can merge this pull request into a Git repository by running: $ git pull https://github.com/viper-kun/spark deleteunusedfun Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/4418.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #4418 commit 2bc397edf3793cc8f3e13b1d7a3f31efb7e8f9c2 Author: xukun 00228947 Date: 2015-02-06T03:26:34Z deleteunusedfun
[GitHub] spark pull request: [SPARK-5526][SQL] fix issue about cast to date
Github user viper-kun closed the pull request at: https://github.com/apache/spark/pull/4307
[GitHub] spark pull request: [SPARK-5526][SQL] fix issue about cast to date
Github user viper-kun commented on the pull request: https://github.com/apache/spark/pull/4307#issuecomment-72777528 Thanks, the issue is fixed by #4325. I will close this PR.
[GitHub] spark pull request: [SPARK-5526][SQL] fix issue about cast to date
GitHub user viper-kun opened a pull request: https://github.com/apache/spark/pull/4307 [SPARK-5526][SQL] fix issue about cast to date You can merge this pull request into a Git repository by running: $ git pull https://github.com/viper-kun/spark fixcastissue Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/4307.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #4307 commit 7280d6cc44fff8b18b1e06455da425ce0cd3804f Author: xukun 00228947 Date: 2015-02-02T09:31:09Z Throw valid precision more than day
[GitHub] spark pull request: [SPARK-3562]Periodic cleanup event logs
Github user viper-kun commented on a diff in the pull request: https://github.com/apache/spark/pull/4214#discussion_r23750788 --- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala --- @@ -53,8 +79,10 @@ private[history] class FsHistoryProvider(conf: SparkConf) extends ApplicationHis private val fs = Utils.getHadoopFileSystem(logDir, SparkHadoopUtil.get.newConfiguration(conf)) - // A timestamp of when the disk was last accessed to check for log updates - private var lastLogCheckTimeMs = -1L + // The schedule thread pool size must be one, otherwise it will have concurrent issues about fs + // and applications between check task and clean task. --- End diff -- Sorry, I don't understand what you mean.
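The invariant the diff's comment relies on (a pool of size one serializes the periodic check and clean tasks, so they never race on shared state) can be demonstrated with a small self-contained Java experiment; the task bodies here are stand-ins, not FsHistoryProvider code.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Two periodic tasks on a size-one scheduled pool: if they ever ran
// concurrently, the compareAndSet below would fail and overlapSeen
// would flip to true. With a single worker thread, it never does.
class SingleThreadPoolDemo {
    static final AtomicBoolean inTask = new AtomicBoolean(false);
    static final AtomicBoolean overlapSeen = new AtomicBoolean(false);

    static void task() {
        if (!inTask.compareAndSet(false, true)) {
            overlapSeen.set(true);   // two tasks running at once
        }
        try { Thread.sleep(5); } catch (InterruptedException ignored) { }
        inTask.set(false);
    }

    /** Runs the experiment and reports whether the tasks stayed serialized. */
    static boolean ranSerialized() throws InterruptedException {
        ScheduledExecutorService pool = Executors.newScheduledThreadPool(1);
        pool.scheduleAtFixedRate(SingleThreadPoolDemo::task, 0, 10, TimeUnit.MILLISECONDS);
        pool.scheduleAtFixedRate(SingleThreadPoolDemo::task, 5, 10, TimeUnit.MILLISECONDS);
        Thread.sleep(200);
        pool.shutdownNow();
        return !overlapSeen.get();
    }
}
```

Growing the pool beyond one thread would require explicit locking around the shared fs and applications state instead.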
[GitHub] spark pull request: [SPARK-3562]Periodic cleanup event logs
Github user viper-kun commented on a diff in the pull request: https://github.com/apache/spark/pull/4214#discussion_r23750759 --- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala --- @@ -113,12 +129,12 @@ private[history] class FsHistoryProvider(conf: SparkConf) extends ApplicationHis "Logging directory specified is not a directory: %s".format(logDir)) } -checkForLogs() +// A task that periodically checks for event log updates on disk. +pool.scheduleAtFixedRate(getRunner(checkForLogs), 0, UPDATE_INTERVAL_MS, TimeUnit.MILLISECONDS) -// Disable the background thread during tests. -if (!conf.contains("spark.testing")) { --- End diff -- I got it.
[GitHub] spark pull request: [SPARK-3562]Periodic cleanup event logs
Github user viper-kun closed the pull request at: https://github.com/apache/spark/pull/2471
[GitHub] spark pull request: [SPARK-3562]Periodic cleanup event logs
Github user viper-kun commented on a diff in the pull request: https://github.com/apache/spark/pull/4214#discussion_r23750085 --- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala --- @@ -163,9 +179,6 @@ private[history] class FsHistoryProvider(conf: SparkConf) extends ApplicationHis * applications that haven't been updated since last time the logs were checked. */ private[history] def checkForLogs(): Unit = { -lastLogCheckTimeMs = getMonotonicTimeMs() -logDebug("Checking for logs. Time is now %d.".format(lastLogCheckTimeMs)) --- End diff -- The checkForLogs task is now scheduled by the pool, so lastLogCheckTimeMs is no longer used and has been removed.
[GitHub] spark pull request: [SPARK-3562]Periodic cleanup event logs
Github user viper-kun commented on the pull request: https://github.com/apache/spark/pull/2471#issuecomment-71575458 I have filed a new PR: #4214
[GitHub] spark pull request: [SPARK-3562]Periodic cleanup event logs
GitHub user viper-kun opened a pull request: https://github.com/apache/spark/pull/4214 [SPARK-3562]Periodic cleanup event logs You can merge this pull request into a Git repository by running: $ git pull https://github.com/viper-kun/spark cleaneventlog Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/4214.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #4214 commit adcfe869863cd5f73b06143d41f138ea3b8d145f Author: xukun 00228947 Date: 2015-01-27T01:30:41Z Periodic cleanup event logs
[GitHub] spark pull request: [SPARK-3266] [WIP] Remove implementations from...
Github user viper-kun commented on the pull request: https://github.com/apache/spark/pull/2951#issuecomment-63768467 Are you still working on this? I am working on it.
[GitHub] spark pull request: invalid variable
GitHub user viper-kun opened a pull request: https://github.com/apache/spark/pull/3154 invalid variable You can merge this pull request into a Git repository by running: $ git pull https://github.com/viper-kun/spark master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3154.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3154 commit f5bde61e4597e89fcadbec73a7d28c3ccf2ac569 Author: viper-kun Date: 2014-11-07T09:15:19Z invalid variable
[GitHub] spark pull request: [SPARK-3562]Periodic cleanup event logs
Github user viper-kun commented on the pull request: https://github.com/apache/spark/pull/2471#issuecomment-60541215 @vanzin @andrewor14 @srowen, is it OK to go?
[GitHub] spark pull request: [SPARK-3562]Periodic cleanup event logs
Github user viper-kun commented on the pull request: https://github.com/apache/spark/pull/2471#issuecomment-60027733 @vanzin @andrewor14, is it OK to go?
[GitHub] spark pull request: [SPARK-3562]Periodic cleanup event logs
Github user viper-kun commented on the pull request: https://github.com/apache/spark/pull/2471#issuecomment-59648648 @vanzin, is it OK to go?
[GitHub] spark pull request: [SPARK-3562]Periodic cleanup event logs
Github user viper-kun commented on the pull request: https://github.com/apache/spark/pull/2471#issuecomment-59164199 @vanzin, is it OK to go?
[GitHub] spark pull request: [SPARK-3562]Periodic cleanup event logs
Github user viper-kun commented on the pull request: https://github.com/apache/spark/pull/2471#issuecomment-59038886 In my opinion, Spark creates the event log data, so Spark should delete it. In Hadoop, the event log is deleted by the JobHistoryServer, not by the file system.
[GitHub] spark pull request: [SPARK-3562]Periodic cleanup event logs
Github user viper-kun commented on a diff in the pull request: https://github.com/apache/spark/pull/2471#discussion_r18763243 --- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala --- @@ -214,6 +224,43 @@ private[history] class FsHistoryProvider(conf: SparkConf) extends ApplicationHis } --- End diff -- @vanzin sorry, I do not understand what you mean. Do you mean that we should not throw Throwable?
[GitHub] spark pull request: [SPARK-3562]Periodic cleanup event logs
Github user viper-kun commented on a diff in the pull request: https://github.com/apache/spark/pull/2471#discussion_r18741084 --- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala --- @@ -195,22 +241,68 @@ private[history] class FsHistoryProvider(conf: SparkConf) extends ApplicationHis } } -val newIterator = logInfos.iterator.buffered -val oldIterator = applications.values.iterator.buffered -while (newIterator.hasNext && oldIterator.hasNext) { - if (newIterator.head.endTime > oldIterator.head.endTime) { -addIfAbsent(newIterator.next) - } else { -addIfAbsent(oldIterator.next) +applications.synchronized { + val newIterator = logInfos.iterator.buffered + val oldIterator = applications.values.iterator.buffered + while (newIterator.hasNext && oldIterator.hasNext) { +if (newIterator.head.endTime > oldIterator.head.endTime) { + addIfAbsent(newIterator.next) +} else { + addIfAbsent(oldIterator.next) +} } + newIterator.foreach(addIfAbsent) + oldIterator.foreach(addIfAbsent) + + applications = newApps } -newIterator.foreach(addIfAbsent) -oldIterator.foreach(addIfAbsent) + } +} catch { + case t: Throwable => logError("Exception in checking for event log updates", t) --- End diff -- You mean: don't catch Throwable? What should we do instead?
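For readers following the Throwable discussion above: a concrete reason to catch exceptions inside a periodic task is that a JVM `ScheduledExecutorService` silently cancels all future executions of a task after its first uncaught exception. A minimal Java sketch (class and method names invented for this illustration; this is not Spark's code) showing the difference between swallowing and rethrowing:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class KeepTaskAlive {
    // Runs a periodic task that fails on every execution for ~300 ms and
    // returns how many times it actually ran.
    public static int runsWithCatch(boolean catchErrors) throws InterruptedException {
        ScheduledExecutorService pool = Executors.newSingleThreadScheduledExecutor();
        AtomicInteger runs = new AtomicInteger();
        pool.scheduleAtFixedRate(() -> {
            runs.incrementAndGet();
            try {
                // Simulate checkForLogs()/cleanLogs() failing, e.g. an I/O
                // error surfacing as a RuntimeException.
                throw new RuntimeException("simulated failure");
            } catch (RuntimeException t) {
                if (!catchErrors) {
                    throw t; // uncaught: the executor cancels all future runs
                }
                // caught: log and keep the periodic task alive
                // (the PR catches Throwable; RuntimeException suffices here)
            }
        }, 0, 50, TimeUnit.MILLISECONDS);
        Thread.sleep(300);
        pool.shutdownNow();
        return runs.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("with catch:    " + runsWithCatch(true) + " runs");
        System.out.println("without catch: " + runsWithCatch(false) + " run(s)");
    }
}
```

Without the catch, the task runs exactly once; with it, the task keeps firing despite failing each time.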
[GitHub] spark pull request: [SPARK-3562]Periodic cleanup event logs
Github user viper-kun commented on a diff in the pull request: https://github.com/apache/spark/pull/2471#discussion_r18740169 --- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala --- @@ -195,22 +241,68 @@ private[history] class FsHistoryProvider(conf: SparkConf) extends ApplicationHis } } -val newIterator = logInfos.iterator.buffered -val oldIterator = applications.values.iterator.buffered -while (newIterator.hasNext && oldIterator.hasNext) { - if (newIterator.head.endTime > oldIterator.head.endTime) { -addIfAbsent(newIterator.next) - } else { -addIfAbsent(oldIterator.next) +applications.synchronized { + val newIterator = logInfos.iterator.buffered + val oldIterator = applications.values.iterator.buffered + while (newIterator.hasNext && oldIterator.hasNext) { +if (newIterator.head.endTime > oldIterator.head.endTime) { + addIfAbsent(newIterator.next) +} else { + addIfAbsent(oldIterator.next) +} } + newIterator.foreach(addIfAbsent) + oldIterator.foreach(addIfAbsent) + + applications = newApps } -newIterator.foreach(addIfAbsent) -oldIterator.foreach(addIfAbsent) + } +} catch { + case t: Throwable => logError("Exception in checking for event log updates", t) +} + } + + /** + * Deleting apps if setting cleaner. + */ + private def cleanLogs() = { +lastLogCleanTimeMs = getMonotonicTimeMs() +logDebug("Cleaning logs. Time is now %d.".format(lastLogCleanTimeMs)) +try { + val logStatus = fs.listStatus(new Path(resolvedLogDir)) + val logDirs = if (logStatus != null) logStatus.filter(_.isDir).toSeq else Seq[FileStatus]() + val maxAge = conf.getLong("spark.history.fs.maxAge.seconds", +DEFAULT_SPARK_HISTORY_FS_MAXAGE_S) * 1000 + + val now = System.currentTimeMillis() + fs.synchronized { +// scan all logs from the log directory. +// Only directories older than this many seconds will be deleted . +logDirs.foreach { dir => + // history file older than this many seconds will be deleted + // when the history cleaner runs. 
+ if (now - getModificationTime(dir) > maxAge) { +fs.delete(dir.getPath, true) --- End diff -- Can you tell me the detailed reason for adding a try..catch around fs.delete? I think the exception will already be caught by the outer try..catch (line 271).
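The cleanup logic under review — list the entries of the log directory and delete those whose modification time is older than maxAge — can be sketched in Java with `java.nio.file`. This is a hypothetical helper mirroring the shape of `cleanLogs()` in the diff, not Spark's code; the recursive delete of `fs.delete(path, true)` is simplified to flat files:

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.util.ArrayList;
import java.util.List;

public class LogCleaner {
    // Deletes entries of logDir whose modification time is older than
    // maxAgeMs; returns the names of the deleted entries.
    public static List<String> cleanOldLogs(Path logDir, long maxAgeMs) throws IOException {
        long now = System.currentTimeMillis();
        List<String> deleted = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(logDir)) {
            for (Path entry : entries) {
                long modified = Files.getLastModifiedTime(entry).toMillis();
                if (now - modified > maxAgeMs) {
                    Files.delete(entry); // flat file; real code recurses into dirs
                    deleted.add(entry.getFileName().toString());
                }
            }
        }
        return deleted;
    }

    // Demo: one backdated log gets cleaned, a fresh one survives.
    public static List<String> demo() throws IOException {
        Path dir = Files.createTempDirectory("eventlogs");
        Path old = Files.createFile(dir.resolve("app-old"));
        Files.createFile(dir.resolve("app-fresh"));
        // Backdate one log two hours so it exceeds a one-hour maxAge.
        Files.setLastModifiedTime(old,
            FileTime.fromMillis(System.currentTimeMillis() - 2 * 3_600_000L));
        return cleanOldLogs(dir, 3_600_000L);
    }

    public static void main(String[] args) throws IOException {
        System.out.println("deleted: " + demo());
    }
}
```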
[GitHub] spark pull request: [SPARK-3562]Periodic cleanup event logs
Github user viper-kun commented on a diff in the pull request: https://github.com/apache/spark/pull/2471#discussion_r18740124 --- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala --- @@ -195,22 +241,68 @@ private[history] class FsHistoryProvider(conf: SparkConf) extends ApplicationHis } } -val newIterator = logInfos.iterator.buffered -val oldIterator = applications.values.iterator.buffered -while (newIterator.hasNext && oldIterator.hasNext) { - if (newIterator.head.endTime > oldIterator.head.endTime) { -addIfAbsent(newIterator.next) - } else { -addIfAbsent(oldIterator.next) +applications.synchronized { + val newIterator = logInfos.iterator.buffered + val oldIterator = applications.values.iterator.buffered + while (newIterator.hasNext && oldIterator.hasNext) { +if (newIterator.head.endTime > oldIterator.head.endTime) { + addIfAbsent(newIterator.next) +} else { + addIfAbsent(oldIterator.next) +} } + newIterator.foreach(addIfAbsent) + oldIterator.foreach(addIfAbsent) + + applications = newApps } -newIterator.foreach(addIfAbsent) -oldIterator.foreach(addIfAbsent) + } +} catch { + case t: Throwable => logError("Exception in checking for event log updates", t) +} + } + + /** + * Deleting apps if setting cleaner. + */ + private def cleanLogs() = { +lastLogCleanTimeMs = getMonotonicTimeMs() +logDebug("Cleaning logs. Time is now %d.".format(lastLogCleanTimeMs)) +try { + val logStatus = fs.listStatus(new Path(resolvedLogDir)) + val logDirs = if (logStatus != null) logStatus.filter(_.isDir).toSeq else Seq[FileStatus]() + val maxAge = conf.getLong("spark.history.fs.maxAge.seconds", +DEFAULT_SPARK_HISTORY_FS_MAXAGE_S) * 1000 + + val now = System.currentTimeMillis() + fs.synchronized { +// scan all logs from the log directory. +// Only directories older than this many seconds will be deleted . +logDirs.foreach { dir => + // history file older than this many seconds will be deleted + // when the history cleaner runs. 
+ if (now - getModificationTime(dir) > maxAge) { +fs.delete(dir.getPath, true) + } +} + } + + val newApps = new mutable.LinkedHashMap[String, FsApplicationHistoryInfo]() + def addIfNotExpire(info: FsApplicationHistoryInfo) = { +if(now - info.lastUpdated <= maxAge) { + newApps += (info.id -> info) --- End diff -- info.lastUpdated is the timestamp of the directory, and it is always later than the timestamps of the files it contains.
[GitHub] spark pull request: [SPARK-3562]Periodic cleanup event logs
Github user viper-kun commented on a diff in the pull request: https://github.com/apache/spark/pull/2471#discussion_r18739911 --- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala --- @@ -195,22 +241,68 @@ private[history] class FsHistoryProvider(conf: SparkConf) extends ApplicationHis } } -val newIterator = logInfos.iterator.buffered -val oldIterator = applications.values.iterator.buffered -while (newIterator.hasNext && oldIterator.hasNext) { - if (newIterator.head.endTime > oldIterator.head.endTime) { -addIfAbsent(newIterator.next) - } else { -addIfAbsent(oldIterator.next) +applications.synchronized { --- End diff -- I think the two tasks must never run concurrently. If the order is: 1. the check task reads applications; 2. the clean task reads applications; 3. the clean task finishes and replaces applications; 4. the check task finishes and replaces applications — then the clean task's result is overwritten by the check task's result. Using a ScheduledExecutorService with a single worker thread is a good way to solve this.
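The single-worker-thread suggestion can be demonstrated with a small Java sketch (names invented for this illustration): two periodic tasks scheduled on `Executors.newSingleThreadScheduledExecutor` may interleave in time, but their bodies never overlap, so neither can clobber the other's read-then-replace of a shared map:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

public class SingleWorkerScheduler {
    // Schedules two periodic tasks on one worker thread and counts how often
    // one task starts while the other is still inside its body.
    public static int countOverlaps() throws InterruptedException {
        ScheduledExecutorService pool = Executors.newSingleThreadScheduledExecutor();
        AtomicBoolean busy = new AtomicBoolean(false);
        AtomicInteger overlaps = new AtomicInteger();

        Runnable guarded = () -> {
            // If another task body were running, busy would already be true.
            if (!busy.compareAndSet(false, true)) overlaps.incrementAndGet();
            try { Thread.sleep(20); } catch (InterruptedException ignored) { }
            busy.set(false);
        };
        pool.scheduleAtFixedRate(guarded, 0, 30, TimeUnit.MILLISECONDS); // "check" task
        pool.scheduleAtFixedRate(guarded, 0, 30, TimeUnit.MILLISECONDS); // "clean" task

        Thread.sleep(300);
        pool.shutdownNow();
        return overlaps.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("overlapping executions observed: " + countOverlaps());
    }
}
```

With a multi-threaded pool the overlap count could be nonzero; with a single worker it stays at zero, which is the serialization property the review asks for.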
[GitHub] spark pull request: [SPARK-3562]Periodic cleanup event logs
Github user viper-kun commented on the pull request: https://github.com/apache/spark/pull/2471#issuecomment-58624839 @mattf @vanzin, is this OK to go?
[GitHub] spark pull request: [SPARK-3562]Periodic cleanup event logs
Github user viper-kun commented on the pull request: https://github.com/apache/spark/pull/2471#issuecomment-57083376 Thanks for your opinions, @vanzin @andrewor14. I have changed the code according to your suggestions.
[GitHub] spark pull request: [SPARK-3562]Periodic cleanup event logs
Github user viper-kun commented on the pull request: https://github.com/apache/spark/pull/2391#issuecomment-56261135 @vanzin, @andrewor14, thanks for your opinions. Because I deleted the source branch, I cannot change it in this commit. I have submitted another pull request (#2471) and changed the source according to your opinions.
[GitHub] spark pull request: [SPARK-3562]Periodic cleanup event logs
Github user viper-kun commented on a diff in the pull request: https://github.com/apache/spark/pull/2391#discussion_r17818744 --- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala --- @@ -214,6 +245,27 @@ private[history] class FsHistoryProvider(conf: SparkConf) extends ApplicationHis } } + /** + * Deleting apps if setting cleaner. + */ + private def cleanLogs() = { +lastLogCleanTimeMs = System.currentTimeMillis() +logDebug("Cleaning logs. Time is now %d.".format(lastLogCleanTimeMs)) +try { + val logStatus = fs.listStatus(new Path(resolvedLogDir)) + val logDirs = if (logStatus != null) logStatus.filter(_.isDir).toSeq else Seq[FileStatus]() + val maxAge = conf.getLong("spark.history.fs.maxAge", 604800) * 1000 + logDirs.foreach{ +case dir: FileStatus => + if(System.currentTimeMillis() - getModificationTime(dir) > maxAge) { +fs.delete(dir.getPath, true) --- End diff -- FileSystem.listStatus() will throw FileNotFoundException and IOException; FileSystem.delete() will throw IOException. I will change it accordingly.
[GitHub] spark pull request: Periodic cleanup event logs
GitHub user viper-kun opened a pull request: https://github.com/apache/spark/pull/2471 Periodic cleanup event logs You can merge this pull request into a Git repository by running: $ git pull https://github.com/viper-kun/spark deletelog2 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2471.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2471 commit f085cf02921574f734caa816ba1515b6e44b7341 Author: xukun 00228947 Date: 2014-09-20T06:45:28Z Periodic cleanup event logs commit 9cd45d48a4db3b6acdbbfea652a3fefc2b030190 Author: viper-kun Date: 2014-09-20T08:26:03Z Update monitoring.md commit 78e78cb5917be9df6b019e297c24d47d88ff5c12 Author: viper-kun Date: 2014-09-20T08:27:50Z Update monitoring.md commit e72b51ad20f3c7fbac8cd92860827df2f23ba71d Author: viper-kun Date: 2014-09-20T08:28:39Z Update monitoring.md commit da637c0a9ee7aaf163c4da77f07affc67fecfacd Author: viper-kun Date: 2014-09-20T08:30:13Z Update monitoring.md commit ce53db2db2e1be8b2a1a1d1dec56fd8df8a43b41 Author: viper-kun Date: 2014-09-20T08:31:37Z Update monitoring.md
[GitHub] spark pull request: [SPARK-3562]Periodic cleanup event logs
Github user viper-kun commented on the pull request: https://github.com/apache/spark/pull/2391#issuecomment-56129613 @andrewor14 I have checked Hadoop's JobHistoryServer. It is the JobHistoryServer's responsibility to delete the application logs.
[GitHub] spark pull request: [SPARK-3562]Periodic cleanup event logs
Github user viper-kun commented on a diff in the pull request: https://github.com/apache/spark/pull/2391#discussion_r17766911 --- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala --- @@ -100,6 +125,12 @@ private[history] class FsHistoryProvider(conf: SparkConf) extends ApplicationHis checkForLogs() logCheckingThread.setDaemon(true) logCheckingThread.start() + +if(conf.getBoolean("spark.history.cleaner.enable", false)) { + cleanLogs() + logCleaningThread.setDaemon(true) + logCleaningThread.start() +} --- End diff -- Yes, you are right. There is no need to clean on initialization.
[GitHub] spark pull request: Update configuration.md
GitHub user viper-kun opened a pull request: https://github.com/apache/spark/pull/2406 Update configuration.md change the value of spark.files.fetchTimeout You can merge this pull request into a Git repository by running: $ git pull https://github.com/viper-kun/spark master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2406.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2406 commit 7cf4c7a13533c1eb0db5ef2ad10678fa4096175b Author: viper-kun Date: 2014-09-16T09:04:46Z Update configuration.md
[GitHub] spark pull request: cycle of deleting history log
Github user viper-kun commented on the pull request: https://github.com/apache/spark/pull/2391#issuecomment-55688517 @srowen If we run Spark applications frequently, many event logs will accumulate in spark.eventLog.dir. After a long time, the directory will contain many event logs that we no longer care about, so we need to delete the logs that exceed the time limit.
[GitHub] spark pull request: cycle of deleting history log
GitHub user viper-kun opened a pull request: https://github.com/apache/spark/pull/2391 cycle of deleting history log You can merge this pull request into a Git repository by running: $ git pull https://github.com/viper-kun/spark xk2 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2391.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2391 commit 79f492811dc939d1a50e6d8279e20042df4f95fe Author: xukun 00228947 Date: 2014-09-13T08:50:18Z cycle of deleting history log