[GitHub] spark pull request #16632: [SPARK-19273]shuffle stage should retry when fetc...
Github user viper-kun closed the pull request at: https://github.com/apache/spark/pull/16632 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16632: [SPARK-19273]shuffle stage should retry when fetch shuff...
Github user viper-kun commented on the issue: https://github.com/apache/spark/pull/16632 ok, close it.
[GitHub] spark issue #16632: [SPARK-19273]shuffle stage should retry when fetch shuff...
Github user viper-kun commented on the issue: https://github.com/apache/spark/pull/16632 @srowen I have not tested it against the master branch. I will do it later.
[GitHub] spark pull request #16632: [SPARK-19273]shuffle stage should retry when fetc...
GitHub user viper-kun opened a pull request: https://github.com/apache/spark/pull/16632 [SPARK-19273]shuffle stage should retry when fetch shuffle fail ## What changes were proposed in this pull request? The shuffle stage should retry when fetching shuffle data fails. ## How was this patch tested? See https://issues.apache.org/jira/browse/SPARK-19273 You can merge this pull request into a Git repository by running: $ git pull https://github.com/viper-kun/spark unretry Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16632.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16632 commit f8d411e4c7332967914b930c120e6629a1a17684 Author: x00228947 Date: 2017-01-18T09:38:08Z shuffle stage not retry
[GitHub] spark issue #9113: [SPARK-11100][SQL]HiveThriftServer HA issue,HiveThriftSer...
Github user viper-kun commented on the issue: https://github.com/apache/spark/pull/9113 @rxin Is there any design for the replacement?
[GitHub] spark issue #9113: [SPARK-11100][SQL]HiveThriftServer HA issue,HiveThriftSer...
Github user viper-kun commented on the issue: https://github.com/apache/spark/pull/9113 @xiaowangyu What I mean is: how is the active Thrift server chosen? At random, or by Thrift server load?
[GitHub] spark pull request: [SPARK-11100][SQL]HiveThriftServer HA issue,Hi...
Github user viper-kun commented on the pull request: https://github.com/apache/spark/pull/9113#issuecomment-222619116 @xiaowangyu Can you tell me how this works? The Thrift server easily goes down when there are too many Beeline clients; it would help if Beeline could choose a less-loaded Thrift server.
[GitHub] spark pull request: [SPARK-14065]Increase probability of using cac...
Github user viper-kun closed the pull request at: https://github.com/apache/spark/pull/11886
[GitHub] spark pull request: [SPARK-14065]Increase probability of using cac...
Github user viper-kun commented on the pull request: https://github.com/apache/spark/pull/11886#issuecomment-218140398 @tgravescs OK, I'll close it.
[GitHub] spark pull request: [SPARK-4105][CORE] regenerate the shuffle file...
Github user viper-kun commented on the pull request: https://github.com/apache/spark/pull/12700#issuecomment-214948007 @jerryshao @srowen We hit this problem in Spark 1.4, 1.5, and 1.6, and only know that the shuffle file is corrupted. We can reproduce the problem by modifying the shuffle file, but we don't know the root cause. Any ideas about this problem?
[GitHub] spark pull request: [SPARK-13112]CoarsedExecutorBackend register t...
Github user viper-kun commented on the pull request: https://github.com/apache/spark/pull/12078#issuecomment-204322877 I don't think so. The order is wrong: it registers with the driver first, and only then starts the new Executor.
[GitHub] spark pull request: [SPARK-14065]Increase probability of using cac...
Github user viper-kun commented on the pull request: https://github.com/apache/spark/pull/11886#issuecomment-204319818 @srowen The logic looks the same to me. If necessary, I will move serializeMapStatus into the match ... None branch.
[GitHub] spark pull request: [SPARK-13112]CoarsedExecutorBackend register t...
GitHub user viper-kun opened a pull request: https://github.com/apache/spark/pull/12078 [SPARK-13112]CoarsedExecutorBackend register to driver should wait Executor was ready ## What changes were proposed in this pull request? When CoarseGrainedExecutorBackend receives RegisterExecutorResponse late, after LaunchTask, the problem occurs. ## How was this patch tested? Reproduced when the executor host's I/O was busy. You can merge this pull request into a Git repository by running: $ git pull https://github.com/viper-kun/spark patch-3 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/12078.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #12078 commit 1b046304313c7663015667ab9cc8fe4201d17eb2 Author: xukun Date: 2016-03-31T02:39:40Z CoarsedExecutorBackend register to driver should wait Executor was ready
[GitHub] spark pull request: [SPARK-14065]Increase probability of using cac...
Github user viper-kun commented on a diff in the pull request: https://github.com/apache/spark/pull/11886#discussion_r57102821 --- Diff: core/src/main/scala/org/apache/spark/MapOutputTracker.scala ---
```
@@ -443,13 +443,12 @@ private[spark] class MapOutputTrackerMaster(conf: SparkConf)
         statuses = mapStatuses.getOrElse(shuffleId, Array[MapStatus]())
         epochGotten = epoch
       }
-    }
-    // If we got here, we failed to find the serialized locations in the cache, so we pulled
-    // out a snapshot of the locations as "statuses"; let's serialize and return that
-    val bytes = MapOutputTracker.serializeMapStatuses(statuses)
-    logInfo("Size of output statuses for shuffle %d is %d bytes".format(shuffleId, bytes.length))
-    // Add them into the table only if the epoch hasn't changed while we were working
-    epochLock.synchronized {
+
+      // If we got here, we failed to find the serialized locations in the cache, so we pulled
```
--- End diff -- The problem description: executing the query listed in the JIRA (https://issues.apache.org/jira/browse/SPARK-14065), all tasks running on some executors are slow. The slow executors' logs show that the GetMapOutputStatuses RPC fails with an RpcTimeoutException:
```
Error sending message [message = GetMapOutputStatuses(1)] in 1 attempts | org.apache.spark.Logging$class.logWarning(Logging.scala:92)
org.apache.spark.rpc.RpcTimeoutException: Futures timed out after [120 seconds]. This timeout is controlled by spark.network.timeout
```
The driver logs show that serializing the map statuses is slow:
```
16/03/22 11:47:07 INFO [dispatcher-event-loop-36] MapOutputTrackerMaster: Size of output statuses for shuffle 1 is 10298063 bytes
16/03/22 11:47:14 INFO [dispatcher-event-loop-30] MapOutputTrackerMaster: Size of output statuses for shuffle 1 is 10298063 bytes
16/03/22 11:47:21 INFO [dispatcher-event-loop-3] MapOutputTrackerMaster: Size of output statuses for shuffle 1 is 10298063 bytes
16/03/22 11:47:27 INFO [dispatcher-event-loop-32] MapOutputTrackerMaster: Size of output statuses for shuffle 1 is 10298063 bytes
16/03/22 11:47:34 INFO [dispatcher-event-loop-31] MapOutputTrackerMaster: Size of output statuses for shuffle 1 is 10298063 bytes
16/03/22 11:47:41 INFO [dispatcher-event-loop-38] MapOutputTrackerMaster: Size of output statuses for shuffle 1 is 10298063 bytes
16/03/22 11:47:47 INFO [dispatcher-event-loop-4] MapOutputTrackerMaster: Size of output statuses for shuffle 1 is 10298063 bytes
16/03/22 11:47:54 INFO [dispatcher-event-loop-37] MapOutputTrackerMaster: Size of output statuses for shuffle 1 is 10298063 bytes
16/03/22 11:48:00 INFO [dispatcher-event-loop-28] MapOutputTrackerMaster: Size of output statuses for shuffle 1 is 10298063 bytes
```
When the reduce tasks start, all executors fetch the map statuses from the driver. Because MapOutputTracker.serializeMapStatuses(statuses) sits outside epochLock.synchronized {}, every thread performs the serialization itself. Inside serializeMapStatuses there is a synchronization on statuses, so the serializations run one at a time. Each serialization takes about 7 seconds; with 80 executors that totals 560 seconds, so some executors time out fetching the map statuses. This patch moves MapOutputTracker.serializeMapStatuses(statuses) inside epochLock.synchronized {}, which increases the probability of using the cached serialized status. I tested the listed SQL on a 30 TB TPC-DS dataset; the results show it is faster than before.
![image](https://cloud.githubusercontent.com/assets/6460155/13974069/9f448cc4-f0e3-11e5-9a5d-bc3cd8300db1.png)
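The locking change described above can be sketched in Python (a hypothetical simplification, not Spark's actual Scala code; `MapOutputCache`, `get_serialized`, and `_serialize` are illustrative names): serializing inside the lock means the first caller populates the cache, and every later caller for the same epoch reuses the cached bytes instead of re-serializing.

```python
import threading

class MapOutputCache:
    """Sketch: cache serialized map statuses so only one thread pays the
    serialization cost; later requests for the same epoch reuse the bytes."""

    def __init__(self):
        self._lock = threading.Lock()
        self._epoch = 0
        self._cached = {}  # shuffle_id -> (epoch, serialized bytes)

    def _serialize(self, statuses):
        # Stand-in for the expensive serializeMapStatuses call.
        return repr(statuses).encode()

    def get_serialized(self, shuffle_id, statuses):
        with self._lock:
            hit = self._cached.get(shuffle_id)
            if hit is not None and hit[0] == self._epoch:
                return hit[1]                 # cache hit: reuse the bytes
            data = self._serialize(statuses)  # serialize while holding the lock
            self._cached[shuffle_id] = (self._epoch, data)
            return data

cache = MapOutputCache()
first = cache.get_serialized(1, ["status-a", "status-b"])
second = cache.get_serialized(1, ["status-a", "status-b"])
print(first is second)  # True: the second call returns the cached object
```

With the serialization outside the lock (as in the pre-patch code), each of the 80 executors' requests would run `_serialize` itself; inside the lock, only the first does.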
[GitHub] spark pull request: Increase probability of using cached serialize...
GitHub user viper-kun opened a pull request: https://github.com/apache/spark/pull/11886 Increase probability of using cached serialized status You can merge this pull request into a Git repository by running: $ git pull https://github.com/viper-kun/spark patch-2 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/11886.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #11886 commit 32ad0d57becd216f1ad9e835ee6e4db527b3ab20 Author: xukun Date: 2016-03-22T12:44:34Z Increase probability of using cached serialized status
[GitHub] spark pull request: [SPARK-13652][Core]Copy ByteBuffer in sendRpcS...
Github user viper-kun commented on the pull request: https://github.com/apache/spark/pull/11499#issuecomment-192824479 @zsxwing It is ok.
[GitHub] spark pull request: remove unnecessary copy
Github user viper-kun closed the pull request at: https://github.com/apache/spark/pull/10040
[GitHub] spark pull request: remove unnecessary copy
Github user viper-kun commented on the pull request: https://github.com/apache/spark/pull/10040#issuecomment-160824966 OK, closing it.
[GitHub] spark pull request: remove unnecessary copy
GitHub user viper-kun opened a pull request: https://github.com/apache/spark/pull/10040 remove unnecessary copy You can merge this pull request into a Git repository by running: $ git pull https://github.com/viper-kun/spark patch-1 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/10040.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #10040 commit 7f665bb5735923eccc219ea0b03b8f914230c474 Author: xukun Date: 2015-11-30T12:45:50Z remove unnecessary copy
[GitHub] spark pull request: [SPARK-11225]Prevent generate empty file
Github user viper-kun closed the pull request at: https://github.com/apache/spark/pull/9191
[GitHub] spark pull request: [SPARK-11225]Prevent generate empty file
Github user viper-kun commented on the pull request: https://github.com/apache/spark/pull/9191#issuecomment-154050289 OK, closing it.
[GitHub] spark pull request: [SPARK-11225]Prevent generate empty file
Github user viper-kun commented on the pull request: https://github.com/apache/spark/pull/9191#issuecomment-151385025 @JoshRosen Is this OK? If it doesn't work, I will close this PR.
[GitHub] spark pull request: [SPARK-11225]Prevent generate empty file
Github user viper-kun commented on the pull request: https://github.com/apache/spark/pull/9191#issuecomment-150747910 @davies This test is flaky; please re-test it.
[GitHub] spark pull request: [SPARK-11225]Prevent generate empty file
Github user viper-kun commented on the pull request: https://github.com/apache/spark/pull/9191#issuecomment-150176502 Hi @davies, sorry, I do not understand Python. Can you help me fix it?
[GitHub] spark pull request: [SPARK-11225]Prevent generate empty file
Github user viper-kun commented on the pull request: https://github.com/apache/spark/pull/9191#issuecomment-150076597 @JoshRosen In my test environment, this error does not occur. Please retest it. Thanks!
[GitHub] spark pull request: [SPARK-11225]Prevent generate empty file
Github user viper-kun commented on the pull request: https://github.com/apache/spark/pull/9191#issuecomment-149825404 Thanks @JoshRosen. Sorry, I haven't run a performance test. As far as I know, it reduces the number of open files; when there are many empty files, it yields some benefit.
[GitHub] spark pull request: [SPARK-11225]Prevent generate empty file
GitHub user viper-kun opened a pull request: https://github.com/apache/spark/pull/9191 [SPARK-11225]Prevent generate empty file If no data is written into a bucket, an empty file is generated. So open() should be called only on the first write(key, value). You can merge this pull request into a Git repository by running: $ git pull https://github.com/viper-kun/spark apache Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/9191.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #9191 commit 9887ef1dbcd39ab796c77239d5bba0da5e9ba3ab Author: x00228947 Date: 2015-10-21T02:30:01Z remove invalid code
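The lazy-open idea above can be sketched as a writer that defers opening the underlying file until the first write (a hypothetical Python analogue, not Spark's Scala block writer; `LazyBlockWriter` and its methods are illustrative):

```python
import os
import tempfile

class LazyBlockWriter:
    """Sketch: open the output file only on the first write, so buckets
    that receive no records never leave an empty file on disk."""

    def __init__(self, path):
        self.path = path
        self._file = None

    def write(self, key, value):
        if self._file is None:            # first write: open lazily
            self._file = open(self.path, "w")
        self._file.write(f"{key}\t{value}\n")

    def close(self):
        if self._file is not None:
            self._file.close()

tmp = tempfile.mkdtemp()
empty = LazyBlockWriter(os.path.join(tmp, "bucket-0"))
used = LazyBlockWriter(os.path.join(tmp, "bucket-1"))
used.write("k", "v")
empty.close()
used.close()
print(os.path.exists(empty.path), os.path.exists(used.path))  # False True
```

An eager writer would call `open()` in the constructor and leave `bucket-0` behind as a zero-byte file; the lazy version never touches the filesystem for that bucket.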
[GitHub] spark pull request: correct buffer size
Github user viper-kun commented on the pull request: https://github.com/apache/spark/pull/8189#issuecomment-130988766 @liancheng @scwf is it OK?
[GitHub] spark pull request: correct buffer size
GitHub user viper-kun opened a pull request: https://github.com/apache/spark/pull/8189 correct buffer size No need to multiply by columnType.defaultSize here; we have already done that in the ColumnBuilder class: buffer = ByteBuffer.allocate(4 + size * columnType.defaultSize) You can merge this pull request into a Git repository by running: $ git pull https://github.com/viper-kun/spark errorSize Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/8189.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #8189 commit 6741f239052c72663b55b3d398db3b26d93ae2e3 Author: x00228947 Date: 2015-08-14T04:25:07Z recorrect buffer size
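The over-allocation is easy to see with concrete numbers (an illustrative sketch with made-up values, not the actual Spark code): if the buffer is already sized as `4 + size * defaultSize`, multiplying by `defaultSize` a second time inflates the allocation by a factor of `defaultSize`.

```python
def correct_buffer_size(size, default_size):
    # What ColumnBuilder already computes: a 4-byte header plus
    # `size` elements of `default_size` bytes each.
    return 4 + size * default_size

def buggy_buffer_size(size, default_size):
    # Multiplying by default_size again over-allocates roughly
    # default_size times too much memory.
    return 4 + (size * default_size) * default_size

size, default_size = 1000, 8   # e.g. 1000 LONG values of 8 bytes each
print(correct_buffer_size(size, default_size))  # 8004
print(buggy_buffer_size(size, default_size))    # 64004
```

For an 8-byte column type the redundant multiplication asks for roughly 8x the memory actually needed, which is why the patch removes it.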
[GitHub] spark pull request: [Spark-6924]Fix driver hangs in yarn-client mo...
Github user viper-kun closed the pull request at: https://github.com/apache/spark/pull/5523
[GitHub] spark pull request: [Spark-6924]Fix driver hangs in yarn-client mo...
Github user viper-kun commented on the pull request: https://github.com/apache/spark/pull/5523#issuecomment-95764614 ok. I will close it.
[GitHub] spark pull request: [SPARK-6521][Core]executors in the same node r...
Github user viper-kun commented on the pull request: https://github.com/apache/spark/pull/5178#issuecomment-94250246 @maropu I will update it.
[GitHub] spark pull request: [SPARK-6521][Core]executors in the same node r...
Github user viper-kun commented on a diff in the pull request: https://github.com/apache/spark/pull/5178#discussion_r28650737 --- Diff: core/src/main/scala/org/apache/spark/shuffle/FileShuffleBlockManager.scala ---
```
@@ -180,7 +180,8 @@ class FileShuffleBlockManager(conf: SparkConf)
     Some(segment.nioByteBuffer())
   }
-  override def getBlockData(blockId: ShuffleBlockId): ManagedBuffer = {
+  override def getBlockData(blockId: ShuffleBlockId,
+      blockManagerId: BlockManagerId = blockManager.blockManagerId): ManagedBuffer = {
```
--- End diff -- It does not support HashShuffle with consolidateShuffleFiles. To avoid confusion, it only supports SortShuffleManager.
[GitHub] spark pull request: [Spark-6924]Fix driver hangs in yarn-client mo...
Github user viper-kun commented on the pull request: https://github.com/apache/spark/pull/5523#issuecomment-93728497 During construction, it is normal not to hear from executors; only after construction have executors connected and started sending heartbeats to the driver, so only then can we tell whether there is a disconnection. SparkContext.isInited shows whether SparkContext construction has completed. >>> I don't think you can call stop() just because you didn't hear from executors recently. Is there any better way?
[GitHub] spark pull request: [Spark-6924]Fix driver hangs in yarn-client mo...
Github user viper-kun commented on a diff in the pull request: https://github.com/apache/spark/pull/5523#discussion_r28502410 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
```
@@ -75,6 +75,8 @@ import org.apache.spark.util._
  */
 class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationClient {
+  var isInited: Boolean = false
```
--- End diff -- >>> I also don't think that a lack of executor messages indicates a disconnection; it's not possible to distinguish from temporary loss of connectivity this way. All executors send heartbeats to the driver at a fixed rate. If, over a period of time, all executors have expired, I think there is a disconnection. Is there a better way to distinguish this from a temporary loss of connectivity? Could we check several times, and declare a disconnection only if all executors are still expired?
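The suggestion of re-checking a few times before treating expired executors as a real disconnection could be sketched like this (hypothetical Python, not Spark's actual HeartbeatReceiver API; the class name, `confirmations` threshold, and method names are illustrative). It also declares no disconnection while the executor map is empty, which sidesteps the empty-executorLastSeen case discussed in this thread:

```python
class HeartbeatMonitor:
    """Sketch: declare a disconnection only after all executors have been
    expired for several consecutive checks, tolerating temporary blips."""

    def __init__(self, timeout_s=120.0, confirmations=3):
        self.timeout_s = timeout_s
        self.confirmations = confirmations
        self.last_seen = {}   # executor_id -> last heartbeat timestamp
        self._strikes = 0

    def heartbeat(self, executor_id, now):
        self.last_seen[executor_id] = now
        self._strikes = 0     # any heartbeat resets the suspicion counter

    def check(self, now):
        """Return True only when every known executor has been silent
        for `confirmations` consecutive checks."""
        all_expired = bool(self.last_seen) and all(
            now - t > self.timeout_s for t in self.last_seen.values())
        self._strikes = self._strikes + 1 if all_expired else 0
        return self._strikes >= self.confirmations

m = HeartbeatMonitor(timeout_s=10, confirmations=3)
m.heartbeat("exec-1", now=0)
print(m.check(now=20), m.check(now=30), m.check(now=40))  # False False True
```

A single expired check (a temporary blip) never triggers a stop; only sustained silence across the configured number of checks does.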
[GitHub] spark pull request: [Spark-6924]Fix driver hangs in yarn-client mo...
Github user viper-kun commented on a diff in the pull request: https://github.com/apache/spark/pull/5523#discussion_r28501942 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -75,6 +75,8 @@ import org.apache.spark.util._ */ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationClient { + var isInited: Boolean = false --- End diff -- >>> This is going to conflict with the overhaul of the SparkContext constructor. I don't see why this depends on the constructor finishing since where you reference the SparkContext, it has been constructed. In the SparkContext constructor, when the HeartbeatReceiver is created, timeoutCheckingThread checks for and expires dead hosts. If executorLastSeen is empty, it executes sc.stop(), which then throws:
java.lang.NullPointerException
at org.apache.spark.SparkContext.stop(SparkContext.scala:1416)
at org.apache.spark.HeartbeatReceiver.org$apache$spark$HeartbeatReceiver$$expireDeadHosts(HeartbeatReceiver.scala:134)
at org.apache.spark.HeartbeatReceiver$$anonfun$receive$1.applyOrElse(HeartbeatReceiver.scala:92)
at org.apache.spark.rpc.akka.AkkaRpcEnv.org$apache$spark$rpc$akka$AkkaRpcEnv$$processMessage(AkkaRpcEnv.scala:176)
at org.apache.spark.rpc.akka.AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1$$anonfun$receiveWithLogging$1$$anonfun$applyOrElse$4.apply$mcV$sp(AkkaRpcEnv.scala:125)
at org.apache.spark.rpc.akka.AkkaRpcEnv.org$apache$spark$rpc$akka$AkkaRpcEnv$$safelyCall(AkkaRpcEnv.scala:196)
at org.apache.spark.rpc.akka.AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1$$anonfun$receiveWithLogging$1.applyOrElse(AkkaRpcEnv.scala:124)
at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
at org.apache.spark.util.ActorLogReceive$$anon$1.apply(ActorLogReceive.scala:53)
at org.apache.spark.util.ActorLogReceive$$anon$1.apply(ActorLogReceive.scala:42)
at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
at org.apache.spark.util.ActorLogReceive$$anon$1.applyOrElse(ActorLogReceive.scala:42)
at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
at org.apache.spark.rpc.akka.AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1.aroundReceive(AkkaRpcEnv.scala:91)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
at akka.actor.ActorCell.invoke(ActorCell.scala:487)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
at akka.dispatch.Mailbox.run(Mailbox.scala:220)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
15/04/16 15:12:16 INFO netty.NettyBlockTransferService: Server created on 53493
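One way to avoid this NPE is sketched below: guard the stop path behind an initialization flag so that an expiry check firing during construction becomes a no-op. This is a minimal, self-contained Java model with invented names (SafeContext, tryStop), not the actual Spark fix.

```java
// Hypothetical model: the flag plays the role of SparkContext.isInited
// from the diff above, and tryStop() is what the expiry check would call
// instead of invoking stop() directly.
class SafeContext {
    volatile boolean isInited = false;   // set true at the end of construction
    boolean stopped = false;

    void finishConstruction() { isInited = true; }

    /** Returns whether stop actually ran. */
    boolean tryStop() {
        if (!isInited) {
            return false;   // still constructing: skip stop() to avoid the NPE above
        }
        stopped = true;     // stand-in for the real sc.stop()
        return true;
    }
}
```

With this guard, the timeout-checking thread created inside the constructor can run its first rounds safely before the context is fully built.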
[GitHub] spark pull request: [Spark-6924]Fix driver hangs in yarn-client mo...
Github user viper-kun commented on the pull request: https://github.com/apache/spark/pull/5523#issuecomment-93615543 @srowen I updated the JIRA. Please review it. Thanks.
[GitHub] spark pull request: [Spark-6924]Fix client hangs in yarn-client mo...
GitHub user viper-kun opened a pull request: https://github.com/apache/spark/pull/5523 [Spark-6924]Fix client hangs in yarn-client mode when net is broken https://issues.apache.org/jira/browse/SPARK-6924 You can merge this pull request into a Git repository by running: $ git pull https://github.com/viper-kun/spark spark-6924 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/5523.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #5523 commit 9288f220a6b7e4ea0d938a2ec4fbfb201de1aa71 Author: xukun 00228947 Date: 2015-04-15T09:40:28Z fix client hands in yarn-client mode
[GitHub] spark pull request: [SPARK-6879][HistoryServer]check if app is com...
Github user viper-kun commented on the pull request: https://github.com/apache/spark/pull/5491#issuecomment-92578144 I think it is OK. The user must call sc.stop(); if they don't, some event logs just won't be deleted.
[GitHub] spark pull request: [SPARK-6479][Block Manager]Create off-heap blo...
Github user viper-kun commented on a diff in the pull request: https://github.com/apache/spark/pull/5430#discussion_r28030832 --- Diff: core/src/main/scala/org/apache/spark/storage/OffHeapStore.scala --- @@ -0,0 +1,142 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.storage + +import java.nio.ByteBuffer +import org.apache.spark.Logging +import org.apache.spark.util.Utils + +import scala.util.control.NonFatal + + +/** + * Stores BlockManager blocks on OffHeap. 
+ * We capture any potential exception from underlying implementation + * and return with the expected failure value + */ +private[spark] class OffHeapStore(blockManager: BlockManager, executorId: String) + extends BlockStore(blockManager: BlockManager) with Logging { + + lazy val offHeapManager: Option[OffHeapBlockManager] = +OffHeapBlockManager.create(blockManager, executorId) + + logInfo("OffHeap started") + + override def getSize(blockId: BlockId): Long = { +try { + offHeapManager.map(_.getSize(blockId)).getOrElse(0) +} catch { + case NonFatal(t) => logError(s"error in getSize from $blockId") +0 +} + } + + override def putBytes(blockId: BlockId, bytes: ByteBuffer, level: StorageLevel): PutResult = { +putIntoOffHeapStore(blockId, bytes, returnValues = true) + } + + override def putArray( + blockId: BlockId, + values: Array[Any], + level: StorageLevel, + returnValues: Boolean): PutResult = { +putIterator(blockId, values.toIterator, level, returnValues) + } + + override def putIterator( + blockId: BlockId, + values: Iterator[Any], + level: StorageLevel, + returnValues: Boolean): PutResult = { +logDebug(s"Attempting to write values for block $blockId") +val bytes = blockManager.dataSerialize(blockId, values) +putIntoOffHeapStore(blockId, bytes, returnValues) + } + + private def putIntoOffHeapStore( + blockId: BlockId, + bytes: ByteBuffer, + returnValues: Boolean): PutResult = { + + // So that we do not modify the input offsets ! --- End diff -- style: two stray blank lines here.
[GitHub] spark pull request: [SPARK-6479][Block Manager]Create off-heap blo...
Github user viper-kun commented on a diff in the pull request: https://github.com/apache/spark/pull/5430#discussion_r28030751 --- Diff: core/src/main/scala/org/apache/spark/storage/OffHeapBlockManager.scala --- @@ -0,0 +1,110 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.storage + +import java.nio.ByteBuffer +import org.apache.spark.Logging + +import scala.util.control.NonFatal + + +trait OffHeapBlockManager { + + /** + * desc for the implementation. + * + */ + def desc(): String = {"OffHeap"} + + /** + * initialize a concrete block manager implementation. + * + * @throws java.io.IOException when FS init failure. + */ + def init(blockManager: BlockManager, executorId: String) + + /** + * remove the cache from offheap + * + * @throws java.io.IOException when FS failure in removing file. + */ + def removeFile(blockId: BlockId): Boolean + + /** + * check the existence of the block cache + * + * @throws java.io.IOException when FS failure in checking the block existence. + */ + def fileExists(blockId: BlockId): Boolean + + /** + * save the cache to the offheap. + * + * @throws java.io.IOException when FS failure in put blocks. 
+ */ + def putBytes(blockId: BlockId, bytes: ByteBuffer) + + /** + * retrieve the cache from offheap + * + * @throws java.io.IOException when FS failure in get blocks. + */ + def getBytes(blockId: BlockId): Option[ByteBuffer] + + /** + * retrieve the size of the cache + * + * @throws java.io.IOException when FS failure in get block size. + */ + def getSize(blockId: BlockId): Long + + /** + * cleanup when shutdown + * + */ + def addShutdownHook() + + final def setup(blockManager: BlockManager, executorId: String): Unit = { +init(blockManager, executorId) +addShutdownHook() + } +} + +object OffHeapBlockManager extends Logging{ + val MAX_DIR_CREATION_ATTEMPTS = 10 + val subDirsPerDir = 64 + def create(blockManager: BlockManager, + executorId: String): Option[OffHeapBlockManager] = { + val sNames = blockManager.conf.getOption("spark.offHeapStore.blockManager") --- End diff -- I think a default value is necessary.
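The "default value" suggestion could look like the sketch below: read the same key as the diff above, but fall back when it is unset. The fallback class name here is purely hypothetical, not a real Spark default, and plain java.util.Properties stands in for SparkConf.

```java
import java.util.Properties;

// Sketch only: in the real code this would go through SparkConf, and the
// default implementation name would be whatever the project chose.
class OffHeapConf {
    // Hypothetical default, for illustration.
    static final String DEFAULT_IMPL = "org.apache.spark.storage.DefaultOffHeapBlockManager";

    static String blockManagerClass(Properties conf) {
        String name = conf.getProperty("spark.offHeapStore.blockManager");
        return (name != null && !name.isEmpty()) ? name : DEFAULT_IMPL;
    }
}
```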
[GitHub] spark pull request: [SPARK-6521][Core]executors in the same node r...
Github user viper-kun commented on the pull request: https://github.com/apache/spark/pull/5178#issuecomment-88064150 @andrewor14 Please review it.
[GitHub] spark pull request: [SPARK-6521][Core]executors in the same node r...
Github user viper-kun commented on the pull request: https://github.com/apache/spark/pull/5178#issuecomment-86849484 Hi @andrewor14, please retest it; the test build timed out.
[GitHub] spark pull request: [SPARK-6521][Core]executors in the same node r...
Github user viper-kun commented on a diff in the pull request: https://github.com/apache/spark/pull/5178#discussion_r27193887 --- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala --- @@ -181,68 +181,100 @@ final class ShuffleBlockFetcherIterator( // at most maxBytesInFlight in order to limit the amount of data in flight. val remoteRequests = new ArrayBuffer[FetchRequest] +val shuffleMgrName = SparkEnv.get.conf.get("spark.shuffle.manager", "sort") + // Tracks total number of blocks (including zero sized blocks) var totalBlocks = 0 -for ((address, blockInfos) <- blocksByAddress) { - totalBlocks += blockInfos.size - if (address.executorId == blockManager.blockManagerId.executorId) { -// Filter out zero-sized blocks -localBlocks ++= blockInfos.filter(_._2 != 0).map(_._1) -numBlocksToFetch += localBlocks.size - } else { -val iterator = blockInfos.iterator -var curRequestSize = 0L -var curBlocks = new ArrayBuffer[(BlockId, Long)] -while (iterator.hasNext) { - val (blockId, size) = iterator.next() - // Skip empty blocks - if (size > 0) { -curBlocks += ((blockId, size)) -remoteBlocks += blockId -numBlocksToFetch += 1 -curRequestSize += size - } else if (size < 0) { -throw new BlockException(blockId, "Negative block size " + size) - } - if (curRequestSize >= targetRequestSize) { -// Add this FetchRequest -remoteRequests += new FetchRequest(address, curBlocks) -curBlocks = new ArrayBuffer[(BlockId, Long)] -logDebug(s"Creating fetch request of $curRequestSize at $address") -curRequestSize = 0 - } +if(shuffleMgrName == "hash") { --- End diff -- What do you mean? How should the full path be specified?
[GitHub] spark pull request: [SPARK-6521][Core]executors in the same node r...
GitHub user viper-kun opened a pull request: https://github.com/apache/spark/pull/5178 [SPARK-6521][Core]executors in the same node read local shuffle file Previously, an executor read another executor's shuffle files on the same node over the network. This PR makes executors on the same node read shuffle files locally in sort-based shuffle, which reduces network transfer. You can merge this pull request into a Git repository by running: $ git pull https://github.com/viper-kun/spark readlocal Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/5178.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #5178 commit 28a96ec3fe2e9e6adadedddc6e74b2c1ef667b00 Author: xukun 00228947 Date: 2015-03-25T02:26:33Z read local commit e6d1e0b62e248714042a1f1d463f9ab26b018516 Author: xukun 00228947 Date: 2015-03-25T02:46:08Z read local commit 5b766ca3f02d2e9b213e39908aec9c6b5804b623 Author: xukun 00228947 Date: 2015-03-25T02:49:02Z fix it
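The split this PR describes can be sketched as: group shuffle blocks by whether their executor runs on this host, and fetch only the off-host ones over the network. This is illustrative Java with invented names, not the actual ShuffleBlockFetcherIterator change.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model: blockToHost maps a shuffle block id to the host of
// the executor that wrote it. Blocks on myHost are read from local disk;
// the rest become remote fetch requests.
class BlockSplitter {
    static Map<String, List<String>> split(Map<String, String> blockToHost, String myHost) {
        Map<String, List<String>> out = new HashMap<>();
        out.put("local", new ArrayList<>());
        out.put("remote", new ArrayList<>());
        for (Map.Entry<String, String> e : blockToHost.entrySet()) {
            String bucket = e.getValue().equals(myHost) ? "local" : "remote";
            out.get(bucket).add(e.getKey());
        }
        return out;
    }
}
```

The real change additionally has to locate the shuffle file on disk for same-host blocks; this sketch only shows the host-based partitioning step.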
[GitHub] spark pull request: [SPARK-3562]Periodic cleanup event logs
Github user viper-kun commented on the pull request: https://github.com/apache/spark/pull/4214#issuecomment-76109919 @andrewor14 thanks for checking. Please retest it; I cannot get the test log.
[GitHub] spark pull request: [SPARK-5661]function hasShutdownDeleteTachyonD...
Github user viper-kun commented on the pull request: https://github.com/apache/spark/pull/4418#issuecomment-73452955 @srowen it is a static function and is currently unused. I think we had better leave it in the Utils class for now.
[GitHub] spark pull request: [SPARK-5661]function hasShutdownDeleteTachyonD...
Github user viper-kun commented on the pull request: https://github.com/apache/spark/pull/4418#issuecomment-73341954 OK, I will create a JIRA.
[GitHub] spark pull request: Remove unused function
Github user viper-kun commented on the pull request: https://github.com/apache/spark/pull/4418#issuecomment-73190743 @srowen OK. If it is useful later, we should change it like this:

def hasShutdownDeleteTachyonDir(file: TachyonFile): Boolean = {
  val absolutePath = file.getPath()
  shutdownDeleteTachyonPaths.synchronized {
    shutdownDeleteTachyonPaths.contains(absolutePath)
  }
}
[GitHub] spark pull request: Remove unused function
Github user viper-kun commented on the pull request: https://github.com/apache/spark/pull/4418#issuecomment-73179289 cc @haoyuan
[GitHub] spark pull request: remove unused function
GitHub user viper-kun opened a pull request: https://github.com/apache/spark/pull/4418 remove unused function hasShutdownDeleteTachyonDir(file: TachyonFile) should use shutdownDeleteTachyonPaths (not shutdownDeletePaths) to determine whether it contains the file. To resolve this, delete the two unused functions. You can merge this pull request into a Git repository by running: $ git pull https://github.com/viper-kun/spark deleteunusedfun Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/4418.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #4418 commit 2bc397edf3793cc8f3e13b1d7a3f31efb7e8f9c2 Author: xukun 00228947 Date: 2015-02-06T03:26:34Z deleteunusedfun
[GitHub] spark pull request: [SPARK-5526][SQL] fix issue about cast to date
Github user viper-kun closed the pull request at: https://github.com/apache/spark/pull/4307
[GitHub] spark pull request: [SPARK-5526][SQL] fix issue about cast to date
Github user viper-kun commented on the pull request: https://github.com/apache/spark/pull/4307#issuecomment-72777528 Thanks, the issue is fixed by #4325. I will close this PR.
[GitHub] spark pull request: [SPARK-5526][SQL] fix issue about cast to date
GitHub user viper-kun opened a pull request: https://github.com/apache/spark/pull/4307 [SPARK-5526][SQL] fix issue about cast to date You can merge this pull request into a Git repository by running: $ git pull https://github.com/viper-kun/spark fixcastissue Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/4307.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #4307 commit 7280d6cc44fff8b18b1e06455da425ce0cd3804f Author: xukun 00228947 Date: 2015-02-02T09:31:09Z Throw valid precision more than day
[GitHub] spark pull request: [SPARK-3562]Periodic cleanup event logs
Github user viper-kun commented on a diff in the pull request: https://github.com/apache/spark/pull/4214#discussion_r23750788 --- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala --- @@ -53,8 +79,10 @@ private[history] class FsHistoryProvider(conf: SparkConf) extends ApplicationHis private val fs = Utils.getHadoopFileSystem(logDir, SparkHadoopUtil.get.newConfiguration(conf)) - // A timestamp of when the disk was last accessed to check for log updates - private var lastLogCheckTimeMs = -1L + // The schedule thread pool size must be one, otherwise it will have concurrent issues about fs + // and applications between check task and clean task. --- End diff -- Sorry, I don't understand what you mean.
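The invariant the diff's comment relies on (a pool of size one serializes the periodic check and clean tasks, so they never race on shared state) can be demonstrated with a small self-contained Java experiment; the task bodies here are stand-ins, not FsHistoryProvider code.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Two periodic tasks on a size-one scheduled pool: if they ever ran
// concurrently, the compareAndSet below would fail and overlapSeen
// would flip to true. With a single worker thread, it never does.
class SingleThreadPoolDemo {
    static final AtomicBoolean inTask = new AtomicBoolean(false);
    static final AtomicBoolean overlapSeen = new AtomicBoolean(false);

    static void task() {
        if (!inTask.compareAndSet(false, true)) {
            overlapSeen.set(true);   // two tasks running at once
        }
        try { Thread.sleep(5); } catch (InterruptedException ignored) { }
        inTask.set(false);
    }

    /** Runs the experiment and reports whether the tasks stayed serialized. */
    static boolean ranSerialized() throws InterruptedException {
        ScheduledExecutorService pool = Executors.newScheduledThreadPool(1);
        pool.scheduleAtFixedRate(SingleThreadPoolDemo::task, 0, 10, TimeUnit.MILLISECONDS);
        pool.scheduleAtFixedRate(SingleThreadPoolDemo::task, 5, 10, TimeUnit.MILLISECONDS);
        Thread.sleep(200);
        pool.shutdownNow();
        return !overlapSeen.get();
    }
}
```

Growing the pool beyond one thread would require explicit locking around the shared fs and applications state instead.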
[GitHub] spark pull request: [SPARK-3562]Periodic cleanup event logs
Github user viper-kun commented on a diff in the pull request: https://github.com/apache/spark/pull/4214#discussion_r23750759 --- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala --- @@ -113,12 +129,12 @@ private[history] class FsHistoryProvider(conf: SparkConf) extends ApplicationHis "Logging directory specified is not a directory: %s".format(logDir)) } -checkForLogs() +// A task that periodically checks for event log updates on disk. +pool.scheduleAtFixedRate(getRunner(checkForLogs), 0, UPDATE_INTERVAL_MS, TimeUnit.MILLISECONDS) -// Disable the background thread during tests. -if (!conf.contains("spark.testing")) { --- End diff -- I got it.
[GitHub] spark pull request: [SPARK-3562]Periodic cleanup event logs
Github user viper-kun closed the pull request at: https://github.com/apache/spark/pull/2471
[GitHub] spark pull request: [SPARK-3562]Periodic cleanup event logs
Github user viper-kun commented on a diff in the pull request: https://github.com/apache/spark/pull/4214#discussion_r23750085 --- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala --- @@ -163,9 +179,6 @@ private[history] class FsHistoryProvider(conf: SparkConf) extends ApplicationHis * applications that haven't been updated since last time the logs were checked. */ private[history] def checkForLogs(): Unit = { -lastLogCheckTimeMs = getMonotonicTimeMs() -logDebug("Checking for logs. Time is now %d.".format(lastLogCheckTimeMs)) --- End diff -- The checkForLogs task is now scheduled by the pool, so lastLogCheckTimeMs is no longer used and has been removed.
[GitHub] spark pull request: [SPARK-3562]Periodic cleanup event logs
Github user viper-kun commented on the pull request: https://github.com/apache/spark/pull/2471#issuecomment-71575458 I have filed a new PR: #4214
[GitHub] spark pull request: [SPARK-3562]Periodic cleanup event logs
GitHub user viper-kun opened a pull request: https://github.com/apache/spark/pull/4214 [SPARK-3562]Periodic cleanup event logs You can merge this pull request into a Git repository by running: $ git pull https://github.com/viper-kun/spark cleaneventlog Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/4214.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #4214 commit adcfe869863cd5f73b06143d41f138ea3b8d145f Author: xukun 00228947 Date: 2015-01-27T01:30:41Z Periodic cleanup event logs
[GitHub] spark pull request: [SPARK-3266] [WIP] Remove implementations from...
Github user viper-kun commented on the pull request: https://github.com/apache/spark/pull/2951#issuecomment-63768467 Are you still working on this? I am working on it.
[GitHub] spark pull request: invalid variable
GitHub user viper-kun opened a pull request: https://github.com/apache/spark/pull/3154 invalid variable You can merge this pull request into a Git repository by running: $ git pull https://github.com/viper-kun/spark master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3154.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3154 commit f5bde61e4597e89fcadbec73a7d28c3ccf2ac569 Author: viper-kun Date: 2014-11-07T09:15:19Z invalid variable
[GitHub] spark pull request: [SPARK-3562]Periodic cleanup event logs
Github user viper-kun commented on the pull request: https://github.com/apache/spark/pull/2471#issuecomment-60541215 @vanzin @andrewor14 @srowen, is it OK to go?
[GitHub] spark pull request: [SPARK-3562]Periodic cleanup event logs
Github user viper-kun commented on the pull request: https://github.com/apache/spark/pull/2471#issuecomment-60027733 @vanzin @andrewor14, is it OK to go?
[GitHub] spark pull request: [SPARK-3562]Periodic cleanup event logs
Github user viper-kun commented on the pull request: https://github.com/apache/spark/pull/2471#issuecomment-59648648 @vanzin, is it OK to go?
[GitHub] spark pull request: [SPARK-3562]Periodic cleanup event logs
Github user viper-kun commented on the pull request: https://github.com/apache/spark/pull/2471#issuecomment-59164199 @vanzin, is it OK to go?
[GitHub] spark pull request: [SPARK-3562]Periodic cleanup event logs
Github user viper-kun commented on the pull request: https://github.com/apache/spark/pull/2471#issuecomment-59038886 In my opinion, Spark creates the event log data, so Spark should delete it. In Hadoop, the event log is deleted by the JobHistoryServer, not by the file system.
[GitHub] spark pull request: [SPARK-3562]Periodic cleanup event logs
Github user viper-kun commented on a diff in the pull request: https://github.com/apache/spark/pull/2471#discussion_r18763243 --- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala --- @@ -214,6 +224,43 @@ private[history] class FsHistoryProvider(conf: SparkConf) extends ApplicationHis } --- End diff -- @vanzin sorry, I do not understand what you mean. Do you mean that we should not throw Throwable?
[GitHub] spark pull request: [SPARK-3562]Periodic cleanup event logs
Github user viper-kun commented on a diff in the pull request: https://github.com/apache/spark/pull/2471#discussion_r18741084 --- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala --- @@ -195,22 +241,68 @@ private[history] class FsHistoryProvider(conf: SparkConf) extends ApplicationHis } } -val newIterator = logInfos.iterator.buffered -val oldIterator = applications.values.iterator.buffered -while (newIterator.hasNext && oldIterator.hasNext) { - if (newIterator.head.endTime > oldIterator.head.endTime) { -addIfAbsent(newIterator.next) - } else { -addIfAbsent(oldIterator.next) +applications.synchronized { + val newIterator = logInfos.iterator.buffered + val oldIterator = applications.values.iterator.buffered + while (newIterator.hasNext && oldIterator.hasNext) { +if (newIterator.head.endTime > oldIterator.head.endTime) { + addIfAbsent(newIterator.next) +} else { + addIfAbsent(oldIterator.next) +} } + newIterator.foreach(addIfAbsent) + oldIterator.foreach(addIfAbsent) + + applications = newApps } -newIterator.foreach(addIfAbsent) -oldIterator.foreach(addIfAbsent) + } +} catch { + case t: Throwable => logError("Exception in checking for event log updates", t) --- End diff -- You mean: don't catch Throwable? What should we do instead?
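For readers following the Throwable discussion above: a concrete reason to catch exceptions inside a periodic task is that a JVM `ScheduledExecutorService` silently cancels all future executions of a task after its first uncaught exception. A minimal Java sketch (class and method names invented for this illustration; this is not Spark's code) showing the difference between swallowing and rethrowing:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class KeepTaskAlive {
    // Runs a periodic task that fails on every execution for ~300 ms and
    // returns how many times it actually ran.
    public static int runsWithCatch(boolean catchErrors) throws InterruptedException {
        ScheduledExecutorService pool = Executors.newSingleThreadScheduledExecutor();
        AtomicInteger runs = new AtomicInteger();
        pool.scheduleAtFixedRate(() -> {
            runs.incrementAndGet();
            try {
                // Simulate checkForLogs()/cleanLogs() failing, e.g. an I/O
                // error surfacing as a RuntimeException.
                throw new RuntimeException("simulated failure");
            } catch (RuntimeException t) {
                if (!catchErrors) {
                    throw t; // uncaught: the executor cancels all future runs
                }
                // caught: log and keep the periodic task alive
                // (the PR catches Throwable; RuntimeException suffices here)
            }
        }, 0, 50, TimeUnit.MILLISECONDS);
        Thread.sleep(300);
        pool.shutdownNow();
        return runs.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("with catch:    " + runsWithCatch(true) + " runs");
        System.out.println("without catch: " + runsWithCatch(false) + " run(s)");
    }
}
```

Without the catch, the task runs exactly once; with it, the task keeps firing despite failing each time.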
[GitHub] spark pull request: [SPARK-3562]Periodic cleanup event logs
Github user viper-kun commented on a diff in the pull request: https://github.com/apache/spark/pull/2471#discussion_r18740169 --- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala --- @@ -195,22 +241,68 @@ private[history] class FsHistoryProvider(conf: SparkConf) extends ApplicationHis } } -val newIterator = logInfos.iterator.buffered -val oldIterator = applications.values.iterator.buffered -while (newIterator.hasNext && oldIterator.hasNext) { - if (newIterator.head.endTime > oldIterator.head.endTime) { -addIfAbsent(newIterator.next) - } else { -addIfAbsent(oldIterator.next) +applications.synchronized { + val newIterator = logInfos.iterator.buffered + val oldIterator = applications.values.iterator.buffered + while (newIterator.hasNext && oldIterator.hasNext) { +if (newIterator.head.endTime > oldIterator.head.endTime) { + addIfAbsent(newIterator.next) +} else { + addIfAbsent(oldIterator.next) +} } + newIterator.foreach(addIfAbsent) + oldIterator.foreach(addIfAbsent) + + applications = newApps } -newIterator.foreach(addIfAbsent) -oldIterator.foreach(addIfAbsent) + } +} catch { + case t: Throwable => logError("Exception in checking for event log updates", t) +} + } + + /** + * Deleting apps if setting cleaner. + */ + private def cleanLogs() = { +lastLogCleanTimeMs = getMonotonicTimeMs() +logDebug("Cleaning logs. Time is now %d.".format(lastLogCleanTimeMs)) +try { + val logStatus = fs.listStatus(new Path(resolvedLogDir)) + val logDirs = if (logStatus != null) logStatus.filter(_.isDir).toSeq else Seq[FileStatus]() + val maxAge = conf.getLong("spark.history.fs.maxAge.seconds", +DEFAULT_SPARK_HISTORY_FS_MAXAGE_S) * 1000 + + val now = System.currentTimeMillis() + fs.synchronized { +// scan all logs from the log directory. +// Only directories older than this many seconds will be deleted . +logDirs.foreach { dir => + // history file older than this many seconds will be deleted + // when the history cleaner runs. 
+ if (now - getModificationTime(dir) > maxAge) { +fs.delete(dir.getPath, true) --- End diff -- Can you tell me the detailed reason for adding a try..catch around fs.delete? I think the exception will already be caught by the outer try..catch (line 271).
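The cleanup logic under review — list the entries of the log directory and delete those whose modification time is older than maxAge — can be sketched in Java with `java.nio.file`. This is a hypothetical helper mirroring the shape of `cleanLogs()` in the diff, not Spark's code; the recursive delete of `fs.delete(path, true)` is simplified to flat files:

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.util.ArrayList;
import java.util.List;

public class LogCleaner {
    // Deletes entries of logDir whose modification time is older than
    // maxAgeMs; returns the names of the deleted entries.
    public static List<String> cleanOldLogs(Path logDir, long maxAgeMs) throws IOException {
        long now = System.currentTimeMillis();
        List<String> deleted = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(logDir)) {
            for (Path entry : entries) {
                long modified = Files.getLastModifiedTime(entry).toMillis();
                if (now - modified > maxAgeMs) {
                    Files.delete(entry); // flat file; real code recurses into dirs
                    deleted.add(entry.getFileName().toString());
                }
            }
        }
        return deleted;
    }

    // Demo: one backdated log gets cleaned, a fresh one survives.
    public static List<String> demo() throws IOException {
        Path dir = Files.createTempDirectory("eventlogs");
        Path old = Files.createFile(dir.resolve("app-old"));
        Files.createFile(dir.resolve("app-fresh"));
        // Backdate one log two hours so it exceeds a one-hour maxAge.
        Files.setLastModifiedTime(old,
            FileTime.fromMillis(System.currentTimeMillis() - 2 * 3_600_000L));
        return cleanOldLogs(dir, 3_600_000L);
    }

    public static void main(String[] args) throws IOException {
        System.out.println("deleted: " + demo());
    }
}
```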
[GitHub] spark pull request: [SPARK-3562]Periodic cleanup event logs
Github user viper-kun commented on a diff in the pull request: https://github.com/apache/spark/pull/2471#discussion_r18740124 --- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala --- @@ -195,22 +241,68 @@ private[history] class FsHistoryProvider(conf: SparkConf) extends ApplicationHis } } -val newIterator = logInfos.iterator.buffered -val oldIterator = applications.values.iterator.buffered -while (newIterator.hasNext && oldIterator.hasNext) { - if (newIterator.head.endTime > oldIterator.head.endTime) { -addIfAbsent(newIterator.next) - } else { -addIfAbsent(oldIterator.next) +applications.synchronized { + val newIterator = logInfos.iterator.buffered + val oldIterator = applications.values.iterator.buffered + while (newIterator.hasNext && oldIterator.hasNext) { +if (newIterator.head.endTime > oldIterator.head.endTime) { + addIfAbsent(newIterator.next) +} else { + addIfAbsent(oldIterator.next) +} } + newIterator.foreach(addIfAbsent) + oldIterator.foreach(addIfAbsent) + + applications = newApps } -newIterator.foreach(addIfAbsent) -oldIterator.foreach(addIfAbsent) + } +} catch { + case t: Throwable => logError("Exception in checking for event log updates", t) +} + } + + /** + * Deleting apps if setting cleaner. + */ + private def cleanLogs() = { +lastLogCleanTimeMs = getMonotonicTimeMs() +logDebug("Cleaning logs. Time is now %d.".format(lastLogCleanTimeMs)) +try { + val logStatus = fs.listStatus(new Path(resolvedLogDir)) + val logDirs = if (logStatus != null) logStatus.filter(_.isDir).toSeq else Seq[FileStatus]() + val maxAge = conf.getLong("spark.history.fs.maxAge.seconds", +DEFAULT_SPARK_HISTORY_FS_MAXAGE_S) * 1000 + + val now = System.currentTimeMillis() + fs.synchronized { +// scan all logs from the log directory. +// Only directories older than this many seconds will be deleted . +logDirs.foreach { dir => + // history file older than this many seconds will be deleted + // when the history cleaner runs. 
+ if (now - getModificationTime(dir) > maxAge) { +fs.delete(dir.getPath, true) + } +} + } + + val newApps = new mutable.LinkedHashMap[String, FsApplicationHistoryInfo]() + def addIfNotExpire(info: FsApplicationHistoryInfo) = { +if(now - info.lastUpdated <= maxAge) { + newApps += (info.id -> info) --- End diff -- info.lastUpdated is the timestamp of the directory, and it is always later than the timestamps of the files it contains.
[GitHub] spark pull request: [SPARK-3562]Periodic cleanup event logs
Github user viper-kun commented on a diff in the pull request: https://github.com/apache/spark/pull/2471#discussion_r18739911 --- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala --- @@ -195,22 +241,68 @@ private[history] class FsHistoryProvider(conf: SparkConf) extends ApplicationHis } } -val newIterator = logInfos.iterator.buffered -val oldIterator = applications.values.iterator.buffered -while (newIterator.hasNext && oldIterator.hasNext) { - if (newIterator.head.endTime > oldIterator.head.endTime) { -addIfAbsent(newIterator.next) - } else { -addIfAbsent(oldIterator.next) +applications.synchronized { --- End diff -- I think the two tasks must never run concurrently. If the order is: 1. the check task reads applications; 2. the clean task reads applications; 3. the clean task finishes and replaces applications; 4. the check task finishes and replaces applications — then the clean task's result is overwritten by the check task's result. Using a ScheduledExecutorService with a single worker thread is a good way to solve this.
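The single-worker-thread suggestion can be demonstrated with a small Java sketch (names invented for this illustration): two periodic tasks scheduled on `Executors.newSingleThreadScheduledExecutor` may interleave in time, but their bodies never overlap, so neither can clobber the other's read-then-replace of a shared map:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

public class SingleWorkerScheduler {
    // Schedules two periodic tasks on one worker thread and counts how often
    // one task starts while the other is still inside its body.
    public static int countOverlaps() throws InterruptedException {
        ScheduledExecutorService pool = Executors.newSingleThreadScheduledExecutor();
        AtomicBoolean busy = new AtomicBoolean(false);
        AtomicInteger overlaps = new AtomicInteger();

        Runnable guarded = () -> {
            // If another task body were running, busy would already be true.
            if (!busy.compareAndSet(false, true)) overlaps.incrementAndGet();
            try { Thread.sleep(20); } catch (InterruptedException ignored) { }
            busy.set(false);
        };
        pool.scheduleAtFixedRate(guarded, 0, 30, TimeUnit.MILLISECONDS); // "check" task
        pool.scheduleAtFixedRate(guarded, 0, 30, TimeUnit.MILLISECONDS); // "clean" task

        Thread.sleep(300);
        pool.shutdownNow();
        return overlaps.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("overlapping executions observed: " + countOverlaps());
    }
}
```

With a multi-threaded pool the overlap count could be nonzero; with a single worker it stays at zero, which is the serialization property the review asks for.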
[GitHub] spark pull request: [SPARK-3562]Periodic cleanup event logs
Github user viper-kun commented on the pull request: https://github.com/apache/spark/pull/2471#issuecomment-58624839 @mattf @vanzin, is this OK to go?
[GitHub] spark pull request: [SPARK-3562]Periodic cleanup event logs
Github user viper-kun commented on the pull request: https://github.com/apache/spark/pull/2471#issuecomment-57083376 Thanks for your opinions, @vanzin @andrewor14. I have changed the code according to your suggestions.
[GitHub] spark pull request: [SPARK-3562]Periodic cleanup event logs
Github user viper-kun commented on the pull request: https://github.com/apache/spark/pull/2391#issuecomment-56261135 @vanzin, @andrewor14, thanks for your opinions. Because I deleted the source branch, I cannot change it in this commit. I have submitted another pull request (#2471) and changed the source according to your opinions.
[GitHub] spark pull request: [SPARK-3562]Periodic cleanup event logs
Github user viper-kun commented on a diff in the pull request: https://github.com/apache/spark/pull/2391#discussion_r17818744 --- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala --- @@ -214,6 +245,27 @@ private[history] class FsHistoryProvider(conf: SparkConf) extends ApplicationHis } } + /** + * Deleting apps if setting cleaner. + */ + private def cleanLogs() = { +lastLogCleanTimeMs = System.currentTimeMillis() +logDebug("Cleaning logs. Time is now %d.".format(lastLogCleanTimeMs)) +try { + val logStatus = fs.listStatus(new Path(resolvedLogDir)) + val logDirs = if (logStatus != null) logStatus.filter(_.isDir).toSeq else Seq[FileStatus]() + val maxAge = conf.getLong("spark.history.fs.maxAge", 604800) * 1000 + logDirs.foreach{ +case dir: FileStatus => + if(System.currentTimeMillis() - getModificationTime(dir) > maxAge) { +fs.delete(dir.getPath, true) --- End diff -- FileSystem.listStatus() will throw FileNotFoundException and IOException; FileSystem.delete() will throw IOException. I will change it accordingly.
[GitHub] spark pull request: Periodic cleanup event logs
GitHub user viper-kun opened a pull request: https://github.com/apache/spark/pull/2471 Periodic cleanup event logs You can merge this pull request into a Git repository by running: $ git pull https://github.com/viper-kun/spark deletelog2 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2471.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2471 commit f085cf02921574f734caa816ba1515b6e44b7341 Author: xukun 00228947 Date: 2014-09-20T06:45:28Z Periodic cleanup event logs commit 9cd45d48a4db3b6acdbbfea652a3fefc2b030190 Author: viper-kun Date: 2014-09-20T08:26:03Z Update monitoring.md commit 78e78cb5917be9df6b019e297c24d47d88ff5c12 Author: viper-kun Date: 2014-09-20T08:27:50Z Update monitoring.md commit e72b51ad20f3c7fbac8cd92860827df2f23ba71d Author: viper-kun Date: 2014-09-20T08:28:39Z Update monitoring.md commit da637c0a9ee7aaf163c4da77f07affc67fecfacd Author: viper-kun Date: 2014-09-20T08:30:13Z Update monitoring.md commit ce53db2db2e1be8b2a1a1d1dec56fd8df8a43b41 Author: viper-kun Date: 2014-09-20T08:31:37Z Update monitoring.md
[GitHub] spark pull request: [SPARK-3562]Periodic cleanup event logs
Github user viper-kun commented on the pull request: https://github.com/apache/spark/pull/2391#issuecomment-56129613 @andrewor14 I have checked Hadoop's JobHistoryServer. It is the JobHistoryServer's responsibility to delete the application logs.
[GitHub] spark pull request: [SPARK-3562]Periodic cleanup event logs
Github user viper-kun commented on a diff in the pull request: https://github.com/apache/spark/pull/2391#discussion_r17766911 --- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala --- @@ -100,6 +125,12 @@ private[history] class FsHistoryProvider(conf: SparkConf) extends ApplicationHis checkForLogs() logCheckingThread.setDaemon(true) logCheckingThread.start() + +if(conf.getBoolean("spark.history.cleaner.enable", false)) { + cleanLogs() + logCleaningThread.setDaemon(true) + logCleaningThread.start() +} --- End diff -- Yes, you are right. There is no need to clean on initialization.
[GitHub] spark pull request: Update configuration.md
GitHub user viper-kun opened a pull request: https://github.com/apache/spark/pull/2406 Update configuration.md change the value of spark.files.fetchTimeout You can merge this pull request into a Git repository by running: $ git pull https://github.com/viper-kun/spark master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2406.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2406 commit 7cf4c7a13533c1eb0db5ef2ad10678fa4096175b Author: viper-kun Date: 2014-09-16T09:04:46Z Update configuration.md
[GitHub] spark pull request: cycle of deleting history log
Github user viper-kun commented on the pull request: https://github.com/apache/spark/pull/2391#issuecomment-55688517 @srowen If we run Spark applications frequently, many event logs will accumulate in spark.eventLog.dir. After a long time, the directory will contain many event logs that we no longer care about, so we need to delete the logs that exceed the time limit.
[GitHub] spark pull request: cycle of deleting history log
GitHub user viper-kun opened a pull request: https://github.com/apache/spark/pull/2391 cycle of deleting history log You can merge this pull request into a Git repository by running: $ git pull https://github.com/viper-kun/spark xk2 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2391.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2391 commit 79f492811dc939d1a50e6d8279e20042df4f95fe Author: xukun 00228947 Date: 2014-09-13T08:50:18Z cycle of deleting history log