[GitHub] spark issue #18174: [SPARK-20950][CORE]Improve diskWriteBufferSize configura...

2017-06-25 Thread jiangxb1987
Github user jiangxb1987 commented on the issue:

https://github.com/apache/spark/pull/18174
  
It would also be great to add the explicit performance tuning results to the 
PR description.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18174: [SPARK-20950][CORE]Improve diskWriteBufferSize configura...

2017-06-25 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/18174
  
The change looks reasonable, but you really need to update your PR 
description. You are not improving an existing config; you are introducing a 
new config for a value that was previously hard-coded.
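
To illustrate the distinction, here is a minimal sketch of a setting that falls back to the old constant when the new config is not set. The key name `example.diskWriteBufferSize`, the 1 MB default, and both helper functions are hypothetical and used only for illustration; they are not taken from the PR.

```python
# Hypothetical names and values, for illustration only.
CONFIG_KEY = "example.diskWriteBufferSize"
DEFAULT_BUFFER_SIZE = 1024 * 1024  # stands in for the former hard-coded constant


def parse_size(text):
    """Parse a size such as '64k', '1m', or a plain byte count into bytes."""
    units = {"k": 1024, "m": 1024 ** 2}
    text = text.strip().lower()
    if text and text[-1] in units:
        return int(text[:-1]) * units[text[-1]]
    return int(text)


def disk_write_buffer_size(conf):
    """Return the configured buffer size, or the old constant if unset."""
    raw = conf.get(CONFIG_KEY)
    if raw is None:
        return DEFAULT_BUFFER_SIZE
    size = parse_size(raw)
    if size <= 0:
        raise ValueError(f"{CONFIG_KEY} must be positive, got {raw!r}")
    return size
```

With an empty config the behaviour is unchanged from the hard-coded case; only users who set the key see a different buffer size.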





[GitHub] spark issue #18174: [SPARK-20950][CORE]Improve diskWriteBufferSize configura...

2017-06-25 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/18174
  
cc @jiangxb1987 @cloud-fan 





[GitHub] spark issue #18174: [SPARK-20950][CORE]Improve diskWriteBufferSize configura...

2017-06-23 Thread heary-cao
Github user heary-cao commented on the issue:

https://github.com/apache/spark/pull/18174
  
@gatorsmile 
Could you review this code?
Thanks.





[GitHub] spark issue #18174: [SPARK-20950][CORE]Improve diskWriteBufferSize configura...

2017-06-03 Thread heary-cao
Github user heary-cao commented on the issue:

https://github.com/apache/spark/pull/18174
  
@srowen 
Yes, you're right: the numbers are times, and the unit is ms.
Each number is the average time over 10 runs of `forceSorterToSpill`.
My assumption is that copying into a big buffer takes longer than copying 
into a small buffer, even though the small buffer is copied many more times, 
or that the local file system takes longer to write a big buffer than a 
small one.
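
To make the "copied many times" part concrete, here is a minimal sketch of writing one record through a fixed-size staging buffer and counting the writes. The function `write_in_chunks` and the in-memory sink are assumptions for illustration; this is not Spark's spill writer.

```python
import io


def write_in_chunks(record, buffer_size, sink):
    """Copy `record` into a staging buffer chunk by chunk and write each chunk.

    Returns the number of write calls, i.e. ceil(len(record) / buffer_size).
    """
    buffer = bytearray(buffer_size)
    writes = 0
    offset = 0
    while offset < len(record):
        n = min(buffer_size, len(record) - offset)
        buffer[:n] = record[offset:offset + n]  # copy into the staging buffer
        sink.write(memoryview(buffer)[:n])      # write only the filled part
        offset += n
        writes += 1
    return writes


record = bytes(2621440)  # 2.5 MB record
small = write_in_chunks(record, 4 * 1024, io.BytesIO())     # 4K buffer
large = write_in_chunks(record, 1024 * 1024, io.BytesIO())  # 1M buffer
```

Here `small` is 640 and `large` is 3: the same bytes are copied either way, so any timing difference would have to come from per-chunk costs or from how the buffer size interacts with caches and the file system.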





[GitHub] spark issue #18174: [SPARK-20950][CORE]Improve diskWriteBufferSize configura...

2017-06-03 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/18174
  
There's no description of your test or what the numbers mean. I assume 
they're times. Why would a smaller buffer be faster?





[GitHub] spark issue #18174: [SPARK-20950][CORE]Improve diskWriteBufferSize configura...

2017-06-03 Thread heary-cao
Github user heary-cao commented on the issue:

https://github.com/apache/spark/pull/18174
  
@srowen 
Thanks for reviewing it.
During our performance tuning, we found rows whose record size is more than 
2M, so `initialSerBufferSize` needed to be configurable. However, changing 
`initialSerBufferSize` did not help performance, while changing the spill 
`diskWriteBufferSize` did.
So I ran a small experiment that varies `diskWriteBufferSize`, setting it to 
1M, 512K, 256K, 128K, 64K, etc.

diskWriteBufferSize   1M    512K  256K  128K  64K   32K   16K   8K    4K
RecordSize: 2.5M      742   722   694   686   667   668   671   669   683
RecordSize: 1M        294   293   292   287   283   285   281   279   285

To reduce interference from other factors, each result is the average of 
10 runs.
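
For readers who want to try a similar measurement outside Spark, here is a minimal standalone sketch that writes one record to a temporary file in chunks of a given size and averages the elapsed time over 10 runs. It does not reproduce the `forceSorterToSpill` test or Spark's spill path; `timed_write`, `average_ms`, and the size lists are assumptions for illustration.

```python
import os
import tempfile
import time

# Buffer sizes from the experiment, in bytes (1M down to 4K).
BUFFER_SIZES = [1024 * 1024 >> shift for shift in range(9)]


def timed_write(record, buffer_size):
    """Write `record` to a temp file in `buffer_size` chunks; return elapsed ms."""
    with tempfile.TemporaryFile() as f:
        start = time.perf_counter()
        for offset in range(0, len(record), buffer_size):
            f.write(record[offset:offset + buffer_size])
        f.flush()
        os.fsync(f.fileno())  # include the time to reach the disk
        return (time.perf_counter() - start) * 1000.0


def average_ms(record_size, buffer_size, runs=10):
    """Average `timed_write` over `runs` runs for one record size."""
    record = os.urandom(record_size)
    return sum(timed_write(record, buffer_size) for _ in range(runs)) / runs


if __name__ == "__main__":
    for record_size in (2621440, 1024 * 1024):  # 2.5M and 1M records
        for buffer_size in BUFFER_SIZES:
            ms = average_ms(record_size, buffer_size)
            print(f"record={record_size} buffer={buffer_size} avg_ms={ms:.2f}")
```

Results from a sketch like this depend heavily on the disk, the file system, and the OS page cache, so they are not directly comparable to the numbers above.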

Please review the code again.
Thanks.


