[GitHub] spark pull request #18174: [SPARK-20950][CORE]add a new config to diskWriteB...

2017-07-06 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/18174


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #18174: [SPARK-20950][CORE]add a new config to diskWriteB...

2017-07-04 Thread heary-cao
Github user heary-cao commented on a diff in the pull request:

https://github.com/apache/spark/pull/18174#discussion_r125548280
  
--- Diff: core/src/main/scala/org/apache/spark/internal/config/package.scala ---
@@ -335,4 +335,29 @@ package object config {
 "spark.")
   .booleanConf
   .createWithDefault(false)
+
+  private[spark] val SHUFFLE_FILE_BUFFER_SIZE =
+ConfigBuilder("spark.shuffle.file.buffer")
+  .doc("Used to set the buffer size of outputstream in shuffle 
writer.")
--- End diff --

OK, I have modified it.





[GitHub] spark pull request #18174: [SPARK-20950][CORE]add a new config to diskWriteB...

2017-07-04 Thread heary-cao
Github user heary-cao commented on a diff in the pull request:

https://github.com/apache/spark/pull/18174#discussion_r125548173
  
--- Diff: core/src/main/scala/org/apache/spark/internal/config/package.scala ---
@@ -335,4 +335,29 @@ package object config {
 "spark.")
   .booleanConf
   .createWithDefault(false)
+
+  private[spark] val SHUFFLE_FILE_BUFFER_SIZE =
+ConfigBuilder("spark.shuffle.file.buffer")
+  .doc("Used to set the buffer size of outputstream in shuffle 
writer.")
+  .bytesConf(ByteUnit.KiB)
+  .checkValue(v => v > 0 && v <= Int.MaxValue / 1024,
+s"The file buffer size must be greater than 0 to ${Int.MaxValue / 
1024}.")
--- End diff --

thank you.






[GitHub] spark pull request #18174: [SPARK-20950][CORE]add a new config to diskWriteB...

2017-07-04 Thread heary-cao
Github user heary-cao commented on a diff in the pull request:

https://github.com/apache/spark/pull/18174#discussion_r125547523
  
--- Diff: core/src/main/scala/org/apache/spark/internal/config/package.scala ---
@@ -335,4 +335,29 @@ package object config {
 "spark.")
   .booleanConf
   .createWithDefault(false)
+
+  private[spark] val SHUFFLE_FILE_BUFFER_SIZE =
+ConfigBuilder("spark.shuffle.file.buffer")
+  .doc("Used to set the buffer size of outputstream in shuffle 
writer.")
+  .bytesConf(ByteUnit.KiB)
+  .checkValue(v => v > 0 && v <= Int.MaxValue / 1024,
+s"The file buffer size must be greater than 0 to ${Int.MaxValue / 
1024}.")
+  .createWithDefaultString("32k")
+
+  private[spark] val SHUFFLE_UNSAFE_FILE_OUTPUT_BUFFER_SIZE =
+ConfigBuilder("spark.shuffle.unsafe.file.output.buffer")
+  .doc("The file system for this buffer size after each partition " +
+"is written in unsafe shuffle writer.")
+  .bytesConf(ByteUnit.KiB)
+  .checkValue(v => v > 0 && v <= Int.MaxValue / 1024,
+s"The buffer size must be greater than 0 to ${Int.MaxValue / 
1024}.")
+  .createWithDefaultString("32k")
+
+  private[spark] val SHUFFLE_DISK_WRITE_BUFFER_SIZE =
+ConfigBuilder("spark.shuffle.spill.diskWriteBufferSize")
+  .doc("The buffer size to use when writing the sorted records to an 
on-disk file.")
+  .intConf
--- End diff --

OK





[GitHub] spark pull request #18174: [SPARK-20950][CORE]add a new config to diskWriteB...

2017-07-04 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18174#discussion_r125546508
  
--- Diff: core/src/main/scala/org/apache/spark/internal/config/package.scala ---
@@ -335,4 +335,29 @@ package object config {
 "spark.")
   .booleanConf
   .createWithDefault(false)
+
+  private[spark] val SHUFFLE_FILE_BUFFER_SIZE =
+ConfigBuilder("spark.shuffle.file.buffer")
+  .doc("Used to set the buffer size of outputstream in shuffle 
writer.")
+  .bytesConf(ByteUnit.KiB)
+  .checkValue(v => v > 0 && v <= Int.MaxValue / 1024,
+s"The file buffer size must be greater than 0 to ${Int.MaxValue / 
1024}.")
+  .createWithDefaultString("32k")
+
+  private[spark] val SHUFFLE_UNSAFE_FILE_OUTPUT_BUFFER_SIZE =
+ConfigBuilder("spark.shuffle.unsafe.file.output.buffer")
+  .doc("The file system for this buffer size after each partition " +
+"is written in unsafe shuffle writer.")
+  .bytesConf(ByteUnit.KiB)
+  .checkValue(v => v > 0 && v <= Int.MaxValue / 1024,
+s"The buffer size must be greater than 0 to ${Int.MaxValue / 
1024}.")
+  .createWithDefaultString("32k")
+
+  private[spark] val SHUFFLE_DISK_WRITE_BUFFER_SIZE =
+ConfigBuilder("spark.shuffle.spill.diskWriteBufferSize")
+  .doc("The buffer size to use when writing the sorted records to an 
on-disk file.")
+  .intConf
--- End diff --

use `bytesConf` please





[GitHub] spark pull request #18174: [SPARK-20950][CORE]add a new config to diskWriteB...

2017-07-04 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18174#discussion_r125546465
  
--- Diff: core/src/main/scala/org/apache/spark/internal/config/package.scala ---
@@ -335,4 +335,29 @@ package object config {
 "spark.")
   .booleanConf
   .createWithDefault(false)
+
+  private[spark] val SHUFFLE_FILE_BUFFER_SIZE =
+ConfigBuilder("spark.shuffle.file.buffer")
+  .doc("Used to set the buffer size of outputstream in shuffle 
writer.")
+  .bytesConf(ByteUnit.KiB)
+  .checkValue(v => v > 0 && v <= Int.MaxValue / 1024,
+s"The file buffer size must be greater than 0 to ${Int.MaxValue / 
1024}.")
--- End diff --

`must be greater than 0 and less than ${Int.MaxValue / 1024}`





[GitHub] spark pull request #18174: [SPARK-20950][CORE]add a new config to diskWriteB...

2017-07-04 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18174#discussion_r125546440
  
--- Diff: core/src/main/scala/org/apache/spark/internal/config/package.scala ---
@@ -335,4 +335,29 @@ package object config {
 "spark.")
   .booleanConf
   .createWithDefault(false)
+
+  private[spark] val SHUFFLE_FILE_BUFFER_SIZE =
+ConfigBuilder("spark.shuffle.file.buffer")
+  .doc("Used to set the buffer size of outputstream in shuffle 
writer.")
--- End diff --

can we copy the document from http://spark.apache.org/docs/latest/configuration.html ?





[GitHub] spark pull request #18174: [SPARK-20950][CORE]add a new config to diskWriteB...

2017-07-04 Thread heary-cao
Github user heary-cao commented on a diff in the pull request:

https://github.com/apache/spark/pull/18174#discussion_r125545875
  
--- Diff: core/src/main/scala/org/apache/spark/internal/config/package.scala ---
@@ -335,4 +335,26 @@ package object config {
 "spark.")
   .booleanConf
   .createWithDefault(false)
+
+  private[spark] val SHUFFLE_FILE_BUFFER_SIZE =
+ConfigBuilder("spark.shuffle.file.buffer")
+  .doc("Used to set the buffer size of outputstream in shuffle 
writer.")
+  .bytesConf(ByteUnit.KiB)
+  .checkValues(Set(4, 8, 16, 32, 64, 128, 256, 512, 1024))
--- End diff --

I have modified it. For the buffer size, I think 0 is meaningless, so I changed it to `checkValue(v => v > 0 && v <= Int.MaxValue / 1024, ...)`. Do you agree?
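The `checkValue` guard discussed above can be sketched outside Spark. Below is a minimal stand-in (not Spark's real `ConfigBuilder`; the `KiBConf` and `toBytes` names are illustrative) showing why the upper bound is `Int.MaxValue / 1024`: the value is configured in KiB, so the resulting size in bytes must still fit in a positive `int`.

```java
import java.util.function.LongPredicate;

// Illustrative stand-in for ConfigBuilder's checkValue (not Spark's API).
class KiBConf {
    private final LongPredicate check;
    private final String errorMsg;

    KiBConf(LongPredicate check, String errorMsg) {
        this.check = check;
        this.errorMsg = errorMsg;
    }

    // Validate a value given in KiB, then return its size in bytes.
    long toBytes(long kib) {
        if (!check.test(kib)) {
            throw new IllegalArgumentException(errorMsg);
        }
        return kib * 1024;
    }
}

public class Demo {
    public static void main(String[] args) {
        long max = Integer.MAX_VALUE / 1024;  // largest KiB value whose byte count fits in an int
        KiBConf fileBuffer = new KiBConf(
            v -> v > 0 && v <= max,
            "The file buffer size must be greater than 0 and less than " + max + ".");
        System.out.println(fileBuffer.toBytes(32));  // the 32k default is valid -> 32768
        try {
            fileBuffer.toBytes(0);                   // 0 is meaningless for a buffer size
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: 0");
        }
    }
}
```

A range predicate like this replaces the earlier `checkValues(Set(4, 8, ...))` whitelist, accepting any positive size that stays within the `int` byte range.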





[GitHub] spark pull request #18174: [SPARK-20950][CORE]add a new config to diskWriteB...

2017-07-04 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18174#discussion_r125468331
  
--- Diff: core/src/main/scala/org/apache/spark/internal/config/package.scala ---
@@ -335,4 +335,26 @@ package object config {
 "spark.")
   .booleanConf
   .createWithDefault(false)
+
+  private[spark] val SHUFFLE_FILE_BUFFER_SIZE =
+ConfigBuilder("spark.shuffle.file.buffer")
+  .doc("Used to set the buffer size of outputstream in shuffle 
writer.")
+  .bytesConf(ByteUnit.KiB)
+  .checkValues(Set(4, 8, 16, 32, 64, 128, 256, 512, 1024))
--- End diff --

we should use `checkValue(v => v >= 0 && v <= Int.MaxValue / 1024, ...)`





[GitHub] spark pull request #18174: [SPARK-20950][CORE]add a new config to diskWriteB...

2017-07-04 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18174#discussion_r125468128
  
--- Diff: core/src/main/scala/org/apache/spark/internal/config/package.scala ---
@@ -335,4 +335,26 @@ package object config {
 "spark.")
   .booleanConf
   .createWithDefault(false)
+
+  private[spark] val SHUFFLE_FILE_BUFFER_SIZE =
+ConfigBuilder("spark.shuffle.file.buffer")
+  .doc("Used to set the buffer size of outputstream in shuffle 
writer.")
+  .bytesConf(ByteUnit.KiB)
+  .checkValues(Set(4, 8, 16, 32, 64, 128, 256, 512, 1024))
--- End diff --

do we have this requirement before? I think the only requirement is, the value should be `>= 0` and `<= Int.MaxValue / 1024`.





[GitHub] spark pull request #18174: [SPARK-20950][CORE]add a new config to diskWriteB...

2017-07-04 Thread heary-cao
Github user heary-cao commented on a diff in the pull request:

https://github.com/apache/spark/pull/18174#discussion_r125421705
  
--- Diff: core/src/main/java/org/apache/spark/shuffle/sort/UnsafeShuffleWriter.java ---
@@ -360,12 +368,10 @@ void forceSorterToSpill() throws IOException {
 
 final OutputStream bos = new BufferedOutputStream(
 new FileOutputStream(outputFile),
-(int) sparkConf.getSizeAsKb("spark.shuffle.unsafe.file.output.buffer", "32k") * 1024);
+outputBufferSizeInBytes);
--- End diff --

Thanks for explaining it. The explanation is very clear; I will study it.





[GitHub] spark pull request #18174: [SPARK-20950][CORE]add a new config to diskWriteB...

2017-07-03 Thread manku-timma
Github user manku-timma commented on a diff in the pull request:

https://github.com/apache/spark/pull/18174#discussion_r125386319
  
--- Diff: core/src/main/java/org/apache/spark/shuffle/sort/UnsafeShuffleWriter.java ---
@@ -360,12 +368,10 @@ void forceSorterToSpill() throws IOException {
 
 final OutputStream bos = new BufferedOutputStream(
 new FileOutputStream(outputFile),
-(int) sparkConf.getSizeAsKb("spark.shuffle.unsafe.file.output.buffer", "32k") * 1024);
+outputBufferSizeInBytes);
--- End diff --

Just to understand what is happening:

1. Shuffle records are written to a serialisation buffer (1M) after serialisation.
2. The serialised buffer is written to the in-memory sorter's buffer.
3. Once the in-memory sorter's buffer is full, the data is copied to the sorter's disk buffer (1M).
4. The sorter's disk buffer is written out to a buffered output stream (buffer = 32k).

I am guessing that reducing the sorter's disk buffer (in step 3) helps because it triggers fewer writes/allocations in a single call at step 4 (and allows more parallelism in writing back to disk and copying of data).
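The write path in steps 3 and 4 above can be sketched with plain `java.io` streams. This is a sketch, not Spark's code: the buffer sizes are the defaults mentioned in the thread, and the temp file and variable names are illustrative.

```java
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class Demo {
    public static void main(String[] args) throws IOException {
        int diskWriteBufferSize = 1024 * 1024; // sorter's disk buffer, step 3 (1M)
        int fileBufferSize = 32 * 1024;        // buffered output stream buffer, step 4 (32k)

        File spill = File.createTempFile("spill", ".bin");
        spill.deleteOnExit();

        byte[] writeBuffer = new byte[diskWriteBufferSize]; // sorted records are copied here first
        try (OutputStream bos =
                 new BufferedOutputStream(new FileOutputStream(spill), fileBufferSize)) {
            // One large write from the disk buffer; BufferedOutputStream passes
            // writes larger than its own buffer straight through to the file stream.
            bos.write(writeBuffer, 0, writeBuffer.length);
        }
        System.out.println(spill.length()); // 1048576 bytes reached the file
    }
}
```

Shrinking `diskWriteBufferSize` changes how much data each `bos.write` call hands to the stream in one go, which is the effect being discussed.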





[GitHub] spark pull request #18174: [SPARK-20950][CORE]add a new config to diskWriteB...

2017-07-02 Thread heary-cao
Github user heary-cao commented on a diff in the pull request:

https://github.com/apache/spark/pull/18174#discussion_r125204174
  
--- Diff: core/src/main/scala/org/apache/spark/internal/config/package.scala ---
@@ -335,4 +335,20 @@ package object config {
 "spark.")
   .booleanConf
   .createWithDefault(false)
+
+  private[spark] val SHUFFLE_FILE_BUFFER_SIZE =
+ConfigBuilder("spark.shuffle.file.buffer")
+  .intConf
+  .createWithDefault(32 * 1024)
+
+  private[spark] val SHUFFLE_UNSAFE_FILE_OUTPUT_BUFFER_SIZE =
+ConfigBuilder("spark.shuffle.unsafe.file.output.buffer")
--- End diff --

OK, thank you for reviewing it.





[GitHub] spark pull request #18174: [SPARK-20950][CORE]add a new config to diskWriteB...

2017-07-02 Thread heary-cao
Github user heary-cao commented on a diff in the pull request:

https://github.com/apache/spark/pull/18174#discussion_r125204165
  
--- Diff: core/src/main/scala/org/apache/spark/internal/config/package.scala ---
@@ -335,4 +335,20 @@ package object config {
 "spark.")
   .booleanConf
   .createWithDefault(false)
+
+  private[spark] val SHUFFLE_FILE_BUFFER_SIZE =
+ConfigBuilder("spark.shuffle.file.buffer")
--- End diff --

OK, thank you for reviewing it.






[GitHub] spark pull request #18174: [SPARK-20950][CORE]add a new config to diskWriteB...

2017-06-30 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18174#discussion_r125106803
  
--- Diff: core/src/main/scala/org/apache/spark/internal/config/package.scala ---
@@ -335,4 +335,20 @@ package object config {
 "spark.")
   .booleanConf
   .createWithDefault(false)
+
+  private[spark] val SHUFFLE_FILE_BUFFER_SIZE =
+ConfigBuilder("spark.shuffle.file.buffer")
--- End diff --

Also please add `.checkValue` for all these three





[GitHub] spark pull request #18174: [SPARK-20950][CORE]add a new config to diskWriteB...

2017-06-30 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18174#discussion_r125106509
  
--- Diff: core/src/main/scala/org/apache/spark/internal/config/package.scala ---
@@ -335,4 +335,20 @@ package object config {
 "spark.")
   .booleanConf
   .createWithDefault(false)
+
+  private[spark] val SHUFFLE_FILE_BUFFER_SIZE =
+ConfigBuilder("spark.shuffle.file.buffer")
--- End diff --

The same here





[GitHub] spark pull request #18174: [SPARK-20950][CORE]add a new config to diskWriteB...

2017-06-30 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18174#discussion_r125106486
  
--- Diff: core/src/main/scala/org/apache/spark/internal/config/package.scala ---
@@ -335,4 +335,20 @@ package object config {
 "spark.")
   .booleanConf
   .createWithDefault(false)
+
+  private[spark] val SHUFFLE_FILE_BUFFER_SIZE =
+ConfigBuilder("spark.shuffle.file.buffer")
+  .intConf
+  .createWithDefault(32 * 1024)
+
+  private[spark] val SHUFFLE_UNSAFE_FILE_OUTPUT_BUFFER_SIZE =
+ConfigBuilder("spark.shuffle.unsafe.file.output.buffer")
--- End diff --

No description?





[GitHub] spark pull request #18174: [SPARK-20950][CORE]add a new config to diskWriteB...

2017-06-27 Thread heary-cao
Github user heary-cao commented on a diff in the pull request:

https://github.com/apache/spark/pull/18174#discussion_r124265367
  
--- Diff: core/src/main/java/org/apache/spark/shuffle/sort/ShuffleExternalSorter.java ---
@@ -123,6 +126,8 @@
 this.inMemSorter = new ShuffleInMemorySorter(
  this, initialSize, conf.getBoolean("spark.shuffle.sort.useRadixSort", true));
 this.peakMemoryUsedBytes = getMemoryUsage();
+this.diskWriteBufferSize =
+conf.getInt("spark.shuffle.spill.diskWriteBufferSize", DISK_WRITE_BUFFER_SIZE);
--- End diff --

Thank you for your guidance. Please review it again.






[GitHub] spark pull request #18174: [SPARK-20950][CORE]add a new config to diskWriteB...

2017-06-27 Thread heary-cao
Github user heary-cao commented on a diff in the pull request:

https://github.com/apache/spark/pull/18174#discussion_r124239231
  
--- Diff: core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeSorterSpillWriter.java ---
@@ -61,6 +65,9 @@ public UnsafeSorterSpillWriter(
 this.file = spilledFileInfo._2();
 this.blockId = spilledFileInfo._1();
 this.numRecordsToWrite = numRecordsToWrite;
+this.diskWriteBufferSize =
--- End diff --

OK, I have updated it. Please review it again.





[GitHub] spark pull request #18174: [SPARK-20950][CORE]add a new config to diskWriteB...

2017-06-27 Thread jiangxb1987
Github user jiangxb1987 commented on a diff in the pull request:

https://github.com/apache/spark/pull/18174#discussion_r124214950
  
--- Diff: core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeSorterSpillWriter.java ---
@@ -61,6 +65,9 @@ public UnsafeSorterSpillWriter(
 this.file = spilledFileInfo._2();
 this.blockId = spilledFileInfo._1();
 this.numRecordsToWrite = numRecordsToWrite;
+this.diskWriteBufferSize =
--- End diff --

I don't have a special interest in this, but at least we should keep the same behavior for both the `ShuffleExternalSorter` class and the `UnsafeSorterSpillWriter` class; otherwise we are introducing unnecessary complexity. Could you please update this?





[GitHub] spark pull request #18174: [SPARK-20950][CORE]add a new config to diskWriteB...

2017-06-26 Thread heary-cao
Github user heary-cao commented on a diff in the pull request:

https://github.com/apache/spark/pull/18174#discussion_r124182713
  
--- Diff: core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeSorterSpillWriter.java ---
@@ -61,6 +65,9 @@ public UnsafeSorterSpillWriter(
 this.file = spilledFileInfo._2();
 this.blockId = spilledFileInfo._1();
 this.numRecordsToWrite = numRecordsToWrite;
+this.diskWriteBufferSize =
--- End diff --

Do you want to initialize as a member variable?





[GitHub] spark pull request #18174: [SPARK-20950][CORE]add a new config to diskWriteB...

2017-06-26 Thread jiangxb1987
Github user jiangxb1987 commented on a diff in the pull request:

https://github.com/apache/spark/pull/18174#discussion_r124041199
  
--- Diff: core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeSorterSpillWriter.java ---
@@ -61,6 +65,9 @@ public UnsafeSorterSpillWriter(
 this.file = spilledFileInfo._2();
 this.blockId = spilledFileInfo._1();
 this.numRecordsToWrite = numRecordsToWrite;
+this.diskWriteBufferSize =
--- End diff --

Why are we doing this in a different way?





[GitHub] spark pull request #18174: [SPARK-20950][CORE]add a new config to diskWriteB...

2017-06-26 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18174#discussion_r123932606
  
--- Diff: core/src/main/java/org/apache/spark/shuffle/sort/ShuffleExternalSorter.java ---
@@ -123,6 +126,8 @@
 this.inMemSorter = new ShuffleInMemorySorter(
  this, initialSize, conf.getBoolean("spark.shuffle.sort.useRadixSort", true));
 this.peakMemoryUsedBytes = getMemoryUsage();
+this.diskWriteBufferSize =
+conf.getInt("spark.shuffle.spill.diskWriteBufferSize", DISK_WRITE_BUFFER_SIZE);
--- End diff --

oh, in java, you need to
```
import org.apache.spark.internal.config.package$;
...
package$.MODULE$.XXX()
```
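The reason for the `package$.MODULE$` dance: for a Scala `package object`, scalac emits a class literally named `package$` with a static `MODULE$` singleton, and each `val` surfaces to Java as a no-argument method. The sketch below is a hand-written imitation of that compiled shape, not the real `org.apache.spark.internal.config` class; the 1024 * 1024 value mirrors the `DISK_WRITE_BUFFER_SIZE` default mentioned elsewhere in this thread and is illustrative.

```java
// Hand-written imitation of what scalac emits for `package object config`.
// `package$` is a legal Java identifier (it is not the `package` keyword itself).
final class package$ {
    public static final package$ MODULE$ = new package$();

    private final int SHUFFLE_DISK_WRITE_BUFFER_SIZE = 1024 * 1024;

    private package$() {}

    // A Scala `val` appears to Java code as a no-argument accessor method.
    public int SHUFFLE_DISK_WRITE_BUFFER_SIZE() {
        return SHUFFLE_DISK_WRITE_BUFFER_SIZE;
    }
}

public class Demo {
    public static void main(String[] args) {
        // From Java you go through the singleton and call the accessor:
        System.out.println(package$.MODULE$.SHUFFLE_DISK_WRITE_BUFFER_SIZE());
    }
}
```

This is why plain `import org.apache.spark.internal.config.*` does not work from Java: the members live on the generated `package$` class, reached via `MODULE$`.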





[GitHub] spark pull request #18174: [SPARK-20950][CORE]add a new config to diskWriteB...

2017-06-25 Thread heary-cao
Github user heary-cao commented on a diff in the pull request:

https://github.com/apache/spark/pull/18174#discussion_r123924134
  
--- Diff: core/src/main/java/org/apache/spark/shuffle/sort/ShuffleExternalSorter.java ---
@@ -123,6 +126,8 @@
 this.inMemSorter = new ShuffleInMemorySorter(
  this, initialSize, conf.getBoolean("spark.shuffle.sort.useRadixSort", true));
 this.peakMemoryUsedBytes = getMemoryUsage();
+this.diskWriteBufferSize =
+conf.getInt("spark.shuffle.spill.diskWriteBufferSize", DISK_WRITE_BUFFER_SIZE);
--- End diff --

@cloud-fan
Thanks for reviewing it, and thank you for your advice. I tried to fix it.
However, org.apache.spark.internal.config cannot currently be imported in the ShuffleExternalSorter.java class. This modification would change org.apache.spark.internal.config and affect other code, so I suggest making that change in a separate PR. Do you agree?






[GitHub] spark pull request #18174: [SPARK-20950][CORE]add a new config to diskWriteB...

2017-06-25 Thread heary-cao
Github user heary-cao commented on a diff in the pull request:

https://github.com/apache/spark/pull/18174#discussion_r123923859
  
--- Diff: core/src/main/java/org/apache/spark/shuffle/sort/ShuffleExternalSorter.java ---
@@ -82,6 +82,9 @@
  /** The buffer size to use when writing spills using DiskBlockObjectWriter */
   private final int fileBufferSizeBytes;
 
+  /** The buffer size to use when writes the sorted records to an on-disk file */
--- End diff --

@jiangxb1987
Thanks for reviewing it. The UnsafeSorterSpillWriter changes were updated; please review them.

