[GitHub] spark pull request #15915: [SPARK-18485][CORE] Underlying integer overflow w...

2016-12-17 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/15915


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15915: [SPARK-18485][CORE] Underlying integer overflow w...

2016-12-16 Thread uncleGen
Github user uncleGen commented on a diff in the pull request:

https://github.com/apache/spark/pull/15915#discussion_r92785420
  
--- Diff: 
core/src/test/scala/org/apache/spark/storage/MemoryStoreSuite.scala ---
@@ -60,7 +60,7 @@ class MemoryStoreSuite
 SizeEstimator invokePrivate initialize()
   }
 
-  def makeMemoryStore(maxMem: Long): (MemoryStore, BlockInfoManager) = {
+  def makeMemoryStore(maxMem: Long, conf: SparkConf = conf): (MemoryStore, 
BlockInfoManager) = {
--- End diff --

get it


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15915: [SPARK-18485][CORE] Underlying integer overflow w...

2016-12-16 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/15915#discussion_r92774218
  
--- Diff: 
core/src/main/scala/org/apache/spark/storage/memory/MemoryStore.scala ---
@@ -331,7 +331,15 @@ private[spark] class MemoryStore(
 var unrollMemoryUsedByThisBlock = 0L
 // Underlying buffer for unrolling the block
 val redirectableStream = new RedirectableOutputStream
-val bbos = new 
ChunkedByteBufferOutputStream(initialMemoryThreshold.toInt, allocator)
+val chunkSize = if (initialMemoryThreshold > Int.MaxValue) {
+  logWarning(s"Initial memory threshold of 
${Utils.bytesToString(initialMemoryThreshold)} " +
+s"is too large to be set as chunk size. It is safe to be cap at 
Int.MaxValue for chunk " +
+s"size instead.")
+  Int.MaxValue
+} else {
+  initialMemoryThreshold.toInt
+}
+val bbos = new ChunkedByteBufferOutputStream(chunkSize, allocator)
--- End diff --

Yeah. As it causes OOM during jenkins test, we may not be able to test it 
like currently added test case if we cap it as `Int.MaxValue`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15915: [SPARK-18485][CORE] Underlying integer overflow w...

2016-12-16 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/15915#discussion_r92768403
  
--- Diff: 
core/src/main/scala/org/apache/spark/storage/memory/MemoryStore.scala ---
@@ -331,7 +331,15 @@ private[spark] class MemoryStore(
 var unrollMemoryUsedByThisBlock = 0L
 // Underlying buffer for unrolling the block
 val redirectableStream = new RedirectableOutputStream
-val bbos = new 
ChunkedByteBufferOutputStream(initialMemoryThreshold.toInt, allocator)
+val chunkSize = if (initialMemoryThreshold > Int.MaxValue) {
+  logWarning(s"Initial memory threshold of 
${Utils.bytesToString(initialMemoryThreshold)} " +
+s"is too large to be set as chunk size. It is safe to be cap at 
Int.MaxValue for chunk " +
+s"size instead.")
+  Int.MaxValue
+} else {
+  initialMemoryThreshold.toInt
+}
+val bbos = new ChunkedByteBufferOutputStream(chunkSize, allocator)
--- End diff --

Another option is to catch OOM exception here and then log a meaningful 
error message to hint users to lower `spark.storage.unrollMemoryThreshold` 
setting.

@srowen What do you think?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15915: [SPARK-18485][CORE] Underlying integer overflow w...

2016-12-15 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/15915#discussion_r92740284
  
--- Diff: 
core/src/main/scala/org/apache/spark/storage/memory/MemoryStore.scala ---
@@ -331,7 +331,15 @@ private[spark] class MemoryStore(
 var unrollMemoryUsedByThisBlock = 0L
 // Underlying buffer for unrolling the block
 val redirectableStream = new RedirectableOutputStream
-val bbos = new 
ChunkedByteBufferOutputStream(initialMemoryThreshold.toInt, allocator)
+val chunkSize = if (initialMemoryThreshold > Int.MaxValue) {
+  logWarning(s"Initial memory threshold of 
${Utils.bytesToString(initialMemoryThreshold)} " +
+s"is too large to be set as chunk size. It is safe to be cap at 
Int.MaxValue for chunk " +
+s"size instead.")
+  Int.MaxValue
--- End diff --

Asking for `Int.MaxValue` causes `java.lang.OutOfMemoryError` on jenkins.

Although I think it is reasonable if it is on users side because they ask 
so much memory by configuration.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15915: [SPARK-18485][CORE] Underlying integer overflow w...

2016-12-15 Thread uncleGen
Github user uncleGen commented on a diff in the pull request:

https://github.com/apache/spark/pull/15915#discussion_r92739704
  
--- Diff: 
core/src/test/scala/org/apache/spark/storage/MemoryStoreSuite.scala ---
@@ -60,7 +60,7 @@ class MemoryStoreSuite
 SizeEstimator invokePrivate initialize()
   }
 
-  def makeMemoryStore(maxMem: Long): (MemoryStore, BlockInfoManager) = {
+  def makeMemoryStore(maxMem: Long, conf: SparkConf = conf): (MemoryStore, 
BlockInfoManager) = {
--- End diff --

I changed this to pass a temp conf which set 
`spark.storage.unrollMemoryThreshold` a different value.

```
+val tmpConf = conf.clone.set("spark.storage.unrollMemoryThreshold", 
s"${1L + Int.MaxValue}")
+val (memoryStore, blockInfoManager) = makeMemoryStore(12000L + 
Int.MaxValue, tmpConf)
```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15915: [SPARK-18485][CORE] Underlying integer overflow w...

2016-12-15 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/15915#discussion_r92739413
  
--- Diff: 
core/src/test/scala/org/apache/spark/storage/MemoryStoreSuite.scala ---
@@ -60,7 +60,7 @@ class MemoryStoreSuite
 SizeEstimator invokePrivate initialize()
   }
 
-  def makeMemoryStore(maxMem: Long): (MemoryStore, BlockInfoManager) = {
+  def makeMemoryStore(maxMem: Long, conf: SparkConf = conf): (MemoryStore, 
BlockInfoManager) = {
--- End diff --

I think it is for passing a SparkConf with modified 
`spark.storage.unrollMemoryThreshold`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15915: [SPARK-18485][CORE] Underlying integer overflow w...

2016-12-15 Thread uncleGen
Github user uncleGen commented on a diff in the pull request:

https://github.com/apache/spark/pull/15915#discussion_r92739151
  
--- Diff: 
core/src/test/scala/org/apache/spark/storage/MemoryStoreSuite.scala ---
@@ -60,7 +60,7 @@ class MemoryStoreSuite
 SizeEstimator invokePrivate initialize()
   }
 
-  def makeMemoryStore(maxMem: Long): (MemoryStore, BlockInfoManager) = {
+  def makeMemoryStore(maxMem: Long, conf: SparkConf = conf): (MemoryStore, 
BlockInfoManager) = {
--- End diff --

👍 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15915: [SPARK-18485][CORE] Underlying integer overflow w...

2016-12-15 Thread uncleGen
Github user uncleGen commented on a diff in the pull request:

https://github.com/apache/spark/pull/15915#discussion_r92739178
  
--- Diff: 
core/src/main/scala/org/apache/spark/storage/memory/MemoryStore.scala ---
@@ -331,7 +331,15 @@ private[spark] class MemoryStore(
 var unrollMemoryUsedByThisBlock = 0L
 // Underlying buffer for unrolling the block
 val redirectableStream = new RedirectableOutputStream
-val bbos = new 
ChunkedByteBufferOutputStream(initialMemoryThreshold.toInt, allocator)
+val chunkSize = if (initialMemoryThreshold > Int.MaxValue) {
+  logWarning(s"Initial memory threshold of 
${Utils.bytesToString(initialMemoryThreshold)} " +
+s"is too large to be set as chunk size. It is safe to be cap at 
Int.MaxValue for chunk " +
--- End diff --

👍 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15915: [SPARK-18485][CORE] Underlying integer overflow w...

2016-12-15 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/15915#discussion_r92611203
  
--- Diff: 
core/src/test/scala/org/apache/spark/storage/MemoryStoreSuite.scala ---
@@ -60,7 +60,7 @@ class MemoryStoreSuite
 SizeEstimator invokePrivate initialize()
   }
 
-  def makeMemoryStore(maxMem: Long): (MemoryStore, BlockInfoManager) = {
+  def makeMemoryStore(maxMem: Long, conf: SparkConf = conf): (MemoryStore, 
BlockInfoManager) = {
--- End diff --

Is this change needed?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15915: [SPARK-18485][CORE] Underlying integer overflow w...

2016-12-15 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/15915#discussion_r92611127
  
--- Diff: 
core/src/main/scala/org/apache/spark/storage/memory/MemoryStore.scala ---
@@ -331,7 +331,15 @@ private[spark] class MemoryStore(
 var unrollMemoryUsedByThisBlock = 0L
 // Underlying buffer for unrolling the block
 val redirectableStream = new RedirectableOutputStream
-val bbos = new 
ChunkedByteBufferOutputStream(initialMemoryThreshold.toInt, allocator)
+val chunkSize = if (initialMemoryThreshold > Int.MaxValue) {
+  logWarning(s"Initial memory threshold of 
${Utils.bytesToString(initialMemoryThreshold)} " +
+s"is too large to be set as chunk size. It is safe to be cap at 
Int.MaxValue for chunk " +
--- End diff --

How about `Chunk size has been capped to 
${Utils.bytesToStringInt.MaxValue)}`instead, in the second sentence? that's 
more direct.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15915: [SPARK-18485][CORE] Underlying integer overflow w...

2016-12-13 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/15915#discussion_r92323575
  
--- Diff: 
core/src/test/scala/org/apache/spark/storage/MemoryStoreSuite.scala ---
@@ -303,6 +303,34 @@ class MemoryStoreSuite
 assert(memoryStore.currentUnrollMemoryForThisTask === 0) // discard 
released the unroll memory
   }
 
+  test("set unrollMemoryThreshold a huge value larger than Int.MaxValue") {
+val tmpConf = conf.clone.set("spark.storage.unrollMemoryThreshold", 
s"${1L + Int.MaxValue}")
+val (memoryStore, blockInfoManager) = makeMemoryStore(12000L + 
Int.MaxValue, tmpConf)
+val smallList = List.fill(40)(new Array[Byte](100))
+def smallIterator: Iterator[Any] = 
smallList.iterator.asInstanceOf[Iterator[Any]]
+assert(memoryStore.currentUnrollMemoryForThisTask === 0)
+
+def putIteratorAsBytes[T](
+blockId: BlockId,
+iter: Iterator[T],
+classTag: ClassTag[T]): Either[PartiallySerializedBlock[T], Long] 
= {
+  assert(blockInfoManager.lockNewBlockForWriting(
+blockId,
+new BlockInfo(StorageLevel.MEMORY_ONLY_SER, classTag, tellMaster = 
false)))
+  val res = memoryStore.putIteratorAsBytes(blockId, iter, classTag, 
MemoryMode.ON_HEAP)
+  blockInfoManager.unlock(blockId)
+  res
+}
+
+// Unroll with plenty of space. This should succeed and cache both 
blocks.
+val result1 = putIteratorAsBytes("b1", smallIterator, ClassTag.Any)
+val result2 = putIteratorAsBytes("b2", smallIterator, ClassTag.Any)
+assert(memoryStore.contains("b1"))
+assert(memoryStore.contains("b2"))
+assert(result1.isRight) // unroll was successful
+assert(result2.isRight)
--- End diff --

nvm. It will be released once unrolling is successful.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15915: [SPARK-18485][CORE] Underlying integer overflow w...

2016-12-13 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/15915#discussion_r92322708
  
--- Diff: 
core/src/test/scala/org/apache/spark/storage/MemoryStoreSuite.scala ---
@@ -303,6 +303,34 @@ class MemoryStoreSuite
 assert(memoryStore.currentUnrollMemoryForThisTask === 0) // discard 
released the unroll memory
   }
 
+  test("set unrollMemoryThreshold a huge value larger than Int.MaxValue") {
+val tmpConf = conf.clone.set("spark.storage.unrollMemoryThreshold", 
s"${1L + Int.MaxValue}")
+val (memoryStore, blockInfoManager) = makeMemoryStore(12000L + 
Int.MaxValue, tmpConf)
+val smallList = List.fill(40)(new Array[Byte](100))
+def smallIterator: Iterator[Any] = 
smallList.iterator.asInstanceOf[Iterator[Any]]
+assert(memoryStore.currentUnrollMemoryForThisTask === 0)
+
+def putIteratorAsBytes[T](
+blockId: BlockId,
+iter: Iterator[T],
+classTag: ClassTag[T]): Either[PartiallySerializedBlock[T], Long] 
= {
+  assert(blockInfoManager.lockNewBlockForWriting(
+blockId,
+new BlockInfo(StorageLevel.MEMORY_ONLY_SER, classTag, tellMaster = 
false)))
+  val res = memoryStore.putIteratorAsBytes(blockId, iter, classTag, 
MemoryMode.ON_HEAP)
+  blockInfoManager.unlock(blockId)
+  res
+}
+
+// Unroll with plenty of space. This should succeed and cache both 
blocks.
+val result1 = putIteratorAsBytes("b1", smallIterator, ClassTag.Any)
+val result2 = putIteratorAsBytes("b2", smallIterator, ClassTag.Any)
+assert(memoryStore.contains("b1"))
+assert(memoryStore.contains("b2"))
+assert(result1.isRight) // unroll was successful
+assert(result2.isRight)
--- End diff --

Let's also check if the `currentUnrollMemoryForThisTask` is the default 
size 4MB.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15915: [SPARK-18485][CORE] Underlying integer overflow w...

2016-12-13 Thread uncleGen
Github user uncleGen commented on a diff in the pull request:

https://github.com/apache/spark/pull/15915#discussion_r92308058
  
--- Diff: 
core/src/main/scala/org/apache/spark/storage/memory/MemoryStore.scala ---
@@ -331,7 +331,12 @@ private[spark] class MemoryStore(
 var unrollMemoryUsedByThisBlock = 0L
 // Underlying buffer for unrolling the block
 val redirectableStream = new RedirectableOutputStream
-val bbos = new 
ChunkedByteBufferOutputStream(initialMemoryThreshold.toInt, allocator)
+val chunkSize = if (initialMemoryThreshold > Int.MaxValue) {
+  4 * 1024 * 1024
--- End diff --

I  prefer the first one, and the second one is too radical.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15915: [SPARK-18485][CORE] Underlying integer overflow w...

2016-12-13 Thread uncleGen
Github user uncleGen commented on a diff in the pull request:

https://github.com/apache/spark/pull/15915#discussion_r92305310
  
--- Diff: 
core/src/main/scala/org/apache/spark/storage/memory/MemoryStore.scala ---
@@ -331,7 +331,12 @@ private[spark] class MemoryStore(
 var unrollMemoryUsedByThisBlock = 0L
 // Underlying buffer for unrolling the block
 val redirectableStream = new RedirectableOutputStream
-val bbos = new 
ChunkedByteBufferOutputStream(initialMemoryThreshold.toInt, allocator)
+val chunkSize = if (initialMemoryThreshold > Int.MaxValue) {
+  4 * 1024 * 1024
--- End diff --

I prefer the first one.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15915: [SPARK-18485][CORE] Underlying integer overflow w...

2016-12-13 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/15915#discussion_r92169149
  
--- Diff: 
core/src/main/scala/org/apache/spark/storage/memory/MemoryStore.scala ---
@@ -331,7 +331,12 @@ private[spark] class MemoryStore(
 var unrollMemoryUsedByThisBlock = 0L
 // Underlying buffer for unrolling the block
 val redirectableStream = new RedirectableOutputStream
-val bbos = new 
ChunkedByteBufferOutputStream(initialMemoryThreshold.toInt, allocator)
+val chunkSize = if (initialMemoryThreshold > Int.MaxValue) {
+  4 * 1024 * 1024
--- End diff --

Because `initialMemoryThreshold` is set to `unrollMemoryThreshold` which is 
configurable, if we want to use a default value in this case, we should log a 
warning message at least.

Another option is just throwing an exception. I am ok for either one.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15915: [SPARK-18485][CORE] Underlying integer overflow w...

2016-12-13 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/15915#discussion_r92123402
  
--- Diff: 
core/src/main/scala/org/apache/spark/util/io/ChunkedByteBufferOutputStream.scala
 ---
@@ -30,9 +31,14 @@ import org.apache.spark.storage.StorageUtils
  * @param chunkSize size of each chunk, in bytes.
  */
 private[spark] class ChunkedByteBufferOutputStream(
-chunkSize: Int,
+var chunkSize: Int,
 allocator: Int => ByteBuffer)
-  extends OutputStream {
+  extends OutputStream with Logging{
+
+  if (chunkSize < 0) {
+logWarning(s"chunkSize should not be an negative value, replaced as 
4MB default.")
+chunkSize = 4 * 1024 * 1024
+  }
 
--- End diff --

This doesn't fix the issue, because a value > Int.MaxValue can still 
truncate to a (wrong) positive Int.

If protecting against Long values > Int.MaxValue is the least we can do to 
address this problem, we should actually do that. It can't be that many call 
sites right?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15915: [SPARK-18485][CORE] Underlying integer overflow w...

2016-12-12 Thread uncleGen
Github user uncleGen commented on a diff in the pull request:

https://github.com/apache/spark/pull/15915#discussion_r92104532
  
--- Diff: 
core/src/test/scala/org/apache/spark/util/io/ChunkedByteBufferOutputStreamSuite.scala
 ---
@@ -119,4 +119,21 @@ class ChunkedByteBufferOutputStreamSuite extends 
SparkFunSuite {
 assert(arrays(1).toSeq === ref.slice(10, 20))
 assert(arrays(2).toSeq === ref.slice(20, 30))
   }
+
+  test("negative chunk size") {
+val ref = new Array[Byte](8 * 1024 * 1024 + 10)
+Random.nextBytes(ref)
+val o = new ChunkedByteBufferOutputStream(-10, ByteBuffer.allocate)
+o.write(ref)
+o.close()
+val arrays = o.toChunkedByteBuffer.getChunks().map(_.array())
+assert(arrays.length === 3)
+assert(arrays(0).length === 4 * 1024 * 1024)
+assert(arrays(1).length === 4 * 1024 * 1024)
+assert(arrays(2).length === 10 )
+
+assert(arrays(0).toSeq === ref.slice(0, 4 * 1024 * 1024))
+assert(arrays(1).toSeq === ref.slice(4 * 1024 * 1024, 8 * 1024 * 1024))
+assert(arrays(2).toSeq === ref.slice(8 * 1024 * 1024, 8 * 1024 * 1024 
+ 10))
+  }
 }
--- End diff --

Discovery starting.
Discovery completed in 42 seconds, 124 milliseconds.
Run starting. Expected test count is: 9
ChunkedByteBufferOutputStreamSuite:
- empty output
- write a single byte
- write a single near boundary
- write a single at boundary
- single chunk output
- single chunk output at boundary size
- multiple chunk output
- multiple chunk output at boundary size
- negative chunk size
Run completed in 42 seconds, 700 milliseconds.
Total number of tests run: 9
Suites: completed 2, aborted 0
Tests: succeeded 9, failed 0, canceled 0, ignored 0, pending 0
All tests passed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15915: [SPARK-18485][CORE] Underlying integer overflow w...

2016-12-12 Thread uncleGen
Github user uncleGen commented on a diff in the pull request:

https://github.com/apache/spark/pull/15915#discussion_r92103662
  
--- Diff: 
core/src/main/scala/org/apache/spark/util/io/ChunkedByteBufferOutputStream.scala
 ---
@@ -30,9 +31,14 @@ import org.apache.spark.storage.StorageUtils
  * @param chunkSize size of each chunk, in bytes.
  */
 private[spark] class ChunkedByteBufferOutputStream(
-chunkSize: Int,
+var chunkSize: Int,
 allocator: Int => ByteBuffer)
-  extends OutputStream {
+  extends OutputStream with Logging{
+
+  if (chunkSize < 0) {
+logWarning(s"chunkSize should not be an negative value, replaced as 
4MB default.")
+chunkSize = 4 * 1024 * 1024
+  }
 
--- End diff --

As `chunkSize` is passed from many code path, and there is underlying 
integer overflow when convert from `Long` to `Int`. As we do not have a better 
solution, introducing a protection check may be a tradeoff way in the first 
step.

@JoshRosen @srowen 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15915: [SPARK-18485][CORE] Underlying integer overflow w...

2016-12-11 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/15915#discussion_r91893984
  
--- Diff: 
core/src/main/scala/org/apache/spark/storage/memory/MemoryStore.scala ---
@@ -331,7 +332,7 @@ private[spark] class MemoryStore(
 var unrollMemoryUsedByThisBlock = 0L
 // Underlying buffer for unrolling the block
 val redirectableStream = new RedirectableOutputStream
-val bbos = new 
ChunkedByteBufferOutputStream(initialMemoryThreshold.toInt, allocator)
+val bbos = new ChunkedByteBufferOutputStream(chunkSize, allocator)
--- End diff --

Don't we need to add check for the size? It still exposes to overflow by 
converting `pageSizeBytes` from long to int, right?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15915: [SPARK-18485][CORE] Underlying integer overflow w...

2016-12-11 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/15915#discussion_r91892123
  
--- Diff: 
core/src/main/scala/org/apache/spark/broadcast/TorrentBroadcast.scala ---
@@ -78,6 +80,7 @@ private[spark] class TorrentBroadcast[T: ClassTag](obj: 
T, id: Long)
 }
 // Note: use getSizeAsKb (not bytes) to maintain compatibility if no 
units are provided
 blockSize = conf.getSizeAsKb("spark.broadcast.blockSize", "4m").toInt 
* 1024
--- End diff --

`spark.broadcast.blockSize` has special meaning. I don't think we should 
replace it with `pageSizeBytes`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15915: [SPARK-18485][CORE] Underlying integer overflow w...

2016-12-11 Thread uncleGen
Github user uncleGen commented on a diff in the pull request:

https://github.com/apache/spark/pull/15915#discussion_r9169
  
--- Diff: 
core/src/main/scala/org/apache/spark/broadcast/TorrentBroadcast.scala ---
@@ -78,6 +80,7 @@ private[spark] class TorrentBroadcast[T: ClassTag](obj: 
T, id: Long)
 }
 // Note: use getSizeAsKb (not bytes) to maintain compatibility if no 
units are provided
 blockSize = conf.getSizeAsKb("spark.broadcast.blockSize", "4m").toInt 
* 1024
+chunkSize = SparkEnv.get.memoryManager.pageSizeBytes.toInt
 checksumEnabled = conf.getBoolean("spark.broadcast.checksum", true)
--- End diff --

@JoshRosen We use `SparkEnv.get.memoryManager.pageSizeBytes` as chunk size. 
As `SparkEnv.get.memoryManager.pageSizeBytes` returns `Long`, there is still 
underlying integer overflow issue, isn't it? Besides, users will never know the 
 low level details and the effect to chunk size when modify `pageSizeBytes`. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15915: [SPARK-18485][CORE] Underlying integer overflow w...

2016-12-11 Thread uncleGen
Github user uncleGen commented on a diff in the pull request:

https://github.com/apache/spark/pull/15915#discussion_r91880781
  
--- Diff: core/src/main/scala/org/apache/spark/memory/MemoryManager.scala 
---
@@ -223,8 +222,10 @@ private[spark] abstract class MemoryManager(
   case MemoryMode.OFF_HEAP => offHeapExecutionMemoryPool.poolSize
 }
 val size = ByteArrayMethods.nextPowerOf2(maxTungstenMemory / cores / 
safetyFactor)
-val default = math.min(maxPageSize, math.max(minPageSize, size))
-conf.getSizeAsBytes("spark.buffer.pageSize", default)
+val maxPageSize = math.min(64L * minPageSize, math.max(minPageSize, 
size))
+val userSetting = conf.getSizeAsBytes("spark.buffer.pageSize")
+// In case of too large page size.
+math.min(userSetting, maxPageSize)
   }
--- End diff --

@JoshRosen The `SparkEnv.memoryManager.pageSizeBytes` returns `Long`, if we 
reuse it as chunk size, there is still underlying integer overflow, isn't it? 
Here, I restricted the upper limit of page size in case of too large.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15915: [SPARK-18485][CORE] Underlying integer overflow w...

2016-11-17 Thread uncleGen
Github user uncleGen commented on a diff in the pull request:

https://github.com/apache/spark/pull/15915#discussion_r88602061
  
--- Diff: 
core/src/main/scala/org/apache/spark/storage/memory/MemoryStore.scala ---
@@ -331,7 +331,12 @@ private[spark] class MemoryStore(
 var unrollMemoryUsedByThisBlock = 0L
 // Underlying buffer for unrolling the block
 val redirectableStream = new RedirectableOutputStream
-val bbos = new 
ChunkedByteBufferOutputStream(initialMemoryThreshold.toInt, allocator)
+val chunkSize = if (initialMemoryThreshold >= Integer.MAX_VALUE) {
--- End diff --

Thanks for your feedback. Let us  listen to @joshrosen 's advice


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15915: [SPARK-18485][CORE] Underlying integer overflow w...

2016-11-17 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/15915#discussion_r88453146
  
--- Diff: 
core/src/main/scala/org/apache/spark/storage/memory/MemoryStore.scala ---
@@ -331,7 +331,12 @@ private[spark] class MemoryStore(
 var unrollMemoryUsedByThisBlock = 0L
 // Underlying buffer for unrolling the block
 val redirectableStream = new RedirectableOutputStream
-val bbos = new 
ChunkedByteBufferOutputStream(initialMemoryThreshold.toInt, allocator)
+val chunkSize = if (initialMemoryThreshold >= Integer.MAX_VALUE) {
--- End diff --

Yes, this is no worse than the current behavior (error) in this case; is 
this valid? It looks like it's valid to use something smaller than this init 
threshold if it's huge. Tests will help prove that out.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15915: [SPARK-18485][CORE] Underlying integer overflow w...

2016-11-17 Thread uncleGen
Github user uncleGen commented on a diff in the pull request:

https://github.com/apache/spark/pull/15915#discussion_r88429590
  
--- Diff: 
core/src/main/scala/org/apache/spark/storage/memory/MemoryStore.scala ---
@@ -331,7 +332,8 @@ private[spark] class MemoryStore(
 var unrollMemoryUsedByThisBlock = 0L
 // Underlying buffer for unrolling the block
 val redirectableStream = new RedirectableOutputStream
-val bbos = new 
ChunkedByteBufferOutputStream(initialMemoryThreshold.toInt, allocator)
+val bbos =
--- End diff --

make sense, thanks


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15915: [SPARK-18485][CORE] Underlying integer overflow w...

2016-11-17 Thread uncleGen
Github user uncleGen commented on a diff in the pull request:

https://github.com/apache/spark/pull/15915#discussion_r88429484
  
--- Diff: 
core/src/main/scala/org/apache/spark/storage/memory/MemoryStore.scala ---
@@ -331,7 +332,8 @@ private[spark] class MemoryStore(
 var unrollMemoryUsedByThisBlock = 0L
 // Underlying buffer for unrolling the block
 val redirectableStream = new RedirectableOutputStream
-val bbos = new 
ChunkedByteBufferOutputStream(initialMemoryThreshold.toInt, allocator)
+val bbos =
--- End diff --

make sense, thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15915: [SPARK-18485][CORE] Underlying integer overflow w...

2016-11-17 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/15915#discussion_r88426103
  
--- Diff: 
core/src/main/scala/org/apache/spark/storage/memory/MemoryStore.scala ---
@@ -331,7 +332,8 @@ private[spark] class MemoryStore(
 var unrollMemoryUsedByThisBlock = 0L
 // Underlying buffer for unrolling the block
 val redirectableStream = new RedirectableOutputStream
-val bbos = new 
ChunkedByteBufferOutputStream(initialMemoryThreshold.toInt, allocator)
+val bbos =
--- End diff --

Rather than import a whole other class for this, you probably want to 
explicitly check and throw an exception.

But, you're basically saying that this threshold can't be >= 4GB. Should 
that not just be checked earlier?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15915: [SPARK-18485][CORE] Underlying integer overflow w...

2016-11-17 Thread uncleGen
GitHub user uncleGen opened a pull request:

https://github.com/apache/spark/pull/15915

[SPARK-18485][CORE] Underlying integer overflow when create 
ChunkedByteBufferOutputStream in MemoryStore

## What changes were proposed in this pull request?

There is an underlying integer overflow when create 
ChunkedByteBufferOutputStream in MemoryStore. This PR provide a check before 
cast.  

## How was this patch tested?


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/uncleGen/spark SPARK-18485

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/15915.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #15915


commit e10143b0a7beb8e0924c7ba5bf437f0848eac778
Author: uncleGen 
Date:   2016-11-17T08:30:05Z

SPARK-18485: Underlying integer overflow when create 
ChunkedByteBufferOutputStream in MemoryStore

commit 8cd01d1d9f291a4515ae55ed15532fb89d09a006
Author: uncleGen 
Date:   2016-11-17T08:31:28Z

update code style




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org