[GitHub] spark pull request #15915: [SPARK-18485][CORE] Underlying integer overflow w...
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/15915
[GitHub] spark pull request #15915: [SPARK-18485][CORE] Underlying integer overflow w...
Github user uncleGen commented on a diff in the pull request:

https://github.com/apache/spark/pull/15915#discussion_r92785420

--- Diff: core/src/test/scala/org/apache/spark/storage/MemoryStoreSuite.scala ---
@@ -60,7 +60,7 @@ class MemoryStoreSuite
     SizeEstimator invokePrivate initialize()
   }

-  def makeMemoryStore(maxMem: Long): (MemoryStore, BlockInfoManager) = {
+  def makeMemoryStore(maxMem: Long, conf: SparkConf = conf): (MemoryStore, BlockInfoManager) = {
--- End diff --

Got it.
[GitHub] spark pull request #15915: [SPARK-18485][CORE] Underlying integer overflow w...
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/15915#discussion_r92774218

--- Diff: core/src/main/scala/org/apache/spark/storage/memory/MemoryStore.scala ---
@@ -331,7 +331,15 @@ private[spark] class MemoryStore(
     var unrollMemoryUsedByThisBlock = 0L
     // Underlying buffer for unrolling the block
     val redirectableStream = new RedirectableOutputStream
-    val bbos = new ChunkedByteBufferOutputStream(initialMemoryThreshold.toInt, allocator)
+    val chunkSize = if (initialMemoryThreshold > Int.MaxValue) {
+      logWarning(s"Initial memory threshold of ${Utils.bytesToString(initialMemoryThreshold)} " +
+        s"is too large to be set as chunk size. It is safe to be cap at Int.MaxValue for chunk " +
+        s"size instead.")
+      Int.MaxValue
+    } else {
+      initialMemoryThreshold.toInt
+    }
+    val bbos = new ChunkedByteBufferOutputStream(chunkSize, allocator)
--- End diff --

Yeah. Since it causes an OOM during the Jenkins test, we may not be able to test it with the currently added test case if we cap it at `Int.MaxValue`.
[GitHub] spark pull request #15915: [SPARK-18485][CORE] Underlying integer overflow w...
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/15915#discussion_r92768403

--- Diff: core/src/main/scala/org/apache/spark/storage/memory/MemoryStore.scala ---
@@ -331,7 +331,15 @@ private[spark] class MemoryStore(
     var unrollMemoryUsedByThisBlock = 0L
     // Underlying buffer for unrolling the block
     val redirectableStream = new RedirectableOutputStream
-    val bbos = new ChunkedByteBufferOutputStream(initialMemoryThreshold.toInt, allocator)
+    val chunkSize = if (initialMemoryThreshold > Int.MaxValue) {
+      logWarning(s"Initial memory threshold of ${Utils.bytesToString(initialMemoryThreshold)} " +
+        s"is too large to be set as chunk size. It is safe to be cap at Int.MaxValue for chunk " +
+        s"size instead.")
+      Int.MaxValue
+    } else {
+      initialMemoryThreshold.toInt
+    }
+    val bbos = new ChunkedByteBufferOutputStream(chunkSize, allocator)
--- End diff --

Another option is to catch the OOM exception here and log a meaningful error message hinting users to lower the `spark.storage.unrollMemoryThreshold` setting. @srowen What do you think?
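[Editor's note] A minimal sketch of the option proposed above, reusing names from the surrounding `MemoryStore` code (`initialMemoryThreshold`, `allocator`, `Utils`, `logError`); the catch site and message are illustrative, not the merged change. In practice the chunk allocation happens lazily on write, so a real fix would likely wrap the write path rather than the constructor:

```scala
// Sketch: surface a configuration hint instead of a bare OutOfMemoryError
// when the unroll buffer cannot be allocated.
val bbos = try {
  new ChunkedByteBufferOutputStream(initialMemoryThreshold.toInt, allocator)
} catch {
  case oom: OutOfMemoryError =>
    logError(s"Failed to allocate an unroll buffer of " +
      s"${Utils.bytesToString(initialMemoryThreshold)}; consider lowering " +
      "spark.storage.unrollMemoryThreshold.")
    throw oom
}
```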
[GitHub] spark pull request #15915: [SPARK-18485][CORE] Underlying integer overflow w...
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/15915#discussion_r92740284

--- Diff: core/src/main/scala/org/apache/spark/storage/memory/MemoryStore.scala ---
@@ -331,7 +331,15 @@ private[spark] class MemoryStore(
     var unrollMemoryUsedByThisBlock = 0L
     // Underlying buffer for unrolling the block
     val redirectableStream = new RedirectableOutputStream
-    val bbos = new ChunkedByteBufferOutputStream(initialMemoryThreshold.toInt, allocator)
+    val chunkSize = if (initialMemoryThreshold > Int.MaxValue) {
+      logWarning(s"Initial memory threshold of ${Utils.bytesToString(initialMemoryThreshold)} " +
+        s"is too large to be set as chunk size. It is safe to be cap at Int.MaxValue for chunk " +
+        s"size instead.")
+      Int.MaxValue
--- End diff --

Asking for `Int.MaxValue` causes a `java.lang.OutOfMemoryError` on Jenkins, although I think that is reasonable if it happens on the user's side, because they asked for that much memory via configuration.
[GitHub] spark pull request #15915: [SPARK-18485][CORE] Underlying integer overflow w...
Github user uncleGen commented on a diff in the pull request:

https://github.com/apache/spark/pull/15915#discussion_r92739704

--- Diff: core/src/test/scala/org/apache/spark/storage/MemoryStoreSuite.scala ---
@@ -60,7 +60,7 @@ class MemoryStoreSuite
     SizeEstimator invokePrivate initialize()
   }

-  def makeMemoryStore(maxMem: Long): (MemoryStore, BlockInfoManager) = {
+  def makeMemoryStore(maxMem: Long, conf: SparkConf = conf): (MemoryStore, BlockInfoManager) = {
--- End diff --

I changed this to pass a temporary conf that sets `spark.storage.unrollMemoryThreshold` to a different value:

```
+    val tmpConf = conf.clone.set("spark.storage.unrollMemoryThreshold", s"${1L + Int.MaxValue}")
+    val (memoryStore, blockInfoManager) = makeMemoryStore(12000L + Int.MaxValue, tmpConf)
```
[GitHub] spark pull request #15915: [SPARK-18485][CORE] Underlying integer overflow w...
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/15915#discussion_r92739413

--- Diff: core/src/test/scala/org/apache/spark/storage/MemoryStoreSuite.scala ---
@@ -60,7 +60,7 @@ class MemoryStoreSuite
     SizeEstimator invokePrivate initialize()
  }

-  def makeMemoryStore(maxMem: Long): (MemoryStore, BlockInfoManager) = {
+  def makeMemoryStore(maxMem: Long, conf: SparkConf = conf): (MemoryStore, BlockInfoManager) = {
--- End diff --

I think it is for passing a SparkConf with a modified `spark.storage.unrollMemoryThreshold`.
[GitHub] spark pull request #15915: [SPARK-18485][CORE] Underlying integer overflow w...
Github user uncleGen commented on a diff in the pull request:

https://github.com/apache/spark/pull/15915#discussion_r92739151

--- Diff: core/src/test/scala/org/apache/spark/storage/MemoryStoreSuite.scala ---
@@ -60,7 +60,7 @@ class MemoryStoreSuite
     SizeEstimator invokePrivate initialize()
   }

-  def makeMemoryStore(maxMem: Long): (MemoryStore, BlockInfoManager) = {
+  def makeMemoryStore(maxMem: Long, conf: SparkConf = conf): (MemoryStore, BlockInfoManager) = {
--- End diff --

👍
[GitHub] spark pull request #15915: [SPARK-18485][CORE] Underlying integer overflow w...
Github user uncleGen commented on a diff in the pull request:

https://github.com/apache/spark/pull/15915#discussion_r92739178

--- Diff: core/src/main/scala/org/apache/spark/storage/memory/MemoryStore.scala ---
@@ -331,7 +331,15 @@ private[spark] class MemoryStore(
     var unrollMemoryUsedByThisBlock = 0L
     // Underlying buffer for unrolling the block
     val redirectableStream = new RedirectableOutputStream
-    val bbos = new ChunkedByteBufferOutputStream(initialMemoryThreshold.toInt, allocator)
+    val chunkSize = if (initialMemoryThreshold > Int.MaxValue) {
+      logWarning(s"Initial memory threshold of ${Utils.bytesToString(initialMemoryThreshold)} " +
+        s"is too large to be set as chunk size. It is safe to be cap at Int.MaxValue for chunk " +
--- End diff --

👍
[GitHub] spark pull request #15915: [SPARK-18485][CORE] Underlying integer overflow w...
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/15915#discussion_r92611203

--- Diff: core/src/test/scala/org/apache/spark/storage/MemoryStoreSuite.scala ---
@@ -60,7 +60,7 @@ class MemoryStoreSuite
     SizeEstimator invokePrivate initialize()
   }

-  def makeMemoryStore(maxMem: Long): (MemoryStore, BlockInfoManager) = {
+  def makeMemoryStore(maxMem: Long, conf: SparkConf = conf): (MemoryStore, BlockInfoManager) = {
--- End diff --

Is this change needed?
[GitHub] spark pull request #15915: [SPARK-18485][CORE] Underlying integer overflow w...
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/15915#discussion_r92611127

--- Diff: core/src/main/scala/org/apache/spark/storage/memory/MemoryStore.scala ---
@@ -331,7 +331,15 @@ private[spark] class MemoryStore(
     var unrollMemoryUsedByThisBlock = 0L
     // Underlying buffer for unrolling the block
     val redirectableStream = new RedirectableOutputStream
-    val bbos = new ChunkedByteBufferOutputStream(initialMemoryThreshold.toInt, allocator)
+    val chunkSize = if (initialMemoryThreshold > Int.MaxValue) {
+      logWarning(s"Initial memory threshold of ${Utils.bytesToString(initialMemoryThreshold)} " +
+        s"is too large to be set as chunk size. It is safe to be cap at Int.MaxValue for chunk " +
--- End diff --

How about `Chunk size has been capped to ${Utils.bytesToString(Int.MaxValue)}` instead, in the second sentence? That's more direct.
[GitHub] spark pull request #15915: [SPARK-18485][CORE] Underlying integer overflow w...
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/15915#discussion_r92323575

--- Diff: core/src/test/scala/org/apache/spark/storage/MemoryStoreSuite.scala ---
@@ -303,6 +303,34 @@ class MemoryStoreSuite
     assert(memoryStore.currentUnrollMemoryForThisTask === 0) // discard released the unroll memory
   }

+  test("set unrollMemoryThreshold a huge value larger than Int.MaxValue") {
+    val tmpConf = conf.clone.set("spark.storage.unrollMemoryThreshold", s"${1L + Int.MaxValue}")
+    val (memoryStore, blockInfoManager) = makeMemoryStore(12000L + Int.MaxValue, tmpConf)
+    val smallList = List.fill(40)(new Array[Byte](100))
+    def smallIterator: Iterator[Any] = smallList.iterator.asInstanceOf[Iterator[Any]]
+    assert(memoryStore.currentUnrollMemoryForThisTask === 0)
+
+    def putIteratorAsBytes[T](
+        blockId: BlockId,
+        iter: Iterator[T],
+        classTag: ClassTag[T]): Either[PartiallySerializedBlock[T], Long] = {
+      assert(blockInfoManager.lockNewBlockForWriting(
+        blockId,
+        new BlockInfo(StorageLevel.MEMORY_ONLY_SER, classTag, tellMaster = false)))
+      val res = memoryStore.putIteratorAsBytes(blockId, iter, classTag, MemoryMode.ON_HEAP)
+      blockInfoManager.unlock(blockId)
+      res
+    }
+
+    // Unroll with plenty of space. This should succeed and cache both blocks.
+    val result1 = putIteratorAsBytes("b1", smallIterator, ClassTag.Any)
+    val result2 = putIteratorAsBytes("b2", smallIterator, ClassTag.Any)
+    assert(memoryStore.contains("b1"))
+    assert(memoryStore.contains("b2"))
+    assert(result1.isRight) // unroll was successful
+    assert(result2.isRight)
--- End diff --

nvm. It will be released once unrolling is successful.
[GitHub] spark pull request #15915: [SPARK-18485][CORE] Underlying integer overflow w...
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/15915#discussion_r92322708

--- Diff: core/src/test/scala/org/apache/spark/storage/MemoryStoreSuite.scala ---
@@ -303,6 +303,34 @@ class MemoryStoreSuite
     assert(memoryStore.currentUnrollMemoryForThisTask === 0) // discard released the unroll memory
   }

+  test("set unrollMemoryThreshold a huge value larger than Int.MaxValue") {
+    val tmpConf = conf.clone.set("spark.storage.unrollMemoryThreshold", s"${1L + Int.MaxValue}")
+    val (memoryStore, blockInfoManager) = makeMemoryStore(12000L + Int.MaxValue, tmpConf)
+    val smallList = List.fill(40)(new Array[Byte](100))
+    def smallIterator: Iterator[Any] = smallList.iterator.asInstanceOf[Iterator[Any]]
+    assert(memoryStore.currentUnrollMemoryForThisTask === 0)
+
+    def putIteratorAsBytes[T](
+        blockId: BlockId,
+        iter: Iterator[T],
+        classTag: ClassTag[T]): Either[PartiallySerializedBlock[T], Long] = {
+      assert(blockInfoManager.lockNewBlockForWriting(
+        blockId,
+        new BlockInfo(StorageLevel.MEMORY_ONLY_SER, classTag, tellMaster = false)))
+      val res = memoryStore.putIteratorAsBytes(blockId, iter, classTag, MemoryMode.ON_HEAP)
+      blockInfoManager.unlock(blockId)
+      res
+    }
+
+    // Unroll with plenty of space. This should succeed and cache both blocks.
+    val result1 = putIteratorAsBytes("b1", smallIterator, ClassTag.Any)
+    val result2 = putIteratorAsBytes("b2", smallIterator, ClassTag.Any)
+    assert(memoryStore.contains("b1"))
+    assert(memoryStore.contains("b2"))
+    assert(result1.isRight) // unroll was successful
+    assert(result2.isRight)
--- End diff --

Let's also check if the `currentUnrollMemoryForThisTask` is the default size, 4 MB.
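[Editor's note] For concreteness, the check proposed here would be a single extra assertion (a sketch, not the merged test; per the follow-up above, the reservation is released once unrolling succeeds, so this would only hold while a block is still being unrolled):

```scala
// Sketch of the proposed check: while a block is being unrolled, this task's
// reserved unroll memory would equal the default initial threshold of 4 MB.
assert(memoryStore.currentUnrollMemoryForThisTask === 4 * 1024 * 1024)
```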
[GitHub] spark pull request #15915: [SPARK-18485][CORE] Underlying integer overflow w...
Github user uncleGen commented on a diff in the pull request:

https://github.com/apache/spark/pull/15915#discussion_r92308058

--- Diff: core/src/main/scala/org/apache/spark/storage/memory/MemoryStore.scala ---
@@ -331,7 +331,12 @@ private[spark] class MemoryStore(
     var unrollMemoryUsedByThisBlock = 0L
     // Underlying buffer for unrolling the block
     val redirectableStream = new RedirectableOutputStream
-    val bbos = new ChunkedByteBufferOutputStream(initialMemoryThreshold.toInt, allocator)
+    val chunkSize = if (initialMemoryThreshold > Int.MaxValue) {
+      4 * 1024 * 1024
--- End diff --

I prefer the first one; the second one is too radical.
[GitHub] spark pull request #15915: [SPARK-18485][CORE] Underlying integer overflow w...
Github user uncleGen commented on a diff in the pull request:

https://github.com/apache/spark/pull/15915#discussion_r92305310

--- Diff: core/src/main/scala/org/apache/spark/storage/memory/MemoryStore.scala ---
@@ -331,7 +331,12 @@ private[spark] class MemoryStore(
     var unrollMemoryUsedByThisBlock = 0L
     // Underlying buffer for unrolling the block
     val redirectableStream = new RedirectableOutputStream
-    val bbos = new ChunkedByteBufferOutputStream(initialMemoryThreshold.toInt, allocator)
+    val chunkSize = if (initialMemoryThreshold > Int.MaxValue) {
+      4 * 1024 * 1024
--- End diff --

I prefer the first one.
[GitHub] spark pull request #15915: [SPARK-18485][CORE] Underlying integer overflow w...
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/15915#discussion_r92169149

--- Diff: core/src/main/scala/org/apache/spark/storage/memory/MemoryStore.scala ---
@@ -331,7 +331,12 @@ private[spark] class MemoryStore(
     var unrollMemoryUsedByThisBlock = 0L
     // Underlying buffer for unrolling the block
     val redirectableStream = new RedirectableOutputStream
-    val bbos = new ChunkedByteBufferOutputStream(initialMemoryThreshold.toInt, allocator)
+    val chunkSize = if (initialMemoryThreshold > Int.MaxValue) {
+      4 * 1024 * 1024
--- End diff --

Because `initialMemoryThreshold` is set to `unrollMemoryThreshold`, which is configurable, if we want to use a default value in this case we should at least log a warning message. Another option is just throwing an exception. I am OK with either one.
[GitHub] spark pull request #15915: [SPARK-18485][CORE] Underlying integer overflow w...
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/15915#discussion_r92123402

--- Diff: core/src/main/scala/org/apache/spark/util/io/ChunkedByteBufferOutputStream.scala ---
@@ -30,9 +31,14 @@ import org.apache.spark.storage.StorageUtils
  * @param chunkSize size of each chunk, in bytes.
  */
 private[spark] class ChunkedByteBufferOutputStream(
-    chunkSize: Int,
+    var chunkSize: Int,
     allocator: Int => ByteBuffer)
-  extends OutputStream {
+  extends OutputStream with Logging {
+
+  if (chunkSize < 0) {
+    logWarning(s"chunkSize should not be a negative value; replaced with the 4MB default.")
+    chunkSize = 4 * 1024 * 1024
+  }
--- End diff --

This doesn't fix the issue, because a value > Int.MaxValue can still truncate to a (wrong) positive Int. If protecting against Long values > Int.MaxValue is the least we can do to address this problem, we should actually do that. It can't be that many call sites, right?
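[Editor's note] To make the truncation concrete: these are plain Scala/JVM `Long`-to-`Int` conversion semantics, shown as a REPL-style illustration (not code from the patch):

```scala
// Long.toInt keeps only the low 32 bits, so an oversized size can wrap to a
// negative Int...
scala> (Int.MaxValue + 1L).toInt
res0: Int = -2147483648

// ...or, worse, to a small but wrong *positive* Int that passes a `< 0` check:
scala> ((1L << 32) + 1L).toInt
res1: Int = 1
```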
[GitHub] spark pull request #15915: [SPARK-18485][CORE] Underlying integer overflow w...
Github user uncleGen commented on a diff in the pull request:

https://github.com/apache/spark/pull/15915#discussion_r92104532

--- Diff: core/src/test/scala/org/apache/spark/util/io/ChunkedByteBufferOutputStreamSuite.scala ---
@@ -119,4 +119,21 @@ class ChunkedByteBufferOutputStreamSuite extends SparkFunSuite {
     assert(arrays(1).toSeq === ref.slice(10, 20))
     assert(arrays(2).toSeq === ref.slice(20, 30))
   }
+
+  test("negative chunk size") {
+    val ref = new Array[Byte](8 * 1024 * 1024 + 10)
+    Random.nextBytes(ref)
+    val o = new ChunkedByteBufferOutputStream(-10, ByteBuffer.allocate)
+    o.write(ref)
+    o.close()
+    val arrays = o.toChunkedByteBuffer.getChunks().map(_.array())
+    assert(arrays.length === 3)
+    assert(arrays(0).length === 4 * 1024 * 1024)
+    assert(arrays(1).length === 4 * 1024 * 1024)
+    assert(arrays(2).length === 10)
+
+    assert(arrays(0).toSeq === ref.slice(0, 4 * 1024 * 1024))
+    assert(arrays(1).toSeq === ref.slice(4 * 1024 * 1024, 8 * 1024 * 1024))
+    assert(arrays(2).toSeq === ref.slice(8 * 1024 * 1024, 8 * 1024 * 1024 + 10))
+  }
 }
--- End diff --

```
Discovery starting.
Discovery completed in 42 seconds, 124 milliseconds.
Run starting. Expected test count is: 9
ChunkedByteBufferOutputStreamSuite:
- empty output
- write a single byte
- write a single near boundary
- write a single at boundary
- single chunk output
- single chunk output at boundary size
- multiple chunk output
- multiple chunk output at boundary size
- negative chunk size
Run completed in 42 seconds, 700 milliseconds.
Total number of tests run: 9
Suites: completed 2, aborted 0
Tests: succeeded 9, failed 0, canceled 0, ignored 0, pending 0
All tests passed.
```
[GitHub] spark pull request #15915: [SPARK-18485][CORE] Underlying integer overflow w...
Github user uncleGen commented on a diff in the pull request:

https://github.com/apache/spark/pull/15915#discussion_r92103662

--- Diff: core/src/main/scala/org/apache/spark/util/io/ChunkedByteBufferOutputStream.scala ---
@@ -30,9 +31,14 @@ import org.apache.spark.storage.StorageUtils
  * @param chunkSize size of each chunk, in bytes.
  */
 private[spark] class ChunkedByteBufferOutputStream(
-    chunkSize: Int,
+    var chunkSize: Int,
    allocator: Int => ByteBuffer)
-  extends OutputStream {
+  extends OutputStream with Logging {
+
+  if (chunkSize < 0) {
+    logWarning(s"chunkSize should not be a negative value; replaced with the 4MB default.")
+    chunkSize = 4 * 1024 * 1024
+  }
--- End diff --

`chunkSize` is passed in from many code paths, and there is an underlying integer overflow when converting from `Long` to `Int`. Since we do not have a better solution, introducing a protection check may be a reasonable tradeoff as a first step. @JoshRosen @srowen
[GitHub] spark pull request #15915: [SPARK-18485][CORE] Underlying integer overflow w...
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/15915#discussion_r91893984

--- Diff: core/src/main/scala/org/apache/spark/storage/memory/MemoryStore.scala ---
@@ -331,7 +332,7 @@ private[spark] class MemoryStore(
     var unrollMemoryUsedByThisBlock = 0L
     // Underlying buffer for unrolling the block
     val redirectableStream = new RedirectableOutputStream
-    val bbos = new ChunkedByteBufferOutputStream(initialMemoryThreshold.toInt, allocator)
+    val bbos = new ChunkedByteBufferOutputStream(chunkSize, allocator)
--- End diff --

Don't we need to add a check for the size? It is still exposed to overflow when converting `pageSizeBytes` from `Long` to `Int`, right?
[GitHub] spark pull request #15915: [SPARK-18485][CORE] Underlying integer overflow w...
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/15915#discussion_r91892123

--- Diff: core/src/main/scala/org/apache/spark/broadcast/TorrentBroadcast.scala ---
@@ -78,6 +80,7 @@ private[spark] class TorrentBroadcast[T: ClassTag](obj: T, id: Long)
   }
   // Note: use getSizeAsKb (not bytes) to maintain compatibility if no units are provided
   blockSize = conf.getSizeAsKb("spark.broadcast.blockSize", "4m").toInt * 1024
--- End diff --

`spark.broadcast.blockSize` has special meaning. I don't think we should replace it with `pageSizeBytes`.
[GitHub] spark pull request #15915: [SPARK-18485][CORE] Underlying integer overflow w...
Github user uncleGen commented on a diff in the pull request:

https://github.com/apache/spark/pull/15915#discussion_r9169

--- Diff: core/src/main/scala/org/apache/spark/broadcast/TorrentBroadcast.scala ---
@@ -78,6 +80,7 @@ private[spark] class TorrentBroadcast[T: ClassTag](obj: T, id: Long)
   }
   // Note: use getSizeAsKb (not bytes) to maintain compatibility if no units are provided
   blockSize = conf.getSizeAsKb("spark.broadcast.blockSize", "4m").toInt * 1024
+  chunkSize = SparkEnv.get.memoryManager.pageSizeBytes.toInt
   checksumEnabled = conf.getBoolean("spark.broadcast.checksum", true)
--- End diff --

@JoshRosen We use `SparkEnv.get.memoryManager.pageSizeBytes` as the chunk size. Since `SparkEnv.get.memoryManager.pageSizeBytes` returns a `Long`, there is still an underlying integer overflow issue, isn't there? Besides, users will never know the low-level details, or the effect on the chunk size, when they modify `pageSizeBytes`.
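[Editor's note] One way to make the concern above concrete is a bounded conversion instead of a bare `.toInt`. This is a sketch of the kind of protection check being discussed; `toIntSize` is a hypothetical helper, not existing Spark API:

```scala
// Hypothetical helper: narrow a Long byte size to Int, capping at Int.MaxValue
// instead of silently truncating to the low 32 bits.
def toIntSize(size: Long): Int = {
  require(size > 0, s"size must be positive, but got $size")
  math.min(size, Int.MaxValue.toLong).toInt
}

// e.g. chunkSize = toIntSize(SparkEnv.get.memoryManager.pageSizeBytes)
```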
[GitHub] spark pull request #15915: [SPARK-18485][CORE] Underlying integer overflow w...
Github user uncleGen commented on a diff in the pull request:

https://github.com/apache/spark/pull/15915#discussion_r91880781

--- Diff: core/src/main/scala/org/apache/spark/memory/MemoryManager.scala ---
@@ -223,8 +222,10 @@ private[spark] abstract class MemoryManager(
       case MemoryMode.OFF_HEAP => offHeapExecutionMemoryPool.poolSize
     }
     val size = ByteArrayMethods.nextPowerOf2(maxTungstenMemory / cores / safetyFactor)
-    val default = math.min(maxPageSize, math.max(minPageSize, size))
-    conf.getSizeAsBytes("spark.buffer.pageSize", default)
+    val maxPageSize = math.min(64L * minPageSize, math.max(minPageSize, size))
+    val userSetting = conf.getSizeAsBytes("spark.buffer.pageSize")
+    // In case of too large page size.
+    math.min(userSetting, maxPageSize)
   }
--- End diff --

@JoshRosen `SparkEnv.memoryManager.pageSizeBytes` returns a `Long`; if we reuse it as the chunk size, there is still an underlying integer overflow, isn't there? Here, I restricted the upper limit of the page size in case it is set too large.
[GitHub] spark pull request #15915: [SPARK-18485][CORE] Underlying integer overflow w...
Github user uncleGen commented on a diff in the pull request:

https://github.com/apache/spark/pull/15915#discussion_r88602061

--- Diff: core/src/main/scala/org/apache/spark/storage/memory/MemoryStore.scala ---
@@ -331,7 +331,12 @@ private[spark] class MemoryStore(
     var unrollMemoryUsedByThisBlock = 0L
     // Underlying buffer for unrolling the block
     val redirectableStream = new RedirectableOutputStream
-    val bbos = new ChunkedByteBufferOutputStream(initialMemoryThreshold.toInt, allocator)
+    val chunkSize = if (initialMemoryThreshold >= Integer.MAX_VALUE) {
--- End diff --

Thanks for your feedback. Let's listen to @joshrosen's advice.
[GitHub] spark pull request #15915: [SPARK-18485][CORE] Underlying integer overflow w...
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/15915#discussion_r88453146

--- Diff: core/src/main/scala/org/apache/spark/storage/memory/MemoryStore.scala ---
@@ -331,7 +331,12 @@ private[spark] class MemoryStore(
     var unrollMemoryUsedByThisBlock = 0L
     // Underlying buffer for unrolling the block
     val redirectableStream = new RedirectableOutputStream
-    val bbos = new ChunkedByteBufferOutputStream(initialMemoryThreshold.toInt, allocator)
+    val chunkSize = if (initialMemoryThreshold >= Integer.MAX_VALUE) {
--- End diff --

Yes, this is no worse than the current behavior (an error) in this case, but is it valid? It looks like it's valid to use something smaller than this initial threshold if it's huge. Tests will help prove that out.
[GitHub] spark pull request #15915: [SPARK-18485][CORE] Underlying integer overflow w...
Github user uncleGen commented on a diff in the pull request:

https://github.com/apache/spark/pull/15915#discussion_r88429590

--- Diff: core/src/main/scala/org/apache/spark/storage/memory/MemoryStore.scala ---
@@ -331,7 +332,8 @@ private[spark] class MemoryStore(
     var unrollMemoryUsedByThisBlock = 0L
     // Underlying buffer for unrolling the block
     val redirectableStream = new RedirectableOutputStream
-    val bbos = new ChunkedByteBufferOutputStream(initialMemoryThreshold.toInt, allocator)
+    val bbos =
--- End diff --

Makes sense, thanks!
[GitHub] spark pull request #15915: [SPARK-18485][CORE] Underlying integer overflow w...
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/15915#discussion_r88426103

--- Diff: core/src/main/scala/org/apache/spark/storage/memory/MemoryStore.scala ---
@@ -331,7 +332,8 @@ private[spark] class MemoryStore(
     var unrollMemoryUsedByThisBlock = 0L
     // Underlying buffer for unrolling the block
     val redirectableStream = new RedirectableOutputStream
-    val bbos = new ChunkedByteBufferOutputStream(initialMemoryThreshold.toInt, allocator)
+    val bbos =
--- End diff --

Rather than import a whole other class for this, you probably want to explicitly check and throw an exception. But, you're basically saying that this threshold can't be >= 4GB. Should that not just be checked earlier?
[GitHub] spark pull request #15915: [SPARK-18485][CORE] Underlying integer overflow w...
GitHub user uncleGen opened a pull request:

https://github.com/apache/spark/pull/15915

[SPARK-18485][CORE] Underlying integer overflow when create ChunkedByteBufferOutputStream in MemoryStore

## What changes were proposed in this pull request?

There is an underlying integer overflow when creating a ChunkedByteBufferOutputStream in MemoryStore. This PR provides a check before the cast.

## How was this patch tested?

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/uncleGen/spark SPARK-18485

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/15915.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #15915

commit e10143b0a7beb8e0924c7ba5bf437f0848eac778
Author: uncleGen
Date: 2016-11-17T08:30:05Z

    SPARK-18485: Underlying integer overflow when create ChunkedByteBufferOutputStream in MemoryStore

commit 8cd01d1d9f291a4515ae55ed15532fb89d09a006
Author: uncleGen
Date: 2016-11-17T08:31:28Z

    update code style