[GitHub] spark pull request #23225: [SPARK-26287][CORE]Don't need to create an empty ...
Github user wangjiaochun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23225#discussion_r239990587

    --- Diff: core/src/main/java/org/apache/spark/shuffle/sort/ShuffleExternalSorter.java ---
    @@ -161,6 +161,10 @@ private void writeSortedFile(boolean isLastFile) {
         final ShuffleInMemorySorter.ShuffleSorterIterator sortedRecords =
           inMemSorter.getSortedIterator();
    +    // If there are no sorted records, we don't need to create an empty spill file.
    +    if (!sortedRecords.hasNext()) {
    +      return;
    +    }
    --- End diff --

    Okay, I will make the changes.

---

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
Github user wangjiaochun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23225#discussion_r239990083

    --- Diff: core/src/test/java/org/apache/spark/shuffle/sort/UnsafeShuffleWriterSuite.java ---
    @@ -235,6 +235,7 @@ public void writeEmptyIterator() throws Exception {
         final Option<MapStatus> mapStatus = writer.stop(true);
         assertTrue(mapStatus.isDefined());
         assertTrue(mergedOutputFile.exists());
    +    assertEquals(0, spillFilesCreated.size());
    --- End diff --

    I think it's unnecessary to add a new test case, and we can delete lines 239~242 of the writeEmptyIterator test because they always hold.

---
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/23225

---
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23225#discussion_r240014126

    --- Diff: core/src/test/java/org/apache/spark/shuffle/sort/UnsafeShuffleWriterSuite.java ---
    @@ -235,11 +235,8 @@ public void writeEmptyIterator() throws Exception {
         final Option<MapStatus> mapStatus = writer.stop(true);
         assertTrue(mapStatus.isDefined());
         assertTrue(mergedOutputFile.exists());
    +    assertEquals(0, spillFilesCreated.size());
         assertArrayEquals(new long[NUM_PARTITITONS], partitionSizesInMergedFile);
    -    assertEquals(0, taskMetrics.shuffleWriteMetrics().recordsWritten());
    -    assertEquals(0, taskMetrics.shuffleWriteMetrics().bytesWritten());
    -    assertEquals(0, taskMetrics.diskBytesSpilled());
    -    assertEquals(0, taskMetrics.memoryBytesSpilled());
    --- End diff --

    We need to keep this test coverage to guarantee that the task metrics remain in an untouched state. Please revert this removal.

---
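The coverage point above can be illustrated with a small standalone toy (all names here are stand-ins for illustration, not Spark's actual API): the spill-file count and the write metrics are tracked separately, so asserting that no spill file was created does not by itself prove the metrics were never touched.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

// Toy model of the reviewed behavior (stand-in names, not Spark's API):
// for an empty iterator, the early return leaves BOTH the spill-file list
// and the metrics counters at zero, so each family of assertions carries
// independent coverage.
public class EmptyWriteCoverage {

    static final List<String> spillFilesCreated = new ArrayList<>();
    static long recordsWritten = 0;
    static long bytesWritten = 0;

    static void writeSortedFile(Iterator<String> sortedRecords) {
        if (!sortedRecords.hasNext()) {
            return; // the early return under review
        }
        spillFilesCreated.add("spill-" + spillFilesCreated.size());
        while (sortedRecords.hasNext()) {
            String rec = sortedRecords.next();
            recordsWritten++;             // metrics are updated independently
            bytesWritten += rec.length(); // of the spill-file bookkeeping
        }
    }

    public static void main(String[] args) {
        writeSortedFile(Collections.emptyIterator());
        // No spill file was created AND the metrics were never touched.
        System.out.println(spillFilesCreated.size() + " " + recordsWritten + " " + bytesWritten);
    }
}
```

Running the toy prints `0 0 0`, mirroring the combination of assertions the reviewer asks to keep.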
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23225#discussion_r239990036

    --- Diff: core/src/main/java/org/apache/spark/shuffle/sort/ShuffleExternalSorter.java ---
    @@ -161,6 +161,10 @@ private void writeSortedFile(boolean isLastFile) {
         final ShuffleInMemorySorter.ShuffleSorterIterator sortedRecords =
           inMemSorter.getSortedIterator();
    +    // If there are no sorted records, we don't need to create an empty spill file.
    +    if (!sortedRecords.hasNext()) {
    +      return;
    +    }
    --- End diff --

    If you're going to short-circuit, why not do it at the start of the function and save the rest of the work done above?

---
Github user wangjiaochun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23225#discussion_r239989805

    --- Diff: core/src/main/java/org/apache/spark/shuffle/sort/ShuffleExternalSorter.java ---
    @@ -161,6 +161,10 @@ private void writeSortedFile(boolean isLastFile) {
         final ShuffleInMemorySorter.ShuffleSorterIterator sortedRecords =
           inMemSorter.getSortedIterator();
    +    // If there are no sorted records, we don't need to create an empty spill file.
    +    if (!sortedRecords.hasNext()) {
    +      return;
    +    }
    --- End diff --

    I think it's better not to do that, because it would change the original code style and doesn't make an appreciable difference in readability.

---
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23225#discussion_r239872296

    --- Diff: core/src/test/java/org/apache/spark/shuffle/sort/UnsafeShuffleWriterSuite.java ---
    @@ -235,6 +235,7 @@ public void writeEmptyIterator() throws Exception {
         final Option<MapStatus> mapStatus = writer.stop(true);
         assertTrue(mapStatus.isDefined());
         assertTrue(mergedOutputFile.exists());
    +    assertEquals(0, spillFilesCreated.size());
    --- End diff --

    BTW, since we are touching `ShuffleExternalSorter.java`, it seems that we need to add a test case to `ShuffleExternalSorterSuite`, too.

---
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23225#discussion_r239860885

    --- Diff: core/src/main/java/org/apache/spark/shuffle/sort/ShuffleExternalSorter.java ---
    @@ -161,6 +161,10 @@ private void writeSortedFile(boolean isLastFile) {
         final ShuffleInMemorySorter.ShuffleSorterIterator sortedRecords =
           inMemSorter.getSortedIterator();
    +    // If there are no sorted records, we don't need to create an empty spill file.
    +    if (!sortedRecords.hasNext()) {
    +      return;
    +    }
    --- End diff --

    @wangjiaochun . Shall we move lines 160 ~ 167 to the beginning of this function (line 147)?

---
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23225#discussion_r239859088

    --- Diff: core/src/main/java/org/apache/spark/shuffle/sort/ShuffleExternalSorter.java ---
    @@ -161,6 +161,10 @@ private void writeSortedFile(boolean isLastFile) {
         final ShuffleInMemorySorter.ShuffleSorterIterator sortedRecords =
           inMemSorter.getSortedIterator();
    +    // If there are no sorted records, we don't need to create an empty spill file.
    +    if (!sortedRecords.hasNext()) {
    +      return;
    +    }
    --- End diff --

    @wangjiaochun . Can we move lines 160 ~ 167 to line 147? If so, we don't need to do `new ShuffleWriteMetrics()` at all.

---
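The restructuring the reviewers suggest can be sketched as a standalone toy (all names are stand-ins for illustration, not Spark's actual API): putting the empty check before any per-spill setup means an empty sorter costs nothing at all, neither a metrics object nor a temp spill file.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;

// Toy sketch of the early-return suggestion (stand-in names, not Spark's
// API): guard on hasNext() *before* allocating per-spill state, so nothing
// is created for an empty input.
public class EmptySpillGuard {

    static int spillFilesCreated = 0;

    static void writeSortedFile(Iterator<Integer> sortedRecords) {
        // Early return first, as suggested: skip all per-spill setup
        // (the stand-in for the metrics object and the temp spill file)
        // when there is nothing to write.
        if (!sortedRecords.hasNext()) {
            return;
        }
        spillFilesCreated++;          // stand-in for creating the spill file
        while (sortedRecords.hasNext()) {
            sortedRecords.next();     // stand-in for serializing a record
        }
    }

    public static void main(String[] args) {
        writeSortedFile(Collections.emptyIterator());
        System.out.println("spill files after empty input: " + spillFilesCreated);
        writeSortedFile(Arrays.asList(1, 2, 3).iterator());
        System.out.println("spill files after non-empty input: " + spillFilesCreated);
    }
}
```

The toy prints a count of 0 after the empty input and 1 after the non-empty input, which is exactly the behavior the guard is meant to enforce.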
Github user wangjiaochun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23225#discussion_r239711919

    --- Diff: core/src/test/java/org/apache/spark/shuffle/sort/UnsafeShuffleWriterSuite.java ---
    @@ -562,4 +562,18 @@ public void testPeakMemoryUsed() throws Exception {
         }
       }

    +  @Test
    +  public void writeEmptyIteratorNotCreateEmptySpillFile() throws Exception {
    +    final UnsafeShuffleWriter<Object, Object> writer = createWriter(true);
    +    writer.write(Iterators.emptyIterator());
    +    final Option<MapStatus> mapStatus = writer.stop(true);
    +    assertTrue(mapStatus.isDefined());
    +    assertTrue(mergedOutputFile.exists());
    +    assertEquals(0, spillFilesCreated.size());
    --- End diff --

    That's it.

---
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23225#discussion_r239707130

    --- Diff: core/src/test/java/org/apache/spark/shuffle/sort/UnsafeShuffleWriterSuite.java ---
    @@ -562,4 +562,18 @@ public void testPeakMemoryUsed() throws Exception {
         }
       }

    +  @Test
    +  public void writeEmptyIteratorNotCreateEmptySpillFile() throws Exception {
    +    final UnsafeShuffleWriter<Object, Object> writer = createWriter(true);
    +    writer.write(Iterators.emptyIterator());
    +    final Option<MapStatus> mapStatus = writer.stop(true);
    +    assertTrue(mapStatus.isDefined());
    +    assertTrue(mergedOutputFile.exists());
    +    assertEquals(0, spillFilesCreated.size());
    --- End diff --

    Ya. That's the same as what I observed. There is no need to keep both test cases, because there is no code path like that.

---
Github user wangjiaochun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23225#discussion_r239704796

    --- Diff: core/src/test/java/org/apache/spark/shuffle/sort/UnsafeShuffleWriterSuite.java ---
    @@ -562,4 +562,18 @@ public void testPeakMemoryUsed() throws Exception {
         }
       }

    +  @Test
    +  public void writeEmptyIteratorNotCreateEmptySpillFile() throws Exception {
    +    final UnsafeShuffleWriter<Object, Object> writer = createWriter(true);
    +    writer.write(Iterators.emptyIterator());
    +    final Option<MapStatus> mapStatus = writer.stop(true);
    +    assertTrue(mapStatus.isDefined());
    +    assertTrue(mergedOutputFile.exists());
    +    assertEquals(0, spillFilesCreated.size());
    --- End diff --

    I mean that before adding the code `if (!sortedRecords.hasNext()) { return; }` it would fail. Now adding `assertEquals(0, spillFilesCreated.size())` to writeEmptyIterator seems good.

---
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23225#discussion_r239702595

    --- Diff: core/src/test/java/org/apache/spark/shuffle/sort/UnsafeShuffleWriterSuite.java ---
    @@ -562,4 +562,18 @@ public void testPeakMemoryUsed() throws Exception {
         }
       }

    +  @Test
    +  public void writeEmptyIteratorNotCreateEmptySpillFile() throws Exception {
    +    final UnsafeShuffleWriter<Object, Object> writer = createWriter(true);
    +    writer.write(Iterators.emptyIterator());
    +    final Option<MapStatus> mapStatus = writer.stop(true);
    +    assertTrue(mapStatus.isDefined());
    +    assertTrue(mergedOutputFile.exists());
    +    assertEquals(0, spillFilesCreated.size());
    --- End diff --

    Do you mean that `writeEmptyIterator` will fail if we add this line `assertEquals(0, spillFilesCreated.size());` there?

---
Github user wangjiaochun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23225#discussion_r239700999

    --- Diff: core/src/test/java/org/apache/spark/shuffle/sort/UnsafeShuffleWriterSuite.java ---
    @@ -562,4 +562,18 @@ public void testPeakMemoryUsed() throws Exception {
         }
       }

    +  @Test
    +  public void writeEmptyIteratorNotCreateEmptySpillFile() throws Exception {
    +    final UnsafeShuffleWriter<Object, Object> writer = createWriter(true);
    +    writer.write(Iterators.emptyIterator());
    +    final Option<MapStatus> mapStatus = writer.stop(true);
    +    assertTrue(mapStatus.isDefined());
    +    assertTrue(mergedOutputFile.exists());
    +    assertEquals(0, spillFilesCreated.size());
    --- End diff --

    The writeEmptyIterator() test creates a spill file even though it writes an empty iterator, but the writeEmptyIteratorNotCreateEmptySpillFile test does not create a spill file when there are no records in memory.

---
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23225#discussion_r239697971

    --- Diff: core/src/test/java/org/apache/spark/shuffle/sort/UnsafeShuffleWriterSuite.java ---
    @@ -562,4 +562,18 @@ public void testPeakMemoryUsed() throws Exception {
         }
       }

    +  @Test
    +  public void writeEmptyIteratorNotCreateEmptySpillFile() throws Exception {
    +    final UnsafeShuffleWriter<Object, Object> writer = createWriter(true);
    +    writer.write(Iterators.emptyIterator());
    +    final Option<MapStatus> mapStatus = writer.stop(true);
    +    assertTrue(mapStatus.isDefined());
    +    assertTrue(mergedOutputFile.exists());
    +    assertEquals(0, spillFilesCreated.size());
    --- End diff --

    It seems that we can move this line to `writeEmptyIterator()` instead of making a new test case.

---
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23225#discussion_r239697860

    --- Diff: core/src/test/java/org/apache/spark/shuffle/sort/UnsafeShuffleWriterSuite.java ---
    @@ -562,4 +562,18 @@ public void testPeakMemoryUsed() throws Exception {
         }
       }

    +  @Test
    +  public void writeEmptyIteratorNotCreateEmptySpillFile() throws Exception {
    --- End diff --

    This looks like mostly a duplication of `writeEmptyIterator`. Logically, do we need to keep both `writeEmptyIterator` and `writeEmptyIteratorNotCreateEmptySpillFile` separately?

---