[GitHub] spark pull request #23083: [SPARK-26114][CORE] ExternalSorter's readingItera...

2018-11-25 Thread szhem
Github user szhem commented on a diff in the pull request: https://github.com/apache/spark/pull/23083#discussion_r236083618 --- Diff: core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala --- @@ -727,9 +727,10 @@ private[spark] class ExternalSorter[K, V, C

[GitHub] spark issue #23083: [SPARK-26114][CORE] ExternalSorter's readingIterator fie...

2018-11-24 Thread szhem
Github user szhem commented on the issue: https://github.com/apache/spark/pull/23083 Hi @cloud-fan > Looking at the code, we are trying to fix 2 memory leaks: the task completion listener in ShuffleBlockFetcherIterator, and the CompletionIterator. If that's case, can

[GitHub] spark pull request #23083: [SPARK-26114][CORE] ExternalSorter Leak

2018-11-24 Thread szhem
Github user szhem commented on a diff in the pull request: https://github.com/apache/spark/pull/23083#discussion_r236057101 --- Diff: core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala --- @@ -727,9 +727,10 @@ private[spark] class ExternalSorter[K, V, C

[GitHub] spark pull request #23083: [SPARK-26114][CORE] ExternalSorter Leak

2018-11-22 Thread szhem
Github user szhem commented on a diff in the pull request: https://github.com/apache/spark/pull/23083#discussion_r235769204 --- Diff: core/src/main/scala/org/apache/spark/shuffle/BlockStoreShuffleReader.scala --- @@ -103,11 +116,26 @@ private[spark] class BlockStoreShuffleReader[K

[GitHub] spark pull request #23083: [SPARK-26114][CORE] ExternalSorter Leak

2018-11-22 Thread szhem
Github user szhem commented on a diff in the pull request: https://github.com/apache/spark/pull/23083#discussion_r235759169 --- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala --- @@ -72,7 +73,8 @@ final class ShuffleBlockFetcherIterator

[GitHub] spark pull request #23083: [SPARK-26114][CORE] ExternalSorter Leak

2018-11-22 Thread szhem
Github user szhem commented on a diff in the pull request: https://github.com/apache/spark/pull/23083#discussion_r235757779 --- Diff: core/src/main/scala/org/apache/spark/TaskContextImpl.scala --- @@ -99,6 +99,13 @@ private[spark] class TaskContextImpl

[GitHub] spark issue #23083: [SPARK-26114][CORE] ExternalSorter Leak

2018-11-21 Thread szhem
Github user szhem commented on the issue: https://github.com/apache/spark/pull/23083 > So do you mean CoGroupRDDs with multiple input sources will have similar problems? Yep, but a little bit different ones > If so, can you create another Jira? W

[GitHub] spark pull request #23083: [SPARK-26114][CORE] ExternalSorter Leak

2018-11-21 Thread szhem
Github user szhem commented on a diff in the pull request: https://github.com/apache/spark/pull/23083#discussion_r235319165 --- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala --- @@ -72,7 +73,8 @@ final class ShuffleBlockFetcherIterator

[GitHub] spark pull request #23083: [SPARK-26114][CORE] ExternalSorter Leak

2018-11-21 Thread szhem
Github user szhem commented on a diff in the pull request: https://github.com/apache/spark/pull/23083#discussion_r235314949 --- Diff: core/src/main/scala/org/apache/spark/TaskContextImpl.scala --- @@ -99,6 +99,13 @@ private[spark] class TaskContextImpl

[GitHub] spark pull request #23083: [SPARK-26114][CORE] ExternalSorter Leak

2018-11-21 Thread szhem
Github user szhem commented on a diff in the pull request: https://github.com/apache/spark/pull/23083#discussion_r235308451 --- Diff: core/src/main/scala/org/apache/spark/TaskContext.scala --- @@ -127,9 +127,21 @@ abstract class TaskContext extends Serializable { // Note

[GitHub] spark issue #23083: [SPARK-26114][CORE] ExternalSorter Leak

2018-11-20 Thread szhem
Github user szhem commented on the issue: https://github.com/apache/spark/pull/23083 Hi @davies, @advancedxy, @rxin, You seem to be the last ones who touched the corresponding parts of the files in this PR. Could you be so kind to take a look

[GitHub] spark pull request #23083: [SPARK-26114][CORE] ExternalSorter Leak

2018-11-19 Thread szhem
GitHub user szhem opened a pull request: https://github.com/apache/spark/pull/23083 [SPARK-26114][CORE] ExternalSorter Leak ## What changes were proposed in this pull request? This pull request fixes [SPARK-26114](https://issues.apache.org/jira/browse/SPARK-26114) issue

[GitHub] spark issue #19410: [SPARK-22184][CORE][GRAPHX] GraphX fails in case of insu...

2018-09-30 Thread szhem
Github user szhem commented on the issue: https://github.com/apache/spark/pull/19410 Hello @mallman, @sujithjay, @felixcheung, @jkbradley, @mengxr, it's already about a year passed since this pull request has been opened. I'm just wondering whether there is any chance to get any

[GitHub] spark issue #19373: [SPARK-22150][CORE] PeriodicCheckpointer fails in case o...

2018-09-30 Thread szhem
Github user szhem commented on the issue: https://github.com/apache/spark/pull/19373 Hello @sujithjay, @felixcheung, @jkbradley, @mengxr, it's already more than a year passed since this pull request has been opened. I'm just wondering whether there is any chance for this PR

[GitHub] spark issue #19410: [SPARK-22184][CORE][GRAPHX] GraphX fails in case of insu...

2018-09-03 Thread szhem
Github user szhem commented on the issue: https://github.com/apache/spark/pull/19410 I've tested the mentioned checkpointers with `spark.cleaner.referenceTracking.cleanCheckpoints` set to `true` and without explicit checkpoint files removal. It seems that there are somewhere

[GitHub] spark issue #19410: [SPARK-22184][CORE][GRAPHX] GraphX fails in case of insu...

2018-09-01 Thread szhem
Github user szhem commented on the issue: https://github.com/apache/spark/pull/19410 Hi @asolimando, I believe that the solution with weak references will work and probably with `ContextCleaner` too, but there are some points I'd like to discuss if you don't mind

[GitHub] spark issue #19410: [SPARK-22184][CORE][GRAPHX] GraphX fails in case of insu...

2018-07-09 Thread szhem
Github user szhem commented on the issue: https://github.com/apache/spark/pull/19410 Hi @mallman, I believe, that `ContextCleaner` currently does not delete checkpoint data it case of unexpected failures. Also as it works at the end of the job then there is still a chance

[GitHub] spark issue #19410: [SPARK-22184][CORE][GRAPHX] GraphX fails in case of insu...

2018-07-03 Thread szhem
Github user szhem commented on the issue: https://github.com/apache/spark/pull/19410 @mallman Just my two cents regarding built-in solutions: Periodic checkpointer deletes checkpoint files not to pollute the hard drive. Although disk storage is cheap it's not free

[GitHub] spark issue #19410: [SPARK-22184][CORE][GRAPHX] GraphX fails in case of insu...

2018-06-27 Thread szhem
Github user szhem commented on the issue: https://github.com/apache/spark/pull/19410 Hi @mallman! In case of ``` StorageLevel.MEMORY_AND_DISK StorageLevel.MEMORY_AND_DISK_SER_2 ``` ... tests pass. They still fail in case

[GitHub] spark issue #19410: [SPARK-22184][CORE][GRAPHX] GraphX fails in case of insu...

2018-06-25 Thread szhem
Github user szhem commented on the issue: https://github.com/apache/spark/pull/19410 Just a kind remainder... --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark issue #19373: [SPARK-22150][CORE] PeriodicCheckpointer fails in case o...

2018-06-25 Thread szhem
Github user szhem commented on the issue: https://github.com/apache/spark/pull/19373 Just a kind remainder... --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark issue #19373: [SPARK-22150][CORE] PeriodicCheckpointer fails in case o...

2018-04-02 Thread szhem
Github user szhem commented on the issue: https://github.com/apache/spark/pull/19373 > so the fix might be to change to call checkpoint() to checkpoint(eager: true) - this ensures by the time checkpoint call is returned the checkpointing is completed. Even if checkpo

[GitHub] spark issue #19410: [SPARK-22184][CORE][GRAPHX] GraphX fails in case of insu...

2018-03-29 Thread szhem
Github user szhem commented on the issue: https://github.com/apache/spark/pull/19410 Hello @viirya, @mallman, @felixcheung, You were reviewing graph checkpointing, introduced here #15125, and this PR changes the behaviour a little bit. Could you please review this PR

[GitHub] spark issue #19373: [SPARK-22150][CORE] PeriodicCheckpointer fails in case o...

2018-03-29 Thread szhem
Github user szhem commented on the issue: https://github.com/apache/spark/pull/19373 @felixcheung, Unfortunately, RDDs, `PeriodicRDDCheckpointer` is based on, do not have `checkpoint(eager: true)` yet. It's a functionality of DataSets. I've experimented

[GitHub] spark issue #19373: [SPARK-22150][CORE] PeriodicCheckpointer fails in case o...

2018-03-27 Thread szhem
Github user szhem commented on the issue: https://github.com/apache/spark/pull/19373 BTW, how do you think guys, may be it would be better to merge changes from #19410 into this one? The #19410 is almost about the same issue and fixes the described behaviour for GraphX

[GitHub] spark pull request #19373: [SPARK-22150][CORE] PeriodicCheckpointer fails in...

2018-03-27 Thread szhem
Github user szhem commented on a diff in the pull request: https://github.com/apache/spark/pull/19373#discussion_r177484476 --- Diff: core/src/main/scala/org/apache/spark/rdd/util/PeriodicRDDCheckpointer.scala --- @@ -73,8 +76,6 @@ import org.apache.spark.util.PeriodicCheckpointer

[GitHub] spark issue #19373: [SPARK-22150][CORE] PeriodicCheckpointer fails in case o...

2018-03-27 Thread szhem
Github user szhem commented on the issue: https://github.com/apache/spark/pull/19373 @felixcheung > It is deleting earlier checkpoint after the current checkpoint is called though? Currently PeriodicCheckpointer can fail in case of checkpointing RDDs which dep

[GitHub] spark issue #19410: [SPARK-22184][CORE][GRAPHX] GraphX fails in case of insu...

2017-10-16 Thread szhem
Github user szhem commented on the issue: https://github.com/apache/spark/pull/19410 I would happy if anyone can take a look at this PR. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

[GitHub] spark issue #19373: [SPARK-22150][CORE] PeriodicCheckpointer fails in case o...

2017-10-16 Thread szhem
Github user szhem commented on the issue: https://github.com/apache/spark/pull/19373 I would happy if anyone can take a look at this PR. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

[GitHub] spark pull request #19294: [SPARK-21549][CORE] Respect OutputFormats with no...

2017-10-06 Thread szhem
Github user szhem commented on a diff in the pull request: https://github.com/apache/spark/pull/19294#discussion_r143306215 --- Diff: core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala --- @@ -57,6 +60,15 @@ class HadoopMapReduceCommitProtocol

[GitHub] spark pull request #19294: [SPARK-21549][CORE] Respect OutputFormats with no...

2017-10-06 Thread szhem
Github user szhem commented on a diff in the pull request: https://github.com/apache/spark/pull/19294#discussion_r143305853 --- Diff: core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala --- @@ -57,6 +60,15 @@ class HadoopMapReduceCommitProtocol

[GitHub] spark issue #19294: [SPARK-21549][CORE] Respect OutputFormats with no output...

2017-10-06 Thread szhem
Github user szhem commented on the issue: https://github.com/apache/spark/pull/19294 @mridulm sql-related tests were removed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark pull request #19294: [SPARK-21549][CORE] Respect OutputFormats with no...

2017-10-03 Thread szhem
Github user szhem commented on a diff in the pull request: https://github.com/apache/spark/pull/19294#discussion_r142319273 --- Diff: core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala --- @@ -57,6 +57,15 @@ class HadoopMapReduceCommitProtocol

[GitHub] spark issue #19294: [SPARK-21549][CORE] Respect OutputFormats with no output...

2017-10-02 Thread szhem
Github user szhem commented on the issue: https://github.com/apache/spark/pull/19294 @gatorsmile I believe that in Spark SQL code path `path` cannot be null, because in that case `FileFormatWriter` [fails even before](https://github.com/apache/spark/blob

[GitHub] spark pull request #19410: [SPARK-22184][CORE][GRAPHX] GraphX fails in case ...

2017-10-02 Thread szhem
GitHub user szhem opened a pull request: https://github.com/apache/spark/pull/19410 [SPARK-22184][CORE][GRAPHX] GraphX fails in case of insufficient memory and checkpoints enabled ## What changes were proposed in this pull request? Fix for [SPARK-22184](https

[GitHub] spark issue #19294: [SPARK-21549][CORE] Respect OutputFormats with no output...

2017-09-28 Thread szhem
Github user szhem commented on the issue: https://github.com/apache/spark/pull/19294 @mridulm Regarding FileFormatWriter I've implemented some basic tests which show that 1. [FileFormatWriter fails](https://github.com/apache/spark/blob

[GitHub] spark pull request #19373: [SPARK-22150][CORE] PeriodicCheckpointer fails in...

2017-09-27 Thread szhem
GitHub user szhem opened a pull request: https://github.com/apache/spark/pull/19373 [SPARK-22150][CORE] PeriodicCheckpointer fails in case of dependant RDDs ## What changes were proposed in this pull request? Fix for [SPARK-22150](https://issues.apache.org/jira/browse/SPARK

[GitHub] spark issue #19294: [SPARK-21549][CORE] Respect OutputFormats with no output...

2017-09-26 Thread szhem
Github user szhem commented on the issue: https://github.com/apache/spark/pull/19294 Hello guys, are there a change for this patch to be merged to master? --- - To unsubscribe, e-mail: reviews-unsubscr

[GitHub] spark issue #19294: [SPARK-21549][CORE] Respect OutputFormats with no output...

2017-09-24 Thread szhem
Github user szhem commented on the issue: https://github.com/apache/spark/pull/19294 @mridulm Updated `FileFormatWriterSuite` [to cover](https://github.com/apache/spark/pull/19294/files#diff-bc98a3d91cf4f95f4f473146400044aa) both branches of the [committer calling](https

[GitHub] spark pull request #19294: [SPARK-21549][CORE] Respect OutputFormats with no...

2017-09-24 Thread szhem
Github user szhem commented on a diff in the pull request: https://github.com/apache/spark/pull/19294#discussion_r140654204 --- Diff: core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala --- @@ -57,6 +57,11 @@ class HadoopMapReduceCommitProtocol

[GitHub] spark pull request #19294: [SPARK-21549][CORE] Respect OutputFormats with no...

2017-09-24 Thread szhem
Github user szhem commented on a diff in the pull request: https://github.com/apache/spark/pull/19294#discussion_r140652214 --- Diff: core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala --- @@ -130,17 +135,21 @@ class HadoopMapReduceCommitProtocol

[GitHub] spark issue #19294: [SPARK-21549][CORE] Respect OutputFormats with no output...

2017-09-23 Thread szhem
Github user szhem commented on the issue: https://github.com/apache/spark/pull/19294 @mridulm > incorporating a test for the sql part will also help in this matter. What should be the expected behaviour in case of sql? I'm asking because [the sql part seems to f

[GitHub] spark pull request #19294: [SPARK-21549][CORE] Respect OutputFormats with no...

2017-09-21 Thread szhem
Github user szhem commented on a diff in the pull request: https://github.com/apache/spark/pull/19294#discussion_r140165731 --- Diff: core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala --- @@ -130,17 +130,21 @@ class HadoopMapReduceCommitProtocol

[GitHub] spark pull request #19294: [SPARK-21549][CORE] Respect OutputFormats with no...

2017-09-20 Thread szhem
Github user szhem commented on a diff in the pull request: https://github.com/apache/spark/pull/19294#discussion_r140076614 --- Diff: core/src/test/scala/org/apache/spark/rdd/PairRDDFunctionsSuite.scala --- @@ -568,6 +568,51 @@ class PairRDDFunctionsSuite extends SparkFunSuite

[GitHub] spark pull request #19294: [SPARK-21549][CORE] Respect OutputFormats with no...

2017-09-20 Thread szhem
Github user szhem commented on a diff in the pull request: https://github.com/apache/spark/pull/19294#discussion_r140076564 --- Diff: core/src/test/scala/org/apache/spark/rdd/PairRDDFunctionsSuite.scala --- @@ -568,6 +568,51 @@ class PairRDDFunctionsSuite extends SparkFunSuite

[GitHub] spark pull request #19294: [SPARK-21549][CORE] Respect OutputFormats with no...

2017-09-20 Thread szhem
GitHub user szhem opened a pull request: https://github.com/apache/spark/pull/19294 [SPARK-21549][CORE] Respect OutputFormats with no output directory provided ## What changes were proposed in this pull request? Fix for https://issues.apache.org/jira/browse/SPARK-21549 JIRA

[GitHub] spark pull request #17740: [SPARK-SPARK-20404][CORE] Using Option(name) inst...

2017-04-24 Thread szhem
GitHub user szhem opened a pull request: https://github.com/apache/spark/pull/17740 [SPARK-SPARK-20404][CORE] Using Option(name) instead of Some(name) Using Option(name) instead of Some(name) to prevent runtime failures when using accumulators created like the following