Github user szhem commented on a diff in the pull request:
https://github.com/apache/spark/pull/23083#discussion_r236083618
--- Diff:
core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala ---
@@ -727,9 +727,10 @@ private[spark] class ExternalSorter[K, V, C
Github user szhem commented on the issue:
https://github.com/apache/spark/pull/23083
Hi @cloud-fan
> Looking at the code, we are trying to fix 2 memory leaks: the task
completion listener in ShuffleBlockFetcherIterator, and the CompletionIterator.
If that's the case, can
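For readers following along, here is a minimal sketch of the completion-iterator pattern the quoted comment refers to; it is not Spark's actual `CompletionIterator` or `ShuffleBlockFetcherIterator` code, just an illustration of how wrapping an iterator with a one-shot completion callback lets large buffers be released as soon as the data is consumed instead of being retained until the task ends:
```scala
// Illustrative sketch only (names are hypothetical, not Spark's private classes):
// run a cleanup callback exactly once when the wrapped iterator is exhausted.
class CleanupIterator[A](underlying: Iterator[A], onComplete: () => Unit) extends Iterator[A] {
  private var completed = false

  override def hasNext: Boolean = {
    val more = underlying.hasNext
    if (!more && !completed) {
      completed = true
      onComplete() // e.g. release shuffle buffers held by the fetcher
    }
    more
  }

  override def next(): A = underlying.next()
}

// Hypothetical usage: wrap the raw record iterator so cleanup does not wait for task end.
// val records = new CleanupIterator(rawRecords, () => fetcher.cleanup())
```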
Github user szhem commented on a diff in the pull request:
https://github.com/apache/spark/pull/23083#discussion_r236057101
--- Diff:
core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala ---
@@ -727,9 +727,10 @@ private[spark] class ExternalSorter[K, V, C
Github user szhem commented on a diff in the pull request:
https://github.com/apache/spark/pull/23083#discussion_r235769204
--- Diff:
core/src/main/scala/org/apache/spark/shuffle/BlockStoreShuffleReader.scala ---
@@ -103,11 +116,26 @@ private[spark] class BlockStoreShuffleReader[K
Github user szhem commented on a diff in the pull request:
https://github.com/apache/spark/pull/23083#discussion_r235759169
--- Diff:
core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala
---
@@ -72,7 +73,8 @@ final class ShuffleBlockFetcherIterator
Github user szhem commented on a diff in the pull request:
https://github.com/apache/spark/pull/23083#discussion_r235757779
--- Diff: core/src/main/scala/org/apache/spark/TaskContextImpl.scala ---
@@ -99,6 +99,13 @@ private[spark] class TaskContextImpl
Github user szhem commented on the issue:
https://github.com/apache/spark/pull/23083
> So do you mean CoGroupRDDs with multiple input sources will have similar
problems?
Yep, but a little bit different ones
> If so, can you create another Jira?
W
Github user szhem commented on a diff in the pull request:
https://github.com/apache/spark/pull/23083#discussion_r235319165
--- Diff:
core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala
---
@@ -72,7 +73,8 @@ final class ShuffleBlockFetcherIterator
Github user szhem commented on a diff in the pull request:
https://github.com/apache/spark/pull/23083#discussion_r235314949
--- Diff: core/src/main/scala/org/apache/spark/TaskContextImpl.scala ---
@@ -99,6 +99,13 @@ private[spark] class TaskContextImpl
Github user szhem commented on a diff in the pull request:
https://github.com/apache/spark/pull/23083#discussion_r235308451
--- Diff: core/src/main/scala/org/apache/spark/TaskContext.scala ---
@@ -127,9 +127,21 @@ abstract class TaskContext extends Serializable {
// Note
Github user szhem commented on the issue:
https://github.com/apache/spark/pull/23083
Hi @davies, @advancedxy, @rxin,
You seem to be the last ones who touched the corresponding parts of the
files in this PR.
Could you be so kind as to take a look
GitHub user szhem opened a pull request:
https://github.com/apache/spark/pull/23083
[SPARK-26114][CORE] ExternalSorter Leak
## What changes were proposed in this pull request?
This pull request fixes the
[SPARK-26114](https://issues.apache.org/jira/browse/SPARK-26114) issue
Github user szhem commented on the issue:
https://github.com/apache/spark/pull/19410
Hello @mallman, @sujithjay, @felixcheung, @jkbradley, @mengxr, it's been about
a year since this pull request was opened.
I'm just wondering whether there is any chance to get any
Github user szhem commented on the issue:
https://github.com/apache/spark/pull/19373
Hello @sujithjay, @felixcheung, @jkbradley, @mengxr, it's been more than
a year since this pull request was opened.
I'm just wondering whether there is any chance for this PR
Github user szhem commented on the issue:
https://github.com/apache/spark/pull/19410
I've tested the mentioned checkpointers with
`spark.cleaner.referenceTracking.cleanCheckpoints` set to `true` and without
explicit checkpoint files removal.
It seems that there are somewhere
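For reference, a minimal sketch of how that setting is enabled in a driver program (illustrative only, not the test code from this PR; the app name and checkpoint directory are hypothetical):
```scala
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("checkpoint-cleanup")
  .setMaster("local[2]")
  // ask the ContextCleaner to delete checkpoint files once the RDD is garbage-collected
  .set("spark.cleaner.referenceTracking.cleanCheckpoints", "true")

val sc = new SparkContext(conf)
sc.setCheckpointDir("/tmp/checkpoints") // hypothetical directory

val rdd = sc.parallelize(1 to 1000).map(_ * 2)
rdd.checkpoint()
rdd.count() // RDD checkpointing is lazy; an action is needed to materialize it
```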
Github user szhem commented on the issue:
https://github.com/apache/spark/pull/19410
Hi @asolimando,
I believe that the solution with weak references will work and probably
with `ContextCleaner` too, but there are some points I'd like to discuss if you
don't mind
Github user szhem commented on the issue:
https://github.com/apache/spark/pull/19410
Hi @mallman,
I believe that `ContextCleaner` currently does not delete checkpoint data
in case of unexpected failures.
Also, as it works at the end of the job, there is still a chance
Github user szhem commented on the issue:
https://github.com/apache/spark/pull/19410
@mallman
Just my two cents regarding built-in solutions:
The periodic checkpointer deletes checkpoint files so as not to pollute the hard
drive. Although disk storage is cheap, it's not free
Github user szhem commented on the issue:
https://github.com/apache/spark/pull/19410
Hi @mallman!
In case of
```
StorageLevel.MEMORY_AND_DISK
StorageLevel.MEMORY_AND_DISK_SER_2
```
... tests pass.
They still fail in case
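For context, a sketch of the storage levels in question (illustrative only, not the failing GraphX tests themselves): `MEMORY_AND_DISK` spills partitions that don't fit in memory to disk, while memory-only levels drop them and recompute from lineage, which is where insufficient memory starts to bite.
```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

val sc = new SparkContext(new SparkConf().setAppName("storage-levels").setMaster("local[2]"))

// Partitions that don't fit in memory are spilled to disk instead of being dropped.
val spillable = sc.parallelize(1 to 1000000).persist(StorageLevel.MEMORY_AND_DISK)

// Partitions evicted under memory pressure must be recomputed from lineage.
val memoryOnly = sc.parallelize(1 to 1000000).persist(StorageLevel.MEMORY_ONLY)
```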
Github user szhem commented on the issue:
https://github.com/apache/spark/pull/19410
Just a kind reminder...
Github user szhem commented on the issue:
https://github.com/apache/spark/pull/19373
Just a kind reminder...
Github user szhem commented on the issue:
https://github.com/apache/spark/pull/19373
> so the fix might be to change the call from checkpoint() to checkpoint(eager:
true) - this ensures that by the time the checkpoint call returns, the checkpointing
is completed.
Even if checkpo
Github user szhem commented on the issue:
https://github.com/apache/spark/pull/19410
Hello @viirya, @mallman, @felixcheung,
You were reviewing the graph checkpointing introduced in #15125, and this PR
changes the behaviour a little bit.
Could you please review this PR
Github user szhem commented on the issue:
https://github.com/apache/spark/pull/19373
@felixcheung,
Unfortunately, the RDDs that `PeriodicRDDCheckpointer` is based on do not have
`checkpoint(eager: true)` yet.
It's a functionality of Datasets.
I've experimented
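A minimal sketch of the difference being discussed, assuming a local SparkSession (illustrative, not the code from this PR): `Dataset.checkpoint(eager = true)` returns only after the checkpoint has been written, while `RDD.checkpoint()` is lazy and needs a later action.
```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("eager-vs-lazy-checkpoint")
  .master("local[2]")
  .getOrCreate()
spark.sparkContext.setCheckpointDir("/tmp/checkpoints") // hypothetical directory

// Dataset: eager checkpoint -- data is materialized before this call returns.
val ds = spark.range(1000).checkpoint(eager = true)

// RDD: lazy checkpoint -- nothing is written until an action runs.
val rdd = spark.sparkContext.parallelize(1 to 1000)
rdd.checkpoint()
rdd.count()
```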
Github user szhem commented on the issue:
https://github.com/apache/spark/pull/19373
BTW, what do you think, guys: maybe it would be better to merge the changes from
#19410 into this one?
#19410 is about almost the same issue and fixes the described behaviour for GraphX
Github user szhem commented on a diff in the pull request:
https://github.com/apache/spark/pull/19373#discussion_r177484476
--- Diff:
core/src/main/scala/org/apache/spark/rdd/util/PeriodicRDDCheckpointer.scala ---
@@ -73,8 +76,6 @@ import org.apache.spark.util.PeriodicCheckpointer
Github user szhem commented on the issue:
https://github.com/apache/spark/pull/19373
@felixcheung
> It is deleting earlier checkpoint after the current checkpoint is called
though?
Currently PeriodicCheckpointer can fail in case of checkpointing RDDs which
dep
Github user szhem commented on the issue:
https://github.com/apache/spark/pull/19410
I would be happy if anyone could take a look at this PR.
Github user szhem commented on the issue:
https://github.com/apache/spark/pull/19373
I would be happy if anyone could take a look at this PR.
Github user szhem commented on a diff in the pull request:
https://github.com/apache/spark/pull/19294#discussion_r143306215
--- Diff:
core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala
---
@@ -57,6 +60,15 @@ class HadoopMapReduceCommitProtocol
Github user szhem commented on a diff in the pull request:
https://github.com/apache/spark/pull/19294#discussion_r143305853
--- Diff:
core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala
---
@@ -57,6 +60,15 @@ class HadoopMapReduceCommitProtocol
Github user szhem commented on the issue:
https://github.com/apache/spark/pull/19294
@mridulm SQL-related tests were removed.
Github user szhem commented on a diff in the pull request:
https://github.com/apache/spark/pull/19294#discussion_r142319273
--- Diff:
core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala
---
@@ -57,6 +57,15 @@ class HadoopMapReduceCommitProtocol
Github user szhem commented on the issue:
https://github.com/apache/spark/pull/19294
@gatorsmile I believe that in the Spark SQL code path `path` cannot be null,
because in that case `FileFormatWriter` [fails even
before](https://github.com/apache/spark/blob
GitHub user szhem opened a pull request:
https://github.com/apache/spark/pull/19410
[SPARK-22184][CORE][GRAPHX] GraphX fails in case of insufficient memory and
checkpoints enabled
## What changes were proposed in this pull request?
Fix for [SPARK-22184](https
Github user szhem commented on the issue:
https://github.com/apache/spark/pull/19294
@mridulm Regarding `FileFormatWriter`, I've implemented some basic tests which
show that
1. [FileFormatWriter
fails](https://github.com/apache/spark/blob
GitHub user szhem opened a pull request:
https://github.com/apache/spark/pull/19373
[SPARK-22150][CORE] PeriodicCheckpointer fails in case of dependent RDDs
## What changes were proposed in this pull request?
Fix for [SPARK-22150](https://issues.apache.org/jira/browse/SPARK
Github user szhem commented on the issue:
https://github.com/apache/spark/pull/19294
Hello guys, is there a chance for this patch to be merged to master?
Github user szhem commented on the issue:
https://github.com/apache/spark/pull/19294
@mridulm Updated `FileFormatWriterSuite` [to
cover](https://github.com/apache/spark/pull/19294/files#diff-bc98a3d91cf4f95f4f473146400044aa)
both branches of the [committer
calling](https
Github user szhem commented on a diff in the pull request:
https://github.com/apache/spark/pull/19294#discussion_r140654204
--- Diff:
core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala
---
@@ -57,6 +57,11 @@ class HadoopMapReduceCommitProtocol
Github user szhem commented on a diff in the pull request:
https://github.com/apache/spark/pull/19294#discussion_r140652214
--- Diff:
core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala
---
@@ -130,17 +135,21 @@ class HadoopMapReduceCommitProtocol
Github user szhem commented on the issue:
https://github.com/apache/spark/pull/19294
@mridulm
> incorporating a test for the sql part will also help in this matter.
What should be the expected behaviour in the case of SQL?
I'm asking because [the sql part seems to f
Github user szhem commented on a diff in the pull request:
https://github.com/apache/spark/pull/19294#discussion_r140165731
--- Diff:
core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala
---
@@ -130,17 +130,21 @@ class HadoopMapReduceCommitProtocol
Github user szhem commented on a diff in the pull request:
https://github.com/apache/spark/pull/19294#discussion_r140076614
--- Diff:
core/src/test/scala/org/apache/spark/rdd/PairRDDFunctionsSuite.scala ---
@@ -568,6 +568,51 @@ class PairRDDFunctionsSuite extends SparkFunSuite
Github user szhem commented on a diff in the pull request:
https://github.com/apache/spark/pull/19294#discussion_r140076564
--- Diff:
core/src/test/scala/org/apache/spark/rdd/PairRDDFunctionsSuite.scala ---
@@ -568,6 +568,51 @@ class PairRDDFunctionsSuite extends SparkFunSuite
GitHub user szhem opened a pull request:
https://github.com/apache/spark/pull/19294
[SPARK-21549][CORE] Respect OutputFormats with no output directory provided
## What changes were proposed in this pull request?
Fix for the [SPARK-21549](https://issues.apache.org/jira/browse/SPARK-21549) JIRA issue
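As an illustration of the kind of usage this PR targets (a sketch, not the PR's test code), an `OutputFormat` such as Hadoop's `NullOutputFormat` needs no output directory, so `mapreduce.output.fileoutputformat.outputdir` is never set:
```scala
import org.apache.hadoop.io.Text
import org.apache.hadoop.mapreduce.Job
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("no-output-dir").setMaster("local[2]"))

val job = Job.getInstance(sc.hadoopConfiguration)
job.setOutputFormatClass(classOf[NullOutputFormat[Text, Text]])
job.setOutputKeyClass(classOf[Text])
job.setOutputValueClass(classOf[Text])
// Note: FileOutputFormat.setOutputPath is intentionally never called -- there is no output dir.

sc.parallelize(Seq(("k", "v")))
  .map { case (k, v) => (new Text(k), new Text(v)) }
  .saveAsNewAPIHadoopDataset(job.getConfiguration)
```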
GitHub user szhem opened a pull request:
https://github.com/apache/spark/pull/17740
[SPARK-20404][CORE] Using Option(name) instead of Some(name)
Using Option(name) instead of Some(name) to prevent runtime failures when
using accumulators created like the following
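The original example is truncated here, but as a minimal sketch of why the change matters (illustrative, not the PR's code): `Option(x)` maps a null argument to `None`, whereas `Some(x)` wraps the null and defers the failure to whenever the value is actually used.
```scala
val name: String = null

val some: Option[String]   = Some(name)   // Some(null)
val option: Option[String] = Option(name) // None

option.map(_.toUpperCase) // None -- safe
some.map(_.toUpperCase)   // throws NullPointerException at runtime
```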