[
https://issues.apache.org/jira/browse/SPARK-3376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
uncleGen updated SPARK-3376:
Priority: Major (was: Minor)
Memory-based shuffle strategy to reduce overhead of disk I/O
[
https://issues.apache.org/jira/browse/SPARK-3376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
uncleGen updated SPARK-3376:
Description:
I think a memory-based shuffle can reduce some overhead of disk I/O. I just want to know
[
https://issues.apache.org/jira/browse/SPARK-3376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
uncleGen updated SPARK-3376:
Component/s: Shuffle
Memory-based shuffle strategy to reduce overhead of disk I/O
[
https://issues.apache.org/jira/browse/SPARK-3376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
uncleGen updated SPARK-3376:
Issue Type: New Feature (was: Planned Work)
Memory-based shuffle strategy to reduce overhead of disk I/O
[
https://issues.apache.org/jira/browse/SPARK-3376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
uncleGen updated SPARK-3376:
Affects Version/s: 1.1.0
Memory-based shuffle strategy to reduce overhead of disk I/O
[
https://issues.apache.org/jira/browse/SPARK-3376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
uncleGen updated SPARK-3376:
Target Version/s: 1.3.0
Memory-based shuffle strategy to reduce overhead of disk I/O
[
https://issues.apache.org/jira/browse/SPARK-3376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
uncleGen updated SPARK-3376:
Labels: performance (was: )
Memory-based shuffle strategy to reduce overhead of disk I/O
[
https://issues.apache.org/jira/browse/SPARK-3376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14240985#comment-14240985
]
uncleGen commented on SPARK-3376:
-
[~rxin] Yeah, I agree with you. We can improve the I/O
[
https://issues.apache.org/jira/browse/SPARK-3376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
uncleGen updated SPARK-3376:
Priority: Minor (was: Trivial)
Memory-based shuffle strategy to reduce overhead of disk I/O
Github user uncleGen commented on the pull request:
https://github.com/apache/spark/pull/3366#issuecomment-63831692
@davies Could you help review this patch? Thank you!
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well
uncleGen created SPARK-4488:
---
Summary: Add control over map-side aggregation
Key: SPARK-4488
URL: https://issues.apache.org/jira/browse/SPARK-4488
Project: Spark
Issue Type: Improvement
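The idea behind SPARK-4488 can be illustrated with a small sketch (plain Python, not Spark's APIs; the function and parameter names here are hypothetical): map-side aggregation pre-combines values per key before the shuffle, which shrinks shuffle output when keys repeat, but only adds overhead when keys are mostly unique — hence the wish for user control.

```python
from collections import defaultdict

def shuffle_write(records, map_side_combine):
    """Simulate the map side of a reduceByKey-style shuffle (sketch only).

    With map_side_combine=True, values are pre-aggregated per key before
    being 'written', reducing the number of records crossing the shuffle.
    """
    if not map_side_combine:
        return list(records)  # every record crosses the shuffle as-is
    combined = defaultdict(int)
    for key, value in records:
        combined[key] += value  # pre-aggregate on the map side
    return list(combined.items())

records = [("a", 1), ("b", 1), ("a", 1), ("a", 1)]
print(len(shuffle_write(records, map_side_combine=False)))  # 4 records shuffled
print(len(shuffle_write(records, map_side_combine=True)))   # 2 records shuffled
```

With mostly-unique keys the combined output is nearly the same size as the input, so the pre-aggregation pass is wasted work — which is the case the issue proposes letting users opt out of.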
[
https://issues.apache.org/jira/browse/SPARK-3373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
uncleGen updated SPARK-3373:
Target Version/s: 1.1.1, 1.2.0 (was: 1.1.0, 1.0.3)
Filtering operations should optionally rebuild routing
GitHub user uncleGen opened a pull request:
https://github.com/apache/spark/pull/3365
[SPARK-4488][PySpark] Add control over map-side aggregation
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/uncleGen/spark master-clean-141119
Github user uncleGen closed the pull request at:
https://github.com/apache/spark/pull/3365
---
GitHub user uncleGen reopened a pull request:
https://github.com/apache/spark/pull/3365
[SPARK-4488][PySpark] Add control over map-side aggregation
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/uncleGen/spark master-clean
Github user uncleGen closed the pull request at:
https://github.com/apache/spark/pull/3365
---
GitHub user uncleGen opened a pull request:
https://github.com/apache/spark/pull/3366
[SPARK-4488][PySpark] Add control over map-side aggregation
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/uncleGen/spark master-pyspark
Github user uncleGen commented on the pull request:
https://github.com/apache/spark/pull/2574#issuecomment-63651117
@JoshRosen [[SPARK-4168][WebUI]
](https://github.com/apache/spark/commit/97a466eca0a629f17e9662ca2b59eeca99142c54)
The patch solved the same problem, and I will close
Github user uncleGen closed the pull request at:
https://github.com/apache/spark/pull/2574
---
Github user uncleGen commented on the pull request:
https://github.com/apache/spark/pull/2249#issuecomment-63654065
@ankurdave Hi, could you review it again? Thank you!
---
Github user uncleGen commented on the pull request:
https://github.com/apache/spark/pull/2679#issuecomment-59037906
@ankurdave I see. And I think it is worthwhile to provide a memory-based
shuffle manager in some cases, such as sufficient memory resources or stringent
performance requirements
Github user uncleGen commented on the pull request:
https://github.com/apache/spark/pull/2679#issuecomment-58885039
@ankurdave I have some doubts, but not about this patch. In the [GraphX OSDI
paper](http://ankurdave.com/dl/graphx-osdi14.pdf), I found that you have
implemented a memory
Github user uncleGen commented on the pull request:
https://github.com/apache/spark/pull/2574#issuecomment-58735669
@JoshRosen Sorry for my misunderstanding, I will correct it as soon as
possible.
---
Github user uncleGen commented on a diff in the pull request:
https://github.com/apache/spark/pull/2574#discussion_r18622733
--- Diff:
core/src/main/scala/org/apache/spark/ui/jobs/JobProgressPage.scala ---
@@ -70,11 +72,11 @@ private[ui] class JobProgressPage(parent
Github user uncleGen commented on a diff in the pull request:
https://github.com/apache/spark/pull/2574#discussion_r18561760
--- Diff:
core/src/main/scala/org/apache/spark/ui/jobs/JobProgressPage.scala ---
@@ -70,11 +72,11 @@ private[ui] class JobProgressPage(parent
uncleGen created SPARK-3719:
---
Summary: Spark UI: complete/failed stages is better to show the
total number of stages
Key: SPARK-3719
URL: https://issues.apache.org/jira/browse/SPARK-3719
Project: Spark
GitHub user uncleGen opened a pull request:
https://github.com/apache/spark/pull/2574
[SPARK-3719][CORE]:complete/failed stages is better to show the total ...
...number of stages
You can merge this pull request into a Git repository by running:
$ git pull https://github.com
uncleGen created SPARK-3712:
---
Summary: add a new UpdateDStream to update a rdd dynamically
Key: SPARK-3712
URL: https://issues.apache.org/jira/browse/SPARK-3712
Project: Spark
Issue Type
GitHub user uncleGen opened a pull request:
https://github.com/apache/spark/pull/2562
[SPARK-3712][STREAMING]: add a new UpdateDStream to update a rdd dynamically
Maybe we can achieve the aim by using the forEachRdd function, but it is
awkward that way, because I need to pass
Github user uncleGen commented on the pull request:
https://github.com/apache/spark/pull/2562#issuecomment-57083416
Test failure appears to be unrelated to my patch.
---
Github user uncleGen commented on the pull request:
https://github.com/apache/spark/pull/2562#issuecomment-57087576
@jerryshao Thanks for your comments! I want to abstract an independent
DStream to achieve the aim. I feel it is weird to update an RDD by passing a
closure. Maybe
Github user uncleGen closed the pull request at:
https://github.com/apache/spark/pull/2562
---
Github user uncleGen closed the pull request at:
https://github.com/apache/spark/pull/2488
---
Github user uncleGen commented on the pull request:
https://github.com/apache/spark/pull/2488#issuecomment-56505578
@pwendell ah, that makes sense, I will close this PR. Thank you!
---
Github user uncleGen commented on a diff in the pull request:
https://github.com/apache/spark/pull/710#discussion_r17951094
--- Diff:
core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala ---
@@ -192,15 +236,17 @@ class SparkSubmitSuite extends FunSuite
uncleGen created SPARK-3636:
---
Summary: It is not friendly to interrupt a Job when user passes
different storageLevels to a RDD
Key: SPARK-3636
URL: https://issues.apache.org/jira/browse/SPARK-3636
Project
GitHub user uncleGen opened a pull request:
https://github.com/apache/spark/pull/2488
[SPARK-3636][CORE]:It is not friendly to interrupt a Job when user passe...
... different storageLevels to a RDD
You can merge this pull request into a Git repository by running:
$ git pull
uncleGen created SPARK-3373:
---
Summary: trim some useless informations of VertexRDD in some cases
Key: SPARK-3373
URL: https://issues.apache.org/jira/browse/SPARK-3373
Project: Spark
Issue Type
uncleGen created SPARK-3376:
---
Summary: Memory-based shuffle strategy to reduce overhead of disk
I/O
Key: SPARK-3376
URL: https://issues.apache.org/jira/browse/SPARK-3376
Project: Spark
Issue Type
[
https://issues.apache.org/jira/browse/SPARK-3376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
uncleGen updated SPARK-3376:
Description: I think a memory-based shuffle can reduce some overhead of
disk I/O. I just want to know
GitHub user uncleGen opened a pull request:
https://github.com/apache/spark/pull/2249
[GraphX]: trim some useless informations of VertexRDD in some cases
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/uncleGen/spark
Github user uncleGen commented on the pull request:
https://github.com/apache/spark/pull/2249#issuecomment-54399143
@ankurdave Thanks for your comments, I will update it as soon as possible.
---
Github user uncleGen commented on a diff in the pull request:
https://github.com/apache/spark/pull/2249#discussion_r17095835
--- Diff: graphx/src/main/scala/org/apache/spark/graphx/Graph.scala ---
@@ -262,13 +262,61 @@ abstract class Graph[VD: ClassTag, ED: ClassTag]
protected
Github user uncleGen closed the pull request at:
https://github.com/apache/spark/pull/1429
---
Github user uncleGen commented on the pull request:
https://github.com/apache/spark/pull/1696#issuecomment-54102947
@pwendell OK!
---
Github user uncleGen commented on the pull request:
https://github.com/apache/spark/pull/1356#issuecomment-53977557
okay!
---
Github user uncleGen commented on a diff in the pull request:
https://github.com/apache/spark/pull/2131#discussion_r16761636
--- Diff: core/src/main/scala/org/apache/spark/CacheManager.scala ---
@@ -68,7 +68,9 @@ private[spark] class CacheManager(blockManager:
BlockManager
Github user uncleGen commented on a diff in the pull request:
https://github.com/apache/spark/pull/2131#discussion_r16762096
--- Diff: core/src/test/scala/org/apache/spark/CacheManagerSuite.scala ---
@@ -87,4 +99,12 @@ class CacheManagerSuite extends FunSuite with
BeforeAndAfter
Github user uncleGen commented on the pull request:
https://github.com/apache/spark/pull/2131#issuecomment-53549872
@andrewor14 sorry for my poor coding. The unit tests passed locally; please
test it again.
---
GitHub user uncleGen opened a pull request:
https://github.com/apache/spark/pull/2131
[SPARK-3170][CORE][BUG]:RDD info loss in StorageTab and ExecutorTab
A completed stage only needs to remove its own partitions that are no longer
cached. However, StorageTab may lose some RDDs which
Github user uncleGen commented on the pull request:
https://github.com/apache/spark/pull/2076#issuecomment-53383270
@andrewor14 @pwendell @srowen
As my branch is not up to date, I decided to close this one and submit a new PR.
Please review it: https://github.com/apache/spark
Github user uncleGen closed the pull request at:
https://github.com/apache/spark/pull/2076
---
Github user uncleGen commented on the pull request:
https://github.com/apache/spark/pull/2131#issuecomment-53521662
Hi @andrewor14, test it again please
---
Github user uncleGen commented on the pull request:
https://github.com/apache/spark/pull/2076#issuecomment-53144395
@pwendell Okay! I will add them as soon as possible and pay more attention.
---
Github user uncleGen commented on the pull request:
https://github.com/apache/spark/pull/2076#issuecomment-52908740
@srowen yes! Not only StorageTab; ExecutorTab may also lose some RDD infos
which have been overwritten by a following RDD in the same task.
StorageTab: when
uncleGen created SPARK-3170:
---
Summary: Bug Fix in Storage UI
Key: SPARK-3170
URL: https://issues.apache.org/jira/browse/SPARK-3170
Project: Spark
Issue Type: Bug
Components: Spark Core
[
https://issues.apache.org/jira/browse/SPARK-3170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
uncleGen updated SPARK-3170:
Description: currently, a completed stage only needs to remove its own
partitions that are no longer cached
GitHub user uncleGen opened a pull request:
https://github.com/apache/spark/pull/2076
[SPARK-3170][CORE]: Bug Fix in Storage UI
a completed stage only needs to remove its own partitions that are no
longer cached. Currently, Storage in the Spark UI may lose some RDDs which
uncleGen created SPARK-3123:
---
Summary: override the setName function to set EdgeRDD's name
manually just as VertexRDD does.
Key: SPARK-3123
URL: https://issues.apache.org/jira/browse/SPARK-3123
Project
GitHub user uncleGen opened a pull request:
https://github.com/apache/spark/pull/2033
[GraphX]: override the setName function to set EdgeRDD's name manually
just as VertexRDD does.
You can merge this pull request into a Git repository by running:
$ git pull https
Github user uncleGen commented on the pull request:
https://github.com/apache/spark/pull/1696#issuecomment-50958278
@rxin Thanks for your attention, I have updated my jira.
https://issues.apache.org/jira/browse/SPARK-2773
---
uncleGen created SPARK-2773:
---
Summary: Shuffle:use growth rate to predict if need to spill
Key: SPARK-2773
URL: https://issues.apache.org/jira/browse/SPARK-2773
Project: Spark
Issue Type
[
https://issues.apache.org/jira/browse/SPARK-2773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14081161#comment-14081161
]
uncleGen commented on SPARK-2773:
-
here is my improvement: https://github.com/apache/spark
GitHub user uncleGen opened a pull request:
https://github.com/apache/spark/pull/1696
Spark Shuffle: use growth rate to predict if need to spill
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/uncleGen/spark master
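The proposal in SPARK-2773, as far as the summary states it, can be sketched as follows (a hypothetical simplification, not the actual patch; all names here are invented for illustration): instead of spilling only when current memory use crosses the threshold, use the growth rate observed since the last check to predict whether the limit would be exceeded before the next check, and spill early.

```python
def should_spill(current_size, previous_size, memory_limit, check_interval=1):
    """Predict whether to spill before the next check, using growth rate.

    Hypothetical sketch: if memory keeps growing at the rate observed
    since the last check, will it exceed the limit before we look again?
    If so, spill now rather than overshoot the limit.
    """
    growth = current_size - previous_size            # bytes added since last check
    predicted = current_size + growth * check_interval
    return predicted > memory_limit

# Growing from 40 to 70 bytes against a 90-byte limit: the next step is
# predicted to reach 100, so spill preemptively even though 70 < 90.
print(should_spill(70, 40, 90))
```

The trade-off is between overshooting the memory limit (spilling too late) and spilling more often than strictly necessary when growth is bursty.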
Github user uncleGen commented on the pull request:
https://github.com/apache/spark/pull/1429#issuecomment-49751553
@tgravescs ok, and any suggestions?
---
uncleGen created SPARK-2506:
---
Summary: In yarn-cluster mode, ApplicationMaster does not clean up
correctly at the end of the job if users call sc.stop manually
Key: SPARK-2506
URL: https://issues.apache.org/jira/browse
[
https://issues.apache.org/jira/browse/SPARK-2506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
uncleGen updated SPARK-2506:
Description:
When I call sc.stop manually, some strange ERRORs appear:
1. in driver log:
INFO
[
https://issues.apache.org/jira/browse/SPARK-2506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14063171#comment-14063171
]
uncleGen commented on SPARK-2506:
-
Here is a simple PR fix this problem: https
GitHub user uncleGen opened a pull request:
https://github.com/apache/spark/pull/1356
Bug Fix: LiveListenerBus Queue Overflow
As we know, the size of the eventQueue is fixed. When events arrive faster
than the listener can consume them, overflow events will be thrown away with
throwing
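The overflow behavior described above can be reproduced with a minimal bounded queue (a plain Python sketch, not Spark's actual LiveListenerBus; `post_events` is a made-up helper): when producers outpace the consumer, events beyond the fixed capacity are silently dropped.

```python
from queue import Full, Queue

def post_events(events, capacity):
    """Simulate a fixed-size listener-bus queue with no consumer draining it.

    Events that arrive while the queue is full are dropped, which is the
    overflow behavior this PR set out to address.
    """
    bus = Queue(maxsize=capacity)
    dropped = []
    for event in events:
        try:
            bus.put_nowait(event)
        except Full:
            dropped.append(event)  # overflow: event is thrown away
    return bus.qsize(), dropped

queued, dropped = post_events(range(5), capacity=3)
print(queued, dropped)  # 3 events queued, events 3 and 4 dropped
```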
Github user uncleGen commented on the pull request:
https://github.com/apache/spark/pull/1356#issuecomment-48691344
@pwendell yeah, this is not an elegant way to resolve the bug. My fix is a
compromise. Actually, there are no frequent get/put operations in
blockManager when
Github user uncleGen commented on the pull request:
https://github.com/apache/spark/pull/1356#issuecomment-48691872
@pwendell yeah, it is not an elegant way to resolve the bug. My fix is a
compromise. Actually, it will not cause frequent put/get operations in
blockManager when