Github user yucai commented on the issue:
https://github.com/apache/spark/pull/19788
@cloud-fan, here is the exception screenshot. Let me know if you want any changes.
![image](https://user-images.githubusercontent.com/2989575/47471258-1793ce00-d83c-11e8-90bf-107865fc9032.png)
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19788
Merged build finished. Test PASSed.
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19788
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97975/
Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19788
**[Test build #97975 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97975/testReport)**
for PR 19788 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19788
**[Test build #97975 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97975/testReport)**
for PR 19788 at commit
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/19788
BTW, let's add a config for this feature. We may enable adaptive execution
by default in the future, and we should still allow users to run spark with
legacy shuffle service. We should also throw
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19788
**[Test build #97787 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97787/testReport)**
for PR 19788 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19788
**[Test build #97745 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97745/testReport)**
for PR 19788 at commit
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/19788
Hi @yucai , good points on the performance concerns. Let's go with the
previous approach:
https://github.com/apache/spark/pull/19788#issuecomment-366887404
sorry for the back and forth!
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19788
Can one of the admins verify this patch?
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19788
Can one of the admins verify this patch?
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19788
Can one of the admins verify this patch?
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19788
**[Test build #97745 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97745/testReport)**
for PR 19788 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19788
Can one of the admins verify this patch?
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19788
Can one of the admins verify this patch?
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19788
Build finished. Test FAILed.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19788
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97715/
Test FAILed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19788
**[Test build #97715 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97715/testReport)**
for PR 19788 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19788
**[Test build #97715 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97715/testReport)**
for PR 19788 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19788
Can one of the admins verify this patch?
Github user yucai commented on the issue:
https://github.com/apache/spark/pull/19788
**Summary**
The one-disk-IO solution seems to perform worse than the current PR19877
implementation.
**Benchmark**
```scala
spark.range(1, 512000L, 1,
```
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/19788
> One possible solution is to read all contiguous partitions in one shot and
then send each shuffle block one by one. What do you think? We may need to
benchmark the performance of this approach.
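A minimal sketch of the idea quoted above, not code from the PR. It assumes the map output is one byte array plus an offsets array in which `offsets(i)` is where reduce partition `i` starts and the last entry is the total length. The object and method names are hypothetical and are not Spark APIs.

```scala
// Illustrative only: ContiguousRead and sliceBlocks are hypothetical names.
object ContiguousRead {
  /**
   * Reads partitions [startPartition, endPartition) with a single contiguous copy,
   * then splits that chunk back into one block per partition.
   */
  def sliceBlocks(
      data: Array[Byte],
      offsets: Array[Long],
      startPartition: Int,
      endPartition: Int): Seq[Array[Byte]] = {
    require(0 <= startPartition && startPartition < endPartition && endPartition < offsets.length)
    val base = offsets(startPartition).toInt
    val end = offsets(endPartition).toInt
    // The single "disk read" covering every requested partition.
    val chunk = java.util.Arrays.copyOfRange(data, base, end)
    // Each block can then be sent individually, as the comment suggests.
    (startPartition until endPartition).map { p =>
      val from = offsets(p).toInt - base
      val until = offsets(p + 1).toInt - base
      java.util.Arrays.copyOfRange(chunk, from, until)
    }
  }
}
```

The sketch trades one larger read for several small ones; whether that helps in practice is exactly what the benchmark mentioned in the comment would need to show.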
Github user yucai commented on the issue:
https://github.com/apache/spark/pull/19788
@cloud-fan @gatorsmile I am trying the new method as suggested, and I have a
question.
If we make it a **purely server-side** optimization, the external shuffle
service has no idea how
Github user yucai commented on the issue:
https://github.com/apache/spark/pull/19788
@gatorsmile @cloud-fan @carsonwang I will update it soon.
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/19788
ping @yucai @carsonwang
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19788
Can one of the admins verify this patch?
Github user yucai commented on the issue:
https://github.com/apache/spark/pull/19788
I have synced with @cloud-fan and will update this. Thanks very much!
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/19788
I think we can do this better, to make it a purely server-side
optimization. The shuffle protocol can already fetch multiple blocks in one
request, i.e. the `OpenBlocks` request.
The
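As a rough illustration of what a server-side optimization might look like (an assumption, since the comment above is cut off): the server could scan the block IDs in one request and merge runs of consecutive reduce IDs for the same map output. The sketch assumes the usual `shuffle_<shuffleId>_<mapId>_<reduceId>` block naming. `ContiguousRuns`, `Run`, and `mergeRuns` are hypothetical names, not part of the shuffle protocol.

```scala
// Illustrative only: not the PR's code and not a Spark API.
object ContiguousRuns {
  /** A run of consecutive reduce IDs for one map output; both ends are inclusive. */
  final case class Run(shuffleId: Int, mapId: Int, firstReduceId: Int, lastReduceId: Int)

  private val ShuffleBlock = "shuffle_(\\d+)_(\\d+)_(\\d+)".r

  /** Merges adjacent block IDs into runs, preserving the request order. */
  def mergeRuns(blockIds: Seq[String]): Seq[Run] =
    blockIds.foldLeft(Vector.empty[Run]) { (runs, id) =>
      id match {
        case ShuffleBlock(s, m, r) =>
          val (shuffleId, mapId, reduceId) = (s.toInt, m.toInt, r.toInt)
          runs.lastOption match {
            // Extend the previous run when this block directly follows it.
            case Some(prev) if prev.shuffleId == shuffleId && prev.mapId == mapId &&
                prev.lastReduceId + 1 == reduceId =>
              runs.init :+ prev.copy(lastReduceId = reduceId)
            case _ =>
              runs :+ Run(shuffleId, mapId, reduceId, reduceId)
          }
        case other =>
          throw new IllegalArgumentException(s"Unexpected block id: $other")
      }
    }
}
```

Each resulting run could then be served from one contiguous region of the map output file, while the client still receives the blocks it asked for.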
Github user yucai commented on the issue:
https://github.com/apache/spark/pull/19788
@cloud-fan, sorry for the late response. Could you help take a look?
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19788
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87947/
Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19788
Merged build finished. Test PASSed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19788
**[Test build #87947 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87947/testReport)**
for PR 19788 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19788
**[Test build #87947 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87947/testReport)**
for PR 19788 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19788
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87930/
Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19788
Merged build finished. Test PASSed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19788
**[Test build #87930 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87930/testReport)**
for PR 19788 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19788
**[Test build #87930 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87930/testReport)**
for PR 19788 at commit
Github user yucai commented on the issue:
https://github.com/apache/spark/pull/19788
Jenkins, retest this please
Github user yucai commented on the issue:
https://github.com/apache/spark/pull/19788
I ran the unit tests locally, and org.apache.spark.sql.FileBasedDataSourceSuite is good.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19788
Merged build finished. Test FAILed.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19788
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87814/
Test FAILed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19788
**[Test build #87814 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87814/testReport)**
for PR 19788 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19788
Merged build finished. Test PASSed.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19788
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87812/
Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19788
**[Test build #87812 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87812/testReport)**
for PR 19788 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19788
**[Test build #87814 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87814/testReport)**
for PR 19788 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19788
**[Test build #87812 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87812/testReport)**
for PR 19788 at commit
Github user yucai commented on the issue:
https://github.com/apache/spark/pull/19788
@cloud-fan if encryption is enabled
(`blockManager.serializerManager().encryptionEnabled() == true`), shall we
disable this feature as well?
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19788
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87610/
Test FAILed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19788
**[Test build #87610 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87610/testReport)**
for PR 19788 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19788
Merged build finished. Test FAILed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19788
**[Test build #87610 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87610/testReport)**
for PR 19788 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19788
Merged build finished. Test FAILed.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19788
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87607/
Test FAILed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19788
**[Test build #87607 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87607/testReport)**
for PR 19788 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19788
**[Test build #87607 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87607/testReport)**
for PR 19788 at commit
Github user yucai commented on the issue:
https://github.com/apache/spark/pull/19788
Good suggestion, thanks @cloud-fan! Let me update accordingly.
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/19788
The idea LGTM, but I think @JoshRosen has a valid concern. My 2 cents:
1. The concept of reading multiple reducer partitions in one shot was
introduced by `ShuffleManager.getReader`. Although
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19788
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87404/
Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19788
Build finished. Test PASSed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19788
**[Test build #87404 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87404/testReport)**
for PR 19788 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19788
**[Test build #87404 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87404/testReport)**
for PR 19788 at commit
Github user squito commented on the issue:
https://github.com/apache/spark/pull/19788
Jenkins, retest this please
Github user yucai commented on the issue:
https://github.com/apache/spark/pull/19788
I ran the unit tests locally, and org.apache.spark.sql.FileBasedDataSourceSuite
looks good. @jerryshao could you help re-trigger testing?
Github user yucai commented on the issue:
https://github.com/apache/spark/pull/19788
retest this please.
Github user yucai commented on the issue:
https://github.com/apache/spark/pull/19788
@JoshRosen this feature is only used in Spark SQL, and Spark SQL uses
UnsafeRowSerializer, which supports relocation of serialized objects, so I
think we only need to consider compression, right? I have added
Github user yucai commented on the issue:
https://github.com/apache/spark/pull/19788
retest this please
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19788
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87312/
Test FAILed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19788
Merged build finished. Test FAILed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19788
**[Test build #87312 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87312/testReport)**
for PR 19788 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19788
**[Test build #87312 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87312/testReport)**
for PR 19788 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19788
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87305/
Test FAILed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19788
Merged build finished. Test FAILed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19788
**[Test build #87305 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87305/testReport)**
for PR 19788 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19788
**[Test build #87305 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87305/testReport)**
for PR 19788 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19788
Can one of the admins verify this patch?
Github user yucai commented on the issue:
https://github.com/apache/spark/pull/19788
Thanks @wangyum for this nice supporting data!
Now we can see an obvious time reduction from this feature.
Github user wangyum commented on the issue:
https://github.com/apache/spark/pull/19788
Thanks @yucai, it's a great improvement for jobs with many output files. The
figure below is our comparison:
**Before**:
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19788
Merged build finished. Test PASSed.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19788
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86991/
Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19788
**[Test build #86991 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86991/testReport)**
for PR 19788 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19788
**[Test build #86991 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86991/testReport)**
for PR 19788 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19788
**[Test build #86939 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86939/testReport)**
for PR 19788 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19788
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86939/
Test FAILed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19788
Merged build finished. Test FAILed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19788
**[Test build #86939 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86939/testReport)**
for PR 19788 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19788
**[Test build #86938 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86938/testReport)**
for PR 19788 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19788
Merged build finished. Test FAILed.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19788
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86938/
Test FAILed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19788
**[Test build #86938 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86938/testReport)**
for PR 19788 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19788
Can one of the admins verify this patch?
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19788
Can one of the admins verify this patch?
Github user yucai commented on the issue:
https://github.com/apache/spark/pull/19788
I will update a new version.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19788
Can one of the admins verify this patch?
Github user JoshRosen commented on the issue:
https://github.com/apache/spark/pull/19788
Is there an implicit assumption here that contiguous partitions' data can
be decompressed / deserialized in a single stream? If the shuffled data is
written with a non-relocatable serializer
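To make the question concrete, here is a small sketch of what "decompressed in a single stream" means for a format that tolerates concatenation. It uses gzip only because it ships with the JDK; that choice is an assumption for illustration and says nothing about which codecs Spark uses for shuffle data. `ConcatenationDemo` and its methods are hypothetical names.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream}
import java.util.zip.{GZIPInputStream, GZIPOutputStream}

// Illustrative only: each "partition" is compressed independently, then the
// compressed bytes are concatenated and read back through one input stream.
object ConcatenationDemo {
  /** Compresses one piece of text as a standalone gzip member. */
  def gzip(text: String): Array[Byte] = {
    val bytes = new ByteArrayOutputStream()
    val out = new GZIPOutputStream(bytes)
    out.write(text.getBytes("UTF-8"))
    out.close()
    bytes.toByteArray
  }

  /** Decompresses everything in `data` through a single GZIPInputStream. */
  def gunzipAll(data: Array[Byte]): String = {
    val in = new GZIPInputStream(new ByteArrayInputStream(data))
    try scala.io.Source.fromInputStream(in, "UTF-8").mkString
    finally in.close()
  }
}
```

A format without this property would need each partition's bytes to be decoded separately, which is the situation the comment is asking about.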
Github user jiangxb1987 commented on the issue:
https://github.com/apache/spark/pull/19788
Sounds good. It would be great if we could document clearly that users who
want to use adaptive execution have to update the external shuffle
service.
---
Github user yucai commented on the issue:
https://github.com/apache/spark/pull/19788
@jerryshao @cloud-fan @gczsjdy
Because this feature is only used in adaptive execution, how about this way:
- Remove `spark.shuffle.continuousFetch`
- When
Github user gczsjdy commented on the issue:
https://github.com/apache/spark/pull/19788
Can we just add the `ContinuousShuffleBlockId` without adding the new conf
`spark.shuffle.continuousFetch`? While in classes related to shuffle read like
`ShuffleBlockFetcherIterator`, we also pattern
Github user jerryshao commented on the issue:
https://github.com/apache/spark/pull/19788
@yucai I'm wondering whether it is necessary to add the new configuration
`spark.shuffle.continuousFetch` you mentioned above. The PR you proposed
is actually a superset of the previous way, it is
Github user gczsjdy commented on the issue:
https://github.com/apache/spark/pull/19788
What is the `external shuffle service` here? Can you explain a little bit?