[GitHub] [spark] JkSelf commented on issue #25295: [SPARK-28560][SQL] Optimize shuffle reader to local shuffle reader when smj converted to bhj in adaptive execution

2019-10-17 Thread GitBox
JkSelf commented on issue #25295: [SPARK-28560][SQL] Optimize shuffle reader to 
local shuffle reader when smj converted to bhj in adaptive execution
URL: https://github.com/apache/spark/pull/25295#issuecomment-543456393
 
 
   @maryannxue 
   1. [#25295 
(comment)](https://github.com/apache/spark/pull/25295#discussion_r334792481): 
this comment has been resolved in 
[this commit](https://github.com/apache/spark/commit/51f10ed90f6b28c58fa1e576c8ceaa22e8c5f5ba),
 which lets `BlockStoreShuffleReader` take `blocksByAddress` directly 
instead of a map id.
   2. I will resolve [#25295 
(comment)](https://github.com/apache/spark/pull/25295#discussion_r336019058). 
`test("Exchange reuse")` can prove that "query stage reuse still working in 
presence of local shuffle reader", so I will make a small update to 
`test("Exchange reuse")`.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] JkSelf commented on issue #25295: [SPARK-28560][SQL] Optimize shuffle reader to local shuffle reader when smj converted to bhj in adaptive execution

2019-10-13 Thread GitBox
JkSelf commented on issue #25295: [SPARK-28560][SQL] Optimize shuffle reader to 
local shuffle reader when smj converted to bhj in adaptive execution
URL: https://github.com/apache/spark/pull/25295#issuecomment-541502159
 
 
   @cloud-fan Can you help review the updated patch? Thanks





[GitHub] [spark] JkSelf commented on issue #25295: [SPARK-28560][SQL] Optimize shuffle reader to local shuffle reader when smj converted to bhj in adaptive execution

2019-10-07 Thread GitBox
JkSelf commented on issue #25295: [SPARK-28560][SQL] Optimize shuffle reader to 
local shuffle reader when smj converted to bhj in adaptive execution
URL: https://github.com/apache/spark/pull/25295#issuecomment-539301442
 
 
   Resolved the conflicts.





[GitHub] [spark] JkSelf commented on issue #25295: [SPARK-28560][SQL] Optimize shuffle reader to local shuffle reader when smj converted to bhj in adaptive execution

2019-09-18 Thread GitBox
JkSelf commented on issue #25295: [SPARK-28560][SQL] Optimize shuffle reader to 
local shuffle reader when smj converted to bhj in adaptive execution
URL: https://github.com/apache/spark/pull/25295#issuecomment-532552503
 
 
   @cloud-fan 
   Moved the rule that converts the shuffle reader to a local shuffle reader 
so that it runs before `ReduceNumShufflePartitions`.





[GitHub] [spark] JkSelf commented on issue #25295: [SPARK-28560][SQL] Optimize shuffle reader to local shuffle reader when smj converted to bhj in adaptive execution

2019-09-11 Thread GitBox
JkSelf commented on issue #25295: [SPARK-28560][SQL] Optimize shuffle reader to 
local shuffle reader when smj converted to bhj in adaptive execution
URL: https://github.com/apache/spark/pull/25295#issuecomment-530640464
 
 
   @cloud-fan 
   The specific `ShuffleRDD` is implemented so that each partition reads the 
entire output of one mapper locally, which ensures that no data is transferred 
over the network.
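
   To make the difference concrete, here is a toy Python model (not Spark code) 
of a regular shuffle read versus the local read described above. The 
`map_outputs` layout and both function names are assumptions made up for this 
illustration only.

```python
# Toy shuffle output: map_outputs[map_id][reduce_id] holds the rows that
# mapper `map_id` wrote for reduce partition `reduce_id`.
# All names here are hypothetical; none of them are Spark APIs.
map_outputs = {
    0: {0: ["a0"], 1: ["b0", "c0"]},
    1: {0: ["a1", "d1"], 1: []},
    2: {0: [], 1: ["b2"]},
}


def regular_read(map_outputs, num_reducers):
    """One output partition per reducer: each pulls its block from every mapper."""
    return [
        [
            row
            for map_id in sorted(map_outputs)
            for row in map_outputs[map_id][reduce_id]
        ]
        for reduce_id in range(num_reducers)
    ]


def local_read(map_outputs):
    """One output partition per mapper: each reads all blocks of that one mapper."""
    return [
        [
            row
            for reduce_id in sorted(map_outputs[map_id])
            for row in map_outputs[map_id][reduce_id]
        ]
        for map_id in sorted(map_outputs)
    ]


regular = regular_read(map_outputs, num_reducers=2)
local = local_read(map_outputs)
```

   Both reads return the same rows overall, only grouped differently. The 
local grouping no longer follows the join key, which is acceptable once the 
join has become a broadcast hash join.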
   





[GitHub] [spark] JkSelf commented on issue #25295: [SPARK-28560][SQL] Optimize shuffle reader to local shuffle reader when smj converted to bhj in adaptive execution

2019-09-11 Thread GitBox
JkSelf commented on issue #25295: [SPARK-28560][SQL] Optimize shuffle reader to 
local shuffle reader when smj converted to bhj in adaptive execution
URL: https://github.com/apache/spark/pull/25295#issuecomment-530319218
 
 
   @cloud-fan Thanks for your review! I think that when the shuffle blocks 
exist locally, 
[ShuffleBlockFetcherIterator](https://github.com/apache/spark/blob/7f36cd2aa5e066a807d498b8c51645b136f08a75/core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala#L100)
 already reads them locally, even when the shuffle service is in use. Please 
correct me if I have misunderstood. If that is the case, do we still need to 
optimize this into a local read?
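
   For illustration, the local/remote split I am referring to can be modelled 
roughly as follows. This is a simplified Python sketch, not the actual Scala 
code: `split_local_remote`, the executor ids and the block sizes are all 
hypothetical, and the real iterator does more (for example, it batches remote 
requests by size).

```python
def split_local_remote(blocks_by_address, local_executor_id):
    """Split blocks into those on this executor and those that need a remote fetch.

    blocks_by_address: list of (executor_id, [(block_id, size_in_bytes), ...]).
    Returns (local_block_ids, remote_requests), where remote_requests is a
    list of (executor_id, [block_id, ...]).
    """
    local_blocks = []
    remote_requests = []
    for executor_id, blocks in blocks_by_address:
        block_ids = [block_id for block_id, _size in blocks]
        if executor_id == local_executor_id:
            # Blocks stored on the current executor can be read without the network.
            local_blocks.extend(block_ids)
        else:
            # Everything else has to be requested from the remote executor.
            remote_requests.append((executor_id, block_ids))
    return local_blocks, remote_requests


# Hypothetical input: block ids follow the shuffle_<shuffleId>_<mapId>_<reduceId> pattern.
blocks_by_address = [
    ("exec-1", [("shuffle_0_0_0", 128), ("shuffle_0_1_0", 64)]),
    ("exec-2", [("shuffle_0_2_0", 256)]),
]
local_blocks, remote_requests = split_local_remote(blocks_by_address, "exec-1")
```

   In this model a reduce task on `exec-1` reads two blocks locally and 
fetches one from `exec-2`, so a regular shuffle read only avoids the network 
for the blocks that happen to be on the same executor.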





[GitHub] [spark] JkSelf commented on issue #25295: [SPARK-28560][SQL] Optimize shuffle reader to local shuffle reader when smj converted to bhj in adaptive execution

2019-09-10 Thread GitBox
JkSelf commented on issue #25295: [SPARK-28560][SQL] Optimize shuffle reader to 
local shuffle reader when smj converted to bhj in adaptive execution
URL: https://github.com/apache/spark/pull/25295#issuecomment-530202694
 
 
   @cloud-fan Could you help review this when you have time? Thank you very 
much for your help.





[GitHub] [spark] JkSelf commented on issue #25295: [SPARK-28560][SQL] Optimize shuffle reader to local shuffle reader when smj converted to bhj in adaptive execution

2019-09-10 Thread GitBox
JkSelf commented on issue #25295: [SPARK-28560][SQL] Optimize shuffle reader to 
local shuffle reader when smj converted to bhj in adaptive execution
URL: https://github.com/apache/spark/pull/25295#issuecomment-530202124
 
 
   Fixed the conflicts.





[GitHub] [spark] JkSelf commented on issue #25295: [SPARK-28560][SQL] Optimize shuffle reader to local shuffle reader when smj converted to bhj in adaptive execution

2019-08-04 Thread GitBox
JkSelf commented on issue #25295: [SPARK-28560][SQL] Optimize shuffle reader to 
local shuffle reader when smj converted to bhj in adaptive execution
URL: https://github.com/apache/spark/pull/25295#issuecomment-518092402
 
 
   We have run functionality and performance tests on 3TB TPC-DS. The results 
are shown 
[here](https://docs.google.com/spreadsheets/d/1jtT3tCiNjtUbjOelpf50w7Z5JNl2YhzrBnbpF-7EhTw/edit#gid=0).
 Q82 shows a 1.76x performance improvement with this PR, and no query shows 
significant performance degradation.
   @carsonwang @cloud-fan Could you help review this when you have time? 
Thanks for your help.

