[GitHub] [spark] zhouyejoe commented on pull request #32007: [SPARK-33350][SHUFFLE] Add support to DiskBlockManager to create merge directory and to get the local shuffle merged data

2021-06-10 Thread GitBox


zhouyejoe commented on pull request #32007:
URL: https://github.com/apache/spark/pull/32007#issuecomment-858861907


   Updated to the latest master


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] zhouyejoe commented on pull request #32007: [SPARK-33350][SHUFFLE] Add support to DiskBlockManager to create merge directory and to get the local shuffle merged data

2021-06-08 Thread GitBox


zhouyejoe commented on pull request #32007:
URL: https://github.com/apache/spark/pull/32007#issuecomment-856517863


   Added two unit tests for IndexShuffleBlockResolver getMergedBlockData and 
getMergedBlockMeta


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] zhouyejoe commented on pull request #32007: [SPARK-33350][SHUFFLE] Add support to DiskBlockManager to create merge directory and to get the local shuffle merged data

2021-06-06 Thread GitBox


zhouyejoe commented on pull request #32007:
URL: https://github.com/apache/spark/pull/32007#issuecomment-855593936


   Address most of the review comments. The unit test for 
getLocalMergedBlockData and getLocalMergedBlockMeta are yet to be added. I 
think we should add the unit tests for IndexShuffleBlockResolver for these two 
tests. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] zhouyejoe commented on pull request #32007: [SPARK-33350][SHUFFLE] Add support to DiskBlockManager to create merge directory and to get the local shuffle merged data

2021-06-04 Thread GitBox


zhouyejoe commented on pull request #32007:
URL: https://github.com/apache/spark/pull/32007#issuecomment-853615497


   Updated with a slim version, which excludes the handling for multiple 
attempts case. 
   @Ngone51  Would like to share a little bit more context. We had multiple 
round of discussion internally regarding this PR, and the agreement we have 
reached internally is to exclude the multiple attempts support from this PR, 
whereas we have created a ticket SPARK-30602 to add it later on.
   
   cc @mridulm @Victsm @otterc


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] zhouyejoe commented on pull request #32007: [SPARK-33350][SHUFFLE] Add support to DiskBlockManager to create merge directory and to get the local shuffle merged data

2021-06-03 Thread GitBox


zhouyejoe commented on pull request #32007:
URL: https://github.com/apache/spark/pull/32007#issuecomment-853615497


   Updated with a slim version, which excludes the handling for multiple 
attempts case. 
   @Ngone51  Would like to share a little bit more context. We had multiple 
round of discussion internally regarding this PR, and the agreement we have 
reached internally is to exclude the multiple attempts support from this PR, 
whereas we have created a ticket SPARK-30602 to add it later on.
   
   cc @mridulm @Victsm @otterc


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] zhouyejoe commented on pull request #32007: [SPARK-33350][SHUFFLE] Add support to DiskBlockManager to create merge directory and to get the local shuffle merged data

2021-05-27 Thread GitBox


zhouyejoe commented on pull request #32007:
URL: https://github.com/apache/spark/pull/32007#issuecomment-850036241


   Created ticket for later improvement 
https://issues.apache.org/jira/browse/SPARK-35546


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] zhouyejoe commented on pull request #32007: [SPARK-33350][SHUFFLE] Add support to DiskBlockManager to create merge directory and to get the local shuffle merged data

2021-05-27 Thread GitBox


zhouyejoe commented on pull request #32007:
URL: https://github.com/apache/spark/pull/32007#issuecomment-849395958


   @mridulm @Victsm @otterc @Ngone51  Updated the PR. Please help review. 
Thanks.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] zhouyejoe commented on pull request #32007: [SPARK-33350][SHUFFLE] Add support to DiskBlockManager to create merge directory and to get the local shuffle merged data

2021-05-22 Thread GitBox


zhouyejoe commented on pull request #32007:
URL: https://github.com/apache/spark/pull/32007#issuecomment-846476480


   @otterc @mridulm @Ngone51 Addressed the comments. Please review. Thanks.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] zhouyejoe commented on pull request #32007: [SPARK-33350][SHUFFLE] Add support to DiskBlockManager to create merge directory and to get the local shuffle merged data

2021-05-22 Thread GitBox


zhouyejoe commented on pull request #32007:
URL: https://github.com/apache/spark/pull/32007#issuecomment-846476449


   > 1. `RemoteBlockPushResolver` needs to ignore any `PushBlock` message that 
is from previous attempts otherwise it will still merge a block of previous 
attempt to files of latest attempt and this is going to corrupt merged files.
   > 2. We should try to keep active partitions info in `partitions` map and 
delete stale entries (partition info belonging to old attempts).
   > 3. Need to add UTs for the server to ignore any pushblock messages from 
previous attempts.
   > 4. I don't think we need to add attemptId to FinalizeMerge message
   
   1. Added the part to ignore the PushBlock from previous attempt
   2. Added the part to delete stale entries in partitions hashmap
   3. Added UT
   4. As discussed above, it is still needed to have the attemptID in the 
FinalizeMerge message


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] zhouyejoe commented on pull request #32007: [SPARK-33350][SHUFFLE] Add support to DiskBlockManager to create merge directory and to get the local shuffle merged data

2021-04-20 Thread GitBox


zhouyejoe commented on pull request #32007:
URL: https://github.com/apache/spark/pull/32007#issuecomment-823463844


   @tgravescs @Ngone51 @attilapiros Can you help review this PR? Thanks.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] zhouyejoe commented on pull request #32007: [SPARK-33350][SHUFFLE] Add support to DiskBlockManager to create merge directory and to get the local shuffle merged data

2021-04-16 Thread GitBox


zhouyejoe commented on pull request #32007:
URL: https://github.com/apache/spark/pull/32007#issuecomment-820954486


   Thanks for reviewing the PR. I am addressing the comments, will update the 
PR soon.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org