[GitHub] [spark] zhouyejoe commented on pull request #32007: [SPARK-33350][SHUFFLE] Add support to DiskBlockManager to create merge directory and to get the local shuffle merged data
zhouyejoe commented on pull request #32007: URL: https://github.com/apache/spark/pull/32007#issuecomment-858861907 Updated to the latest master -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] zhouyejoe commented on pull request #32007: [SPARK-33350][SHUFFLE] Add support to DiskBlockManager to create merge directory and to get the local shuffle merged data
zhouyejoe commented on pull request #32007: URL: https://github.com/apache/spark/pull/32007#issuecomment-856517863 Added two unit tests for IndexShuffleBlockResolver getMergedBlockData and getMergedBlockMeta -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] zhouyejoe commented on pull request #32007: [SPARK-33350][SHUFFLE] Add support to DiskBlockManager to create merge directory and to get the local shuffle merged data
zhouyejoe commented on pull request #32007: URL: https://github.com/apache/spark/pull/32007#issuecomment-855593936 Address most of the review comments. The unit test for getLocalMergedBlockData and getLocalMergedBlockMeta are yet to be added. I think we should add the unit tests for IndexShuffleBlockResolver for these two tests. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] zhouyejoe commented on pull request #32007: [SPARK-33350][SHUFFLE] Add support to DiskBlockManager to create merge directory and to get the local shuffle merged data
zhouyejoe commented on pull request #32007: URL: https://github.com/apache/spark/pull/32007#issuecomment-853615497 Updated with a slim version, which excludes the handling for multiple attempts case. @Ngone51 Would like to share a little bit more context. We had multiple round of discussion internally regarding this PR, and the agreement we have reached internally is to exclude the multiple attempts support from this PR, whereas we have created a ticket SPARK-30602 to add it later on. cc @mridulm @Victsm @otterc -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] zhouyejoe commented on pull request #32007: [SPARK-33350][SHUFFLE] Add support to DiskBlockManager to create merge directory and to get the local shuffle merged data
zhouyejoe commented on pull request #32007: URL: https://github.com/apache/spark/pull/32007#issuecomment-853615497 Updated with a slim version, which excludes the handling for multiple attempts case. @Ngone51 Would like to share a little bit more context. We had multiple round of discussion internally regarding this PR, and the agreement we have reached internally is to exclude the multiple attempts support from this PR, whereas we have created a ticket SPARK-30602 to add it later on. cc @mridulm @Victsm @otterc -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] zhouyejoe commented on pull request #32007: [SPARK-33350][SHUFFLE] Add support to DiskBlockManager to create merge directory and to get the local shuffle merged data
zhouyejoe commented on pull request #32007: URL: https://github.com/apache/spark/pull/32007#issuecomment-850036241 Created ticket for later improvement https://issues.apache.org/jira/browse/SPARK-35546 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] zhouyejoe commented on pull request #32007: [SPARK-33350][SHUFFLE] Add support to DiskBlockManager to create merge directory and to get the local shuffle merged data
zhouyejoe commented on pull request #32007: URL: https://github.com/apache/spark/pull/32007#issuecomment-849395958 @mridulm @Victsm @otterc @Ngone51 Updated the PR. Please help review. Thanks. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] zhouyejoe commented on pull request #32007: [SPARK-33350][SHUFFLE] Add support to DiskBlockManager to create merge directory and to get the local shuffle merged data
zhouyejoe commented on pull request #32007: URL: https://github.com/apache/spark/pull/32007#issuecomment-846476480 @otterc @mridulm @Ngone51 Addressed the comments. Please review. Thanks. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] zhouyejoe commented on pull request #32007: [SPARK-33350][SHUFFLE] Add support to DiskBlockManager to create merge directory and to get the local shuffle merged data
zhouyejoe commented on pull request #32007: URL: https://github.com/apache/spark/pull/32007#issuecomment-846476449 > 1. `RemoteBlockPushResolver` needs to ignore any `PushBlock` message that is from previous attempts otherwise it will still merge a block of previous attempt to files of latest attempt and this is going to corrupt merged files. > 2. We should try to keep active partitions info in `partitions` map and delete stale entries (partition info belonging to old attempts). > 3. Need to add UTs for the server to ignore any pushblock messages from previous attempts. > 4. I don't think we need to add attemptId to FinalizeMerge message 1. Added the part to ignore the PushBlock from previous attempt 2. Added the part to delete stale entries in partitions hashmap 3. Added UT 4. As discussed above, it is still needed to have the attemptID in the FinalizeMerge message -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] zhouyejoe commented on pull request #32007: [SPARK-33350][SHUFFLE] Add support to DiskBlockManager to create merge directory and to get the local shuffle merged data
zhouyejoe commented on pull request #32007: URL: https://github.com/apache/spark/pull/32007#issuecomment-823463844 @tgravescs @Ngone51 @attilapiros Can you help review this PR? Thanks. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] zhouyejoe commented on pull request #32007: [SPARK-33350][SHUFFLE] Add support to DiskBlockManager to create merge directory and to get the local shuffle merged data
zhouyejoe commented on pull request #32007: URL: https://github.com/apache/spark/pull/32007#issuecomment-820954486 Thanks for reviewing the PR. I am addressing the comments, will update the PR soon. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org