jerqi commented on issue #124: URL: https://github.com/apache/incubator-uniffle/issues/124#issuecomment-1204830693
> > > Do you set `spark.rss.data.replica.read=2`?
> > >
> > > Yes.
> > > As long as the read client gets the metadata from 2 of the servers, it can check the integrity of the data from any one of them.
> > >
> > > But this step seems to execute before `readShuffleData`.

> The metadata is acquired in advance, but the data integrity check is executed only after all blocks have been fetched. In the current implementation, the client reads only from “the first available” server to avoid extra read cost. But when the data on this first server is damaged, the final check reports "read inconsistent". This implementation seems a little unreasonable to me. Should we read from the next shuffle server when the data is incomplete?
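The difference between the current behavior and the proposed fallback can be sketched as a small Python model. This is not Uniffle's client code. `ReplicaServer`, `read_first_available`, `read_with_fallback` and `ReadInconsistentError` are hypothetical names. Damage is modeled simply as missing block IDs, and `expected_ids` stands in for the block metadata that the client acquires in advance from the replicas.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Set


class ReadInconsistentError(Exception):
    """Hypothetical error: no replica returned every expected block."""


@dataclass
class ReplicaServer:
    """Hypothetical in-memory stand-in for one shuffle server replica."""
    name: str
    blocks: Dict[int, bytes] = field(default_factory=dict)
    available: bool = True

    def read_blocks(self) -> Dict[int, bytes]:
        # Simulate an unreachable server with ConnectionError.
        if not self.available:
            raise ConnectionError(f"{self.name} is unavailable")
        return dict(self.blocks)


def read_first_available(servers: Iterable[ReplicaServer],
                         expected_ids: Set[int]) -> Dict[int, bytes]:
    """Model of the behavior described above: read only from the first
    reachable server, then check integrity once all blocks are fetched."""
    for server in servers:
        try:
            fetched = server.read_blocks()
        except ConnectionError:
            continue  # Unreachable server: try the next one.
        if not expected_ids.issubset(fetched):
            # Fails even if another replica still holds intact data.
            raise ReadInconsistentError(f"{server.name}: read inconsistent")
        return {bid: fetched[bid] for bid in expected_ids}
    raise ReadInconsistentError("no shuffle server available")


def read_with_fallback(servers: Iterable[ReplicaServer],
                       expected_ids: Set[int]) -> Dict[int, bytes]:
    """Model of the proposal: if a replica is incomplete, read the next one."""
    incomplete = []
    for server in servers:
        try:
            fetched = server.read_blocks()
        except ConnectionError:
            continue
        if expected_ids.issubset(fetched):
            return {bid: fetched[bid] for bid in expected_ids}
        incomplete.append(server.name)  # Remember it and fall back.
    raise ReadInconsistentError(
        f"all replicas incomplete or unavailable: {incomplete}")
```

In this model the fallback only costs an extra read when the first replica fails the check, so the common path still reads from a single server.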
