jerqi commented on issue #124:
URL: https://github.com/apache/incubator-uniffle/issues/124#issuecomment-1204830693

   > > > Do you set `spark.rss.data.replica.read=2`
   > > 
   > > 
   > > Yes
   > > > As long as the read client gets the metadata from 2 of the servers, it can check the integrity of the data from any one of them.
   > > 
   > > 
   > > But this step seems to execute before `readShuffleData`
   > 
   > The metadata is acquired in advance, but the data integrity check is executed
   > when all blocks have been fetched. In the current implementation, the client
   > only fetches from "the first available" server to avoid the read cost. But when
   > the data on this first server is damaged, the final check reports "read
   > inconsistent".
   
   This implementation seems a little unreasonable to me. Should we read from the
   next shuffle server when the data isn't complete?
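
To make the question concrete, here is a rough sketch of the fallback idea. It is not the actual client code: `Replica`, `read_with_fallback`, and the in-memory dictionaries are hypothetical stand-ins for shuffle servers and the expected block ids taken from the metadata. The reader only moves on to the next replica for blocks that are still missing, and reports "read inconsistent" only after every replica has been tried.

```python
from typing import Dict, List, Set

# Hypothetical stand-in for one shuffle server replica: block id -> payload.
Replica = Dict[int, bytes]


def read_with_fallback(replicas: List[Replica],
                       expected_block_ids: Set[int]) -> Dict[int, bytes]:
    """Collect all expected blocks, trying replicas in order.

    A later replica is consulted only for blocks that earlier
    replicas could not provide.
    """
    fetched: Dict[int, bytes] = {}
    for replica in replicas:
        missing = expected_block_ids - set(fetched)
        if not missing:
            break  # everything is already here, skip the remaining replicas
        for block_id in missing:
            if block_id in replica:
                fetched[block_id] = replica[block_id]

    # Final integrity check, run only after all replicas were tried.
    missing = expected_block_ids - set(fetched)
    if missing:
        raise RuntimeError(f"read inconsistent, missing blocks: {sorted(missing)}")
    return fetched


# The first server lost block 2, the second server still has it.
damaged = {1: b"a"}
healthy = {1: b"a", 2: b"b"}
blocks = read_with_fallback([damaged, healthy], {1, 2})
```

In the common case the first server has everything, so the read cost stays the same as today. The extra reads only happen when the first server's data is incomplete.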


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
