[GitHub] [incubator-uniffle] zuston commented on issue #239: [Problem] RssUtils#transIndexDataToSegments should consider the length of the data file

GitBox Thu, 22 Sep 2022 20:24:29 -0700


zuston commented on issue #239:
URL: 
https://github.com/apache/incubator-uniffle/issues/239#issuecomment-1255762455


   > If the data is currently being flushed, we cannot guarantee that the 
number of blocks in the index is the same as the number of blocks in the data.
   
   Yes, you are right. Please refer to #204. However, that PR won't solve the 
problem you mentioned; it only makes the read fail fast and logs an exception 
for analysis.
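
   To make the "fail fast" idea concrete, here is a minimal sketch of a check 
that compares index entries against the data file length. It is not the code 
from #204; `IndexEntry`, `IndexCheck`, and `validate` are hypothetical names 
made up for illustration, and the index layout (block ID, offset, length) is 
an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical index entry for illustration; not the actual Uniffle class.
final class IndexEntry {
    final long blockId;
    final long offset; // start of the block inside the data file
    final int length;  // block length in bytes

    IndexEntry(long blockId, long offset, int length) {
        this.blockId = blockId;
        this.offset = offset;
        this.length = length;
    }
}

final class IndexCheck {
    // Throws as soon as an index entry points past the end of the data file,
    // which is what happens when the index was flushed ahead of the data.
    static void validate(List<IndexEntry> index, long dataFileLen) {
        for (IndexEntry e : index) {
            long end = e.offset + e.length;
            if (end > dataFileLen) {
                throw new IllegalStateException(
                    "Block " + e.blockId + " ends at " + end
                        + " but the data file has only " + dataFileLen + " bytes");
            }
        }
    }
}
```

   A check like this only surfaces the inconsistency early with a useful 
message; it does not recover the missing blocks.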
   
   Let's revisit this problem. As far as I know, the client reads in the 
order memory -> localfile -> HDFS, which means an incomplete read from one 
storage tier does not affect the result. 
   
   For example, when part of the in-memory shuffle data is being flushed to 
HDFS or is waiting in the flush queue, the read client still gets it from 
memory. Although the index file in HDFS is incomplete, that partial data has 
already been received from memory. So this is not a problem. 
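
   The reasoning above can be sketched roughly as follows. This is only an 
illustration, not the actual client code: `TieredReadSketch` and `readAll` 
are hypothetical names, each tier is modeled as a plain map from block ID to 
payload, and skipping already-seen block IDs is my assumption about how 
overlapping tiers are reconciled.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class TieredReadSketch {
    // Tiers are passed in read order, e.g. memory, then localfile, then HDFS.
    // A block is taken from the first tier that has it and skipped afterwards.
    static List<String> readAll(List<Map<Long, String>> tiersInReadOrder) {
        Set<Long> seenBlockIds = new HashSet<>();
        List<String> result = new ArrayList<>();
        for (Map<Long, String> tier : tiersInReadOrder) {
            for (Map.Entry<Long, String> block : tier.entrySet()) {
                // Set.add returns false if the block ID was already read.
                if (seenBlockIds.add(block.getKey())) {
                    result.add(block.getValue());
                }
            }
        }
        return result;
    }
}
```

   With memory holding blocks 1 and 2 and an incomplete HDFS index that lists 
only block 1, both blocks are still returned exactly once, because block 2 
was already taken from memory.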
   
   So the problem you mentioned confuses me. There should be no problem with 
the read design itself, though there may be some bugs.
   
   By the way, I have also encountered inconsistent block problems, but we are 
using the memory_localfile mode, and in our case they were caused by 
instability of the gRPC service; refer to #198.
   
   Feel free to discuss more. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
