kuczoram commented on pull request #2231:
URL: https://github.com/apache/hive/pull/2231#issuecomment-829104540


   Hi Krisztian!
   
   Thanks for this patch; it is very interesting.
   I have one question about using multiple reducers. Do you know how it is
guaranteed that all rows with the same bucket go to the same reducer and, in
the end, into the same file?
   
   I am asking because during compaction we had issues where rows with the
same bucket number went to different reducers, and we ended up with corrupted
files (rows with the same bucket number were spread across different files, or
a single file contained rows with different bucket numbers). I saw this issue
when I created an unbucketed table but inserted a larger amount of data, so
that in the end the table contained multiple bucket files.
   
   I know that compaction is a different story; I am just curious whether
something similar could happen with deletes/updates using multiple reducers.
If you know how rows are distributed between the reducers for deletes and
updates, I would be really grateful if you could share some details.
   
   Thanks and regards,
   Marta


