yanghua commented on pull request #2410:
URL: https://github.com/apache/hudi/pull/2410#issuecomment-755391853


   > @yanghua Did mull this a lot, along similar lines. I was thinking about Engine as a general construct that provides parallel execution, rather than being tied to the client/writing. For example, we can use parallelized listing even on the InputFormat implementations.
   > 
   > The core issue is that we want to parallelize `FSUtils.getAllPartitionPaths()` (and the underlying call to `HoodieTableMetadata#getAllPartitionPaths()`). If you can think of a better way, please let us know.
   
   I may not have a better way, but I personally prefer to keep common a little 
simpler, so that it does not blend with the engine. The engine need not be tied 
to client/writing, but could it become a standalone module, for example between 
common and client: `common <- engine <- client`? The operations to be 
parallelized would then live in the engine module. I don't know whether this is feasible.
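
   To make the `common <- engine <- client` idea concrete, here is a rough sketch of what a standalone engine module could expose. All names here (`ParallelEngine`, `LocalParallelEngine`, `PartitionLister`) are hypothetical and are not existing Hudi classes. The listing function is passed in as a plain `Function`, so nothing in the sketch depends on the real `FSUtils` or `HoodieTableMetadata` signatures.

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical interface that would live in a standalone "engine" module.
// It only knows how to run a function over inputs in parallel.
interface ParallelEngine {
    <I, O> List<O> map(List<I> inputs, Function<I, O> fn, int parallelism);
}

// Hypothetical local implementation backed by Java parallel streams.
// A Spark- or Flink-backed implementation would sit in its own client module.
final class LocalParallelEngine implements ParallelEngine {
    @Override
    public <I, O> List<O> map(List<I> inputs, Function<I, O> fn, int parallelism) {
        // The parallelism hint is ignored in this sketch; the common
        // fork-join pool decides the actual degree of parallelism.
        return inputs.parallelStream().map(fn).collect(Collectors.toList());
    }
}

// Hypothetical helper in the engine module: the operation to be parallelized
// lives here rather than in common.
final class PartitionLister {
    static List<String> listPartitions(ParallelEngine engine,
                                       List<String> parentPaths,
                                       Function<String, List<String>> listChildren) {
        // List the children of each parent path in parallel, then flatten.
        return engine.map(parentPaths, listChildren, parentPaths.size())
                .stream()
                .flatMap(List::stream)
                .collect(Collectors.toList());
    }
}
```

   With this layout, the InputFormat side could pass in a `LocalParallelEngine` while the write client passes an engine-specific implementation, and common would not need to know about either.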



