[GitHub] [arrow-datafusion] alamb commented on issue #6983: [DataFrame] Parallel Load into dataframe

via GitHub Mon, 17 Jul 2023 07:20:17 -0700


alamb commented on issue #6983:
URL: https://github.com/apache/arrow-datafusion/issues/6983#issuecomment-1638250822


   I made a POC on https://github.com/apache/arrow-datafusion/pull/6984 which 
demonstrates that the issue is indeed a matter of using more cores to do the 
write. However, the way the POC does the repartitioning is probably not right. 
I think the better approach would be to set the target partitions when writing 
into the memory table.
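   
   To illustrate what I mean by writing with a target partition count, here is 
a rough std-only sketch. It does not use any DataFusion API: `MemTableModel`, 
`Batch`, `write_parallel` and `row_count` are hypothetical names invented for 
this example, and the round-robin distribution is an assumption, not what the 
POC or the existing insert machinery does.

```rust
use std::sync::{Arc, RwLock};
use std::thread;

// Stand-in for a record batch: just a vector of row values (assumption).
type Batch = Vec<i64>;

// Hypothetical model of an in-memory table: one lock-protected
// list of batches per partition.
struct MemTableModel {
    partitions: Vec<Arc<RwLock<Vec<Batch>>>>,
}

impl MemTableModel {
    // Create a table with `target_partitions` empty partitions.
    fn new(target_partitions: usize) -> Self {
        assert!(target_partitions > 0, "need at least one partition");
        let partitions = (0..target_partitions)
            .map(|_| Arc::new(RwLock::new(Vec::new())))
            .collect();
        Self { partitions }
    }

    // Deal the incoming batches round-robin into one bucket per partition,
    // then append each bucket to its partition on a separate thread.
    fn write_parallel(&self, batches: Vec<Batch>) {
        let n = self.partitions.len();
        let mut buckets: Vec<Vec<Batch>> = vec![Vec::new(); n];
        for (i, batch) in batches.into_iter().enumerate() {
            buckets[i % n].push(batch);
        }

        let handles: Vec<_> = buckets
            .into_iter()
            .zip(self.partitions.iter().cloned())
            .map(|(bucket, partition)| {
                thread::spawn(move || {
                    // Each thread takes the write lock of its own partition only.
                    partition.write().unwrap().extend(bucket);
                })
            })
            .collect();

        // Wait for every partition writer to finish.
        for handle in handles {
            handle.join().unwrap();
        }
    }

    // Total number of rows across all partitions.
    fn row_count(&self) -> usize {
        self.partitions
            .iter()
            .map(|p| p.read().unwrap().iter().map(|b| b.len()).sum::<usize>())
            .sum()
    }
}
```

   With `MemTableModel::new(4)` and ten batches, the writes are spread over four 
threads, one per partition, instead of a single writer handling everything.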
   
   Perhaps this could be done by creating a `LogicalPlan::DmlStatement` for 
the write and then letting the existing insert machinery do the work, rather 
than doing a custom "collect". 
   
   
https://docs.rs/datafusion/latest/datafusion/logical_expr/logical_plan/struct.DmlStatement.html
   
   I am marking this as a good first issue because I think the approach will 
work well and should be able to follow existing patterns. It was also asked 
for by a customer.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

