gobraves commented on issue #6983: URL: https://github.com/apache/arrow-datafusion/issues/6983#issuecomment-1660801140
Hi @alamb, I apologize for the delayed response.

Based on your tips, I executed the following commands in the CLI and also ran the code you provided to reproduce the issue. Executing the commands in the CLI was almost 8 times faster than running that code, which is consistent with my CPU core count.

Here are the commands I executed in the CLI:

create external table test stored as parquet location 'part-0.parquet';
create table t as select * from test;
explain create table t as select * from test;

In the logical_plan of the explain output, I observed `CreateMemoryTable` and `TableScan`. I therefore reviewed the code for `CreateMemoryTable` in datafusion-cli and the `.cache()` function, hoping to identify the differences. The target_partitions setting is passed in both cases, but I'm unsure why it is not used in `.cache()`.

From the commit mentioned in issue #6984, it seems that the problem is resolved by repartitioning. So the difference appears to be that one implementation uses `Partitioning` while the other does not. However, when browsing through the code myself, I couldn't find any relevant settings. If this is the case, could you please give me some hints as to which part of the code performs this operation?

I have one more question: do we need to create a new DmlStatement to address this issue?

> Perhaps this could be done by creating a LogicalPlan::DmlStatement for write and then letting the existing insert machinery work rather than doing a custom "collect".

I'm not entirely clear about this statement, and I believe that is because I haven't fully grasped the problem described above.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.