Re: [I] Missing data when inserting into MemTable [datafusion]

via GitHub Thu, 24 Jul 2025 08:17:16 -0700


cadonna commented on issue #16836:
URL: https://github.com/apache/datafusion/issues/16836#issuecomment-3113873969


   We do not have a reproducer yet. However, I would like to share some more 
data points with you. We suspect the issue is related to partitioning for two 
reasons:
   1. If we set the target partitions to 1, the issue does not happen.
   2. In the following call to `execute_input_stream()`, the partition of the 
input stream is hard-coded to 0. To my beginner's eyes, it looks like only 
partition 0 of the input stream is read and inserted into the sink table. Could 
that be?
   
https://github.com/apache/datafusion/blob/dbc03fa4f6d47c8f3b97f3a3d979945b2b7ccce7/datafusion/datasource/src/sink.rs#L227
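   
   To make the suspicion concrete, here is a toy model in plain Rust. It is not 
DataFusion code: every name in it is hypothetical, and it only illustrates why 
reading a single partition would lose rows when there is more than one 
partition, while behaving correctly with a single partition.

```rust
// Toy model only: a "stream" is a list of partitions, each holding rows.
// None of these names are DataFusion APIs; they are made up for illustration.
type Partition = Vec<i32>;

/// Splits rows round-robin into `target_partitions` partitions.
fn repartition(rows: &[i32], target_partitions: usize) -> Vec<Partition> {
    assert!(target_partitions > 0, "need at least one partition");
    let mut parts: Vec<Partition> = vec![Vec::new(); target_partitions];
    for (i, row) in rows.iter().enumerate() {
        parts[i % target_partitions].push(*row);
    }
    parts
}

/// Mimics the suspected behaviour: only partition 0 is consumed.
fn insert_partition_zero_only(input: &[Partition]) -> Vec<i32> {
    input.first().cloned().unwrap_or_default()
}

/// Consumes every partition, so no rows are dropped.
fn insert_all_partitions(input: &[Partition]) -> Vec<i32> {
    input.iter().flatten().copied().collect()
}
```

   In this model, with one partition both functions return all rows, which 
would match data point 1. With several partitions, `insert_partition_zero_only` 
returns only the rows that landed in partition 0. Whether the real sink behaves 
this way is exactly the open question.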
   
   The workaround we found might also be useful information: instead of 
calling `provider.insert_into()`, we use 
`df.clone().write_table(sink_table_name, ...)`.
   
   I hope this is helpful!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

