ableegoldman opened a new issue, #16836:
URL: https://github.com/apache/datafusion/issues/16836

   ### Describe the bug
   
   Hi, I've recently started using DataFusion and have run into an issue trying 
to copy some query results into a local cache implemented with a `MemTable`. 
Here is the code:
   
   ```
   // execute the query and get the DataFrame
   let df = self
       .ctx
       .execute_logical_plan(plan.clone())
       .await
       .map_err(anyhow::Error::msg)?;
   
   // print the total number of rows
   info!("Number of DB results {}", df.clone().count().await?);
   
   // insert into the local table
   let plan = cached_table
       .provider
       .insert_into(
           &self.ctx.state(),
           df.clone().create_physical_plan().await?,
           InsertOp::Append,
       )
       .await?;
   let task_ctx = self.ctx.task_ctx();
   let mut stream = plan.execute(0, task_ctx)?;
   
   // print the number of rows inserted, as reported in the "count" column
   while let Some(batch) = stream.try_next().await? {
       let rows = batch.column_by_name("count");
       info!("Inserted {:?} rows into cache table {}", rows, id);
   }
   ```
   
   There are 731 result rows, but every time I run this only 87 rows are 
inserted into the MemTable/cache. I've confirmed that 87 is the actual number of 
rows inserted, because some later code scans this cache and indeed finds only 87 
rows. The number 87 is consistent for me with the 731 original rows, but it 
varies slightly depending on how many total rows there are. For example, my 
teammate had 751 rows in his backing db and saw it repeatedly insert only 90 
rows instead of 87. 
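   
   To make the pattern in these two observations easier to see, here is a 
minimal sketch in plain Rust (no DataFusion involved) that computes the share 
of rows that ended up in the cache. The helper name `inserted_fraction` is made 
up for illustration; the only inputs are the row counts reported above.
   
   ```rust
   /// Fraction of the source rows that ended up in the cache.
   /// (Hypothetical helper, not part of DataFusion.)
   fn inserted_fraction(total_rows: u32, inserted_rows: u32) -> f64 {
       f64::from(inserted_rows) / f64::from(total_rows)
   }
   
   fn main() {
       // (total rows in the backing db, rows inserted into the cache)
       let observations = [(731_u32, 87_u32), (751, 90)];
       for (total, inserted) in observations {
           let fraction = inserted_fraction(total, inserted);
           println!(
               "{inserted} of {total} rows inserted: {:.1}% (about 1 in {:.1})",
               fraction * 100.0,
               1.0 / fraction
           );
       }
   }
   ```
   
   In both cases roughly 12% of the rows arrive, so the shortfall looks 
proportional to the input size rather than a fixed cutoff. That is only an 
observation about the numbers, not a diagnosis.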
   
   Any idea why only a small subset of these rows is being inserted? There is 
only 1 partition, by the way (according to 
`plan.output_partitioning().partition_count()`).
   
   ### To Reproduce
   
   _No response_
   
   ### Expected behavior
   
   _No response_
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

