jorgecarleitao commented on a change in pull request #8079:
URL: https://github.com/apache/arrow/pull/8079#discussion_r479896964



##########
File path: rust/datafusion/src/execution/context.rs
##########
@@ -255,26 +253,17 @@ impl ExecutionContext {
 
     /// Register a table using a custom TableProvider so that it can be 
referenced from SQL
     /// statements executed against this context.
-    pub fn register_table(
-        &mut self,
-        name: &str,
-        provider: Box<dyn TableProvider + Send + Sync>,
-    ) {
-        let mut ctx_state = self.state.lock().expect("failed to lock mutex");
-        ctx_state.datasources.insert(name.to_string(), provider);
+    pub fn register_table(&mut self, name: &str, provider: Box<dyn 
TableProvider>) {

Review comment:
       TL;DR: let's ship this, as master is already not thread safe.
   
   The two use-cases I though at the time:
   
   1. when a project is planning a complex number of plans that depend on a 
common set of sources and UDFs, it would be nice to be able to multi-thread the 
planning. This is particularly important when planning requires reading remote 
metadata to formulate themselves (e.g. when the source is in `s3` with many 
partitions). Metadata reading is often slow and network bounded, which makes 
threads suitable for these workloads. If multi-threading is not possible, 
either each plan needs to read the metadata independently (one context per 
plan) or planning must be sequential (with lots of network waiting).
   
   2. when creating bindings to programming languages that support 
multi-threading, it would be nice for the `ExecutionContext` to be thread safe, 
so that we can more easily integrate with those languages.
   
   The reason for 2 is described in [`pyo3`'s 
documentation](https://pyo3.rs/master/class.html#defining-a-new-class):
   
   > Because Python objects are freely shared between threads by the Python 
interpreter, all structs annotated with #[pyclass] must implement Send
   
   Thread safety is not a strict requirement, though `pyo3` offers the option 
to `panic!` if the instance is used across threads. Not a great user 
experience, but definitely better than undefined behavior :)
   
   The code below works in DataFusion 1.0.0, and no longer works in master:
   
   ```
   #[pyclass]
   struct ExecutionContext {
       ctx: datafusion::execution::context::ExecutionContext,
   }
   ```
   
   > `(dyn datafusion::execution::physical_plan::PhysicalPlanner + 'static)` 
cannot be shared between threads safely
   
   Since this is already an issue on master, I think that we just ship this, 
that is a great simplification.
   
   However, I think that we would enjoy some clarity of our stance wrt to 
thread safety of the `ExecutionContext`.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to