jorgecarleitao commented on a change in pull request #8079:
URL: https://github.com/apache/arrow/pull/8079#discussion_r479896964
##########
File path: rust/datafusion/src/execution/context.rs
##########
@@ -255,26 +253,17 @@ impl ExecutionContext {
/// Register a table using a custom TableProvider so that it can be
referenced from SQL
/// statements executed against this context.
- pub fn register_table(
- &mut self,
- name: &str,
- provider: Box<dyn TableProvider + Send + Sync>,
- ) {
- let mut ctx_state = self.state.lock().expect("failed to lock mutex");
- ctx_state.datasources.insert(name.to_string(), provider);
+ pub fn register_table(&mut self, name: &str, provider: Box<dyn
TableProvider>) {
Review comment:
TL;DR: let's ship this, as master is already not thread safe.
The two use-cases I though at the time:
1. when a project is planning a complex number of plans that depend on a
common set of sources and UDFs, it would be nice to be able to multi-thread the
planning. This is particularly important when planning requires reading remote
metadata to formulate themselves (e.g. when the source is in `s3` with many
partitions). Metadata reading is often slow and network bounded, which makes
threads suitable for these workloads. If multi-threading is not possible,
either each plan needs to read the metadata independently (one context per
plan) or planning must be sequential (with lots of network waiting).
2. when creating bindings to programming languages that support
multi-threading, it would be nice for the `ExecutionContext` to be thread safe,
so that we can more easily integrate with those languages.
The reason for 2 is described in [`pyo3`'s
documentation](https://pyo3.rs/master/class.html#defining-a-new-class):
> Because Python objects are freely shared between threads by the Python
interpreter, all structs annotated with #[pyclass] must implement Send
Thread safety is not a strict requirement, though `pyo3` offers the option
to `panic!` if the instance is used across threads. Not a great user
experience, but definitely better than undefined behavior :)
The code below works in DataFusion 1.0.0, and no longer works in master:
```
#[pyclass]
struct ExecutionContext {
ctx: datafusion::execution::context::ExecutionContext,
}
```
> `(dyn datafusion::execution::physical_plan::PhysicalPlanner + 'static)`
cannot be shared between threads safely
Since this is already an issue on master, I think that we just ship this,
that is a great simplification.
However, I think that we would enjoy some clarity of our stance wrt to
thread safety of the `ExecutionContext`.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]