mingmwang commented on a change in pull request #1924:
URL: https://github.com/apache/arrow-datafusion/pull/1924#discussion_r820349045
##########
File path: datafusion/src/execution/runtime_env.rs
##########
@@ -25,47 +25,110 @@ use crate::{
memory_manager::{MemoryConsumerId, MemoryManager, MemoryManagerConfig},
},
};
-
+use std::fmt;
use std::fmt::{Debug, Formatter};
+
+use crate::datasource::object_store::{ObjectStore, ObjectStoreRegistry};
+use crate::execution::context::{
+ SessionContextRegistry, TaskContext, TaskContextRegistry, BATCH_SIZE,
+ PARQUET_PRUNING, REPARTITION_AGGREGATIONS, REPARTITION_JOINS,
REPARTITION_WINDOWS,
+ TARGET_PARTITIONS,
+};
+use crate::prelude::{SessionConfig, SessionContext};
+use datafusion_common::DataFusionError;
+use once_cell::sync::OnceCell;
+use parking_lot::Mutex;
+use std::path::PathBuf;
use std::sync::Arc;
-#[derive(Clone)]
-/// Execution runtime environment. This structure is passed to the
-/// physical plans when they are run.
+/// Global singleton RuntimeEnv
+pub static RUNTIME_ENV: OnceCell<Arc<RuntimeEnv>> = OnceCell::new();
Review comment:
> I think a single static runtime environment makes sense for Ballista
but not for DataFusion (which gets used in a variety of usecases that a single
runtime might not be applicable for)
Hi, Alamb
Thanks a lot for taking a look. One reason that I have to use the global
static runtime is make the ExecutionPlans session/configuration aware.
````
pub trait ExecutionPlan: Debug + Send + Sync {
...................
/// Return the Session id associated with the execution plan.
fn session_id(&self) -> String;
/// Return the Session configuration associated with the execution plan.
fn session_config(&self) -> SessionConfig {
let session_id = self.session_id();
let runtime = RuntimeEnv::global();
runtime.lookup_session_config(session_id.as_str())
}
impl RuntimeEnv {
...................
/// Retrieves a copied version of `SessionConfig` by session_id
pub fn lookup_session_config(&self, session_id: &str) -> SessionConfig {
if self.is_local() {
// It is possible that in a local env such as in unit tests
there is no
// SessionContext created, in this case we have to return a
default SessionConfig.
self.lookup_config(session_id)
.map_or(SessionConfig::new(), |c| c.lock().clone())
} else if self.is_scheduler() {
self.lookup_session(session_id)
.expect("SessionContext doesn't exist")
.copied_config()
} else {
self.config_from_task_context(session_id)
}
}
````
I think even we just use DataFusion as a lib, it is still a valid
requirement to make the ExecutionPlan configuration aware. We might add more
and more configuration options to enable/disable some feature during plan time
and execution runtime(System like SparkSQL/Presto they have more than hundreds
of configuration options). We need an extendable way to achieve this. With the
single static runtime environment, I could make the ExecutionPlan
configuration aware and runtime aware(it knows it runs in DataFusion local as a
lib, or in the Ballista executor/scheduler) without pass-though anything.
Some of the systems might leverage the ThreadLocal facility in the program
language to get the current context/configuration to avoid pass-through
configurations. But In Rust we can not leverage ThreadLocal, they are so many
async parts in the code paths.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]