martin-g commented on code in PR #19316:
URL: https://github.com/apache/datafusion/pull/19316#discussion_r2619201926
##########
datafusion/physical-plan/src/analyze.rs:
##########
@@ -194,26 +228,49 @@ impl ExecutionPlan for AnalyzeExec {
// JoinSet that computes the overall row count and final
// record batch
let mut input_stream = builder.build();
+
+ let inner_schema = Arc::clone(&self.input.schema());
+ let auto_explain = self.auto_explain;
+ let auto_explain_output = self.auto_explain_output.clone();
+ let auto_explain_min_duration = self.auto_explain_min_duration;
+
let output = async move {
+ let mut batches = vec![];
let mut total_rows = 0;
while let Some(batch) = input_stream.next().await.transpose()? {
total_rows += batch.num_rows();
+ batches.push(batch);
Review Comment:
This introduces batching even for the non-auto-explain mode! It will add a
lot of memory pressure for all users!
##########
datafusion/physical-plan/src/analyze.rs:
##########
@@ -271,19 +328,41 @@ fn create_output_batch(
.map_err(DataFusionError::from)
}
+fn export_auto_explain(batch: RecordBatch, output: &str) -> Result<()> {
+ let fd: &mut dyn Write = match output {
+ "stdout" => &mut io::stdout(),
+ "stderr" => &mut io::stderr(),
+ _ => &mut OpenOptions::new().create(true).append(true).open(output)?,
Review Comment:
Does this need any kind of validation of the file location ?
Or it is left to the developer/admin to make sure it is a safe place ?
##########
datafusion/physical-plan/src/analyze.rs:
##########
Review Comment:
This method does not preserve the new auto_explain settings. Is this
intended ?
##########
datafusion/common/src/config.rs:
##########
@@ -1117,6 +1117,19 @@ config_namespace! {
/// "summary" shows common metrics for high-level insights.
/// "dev" provides deep operator-level introspection for developers.
pub analyze_level: ExplainAnalyzeLevel, default =
ExplainAnalyzeLevel::Dev
+
+ /// Whether to enable the `auto_explain` mode. In this mode, the
execution plan is
+ /// automatically written to the location set by
`auto_explain_output`, if the plan's total
+ /// duration is greater or equal to `auto_explain_min_duration`, in
milliseconds.
+ pub auto_explain: bool, default = false
+
+ /// Output location used in the `auto_explain` mode. Supports
`stdout`, `stderr`, or a file
+ /// path (file is created if it does not exist; plans are appended to
the file).
+ pub auto_explain_output: String, default = "stdout".to_owned()
+
+ /// In the `auto_explain` mode, only output plans if their duration is
bigger than this
Review Comment:
```suggestion
/// In the `auto_explain` mode, only output plans if their duration
is bigger than or equal to this
```
##########
datafusion/physical-plan/src/analyze.rs:
##########
@@ -271,19 +328,41 @@ fn create_output_batch(
.map_err(DataFusionError::from)
}
+fn export_auto_explain(batch: RecordBatch, output: &str) -> Result<()> {
+ let fd: &mut dyn Write = match output {
+ "stdout" => &mut io::stdout(),
+ "stderr" => &mut io::stderr(),
+ _ => &mut OpenOptions::new().create(true).append(true).open(output)?,
Review Comment:
Does this need some kind of synchronisation when a file path is used for the
output ? Two or more DF sessions using the same config may try to write to the
same file simultaneously.
##########
datafusion/sqllogictest/test_files/explain_analyze.slt:
##########
@@ -25,3 +25,23 @@ Plan with Metrics LazyMemoryExec: partitions=1,
batch_generators=[generate_serie
statement ok
reset datafusion.explain.analyze_level;
+
+
+# test auto_explain
+
+statement ok
+set datafusion.explain.auto_explain_output =
'test_files/scratch/auto_explain.txt';
Review Comment:
Does something assert the contents of this output file ?
Does something remove this file at the end ?
##########
datafusion/physical-plan/src/analyze.rs:
##########
@@ -101,6 +121,20 @@ impl AnalyzeExec {
input.boundedness(),
)
}
+
+ pub fn enable_auto_explain(&mut self) {
+ self.auto_explain = true;
+ self.cache =
+ Self::compute_properties(&self.input,
Arc::clone(&self.input.schema()));
+ }
+
+ pub fn set_auto_explain_output(&mut self, value: String) {
+ self.auto_explain_output = value
+ }
+
+ pub fn set_auto_explain_min_duration(&mut self, value: usize) {
Review Comment:
nit: this helps to realize what units are used here. Otherwise the reader
will need to search around to find this out.
Or even better - change the type of auto_explain_min_duration at
https://github.com/apache/datafusion/pull/19316/changes#diff-0cd0842fd045a959ba4ee886c0410353ee6af047478bddb5019b1e45a9fc9774R71
to Duration
```suggestion
pub fn set_auto_explain_min_duration_ms(&mut self, value: usize) {
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]