GitHub user debajyoti-truefoundry added a comment to the discussion: DataSourceExec metrics explanation
Thanks for your response. I have a follow-up question about `time_scanning_total`:

> /// Sum of time between when the [`FileStream`] requests data from
> /// the stream and when a [`RecordBatch`] is produced for all
> /// record batches in the stream. Note that this metric also
> /// includes the time of the parent operator's execution.

The doc comment above says that this metric includes the time of the parent operator's execution.

```
ProjectionExec: expr=[...], metrics=[output_rows=10, elapsed_compute=3.06µs]
  SortPreservingMergeExec: [Timestamp@14 ASC NULLS LAST], fetch=10, metrics=[output_rows=10, elapsed_compute=94.06µs]
    SortExec: TopK(fetch=10), expr=[Timestamp@14 ASC NULLS LAST], preserve_partitioning=[true], metrics=[output_rows=80, elapsed_compute=12.08824ms, row_replacements=524]
      ProjectionExec: expr=[...], metrics=[output_rows=492435, elapsed_compute=1.642723ms]
        AggregateExec: mode=FinalPartitioned, gby=[SpanId@0 as SpanId], aggr=[...], metrics=[output_rows=492435, elapsed_compute=3.782239042s, spill_count=0, spilled_bytes=0, spilled_rows=0, peak_mem_used=1391660979]
          CoalesceBatchesExec: target_batch_size=20000, metrics=[output_rows=492435, elapsed_compute=833.513µs]
            RepartitionExec: partitioning=Hash([SpanId@0], 8), input_partitions=8, metrics=[fetch_time=3.348780759s, repartition_time=74.558637ms, send_time=191.866672ms]
              AggregateExec: mode=Partial, gby=[SpanId@2 as SpanId], aggr=[...], metrics=[output_rows=492435, elapsed_compute=2.739883773s, spill_count=0, spilled_bytes=0, spilled_rows=0, skipped_aggregation_rows=178127, peak_mem_used=905512155]
                CoalesceBatchesExec: target_batch_size=20000, metrics=[output_rows=492435, elapsed_compute=65.121µs]
                  FilterExec: TraceId@2 = fb09d0c9b49136bb161464b3e32c5083 AND ParentSpanId@4 = 662b7388122dfb79 AND Timestamp@0 >= 1744702200000000 AND TsBucketStart@1 >= 1745712000, projection=[...], metrics=[output_rows=492435, elapsed_compute=3.906094ms]
                    DeltaScan, metrics=[files_pruned=96, files_scanned=205]
                      DataSourceExec: file_groups={8 groups: [[...]]}, projection=[...], file_type=parquet, predicate=..., required_guarantees=[...], metrics=[output_rows=492435, elapsed_compute=8ns, bytes_scanned=369875275, file_open_errors=0, file_scan_errors=0, num_predicate_creation_errors=0, page_index_rows_matched=5863737, page_index_rows_pruned=0, predicate_evaluation_errors=0, pushdown_rows_matched=12712344, pushdown_rows_pruned=5371302, row_groups_matched_bloom_filter=10, row_groups_matched_statistics=205, row_groups_pruned_bloom_filter=195, row_groups_pruned_statistics=0, bloom_filter_eval_time=8.481976ms, metadata_load_time=60.123782ms, page_index_eval_time=306.984µs, row_pushdown_eval_time=28.323855ms, statistics_eval_time=4.302826ms, time_elapsed_opening=7.653019ms, time_elapsed_processing=582.67032ms, time_elapsed_scanning_total=3.243596916s, time_elapsed_scanning_until_data=398.202399ms]
```

For this query plan, what is the parent operator? Is it one of the operators in the plan above? Or, in this case, is the parent operator the Parquet record batch reader itself?

GitHub link: https://github.com/apache/datafusion/discussions/16572#discussioncomment-13607637
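
To make the question concrete, here is a toy model of one possible reading of the quoted doc comment: a timer that is stopped when a batch is produced and restarted right away, so it keeps running while the caller works on that batch. This is only a sketch under that assumption, not DataFusion's implementation; `FakeClock`, `ToyScan`, and `run` are hypothetical names, and the costs are made-up numbers.

```python
class FakeClock:
    """Manually advanced clock so the example is deterministic."""

    def __init__(self):
        self.now = 0.0

    def advance(self, seconds):
        self.now += seconds


class ToyScan:
    """Toy pull-based scan (hypothetical; not DataFusion's FileStream)."""

    def __init__(self, clock, batches, read_cost):
        self.clock = clock
        self.batches = list(batches)
        self.read_cost = read_cost
        self.scanning_total = 0.0      # timer left running between batches
        self.processing = 0.0          # only time spent inside next_batch()
        self._timer_start = clock.now  # timer starts when the scan is created

    def next_batch(self):
        entered = self.clock.now
        if not self.batches:
            # End of stream: account for the time since the last batch.
            self.scanning_total += self.clock.now - self._timer_start
            return None
        self.clock.advance(self.read_cost)  # simulated read/decode work
        batch = self.batches.pop(0)
        self.processing += self.clock.now - entered
        # Stop the timer at the batch, then restart it immediately, so
        # whatever the caller does before the next call is also counted.
        self.scanning_total += self.clock.now - self._timer_start
        self._timer_start = self.clock.now
        return batch


def run(parent_cost):
    """Pull three batches; the caller spends parent_cost on each one."""
    clock = FakeClock()
    scan = ToyScan(clock, batches=[1, 2, 3], read_cost=1.0)
    while (batch := scan.next_batch()) is not None:
        clock.advance(parent_cost)  # simulated work in the consumer
    return scan.processing, scan.scanning_total


print(run(0.0))  # consumer does no work: both timers agree
print(run(2.0))  # consumer work inflates only scanning_total
```

In this toy model, the "parent" is simply whatever code pulls the next batch from the scan. Whether that corresponds to an operator in the plan above or to something inside the scan itself is exactly what the question asks.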