[ https://issues.apache.org/jira/browse/SPARK-46639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Eren Avsarogullari updated SPARK-46639: --------------------------------------- Attachment: WindowExec SQLMetrics.png > Add WindowExec SQLMetrics > ------------------------- > > Key: SPARK-46639 > URL: https://issues.apache.org/jira/browse/SPARK-46639 > Project: Spark > Issue Type: Task > Components: SQL > Affects Versions: 4.0.0 > Reporter: Eren Avsarogullari > Priority: Major > Labels: pull-request-available > Attachments: WindowExec SQLMetrics.png > > > Currently, WindowExec Physical Operator has only spillSize SQLMetric. This > jira aims to add following SQLMetrics to provide more information from > WindowExec usage during query execution: > {code:java} > numOfOutputRows: Number of total output rows. > numOfPartitions: Number of processed input partitions. > numOfWindowPartitions: Number of generated window partitions. > spilledRows: Number of total spilled rows. > spillSizeOnDisk: Total spilled data size on disk.{code} > As an example use-case, WindowExec spilling behavior depends on multiple > factors and it can sometime cause {{SparkOutOfMemoryError}} instead of > spilling to disk so it is hard to track without SQL Metrics such as: > *1-* WindowExec creates ExternalAppendOnlyUnsafeRowArray (internal > ArrayBuffer) per task (a.k.a child RDD partition) > *2-* When ExternalAppendOnlyUnsafeRowArray size exceeds > spark.sql.windowExec.buffer.in.memory.threshold=4096, > ExternalAppendOnlyUnsafeRowArray switches to UnsafeExternalSorter as > spillableArray by moving its all buffered rows into UnsafeExternalSorter and > ExternalAppendOnlyUnsafeRowArray (internal ArrayBuffer) is cleared. In this > case, WindowExec starts to write UnsafeExternalSorter' s buffer (a.k.a > UnsafeInMemorySorter). > *3-* UnsafeExternalSorter is being created per window partition. When > UnsafeExternalSorter' buffer size exceeds > spark.sql.windowExec.buffer.spill.threshold=Integer.MAX_VALUE, it starts to > write to disk and get cleared all buffer (a.k.a UnsafeInMemorySorter) > content. In this case, UnsafeExternalSorter will continue to buffer next > records until exceeding spark.sql.windowExec.buffer.spill.threshold. -- This message was sent by Atlassian Jira (v8.20.10#820010) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org