[ 
https://issues.apache.org/jira/browse/SPARK-54152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nishanth updated SPARK-54152:
-----------------------------
    Description: 
Currently, log messages in *{{FileScanRDD.compute()}}* do not include *Spark 
Task Context details* such as *Task ID* or *Partition ID*.

When analyzing executor logs for file-based queries, it is *difficult to 
correlate file read operations with the specific tasks* that performed them, 
especially when multiple concurrent file scans occur during shuffle or scan 
stages.
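The proposed enrichment can be sketched as follows. This is a minimal, self-contained illustration, not the actual patch: `TaskCtxStub` is a hypothetical stand-in for Spark's `TaskContext`, whose `taskAttemptId()` and `partitionId()` accessors would supply the real IDs inside a running executor task.

```scala
// Sketch of the proposed log enrichment. In Spark itself the IDs would
// come from TaskContext.get(); TaskCtxStub is a hypothetical stand-in so
// the formatting logic can run outside an executor.
case class TaskCtxStub(taskAttemptId: Long, partitionId: Int)

// Build the log line with a task-context prefix, so a file read can be
// traced back to the task and partition that performed it.
def fileScanLogLine(ctx: TaskCtxStub, path: String): String =
  s"[Task ${ctx.taskAttemptId}, Partition ${ctx.partitionId}] Reading file path: $path"

val line = fileScanLogLine(TaskCtxStub(taskAttemptId = 42L, partitionId = 3),
  "hdfs://data/part-00003.parquet")
println(line)
// [Task 42, Partition 3] Reading file path: hdfs://data/part-00003.parquet
```

With such a prefix, grepping executor logs for a slow task's ID immediately surfaces every file it read.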

  was:
Currently, log messages in FileScanRDD.compute() and PythonRunner do not 
include Spark task context information (e.g., Task ID, Partition ID).

This change adds task context details to file reading and Python UDF execution 
logs to improve traceability and debugging when UDFs hang or perform slowly.


> Spark | `Add Task Context Information to FileScanRDD.compute() Logs`
> --------------------------------------------------------------------
>
>                 Key: SPARK-54152
>                 URL: https://issues.apache.org/jira/browse/SPARK-54152
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 3.5.1, 3.5.2, 4.0.0, 4.0.1
>            Reporter: Nishanth
>            Priority: Major
>
> Currently, log messages in *{{FileScanRDD.compute()}}* do not include *Spark 
> Task Context details* such as *Task ID* or *Partition ID*.
> When analyzing executor logs for file-based queries, it is *difficult to 
> correlate file read operations with the specific tasks* that performed 
> them, especially when multiple concurrent file scans occur during shuffle or 
> scan stages.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
