[ https://issues.apache.org/jira/browse/SPARK-48517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Prabhu Joseph resolved SPARK-48517.
-----------------------------------
    Resolution: Invalid

Driver logs have correctly captured the error stream. Apologies for the spam.

> PythonWorkerFactory does not print error stream in case the daemon fails
> before the main daemon.py#main()
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-48517
>                 URL: https://issues.apache.org/jira/browse/SPARK-48517
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 3.3.2
>            Reporter: Prabhu Joseph
>            Priority: Major
>
> PythonWorkerFactory does not print the error stream when the daemon fails
> before executing the main daemon.py#main(). It throws a SparkException like
> the one below, but does not print the error stream, which contains the
> reason why pyspark.daemon failed to start.
> {code:java}
> org.apache.spark.SparkException:
> 2024-05-07T16:04:53.169524256Z stderr F Bad data in pyspark.daemon's standard
> output. Invalid port number:
> 2024-05-07T16:04:53.169530703Z stderr F 1097887852 (0x4170706c){code}
> The error stream is being [read|#L303] only after the SparkException is
> thrown. It should also be captured when the exception is raised.
>
> *Simple Repro:*
> 1. Run a sample PySpark job with spark.python.daemon.module set to a wrong
> module, e.g. pyspark.wrongdaemon instead of the default pyspark.daemon.
>
> 2. The forked Python process fails with
> *"/opt/python/3.9.2/bin/python: No module named pyspark.wrongdaemon"*,
> but this error is not captured by PythonWorkerFactory.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
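A note on the garbled port value in the exception above: PythonWorkerFactory expects the daemon to write its listening port to stdout as a 4-byte big-endian integer, so when the daemon crashes before main(), whatever text lands on stdout first gets decoded as a bogus port. A minimal sketch of that decoding (the `b"Appl"` input is a hypothetical first-four-bytes of stray output, chosen because it decodes to exactly the value in the log):

```python
import struct

# The daemon is expected to write its port as a 4-byte big-endian int.
# If it dies early, the first four bytes of whatever text appears on
# stdout are decoded as the "port" instead.
raw = b"Appl"  # hypothetical stray output; 0x41 0x70 0x70 0x6c
port = struct.unpack(">i", raw)[0]
print(port, hex(port))  # 1097887852 0x4170706c, matching the log above
```

This is why the exception reports a nonsensical port number rather than the underlying error: the real failure text is on stderr, which the factory only drains after throwing.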