[jira] [Updated] (SPARK-23015) spark-submit fails when submitting several jobs in parallel

Hugh Zabriskie (JIRA) Tue, 09 Jan 2018 15:14:56 -0800

     [ 
https://issues.apache.org/jira/browse/SPARK-23015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Hugh Zabriskie updated SPARK-23015:
-----------------------------------
    Description: 
Spark Submit's launch script (bin/spark-class2.cmd) runs the launcher 
(org.apache.spark.launcher.Main), redirects the command it prints to a 
temporary text file, reads the result back into a variable, and then executes 
that command.

{code}
set LAUNCHER_OUTPUT=%temp%\spark-class-launcher-output-%RANDOM%.txt
"%RUNNER%" -Xmx128m -cp "%LAUNCH_CLASSPATH%" org.apache.spark.launcher.Main %* > %LAUNCHER_OUTPUT%
{code}
[bin/spark-class2.cmd, 
L67|https://github.com/apache/spark/blob/master/bin/spark-class2.cmd#L66]

The name of that temporary file includes the value of the %RANDOM% environment 
variable, which expands to a pseudo-random integer between 0 and 32767, so only 
32768 distinct file names are possible.
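
To get a feel for how quickly 32768 possible names run out, the following is a 
rough sketch in Python (not part of Spark) of the birthday-problem estimate. It 
assumes each job draws its number independently and uniformly, which is an 
assumption made here for illustration. It is probably optimistic: cmd.exe is 
commonly reported to seed %RANDOM% from the clock at coarse resolution, so jobs 
started at the same moment may collide far more often than this model suggests. 
The function name collision_probability is hypothetical.

```python
def collision_probability(jobs: int, values: int = 32768) -> float:
    """Chance that at least two of `jobs` independent, uniform draws
    from `values` possible numbers are equal (birthday problem)."""
    if jobs > values:
        # More jobs than names: a clash is certain (pigeonhole principle).
        return 1.0
    p_all_unique = 1.0
    for i in range(jobs):
        # The (i+1)-th job must avoid the i names already taken.
        p_all_unique *= (values - i) / values
    return 1.0 - p_all_unique


# Print the estimate for a few batch sizes of simultaneous spark-submit jobs.
for n in (2, 10, 50, 200):
    print(n, collision_probability(n))
```

Even under this idealized model the chance of a clash grows roughly with the 
square of the number of simultaneous jobs, so the name space is small for 
batch submissions.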

This appears to cause an error when several spark-submit jobs are launched 
simultaneously. The following error is written to stderr:

{quote}The process cannot access the file because it is being used by another 
process. The system cannot find the file
USER/AppData/Local/Temp/spark-class-launcher-output-RANDOM.txt.
The process cannot access the file because it is being used by another 
process.{quote}

My hypothesis is that %RANDOM% is returning the same value for multiple jobs, 
causing the launcher library to attempt to write to the same file from multiple 
processes. Another mechanism is needed for reliably generating the names of the 
temporary files so that the concurrency issue is resolved.
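
As an illustration of what "reliably generating the names" could mean, here is 
a minimal Python sketch of the general technique: let the operating system 
create the file exclusively, so a name that is already taken is never reused. 
This is not the Spark script and not a proposed patch; the batch script would 
need its own mechanism, and the helper name make_launcher_output_file is 
hypothetical.

```python
import os
import tempfile


def make_launcher_output_file() -> str:
    """Create a uniquely named, empty temp file and return its path.

    tempfile.mkstemp opens the file with exclusive-create semantics, so an
    existing name is never reused; on a clash it tries another name.
    """
    fd, path = tempfile.mkstemp(
        prefix="spark-class-launcher-output-", suffix=".txt"
    )
    os.close(fd)  # only the path is needed here; the caller writes to it later
    return path


# Demo: 100 files created back to back all receive distinct names.
paths = [make_launcher_output_file() for _ in range(100)]
print(len(set(paths)) == len(paths))
for p in paths:
    os.remove(p)  # clean up the demo files
```

The point of the design is that uniqueness is enforced at file-creation time 
rather than hoped for from a random number, so two processes cannot end up 
writing to the same file.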

  was:
Spark Submit's launching library prints the command to execute the launcher 
(org.apache.spark.launcher.main) to a temporary text file, reads the result 
back into a variable, and then executes that command.

[bin/spark-class2.cmd, 
L67|https://github.com/apache/spark/blob/master/bin/spark-class2.cmd#L66]

That temporary text file is given a pseudo-random name by the 
{code}%RANDOM%{code} env variable generator, which generates a number between 0 
and 32767.

This appears to be the cause of an error occurring when several spark-submit 
jobs are launched simultaneously. The following error is returned from stderr:

{quote}The process cannot access the file because it is being used by another 
process. The system cannot find the file
USER/AppData/Local/Temp/spark-class-launcher-output-RANDOM.txt.
The process cannot access the file because it is being used by another 
process.{quote}

My hypothesis is that %RANDOM% is returning the same value for multiple jobs, 
causing the launcher library to attempt to write to the same file from multiple 
processes. Another mechanism is needed for reliably generating the names of the 
temporary files so that the concurrency issue is resolved.


> spark-submit fails when submitting several jobs in parallel
> -----------------------------------------------------------
>
>                 Key: SPARK-23015
>                 URL: https://issues.apache.org/jira/browse/SPARK-23015
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Submit
>    Affects Versions: 1.4.0, 2.0.0, 2.0.1, 2.0.2, 2.1.0, 2.1.1, 2.1.2, 2.2.0, 
> 2.2.1
>         Environment: Windows 10 (1709/16299.125)
> Spark 2.3.0
> Java 8, Update 151
>            Reporter: Hugh Zabriskie
>
> Spark Submit's launch script (bin/spark-class2.cmd) runs the launcher 
> (org.apache.spark.launcher.Main), redirects the command it prints to a 
> temporary text file, reads the result back into a variable, and then 
> executes that command.
> {code}
> set LAUNCHER_OUTPUT=%temp%\spark-class-launcher-output-%RANDOM%.txt
> "%RUNNER%" -Xmx128m -cp "%LAUNCH_CLASSPATH%" org.apache.spark.launcher.Main %* > %LAUNCHER_OUTPUT%
> {code}
> [bin/spark-class2.cmd, 
> L67|https://github.com/apache/spark/blob/master/bin/spark-class2.cmd#L66]
> That temporary text file is given a pseudo-random name by the %RANDOM% env 
> variable generator, which generates a number between 0 and 32767.
> This appears to be the cause of an error occurring when several spark-submit 
> jobs are launched simultaneously. The following error is returned from stderr:
> {quote}The process cannot access the file because it is being used by another 
> process. The system cannot find the file
> USER/AppData/Local/Temp/spark-class-launcher-output-RANDOM.txt.
> The process cannot access the file because it is being used by another 
> process.{quote}
> My hypothesis is that %RANDOM% is returning the same value for multiple jobs, 
> causing the launcher library to attempt to write to the same file from 
> multiple processes. Another mechanism is needed for reliably generating the 
> names of the temporary files so that the concurrency issue is resolved.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

