[jira] [Commented] (SPARK-23015) spark-submit fails when submitting several jobs in parallel
[ https://issues.apache.org/jira/browse/SPARK-23015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17850230#comment-17850230 ] Hyukjin Kwon commented on SPARK-23015:
--
Fixed in https://github.com/apache/spark/pull/43706

> spark-submit fails when submitting several jobs in parallel
> ---
>
> Key: SPARK-23015
> URL: https://issues.apache.org/jira/browse/SPARK-23015
> Project: Spark
> Issue Type: Bug
> Components: Spark Submit
> Affects Versions: 2.0.0, 2.0.1, 2.0.2, 2.1.0, 2.1.1, 2.1.2, 2.2.0, 2.2.1
> Environment: Windows 10 (1709/16299.125)
> Spark 2.3.0
> Java 8, Update 151
> Reporter: Hugh Zabriskie
> Priority: Major
> Labels: bulk-closed, pull-request-available
> Fix For: 4.0.0
>
> Spark Submit's launching library prints the command to execute the launcher
> (org.apache.spark.launcher.Main) to a temporary text file, reads the result
> back into a variable, and then executes that command.
> {code}
> set LAUNCHER_OUTPUT=%temp%\spark-class-launcher-output-%RANDOM%.txt
> "%RUNNER%" -Xmx128m -cp "%LAUNCH_CLASSPATH%" org.apache.spark.launcher.Main %* > %LAUNCHER_OUTPUT%
> {code}
> [bin/spark-class2.cmd, L67|https://github.com/apache/spark/blob/master/bin/spark-class2.cmd#L66]
> That temporary text file is given a pseudo-random name by the %RANDOM% environment
> variable, which expands to a number between 0 and 32767.
> This appears to be the cause of an error occurring when several spark-submit
> jobs are launched simultaneously. The following error is returned on stderr:
> {quote}The process cannot access the file because it is being used by another process.
> The system cannot find the file USER/AppData/Local/Temp/spark-class-launcher-output-RANDOM.txt.
> The process cannot access the file because it is being used by another process.{quote}
> My hypothesis is that %RANDOM% returns the same value for multiple jobs,
> causing the launcher library to attempt to write to the same file from
> multiple processes. Another mechanism is needed for reliably generating the
> names of the temporary files so that the concurrency issue is resolved.

--
This message was sent by Atlassian Jira (v8.20.10#820010)
-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
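[Editorial note] The reporter's hypothesis can be illustrated with birthday-paradox arithmetic over the 32768-value space of %RANDOM%. The sketch below (not part of the original report, and optimistic — it assumes each job draws an independent uniform value, which %RANDOM% does not even guarantee) shows how quickly collisions become likely:

```python
# Probability that at least two of n concurrent spark-submit jobs draw the
# same %RANDOM% value out of 32768 possibilities (birthday paradox).
def collision_probability(n: int, space: int = 32768) -> float:
    p_no_collision = 1.0
    for k in range(n):
        p_no_collision *= (space - k) / space
    return 1.0 - p_no_collision

# Even for modest parallelism the risk is non-trivial; at ~200 jobs a
# collision is close to a coin flip.
for n in (2, 10, 50, 200):
    print(n, round(collision_probability(n), 4))
```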
[ https://issues.apache.org/jira/browse/SPARK-23015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16997724#comment-16997724 ] Kevin Grealish commented on SPARK-23015:
--
Here is something that may help craft a complete solution for the Spark scripts. This uses VBScript to create a GUID and assign it to an environment variable. It depends on cscript, which has shipped with Windows since Windows 95. (Change the two %%i to just %i to run outside a batch program.)

{code}
echo WScript.StdOut.WriteLine Mid(CreateObject("Scriptlet.TypeLib").GUID, 2, 36) > %TEMP%\uuid.vbs
for /f %%i in ('cscript //NoLogo %TEMP%\uuid.vbs') do @set UUID=%%i
echo made a UUID: %UUID%
{code}

This code will itself collide on writing uuid.vbs, so instead of writing a temp .vbs file, a fixed helper script (say, makeuuid.vbs) should be added alongside the scripts that currently use %RANDOM%.
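[Editorial note] The effect the VBScript helper is after — a GUID-sized name space instead of 15 bits — can be sketched in Python. The filename pattern below is modeled on the one in spark-class2.cmd; the actual fix would live in the batch script itself:

```python
import os
import tempfile
import uuid

# Build a collision-resistant launcher-output path, analogous to replacing
# %RANDOM% with a GUID: 122 random bits instead of 15.
def launcher_output_path() -> str:
    name = f"spark-class-launcher-output-{uuid.uuid4()}.txt"
    return os.path.join(tempfile.gettempdir(), name)

print(launcher_output_path())
```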
[ https://issues.apache.org/jira/browse/SPARK-23015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16997669#comment-16997669 ] Kevin Grealish commented on SPARK-23015:
--
%TIME% has a granularity of 10ms, so while this reduces the probability of collision, it does not remove the problem. Neither does concatenating multiple %RANDOM% expansions, because %RANDOM% is a pseudo-random number. See https://devblogs.microsoft.com/oldnewthing/20100617-00/?p=13673, "Why cmd.exe's %RANDOM% isn't so random". Once the seed is set from the current time at a granularity of one second, the sequence of numbers coming from %RANDOM% is fixed; so if a single %RANDOM% causes a collision, then so will %RANDOM%%RANDOM%%RANDOM%%RANDOM%...
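[Editorial note] The point about concatenating %RANDOM% can be illustrated with any seeded PRNG. Python's random module stands in for cmd.exe's generator here — the seeding behavior, not the exact algorithm, is what matters:

```python
import random

# Two processes whose PRNGs are seeded from the same clock second produce
# identical sequences, so concatenating several draws just concatenates
# the same values in both processes and still collides.
seed_from_clock = 1705312345  # stand-in for "the same current second" in both

proc_a = random.Random(seed_from_clock)
proc_b = random.Random(seed_from_clock)

name_a = "".join(str(proc_a.randrange(32768)) for _ in range(4))
name_b = "".join(str(proc_b.randrange(32768)) for _ in range(4))
assert name_a == name_b  # four draws, still the same "random" filename
print(name_a)
```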
[ https://issues.apache.org/jira/browse/SPARK-23015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16997604#comment-16997604 ] Evgenii commented on SPARK-23015:
--
Here is a working solution:

{code}
set LAUNCHER_OUTPUT=%temp%\spark-class-launcher-output-%RANDOM%%TIME::=0%.txt
{code}

The {{::=0}} part is cmd.exe's variable-substitution syntax ({{%VAR:old=new%}}): it replaces each ':' in the timestamp with '0', since ':' is not valid in a filename. I checked that.
[ https://issues.apache.org/jira/browse/SPARK-23015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16996082#comment-16996082 ] Evgenii commented on SPARK-23015:
--
We invoke it from Java code in parallel too.
[ https://issues.apache.org/jira/browse/SPARK-23015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16996080#comment-16996080 ] Evgenii commented on SPARK-23015:
--
Guys, why not invoke %RANDOM% multiple times? Just change spark-class2.cmd from

{code}
set LAUNCHER_OUTPUT=%temp%\spark-class-launcher-output-%RANDOM%.txt
{code}

to

{code}
set LAUNCHER_OUTPUT=%temp%\spark-class-launcher-output-%RANDOM%-%RANDOM%-%RANDOM%.txt
{code}
[ https://issues.apache.org/jira/browse/SPARK-23015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16662750#comment-16662750 ] Kevin Grealish commented on SPARK-23015:
--
One workaround is to create a subdirectory in temp and set it as the TEMP directory for the process being launched. This way each process you launch gets its own temp space. For example, when launching from C#:

{code}
// Workaround for Spark bug https://issues.apache.org/jira/browse/SPARK-23015
// Spark Submit's launching library prints the command to execute the launcher (org.apache.spark.launcher.Main)
// to a temporary text file ("%TEMP%\spark-class-launcher-output-%RANDOM%.txt"), reads the result back into a
// variable, and then executes that command. %RANDOM% does not have sufficient range to avoid collisions when
// launching many Spark processes. As a result the Spark processes end up running one another's commands
// (silently) or fail with errors like:
//   "The process cannot access the file because it is being used by another process."
//   "The system cannot find the file C:\VsoAgent\_work\_temp\spark-class-launcher-output-654.txt."
// As a workaround, we give each run its own TEMP directory, which we create using a GUID.
// ('start' is the ProcessStartInfo for the spark-submit child process.)
string newTemp = null;
if (AppRuntimeEnvironment.IsRunningOnWindows())
{
    var ourTemp = Environment.GetEnvironmentVariable("TEMP");
    var newDirName = "t" + Convert.ToBase64String(Guid.NewGuid().ToByteArray()).Substring(0, 22).Replace('/', '-');
    newTemp = Path.Combine(ourTemp, newDirName);
    Directory.CreateDirectory(newTemp);
    start.Environment["TEMP"] = newTemp;
}
{code}
[ https://issues.apache.org/jira/browse/SPARK-23015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16654072#comment-16654072 ] Bansal, Parvesh commented on SPARK-23015:
--
Hi Hugh Zabriskie, I have been following this issue, as I am facing the same problem on Windows (using Python threads to launch multiple spark-submit processes). Can you please let me know the current resolution or workaround for Windows? Is it fixed for the Linux environment? Please suggest. Thanks and regards, Parvesh K Bansal
[ https://issues.apache.org/jira/browse/SPARK-23015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16319399#comment-16319399 ] Marcelo Vanzin commented on SPARK-23015:
--
The {{%RANDOM%}} thing is not something I added originally - I avoided it exactly because of this kind of problem. I need to figure out why it was added later, but Windows is not really on my list of things to prioritize at the moment...