[ https://issues.apache.org/jira/browse/TOREE-399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16196515#comment-16196515 ]

Patrick McCarty edited comment on TOREE-399 at 10/9/17 5:34 AM:
----------------------------------------------------------------

I wrote my own hacky run.cmd and managed to get it working with Spark 2.2.0 
and toree-assembly-0.2.0.dev1-incubating-SNAPSHOT. I'm a complete newbie to 
Spark, Scala, and Jupyter (I'm learning them for a college class), so hopefully 
someone with greater familiarity can clean up what I've done and implement it 
properly.

Firstly, I found that I needed to set -Dscala.usejavacp=true; otherwise you get 
the error mentioned by the previous two posters:
Failed to initialize compiler: object scala in compiler mirror not found.
** Note that as of 2.8 scala does not assume use of the java classpath.
** For the old behavior pass -usejavacp to scala, or if using a Settings
** object programmatically, settings.usejavacp.value = true.
Exception in thread "main" java.lang.NullPointerException

Secondly, I found that I needed to include the toree-assembly jar file in the 
classpath, or else I get "error: object toree is not a member of package 
org.apache".

Thirdly, I found that I couldn't achieve the above classpath change in a good 
way by simply invoking spark-submit.cmd. I tried to make use of the --jars 
argument to the SparkSubmit program, which feels like it ought to be the proper 
way to put toree on the classpath, but it didn't work; maybe that's a bug, or 
maybe I wasn't using it correctly (see the sketch below). Due to the way the 
spark-submit2.cmd and spark-class2.cmd files work, I could not find a way to 
use those scripts unmodified and still add toree to the classpath in addition 
to the normal classpath items. So the only way I was able to get this working 
was to avoid spark-submit.cmd entirely and hardcode the SparkSubmit java 
command line directly, as shown in the run.cmd file that follows the sketch.
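
For the record, here is roughly what my --jars attempt looked like. This is a 
reconstruction (not verified), the jar path assumes the default Toree kernel 
install location, and it did NOT work for me, but I include it in case someone 
can spot what's wrong:

{code:cmd}
REM Hypothetical reconstruction of the failed --jars attempt.
"%SPARK_HOME%\bin\spark-submit.cmd" %SPARK_OPTS% ^
  --jars "%PROG_HOME%\lib\toree-assembly-0.2.0.dev1-incubating-SNAPSHOT.jar" ^
  --class org.apache.toree.Main ^
  "%PROG_HOME%\lib\toree-assembly-0.2.0.dev1-incubating-SNAPSHOT.jar" %TOREE_OPTS% %*
{code}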


{code:cmd}
@echo off

set PROG_HOME=%~dp0..

if not defined SPARK_HOME (
  echo SPARK_HOME must be set to the location of a Spark distribution!
  exit 1
)

REM disable randomized hash for string in Python 3.3+
set PYTHONHASHSEED=0

REM The SPARK_OPTS values during installation are stored in __TOREE_SPARK_OPTS__. This allows values to be specified
REM during install, but also during runtime. The runtime options take precedence over the install options.

if not defined SPARK_OPTS (
  set SPARK_OPTS=%__TOREE_SPARK_OPTS__%
) else (
  if "%SPARK_OPTS%" == "" (
    set SPARK_OPTS=%__TOREE_SPARK_OPTS__%
  )
)

if not defined TOREE_OPTS (
  set TOREE_OPTS=%__TOREE_OPTS__%
) else (
  if "%TOREE_OPTS%" == "" (
    set TOREE_OPTS=%__TOREE_OPTS__%
  )
)

echo Starting Spark Kernel with SPARK_HOME=%SPARK_HOME%

REM This doesn't work because the classpath doesn't get set properly, unless you hardcode it in SPARK_SUBMIT_OPTS
REM using forward slashes or double backslashes, but then you can't use the SPARK_HOME and PROG_HOME variables.
REM set SPARK_SUBMIT_OPTS=-cp "%SPARK_HOME%\conf\;%SPARK_HOME%\jars\*;%PROG_HOME%\lib\toree-assembly-0.2.0.dev1-incubating-SNAPSHOT.jar" -Dscala.usejavacp=true
REM set TOREE_COMMAND="%SPARK_HOME%\bin\spark-submit.cmd" %SPARK_OPTS% --class org.apache.toree.Main %PROG_HOME%\lib\toree-assembly-0.2.0.dev1-incubating-SNAPSHOT.jar %TOREE_OPTS% %*

REM The two important things that we must do differently on Windows are that we must add
REM toree-assembly-0.2.0.dev1-incubating-SNAPSHOT.jar to the classpath, and we must define
REM the java property scala.usejavacp=true. The java path is quoted in case JAVA_HOME
REM contains spaces (e.g. C:\Program Files\Java\...).
set TOREE_COMMAND="%JAVA_HOME%\bin\java" -cp "%SPARK_HOME%\conf\;%SPARK_HOME%\jars\*;%PROG_HOME%\lib\toree-assembly-0.2.0.dev1-incubating-SNAPSHOT.jar" -Dscala.usejavacp=true -Xmx1g org.apache.spark.deploy.SparkSubmit %SPARK_OPTS% --class org.apache.toree.Main %PROG_HOME%\lib\toree-assembly-0.2.0.dev1-incubating-SNAPSHOT.jar %TOREE_OPTS% %*

echo.
echo %TOREE_COMMAND%
echo.

%TOREE_COMMAND%
{code}
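
Once run.cmd is in place (placement described below), a quick way to smoke-test 
the kernel outside the notebook UI, assuming jupyter is on your PATH and the 
kernel name matches its install directory, is:

{code:cmd}
jupyter console --kernel apache_toree_scala
{code}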

The run.cmd file should be placed in 
C:\ProgramData\jupyter\kernels\apache_toree_scala\bin\
Additionally, you need to edit kernel.json in the folder above that to change 
run.sh to run.cmd (see the sketch below).
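
A minimal sketch of what the edited kernel.json should end up looking like; the 
env values here are placeholders for whatever your install wrote, and the only 
change that matters is run.cmd in argv:

{code:json}
{
  "display_name": "Apache Toree - Scala",
  "language": "scala",
  "argv": [
    "C:\\ProgramData\\jupyter\\kernels\\apache_toree_scala\\bin\\run.cmd",
    "--profile",
    "{connection_file}"
  ],
  "env": {
    "SPARK_HOME": "C:\\spark",
    "__TOREE_SPARK_OPTS__": "",
    "__TOREE_OPTS__": ""
  }
}
{code}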
If you want to allow for installing additional Toree kernels, you should also 
edit toreeapp.py to change run.sh to run.cmd (obviously the real solution will 
need code to detect the OS and reference the appropriate script; a hypothetical 
sketch follows).
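
A hypothetical sketch of the OS detection the real toreeapp.py fix would need. 
I haven't studied how toreeapp.py is structured, but platform.system() from the 
standard library is the usual way to detect Windows:

{code:python}
import platform

def kernel_launcher_script():
    """Return the launcher script name for the current OS.

    Hypothetical helper: the real toreeapp.py may wire this in
    differently, but the OS check itself is just the standard library.
    """
    return "run.cmd" if platform.system() == "Windows" else "run.sh"
{code}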



> Make Spark Kernel work on Windows
> ---------------------------------
>
>                 Key: TOREE-399
>                 URL: https://issues.apache.org/jira/browse/TOREE-399
>             Project: TOREE
>          Issue Type: New Feature
>         Environment: Windows 7/8/10
>            Reporter: aldo
>         Attachments: run.bat
>
>
> After a successful install of the Spark Kernel, the error "Failed to run 
> command:" occurs when we select a Scala notebook from Jupyter.
> The error happens because kernel.json runs 
> C:\\ProgramData\\jupyter\\kernels\\apache_toree_scala\\bin\\run.sh, which is 
> a bash shell script and hence cannot work on Windows.
> Can you give me some direction to fix this, and I will implement it.


