Hi, I prefer to do most of my projects in Python, and for that I use Jupyter. Until now I have been downloading the pre-compiled version of Spark.
I do not normally like the source release because the build process makes me nervous, you know, with lines of stuff scrolling up the screen. What am I going to do if the build fails? I am a user! I decided to risk it anyway, and it turned out to be only one mvn command. Forty-five minutes later: success, everything is great.

Before compiling I removed all JVMs except JDK 8, so I know exactly which libraries were linked during the build. I also used my local version of Maven, not the apt-installed one. I chose JDK 8 because the Scala IDE site (http://scala-ide.org/download/sdk.html) lists JDK 8 as a requirement for the IDE, even for Scala 2.12. They don't say JDK 8 or higher, just JDK 8.

Anyway, once in a while I do Spark projects in Scala with Eclipse. For those I don't use Maven or anything; I prefer to use the build path and external jars, so I know exactly which libraries I am linking against. Creating a jar in Eclipse for spark-submit is straightforward.

As you can see below, I am pointing Jupyter at Spark with findspark.init('/opt/spark'), and everything works fine. The pre-built version of Spark has a jars directory, which I have also been using in Eclipse, but my own compiled-from-source version has no jars directory. Where have all the jars gone? I am not sure how findspark.init('/opt/spark') is locating the libraries, unless it is finding them from Anaconda.

    import findspark
    findspark.init('/opt/spark')

    from pyspark.sql import SparkSession

    spark = SparkSession \
        .builder \
        .appName('Titanic Data') \
        .getOrCreate()
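For what it's worth, here is a small check I put together, assuming a stock Maven build keeps its runtime jars inside the source tree under assembly/target/scala-<version>/jars rather than in a top-level jars directory. The /opt/spark path is the one from the snippet above; the assembly layout is my assumption about a standard build, not something I have verified:

    import glob
    import os

    import findspark

    # findspark.init appears to just export SPARK_HOME and put Spark's
    # python/ and py4j directories on sys.path; I don't think it locates
    # any jars itself
    findspark.init('/opt/spark')
    spark_home = os.environ['SPARK_HOME']
    print('SPARK_HOME =', spark_home)

    # a pre-built binary distribution ships its jars at the top level
    print('top-level jars dir exists:',
          os.path.isdir(os.path.join(spark_home, 'jars')))

    # assumption: an in-tree mvn build leaves the runtime jars under
    # assembly/target/scala-<version>/jars instead
    print('in-tree assembly jars dirs:',
          glob.glob(os.path.join(spark_home, 'assembly', 'target',
                                 'scala-*', 'jars')))

If that last line prints a match, it would suggest a top-level jars directory was never needed: as far as I can tell, Spark's own launcher scripts fall back to the assembly directory when jars is missing, which would mean Anaconda is not involved at all.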