Hi,

I prefer to do most of my projects in Python, and for that I use Jupyter.
I have been downloading the pre-built (compiled) version of Spark.

I do not normally like the source-code version because the build process
makes me nervous.
You know, with lines of stuff scrolling up the screen.
What am I going to do if a build fails? I am a user!

I decided to risk it, and it was only one mvn command to build (45 minutes
later).
Everything is great. Success.
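
Something along the lines of the standard command from the Spark build docs
(I'm not sure of my exact flags any more):

mvn -DskipTests clean package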

I removed all JVMs except JDK 8 for the compilation.

I used JDK 8 so I know which libraries were linked in the build process.
I also used my local version of Maven, not the apt-installed version.

I used JDK 8 because if you go to this Scala site,
http://scala-ide.org/download/sdk.html, they say the requirement is JDK 8 for
the IDE, even for Scala 2.12.
They don't say JDK 8 or higher, just JDK 8.

So anyway, once in a while I do Spark projects in Scala with Eclipse.

For that I don't use Maven or anything. I prefer to make use of the build path
and external JARs. This way I know exactly which libraries I am linking to.

Creating a JAR in Eclipse is straightforward for spark-submit.
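
Something like this, just to illustrate (the class and JAR names are made up):

# class and JAR names below are placeholders, not my real project
spark-submit --class com.example.TitanicApp --master "local[*]" titanic-app.jar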


Anyway, as you can see (below), I am pointing Jupyter to Spark with
findspark.init('/opt/spark').
That's OK; everything is fine.

With the pre-built version of Spark there is a jars directory, which I have
been using in Eclipse.



With my own compiled-from-source version there is no jars directory.


Where have all the jars gone?



I am not sure how findspark.init('/opt/spark') is locating the libraries,
unless it is finding them from Anaconda.


import findspark
findspark.init('/opt/spark')
from pyspark.sql import SparkSession
spark = SparkSession \
    .builder \
    .appName('Titanic Data') \
    .getOrCreate()
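
One way to check which copy is actually being picked up would be something
like this (a quick sketch, run after the cell above):

import os
import pyspark

print(os.environ.get('SPARK_HOME'))  # findspark should have set this to /opt/spark
print(pyspark.__file__)              # shows whether pyspark comes from /opt/spark or from Anaconda's site-packages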
<http://www.backbutton.co.uk/>
