Re: Scala 2.12 version mismatch for Spark Interpreter

2021-10-28 Thread Mich Talebzadeh
Apologies, that should say the docker image should be on 3.1.1.







*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




On Thu, 28 Oct 2021 at 14:34, Mich Talebzadeh 
wrote:

> You should go for Spark 3.1.1 for k8s. That is the tried and tested one
> for Kubernetes in the Spark 3 series, meaning the docker image should be on
> 3.1.1, and your client, which I think is used to run spark-submit on k8s,
> should also be on 3.1.1.
>
> HTH
>
>
>
>
>
>
>
>
>
> On Thu, 28 Oct 2021 at 13:13, Jeff Zhang  wrote:
>
>> Hi Fabrizio,
>>
>> Spark 3.2.0 support was recently added in this PR:
>> https://github.com/apache/zeppelin/pull/4257
>> The problem you mentioned is solved there.
>>
>> Fabrizio Fab wrote on Thu, 28 Oct 2021 at 19:43:
>>
>>> I am aware that Spark 3.2.0 is not officially released, but I am trying
>>> to get it to work.
>>>
>>> The first thing that I noticed is the following:
>>>
>>> the SparkInterpreter is compiled for Scala 2.12.7
>>>
>>> Spark 3.2 is compiled for Scala 2.12.15
>>>
>>> Unfortunately there are some breaking changes between the two versions
>>> (even though only the patch version has changed... W.T.F. ??) that require
>>> recompiling (hopefully with no code changes).
>>>
>>> The first incompatibility I ran into is at line 66 of
>>> SparkScala212Interpreter.scala
>>> val settings = new Settings()
>>> settings.processArguments(List("-Yrepl-class-based",
>>>   "-Yrepl-outdir", s"${outputDir.getAbsolutePath}"), true)
>>> settings.embeddedDefaults(sparkInterpreterClassLoader)
>>>
>>> -->settings.usejavacp.value = true  <--
>>>
>>> scala.tools.nsc.Settings.usejavacp was moved in 2.12.13 from
>>> AbsSettings to MutableSettings, so you get a runtime error.
>>>
>>>
>>> I'll let you know if I manage to resolve all the problems.
>>>
>>>
>>>
>>
>> --
>> Best Regards
>>
>> Jeff Zhang
>>
>


Re: Scala 2.12 version mismatch for Spark Interpreter

2021-10-28 Thread Mich Talebzadeh
You should go for Spark 3.1.1 for k8s. That is the tried and tested one for
Kubernetes in the Spark 3 series, meaning the docker image should be on 3.1.1,
and your client, which I think is used to run spark-submit on k8s, should
also be on 3.1.1.
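A minimal sketch of what a version-matched submission could look like. The registry, image name, and API-server URL below are hypothetical placeholders; the point is only that the client-side spark-submit binary, the container image tag, and the example jar all carry the same Spark version (3.1.1):

```shell
# All hypothetical values: adjust registry, image, and master URL for your cluster.
SPARK_VERSION="3.1.1"   # client spark-submit and docker image must agree on this
IMAGE="registry.example.com/spark:${SPARK_VERSION}"
MASTER="k8s://https://kubernetes.example.com:6443"

# Build the submit command; the spark-submit binary on the client host
# should itself come from a 3.1.1 distribution.
CMD="spark-submit --master ${MASTER} --deploy-mode cluster --conf spark.kubernetes.container.image=${IMAGE} local:///opt/spark/examples/jars/spark-examples_2.12-${SPARK_VERSION}.jar"

echo "${CMD}"
```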

HTH










On Thu, 28 Oct 2021 at 13:13, Jeff Zhang  wrote:

> Hi Fabrizio,
>
> Spark 3.2.0 support was recently added in this PR:
> https://github.com/apache/zeppelin/pull/4257
> The problem you mentioned is solved there.
>
> Fabrizio Fab wrote on Thu, 28 Oct 2021 at 19:43:
>
>> I am aware that Spark 3.2.0 is not officially released, but I am trying to
>> get it to work.
>>
>> The first thing that I noticed is the following:
>>
>> the SparkInterpreter is compiled for Scala 2.12.7
>>
>> Spark 3.2 is compiled for Scala 2.12.15
>>
>> Unfortunately there are some breaking changes between the two versions
>> (even though only the patch version has changed... W.T.F. ??) that require
>> recompiling (hopefully with no code changes).
>>
>> The first incompatibility I ran into is at line 66 of
>> SparkScala212Interpreter.scala
>> val settings = new Settings()
>> settings.processArguments(List("-Yrepl-class-based",
>>   "-Yrepl-outdir", s"${outputDir.getAbsolutePath}"), true)
>> settings.embeddedDefaults(sparkInterpreterClassLoader)
>>
>> -->settings.usejavacp.value = true  <--
>>
>> scala.tools.nsc.Settings.usejavacp was moved in 2.12.13 from
>> AbsSettings to MutableSettings, so you get a runtime error.
>>
>>
>> I'll let you know if I manage to resolve all the problems.
>>
>>
>>
>
> --
> Best Regards
>
> Jeff Zhang
>


Re: Scala 2.12 version mismatch for Spark Interpreter

2021-10-28 Thread Jeff Zhang
Hi Fabrizio,

Spark 3.2.0 support was recently added in this PR:
https://github.com/apache/zeppelin/pull/4257
The problem you mentioned is solved there.

Fabrizio Fab wrote on Thu, 28 Oct 2021 at 19:43:

> I am aware that Spark 3.2.0 is not officially released, but I am trying to
> get it to work.
>
> The first thing that I noticed is the following:
>
> the SparkInterpreter is compiled for Scala 2.12.7
>
> Spark 3.2 is compiled for Scala 2.12.15
>
> Unfortunately there are some breaking changes between the two versions
> (even though only the patch version has changed... W.T.F. ??) that require
> recompiling (hopefully with no code changes).
>
> The first incompatibility I ran into is at line 66 of
> SparkScala212Interpreter.scala
> val settings = new Settings()
> settings.processArguments(List("-Yrepl-class-based",
>   "-Yrepl-outdir", s"${outputDir.getAbsolutePath}"), true)
> settings.embeddedDefaults(sparkInterpreterClassLoader)
>
> -->settings.usejavacp.value = true  <--
>
> scala.tools.nsc.Settings.usejavacp was moved in 2.12.13 from
> AbsSettings to MutableSettings, so you get a runtime error.
>
>
> I'll let you know if I manage to resolve all the problems.
>
>
>

-- 
Best Regards

Jeff Zhang


Scala 2.12 version mismatch for Spark Interpreter

2021-10-28 Thread Fabrizio Fab
I am aware that Spark 3.2.0 is not officially released, but I am trying to get
it to work.

The first thing that I noticed is the following:

the SparkInterpreter is compiled for Scala 2.12.7

Spark 3.2 is compiled for Scala 2.12.15

Unfortunately there are some breaking changes between the two versions (even
though only the patch version has changed... W.T.F. ??) that require
recompiling (hopefully with no code changes).

The first incompatibility I ran into is at line 66 of
SparkScala212Interpreter.scala:
val settings = new Settings()
settings.processArguments(List("-Yrepl-class-based",
  "-Yrepl-outdir", s"${outputDir.getAbsolutePath}"), true)
settings.embeddedDefaults(sparkInterpreterClassLoader)

-->settings.usejavacp.value = true  <--

scala.tools.nsc.Settings.usejavacp was moved in 2.12.13 from AbsSettings to
MutableSettings, so you get a runtime error.
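This kind of mismatch can be caught up front by comparing Scala patch levels before launching anything. A small shell sketch: the sample line below mimics the banner printed by `spark-submit --version` (a hypothetical capture; in practice you would pipe in the real output):

```shell
# Sketch: compare the Scala version Spark was built with against the one
# the Zeppelin SparkInterpreter was compiled for.
sample_version_output='Using Scala version 2.12.15, OpenJDK 64-Bit Server VM, 11.0.12'

# Extract "2.12.15" from the banner line.
spark_scala=$(echo "${sample_version_output}" | sed -n 's/.*Scala version \([0-9.]*\).*/\1/p')
interpreter_scala="2.12.7"   # version the interpreter was built against

if [ "${spark_scala}" != "${interpreter_scala}" ]; then
  echo "Scala mismatch: Spark=${spark_scala} interpreter=${interpreter_scala}"
else
  echo "Scala versions match: ${spark_scala}"
fi
```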


I'll let you know if I manage to resolve all the problems.




Re: Spark on k8s cluster mode, from outside of the cluster. [SOLVED]

2021-10-28 Thread Jeff Zhang
Thanks for sharing. It would be nice if you could write a blog post to share
it with the wider Zeppelin community.


Fabrizio Fab wrote on Thu, 28 Oct 2021 at 16:29:

>
> Yeah! Thank you very much Philipp: tonight I explored the source code
> carefully and discovered the two-Thrift-servers setup.
>
> Therefore I solved my problem: here is the solution I adopted, which may be
> useful for other people.
>
> CONTEXT
> I have my Zeppelin Server installation located in a LAN where a K8s
> cluster is available, and I want to submit notes in cluster mode to the
> k8s cluster.
>
> SOLUTION
> - the driver pod must have its address exposed on the LAN network,
> otherwise the Zeppelin server cannot connect to the Interpreter Thrift
> server: I suppose that there are several ways of doing this, but I am not a
> k8s expert so I simply created a basic driver-pod.template.yaml with a
> "hostNetwork" spec and referenced it by the
> "spark.kubernetes.driver.podTemplateFile" interpreter setting.
>
> At this point, the two servers can talk to each other.
>
> NOTE
> 1) do not set the zeppelin run mode = k8s. It must be "local" (or the
> default "auto")
> 2) an NFS share (or other shared persistent volume) is required in order to
> upload the required JARS and easily access the driver logs when the driver
> shuts down:
>
> spark.kubernetes.driver.volumes.nfs.<volume name>.options.server=<nfs server>
> spark.kubernetes.driver.volumes.nfs.<volume name>.options.path=<nfs path>
> spark.kubernetes.driver.volumes.nfs.<volume name>.mount.path=<mount path>
>
> On 2021/10/28 06:48:54, Philipp Dallig  wrote:
> > Hi Fabrizio,
> >
> > We have two connections. First, the Zeppelin interpreter opens a
> > connection to the Zeppelin server to register and to send back the
> > interpreter output. The Zeppelin server is the CALLBACK_HOST and the
> > PORT indicates where the Zeppelin server opened the Thrift service for
> > the Zeppelin interpreter.
> >
> > An important part of the registration is that the Zeppelin interpreter
> > tells the Zeppelin server where the interpreter pod has an open Thrift
> > server port. This information can be found in the Zeppelin server log
> > output. Be on the lookout for this message.
> >
> https://github.com/apache/zeppelin/blob/master/zeppelin-plugins/launcher/k8s-standard/src/main/java/org/apache/zeppelin/interpreter/launcher/K8sRemoteInterpreterProcess.java#L483
> > Also note the function ZEPPELIN_K8S_PORTFORWARD, which should help your
> > Zeppelin server to reach the Zeppelin interpreter in K8s.
> >
> >  > the 1st "spark-submit" in "cluster mode" is started from the client
> > (in the zeppelin host, in our case), then the 2nd "spark-submit" in
> > "client mode" is started by the "/opt/entrypoint.sh" script inside the
> > standard spark docker image.
> >
> > Are you sure you are using the K8s launcher? As you can see in this part
> > of the code
> > (
> https://github.com/apache/zeppelin/blob/2f55fe8ed277b28d71f858633f9c9d76fd18f0c3/zeppelin-plugins/launcher/k8s-standard/src/main/java/org/apache/zeppelin/interpreter/launcher/K8sRemoteInterpreterProcess.java#L411),
>
> > Zeppelin always uses client mode.
> >
> > The architecture is quite simple:
> >
> > Zeppelin-Server -> Zeppelin-Interpreter (with Spark in client mode) on
> > K8s -> x-Spark-executors (based on your config)
> >
> > Best Regards
> > Philipp
> >
> >
> > Am 27.10.21 um 15:19 schrieb Fabrizio Fab:
> >
> > > Hi Philipp, okay, I just now realized my HUGE misunderstanding!
> > >
> > > The "double-spark-submit" pattern is just the standard spark-on-k8s way
> of running spark applications in cluster mode:
> > > the 1st "spark-submit" in "cluster mode" is started from the client
> (in the zeppelin host, in our case), then the 2nd "spark-submit" in "client
> mode" is started by the "/opt/entrypoint.sh" script inside the standard
> spark docker image.
> > >
> > > At this point I can make a more precise question:
> > >
> > > I see that the interpreter.sh starts the RemoteInterpreterServer with,
> in particular, the following parameters: CALLBACK_HOST / PORT
> > > They refer to the Zeppelin host and RPC port.
> > >
> > > Moreover, when the interpreter starts, it runs a Thrift server on some
> random port.
> > >
> > > So, I ask: which communications are supposed to happen, in order to
> correctly set-up my firewall/routing rules ?
> > >
> > > -1 Must the Zeppelin server connect to the Interpreter Thrift server ?
> > > -2 Must the Interpreter Thrift server connect to the Zeppelin server?
> > > -3 Both ?
> > >
> > > - Which ports must the Zeppelin server/ The thrift server  find open
> on the other server ?
> > >
> > > Thank you everybody!
> > >
> > > Fabrizio
> > >
> > >
> > >
> > >
> > > On 2021/10/26 11:40:24, Philipp Dallig 
> wrote:
> > >> Hi Fabrizio,
> > >>
> > >> At the moment I think zeppelin does not support running spark jobs in
> > >> cluster mode. But in fact K8s mode simulates cluster mode. Because the
> > >> Zeppelin interpreter is already started as a pod in K8s, as a manual
> > >> 

Re: Spark on k8s cluster mode, from outside of the cluster. [SOLVED]

2021-10-28 Thread Fabrizio Fab


Yeah! Thank you very much Philipp: tonight I explored the source code
carefully and discovered the two-Thrift-servers setup.

Therefore I solved my problem: here is the solution I adopted, which may be
useful for other people.

CONTEXT
I have my Zeppelin Server installation located in a LAN where a K8s cluster
is available, and I want to submit notes in cluster mode to the k8s cluster.

SOLUTION
- the driver pod must have its address exposed on the LAN, otherwise the
Zeppelin server cannot connect to the interpreter's Thrift server. I suppose
there are several ways of doing this, but I am not a k8s expert, so I simply
created a basic driver-pod.template.yaml with a "hostNetwork" spec and
referenced it via the "spark.kubernetes.driver.podTemplateFile" interpreter
setting.
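For reference, a driver pod template along these lines might look like the sketch below. hostNetwork puts the driver on the node's LAN address; the dnsPolicy line is a common companion setting so in-cluster DNS keeps working. This is a sketch of the approach described above, not the exact file used:

```yaml
# driver-pod.template.yaml (sketch): expose the driver on the host network so
# a Zeppelin server outside the cluster can reach the interpreter's Thrift port.
apiVersion: v1
kind: Pod
spec:
  hostNetwork: true
  dnsPolicy: ClusterFirstWithHostNet   # keep cluster DNS resolution with hostNetwork
```

The file is then referenced from the interpreter settings via spark.kubernetes.driver.podTemplateFile, as noted above.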
 
At this point, the two servers can talk to each other.

NOTE
1) do not set the Zeppelin run mode to "k8s". It must be "local" (or the
default "auto").
2) an NFS share (or other shared persistent volume) is required in order to
upload the required JARs and easily access the driver logs after the driver
shuts down:

spark.kubernetes.driver.volumes.nfs.<volume name>.options.server=<nfs server>
spark.kubernetes.driver.volumes.nfs.<volume name>.options.path=<nfs path>
spark.kubernetes.driver.volumes.nfs.<volume name>.mount.path=<mount path>
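Filled in with hypothetical values (the volume name, NFS server address, and paths below are examples only, not the original poster's settings), the three properties take this shape:

```properties
# Example only: "sparknfs" names the volume; server and paths are placeholders.
spark.kubernetes.driver.volumes.nfs.sparknfs.options.server=192.168.1.10
spark.kubernetes.driver.volumes.nfs.sparknfs.options.path=/export/spark
spark.kubernetes.driver.volumes.nfs.sparknfs.mount.path=/mnt/spark
```

With this, the driver pod sees the NFS export at /mnt/spark, which can hold the uploaded JARs and the driver logs.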

On 2021/10/28 06:48:54, Philipp Dallig  wrote: 
> Hi Fabrizio,
> 
> We have two connections. First, the Zeppelin interpreter opens a 
> connection to the Zeppelin server to register and to send back the 
> interpreter output. The Zeppelin server is the CALLBACK_HOST and the 
> PORT indicates where the Zeppelin server opened the Thrift service for 
> the Zeppelin interpreter.
> 
> An important part of the registration is that the Zeppelin interpreter 
> tells the Zeppelin server where the interpreter pod has an open Thrift 
> server port. This information can be found in the Zeppelin server log 
> output. Be on the lookout for this message. 
> https://github.com/apache/zeppelin/blob/master/zeppelin-plugins/launcher/k8s-standard/src/main/java/org/apache/zeppelin/interpreter/launcher/K8sRemoteInterpreterProcess.java#L483
> Also note the function ZEPPELIN_K8S_PORTFORWARD, which should help your 
> Zeppelin server to reach the Zeppelin interpreter in K8s.
> 
>  > the 1st "spark-submit" in "cluster mode" is started from the client 
> (in the zeppelin host, in our case), then the 2nd "spark-submit" in 
> "client mode" is started by the "/opt/entrypoint.sh" script inside the 
> standard spark docker image.
> 
> Are you sure you are using the K8s launcher? As you can see in this part 
> of the code 
> (https://github.com/apache/zeppelin/blob/2f55fe8ed277b28d71f858633f9c9d76fd18f0c3/zeppelin-plugins/launcher/k8s-standard/src/main/java/org/apache/zeppelin/interpreter/launcher/K8sRemoteInterpreterProcess.java#L411),
>  
> Zeppelin always uses client mode.
> 
> The architecture is quite simple:
> 
> Zeppelin-Server -> Zeppelin-Interpreter (with Spark in client mode) on 
> K8s -> x-Spark-executors (based on your config)
> 
> Best Regards
> Philipp
> 
> 
> Am 27.10.21 um 15:19 schrieb Fabrizio Fab:
> 
> > Hi Philipp, okay, I just now realized my HUGE misunderstanding!
> >
> > The "double-spark-submit" pattern is just the standard spark-on-k8s way of 
> > running spark applications in cluster mode:
> > the 1st "spark-submit" in "cluster mode" is started from the client (in the 
> > zeppelin host, in our case), then the 2nd "spark-submit" in "client mode" 
> > is started by the "/opt/entrypoint.sh" script inside the standard spark 
> > docker image.
> >
> > At this point I can make a more precise question:
> >
> > I see that the interpreter.sh starts the RemoteInterpreterServer with, in 
> > particular, the following parameters: CALLBACK_HOST / PORT
> > They refer to the Zeppelin host and RPC port.
> >
> > Moreover, when the interpreter starts, it runs a Thrift server on some 
> > random port.
> >
> > So, I ask: which communications are supposed to happen, in order to 
> > correctly set-up my firewall/routing rules ?
> >
> > -1 Must the Zeppelin server connect to the Interpreter Thrift server ?
> > -2 Must the Interpreter Thrift server connect to the Zeppelin server?
> > -3 Both ?
> >
> > - Which ports must the Zeppelin server/ The thrift server  find open on the 
> > other server ?
> >
> > Thank you everybody!
> >
> > Fabrizio
> >
> >
> >
> >
> > On 2021/10/26 11:40:24, Philipp Dallig  wrote:
> >> Hi Fabrizio,
> >>
> >> At the moment I think zeppelin does not support running spark jobs in
> >> cluster mode. But in fact K8s mode simulates cluster mode. Because the
> >> Zeppelin interpreter is already started as a pod in K8s, as a manual
> >> Spark submit execution would do in cluster mode.
> >>
> >> Spark-submit is called only once during the start of the Zeppelin
> >> interpreter. You will find the call in these lines:
> >> https://github.com/apache/zeppelin/blob/2f55fe8ed277b28d71f858633f9c9d76fd18f0c3/bin/interpreter.sh#L303-L305
> >>
> >> Best Regards
> >> 

Re: Spark on k8s cluster mode, from outside of the cluster.

2021-10-28 Thread Philipp Dallig

Hi Fabrizio,

We have two connections. First, the Zeppelin interpreter opens a 
connection to the Zeppelin server to register and to send back the 
interpreter output. The Zeppelin server is the CALLBACK_HOST and the 
PORT indicates where the Zeppelin server opened the Thrift service for 
the Zeppelin interpreter.


An important part of the registration is that the Zeppelin interpreter 
tells the Zeppelin server where the interpreter pod has an open Thrift 
server port. This information can be found in the Zeppelin server log 
output. Be on the lookout for this message. 
https://github.com/apache/zeppelin/blob/master/zeppelin-plugins/launcher/k8s-standard/src/main/java/org/apache/zeppelin/interpreter/launcher/K8sRemoteInterpreterProcess.java#L483
Also note the function ZEPPELIN_K8S_PORTFORWARD, which should help your 
Zeppelin server to reach the Zeppelin interpreter in K8s.


> the 1st "spark-submit" in "cluster mode" is started from the client 
(in the zeppelin host, in our case), then the 2nd "spark-submit" in 
"client mode" is started by the "/opt/entrypoint.sh" script inside the 
standard spark docker image.


Are you sure you are using the K8s launcher? As you can see in this part 
of the code 
(https://github.com/apache/zeppelin/blob/2f55fe8ed277b28d71f858633f9c9d76fd18f0c3/zeppelin-plugins/launcher/k8s-standard/src/main/java/org/apache/zeppelin/interpreter/launcher/K8sRemoteInterpreterProcess.java#L411), 
Zeppelin always uses client mode.


The architecture is quite simple:

Zeppelin-Server -> Zeppelin-Interpreter (with Spark in client mode) on 
K8s -> x-Spark-executors (based on your config)


Best Regards
Philipp


Am 27.10.21 um 15:19 schrieb Fabrizio Fab:


Hi Philipp, okay, I just now realized my HUGE misunderstanding!

The "double-spark-submit" pattern is just the standard spark-on-k8s way of 
running Spark applications in cluster mode:
the 1st "spark-submit" in "cluster mode" is started from the client (in the zeppelin host, in our case), then 
the 2nd "spark-submit" in "client mode" is started by the "/opt/entrypoint.sh" script inside the 
standard spark docker image.

At this point I can make a more precise question:

I see that interpreter.sh starts the RemoteInterpreterServer with, in
particular, the following parameters: CALLBACK_HOST / PORT.
They refer to the Zeppelin host and RPC port.

Moreover, when the interpreter starts, it runs a Thrift server on some random 
port.

So, I ask: which communications are supposed to happen, in order to correctly
set up my firewall/routing rules?

-1 Must the Zeppelin server connect to the Interpreter Thrift server ?
-2 Must the Interpreter Thrift server connect to the Zeppelin server?
-3 Both ?

- Which ports must the Zeppelin server / the Thrift server find open on the
other server?

Thank you everybody!

Fabrizio




On 2021/10/26 11:40:24, Philipp Dallig  wrote:

Hi Fabrizio,

At the moment I think zeppelin does not support running spark jobs in
cluster mode. But in fact K8s mode simulates cluster mode. Because the
Zeppelin interpreter is already started as a pod in K8s, as a manual
Spark submit execution would do in cluster mode.

Spark-submit is called only once during the start of the Zeppelin
interpreter. You will find the call in these lines:
https://github.com/apache/zeppelin/blob/2f55fe8ed277b28d71f858633f9c9d76fd18f0c3/bin/interpreter.sh#L303-L305

Best Regards
Philipp


Am 25.10.21 um 21:58 schrieb Fabrizio Fab:

Dear All, I have been struggling for more than a week with the following
problem. My Zeppelin Server is running outside the k8s cluster (there is a
reason for this), and I am able to run Spark Zeppelin notes in client mode
but not in cluster mode.

I see that, at first, a pod for the interpreter (RemoteInterpreterServer) is
created on the cluster by spark-submit from the Zeppelin host, with
deployMode=cluster (and this happens without errors); then the interpreter
itself runs another spark-submit (this time from the pod) with
deployMode=client.

This is exactly the command line submitted by the interpreter from its pod:

/opt/spark/bin/spark-submit \
--conf spark.driver.bindAddress= \
--deploy-mode client \
--properties-file /opt/spark/conf/spark.properties \
--class org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer \
spark-internal \
 \
 \
-

At this point, the interpreter Pod remains in "Running" state, while the Zeppelin note 
remains in "Pending" forever.

The log of the Interpreter (level = DEBUG) at the end only says:
   INFO [2021-10-25 18:16:58,229] ({RemoteInterpreterServer-Thread} 
RemoteInterpreterServer.java[run]:194) Launching ThriftServer at :
   INFO [2021-10-25 18:16:58,229] ({RegisterThread} 
RemoteInterpreterServer.java[run]:592) Start registration
   INFO [2021-10-25 18:16:58,332] ({RegisterThread} 
RemoteInterpreterServer.java[run]:606) Registering interpreter process
   INFO [2021-10-25 18:16:58,356] ({RegisterThread} 
RemoteInterpreterServer.java[run]:608) Registered