Possible to limit number of IPC retries on spark-submit?

2020-01-22 Thread Jeff Evans
Greetings,

Is it possible to limit the number of times the IPC client retries upon a
spark-submit invocation?  For context, see this StackOverflow post.
In essence, I am trying to call spark-submit on a Kerberized cluster,
without having valid Kerberos tickets available.  This is deliberate, and
I'm not truly facing a Kerberos issue.  Rather, this is the
easiest reproducible case of "long IPC retry" I have been able to trigger.

In this particular case, the following errors are printed (presumably by
the launcher):

20/01/22 15:49:32 INFO retry.RetryInvocationHandler:
java.io.IOException: Failed on local exception: java.io.IOException:
org.apache.hadoop.security.AccessControlException: Client cannot
authenticate via:[TOKEN, KERBEROS]; Host Details : local host is:
"node-1.cluster/172.18.0.2"; destination host is:
"node-1.cluster":8032; , while invoking
ApplicationClientProtocolPBClientImpl.getClusterMetrics over null
after 1 failover attempts. Trying to failover after sleeping for
35160ms.

This repeats 30 times before the launcher finally gives up.

As indicated in the answer on that StackOverflow post, the relevant Hadoop
properties should be ipc.client.connect.max.retries and/or
ipc.client.connect.max.retries.on.sasl.  However, in testing on Spark 2.4.0
(on CDH 6.1), I am not able to get either of these to take effect (it still
retries 30 times regardless).  I am trying the SparkPi example, and
specifying them with --conf spark.hadoop.ipc.client.connect.max.retries
and/or --conf spark.hadoop.ipc.client.connect.max.retries.on.sasl.
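
For reference, the invocation I'm using looks roughly like the following
(the jar path, master, and retry values are illustrative rather than my
exact command):

spark-submit \
  --master yarn \
  --deploy-mode client \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.hadoop.ipc.client.connect.max.retries=3 \
  --conf spark.hadoop.ipc.client.connect.max.retries.on.sasl=3 \
  /path/to/spark-examples_2.11-2.4.0.jar 100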

Any ideas on what I could be doing wrong, or why I can't get these
properties to take effect?


Re: Is there a way to get the final web URL from an active Spark context

2020-01-22 Thread Jeff Evans
To answer my own question, it turns out what I was after is the YARN
ResourceManager URL for the Spark application.  As alluded to in SPARK-20458, it's possible to use
the YARN API client to get this value.  Here is a gist that shows how it
can be done (given an instance of the Hadoop Configuration object):
https://gist.github.com/jeff303/8dab0e52dc227741b6605f576a317798
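
In rough outline, the approach is as follows (a sketch rather than the gist
verbatim; it assumes the Hadoop 2.8+/3.x YarnClient API, and the helper name
is just for illustration):

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.yarn.api.records.ApplicationId
import org.apache.hadoop.yarn.client.api.YarnClient

// Ask the YARN ResourceManager for the application's tracking URL, i.e. the
// proxied "final" URL that remains reachable after the application ends.
def yarnTrackingUrl(hadoopConf: Configuration, appIdStr: String): String = {
  val yarnClient = YarnClient.createYarnClient()
  yarnClient.init(hadoopConf)
  yarnClient.start()
  try {
    // appIdStr can come from sparkContext.applicationId
    val appId = ApplicationId.fromString(appIdStr)
    yarnClient.getApplicationReport(appId).getTrackingUrl
  } finally {
    yarnClient.stop()
  }
}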


On Fri, Jan 17, 2020 at 4:09 PM Jeff Evans wrote:

> Given a session/context, we can get the UI web URL like this:
>
> sparkSession.sparkContext.uiWebUrl
>
> This gives me something like http://node-name.cluster-name:4040.  If I
> open this from outside the cluster (e.g. my laptop), it redirects
> via HTTP 302 to something like
>
> http://node-name.cluster-name:8088/proxy/redirect/application_1579210019853_0023/
> For discussion purposes, call the latter one the "final web URL".
> Critically, this final URL is active even after the application
> terminates.  The original uiWebUrl
> (http://node-name.cluster-name:4040) is not available after the
> application terminates, so one has to capture the redirect in time in
> order to provide a persistent link to that history server UI entry
> (e.g. for debugging purposes).
>
> Is there a way, other than using some HTTP client, to detect what this
> final URL will be directly from the SparkContext?
>


Re: Problems during upgrade 2.2.2 -> 2.4.4

2020-01-22 Thread bsikander
After digging deeper, we found that the app/worker objects stored in
ZooKeeper cannot be deserialized, but the driver objects can.  Because of
this, the driver comes up (mysteriously).

The deserialization is failing on "RpcEndpointRef".

I hope somebody can point me to a solution now.






Problems during upgrade 2.2.2 -> 2.4.4

2020-01-22 Thread bsikander
A few details about the cluster:

- Current Version 2.2
- Resource manager: Spark standalone
- Modes: cluster + supervise
- HA setup: ZooKeeper (see the config sketch below)
- Expected version after upgrade: 2.4.4
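
For context, the standalone HA recovery settings in spark-defaults.conf look
roughly like this (hostnames and the ZNode path are illustrative, not our
exact values):

spark.deploy.recoveryMode    ZOOKEEPER
spark.deploy.zookeeper.url   zk1:2181,zk2:2181,zk3:2181
spark.deploy.zookeeper.dir   /spark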

Note: everything works fine both before the upgrade (all on 2.2) and after
it completes (all on 2.4.4); the problems appear during the upgrade itself.

During the upgrade, I see a number of issues:
- The Spark master on 2.4.4 tries to recover its state from ZooKeeper, fails
to deserialize the driver/app/worker objects, and throws
InvalidClassException.
- After failing to deserialize, the Spark master (2.4.4) deletes all the
information about drivers/apps/workers from ZooKeeper and loses all contact
with the running JVMs.
- Sometimes it mysteriously respawns the drivers, but with new IDs and no
knowledge of the old ones.  Sometimes multiple copies of the "same" driver
run at the same time with different IDs.
- Old Spark workers (2.2) fail to communicate with the new Spark master
(2.4.4).

I checked the release notes and couldn't find anything regarding upgrades.

Could someone please help me answer the questions above, and maybe point me
to some documentation regarding upgrades?  Or, if upgrades like this are not
supported, documentation which explains that would also be helpful.

Exceptions as seen on master:
2020-01-21 23:58:09,010 INFO dispatcher-event-loop-1-EventThread
org.apache.spark.deploy.master.ZooKeeperLeaderElectionAgent: We have gained
leadership
2020-01-21 23:58:09,073 ERROR dispatcher-event-loop-1
org.apache.spark.util.Utils: Exception encountered
java.io.InvalidClassException: org.apache.spark.rpc.RpcEndpointRef; local
class incompatible: stream classdesc serialVersionUID = 1835832137613908542,
local class serialVersionUID = -1329125091869941550
        at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:687)
        at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1880)
        at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1746)
        at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1880)
        at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1746)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2037)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1568)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2282)
        at java.io.ObjectInputStream.defaultReadObject(ObjectInputStream.java:558)
        at org.apache.spark.deploy.master.ApplicationInfo$$anonfun$readObject$1.apply$mcV$sp(ApplicationInfo.scala:55)
        at org.apache.spark.deploy.master.ApplicationInfo$$anonfun$readObject$1.apply(ApplicationInfo.scala:54)
        at org.apache.spark.deploy.master.ApplicationInfo$$anonfun$readObject$1.apply(ApplicationInfo.scala:54)
        at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1326)
        at org.apache.spark.deploy.master.ApplicationInfo.readObject(ApplicationInfo.scala:54)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1160)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2173)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2064)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1568)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:428)
        at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
        at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:108)
        at org.apache.spark.deploy.master.ZooKeeperPersistenceEngine.org$apache$spark$deploy$master$ZooKeeperPersistenceEngine$$deserializeFromFile(ZooKeeperPersistenceEngine.scala:76)
        at org.apache.spark.deploy.master.ZooKeeperPersistenceEngine$$anonfun$read$2.apply(ZooKeeperPersistenceEngine.scala:59)
        at org.apache.spark.deploy.master.ZooKeeperPersistenceEngine$$anonfun$read$2.apply(ZooKeeperPersistenceEngine.scala:59)
        at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
        at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
        at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
        at scala.collection.AbstractTraversable.flatMap(Traversable.scala:104)
        at org.apache.spark.deploy.master.ZooKeeperPersistenceEngine.read(ZooKeeperPersistenceEngine.scala:59)
        at