It silently allowed the object to serialize, though the serialized/deserialized session would not work. Now it explicitly fails.
On Mon, Jan 2, 2023 at 9:43 AM Shrikant Prasad <shrikant....@gmail.com> wrote:

> That's right. But the serialization would be happening in Spark 2.3 also,
> so why don't we see this error there?
>
> On Mon, 2 Jan 2023 at 9:09 PM, Sean Owen <sro...@gmail.com> wrote:
>
>> Oh, it's because you are defining "spark" within your driver object, and
>> then it's getting serialized because you are trying to use TestMain
>> methods in your program.
>> This was never correct, but now it's an explicit error in Spark 3. The
>> session should not be a member variable.
>>
>> On Mon, Jan 2, 2023 at 9:24 AM Shrikant Prasad <shrikant....@gmail.com>
>> wrote:
>>
>>> Please see these logs. The error is thrown in the executor:
>>>
>>> 23/01/02 15:14:44 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
>>> java.lang.ExceptionInInitializerError
>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>     at java.lang.reflect.Method.invoke(Method.java:498)
>>>     at java.lang.invoke.SerializedLambda.readResolve(SerializedLambda.java:230)
>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>     at java.lang.reflect.Method.invoke(Method.java:498)
>>>     at java.io.ObjectStreamClass.invokeReadResolve(ObjectStreamClass.java:1274)
>>>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2196)
>>>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1667)
>>>     at java.io.ObjectInputStream.readArray(ObjectInputStream.java:2093)
>>>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1655)
>>>     at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2405)
>>>     at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2329)
>>>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2187)
>>>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1667)
>>>     at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2405)
>>>     at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2329)
>>>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2187)
>>>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1667)
>>>     at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2405)
>>>     at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2329)
>>>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2187)
>>>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1667)
>>>     at java.io.ObjectInputStream.readObject(ObjectInputStream.java:503)
>>>     at java.io.ObjectInputStream.readObject(ObjectInputStream.java:461)
>>>     at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:76)
>>>     at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:115)
>>>     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:83)
>>>     at org.apache.spark.scheduler.Task.run(Task.scala:127)
>>>     at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:446)
>>>     at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
>>>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:449)
>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>>>     at java.lang.Thread.run(Thread.java:748)
>>> Caused by: org.apache.spark.SparkException: A master URL must be set in your configuration
>>>     at org.apache.spark.SparkContext.<init>(SparkContext.scala:385)
>>>     at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2574)
>>>     at org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$2(SparkSession.scala:934)
>>>     at scala.Option.getOrElse(Option.scala:189)
>>>     at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:928)
>>>     at TestMain$.<init>(TestMain.scala:12)
>>>     at TestMain$.<clinit>(TestMain.scala)
>>>
>>> On Mon, 2 Jan 2023 at 8:29 PM, Sean Owen <sro...@gmail.com> wrote:
>>>
>>>> It's not running on the executor; that's not the issue. See your stack
>>>> trace, where it clearly happens in the driver.
>>>>
>>>> On Mon, Jan 2, 2023 at 8:58 AM Shrikant Prasad <shrikant....@gmail.com>
>>>> wrote:
>>>>
>>>>> Even if I set the master as yarn, it will not have access to the rest
>>>>> of the Spark confs. It will need spark.yarn.app.id.
>>>>>
>>>>> The main issue is: if it's working as is in Spark 2.3, why is it not
>>>>> working in Spark 3, i.e. why is the session getting created on the
>>>>> executor? Another thing we tried, just for debugging, is removing the
>>>>> DataFrame-to-RDD conversion, and then it works in Spark 3.
>>>>>
>>>>> So it might be something to do with the DataFrame-to-RDD conversion,
>>>>> or a serialization behavior change from Spark 2.3 to 3.0, if there is
>>>>> any. But we couldn't find the root cause.
>>>>>
>>>>> Regards,
>>>>> Shrikant
>>>>>
>>>>> On Mon, 2 Jan 2023 at 7:54 PM, Sean Owen <sro...@gmail.com> wrote:
>>>>>
>>>>>> So call .setMaster("yarn"), per the error
>>>>>>
>>>>>> On Mon, Jan 2, 2023 at 8:20 AM Shrikant Prasad
>>>>>> <shrikant....@gmail.com> wrote:
>>>>>>
>>>>>>> We are running it in cluster deploy mode with yarn.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Shrikant
>>>>>>>
>>>>>>> On Mon, 2 Jan 2023 at 6:15 PM, Stelios Philippou <stevo...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Can we see your Spark configuration parameters?
>>>>>>>>
>>>>>>>> The master URL refers to, as per Java,
>>>>>>>> new SparkConf()....setMaster("local[*]")
>>>>>>>> according to where you want to run this.
>>>>>>>>
>>>>>>>> On Mon, 2 Jan 2023 at 14:38, Shrikant Prasad
>>>>>>>> <shrikant....@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I am trying to migrate one Spark application from Spark 2.3 to
>>>>>>>>> 3.0.1.
>>>>>>>>>
>>>>>>>>> The issue can be reproduced using the below sample code:
>>>>>>>>>
>>>>>>>>> object TestMain {
>>>>>>>>>
>>>>>>>>>   val session =
>>>>>>>>>     SparkSession.builder().appName("test").enableHiveSupport().getOrCreate()
>>>>>>>>>
>>>>>>>>>   def main(args: Array[String]): Unit = {
>>>>>>>>>     import session.implicits._
>>>>>>>>>     val a = session.sparkContext.parallelize(Array(("A",1),("B",2)))
>>>>>>>>>       .toDF("_c1","_c2").rdd.map(x => x(0).toString).collect()
>>>>>>>>>     println(a.mkString("|"))
>>>>>>>>>   }
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> It runs successfully in Spark 2.3 but fails in Spark 3.0.1 with
>>>>>>>>> the below exception:
>>>>>>>>>
>>>>>>>>> Caused by: org.apache.spark.SparkException: A master URL must be
>>>>>>>>> set in your configuration
>>>>>>>>>     at org.apache.spark.SparkContext.<init>(SparkContext.scala:394)
>>>>>>>>>     at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2690)
>>>>>>>>>     at org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$2(SparkSession.scala:949)
>>>>>>>>>     at scala.Option.getOrElse(Option.scala:189)
>>>>>>>>>     at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:943)
>>>>>>>>>     at TestMain$.<init>(TestMain.scala:7)
>>>>>>>>>     at TestMain$.<clinit>(TestMain.scala)
>>>>>>>>>
>>>>>>>>> From the exception it appears that Spark 3 tries to create the
>>>>>>>>> Spark session on the executor as well, whereas it is not created
>>>>>>>>> again on the executor in Spark 2.3.
>>>>>>>>>
>>>>>>>>> Can anyone help in identifying why there is this change in
>>>>>>>>> behavior?
>>>>>>>>>
>>>>>>>>> Thanks and Regards,
>>>>>>>>> Shrikant
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Regards,
>>>>>>>>> Shrikant Prasad
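[Editor's note appended to the archived thread] The fix Sean describes, creating the session inside main rather than as a member of the driver object, can be sketched as below. This is an illustrative rewrite of the TestMain repro from the thread, not code posted by any participant:

```scala
import org.apache.spark.sql.SparkSession

object TestMain {

  def main(args: Array[String]): Unit = {
    // The session is a local val inside main, not a field of the object.
    // A local val is not part of TestMain$'s serialized state, so when an
    // executor deserializes the map lambda below, it no longer needs to
    // initialize TestMain$ (the TestMain$.<clinit> frame in the trace),
    // and therefore never calls getOrCreate() on a node where no master
    // URL is configured.
    val session = SparkSession.builder()
      .appName("test")
      .enableHiveSupport()
      .getOrCreate()

    import session.implicits._

    val a = session.sparkContext
      .parallelize(Array(("A", 1), ("B", 2)))
      .toDF("_c1", "_c2")
      .rdd
      .map(x => x(0).toString) // closure captures nothing from TestMain$
      .collect()

    println(a.mkString("|"))
  }
}
```

The same rule applies to any object that reaches the executors through a closure: keep SparkSession and SparkContext out of its fields, and pass plain data (or create them lazily inside main) instead.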