Re: Spark migration from 2.3 to 3.0.1
I agree with you that it's not the recommended approach, but I just want to understand which change caused this change in behavior. If you can point me to a JIRA in which this change was made, that would be greatly appreciated.

Regards,
Shrikant

On Mon, 2 Jan 2023 at 9:46 PM, Sean Owen wrote:
> Not true, you've never been able to use the SparkSession inside a Spark
> task. You aren't actually using it, if the application worked in Spark 2.x.
> Now, you need to avoid accidentally serializing it, which was the right
> thing to do even in Spark 2.x. Just move the session inside main(), not a
> member.
> Or what other explanation do you have? I don't understand.
Re: Spark migration from 2.3 to 3.0.1
Not true, you've never been able to use the SparkSession inside a Spark task. You aren't actually using it, if the application worked in Spark 2.x. Now, you need to avoid accidentally serializing it, which was the right thing to do even in Spark 2.x. Just move the session inside main(), not a member. Or what other explanation do you have? I don't understand.

On Mon, Jan 2, 2023 at 10:10 AM Shrikant Prasad wrote:
> If that was the case and the deserialized session would not work, the
> application would not have worked.
>
> As per the logs and debug prints, in Spark 2.3 the main object is not
> getting deserialized in the executor; otherwise it would have failed then also.
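In code, the change Sean suggests is small. Here is a minimal sketch of the TestMain repro from this thread with the session as a local value in main(), so the task closure never drags the enclosing object along:

import org.apache.spark.sql.SparkSession

object TestMain {

  def main(args: Array[String]): Unit = {
    // Build the session inside main(), so it exists only on the driver
    // and is not part of any object initializer an executor might run.
    val session = SparkSession.builder()
      .appName("test")
      .enableHiveSupport()
      .getOrCreate()
    import session.implicits._

    val a = session.sparkContext
      .parallelize(Array(("A", 1), ("B", 2)))
      .toDF("_c1", "_c2")
      .rdd
      .map(x => x(0).toString) // deserializing this closure no longer builds a session
      .collect()

    println(a.mkString("|"))
  }
}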
Re: Spark migration from 2.3 to 3.0.1
If that was the case and the deserialized session would not work, the application would not have worked.

As per the logs and debug prints, in Spark 2.3 the main object is not getting deserialized in the executor; otherwise it would have failed then also.

On Mon, 2 Jan 2023 at 9:15 PM, Sean Owen wrote:
> It silently allowed the object to serialize, though the
> serialized/deserialized session would not work. Now it explicitly fails.
Re: Spark migration from 2.3 to 3.0.1
It silently allowed the object to serialize, though the serialized/deserialized session would not work. Now it explicitly fails.

On Mon, Jan 2, 2023 at 9:43 AM Shrikant Prasad wrote:
> That's right. But the serialization would be happening in Spark 2.3 also;
> why don't we see this error there?
Re: Spark migration from 2.3 to 3.0.1
That's right. But the serialization would be happening in Spark 2.3 also; why don't we see this error there?

On Mon, 2 Jan 2023 at 9:09 PM, Sean Owen wrote:
> Oh, it's because you are defining "spark" within your driver object, and
> then it's getting serialized because you are trying to use TestMain methods
> in your program.
> This was never correct, but now it's an explicit error in Spark 3. The
> session should not be a member variable.
Re: Spark migration from 2.3 to 3.0.1
Oh, it's because you are defining "spark" within your driver object, and then it's getting serialized because you are trying to use TestMain methods in your program. This was never correct, but now it's an explicit error in Spark 3. The session should not be a member variable.

On Mon, Jan 2, 2023 at 9:24 AM Shrikant Prasad wrote:
> Please see these logs.
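To spell out the mechanism the thread converges on (this is a reading of the executor trace posted below, not an official changelog or JIRA reference): the task's lambda is restored via java.lang.invoke.SerializedLambda.readResolve, which forces class initialization of the capturing TestMain$ object on the executor, and the object initializer is exactly where getOrCreate() lives, hence "A master URL must be set". Spark 3 builds against Scala 2.12, where closures compile to Java lambdas; under Spark 2.3 / Scala 2.11 they compiled to anonymous classes, which would explain why the initializer never ran on executors there. The annotated shape of the problem, as a sketch:

import org.apache.spark.sql.SparkSession

object TestMain {
  // Object body: compiled into TestMain$.<init>, run the first time
  // TestMain$ is initialized -- on whichever JVM that happens to be.
  val session = SparkSession.builder().appName("test")
    .enableHiveSupport().getOrCreate() // needs a master URL; executors have none

  def main(args: Array[String]): Unit = {
    import session.implicits._
    // The lambda below is captured by TestMain$; deserializing it on an
    // executor (SerializedLambda.readResolve in the trace) initializes
    // TestMain$ there and re-runs the getOrCreate() above.
    val a = session.sparkContext.parallelize(Array(("A", 1), ("B", 2)))
      .toDF("_c1", "_c2").rdd.map(x => x(0).toString).collect()
    println(a.mkString("|"))
  }
}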
Re: Spark migration from 2.3 to 3.0.1
Please see these logs. The error is thrown in the executor:

23/01/02 15:14:44 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)

java.lang.ExceptionInInitializerError
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at java.lang.invoke.SerializedLambda.readResolve(SerializedLambda.java:230)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at java.io.ObjectStreamClass.invokeReadResolve(ObjectStreamClass.java:1274)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2196)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1667)
        at java.io.ObjectInputStream.readArray(ObjectInputStream.java:2093)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1655)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2405)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2329)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2187)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1667)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2405)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2329)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2187)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1667)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2405)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2329)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2187)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1667)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:503)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:461)
        at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:76)
        at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:115)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:83)
        at org.apache.spark.scheduler.Task.run(Task.scala:127)
        at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:446)
        at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:449)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.spark.SparkException: A master URL must be set in your configuration
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:385)
        at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2574)
        at org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$2(SparkSession.scala:934)
        at scala.Option.getOrElse(Option.scala:189)
        at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:928)
        at TestMain$.<init>(TestMain.scala:12)
        at TestMain$.<clinit>(TestMain.scala)

On Mon, 2 Jan 2023 at 8:29 PM, Sean Owen wrote:
> It's not running on the executor; that's not the issue. See your stack
> trace, where it clearly happens in the driver.
Re: Spark migration from 2.3 to 3.0.1
It's not running on the executor; that's not the issue. See your stack trace, where it clearly happens in the driver.

On Mon, Jan 2, 2023 at 8:58 AM Shrikant Prasad wrote:
> Even if I set the master as yarn, it will not have access to the rest of
> the Spark confs. It will need spark.yarn.app.id.
>
> The main issue is: if it works as-is in Spark 2.3, why does it not work in
> Spark 3, i.e. why is the session getting created on the executor?
> Another thing we tried is removing the df-to-rdd conversion just for
> debug, and it works in Spark 3.
Re: Spark migration from 2.3 to 3.0.1
Even if I set the master as yarn, it will not have access to the rest of the Spark confs. It will need spark.yarn.app.id.

The main issue is: if it works as-is in Spark 2.3, why does it not work in Spark 3, i.e. why is the session getting created on the executor? Another thing we tried is removing the df-to-rdd conversion just for debug, and it works in Spark 3.

So, it might be something to do with the df-to-rdd conversion, or a serialization behavior change from Spark 2.3 to Spark 3.0 if there is any. But we couldn't find the root cause.

Regards,
Shrikant

On Mon, 2 Jan 2023 at 7:54 PM, Sean Owen wrote:
> So call .setMaster("yarn"), per the error.
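For anyone following along, the debug variant described above (dropping the RDD conversion) would look roughly like the sketch below. The object name is hypothetical; the point is that collecting through the DataFrame API ships no user-defined Scala closure to the executors:

import org.apache.spark.sql.SparkSession

object TestMainDebug { // hypothetical name, illustration only
  def main(args: Array[String]): Unit = {
    val session = SparkSession.builder().appName("test").getOrCreate()
    import session.implicits._

    // Stay in the DataFrame API; collect Rows to the driver and map there,
    // so no lambda of ours is serialized into the task.
    val rows = session.sparkContext
      .parallelize(Array(("A", 1), ("B", 2)))
      .toDF("_c1", "_c2")
      .collect()                       // Array[Row], materialized on the driver
    val a = rows.map(_.getString(0))   // driver-side map, nothing shipped
    println(a.mkString("|"))
  }
}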
Re: Spark migration from 2.3 to 3.0.1
So call .setMaster("yarn"), per the error.

On Mon, Jan 2, 2023 at 8:20 AM Shrikant Prasad wrote:
> We are running it in cluster deploy mode with yarn.
Re: Spark migration from 2.3 to 3.0.1
We are running it in cluster deploy mode with yarn.

Regards,
Shrikant

On Mon, 2 Jan 2023 at 6:15 PM, Stelios Philippou wrote:
> Can we see your Spark configuration parameters?
>
> The master URL refers to where you want to run this; in Java, for example:
> new SparkConf().setMaster("local[*]")
Re: Spark migration from 2.3 to 3.0.1
Can we see your Spark configuration parameters?

The master URL refers to where you want to run this; in Java, for example:

    new SparkConf().setMaster("local[*]")

On Mon, 2 Jan 2023 at 14:38, Shrikant Prasad wrote:
> Hi,
>
> I am trying to migrate one Spark application from Spark 2.3 to 3.0.1.
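As a minimal sketch of the two usual ways to supply the master (illustrative only, assuming nothing about this particular job): hard-code it only for local runs, and in cluster mode leave it out of the code and let spark-submit provide it.

import org.apache.spark.sql.SparkSession

object MasterConfigExample { // illustrative only
  def main(args: Array[String]): Unit = {
    // Local testing: set the master explicitly in code.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("test")
      .getOrCreate()

    // On YARN, omit .master(...) above and submit with:
    //   spark-submit --master yarn --deploy-mode cluster ... app.jar
    spark.stop()
  }
}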
Spark migration from 2.3 to 3.0.1
Hi,

I am trying to migrate one Spark application from Spark 2.3 to 3.0.1.

The issue can be reproduced using the sample code below:

object TestMain {

  val session =
    SparkSession.builder().appName("test").enableHiveSupport().getOrCreate()

  def main(args: Array[String]): Unit = {

    import session.implicits._
    val a = session.sparkContext.parallelize(Array(("A",1),("B",2)))
      .toDF("_c1","_c2").rdd.map(x => x(0).toString).collect()
    println(a.mkString("|"))

  }
}

It runs successfully in Spark 2.3 but fails with Spark 3.0.1 with the exception below:

Caused by: org.apache.spark.SparkException: A master URL must be set in your configuration
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:394)
        at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2690)
        at org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$2(SparkSession.scala:949)
        at scala.Option.getOrElse(Option.scala:189)
        at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:943)
        at TestMain$.<init>(TestMain.scala:7)
        at TestMain$.<clinit>(TestMain.scala)

From the exception it appears that Spark 3 tries to create the Spark session on the executor as well, whereas it is not created again on the executor in Spark 2.3.

Can anyone help in identifying why there is this change in behavior?

Thanks and Regards,
Shrikant

--
Regards,
Shrikant Prasad
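One simple way to get the kind of "debug prints" mentioned elsewhere in this thread, for anyone who wants to reproduce the 2.3-vs-3.0 comparison, is a println at the top of the object body (sketch only; the hostname lookup is just to tell driver and executor logs apart):

import java.net.InetAddress
import org.apache.spark.sql.SparkSession

object TestMain {
  // If this line shows up in an executor's stderr, the object was
  // initialized there; if it appears only in the driver log, it was not.
  println(s"TestMain init on ${InetAddress.getLocalHost.getHostName}")

  val session =
    SparkSession.builder().appName("test").enableHiveSupport().getOrCreate()

  def main(args: Array[String]): Unit = {
    // ... repro body as above ...
  }
}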