Re: Query regarding Proleptic Gregorian Calendar Spark3

2022-09-20 Thread Sachit Murarka
Reposting once.
Kind Regards,
Sachit Murarka


On Tue, Sep 20, 2022 at 6:56 PM Sachit Murarka wrote:
[...]


Query regarding Proleptic Gregorian Calendar Spark3

2022-09-20 Thread Sachit Murarka
Hi All,

I am getting the error below. I read the documentation and understood that we
need to set two properties:
spark.conf.set("spark.sql.parquet.int96RebaseModeInRead","CORRECTED")
spark.conf.set("spark.sql.parquet.int96RebaseModeInWrite","CORRECTED")

Is this the only way, or is there another way to handle this behaviour?
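
For reference, a minimal PySpark sketch of setting both rebase modes once when
the session is built, instead of calling spark.conf.set() afterwards. The app
name is a placeholder, and whether you want CORRECTED or LEGACY depends on who
will read the files, as the error text below explains:

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("int96-rebase-example")  # placeholder name
    # Applies to every INT96 Parquet read/write in this session.
    .config("spark.sql.parquet.int96RebaseModeInRead", "CORRECTED")
    .config("spark.sql.parquet.int96RebaseModeInWrite", "CORRECTED")
    .getOrCreate()
)

# Assumption to verify against your version's docs: Spark 3.2+ also accepts
# a per-read option, e.g.
#   spark.read.option("int96RebaseMode", "CORRECTED").parquet(path)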

Caused by: org.apache.spark.SparkUpgradeException: You may get a different
result due to the upgrading of Spark 3.0: writing dates before 1582-10-15
or timestamps before 1900-01-01T00:00:00Z into Parquet INT96 files can be
dangerous, as the files may be read by Spark 2.x or legacy versions of Hive
later, which uses a legacy hybrid calendar that is different from Spark
3.0+'s Proleptic Gregorian calendar. See more details in SPARK-31404. You
can set spark.sql.parquet.int96RebaseModeInWrite to 'LEGACY' to rebase the
datetime values w.r.t. the calendar difference during writing, to get
maximum interoperability. Or set spark.sql.parquet.int96RebaseModeInWrite
to 'CORRECTED' to write the datetime values as it is, if you are 100% sure
that the written files will only be read by Spark 3.0+ or other systems
that use Proleptic Gregorian calendar.


Kind Regards,
Sachit Murarka


Error - Spark STREAMING

2022-09-20 Thread Akash Vellukai
Hello,


  py4j.protocol.Py4JJavaError: An error occurred while calling o80.load. :
java.lang.NoClassDefFoundError:
org/apache/spark/sql/internal/connector/SimpleTableProvider


Could anyone help me solve this issue?


Thanks and regards
Akash


Re: Issue with SparkContext

2022-09-20 Thread javacaoyu
Are you using PySpark?


If so, you can try setting the PYSPARK_PYTHON and SPARK_HOME environment
variables. Example:


import os

# Both values are placeholders; set them before the SparkContext is created.
os.environ['PYSPARK_PYTHON'] = "python path"  # path to the Python executable
os.environ['SPARK_HOME'] = "SPARK path"       # path to the Spark installation


You can try this code; it may resolve the issue.
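
A runnable sketch of the same idea (the paths are hypothetical and must be
adjusted to your machine; the point is that both variables are set before the
SparkContext is created):

import os

os.environ['PYSPARK_PYTHON'] = "/usr/bin/python3"  # hypothetical path
os.environ['SPARK_HOME'] = "/opt/spark"            # hypothetical path

from pyspark import SparkContext

sc = SparkContext(appName="env-check")
print(sc.parallelize(range(10)).sum())  # prints 45 if the context works
sc.stop()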


On 2022-09-20 17:34, Bjørn Jørgensen wrote:
[...]

Re: Issue with SparkContext

2022-09-20 Thread Bjørn Jørgensen
Hi, we have a user group at user@spark.apache.org

You must install a Java JRE.

If you are on Ubuntu you can type
apt-get install openjdk-17-jre-headless
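
A quick sketch, using only the Python standard library, to check that a JRE is
actually visible to PySpark after installing it:

import shutil
import subprocess

# PySpark needs the java binary on PATH (or JAVA_HOME pointing at a JRE/JDK).
print(shutil.which("java"))            # None means no java on PATH
subprocess.run(["java", "-version"])   # prints the installed Java version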

On Tue, 20 Sep 2022 at 06:15, yogita bhardwaj <yogita.bhard...@iktara.ai>
wrote:

>
>
> I am getting the py4j.protocol.Py4JJavaError while running SparkContext.
> Can you please help me to resolve this issue.
>
>
>
> Sent from Mail for Windows
>
>
>


-- 
Bjørn Jørgensen
Vestre Aspehaug 4, 6010 Ålesund
Norge

+47 480 94 297


Re: Re: [how to]RDD using JDBC data source in PySpark

2022-09-20 Thread Bjørn Jørgensen
There is a PR for this now: [SPARK-40491][SQL] Expose a jdbcRDD function in
SparkContext.
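
Until that lands, a possible workaround sketch for the RDD-only constraint
discussed in the quoted thread below: parallelize key ranges on the driver and
read each range with a Python database driver inside mapPartitions. Every
specific name here is an assumption for illustration only: the users table,
its numeric id column, the connection details, and the pymysql driver (which
would need to be installed on the executors).

from pyspark.sql import SparkSession

def read_range(bounds):
    # Runs on the executors: one connection per partition.
    import pymysql  # hypothetical driver choice; any DB-API driver works
    conn = pymysql.connect(host="mysql-host", user="user",
                           password="secret", database="db")
    try:
        with conn.cursor() as cur:
            for lo, hi in bounds:
                cur.execute("SELECT id, name FROM users "
                            "WHERE id >= %s AND id < %s", (lo, hi))
                for row in cur.fetchall():
                    yield row
    finally:
        conn.close()

spark = SparkSession.builder.getOrCreate()

# Ten non-overlapping id ranges, one per partition.
ranges = [(i, i + 1000) for i in range(0, 10000, 1000)]
rows = spark.sparkContext.parallelize(ranges, len(ranges)).mapPartitions(read_range)
print(rows.take(5))

When the DataFrame API is an option, spark.read.format("jdbc") followed by
.rdd is usually the simpler route.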

On Mon, 19 Sep 2022 at 12:47, javaca...@163.com wrote:

> Thank you Bjørn Jørgensen, and thanks also to Sean Owen.
>
> DataFrame with .format("jdbc") is a good way to resolve it.
> But for some reasons I can't use the DataFrame API; I can only use the RDD
> API in PySpark.
> ...T_T...
>
> Thanks for all your help, but I still need a new idea to resolve it. XD
>
>
>
>
>
> --
> javaca...@163.com
>
>
> From: Bjørn Jørgensen
> Sent: 2022-09-19 18:34
> To: javaca...@163.com
> Cc: Xiao, Alton; user@spark.apache.org
> Subject: Re: Re: [how to]RDD using JDBC data source in PySpark
> https://www.projectpro.io/recipes/save-dataframe-mysql-pyspark
> and
> https://towardsdatascience.com/pyspark-mysql-tutorial-fa3f7c26dc7
>
> On Mon, 19 Sep 2022 at 12:29, javaca...@163.com wrote:
>
>> Thank you for the answer, Alton.
>>
>> But I see that it is implemented in Scala.
>> I know Java/Scala can get data from MySQL using JdbcRDD fairly well,
>> but I want the same capability in PySpark.
>>
>> Would you give me more advice? Many thanks.
>>
>>
>> --
>> javaca...@163.com
>>
>>
>> From: Xiao, Alton
>> Sent: 2022-09-19 18:04
>> To: javaca...@163.com; user@spark.apache.org
>> Subject: Re: [how to]RDD using JDBC data source in PySpark
>>
>> Hi javacaoyu:
>>
>> https://hevodata.com/learn/spark-mysql/#Spark-MySQL-Integration
>>
>> I think Spark has already integrated MySQL.
>>
>>
>>
>> From: javaca...@163.com
>> Date: Monday, 19 September 2022 17:53
>> To: user@spark.apache.org
>> Subject: [how to]RDD using JDBC data source in PySpark
>>
>>
>> Hi guys:
>>
>> Is there some way to let an RDD use a JDBC data source in PySpark?
>>
>> I want to get data from MySQL, but PySpark does not support a JdbcRDD
>> the way Java/Scala do.
>> I searched the docs on the website and found no answer.
>>
>> So I need your help. Thank you very much.
>>
>>
>> --
>>
>> javaca...@163.com
>>
>>

-- 
Bjørn Jørgensen
Vestre Aspehaug 4, 6010 Ålesund
Norge

+47 480 94 297


Re: NoClassDefError and SparkSession should only be created and accessed on the driver.

2022-09-20 Thread Paul Rogalinski
Hi Rajat,


I have been facing a similar problem recently and could solve it by moving the
UDF implementation into a dedicated class instead of having it implemented in
the driver class/object.


Regards,
Paul.

On Tuesday 20 September 2022 10:11:31 (+02:00), rajat kumar wrote:
[...]

Re: NoClassDefError and SparkSession should only be created and accessed on the driver.

2022-09-20 Thread rajat kumar
Hi Alton, it's in the same Scala class only. Is there any change in Spark 3
that serializes it separately?

Regards
Rajat

On Tue, Sep 20, 2022, 13:35 Xiao, Alton wrote:
[...]

Re: NoClassDefError and SparkSession should only be created and accessed on the driver.

2022-09-20 Thread Xiao, Alton
Can you show us your code?
In my opinion your UDF wasn't serialized by Spark. Was it outside of the
Spark running code?

From: rajat kumar
Date: Tuesday, 20 September 2022 15:58
To: user @spark
Subject: NoClassDefError and SparkSession should only be created and accessed
on the driver.
[...]

NoClassDefError and SparkSession should only be created and accessed on the driver.

2022-09-20 Thread rajat kumar
Hello,

I am using Spark 3 with some UDFs. I am using DataFrame APIs to write Parquet
using Spark, and I am getting a NoClassDefError along with the error below.

If I comment out all UDFs, it works fine.

Could someone suggest what could be wrong? It was working fine in Spark 2.4.

22/09/20 06:33:17 WARN TaskSetManager: Lost task 9.0 in stage 1.0 (TID 10)
(vm-36408481 executor 2): java.lang.ExceptionInInitializerError
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at java.lang.invoke.SerializedLambda.readResolve(SerializedLambda.java:230)
at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at java.io.ObjectStreamClass.invokeReadResolve(ObjectStreamClass.java:1274)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)
at java.io.ObjectInputStream.readArray(ObjectInputStream.java:2119)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1657)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)
at java.io.ObjectInputStream.readArray(ObjectInputStream.java:2119)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1657)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)
at java.io.ObjectInputStream.readArray(ObjectInputStream.java:2119)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1657)
at java.io.ObjectInputStream.readArray(ObjectInputStream.java:2119)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1657)
at java.io.ObjectInputStream.readArray(ObjectInputStream.java:2119)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1657)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)
at java.io.ObjectInputStream.readArray(ObjectInputStream.java:2119)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1657)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:503)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:461)
at
org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:76)
at
org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:115)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:83)
at org.apache.spark.scheduler.Task.run(Task.scala:131)
at
org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
at