Re: Data analysis issues

2023-11-02 Thread Mich Talebzadeh
Hi,

Your mileage may vary, so to speak. Whether the data you analyze in Spark
through RStudio can be seen by Spark's back-end depends on how you deploy
Spark and RStudio. If you deploy Spark and RStudio on your own premises or
in a private cloud environment, then the data you use will only be
accessible to the roles that have access to your environment. However, if
you are using a managed Spark service such as Google Dataproc or Amazon
EMR, then the data you use may be accessible to the service's back-end,
because managed Spark services typically store your data on their own
servers. Try using encryption combined with RBAC (who can access what) to
protect your data privacy, and be aware of the security risks associated
with any third-party libraries you deploy.
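
As a minimal sketch, assuming a self-managed deployment where you control
spark-defaults.conf (the RBAC side lives in your platform, e.g. IAM or
Ranger, not in Spark itself), the built-in encryption knobs look like this:

    # spark-defaults.conf -- illustrative security settings, not a complete policy
    spark.authenticate            true   # shared-secret authentication between Spark processes
    spark.network.crypto.enabled  true   # encrypt RPC traffic between driver and executors
    spark.io.encryption.enabled   true   # encrypt local shuffle/spill files on disk
    spark.ssl.enabled             true   # SSL for the Spark UI and other HTTP endpoints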

HTH

Mich Talebzadeh,
Distinguished Technologist, Solutions Architect & Engineer
London
United Kingdom


https://en.everybodywiki.com/Mich_Talebzadeh



*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




On Thu, 2 Nov 2023 at 22:46, Jauru Lin  wrote:

> Hello all,
>
> I have a question about Apache Spark,
> I would like to ask if I use Rstudio to connect to Spark to analyze data,
> will the data I use be seen by Spark's back-end personnel?
>
> Hope someone can solve my problem.
> Thanks!
>


Re: Spark / Scala conflict

2023-11-02 Thread Harry Jamison
Thanks Alonso,
I think this gives me some ideas.

My code is written in Python, and I use spark-submit to submit it.
I am not sure what code is written in Scala. Maybe the Phoenix driver, based
on the stack trace?
How do I tell which version of Scala it was compiled against?

Is there a jar that I need to add to the spark or hbase classpath?




On Thursday, November 2, 2023 at 01:38:21 AM PDT, Aironman DirtDiver 
 wrote: 





The error message Caused by: java.lang.ClassNotFoundException: 
scala.Product$class indicates that the Spark job is trying to load a class that 
is not available in the classpath. This can happen if the Spark job is compiled 
with a different version of Scala than the version of Scala that is used to run 
the job.
You have mentioned that you are using Spark 3.5.0, which is compatible with 
Scala 2.12. However, you have also mentioned that you have tried Scala versions 
2.10, 2.11, 2.12, and 2.13. This suggests that you may have multiple versions 
of Scala installed on your system.
To resolve the issue, you need to make sure that the Spark job is compiled and 
run with the same version of Scala. You can do this by setting the 
SPARK_SCALA_VERSION environment variable to the desired Scala version before 
starting the Spark job.
For example, to compile the Spark job with Scala 2.12, you would run the 
following command:
SPARK_SCALA_VERSION=2.12 sbt compile

To run the Spark job with Scala 2.12, you would run the following command:
SPARK_SCALA_VERSION=2.12 spark-submit spark-job.jar

If you are using Databricks, you can set the Scala version for the Spark 
cluster in the cluster creation settings.
Once you have ensured that the Spark job is compiled and run with the same 
version of Scala, the error should be resolved.
Here are some additional tips for troubleshooting Scala version conflicts:
* Make sure that you are using the correct version of the Spark libraries. 
The Spark libraries must be compiled with the same version of Scala as the 
Spark job.
* If you are using a third-party library, make sure that it is compatible 
with the version of Scala that you are using.
* Check the Spark logs for any ClassNotFoundExceptions. The logs may 
indicate the specific class that is missing from the classpath.
* Use a tool like sbt dependencyTree (or mvn dependency:tree) to view the 
dependencies of your Spark job. This can help you identify any conflicting 
dependencies.

On Thu, 2 Nov 2023 at 5:39, Harry Jamison 
() wrote:
> I am getting the error below when I try to run a spark job connecting to 
> Phoenix.  It seems like I have the incorrect Scala version that some part of 
> the code is expecting.
> 
> I am using spark 3.5.0, and I have copied these phoenix jars into the spark 
> lib
> phoenix-server-hbase-2.5-5.1.3.jar  
> phoenix-spark-5.0.0-HBase-2.0.jar
> 
> I have tried scala 2.10, 2.11, 2.12, and 2.13
> I do not see the Scala version in the logs, so I am not 100% sure it is 
> using the version I expect.
> 
> 
> Here is the exception that I am getting
> 
> 2023-11-01T16:13:00,391 INFO  [Thread-4] handler.ContextHandler: Started 
> o.s.j.s.ServletContextHandler@15cd3b2a{/static/sql,null,AVAILABLE,@Spark}
> Traceback (most recent call last):
>   File "/hadoop/spark/spark-3.5.0-bin-hadoop3/copy_tables.py", line 10, in 
> 
> .option("zkUrl", "namenode:2181").load()
>   File 
> "/hadoop/spark/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 
> 314, in load
>   File 
> "/hadoop/spark/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", 
> line 1322, in __call__
>   File 
> "/hadoop/spark/spark/python/lib/pyspark.zip/pyspark/errors/exceptions/captured.py",
>  line 179, in deco
>   File 
> "/hadoop/spark/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/protocol.py", line 
> 326, in get_return_value
> py4j.protocol.Py4JJavaError: An error occurred while calling o28.load.
> : java.lang.NoClassDefFoundError: scala/Product$class
> at 
> org.apache.phoenix.spark.PhoenixRelation.(PhoenixRelation.scala:29)
> at 
> org.apache.phoenix.spark.DefaultSource.createRelation(DefaultSource.scala:29)
> at 
> org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:346)
> at 
> org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:229)
> at 
> org.apache.spark.sql.DataFrameReader.$anonfun$load$2(DataFrameReader.scala:211)
> at scala.Option.getOrElse(Option.scala:189)
> at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:211)
> at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:172)
> at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method)
> at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at 

RE: jackson-databind version mismatch

2023-11-02 Thread moshik.vitas
Thanks for replying,

 

The issue was the import of spring-boot-dependencies in my dependencyManagement
pom, which forced an invalid jar version.

Removing that section gave valid Spark dependencies.
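
For reference, a hypothetical fragment of the kind that causes this: importing
a Spring Boot BOM in dependencyManagement pins jackson-databind to Spring
Boot's version, overriding the Jackson version Spark expects (the version
below is illustrative, not the one from my build):

    <dependencyManagement>
      <dependencies>
        <!-- This import forces Spring Boot's Jackson version onto the build;
             removing it lets Spark's own Jackson dependencies win. -->
        <dependency>
          <groupId>org.springframework.boot</groupId>
          <artifactId>spring-boot-dependencies</artifactId>
          <version>2.3.12.RELEASE</version>
          <type>pom</type>
          <scope>import</scope>
        </dependency>
      </dependencies>
    </dependencyManagement>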

 

Regards,
Moshik Vitas

 

From: Bjørn Jørgensen  
Sent: Thursday, 2 November 2023 10:40
To: eab...@163.com
Cc: user @spark ; Saar Barhoom ; 
moshik.vi...@veeva.com
Subject: Re: jackson-databind version mismatch

 

[SPARK-43225][BUILD][SQL] Remove jackson-core-asl and jackson-mapper-asl from 
pre-built distribution  

 

On Thu, 2 Nov 2023 at 09:15, Bjørn Jørgensen <bjornjorgen...@gmail.com> wrote:

In Spark 3.5.0, jackson-core-asl and jackson-mapper-asl were removed; those
have groupid org.codehaus.jackson.

The other jackson-* artifacts have groupid com.fasterxml.jackson.core.

 

 

On Thu, 2 Nov 2023 at 01:43, eab...@163.com <eab...@163.com> wrote:

Hi,

Please check the versions of jar files starting with "jackson-". Make sure 
all versions are consistent.  jackson jar list in spark-3.3.0:



2022/06/10  04:37            75,714 jackson-annotations-2.13.3.jar
2022/06/10  04:37           374,895 jackson-core-2.13.3.jar
2022/06/10  04:37           232,248 jackson-core-asl-1.9.13.jar
2022/06/10  04:37         1,536,542 jackson-databind-2.13.3.jar
2022/06/10  04:37            52,020 jackson-dataformat-yaml-2.13.3.jar
2022/06/10  04:37           121,201 jackson-datatype-jsr310-2.13.3.jar
2022/06/10  04:37           780,664 jackson-mapper-asl-1.9.13.jar
2022/06/10  04:37           458,981 jackson-module-scala_2.12-2.13.3.jar



Spark 3.3.0 uses Jackson version 2.13.3, while Spark 3.5.0 uses Jackson version 
2.15.2. I think you can remove the lower version of Jackson package to keep the 
versions consistent.

eabour

 

From:   moshik.vi...@veeva.com.INVALID

Date: 2023-11-01 15:03

To:   user@spark.apache.org

CC:   'Saar Barhoom'

Subject: jackson-databind version mismatch

Hi Spark team,

 

On upgrading spark version from 3.2.1 to 3.4.1 got the following issue:

java.lang.NoSuchMethodError: 'com.fasterxml.jackson.core.JsonGenerator 
com.fasterxml.jackson.databind.ObjectMapper.createGenerator(java.io.OutputStream,
 com.fasterxml.jackson.core.JsonEncoding)'

at 
org.apache.spark.util.JsonProtocol$.toJsonString(JsonProtocol.scala:75)

at 
org.apache.spark.SparkThrowableHelper$.getMessage(SparkThrowableHelper.scala:74)

at 
org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$7(SQLExecution.scala:127)

at scala.Option.map(Option.scala:230)

at 
org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:125)

at 
org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:195)

at 
org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:103)

at 
org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:827)

at 
org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:65)

at org.apache.spark.sql.Dataset.withAction(Dataset.scala:4165)

at org.apache.spark.sql.Dataset.head(Dataset.scala:3161)

at org.apache.spark.sql.Dataset.take(Dataset.scala:3382)

at org.apache.spark.sql.Dataset.takeAsList(Dataset.scala:3405)

at 
com.crossix.safemine.cloud.utils.DebugRDDLogger.showDataset(DebugRDDLogger.java:84)

at 
com.crossix.safemine.cloud.components.statistics.spark.StatisticsTransformer.getFillRateCountsWithSparkQuery(StatisticsTransformer.java:122)

at 
com.crossix.safemine.cloud.components.statistics.spark.StatisticsTransformer.calculateStatistics(StatisticsTransformer.java:61)

at 
com.crossix.safemine.cloud.components.statistics.spark.SparkFileStatistics.execute(SparkFileStatistics.java:102)

at 
com.crossix.safemine.cloud.StatisticsFlow.calculateAllStatistics(StatisticsFlow.java:146)

at 
com.crossix.safemine.cloud.StatisticsFlow.runStatistics(StatisticsFlow.java:119)

at 
com.crossix.safemine.cloud.StatisticsFlow.initialFileStatistics(StatisticsFlow.java:77)

at com.crossix.safemine.cloud.SMCFlow.process(SMCFlow.java:221)

at com.crossix.safemine.cloud.SMCFlow.execute(SMCFlow.java:132)

at com.crossix.safemine.cloud.SMCFlow.run(SMCFlow.java:91)



I see that the spark package contains the dependency:

com.fasterxml.jackson.core:jackson-databind:jar:2.10.5:compile

 

But jackson-databind 2.10.5 does not contain 
ObjectMapper.createGenerator(java.io.OutputStream, 

Data analysis issues

2023-11-02 Thread Jauru Lin
Hello all,

I have a question about Apache Spark,
I would like to ask if I use Rstudio to connect to Spark to analyze data,
will the data I use be seen by Spark's back-end personnel?

Hope someone can solve my problem.
Thanks!


Re: Re: jackson-databind version mismatch

2023-11-02 Thread eab...@163.com
Hi,
But in fact, it does have those packages.

 D:\02_bigdata\spark-3.5.0-bin-hadoop3\jars 

2023/09/09  10:08            75,567 jackson-annotations-2.15.2.jar
2023/09/09  10:08           549,207 jackson-core-2.15.2.jar
2023/09/09  10:08           232,248 jackson-core-asl-1.9.13.jar
2023/09/09  10:08         1,620,088 jackson-databind-2.15.2.jar
2023/09/09  10:08            54,630 jackson-dataformat-yaml-2.15.2.jar
2023/09/09  10:08           122,937 jackson-datatype-jsr310-2.15.2.jar
2023/09/09  10:08           780,664 jackson-mapper-asl-1.9.13.jar
2023/09/09  10:08           513,968 jackson-module-scala_2.12-2.15.2.jar
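
If you want to check the consistency mechanically, here is a quick sketch
(assuming a Unix-style shell run from the distribution root; the
com.fasterxml jars should all report a single version, while the two *-asl
jars are the old org.codehaus Jackson 1.x line that lives alongside them):

    # print the distinct versions of the com.fasterxml jackson-* jars
    ls jars/jackson-*.jar | grep -v asl | sed -E 's/.*-([0-9.]+)\.jar/\1/' | sort -u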



eabour
 
From: Bjørn Jørgensen
Date: 2023-11-02 16:40
To: eab...@163.com
CC: user @spark; Saar Barhoom; moshik.vitas
Subject: Re: jackson-databind version mismatch
[SPARK-43225][BUILD][SQL] Remove jackson-core-asl and jackson-mapper-asl from 
pre-built distribution

On Thu, 2 Nov 2023 at 09:15, Bjørn Jørgensen wrote:
In Spark 3.5.0, jackson-core-asl and jackson-mapper-asl were removed; those
have groupid org.codehaus.jackson.

The other jackson-* artifacts have groupid com.fasterxml.jackson.core.


On Thu, 2 Nov 2023 at 01:43, eab...@163.com wrote:
Hi,
Please check the versions of jar files starting with "jackson-". Make sure 
all versions are consistent.  jackson jar list in spark-3.3.0:

2022/06/10  04:37            75,714 jackson-annotations-2.13.3.jar
2022/06/10  04:37           374,895 jackson-core-2.13.3.jar
2022/06/10  04:37           232,248 jackson-core-asl-1.9.13.jar
2022/06/10  04:37         1,536,542 jackson-databind-2.13.3.jar
2022/06/10  04:37            52,020 jackson-dataformat-yaml-2.13.3.jar
2022/06/10  04:37           121,201 jackson-datatype-jsr310-2.13.3.jar
2022/06/10  04:37           780,664 jackson-mapper-asl-1.9.13.jar
2022/06/10  04:37           458,981 jackson-module-scala_2.12-2.13.3.jar

Spark 3.3.0 uses Jackson version 2.13.3, while Spark 3.5.0 uses Jackson version 
2.15.2. I think you can remove the lower version of Jackson package to keep the 
versions consistent.
eabour
 
From: moshik.vi...@veeva.com.INVALID
Date: 2023-11-01 15:03
To: user@spark.apache.org
CC: 'Saar Barhoom'
Subject: jackson-databind version mismatch
Hi Spark team,
 
On upgrading spark version from 3.2.1 to 3.4.1 got the following issue:
java.lang.NoSuchMethodError: 'com.fasterxml.jackson.core.JsonGenerator 
com.fasterxml.jackson.databind.ObjectMapper.createGenerator(java.io.OutputStream,
 com.fasterxml.jackson.core.JsonEncoding)'
at 
org.apache.spark.util.JsonProtocol$.toJsonString(JsonProtocol.scala:75)
at 
org.apache.spark.SparkThrowableHelper$.getMessage(SparkThrowableHelper.scala:74)
at 
org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$7(SQLExecution.scala:127)
at scala.Option.map(Option.scala:230)
at 
org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:125)
at 
org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:195)
at 
org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:103)
at 
org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:827)
at 
org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:65)
at org.apache.spark.sql.Dataset.withAction(Dataset.scala:4165)
at org.apache.spark.sql.Dataset.head(Dataset.scala:3161)
at org.apache.spark.sql.Dataset.take(Dataset.scala:3382)
at org.apache.spark.sql.Dataset.takeAsList(Dataset.scala:3405)
at 
com.crossix.safemine.cloud.utils.DebugRDDLogger.showDataset(DebugRDDLogger.java:84)
at 
com.crossix.safemine.cloud.components.statistics.spark.StatisticsTransformer.getFillRateCountsWithSparkQuery(StatisticsTransformer.java:122)
at 
com.crossix.safemine.cloud.components.statistics.spark.StatisticsTransformer.calculateStatistics(StatisticsTransformer.java:61)
at 
com.crossix.safemine.cloud.components.statistics.spark.SparkFileStatistics.execute(SparkFileStatistics.java:102)
at 
com.crossix.safemine.cloud.StatisticsFlow.calculateAllStatistics(StatisticsFlow.java:146)
at 
com.crossix.safemine.cloud.StatisticsFlow.runStatistics(StatisticsFlow.java:119)
at 
com.crossix.safemine.cloud.StatisticsFlow.initialFileStatistics(StatisticsFlow.java:77)
at com.crossix.safemine.cloud.SMCFlow.process(SMCFlow.java:221)
at com.crossix.safemine.cloud.SMCFlow.execute(SMCFlow.java:132)
at com.crossix.safemine.cloud.SMCFlow.run(SMCFlow.java:91)

I see that the spark package contains the dependency:

Re: jackson-databind version mismatch

2023-11-02 Thread Bjørn Jørgensen
[SPARK-43225][BUILD][SQL] Remove jackson-core-asl and jackson-mapper-asl
from pre-built distribution 

On Thu, 2 Nov 2023 at 09:15, Bjørn Jørgensen wrote:

> In Spark 3.5.0, jackson-core-asl and jackson-mapper-asl were removed; those
> have groupid org.codehaus.jackson.
>
> The other jackson-* artifacts have groupid com.fasterxml.jackson.core.
>
>
> On Thu, 2 Nov 2023 at 01:43, eab...@163.com wrote:
>
>> Hi,
>> Please check the versions of jar files starting with "jackson-". Make 
>> sure all versions are consistent.
>>  jackson jar list in spark-3.3.0:
>> 
>> 2022/06/10  04:37            75,714 jackson-annotations-*2.13.3*.jar
>> 2022/06/10  04:37           374,895 jackson-core-*2.13.3*.jar
>> 2022/06/10  04:37           232,248 jackson-core-asl-1.9.13.jar
>> 2022/06/10  04:37         1,536,542 jackson-databind-*2.13.3*.jar
>> 2022/06/10  04:37            52,020 jackson-dataformat-yaml-*2.13.3*.jar
>> 2022/06/10  04:37           121,201 jackson-datatype-jsr310-*2.13.3*.jar
>> 2022/06/10  04:37           780,664 jackson-mapper-asl-1.9.13.jar
>> 2022/06/10  04:37           458,981 jackson-module-scala_2.12-*2.13.3*.jar
>> 
>>
>> Spark 3.3.0 uses Jackson version 2.13.3, while Spark 3.5.0 uses Jackson 
>> version 2.15.2.
>> I think you can remove the lower version of Jackson package to keep the 
>> versions consistent.
>> eabour
>>
>>
>> *From:* moshik.vi...@veeva.com.INVALID
>> *Date:* 2023-11-01 15:03
>> *To:* user@spark.apache.org
>> *CC:* 'Saar Barhoom' 
>> *Subject:* jackson-databind version mismatch
>>
>> Hi Spark team,
>>
>>
>>
>> On upgrading spark version from 3.2.1 to 3.4.1 got the following issue:
>>
>> *java.lang.NoSuchMethodError: 'com.fasterxml.jackson.core.JsonGenerator
>> com.fasterxml.jackson.databind.ObjectMapper.createGenerator(java.io.OutputStream,
>> com.fasterxml.jackson.core.JsonEncoding)'*
>>
>> *at
>> org.apache.spark.util.JsonProtocol$.toJsonString(JsonProtocol.scala:75)*
>>
>> *at
>> org.apache.spark.SparkThrowableHelper$.getMessage(SparkThrowableHelper.scala:74)*
>>
>> *at
>> org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$7(SQLExecution.scala:127)*
>>
>> *at scala.Option.map(Option.scala:230)*
>>
>> *at
>> org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:125)*
>>
>> *at
>> org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:195)*
>>
>> *at
>> org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:103)*
>>
>> *at
>> org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:827)*
>>
>> *at
>> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:65)*
>>
>> *at
>> org.apache.spark.sql.Dataset.withAction(Dataset.scala:4165)*
>>
>> *at org.apache.spark.sql.Dataset.head(Dataset.scala:3161)*
>>
>> *at org.apache.spark.sql.Dataset.take(Dataset.scala:3382)*
>>
>> *at
>> org.apache.spark.sql.Dataset.takeAsList(Dataset.scala:3405)*
>>
>> *at
>> com.crossix.safemine.cloud.utils.DebugRDDLogger.showDataset(DebugRDDLogger.java:84)*
>>
>> *at
>> com.crossix.safemine.cloud.components.statistics.spark.StatisticsTransformer.getFillRateCountsWithSparkQuery(StatisticsTransformer.java:122)*
>>
>> *at
>> com.crossix.safemine.cloud.components.statistics.spark.StatisticsTransformer.calculateStatistics(StatisticsTransformer.java:61)*
>>
>> *at
>> com.crossix.safemine.cloud.components.statistics.spark.SparkFileStatistics.execute(SparkFileStatistics.java:102)*
>>
>> *at
>> com.crossix.safemine.cloud.StatisticsFlow.calculateAllStatistics(StatisticsFlow.java:146)*
>>
>> *at
>> com.crossix.safemine.cloud.StatisticsFlow.runStatistics(StatisticsFlow.java:119)*
>>
>> *at
>> com.crossix.safemine.cloud.StatisticsFlow.initialFileStatistics(StatisticsFlow.java:77)*
>>
>> *at
>> com.crossix.safemine.cloud.SMCFlow.process(SMCFlow.java:221)*
>>
>> *at
>> com.crossix.safemine.cloud.SMCFlow.execute(SMCFlow.java:132)*
>>
>> *at
>> com.crossix.safemine.cloud.SMCFlow.run(SMCFlow.java:91)*
>>
>>
>>
>> I see that the spark package contains the dependency:
>>
>> com.fasterxml.jackson.core:jackson-databind:jar:2.10.5:compile
>>
>>
>>
>> But jackson-databind 2.10.5 does not contain 
>> *ObjectMapper.createGenerator(java.io.OutputStream,
>> com.fasterxml.jackson.core.JsonEncoding)*
>>
>> It was added in 2.11.0
>>
>>
>>
>> Trying to upgrade jackson-databind fails with:
>>
>> *com.fasterxml.jackson.databind.JsonMappingException: Scala module 2.10.5
>> requires Jackson Databind version >= 2.10.0 and < 2.11.0*
>>
>>
>>

Re: Spark / Scala conflict

2023-11-02 Thread Aironman DirtDiver
The error message Caused by: java.lang.ClassNotFoundException:
scala.Product$class indicates that the Spark job is trying to load a class
that is not available in the classpath. This can happen if the Spark job is
compiled with a different version of Scala than the version of Scala that
is used to run the job.

You have mentioned that you are using Spark 3.5.0, which is compatible with
Scala 2.12. However, you have also mentioned that you have tried Scala
versions 2.10, 2.11, 2.12, and 2.13. This suggests that you may have
multiple versions of Scala installed on your system.

To resolve the issue, you need to make sure that the Spark job is compiled
and run with the same version of Scala. You can do this by setting the
SPARK_SCALA_VERSION environment variable to the desired Scala version
before starting the Spark job.

For example, to compile the Spark job with Scala 2.12, you would run the
following command:

SPARK_SCALA_VERSION=2.12 sbt compile

To run the Spark job with Scala 2.12, you would run the following command:

SPARK_SCALA_VERSION=2.12 spark-submit spark-job.jar

If you are using Databricks, you can set the Scala version for the Spark
cluster in the cluster creation settings.

Once you have ensured that the Spark job is compiled and run with the same
version of Scala, the error should be resolved.

Here are some additional tips for troubleshooting Scala version conflicts:

   - Make sure that you are using the correct version of the Spark
   libraries. The Spark libraries must be compiled with the same version of
   Scala as the Spark job.
   - If you are using a third-party library, make sure that it is
   compatible with the version of Scala that you are using.
   - Check the Spark logs for any ClassNotFoundExceptions. The logs may
   indicate the specific class that is missing from the classpath.
   - Use a tool like sbt dependencyTree (or mvn dependency:tree) to view the
   dependencies of your Spark job. This can help you identify any conflicting
   dependencies (see the sketch after this list).
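
As a concrete sketch of those checks (whether a given jar records its Scala
version in the manifest is an assumption on my part): scala.Product$class
was dropped with the new trait encoding in Scala 2.12, so a jar that still
references it was almost certainly built against Scala 2.11 or earlier.

   # print the Scala version your Spark distribution was built with
   spark-submit --version

   # look for a Scala version recorded inside a suspect jar (path illustrative)
   unzip -p phoenix-spark-5.0.0-HBase-2.0.jar META-INF/MANIFEST.MF | grep -i scala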


On Thu, 2 Nov 2023 at 5:39, Harry Jamison
() wrote:

> I am getting the error below when I try to run a spark job connecting to
> Phoenix.  It seems like I have the incorrect Scala version that some part
> of the code is expecting.
>
> I am using spark 3.5.0, and I have copied these phoenix jars into the
> spark lib
> phoenix-server-hbase-2.5-5.1.3.jar
> phoenix-spark-5.0.0-HBase-2.0.jar
>
> I have tried scala 2.10, 2.11, 2.12, and 2.13
> I do not see the Scala version in the logs, so I am not 100% sure it is
> using the version I expect.
>
>
> Here is the exception that I am getting
>
> 2023-11-01T16:13:00,391 INFO  [Thread-4] handler.ContextHandler: Started
> o.s.j.s.ServletContextHandler@15cd3b2a{/static/sql,null,AVAILABLE,@Spark}
> Traceback (most recent call last):
>   File "/hadoop/spark/spark-3.5.0-bin-hadoop3/copy_tables.py", line 10, in
> 
> .option("zkUrl", "namenode:2181").load()
>   File
> "/hadoop/spark/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py",
> line 314, in load
>   File
> "/hadoop/spark/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py",
> line 1322, in __call__
>   File
> "/hadoop/spark/spark/python/lib/pyspark.zip/pyspark/errors/exceptions/captured.py",
> line 179, in deco
>   File
> "/hadoop/spark/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/protocol.py",
> line 326, in get_return_value
> py4j.protocol.Py4JJavaError: An error occurred while calling o28.load.
> : java.lang.NoClassDefFoundError: scala/Product$class
> at
> org.apache.phoenix.spark.PhoenixRelation.(PhoenixRelation.scala:29)
> at
> org.apache.phoenix.spark.DefaultSource.createRelation(DefaultSource.scala:29)
> at
> org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:346)
> at
> org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:229)
> at
> org.apache.spark.sql.DataFrameReader.$anonfun$load$2(DataFrameReader.scala:211)
> at scala.Option.getOrElse(Option.scala:189)
> at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:211)
> at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:172)
> at
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native
> Method)
> at
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.base/java.lang.reflect.Method.invoke(Method.java:566)
> at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
> at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374)
> at py4j.Gateway.invoke(Gateway.java:282)
> at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
> at py4j.commands.CallCommand.execute(CallCommand.java:79)
> at
> py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
> at 

Re: jackson-databind version mismatch

2023-11-02 Thread Bjørn Jørgensen
In Spark 3.5.0, jackson-core-asl and jackson-mapper-asl were removed; those
have groupid org.codehaus.jackson.

The other jackson-* artifacts have groupid com.fasterxml.jackson.core.
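
If a build still drags the old group in transitively, a quick way to find
the culprit is Maven's standard dependency-tree filter (a sketch, run in
your own project):

    # show which dependency pulls in the removed org.codehaus.jackson artifacts
    mvn dependency:tree -Dincludes=org.codehaus.jackson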


On Thu, 2 Nov 2023 at 01:43, eab...@163.com wrote:

> Hi,
> Please check the versions of jar files starting with "jackson-". Make 
> sure all versions are consistent.
>  jackson jar list in spark-3.3.0:
> 
> 2022/06/10  04:37            75,714 jackson-annotations-*2.13.3*.jar
> 2022/06/10  04:37           374,895 jackson-core-*2.13.3*.jar
> 2022/06/10  04:37           232,248 jackson-core-asl-1.9.13.jar
> 2022/06/10  04:37         1,536,542 jackson-databind-*2.13.3*.jar
> 2022/06/10  04:37            52,020 jackson-dataformat-yaml-*2.13.3*.jar
> 2022/06/10  04:37           121,201 jackson-datatype-jsr310-*2.13.3*.jar
> 2022/06/10  04:37           780,664 jackson-mapper-asl-1.9.13.jar
> 2022/06/10  04:37           458,981 jackson-module-scala_2.12-*2.13.3*.jar
> 
>
> Spark 3.3.0 uses Jackson version 2.13.3, while Spark 3.5.0 uses Jackson 
> version 2.15.2.
> I think you can remove the lower version of Jackson package to keep the 
> versions consistent.
> eabour
>
>
> *From:* moshik.vi...@veeva.com.INVALID
> *Date:* 2023-11-01 15:03
> *To:* user@spark.apache.org
> *CC:* 'Saar Barhoom' 
> *Subject:* jackson-databind version mismatch
>
> Hi Spark team,
>
>
>
> On upgrading spark version from 3.2.1 to 3.4.1 got the following issue:
>
> *java.lang.NoSuchMethodError: 'com.fasterxml.jackson.core.JsonGenerator
> com.fasterxml.jackson.databind.ObjectMapper.createGenerator(java.io.OutputStream,
> com.fasterxml.jackson.core.JsonEncoding)'*
>
> *at
> org.apache.spark.util.JsonProtocol$.toJsonString(JsonProtocol.scala:75)*
>
> *at
> org.apache.spark.SparkThrowableHelper$.getMessage(SparkThrowableHelper.scala:74)*
>
> *at
> org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$7(SQLExecution.scala:127)*
>
> *at scala.Option.map(Option.scala:230)*
>
> *at
> org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:125)*
>
> *at
> org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:195)*
>
> *at
> org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:103)*
>
> *at
> org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:827)*
>
> *at
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:65)*
>
> *at
> org.apache.spark.sql.Dataset.withAction(Dataset.scala:4165)*
>
> *at org.apache.spark.sql.Dataset.head(Dataset.scala:3161)*
>
> *at org.apache.spark.sql.Dataset.take(Dataset.scala:3382)*
>
> *at
> org.apache.spark.sql.Dataset.takeAsList(Dataset.scala:3405)*
>
> *at
> com.crossix.safemine.cloud.utils.DebugRDDLogger.showDataset(DebugRDDLogger.java:84)*
>
> *at
> com.crossix.safemine.cloud.components.statistics.spark.StatisticsTransformer.getFillRateCountsWithSparkQuery(StatisticsTransformer.java:122)*
>
> *at
> com.crossix.safemine.cloud.components.statistics.spark.StatisticsTransformer.calculateStatistics(StatisticsTransformer.java:61)*
>
> *at
> com.crossix.safemine.cloud.components.statistics.spark.SparkFileStatistics.execute(SparkFileStatistics.java:102)*
>
> *at
> com.crossix.safemine.cloud.StatisticsFlow.calculateAllStatistics(StatisticsFlow.java:146)*
>
> *at
> com.crossix.safemine.cloud.StatisticsFlow.runStatistics(StatisticsFlow.java:119)*
>
> *at
> com.crossix.safemine.cloud.StatisticsFlow.initialFileStatistics(StatisticsFlow.java:77)*
>
> *at
> com.crossix.safemine.cloud.SMCFlow.process(SMCFlow.java:221)*
>
> *at
> com.crossix.safemine.cloud.SMCFlow.execute(SMCFlow.java:132)*
>
> *at
> com.crossix.safemine.cloud.SMCFlow.run(SMCFlow.java:91)*
>
>
>
> I see that the spark package contains the dependency:
>
> com.fasterxml.jackson.core:jackson-databind:jar:2.10.5:compile
>
>
>
> But jackson-databind 2.10.5 does not contain 
> *ObjectMapper.createGenerator(java.io.OutputStream,
> com.fasterxml.jackson.core.JsonEncoding)*
>
> It was added in 2.11.0
>
>
>
> Trying to upgrade jackson-databind fails with:
>
> *com.fasterxml.jackson.databind.JsonMappingException: Scala module 2.10.5
> requires Jackson Databind version >= 2.10.0 and < 2.11.0*
>
>
>
> According to the spark 3.3.0 release notes, "Upgrade Jackson to 2.13.3", but
> the spark package of 3.4.1 contains Jackson 2.10.5.
>
> (https://spark.apache.org/releases/spark-release-3-3-0.html)
>
> What am I missing?
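
A hedged sketch of a likely fix for the Scala module error above: Jackson's
modules must move in lockstep with jackson-databind, so rather than bumping
jackson-databind alone, import the Jackson BOM (the version here matches
Spark 3.5.0's Jackson; adjust it to your Spark version):

    <dependencyManagement>
      <dependencies>
        <!-- keeps jackson-databind, jackson-core and jackson-module-scala aligned -->
        <dependency>
          <groupId>com.fasterxml.jackson</groupId>
          <artifactId>jackson-bom</artifactId>
          <version>2.15.2</version>
          <type>pom</type>
          <scope>import</scope>
        </dependency>
      </dependencies>
    </dependencyManagement>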
>
>
>
> --
>
>