Re: jar incompatibility with Spark 3.1.1 for structured streaming with kafka

2021-04-07 Mich Talebzadeh
Hi Amit, Many thanks for your suggestion. My problem is that I am using PySpark in this particular case, so there is no SBT or Maven build, as there would be with Scala, to produce an uber jar with shading. So regrettably the only way I could resolve the problem was by adding the jar file in

2021-04-07 Amit Joshi
Hi Mich, If I understood your problem correctly, the spark-kafka jar is shadowed by the installed Kafka client jar at run time. I have been in that situation before. I recommend resolving the issue with the Maven shade plugin. The example I am pasting here is for pom.xml. I am very sure

2021-04-07 Mich Talebzadeh
Did some tests. The concern is a Spark Structured Streaming (SSS) job running under YARN.

*Scenario 1)* use spark-sql-kafka-0-10_2.12-3.1.0.jar
- Removed spark-sql-kafka-0-10_2.12-3.1.0.jar from everywhere on the CLASSPATH, including $SPARK_HOME/jars
- Added the said jar file to spark-submit in client mode (the only mode

2021-04-07 Gabor Somogyi
+1 on Sean's opinion.

2021-04-07 Sean Owen
You shouldn't be modifying your cluster install. You may at this point have conflicting, excess JARs in there somewhere. I'd start over with a clean install if you can.

2021-04-07 Gabor Somogyi
Not sure what you mean by "not working". You've added 3.1.1 to --packages, which uses:
* 2.6.0 kafka-clients: https://github.com/apache/spark/blob/1d550c4e90275ab418b9161925049239227f3dc9/pom.xml#L136
* 2.6.2 commons pool:
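To compare a node's jar directory with the two versions quoted here, a rough Python sketch follows. It assumes jar files are named `<artifact>-<version>.jar`, optionally followed by a suffix such as `-SNAPSHOT` (which is dropped), and `EXPECTED` holds only the kafka-clients and commons-pool2 versions from this message; the helper names are hypothetical. A reported difference, such as a newer commons-pool2, is a prompt to look closer rather than proof of a conflict.

```python
import re
import sys
from pathlib import Path

# Only the two dependency versions quoted in the message above.
EXPECTED = {"kafka-clients": "2.6.0", "commons-pool2": "2.6.2"}

# Assumption: jars are named <artifact>-<version>[-<suffix>].jar
JAR_NAME = re.compile(
    r"^(?P<artifact>.+?)-(?P<version>\d+(?:\.\d+)+)(?:-[\w.\-]+)?\.jar$"
)


def parse_jar_name(name):
    """Return (artifact, version) for e.g. 'commons-pool2-2.9.0.jar', or None."""
    m = JAR_NAME.match(name)
    return (m.group("artifact"), m.group("version")) if m else None


def version_mismatches(jar_names, expected=EXPECTED):
    """Map each expected artifact not found exactly once at the expected
    version to (expected version, sorted list of versions actually found)."""
    found = {artifact: [] for artifact in expected}
    for name in jar_names:
        parsed = parse_jar_name(name)
        if parsed and parsed[0] in found:
            found[parsed[0]].append(parsed[1])
    return {
        artifact: (want, sorted(found[artifact]))
        for artifact, want in expected.items()
        if found[artifact] != [want]
    }


if __name__ == "__main__":
    # Usage (hypothetical script name): python check_deps.py "$SPARK_HOME/jars"
    jar_dir = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")
    names = [p.name for p in jar_dir.glob("*.jar")] if jar_dir.is_dir() else []
    for artifact, (want, got) in version_mismatches(names).items():
        print(f"{artifact}: expected {want}, found {got or 'nothing'}")
```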

2021-04-07 Mich Talebzadeh
Hi Gabor et al., To be honest, I am not convinced that the package passed as --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.1.1 is really working! I know for certain that spark-sql-kafka-0-10_2.12-3.1.0.jar works fine. I reported earlier that the package was working because under $SPARK_HOME/jars on all nodes

2021-04-07 Gabor Somogyi
Good to hear it's working. Happy Spark usage. G

2021-04-06 Mich Talebzadeh
OK, we found the root cause of this issue. We were writing to Redis from Spark and had downloaded a recently compiled version of the Redis jar for Scala 2.12, spark-redis_2.12-2.4.1-SNAPSHOT-jar-with-dependencies.jar. It was giving grief, so we removed it. So the job runs with either
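A jar-with-dependencies bundles the libraries it was built against inside one archive, so it can carry its own copies of classes that Spark's jars also provide. One way to check for that overlap is to list which jars contain a given class file; since a jar is a zip archive, Python's standard `zipfile` module is enough. This is a sketch: the helper names and the usage line are illustrative, and which copy actually gets loaded depends on classpath order and Spark's classloader settings.

```python
import sys
import zipfile
from pathlib import Path


def class_entry(class_name):
    """'a.b.C$' -> 'a/b/C$.class', the path of the class file inside a jar."""
    return class_name.replace(".", "/") + ".class"


def jars_containing(jars, class_name):
    """Return the jars (paths or file objects, as given) that contain class_name."""
    entry = class_entry(class_name)
    hits = []
    for jar in jars:
        try:
            with zipfile.ZipFile(jar) as zf:
                names = set(zf.namelist())
        except zipfile.BadZipFile:
            continue  # skip anything that is not a valid zip/jar archive
        if entry in names:
            hits.append(jar)
    return hits


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical script name):
    #   python find_class.py "$SPARK_HOME/jars" 'org.apache.spark.kafka010.KafkaTokenUtil$'
    jar_dir, cls = Path(sys.argv[1]), sys.argv[2]
    for jar in jars_containing(sorted(jar_dir.glob("*.jar")), cls):
        print(jar)
```

More than one hit for the same class is the situation worth investigating.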

2021-04-06 Mich Talebzadeh
Fine. Just to clarify, please. With SBT assembly and Scala I would create an uber jar and use it with spark-submit. As I understand it (and stand to be corrected), with PySpark one can only run spark-submit in client mode by passing a .py file directly? Hence spark-submit --master local[4]

2021-04-06 Sean Owen
Gabor's point is that these are not libraries you typically install in your cluster itself. You package them with your app.

2021-04-06 Mich Talebzadeh
Hi G, Thanks for the heads-up. In a thread on 3 March I reported that 3.1.1 works in YARN mode: "Spark 3.1.1 Preliminary results (mainly to do with Spark Structured Streaming)" (mail-archive.com). From that mail: The needed

2021-04-06 Gabor Somogyi
> Anyway I unzipped the tarball for Spark-3.1.1 and there is no spark-sql-kafka-0-10_2.12-3.0.1.jar even

Please see how a Structured Streaming app with Kafka needs to be deployed here: https://spark.apache.org/docs/latest/structured-streaming-kafka-integration.html#deploying
I don't see the
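The deployment section linked above adds the Kafka connector with `--packages` instead of copying jars into the Spark install, which also lets Ivy pull in the connector's transitive dependencies. A small Python sketch of deriving that coordinate from the versions `spark-submit --version` reports (Spark 3.1.1 and Scala 2.12.9 in this thread), so it cannot drift from the runtime. The helper names, the `yarn` master and `app.py` are hypothetical.

```python
def kafka_package(spark_version, scala_version):
    """Build the spark-sql-kafka-0-10 Maven coordinate for a Spark/Scala runtime.

    scala_version may be a full version such as '2.12.9'; only the binary
    version (major.minor, e.g. '2.12') goes into the artifact suffix.
    """
    major, minor = scala_version.split(".")[:2]
    return f"org.apache.spark:spark-sql-kafka-0-10_{major}.{minor}:{spark_version}"


def spark_submit_args(app, spark_version, scala_version, master="yarn"):
    """Hypothetical helper: an argv list for spark-submit using the matching package."""
    return [
        "spark-submit",
        "--master", master,
        "--packages", kafka_package(spark_version, scala_version),
        app,
    ]


if __name__ == "__main__":
    # Prints the command; subprocess.run(args, check=True) could launch it instead.
    print(" ".join(spark_submit_args("app.py", "3.1.1", "2.12.9")))
```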

2021-04-06 Mich Talebzadeh
OK, thanks for that. I am using spark-submit with PySpark as follows:

spark-submit --version
Welcome to [Spark ASCII-art banner] version 3.1.1
Using Scala version 2.12.9, Java HotSpot(TM) 64-Bit

2021-04-06 Sean Owen
You may be compiling your app against 3.0.1 JARs but submitting to 3.1.1. You do not in general modify the Spark libs. You need to package libs like this with your app at the correct version.

2021-04-06 Mich Talebzadeh
Thanks Gabor. All nodes are running Spark /spark-3.1.1-bin-hadoop3.2, so $SPARK_HOME/jars contains all the required jars on all nodes, including commons-pool2-2.9.0.jar. They are installed identically on all nodes. I have looked at the classpath in the Spark environment. Still

2021-04-06 Gabor Somogyi
I've just had a deeper look at the possible issue and here are my findings:
* In 3.0.1 KafkaTokenUtil.needTokenUpdate has 3 params
* In 3.1.1 KafkaTokenUtil.needTokenUpdate has 2 params
* I've decompiled spark-token-provider-kafka-0-10_2.12-3.1.1.jar and KafkaTokenUtil.needTokenUpdate has 2 params

2021-04-06 Gabor Somogyi
Since you haven't shared many details, I presume you've updated only the spark-sql-kafka jar. KafkaTokenUtil is in the token provider jar. If I'm right, then as a general note, please update Spark as a whole on all nodes rather than updating individual jars independently. BR, G
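Following the advice to update Spark as a whole, here is a rough Python sketch that flags mixed Spark versions among jar names, for example the contents of $SPARK_HOME/jars. It only considers names of the form `spark-<module>_<scala>-<version>.jar` (an assumption about naming), so a spark-sql-kafka 3.1.1 jar next to a spark-token-provider-kafka 3.0.1 jar would show up as two groups. Third-party jars that reuse the `spark-` prefix (spark-redis, say) can also match, so the output still needs a human eye; the function names are hypothetical.

```python
import re
from collections import defaultdict

# Assumption: Spark module jars are named spark-<module>_<scala binary>-<version>.jar
SPARK_JAR = re.compile(
    r"^spark-(?P<module>[\w\-]+?)_(?P<scala>\d+\.\d+)-(?P<version>\d+\.\d+\.\d+)\.jar$"
)


def spark_versions(jar_names):
    """Group Spark module names by (Spark version, Scala binary version)."""
    groups = defaultdict(list)
    for name in jar_names:
        m = SPARK_JAR.match(name)
        if m:
            groups[(m.group("version"), m.group("scala"))].append(m.group("module"))
    return dict(groups)


def is_consistent(jar_names):
    """True when all matching jars share one Spark version and one Scala version."""
    return len(spark_versions(jar_names)) <= 1


if __name__ == "__main__":
    import os
    from pathlib import Path

    jar_dir = Path(os.environ.get("SPARK_HOME", ".")) / "jars"
    names = [p.name for p in jar_dir.glob("*.jar")] if jar_dir.is_dir() else []
    for (version, scala), modules in sorted(spark_versions(names).items()):
        print(f"Spark {version} / Scala {scala}: {', '.join(sorted(modules))}")
    print("consistent" if is_consistent(names) else "mixed Spark versions found")
```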

jar incompatibility with Spark 3.1.1 for structured streaming with kafka

2021-04-06 Mich Talebzadeh
Hi, is there any chance someone could test the latest spark-sql-kafka-0-10_2.12-3.1.1.jar with Spark? It throws

java.lang.NoSuchMethodError: org.apache.spark.kafka010.KafkaTokenUtil$.needTokenUpdate(Ljava/util/Map;Lscala/Option;)Z

However, the previous version spark-sql-kafka-0-10_2.12-3.0.1.jar
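The `(Ljava/util/Map;Lscala/Option;)Z` part of that error is a JVM method descriptor: parameter types sit between the parentheses, `L...;` is an object type and `Z` is a boolean return type. Decoding it shows the caller was looking for a two-parameter `needTokenUpdate`, which fits the finding elsewhere in the thread that the 3.1.1 method has 2 params while the 3.0.1 one has 3. A minimal Python decoder, written from the descriptor format in the JVM specification (a sketch, not a Spark or JDK tool):

```python
# Base type codes from the JVM field/method descriptor grammar.
# 'V' (void) is only legal as a return type; this sketch does not enforce that.
BASE_TYPES = {
    "B": "byte", "C": "char", "D": "double", "F": "float",
    "I": "int", "J": "long", "S": "short", "Z": "boolean", "V": "void",
}


def _parse_type(desc, i):
    """Parse one type starting at desc[i]; return (readable name, next index)."""
    dims = 0
    while desc[i] == "[":  # each '[' adds one array dimension
        dims += 1
        i += 1
    if desc[i] == "L":  # object type: L<binary/name>;
        end = desc.index(";", i)
        name = desc[i + 1:end].replace("/", ".")
        i = end + 1
    else:
        name = BASE_TYPES[desc[i]]
        i += 1
    return name + "[]" * dims, i


def parse_method_descriptor(desc):
    """'(Ljava/util/Map;Lscala/Option;)Z' -> (['java.util.Map', 'scala.Option'], 'boolean')."""
    if not desc.startswith("("):
        raise ValueError(f"not a method descriptor: {desc!r}")
    params, i = [], 1
    while desc[i] != ")":
        name, i = _parse_type(desc, i)
        params.append(name)
    ret, end = _parse_type(desc, i + 1)
    if end != len(desc):
        raise ValueError(f"trailing characters in {desc!r}")
    return params, ret
```

`javap -s` prints descriptors in the same format, so the signature found in each candidate jar can be compared with the one in the error.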