If you look at the dependencies of the 5.0.0-HBase-2.0 artifact at https://mvnrepository.com/artifact/org.apache.phoenix/phoenix-spark/5.0.0-HBase-2.0, you can see it was built against Spark 2.3.0 and Scala 2.11.8.
You may need to check with the Phoenix community whether your setup with Spark 3.4.1 etc. is supported by something like https://github.com/apache/phoenix-connectors/tree/master/phoenix5-spark3

On Mon, Aug 21, 2023 at 6:12 PM Kal Stevens <kalgstev...@gmail.com> wrote:

> Sorry for being so dense, and thank you for your help.
>
> I was using this version:
> phoenix-spark-5.0.0-HBase-2.0.jar
>
> because it was the latest in this repo:
> https://mvnrepository.com/artifact/org.apache.phoenix/phoenix-spark
>
> On Mon, Aug 21, 2023 at 5:07 PM Sean Owen <sro...@gmail.com> wrote:
>
>> It is. But you have a third-party library in here which seems to require
>> a different version.
>>
>> On Mon, Aug 21, 2023, 7:04 PM Kal Stevens <kalgstev...@gmail.com> wrote:
>>
>>> OK, it was my impression that Scala was packaged with Spark to avoid a
>>> mismatch:
>>> https://spark.apache.org/downloads.html
>>>
>>> It looks like Spark 3.4.1 (my version) uses Scala 2.12.
>>> How do I specify the Scala version?
>>>
>>> On Mon, Aug 21, 2023 at 4:47 PM Sean Owen <sro...@gmail.com> wrote:
>>>
>>>> That's a mismatch between the Scala version your library uses and the
>>>> one Spark uses.
>>>>
>>>> On Mon, Aug 21, 2023, 6:46 PM Kal Stevens <kalgstev...@gmail.com>
>>>> wrote:
>>>>
>>>>> I am having a hard time figuring out what I am doing wrong here.
>>>>> I am not sure if I have an incompatible version of something
>>>>> installed or something else.
>>>>> I cannot find anything relevant on Google to figure out what I am
>>>>> doing wrong.
>>>>> I am using *Spark 3.4.1* and *Python 3.10*.
>>>>>
>>>>> This is my code to save my dataframe:
>>>>>
>>>>> urls = []
>>>>> pull_sitemap_xml(robot, urls)
>>>>> df = spark.createDataFrame(data=urls, schema=schema)
>>>>> df.write.format("org.apache.phoenix.spark") \
>>>>>     .mode("overwrite") \
>>>>>     .option("table", "property") \
>>>>>     .option("zkUrl", "192.168.1.162:2181") \
>>>>>     .save()
>>>>>
>>>>> urls is an array of maps, each containing a "url" and a "last_mod"
>>>>> field.
>>>>>
>>>>> Here is the error that I am getting:
>>>>>
>>>>> Traceback (most recent call last):
>>>>>   File "/home/kal/real-estate/pullhttp/pull_properties.py", line 65, in main
>>>>>     .save()
>>>>>   File "/hadoop/spark/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 1396, in save
>>>>>     self._jwrite.save()
>>>>>   File "/hadoop/spark/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", line 1322, in __call__
>>>>>     return_value = get_return_value(
>>>>>   File "/hadoop/spark/spark/python/lib/pyspark.zip/pyspark/errors/exceptions/captured.py", line 169, in deco
>>>>>     return f(*a, **kw)
>>>>>   File "/hadoop/spark/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/protocol.py", line 326, in get_return_value
>>>>>     raise Py4JJavaError(
>>>>> py4j.protocol.Py4JJavaError: An error occurred while calling o636.save.
>>>>> : java.lang.NoSuchMethodError: 'scala.collection.mutable.ArrayOps scala.Predef$.refArrayOps(java.lang.Object[])'
>>>>>     at org.apache.phoenix.spark.DataFrameFunctions.getFieldArray(DataFrameFunctions.scala:76)
>>>>>     at org.apache.phoenix.spark.DataFrameFunctions.saveToPhoenix(DataFrameFunctions.scala:35)
>>>>>     at org.apache.phoenix.spark.DataFrameFunctions.saveToPhoenix(DataFrameFunctions.scala:28)
>>>>>     at org.apache.phoenix.spark.DefaultSource.createRelation(DefaultSource.scala:47)
>>>>>     at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:47)
>>>>>     at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:75)
>>>>>     at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:73)
>>>>
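
A quick way to confirm the Scala side of the mismatch described above: "spark-submit --version" prints the Scala version a given Spark build ships with, and the same information can be read from a running PySpark session. The snippet below is a minimal sketch that goes through PySpark's internal _jvm gateway, so treat it as a one-off diagnostic rather than a stable API.

from pyspark.sql import SparkSession

# Start (or reuse) a small local session just to inspect the JVM side.
spark = SparkSession.builder.master("local[1]").appName("scala-version-check").getOrCreate()

# scala.util.Properties.versionString() reports the Scala version the running
# Spark JVM was compiled against (e.g. a 2.12.x string for Spark 3.4.x builds).
print(spark.sparkContext._jvm.scala.util.Properties.versionString())

spark.stop()

Whatever the check reports is the Scala line the connector jar also has to be built for; a 2.11-only jar such as phoenix-spark-5.0.0-HBase-2.0 will fail at runtime on a 2.12 Spark with exactly the NoSuchMethodError shown in the traceback.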
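If the phoenix5-spark3 connector turns out to be the right fit for Spark 3.4.1, one way to wire it in from PySpark is to let Spark resolve it from Maven via spark.jars.packages and then write through the connector's data source. The sketch below is an assumption-laden illustration, not a verified recipe: the artifact coordinate and version placeholder, the "phoenix" format name, and the option names and supported save modes should all be checked against the phoenix5-spark3 documentation for the Phoenix/HBase release in use. The sample schema and rows stand in for the urls built by pull_sitemap_xml in the original code.

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType

# Placeholder coordinate/version -- confirm the actual phoenix5-spark3
# artifact for your Phoenix/HBase release on Maven Central before using it.
spark = (
    SparkSession.builder
    .appName("phoenix-write")
    .config("spark.jars.packages",
            "org.apache.phoenix:phoenix5-spark3:<connector-version>")
    .getOrCreate()
)

# Stand-in for the urls/schema produced by pull_sitemap_xml() in the original code.
schema = StructType([
    StructField("url", StringType(), False),
    StructField("last_mod", StringType(), True),
])
urls = [{"url": "https://example.com/a", "last_mod": "2023-08-21"}]
df = spark.createDataFrame(data=urls, schema=schema)

# The Spark 3 connector documents a short "phoenix" format name; the option
# names (table, zkUrl) and supported save modes should be verified against
# its README before relying on this.
(df.write.format("phoenix")
    .mode("overwrite")
    .option("table", "property")
    .option("zkUrl", "192.168.1.162:2181")
    .save())

The key point either way is that the connector jar on the classpath must be compiled for the same Scala minor version (2.12 for Spark 3.4.x) as the Spark runtime itself.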