Thank you so much, you absolutely nailed it. There was a stupid "SPARK_HOME" env variable pointing to Spark 2.4 lingering in my zsh config, which was the troublemaker. I had totally forgotten about it and didn't realize this one environment variable could cause me days of frustration.
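In case anyone else hits this: the launcher scripts locate Spark's own jars via $SPARK_HOME, so a stale value makes the 3.1.3 bin/spark-shell run against the 2.4 runtime, where that class does not exist. A quick way to spot and clear it (a sketch, assuming a standard zsh setup; the startup file names below are the usual ones, adjust for your shell):

    # Check whether a Spark home is already set in the current shell
    echo $SPARK_HOME

    # Find where it is exported in the zsh startup files
    grep -n "SPARK_HOME" ~/.zshenv ~/.zprofile ~/.zshrc 2>/dev/null

    # Clear it for the current session, then retry spark-shell
    unset SPARK_HOME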
Yong

________________________________
From: Artemis User <arte...@dtechspace.com>
Sent: Thursday, March 10, 2022 3:13 PM
To: user <user@spark.apache.org>
Subject: Re: Spark 3.1 with spark AVRO

It must be some misconfiguration in your environment. Do you perhaps have a hardwired $SPARK_HOME env variable in your shell? An easy test would be to place the spark-avro jar file you downloaded in the jars directory of Spark and run spark-shell again without the packages option. This guarantees that the jar file is on the classpath of the Spark driver and executors.

On 3/10/22 1:24 PM, Yong Zhang wrote:

Hi,

I am puzzled by this issue of Spark 3.1 failing to read avro files. Everything is done on my local Mac laptop so far, and I really don't know where the issue comes from; I googled a lot and cannot find any clue.

I have always used Spark 2.4, as it is really mature. But for a new project I want to try Spark 3.1, which needs to read AVRO files. To my surprise, on my local machine, Spark 3.1.3 throws an error when trying to read the avro files.

* I downloaded Spark 3.1.2 and 3.1.3, with Hadoop 2 or 3, from https://spark.apache.org/downloads.html
* Use JDK "1.8.0_321" on the Mac
* Untarred the Spark 3.1.x tarball locally
* Followed https://spark.apache.org/docs/3.1.3/sql-data-sources-avro.html and started spark-shell with exactly the following command:

    spark-3.1.3-bin-hadoop3.2/bin/spark-shell --packages org.apache.spark:spark-avro_2.12:3.1.3

* And I always get the following error when reading the existing test AVRO files:

    scala> val pageview = spark.read.format("avro").load("/Users/user/output/raw/")
    java.lang.NoClassDefFoundError: org/apache/spark/sql/execution/datasources/v2/FileDataSourceV2

I tried different versions of Spark 3.x, from Spark 3.1.2 -> 3.1.3 -> 3.2.1, all of which I believe are built for Scala 2.12, and I started spark-shell with "--packages org.apache.spark:spark-avro_2.12:x.x.x", where x.x.x matches the Spark version, but I got the above weird "NoClassDefFoundError" in all cases.

Meanwhile, when I download Spark 2.4.8 and start spark-shell with "--packages org.apache.spark:spark-avro_2.11:2.4.3", I can read the exact same AVRO files without any issue.

I am thinking something must be done wrongly on my end, but:

* I downloaded several versions of Spark and untarred them directly.
* I DIDN'T have any custom "spark-env.sh/spark-default.conf" file that could pull in potential jar files and mess things up.
* I simply created a Spark session under spark-shell with the correct package and tried to read avro files. Nothing more.

I have to suspect something is wrong with the Spark 3.x avro package releases, but I know that possibility is very low, especially across multiple different versions. And the class "org/apache/spark/sql/execution/datasources/v2/FileDataSourceV2" does exist in "spark-sql_2.12-3.1.3.jar", as below:

    jar tvf spark-sql_2.12-3.1.3.jar | grep FileDataSourceV2
    15436 Sun Feb 06 22:54:00 EST 2022 org/apache/spark/sql/execution/datasources/v2/FileDataSourceV2.class

So what could be wrong?

Thanks

Yong
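For reference, the isolation test Artemis suggested looks roughly like this (a sketch; the jar file name assumes the spark-avro 3.1.3 artifact was downloaded into the current directory):

    # Drop the connector jar into the distribution's own jars directory ...
    cp spark-avro_2.12-3.1.3.jar spark-3.1.3-bin-hadoop3.2/jars/

    # ... then start the shell WITHOUT --packages; the jar is now on the
    # driver and executor classpath, independent of package resolution
    spark-3.1.3-bin-hadoop3.2/bin/spark-shell

If the avro read works this way but fails with --packages, the problem is almost certainly in the environment (such as a stale SPARK_HOME), not in the published spark-avro artifacts.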