Dear team,
With the 0.5.1 version released, users need to add `org.apache.spark:spark-avro_2.11:2.4.4` to `--packages` when starting the Hudi spark-shell, like below:

/-------------------------------------------------------------------------------------------------------------------------------------------------------------/
spark-2.4.4-bin-hadoop2.7/bin/spark-shell \
  --packages org.apache.hudi:hudi-spark-bundle_2.11:0.5.1-incubating,org.apache.spark:spark-avro_2.11:2.4.4 \
  --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'
/-------------------------------------------------------------------------------------------------------------------------------------------------------------/

From the spark-avro guide[1], we know that the spark-avro module is external and is not included in spark-2.4.4-bin-hadoop2.7.tgz[2]. So it may be better to relocate the spark-avro dependency using the maven-shade-plugin. If we do that, users could start Hudi the same way the 0.5.0 version does:

/-------------------------------------------------------------------------------------------------------------------------------------------------------------/
spark-2.4.4-bin-hadoop2.7/bin/spark-shell \
  --packages org.apache.hudi:hudi-spark-bundle_2.11:0.5.1-incubating \
  --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'
/-------------------------------------------------------------------------------------------------------------------------------------------------------------/

I have created a PR to fix this[3]. We may need more discussion about it; any suggestion is welcome, thanks very much :)

Current state:
@bhasudha : +1
@vinoth : -1

[1] http://spark.apache.org/docs/latest/sql-data-sources-avro.html
[2] http://mirror.bit.edu.cn/apache/spark/spark-2.4.4/spark-2.4.4-bin-hadoop2.7.tgz
[3] https://github.com/apache/incubator-hudi/pull/1290
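
P.S. For anyone unfamiliar with the relocation idea: the sketch below shows roughly what shading spark-avro into hudi-spark-bundle could look like in the bundle's pom.xml. This is only an illustrative fragment under my own assumptions (the include and relocation patterns shown here are hypothetical); the actual change is in the PR[3].

```xml
<!-- Sketch only: bundle the external spark-avro module into hudi-spark-bundle
     and relocate its classes. Patterns are illustrative, not the PR's exact ones. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <artifactSet>
          <includes>
            <!-- pull the external spark-avro module into the bundle jar -->
            <include>org.apache.spark:spark-avro_2.11</include>
          </includes>
        </artifactSet>
        <relocations>
          <relocation>
            <!-- move spark-avro classes under a Hudi-specific package so the
                 bundled copy cannot clash with one a user adds via --packages -->
            <pattern>org.apache.spark.sql.avro.</pattern>
            <shadedPattern>org.apache.hudi.org.apache.spark.sql.avro.</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```

With a relocation like this, the bundle jar carries its own shaded copy of spark-avro, which is why the second spark-shell command above no longer needs `org.apache.spark:spark-avro_2.11:2.4.4` in `--packages`.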