Hi Gurudatt,

I am guessing client mode does not support extracting jars from HDFS. Can you try changing the deploy mode to cluster (the default is client mode if you have not specified one)?
You can also try specifying `--packages org.apache.hudi:hudi-spark-bundle:0.5.0-incubating` instead of `--jars`. This would pull the jar from Maven directly.

Thanks,
Sudha

On Wed, Nov 13, 2019 at 1:19 AM Gurudatt Kulkarni <[email protected]> wrote:

> Hi All,
>
> I am running into a strange issue where I am unable to query Hudi tables
> via spark-shell. I followed the procedure as stated in the Hudi docs
> <https://hudi.apache.org/querying_data.html#spark>.
>
> I used this command:
>
> spark-shell --jars hdfs:///jars/hudi-spark-bundle-0.5.1-SNAPSHOT.jar --master yarn
>
> and added this config:
>
> spark.sparkContext.hadoopConfiguration.setClass("mapreduce.input.pathFilter.class",
>   classOf[org.apache.hudi.hadoop.HoodieROTablePathFilter],
>   classOf[org.apache.hadoop.fs.PathFilter]);
>
> Then I ran a simple select query on the Hive table via Spark SQL. It throws
> a java.lang.ClassNotFoundException:
> org.apache.hudi.hadoop.HoodieParquetInputFormat. I checked the
> hudi-spark-bundle jar for that particular class, and it is present in the
> jar. Also, the hudi-hadoop-mr bundle is available on the Hive classpath.
> Have I missed a step here?
>
> Regards,
> Gurudatt
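For reference, a minimal sketch of the suggested invocation (the bundle coordinates match the 0.5.0-incubating release; adjust the version to your build):

```shell
# Sketch: pull the Hudi Spark bundle from Maven Central via --packages
# instead of referencing a jar on HDFS with --jars.
spark-shell \
  --master yarn \
  --packages org.apache.hudi:hudi-spark-bundle:0.5.0-incubating
```

Inside the shell, the path-filter config from your email would stay the same:

```scala
// Register Hudi's path filter so Spark's Parquet reader skips
// obsolete file versions when querying the table.
spark.sparkContext.hadoopConfiguration.setClass(
  "mapreduce.input.pathFilter.class",
  classOf[org.apache.hudi.hadoop.HoodieROTablePathFilter],
  classOf[org.apache.hadoop.fs.PathFilter])
```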
