Hi,

When we call saveAsParquetFile on a SchemaRDD, we get the following error:

Py4JJavaError: An error occurred while calling o384.saveAsParquetFile.
: java.lang.NoClassDefFoundError: org/apache/hadoop/mapreduce/lib/output/DirectFileOutputCommitter
    at org.apache.spark.sql.parquet.InsertIntoParquetTable.execute(ParquetTableOperations.scala:240)
    at org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:425)
    at org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:425)
    at org.apache.spark.sql.SchemaRDDLike$class.saveAsParquetFile(SchemaRDDLike.scala:76)
    at org.apache.spark.sql.api.java.JavaSchemaRDD.saveAsParquetFile(JavaSchemaRDD.scala:42)

https://issues.apache.org/jira/browse/SPARK-3595 seems to have addressed this issue of respecting the OutputCommitter, but when I pull from master and try the same thing I still encounter the error.

I am on a MapR distribution, and my org\apache\hadoop\mapreduce\lib\output does not contain DirectFileOutputCommitter.

Best Regards,
Santosh
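
For anyone who wants to check their own distribution: the following is a minimal sketch, not part of Spark or Hadoop, that scans a directory of jars and lists those that actually ship the class named in the NoClassDefFoundError. The helper name jars_containing and the directory you pass in are hypothetical; point it at wherever your Hadoop/MapR jars live.

```python
import zipfile
from pathlib import Path

# Class named in the NoClassDefFoundError above.
MISSING_CLASS = "org.apache.hadoop.mapreduce.lib.output.DirectFileOutputCommitter"


def jars_containing(class_name, jar_dir):
    """Return sorted paths of jars under jar_dir that contain class_name.

    Hypothetical helper for illustration: a jar is a zip archive, so the
    class shows up as an entry such as 'org/apache/.../Name.class'.
    """
    entry = class_name.replace(".", "/") + ".class"
    hits = []
    for jar in sorted(Path(jar_dir).rglob("*.jar")):
        try:
            with zipfile.ZipFile(jar) as zf:
                if entry in zf.namelist():
                    hits.append(str(jar))
        except (zipfile.BadZipFile, OSError):
            # Skip unreadable or corrupt archives instead of aborting the scan.
            continue
    return hits

# Example (the directory is an assumption; substitute your own):
#   for path in jars_containing(MISSING_CLASS, "/path/to/hadoop/lib"):
#       print(path)
```

An empty result would suggest that none of the scanned jars provide the class, which matches the symptom described above.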