Hi,

When we call saveAsParquetFile on a SchemaRDD, we get the following error:


Py4JJavaError: An error occurred while calling o384.saveAsParquetFile.
: java.lang.NoClassDefFoundError: org/apache/hadoop/mapreduce/lib/output/DirectFileOutputCommitter
        at org.apache.spark.sql.parquet.InsertIntoParquetTable.execute(ParquetTableOperations.scala:240)
        at org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:425)
        at org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:425)
        at org.apache.spark.sql.SchemaRDDLike$class.saveAsParquetFile(SchemaRDDLike.scala:76)
        at org.apache.spark.sql.api.java.JavaSchemaRDD.saveAsParquetFile(JavaSchemaRDD.scala:42)



https://issues.apache.org/jira/browse/SPARK-3595 seems to have addressed this
by making Spark respect the configured OutputCommitter, but when I pull from
master and try the same call, I still get this error.

I am on a MapR distribution, and my org/apache/hadoop/mapreduce/lib/output
package does not contain DirectFileOutputCommitter.
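To double-check which jars (if any) actually ship the missing class, a quick scan of the Hadoop jar directory can help. This is a minimal sketch in Python (matching the PySpark call above), assuming the Hadoop jars are available on local disk; the function name jars_containing is hypothetical and not part of Spark or Hadoop.

```python
import glob
import os
import zipfile

# Jar entry for the class named in the NoClassDefFoundError above.
TARGET = "org/apache/hadoop/mapreduce/lib/output/DirectFileOutputCommitter.class"


def jars_containing(class_entry, jar_dir):
    """Return the jars under jar_dir (searched recursively) that contain class_entry.

    Hypothetical helper for illustration only.
    """
    hits = []
    pattern = os.path.join(jar_dir, "**", "*.jar")
    for jar in sorted(glob.glob(pattern, recursive=True)):
        try:
            with zipfile.ZipFile(jar) as zf:
                if class_entry in zf.namelist():
                    hits.append(jar)
        except (zipfile.BadZipFile, OSError):
            # Skip files named *.jar that are not readable zip archives.
            continue
    return hits
```

Calling jars_containing(TARGET, "<your Hadoop lib directory>") returns an empty list when no jar in that directory provides the class, which would match what I see on MapR.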

Best Regards,
Santosh
