Thanks!
Let me update the status.
I have copied the DirectOutputCommitter into my local code and set:

conf.set("spark.hadoop.mapred.output.committer.class",
  "org..DirectOutputCommitter")
It works perfectly.
Thanks everyone
Regards,
Shuai
From: Aaron Davidson
I'm not super familiar with S3, but I think the issue is that you want to use
a different output committer with object stores, which don't have a
simple move operation. There have been a few other threads on S3
output committers. I think the most relevant one for you is probably this
open JIRA:
Actually, this is the more relevant JIRA (which is resolved):
https://issues.apache.org/jira/browse/SPARK-3595
SPARK-6352 is about saveAsParquetFile, which is not in use here.
Here is a DirectOutputCommitter implementation:
https://gist.github.com/aarondav/c513916e72101bbe14ec
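For context, the core of the pattern in that gist is an OutputCommitter whose commit steps are no-ops, so tasks write straight to the final output location and no _temporary directory (or rename on commit) is involved. A minimal sketch against the old org.apache.hadoop.mapred API (this is the general shape of the pattern, not the gist verbatim, and is only safe for jobs whose task output is idempotent):

```scala
import org.apache.hadoop.mapred.{JobContext, OutputCommitter, TaskAttemptContext}

// Sketch only: every commit-related step is a no-op, so output files
// land directly in the final path. There is no rollback if a task
// fails partway, which is the trade-off of this committer on S3.
class DirectOutputCommitter extends OutputCommitter {
  override def setupJob(jobContext: JobContext): Unit = {}
  override def setupTask(taskContext: TaskAttemptContext): Unit = {}
  override def needsTaskCommit(taskContext: TaskAttemptContext): Boolean = false
  override def commitTask(taskContext: TaskAttemptContext): Unit = {}
  override def abortTask(taskContext: TaskAttemptContext): Unit = {}
}
```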
If you use fileStream, there's an option to filter out files. In your case
you can easily create a filter to skip _temporary files. Note that you will
then have to move your code inside foreachRDD of the DStream, since the
application will become a streaming app.
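The filter mentioned above is just a Path => Boolean function passed to fileStream. A sketch assuming the standard Spark Streaming API (the directory, batch interval, and processing body are placeholders):

```scala
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Accept only finished output files: skip _temporary and other
// underscore-prefixed markers such as _SUCCESS.
def isFinalFile(path: Path): Boolean = !path.getName.startsWith("_")

val ssc = new StreamingContext(sc, Seconds(30)) // sc: existing SparkContext

val lines = ssc
  .fileStream[LongWritable, Text, TextInputFormat](
    "s3n://my-bucket/output",        // placeholder input directory
    (p: Path) => isFinalFile(p),
    newFilesOnly = true)
  .map(_._2.toString)

// The batch logic moves inside foreachRDD once this is a streaming app.
lines.foreachRDD { rdd =>
  // existing processing of the output files goes here
}

ssc.start()
ssc.awaitTermination()
```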
Thanks
Best Regards
On Sat, Mar
One more thing I forgot to mention: even though I get this exception, the
results are not well formed in my target folder (some of the files are there;
the rest are under a different folder structure inside the _temporary folder).
In the spark-shell web UI, the step is still marked as successful. I think
this is a bug?