You are going to need hadoop-3.1 on your classpath, with hadoop-aws and the same aws-sdk it was built with (1.11.something). Mixing Hadoop JARs is doomed. Using a different AWS SDK JAR is a bit risky, though the more recent SDK upgrades have all been fairly low-stress.
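As a minimal sketch of that advice (the 3.1.4 version number is an assumption here; pin hadoop-aws to the exact Hadoop release on your cluster, and let it pull in the aws-java-sdk-bundle it was built against rather than supplying your own SDK jar):

```
# Sketch only: hadoop-aws must match the cluster's Hadoop version exactly.
# Its transitive aws-java-sdk-bundle dependency is the 1.11.x build it was
# tested with -- avoid overriding it with a different SDK jar.
spark-submit \
  --packages org.apache.hadoop:hadoop-aws:3.1.4 \
  --conf spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem \
  your-job.py
```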
On Fri, 19 Jun 2020 at 05:39, murat migdisoglu <murat.migdiso...@gmail.com> wrote:

> Hi all
> I've upgraded my test cluster to spark 3 and changed my committer to
> directory, and I still get this error. The documentation is somewhat
> obscure on that.
> Do I need to add a third-party jar to support the new committers?
>
> java.lang.ClassNotFoundException:
> org.apache.spark.internal.io.cloud.PathOutputCommitProtocol
>
>
> On Thu, Jun 18, 2020 at 1:35 AM murat migdisoglu <
> murat.migdiso...@gmail.com> wrote:
>
>> Hello all,
>> we have a hadoop cluster (using yarn) using s3 as the filesystem, with
>> s3guard enabled.
>> We are using hadoop 3.2.1 with spark 2.4.5.
>>
>> When I try to save a dataframe in parquet format, I get the following
>> exception:
>> java.lang.ClassNotFoundException:
>> com.hortonworks.spark.cloud.commit.PathOutputCommitProtocol
>>
>> My relevant spark configurations are as follows:
>>
>> "hadoop.mapreduce.outputcommitter.factory.scheme.s3a": "org.apache.hadoop.fs.s3a.commit.S3ACommitterFactory",
>> "fs.s3a.committer.name": "magic",
>> "fs.s3a.committer.magic.enabled": true,
>> "fs.s3a.impl": "org.apache.hadoop.fs.s3a.S3AFileSystem",
>>
>> While Spark streaming fails with the exception above, Apache Beam
>> succeeds in writing parquet files.
>> What might be the problem?
>>
>> Thanks in advance
>>
>>
>> --
>> "Talkers aren’t good doers. Rest assured that we’re going there to use
>> our hands, not our tongues."
>> W. Shakespeare