You are going to need hadoop-3.1 on your classpath, with hadoop-aws and the
same AWS SDK it was built with (1.11.something). Mixing hadoop JARs from
different releases is doomed. Using a different AWS SDK JAR is a bit risky,
though more recent upgrades have all been fairly low stress.
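
A quick way to spot mixed hadoop JARs is to group the hadoop-* file names in
your jars directory by version. The sketch below is only an illustration: the
helper names (hadoop_versions, is_consistent) are made up for this mail, and
the filename pattern is naive and assumes the usual artifact-version.jar
naming. It does not check the AWS SDK version, which you still have to compare
against the one your hadoop-aws release was built with.

```python
import re
from collections import defaultdict

# Assumed naming: "<artifact>-<dotted version>[-classifier].jar",
# e.g. "hadoop-aws-3.1.2.jar". Anything else is ignored.
_HADOOP_JAR = re.compile(
    r"^(hadoop-[a-z][a-z0-9-]*?)-(\d+(?:\.\d+)+)(?:-[\w.]+)?\.jar$"
)


def hadoop_versions(jar_names):
    """Group hadoop-* artifact names by the version in their file name."""
    found = defaultdict(list)
    for name in jar_names:
        match = _HADOOP_JAR.match(name)
        if match:
            artifact, version = match.group(1), match.group(2)
            found[version].append(artifact)
    return {version: sorted(artifacts) for version, artifacts in found.items()}


def is_consistent(jar_names):
    """True when at most one hadoop version appears among the given JARs."""
    return len(hadoop_versions(jar_names)) <= 1
```

To use it, pass the file names from wherever your JARs live, for example
os.listdir() on the jars directory of your Spark install. More than one key
in the result means hadoop releases are mixed on the classpath.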

On Fri, 19 Jun 2020 at 05:39, murat migdisoglu <murat.migdiso...@gmail.com>
wrote:

> Hi all
> I've upgraded my test cluster to spark 3 and changed my committer to
> directory, and I still get this error. The documentation is somewhat
> obscure on that.
> Do I need to add a third party jar to support the new committers?
>
> java.lang.ClassNotFoundException:
> org.apache.spark.internal.io.cloud.PathOutputCommitProtocol
>
>
> On Thu, Jun 18, 2020 at 1:35 AM murat migdisoglu <
> murat.migdiso...@gmail.com> wrote:
>
>> Hello all,
>> we have a hadoop cluster (using yarn) using s3 as the filesystem, with
>> s3guard enabled.
>> We are using hadoop 3.2.1 with spark 2.4.5.
>>
>> When I try to save a dataframe in parquet format, I get the following
>> exception:
>> java.lang.ClassNotFoundException:
>> com.hortonworks.spark.cloud.commit.PathOutputCommitProtocol
>>
>> My relevant spark configurations are as follows:
>>
>> "hadoop.mapreduce.outputcommitter.factory.scheme.s3a":"org.apache.hadoop.fs.s3a.commit.S3ACommitterFactory",
>> "fs.s3a.committer.name": "magic",
>> "fs.s3a.committer.magic.enabled": true,
>> "fs.s3a.impl": "org.apache.hadoop.fs.s3a.S3AFileSystem",
>>
>> While spark streaming fails with the exception above, apache beam
>> succeeds writing parquet files.
>> What might be the problem?
>>
>> Thanks in advance
>>
>>
>> --
>> "Talkers aren’t good doers. Rest assured that we’re going there to use
>> our hands, not our tongues."
>> W. Shakespeare
>>
>
