Hi,

I am not sure about this, but is there any requirement to use S3A at all?


Regards,
Gourav

On Tue, Jul 21, 2020 at 12:07 PM Steve Loughran <ste...@cloudera.com.invalid>
wrote:

>
>
> On Tue, 7 Jul 2020 at 03:42, Stephen Coy <s...@infomedia.com.au.invalid>
> wrote:
>
>> Hi Steve,
>>
>> While I understand your point regarding the mixing of Hadoop jars, this
>> does not address the java.lang.ClassNotFoundException.
>>
>> Prebuilt Apache Spark 3.0 builds are only available for Hadoop 2.7 or
>> Hadoop 3.2. Not Hadoop 3.1.
>>
>
> sorry, I should have been clearer. Hadoop 3.2.x has everything you need.
>
>
>
>>
>> The only place that I have found that missing class is in the Spark
>> “hadoop-cloud” source module, and currently the only way to get the jar
>> containing it is to build it yourself. If any of the devs are listening,
>> it would be nice if this were included in the standard distribution. It
>> has a sizeable chunk of a repackaged Jetty embedded in it, which I find a
>> bit odd.
>>
>> But I am relatively new to this stuff so I could be wrong.
>>
>> I am currently running Spark 3.0 clusters with no HDFS. Spark is set up
>> like:
>>
>> hadoopConfiguration.set("spark.hadoop.fs.s3a.committer.name", "directory");
>> hadoopConfiguration.set("spark.sql.sources.commitProtocolClass", "org.apache.spark.internal.io.cloud.PathOutputCommitProtocol");
>> hadoopConfiguration.set("spark.sql.parquet.output.committer.class", "org.apache.spark.internal.io.cloud.BindingParquetOutputCommitter");
>> hadoopConfiguration.set("fs.s3a.connection.maximum", Integer.toString(coreCount * 2));
>>
>> Querying and updating s3a data sources seems to be working ok.
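[Editor's note: for reference, the same committer settings can also be passed at submit time. A minimal sketch that renders them as spark-submit --conf flags; the property names are taken from the snippet above, while the core count is an assumed example value:

```python
# Render the committer settings above as spark-submit --conf flags.
# Property names mirror the mail; core_count = 8 is an assumed example.
# Note: keys set on hadoopConfiguration appear here with the
# "spark.hadoop." prefix so spark-submit forwards them to Hadoop.
core_count = 8

settings = {
    "spark.hadoop.fs.s3a.committer.name": "directory",
    "spark.sql.sources.commitProtocolClass":
        "org.apache.spark.internal.io.cloud.PathOutputCommitProtocol",
    "spark.sql.parquet.output.committer.class":
        "org.apache.spark.internal.io.cloud.BindingParquetOutputCommitter",
    "spark.hadoop.fs.s3a.connection.maximum": str(core_count * 2),
}

flags = " ".join(f"--conf {k}={v}" for k, v in settings.items())
print(flags)
```

This only builds the flag string; the cluster still needs the jar that contains PathOutputCommitProtocol on its classpath.]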
>>
>> Thanks,
>>
>> Steve C
>>
>> On 29 Jun 2020, at 10:34 pm, Steve Loughran <ste...@cloudera.com.INVALID>
>> wrote:
>>
>> you are going to need hadoop-3.1 on your classpath, with hadoop-aws and
>> the same aws-sdk it was built with (1.11.something). Mixing Hadoop JARs is
>> doomed. Using a different AWS SDK jar is a bit risky, though more recent
>> upgrades have all been fairly low stress.
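[Editor's note: a quick way to check which Hadoop and AWS SDK jars are actually on the classpath, assuming a standard Spark layout with a jars/ directory under SPARK_HOME:

```shell
# List the Hadoop and AWS SDK jars under SPARK_HOME/jars so version
# mismatches are easy to spot (the SPARK_HOME/jars layout is an assumption).
ls "${SPARK_HOME:-/opt/spark}/jars" | grep -E 'hadoop-(aws|common)|aws-java-sdk' | sort
```

All hadoop-* jars listed should carry the same version, and the aws-java-sdk jar should be the one that hadoop-aws was built against.]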
>>
>> On Fri, 19 Jun 2020 at 05:39, murat migdisoglu <
>> murat.migdiso...@gmail.com> wrote:
>>
>>> Hi all,
>>> I've upgraded my test cluster to Spark 3 and changed my committer to
>>> "directory", and I still get this error. The documentation is somewhat
>>> obscure on this point.
>>> Do I need to add a third-party jar to support the new committers?
>>>
>>> java.lang.ClassNotFoundException:
>>> org.apache.spark.internal.io.cloud.PathOutputCommitProtocol
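[Editor's note: one way to check whether a missing class is actually present in any jar on the classpath is a small scan like the following. This is an illustrative helper, not part of Spark:

```python
# Check whether a jar (a zip archive) contains a given class entry.
# Usage sketch: scan_jars("org.apache.spark.internal.io.cloud."
#                         "PathOutputCommitProtocol", ["jars/a.jar", ...])
import zipfile


def jar_contains_class(jar_path, class_name):
    """Return True if the jar contains the fully-qualified class."""
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()


def scan_jars(class_name, jar_paths):
    """Return the paths of the jars that contain the class."""
    return [p for p in jar_paths if jar_contains_class(p, class_name)]
```

If the scan finds no jar containing PathOutputCommitProtocol, that matches the observation above that the class only ships in the spark-hadoop-cloud module.]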
>>>
>>>
>>> On Thu, Jun 18, 2020 at 1:35 AM murat migdisoglu <
>>> murat.migdiso...@gmail.com> wrote:
>>>
>>>> Hello all,
>>>> We have a Hadoop cluster (using YARN) that uses S3 as the filesystem,
>>>> with S3Guard enabled.
>>>> We are using Hadoop 3.2.1 with Spark 2.4.5.
>>>>
>>>> When I try to save a dataframe in parquet format, I get the following
>>>> exception:
>>>> java.lang.ClassNotFoundException:
>>>> com.hortonworks.spark.cloud.commit.PathOutputCommitProtocol
>>>>
>>>> My relevant Spark configurations are as follows:
>>>>
>>>> "hadoop.mapreduce.outputcommitter.factory.scheme.s3a": "org.apache.hadoop.fs.s3a.commit.S3ACommitterFactory",
>>>> "fs.s3a.committer.name": "magic",
>>>> "fs.s3a.committer.magic.enabled": true,
>>>> "fs.s3a.impl": "org.apache.hadoop.fs.s3a.S3AFileSystem",
>>>>
>>>> While Spark Streaming fails with the exception above, Apache Beam
>>>> succeeds in writing Parquet files.
>>>> What might be the problem?
>>>>
>>>> Thanks in advance
>>>>
>>>>
>>>> --
>>>> "Talkers aren’t good doers. Rest assured that we’re going there to use
>>>> our hands, not our tongues."
>>>> W. Shakespeare
>>>>
>>>
>>>
>>>
>>
>>
>
