On Tue, 7 Jul 2020 at 03:42, Stephen Coy <s...@infomedia.com.au.invalid> wrote:
> Hi Steve,
>
> While I understand your point regarding the mixing of Hadoop jars, this
> does not address the java.lang.ClassNotFoundException.
>
> Prebuilt Apache Spark 3.0 builds are only available for Hadoop 2.7 or
> Hadoop 3.2. Not Hadoop 3.1.

Sorry, I should have been clearer: Hadoop 3.2.x has everything you need.

> The only place that I have found that missing class is in the Spark
> "hadoop-cloud" source module, and currently the only way to get the jar
> containing it is to build it yourself. If any of the devs are listening, it
> would be nice if this were included in the standard distribution. It has a
> sizeable chunk of a repackaged Jetty embedded in it, which I find a bit odd.
>
> But I am relatively new to this stuff, so I could be wrong.
>
> I am currently running Spark 3.0 clusters with no HDFS. Spark is set up
> like:
>
> hadoopConfiguration.set("spark.hadoop.fs.s3a.committer.name",
>     "directory");
> hadoopConfiguration.set("spark.sql.sources.commitProtocolClass",
>     "org.apache.spark.internal.io.cloud.PathOutputCommitProtocol");
> hadoopConfiguration.set("spark.sql.parquet.output.committer.class",
>     "org.apache.spark.internal.io.cloud.BindingParquetOutputCommitter");
> hadoopConfiguration.set("fs.s3a.connection.maximum",
>     Integer.toString(coreCount * 2));
>
> Querying and updating s3a data sources seems to be working OK.
>
> Thanks,
>
> Steve C
>
> On 29 Jun 2020, at 10:34 pm, Steve Loughran <ste...@cloudera.com.INVALID> wrote:
>
> You are going to need hadoop-3.1 on your classpath, with hadoop-aws and
> the same aws-sdk it was built with (1.11.something). Mixing Hadoop JARs is
> doomed. Using a different aws-sdk jar is a bit risky, though more recent
> upgrades have all been fairly low stress.
>
> On Fri, 19 Jun 2020 at 05:39, murat migdisoglu <murat.migdiso...@gmail.com> wrote:
>
>> Hi all,
>> I've upgraded my test cluster to Spark 3 and changed my committer to
>> directory, and I still get this error.
>> The documentation is somewhat obscure on that.
>> Do I need to add a third-party jar to support the new committers?
>>
>> java.lang.ClassNotFoundException:
>> org.apache.spark.internal.io.cloud.PathOutputCommitProtocol
>>
>> On Thu, Jun 18, 2020 at 1:35 AM murat migdisoglu <murat.migdiso...@gmail.com> wrote:
>>
>>> Hello all,
>>> We have a Hadoop cluster (using YARN) using S3 as the filesystem, with
>>> S3Guard enabled.
>>> We are using Hadoop 3.2.1 with Spark 2.4.5.
>>>
>>> When I try to save a dataframe in Parquet format, I get the following
>>> exception:
>>> java.lang.ClassNotFoundException:
>>> com.hortonworks.spark.cloud.commit.PathOutputCommitProtocol
>>>
>>> My relevant Spark configurations are as follows:
>>>
>>> "hadoop.mapreduce.outputcommitter.factory.scheme.s3a": "org.apache.hadoop.fs.s3a.commit.S3ACommitterFactory",
>>> "fs.s3a.committer.name": "magic",
>>> "fs.s3a.committer.magic.enabled": true,
>>> "fs.s3a.impl": "org.apache.hadoop.fs.s3a.S3AFileSystem",
>>>
>>> While Spark Streaming fails with the exception above, Apache Beam
>>> succeeds in writing Parquet files.
>>> What might be the problem?
>>>
>>> Thanks in advance
>>>
>>> --
>>> "Talkers aren’t good doers. Rest assured that we’re going there to use
>>> our hands, not our tongues."
>>> W. Shakespeare
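[Editor's note] As the thread says, the missing class lives in Spark's optional "hadoop-cloud" source module, which is not in the prebuilt 3.0.0 downloads, so the jar has to be built from a Spark source checkout. A rough sketch of such a build follows; the Maven profile names (-Phadoop-cloud, -Phadoop-3.2) are my assumption based on the Spark build setup, not something stated in the thread:

```
# From a Spark 3.0.x source checkout: enable the hadoop-cloud profile so
# the spark-hadoop-cloud module (PathOutputCommitProtocol, the committer
# bindings) is compiled, and pick the Hadoop 3.2 line to match the cluster.
./build/mvn -Phadoop-cloud -Phadoop-3.2 -DskipTests clean package
```

The resulting spark-hadoop-cloud jar then needs to be on the driver and executor classpaths alongside hadoop-aws and the aws-sdk version it was built against.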
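[Editor's note] For anyone wiring this up through spark-defaults.conf rather than programmatically, Steve C's settings above map to roughly the following fragment. This is a sketch: the spark.hadoop. prefix form and the illustrative connection count are my assumptions (the thread sets fs.s3a.connection.maximum to twice the core count), and it presumes the spark-hadoop-cloud jar is on the classpath:

```
# Directory staging committer via Spark's cloud committer bindings (sketch)
spark.hadoop.fs.s3a.committer.name       directory
spark.sql.sources.commitProtocolClass    org.apache.spark.internal.io.cloud.PathOutputCommitProtocol
spark.sql.parquet.output.committer.class org.apache.spark.internal.io.cloud.BindingParquetOutputCommitter
# Illustrative value; the thread sizes this as (core count * 2)
spark.hadoop.fs.s3a.connection.maximum   64
```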