Re: Spark 2.3 in oozie

2018-05-16 Thread Peter Cseh
Wow, that's great news! Can I ask you to summarize the steps necessary to make this happen? It would be good to see everything together - also, it would probably help others as well. Thank you for sharing your struggles - and solutions as well! Peter On Wed, May 16, 2018 at 10:49 PM, purna

Re: Spark 2.3 in oozie

2018-05-16 Thread purna pradeep
Thanks Peter! I’m able to run spark pi example on Kubernetes cluster from oozie after this change On Wed, May 16, 2018 at 10:27 AM Peter Cseh wrote: > The version of the xml schema has nothing to do with the version of the > component you're using. > > Thanks for

Re: Oozie for spark jobs without Hadoop

2018-05-16 Thread Peter Cseh
Great progress there purna! :) Have you tried adding these properties to the coordinator's configuration? We usually use the action config to build up the connection to the distributed file system. I'm not sure we use these when polling the dependencies for coordinators, but I'm excited

Re: Oozie for spark jobs without Hadoop

2018-05-16 Thread Artem Ervits
Here's some related info https://docs.hortonworks.com/HDPDocuments/HDCloudAWS/HDCloudAWS-1.8.0/bk_hdcloud-aws/content/s3-trouble/index.html https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/index.md On Wed, May 16, 2018, 3:45 PM purna

Re: Oozie for spark jobs without Hadoop

2018-05-16 Thread purna pradeep
Peter, I got rid of this error by adding hadoop-aws-2.8.3.jar and jets3t-0.9.4.jar, but I'm getting the below error now: java.lang.IllegalArgumentException: AWS Access Key ID and Secret Access Key must be specified by setting the fs.s3.awsAccessKeyId and fs.s3.awsSecretAccessKey properties
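For that error, the credentials can be supplied as configuration properties. A minimal sketch, using the exact property names from the error message; the values are placeholders, and whether they belong in the action configuration or oozie-site.xml depends on the setup:

```xml
<!-- Sketch only: property names taken from the error above; values are placeholders. -->
<property>
  <name>fs.s3.awsAccessKeyId</name>
  <value>YOUR_ACCESS_KEY_ID</value>
</property>
<property>
  <name>fs.s3.awsSecretAccessKey</name>
  <value>YOUR_SECRET_ACCESS_KEY</value>
</property>
```

Note that the s3a connector in hadoop-aws uses the analogous keys fs.s3a.access.key and fs.s3a.secret.key instead.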

Re: Oozie for spark jobs without Hadoop

2018-05-16 Thread purna pradeep
I have tried this, just added s3 instead of *: oozie.service.HadoopAccessorService.supported.filesystems hdfs,hftp,webhdfs,s3 Getting the below error: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found at
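A ClassNotFoundException for S3AFileSystem means the Oozie server itself cannot see the hadoop-aws jar. Making extra jars visible to the server typically looks like the following (a sketch; the jar versions and the libext location are assumptions about this particular install):

```sh
# Illustrative only: make hadoop-aws and its dependencies visible to the Oozie server.
cp hadoop-aws-2.8.3.jar jets3t-0.9.4.jar /path/to/oozie/libext/
bin/oozie-setup.sh prepare-war
# then restart the Oozie server
```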

Re: Oozie for spark jobs without Hadoop

2018-05-16 Thread purna pradeep
This is what is in the logs 2018-05-16 14:06:13,500 INFO URIHandlerService:520 - SERVER[localhost] Loaded urihandlers [org.apache.oozie.dependency.FSURIHandler] 2018-05-16 14:06:13,501 INFO URIHandlerService:520 - SERVER[localhost] Loaded default urihandler

Re: Oozie for spark jobs without Hadoop

2018-05-16 Thread Peter Cseh
That's strange, this exception should not happen in that case. Can you check the server logs for messages like this? LOG.info("Loaded urihandlers {0}", Arrays.toString(classes)); LOG.info("Loaded default urihandler {0}", defaultHandler.getClass().getName()); Thanks On Wed, May 16,

Re: Oozie for spark jobs without Hadoop

2018-05-16 Thread purna pradeep
This is what I already have in my oozie-site.xml oozie.service.HadoopAccessorService.supported.filesystems * On Wed, May 16, 2018 at 11:37 AM Peter Cseh wrote: > You'll have to configure > oozie.service.HadoopAccessorService.supported.filesystems >

Re: Oozie for spark jobs without Hadoop

2018-05-16 Thread Peter Cseh
You'll have to configure oozie.service.HadoopAccessorService.supported.filesystems hdfs,hftp,webhdfs This lists the different filesystems supported for federation. If the wildcard "*" is specified, then ALL file schemes will be allowed. For testing purposes it's OK to put * in there in
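In oozie-site.xml, the property Peter describes would look roughly like this (a sketch; the scheme list is illustrative, and "*" would allow everything):

```xml
<!-- Sketch: list the URI schemes Oozie should accept, or "*" for all (testing only). -->
<property>
  <name>oozie.service.HadoopAccessorService.supported.filesystems</name>
  <value>hdfs,hftp,webhdfs,s3,s3a</value>
</property>
```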

Re: Oozie for spark jobs without Hadoop

2018-05-16 Thread purna pradeep
+Peter On Wed, May 16, 2018 at 11:29 AM purna pradeep wrote: > Peter, > > I have tried to specify dataset with uri starting with s3://, s3a:// and > s3n:// and I am getting exception > > > > Exception occurred:E0904: Scheme [s3] not supported in uri >

Re: Oozie for spark jobs without Hadoop

2018-05-16 Thread purna pradeep
Peter, I have tried to specify dataset with uri starting with s3://, s3a:// and s3n:// and I am getting exception Exception occurred:E0904: Scheme [s3] not supported in uri [s3://mybucket/input.data] Making the job failed org.apache.oozie.dependency.URIHandlerException: E0904: Scheme [s3] not
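For reference, a coordinator dataset of the kind being attempted might be declared as follows (a sketch; the bucket, path, and frequency are placeholders, and the URI scheme must also appear in the supported.filesystems property for the E0904 check to pass):

```xml
<!-- Illustrative dataset polling an S3 path; all names are placeholders. -->
<datasets>
  <dataset name="input" frequency="${coord:days(1)}"
           initial-instance="2018-05-01T00:00Z" timezone="UTC">
    <uri-template>s3a://mybucket/input/${YEAR}/${MONTH}/${DAY}</uri-template>
    <done-flag></done-flag>
  </dataset>
</datasets>
```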

Re: Oozie for spark jobs without Hadoop

2018-05-16 Thread Shikin, Igor
Hi Peter, I am working with Purna. I have tried to specify dataset with uri starting with s3://, s3a:// and s3n:// and I am getting exception Exception occurred:E0904: Scheme [s3] not supported in uri [s3://cmsegmentation-qa/oozie-test/input.data] Making the job failed

Re: Oozie for spark jobs without Hadoop

2018-05-16 Thread Peter Cseh
I think it should be possible for Oozie to poll S3. Check out this description of how to make it work in jobs; something similar should work on the server side as well On Tue, May 15, 2018 at 4:43 PM, purna

Re: Spark 2.3 in oozie

2018-05-16 Thread Peter Cseh
The version of the xml schema has nothing to do with the version of the component you're using. Thanks for verifying that -Dspark.scala.binary.version=2.11 is required for compilation with Spark 2.3.0 Oozie does not pull in Spark's Kubernetes artifact. To make it part of the Oozie Spark
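A build invocation along the lines being verified might look like this (a sketch; -Dspark.scala.binary.version=2.11 is the flag from the message, while the spark.version property and the use of mkdistro.sh are assumptions about how this Oozie build is driven):

```sh
# Illustrative only: rebuild the Oozie distro against Spark 2.3.0 / Scala 2.11.
bin/mkdistro.sh -DskipTests \
    -Dspark.version=2.3.0 -Dspark.scala.binary.version=2.11
```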