Re: Oozie for spark jobs without Hadoop

2018-05-21 Thread purna pradeep
Here you go!
- Add oozie.service.HadoopAccessorService.supported.filesystems as * in oozie-site.xml
- Include hadoop-aws-2.8.3.jar
- Rebuild Oozie with -Dhttpclient.version=4.5.5 -Dhttpcore.version=4.4.9
- Set jetty_opts with proxy values
On Sat, May 19, 2018 at 2:17 AM Peter
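The build and environment steps above can be sketched roughly as follows, assuming a Maven-based Oozie source build (the mkdistro.sh script path follows the standard Oozie source layout; the proxy host and port are placeholders):

```shell
# Rebuild the Oozie distro against the newer httpclient/httpcore
# (versions taken from this thread; skip tests to speed up the build)
bin/mkdistro.sh -DskipTests \
    -Dhttpclient.version=4.5.5 \
    -Dhttpcore.version=4.4.9

# Pass proxy settings to the embedded Jetty server via jetty_opts
# (proxy.example.com:3128 is a placeholder for your proxy)
export JETTY_OPTS="-Dhttp.proxyHost=proxy.example.com -Dhttp.proxyPort=3128 \
 -Dhttps.proxyHost=proxy.example.com -Dhttps.proxyPort=3128"
```

This is a build/config sketch, not a tested recipe; adjust paths and versions to your Oozie release.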

Re: Oozie for spark jobs without Hadoop

2018-05-19 Thread Peter Cseh
Wow, great work! Can you please summarize the required steps? This would be useful for others, so we should probably add it to our documentation. Thanks in advance! Peter On Fri, May 18, 2018 at 11:33 PM, purna pradeep wrote: > I got this fixed by setting jetty_opts with

Re: Oozie for spark jobs without Hadoop

2018-05-17 Thread purna pradeep
OK, I fixed this by adding the AWS keys in Oozie, but now I'm getting the error below. I have tried setting the proxy in core-site.xml, but no luck:
2018-05-17 15:39:20,602 ERROR CoordInputLogicEvaluatorPhaseOne:517 - SERVER[localhost] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[000-180517144113498-oozie-xjt0-C]
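For what it's worth, the s3a connector has its own proxy properties in hadoop-aws, separate from the JVM-wide http.proxyHost settings; a core-site.xml sketch (host and port are placeholders):

```xml
<property>
  <name>fs.s3a.proxy.host</name>
  <value>proxy.example.com</value>
</property>
<property>
  <name>fs.s3a.proxy.port</name>
  <value>3128</value>
</property>
```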

Re: Oozie for spark jobs without Hadoop

2018-05-17 Thread Peter Cseh
Can you try configuring the access keys via environment variables in the server? https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html#Authenticating_via_environment_variables It's possible that we don't propagate the coordinator action's configuration properly to the
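Per the linked hadoop-aws documentation, the environment-variable route looks like the following sketch: the variables must be set in the environment that launches the Oozie server process so the s3a connector can inherit them (the key values are placeholders):

```shell
# Placeholder credentials; hadoop-aws reads these standard AWS variables
export AWS_ACCESS_KEY_ID="AKIAEXAMPLEKEY"
export AWS_SECRET_ACCESS_KEY="example-secret-key"
```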

Re: Oozie for spark jobs without Hadoop

2018-05-17 Thread purna pradeep
OK, I got past this error by rebuilding Oozie with -Dhttpclient.version=4.5.5 -Dhttpcore.version=4.4.9. Now I'm getting this error: ACTION[000-180517144113498-oozie-xjt0-C@1] org.apache.oozie.service.HadoopAccessorException: E0902: Exception occurred: [doesBucketExist on

Re: Oozie for spark jobs without Hadoop

2018-05-17 Thread purna pradeep
Peter, Also, when I submit a job with the new httpclient jar, I get ```Error: IO_ERROR : java.io.IOException: Error while connecting Oozie server. No of retries = 1. Exception = Could not authenticate, Authentication failed, status: 500, message: Server Error``` On Thu, May 17, 2018 at 12:14 PM

Re: Oozie for spark jobs without Hadoop

2018-05-17 Thread purna pradeep
OK, I have tried this. It appears that s3a support requires httpclient 4.4.x, and Oozie is bundled with httpclient 4.3.6. When httpclient is upgraded, the ext UI stops loading. On Thu, May 17, 2018 at 10:28 AM Peter Cseh wrote: > Purna, > > Based on >

Re: Oozie for spark jobs without Hadoop

2018-05-17 Thread Peter Cseh
Purna, Based on https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html#S3 you should try to go for s3a. You'll have to include the AWS SDK as well if I see it correctly: https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html#S3A Also, the property names

Re: Oozie for spark jobs without Hadoop

2018-05-16 Thread Peter Cseh
Great progress there, Purna! :) Have you tried adding these properties to the coordinator's configuration? We usually use the action config to build up the connection to the distributed file system. I'm not sure we're using these when polling the dependencies for coordinators, but I'm excited

Re: Oozie for spark jobs without Hadoop

2018-05-16 Thread Artem Ervits
Here's some related info https://docs.hortonworks.com/HDPDocuments/HDCloudAWS/HDCloudAWS-1.8.0/bk_hdcloud-aws/content/s3-trouble/index.html https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/index.md On Wed, May 16, 2018, 3:45 PM purna

Re: Oozie for spark jobs without Hadoop

2018-05-16 Thread purna pradeep
Peter, I got rid of this error by adding hadoop-aws-2.8.3.jar and jets3t-0.9.4.jar, but I'm getting the error below now: java.lang.IllegalArgumentException: AWS Access Key ID and Secret Access Key must be specified by setting the fs.s3.awsAccessKeyId and fs.s3.awsSecretAccessKey properties
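The error above is the legacy s3:// connector asking for fs.s3.* keys; with the s3a connector the credentials go under fs.s3a.* instead. A minimal core-site.xml sketch (key values are placeholders):

```xml
<property>
  <name>fs.s3a.access.key</name>
  <value>AKIAEXAMPLEKEY</value>
</property>
<property>
  <name>fs.s3a.secret.key</name>
  <value>example-secret-key</value>
</property>
```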

Re: Oozie for spark jobs without Hadoop

2018-05-16 Thread purna pradeep
I have tried this, just added s3 instead of *: oozie.service.HadoopAccessorService.supported.filesystems = hdfs,hftp,webhdfs,s3. I'm getting the error below: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found at

Re: Oozie for spark jobs without Hadoop

2018-05-16 Thread purna pradeep
This is what is in the logs:
2018-05-16 14:06:13,500 INFO URIHandlerService:520 - SERVER[localhost] Loaded urihandlers [org.apache.oozie.dependency.FSURIHandler]
2018-05-16 14:06:13,501 INFO URIHandlerService:520 - SERVER[localhost] Loaded default urihandler

Re: Oozie for spark jobs without Hadoop

2018-05-16 Thread Peter Cseh
That's strange, this exception should not happen in that case. Can you check the server logs for messages like this? LOG.info("Loaded urihandlers {0}", Arrays.toString(classes)); LOG.info("Loaded default urihandler {0}", defaultHandler.getClass().getName()); Thanks On Wed, May 16,

Re: Oozie for spark jobs without Hadoop

2018-05-16 Thread purna pradeep
This is what I already have in my oozie-site.xml: oozie.service.HadoopAccessorService.supported.filesystems = * On Wed, May 16, 2018 at 11:37 AM Peter Cseh wrote: > You'll have to configure > oozie.service.HadoopAccessorService.supported.filesystems >

Re: Oozie for spark jobs without Hadoop

2018-05-16 Thread Peter Cseh
You'll have to configure oozie.service.HadoopAccessorService.supported.filesystems (default: hdfs,hftp,webhdfs). It enlists the different filesystems supported for federation; if wildcard "*" is specified, then ALL file schemes will be allowed. For testing purposes it's OK to put * in there in
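As an oozie-site.xml fragment, the property described above would look something like this (the value shown adds the s3a scheme; * would allow everything):

```xml
<property>
  <name>oozie.service.HadoopAccessorService.supported.filesystems</name>
  <value>hdfs,hftp,webhdfs,s3a</value>
  <description>Enlist the different filesystems supported for federation.
    If wildcard "*" is specified, then ALL file schemes will be allowed.</description>
</property>
```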

Re: Oozie for spark jobs without Hadoop

2018-05-16 Thread purna pradeep
+Peter On Wed, May 16, 2018 at 11:29 AM purna pradeep wrote: > Peter, > > I have tried to specify dataset with uri starting with s3://, s3a:// and > s3n:// and I am getting exception > > > > Exception occurred:E0904: Scheme [s3] not supported in uri >

Re: Oozie for spark jobs without Hadoop

2018-05-16 Thread purna pradeep
Peter, I have tried to specify a dataset with a URI starting with s3://, s3a://, and s3n://, and I am getting an exception: Exception occurred:E0904: Scheme [s3] not supported in uri [s3://mybucket/input.data] Making the job failed org.apache.oozie.dependency.URIHandlerException: E0904: Scheme [s3] not
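For reference, a coordinator dataset with an s3a:// URI might be declared as in the sketch below; the bucket, dataset name, frequency, and done-flag file are all made-up illustrations, not values from this thread:

```xml
<datasets>
  <dataset name="input" frequency="${coord:days(1)}"
           initial-instance="2018-05-01T00:00Z" timezone="UTC">
    <!-- s3a scheme must be listed in supported.filesystems for this to resolve -->
    <uri-template>s3a://mybucket/input/${YEAR}${MONTH}${DAY}</uri-template>
    <done-flag>input.data</done-flag>
  </dataset>
</datasets>
```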

Re: Oozie for spark jobs without Hadoop

2018-05-16 Thread Shikin, Igor
Hi Peter, I am working with Purna. I have tried to specify a dataset with a URI starting with s3://, s3a://, and s3n://, and I am getting an exception: Exception occurred:E0904: Scheme [s3] not supported in uri [s3://cmsegmentation-qa/oozie-test/input.data] Making the job failed

Re: Oozie for spark jobs without Hadoop

2018-05-16 Thread Peter Cseh
I think it should be possible for Oozie to poll S3. Check out this description on how to make it work in jobs; something similar should work on the server side as well. On Tue, May 15, 2018 at 4:43 PM, purna

Re: Oozie for spark jobs without Hadoop

2018-05-15 Thread purna pradeep
Thanks Andras! I also would like to know whether Oozie supports AWS S3 as input events, to poll for a dependency file before kicking off a Spark action. For example: I don't want to kick off a Spark action until a file has arrived at a given AWS S3 location. On Tue, May 15, 2018 at 10:17 AM Andras
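What's being asked for here maps onto a coordinator with an input-events dependency: the coordinator action is held until the dataset instance materializes. A hedged sketch, with hypothetical names, dates, and bucket (and assuming the S3 scheme is enabled server-side as discussed later in this thread):

```xml
<coordinator-app name="wait-for-s3-file" frequency="${coord:days(1)}"
                 start="2018-05-15T00:00Z" end="2018-06-15T00:00Z" timezone="UTC"
                 xmlns="uri:oozie:coordinator:0.4">
  <datasets>
    <dataset name="trigger" frequency="${coord:days(1)}"
             initial-instance="2018-05-15T00:00Z" timezone="UTC">
      <uri-template>s3a://mybucket/landing/${YEAR}${MONTH}${DAY}</uri-template>
    </dataset>
  </datasets>
  <input-events>
    <!-- the action below does not start until this instance exists -->
    <data-in name="input" dataset="trigger">
      <instance>${coord:current(0)}</instance>
    </data-in>
  </input-events>
  <action>
    <workflow>
      <app-path>${workflowAppUri}</app-path>
    </workflow>
  </action>
</coordinator-app>
```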

Re: Oozie for spark jobs without Hadoop

2018-05-15 Thread Andras Piros
Hi, Oozie needs HDFS to store workflow, coordinator, or bundle definitions, as well as sharelib files, in a safe, distributed, and scalable way. Oozie needs YARN to run almost all of its actions, Spark actions being no exception. At the moment it's not feasible to install Oozie without those Hadoop