Peter, I’m using the latest Oozie 5.0.0 and I have tried the changes below, but no luck.

Is this for s3 or s3a? I’m using s3, but if this is for s3a, do you know which jar I need to include — the hadoop-aws jar, or any other jar if required? hadoop-aws-2.8.3.jar is what I’m using.

On Wed, May 16, 2018 at 5:19 PM Peter Cseh <gezap...@cloudera.com> wrote:

Ok, I've found it.

If you are using 4.3.0 or newer, this is the part which checks for dependencies:
https://github.com/apache/oozie/blob/master/core/src/main/java/org/apache/oozie/command/coord/CoordCommandUtils.java#L914-L926

It passes the coordinator action's configuration and even does impersonation to check for the dependencies:
https://github.com/apache/oozie/blob/master/core/src/main/java/org/apache/oozie/coord/input/logic/CoordInputLogicEvaluatorPhaseOne.java#L159

Have you tried the following in the coordinator xml?

<action>
  <workflow>
    <app-path>hdfs://bar:9000/usr/joe/logsprocessor-wf</app-path>
    <configuration>
      <property>
        <name>fs.s3.awsAccessKeyId</name>
        <value>[YOURKEYID]</value>
      </property>
      <property>
        <name>fs.s3.awsSecretAccessKey</name>
        <value>[YOURKEY]</value>
      </property>
    </configuration>
  </workflow>
</action>

Based on the source, this should be able to poll s3 periodically.

On Wed, May 16, 2018 at 10:57 PM, purna pradeep <purna2prad...@gmail.com> wrote:

I have tried with the coordinator's configuration too, but no luck ☹️

On Wed, May 16, 2018 at 3:54 PM Peter Cseh <gezap...@cloudera.com> wrote:

Great progress there purna! :)

Have you tried adding these properties to the coordinator's configuration? We usually use the action config to build up the connection to the distributed file system. Although I'm not sure we're using these when polling the dependencies for coordinators, I'm excited about you trying to make it work!

I'll get back with a - hopefully - more helpful answer soon; I have to check the code in more depth first.
gp

On Wed, May 16, 2018 at 9:45 PM, purna pradeep <purna2prad...@gmail.com> wrote:

Peter,

I got rid of this error by adding hadoop-aws-2.8.3.jar and jets3t-0.9.4.jar.

But I’m getting the below error now:

java.lang.IllegalArgumentException: AWS Access Key ID and Secret Access Key must be specified by setting the fs.s3.awsAccessKeyId and fs.s3.awsSecretAccessKey properties (respectively)

I have tried adding the AWS access and secret keys in oozie-site.xml, hadoop core-site.xml, and hadoop-config.xml.

On Wed, May 16, 2018 at 2:30 PM purna pradeep <purna2prad...@gmail.com> wrote:

I have tried this, just added s3 instead of *:

<property>
  <name>oozie.service.HadoopAccessorService.supported.filesystems</name>
  <value>hdfs,hftp,webhdfs,s3</value>
</property>

Getting the below error:

java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2369)
        at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2793)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2810)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:100)
        at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2849)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2831)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:389)
        at org.apache.oozie.service.HadoopAccessorService$5.run(HadoopAccessorService.java:625)
        at org.apache.oozie.service.HadoopAccessorService$5.run(HadoopAccessorService.java:623)

On Wed, May 16, 2018 at 2:19 PM purna pradeep <purna2prad...@gmail.com> wrote:

This is what is in the logs:

2018-05-16 14:06:13,500  INFO URIHandlerService:520 - SERVER[localhost] Loaded urihandlers [org.apache.oozie.dependency.FSURIHandler]
2018-05-16 14:06:13,501  INFO URIHandlerService:520 - SERVER[localhost] Loaded default urihandler org.apache.oozie.dependency.FSURIHandler

On Wed, May 16, 2018 at 12:27 PM Peter Cseh <gezap...@cloudera.com> wrote:

That's strange; this exception should not happen in that case.
Can you check the server logs for messages like this?

LOG.info("Loaded urihandlers {0}", Arrays.toString(classes));
LOG.info("Loaded default urihandler {0}", defaultHandler.getClass().getName());

Thanks

On Wed, May 16, 2018 at 5:47 PM, purna pradeep <purna2prad...@gmail.com> wrote:

This is what I already have in my oozie-site.xml:

<property>
  <name>oozie.service.HadoopAccessorService.supported.filesystems</name>
  <value>*</value>
</property>

On Wed, May 16, 2018 at 11:37 AM Peter Cseh <gezap...@cloudera.com> wrote:

You'll have to configure oozie.service.HadoopAccessorService.supported.filesystems properly. Its default is hdfs,hftp,webhdfs; it enlists the different filesystems supported for federation. If the wildcard "*" is specified, then ALL file schemes will be allowed.

For testing purposes it's ok to put * in there in oozie-site.xml.

On Wed, May 16, 2018 at 5:29 PM, purna pradeep <purna2prad...@gmail.com> wrote:

Peter,

I have tried to specify a dataset with a uri starting with s3://, s3a:// and s3n://, and I am getting an exception:

Exception occurred: E0904: Scheme [s3] not supported in uri [s3://mybucket/input.data] Making the job failed
org.apache.oozie.dependency.URIHandlerException: E0904: Scheme [s3] not supported in uri [s3://mybucket/input.data]
        at org.apache.oozie.service.URIHandlerService.getURIHandler(URIHandlerService.java:185)
        at org.apache.oozie.service.URIHandlerService.getURIHandler(URIHandlerService.java:168)
        at org.apache.oozie.service.URIHandlerService.getURIHandler(URIHandlerService.java:160)
        at org.apache.oozie.command.coord.CoordCommandUtils.createEarlyURIs(CoordCommandUtils.java:465)
        at org.apache.oozie.command.coord.CoordCommandUtils.separateResolvedAndUnresolved(CoordCommandUtils.java:404)
        at org.apache.oozie.command.coord.CoordCommandUtils.materializeInputDataEvents(CoordCommandUtils.java:731)
        at org.apache.oozie.command.coord.CoordCommandUtils.materializeOneInstance(CoordCommandUtils.java:546)
        at org.apache.oozie.command.coord.CoordMaterializeTransitionXCommand.materializeActions(CoordMaterializeTransitionXCommand.java:492)
        at org.apache.oozie.command.coord.CoordMaterializeTransitionXCommand.materialize(CoordMaterializeTransitionXCommand.java:362)
        at org.apache.oozie.command.MaterializeTransitionXCommand.execute(MaterializeTransitionXCommand.java:73)
        at org.apache.oozie.command.MaterializeTransitionXCommand.execute(MaterializeTransitionXCommand.java:29)
        at org.apache.oozie.command.XCommand.call(XCommand.java:290)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:181)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

Is S3 support specific to the CDH distribution, or should it work in Apache Oozie as well? I’m not using CDH yet.

On Wed, May 16, 2018 at 10:28 AM Peter Cseh <gezap...@cloudera.com> wrote:

I think it should be possible for Oozie to poll S3. Check out this description on how to make it work in jobs; something similar should work on the server side as well:
https://www.cloudera.com/documentation/enterprise/5-9-x/topics/admin_oozie_s3.html

On Tue, May 15, 2018 at 4:43 PM, purna pradeep <purna2prad...@gmail.com> wrote:

Thanks Andras.

I also would like to know if Oozie supports AWS S3 as input events, to poll for a dependency file before kicking off a spark action.

For example: I don’t want to kick off a spark action until a file has arrived on a given AWS S3 location.

On Tue, May 15, 2018 at 10:17 AM Andras Piros <andras.pi...@cloudera.com> wrote:

Hi,

Oozie needs HDFS to store workflow, coordinator, or bundle definitions, as well as sharelib files, in a safe, distributed and scalable way. Oozie needs YARN to run almost all of its actions, Spark action being no exception.

At the moment it's not feasible to install Oozie without those Hadoop components. For how to install Oozie, please find here:
https://oozie.apache.org/docs/5.0.0/AG_Install.html

Regards,
Andras

On Tue, May 15, 2018 at 4:11 PM, purna pradeep <purna2prad...@gmail.com> wrote:

Hi,

I would like to know if I can use a Spark action in Oozie without having a Hadoop cluster.

I want to use Oozie to schedule spark jobs on a Kubernetes cluster.

I’m a beginner in Oozie.

Thanks

--
Peter Cseh | Software Engineer
cloudera.com
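[Editor's note] The s3-vs-s3a question at the top of the thread is left open. For reference: the fs.s3.awsAccessKeyId / fs.s3.awsSecretAccessKey properties shown by Peter belong to the old jets3t-backed s3:// connector, while the s3a:// connector shipped in hadoop-aws 2.8.x reads differently named properties. A sketch of the s3a equivalents (property names are from Hadoop 2.8.x; the values are placeholders):

```xml
<!-- Credential properties read by org.apache.hadoop.fs.s3a.S3AFileSystem
     in hadoop-aws 2.8.x. [YOURKEYID]/[YOURKEY] are placeholders. -->
<configuration>
  <property>
    <name>fs.s3a.access.key</name>
    <value>[YOURKEYID]</value>
  </property>
  <property>
    <name>fs.s3a.secret.key</name>
    <value>[YOURKEY]</value>
  </property>
</configuration>
```

Note also that S3AFileSystem needs more than hadoop-aws-2.8.3.jar on the classpath: hadoop-aws 2.8.x was built against the aws-java-sdk 1.10.x libraries, so those jars likely need to be present as well before the ClassNotFoundException from the thread goes away for s3a:// URIs.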
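[Editor's note] The E0904 exchange in the thread boils down to one oozie-site.xml fragment. A sketch of what the discussion converges on (the property name is real; the exact value list is whatever schemes your coordinators actually use — "*" allows everything, which Peter says is fine for testing):

```xml
<!-- oozie-site.xml: schemes Oozie's HadoopAccessorService may open when
     resolving coordinator dependency URIs. Without the scheme listed
     here, materialization fails with E0904 "Scheme [...] not supported". -->
<property>
  <name>oozie.service.HadoopAccessorService.supported.filesystems</name>
  <value>hdfs,hftp,webhdfs,s3,s3a</value>
</property>
```

As the thread shows, this setting only controls which schemes Oozie will accept; the matching Hadoop FileSystem implementation and its jars must still be on the Oozie server's classpath.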
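[Editor's note] What purna is ultimately after — a coordinator that waits for an S3 object before running — would look roughly like the sketch below. This is illustrative, not from the thread: the dataset/input-events elements are standard Oozie coordinator syntax, but the bucket, paths, frequency, and done-flag are made-up placeholders, and it only works once the scheme is enabled in supported.filesystems and the S3 jars and credentials are in place as discussed above.

```xml
<!-- Hypothetical coordinator: hold each daily action until that day's
     done-flag file appears under the S3 prefix. All names/paths are
     placeholders. -->
<coordinator-app name="s3-dependency-coord" frequency="${coord:days(1)}"
                 start="2018-05-16T00:00Z" end="2018-12-31T00:00Z"
                 timezone="UTC" xmlns="uri:oozie:coordinator:0.4">
  <datasets>
    <dataset name="input" frequency="${coord:days(1)}"
             initial-instance="2018-05-16T00:00Z" timezone="UTC">
      <uri-template>s3a://mybucket/input/${YEAR}${MONTH}${DAY}</uri-template>
      <done-flag>input.data</done-flag>
    </dataset>
  </datasets>
  <input-events>
    <data-in name="ready" dataset="input">
      <instance>${coord:current(0)}</instance>
    </data-in>
  </input-events>
  <action>
    <workflow>
      <app-path>hdfs://bar:9000/usr/joe/logsprocessor-wf</app-path>
    </workflow>
  </action>
</coordinator-app>
```

Oozie polls the materialized dependency URI and only submits the workflow once the done-flag file exists, which is exactly the "don't kick off the spark action until the file arrives" behavior asked about on May 15.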