OK, I've found it: if you are using 4.3.0 or newer, this is the part that checks for dependencies:
https://github.com/apache/oozie/blob/master/core/src/main/java/org/apache/oozie/command/coord/CoordCommandUtils.java#L914-L926
It passes the coordinator action's configuration and even does impersonation when checking for the dependencies:
https://github.com/apache/oozie/blob/master/core/src/main/java/org/apache/oozie/coord/input/logic/CoordInputLogicEvaluatorPhaseOne.java#L159
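For reference, a coordinator that declares such an S3 dependency could look roughly like the sketch below. This is illustrative only: the app name, bucket, paths, times and frequency are made up, and it assumes the s3a scheme has been enabled via oozie.service.HadoopAccessorService.supported.filesystems as discussed further down the thread.

```xml
<coordinator-app name="s3-dependency-coord" frequency="${coord:hours(1)}"
                 start="2018-05-16T00:00Z" end="2018-05-17T00:00Z"
                 timezone="UTC" xmlns="uri:oozie:coordinator:0.4">
    <datasets>
        <!-- Hypothetical dataset living on S3; the scheme of the resolved
             URI is what the URIHandlerService checks against -->
        <dataset name="input" frequency="${coord:hours(1)}"
                 initial-instance="2018-05-16T00:00Z" timezone="UTC">
            <uri-template>s3a://mybucket/input/${YEAR}/${MONTH}/${DAY}/${HOUR}</uri-template>
            <done-flag>_SUCCESS</done-flag>
        </dataset>
    </datasets>
    <input-events>
        <!-- The coordinator action only materializes once this instance exists -->
        <data-in name="event" dataset="input">
            <instance>${coord:current(0)}</instance>
        </data-in>
    </input-events>
    <action>
        <workflow>
            <app-path>hdfs://bar:9000/usr/joe/logsprocessor-wf</app-path>
        </workflow>
    </action>
</coordinator-app>
```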
Have you tried the following in the coordinator xml?

<action>
    <workflow>
        <app-path>hdfs://bar:9000/usr/joe/logsprocessor-wf</app-path>
        <configuration>
            <property>
                <name>fs.s3.awsAccessKeyId</name>
                <value>[YOURKEYID]</value>
            </property>
            <property>
                <name>fs.s3.awsSecretAccessKey</name>
                <value>[YOURKEY]</value>
            </property>
        </configuration>
    </workflow>
</action>

Based on the source this should be able to poll S3 periodically.

On Wed, May 16, 2018 at 10:57 PM, purna pradeep <purna2prad...@gmail.com> wrote:

> I have tried with the coordinator's configuration too, but no luck ☹️
>
> On Wed, May 16, 2018 at 3:54 PM Peter Cseh <gezap...@cloudera.com> wrote:
>
>> Great progress there purna! :)
>>
>> Have you tried adding these properties to the coordinator's
>> configuration? We usually use the action config to build up the
>> connection to the distributed file system.
>> I'm not sure we're using these when polling the dependencies for
>> coordinators, but I'm excited that you're trying to make it work!
>>
>> I'll get back with a - hopefully - more helpful answer soon; I have to
>> check the code in more depth first.
>> gp
>>
>> On Wed, May 16, 2018 at 9:45 PM, purna pradeep <purna2prad...@gmail.com> wrote:
>>
>>> Peter,
>>>
>>> I got rid of this error by adding
>>> hadoop-aws-2.8.3.jar and jets3t-0.9.4.jar.
>>>
>>> But I'm getting the error below now:
>>>
>>> java.lang.IllegalArgumentException: AWS Access Key ID and Secret Access
>>> Key must be specified by setting the fs.s3.awsAccessKeyId and
>>> fs.s3.awsSecretAccessKey properties (respectively)
>>>
>>> I have tried adding the AWS access and secret keys in
>>> oozie-site.xml, hadoop core-site.xml, and hadoop-config.xml.
>>>
>>> On Wed, May 16, 2018 at 2:30 PM purna pradeep <purna2prad...@gmail.com> wrote:
>>>
>>>> I have tried this, just added s3 instead of *:
>>>>
>>>> <property>
>>>>     <name>oozie.service.HadoopAccessorService.supported.filesystems</name>
>>>>     <value>hdfs,hftp,webhdfs,s3</value>
>>>> </property>
>>>>
>>>> Getting the error below:
>>>>
>>>> java.lang.RuntimeException: java.lang.ClassNotFoundException: Class
>>>> org.apache.hadoop.fs.s3a.S3AFileSystem not found
>>>>     at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2369)
>>>>     at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2793)
>>>>     at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2810)
>>>>     at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:100)
>>>>     at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2849)
>>>>     at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2831)
>>>>     at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:389)
>>>>     at org.apache.oozie.service.HadoopAccessorService$5.run(HadoopAccessorService.java:625)
>>>>     at org.apache.oozie.service.HadoopAccessorService$5.run(HadoopAccessorService.java:623)
>>>>
>>>> On Wed, May 16, 2018 at 2:19 PM purna pradeep <purna2prad...@gmail.com> wrote:
>>>>
>>>>> This is what is in the logs:
>>>>>
>>>>> 2018-05-16 14:06:13,500 INFO URIHandlerService:520 - SERVER[localhost] Loaded urihandlers [org.apache.oozie.dependency.FSURIHandler]
>>>>> 2018-05-16 14:06:13,501 INFO URIHandlerService:520 - SERVER[localhost] Loaded default urihandler org.apache.oozie.dependency.FSURIHandler
>>>>>
>>>>> On Wed, May 16, 2018 at 12:27 PM Peter Cseh <gezap...@cloudera.com> wrote:
>>>>>
>>>>>> That's strange, this exception should not happen in that case.
>>>>>> Can you check the server logs for messages like this?
>>>>>> LOG.info("Loaded urihandlers {0}", Arrays.toString(classes));
>>>>>> LOG.info("Loaded default urihandler {0}", defaultHandler.getClass().getName());
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> On Wed, May 16, 2018 at 5:47 PM, purna pradeep <purna2prad...@gmail.com> wrote:
>>>>>>
>>>>>>> This is what I already have in my oozie-site.xml:
>>>>>>>
>>>>>>> <property>
>>>>>>>     <name>oozie.service.HadoopAccessorService.supported.filesystems</name>
>>>>>>>     <value>*</value>
>>>>>>> </property>
>>>>>>>
>>>>>>> On Wed, May 16, 2018 at 11:37 AM Peter Cseh <gezap...@cloudera.com> wrote:
>>>>>>>
>>>>>>>> You'll have to configure
>>>>>>>> oozie.service.HadoopAccessorService.supported.filesystems properly.
>>>>>>>> Its default is hdfs,hftp,webhdfs; it enlists the different filesystems
>>>>>>>> supported for federation. If the wildcard "*" is specified, then ALL
>>>>>>>> file schemes will be allowed.
>>>>>>>>
>>>>>>>> For testing purposes it's OK to put * in there in oozie-site.xml.
>>>>>>>>
>>>>>>>> On Wed, May 16, 2018 at 5:29 PM, purna pradeep <purna2prad...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Peter,
>>>>>>>>>
>>>>>>>>> I have tried to specify the dataset with a uri starting with s3://,
>>>>>>>>> s3a:// and s3n://, and I am getting this exception:
>>>>>>>>>
>>>>>>>>> Exception occurred: E0904: Scheme [s3] not supported in uri
>>>>>>>>> [s3://mybucket/input.data] Making the job failed
>>>>>>>>>
>>>>>>>>> org.apache.oozie.dependency.URIHandlerException: E0904: Scheme [s3] not
>>>>>>>>> supported in uri [s3://mybucket/input.data]
>>>>>>>>>     at org.apache.oozie.service.URIHandlerService.getURIHandler(URIHandlerService.java:185)
>>>>>>>>>     at org.apache.oozie.service.URIHandlerService.getURIHandler(URIHandlerService.java:168)
>>>>>>>>>     at org.apache.oozie.service.URIHandlerService.getURIHandler(URIHandlerService.java:160)
>>>>>>>>>     at org.apache.oozie.command.coord.CoordCommandUtils.createEarlyURIs(CoordCommandUtils.java:465)
>>>>>>>>>     at org.apache.oozie.command.coord.CoordCommandUtils.separateResolvedAndUnresolved(CoordCommandUtils.java:404)
>>>>>>>>>     at org.apache.oozie.command.coord.CoordCommandUtils.materializeInputDataEvents(CoordCommandUtils.java:731)
>>>>>>>>>     at org.apache.oozie.command.coord.CoordCommandUtils.materializeOneInstance(CoordCommandUtils.java:546)
>>>>>>>>>     at org.apache.oozie.command.coord.CoordMaterializeTransitionXCommand.materializeActions(CoordMaterializeTransitionXCommand.java:492)
>>>>>>>>>     at org.apache.oozie.command.coord.CoordMaterializeTransitionXCommand.materialize(CoordMaterializeTransitionXCommand.java:362)
>>>>>>>>>     at org.apache.oozie.command.MaterializeTransitionXCommand.execute(MaterializeTransitionXCommand.java:73)
>>>>>>>>>     at org.apache.oozie.command.MaterializeTransitionXCommand.execute(MaterializeTransitionXCommand.java:29)
>>>>>>>>>     at org.apache.oozie.command.XCommand.call(XCommand.java:290)
>>>>>>>>>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>>>>>>>>     at org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:181)
>>>>>>>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>>>>>>>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>>>>>>>>>     at java.lang.Thread.run(Thread.java:748)
>>>>>>>>>
>>>>>>>>> Is S3 support specific to the CDH distribution, or should it work in
>>>>>>>>> Apache Oozie as well? I'm not using CDH yet so
>>>>>>>>>
>>>>>>>>> On Wed, May 16, 2018 at 10:28 AM Peter Cseh <gezap...@cloudera.com> wrote:
>>>>>>>>>
>>>>>>>>>> I think it should be possible for Oozie to poll S3. Check out this
>>>>>>>>>> description on how to make it work in jobs:
>>>>>>>>>> https://www.cloudera.com/documentation/enterprise/5-9-x/topics/admin_oozie_s3.html
>>>>>>>>>> Something similar should work on the server side as well.
>>>>>>>>>>
>>>>>>>>>> On Tue, May 15, 2018 at 4:43 PM, purna pradeep <purna2prad...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Thanks Andras,
>>>>>>>>>>>
>>>>>>>>>>> I also would like to know if Oozie supports AWS S3 as input events,
>>>>>>>>>>> to poll for a dependency file before kicking off a spark action.
>>>>>>>>>>>
>>>>>>>>>>> For example: I don't want to kick off a spark action until a file
>>>>>>>>>>> has arrived on a given AWS S3 location.
>>>>>>>>>>>
>>>>>>>>>>> On Tue, May 15, 2018 at 10:17 AM Andras Piros <andras.pi...@cloudera.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> Oozie needs HDFS to store workflow, coordinator, or bundle
>>>>>>>>>>>> definitions, as well as sharelib files, in a safe, distributed and
>>>>>>>>>>>> scalable way. Oozie needs YARN to run almost all of its actions,
>>>>>>>>>>>> Spark action being no exception.
>>>>>>>>>>>>
>>>>>>>>>>>> At the moment it's not feasible to install Oozie without those
>>>>>>>>>>>> Hadoop components. On how to install Oozie, please see
>>>>>>>>>>>> https://oozie.apache.org/docs/5.0.0/AG_Install.html
>>>>>>>>>>>> Regards,
>>>>>>>>>>>>
>>>>>>>>>>>> Andras
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, May 15, 2018 at 4:11 PM, purna pradeep <purna2prad...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Would like to know if I can use a spark action in Oozie without
>>>>>>>>>>>>> having a Hadoop cluster?
>>>>>>>>>>>>>
>>>>>>>>>>>>> I want to use Oozie to schedule spark jobs on a Kubernetes cluster.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I'm a beginner in Oozie.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks

--
Peter Cseh | Software Engineer
cloudera.com <https://www.cloudera.com>
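The configuration pieces discussed in this thread add up to roughly the following sketch. It is unverified: the supported.filesystems property is the one quoted above, and note that the s3a connector reads fs.s3a.access.key / fs.s3a.secret.key, whereas the fs.s3.awsAccessKeyId / fs.s3.awsSecretAccessKey names mentioned earlier belong to the older s3 connector.

```xml
<!-- oozie-site.xml: allow the s3a scheme for coordinator dependencies -->
<property>
    <name>oozie.service.HadoopAccessorService.supported.filesystems</name>
    <value>hdfs,hftp,webhdfs,s3a</value>
</property>

<!-- core-site.xml: credentials for the s3a connector (hadoop-aws and its
     AWS SDK dependencies must also be on the Oozie server's classpath) -->
<property>
    <name>fs.s3a.access.key</name>
    <value>[YOURKEYID]</value>
</property>
<property>
    <name>fs.s3a.secret.key</name>
    <value>[YOURKEY]</value>
</property>
```

Consistent with this split, the ClassNotFoundException above points at a missing hadoop-aws jar, while the IllegalArgumentException about fs.s3.awsAccessKeyId suggests the s3 (not s3a) connector was the one being invoked.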