Peter,

I’m using the latest Oozie 5.0.0 and I have tried the changes below, but no luck.

Is this for s3 or s3a?

I’m using s3, but if this is for s3a, do you know which jar I need to include?
I mean the hadoop-aws jar, or any other jar if required.

hadoop-aws-2.8.3.jar is what I’m using.
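In case it matters, here is roughly what I have been putting into core-site.xml. From what I can tell, the fs.s3.* names in the earlier error belong to the older s3/s3n connectors, while s3a uses fs.s3a.access.key and fs.s3a.secret.key; this is just a sketch with placeholder values, not something I have verified against Oozie’s dependency polling:

```xml
<!-- core-site.xml sketch; placeholder values, s3a property names as in Hadoop 2.8.x -->
<configuration>
  <property>
    <name>fs.s3a.access.key</name>
    <value>[YOUR_ACCESS_KEY_ID]</value>
  </property>
  <property>
    <name>fs.s3a.secret.key</name>
    <value>[YOUR_SECRET_ACCESS_KEY]</value>
  </property>
</configuration>
```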

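For context, this is roughly the dataset definition I’m trying to get Oozie to poll (the bucket name, dates, and paths are made-up placeholders):

```xml
<!-- coordinator.xml fragment; bucket name, dates, and paths are hypothetical -->
<datasets>
  <dataset name="input" frequency="${coord:days(1)}"
           initial-instance="2018-05-01T00:00Z" timezone="UTC">
    <uri-template>s3://mybucket/input/${YEAR}/${MONTH}/${DAY}</uri-template>
    <done-flag>input.data</done-flag>
  </dataset>
</datasets>
<input-events>
  <data-in name="inputReady" dataset="input">
    <instance>${coord:current(0)}</instance>
  </data-in>
</input-events>
```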
On Wed, May 16, 2018 at 5:19 PM Peter Cseh <gezap...@cloudera.com> wrote:

> Ok, I've found it:
>
> If you are using 4.3.0 or newer, this is the part that checks for
> dependencies:
>
> https://github.com/apache/oozie/blob/master/core/src/main/java/org/apache/oozie/command/coord/CoordCommandUtils.java#L914-L926
> It passes the coordinator action's configuration and even does
> impersonation to check for the dependencies:
>
> https://github.com/apache/oozie/blob/master/core/src/main/java/org/apache/oozie/coord/input/logic/CoordInputLogicEvaluatorPhaseOne.java#L159
>
> Have you tried the following in the coordinator xml:
>
>  <action>
>         <workflow>
>           <app-path>hdfs://bar:9000/usr/joe/logsprocessor-wf</app-path>
>           <configuration>
>             <property>
>               <name>fs.s3.awsAccessKeyId</name>
>               <value>[YOURKEYID]</value>
>             </property>
>             <property>
>               <name>fs.s3.awsSecretAccessKey</name>
>               <value>[YOURKEY]</value>
>             </property>
>          </configuration>
>        </workflow>
>       </action>
>
> Based on the source, this should be able to poll S3 periodically.
>
> On Wed, May 16, 2018 at 10:57 PM, purna pradeep <purna2prad...@gmail.com>
> wrote:
>
>>
>> I have tried it with the coordinator's configuration too, but no luck ☹️
>>
>> On Wed, May 16, 2018 at 3:54 PM Peter Cseh <gezap...@cloudera.com> wrote:
>>
>>> Great progress there, Purna! :)
>>>
>>> Have you tried adding these properties to the coordinator's
>>> configuration? We usually use the action config to build up the connection
>>> to the distributed file system.
>>> I'm not sure we use these when polling coordinator dependencies,
>>> but I'm excited that you're trying to make it work!
>>>
>>> I'll get back with a - hopefully - more helpful answer soon, I have to
>>> check the code in more depth first.
>>> gp
>>>
>>> On Wed, May 16, 2018 at 9:45 PM, purna pradeep <purna2prad...@gmail.com>
>>> wrote:
>>>
>>>> Peter,
>>>>
>>>> I got rid of this error by adding
>>>> hadoop-aws-2.8.3.jar and jets3t-0.9.4.jar
>>>>
>>>> But I’m getting the error below now:
>>>>
>>>> java.lang.IllegalArgumentException: AWS Access Key ID and Secret Access
>>>> Key must be specified by setting the fs.s3.awsAccessKeyId and
>>>> fs.s3.awsSecretAccessKey properties (respectively)
>>>>
>>>> I have tried adding the AWS access and secret keys in
>>>> oozie-site.xml, hadoop core-site.xml, and hadoop-config.xml
>>>>
>>>>
>>>>
>>>>
>>>> On Wed, May 16, 2018 at 2:30 PM purna pradeep <purna2prad...@gmail.com>
>>>> wrote:
>>>>
>>>>>
>>>>> I have tried this, just adding s3 instead of *:
>>>>>
>>>>> <property>
>>>>>
>>>>>
>>>>> <name>oozie.service.HadoopAccessorService.supported.filesystems</name>
>>>>>
>>>>>     <value>hdfs,hftp,webhdfs,s3</value>
>>>>>
>>>>> </property>
>>>>>
>>>>>
>>>>> I'm getting the error below:
>>>>>
>>>>> java.lang.RuntimeException: java.lang.ClassNotFoundException: Class
>>>>> org.apache.hadoop.fs.s3a.S3AFileSystem not found
>>>>>     at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2369)
>>>>>     at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2793)
>>>>>     at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2810)
>>>>>     at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:100)
>>>>>     at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2849)
>>>>>     at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2831)
>>>>>     at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:389)
>>>>>     at org.apache.oozie.service.HadoopAccessorService$5.run(HadoopAccessorService.java:625)
>>>>>     at org.apache.oozie.service.HadoopAccessorService$5.run(HadoopAccessorService.java:623)
>>>>>
>>>>>
>>>>> On Wed, May 16, 2018 at 2:19 PM purna pradeep <purna2prad...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> This is what is in the logs
>>>>>>
>>>>>> 2018-05-16 14:06:13,500  INFO URIHandlerService:520 -
>>>>>> SERVER[localhost] Loaded urihandlers
>>>>>> [org.apache.oozie.dependency.FSURIHandler]
>>>>>>
>>>>>> 2018-05-16 14:06:13,501  INFO URIHandlerService:520 -
>>>>>> SERVER[localhost] Loaded default urihandler
>>>>>> org.apache.oozie.dependency.FSURIHandler
>>>>>>
>>>>>>
>>>>>> On Wed, May 16, 2018 at 12:27 PM Peter Cseh <gezap...@cloudera.com>
>>>>>> wrote:
>>>>>>
>>>>>>> That's strange, this exception should not happen in that case.
>>>>>>> Can you check the server logs for messages like this?
>>>>>>>         LOG.info("Loaded urihandlers {0}", Arrays.toString(classes));
>>>>>>>         LOG.info("Loaded default urihandler {0}",
>>>>>>> defaultHandler.getClass().getName());
>>>>>>> Thanks
>>>>>>>
>>>>>>> On Wed, May 16, 2018 at 5:47 PM, purna pradeep <
>>>>>>> purna2prad...@gmail.com> wrote:
>>>>>>>
>>>>>>>> This is what I already have in my oozie-site.xml
>>>>>>>>
>>>>>>>> <property>
>>>>>>>>
>>>>>>>>
>>>>>>>> <name>oozie.service.HadoopAccessorService.supported.filesystems</name>
>>>>>>>>
>>>>>>>>         <value>*</value>
>>>>>>>>
>>>>>>>> </property>
>>>>>>>>
>>>>>>>> On Wed, May 16, 2018 at 11:37 AM Peter Cseh <gezap...@cloudera.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> You'll have to configure
>>>>>>>>> oozie.service.HadoopAccessorService.supported.filesystems properly.
>>>>>>>>> It lists the different filesystems supported for federation; the
>>>>>>>>> default is hdfs,hftp,webhdfs. If the wildcard "*" is specified, then
>>>>>>>>> ALL file schemes will be allowed.
>>>>>>>>> For testing purposes it's ok to put * in there in oozie-site.xml
>>>>>>>>>
>>>>>>>>> On Wed, May 16, 2018 at 5:29 PM, purna pradeep <
>>>>>>>>> purna2prad...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> > Peter,
>>>>>>>>> >
>>>>>>>>> > I have tried to specify a dataset with a uri starting with s3://,
>>>>>>>>> > s3a://, and s3n://, and I am getting this exception:
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>> > Exception occurred:E0904: Scheme [s3] not supported in uri
>>>>>>>>> > [s3://mybucket/input.data] Making the job failed
>>>>>>>>> >
>>>>>>>>> > org.apache.oozie.dependency.URIHandlerException: E0904: Scheme [s3]
>>>>>>>>> > not supported in uri [s3://mybucket/input.data]
>>>>>>>>> >
>>>>>>>>> >     at org.apache.oozie.service.URIHandlerService.getURIHandler(URIHandlerService.java:185)
>>>>>>>>> >     at org.apache.oozie.service.URIHandlerService.getURIHandler(URIHandlerService.java:168)
>>>>>>>>> >     at org.apache.oozie.service.URIHandlerService.getURIHandler(URIHandlerService.java:160)
>>>>>>>>> >     at org.apache.oozie.command.coord.CoordCommandUtils.createEarlyURIs(CoordCommandUtils.java:465)
>>>>>>>>> >     at org.apache.oozie.command.coord.CoordCommandUtils.separateResolvedAndUnresolved(CoordCommandUtils.java:404)
>>>>>>>>> >     at org.apache.oozie.command.coord.CoordCommandUtils.materializeInputDataEvents(CoordCommandUtils.java:731)
>>>>>>>>> >     at org.apache.oozie.command.coord.CoordCommandUtils.materializeOneInstance(CoordCommandUtils.java:546)
>>>>>>>>> >     at org.apache.oozie.command.coord.CoordMaterializeTransitionXCommand.materializeActions(CoordMaterializeTransitionXCommand.java:492)
>>>>>>>>> >     at org.apache.oozie.command.coord.CoordMaterializeTransitionXCommand.materialize(CoordMaterializeTransitionXCommand.java:362)
>>>>>>>>> >     at org.apache.oozie.command.MaterializeTransitionXCommand.execute(MaterializeTransitionXCommand.java:73)
>>>>>>>>> >     at org.apache.oozie.command.MaterializeTransitionXCommand.execute(MaterializeTransitionXCommand.java:29)
>>>>>>>>> >     at org.apache.oozie.command.XCommand.call(XCommand.java:290)
>>>>>>>>> >     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>>>>>>>> >     at org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:181)
>>>>>>>>> >     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>>>>>>>>> >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>>>>>>>>> >     at java.lang.Thread.run(Thread.java:748)
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>> > Is S3 support specific to the CDH distribution, or should it work
>>>>>>>>> > in Apache Oozie as well? I’m not using CDH yet.
>>>>>>>>> >
>>>>>>>>> > On Wed, May 16, 2018 at 10:28 AM Peter Cseh <
>>>>>>>>> gezap...@cloudera.com> wrote:
>>>>>>>>> >
>>>>>>>>> > > I think it should be possible for Oozie to poll S3. Check out
>>>>>>>>> > > this description on how to make it work in jobs:
>>>>>>>>> > > https://www.cloudera.com/documentation/enterprise/5-9-x/topics/admin_oozie_s3.html
>>>>>>>>> > > Something similar should work on the server side as well.
>>>>>>>>> > >
>>>>>>>>> > > On Tue, May 15, 2018 at 4:43 PM, purna pradeep <
>>>>>>>>> purna2prad...@gmail.com>
>>>>>>>>> > > wrote:
>>>>>>>>> > >
>>>>>>>>> > > > Thanks Andras,
>>>>>>>>> > > >
>>>>>>>>> > > > I would also like to know whether Oozie supports AWS S3 as an
>>>>>>>>> > > > input event to poll for a dependency file before kicking off a
>>>>>>>>> > > > spark action.
>>>>>>>>> > > >
>>>>>>>>> > > >
>>>>>>>>> > > > For example: I don’t want to kick off a spark action until a
>>>>>>>>> > > > file has arrived at a given AWS S3 location.
>>>>>>>>> > > >
>>>>>>>>> > > > On Tue, May 15, 2018 at 10:17 AM Andras Piros <
>>>>>>>>> > andras.pi...@cloudera.com
>>>>>>>>> > > >
>>>>>>>>> > > > wrote:
>>>>>>>>> > > >
>>>>>>>>> > > > > Hi,
>>>>>>>>> > > > >
>>>>>>>>> > > > > Oozie needs HDFS to store workflow, coordinator, or bundle
>>>>>>>>> > > > > definitions, as well as sharelib files, in a safe, distributed,
>>>>>>>>> > > > > and scalable way. Oozie needs YARN to run almost all of its
>>>>>>>>> > > > > actions, the Spark action being no exception.
>>>>>>>>> > > > >
>>>>>>>>> > > > > At the moment it's not feasible to install Oozie without those
>>>>>>>>> > > > > Hadoop components. For how to install Oozie, please *see here
>>>>>>>>> > > > > <https://oozie.apache.org/docs/5.0.0/AG_Install.html>*.
>>>>>>>>> > > > >
>>>>>>>>> > > > > Regards,
>>>>>>>>> > > > >
>>>>>>>>> > > > > Andras
>>>>>>>>> > > > >
>>>>>>>>> > > > > On Tue, May 15, 2018 at 4:11 PM, purna pradeep <
>>>>>>>>> > > purna2prad...@gmail.com>
>>>>>>>>> > > > > wrote:
>>>>>>>>> > > > >
>>>>>>>>> > > > > > Hi,
>>>>>>>>> > > > > >
>>>>>>>>> > > > > > I would like to know whether I can use the spark action in
>>>>>>>>> > > > > > Oozie without having a Hadoop cluster.
>>>>>>>>> > > > > >
>>>>>>>>> > > > > > I want to use Oozie to schedule spark jobs on a Kubernetes
>>>>>>>>> > > > > > cluster.
>>>>>>>>> > > > > >
>>>>>>>>> > > > > > I’m a beginner in Oozie.
>>>>>>>>> > > > > >
>>>>>>>>> > > > > > Thanks
>>>>>>>>> > > > > >
>>>>>>>>> > > > >
>>>>>>>>> > > >
>>>>>>>>> > >
>>>>>>>>> > >
>>>>>>>>> > >
>>>>>>>>> > > --
>>>>>>>>> > > *Peter Cseh *| Software Engineer
>>>>>>>>> > > cloudera.com <https://www.cloudera.com>
>>>>>>>>> > >
>>>>>>>>> > > [image: Cloudera] <https://www.cloudera.com/>
>>>>>>>>> > >
>>>>>>>>> > > [image: Cloudera on Twitter] <https://twitter.com/cloudera>
>>>>>>>>> [image:
>>>>>>>>> > > Cloudera on Facebook] <https://www.facebook.com/cloudera>
>>>>>>>>> [image:
>>>>>>>>> > Cloudera
>>>>>>>>> > > on LinkedIn] <https://www.linkedin.com/company/cloudera>
>>>>>>>>> > > ------------------------------
>>>>>>>>> > >
>>>>>>>>> >
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>
>>>
>>>
>>>
>
>
>
>
