Purna,

Based on
https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html#S3
you should go for s3a.
If I see it correctly, you'll also have to include the AWS SDK (aws-java-sdk) jar:
https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html#S3A
Also, the property names are slightly different for s3a, so you'll have to adjust
the example I've given.
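
For example, something along these lines in the coordinator action's configuration (an untested sketch: fs.s3a.access.key and fs.s3a.secret.key are the s3a credential property names, the values are placeholders):

  <configuration>
    <property>
      <name>fs.s3a.access.key</name>
      <value>[YOURKEYID]</value>
    </property>
    <property>
      <name>fs.s3a.secret.key</name>
      <value>[YOURKEY]</value>
    </property>
  </configuration>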



On Thu, May 17, 2018 at 4:16 PM, purna pradeep <purna2prad...@gmail.com>
wrote:

> Peter,
>
> I’m using the latest Oozie 5.0.0 and I have tried the changes below, but no luck.
>
> Is this for s3 or s3a?
>
> I’m using s3, but if this is for s3a, do you know which jar I need to
> include? I mean the hadoop-aws jar, or any other jar if required.
>
> hadoop-aws-2.8.3.jar is what I’m using.
>
> On Wed, May 16, 2018 at 5:19 PM Peter Cseh <gezap...@cloudera.com> wrote:
>
>> Ok, I've found it:
>>
>> If you are using 4.3.0 or newer this is the part which checks for
>> dependencies:
>> https://github.com/apache/oozie/blob/master/core/src/main/java/org/apache/oozie/command/coord/CoordCommandUtils.java#L914-L926
>> It passes the coordinator action's configuration and even does
>> impersonation to check for the dependencies:
>> https://github.com/apache/oozie/blob/master/core/src/main/java/org/apache/oozie/coord/input/logic/CoordInputLogicEvaluatorPhaseOne.java#L159
>>
>> Have you tried the following in the coordinator xml:
>>
>>  <action>
>>    <workflow>
>>      <app-path>hdfs://bar:9000/usr/joe/logsprocessor-wf</app-path>
>>      <configuration>
>>        <property>
>>          <name>fs.s3.awsAccessKeyId</name>
>>          <value>[YOURKEYID]</value>
>>        </property>
>>        <property>
>>          <name>fs.s3.awsSecretAccessKey</name>
>>          <value>[YOURKEY]</value>
>>        </property>
>>      </configuration>
>>    </workflow>
>>  </action>
>>
>> Based on the source, this should be able to poll S3 periodically.
>>
>> On Wed, May 16, 2018 at 10:57 PM, purna pradeep <purna2prad...@gmail.com>
>> wrote:
>>
>>>
>>> I have tried it in the coordinator's configuration too, but no luck ☹️
>>>
>>> On Wed, May 16, 2018 at 3:54 PM Peter Cseh <gezap...@cloudera.com>
>>> wrote:
>>>
>>>> Great progress there, purna! :)
>>>>
>>>> Have you tried adding these properties to the coordinator's
>>>> configuration? We usually use the action config to build up the connection to
>>>> the distributed file system.
>>>> I'm not sure we use these when polling the dependencies for coordinators,
>>>> though, but I'm excited about you trying to make it work!
>>>>
>>>> I'll get back with a - hopefully - more helpful answer soon; I have to
>>>> check the code in more depth first.
>>>> gp
>>>>
>>>> On Wed, May 16, 2018 at 9:45 PM, purna pradeep <purna2prad...@gmail.com
>>>> > wrote:
>>>>
>>>>> Peter,
>>>>>
>>>>> I got rid of this error by adding
>>>>> hadoop-aws-2.8.3.jar and jets3t-0.9.4.jar
>>>>>
>>>>> But I’m getting the error below now:
>>>>>
>>>>> java.lang.IllegalArgumentException: AWS Access Key ID and Secret
>>>>> Access Key must be specified by setting the fs.s3.awsAccessKeyId and
>>>>> fs.s3.awsSecretAccessKey properties (respectively)
>>>>>
>>>>> I have tried adding the AWS access and secret keys in
>>>>>
>>>>> oozie-site.xml, Hadoop core-site.xml, and hadoop-config.xml.
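>>>>>
>>>>> For reference, the relevant entries would look roughly like this in
>>>>> core-site.xml (a sketch based on the property names in the error above;
>>>>> the values are redacted placeholders):
>>>>>
>>>>>   <property>
>>>>>     <name>fs.s3.awsAccessKeyId</name>
>>>>>     <value>[YOURKEYID]</value>
>>>>>   </property>
>>>>>   <property>
>>>>>     <name>fs.s3.awsSecretAccessKey</name>
>>>>>     <value>[YOURKEY]</value>
>>>>>   </property>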
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Wed, May 16, 2018 at 2:30 PM purna pradeep <purna2prad...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>> I have tried this, just added s3 instead of *:
>>>>>>
>>>>>> <property>
>>>>>>     <name>oozie.service.HadoopAccessorService.supported.filesystems</name>
>>>>>>     <value>hdfs,hftp,webhdfs,s3</value>
>>>>>> </property>
>>>>>>
>>>>>>
>>>>>> Getting the error below:
>>>>>>
>>>>>> java.lang.RuntimeException: java.lang.ClassNotFoundException: Class
>>>>>> org.apache.hadoop.fs.s3a.S3AFileSystem not found
>>>>>>
>>>>>>     at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2369)
>>>>>>     at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2793)
>>>>>>     at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2810)
>>>>>>     at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:100)
>>>>>>     at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2849)
>>>>>>     at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2831)
>>>>>>     at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:389)
>>>>>>     at org.apache.oozie.service.HadoopAccessorService$5.run(HadoopAccessorService.java:625)
>>>>>>     at org.apache.oozie.service.HadoopAccessorService$5.run(HadoopAccessorService.java:623)
>>>>>>
>>>>>>
>>>>>> On Wed, May 16, 2018 at 2:19 PM purna pradeep <
>>>>>> purna2prad...@gmail.com> wrote:
>>>>>>
>>>>>>> This is what is in the logs:
>>>>>>>
>>>>>>> 2018-05-16 14:06:13,500  INFO URIHandlerService:520 - SERVER[localhost] Loaded urihandlers [org.apache.oozie.dependency.FSURIHandler]
>>>>>>>
>>>>>>> 2018-05-16 14:06:13,501  INFO URIHandlerService:520 - SERVER[localhost] Loaded default urihandler org.apache.oozie.dependency.FSURIHandler
>>>>>>>
>>>>>>>
>>>>>>> On Wed, May 16, 2018 at 12:27 PM Peter Cseh <gezap...@cloudera.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> That's strange, this exception should not happen in that case.
>>>>>>>> Can you check the server logs for messages like this?
>>>>>>>>         LOG.info("Loaded urihandlers {0}", Arrays.toString(classes));
>>>>>>>>         LOG.info("Loaded default urihandler {0}", defaultHandler.getClass().getName());
>>>>>>>> Thanks
>>>>>>>>
>>>>>>>> On Wed, May 16, 2018 at 5:47 PM, purna pradeep <
>>>>>>>> purna2prad...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> This is what I already have in my oozie-site.xml
>>>>>>>>>
>>>>>>>>> <property>
>>>>>>>>>         <name>oozie.service.HadoopAccessorService.supported.filesystems</name>
>>>>>>>>>         <value>*</value>
>>>>>>>>> </property>
>>>>>>>>>
>>>>>>>>> On Wed, May 16, 2018 at 11:37 AM Peter Cseh <gezap...@cloudera.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> You'll have to configure
>>>>>>>>>> oozie.service.HadoopAccessorService.supported.filesystems properly.
>>>>>>>>>> Its default value is hdfs,hftp,webhdfs, and its description reads: "Enlist
>>>>>>>>>> the different filesystems supported for federation. If wildcard "*" is
>>>>>>>>>> specified, then ALL file schemes will be allowed."
>>>>>>>>>>
>>>>>>>>>> For testing purposes it's ok to put * in there in oozie-site.xml
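>>>>>>>>>>
>>>>>>>>>> If you'd rather list the schemes explicitly instead of the wildcard,
>>>>>>>>>> something along these lines should do (a sketch; include only the
>>>>>>>>>> schemes you actually need):
>>>>>>>>>>
>>>>>>>>>> <property>
>>>>>>>>>>     <name>oozie.service.HadoopAccessorService.supported.filesystems</name>
>>>>>>>>>>     <value>hdfs,hftp,webhdfs,s3a</value>
>>>>>>>>>> </property>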
>>>>>>>>>>
>>>>>>>>>> On Wed, May 16, 2018 at 5:29 PM, purna pradeep <
>>>>>>>>>> purna2prad...@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>> > Peter,
>>>>>>>>>> >
>>>>>>>>>> > I have tried to specify a dataset with a URI starting with s3://, s3a://,
>>>>>>>>>> > and s3n://, and I am getting this exception:
>>>>>>>>>> >
>>>>>>>>>> > Exception occurred:E0904: Scheme [s3] not supported in uri
>>>>>>>>>> > [s3://mybucket/input.data] Making the job failed
>>>>>>>>>> >
>>>>>>>>>> > org.apache.oozie.dependency.URIHandlerException: E0904: Scheme [s3] not
>>>>>>>>>> > supported in uri [s3://mybucket/input.data]
>>>>>>>>>> >
>>>>>>>>>> >     at org.apache.oozie.service.URIHandlerService.getURIHandler(URIHandlerService.java:185)
>>>>>>>>>> >     at org.apache.oozie.service.URIHandlerService.getURIHandler(URIHandlerService.java:168)
>>>>>>>>>> >     at org.apache.oozie.service.URIHandlerService.getURIHandler(URIHandlerService.java:160)
>>>>>>>>>> >     at org.apache.oozie.command.coord.CoordCommandUtils.createEarlyURIs(CoordCommandUtils.java:465)
>>>>>>>>>> >     at org.apache.oozie.command.coord.CoordCommandUtils.separateResolvedAndUnresolved(CoordCommandUtils.java:404)
>>>>>>>>>> >     at org.apache.oozie.command.coord.CoordCommandUtils.materializeInputDataEvents(CoordCommandUtils.java:731)
>>>>>>>>>> >     at org.apache.oozie.command.coord.CoordCommandUtils.materializeOneInstance(CoordCommandUtils.java:546)
>>>>>>>>>> >     at org.apache.oozie.command.coord.CoordMaterializeTransitionXCommand.materializeActions(CoordMaterializeTransitionXCommand.java:492)
>>>>>>>>>> >     at org.apache.oozie.command.coord.CoordMaterializeTransitionXCommand.materialize(CoordMaterializeTransitionXCommand.java:362)
>>>>>>>>>> >     at org.apache.oozie.command.MaterializeTransitionXCommand.execute(MaterializeTransitionXCommand.java:73)
>>>>>>>>>> >     at org.apache.oozie.command.MaterializeTransitionXCommand.execute(MaterializeTransitionXCommand.java:29)
>>>>>>>>>> >     at org.apache.oozie.command.XCommand.call(XCommand.java:290)
>>>>>>>>>> >     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>>>>>>>>> >     at org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:181)
>>>>>>>>>> >     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>>>>>>>>>> >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>>>>>>>>>> >     at java.lang.Thread.run(Thread.java:748)
>>>>>>>>>> >
>>>>>>>>>> >
>>>>>>>>>> >
>>>>>>>>>> > Is S3 support specific to the CDH distribution, or should it work in Apache
>>>>>>>>>> > Oozie as well? I’m not using CDH yet.
>>>>>>>>>> >
>>>>>>>>>> > On Wed, May 16, 2018 at 10:28 AM Peter Cseh <
>>>>>>>>>> gezap...@cloudera.com> wrote:
>>>>>>>>>> >
>>>>>>>>>> > > I think it should be possible for Oozie to poll S3. Check out this
>>>>>>>>>> > > description on how to make it work in jobs:
>>>>>>>>>> > > https://www.cloudera.com/documentation/enterprise/5-9-x/topics/admin_oozie_s3.html
>>>>>>>>>> > > Something similar should work on the server side as well.
>>>>>>>>>> > >
>>>>>>>>>> > > On Tue, May 15, 2018 at 4:43 PM, purna pradeep <
>>>>>>>>>> purna2prad...@gmail.com>
>>>>>>>>>> > > wrote:
>>>>>>>>>> > >
>>>>>>>>>> > > > Thanks Andras,
>>>>>>>>>> > > >
>>>>>>>>>> > > > I would also like to know if Oozie supports AWS S3 for input events, i.e.
>>>>>>>>>> > > > to poll for a dependency file before kicking off a Spark action.
>>>>>>>>>> > > >
>>>>>>>>>> > > >
>>>>>>>>>> > > > For example: I don’t want to kick off a Spark action until a file has
>>>>>>>>>> > > > arrived at a given AWS S3 location.
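>>>>>>>>>> > > >
>>>>>>>>>> > > > Roughly something like this coordinator dataset and input event is what
>>>>>>>>>> > > > I have in mind (just a sketch; the bucket, path, and dates are made up):
>>>>>>>>>> > > >
>>>>>>>>>> > > >   <datasets>
>>>>>>>>>> > > >     <dataset name="input" frequency="${coord:days(1)}"
>>>>>>>>>> > > >              initial-instance="2018-05-01T00:00Z" timezone="UTC">
>>>>>>>>>> > > >       <uri-template>s3a://mybucket/input/${YEAR}/${MONTH}/${DAY}</uri-template>
>>>>>>>>>> > > >       <done-flag>_SUCCESS</done-flag>
>>>>>>>>>> > > >     </dataset>
>>>>>>>>>> > > >   </datasets>
>>>>>>>>>> > > >   <input-events>
>>>>>>>>>> > > >     <data-in name="event" dataset="input">
>>>>>>>>>> > > >       <instance>${coord:current(0)}</instance>
>>>>>>>>>> > > >     </data-in>
>>>>>>>>>> > > >   </input-events>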
>>>>>>>>>> > > >
>>>>>>>>>> > > > On Tue, May 15, 2018 at 10:17 AM Andras Piros <
>>>>>>>>>> > andras.pi...@cloudera.com
>>>>>>>>>> > > >
>>>>>>>>>> > > > wrote:
>>>>>>>>>> > > >
>>>>>>>>>> > > > > Hi,
>>>>>>>>>> > > > >
>>>>>>>>>> > > > > Oozie needs HDFS to store workflow, coordinator, or bundle definitions,
>>>>>>>>>> > > > > as well as sharelib files, in a safe, distributed and scalable way. Oozie
>>>>>>>>>> > > > > needs YARN to run almost all of its actions, the Spark action being no
>>>>>>>>>> > > > > exception.
>>>>>>>>>> > > > >
>>>>>>>>>> > > > > At the moment it's not feasible to install Oozie without those Hadoop
>>>>>>>>>> > > > > components. Please find how to install Oozie *here
>>>>>>>>>> > > > > <https://oozie.apache.org/docs/5.0.0/AG_Install.html>*.
>>>>>>>>>> > > > >
>>>>>>>>>> > > > > Regards,
>>>>>>>>>> > > > >
>>>>>>>>>> > > > > Andras
>>>>>>>>>> > > > >
>>>>>>>>>> > > > > On Tue, May 15, 2018 at 4:11 PM, purna pradeep <
>>>>>>>>>> > > purna2prad...@gmail.com>
>>>>>>>>>> > > > > wrote:
>>>>>>>>>> > > > >
>>>>>>>>>> > > > > > Hi,
>>>>>>>>>> > > > > >
>>>>>>>>>> > > > > > I would like to know if I can use the Spark action in Oozie without
>>>>>>>>>> > > > > > having a Hadoop cluster.
>>>>>>>>>> > > > > >
>>>>>>>>>> > > > > > I want to use Oozie to schedule Spark jobs on a Kubernetes cluster.
>>>>>>>>>> > > > > >
>>>>>>>>>> > > > > > I’m a beginner in Oozie.
>>>>>>>>>> > > > > >
>>>>>>>>>> > > > > > Thanks
>>>>>>>>>> > > > > >
>>>>>>>>>> > > > >
>>>>>>>>>> > > >
>>>>>>>>>> > >
>>>>>>>>>> > >
>>>>>>>>>> > >
>>>>>>>>>> >
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>
>>>>
>>>>
>>
>>
>>


-- 
Peter Cseh | Software Engineer
cloudera.com <https://www.cloudera.com>
