Ok, I have tried this.

It appears that s3a support requires httpclient 4.4.x, while Oozie is bundled
with httpclient 4.3.6. When httpclient is upgraded, the ext UI stops
loading.
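
One way to sidestep the clash (an untested sketch, not verified against the Oozie build; coordinates and placement are assumptions) would be to relocate the newer httpclient with the Maven shade plugin, so the 4.4.x classes no longer collide with the 4.3.6 copy the ext UI depends on:

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <configuration>
    <relocations>
      <!-- Move the httpclient 4.4.x classes under a private package so the
           bundled 4.3.6 jar keeps serving the ext UI untouched. -->
      <relocation>
        <pattern>org.apache.http</pattern>
        <shadedPattern>shaded.org.apache.http</shadedPattern>
      </relocation>
    </relocations>
  </configuration>
</plugin>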



On Thu, May 17, 2018 at 10:28 AM Peter Cseh <gezap...@cloudera.com> wrote:

> Purna,
>
> Based on
> https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html#S3
> you should try to go for s3a.
> You'll have to include the AWS Java SDK (aws-java-sdk) as well, if I see it correctly:
> https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html#S3A
> Also, the property names are slightly different, so you'll have to change
> the example I've given.
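>
> For reference, the s3a credential properties would be along these lines (a
> sketch; the property names are from the Hadoop docs above, the values are
> placeholders):
>
> <property>
>   <name>fs.s3a.access.key</name>
>   <value>[YOURKEYID]</value>
> </property>
> <property>
>   <name>fs.s3a.secret.key</name>
>   <value>[YOURKEY]</value>
> </property>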
>
>
>
> On Thu, May 17, 2018 at 4:16 PM, purna pradeep <purna2prad...@gmail.com>
> wrote:
>
>> Peter,
>>
>> I’m using the latest Oozie (5.0.0) and I have tried the changes below, but no luck.
>>
>> Is this for s3 or s3a?
>>
>> I’m using s3, but if this is for s3a, do you know which jar I need to
>> include? The hadoop-aws jar, or any other jar if required?
>>
>> hadoop-aws-2.8.3.jar is what I’m using.
>>
>> On Wed, May 16, 2018 at 5:19 PM Peter Cseh <gezap...@cloudera.com> wrote:
>>
>>> Ok, I've found it:
>>>
>>> If you are using 4.3.0 or newer, this is the part that checks for
>>> dependencies:
>>>
>>> https://github.com/apache/oozie/blob/master/core/src/main/java/org/apache/oozie/command/coord/CoordCommandUtils.java#L914-L926
>>> It passes the coordinator action's configuration and even does
>>> impersonation to check for the dependencies:
>>>
>>> https://github.com/apache/oozie/blob/master/core/src/main/java/org/apache/oozie/coord/input/logic/CoordInputLogicEvaluatorPhaseOne.java#L159
>>>
>>> Have you tried the following in the coordinator XML:
>>>
>>> <action>
>>>   <workflow>
>>>     <app-path>hdfs://bar:9000/usr/joe/logsprocessor-wf</app-path>
>>>     <configuration>
>>>       <property>
>>>         <name>fs.s3.awsAccessKeyId</name>
>>>         <value>[YOURKEYID]</value>
>>>       </property>
>>>       <property>
>>>         <name>fs.s3.awsSecretAccessKey</name>
>>>         <value>[YOURKEY]</value>
>>>       </property>
>>>     </configuration>
>>>   </workflow>
>>> </action>
>>>
>>> Based on the source, this should be able to poll S3 periodically.
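>>>
>>> A minimal dataset-plus-input-events sketch for such an S3 dependency (the
>>> bucket, names and dates are illustrative, not from this thread):
>>>
>>> <datasets>
>>>   <dataset name="input" frequency="${coord:days(1)}"
>>>            initial-instance="2018-05-01T00:00Z" timezone="UTC">
>>>     <uri-template>s3a://mybucket/input/${YEAR}${MONTH}${DAY}</uri-template>
>>>     <done-flag>_SUCCESS</done-flag>
>>>   </dataset>
>>> </datasets>
>>> <input-events>
>>>   <data-in name="in" dataset="input">
>>>     <instance>${coord:current(0)}</instance>
>>>   </data-in>
>>> </input-events>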
>>>
>>> On Wed, May 16, 2018 at 10:57 PM, purna pradeep <purna2prad...@gmail.com
>>> > wrote:
>>>
>>>>
>>>> I have tried with the coordinator's configuration too, but no luck ☹️
>>>>
>>>> On Wed, May 16, 2018 at 3:54 PM Peter Cseh <gezap...@cloudera.com>
>>>> wrote:
>>>>
>>>>> Great progress there, Purna! :)
>>>>>
>>>>> Have you tried adding these properties to the coordinator's
>>>>> configuration? We usually use the action config to build up the connection
>>>>> to the distributed file system.
>>>>> I'm not sure we're using these when polling the dependencies for
>>>>> coordinators, but I'm excited about you trying to make it work!
>>>>>
>>>>> I'll get back with a (hopefully) more helpful answer soon; I have to
>>>>> check the code in more depth first.
>>>>> gp
>>>>>
>>>>> On Wed, May 16, 2018 at 9:45 PM, purna pradeep <
>>>>> purna2prad...@gmail.com> wrote:
>>>>>
>>>>>> Peter,
>>>>>>
>>>>>> I got rid of this error by adding
>>>>>> hadoop-aws-2.8.3.jar and jets3t-0.9.4.jar
>>>>>>
>>>>>> But I’m getting the error below now:
>>>>>>
>>>>>> java.lang.IllegalArgumentException: AWS Access Key ID and Secret
>>>>>> Access Key must be specified by setting the fs.s3.awsAccessKeyId and
>>>>>> fs.s3.awsSecretAccessKey properties (respectively)
>>>>>>
>>>>>> I have tried adding the AWS access and secret keys in
>>>>>> oozie-site.xml, Hadoop's core-site.xml, and hadoop-config.xml.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, May 16, 2018 at 2:30 PM purna pradeep <
>>>>>> purna2prad...@gmail.com> wrote:
>>>>>>
>>>>>>>
>>>>>>> I have tried this, just added s3 instead of *:
>>>>>>>
>>>>>>> <property>
>>>>>>>   <name>oozie.service.HadoopAccessorService.supported.filesystems</name>
>>>>>>>   <value>hdfs,hftp,webhdfs,s3</value>
>>>>>>> </property>
>>>>>>>
>>>>>>>
>>>>>>> Getting the error below:
>>>>>>>
>>>>>>> java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found
>>>>>>>     at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2369)
>>>>>>>     at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2793)
>>>>>>>     at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2810)
>>>>>>>     at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:100)
>>>>>>>     at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2849)
>>>>>>>     at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2831)
>>>>>>>     at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:389)
>>>>>>>     at org.apache.oozie.service.HadoopAccessorService$5.run(HadoopAccessorService.java:625)
>>>>>>>     at org.apache.oozie.service.HadoopAccessorService$5.run(HadoopAccessorService.java:623)
>>>>>>>
>>>>>>>
>>>>>>> On Wed, May 16, 2018 at 2:19 PM purna pradeep <
>>>>>>> purna2prad...@gmail.com> wrote:
>>>>>>>
>>>>>>>> This is what is in the logs:
>>>>>>>>
>>>>>>>> 2018-05-16 14:06:13,500  INFO URIHandlerService:520 - SERVER[localhost] Loaded urihandlers [org.apache.oozie.dependency.FSURIHandler]
>>>>>>>> 2018-05-16 14:06:13,501  INFO URIHandlerService:520 - SERVER[localhost] Loaded default urihandler org.apache.oozie.dependency.FSURIHandler
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, May 16, 2018 at 12:27 PM Peter Cseh <gezap...@cloudera.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> That's strange, this exception should not happen in that case.
>>>>>>>>> Can you check the server logs for messages like this?
>>>>>>>>>         LOG.info("Loaded urihandlers {0}", Arrays.toString(classes));
>>>>>>>>>         LOG.info("Loaded default urihandler {0}", defaultHandler.getClass().getName());
>>>>>>>>> Thanks
>>>>>>>>>
>>>>>>>>> On Wed, May 16, 2018 at 5:47 PM, purna pradeep <
>>>>>>>>> purna2prad...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> This is what I already have in my oozie-site.xml
>>>>>>>>>>
>>>>>>>>>> <property>
>>>>>>>>>>   <name>oozie.service.HadoopAccessorService.supported.filesystems</name>
>>>>>>>>>>   <value>*</value>
>>>>>>>>>> </property>
>>>>>>>>>>
>>>>>>>>>> On Wed, May 16, 2018 at 11:37 AM Peter Cseh <
>>>>>>>>>> gezap...@cloudera.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> You'll have to configure
>>>>>>>>>>> oozie.service.HadoopAccessorService.supported.filesystems properly.
>>>>>>>>>>> Its default is hdfs,hftp,webhdfs. From its description: "Enlist the
>>>>>>>>>>> different filesystems supported for federation. If wildcard '*' is
>>>>>>>>>>> specified, then ALL file schemes will be allowed."
>>>>>>>>>>>
>>>>>>>>>>> For testing purposes it's ok to put * in there in oozie-site.xml.
>>>>>>>>>>>
>>>>>>>>>>> On Wed, May 16, 2018 at 5:29 PM, purna pradeep <
>>>>>>>>>>> purna2prad...@gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>> > Peter,
>>>>>>>>>>> >
>>>>>>>>>>> > I have tried to specify a dataset with a URI starting with s3://,
>>>>>>>>>>> > s3a:// and s3n://, and I am getting an exception:
>>>>>>>>>>> >
>>>>>>>>>>> >
>>>>>>>>>>> >
>>>>>>>>>>> > Exception occurred: E0904: Scheme [s3] not supported in uri
>>>>>>>>>>> > [s3://mybucket/input.data], making the job fail
>>>>>>>>>>> >
>>>>>>>>>>> > org.apache.oozie.dependency.URIHandlerException: E0904: Scheme [s3]
>>>>>>>>>>> > not supported in uri [s3://mybucket/input.data]
>>>>>>>>>>> >
>>>>>>>>>>> >     at org.apache.oozie.service.URIHandlerService.getURIHandler(URIHandlerService.java:185)
>>>>>>>>>>> >     at org.apache.oozie.service.URIHandlerService.getURIHandler(URIHandlerService.java:168)
>>>>>>>>>>> >     at org.apache.oozie.service.URIHandlerService.getURIHandler(URIHandlerService.java:160)
>>>>>>>>>>> >     at org.apache.oozie.command.coord.CoordCommandUtils.createEarlyURIs(CoordCommandUtils.java:465)
>>>>>>>>>>> >     at org.apache.oozie.command.coord.CoordCommandUtils.separateResolvedAndUnresolved(CoordCommandUtils.java:404)
>>>>>>>>>>> >     at org.apache.oozie.command.coord.CoordCommandUtils.materializeInputDataEvents(CoordCommandUtils.java:731)
>>>>>>>>>>> >     at org.apache.oozie.command.coord.CoordCommandUtils.materializeOneInstance(CoordCommandUtils.java:546)
>>>>>>>>>>> >     at org.apache.oozie.command.coord.CoordMaterializeTransitionXCommand.materializeActions(CoordMaterializeTransitionXCommand.java:492)
>>>>>>>>>>> >     at org.apache.oozie.command.coord.CoordMaterializeTransitionXCommand.materialize(CoordMaterializeTransitionXCommand.java:362)
>>>>>>>>>>> >     at org.apache.oozie.command.MaterializeTransitionXCommand.execute(MaterializeTransitionXCommand.java:73)
>>>>>>>>>>> >     at org.apache.oozie.command.MaterializeTransitionXCommand.execute(MaterializeTransitionXCommand.java:29)
>>>>>>>>>>> >     at org.apache.oozie.command.XCommand.call(XCommand.java:290)
>>>>>>>>>>> >     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>>>>>>>>>> >     at org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:181)
>>>>>>>>>>> >     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>>>>>>>>>>> >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>>>>>>>>>>> >     at java.lang.Thread.run(Thread.java:748)
>>>>>>>>>>> >
>>>>>>>>>>> >
>>>>>>>>>>> >
>>>>>>>>>>> > Is S3 support specific to the CDH distribution, or should it work
>>>>>>>>>>> > in Apache Oozie as well? I’m not using CDH yet.
>>>>>>>>>>> >
>>>>>>>>>>> > On Wed, May 16, 2018 at 10:28 AM Peter Cseh <
>>>>>>>>>>> gezap...@cloudera.com> wrote:
>>>>>>>>>>> >
>>>>>>>>>>> > > I think it should be possible for Oozie to poll S3. Check out this
>>>>>>>>>>> > > description on how to make it work in jobs:
>>>>>>>>>>> > > https://www.cloudera.com/documentation/enterprise/5-9-x/topics/admin_oozie_s3.html
>>>>>>>>>>> > > Something similar should work on the server side as well.
>>>>>>>>>>> > >
>>>>>>>>>>> > > On Tue, May 15, 2018 at 4:43 PM, purna pradeep <
>>>>>>>>>>> purna2prad...@gmail.com>
>>>>>>>>>>> > > wrote:
>>>>>>>>>>> > >
>>>>>>>>>>> > > > Thanks Andras,
>>>>>>>>>>> > > >
>>>>>>>>>>> > > > I would also like to know if Oozie supports AWS S3 as an input
>>>>>>>>>>> > > > event, to poll for a dependency file before kicking off a Spark
>>>>>>>>>>> > > > action.
>>>>>>>>>>> > > >
>>>>>>>>>>> > > >
>>>>>>>>>>> > > > For example: I don’t want to kick off a Spark action until a
>>>>>>>>>>> > > > file has arrived at a given AWS S3 location.
>>>>>>>>>>> > > >
>>>>>>>>>>> > > > On Tue, May 15, 2018 at 10:17 AM Andras Piros <
>>>>>>>>>>> > andras.pi...@cloudera.com
>>>>>>>>>>> > > >
>>>>>>>>>>> > > > wrote:
>>>>>>>>>>> > > >
>>>>>>>>>>> > > > > Hi,
>>>>>>>>>>> > > > >
>>>>>>>>>>> > > > > Oozie needs HDFS to store workflow, coordinator, or
>>>>>>>>>>> bundle
>>>>>>>>>>> > definitions,
>>>>>>>>>>> > > > as
>>>>>>>>>>> > > > > well as sharelib files in a safe, distributed and
>>>>>>>>>>> scalable way. Oozie
>>>>>>>>>>> > > > needs
>>>>>>>>>>> > > > > YARN to run almost all of its actions, Spark action
>>>>>>>>>>> being no
>>>>>>>>>>> > exception.
>>>>>>>>>>> > > > >
>>>>>>>>>>> > > > > At the moment it's not feasible to install Oozie without
>>>>>>>>>>> > > > > those Hadoop components. For how to install Oozie, please see
>>>>>>>>>>> > > > > https://oozie.apache.org/docs/5.0.0/AG_Install.html
>>>>>>>>>>> > > > >
>>>>>>>>>>> > > > > Regards,
>>>>>>>>>>> > > > >
>>>>>>>>>>> > > > > Andras
>>>>>>>>>>> > > > >
>>>>>>>>>>> > > > > On Tue, May 15, 2018 at 4:11 PM, purna pradeep <
>>>>>>>>>>> > > purna2prad...@gmail.com>
>>>>>>>>>>> > > > > wrote:
>>>>>>>>>>> > > > >
>>>>>>>>>>> > > > > > Hi,
>>>>>>>>>>> > > > > >
>>>>>>>>>>> > > > > > I would like to know if I can use a Spark action in Oozie
>>>>>>>>>>> > > > > > without having a Hadoop cluster.
>>>>>>>>>>> > > > > >
>>>>>>>>>>> > > > > > I want to use Oozie to schedule Spark jobs on a Kubernetes
>>>>>>>>>>> > > > > > cluster.
>>>>>>>>>>> > > > > >
>>>>>>>>>>> > > > > > I’m a beginner in oozie
>>>>>>>>>>> > > > > >
>>>>>>>>>>> > > > > > Thanks
>>>>>>>>>>> > > > > >
>>>>>>>>>>> > > > >
>>>>>>>>>>> > > >
>>>>>>>>>>> > >
>>>>>>>>>>> > >
>>>>>>>>>>> > >
>>>>>>>>>>> > > --
>>>>>>>>>>> > > *Peter Cseh* | Software Engineer
>>>>>>>>>>> > > cloudera.com <https://www.cloudera.com>
>>>>>>>>>>> > >
>>>>>>>>>>> > >
>>>>>>>>>>> >
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>
>>>
>>>
>>>
>
>
>
>
