[jira] [Commented] (OOZIE-1770) Create Oozie Application Master for YARN

2016-04-25 Thread Srikanth Sundarrajan (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-1770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15257566#comment-15257566
 ] 

Srikanth Sundarrajan commented on OOZIE-1770:
-

Thanks [~rkanter] for summarizing the discussions. Here are some additional 
things to think about while considering an AM pool:

1. Currently we have one external URL per action to track the execution of the 
action and its associated logs. With a YARN app per action, this would continue 
to work cleanly. Using an AM pool might introduce avoidable overhead if logs 
and action execution details have to be tracked per action.
2. With the AppMaster being the launcher, it might be trivially simple to handle 
AM restarts and RM restarts/failovers. With the AM pool, I am guessing we would 
need to worry about both AM failures and container failures.


> Create Oozie Application Master for YARN
> 
>
> Key: OOZIE-1770
> URL: https://issues.apache.org/jira/browse/OOZIE-1770
> Project: Oozie
>  Issue Type: New Feature
>Reporter: Bowen Zhang
>Assignee: Bowen Zhang
> Attachments: OozieYarnAM.pdf, Prelim OYA Scoping Doc 001.pdf, 
> oya-rm-screenshot.jpg, oya.patch
>
>
> After the first release of oozie on hadoop 2, it will be good if users can 
> set execution engine in oozie conf, be it YARN AM or traditional MR. We can 
> target this for post oozie 4.1 release.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (OOZIE-2259) Create a callback action

2016-03-01 Thread Srikanth Sundarrajan (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-2259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15173726#comment-15173726
 ] 

Srikanth Sundarrajan commented on OOZIE-2259:
-

[~rohini]/[~rkanter], you had some feedback on the patch (available on Review 
Board), and I have shared my views on it in response. Would it be possible for 
either or both of you to let me know your thoughts? We can move forward on this 
based on your inputs.

Thanks

> Create a callback action 
> -
>
> Key: OOZIE-2259
> URL: https://issues.apache.org/jira/browse/OOZIE-2259
> Project: Oozie
>  Issue Type: New Feature
>  Components: action
>Reporter: Jaydeep Vishwakarma
>Assignee: Jaydeep Vishwakarma
> Attachments: OOZIE-2259-v1.patch, OOZIE-2259-v3.patch, 
> OOZIE-2259-v4.patch, OOZIE-2259-v5.patch, OOZIE-2259-v8.patch, 
> OOZIE-2259-v9.patch, OOZIE-2259_v6.patch, OOZIE-2259_v7.patch
>
>
> Need an action in Oozie that sends a notification to an external server. We 
> should be able to do multiple types of callback; currently I know of JMS and 
> HTTP calls. It should be able to call different types of methods with any 
> number of arguments.
> The sample workflow with callback action 
> {code:xml}
> 
> ...
> 
> 
>   [HOST]
>   [METHOD]
>   
>   [KEY][VALUE]
>   
> ...
> 
> ...
> 
> ...
> 
> {code}
> HOST : from the host, the system can figure out whether it is an HTTP or JMS 
> callback action. The system will send the notification to that host.
> METHOD : it can be POST/GET/QUEUE/TOPIC





[jira] [Commented] (OOZIE-2259) Create a callback action

2015-11-05 Thread Srikanth Sundarrajan (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-2259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14991910#comment-14991910
 ] 

Srikanth Sundarrajan commented on OOZIE-2259:
-

My bad. I was under the impression that the separation of the thread pool was 
already part of this. I know we discussed it, but I forgot that it is scoped in 
another JIRA. [~puru], like I mentioned, I am in total agreement with the 
concern you raised (that is why isolation of the thread pool is necessary). If 
this JIRA gets in, we need to follow it up with OOZIE-2231 soon enough.

> Create a callback action 
> -
>
> Key: OOZIE-2259
> URL: https://issues.apache.org/jira/browse/OOZIE-2259
> Project: Oozie
>  Issue Type: New Feature
>  Components: action
>Reporter: Jaydeep Vishwakarma
>Assignee: Jaydeep Vishwakarma
> Attachments: OOZIE-2259-v1.patch, OOZIE-2259-v3.patch, 
> OOZIE-2259-v4.patch, OOZIE-2259-v5.patch
>
>
> Need an action in Oozie that sends a notification to an external server. We 
> should be able to do multiple types of callback; currently I know of JMS and 
> HTTP calls. It should be able to call different types of methods with any 
> number of arguments.
> The sample workflow with callback action 
> {code:xml}
> 
> ...
> 
> 
>   [HOST]
>   [METHOD]
>   
>   [KEY][VALUE]
>   
> ...
> 
> ...
> 
> ...
> 
> {code}
> HOST : from the host, the system can figure out whether it is an HTTP or JMS 
> callback action. The system will send the notification to that host.
> METHOD : it can be POST/GET/QUEUE/TOPIC





[jira] [Commented] (OOZIE-2259) Create a callback action

2015-11-04 Thread Srikanth Sundarrajan (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-2259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14991214#comment-14991214
 ] 

Srikanth Sundarrajan commented on OOZIE-2259:
-

[~puru], I feel that isolating this into a different thread pool is necessary 
precisely to solve the issue that you highlighted. If the callback action were 
to run in the main command queue execution thread pool, it could potentially 
take the system for a ride. The only issue I see is that if there were 
significant back pressure on the callback endpoint, the auxiliary queue for 
callback actions might grow and put some pressure on memory. But eventually it 
would start throttling down the materialization of the coordinator that 
triggered the workflow/action. The general sense I get is that there are enough 
safeguards to prevent general degradation of other services within the system.
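
To make the isolation argument concrete, here is a minimal sketch of a dedicated callback pool with a bounded queue. It is not taken from the patch: the class name `CallbackExecutor`, its methods, and the sizes are hypothetical, and only standard `java.util.concurrent` classes are used. When the queue is full the submission is refused, which is one way the back pressure discussed in this comment could be surfaced to the caller.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: a callback executor kept apart from the main command queue. */
class CallbackExecutor {
    private final ThreadPoolExecutor pool;

    CallbackExecutor(int threads, int queueCapacity) {
        // Bounded queue: a slow callback endpoint can fill it, but memory use stays capped.
        this.pool = new ThreadPoolExecutor(threads, threads, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueCapacity), new ThreadPoolExecutor.AbortPolicy());
    }

    /** Returns false when the queue is full, so the caller can throttle or retry later. */
    boolean trySubmit(Runnable callback) {
        try {
            pool.execute(callback);
            return true;
        } catch (RejectedExecutionException e) {
            return false;
        }
    }

    /** Number of callbacks waiting in the queue (not counting the ones running). */
    int pending() {
        return pool.getQueue().size();
    }

    void shutdownNow() {
        pool.shutdownNow();
    }
}
```

With one thread and a queue capacity of one, a third slow callback is refused rather than queued, so memory use stays bounded and the main command queue threads are never involved.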






[jira] [Resolved] (OOZIE-1534) Launcher job might run do hadoop attempt relaunch - possibly causing correctness issues

2015-10-21 Thread Srikanth Sundarrajan (JIRA)

 [ 
https://issues.apache.org/jira/browse/OOZIE-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Srikanth Sundarrajan resolved OOZIE-1534.
-
   Resolution: Fixed
 Assignee: Jaydeep Vishwakarma
Fix Version/s: 4.2.0

With the use of YARN tags, this is solved for Hadoop 2.

> Launcher job might run do hadoop attempt relaunch - possibly causing 
> correctness issues
> ---
>
> Key: OOZIE-1534
> URL: https://issues.apache.org/jira/browse/OOZIE-1534
> Project: Oozie
>  Issue Type: Improvement
>Reporter: Srikanth Sundarrajan
>Assignee: Jaydeep Vishwakarma
> Fix For: 4.2.0
>
>
> The  section of the action allows cleaning up the output dir. This is 
> not sufficient, as MR jobs started by Pig/Hive may still be running. We should 
> look to kill child MR jobs, if any, before launching new ones.





[jira] [Commented] (OOZIE-2258) Introducing a new counter in the instrumentation log to distinguish between the reasons for launcher failure

2015-09-24 Thread Srikanth Sundarrajan (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-2258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14907653#comment-14907653
 ] 

Srikanth Sundarrajan commented on OOZIE-2258:
-

{code}
@@ -1411,12 +1414,20 @@ public class JavaActionExecutor extends ActionExecutor {
     if (exMsg != null) {
         LOG.warn("Launcher exception: {0}{E}{1}", exMsg, exStackTrace);
     }
+    else {
+        childJobKill = true;
+    }
{code}

Not sure if this is in the right place. Is it possible to add a test?

A more fundamental question: how do we intend to use this?

> Introducing a new counter in the instrumentation log to distinguish between 
> the reasons for launcher failure
> 
>
> Key: OOZIE-2258
> URL: https://issues.apache.org/jira/browse/OOZIE-2258
> Project: Oozie
>  Issue Type: Improvement
>Reporter: Narayan Periwal
>Assignee: Narayan Periwal
> Attachments: OOZIE-2258-v0.patch, OOZIE-2258-v1.patch
>
>
> Whether the launcher job fails due to a child job failure or an exception in 
> the launcher job itself, in both cases the "counters:jobs:killed" counter is 
> updated in the instrumentation log. Hence, we cannot tell whether the 
> launcher failure was due to a child job failing or not. So, we can 
> introduce a new counter "kill" under the group "childjobs" that will help us 
> tell whether the launcher failure is due to the child jobs failing.
> Let me know if there is already any other way by which we can distinguish 
> this.
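
The proposal in the description can be sketched as follows. This is an illustration only, not the patch: `FailureCounters` and its methods are hypothetical names, and treating a missing launcher exception as a child job failure is an assumption modelled on the diff quoted in the comment. The group and counter names (`jobs`/`killed`, `childjobs`/`kill`) come from the description.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical counter registry keyed by "group:name". */
class FailureCounters {
    private final Map<String, AtomicLong> counters = new ConcurrentHashMap<>();

    void incr(String group, String name) {
        counters.computeIfAbsent(group + ":" + name, k -> new AtomicLong()).incrementAndGet();
    }

    long get(String group, String name) {
        AtomicLong counter = counters.get(group + ":" + name);
        return counter == null ? 0L : counter.get();
    }

    /**
     * Records a launcher failure. Assumption: a null launcherError means the
     * launcher itself raised no exception, so the failure is attributed to a child job.
     */
    void recordLauncherFailure(String launcherError) {
        incr("jobs", "killed");           // updated for every launcher failure
        if (launcherError == null) {
            incr("childjobs", "kill");    // updated only for child job failures
        }
    }
}
```

Comparing the two counters then separates the cases: the difference between `jobs`/`killed` and `childjobs`/`kill` is the number of failures caused by the launcher itself.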





[jira] [Commented] (OOZIE-2314) Unable to kill old instance child job by workflow or coord rerun by Launcher

2015-09-24 Thread Srikanth Sundarrajan (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-2314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14907637#comment-14907637
 ] 

Srikanth Sundarrajan commented on OOZIE-2314:
-

Good catch [~jaydeepvishwakarma]. Thanks for the patch. One minor nit: I have 
left my comments on Review Board.

> Unable to kill old instance child job by workflow or coord rerun by Launcher
> 
>
> Key: OOZIE-2314
> URL: https://issues.apache.org/jira/browse/OOZIE-2314
> Project: Oozie
>  Issue Type: Bug
>Reporter: Jaydeep Vishwakarma
>Assignee: Jaydeep Vishwakarma
>Priority: Blocker
> Attachments: OOZIE-2314.patch
>
>
> The Oozie launcher kills all the child jobs that were launched by an old 
> instance of the same launcher, workflow or coord action, to avoid duplicate 
> child jobs running at the same time. To do so, it searches for the application 
> ids by tag and time, and it kills all the matching AMs. You can find more 
> detail in OOZIE-2129. 
> It works fine when a launcher attempt gets killed and tries again. If the 
> YARN container that holds the AM gets killed for some reason and we rerun the 
> workflow/coord action, this patch does not work.
> This happens due to a time filter applied while finding the app ids, which 
> always takes the current time from the server.
>{{LauncherMapperHelper.java}}
>{code}
>public static void setupYarnRestartHandling(JobConf launcherJobConf, Configuration actionConf, String launcherTag)
>        throws NoSuchAlgorithmException {
>    launcherJobConf.setLong(LauncherMainHadoopUtils.OOZIE_JOB_LAUNCH_TIME, System.currentTimeMillis());
>    // Tags are limited to 100 chars so we need to hash them to make sure (the actionId otherwise doesn't have a max length)
>    String tag = getTag(launcherTag);
>    // keeping the oozie.child.mapreduce.job.tags instead of mapreduce.job.tags to avoid killing launcher itself.
>    // mapreduce.job.tags should only go to child job launch by launcher.
>    actionConf.set(LauncherMainHadoopUtils.CHILD_MAPREDUCE_JOB_TAGS, tag);
>}
>{code}
> When a user reruns the workflow or coord action, the launcher picks up the 
> current system time along with the tags, then searches for running application 
> ids to kill. It ends up not finding any app id, as the previous instance of 
> the same workflow/coord ran before the new system time.
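
The effect of the time filter can be illustrated with a simplified model. This is not Oozie's code: `ChildAppFinder`, `App` and `findChildren` are hypothetical stand-ins, and the assumption is that only applications carrying the tag and started at or after the recorded launch time are selected.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified model of the tag-and-time lookup; not Oozie's actual code. */
class ChildAppFinder {

    /** Minimal stand-in for a YARN application report. */
    static final class App {
        final String id;
        final String tag;
        final long startTime;

        App(String id, String tag, long startTime) {
            this.id = id;
            this.tag = tag;
            this.startTime = startTime;
        }
    }

    /** Returns the ids of apps that carry the tag and started at or after launchTime. */
    static List<String> findChildren(List<App> running, String tag, long launchTime) {
        List<String> ids = new ArrayList<>();
        for (App app : running) {
            if (app.tag.equals(tag) && app.startTime >= launchTime) {
                ids.add(app.id);
            }
        }
        return ids;
    }
}
```

A child that started at time 1000 is found when the original launch time (900) is used, but not when a rerun records a fresh launch time (5000), which is the behaviour the report describes.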





[jira] [Commented] (OOZIE-2251) Expose instrumental matrices in Realtime Graphing tool

2015-09-21 Thread Srikanth Sundarrajan (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901229#comment-14901229
 ] 

Srikanth Sundarrajan commented on OOZIE-2251:
-

Looks good to me, but this needs to be rebased.

> Expose instrumental matrices in Realtime Graphing tool
> --
>
> Key: OOZIE-2251
> URL: https://issues.apache.org/jira/browse/OOZIE-2251
> Project: Oozie
>  Issue Type: New Feature
>  Components: monitoring
>Reporter: Jaydeep Vishwakarma
>Assignee: Narayan Periwal
> Attachments: OOZIE-2251-v0.patch, OOZIE-2251-v1.patch, 
> OOZIE-2251-v10.patch, OOZIE-2251-v2.patch, OOZIE-2251-v3.patch, 
> OOZIE-2251-v4.patch, OOZIE-2251-v5.patch, OOZIE-2251-v6.patch, 
> OOZIE-2251-v7.patch, OOZIE-2251-v8.patch, OOZIE-2251-v9.patch
>
>
> We have been logging many important metrics in oozie-instrumentation.log. 
> This information is very useful for Oozie functional monitoring, but it is 
> always difficult to get meaning out of a flat file. If we expose this 
> information in a graphing tool, we can get a lot more meaning out of it 
> and take actions based on it.





[jira] [Commented] (OOZIE-2243) Kill Command does not kill the child job for java action

2015-09-21 Thread Srikanth Sundarrajan (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-2243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901218#comment-14901218
 ] 

Srikanth Sundarrajan commented on OOZIE-2243:
-

This one needs to be rebased I guess. [~nperiwal], can you rebase and put the 
patch up in review board please?

> Kill Command does not kill the child job for java action
> 
>
> Key: OOZIE-2243
> URL: https://issues.apache.org/jira/browse/OOZIE-2243
> Project: Oozie
>  Issue Type: Bug
>Reporter: Narayan Periwal
>Assignee: Narayan Periwal
>Priority: Minor
> Attachments: OOZIE-2243-v0.patch, OOZIE-2243-v1.patch, 
> OOZIE-2243-v2.patch
>
>
> Let's say there is a launcher job that launches another map-reduce job through 
> a java action. When we kill the launcher job, the child job launched by it does 
> not get killed; only the launcher job gets killed.





[jira] [Commented] (OOZIE-1770) Create Oozie Application Master for YARN

2015-07-17 Thread Srikanth Sundarrajan (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-1770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14632228#comment-14632228
 ] 

Srikanth Sundarrajan commented on OOZIE-1770:
-

Running Oozie launcher tasks as a MapReduce task/job is indeed a huge hack, and 
we should most certainly look to take advantage of YARN and integrate more 
directly with it. Here are some direct rewards that we should look to reap from 
such a direct integration:
  - Cleaner integration (no artificial split creation or input/output exchange 
mechanisms)
  - The MR assumption that tasks are idempotent is a huge limitation, and a new 
solution should be able to overcome it
  - Heavy resource overheads in terms of an App Master/launcher task for each 
action can be avoided
  - Issues such as App Master restarts or task attempt relaunches cause both 
lost work and possibly data issues today. They can be avoided
 
Taking a step back, here is the list of possible ways in which we can integrate 
with YARN more natively.

+Actions executed via Native Oozie App Master+
An App Master which is capable of executing an Oozie action directly, as opposed 
to making it appear as an MR job. This is in all likelihood going to look like 
the current MR-based execution in uber mode. It doesn't really offer much other 
than moving away from the map task execution mode.

+Actions executed via Single AM per user+
A reusable Oozie AM per user, which creates launcher containers for each action 
(as proposed by [~rkanter]). This would allow us to reduce the AM overheads and 
the launch latency (as AMs are ready and warmed up), and would launch tasks more 
natively as opposed to making them appear as an MR job.

+Workflows executed via a Single AM+
Run the entire workflow in a single AM. In this mode, the workflow and all its 
actions (DagEngine) are executed on the Oozie Workflow AM, and the child actions 
can be executed in an action-specific thread/class loader by default, with the 
ability to execute them in a forked container. In this mode, Oozie workflows can 
be executed at a much lower overhead, with the possibility of lowering the 
burden on the Oozie server. This of course introduces challenges around 
maintaining workflow execution state in the Oozie DB. However, these can be 
solved by maintaining state in HDFS, with notification- and polling-based 
updates to the DB by the Oozie server.

My personal choice would be the last option, as it allows workflow execution to 
be used outside of Oozie coordinators, besides allowing the Oozie server to 
scale better, while keeping the larger objective of moving away from MapReduce 
jobs for Oozie actions. Thoughts?


> Create Oozie Application Master for YARN
> 
>
> Key: OOZIE-1770
> URL: https://issues.apache.org/jira/browse/OOZIE-1770
> Project: Oozie
>  Issue Type: New Feature
>Reporter: Bowen Zhang
>Assignee: Bowen Zhang
> Attachments: oya-rm-screenshot.jpg, oya.patch
>
>
> After the first release of oozie on hadoop 2, it will be good if users can 
> set execution engine in oozie conf, be it YARN AM or traditional MR. We can 
> target this for post oozie 4.1 release.





[jira] [Commented] (OOZIE-2302) Reload feature for oozie-site config

2015-07-14 Thread Srikanth Sundarrajan (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-2302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14626031#comment-14626031
 ] 

Srikanth Sundarrajan commented on OOZIE-2302:
-

Consider separating the conf into a static startup config and a dynamic runtime 
config. This, however, is backward incompatible and would require a general 
consensus, or a way to keep it compatible until the user chooses to separate them.
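
One way to picture the split, as a rough sketch rather than a proposal for Oozie's actual configuration classes: `SplitConfig` and its methods are hypothetical, and the choice that startup keys take precedence over runtime keys is an assumption made for the example.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical sketch: startup settings are frozen, runtime settings can be swapped. */
class SplitConfig {
    private final Map<String, String> startup;
    private final AtomicReference<Map<String, String>> runtime;

    SplitConfig(Map<String, String> startup, Map<String, String> runtime) {
        this.startup = freeze(startup);
        this.runtime = new AtomicReference<>(freeze(runtime));
    }

    /** Takes a defensive, read-only copy so later changes to the source map have no effect. */
    private static Map<String, String> freeze(Map<String, String> source) {
        return Collections.unmodifiableMap(new HashMap<>(source));
    }

    /** Startup keys win (an assumption), so a reload can never override them. */
    String get(String key) {
        String value = startup.get(key);
        return value != null ? value : runtime.get().get(key);
    }

    /** Atomically replaces the runtime settings; startup settings are untouched. */
    void reload(Map<String, String> newRuntime) {
        runtime.set(freeze(newRuntime));
    }
}
```

Readers always see either the old or the new runtime map in full, never a half-applied reload, and anything placed in the startup map still requires a restart to change.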

> Reload feature for oozie-site config 
> -
>
> Key: OOZIE-2302
> URL: https://issues.apache.org/jira/browse/OOZIE-2302
> Project: Oozie
>  Issue Type: New Feature
>  Components: core
>Reporter: Jaydeep Vishwakarma
>Assignee: Jaydeep Vishwakarma
>
> Whenever a user wants to add/modify any property, they have to restart the 
> Oozie server to see the impact of the config updates. This is very 
> inconvenient, as the user has to either kill or drain out all the jobs from 
> Oozie, which eventually slows down the pace of production. We should have 
> reload support for config updates.
>  





[jira] [Commented] (OOZIE-2030) Configuration properties from global section is not getting set in Hadoop job conf when using sub-workflow action in Oozie workflow.xml

2015-07-14 Thread Srikanth Sundarrajan (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14626013#comment-14626013
 ] 

Srikanth Sundarrajan commented on OOZIE-2030:
-

From what I understood, there were two concerns with the patch:

1. Not removing the global conf from the workflow conf before persistence
2. Not encoding the global conf and using the XML as is

Given that the global section is being added only to the workflow conf (in 
compressed state), I am assuming there isn't much storage overhead, and 
retaining the conf in XML format might not be all that bad as long as there is 
no direct string manipulation of the XML.

Any further work needed on the patch?

> Configuration properties from global section is not getting set in Hadoop job 
> conf when using sub-workflow action in Oozie workflow.xml 
> 
>
> Key: OOZIE-2030
> URL: https://issues.apache.org/jira/browse/OOZIE-2030
> Project: Oozie
>  Issue Type: Bug
>  Components: action
>Reporter: Peeyush Bishnoi
>Assignee: Jaydeep Vishwakarma
> Attachments: OOZIE-2030-v2.patch, OOZIE-2030-v3.patch, 
> OOZIE-2030-v4.patch, OOZIE-2030-v5.patch, OOZIE-2030-v6.patch, 
> OOZIE-2030.patch
>
>
> When submitting an Oozie workflow with a sub-workflow action and a global 
> section, configuration properties defined in the global section are not 
> getting set in the launched Hadoop job conf. But when we use a Pig or MR action 
> in workflow.xml, configuration properties from the global section are set 
> properly in the Hadoop job conf.





[jira] [Commented] (OOZIE-2299) Falcon build fails with Oozie-4.2.0

2015-07-09 Thread Srikanth Sundarrajan (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-2299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14620439#comment-14620439
 ] 

Srikanth Sundarrajan commented on OOZIE-2299:
-

In an offline conversation, [~pallavi.rao] seemed to suggest that the issue is 
due to the Codehaus repository being decommissioned.

> Falcon build fails with Oozie-4.2.0
> ---
>
> Key: OOZIE-2299
> URL: https://issues.apache.org/jira/browse/OOZIE-2299
> Project: Oozie
>  Issue Type: Bug
>Affects Versions: 4.2.0, 4.3.0
>Reporter: Peeyush Bishnoi
>Priority: Blocker
> Fix For: 4.2.0
>
>
> Falcon build fails with the following error when trying to build with Apache 
> Oozie-4.2.0. 
> {code:java}
> [INFO] Apache Falcon Oozie EL Extension ... FAILURE [  1.388 s]
> [INFO] Apache Falcon Embedded Hadoop - Test Cluster ... SKIPPED
> [INFO] Apache Falcon Sharelib Hive - Test Cluster . SKIPPED
> [INFO] Apache Falcon Sharelib Pig - Test Cluster .. SKIPPED
> [INFO] Apache Falcon Sharelib Hcatalog - Test Cluster . SKIPPED
> [INFO] Apache Falcon Sharelib Oozie - Test Cluster  SKIPPED
> [INFO] Apache Falcon Test Tools - Test Cluster  SKIPPED
> [INFO] Apache Falcon Messaging  SKIPPED
> [INFO] Apache Falcon Oozie Adaptor  SKIPPED
> [INFO] Apache Falcon Acquisition .. SKIPPED
> [INFO] Apache Falcon Distcp Replication ... SKIPPED
> [INFO] Apache Falcon Retention  SKIPPED
> [INFO] Apache Falcon Archival . SKIPPED
> [INFO] Apache Falcon Rerun  SKIPPED
> [INFO] Apache Falcon Prism  SKIPPED
> [INFO] Apache Falcon Hive Replication . SKIPPED
> [INFO] Apache Falcon Web Application .. SKIPPED
> [INFO] Apache Falcon Documentation  SKIPPED
> [INFO] 
> 
> [INFO] BUILD FAILURE
> [INFO] 
> 
> [INFO] Total time: 35.898 s
> [INFO] Finished at: 2015-07-08T22:30:44+05:30
> [INFO] Final Memory: 151M/613M
> [INFO] 
> 
> [ERROR] Failed to execute goal on project falcon-oozie-el-extension: Could 
> not resolve dependencies for project 
> org.apache.falcon:falcon-oozie-el-extension:jar:0.7-SNAPSHOT: Failed to 
> collect dependencies at org.apache.oozie:oozie-core:jar:4.2.0-falcon -> 
> org.apache.oozie:oozie-client:jar:4.2.0-falcon -> 
> org.apache.oozie:oozie-hadoop-auth:jar:hadoop-1-4.2.0-falcon: Failed to read 
> artifact descriptor for 
> org.apache.oozie:oozie-hadoop-auth:jar:hadoop-1-4.2.0-falcon: Could not 
> transfer artifact 
> org.apache.oozie:oozie-hadoop-auth:pom:hadoop-1-4.2.0-falcon from/to Codehaus 
> repository (http://repository.codehaus.org/): Failed to transfer file: 
> http://repository.codehaus.org/org/apache/oozie/oozie-hadoop-auth/hadoop-1-4.2.0-falcon/oozie-hadoop-auth-hadoop-1-4.2.0-falcon.pom.
>  Return code is: 410 , ReasonPhrase:Gone. -> [Help 1]
> {code}





[jira] [Commented] (OOZIE-2299) Falcon build fails with Oozie-4.2.0

2015-07-09 Thread Srikanth Sundarrajan (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-2299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14620376#comment-14620376
 ] 

Srikanth Sundarrajan commented on OOZIE-2299:
-

Perhaps this needs to be fixed in Falcon. [~peeyushb], can you verify and move 
it to the Falcon project if necessary?






[jira] [Commented] (OOZIE-2251) Expose instrumental matrices in Realtime Graphing tool

2015-07-06 Thread Srikanth Sundarrajan (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14616187#comment-14616187
 ] 

Srikanth Sundarrajan commented on OOZIE-2251:
-

[~nperiwal], if a user were to use Ganglia in their environment, they would 
additionally need to add the excluded dependency to the classpath. It would 
be useful to cover this in the docs.

> Expose instrumental matrices in Realtime Graphing tool
> --
>
> Key: OOZIE-2251
> URL: https://issues.apache.org/jira/browse/OOZIE-2251
> Project: Oozie
>  Issue Type: New Feature
>  Components: monitoring
>Reporter: Jaydeep Vishwakarma
>Assignee: Narayan Periwal
> Attachments: OOZIE-2251-v0.patch, OOZIE-2251-v1.patch, 
> OOZIE-2251-v2.patch, OOZIE-2251-v3.patch, OOZIE-2251-v4.patch, 
> OOZIE-2251-v5.patch, OOZIE-2251-v6.patch, OOZIE-2251-v7.patch, 
> OOZIE-2251-v8.patch, OOZIE-2251-v9.patch
>
>
> We have been logging many important metrics in oozie-instrumentation.log. 
> This information is very useful for Oozie functional monitoring, but it is 
> always difficult to get meaning out of a flat file. If we expose this 
> information in a graphing tool, we can get a lot more meaning out of it 
> and take actions based on it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (OOZIE-2030) Configuration properties from global section is not getting set in Hadoop job conf when using sub-workflow action in Oozie workflow.xml

2015-07-06 Thread Srikanth Sundarrajan (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14614897#comment-14614897
 ] 

Srikanth Sundarrajan commented on OOZIE-2030:
-

+1

> Configuration properties from global section is not getting set in Hadoop job 
> conf when using sub-workflow action in Oozie workflow.xml 
> 
>
> Key: OOZIE-2030
> URL: https://issues.apache.org/jira/browse/OOZIE-2030
> Project: Oozie
>  Issue Type: Bug
>  Components: action
>Reporter: Peeyush Bishnoi
>Assignee: Jaydeep Vishwakarma
> Attachments: OOZIE-2030-v2.patch, OOZIE-2030-v3.patch, 
> OOZIE-2030-v4.patch, OOZIE-2030-v5.patch, OOZIE-2030.patch
>
>
> When submitting an Oozie workflow with a sub-workflow action and a global 
> section, configuration properties defined in the global section are not 
> getting set in the launched Hadoop job conf. But when we use a Pig or MR action 
> in workflow.xml, configuration properties from the global section are set 
> properly in the Hadoop job conf.





[jira] [Commented] (OOZIE-2253) Spark Job is failing when it is running in standalone server

2015-06-15 Thread Srikanth Sundarrajan (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-2253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14587506#comment-14587506
 ] 

Srikanth Sundarrajan commented on OOZIE-2253:
-

Looks good. +1

> Spark Job is failing when it is running in standalone server
> 
>
> Key: OOZIE-2253
> URL: https://issues.apache.org/jira/browse/OOZIE-2253
> Project: Oozie
>  Issue Type: Bug
>Reporter: pavan kumar kolamuri
>Assignee: pavan kumar kolamuri
> Attachments: OOZIE-2253.patch
>
>
> When a Spark job runs in a Spark standalone cluster, the job gets launched and 
> succeeds, but then jobs keep getting launched endlessly in the Spark 
> cluster. The Oozie workflow stays in the running state forever, as Spark keeps 
> launching the job.
> This might be because Spark always calls System.exit(0) when a job succeeds. 
> LauncherSecurityManager throws an exception for this. It looks like Spark 
> (through the akka framework) catches that and launches one more attempt of the 
> same job, and this repeats indefinitely.
> {noformat}
> Sending launch command to spark://inmobi-Precision-T3610:7077
> Driver successfully submitted as driver-20150526105806-
> ... waiting before polling master for driver state
> ... polling master for driver state
> State of driver-20150526105806- is SUBMITTED
> Sending launch command to spark://inmobi-Precision-T3610:7077
> Driver successfully submitted as driver-20150526105811-0001
> ... waiting before polling master for driver state
> ... polling master for driver state
> State of driver-20150526105811-0001 is SUBMITTED
> Sending launch command to spark://inmobi-Precision-T3610:7077
> Driver successfully submitted as driver-20150526105816-0002
> ... waiting before polling master for driver state
> ... polling master for driver state
> State of driver-20150526105816-0002 is SUBMITTED
> Sending launch command to spark://inmobi-Precision-T3610:7077
> Driver successfully submitted as driver-20150526105821-0003
> ... waiting before polling master for driver state
> ... polling master for driver state
> State of driver-20150526105821-0003 is SUBMITTED
> Sending launch command to spark://inmobi-Precision-T3610:7077
> Driver successfully submitted as driver-20150526105826-0004
> ... waiting before polling master for driver state
> {noformat}
> {noformat}
> 2015-05-26 10:58:11,573 ERROR [driverClient-akka.actor.default-dispatcher-4] 
> akka.actor.OneForOneStrategy: Intercepted System.exit(0)
> java.lang.SecurityException: Intercepted System.exit(0)
>   at 
> org.apache.oozie.action.hadoop.LauncherSecurityManager.checkExit(LauncherMapper.java:601)
>   at java.lang.Runtime.exit(Runtime.java:107)
>   at java.lang.System.exit(System.java:962)
>   at 
> org.apache.spark.deploy.ClientActor.pollAndReportStatus(Client.scala:115)
>   at 
> org.apache.spark.deploy.ClientActor$$anonfun$receiveWithLogging$1.applyOrElse(Client.scala:123)
>   at 
> scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
>   at 
> scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
>   at 
> scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
>   at 
> org.apache.spark.util.ActorLogReceive$$anon$1.apply(ActorLogReceive.scala:53)
>   at 
> org.apache.spark.util.ActorLogReceive$$anon$1.apply(ActorLogReceive.scala:42)
>   at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
>   at 
> org.apache.spark.util.ActorLogReceive$$anon$1.applyOrElse(ActorLogReceive.scala:42)
>   at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
>   at akka.actor.ActorCell.invoke(ActorCell.scala:456)
>   at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
>   at akka.dispatch.Mailbox.run(Mailbox.scala:219)
>   at 
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
>   at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>   at 
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>   at 
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>   at 
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (OOZIE-2253) Spark Job is failing when it is running in standalone server

2015-06-15 Thread Srikanth Sundarrajan (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-2253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14587482#comment-14587482
 ] 

Srikanth Sundarrajan commented on OOZIE-2253:
-

Can you please upload the patch to Review Board?


> Spark Job is failing when it is running in standalone server
> 
>
> Key: OOZIE-2253
> URL: https://issues.apache.org/jira/browse/OOZIE-2253
> Project: Oozie
>  Issue Type: Bug
>Reporter: pavan kumar kolamuri
>Assignee: pavan kumar kolamuri
> Attachments: OOZIE-2253.patch
>
>
> When a Spark job is run on a Spark standalone cluster, the job gets 
> launched and succeeds, but then an endless series of jobs gets launched on the Spark 
> cluster. The Oozie workflow stays in the running state forever because Spark keeps 
> relaunching the job. 
> This might be because Spark always calls System.exit(0) when a job succeeds. 
> LauncherSecurityManager throws an exception for this. It 
> looks like Spark (through the Akka framework) catches that exception and launches one 
> more attempt of the same job. This repeats indefinitely.
> {noformat}
> Sending launch command to spark://inmobi-Precision-T3610:7077
> Driver successfully submitted as driver-20150526105806-
> ... waiting before polling master for driver state
> ... polling master for driver state
> State of driver-20150526105806- is SUBMITTED
> Sending launch command to spark://inmobi-Precision-T3610:7077
> Driver successfully submitted as driver-20150526105811-0001
> ... waiting before polling master for driver state
> ... polling master for driver state
> State of driver-20150526105811-0001 is SUBMITTED
> Sending launch command to spark://inmobi-Precision-T3610:7077
> Driver successfully submitted as driver-20150526105816-0002
> ... waiting before polling master for driver state
> ... polling master for driver state
> State of driver-20150526105816-0002 is SUBMITTED
> Sending launch command to spark://inmobi-Precision-T3610:7077
> Driver successfully submitted as driver-20150526105821-0003
> ... waiting before polling master for driver state
> ... polling master for driver state
> State of driver-20150526105821-0003 is SUBMITTED
> Sending launch command to spark://inmobi-Precision-T3610:7077
> Driver successfully submitted as driver-20150526105826-0004
> ... waiting before polling master for driver state
> {noformat}
> {noformat}
> 2015-05-26 10:58:11,573 ERROR [driverClient-akka.actor.default-dispatcher-4] 
> akka.actor.OneForOneStrategy: Intercepted System.exit(0)
> java.lang.SecurityException: Intercepted System.exit(0)
>   at 
> org.apache.oozie.action.hadoop.LauncherSecurityManager.checkExit(LauncherMapper.java:601)
>   at java.lang.Runtime.exit(Runtime.java:107)
>   at java.lang.System.exit(System.java:962)
>   at 
> org.apache.spark.deploy.ClientActor.pollAndReportStatus(Client.scala:115)
>   at 
> org.apache.spark.deploy.ClientActor$$anonfun$receiveWithLogging$1.applyOrElse(Client.scala:123)
>   at 
> scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
>   at 
> scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
>   at 
> scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
>   at 
> org.apache.spark.util.ActorLogReceive$$anon$1.apply(ActorLogReceive.scala:53)
>   at 
> org.apache.spark.util.ActorLogReceive$$anon$1.apply(ActorLogReceive.scala:42)
>   at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
>   at 
> org.apache.spark.util.ActorLogReceive$$anon$1.applyOrElse(ActorLogReceive.scala:42)
>   at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
>   at akka.actor.ActorCell.invoke(ActorCell.scala:456)
>   at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
>   at akka.dispatch.Mailbox.run(Mailbox.scala:219)
>   at 
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
>   at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>   at 
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>   at 
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>   at 
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> {noformat}





[jira] [Commented] (OOZIE-2030) Configuration properties from global section is not getting set in Hadoop job conf when using sub-workflow action in Oozie workflow.xml

2015-06-15 Thread Srikanth Sundarrajan (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14585679#comment-14585679
 ] 

Srikanth Sundarrajan commented on OOZIE-2030:
-

Didn't realize that [~shwethags] had already proposed the same approach. Sorry.

> Configuration properties from global section is not getting set in Hadoop job 
> conf when using sub-workflow action in Oozie workflow.xml 
> 
>
> Key: OOZIE-2030
> URL: https://issues.apache.org/jira/browse/OOZIE-2030
> Project: Oozie
>  Issue Type: Bug
>  Components: action
>Reporter: Peeyush Bishnoi
>Assignee: Jaydeep Vishwakarma
> Attachments: OOZIE-2030-v2.patch, OOZIE-2030-v3.patch, 
> OOZIE-2030-v4.patch, OOZIE-2030.patch
>
>
> When submitting an Oozie workflow with a sub-workflow action and a global 
> section, configuration properties defined in the global section are not 
> set in the launched Hadoop job conf. But when we use a Pig or MR action in 
> workflow.xml, configuration properties from the global section are set properly in 
> the Hadoop job conf.





[jira] [Commented] (OOZIE-2030) Configuration properties from global section is not getting set in Hadoop job conf when using sub-workflow action in Oozie workflow.xml

2015-06-15 Thread Srikanth Sundarrajan (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14585671#comment-14585671
 ] 

Srikanth Sundarrajan commented on OOZIE-2030:
-

There is perhaps a simpler way to tackle this issue. If we modify 
LiteWorkflowAppParser to serialize and persist the contents of global in conf, and 
have handleGlobal() also consult conf when handling the global section, this will 
ensure that global is propagated correctly with no further changes to any other 
section of the code, while honoring the right overlay priorities. The code in 
LWAP wouldn't be specific to sub-workflows either. Luckily, the conf itself is 
propagated into subflows on the user's request. [~shwethags], [~rohini], does this make 
sense?
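
A rough sketch of the idea above, using only JDK classes rather than Oozie's own types. The class name, the conf key and the overlay order (a workflow's own global section winning over the inherited one) are assumptions made for illustration, not what the patch or LiteWorkflowAppParser actually does.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.util.Properties;

class GlobalConfSketch {
    // Hypothetical conf key used only in this sketch; not a real Oozie property.
    static final String GLOBAL_KEY = "sketch.wf.global.conf";

    /** Serializes the global section and stores it in the job conf under one key. */
    static void persistGlobal(Properties jobConf, Properties global) {
        StringWriter out = new StringWriter();
        try {
            global.store(out, null);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        jobConf.setProperty(GLOBAL_KEY, out.toString());
    }

    /**
     * Builds the effective global section: whatever was inherited through the
     * conf, overlaid by the workflow's own global section (assumed priority).
     */
    static Properties handleGlobal(Properties jobConf, Properties ownGlobal) {
        Properties effective = new Properties();
        String inherited = jobConf.getProperty(GLOBAL_KEY);
        if (inherited != null) {
            try {
                effective.load(new StringReader(inherited));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        effective.putAll(ownGlobal);
        return effective;
    }
}
```

Because the serialized global section travels inside the conf, a sub-workflow at any depth can call handleGlobal() and then persistGlobal() again with the result, so nothing in this sketch is specific to the sub-workflow action.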

> Configuration properties from global section is not getting set in Hadoop job 
> conf when using sub-workflow action in Oozie workflow.xml 
> 
>
> Key: OOZIE-2030
> URL: https://issues.apache.org/jira/browse/OOZIE-2030
> Project: Oozie
>  Issue Type: Bug
>  Components: action
>Reporter: Peeyush Bishnoi
>Assignee: Jaydeep Vishwakarma
> Attachments: OOZIE-2030-v2.patch, OOZIE-2030-v3.patch, 
> OOZIE-2030-v4.patch, OOZIE-2030.patch
>
>
> When submitting an Oozie workflow with a sub-workflow action and a global 
> section, configuration properties defined in the global section are not 
> set in the launched Hadoop job conf. But when we use a Pig or MR action in 
> workflow.xml, configuration properties from the global section are set properly in 
> the Hadoop job conf.





[jira] [Commented] (OOZIE-2251) Expose instrumental matrices in Realtime Graphing tool

2015-06-13 Thread Srikanth Sundarrajan (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14584520#comment-14584520
 ] 

Srikanth Sundarrajan commented on OOZIE-2251:
-

+1 for the patch.

Additional note, FWIW: Gmetric pulls in an additional dependency (LGPL), which 
isn't compatible with Apache licensing policy. This might be an issue if binary releases were to include all 
transitive dependencies. Some details can be found 
[here|https://cwiki.apache.org/confluence/display/LENS/Licensing+in+Apache+Lens].
For additional reference, see SPARK-1167.

> Expose instrumental matrices in Realtime Graphing tool
> --
>
> Key: OOZIE-2251
> URL: https://issues.apache.org/jira/browse/OOZIE-2251
> Project: Oozie
>  Issue Type: New Feature
>  Components: monitoring
>Reporter: Jaydeep Vishwakarma
>Assignee: Narayan Periwal
> Attachments: OOZIE-2251-v0.patch, OOZIE-2251-v1.patch, 
> OOZIE-2251-v2.patch, OOZIE-2251-v3.patch, OOZIE-2251-v4.patch, 
> OOZIE-2251-v5.patch, OOZIE-2251-v6.patch, OOZIE-2251-v7.patch
>
>
> We have been logging many important metrics in oozie-instrumentation.log. 
> This information is very useful for Oozie functional monitoring, but it is 
> always difficult to get the meaning from a flat file. If we expose this 
> information in a graphing tool, we can get a lot of meaning out of it 
> and take actions based on it. 





[jira] [Commented] (OOZIE-2030) Configuration properties from global section is not getting set in Hadoop job conf when using sub-workflow action in Oozie workflow.xml

2015-06-12 Thread Srikanth Sundarrajan (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14583370#comment-14583370
 ] 

Srikanth Sundarrajan commented on OOZIE-2030:
-

We need a uniform mechanism to parse the global section and make its contents 
available to the action executors; it should then be up to the executors to 
use them as they see fit.

> Configuration properties from global section is not getting set in Hadoop job 
> conf when using sub-workflow action in Oozie workflow.xml 
> 
>
> Key: OOZIE-2030
> URL: https://issues.apache.org/jira/browse/OOZIE-2030
> Project: Oozie
>  Issue Type: Bug
>  Components: action
>Reporter: Peeyush Bishnoi
>Assignee: Jaydeep Vishwakarma
> Attachments: OOZIE-2030-v2.patch, OOZIE-2030-v3.patch, 
> OOZIE-2030-v4.patch, OOZIE-2030.patch
>
>
> When submitting an Oozie workflow with a sub-workflow action and a global 
> section, configuration properties defined in the global section are not 
> set in the launched Hadoop job conf. But when we use a Pig or MR action in 
> workflow.xml, configuration properties from the global section are set properly in 
> the Hadoop job conf.





[jira] [Commented] (OOZIE-2030) Configuration properties from global section is not getting set in Hadoop job conf when using sub-workflow action in Oozie workflow.xml

2015-06-12 Thread Srikanth Sundarrajan (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14583369#comment-14583369
 ] 

Srikanth Sundarrajan commented on OOZIE-2030:
-

My bad, I must have confused it with the other patch I was reviewing. 

> Configuration properties from global section is not getting set in Hadoop job 
> conf when using sub-workflow action in Oozie workflow.xml 
> 
>
> Key: OOZIE-2030
> URL: https://issues.apache.org/jira/browse/OOZIE-2030
> Project: Oozie
>  Issue Type: Bug
>  Components: action
>Reporter: Peeyush Bishnoi
>Assignee: Jaydeep Vishwakarma
> Attachments: OOZIE-2030-v2.patch, OOZIE-2030-v3.patch, 
> OOZIE-2030-v4.patch, OOZIE-2030.patch
>
>
> When submitting an Oozie workflow with a sub-workflow action and a global 
> section, configuration properties defined in the global section are not 
> set in the launched Hadoop job conf. But when we use a Pig or MR action in 
> workflow.xml, configuration properties from the global section are set properly in 
> the Hadoop job conf.





[jira] [Commented] (OOZIE-2030) Configuration properties from global section is not getting set in Hadoop job conf when using sub-workflow action in Oozie workflow.xml

2015-06-12 Thread Srikanth Sundarrajan (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14583368#comment-14583368
 ] 

Srikanth Sundarrajan commented on OOZIE-2030:
-

My bad, I must have confused it with the other patch I was reviewing. 

> Configuration properties from global section is not getting set in Hadoop job 
> conf when using sub-workflow action in Oozie workflow.xml 
> 
>
> Key: OOZIE-2030
> URL: https://issues.apache.org/jira/browse/OOZIE-2030
> Project: Oozie
>  Issue Type: Bug
>  Components: action
>Reporter: Peeyush Bishnoi
>Assignee: Jaydeep Vishwakarma
> Attachments: OOZIE-2030-v2.patch, OOZIE-2030-v3.patch, 
> OOZIE-2030-v4.patch, OOZIE-2030.patch
>
>
> When submitting an Oozie workflow with a sub-workflow action and a global 
> section, configuration properties defined in the global section are not 
> set in the launched Hadoop job conf. But when we use a Pig or MR action in 
> workflow.xml, configuration properties from the global section are set properly in 
> the Hadoop job conf.





[jira] [Commented] (OOZIE-2030) Configuration properties from global section is not getting set in Hadoop job conf when using sub-workflow action in Oozie workflow.xml

2015-06-11 Thread Srikanth Sundarrajan (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14582934#comment-14582934
 ] 

Srikanth Sundarrajan commented on OOZIE-2030:
-

Yes [~shwethags], I didn't want two open review requests pending on Review Board 
at the same time, hence the request. Thanks.

> Configuration properties from global section is not getting set in Hadoop job 
> conf when using sub-workflow action in Oozie workflow.xml 
> 
>
> Key: OOZIE-2030
> URL: https://issues.apache.org/jira/browse/OOZIE-2030
> Project: Oozie
>  Issue Type: Bug
>  Components: action
>Reporter: Peeyush Bishnoi
>Assignee: Jaydeep Vishwakarma
> Attachments: OOZIE-2030-v2.patch, OOZIE-2030-v3.patch, 
> OOZIE-2030-v4.patch, OOZIE-2030.patch
>
>
> When submitting an Oozie workflow with a sub-workflow action and a global 
> section, configuration properties defined in the global section are not 
> set in the launched Hadoop job conf. But when we use a Pig or MR action in 
> workflow.xml, configuration properties from the global section are set properly in 
> the Hadoop job conf.





[jira] [Commented] (OOZIE-2020) Rerun all Failed/killed/timedout coordinator actions rather than specifying action numbers

2015-06-11 Thread Srikanth Sundarrajan (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-2020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14582325#comment-14582325
 ] 

Srikanth Sundarrajan commented on OOZIE-2020:
-

Is it intentional that succeeded actions can't be re-run through this filter, 
or is it an oversight? Also, it might help to call this something else; 
FILTERSTATUS reads like a generic collection of statuses that can be used for 
filtering. Why not use COMPLETED_COORD_ACTION_STATUSES, which seems more 
appropriate?
{code}
+public enum FILTERSTATUS {KILLED, FAILED, TIMEDOUT};
{code}
If you do consider all actions in completed status for re-run, consider renaming 
GET_TERMINATED* in the JPA modules to GET_COMPLETED*.

{{LocalOozieClientCoord}}
Should we have a different error code for input checks instead?
{code}
throw new CommandException(ErrorCode.E1018, "Invalid value provided for filter option; " +
        "Valid Value is 'status=KILLED;status=FAILED;status=TIMEDOUT'");
{code}

{{V1JobsServlet}}
parseFilters is redundantly implemented in two places in the server. Is it possible 
to reconcile them?

{{TestCoordRerunXCommand}}

Why not check for waiting status in this case, as in the others?
{code}
assertNotSame(action2.getStatus(), CoordinatorAction.Status.SUCCEEDED);
{code} 

A few nits:
* Plenty of unused imports introduced in the patch
* A few unused variables
* Javadoc is incorrect in CoordUtils
* Use a logger instead of printing the stack trace to stderr (LocalOozieClientCoord)
* Error message in OozieCLI isn't consistent with other invalid-input exceptions
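
To illustrate the points above about a single shared filter parser and an explicitly named set of re-runnable statuses, here is a small standalone sketch. The class, enum and method names are hypothetical, and it throws IllegalArgumentException rather than Oozie's CommandException so that it compiles without Oozie on the classpath.

```java
import java.util.EnumSet;
import java.util.Locale;
import java.util.Set;

class RerunFilterSketch {
    /** Statuses assumed to be re-runnable, mirroring the enum quoted from the patch. */
    enum RerunStatus { KILLED, FAILED, TIMEDOUT }

    /** Parses a filter such as "status=KILLED;status=FAILED" into a set of statuses. */
    static Set<RerunStatus> parseFilter(String filter) {
        if (filter == null || filter.trim().isEmpty()) {
            throw new IllegalArgumentException("Filter must not be empty");
        }
        Set<RerunStatus> result = EnumSet.noneOf(RerunStatus.class);
        for (String token : filter.split(";")) {
            String[] pair = token.trim().split("=", 2);
            if (pair.length != 2 || !pair[0].trim().equalsIgnoreCase("status")) {
                throw new IllegalArgumentException("Expected status=<VALUE>, got: " + token);
            }
            String value = pair[1].trim().toUpperCase(Locale.ROOT);
            try {
                result.add(RerunStatus.valueOf(value));
            } catch (IllegalArgumentException e) {
                // Unknown or non-rerunnable status, e.g. SUCCEEDED in this sketch.
                throw new IllegalArgumentException("Unsupported status: " + value, e);
            }
        }
        return result;
    }
}
```

Keeping the parsing in one helper like this would let the CLI, the local client and the servlet share both the validation and the error message.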

> Rerun all Failed/killed/timedout coordinator actions rather than specifying 
> action numbers
> --
>
> Key: OOZIE-2020
> URL: https://issues.apache.org/jira/browse/OOZIE-2020
> Project: Oozie
>  Issue Type: New Feature
>  Components: action
>Reporter: Sreedish P S
>Assignee: Narayan Periwal
>Priority: Minor
> Attachments: OOZIE-2020-v10.patch, OOZIE-2020-v11.patch, 
> OOZIE-2020-v12.patch, OOZIE-2020-v13.patch, OOZIE-2020-v14.patch, 
> OOZIE-2020-v15.patch, OOZIE-2020-v16.patch, OOZIE-2020-v17.patch, 
> OOZIE-2020-v8.patch, OOZIE-2020-v9.patch
>
>
> Currently, coordinator actions are rerun by specifying the coordinator id and 
> action numbers. This feature request is for rerunning all coordinator actions 
> in a particular state, 
> for example:
> oozie job -rerun coord-id -state killed





[jira] [Commented] (OOZIE-2030) Configuration properties from global section is not getting set in Hadoop job conf when using sub-workflow action in Oozie workflow.xml

2015-06-11 Thread Srikanth Sundarrajan (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14582313#comment-14582313
 ] 

Srikanth Sundarrajan commented on OOZIE-2030:
-

(minor nit) There are a few unused imports in the patch.

[~shwethags], can you please close the previous review request? This should 
allow [~jaydeepvishwakarma] to create a new review request and upload these 
revisions. 

> Configuration properties from global section is not getting set in Hadoop job 
> conf when using sub-workflow action in Oozie workflow.xml 
> 
>
> Key: OOZIE-2030
> URL: https://issues.apache.org/jira/browse/OOZIE-2030
> Project: Oozie
>  Issue Type: Bug
>  Components: action
>Reporter: Peeyush Bishnoi
>Assignee: Jaydeep Vishwakarma
> Attachments: OOZIE-2030-v2.patch, OOZIE-2030-v3.patch, 
> OOZIE-2030-v4.patch, OOZIE-2030.patch
>
>
> When submitting an Oozie workflow with a sub-workflow action and a global 
> section, configuration properties defined in the global section are not 
> set in the launched Hadoop job conf. But when we use a Pig or MR action in 
> workflow.xml, configuration properties from the global section are set properly in 
> the Hadoop job conf.





[jira] [Commented] (OOZIE-2030) Configuration properties from global section is not getting set in Hadoop job conf when using sub-workflow action in Oozie workflow.xml

2015-06-11 Thread Srikanth Sundarrajan (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14582309#comment-14582309
 ] 

Srikanth Sundarrajan commented on OOZIE-2030:
-

[~jaydeepvishwakarma], the Context arg in Subworkflow should give you a handle to 
the parent workflow via {{context::getWorkflow()}}. Shouldn't you be accessing 
the global section of the parent workflow here instead of in 
{{LiteWorkflowAppParser::parse()}}? Mutating a single property 
via {{jobConf.get(SubWorkflowActionExecutor.SUBWF_JOBCONF)}} isn't likely to 
work when the subflow depth is more than one, as the same property will be overwritten. 
{code}
XConfiguration subWorkflowConf = new XConfiguration();
Configuration parentConf = new XConfiguration(new StringReader(context.getWorkflow().getConf()));
if (eConf.getChild("propagate-configuration", ns) != null) {
    XConfiguration.copy(parentConf, subWorkflowConf);
}
{code}

Also, the test case isn't really testing the precedence of global overlays. The 
current test has mutually exclusive conf in the parent and subflow global sections, 
so with this test case it is not possible to assert that the precedence is 
being honored correctly.
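
The point about the test case can be shown with plain maps. This is only a sketch under assumptions: the class and method names are made up, and a map overlay stands in for however the patch actually merges the parent and subflow global sections.

```java
import java.util.HashMap;
import java.util.Map;

class GlobalPrecedenceSketch {
    /** Overlay two configs: entries in 'override' win over entries in 'base'. */
    static Map<String, String> overlay(Map<String, String> base, Map<String, String> override) {
        Map<String, String> merged = new HashMap<>(base);
        merged.putAll(override);
        return merged;
    }

    /**
     * Returns true if a test using these two global sections could tell the
     * two possible merge orders apart, i.e. if it can verify precedence at all.
     */
    static boolean detectsPrecedence(Map<String, String> parentGlobal,
                                     Map<String, String> subflowGlobal) {
        Map<String, String> subflowWins = overlay(parentGlobal, subflowGlobal);
        Map<String, String> parentWins = overlay(subflowGlobal, parentGlobal);
        return !subflowWins.equals(parentWins);
    }
}
```

With disjoint keys both merge orders produce the same map, so any assertion passes regardless of precedence; the test needs at least one key defined in both global sections with different values.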

> Configuration properties from global section is not getting set in Hadoop job 
> conf when using sub-workflow action in Oozie workflow.xml 
> 
>
> Key: OOZIE-2030
> URL: https://issues.apache.org/jira/browse/OOZIE-2030
> Project: Oozie
>  Issue Type: Bug
>  Components: action
>Reporter: Peeyush Bishnoi
>Assignee: Jaydeep Vishwakarma
> Attachments: OOZIE-2030-v2.patch, OOZIE-2030-v3.patch, 
> OOZIE-2030-v4.patch, OOZIE-2030.patch
>
>
> When submitting an Oozie workflow with a sub-workflow action and a global 
> section, configuration properties defined in the global section are not 
> set in the launched Hadoop job conf. But when we use a Pig or MR action in 
> workflow.xml, configuration properties from the global section are set properly in 
> the Hadoop job conf.





[jira] [Commented] (OOZIE-2020) Rerun all Failed/killed/timedout coordinator actions rather than specifying action numbers

2015-06-11 Thread Srikanth Sundarrajan (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-2020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14582192#comment-14582192
 ] 

Srikanth Sundarrajan commented on OOZIE-2020:
-

Hi [~nperiwal], Please upload the latest patch to review board.

> Rerun all Failed/killed/timedout coordinator actions rather than specifying 
> action numbers
> --
>
> Key: OOZIE-2020
> URL: https://issues.apache.org/jira/browse/OOZIE-2020
> Project: Oozie
>  Issue Type: New Feature
>  Components: action
>Reporter: Sreedish P S
>Assignee: Narayan Periwal
>Priority: Minor
> Attachments: OOZIE-2020-v10.patch, OOZIE-2020-v11.patch, 
> OOZIE-2020-v12.patch, OOZIE-2020-v13.patch, OOZIE-2020-v14.patch, 
> OOZIE-2020-v15.patch, OOZIE-2020-v16.patch, OOZIE-2020-v17.patch, 
> OOZIE-2020-v8.patch, OOZIE-2020-v9.patch
>
>
> Currently, coordinator actions are rerun by specifying the coordinator id and 
> action numbers. This feature request is for rerunning all coordinator actions 
> in a particular state, 
> for example:
> oozie job -rerun coord-id -state killed





[jira] [Commented] (OOZIE-2251) Expose instrumental matrices in Realtime Graphing tool

2015-06-11 Thread Srikanth Sundarrajan (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14582042#comment-14582042
 ] 

Srikanth Sundarrajan commented on OOZIE-2251:
-

Thanks [~nperiwal], the fix does address most of the comments from the earlier review. 
I have a few more on the new patch; can you please check?

> Expose instrumental matrices in Realtime Graphing tool
> --
>
> Key: OOZIE-2251
> URL: https://issues.apache.org/jira/browse/OOZIE-2251
> Project: Oozie
>  Issue Type: New Feature
>  Components: monitoring
>Reporter: Jaydeep Vishwakarma
>Assignee: Narayan Periwal
> Attachments: OOZIE-2251-v0.patch, OOZIE-2251-v1.patch, 
> OOZIE-2251-v2.patch, OOZIE-2251-v3.patch, OOZIE-2251-v4.patch, 
> OOZIE-2251-v5.patch
>
>
> We have been logging many important metrics in oozie-instrumentation.log. 
> This information is very useful for Oozie functional monitoring, but it is 
> always difficult to get the meaning from a flat file. If we expose this 
> information in a graphing tool, we can get a lot of meaning out of it 
> and take actions based on it. 





[jira] [Commented] (OOZIE-2259) Create a callback action

2015-06-04 Thread Srikanth Sundarrajan (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-2259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573925#comment-14573925
 ] 

Srikanth Sundarrajan commented on OOZIE-2259:
-

A callback action can be quite useful. I have some questions relating to the 
proposal though.

1. Would standard action-level retries be available for this? I am assuming they 
will be. Please confirm.
2. Host isn't adequate; you essentially need a URL comprising a scheme and 
an authority.
3. The method being queue/topic is misleading. I would suggest HTTP_GET, 
HTTP_POST, QUEUE_OFFER, TOPIC_PUBLISH to be explicit.
4. From the proposal it seems that it is not possible to include a POST body. 
That should actually be OK. Just wanted to hear your thoughts on that.
5. Would capture-output work for this action?
6. In case of the HTTP_* methods you might get a response body. Will that be 
preserved should the user need it? In my view, that can be skipped too, as 
this is to serve as a callback notification.
7. Would this be a fire-and-forget action? Say you get an HTTP 400 back; what 
would the behavior be?
8. How is this proposed to be implemented? As an action performed through the 
launcher (via JavaActionExecutor), or something along the lines of 
FsActionExecutor/EmailActionExecutor?
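
Points 2 and 3 above could be sketched roughly as follows. The class and method names are hypothetical and nothing here comes from the proposal itself; only the four method names are taken from the suggestion in point 3, and the check relies on java.net.URI.

```java
import java.net.URI;
import java.net.URISyntaxException;

class CallbackTargetSketch {
    /** Explicit callback methods, as suggested in point 3. */
    enum CallbackMethod { HTTP_GET, HTTP_POST, QUEUE_OFFER, TOPIC_PUBLISH }

    /**
     * Parses the callback target and insists on both a scheme and an
     * authority (point 2), so a bare host name is rejected.
     */
    static URI parseTarget(String url) {
        URI uri;
        try {
            uri = new URI(url);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Malformed callback URL: " + url, e);
        }
        if (uri.getScheme() == null || uri.getAuthority() == null) {
            throw new IllegalArgumentException(
                    "Callback URL needs a scheme and an authority: " + url);
        }
        return uri;
    }
}
```

For example, parseTarget("http://example.org:8080/notify") is accepted, while parseTarget("example.org") fails because it has neither a scheme nor an authority.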

> Create a callback action 
> -
>
> Key: OOZIE-2259
> URL: https://issues.apache.org/jira/browse/OOZIE-2259
> Project: Oozie
>  Issue Type: New Feature
>  Components: action
>Reporter: Jaydeep Vishwakarma
>Assignee: Jaydeep Vishwakarma
>
> We need an action that lets Oozie send a notification to an external server. We should be 
> able to do multiple types of callback; currently I know of JMS and HTTP calls. It 
> should be capable of calling different types of methods along 
> with any number of arguments. 
> The sample workflow with a callback action: 
> {code:xml}
> 
> ...
> 
> 
>   [HOST]
>   [METHOD]
>   
>   [KEY][VALUE]
>   
> ...
> 
> ...
> 
> ...
> 
> {code}
> HOST: from the host, the system can figure out whether it is an HTTP or a JMS callback 
> action. The system will send the notification to that host.
> METHOD: it can be POST/GET/QUEUE/TOPIC





[jira] [Commented] (OOZIE-2251) Expose instrumental matrices in Realtime Graphing tool

2015-06-03 Thread Srikanth Sundarrajan (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572197#comment-14572197
 ] 

Srikanth Sundarrajan commented on OOZIE-2251:
-

[~nperiwal], Have shared my comments on review board

> Expose instrumental matrices in Realtime Graphing tool
> --
>
> Key: OOZIE-2251
> URL: https://issues.apache.org/jira/browse/OOZIE-2251
> Project: Oozie
>  Issue Type: New Feature
>  Components: monitoring
>Reporter: Jaydeep Vishwakarma
>Assignee: Narayan Periwal
> Attachments: OOZIE-2251-v0.patch, OOZIE-2251-v1.patch, 
> OOZIE-2251-v2.patch
>
>
> We have been logging many important metrics in oozie-instrumentation.log. 
> This information is very useful for Oozie functional monitoring, but it is 
> always difficult to get the meaning from a flat file. If we expose this 
> information in a graphing tool, we can get a lot of meaning out of it 
> and take actions based on it. 





[jira] [Commented] (OOZIE-2251) Expose instrumental matrices in Realtime Graphing tool

2015-05-30 Thread Srikanth Sundarrajan (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14566343#comment-14566343
 ] 

Srikanth Sundarrajan commented on OOZIE-2251:
-

Is it possible to upload this to Review Board?

> Expose instrumental matrices in Realtime Graphing tool
> --
>
> Key: OOZIE-2251
> URL: https://issues.apache.org/jira/browse/OOZIE-2251
> Project: Oozie
>  Issue Type: New Feature
>  Components: monitoring
>Reporter: Jaydeep Vishwakarma
>Assignee: Narayan Periwal
> Attachments: OOZIE-2251-v0.patch, OOZIE-2251-v1.patch, 
> OOZIE-2251-v2.patch
>
>
> We have been logging many important metrics in oozie-instrumentation.log. 
> This information is very useful for Oozie functional monitoring, but it is 
> always difficult to get the meaning from a flat file. If we expose this 
> information in a graphing tool, we can get a lot of meaning out of it 
> and take actions based on it. 





[jira] [Commented] (OOZIE-2216) Aperiodic Data handling in oozie

2015-05-11 Thread Srikanth Sundarrajan (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14537773#comment-14537773
 ] 

Srikanth Sundarrajan commented on OOZIE-2216:
-

Some issue with JIRA. Sorry for the multiple posts.

> Aperiodic Data handling in oozie
> 
>
> Key: OOZIE-2216
> URL: https://issues.apache.org/jira/browse/OOZIE-2216
> Project: Oozie
>  Issue Type: New Feature
>  Components: coordinator
>Reporter: Jaydeep Vishwakarma
>Assignee: Jaydeep Vishwakarma
> Attachments: Oozie_aperiodic_data_handling.pdf
>
>
> Currently Oozie scheduling works on periodic datasets. It does not have any 
> mechanism to handle aperiodic datasets, which don’t follow a fixed 
> schedule/frequency. 
> Use cases
> When the incoming dataset arrives with no fixed schedule.
> Need to trigger the job based on all data available since the last run, with a 
> possible cap on the max size to process in one run.
> Try to avoid creating so many instances when you know input instances will be 
> very few.





[jira] [Commented] (OOZIE-2216) Aperiodic Data handling in oozie

2015-05-11 Thread Srikanth Sundarrajan (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14537771#comment-14537771
 ] 

Srikanth Sundarrajan commented on OOZIE-2216:
-

How often is the input checked before an action is created? The question is 
specifically about what kind of load the NN is subjected to.

> Aperiodic Data handling in oozie
> 
>
> Key: OOZIE-2216
> URL: https://issues.apache.org/jira/browse/OOZIE-2216
> Project: Oozie
>  Issue Type: New Feature
>  Components: coordinator
>Reporter: Jaydeep Vishwakarma
>Assignee: Jaydeep Vishwakarma
> Attachments: Oozie_aperiodic_data_handling.pdf
>
>
> Currently Oozie scheduling works on periodic datasets. It does not have any 
> mechanism to handle aperiodic datasets, which don’t follow a fixed 
> schedule/frequency. 
> Use cases
> When an incoming dataset arrives with no fixed schedule.
> Need to trigger the job based on all data available since the last run, with a 
> possible cap on the max size to process in one run.
> Try to avoid creating so many instances when you know input instances will be 
> very few.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (OOZIE-2216) Aperiodic Data handling in oozie

2015-05-07 Thread Srikanth Sundarrajan (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14532558#comment-14532558
 ] 

Srikanth Sundarrajan commented on OOZIE-2216:
-

[~jaydeepvishwakarma], This would be a nice addition to Oozie. Looking at the 
design, it looks like the major shift you plan to bring about is to move from 
eager materialization followed by an input check to lazy materialization upon 
input availability, for coordinators that are marked as gating on aperiodic 
datasets. Seems simple enough. Perhaps you can share your thinking on these:

1. How frequent will the polling be when materialization is lazy (to gauge the 
effect this would have on the NN)?
2. What is the behavior when both periodic and aperiodic datasets are required 
for a coordinator? Is that supported?
3. How will this co-exist with the features outlined in OOZIE-1976?
4. You seem to imply that there would be no schema changes. Would you need any 
additional state maintained for this? If so, where is that planned to be 
maintained?
5. Do you expect the DB to be loaded more than it is today?

Thanks for taking this up.

> Aperiodic Data handling in oozie
> 
>
> Key: OOZIE-2216
> URL: https://issues.apache.org/jira/browse/OOZIE-2216
> Project: Oozie
>  Issue Type: New Feature
>  Components: coordinator
>Reporter: Jaydeep Vishwakarma
>Assignee: Jaydeep Vishwakarma
> Attachments: Oozie_aperiodic_data_handling.pdf
>
>
> Currently Oozie scheduling works on periodic datasets. It does not have any 
> mechanism to handle aperiodic datasets, which don’t follow a fixed 
> schedule/frequency. 
> Use cases
> When an incoming dataset arrives with no fixed schedule.
> Need to trigger the job based on all data available since the last run, with a 
> possible cap on the max size to process in one run.
> Try to avoid creating so many instances when you know input instances will be 
> very few.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (OOZIE-1536) Coordinator action reruns start a new workflow

2014-09-30 Thread Srikanth Sundarrajan (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14154344#comment-14154344
 ] 

Srikanth Sundarrajan commented on OOZIE-1536:
-

+1, Looks good to me.

> Coordinator action reruns start a new workflow
> --
>
> Key: OOZIE-1536
> URL: https://issues.apache.org/jira/browse/OOZIE-1536
> Project: Oozie
>  Issue Type: Improvement
>Reporter: Srikanth Sundarrajan
>Assignee: Jaydeep Vishwakarma
>
> Coordinator action reruns start a new workflow and if existing workflow for 
> the action is in running state, the same is not checked. Coord rerun can 
> possibly do a workflow re-run to prevent this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (OOZIE-1536) Coordinator action reruns start a new workflow

2014-07-28 Thread Srikanth Sundarrajan (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14077283#comment-14077283
 ] 

Srikanth Sundarrajan commented on OOZIE-1536:
-

[~puru], I guess [~shwethags] is talking about the following scenario:

An initial run of the coord action creates a workflow, and a subsequent run 
creates another workflow. It is understood that coord re-run will make sure 
that the older workflow is not running before the second one starts. However, 
it appears that Workflow #1 can still be run independently through a direct 
workflow re-run (not going via coord re-run), in which case you might see both 
workflows run, and the behavior is undefined.

If there were a one-to-one correspondence between a coord action and a 
workflow, this problem might not occur. Does that make sense?

> Coordinator action reruns start a new workflow
> --
>
> Key: OOZIE-1536
> URL: https://issues.apache.org/jira/browse/OOZIE-1536
> Project: Oozie
>  Issue Type: Improvement
>Reporter: Srikanth Sundarrajan
>
> Coordinator action reruns start a new workflow and if existing workflow for 
> the action is in running state, the same is not checked. Coord rerun can 
> possibly do a workflow re-run to prevent this.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (OOZIE-1533) Coordinator action materialization is too slow due to coarse job level locks

2014-03-11 Thread Srikanth Sundarrajan (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13931400#comment-13931400
 ] 

Srikanth Sundarrajan commented on OOZIE-1533:
-

[~rohini], Unless all coord actions are done, the status transit service 
shouldn't be updating the coord job, correct? Perhaps we should allow updates 
to the coord only via three routes (1. user action, 2. when all coord actions 
are in completed state, 3. materialization) to prevent StatusTransitService 
from playing god.
{quote}
One problem that needs to be addressed before this was that there are lot of 
places in code where coord job is updated
{quote}

Regarding the CoordActionInputCheckXCommand, you bring up a really important 
concern, but throttling it through a coord lock generally brings down the 
throughput, and it might be useful to keep it free of this lock. We should 
look at options to perform bulk checks for input to improve the scalability of 
this operation without hurting the NN / DB.

In practice I found that most commands resort to checking the coord status in 
verifyPrecondition(), so the odds of a coord action running while the coord 
is in killed state due to a user interrupt are negligible; however, the 
possibility does exist.
{quote}
Another thing is interrupt commands like coord kill, etc will not be processed 
earlier if the lock is changed to the action id.
{quote}

> Coordinator action materialization is too slow due to coarse job level locks
> 
>
> Key: OOZIE-1533
> URL: https://issues.apache.org/jira/browse/OOZIE-1533
> Project: Oozie
>  Issue Type: Improvement
>Reporter: Srikanth Sundarrajan
>Assignee: Srikanth Sundarrajan
>  Labels: locking
> Attachments: OOZIE-1533.patch
>
>
> Coord job level lock introduces high contention. Instead introduce coord 
> action level locking whenever appropriate



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (OOZIE-1533) Coordinator action materialization is too slow due to coarse job level locks

2014-03-06 Thread Srikanth Sundarrajan (JIRA)

 [ 
https://issues.apache.org/jira/browse/OOZIE-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Srikanth Sundarrajan updated OOZIE-1533:


Attachment: OOZIE-1533.patch

> Coordinator action materialization is too slow due to coarse job level locks
> 
>
> Key: OOZIE-1533
> URL: https://issues.apache.org/jira/browse/OOZIE-1533
> Project: Oozie
>  Issue Type: Improvement
>Reporter: Srikanth Sundarrajan
>Assignee: Srikanth Sundarrajan
> Attachments: OOZIE-1533.patch
>
>
> Coord job level lock introduces high contention. Instead introduce coord 
> action level locking whenever appropriate



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (OOZIE-1531) Add a blocking / synchronous option to oozie client

2014-03-05 Thread Srikanth Sundarrajan (JIRA)

 [ 
https://issues.apache.org/jira/browse/OOZIE-1531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Srikanth Sundarrajan reassigned OOZIE-1531:
---

Assignee: Srikanth Sundarrajan  (was: Bowen Zhang)

> Add a blocking / synchronous option to oozie client  
> -
>
> Key: OOZIE-1531
> URL: https://issues.apache.org/jira/browse/OOZIE-1531
> Project: Oozie
>  Issue Type: New Feature
>Reporter: Srikanth Sundarrajan
>Assignee: Srikanth Sundarrajan
>
> Currently Oozie returns immediately after sending the request; there is no 
> guarantee that the request is correct or that it has been carried out.
> ASK: a client Java API that blocks until the submitted job is running, it has 
> been killed, etc.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (OOZIE-1699) Some of the commands submitted to Oozie internal queue are never executed

2014-02-21 Thread Srikanth Sundarrajan (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-1699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13908136#comment-13908136
 ] 

Srikanth Sundarrajan commented on OOZIE-1699:
-

The test in focus is  
org.apache.oozie.event.TestEventGeneration.testCoordinatorActionEvent



> Some of the commands submitted to Oozie internal queue are never executed
> -
>
> Key: OOZIE-1699
> URL: https://issues.apache.org/jira/browse/OOZIE-1699
> Project: Oozie
>  Issue Type: Bug
>Reporter: Srikanth Sundarrajan
>Assignee: Srikanth Sundarrajan
> Attachments: OOZIE-1699-v1-no-prefix.patch, OOZIE-1699.patch
>
>
> At scale, we are seeing that some commands submitted to the command 
> queue in CallableQueueService aren't getting executed at all.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (OOZIE-1699) Some of the commands submitted to Oozie internal queue are never executed

2014-02-20 Thread Srikanth Sundarrajan (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-1699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907999#comment-13907999
 ] 

Srikanth Sundarrajan commented on OOZIE-1699:
-

Verified the failed test multiple times and there doesn't seem to be any 
regression. I am guessing that this test is generally flaky, based on what I 
saw with other test-patches

https://builds.apache.org/job/oozie-trunk-precommit-build/1066/testReport/ 
(OOZIE-1698)
https://builds.apache.org/job/oozie-trunk-precommit-build/1044/testReport/ 
(OOZIE-1681) 
...

> Some of the commands submitted to Oozie internal queue are never executed
> -
>
> Key: OOZIE-1699
> URL: https://issues.apache.org/jira/browse/OOZIE-1699
> Project: Oozie
>  Issue Type: Bug
>Reporter: Srikanth Sundarrajan
>Assignee: Srikanth Sundarrajan
> Attachments: OOZIE-1699-v1-no-prefix.patch, OOZIE-1699.patch
>
>
> At scale, we are seeing that some commands submitted to the command 
> queue in CallableQueueService aren't getting executed at all.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Assigned] (OOZIE-1533) Coordinator action materialization is too slow due to coarse job level locks

2014-02-20 Thread Srikanth Sundarrajan (JIRA)

 [ 
https://issues.apache.org/jira/browse/OOZIE-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Srikanth Sundarrajan reassigned OOZIE-1533:
---

Assignee: Srikanth Sundarrajan

> Coordinator action materialization is too slow due to coarse job level locks
> 
>
> Key: OOZIE-1533
> URL: https://issues.apache.org/jira/browse/OOZIE-1533
> Project: Oozie
>  Issue Type: Improvement
>Reporter: Srikanth Sundarrajan
>Assignee: Srikanth Sundarrajan
>
> Coord job level lock introduces high contention. Instead introduce coord 
> action level locking whenever appropriate



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (OOZIE-1533) Coordinator action materialization is too slow due to coarse job level locks

2014-02-20 Thread Srikanth Sundarrajan (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13906783#comment-13906783
 ] 

Srikanth Sundarrajan commented on OOZIE-1533:
-

Currently locks are being held for various coord-action-commands as follows

||Command||Lock (entity-key)||
|CoordActionCheckXCommand|coord-action-id|
|CoordActionInfoXCommand|no-locks|
|CoordActionInputCheckXCommand|coord-job-id|
|CoordActionMaterializeCommand|RANDOM("coord_action_mater" + UUID())|
|CoordActionNotificationXCommand|RANDOM("coord_action_notification" + UUID())|
|CoordActionReadyXCommand|coord-job-id|
|CoordActionsKillXCommand|coord-job-id|
|CoordActionStartXCommand|coord-job-id|
|CoordActionTimeOutXCommand|coord-action-id|
|CoordActionUpdatePushMissingDependency|coord-action-id|
|CoordActionUpdateXCommand|coord-job-id|

I intend to put up a patch changing locks for the following commands.

||Command||Lock (entity-key)||
|CoordActionInputCheckXCommand|coord-action-id|
|CoordActionReadyXCommand|coord-action-id|
|CoordActionStartXCommand|coord-action-id|
|CoordActionUpdateXCommand|coord-action-id|

It seems these commands were using the coord-job-id level locks to prevent 
starting the action when the parent coord is in killed or paused state. But 
from a correctness standpoint, running these commands while the coord is in 
killed / paused state has no impact, except perhaps in 
CoordActionStartXCommand. Meanwhile, holding the lock at the coord-job-id level 
isn't all that helpful, as it unnecessarily forces serial execution of 
independent coord-action commands that each work only on their own action. 
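
To make the contention argument concrete, here is a minimal Java sketch. It is 
not Oozie code: EntityLocks and LockGranularityDemo are hypothetical names, the 
ids are made up, and the only thing assumed from Oozie is that a coord action 
id has the form <coord-job-id>@<n>. It counts how many commands could hold 
their lock at the same moment under each keying scheme.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical per-entity lock table: a key is either held or free. */
class EntityLocks {
    private final Set<String> held = ConcurrentHashMap.newKeySet();

    /** Non-blocking acquire; returns false if another command holds the key. */
    boolean tryLock(String entityKey) {
        return held.add(entityKey);
    }

    void unlock(String entityKey) {
        held.remove(entityKey);
    }
}

class LockGranularityDemo {
    /** Derive the lock key for a command that works on one coord action. */
    static String lockKey(String coordActionId, boolean lockOnJobId) {
        // Assumes action ids look like "<coord-job-id>@<n>".
        return lockOnJobId
                ? coordActionId.substring(0, coordActionId.indexOf('@'))
                : coordActionId;
    }

    /** Count the commands that can hold their lock simultaneously. */
    static int runnableAtOnce(List<String> coordActionIds, boolean lockOnJobId) {
        EntityLocks locks = new EntityLocks();
        int started = 0;
        for (String actionId : coordActionIds) {
            if (locks.tryLock(lockKey(actionId, lockOnJobId))) {
                started++;
            }
        }
        return started;
    }
}
```

For three actions of the same coord job, job-level keys let only one command 
proceed at a time, while action-level keys let all three proceed.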

Are there any concerns ?

> Coordinator action materialization is too slow due to coarse job level locks
> 
>
> Key: OOZIE-1533
> URL: https://issues.apache.org/jira/browse/OOZIE-1533
> Project: Oozie
>  Issue Type: Improvement
>Reporter: Srikanth Sundarrajan
>
> Coord job level lock introduces high contention. Instead introduce coord 
> action level locking whenever appropriate



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (OOZIE-1699) Some of the commands submitted to Oozie internal queue are never executed

2014-02-20 Thread Srikanth Sundarrajan (JIRA)

 [ 
https://issues.apache.org/jira/browse/OOZIE-1699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Srikanth Sundarrajan updated OOZIE-1699:


Attachment: OOZIE-1699-v1-no-prefix.patch

Attaching patch with --no-prefix as suggested by [~shwethags]. Thanks

> Some of the commands submitted to Oozie internal queue are never executed
> -
>
> Key: OOZIE-1699
> URL: https://issues.apache.org/jira/browse/OOZIE-1699
> Project: Oozie
>  Issue Type: Bug
>Reporter: Srikanth Sundarrajan
>Assignee: Srikanth Sundarrajan
> Attachments: OOZIE-1699-v1-no-prefix.patch, OOZIE-1699.patch
>
>
> At scale, we are seeing that some commands submitted to the command 
> queue in CallableQueueService aren't getting executed at all.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (OOZIE-1699) Some of the commands submitted to Oozie internal queue are never executed

2014-02-20 Thread Srikanth Sundarrajan (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-1699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13906770#comment-13906770
 ] 

Srikanth Sundarrajan commented on OOZIE-1699:
-

Patch does seem to apply alright. Am I missing something ?

{code}
sriksun:oozie-trunk sriksun$ git pull -v --all
Fetching origin
From https://git-wip-us.apache.org/repos/asf/oozie
 = [up to date]  master -> origin/master
 = [up to date]  ap-pages   -> origin/ap-pages
 = [up to date]  branch-3.1 -> origin/branch-3.1
 = [up to date]  branch-3.1.4 -> origin/branch-3.1.4
 = [up to date]  branch-3.2 -> origin/branch-3.2
 = [up to date]  branch-3.3 -> origin/branch-3.3
 = [up to date]  branch-4.0 -> origin/branch-4.0
 = [up to date]  hcat-intre -> origin/hcat-intre
Already up-to-date.

sriksun:oozie-trunk sriksun$ curl "https://issues.apache.org/jira/secure/attachment/12630002/OOZIE-1699.patch" | git apply -v --check
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 10948  100 10948    0     0   4635      0  0:00:02  0:00:02 --:--:--  4637
Checking patch core/src/main/java/org/apache/oozie/service/CallableQueueService.java...
Checking patch core/src/main/java/org/apache/oozie/util/PollablePriorityDelayQueue.java...
Checking patch core/src/main/java/org/apache/oozie/util/PriorityDelayQueue.java...
Checking patch core/src/test/java/org/apache/oozie/service/TestCallableQueueService.java...
{code}

> Some of the commands submitted to Oozie internal queue are never executed
> -
>
> Key: OOZIE-1699
> URL: https://issues.apache.org/jira/browse/OOZIE-1699
> Project: Oozie
>  Issue Type: Bug
>Reporter: Srikanth Sundarrajan
>Assignee: Srikanth Sundarrajan
> Attachments: OOZIE-1699.patch
>
>
> At scale, we are seeing that some commands submitted to the command 
> queue in CallableQueueService aren't getting executed at all.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (OOZIE-1531) Add a blocking / synchronous option to oozie client

2014-02-19 Thread Srikanth Sundarrajan (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-1531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13906651#comment-13906651
 ] 

Srikanth Sundarrajan commented on OOZIE-1531:
-

Hi [~bowenzhangusa], Please let me know if you are working on this; otherwise I 
can provide a fix for this issue.

> Add a blocking / synchronous option to oozie client  
> -
>
> Key: OOZIE-1531
> URL: https://issues.apache.org/jira/browse/OOZIE-1531
> Project: Oozie
>  Issue Type: New Feature
>Reporter: Srikanth Sundarrajan
>Assignee: Bowen Zhang
>
> Currently Oozie returns immediately after sending the request; there is no 
> guarantee that the request is correct or that it has been carried out.
> ASK: a client Java API that blocks until the submitted job is running, it has 
> been killed, etc.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (OOZIE-1699) Some of the commands submitted to Oozie internal queue are never executed

2014-02-19 Thread Srikanth Sundarrajan (JIRA)

 [ 
https://issues.apache.org/jira/browse/OOZIE-1699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Srikanth Sundarrajan updated OOZIE-1699:


Attachment: OOZIE-1699.patch

> Some of the commands submitted to Oozie internal queue are never executed
> -
>
> Key: OOZIE-1699
> URL: https://issues.apache.org/jira/browse/OOZIE-1699
> Project: Oozie
>  Issue Type: Bug
>Reporter: Srikanth Sundarrajan
>Assignee: Srikanth Sundarrajan
> Attachments: OOZIE-1699.patch
>
>
> At scale, we are seeing that some commands submitted to the command 
> queue in CallableQueueService aren't getting executed at all.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (OOZIE-1699) Some of the commands submitted to Oozie internal queue are never executed

2014-02-17 Thread Srikanth Sundarrajan (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-1699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13903809#comment-13903809
 ] 

Srikanth Sundarrajan commented on OOZIE-1699:
-

Debugging this further, I was able to identify that an exception is thrown in 
CallableWrapper::run() before removeFromUniqueCallables() is invoked, leaving 
the command behind in the uniqueCallables list. This prevents the item from 
being added to the queue again, and since the earlier run() failed, the 
command never gets executed until a server restart.
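
The failure mode can be reproduced in miniature. The sketch below is a 
simplified stand-in and not the actual CallableQueueService code: 
UniqueQueueSketch and its method names are hypothetical, and a plain set of 
keys plays the role of the uniqueCallables bookkeeping. It contrasts cleanup 
that an exception can skip with cleanup placed in a finally block.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical queue bookkeeping that admits each command key only once. */
class UniqueQueueSketch {
    private final Set<String> uniqueKeys = ConcurrentHashMap.newKeySet();

    /** Returns false when a command with the same key is already registered. */
    boolean offer(String key) {
        return uniqueKeys.add(key);
    }

    /** Fragile shape: an exception thrown by the command skips the cleanup. */
    void runWithoutFinally(String key, Runnable command) {
        command.run();
        uniqueKeys.remove(key);
    }

    /** Safer shape: the key is released whether or not the command fails. */
    void runWithFinally(String key, Runnable command) {
        try {
            command.run();
        } finally {
            uniqueKeys.remove(key);
        }
    }
}
```

With the first variant, a command that throws leaves its key registered, so 
every later offer() for that key is rejected. With the second, the same 
command can be queued again after the failure.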

> Some of the commands submitted to Oozie internal queue are never executed
> -
>
> Key: OOZIE-1699
> URL: https://issues.apache.org/jira/browse/OOZIE-1699
> Project: Oozie
>  Issue Type: Bug
>Reporter: Srikanth Sundarrajan
>Assignee: Srikanth Sundarrajan
>
> At scale, we are seeing that some commands submitted to the command 
> queue in CallableQueueService aren't getting executed at all.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (OOZIE-1699) Some of the commands submitted to Oozie internal queue are never executed

2014-02-17 Thread Srikanth Sundarrajan (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-1699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13903808#comment-13903808
 ] 

Srikanth Sundarrajan commented on OOZIE-1699:
-

I do find many uncaught exceptions from Oozie captured in the catalina.out file.

{code}
>>> Exception in thread "pool-2-thread-22" java.lang.OutOfMemoryError: GC overhead limit exceeded
>>> Exception in thread "pool-2-thread-19" java.lang.IllegalMonitorStateException
   ...
   at org.apache.oozie.util.PollablePriorityDelayQueue.poll(PollablePriorityDelayQueue.java:80)
   ...
>>> Exception in thread "pool-2-thread-24" java.lang.IllegalStateException: queueElement already in a queue
   at org.apache.oozie.util.PriorityDelayQueue.offer(PriorityDelayQueue.java:347)
{code}

Looks like these threads have died and the ThreadPoolExecutor has created new 
threads to replace them.
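
For reference, this replacement behaviour is standard for 
java.util.concurrent.ThreadPoolExecutor when a task passed to execute() throws. 
The sketch below is a standalone illustration, not Oozie code: 
WorkerReplacementDemo is a hypothetical name and the exception message is 
borrowed from the log above only for flavour. It counts how many threads a 
single-thread pool has to create to run one failing and one healthy task.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class WorkerReplacementDemo {
    /** Runs a failing task, then a healthy one; returns the number of threads created. */
    static int threadsCreated() {
        AtomicInteger created = new AtomicInteger();
        ThreadFactory countingFactory = runnable -> {
            created.incrementAndGet();
            Thread thread = new Thread(runnable);
            // Keep the demo quiet; by default the stack trace goes to stderr.
            thread.setUncaughtExceptionHandler((t, error) -> { });
            return thread;
        };
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<Runnable>(), countingFactory);
        CountDownLatch healthyTaskRan = new CountDownLatch(1);
        try {
            // With execute(), the exception escapes the task and ends the worker thread.
            pool.execute(() -> {
                throw new IllegalStateException("queueElement already in a queue");
            });
            // The pool starts a replacement worker, which runs this task.
            pool.execute(healthyTaskRan::countDown);
            healthyTaskRan.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            pool.shutdown();
        }
        return created.get();
    }
}
```

The pool keeps serving work, which is why the dead threads are easy to miss. 
Whatever the failed task was in the middle of, such as bookkeeping it had not 
yet undone, is simply abandoned.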

> Some of the commands submitted to Oozie internal queue are never executed
> -
>
> Key: OOZIE-1699
> URL: https://issues.apache.org/jira/browse/OOZIE-1699
> Project: Oozie
>  Issue Type: Bug
>Reporter: Srikanth Sundarrajan
>Assignee: Srikanth Sundarrajan
>
> At scale, we are seeing that some commands submitted to the command 
> queue in CallableQueueService aren't getting executed at all.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (OOZIE-1699) Some of the commands submitted to Oozie internal queue are never executed

2014-02-17 Thread Srikanth Sundarrajan (JIRA)
Srikanth Sundarrajan created OOZIE-1699:
---

 Summary: Some of the commands submitted to Oozie internal queue 
are never executed
 Key: OOZIE-1699
 URL: https://issues.apache.org/jira/browse/OOZIE-1699
 Project: Oozie
  Issue Type: Bug
Reporter: Srikanth Sundarrajan
Assignee: Srikanth Sundarrajan


At scale, we are seeing that some commands submitted to the command queue 
in CallableQueueService aren't getting executed at all.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (OOZIE-1532) Purging should remove completed children job for long running coordinator jobs

2014-02-11 Thread Srikanth Sundarrajan (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-1532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13898820#comment-13898820
 ] 

Srikanth Sundarrajan commented on OOZIE-1532:
-

A default of 60 days for purging older wf_actions, workflows and coord_actions 
would be ideal.
{quote}
can you specify which config you want to add to the oozie-site.xml?
{quote}

> Purging should remove completed children job for long running coordinator jobs
> --
>
> Key: OOZIE-1532
> URL: https://issues.apache.org/jira/browse/OOZIE-1532
> Project: Oozie
>  Issue Type: New Feature
>Reporter: Srikanth Sundarrajan
>Assignee: Bowen Zhang
> Attachments: oozie-1532.patch
>
>
> Specifically, this is for long running coordinator jobs with high frequency. 
> all child workflows are never purged as the coord job is still running.
> Oozie server configuration that indicates how many coordinator actions 
> frequency ticks to keep. By doing this it would be possible to purge running 
> coord jobs. By default this would not be enabled and the current logic would 
> remain.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (OOZIE-1532) Purging should remove completed children job for long running coordinator jobs

2014-01-22 Thread Srikanth Sundarrajan (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-1532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13878496#comment-13878496
 ] 

Srikanth Sundarrajan commented on OOZIE-1532:
-

Yes, that is correct. Thanks for picking this up. Oftentimes the Oozie DB gets 
bloated, causing performance issues, so this could be very useful.

> Purging should remove completed children job for long running coordinator jobs
> --
>
> Key: OOZIE-1532
> URL: https://issues.apache.org/jira/browse/OOZIE-1532
> Project: Oozie
>  Issue Type: New Feature
>Reporter: Srikanth Sundarrajan
>Assignee: Bowen Zhang
>
> Specifically, this is for long running coordinator jobs with high frequency. 
> all child workflows are never purged as the coord job is still running.
> Oozie server configuration that indicates how many coordinator actions 
> frequency ticks to keep. By doing this it would be possible to purge running 
> coord jobs. By default this would not be enabled and the current logic would 
> remain.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (OOZIE-1532) Purging should remove completed children job for long running coordinator jobs

2014-01-21 Thread Srikanth Sundarrajan (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-1532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13877619#comment-13877619
 ] 

Srikanth Sundarrajan commented on OOZIE-1532:
-

[~bowenzhangusa], Did you mean a long running workflow or a long running 
coordinator job?

> Purging should remove completed children job for long running coordinator jobs
> --
>
> Key: OOZIE-1532
> URL: https://issues.apache.org/jira/browse/OOZIE-1532
> Project: Oozie
>  Issue Type: New Feature
>Reporter: Srikanth Sundarrajan
>Assignee: Bowen Zhang
>
> Specifically, this is for long running coordinator jobs with high frequency. 
> all child workflows are never purged as the coord job is still running.
> Oozie server configuration that indicates how many coordinator actions 
> frequency ticks to keep. By doing this it would be possible to purge running 
> coord jobs. By default this would not be enabled and the current logic would 
> remain.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (OOZIE-1532) Purging should remove completed children job for long running coordinator jobs

2014-01-21 Thread Srikanth Sundarrajan (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-1532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13877621#comment-13877621
 ] 

Srikanth Sundarrajan commented on OOZIE-1532:
-

Purge shouldn't be touching any running workflow; the feature request is to 
purge old coord actions of a long running coord job.

> Purging should remove completed children job for long running coordinator jobs
> --
>
> Key: OOZIE-1532
> URL: https://issues.apache.org/jira/browse/OOZIE-1532
> Project: Oozie
>  Issue Type: New Feature
>Reporter: Srikanth Sundarrajan
>Assignee: Bowen Zhang
>
> Specifically, this is for long running coordinator jobs with high frequency. 
> all child workflows are never purged as the coord job is still running.
> Oozie server configuration that indicates how many coordinator actions 
> frequency ticks to keep. By doing this it would be possible to purge running 
> coord jobs. By default this would not be enabled and the current logic would 
> remain.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (OOZIE-1531) Add a blocking / synchronous option to oozie client

2013-12-14 Thread Srikanth Sundarrajan (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-1531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13848294#comment-13848294
 ] 

Srikanth Sundarrajan commented on OOZIE-1531:
-

[~bowenzhangusa], The feature ask is generic for all Oozie operations and not 
restricted to workflow/coord or bundle creation (returning the id is adequate 
in the case of creation). Ideally I would like the following behaviour for 
synchronous APIs:

Bundle creation: Return success only when the bundle and coords are valid and 
created
Coord creation: Return success only when the coord definition is valid and 
created (it is not necessary for an action to materialize, let alone report 
its status)
Workflow creation: Return success only when the workflow definition is valid 
and inited
Suspend (for all object types): Return success only when the requested element 
is suspended successfully (which should include recursively suspending all the 
child objects)
Resume (for all object types): Return success only when the requested element 
and all child objects are resumed
Kill (for all object types): Return success only when the requested element 
and all child objects are killed


> Add a blocking / synchronous option to oozie client  
> -
>
> Key: OOZIE-1531
> URL: https://issues.apache.org/jira/browse/OOZIE-1531
> Project: Oozie
>  Issue Type: New Feature
>Reporter: Srikanth Sundarrajan
>Assignee: Bowen Zhang
>
> Currently Oozie returns immediately after sending the request; there is no 
> guarantee that the request is correct or that it has been carried out.
> ASK: a client Java API that blocks until the submitted job is running, it has 
> been killed, etc.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (OOZIE-1531) Add a blocking / synchronous option to oozie client

2013-12-10 Thread Srikanth Sundarrajan (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-1531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13844976#comment-13844976
 ] 

Srikanth Sundarrajan commented on OOZIE-1531:
-

[~bowenzhangusa], What essentially I was looking for was support in the oozie 
server to actually perform them synchronously as opposed to it getting dropped 
into a queue for further handling later. If that is difficult, the OozieClient 
should block till the action is successful or failed. In the event the status 
doesn't change, would like an affirmative response on whether the action was 
successful or not, which should be consistent with what actually happens in the 
system. In other words, OozieClient can't respond saying the action failed, 
while the server subsequently performs this action successfully or vice-versa.

Thanks for picking this up.

> Add a blocking / synchronous option to oozie client  
> -
>
> Key: OOZIE-1531
> URL: https://issues.apache.org/jira/browse/OOZIE-1531
> Project: Oozie
>  Issue Type: New Feature
>Reporter: Srikanth Sundarrajan
>Assignee: Bowen Zhang
>
> Currently Oozie returns immediately after sending the request; there is no 
> guarantee that the request is correct or that it has been carried out.
> ASK: a client Java API that blocks until the submitted job is running, has 
> been killed, etc.





[jira] [Commented] (OOZIE-1533) Coordinator action materialization is too slow due to coarse job level locks

2013-10-17 Thread Srikanth Sundarrajan (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13798738#comment-13798738
 ] 

Srikanth Sundarrajan commented on OOZIE-1533:
-

Hi [~chitnis], Coord job level locks for materialization are perfectly fine; 
however, action updates are also blocked, as they too are serialized through 
the coord job level lock. In practice, my observation is that when individual 
actions want to update their status as they progress from one state to another, 
they are required to acquire the coord job level lock. If action updates were 
instead to block only on the coord action itself, this would greatly improve 
the backlog catch-up scenarios without being unfair to any other coordinator or 
compromising the correctness of the system.
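
A minimal sketch of the difference in lock granularity, assuming one lock per 
key where the key is either a coord job id (coarse) or a coord action id 
(fine). ActionLocks and the ids used with it are hypothetical; this is not 
Oozie's locking code.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

/** Hypothetical keyed-lock registry (not an Oozie class). */
final class ActionLocks {
    private final ConcurrentHashMap<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    /** Runs body while holding the lock for lockKey; other keys are not blocked. */
    <T> T withLock(String lockKey, Supplier<T> body) {
        ReentrantLock lock = locks.computeIfAbsent(lockKey, k -> new ReentrantLock());
        lock.lock();
        try {
            return body.get();
        } finally {
            lock.unlock();
        }
    }

    /** True if some thread currently holds the lock for lockKey. */
    boolean isLocked(String lockKey) {
        ReentrantLock lock = locks.get(lockKey);
        return lock != null && lock.isLocked();
    }
}
```

Keyed by action id (for example "job-C@1"), a status update for one action does 
not wait on an update for "job-C@2"; keyed by job id, both would serialize.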

> Coordinator action materialization is too slow due to coarse job level locks
> 
>
> Key: OOZIE-1533
> URL: https://issues.apache.org/jira/browse/OOZIE-1533
> Project: Oozie
>  Issue Type: Improvement
>Reporter: Srikanth Sundarrajan
>
> Coord job level lock introduces high contention. Instead introduce coord 
> action level locking whenever appropriate



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (OOZIE-1531) Add a blocking / synchronous option to oozie client

2013-09-16 Thread Srikanth Sundarrajan (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-1531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13768463#comment-13768463
 ] 

Srikanth Sundarrajan commented on OOZIE-1531:
-

Notes from discussion with [~tucu00] offline: 
This could be done in the OozieClient Java API by using the current fire&forget 
methods followed by a wait-until logic with a timeout. Similar to what Hadoop 
JobClient does.
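
A minimal sketch of the wait-until logic with a timeout. WaitUntil.poll is a 
hypothetical helper, not part of the OozieClient API; the status supplier 
stands in for whatever call the client would use to fetch the job status after 
the fire&forget request.

```java
import java.util.function.Predicate;
import java.util.function.Supplier;

/** Hypothetical polling helper (not part of OozieClient). */
final class WaitUntil {
    private WaitUntil() {}

    /**
     * Polls statusSupplier until done accepts the status or timeoutMillis elapses.
     * Returns the last observed status; the caller decides whether it is acceptable.
     */
    static <S> S poll(Supplier<S> statusSupplier, Predicate<S> done,
                      long timeoutMillis, long intervalMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        S status = statusSupplier.get();
        while (!done.test(status) && deadline - System.nanoTime() > 0) {
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                // Preserve the interrupt for the caller and stop waiting.
                Thread.currentThread().interrupt();
                break;
            }
            status = statusSupplier.get();
        }
        return status;
    }
}
```

The caller still has to compare the returned status with the expected one, 
since a timeout returns whatever was observed last.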

> Add a blocking / synchronous option to oozie client  
> -
>
> Key: OOZIE-1531
> URL: https://issues.apache.org/jira/browse/OOZIE-1531
> Project: Oozie
>  Issue Type: New Feature
>Reporter: Srikanth Sundarrajan
>
> Currently Oozie returns immediately after sending the request; there is no 
> guarantee that the request is correct or that it has been carried out.
> ASK: a client Java API that blocks until the submitted job is running, has 
> been killed, etc.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (OOZIE-1535) Update job properties for WF/COORD

2013-09-16 Thread Srikanth Sundarrajan (JIRA)
Srikanth Sundarrajan created OOZIE-1535:
---

 Summary: Update job properties for WF/COORD
 Key: OOZIE-1535
 URL: https://issues.apache.org/jira/browse/OOZIE-1535
 Project: Oozie
  Issue Type: Improvement
Reporter: Srikanth Sundarrajan


It should be possible to update job submission properties for a running job, 
both for WF and COORD jobs. The updated properties would be used for all 
subsequent actions (not yet started).



[jira] [Commented] (OOZIE-1534) Launcher job might run do hadoop attempt relaunch - possibly causing correctness issues

2013-09-16 Thread Srikanth Sundarrajan (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13768465#comment-13768465
 ] 

Srikanth Sundarrajan commented on OOZIE-1534:
-

Notes from my discussion with [~tucu00] offline:

This could be done if the log scavenger logic which we use to harvest MR jobs 
started by pig/hive/sqoop runs in real time (as opposed to after pig/hive/sqoop 
finishes) and the captured job IDs are written/fsynced to a file in HDFS in the 
action subdir. Then the action main class would look for this file containing 
job IDs at start time and, if it exists, kill all those jobs before 
proceeding. This would make the launcher job idempotent.
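
A minimal sketch of that record-then-kill-on-restart flow. It uses a local file 
through java.nio as a stand-in for the HDFS file in the action subdir, and the 
kill step is a caller-supplied callback; ChildJobLedger and its method names 
are hypothetical and are not Oozie or Hadoop APIs.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical ledger of child job ids (local-file stand-in for an HDFS file). */
final class ChildJobLedger {
    private final Path file;

    ChildJobLedger(Path file) {
        this.file = file;
    }

    /** Appends a job id and forces it to disk as soon as the child job is detected. */
    void record(String jobId) {
        try {
            Path dir = file.getParent();
            if (dir != null) {
                Files.createDirectories(dir);
            }
            try (FileChannel ch = FileChannel.open(file, StandardOpenOption.CREATE,
                    StandardOpenOption.WRITE, StandardOpenOption.APPEND)) {
                ByteBuffer buf = ByteBuffer.wrap((jobId + "\n").getBytes(StandardCharsets.UTF_8));
                while (buf.hasRemaining()) {
                    ch.write(buf);
                }
                ch.force(true); // fsync equivalent for the local stand-in
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** At launcher start: kills every recorded job, then removes the ledger file. */
    List<String> killLeftovers(Consumer<String> killer) {
        List<String> killed = new ArrayList<>();
        try {
            if (!Files.exists(file)) {
                return killed; // first attempt: nothing to clean up
            }
            for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
                String id = line.trim();
                if (!id.isEmpty()) {
                    killer.accept(id);
                    killed.add(id);
                }
            }
            Files.delete(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return killed;
    }
}
```

Calling killLeftovers at the start of every attempt is what makes a relaunched 
attempt safe: the first attempt finds no file, a relaunch kills whatever the 
previous attempt recorded.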


> Launcher job might run do hadoop attempt relaunch - possibly causing 
> correctness issues
> ---
>
> Key: OOZIE-1534
> URL: https://issues.apache.org/jira/browse/OOZIE-1534
> Project: Oozie
>  Issue Type: Improvement
>Reporter: Srikanth Sundarrajan
>
> The  section of the action allows cleaning up the output dir. This is 
> not sufficient, as MR jobs started by Pig/Hive may still be running. We should 
> look to kill child MR jobs, if any, before launching new ones.



[jira] [Commented] (OOZIE-1537) Suspending a sub-workflow is not reflected in the parent workflow

2013-09-16 Thread Srikanth Sundarrajan (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13768468#comment-13768468
 ] 

Srikanth Sundarrajan commented on OOZIE-1537:
-

Notes from my discussion with [~tucu00] offline:

We could introduce the SUSPEND status for WF Actions and the ActionExecutor 
would have a method indicating if it is supported or not (except for sub-WF no 
other action would support that).

When a sub-WF is suspended, the parent WF action should be set to suspended and 
the parent WF job should be suspended. Resume should work in a similar way. If 
the parent is suspended, the actions should be suspended if they support it. 
This should work up/down to/from coord jobs as well.

We need to figure out how to zigzag when a sub-wf within a fork of 2 or more 
sub-wf is suspended/resumed.
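
A minimal sketch of the upward half of this propagation only (sub-WF to parent 
WF action to parent WF job, and on to the coord). WfNode is a hypothetical 
model, not an Oozie class, and the sketch deliberately leaves out the downward 
direction and the open fork question.

```java
/** Hypothetical node in a coord / workflow / action hierarchy (not an Oozie class). */
final class WfNode {
    final String name;
    final WfNode parent; // null for the top-level job
    String status = "RUNNING";

    WfNode(String name, WfNode parent) {
        this.name = name;
        this.parent = parent;
    }

    /** Marks this node and every ancestor as SUSPENDED. */
    void suspend() {
        for (WfNode n = this; n != null; n = n.parent) {
            n.status = "SUSPENDED";
        }
    }

    /** Marks this node and every ancestor as RUNNING again. */
    void resume() {
        for (WfNode n = this; n != null; n = n.parent) {
            n.status = "RUNNING";
        }
    }
}
```

Siblings of the suspended sub-WF action are untouched here, which is exactly 
the fork case that still needs a decision.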

> Suspending a sub-workflow is not reflected in the parent workflow
> -
>
> Key: OOZIE-1537
> URL: https://issues.apache.org/jira/browse/OOZIE-1537
> Project: Oozie
>  Issue Type: Improvement
>Reporter: Srikanth Sundarrajan
>
> Suspending a sub-workflow is not reflected in the parent workflow, thus you 
> don't know what is going on. The status of the sub-flow should be reflected 
> in the parent workflow just as in any other action.



[jira] [Created] (OOZIE-1538) Coordinator actions concurrency control across coord jobs at user level

2013-09-16 Thread Srikanth Sundarrajan (JIRA)
Srikanth Sundarrajan created OOZIE-1538:
---

 Summary: Coordinator actions concurrency control across coord jobs 
at user level
 Key: OOZIE-1538
 URL: https://issues.apache.org/jira/browse/OOZIE-1538
 Project: Oozie
  Issue Type: Improvement
Reporter: Srikanth Sundarrajan


Currently coord action concurrency is at coord job level. If the user has 
several coord jobs they can still flood the cluster.




[jira] [Created] (OOZIE-1537) Suspending a sub-workflow is not reflected in the parent workflow

2013-09-16 Thread Srikanth Sundarrajan (JIRA)
Srikanth Sundarrajan created OOZIE-1537:
---

 Summary: Suspending a sub-workflow is not reflected in the parent 
workflow
 Key: OOZIE-1537
 URL: https://issues.apache.org/jira/browse/OOZIE-1537
 Project: Oozie
  Issue Type: Improvement
Reporter: Srikanth Sundarrajan


Suspending a sub-workflow is not reflected in the parent workflow, thus you 
don't know what is going on. The status of the sub-flow should be reflected in 
the parent workflow just as in any other action.



[jira] [Created] (OOZIE-1534) Launcher job might run do hadoop attempt relaunch - possibly causing correctness issues

2013-09-16 Thread Srikanth Sundarrajan (JIRA)
Srikanth Sundarrajan created OOZIE-1534:
---

 Summary: Launcher job might run do hadoop attempt relaunch - 
possibly causing correctness issues
 Key: OOZIE-1534
 URL: https://issues.apache.org/jira/browse/OOZIE-1534
 Project: Oozie
  Issue Type: Improvement
Reporter: Srikanth Sundarrajan


The  section of the action allows cleaning up the output dir. This is 
not sufficient, as MR jobs started by Pig/Hive may still be running. We should 
look to kill child MR jobs, if any, before launching new ones.



[jira] [Created] (OOZIE-1536) Coordinator action reruns start a new workflow

2013-09-16 Thread Srikanth Sundarrajan (JIRA)
Srikanth Sundarrajan created OOZIE-1536:
---

 Summary: Coordinator action reruns start a new workflow
 Key: OOZIE-1536
 URL: https://issues.apache.org/jira/browse/OOZIE-1536
 Project: Oozie
  Issue Type: Improvement
Reporter: Srikanth Sundarrajan


Coordinator action reruns start a new workflow without checking whether an 
existing workflow for the action is still in the running state. Coord rerun 
could possibly do a workflow re-run to prevent this.



[jira] [Commented] (OOZIE-1538) Coordinator actions concurrency control across coord jobs at user level

2013-09-16 Thread Srikanth Sundarrajan (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-1538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13768471#comment-13768471
 ] 

Srikanth Sundarrajan commented on OOZIE-1538:
-

Notes from my discussion with [~tucu00] offline:

Use Zookeeper as distributed countdown-locks/latches. Integration of Oozie and 
Zookeeper to do this could be done using a zookeeper URIHandler implementation 
and modeling the countdown-locks/latches as additional datasets in the 
coordinator definition. Thus the materialization of the action will depend on 
the zk:// URI being available.
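
A minimal sketch of the per-user limit itself, using an in-memory 
java.util.concurrent.Semaphore per user as a stand-in for the distributed 
Zookeeper latch. UserConcurrencyGate is hypothetical; it does not use the 
Zookeeper client or the Oozie URIHandler interface.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;

/** Hypothetical per-user gate (in-memory stand-in for a distributed latch). */
final class UserConcurrencyGate {
    private final int maxPerUser;
    private final ConcurrentHashMap<String, Semaphore> permits = new ConcurrentHashMap<>();

    UserConcurrencyGate(int maxPerUser) {
        this.maxPerUser = maxPerUser;
    }

    /** Returns true if the user may start one more action across all coord jobs. */
    boolean tryStart(String user) {
        return permits.computeIfAbsent(user, u -> new Semaphore(maxPerUser)).tryAcquire();
    }

    /** Must be called once per successful tryStart when the action completes. */
    void finished(String user) {
        Semaphore s = permits.get(user);
        if (s != null) {
            s.release();
        }
    }
}
```

In the dataset model described above, an action would stay unmaterialized for 
as long as tryStart returns false for its user.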

> Coordinator actions concurrency control across coord jobs at user level
> ---
>
> Key: OOZIE-1538
> URL: https://issues.apache.org/jira/browse/OOZIE-1538
> Project: Oozie
>  Issue Type: Improvement
>Reporter: Srikanth Sundarrajan
>
> Currently coord action concurrency is at coord job level. If the user has 
> several coord jobs they can still flood the cluster.



[jira] [Created] (OOZIE-1533) Coordinator action materialization is too slow due to coarse job level locks

2013-09-16 Thread Srikanth Sundarrajan (JIRA)
Srikanth Sundarrajan created OOZIE-1533:
---

 Summary: Coordinator action materialization is too slow due to 
coarse job level locks
 Key: OOZIE-1533
 URL: https://issues.apache.org/jira/browse/OOZIE-1533
 Project: Oozie
  Issue Type: Improvement
Reporter: Srikanth Sundarrajan


Coord job level lock introduces high contention. Instead introduce coord action 
level locking whenever appropriate



[jira] [Created] (OOZIE-1532) Purging should remove completed children job for long running jobs

2013-09-16 Thread Srikanth Sundarrajan (JIRA)
Srikanth Sundarrajan created OOZIE-1532:
---

 Summary: Purging should remove completed children job for long 
running jobs
 Key: OOZIE-1532
 URL: https://issues.apache.org/jira/browse/OOZIE-1532
 Project: Oozie
  Issue Type: New Feature
Reporter: Srikanth Sundarrajan


Specifically, this is for long running coordinator jobs with high frequency. 
Their child workflows are never purged as the coord job is still running.

Add an Oozie server configuration that indicates how many coordinator action 
frequency ticks to keep. By doing this it would be possible to purge running 
coord jobs. By default this would not be enabled and the current logic would 
remain.
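
A minimal sketch of how such a "ticks to keep" setting could select purgeable 
actions. CoordPurgePlanner, its parameters, and the convention that a negative 
value means "disabled" are all assumptions for illustration, not an existing 
Oozie configuration or class.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical purge planner for a still-running coord job (not an Oozie class). */
final class CoordPurgePlanner {
    private CoordPurgePlanner() {}

    /**
     * Returns the completed action numbers that fall outside the most recent
     * ticksToKeep actions. A negative ticksToKeep disables purging, which
     * mirrors the "not enabled by default" behaviour.
     */
    static List<Integer> purgeable(List<Integer> completedActionNumbers,
                                   int latestActionNumber, int ticksToKeep) {
        List<Integer> result = new ArrayList<>();
        if (ticksToKeep < 0) {
            return result;
        }
        int cutoff = latestActionNumber - ticksToKeep;
        for (int n : completedActionNumbers) {
            if (n <= cutoff) {
                result.add(n);
            }
        }
        return result;
    }
}
```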



[jira] [Created] (OOZIE-1531) Add a blocking / synchronous option to oozie client

2013-09-16 Thread Srikanth Sundarrajan (JIRA)
Srikanth Sundarrajan created OOZIE-1531:
---

 Summary: Add a blocking / synchronous option to oozie client  
 Key: OOZIE-1531
 URL: https://issues.apache.org/jira/browse/OOZIE-1531
 Project: Oozie
  Issue Type: New Feature
Reporter: Srikanth Sundarrajan


Currently Oozie returns immediately after sending the request; there is no 
guarantee that the request is correct or that it has been carried out.

ASK: a client Java API that blocks until the submitted job is running, has 
been killed, etc.



[jira] [Commented] (OOZIE-674) resolveInstanceRange doesn't work for EL extensions

2013-06-13 Thread Srikanth Sundarrajan (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13682040#comment-13682040
 ] 

Srikanth Sundarrajan commented on OOZIE-674:


Yes please. It would be good if this is shipped in the next immediate release. 
BTW, when is 4.0 expected to ship?

> resolveInstanceRange doesn't work for EL extensions
> ---
>
> Key: OOZIE-674
> URL: https://issues.apache.org/jira/browse/OOZIE-674
> Project: Oozie
>  Issue Type: Bug
>Affects Versions: trunk
>Reporter: Shwetha G S
>Assignee: Shwetha G S
>  Labels: EL, extension
> Fix For: trunk
>
> Attachments: OOZIE-674.patch, OOZIE-674-v3.patch, OOZIE-674-v4.patch, 
> OOZIE-674-v5.patch, OOZIE-674-v6.patch, OOZIE-674-ver2.patch
>
>
> I have an EL extension today(0,0) which maps to start day of nominal time. 
> This is used to specify startInstance, endInstance and instance in dataIn and 
> dataOut of coordinator.
> In CoordCommandUtils.resolveInstanceRange(), getInstanceNumber has to return 
> the instance number with respect to current. So, for coord-action-create-inst 
> context, I have mapped today to current and hence getInstanceNumber returns 
> the correct number. But later in resolveInstanceRange(), getFuncType is 
> called with startInstance value which is today in this case and it maps to 
> UNEXPECTED and throws up. getFuncType should be passed the evaluation of 
> coord-action-create-inst context



[jira] [Commented] (OOZIE-674) resolveInstanceRange doesn't work for EL extensions

2013-06-09 Thread Srikanth Sundarrajan (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13679002#comment-13679002
 ] 

Srikanth Sundarrajan commented on OOZIE-674:


Can this be ported to 3.3.2 as well?

> resolveInstanceRange doesn't work for EL extensions
> ---
>
> Key: OOZIE-674
> URL: https://issues.apache.org/jira/browse/OOZIE-674
> Project: Oozie
>  Issue Type: Bug
>Affects Versions: trunk
>Reporter: Shwetha G S
>Assignee: Shwetha G S
>  Labels: EL, extension
> Fix For: trunk
>
> Attachments: OOZIE-674.patch, OOZIE-674-v3.patch, OOZIE-674-v4.patch, 
> OOZIE-674-v5.patch, OOZIE-674-v6.patch, OOZIE-674-ver2.patch
>
>
> I have an EL extension today(0,0) which maps to start day of nominal time. 
> This is used to specify startInstance, endInstance and instance in dataIn and 
> dataOut of coordinator.
> In CoordCommandUtils.resolveInstanceRange(), getInstanceNumber has to return 
> the instance number with respect to current. So, for coord-action-create-inst 
> context, I have mapped today to current and hence getInstanceNumber returns 
> the correct number. But later in resolveInstanceRange(), getFuncType is 
> called with startInstance value which is today in this case and it maps to 
> UNEXPECTED and throws up. getFuncType should be passed the evaluation of 
> coord-action-create-inst context
