[jira] [Commented] (OOZIE-1770) Create Oozie Application Master for YARN
[ https://issues.apache.org/jira/browse/OOZIE-1770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15257566#comment-15257566 ] Srikanth Sundarrajan commented on OOZIE-1770: - Thanks [~rkanter] for summarizing the discussions. Here are some additional things to think about while considering an AM pool: 1. Currently we have one external URL per action to track the execution of the action and its associated logs. With a YARN app per action, this would continue to work cleanly. Using an AM pool might introduce avoidable overhead if logs and action execution details have to be tracked per action. 2. With the AppMaster being the launcher, it might be trivially simple to handle AM restarts and RM restarts/failovers. With the AM pool, I am guessing we need to worry about AM failures and container failures. > Create Oozie Application Master for YARN > > > Key: OOZIE-1770 > URL: https://issues.apache.org/jira/browse/OOZIE-1770 > Project: Oozie > Issue Type: New Feature >Reporter: Bowen Zhang >Assignee: Bowen Zhang > Attachments: OozieYarnAM.pdf, Prelim OYA Scoping Doc 001.pdf, > oya-rm-screenshot.jpg, oya.patch > > > After the first release of oozie on hadoop 2, it will be good if users can > set execution engine in oozie conf, be it YARN AM or traditional MR. We can > target this for post oozie 4.1 release. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (OOZIE-2259) Create a callback action
[ https://issues.apache.org/jira/browse/OOZIE-2259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15173726#comment-15173726 ] Srikanth Sundarrajan commented on OOZIE-2259: - [~rohini]/[~rkanter], You had some feedback on the patch (available on review board), and I had shared my views on it in response. Would it be possible for either or both of you to let me know your thoughts? We can move forward on this based on your inputs. Thanks > Create a callback action > - > > Key: OOZIE-2259 > URL: https://issues.apache.org/jira/browse/OOZIE-2259 > Project: Oozie > Issue Type: New Feature > Components: action >Reporter: Jaydeep Vishwakarma >Assignee: Jaydeep Vishwakarma > Attachments: OOZIE-2259-v1.patch, OOZIE-2259-v3.patch, > OOZIE-2259-v4.patch, OOZIE-2259-v5.patch, OOZIE-2259-v8.patch, > OOZIE-2259-v9.patch, OOZIE-2259_v6.patch, OOZIE-2259_v7.patch > > > Need an action to send notification to external server by oozie. We should be > able to do multiple types of callback. Currently I know of jms and http calls. It > should have the capability to call different types of methods along > with n number of arguments. > The sample workflow with callback action > {code:xml} > > ... > > > [HOST] > [METHOD] > > [KEY][VALUE] > > ... > > ... > > ... > > {code} > HOST : from the host, the system can figure out whether it is an http or jms callback > action. The system will send the notification to that host. > METHOD : it can be POST/GET/QUEUE/TOPIC
[jira] [Commented] (OOZIE-2259) Create a callback action
[ https://issues.apache.org/jira/browse/OOZIE-2259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14991910#comment-14991910 ] Srikanth Sundarrajan commented on OOZIE-2259: - My bad. I was under the impression that separation of the thread pool was already part of this. I know we discussed it, but I forgot that it is scoped in another JIRA. [~puru], like I mentioned, I am in total agreement with the concern you had raised (that is why isolation of the thread pool is necessary). If this JIRA gets in, we need to follow it up with OOZIE-2231 soon enough. > Create a callback action > - > > Key: OOZIE-2259 > URL: https://issues.apache.org/jira/browse/OOZIE-2259 > Project: Oozie > Issue Type: New Feature > Components: action >Reporter: Jaydeep Vishwakarma >Assignee: Jaydeep Vishwakarma > Attachments: OOZIE-2259-v1.patch, OOZIE-2259-v3.patch, > OOZIE-2259-v4.patch, OOZIE-2259-v5.patch > > > Need an action to send notification to external server by oozie. We should be > able to do multiple types of callback. Currently I know of jms and http calls. It > should have the capability to call different types of methods along > with n number of arguments. > The sample workflow with callback action > {code:xml} > > ... > > > [HOST] > [METHOD] > > [KEY][VALUE] > > ... > > ... > > ... > > {code} > HOST : from the host, the system can figure out whether it is an http or jms callback > action. The system will send the notification to that host. > METHOD : it can be POST/GET/QUEUE/TOPIC
[jira] [Commented] (OOZIE-2259) Create a callback action
[ https://issues.apache.org/jira/browse/OOZIE-2259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14991214#comment-14991214 ] Srikanth Sundarrajan commented on OOZIE-2259: - [~puru], I feel that isolating this into a different thread pool was necessary for solving exactly the issue that you highlighted. If the callback action were to run in the main command queue execution thread pool, it could potentially take the system for a ride. The only issue I see is that if there were significant back pressure from the callback end point, the auxiliary queue for callback actions may grow and put some pressure on memory. But eventually it would start throttling down the materialization of the coordinator that triggered the workflow/action. Generally, the sense I get is that there are enough safeguards to prevent general degradation of other services within the system. > Create a callback action > - > > Key: OOZIE-2259 > URL: https://issues.apache.org/jira/browse/OOZIE-2259 > Project: Oozie > Issue Type: New Feature > Components: action >Reporter: Jaydeep Vishwakarma >Assignee: Jaydeep Vishwakarma > Attachments: OOZIE-2259-v1.patch, OOZIE-2259-v3.patch, > OOZIE-2259-v4.patch, OOZIE-2259-v5.patch > > > Need an action to send notification to external server by oozie. We should be > able to do multiple types of callback. Currently I know of jms and http calls. It > should have the capability to call different types of methods along > with n number of arguments. > The sample workflow with callback action > {code:xml} > > ... > > > [HOST] > [METHOD] > > [KEY][VALUE] > > ... > > ... > > ... > > {code} > HOST : from the host, the system can figure out whether it is an http or jms callback > action. The system will send the notification to that host. > METHOD : it can be POST/GET/QUEUE/TOPIC
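The thread pool isolation discussed in the comment above can be pictured with a minimal sketch. This is not the code from the OOZIE-2259 patch: the class `CallbackPoolSketch`, its method names, and the choice of a bounded queue are assumptions made purely for illustration. It uses only the standard `java.util.concurrent` API.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch: callbacks run on their own pool, away from the main command queue. */
public class CallbackPoolSketch {
    private final ThreadPoolExecutor callbackPool;

    public CallbackPoolSketch(int threads, int queueCapacity) {
        // Assumption: the queue is bounded, so a slow end point fills this queue
        // instead of growing memory without limit.
        this.callbackPool = new ThreadPoolExecutor(
                threads, threads, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueCapacity),
                new ThreadPoolExecutor.AbortPolicy());
    }

    /** Returns false when the callback queue is full, so the caller can back off. */
    public boolean submitCallback(Runnable callback) {
        try {
            callbackPool.execute(callback);
            return true;
        } catch (RejectedExecutionException e) {
            return false;
        }
    }

    /** Stops the pool and interrupts any callbacks still running. */
    public void shutdown() {
        callbackPool.shutdownNow();
    }
}
```

With a bounded queue, a rejected submission is the signal that the end point is pushing back; whether the caller retries, drops, or throttles upstream work is a separate design decision that this sketch leaves open.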
[jira] [Resolved] (OOZIE-1534) Launcher job might run do hadoop attempt relaunch - possibly causing correctness issues
[ https://issues.apache.org/jira/browse/OOZIE-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Srikanth Sundarrajan resolved OOZIE-1534. - Resolution: Fixed Assignee: Jaydeep Vishwakarma Fix Version/s: 4.2.0 With the use of YARN tags, this is solved for hadoop-2. > Launcher job might run do hadoop attempt relaunch - possibly causing > correctness issues > --- > > Key: OOZIE-1534 > URL: https://issues.apache.org/jira/browse/OOZIE-1534 > Project: Oozie > Issue Type: Improvement >Reporter: Srikanth Sundarrajan >Assignee: Jaydeep Vishwakarma > Fix For: 4.2.0 > > > The section of the action allows cleaning up the output dir. This is > not sufficient, as MR jobs started by Pig/Hive may still be running. We should > look to kill child MR jobs, if any, before launching new ones.
[jira] [Commented] (OOZIE-2258) Introducing a new counter in the instrumentation log to distinguish between the reasons for launcher failure
[ https://issues.apache.org/jira/browse/OOZIE-2258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14907653#comment-14907653 ] Srikanth Sundarrajan commented on OOZIE-2258: -
{code}
@@ -1411,12 +1414,20 @@ public class JavaActionExecutor extends ActionExecutor {
 if (exMsg != null) {
 LOG.warn("Launcher exception: {0}{E}{1}", exMsg, exStackTrace);
 }
+else {
+childJobKill = true;
+}
{code}
Not sure if this is in the right place. Is it possible to add a test? A more fundamental question: how do we intend to use this?
> Introducing a new counter in the instrumentation log to distinguish between > the reasons for launcher failure > > > Key: OOZIE-2258 > URL: https://issues.apache.org/jira/browse/OOZIE-2258 > Project: Oozie > Issue Type: Improvement >Reporter: Narayan Periwal >Assignee: Narayan Periwal > Attachments: OOZIE-2258-v0.patch, OOZIE-2258-v1.patch > > > Whether the launcher job fails due to child job failure or exception in the > launcher job itself, in both cases, the "counters:jobs:killed" counter is > updated in the instrumentation log. Hence, we cannot distinguish whether the > launcher failure was due to a child job failing or not. So, we can > introduce a new counter "kill" under the group "childjobs" that will help us > to distinguish if the launcher failure is due to the child jobs > failing. > Let me know if there is already any other way by which we can distinguish > this.
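To make the counter proposal in the quoted description concrete, here is a minimal sketch. It is not Oozie's instrumentation API: the class `LauncherFailureCounters`, the method `recordLauncherFailure` and the `childJobFailed` flag are hypothetical, and only the group and counter names (`jobs`/`killed`, `childjobs`/`kill`) are taken from the description.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch of a second counter that separates child-job failures from other launcher failures. */
public class LauncherFailureCounters {
    private final Map<String, AtomicLong> counters = new ConcurrentHashMap<>();

    private void incr(String group, String name) {
        counters.computeIfAbsent(group + ":" + name, k -> new AtomicLong()).incrementAndGet();
    }

    /** Every launcher failure bumps jobs:killed; only child-job failures also bump childjobs:kill. */
    public void recordLauncherFailure(boolean childJobFailed) {
        incr("jobs", "killed");
        if (childJobFailed) {
            incr("childjobs", "kill");
        }
    }

    /** Returns the current value of a counter, or 0 if it was never incremented. */
    public long get(String group, String name) {
        AtomicLong counter = counters.get(group + ":" + name);
        return counter == null ? 0L : counter.get();
    }
}
```

Under this scheme, the difference between the two counters gives the number of launcher failures that were not caused by a child job, which is the distinction the issue asks for.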
[jira] [Commented] (OOZIE-2314) Unable to kill old instance child job by workflow or coord rerun by Launcher
[ https://issues.apache.org/jira/browse/OOZIE-2314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14907637#comment-14907637 ] Srikanth Sundarrajan commented on OOZIE-2314: - Good catch [~jaydeepvishwakarma]. Thanks for the patch. I have a minor nit and have left my comments in RB. > Unable to kill old instance child job by workflow or coord rerun by Launcher > > > Key: OOZIE-2314 > URL: https://issues.apache.org/jira/browse/OOZIE-2314 > Project: Oozie > Issue Type: Bug >Reporter: Jaydeep Vishwakarma >Assignee: Jaydeep Vishwakarma >Priority: Blocker > Attachments: OOZIE-2314.patch > > >
> The Oozie launcher kills all child jobs launched by an old instance of the same launcher, workflow or coord action, to avoid duplicate children running at the same time. To do so, it searches for application ids by tag and time, and kills all the AMs it finds. You can find more detail in OOZIE-2129.
> It works fine when a Launcher attempt gets killed and tries again. When the YARN container that holds the AM gets killed for some reason and we rerun the workflow/coord action, this patch does not work.
> This happens because of a time filter applied while finding the app ids, which always takes the current time from the server.
> {{LauncherMapperHelper.java}}
> {code}
> public static void setupYarnRestartHandling(JobConf launcherJobConf, Configuration actionConf, String launcherTag)
>         throws NoSuchAlgorithmException {
>     launcherJobConf.setLong(LauncherMainHadoopUtils.OOZIE_JOB_LAUNCH_TIME, System.currentTimeMillis());
>     // Tags are limited to 100 chars so we need to hash them to make sure (the actionId otherwise doesn't have a max length)
>     String tag = getTag(launcherTag);
>     // keeping the oozie.child.mapreduce.job.tags instead of mapreduce.job.tags to avoid killing launcher itself.
>     // mapreduce.job.tags should only go to child job launch by launcher.
>     actionConf.set(LauncherMainHadoopUtils.CHILD_MAPREDUCE_JOB_TAGS, tag);
> }
> {code}
> When a user reruns the workflow or coord action, the Launcher picks the current system time along with the tags, then searches for running application ids in order to kill them. It ends up not finding any app id, as the previous instance of the same workflow/coord ran before the new system time.
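The failure mode described above can be reproduced with a small sketch. It does not use the YARN client API: `ChildJobFilterSketch`, `App` and `childrenToKill` are hypothetical stand-ins, and the assumption is that the lookup keeps only tagged applications that started at or after the recorded launch time.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of a tag-and-time filter over running applications. */
public class ChildJobFilterSketch {
    /** Stand-in for a running YARN application (not a YARN class). */
    public static final class App {
        final String id;
        final String tag;
        final long startTime;

        public App(String id, String tag, long startTime) {
            this.id = id;
            this.tag = tag;
            this.startTime = startTime;
        }
    }

    /** Returns ids of apps that carry the tag and started at or after the lower bound. */
    public static List<String> childrenToKill(List<App> running, String tag, long lowerBound) {
        List<String> ids = new ArrayList<>();
        for (App app : running) {
            if (app.tag.equals(tag) && app.startTime >= lowerBound) {
                ids.add(app.id);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        List<App> running = Arrays.asList(new App("app_1", "tagA", 1_000L));
        long rerunTime = 5_000L; // stands in for System.currentTimeMillis() at rerun
        // Lower bound taken at rerun time: the older child is filtered out.
        System.out.println(childrenToKill(running, "tagA", rerunTime));
        // Lower bound no later than the child's start: the child is found.
        System.out.println(childrenToKill(running, "tagA", 1_000L));
    }
}
```

The sketch only shows why a lower bound taken at rerun time misses children of the earlier run; which earlier timestamp the fix should use is for the patch itself to decide.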
[jira] [Commented] (OOZIE-2251) Expose instrumental matrices in Realtime Graphing tool
[ https://issues.apache.org/jira/browse/OOZIE-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901229#comment-14901229 ] Srikanth Sundarrajan commented on OOZIE-2251: - Looks good to me, but this needs to be rebased. > Expose instrumental matrices in Realtime Graphing tool > -- > > Key: OOZIE-2251 > URL: https://issues.apache.org/jira/browse/OOZIE-2251 > Project: Oozie > Issue Type: New Feature > Components: monitoring >Reporter: Jaydeep Vishwakarma >Assignee: Narayan Periwal > Attachments: OOZIE-2251-v0.patch, OOZIE-2251-v1.patch, > OOZIE-2251-v10.patch, OOZIE-2251-v2.patch, OOZIE-2251-v3.patch, > OOZIE-2251-v4.patch, OOZIE-2251-v5.patch, OOZIE-2251-v6.patch, > OOZIE-2251-v7.patch, OOZIE-2251-v8.patch, OOZIE-2251-v9.patch > > > We have been logging so many important matrices in oozie-instrumentation.log > . These information is very useful for oozie functional monitoring. But it is > always difficult to get the meaning from flat file. If we expose this > information on some graphing tool, We can get the lot of meaning out of it > and can take some actions based on it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (OOZIE-2243) Kill Command does not kill the child job for java action
[ https://issues.apache.org/jira/browse/OOZIE-2243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901218#comment-14901218 ] Srikanth Sundarrajan commented on OOZIE-2243: - This one needs to be rebased I guess. [~nperiwal], can you rebase and put the patch up in review board please? > Kill Command does not kill the child job for java action > > > Key: OOZIE-2243 > URL: https://issues.apache.org/jira/browse/OOZIE-2243 > Project: Oozie > Issue Type: Bug >Reporter: Narayan Periwal >Assignee: Narayan Periwal >Priority: Minor > Attachments: OOZIE-2243-v0.patch, OOZIE-2243-v1.patch, > OOZIE-2243-v2.patch > > > Lets say, there is launcher job that launches another map-reduce job through > java-action. When we kill the launcher job, the child job launched by it does > not get killed and only the launcher job gets killed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (OOZIE-1770) Create Oozie Application Master for YARN
[ https://issues.apache.org/jira/browse/OOZIE-1770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14632228#comment-14632228 ] Srikanth Sundarrajan commented on OOZIE-1770: -
Running Oozie launcher tasks as a MapReduce task/job is indeed a huge hack, and we should most certainly look to take advantage of YARN and integrate more directly with it. Here are some direct rewards that we should look to reap from such an integration:
- Cleaner integration (no artificial split creation or input & output exchange mechanisms)
- The MR assumption that tasks are idempotent is a huge limitation, and a new solution should be able to overcome it
- Heavy resource overheads in terms of an App Master/Launcher task for each action can be avoided
- Issues such as App Master restarts or task attempt relaunches cause both lost work and possibly issues with data today; they can be avoided

Taking a step back, here is the list of possible ways in which we can integrate with YARN more natively.

+Actions executed via Native Oozie App Master+
An App Master which is capable of executing an Oozie action directly, as opposed to making it appear as an MR job. This is in all likelihood going to look like the current MR-based execution in uber mode. It doesn't really offer much other than moving away from the Map task execution mode.

+Actions executed via Single AM per user+
A reusable Oozie AM per user, which creates launcher containers for each action (as proposed by [~rkanter]). This would allow us to reduce the AM overheads and also reduce the launch latency (as AMs are ready and warmed up), and would launch tasks more natively, as opposed to each appearing as an MR job.

+Workflows executed via a Single AM+
Run the entire workflow in a single AM. In this mode, the workflow and all its actions (DagEngine) are actually executed on the Oozie Workflow AM, and the child actions can either be executed in an action-specific thread/class loader by default, or in a forked container. In this mode, Oozie workflows can be executed with much lower overhead, with the possibility of lowering the burden on the Oozie server. This of course introduces challenges relating to maintaining workflow execution state in the Oozie DB. However, this can be solved by maintaining state in HDFS, with notification + polling based updates by the Oozie server to the DB.

My personal choice would be the last option, as it would allow workflow execution to be used outside of Oozie Coordinators, besides allowing the Oozie server to scale better, while keeping the larger objective of moving away from MapReduce jobs for Oozie actions. Thoughts?
> Create Oozie Application Master for YARN > > > Key: OOZIE-1770 > URL: https://issues.apache.org/jira/browse/OOZIE-1770 > Project: Oozie > Issue Type: New Feature >Reporter: Bowen Zhang >Assignee: Bowen Zhang > Attachments: oya-rm-screenshot.jpg, oya.patch > > > After the first release of oozie on hadoop 2, it will be good if users can > set execution engine in oozie conf, be it YARN AM or traditional MR. We can > target this for post oozie 4.1 release.
[jira] [Commented] (OOZIE-2302) Reload feature for oozie-site config
[ https://issues.apache.org/jira/browse/OOZIE-2302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14626031#comment-14626031 ] Srikanth Sundarrajan commented on OOZIE-2302: - Consider separating the conf into a static startup config and a dynamic runtime config. This, however, is backward incompatible and would require general consensus, or a way to keep it compatible until the user chooses to separate them. > Reload feature for oozie-site config > - > > Key: OOZIE-2302 > URL: https://issues.apache.org/jira/browse/OOZIE-2302 > Project: Oozie > Issue Type: New Feature > Components: core >Reporter: Jaydeep Vishwakarma >Assignee: Jaydeep Vishwakarma > > Whenever a user wants to add/modify any property, he has to restart the oozie > server to see the effect of config updates. It is very inconvenient, as the user > has to either kill or drain out all the jobs from oozie, which eventually > slows down the production pace. We should have reload > support for config updates.
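One way to picture the static/dynamic split suggested above is the following minimal sketch. It is not based on Oozie's configuration classes: `SplitConfigSketch`, `reloadRuntime` and the property names used with it are hypothetical, and the rule that startup keys always win over reloaded values is an assumption of this sketch.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical sketch: immutable startup config plus a runtime config that can be swapped. */
public class SplitConfigSketch {
    private final Map<String, String> startup;
    private final AtomicReference<Map<String, String>> runtime;

    public SplitConfigSketch(Map<String, String> startup, Map<String, String> runtime) {
        this.startup = Collections.unmodifiableMap(new HashMap<>(startup));
        this.runtime = new AtomicReference<>(Collections.unmodifiableMap(new HashMap<>(runtime)));
    }

    /** Atomically replaces the runtime properties; startup properties are never touched. */
    public void reloadRuntime(Map<String, String> fresh) {
        runtime.set(Collections.unmodifiableMap(new HashMap<>(fresh)));
    }

    /** Startup keys take precedence, so a reload cannot override them. */
    public String get(String key) {
        String value = startup.get(key);
        return value != null ? value : runtime.get().get(key);
    }
}
```

Readers always see either the old or the new runtime map as a whole, never a half-applied reload, because the swap is a single reference update.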
[jira] [Commented] (OOZIE-2030) Configuration properties from global section is not getting set in Hadoop job conf when using sub-workflow action in Oozie workflow.xml
[ https://issues.apache.org/jira/browse/OOZIE-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14626013#comment-14626013 ] Srikanth Sundarrajan commented on OOZIE-2030: - From what I understood, there were 2 concerns with the patch: 1. Not removing the global conf from the workflow conf before persistence. 2. Not encoding the global conf and using XML as is. Given that the global section is being added only to the workflow conf (in compressed state), I am assuming there isn't much storage overhead, and retaining the conf in XML format might not be all that bad as long as there is no direct string manipulation of the XML. Is any further work needed on the patch? > Configuration properties from global section is not getting set in Hadoop job > conf when using sub-workflow action in Oozie workflow.xml > > > Key: OOZIE-2030 > URL: https://issues.apache.org/jira/browse/OOZIE-2030 > Project: Oozie > Issue Type: Bug > Components: action >Reporter: Peeyush Bishnoi >Assignee: Jaydeep Vishwakarma > Attachments: OOZIE-2030-v2.patch, OOZIE-2030-v3.patch, > OOZIE-2030-v4.patch, OOZIE-2030-v5.patch, OOZIE-2030-v6.patch, > OOZIE-2030.patch > > > When submitting an Oozie workflow with a sub-workflow action and a global > section, configuration properties defined in the global section are not getting > set in the launched Hadoop job conf. But when we use a Pig or MR action in > workflow.xml, configuration properties from the global section are set properly in the > Hadoop job conf.
[jira] [Commented] (OOZIE-2299) Falcon build fails with Oozie-4.2.0
[ https://issues.apache.org/jira/browse/OOZIE-2299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14620439#comment-14620439 ] Srikanth Sundarrajan commented on OOZIE-2299: - In an offline conversation with [~pallavi.rao], she seemed to suggest that the issue is due to codehaus repository being decommissioned. > Falcon build fails with Oozie-4.2.0 > --- > > Key: OOZIE-2299 > URL: https://issues.apache.org/jira/browse/OOZIE-2299 > Project: Oozie > Issue Type: Bug >Affects Versions: 4.2.0, 4.3.0 >Reporter: Peeyush Bishnoi >Priority: Blocker > Fix For: 4.2.0 > > > Falcon build fails with following error when try to build with Apache > Oozie-4.2.0. > {code:java} > [INFO] Apache Falcon Oozie EL Extension ... FAILURE [ 1.388 > s] > [INFO] Apache Falcon Embedded Hadoop - Test Cluster ... SKIPPED > [INFO] Apache Falcon Sharelib Hive - Test Cluster . SKIPPED > [INFO] Apache Falcon Sharelib Pig - Test Cluster .. SKIPPED > [INFO] Apache Falcon Sharelib Hcatalog - Test Cluster . SKIPPED > [INFO] Apache Falcon Sharelib Oozie - Test Cluster SKIPPED > [INFO] Apache Falcon Test Tools - Test Cluster SKIPPED > [INFO] Apache Falcon Messaging SKIPPED > [INFO] Apache Falcon Oozie Adaptor SKIPPED > [INFO] Apache Falcon Acquisition .. SKIPPED > [INFO] Apache Falcon Distcp Replication ... SKIPPED > [INFO] Apache Falcon Retention SKIPPED > [INFO] Apache Falcon Archival . SKIPPED > [INFO] Apache Falcon Rerun SKIPPED > [INFO] Apache Falcon Prism SKIPPED > [INFO] Apache Falcon Hive Replication . SKIPPED > [INFO] Apache Falcon Web Application .. 
SKIPPED > [INFO] Apache Falcon Documentation SKIPPED > [INFO] > > [INFO] BUILD FAILURE > [INFO] > > [INFO] Total time: 35.898 s > [INFO] Finished at: 2015-07-08T22:30:44+05:30 > [INFO] Final Memory: 151M/613M > [INFO] > > [ERROR] Failed to execute goal on project falcon-oozie-el-extension: Could > not resolve dependencies for project > org.apache.falcon:falcon-oozie-el-extension:jar:0.7-SNAPSHOT: Failed to > collect dependencies at org.apache.oozie:oozie-core:jar:4.2.0-falcon -> > org.apache.oozie:oozie-client:jar:4.2.0-falcon -> > org.apache.oozie:oozie-hadoop-auth:jar:hadoop-1-4.2.0-falcon: Failed to read > artifact descriptor for > org.apache.oozie:oozie-hadoop-auth:jar:hadoop-1-4.2.0-falcon: Could not > transfer artifact > org.apache.oozie:oozie-hadoop-auth:pom:hadoop-1-4.2.0-falcon from/to Codehaus > repository (http://repository.codehaus.org/): Failed to transfer file: > http://repository.codehaus.org/org/apache/oozie/oozie-hadoop-auth/hadoop-1-4.2.0-falcon/oozie-hadoop-auth-hadoop-1-4.2.0-falcon.pom. > Return code is: 410 , ReasonPhrase:Gone. -> [Help 1] > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (OOZIE-2299) Falcon build fails with Oozie-4.2.0
[ https://issues.apache.org/jira/browse/OOZIE-2299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14620376#comment-14620376 ] Srikanth Sundarrajan commented on OOZIE-2299: - Perhaps this needs to be fixed in Falcon. [~peeyushb], Can you verify and move to Falcon project if necessary ? > Falcon build fails with Oozie-4.2.0 > --- > > Key: OOZIE-2299 > URL: https://issues.apache.org/jira/browse/OOZIE-2299 > Project: Oozie > Issue Type: Bug >Affects Versions: 4.2.0, 4.3.0 >Reporter: Peeyush Bishnoi >Priority: Blocker > Fix For: 4.2.0 > > > Falcon build fails with following error when try to build with Apache > Oozie-4.2.0. > {code:java} > [INFO] Apache Falcon Oozie EL Extension ... FAILURE [ 1.388 > s] > [INFO] Apache Falcon Embedded Hadoop - Test Cluster ... SKIPPED > [INFO] Apache Falcon Sharelib Hive - Test Cluster . SKIPPED > [INFO] Apache Falcon Sharelib Pig - Test Cluster .. SKIPPED > [INFO] Apache Falcon Sharelib Hcatalog - Test Cluster . SKIPPED > [INFO] Apache Falcon Sharelib Oozie - Test Cluster SKIPPED > [INFO] Apache Falcon Test Tools - Test Cluster SKIPPED > [INFO] Apache Falcon Messaging SKIPPED > [INFO] Apache Falcon Oozie Adaptor SKIPPED > [INFO] Apache Falcon Acquisition .. SKIPPED > [INFO] Apache Falcon Distcp Replication ... SKIPPED > [INFO] Apache Falcon Retention SKIPPED > [INFO] Apache Falcon Archival . SKIPPED > [INFO] Apache Falcon Rerun SKIPPED > [INFO] Apache Falcon Prism SKIPPED > [INFO] Apache Falcon Hive Replication . SKIPPED > [INFO] Apache Falcon Web Application .. 
SKIPPED > [INFO] Apache Falcon Documentation SKIPPED > [INFO] > > [INFO] BUILD FAILURE > [INFO] > > [INFO] Total time: 35.898 s > [INFO] Finished at: 2015-07-08T22:30:44+05:30 > [INFO] Final Memory: 151M/613M > [INFO] > > [ERROR] Failed to execute goal on project falcon-oozie-el-extension: Could > not resolve dependencies for project > org.apache.falcon:falcon-oozie-el-extension:jar:0.7-SNAPSHOT: Failed to > collect dependencies at org.apache.oozie:oozie-core:jar:4.2.0-falcon -> > org.apache.oozie:oozie-client:jar:4.2.0-falcon -> > org.apache.oozie:oozie-hadoop-auth:jar:hadoop-1-4.2.0-falcon: Failed to read > artifact descriptor for > org.apache.oozie:oozie-hadoop-auth:jar:hadoop-1-4.2.0-falcon: Could not > transfer artifact > org.apache.oozie:oozie-hadoop-auth:pom:hadoop-1-4.2.0-falcon from/to Codehaus > repository (http://repository.codehaus.org/): Failed to transfer file: > http://repository.codehaus.org/org/apache/oozie/oozie-hadoop-auth/hadoop-1-4.2.0-falcon/oozie-hadoop-auth-hadoop-1-4.2.0-falcon.pom. > Return code is: 410 , ReasonPhrase:Gone. -> [Help 1] > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (OOZIE-2251) Expose instrumental matrices in Realtime Graphing tool
[ https://issues.apache.org/jira/browse/OOZIE-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14616187#comment-14616187 ] Srikanth Sundarrajan commented on OOZIE-2251: - [~nperiwal], If a user were to use Ganglia in their environment, they would additionally need to add the excluded dependency to the class path. It would be useful to cover this in the docs. > Expose instrumental matrices in Realtime Graphing tool > -- > > Key: OOZIE-2251 > URL: https://issues.apache.org/jira/browse/OOZIE-2251 > Project: Oozie > Issue Type: New Feature > Components: monitoring >Reporter: Jaydeep Vishwakarma >Assignee: Narayan Periwal > Attachments: OOZIE-2251-v0.patch, OOZIE-2251-v1.patch, > OOZIE-2251-v2.patch, OOZIE-2251-v3.patch, OOZIE-2251-v4.patch, > OOZIE-2251-v5.patch, OOZIE-2251-v6.patch, OOZIE-2251-v7.patch, > OOZIE-2251-v8.patch, OOZIE-2251-v9.patch > > > We have been logging so many important matrices in oozie-instrumentation.log > . These information is very useful for oozie functional monitoring. But it is > always difficult to get the meaning from flat file. If we expose this > information on some graphing tool, We can get the lot of meaning out of it > and can take some actions based on it.
[jira] [Commented] (OOZIE-2030) Configuration properties from global section is not getting set in Hadoop job conf when using sub-workflow action in Oozie workflow.xml
[ https://issues.apache.org/jira/browse/OOZIE-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14614897#comment-14614897 ] Srikanth Sundarrajan commented on OOZIE-2030: - +1 > Configuration properties from global section is not getting set in Hadoop job > conf when using sub-workflow action in Oozie workflow.xml > > > Key: OOZIE-2030 > URL: https://issues.apache.org/jira/browse/OOZIE-2030 > Project: Oozie > Issue Type: Bug > Components: action >Reporter: Peeyush Bishnoi >Assignee: Jaydeep Vishwakarma > Attachments: OOZIE-2030-v2.patch, OOZIE-2030-v3.patch, > OOZIE-2030-v4.patch, OOZIE-2030-v5.patch, OOZIE-2030.patch > > > When submitting Oozie workflow with sub-workflow action and with global > section, configuration properties defined in global section is not getting > set in launched Hadoop job conf. But when we use Pig or MR action in > workflow.xml, configuration properties from global section set properly into > Hadoop job conf. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (OOZIE-2253) Spark Job is failing when it is running in standalone server
[ https://issues.apache.org/jira/browse/OOZIE-2253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14587506#comment-14587506 ] Srikanth Sundarrajan commented on OOZIE-2253: - Looks good. +1 > Spark Job is failing when it is running in standalone server > > > Key: OOZIE-2253 > URL: https://issues.apache.org/jira/browse/OOZIE-2253 > Project: Oozie > Issue Type: Bug >Reporter: pavan kumar kolamuri >Assignee: pavan kumar kolamuri > Attachments: OOZIE-2253.patch > > > When Spark Job is running in spark standalone cluster the job is getting > launched and succedded and infinite jobs are getting launched in spark > cluster. Oozie workflow will be in running state forever as spark is > launching job infinite times. > This might be because in spark when job succeeds and it always do > System.exit(0) . In LauncherSecurityManager exception is thrown for this. It > looks like spark(through akka framework) is catching that and launching one > more attempt for the same job. It is happening infinitely . > {noformat} > Sending launch command to spark://inmobi-Precision-T3610:7077 > Driver successfully submitted as driver-20150526105806- > ... waiting before polling master for driver state > ... polling master for driver state > State of driver-20150526105806- is SUBMITTED > Sending launch command to spark://inmobi-Precision-T3610:7077 > Driver successfully submitted as driver-20150526105811-0001 > ... waiting before polling master for driver state > ... polling master for driver state > State of driver-20150526105811-0001 is SUBMITTED > Sending launch command to spark://inmobi-Precision-T3610:7077 > Driver successfully submitted as driver-20150526105816-0002 > ... waiting before polling master for driver state > ... polling master for driver state > State of driver-20150526105816-0002 is SUBMITTED > Sending launch command to spark://inmobi-Precision-T3610:7077 > Driver successfully submitted as driver-20150526105821-0003 > ... 
waiting before polling master for driver state > ... polling master for driver state > State of driver-20150526105821-0003 is SUBMITTED > Sending launch command to spark://inmobi-Precision-T3610:7077 > Driver successfully submitted as driver-20150526105826-0004 > ... waiting before polling master for driver state > {noformat} > {noformat} > 2015-05-26 10:58:11,573 ERROR [driverClient-akka.actor.default-dispatcher-4] > akka.actor.OneForOneStrategy: Intercepted System.exit(0) > java.lang.SecurityException: Intercepted System.exit(0) > at > org.apache.oozie.action.hadoop.LauncherSecurityManager.checkExit(LauncherMapper.java:601) > at java.lang.Runtime.exit(Runtime.java:107) > at java.lang.System.exit(System.java:962) > at > org.apache.spark.deploy.ClientActor.pollAndReportStatus(Client.scala:115) > at > org.apache.spark.deploy.ClientActor$$anonfun$receiveWithLogging$1.applyOrElse(Client.scala:123) > at > scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33) > at > scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33) > at > scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25) > at > org.apache.spark.util.ActorLogReceive$$anon$1.apply(ActorLogReceive.scala:53) > at > org.apache.spark.util.ActorLogReceive$$anon$1.apply(ActorLogReceive.scala:42) > at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118) > at > org.apache.spark.util.ActorLogReceive$$anon$1.applyOrElse(ActorLogReceive.scala:42) > at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498) > at akka.actor.ActorCell.invoke(ActorCell.scala:456) > at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237) > at akka.dispatch.Mailbox.run(Mailbox.scala:219) > at > akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386) > at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) > at > 
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) > at > scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) > at > scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
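The loop suspected in the description can be modelled without Spark, Akka, or a real SecurityManager. The sketch below is only an illustration under stated assumptions: every class and method name in it is hypothetical, the exit hook is a stand-in that throws instead of terminating the JVM, and the retry cap exists only so the example terminates. It contrasts a supervisor that restarts on any exception with one that treats an intercepted exit as terminal. The second variant is one possible remedy, not necessarily what OOZIE-2253.patch does.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical model of the resubmission loop; not Oozie, Spark, or Akka code. */
final class InterceptedExitLoopSketch {

    /** Stand-in for the SecurityException raised when System.exit is intercepted. */
    static final class InterceptedExit extends SecurityException {
        final int status;

        InterceptedExit(int status) {
            super("Intercepted System.exit(" + status + ")");
            this.status = status;
        }
    }

    private InterceptedExitLoopSketch() { }

    /** Stand-in for the driver client: counts one submission, then "exits" with status 0. */
    static void submitAndExit(AtomicInteger submissions) {
        submissions.incrementAndGet();
        throw new InterceptedExit(0);
    }

    /** Restarts on every exception; capped at maxAttempts so the sketch terminates. */
    static int superviseNaively(int maxAttempts) {
        AtomicInteger submissions = new AtomicInteger();
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                submitAndExit(submissions);
                break;
            } catch (RuntimeException e) {
                // Treated as a failure, so the job is submitted again.
            }
        }
        return submissions.get();
    }

    /** Treats an intercepted exit as terminal instead of as a failure to retry. */
    static int superviseExitAware(int maxAttempts) {
        AtomicInteger submissions = new AtomicInteger();
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                submitAndExit(submissions);
                break;
            } catch (InterceptedExit e) {
                break; // The client asked to exit; do not resubmit.
            } catch (RuntimeException e) {
                // Genuine failure: fall through and retry.
            }
        }
        return submissions.get();
    }
}
```

With a cap of 5, the naive supervisor submits five times, mirroring the repeated "Sending launch command" entries in the log, while the exit-aware supervisor submits once.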
[jira] [Commented] (OOZIE-2253) Spark Job is failing when it is running in standalone server
[ https://issues.apache.org/jira/browse/OOZIE-2253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14587482#comment-14587482 ] Srikanth Sundarrajan commented on OOZIE-2253: - Can you please upload to review board ?
[jira] [Commented] (OOZIE-2030) Configuration properties from global section is not getting set in Hadoop job conf when using sub-workflow action in Oozie workflow.xml
[ https://issues.apache.org/jira/browse/OOZIE-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14585679#comment-14585679 ] Srikanth Sundarrajan commented on OOZIE-2030: - Didn't realize that [~shwethags] has already proposed the same approach. Sorry > Configuration properties from global section is not getting set in Hadoop job > conf when using sub-workflow action in Oozie workflow.xml > > > Key: OOZIE-2030 > URL: https://issues.apache.org/jira/browse/OOZIE-2030 > Project: Oozie > Issue Type: Bug > Components: action >Reporter: Peeyush Bishnoi >Assignee: Jaydeep Vishwakarma > Attachments: OOZIE-2030-v2.patch, OOZIE-2030-v3.patch, > OOZIE-2030-v4.patch, OOZIE-2030.patch > > > When submitting Oozie workflow with sub-workflow action and with global > section, configuration properties defined in global section is not getting > set in launched Hadoop job conf. But when we use Pig or MR action in > workflow.xml, configuration properties from global section set properly into > Hadoop job conf. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (OOZIE-2030) Configuration properties from global section is not getting set in Hadoop job conf when using sub-workflow action in Oozie workflow.xml
[ https://issues.apache.org/jira/browse/OOZIE-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14585671#comment-14585671 ] Srikanth Sundarrajan commented on OOZIE-2030: - There is perhaps a simpler way to tackle this issue. If we modify LiteWorkflowAppParser to serialize and persist the contents of global in conf, and have handleGlobal() also consult conf when handling the global section, global will be propagated correctly with no further changes to any other section of the code, honoring the right overlay priorities. The code in LWAP wouldn't be specific to sub-workflows either. Luckily, the conf itself is propagated into subflows at the user's request. - [~shwethags], [~rohini], does this make sense? > Configuration properties from global section is not getting set in Hadoop job > conf when using sub-workflow action in Oozie workflow.xml > > > Key: OOZIE-2030 > URL: https://issues.apache.org/jira/browse/OOZIE-2030 > Project: Oozie > Issue Type: Bug > Components: action >Reporter: Peeyush Bishnoi >Assignee: Jaydeep Vishwakarma > Attachments: OOZIE-2030-v2.patch, OOZIE-2030-v3.patch, > OOZIE-2030-v4.patch, OOZIE-2030.patch > > > When submitting Oozie workflow with sub-workflow action and with global > section, configuration properties defined in global section is not getting > set in launched Hadoop job conf. But when we use Pig or MR action in > workflow.xml, configuration properties from global section set properly into > Hadoop job conf. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (OOZIE-2251) Expose instrumental matrices in Realtime Graphing tool
[ https://issues.apache.org/jira/browse/OOZIE-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14584520#comment-14584520 ] Srikanth Sundarrajan commented on OOZIE-2251: - +1 for the patch. Additional note FWIW, Gmetric pulls in an additional dependency (LGPL) which isn't compatible. This might be an issue if binary releases were to include all transitive dependencies. Some details can be found [here|https://cwiki.apache.org/confluence/display/LENS/Licensing+in+Apache+Lens]. For additional reference see SPARK-1167 > Expose instrumental matrices in Realtime Graphing tool > -- > > Key: OOZIE-2251 > URL: https://issues.apache.org/jira/browse/OOZIE-2251 > Project: Oozie > Issue Type: New Feature > Components: monitoring >Reporter: Jaydeep Vishwakarma >Assignee: Narayan Periwal > Attachments: OOZIE-2251-v0.patch, OOZIE-2251-v1.patch, > OOZIE-2251-v2.patch, OOZIE-2251-v3.patch, OOZIE-2251-v4.patch, > OOZIE-2251-v5.patch, OOZIE-2251-v6.patch, OOZIE-2251-v7.patch > > > We have been logging so many important matrices in oozie-instrumentation.log > . These information is very useful for oozie functional monitoring. But it is > always difficult to get the meaning from flat file. If we expose this > information on some graphing tool, We can get the lot of meaning out of it > and can take some actions based on it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (OOZIE-2030) Configuration properties from global section is not getting set in Hadoop job conf when using sub-workflow action in Oozie workflow.xml
[ https://issues.apache.org/jira/browse/OOZIE-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14583370#comment-14583370 ] Srikanth Sundarrajan commented on OOZIE-2030: - We need a uniform mechanism to parse the global section and make it available to the action executors; it should then be up to the executors to use it as they see fit. > Configuration properties from global section is not getting set in Hadoop job > conf when using sub-workflow action in Oozie workflow.xml > > > Key: OOZIE-2030 > URL: https://issues.apache.org/jira/browse/OOZIE-2030 > Project: Oozie > Issue Type: Bug > Components: action >Reporter: Peeyush Bishnoi >Assignee: Jaydeep Vishwakarma > Attachments: OOZIE-2030-v2.patch, OOZIE-2030-v3.patch, > OOZIE-2030-v4.patch, OOZIE-2030.patch > > > When submitting Oozie workflow with sub-workflow action and with global > section, configuration properties defined in global section is not getting > set in launched Hadoop job conf. But when we use Pig or MR action in > workflow.xml, configuration properties from global section set properly into > Hadoop job conf. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (OOZIE-2030) Configuration properties from global section is not getting set in Hadoop job conf when using sub-workflow action in Oozie workflow.xml
[ https://issues.apache.org/jira/browse/OOZIE-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14583369#comment-14583369 ] Srikanth Sundarrajan commented on OOZIE-2030: - My bad, I must have confused it with the other patch I was reviewing. > Configuration properties from global section is not getting set in Hadoop job > conf when using sub-workflow action in Oozie workflow.xml > > > Key: OOZIE-2030 > URL: https://issues.apache.org/jira/browse/OOZIE-2030 > Project: Oozie > Issue Type: Bug > Components: action >Reporter: Peeyush Bishnoi >Assignee: Jaydeep Vishwakarma > Attachments: OOZIE-2030-v2.patch, OOZIE-2030-v3.patch, > OOZIE-2030-v4.patch, OOZIE-2030.patch > > > When submitting Oozie workflow with sub-workflow action and with global > section, configuration properties defined in global section is not getting > set in launched Hadoop job conf. But when we use Pig or MR action in > workflow.xml, configuration properties from global section set properly into > Hadoop job conf. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (OOZIE-2030) Configuration properties from global section is not getting set in Hadoop job conf when using sub-workflow action in Oozie workflow.xml
[ https://issues.apache.org/jira/browse/OOZIE-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14582934#comment-14582934 ] Srikanth Sundarrajan commented on OOZIE-2030: - Yes [~shwethags], didn't want two open review request pending in review board at the same time, hence the request. Thanks > Configuration properties from global section is not getting set in Hadoop job > conf when using sub-workflow action in Oozie workflow.xml > > > Key: OOZIE-2030 > URL: https://issues.apache.org/jira/browse/OOZIE-2030 > Project: Oozie > Issue Type: Bug > Components: action >Reporter: Peeyush Bishnoi >Assignee: Jaydeep Vishwakarma > Attachments: OOZIE-2030-v2.patch, OOZIE-2030-v3.patch, > OOZIE-2030-v4.patch, OOZIE-2030.patch > > > When submitting Oozie workflow with sub-workflow action and with global > section, configuration properties defined in global section is not getting > set in launched Hadoop job conf. But when we use Pig or MR action in > workflow.xml, configuration properties from global section set properly into > Hadoop job conf. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (OOZIE-2020) Rerun all Failed/killed/timedout coordinator actions rather than specifying action numbers
[ https://issues.apache.org/jira/browse/OOZIE-2020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14582325#comment-14582325 ] Srikanth Sundarrajan commented on OOZIE-2020: -
Is it intentional that succeeded actions can't be re-run through this filter, or is it an oversight? It might also help to call this something else: FILTERSTATUS reads like a generic collection of statuses that can be used for filtering. Why not use COMPLETED_COORD_ACTION_STATUSES, which seems more appropriate?
{code}
+public enum FILTERSTATUS {KILLED, FAILED, TIMEDOUT};
{code}
If you do consider all actions in completed status for re-run, consider renaming GET_TERMINATED* in the JPA modules to GET_COMPLETED*.
{{LocalOozieClientCoord}} Should we have a different error code for input checks?
{code}
throw new CommandException(ErrorCode.E1018, "Invalid value provided for filter option; " +
        "Valid Value is 'status=KILLED;status=FAILED;status=TIMEDOUT'");
{code}
{{V1JobsServlet}} parseFilters is redundantly implemented in two places in the server. Is it possible to reconcile them?
{{TestCoordRerunXCommand}} Why not check for waiting in this case, as in the others?
{code}
assertNotSame(action2.getStatus(), CoordinatorAction.Status.SUCCEEDED);
{code}
A few nits:
* Plenty of unused imports introduced in the patch
* A few unused variables
* Incorrect javadoc: CoordUtils
* Use a logger instead of printing the stack trace to SysErr (LocalOozieClientCoord)
* Error message in OozieCLI isn't consistent with other invalid input exceptions
> Rerun all Failed/killed/timedout coordinator actions rather than specifying > action numbers > -- > > Key: OOZIE-2020 > URL: https://issues.apache.org/jira/browse/OOZIE-2020 > Project: Oozie > Issue Type: New Feature > Components: action >Reporter: Sreedish P S >Assignee: Narayan Periwal >Priority: Minor > Attachments: OOZIE-2020-v10.patch, OOZIE-2020-v11.patch, > OOZIE-2020-v12.patch, OOZIE-2020-v13.patch, OOZIE-2020-v14.patch, > OOZIE-2020-v15.patch, OOZIE-2020-v16.patch, OOZIE-2020-v17.patch, > OOZIE-2020-v8.patch, OOZIE-2020-v9.patch > > > Currently rerun of coordinator actions are made through coordinator id and > action numbers, this feature request is for rerunning all coordinator actions > by mentioning a particular state > for example : > oozie job -rerun coord-id -state killed -- This message was sent by Atlassian JIRA (v6.3.4#6332)
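The filter format quoted in the error message above ('status=KILLED;status=FAILED;status=TIMEDOUT') could be parsed in one shared place, which is one way to address the duplicated parseFilters logic. The following is a minimal sketch under assumptions: the class name RerunStatusFilter, the enum name CompletedStatus, and the use of IllegalArgumentException are all hypothetical and are not taken from the patch, which uses CommandException and Oozie's own types.

```java
import java.util.EnumSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper; not part of the OOZIE-2020 patch. */
final class RerunStatusFilter {

    /** Statuses accepted by the rerun filter, mirroring the enum quoted in the review. */
    enum CompletedStatus { KILLED, FAILED, TIMEDOUT }

    private RerunStatusFilter() { }

    /**
     * Parses a filter such as "status=KILLED;status=FAILED" into a set of statuses.
     * Throws IllegalArgumentException for an empty filter, an unknown key, or an unknown value.
     */
    static Set<CompletedStatus> parse(String filter) {
        if (filter == null || filter.trim().isEmpty()) {
            throw new IllegalArgumentException("Filter must not be empty");
        }
        Set<CompletedStatus> result = EnumSet.noneOf(CompletedStatus.class);
        for (String token : filter.split(";")) {
            String[] pair = token.trim().split("=", 2);
            if (pair.length != 2 || !pair[0].trim().equalsIgnoreCase("status")) {
                throw new IllegalArgumentException("Expected status=<VALUE>, got: " + token);
            }
            String value = pair[1].trim();
            try {
                // Accept lower-case input such as "killed" by normalizing the case first.
                result.add(CompletedStatus.valueOf(value.toUpperCase(Locale.ROOT)));
            } catch (IllegalArgumentException e) {
                throw new IllegalArgumentException("Invalid status '" + value
                        + "'; valid values are " + EnumSet.allOf(CompletedStatus.class), e);
            }
        }
        return result;
    }
}
```

Keeping the accepted values in a single enum means the error message and the validation cannot drift apart, and a caller in the servlet and one in the local client would share the same rules.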
[jira] [Commented] (OOZIE-2030) Configuration properties from global section is not getting set in Hadoop job conf when using sub-workflow action in Oozie workflow.xml
[ https://issues.apache.org/jira/browse/OOZIE-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14582313#comment-14582313 ] Srikanth Sundarrajan commented on OOZIE-2030: - (minor nit) There are a few unused imports in the patch. [~shwethags], can you please close the previous review request? This should allow [~jaydeepvishwakarma] to create a new review request and upload these revisions. > Configuration properties from global section is not getting set in Hadoop job > conf when using sub-workflow action in Oozie workflow.xml > > > Key: OOZIE-2030 > URL: https://issues.apache.org/jira/browse/OOZIE-2030 > Project: Oozie > Issue Type: Bug > Components: action >Reporter: Peeyush Bishnoi >Assignee: Jaydeep Vishwakarma > Attachments: OOZIE-2030-v2.patch, OOZIE-2030-v3.patch, > OOZIE-2030-v4.patch, OOZIE-2030.patch > > > When submitting Oozie workflow with sub-workflow action and with global > section, configuration properties defined in global section is not getting > set in launched Hadoop job conf. But when we use Pig or MR action in > workflow.xml, configuration properties from global section set properly into > Hadoop job conf. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (OOZIE-2030) Configuration properties from global section is not getting set in Hadoop job conf when using sub-workflow action in Oozie workflow.xml
[ https://issues.apache.org/jira/browse/OOZIE-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14582309#comment-14582309 ] Srikanth Sundarrajan commented on OOZIE-2030: -
[~jaydeepvishwakarma], the context arg in Subworkflow should give you a handle to the parent workflow via {{context::getWorkflow()}}. Shouldn't you be accessing the global section of the parent workflow here instead of {{LiteWorkflowAppParser::parse()}}? Mutating a single property via {{jobConf.get(SubWorkflowActionExecutor.SUBWF_JOBCONF)}} isn't likely to work when subflow depth is more than one, as the same property will be overwritten.
{code}
XConfiguration subWorkflowConf = new XConfiguration();
Configuration parentConf = new XConfiguration(new StringReader(context.getWorkflow().getConf()));
if (eConf.getChild(("propagate-configuration"), ns) != null) {
    XConfiguration.copy(parentConf, subWorkflowConf);
}
{code}
Also, the test case isn't really testing the precedence of global overlays. The current test has mutually exclusive conf in the parent and subflow global sections, so it cannot assert that precedence is being honored correctly.
> Configuration properties from global section is not getting set in Hadoop job > conf when using sub-workflow action in Oozie workflow.xml > > > Key: OOZIE-2030 > URL: https://issues.apache.org/jira/browse/OOZIE-2030 > Project: Oozie > Issue Type: Bug > Components: action >Reporter: Peeyush Bishnoi >Assignee: Jaydeep Vishwakarma > Attachments: OOZIE-2030-v2.patch, OOZIE-2030-v3.patch, > OOZIE-2030-v4.patch, OOZIE-2030.patch > > > When submitting Oozie workflow with sub-workflow action and with global > section, configuration properties defined in global section is not getting > set in launched Hadoop job conf. But when we use Pig or MR action in > workflow.xml, configuration properties from global section set properly into > Hadoop job conf. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
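The point about precedence can be illustrated with plain maps standing in for the configuration objects. This is a sketch only: GlobalOverlaySketch and its methods are hypothetical, java.util.Map replaces Oozie's XConfiguration, and which layer should win (here, the sub-workflow's own global section over the value propagated from the parent) is an assumption made for the example rather than something the comment states.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for configuration overlays; plain maps instead of XConfiguration. */
final class GlobalOverlaySketch {

    private GlobalOverlaySketch() { }

    /** Returns a new map in which entries of higherPriority override entries of lowerPriority. */
    static Map<String, String> overlay(Map<String, String> lowerPriority,
                                       Map<String, String> higherPriority) {
        Map<String, String> merged = new LinkedHashMap<>(lowerPriority);
        merged.putAll(higherPriority);
        return merged;
    }

    /** Builds parent and subflow global sections that share one key, then merges them. */
    static Map<String, String> exampleWithSharedKey() {
        Map<String, String> parentGlobal = new LinkedHashMap<>();
        parentGlobal.put("queue.name", "parent-queue"); // key present in both layers
        parentGlobal.put("parent.only", "p");

        Map<String, String> subflowGlobal = new LinkedHashMap<>();
        subflowGlobal.put("queue.name", "subflow-queue"); // same key, different value
        subflowGlobal.put("subflow.only", "s");

        // Assumption: the sub-workflow's own global section takes priority.
        return overlay(parentGlobal, subflowGlobal);
    }
}
```

Because "queue.name" appears in both layers, an assertion on its merged value fails if the overlay order is reversed. With mutually exclusive keys, as in the current test, both orders produce the same result and the test passes either way.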
[jira] [Commented] (OOZIE-2020) Rerun all Failed/killed/timedout coordinator actions rather than specifying action numbers
[ https://issues.apache.org/jira/browse/OOZIE-2020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14582192#comment-14582192 ] Srikanth Sundarrajan commented on OOZIE-2020: - Hi [~nperiwal], Please upload the latest patch to review board. > Rerun all Failed/killed/timedout coordinator actions rather than specifying > action numbers > -- > > Key: OOZIE-2020 > URL: https://issues.apache.org/jira/browse/OOZIE-2020 > Project: Oozie > Issue Type: New Feature > Components: action >Reporter: Sreedish P S >Assignee: Narayan Periwal >Priority: Minor > Attachments: OOZIE-2020-v10.patch, OOZIE-2020-v11.patch, > OOZIE-2020-v12.patch, OOZIE-2020-v13.patch, OOZIE-2020-v14.patch, > OOZIE-2020-v15.patch, OOZIE-2020-v16.patch, OOZIE-2020-v17.patch, > OOZIE-2020-v8.patch, OOZIE-2020-v9.patch > > > Currently rerun of coordinator actions are made through coordinator id and > action numbers, this feature request is for rerunning all coordinator actions > by mentioning a particular state > for example : > oozie job -rerun coord-id -state killed -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (OOZIE-2251) Expose instrumental matrices in Realtime Graphing tool
[ https://issues.apache.org/jira/browse/OOZIE-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14582042#comment-14582042 ] Srikanth Sundarrajan commented on OOZIE-2251: - Thanks [~nperiwal], fix does address most of the comments from earlier review. Have a few more on the new patch, can you please check. > Expose instrumental matrices in Realtime Graphing tool > -- > > Key: OOZIE-2251 > URL: https://issues.apache.org/jira/browse/OOZIE-2251 > Project: Oozie > Issue Type: New Feature > Components: monitoring >Reporter: Jaydeep Vishwakarma >Assignee: Narayan Periwal > Attachments: OOZIE-2251-v0.patch, OOZIE-2251-v1.patch, > OOZIE-2251-v2.patch, OOZIE-2251-v3.patch, OOZIE-2251-v4.patch, > OOZIE-2251-v5.patch > > > We have been logging so many important matrices in oozie-instrumentation.log > . These information is very useful for oozie functional monitoring. But it is > always difficult to get the meaning from flat file. If we expose this > information on some graphing tool, We can get the lot of meaning out of it > and can take some actions based on it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (OOZIE-2259) Create a callback action
[ https://issues.apache.org/jira/browse/OOZIE-2259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573925#comment-14573925 ] Srikanth Sundarrajan commented on OOZIE-2259: -
A callback action can be quite useful. I have some questions relating to the proposal, though.
1. Would standard action-level retries be available for this? I am assuming they will be. Please confirm.
2. Host isn't adequate; you essentially need a URL comprising a scheme and an authority.
3. The method being queue/topic is misleading. I would suggest HTTP_GET, HTTP_POST, QUEUE_OFFER, TOPIC_PUBLISH to be explicit.
4. From the proposal it seems that it is not possible to include a POST body. That should actually be OK. I just wanted to hear your thoughts on that.
5. Would capture-output work for this action?
6. In the case of the HTTP methods you might get a response body. Will that be preserved should the user need it? In my view, that can be skipped too, as this is meant to serve as a callback notification.
7. Would this be a fire-and-forget action? Say you get an HTTP/400 back, what would the behavior be?
8. How is this proposed to be implemented? As an action performed through the launcher (via JavaActionExecutor), or something along the lines of FsActionExecutor/EmailActionExecutor?
> Create a callback action > - > > Key: OOZIE-2259 > URL: https://issues.apache.org/jira/browse/OOZIE-2259 > Project: Oozie > Issue Type: New Feature > Components: action >Reporter: Jaydeep Vishwakarma >Assignee: Jaydeep Vishwakarma > > Need an action to send notification to external server by oozie. We should be > able to do multiple types of callback, Currently I know jms and http call. It > should suppose to have capability to call different types of methods along > with n number of arguments. > The sample workflow with callback action > {code:xml} > > ... > > > [HOST] > [METHOD] > > [KEY][VALUE] > > ... > > ... > > ... 
> > {code} > HOST : by the host system can figure out if it is http or jms callback > action. System will send the notification to that host. > METHOD : it can be POST/GET/QUEUE/TOPIC -- This message was sent by Atlassian JIRA (v6.3.4#6332)
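Points 2 and 3 of the comment (a full URL instead of a host, and explicit method names) can be sketched as follows. This is an illustration under assumptions: CallbackMethod and CallbackTarget are hypothetical names and not an Oozie API, the four constants are the ones suggested in the comment, and treating only "http" and "https" as HTTP schemes (with anything else taken to be a messaging endpoint) is a simplification chosen for the example.

```java
import java.net.URI;
import java.util.Locale;

/** Hypothetical method names, following the suggestion in the comment; not an Oozie API. */
enum CallbackMethod {
    HTTP_GET, HTTP_POST, QUEUE_OFFER, TOPIC_PUBLISH;

    boolean isHttp() {
        return this == HTTP_GET || this == HTTP_POST;
    }
}

/** Hypothetical holder that validates a callback URL against the chosen method. */
final class CallbackTarget {
    final URI uri;
    final CallbackMethod method;

    CallbackTarget(String url, String methodName) {
        this.uri = URI.create(url);
        // A bare host such as "example.org" has neither a scheme nor an authority.
        if (uri.getScheme() == null || uri.getAuthority() == null) {
            throw new IllegalArgumentException(
                    "Callback URL needs a scheme and an authority: " + url);
        }
        this.method = CallbackMethod.valueOf(methodName.trim().toUpperCase(Locale.ROOT));

        String scheme = uri.getScheme().toLowerCase(Locale.ROOT);
        boolean httpScheme = scheme.equals("http") || scheme.equals("https");
        // Assumption: HTTP methods require an HTTP(S) URL, messaging methods anything else.
        if (httpScheme != method.isHttp()) {
            throw new IllegalArgumentException(
                    "Method " + method + " does not match scheme " + uri.getScheme());
        }
    }
}
```

For example, new CallbackTarget("http://example.org:8080/notify", "HTTP_POST") is accepted, while a bare "example.org" is rejected because the transport cannot be derived from it. The example host names are placeholders.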
[jira] [Commented] (OOZIE-2251) Expose instrumental matrices in Realtime Graphing tool
[ https://issues.apache.org/jira/browse/OOZIE-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572197#comment-14572197 ] Srikanth Sundarrajan commented on OOZIE-2251: - [~nperiwal], Have shared my comments on review board > Expose instrumental matrices in Realtime Graphing tool > -- > > Key: OOZIE-2251 > URL: https://issues.apache.org/jira/browse/OOZIE-2251 > Project: Oozie > Issue Type: New Feature > Components: monitoring >Reporter: Jaydeep Vishwakarma >Assignee: Narayan Periwal > Attachments: OOZIE-2251-v0.patch, OOZIE-2251-v1.patch, > OOZIE-2251-v2.patch > > > We have been logging so many important matrices in oozie-instrumentation.log > . These information is very useful for oozie functional monitoring. But it is > always difficult to get the meaning from flat file. If we expose this > information on some graphing tool, We can get the lot of meaning out of it > and can take some actions based on it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (OOZIE-2251) Expose instrumental matrices in Realtime Graphing tool
[ https://issues.apache.org/jira/browse/OOZIE-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14566343#comment-14566343 ] Srikanth Sundarrajan commented on OOZIE-2251: - Possible to upload this in reviewboard ? > Expose instrumental matrices in Realtime Graphing tool > -- > > Key: OOZIE-2251 > URL: https://issues.apache.org/jira/browse/OOZIE-2251 > Project: Oozie > Issue Type: New Feature > Components: monitoring >Reporter: Jaydeep Vishwakarma >Assignee: Narayan Periwal > Attachments: OOZIE-2251-v0.patch, OOZIE-2251-v1.patch, > OOZIE-2251-v2.patch > > > We have been logging so many important matrices in oozie-instrumentation.log > . These information is very useful for oozie functional monitoring. But it is > always difficult to get the meaning from flat file. If we expose this > information on some graphing tool, We can get the lot of meaning out of it > and can take some actions based on it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (OOZIE-2216) Aperiodic Data handling in oozie
[ https://issues.apache.org/jira/browse/OOZIE-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14537773#comment-14537773 ] Srikanth Sundarrajan commented on OOZIE-2216: - Some issue with JIRA. Sorry for the multiple posts. > Aperiodic Data handling in oozie > > > Key: OOZIE-2216 > URL: https://issues.apache.org/jira/browse/OOZIE-2216 > Project: Oozie > Issue Type: New Feature > Components: coordinator >Reporter: Jaydeep Vishwakarma >Assignee: Jaydeep Vishwakarma > Attachments: Oozie_aperiodic_data_handling.pdf > > > Currently Oozie scheduling works on periodic datasets. It does not have any > mechanism to handle aperiodic datasets, which doesn’t follow a fixed > schedule/frequency. > Use cases > When incoming dataset arrives with no fixed schedule. > Need to trigger the job based all data available since last run with a > possible cap on the max size to process in one run. > Try to avoid creating so many instances when you know input instances will be > very few. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (OOZIE-2216) Aperiodic Data handling in oozie
[ https://issues.apache.org/jira/browse/OOZIE-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14537771#comment-14537771 ] Srikanth Sundarrajan commented on OOZIE-2216: - How often is input checked before an action is created ? Question is specifically to see what kind of load is NN subjected to. > Aperiodic Data handling in oozie > > > Key: OOZIE-2216 > URL: https://issues.apache.org/jira/browse/OOZIE-2216 > Project: Oozie > Issue Type: New Feature > Components: coordinator >Reporter: Jaydeep Vishwakarma >Assignee: Jaydeep Vishwakarma > Attachments: Oozie_aperiodic_data_handling.pdf > > > Currently Oozie scheduling works on periodic datasets. It does not have any > mechanism to handle aperiodic datasets, which doesn’t follow a fixed > schedule/frequency. > Use cases > When incoming dataset arrives with no fixed schedule. > Need to trigger the job based all data available since last run with a > possible cap on the max size to process in one run. > Try to avoid creating so many instances when you know input instances will be > very few. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (OOZIE-2216) Aperiodic Data handling in oozie
[ https://issues.apache.org/jira/browse/OOZIE-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14532558#comment-14532558 ] Srikanth Sundarrajan commented on OOZIE-2216: - [~jaydeepvishwakarma], This would be a nice addition to Oozie. Looking at the design, the major shift you plan to bring about is to move from eager materialization followed by an input check to lazy materialization upon input availability, for coordinators that are marked as gating on aperiodic datasets. Seems simple enough. Perhaps you can share your thinking on these: 1. How frequent will the polling be when materialization is lazy (to gauge the effect this would have on the NN)? 2. What is the behavior when both periodic and aperiodic datasets are required for a coordinator? Is that supported? 3. How will this co-exist with the features outlined in OOZIE-1976? 4. You seem to imply that there would be no schema changes. Would you need any additional state maintained for this and, if so, where is that planned to be maintained? 5. Do you expect the DB to be loaded more than it is today? Thanks for taking this up. > Aperiodic Data handling in oozie > > > Key: OOZIE-2216 > URL: https://issues.apache.org/jira/browse/OOZIE-2216 > Project: Oozie > Issue Type: New Feature > Components: coordinator >Reporter: Jaydeep Vishwakarma >Assignee: Jaydeep Vishwakarma > Attachments: Oozie_aperiodic_data_handling.pdf > > > Currently Oozie scheduling works on periodic datasets. It does not have any > mechanism to handle aperiodic datasets, which doesn’t follow a fixed > schedule/frequency. > Use cases > When incoming dataset arrives with no fixed schedule. > Need to trigger the job based all data available since last run with a > possible cap on the max size to process in one run. > Try to avoid creating so many instances when you know input instances will be > very few. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (OOZIE-1536) Coordinator action reruns start a new workflow
[ https://issues.apache.org/jira/browse/OOZIE-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14154344#comment-14154344 ] Srikanth Sundarrajan commented on OOZIE-1536: - +1, Looks good to me. > Coordinator action reruns start a new workflow > -- > > Key: OOZIE-1536 > URL: https://issues.apache.org/jira/browse/OOZIE-1536 > Project: Oozie > Issue Type: Improvement >Reporter: Srikanth Sundarrajan >Assignee: Jaydeep Vishwakarma > > Coordinator action reruns start a new workflow and if existing workflow for > the action is in running state, the same is not checked. Coord rerun can > possibly do a workflow re-run to prevent this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (OOZIE-1536) Coordinator action reruns start a new workflow
[ https://issues.apache.org/jira/browse/OOZIE-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14077283#comment-14077283 ] Srikanth Sundarrajan commented on OOZIE-1536: - [~puru], I guess [~shwethags] is talking about the following scenario: an initial run of the coord action creates a workflow, and then a subsequent run creates another workflow. It is understood that coord re-run will make sure that the older workflow is not running before the second one starts. However, it appears that Workflow #1 can be run independently through a direct workflow re-run (not going via coord re-run), in which case you might see both workflows run, and the behavior is undefined. If there were a one-to-one correspondence between a coord action and a workflow, this problem might not occur. Makes sense? > Coordinator action reruns start a new workflow > -- > > Key: OOZIE-1536 > URL: https://issues.apache.org/jira/browse/OOZIE-1536 > Project: Oozie > Issue Type: Improvement >Reporter: Srikanth Sundarrajan > > Coordinator action reruns start a new workflow and if existing workflow for > the action is in running state, the same is not checked. Coord rerun can > possibly do a workflow re-run to prevent this. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (OOZIE-1533) Coordinator action materialization is too slow due to coarse job level locks
[ https://issues.apache.org/jira/browse/OOZIE-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13931400#comment-13931400 ] Srikanth Sundarrajan commented on OOZIE-1533: - [~rohini], Unless all coord actions are done, the status transit service shouldn't be updating the coord job, correct? Perhaps we should allow updates to the coord only via three routes (1. user action, 2. when all coord actions are in completed state, 3. materialization) to prevent StatusTransitService from playing god. {quote} One problem that needs to be addressed before this was that there are lot of places in code where coord job is updated {quote} Regarding the CoordActionInputCheckXCommand, you bring up a really important concern, but throttling it through a coord lock seems to bring down the throughput in general, and it might be useful to keep it free of this lock. We should look at options to perform bulk checks for input to improve the scalability of this operation without hurting the NN / DB. In practice I found that most commands resort to checking the coord status in verifyPrecondition(), so the odds of a coord action running while the coord is in killed state due to a user interrupt are negligible; however, the possibility does exist. {quote} Another thing is interrupt commands like coord kill, etc will not be processed earlier if the lock is changed to the action id. {quote} > Coordinator action materialization is too slow due to coarse job level locks > > > Key: OOZIE-1533 > URL: https://issues.apache.org/jira/browse/OOZIE-1533 > Project: Oozie > Issue Type: Improvement >Reporter: Srikanth Sundarrajan >Assignee: Srikanth Sundarrajan > Labels: locking > Attachments: OOZIE-1533.patch > > > Coord job level lock introduces high contention. Instead introduce coord > action level locking whenever appropriate -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (OOZIE-1533) Coordinator action materialization is too slow due to coarse job level locks
[ https://issues.apache.org/jira/browse/OOZIE-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Srikanth Sundarrajan updated OOZIE-1533: Attachment: OOZIE-1533.patch > Coordinator action materialization is too slow due to coarse job level locks > > > Key: OOZIE-1533 > URL: https://issues.apache.org/jira/browse/OOZIE-1533 > Project: Oozie > Issue Type: Improvement >Reporter: Srikanth Sundarrajan >Assignee: Srikanth Sundarrajan > Attachments: OOZIE-1533.patch > > > Coord job level lock introduces high contention. Instead introduce coord > action level locking whenever appropriate -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (OOZIE-1531) Add a blocking / synchronous option to oozie client
[ https://issues.apache.org/jira/browse/OOZIE-1531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Srikanth Sundarrajan reassigned OOZIE-1531: --- Assignee: Srikanth Sundarrajan (was: Bowen Zhang) > Add a blocking / synchronous option to oozie client > - > > Key: OOZIE-1531 > URL: https://issues.apache.org/jira/browse/OOZIE-1531 > Project: Oozie > Issue Type: New Feature >Reporter: Srikanth Sundarrajan >Assignee: Srikanth Sundarrajan > > Currently Oozie returns immediately after sending the request, there is not > warrantee that the request is correct or it has been done. > ASK: a client Java API that blocks until the submitted job is running, it has > been killed, etc. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (OOZIE-1699) Some of the commands submitted to Oozie internal queue are never executed
[ https://issues.apache.org/jira/browse/OOZIE-1699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13908136#comment-13908136 ] Srikanth Sundarrajan commented on OOZIE-1699: - The test in focus is org.apache.oozie.event.TestEventGeneration.testCoordinatorActionEvent > Some of the commands submitted to Oozie internal queue are never executed > - > > Key: OOZIE-1699 > URL: https://issues.apache.org/jira/browse/OOZIE-1699 > Project: Oozie > Issue Type: Bug >Reporter: Srikanth Sundarrajan >Assignee: Srikanth Sundarrajan > Attachments: OOZIE-1699-v1-no-prefix.patch, OOZIE-1699.patch > > > At scale, we are seeing issues with some command submitted to the command > queue in CallableQueueService aren't getting executed at all. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (OOZIE-1699) Some of the commands submitted to Oozie internal queue are never executed
[ https://issues.apache.org/jira/browse/OOZIE-1699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907999#comment-13907999 ] Srikanth Sundarrajan commented on OOZIE-1699: - Verified the failed test multiple times and there doesn't seem to be any regression. I am guessing that this test is generally flaky, based on what I saw with other test-patches https://builds.apache.org/job/oozie-trunk-precommit-build/1066/testReport/ (OOZIE-1698) https://builds.apache.org/job/oozie-trunk-precommit-build/1044/testReport/ (OOZIE-1681) ... > Some of the commands submitted to Oozie internal queue are never executed > - > > Key: OOZIE-1699 > URL: https://issues.apache.org/jira/browse/OOZIE-1699 > Project: Oozie > Issue Type: Bug >Reporter: Srikanth Sundarrajan >Assignee: Srikanth Sundarrajan > Attachments: OOZIE-1699-v1-no-prefix.patch, OOZIE-1699.patch > > > At scale, we are seeing issues with some command submitted to the command > queue in CallableQueueService aren't getting executed at all. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Assigned] (OOZIE-1533) Coordinator action materialization is too slow due to coarse job level locks
[ https://issues.apache.org/jira/browse/OOZIE-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Srikanth Sundarrajan reassigned OOZIE-1533: --- Assignee: Srikanth Sundarrajan > Coordinator action materialization is too slow due to coarse job level locks > > > Key: OOZIE-1533 > URL: https://issues.apache.org/jira/browse/OOZIE-1533 > Project: Oozie > Issue Type: Improvement >Reporter: Srikanth Sundarrajan >Assignee: Srikanth Sundarrajan > > Coord job level lock introduces high contention. Instead introduce coord > action level locking whenever appropriate -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (OOZIE-1533) Coordinator action materialization is too slow due to coarse job level locks
[ https://issues.apache.org/jira/browse/OOZIE-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13906783#comment-13906783 ] Srikanth Sundarrajan commented on OOZIE-1533: -

Currently locks are being held for various coord-action-commands as follows:

||Command||Lock (entity-key)||
|CoordActionCheckXCommand|coord-action-id|
|CoordActionInfoXCommand|no-locks|
|CoordActionInputCheckXCommand|coord-job-id|
|CoordActionMaterializeCommand|RANDOM("coord_action_mater" + UUID())|
|CoordActionNotificationXCommand|RANDOM("coord_action_notification" + UUID())|
|CoordActionReadyXCommand|coord-job-id|
|CoordActionsKillXCommand|coord-job-id|
|CoordActionStartXCommand|coord-job-id|
|CoordActionTimeOutXCommand|coord-action-id|
|CoordActionUpdatePushMissingDependency|coord-action-id|
|CoordActionUpdateXCommand|coord-job-id|

I intend to put up a patch changing locks for the following commands:

||Command||Lock (entity-key)||
|CoordActionInputCheckXCommand|coord-action-id|
|CoordActionReadyXCommand|coord-action-id|
|CoordActionStartXCommand|coord-action-id|
|CoordActionUpdateXCommand|coord-action-id|

It seems these commands were using coord-job-id level locks to prevent starting an action when the parent coord is in killed or paused state. But from a correctness standpoint, performing these commands while the coord is in killed / paused state has no impact, except perhaps in CoordActionStartXCommand. Meanwhile, holding the lock at the coord-job-id level isn't all that helpful, as it unnecessarily forces serial execution of independent coord-action commands that each work only on their own action. Are there any concerns?

> Coordinator action materialization is too slow due to coarse job level locks > > > Key: OOZIE-1533 > URL: https://issues.apache.org/jira/browse/OOZIE-1533 > Project: Oozie > Issue Type: Improvement >Reporter: Srikanth Sundarrajan > > Coord job level lock introduces high contention. Instead introduce coord > action level locking whenever appropriate -- This message was sent by Atlassian JIRA (v6.1.5#6160)
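The effect of the entity key on contention can be illustrated with a minimal sketch. This is not Oozie's lock service: `EntityLocks`, `LockGranularityDemo`, and the IDs used are hypothetical names introduced only for illustration, assuming a lock is looked up by a string entity key as the table in the comment suggests.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for a lock service keyed by entity key (not Oozie's class).
class EntityLocks {
    private final ConcurrentMap<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    // One lock per distinct entity key; the same key always maps to the same lock.
    ReentrantLock lockFor(String entityKey) {
        return locks.computeIfAbsent(entityKey, k -> new ReentrantLock());
    }
}

class LockGranularityDemo {
    // True when two commands using these entity keys would serialize on one lock.
    static boolean contend(EntityLocks locks, String keyA, String keyB) {
        return locks.lockFor(keyA) == locks.lockFor(keyB);
    }

    public static void main(String[] args) {
        EntityLocks locks = new EntityLocks();
        String jobId = "coord-job-1";      // made-up coord job id
        String action1 = jobId + "@1";     // made-up coord action ids
        String action2 = jobId + "@2";

        // Job-level key: commands for different actions share a single lock.
        System.out.println("job-level contention: " + contend(locks, jobId, jobId));
        // Action-level key: commands for different actions use different locks.
        System.out.println("action-level contention: " + contend(locks, action1, action2));
    }
}
```

With a job-level key every command for the coordinator queues behind one lock; with action-level keys only commands for the same action serialize, which is the behavior the proposed change is after.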
[jira] [Updated] (OOZIE-1699) Some of the commands submitted to Oozie internal queue are never executed
[ https://issues.apache.org/jira/browse/OOZIE-1699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Srikanth Sundarrajan updated OOZIE-1699: Attachment: OOZIE-1699-v1-no-prefix.patch Attaching patch with --no-prefix as suggested by [~shwethags]. Thanks > Some of the commands submitted to Oozie internal queue are never executed > - > > Key: OOZIE-1699 > URL: https://issues.apache.org/jira/browse/OOZIE-1699 > Project: Oozie > Issue Type: Bug >Reporter: Srikanth Sundarrajan >Assignee: Srikanth Sundarrajan > Attachments: OOZIE-1699-v1-no-prefix.patch, OOZIE-1699.patch > > > At scale, we are seeing issues with some command submitted to the command > queue in CallableQueueService aren't getting executed at all. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (OOZIE-1699) Some of the commands submitted to Oozie internal queue are never executed
[ https://issues.apache.org/jira/browse/OOZIE-1699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13906770#comment-13906770 ] Srikanth Sundarrajan commented on OOZIE-1699: -

Patch does seem to apply alright. Am I missing something?

{code}
sriksun:oozie-trunk sriksun$ git pull -v --all
Fetching origin
From https://git-wip-us.apache.org/repos/asf/oozie
 = [up to date]      master       -> origin/master
 = [up to date]      ap-pages     -> origin/ap-pages
 = [up to date]      branch-3.1   -> origin/branch-3.1
 = [up to date]      branch-3.1.4 -> origin/branch-3.1.4
 = [up to date]      branch-3.2   -> origin/branch-3.2
 = [up to date]      branch-3.3   -> origin/branch-3.3
 = [up to date]      branch-4.0   -> origin/branch-4.0
 = [up to date]      hcat-intre   -> origin/hcat-intre
Already up-to-date.
sriksun:oozie-trunk sriksun$ curl "https://issues.apache.org/jira/secure/attachment/12630002/OOZIE-1699.patch" | git apply -v --check
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 10948  100 10948    0     0   4635      0  0:00:02  0:00:02 --:--:--  4637
Checking patch core/src/main/java/org/apache/oozie/service/CallableQueueService.java...
Checking patch core/src/main/java/org/apache/oozie/util/PollablePriorityDelayQueue.java...
Checking patch core/src/main/java/org/apache/oozie/util/PriorityDelayQueue.java...
Checking patch core/src/test/java/org/apache/oozie/service/TestCallableQueueService.java...
{code}

> Some of the commands submitted to Oozie internal queue are never executed > - > > Key: OOZIE-1699 > URL: https://issues.apache.org/jira/browse/OOZIE-1699 > Project: Oozie > Issue Type: Bug >Reporter: Srikanth Sundarrajan >Assignee: Srikanth Sundarrajan > Attachments: OOZIE-1699.patch > > > At scale, we are seeing issues with some command submitted to the command > queue in CallableQueueService aren't getting executed at all. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (OOZIE-1531) Add a blocking / synchronous option to oozie client
[ https://issues.apache.org/jira/browse/OOZIE-1531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13906651#comment-13906651 ] Srikanth Sundarrajan commented on OOZIE-1531: - Hi [~bowenzhangusa], Please do let me know if you are working on this, else I can provide a fix for this issue > Add a blocking / synchronous option to oozie client > - > > Key: OOZIE-1531 > URL: https://issues.apache.org/jira/browse/OOZIE-1531 > Project: Oozie > Issue Type: New Feature >Reporter: Srikanth Sundarrajan >Assignee: Bowen Zhang > > Currently Oozie returns immediately after sending the request, there is not > warrantee that the request is correct or it has been done. > ASK: a client Java API that blocks until the submitted job is running, it has > been killed, etc. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (OOZIE-1699) Some of the commands submitted to Oozie internal queue are never executed
[ https://issues.apache.org/jira/browse/OOZIE-1699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Srikanth Sundarrajan updated OOZIE-1699: Attachment: OOZIE-1699.patch > Some of the commands submitted to Oozie internal queue are never executed > - > > Key: OOZIE-1699 > URL: https://issues.apache.org/jira/browse/OOZIE-1699 > Project: Oozie > Issue Type: Bug >Reporter: Srikanth Sundarrajan >Assignee: Srikanth Sundarrajan > Attachments: OOZIE-1699.patch > > > At scale, we are seeing issues with some command submitted to the command > queue in CallableQueueService aren't getting executed at all. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (OOZIE-1699) Some of the commands submitted to Oozie internal queue are never executed
[ https://issues.apache.org/jira/browse/OOZIE-1699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13903809#comment-13903809 ] Srikanth Sundarrajan commented on OOZIE-1699: - Debugging this further, able to identify that there is an Exception in CallableWrapper::run() before removeFromUniqueCallables() is invoked, leaving command behind in uniqueCallables list. This prevents this item from getting added again into the queue and since the earlier run() failed, the command never gets executed till a server restart. > Some of the commands submitted to Oozie internal queue are never executed > - > > Key: OOZIE-1699 > URL: https://issues.apache.org/jira/browse/OOZIE-1699 > Project: Oozie > Issue Type: Bug >Reporter: Srikanth Sundarrajan >Assignee: Srikanth Sundarrajan > > At scale, we are seeing issues with some command submitted to the command > queue in CallableQueueService aren't getting executed at all. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
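The failure mode described in this comment can be reproduced in a few lines. The sketch below is a simplified, hypothetical model (the class `UniqueQueueSketch` and its methods are made up for illustration and are not the code in CallableQueueService, nor necessarily what the attached patch does); it only assumes that a key is added to a uniqueness set on submission and removed after the command runs.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified model of a queue that rejects duplicate pending commands.
class UniqueQueueSketch {
    private final Set<String> uniqueCallables = ConcurrentHashMap.newKeySet();

    // Returns false if a command with the same key is still registered as pending.
    boolean offer(String key) {
        return uniqueCallables.add(key);
    }

    // Leaky shape: an exception thrown by the command skips the cleanup,
    // so the key stays registered and later offers are rejected.
    void runLeaky(String key, Runnable command) {
        command.run();
        uniqueCallables.remove(key);
    }

    // Safer shape: the key is removed even when the command throws.
    void runSafe(String key, Runnable command) {
        try {
            command.run();
        } finally {
            uniqueCallables.remove(key);
        }
    }
}
```

In the leaky variant, a single failed run is enough to block the same command key until the set is cleared, which matches the "never gets executed till a server restart" symptom. Moving the cleanup into a `finally` block is one way to avoid that.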
[jira] [Commented] (OOZIE-1699) Some of the commands submitted to Oozie internal queue are never executed
[ https://issues.apache.org/jira/browse/OOZIE-1699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13903808#comment-13903808 ] Srikanth Sundarrajan commented on OOZIE-1699: -

I do find many uncaught exceptions from Oozie captured in the catalina.out file.

{code}
Exception in thread "pool-2-thread-22" java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "pool-2-thread-19" java.lang.IllegalMonitorStateException
...
    at org.apache.oozie.util.PollablePriorityDelayQueue.poll(PollablePriorityDelayQueue.java:80)
...
Exception in thread "pool-2-thread-24" java.lang.IllegalStateException: queueElement already in a queue
    at org.apache.oozie.util.PriorityDelayQueue.offer(PriorityDelayQueue.java:347)
{code}

It looks like these threads have died and the ThreadPoolExecutor has created new threads to replace them.

> Some of the commands submitted to Oozie internal queue are never executed > - > > Key: OOZIE-1699 > URL: https://issues.apache.org/jira/browse/OOZIE-1699 > Project: Oozie > Issue Type: Bug >Reporter: Srikanth Sundarrajan >Assignee: Srikanth Sundarrajan > > At scale, we are seeing issues with some command submitted to the command > queue in CallableQueueService aren't getting executed at all. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (OOZIE-1699) Some of the commands submitted to Oozie internal queue are never executed
Srikanth Sundarrajan created OOZIE-1699: --- Summary: Some of the commands submitted to Oozie internal queue are never executed Key: OOZIE-1699 URL: https://issues.apache.org/jira/browse/OOZIE-1699 Project: Oozie Issue Type: Bug Reporter: Srikanth Sundarrajan Assignee: Srikanth Sundarrajan At scale, we are seeing issues with some command submitted to the command queue in CallableQueueService aren't getting executed at all. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (OOZIE-1532) Purging should remove completed children job for long running coordinator jobs
[ https://issues.apache.org/jira/browse/OOZIE-1532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13898820#comment-13898820 ] Srikanth Sundarrajan commented on OOZIE-1532: - A default of 60 days for purging older wf_actions, workflows and coord_actions would be ideal. {quote} can you specify which config you want to add to the oozie-site.xml? {quote} > Purging should remove completed children job for long running coordinator jobs > -- > > Key: OOZIE-1532 > URL: https://issues.apache.org/jira/browse/OOZIE-1532 > Project: Oozie > Issue Type: New Feature >Reporter: Srikanth Sundarrajan >Assignee: Bowen Zhang > Attachments: oozie-1532.patch > > > Specifically, this is for long running coordinator jobs with high frequency. > all child workflows are never purged as the coord job is still running. > Oozie server configuration that indicates how many coordinator actions > frequency ticks to keep. By doing this it would be possible to purge running > coord jobs. By default this would not be enabled and the current logic would > remain. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (OOZIE-1532) Purging should remove completed children job for long running coordinator jobs
[ https://issues.apache.org/jira/browse/OOZIE-1532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13878496#comment-13878496 ] Srikanth Sundarrajan commented on OOZIE-1532: - Yes that is correct. Thanks for picking this up. Often times the Oozie DB is bloated up causing performance issues and this might be very useful. > Purging should remove completed children job for long running coordinator jobs > -- > > Key: OOZIE-1532 > URL: https://issues.apache.org/jira/browse/OOZIE-1532 > Project: Oozie > Issue Type: New Feature >Reporter: Srikanth Sundarrajan >Assignee: Bowen Zhang > > Specifically, this is for long running coordinator jobs with high frequency. > all child workflows are never purged as the coord job is still running. > Oozie server configuration that indicates how many coordinator actions > frequency ticks to keep. By doing this it would be possible to purge running > coord jobs. By default this would not be enabled and the current logic would > remain. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (OOZIE-1532) Purging should remove completed children job for long running coordinator jobs
[ https://issues.apache.org/jira/browse/OOZIE-1532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13877619#comment-13877619 ] Srikanth Sundarrajan commented on OOZIE-1532: - [~bowenzhangusa], Did you mean long running workflow or long running coordinator job ? > Purging should remove completed children job for long running coordinator jobs > -- > > Key: OOZIE-1532 > URL: https://issues.apache.org/jira/browse/OOZIE-1532 > Project: Oozie > Issue Type: New Feature >Reporter: Srikanth Sundarrajan >Assignee: Bowen Zhang > > Specifically, this is for long running coordinator jobs with high frequency. > all child workflows are never purged as the coord job is still running. > Oozie server configuration that indicates how many coordinator actions > frequency ticks to keep. By doing this it would be possible to purge running > coord jobs. By default this would not be enabled and the current logic would > remain. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (OOZIE-1532) Purging should remove completed children job for long running coordinator jobs
[ https://issues.apache.org/jira/browse/OOZIE-1532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13877621#comment-13877621 ] Srikanth Sundarrajan commented on OOZIE-1532: - Purge shouldn't be touching any running workflow, the feature request is to purge old coord actions of a long running coord job. > Purging should remove completed children job for long running coordinator jobs > -- > > Key: OOZIE-1532 > URL: https://issues.apache.org/jira/browse/OOZIE-1532 > Project: Oozie > Issue Type: New Feature >Reporter: Srikanth Sundarrajan >Assignee: Bowen Zhang > > Specifically, this is for long running coordinator jobs with high frequency. > all child workflows are never purged as the coord job is still running. > Oozie server configuration that indicates how many coordinator actions > frequency ticks to keep. By doing this it would be possible to purge running > coord jobs. By default this would not be enabled and the current logic would > remain. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (OOZIE-1531) Add a blocking / synchronous option to oozie client
[ https://issues.apache.org/jira/browse/OOZIE-1531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13848294#comment-13848294 ] Srikanth Sundarrajan commented on OOZIE-1531: - [~bowenzhangusa], The feature ask is generic for all Oozie operations and not restricted to workflow/coord or bundle creation (for creation, the id being returned is adequate). Ideally I would like the following behaviour for synchronous APIs. Bundle creation: return success only when the bundle and coord are valid and created. Coord creation: return success only when the coord definition is valid and created (it is not necessary for an action to materialize, let alone to report its status). Workflow creation: return success only when the workflow definition is valid and inited. Suspend (for all object types): return success only when the requested element is suspended successfully (which should include recursively suspending all the child objects). Resume (for all object types): return success only when the requested element and all child objects are resumed. Kill (for all object types): return success only when the requested element and child objects are killed. > Add a blocking / synchronous option to oozie client > - > > Key: OOZIE-1531 > URL: https://issues.apache.org/jira/browse/OOZIE-1531 > Project: Oozie > Issue Type: New Feature >Reporter: Srikanth Sundarrajan >Assignee: Bowen Zhang > > Currently Oozie returns immediately after sending the request, there is not > warrantee that the request is correct or it has been done. > ASK: a client Java API that blocks until the submitted job is running, it has > been killed, etc. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (OOZIE-1531) Add a blocking / synchronous option to oozie client
[ https://issues.apache.org/jira/browse/OOZIE-1531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13844976#comment-13844976 ] Srikanth Sundarrajan commented on OOZIE-1531: - [~bowenzhangusa], What essentially I was looking for was support in the oozie server to actually perform them synchronously as opposed to it getting dropped into a queue for further handling later. If that is difficult, the OozieClient should block till the action is successful or failed. In the event the status doesn't change, would like an affirmative response on whether the action was successful or not, which should be consistent with what actually happens in the system. In other words, OozieClient can't respond saying the action failed, while the server subsequently performs this action successfully or vice-versa. Thanks for picking this up. > Add a blocking / synchronous option to oozie client > - > > Key: OOZIE-1531 > URL: https://issues.apache.org/jira/browse/OOZIE-1531 > Project: Oozie > Issue Type: New Feature >Reporter: Srikanth Sundarrajan >Assignee: Bowen Zhang > > Currently Oozie returns immediately after sending the request, there is not > warrantee that the request is correct or it has been done. > ASK: a client Java API that blocks until the submitted job is running, it has > been killed, etc. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (OOZIE-1533) Coordinator action materialization is too slow due to coarse job level locks
[ https://issues.apache.org/jira/browse/OOZIE-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13798738#comment-13798738 ] Srikanth Sundarrajan commented on OOZIE-1533: - Hi [~chitnis], Coord job level locks for materialization are perfectly fine; however, action updates are also blocked, as they too are serialized through the coord job level lock. In practice, my observation is that when individual actions want to update their status as they progress from one state to another, these updates are required to acquire a coord job level lock. If action updates were instead to block only on the coord actions themselves, this would greatly improve the backlog catch-up scenarios without being unfair to any other coordinator or compromising the correctness of the system. > Coordinator action materialization is too slow due to coarse job level locks > > > Key: OOZIE-1533 > URL: https://issues.apache.org/jira/browse/OOZIE-1533 > Project: Oozie > Issue Type: Improvement >Reporter: Srikanth Sundarrajan > > Coord job level lock introduces high contention. Instead introduce coord > action level locking whenever appropriate -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (OOZIE-1531) Add a blocking / synchronous option to oozie client
[ https://issues.apache.org/jira/browse/OOZIE-1531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13768463#comment-13768463 ] Srikanth Sundarrajan commented on OOZIE-1531: - Notes from discussion with [~tucu00] offline: This could be done in the OozieClient Java API by using the current fire&forget methods followed by a wait-until logic with a timeout. Similar to what Hadoop JobClient does. > Add a blocking / synchronous option to oozie client > - > > Key: OOZIE-1531 > URL: https://issues.apache.org/jira/browse/OOZIE-1531 > Project: Oozie > Issue Type: New Feature >Reporter: Srikanth Sundarrajan > > Currently Oozie returns immediately after sending the request, there is not > warrantee that the request is correct or it has been done. > ASK: a client Java API that blocks until the submitted job is running, it has > been killed, etc. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
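The "fire and forget, then wait until, with a timeout" idea from these notes could look roughly like the sketch below. It is only an illustration under stated assumptions: `WaitUntil` and `waitForStatus` are hypothetical names, and the status source is an arbitrary `Supplier<String>` rather than any real OozieClient call, so that the example stays self-contained.

```java
import java.util.Optional;
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical helper: polls a status source until a target status is seen or time runs out.
class WaitUntil {
    // Returns the matching status, or Optional.empty() if the timeout elapsed first.
    static Optional<String> waitForStatus(Supplier<String> statusSource,
                                          Set<String> targetStatuses,
                                          long timeoutMillis,
                                          long pollIntervalMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            String status = statusSource.get();
            if (targetStatuses.contains(status)) {
                return Optional.of(status);
            }
            if (System.nanoTime() >= deadline) {
                return Optional.empty();
            }
            try {
                Thread.sleep(pollIntervalMillis);
            } catch (InterruptedException e) {
                // Preserve the interrupt flag and stop waiting.
                Thread.currentThread().interrupt();
                return Optional.empty();
            }
        }
    }
}
```

A caller would submit the request with the existing asynchronous method and then pass in a supplier that fetches the job's current status from the server. An empty result only means the client gave up waiting; it says nothing about whether the server eventually performs the operation, which is the consistency concern raised elsewhere in this thread.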
[jira] [Created] (OOZIE-1535) Update job properties for WF/COORD
Srikanth Sundarrajan created OOZIE-1535: --- Summary: Update job properties for WF/COORD Key: OOZIE-1535 URL: https://issues.apache.org/jira/browse/OOZIE-1535 Project: Oozie Issue Type: Improvement Reporter: Srikanth Sundarrajan It should be possible to update job submission properties for a running job, both for WF and COORD jobs. The updated properties would be used for all subsequent actions (not yet started). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (OOZIE-1534) Launcher job might run do hadoop attempt relaunch - possibly causing correctness issues
[ https://issues.apache.org/jira/browse/OOZIE-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13768465#comment-13768465 ] Srikanth Sundarrajan commented on OOZIE-1534: - Notes from my discussion with [~tucu00] offline: This could be done if the log scavenger logic used to harvest MR jobs started by pig/hive/sqoop ran in real time (as opposed to after pig/hive/sqoop finishes) and the captured job IDs were written/fsynced to a file in HDFS in the action subdir. Then the action main class would look for this file containing job IDs at start time and, if it exists, kill all those jobs before proceeding. This would make the launcher job idempotent. > Launcher job might run do hadoop attempt relaunch - possibly causing > correctness issues > --- > > Key: OOZIE-1534 > URL: https://issues.apache.org/jira/browse/OOZIE-1534 > Project: Oozie > Issue Type: Improvement >Reporter: Srikanth Sundarrajan > > The section of the action allow to clean up the output dir. This is > not sufficient as MR jobs started by Pig/Hive may be still running.We should > look to kill child MR jobs if any before launching new ones. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
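The record-then-kill-on-restart idea in these notes can be sketched as follows. Everything here is an assumption made for illustration: `ChildJobLedger` is a hypothetical class, a local file via `java.nio.file` stands in for the file in HDFS, and the kill operation is passed in as a callback instead of calling any real Hadoop API.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical ledger of child job ids; a local file stands in for the HDFS file.
class ChildJobLedger {
    private final Path ledger;

    ChildJobLedger(Path ledger) {
        this.ledger = ledger;
    }

    // Convenience factory for experiments: places the ledger in a fresh temp directory.
    static ChildJobLedger inTempDir() {
        try {
            return new ChildJobLedger(Files.createTempDirectory("ledger").resolve("child-jobs.txt"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Called as soon as a child job id is captured; SYNC forces the write to storage.
    void record(String jobId) {
        try {
            Files.write(ledger, (jobId + "\n").getBytes(StandardCharsets.UTF_8),
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND, StandardOpenOption.SYNC);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Called at launcher start: kills jobs left by an earlier attempt, then clears the ledger.
    // Returns how many job ids were passed to the kill callback.
    int killLeftovers(Consumer<String> killJob) {
        try {
            if (!Files.exists(ledger)) {
                return 0;
            }
            List<String> ids = Files.readAllLines(ledger, StandardCharsets.UTF_8);
            int killed = 0;
            for (String id : ids) {
                if (!id.isEmpty()) {
                    killJob.accept(id);
                    killed++;
                }
            }
            Files.delete(ledger);
            return killed;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

On a first attempt the ledger does not exist, so nothing is killed; on a relaunched attempt, any job ids recorded by the previous attempt are killed before new work starts, which is what would make the launcher safe to retry.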
[jira] [Commented] (OOZIE-1537) Suspending a sub-workflow is not reflected in the parent workflow
[ https://issues.apache.org/jira/browse/OOZIE-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13768468#comment-13768468 ]

Srikanth Sundarrajan commented on OOZIE-1537:
---

Notes from my offline discussion with [~tucu00]:

We could introduce a SUSPEND status for WF actions, and the ActionExecutor would have a method indicating whether it is supported (no action other than sub-WF would support it). When a sub-WF is suspended, the parent WF action should be set to suspended and the parent WF job should be suspended. Resume should work in a similar way. If the parent is suspended, its actions should be suspended if they support it. This should work up/down to/from coord jobs as well. We still need to figure out how to zigzag when a sub-WF within a fork of two or more sub-WFs is suspended/resumed.

> Suspending a sub-workflow is not reflected in the parent workflow
> ---
>
> Key: OOZIE-1537
> URL: https://issues.apache.org/jira/browse/OOZIE-1537
> Project: Oozie
> Issue Type: Improvement
> Reporter: Srikanth Sundarrajan
>
> Suspending a sub-workflow is not reflected in the parent workflow, so you don't know what is going on. The status of the sub-workflow should be reflected in the parent workflow just as for any other action.
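One possible reading of the up/down propagation described above, as a small sketch. The `Node` class, `propagate` function and status constants are hypothetical and not part of Oozie. For simplicity a sub-WF action and its sub-WF job are modeled as one node, and the open fork question is answered in the simplest way: suspending or resuming any sub-WF affects the whole tree.

```python
from typing import List, Optional

RUNNING, SUSPENDED = "RUNNING", "SUSPENDED"


class Node:
    """A coord job, WF job, or WF action in the parent/child hierarchy."""

    def __init__(self, name: str, supports_suspend: bool = True) -> None:
        self.name = name
        self.supports_suspend = supports_suspend
        self.status = RUNNING
        self.parent: Optional["Node"] = None
        self.children: List["Node"] = []

    def add(self, child: "Node") -> "Node":
        child.parent = self
        self.children.append(child)
        return child


def _apply_down(node: Node, status: str) -> None:
    # Nodes that do not support suspend (and anything under them) are left alone.
    if not node.supports_suspend:
        return
    node.status = status
    for child in node.children:
        _apply_down(child, status)


def propagate(node: Node, status: str) -> None:
    """Suspend or resume `node`, its ancestors, and their supporting descendants."""
    top = node
    while top.parent is not None:
        top = top.parent  # walk up to the outermost job (e.g. the coord job)
    _apply_down(top, status)
```

With a tree `coord -> wf -> {sub-a, sub-b, pig}` where `pig` is created with `supports_suspend=False`, calling `propagate(sub_a, SUSPENDED)` marks `coord`, `wf`, `sub-a` and `sub-b` as suspended and leaves `pig` running.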
[jira] [Created] (OOZIE-1538) Coordinator actions concurrency control across coord jobs at user level
Srikanth Sundarrajan created OOZIE-1538:
---

Summary: Coordinator actions concurrency control across coord jobs at user level
Key: OOZIE-1538
URL: https://issues.apache.org/jira/browse/OOZIE-1538
Project: Oozie
Issue Type: Improvement
Reporter: Srikanth Sundarrajan

Currently coord action concurrency is controlled at the coord job level. A user with several coord jobs can still flood the cluster.
[jira] [Created] (OOZIE-1537) Suspending a sub-workflow is not reflected in the parent workflow
Srikanth Sundarrajan created OOZIE-1537:
---

Summary: Suspending a sub-workflow is not reflected in the parent workflow
Key: OOZIE-1537
URL: https://issues.apache.org/jira/browse/OOZIE-1537
Project: Oozie
Issue Type: Improvement
Reporter: Srikanth Sundarrajan

Suspending a sub-workflow is not reflected in the parent workflow, so you don't know what is going on. The status of the sub-workflow should be reflected in the parent workflow just as for any other action.
[jira] [Created] (OOZIE-1534) Launcher job might run do hadoop attempt relaunch - possibly causing correctness issues
Srikanth Sundarrajan created OOZIE-1534:
---

Summary: Launcher job might run do hadoop attempt relaunch - possibly causing correctness issues
Key: OOZIE-1534
URL: https://issues.apache.org/jira/browse/OOZIE-1534
Project: Oozie
Issue Type: Improvement
Reporter: Srikanth Sundarrajan

The section of the action allows cleaning up the output dir. This is not sufficient, as MR jobs started by Pig/Hive may still be running. We should kill any child MR jobs before launching new ones.
[jira] [Created] (OOZIE-1536) Coordinator action reruns start a new workflow
Srikanth Sundarrajan created OOZIE-1536:
---

Summary: Coordinator action reruns start a new workflow
Key: OOZIE-1536
URL: https://issues.apache.org/jira/browse/OOZIE-1536
Project: Oozie
Issue Type: Improvement
Reporter: Srikanth Sundarrajan

A coordinator action rerun starts a new workflow without checking whether the existing workflow for the action is still in the running state. To prevent this, a coord rerun could possibly do a workflow re-run instead.
[jira] [Commented] (OOZIE-1538) Coordinator actions concurrency control across coord jobs at user level
[ https://issues.apache.org/jira/browse/OOZIE-1538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13768471#comment-13768471 ]

Srikanth Sundarrajan commented on OOZIE-1538:
---

Notes from my offline discussion with [~tucu00]:

Use ZooKeeper for distributed countdown-locks/latches. Oozie could be integrated with ZooKeeper for this through a ZooKeeper URIHandler implementation, with the countdown-locks/latches modeled as additional datasets in the coordinator definition. The materialization of the action would then depend on the zk:// dataset being available.

> Coordinator actions concurrency control across coord jobs at user level
> ---
>
> Key: OOZIE-1538
> URL: https://issues.apache.org/jira/browse/OOZIE-1538
> Project: Oozie
> Issue Type: Improvement
> Reporter: Srikanth Sundarrajan
>
> Currently coord action concurrency is controlled at the coord job level. A user with several coord jobs can still flood the cluster.
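To make the intended effect concrete, here is a rough in-process sketch of a per-user slot counter gating action materialization. It does not use ZooKeeper or any Oozie API: the `UserSlots` class and its methods are hypothetical, and a real implementation would keep the counters in ZooKeeper so that all Oozie servers see them.

```python
import threading
from collections import defaultdict
from typing import DefaultDict


class UserSlots:
    """Caps how many coord actions one user may run at once, across all jobs."""

    def __init__(self, limit: int) -> None:
        self._limit = limit
        self._in_use: DefaultDict[str, int] = defaultdict(int)
        self._lock = threading.Lock()

    def try_acquire(self, user: str) -> bool:
        """Return True and take a slot if the user is under the limit."""
        with self._lock:
            if self._in_use[user] >= self._limit:
                return False  # the action stays unmaterialized for now
            self._in_use[user] += 1
            return True

    def release(self, user: str) -> None:
        """Give a slot back when one of the user's actions finishes."""
        with self._lock:
            if self._in_use[user] > 0:
                self._in_use[user] -= 1
```

In the dataset model from the comment, a successful `try_acquire` plays the role of the zk:// dataset being available.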
[jira] [Created] (OOZIE-1533) Coordinator action materialization is too slow due to coarse job level locks
Srikanth Sundarrajan created OOZIE-1533:
---

Summary: Coordinator action materialization is too slow due to coarse job level locks
Key: OOZIE-1533
URL: https://issues.apache.org/jira/browse/OOZIE-1533
Project: Oozie
Issue Type: Improvement
Reporter: Srikanth Sundarrajan

The coord-job-level lock introduces high contention. Introduce coord-action-level locking instead, wherever appropriate.
[jira] [Created] (OOZIE-1532) Purging should remove completed children job for long running jobs
Srikanth Sundarrajan created OOZIE-1532:
---

Summary: Purging should remove completed children job for long running jobs
Key: OOZIE-1532
URL: https://issues.apache.org/jira/browse/OOZIE-1532
Project: Oozie
Issue Type: New Feature
Reporter: Srikanth Sundarrajan

Specifically, this is for long-running coordinator jobs with a high frequency. Their child workflows are never purged, because the coord job is still running. Proposal: an Oozie server configuration that indicates how many coordinator action frequency ticks to keep. With this it would be possible to purge the completed children of running coord jobs. By default this would not be enabled and the current logic would remain.
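A minimal sketch of the selection rule this proposal implies, assuming that "ticks to keep" means the most recent N coordinator actions are retained. `CoordAction`, `select_purgeable`, `keep_ticks` and the `TERMINAL` status set are hypothetical names chosen for illustration, not Oozie configuration or API.

```python
from typing import List, NamedTuple, Optional

# Assumption: these statuses count as "completed" for purge purposes.
TERMINAL = {"SUCCEEDED", "KILLED", "FAILED"}


class CoordAction(NamedTuple):
    number: int  # position of the action within its coord job
    status: str


def select_purgeable(actions: List[CoordAction],
                     keep_ticks: Optional[int] = None) -> List[CoordAction]:
    """Return completed actions older than the newest `keep_ticks` actions."""
    if keep_ticks is None:
        return []  # feature disabled: nothing under a running coord job is purged
    if keep_ticks < 0:
        raise ValueError("keep_ticks must be >= 0")
    ordered = sorted(actions, key=lambda a: a.number)
    older = ordered[:-keep_ticks] if keep_ticks > 0 else ordered
    return [a for a in older if a.status in TERMINAL]
```

Leaving `keep_ticks` unset mirrors the "not enabled by default" behaviour: nothing is selected and the current purge logic is unchanged.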
[jira] [Created] (OOZIE-1531) Add a blocking / synchronous option to oozie client
Srikanth Sundarrajan created OOZIE-1531:
---

Summary: Add a blocking / synchronous option to oozie client
Key: OOZIE-1531
URL: https://issues.apache.org/jira/browse/OOZIE-1531
Project: Oozie
Issue Type: New Feature
Reporter: Srikanth Sundarrajan

Currently Oozie returns immediately after the request is sent; there is no guarantee that the request is correct or that it has been carried out.

ASK: a client Java API that blocks until the submitted job is running, has been killed, etc.
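The requested blocking behaviour amounts to polling the job status until it leaves its initial state. The sketch below shows that loop in Python for brevity; it is not the Oozie client API. `wait_until_started` and the `get_status` callback are hypothetical, and treating `PREP` as the only "not yet started" status is an assumption of this sketch.

```python
import time
from typing import Callable


def wait_until_started(get_status: Callable[[str], str],
                       job_id: str,
                       timeout_s: float = 60.0,
                       poll_s: float = 1.0,
                       sleep: Callable[[float], None] = time.sleep) -> str:
    """Block until the job leaves PREP, then return the status it reached."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status(job_id)
        if status != "PREP":
            return status  # e.g. RUNNING, KILLED, FAILED
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still in PREP after {timeout_s}s")
        sleep(poll_s)
```

A caller would pass a function that asks the server for the job's current status. The `sleep` parameter exists only so that the wait can be replaced in tests.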
[jira] [Commented] (OOZIE-674) resolveInstanceRange doesn't work for EL extensions
[ https://issues.apache.org/jira/browse/OOZIE-674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13682040#comment-13682040 ]

Srikanth Sundarrajan commented on OOZIE-674:
---

Yes, please. It would be good if this shipped in the very next release. BTW, when is 4.0 expected to ship?

> resolveInstanceRange doesn't work for EL extensions
> ---
>
> Key: OOZIE-674
> URL: https://issues.apache.org/jira/browse/OOZIE-674
> Project: Oozie
> Issue Type: Bug
> Affects Versions: trunk
> Reporter: Shwetha G S
> Assignee: Shwetha G S
> Labels: EL, extension
> Fix For: trunk
> Attachments: OOZIE-674.patch, OOZIE-674-v3.patch, OOZIE-674-v4.patch, OOZIE-674-v5.patch, OOZIE-674-v6.patch, OOZIE-674-ver2.patch
>
> I have an EL extension today(0,0) which maps to the start of the day of the nominal time. It is used to specify startInstance, endInstance and instance in dataIn and dataOut of the coordinator.
> In CoordCommandUtils.resolveInstanceRange(), getInstanceNumber has to return the instance number with respect to current. So, for the coord-action-create-inst context, I have mapped today to current, and hence getInstanceNumber returns the correct number. But later in resolveInstanceRange(), getFuncType is called with the startInstance value, which is today in this case; it maps to UNEXPECTED and throws an exception. getFuncType should be passed the evaluation of the coord-action-create-inst context.
[jira] [Commented] (OOZIE-674) resolveInstanceRange doesn't work for EL extensions
[ https://issues.apache.org/jira/browse/OOZIE-674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13679002#comment-13679002 ]

Srikanth Sundarrajan commented on OOZIE-674:
---

Can this be ported to 3.3.2 as well?

> resolveInstanceRange doesn't work for EL extensions
> ---
>
> Key: OOZIE-674
> URL: https://issues.apache.org/jira/browse/OOZIE-674
> Project: Oozie
> Issue Type: Bug
> Affects Versions: trunk
> Reporter: Shwetha G S
> Assignee: Shwetha G S
> Labels: EL, extension
> Fix For: trunk
> Attachments: OOZIE-674.patch, OOZIE-674-v3.patch, OOZIE-674-v4.patch, OOZIE-674-v5.patch, OOZIE-674-v6.patch, OOZIE-674-ver2.patch
>
> I have an EL extension today(0,0) which maps to the start of the day of the nominal time. It is used to specify startInstance, endInstance and instance in dataIn and dataOut of the coordinator.
> In CoordCommandUtils.resolveInstanceRange(), getInstanceNumber has to return the instance number with respect to current. So, for the coord-action-create-inst context, I have mapped today to current, and hence getInstanceNumber returns the correct number. But later in resolveInstanceRange(), getFuncType is called with the startInstance value, which is today in this case; it maps to UNEXPECTED and throws an exception. getFuncType should be passed the evaluation of the coord-action-create-inst context.