[jira] [Commented] (OOZIE-1391) Sub wf suspend doesn't update parent wf

2014-09-30 Thread Shwetha G S (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-1391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14153048#comment-14153048
 ] 

Shwetha G S commented on OOZIE-1391:


+1 pending jenkins. Triggered 
https://builds.apache.org/job/oozie-trunk-precommit-build/2011/

> Sub wf suspend doesn't update parent wf
> ---
>
> Key: OOZIE-1391
> URL: https://issues.apache.org/jira/browse/OOZIE-1391
> Project: Oozie
> Issue Type: Bug
> Reporter: Shwetha G S
> Assignee: Jaydeep Vishwakarma
> Attachments: OOZIE-1391-1.patch, OOZIE-1391.patch, suspend-1.patch, 
> suspend.patch
>
>
> If a workflow contains a sub-workflow and the sub-workflow gets suspended,
> the parent workflow stays in the RUNNING state. The parent workflow should
> also move to the SUSPENDED state.
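
The expected behaviour described in the issue can be sketched in Java as follows. This is only an illustration under simplifying assumptions, not the attached patch: WorkflowSketch, its Status enum and suspend() are hypothetical names, no real Oozie classes are used, and a parent with several parallel sub-workflows is not modelled.

```java
// Hypothetical model of the expected behaviour; these are not Oozie's classes.
final class WorkflowSketch {
    enum Status { RUNNING, SUSPENDED }

    final String id;
    final WorkflowSketch parent; // null for a top-level workflow
    Status status = Status.RUNNING;

    WorkflowSketch(String id, WorkflowSketch parent) {
        this.id = id;
        this.parent = parent;
    }

    /** Suspends this workflow and propagates the suspension to each running ancestor. */
    void suspend() {
        status = Status.SUSPENDED;
        if (parent != null && parent.status == Status.RUNNING) {
            parent.suspend();
        }
    }

    public static void main(String[] args) {
        WorkflowSketch parentWf = new WorkflowSketch("parent-wf", null);
        WorkflowSketch subWf = new WorkflowSketch("sub-wf", parentWf);
        subWf.suspend();
        System.out.println(parentWf.id + " is " + parentWf.status);
    }
}
```

With this rule, suspending the sub-workflow leaves no ancestor in the RUNNING state, which is the outcome the issue asks for.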



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (OOZIE-1391) Sub wf suspend doesn't update parent wf

2014-09-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-1391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14153113#comment-14153113
 ] 

Hadoop QA commented on OOZIE-1391:
--

Testing JIRA OOZIE-1391

Cleaning local git workspace



{color:green}+1 PATCH_APPLIES{color}
{color:green}+1 CLEAN{color}
{color:green}+1 RAW_PATCH_ANALYSIS{color}
.{color:green}+1{color} the patch does not introduce any @author tags
.{color:green}+1{color} the patch does not introduce any tabs
.{color:green}+1{color} the patch does not introduce any trailing spaces
.{color:green}+1{color} the patch does not introduce any line longer than 
132
.{color:green}+1{color} the patch does adds/modifies 2 testcase(s)
{color:green}+1 RAT{color}
.{color:green}+1{color} the patch does not seem to introduce new RAT 
warnings
{color:green}+1 JAVADOC{color}
.{color:green}+1{color} the patch does not seem to introduce new Javadoc 
warnings
{color:green}+1 COMPILE{color}
.{color:green}+1{color} HEAD compiles
.{color:green}+1{color} patch compiles
.{color:green}+1{color} the patch does not seem to introduce new javac 
warnings
{color:green}+1 BACKWARDS_COMPATIBILITY{color}
.{color:green}+1{color} the patch does not change any JPA 
Entity/Colum/Basic/Lob/Transient annotations
.{color:green}+1{color} the patch does not modify JPA files
{color:red}-1 TESTS{color}
.Tests run: 1533
.Tests failed: 2
.Tests errors: 0

.The patch failed the following testcases:

.  testRequeueOnException(org.apache.oozie.command.coord.TestCoordPushDependencyCheckXCommand)
.  testNone(org.apache.oozie.command.coord.TestCoordActionInputCheckXCommand)

{color:green}+1 DISTRO{color}
.{color:green}+1{color} distro tarball builds with the patch 


{color:red}*-1 Overall result, please check the reported -1(s)*{color}


The full output of the test-patch run is available at

.   https://builds.apache.org/job/oozie-trunk-precommit-build/2011/



Build failed in Jenkins: oozie-trunk-precommit-build #2011

2014-09-30 Thread Apache Jenkins Server
See 

--
[...truncated 10377 lines...]
[INFO] share/lib already added, skipping
[INFO]  already added, skipping
[INFO] share already added, skipping
[INFO] share/lib already added, skipping
[INFO]  already added, skipping
[INFO] share already added, skipping
[INFO] share/lib already added, skipping
[INFO]  already added, skipping
[INFO] share already added, skipping
[INFO] share/lib already added, skipping
[INFO] 
[INFO] 
[INFO] Building Apache Oozie Tools 4.2.0-SNAPSHOT
[INFO] 
[INFO] 
[INFO] --- maven-resources-plugin:2.5:resources (default-resources) @ 
oozie-tools ---
[debug] execute contextualize
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory 

[INFO] 
[INFO] --- maven-compiler-plugin:2.3.2:compile (default-compile) @ oozie-tools 
---
[INFO] Nothing to compile - all classes are up to date
[INFO] 
[INFO] --- maven-resources-plugin:2.5:testResources (default-testResources) @ 
oozie-tools ---
[debug] execute contextualize
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] Copying 2 resources
[INFO] 
[INFO] --- maven-compiler-plugin:2.3.2:testCompile (default-testCompile) @ 
oozie-tools ---
[INFO] Nothing to compile - all classes are up to date
[INFO] 
[INFO] --- maven-surefire-plugin:2.12.2:test (default-test) @ oozie-tools ---
[INFO] Tests are skipped.
[INFO] 
[INFO] --- maven-jar-plugin:2.3.1:jar (default-jar) @ oozie-tools ---
[INFO] Building jar: 

[INFO] 
[INFO] --- maven-assembly-plugin:2.2.1:single (default-cli) @ oozie-tools ---
[INFO] Reading assembly descriptor: ../src/main/assemblies/tools.xml
[WARNING] The following patterns were never triggered in this artifact 
exclusion filter:
o  '*:*:pom:*'

[INFO] Copying files to 

[WARNING] Assembly file: 

 is not a regular file (it may be a directory). It cannot be attached to the 
project build for installation or deployment.
[INFO] 
[INFO] 
[INFO] Building Apache Oozie MiniOozie 4.2.0-SNAPSHOT
[INFO] 
[INFO] 
[INFO] --- maven-resources-plugin:2.5:resources (default-resources) @ 
oozie-mini ---
[debug] execute contextualize
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory 

[INFO] 
[INFO] --- maven-compiler-plugin:2.3.2:compile (default-compile) @ oozie-mini 
---
[INFO] No sources to compile
[INFO] 
[INFO] --- maven-resources-plugin:2.5:testResources (default-testResources) @ 
oozie-mini ---
[debug] execute contextualize
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] Copying 3 resources
[INFO] 
[INFO] --- maven-compiler-plugin:2.3.2:testCompile (default-testCompile) @ 
oozie-mini ---
[INFO] Nothing to compile - all classes are up to date
[INFO] 
[INFO] --- maven-surefire-plugin:2.12.2:test (default-test) @ oozie-mini ---
[INFO] Tests are skipped.
[INFO] 
[INFO] --- maven-jar-plugin:2.3.1:jar (default-jar) @ oozie-mini ---
[WARNING] JAR will be empty - no content was marked for inclusion!
[INFO] Building jar: 

[INFO] 
[INFO] --- maven-assembly-plugin:2.2.1:single (default-cli) @ oozie-mini ---
[INFO] Reading assembly descriptor: src/main/assemblies/empty.xml
[INFO] 
[INFO] 
[INFO] Building Apache Oozie Distro 4.2.0-SNAPSHOT
[INFO] 
[INFO] 
[INFO] --- maven-resources-plugin:2.5:resources (default-resources) @ 
oozie-distro ---
[debug] execute contextualize
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory 

[INFO] 
[INFO] --- maven-compiler-plugin:2.3.2:compile (default-compile) @ oozie-distro 
---
[INFO] No sources to compile
[INFO] 
[INFO] --

[jira] [Commented] (OOZIE-1940) StatusTransitService has race condition

2014-09-30 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-1940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14153372#comment-14153372
 ] 

Rohini Palaniswamy commented on OOZIE-1940:
---

+1

> StatusTransitService has race condition
> ---
>
> Key: OOZIE-1940
> URL: https://issues.apache.org/jira/browse/OOZIE-1940
> Project: Oozie
> Issue Type: Bug
> Reporter: Purshotam Shah
> Assignee: Purshotam Shah
> Attachments: OOZIE-1940-V5.patch, OOZIE-1940-V6.patch, 
> OOZIE-1940-V7.patch, OOZIE-1940-V8.patch
>
>
> StatusTransitService doesn't acquire a lock while updating the DB.
> We noticed one such issue while doing HA testing, thanks to [~mchiang].
> We issued a change command to change the pause time, which was executed on
> one server. While the change command was running on that server, the other
> server started executing StatusTransitService.
> Server 1 log
> {code}
> 2014-07-16 17:28:05,268  INFO StatusTransitService$StatusTransitRunnable:539 
> [pool-1-thread-13] - USER[-] GROUP[-] Acquired lock for 
> [org.apache.oozie.service.StatusTransitService]
> 2014-07-16 17:28:09,694  INFO StatusTransitService$StatusTransitRunnable:539 
> [pool-1-thread-13] - USER[-] GROUP[-] Set coordinator job 
> [0011385-140716042555-oozie-oozi-C] status to 'SUCCEEDED' from 'RUNNING' 
> 2014-07-16 17:28:15,416  INFO StatusTransitService$StatusTransitRunnable:539 
> [pool-1-thread-13] - USER[-] GROUP[-] Released lock for 
> [org.apache.oozie.service.StatusTransitService]
> {code}
> Server 2 log
> {code}
> 2014-07-16 17:28:06,499 DEBUG CoordChangeXCommand:545 [http-0.0.0.0-4443-5] - 
> USER[hadoopqa] GROUP[users] TOKEN[] APP[coordB180] 
> JOB[0011385-140716042555-oozie-oozi-C] ACTION[-] New pause/end date is : Wed 
> Jul 16 17:30:00 UTC 2014 and last action number is : 3
> 2014-07-16 17:28:06,508  INFO CoordChangeXCommand:539 [http-0.0.0.0-4443-5] - 
> USER[hadoopqa] GROUP[users] TOKEN[] APP[coordB180] 
> JOB[0011385-140716042555-oozie-oozi-C] ACTION[-] ENDED CoordChangeXCommand 
> for jobId=0011385-140716042555-oozie-oozi-C
> {code}
> CoordMaterializeTransitionXCommand had created all actions (a few were in
> the WAITING state and a few in the RUNNING state) and set doneMaterialization
> to true.
> The change command deletes all waiting coord actions, keeping the 3
> RUNNING/SUCCEEDED actions, and resets doneMaterialization.
> StatusTransitService first loads a set of pending jobs and then, for each
> job, makes DB calls to check the coord action status. Coord jobs are loaded
> only once, at the beginning.
> This is what happened:
> 1. StatusTransitService loads the coord job, whose doneMaterialization is
> set to true, at 17:28:05,268 (server 1).
> 2. The change command deletes the waiting actions and resets
> doneMaterialization at 17:28:06,508 (server 2).
> 3. StatusTransitService loads the actions for the job at 17:28:09,694
> (server 1): only 3, all in SUCCEEDED status. It never reloads
> doneMaterialization.
> StatusTransitService then sets the job status to SUCCEEDED, because its stale
> copy still says doneMaterialization is true and all actions are SUCCEEDED.
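
The interleaving described above can be reproduced with a small in-memory Java model. This is a sketch under stated assumptions, not the committed patch: JobRow, changeCommand, transitWithStaleFlag and transitWithLock are hypothetical names, a ReentrantLock stands in for whatever job-level lock the servers share, and re-reading the flag under that lock is only one plausible way to close the race.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical in-memory model of the race; none of these are Oozie classes.
final class StatusTransitRaceSketch {
    enum ActionStatus { WAITING, SUCCEEDED }
    enum JobStatus { RUNNING, SUCCEEDED }

    /** Stand-in for the coordinator job row and its actions in the DB. */
    static final class JobRow {
        JobStatus status = JobStatus.RUNNING;
        boolean doneMaterialization = true;
        final List<ActionStatus> actions = new ArrayList<>();
        final ReentrantLock lock = new ReentrantLock(); // stand-in for a per-job lock
    }

    /** Builds a job with 3 SUCCEEDED and 2 WAITING actions, as in the report. */
    static JobRow newJob() {
        JobRow job = new JobRow();
        for (int i = 0; i < 3; i++) job.actions.add(ActionStatus.SUCCEEDED);
        for (int i = 0; i < 2; i++) job.actions.add(ActionStatus.WAITING);
        return job;
    }

    static boolean allSucceeded(JobRow job) {
        for (ActionStatus s : job.actions) {
            if (s != ActionStatus.SUCCEEDED) return false;
        }
        return true;
    }

    /** Server 2: the change command deletes waiting actions and resets the flag. */
    static void changeCommand(JobRow job) {
        job.lock.lock();
        try {
            job.actions.removeIf(s -> s == ActionStatus.WAITING);
            job.doneMaterialization = false;
        } finally {
            job.lock.unlock();
        }
    }

    /** Buggy variant: trusts the flag value that was read at the start of the run. */
    static void transitWithStaleFlag(JobRow job, boolean staleDoneMaterialization) {
        if (staleDoneMaterialization && allSucceeded(job)) {
            job.status = JobStatus.SUCCEEDED;
        }
    }

    /** Safer variant: takes the job lock and re-reads the flag before updating. */
    static void transitWithLock(JobRow job) {
        job.lock.lock();
        try {
            if (job.doneMaterialization && allSucceeded(job)) {
                job.status = JobStatus.SUCCEEDED;
            }
        } finally {
            job.lock.unlock();
        }
    }

    public static void main(String[] args) {
        JobRow job = newJob();
        boolean stale = job.doneMaterialization; // step 1: loaded once
        changeCommand(job);                      // step 2: runs on the other server
        transitWithStaleFlag(job, stale);        // step 3: decides from stale data
        System.out.println("stale read -> " + job.status);
    }
}
```

In the stale variant the job ends up SUCCEEDED even though materialization was reset; in the locked variant the fresh flag keeps it RUNNING.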





[jira] [Commented] (OOZIE-1940) StatusTransitService has race condition

2014-09-30 Thread Purshotam Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-1940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14153420#comment-14153420
 ] 

Purshotam Shah commented on OOZIE-1940:
---

Thanks Rohini and Mona for review. Committed to trunk.



[jira] [Commented] (OOZIE-1885) Query optimization for StatusTransitService

2014-09-30 Thread Purshotam Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-1885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14153423#comment-14153423
 ] 

Purshotam Shah commented on OOZIE-1885:
---

This was just an approach. Please refer to OOZIE-1940 for the query optimization.

> Query optimization for StatusTransitService
> ---
>
> Key: OOZIE-1885
> URL: https://issues.apache.org/jira/browse/OOZIE-1885
> Project: Oozie
> Issue Type: Bug
> Reporter: Purshotam Shah
>
> {code}
> private void coordTransit() throws JPAExecutorException, CommandException {
>     List<CoordinatorJobBean> pendingJobCheckList = null;
>     if (lastInstanceStartTime == null) {
>         LOG.info("Running coordinator status service first instance");
>         // this is the first instance, we need to check for all pending jobs;
>         pendingJobCheckList = jpaService.execute(new CoordJobsGetPendingJPAExecutor(limit));
>     }
>     else {
>         LOG.info("Running coordinator status service from last instance time =  "
>                 + DateUtils.formatDateOozieTZ(lastInstanceStartTime));
>         // this is not the first instance, we should only check jobs
>         // that have actions or jobs been
>         // updated >= start time of last service run;
>         List<CoordinatorActionBean> actionsList = CoordActionQueryExecutor.getInstance().getList(
>                 CoordActionQuery.GET_COORD_ACTIONS_BY_LAST_MODIFIED_TIME, lastInstanceStartTime);
>         Set<String> coordIds = new HashSet<String>();
>         for (CoordinatorActionBean action : actionsList) {
>             coordIds.add(action.getJobId());
>         }
>         pendingJobCheckList = new ArrayList<CoordinatorJobBean>();
>         for (String coordId : coordIds.toArray(new String[coordIds.size()])) {
>             CoordinatorJobBean coordJob;
>             try {
>                 coordJob = CoordJobQueryExecutor.getInstance().get(CoordJobQuery.GET_COORD_JOB, coordId);
>             }
>             catch (JPAExecutorException jpaee) {
>                 if (jpaee.getErrorCode().equals(ErrorCode.E0604)) {
>                     LOG.warn("Exception happened during StatusTransitRunnable; Coordinator Job doesn't exist", jpaee);
>                     continue;
>                 } else {
>                     throw jpaee;
>                 }
>             }
>             // Running coord job might have pending false
>             Job.Status coordJobStatus = coordJob.getStatus();
>             if ((coordJob.isPending() || coordJobStatus.equals(Job.Status.PAUSED)
>                     || coordJobStatus.equals(Job.Status.RUNNING)
>                     || coordJobStatus.equals(Job.Status.RUNNINGWITHERROR)
>                     || coordJobStatus.equals(Job.Status.PAUSEDWITHERROR))
>                     && !coordJobStatus.equals(Job.Status.IGNORED)) {
>                 pendingJobCheckList.add(coordJob);
>             }
>         }
>         pendingJobCheckList.addAll(CoordJobQueryExecutor.getInstance().getList(
>                 CoordJobQuery.GET_COORD_JOBS_CHANGED, lastInstanceStartTime));
>     }
>     aggregateCoordJobsStatus(pendingJobCheckList);
> }
> {code}
> This could be done in one SQL query, something like:
> select w.id, w.status, w.pending from CoordinatorJobBean w where
> w.startTimestamp <= :matTime AND (w.statusStr = 'PREP' OR w.statusStr =
> 'RUNNING' OR w.statusStr = 'RUNNINGWITHERROR' OR w.statusStr =
> 'PAUSEDWITHERROR') AND w.statusStr <> 'IGNORED' AND w.id in (select a.jobId
> from CoordinatorActionBean a where a.lastModifiedTimestamp >=
> :lastModifiedTime group by a.jobId)
> The same applies to bundleTransit().
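
The selection that the single query is meant to express can be sketched over in-memory lists. This is an illustration only: Job, Action and jobsToCheck are hypothetical stand-ins for the job and action tables, the status set is taken from the query text in the issue, and the startTimestamp condition is left out for brevity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical in-memory stand-ins for the job and action tables; not Oozie's beans.
final class SingleQuerySketch {
    static final class Job {
        final String id;
        final String status;
        Job(String id, String status) {
            this.id = id;
            this.status = status;
        }
    }

    static final class Action {
        final String jobId;
        final long lastModified;
        Action(String jobId, long lastModified) {
            this.jobId = jobId;
            this.lastModified = lastModified;
        }
    }

    // Statuses listed in the proposed query.
    static final Set<String> ACTIVE = new HashSet<>(
            Arrays.asList("PREP", "RUNNING", "RUNNINGWITHERROR", "PAUSEDWITHERROR"));

    /** Returns ids of active jobs that have at least one action modified at or after the cutoff. */
    static List<String> jobsToCheck(List<Job> jobs, List<Action> actions, long lastModifiedTime) {
        // Plays the role of the sub-select: distinct job ids of recently modified actions.
        Set<String> touched = new HashSet<>();
        for (Action a : actions) {
            if (a.lastModified >= lastModifiedTime) {
                touched.add(a.jobId);
            }
        }
        // Plays the role of the outer select: status filter plus "id in (...)".
        List<String> result = new ArrayList<>();
        for (Job j : jobs) {
            if (ACTIVE.contains(j.status) && touched.contains(j.id)) {
                result.add(j.id);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Job> jobs = Arrays.asList(new Job("c1", "RUNNING"), new Job("c2", "SUCCEEDED"));
        List<Action> actions = Arrays.asList(new Action("c1", 200L), new Action("c2", 200L));
        System.out.println(jobsToCheck(jobs, actions, 100L));
    }
}
```

The point of the proposal is that the database does both filters in one round trip, instead of the service fetching every touched job individually and filtering in Java.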





[jira] [Resolved] (OOZIE-1885) Query optimization for StatusTransitService

2014-09-30 Thread Purshotam Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/OOZIE-1885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Purshotam Shah resolved OOZIE-1885.
---
Resolution: Fixed
  Assignee: Purshotam Shah



[jira] [Commented] (OOZIE-1954) Add a way for the MapReduce action to be configured by Java code

2014-09-30 Thread Mona Chitnis (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-1954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14153721#comment-14153721
 ] 

Mona Chitnis commented on OOZIE-1954:
-

Good work Robert!

> Add a way for the MapReduce action to be configured by Java code
> 
>
> Key: OOZIE-1954
> URL: https://issues.apache.org/jira/browse/OOZIE-1954
> Project: Oozie
> Issue Type: New Feature
> Affects Versions: trunk
> Reporter: Robert Kanter
> Assignee: Robert Kanter
> Fix For: trunk
>
> Attachments: OOZIE-1954.patch, OOZIE-1954.patch, OOZIE-1954.patch
>
>
> With certain other components (e.g. Avro, HFileOutputFormat (HBase), etc.), it
> becomes impractical to use the MapReduce action and users must instead use
> the Java action. The problem is that these components require a lot of extra
> configuration that is often hidden from the user in Java code (e.g.
> HFileOutputFormat.configureIncrementalLoad(job, table);), which can also
> include decision logic, serialization, and other things that we can't do in
> an XML file directly.
> One way to solve this problem is to allow the user to give the MR action some 
> Java code that would do this configuration, similar to how we allow the 
> {{}} field to specify an external XML file of configuration 
> properties.
> In more detail, we could have an interface; something like this:
> {code}
> public interface OozieActionConfigurator {
>  public void updateOozieActionConfiguration(Configuration conf);
> }
> {code}
> that the user can implement, package in a jar, and include with their MR action
> (i.e. add a "{{}}" field that lets them specify the class
> name). To protect the Oozie server from running user code (which could do
> anything it wants really), it would have to be run in the Launcher Job. The
> Launcher Job could call this method after it loads the configuration prepared
> by the Oozie server.
> This would also help users who use the Java action to
> launch MR jobs and expect a bunch of things to be done for them that are not
> (e.g. delegation token propagation, config loading, returning the hadoop job
> to Oozie, etc). These are all done with the MR action, so the more users we
> can move to the MR action from the Java action, the less they'll run into
> these difficulties.
> Some of this may change slightly as I try to actually implement this (e.g. 
> have to handle throwing exceptions etc).  And one thing I may do is keep this 
> general enough that it should be compatible with all action types in case we 
> want to add this to any of them in the future; though for now, the schema 
> would only accept it for the MapReduce action.
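
A rough sketch of how the proposed hook might be used is shown below. It rests on several assumptions: java.util.Properties stands in for Hadoop's Configuration so the example needs no Hadoop jars, CompressionConfigurator and the example.* property names are invented for illustration, and applyConfigurator only imitates what the Launcher Job could do (load the user's class by name, then call it on the prepared configuration).

```java
import java.util.Properties;

// Sketch only: Properties stands in for Hadoop's Configuration,
// and every class and property name here is hypothetical.
final class ConfiguratorSketch {
    /** Shape of the proposed interface, with Properties substituted for Configuration. */
    interface OozieActionConfigurator {
        void updateOozieActionConfiguration(Properties conf);
    }

    /** Example user implementation with decision logic that a static XML file cannot express. */
    static final class CompressionConfigurator implements OozieActionConfigurator {
        @Override
        public void updateOozieActionConfiguration(Properties conf) {
            long inputBytes = Long.parseLong(conf.getProperty("example.input.bytes", "0"));
            boolean large = inputBytes > 1_000_000L;
            conf.setProperty("example.output.compress", Boolean.toString(large));
        }
    }

    /** Launcher-side step: load the configurator by class name and let it adjust the config. */
    static Properties applyConfigurator(String className, Properties prepared) {
        try {
            Object instance = Class.forName(className).getDeclaredConstructor().newInstance();
            ((OozieActionConfigurator) instance).updateOozieActionConfiguration(prepared);
            return prepared;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Could not run configurator " + className, e);
        }
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty("example.input.bytes", "5000000");
        applyConfigurator(CompressionConfigurator.class.getName(), conf);
        System.out.println(conf.getProperty("example.output.compress"));
    }
}
```

Loading the class reflectively from a name mirrors the idea of naming the class in the workflow definition, and running it outside the server keeps user code away from the Oozie server process.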





[jira] [Commented] (OOZIE-1940) StatusTransitService has race condition

2014-09-30 Thread Shwetha G S (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-1940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14154310#comment-14154310
 ] 

Shwetha G S commented on OOZIE-1940:


[~puru], do you want to add this to 0.6 as well?



[jira] [Commented] (OOZIE-1536) Coordinator action reruns start a new workflow

2014-09-30 Thread Shwetha G S (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14154321#comment-14154321
 ] 

Shwetha G S commented on OOZIE-1536:


Here is what we can do. This should work for everybody:
1. Coord action re-run takes an additional optional flag, -failed, which re-runs 
only the failed nodes. This re-runs the existing workflow with 
RERUN_FAIL_NODES=true, and it also honours coord concurrency.
2. Coord action re-run without the -failed option keeps the existing behaviour 
of launching a new workflow.
3. If a workflow has a sub-workflow and some actions in the sub-workflow have 
failed, a workflow re-run with RERUN_FAIL_NODES=true currently launches a new 
sub-workflow. Instead, it should re-run only the failed nodes of the 
sub-workflow.
4. Disable re-running a workflow directly if it has a parent. This applies 
whether the parent is a coord action or another workflow. The check should 
be controlled through an oozie-site config and should be disabled by default 
for backward compatibility.
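
The decision rules in points 1, 2 and 4 above could be sketched as follows. All names here (CoordRerunSketch, RerunPlan, planCoordActionRerun, directWorkflowRerunAllowed) are hypothetical and only model the proposal; they are not existing Oozie APIs, and point 3 is not covered.

```java
// Hypothetical decision logic for the proposal; names are illustrative, not Oozie APIs.
final class CoordRerunSketch {
    enum RerunPlan { RERUN_FAILED_NODES_OF_EXISTING_WORKFLOW, LAUNCH_NEW_WORKFLOW }

    /** Points 1 and 2: the optional -failed flag selects the re-run behaviour. */
    static RerunPlan planCoordActionRerun(boolean failedFlag) {
        return failedFlag
                ? RerunPlan.RERUN_FAILED_NODES_OF_EXISTING_WORKFLOW
                : RerunPlan.LAUNCH_NEW_WORKFLOW;
    }

    /**
     * Point 4: a direct workflow re-run is rejected only when the parent check
     * is enabled in the server config and the workflow actually has a parent.
     */
    static boolean directWorkflowRerunAllowed(String parentId, boolean parentCheckEnabled) {
        boolean hasParent = parentId != null && !parentId.isEmpty();
        return !(parentCheckEnabled && hasParent);
    }

    public static void main(String[] args) {
        System.out.println(planCoordActionRerun(true));
        System.out.println(directWorkflowRerunAllowed("some-parent-id", false));
    }
}
```

Keeping the parent check off by default means existing users who re-run child workflows directly see no change unless an administrator opts in.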

[~rohini], [~puru], [~sriksun], [~jaydeepmail], does this work for your 
usecases? Do you see any issues?

[~jaydeepmail], If everyone is ok with the approach, can you create sub-task 
for each of these and work on them separately? Thanks

> Coordinator action reruns start a new workflow
> --
>
> Key: OOZIE-1536
> URL: https://issues.apache.org/jira/browse/OOZIE-1536
> Project: Oozie
> Issue Type: Improvement
> Reporter: Srikanth Sundarrajan
> Assignee: Jaydeep Vishwakarma
>
> A coordinator action rerun starts a new workflow without checking whether the
> existing workflow for the action is still in the running state. The coord
> rerun could instead do a workflow re-run to prevent this.





[jira] [Commented] (OOZIE-1536) Coordinator action reruns start a new workflow

2014-09-30 Thread Srikanth Sundarrajan (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14154344#comment-14154344
 ] 

Srikanth Sundarrajan commented on OOZIE-1536:
-

+1, Looks good to me.



[jira] [Commented] (OOZIE-1536) Coordinator action reruns start a new workflow

2014-09-30 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14154359#comment-14154359
 ] 

Rohini Palaniswamy commented on OOZIE-1536:
---

LGTM as well.



[jira] [Commented] (OOZIE-1536) Coordinator action reruns start a new workflow

2014-09-30 Thread Jaydeep Vishwakarma (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14154381#comment-14154381
 ] 

Jaydeep Vishwakarma commented on OOZIE-1536:


+1.  I will create sub task for all items.
