Virag,

On sharelib being required, yes, you are correct. For this we should:
* Make it 100% clear in the quick-start/install docs that the sharelib is REQUIRED.
* Add a check at Oozie startup to verify that the sharelib directory exists, and fail to start otherwise.

On the sharelib issue during upgrade for in-flight jobs: depending on the type of upgrade, this may or may not be a problem.

* If you are upgrading to an Oozie server fix that does not change the sharelib files, this is not an issue and you can simply shut down the Oozie server with in-flight jobs.
* If you are upgrading to an Oozie server fix that involves sharelib changes, then you have 2 options:
** Cold upgrade: bring all WF jobs to suspension/completion, wait until all running actions end, then shut down the Oozie server and upgrade the Oozie server and sharelib. Then restart the Oozie server and resume the WF jobs. In this case all new WF actions will use the new sharelib.
** Hot upgrade: stop the Oozie server, modify the oozie-site.xml sharelib location to point to a new directory, upgrade the Oozie server, install the sharelib (this will be a create, as the new sharelib dir does not yet exist in HDFS), and start the Oozie server. In this case all running WF actions will continue running with no issues, as the JARs in the distributed cache have not been touched. All new WF actions will start using the new sharelib.

Note that this sharelib upgrade protocol is not introduced by requiring the sharelib; it is already required today if you have applications that use the sharelib.

Does this address your concerns?

Thanks.

On Tue, Apr 23, 2013 at 1:35 PM, Virag Kothari <[email protected]> wrote:

> Hi,
>
> With OOZIE-1311 and its subtasks, the idea seems to be to move all the
> launcher classes like PigMain, HiveMain etc. to their respective sharelibs.
> So now the shared lib is a mandatory deployment step. Before, the shared
> lib was optional, as users could bundle jars with their workflow application.
> So always requiring the shared lib seems to introduce 2 problems:
>
> 1. The current deployments which don't use the action shared lib will fail.
> So we should probably deprecate the current behavior.
>
> 2. The hadoop distributed cache mechanism will fail a job if the files in
> the DC are updated on HDFS while the hadoop job is running. So, when Oozie
> is restarted and the shared lib is uploaded to HDFS as part of deployment,
> hadoop will fail the existing jobs for which the timestamp of the file on
> HDFS doesn't match the timestamp of its copy in the job's DC.
>
> Thanks,
> Virag
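
For reference, the distributed-cache failure mode Virag describes in point 2 (and why the hot-upgrade option avoids it) can be sketched as follows. This is a simplified illustrative model, not Hadoop's actual code: the assumption is that Hadoop records each cache file's HDFS modification time at job submission and rejects the file if that timestamp later changes, which is why overwriting the sharelib in place breaks in-flight jobs while writing to a fresh directory does not.

```python
def check_cache_files(recorded, current):
    """Return the cache files whose HDFS timestamp no longer matches the
    timestamp recorded when the job was submitted (hypothetical model of
    Hadoop's distributed-cache timestamp validation)."""
    return [path for path, ts in recorded.items()
            if current.get(path) != ts]

# Timestamps recorded for an in-flight job at submission time
# (example path, for illustration only).
recorded = {"/user/oozie/share/lib/pig/pig.jar": 1000}

# Upgrading the sharelib in place rewrites the JAR, bumping its mtime;
# a hot upgrade into a NEW directory would leave this entry untouched.
current = {"/user/oozie/share/lib/pig/pig.jar": 2000}

stale = check_cache_files(recorded, current)
# A non-empty result means Hadoop would fail the in-flight job.
print(stale)
```

The hot-upgrade protocol sidesteps this entirely because the old directory, and therefore every recorded timestamp, is never modified.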

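The startup check proposed above could look roughly like the sketch below. All names here are hypothetical; the real implementation would be Java inside an Oozie service and would test the configured sharelib path via Hadoop's FileSystem API, with a callable standing in for the HDFS existence check in this sketch.

```python
class ServiceException(Exception):
    """Stand-in for an exception that aborts Oozie server startup."""

def verify_sharelib(libpath, exists):
    """Fail fast at startup if the sharelib directory is missing.

    'exists' is a callable path -> bool (e.g. fs.exists() on HDFS in a
    real implementation).
    """
    if not exists(libpath):
        raise ServiceException(
            "Sharelib not installed at %s; install the sharelib before "
            "starting the Oozie server" % libpath)
    return True

# Example: startup proceeds only when the directory is present.
installed_dirs = {"/user/oozie/share/lib"}
verify_sharelib("/user/oozie/share/lib", lambda p: p in installed_dirs)
```

Failing at startup rather than at action submission makes the missing-sharelib case an obvious, immediate deployment error instead of a per-job runtime failure.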