Virag,

Regarding the sharelib being required: yes, you are correct. To address this we should:

* Make it 100% clear in the quick-start/install docs that the sharelib is
REQUIRED.

* Add a check at Oozie startup that verifies the sharelib directory exists,
and fail to start otherwise.
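The startup check could look roughly like the sketch below. The class and method names are hypothetical, and it uses the local filesystem for illustration; the real check would go through Hadoop's FileSystem API against the HDFS sharelib path.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical fail-fast check, sketched against the local filesystem.
// In Oozie proper this would use org.apache.hadoop.fs.FileSystem to
// test the sharelib path in HDFS before the server finishes starting.
public class SharelibCheck {

    public static void verifySharelibExists(String sharelibPath) {
        Path path = Paths.get(sharelibPath);
        if (!Files.isDirectory(path)) {
            throw new IllegalStateException(
                "Oozie sharelib directory not found at '" + sharelibPath
                + "'; install the sharelib before starting the server");
        }
    }
}
```

Failing fast here turns a confusing mid-workflow launcher error into an immediate, actionable startup error.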

Regarding the sharelib issue during an upgrade with in-flight jobs: whether
this is a problem depends on the type of upgrade.

* If the upgrade is an Oozie server fix that does not change the sharelib
files, this is not an issue and you can simply shut down the Oozie server
with in-flight jobs.

* If the upgrade involves changes to the sharelib files, then you have two
options:

** Cold upgrade: bring all WF jobs to suspension or completion and wait
until all running actions end; then shut down the Oozie server, upgrade the
Oozie server and the sharelib, restart the Oozie server, and resume the WF
jobs. In this case all new WF actions will use the new sharelib.

** Hot upgrade: stop the Oozie server, modify the oozie-site.xml sharelib
location to point to a new directory, upgrade the Oozie server, install the
sharelib (this will be a create, as the new sharelib dir in HDFS does not
exist yet), and start the Oozie server. In this case all running WF actions
will continue running with no issues, because the JARs in the distributed
cache have not been touched, while all new WF actions will start using the
new sharelib.
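For the step that repoints the sharelib location, the oozie-site.xml change would be along these lines. I'm assuming the system libpath property here, and the versioned directory name is just an illustration:

```xml
<!-- Point Oozie at a fresh sharelib directory so the JARs already in the
     distributed cache of in-flight jobs are never overwritten. The
     "-v2" suffix is only an example naming convention. -->
<property>
    <name>oozie.service.WorkflowAppService.system.libpath</name>
    <value>/user/${user.name}/share/lib-v2</value>
</property>
```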

Note that this sharelib upgrade protocol is not introduced by making the
sharelib required; it is already necessary today for any applications that
use the sharelib.

Does this address your concerns?

Thanks.



On Tue, Apr 23, 2013 at 1:35 PM, Virag Kothari <[email protected]> wrote:

> Hi,
>
> With OOZIE-1311 and its subtasks, the idea seems to be to move all the
> launcher classes like PigMain, HiveMain, etc. to their respective
> sharelibs. So the shared lib now becomes a mandatory deployment step.
> Previously the shared lib was optional, as users could bundle the JARs
> with their workflow application. Always requiring the shared lib seems to
> introduce 2 problems:
>
> 1. The current deployments which don't use the action shared lib will
> fail. So, we should probably deprecate the current behavior.
>
> 2. The Hadoop distributed cache mechanism will fail a job if the files in
> the DC are updated on HDFS while the Hadoop job is running. So, when Oozie
> is restarted and the shared lib is uploaded to HDFS as part of the
> deployment, Hadoop will fail the existing jobs for which the timestamp of
> the file on HDFS doesn't match the timestamp of its copy in the job's DC.
>
>
> Thanks,
> Virag
>