Author: challngr
Date: Fri May 17 13:11:22 2013
New Revision: 1483785

URL: http://svn.apache.org/r1483785
Log:
UIMA-2682 Doc updates

Added:
    
uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/part2/services.tex
Modified:
    
uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/part1/overview.tex
    
uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/part2/cli/ducc-services.tex
    
uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/part2/ducc-uguide.tex

Modified: 
uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/part1/overview.tex
URL: 
http://svn.apache.org/viewvc/uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/part1/overview.tex?rev=1483785&r1=1483784&r2=1483785&view=diff
==============================================================================
--- 
uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/part1/overview.tex
 (original)
+++ 
uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/part1/overview.tex
 Fri May 17 13:11:22 2013
@@ -20,22 +20,22 @@
 
     The DUCC Job model consists of standard UIMA components: a Collection 
Reader (CR), a CAS
     Multiplier (CM), application logic as implemented one or more Analysis 
Engines (AE), and a CAS
-    Consumer (CC).  In theory, any CR, CM, will work with DUCC, but DUCC is 
all about scale-out.  In
+    Consumer (CC).  In theory, any CR or CM will work with DUCC, but DUCC is 
all about scale-out.  In
     order to achieve good scale-out these components must be constructed in a 
specific way.
 
     The Collection Reader builds input CASs and forwards them to the UIMA 
pipelines.  In the DUCC
-    model, the CR is run in a process separate from the rest of the pipeline. 
In all but the
-    smallest clusters it is run on a machine that is separate from the rest of 
the pipeline.  To
+    model, the CR is run in a process separate from the rest of the pipeline. 
In fact, in all but the
+    smallest clusters it is run on a different physical machine than the rest 
of the pipeline.  To
     achieve scalability, the CR must create very small CASs that do not 
contain application data,
     but which contain references to data; for instance, file names.  Ideally, 
the CR should be
     runnable in a process not much larger than the smallest Java virtual 
machine.  Later sections
     demonstrate methods for achieving this.
 
-    Each pipeline must contain at least one CAS Multiplier which receives the 
CASs from the
-    CR.  The CMs encapsulate the knowledge of how to receive the data 
references in the small
-    CASs received from the CRs and deliver data to the application pipeline.  
DUCC packages
-    the CM, AE, and CM into a single process, multiple instances of which are 
then deployed
-    over the cluster.
+    Each pipeline must contain at least one CAS Multiplier which receives the 
CASs from the CR.  The
+    CMs encapsulate the knowledge of how to receive the data references in the 
small CASs received
+    from the CRs and deliver the referenced data to the application pipeline.  
DUCC packages the CM,
+    AE(s), and CC into a single process, multiple instances of which are then 
deployed over the
+    cluster.
 
     DUCC does not provide any mechanism for receiving output CASs.  Each 
application must
     supply its own CAS Consumer which serializes the output of the Analytic 
Engines for 
@@ -76,7 +76,7 @@
     scaled-out job running under DUCC.
 
     \paragraph{UIMA Pipelines}
-    The DUCC job model is implemented over UIMA and UIMA-AS frameworks. A 
normal UIMA pipeline
+    A normal UIMA pipeline
     contains a Collection Reader, one or more Analysis Engines connected in a 
pipeline, and a CAS
     Consumer as shown in Figure ~\ref{UIMA-pipeline}.
 
@@ -92,7 +92,10 @@
     into the analytic pipeline as an interface between the CR and the 
pipeline, as shown in Figure
     ~\ref{UIMA-AS-pipeline} below.
     Multiple analytic pipelines are serviced by the 
-    CR and are scaled-out over a computing cluster. 
+    CR and are scaled-out over a computing cluster.  The difficulty with this 
model is that each
+    user is individually responsible for finding and scheduling computing 
nodes, installing
+    communication software such as ActiveMQ, and generally managing the 
distributed job and
+    associated hardware.
 
     \begin{figure}[H]
       \centering
@@ -102,6 +105,11 @@
     \end{figure}
 
     \paragraph{UIMA-AS Pipeline Scaled By DUCC}
+    DUCC is a UIMA- and UIMA-AS-aware cluster manager.  To scale out work 
under DUCC the developer
+    tells DUCC what the parts of the application are, and DUCC does the work 
to build the
+    scale-out via UIMA-AS, to find and schedule resources, to deploy the parts 
of the application
+    over the cluster, and to manage the job while it executes.
+
     On job submission, the DUCC Command Line Interface (CLI) inspects the XML 
defining the analytic
     and generates a UIMA-AS Deployment Descriptor (DD).  The DD establishes 
some number of pipeline
     threads per process (as indicated in the DUCC job parameters), and 
generates job-unique queues.
@@ -134,22 +142,25 @@
 
   
     \section{Error Management }
-    DUCC provides a number of facilities to assist error management.
-
-    DUCC uses the UIMA-AS error-handling facilities to reflect errors from the 
Job Processes to the
-    Job Drivers. The JD wrappers implement logic to enforce error thresholds, 
to identify and log errors,
-    and to reflect job problems in the DUCC Web Server.  All error thresholds 
are
-    configurable globally, and on a per-job basis.
-
-    Error and timeout thresholds are implemented for both the initialization 
phase of a pipeline
-    and the execution phase.
+    DUCC provides a number of facilities to assist error management:
     
-    Retry-after-error is supported: if a process has a failure on some CAS 
after initialization
-    is successful, the process is terminated and the CAS retried, up to some 
configurable threshold.
-
-    DUCC insures that processes can successfully initialize before fully 
scaling out a job, to
-    insure a cluster is not overwhelmed with errant processes.
+    \begin{itemize}
+      \item DUCC uses the UIMA-AS error-handling facilities to reflect errors 
from the Job Processes
+        to the Job Drivers. The JD wrappers implement logic to enforce error 
thresholds, to identify
+        and log errors, and to reflect job problems in the DUCC Web Server.  
All error thresholds are
+        configurable globally, and on a per-job basis.
 
+      \item Error and timeout thresholds are implemented for both the 
initialization phase of a pipeline
+        and the execution phase.
+    
+      \item Retry-after-error is supported: if a process has a failure on some 
CAS after
+        initialization is successful, the process is terminated and the CAS 
retried, up to some
+        configurable threshold.
+
+      \item DUCC ensures that processes can successfully initialize before 
fully scaling out a job,
+        to ensure that a cluster is not overwhelmed with errant processes.
+      \end{itemize}
+      
     \section{Cluster and Job Management}
     DUCC provides significant support for managing multiple jobs and multiple 
users in a distributed cluster:
 
@@ -158,13 +169,14 @@
           provide security and privacy for each user and job. Logs are written 
with the
           user's credentials into the user's file space designated at job 
submission.
 
-        \item[Fair-Share Scheduling] DUCC provides a Fair-Share scheduler.  
The scheduler also supports
-          semi-permanent reservation of full or partial machines.
+        \item[Fair-Share Scheduling] DUCC provides a Fair-Share scheduler to 
equitably share
+          resources among multiple users.  The scheduler also supports 
semi-permanent reservation of
+          full or partial machines.
 
         \item[Service Management] DUCC provides a Service Manager capable of 
automatically starting, stopping, and
           otherwise managing and querying services in support of jobs.
 
-        \item[Job Lifetime Management and Orchestration] DUCC includes an 
Orchestrator to manages the
+        \item[Job Lifetime Management and Orchestration] DUCC includes an 
Orchestrator to manage the
           lifetimes of all entities in the system.
           
         \item[DUCC Agents] DUCC Agents manage each node's local resources and 
all
@@ -175,15 +187,19 @@
             \item Starts and stops all processes on behalf of users.
             \item Patrols the node for ``foreign'' (non-DUCC) processes, 
reporting them to the
               Web Server, and optionally reaping them.
+            \item Ensures job processes do not exceed their declared memory 
requirements
+              through the use of Linux Cgroups.
           \end{itemize}
 
         \item[DUCC Web server] DUCC  provides a web server displaying all 
aspects of the system:
           \begin{itemize}
               \item All jobs in the system, their current state, resource 
usage, etc.
                 
-              \item All reserved resources and associated information (owner, 
etc.)
+              \item All reserved resources and associated information (owner, 
etc.),
+                including the ability to request and cancel reservations.
                 
-              \item All services.
+              \item All services, including the ability to start, stop, and 
modify
+                service definitions.
                 
               \item All nodes in the system and their status, usage, etc. 
                                 
@@ -193,11 +209,11 @@
           \end{itemize}
 
 
-        \item[Management Support] DUCC provides rich scripting support to:
+        \item[Cluster Management Support] DUCC provides rich scripting support 
to:
           \begin{itemize}
               \item Start, stop, and query full DUCC systems.
  
-              \item Start and stop and individual DUCC components.
+              \item Start, stop, and quiesce individual DUCC components.
  
               \item Add and delete nodes from the DUCC system.
  

Modified: 
uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/part2/cli/ducc-services.tex
URL: 
http://svn.apache.org/viewvc/uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/part2/cli/ducc-services.tex?rev=1483785&r1=1483784&r2=1483785&view=diff
==============================================================================
--- 
uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/part2/cli/ducc-services.tex
 (original)
+++ 
uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/part2/cli/ducc-services.tex
 Fri May 17 13:11:22 2013
@@ -1,5 +1,5 @@
     \section{ducc\_services}
-
+    \label{DUCC-SERVICES-CLI}
     \paragraph{Description:}
 
         The ducc\_services CLI is used to manage service registration. It has 
a number of functions 

Modified: 
uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/part2/ducc-uguide.tex
URL: 
http://svn.apache.org/viewvc/uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/part2/ducc-uguide.tex?rev=1483785&r1=1483784&r2=1483785&view=diff
==============================================================================
--- 
uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/part2/ducc-uguide.tex
 (original)
+++ 
uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/part2/ducc-uguide.tex
 Fri May 17 13:11:22 2013
@@ -1,4 +1,5 @@
 \chapter{Command Line Interface}
+\label{DUCC-CLI}
 
     \paragraph{The DUCC Job Descriptor}
     The DUCC Job Descriptor includes properties to enable automated management 
and scale-out 
@@ -55,8 +56,12 @@
     \input{part2/cli/ducc-services.tex}
 
 
-
 \chapter{Application Programming  Interface}
+\label{DUCC-API}
+
+\chapter{Service Management}
+\label{DUCC-SERVICES}
+\input{part2/services.tex}
 
 %% this inputs a chapter
 \input{part2/job-logs.tex}
@@ -64,179 +69,3 @@
 %% this inputs a chapter
 \input {part2/webserver.tex}
 
-
-%% TODO TODO This needs breakout into its own file
-
-      \section{Service Management}
-          TODO TODO TODO BREAK OUT INTO SEPARATE SECTION
-          THIS IS A TEMPORARY HOLDING SPOT 
-
-      \paragraph{Overview.} 
-      Services, in the context of DUCC, are long-running processes that await 
requests from
-      UIMA pipeline components and return something in response. Services can 
be any arbitrary process
-      using any arbitrary communication protocol but in the current version of 
DUCC only UIMA-AS
-      services are fully supported.
-
-      The DUCC service manager implements several high-level functions:
-      
-      Insure services are available for jobs before allowing the jobs to 
start. This fail-fast
-      prevents unncessary allocation of resources (with potential eviction of 
healthy processes) for
-      jobs that can't run, as well as quick feedback to users that something 
is amis.
-      
-      Automate the startup, care, and management of services.
-      
-      Report on the state of services: processes, queue depths, comsumers, and 
so on.  
-
-      \paragraph{Service Types.}
-      DUCC supports two types of services: UIMA-AS and CUSTOM:
-      
-      \begin{description}
-          \item[UIMA-AS] This is a "normal" UIMA-AS service. DUCC fully 
supports all aspects of UIMA-AS
-            services.
-            
-          \item[CUSTOM] This is any arbitrary service. DUCC supports 
monitoring of CUSTOM services
-            and performs job dependency checks, but (in the current version) 
does not support start
-            and stop of CUSTOM services.
-      \end{description}
-
-      \paragraph{Service Endpoints.} Services are referenced by a specifier 
called a service
-      endpoint.. The service endpoint is a formatted string indicating:
-
-      \begin{itemize}
-         \item The service type: UIMA-AS or CUSTOM.
-
-         \item The service name. For UIMA-AS services, this is the name of the 
queue in the ActiveMq
-           Broker used for communication with the service. For CUSTOM services 
this is any arbitrary
-           string as dictated by the service. Service names must be unique 
within the system.
-
-         \item For UIMA-AS services only, the URL of the ActiveMq broker.  
-      \end{itemize}
-
-      \paragraph{Dependent and Pre-Requisite Services and Jobs.} A {\em 
dependent service} is a
-      service which is dependent on at least one other service to perform it's 
function. A {\em
-        dependent job} is a job which is dependent on at least one service to 
perform it's function.
-
-      An {\em independent service} service is a service which is required by 
another job or
-      service. (Note that there are no independent jobs.)
-
-      \paragraph{Service Classes.} Services may be started externally to DUCC, 
explicitly through
-      DUCC as a job, or as registered services. These form three natural 
classes of services with
-      slightly different management characteristics.
-
-      \paragraph{Implicit Services.} An implicit service is started externally 
to DUCC and discovered by DUCC only
-      when it is referenced by a job's service-dependency parameter. On 
submission of a job with a
-      dependency on an implicit service, the SM sets up a "ping" thread that 
check if the service
-      exists at the endpoint. If so, the SM adds the service to its list of 
known services and marks
-      the job "ready to schedule". If the service is a UIMA-AS service the SM 
establishes a monitor
-      thread on the queue for reporting purposes. The service is monitored 
throughout the lifetime of
-      the job. If the service should stop responding, its state is updated as 
"not-responding" but the
-      job is allowed to continue as DUCC cannot tell if the job is still using 
it or not, or if the
-      outage is temporary. If the job is a CUSTOM service, the service owner 
may specifiy custom code
-      to run in the ping thread; for CUSTOM services, this same code is used 
to run both ping and
-      monitor functions.
-      
-      When the job exits, a timer is set and DUCC continues to monitor the 
service against the
-      possibility that subsequent jobs will need it. Once the last job using 
the service has exited
-      and the service timer expired, the SM stops the monitors and purges the 
service from its
-      records.
-      
-      \paragraph{Submitted Services.} A submitted service is a service that is 
submitted to DUCC as a job. A
-      submitted service is essentially a normal DD-style job (a job in which 
the user supplies the
-      full UIMA-AS DD), but without a Collection Reader. Because DUCC is 
managing this service it can
-      provide more support than for implicit services.
-      
-      Submitted services can be dependent upon other services. When such a 
service enters the system,
-      DUCC verifies it's pre-requisite services. When (or if) all 
pre-requisite services are availble
-      DUCC marks the new service "ready to schedule". The lifecycle of the 
service is monitored so
-      that dependent services and jobs are marked "ready to schedule" only 
after the submitted service
-      has completed its initialization phase. A ping thread and queue monitor 
are also started against
-      the newly submitted service. If the submitted service is unable to 
successfully initialize,
-      services and jobs that are dependent on it are marked "not runnable" and 
the DUCC Orchestrator
-      cancels them.
-      
-      DUCC manages the lifecycle of submitted services, but because they are 
submitted by entities
-      other than DUCC, the SM performs no additional management for them. When 
a submitted service is
-      canceled by its owner, DUCC stops the ping and queue monitors. Any jobs 
or services dependent on
-      it are allowed to continue until they complete or fail due to 
unavailability of the service.
-      
-      \paragraph{Registered Services.} Registered services are fully managed 
by DUCC. A service is
-      registered with DUCC using the CLI to provide the full job specification 
of the service, the
-      initial number of instances of the service, and whether the service 
should be automatically
-      started when DUCC itself is started. Registered services started when 
DUCC is started are
-      called automatic services.  Registered services that are started only 
when referenced by other
-      dependent jobs or services are called on-demand services. The service is 
registered with the
-      submitter's credentials and is run with that user's credentials when it 
is started.
-
-      \todo Fix and properly place this paragraph.
-          Ping and monitor threads are started. Jobs and other services may 
use these services in the same
-          manner as submitted services. If an automatic service instance 
should die or be canceled out of
-          the scope of the SM, the SM will restart the instance, maintaining 
the registered number of
-          instances at all time. Automatic services are not terminated when 
their dependent jobs/services
-          exit; they're termanted only when DUCC itself is terminated, or by 
use of the service stop
-          command.
-
-      There are several subclasses of Registered Services:
-      \begin{description}
-
-        \item[Automatic Services] An automatic service is a registered service 
that is flagged to be
-          automatically started when the DUCC system is started. When DUCC is 
started, the SM checks the
-          service registry for all service that are marked for automatic 
startup. The SM submits the
-          registered service specification on behalf of its owner. Each such 
submission is for a single
-          service instance.  If found, the SM repeatedly submits the 
specification until the registered
-          number of instances is reached.
-          
-        \item[On-Demand Services] An on-demand service is a registered service 
that is started only when
-          referenced by the service-dependency of another job or service. f 
the service is already
-          started, the dependent job/service is marked ready to schedule as 
indicated above. If not, the
-          service registry is checked and if a start-on-demand service with an 
endpoint matching the
-          service-dependency is found, DUCC submits the service on behalf of 
the service owner (in the
-          same manner as for automatic servic establishing the registered 
number of service instances, a
-          ping thread, and a monitor). When the service has completed 
initialization the dependent
-          job/service is marked ready to schedule. If the on-demand service 
cannot be found in the
-          registery, the referring entity is marked not-startable and the DUCC 
Orchestrator cancels it.
-          
-          Subsequent jobs and services that reference the on-demand service 
will use the started
-          instances.  When the last job/service that references the on-demand 
service exits, a
-          (configurable) timer is established to keep the service alive for a 
while (in anticipation that
-          it will be needed again soon.)  When the keep-alive timer exipires, 
and there are no more
-          dependent jobs/services, the on-demand service is automatically 
stopped to free up its resources
-          for other work.
-
-          \item[External Services] External services consist of only a ping 
thread.  The service
-            itself is not managed in any way by DUCC.  This is useful for 
managing dependencies
-            on services that are not under DUCC control: DUCC can detect and 
report on the health
-            of these services and take appropriate actions on dependent jobs 
if the services
-            are not responsive.
-      \end{description}
-          
-    \paragraph{Registered Service Management.} The CLI for registered services 
provides several functions:
-
-    \begin{description}
-        \item[Register] Register files a service specification with the SM. 
The service may optionally
-          be started as part of registration. The service definition and state 
is persisted over system
-          restarts and is deleted only with the Unregister function.
-          
-        \item[Unregister] Unregister removes the service specification. The 
service is stopped if it is
-          started and not busy. (Note that if the service is busy, jobs and 
services that are dependent
-          on it may subsequently fail.)
-          
-        \item[Modify] Modify allows dynamic update of some parameters of 
registered services:
-            \begin{itemize}
-              \item Automatic and On-Demand state.
-              \item The minimum number of service instances to start when the 
service is started.  
-            \end{itemize}
-
-        \item[Start] Start submits the service specification to the DUCC 
Orchestrator (repeatedly,
-          until the correct number of instances are started). If the service 
is explicitly started
-          with the start CLI, the service continues to run even after the last 
reference is gone,
-          regardless of whether it is automatic or on-demand. Start is also 
used to increase the
-          number of running instances of a service. The registry may be 
optionally updated to
-          reflect the new number of started instances.
-          
-        \item[Stop] Stop stops the instances for a registered service. The 
registry may be
-          optionally updated to reflect the new number of instances that are 
still running.
-
-        \item[Query] A CLI-based query is supplied to report on all services 
known to DUCC, their
-          states, their instances, their dependent jobs, and performance 
statistics for the service.
-    \end{description}
-        

Added: 
uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/part2/services.tex
URL: 
http://svn.apache.org/viewvc/uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/part2/services.tex?rev=1483785&view=auto
==============================================================================
--- 
uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/part2/services.tex
 (added)
+++ 
uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/part2/services.tex
 Fri May 17 13:11:22 2013
@@ -0,0 +1,326 @@
+
+      \section{Overview.} 
+      A DUCC service is defined by the following two criteria:
+      \begin{itemize}
+          \item A service is one or more long-running processes that await 
requests from
+            UIMA pipeline components and return something in response.  These 
processes
+            are usually managed by DUCC but need not be.
+          \item A service is accompanied by a small program called a 
``pinger'' that
+            the DUCC Service Manager uses to gauge the availability and health 
of the
+            service.  This process must always be supplied to DUCC.
+      \end{itemize}
+
+      A service is usually a UIMA-AS service, but DUCC supports any arbitrary 
process
+      as a service.
+
+      The DUCC service manager implements several high-level functions:
+      
+      \begin{itemize}
+          \item Ensure services are available for jobs before allowing the 
jobs to start. This fail-fast check
+            prevents unnecessary allocation of resources (with potential 
eviction of healthy processes)
+            for jobs that can't run, and gives users quick feedback that 
something is amiss.
+      
+          \item Manage the startup and lifecycle of services: allocate 
resources, spawn the
+            processes, ensure the processes stay alive, handle errors, etc.
+      
+          \item Report on the state and availability of services.
+       \end{itemize}
+
+      \section{Service Pingers}
+      A service pinger is a small program that queries a service on behalf of 
the
+      DUCC Service Manager to:
+      \begin{itemize}
+        \item Report on the availability of the service: does it respond to 
requests?
+        \item Report on the health of the service: is it overloaded, is it 
responding
+          adequately, etc.
+      \end{itemize}
+      
+      Service pingers are always written in Java and must extend an 
abstract class,
+      {\tt org.apache.uima.ducc.common.AServicePing}.   When a service is 
deployed by
+      DUCC, the Service Manager spawns a DUCC process that instantiates the 
pinger for
+      the service.  On a regular basis, the Service Manager sends a request to 
the pinger
+      to query the service health.
+
+      \subsection{Declaring a Pinger in a Service}
+
+      If your service is a UIMA-AS service, there is no need to create or 
declare a pinger.  DUCC
+      provides a default pinger.  If a CUSTOM pinger is required, it must be 
declared in the service
+      descriptor, and the service must be registered.  See Section 
~\ref{DUCC-SERVICES-CLI} for
+      details on service registration and the ping directives.      
+
+      \subsection{Implementing a Pinger}
+      Pingers must extend the class {\tt 
org.apache.uima.ducc.common.AServicePing}.  The class
+      is shown below in figure ~\ref{ABSTRACT-PINGER}.
+      \begin{figure}[H]
+\begin{verbatim}
+package org.apache.uima.ducc.common;
+
+public abstract class AServicePing
+{
+    /**
+     * Called by the ping driver, to pass in useful things the pinger may want.
+     * @param endpoint This is the name of the service endpoint, as passed in
+     *                 at service registration.
+     */
+    public abstract void init(String endpoint)  throws Exception;
+
+    /**
+     * Stop is called by the ping wrapper when it is being killed.  
Implementors may optionally
+     * override this method with connection shutdown code.
+     */
+    public abstract void stop();
+
+    /**
+     * Returns the object with application-derived health and statistics.
+     * @return {@link ServiceStatistics} This object contains the information 
the service manager and web server require
+     *     for correct management and display of the service.
+     */
+    public abstract ServiceStatistics getStatistics();
+    
+}
+\end{verbatim}
+        \caption{Service Ping Abstract Class}
+        \label{ABSTRACT-PINGER}
+
+      \end{figure}
+      
+      The ServiceStatistics class defines these methods:
+      \begin{description}
+        \item[ServiceStatistics(boolean alive, boolean healthy, String info)] 
This is the constructor.
+          \begin{description}
+            \item[boolean alive] Set this to ``true'' if the service is 
responsive.  If a pinger responds
+              ``false'' (or does not respond), the Service Manager will assume 
the service is unavailable
+              and will not allow jobs dependent on this service to start.  
(Dependent jobs that are already
+              started are allowed to continue, but are annoated in the web 
server, such that developers
+              will know the job may not be functioning because of the service.)
+            \item[boolean healthy] The pinger may perform analysis on the 
service to determine whether
+              the service is ``healthy'' or not.  This is strictly subjective 
and is used by the
+              Service Manager only for reporting to the web server.
+            \item[String info] This is any string in any format.  The pinger 
sets health and availability
+              data into it for display in the webserver.  For example, the 
default UIMA-AS pinger sets
+              ActiveMQ service statistics into this string.
+          \end{description}
+          
+          \item[void setAlive(boolean alive)] Set the ``aliveness'' of the 
service.
+
+          \item[void setHealthy(boolean healthy)] Set the ``healthiness'' of the 
service.
+            
+          \item[void setInfo(String info)] Update the service information 
string.
+      \end{description}
+
+      A sample CUSTOM pinger is shown in figure ~\ref{CUSTOM-PINGER} below. 
The pinger assumes a simple
+      service port that, on connection, returns an integer.  If the connect 
and read of the integer succeeds,
+      the ping is marked successful. 
+
+      \begin{figure}[H]
+\begin{verbatim}
+import java.io.DataInputStream;
+import java.io.InputStream;
+import java.net.Socket;
+import org.apache.uima.ducc.common.AServicePing;
+import org.apache.uima.ducc.common.ServiceStatistics;
+
+public class CustomPing
+    extends AServicePing
+{
+    String host;
+    String port;
+    public void init(String endpoint) throws Exception {
+        String[] parts = endpoint.split(":");
+        host = parts[1];
+        port = parts[2];
+    }
+
+    public void stop()  {  }
+
+    public long readLong(DataInputStream dis) throws Exception {
+        return Long.reverseBytes(dis.readLong());
+    }
+
+    public ServiceStatistics getStatistics() {
+        ServiceStatistics stats = new ServiceStatistics(false, false,"<NA>");
+        // try-with-resources closes the socket and stream even if a read fails
+        try (Socket sock = new Socket(host, Integer.parseInt(port));
+             DataInputStream dis = new DataInputStream(sock.getInputStream())) {
+
+            long stat1 = readLong(dis); long stat2 = readLong(dis); 
+            long stat3 = readLong(dis); long stat4 = readLong(dis);
+
+            stats.setAlive(true);  stats.setHealthy(true);
+            stats.setInfo(  "S1[" + stat1 + "] S2[" + stat2 + "] S3[" + stat3 
+ "] S4[" + stat4 + "]" );
+        } catch ( Throwable t) {
+            t.printStackTrace();
+            stats.setInfo(t.getMessage());
+        }
+        return stats;        
+    }
+}
+\end{verbatim}
+        \caption{Sample CUSTOM Service Pinger}
+        \label{CUSTOM-PINGER}
+
+      \end{figure}
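+      As a hedged illustration of the wire format this sample implies: {\tt DataInputStream.readLong()}
+      reads big-endian values, so the call to {\tt Long.reverseBytes} suggests that the service writes
+      its four statistics in little-endian order. The sketch below round-trips such a stream in memory.
+      The class {\tt PingWireFormat} and its methods are hypothetical helpers for illustration only and
+      are not part of DUCC.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper (not a DUCC class): models the byte stream the sample pinger reads.
final class PingWireFormat {

    // Encode the statistics the way a little-endian service would write them.
    static byte[] encode(long... stats) {
        ByteBuffer buf = ByteBuffer.allocate(Long.BYTES * stats.length)
                                   .order(ByteOrder.LITTLE_ENDIAN);
        for (long s : stats) {
            buf.putLong(s);
        }
        return buf.array();
    }

    // Decode as the sample pinger does: readLong() is big-endian, so swap the bytes.
    static long[] decode(byte[] wire, int count) {
        DataInputStream dis = new DataInputStream(new ByteArrayInputStream(wire));
        long[] out = new long[count];
        try {
            for (int i = 0; i < count; i++) {
                out[i] = Long.reverseBytes(dis.readLong());
            }
        } catch (IOException e) {
            // A short stream corresponds to a failed ping in the sample pinger.
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        long[] stats = decode(encode(1L, 2L, 300L, -4L), 4);
        for (long s : stats) {
            System.out.println(s);
        }
    }
}
```

+      A service that already writes big-endian values (for example through {\tt DataOutputStream})
+      would not need the byte swap on the pinger side.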
+      
+
+     \section{Service Types.}
+      DUCC supports two types of services: UIMA-AS and CUSTOM:
+      
+      \begin{description}
+          \item[UIMA-AS] This is a ``normal'' UIMA-AS service. DUCC fully supports all aspects of UIMA-AS
+            services with minimal effort from developers.  A default ``pinger'' is supplied by DUCC
+            for UIMA-AS services.  (It is legal to define a CUSTOM pinger for a UIMA-AS service,
+            however.)
+            
+          \item[CUSTOM] This is any arbitrary service.  Developers must provide a CUSTOM pinger
+            and declare it in the service registration.
+      \end{description}
+
+      \section{Service Endpoints.} Services are referenced by a specifier 
called a service
+      endpoint. The service endpoint is a formatted string used to uniquely 
identify each
+      service, and to supply contact information to the pingers.  A service 
endpoint
+      is of the form 
+\begin{verbatim}
+      <service-type>:<unique id and contact information>
+\end{verbatim}
+      
+      The {\em service-type} must be either UIMA-AS or CUSTOM.
+      
+      The {\em unique id and contact information} is any string needed to ensure the service is
+      uniquely named.  This string is passed to the service pinger and must contain sufficient
+      information for the pinger to contact the service.  For UIMA-AS services, the service endpoint is
+      inferred by the CLI by inspection of the UIMA XML descriptor.  For reference, the UIMA-AS
+      service endpoint is of the form:
+\begin{verbatim}
+      UIMA-AS:queue-name:broker-url
+\end{verbatim}
+      where {\em queue-name} is the name of the ActiveMQ queue used by the 
service, and {\em broker-url}
+      is the ActiveMQ broker URL.
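+
+      The endpoint forms above can be taken apart as in the following sketch. It is a minimal
+      illustration, not DUCC code: the class {\tt ServiceEndpoint} is hypothetical, the broker URL in
+      the usage is invented, and the sketch assumes that the queue name itself contains no colon.

```java
// Hypothetical helper (not a DUCC class): splits "<service-type>:<id and contact information>".
final class ServiceEndpoint {
    final String type;     // "UIMA-AS" or "CUSTOM"
    final String contact;  // everything after the first ':'

    private ServiceEndpoint(String type, String contact) {
        this.type = type;
        this.contact = contact;
    }

    static ServiceEndpoint parse(String endpoint) {
        int colon = endpoint.indexOf(':');
        if (colon <= 0 || colon == endpoint.length() - 1) {
            throw new IllegalArgumentException("Malformed endpoint: " + endpoint);
        }
        String type = endpoint.substring(0, colon);
        if (!type.equals("UIMA-AS") && !type.equals("CUSTOM")) {
            throw new IllegalArgumentException("Unknown service type: " + type);
        }
        return new ServiceEndpoint(type, endpoint.substring(colon + 1));
    }

    // For UIMA-AS endpoints only: returns { queue-name, broker-url }.
    // The broker URL contains ':' itself, so split on the first ':' only.
    String[] queueAndBroker() {
        if (!type.equals("UIMA-AS")) {
            throw new IllegalStateException("Not a UIMA-AS endpoint");
        }
        String[] parts = contact.split(":", 2);
        if (parts.length != 2) {
            throw new IllegalArgumentException("Expected queue-name:broker-url, got: " + contact);
        }
        return parts;
    }

    public static void main(String[] args) {
        ServiceEndpoint ep = parse("UIMA-AS:MyQueue:tcp://broker.example.org:61616");
        String[] qb = ep.queueAndBroker();
        System.out.println(ep.type + " queue=" + qb[0] + " broker=" + qb[1]);
    }
}
```

+      Splitting only on the first colon matters because a broker URL such as a {\tt tcp://} address
+      carries colons of its own; a plain {\tt split(":")} would cut the URL into pieces.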
+      
+      \section{Dependent and Pre-Requisite Services and Jobs.} A {\em dependent service} is a
+      service which is dependent on at least one other service to perform its function. A {\em
+        dependent job} is a job which is dependent on at least one service to perform its function.
+
+      An {\em independent service} is a service which is required by another job or
+      service. (Note that there are no independent jobs.)
+
+      \section{Service Classes.} Services may be started externally to DUCC, 
explicitly through
+      DUCC as a job, or as registered services. These form three natural 
classes of services with
+      slightly different management characteristics.
+
+      \subsection{Implicit Services.} An implicit service is started externally to DUCC and discovered by DUCC only
+      when it is referenced by a job's service-dependency parameter. On submission of a job with a
+      dependency on an implicit service, the SM sets up a ``ping'' thread that checks whether the service
+      exists at the endpoint. If so, the SM adds the service to its list of known services and marks
+      the job ``ready to schedule''. If the service is a UIMA-AS service, the SM establishes a monitor
+      thread on the queue for reporting purposes. The service is monitored throughout the lifetime of
+      the job. If the service should stop responding, its state is updated to ``not-responding'', but the
+      job is allowed to continue because DUCC cannot tell whether the job is still using the service, or
+      whether the outage is temporary. If the service is a CUSTOM service, the service owner may specify
+      custom code to run in the ping thread; for CUSTOM services, this same code performs both the ping
+      and monitor functions.
+      
+      When the job exits, a timer is set and DUCC continues to monitor the 
service against the
+      possibility that subsequent jobs will need it. Once the last job using 
the service has exited
+      and the service timer expired, the SM stops the monitors and purges the 
service from its
+      records.
+      
+      \subsection{Submitted Services.} A submitted service is a service that 
is submitted to DUCC as a job. A
+      submitted service is essentially a normal DD-style job (a job in which 
the user supplies the
+      full UIMA-AS DD), but without a Collection Reader. Because DUCC is 
managing this service it can
+      provide more support than for implicit services.
+      
+      Submitted services can be dependent upon other services. When such a service enters the system,
+      DUCC verifies its pre-requisite services. When (or if) all pre-requisite services are available,
+      DUCC marks the new service ``ready to schedule''. The lifecycle of the service is monitored so
+      that dependent services and jobs are marked ``ready to schedule'' only after the submitted service
+      has completed its initialization phase. A ping thread and queue monitor are also started against
+      the newly submitted service. If the submitted service is unable to successfully initialize,
+      services and jobs that are dependent on it are marked ``not runnable'' and the DUCC Orchestrator
+      cancels them.
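+
+      The readiness rule described above can be sketched as a small decision function. This is an
+      illustration under stated assumptions, not the SM implementation: the class
+      {\tt DependencyCheck} and the state names {\tt INITIALIZING}, {\tt AVAILABLE}, {\tt FAILED}
+      and {\tt WAITING} are hypothetical.

```java
import java.util.Arrays;
import java.util.Collection;

// Hypothetical model (not DUCC code) of how a dependent job or service is classified.
final class DependencyCheck {

    // Assumed states of a pre-requisite service.
    enum ServiceState { INITIALIZING, AVAILABLE, FAILED }

    // Outcome for the dependent job or service.
    enum Verdict { WAITING, READY_TO_SCHEDULE, NOT_RUNNABLE }

    static Verdict evaluate(Collection<ServiceState> prerequisites) {
        boolean allAvailable = true;
        for (ServiceState s : prerequisites) {
            if (s == ServiceState.FAILED) {
                // One pre-requisite that cannot initialize makes the dependent not runnable.
                return Verdict.NOT_RUNNABLE;
            }
            if (s != ServiceState.AVAILABLE) {
                allAvailable = false;
            }
        }
        // Ready only once every pre-requisite has finished initializing.
        return allAvailable ? Verdict.READY_TO_SCHEDULE : Verdict.WAITING;
    }

    public static void main(String[] args) {
        System.out.println(evaluate(Arrays.asList(ServiceState.AVAILABLE,
                                                  ServiceState.INITIALIZING)));
    }
}
```

+      In this model a dependent with no pre-requisites is immediately ready to schedule, and a
+      failed pre-requisite takes precedence over one that is still initializing.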
+      
+      DUCC manages the lifecycle of submitted services, but because they are 
submitted by entities
+      other than DUCC, the SM performs no additional management for them. When 
a submitted service is
+      canceled by its owner, DUCC stops the ping and queue monitors. Any jobs 
or services dependent on
+      it are allowed to continue until they complete or fail due to 
unavailability of the service.
+      
+      \subsection{Registered Services.} Registered services are fully managed 
by DUCC. A service is
+      registered with DUCC using the CLI to provide the full job specification 
of the service, the
+      initial number of instances of the service, and whether the service 
should be automatically
+      started when DUCC itself is started. Registered services started when 
DUCC is started are
+      called automatic services.  Registered services that are started only 
when referenced by other
+      dependent jobs or services are called on-demand services. The service is 
registered with the
+      submitter's credentials and is run with that user's credentials when it 
is started.
+
+      \todo Fix and properly place this paragraph.
+          Ping and monitor threads are started. Jobs and other services may use these services in the same
+          manner as submitted services. If an automatic service instance should die or be canceled outside
+          the scope of the SM, the SM will restart the instance, maintaining the registered number of
+          instances at all times. Automatic services are not terminated when their dependent jobs/services
+          exit; they are terminated only when DUCC itself is terminated, or by use of the service stop
+          command.
+
+      There are several subclasses of Registered Services:
+      \begin{description}
+
+        \item[Automatic Services] An automatic service is a registered service that is flagged to be
+          automatically started when the DUCC system is started. When DUCC is started, the SM checks the
+          service registry for all services that are marked for automatic startup. For each service found,
+          the SM submits the registered service specification on behalf of its owner. Each such
+          submission is for a single service instance, so the SM repeats the submission until the
+          registered number of instances is reached.
+          
+        \item[On-Demand Services] An on-demand service is a registered service that is started only when
+          referenced by the service-dependency of another job or service. If the service is already
+          started, the dependent job/service is marked ready to schedule as indicated above. If not, the
+          service registry is checked and, if a start-on-demand service with an endpoint matching the
+          service-dependency is found, DUCC submits the service on behalf of the service owner (in the
+          same manner as for automatic services, establishing the registered number of service instances, a
+          ping thread, and a monitor). When the service has completed initialization, the dependent
+          job/service is marked ready to schedule. If the on-demand service cannot be found in the
+          registry, the referring entity is marked not-startable and the DUCC Orchestrator cancels it.
+          
+          Subsequent jobs and services that reference the on-demand service will use the started
+          instances.  When the last job/service that references the on-demand service exits, a
+          (configurable) timer is established to keep the service alive for a while, in anticipation that
+          it will be needed again soon.  When the keep-alive timer expires and there are no more
+          dependent jobs/services, the on-demand service is automatically stopped to free up its resources
+          for other work.
+
+          \item[External Services] External services consist of only a ping 
thread.  The service
+            itself is not managed in any way by DUCC.  This is useful for 
managing dependencies
+            on services that are not under DUCC control: DUCC can detect and 
report on the health
+            of these services and take appropriate actions on dependent jobs 
if the services
+            are not responsive.
+      \end{description}
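+
+      The on-demand lifecycle described above (start on first reference, linger after the last
+      reference exits, stop when the keep-alive timer expires) can be modeled as in the sketch below.
+      It is an assumption-laden illustration rather than the SM's actual logic: the class
+      {\tt OnDemandService}, its method names, and the explicit millisecond clock are all hypothetical.

```java
// Hypothetical model (not DUCC code) of reference counting with a keep-alive timer.
final class OnDemandService {
    private final long lingerMillis;   // the (configurable) keep-alive interval
    private int references = 0;        // dependent jobs/services currently using the service
    private long lastReleaseAt = -1;   // time the last reference exited, or -1 if in use
    private boolean running = false;

    OnDemandService(long lingerMillis) {
        this.lingerMillis = lingerMillis;
    }

    // A dependent job or service references the service; start it if necessary.
    void reference() {
        references++;
        running = true;
        lastReleaseAt = -1;            // cancel any pending keep-alive timer
    }

    // A dependent job or service exits at time 'now'.
    void release(long now) {
        if (references == 0) {
            throw new IllegalStateException("No outstanding references");
        }
        references--;
        if (references == 0) {
            lastReleaseAt = now;       // arm the keep-alive timer
        }
    }

    // Periodic check: stop the service once it has been idle for the full linger interval.
    void tick(long now) {
        if (running && references == 0 && lastReleaseAt >= 0
                && now - lastReleaseAt >= lingerMillis) {
            running = false;
        }
    }

    boolean isRunning() {
        return running;
    }

    public static void main(String[] args) {
        OnDemandService svc = new OnDemandService(1000);
        svc.reference();
        svc.release(0);
        svc.tick(1500);
        System.out.println("running after linger: " + svc.isRunning());
    }
}
```

+      Passing the clock in explicitly keeps the model deterministic; a new reference that arrives
+      during the linger interval simply cancels the pending stop.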
+          
+    \subsection{Registered Service Management.} The CLI for registered 
services provides several functions:
+
+    \begin{description}
+        \item[Register] Register files a service specification with the SM. The service may optionally
+          be started as part of registration. The service definition and state are persisted across system
+          restarts and are deleted only with the Unregister function.
+          
+        \item[Unregister] Unregister removes the service specification. The 
service is stopped if it is
+          started and not busy. (Note that if the service is busy, jobs and 
services that are dependent
+          on it may subsequently fail.)
+          
+        \item[Modify] Modify allows dynamic update of some parameters of 
registered services:
+            \begin{itemize}
+              \item Automatic and On-Demand state.
+              \item The minimum number of service instances to start when the 
service is started.  
+            \end{itemize}
+
+        \item[Start] Start submits the service specification to the DUCC Orchestrator (repeatedly,
+          until the correct number of instances is started). If the service is explicitly started
+          with the start CLI, the service continues to run even after the last reference is gone,
+          regardless of whether it is automatic or on-demand. Start is also used to increase the
+          number of running instances of a service. The registry may optionally be updated to
+          reflect the new number of started instances.
+          
+        \item[Stop] Stop stops the instances for a registered service. The 
registry may be
+          optionally updated to reflect the new number of instances that are 
still running.
+
+        \item[Query] A CLI-based query is supplied to report on all services 
known to DUCC, their
+          states, their instances, their dependent jobs, and performance 
statistics for the service.
+    \end{description}
+        

