Author: challngr Date: Thu Jun 13 19:51:12 2013 New Revision: 1492836 URL: http://svn.apache.org/r1492836 Log: UIMA-2682 Duccbook updates.
Modified: uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/part2/webserver.tex uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/part2/webserver/job-details.tex uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/part2/webserver/jobs.tex uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/part2/webserver/reservations.tex uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/part2/webserver/services.tex uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/part2/webserver/system.tex Modified: uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/part2/webserver.tex URL: http://svn.apache.org/viewvc/uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/part2/webserver.tex?rev=1492836&r1=1492835&r2=1492836&view=diff ============================================================================== --- uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/part2/webserver.tex (original) +++ uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/part2/webserver.tex Thu Jun 13 19:51:12 2013 @@ -5,14 +5,42 @@ \fi \chapter{DUCC Web Server} - The DUCC Web Server default address is accessed from the URL http://wshost:42133. Each local - installation configures the host for "wshost" and may override the default port of 42133. + The DUCC Web Server default address is accessed from the URL http://[DUCC-HOST]:42133. The + {\em[DUCC-HOST]} is the hostname where the local installation has installed the DUCC + Web Server. The Webserver is designed to be mostly self-documenting. The design is intentionally simple - and contains a link to this document. Column headers and reason/state codes have display a short - description if you hover your mouse over it. - - The columns can all be sorted by clicking on the column headers. + and contains a link to this document. 
Most of the interesting fields and column headers + have ``mouse hovers'' which display a short + description if you hover your mouse pointer over them for a moment. + + Normally, the Web Server automatically fetches new data from DUCC and updates the display. + This is controlled by setting one of the two refresh modes: + \begin{itemize} + \item Manual refresh. In this mode, the browser windows are updated only by using the + browser's refresh button, or the DUCC refresh button (to the left in the header of + each page). + \item Automatic refresh. In this mode, the browser automatically fetches and displays + new data. The rate of refresh is currently fixed and cannot be configured. + \end{itemize} + + Two different display modes are supported: + \begin{itemize} + \item Scroll Mode, and + \item Classic Mode. + \end{itemize} + Modes are switched using the {\em Preferences} link. + + \paragraph{Scroll Mode} When in {\em scroll mode}, a scroll bar is shown to the right, within + the main window. The scroll bar allows scrolling to be restricted to the data + display, leaving column and DUCC headers in place. In this mode any column may be sorted + simply by clicking on it. + + \paragraph{Classic Mode} When in {\em classic mode}, the main data may extend below the + bottom of the page and it will be necessary to use the browser's scroller on the right + to access it. The column headers and DUCC header scroll off when doing this. Columns + may be sorted in this mode but it is necessary to first switch to ``Manual'' refresh mode to + prevent browser refreshes during sorting and display of data. % Create well-known link to this spot for HTML version \ifpdf @@ -38,7 +66,20 @@ \end{itemize} \item[Preferences] - Set preferences for table style, date style, filters, etc. + The following preferences may be set: + \begin{description} + \item[Table Style] This selects ``scroll'' or ``classic'' display mode, as + described above.
+ \item[Date Style] This selects short, medium, or long formats for dates. + \item[Description Style] This selects long or short formats for the various + description fields. + \item[Filter Users] This controls the ``filter'' box near the middle of + the header on each page. It allows various levels of inclusion and + exclusion of active or completed work for the filtered users. + \item[Role] This allows selection of ``User'' or ``Administrator'' roles. + This protects registered DUCC administrators from accidentally affecting + other people's work. + \end{description} \item[DuccBook] \hfill \\ This is a link to the HTML version of the document you are reading. @@ -55,13 +96,13 @@ system. \item[System] \hfill \\ - This opens a submenu with system-related links: + This opens a sub-menu with system-related links: \begin{itemize} - \item[] Administration - This opens a page with administrative functions. - \item[] Classes - This shows all the scheduling classes defined to the system. - \item[] Daemons - This shows the status of DUCC's management processes. - \item[] DuccBook - This manual. - \item[] Machines - This shows the status of all the ducc worker nodes. + \item Administration - This opens a page with administrative functions. + \item Classes - This shows all the scheduling classes defined to the system. + \item Daemons - This shows the status of DUCC's management processes. + \item DuccBook - This manual. + \item Machines - This shows the status of all the DUCC worker nodes.
\end{itemize} \end{description} Modified: uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/part2/webserver/job-details.tex URL: http://svn.apache.org/viewvc/uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/part2/webserver/job-details.tex?rev=1492836&r1=1492835&r2=1492836&view=diff ============================================================================== --- uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/part2/webserver/job-details.tex (original) +++ uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/part2/webserver/job-details.tex Thu Jun 13 19:51:12 2013 @@ -5,7 +5,7 @@ This page shows details of all the processes that run in support of a job. The information is divided among four tabs: \begin{description} - \item[Processes] This tab conains details on all the processes for the job, both + \item[Processes] This tab contains details on all the processes for the job, both active, and defunct. \item[Work Items] This tab shows details for each individual work-item in the job. \item[Performance] This tab shows a performance break-down of all the UIMA analytics @@ -21,15 +21,16 @@ \item[Id] \hfill \\ This is the DUCC-assigned numeric id of the process (not the Operating System's - processid). Process 0 is alwyas the Job Driver. + processid). Process 0 is always the Job Driver. \item[Log] \hfill \\ This is the log name for the process. It is hyperlinked to the log itself. \item[Size] \hfill \\ This is the size of the log in MB. If you find you have trouble viewing the log - from the web server it could be because it is too big to view in the server and needs to - be check directly. + from the Web Server it could be because it is too big to view in the server and needs to + be read by some other means than the Web Server. (It is not currently paged in by + the Web Server, it is read in full.) \item[Hostname] \hfill \\ This is the name of the node where the process ran. 
@@ -38,10 +39,12 @@ This is the Unix process ID (PID) of the process. \item[State Scheduler] \hfill \\ - This shows the Resesource Manager state of the job. It is one of: + % The information comes from here: + % State Scheduler: org.apache.uima.ducc.transport.event.common.IResourceState.ResourceState + This shows the Resource Manager state of the job. It is one of: \begin{description} - \item[Allocated] - The node is still allocated for this job by the RM. + \item[Allocated] - The node is currently allocated for this job by the RM. \item[Deallocated] - The resource manager has deallocated the shares for the job on this node. \end{description} @@ -49,8 +52,13 @@ \item[Reason Scheduler or extraordinary status] \hfill \\ \phantomsection\label{itm:job-details-sched} - This shows why a process was terminated. These all have ``hovers'' that provide more information + + % The information comes from here: + % Reason Scheduler: org.apache.uima.ducc.transport.event.common.IResourceState.ProcessDeallocationType + This column provides a reason for the scheduler state, when the scheduler state is other than ``Allocated''. + These all have ``hovers'' that provide more information if it is available. + \begin{description} \item[AutonomousStop] - The process terminated unexpectedly of its own accord ("crashed", or simply exited.) @@ -60,7 +68,7 @@ \item[Failed] - The process is terminated by the Agent because the JP wrapper was able to detect and communicate a fatal condition (Exception) in the pipeline.. - \item[FailedInitialization] - The process is terminated because the initialization step failed. + \item[FailedInitialization] - The process is terminated because the UIMA initialization step failed. \item[Forced] - The node is preempted by RM for other work because of fair share. @@ -70,23 +78,27 @@ \item[JobFailure] - The job failure limit is exceeded, causing the job to be canceled by the JD. 
- \item[InitializationTimeout] - The initialization phase exceeded the configured timeout. - - \item[Killed] - The agent terminated the process for some reason. - - \item[Stopped] - The job is winding down, there's no more work for this node, so it stops. + \item[InitializationTimeout] - The UIMA initialization phase exceeded the configured timeout. + \item[Killed] - The agent terminated the process for some reason. The ``Reason Agent'' field + should have more details in this case. + + \item[Stopped] - The process terminated normally. HELP HELP how is this different from Voluntary? + \item[Voluntary] - The job is winding down, there's no more work for this node, so it stops. + HELP HELP How is this different from Stopped. Also State Agent has Stopped - does this + RELATE TO THAT? HELP HELP - \item[Unknown] - None of the above. This is an exceptional condition. Check the JP and JD logs for - possible causes.. + \item[Unknown] - None of the above. This is an exceptional condition, sometimes an + internal DUCC error. Check the JP and JD logs for possible causes. \end{description} \item[State Agent] \hfill \\ \phantomsection\label{itm:job-details-state} - If there's an error detected only by the DUCC Agent, this shows the Agent's reason for - a process's death. + % This state comes from here: + % State Agent: org.apache.uima.ducc.transport.event.common.IProcessState.ProcessState + This shows the DUCC Agent's view of the state of the process. \begin{description} \item[Starting] The DUCC process manager as issued a request to the assigned to start the process. @@ -98,30 +110,31 @@ \item[Failed] The DUCC Agent reports the process failed with errors. This usually means that UIMA-AS has detected exceptions in the pipeline and reported them to the Job Driver for logging. - \item[FailedInitialization] The process died during the UIMA initializaiton phase. + \item[FailedInitialization] The process died during the UIMA initialization phase.
\item[InitializationTimeout] The process exceeded the site's limit for time spent in UIMA initialization. \item[Killed] The DUCC Agent killed the process for some reason. There are - three resons for this: + three reasons for this: \begin{enumerate} \item The Job Processes failed to initialize, - \item The Job Process times out during initialization, - \item The process exceedes its allowed swap. + \item The Job Process timed out during initialization, + \item The process exceeded its allowed swap. \end{enumerate} - \item[Abandonded] + \item[Abandoned] WHAT IS THIS? \end{description} \item[Reason Agent] \hfill \\ \phantomsection\label{itm:job-details-agent} - If there's an error detected only by the agent, this shows the Agent's reason for - a process's death. + This shows extended reason information if a process exited for any reason other than + running out of work to do. + \begin{description} \item[AgentTimedOutWatingForORState] The DUCC Agent is expecting a state update from the DUCC Orchestrator. Timer on this wait has expired. This usually indicates an infrastructure or communication problem. \item[Croaked] The process exited for no good or clear reason, it simply vanished. - \item[Deallocated] + \item[Deallocated] WHAT IS THIS? \item[ExceededShareSize] The process exceeded it's declared memory size. \item[ExceededSwapThreshold] The process exceeded the configured swap threshold. \item[FailedInitialization] The process was terminated because the UIMA @@ -137,12 +150,12 @@ out of swap space. This is a preemptive measure taken by DUCC to avoid exhaustion of swap, to effect orderly eviction of the job before the operating system starts its own reaping procedures. - \item[AdministratorInitiated] The process was canceled by an adminstrator. + \item[AdministratorInitiated] The process was canceled by an administrator. \item[UserInitated] The process was canceled by the owning user.
\end{description} \item[Time Init] \hfill \\ - This is the clock time this process spent in initializaiton. + This is the clock time this process spent in initialization. \item[Time Run] \hfill \\ This is the clock time this process spent in executing, not including @@ -165,7 +178,7 @@ initialization + run times. \item[\%CPU] \hfill \\ - Currnt CPU percent consumed by the process. This will be $>$ 100\% on + Current CPU percent consumed by the process. This will be $>$ 100\% on multi-core systems if more than one core is being used. Each core contributes up to 100\% CPU, so, for example, on a 16-core machine, this can be as high as 1600\%. @@ -177,7 +190,7 @@ This is the average time in seconds spent per work item in the process. \item[Time max] \hfill \\ - This is the minimum time in seconds spent per work item in theprocess. + This is the maximum time in seconds spent per work item in the process. \item[Time min] \hfill \\ This is the minimum time in seconds spent per work item in the process. @@ -190,7 +203,7 @@ \item[Retry] \hfill \\ This is the number of work items that were retried in this process for any reason, excluding - preemptions. + preemption. \item[Preempt] \hfill \\ This is the number of work items that were preempted from this process, if @@ -205,12 +218,16 @@ \subsection{Work Items} \label{sec:ws-work-items} This tab provides details for each individual work item. Columns include: - + + % The data comes from here: org.apache.uima.ducc.common.jd.files.IWorkItemState.State \begin{description} - \item[SeqNo] This is the sequence work items are fetched from the Collection Reader's + \item[SeqNo] \hfill \\ + This is the sequence in which work items are fetched from the Collection Reader's getNext() method by the DUCC Job Driver. - \item[Id] This is the name of the work item. - \item[Status] The is the current state of the work item. + \item[Id] \hfill \\ + This is the name of the work item. + \item[Status] \hfill \\ + This is the current state of the work item.
States include: \begin{description} \item[ended] The work item is complete. @@ -224,12 +241,17 @@ \end{description} If a work item has not yet been retrieved from the Collect Reader it does not show on this page. - \item[Queuing Time (sec)] The time spent in ActiveMQ after being queued, and before + \item[Queuing Time (sec)] \hfill \\ + The time spent in ActiveMQ after being queued, and before being picked up by a Job Process. - \item[Processing Time (sec)] The time spent processing the work item. - \item[Node (IP)] The node IP where the work item was processed. - \item[Node (Name] The node name where the work item was processed. - \item[PID] The Unix Process Id that the work item was processed in. + \item[Processing Time (sec)] \hfill \\ + The time spent processing the work item. + \item[Node (IP)] \hfill \\ + The node IP where the work item was processed. + \item[Node (Name)] \hfill \\ + The node name where the work item was processed. + \item[PID] \hfill \\ + The Unix Process Id of the process in which the work item was processed. \end{description} @@ -239,14 +261,20 @@ are aggregated over all instances of each component in each process of the job. \begin{description} - \item[Name] The short name of the analytic. The full name is shown in the command-line + \item[Name] \hfill \\ + The short name of the analytic. The full name is shown in the command-line tool \hyperref[sec:cli.ducc-perf-stats]{ducc\_perf\_stats} - \item[Total] This is the total time in days, hours, minutes, and seconds taken by each + \item[Total] \hfill \\ + This is the total time in days, hours, minutes, and seconds taken by each component of the pipeline. - \item[\% of Total] This is the percent of the total usage consumed by this analytic. - \item[Avg] This is the average time spent by all the instances of the analytic. - \item[Min] This is the minimum time spent by any instance of the analytic. - \item[Max] This is the maximum time spent by any instance of the analytic.
+ \item[\% of Total] \hfill \\ + This is the percent of the total usage consumed by this analytic. + \item[Avg] \hfill \\ + This is the average time spent by all the instances of the analytic. + \item[Min] \hfill \\ + This is the minimum time spent by any instance of the analytic. + \item[Max] \hfill \\ + This is the maximum time spent by any instance of the analytic. \end{description} \subsection{Specification} Modified: uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/part2/webserver/jobs.tex URL: http://svn.apache.org/viewvc/uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/part2/webserver/jobs.tex?rev=1492836&r1=1492835&r2=1492836&view=diff ============================================================================== --- uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/part2/webserver/jobs.tex (original) +++ uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/part2/webserver/jobs.tex Thu Jun 13 19:51:12 2013 @@ -28,27 +28,50 @@ This is the resource class the job is submitted to. \item[State] \hfill \\ - This shows the state of the job. States include: + This shows the state of the job. The normal job progression is shown below, with an + explanation of what each state means. \begin{description} - \item[Received] - The job has ben vetted, persisted, and assigned a unique ID. + \item[Received] - The job has been vetted, persisted, and assigned a unique ID. \item[WaitingForDriver] - The job is waiting for the Job Driver to initialize. - \item[WaitingForServices] - The job is waiting to verify that any declared services are available. - \item[WaitingForResources] - The job is waiting to be scheduled. - \item[Initializing] - The job is in its initializaiton phase. + \item[WaitingForServices] - The job is waiting for verification from the + Service Manager that required services are started and responding. This may + cause DUCC to start services if necessary. 
In that event, this state will + persist until all pre-requisite services are ready. + \item[WaitingForResources] - The job is waiting to be scheduled. In busy + systems this may require preemption of existing work. In that case this + state will persist until preemption is complete. + \item[Initializing] - The job is initializing. Usually this + is the UIMA-AS initialization phase. In the default configuration, only + two (2) processes are allocated by the Resource Manager. No additional + resources are allocated until at least one of the new processes successfully + completes initialization. Once initialization is complete the Resource Manager + will double the number of allocated processes until the user's fair share of + the resources is attained. \item[Running] - At least one process is now initialized and running. - \item[Completing] - The last process has finished and the job is cleaning up. + \item[Completing] - The last work item has completed and DUCC is freeing resources. + If the job had many resources allocated at the time the job exited, this state + will persist until all allocated resources are freed. \item[Completed] - The job is complete. \end{description} - \item[Reason] \hfill \\ - This is information relating to completion state. + \item[Reason or Extraordinary Status] \hfill \\ + + % See this structure: + % org.apache.uima.ducc.transport.event.common.IDuccCompletionType + + This field contains miscellaneous information pertaining to the job. If the job exits + the system for any reason, that reason is shown here. If the job's pre-requisite + services are unavailable (or ailing) that fact is displayed here. If there is a + job monitor running, that fact is shown here. Most of the values for this field + support ``hovers'' containing additional information about the reason. + \begin{description} - \item[EndOfJob] - The job ran with no errors. + \item[EndOfJob] - The job completed and ran with no errors.
\item[Error] - All work items are processes but at least one had an error. - \item[CanceledByDriver] - The Job Driver (JD) terminated the job. The reason for termination is - seen by hovering over the text with your mouse. + \item[CanceledByDriver] - The Job Driver (JD) terminated the job. The reason for + termination is seen by hovering over the text with your mouse. \item[CanceledBySystem] - The job was canceled because DUCC was shutdown. - \item[CanceledBySser] - The job owner or DUCC administator canceled the job. + \item[CanceledByUser] - The job owner or DUCC administrator canceled the job. \item[DriverInitializationFailure] - The Job Deiver (JD) process is unable to initialize. Hover over the field with your mouse for details (if any are available), and check your JD log. \item[DriverProcessFailed] - The Job Driver (JD) process failed for some reason. Hover over the @@ -59,13 +82,18 @@ Service Manager (SM) cannot find or start the required service. \item[Premature] - The job was terminated for some unknown reason before all work items were processed. Check the JP logs for details. - \item[ProcessInitializationFailure] - Too many processes failed during initialization. Check the JP - logs for the reason. - \item[ProcessFailure] - Too many processes failed while running. Check the JP logs for the reason. + \item[ProcessInitializationFailure] - Too many processes failed during + initialization and the job was canceled by DUCC. Check the JP logs for the + reason. + \item[ProcessFailure] - Too many processes failed while running and DUCC canceled + the job. Check the JP logs for the reason. \item[ResourcesUnavailable] - The Resource Manager (RM) is unable to allocate resources for - the job. For non-preemptable jobs this could be because the limit on that type of allocaiton is + the job.
For non-preemptable jobs this could be because the limit on that type of allocation is reached, or all the nodes are already allocated and work cannot be preempted to make space for it. For all jobs, it could be because the job class is invalid. + \item[{\em service\_name}] If there is a service name in this field, it indicates the job is + dependent on the service but the service is not responding to the DUCC Service Monitor's + pinger. \end{description} \item[Services] \hfill \\ @@ -77,11 +105,11 @@ \item[Init Fails] \hfill \\ This is the total number of initialization failures experienced by the job. This - field is hyperlinked to pages showing the specific failures. + field is hyperlinked to pages with log excerpts highlighting the specific failures. \item[Run Fails] \hfill \\ This is the total number of process failures experienced by the job. This field is - hyperlinked to a page showing the specific failures. + hyperlinked to pages with log excerpts highlighting the specific failures. \item[Pgin] This is the number of page-in events, over all processes, on the machines running the job. @@ -95,23 +123,39 @@ This is the total number of work items declared by the job. \item[Done] \hfill \\ - This is the total number of work items successfuly completed for the job. + This is the total number of work items successfully completed for the job. \item[Error] \hfill \\ This is the total number of exceptions thrown or other errors experienced by work - items. This field is hyperlinked to a page showing the specific failures. + items. This field is hyperlinked to pages containing log excerpts highlighting + the failures. \item[Dispatch] \hfill \\ - This is the total number CASs that are currently dispatched. This is usally - min(Processes * Threads, incomplete\_work\_items - errors) + This is the total number of CASs that are currently dispatched.
+ + This usually represents the quantity derived from the following formula: +\begin{verbatim} + min( (initialized.processes * threads.per.process), (incomplete.work.items - errors) ) +\end{verbatim} + + The actual number is a measured number, not a calculated number, and may differ + slightly from the formula if the measurement is taken immediately after process + start-up, or in the time between a work item completing and a new one being + dispatched. \item[Retry] \hfill \\ - This is the number of CASs that were retried for any reason (such as timeout). + This is the number of CASs that were retried for any reason. Reasons for retry + include preemption for fair-share, work-item timeout, or error conditions. + + Note: If a work item in any process fails, the entire process is considered + suspect, and all work items in the process are terminated. Work items in the + process which did not have errors are re-dispatched (retried) to a different + process. \item[Preempt] \hfill \\ This is the total number of processes that have been preempted to make room for other work due to Fair Share. \item[Description] \hfill \\ - This is the descriptin string from the --description string from submit. + This is the string supplied with the $--$description option at submit time. \end{description} Modified: uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/part2/webserver/reservations.tex URL: http://svn.apache.org/viewvc/uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/part2/webserver/reservations.tex?rev=1492836&r1=1492835&r2=1492836&view=diff ============================================================================== --- uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/part2/webserver/reservations.tex (original) +++ uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/part2/webserver/reservations.tex Thu Jun 13 19:51:12 2013 @@ -8,7 +8,7 @@ and {\em unmanaged.}.
A {\em managed reservation} is a reservation whose process is fully managed by DUCC. This process is any arbitrary process and is submitted with the \hyperref[sec:cli.ducc-process-submit]{ducc\_process\_submit} CLI. The lifetime of the reservation -starts at the time DUCC assignes a unique ID, and ends when the process terminates for any reason. +starts at the time DUCC assigns a unique ID, and ends when the process terminates for any reason. An {\em unmanaged reservation} is essentially a sandbox for the user. DUCC starts no processes in the reservation and manages none of the processes which run on that node. The lifetime of the @@ -26,7 +26,7 @@ The Reservations page contains the follo details on the process running in the reservation. \item[Start] \hfill \\ - This is the time the reservation was mde. + This is the time the reservation was made. \item[End] \hfill \\ This is the time the reservation was canceled or otherwise ended. @@ -42,20 +42,31 @@ The Reservations page contains the follo \hyperref[sec:ws-reservations]{above}. \item[State] \hfill \\ + % 1. org.apache.uima.ducc.transport.event.common.IDuccState This is the status of the reservation. Values include: Received - Reservation has been vetted, persisted, and assigned unique Id. \begin{description} - \item[WaitingForResources] - The reservation is waitng for the Resource Manager to find and - schedule esources. \item[Assigned] - The reservation is active. \item[Completed] - The reservation has been terminated. + \item[Received] - The reservation has been vetted, persisted, and assigned a unique ID. + \item[WaitingForResources] - The reservation is waiting for the Resource Manager to find and + schedule resources. \end{description} \item[Reason] \hfill \\ - If a reservation is not active, the reason. Reasons include: + + % 2. org.apache.uima.ducc.transport.event.common.IDuccCompletionType + + If a reservation is not active, this shows the reason.
Note that for + {\em unmanaged reservations}, even if the user has processes running in the + reservation, DUCC does NOT attempt to terminate those processes (hence ``unmanaged''). + + For {\em managed reservations}, DUCC does terminate the associated process. + \begin{description} \item[CanceledBySystem] - The job was canceled because DUCC was shutdown. - \item[CanceledByUser] - The owner or administrator released the reservation. + \item[CanceledByAdmin] - The DUCC administrator released the reservation. + \item[CanceledByUser] - The reservation owner released the reservation. \item[ResourcesUnavailable] - The Resource Manager was unable to find free or freeable resources match the resource request. \item[ProgramExit] - The reservation is a {\em managed} reservation and the associated @@ -63,7 +74,7 @@ The Reservations page contains the follo \end{description} \item[Allocation] \hfill \\ - This is the number of resources (shares for FIXED policy reservartions, processes for + This is the number of resources (shares for FIXED policy reservations, processes for RESERVE policy reservations) that are allocated. \item[UserProcesses] This is the number of processes owned by the user running in all @@ -71,7 +82,7 @@ The Reservations page contains the follo Note that even for {\em unmanaged} reservations, the DUCC agent tracks processes owned by the user and reports on them. This allows better identification and management of - abandonded reservations. + abandoned reservations. \item[Size] \hfill \\ The memory size in GB of the each allocated unit. This is the amount of memory that @@ -82,6 +93,6 @@ The Reservations page contains the follo The node names of the machines where the resources are allocated. \item[Description] \hfill \\ - This is the descriptin string from the --description string from submit. + This is the string supplied with the $--$description option at submit time.
\end{description} Modified: uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/part2/webserver/services.tex URL: http://svn.apache.org/viewvc/uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/part2/webserver/services.tex?rev=1492836&r1=1492835&r2=1492836&view=diff ============================================================================== --- uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/part2/webserver/services.tex (original) +++ uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/part2/webserver/services.tex Thu Jun 13 19:51:12 2013 @@ -8,7 +8,7 @@ \item[Id] \hfill \\ This is the unique numeric DUCC id of the service. This ID is hyperlinked to a - \hyperref[sec:ws-service-details]{Servic Details} page with extended + \hyperref[sec:ws-service-details]{Service Details} page with extended details on the service. Note that for some types of services, DUCC may not know more about the service than is shown on the main page. @@ -19,7 +19,7 @@ This is the service type. There are a number of variants on service types, as discussed in the - \hyperref[sec:services.types]{services} section of this book. The webserver + \hyperref[sec:services.types]{services} section of this book. The web server simplifies these into the following three values: \begin{itemize} \item Registered @@ -33,13 +33,17 @@ \begin{description} \item[Available] At least one service instance is responding to the service pinger, indicating it is functional. - \item[Initializing] No service instances are running but at least one instance - is in its UIMA-AS {\em initializing} phase. - \item[Waiting] At least one service instance is in Running state, and the Service - Manager is waiting for a response from the service pinger. - \item[NotAvailable] No service instance is running. + \item[Initializing] No service instances are available for use yet but at least one instance + is in its UIMA {\em initializing} phase. 
+ \item[Waiting] At least one service instance is in Running state, potentially available for use, + but no response has been received from the service pinger. This usually occurs during the + start-up of a service. If a service stops responding to its pinger after becoming + available, the state can regress to Waiting. + \item[NotAvailable] No service instance is running or initializing. \item[Stopping] The service has been stopped for some reason, but not all - instances have terminated. + instances have terminated. This is an intermediate state between Available and + NotAvailable to signify that the service is no longer available but not all its + resources have been returned yet. \end{description} DUCC will start dependent jobs ONLY if it's services are in state Available. Otherwise @@ -50,14 +54,15 @@ allowed to continue. \item[Pinger] \hfill \\ - This indicates whether the Service Manager is running a pinger for the service. + This indicates whether the Service Manager is running a pinger for the service. This column + does not imply the ping is successful; see the ``health'' column for ping status. \item[Health] \hfill \\ {\em Health} is a status returned by each pinger and is the result of that pinger's evaluation of the state of the service. It is shown as on of \begin{itemize} \item {\em Good} - \item {\em Bad} + \item {\em Poor} \end{itemize} Both terms are highly subjective. Pingers may return a summary of the underlying data used to label a service as good or bad. That status is shown as a hover over @@ -95,7 +100,7 @@ \item[Reservations] \hfill \\ This field shows the number of managed reservations dependent on this service. The IDs of the managed reservations - rea shown as a hover over the field. + are shown as a hover over the field. 
\item[Description] \hfill \\ Modified: uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/part2/webserver/system.tex URL: http://svn.apache.org/viewvc/uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/part2/webserver/system.tex?rev=1492836&r1=1492835&r2=1492836&view=diff ============================================================================== --- uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/part2/webserver/system.tex (original) +++ uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/part2/webserver/system.tex Thu Jun 13 19:51:12 2013 @@ -4,9 +4,9 @@ This page shows information relating to the DUCC System itself: \begin{description} - \item[Admistration]This displays system adminstrators and implements + \item[Administration]This displays system administrators and implements the interface to various administrative controls. - \item[Classes] This shows the curent system's scheduling class definitions. + \item[Classes] This shows the current system's scheduling class definitions. \item[Daemons] This shows the status of all DUCC processes. \item[DuccBook] This is a link to the book you are reading. \item[Machines] This shows details of all the machines in the DUCC cluster. @@ -16,21 +16,21 @@ This page shows information relating to This page has two tabs: \begin{description} - \item[Administrators] This shows the userids that are authorized to administer + \item[Administrators] This shows the user-ids that are authorized to administer DUCC. In addition to executing the ``Control'' functions described below, administrators may cancel any job, reservation, or service, and may modify services they do not own. - In order to perform administrative funcrtions, the following must be satisfied: + In order to perform administrative functions, the following must be satisfied: \begin{enumerate} - \item The user is logged-in to the webserver. + \item The user is logged-in to the web server. 
\item The user is a registered administrator. \item The user has set the role as ``administrator'' in the DUCC Preferences - page. This is a safeguard so that adminimstrators who are also users - are less likely to inadvertantlly affect other people's jobs. + page. This is a safeguard so that administrators who are also users + are less likely to inadvertently affect other people's jobs. \end{enumerate} \item[Control] Currently DUCC supports a single administrative control function - via the webserver: Stop new job submissions and reenable them. If submissions + via the web server: Stop new job submissions and re-enable them. If submissions are blocked, all existing work runs normally, but no new work is accepted. \end{description} @@ -47,16 +47,16 @@ shown. A button in the upper left of th the status of all the DUCC agents as well. (Agents are suppressed by default because the page is expensive to render for large systems.) -The coloumns shown on this page include +The columns shown on this page include \begin{description} \item[Status] \hfill \\ - This indicats whether the daemon is running and broadcasting state {\em up}, + This indicates whether the daemon is running and broadcasting state {\em up}, or not {\em down}. All DUCC daemons broadcast a heartbeat containing process state. If the Status is {\em down}, either the daemon is not functioning, or something is preventing - state from reaching the webserver via DUCC's ActiveMq instance. + state from reaching the web server via DUCC's ActiveMq instance. \item[Daemon Name] \hfill \\ This is the name of the process. @@ -86,10 +86,10 @@ The coloumns shown on this page include \item[Heartbeat (max)] \hfill \\ This shows the longest delay since a state publication for the process was received - at the webserver. Large numbers here indicate potential cluster or DUCC problems. + at the web server. Large numbers here indicate potential cluster or DUCC problems. 
\item[Heartbeat (max) TOD] \hfill \\ - This shows the time the longest delay of a state publicatin occurred. + This shows the time the longest delay of a state publication occurred. \item[JConsole URL] \hfill \\ This is the jconsole URL for the process. @@ -100,7 +100,7 @@ The coloumns shown on this page include This page shows the states of all the machines in the DUCC cluster. -The coloumns shown on this page include +The columns shown on this page include \begin{description} \item[Status] \hfill \\ @@ -111,12 +111,12 @@ The coloumns shown on this page include started there, or else there is a communication problem and the state messages are not being delivered. \item[up] The node has a DUCC Agent process running on it and the - webserver is receiving regular heartbeat packets from it. + web server is receiving regular heartbeat packets from it. \item[down] The node had a healthy DUCC Agent on it at some point - in the past (since the last DUCC boot), but the webserver has stopped + in the past (since the last DUCC boot), but the web server has stopped receiving heartbeats from it. - The agent may have been manuallly shut down, may have crashed, or there + The agent may have been manually shut down, may have crashed, or there may be a communication problem. Additionally, very heavy loads from jobs running the the node can cause @@ -152,8 +152,8 @@ The coloumns shown on this page include as an alert of a potential problem \item[Alien PIDs] \hfill \\ - This shows the number of processes not owned by DUCC, the opertating system, or - jobs sheduled on each node. The Unix Process IDS of these processes is displayed + This shows the number of processes not owned by DUCC, the operating system, or + jobs scheduled on each node. The Unix Process IDS of these processes is displayed in a hover. DUCC preconfigures many of the standard operating @@ -161,7 +161,7 @@ The coloumns shown on this page include \hyperref[itm:props-rogue.user]{userids}. 
This list may be updated by each installation. - A common cause of alien PIDs is errant processe run in unmanaged reservations. A + A common cause of alien PIDs is errant process run in unmanaged reservations. A user may reserve a machine for use as a sandbox. If the reservation is released without properly terminating all the processes, they may linger. When ducc schedules the node for other purposes, significant performance penalties may be @@ -171,7 +171,7 @@ The coloumns shown on this page include \item[Shares (total)] \hfill \\ This shows the total number of scheduling share supported on this node. - \item[Shares(inuse)] \hfill \\ + \item[Shares(in use)] \hfill \\ This shows the total number of scheduling share in use on the node. \item[Heartbeat(last)] \hfill \\