I am sponsoring this fast-track case for myself.
The proposal will time out on 01/23/2008.

The proposed release binding is "minor release" (do I have that right?)
so it can be implemented in Solaris Nevada.
I see no need to do a back-port of this amount of change to Solaris 10.

This case seeks to make the POSIX standard scheduling interfaces,
implemented in libc, compatible with the Solaris scheduler class
interfaces, implemented by the priocntl() system call.

To this end, it is proposed to expand the list of POSIX-style
scheduling policies, defined in <sched.h>, from:
    #define SCHED_OTHER  0
    #define SCHED_FIFO   1       /* run to completion */
    #define SCHED_RR     2       /* round-robin */
    #define SCHED_SYS    3       /* sys scheduling class */
    #define SCHED_IA     4       /* interactive class */
to:
    #define SCHED_OTHER  0   /* traditional time-sharing scheduling class */
    #define SCHED_FIFO   1   /* real-time class: run to completion */
    #define SCHED_RR     2   /* real-time class: round-robin */
    #define SCHED_SYS    3   /* system scheduling class */
    #define SCHED_IA     4   /* interactive time-sharing class */
    #define SCHED_FSS    5   /* fair-share scheduling class */
    #define SCHED_FX     6   /* fixed-priority scheduling class */

To expunge the old libthread pseudo priority range, defined
privately inside of libc (formerly in libthread), altogether:
    #define THREAD_MIN_PRIORITY     0       /* minimum scheduling pri */
    #define THREAD_MAX_PRIORITY     127     /* max scheduling priority */

To make the sched_*() and pthread_*() scheduling interfaces deal
with proper priority ranges, as defined by the priocntl(2) interface,
not the inverted nice value ranges as is done now for SCHED_OTHER.

To change the definition of the subcommand
    PC_GETPRIRANGE
of the priocntl(2) interface and to invent a new
    PC_DOPRIO
subcommand for the priocntl(2) interface, requiring
changes to the kernel scheduling class interface:
    CL_GETCLPRI()
        Each scheduling class must change its interface definition.
    CL_DOPRIO()
        Each scheduling class must implement this new interface.

And finally to change the pthread default 'inheritsched' attribute
for pthread_create(3C) from:
    PTHREAD_EXPLICIT_SCHED
to:
    PTHREAD_INHERIT_SCHED

================================================================

For some background, see these bugids:

1144092 need support for POSIX message passing, scheduling,
        semaphores, shared memory

4032295 Standards violation - SCHED_RR and SCHED_FIFO defined
        but not supported

This is the bugid relating to this PSARC case:

6647542 POSIX scheduling should be compatible with Solaris scheduling classes

================================================================

History and rationale:

Support for POSIX scheduling was added to Solaris in 1994,
in the Solaris 2.4 release.

Before that time, the only threads scheduling functions available
were in libthread:
    int thr_setprio(thread_t tid, int prio);
    int thr_getprio(thread_t tid, int *prio);
and these defined (internally) the legitimate range of priorities to be:
    #define THREAD_MIN_PRIORITY     0       /* minimum scheduling pri */
    #define THREAD_MAX_PRIORITY     127     /* max scheduling priority */

This was adequate for the time, when the only recognized scheduling
class for a multithreaded application was Time-Sharing (TS).

However, even at that time there was a mismatch between the notions
of priority as expressed by Solaris scheduling classes, manipulated
using the priocntl(2) system call and the priocntl(1) command,
and priority as expressed by thr_setprio(3C) and thr_getprio(3C).
The time-sharing priority range expressed by priocntl(2) was (and is)
-60 to +60 as opposed to the range for thr_setprio(3C), 0 to 127.

The situation was made a bit more complicated by the system's support
for the old notion of a nice(2) value for a time-sharing process,
where the range of nice values was (and is) -20 to +20, with lower
nice values indicating higher scheduling priority (a nice value of
-20 maps roughly into a time-sharing user-level priority of +60).
The very old but standardized setpriority(3C) and getpriority(3C)
interfaces deal with this range of "priorities" (not real priorities).

The situation was made considerably more complicated by the original
two-level multithreaded process model, where user-level threads were
multiplexed over a usually smaller set of kernel schedulable
entities or LWPs (light-weight processes).  The thr_setprio(3C)
interface applied its notion of priority to user-level threads while
the priocntl(2) interface applied its notion of priority to LWPs.

Then along came the standardized POSIX scheduling interfaces:
    sched_setparam(3RT)
    sched_getparam(3RT)
    sched_setscheduler(3RT)
    sched_getscheduler(3RT)
    sched_get_priority_min(3RT)
    sched_get_priority_max(3RT)
    sched_rr_get_interval(3RT)
These interfaces introduced the scheduling policies:
    SCHED_OTHER
    SCHED_FIFO
    SCHED_RR
along with the possibility of other scheduling policies, defined
for the system in the <sched.h> header file.  The concept of a
priority range per scheduling policy was also introduced.

These interfaces all concern themselves with individual processes,
not with individual threads.  Any process can apply any of these
interfaces to itself or to any other process, subject only to
permissions.  To "apply to a process" means to apply uniformly
to every kernel schedulable entity (LWP) within the process.
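
As an illustration of the per-policy priority range, here is a minimal
sketch using only the standard sched_get_priority_min() and
sched_get_priority_max() calls.  The helper name policy_priority_range()
is made up for this example.  Because a range as defined by priocntl(2)
may include negative values, the sketch assumes that a return value of -1
is not by itself an error and checks errno instead.

```c
#include <errno.h>
#include <sched.h>

/*
 * Hypothetical helper (not part of any library): fetch the priority
 * range the system reports for a scheduling policy.
 * Returns 0 on success, or an errno value if the policy is rejected.
 */
int
policy_priority_range(int policy, int *min_out, int *max_out)
{
	int lo, hi;

	/* -1 could be a legitimate priority, so rely on errno. */
	errno = 0;
	lo = sched_get_priority_min(policy);
	if (lo == -1 && errno != 0)
		return (errno);

	errno = 0;
	hi = sched_get_priority_max(policy);
	if (hi == -1 && errno != 0)
		return (errno);

	*min_out = lo;
	*max_out = hi;
	return (0);
}
```

A caller would pass SCHED_OTHER, SCHED_FIFO, or SCHED_RR and then clamp
any requested priority to the returned range before calling
sched_setparam().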

Finally along came the standardized POSIX threads scheduling interfaces:
    pthread_setschedparam(3C)
    pthread_getschedparam(3C)
    pthread_setschedprio(3C)
and their adjuncts:
    pthread_attr_setschedparam(3C)
    pthread_attr_getschedparam(3C)
    pthread_attr_setschedpolicy(3C)
    pthread_attr_getschedpolicy(3C)
    pthread_mutex_setprioceiling(3C)
    pthread_mutex_getprioceiling(3C)
    pthread_mutexattr_setprioceiling(3C)
    pthread_mutexattr_getprioceiling(3C)
The concepts of scheduling policy and per-policy priority range
introduced by the sched_*() interfaces are retained by the POSIX
threads scheduling interfaces.  These interfaces can be applied
by one thread of a multithreaded process to itself or to any other
thread within the same process, subject only to permissions.
There is no way for a thread in one process to apply these
interfaces to a thread in a different process.
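
For comparison with the process-level calls, here is a small sketch of
the per-thread interfaces applied by a thread to itself.  The two helper
names are invented for this example; only pthread_self(),
pthread_getschedparam() and pthread_setschedprio() are standard.

```c
#include <pthread.h>
#include <sched.h>

/*
 * Hypothetical helper: read the calling thread's scheduling policy
 * and priority.  Returns 0 or an error number (the pthread functions
 * return the error directly rather than setting errno).
 */
int
current_thread_sched(int *policy_out, int *prio_out)
{
	struct sched_param param;
	int err;

	err = pthread_getschedparam(pthread_self(), policy_out, &param);
	if (err == 0)
		*prio_out = param.sched_priority;
	return (err);
}

/*
 * Hypothetical helper: change only the calling thread's priority,
 * leaving its scheduling policy untouched.
 */
int
set_current_thread_prio(int prio)
{
	return (pthread_setschedprio(pthread_self(), prio));
}
```

Note that both helpers take a pthread_t for the target, which is only
meaningful within the calling process.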

Except for the pseudo priority range of THREAD_MIN_PRIORITY to
THREAD_MAX_PRIORITY, all of these scheduling interfaces can be
supported at the bottom level by the powerful but complex
priocntl(2) interface, augmented by the individual scheduling
class specific interfaces defined for the time-sharing (TS),
real-time (RT), interactive (IA), fair-share (FSS), and fixed
priority (FX) scheduling classes.

Actions to be taken:

The pseudo priority concepts and the definitions of THREAD_MIN_PRIORITY
and THREAD_MAX_PRIORITY should be expunged.  They are useless now that
the two-level threading model has been abandoned.

The supported scheduling classes should be formalized as appropriate
scheduling policies, defined in the <sched.h> header file:

    #define SCHED_OTHER  0   /* traditional time-sharing scheduling class */
    #define SCHED_FIFO   1   /* real-time class: run to completion */
    #define SCHED_RR     2   /* real-time class: round-robin */
    #define SCHED_SYS    3   /* system scheduling class */
    #define SCHED_IA     4   /* interactive time-sharing class */
    #define SCHED_FSS    5   /* fair-share scheduling class */
    #define SCHED_FX     6   /* fixed-priority scheduling class */
    #define _SCHED_NEXT  7   /* first unassigned policy number */

Since Solaris supports third-party scheduling classes, provision
must be made in the sched_*() and pthread_*() scheduling interfaces
to support dynamically-loaded third party scheduling classes by
returning a unique policy number, beyond _SCHED_NEXT, for each
interface that returns a policy number.
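
To illustrate how a consumer might interpret the proposed policy numbers,
here is a sketch of a lookup from policy number to scheduling class name.
The P_ prefixed constants and the function proposed_policy_class() are
made up for this example, so that it does not collide with whatever the
local <sched.h> defines.  Treating every number at or above _SCHED_NEXT
as a third-party class is an assumption of the sketch, not a statement
of how libc assigns those numbers.

```c
#include <stddef.h>

/* Policy numbers as proposed above, renamed to avoid <sched.h> clashes. */
enum {
	P_SCHED_OTHER = 0,	/* time-sharing (TS) */
	P_SCHED_FIFO = 1,	/* real-time (RT), run to completion */
	P_SCHED_RR = 2,		/* real-time (RT), round-robin */
	P_SCHED_SYS = 3,	/* system (SYS) */
	P_SCHED_IA = 4,		/* interactive (IA) */
	P_SCHED_FSS = 5,	/* fair-share (FSS) */
	P_SCHED_FX = 6,		/* fixed-priority (FX) */
	P_SCHED_NEXT = 7	/* first unassigned policy number */
};

/*
 * Hypothetical helper: map a policy number to a class name.
 * Returns NULL for negative (invalid) policy numbers.
 */
const char *
proposed_policy_class(int policy)
{
	static const char *const names[] = {
		"TS", "RT", "RT", "SYS", "IA", "FSS", "FX"
	};

	if (policy < 0)
		return (NULL);
	if (policy >= P_SCHED_NEXT)
		return ("third-party");	/* assumption: see lead-in */
	return (names[policy]);
}
```

Both SCHED_FIFO and SCHED_RR map to the same class, since they are two
policies of the real-time class.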

The sched_*() and pthread_*() scheduling interfaces should deal
with proper priority ranges, as defined by the priocntl(2) interface,
not the inverted nice value ranges as is done now for SCHED_OTHER.

Since a thread's scheduling class and priority may be changed at
any time, without notice and out of control of the thread itself,
provision must be made to communicate the current scheduling class
and priority to a thread from the kernel.  This can be accomplished
with an extension to the private schedctl() interface to include the
thread's scheduling class id, the thread's priority within that class,
and in addition, the current kernel-computed dispatch priority.

To enable a thread's class priority to be manipulated independently
of its class id, two changes to the priocntl() interface are required:
    PC_GETPRIRANGE
        Change the definition of the operation from "return the
        global scheduling priority range of the class" to "return
        the user-mode scheduling priority range of the class".
    PC_DOPRIO
        New operation:  Class-independent method for getting or
        setting the user-mode priority within the current class.
These changes require changes to the kernel scheduling class interface:
    CL_GETCLPRI()
        Each scheduling class must change its interface definition.
    CL_DOPRIO()
        Each scheduling class must implement this new interface.
These operations enable the POSIX scheduling interfaces to work on
a thread running in a third-party scheduling class (whose policy is
not defined in the <sched.h> header file) so long as the third-party
code implements the CL_GETCLPRI() and CL_DOPRIO() interfaces properly.

Finally, the default attributes for creating a thread with pthread_create()
include PTHREAD_EXPLICIT_SCHED.  This is unnatural.  For one thing, it
subverts the intent of setting the default scheduling class for processes
in a zone, often the fair-share class (FSS).  The main thread would be
born (at exec() time) using the default scheduling class, but each thread
created by pthread_create() using the default attributes would attempt
to be placed in the SCHED_OTHER (time-sharing) scheduling class.  Likewise,
a real-time process (RT class) would have its additional threads created
in the time-sharing class by default.

The default value for this attribute should be PTHREAD_INHERIT_SCHED so
that a thread, by default, inherits its scheduling policy and priority
from the creating thread.  This default obeys the principle of least
surprise.
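
An application that cares about this attribute need not depend on the
default at all.  Here is a minimal sketch, using only the standard
pthread_attr_*() calls, that reports the default 'inheritsched' value
and prepares an attribute object that requests inheritance explicitly.
The helper names default_inheritsched() and init_inheriting_attr() are
invented for this example.

```c
#include <pthread.h>

/*
 * Hypothetical helper: report the inheritsched value found in a
 * freshly initialized attribute object, that is, the system default.
 * Returns 0 or an error number.
 */
int
default_inheritsched(int *inherit_out)
{
	pthread_attr_t attr;
	int err;

	if ((err = pthread_attr_init(&attr)) != 0)
		return (err);
	err = pthread_attr_getinheritsched(&attr, inherit_out);
	(void) pthread_attr_destroy(&attr);
	return (err);
}

/*
 * Hypothetical helper: initialize an attribute object that asks for
 * the creating thread's policy and priority, whatever the default is.
 * On success the caller owns attr and must destroy it.
 */
int
init_inheriting_attr(pthread_attr_t *attr)
{
	int err;

	if ((err = pthread_attr_init(attr)) != 0)
		return (err);
	err = pthread_attr_setinheritsched(attr, PTHREAD_INHERIT_SCHED);
	if (err != 0)
		(void) pthread_attr_destroy(attr);
	return (err);
}
```

The attribute object from init_inheriting_attr() can be passed to
pthread_create() and behaves the same before and after the proposed
change of default.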

All of these changes require changes to the following manual pages:
    getpriority.3c
    priocntl.1
    priocntl.2
    priocntlset.2
    pthread_attr_getschedparam.3c
    pthread_attr_getschedpolicy.3c
    pthread_attr_init.3c
    pthread_getschedparam.3c
    pthread_mutex_getprioceiling.3c
    pthread_mutexattr_getprioceiling.3c
    pthread_setschedprio.3c
    sched.h.3head
    sched_get_priority_max.3rt
    sched_getparam.3rt
    sched_getscheduler.3rt
    sched_rr_get_interval.3rt
    sched_setparam.3rt
    sched_setscheduler.3rt
    td_thr_setprio.3c_db
    thr_getprio.3c
    threads.5

See the materials/old_man and materials/new_man directories
for the old and new versions of these manual pages.
Use diff(1) to see the changes.

Roger Faulkner


