Nora,
Batch jobs normally run with the privileges of the "owner" of the job, using
the SECUSER facility in z/VM.  With SFS, this can lead to unexpected results
when a prior batch job leaves the worker holding a connection to the filepool
under a different user's ID.  If the assignment of jobs to batch workers is
somewhat random, that would produce exactly the outcome you're seeing
(sometimes it works, sometimes it fails).

For example:
If WORK7 first runs a job for "JOHN", and JOHN uses the same filepool, then
WORK7 could be left with a connection to the filepool that appears to be
from the user "JOHN".  If your failing job runs next on WORK7, it would
end up accessing the filepool as JOHN rather than as the actual owner,
which could cause the failure.

If, on another night, WORK7 is recycled before your failing job runs, it
would establish a fresh connection to the filepool under the owner's ID
(say OWNER), run with the access rights of OWNER, and you'd see success.
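
One way to confirm this is to look at who the filepool server thinks is
connected before your job runs.  I believe QUERY FILEPOOL CONNECT will show
the current connections, though I don't have a system handy to verify the
exact form; a rough sketch, run on the worker:

    /* rough sketch (unverified) - run on WORK7 just before     */
    /* the job starts, against your default filepool            */
    'QUERY FILEPOOL CONNECT'
    say 'rc from query:' rc

If that shows a connection under JOHN (or some other prior job's owner)
rather than the expected ID, that's your smoking gun.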

If this is your problem, there are several ways to solve it.  The easiest
would be to ensure that worker machines are recycled (logoff/xautolog or
IPL CMS) before the job runs.  I wonder if there are VMBATCH options
(system or job) that allow a more thorough "clean up" of a worker prior to
a job run.
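
For instance, a minimal sketch of a recycle step, run from a service
machine with the necessary CP privileges (FORCE and XAUTOLOG are class A/B
commands; WORK7 is the worker name from your note):

    /* RECYCLE EXEC - minimal sketch; recycle a batch worker    */
    parse arg worker .
    if worker = '' then worker = 'WORK7'
    'CP FORCE' worker               /* log the worker off       */
    'CP SLEEP 5 SEC'                /* give the logoff time     */
    'CP XAUTOLOG' worker            /* bring it back up clean   */

A fresh logon means a fresh filepool connection under the correct ID.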

John

--
John Hall
Safe Software, Inc.
727-608-8799
johnh...@safesoftware.com


On Thu, Apr 14, 2011 at 3:45 PM, Graves Nora E <nora.e.gra...@irs.gov> wrote:

>  We are having an intermittent problem with SFS and I'm hoping someone may
> have some ideas of what to pursue next.
>
> We have several batch jobs that run under VMBATCH overnight.  Sometimes
> they are unable to create a file in a directory, even though most of the
> time they succeed.  The only differences between executions are the file
> names; for many of these the file type is the date.
>
> In the job I am most familiar with, these are the specifics.
>
> The job runs Monday-Saturday.  This year, it has failed on January 4,
> January 12, February 9, March 18, March 25, and April 13.  It has run
> successfully on the other days.  Other than the addition of the QUERY
> statements described below, it has not changed.
> The job runs in a work machine, WORK7.
> The job is submitted by the User ID of the database owner.
> The SFS directories are owned by a third user ID.  Failures occur in many
> of the subdirectories owned by this user, not just one.  This ID owns most
> of the directories containing the data files we create in batch, so I don't
> think it's significant that it's the ID with the problem.  However, as far
> as I know, it is the only ID that does have the problem.
> This job uses VMLINK to acquire a write link to the SFS directory.  This
> always appears successful--no error is given.  (Other jobs use GETFMADDR
> and ACCESS to acquire the write link to the directory; those always appear
> successful as well.)
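>
> In case it helps, the acquisition step is roughly like this (simplified,
> and from memory, so the syntax may not be exact; the filepool, owner, and
> directory names are placeholders):
>
>     /* rough sketch of the link/access step               */
>     'VMLINK .DIR POOL1:DATAOWNR.REPORTS ( WRITE'
>     if rc <> 0 then do
>        say 'VMLINK failed, rc=' rc
>        exit rc
>        end
>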
> Once the file is ready to be copied from the work machine's 191 disk to
> the SFS directory, the intermittent error appears.  The vast majority of
> the time, the write is successful.  However, sometimes the job gets this
> error message:
> DMSOPN1258E You are not authorized to write to file XXXXXX 20110413 Z1
>
> The file is not large--last night's file was only 12 blocks.
>
> At the suggestion of our systems programmer, I've put in a lot of QUERY
> statements.  I've issued QUERY LIMITS for the job submitter; it has used
> only 84% of its allocation, with several thousand blocks still available.
> The SFS directory owner has used only 76% of its allocation, also with
> several thousand blocks still available.  The filepool is not full.
>
> I've issued QUERY FILEPOOL CONFLICT.  There is no conflict.
>
> I've issued QUERY ACCESSED.  The directory shows that it is accessed R/W.
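>
> For reference, the checks roughly as issued (the submitter's ID is a
> placeholder):
>
>     'QUERY LIMITS FOR SUBMITTR'    /* submitter's space usage      */
>     'QUERY FILEPOOL CONFLICT'      /* any lock conflicts?          */
>     'QUERY ACCESSED'               /* confirm the directory is R/W */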
>
> When the write is unsuccessful, the program then loops through 5 tries of
> releasing the access, reacquiring the access, and attempting to write the
> file again.  This has never succeeded.  I've tried both a COPYFILE and a
> PIPE to write the file; neither works once there has been a failure.
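>
> The retry logic looks roughly like this (simplified; the file name,
> directory, and filemode are placeholders for our actual ones):
>
>     /* rough sketch of the retry loop after a failed write   */
>     do try = 1 to 5
>        'RELEASE Z'                           /* drop access   */
>        'ACCESS POOL1:DATAOWNR.REPORTS Z'     /* reacquire it  */
>        'COPYFILE XXXXXX 20110413 A = = Z (REPLACE'
>        if rc = 0 then leave                  /* write worked  */
>        end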
>
> We've looked at the operator consoles to see if we can find any jobs
> running at the same time.  We haven't found any that are accessing that
> directory structure.
>
> There aren't any dumps to look at--everything looks perfectly successful
> except that it won't write the file.
>
> Does anyone have any suggestions of something to try next?
>
>
>  Nora Graves
> nora.e.gra...@irs.gov
>
>
