Note: if you use VMBATCH, the worker machine connects to SFS with the authority of the job submitter. You say "This user is the owner of most of the directories." Do you mean that the submitter is userid ABC and the dirids are all named "fpoolid:ABC.something"?
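One way to narrow this down is to capture the authorization state at the exact moment the copy fails, rather than afterwards. Below is a minimal REXX sketch of that idea; it is only an illustration, not your job's actual code -- the directory name FPOOL:OWNER.DATA, the file ID, and the filemode Z are placeholders you would replace with your own values.

```rexx
/* Sketch: on a write failure, immediately query the SFS state      */
/* before releasing or reacquiring anything.                        */
/* FPOOL:OWNER.DATA, the file ID, and filemode Z are placeholders.  */
'COPYFILE XXXXXX 20110413 A = = Z (OLDDATE REPLACE'
If rc <> 0 Then Do
   Say 'COPYFILE failed, rc='rc
   /* What authority does the connected user actually hold now?     */
   'QUERY AUTHORITY FPOOL:OWNER.DATA'
   /* Space limits and the access itself, at the moment of failure. */
   'QUERY LIMITS *'
   'QUERY ACCESSED Z'
End
```

Comparing that QUERY AUTHORITY output from a failing night against a successful one would show whether the authority itself is changing, or whether something else is intervening.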
2011/4/14 Graves Nora E <nora.e.gra...@irs.gov>:
> We are having an intermittent problem with SFS and I'm hoping someone may
> have some ideas of what to pursue next.
>
> We have several batch jobs that run under VMBATCH overnight. Sometimes
> they are not able to create a file in a directory, even though most times
> it is successful. The only difference between the executions is the file
> names; for many of these the file type is the date.
>
> In the job I am most familiar with, these are the specifics:
>
> The job runs Monday-Saturday. This year it has failed on January 4,
> January 12, February 9, March 18, March 25, and April 13. It has run
> successfully on the other days. Other than the QUERY statements below, it
> has not changed.
> The job runs in a work machine, WORK7.
> The job is submitted by the user ID of the database owner.
> The SFS directories are owned by a third user. Failures occur in many of
> the subdirectories, not just one subdirectory owned by this user. This
> user owns most of the directories containing the data files we create in
> batch, so I don't think it's significant that it's the ID that has the
> problem. However, as far as I know, it is the only ID that does have the
> problem.
> This job uses VMLINK to acquire a write link to the SFS directory. This
> always appears to be successful--no error is given. (Other jobs use
> GETFMADDR and ACCESS to acquire the write link to the directory; this
> always appears successful as well.)
> Once the file is ready to be copied from the work machine's 191 disk to
> the SFS directory, the intermittent error appears. The vast majority of
> the time the write is successful. However, sometimes the job gets this
> error message:
> DMSOPN1258E You are not authorized to write to file XXXXXX 20110413 Z1
>
> The file is not large--last night's file was only 12 blocks.
>
> At the suggestion of our systems programmer, I've put in a lot of QUERY
> statements.
> I've issued QUERY LIMITS for the job submitter; it has used only 84% of
> its allocation, with several thousand blocks still available. The SFS
> directory owner has used only 76% of its allocation, with several
> thousand more blocks still available. The filepool is not full.
>
> I've issued QUERY FILEPOOL CONFLICT. There is no conflict.
>
> I've issued QUERY ACCESSED. The directory shows that it is accessed R/W.
>
> When the write is unsuccessful, the program loops through five tries of
> releasing the access, reacquiring the access, and attempting to write the
> file again. This has never been successful. I've tried both a COPYFILE
> and a PIPE to write the file; neither works once there has been a
> failure.
>
> We've looked at the operator consoles to see whether we can find any jobs
> running at the same time. We haven't found any that access that directory
> structure.
>
> There aren't any dumps to look at--everything looks perfectly successful
> except that it won't write the file.
>
> Does anyone have any suggestions of something to try next?
>
> Nora Graves
> nora.e.gra...@irs.gov

--
Kris Buelens, IBM Belgium, VM customer support