- **status**: accepted --> review
---
** [tickets:#1042] LOG: out of control file descriptor of log file cause .nfs
file generate when log file deleted by rotation mechanism**
**Status:** review
**Milestone:** 4.4.1
**Created:** Thu Sep 04, 2014 02:30 AM UTC by Thuan
**Last Updated:** Fri Sep 12, 2014 06:41 AM UTC
**Owner:** elunlen
Out of control of file descriptor in LOG service cause many .nfs file generate.
The problem can be reproduced by following steps:
(1) Make a sample to writing log to one log stream.
(2) Execute sample until saLogStreamNumOpeners > 100
(3) Check the applog folder, some .nfs files exist in applog folder, not
frequently happen
Base on my LOG trace, the problem may cause by closing file unsuccessfully but
no checking return code. Here is the log:
Aug 21 6:10:35.015781 osaflogd [4384:lgs_file.c:0287] TR log_file_api -
LGSF_BUSY
Aug 21 6:10:35.015786 osaflogd [4384:lgs_stream.c:1233] TR log_stream_write_h
- API error LGSF_BUSY
Aug 21 6:10:35.015794 osaflogd [4384:lgs_stream.c:1247] IN write
'test_20140821_061033' failed "Success"
Aug 21 6:10:35.015799 osaflogd [4384:lgs_stream.c:0108] >> fileclose_h: fd=27
Aug 21 6:10:35.015902 osaflogd [4384:lgs_file.c:0287] TR log_file_api -
LGSF_BUSY
Aug 21 6:10:35.015914 osaflogd [4384:lgs_stream.c:0119] TR fileclose_h - API
error LGSF_BUSY
Aug 21 6:10:35.015919 osaflogd [4384:lgs_stream.c:0125] << fileclose_h: rc = -1
Aug 21 6:10:35.015924 osaflogd [4384:lgs_stream.c:1275] << log_stream_write_h:
rc=-1
…
Aug 21 6:10:35.016280 osaflogd [4384:lgs_stream.c:1188] >> log_stream_write_h:
safLgStr=testLogStream1,safApp=safLogService
Aug 21 6:10:35.016285 osaflogd [4384:lgs_stream.c:0293] >>
log_initiate_stream_files
Here is the code of log_stream_write_h() function:
if (*stream->p_fd == -1) {
/* Create directory and log files if they were not created at
* stream open or reopen files if bad file descriptor.
*/
log_initiate_stream_files(stream);
…
if ((rc == -1) || (rc == -2)) {
/* If writing failed always invalidate the stream file
descriptor.
*/
/* Careful with log level here to avoid syslog flooding */
LOG_IN("write '%s' failed \"%s\"", stream->logFileCurrent,
strerror(write_errno));
if (*stream->p_fd != -1) {
/* Try to close the file and invalidate the stream fd */
fileclose_h(*stream->p_fd); //===>No check return code
*stream->p_fd = -1;
}
…
So, there is one file descriptor is opened out of control of LOG service.
If LOG service open log file with new fd, when log file rotate mechanism delete
it, .nfs file generated.
I also found .nfs file with size < max size. Because file rename done, then
open new log file time out, actually success, but variable "curFileSize" is not
reset. In next write, log file is open again (2nd fd), will close after 1st
write due to "curFileSize" > max file size. This small log file will become
.nfs when rotation mechanism delete it. So, I think the variable "curFileSize"
should reset 0 after rename succeed and consider about open file api. Here is
the code:
/* Close current log file */
if ((rc = fileclose_h(*stream->p_fd)) == -1) {
LOG_IN("close FAILED: %s", strerror(errno));
goto done;
}
*stream->p_fd = -1;
/* Rename file to give it the "close timestamp" */
rc = lgs_file_rename_h(stream->pathName, stream->logFileCurrent,
current_time, LGS_LOG_FILE_EXT, NULL);
if (rc == -1)
goto done;
/* Create a new file name that includes "open time stamp" and
open the file */
snprintf(stream->logFileCurrent, NAME_MAX, "%s_%s",
stream->fileName,
current_time);
if ((*stream->p_fd = log_file_open(stream,
stream->logFileCurrent, //==>open time out but actually file opened cause fd
out of control
&errno_save)) == -1) {
LOG_IN("Could not open '%s' - %s",
stream->logFileCurrent,
strerror(errno_save));
rc = -1;
goto done;
}
stream->curFileSize = 0; //==>Reset size here should move to
after rename done
Here is the log:
Aug 22 7:43:05.038105 osaflogd [4110:lgs_stream.c:0740] >> log_file_open
Aug 22 7:43:05.038111 osaflogd [4110:lgs_stream.c:0750] TR log_file_open -
Opening file
"/storage/no-backup/coremw/var/log/saflog/test/test_20140822_074304.log"
Aug 22 7:43:05.038116 osaflogd [4110:lgs_stream.c:0063] >> fileopen_h
Aug 22 7:43:05.038121 osaflogd [4110:lgs_stream.c:0074] TR fileopen_h -
filepath
"/storage/no-backup/coremw/var/log/saflog/test/test_20140822_074304.log"
Aug 22 7:43:05.038393 osaflogd [4110:lgs_filehdl.c:0448] >> fileopen_hdl
Aug 22 7:43:05.038417 osaflogd [4110:lgs_filehdl.c:0450] TR fileopen_hdl -
filepath
"/storage/no-backup/coremw/var/log/saflog/test/test_20140822_074304.log"
…
Aug 22 7:43:05.539345 osaflogd [4110:lgs_file.c:0342] TR Timed out before
answer
Aug 22 7:43:05.558948 osaflogd [4110:lgs_stream.c:0085] TR fileopen_h - API
error LGSF_TIMEOUT
Aug 22 7:43:05.558987 osaflogd [4110:lgs_stream.c:0092] << fileopen_h
Aug 22 7:43:05.559000 osaflogd [4110:lgs_stream.c:0757] << log_file_open
Aug 22 7:43:05.559022 osaflogd [4110:lgs_stream.c:1151] IN Could not open
'test_20140822_074304' - Success
Aug 22 7:43:05.559035 osaflogd [4110:lgs_stream.c:1275] << log_stream_write_h:
rc=-1
**So, there are 3 points need to solve:**
(1) Closing file unsuccessfully but no checking return code.
(2) The current file size should reset after rename done.
(3) Open file time out but file already open cause fd out of control
---
Sent from sourceforge.net because opensaf-tickets@lists.sourceforge.net is
subscribed to https://sourceforge.net/p/opensaf/tickets/
To unsubscribe from further messages, a project admin can change settings at
https://sourceforge.net/p/opensaf/admin/tickets/options. Or, if this is a
mailing list, you can unsubscribe from the mailing list.
------------------------------------------------------------------------------
Want excitement?
Manually upgrade your production database.
When you want reliability, choose Perforce
Perforce version control. Predictably reliable.
http://pubads.g.doubleclick.net/gampad/clk?id=157508191&iu=/4140/ostg.clktrk
_______________________________________________
Opensaf-tickets mailing list
Opensaf-tickets@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets