Thank you for the clarifications in this and other rsyslog message threads.
After realizing what causes rsyslog to attempt random read operation which fails, the issue became reproducible and was able to isolate the debug output which will transpose for the results from separate tests. The problem persists regardless how configured, with queue and output regular file, or without queues, also tested with output dynaFile. When start rsyslog, logger writes files to omfile output file or dynafile to dfs-fuse mount, there are no issues until another command or program access the same path which interrupts some sort lock on the filesystem. If rsyslog is started, and then simply list the directory contents at the same hierarchy level as the omfile output, run "ls /dfs-fuse-data/test" - even though there's no files yet, it will now generate the error when logging which is part of the dfs-fuse library that does validation is using sequential access. 1234.567891012:dfs-test-file queue:Reg/w0: omfile: write to stream, pData->pStrm 0x7f1b1234567a0, lenBuf 61, strt data Oct 9 12:00:00 testsrv01 testusr: -DFS_FUSE_DRIVER_TEST 1/2 1234.567891024:dfs-test-file queue:Reg/w0: strm 0x7f1b1234567a0: file 44( dfs-appliance_test.log) flush, buflen 61 1234.567891048:dfs-test-file queue:Reg/w0: strmPhysWrite, stream 0x0a1b1234567a0, len 61 1234.567892537:dfs-test-file queue:Reg/w0: log file (44) write error 95: Operation not supported 1234.567892559:dfs-test-file queue:Reg/w0: Action 64 transitioned to state: rdy 1234.567892570:dfs-test-file queue:Reg/w0: actionCommit, in retry loop, iRet 0 1234.567892581:dfs-test-file queue:Reg/w0: regular consumer finished, iret=0, szlog 0 sz phys 1 1234.567892592:dfs-test-file queue:Reg/w0: DeleteProcessedBatch: we deleted 1 objects and enqueued 0 objects 1234.567892603:dfs-test-file queue:Reg/w0: doDeleteBatch: delete batch from store, new sizes: log 0, phys 0 1234.567892606:dfs-test-file queue:Reg/w0: regular consumer finished, iret=4, szlog 0 sz phys 0 1234.567892627:dfs-test-file queue:Reg/w0: dfs-test-file queue:Reg/w0: worker IDLE, waiting for work. The next log to parse after this write error, is the message which appears in /var/log/messages: <11>Oct 9 12:00:00 fuse_dfs: ERROR /src/main/native/fuse-dfs/fuse_impls_write.c:57 User trying to random access write to a file 61 != 0 for /dfs-fuse-data/test/dfs- appliance_test.log If there were a lot of logs attempting to write afterward, depending on the rate, there would be an error for each message that was attempted to write, and failed, then is deleted instead. After it has failed, a service rsyslog restart, or a HUP the system will flush logs that congested into the queue before the point of write failing, but any log after the error is generated is not written and deleted as described by debug output. When there is a high volume of logs sent in, it appears to write without issue until introducing the error by wc the file and see the char len, which matches the exact point of the dfsTell validation in the error message *61 in example above. As you can imagine, it is a problem to try and avoid running any command on directory rsyslog omfile is outputting to without breaking the stream process. DynaFile was tested, to see if the entire path is affected or only the top level which files are actively being written. Unfortunately, any directory rsyslog wrote to from the point of startup appears to have persistence even though the template is separating the tree structure based on date and time as workaround. 5678.910111111:dfs-test-file main Q:Reg/w0: omfile: file to log to: / dfs-fuse-data/test/2015/10/10/12_00_dfs-appliance_test.log 5678.910111213:dfs-test-file queue:Reg/w0: omfile: write to stream, pData->pStrm 0x7f2c7654321b1, lenBuf 62, strt data Oct 10 12:00:00 testsrv01 testusr: -DFS_FUSE_DRIVER_TEST 2/2 5678.910111314:dfs-test-file main Q:Reg/w0: strm 0x7f2c7654321b1: file 44(12_00_dfs-appliance_test.log) flush, buflen 62 5678.910111415:dfs-test-file main Q:Reg/w0: strmPhysWrite, stream 0x7f2c7654321b1, len 62 5678.910111516:dfs-test-file main Q:Reg/w0: log file (44) write error 95: Operation not supported 5678.910111617:dfs-test-file main Q:Reg/w0: Action 60 transitioned to state: rdy 5678.910111718:dfs-test-file main Q:Reg/w0: actionCommit, in retry loop, iRet 0 5678.910111819:dfs-test-file main Q:Reg/w0: actionCommitAll: action 62, state 1, nbr to commit 0 isTransactional 1 5678.910111920:dfs-test-file main Q:Reg/w0: doTransaction: have commitTransaction IF, using that, pWrkrInfo, 0x7f2c775432160 5678.910112021:dfs-test-file main Q:Reg/w0: entering actionCallCommitTransaction(), state: itx, actionNbr, 62, nMsgs 1 5678.910112122:dfs-test-file main Q:Reg/w0: omfile: write to stream, pData->pStrm 0x7f2c7654345c1, len 62, strt data Oct 10 12:00:00 testsrv01 testusr: -DFS_FUSE_DRIVER_TEST 2/2 5678.910112223:dfs-test-file main Q:Reg/w0: strm 0x7f2c7654345c1: file 42(messages) flush, buflen 62 5678.910112324:dfs-test-file main Q:Reg/w0: strmPhysWrite, stream 0x7f2c7654345c1, len 62 5678.910112326:dfs-test-file main Q:Reg/w0: strm 0x7f2c7654345c1: file 42 write wrote 62 bytes 5678.910112328:dfs-test-file main Q:Reg/w0: Action 62 transition to state: rdy 5678.910112625:dfs-test-file main Q:Reg/w0: Action 62 transition to state: itx 5678.910112626:dfs-test-file main Q:Reg/w0: Action 62 transition to state: rdy 5678.910112628:dfs-test-file main Q:Reg/w0: actionCommit, in retry loop, iRet 0 5678.910112629:dfs-test-file main Q:Reg/w0: actionCommitAll: action 66, state 1, nbr to commit 0 isTransactional 1 5678.910112654:dfs-test-file main Q:Reg/w0: processBATCH: batch of 1 elements has been processed. 5678.910112664:dfs-test-file main Q:Reg/w0: regular consumer finished, iret=0, szlog 1 sz phys 2 5678.910112667:dfs-test-file main Q:Reg/w0: DeleteProcessedBatch: we deleted 1 objects and enqueued 0 objects 5678.910112702:dfs-test-file main Q:Reg/w0: doDeleteBatch: delete batch from store, new sizes: log 1, phys 1 5678.910112704:dfs-test-file main Q:Reg/w0: processBATCH: batch of 1 elements must be processed 5678.910112805:dfs-test-file main Q:Reg/w0: processBATCH: next msg 0:<11>Oct 10 12:00:00 fuse_dfs: ERROR /src/main/native/fuse-dfs/fuse_impls_write.c:57 User trying to random access write to a file 62 != 0 for /dfs-fuse-data/test/2015/10/10/12_00_dfs- appliance_test.log The error would show 'write to file 123 != 61', if sent the first log (61 chars) were in successfully before ls the directory and sending second log(62 chars). I hope these debugs and details for reproducing the issue helps identify root cause. Although the numbers at the start of line are made up when transposing, ####.#########, the rest of the debug content is complete chain of events. Thanks for all the help! On Tue, Oct 6, 2015 at 4:04 AM, Rainer Gerhards <rgerha...@hq.adiscon.com> wrote: > 2015-10-06 8:26 GMT+02:00 David Lang <da...@lang.hm>: > > On Tue, 6 Oct 2015, Rainer Gerhards wrote: > > > >> 2015-10-06 7:27 GMT+02:00 David Lang <da...@lang.hm>: > >>> > >>> On Tue, 6 Oct 2015, Rainer Gerhards wrote: > >>> > >>>> 2015-10-06 0:09 GMT+02:00 Kendall Green <kendallar...@gmail.com>: > >>>>> > >>>>> > >>>>> I've set asyncWriting="off" and the only difference is, that with > 1000 > >>>>> message test, all were delivered on HUP and/or service restart, but > >>>>> apparently not on flushInterval or buffer full. I'm currently running > >>>>> another test with 100k messages iterating about 10 logs/sec, and > >>>>> watching > >>>>> the output for number of lines. The output file is created on > dfs-fuse > >>>>> mount upon receipt of the first log, but 0 lines written. > >>>>> > >>>>> This current test will hopefully show how many logs get processed > >>>>> without > >>>>> intervention over the next ~3 hours, and if still no logs appear will > >>>>> see > >>>>> how many are written upon HUP signal. > >>>>> > >>>>> I'm unsure how the asyncWriting off impacts the rest of the config, > and > >>>>> if > >>>>> the current config has disabled compression... or not with no zip. > >>>>> > >>>>> This note is from omfile asyncWriting doc: > >>>>> *Note that in order to enable FlushInterval, AsyncWriting must be set > >>>>> to > >>>>> “on”. Otherwise, the flush interval will be ignored. Also note that > >>>>> when > >>>>> FlushOnTXEnd is “on” but AsyncWriting is off, output will only be > >>>>> written > >>>>> when the buffer is full. This may take several hours, or even > require a > >>>>> rsyslog shutdown. However, a buffer flush can be forced in that case > by > >>>>> sending rsyslogd a HUP signal.* > >>>>> > >>>>> Please help clarify how these parameters correspond and what the > >>>>> settings > >>>>> can achieve expedited log transport, avoiding long delays waiting for > >>>>> buffering or flush intervals. This goal is why have chosen low > settings > >>>>> for > >>>>> IOBufferSize and flushInterval. > >>>> > >>>> > >>>> > >>>> Without async writing, everything is written sync on when it arrives > >>>> (more precisely at commit of the current batch) aka "as quickly as > >>>> possible". Thus, these params do not exist because they have no > >>>> meaning. Note that with a very large batch and a small buffer, partial > >>>> records may be written. > >>> > >>> > >>> > >>> although without the checkpoint parameter the writes are to the OS and > >>> they > >>> get flushed to disk at a later point. > >> > >> > >> you mean "sync", right? > > > > > > I may be misunderstanding, but i thought the checkpoint (or checkpoint > > interval) parameter controleld how frequently you did the sync (so > turning > > sync on didn't make it per every message) > > That checkopoint interval is for queues (which share the same util > object), not for omfile. There, what you say is right. For omfile (if > not set to async writing), the sync happens at the end of each batch, > which usually means after each write because usually a full batch goes > into the write buffer. If the batch has more data than the write > buffer fits, all buffer writes are done and then the flush. > > I've also updated the doc to be more precise: > > > https://github.com/rsyslog/rsyslog-doc/commit/b2bb5d93787f9662eb41499c86f881d2bd649dce > > Rainer > _______________________________________________ > rsyslog mailing list > http://lists.adiscon.net/mailman/listinfo/rsyslog > http://www.rsyslog.com/professional-services/ > What's up with rsyslog? Follow https://twitter.com/rgerhards > NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad > of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you > DON'T LIKE THAT. > _______________________________________________ rsyslog mailing list http://lists.adiscon.net/mailman/listinfo/rsyslog http://www.rsyslog.com/professional-services/ What's up with rsyslog? Follow https://twitter.com/rgerhards NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE THAT.