[jira] [Commented] (IO-278) Improve Tailer performance with buffered reads
[ https://issues.apache.org/jira/browse/IO-278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13251757#comment-13251757 ]

Sergio Bossa commented on IO-278:
---------------------------------

Hi James, I think this is not the right place to discuss issues related to my fork; please use the Tayler GitHub tracker for that.

Thanks,
Sergio B.

Improve Tailer performance with buffered reads
----------------------------------------------

Key: IO-278
URL: https://issues.apache.org/jira/browse/IO-278
Project: Commons IO
Issue Type: Improvement
Affects Versions: 2.0.1
Reporter: Sergio Bossa
Attachments: Tailer.diff, TailerTest.diff

I noticed that Tailer read performance is pretty poor when dealing with large, frequently written log files. This is due to the use of RandomAccessFile, which does unbuffered reads and therefore causes a lot of disk I/O.

So I improved the Tailer implementation by introducing buffered reads: it loads large (configurable) file chunks into memory and reads lines from there. In my tests this improves performance by 10x to 30x, depending on the file size.

I also added two test cases: one that simulates reading a large file (you can use it to compare performance), and one that verifies correct handling at buffer breaks. Obviously, all tests pass.

I'm attaching the diff files; let me know if it's okay for you guys!

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
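The buffered-read approach described in the issue can be sketched roughly as follows. This is not the code from Tailer.diff (which is not reproduced here); it is a minimal sketch assuming '\n'-terminated UTF-8 lines and assuming that an unterminated last line should be left for the next poll. `BufferedLineReader` and its `readLines(reader, position, bufferSize, handler)` signature are hypothetical. It pulls `bufferSize` bytes per `RandomAccessFile.read` call instead of going byte by byte, and carries the unfinished tail of each chunk into the next one, which is the "buffer breaks" case.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;

/** Hypothetical helper, not part of Commons IO: chunked line reading for a tailer. */
final class BufferedLineReader {
    private BufferedLineReader() {}

    /**
     * Reads complete lines starting at {@code position}, fetching up to
     * {@code bufferSize} bytes per read call.
     *
     * @return the offset just past the last complete line; an unterminated
     *         trailing line is left unread so a later poll can pick it up.
     */
    static long readLines(RandomAccessFile reader, long position, int bufferSize,
                          Consumer<String> handler) throws IOException {
        byte[] buffer = new byte[bufferSize];
        // Bytes of a line that may continue past the current chunk.
        ByteArrayOutputStream pending = new ByteArrayOutputStream();
        long consumed = position; // end offset of the last complete line
        reader.seek(position);
        int n;
        while ((n = reader.read(buffer)) != -1) {
            int lineStart = 0;
            for (int i = 0; i < n; i++) {
                if (buffer[i] == '\n') {
                    pending.write(buffer, lineStart, i - lineStart);
                    byte[] bytes = pending.toByteArray();
                    int len = bytes.length;
                    if (len > 0 && bytes[len - 1] == '\r') {
                        len--; // tolerate CRLF line endings
                    }
                    handler.accept(new String(bytes, 0, len, StandardCharsets.UTF_8));
                    consumed += bytes.length + 1; // line bytes plus the '\n'
                    pending.reset();
                    lineStart = i + 1;
                }
            }
            // Carry the unfinished tail of this chunk over to the next read.
            pending.write(buffer, lineStart, n - lineStart);
        }
        reader.seek(consumed); // leave the pointer at the start of any partial line
        return consumed;
    }
}
```

Because bytes are decoded only after a whole line has been collected, a multi-byte UTF-8 character that straddles two chunks is still decoded correctly. The returned offset would play the role of the tailer's `position` for the next poll.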
[jira] [Commented] (IO-278) Improve Tailer performance with buffered reads
[ https://issues.apache.org/jira/browse/IO-278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13225372#comment-13225372 ]

Sergio Bossa commented on IO-278:
---------------------------------

Hi Jim, the Commons IO team doesn't seem to be very responsive, so I had to create a fork of it at: https://github.com/sbtourist/tayler

Feel free to give it a look.
[jira] [Commented] (IO-279) Tailer erroneously consider file as new
[ https://issues.apache.org/jira/browse/IO-279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13174800#comment-13174800 ]

Sergio Bossa commented on IO-279:
---------------------------------

Mark, that should be fixed in my fork: https://github.com/sbtourist/tayler

Tailer erroneously consider file as new
---------------------------------------

Key: IO-279
URL: https://issues.apache.org/jira/browse/IO-279
Project: Commons IO
Issue Type: Bug
Affects Versions: 2.0.1
Reporter: Sergio Bossa

Tailer sometimes erroneously considers the tailed file as new, forcing a repositioning to the start of the file. I'm still unable to reproduce this in a test case, because it only happens to me with huge log files during Apache Tomcat startup.

This is the piece of code causing the problem:

```java
// See if the file needs to be read again
if (length > position) {
    // The file has more content than it did last time
    last = System.currentTimeMillis();
    position = readLines(reader);
} else if (FileUtils.isFileNewer(file, last)) {
    /* This can happen if the file is truncated or overwritten
     * with the exact same length of information. In cases like
     * this, the file position needs to be reset
     */
    position = 0;
    reader.seek(position); // cannot be null here

    // Now we can read new lines
    last = System.currentTimeMillis();
    position = readLines(reader);
}
```

What probably happens is that new content is about to be written to disk: the modification date has already been updated, but the content has not been flushed yet, so the file length is still unchanged. The `isFileNewer` branch then fires and resets the position to 0.

In other words, I think there should be a better way to verify the condition above than relying only on dates: keeping and comparing the hash code of the latest line may be a solution, but may hurt performance... other ideas?
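The "hash code of the latest line" idea floated above could look roughly like this. It is a hedged sketch, not code from Commons IO or from the tayler fork; `LastLineFingerprint`, `remember` and `stillMatches` are hypothetical names. After each read, the tailer would remember the offset, length and `Arrays.hashCode` of the last complete line. When `isFileNewer` fires but the length has not grown, it re-reads just those bytes and only resets to position 0 if they no longer match, or if the file has become shorter than that line's end. The extra cost is one small read per suspicious poll, not per line.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.Arrays;

/**
 * Hypothetical helper (not part of Commons IO): fingerprints the last complete
 * line consumed by a tailer, so that a "newer but not longer" file can be
 * checked for real replacement before resetting to position 0.
 */
final class LastLineFingerprint {
    private long start = -1; // offset of the remembered line; -1 means nothing recorded
    private int length;      // number of bytes in the remembered line
    private int hash;        // Arrays.hashCode of those bytes

    /** Remember the line whose bytes begin at the given file offset. */
    void remember(long start, byte[] lineBytes) {
        this.start = start;
        this.length = lineBytes.length;
        this.hash = Arrays.hashCode(lineBytes);
    }

    /**
     * Returns true if bytes with the same hash still sit at the remembered
     * offset, i.e. the file was most likely only touched or appended to.
     * The reader's file pointer is restored before returning.
     */
    boolean stillMatches(RandomAccessFile reader) throws IOException {
        if (start < 0) {
            return true; // nothing recorded yet, so no evidence of replacement
        }
        if (reader.length() < start + length) {
            return false; // shorter than what was already read: truncated
        }
        long saved = reader.getFilePointer();
        try {
            byte[] current = new byte[length];
            reader.seek(start);
            reader.readFully(current);
            return Arrays.hashCode(current) == hash;
        } finally {
            reader.seek(saved);
        }
    }
}
```

In the loop quoted above, the second branch would then become something like `else if (FileUtils.isFileNewer(file, last) && !fingerprint.stillMatches(reader))`, where `fingerprint` is a hypothetical field. A matching hash is strong but not conclusive evidence, since different byte sequences can collide; storing the full line bytes instead would make the check exact at the price of a little more memory.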