[jira] [Commented] (IO-279) Tailer erroneously considers file as new
[ https://issues.apache.org/jira/browse/IO-279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17528636#comment-17528636 ]

Apoorva Maheshwari commented on IO-279:
---

The issue is also present in version 2.11. Please check.

> Tailer erroneously considers file as new
>
> Key: IO-279
> URL: https://issues.apache.org/jira/browse/IO-279
> Project: Commons IO
> Issue Type: Bug
> Affects Versions: 2.0.1, 2.4
> Reporter: Sergio Bossa
> Priority: Major
> Attachments: IO-279.patch, disable_resetting.patch, fix-tailer.patch,
> modify-test-fixed.patch, modify-test.patch
>
> Time Spent: 1h
> Remaining Estimate: 0h
>
> Tailer sometimes erroneously considers the tailed file as new, forcing a
> repositioning at the start of the file. I'm still unable to reproduce this in
> a test case, because it only happens to me with huge log files during Apache
> Tomcat startup.
> This is the piece of code causing the problem:
> {code}
> // See if the file needs to be read again
> if (length > position) {
>     // The file has more content than it did last time
>     last = System.currentTimeMillis();
>     position = readLines(reader);
> } else if (FileUtils.isFileNewer(file, last)) {
>     /* This can happen if the file is truncated or overwritten
>      * with the exact same length of information. In cases like
>      * this, the file position needs to be reset
>      */
>     position = 0;
>     reader.seek(position); // cannot be null here
>     // Now we can read new lines
>     last = System.currentTimeMillis();
>     position = readLines(reader);
> }
> {code}
> What probably happens is that the new file content is about to be written to
> disk: the modification date is already updated but the content is not yet
> flushed, so the reported length is unchanged and the second branch fires.
> In other words, I think there should be a better way to verify the
> condition above than relying only on dates. Keeping and comparing the
> hash code of the latest line may be a solution, but it may hurt performance.
> Other ideas?
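
The idea floated at the end of the report, comparing a hash of the most recently read content instead of trusting the timestamp alone, could look roughly like the sketch below. This is an illustration only, not the Commons IO implementation and not any of the attached patches. The class `TailResetCheck`, its methods, and the `WINDOW` size are hypothetical names chosen for this example, and a CRC32 over the last few bytes stands in for "the hash code of the latest line".

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.zip.CRC32;

/** Hypothetical helper for illustration; not part of Commons IO. */
final class TailResetCheck {

    /** Number of trailing bytes to fingerprint; an arbitrary choice for this sketch. */
    static final int WINDOW = 64;

    /** CRC32 of up to WINDOW bytes that end at {@code position}. Restores the file pointer. */
    static long fingerprint(RandomAccessFile file, long position) throws IOException {
        long saved = file.getFilePointer();
        try {
            int size = (int) Math.min(WINDOW, position);
            byte[] buf = new byte[size];
            file.seek(position - size);
            file.readFully(buf);
            CRC32 crc = new CRC32();
            crc.update(buf, 0, size);
            return crc.getValue();
        } finally {
            file.seek(saved);
        }
    }

    /**
     * Decides whether the tailer should restart from offset 0.
     * A newer timestamp with unchanged content before {@code position} is NOT a reset.
     */
    static boolean shouldReset(RandomAccessFile file, long position, long lastFingerprint)
            throws IOException {
        if (file.length() < position) {
            return true; // the file shrank, so it was truncated or replaced
        }
        // Same or greater length: reset only if the bytes already consumed have changed.
        return fingerprint(file, position) != lastFingerprint;
    }
}
```

In the quoted loop, such a check would be consulted inside the `FileUtils.isFileNewer(file, last)` branch before setting `position = 0`, with the fingerprint refreshed after each successful `readLines` call. The extra cost is one small read, and only on iterations where the timestamp moved but the length did not grow.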
-- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Commented] (IO-279) Tailer erroneously considers file as new
[ https://issues.apache.org/jira/browse/IO-279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17527959#comment-17527959 ]

Apoorva Maheshwari commented on IO-279:
---

The issue is still present in version 2.7. Since it is in the Reopened state, kindly confirm whether a fix is planned for an upcoming release.
[jira] (IO-278) Improve Tailer performance with buffered reads
[ https://issues.apache.org/jira/browse/IO-278 ]

Apoorva Maheshwari deleted comment on IO-278:
---

was (Author: JIRAUSER288635):
The issue is still present in version 2.7. Since it is in the Reopened state, kindly confirm whether a fix is planned for an upcoming release.

> Improve Tailer performance with buffered reads
> --
>
> Key: IO-278
> URL: https://issues.apache.org/jira/browse/IO-278
> Project: Commons IO
> Issue Type: Improvement
> Affects Versions: 2.0.1
> Reporter: Sergio Bossa
> Priority: Major
> Fix For: 2.4
>
> Attachments: Tailer.diff, TailerTest.diff
>
> I noticed that Tailer read performance is pretty poor when dealing with large,
> frequently written log files. This is due to the use of RandomAccessFile,
> which does unbuffered reads and hence causes lots of disk I/O.
> So I improved the Tailer implementation by introducing buffered reads: it
> works by loading large (configurable) file chunks into memory and reading
> lines from there. This improves performance in my tests by 10x to 30x,
> depending on the file size.
> I also added two test cases: one to simulate reading a large file (you can
> use it to compare performance), the other to verify correct handling of
> buffer breaks; all tests pass.
> I'm attaching the diff files, let me know if it's okay for you guys!
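
The buffered-read approach described in IO-278 (load a configurable chunk into memory, split lines from it, and handle lines that straddle a chunk boundary) can be sketched as follows. This is a minimal illustration under stated assumptions, not the contents of the attached `Tailer.diff`: the class `ChunkedLineReader` and its `readLines` signature are hypothetical, lines are assumed to be UTF-8 and terminated by `\n`, and every `\r` byte is simply dropped.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration; not the actual Commons IO Tailer code. */
final class ChunkedLineReader {

    /**
     * Reads all complete lines from the file's current pointer to end of file,
     * using bulk reads of {@code chunkSize} bytes instead of byte-at-a-time reads.
     * On return, the file pointer sits just past the last complete line, so a
     * trailing partial line is picked up again by the next call.
     */
    static List<String> readLines(RandomAccessFile file, int chunkSize) throws IOException {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<String> lines = new ArrayList<>();
        ByteArrayOutputStream pending = new ByteArrayOutputStream(); // bytes of the line in progress
        byte[] chunk = new byte[chunkSize];
        long lineStart = file.getFilePointer(); // offset just past the last complete line
        long offset = lineStart;                // offset of the first byte of the current chunk
        int n;
        while ((n = file.read(chunk)) != -1) {
            for (int i = 0; i < n; i++) {
                byte b = chunk[i];
                if (b == '\n') {
                    lines.add(new String(pending.toByteArray(), StandardCharsets.UTF_8));
                    pending.reset();
                    lineStart = offset + i + 1;
                } else if (b != '\r') {
                    pending.write(b); // may span several chunks before the newline arrives
                }
            }
            offset += n;
        }
        file.seek(lineStart); // rewind over any incomplete trailing line
        return lines;
    }
}
```

Because `pending` survives across iterations of the outer loop, a line that is split across two chunks is still returned whole, which is the "buffer break" case the report mentions testing.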
[jira] [Commented] (IO-278) Improve Tailer performance with buffered reads
[ https://issues.apache.org/jira/browse/IO-278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17527956#comment-17527956 ]

Apoorva Maheshwari commented on IO-278:
---

The issue is still present in version 2.7. Since it is in the Reopened state, kindly confirm whether a fix is planned for an upcoming release.