[jira] [Commented] (IO-279) Tailer erroneously considers file as new

2022-04-27 Thread Apoorva Maheshwari (Jira)


[ 
https://issues.apache.org/jira/browse/IO-279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17528636#comment-17528636
 ] 

Apoorva Maheshwari commented on IO-279:
---

The issue is also present in version 2.11. Please check.

> Tailer erroneously considers file as new
> 
>
>                 Key: IO-279
>                 URL: https://issues.apache.org/jira/browse/IO-279
>             Project: Commons IO
>          Issue Type: Bug
>    Affects Versions: 2.0.1, 2.4
>            Reporter: Sergio Bossa
>            Priority: Major
>         Attachments: IO-279.patch, disable_resetting.patch, fix-tailer.patch,
>                      modify-test-fixed.patch, modify-test.patch
>
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> Tailer sometimes erroneously considers the tailed file as new, forcing a 
> repositioning at the start of the file: I'm still unable to reproduce this in 
> a test case, because it only happens to me with huge log files during Apache 
> Tomcat startup.
> This is the piece of code causing the problem:
> {code}
> // See if the file needs to be read again
> if (length > position) {
>     // The file has more content than it did last time
>     last = System.currentTimeMillis();
>     position = readLines(reader);
> } else if (FileUtils.isFileNewer(file, last)) {
>     /* This can happen if the file is truncated or overwritten
>      * with the exact same length of information. In cases like
>      * this, the file position needs to be reset
>      */
>     position = 0;
>     reader.seek(position); // cannot be null here
>     // Now we can read new lines
>     last = System.currentTimeMillis();
>     position = readLines(reader);
> }
> {code}
> What probably happens is that new content is about to be written to disk: the
> modification date is already updated but the content is not yet flushed, so
> the length is unchanged, isFileNewer returns true, and the position is reset.
> In other words, I think there should be a better way to verify the condition
> above than relying only on dates: keeping and comparing the hash code of the
> latest line may be a solution, but it may hurt performance ... other ideas?
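
A minimal sketch of the hash-based idea raised in the description above, in
Java. It is not the Commons IO implementation and not one of the attached
patches. The class TailCheckSketch, the Action enum, the WINDOW size, and the
fingerprint and decide helpers are all hypothetical names introduced here for
illustration. The sketch resets only when the file shrank, or when the length
is unchanged but the bytes just before the read position differ from what was
remembered, so a changed modification date alone does not force a reset.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical sketch; none of these names exist in Commons IO.
class TailCheckSketch {

    enum Action { READ_MORE, RESET, WAIT }

    // Number of trailing bytes to fingerprint (arbitrary choice for this sketch).
    static final int WINDOW = 64;

    /** Hash of up to WINDOW bytes that end at offset 'end' in the file. */
    static int fingerprint(RandomAccessFile raf, long end) throws IOException {
        int len = (int) Math.min(WINDOW, end);
        byte[] buf = new byte[len];
        raf.seek(end - len);
        raf.readFully(buf);
        return Arrays.hashCode(buf);
    }

    /** Decides the next step from the file length, the read position, and the remembered fingerprint. */
    static Action decide(RandomAccessFile raf, long position, int lastFingerprint) throws IOException {
        long length = raf.length();
        if (length > position) {
            return Action.READ_MORE; // new content was appended
        }
        if (length < position) {
            return Action.RESET;     // the file was truncated
        }
        // Same length: reset only if the bytes before 'position' actually changed.
        return fingerprint(raf, position) == lastFingerprint ? Action.WAIT : Action.RESET;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("tailer-sketch", ".log");
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            Files.write(file, "line one\nline two\n".getBytes(StandardCharsets.UTF_8));
            long position = raf.length();            // pretend everything has been read
            int fp = fingerprint(raf, position);
            System.out.println(decide(raf, position, fp)); // nothing changed since the last read
            Files.write(file, "line ONE\nline TWO\n".getBytes(StandardCharsets.UTF_8));
            System.out.println(decide(raf, position, fp)); // same length, different bytes
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

The trade-off is one small extra read per polling cycle when the length is
unchanged, and a hash collision could in principle hide a same-length
overwrite. Comparing the raw bytes instead of their hash would avoid that at
the cost of keeping the window in memory.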



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (IO-279) Tailer erroneously considers file as new

2022-04-26 Thread Apoorva Maheshwari (Jira)


[ 
https://issues.apache.org/jira/browse/IO-279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17527959#comment-17527959
 ] 

Apoorva Maheshwari commented on IO-279:
---

The issue is still present in version 2.7. As it is in the reopened state, 
kindly confirm whether a fix is planned for an upcoming release.






[jira] (IO-278) Improve Tailer performance with buffered reads

2022-04-26 Thread Apoorva Maheshwari (Jira)


[ https://issues.apache.org/jira/browse/IO-278 ]


Apoorva Maheshwari deleted comment on IO-278:
---

was (Author: JIRAUSER288635):
The issue is still present in version 2.7. As it is in the reopened state, 
kindly confirm whether a fix is planned for an upcoming release.

> Improve Tailer performance with buffered reads
> --
>
>                 Key: IO-278
>                 URL: https://issues.apache.org/jira/browse/IO-278
>             Project: Commons IO
>          Issue Type: Improvement
>    Affects Versions: 2.0.1
>            Reporter: Sergio Bossa
>            Priority: Major
>             Fix For: 2.4
>
>         Attachments: Tailer.diff, TailerTest.diff
>
>
> I noticed that Tailer read performance is pretty poor when dealing with
> large, frequently written log files. This is due to the use of
> RandomAccessFile, which does unbuffered reads and hence causes lots of disk
> I/O.
> So I improved the Tailer implementation by introducing buffered reads: it
> works by loading large (configurable) file chunks into memory and reading
> lines from there; in my tests this improves performance by 10x to 30x,
> depending on the file size.
> I also added two test cases: one to simulate reading a large file (you can
> use it to compare performance), the other to verify correct handling of
> buffer breaks; obviously, all tests pass.
> I'm attaching the diff files, let me know if it's okay for you guys!
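
A rough illustration of the buffered-read approach described above, in Java.
This is a sketch under stated assumptions, not the attached Tailer.diff and
not the code that shipped in Commons IO. The class ChunkedLineReaderSketch and
its readLines helper are hypothetical names. It assumes UTF-8 content and
'\n' line endings (any '\r' is simply dropped). The file is read in chunks of
bufferSize bytes, lines are assembled in memory, and the file pointer is left
just after the last complete line so a partially written line is picked up on
the next call.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these names exist in Commons IO.
class ChunkedLineReaderSketch {

    /** Reads all complete lines from the current file position, bufferSize bytes at a time. */
    static List<String> readLines(RandomAccessFile raf, int bufferSize) throws IOException {
        if (bufferSize <= 0) {
            throw new IllegalArgumentException("bufferSize must be positive");
        }
        List<String> lines = new ArrayList<>();
        ByteArrayOutputStream current = new ByteArrayOutputStream();
        byte[] buffer = new byte[bufferSize];
        long consumed = raf.getFilePointer(); // offset just after the last complete line
        long offset = consumed;               // offset of the next byte to examine
        int n;
        while ((n = raf.read(buffer)) != -1) {
            for (int i = 0; i < n; i++) {
                byte b = buffer[i];
                offset++;
                if (b == '\n') {
                    // A line may span several chunks; 'current' carries it across reads.
                    lines.add(new String(current.toByteArray(), StandardCharsets.UTF_8));
                    current.reset();
                    consumed = offset;
                } else if (b != '\r') {
                    current.write(b);
                }
            }
        }
        raf.seek(consumed); // an unterminated last line is re-read on the next call
        return lines;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("chunked-sketch", ".log");
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            Files.write(file, "first\nsecond\npartial".getBytes(StandardCharsets.UTF_8));
            // A deliberately tiny buffer forces lines to cross chunk boundaries.
            System.out.println(readLines(raf, 4));
            System.out.println(raf.getFilePointer());
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

A larger bufferSize means fewer read calls on the RandomAccessFile per polling
cycle, which is where the description above expects the speedup to come from.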





[jira] [Commented] (IO-278) Improve Tailer performance with buffered reads

2022-04-26 Thread Apoorva Maheshwari (Jira)


[ 
https://issues.apache.org/jira/browse/IO-278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17527956#comment-17527956
 ] 

Apoorva Maheshwari commented on IO-278:
---

The issue is still present in version 2.7. As it is in the reopened state, 
kindly confirm whether a fix is planned for an upcoming release.



