So you are proposing writing two logs - one compressed and one uncompressed - 
to handle this. I am wondering what the break-even point of this would be.  
Many users use a size-base trigger instead so that a) the compression won't 
take long and b) manipulating a large file is not so much of a problem.

What has me wondering about the usefulness of this is that when the file gets 
so large that compression at rollover is a problem the file is probably too 
large to manipulate effectively in something like vi.  

Ralph

Sent from my iPad

> On May 28, 2014, at 10:27 AM, David Hoa <[email protected]> wrote:
> 
> Yup, the tricky part would come on crash before close, interrupt, etc, 
> because I assume that that partially compressed file would be irrecoverable 
> (haven't verified this). Ideally, we'd be able to close it properly, but if 
> not, the log could, on startup, be recovered and compressed from the parallel 
> uncompressed log that was simultaneously being written by another/the same 
> appender.
> 
> That would incur start up time to recover, which may be more acceptable in 
> the rare case of a crash. Else, if there's another compression technique that 
> leaves behind readable files even if not closed properly, that'd eliminate 
> the need for recovery.
> 
> I'll open a jira ticket. Thanks for letting me share my thoughts on this.
> 
> - David
> 
> 
>> On Wed, May 28, 2014 at 9:39 AM, Matt Sicker <[email protected]> wrote:
>> We can use GZIPOutputStream, DeflaterOutputStream, and ZipOutputStream all 
>> out of the box.
>> 
>> What happens if you interrupt a stream in progress? No idea! But Gzip at 
>> least has CRC32 checksums on hand, so it can be detected if it's corrupted. 
>> We'll have to experiment a bit to see what really happens. I couldn't find 
>> anything in zlib.net's FAQ.
>> 
>> 
>>> On 28 May 2014 08:56, Ralph Goers <[email protected]> wrote:
>>> What would happen to the file if the system crashed before the file is 
>>> closed? Would the file be able to be decompressed or would it be corrupted?
>>> 
>>> Sent from my iPad
>>> 
>>>> On May 28, 2014, at 6:35 AM, Remko Popma <[email protected]> wrote:
>>>> 
>>>> David, thank you for the clarification. I understand better what you are 
>>>> trying to achieve now.
>>>> 
>>>> Interesting idea to have an appender that writes to a GZipOutputStream. 
>>>> Would you mind raising a Jira ticket for that feature request?
>>>> 
>>>> I would certainly be interested in learning about efficient techniques for 
>>>> compressing very large files. Not sure if or how the dd/direct I/O 
>>>> mentioned in the blog you linked to could be leveraged from java. If you 
>>>> find a way that works well for log file rollover, and you're interested in 
>>>> sharing it, please let us know.
>>>> 
>>>> 
>>>> 
>>>>> On Wed, May 28, 2014 at 3:42 PM, David Hoa <[email protected]> wrote:
>>>>> Hi Remko,
>>>>> 
>>>>> My point about gzip, which we've experienced, is that compressing very 
>>>>> large files (multi-GB) does have considerable impact on the system. The 
>>>>> dd/direct I/O workaround avoid putting that much log data into your 
>>>>> filesystem cache. For that problem, after I sent the email, I did look at 
>>>>> the log4j2 implementation, and saw that in 
>>>>> DefaultRolloverStrategy::rollover() it calls GZCompressionAction, so I 
>>>>> see how I can write my own strategy and Action to customize how gzip is 
>>>>> called.
>>>>> 
>>>>> My second question was not about adding to existing gzip files; from what 
>>>>> I know that's not possible. But if the GZipOutputStream is kept open and 
>>>>> written to until closed by a rollover event, then the cost of gzipping is 
>>>>> amortized over time rather than incurred when the rollover event gets 
>>>>> triggered. The benefit is amortization of gzip so there's no resource 
>>>>> usage spike; downside would be writing both compressed and uncompressed 
>>>>> log files and maintaining rollover strategies for both of them. So a 
>>>>> built in appender that wrote directly to gz files would be useful for 
>>>>> this.
>>>>> 
>>>>> Thanks,
>>>>> David
>>>>> 
>>>>> 
>>>>>> On Tue, May 27, 2014 at 4:52 PM, Remko Popma <[email protected]> 
>>>>>> wrote:
>>>>>> Hi David,
>>>>>> 
>>>>>> I read the blog post you linked to. It seems that the author was very, 
>>>>>> very upset that a utility called cp only uses a 512 byte buffer. He then 
>>>>>> goes on to praise gzip for having a 32KB buffer. 
>>>>>> So just based on your link, gzip is actually pretty good. 
>>>>>> 
>>>>>> That said, there are plans to improve the file rollover mechanism. These 
>>>>>> plans are currently spread out over a number of Jira tickets. One 
>>>>>> existing request is to delete archived log files that are older than 
>>>>>> some number of days. (https://issues.apache.org/jira/browse/LOG4J2-656, 
>>>>>> https://issues.apache.org/jira/browse/LOG4J2-524 )
>>>>>> This could be extended to cover your request to keep M compressed files. 
>>>>>> 
>>>>>> I'm not sure about appending to existing gzip files. Why is this 
>>>>>> desirable/What are you trying to accomplish with that?
>>>>>> 
>>>>>> Sent from my iPhone
>>>>>> 
>>>>>>> On 2014/05/28, at 3:22, David Hoa <[email protected]> wrote:
>>>>>>> 
>>>>>>> hi Log4j Dev,
>>>>>>> 
>>>>>>> I am interested in the log rollover and compression feature in log4j2. 
>>>>>>> I read the documentation online, and still have some questions.
>>>>>>> 
>>>>>>> - gzipping large files has performance impact on latencies/cpu/file 
>>>>>>> cache, and there's a workaround for that using dd and direct i/o. Is it 
>>>>>>> possible to customize how log4j2 gzips files (or does log4j2 already do 
>>>>>>> this)? See this link for a description of the common problem.
>>>>>>> http://kevinclosson.wordpress.com/2007/02/23/standard-file-utilities-with-direct-io/
>>>>>>> 
>>>>>>> - is it possible to use the existing appenders to output directly to 
>>>>>>> their final gzipped files, maintain M of those gzipped files, and 
>>>>>>> rollover/maintain N of the uncompressed logs?  I suspect that the 
>>>>>>> complicated part would be in JVM crash recovery/ application restart. 
>>>>>>> Any suggestions on how best to add/extend/customize support for this?
>>>>>>> 
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> David
>> 
>> 
>> 
>> -- 
>> Matt Sicker <[email protected]>
> 

Reply via email to