I believe we need to recognize that we have perhaps three different behaviors here: one is appending to the log file, another is rotating it, and another is discarding unnecessary backups. There should be a clear separation of concerns, and that becomes possible if each behavior has its own configuration.
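To make that split concrete, here is a very rough sketch of what three independently configured pieces might look like. The names below are purely illustrative, not log4j2 API; the triggering policies and rollover strategies we already have cover parts of the first two, and the retention piece is essentially what the Jira tickets mentioned further down the thread (LOG4J2-524, LOG4J2-656) are asking for.

    import java.io.File;
    import java.io.IOException;
    import java.util.List;

    // Illustrative only -- not log4j2 interfaces. The point is that each concern
    // is configured, tested, and swapped independently of the other two.
    interface AppendPolicy {
        // Concern 1: writing formatted events to the active log file.
        void append(String formattedEvent) throws IOException;
    }

    interface RolloverPolicy {
        // Concern 2a: deciding when the active file should be rotated.
        boolean isTriggered(File activeFile);

        // Concern 2b: performing the rotation (rename, reindex, compress).
        File rollover(File activeFile) throws IOException;
    }

    interface RetentionPolicy {
        // Concern 3: deciding which backups to discard (keep N files, M GB, or D days).
        void purge(List<File> backups) throws IOException;
    }

If those three were separate configuration elements, a user could change retention without touching how or when files roll.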
Cheers,
Paul

On Wed, May 28, 2014 at 2:40 PM, David Hoa <[email protected]> wrote:

> Thanks, Paul, Ralph, Matt, Remko. All good thoughts. With log4j1, we had to have an external process do the compression, but with log4j2, the size-based trigger provides interesting possibilities, avoiding the extremely large file problem in exchange for many timestamped files where there used to be one. It also might require a mindset change from retaining N days' worth of logs to retaining M GB of logs (probably a good thing, as it creates an incentive to not have unnecessary log messages).
>
> thanks,
> David
>
> On Wed, May 28, 2014 at 11:00 AM, Paul Benedict <[email protected]> wrote:
>
>> I would definitely think if the file is too large, it should be rolled more frequently. At least with log4j 1, you could set it to roll not only daily, but half-day, and hourly (and maybe finer). That's the ideal way to handle this. Personally, I don't like the idea of writing directly to a compressed archive since, conceptually, it's not "archived" until it's closed; rather, it should be compressed when it rolls over.
>>
>> Cheers,
>> Paul
>>
>> On Wed, May 28, 2014 at 12:55 PM, Ralph Goers <[email protected]> wrote:
>>
>>> So you are proposing writing two logs - one compressed and one uncompressed - to handle this. I am wondering what the break-even point of this would be. Many users use a size-based trigger instead so that a) the compression won't take long and b) manipulating a large file is not so much of a problem.
>>>
>>> What has me wondering about the usefulness of this is that when the file gets so large that compression at rollover is a problem, the file is probably too large to manipulate effectively in something like vi.
>>>
>>> Ralph
>>>
>>> Sent from my iPad
>>>
>>> On May 28, 2014, at 10:27 AM, David Hoa <[email protected]> wrote:
>>>
>>> Yup, the tricky part would come on crash before close, interrupt, etc., because I assume that the partially compressed file would be irrecoverable (haven't verified this). Ideally, we'd be able to close it properly, but if not, the log could, on startup, be recovered and compressed from the parallel uncompressed log that was simultaneously being written by another/the same appender.
>>>
>>> That would incur startup time to recover, which may be more acceptable in the rare case of a crash. Else, if there's another compression technique that leaves behind readable files even if not closed properly, that'd eliminate the need for recovery.
>>>
>>> I'll open a Jira ticket. Thanks for letting me share my thoughts on this.
>>>
>>> - David
>>>
>>> On Wed, May 28, 2014 at 9:39 AM, Matt Sicker <[email protected]> wrote:
>>>
>>>> We can use GZIPOutputStream, DeflaterOutputStream, and ZipOutputStream all out of the box.
>>>>
>>>> What happens if you interrupt a stream in progress? No idea! But gzip at least has CRC32 checksums on hand, so it can be detected if it's corrupted. We'll have to experiment a bit to see what really happens. I couldn't find anything in zlib.net's FAQ.
>>>>
>>>> On 28 May 2014 08:56, Ralph Goers <[email protected]> wrote:
>>>>
>>>>> What would happen to the file if the system crashed before the file is closed? Would the file be able to be decompressed or would it be corrupted?
>>>>>
>>>>> Sent from my iPad
>>>>>
>>>>> On May 28, 2014, at 6:35 AM, Remko Popma <[email protected]> wrote:
>>>>>
>>>>> David, thank you for the clarification. I understand better what you are trying to achieve now.
>>>>>
>>>>> Interesting idea to have an appender that writes to a GZipOutputStream. Would you mind raising a Jira ticket <https://issues.apache.org/jira/browse/LOG4J2> for that feature request?
>>>>>
>>>>> I would certainly be interested in learning about efficient techniques for compressing very large files. Not sure if or how the dd/direct I/O mentioned in the blog you linked to could be leveraged from Java. If you find a way that works well for log file rollover, and you're interested in sharing it, please let us know.
>>>>>
>>>>> On Wed, May 28, 2014 at 3:42 PM, David Hoa <[email protected]> wrote:
>>>>>
>>>>>> Hi Remko,
>>>>>>
>>>>>> My point about gzip, which we've experienced, is that compressing very large files (multi-GB) does have considerable impact on the system. The dd/direct I/O workaround avoids putting that much log data into your filesystem cache. For that problem, after I sent the email, I did look at the log4j2 implementation and saw that DefaultRolloverStrategy::rollover() calls GZCompressionAction, so I see how I can write my own strategy and Action to customize how gzip is called.
>>>>>>
>>>>>> My second question was not about adding to existing gzip files; from what I know, that's not possible. But if the GZipOutputStream is kept open and written to until closed by a rollover event, then the cost of gzipping is amortized over time rather than incurred when the rollover event gets triggered. The benefit is amortization of gzip so there's no resource usage spike; the downside would be writing both compressed and uncompressed log files and maintaining rollover strategies for both of them. So a built-in appender that wrote directly to gz files would be useful for this.
>>>>>>
>>>>>> Thanks,
>>>>>> David
>>>>>>
>>>>>> On Tue, May 27, 2014 at 4:52 PM, Remko Popma <[email protected]> wrote:
>>>>>>
>>>>>>> Hi David,
>>>>>>>
>>>>>>> I read the blog post you linked to. It seems that the author was very, very upset that a utility called cp only uses a 512-byte buffer. He then goes on to praise gzip for having a 32KB buffer. So just based on your link, gzip is actually pretty good.
>>>>>>>
>>>>>>> That said, there are plans to improve the file rollover mechanism. These plans are currently spread out over a number of Jira tickets. One existing request is to delete archived log files that are older than some number of days (https://issues.apache.org/jira/browse/LOG4J2-656, https://issues.apache.org/jira/browse/LOG4J2-524). This could be extended to cover your request to keep M compressed files.
>>>>>>>
>>>>>>> I'm not sure about appending to existing gzip files. Why is this desirable / what are you trying to accomplish with that?
>>>>>>>
>>>>>>> Sent from my iPhone
>>>>>>>
>>>>>>> On 2014/05/28, at 3:22, David Hoa <[email protected]> wrote:
>>>>>>>
>>>>>>> hi Log4j Dev,
>>>>>>>
>>>>>>> I am interested in the log rollover and compression feature in log4j2. I read the documentation online and still have some questions.
>>>>>>>
>>>>>>> - gzipping large files has a performance impact on latencies/CPU/file cache, and there's a workaround for that using dd and direct I/O. Is it possible to customize how log4j2 gzips files (or does log4j2 already do this)? See this link for a description of the common problem:
>>>>>>> http://kevinclosson.wordpress.com/2007/02/23/standard-file-utilities-with-direct-io/
>>>>>>>
>>>>>>> - is it possible to use the existing appenders to output directly to their final gzipped files, maintain M of those gzipped files, and rollover/maintain N of the uncompressed logs? I suspect that the complicated part would be in JVM crash recovery / application restart. Any suggestions on how best to add/extend/customize support for this?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> David
>>>>
>>>> --
>>>> Matt Sicker <[email protected]>
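Coming back to the "appender that writes straight to a GZipOutputStream" idea in the thread above: here is a minimal sketch of the stream handling involved, using only java.util.zip (this is not a log4j2 appender, just the raw idea). With syncFlush enabled, each flush pushes the compressed data out to the .gz file, so the gzip cost is spread out over time as David describes; the catch is that the gzip trailer (CRC32 plus uncompressed length) is only written on close, which is exactly where the crash-before-close concern comes from.

    import java.io.BufferedOutputStream;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.io.OutputStream;
    import java.nio.charset.StandardCharsets;
    import java.util.zip.GZIPOutputStream;

    // Sketch only: the raw stream handling a "write directly to .gz" appender would need.
    public class GzipLogWriter implements AutoCloseable {

        private final GZIPOutputStream out;

        public GzipLogWriter(String path) throws IOException {
            OutputStream file = new BufferedOutputStream(new FileOutputStream(path));
            // syncFlush=true (Java 7+): flush() performs a SYNC_FLUSH on the deflater,
            // so everything written so far actually reaches the .gz file on disk.
            out = new GZIPOutputStream(file, 8192, true);
        }

        public void write(String line) throws IOException {
            out.write(line.getBytes(StandardCharsets.UTF_8));
            out.write('\n');
        }

        public void flush() throws IOException {
            // Costs some compression ratio, but bounds how much is lost on a crash.
            out.flush();
        }

        @Override
        public void close() throws IOException {
            // finish() writes the gzip trailer (CRC32 + uncompressed length); only a
            // closed file is a complete archive. A crash before this point leaves a
            // truncated member with no trailer.
            out.finish();
            out.close();
        }
    }

If the process dies before close(), the data up to the last flush should still be extractable (gzip -dc will emit it and then complain about an unexpected end of file), but gzip -t will flag the file as truncated. A startup recovery pass could therefore stream each leftover .gz through GZIPInputStream and treat an EOFException as "truncated, rebuild from the parallel uncompressed copy", along the lines David suggested.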
