All, I would like to bring attention to and refer to a comment I made on the PR a few weeks ago https://github.com/apache/cloudstack/pull/7131#issuecomment-1488671229 and good to see this finally being discussed. Thanks João for starting this ML thread and all the hard work you and other contributors have put into this.
I took some time to reflect on the discussions so far, gather facts, share my findings and thoughts for the community to review, consider and discuss: 1. I'm happy and it's encouraging to see all this hard work and new contributors who want to be RM. In fact, lately, I've transitioned into such a mentoring role both in the community and at work, and willing to help guide and mentor such RMs. It's important for our community to welcome contributions, all contributors equally deserve a chance. The community should consider the facts, weigh in all the benefits and cons of any large contributions, the stability/risks associated, and discuss and build consensus if this is the right thing for them. The project bylaws also give us tools and mechanisms to build such consensus, and resolve difference of opinions or to formally agree via voting. 2. We aren't using log4j1.x. In fact, we're using Reload4J (https://reload4j.qos.ch/) which is a maintained log4j1.x fork by Ceki (original author, emeritus PMC member of log4j1.x project, and also a logging expert who's behind sl4j, logback projects). Per the website, Reload4J is also commercially supported by Exoscale (a fork of cloudstack), Spotify among others, and is regularly released. Reload4j is being used by a dozen other top-level ASF projects including CloudStack. Reload4j has also released several security and improvement releases even after the notorious log4jshell (which we weren't affected as much as the 2.x users). I think Ceki is very community-oriented and trustworthy, he was kind enough to share his insights and confirm on performance considerations between log4j 2.x vs 1.x/reload4j and that several features including async logger is in his roadmap. I'll add details in a sub-note if anybody wants to refer [1]. From a dependency risk, longivity risk of using reload4j and security perspective for CloudStack, I see no technical reason to rush or there exists an urgency to move away from reloadj4j as our logging library. It meets our basic requirements for a logging library well. 3. There is no mandate or bylaws from the ASF that we must use log4j2x or any ASF-only library. The CloudStack project is free to use whatever dependencies we think are good and in favour of the wider community. 4. While I see a long list of features from log4j2.x, which particular features or benefits do we really want that aren't already available to use via reload4j? This is an important technical question to discuss - are we doing the work simply because there is a new library/version, or there are serious requirements? I genuinely want to learn what changed. 1. From what I gather, there are side effects of merging the PR https://github.com/apache/cloudstack/pull/7131 that we should consider, review, discuss, find solutions to them, and try to resolve (or gather consensus) on how to proceed: * Once merged, it will potentially cause conflicts on more than 100 outstanding pull requests (potentially any/all java changes). Have we thought about which approach we'll like to take, try to have these 100+ outstanding PRs merged in first or go ahead with this PR first? Should we give anykind of notice/documentation to them? What is the right approach for the community? * Once merged, it will also cause conflicts and build failures on all PRs that are forward merged to a branch that has this PR merged. (mostly due to logging package import and logger name changes). This will add overhead in older branch maintenance, though the health check PRs can be used to ensure stability of such forward-merges. * This will impact and affect all non-public/private integrations, plugins, and vendors such as (Storpool, Linbit, and others) that are building anything with, using, or on top of CloudStack. It may also affect their ability to deliver integration/feature/plugin jars which may not work out of the current process of replacing them in the classpath (I know Storpool and Linstor uses this approach at /usr/share/cloudstack-management/libs/ as we don't bundle these jars in the ACS mgmt uber-jar). * Users - CloudStack release upgrades may not work out of the box for users who have made changes to their logging config, or are using any kind of integrations, plugins etc currently. I'm assuming the reload4j/log4j 1.x config xml isn't compatible 100% with log4j2.x (I could be wrong). Some of the users may be even blocked to upgrade or use a version of CloudStack until their dependencies, plugins/jars, integrations etc are updated (by them or their vendors) to work with CloudStack log4j2.x. However, this could be dealt with upgrade paths (in packages post-install script) to rename/replace with new log4j2.x config file and documentation/notice in the release announcement and release notes. Have we thought about giving any kind of deprecation/change notice to the wider community, and what kind of time frame or period is appropriate for them? Which release should this work be targeted for? Should we give at least six months notice and documentation for users, and target this for 4.20? 2. I have concerns about the stability and risks of accepting the PR, I check the PR and see build/CI failures on the PR. From the surface, it looks like the PR isn't nearly ready, and still requires a lot of stabilisation effort (or does it also require changes in the CI systems?). The changes are huge so reviewing 100% is not possible, are the scope changes strictly logging replacement or there are other changes? What is the current state of the PR, is it ready? Has anybody looked at the failures - why are they happening, can we fix them? As with any PR, the PR must pass all build and CI checks. Github Actions, BO/Trillian based test matrix across KVM, VMware, XenServer/XCP-ng can help (we can help kick those tests). What is also missing are considerations for manual testing of the logs (for ex. to consider has the logging format, file, etc changes - are there any other side-effects we haven't anticipated but might learn?). I think it's fixable - somebody (non-authors) needs to manually test them - has that happened? 3. The PR intends to address the tech debt but the approach could be improved. It doesn't wrap/refactor the logging library under a utility method, so one downside of this approach I see is if in the future there is a new log4j 3.x library or something different, the community would need to go through all of this again (perhaps would be easier with consistent naming, still doesn't make it risk or effort free). Does it make sense to refactor and create a wrapper utility method (in cloud-utils module), that allows a community to easily replace one logging library with another? 4. I think a round of discussion and some iterations are necessary to evaluate all concerns, work towards a concrete proposal that should weigh in, and propose/addres how exact we transition to log4j 2.x if at all. Ideally, for such a large change we should give notice and a longer time period for the wider community to react, even bring it up, discuss in meetups and conferences to build support so this wouldn't come as a shock to people or affect them - sort of prepare the soil, before we sow the seeds. Hope this helps. [1] I discussed https://github.com/qos-ch/reload4j/issues/47 and then for my query got email from Ceki/Reload4j on async logger, also Here are a few comments that you might find helpful. In my opinion asynchronous loggers as they exist in log4j 2.x (based on LMAX Disruptor) use an extremely large buffer size. This buffer will fill up in case of back pressure. There are two major cases. Either: 1) all downstream appenders are fast In that case, a very large buffer is useless as a reasonably sized buffer will offer good performance. For example, benchmarks [1] have shown that logback's AsyncAppender is both simpler and significantly faster. 2) at least one downstream appender is slow In which case, the very large buffer will slowly fill up, possibly exhausting available memory. More importantly, in case the application is stopped, this will result in large amount of logging data to be lost, which is far from ideal for a logging framework. By the way, I intend to add JsonLayout to reload4j in the coming days. JsonLayout will not require additional dependencies and should be quite useful. Take care and best regards, [1] https://logback.qos.ch/performance.html Regards. ________________________________ From: Abhishek Kumar <shwst...@apache.org> Sent: Wednesday, May 3, 2023 15:16 To: us...@cloudstack.apache.org <us...@cloudstack.apache.org> Cc: dev@cloudstack.apache.org <dev@cloudstack.apache.org> Subject: Re: ACS upgrade to Log4J2 version 2.19 Hi Daniel, It was just my opinion it is based on the reasons that it is something that we haven't seen any request in the community before and it will create some challenges for the releases, forward-merging bug-fixes and also for any existing integrations that users might be having. To be specific, I'm neutral (0) on this. Great that you want to take up the RM work to address part of those challenges. You have my support. The only thing that I would hope for is to get this tested/merged early to avoid issues at the later stages nearer to the release. Regards, Abhishek On Mon, 1 May 2023 at 18:58, Daniel Salvador <dvsalvador...@gmail.com> wrote: > Abhishek, > > I do not see why it would be a 5.0 change. Also, ACS 5.0 is a discussion > the community has been having for a long time from now and is something we > are too far away to achieve consensus. > > The patch is important to enable further development for the log management > on ACS and facilitate everyone's life while coding and troubleshooting. If > you think it is too much work for the RM, I reiterate that I am willing to > be the 4.19 RM and conduct/execute all of the work. > > Best regards, > Daniel Salvador (gutoveronezi) > > On Mon, May 1, 2023 at 4:10 AM Abhishek Kumar <shwst...@apache.org> wrote: > > > Great work. > > Though I feel this is a 5.0 change. I agree with Wei that this would > create > > too much overhead for upcoming releases. 4.18 was pushed ahead a few > months > > and we may end up on a similar path. > > Also, reload4j is still actively maintained so I don't think this is > > urgent. > > > > Regards, > > Abhishek > > > > On Fri, 28 Apr 2023 at 18:28, João Jandre Paraquetti < > j...@scclouds.com.br > > > > > wrote: > > > > > In PR #7131 (https://github.com/apache/cloudstack/pull/7131) I have > > > proposed to normalize ACS's loggers, and more importantly, upgrade the > > > library log4j to log4j2 version 2.19. > > > > > > Log4j2 has a lot of features that could offer benefits to ACS: > > > > > > * Async Loggers - performance similar to logging switched off > > > * Custom log levels > > > * Automatically reload its configuration upon modification without > > > loosing log events during reconfigurations. > > > * Java 8-style lambda support for lazy logging (which enables methods > > > to be executed only when necessary, i.e.: the right log level) > > > * Log4j 2 is garbage-free (or at least low-garbage) since version 2.6 > > > * Plugin Architecture - easy to extend by building custom components > > > * Log4j 2 API is separated from the Log4j 2 implementation. > > > * Log4j 2 API supports more than just logging Strings: CharSequences, > > > Objects and custom Messages. Messages allow support for interesting > > > and complex constructs to be passed through the logging system and > > > be efficiently manipulated. Users are free to create their own > > > Message types and write custom Layouts, Filters and Lookups to > > > manipulate them. > > > * Concurrency improvements: log4j2 uses java.util.concurrent > libraries > > > to perform locking at the lowest level possible. Log4j-1.x has > known > > > deadlock issues. > > > * Configuration via XML, JSON, YAML, properties configuration files > or > > > programmatically. > > > > > > In my personal experience using it in some other projects, log4j2 is > > > easier to work with in general, has better performance, and is an > active > > > project with constant development, innovation, and security patches. > > > Moreover, it is under a well known and trusted open source > organization. > > > > > > The change proposed in PR #7131 > > > (https://github.com/apache/cloudstack/pull/7131) has been tested and > > > validated in a lot of different scenarios by different people. We have > > > already tested the logging in the management server, usage, agents, and > > > system VMs; all of that using KVM and Vmware + Veeam. Most feature sets > > > were tested, create/delete/update VMs, disks, cresate snapshots, user > > > management and so on. > > > > > > The proposal has been discussed since January, 2023 in the PR > > > (https://github.com/apache/cloudstack/pull/7131), but I have been > > > requested to bring it to the mailing list. I would love to hear your > > > opinions on it, also, any reviews to the PR would be welcome. > > >