All,

I would like to bring attention to and refer to a comment I made on the PR a 
few weeks ago
https://github.com/apache/cloudstack/pull/7131#issuecomment-1488671229 and good 
to see this finally being discussed. Thanks João for starting this ML thread 
and all the hard work you and other contributors have put into this.

I took some time to reflect on the discussions so far, gather facts, share my 
findings and thoughts for the community to review, consider and discuss:

  1.
I'm happy and it's encouraging to see all this hard work and new contributors 
who want to be RM. In fact, lately, I've transitioned into such a mentoring 
role both in the community and at work, and willing to help guide and mentor 
such RMs.

It's important for our community to welcome contributions, all contributors 
equally deserve a chance. The community should consider the facts, weigh in all 
the benefits and cons of any large contributions, the stability/risks 
associated, and discuss and build consensus if this is the right thing for 
them. The project bylaws also give us tools and mechanisms to build such 
consensus, and resolve difference of opinions or to formally agree via voting.

  2.
We aren't using log4j1.x. In fact, we're using Reload4J 
(https://reload4j.qos.ch/) which is a maintained log4j1.x fork by Ceki 
(original author, emeritus PMC member of log4j1.x project, and also a logging 
expert who's behind sl4j, logback projects). Per the website, Reload4J is also 
commercially supported by Exoscale (a fork of cloudstack), Spotify among 
others, and is regularly released. Reload4j is being used by a dozen other 
top-level ASF projects including CloudStack. Reload4j has also released several 
security and improvement releases even after the notorious log4jshell (which we 
weren't affected as much as the 2.x users).

I think Ceki is very community-oriented and trustworthy, he was kind enough to 
share his insights and confirm on performance considerations between log4j 2.x 
vs 1.x/reload4j and that several features including async logger is in his 
roadmap. I'll add details in a sub-note if anybody wants to refer [1].

From a dependency risk, longivity risk of using reload4j and security 
perspective for CloudStack, I see no technical reason to rush or there exists 
an urgency to move away from reloadj4j as our logging library. It meets our 
basic requirements for a logging library well.

  3.  There is no mandate or bylaws from the ASF that we must use log4j2x or 
any ASF-only library. The CloudStack project is free to use whatever 
dependencies we think are good and in favour of the wider community.

  4.  While I see a long list of features from log4j2.x, which particular 
features or benefits do we really want that aren't already available to use via 
reload4j?

This is an important technical question to discuss - are we doing the work 
simply because there is a new library/version, or there are serious 
requirements? I genuinely want to learn what changed.

  1.
From what I gather, there are side effects of merging the PR 
https://github.com/apache/cloudstack/pull/7131 that we should consider, review, 
discuss, find solutions to them, and try to resolve (or gather consensus) on 
how to proceed:

     *   Once merged, it will potentially cause conflicts on more than 100 
outstanding pull requests (potentially any/all java changes).

Have we thought about which approach we'll like to take, try to have these 100+ 
outstanding PRs merged in first or go ahead with this PR first? Should we give 
anykind of notice/documentation to them? What is the right approach for the 
community?

     *   Once merged, it will also cause conflicts and build failures on all 
PRs that are forward merged to a branch that has this PR merged. (mostly due to 
logging package import and logger name changes). This will add overhead in 
older branch maintenance, though the health check PRs can be used to ensure 
stability of such forward-merges.

     *   This will impact and affect all non-public/private integrations, 
plugins, and vendors such as (Storpool, Linbit, and others) that are building 
anything with, using, or on top of CloudStack. It may also affect their ability 
to deliver integration/feature/plugin jars which may not work out of the 
current process of replacing them in the classpath (I know Storpool and Linstor 
uses this approach at /usr/share/cloudstack-management/libs/ as we don't bundle 
these jars in the ACS mgmt uber-jar).

     *   Users - CloudStack release upgrades may not work out of the box for 
users who have made changes to their logging config, or are using any kind of 
integrations, plugins etc currently. I'm assuming the reload4j/log4j 1.x config 
xml isn't compatible 100% with log4j2.x (I could be wrong). Some of the users 
may be even blocked to upgrade or use a version of CloudStack until their 
dependencies, plugins/jars, integrations etc are updated (by them or their 
vendors) to work with CloudStack log4j2.x.

However, this could be dealt with upgrade paths (in packages post-install 
script) to rename/replace with new log4j2.x config file and 
documentation/notice in the release announcement and release notes.

Have we thought about giving any kind of deprecation/change notice to the wider 
community, and what kind of time frame or period is appropriate for them? Which 
release should this work be targeted for? Should we give at least six months 
notice and documentation for users, and target this for 4.20?

  2.  I have concerns about the stability and risks of accepting the PR, I 
check the PR and see build/CI failures on the PR. From the surface, it looks 
like the PR isn't nearly ready, and still requires a lot of stabilisation 
effort (or does it also require changes in the CI systems?).

The changes are huge so reviewing 100% is not possible, are the scope changes 
strictly logging replacement or there are other changes?

What is the current state of the PR, is it ready? Has anybody looked at the 
failures - why are they happening, can we fix them? As with any PR, the PR must 
pass all build and CI checks. Github Actions, BO/Trillian based test matrix 
across KVM, VMware, XenServer/XCP-ng can help (we can help kick those tests).

What is also missing are considerations for manual testing of the logs (for ex. 
to consider has the logging format, file, etc changes - are there any other 
side-effects we haven't anticipated but might learn?). I think it's fixable - 
somebody (non-authors) needs to manually test them - has that happened?

  3.  The PR intends to address the tech debt but the approach could be 
improved. It doesn't wrap/refactor the logging library under a utility method, 
so one downside of this approach I see is if in the future there is a new log4j 
3.x library or something different, the community would need to go through all 
of this again (perhaps would be easier with consistent naming, still doesn't 
make it risk or effort free).

Does it make sense to refactor and create a wrapper utility method (in 
cloud-utils module), that allows a community to easily replace one logging 
library with another?

  4.  I think a round of discussion and some iterations are necessary to 
evaluate all concerns,  work towards a concrete proposal that should weigh in, 
and propose/addres how exact we transition to log4j 2.x if at all.

Ideally, for such a large change we should give notice and a longer time period 
for the wider community to react, even bring it up, discuss in meetups and 
conferences to build support so this wouldn't come as a shock to people or 
affect them - sort of prepare the soil, before we sow the seeds.

Hope this helps.

[1] I discussed https://github.com/qos-ch/reload4j/issues/47 and then for my 
query got email from Ceki/Reload4j on async logger, also

Here are a few comments that you might find helpful.

In my opinion asynchronous loggers as they exist in log4j 2.x (based on
LMAX Disruptor) use an extremely large buffer size. This buffer will
fill up in case of back pressure. There are two major cases.

Either:

1) all downstream appenders are fast

In that case, a very large buffer is useless as a reasonably sized
buffer will offer good performance.  For example, benchmarks [1] have
shown that logback's AsyncAppender is both simpler and significantly faster.

2) at least one downstream appender is slow

In which case, the very large buffer will slowly fill up, possibly
exhausting available memory. More importantly, in case the application
is stopped, this will result in large amount of logging data to be lost,
which is far from ideal for a logging framework.

By the way, I intend to add  JsonLayout to reload4j in the coming days.
JsonLayout will not require additional dependencies and should be quite
useful.

Take care and best regards,

[1] https://logback.qos.ch/performance.html



Regards.

________________________________
From: Abhishek Kumar <shwst...@apache.org>
Sent: Wednesday, May 3, 2023 15:16
To: us...@cloudstack.apache.org <us...@cloudstack.apache.org>
Cc: dev@cloudstack.apache.org <dev@cloudstack.apache.org>
Subject: Re: ACS upgrade to Log4J2 version 2.19

Hi Daniel,

It was just my opinion it is based on the reasons that it is something that
we haven't seen any request in the community before and it will create some
challenges for the releases, forward-merging bug-fixes and also for any
existing integrations that users might be having.
To be specific, I'm neutral (0) on this.
Great that you want to take up the RM work to address part of those
challenges. You have my support.
The only thing that I would hope for is to get this tested/merged early to
avoid issues at the later stages nearer to the release.

Regards,
Abhishek

On Mon, 1 May 2023 at 18:58, Daniel Salvador <dvsalvador...@gmail.com>
wrote:

> Abhishek,
>
> I do not see why it would be a 5.0 change. Also, ACS 5.0 is a discussion
> the community has been having for a long time from now and is something we
> are too far away to achieve consensus.
>
> The patch is important to enable further development for the log management
> on ACS and facilitate everyone's life while coding and troubleshooting. If
> you think it is too much work for the RM, I reiterate that I am willing to
> be the 4.19 RM and conduct/execute all of the work.
>
> Best regards,
> Daniel Salvador (gutoveronezi)
>
> On Mon, May 1, 2023 at 4:10 AM Abhishek Kumar <shwst...@apache.org> wrote:
>
> > Great work.
> > Though I feel this is a 5.0 change. I agree with Wei that this would
> create
> > too much overhead for upcoming releases. 4.18 was pushed ahead a few
> months
> > and we may end up on a similar path.
> > Also, reload4j is still actively maintained so I don't think this is
> > urgent.
> >
> > Regards,
> > Abhishek
> >
> > On Fri, 28 Apr 2023 at 18:28, João Jandre Paraquetti <
> j...@scclouds.com.br
> > >
> > wrote:
> >
> > > In PR #7131 (https://github.com/apache/cloudstack/pull/7131) I have
> > > proposed to normalize ACS's loggers, and more importantly, upgrade the
> > > library log4j to log4j2 version 2.19.
> > >
> > > Log4j2 has a lot of features that could offer benefits to ACS:
> > >
> > >   * Async Loggers - performance similar to logging switched off
> > >   * Custom log levels
> > >   * Automatically reload its configuration upon modification without
> > >     loosing log events during reconfigurations.
> > >   * Java 8-style lambda support for lazy logging (which enables methods
> > >     to be executed only when necessary, i.e.: the right log level)
> > >   * Log4j 2 is garbage-free (or at least low-garbage) since version 2.6
> > >   * Plugin Architecture - easy to extend by building custom components
> > >   * Log4j 2 API is separated from the Log4j 2 implementation.
> > >   * Log4j 2 API supports more than just logging Strings: CharSequences,
> > >     Objects and custom Messages. Messages allow support for interesting
> > >     and complex constructs to be passed through the logging system and
> > >     be efficiently manipulated. Users are free to create their own
> > >     Message types and write custom Layouts, Filters and Lookups to
> > >     manipulate them.
> > >   * Concurrency improvements: log4j2 uses java.util.concurrent
> libraries
> > >     to perform locking at the lowest level possible. Log4j-1.x has
> known
> > >     deadlock issues.
> > >   * Configuration via XML, JSON, YAML, properties configuration files
> or
> > >     programmatically.
> > >
> > > In my personal experience using it in some other projects, log4j2 is
> > > easier to work with in general, has better performance, and is an
> active
> > > project with constant development, innovation, and security patches.
> > > Moreover, it is under a well known and trusted open source
> organization.
> > >
> > > The change proposed in PR #7131
> > > (https://github.com/apache/cloudstack/pull/7131) has been tested and
> > > validated in a lot of different scenarios by different people. We have
> > > already tested the logging in the management server, usage, agents, and
> > > system VMs; all of that using KVM and Vmware + Veeam. Most feature sets
> > > were tested, create/delete/update VMs, disks, cresate snapshots, user
> > > management and so on.
> > >
> > > The proposal has been discussed since January, 2023 in the PR
> > > (https://github.com/apache/cloudstack/pull/7131), but I have been
> > > requested to bring it to the mailing list. I would love to hear your
> > > opinions on it, also, any reviews to the PR would be welcome.
> >
>

 

Reply via email to