Thank you Wei-Chiu for the thread and extensive help with reviews! Thank
you Ayush for responding to the thread!
Let me try to address some points.

Please pardon my ignorance if I am not supposed to respond to any of the
questions.

> Regarding this entire activity including the parent tickets: Do we have
any dev list agreement for this?

HADOOP-16206 <https://issues.apache.org/jira/browse/HADOOP-16206> was
created back in March 2019 and there has been a ton of discussion on the
Jira since then. Duo is an expert and he has also worked with our esteemed
Log4j community to introduce changes that promise great benefits for both
the HBase and Hadoop projects (for instance, [1]). He has laid out a plan
to tackle the whole migration one small piece at a time. There has been
broad agreement on the Jira from Hadoop experts, some Log4j community
members also chimed in with their feedback, and it was agreed to proceed
with Duo's proposed plan and integrate the changes into trunk. This will
let us stabilize the work gradually over time.
The Jira has received many interactions over the past few years.


> What incompatibilities have been introduced till now for this and what
are planned.

Let me list what has been done so far; that might make it easier to discuss:


   - HADOOP-18206 <https://issues.apache.org/jira/browse/HADOOP-18206> removed
   commons-logging references. The project is no longer under active
   development (last release in 2014:
   https://github.com/apache/commons-logging/tags), and without this
   cleanup it becomes very difficult to chase log4j changes. No direct
   incompatibility involved.
   - HADOOP-18653 <https://issues.apache.org/jira/browse/HADOOP-18653> a
   follow-up to use the slf4j log4j adaptor so that slf4j is on the
   classpath before we update the log level (dynamically changing log
   levels via the servlet). No incompatibility introduced.
   - HADOOP-18648 <https://issues.apache.org/jira/browse/HADOOP-18648> KMS
   log4j properties should no longer be loaded dynamically, as this is not
   supported by log4j2; instead, HADOOP_OPTS is used to provide the log4j
   properties location. No incompatibility introduced.
   - HADOOP-18654 <https://issues.apache.org/jira/browse/HADOOP-18654>
   TaskLogAppender is not being used, so remove it. It was marked
   IA.Private and IS.Unstable. No incompatibility introduced.
   - HADOOP-18669 <https://issues.apache.org/jira/browse/HADOOP-18669> remove
   the Log4Json layout, as it is more suited to the Log4j project than to
   Hadoop and is not being used anywhere. Every appender we maintain comes
   with a maintenance cost. No incompatibility introduced.
   - HADOOP-18649 <https://issues.apache.org/jira/browse/HADOOP-18649> replace
   the CLA and CLRA appenders with log4j's RFA appender. Both CLA and CLRA
   are our custom appenders and provide the same capabilities as RFA, so
   maintaining them in our project would add cost to any future log4j
   upgrades. This was agreed upon on the parent Jira well before the work
   started.
   - HADOOP-18631 <https://issues.apache.org/jira/browse/HADOOP-18631> migrate
   dynamic async appenders to log4j properties. This is *an incompatible
   change* because we replace "hadoop site configs" with "log4j
   properties". We are not losing the ability to generate async logs for
   the namenode audit log, but the way to configure it is now different,
   and the release notes have been updated to reflect that. For the log4j
   upgrade we don't have a choice here: log4j2 only supports async logging
   through configuration, not through programmatically attached appenders.
   The log4j properties to configure it are at
   https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/resources/log4j.properties#L64-L81
   (a rough sketch follows this list).
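
To make the shape of that change concrete, here is a rough, illustrative
sketch of the new style of configuration. The appender class and property
names below are my assumptions for illustration only; the log4j.properties
file linked above is the authoritative reference:

    # Route the NameNode audit logger to an async rolling-file appender
    # purely via log4j configuration (previously toggled via hdfs-site
    # configs such as the async audit log setting):
    log4j.logger.org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit=INFO,ASYNCNNAUDIT
    log4j.additivity.org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit=false
    # Custom appender that wraps an async buffer around a rolling file
    # appender (class name and properties are illustrative assumptions):
    log4j.appender.ASYNCNNAUDIT=org.apache.hadoop.hdfs.util.AsyncRFAAppender
    log4j.appender.ASYNCNNAUDIT.blocking=false
    log4j.appender.ASYNCNNAUDIT.bufferSize=256
    log4j.appender.ASYNCNNAUDIT.conversionPattern=%m%n
    log4j.appender.ASYNCNNAUDIT.maxFileSize=1MB
    log4j.appender.ASYNCNNAUDIT.maxBackupIndex=5
    log4j.appender.ASYNCNNAUDIT.fileName=${hadoop.log.dir}/hdfs-audit.log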


As for the current task that introduces the hadoop-logging module,
HADOOP-18207 <https://issues.apache.org/jira/browse/HADOOP-18207>, we don't
have any incompatibility yet, because our direct usages of log4j APIs and
our custom appenders have never been marked IA.Public.

The major incompatibility will be introduced when we add log4j2 to our
classpath and remove log4j1 from our dependencies. That work has not yet
begun and is going to take a while.


> What does this activity bring up for the downstream folks adapting this?
Upgrading Hadoop is indeed important for a lot of projects

Downstream projects should expect to no longer get log4j1 as a transitive
dependency from Hadoop; instead they would get log4j2 as a transitive
dependency (only after the whole upgrade is done, and as mentioned above,
the log4j2 upgrade has not even started :)).

This raises an interesting question: why do we need this upgrade? For us,
almost all of the Hadoop ecosystem projects that we use have migrated to
log4j2, yet when we maintain common third-party dependencies to be shared
by all Hadoop downstreamers, we still cannot use log4j2 because Hadoop
itself is not yet on log4j2.


> What tests have been executed verifying all these changes including this
and the ones already in, apart from the Jenkins results, and what's the
plan.

I have prepared new containers with the individual changes, deployed a k8s
cluster with HDFS (NN, DN, JN, ZKFC), MapReduce (HS), and YARN (RM, NM)
components, run some known sanity tests, and verified that the known logs
are generated as expected. For the namenode audit log, I have verified
that the async logger generates async logs using the custom appender with
the new log4j configuration (in the absence of the previously used site
configs).


> This Comment [
https://github.com/apache/hadoop/pull/5503#discussion_r1199614640], this
says it isn't moving all the instances? So, when do you plan to work on
this? Should that be a release blocker for us, since part of the activity
is in? Needless to say: "Best Effort, whatever could move in, moves is,
isn't an answer"

For this exact reason, I have initiated an email thread on the Log4j dev
list: [2]. We programmatically set the monitor interval for dynamic log4j
file changes in the httpfs server. This is no longer supported; that
setting too is meant to live only in the configuration (properties, XML,
JSON, YAML, etc.), as sketched below.
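
For context, log4j2 expects file watching to be declared in the
configuration itself rather than enabled through an API call. A minimal,
illustrative log4j2 properties file for a server like httpfs might look
like the following (the appender, pattern, and interval here are my
assumptions, not the actual httpfs configuration):

    # log4j2 re-reads this file roughly every 30 seconds, replacing the
    # old programmatic configure-and-watch style call.
    monitorInterval = 30

    appender.console.type = Console
    appender.console.name = console
    appender.console.layout.type = PatternLayout
    appender.console.layout.pattern = %d{ISO8601} %-5p %c{2} - %m%n

    rootLogger.level = info
    rootLogger.appenderRef.console.ref = console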


> The above comment thread even says losing some available abilities, even
some past one said so, what all is getting compromised, and how do you plan
to get it back? Most of the lost abilities are related to HDFS, I don't
think we are in a state to lose stuff there, if we aren't having enough to
make people adapt. Our ultimate goal isn't to have something in, but to
make people use it.

Apart from moving the namenode async logging capability from "site
configs" to "log4j properties", we don't seem to have lost any abilities.
Even that is not a lost ability; only the way of configuring it is
different.


> Most of the time when this entire activity breaks & like usual we are on
a follow-up or on an addendum PR, there is generally some sarcastic or a
response like: 'We can't do it without breaking things', and I am not
taking any of these for now.

My sincere apologies if you feel I have not addressed your comments. Could
you please provide a specific reference? I will be happy to give detailed
answers. Many of the questions we are dealing with at this point have
already been discussed on the parent Jira in the past.

I would also like to share some of my opinions: I understand that none of
this is simple to deal with, and none of it is interesting work either, as
opposed to working on a big feature or providing bug fixes (especially
when it takes hours or days to figure out where the bug is, however small
the fix might be). Still, we cannot give up on the work that keeps our
project maintained, can we?
To give an example, we don't have Java 11 compile support because of
(among other things) the lack of a migration from Jersey 1.x to 2.x
(HADOOP-15984 <https://issues.apache.org/jira/browse/HADOOP-15984>). At
this point it seems extremely difficult to migrate to Jersey 2 because of
multiple factors (e.g. Guice support and HK2 dependency injection don't
work well together in Jersey 2). I have initiated a mailing thread with
the Eclipse community and also created an issue on the Jersey tracker to
get some insight:
https://github.com/eclipse-ee4j/jersey/issues/5357
There has been no update, and it is well known that Guice support does not
work with Jersey 2. We don't yet know which direction we will take or how
we can possibly upgrade Jersey, but we still cannot give up on it, right?
We do need to find some way, don't we?
Similarly, the whole log4j2 upgrade is also a big deal, but we are not
going to lose any Hadoop functionality; we have an alternative way of
achieving the same output (as mentioned above for the async logger).

The point I am trying to make is: all this work is really not that
interesting, but we still need to do it; as a community we still need to
maintain the Hadoop project for the rest of the world, no matter how
complex that is, correct? Hadoop is the only project without log4j2
support among all the big data projects we use as of today. We can stay in
this state for a while, but how long are we willing to put off an
inevitable, boring maintenance effort? Moreover, once we migrate to
log4j2, we could get a real boost from its async logger work, where we
don't even need an async appender (see the sketch below).
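
To illustrate what that boost looks like (this is just how log4j2 exposes
it, not a configuration we have decided on): all loggers can be switched
to the disruptor-backed async implementation with a single setting, for
example in a log4j2.component.properties file on the classpath or via the
equivalent -D system property; it does require the LMAX Disruptor jar on
the classpath:

    # Make every logger asynchronous; no AsyncAppender is needed.
    log4j2.contextSelector=org.apache.logging.log4j.core.async.AsyncLoggerContextSelector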

I would be extremely happy if anyone else is willing to put in the effort
and get this work rolling forward. I am committed to ensuring that my work
on the project does not give rise to "multiple community conflicts."
Therefore, if we can confidently determine that using log4j1 for the
foreseeable future is acceptable, there would be no need for additional
dev and review cycles, and I am more than willing to refrain from creating
any further patches in that scenario. A similar discussion is required for
Jersey as well: while it might be tempting to hope that integrating Jersey
2 will magically resolve all our issues and make our Jackson and other
dependency upgrades reliable, it is essential to be realistic about the
potential challenges ahead. Despite the appeal of adopting Jersey 2, we
must be prepared to face a substantial number of incompatibilities that
would arise with the migration.

I am really grateful to all the reviewers who have reviewed the tasks so
far. I am not expecting reviewers to provide their reviews as quickly as
possible; this particular sub-task has been dev- and test-ready for the
last four months, and it is absolutely okay to wait longer. What really
hurts a bit is that, despite the whole discussion that took place on the
parent Jira and the clear agreements/directions we settled on, we are
still debating the value this work brings.
I sincerely apologize if any aspects were not adequately clarified during
our discussions of each sub-task. I am more than willing to revisit any
line of code and engage in a detailed conversation to share insights into
the factors that influenced the changes made.


1. https://lists.apache.org/thread/gvfb3jkg6t11cyds4jmpo7lrswmx28w3
2. https://lists.apache.org/thread/4l7oyk84jpj6br0sn4ymofdcbgfxmtqp


On Thu, Jul 27, 2023 at 3:11 AM Ayush Saxena <ayush...@gmail.com> wrote:

> Hi Wei-Chiu,
> I am glad this activity finally made it to the dev mailing list. Just
> sharing the context being the guy who actually reverted this last time
> it was in: It had a test failure on the PR itself and it went in, that
> had nothing to do with the nature of the PR, generic for all PR and
> all projects.
>
> Some thoughts & Questions?
> * Regarding this entire activity including the parent tickets: Do we
> have any dev list agreement for this?
> * What incompatibilities have been introduced till now for this and
> what are planned.
> * What does this activity bring up for the downstream folks adapting
> this? Upgrading Hadoop is indeed important for a lot of projects and
> for "us as well" and it is already a big pain (my past experience)
> * What tests have been executed verifying all these changes including
> this and the ones already in, apart from the Jenkins results, and
> what's the plan.
> * Considering you are heavily involved, any insights around perf stuff?
> * This Comment [
> https://github.com/apache/hadoop/pull/5503#discussion_r1199614640],
> this says it isn't moving all the instances? So, when do you plan to
> work on this? Should that be a release blocker for us, since part of
> the activity is in? Needless to say: "Best Effort, whatever could move
> in, moves is, isn't an answer"
> * The above comment thread even says losing some available abilities,
> even some past one said so, what all is getting compromised, and how
> do you plan to get it back? Most of the lost abilities are related to
> HDFS, I don't think we are in a state to lose stuff there, if we
> aren't having enough to make people adapt. Our ultimate goal isn't to
> have something in, but to make people use it.
> * What advantages do we get with all of these activities over existing
> branch-3 stuff? Considering what are the trade-offs, Was discussing
> with some folks offline & that seems to be a good question to have an
> answer beforehand.
>
> PS. Most of the time when this entire activity breaks & like usual we
> are on a follow-up or on an addendum PR, there is generally some
> sarcastic or a response like: 'We can't do it without breaking
> things', and I am not taking any of these for now.
>
> Most importantly since we are discussing it now and if there are
> incompatibilities introduced already, is there a possible way out and
> get rid of them, if not, if there ain't an agreement, how tough is
> going back, because if it introduces incompatibilities for HDFS, you
> won't get an agreement most probably, not sure about others but I will
> veto that...
>
>
> TLDR, Please hold unless all the concerns are addressed and we have an
> agreement for this as well as anything done in past or planned for
> future, Shouldn't compromise the adaptability of the product at any
> cost
>
> -Ayush
>
> On Thu, 27 Jul 2023 at 03:47, Wei-Chiu Chuang <weic...@apache.org> wrote:
> >
> > Hi,
> >
> > I am preparing to resolve HADOOP-18207
> > <https://issues.apache.org/jira/browse/HADOOP-18207> (
> > https://github.com/apache/hadoop/pull/5717).
> >
> > This change affects all modules. With this change, it will eliminate
> almost
> > all the direct log4j usage.
> >
> > As always, landing such a big piece is tricky. I am sorry for the mishaps
> > last time and am doing more due diligence to make it a smoother
> transition.
> > I am triggering one last precommit check. Once the change is merged,
> Viraj
> > and I will pay attention to any potential problems.
> >
> > Weichiu
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
> For additional commands, e-mail: common-dev-h...@hadoop.apache.org
>
>
