Re: HADOOP-18207 hadoop-logging module about to land

2023-07-28 Thread Ayush Saxena
I started from the bottom: a lot of irrelevant information (Java,
Jersey, Eclipse). I finally gave up; there was a bit of emotional stuff
too, so I didn’t read the whole mail.

Quoting:
///->
>
hurts a bit is the fact that despite the whole discussion that took place
on the parent Jira, and the clear agreements/directions we have agreed
upon, we are still engaging in the discussion to determine the value this
work brings in.
<-///

Final derivatives: a lot of discussion happened on the ticket itself.
The patch has been ready for 4 months (slow reviewer to blame?), but only
a few weeks of that are because of me. Just for folks: that was because
the ‘extensive’ review didn’t involve running the failed tests or even
checking the Jenkins result, so it obviously broke, and now it is
directly coming & merging based on a non-binding vote. No worries, who
cares!!!

So, I am late to the party & at the wrong venue, and not an 'expert',
so not on the guest list either. *Typical ‘Not Open To
Discussion/Feedback’ scenario*. Moving forward here in such a scenario
means getting into arguments & finally a fight, which I am pretty sure
at least neither I nor Wei-Chiu is interested in. One day got added
today as well. Next time, Wei-Chiu, make sure it is 15 mins, otherwise
we lose scope to say anything & people get hurt also :-)

So, my opinion/vote doesn’t hold here at least, so I didn't bother to
read either; go ahead, folks. Next time I will be on time and in the
right place. Please refrain from putting mails like these on dev lists
after months; you can directly merge!!!

PS. IA & IS is something we hardly update now and hardly mark anything
on the higher side and we use mostly to get away with ‘unavoidable’
circumstances, not to dodge everything, functional compat is very
important irrespective of anything and just a behaviour change is not
"just" a behaviour change & is sometimes the reason for folks not
upgrading!!! Talk to folks and projects still on 2.x

PPS. There are too many experts... The ‘Hadoop’ Experts who have already
said the final yes. Curious: who all classify as one, do we have a list
of them? Just kidding, I know that!!! Just for the record: whoever
tells people he is the Hadoop expert, OK, but he is not Hadoop &
Hadoop is not only him.

Passed on my feedback already & I am done. It is good enough for the
'experts' to decode. I will abstain from commenting on or following
this mail thread further. Last 2 cents: we should not encourage these
90/95% things; they leave technical debt for others to clean up if the
guy doesn't come back for the remainder, which in general is the usual
case.

Sigh...

-Ayush

> On 28-Jul-2023, at 2:25 AM, Viraj Jasani  wrote:
>
> Thank you Wei-Chiu for the thread and extensive help with reviews! Thank
> you Ayush for responding to the thread!
> Let me try to address some points.
>
> Please pardon my ignorance if I am not supposed to respond to any of the
> questions.
>
>> Regarding this entire activity including the parent tickets: Do we have
> any dev list agreement for this?
>
> HADOOP-16206 was
> created back in Mar, 2019 and there has been tons of discussion on the Jira
> since then. Duo is an expert and he has also worked with our esteemed Log4j
> community to introduce changes that promise great benefits for both hbase
> and hadoop projects (for instance, [1]). He has laid out the plan to tackle
> the whole migration, one small piece at a time and there has been enough
> agreement on the Jira from Hadoop experts, some of Log4j community members
> also chimed in and provided their feedbacks, and it has been agreed upon to
> proceed with Duo's proposed plan and integrate the changes into the trunk.
> This will enable us to stabilize the work gradually over time.
> The Jira has received many interactions over the past few years.
>
>
>> What incompatibilities have been introduced till now for this and what
> are planned.
>
> Let me list down what has been done so far, that might be easier to discuss:
>
>
> - HADOOP-18206 removed commons-logging references, the project is no
> longer under any active development cycle (last release on 2014
> https://github.com/apache/commons-logging/tags), and without this
> cleanup, it becomes very difficult to chase log4j changes. No direct
> incompatibility involved.
> - HADOOP-18653 follow-up to ensure we use slf4j log4j adaptor to ensure
> slf4j is in the classpath before we update loglevel (dynamically change
> log level using servlet). No incompatibility introduced.
> - HADOOP-18648 kms log4j properties should not be loaded dynamically as
> this is no longer supported by log4j2, instead use HADOOP_OPTS to provide
> log4j properties location. No incompatibility introduced.
> - HADOOP-18654 TaskLogAppender is not being used, remove it. It was
> marked IA.Private

Re: HADOOP-18207 hadoop-logging module about to land

2023-07-27 Thread Viraj Jasani
Thank you Wei-Chiu for the thread and extensive help with reviews! Thank
you Ayush for responding to the thread!
Let me try to address some points.

Please pardon my ignorance if I am not supposed to respond to any of the
questions.

> Regarding this entire activity including the parent tickets: Do we have
any dev list agreement for this?

HADOOP-16206 was created back in March 2019 and there has been a ton of
discussion on the Jira since then. Duo is an expert and he has also worked
with our esteemed Log4j community to introduce changes that promise great
benefits for both the hbase and hadoop projects (for instance, [1]). He has
laid out the plan to tackle the whole migration one small piece at a time,
and there has been enough agreement on the Jira from Hadoop experts. Some
Log4j community members also chimed in and provided their feedback, and it
has been agreed to proceed with Duo's proposed plan and integrate the
changes into trunk. This will enable us to stabilize the work gradually
over time. The Jira has received many interactions over the past few years.


> What incompatibilities have been introduced till now for this and what
are planned.

Let me list what has been done so far; that might be easier to discuss:


   - HADOOP-18206 removed commons-logging references. The project is no
   longer under any active development cycle (last release in 2014,
   https://github.com/apache/commons-logging/tags), and without this
   cleanup it becomes very difficult to chase log4j changes. No direct
   incompatibility involved.
   - HADOOP-18653 follow-up to make sure we use the slf4j log4j adaptor,
   ensuring slf4j is in the classpath before we update the log level
   (dynamically changing the log level using the servlet). No
   incompatibility introduced.
   - HADOOP-18648 kms log4j properties should not be loaded dynamically as
   this is no longer supported by log4j2; instead, use HADOOP_OPTS to
   provide the log4j properties location. No incompatibility introduced.
   - HADOOP-18654 TaskLogAppender is not being used, so remove it. It was
   marked IA.Private and IS.Unstable. No incompatibility introduced.
   - HADOOP-18669 remove Log4Json Layout, as it is more suitable to be part
   of the Log4j project than of Hadoop, and it's not being used anywhere.
   For each appender that we maintain, we pay its maintenance cost. No
   incompatibility introduced.
   - HADOOP-18649 CLA and CLRA appenders to be replaced with the log4j RFA
   appender. Both CLA and CLRA have been our custom appenders and they both
   provide the same capabilities as RFA, hence maintaining them in our
   project would come at a cost for any future upgrades of log4j. This was
   also agreed upon on the parent Jira well before the work started.
   - HADOOP-18631 Migrate dynamic async appenders to log4j properties. This
   is *an incompatible change* because we replace "hadoop site configs"
   with "log4j properties". We are not losing our capability to generate
   async logs for the namenode audit, but the way to configure it is now
   different. The release notes have been updated to reflect this. For the
   log4j upgrade we don't have a choice here: log4j2 only supports async
   loggers through configuration, not as programmatically loaded appenders.
   The log4j properties to configure are provided at
   https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/resources/log4j.properties#L64-L81
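
To make the "async" part of the HADOOP-18631 item concrete, here is a
minimal sketch of the queue-plus-worker pattern that an async appender is
built on, written in plain Java. This is only an illustration under
assumptions: AsyncSink, append and the demo messages are hypothetical names
made up for this sketch, and it is not Log4j or Hadoop code. Real appenders
also add bounded buffers, overflow policies and error handling.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

/** Hypothetical illustration only; not a Log4j or Hadoop class. */
final class AsyncSink implements AutoCloseable {
    // Unique sentinel instance, compared by identity, that tells the worker to stop.
    private static final String STOP = new String("STOP");

    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final Thread worker;

    /** The delegate stands in for a slow target such as a rolling file. */
    AsyncSink(Consumer<String> delegate) {
        worker = new Thread(() -> {
            try {
                // Drain the queue in order until the sentinel arrives.
                for (String msg = queue.take(); msg != STOP; msg = queue.take()) {
                    delegate.accept(msg);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "async-sink");
        worker.setDaemon(true);
        worker.start();
    }

    /** Called by the logging thread: only enqueues, never calls the delegate. */
    void append(String message) {
        queue.add(message);
    }

    /** Lets the worker write everything queued so far, then stops it. */
    @Override
    public void close() {
        queue.add(STOP);
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}

class AsyncSinkDemo {
    public static void main(String[] args) {
        List<String> written = new ArrayList<>();
        try (AsyncSink audit = new AsyncSink(written::add)) {
            audit.append("allowed=true cmd=open");
            audit.append("allowed=false cmd=delete");
        }
        // close() joined the worker, so both messages have been written by now.
        System.out.println(written);
    }
}
```

The point of the pattern is that the thread doing the work (for example the
one producing audit events) only pays for an enqueue, while the slow write
happens on the worker. Whether that wiring is done in code or declared in
the logging configuration is exactly what changes in the item above.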


As for the current task that introduces the hadoop-logging module,
HADOOP-18207, we don't have any incompatibility yet because our direct
usage of log4j APIs and our custom appenders have not been marked IA.Public.

The major incompatibility is going to be introduced when we add log4j2 in
our classpath and remove log4j1 from our dependencies. This work has not
yet begun and it's going to take a while.


> What does this activity bring up for the downstream folks adapting this?
Upgrading Hadoop is indeed important for a lot of projects

Downstream projects should expect to no longer get log4j1 as a transitive
dependency from hadoop; instead they would get log4j2 as a transitive
dependency (only after the whole upgrade is done; the log4j2 upgrade has
not even started, as I mentioned above :)).

This brings an interesting question: why do we need this upgrade? For us,
almost all of hadoop ecosystem projects that we use have migrated to
log4j2, and when we keep common thirdparty dependencies to be used by all
hadoop downstreamers, we can still not use log4j2 because 

Re: HADOOP-18207 hadoop-logging module about to land

2023-07-27 Thread Ayush Saxena
Hi Wei-Chiu,
I am glad this activity finally made it to the dev mailing list. Just
sharing the context being the guy who actually reverted this last time
it was in: It had a test failure on the PR itself and it went in, that
had nothing to do with the nature of the PR, generic for all PR and
all projects.

Some thoughts & Questions?
* Regarding this entire activity including the parent tickets: Do we
have any dev list agreement for this?
* What incompatibilities have been introduced till now for this and
what are planned.
* What does this activity bring up for the downstream folks adapting
this? Upgrading Hadoop is indeed important for a lot of projects and
for "us as well" and it is already a big pain (my past experience)
* What tests have been executed verifying all these changes including
this and the ones already in, apart from the Jenkins results, and
what's the plan.
* Considering you are heavily involved, any insights around perf stuff?
* This comment
[https://github.com/apache/hadoop/pull/5503#discussion_r1199614640]
says it isn't moving all the instances? So, when do you plan to
work on this? Should that be a release blocker for us, since part of
the activity is in? Needless to say: "Best effort, whatever could move
in, moves in" isn't an answer.
* The above comment thread even says losing some available abilities,
even some past one said so, what all is getting compromised, and how
do you plan to get it back? Most of the lost abilities are related to
HDFS, I don't think we are in a state to lose stuff there, if we
aren't having enough to make people adapt. Our ultimate goal isn't to
have something in, but to make people use it.
* What advantages do we get with all of these activities over existing
branch-3 stuff? Considering what are the trade-offs, Was discussing
with some folks offline & that seems to be a good question to have an
answer beforehand.

PS. Most of the time when this entire activity breaks & like usual we
are on a follow-up or on an addendum PR, there is generally some
sarcastic response like: 'We can't do it without breaking
things', and I am not taking any of these for now.

Most importantly since we are discussing it now and if there are
incompatibilities introduced already, is there a possible way out and
get rid of them, if not, if there ain't an agreement, how tough is
going back, because if it introduces incompatibilities for HDFS, you
won't get an agreement most probably, not sure about others but I will
veto that...


TLDR: Please hold unless all the concerns are addressed and we have an
agreement for this as well as anything done in the past or planned for
the future. We shouldn't compromise the adaptability of the product at
any cost.

-Ayush

On Thu, 27 Jul 2023 at 03:47, Wei-Chiu Chuang  wrote:
>
> Hi,
>
> I am preparing to resolve HADOOP-18207
> (https://github.com/apache/hadoop/pull/5717).
>
> This change affects all modules and will eliminate almost all the
> direct log4j usage.
>
> As always, landing such a big piece is tricky. I am sorry for the mishaps
> last time and am doing more due diligence to make it a smoother transition.
> I am triggering one last precommit check. Once the change is merged, Viraj
> and I will pay attention to any potential problems.
>
> Weichiu
