[ https://issues.apache.org/jira/browse/HADOOP-12168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14621268#comment-14621268 ]

Sangjin Lee commented on HADOOP-12168:
--------------------------------------

Sorry [~gliptak], somehow I missed the email about this and it fell through 
the cracks.

I took a look at the patch, and have some high-level questions on it.

It appears to me that the patch mostly covers the projects under 
hadoop-common-project and hadoop-tools, plus the top-level projects. Is that 
the intended scope of this patch? Are you going to follow up with subsequent 
patches to cover the other projects (yarn, hdfs, mapreduce)?

Also, assuming the scope is hadoop-common-project, hadoop-tools, and the top 
level, the following projects don't seem to have been covered: hadoop-nfs, 
hadoop-kms, hadoop-streaming, hadoop-distcp, hadoop-archives, hadoop-rumen, 
hadoop-gridmix, hadoop-datajoin, hadoop-ant, hadoop-extras, hadoop-client, 
hadoop-sls, hadoop-tools-dist, and hadoop-dist. Could you describe the nature 
of this patch and how we will eventually cover the entire code base?

While we're at it, what is the scope of this subtask, and how is it different 
from the main JIRA? From the title, this subtask and the main JIRA seem almost 
identical, so I'm somewhat unsure what this subtask is meant to address 
specifically. Some clarification on the JIRA and the patch would be greatly 
appreciated. Thanks!

Also, some general comments on the changes. Fixing *undeclared used* 
dependencies is very deterministic: we can simply take the output of the Maven 
dependency analysis and add the missing declarations. I don't think there is 
much complication in fixing them.
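
Just to illustrate what I mean (a sketch only; commons-lang is an arbitrary 
example, not a reference to any particular module in the patch): if a module's 
code references classes that only arrive on its classpath transitively, 
{{mvn dependency:analyze}} reports them as used undeclared dependencies, and 
the fix is simply to declare them in that module's pom:

{code:xml}
<!-- Sketch only: declare what the code actually compiles against instead of
     relying on it leaking in through a transitive dependency. -->
<dependency>
  <groupId>commons-lang</groupId>
  <artifactId>commons-lang</artifactId>
  <!-- version normally inherited from the hadoop-project dependencyManagement -->
</dependency>
{code}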

On the other hand, fixing *declared unused* dependencies takes a much deeper 
look and greater care, or things can break very easily. As you undoubtedly 
saw, detecting what is truly unused is difficult. The Maven dependency 
analysis simply relies on bytecode analysis and flags anything that is not 
referenced in the code. But that's only half of the story.
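
(For reference, that analysis is just the stock maven-dependency-plugin; if we 
ever want it to run automatically instead of via a manual 
{{mvn dependency:analyze}}, wiring it into a module build would look roughly 
like this sketch:)

{code:xml}
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-dependency-plugin</artifactId>
  <executions>
    <execution>
      <id>analyze-dependencies</id>
      <goals>
        <!-- analyze-only does not fork the build, so it is suitable for
             binding into the normal lifecycle -->
        <goal>analyze-only</goal>
      </goals>
      <configuration>
        <!-- keep it advisory for now; setting this to true would fail the
             build on any used-undeclared or declared-unused finding -->
        <failOnWarning>false</failOnWarning>
      </configuration>
    </execution>
  </executions>
</plugin>
{code}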

If a certain dependency is flagged as unused by the Maven dependency analysis, 
the only thing we can say at that point is that it should not be a 
compile-scope dependency. Whether it can be removed completely or should stay 
as a runtime (or test) dependency really depends on how it is actually used. 
One example is slf4j-log4j12. SLF4J binds to an implementation based on 
whichever binding library is dropped on the runtime classpath, so normally the 
only compile-time dependency is slf4j-api. However, without a real binding 
library (slf4j-log4j12, slf4j-jdk14, etc.) present on the classpath, SLF4J 
simply does not work. This is all based on dynamic runtime classloading, and 
it cannot be detected by any static code analysis. So removing an SLF4J 
binding library from the runtime scope can break things rather easily. There 
can be other cases where an "unused" dependency is not as unused as it seems 
due to dynamic classloading.
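
To make the SLF4J example concrete, the safe outcome of the analysis there is 
a scope adjustment rather than a removal, roughly along these lines (a sketch, 
with versions assumed to come from the parent pom):

{code:xml}
<!-- The code compiles only against the API... -->
<dependency>
  <groupId>org.slf4j</groupId>
  <artifactId>slf4j-api</artifactId>
</dependency>
<!-- ...while the binding is picked up by classloading at runtime, so it has
     to stay on the runtime classpath even though bytecode analysis flags it
     as unused. -->
<dependency>
  <groupId>org.slf4j</groupId>
  <artifactId>slf4j-log4j12</artifactId>
  <scope>runtime</scope>
</dependency>
{code}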

In summary, we need to be 100% confident that a certain runtime dependency is 
truly unused, and be able to show why, before we remove it. Hope this helps...



> Clean undeclared used dependencies and declared unused dependencies
> -------------------------------------------------------------------
>
>                 Key: HADOOP-12168
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12168
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: build
>    Affects Versions: 3.0.0
>            Reporter: Gabor Liptak
>            Assignee: Gabor Liptak
>         Attachments: HADOOP-12168.1.patch, HADOOP-12168.2.patch, 
> HADOOP-12168.3.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
