[ https://issues.apache.org/jira/browse/HADOOP-12168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14621268#comment-14621268 ]
Sangjin Lee commented on HADOOP-12168:
--------------------------------------

Sorry [~gliptak], somehow I missed the email about this and it fell through the cracks. I took a look at the patch and have some high-level questions.

It appears to me that the patch mostly covers projects in hadoop-common-project and hadoop-tools, plus the top-level projects. Is that the intended scope of this patch? Are you going to follow up with subsequent patches to cover the other projects (yarn, hdfs, mapreduce)? Also, assuming the scope is hadoop-common-project, hadoop-tools, and top-level, the following projects don't seem to have been covered: hadoop-nfs, hadoop-kms, hadoop-streaming, hadoop-distcp, hadoop-archives, hadoop-rumen, hadoop-gridmix, hadoop-datajoin, hadoop-ant, hadoop-extras, hadoop-client, hadoop-sls, hadoop-tools-dist, and hadoop-dist. Could you describe the nature of this patch and how we will eventually cover the entire code base?

While we're at it, what is the scope of this subtask? How is it different from the main JIRA? From the title, this subtask and the main JIRA seem almost identical, so I'm somewhat unsure what this subtask tries to address specifically. Some clarification on the JIRA and the patch would be greatly appreciated. Thanks!

Also, some general comments on the changes. Fixing *undeclared used* dependencies is very deterministic, and we can simply use the Maven dependency analysis to add them; I don't think there is much complication in fixing them. On the other hand, fixing *declared unused* dependencies takes a much deeper look and greater care, or things could break very easily. As you undoubtedly saw, detecting what's truly unused is difficult. The Maven dependency analysis simply follows bytecode analysis and flags anything that's not referenced in the code. But that's only half of the story.
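For reference, the Maven dependency analysis in question is the {{dependency:analyze}} goal of the maven-dependency-plugin. A *used undeclared* warning is fixed mechanically by declaring the dependency explicitly in the module's own pom.xml; the coordinates below are purely illustrative, not taken from the actual patch:

```xml
<!-- Reported by `mvn dependency:analyze` under
     "Used undeclared dependencies found" for this module.
     Fix: declare it explicitly (illustrative coordinates, not from the patch). -->
<dependency>
  <groupId>commons-logging</groupId>
  <artifactId>commons-logging</artifactId>
  <!-- version typically inherited from the hadoop-project
       dependencyManagement section rather than set here -->
  <scope>compile</scope>
</dependency>
```

This direction is safe because the bytecode already references the artifact; the declaration only makes an existing transitive dependency explicit.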
If a certain dependency is flagged as unused by the Maven dependency analysis, the only thing we can say at that point is that, at least, that particular dependency should not be a compile-scope dependency. Whether the dependency can be removed completely or should stay as a runtime (or test) dependency really depends on how it is used.

One example is slf4j-log4j12. SLF4J binds to an implementation based on a runtime library being dropped on the classpath, so normally the only compile-time dependency is slf4j-api. However, without a real implementation library (slf4j-log4j12, slf4j-jdk14, etc.) present on the classpath, SLF4J simply does not work. This is all based on dynamic runtime classloading, and it cannot be detected by any static code analysis. So removing an SLF4J binding library from the runtime scope can break things rather easily. There can be other cases where an "unused" dependency is not as straightforward as it seems due to dynamic classloading.

In summary, we need to be 100% confident that a certain runtime dependency is truly unused, and show the reason, before we remove it. Hope it helps...

> Clean undeclared used dependencies and declared unused dependencies
> -------------------------------------------------------------------
>
>                 Key: HADOOP-12168
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12168
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: build
>    Affects Versions: 3.0.0
>            Reporter: Gabor Liptak
>            Assignee: Gabor Liptak
>        Attachments: HADOOP-12168.1.patch, HADOOP-12168.2.patch,
> HADOOP-12168.3.patch
>

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
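To make the SLF4J point concrete, here is a minimal sketch of how such a binding is typically scoped in a pom.xml. The compiled code references only slf4j-api, so {{dependency:analyze}} will flag the binding as "unused declared" even though removing it leaves the module with no working logging backend at runtime (versions omitted, assumed to be managed in a parent dependencyManagement section):

```xml
<!-- Compile scope: the only SLF4J artifact the bytecode actually references -->
<dependency>
  <groupId>org.slf4j</groupId>
  <artifactId>slf4j-api</artifactId>
</dependency>

<!-- Runtime scope: discovered by SLF4J on the classpath at startup.
     No bytecode reference exists, so `mvn dependency:analyze` reports it
     under "Unused declared dependencies found" -- but removing it
     silently disables logging rather than failing the build. -->
<dependency>
  <groupId>org.slf4j</groupId>
  <artifactId>slf4j-log4j12</artifactId>
  <scope>runtime</scope>
</dependency>
```

This is exactly the class of false positive that makes the *declared unused* half of the cleanup require manual verification.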