[ https://issues.apache.org/jira/browse/SPARK-20504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16008898#comment-16008898 ]
Weichen Xu commented on SPARK-20504:
------------------------------------

I have taken the following steps to check this QA issue, and I also attach some output logs to this email. I skipped the `mllib` package, which is deprecated:

1) Used `jar -tf` to extract the classes in the `ml` package from both the master build and the 2.1.1 build, and used `grep` to filter out nested classes (class names containing "$").

2) Extracted the classes that exist in both master and 2.1.1, used `javap -protected -s` to get their signature information, and used `diff` to compare the two versions. I manually checked each difference against the corresponding Scala-doc and Java-doc for consistency and potential incompatibilities.

3) Extracted the classes added after version 2.1.1, used `javap -s` to get their signatures, and manually checked their corresponding Scala-docs and Java-docs. These classes are:
-------------------
org.apache.spark.ml.classification.LinearSVC
org.apache.spark.ml.classification.LinearSVC$
org.apache.spark.ml.classification.LinearSVCAggregator
org.apache.spark.ml.classification.LinearSVCCostFun
org.apache.spark.ml.classification.LinearSVCModel
org.apache.spark.ml.classification.LinearSVCModel$
org.apache.spark.ml.classification.LinearSVCParams
org.apache.spark.ml.clustering.ExpectationAggregator
org.apache.spark.ml.feature.Imputer
org.apache.spark.ml.feature.Imputer$
org.apache.spark.ml.feature.ImputerModel
org.apache.spark.ml.feature.ImputerModel$
org.apache.spark.ml.feature.ImputerParams
org.apache.spark.ml.fpm.AssociationRules
org.apache.spark.ml.fpm.AssociationRules$
org.apache.spark.ml.fpm.FPGrowth
org.apache.spark.ml.fpm.FPGrowth$
org.apache.spark.ml.fpm.FPGrowthModel
org.apache.spark.ml.fpm.FPGrowthModel$
org.apache.spark.ml.fpm.FPGrowthParams
org.apache.spark.ml.r.BisectingKMeansWrapper
org.apache.spark.ml.r.BisectingKMeansWrapper$
org.apache.spark.ml.recommendation.TopByKeyAggregator
org.apache.spark.ml.r.FPGrowthWrapper
org.apache.spark.ml.r.FPGrowthWrapper$
org.apache.spark.ml.r.LinearSVCWrapper
org.apache.spark.ml.r.LinearSVCWrapper$
org.apache.spark.ml.source.libsvm.LibSVMOptions
org.apache.spark.ml.source.libsvm.LibSVMOptions$
org.apache.spark.ml.stat.ChiSquareTest
org.apache.spark.ml.stat.ChiSquareTest$
org.apache.spark.ml.stat.Correlation
org.apache.spark.ml.stat.Correlation$
------------------

After the checks listed above, I found no problems related to Java compatibility. The only small issue is that classes marked `private` in Scala code seem to lose the `private` modifier when compiled to bytecode, so `javap` reports them as `public` classes, and the Java-docs also include them (these include the `***Aggregator` and `***CostFun` classes, among others). But I think that is a problem for the Scala compiler to resolve.

I attach the processing script I wrote and some intermediate output files for further checking (a minimal sketch of the workflow is included below), including:
1) the processing script
2) the class and method signature diff between 2.1.1 and master, for the `ml` classes existing in both versions
3) the class and method signatures of the `ml` classes added after version 2.1.1
4) the classes existing in both master and 2.1.1
5) the classes added after version 2.1.1
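For reference, here is a minimal sketch of the kind of processing script described above. It is illustrative only: the jar paths are assumptions, and the "$" filter follows step 1 as written (the attached lists evidently kept companion objects such as `LinearSVC$`, so the exact filter in the attached script differs).

```bash
#!/usr/bin/env bash
# Minimal sketch of the QA workflow described above. The jar paths are
# illustrative assumptions, not the ones used to produce the attached logs.
set -euo pipefail

OLD_JAR=spark-mllib_2.11-2.1.1.jar            # assumed 2.1.1 artifact
NEW_JAR=spark-mllib_2.11-2.2.0-SNAPSHOT.jar   # assumed master artifact

# Step 1: list classes under org/apache/spark/ml as fully qualified names,
# dropping nested classes (names containing "$").
list_classes() {
  jar -tf "$1" \
    | grep '^org/apache/spark/ml/.*\.class$' \
    | grep -v '\$' \
    | sed -e 's/\.class$//' -e 's#/#.#g' \
    | sort
}

list_classes "$OLD_JAR" > old_classes.txt
list_classes "$NEW_JAR" > new_classes.txt

# Classes present in both versions, and classes added after 2.1.1.
comm -12 old_classes.txt new_classes.txt > common_classes.txt
comm -13 old_classes.txt new_classes.txt > added_classes.txt

# Step 2: dump the signatures of the common classes from each jar and diff.
xargs javap -protected -s -classpath "$OLD_JAR" < common_classes.txt > old_sigs.txt
xargs javap -protected -s -classpath "$NEW_JAR" < common_classes.txt > new_sigs.txt
diff old_sigs.txt new_sigs.txt > sig_diff.txt || true  # diff exits 1 on differences

# Step 3: dump the signatures of the newly added classes for manual review.
xargs javap -s -classpath "$NEW_JAR" < added_classes.txt > added_sigs.txt
```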
> ML 2.2 QA: API: Java compatibility, docs
> ----------------------------------------
>
>                 Key: SPARK-20504
>                 URL: https://issues.apache.org/jira/browse/SPARK-20504
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Documentation, Java API, ML, MLlib
>            Reporter: Joseph K. Bradley
>            Assignee: Weichen Xu
>            Priority: Blocker
>
> Check Java compatibility for this release:
> * APIs in {{spark.ml}}
> * New APIs in {{spark.mllib}} (There should be few, if any.)
>
> Checking compatibility means:
> * Checking for differences in how Scala and Java handle types. Some items to look out for are:
> ** Check for generic "Object" types where Java cannot understand complex Scala types.
> *** *Note*: The Java docs do not always match the bytecode. If you find a problem, please verify it using {{javap}}.
> ** Check Scala objects (especially with nesting!) carefully. These may not be understood in Java, or they may be accessible only via the weirdly named Java types (with "$" or "#") which are generated by the Scala compiler.
> ** Check for uses of Scala and Java enumerations, which can show up oddly in the other language's doc. (In {{spark.ml}}, we have largely tried to avoid using enumerations, and have instead favored plain strings.)
> * Check for differences in generated Scala vs Java docs. E.g., one past issue was that Javadocs did not respect Scala's package private modifier. If you find issues, please comment here, or for larger items, create separate JIRAs and link here as "requires".
> * Remember that we should not break APIs from previous releases. If you find a problem, check if it was introduced in this Spark release (in which case we can fix it) or in a previous one (in which case we can create a java-friendly version of the API).
> * If needed for complex issues, create small Java unit tests which execute each method. (Algorithmic correctness can be checked in Scala.)
>
> Recommendations for how to complete this task:
> * There are not great tools. In the past, this task has been done by:
> ** Generating API docs
> ** Building the JAR and outputting the Java class signatures for MLlib
> ** Manually inspecting and searching the docs and class signatures for issues
> * If you do have ideas for better tooling, please say so, so we can make this task easier in the future!
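As a concrete illustration of the {{javap}} verification step mentioned above: the Scala-`private` classes called out in the comment (e.g. `LinearSVCAggregator`) can be inspected directly. The jar path here is an assumption; point `-classpath` at an actual 2.2 mllib build.

```bash
# LinearSVCAggregator is declared `private` in the Scala source, yet javap
# reports it as a public class, which is the visibility issue noted above.
# The jar path is illustrative only.
javap -classpath spark-mllib_2.11-2.2.0-SNAPSHOT.jar \
  org.apache.spark.ml.classification.LinearSVCAggregator
```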