[ https://issues.apache.org/jira/browse/SPARK-20504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16008898#comment-16008898 ]

Weichen Xu commented on SPARK-20504:
------------------------------------

I have taken the following steps to check this QA issue, and I attach some 
output logs to this email. I skipped the `mllib` package, which is deprecated:


1) Used `jar -tf` to list the classes in the `ml` package for both the master 
version and the 2.1.1 version, using `grep` to filter out some of the nested 
classes (whose names contain "$").
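
For illustration, a minimal sketch of this step as shell commands. The jar 
file names below are hypothetical; the actual artifact names depend on how the 
two builds are packaged:
-------------------
# List the `ml` classes in each build. The filter keeps companion objects
# (trailing "$") but drops names where "$" is followed by more characters,
# e.g. Foo$Bar or Foo$$anonfun$1 -- an assumption about the exact filter used.
jar -tf spark-mllib_2.11-2.1.1.jar \
  | grep '^org/apache/spark/ml/.*\.class$' \
  | grep -v '\$[A-Za-z0-9]' \
  | sed -e 's#/#.#g' -e 's#\.class$##' \
  | sort > classes-2.1.1.txt

jar -tf spark-mllib_2.11-master.jar \
  | grep '^org/apache/spark/ml/.*\.class$' \
  | grep -v '\$[A-Za-z0-9]' \
  | sed -e 's#/#.#g' -e 's#\.class$##' \
  | sort > classes-master.txt
-------------------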


2) For the classes that exist in both the master version and the 2.1.1 
version, used `javap -protected -s` to get their signature information, used 
`diff` to compare the two, and manually checked each difference against the 
corresponding Scala docs and Java docs for consistency and potential 
incompatibilities.
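
A sketch of this comparison, continuing from the class lists above (jar names 
again hypothetical; `comm -12` keeps the lines common to both sorted files):
-------------------
# Classes present in both builds.
comm -12 classes-2.1.1.txt classes-master.txt > classes-common.txt

# Dump the protected-and-above member signatures from each jar, then diff.
while read -r cls; do
  javap -protected -s -classpath spark-mllib_2.11-2.1.1.jar  "$cls" >> sigs-2.1.1.txt
  javap -protected -s -classpath spark-mllib_2.11-master.jar "$cls" >> sigs-master.txt
done < classes-common.txt

diff sigs-2.1.1.txt sigs-master.txt > signatures.diff
-------------------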


3) Extracted the classes added after version 2.1.1; these classes are:
-------------------
org.apache.spark.ml.classification.LinearSVC
org.apache.spark.ml.classification.LinearSVC$
org.apache.spark.ml.classification.LinearSVCAggregator
org.apache.spark.ml.classification.LinearSVCCostFun
org.apache.spark.ml.classification.LinearSVCModel
org.apache.spark.ml.classification.LinearSVCModel$
org.apache.spark.ml.classification.LinearSVCParams
org.apache.spark.ml.clustering.ExpectationAggregator
org.apache.spark.ml.feature.Imputer
org.apache.spark.ml.feature.Imputer$
org.apache.spark.ml.feature.ImputerModel
org.apache.spark.ml.feature.ImputerModel$
org.apache.spark.ml.feature.ImputerParams
org.apache.spark.ml.fpm.AssociationRules
org.apache.spark.ml.fpm.AssociationRules$
org.apache.spark.ml.fpm.FPGrowth
org.apache.spark.ml.fpm.FPGrowth$
org.apache.spark.ml.fpm.FPGrowthModel
org.apache.spark.ml.fpm.FPGrowthModel$
org.apache.spark.ml.fpm.FPGrowthParams
org.apache.spark.ml.r.BisectingKMeansWrapper
org.apache.spark.ml.r.BisectingKMeansWrapper$
org.apache.spark.ml.recommendation.TopByKeyAggregator
org.apache.spark.ml.r.FPGrowthWrapper
org.apache.spark.ml.r.FPGrowthWrapper$
org.apache.spark.ml.r.LinearSVCWrapper
org.apache.spark.ml.r.LinearSVCWrapper$
org.apache.spark.ml.source.libsvm.LibSVMOptions
org.apache.spark.ml.source.libsvm.LibSVMOptions$
org.apache.spark.ml.stat.ChiSquareTest
org.apache.spark.ml.stat.ChiSquareTest$
org.apache.spark.ml.stat.Correlation
org.apache.spark.ml.stat.Correlation$
-------------------
For these classes, I used `javap -s` to get their signatures and also manually 
checked their corresponding Scala docs and Java docs.
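
The added classes can be isolated the same way; a sketch (`comm -13` keeps the 
lines unique to the second file):
-------------------
# Classes that appear only in the master build, i.e. added after 2.1.1.
comm -13 classes-2.1.1.txt classes-master.txt > classes-added.txt

while read -r cls; do
  javap -s -classpath spark-mllib_2.11-master.jar "$cls"
done < classes-added.txt > sigs-added.txt
-------------------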


After checking the items listed above, I found no problems related to Java 
compatibility.
The only minor issue is that classes marked `private` in Scala code lose the 
`private` modifier when compiled into bytecode, so `javap` reports them as 
`public` classes, and the Java docs include them as well. These include the 
`***Aggregator` and `***CostFun` classes, among others, but I think this is a 
problem the Scala compiler needs to resolve.
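
This behaviour is easy to reproduce outside of Spark; a minimal sketch, 
assuming `scalac` and `javap` are on the PATH (the `demo.SecretAggregator` 
class is made up for the example):
-------------------
# A class that is package-private at the Scala level, like the ml
# ***Aggregator / ***CostFun classes.
cat > Demo.scala <<'EOF'
package demo
private[demo] class SecretAggregator
EOF
scalac Demo.scala

# javap sees it as public, because the Scala access modifier is not
# preserved in the JVM-level access flags:
javap -classpath . demo.SecretAggregator
# expected output: public class demo.SecretAggregator { ... }
-------------------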


I attach the processing script I wrote and some intermediate output files for 
further checking, including:
1) the processing script
2) the class and method signature diff between the 2.1.1 and master versions, 
for `ml` classes existing in both versions
3) the class and method signatures of the `ml` classes added after version 2.1.1
4) the classes existing in both the master and 2.1.1 versions
5) the classes added after version 2.1.1

> ML 2.2 QA: API: Java compatibility, docs
> ----------------------------------------
>
>                 Key: SPARK-20504
>                 URL: https://issues.apache.org/jira/browse/SPARK-20504
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Documentation, Java API, ML, MLlib
>            Reporter: Joseph K. Bradley
>            Assignee: Weichen Xu
>            Priority: Blocker
>
> Check Java compatibility for this release:
> * APIs in {{spark.ml}}
> * New APIs in {{spark.mllib}} (There should be few, if any.)
> Checking compatibility means:
> * Checking for differences in how Scala and Java handle types. Some items to 
> look out for are:
> ** Check for generic "Object" types where Java cannot understand complex 
> Scala types.
> *** *Note*: The Java docs do not always match the bytecode. If you find a 
> problem, please verify it using {{javap}}.
> ** Check Scala objects (especially with nesting!) carefully.  These may not 
> be understood in Java, or they may be accessible only via the weirdly named 
> Java types (with "$" or "#") which are generated by the Scala compiler.
> ** Check for uses of Scala and Java enumerations, which can show up oddly in 
> the other language's doc.  (In {{spark.ml}}, we have largely tried to avoid 
> using enumerations, and have instead favored plain strings.)
> * Check for differences in generated Scala vs Java docs.  E.g., one past 
> issue was that Javadocs did not respect Scala's package private modifier.
> If you find issues, please comment here, or for larger items, create separate 
> JIRAs and link here as "requires".
> * Remember that we should not break APIs from previous releases.  If you find 
> a problem, check if it was introduced in this Spark release (in which case we 
> can fix it) or in a previous one (in which case we can create a java-friendly 
> version of the API).
> * If needed for complex issues, create small Java unit tests which execute 
> each method.  (Algorithmic correctness can be checked in Scala.)
> Recommendations for how to complete this task:
> * There are no great tools.  In the past, this task has been done by:
> ** Generating API docs
> ** Building JAR and outputting the Java class signatures for MLlib
> ** Manually inspecting and searching the docs and class signatures for issues
> * If you do have ideas for better tooling, please say so, so that we can 
> make this task easier in the future!


