Joseph K. Bradley created SPARK-9704:
----------------------------------------

             Summary: Make some ML APIs public: VectorUDT, Identifiable, ProbabilisticClassifier
                 Key: SPARK-9704
                 URL: https://issues.apache.org/jira/browse/SPARK-9704
             Project: Spark
          Issue Type: Improvement
          Components: ML
            Reporter: Joseph K. Bradley
            Assignee: Joseph K. Bradley


This JIRA tracks making several ML APIs public so that users can more easily
write their own Pipeline stages.

Issue brought up by [~eronwright].  Descriptions below copied from 
[http://apache-spark-developers-list.1001551.n3.nabble.com/Make-ML-Developer-APIs-public-post-1-4-td13583.html].

We plan to make these APIs public in Spark 1.5.  However, they will be marked
DeveloperApi and are *very likely* to change in incompatible ways in future
releases.
* VectorUDT: To define a relation with a vector field, VectorUDT must be
instantiated.
* Identifiable trait: The trait generates a unique identifier for the
associated pipeline component.  Reusing the trait gives third-party components
a consistent identifier format.
* ProbabilisticClassifier: Third-party components should leverage its complex
logic around computing only selected columns.
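
To illustrate what a consistent identifier format buys a third-party stage, here is a minimal, Spark-free sketch in plain Python. It is not Spark's Identifiable implementation: the names random_uid, Identifiable and MyTokenizer are hypothetical, and the "prefix, underscore, 12 random hex characters" format is an assumption chosen for the example.

```python
import uuid


def random_uid(prefix: str) -> str:
    """Return prefix + "_" + 12 random hex characters (assumed format)."""
    return prefix + "_" + uuid.uuid4().hex[-12:]


class Identifiable:
    """Hypothetical mixin: assigns a uid once, at construction time."""

    def __init__(self) -> None:
        # Use the concrete class name as the prefix so that uids are
        # recognizable in logs and in saved pipeline descriptions.
        self.uid = random_uid(type(self).__name__)

    def __repr__(self) -> str:
        return self.uid


class MyTokenizer(Identifiable):
    """Hypothetical third-party pipeline stage that reuses the mixin."""
```

Every stage that reuses the mixin gets an identifier of the same shape, so tooling that parses or displays uids does not need per-component special cases.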

We will not yet make these public:
* SchemaUtils: Third-party pipeline components need to check column types and
append columns.
** This will probably be moved into Spark SQL.  Users can copy the methods into
their own code as needed.
* Shared Params (HasLabel, HasFeatures): This is covered in [SPARK-7146] but
is reiterated here.
** We need to discuss whether these should be standardized public APIs.  Users
can copy the traits into their own code as needed.
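
As a rough idea of the kind of helper a third-party component might carry in its own code in the meantime, here is a Spark-free sketch in plain Python of the two operations mentioned above: checking a column's type and appending a column. The schema is modeled as a list of (name, type name) pairs purely for illustration; the function names and error messages are hypothetical and are not Spark's SchemaUtils API.

```python
from typing import List, Tuple

# (column name, type name) pairs; a stand-in for a real schema object.
Schema = List[Tuple[str, str]]


def check_column_type(schema: Schema, col_name: str, expected_type: str) -> None:
    """Raise if the column is missing or has a different type."""
    types = dict(schema)
    if col_name not in types:
        raise ValueError(f"Column {col_name} does not exist.")
    actual = types[col_name]
    if actual != expected_type:
        raise TypeError(
            f"Column {col_name} must be of type {expected_type} but was {actual}."
        )


def append_column(schema: Schema, col_name: str, col_type: str) -> Schema:
    """Return a new schema with the column appended; reject duplicates."""
    if any(name == col_name for name, _ in schema):
        raise ValueError(f"Column {col_name} already exists.")
    # Build a new list so the caller's schema is left unchanged.
    return schema + [(col_name, col_type)]
```

A stage would typically call the check on its input column before transforming, then use the append helper to describe its output schema.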




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

