Hello,

In developing new third-party pipeline components for Spark ML 1.4 (see
dl4j-spark-ml), I encountered a few gaps in the earlier effort to make the ML
Developer APIs public (SPARK-5995). I plan to file issues after we discuss them
on this thread. Below is a list of types that are presently private but
might best be made public:

- VectorUDT. To define a relation with a vector field, VectorUDT must be
  instantiated.
- SchemaUtils. Third-party pipeline components need to check column types
  and append columns.
- Identifiable trait. The trait generates a unique identifier for the
  associated pipeline component. It would be nice to keep a consistent
  format by reusing the trait.
- ProbabilisticClassifier. Third-party components should be able to
  leverage the complex logic around computing only selected columns.
- Shared Params (HasLabel, HasFeatures). This is covered in SPARK-7146, but
  I am reiterating it here.
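
To illustrate the Identifiable point, here is a minimal sketch of the kind of
code a third-party component ends up duplicating when the trait is not
accessible. The names LocalIdentifiable and MyStage are hypothetical, and the
identifier format (prefix, underscore, 12 characters taken from a random UUID)
is an assumption for illustration rather than a statement of what Spark
generates.

```scala
import java.util.UUID

// Hypothetical stand-in for the private Identifiable trait (illustrative name).
trait LocalIdentifiable {
  // Unique identifier of the pipeline component.
  val uid: String

  override def toString: String = uid
}

object LocalIdentifiable {
  // Builds "<prefix>_<12 hex characters>" from the tail of a random UUID.
  // The format is an assumption made for this sketch.
  def randomUID(prefix: String): String =
    prefix + "_" + UUID.randomUUID().toString.takeRight(12)
}

// Hypothetical third-party stage that carries its own identifier.
class MyStage(override val uid: String) extends LocalIdentifiable {
  def this() = this(LocalIdentifiable.randomUID("myStage"))
}
```

Each third-party library that rolls its own version of this may pick a
slightly different format, which is the inconsistency that reusing the trait
would avoid.
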
Thanks,
Eron Wright
