[ 
https://issues.apache.org/jira/browse/SPARK-16074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng updated SPARK-16074:
----------------------------------
    Description: 
Both VectorUDT and MatrixUDT are private APIs, because UserDefinedType itself 
is private in Spark. However, in order to let developers implement their own 
transformers and estimators, we should expose both types in a public API to 
simplify the implementation of transformSchema, transform, etc. Otherwise, they 
need to get the data types using reflection.

Note that this does not mean exposing the VectorUDT/MatrixUDT classes themselves.
We can just have a method or a static value that returns a VectorUDT/MatrixUDT
instance with DataType as the return type. There are two ways to implement this:
1. Follow DataTypes.java in SQL, so Java users don't need the extra "()".
2. Define DataTypes in Scala.

  was:
Both VectorUDT and MatrixUDT are private APIs, because UserDefinedType itself 
is private in Spark. However, in order to let developers implement their own 
transformers and estimators, we should expose both types in a public API to 
simplify the implementation of transformSchema, transform, etc. Otherwise, they 
need to get the data types using reflection.

Note that this does not mean exposing the VectorUDT/MatrixUDT classes themselves.
We can just have a method or a static value that returns a VectorUDT/MatrixUDT
instance with DataType as the return type.


> Expose VectorUDT/MatrixUDT in a public API
> ------------------------------------------
>
>                 Key: SPARK-16074
>                 URL: https://issues.apache.org/jira/browse/SPARK-16074
>             Project: Spark
>          Issue Type: New Feature
>          Components: MLlib
>    Affects Versions: 2.0.0
>            Reporter: Xiangrui Meng
>            Priority: Critical
>
> Both VectorUDT and MatrixUDT are private APIs, because UserDefinedType itself 
> is private in Spark. However, in order to let developers implement their own 
> transformers and estimators, we should expose both types in a public API to 
> simplify the implementation of transformSchema, transform, etc. Otherwise, they 
> need to get the data types using reflection.
> Note that this does not mean exposing the VectorUDT/MatrixUDT classes
> themselves. We can just have a method or a static value that returns a
> VectorUDT/MatrixUDT instance with DataType as the return type. There are two
> ways to implement this:
> 1. Follow DataTypes.java in SQL, so Java users don't need the extra "()".
> 2. Define DataTypes in Scala.
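
A minimal sketch of the first option, assuming a Java holder class with static
values typed as the base DataType. Everything below is a hypothetical stand-in
written for illustration (DataType, VectorUDT, MatrixUDT, MLDataTypes and
SchemaCheck here are toy classes, not Spark's own), so it only shows the shape
of the proposal, not an actual implementation.

```java
// Hypothetical stand-ins for illustration only; these are NOT Spark's classes.
abstract class DataType {
    abstract String typeName();

    // Instances of the same concrete class compare equal, so callers can use equals().
    @Override public boolean equals(Object other) {
        return other != null && getClass() == other.getClass();
    }

    @Override public int hashCode() {
        return getClass().hashCode();
    }
}

// The concrete UDT classes stay non-public (package-private here).
final class VectorUDT extends DataType {
    @Override String typeName() { return "vector"; }
}

final class MatrixUDT extends DataType {
    @Override String typeName() { return "matrix"; }
}

// Entry point that would be public in a real library (the name is an assumption).
// Callers only ever see the DataType supertype, never the UDT classes.
final class MLDataTypes {
    private MLDataTypes() {}

    static final DataType VectorType = new VectorUDT();
    static final DataType MatrixType = new MatrixUDT();
}

// What a schema check inside a custom transformer could look like.
final class SchemaCheck {
    private SchemaCheck() {}

    static void requireVector(String column, DataType actual) {
        if (!MLDataTypes.VectorType.equals(actual)) {
            throw new IllegalArgumentException(
                "Column " + column + " must be of type "
                + MLDataTypes.VectorType.typeName()
                + " but was " + actual.typeName());
        }
    }
}
```

With static values, a Java caller writes MLDataTypes.VectorType with no
trailing "()". If the values were instead defined as members of a Scala
object, Java callers would reach them through accessor methods, which is the
extra "()" mentioned in option 1.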



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

