[ 
https://issues.apache.org/jira/browse/SPARK-19653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15923307#comment-15923307
 ] 

Joseph K. Bradley commented on SPARK-19653:
-------------------------------------------

I agree it'd be nice to make it easier to work with linalg types in DataFrames.
There are two paths:
1. Make linalg types (at least Vector, ideally Matrix) into first-class 
citizens of Spark SQL.
2. Improve support for UDTs so that linalg types can stay in MLlib yet still be 
easy to work with in DataFrames.

For the purpose of ML, I'm OK with either.  For the purpose of making Spark 
more useful and powerful in general, one could argue that option 2 is the 
better choice, although it might be harder to design and implement.

> `Vector` Type Should Be A First-Class Citizen In Spark SQL
> ----------------------------------------------------------
>
>                 Key: SPARK-19653
>                 URL: https://issues.apache.org/jira/browse/SPARK-19653
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML, MLlib, SQL
>    Affects Versions: 2.1.0, 2.2.0
>            Reporter: Mike Dusenberry
>
> *Issue*: The {{Vector}} type in Spark MLlib (DataFrame-based API, informally 
> "Spark ML") should be added as a first-class citizen to Spark SQL.
> *Current Status*:  Currently, Spark MLlib adds a [{{Vector}} SQL datatype | 
> https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.ml.linalg.SQLDataTypes$]
>  to allow DataFrames/Datasets to use {{Vector}} columns, which is necessary 
> for MLlib algorithms.  Although this allows a DataFrame/Dataset to contain 
> vectors, it does not allow one to make complete use of the rich set of 
> features made available by Spark SQL.  For example, it is not possible to use 
> any of the SQL functions, such as {{avg}}, {{sum}}, etc. on a {{Vector}} 
> column, nor is it possible to save a DataFrame with a {{Vector}} column as a 
> CSV file.  In any of these cases, an error message is returned with a note 
> that the operator is not supported on a {{Vector}} type.
> *Benefit*: Allow users to make use of all Spark SQL features that can be 
> reasonably applied to a vector.
> *Goal*:  Move the {{Vector}} type from Spark MLlib into Spark SQL as a 
> first-class citizen.


