Mike Dusenberry created SPARK-19653:
---------------------------------------

Summary: `Vector` Type Should Be A First-Class Citizen In Spark SQL
Key: SPARK-19653
URL: https://issues.apache.org/jira/browse/SPARK-19653
Project: Spark
Issue Type: Improvement
Components: ML, MLlib, SQL
Affects Versions: 2.1.0, 2.2.0
Reporter: Mike Dusenberry

*Issue*: The {{Vector}} type in Spark MLlib (DataFrame-based API, informally "Spark ML") should be added as a first-class citizen to Spark SQL.

*Current Status*: Spark MLlib adds a [{{Vector}} SQL datatype | https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.ml.linalg.SQLDataTypes$] to allow DataFrames/Datasets to use {{Vector}} columns, which is necessary for MLlib algorithms. Although this allows a DataFrame/Dataset to contain vectors, it does not allow one to make full use of the rich set of features that Spark SQL provides. For example, it is not possible to use any of the SQL functions, such as {{avg}}, {{sum}}, etc., on a {{Vector}} column, nor is it possible to save a DataFrame with a {{Vector}} column as a CSV file. In each of these cases, an error message is returned with a note that the operator is not supported on a {{Vector}} type.

*Benefit*: Allow users to make use of all Spark SQL features that can reasonably be applied to a vector.

*Goal*: Move the {{Vector}} type from Spark MLlib into Spark SQL as a first-class citizen.
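To make the limitation concrete, here is a minimal PySpark sketch of one way users can work around it today: copy the {{Vector}} column into a plain array<double> column with a UDF, then apply {{avg}} to each element. This is only an illustration, not part of the proposal. The helper names ({{vector_to_list}}, {{mean_of_vector_column}}, {{demo}}), the column name {{features}}, and the sample data are all hypothetical, and the sketch assumes every vector in the column has the same known size.

```python
from typing import List


def vector_to_list(v) -> List[float]:
    """Convert an MLlib vector (any object exposing toArray()) to a list of floats."""
    return [float(x) for x in v.toArray()]


def mean_of_vector_column(df, column: str, size: int) -> List[float]:
    """Element-wise mean of a Vector column, computed through an array<double> copy.

    Assumption: every vector in `column` has exactly `size` elements.
    """
    # pyspark is imported lazily so that vector_to_list stays usable without Spark.
    from pyspark.sql import functions as F
    from pyspark.sql.types import ArrayType, DoubleType

    to_array = F.udf(vector_to_list, ArrayType(DoubleType()))
    arrays = df.select(to_array(F.col(column)).alias("arr"))
    # avg() is applied to each element separately, since it expects numeric input.
    averages = [F.avg(F.col("arr")[i]).alias("x{}".format(i)) for i in range(size)]
    row = arrays.select(*averages).first()
    return [row[i] for i in range(size)]


def demo() -> None:
    """Hypothetical usage; requires a local PySpark installation."""
    from pyspark.ml.linalg import Vectors
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[1]").appName("vector-avg-sketch").getOrCreate()
    df = spark.createDataFrame(
        [(Vectors.dense([1.0, 2.0]),), (Vectors.dense([3.0, 6.0]),)], ["features"]
    )
    # Applying avg() directly to "features" is expected to fail, as the issue describes.
    print(mean_of_vector_column(df, "features", 2))
    spark.stop()
```

Calling {{demo()}} on a machine with PySpark installed runs the sketch end to end. The extra UDF round trip and the need to know the vector size up front are the kind of friction that a first-class {{Vector}} type in Spark SQL would presumably remove.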