[ https://issues.apache.org/jira/browse/SPARK-19653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15872938#comment-15872938 ]
Liang-Chi Hsieh edited comment on SPARK-19653 at 2/18/17 3:12 AM:
------------------------------------------------------------------

Actually, some Spark SQL functions, such as the mentioned {{avg}} and {{sum}}, only support {{NumericType}}. The fact that they don't support {{Vector}} is not entirely because the {{Vector}} type isn't a first-class citizen in Spark SQL.

Personally, I would -1 this.

was (Author: viirya):
Actually, some Spark SQL functions, such as the mentioned {{avg}} and {{sum}}, only support {{NumericType}}. The fact that they don't support {{Vector}} is not entirely because the {{Vector}} type isn't a first-class citizen in Spark SQL.

> `Vector` Type Should Be A First-Class Citizen In Spark SQL
> ----------------------------------------------------------
>
>                 Key: SPARK-19653
>                 URL: https://issues.apache.org/jira/browse/SPARK-19653
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML, MLlib, SQL
>    Affects Versions: 2.1.0, 2.2.0
>            Reporter: Mike Dusenberry
>
> *Issue*: The {{Vector}} type in Spark MLlib (DataFrame-based API, informally
> "Spark ML") should be added as a first-class citizen to Spark SQL.
> *Current Status*: Currently, Spark MLlib adds a [{{Vector}} SQL datatype |
> https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.ml.linalg.SQLDataTypes$]
> to allow DataFrames/DataSets to use {{Vector}} columns, which is necessary
> for MLlib algorithms. Although this allows a DataFrame/DataSet to contain
> vectors, it does not allow one to make complete use of the rich set of
> features made available by Spark SQL. For example, it is not possible to use
> any of the SQL functions, such as {{avg}}, {{sum}}, etc., on a {{Vector}}
> column, nor is it possible to save a DataFrame with a {{Vector}} column as a
> CSV file. In any of these cases, an error message is returned with a note
> that the operator is not supported on a {{Vector}} type.
> *Benefit*: Allow users to make use of all Spark SQL features that can be
> reasonably applied to a vector.
> *Goal*: Move the {{Vector}} type from Spark MLlib into Spark SQL as a
> first-class citizen.