[ https://issues.apache.org/jira/browse/SPARK-12806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16964475#comment-16964475 ]
John Bauer commented on SPARK-12806:
------------------------------------

This is still a problem. For example, classification models emit the probability as a VectorUDT, which is unusable in PySpark. This makes it problematic to build boosting/bagging algorithms, or even just to use the probabilities as additional features in a second model.

> Support SQL expressions extracting values from VectorUDT
> --------------------------------------------------------
>
>                 Key: SPARK-12806
>                 URL: https://issues.apache.org/jira/browse/SPARK-12806
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib, SQL
>    Affects Versions: 1.6.0
>            Reporter: Feynman Liang
>            Priority: Major
>              Labels: bulk-closed
>
> Use cases exist where a specific index within a {{VectorUDT}} column of a
> {{DataFrame}} is required. For example, we may be interested in extracting a
> specific class probability from the {{probabilityCol}} of a
> {{LogisticRegression}} to compute losses. However, if {{probability}} is a
> column of {{df}} with type {{VectorUDT}}, the following code fails:
> {code}
> df.select("probability.0")
> AnalysisException: u"Can't extract value from probability"
> {code}
> thrown from
> {{sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeExtractors.scala}}.
> {{VectorUDT}} essentially wraps a {{StructType}}, hence one would expect it
> to support value extraction expressions in an analogous way.
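The following context explains why `probability.0` cannot resolve. In PySpark, `VectorUDT` serializes a vector as a struct with the fields `type` (0 = sparse, 1 = dense), `size`, `indices` and `values`. The struct therefore has no field named `0`, and where element 0 lives depends on the encoding.

The sketch below is plain Python, not Spark code. It models one serialized row as a tuple in that layout and shows the lookup an index-extraction expression would have to perform. The helper `extract_index`, the `DENSE`/`SPARSE` constants and the sample vectors are hypothetical, chosen for illustration only.

```python
from bisect import bisect_left
from typing import Optional, Sequence, Tuple

# Assumed row layout, mirroring the struct behind pyspark.ml.linalg.VectorUDT:
# (type, size, indices, values); type 1 = dense, type 0 = sparse.
# For dense vectors, size and indices are None and values holds every element.
VectorStruct = Tuple[int, Optional[int], Optional[Sequence[int]], Sequence[float]]

SPARSE, DENSE = 0, 1


def extract_index(vec: VectorStruct, i: int) -> float:
    """Hypothetical helper: return element i of a serialized vector."""
    vtype, size, indices, values = vec
    if vtype == DENSE:
        if not 0 <= i < len(values):
            raise IndexError(f"index {i} out of range for size {len(values)}")
        # Dense: position i maps directly onto values[i].
        return float(values[i])
    if vtype == SPARSE:
        if size is None or indices is None or not 0 <= i < size:
            raise IndexError(f"index {i} out of range for size {size}")
        # Sparse: only non-zero entries are stored, with sorted indices,
        # so binary-search for i; a miss means the element is an implicit 0.0.
        pos = bisect_left(indices, i)
        if pos < len(indices) and indices[pos] == i:
            return float(values[pos])
        return 0.0
    raise ValueError(f"unknown vector type {vtype}")


# Sample probability vectors (made-up values) in both encodings.
dense_prob = (DENSE, None, None, [0.2, 0.8])
sparse_prob = (SPARSE, 3, [2], [1.0])
print(extract_index(dense_prob, 1))   # 0.8
print(extract_index(sparse_prob, 0))  # 0.0
print(extract_index(sparse_prob, 2))  # 1.0
```

The sparse branch shows why extraction is not a plain struct field access: an index absent from `indices` is an implicit zero.

As a practical workaround in Spark 3.0 and later, `pyspark.ml.functions.vector_to_array` converts a vector column to `array<double>`. After that conversion, ordinary indexing such as `vector_to_array("probability")[1]` works. On older versions, a Python UDF returning `float(v[1])` is a common workaround.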