[ https://issues.apache.org/jira/browse/SPARK-13610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15810767#comment-15810767 ]
Joseph K. Bradley commented on SPARK-13610:
-------------------------------------------

This sounds like a reasonable use case, but I'm worried about users trying to expand huge Vectors and creating DataFrames with millions of columns, which they are not currently designed to handle. I also wonder if we could support this via native DataFrame APIs, though that might require adding support for indexing Vectors within DataFrame expressions. I'd like to ask a few questions of those who want this feature, to understand the needs better:
* Is the Vector length fixed in an application (6 mentioned above), or would it need to vary dynamically? If the latter, why and how?
* Is the Vector length always small, or sometimes large?
* Would all elements in the Vector need to be expanded, or would you sometimes need to select a subset?

> Create a Transformer to disassemble vectors in DataFrames
> ---------------------------------------------------------
>
>                 Key: SPARK-13610
>                 URL: https://issues.apache.org/jira/browse/SPARK-13610
>             Project: Spark
>          Issue Type: New Feature
>          Components: ML, SQL
>    Affects Versions: 1.6.0
>            Reporter: Andrew MacKinlay
>            Priority: Minor
>
> It is possible to convert a standalone numeric field into a single-item
> Vector using VectorAssembler. However, the inverse operation of retrieving a
> single item from a vector and translating it into a field doesn't appear to
> be possible. The only workaround I've found is to leave the raw field value
> in the DF; I have found no other way to get a field out of a vector (e.g. to
> perform arithmetic on it). Happy to be proved wrong, though. Creating a
> user-defined function doesn't work (in Python at least; it gets a pickle
> exception). This seems like a simple operation which should be supported
> for various use cases.
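For reference, a minimal sketch of the UDF workaround discussed above. The extraction logic is plain Python; the Spark wiring is shown in comments because it needs a live SparkSession, and the column names (`features`, `first_feature`) are illustrative, not from the issue. Note that on Spark 3.0+, `pyspark.ml.functions.vector_to_array` covers this natively.

```python
# Plain-Python element extraction; this is what the UDF would wrap.
def ith_element(vector, index):
    """Return element `index` of an ML Vector (or any indexable sequence)
    as a float, or None (null in the DataFrame) if the index is out of range."""
    try:
        return float(vector[index])
    except IndexError:
        return None

# Hedged Spark usage (assumes a SparkSession `spark` and a DataFrame `df`
# with a Vector column named "features"):
#
# from pyspark.sql.functions import udf
# from pyspark.sql.types import DoubleType
# ith = udf(lambda v: ith_element(v, 0), DoubleType())
# df = df.withColumn("first_feature", ith(df["features"]))
#
# On Spark 3.0+ the UDF is unnecessary:
# from pyspark.ml.functions import vector_to_array
# df = df.withColumn("first_feature", vector_to_array("features")[0])
```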
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org