[ 
https://issues.apache.org/jira/browse/SPARK-13610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15810767#comment-15810767
 ] 

Joseph K. Bradley commented on SPARK-13610:
-------------------------------------------

This sounds like a reasonable use case, but I'm worried about users trying to 
expand huge Vectors and creating DataFrames with millions of columns, which 
they are not currently designed to handle.  I also wonder if we could support 
this via native DataFrame APIs, though that might require adding support for 
indexing Vectors within DataFrame expressions.

I'd like to ask a few questions of those who want this feature to understand 
the needs better:
* Is the Vector length fixed in an application (6 mentioned above), or would it 
need to vary dynamically?  If the latter, why and how?
* Is the Vector length always small or sometimes large?
* Would all elements in the Vector need to be expanded, or would you sometimes 
need to select a subset?
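
For discussion, here is a minimal PySpark sketch of the "select a subset" case, 
written as a plain UDF rather than a new Transformer. The helper names 
(element_extractor, disassemble) and the output column naming are hypothetical, 
not an existing Spark API. It also assumes the pickling failure mentioned in the 
issue comes from the UDF returning a numpy.float64, which is a common cause but 
is not confirmed here; the explicit float() conversion addresses only that case.

```python
def element_extractor(index):
    """Build a plain function that pulls one element out of a vector-like value.

    Hypothetical helper: works on anything supporting integer indexing,
    including pyspark.ml.linalg DenseVector and SparseVector.
    """
    def extract(vector):
        if vector is None:
            return None
        # Indexing an ML Vector yields a numpy.float64; convert it to a
        # built-in float so the UDF result can be serialized as DoubleType.
        return float(vector[index])
    return extract


def disassemble(df, vector_col, indices, prefix=None):
    """Add one double column per requested index of `vector_col`.

    Hypothetical helper. `indices` is an explicit list, so the caller decides
    how many columns are created (avoiding an accidental expansion of a huge
    Vector into millions of columns).
    """
    # Imported lazily so element_extractor can be used without Spark installed.
    from pyspark.sql.functions import col, udf
    from pyspark.sql.types import DoubleType

    prefix = prefix or vector_col
    for i in indices:
        extract_udf = udf(element_extractor(i), DoubleType())
        df = df.withColumn("{}_{}".format(prefix, i), extract_udf(col(vector_col)))
    return df
```

Under these assumptions, disassemble(df, "features", [0, 2]) would add the 
columns features_0 and features_2. Requiring an explicit index list is a 
deliberate choice in this sketch: it keeps the number of generated columns 
under the caller's control.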

> Create a Transformer to disassemble vectors in DataFrames
> ---------------------------------------------------------
>
>                 Key: SPARK-13610
>                 URL: https://issues.apache.org/jira/browse/SPARK-13610
>             Project: Spark
>          Issue Type: New Feature
>          Components: ML, SQL
>    Affects Versions: 1.6.0
>            Reporter: Andrew MacKinlay
>            Priority: Minor
>
> It is possible to convert a standalone numeric field into a single-item 
> Vector, using VectorAssembler. However, the inverse operation of retrieving a 
> single item from a vector and translating it into a field doesn't appear to 
> be possible. The workaround I've found is to leave the raw field value in the 
> DF, but I have found no other way to get a field out of a vector (e.g. to 
> perform arithmetic on it). Happy to be proved wrong though. Creating a 
> user-defined function doesn't work (in Python at least; it gets a 
> PickleException). This seems like a simple operation which should be supported 
> for various use cases. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

