[ https://issues.apache.org/jira/browse/SPARK-10863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15004363#comment-15004363 ]

Oscar D. Lara Yejas edited comment on SPARK-10863 at 11/13/15 5:54 PM:
-----------------------------------------------------------------------

[~felixcheung] Let me try to clarify a bit.

As suggested by [~shivaram], I implemented a fallback mechanism: if there is no 
corresponding mapping from a Spark type to an R type (i.e., the mapping is NA), 
the original Spark type is returned as-is.
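
To make the fallback concrete, here is a minimal, self-contained sketch of the 
idea (illustrative only, not the actual SparkR code; sparkToRTypes and 
coltypesSketch are names made up for this example):

sparkToRTypes <- c(integer = "integer", double = "numeric",
                   string = "character", boolean = "logical")

coltypesSketch <- function(sparkTypes) {
  # Look up the R type for each Spark type; unmapped types become NA
  rTypes <- sparkToRTypes[sparkTypes]
  # Fallback: where the mapping is NA, keep the original Spark type string
  rTypes[is.na(rTypes)] <- sparkTypes[is.na(rTypes)]
  unname(rTypes)
}

coltypesSketch(c("integer", "string", "map<string,int>"))
# "integer" "character" "map<string,int>"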

The reason is that, in my opinion, having coltypes(df) return NAs would be 
confusing from the user's perspective. What would an NA type mean? "Type not 
set" or "data inconsistency" come to mind if I put myself in the user's shoes.

I believe it all depends on the kinds of operations we want to support on 
Columns. For example, if the user wants to do:

df$column1 + 3
!df$column2
grep("regex", df$column3)
df$column4 / df$column5

then column1, column4, and column5 must be numeric or integer, column2 must be 
logical, and column3 must be character.
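
For illustration, this is roughly how a user could consult the proposed 
coltypes() before applying those operations (df and the returned vector below 
are hypothetical; only the coltypes() call itself is part of this proposal):

types <- coltypes(df)
# e.g. c("numeric", "logical", "character", "numeric", "numeric")
types[2] == "logical"   # column2 can safely be negated with !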

Now, what kinds of operations are we planning to support on Array, Struct, and 
Map types? Depending on that, we could map them to lists/environments or leave 
them as they are right now.
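
Purely as an illustration of the first option (these representations are 
assumptions, nothing is implemented this way yet), complex types could surface 
on the R side roughly like this:

structAsList <- list(name = "foo", age = 21L)                    # struct<name:string,age:int> as a named list
arrayAsList  <- list(1L, 2L, 3L)                                 # array<int> as a plain list
mapAsEnv     <- list2env(list(a = 1, b = 2), envir = new.env())  # map<string,double> as an environment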

Hope this helps clarify, and let me know your thoughts.

Thanks!




> Method coltypes() to return the R column types of a DataFrame
> -------------------------------------------------------------
>
>                 Key: SPARK-10863
>                 URL: https://issues.apache.org/jira/browse/SPARK-10863
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SparkR
>    Affects Versions: 1.5.0
>            Reporter: Oscar D. Lara Yejas
>            Assignee: Oscar D. Lara Yejas
>             Fix For: 1.6.0
>
>



