[ https://issues.apache.org/jira/browse/SPARK-10863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15004363#comment-15004363 ]
Oscar D. Lara Yejas edited comment on SPARK-10863 at 11/13/15 5:54 PM:
-----------------------------------------------------------------------

[~felixcheung] Let me try to clarify a bit. As suggested by [~shivaram], I implemented a fallback mechanism so that when there is no corresponding mapping from a Spark type to an R type (i.e., the mapping is NA), the original Spark type is returned instead. The reason is that, in my opinion, having coltypes(df) return NAs would be confusing from the user's perspective. What would an NA type mean? "Type not set" or "data inconsistency" come to mind if I were in the user's shoes.

I believe it all depends on the kind of operations we want to support on Columns. For example, if the user wants to do:

    df$column1 + 3
    !df$column2
    grep(df$column3, "regex")
    df$column4 / df$column5

then column1, column4, and column5 must be numeric/integer, column2 must be logical, and column3 must be character.

Now, what kinds of operations are we planning to support on Array, Struct, and Map types? Depending on that, we could map them to lists/environments or leave them as they are right now. Hope this helps clarify, and let me know your thoughts. Thanks!
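For context, the fallback described above can be sketched in plain base R (this is only an illustration, not the actual SparkR implementation; the mapping table and the function name sparkToR are hypothetical):

```r
# Sketch of the fallback: map Spark SQL type names to R type names,
# returning the original Spark type name wherever no mapping exists.
sparkToR <- function(sparkTypes) {
  # Illustrative subset of the Spark-to-R primitive type mapping
  mapping <- c(integer = "integer", double = "numeric",
               string = "character", boolean = "logical")
  rTypes <- mapping[sparkTypes]   # yields NA where no mapping exists
  # Fall back to the Spark type itself instead of surfacing NA to the user
  unname(ifelse(is.na(rTypes), sparkTypes, rTypes))
}

sparkToR(c("double", "string", "map<string,int>"))
# "map<string,int>" has no R mapping, so it is returned unchanged
```

This shows the rationale in the comment: the user sees a concrete type name for complex columns (Array, Struct, Map) rather than an ambiguous NA.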
> Method coltypes() to return the R column types of a DataFrame
> -------------------------------------------------------------
>
>                 Key: SPARK-10863
>                 URL: https://issues.apache.org/jira/browse/SPARK-10863
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SparkR
>    Affects Versions: 1.5.0
>            Reporter: Oscar D. Lara Yejas
>            Assignee: Oscar D. Lara Yejas
>             Fix For: 1.6.0

--
This message was sent by Atlassian JIRA (v6.3.4#6332)