[GitHub] spark pull request: [SPARK-10863][SPARKR] Method coltypes() to get...

olarayej Tue, 06 Oct 2015 13:58:00 -0700

Github user olarayej commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8984#discussion_r41321101
  
    --- Diff: R/pkg/R/DataFrame.R ---
    @@ -1881,3 +1881,31 @@ setMethod("as.data.frame",
                 collect(x)
               }
     )
    +
    +#' Returns the column types of a DataFrame.
    +#' 
    +#' @name coltypes
    +#' @title Get column types of a DataFrame
    +#' @param x (DataFrame)
    +#' @return value (character) A character vector with the column types of the given DataFrame
    +#' @rdname coltypes
    +setMethod("coltypes",
    +          signature(x = "DataFrame"),
    +          function(x) {
    +            # TODO: This may be moved as a global parameter
    +            # These are the supported data types and how they map to
    +            # R's data types
    +            DATA_TYPES <- c("string"="character",
    +                            "double"="numeric",
    +                            "int"="integer",
    +                            "long"="integer",
    +                            "boolean"="logical"
    +            )
    +
    +            # Get the data types of the DataFrame by invoking dtypes() function.
    +            # Some post-processing is needed.
    +            types <- as.character(t(as.data.frame(dtypes(x))[2, ]))
    +
    +            # Map Spark data types into R's data types
    +            as.character(DATA_TYPES[types])
    --- End diff --
    
    @felixcheung Yeah, that's a good point. I think coltypes() should 
always have an equivalent R data type for each column. We don't want 
coltypes() to return NAs or throw an unsupported-type error, because that 
would mean the input DataFrame is inconsistent.
    
    Therefore, it'd just be a matter of adding to DATA_TYPES every possible 
value returned by dtypes() (in case I'm missing any). I couldn't find that 
list in the docs. Could you point me to it?
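
    To illustrate why the mapping needs to be complete, here is a minimal 
base-R sketch. It assumes a hypothetical, deliberately incomplete mapping, and 
a made-up vector `spark_types` standing in for the type names dtypes() would 
report. Neither is taken from the actual SparkR code.

```r
# Hypothetical, deliberately incomplete mapping from Spark type names
# to R type names (illustration only).
DATA_TYPES <- c("string" = "character",
                "double" = "numeric",
                "int" = "integer",
                "boolean" = "logical")

# Stand-in for the type names dtypes() would report (assumed values).
spark_types <- c("string", "int", "timestamp")

# Indexing a named vector by a name it does not contain yields NA,
# so any Spark type missing from the mapping surfaces as NA here.
mapped <- unname(DATA_TYPES[spark_types])

# The Spark type names that have no entry in the mapping.
unmapped <- spark_types[is.na(mapped)]
```

    In this sketch `unmapped` collects the names that would have to be added 
to the mapping before the lookup could never return NA.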
    
    Finally, I think the check for unsupported data types should instead be 
done in the coltypes()<- method and at DataFrame initialization. 
coltypes() assumes the input DataFrame was assigned valid data types, which 
makes sense to me.
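
    As a rough sketch of what such a check on the assignment side could look 
like, assuming a hypothetical helper `validate_coltypes()` and a hypothetical 
list `SUPPORTED_R_TYPES` (neither exists in SparkR; the names are made up for 
illustration):

```r
# Hypothetical list of R type names that have a Spark counterpart.
SUPPORTED_R_TYPES <- c("character", "numeric", "integer", "logical")

# Hypothetical helper: fail early if any requested type is unsupported,
# otherwise return the input invisibly so it can be used in a pipeline.
validate_coltypes <- function(types) {
  bad <- setdiff(types, SUPPORTED_R_TYPES)
  if (length(bad) > 0) {
    stop("Unsupported data type(s): ", paste(bad, collapse = ", "))
  }
  invisible(types)
}
```

    With a check like this run at assignment time, the getter could rely on 
every column already having a valid type.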



