[jira] [Commented] (SPARK-17774) Add support for head on DataFrame Column
[ https://issues.apache.org/jira/browse/SPARK-17774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15550390#comment-15550390 ] Oscar D. Lara Yejas commented on SPARK-17774:

To implement method head() only, I'll be happy to:
1) Remove lines 63-69 (method collect) in PR 11336
2) Throw an error if a column can't be collected, as opposed to returning an empty column (though I'm okay with either option)

Once again, all my code IS STILL NEEDED for head(): (1) the Column class must hold a reference to its parent DataFrame, and (2) that parent DataFrame must be propagated through every possible Column operation. Bottom line: we should mark this JIRA as a duplicate and merge PR 11336 with the minor changes above. Let me know if I have your blessing so I can proceed with this. It should be very quick for me. Thanks!

cc: [~falaki] [~shivaram]

> Add support for head on DataFrame Column
> ----------------------------------------
>
>                 Key: SPARK-17774
>                 URL: https://issues.apache.org/jira/browse/SPARK-17774
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SparkR
>    Affects Versions: 2.0.0
>            Reporter: Hossein Falaki
>
> There was a lot of discussion on SPARK-9325. To summarize the conversation on that ticket regarding {{collect}}:
> * Pro: Ease of use and maximum compatibility with existing R API
> * Con: We do not want to increase maintenance cost by opening arbitrary API. With Spark's DataFrame API, {{collect}} does not work on {{Column}}, and there is no need for it to work in R.
> This ticket is strictly about {{head}}. I propose supporting {{head}} on {{Column}} because:
> 1. R users are already used to calling {{head(iris$Sepal.Length)}}. When they do that on a SparkDataFrame they get an error. Not a good experience.
> 2. Adding support for it does not require any change to the backend. It can be trivially done in R code.
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
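The design argued for above, head() implemented purely in R by giving each Column a reference to its parent DataFrame, can be sketched roughly as follows. This is a minimal illustration, not the code from PR 11336; the @parent slot name and this exact dispatch are assumptions.

```r
# Minimal sketch, NOT the actual PR 11336 code: head() on a Column is
# implemented purely in R by delegating to the Column's parent DataFrame.
# The @parent slot is an assumed name for the back-reference discussed above.
setMethod("head", signature(x = "Column"),
          function(x, n = 6L) {
            if (is.null(x@parent)) {
              stop("Cannot call head(): this Column has no parent DataFrame")
            }
            # Narrow to the one column, limit rows on the JVM side, then
            # bring the small result back to R and unwrap it into a plain
            # vector, mimicking head(iris$Sepal.Length)
            localDF <- collect(head(select(x@parent, x), n))
            localDF[[1]]
          })
```

Because the limit is applied before collect(), only n rows ever cross from the JVM to R, which is what makes a pure-R implementation cheap.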
[jira] [Commented] (SPARK-17774) Add support for head on DataFrame Column
[ https://issues.apache.org/jira/browse/SPARK-17774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15547429#comment-15547429 ] Oscar D. Lara Yejas commented on SPARK-17774:

[~shivaram]: I concur with Shivaram. Besides, I already implemented method head() in my PR 11336: https://github.com/apache/spark/pull/11336. If you wanted to implement method head() alone, you'd still need all the changes I made for PR 11336 except for the 5 lines of code of method collect(). If that's the case, I'd rather suggest merging PR 11336.

[~falaki]: In the corner cases where there's no parent DataFrame, we can return an empty value as opposed to throwing an error. This behavior is already implemented in PR 11336. Also, though R doesn't have method collect(), I think it's still useful to turn a Column into an R vector. Perhaps a function called as.vector()? Thanks, folks!
[jira] [Comment Edited] (SPARK-17774) Add support for head on DataFrame Column
[ https://issues.apache.org/jira/browse/SPARK-17774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15547429#comment-15547429 ] Oscar D. Lara Yejas edited comment on SPARK-17774 at 10/5/16 2:48 AM:

I concur with [~shivaram]. Besides, I already implemented method head() in my PR 11336: https://github.com/apache/spark/pull/11336. If you wanted to implement method head() alone, you'd still need all the changes I made for PR 11336 except for the 5 lines of code of method collect(). If that's the case, I'd rather suggest merging PR 11336.

[~falaki]: In the corner cases where there's no parent DataFrame, we can return an empty value as opposed to throwing an error. This behavior is already implemented in PR 11336. Also, though R doesn't have method collect(), I think it's still useful to turn a Column into an R vector. Perhaps a function called as.vector()? Thanks, folks!
[jira] [Commented] (SPARK-16581) Making JVM backend calling functions public
[ https://issues.apache.org/jira/browse/SPARK-16581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384782#comment-15384782 ] Oscar D. Lara Yejas commented on SPARK-16581:

[~aloknsingh] [~adrian555] Could any of you share your thoughts on this?

> Making JVM backend calling functions public
> -------------------------------------------
>
>                 Key: SPARK-16581
>                 URL: https://issues.apache.org/jira/browse/SPARK-16581
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SparkR
>            Reporter: Shivaram Venkataraman
>
> As described in the design doc in SPARK-15799, to help packages that need to call into the JVM, it will be good to expose some of the R -> JVM functions we have.
> As a part of this we could also rename and reformat the functions to make them more user friendly.
[jira] [Updated] (SPARK-16611) Expose several hidden DataFrame/RDD functions
[ https://issues.apache.org/jira/browse/SPARK-16611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Oscar D. Lara Yejas updated SPARK-16611:

Description:
Expose the following functions:
- lapply or map
- lapplyPartition or mapPartition
- flatMap
- RDD
- toRDD
- getJRDD
- cleanup.jobj

cc: [~javierluraschi] [~j...@rstudio.com] [~shivaram]

was:
Expose the following functions:
- lapply or map
- lapplyPartition or mapPartition
- flatMap
- RDD
- toRDD
- getJRDD
- cleanup.jobj
[jira] [Updated] (SPARK-16608) Expose JVM SparkR API functions
[ https://issues.apache.org/jira/browse/SPARK-16608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Oscar D. Lara Yejas updated SPARK-16608:

Description:
Expose the following functions:
- invokeJava
- callJStatic
- callJMethod
- cleanup.jobj
- broadcast and useBroadcast

cc: [~javierluraschi] [~j...@rstudio.com] [~shivaram]

was:
Expose the following functions:
- invokeJava
- callJStatic
- callJMethod
- cleanup.jobj
- broadcast and useBroadcast
[jira] [Created] (SPARK-16611) Expose several hidden DataFrame/RDD functions
Oscar D. Lara Yejas created SPARK-16611:
-------------------------------------------

             Summary: Expose several hidden DataFrame/RDD functions
                 Key: SPARK-16611
                 URL: https://issues.apache.org/jira/browse/SPARK-16611
             Project: Spark
          Issue Type: Improvement
          Components: SparkR
            Reporter: Oscar D. Lara Yejas

Expose the following functions:
- lapply or map
- lapplyPartition or mapPartition
- flatMap
- RDD
- toRDD
- getJRDD
- cleanup.jobj
[jira] [Updated] (SPARK-16608) Expose JVM SparkR API functions
[ https://issues.apache.org/jira/browse/SPARK-16608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Oscar D. Lara Yejas updated SPARK-16608:

Description:
Expose the following functions:
- invokeJava
- callJStatic
- callJMethod
- cleanup.jobj
- broadcast and useBroadcast

was:
- invokeJava
- callJStatic
- callJMethod
- cleanup.jobj
- broadcast and useBroadcast

2) DataFrame API
- lapply or map
- lapplyPartition or mapPartition
- flatMap

3) RDD APIs
- RDD
- toRDD
- getJRDD
- cleanup.jobj
[jira] [Updated] (SPARK-16608) Expose JVM SparkR API functions
[ https://issues.apache.org/jira/browse/SPARK-16608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Oscar D. Lara Yejas updated SPARK-16608:

Description:
- invokeJava
- callJStatic
- callJMethod
- cleanup.jobj
- broadcast and useBroadcast

2) DataFrame API
- lapply or map
- lapplyPartition or mapPartition
- flatMap

3) RDD APIs
- RDD
- toRDD
- getJRDD
- cleanup.jobj

was:
1) RPC/memory API
- invokeJava
- callJStatic
- callJMethod
- cleanup.jobj
- broadcast and useBroadcast

2) DataFrame API
- lapply or map
- lapplyPartition or mapPartition
- flatMap

3) RDD APIs
- RDD
- toRDD
- getJRDD
- cleanup.jobj
[jira] [Updated] (SPARK-16608) Expose JVM SparkR API functions
[ https://issues.apache.org/jira/browse/SPARK-16608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Oscar D. Lara Yejas updated SPARK-16608:

Summary: Expose JVM SparkR API functions (was: Expose some low-level SparkR functions)
[jira] [Created] (SPARK-16608) Expose some low-level SparkR functions
Oscar D. Lara Yejas created SPARK-16608:
-------------------------------------------

             Summary: Expose some low-level SparkR functions
                 Key: SPARK-16608
                 URL: https://issues.apache.org/jira/browse/SPARK-16608
             Project: Spark
          Issue Type: Improvement
          Components: SparkR
            Reporter: Oscar D. Lara Yejas

1) RPC/memory API
- invokeJava
- callJStatic
- callJMethod
- cleanup.jobj
- broadcast and useBroadcast

2) DataFrame API
- lapply or map
- lapplyPartition or mapPartition
- flatMap

3) RDD APIs
- RDD
- toRDD
- getJRDD
- cleanup.jobj
[jira] [Updated] (SPARK-14256) Remove parameter sqlContext from as.DataFrame
[ https://issues.apache.org/jira/browse/SPARK-14256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Oscar D. Lara Yejas updated SPARK-14256:

Description: Currently, the user is required to pass parameter sqlContext to both createDataFrame and as.DataFrame. Since sqlContext is a singleton global parameter, it should be optional in the signature of as.DataFrame.

(was: Currently, the user is required to pass parameter sqlContext to both createDataFrame and as.DataFrame. Since sqlContext is a singleton global parameter, it should be obviated from the signature of these two methods.)
[jira] [Updated] (SPARK-14256) Remove parameter sqlContext from as.DataFrame
[ https://issues.apache.org/jira/browse/SPARK-14256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Oscar D. Lara Yejas updated SPARK-14256:

Summary: Remove parameter sqlContext from as.DataFrame (was: Remove parameter sqlContext from as.DataFrame and createDataFrame)
[jira] [Updated] (SPARK-14256) Remove parameter sqlContext from as.DataFrame and createDataFrame
[ https://issues.apache.org/jira/browse/SPARK-14256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Oscar D. Lara Yejas updated SPARK-14256:

Description: Currently, the user is required to pass parameter sqlContext to both createDataFrame and as.DataFrame. Since sqlContext is a singleton global parameter, it should be obviated from the signature of these two methods.
[jira] [Commented] (SPARK-14256) Remove parameter sqlContext from as.DataFrame and createDataFrame
[ https://issues.apache.org/jira/browse/SPARK-14256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15217018#comment-15217018 ] Oscar D. Lara Yejas commented on SPARK-14256:

I'm working on this one.
[jira] [Created] (SPARK-14256) Remove parameter sqlContext from as.DataFrame and createDataFrame
Oscar D. Lara Yejas created SPARK-14256:
-------------------------------------------

             Summary: Remove parameter sqlContext from as.DataFrame and createDataFrame
                 Key: SPARK-14256
                 URL: https://issues.apache.org/jira/browse/SPARK-14256
             Project: Spark
          Issue Type: Improvement
          Components: SparkR
            Reporter: Oscar D. Lara Yejas
[jira] [Updated] (SPARK-13734) SparkR histogram
[ https://issues.apache.org/jira/browse/SPARK-13734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Oscar D. Lara Yejas updated SPARK-13734:

Description: Create method histogram() on SparkR to render a histogram of a given Column.

> SparkR histogram
> ----------------
>
>                 Key: SPARK-13734
>                 URL: https://issues.apache.org/jira/browse/SPARK-13734
>             Project: Spark
>          Issue Type: New Feature
>          Components: SparkR
>            Reporter: Oscar D. Lara Yejas
>            Priority: Minor
>
> Create method histogram() on SparkR to render a histogram of a given Column.
[jira] [Updated] (SPARK-13734) SparkR histogram
[ https://issues.apache.org/jira/browse/SPARK-13734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Oscar D. Lara Yejas updated SPARK-13734:

Summary: SparkR histogram (was: Histogram)
[jira] [Created] (SPARK-13734) Histogram
Oscar D. Lara Yejas created SPARK-13734:
-------------------------------------------

             Summary: Histogram
                 Key: SPARK-13734
                 URL: https://issues.apache.org/jira/browse/SPARK-13734
             Project: Spark
          Issue Type: New Feature
          Components: SparkR
            Reporter: Oscar D. Lara Yejas
[jira] [Commented] (SPARK-13734) Histogram
[ https://issues.apache.org/jira/browse/SPARK-13734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15184057#comment-15184057 ] Oscar D. Lara Yejas commented on SPARK-13734:

I'm working on this one.
[jira] [Commented] (SPARK-9325) Support `collect` on DataFrame columns
[ https://issues.apache.org/jira/browse/SPARK-9325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15159928#comment-15159928 ] Oscar D. Lara Yejas commented on SPARK-9325:

Hi, folks. I have created a PR for this. A design document is enclosed in the PR. Thanks, Oscar

> Support `collect` on DataFrame columns
> --------------------------------------
>
>                 Key: SPARK-9325
>                 URL: https://issues.apache.org/jira/browse/SPARK-9325
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SparkR
>            Reporter: Shivaram Venkataraman
>
> This is to support code of the form
> ```
> ages <- collect(df$Age)
> ```
> Right now `df$Age` returns a Column, which has no functions supported. Similarly we might consider supporting `head(df$Age)` etc.
[jira] [Updated] (SPARK-13436) Add parameter drop to subsetting oeprator
[ https://issues.apache.org/jira/browse/SPARK-13436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Oscar D. Lara Yejas updated SPARK-13436:

Issue Type: Sub-task (was: Task)
Parent: SPARK-9315

> Add parameter drop to subsetting oeprator
> -----------------------------------------
>
>                 Key: SPARK-13436
>                 URL: https://issues.apache.org/jira/browse/SPARK-13436
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SparkR
>            Reporter: Oscar D. Lara Yejas
>
> Parameter drop allows returning either a vector or a data.frame when the result of subsetting a data.frame has one single column (see example below). The same behavior is needed on a DataFrame.
>
> > head(iris[, 1, drop=F])
>   Sepal.Length
> 1          5.1
> 2          4.9
> 3          4.7
> 4          4.6
> 5          5.0
> 6          5.4
>
> > head(iris[, 1, drop=T])
> [1] 5.1 4.9 4.7 4.6 5.0 5.4
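A hypothetical SparkR counterpart of the base R behavior quoted above could be sketched as follows. The method signature, the use of select(), and the getColumn helper are illustrative assumptions, not the merged implementation.

```r
# Illustrative sketch only: a "[" method for SparkDataFrame honoring drop,
# mirroring base R, where iris[, 1, drop = TRUE] yields a vector-like value
# and drop = FALSE keeps a one-column data.frame.
setMethod("[", signature(x = "SparkDataFrame"),
          function(x, i, j, ..., drop = FALSE) {
            result <- select(x, j)  # keep only the requested columns
            if (drop && length(j) == 1) {
              # A single column was requested: return it as a Column
              # (the vector-like object) instead of a one-column DataFrame.
              # getColumn is an assumed accessor for illustration.
              result <- getColumn(result, colnames(result)[1])
            }
            result
          })
```

The key design point is that drop only changes the return type when exactly one column survives the subset, matching data.frame semantics.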
[jira] [Updated] (SPARK-13436) Add parameter drop to subsetting operator [
[ https://issues.apache.org/jira/browse/SPARK-13436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Oscar D. Lara Yejas updated SPARK-13436:

Summary: Add parameter drop to subsetting operator [ (was: Add parameter drop to subsetting oeprator)
[jira] [Updated] (SPARK-13436) Add parameter drop to subsetting oeprator
[ https://issues.apache.org/jira/browse/SPARK-13436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Oscar D. Lara Yejas updated SPARK-13436:

Description:
Parameter drop allows returning either a vector or a data.frame when the result of subsetting a data.frame has one single column (see example below). The same behavior is needed on a DataFrame.

> head(iris[, 1, drop=F])
  Sepal.Length
1          5.1
2          4.9
3          4.7
4          4.6
5          5.0
6          5.4

> head(iris[, 1, drop=T])
[1] 5.1 4.9 4.7 4.6 5.0 5.4
[jira] [Updated] (SPARK-13436) Add parameter drop to subsetting oeprator
[ https://issues.apache.org/jira/browse/SPARK-13436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Oscar D. Lara Yejas updated SPARK-13436:

Issue Type: Task (was: Bug)
[jira] [Commented] (SPARK-13436) Add parameter drop to subsetting oeprator
[ https://issues.apache.org/jira/browse/SPARK-13436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15157544#comment-15157544 ] Oscar D. Lara Yejas commented on SPARK-13436:

I'm working on this one.
[jira] [Created] (SPARK-13436) Add parameter drop to subsetting oeprator
Oscar D. Lara Yejas created SPARK-13436:
-------------------------------------------

             Summary: Add parameter drop to subsetting oeprator
                 Key: SPARK-13436
                 URL: https://issues.apache.org/jira/browse/SPARK-13436
             Project: Spark
          Issue Type: Bug
          Components: SparkR
            Reporter: Oscar D. Lara Yejas
[jira] [Commented] (SPARK-13327) colnames()<- allows invalid column names
[ https://issues.apache.org/jira/browse/SPARK-13327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15147843#comment-15147843 ] Oscar D. Lara Yejas commented on SPARK-13327:

I'm working on this one.

> colnames()<- allows invalid column names
> ----------------------------------------
>
>                 Key: SPARK-13327
>                 URL: https://issues.apache.org/jira/browse/SPARK-13327
>             Project: Spark
>          Issue Type: Bug
>          Components: SparkR
>            Reporter: Oscar D. Lara Yejas
>
> colnames<- fails if:
> 1) The given colnames contain .
> 2) The given colnames contain NA
> 3) The given colnames are not character
> 4) The given colnames have a different length than the dataset's (a SparkSQL error is thrown, but it is not user friendly)
[jira] [Created] (SPARK-13327) colnames()<- allows invalid column names
Oscar D. Lara Yejas created SPARK-13327:
-------------------------------------------

             Summary: colnames()<- allows invalid column names
                 Key: SPARK-13327
                 URL: https://issues.apache.org/jira/browse/SPARK-13327
             Project: Spark
          Issue Type: Bug
          Components: SparkR
            Reporter: Oscar D. Lara Yejas

colnames<- fails if:
1) The given colnames contain .
2) The given colnames contain NA
3) The given colnames are not character
4) The given colnames have a different length than the dataset's (a SparkSQL error is thrown, but it is not user friendly)
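The four failure cases listed above suggest validating the new names in R before the rename ever reaches the SparkSQL backend. A sketch of such a check (a hypothetical helper, not the merged fix):

```r
# Hypothetical validation helper for colnames()<- on a SparkDataFrame,
# covering the four invalid-input cases from the ticket. Each stop() gives
# the user a friendly error instead of an opaque SparkSQL one.
validateColnames <- function(value, ncols) {
  if (!is.character(value)) {
    stop("Invalid column names: must be a character vector")   # case 3
  }
  if (any(is.na(value))) {
    stop("Invalid column names: must not contain NA")          # case 2
  }
  if (any(grepl(".", value, fixed = TRUE))) {
    stop("Invalid column names: must not contain '.'")         # case 1
  }
  if (length(value) != ncols) {
    stop("Number of names does not match number of columns")   # case 4
  }
  invisible(TRUE)
}
```

A `colnames<-` method would call validateColnames(value, length(columns(x))) first and only then invoke the JVM-side rename.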
[jira] [Commented] (SPARK-10863) Method coltypes() to return the R column types of a DataFrame
[ https://issues.apache.org/jira/browse/SPARK-10863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15009172#comment-15009172 ] Oscar D. Lara Yejas commented on SPARK-10863:

[~shivaram] [~sunrui] [~felixcheung] Any thoughts on this?

> Method coltypes() to return the R column types of a DataFrame
> -------------------------------------------------------------
>
>                 Key: SPARK-10863
>                 URL: https://issues.apache.org/jira/browse/SPARK-10863
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SparkR
>    Affects Versions: 1.5.0
>            Reporter: Oscar D. Lara Yejas
>            Assignee: Oscar D. Lara Yejas
>             Fix For: 1.6.0
[jira] [Comment Edited] (SPARK-10863) Method coltypes() to return the R column types of a DataFrame
[ https://issues.apache.org/jira/browse/SPARK-10863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15005019#comment-15005019 ] Oscar D. Lara Yejas edited comment on SPARK-10863 at 11/14/15 1:19 AM:
---
[~felixcheung] I think a solution to all three issues would be to implement wrapper classes for the complex types. For example, for StructType we could have something like the small prototype I implemented below (still very raw, but just to give you an idea). I'd also need to implement class Row accordingly to handle the values. I could do something similar for MapType, and I believe a list/vector should suffice for ArrayType. Thoughts?
{code:title=Struct.R|borderStyle=solid}
# You can actually just copy and paste the code below into R to run it
setClass("StructField", representation(
  name = "character",
  type = "character"
))

# A Struct is a set of StructField objects, modeled as an environment
setClass("Struct", representation(
  struct = "environment"
))

# Initialize a Struct from a list of StructField objects
setMethod("initialize", signature = "Struct", definition = function(.Object, fields) {
  lapply(fields, function(field) {
    .Object@struct[[field@name]] <- field
  })
  return(.Object)
})

# Overwrite the [[ operator to access the environment directly
setGeneric("[[")
setMethod("[[", signature = "Struct", definition = function(x, i) {
  return(x@struct[[i]])
})

# Overwrite the [[<- operator to access the environment directly
setGeneric("[[<-")
setMethod("[[<-", signature = "Struct", definition = function(x, i, value) {
  if (is(value, "StructField")) {
    x@struct[[i]] <- value
  }
  return(x)
})

field1 <- new("StructField", name = "x", type = "numeric")
field2 <- new("StructField", name = "y", type = "character")
s <- new("Struct", fields = list(field1, field2))
s[["x"]]
s[["z"]] <- new("StructField", name = "z", type = "logical")
{code}
> Method coltypes() to return the R column types of a DataFrame
> -
>
> Key: SPARK-10863
> URL: https://issues.apache.org/jira/browse/SPARK-10863
> Project: Spark
> Issue Type: Sub-task
> Components: SparkR
> Affects Versions: 1.5.0
> Reporter: Oscar D. Lara Yejas
> Assignee: Oscar D. Lara Yejas
> Fix For: 1.6.0
>
--
This message was sent by Atlassian JIRA (v6.3.4#6332)
-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
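As a rough illustration of the MapType idea mentioned in the comment above, here is a hedged sketch (not part of any PR; the class name {{SparkMap}} and its slots are hypothetical) of an environment-backed wrapper analogous to the Struct prototype:

```r
# Hypothetical sketch only: a MapType wrapper analogous to the Struct
# prototype, backed by an R environment for by-key access.
setClass("SparkMap", representation(
  env = "environment"
))

# Initialize the map from a named list of key/value pairs
setMethod("initialize", signature = "SparkMap", definition = function(.Object, pairs = list()) {
  .Object@env <- new.env()
  for (key in names(pairs)) {
    .Object@env[[key]] <- pairs[[key]]
  }
  return(.Object)
})

# Look up a value by key, mirroring the Struct prototype's [[ accessor
setMethod("[[", signature = "SparkMap", definition = function(x, i) {
  return(x@env[[i]])
})

m <- new("SparkMap", pairs = list(a = 1, b = "two"))
m[["a"]]  # 1
```

As with the Struct prototype, the environment gives constant-time lookup by key, which matches the access pattern a MapType column would need.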
[jira] [Comment Edited] (SPARK-10863) Method coltypes() to return the R column types of a DataFrame
[ https://issues.apache.org/jira/browse/SPARK-10863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15004363#comment-15004363 ] Oscar D. Lara Yejas edited comment on SPARK-10863 at 11/13/15 5:58 PM:
---
[~felixcheung] Let me try to clarify a bit. As suggested by [~shivaram], I implemented a fallback mechanism so that if there's no corresponding mapping from a Spark type into R's (i.e., the mapping is NA), the Spark type itself is returned. The reason for this is that, in my opinion, having coltypes(df) return NAs would be a bit confusing from the user's perspective. What would an NA type mean? "Type not set" or "data inconsistency" come to my mind if I were in the user's shoes.
I believe it all depends on the type of operations we want to support on Columns. For example, if the user wants to do:
df$column1 + 3
!df$column2
grep(df$column3, "regex")
df$column4 / df$column5
then column1, column4, and column5 must be numeric/integer, column2 must be logical, and column3 must be character. Now, what kind of operations are we planning to support on Array, Struct, and Map types? Depending on that, we could map them to lists/environments, or I could fix it so that, instead of returning {{map}}, for example, I could return {{map<string,int>}}. Hope this helps clarify, and let me know your thoughts. Thanks!
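A minimal sketch of the fallback described above (hypothetical names; the actual SparkR implementation may differ): when a Spark type has no R counterpart in the lookup table, return the Spark type string itself rather than NA.

```r
# Hypothetical sketch of the fallback: map Spark types to R types,
# returning the Spark type string itself when no mapping exists.
sparkToRTypes <- c(double = "numeric", string = "character", int = "integer")

coltypeOf <- function(sparkType) {
  rType <- sparkToRTypes[sparkType]
  if (is.na(rType)) {
    return(sparkType)  # fallback: the user never sees an NA type
  }
  unname(rType)
}

coltypeOf("double")           # "numeric"
coltypeOf("map<string,int>")  # "map<string,int>" (fallback)
```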
[jira] [Updated] (SPARK-11031) SparkR str() method on DataFrame objects
[ https://issues.apache.org/jira/browse/SPARK-11031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Oscar D. Lara Yejas updated SPARK-11031:
Issue Type: Sub-task (was: New Feature)
Parent: SPARK-9315
> SparkR str() method on DataFrame objects
>
> Key: SPARK-11031
> URL: https://issues.apache.org/jira/browse/SPARK-11031
> Project: Spark
> Issue Type: Sub-task
> Reporter: Oscar D. Lara Yejas
>
[jira] [Commented] (SPARK-11031) SparkR str() method on DataFrame objects
[ https://issues.apache.org/jira/browse/SPARK-11031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14950988#comment-14950988 ] Oscar D. Lara Yejas commented on SPARK-11031:
-
I'm working on this one. It depends on coltypes().
[jira] [Created] (SPARK-11031) SparkR str() method on DataFrame objects
Oscar D. Lara Yejas created SPARK-11031:
---
Summary: SparkR str() method on DataFrame objects
Key: SPARK-11031
URL: https://issues.apache.org/jira/browse/SPARK-11031
Project: Spark
Issue Type: New Feature
Reporter: Oscar D. Lara Yejas
[jira] [Commented] (SPARK-10863) Method coltypes() to return the R column types of a DataFrame
[ https://issues.apache.org/jira/browse/SPARK-10863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14934322#comment-14934322 ] Oscar D. Lara Yejas commented on SPARK-10863:
-
I have changed this JIRA to a subtask of SPARK-9315. Thanks! -Oscar
[jira] [Updated] (SPARK-10863) Method coltypes() to return the R column types of a DataFrame
[ https://issues.apache.org/jira/browse/SPARK-10863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Oscar D. Lara Yejas updated SPARK-10863:
Issue Type: Sub-task (was: Task)
Parent: SPARK-9315
[jira] [Updated] (SPARK-10863) Method coltypes() to return the R column types of a DataFrame
[ https://issues.apache.org/jira/browse/SPARK-10863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Oscar D. Lara Yejas updated SPARK-10863:
Issue Type: Task (was: New Feature)
[jira] [Comment Edited] (SPARK-10863) Method coltypes() to return the R column types of a DataFrame
[ https://issues.apache.org/jira/browse/SPARK-10863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14934307#comment-14934307 ] Oscar D. Lara Yejas edited comment on SPARK-10863 at 9/28/15 11:13 PM:
---
Spark data types are different from R's. For example:
Spark -> R
double -> numeric
string -> character
int -> integer
Method coltypes() shows the corresponding R types of a Spark DataFrame. My implementation uses method dtypes() under the covers.
[jira] [Commented] (SPARK-10863) Method coltypes() to return the R column types of a DataFrame
[ https://issues.apache.org/jira/browse/SPARK-10863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14934233#comment-14934233 ] Oscar D. Lara Yejas commented on SPARK-10863:
-
I'm working on this one. -Oscar
[jira] [Created] (SPARK-10863) Method coltypes() to return the R column types of a DataFrame
Oscar D. Lara Yejas created SPARK-10863:
---
Summary: Method coltypes() to return the R column types of a DataFrame
Key: SPARK-10863
URL: https://issues.apache.org/jira/browse/SPARK-10863
Project: Spark
Issue Type: New Feature
Components: SparkR
Affects Versions: 1.5.0
Reporter: Oscar D. Lara Yejas
Fix For: 1.5.1
[jira] [Commented] (SPARK-10807) Add as.data.frame() as a synonym for collect()
[ https://issues.apache.org/jira/browse/SPARK-10807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14906760#comment-14906760 ] Oscar D. Lara Yejas commented on SPARK-10807:
-
I'm working on this one. Thanks, Oscar
> Add as.data.frame() as a synonym for collect()
> --
>
> Key: SPARK-10807
> URL: https://issues.apache.org/jira/browse/SPARK-10807
> Project: Spark
> Issue Type: New Feature
> Components: SparkR
> Affects Versions: 1.5.0
> Reporter: Oscar D. Lara Yejas
> Priority: Minor
> Fix For: 1.5.1
>
[jira] [Created] (SPARK-10807) Add as.data.frame() as a synonym for collect()
Oscar D. Lara Yejas created SPARK-10807:
---
Summary: Add as.data.frame() as a synonym for collect()
Key: SPARK-10807
URL: https://issues.apache.org/jira/browse/SPARK-10807
Project: Spark
Issue Type: New Feature
Components: SparkR
Affects Versions: 1.5.0
Reporter: Oscar D. Lara Yejas
Priority: Minor
Fix For: 1.5.1