[jira] [Commented] (SPARK-17774) Add support for head on DataFrame Column

2016-10-05 Thread Oscar D. Lara Yejas (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15550390#comment-15550390
 ] 

Oscar D. Lara Yejas commented on SPARK-17774:
-

To implement method head() only, I'll be happy to:

1) Remove lines 63-69 (method collect) in PR 11336
2) Throw an error if a column can't be collected as opposed to returning an 
empty column (though I'm okay with either option)

Once again, all my code is still needed for head(): (1) the Column class must 
hold a reference to its parent DataFrame, and (2) the parent DataFrame must be 
propagated through every possible Column operation.

Bottom line: we should mark this JIRA as a duplicate and merge PR 11336 with 
the minor changes above. Let me know if I have your blessing so I can proceed 
with this. It should be very quick for me. Thanks!
cc: [~falaki] [~shivaram]

> Add support for head on DataFrame Column
> 
>
> Key: SPARK-17774
> URL: https://issues.apache.org/jira/browse/SPARK-17774
> Project: Spark
>  Issue Type: Sub-task
>  Components: SparkR
>Affects Versions: 2.0.0
>Reporter: Hossein Falaki
>
> There was a lot of discussion on SPARK-9325. To summarize the conversation on 
> that ticket regarding {{collect}}:
> * Pro: Ease of use and maximum compatibility with existing R API
> * Con: We do not want to increase maintenance cost by opening arbitrary API. 
> With Spark's DataFrame API {{collect}} does not work on {{Column}} and there 
> is no need for it to work in R.
> This ticket is strictly about {{head}}. I propose supporting {{head}} on 
> {{Column}} because:
> 1. R users are already used to calling {{head(iris$Sepal.Length)}}. When they 
> do that on a SparkDataFrame, they get an error. Not a good experience.
> 2. Adding support for it does not require any change to the backend. It can 
> be trivially done in R code. 
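The claim that head() can be done purely in R code could be sketched as follows (hypothetical; the @df slot and the method body are illustrative, not the actual PR 11336 code):

```r
# Hypothetical sketch of head() on a SparkR Column, assuming the Column
# object carries a reference to its parent SparkDataFrame in a slot
# named "df" (slot name is illustrative).
setMethod("head", signature(x = "Column"),
          function(x, n = 6) {
            if (is.null(x@df)) {
              stop("Cannot call head() on a Column with no parent DataFrame")
            }
            # Project the parent DataFrame down to this single column and
            # reuse the existing head() on SparkDataFrame.
            head(select(x@df, x), n)
          })
```

The key point the sketch illustrates is that no backend change is required: everything is expressed in terms of existing SparkDataFrame operations.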



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-17774) Add support for head on DataFrame Column

2016-10-04 Thread Oscar D. Lara Yejas (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15547429#comment-15547429
 ] 

Oscar D. Lara Yejas commented on SPARK-17774:
-

[~shivaram]: I concur with Shivaram. Besides, I already implemented method 
head() in my PR 11336:

https://github.com/apache/spark/pull/11336

If you wanted to implement method head() alone, you'd still need all the 
changes I made for PR 11336 except the five lines of code in method collect(). 
In that case, I'd suggest merging PR 11336 instead.

[~falaki]: In the corner cases where there's no parent DataFrame, we can return 
an empty value as opposed to throwing an error. This behavior is already 
implemented in PR 11336. Also, though R doesn't have method collect(), I think 
it's still useful to turn a Column into an R vector. Perhaps a function called 
as.vector()?
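A hypothetical shape for that as.vector() suggestion (the @df slot and the S4 dispatch are illustrative assumptions, not the PR's actual code):

```r
# Hypothetical as.vector() for a SparkR Column: collect the single
# column into a local one-column data.frame, then flatten it into a
# plain R vector. Assumes the Column keeps a parent-DataFrame
# reference (x@df is an illustrative slot name).
setMethod("as.vector", signature(x = "Column"),
          function(x, mode = "any") {
            local_df <- collect(select(x@df, x))
            local_df[[1]]
          })
```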

Thanks folks!











[jira] [Comment Edited] (SPARK-17774) Add support for head on DataFrame Column

2016-10-04 Thread Oscar D. Lara Yejas (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15547429#comment-15547429
 ] 

Oscar D. Lara Yejas edited comment on SPARK-17774 at 10/5/16 2:48 AM:
--

I concur with [~shivaram]. Besides, I already implemented method head() in my 
PR 11336:

https://github.com/apache/spark/pull/11336

If you wanted to implement method head() alone, you'd still need all the 
changes I made for PR 11336 except the five lines of code in method collect(). 
In that case, I'd suggest merging PR 11336 instead.

[~falaki]: In the corner cases where there's no parent DataFrame, we can return 
an empty value as opposed to throwing an error. This behavior is already 
implemented in PR 11336. Also, though R doesn't have method collect(), I think 
it's still useful to turn a Column into an R vector. Perhaps a function called 
as.vector()?

Thanks folks!






was (Author: olarayej):
[~shivaram]: I concur with Shivaram. Besides, I already implemented method 
head() in my PR 11336:

https://github.com/apache/spark/pull/11336

If you wanted to implement method head() alone, you'll still need to do all 
changes I did for PR 11336 except for the 5 lines of code of method collect(). 
If that's the case, I'd rather suggest to merge PR 11336.

[~falaki]: In the corner cases where there's no parent DataFrame, we can return 
an empty value as opposed to throwing an error. This behavior is already 
implemented in PR 11336. Also, though R doesn't have method collect(), I think 
it's still useful to turn a Column into an R vector. Perhaps a function called 
as.vector()?

Thanks folks!











[jira] [Commented] (SPARK-16581) Making JVM backend calling functions public

2016-07-19 Thread Oscar D. Lara Yejas (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384782#comment-15384782
 ] 

Oscar D. Lara Yejas commented on SPARK-16581:
-

[~aloknsingh] [~adrian555] Could any of you share your thoughts on this?

> Making JVM backend calling functions public
> ---
>
> Key: SPARK-16581
> URL: https://issues.apache.org/jira/browse/SPARK-16581
> Project: Spark
>  Issue Type: Sub-task
>  Components: SparkR
>Reporter: Shivaram Venkataraman
>
> As described in the design doc in SPARK-15799, to help packages that need to 
> call into the JVM, it will be good to expose some of the R -> JVM functions 
> we have. 
> As a part of this we could also rename, reformat the functions to make them 
> more user friendly.






[jira] [Updated] (SPARK-16611) Expose several hidden DataFrame/RDD functions

2016-07-18 Thread Oscar D. Lara Yejas (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Oscar D. Lara Yejas updated SPARK-16611:

Description: 
Expose the following functions:

- lapply or map
- lapplyPartition or mapPartition
- flatMap
- RDD
- toRDD
- getJRDD
- cleanup.jobj

cc:
[~javierluraschi] [~j...@rstudio.com] [~shivaram]

  was:
Expose the following functions:

- lapply or map
- lapplyPartition or mapPartition
- flatMap
- RDD
- toRDD
- getJRDD
- cleanup.jobj


> Expose several hidden DataFrame/RDD functions
> -
>
> Key: SPARK-16611
> URL: https://issues.apache.org/jira/browse/SPARK-16611
> Project: Spark
>  Issue Type: Improvement
>  Components: SparkR
>Reporter: Oscar D. Lara Yejas
>
> Expose the following functions:
> - lapply or map
> - lapplyPartition or mapPartition
> - flatMap
> - RDD
> - toRDD
> - getJRDD
> - cleanup.jobj
> cc:
> [~javierluraschi] [~j...@rstudio.com] [~shivaram]
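If these functions were exposed, typical usage might look like the following sketch (hypothetical: today they are SparkR internals, so triple-colon access is shown, and the exact signatures may differ once made public):

```r
library(SparkR)
sparkR.session()
df <- createDataFrame(iris)

# Drop down from the DataFrame to the underlying RDD representation
# (toRDD and lapplyPartition are currently internal; illustrative only)
rdd <- SparkR:::toRDD(df)

# Apply an R function to each partition of the data
counts <- SparkR:::lapplyPartition(rdd, function(part) length(part))
```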






[jira] [Updated] (SPARK-16608) Expose JVM SparkR API functions

2016-07-18 Thread Oscar D. Lara Yejas (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Oscar D. Lara Yejas updated SPARK-16608:

Description: 
Expose the following functions:
- invokeJava
- callJStatic
- callJMethod 
- cleanup.jobj
- broadcast and useBroadcast

cc:
[~javierluraschi] [~j...@rstudio.com] [~shivaram]


  was:
Expose the following functions:
- invokeJava
- callJStatic
- callJMethod 
- cleanup.jobj
- broadcast and useBroadcast



> Expose JVM SparkR API functions 
> 
>
> Key: SPARK-16608
> URL: https://issues.apache.org/jira/browse/SPARK-16608
> Project: Spark
>  Issue Type: Improvement
>  Components: SparkR
>Reporter: Oscar D. Lara Yejas
>
> Expose the following functions:
> - invokeJava
> - callJStatic
> - callJMethod 
> - cleanup.jobj
> - broadcast and useBroadcast
> cc:
> [~javierluraschi] [~j...@rstudio.com] [~shivaram]
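For context, this is roughly how a downstream R package could use these helpers once public. callJStatic and callJMethod exist today as SparkR internals, so the triple-colon access below is a stopgap, and the eventual public names may differ:

```r
library(SparkR)
sparkR.session()

# Static call on the JVM backend: java.lang.Math.max(1L, 2L)
m <- SparkR:::callJStatic("java.lang.Math", "max", 1L, 2L)

# Instance call: create a Java String and ask for its length
jstr <- SparkR:::callJStatic("java.lang.String", "valueOf", 42L)
n <- SparkR:::callJMethod(jstr, "length")
```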






[jira] [Created] (SPARK-16611) Expose several hidden DataFrame/RDD functions

2016-07-18 Thread Oscar D. Lara Yejas (JIRA)
Oscar D. Lara Yejas created SPARK-16611:
---

 Summary: Expose several hidden DataFrame/RDD functions
 Key: SPARK-16611
 URL: https://issues.apache.org/jira/browse/SPARK-16611
 Project: Spark
  Issue Type: Improvement
  Components: SparkR
Reporter: Oscar D. Lara Yejas


Expose the following functions:

- lapply or map
- lapplyPartition or mapPartition
- flatMap
- RDD
- toRDD
- getJRDD
- cleanup.jobj






[jira] [Updated] (SPARK-16608) Expose JVM SparkR API functions

2016-07-18 Thread Oscar D. Lara Yejas (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Oscar D. Lara Yejas updated SPARK-16608:

Description: 
Expose the following functions:
- invokeJava
- callJStatic
- callJMethod 
- cleanup.jobj
- broadcast and useBroadcast


  was:
- invokeJava
- callJStatic
- callJMethod 
- cleanup.jobj
 - broadcast and useBroadcast
 
2) DataFrame API
- lapply or map
- lapplyPartition or mapPartition
- flatMap
 
3) RDD apis

- RDD
- toRDD
- getJRDD
- cleanup.jobj



> Expose JVM SparkR API functions 
> 
>
> Key: SPARK-16608
> URL: https://issues.apache.org/jira/browse/SPARK-16608
> Project: Spark
>  Issue Type: Improvement
>  Components: SparkR
>Reporter: Oscar D. Lara Yejas
>
> Expose the following functions:
> - invokeJava
> - callJStatic
> - callJMethod 
> - cleanup.jobj
> - broadcast and useBroadcast






[jira] [Updated] (SPARK-16608) Expose JVM SparkR API functions

2016-07-18 Thread Oscar D. Lara Yejas (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Oscar D. Lara Yejas updated SPARK-16608:

Description: 
- invokeJava
- callJStatic
- callJMethod 
- cleanup.jobj
 - broadcast and useBroadcast
 
2) DataFrame API
- lapply or map
- lapplyPartition or mapPartition
- flatMap
 
3) RDD apis

- RDD
- toRDD
- getJRDD
- cleanup.jobj


  was:
1) RPC/memory API 
- invokeJava
- callJStatic
- callJMethod 
- cleanup.jobj
 - broadcast and useBroadcast
 
2) DataFrame API
- lapply or map
- lapplyPartition or mapPartition
- flatMap
 
3) RDD apis

- RDD
- toRDD
- getJRDD
- cleanup.jobj



> Expose JVM SparkR API functions 
> 
>
> Key: SPARK-16608
> URL: https://issues.apache.org/jira/browse/SPARK-16608
> Project: Spark
>  Issue Type: Improvement
>  Components: SparkR
>Reporter: Oscar D. Lara Yejas
>
> - invokeJava
> - callJStatic
> - callJMethod 
> - cleanup.jobj
>  - broadcast and useBroadcast
>  
> 2) DataFrame API
> - lapply or map
> - lapplyPartition or mapPartition
> - flatMap
>  
> 3) RDD apis
> 
> - RDD
> - toRDD
> - getJRDD
> - cleanup.jobj






[jira] [Updated] (SPARK-16608) Expose JVM SparkR API functions

2016-07-18 Thread Oscar D. Lara Yejas (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Oscar D. Lara Yejas updated SPARK-16608:

Summary: Expose JVM SparkR API functions   (was: Expose some low-level 
SparkR functions )

> Expose JVM SparkR API functions 
> 
>
> Key: SPARK-16608
> URL: https://issues.apache.org/jira/browse/SPARK-16608
> Project: Spark
>  Issue Type: Improvement
>  Components: SparkR
>Reporter: Oscar D. Lara Yejas
>
> 1) RPC/memory API 
> - invokeJava
> - callJStatic
> - callJMethod 
> - cleanup.jobj
>  - broadcast and useBroadcast
>  
> 2) DataFrame API
> - lapply or map
> - lapplyPartition or mapPartition
> - flatMap
>  
> 3) RDD apis
> 
> - RDD
> - toRDD
> - getJRDD
> - cleanup.jobj






[jira] [Created] (SPARK-16608) Expose some low-level SparkR functions

2016-07-18 Thread Oscar D. Lara Yejas (JIRA)
Oscar D. Lara Yejas created SPARK-16608:
---

 Summary: Expose some low-level SparkR functions 
 Key: SPARK-16608
 URL: https://issues.apache.org/jira/browse/SPARK-16608
 Project: Spark
  Issue Type: Improvement
  Components: SparkR
Reporter: Oscar D. Lara Yejas


1) RPC/memory API 
- invokeJava
- callJStatic
- callJMethod 
- cleanup.jobj
 - broadcast and useBroadcast
 
2) DataFrame API
- lapply or map
- lapplyPartition or mapPartition
- flatMap
 
3) RDD apis

- RDD
- toRDD
- getJRDD
- cleanup.jobj







[jira] [Updated] (SPARK-14256) Remove parameter sqlContext from as.DataFrame

2016-03-29 Thread Oscar D. Lara Yejas (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Oscar D. Lara Yejas updated SPARK-14256:

Description: Currently, the user is required to pass parameter sqlContext to 
both createDataFrame and as.DataFrame. Since sqlContext is a global singleton, 
it should be optional in the signature of as.DataFrame.  (was: 
Currently, the user is required to pass parameter sqlContext to both 
createDataFrame and as.DataFrame. Since sqlContext is a global singleton, 
it should be omitted from the signatures of these two methods.)

> Remove parameter sqlContext from as.DataFrame
> -
>
> Key: SPARK-14256
> URL: https://issues.apache.org/jira/browse/SPARK-14256
> Project: Spark
>  Issue Type: Improvement
>  Components: SparkR
>Reporter: Oscar D. Lara Yejas
>
> Currently, the user is required to pass parameter sqlContext to both 
> createDataFrame and as.DataFrame. Since sqlContext is a global singleton, 
> it should be optional in the signature of as.DataFrame.
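One way to make the parameter optional is a default argument that resolves the singleton; getSqlContext() below is a hypothetical helper, not an existing SparkR function:

```r
# Sketch: default sqlContext to the cached global singleton so callers
# may omit it. getSqlContext() is hypothetical; SparkR keeps the real
# context in an internal environment.
as.DataFrame <- function(data, schema = NULL, samplingRatio = 1.0,
                         sqlContext = getSqlContext()) {
  createDataFrame(sqlContext, data, schema, samplingRatio)
}
```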






[jira] [Updated] (SPARK-14256) Remove parameter sqlContext from as.DataFrame

2016-03-29 Thread Oscar D. Lara Yejas (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Oscar D. Lara Yejas updated SPARK-14256:

Summary: Remove parameter sqlContext from as.DataFrame  (was: Remove 
parameter sqlContext from as.DataFrame and createDataFrame)

> Remove parameter sqlContext from as.DataFrame
> -
>
> Key: SPARK-14256
> URL: https://issues.apache.org/jira/browse/SPARK-14256
> Project: Spark
>  Issue Type: Improvement
>  Components: SparkR
>Reporter: Oscar D. Lara Yejas
>
> Currently, the user is required to pass parameter sqlContext to both 
> createDataFrame and as.DataFrame. Since sqlContext is a global singleton, 
> it should be omitted from the signatures of these two methods.






[jira] [Updated] (SPARK-14256) Remove parameter sqlContext from as.DataFrame and createDataFrame

2016-03-29 Thread Oscar D. Lara Yejas (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Oscar D. Lara Yejas updated SPARK-14256:

Description: Currently, the user is required to pass parameter sqlContext to 
both createDataFrame and as.DataFrame. Since sqlContext is a global singleton, 
it should be omitted from the signatures of these two methods.

> Remove parameter sqlContext from as.DataFrame and createDataFrame
> -
>
> Key: SPARK-14256
> URL: https://issues.apache.org/jira/browse/SPARK-14256
> Project: Spark
>  Issue Type: Improvement
>  Components: SparkR
>Reporter: Oscar D. Lara Yejas
>
> Currently, the user is required to pass parameter sqlContext to both 
> createDataFrame and as.DataFrame. Since sqlContext is a global singleton, 
> it should be omitted from the signatures of these two methods.






[jira] [Commented] (SPARK-14256) Remove parameter sqlContext from as.DataFrame and createDataFrame

2016-03-29 Thread Oscar D. Lara Yejas (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15217018#comment-15217018
 ] 

Oscar D. Lara Yejas commented on SPARK-14256:
-

I'm working on this one

> Remove parameter sqlContext from as.DataFrame and createDataFrame
> -
>
> Key: SPARK-14256
> URL: https://issues.apache.org/jira/browse/SPARK-14256
> Project: Spark
>  Issue Type: Improvement
>  Components: SparkR
>Reporter: Oscar D. Lara Yejas
>







[jira] [Created] (SPARK-14256) Remove parameter sqlContext from as.DataFrame and createDataFrame

2016-03-29 Thread Oscar D. Lara Yejas (JIRA)
Oscar D. Lara Yejas created SPARK-14256:
---

 Summary: Remove parameter sqlContext from as.DataFrame and 
createDataFrame
 Key: SPARK-14256
 URL: https://issues.apache.org/jira/browse/SPARK-14256
 Project: Spark
  Issue Type: Improvement
  Components: SparkR
Reporter: Oscar D. Lara Yejas









[jira] [Updated] (SPARK-13734) SparkR histogram

2016-03-08 Thread Oscar D. Lara Yejas (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Oscar D. Lara Yejas updated SPARK-13734:

Description: Create method histogram() on SparkR to render a histogram of a 
given Column.

> SparkR histogram
> 
>
> Key: SPARK-13734
> URL: https://issues.apache.org/jira/browse/SPARK-13734
> Project: Spark
>  Issue Type: New Feature
>  Components: SparkR
>Reporter: Oscar D. Lara Yejas
>Priority: Minor
>
> Create method histogram() on SparkR to render a histogram of a given Column.
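Usage of the proposed method might look like this sketch (the API shape, and the result being a small local data.frame of bins and counts, are assumptions here, not the merged implementation):

```r
df <- createDataFrame(iris)

# Hypothetical: compute histogram bins for a Column on the cluster,
# returning a small local data.frame suitable for plotting
h <- histogram(df, df$Sepal_Length, nbins = 12)

# h is assumed to hold bin centroids and counts, which could then be
# rendered locally with base graphics or ggplot2
```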






[jira] [Updated] (SPARK-13734) SparkR histogram

2016-03-07 Thread Oscar D. Lara Yejas (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Oscar D. Lara Yejas updated SPARK-13734:

Summary: SparkR histogram  (was: Histogram)

> SparkR histogram
> 
>
> Key: SPARK-13734
> URL: https://issues.apache.org/jira/browse/SPARK-13734
> Project: Spark
>  Issue Type: New Feature
>  Components: SparkR
>Reporter: Oscar D. Lara Yejas
>







[jira] [Created] (SPARK-13734) Histogram

2016-03-07 Thread Oscar D. Lara Yejas (JIRA)
Oscar D. Lara Yejas created SPARK-13734:
---

 Summary: Histogram
 Key: SPARK-13734
 URL: https://issues.apache.org/jira/browse/SPARK-13734
 Project: Spark
  Issue Type: New Feature
  Components: SparkR
Reporter: Oscar D. Lara Yejas









[jira] [Commented] (SPARK-13734) Histogram

2016-03-07 Thread Oscar D. Lara Yejas (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15184057#comment-15184057
 ] 

Oscar D. Lara Yejas commented on SPARK-13734:
-

I'm working on this one.

> Histogram
> -
>
> Key: SPARK-13734
> URL: https://issues.apache.org/jira/browse/SPARK-13734
> Project: Spark
>  Issue Type: New Feature
>  Components: SparkR
>Reporter: Oscar D. Lara Yejas
>







[jira] [Commented] (SPARK-9325) Support `collect` on DataFrame columns

2016-02-23 Thread Oscar D. Lara Yejas (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15159928#comment-15159928
 ] 

Oscar D. Lara Yejas commented on SPARK-9325:


Hi, folks.

I have created a PR for this. A design document is enclosed in the PR.

Thanks,
Oscar

> Support `collect` on DataFrame columns
> --
>
> Key: SPARK-9325
> URL: https://issues.apache.org/jira/browse/SPARK-9325
> Project: Spark
>  Issue Type: Sub-task
>  Components: SparkR
>Reporter: Shivaram Venkataraman
>
> This is to support code of the form 
> ```
> ages <- collect(df$Age)
> ```
> Right now `df$Age` returns a Column, which has no functions supported.
> Similarly we might consider supporting `head(df$Age)` etc.






[jira] [Updated] (SPARK-13436) Add parameter drop to subsetting oeprator

2016-02-22 Thread Oscar D. Lara Yejas (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Oscar D. Lara Yejas updated SPARK-13436:

Issue Type: Sub-task  (was: Task)
Parent: SPARK-9315

> Add parameter drop to subsetting oeprator
> -
>
> Key: SPARK-13436
> URL: https://issues.apache.org/jira/browse/SPARK-13436
> Project: Spark
>  Issue Type: Sub-task
>  Components: SparkR
>Reporter: Oscar D. Lara Yejas
>
> Parameter drop allows returning a vector or a data.frame depending on 
> whether the result of subsetting a data.frame has a single column (see 
> example below). The same behavior is needed on a DataFrame.
> > head(iris[, 1, drop=F])
>   Sepal.Length
> 1  5.1
> 2  4.9
> 3  4.7
> 4  4.6
> 5  5.0
> 6  5.4
> > head(iris[, 1, drop=T])
> [1] 5.1 4.9 4.7 4.6 5.0 5.4






[jira] [Updated] (SPARK-13436) Add parameter drop to subsetting operator [

2016-02-22 Thread Oscar D. Lara Yejas (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Oscar D. Lara Yejas updated SPARK-13436:

Summary: Add parameter drop to subsetting operator [  (was: Add parameter 
drop to subsetting oeprator)

> Add parameter drop to subsetting operator [
> ---
>
> Key: SPARK-13436
> URL: https://issues.apache.org/jira/browse/SPARK-13436
> Project: Spark
>  Issue Type: Sub-task
>  Components: SparkR
>Reporter: Oscar D. Lara Yejas
>
> Parameter drop allows returning a vector or a data.frame depending on 
> whether the result of subsetting a data.frame has a single column (see 
> example below). The same behavior is needed on a DataFrame.
> > head(iris[, 1, drop=F])
>   Sepal.Length
> 1  5.1
> 2  4.9
> 3  4.7
> 4  4.6
> 5  5.0
> 6  5.4
> > head(iris[, 1, drop=T])
> [1] 5.1 4.9 4.7 4.6 5.0 5.4






[jira] [Updated] (SPARK-13436) Add parameter drop to subsetting oeprator

2016-02-22 Thread Oscar D. Lara Yejas (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Oscar D. Lara Yejas updated SPARK-13436:

Description: 
Parameter drop allows returning a vector or a data.frame depending on whether 
the result of subsetting a data.frame has a single column (see example below). 
The same behavior is needed on a DataFrame.

> head(iris[, 1, drop=F])
  Sepal.Length
1  5.1
2  4.9
3  4.7
4  4.6
5  5.0
6  5.4

> head(iris[, 1, drop=T])
[1] 5.1 4.9 4.7 4.6 5.0 5.4

> Add parameter drop to subsetting oeprator
> -
>
> Key: SPARK-13436
> URL: https://issues.apache.org/jira/browse/SPARK-13436
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Reporter: Oscar D. Lara Yejas
>
> Parameter drop allows returning a vector or a data.frame depending on 
> whether the result of subsetting a data.frame has a single column (see 
> example below). The same behavior is needed on a DataFrame.
> > head(iris[, 1, drop=F])
>   Sepal.Length
> 1  5.1
> 2  4.9
> 3  4.7
> 4  4.6
> 5  5.0
> 6  5.4
> > head(iris[, 1, drop=T])
> [1] 5.1 4.9 4.7 4.6 5.0 5.4
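On a SparkDataFrame the proposed semantics would be analogous; the sketch below is purely illustrative, not the actual implementation:

```r
# Hypothetical: drop = TRUE returns the single Column, drop = FALSE
# keeps a one-column SparkDataFrame, mirroring base R's behavior.
setMethod("[", signature(x = "SparkDataFrame"),
          function(x, i, j, ..., drop = TRUE) {
            result <- select(x, j)
            if (drop && length(j) == 1) {
              result[[1]]   # a single Column
            } else {
              result        # a one-column SparkDataFrame
            }
          })
```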






[jira] [Updated] (SPARK-13436) Add parameter drop to subsetting oeprator

2016-02-22 Thread Oscar D. Lara Yejas (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Oscar D. Lara Yejas updated SPARK-13436:

Issue Type: Task  (was: Bug)

> Add parameter drop to subsetting oeprator
> -
>
> Key: SPARK-13436
> URL: https://issues.apache.org/jira/browse/SPARK-13436
> Project: Spark
>  Issue Type: Task
>  Components: SparkR
>Reporter: Oscar D. Lara Yejas
>
> Parameter drop allows returning a vector or a data.frame depending on 
> whether the result of subsetting a data.frame has a single column (see 
> example below). The same behavior is needed on a DataFrame.
> > head(iris[, 1, drop=F])
>   Sepal.Length
> 1  5.1
> 2  4.9
> 3  4.7
> 4  4.6
> 5  5.0
> 6  5.4
> > head(iris[, 1, drop=T])
> [1] 5.1 4.9 4.7 4.6 5.0 5.4






[jira] [Commented] (SPARK-13436) Add parameter drop to subsetting oeprator

2016-02-22 Thread Oscar D. Lara Yejas (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15157544#comment-15157544
 ] 

Oscar D. Lara Yejas commented on SPARK-13436:
-

I'm working on this one

> Add parameter drop to subsetting oeprator
> -
>
> Key: SPARK-13436
> URL: https://issues.apache.org/jira/browse/SPARK-13436
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Reporter: Oscar D. Lara Yejas
>







[jira] [Created] (SPARK-13436) Add parameter drop to subsetting oeprator

2016-02-22 Thread Oscar D. Lara Yejas (JIRA)
Oscar D. Lara Yejas created SPARK-13436:
---

 Summary: Add parameter drop to subsetting oeprator
 Key: SPARK-13436
 URL: https://issues.apache.org/jira/browse/SPARK-13436
 Project: Spark
  Issue Type: Bug
  Components: SparkR
Reporter: Oscar D. Lara Yejas









[jira] [Commented] (SPARK-13327) colnames()<- allows invalid column names

2016-02-15 Thread Oscar D. Lara Yejas (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15147843#comment-15147843
 ] 

Oscar D. Lara Yejas commented on SPARK-13327:
-

I'm working on this one

> colnames()<- allows invalid column names
> 
>
> Key: SPARK-13327
> URL: https://issues.apache.org/jira/browse/SPARK-13327
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Reporter: Oscar D. Lara Yejas
>
> colnames<- should fail if:
> 1) The given colnames contain .
> 2) The given colnames contain NA
> 3) The given colnames are not character
> 4) The given colnames differ in length from the dataset's (a SparkSQL error 
> is thrown, but it is not user friendly)






[jira] [Created] (SPARK-13327) colnames()<- allows invalid column names

2016-02-15 Thread Oscar D. Lara Yejas (JIRA)
Oscar D. Lara Yejas created SPARK-13327:
---

 Summary: colnames()<- allows invalid column names
 Key: SPARK-13327
 URL: https://issues.apache.org/jira/browse/SPARK-13327
 Project: Spark
  Issue Type: Bug
  Components: SparkR
Reporter: Oscar D. Lara Yejas


colnames<- fails if:

1) Given colnames contain .
2) Given colnames contain NA
3) Given colnames are not character
4) Given colnames have a different length than the dataset's (the SparkSQL 
error is thrown but not user friendly)
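The four validation rules above can be sketched as a small helper. This is a hypothetical illustration (`validateColnames` is not the actual SparkR function), just to make the intended checks concrete:

```r
# Hypothetical validation sketch -- not the actual SparkR implementation.
validateColnames <- function(value, ncols) {
  if (!is.character(value)) {
    stop("Invalid column names: names must be a character vector.")
  }
  if (length(value) != ncols) {
    stop("Invalid column names: expected ", ncols,
         " names but got ", length(value), ".")
  }
  if (any(is.na(value))) {
    stop("Invalid column names: NA is not allowed.")
  }
  if (any(grepl(".", value, fixed = TRUE))) {
    stop("Invalid column names: '.' is not allowed in Spark column names.")
  }
  invisible(TRUE)
}

validateColnames(c("a", "b"), 2)      # passes silently
# validateColnames(c("a.b"), 1)       # would stop with an error about '.'
```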






[jira] [Commented] (SPARK-10863) Method coltypes() to return the R column types of a DataFrame

2015-11-17 Thread Oscar D. Lara Yejas (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15009172#comment-15009172
 ] 

Oscar D. Lara Yejas commented on SPARK-10863:
-

[~shivaram] [~sunrui] [~felixcheung] Any thoughts on this?

> Method coltypes() to return the R column types of a DataFrame
> -
>
> Key: SPARK-10863
> URL: https://issues.apache.org/jira/browse/SPARK-10863
> Project: Spark
>  Issue Type: Sub-task
>  Components: SparkR
>Affects Versions: 1.5.0
>Reporter: Oscar D. Lara Yejas
>Assignee: Oscar D. Lara Yejas
> Fix For: 1.6.0
>
>







[jira] [Comment Edited] (SPARK-10863) Method coltypes() to return the R column types of a DataFrame

2015-11-13 Thread Oscar D. Lara Yejas (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15005019#comment-15005019
 ] 

Oscar D. Lara Yejas edited comment on SPARK-10863 at 11/14/15 1:19 AM:
---

[~felixcheung] I think a solution to all three issues would be to implement 
wrapper classes for complex types. For example, for StructType, we could have 
something like the small prototype I implemented below (still very raw, but 
just to give you an idea). I'd also need to implement class Row accordingly to 
handle the values.

I could do something similar for MapType, and I believe a list/vector should 
suffice for ArrayType.

Thoughts?

{code:title=Struct.R|borderStyle=solid}
# You can actually just copy and paste the code below into R to run it
setClass("StructField",
         representation(
           name = "character",
           type = "character"
         ))

# A Struct is a set of StructField objects, modeled as an environment
setClass("Struct",
         representation(
           struct = "environment"
         ))

# Initialize a Struct from a list of StructField objects
setMethod("initialize", signature = "Struct", definition =
  function(.Object, fields) {
    lapply(fields, function(field) {
      .Object@struct[[field@name]] <- field
    })
    return(.Object)
  })

# Overwrite the [[ operator to access the environment directly
setGeneric("[[")
setMethod("[[", signature = "Struct", definition =
  function(x, i) {
    return(x@struct[[i]])
  })

# Overwrite the [[<- operator to access the environment directly
setGeneric("[[<-")
setMethod("[[<-", signature = "Struct", definition =
  function(x, i, value) {
    if (is(value, "StructField")) {
      x@struct[[i]] <- value
    }
    return(x)
  })

field1 <- new("StructField", name = "x", type = "numeric")
field2 <- new("StructField", name = "y", type = "character")
s <- new("Struct", fields = list(field1, field2))
s[["x"]]
s[["z"]] <- new("StructField", name = "z", type = "logical")
{code}
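One detail worth noting about the prototype above: because the {{struct}} slot is an environment, it has reference semantics, so assignments inside {{initialize}} and {{[[<-}} mutate the shared environment rather than a copy. A tiny standalone demo of that behavior in plain R (independent of the classes above):

```r
# Environments are R's mutable, reference-semantics container:
# e1 and e2 are two names for the SAME environment object,
# so assigning through e2 is visible through e1.
e1 <- new.env()
e2 <- e1
e2$x <- 42
stopifnot(identical(e1$x, 42))

# A list, by contrast, is copied on modification:
# l1 is unaffected by the assignment through l2.
l1 <- list()
l2 <- l1
l2$x <- 42
stopifnot(is.null(l1$x))
```

This is why the environment-backed design lets `s[["z"]] <- ...` update the underlying struct even though S4 objects are otherwise copied on assignment.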


was (Author: olarayej):
[~felixcheung] I think a solution to all three issues would be to implement 
wrapper classes for complex types. For example, for StructType, we could have 
something like the small prototype I implemented below (still very raw, but 
just to give you an idea). I'd also need to implement class Row accordingly to 
handle the values.

I could do something similar for MapType, and I believe a list/vector should 
suffice for ArrayType.

Thoughts?

# You can actually just copy and paste the code below on R to run it
setClass("StructField",
 representation(
   name = "character",
   type = "character"
))

# A Struct is a set of StructField objects, modeled as an environment
setClass("Struct",
 representation(
   struct = "environment"
))

# Initialize a Struct from a list of StructField objects
setMethod("initialize", signature = "Struct", definition=
function(.Object, fields) {
  lapply(fields, function(field) {
.Object@struct[[field@name]] <- field
  })
  return(.Object)
})

# Overwrite [[ operator to access the environment directly
setGeneric("[[")
setMethod("[[", signature="Struct", definition=
function(x, i) {
  return(x@struct[[i]])
})

# Overwrite [[<- operator to access the environment directly
setGeneric("[[<-")
setMethod("[[<-", signature="Struct", definition=
function(x, i, value) {
  if (class(value) == "StructField") {
x@struct[[i]] <- value
  }
  return(x)
})

field1 <- new("StructField", name="x", type="numeric")
field2 <- new("StructField", name="y", type="character")
s <- new("Struct", fields=list(field1, field2))
s[["x"]]
s[["z"]] <- new("StructField", name="z", type="logical")

> Method coltypes() to return the R column types of a DataFrame
> -
>
> Key: SPARK-10863
> URL: https://issues.apache.org/jira/browse/SPARK-10863
> Project: Spark
>  Issue Type: Sub-task
>  Components: SparkR
>Affects Versions: 1.5.0
>Reporter: Oscar D. Lara Yejas
>Assignee: Oscar D. Lara Yejas
> Fix For: 1.6.0
>
>







[jira] [Commented] (SPARK-10863) Method coltypes() to return the R column types of a DataFrame

2015-11-13 Thread Oscar D. Lara Yejas (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15005019#comment-15005019
 ] 

Oscar D. Lara Yejas commented on SPARK-10863:
-

[~felixcheung] I think a solution to all three issues would be to implement 
wrapper classes for complex types. For example, for StructType, we could have 
something like the small prototype I implemented below (still very raw, but 
just to give you an idea). I'd also need to implement class Row accordingly to 
handle the values.

I could do something similar for MapType, and I believe a list/vector should 
suffice for ArrayType.

Thoughts?

{code:title=Struct.R|borderStyle=solid}
# You can actually just copy and paste the code below into R to run it
setClass("StructField",
         representation(
           name = "character",
           type = "character"
         ))

# A Struct is a set of StructField objects, modeled as an environment
setClass("Struct",
         representation(
           struct = "environment"
         ))

# Initialize a Struct from a list of StructField objects
setMethod("initialize", signature = "Struct", definition =
  function(.Object, fields) {
    lapply(fields, function(field) {
      .Object@struct[[field@name]] <- field
    })
    return(.Object)
  })

# Overwrite the [[ operator to access the environment directly
setGeneric("[[")
setMethod("[[", signature = "Struct", definition =
  function(x, i) {
    return(x@struct[[i]])
  })

# Overwrite the [[<- operator to access the environment directly
setGeneric("[[<-")
setMethod("[[<-", signature = "Struct", definition =
  function(x, i, value) {
    if (is(value, "StructField")) {
      x@struct[[i]] <- value
    }
    return(x)
  })

field1 <- new("StructField", name = "x", type = "numeric")
field2 <- new("StructField", name = "y", type = "character")
s <- new("Struct", fields = list(field1, field2))
s[["x"]]
s[["z"]] <- new("StructField", name = "z", type = "logical")
{code}

> Method coltypes() to return the R column types of a DataFrame
> -
>
> Key: SPARK-10863
> URL: https://issues.apache.org/jira/browse/SPARK-10863
> Project: Spark
>  Issue Type: Sub-task
>  Components: SparkR
>Affects Versions: 1.5.0
>Reporter: Oscar D. Lara Yejas
>Assignee: Oscar D. Lara Yejas
> Fix For: 1.6.0
>
>







[jira] [Comment Edited] (SPARK-10863) Method coltypes() to return the R column types of a DataFrame

2015-11-13 Thread Oscar D. Lara Yejas (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15004363#comment-15004363
 ] 

Oscar D. Lara Yejas edited comment on SPARK-10863 at 11/13/15 5:58 PM:
---

[~felixcheung] Let me try to clarify a bit.

As suggested by [~shivaram], I implemented a fallback mechanism so that if 
there's no corresponding mapping from a Spark type into R's (i.e., mapping is 
NA), the same R type is returned.

The reason for this is that, in my opinion, having coltypes(df) return NA's 
would be a bit confusing from the user perspective. What would an NA type mean? 
Type not set or data inconsistency come to my mind if I were in the user's 
shoes.

I believe it all depends on the type of operations we want to support on 
Columns. For example, if the user wants to do:

df$column1 + 3
!df$column2
grep(df$column3, "regex")
df$column4 / df$column5

column1, column4, and column5 must be numeric/integer, column2 must be logical, 
and column3 must be character.

Now, what kind of operations are we planning to support on Array, Struct, and 
Map types? Depending on that, we could map them to lists/environments, or I 
could fix it so that, instead of returning the raw Spark type (e.g., map), I 
could return an R-friendly equivalent.

Hope this helps clarify, and let me know your thoughts.

Thanks!




was (Author: olarayej):
[~felixcheung] Let me try to clarify a bit.

As suggested by [~shivaram], I implemented a fallback mechanism so that if 
there's no corresponding mapping from a Spark type into R's (i.e., mapping is 
NA), the same R type is returned.

The reason for this is that, in my opinion, having coltypes(df) return NA's 
would be a bit confusing from the user perspective. What would an NA type mean? 
Type not set or data inconsistency come to my mind if I were in the user's 
shoes.

I believe it all depends on the type of operations we want to support on 
Columns. For example, if the user wants to do:

df$column1 + 3
!df$column2
grep(df$column3, "regex")
df$column4 / df$column5

column1, column4, and column5 must be numeric/integer, column2 must be logical, 
and column3 must be character.

Now, what kind of operations are we planning to support on Array, Struct, and 
Map types? Depending on that we could map them to lists/environment or leave 
them as they are right now.

Hope this helps clarify, and let me know your thoughts.

Thanks!



> Method coltypes() to return the R column types of a DataFrame
> -
>
> Key: SPARK-10863
> URL: https://issues.apache.org/jira/browse/SPARK-10863
> Project: Spark
>  Issue Type: Sub-task
>  Components: SparkR
>Affects Versions: 1.5.0
>Reporter: Oscar D. Lara Yejas
>Assignee: Oscar D. Lara Yejas
> Fix For: 1.6.0
>
>







[jira] [Comment Edited] (SPARK-10863) Method coltypes() to return the R column types of a DataFrame

2015-11-13 Thread Oscar D. Lara Yejas (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15004363#comment-15004363
 ] 

Oscar D. Lara Yejas edited comment on SPARK-10863 at 11/13/15 5:54 PM:
---

[~felixcheung] Let me try to clarify a bit.

As suggested by [~shivaram], I implemented a fallback mechanism so that if 
there's no corresponding mapping from a Spark type into R's (i.e., mapping is 
NA), the same R type is returned.

The reason for this is that, in my opinion, having coltypes(df) return NA's 
would be a bit confusing from the user perspective. What would an NA type mean? 
Type not set or data inconsistency come to my mind if I were in the user's 
shoes.

I believe it all depends on the type of operations we want to support on 
Columns. For example, if the user wants to do:

df$column1 + 3
!df$column2
grep(df$column3, "regex")
df$column4 / df$column5

column1, column4, and column5 must be numeric/integer, column2 must be logical, 
and column3 must be character.

Now, what kind of operations are we planning to support on Array, Struct, and 
Map types? Depending on that we could map them to lists/environment or leave 
them as they are right now.

Hope this helps clarify, and let me know your thoughts.

Thanks!




was (Author: olarayej):
[~felixcheung] Let me try to clarify a bit.

As suggested by [~shivaram], I implemented a fallback mechanism so that if 
there's no corresponding mapping from a Spark type into R's (i.e., mapping is 
NA), the same R type is returned.

The reason for this is that, in my opinion, having coltypes(df) return NA's 
would be a bit confusing from the user perspective. What would an NA type mean? 
Type not set or data inconsistency come to my mind if I were in the user's 
shoes.

I believe it all depends on the type of operations we want to support on 
Columns. For example, if the user wants to do:

df$column1 + 3
!df$column2
grep(df$column3, "regex")
df$column4 / df$column5

column1, column4, and column5 must be numeric/integer, column2 must be logical, 
and column3 must be character.

Now, what kind of operations are we planning to support on Array, Struct, and 
Map types? Depending on that we could map them to lists/environment or leave 
them as they are right now.

Hope this helps clarify, and let me know your thoughts.

Thanks!



> Method coltypes() to return the R column types of a DataFrame
> -
>
> Key: SPARK-10863
> URL: https://issues.apache.org/jira/browse/SPARK-10863
> Project: Spark
>  Issue Type: Sub-task
>  Components: SparkR
>Affects Versions: 1.5.0
>Reporter: Oscar D. Lara Yejas
>Assignee: Oscar D. Lara Yejas
> Fix For: 1.6.0
>
>







[jira] [Commented] (SPARK-10863) Method coltypes() to return the R column types of a DataFrame

2015-11-13 Thread Oscar D. Lara Yejas (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15004363#comment-15004363
 ] 

Oscar D. Lara Yejas commented on SPARK-10863:
-

[~felixcheung] Let me try to clarify a bit.

As suggested by [~shivaram], I implemented a fallback mechanism so that if 
there's no corresponding mapping from a Spark type into R's (i.e., mapping is 
NA), the same R type is returned.

The reason for this is that, in my opinion, having coltypes(df) return NA's 
would be a bit confusing from the user perspective. What would an NA type mean? 
Type not set or data inconsistency come to my mind if I were in the user's 
shoes.

I believe it all depends on the type of operations we want to support on 
Columns. For example, if the user wants to do:

df$column1 + 3
!df$column2
grep(df$column3, "regex")
df$column4 / df$column5

column1, column4, and column5 must be numeric/integer, column2 must be logical, 
and column3 must be character.

Now, what kind of operations are we planning to support on Array, Struct, and 
Map types? Depending on that we could map them to lists/environment or leave 
them as they are right now.

Hope this helps clarify, and let me know your thoughts.

Thanks!



> Method coltypes() to return the R column types of a DataFrame
> -
>
> Key: SPARK-10863
> URL: https://issues.apache.org/jira/browse/SPARK-10863
> Project: Spark
>  Issue Type: Sub-task
>  Components: SparkR
>Affects Versions: 1.5.0
>Reporter: Oscar D. Lara Yejas
>Assignee: Oscar D. Lara Yejas
> Fix For: 1.6.0
>
>







[jira] [Updated] (SPARK-11031) SparkR str() method on DataFrame objects

2015-10-09 Thread Oscar D. Lara Yejas (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Oscar D. Lara Yejas updated SPARK-11031:

Issue Type: Sub-task  (was: New Feature)
Parent: SPARK-9315

> SparkR str() method on DataFrame objects
> 
>
> Key: SPARK-11031
> URL: https://issues.apache.org/jira/browse/SPARK-11031
> Project: Spark
>  Issue Type: Sub-task
>Reporter: Oscar D. Lara Yejas
>







[jira] [Commented] (SPARK-11031) SparkR str() method on DataFrame objects

2015-10-09 Thread Oscar D. Lara Yejas (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14950988#comment-14950988
 ] 

Oscar D. Lara Yejas commented on SPARK-11031:
-

I'm working on this one. It depends on coltypes().

> SparkR str() method on DataFrame objects
> 
>
> Key: SPARK-11031
> URL: https://issues.apache.org/jira/browse/SPARK-11031
> Project: Spark
>  Issue Type: New Feature
>Reporter: Oscar D. Lara Yejas
>







[jira] [Created] (SPARK-11031) SparkR str() method on DataFrame objects

2015-10-09 Thread Oscar D. Lara Yejas (JIRA)
Oscar D. Lara Yejas created SPARK-11031:
---

 Summary: SparkR str() method on DataFrame objects
 Key: SPARK-11031
 URL: https://issues.apache.org/jira/browse/SPARK-11031
 Project: Spark
  Issue Type: New Feature
Reporter: Oscar D. Lara Yejas









[jira] [Commented] (SPARK-10863) Method coltypes() to return the R column types of a DataFrame

2015-09-28 Thread Oscar D. Lara Yejas (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14934322#comment-14934322
 ] 

Oscar D. Lara Yejas commented on SPARK-10863:
-

I have changed this JIRA to a sub-task of SPARK-9315. Thanks!
-Oscar

> Method coltypes() to return the R column types of a DataFrame
> -
>
> Key: SPARK-10863
> URL: https://issues.apache.org/jira/browse/SPARK-10863
> Project: Spark
>  Issue Type: Sub-task
>  Components: SparkR
>Affects Versions: 1.5.0
>Reporter: Oscar D. Lara Yejas
> Fix For: 1.5.1
>
>







[jira] [Updated] (SPARK-10863) Method coltypes() to return the R column types of a DataFrame

2015-09-28 Thread Oscar D. Lara Yejas (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Oscar D. Lara Yejas updated SPARK-10863:

Issue Type: Sub-task  (was: Task)
Parent: SPARK-9315

> Method coltypes() to return the R column types of a DataFrame
> -
>
> Key: SPARK-10863
> URL: https://issues.apache.org/jira/browse/SPARK-10863
> Project: Spark
>  Issue Type: Sub-task
>  Components: SparkR
>Affects Versions: 1.5.0
>Reporter: Oscar D. Lara Yejas
> Fix For: 1.5.1
>
>







[jira] [Updated] (SPARK-10863) Method coltypes() to return the R column types of a DataFrame

2015-09-28 Thread Oscar D. Lara Yejas (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Oscar D. Lara Yejas updated SPARK-10863:

Issue Type: Task  (was: New Feature)

> Method coltypes() to return the R column types of a DataFrame
> -
>
> Key: SPARK-10863
> URL: https://issues.apache.org/jira/browse/SPARK-10863
> Project: Spark
>  Issue Type: Task
>  Components: SparkR
>Affects Versions: 1.5.0
>Reporter: Oscar D. Lara Yejas
> Fix For: 1.5.1
>
>







[jira] [Comment Edited] (SPARK-10863) Method coltypes() to return the R column types of a DataFrame

2015-09-28 Thread Oscar D. Lara Yejas (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14934307#comment-14934307
 ] 

Oscar D. Lara Yejas edited comment on SPARK-10863 at 9/28/15 11:13 PM:
---

Spark data types are different from R's. For example:

Spark -> R
double -> numeric
string -> character
int ->  integer

Method coltypes() returns the corresponding R types of a Spark DataFrame.

My implementation uses method dtypes() under the covers. 


was (Author: olarayej):
Spark data types are different than R's. For example:

Spark R
double -> numeric
string -> character
int ->  integer

Method coltypes() shows the corresponding R types of a Spark DataFrame

My implementation uses method dtypes() under the covers. 

> Method coltypes() to return the R column types of a DataFrame
> -
>
> Key: SPARK-10863
> URL: https://issues.apache.org/jira/browse/SPARK-10863
> Project: Spark
>  Issue Type: New Feature
>  Components: SparkR
>Affects Versions: 1.5.0
>Reporter: Oscar D. Lara Yejas
> Fix For: 1.5.1
>
>







[jira] [Commented] (SPARK-10863) Method coltypes() to return the R column types of a DataFrame

2015-09-28 Thread Oscar D. Lara Yejas (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14934307#comment-14934307
 ] 

Oscar D. Lara Yejas commented on SPARK-10863:
-

Spark data types are different from R's. For example:

Spark -> R
double -> numeric
string -> character
int -> integer

Method coltypes() returns the corresponding R types of a Spark DataFrame.

My implementation uses method dtypes() under the covers. 
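The Spark-to-R mapping described above can be sketched as a simple named-vector lookup. This is a hypothetical illustration (the variable names are made up; the real coltypes() builds on dtypes() but is not this code):

```r
# Hypothetical sketch of a Spark-to-R type lookup -- not the actual
# SparkR implementation.
sparkToRTypes <- c(double  = "numeric",
                   string  = "character",
                   int     = "integer",
                   boolean = "logical")

# dtypes()-style input: a list of (columnName, sparkType) pairs
dtypesOut <- list(c("Sepal_Length", "double"),
                  c("Species", "string"))

# Map each column's Spark type to its R equivalent
coltypesSketch <- vapply(dtypesOut,
                         function(p) sparkToRTypes[[p[2]]],
                         character(1))
coltypesSketch  # "numeric" "character"
```

A fallback for unmapped Spark types (the NA case discussed earlier) could be layered on top by checking whether the lookup key exists before indexing.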

> Method coltypes() to return the R column types of a DataFrame
> -
>
> Key: SPARK-10863
> URL: https://issues.apache.org/jira/browse/SPARK-10863
> Project: Spark
>  Issue Type: New Feature
>  Components: SparkR
>Affects Versions: 1.5.0
>Reporter: Oscar D. Lara Yejas
> Fix For: 1.5.1
>
>







[jira] [Commented] (SPARK-10863) Method coltypes() to return the R column types of a DataFrame

2015-09-28 Thread Oscar D. Lara Yejas (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14934233#comment-14934233
 ] 

Oscar D. Lara Yejas commented on SPARK-10863:
-

I'm working on this one.

-Oscar

> Method coltypes() to return the R column types of a DataFrame
> -
>
> Key: SPARK-10863
> URL: https://issues.apache.org/jira/browse/SPARK-10863
> Project: Spark
>  Issue Type: New Feature
>  Components: SparkR
>Affects Versions: 1.5.0
>Reporter: Oscar D. Lara Yejas
> Fix For: 1.5.1
>
>







[jira] [Created] (SPARK-10863) Method coltypes() to return the R column types of a DataFrame

2015-09-28 Thread Oscar D. Lara Yejas (JIRA)
Oscar D. Lara Yejas created SPARK-10863:
---

 Summary: Method coltypes() to return the R column types of a 
DataFrame
 Key: SPARK-10863
 URL: https://issues.apache.org/jira/browse/SPARK-10863
 Project: Spark
  Issue Type: New Feature
  Components: SparkR
Affects Versions: 1.5.0
Reporter: Oscar D. Lara Yejas
 Fix For: 1.5.1









[jira] [Commented] (SPARK-10807) Add as.data.frame() as a synonym for collect()

2015-09-24 Thread Oscar D. Lara Yejas (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14906760#comment-14906760
 ] 

Oscar D. Lara Yejas commented on SPARK-10807:
-

I'm working on this one.

Thanks,
Oscar

> Add as.data.frame() as a synonym for collect()
> --
>
> Key: SPARK-10807
> URL: https://issues.apache.org/jira/browse/SPARK-10807
> Project: Spark
>  Issue Type: New Feature
>  Components: SparkR
>Affects Versions: 1.5.0
>Reporter: Oscar D. Lara Yejas
>Priority: Minor
> Fix For: 1.5.1
>
>







[jira] [Created] (SPARK-10807) Add as.data.frame() as a synonym for collect()

2015-09-24 Thread Oscar D. Lara Yejas (JIRA)
Oscar D. Lara Yejas created SPARK-10807:
---

 Summary: Add as.data.frame() as a synonym for collect()
 Key: SPARK-10807
 URL: https://issues.apache.org/jira/browse/SPARK-10807
 Project: Spark
  Issue Type: New Feature
  Components: SparkR
Affects Versions: 1.5.0
Reporter: Oscar D. Lara Yejas
Priority: Minor
 Fix For: 1.5.1





