[jira] [Commented] (SPARK-9427) Add expression functions in SparkR

2015-11-18 Thread Michael Armbrust (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15012137#comment-15012137
 ] 

Michael Armbrust commented on SPARK-9427:
-

What is the status here? Are we actually trying to get more of this into Spark 
1.6, or can I drop that target?

> Add expression functions in SparkR
> --
>
> Key: SPARK-9427
> URL: https://issues.apache.org/jira/browse/SPARK-9427
> Project: Spark
> Issue Type: New Feature
> Components: SparkR
> Reporter: Yu Ishikawa
>
> The list of functions to add is based on SQL's functions. And it would be 
> better to add them in one shot PR.
> https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/functions.scala



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9427) Add expression functions in SparkR

2015-08-19 Thread Yu Ishikawa (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14702552#comment-14702552
 ] 

Yu Ishikawa commented on SPARK-9427:


Alright.




[jira] [Commented] (SPARK-9427) Add expression functions in SparkR

2015-08-19 Thread Yu Ishikawa (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14702563#comment-14702563
 ] 

Yu Ishikawa commented on SPARK-9427:


I see. Thank you for letting me know. 




[jira] [Commented] (SPARK-9427) Add expression functions in SparkR

2015-08-18 Thread Shivaram Venkataraman (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14702519#comment-14702519
 ] 

Shivaram Venkataraman commented on SPARK-9427:
--

[~yuu.ishik...@gmail.com] I retargeted some of the sub-tasks to 1.5.1. It 
shouldn't affect any of the PRs or the development workflow. It just means that 
we can continue merging the PRs into branch-1.5 and, depending on when RCs get 
cut, we will update the fix version.




[jira] [Commented] (SPARK-9427) Add expression functions in SparkR

2015-08-17 Thread Davies Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14699843#comment-14699843
 ] 

Davies Liu commented on SPARK-9427:
---

[~yu_ishikawa] `rand` does work in PySpark (Python 2.7):
{code}
>>> from pyspark.sql.functions import rand
>>> sqlContext.range(10).select(rand(2), "id").show()
+-------------------+---+
|             rand()| id|
+-------------------+---+
| 0.6038577325006693|  0|
| 0.6319470735268434|  1|
|0.22327628846133507|  2|
|0.24223739932588373|  3|
| 0.8395518879513995|  4|
| 0.5662927043813443|  5|
| 0.2057736041310516|  6|
| 0.3408245196642603|  7|
|0.08641290347537589|  8|
|0.46561147527615276|  9|
+-------------------+---+
{code}




[jira] [Commented] (SPARK-9427) Add expression functions in SparkR

2015-08-17 Thread Reynold Xin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700052#comment-14700052
 ] 

Reynold Xin commented on SPARK-9427:


Re-targeted.





[jira] [Commented] (SPARK-9427) Add expression functions in SparkR

2015-08-17 Thread Shivaram Venkataraman (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700058#comment-14700058
 ] 

Shivaram Venkataraman commented on SPARK-9427:
--

Well, half of the functions are already in branch-1.5, and I guess we should 
have PRs for some of the other simpler parts (like 9856) come in soon. The more 
complex ones, which require changing SerDe, might not be appropriate for 1.5, 
but my plan is to get in as many of the simple ones as we can?




[jira] [Commented] (SPARK-9427) Add expression functions in SparkR

2015-08-17 Thread Davies Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700044#comment-14700044
 ] 

Davies Liu commented on SPARK-9427:
---

Should we target this for 1.6? 




[jira] [Commented] (SPARK-9427) Add expression functions in SparkR

2015-08-17 Thread Yu Ishikawa (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14699072#comment-14699072
 ] 

Yu Ishikawa commented on SPARK-9427:


[~shivaram] and [~davies]

How do we convert the R {{integer}} type to the Scala {{Long}} type?
I am having trouble implementing the {{rand(seed: Long)}} function in SparkR. 
The R {{integer}} type is recognized as Scala {{Int}}, and the R {{numeric}} 
type is recognized as Scala {{Double}}. So I wonder how I should deal with 
64-bit integers in R. I think we should add {{rand(seed: Int)}} to spark.sql in 
Scala. What do you think?

Plus, I guess PySpark {{rand}} doesn't work on Python 2.x for the same reason, 
because {{int}} on Python 2.x is recognized as the Scala {{Integer}} type.




[jira] [Commented] (SPARK-9427) Add expression functions in SparkR

2015-08-17 Thread Shivaram Venkataraman (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14699080#comment-14699080
 ] 

Shivaram Venkataraman commented on SPARK-9427:
--

Yeah, I think the simplest thing might be to add a version of `rand(seed: Int)` 
(or `rand(seed: Double)` if we want to maintain precision?) to the API and do 
a cast in Scala to call the version with Long.

cc [~rxin]
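
To make the trade-off between the two proposed overloads more concrete, here is a minimal Python sketch (not Spark code) of the integer ranges involved. It assumes only that R `integer` and Scala `Int` are 32-bit signed, that Scala `Long` is 64-bit signed, and that a `Double` carries a 53-bit significand. The helper names `int_seed_to_long` and `double_seed_to_long` are hypothetical and exist only for this illustration.

```python
INT_MIN, INT_MAX = -2**31, 2**31 - 1      # range of Scala Int / R integer
LONG_MIN, LONG_MAX = -2**63, 2**63 - 1    # range of Scala Long


def int_seed_to_long(seed):
    """Widen a 32-bit seed to a 64-bit seed; every Int is also a valid Long."""
    if not INT_MIN <= seed <= INT_MAX:
        raise ValueError("seed does not fit in a 32-bit Int")
    return seed


def double_seed_to_long(seed):
    """Truncate a finite floating-point seed toward zero to a 64-bit value.

    Unlike a real Scala cast, out-of-range input raises here instead of
    saturating; that choice is an assumption made to keep the sketch simple.
    """
    value = int(seed)
    if not LONG_MIN <= value <= LONG_MAX:
        raise ValueError("seed does not fit in a 64-bit Long")
    return value


# An Int-based overload loses nothing when widened, but it can only express
# 2**32 distinct seeds.
print(int_seed_to_long(2))

# A Double-based overload reaches further, yet integers above 2**53 are no
# longer all representable: 2**53 + 1 is rounded to 2**53 before the cast.
print(double_seed_to_long(float(2**53 + 1)) == 2**53)
```

Either overload would cover only part of the `Long` seed space, so the choice is mostly about which R type callers are expected to pass.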




[jira] [Commented] (SPARK-9427) Add expression functions in SparkR

2015-08-11 Thread Yu Ishikawa (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14692600#comment-14692600
 ] 

Yu Ishikawa commented on SPARK-9427:


[~shivaram] On reflection, I'd like to split this issue into a few sub-issues, 
since it is quite difficult to add all of the listed expressions at once, and a 
single PR for this issue would be hard to review. I think we could classify 
them into at least three types in SparkR. What do you think?

1. Add expressions whose parameters are only {{(Column)}} or {{(Column, 
Column)}}, like {{md5(e: Column)}}
2. Add expressions whose parameters are a little more complicated, like 
{{conv(num: Column, fromBase: Int, toBase: Int)}}
3. Add expressions which conflict with an already existing generic, like 
{{coalesce(e: Column*)}}

{{1}} is not a difficult task: it only requires extracting method definitions 
from the Scala code, and I think we rarely need to consider conflicts with the 
current SparkR code.
However, {{2}} and {{3}} are a little harder because of their complexity. 
For example, in {{3}}, if we must modify an existing R generic to accommodate 
new expressions, we should check whether the modification affects the existing 
code or not.
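
The three-way split described above could be sketched roughly as follows. This is only an illustration in Python, assuming the signatures are available as text lines in the `def name(params): Column` form. The `classify` helper and the `EXISTING_R_GENERICS` set are hypothetical; the real list of clashing generics would have to come from the SparkR sources.

```python
import re

# Hypothetical set of names that already exist as R generics.
EXISTING_R_GENERICS = {"coalesce"}

# Matches lines such as "def md5(e: Column): Column".
SIGNATURE = re.compile(r"def\s+(\w+)\((.*)\):\s*Column")


def classify(signature, existing_generics=EXISTING_R_GENERICS):
    """Return 1, 2 or 3 for the three proposed groups of expressions."""
    match = SIGNATURE.match(signature.strip())
    if match is None:
        raise ValueError("not a recognised signature: %r" % signature)
    name, params = match.groups()
    if name in existing_generics:
        return 3  # clashes with an already existing generic
    # Keep only the declared type of each parameter: "e: Column" -> "Column".
    types = [p.split(":", 1)[1].strip() for p in params.split(",") if p.strip()]
    if types in (["Column"], ["Column", "Column"]):
        return 1  # parameters are only (Column) or (Column, Column)
    return 2  # everything else, including zero-argument functions


for line in [
    "def md5(e: Column): Column",
    "def conv(num: Column, fromBase: Int, toBase: Int): Column",
    "def coalesce(e: Column*): Column",
]:
    print(classify(line), line)
```

Checking for a name clash first means an expression such as `coalesce` lands in group 3 regardless of how simple its parameter list is.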




[jira] [Commented] (SPARK-9427) Add expression functions in SparkR

2015-08-11 Thread Shivaram Venkataraman (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14692662#comment-14692662
 ] 

Shivaram Venkataraman commented on SPARK-9427:
--

[~yuu.ishik...@gmail.com] Breaking it into 3 PRs sounds good to me. Do you have 
an idea of how many functions there are of each type?




[jira] [Commented] (SPARK-9427) Add expression functions in SparkR

2015-08-11 Thread Yu Ishikawa (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14692777#comment-14692777
 ] 

Yu Ishikawa commented on SPARK-9427:


[~shivaram] I haven't worked out the exact number of each type. However, I 
estimate them as follows. Note that the counts include functions which have 
already been added to SparkR.

1 = 50
2 and 3 = 51




[jira] [Commented] (SPARK-9427) Add expression functions in SparkR

2015-08-11 Thread Yu Ishikawa (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14692607#comment-14692607
 ] 

Yu Ishikawa commented on SPARK-9427:


h3. Memo

These are the expressions which we should add (including existing expressions).
I extracted them from Scala's {{functions.scala}} with {{grep}}.

{noformat}
def abs(e: Column): Column
def acos(columnName: String): Column
def acos(e: Column): Column
def add_months(startDate: Column, numMonths: Int): Column
def approxCountDistinct(columnName: String): Column
def approxCountDistinct(columnName: String, rsd: Double): Column
def approxCountDistinct(e: Column): Column
def approxCountDistinct(e: Column, rsd: Double): Column
def array(colName: String, colNames: String*): Column
def array(cols: Column*): Column
def array_contains(column: Column, value: Any): Column
def asc(columnName: String): Column
def ascii(e: Column): Column
def asin(columnName: String): Column
def asin(e: Column): Column
def atan(columnName: String): Column
def atan(e: Column): Column
def atan2(l: Column, r: Column): Column
def atan2(l: Column, r: Double): Column
def atan2(l: Column, rightName: String): Column
def atan2(l: Double, r: Column): Column
def atan2(l: Double, rightName: String): Column
def atan2(leftName: String, r: Column): Column
def atan2(leftName: String, r: Double): Column
def atan2(leftName: String, rightName: String): Column
def avg(columnName: String): Column
def avg(e: Column): Column
def base64(e: Column): Column
def bin(columnName: String): Column
def bin(e: Column): Column
def bitwiseNOT(e: Column): Column
def cbrt(columnName: String): Column
def cbrt(e: Column): Column
def ceil(columnName: String): Column
def ceil(e: Column): Column
def coalesce(e: Column*): Column
def concat(exprs: Column*): Column
def concat_ws(sep: String, exprs: Column*): Column
def conv(num: Column, fromBase: Int, toBase: Int): Column
def cos(columnName: String): Column
def cos(e: Column): Column
def cosh(columnName: String): Column
def cosh(e: Column): Column
def count(columnName: String): Column
def count(e: Column): Column
def countDistinct(columnName: String, columnNames: String*): Column
def countDistinct(expr: Column, exprs: Column*): Column
def crc32(e: Column): Column
def cumeDist(): Column
def current_date(): Column
def current_timestamp(): Column
def date_add(start: Column, days: Int): Column
def date_format(dateExpr: Column, format: String): Column
def date_sub(start: Column, days: Int): Column
def datediff(end: Column, start: Column): Column
def dayofmonth(e: Column): Column
def dayofyear(e: Column): Column
def decode(value: Column, charset: String): Column
def denseRank(): Column
def desc(columnName: String): Column
def encode(value: Column, charset: String): Column
def exp(columnName: String): Column
def exp(e: Column): Column
def explode(e: Column): Column
def expm1(columnName: String): Column
def expm1(e: Column): Column
def expr(expr: String): Column
def factorial(e: Column): Column
def first(columnName: String): Column
def first(e: Column): Column
def floor(columnName: String): Column
def floor(e: Column): Column
def format_number(x: Column, d: Int): Column
def format_string(format: String, arguments: Column*): Column
def from_unixtime(ut: Column): Column
def from_unixtime(ut: Column, f: String): Column
def from_utc_timestamp(ts: Column, tz: String): Column
def greatest(columnName: String, columnNames: String*): Column
def greatest(exprs: Column*): Column
def hex(column: Column): Column
def hour(e: Column): Column
def hypot(l: Column, r: Column): Column
def hypot(l: Column, r: Double): Column
def hypot(l: Column, rightName: String): Column
def hypot(l: Double, r: Column): Column
def hypot(l: Double, rightName: String): Column
def hypot(leftName: String, r: Column): Column
def hypot(leftName: String, r: Double): Column
def hypot(leftName: String, rightName: String): Column
def initcap(e: Column): Column
def inputFileName(): Column
def instr(str: Column, substring: String): Column
def isNaN(e: Column): Column
def lag(columnName: String, offset: Int): Column
def lag(columnName: String, offset: Int, defaultValue: Any): Column
def lag(e: Column, offset: Int): Column
def lag(e: Column, offset: Int, defaultValue: Any): Column
def last(columnName: String): Column
def last(e: Column): Column
def last_day(e: Column): Column
def lead(columnName: String, offset: Int): Column
def lead(columnName: String, offset: Int, defaultValue: Any): Column
def lead(e: Column, offset: Int): Column
def lead(e: Column, offset: Int, defaultValue: Any): Column
def least(columnName: String, columnNames: String*): Column
def least(exprs: Column*): Column
def length(e: Column): Column
def levenshtein(l: Column, r: Column): Column
def lit(literal: Any): Column
def locate(substr: String, str: Column): Column
def locate(substr: String, str: Column, pos: Int): Column
def log(base: Double, a: Column): Column
def 

[jira] [Commented] (SPARK-9427) Add expression functions in SparkR

2015-07-28 Thread Yu Ishikawa (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14645464#comment-14645464
 ] 

Yu Ishikawa commented on SPARK-9427:


I'll work on this issue.

> Add expression functions in SparkR
> --
>
> Key: SPARK-9427
> URL: https://issues.apache.org/jira/browse/SPARK-9427
> Project: Spark
> Issue Type: Sub-task
> Components: SparkR
> Reporter: Yu Ishikawa
>
> The list of functions to add is based on SQL's functions. And it would be 
> better to add them in one shot PR.
> https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/functions.scala


