[jira] [Commented] (SPARK-9427) Add expression functions in SparkR
[ https://issues.apache.org/jira/browse/SPARK-9427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15012137#comment-15012137 ]

Michael Armbrust commented on SPARK-9427:

What is the status here? Are we actually trying to get more of this into Spark 1.6, or can I drop that target?

> Add expression functions in SparkR
> ----------------------------------
>
>            Key: SPARK-9427
>            URL: https://issues.apache.org/jira/browse/SPARK-9427
>        Project: Spark
>     Issue Type: New Feature
>     Components: SparkR
>       Reporter: Yu Ishikawa
>
> The list of functions to add is based on SQL's functions, and it would be
> better to add them in a single PR.
> https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/functions.scala

--
This message was sent by Atlassian JIRA (v6.3.4#6332)

To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[ https://issues.apache.org/jira/browse/SPARK-9427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14702552#comment-14702552 ]

Yu Ishikawa commented on SPARK-9427:

Alright.
[ https://issues.apache.org/jira/browse/SPARK-9427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14702563#comment-14702563 ]

Yu Ishikawa commented on SPARK-9427:

I see. Thank you for letting me know.
[ https://issues.apache.org/jira/browse/SPARK-9427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14702519#comment-14702519 ]

Shivaram Venkataraman commented on SPARK-9427:

[~yuu.ishik...@gmail.com] I retargeted some of the sub-tasks to 1.5.1. It shouldn't affect any of the PRs or the development workflow. It just means that we can continue merging the PRs into branch-1.5, and depending on when RCs get cut, we will update the fix version.
[ https://issues.apache.org/jira/browse/SPARK-9427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14699843#comment-14699843 ]

Davies Liu commented on SPARK-9427:

[~yu_ishikawa] {{rand}} does work in PySpark (Python 2.7):

{code}
>>> sqlContext.range(10).select(rand(2), "id").show()
+-------------------+---+
|             rand()| id|
+-------------------+---+
| 0.6038577325006693|  0|
| 0.6319470735268434|  1|
|0.22327628846133507|  2|
|0.24223739932588373|  3|
| 0.8395518879513995|  4|
| 0.5662927043813443|  5|
| 0.2057736041310516|  6|
| 0.3408245196642603|  7|
|0.08641290347537589|  8|
|0.46561147527615276|  9|
+-------------------+---+
{code}
[ https://issues.apache.org/jira/browse/SPARK-9427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700052#comment-14700052 ]

Reynold Xin commented on SPARK-9427:

Re-targeted.
[ https://issues.apache.org/jira/browse/SPARK-9427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700058#comment-14700058 ]

Shivaram Venkataraman commented on SPARK-9427:

Well, half of the functions are already in branch-1.5, and I expect PRs for some of the other simpler parts (like 9856) to come in soon. The more complex ones, which require changing the SerDe, might not be appropriate for 1.5, but my plan is to get as many of the simple ones in as we can.
[ https://issues.apache.org/jira/browse/SPARK-9427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700044#comment-14700044 ]

Davies Liu commented on SPARK-9427:

Should we target this for 1.6?
[ https://issues.apache.org/jira/browse/SPARK-9427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14699072#comment-14699072 ]

Yu Ishikawa commented on SPARK-9427:

[~shivaram] and [~davies] How do we convert the R {{integer}} type to the Scala {{Long}} type? I am having trouble implementing the {{rand(seed: Long)}} function in SparkR. The R {{integer}} type is recognized as a Scala {{Int}}, and the R {{numeric}} type is recognized as a Scala {{Double}}, so I wonder how I should deal with 64-bit integers in R. I think we should add {{rand(seed: Int)}} to spark.sql on the Scala side. What do you think?

Also, I guess PySpark's {{rand}} doesn't work on Python 2.x for the same reason, because {{int}} on Python 2.x is recognized as the Scala {{Integer}} type.
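The range mismatch Yu describes can be checked with plain arithmetic. A hedged sketch (it assumes R integers are 32-bit signed values, which is how R documents its integer type, and Scala's Long is 64-bit signed):

```python
# R's integer type is a 32-bit signed value, while Scala's Long is a
# 64-bit signed value, so most Long seeds do not fit in an R integer.
INT32_MAX = 2**31 - 1  # largest value an R integer can hold
INT64_MAX = 2**63 - 1  # largest value a Scala Long can hold

seed = 2**40  # a valid seed for rand(seed: Long)
print(seed <= INT32_MAX)  # False: cannot round-trip through an R integer
print(seed <= INT64_MAX)  # True: fits comfortably in a Long
```

This is why R's {{integer}}/{{numeric}} types (mapped to Scala {{Int}}/{{Double}}) cannot faithfully carry an arbitrary {{Long}} seed across the SerDe boundary.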
[ https://issues.apache.org/jira/browse/SPARK-9427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14699080#comment-14699080 ]

Shivaram Venkataraman commented on SPARK-9427:

Yeah, I think the simplest thing might be to add a version of `rand(seed: Int)` (or `rand(seed: Double)` if we want to maintain precision?) to the API and do a cast in Scala to call the version with Long. cc [~rxin]
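Shivaram's suggestion amounts to adding a narrower overload that widens its argument before delegating. A minimal sketch of that pattern (in Python for illustration; `rand_long` and `rand_int` are hypothetical stand-ins, not real Spark APIs):

```python
def rand_long(seed):
    """Hypothetical stand-in for the existing Scala rand(seed: Long)."""
    return "rand(seed={})".format(seed)

def rand_int(seed):
    """Proposed narrower entry point: every 32-bit int is exactly
    representable as a 64-bit long, so the widening cast is lossless."""
    if not (-2**31 <= seed < 2**31):
        raise ValueError("seed must fit in a 32-bit int")
    return rand_long(seed)  # widen and delegate to the Long version

print(rand_int(42))  # rand(seed=42)
```

The same shape in Scala would be a one-line overload, e.g. an Int-accepting `rand` that calls the Long-accepting one after `seed.toLong`.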
[ https://issues.apache.org/jira/browse/SPARK-9427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14692600#comment-14692600 ]

Yu Ishikawa commented on SPARK-9427:

[~shivaram] After all, I'd like to split this issue into a few sub-issues, since it is quite difficult to add all of the listed expressions at once, and a single PR for this issue would be hard to review. I think we can classify them into at least three types in SparkR. What do you think?

1. Expressions whose parameters are only {{(Column)}} or {{(Column, Column)}}, like {{md5(e: Column)}}
2. Expressions whose parameters are a little more complicated, like {{conv(num: Column, fromBase: Int, toBase: Int)}}
3. Expressions which conflict with an already existing generic, like {{coalesce(e: Column*)}}

Type 1 is not a difficult task, since it is mostly a matter of extracting the method definitions from the Scala code, and we rarely need to consider conflicts with the current SparkR code. However, types 2 and 3 are a little harder because of their complexity. For example, with type 3, if we must modify an existing R generic to accommodate a new expression, we should check whether the modification affects the existing code.
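To illustrate why the type-1 functions are the mechanical part: every {{(Column)}}-signature wrapper has the same one-line delegating shape, so wrappers can be stamped out from a list of names. A rough sketch in Python (names are illustrative; SparkR does the equivalent in R, forwarding each call to the JVM):

```python
def make_unary_wrapper(name):
    """Build a wrapper that forwards one column argument to the JVM
    function of the same name (represented here as a string)."""
    def wrapper(column):
        return "{}({})".format(name, column)  # stands in for a JVM call
    wrapper.__name__ = name
    return wrapper

# Type-1 functions all share this shape, so generation is mechanical:
for fn in ["md5", "ascii", "base64"]:
    globals()[fn] = make_unary_wrapper(fn)

print(md5("name"))  # md5(name)
```

Types 2 and 3 resist this treatment precisely because their signatures vary (extra non-Column parameters) or collide with generics that already exist in R.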
[ https://issues.apache.org/jira/browse/SPARK-9427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14692662#comment-14692662 ]

Shivaram Venkataraman commented on SPARK-9427:

[~yuu.ishik...@gmail.com] Breaking it into 3 PRs sounds good to me. Do you have an idea of how many functions there are of each type?
[ https://issues.apache.org/jira/browse/SPARK-9427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14692777#comment-14692777 ]

Yu Ishikawa commented on SPARK-9427:

[~shivaram] I haven't figured out the exact number of each type, but I estimated them as follows. Note that the counts include functions which have already been added to SparkR.

- Type 1: 50
- Types 2 and 3: 51
[ https://issues.apache.org/jira/browse/SPARK-9427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14692607#comment-14692607 ]

Yu Ishikawa commented on SPARK-9427:

h3. Memo

These are the expressions which we should add (including existing expressions). I extracted them from Scala's {{functions.scala}} with {{grep}}:

{noformat}
def abs(e: Column): Column
def acos(columnName: String): Column
def acos(e: Column): Column
def add_months(startDate: Column, numMonths: Int): Column
def approxCountDistinct(columnName: String): Column
def approxCountDistinct(columnName: String, rsd: Double): Column
def approxCountDistinct(e: Column): Column
def approxCountDistinct(e: Column, rsd: Double): Column
def array(colName: String, colNames: String*): Column
def array(cols: Column*): Column
def array_contains(column: Column, value: Any): Column
def asc(columnName: String): Column
def ascii(e: Column): Column
def asin(columnName: String): Column
def asin(e: Column): Column
def atan(columnName: String): Column
def atan(e: Column): Column
def atan2(l: Column, r: Column): Column
def atan2(l: Column, r: Double): Column
def atan2(l: Column, rightName: String): Column
def atan2(l: Double, r: Column): Column
def atan2(l: Double, rightName: String): Column
def atan2(leftName: String, r: Column): Column
def atan2(leftName: String, r: Double): Column
def atan2(leftName: String, rightName: String): Column
def avg(columnName: String): Column
def avg(e: Column): Column
def base64(e: Column): Column
def bin(columnName: String): Column
def bin(e: Column): Column
def bitwiseNOT(e: Column): Column
def cbrt(columnName: String): Column
def cbrt(e: Column): Column
def ceil(columnName: String): Column
def ceil(e: Column): Column
def coalesce(e: Column*): Column
def concat(exprs: Column*): Column
def concat_ws(sep: String, exprs: Column*): Column
def conv(num: Column, fromBase: Int, toBase: Int): Column
def cos(columnName: String): Column
def cos(e: Column): Column
def cosh(columnName: String): Column
def cosh(e: Column): Column
def count(columnName: String): Column
def count(e: Column): Column
def countDistinct(columnName: String, columnNames: String*): Column
def countDistinct(expr: Column, exprs: Column*): Column
def crc32(e: Column): Column
def cumeDist(): Column
def current_date(): Column
def current_timestamp(): Column
def date_add(start: Column, days: Int): Column
def date_format(dateExpr: Column, format: String): Column
def date_sub(start: Column, days: Int): Column
def datediff(end: Column, start: Column): Column
def dayofmonth(e: Column): Column
def dayofyear(e: Column): Column
def decode(value: Column, charset: String): Column
def denseRank(): Column
def desc(columnName: String): Column
def encode(value: Column, charset: String): Column
def exp(columnName: String): Column
def exp(e: Column): Column
def explode(e: Column): Column
def expm1(columnName: String): Column
def expm1(e: Column): Column
def expr(expr: String): Column
def factorial(e: Column): Column
def first(columnName: String): Column
def first(e: Column): Column
def floor(columnName: String): Column
def floor(e: Column): Column
def format_number(x: Column, d: Int): Column
def format_string(format: String, arguments: Column*): Column
def from_unixtime(ut: Column): Column
def from_unixtime(ut: Column, f: String): Column
def from_utc_timestamp(ts: Column, tz: String): Column
def greatest(columnName: String, columnNames: String*): Column
def greatest(exprs: Column*): Column
def hex(column: Column): Column
def hour(e: Column): Column
def hypot(l: Column, r: Column): Column
def hypot(l: Column, r: Double): Column
def hypot(l: Column, rightName: String): Column
def hypot(l: Double, r: Column): Column
def hypot(l: Double, rightName: String): Column
def hypot(leftName: String, r: Column): Column
def hypot(leftName: String, r: Double): Column
def hypot(leftName: String, rightName: String): Column
def initcap(e: Column): Column
def inputFileName(): Column
def instr(str: Column, substring: String): Column
def isNaN(e: Column): Column
def lag(columnName: String, offset: Int): Column
def lag(columnName: String, offset: Int, defaultValue: Any): Column
def lag(e: Column, offset: Int): Column
def lag(e: Column, offset: Int, defaultValue: Any): Column
def last(columnName: String): Column
def last(e: Column): Column
def last_day(e: Column): Column
def lead(columnName: String, offset: Int): Column
def lead(columnName: String, offset: Int, defaultValue: Any): Column
def lead(e: Column, offset: Int): Column
def lead(e: Column, offset: Int, defaultValue: Any): Column
def least(columnName: String, columnNames: String*): Column
def least(exprs: Column*): Column
def length(e: Column): Column
def levenshtein(l: Column, r: Column): Column
def lit(literal: Any): Column
def locate(substr: String, str: Column): Column
def locate(substr: String, str: Column, pos: Int): Column
def log(base: Double, a: Column): Column
def
{noformat}
[ https://issues.apache.org/jira/browse/SPARK-9427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14645464#comment-14645464 ]

Yu Ishikawa commented on SPARK-9427:

I'll work on this issue.