[GitHub] spark pull request #22249: [SPARK-16281][SQL][FOLLOW-UP] Add parse_url to fu...
Github user TomaszGaweda closed the pull request at: https://github.com/apache/spark/pull/22249 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22249: [SPARK-16281][SQL][FOLLOW-UP] Add parse_url to fu...
Github user TomaszGaweda commented on a diff in the pull request: https://github.com/apache/spark/pull/22249#discussion_r213232338 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala --- @@ -2459,6 +2459,26 @@ object functions { StringTrimLeft(e.expr, Literal(trimString)) } + /** +* Extracts a part from a URL. +* +* @group string_funcs +* @since 2.4.0 +*/ + def parse_url(url: Column, partToExtract: String): Column = withExpr { --- End diff -- @HyukjinKwon I see now. Yeah, wrapping in the `Column` will be necessary, at least no string concatenation will be required --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22249: [SPARK-16281][SQL][FOLLOW-UP] Add parse_url to fu...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22249#discussion_r213227841 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala --- @@ -2459,6 +2459,26 @@ object functions { StringTrimLeft(e.expr, Literal(trimString)) } + /** +* Extracts a part from a URL. +* +* @group string_funcs +* @since 2.4.0 +*/ + def parse_url(url: Column, partToExtract: String): Column = withExpr { --- End diff -- I mean, > I would suggest method that return handler for any registered function. So that you can write: SqlFunction something = spark.(...?).getFunction("parse_url") Can this support strongly typed one? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22249: [SPARK-16281][SQL][FOLLOW-UP] Add parse_url to fu...
Github user TomaszGaweda commented on a diff in the pull request: https://github.com/apache/spark/pull/22249#discussion_r213213173 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala --- @@ -2459,6 +2459,26 @@ object functions { StringTrimLeft(e.expr, Literal(trimString)) } + /** +* Extracts a part from a URL. +* +* @group string_funcs +* @since 2.4.0 +*/ + def parse_url(url: Column, partToExtract: String): Column = withExpr { --- End diff -- @HyukjinKwon Thanks for the suggestion, however now users are complaining about stringly-typed system in Spark, there are libs like Frameless from Typelevel to archieve a bit more type safety. `expr` is springly-typed, while functions in `functions` object or accessed via `UserDefinedFunction` are a bit more type safe. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22249: [SPARK-16281][SQL][FOLLOW-UP] Add parse_url to fu...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22249#discussion_r213176616 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala --- @@ -2459,6 +2459,26 @@ object functions { StringTrimLeft(e.expr, Literal(trimString)) } + /** +* Extracts a part from a URL. +* +* @group string_funcs +* @since 2.4.0 +*/ + def parse_url(url: Column, partToExtract: String): Column = withExpr { --- End diff -- Can't we just use `expr` instead? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22249: [SPARK-16281][SQL][FOLLOW-UP] Add parse_url to fu...
Github user TomaszGaweda commented on a diff in the pull request: https://github.com/apache/spark/pull/22249#discussion_r213126158 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala --- @@ -2459,6 +2459,26 @@ object functions { StringTrimLeft(e.expr, Literal(trimString)) } + /** +* Extracts a part from a URL. +* +* @group string_funcs +* @since 2.4.0 +*/ + def parse_url(url: Column, partToExtract: String): Column = withExpr { --- End diff -- Ok, tomorrow I will create a Jira and start working on it. Thanks for your comments! :) --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22249: [SPARK-16281][SQL][FOLLOW-UP] Add parse_url to fu...
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/22249#discussion_r213121794 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala --- @@ -2459,6 +2459,26 @@ object functions { StringTrimLeft(e.expr, Literal(trimString)) } + /** +* Extracts a part from a URL. +* +* @group string_funcs +* @since 2.4.0 +*/ + def parse_url(url: Column, partToExtract: String): Column = withExpr { --- End diff -- I like this idea too --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22249: [SPARK-16281][SQL][FOLLOW-UP] Add parse_url to fu...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/22249#discussion_r213120096 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala --- @@ -2459,6 +2459,26 @@ object functions { StringTrimLeft(e.expr, Literal(trimString)) } + /** +* Extracts a part from a URL. +* +* @group string_funcs +* @since 2.4.0 +*/ + def parse_url(url: Column, partToExtract: String): Column = withExpr { --- End diff -- @TomaszGaweda This sounds a good idea by returning a handler for built-in functions. cc @rxin --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22249: [SPARK-16281][SQL][FOLLOW-UP] Add parse_url to fu...
Github user TomaszGaweda commented on a diff in the pull request: https://github.com/apache/spark/pull/22249#discussion_r213112726 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala --- @@ -2459,6 +2459,26 @@ object functions { StringTrimLeft(e.expr, Literal(trimString)) } + /** +* Extracts a part from a URL. +* +* @group string_funcs +* @since 2.4.0 +*/ + def parse_url(url: Column, partToExtract: String): Column = withExpr { --- End diff -- In long term, I would suggest method that return handler for any registered function. So that you can write: SqlFunction something = spark.(...?).getFunction("parse_url"). Now `spark.udf.register` returns a handler for UDF, something similar for getting any kind of registered function may be helpful. However, it's a lot more work, so now I've just proposed to add this function :) --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22249: [SPARK-16281][SQL][FOLLOW-UP] Add parse_url to fu...
Github user TomaszGaweda commented on a diff in the pull request: https://github.com/apache/spark/pull/22249#discussion_r213110408 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala --- @@ -2459,6 +2459,26 @@ object functions { StringTrimLeft(e.expr, Literal(trimString)) } + /** +* Extracts a part from a URL. +* +* @group string_funcs +* @since 2.4.0 +*/ + def parse_url(url: Column, partToExtract: String): Column = withExpr { --- End diff -- This PR is created after this StackOverflow question: https://stackoverflow.com/questions/52041342/how-to-parse-url-in-spark-sqlscala/52043771 I'm not sure how often it is used, however most of functions are available in functions object to make Scala and SQL interfaces similar. If you think it's useless - please let me know, I'll just close this PR --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22249: [SPARK-16281][SQL][FOLLOW-UP] Add parse_url to fu...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/22249#discussion_r213109101 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala --- @@ -2459,6 +2459,26 @@ object functions { StringTrimLeft(e.expr, Literal(trimString)) } + /** +* Extracts a part from a URL. +* +* @group string_funcs +* @since 2.4.0 +*/ + def parse_url(url: Column, partToExtract: String): Column = withExpr { --- End diff -- When adding the functions into `object functions`, we need to check how useful they are. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22249: [SPARK-16281][SQL][FOLLOW-UP] Add parse_url to fu...
GitHub user TomaszGaweda opened a pull request: https://github.com/apache/spark/pull/22249 [SPARK-16281][SQL][FOLLOW-UP] Add parse_url to functions.scala ## What changes were proposed in this pull request? Add `parse_url` function to `functions.scala`. This will allow users to use this functions without calling `selectExpr` or `spark.sql` ## How was this patch tested? `testUrl` function was changed to test also this change Please review http://spark.apache.org/contributing.html before opening a pull request. You can merge this pull request into a Git repository by running: $ git pull https://github.com/TomaszGaweda/spark parseUrl Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/22249.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #22249 commit 4002cb9c48b608123845b89af9f77e137626f83f Author: Tomasz Gaweda Date: 2018-08-27T20:05:50Z SPARK-16281 Follow-up: parse_url in functions.scala --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org