[GitHub] [spark] beliefer commented on a change in pull request #29891: [SPARK-30796][SQL] Add parameter position for REGEXP_REPLACE

2020-10-14 Thread GitBox


beliefer commented on a change in pull request #29891:
URL: https://github.com/apache/spark/pull/29891#discussion_r505103088



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala
##
@@ -318,16 +320,46 @@ case class StringSplit(str: Expression, regex: 
Expression, limit: Expression)
  */
 // scalastyle:off line.size.limit
 @ExpressionDescription(
-  usage = "_FUNC_(str, regexp, rep) - Replaces all substrings of `str` that 
match `regexp` with `rep`.",
+  usage = "_FUNC_(str, regexp, rep[, position]) - Replaces all substrings of 
`str` that match `regexp` with `rep`.",
+  arguments = """
+Arguments:
+  * str - a string expression to search for a regular expression pattern 
match.
+  * regexp - a string representing a regular expression. The regex string 
should be a
+  Java regular expression.
+
+  Since Spark 2.0, string literals (including regex patterns) are 
unescaped in our SQL
+  parser. For example, to match "\abc", a regular expression for 
`regexp` can be
+  "^\\abc$".
+
+  There is a SQL config 'spark.sql.parser.escapedStringLiterals' that 
can be used to
+  fallback to the Spark 1.6 behavior regarding string literal parsing. 
For example,
+  if the config is enabled, the `regexp` that can match "\abc" is 
"^\abc$".
+  * rep - a string expression to replace matched substrings.
+  * position - a positive integer expression that indicates the position 
within `str` to begin searching.

Review comment:
   OK
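   To make the documented `position` argument concrete, here is a minimal usage sketch (it assumes a `SparkSession` named `spark` with this change applied; the second expected output is inferred from the description above rather than taken from the PR's tests):
   ```
   // Default position is 1, so both numbers are replaced (this matches the doc example).
   spark.sql("""SELECT regexp_replace('100-200', '(\\d+)', 'num')""").show()
   // expected value: num-num

   // With position 5, searching begins at the fifth character of '100-200',
   // so only '200' should be replaced.
   spark.sql("""SELECT regexp_replace('100-200', '(\\d+)', 'num', 5)""").show()
   // expected value: 100-num
   ```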





[GitHub] [spark] beliefer commented on a change in pull request #29891: [SPARK-30796][SQL] Add parameter position for REGEXP_REPLACE

2020-10-14 Thread GitBox


beliefer commented on a change in pull request #29891:
URL: https://github.com/apache/spark/pull/29891#discussion_r505148258



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala
##
@@ -397,21 +437,33 @@ case class RegExpReplace(subject: Expression, regexp: 
Expression, rep: Expressio
 $termLastReplacementInUTF8 = $rep.clone();
 $termLastReplacement = $termLastReplacementInUTF8.toString();
   }
-  $classNameStringBuffer $termResult = new $classNameStringBuffer();
-  java.util.regex.Matcher $matcher = 
$termPattern.matcher($subject.toString());
-
-  while ($matcher.find()) {
-$matcher.appendReplacement($termResult, $termLastReplacement);
+  String $source = $subject.toString();
+  int $position = $pos - 1;
+  if ($position < $source.length()) {
+$classNameStringBuffer $termResult = new $classNameStringBuffer();
+java.util.regex.Matcher $matcher = $termPattern.matcher($source);
+$matcher.region($position, $source.length());
+
+while ($matcher.find()) {
+  $matcher.appendReplacement($termResult, $termLastReplacement);
+}
+$matcher.appendTail($termResult);
+${ev.value} = UTF8String.fromString($termResult.toString());
+$termResult = null;
+  } else {
+${ev.value} = $subject;
   }
-  $matcher.appendTail($termResult);
-  ${ev.value} = UTF8String.fromString($termResult.toString());
-  $termResult = null;
   $setEvNotNull
 """
 })
   }
 }
 
+object RegExpReplace {
+  def apply(subject: Expression, regexp: Expression, rep: Expression): 
RegExpReplace =
+new RegExpReplace(subject, regexp, rep, Literal(1))

Review comment:
   Yes. I will revert some code.
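   The generated code above relies on `java.util.regex.Matcher.region` to start matching at `pos - 1` while leaving the prefix of the subject untouched. A rough interpreted-Scala equivalent of that logic (not the actual codegen; `replaceFrom` is a made-up helper name):
   ```
   import java.util.regex.Pattern

   // Restrict matching to start at `pos - 1` via Matcher.region; appendReplacement
   // still copies the untouched prefix, and appendTail copies everything after the
   // last match, mirroring the generated code above.
   def replaceFrom(source: String, regex: String, rep: String, pos: Int): String = {
     val position = pos - 1
     if (position < source.length) {
       val result = new java.lang.StringBuffer()
       val matcher = Pattern.compile(regex).matcher(source)
       matcher.region(position, source.length)
       while (matcher.find()) {
         matcher.appendReplacement(result, rep)
       }
       matcher.appendTail(result)
       result.toString
     } else {
       // position is past the end of the subject, so the subject is returned as-is
       source
     }
   }

   replaceFrom("healthy, wealthy, and wise", "\\w+thy", "something", 2)
   // expected: "hsomething, something, and wise" -- the leading 'h' sits before the
   // search region, so only matches from position 2 onwards are replaced
   ```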





[GitHub] [spark] beliefer commented on a change in pull request #29891: [SPARK-30796][SQL] Add parameter position for REGEXP_REPLACE

2020-10-15 Thread GitBox


beliefer commented on a change in pull request #29891:
URL: https://github.com/apache/spark/pull/29891#discussion_r506004054



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala
##
@@ -318,16 +320,46 @@ case class StringSplit(str: Expression, regex: 
Expression, limit: Expression)
  */
 // scalastyle:off line.size.limit
 @ExpressionDescription(
-  usage = "_FUNC_(str, regexp, rep) - Replaces all substrings of `str` that 
match `regexp` with `rep`.",
+  usage = "_FUNC_(str, regexp, rep[, position]) - Replaces all substrings of 
`str` that match `regexp` with `rep`.",
+  arguments = """
+Arguments:
+  * str - a string expression to search for a regular expression pattern 
match.
+  * regexp - a string representing a regular expression. The regex string 
should be a
+  Java regular expression.
+
+  Since Spark 2.0, string literals (including regex patterns) are 
unescaped in our SQL
+  parser. For example, to match "\abc", a regular expression for 
`regexp` can be
+  "^\\abc$".
+
+  There is a SQL config 'spark.sql.parser.escapedStringLiterals' that 
can be used to
+  fallback to the Spark 1.6 behavior regarding string literal parsing. 
For example,
+  if the config is enabled, the `regexp` that can match "\abc" is 
"^\abc$".
+  * rep - a string expression to replace matched substrings.
+  * position - a positive integer expression that indicates the position 
within `str` to begin searching.
+  The default is 1. If position is greater than the number of 
characters in `str`, the result is `str`.
+  """,
   examples = """
 Examples:
   > SELECT _FUNC_('100-200', '(\\d+)', 'num');
num-num
   """,
   since = "1.5.0")
 // scalastyle:on line.size.limit
-case class RegExpReplace(subject: Expression, regexp: Expression, rep: 
Expression)
-  extends TernaryExpression with ImplicitCastInputTypes with NullIntolerant {
+case class RegExpReplace(subject: Expression, regexp: Expression, rep: 
Expression, pos: Expression)
+  extends QuaternaryExpression with ImplicitCastInputTypes with NullIntolerant 
{
+
+  override def checkInputDataTypes(): TypeCheckResult = {
+if (!pos.foldable) {
+  return TypeCheckFailure(s"Position expression must be foldable, but got 
$pos")
+}
+
+val i = pos.eval().asInstanceOf[Int]

Review comment:
   For reference, Vertica behaves as follows:
   ```
   dbadmin=> select regexp_replace('healthy, wealthy, and wise', '\\w', 
'something', null);
regexp_replace
   
   
   (1 row)
   ```





[GitHub] [spark] beliefer commented on a change in pull request #29891: [SPARK-30796][SQL] Add parameter position for REGEXP_REPLACE

2020-10-15 Thread GitBox


beliefer commented on a change in pull request #29891:
URL: https://github.com/apache/spark/pull/29891#discussion_r506004054



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala
##
@@ -318,16 +320,46 @@ case class StringSplit(str: Expression, regex: 
Expression, limit: Expression)
  */
 // scalastyle:off line.size.limit
 @ExpressionDescription(
-  usage = "_FUNC_(str, regexp, rep) - Replaces all substrings of `str` that 
match `regexp` with `rep`.",
+  usage = "_FUNC_(str, regexp, rep[, position]) - Replaces all substrings of 
`str` that match `regexp` with `rep`.",
+  arguments = """
+Arguments:
+  * str - a string expression to search for a regular expression pattern 
match.
+  * regexp - a string representing a regular expression. The regex string 
should be a
+  Java regular expression.
+
+  Since Spark 2.0, string literals (including regex patterns) are 
unescaped in our SQL
+  parser. For example, to match "\abc", a regular expression for 
`regexp` can be
+  "^\\abc$".
+
+  There is a SQL config 'spark.sql.parser.escapedStringLiterals' that 
can be used to
+  fallback to the Spark 1.6 behavior regarding string literal parsing. 
For example,
+  if the config is enabled, the `regexp` that can match "\abc" is 
"^\abc$".
+  * rep - a string expression to replace matched substrings.
+  * position - a positive integer expression that indicates the position 
within `str` to begin searching.
+  The default is 1. If position is greater than the number of 
characters in `str`, the result is `str`.
+  """,
   examples = """
 Examples:
   > SELECT _FUNC_('100-200', '(\\d+)', 'num');
num-num
   """,
   since = "1.5.0")
 // scalastyle:on line.size.limit
-case class RegExpReplace(subject: Expression, regexp: Expression, rep: 
Expression)
-  extends TernaryExpression with ImplicitCastInputTypes with NullIntolerant {
+case class RegExpReplace(subject: Expression, regexp: Expression, rep: 
Expression, pos: Expression)
+  extends QuaternaryExpression with ImplicitCastInputTypes with NullIntolerant 
{
+
+  override def checkInputDataTypes(): TypeCheckResult = {
+if (!pos.foldable) {
+  return TypeCheckFailure(s"Position expression must be foldable, but got 
$pos")
+}
+
+val i = pos.eval().asInstanceOf[Int]

Review comment:
   Oracle
   ```
   WITH strings AS ( SELECT 'healthy, wealthy, and wise' s FROM dual ) SELECT
   s "STRING",
   regexp_replace( s, '\w', 'something', NULL ) "MODIFIED_STRING" 
   FROM
strings;
   ```
   STRING | MODIFIED_STRING
   -- | --
   healthy, wealthy, and wise | 
   
   
   Vertica 
   ```
   dbadmin=> select regexp_replace('healthy, wealthy, and wise', '\\w', 
'something', null);
regexp_replace
   
   
   (1 row)
   ```
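   One Scala detail relevant to the quoted `pos.eval().asInstanceOf[Int]` line and to the NULL-argument behaviour of Oracle and Vertica shown above (a small illustrative sketch, not code from the PR): unboxing a null to a primitive `Int` silently yields 0 instead of propagating the null, so matching the NULL-in/NULL-out behaviour would need an explicit null check before the cast.
   ```
   // Stand-in for what pos.eval() would return for a NULL position literal.
   val evaluated: Any = null
   val i = evaluated.asInstanceOf[Int]  // unboxes to 0, no exception
   println(i)                           // 0
   ```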





[GitHub] [spark] beliefer commented on a change in pull request #29891: [SPARK-30796][SQL] Add parameter position for REGEXP_REPLACE

2020-10-16 Thread GitBox


beliefer commented on a change in pull request #29891:
URL: https://github.com/apache/spark/pull/29891#discussion_r506217898



##
File path: sql/core/src/test/resources/sql-tests/inputs/regexp-functions.sql
##
@@ -31,3 +31,14 @@ SELECT regexp_extract_all('1a 2b 14m', '(\\d+)([a-z]+)', 3);
 SELECT regexp_extract_all('1a 2b 14m', '(\\d+)([a-z]+)', -1);
 SELECT regexp_extract_all('1a 2b 14m', '(\\d+)?([a-z]+)', 1);
 SELECT regexp_extract_all('a 2b 14m', '(\\d+)?([a-z]+)', 1);
+
+-- regexp_replace
+SELECT regexp_replace('healthy, wealthy, and wise', '\\w+thy', 'something');
+SELECT regexp_replace('healthy, wealthy, and wise', '\\w+thy', 'something', 
-2);
+SELECT regexp_replace('healthy, wealthy, and wise', '\\w+thy', 'something', 0);
+SELECT regexp_replace('healthy, wealthy, and wise', '\\w+thy', 'something', 1);
+SELECT regexp_replace('healthy, wealthy, and wise', '\\w+thy', 'something', 2);
+SELECT regexp_replace('healthy, wealthy, and wise', '\\w+thy', 'something', 8);
+SELECT regexp_replace('healthy, wealthy, and wise', '\\w', 'something', 26);
+SELECT regexp_replace('healthy, wealthy, and wise', '\\w', 'something', 27);
+SELECT regexp_replace('healthy, wealthy, and wise', '\\w', 'something', 30);

Review comment:
   OK
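   For orientation on the last three cases above (an inference from the documented semantics, not from the checked-in result file): the subject string is 26 characters long, so position 26 can still match its final character, while 27 and 30 exceed the length and should return the input unchanged.
   ```
   // Quick length check behind the boundary positions chosen above.
   val subject = "healthy, wealthy, and wise"
   println(subject.length)  // 26 -> positions 27 and 30 are past the end of the string
   ```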





[GitHub] [spark] beliefer commented on a change in pull request #29891: [SPARK-30796][SQL] Add parameter position for REGEXP_REPLACE

2020-10-16 Thread GitBox


beliefer commented on a change in pull request #29891:
URL: https://github.com/apache/spark/pull/29891#discussion_r506217520



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala
##
@@ -318,16 +320,46 @@ case class StringSplit(str: Expression, regex: 
Expression, limit: Expression)
  */
 // scalastyle:off line.size.limit
 @ExpressionDescription(
-  usage = "_FUNC_(str, regexp, rep) - Replaces all substrings of `str` that 
match `regexp` with `rep`.",
+  usage = "_FUNC_(str, regexp, rep[, position]) - Replaces all substrings of 
`str` that match `regexp` with `rep`.",
+  arguments = """
+Arguments:
+  * str - a string expression to search for a regular expression pattern 
match.
+  * regexp - a string representing a regular expression. The regex string 
should be a
+  Java regular expression.
+
+  Since Spark 2.0, string literals (including regex patterns) are 
unescaped in our SQL
+  parser. For example, to match "\abc", a regular expression for 
`regexp` can be
+  "^\\abc$".
+
+  There is a SQL config 'spark.sql.parser.escapedStringLiterals' that 
can be used to
+  fallback to the Spark 1.6 behavior regarding string literal parsing. 
For example,
+  if the config is enabled, the `regexp` that can match "\abc" is 
"^\abc$".
+  * rep - a string expression to replace matched substrings.
+  * position - a positive integer expression that indicates the position 
within `str` to begin searching.
+  The default is 1. If position is greater than the number of 
characters in `str`, the result is `str`.
+  """,
   examples = """
 Examples:
   > SELECT _FUNC_('100-200', '(\\d+)', 'num');
num-num
   """,
   since = "1.5.0")
 // scalastyle:on line.size.limit
-case class RegExpReplace(subject: Expression, regexp: Expression, rep: 
Expression)
-  extends TernaryExpression with ImplicitCastInputTypes with NullIntolerant {
+case class RegExpReplace(subject: Expression, regexp: Expression, rep: 
Expression, pos: Expression)
+  extends QuaternaryExpression with ImplicitCastInputTypes with NullIntolerant 
{
+
+  override def checkInputDataTypes(): TypeCheckResult = {
+if (!pos.foldable) {
+  return TypeCheckFailure(s"Position expression must be foldable, but got 
$pos")
+}
+
+val i = pos.eval().asInstanceOf[Int]

Review comment:
   OK





[GitHub] [spark] beliefer commented on a change in pull request #29891: [SPARK-30796][SQL] Add parameter position for REGEXP_REPLACE

2020-10-12 Thread GitBox


beliefer commented on a change in pull request #29891:
URL: https://github.com/apache/spark/pull/29891#discussion_r503647723



##
File path: sql/core/src/main/scala/org/apache/spark/sql/functions.scala
##
@@ -2538,7 +2538,7 @@ object functions {
* @since 1.5.0
*/
   def regexp_replace(e: Column, pattern: String, replacement: String): Column 
= withExpr {
-RegExpReplace(e.expr, lit(pattern).expr, lit(replacement).expr)
+new RegExpReplace(e.expr, lit(pattern).expr, lit(replacement).expr)

Review comment:
   OK
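   For context on the switch to `new` here: the synthesized companion `apply` of a case class only mirrors its primary constructor, so once `RegExpReplace` gains a fourth parameter, a three-argument call has to go through the auxiliary constructor (or an explicit companion `apply`). A stripped-down sketch of the mechanics, using a made-up class name:
   ```
   // Simplified stand-in for the real expression class: a 4-parameter case class
   // with a 3-parameter auxiliary constructor that defaults the position to 1.
   case class ReplaceLike(subject: String, regexp: String, rep: String, pos: Int) {
     def this(subject: String, regexp: String, rep: String) = this(subject, regexp, rep, 1)
   }

   // The auto-generated companion apply only covers the primary constructor,
   // so the three-argument form needs `new` to reach the auxiliary constructor.
   val full  = ReplaceLike("str", "re", "rep", 1)   // companion apply, 4 args
   val short = new ReplaceLike("str", "re", "rep")  // auxiliary constructor, 3 args
   ```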





[GitHub] [spark] beliefer commented on a change in pull request #29891: [SPARK-30796][SQL] Add parameter position for REGEXP_REPLACE

2020-10-12 Thread GitBox


beliefer commented on a change in pull request #29891:
URL: https://github.com/apache/spark/pull/29891#discussion_r503648228



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala
##
@@ -318,16 +320,49 @@ case class StringSplit(str: Expression, regex: 
Expression, limit: Expression)
  */
 // scalastyle:off line.size.limit
 @ExpressionDescription(
-  usage = "_FUNC_(str, regexp, rep) - Replaces all substrings of `str` that 
match `regexp` with `rep`.",
+  usage = "_FUNC_(str, regexp, rep[, position]) - Replaces all substrings of 
`str` that match `regexp` with `rep`.",
+  arguments = """
+Arguments:
+  * str - a string expression to search for a regular expression pattern 
match.
+  * regexp - a string representing a regular expression. The regex string 
should be a
+  Java regular expression.
+
+  Since Spark 2.0, string literals (including regex patterns) are 
unescaped in our SQL
+  parser. For example, to match "\abc", a regular expression for 
`regexp` can be
+  "^\\abc$".
+
+  There is a SQL config 'spark.sql.parser.escapedStringLiterals' that 
can be used to
+  fallback to the Spark 1.6 behavior regarding string literal parsing. 
For example,
+  if the config is enabled, the `regexp` that can match "\abc" is 
"^\abc$".
+  * rep - a string expression to replace matched substrings.
+  * position - a positive integer expression that indicates the position 
within `str` to begin searching.
+  The default is 1. If position is greater than the number of 
characters in `str`, the result is `str`.
+  """,
   examples = """
 Examples:
   > SELECT _FUNC_('100-200', '(\\d+)', 'num');
num-num
   """,
   since = "1.5.0")
 // scalastyle:on line.size.limit
-case class RegExpReplace(subject: Expression, regexp: Expression, rep: 
Expression)
-  extends TernaryExpression with ImplicitCastInputTypes with NullIntolerant {
+case class RegExpReplace(subject: Expression, regexp: Expression, rep: 
Expression, pos: Expression)
+  extends QuaternaryExpression with ImplicitCastInputTypes with NullIntolerant 
{
+
+  def this(subject: Expression, regexp: Expression, rep: Expression) =
+this(subject, regexp, rep, Literal(1))
+
+  override def checkInputDataTypes(): TypeCheckResult = {
+if (!pos.foldable) {

Review comment:
   Yes, because all of the databases we checked require `pos` to be positive.
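   A short illustration of the `foldable` distinction the check above relies on (a sketch against the Catalyst API, not code from the patch): only constant expressions can be evaluated at analysis time, which is why the positivity check is gated on `pos.foldable`.
   ```
   import org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute
   import org.apache.spark.sql.catalyst.expressions.Literal

   // A constant position can be evaluated during analysis...
   Literal(2).foldable                // true  -> the positivity check can run up front
   // ...but a column reference cannot, so it is rejected before any evaluation.
   UnresolvedAttribute("p").foldable  // false -> "Position expression must be foldable"
   ```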





[GitHub] [spark] beliefer commented on a change in pull request #29891: [SPARK-30796][SQL] Add parameter position for REGEXP_REPLACE

2020-10-12 Thread GitBox


beliefer commented on a change in pull request #29891:
URL: https://github.com/apache/spark/pull/29891#discussion_r503648480



##
File path: 
sql/core/src/test/resources/sql-tests/results/regexp-functions.sql.out
##
@@ -252,3 +252,53 @@ SELECT regexp_extract_all('a 2b 14m', '(\\d+)?([a-z]+)', 1)
 struct>
 -- !query output
 ["","2","14"]
+
+
+-- !query
+SELECT regexp_replace('healthy, wealthy, and wise', '\\w+thy', 'something')
+-- !query schema
+struct
+-- !query output
+something, something, and wise
+
+
+-- !query
+SELECT regexp_replace('healthy, wealthy, and wise', '\\w+thy', 'something', -2)
+-- !query schema
+struct<>
+-- !query output
+org.apache.spark.sql.AnalysisException
+cannot resolve 'regexp_replace('healthy, wealthy, and wise', '\\w+thy', 
'something', -2)' due to data type mismatch: Position expression must be 
positive, but got: -2; line 1 pos 7

Review comment:
   Yes. All of the databases we checked require the position to be a positive integer.





[GitHub] [spark] beliefer commented on a change in pull request #29891: [SPARK-30796][SQL] Add parameter position for REGEXP_REPLACE

2020-10-12 Thread GitBox


beliefer commented on a change in pull request #29891:
URL: https://github.com/apache/spark/pull/29891#discussion_r503683723



##
File path: sql/core/src/test/resources/sql-tests/inputs/regexp-functions.sql
##
@@ -31,3 +31,11 @@ SELECT regexp_extract_all('1a 2b 14m', '(\\d+)([a-z]+)', 3);
 SELECT regexp_extract_all('1a 2b 14m', '(\\d+)([a-z]+)', -1);
 SELECT regexp_extract_all('1a 2b 14m', '(\\d+)?([a-z]+)', 1);
 SELECT regexp_extract_all('a 2b 14m', '(\\d+)?([a-z]+)', 1);
+
+-- regexp_replace
+SELECT regexp_replace('healthy, wealthy, and wise', '\\w+thy', 'something');
+SELECT regexp_replace('healthy, wealthy, and wise', '\\w+thy', 'something', 
-2);
+SELECT regexp_replace('healthy, wealthy, and wise', '\\w+thy', 'something', 0);
+SELECT regexp_replace('healthy, wealthy, and wise', '\\w+thy', 'something', 1);
+SELECT regexp_replace('healthy, wealthy, and wise', '\\w+thy', 'something', 2);
+SELECT regexp_replace('healthy, wealthy, and wise', '\\w+thy', 'something', 8);

Review comment:
   OK





[GitHub] [spark] beliefer commented on a change in pull request #29891: [SPARK-30796][SQL] Add parameter position for REGEXP_REPLACE

2020-10-12 Thread GitBox


beliefer commented on a change in pull request #29891:
URL: https://github.com/apache/spark/pull/29891#discussion_r503684400



##
File path: 
sql/core/src/test/resources/sql-tests/results/regexp-functions.sql.out
##
@@ -252,3 +252,53 @@ SELECT regexp_extract_all('a 2b 14m', '(\\d+)?([a-z]+)', 1)
 struct>
 -- !query output
 ["","2","14"]
+
+
+-- !query
+SELECT regexp_replace('healthy, wealthy, and wise', '\\w+thy', 'something')
+-- !query schema
+struct
+-- !query output
+something, something, and wise
+
+
+-- !query
+SELECT regexp_replace('healthy, wealthy, and wise', '\\w+thy', 'something', -2)
+-- !query schema
+struct<>
+-- !query output
+org.apache.spark.sql.AnalysisException
+cannot resolve 'regexp_replace('healthy, wealthy, and wise', '\\w+thy', 
'something', -2)' due to data type mismatch: Position expression must be 
positive, but got: -2; line 1 pos 7

Review comment:
   It seems that is not needed.




