[jira] [Commented] (SPARK-22183) Inconsistency in LIKE escaping between literal values and column-based ones
[ https://issues.apache.org/jira/browse/SPARK-22183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16402122#comment-16402122 ]

Michal Szafranski commented on SPARK-22183:
---

[~instanceof me] Would it work for your use case if 'contains()' were accessible through Spark SQL (for example "SELECT * FROM test WHERE contains(_1, _2)")?

> Inconsistency in LIKE escaping between literal values and column-based ones
> ---
>
> Key: SPARK-22183
> URL: https://issues.apache.org/jira/browse/SPARK-22183
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.2.0
> Reporter: Adrien Lavoillotte
> Priority: Minor
>
> I'm trying to implement auto-escaping for {{LIKE}} expressions, in order to
> have filters & join conditions like:
> * Column A's value contains column B's
> * Column A's value contains some literal string
> So I need to escape the {{LIKE}}-significant characters {{%}} and {{_}}. Since
> Spark SQL does not support {{LIKE expr ESCAPE char}}, I need to escape using
> \, and presumably also \ itself (twice in the case of literals, since '\\'
> represents a single \).
> But it seems that a {{LIKE}} expression literal does not have quite the
> same escaping as other literal strings or non-literal {{LIKE}} expressions,
> seemingly depending on whether the left-hand side and/or right-hand side are
> literals or columns.
> Note: I'm using triple quotes below to avoid Scala-level \ escaping. And in
> the body of this description, I'm purposely using zero-width spaces to avoid
> Jira transforming my \.
> On Spark 2.2.0:
> {code}
> // both LHS & RHS literals
> scala> spark.sql("""SELECT '\\', '\\' LIKE ''""").show()
> +---+---------+
> |  \|\ LIKE \\|
> +---+---------+
> |  \|     true|
> +---+---------+
>
> scala> spark.sql("""SELECT '\\', '\\' LIKE '\\'""").show()
> org.apache.spark.sql.AnalysisException: the pattern '\' is invalid, it is not
> allowed to end with the escape character;
>   at org.apache.spark.sql.catalyst.util.StringUtils$.fail$1(StringUtils.scala:42)
>   at org.apache.spark.sql.catalyst.util.StringUtils$.escapeLikeRegex(StringUtils.scala:53)
>   at org.apache.spark.sql.catalyst.expressions.Like.escape(regexpExpressions.scala:105)
>   at org.apache.spark.sql.catalyst.expressions.StringRegexExpression.compile(regexpExpressions.scala:50)
>   at org.apache.spark.sql.catalyst.expressions.StringRegexExpression.pattern(regexpExpressions.scala:53)
>   at org.apache.spark.sql.catalyst.expressions.StringRegexExpression.nullSafeEval(regexpExpressions.scala:56)
>   at org.apache.spark.sql.catalyst.expressions.BinaryExpression.eval(Expression.scala:419)
>   ...
>
> scala> spark.sql("""SELECT 'a\\b', 'a\\b' LIKE 'ab'""").show()
> +---+-------------+
> |a\b|a\b LIKE a\\b|
> +---+-------------+
> |a\b|         true|
> +---+-------------+
>
> scala> spark.sql("""SELECT 'a\\b', 'a\\b' LIKE 'a\\b'""").show()
> org.apache.spark.sql.AnalysisException: the pattern 'a\b' is invalid, the
> escape character is not allowed to precede 'b';
>   at org.apache.spark.sql.catalyst.util.StringUtils$.fail$1(StringUtils.scala:42)
>   at org.apache.spark.sql.catalyst.util.StringUtils$.escapeLikeRegex(StringUtils.scala:51)
>   at org.apache.spark.sql.catalyst.expressions.Like.escape(regexpExpressions.scala:105)
>   ...
> // test data
> spark.sql("""SELECT * FROM test""").show()
> +----+----+
> |  _1|  _2|
> +----+----+
> |  ok|  ok|
> |  Ok|  ok|
> | a_b| a_b|
> | aab| a_b|
> | c%d| c%d|
> |caad| c%d|
> |e\nf|e\nf|
> | e
> f|e\nf|
> +----+----+
>
> // both column-based
> // not escaping \
> scala> spark.sql("""SELECT * FROM test t WHERE `_1` LIKE REGEXP_REPLACE(`_2`, '([%_])', '$1')""").show()
> ERROR executor.Executor: Exception in task 0.0 in stage 1.0 (TID 1)
> org.apache.spark.sql.AnalysisException: the pattern 'e\nf' is invalid, the
> escape character is not allowed to precede 'n';
>   at org.apache.spark.sql.catalyst.util.StringUtils$.fail$1(StringUtils.scala:42)
>   at org.apache.spark.sql.catalyst.util.StringUtils$.escapeLikeRegex(StringUtils.scala:51)
>   at org.apache.spark.sql.catalyst.util.StringUtils.escapeLikeRegex(StringUtils.scala)
>   ...
>
> // escaping \
> scala> spark.sql("""SELECT * FROM test t WHERE `_1` LIKE REGEXP_REPLACE(`_2`, '([%_])', '$1')""").show()
> +----+----+
> |  _1|  _2|
> +----+----+
> |  ok|  ok|
> | a_b| a_b|
> | c%d| c%d|
> |e\nf|e\nf|
> +----+----+
>
> // LHS column-based, RHS literal
> scala> spark.sql("""SELECT * FROM test t WHERE `_1` LIKE 'e\\nf'""").show()
> +----+----+
> |  _1|  _2|
> +----+----+
> |e\nf|e\nf|
> +----+----+
>
> scala> spark.sql("""SELECT * FROM test t WHERE `_1` LIKE 'enf'""").show()
> +---+---+
> | _1| _2|
> +---+---+
>
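The double layer of escaping the reporter describes (once for the LIKE pattern, once more for the SQL string literal) can be sketched in plain Scala. These helper names are mine, not a Spark API; they only illustrate what an auto-escaping layer would have to produce:

```scala
// Hypothetical helpers (not Spark API) for the reporter's auto-escaping.
// Layer 1: make a value safe inside a LIKE pattern by escaping the
// metacharacters % and _, plus the escape character \ itself.
def escapeLikeValue(s: String): String =
  s.flatMap {
    case c @ ('\\' | '%' | '_') => "\\" + c
    case c                      => c.toString
  }

// Layer 2: embed the pattern in a SQL string literal, which doubles
// every backslash once more (assuming backslash-escaped literals).
def toSqlStringLiteral(s: String): String =
  "'" + s.replace("\\", "\\\\").replace("'", "\\'") + "'"
```

For a value like `a_b`, layer 1 yields `a\_b` and layer 2 yields the literal `'a\\_b'`, which matches the amount of escaping the REPL session above needs.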
[jira] [Comment Edited] (SPARK-22183) Inconsistency in LIKE escaping between literal values and column-based ones
[ https://issues.apache.org/jira/browse/SPARK-22183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16402042#comment-16402042 ]

Michal Szafranski edited comment on SPARK-22183 at 3/16/18 3:20 PM:
---

As for the reporter's use case, using an explicit 'contains()' function would not just work around this issue; I would also expect it to be significantly faster. I don't think it is mapped in SQL though:
{code:java}
sqlContext.sql("SELECT * FROM test t").filter($"_1".contains($"_2")).show()
{code}

was (Author: michal.db):
As for the reporter's use case, using an explicit 'contains()' function would not just work around this issue; I would also expect it to be significantly faster. I don't think it is mapped in SQL though:
{code:java}
sqlContext.sql("""SELECT * FROM test t""").filter($"_1".contains($"_2")).show()
{code}
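The suggestion above works because a substring test takes its needle literally, so none of the LIKE escaping machinery is needed. A plain-Scala illustration (the example data is mine, taken from the ticket's test table):

```scala
// Each pair is (column _1, column _2) from the ticket's test data.
val rows = Seq(("a_b", "a_b"), ("aab", "a_b"), ("c%d", "c%d"), ("caad", "c%d"))

// contains() keeps only the literal matches. An unescaped LIKE 'a_b'
// would also have matched "aab", because _ is a one-character wildcard;
// contains() treats '_' and '%' as ordinary characters.
val literalMatches = rows.filter { case (value, needle) => value.contains(needle) }
```

This is the semantic difference behind the expected speedup as well: a literal substring scan avoids compiling the pattern to a regex.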
[jira] [Commented] (SPARK-22183) Inconsistency in LIKE escaping between literal values and column-based ones
[ https://issues.apache.org/jira/browse/SPARK-22183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16402042#comment-16402042 ]

Michal Szafranski commented on SPARK-22183:
---

As for the reporter's use case, using an explicit 'contains()' function would not just work around this issue; I would also expect it to be significantly faster. I don't think it is mapped in SQL though:
{code:java}
sqlContext.sql("""SELECT * FROM test t""").filter($"_1".contains($"_2")).show()
{code}
[jira] [Comment Edited] (SPARK-22183) Inconsistency in LIKE escaping between literal values and column-based ones
[ https://issues.apache.org/jira/browse/SPARK-22183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16401738#comment-16401738 ]

Michal Szafranski edited comment on SPARK-22183 at 3/16/18 11:02 AM:
---

This seems to be an issue with the 'LikeSimplification' rule. When it replaces Like with a simple function (equals, startsWith, etc.), it does not remove escapes from the pattern-derived parameter. So
{code:java}
'\\' LIKE ''{code}
is replaced with
{code:java}
'\\' = ''{code}
which is incorrect. The reported inconsistency comes down to whether this rule is applied. When both LHS and RHS are constants, LIKE is constant-folded before the LikeSimplification rule, and since the pattern (RHS) needs to be constant for the rule to apply, that leaves the `column LIKE constant` case to trigger the problem.

was (Author: michal.db):
This seems to be an issue with the 'LikeSimplification' rule. When it replaces Like with a simple function (equals, startsWith, etc.), it does not remove escapes from the pattern-derived parameter. So `'\\' LIKE ''` is replaced with `'\\' = ''` which is incorrect. The reported inconsistency comes down to whether this rule is applied. When both LHS and RHS are constants, LIKE is constant-folded before the LikeSimplification rule, and since the pattern (RHS) needs to be constant for the rule to apply, that leaves the `column LIKE constant` case to trigger the problem.
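The fix the analysis above points at can be sketched in a few lines (my naming, not Spark's actual LikeSimplification code): before rewriting `col LIKE pattern` to an equality or prefix test, the `\` escapes must be stripped from the pattern:

```scala
// Sketch, not Spark's code: a wildcard-free LIKE pattern may be
// simplified to a plain string equality, but only after removing the
// backslash escapes. Returns None when the pattern contains a real
// (unescaped) wildcard and so cannot be expressed as equality.
def unescapeLikePattern(pattern: String): Option[String] = {
  val sb = new StringBuilder
  var i = 0
  while (i < pattern.length) {
    pattern(i) match {
      case '\\' if i + 1 < pattern.length =>
        sb.append(pattern(i + 1)) // escaped char is taken literally
        i += 1
      case '%' | '_' =>
        return None // genuine wildcard: keep the Like expression
      case c =>
        sb.append(c)
    }
    i += 1
  }
  Some(sb.toString)
}
```

With this, the pattern `\\` simplifies to equality against a single `\`, instead of the raw two-character pattern that the buggy rewrite compares against.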
[jira] [Commented] (SPARK-22183) Inconsistency in LIKE escaping between literal values and column-based ones
[ https://issues.apache.org/jira/browse/SPARK-22183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16401738#comment-16401738 ]

Michal Szafranski commented on SPARK-22183:
---

This seems to be an issue with the 'LikeSimplification' rule. When it replaces Like with a simple function (equals, startsWith, etc.), it does not remove escapes from the pattern-derived parameter. So `'\\' LIKE ''` is replaced with `'\\' = ''` which is incorrect. The reported inconsistency comes down to whether this rule is applied. When both LHS and RHS are constants, LIKE is constant-folded before the LikeSimplification rule, and since the pattern (RHS) needs to be constant for the rule to apply, that leaves the `column LIKE constant` case to trigger the problem.
[jira] [Created] (SPARK-20474) OnHeapColumnVector reallocation may not copy existing data
Michal Szafranski created SPARK-20474:
---

Summary: OnHeapColumnVector reallocation may not copy existing data
Key: SPARK-20474
URL: https://issues.apache.org/jira/browse/SPARK-20474
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 2.1.0
Reporter: Michal Szafranski

OnHeapColumnVector reallocation copies existing data to the new storage only up to 'elementsAppended'. This variable is only updated when using the ColumnVector.appendX API, while ColumnVector.putX is more commonly used.

--
This message was sent by Atlassian JIRA (v6.3.15#6346)
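The defect can be distilled into a few lines (a simplified stand-in in Scala, not the actual OnHeapColumnVector code, which is Java): growth copies only the first `elementsAppended` slots, so anything written through the put-style setters is silently dropped on reallocation.

```scala
// Minimal stand-in for the buggy reallocation (not Spark's real class).
class IntVector(initialCapacity: Int) {
  private var data = new Array[Int](initialCapacity)
  private var elementsAppended = 0 // bumped only by append()

  def put(i: Int, v: Int): Unit = data(i) = v // does NOT bump the counter
  def append(v: Int): Unit = {
    reserve(elementsAppended + 1)
    data(elementsAppended) = v
    elementsAppended += 1
  }
  def get(i: Int): Int = data(i)

  def reserve(capacity: Int): Unit =
    if (capacity > data.length) {
      val newData = new Array[Int](capacity * 2)
      // Bug: copies only elementsAppended elements, so values written
      // via put() beyond that count are lost.
      System.arraycopy(data, 0, newData, 0, elementsAppended)
      data = newData
    }
}
```

Filling the vector with `put` and then triggering a `reserve` zeroes the data, which is exactly the failure mode the issue describes.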
[jira] [Created] (SPARK-20473) ColumnVector.Array is missing accessors for some types
Michal Szafranski created SPARK-20473:
---

Summary: ColumnVector.Array is missing accessors for some types
Key: SPARK-20473
URL: https://issues.apache.org/jira/browse/SPARK-20473
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 2.1.0
Reporter: Michal Szafranski

ColumnVector implementations originally did not support some Catalyst types (float, short, and boolean). Now that they do, those types should also be added to ColumnVector.Array.
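The shape of the fix is mechanical (a simplified Scala sketch; Spark's real ColumnVector classes are Java and much richer): the Array view delegates element access to its backing vector, so each newly supported type needs a matching delegating accessor on the view.

```scala
// Simplified stand-ins, not Spark's actual classes.
trait MiniColumnVector {
  def getShort(rowId: Int): Short
  def getFloat(rowId: Int): Float
  def getBoolean(rowId: Int): Boolean
}

// Array view over rows [offset, offset + length) of the vector; the fix
// amounts to adding delegating accessors like these for float, short,
// and boolean.
class MiniArray(vector: MiniColumnVector, offset: Int, length: Int) {
  def getShort(ordinal: Int): Short = vector.getShort(offset + ordinal)
  def getFloat(ordinal: Int): Float = vector.getFloat(offset + ordinal)
  def getBoolean(ordinal: Int): Boolean = vector.getBoolean(offset + ordinal)
}
```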