[jira] [Commented] (SPARK-22183) Inconsistency in LIKE escaping between literal values and column-based ones

2018-03-16 Thread Michal Szafranski (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16402122#comment-16402122
 ] 

Michal Szafranski commented on SPARK-22183:
---

[~instanceof me] Would it work for your use case if contains() were 
accessible through Spark SQL (for example "SELECT * FROM test WHERE 
contains(_1, _2)")?

> Inconsistency in LIKE escaping between literal values and column-based ones
> ---
>
> Key: SPARK-22183
> URL: https://issues.apache.org/jira/browse/SPARK-22183
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Adrien Lavoillotte
>Priority: Minor
>
> I'm trying to implement auto-escaping for {{LIKE}} expressions, in order to 
> have filters & join conditions like:
> * Column A's value contains column B's
> * Column A's value contains some literal string
> So I need to escape {{LIKE}}-significant characters {{%}} and {{_}}. Since 
> SparkSQL does not support {{LIKE expr ESCAPE char}}, I need to escape using 
> \, and presumably also \ itself (twice in the case of literals, since '\​\' 
> represents a single \​).
> But it seems that a literal in a {{LIKE}} expression does not get quite the 
> same escaping as other literal strings or as non-literal {{LIKE}} expressions, 
> seemingly depending on whether the left-hand side and/or right-hand side are 
> literals or columns.
> Note: I'm using triple-quotes below to avoid Scala-level \ escaping. And in 
> the body of this description, I'm purposely using zero-width spaces to keep 
> Jira from transforming my \​.
> On Spark 2.2.0:
> {code}
> // both LHS & RHS literals
> scala> spark.sql("""SELECT '\\', '\\' LIKE '\\\\'""").show()
> +---+---------+
> |  \|\ LIKE \\|
> +---+---------+
> |  \|     true|
> +---+---------+
> scala> spark.sql("""SELECT '\\', '\\' LIKE '\\'""").show()
> org.apache.spark.sql.AnalysisException: the pattern '\' is invalid, it is not 
> allowed to end with the escape character;
>   at 
> org.apache.spark.sql.catalyst.util.StringUtils$.fail$1(StringUtils.scala:42)
>   at 
> org.apache.spark.sql.catalyst.util.StringUtils$.escapeLikeRegex(StringUtils.scala:53)
>   at 
> org.apache.spark.sql.catalyst.expressions.Like.escape(regexpExpressions.scala:105)
>   at 
> org.apache.spark.sql.catalyst.expressions.StringRegexExpression.compile(regexpExpressions.scala:50)
>   at 
> org.apache.spark.sql.catalyst.expressions.StringRegexExpression.pattern(regexpExpressions.scala:53)
>   at 
> org.apache.spark.sql.catalyst.expressions.StringRegexExpression.nullSafeEval(regexpExpressions.scala:56)
>   at 
> org.apache.spark.sql.catalyst.expressions.BinaryExpression.eval(Expression.scala:419)
>   ...
> scala> spark.sql("""SELECT 'a\\b', 'a\\b' LIKE 'a\\\\b'""").show()
> +---+-------------+
> |a\b|a\b LIKE a\\b|
> +---+-------------+
> |a\b|         true|
> +---+-------------+
> scala> spark.sql("""SELECT 'a\\b', 'a\\b' LIKE 'a\\b'""").show()
> org.apache.spark.sql.AnalysisException: the pattern 'a\b' is invalid, the 
> escape character is not allowed to precede 'b';
>   at 
> org.apache.spark.sql.catalyst.util.StringUtils$.fail$1(StringUtils.scala:42)
>   at 
> org.apache.spark.sql.catalyst.util.StringUtils$.escapeLikeRegex(StringUtils.scala:51)
>   at 
> org.apache.spark.sql.catalyst.expressions.Like.escape(regexpExpressions.scala:105)
>   ...
> // test data
> spark.sql("""SELECT * FROM test""").show()
> +----+----+
> |  _1|  _2|
> +----+----+
> |  ok|  ok|
> |  Ok|  ok|
> | a_b| a_b|
> | aab| a_b|
> | c%d| c%d|
> |caad| c%d|
> |e\nf|e\nf|
> | e
> f|e\nf|
> +----+----+
> // both column-based
> // not escaping \
> scala> spark.sql("""SELECT * FROM test t WHERE `_1` LIKE REGEXP_REPLACE(`_2`, 
> '([%_])', '\\\\$1')""").show()
> ERROR executor.Executor: Exception in task 0.0 in stage 1.0 (TID 1)
> org.apache.spark.sql.AnalysisException: the pattern 'e\nf' is invalid, the 
> escape character is not allowed to precede 'n';
>   at 
> org.apache.spark.sql.catalyst.util.StringUtils$.fail$1(StringUtils.scala:42)
>   at 
> org.apache.spark.sql.catalyst.util.StringUtils$.escapeLikeRegex(StringUtils.scala:51)
>   at 
> org.apache.spark.sql.catalyst.util.StringUtils.escapeLikeRegex(StringUtils.scala)
>   ...
> // escaping \
> scala> spark.sql("""SELECT * FROM test t WHERE `_1` LIKE REGEXP_REPLACE(`_2`, 
> '([%_\\\\])', '\\\\$1')""").show()
> +----+----+
> |  _1|  _2|
> +----+----+
> |  ok|  ok|
> | a_b| a_b|
> | c%d| c%d|
> |e\nf|e\nf|
> +----+----+
> // LHS column-based, RHS literal
> scala> spark.sql("""SELECT * FROM test t WHERE `_1` LIKE 'e\\nf'""").show()
> +----+----+
> |  _1|  _2|
> +----+----+
> |e\nf|e\nf|
> +----+----+
> scala> spark.sql("""SELECT * FROM test t WHERE `_1` LIKE 'e\\\\nf'""").show()
> +---+---+
> | _1| _2|
> +---+---+
> {code}
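The auto-escaping the reporter describes (escaping %, _, and the escape character \ itself before building a LIKE pattern) can be sketched as a small pure-Scala helper (hypothetical, not part of Spark):

```scala
// Hypothetical helper: escape LIKE metacharacters so a value matches literally.
// % and _ are LIKE wildcards; \ is the (fixed) escape character in Spark SQL,
// so it must be escaped as well.
def escapeLike(s: String): String =
  s.flatMap {
    case c @ ('%' | '_' | '\\') => "\\" + c
    case c                      => c.toString
  }
```

Note this produces the pattern at the string-value level; a literal in Scala triple-quoted SQL source would still need each backslash doubled twice over, which is exactly the layering the report is about.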

[jira] [Comment Edited] (SPARK-22183) Inconsistency in LIKE escaping between literal values and column-based ones

2018-03-16 Thread Michal Szafranski (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16402042#comment-16402042
 ] 

Michal Szafranski edited comment on SPARK-22183 at 3/16/18 3:20 PM:


As for the reporter's use case, an explicit contains() function would not 
just work around this issue; I would also expect it to be significantly 
faster. I don't think it is exposed in SQL, though:
{code:java}
sqlContext.sql("SELECT * FROM test t").filter($"_1".contains($"_2")).show()
{code}
 



[jira] [Comment Edited] (SPARK-22183) Inconsistency in LIKE escaping between literal values and column-based ones

2018-03-16 Thread Michal Szafranski (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16401738#comment-16401738
 ] 

Michal Szafranski edited comment on SPARK-22183 at 3/16/18 11:02 AM:
-

This seems to be an issue with the 'LikeSimplification' rule. When it replaces Like 
with a simple function (equals, startsWith, etc.) it does not remove escapes 
from the pattern-derived parameter. So
{code:java}
'\\' LIKE '\\\\'{code}
is replaced with
{code:java}
'\\' = '\\\\'{code}
which is incorrect.

The reported inconsistency comes down to whether this rule is applied. When both 
the LHS and RHS are constants, the LIKE is constant-folded before the 
LikeSimplification rule runs, and since the pattern (RHS) must be constant for 
the rule to apply, that leaves the `column LIKE constant` case as the one that 
triggers the problem.
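
A minimal sketch of the missing step (the names below are assumptions for illustration, not Spark's actual optimizer code): a wildcard-free LIKE pattern must have its \-escapes stripped before it can be compared with equals/startsWith:

```scala
// Sketch (assumed name, not Spark's code): remove LIKE escape sequences from a
// pattern that contains no unescaped wildcards, yielding the literal string
// the simplified equals/startsWith should compare against.
def unescapeLikePattern(p: String): String = {
  val sb = new StringBuilder
  var i = 0
  while (i < p.length) {
    if (p(i) == '\\' && i + 1 < p.length) { sb += p(i + 1); i += 2 } // drop escape, keep escaped char
    else { sb += p(i); i += 1 }
  }
  sb.toString
}
// The two-character pattern \\ denotes one literal backslash, so
// `'\' LIKE '\\'` should simplify to comparing against "\" (true),
// not against the raw pattern "\\" (false).
```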




[jira] [Created] (SPARK-20474) OnHeapColumnVector reallocation may not copy existing data

2017-04-26 Thread Michal Szafranski (JIRA)
Michal Szafranski created SPARK-20474:
-

 Summary: OnHeapColumnVector reallocation may not copy existing data
 Key: SPARK-20474
 URL: https://issues.apache.org/jira/browse/SPARK-20474
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 2.1.0
Reporter: Michal Szafranski


OnHeapColumnVector reallocation copies data to the new storage only up to 
'elementsAppended'. This variable is only updated when using the 
ColumnVector.appendX API, while ColumnVector.putX is more commonly used.
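
The failure mode can be illustrated with a toy vector (a sketch of the bug pattern, not Spark's actual class):

```scala
// Toy illustration (not Spark's code): growing the backing array but copying
// only `elementsAppended` elements silently drops values written via put().
final class ToyIntVector(initialCapacity: Int) {
  private var data = new Array[Int](initialCapacity)
  private var elementsAppended = 0

  // put() writes by index and does NOT bump elementsAppended.
  def put(i: Int, v: Int): Unit = data(i) = v

  def append(v: Int): Unit = {
    reserve(elementsAppended + 1)
    data(elementsAppended) = v
    elementsAppended += 1
  }

  def get(i: Int): Int = data(i)

  def reserve(capacity: Int): Unit =
    if (capacity > data.length) {
      val bigger = new Array[Int](capacity * 2)
      // The buggy variant would copy only `elementsAppended` elements:
      //   Array.copy(data, 0, bigger, 0, elementsAppended)
      // losing anything written with put(). Copying the whole array is safe:
      Array.copy(data, 0, bigger, 0, data.length)
      data = bigger
    }
}
```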



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-20473) ColumnVector.Array is missing accessors for some types

2017-04-26 Thread Michal Szafranski (JIRA)
Michal Szafranski created SPARK-20473:
-

 Summary: ColumnVector.Array is missing accessors for some types
 Key: SPARK-20473
 URL: https://issues.apache.org/jira/browse/SPARK-20473
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 2.1.0
Reporter: Michal Szafranski


ColumnVector implementations originally did not support some Catalyst types 
(float, short, and boolean). Now that they do, those types should also be 
added to ColumnVector.Array.


